Snapshot & Replay

What It Does

A snapshot captures a workspace as a single tar file: mount configs, sessions, history, finished jobs, cache bytes, and one fingerprint per recorded remote read. Loading a snapshot reconstructs the workspace and verifies that the underlying sources have not drifted since capture. When the backend exposes a stable per-object revision marker (S3 VersionId, Drive revisionId, Git commit SHA), the snapshot also records that revision. At load time, reads pin to the recorded revision and serve the exact bytes the original agent saw, even if the live object has since been overwritten.
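As a rough illustration of "one tar file holding state plus one fingerprint per read", the sketch below packs a manifest and cached bytes into an in-memory tar and reads them back. It uses only the standard library (Python 3.9+). The member name `manifest.json`, the `cache/` prefix, and the field names are hypothetical and are not Mirage's actual on-disk layout.

```python
import io
import json
import tarfile


def write_snapshot(manifest: dict, cache: dict[str, bytes]) -> bytes:
    """Pack a manifest and cached bytes into one in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        members = {"manifest.json": json.dumps(manifest).encode()}
        # Hypothetical layout: cached bytes live under cache/<virtual path>.
        for path, data in cache.items():
            members["cache" + path] = data
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_snapshot(blob: bytes) -> tuple[dict, dict[str, bytes]]:
    """Unpack the archive back into (manifest, cache)."""
    manifest: dict = {}
    cache: dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            data = tar.extractfile(member).read()
            if member.name == "manifest.json":
                manifest = json.loads(data)
            else:
                cache[member.name.removeprefix("cache")] = data
    return manifest, cache


manifest = {
    "reads": [
        # A fingerprint is always recorded; a revision only on versioned backends.
        {"path": "/s3/data.csv", "fingerprint": "etag-abc", "revision": "v1"},
        {"path": "/s3/notes.txt", "fingerprint": "etag-def", "revision": None},
    ]
}
blob = write_snapshot(manifest, {"/s3/data.csv": b"a,b\n1,2\n"})
restored_manifest, restored_cache = read_snapshot(blob)
```

The point of the sketch is the separation of concerns: bytes for touched paths travel in the archive, while the fingerprint and optional revision travel as metadata that can be checked against the live source later.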

The API

Python:

# capture (async): serializes state plus fingerprints recorded during reads
await ws.snapshot("run.tar")
await ws.snapshot("run.tar.gz", compress="gz")

# replay: the drift check fires on the first dispatch/execute
restored = await Workspace.load("run.tar")                    # DriftPolicy.STRICT is the default
restored = await Workspace.load("run.tar", drift_policy=DriftPolicy.OFF)

# in-process duplicate (shares remote resources, restores local content fresh)
cp = await ws.copy()

TypeScript:

import { DriftPolicy, Workspace } from '@struktoai/mirage-node'

// capture to a tar file
await ws.snapshot('run.tar')

// replay with strict drift detection (default)
const restored = await Workspace.load('run.tar')

// or restore the structure while reading current remote content
const current = await Workspace.load('run.tar', {
  driftPolicy: DriftPolicy.OFF,
})

// in-process duplicate
const copy = await ws.copy()

TypeScript file-path snapshot I/O is available in Node. In the browser, Workspace.load(snapshotBytes) accepts a Uint8Array, and Workspace.fromState(...) restores a state object; writing the tar to a file or download target is the application’s responsibility. Both snapshot and copy are async because they serialize workspace state and collect recorded fingerprints.

Versioning: commit, checkout, clone

A tar snapshot is a portable capture you move between machines. Versioning is the in-place complement: a git-style history kept with the workspace, server-side and keyed by workspace id, so you can checkpoint, roll back, branch, and fork without writing a file. It is backed by a real git object store (dulwich) inside the daemon. This is a CLI / REST feature, not an SDK method (there is no ws.commit() in-process):

mirage workspace commit demo -m "before refactor"   # checkpoint the live state
mirage workspace checkout demo <version>             # roll back in place
mirage workspace clone demo --at <version>           # fork a past version

See the CLI versioning reference for log, diff, branch, and the full flag set.

	Tar snapshot	Versioning
Output	a `.tar` you move or archive	history kept with the workspace (server)
API	`ws.snapshot()` / `Workspace.load()` (SDK)	`mirage workspace commit` / `checkout` / `clone` (CLI + REST)
Lineage	none, each tar stands alone	git-style commits, branches, diff
Best for	reproducing a run on another machine	iterating in place with checkpoints to revert to

What Is And Isn’t Captured

Captured

Mount configs (creds redacted; restore via resources= override)
Sessions, history, finished jobs
Cache bytes for touched paths
One fingerprint per remote read (ETag-equivalent)
Optional per-path revision when the backend exposes one

Not captured

Live state of mounts with SUPPORTS_SNAPSHOT=False (Gmail, Slack, Linear, Notion, …)
Files the agent never touched
Raw bytes of remote objects that are not in the cache (recoverable only via revision pin)

Drift Detection

On the first dispatch or execute after load, Mirage stats every fingerprinted path against the live source in parallel. If any path’s live fingerprint differs from the recorded one, the workspace raises ContentDriftError:

try:
    await ws.execute("cat /s3/data.csv")
except ContentDriftError as exc:
    print(exc.path, exc.snapshot_fingerprint, exc.live_fingerprint)

Paths that carry a revision pin are skipped — the pinned read serves the exact original bytes, so a fingerprint mismatch is expected and harmless.
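The check described above can be sketched with `asyncio.gather`. This is a minimal model, not Mirage's implementation: `RecordedRead`, `check_drift`, and `fake_stat` are hypothetical names, and the `ContentDriftError` class here is a local stand-in that only mirrors the three attributes shown in the example above.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


class ContentDriftError(Exception):
    """Local stand-in carrying the attributes used in the example above."""

    def __init__(self, path, snapshot_fingerprint, live_fingerprint):
        super().__init__(f"{path} drifted")
        self.path = path
        self.snapshot_fingerprint = snapshot_fingerprint
        self.live_fingerprint = live_fingerprint


@dataclass
class RecordedRead:
    path: str
    fingerprint: str
    revision: Optional[str] = None  # set only on versioned backends


async def check_drift(reads, stat_live):
    """Stat every unpinned path concurrently, then raise on the first mismatch found."""
    unpinned = [r for r in reads if r.revision is None]  # pinned paths are skipped
    live = await asyncio.gather(*(stat_live(r.path) for r in unpinned))
    for recorded, live_fp in zip(unpinned, live):
        if live_fp != recorded.fingerprint:
            raise ContentDriftError(recorded.path, recorded.fingerprint, live_fp)


# Fake live source: b.csv and c.csv changed since capture.
LIVE = {"/s3/a.csv": "etag-1", "/s3/b.csv": "etag-CHANGED", "/s3/c.csv": "etag-9"}


async def fake_stat(path):
    await asyncio.sleep(0)  # stands in for a network round-trip
    return LIVE[path]


reads = [
    RecordedRead("/s3/a.csv", "etag-1"),
    RecordedRead("/s3/b.csv", "etag-2"),
    RecordedRead("/s3/c.csv", "etag-3", revision="v7"),  # pinned: mismatch ignored
]


def run_check(recorded):
    """Return the drifted path, or None when nothing moved."""
    try:
        asyncio.run(check_drift(recorded, fake_stat))
    except ContentDriftError as exc:
        return exc.path
    return None


drifted = run_check(reads)
```

Here `/s3/c.csv` changed upstream too, but because it carries a revision it never reaches the comparison; only `/s3/b.csv` is reported.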

Drift Policies

Policy	Behavior on mismatch	Use when
`STRICT` (default)	Raise `ContentDriftError` on first mismatch.	Reproducing an agent run; you want to know the world moved.
`OFF`	Skip drift checks entirely. Evict snapshot cache for fingerprinted paths so reads serve current content.	You only want the workspace skeleton, not the bytes.

Pass via await Workspace.load(..., drift_policy=DriftPolicy.OFF).
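To make the difference between the two policies concrete, here is a small sketch of what each one might do to restored state at load time. `prepare_restored_state` and the dict-based record shape are hypothetical; the `DriftPolicy` enum is a local stand-in that only mirrors the two documented members.

```python
from enum import Enum


class DriftPolicy(Enum):  # local stand-in for the two documented policies
    STRICT = "strict"
    OFF = "off"


def prepare_restored_state(policy, cache, reads):
    """Return (cache, pins, needs_drift_check) for a freshly loaded workspace.

    reads: list of dicts with "path", "fingerprint", and optional "revision".
    """
    if policy is DriftPolicy.OFF:
        fingerprinted = {r["path"] for r in reads}
        # Evict snapshot bytes so later reads fetch current remote content.
        kept = {p: b for p, b in cache.items() if p not in fingerprinted}
        return kept, {}, False
    # STRICT: keep snapshot bytes and install pins where a revision was recorded.
    pins = {r["path"]: r["revision"] for r in reads if r.get("revision")}
    return dict(cache), pins, True


snapshot_cache = {"/s3/data.csv": b"old bytes", "/ram/scratch.txt": b"local"}
recorded_reads = [
    {"path": "/s3/data.csv", "fingerprint": "etag-abc", "revision": "v1"},
]

strict_state = prepare_restored_state(DriftPolicy.STRICT, snapshot_cache, recorded_reads)
off_state = prepare_restored_state(DriftPolicy.OFF, snapshot_cache, recorded_reads)
```

Under `OFF`, only the fingerprinted remote path loses its cached bytes; the unfingerprinted local entry survives, which is what "workspace skeleton" means in practice.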

How It Composes With Caching

Snapshots interact with two existing caches:

Cache	Holds	Snapshotted?
File cache (`Workspace._cache`)	raw bytes per virtual path	✅ bytes for touched paths are serialized into the tar
Index cache (`Resource._index`)	`FileStat` entries for listings	❌ rebuilt lazily after load

Only files the agent actually read are fingerprinted. Capture walks ws._ops.records for op == "read" and dedups by path. Files that were listed but never opened, or touched only by stat / readdir, do not carry a fingerprint or a revision. At load time, three pieces of restored state cooperate per read:

Cache is consulted first. Snapshot bytes go back into Workspace._cache; a warm read returns them with no network round-trip.
Fingerprint verifies the cache. Under STRICT, the eager drift check stats every recorded, unpinned path against the live source and raises before any read fires if anything moved. The cache is therefore trusted as authoritative until proven stale.
Revision pin is the cold-path recovery. When the cache misses and the backend supports pinning, reads fetch the exact recorded revision (S3 GetObject(VersionId=...)) so you still get original bytes, not the live head.
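The per-read order above (cache, then pinned fetch, then live fetch) can be sketched as a single function. This assumes the drift check has already passed; `resolve_read` and the `fetch(path, revision)` callable are hypothetical, and re-warming the cache after a miss is an assumption rather than documented behavior.

```python
def resolve_read(path, cache, pins, fetch):
    """Return (bytes, source) following cache -> pinned fetch -> live fetch.

    fetch(path, revision) is a hypothetical backend call; revision=None
    means "current head".
    """
    if path in cache:
        return cache[path], "cache"  # warm read, no network round-trip
    revision = pins.get(path)  # None when no pin was installed for this path
    data = fetch(path, revision)
    cache[path] = data  # assumption: warm the cache for later reads
    return data, ("pinned" if revision is not None else "live")


# Fake versioned store: /s3/data.csv was overwritten after capture.
STORE = {
    ("/s3/data.csv", "v1"): b"original",
    ("/s3/data.csv", None): b"overwritten",
    ("/s3/other.txt", None): b"other",
}


def fake_fetch(path, revision):
    return STORE[(path, revision)]


cache = {"/s3/warm.txt": b"warm"}
pins = {"/s3/data.csv": "v1"}

warm = resolve_read("/s3/warm.txt", cache, pins, fake_fetch)
pinned = resolve_read("/s3/data.csv", cache, pins, fake_fetch)
second = resolve_read("/s3/data.csv", cache, pins, fake_fetch)
live = resolve_read("/s3/other.txt", cache, pins, fake_fetch)
```

The pinned read returns the original bytes even though the head of `/s3/data.csv` has been overwritten, which is the cold-path recovery the third item describes.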

The same three states under each policy:

Scenario	Cache	Pin	Drift check	Net effect
STRICT, fingerprint matches, cache warm	served	n/a	passes	Original bytes from cache (~0 ms)
STRICT, fingerprint matches, cache cold	miss	n/a	passes	Live GET, current bytes (= original)
STRICT, no pin, fingerprint differs	(n/a)	(n/a)	raises `ContentDriftError`	Caller informed before reading
STRICT + pin (versioned backend)	served (matches recorded)	installed	skipped for pinned paths	Original bytes from cache or pinned GET
OFF, any state	evicted for fingerprinted paths	not installed	skipped	Live GET, current bytes

The cache is the optimization, the fingerprint is the verifier, and the pin is the recovery — three independent guarantees that “what you replay equals what you captured.”
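On the capture side, the walk over read records described earlier in this section might look like the sketch below. `OpRecord` and `collect_fingerprints` are hypothetical names modeled on the description of `ws._ops.records`. Two choices here are assumptions, since the text only says records are deduplicated by path: the most recent read wins, and reads that carry no fingerprint are skipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpRecord:
    op: str  # "read", "stat", "readdir", ...
    path: str
    fingerprint: Optional[str] = None
    revision: Optional[str] = None


def collect_fingerprints(records):
    """Build one manifest entry per path from read records only."""
    manifest = {}
    for rec in records:
        if rec.op != "read" or rec.fingerprint is None:
            continue  # stat/readdir, or a read without a fingerprint
        # Assumption: a later read of the same path replaces the earlier entry.
        manifest[rec.path] = {
            "fingerprint": rec.fingerprint,
            "revision": rec.revision,
        }
    return manifest


records = [
    OpRecord("readdir", "/s3/"),
    OpRecord("stat", "/s3/listed-only.csv", fingerprint="etag-x"),
    OpRecord("read", "/s3/data.csv", fingerprint="etag-1", revision="v1"),
    OpRecord("read", "/s3/data.csv", fingerprint="etag-2", revision="v2"),
    OpRecord("read", "/hf/model.bin"),  # resource opted in, no fingerprint yet
]
manifest = collect_fingerprints(records)
```

Only `/s3/data.csv` ends up in the manifest: the listed-only file was never read, and the fingerprint-less read has nothing to verify against.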

Resource Support Matrix

Remote drift detection is opt-in per resource through SUPPORTS_SNAPSHOT in Python and supportsSnapshot in TypeScript. A working adapter must also attach a fingerprint to each read record; a revision is optional and enables pinned replay. Legend: ✅ = implemented · 🟡 = resource opts in, but recorded reads do not yet carry the fingerprint · ❌ = live-only · — = unavailable in that runtime.

Remote Resources

Resource family	Python	TS Node	TS Browser	Revision pin	Notes
S3 and S3-compatible object stores	✅	✅	✅	When `VersionId` is available	Uses `ETag`; compatible providers without object versions still get drift detection.
OneDrive	✅	—	—	✅	Uses Microsoft Graph `cTag` plus the current version id.
SharePoint	✅	—	—	✅	Same Microsoft Graph version flow as OneDrive.
Hugging Face resources	🟡	🟡	—	❌	Resources opt in, but read records do not yet include the stat fingerprint.
Nextcloud	🟡	—	—	❌	The resource opts in, but read records do not yet include the WebDAV ETag.
GitHub and Google Drive	❌	❌	❌	❌	Current reads are live-only; revision-aware replay is not wired yet.
Other remote resources	❌	❌	❌	❌	Snapshot restores their config/state, then reads current remote content.

Local State

RAM, Disk, and Redis serialize their resource state into the snapshot. They do not need a remote drift check or revision pin: replay restores the captured state directly. Credentials and connection details remain redacted and may require a resource override at load time.

Extending To A New Backend

The steps are numbered 1 to 6 in the Python below. The first three go in the resource class:

from mirage.observe.context import record, revision_for
from mirage.resource.base import BaseResource
from mirage.types import FileStat


class MyResource(BaseResource):
    SUPPORTS_SNAPSHOT = True  # 1. opt in

    async def stat(self, ...) -> FileStat:
        return FileStat(
            ...,
            fingerprint=my_etag_equivalent,        # 2. for drift
            revision=my_revision_marker_or_None,   # 3. optional, for pin
        )

And in your read function, look up the active pin and pass both the fingerprint and the revision through to record so the snapshot captures whatever the backend served:

async def read_bytes(accessor, virtual_path, ...):
    pinned = revision_for(virtual_path)  # 4. None if no pin installed
    response = backend_get(virtual_path, revision=pinned)
    record(
        "read", virtual_path, "my-backend", len(response.body), start_ms,
        fingerprint=response.etag,        # 5. for drift detection
        revision=response.revision,       # 6. for pinning on replay
    )
    return response.body

At load time the workspace writes each manifest entry straight into mount.revisions — no per-resource hook required. If your backend has no stable revision, skip steps 3 and 6; drift detection still works on the fingerprint alone.

Caveats

Revision longevity. Pinned reads only work as long as the source still retains the recorded revision. S3 bucket lifecycle rules can age out old versions; Drive keeps revisions for 30 days on non-Workspace files. Treat pins as best-effort.
First read after load is slower than the rest. Workspace.load() returns immediately, but the first execute() or dispatch() afterwards pauses while Mirage verifies that nothing has drifted upstream. Concretely, it asks the live source “is this path still the bytes I remember?” once per recorded read, in parallel. Tiny for short sessions (tens of milliseconds); a few hundred milliseconds to a couple of seconds for sessions with hundreds of recorded reads. The pause happens once per loaded workspace; subsequent calls are normal speed. Pass drift_policy=DriftPolicy.OFF if you want to skip the check entirely.
