What It Does
A snapshot captures a workspace as a single tar file: mount configs, sessions, history, finished jobs, cache bytes, and one fingerprint per recorded remote read. Loading a snapshot reconstructs the workspace and verifies that the underlying sources have not drifted since capture. When the backend exposes a stable per-object revision marker (S3 VersionId, Drive revisionId, Git commit SHA), the snapshot also records that revision. At load time, reads pin to the recorded revision and serve the exact bytes the original agent saw, even if the live object has since been overwritten.
The API
snapshot and copy are async because fingerprint capture stats each touched path on a SUPPORTS_SNAPSHOT mount.
What Is And Isn’t Captured
Captured
- Mount configs (creds redacted; restore via resources= override)
- Sessions, history, finished jobs
- Cache bytes for touched paths
- One fingerprint per remote read (ETag-equivalent)
- Optional per-path revision when the backend exposes one
Not captured
- Live state of mounts with SUPPORTS_SNAPSHOT=False (Gmail, Slack, Linear, Notion, …)
- Files the agent never touched
- Raw bytes of remote objects (recoverable only via revision pin)
Drift Detection
On the first dispatch or execute after load, Mirage stats every fingerprinted path against the live source in parallel. If any path’s live fingerprint differs from the recorded one, the workspace raises ContentDriftError.
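The check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Mirage's implementation: `check_drift`, `stat_live`, and the local `ContentDriftError` class are hypothetical stand-ins, and fingerprints are modeled as plain strings.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class ContentDriftError(Exception):
    """Local stand-in for the error named above; not the library's class."""

    def __init__(self, path: str, recorded: str, live: str) -> None:
        super().__init__(f"{path}: recorded {recorded!r}, live {live!r}")
        self.path = path


async def check_drift(
    recorded: Dict[str, str],
    stat_live: Callable[[str], Awaitable[str]],
) -> None:
    """Stat every recorded path concurrently, then raise on the first mismatch."""
    paths = list(recorded)
    live = await asyncio.gather(*(stat_live(p) for p in paths))
    for path, live_fp in zip(paths, live):
        if live_fp != recorded[path]:
            raise ContentDriftError(path, recorded[path], live_fp)


async def _demo() -> str:
    recorded = {"/s3/a.csv": "etag-1", "/s3/b.csv": "etag-2"}
    # Hypothetical live state: b.csv was overwritten after capture.
    live_source = {"/s3/a.csv": "etag-1", "/s3/b.csv": "etag-9"}

    async def stat_live(path: str) -> str:
        return live_source[path]

    try:
        await check_drift(recorded, stat_live)
    except ContentDriftError as exc:
        return exc.path
    return ""


drifted_path = asyncio.run(_demo())
```

In this sketch all stats are issued before any comparison happens, so the cost of the check is roughly one round-trip rather than one per path.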
Drift Policies
| Policy | Behavior on mismatch | Use when |
|---|---|---|
| STRICT (default) | Raise ContentDriftError on first mismatch. | Reproducing an agent run; you want to know the world moved. |
| OFF | Skip drift checks entirely. Evict snapshot cache for fingerprinted paths so reads serve current bytes. | You only wanted the workspace skeleton, not the bytes. |
Select a policy at load time, for example Workspace.load(..., drift_policy=DriftPolicy.OFF).
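To make the two behaviors in the table concrete, here is a minimal sketch. The `DriftPolicy` enum, `ContentDriftError`, and `apply_drift_policy` below are local stand-ins written for illustration; the real classes may be shaped differently.

```python
from enum import Enum
from typing import Dict


class DriftPolicy(Enum):
    STRICT = "strict"
    OFF = "off"


class ContentDriftError(Exception):
    """Local stand-in for the error named above."""


def apply_drift_policy(
    policy: DriftPolicy,
    recorded: Dict[str, str],
    live: Dict[str, str],
    cache: Dict[str, bytes],
) -> None:
    if policy is DriftPolicy.OFF:
        # No checks; drop snapshot bytes for fingerprinted paths so that
        # later reads fall through to the live source.
        for path in recorded:
            cache.pop(path, None)
        return
    # STRICT: raise on the first path whose live fingerprint differs.
    for path, fingerprint in recorded.items():
        if live.get(path) != fingerprint:
            raise ContentDriftError(path)


recorded = {"/s3/a.csv": "etag-1"}
live = {"/s3/a.csv": "etag-2"}
cache = {"/s3/a.csv": b"old bytes", "/other/untracked.txt": b"local"}

apply_drift_policy(DriftPolicy.OFF, recorded, live, cache)
remaining = sorted(cache)  # the fingerprinted path has been evicted

try:
    apply_drift_policy(DriftPolicy.STRICT, recorded, live, cache)
    strict_raised = False
except ContentDriftError:
    strict_raised = True
```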
How It Composes With Caching
Snapshots interact with two existing caches:
| Cache | Holds | Snapshotted? |
|---|---|---|
| File cache (Workspace._cache) | raw bytes per virtual path | ✅ bytes for touched paths are serialized into the tar |
| Index cache (Resource._index) | FileStat entries for listings | ❌ rebuilt lazily after load |
Fingerprint capture filters ws._ops.records for op == "read" and deduplicates by path. Files that were listed but never opened, or touched only by stat / readdir, do not carry a fingerprint or a revision.
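The selection rule above (only read operations, one entry per path) can be sketched as follows. `OpRecord` is a hypothetical stand-in with just the two fields the rule needs; the real record type is likely to carry more.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class OpRecord:
    """Hypothetical minimal shape of an operation record."""
    op: str
    path: str


def paths_to_fingerprint(records: Iterable[OpRecord]) -> List[str]:
    """Keep paths touched by a read, deduplicated, in first-seen order."""
    # dict.fromkeys preserves insertion order and drops repeated keys.
    return list(dict.fromkeys(r.path for r in records if r.op == "read"))


records = [
    OpRecord("readdir", "/s3/"),
    OpRecord("read", "/s3/a.csv"),
    OpRecord("stat", "/s3/b.csv"),
    OpRecord("read", "/s3/a.csv"),
    OpRecord("read", "/s3/c.csv"),
]
touched = paths_to_fingerprint(records)
```

Here `/s3/b.csv` is dropped because it was only stat-ed, and `/s3/a.csv` appears once even though it was read twice.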
At load time, three pieces of restored state cooperate per read:
- Cache is consulted first. Snapshot bytes go back into Workspace._cache; a warm read returns them with no network round-trip.
- Fingerprint verifies the cache. Under STRICT, the eager drift check stats every recorded path against live and raises before any read fires if anything moved. The cache is therefore trusted as authoritative until proven stale.
- Revision pin is the cold-path recovery. When the cache misses and the backend supports pinning, reads fetch the exact recorded revision (S3 GetObject(VersionId=...)) so you still get original bytes, not the live head.
| Scenario | Cache | Pin | Drift check | Net effect |
|---|---|---|---|---|
| STRICT, fingerprint matches, cache warm | served | n/a | passes | Original bytes from cache (~0 ms) |
| STRICT, fingerprint matches, cache cold | miss | n/a | passes | Live GET, current bytes (= original) |
| STRICT, no pin, fingerprint differs | n/a | n/a | raises ContentDriftError | Caller informed before reading |
| STRICT + pin (versioned backend) | served (matches recorded) | installed | skipped for pinned paths | Original bytes from cache or pinned GET |
| OFF, any state | evicted for fingerprinted paths | not installed | skipped | Live GET, current bytes |
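The per-read lookup order described above (cache, then revision pin, then live) can be sketched like this. `resolve_read`, `get_pinned`, and `get_live` are hypothetical names chosen for the example; the two callables stand in for a backend's versioned and unversioned fetches.

```python
from typing import Callable, Dict


def resolve_read(
    path: str,
    cache: Dict[str, bytes],
    pins: Dict[str, str],
    get_pinned: Callable[[str, str], bytes],
    get_live: Callable[[str], bytes],
) -> bytes:
    # 1. Warm cache: serve snapshot bytes with no network round-trip.
    if path in cache:
        return cache[path]
    # 2. Cold cache with a recorded revision: fetch that exact revision.
    revision = pins.get(path)
    if revision is not None:
        return get_pinned(path, revision)
    # 3. No pin available: fall back to the live head.
    return get_live(path)


cache = {"/s3/a.csv": b"cached"}
pins = {"/s3/b.csv": "v1"}


def get_pinned(path: str, revision: str) -> bytes:
    return f"{path}@{revision}".encode()


def get_live(path: str) -> bytes:
    return b"live"


warm = resolve_read("/s3/a.csv", cache, pins, get_pinned, get_live)
pinned = resolve_read("/s3/b.csv", cache, pins, get_pinned, get_live)
unpinned = resolve_read("/s3/c.csv", cache, pins, get_pinned, get_live)
```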
Resource Support Matrix
Snapshot support is opt-in per resource via SUPPORTS_SNAPSHOT = True. Resources without it surface in a load-time warning and serve current state with no drift detection.
Legend: ✅ = supported · 🟡 = adapter needed (no work in flight) · 📝 = planned · ❌ = not applicable.
Object Storage
| Resource | Drift detection (Py) | Revision pin (Py) | Drift detection (TS) | Revision pin (TS) | Marker | Notes |
|---|---|---|---|---|---|---|
| S3 | ✅ | ✅ | 📝 | 📝 | ETag + VersionId | Pin requires bucket versioning. |
| R2 | ✅ | ✅ | 📝 | 📝 | ETag + VersionId | Inherits S3 path; pin requires R2 versioning (GA 2024). |
| GCS | ✅ | 🟡 | 📝 | 🟡 | ETag + x-goog-generation | TODO(snapshot-gcs): map generation → ContentVersion(kind="revision") in core/s3/stat.py. |
| OCI | ✅ | 🟡 | 📝 | 🟡 | ETag + versionId header | TODO(snapshot-oci): same shape as GCS adapter. |
| Supabase | ✅ | ❌ | 📝 | ❌ | ETag only | Inherits S3Resource. Supabase’s S3-compat endpoint does not surface object VersionId, so drift detection works but pinning is not available. |
Files & Code
| Resource | Drift detection (Py) | Revision pin (Py) | Drift detection (TS) | Revision pin (TS) | Marker | Notes |
|---|---|---|---|---|---|---|
| Disk | ✅ | ❌ | 📝 | ❌ | content hash | Bytes travel inside the tar; pin not meaningful. |
| RAM | ✅ | ❌ | 📝 | ❌ | content hash | Same as Disk. |
| Redis | ✅ | ❌ | 📝 | ❌ | content hash | Bytes restored via resources= override. |
| GitHub | 📝 | 📝 | 📝 | 📝 | commit SHA | TODO(snapshot-github): stat → revision=sha, read via repos.get_contents(path, ref=sha). |
| Google Drive | 📝 | 📝 | 📝 | 📝 | md5Checksum + revisionId | TODO(snapshot-gdrive): stat → revision=revisionId, read via revisions().get_media. |
Extending To A New Backend
Three steps in Python:
record so the snapshot
captures whatever the backend served:
mount.revisions — no per-resource hook required. If your backend
has no stable revision, skip steps 3 and 6; drift detection still
works on the fingerprint alone.
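As a rough illustration of the fingerprint-plus-optional-revision shape, the sketch below maps an S3-style HeadObject response (a mapping with an `ETag` and, on versioned buckets, a `VersionId`) to the two values a snapshot would record. `FileVersion` and `version_from_head` are hypothetical names for this example and are not part of any documented adapter interface.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical container for what a stat call would report."""
    fingerprint: str
    revision: Optional[str]


def version_from_head(head: Mapping[str, str]) -> FileVersion:
    # S3 returns the ETag wrapped in double quotes; strip them.
    fingerprint = head["ETag"].strip('"')
    # VersionId is absent when the bucket has no versioning; in that case
    # only drift detection is possible, not pinning.
    revision = head.get("VersionId")
    return FileVersion(fingerprint=fingerprint, revision=revision)


versioned = version_from_head({"ETag": '"abc123"', "VersionId": "3HL4kqtJ"})
unversioned = version_from_head({"ETag": '"abc123"'})
```

A backend with no stable revision marker would simply always report `revision=None`.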
Caveats
- Revision longevity. Pinned reads only work as long as the source still retains the recorded revision. S3 bucket lifecycle rules can age out old versions; Drive keeps revisions for 30 days on non-Workspace files. Treat pins as best-effort.
- First read after load is slower than the rest. Workspace.load() returns immediately, but the first execute() or dispatch() afterwards pauses while Mirage verifies that nothing has drifted upstream. Concretely, it asks the live source “is this path still the bytes I remember?” once per recorded read, in parallel. The pause is small for short sessions (tens of milliseconds) and ranges from a few hundred milliseconds to a couple of seconds for sessions with hundreds of recorded reads. It happens once per loaded workspace; subsequent calls run at normal speed. Pass drift_policy=DriftPolicy.OFF if you want to skip the check entirely.