A Workspace can be saved to a tar archive, loaded back into a new
Workspace, or copied in-process. The same machinery powers OpenAI
Agents sandbox checkpoints (persist_workspace / hydrate_workspace).
## Three methods

```python
ws.snapshot("snap.tar")                    # serialize to disk
ws.snapshot("snap.tar.gz", compress="gz")  # gzipped tar
ws.snapshot(buf)                           # serialize to a BytesIO

new_ws = Workspace.load("snap.tar",
                        resources={"/s3": fresh_s3_resource})

forked = ws.copy()          # in-process deep copy
forked = copy.deepcopy(ws)  # same path via stdlib
copy.copy(ws)               # raises NotImplementedError
```
copy() is always deep. A shallow copy would alias _cache, _session_mgr, resources, history, and so on, giving an alias rather than an independent copy, so copy.copy(ws) raises with a message pointing to ws.copy() and copy.deepcopy(ws).
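This policy can be enforced through Python's standard copy-protocol hooks. A minimal sketch under assumed names (SnapshotableWorkspace is illustrative, not Mirage's actual class):

```python
import copy


class SnapshotableWorkspace:
    """Sketch of an always-deep copy policy: shallow copies are rejected,
    deep copies route through copy()."""

    def __init__(self, files=None):
        self._cache = dict(files or {})  # mutable state that must not be aliased

    def copy(self):
        new = SnapshotableWorkspace()
        new._cache = copy.deepcopy(self._cache)  # duplicate every mutable field
        return new

    def __copy__(self):
        # A shallow copy would alias _cache and friends, so refuse it outright.
        raise NotImplementedError(
            "no useful shallow copy; use ws.copy() or copy.deepcopy(ws)"
        )

    def __deepcopy__(self, memo):
        return self.copy()


ws = SnapshotableWorkspace({"a.txt": b"hello"})
forked = copy.deepcopy(ws)       # same path as ws.copy()
forked._cache["a.txt"] = b"bye"  # the fork is independent of the original
```

With this shape, both stdlib entry points behave as documented: copy.deepcopy delegates to copy(), and copy.copy raises immediately.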
## What’s in a snapshot

| Captured | Notes |
|---|---|
| Mounts (per-resource state) | Content for RAM/Disk/Redis; config-only for cloud/token resources |
| Sessions (cwd, env per session) | All active sessions |
| Dirty inode tracker | _tracker._inodes |
| File cache (RAM-backed entries) | Bytes + fingerprint + ttl + cached_at |
| max_drain_bytes setting | The cancellable-cache-drain budget |
| Execution history | ExecutionRecords with command, stdout, exit code, timestamps |
| Finished jobs | Job table entries with status COMPLETED / KILLED |
| NOT captured | Why |
|---|---|
| Running jobs | Pending tasks can’t be serialized; only finished jobs survive |
| Drain tasks | asyncio.Task objects, runtime-only |
| FUSE mountpoint | Environment-specific |
| Cloud credentials | Redacted with the "<REDACTED>" sentinel |
A snapshot is a tar archive containing a manifest.json and a set of binary blob files:

```
snap.tar
├── manifest.json            # state metadata; bytes replaced by {"__file": ...} refs
├── mounts/
│   ├── 0/files/0.bin        # RAM mount file (numbered blob)
│   ├── 1/files/data/a.txt   # Disk mount file (tree-preserving)
│   └── 2/data/0.bin         # Redis mount value (numbered blob)
└── cache/blobs/0.bin        # cache entry bytes
```
The manifest.json contains everything except the bytes; binary fields
are replaced with {"__file": "<tar-relative-path>"} placeholders that
the loader resolves by reading from the tar.
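The placeholder convention can be sketched with a small standalone helper. The names split_manifest_and_blobs and write_tar echo the public primitives mentioned later in this page, but this is an illustration, not the actual implementation:

```python
import io
import json
import tarfile


def split_manifest_and_blobs(state, prefix="cache/blobs"):
    """Walk a state dict; move every bytes value into a numbered blob and
    leave a {"__file": "<tar-relative-path>"} placeholder behind."""
    blobs = {}

    def walk(node):
        if isinstance(node, bytes):
            path = f"{prefix}/{len(blobs)}.bin"
            blobs[path] = node
            return {"__file": path}
        if isinstance(node, dict):
            return {k: walk(v) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(v) for v in node]
        return node

    return walk(state), blobs


def write_tar(buf, manifest, blobs):
    """Bundle manifest.json plus the blob files into one tar archive."""
    with tarfile.open(fileobj=buf, mode="w") as tar:
        entries = {"manifest.json": json.dumps(manifest).encode(), **blobs}
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


manifest, blobs = split_manifest_and_blobs({"cache": {"a.txt": b"hello"}})
# manifest: {"cache": {"a.txt": {"__file": "cache/blobs/0.bin"}}}
buf = io.BytesIO()
write_tar(buf, manifest, blobs)
```

The loader does the inverse: for every {"__file": ...} node it reads the named member back out of the tar and splices the bytes into place.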
## Why tar + JSON instead of pickle?

| Property | Pickle | Tar + JSON manifest |
|---|---|---|
| Inspectable with standard tools | no | yes (tar tf, jq) |
| Portable to TS / Go / Rust | no | yes |
| Safe to load from untrusted source | no (RCE) | yes (no eval) |
| Streaming-friendly | no | yes (tar streams) |
| Native bytes support | yes | yes (raw blob files) |
| OpenAI Agents sandbox compat | no | yes (their unix_local/docker/vercel backends use tar too) |
The only cost is a few hundred extra LOC for tar bundling.
## Disk mounts: tree-preserving layout

For DiskResource mounts, files are placed at their relative paths inside the tar (not as numbered blobs), so `tar -xf snap.tar -C /somewhere` extracts a usable directory tree:

```
/somewhere/mounts/1/files/data/a.txt   # actual file content, original name
/somewhere/mounts/1/files/sub/b.txt
```
This makes Mirage snapshots compatible with tools that expect to
extract real files.
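A toy round trip shows the idea; the paths and contents here are made up, and only the tree-preserving layout matters:

```python
import io
import tarfile

# Build a tiny archive the way a disk-mount snapshot would: each file is
# stored at its relative path, not as a numbered blob.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for rel_path, content in {
        "mounts/1/files/data/a.txt": b"hello",
        "mounts/1/files/sub/b.txt": b"world",
    }.items():
        info = tarfile.TarInfo(rel_path)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))

# A plain `tar -xf` (or tarfile extraction) now yields a real directory tree.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()  # original relative paths, not blob numbers
    content = tar.extractfile("mounts/1/files/data/a.txt").read()
```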
## Backend serialization policy

| Backend | What’s saved | needs_override at load |
|---|---|---|
| RAM | full files dict + dirs (bytes as side-files) | no |
| Disk | full file tree (real files inside the tar) | no (default = fresh tmpdir; or pass DiskResource(root=...)) |
| Redis | full key+value dump | yes (caller supplies target Redis URL) |
| S3, R2, OCI, GCS, Supabase | bucket/region; creds stripped | yes |
| GDrive, Gmail, GDocs, GSheets, GSlides | client_id; secret + refresh_token stripped | yes |
| Slack, Discord, Telegram, Notion, Linear, Trello, GitHub, GitHub CI, Email, Langfuse, MongoDB | endpoint config; token stripped | yes |
| SSH | host config; key path stripped | no (path-based, no embedded secret) |
| Paperclip | path config | no |
## needs_override=True semantics

When loading a snapshot, every mount with needs_override=True MUST have an entry in resources={...}:

```python
ws = Workspace.load(
    "snap.tar",
    resources={
        "/s3": S3Resource(config=S3Config(... fresh creds ...)),
        "/redis": RedisResource(url="redis://prod:6379/0"),
        "/gdrive": GoogleDriveResource(config=...),
    },
)
```
If any are missing, loading fails fast with a single ValueError listing all missing prefixes:

```
Workspace.load: resources= must include overrides for: ['/s3', '/gdrive'].
These mounts were saved with redacted creds or transient connection
state and need fresh resources.
```
For RAM/Disk/SSH/Paperclip, no override is required: the snapshot already contains everything needed to reconstruct them.
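The fail-fast check can be sketched in a few lines. This is a hypothetical helper; the real loader works on richer mount metadata:

```python
def check_overrides(saved_mounts, resources):
    """Collect every needs_override mount missing from resources= and fail
    once with the full list, rather than erroring mount by mount."""
    missing = [
        prefix
        for prefix, meta in saved_mounts.items()
        if meta.get("needs_override") and prefix not in resources
    ]
    if missing:
        raise ValueError(
            f"Workspace.load: resources= must include overrides for: {missing}. "
            "These mounts were saved with redacted creds or transient "
            "connection state and need fresh resources."
        )


saved = {
    "/s3": {"needs_override": True},
    "/gdrive": {"needs_override": True},
    "/ram": {"needs_override": False},
}
check_overrides(saved, resources={"/s3": object(), "/gdrive": object()})  # passes
```

Reporting every missing prefix at once means the caller fixes the resources= mapping in one round trip instead of discovering gaps one load at a time.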
## Credential redaction

Cloud and token resource configs replace sensitive fields with the literal string "<REDACTED>":

```json
{
  "type": "s3",
  "needs_override": true,
  "redacted_fields": ["aws_access_key_id", "aws_secret_access_key"],
  "config": {
    "bucket": "my-bucket",
    "region": "us-east-1",
    "aws_access_key_id": "<REDACTED>",
    "aws_secret_access_key": "<REDACTED>"
  }
}
```
A test asserts that the literal credential bytes never appear anywhere in the saved tar. The sentinel is distinct from None so the loader can tell “deliberately stripped” from “legitimately empty.”
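A sketch of the sentinel approach, with an assumed sensitive-field map (SENSITIVE_FIELDS and redact_config are illustrative names, not Mirage's API):

```python
import json

REDACTED = "<REDACTED>"

# Assumed field map for the sketch; the real policy covers many backends.
SENSITIVE_FIELDS = {"s3": ["aws_access_key_id", "aws_secret_access_key"]}


def redact_config(rtype, config):
    """Replace sensitive fields with the sentinel and record which ones
    were stripped, so a loader can tell "stripped" from "empty"."""
    fields = [f for f in SENSITIVE_FIELDS.get(rtype, []) if f in config]
    clean = {k: (REDACTED if k in fields else v) for k, v in config.items()}
    return {
        "type": rtype,
        "needs_override": bool(fields),
        "redacted_fields": fields,
        "config": clean,
    }


entry = redact_config("s3", {
    "bucket": "my-bucket",
    "region": "us-east-1",
    "aws_access_key_id": "AKIAEXAMPLE",
    "aws_secret_access_key": "hunter2",
})
assert "hunter2" not in json.dumps(entry)  # cred bytes never reach the manifest
```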
## copy() vs save() + load(): one important divergence

| | copy() | save → load |
|---|---|---|
| Local backends (RAM, Disk) | independent; writes to the copy don’t reach the original | independent, same |
| Remote backends (S3, GDrive, …) | reuses the original Resource instance; both copies share the same bucket/folder | requires resources= override; loader picks fresh creds (typically same bucket, but it’s the caller’s choice) |
| Redis | reuses the original Resource; both copies see the same Redis instance | requires resources= override; data is dumped and restored to a caller-supplied URL |
| Cache | independent (each fork gets its own cache) | independent (restored from snapshot) |
| Speed | fast, no serialization | slower, full tar I/O |
| Use case | speculative execution, parallel agents within one process | resume across processes, share with another machine, OpenAI Agents checkpoint |
copy() is for “fork the local view”; save/load is for “ship state elsewhere.” The Redis/S3 sharing in copy() is intentional: we assume you don’t want a copy to silently fork your remote bucket.
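The divergence for remote backends comes down to which fields copy() deep-copies. A schematic sketch with toy classes (ForkableWorkspace and RemoteResource are stand-ins, not Mirage's types):

```python
from copy import deepcopy


class RemoteResource:
    """Stands in for an S3/Redis resource: the instance is a handle to
    remote state, so sharing the instance means sharing the bucket."""


class ForkableWorkspace:
    def __init__(self, files, resources):
        self.files = files          # local state (RAM/Disk mounts, cache)
        self.resources = resources  # prefix -> Resource handle

    def copy(self):
        # Local state is deep-copied; remote Resource handles are shared,
        # so both forks keep talking to the same bucket / Redis instance.
        return ForkableWorkspace(deepcopy(self.files), dict(self.resources))


s3 = RemoteResource()
ws = ForkableWorkspace({"a.txt": b"hi"}, {"/s3": s3})
fork = ws.copy()
fork.files["a.txt"] = b"changed"  # local mutation stays in the fork
```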
## Path-traversal defense on load

Every {"__file": "<path>"} reference in the manifest is validated against an allowlist before being read from the tar:

- Reject empty strings
- Reject leading / (absolute paths)
- Reject .. segments
- Reject NUL bytes
A maliciously crafted tar with {"__file": "../../etc/passwd"} is
rejected before any read with a clear ValueError("Unsafe blob path").
We never use tarfile.extractall, only extractfile(named_member)
for entries the manifest explicitly references.
Spaces and unicode in filenames work fine (only the structural
characters above are blocked).
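The allowlist rules above fit in one small validator. This is a sketch of the described checks, not the exact code:

```python
def validate_blob_path(path: str) -> str:
    """Reject {"__file": ...} references that could escape the tar root."""
    if not path:
        raise ValueError("Unsafe blob path")   # empty string
    if path.startswith("/"):
        raise ValueError("Unsafe blob path")   # absolute path
    if ".." in path.split("/"):
        raise ValueError("Unsafe blob path")   # parent-directory traversal
    if "\x00" in path:
        raise ValueError("Unsafe blob path")   # NUL byte
    return path


validate_blob_path("mounts/1/files/data/a.txt")  # relative and clean: accepted
validate_blob_path("cache/blobs/0.bin")          # accepted
# validate_blob_path("../../etc/passwd") raises ValueError("Unsafe blob path")
```

Note that checking `".." in path.split("/")` rejects traversal segments without rejecting legitimate names that merely contain dots, and spaces or unicode pass untouched.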
## OpenAI Agents integration

MirageSandboxSession.persist_workspace() and hydrate_workspace(data) plug into the OpenAI Agents SDK’s BaseSandboxSession checkpoint API, the same surface every other backend (unix_local, docker, vercel, runloop, cloudflare) implements:

```python
from agents.run import RunConfig
from agents.sandbox import SandboxRunConfig
from mirage.agents.openai_agents import MirageSandboxClient

client = MirageSandboxClient(workspace)
session = await client.create()

# Agent does work in the session...

# Then checkpoint:
snapshot = await session.persist_workspace()  # BytesIO of a tar

# Later, in a fresh session:
fresh_session = await fresh_client.create()
await fresh_session.hydrate_workspace(snapshot)
# fresh_session's workspace now has the original's files, cache, history
```
hydrate_workspace mutates the existing session’s workspace in place rather than constructing a new one; this is required by the SDK’s BaseSandboxSession API. For general use, prefer Workspace.load(...), which returns a fresh Workspace.

The sandbox session’s hydrate_workspace is implemented in terms of the public read_tar + apply_state_dict primitives, the same building blocks Workspace.load uses, just without the Workspace construction.
## Anti-patterns

- copy.copy(ws): raises. Workspace has no useful shallow copy.
- Hand-editing the tar: manifest validation may reject the result. Use to_state_dict + split_manifest_and_blobs + write_tar to build snapshots programmatically.
- Calling Workspace.load on an untrusted source and assuming it’s safe because it’s not pickle: it is safer than pickle (no RCE), but still validate provenance before loading anything.
- Loading a snapshot from a different Mirage version: v1 refuses non-version: 1 snapshots; migrations will come when needed.
## See also

- Workspace: the kernel that owns the state being snapshotted.
- File Store: the cache layer captured in a snapshot.
- Mounts: resource-prefix routing that must be re-supplied at load time.