

Snapshots

A Workspace can be saved to a tar archive, loaded back into a new Workspace, or copied in-process. The same machinery powers OpenAI Agents sandbox checkpoints (persist_workspace / hydrate_workspace).

Three methods

ws.snapshot("snap.tar")                        # serialize to disk
ws.snapshot("snap.tar.gz", compress="gz")      # gzipped tar
ws.snapshot(buf)                               # serialize to a BytesIO

new_ws = Workspace.load("snap.tar",
                        resources={"/s3": fresh_s3_resource})

forked = ws.copy()                             # in-process deep copy
forked = copy.deepcopy(ws)                     # same path via stdlib
copy.copy(ws)                                  # raises NotImplementedError
copy() is always deep. A shallow copy would alias _cache, _session_mgr, resources, history, and other mutable internals, giving you an alias rather than an independent copy. For that reason copy.copy(ws) raises with a message pointing to ws.copy() and copy.deepcopy(ws).
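The deep-only copy protocol can be sketched with a toy stand-in class (the internals shown here are simplified placeholders, not the real Workspace implementation):

```python
import copy

class Workspace:
    """Toy stand-in illustrating the deep-only copy protocol."""
    def __init__(self):
        self._cache = {}    # would be aliased by a shallow copy
        self.history = []

    def copy(self):
        # Always deep: every mutable component is duplicated.
        return copy.deepcopy(self)

    def __copy__(self):
        raise NotImplementedError(
            "Shallow copy would alias internal state; "
            "use ws.copy() or copy.deepcopy(ws)."
        )

ws = Workspace()
ws.history.append("run ls")
forked = ws.copy()
forked.history.append("run cat a.txt")
print(len(ws.history), len(forked.history))   # 1 2
```

Because only `__copy__` is overridden, `copy.deepcopy(ws)` still works through the normal deep-copy machinery, while `copy.copy(ws)` raises immediately.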

What’s in a snapshot

| Captured | Notes |
| --- | --- |
| Mounts (per-resource state) | Content for RAM/Disk/Redis; config-only for cloud/token resources |
| Sessions (cwd, env per session) | All active sessions |
| Dirty inode tracker | _tracker._inodes |
| File cache (RAM-backed entries) | Bytes + fingerprint + ttl + cached_at |
| max_drain_bytes setting | The cancellable-cache-drain budget |
| Execution history | ExecutionRecords with command, stdout, exit code, timestamps |
| Finished jobs | Job table entries with status COMPLETED / KILLED |

| NOT captured | Why |
| --- | --- |
| Running jobs | Pending tasks can't be serialized; only finished jobs survive |
| Drain tasks | asyncio.Task objects, runtime-only |
| FUSE mountpoint | Environment-specific |
| Cloud credentials | Redacted with "<REDACTED>" sentinel |

Format: tar + JSON manifest + raw blobs

A snapshot is a tar archive containing a manifest.json and a set of binary blob files:
snap.tar
├── manifest.json            # state metadata; bytes replaced by {"__file": ...} refs
├── mounts/
│   ├── 0/files/0.bin        # RAM mount file (numbered blob)
│   ├── 1/files/data/a.txt   # Disk mount file (tree-preserving)
│   └── 2/data/0.bin         # Redis mount value (numbered blob)
└── cache/blobs/0.bin        # cache entry bytes
The manifest.json contains everything except the bytes; binary fields are replaced with {"__file": "<tar-relative-path>"} placeholders that the loader resolves by reading from the tar.
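The round trip can be sketched with the stdlib alone; the manifest keys below are illustrative, only the {"__file": ...} placeholder shape is taken from the format described above:

```python
import io
import json
import tarfile

# Build a snapshot-shaped tar: manifest.json plus one raw blob,
# with the bytes field replaced by a {"__file": ...} placeholder.
blob = b"hello world"
manifest = {"version": 1,
            "cache": {"entry": {"__file": "cache/blobs/0.bin"}}}

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, data in [("manifest.json", json.dumps(manifest).encode()),
                       ("cache/blobs/0.bin", blob)]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Load: read the manifest, then resolve each placeholder by reading
# the referenced member from the tar.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    loaded = json.load(tar.extractfile("manifest.json"))
    ref = loaded["cache"]["entry"]["__file"]
    restored = tar.extractfile(ref).read()

print(restored)   # b'hello world'
```

The metadata stays inspectable with `tar tf` and `jq`, while the raw bytes live beside it as ordinary tar members.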

Why tar + JSON instead of pickle?

| Property | Pickle | Tar + JSON manifest |
| --- | --- | --- |
| Inspectable with standard tools | no | yes (tar tf, jq) |
| Portable to TS / Go / Rust | no | yes |
| Safe to load from untrusted source | no (RCE) | yes (no eval) |
| Streaming-friendly | no | yes (tar streams) |
| Native bytes support | yes | yes (raw blob files) |
| OpenAI Agents sandbox compat | no | yes (their unix_local/docker/vercel backends use tar too) |
The only cost is a few hundred extra LOC for tar bundling.

Disk mounts: tree-preserving layout

For DiskResource mounts, files are placed at their relative path inside the tar (not as numbered blobs), so tar -xf snap.tar -C /somewhere extracts a usable directory tree:
/somewhere/mounts/1/files/data/a.txt   # actual file content, original name
/somewhere/mounts/1/files/sub/b.txt
This makes Mirage snapshots compatible with tools that expect to extract real files.

Backend serialization policy

| Backend | What's saved | needs_override at load |
| --- | --- | --- |
| RAM | full files dict + dirs (bytes as side-files) | no |
| Disk | full file tree (real files inside the tar) | no (default = fresh tmpdir; or pass DiskResource(root=...)) |
| Redis | full key+value dump | yes (caller supplies target Redis URL) |
| S3, R2, OCI, GCS, Supabase | bucket/region; creds stripped | yes |
| GDrive, Gmail, GDocs, GSheets, GSlides | client_id; secret + refresh_token stripped | yes |
| Slack, Discord, Telegram, Notion, Linear, Trello, GitHub, GitHub CI, Email, Langfuse, MongoDB | endpoint config; token stripped | yes |
| SSH | host config; key path stripped | no (path-based, no embedded secret) |
| Paperclip | path config | no |

needs_override=True semantics

When loading a snapshot, every mount with needs_override=True MUST have an entry in resources={...}:
ws = Workspace.load(
    "snap.tar",
    resources={
        "/s3":    S3Resource(config=S3Config(... fresh creds ...)),
        "/redis": RedisResource(url="redis://prod:6379/0"),
        "/gdrive": GoogleDriveResource(config=...),
    },
)
If any are missing, load fails fast with a single ValueError listing all missing prefixes:
Workspace.load: resources= must include overrides for: ['/s3', '/gdrive'].
These mounts were saved with redacted creds or transient connection
state and need fresh resources.
For RAM/Disk/SSH/Paperclip, no override is required: the snapshot already contains everything needed to reconstruct them.
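The fail-fast check can be sketched like this; `check_overrides` and the manifest shape are illustrative, not the actual Mirage API:

```python
def check_overrides(manifest_mounts, resources):
    """Collect every mount that needs a fresh resource but got none,
    then fail once with the full list (illustrative sketch)."""
    missing = [m["prefix"] for m in manifest_mounts
               if m.get("needs_override") and m["prefix"] not in resources]
    if missing:
        raise ValueError(
            f"Workspace.load: resources= must include overrides for: {missing}. "
            "These mounts were saved with redacted creds or transient "
            "connection state and need fresh resources."
        )

mounts = [{"prefix": "/s3", "needs_override": True},
          {"prefix": "/ram", "needs_override": False},
          {"prefix": "/gdrive", "needs_override": True}]

check_overrides(mounts, {"/s3": object(), "/gdrive": object()})  # passes
try:
    check_overrides(mounts, {})
except ValueError as e:
    print(e)   # one error naming both /s3 and /gdrive
```

Collecting all missing prefixes before raising means the caller fixes every gap in one iteration instead of discovering them one load attempt at a time.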

Credential redaction

Cloud and token resource configs replace sensitive fields with the literal string "<REDACTED>":
{
  "type": "s3",
  "needs_override": true,
  "redacted_fields": ["aws_access_key_id", "aws_secret_access_key"],
  "config": {
    "bucket": "my-bucket",
    "region": "us-east-1",
    "aws_access_key_id": "<REDACTED>",
    "aws_secret_access_key": "<REDACTED>"
  }
}
A test asserts that the literal credential bytes never appear anywhere in the saved tar. The sentinel is distinct from None so the loader can tell "deliberately stripped" from "legitimately empty".
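A minimal sketch of the redaction step (`redact` is a hypothetical helper, not the Mirage API):

```python
REDACTED = "<REDACTED>"

def redact(config: dict, sensitive: list) -> dict:
    """Replace sensitive fields with the sentinel string (not None),
    so the loader can distinguish 'stripped' from 'empty'."""
    return {k: (REDACTED if k in sensitive else v)
            for k, v in config.items()}

cfg = {"bucket": "my-bucket",
       "region": "us-east-1",
       "aws_access_key_id": "AKIAEXAMPLE",
       "aws_secret_access_key": "supersecret"}
safe = redact(cfg, ["aws_access_key_id", "aws_secret_access_key"])

print(safe["aws_secret_access_key"])   # <REDACTED>
print(safe["bucket"])                  # my-bucket
```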

copy() vs save() + load(), one important divergence

| | copy() | save → load |
| --- | --- | --- |
| Local backends (RAM, Disk) | independent; writes to the copy don't reach the original | independent, same |
| Remote backends (S3, GDrive, …) | reuses the original Resource instance; both copies share the same bucket/folder | requires resources= override; loader picks fresh creds (typically same bucket, but it's the caller's choice) |
| Redis | reuses the original Resource; both copies see the same Redis instance | requires resources= override; data is dumped + restored to a caller-supplied URL |
| Cache | independent (each fork gets its own cache restored from snapshot) | independent, same |
| Speed | fast, no serialization | slower, full tar I/O |
| Use case | speculative execution, parallel agents within one process | resume across processes, share with another machine, OpenAI Agents checkpoint |
copy() is for "fork the local view"; save/load is for "ship state elsewhere." The Redis/S3 sharing in copy() is intentional: we assume you don't want a copy to silently fork your remote bucket.

Path-traversal defense on load

Every {"__file": "<path>"} reference in the manifest is validated against an allowlist before being read from the tar:
  • Reject empty strings
  • Reject leading / (absolute paths)
  • Reject .. segments
  • Reject NUL byte
A maliciously crafted tar with {"__file": "../../etc/passwd"} is rejected before any read with a clear ValueError("Unsafe blob path"). We never use tarfile.extractall, only extractfile(named_member) for entries the manifest explicitly references. Spaces and unicode in filenames work fine (only the structural characters above are blocked).
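The allowlist can be sketched as a small validator implementing exactly the four rules above (`validate_blob_path` is illustrative, not the actual Mirage function):

```python
def validate_blob_path(path: str) -> str:
    """Reject unsafe {"__file": ...} references before any tar read."""
    if not path:
        raise ValueError("Unsafe blob path: empty")
    if path.startswith("/"):
        raise ValueError("Unsafe blob path: absolute")
    if "\x00" in path:
        raise ValueError("Unsafe blob path: NUL byte")
    if ".." in path.split("/"):
        raise ValueError("Unsafe blob path: '..' segment")
    return path

validate_blob_path("mounts/0/files/0.bin")       # ok
validate_blob_path("cache/blobs/héllo 1.bin")    # spaces/unicode are fine
try:
    validate_blob_path("../../etc/passwd")
except ValueError as e:
    print(e)   # Unsafe blob path: '..' segment
```

Checking `".." in path.split("/")` rejects traversal segments without false-positiving on legitimate names that merely contain two dots (e.g. `a..b.txt`).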

OpenAI Agents integration

MirageSandboxSession.persist_workspace() and hydrate_workspace(data) plug into the OpenAI Agents SDK's BaseSandboxSession checkpoint API, the same surface every other backend (unix_local, docker, vercel, runloop, cloudflare) implements:
from agents.run import RunConfig
from agents.sandbox import SandboxRunConfig
from mirage.agents.openai_agents import MirageSandboxClient

client = MirageSandboxClient(workspace)
session = await client.create()

# Agent does work in the session...

# Then checkpoint:
snapshot = await session.persist_workspace()    # BytesIO of a tar

# Later, in a fresh session:
fresh_session = await fresh_client.create()
await fresh_session.hydrate_workspace(snapshot)
# fresh_session's workspace now has the original's files, cache, history
hydrate_workspace mutates the existing session's workspace in place rather than constructing a new one; this is required by the SDK's BaseSandboxSession API. For general use, prefer Workspace.load(...), which returns a fresh Workspace. The sandbox session's hydrate_workspace is implemented in terms of the public read_tar + apply_state_dict primitives, the same building blocks Workspace.load uses, just without Workspace construction.

Anti-patterns

  • copy.copy(ws) raises; Workspace has no useful shallow copy.
  • Hand-editing the tar: manifest validation may reject the result. Use to_state_dict + split_manifest_and_blobs + write_tar to build snapshots programmatically.
  • Loading from an untrusted source and assuming it's safe because it's not pickle: it's safer than pickle (no RCE), but still validate provenance before loading anything.
  • Loading a snapshot from a different Mirage version: v1 refuses non-version: 1 snapshots; migrations come when needed.

See also

  • Workspace, the kernel that owns the state being snapshotted.
  • File Store, the cache layer captured in a snapshot.
  • Mounts, resource-prefix routing that must be re-supplied at load time.