A Workspace can be saved to a tar archive, loaded back into a new
Workspace, or copied in-process. The same machinery powers OpenAI
Agents sandbox checkpoints (persist_workspace / hydrate_workspace).
## Three methods

```python
ws.snapshot("snap.tar")                    # serialize to disk
ws.snapshot("snap.tar.gz", compress="gz")  # gzipped tar
ws.snapshot(buf)                           # serialize to a BytesIO

new_ws = Workspace.load("snap.tar",
                        resources={"/s3": fresh_s3_resource})

forked = ws.copy()          # in-process deep copy
forked = copy.deepcopy(ws)  # same path via stdlib
copy.copy(ws)               # raises NotImplementedError
```
copy() is always deep. A shallow copy would alias _cache, _session_mgr, resources, history, and so on, giving an alias rather than an independent copy, so copy.copy(ws) raises with a message pointing to ws.copy() and copy.deepcopy(ws).
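This policy can be enforced through Python's standard copy-protocol hooks. A minimal sketch under assumed names (SnapshotableWorkspace is illustrative, not Mirage's actual class):

```python
import copy


class SnapshotableWorkspace:
    """Sketch of an always-deep copy policy: shallow copies are rejected,
    deep copies route through copy()."""

    def __init__(self, files=None):
        self._cache = dict(files or {})  # mutable state that must not be aliased

    def copy(self):
        new = SnapshotableWorkspace()
        new._cache = copy.deepcopy(self._cache)  # duplicate every mutable field
        return new

    def __copy__(self):
        # A shallow copy would alias _cache and friends, so refuse it outright.
        raise NotImplementedError(
            "no useful shallow copy; use ws.copy() or copy.deepcopy(ws)"
        )

    def __deepcopy__(self, memo):
        return self.copy()


ws = SnapshotableWorkspace({"a.txt": b"hello"})
forked = copy.deepcopy(ws)       # same path as ws.copy()
forked._cache["a.txt"] = b"bye"  # the fork is independent of the original
```

With this shape, both stdlib entry points behave as documented: copy.deepcopy delegates to copy(), and copy.copy raises immediately.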
## What’s in a snapshot

| Captured | Notes |
|---|---|
| Mounts (per-resource state) | Content for RAM/Disk/Redis; config-only for cloud/token resources |
| Sessions (cwd, env per session) | All active sessions |
| Dirty inode tracker | _tracker._inodes |
| File cache (RAM-backed entries) | Bytes + fingerprint + ttl + cached_at |
| max_drain_bytes setting | The cancellable-cache-drain budget |
| Execution history | ExecutionRecords with command, stdout, exit code, timestamps |
| Finished jobs | Job table entries with status COMPLETED / KILLED |
| NOT captured | Why |
|---|---|
| Running jobs | Pending tasks can’t be serialized; only finished jobs survive |
| Drain tasks | asyncio.Task objects, runtime-only |
| FUSE mountpoint | Environment-specific |
| Cloud credentials | Redacted with the "<REDACTED>" sentinel |
A snapshot is a tar archive containing a manifest.json and a set of binary blob files:

```
snap.tar
├── manifest.json            # state metadata; bytes replaced by {"__file": ...} refs
├── mounts/
│   ├── 0/files/0.bin        # RAM mount file (numbered blob)
│   ├── 1/files/data/a.txt   # Disk mount file (tree-preserving)
│   └── 2/data/0.bin         # Redis mount value (numbered blob)
└── cache/blobs/0.bin        # cache entry bytes
```
The manifest.json contains everything except the bytes; binary fields
are replaced with {"__file": "<tar-relative-path>"} placeholders that
the loader resolves by reading from the tar.
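The placeholder convention can be sketched with a small standalone helper. The names split_manifest_and_blobs and write_tar echo the public primitives mentioned later in this page, but this is an illustration, not the actual implementation:

```python
import io
import json
import tarfile


def split_manifest_and_blobs(state, prefix="cache/blobs"):
    """Walk a state dict; move every bytes value into a numbered blob and
    leave a {"__file": "<tar-relative-path>"} placeholder behind."""
    blobs = {}

    def walk(node):
        if isinstance(node, bytes):
            path = f"{prefix}/{len(blobs)}.bin"
            blobs[path] = node
            return {"__file": path}
        if isinstance(node, dict):
            return {k: walk(v) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(v) for v in node]
        return node

    return walk(state), blobs


def write_tar(buf, manifest, blobs):
    """Bundle manifest.json plus the blob files into one tar archive."""
    with tarfile.open(fileobj=buf, mode="w") as tar:
        entries = {"manifest.json": json.dumps(manifest).encode(), **blobs}
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


manifest, blobs = split_manifest_and_blobs({"cache": {"a.txt": b"hello"}})
# manifest: {"cache": {"a.txt": {"__file": "cache/blobs/0.bin"}}}
buf = io.BytesIO()
write_tar(buf, manifest, blobs)
```

The loader does the inverse: for every {"__file": ...} node it reads the named member back out of the tar and splices the bytes into place.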
## Why tar + JSON instead of pickle?

| Property | Pickle | Tar + JSON manifest |
|---|---|---|
| Inspectable with standard tools | no | yes (tar tf, jq) |
| Portable to TS / Go / Rust | no | yes |
| Safe to load from untrusted source | no (RCE) | yes (no eval) |
| Streaming-friendly | no | yes (tar streams) |
| Native bytes support | yes | yes (raw blob files) |
| OpenAI Agents sandbox compat | no | yes (their unix_local/docker/vercel backends use tar too) |
The only cost is a few hundred extra LOC for tar bundling.
## Disk mounts: tree-preserving layout

For DiskResource mounts, files are placed at their relative paths inside the tar (not as numbered blobs), so `tar -xf snap.tar -C /somewhere` extracts a usable directory tree:

```
/somewhere/mounts/1/files/data/a.txt   # actual file content, original name
/somewhere/mounts/1/files/sub/b.txt
```
This makes Mirage snapshots compatible with tools that expect to
extract real files.
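A toy round trip shows the idea; the paths and contents here are made up, and only the tree-preserving layout matters:

```python
import io
import tarfile

# Build a tiny archive the way a disk-mount snapshot would: each file is
# stored at its relative path, not as a numbered blob.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for rel_path, content in {
        "mounts/1/files/data/a.txt": b"hello",
        "mounts/1/files/sub/b.txt": b"world",
    }.items():
        info = tarfile.TarInfo(rel_path)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))

# A plain `tar -xf` (or tarfile extraction) now yields a real directory tree.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()  # original relative paths, not blob numbers
    content = tar.extractfile("mounts/1/files/data/a.txt").read()
```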
## Backend serialization policy

| Backend | What’s saved | needs_override at load |
|---|---|---|
| RAM | full files dict + dirs (bytes as side-files) | no |
| Disk | full file tree (real files inside the tar) | no (default = fresh tmpdir; or pass DiskResource(root=...)) |
| Redis | full key+value dump | yes (caller supplies target Redis URL) |
| S3, R2, OCI, GCS, Supabase | bucket/region; creds stripped | yes |
| GDrive, Gmail, GDocs, GSheets, GSlides | client_id; secret + refresh_token stripped | yes |
| Slack, Discord, Telegram, Notion, Linear, Trello, GitHub, GitHub CI, Email, Langfuse, MongoDB | endpoint config; token stripped | yes |
| SSH | host config; key path stripped | no (path-based, no embedded secret) |
| Paperclip | path config | no |
## needs_override=True semantics

When loading a snapshot, every mount with needs_override=True MUST have an entry in resources={...}:

```python
ws = Workspace.load(
    "snap.tar",
    resources={
        "/s3": S3Resource(config=S3Config(... fresh creds ...)),
        "/redis": RedisResource(url="redis://prod:6379/0"),
        "/gdrive": GoogleDriveResource(config=...),
    },
)
```
If any are missing, loading fails fast with a single ValueError listing all missing prefixes:

```
Workspace.load: resources= must include overrides for: ['/s3', '/gdrive'].
These mounts were saved with redacted creds or transient connection
state and need fresh resources.
```
For RAM/Disk/SSH/Paperclip, no override is required: the snapshot already contains everything needed to reconstruct them.
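The fail-fast check can be sketched in a few lines. This is a hypothetical helper; the real loader works on richer mount metadata:

```python
def check_overrides(saved_mounts, resources):
    """Collect every needs_override mount missing from resources= and fail
    once with the full list, rather than erroring mount by mount."""
    missing = [
        prefix
        for prefix, meta in saved_mounts.items()
        if meta.get("needs_override") and prefix not in resources
    ]
    if missing:
        raise ValueError(
            f"Workspace.load: resources= must include overrides for: {missing}. "
            "These mounts were saved with redacted creds or transient "
            "connection state and need fresh resources."
        )


saved = {
    "/s3": {"needs_override": True},
    "/gdrive": {"needs_override": True},
    "/ram": {"needs_override": False},
}
check_overrides(saved, resources={"/s3": object(), "/gdrive": object()})  # passes
```

Reporting every missing prefix at once means the caller fixes the resources= mapping in one round trip instead of discovering gaps one load at a time.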
## Credential redaction

Cloud and token resource configs replace sensitive fields with the literal string "<REDACTED>":

```json
{
  "type": "s3",
  "needs_override": true,
  "redacted_fields": ["aws_access_key_id", "aws_secret_access_key"],
  "config": {
    "bucket": "my-bucket",
    "region": "us-east-1",
    "aws_access_key_id": "<REDACTED>",
    "aws_secret_access_key": "<REDACTED>"
  }
}
```
A test asserts that the literal credential bytes never appear anywhere in the saved tar. The sentinel is distinct from None so the loader can tell “deliberately stripped” from “legitimately empty.”
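A sketch of the sentinel approach, with an assumed sensitive-field map (SENSITIVE_FIELDS and redact_config are illustrative names, not Mirage's API):

```python
import json

REDACTED = "<REDACTED>"

# Assumed field map for the sketch; the real policy covers many backends.
SENSITIVE_FIELDS = {"s3": ["aws_access_key_id", "aws_secret_access_key"]}


def redact_config(rtype, config):
    """Replace sensitive fields with the sentinel and record which ones
    were stripped, so a loader can tell "stripped" from "empty"."""
    fields = [f for f in SENSITIVE_FIELDS.get(rtype, []) if f in config]
    clean = {k: (REDACTED if k in fields else v) for k, v in config.items()}
    return {
        "type": rtype,
        "needs_override": bool(fields),
        "redacted_fields": fields,
        "config": clean,
    }


entry = redact_config("s3", {
    "bucket": "my-bucket",
    "region": "us-east-1",
    "aws_access_key_id": "AKIAEXAMPLE",
    "aws_secret_access_key": "hunter2",
})
assert "hunter2" not in json.dumps(entry)  # cred bytes never reach the manifest
```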
## copy() vs save() + load(): one important divergence

| | copy() | save → load |
|---|---|---|
| Local backends (RAM, Disk) | independent; writes to the copy don’t reach the original | independent, same |
| Remote backends (S3, GDrive, …) | reuses the original Resource instance; both copies share the same bucket/folder | requires resources= override; loader picks fresh creds (typically same bucket, but it’s the caller’s choice) |
| Redis | reuses the original Resource; both copies see the same Redis instance | requires resources= override; data is dumped and restored to a caller-supplied URL |
| Cache | independent (each fork gets its own cache) | independent (restored from snapshot) |
| Speed | fast, no serialization | slower, full tar I/O |
| Use case | speculative execution, parallel agents within one process | resume across processes, share with another machine, OpenAI Agents checkpoint |
copy() is for “fork the local view”; save/load is for “ship state elsewhere.” The Redis/S3 sharing in copy() is intentional: we assume you don’t want a copy to silently fork your remote bucket.
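The divergence for remote backends comes down to which fields copy() deep-copies. A schematic sketch with toy classes (ForkableWorkspace and RemoteResource are stand-ins, not Mirage's types):

```python
from copy import deepcopy


class RemoteResource:
    """Stands in for an S3/Redis resource: the instance is a handle to
    remote state, so sharing the instance means sharing the bucket."""


class ForkableWorkspace:
    def __init__(self, files, resources):
        self.files = files          # local state (RAM/Disk mounts, cache)
        self.resources = resources  # prefix -> Resource handle

    def copy(self):
        # Local state is deep-copied; remote Resource handles are shared,
        # so both forks keep talking to the same bucket / Redis instance.
        return ForkableWorkspace(deepcopy(self.files), dict(self.resources))


s3 = RemoteResource()
ws = ForkableWorkspace({"a.txt": b"hi"}, {"/s3": s3})
fork = ws.copy()
fork.files["a.txt"] = b"changed"  # local mutation stays in the fork
```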
## Path-traversal defense on load

Every {"__file": "<path>"} reference in the manifest is validated against an allowlist before being read from the tar:

- Reject empty strings
- Reject leading / (absolute paths)
- Reject .. segments
- Reject NUL bytes
A maliciously crafted tar with {"__file": "../../etc/passwd"} is
rejected before any read with a clear ValueError("Unsafe blob path").
We never use tarfile.extractall, only extractfile(named_member)
for entries the manifest explicitly references.
Spaces and unicode in filenames work fine (only the structural
characters above are blocked).
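The allowlist rules above fit in one small validator. This is a sketch of the described checks, not the exact code:

```python
def validate_blob_path(path: str) -> str:
    """Reject {"__file": ...} references that could escape the tar root."""
    if not path:
        raise ValueError("Unsafe blob path")   # empty string
    if path.startswith("/"):
        raise ValueError("Unsafe blob path")   # absolute path
    if ".." in path.split("/"):
        raise ValueError("Unsafe blob path")   # parent-directory traversal
    if "\x00" in path:
        raise ValueError("Unsafe blob path")   # NUL byte
    return path


validate_blob_path("mounts/1/files/data/a.txt")  # relative and clean: accepted
validate_blob_path("cache/blobs/0.bin")          # accepted
# validate_blob_path("../../etc/passwd") raises ValueError("Unsafe blob path")
```

Note that checking `".." in path.split("/")` rejects traversal segments without rejecting legitimate names that merely contain dots, and spaces or unicode pass untouched.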
## OpenAI Agents integration

MirageSandboxSession.persist_workspace() and hydrate_workspace(data) plug into the OpenAI Agents SDK’s BaseSandboxSession checkpoint API, the same surface every other backend (unix_local, docker, vercel, runloop, cloudflare) implements:

```python
from agents.run import RunConfig
from agents.sandbox import SandboxRunConfig
from mirage.agents.openai_agents import MirageSandboxClient

client = MirageSandboxClient(workspace)
session = await client.create()

# Agent does work in the session...

# Then checkpoint:
snapshot = await session.persist_workspace()  # BytesIO of a tar

# Later, in a fresh session:
fresh_session = await fresh_client.create()
await fresh_session.hydrate_workspace(snapshot)
# fresh_session's workspace now has the original's files, cache, history
```
hydrate_workspace mutates the existing session’s workspace in place rather than constructing a new one; this is required by the SDK’s BaseSandboxSession API. For general use, prefer Workspace.load(...), which returns a fresh Workspace.

The sandbox session’s hydrate_workspace is implemented in terms of the public read_tar + apply_state_dict primitives, the same building blocks Workspace.load uses, just without the Workspace construction.
## Anti-patterns

- copy.copy(ws): raises. Workspace has no useful shallow copy.
- Hand-editing the tar: manifest validation may reject the result. Use to_state_dict + split_manifest_and_blobs + write_tar to build snapshots programmatically.
- Calling Workspace.load on an untrusted source and assuming it’s safe because it’s not pickle: it is safer than pickle (no RCE), but still validate provenance before loading anything.
- Loading a snapshot from a different Mirage version: v1 refuses non-version: 1 snapshots; migrations will come when needed.
## See also

- Workspace: the kernel that owns the state being snapshotted.
- File Store: the cache layer captured in a snapshot.
- Mounts: resource-prefix routing that must be re-supplied at load time.