/mongodb/.
For connection setup, see MongoDB Setup.
Config
| Config field | Required | Default | Description |
|---|---|---|---|
| uri | yes | | MongoDB connection URI |
| databases | no | | List of database names to mount (omit for all) |
| default_doc_limit | no | 1000 | Default doc cap for one-shot reads (not used by streaming cat) |
| default_search_limit | no | 100 | Default result cap for collection/db-level grep |
| max_doc_limit | no | 5000 | Hard cap for head -n K / tail -n K |
| elide_fields | no | {} | {"<db>.<coll>": ["field", "nested.path"]}; listed fields are dropped from documents.jsonl |
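
As a rough illustration of how the defaults in the table combine with user-supplied values, the sketch below merges a partial config over them. `resolve_config` and the `articles` collection are hypothetical names used only for this example; the real loader may validate differently.

```python
from typing import Any

# Defaults copied from the Config table; `uri` has no default and is required.
DEFAULTS: dict[str, Any] = {
    "databases": None,            # None stands for "mount every database"
    "default_doc_limit": 1000,
    "default_search_limit": 100,
    "max_doc_limit": 5000,
    "elide_fields": {},
}


def resolve_config(user: dict[str, Any]) -> dict[str, Any]:
    """Merge user fields over the documented defaults (hypothetical helper)."""
    if "uri" not in user:
        raise ValueError("uri is required")
    unknown = set(user) - set(DEFAULTS) - {"uri"}
    if unknown:
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    return {**DEFAULTS, **user}


config = resolve_config({
    "uri": "mongodb://localhost:27017",
    "databases": ["mirage_test"],
    # "<db>.<coll>" -> fields to drop from documents.jsonl
    "elide_fields": {"mirage_test.articles": ["embedding", "raw.html"]},
})
```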
Filesystem Layout
The mount mirrors MongoDB’s cluster → database → collection / view
model. The database directory always appears in the path, even when
databases filters to a single entry.
documents.jsonl
One JSON object per line, encoded with BSON Relaxed Extended JSON. BSON-specific types round-trip through their canonical $-prefixed wrappers (for example, {"$oid": "..."} for _id).
cat / head / tail / grep / jq all stream from this file; nothing
materializes the full collection in memory.
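
A consumer reading documents.jsonl can turn some of those wrappers back into native values. The following is a minimal sketch using only the standard library; the sample line and the `decode_wrappers` helper are invented for illustration, and only two wrappers (`$date` and `$numberDecimal`) are handled.

```python
import json
from datetime import datetime
from decimal import Decimal


def decode_wrappers(obj: dict):
    """json object_hook: unwrap a few single-key Extended JSON wrappers."""
    if len(obj) == 1:
        if "$date" in obj and isinstance(obj["$date"], str):
            # Relaxed mode writes in-range dates as ISO-8601 strings ending in "Z".
            return datetime.fromisoformat(obj["$date"].replace("Z", "+00:00"))
        if "$numberDecimal" in obj:
            return Decimal(obj["$numberDecimal"])
    return obj  # anything else (including {"$oid": ...}) is left untouched


# Invented sample line in the shape described above.
line = (
    '{"_id": {"$oid": "65a1b2c3d4e5f60718293a4b"}, '
    '"created": {"$date": "2024-01-15T09:30:00Z"}, '
    '"price": {"$numberDecimal": "19.99"}, "qty": 3}'
)
doc = json.loads(line, object_hook=decode_wrappers)
```

Plain numbers and strings need no handling because Relaxed mode already writes them as ordinary JSON values.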
schema.json
Generated from a 100-document $sample. Includes:
- field path → observed BSON type frequencies (nested paths unioned across the sample)
- indexes from listIndexes plus access counts from $indexStats
- the $jsonSchema validator if one is registered
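
To make the "field path → type frequencies" idea concrete, here is a small sketch that walks a sample of documents and counts the types seen at each dotted path. It is not the actual schema.json format: the output shape is an assumption, and Python type names stand in for BSON type names.

```python
from collections import Counter, defaultdict


def type_frequencies(docs):
    """Map each dotted field path to a count of the type names observed there."""
    freq = defaultdict(Counter)

    def walk(value, path):
        if isinstance(value, dict):
            if path:
                freq[path]["object"] += 1
            for key, child in value.items():
                # Nested paths are unioned across every document in the sample.
                walk(child, f"{path}.{key}" if path else key)
        else:
            freq[path][type(value).__name__] += 1

    for doc in docs:
        walk(doc, "")
    return {path: dict(counts) for path, counts in freq.items()}


sample = [
    {"name": "a", "tags": ["x"], "meta": {"score": 1}},
    {"name": "b", "meta": {"score": 2.5}},
    {"name": None, "meta": {"score": 3}},
]
frequencies = type_frequencies(sample)
```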
database.json
Lists every collection and view under the database with their document counts. Useful for cat /mongodb/<db>/database.json to get an overview
without recursing into each entity.
Elided fields
Fields listed under elide_fields are dropped entirely from
documents.jsonl output. The type stays documented in schema.json, so
heavy fields (embeddings, large binary, raw text blobs) can be hidden
from agent reads without losing the schema signal. Elision applies to
both cat (one-shot streaming reads) and tail -f (live change-stream follows).
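
Dropping a dotted path from a document can be sketched as below. The `elide` helper is hypothetical and handles nested subdocuments only; how the real resource treats arrays of subdocuments is not specified here.

```python
import copy


def elide(doc: dict, paths: list[str]) -> dict:
    """Return a copy of doc with each dotted path removed, if present."""
    out = copy.deepcopy(doc)
    for path in paths:
        *parents, leaf = path.split(".")
        node = out
        for key in parents:
            # Stop descending as soon as the path leaves dict territory.
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(leaf, None)
    return out


original = {"title": "t", "embedding": [0.1, 0.2], "raw": {"html": "<p>", "len": 3}}
trimmed = elide(original, ["embedding", "raw.html", "missing.path"])
```

Paths that do not exist in a given document are ignored, so one elide list can be applied to heterogeneous collections.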
Streaming and Limits
cat, grep, head, and tail -f consume documents lazily through a
batched PyMongo async cursor (or change stream); the consumer cancels to stop
fetching. There is no truncation notice because nothing is forced into
memory ahead of the consumer.
| Command | Behavior |
|---|---|
| cat | Streams the whole collection sorted by _id; pipe to head to cap |
| head -n K / tail -n K | K is capped at max_doc_limit; server-side sort + limit |
| tail -f | Opens a change stream; yields each new insert as a JSONL line |
| grep (file level) | Streams from documents.jsonl; supports -m for short-circuit |
| grep (collection/db level) | Server-side query, capped at default_search_limit |
| jq | Inherits the streaming cat source |
| wc | Uses countDocuments() server-side; zero download |
| stat | Metadata only (doc count, indexes; views skip the index lookup) |
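
The two behaviors in the table, capping K and fetching lazily, can be sketched with plain generators. `fake_cursor` below is a stand-in for the batched cursor and is not part of the library; it records which batches were requested so the laziness is visible.

```python
from itertools import islice
from typing import Iterable, Iterator

MAX_DOC_LIMIT = 5000  # default from the Config table


def effective_limit(k: int, max_doc_limit: int = MAX_DOC_LIMIT) -> int:
    """Cap a requested head/tail count at max_doc_limit."""
    return max(0, min(k, max_doc_limit))


def stream_lines(batches: Iterable[list[str]]) -> Iterator[str]:
    """Yield lines one at a time; a batch is fetched only when it is needed."""
    for batch in batches:
        yield from batch


fetched = []  # start offsets of the batches that were actually requested


def fake_cursor(total: int = 10_000, batch_size: int = 100):
    for start in range(0, total, batch_size):
        fetched.append(start)
        yield [f'{{"n": {i}}}' for i in range(start, start + batch_size)]


# Equivalent in spirit to `cat ... | head -n 150`: the consumer stops early,
# so only the first two batches are ever produced.
first = list(islice(stream_lines(fake_cursor()), 150))
```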
Smart Commands
grep at different scopes
grep uses MongoDB’s query engine at directory scopes instead of
streaming all documents through the regex pipeline:
- Text index exists → uses $text (ranked by relevance)
- Atlas Search index exists → uses $search (fuzzy, Lucene-based)
- Neither → falls back to $regex on sampled string fields
mirage/core/mongodb/scope.py.
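
The selection logic above might look roughly like the sketch below, which only builds the query documents and does not talk to a server. `plan_grep` and its return shape are hypothetical, and checking the text index before the Atlas Search index is an assumption based on the order of the list; the query operators themselves are standard MongoDB syntax.

```python
def plan_grep(pattern, has_text_index, has_search_index, string_fields, limit=100):
    """Pick a server-side strategy for a directory-scope grep (illustrative only)."""
    if has_text_index:
        return {
            "kind": "find",
            "filter": {"$text": {"$search": pattern}},
            "sort": {"score": {"$meta": "textScore"}},  # rank by relevance
            "limit": limit,
        }
    if has_search_index:
        return {
            "kind": "aggregate",
            "pipeline": [
                # $search must be the first stage of the pipeline.
                {"$search": {"text": {"query": pattern,
                                      "path": {"wildcard": "*"},
                                      "fuzzy": {}}}},
                {"$limit": limit},
            ],
        }
    if not string_fields:
        raise ValueError("regex fallback needs at least one string field")
    return {
        "kind": "find",
        "filter": {"$or": [{field: {"$regex": pattern}} for field in string_fields]},
        "limit": limit,
    }


plan = plan_grep("timeout", False, False, ["message", "title"])
```

The default `limit` of 100 mirrors default_search_limit from the Config table.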
head / tail / tail -f
head and tail use server-side sort + limit; the requested count
is capped at max_doc_limit. tail -f opens a Mongo change stream
filtered to insert events and yields each new document as a JSONL line
in the same format as cat:
tail -f requires the cluster to be a replica set; Atlas already
satisfies this. Views fall through to the non-streaming path because
change streams aren’t defined on views.
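
The insert-only follow can be pictured as a filter over change events. The sketch below runs on hand-written event dicts so it needs no server; with PyMongo, the same `INSERT_ONLY` pipeline would be passed to `collection.watch(...)`. The `follow` helper is hypothetical, and real documents would need an Extended JSON encoder rather than plain `json.dumps`.

```python
import json

# Change-stream pipeline that keeps only insert events.
INSERT_ONLY = [{"$match": {"operationType": "insert"}}]


def follow(events):
    """Yield one JSONL line per inserted document, skipping other event types."""
    for event in events:
        if event.get("operationType") != "insert":
            continue
        # Insert events carry the new document under "fullDocument".
        yield json.dumps(event["fullDocument"], separators=(",", ":"))


# Hand-written events standing in for a live change stream.
events = [
    {"operationType": "insert", "fullDocument": {"_id": 1, "msg": "hello"}},
    {"operationType": "update", "documentKey": {"_id": 1}},
    {"operationType": "insert", "fullDocument": {"_id": 2, "msg": "world"}},
]
lines = list(follow(events))
```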
Cache
The MongoDB resource uses IndexCacheStore (same as RAM/S3/disk/GitHub)
for listings: database names, collection names, and document counts.
Document content is not cached. The resource leaves caches_reads
at its default of False, so cat, grep, head, and tail always
query the live collection
instead of serving a stored snapshot. This keeps reads consistent with a
mutable database and ensures tail -f follows the live change stream
rather than replaying cached bytes.
Example
Runnable examples
Three working examples live under examples/python/mongodb/:
- mongodb.py: agent-shell workflow covering ls, tree, cat, head, tail, wc, stat, grep / rg at every scope, jq, find, and cd + relative paths.
- mongodb_vfs.py: in-process VFS; os.listdir and open() walk every readdir level (root, database, collections/, views/, entity) and read database.json, schema.json, and documents.jsonl (collection + view).
- mongodb_fuse.py: same coverage as the VFS example, but the tree is mounted as a real filesystem so other processes can cat / ls / head the mountpoint directly.
mirage_test database seeded by python/scripts/seed_mongodb_test.py.
Finding IDs
_id is serialized as {"$oid": "..."} under Extended JSON:
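
Pulling the ids out of documents.jsonl lines is then a matter of unwrapping `$oid`. A minimal sketch, with invented sample lines and a hypothetical `object_ids` helper:

```python
import json


def object_ids(lines):
    """Yield the _id of each JSONL line, unwrapping {"$oid": ...} to its hex string."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _id = json.loads(line).get("_id")
        if isinstance(_id, dict) and "$oid" in _id:
            yield _id["$oid"]
        else:
            yield _id  # non-ObjectId keys (strings, ints) pass through unchanged


# Invented sample lines; the second uses a plain string key.
jsonl = [
    '{"_id": {"$oid": "65a1b2c3d4e5f60718293a4b"}, "name": "first"}',
    '{"_id": "custom-key", "name": "second"}',
    "",
]
ids = list(object_ids(jsonl))
```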
Working with Large Collections
Streaming is the default; reach for patterns such as elide_fields when you
want to keep the round trip small. schema.json still documents the original
type of an elided field, so the agent can decide when to ask for the raw
bytes through a different path.
Shell Commands
Standard commands available on the mounted MongoDB tree:

| Command | Notes |
|---|---|
| ls | List databases, collections, views, and per-entity files |
| cat | Stream documents.jsonl (or read schema.json / database.json) |
| head / tail | Smart: server-side sort + limit (capped at max_doc_limit) |
| tail -f | Live-follow new inserts via Mongo change stream |
| grep / rg | Smart: MongoDB query engine at collection/db scope, regex at file scope |
| jq | Query JSON; use .[] prefix when iterating JSONL files |
| wc | Smart: uses countDocuments() server-side |
| stat | Metadata (doc count, indexes; views skip the index lookup) |
| find | List databases/collections/views with -name, -maxdepth |
| tree | Directory tree view |