Mirage · Unified Virtual Filesystem for AI Agents

The Paperclip resource exposes 8M+ biomedical papers from bioRxiv, medRxiv, and PubMed Central as a virtual filesystem organized by source, year, and month. For credential setup, see Paperclip Setup.

Config

from mirage import MountMode, Workspace
from mirage.resource.paperclip import PaperclipConfig, PaperclipResource

config = PaperclipConfig()
resource = PaperclipResource(config=config)
ws = Workspace({"/paperclip": resource}, mode=MountMode.READ)

Filesystem Layout

/paperclip/
  biorxiv/
    2025/
      12/
        bio_07cb291a7ce4/
          meta.json
          content.lines
          sections/
            introduction.lines
            methods.lines
            results.lines
            discussion.lines
          figures/
            fig1.tif
            fig2.gif
            fig3.jpg
          supplements/
            table_s1.csv
            methods_s1.docx
            notes.md.lines
  medrxiv/
    2025/
      11/
        med_a3f19bc2e810/
          meta.json
          content.lines
          sections/
          figures/
          supplements/
  pmc/
    2024/
      06/
        pmc_9e12d4a7b5c3/
          meta.json
          content.lines
          sections/
          figures/
          supplements/
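
Because the tree is organized as source, year, month, and paper ID, a paper's directory path can be assembled from those four values. The sketch below is illustrative only: `paper_dir` is a hypothetical helper, not part of Mirage, and it assumes month directories are always zero-padded to two digits, as in the layout above.

```python
SOURCES = ("biorxiv", "medrxiv", "pmc")  # source directories listed on this page


def paper_dir(source: str, year: int, month: int, paper_id: str,
              mount: str = "/paperclip") -> str:
    """Build the directory path for one paper under the mount point.

    Hypothetical helper; assumes two-digit, zero-padded month directories.
    """
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return f"{mount}/{source}/{year:04d}/{month:02d}/{paper_id}"


print(paper_dir("pmc", 2024, 6, "pmc_9e12d4a7b5c3"))
# /paperclip/pmc/2024/06/pmc_9e12d4a7b5c3
```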

Sources

Directory	Description
`biorxiv`	bioRxiv preprints
`medrxiv`	medRxiv preprints
`pmc`	PubMed Central peer-reviewed articles

Year / Month

Each source directory contains year directories, and each year directory contains month directories. Listing a month directory issues a SQL query against the Paperclip API for that source and time range and returns at most `default_limit` (500) papers.
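
Since each month listing is a separate, capped query, covering a longer period means walking the month directories one at a time. The following is a minimal sketch of that enumeration. `month_dirs` is a hypothetical helper, and it assumes the two-digit month naming shown in the layout.

```python
def month_dirs(source, start, end, mount="/paperclip"):
    """Yield month directory paths from start to end, inclusive.

    start and end are (year, month) tuples. Hypothetical helper, not a Mirage API.
    """
    year, month = start
    while (year, month) <= end:
        yield f"{mount}/{source}/{year:04d}/{month:02d}/"
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1


for path in month_dirs("biorxiv", (2025, 11), (2026, 1)):
    print(path)
# /paperclip/biorxiv/2025/11/
# /paperclip/biorxiv/2025/12/
# /paperclip/biorxiv/2026/01/
```

Each yielded path could then be passed to `ls` or `grep`, with the per-listing limit applying to every month separately.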

Paper Directory

Each paper is a directory containing:

Entry	Type	Description
`meta.json`	JSON	Paper metadata (title, authors, DOI, dates)
`content.lines`	Text	Full text with `L<n>:` line prefixes
`sections/`	Dir	Per-section `*.lines` files
`figures/`	Dir	Figure files (`.tif`, `.gif`, `.jpg`)
`supplements/`	Dir	Supplementary files (`.docx`, `.csv`, `.md.lines`)
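
The `L<n>:` prefixes in `*.lines` files make it possible to recover line numbers after reading or grepping a file. The sketch below shows one way to split them off. The exact format beyond `L<n>:` (here, an optional single space after the colon) is an assumption, and the sample text is invented for illustration.

```python
import re

# Assumed format: "L<n>:" followed by an optional single space, then the text.
LINE_RE = re.compile(r"^L(\d+):\s?(.*)$")


def parse_lines(text):
    """Return (line_number, text) pairs; lines without an L<n>: prefix are skipped."""
    parsed = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if match:
            parsed.append((int(match.group(1)), match.group(2)))
    return parsed


sample = "L1: Introduction\nL2: CRISPR screens are widely used.\n"  # invented sample
print(parse_lines(sample))
# [(1, 'Introduction'), (2, 'CRISPR screens are widely used.')]
```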

Command Passthrough

File-level cat, head, tail, and grep commands are passed through to the Paperclip API rather than downloading the full file first.

grep Flags

Flag	Supported	Notes
`-i`	Yes	Case-insensitive
`-c`	Yes	Count matches
`-m`	Yes	Max match count
`-v`	Yes	Invert match
`-E`	No	Fails silently
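
Because `-E` fails silently, it can be worth rejecting unsupported flags on the caller's side before a command is issued. The sketch below is one way to do that. `build_grep` is a hypothetical helper, it conservatively treats any flag not marked supported in the table as an error, and it assumes the workspace shell accepts POSIX-style quoting as produced by `shlex.quote`.

```python
import shlex

SUPPORTED_FLAGS = {"-i", "-c", "-m", "-v"}  # flags marked "Yes" in the table above


def build_grep(pattern, path, flags=()):
    """Build a grep command string, refusing flags that are not known to work.

    flags is a sequence of separate tokens, e.g. ["-i", "-m", "5"].
    Combined short flags such as "-ic" are rejected by this sketch.
    """
    for token in flags:
        if token.startswith("-") and token not in SUPPORTED_FLAGS:
            raise ValueError(f"unsupported grep flag: {token}")
    parts = ["grep", *flags, pattern, path]
    return " ".join(shlex.quote(part) for part in parts)


print(build_grep("CRISPR", "/paperclip/biorxiv/2025/12/", ["-i", "-m", "5"]))
# grep -i -m 5 CRISPR /paperclip/biorxiv/2025/12/
```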

grep at Different Scopes

# PAPER level - passthrough to Paperclip API
grep "CRISPR" "/paperclip/biorxiv/2025/12/bio_07cb291a7ce4/content.lines"

# MONTH level - search pre-filter, then grep within results
grep "CRISPR" "/paperclip/biorxiv/2025/12/"
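
Which behaviour applies depends on how deep the target path sits under the mount. The sketch below classifies a path by depth to make that distinction concrete. It is not how Mirage dispatches internally: `grep_scope` is a hypothetical helper, and treating everything at or below a paper directory as paper level is an assumption.

```python
from pathlib import PurePosixPath


def grep_scope(path, mount="/paperclip"):
    """Classify a path as 'month', 'paper', or 'unspecified' by its depth.

    Depth under the mount: source=1, year=2, month=3, paper directory=4, files=5+.
    Raises ValueError if the path is not under the mount.
    """
    depth = len(PurePosixPath(path).relative_to(mount).parts)
    if depth == 3:
        return "month"   # search pre-filter, then grep within results
    if depth >= 4:
        return "paper"   # assumed: passthrough to the Paperclip API
    return "unspecified"  # this page does not describe source- or year-level grep


print(grep_scope("/paperclip/biorxiv/2025/12/"))
# month
print(grep_scope("/paperclip/biorxiv/2025/12/bio_07cb291a7ce4/content.lines"))
# paper
```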

Cache

The Paperclip resource uses IndexCacheStore, the same index cache used by the RAM, S3, disk, and GitHub resources. Index entries store source listings and paper metadata. There is no separate content cache; file content caching is handled by the workspace IOResult mechanism.

Example

import asyncio

from mirage import MountMode, Workspace
from mirage.resource.paperclip import PaperclipConfig, PaperclipResource

config = PaperclipConfig()
resource = PaperclipResource(config=config)


async def main():
    ws = Workspace({"/paperclip": resource}, mode=MountMode.READ)

    # List sources
    r = await ws.execute("ls /paperclip/")
    print(await r.stdout_str())

    # List papers in a month
    r = await ws.execute("ls /paperclip/biorxiv/2025/12/")
    print(await r.stdout_str())

    # stdout_str() is a coroutine, so it must be awaited before parsing the listing
    paper = (await r.stdout_str()).strip().splitlines()[0].strip()
    base = f"/paperclip/biorxiv/2025/12/{paper}"

    # Read paper metadata
    r = await ws.execute(f'cat "{base}/meta.json"')
    print(await r.stdout_str())

    # Read first 20 lines of content
    r = await ws.execute(f'head -n 20 "{base}/content.lines"')
    print(await r.stdout_str())

    # Search within a paper
    r = await ws.execute(f'grep "CRISPR" "{base}/content.lines"')
    print(await r.stdout_str())

    # Search across a month
    r = await ws.execute('grep "CRISPR" "/paperclip/biorxiv/2025/12/"')
    print(await r.stdout_str())

    # List sections
    r = await ws.execute(f'ls "{base}/sections/"')
    print(await r.stdout_str())

    # List figures
    r = await ws.execute(f'ls "{base}/figures/"')
    print(await r.stdout_str())


if __name__ == "__main__":
    asyncio.run(main())

Shell Commands

Standard commands available on the mounted Paperclip tree:

Command	Notes
`ls`	List sources, years, months, papers, files
`cat`	Read full file (passthrough to API)
`head` / `tail`	Read N lines from start/end (passthrough to API)
`grep` / `rg`	Smart: passthrough at paper level, pre-filter at month level
`wc`	Line/word/byte counts
`stat`	File metadata (name, size, type)
`find`	Recursive search with `-name`, `-maxdepth`
`tree`	Directory tree view

Resource-Specific Commands

`search`

Full-text search across the Paperclip corpus.

search --query "CRISPR cas9" --source biorxiv --limit 50

Flag	Required	Default	Description
`--query`	Yes		Search query string
`--source`	No	all	Restrict to a single source
`--limit`	No	50	Max results
`--since`	No		Filter by date (YYYY-MM-DD)
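
When the query comes from user input, it helps to assemble the command from the flags in the table and quote each argument. The sketch below shows one way to do so. `build_search` is a hypothetical helper, and it assumes the workspace shell accepts POSIX-style quoting as produced by `shlex.quote`.

```python
import shlex
from datetime import date


def build_search(query, source=None, limit=None, since=None):
    """Build a `search` command string from the documented flags.

    since is a datetime.date and is rendered as YYYY-MM-DD.
    """
    parts = ["search", "--query", query]
    if source is not None:
        parts += ["--source", source]
    if limit is not None:
        parts += ["--limit", str(limit)]
    if since is not None:
        parts += ["--since", since.isoformat()]
    return " ".join(shlex.quote(part) for part in parts)


print(build_search("CRISPR cas9", source="biorxiv", limit=50))
# search --query 'CRISPR cas9' --source biorxiv --limit 50
```

The resulting string could then be passed to `ws.execute(...)` in the same way as the shell commands in the example above.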

`lookup`

Retrieve a paper by DOI or Paperclip ID.

lookup --doi "10.1101/2025.12.01.123456"
lookup --id bio_07cb291a7ce4

`scan`

Stream through all papers in a month, applying a filter expression.

scan --source biorxiv --year 2025 --month 12 --filter "authors contains 'Zhang'"

`map`

Run a transformation across matched papers and collect results.

map --query "CRISPR" --source pmc --extract "title,authors,doi"
