The Paperclip resource exposes 8M+ biomedical papers from bioRxiv, medRxiv,
and PubMed Central as a virtual filesystem organized by source, year, and
month.
For credential setup, see Paperclip Setup.
Config
from mirage import MountMode, Workspace
from mirage.resource.paperclip import PaperclipConfig, PaperclipResource
config = PaperclipConfig()
resource = PaperclipResource(config=config)
ws = Workspace({"/paperclip": resource}, mode=MountMode.READ)
Filesystem Layout
/paperclip/
  biorxiv/
    2025/
      12/
        bio_07cb291a7ce4/
          meta.json
          content.lines
          sections/
            introduction.lines
            methods.lines
            results.lines
            discussion.lines
          figures/
            fig1.tif
            fig2.gif
            fig3.jpg
          supplements/
            table_s1.csv
            methods_s1.docx
            notes.md.lines
  medrxiv/
    2025/
      11/
        med_a3f19bc2e810/
          meta.json
          content.lines
          sections/
          figures/
          supplements/
  pmc/
    2024/
      06/
        pmc_9e12d4a7b5c3/
          meta.json
          content.lines
          sections/
          figures/
          supplements/
Sources
| Directory | Description |
|---|---|
| biorxiv | bioRxiv preprints |
| medrxiv | medRxiv preprints |
| pmc | PubMed Central peer-reviewed papers |
Year / Month
Each source directory contains year directories, each containing month
directories. Listing a month directory issues a SQL query against the
Paperclip API for that source and time range, returning up to
default_limit (500) papers.
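The source/year/month layout can be turned into a path with a small helper. This is a minimal sketch, not part of the mirage API: `month_path` and the `MOUNT` and `SOURCES` constants are hypothetical names, and the zero-padded two-digit month is inferred from the layout shown above (for example `pmc/2024/06/`).

```python
from datetime import date

MOUNT = "/paperclip"  # mount prefix used in the Config example (assumption)
SOURCES = {"biorxiv", "medrxiv", "pmc"}  # directory names from the Sources table


def month_path(source: str, year: int, month: int) -> str:
    """Return the directory path for one source and calendar month."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    date(year, month, 1)  # raises ValueError for an invalid year or month
    return f"{MOUNT}/{source}/{year:04d}/{month:02d}/"


print(month_path("pmc", 2024, 6))  # /paperclip/pmc/2024/06/
```

The resulting string can be passed to `ls` or `grep` like any other path on the mounted tree.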
Paper Directory
Each paper is a directory containing:
| Entry | Type | Description |
|---|---|---|
| meta.json | JSON | Paper metadata (title, authors, DOI, dates) |
| content.lines | Text | Full text with L<n>: line prefixes |
| sections/ | Dir | Per-section *.lines files |
| figures/ | Dir | Figure files (.tif, .gif, .jpg) |
| supplements/ | Dir | Supplementary files (.docx, .csv, .md.lines) |
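When post-processing a `.lines` file, it can help to split each line into its number and text. The sketch below assumes each line looks like `L<n>:` followed by an optional single space and the text; the exact separator is an assumption, `parse_lines` is a hypothetical helper, and the sample string is made up for illustration.

```python
import re

# Assumed prefix shape: "L", a line number, a colon, an optional space.
LINE_RE = re.compile(r"^L(\d+): ?(.*)$")


def parse_lines(text: str) -> list[tuple[int, str]]:
    """Split .lines content into (line_number, text) pairs.

    Lines that do not carry the assumed prefix are skipped.
    """
    parsed = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if match:
            parsed.append((int(match.group(1)), match.group(2)))
    return parsed


sample = "L1: Introduction\nL2: First sentence of the paper.\nnot prefixed\n"
print(parse_lines(sample))
# [(1, 'Introduction'), (2, 'First sentence of the paper.')]
```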
Command Passthrough
File-level cat, head, tail, and grep commands are passed through
to the Paperclip API rather than downloading the full file first.
grep Flags
| Flag | Supported | Notes |
|---|---|---|
| -i | Yes | Case-insensitive |
| -c | Yes | Count matches |
| -m | Yes | Max match count |
| -v | Yes | Invert match |
| -E | No | Fails silently |
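Because `-E` is listed as failing silently, callers that assemble grep commands programmatically may want to reject unsupported flags before the command is sent. The following is a minimal sketch under that assumption; `build_grep` and `quote` are hypothetical helpers, not part of mirage, and the double-quote style simply mirrors the examples on this page.

```python
from typing import Optional, Sequence

# Boolean flags listed as supported in the table above; -m takes a value.
BOOLEAN_FLAGS = {"-i", "-c", "-v"}


def quote(arg: str) -> str:
    """Wrap an argument in double quotes, as the examples on this page do."""
    if '"' in arg or "\\" in arg:
        raise ValueError(f"cannot safely quote {arg!r}")
    return f'"{arg}"'


def build_grep(pattern: str, path: str, flags: Sequence[str] = (),
               max_count: Optional[int] = None) -> str:
    """Build a grep command line, rejecting flags outside the supported set."""
    unsupported = [f for f in flags if f not in BOOLEAN_FLAGS]
    if unsupported:
        # -E lands here: fail loudly instead of getting a silent miss.
        raise ValueError(f"unsupported grep flag(s): {', '.join(unsupported)}")
    parts = ["grep", *flags]
    if max_count is not None:
        parts += ["-m", str(max_count)]
    parts += [quote(pattern), quote(path)]
    return " ".join(parts)


print(build_grep("CRISPR", "/paperclip/biorxiv/2025/12/", flags=["-i"], max_count=5))
# grep -i -m 5 "CRISPR" "/paperclip/biorxiv/2025/12/"
```

The returned string could then be handed to `ws.execute(...)` in the same way as the hand-written commands in the Example section.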
grep at Different Scopes
# PAPER level - passthrough to Paperclip API
grep "CRISPR" "/paperclip/biorxiv/2025/12/bio_07cb291a7ce4/content.lines"
# MONTH level - search pre-filter, then grep within results
grep "CRISPR" "/paperclip/biorxiv/2025/12/"
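The two scopes differ only in path depth below the mount point. As an illustration, a caller could classify a path before running grep; this sketch assumes the `/paperclip` mount prefix from the Config example and the source/year/month/paper layout described earlier. `grep_scope` is a hypothetical helper, and the `"other"` label covers depths this page does not describe.

```python
from pathlib import PurePosixPath


def grep_scope(path: str, mount: str = "/paperclip") -> str:
    """Classify a path as 'month', 'paper', or 'other' by its depth."""
    # relative_to raises ValueError if the path is outside the mount.
    parts = PurePosixPath(path).relative_to(mount).parts
    if len(parts) == 3:
        return "month"  # <source>/<year>/<month>
    if len(parts) >= 5:
        return "paper"  # a file inside a paper directory
    return "other"  # depths whose grep behavior is not documented here


print(grep_scope("/paperclip/biorxiv/2025/12/"))
# month
print(grep_scope("/paperclip/biorxiv/2025/12/bio_07cb291a7ce4/content.lines"))
# paper
```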
Cache
The Paperclip resource uses IndexCacheStore (same as RAM/S3/disk/GitHub).
Index entries store source listings and paper metadata. There is no
separate content cache - file content caching is handled by the workspace
IOResult mechanism.
Example
import asyncio

from mirage import MountMode, Workspace
from mirage.resource.paperclip import PaperclipConfig, PaperclipResource

config = PaperclipConfig()
resource = PaperclipResource(config=config)


async def main():
    ws = Workspace({"/paperclip": resource}, mode=MountMode.READ)

    # List sources
    r = await ws.execute("ls /paperclip/")
    print(await r.stdout_str())

    # List papers in a month
    r = await ws.execute("ls /paperclip/biorxiv/2025/12/")
    listing = await r.stdout_str()
    print(listing)

    # Use the first listed paper for the remaining commands
    paper = listing.strip().splitlines()[0].strip()
    base = f"/paperclip/biorxiv/2025/12/{paper}"

    # Read paper metadata
    r = await ws.execute(f'cat "{base}/meta.json"')
    print(await r.stdout_str())

    # Read first 20 lines of content
    r = await ws.execute(f'head -n 20 "{base}/content.lines"')
    print(await r.stdout_str())

    # Search within a paper
    r = await ws.execute(f'grep "CRISPR" "{base}/content.lines"')
    print(await r.stdout_str())

    # Search across a month
    r = await ws.execute('grep "CRISPR" "/paperclip/biorxiv/2025/12/"')
    print(await r.stdout_str())

    # List sections
    r = await ws.execute(f'ls "{base}/sections/"')
    print(await r.stdout_str())

    # List figures
    r = await ws.execute(f'ls "{base}/figures/"')
    print(await r.stdout_str())


if __name__ == "__main__":
    asyncio.run(main())
Shell Commands
Standard commands available on the mounted Paperclip tree:
| Command | Notes |
|---|---|
| ls | List sources, years, months, papers, files |
| cat | Read full file (passthrough to API) |
| head / tail | Read N lines from start/end (passthrough to API) |
| grep / rg | Smart: passthrough at paper level, pre-filter at month level |
| wc | Line/word/byte counts |
| stat | File metadata (name, size, type) |
| find | Recursive search with -name, -maxdepth |
| tree | Directory tree view |
Resource-Specific Commands
search
Full-text search across the Paperclip corpus.
search --query "CRISPR cas9" --source biorxiv --limit 50
| Flag | Required | Default | Description |
|---|---|---|---|
| --query | Yes | | Search query string |
| --source | No | all | Restrict to a single source |
| --limit | No | 50 | Max results |
| --since | No | | Filter by date (YYYY-MM-DD) |
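The flag table maps naturally onto a small command builder. This is a sketch only: `build_search` is a hypothetical helper, and it assumes that `--source` accepts the directory names from the Sources table and that double-quoted arguments are parsed as in the example above.

```python
from datetime import date
from typing import Optional

SOURCES = {"biorxiv", "medrxiv", "pmc"}  # assumed valid --source values


def build_search(query: str, source: Optional[str] = None,
                 limit: Optional[int] = None,
                 since: Optional[date] = None) -> str:
    """Build a search command line from the flags in the table above."""
    if not query:
        raise ValueError("--query is required")
    if '"' in query or "\\" in query:
        raise ValueError("query must not contain double quotes or backslashes")
    parts = ["search", "--query", f'"{query}"']
    if source is not None:
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source!r}")
        parts += ["--source", source]
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be positive")
        parts += ["--limit", str(limit)]
    if since is not None:
        parts += ["--since", since.isoformat()]  # YYYY-MM-DD
    return " ".join(parts)


print(build_search("CRISPR cas9", source="biorxiv", limit=50))
# search --query "CRISPR cas9" --source biorxiv --limit 50
```

Omitted optional flags are left out of the command so that the documented defaults apply.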
lookup
Retrieve a paper by DOI or Paperclip ID.
lookup --doi "10.1101/2025.12.01.123456"
lookup --id bio_07cb291a7ce4
scan
Stream through all papers in a month, applying a filter expression.
scan --source biorxiv --year 2025 --month 12 --filter "authors contains 'Zhang'"
map
Run a transformation across matched papers and collect results.
map --query "CRISPR" --source pmc --extract "title,authors,doi"