The GitHub resource mounts a GitHub repository as a read-only virtual filesystem. For token setup, see GitHub Setup.

Config

import os

from mirage import MountMode, Workspace
from mirage.resource.github import GitHubConfig, GitHubResource

config = GitHubConfig(token=os.environ["GITHUB_TOKEN"])
resource = GitHubResource(
    config=config, owner="my-org", repo="my-repo", ref="main")
ws = Workspace({"/github": resource}, mode=MountMode.READ)

Filesystem Layout

/github/
  README.md
  pyproject.toml
  src/
    __init__.py
    main.py
    utils.py
    models/
      user.py
      item.py
  tests/
    test_main.py
The filesystem mirrors the repository tree. The path does not include the owner, repository, or branch; those are specified at mount time.
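To make the mirroring concrete, here is a small hypothetical sketch (not Mirage's implementation) that maps repository-relative paths to mounted paths and lists a directory's immediate children the way ls would. The mount prefix /github and the helper names to_mount_path and list_dir are illustrative assumptions.

```python
from posixpath import join

MOUNT = "/github"  # assumed mount prefix, matching the Workspace example above


def to_mount_path(repo_path: str, mount: str = MOUNT) -> str:
    """Map a repository-relative path such as 'src/main.py' to its mounted path."""
    return join(mount, repo_path)


def list_dir(repo_paths: list[str], directory: str = "") -> list[str]:
    """Return the immediate children of a repo-relative directory ('' is the root).

    Subdirectories get a trailing '/', similar to how the layout above is drawn.
    """
    prefix = directory.rstrip("/") + "/" if directory else ""
    children = set()
    for path in repo_paths:
        if path.startswith(prefix):
            rest = path[len(prefix):]
            name, sep, _ = rest.partition("/")
            children.add(name + "/" if sep else name)
    return sorted(children)


paths = ["README.md", "pyproject.toml", "src/__init__.py", "src/main.py",
         "src/models/user.py", "tests/test_main.py"]
print(to_mount_path("src/main.py"))
print(list_dir(paths, "src"))
```

No owner, repository, or ref appears in the computed paths; they live only in the mount configuration.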

Tree Fetching

The resource fetches the full recursive tree at initialization. For repositories with more than 100K entries, it falls back to per-directory fetching.
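A minimal sketch of that strategy, assuming a hypothetical fetch_tree callable that wraps GitHub's "Get a tree" REST endpoint and returns its JSON. GitHub sets truncated to true when a recursive listing exceeds its limits; the fallback then walks each subtree by SHA. This illustrates the technique only and is not Mirage's code.

```python
from typing import Callable

# Hypothetical: fetch_tree(sha, recursive) returns a dict shaped like GitHub's
# "Get a tree" response: {"tree": [{"path", "type", "sha", ...}], "truncated": bool}
FetchTree = Callable[[str, bool], dict]


def list_all_paths(fetch_tree: FetchTree, root_sha: str) -> list[str]:
    """Try one recursive fetch; if the listing is truncated, walk per directory."""
    resp = fetch_tree(root_sha, True)
    if not resp.get("truncated", False):
        # Recursive listings already carry full repository-relative paths.
        return sorted(entry["path"] for entry in resp["tree"])

    paths: list[str] = []
    stack = [("", root_sha)]  # (directory prefix, tree SHA) still to visit
    while stack:
        prefix, sha = stack.pop()
        # Non-recursive listings return names relative to this one directory.
        for entry in fetch_tree(sha, False)["tree"]:
            full = prefix + entry["path"]
            paths.append(full)
            if entry["type"] == "tree":
                stack.append((full + "/", entry["sha"]))
    return sorted(paths)
```

The per-directory walk costs one request per directory, which is why the single recursive request is tried first.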

Cache

The GitHub resource uses IndexCacheStore with SHA-based fingerprinting. Content is content-addressed: if the SHA matches, the content is identical.
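The reason SHA keys make caching simple can be shown with a tiny content-addressed cache. This is a hypothetical sketch, not the IndexCacheStore API; fetch_blob stands in for whatever call downloads a blob by SHA.

```python
from typing import Callable


class ShaContentCache:
    """Hypothetical content-addressed cache: blob bytes keyed by git blob SHA.

    A SHA identifies exact content, so an entry never goes stale: a changed
    file gets a new SHA and therefore a new key.
    """

    def __init__(self, fetch_blob: Callable[[str], bytes]):
        self._fetch_blob = fetch_blob  # assumed downloader, e.g. a GitHub blob request
        self._store: dict[str, bytes] = {}
        self.misses = 0

    def read(self, sha: str) -> bytes:
        # Download only on a miss; any later read of the same SHA is served locally.
        if sha not in self._store:
            self.misses += 1
            self._store[sha] = self._fetch_blob(sha)
        return self._store[sha]
```

A side effect of content addressing: files with identical contents (for example, several empty __init__.py files) share one blob SHA, so they cost a single fetch.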

Example

import asyncio
import os

from dotenv import load_dotenv

from mirage import MountMode, Workspace
from mirage.resource.github import GitHubConfig, GitHubResource

load_dotenv(".env.development")


async def main():
    config = GitHubConfig(token=os.environ["GITHUB_TOKEN"])
    resource = GitHubResource(
        config=config, owner="my-org", repo="my-repo", ref="main")
    ws = Workspace({"/github": resource}, mode=MountMode.READ)

    # List repository root
    r = await ws.execute("ls /github/")
    print(await r.stdout_str())

    # Read a file
    r = await ws.execute("cat /github/README.md")
    print(await r.stdout_str())

    # Search for a pattern
    r = await ws.execute('rg "def main" /github/')
    print(await r.stdout_str())

    # Tree view
    r = await ws.execute("tree -L 2 /github/")
    print(await r.stdout_str())

    # File metadata with SHA
    r = await ws.execute("stat /github/README.md")
    print(await r.stdout_str())


if __name__ == "__main__":
    asyncio.run(main())
See examples/code/github.py for the full working example.

Finding SHAs

Git blob SHAs are available via the stat command:
stat /github/README.md
# -> extra={"sha": "a1b2c3d4e5f6..."}

stat /github/src/main.py
# -> extra={"sha": "f6e5d4c3b2a1..."}
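These are standard git blob SHAs, so you can check a local copy of a file against the value stat reports. The sketch below computes the blob SHA-1 the way git hash-object does, assuming a regular SHA-1 repository (not the newer SHA-256 object format). Line-ending conversion in a local checkout can make the bytes, and therefore the SHA, differ.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """Compute a git blob SHA-1: hash the header 'blob <size>' plus a NUL byte, then the bytes."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_sha(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (git's empty blob)
print(git_blob_sha(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```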

Working with Large Repos

Tips for efficient access on large repositories:
# Find files by name
find /github/ -name "*.py"

# Search with rg (uses the GitHub code search API when applicable)
rg "TODO" /github/

# Read only the first lines of a file
head -n 20 /github/src/main.py

# Check file sizes
du /github/src/

# Show the directory tree, limited to three levels deep
tree -L 3 /github/src/
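Name lookups and size totals can, in principle, be answered from tree metadata alone, because GitHub's recursive tree listing carries each entry's path, type, and (for blobs) size. The sketch below shows that idea over such a listing. Whether Mirage answers find and du exactly this way is an assumption, and find_by_name and total_size are hypothetical helpers.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Entries shaped like GitHub's recursive tree listing; blob entries carry "size".
Entry = dict


def _prefix(under: str) -> str:
    return under.rstrip("/") + "/" if under else ""


def find_by_name(entries: list[Entry], pattern: str, under: str = "") -> list[str]:
    """Like find DIR -name PATTERN: match basenames case-sensitively, no downloads."""
    prefix = _prefix(under)
    return sorted(
        e["path"] for e in entries
        if e["path"].startswith(prefix) and fnmatchcase(basename(e["path"]), pattern)
    )


def total_size(entries: list[Entry], under: str = "") -> int:
    """A summarized disk-usage figure in bytes: the sum of blob sizes under a directory."""
    prefix = _prefix(under)
    return sum(
        e.get("size", 0) for e in entries
        if e["type"] == "blob" and e["path"].startswith(prefix)
    )
```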

Shell Commands

Standard commands available on the mounted GitHub tree:
| Command | Notes |
|---|---|
| ls | List files and directories |
| cat | Read file contents |
| head / tail | First/last N lines |
| grep / rg | Pattern search; rg can use the GitHub code search API |
| jq | Query JSON files |
| wc | Line/word/byte counts |
| stat | File metadata, including the git SHA |
| find | Recursive search with -name, -maxdepth |
| tree | Directory tree view |
| diff | Compare files |
| du | Disk usage / file sizes |
| awk | Text processing |
| sed | Stream editing |
| sort | Sort lines |
| uniq | Deduplicate lines |
| cut | Extract columns |
| tr | Translate characters |
| nl | Number lines |
| md5 | MD5 checksum |
| sha256sum | SHA-256 checksum |
| file | Detect file type |
| basename | Strip directory from path |
| dirname | Strip filename from path |
| realpath | Resolve path |

Search Optimization

rg uses the GitHub code search API when the search scope exceeds 100 files and the mounted ref is the repository’s default branch. This avoids downloading file contents and returns results significantly faster for large repositories.
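The two conditions can be expressed as a small predicate. This sketch is hypothetical: the threshold mirrors the 100-file figure stated above, the helper names are invented, and it only builds a request URL for GitHub's REST code search endpoint (GET /search/code with a repo: qualifier) without sending it. The default-branch condition presumably reflects that GitHub indexes only the default branch for code search.

```python
from urllib.parse import urlencode

CODE_SEARCH_MIN_FILES = 100  # from the rule above: scope must exceed 100 files


def should_use_code_search(files_in_scope: int, ref: str, default_branch: str) -> bool:
    """Both conditions must hold: a large scope and a mount on the default branch."""
    return files_in_scope > CODE_SEARCH_MIN_FILES and ref == default_branch


def code_search_url(pattern: str, owner: str, repo: str) -> str:
    """Build a GitHub REST code search URL scoped to a single repository."""
    query = f"{pattern} repo:{owner}/{repo}"
    return "https://api.github.com/search/code?" + urlencode({"q": query})


print(should_use_code_search(250, "main", "main"))
print(code_search_url("def main", "my-org", "my-repo"))
```

Actually sending the request would also require an Authorization header carrying the token passed to GitHubConfig; that step is omitted here.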