The S3 resource mounts an Amazon S3 bucket (or a bucket on any S3-compatible service) at a prefix such as /s3/. Every operation involves network I/O to the remote object store; access is asynchronous and goes through aioboto3. Compatible services include AWS S3, MinIO, Cloudflare R2, Supabase Storage, DigitalOcean Spaces, and any other S3-compatible service.

Config

from mirage import MountMode, Workspace
from mirage.resource.s3 import S3Resource, S3Config

config = S3Config(
    bucket="my-bucket",
    region="us-east-1",
    # Optional:
    # aws_access_key_id="...",
    # aws_secret_access_key="...",
    # aws_profile="my-profile",
    # endpoint_url="http://localhost:9000",  # MinIO
    # path_style=True,  # For MinIO/R2
    # timeout=30,
    # proxy="http://proxy:8080",
)

resource = S3Resource(config)
ws = Workspace({"/s3": resource}, mode=MountMode.READ)

S3Resource(config) takes an S3Config object holding the bucket name plus optional credentials, endpoint, and connection settings. Both READ and WRITE mount modes are supported.

Filesystem Layout

The S3 resource maps S3 object keys to virtual paths under the mount prefix. S3 "directories" are only key prefixes; there are no real directory objects. For example, if bucket my-bucket contains:
data/file.txt
data/config.json
reports/q1.csv
reports/q2.csv
Then mounting at /s3/ exposes:
/s3/
  data/
    file.txt
    config.json
  reports/
    q1.csv
    q2.csv
Path mapping: virtual /s3/data/file.txt maps to S3 key data/file.txt.
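
The mapping described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Mirage's implementation: the helper names virtual_to_key and list_dir are hypothetical, and the key list stands in for what an S3 listing call would return.

```python
def virtual_to_key(path: str, mount: str = "/s3/") -> str:
    """Translate a virtual path under the mount prefix into an S3 object key."""
    prefix = mount.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not under {prefix!r}")
    return path[len(prefix):]


def list_dir(keys: list[str], prefix: str = "") -> list[str]:
    """List the immediate children of a key prefix.

    Entries ending in "/" are virtual directories inferred from deeper keys;
    no directory object exists in the bucket itself.
    """
    children = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue
        head, sep, _ = rest.partition("/")
        children.add(head + sep)  # sep is "/" for directories, "" for files
    return sorted(children)


# Stand-in for the bucket contents from the example above.
keys = ["data/file.txt", "data/config.json", "reports/q1.csv", "reports/q2.csv"]

print(virtual_to_key("/s3/data/file.txt"))  # data/file.txt
print(list_dir(keys))                       # ['data/', 'reports/']
print(list_dir(keys, "data/"))              # ['config.json', 'file.txt']
```

Because directories are inferred from keys, an "empty directory" has no representation unless some key carries its prefix.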

Cache

The S3 resource uses IndexCacheStore with _index_ttl = 600 (10 minutes). Directory listings are cached for up to 600 seconds before being refreshed from S3. This reduces API calls for repeated directory traversals.
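
To make the effect of the TTL concrete, here is a minimal sketch of a time-based listing cache. It is not IndexCacheStore itself: the ListingCache class, its fetch callback, and the injectable clock are all hypothetical names introduced for illustration. Only the 600-second figure comes from the text above.

```python
import time
from typing import Callable


class ListingCache:
    """Hypothetical TTL cache for directory listings (not Mirage's IndexCacheStore)."""

    def __init__(
        self,
        fetch: Callable[[str], list[str]],
        ttl: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # called on a miss, e.g. a remote listing request
        self._ttl = ttl          # seconds a cached listing stays valid
        self._clock = clock      # injectable so the example can control time
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def listdir(self, prefix: str) -> list[str]:
        now = self._clock()
        cached = self._entries.get(prefix)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # still fresh: no remote call
        listing = self._fetch(prefix)
        self._entries[prefix] = (now, listing)
        return listing


calls = []


def fake_list(prefix: str) -> list[str]:
    """Stand-in for a remote listing call; records each invocation."""
    calls.append(prefix)
    return ["file.txt", "config.json"]


now = [0.0]
cache = ListingCache(fake_list, ttl=600.0, clock=lambda: now[0])

cache.listdir("data/")   # miss: fetched from the (fake) remote
now[0] = 599.0
cache.listdir("data/")   # hit: served from the cache
now[0] = 600.0
cache.listdir("data/")   # expired: fetched again
print(len(calls))        # 2
```

The practical consequence is that an object added to the bucket by another client may not appear in a listing until the cached entry expires.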

Example

import asyncio

from mirage import MountMode, Workspace
from mirage.resource.s3 import S3Resource, S3Config

config = S3Config(
    bucket="my-bucket",
    region="us-east-1",
)

resource = S3Resource(config)


async def main() -> None:
    ws = Workspace({"/s3/": resource}, mode=MountMode.READ)

    r = await ws.execute("ls /s3/")
    print(await r.stdout_str())

    r = await ws.execute("cat /s3/data/file.txt")
    print(await r.stdout_str())

    r = await ws.execute("tree /s3/")
    print(await r.stdout_str())

    r = await ws.execute("find /s3/ -name '*.json'")
    print(await r.stdout_str())

    r = await ws.execute("grep example /s3/data/config.json")
    print(await r.stdout_str())

    r = await ws.execute("stat /s3/data/file.txt")
    print(await r.stdout_str())


if __name__ == "__main__":
    asyncio.run(main())

Shell Commands

The S3 resource supports the full set of shell commands since it operates on real file content (text, binary, JSON, CSV, etc.). Large files benefit from range reads to avoid downloading entire objects.
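
As a rough illustration of what a range read looks like at the HTTP level, the sketch below builds Range header values for the first and last bytes of an object. The helper names head_range and tail_range are hypothetical, and how Mirage issues its range requests internally is not specified here. The header syntax itself is standard HTTP, and S3's GetObject accepts it.

```python
def head_range(n_bytes: int) -> str:
    """Range header value selecting the first n_bytes of an object."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    return f"bytes=0-{n_bytes - 1}"  # byte offsets are inclusive


def tail_range(n_bytes: int) -> str:
    """Range header value selecting the last n_bytes of an object (suffix range)."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    return f"bytes=-{n_bytes}"


print(head_range(1024))  # bytes=0-1023
print(tail_range(1024))  # bytes=-1024

# With an S3 client such as boto3, the value is passed as the Range argument:
#   client.get_object(Bucket="my-bucket", Key="data/file.txt", Range=head_range(1024))
```

A command like head or tail on a multi-gigabyte object then transfers only the requested slice rather than the whole object.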

Read Commands

| Command | Notes |
| --- | --- |
| cat | Read file content |
| head / tail | First/last N lines |
| grep / rg | Pattern search (file or directory level) |
| jq | Query JSON fields |
| wc | Line/word/byte counts |
| stat | File metadata (name, size, type, modified) |
| find | Recursive search with -name, -maxdepth |
| tree | Directory tree view |
| nl | Number lines |
| du | Disk usage summary |
| file | Detect file type |
| strings | Extract printable strings from binary |
| xxd | Hex dump |
| md5 | MD5 checksum |
| sha256sum | SHA-256 checksum |

Text Processing

| Command | Notes |
| --- | --- |
| awk | Pattern scanning and processing |
| sed | Stream editor |
| tr | Translate or delete characters |
| sort | Sort lines |
| uniq | Remove duplicate lines |
| cut | Extract fields/columns |
| join | Join lines on a common field |
| paste | Merge lines side by side |
| column | Columnate output |
| fold | Wrap lines to a specified width |
| expand | Convert tabs to spaces |
| unexpand | Convert spaces to tabs |
| fmt | Simple text formatter |
| rev | Reverse lines |
| tac | Concatenate and print in reverse |
| look | Display lines beginning with a given string |
| shuf | Shuffle lines |
| tsort | Topological sort |
| comm | Compare two sorted files |
| cmp | Compare two files byte by byte |
| diff | Compare files line by line |
| patch | Apply a diff patch |
| iconv | Character encoding conversion |

File Operations

| Command | Notes |
| --- | --- |
| cp | Copy files |
| mv | Move/rename files |
| rm | Remove files |
| mkdir | Create directories |
| touch | Create empty file or update timestamp |
| ln | Create symbolic links |
| tee | Write stdin to file and stdout |
| mktemp | Create temporary file |
| split | Split file into pieces |
| csplit | Split file by context |

Path Utilities

| Command | Notes |
| --- | --- |
| basename | Strip directory from path |
| dirname | Strip filename from path |
| realpath | Resolve path |
| readlink | Print symbolic link target |
| ls | List directory contents |

Compression

| Command | Notes |
| --- | --- |
| gzip | Compress files |
| gunzip | Decompress gzip files |
| zip | Create zip archives |
| unzip | Extract zip archives |
| tar | Archive files |
| zcat | Cat compressed files |
| zgrep | Grep compressed files |

Encoding

| Command | Notes |
| --- | --- |
| base64 | Base64 encode/decode |

Data Format Support

Commands with format-specific variants for structured data files:
| Format | Extension | Variants |
| --- | --- | --- |
| Parquet | .parquet | cat, head, tail, wc, stat, cut, grep, ls, file |
| Feather | .feather | cat, head, tail, wc, stat, cut, grep, ls, file |
| ORC | .orc | cat, head, tail, wc, stat, cut, grep, ls, file |
| HDF5 | .hdf5 | cat, head, tail, wc, stat, cut, grep, ls, file |
These variants auto-detect the format by extension and convert to tabular text (CSV) for processing.
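
Extension-based detection can be pictured as a simple lookup on the path suffix. The sketch below is an assumption about the general shape of such a dispatch, not Mirage's code: FORMAT_BY_EXTENSION and detect_format are hypothetical names, and only the four extensions come from the table above.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions taken from the table above; the mapping itself is illustrative.
FORMAT_BY_EXTENSION = {
    ".parquet": "parquet",
    ".feather": "feather",
    ".orc": "orc",
    ".hdf5": "hdf5",
}


def detect_format(path: str) -> Optional[str]:
    """Return the structured-data format for a path, or None for ordinary files."""
    return FORMAT_BY_EXTENSION.get(PurePosixPath(path).suffix)


print(detect_format("/s3/data/events.parquet"))  # parquet
print(detect_format("/s3/data/file.txt"))        # None
```

A command such as cat would fall back to its ordinary byte-oriented behaviour whenever the lookup returns None.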

Audio Support (Optional)

Audio commands are opt-in and require sherpa-onnx with a Whisper model. They transcribe audio to text, enabling cat, head, tail, grep, and stat on audio files.
| Format | Extension | Commands |
| --- | --- | --- |
| WAV | .wav | cat, head, tail, grep, stat |
| MP3 | .mp3 | cat, head, tail, grep, stat |
| OGG | .ogg | cat, head, tail, grep, stat |
To enable, register audio commands manually:
from mirage.commands.audio import AUDIO_COMMANDS
from mirage.commands.audio.utils import configure

configure(model_dir="path/to/sherpa-onnx-whisper-base")

for cmd in AUDIO_COMMANDS:
    ws.register(cmd)

Use Cases

  • AI agents accessing cloud data: Mount S3 buckets for agents to read and process remote datasets
  • Data pipelines: Read and write S3 objects with shell-like commands
  • Sandboxed cloud storage access: Restrict agent operations to a specific bucket and prefix
  • FUSE mounting: Expose S3 buckets through a virtual FUSE mount for external tools