The HF Buckets resource mounts a Hugging Face Bucket at a prefix such as /hf/. It speaks HF’s HTTP API natively (async and streaming, with no Python SDK dependency). For credential setup, see HF Buckets Setup.

Config

import os

from mirage import MountMode, Workspace
from mirage.resource.hf_buckets import HfBucketsConfig, HfBucketsResource

config = HfBucketsConfig(
    bucket=os.environ["HF_BUCKET_NAME"],  # "namespace/bucket-name"
    token=os.environ["HF_TOKEN"],
    # Optional:
    # endpoint="https://huggingface.co",
    # timeout=30,
    # key_prefix="data/",
)
resource = HfBucketsResource(config)
ws = Workspace({"/hf/": resource}, mode=MountMode.READ)
HfBucketsResource(config) takes an HfBucketsConfig object with the bucket in namespace/bucket-name form plus an optional access token. Both READ and WRITE modes are supported out of the box.
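
Because the bucket must be given in namespace/bucket-name form, it can help to check the value before building the config. The following is a minimal sketch of such a check; split_bucket_id is a hypothetical helper written for illustration and is not part of Mirage, and the exact validation Mirage performs is not stated here.

```python
from __future__ import annotations


def split_bucket_id(bucket: str) -> tuple[str, str]:
    """Split "namespace/bucket-name" into its two parts.

    Hypothetical helper (not a Mirage API): rejects values that do not
    contain exactly one "/" separating two non-empty parts.
    """
    namespace, sep, name = bucket.partition("/")
    if not sep or not namespace or not name or "/" in name:
        raise ValueError(f"expected 'namespace/bucket-name', got {bucket!r}")
    return namespace, name


print(split_bucket_id("your-user/my-data"))  # ('your-user', 'my-data')
```

Running a check like this on os.environ["HF_BUCKET_NAME"] surfaces a malformed value at startup rather than on the first request.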

Filesystem Layout

The HF Buckets resource maps bucket object keys to virtual paths under the mount prefix. For example, if bucket your-user/my-data contains:
data/file.txt
data/config.json
reports/q1.csv
reports/q2.csv
Then mounting at /hf/ exposes:
/hf/
  data/
    file.txt
    config.json
  reports/
    q1.csv
    q2.csv
Path mapping: virtual /hf/data/file.txt maps to bucket key data/file.txt.
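
The mapping described above can be sketched as a small pure function. This is an illustration only, assuming the mount prefix is simply stripped from the front of the virtual path; to_bucket_key is a hypothetical name, not a Mirage function.

```python
def to_bucket_key(virtual_path: str, mount_prefix: str = "/hf/") -> str:
    """Translate a virtual path under the mount into a bucket object key.

    Hypothetical helper: the mount root maps to the empty key, and paths
    outside the mount are rejected.
    """
    stripped = mount_prefix.strip("/")
    mount = f"/{stripped}/" if stripped else "/"  # treats "/hf" and "/hf/" alike
    if virtual_path.rstrip("/") + "/" == mount:
        return ""  # the mount root itself
    if not virtual_path.startswith(mount):
        raise ValueError(f"{virtual_path!r} is not under {mount!r}")
    return virtual_path[len(mount):]


print(to_bucket_key("/hf/data/file.txt"))   # data/file.txt
print(to_bucket_key("/hf/reports/q1.csv"))  # reports/q1.csv
```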

Cache

The HF Buckets resource uses IndexCacheStore with index_ttl = 600 (10 minutes). Directory listings are cached and populate file-size/type entries that stat reads via a fast path, so a readdir followed by per-entry stat calls (which is what ls, FUSE getattr, and most shell commands trigger) costs one HTTP request instead of N.
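
To make the caching behavior concrete, here is a toy model of a TTL-bound listing cache in which stat is answered from the parent directory's cached listing. It is a sketch under stated assumptions, not the IndexCacheStore implementation: ListingCache, Entry, and fake_fetch are hypothetical names, and the fetch function stands in for one HTTP listing request.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    size: int
    is_dir: bool


class ListingCache:
    """Toy listing cache: one fetch per directory per TTL window."""

    def __init__(self, fetch_listing, ttl=600.0, clock=time.monotonic):
        self._fetch = fetch_listing  # stands in for one HTTP listing request
        self._ttl = ttl
        self._clock = clock
        self._dirs = {}  # directory -> (expires_at, {name: Entry})

    def readdir(self, directory):
        now = self._clock()
        cached = self._dirs.get(directory)
        if cached is None or cached[0] <= now:
            # Miss or expired: fetch once and remember every entry's metadata.
            cached = (now + self._ttl, self._fetch(directory))
            self._dirs[directory] = cached
        return cached[1]

    def stat(self, path):
        # Fast path: look the entry up in the parent's cached listing.
        directory, _, name = path.rpartition("/")
        return self.readdir(directory + "/")[name]


calls = []  # records each simulated HTTP request


def fake_fetch(directory):
    calls.append(directory)
    return {"q1.csv": Entry(120, False), "q2.csv": Entry(340, False)}


cache = ListingCache(fake_fetch)
names = cache.readdir("/hf/reports/")
sizes = [cache.stat("/hf/reports/" + name).size for name in names]
print(len(calls), sizes)  # 1 [120, 340]
```

In this model the listing and both stat calls are served by a single simulated request; a new request is made only after the TTL expires.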

Example

import asyncio
import os

from dotenv import load_dotenv

from mirage import MountMode, Workspace
from mirage.resource.hf_buckets import HfBucketsConfig, HfBucketsResource

load_dotenv(".env.development")

config = HfBucketsConfig(
    bucket=os.environ["HF_BUCKET_NAME"],
    token=os.environ["HF_TOKEN"],
)

resource = HfBucketsResource(config)


async def main() -> None:
    ws = Workspace({"/hf/": resource}, mode=MountMode.READ)

    r = await ws.execute("ls /hf/")
    print(await r.stdout_str())

    r = await ws.execute("cat /hf/data/file.txt")
    print(await r.stdout_str())

    r = await ws.execute("tree /hf/")
    print(await r.stdout_str())

    r = await ws.execute("find /hf/ -name '*.json'")
    print(await r.stdout_str())

    r = await ws.execute("grep example /hf/data/config.json")
    print(await r.stdout_str())

    r = await ws.execute("stat /hf/data/file.txt")
    print(await r.stdout_str())


if __name__ == "__main__":
    asyncio.run(main())

Shell Commands

The HF Buckets resource supports the full set of shell commands since it operates on real file content (text, binary, JSON, CSV, etc.). Large files benefit from range reads to avoid downloading entire objects.
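
How Mirage issues range reads is not described here, but the underlying idea is the standard HTTP Range header, which asks the server for a byte window instead of the whole object. The sketch below only builds header values; range_header and suffix_range_header are hypothetical helper names.

```python
from __future__ import annotations


def range_header(offset: int = 0, length: int | None = None) -> dict[str, str]:
    """Header for `length` bytes starting at `offset` (to the end if length is None)."""
    if offset < 0 or (length is not None and length <= 0):
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the "- 1".
    last = "" if length is None else str(offset + length - 1)
    return {"Range": f"bytes={offset}-{last}"}


def suffix_range_header(last_n: int) -> dict[str, str]:
    """Header for the final `last_n` bytes of an object."""
    if last_n <= 0:
        raise ValueError("last_n must be > 0")
    return {"Range": f"bytes=-{last_n}"}


print(range_header(0, 1024))     # {'Range': 'bytes=0-1023'}
print(suffix_range_header(512))  # {'Range': 'bytes=-512'}
```

A prefix range suits commands such as head, while a suffix range suits tail, since neither needs the bytes in between.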

Read Commands

| Command | Notes |
|---|---|
| cat | Read file content |
| head / tail | First/last N lines |
| grep / rg | Pattern search (file or directory level) |
| jq | Query JSON fields |
| wc | Line/word/byte counts |
| stat | File metadata (name, size, type, modified) |
| find | Recursive search with -name, -maxdepth |
| tree | Directory tree view |
| nl | Number lines |
| du | Disk usage summary |
| file | Detect file type |
| strings | Extract printable strings from binary |
| xxd | Hex dump |
| md5 | MD5 checksum |
| sha256sum | SHA-256 checksum |

Text Processing

| Command | Notes |
|---|---|
| awk | Pattern scanning and processing |
| sed | Stream editor |
| tr | Translate or delete characters |
| sort | Sort lines |
| uniq | Remove duplicate lines |
| cut | Extract fields/columns |
| join | Join lines on a common field |
| paste | Merge lines side by side |
| column | Columnate output |
| fold | Wrap lines to a specified width |
| expand | Convert tabs to spaces |
| unexpand | Convert spaces to tabs |
| fmt | Simple text formatter |
| rev | Reverse lines |
| tac | Concatenate and print in reverse |
| look | Display lines beginning with a given string |
| shuf | Shuffle lines |
| tsort | Topological sort |
| comm | Compare two sorted files |
| cmp | Compare two files byte by byte |
| diff | Compare files line by line |
| patch | Apply a diff patch |
| iconv | Character encoding conversion |

File Operations

| Command | Notes |
|---|---|
| cp | Copy files |
| mv | Move/rename files |
| rm | Remove files |
| mkdir | Create directories |
| touch | Create empty file or update timestamp |
| ln | Create symbolic links |
| tee | Write stdin to file and stdout |
| mktemp | Create temporary file |
| split | Split file into pieces |
| csplit | Split file by context |

Path Utilities

| Command | Notes |
|---|---|
| basename | Strip directory from path |
| dirname | Strip filename from path |
| realpath | Resolve path |
| readlink | Print symbolic link target |
| ls | List directory contents |

Compression

| Command | Notes |
|---|---|
| gzip | Compress files |
| gunzip | Decompress gzip files |
| zip | Create zip archives |
| unzip | Extract zip archives |
| tar | Archive files |
| zcat | Cat compressed files |
| zgrep | Grep compressed files |

Encoding

| Command | Notes |
|---|---|
| base64 | Base64 encode/decode |

Data Format Support

Commands with format-specific variants for structured data files:
| Format | Extension | Variants |
|---|---|---|
| Parquet | .parquet | cat, head, tail, wc, stat, cut, grep, ls, file |
| Feather | .feather | cat, head, tail, wc, stat, cut, grep, ls, file |
| ORC | .orc | cat, head, tail, wc, stat, cut, grep, ls, file |
| HDF5 | .hdf5 | cat, head, tail, wc, stat, cut, grep, ls, file |
These variants auto-detect the format by extension and convert to tabular text (CSV) for processing.
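
Extension-based detection can be pictured as a simple lookup. The sketch below is an illustration rather than Mirage's dispatch code: detect_format and FORMAT_BY_EXTENSION are hypothetical names, and treating extensions case-insensitively is an assumption.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Extensions taken from the table above; the mapping itself is illustrative.
FORMAT_BY_EXTENSION = {
    ".parquet": "parquet",
    ".feather": "feather",
    ".orc": "orc",
    ".hdf5": "hdf5",
}


def detect_format(path: str) -> str | None:
    """Return the structured format for a path, or None for ordinary files."""
    suffix = PurePosixPath(path).suffix.lower()  # assumption: case-insensitive
    return FORMAT_BY_EXTENSION.get(suffix)


print(detect_format("/hf/data/train.parquet"))  # parquet
print(detect_format("/hf/data/file.txt"))       # None
```

A command such as cat would use the format-specific variant when a format is detected and fall back to the plain byte-oriented behavior otherwise.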

Use Cases

  • AI agents accessing HF datasets: Mount HF Buckets for agents to read and process datasets stored on the Hub
  • Data pipelines: Read and write HF bucket objects with shell-like commands
  • Sandboxed bucket access: Restrict agent operations to a specific bucket and prefix
  • FUSE mounting: Expose HF Buckets through a virtual FUSE mount for external tools

Scoping a resource to a key prefix

Pass key_prefix (type str | None, default None) to HfBucketsConfig to scope every operation to a subpath of the bucket:
HfBucketsResource(HfBucketsConfig(
    bucket="your-user/app-data",
    token=hf_token,
    key_prefix=f"users/{user_id}/",
))
When set, every read/write/list/stat operation is transparently scoped to that bucket subpath. Agents see clean paths like /data/notes.md, while the underlying bucket key is users/{user_id}/data/notes.md. This is useful for multi-tenant systems, where each user should see only their own subtree.

Normalization: leading slashes are stripped and a trailing slash is added automatically. Both None and an empty string are treated as “no prefix.”
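
The normalization rules can be sketched as follows. This is a minimal illustration, not the library's code: normalize_key_prefix and scoped_key are hypothetical names, and treating a prefix made only of slashes as "no prefix" is an assumption.

```python
from __future__ import annotations


def normalize_key_prefix(prefix: str | None) -> str:
    """Apply the documented rules: strip leading slashes, ensure a trailing slash."""
    if not prefix:  # None and "" both mean "no prefix"
        return ""
    stripped = prefix.lstrip("/")
    if not stripped:  # assumption: "/" alone also means "no prefix"
        return ""
    return stripped if stripped.endswith("/") else stripped + "/"


def scoped_key(prefix: str | None, key: str) -> str:
    """Bucket key that a mount-relative key resolves to under the prefix."""
    return normalize_key_prefix(prefix) + key.lstrip("/")


print(normalize_key_prefix("/users/42"))            # users/42/
print(scoped_key("users/42/", "/data/notes.md"))    # users/42/data/notes.md
print(scoped_key(None, "data/notes.md"))            # data/notes.md
```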