/ds/.
All reads are lazy: only the bytes you actually read (for example with cat or head) are transferred.
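To make "lazy" concrete, here is a minimal sketch of on-demand range reads. It assumes the remote object can be fetched by byte range (for example with an HTTP Range request). `LazyRemoteFile` and its `fetch_range` callable are hypothetical names for illustration only, not this mount's actual implementation.

```python
from typing import Callable, List


class LazyRemoteFile:
    """Hypothetical sketch: fetch only the byte ranges that are actually read."""

    def __init__(self, size: int, fetch_range: Callable[[int, int], bytes], chunk: int = 4096):
        self.size = size
        self.fetch_range = fetch_range  # assumed: returns bytes [start, end) of the remote object
        self.chunk = chunk
        self.bytes_transferred = 0

    def read(self, offset: int, length: int) -> bytes:
        # Clamp to the file size and fetch just this range.
        end = min(offset + length, self.size)
        if offset >= end:
            return b""
        data = self.fetch_range(offset, end)
        self.bytes_transferred += len(data)
        return data

    def head(self, n_lines: int) -> List[bytes]:
        # Like `head -n`: read chunk by chunk until enough newlines have arrived.
        buf = b""
        offset = 0
        while buf.count(b"\n") < n_lines and offset < self.size:
            buf += self.read(offset, self.chunk)
            offset += self.chunk
        return buf.split(b"\n")[:n_lines]


remote = b"line\n" * 100_000  # 500,000-byte stand-in for a remote file
f = LazyRemoteFile(len(remote), lambda s, e: remote[s:e])
print(f.head(3), f.bytes_transferred)  # [b'line', b'line', b'line'] 4096
```

Only one 4 KiB chunk is transferred, even though the stand-in file is about 500 KB.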
For credential setup, see HF Datasets Setup.
Config
HfDatasetsConfig takes repo_id in namespace/dataset-name form plus an
optional access token. Public datasets need no token.
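The following sketch mirrors the two inputs described above: a repo_id in namespace/dataset-name form and an optional token. `DatasetRepoConfig`, the `token` field name, and the validation regex are hypothetical and are not the real HfDatasetsConfig. The Bearer header is how the Hugging Face Hub accepts access tokens.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical check for "namespace/dataset-name" (exactly one slash).
_REPO_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")


@dataclass(frozen=True)
class DatasetRepoConfig:
    """Hypothetical stand-in for HfDatasetsConfig; not the library's class."""

    repo_id: str                 # "namespace/dataset-name"
    token: Optional[str] = None  # only needed for private or gated datasets

    def __post_init__(self) -> None:
        if not _REPO_ID.match(self.repo_id):
            raise ValueError(f"repo_id must look like 'namespace/dataset-name', got {self.repo_id!r}")

    def auth_headers(self) -> Dict[str, str]:
        # Public datasets need no token, so no Authorization header is sent.
        return {"Authorization": f"Bearer {self.token}"} if self.token else {}


public = DatasetRepoConfig("AlienKevin/SWE-ZERO-12M-trajectories")
print(public.auth_headers())  # {}
```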
Filesystem Layout
Maps dataset repo files to virtual paths under the mount prefix. For example, if the dataset AlienKevin/SWE-ZERO-12M-trajectories contains:
/ds/ exposes:
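The mapping can be sketched as plain path arithmetic. This assumes that virtual paths are simply repo-relative paths joined onto the prefix and that directories are derived from a flat file list. The helper names and the file names in the usage line are hypothetical. They are not the actual contents of the example repo.

```python
import posixpath
from typing import List


def to_virtual_path(mount_prefix: str, repo_path: str) -> str:
    # Normalize slashes so "/ds" and "/ds/" behave the same.
    return posixpath.join(mount_prefix.rstrip("/") + "/", repo_path.lstrip("/"))


def to_repo_path(mount_prefix: str, virtual_path: str) -> str:
    # Inverse mapping: strip the mount prefix to get the path inside the repo.
    prefix = mount_prefix.rstrip("/") + "/"
    if not virtual_path.startswith(prefix):
        raise ValueError(f"{virtual_path!r} is not under {prefix!r}")
    return virtual_path[len(prefix):]


def list_dir(repo_files: List[str], repo_dir: str = "") -> List[str]:
    # Immediate children of repo_dir; subdirectories get a trailing "/".
    prefix = repo_dir.strip("/") + "/" if repo_dir.strip("/") else ""
    children = set()
    for path in repo_files:
        if path.startswith(prefix):
            head, sep, _ = path[len(prefix):].partition("/")
            children.add(head + "/" if sep else head)
    return sorted(children)


files = ["README.md", "data/train-00000.parquet", "data/train-00001.parquet"]  # hypothetical
print([to_virtual_path("/ds/", p) for p in list_dir(files)])  # ['/ds/README.md', '/ds/data/']
```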
Example
Shell Commands
Same set as HF Buckets: read, text-processing, file ops, path utilities, compression, encoding, and format-specific variants for parquet/feather/orc/hdf5.
Cache
Uses IndexCacheStore with index_ttl = 600 (10 minutes). Directory
listings are cached, and each listing also populates the file-size/type
entries used by stat’s fast path. As a result, a readdir followed by a
per-entry stat (which ls, FUSE getattr, and most shell commands trigger)
costs one HTTP request instead of one per entry.
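The one-request-instead-of-N behavior can be illustrated with a small TTL cache whose readdir seeds stat entries. This is a sketch under stated assumptions. `IndexCacheSketch`, `Entry`, `list_remote`, and `stat_remote` are hypothetical names, each remote call is assumed to cost one HTTP request, and the stat slow path is a guess because the real store's behavior on a miss is not described here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    name: str
    size: int
    is_dir: bool


class IndexCacheSketch:
    """Hypothetical stand-in for IndexCacheStore: TTL-cached listings that also seed stat."""

    def __init__(
        self,
        list_remote: Callable[[str], List[Entry]],
        stat_remote: Callable[[str], Entry],
        ttl: float = 600.0,  # mirrors index_ttl = 600 (10 minutes)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.list_remote = list_remote  # assumed: one HTTP request per call
        self.stat_remote = stat_remote  # assumed: one HTTP request per call
        self.ttl = ttl
        self.clock = clock
        self.requests = 0
        self._listings: Dict[str, Tuple[float, List[Entry]]] = {}
        self._stats: Dict[str, Tuple[float, Entry]] = {}

    def _fresh(self, stamp: float) -> bool:
        return self.clock() - stamp < self.ttl

    def readdir(self, path: str) -> List[Entry]:
        cached = self._listings.get(path)
        if cached is not None and self._fresh(cached[0]):
            return cached[1]
        self.requests += 1
        entries = self.list_remote(path)
        now = self.clock()
        self._listings[path] = (now, entries)
        # The listing already carries size/type, so seed stat's fast path for every child.
        for e in entries:
            self._stats[path.rstrip("/") + "/" + e.name] = (now, e)
        return entries

    def stat(self, path: str) -> Entry:
        cached = self._stats.get(path)
        if cached is not None and self._fresh(cached[0]):
            return cached[1]  # fast path: no request
        self.requests += 1  # slow path: one request for this path alone
        entry = self.stat_remote(path)
        self._stats[path] = (self.clock(), entry)
        return entry


now = [0.0]  # fake clock so expiry can be demonstrated
listing = {"/ds": [Entry("README.md", 1200, False), Entry("data", 0, True)]}  # hypothetical
cache = IndexCacheSketch(listing.__getitem__, lambda p: Entry(p.rsplit("/", 1)[-1], 0, False),
                         clock=lambda: now[0])
for e in cache.readdir("/ds"):  # roughly what `ls -l /ds` does
    cache.stat("/ds/" + e.name)
print(cache.requests)  # 1
```

After the TTL elapses, the next readdir goes back to the network. Within it, every stat on a listed entry is free.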
Use Cases
- AI agents inspecting datasets: Mount, browse the README, sample a few rows from parquet shards without downloading the whole dataset
- Dataset triage: Use ls, stat, and find to see what’s in a repo before committing to a full local copy
- Sandboxed access: Pin a revision for reproducibility
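The section does not show how the mount accepts a revision. At the Hugging Face Hub level, though, pinning means resolving files against a fixed commit SHA rather than a moving branch such as main. The sketch below builds Hub resolve URLs and checks whether a revision is a full 40-character commit SHA. `pinned_file_url` and `is_pinned` are hypothetical helpers, and using the public huggingface.co endpoint is an assumption.

```python
import re
from urllib.parse import quote

HF_ENDPOINT = "https://huggingface.co"  # assumed public Hub endpoint
_COMMIT_SHA = re.compile(r"^[0-9a-f]{40}$")


def pinned_file_url(repo_id: str, path: str, revision: str = "main") -> str:
    # The revision is percent-encoded in full so refs like "refs/convert/parquet" stay one segment.
    return f"{HF_ENDPOINT}/datasets/{repo_id}/resolve/{quote(revision, safe='')}/{quote(path)}"


def is_pinned(revision: str) -> bool:
    # Only a full commit SHA is immutable; branch and tag names can move.
    return bool(_COMMIT_SHA.match(revision))


print(pinned_file_url("AlienKevin/SWE-ZERO-12M-trajectories", "README.md"))
# https://huggingface.co/datasets/AlienKevin/SWE-ZERO-12M-trajectories/resolve/main/README.md
```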