HfDatasetsResource mounts a Hugging Face Dataset repo at a prefix such as /ds/. All reads are lazy: only the bytes that commands such as cat or head actually request are transferred. The TypeScript backend mirrors the Python one and returns identical results. For credential setup, see HF Datasets Setup.
Node only. The backend uses the native opendal binding, which does not run in the browser.

Node

pnpm add @struktoai/mirage-node

import { HfDatasetsResource, MountMode, Workspace } from '@struktoai/mirage-node'

const dataset = new HfDatasetsResource({
  repoId: 'AlienKevin/SWE-ZERO-12M-trajectories', // "namespace/dataset-name"
  token: process.env.HF_TOKEN, // optional for public datasets
  // Optional:
  // endpoint: 'https://huggingface.co',
  // revision: 'main',
  // keyPrefix: 'train/',
})

const ws = new Workspace({ '/ds/': dataset }, { mode: MountMode.READ })

console.log((await ws.execute('ls -lh /ds/')).stdoutText)
console.log((await ws.execute("find /ds/ -name '*.parquet' | wc -l")).stdoutText)
console.log((await ws.execute('cat /ds/README.md | head -n 20')).stdoutText)
HfDatasetsResource takes repoId in namespace/dataset-name form plus an optional access token. Public datasets need no token.
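
Because repoId must be in namespace/dataset-name form, it can help to check the string before constructing the resource. The sketch below is one way to do that; parseRepoId and datasetOptions are hypothetical helpers written for this page, not part of @struktoai/mirage-node, and the library may well perform its own validation.

```typescript
// Hypothetical helpers for illustration; not part of @struktoai/mirage-node.

/** Split "namespace/dataset-name" into its two parts, rejecting anything else. */
function parseRepoId(repoId: string): { namespace: string; name: string } {
  const slash = repoId.indexOf('/')
  const hasExtraSlash = slash !== -1 && repoId.indexOf('/', slash + 1) !== -1
  if (slash <= 0 || slash === repoId.length - 1 || hasExtraSlash) {
    throw new Error(`expected "namespace/dataset-name", got "${repoId}"`)
  }
  return { namespace: repoId.slice(0, slash), name: repoId.slice(slash + 1) }
}

/** Build constructor options, including the token only when one is provided. */
function datasetOptions(repoId: string, token?: string): { repoId: string; token?: string } {
  parseRepoId(repoId) // throws on a malformed id
  return token ? { repoId, token } : { repoId }
}
```

The result can be passed to the constructor, for example new HfDatasetsResource(datasetOptions('AlienKevin/SWE-ZERO-12M-trajectories', process.env.HF_TOKEN)). Leaving the token out entirely for public datasets matches the "optional" behaviour described above.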

Filesystem Layout

The resource maps dataset repo files (README, Parquet shards, and so on) to virtual paths under the mount prefix. Parquet shards are streamed lazily via byte-range reads.
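
To illustrate why byte-range reads suit Parquet, here is a minimal sketch of the ranges a reader would need to fetch just the footer metadata of a shard. It relies only on the Parquet file format (a file ends with a 4-byte little-endian footer length followed by the magic bytes "PAR1"). The helper names are hypothetical, and this is not a description of how the mirage backend issues its requests.

```typescript
// Hypothetical helpers for illustration; not part of @struktoai/mirage-node.

/** Build an HTTP Range header value for `length` bytes starting at `offset`. */
function byteRange(offset: number, length: number): string {
  if (offset < 0 || length <= 0) {
    throw new RangeError('offset must be >= 0 and length must be > 0')
  }
  return `bytes=${offset}-${offset + length - 1}`
}

/** Range of the 8-byte Parquet trailer: 4-byte LE footer length + "PAR1". */
function parquetTrailerRange(fileSize: number): string {
  return byteRange(fileSize - 8, 8)
}

/** Given the 8 trailer bytes, return the range that holds the footer metadata. */
function parquetFooterRange(fileSize: number, trailer: Uint8Array): string {
  const view = new DataView(trailer.buffer, trailer.byteOffset, 8)
  const magic = String.fromCharCode(
    view.getUint8(4), view.getUint8(5), view.getUint8(6), view.getUint8(7),
  )
  if (magic !== 'PAR1') throw new Error('trailer does not end with PAR1')
  const footerLength = view.getUint32(0, true) // little-endian
  return byteRange(fileSize - 8 - footerLength, footerLength)
}
```

Under these assumptions, a reader needs two small requests (the trailer, then the footer) to learn a shard's schema and row-group layout, regardless of how large the shard is.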

Shell Commands

The supported shell commands are the same set as for HF Buckets.