The GCS resource mounts a Google Cloud Storage bucket at a prefix such as /gcs/. All operations involve network I/O to the remote object store. It uses aioboto3 against GCS’s S3-compatible XML API via HMAC keys, so it inherits all S3 resource capabilities. For credential setup, see GCS Setup.

Config

import os

from mirage import MountMode, Workspace
from mirage.resource.gcs import GCSConfig, GCSResource

config = GCSConfig(
    bucket=os.environ["GCS_BUCKET"],
    access_key_id=os.environ["GCS_ACCESS_KEY_ID"],
    secret_access_key=os.environ["GCS_SECRET_ACCESS_KEY"],
    # Optional:
    # endpoint_url="https://storage.googleapis.com",
    # region="auto",
    # timeout=30,
    # proxy="http://proxy:8080",
)
resource = GCSResource(config)
ws = Workspace({"/gcs/": resource}, mode=MountMode.READ)
GCSResource(config) takes a GCSConfig object with the bucket name and HMAC credentials. Both READ and WRITE modes are supported.

Filesystem Layout

The GCS resource maps object keys to virtual paths under the mount prefix, identical to the S3 resource. GCS “directories” are prefix-based — there are no real directory objects. For example, if bucket mirage-ai contains:
data/example.json
data/example.parquet
data/example.jsonl
Then mounting at /gcs/ exposes:
/gcs/
  data/
    example.json
    example.parquet
    example.jsonl
Path mapping: virtual /gcs/data/example.json maps to GCS key data/example.json.
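The mapping and the prefix-based "directories" can be illustrated with a small, self-contained sketch. The helpers `to_key` and `list_dir` are hypothetical names introduced here for illustration; they are not part of mirage and only mimic the behaviour described above on an in-memory list of keys.

```python
from typing import Iterable, List


def to_key(virtual_path: str, mount: str = "/gcs/") -> str:
    """Strip the mount prefix from a virtual path to get the object key."""
    prefix = mount.rstrip("/") + "/"
    if not virtual_path.startswith(prefix):
        raise ValueError(f"{virtual_path!r} is not under mount {mount!r}")
    return virtual_path[len(prefix):]


def list_dir(keys: Iterable[str], dir_key: str = "") -> List[str]:
    """List immediate children of a key prefix; 'directories' end with '/'."""
    prefix = dir_key.rstrip("/") + "/" if dir_key else ""
    children = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue
        # Everything up to the first "/" is one child; a trailing "/" marks
        # it as a directory inferred from the prefix, not a real object.
        head, sep, _ = rest.partition("/")
        children.add(head + sep)
    return sorted(children)


keys = ["data/example.json", "data/example.parquet", "data/example.jsonl"]
print(to_key("/gcs/data/example.json"))  # data/example.json
print(list_dir(keys))                    # ['data/']
print(list_dir(keys, "data/"))
```

Note that `data/` appears in the listing only because some key starts with that prefix; deleting every object under it would make the "directory" disappear.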

Cache

The GCS resource uses IndexCacheStore with index_ttl = 600 (10 minutes), the same as S3. Directory listings are served from the cache for up to 600 seconds before being refreshed from GCS, which reduces API calls for repeated directory traversals.
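To make the TTL behaviour concrete, here is a minimal sketch of a listing cache with a 600-second expiry. `TTLListingCache` is a hypothetical stand-in written for this illustration; it is not mirage's IndexCacheStore, whose internals are not shown on this page.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLListingCache:
    """Illustrative TTL cache for directory listings (not IndexCacheStore)."""

    def __init__(self, ttl: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, path: str, fetch: Callable[[str], List[str]]) -> List[str]:
        now = self.clock()
        entry = self._entries.get(path)
        # Serve from cache while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        # Otherwise refresh from the backing store and remember when.
        listing = fetch(path)
        self._entries[path] = (now, listing)
        return listing


calls = []


def fake_fetch(path: str) -> List[str]:
    """Stands in for a remote listing call; records each invocation."""
    calls.append(path)
    return ["example.json"]


now = [0.0]  # controllable fake clock, in seconds
cache = TTLListingCache(ttl=600.0, clock=lambda: now[0])
cache.get("/gcs/data/", fake_fetch)  # miss: fetches
now[0] = 599.0
cache.get("/gcs/data/", fake_fetch)  # still fresh: served from cache
now[0] = 600.0
cache.get("/gcs/data/", fake_fetch)  # expired: fetches again
print(len(calls))  # 2
```

One consequence of this design is that objects created or deleted outside the workspace may not show up in listings until the cached entry expires.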

Example

import asyncio
import os

from dotenv import load_dotenv

from mirage import MountMode, Workspace
from mirage.resource.gcs import GCSConfig, GCSResource

load_dotenv(".env.development")

config = GCSConfig(
    bucket=os.environ["GCS_BUCKET"],
    access_key_id=os.environ["GCS_ACCESS_KEY_ID"],
    secret_access_key=os.environ["GCS_SECRET_ACCESS_KEY"],
)

resource = GCSResource(config)


async def main() -> None:
    ws = Workspace({"/gcs/": resource}, mode=MountMode.READ)

    r = await ws.execute("ls /gcs/")
    print(await r.stdout_str())

    r = await ws.execute("cat /gcs/data/example.json | head -n 10")
    print(await r.stdout_str())

    r = await ws.execute("tree /gcs/")
    print(await r.stdout_str())

    r = await ws.execute("find /gcs/ -name '*.parquet'")
    print(await r.stdout_str())

    r = await ws.execute("stat /gcs/data/example.json")
    print(await r.stdout_str())


if __name__ == "__main__":
    asyncio.run(main())

Shell Commands

The GCS resource supports the full set of shell commands since it operates on real file content (text, binary, JSON, CSV, etc.). Large files benefit from range reads to avoid downloading entire objects.
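As a rough illustration of why range reads help, the sketch below implements a `head`-style read that fetches fixed-size byte ranges only until enough lines are available. `head_lines` and the `fetch_range` callable are hypothetical names introduced here; how mirage actually issues range requests is not documented on this page.

```python
from typing import Callable, List


def head_lines(fetch_range: Callable[[int, int], bytes], n: int = 10,
               chunk_size: int = 64 * 1024) -> List[bytes]:
    """Return the first n lines, fetching only as many chunks as needed."""
    buf = b""
    offset = 0
    while buf.count(b"\n") < n:
        # Inclusive byte range, in the style of an HTTP Range header.
        chunk = fetch_range(offset, offset + chunk_size - 1)
        if not chunk:
            break  # reached the end of the object
        buf += chunk
        offset += len(chunk)
    return buf.splitlines()[:n]


# In-memory stand-in for a large remote object.
blob = b"".join(b"line %d\n" % i for i in range(1000))
requests = []


def fake_fetch(start: int, end: int) -> bytes:
    """Records each requested range and returns that slice of the blob."""
    requests.append((start, end))
    return blob[start:end + 1]


first = head_lines(fake_fetch, n=3, chunk_size=16)
print(first)          # [b'line 0', b'line 1', b'line 2']
print(len(requests))  # 2
```

Here only 32 bytes of a multi-kilobyte object are transferred to produce three lines; the same idea applies to `tail` by reading ranges from the end.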

Read Commands

Command | Notes
cat | Read file content
head / tail | First/last N lines
grep / rg | Pattern search (file or directory level)
jq | Query JSON fields
wc | Line/word/byte counts
stat | File metadata (name, size, type, modified)
find | Recursive search with -name, -maxdepth
tree | Directory tree view
nl | Number lines
du | Disk usage summary
file | Detect file type
strings | Extract printable strings from binary
xxd | Hex dump
md5 | MD5 checksum
sha256sum | SHA-256 checksum

Text Processing

Command | Notes
awk | Pattern scanning and processing
sed | Stream editor
tr | Translate or delete characters
sort | Sort lines
uniq | Remove duplicate lines
cut | Extract fields/columns
join | Join lines on a common field
paste | Merge lines side by side
column | Columnate output
fold | Wrap lines to a specified width
expand | Convert tabs to spaces
unexpand | Convert spaces to tabs
fmt | Simple text formatter
rev | Reverse lines
tac | Concatenate and print in reverse
look | Display lines beginning with a given string
shuf | Shuffle lines
tsort | Topological sort
comm | Compare two sorted files
cmp | Compare two files byte by byte
diff | Compare files line by line
patch | Apply a diff patch
iconv | Character encoding conversion

File Operations

Command | Notes
cp | Copy files
mv | Move/rename files
rm | Remove files
mkdir | Create directories
touch | Create empty file or update timestamp
ln | Create symbolic links
tee | Write stdin to file and stdout
mktemp | Create temporary file
split | Split file into pieces
csplit | Split file by context
Path Utilities

Command | Notes
basename | Strip directory from path
dirname | Strip filename from path
realpath | Resolve path
readlink | Print symbolic link target
ls | List directory contents

Compression

Command | Notes
gzip | Compress files
gunzip | Decompress gzip files
zip | Create zip archives
unzip | Extract zip archives
tar | Archive files
zcat | Cat compressed files
zgrep | Grep compressed files

Encoding

Command | Notes
base64 | Base64 encode/decode

Data Format Support

Commands with format-specific variants for structured data files:
Format | Extension | Variants
Parquet | .parquet | cat, head, tail, wc, stat, cut, grep, ls, file
Feather | .feather | cat, head, tail, wc, stat, cut, grep, ls, file
ORC | .orc | cat, head, tail, wc, stat, cut, grep, ls, file
HDF5 | .hdf5 | cat, head, tail, wc, stat, cut, grep, ls, file
These variants auto-detect the format by extension and convert to tabular text (CSV) for processing.
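Extension-based detection can be pictured as a simple lookup on the path suffix. The sketch below is an assumption about the general approach, not mirage's code: `FORMATS` and `detect_format` are hypothetical names, and only the four extensions listed above are taken from this page.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions come from the format table; the mapping itself is illustrative.
FORMATS = {
    ".parquet": "parquet",
    ".feather": "feather",
    ".orc": "orc",
    ".hdf5": "hdf5",
}


def detect_format(path: str) -> Optional[str]:
    """Return the structured format for a path, or None for plain files."""
    return FORMATS.get(PurePosixPath(path).suffix.lower())


print(detect_format("/gcs/data/example.parquet"))  # parquet
print(detect_format("/gcs/data/example.json"))     # None
```

A command such as `head` could then dispatch to a format-specific variant when a format is detected and fall back to the plain-text implementation otherwise.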

Use Cases

  • AI agents accessing GCS data: Mount GCS buckets for agents to read and process datasets
  • Data pipelines: Read and write GCS objects with shell-like commands
  • FUSE mounting: Expose GCS buckets through a virtual FUSE mount for external tools