radiens_drive_catalog.catalog

catalog.py

The main interface for radiens-drive-catalog. The Catalog class wraps Drive scanning, local caching, and downloading behind a simple API.

Typical usage

from radiens_drive_catalog import Catalog, Config

config = Config.from_file("config.json")
catalog = Catalog(config)

# Build or refresh the catalog from Drive
catalog.scan()

# Query using convenience methods or pandas directly
df = catalog.list_datasets(drive_path_prefix="2026-02-15_batch")
assets = catalog.list_assets(asset_type="folder")

# Get a local path, downloading if needed
path = catalog.get_dataset_path("2026-02-15_batch/reaching", "rat01_2026-02-14_probe1")
path = catalog.get_asset_path("2026-02-15_batch/reaching", "logs")

# Download explicitly
catalog.download_dataset("2026-02-15_batch/reaching", "rat01_2026-02-14_probe1")
catalog.download_asset("2026-02-15_batch/reaching", "logs")

# Bulk prefetch (idempotent: skips items already on disk)
result = catalog.prefetch(drive_path_prefix="2026-02-15_batch")

# Access the raw DataFrames
catalog.df
catalog.assets_df

Classes

Catalog

Main interface for radiens-drive-catalog.

Wraps Google Drive scanning, local JSON catalog management, and file download. Querying is done either through the list_datasets and list_assets convenience methods or directly on the df and assets_df DataFrames using standard pandas operations.

Recordings are uniquely identified by (drive_path, base_name) where drive_path is the slash-joined path from the Drive root to the containing folder.

Example
from radiens_drive_catalog import Catalog, Config

config = Config.from_file("config.json")
catalog = Catalog(config)

catalog.scan()
hits = catalog.list_datasets(drive_path_prefix="2026-02")
path = catalog.get_dataset_path("2026-02/reaching", "rat01")
catalog.prefetch(drive_path_prefix="2026-02")

Attributes

assets_df property
assets_df

The full assets catalog as a pandas DataFrame. Columns: asset_name, asset_type, drive_path, drive_id, mime_type, local_path, upload_time

df property
df

The full dataset catalog as a pandas DataFrame. Columns: base_name, drive_path, drive_file_ids, local_path, upload_time

Functions

__init__
__init__(config=None, *, quiet=False)

Initialize a Catalog.

Parameters:

Name Type Description Default
config Config | None

Package configuration. When omitted, Config.from_file() is called and the config file is located via the standard resolution order (env var → well-known paths).

None
quiet bool

If True, suppress tqdm progress bars during downloads.

False

download_asset
download_asset(drive_path, asset_name)

Download an asset (file or folder) to the local assets directory.

Assets are stored under {local_data_dir}/assets/{drive_path}/{asset_name}. For folder assets the entire subtree is downloaded recursively.

After a successful download, persists the local_path back to the catalog JSON.
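
The storage layout described above can be sketched with pathlib. This is only an illustration of the documented path template; the helper name asset_local_path and the "/data/radiens" directory are hypothetical and not part of the package.

```python
from pathlib import Path


def asset_local_path(local_data_dir: str, drive_path: str, asset_name: str) -> Path:
    """Compose {local_data_dir}/assets/{drive_path}/{asset_name}.

    Hypothetical helper: it mirrors the documented layout only and does not
    call into radiens-drive-catalog.
    """
    # drive_path is slash-joined, so split it into folder parts before joining.
    return Path(local_data_dir, "assets", *drive_path.split("/"), asset_name)


example = asset_local_path("/data/radiens", "2026-02-15_batch/reaching", "logs")
print(example.as_posix())
```

For a folder asset such as "logs", the resulting path would be the root of the recursively downloaded subtree.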

Parameters:

Name Type Description Default
drive_path str

The slash-joined path to the asset's parent folder (e.g. "2026-02-15_batch/reaching"). Shown in assets_df["drive_path"].

required
asset_name str

The file or folder name (e.g. "logs").

required

Returns:

Type Description
str

The local path to the downloaded file or folder.

Raises:

Type Description
EntryNotFoundError

If the asset is not found in the catalog.

download_dataset
download_dataset(drive_path, base_name)

Download the xdat files for a dataset to the local data directory.

Files are stored under {local_data_dir}/{drive_path}/, mirroring the Drive folder hierarchy. After a successful download, persists the local_path back to the catalog JSON.

Parameters:

Name Type Description Default
drive_path str

The slash-joined path to the folder containing the dataset (as shown in df["drive_path"]).

required
base_name str

The dataset identifier (shared filename stem).

required

Returns:

Type Description
str

The local directory path where the files were written.

Raises:

Type Description
EntryNotFoundError

If the dataset is not found in the catalog.

get_asset_path
get_asset_path(drive_path, asset_name)

Return the local path for an asset, downloading if needed.

If the asset has not been downloaded yet, or if the recorded local_path no longer exists on disk, the download is triggered automatically.

Parameters:

Name Type Description Default
drive_path str

The slash-joined path to the asset's parent folder (e.g. "2026-02-15_batch/reaching").

required
asset_name str

The file or folder name (e.g. "logs").

required

Returns:

Type Description
str

The local path to the downloaded file or folder.

Raises:

Type Description
EntryNotFoundError

If the asset is not found in the catalog.

get_dataset_path
get_dataset_path(drive_path, base_name)

Return the local directory path for a dataset, downloading if needed.

If the dataset has not been downloaded yet, or if the recorded local_path no longer exists on disk, the download is triggered automatically.
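
A minimal sketch of the "return the path, download only if needed" behaviour described above. It assumes the catalog can be modelled as a dict keyed by (drive_path, base_name); the function get_path, the fake_download callback, and the local EntryNotFoundError class are illustrative stand-ins, not the package's implementation.

```python
import os
import tempfile


class EntryNotFoundError(KeyError):
    """Stand-in for the package's exception of the same name."""


def get_path(entries, key, download):
    """Return a usable local path for key, calling download(key) only if needed."""
    if key not in entries:
        raise EntryNotFoundError(key)
    local_path = entries[key].get("local_path")
    # Download when nothing is recorded or the recorded path has vanished.
    if not local_path or not os.path.exists(local_path):
        local_path = download(key)
        entries[key]["local_path"] = local_path
    return local_path


calls = []


def fake_download(key):
    # Pretend to download by creating an empty temporary directory.
    calls.append(key)
    return tempfile.mkdtemp()


key = ("2026-02-15_batch/reaching", "rat01_2026-02-14_probe1")
entries = {key: {"local_path": None}}
first = get_path(entries, key, fake_download)   # triggers the download
second = get_path(entries, key, fake_download)  # reuses the recorded path
```

Deleting the directory between calls would make the next get_path call download again, which matches the "no longer exists on disk" case.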

Parameters:

Name Type Description Default
drive_path str

The slash-joined path to the folder containing the dataset (as shown in df["drive_path"]).

required
base_name str

The dataset identifier (shared filename stem).

required

Returns:

Type Description
str

The local directory path where the xdat files reside.

Raises:

Type Description
EntryNotFoundError

If the dataset is not found in the catalog.

list_assets
list_assets(
    drive_path=None,
    drive_path_prefix=None,
    drive_path_contains=None,
    asset_type=None,
)

Query the asset catalog and return a filtered DataFrame.

All filters are applied together (AND semantics). Omitting all arguments returns the full asset catalog.

Parameters:

Name Type Description Default
drive_path str | None

Exact drive_path match for the asset's parent folder (e.g. "2026-02-15_batch/reaching").

None
drive_path_prefix str | None

Return only rows whose drive_path starts with this string (e.g. "2026-02-15_batch").

None
drive_path_contains str | None

Return only rows whose drive_path contains this substring (e.g. "reaching").

None
asset_type str | None

Filter by "folder" or "file". When None both types are returned.

None

Examples:

catalog.list_assets()  # everything
catalog.list_assets(drive_path="2026-02-15_batch/reaching")  # exact parent folder
catalog.list_assets(drive_path_prefix="2026-02-15_batch")  # subtree
catalog.list_assets(asset_type="folder")  # folders only

list_datasets
list_datasets(
    drive_path=None,
    drive_path_prefix=None,
    drive_path_contains=None,
)

Query the dataset catalog and return a filtered DataFrame.

All filters are applied together (AND semantics). Omitting all arguments returns the full catalog.

Parameters:

Name Type Description Default
drive_path str | None

Exact drive_path match (e.g. "2026-02-15_batch/reaching").

None
drive_path_prefix str | None

Return only rows whose drive_path starts with this string (e.g. "2026-02-15_batch").

None
drive_path_contains str | None

Return only rows whose drive_path contains this substring (e.g. "reaching").

None

Examples:

catalog.list_datasets()  # everything
catalog.list_datasets(drive_path="2026-02-15_batch/reaching")  # exact folder
catalog.list_datasets(drive_path_prefix="2026-02-15_batch")  # subtree
catalog.list_datasets(drive_path_contains="reaching")  # any depth
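
The AND semantics of the three drive_path filters can be illustrated on plain dictionaries. This is a sketch only: filter_rows is a hypothetical helper, the rows are made-up sample data, and the real method returns a pandas DataFrame rather than a list.

```python
def filter_rows(rows, drive_path=None, drive_path_prefix=None, drive_path_contains=None):
    """Keep rows that satisfy every filter that was supplied (AND semantics)."""

    def keep(row):
        path = row["drive_path"]
        # An omitted filter (None) acts as a wildcard.
        return (
            (drive_path is None or path == drive_path)
            and (drive_path_prefix is None or path.startswith(drive_path_prefix))
            and (drive_path_contains is None or drive_path_contains in path)
        )

    return [row for row in rows if keep(row)]


# Made-up sample rows for illustration.
rows = [
    {"drive_path": "2026-02-15_batch/reaching", "base_name": "rat01"},
    {"drive_path": "2026-02-15_batch/grasping", "base_name": "rat02"},
    {"drive_path": "2026-03-01_batch/reaching", "base_name": "rat03"},
]

both = filter_rows(rows, drive_path_prefix="2026-02-15_batch", drive_path_contains="reaching")
print([row["base_name"] for row in both])
```

Combining a prefix with a substring narrows the result to rows that match both conditions, while calling filter_rows(rows) with no filters returns every row.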

prefetch
prefetch(
    drive_path=None,
    drive_path_prefix=None,
    drive_path_contains=None,
    *,
    datasets=True,
    assets=True,
    asset_type=None,
    force=False,
)

Bulk-download matching datasets and assets, skipping already-local items.

Idempotent by default: entries whose recorded local_path already exists on disk are skipped. Running twice in a row issues no downloads on the second call. This makes prefetch safe to use both for proactive ("pre-warm before going offline") and reactive ("ensure everything under X is available") workflows.

The drive_path filters have the same semantics as list_datasets and list_assets: they are AND-combined, and any omitted filter is a wildcard.
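
The idempotency rule can be sketched as a loop that skips any entry whose recorded local_path still exists, unless force is set. The names prefetch_sketch, Counts, and fake_download are hypothetical, and the entries are plain dicts standing in for catalog rows; this is not the package's implementation.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Counts:
    """Simplified stand-in for one category of PrefetchResult."""
    downloaded: int = 0
    skipped: int = 0


def prefetch_sketch(entries, download, force=False):
    """Download entries that are missing locally; skip the rest unless force=True."""
    counts = Counts()
    for entry in entries:
        local_path = entry.get("local_path")
        if not force and local_path and os.path.exists(local_path):
            counts.skipped += 1
            continue
        entry["local_path"] = download(entry)
        counts.downloaded += 1
    return counts


def fake_download(entry):
    # Pretend to download by creating an empty temporary directory.
    return tempfile.mkdtemp()


entries = [{"base_name": "a"}, {"base_name": "b"}]
first = prefetch_sketch(entries, fake_download)               # downloads both
second = prefetch_sketch(entries, fake_download)              # skips both
forced = prefetch_sketch(entries, fake_download, force=True)  # downloads both again
```

Under these assumptions the second call performs no downloads, which is the property that makes repeated prefetch calls safe.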

Parameters:

Name Type Description Default
drive_path str | None

Exact drive_path match.

None
drive_path_prefix str | None

Match rows whose drive_path starts with this string.

None
drive_path_contains str | None

Match rows whose drive_path contains this substring.

None
datasets bool

When False, no datasets are downloaded.

True
assets bool

When False, no assets are downloaded.

True
asset_type str | None

Restrict to "folder" or "file" assets.

None
force bool

When True, download every matching entry regardless of whether it is already present on disk. Useful for refreshing stale or incomplete local copies. Defaults to False.

False

Returns:

Type Description
PrefetchResult

Per-category download and skip counts.

Examples:

catalog.prefetch()  # everything
catalog.prefetch(drive_path_prefix="2026-02")  # subtree
catalog.prefetch(drive_path_prefix="2026-02", assets=False)  # datasets only
catalog.prefetch(drive_path_prefix="2026-02", asset_type="folder")
catalog.prefetch(force=True)  # re-download everything
catalog.prefetch(drive_path_prefix="2026-02", force=True)  # re-download a subtree

scan
scan(*, flat=True)

Scan Drive and rebuild the catalog JSON.

Any existing local_path entries are preserved so a rescan doesn't forget which datasets or assets have already been downloaded.

Parameters:

Name Type Description Default
flat bool

If True (the default), use a flat scan of all files visible to the service account. This is faster and works even when the root folder is inaccessible. Set to False for a recursive traversal from the root folder.

True

Returns:

Type Description
ScanResult

New/existing/removed counts for datasets and assets.

PrefetchResult dataclass

Summary of a Catalog.prefetch call.

Attributes:

Name Type Description
datasets_downloaded int

Number of datasets fetched from Drive.

datasets_skipped int

Number of datasets already available locally.

assets_downloaded int

Number of assets fetched from Drive.

assets_skipped int

Number of assets already available locally.

ScanResult dataclass

Summary of a Catalog.scan call.

Attributes:

Name Type Description
datasets_new int

Datasets found on Drive that were not in the previous catalog.

datasets_existing int

Datasets found on Drive that were already in the catalog.

datasets_removed int

Datasets in the previous catalog that were not found on Drive.

assets_new int

Assets found on Drive that were not in the previous catalog.

assets_existing int

Assets found on Drive that were already in the catalog.

assets_removed int

Assets in the previous catalog that were not found on Drive.
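
The new/existing/removed counts, together with the preservation of local_path across rescans, can be sketched as a merge of two dictionaries keyed by (drive_path, base_name). The function merge_scan and the sample entries are hypothetical; the package's actual catalog format and merge logic may differ.

```python
def merge_scan(previous, scanned):
    """Merge a fresh scan into the previous catalog.

    Both arguments are dicts keyed by (drive_path, base_name). Returns the
    merged catalog and a dict of new/existing/removed counts.
    """
    merged = {}
    for key, entry in scanned.items():
        merged[key] = dict(entry)
        old = previous.get(key)
        # Carry over a previously recorded download location, if any.
        if old and old.get("local_path"):
            merged[key]["local_path"] = old["local_path"]
    counts = {
        "new": len(scanned.keys() - previous.keys()),
        "existing": len(scanned.keys() & previous.keys()),
        "removed": len(previous.keys() - scanned.keys()),
    }
    return merged, counts


# Made-up catalogs for illustration.
previous = {
    ("a", "x"): {"local_path": "/data/a/x"},
    ("a", "gone"): {"local_path": None},
}
scanned = {
    ("a", "x"): {"local_path": None},
    ("b", "y"): {"local_path": None},
}
merged, counts = merge_scan(previous, scanned)
print(counts)
```

In this sketch an entry that disappears from Drive is counted as removed and dropped from the merged catalog, while an entry seen in both scans keeps its earlier local_path.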
