
radiens_drive_catalog

radiens-drive-catalog

Programmatic catalog and sync tool for xdat neural datasets stored on Google Drive. Datasets are uniquely identified by (drive_path, base_name) and can be queried by exact match, prefix, or substring on drive_path.

Classes

AssetEntry

Bases: TypedDict

One non-xdat asset record stored in the catalog.

Represents a single file or folder found on Drive that is not an xdat dataset — for example a logs/ directory, a PowerPoint, or a writeup.

Attributes:

asset_name (str): The file or folder name as it appears on Drive (e.g. "logs", "notes.pptx").

asset_type (Literal['folder', 'file']): Either "folder" or "file".

drive_path (str): Slash-joined path from the root folder to the parent folder of this asset (e.g. "2026-02-15_batch/reaching"). Together with asset_name, this uniquely identifies an asset.

drive_id (str): Google Drive ID of this file or folder.

mime_type (str): MIME type as reported by the Drive API.

local_path (str | None): Absolute local path to the downloaded file or folder, or None if not yet downloaded.
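For illustration, the record shape above can be written out as a TypedDict. This is a sketch reconstructed from the documented attributes, not the package's own source; the sample record (including its Drive ID) is made up, and the MIME type shown is the one Google Drive uses for folders.

```python
from typing import Literal, Optional, TypedDict


class AssetEntry(TypedDict):
    """Sketch of the documented AssetEntry fields (not the package source)."""

    asset_name: str
    asset_type: Literal["folder", "file"]
    drive_path: str
    drive_id: str
    mime_type: str
    local_path: Optional[str]


# A made-up record for a logs/ folder that has not been downloaded yet.
example: AssetEntry = {
    "asset_name": "logs",
    "asset_type": "folder",
    "drive_path": "2026-02-15_batch/reaching",
    "drive_id": "hypothetical-drive-id",
    "mime_type": "application/vnd.google-apps.folder",
    "local_path": None,
}

# (drive_path, asset_name) is the documented unique key for an asset.
key = (example["drive_path"], example["asset_name"])
print(key)
```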

Catalog

Main interface for radiens-drive-catalog.

Wraps Google Drive scanning, local JSON catalog management, and file download. Queries can go through list_datasets() and list_assets(), or directly through the df and assets_df DataFrames using standard pandas operations.

Recordings are uniquely identified by (drive_path, base_name) where drive_path is the slash-joined path from the Drive root to the containing folder.

Example
from radiens_drive_catalog import Catalog, Config

config = Config.from_file("config.json")
catalog = Catalog(config)

catalog.scan()
hits = catalog.list_datasets(drive_path_prefix="2026-02")
path = catalog.get_dataset_path("2026-02/reaching", "rat01")
catalog.prefetch(drive_path_prefix="2026-02")

Attributes

assets_df property
assets_df

The full assets catalog as a pandas DataFrame. Columns: asset_name, asset_type, drive_path, drive_id, mime_type, local_path

df property
df

The full dataset catalog as a pandas DataFrame. Columns: base_name, drive_path, drive_file_ids, local_path

Functions

__init__
__init__(config=None, *, quiet=False)

Initialize a Catalog.

Parameters:

config (Config | None, default None): Package configuration. When omitted, Config.from_file() is called and the config file is located via the standard resolution order (env var → well-known paths).

quiet (bool, default False): If True, suppress tqdm progress bars during downloads.
download_asset
download_asset(drive_path, asset_name)

Download an asset (file or folder) to the local assets directory.

Assets are stored under {local_data_dir}/assets/{drive_path}/{asset_name}. For folder assets the entire subtree is downloaded recursively.

After a successful download, persists the local_path back to the catalog JSON.

Parameters:

drive_path (str, required): The slash-joined path to the asset's parent folder (e.g. "2026-02-15_batch/reaching"), as shown in assets_df["drive_path"].

asset_name (str, required): The file or folder name (e.g. "logs").

Returns:

str: The local path to the downloaded file or folder.

Raises:

EntryNotFoundError: If the asset is not found in the catalog.

download_dataset
download_dataset(drive_path, base_name)

Download the xdat files for a dataset to the local data directory.

Files are stored under {local_data_dir}/{drive_path}/, mirroring the Drive folder hierarchy. After a successful download, persists the local_path back to the catalog JSON.

Parameters:

drive_path (str, required): The slash-joined path to the folder containing the dataset, as shown in df["drive_path"].

base_name (str, required): The dataset identifier (the shared filename stem).

Returns:

str: The local directory path where the files were written.

Raises:

EntryNotFoundError: If the dataset is not found in the catalog.
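The on-disk layout described for download_dataset and download_asset can be sketched with pathlib. The helper names dataset_dir and asset_path are hypothetical, and the sketch only computes target paths; it does not contact Drive.

```python
from pathlib import Path, PurePosixPath


def dataset_dir(local_data_dir: str, drive_path: str) -> Path:
    """Hypothetical helper: {local_data_dir}/{drive_path}/ for a dataset."""
    # drive_path is slash-joined, so split it with POSIX rules and re-join
    # with the host separator to mirror the Drive folder hierarchy.
    return Path(local_data_dir).joinpath(*PurePosixPath(drive_path).parts)


def asset_path(local_data_dir: str, drive_path: str, asset_name: str) -> Path:
    """Hypothetical helper: {local_data_dir}/assets/{drive_path}/{asset_name}."""
    parent = Path(local_data_dir, "assets").joinpath(*PurePosixPath(drive_path).parts)
    return parent / asset_name


print(dataset_dir("/data/radiens", "2026-02-15_batch/reaching"))
print(asset_path("/data/radiens", "2026-02-15_batch/reaching", "logs"))
```

Keeping assets under a separate assets/ subtree means a non-xdat folder such as logs/ can never collide with a dataset directory that mirrors the same drive_path.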

get_asset_path
get_asset_path(drive_path, asset_name)

Return the local path for an asset, downloading if needed.

If the asset has not been downloaded yet, or if the recorded local_path no longer exists on disk, the download is triggered automatically.

Parameters:

drive_path (str, required): The slash-joined path to the asset's parent folder (e.g. "2026-02-15_batch/reaching").

asset_name (str, required): The file or folder name (e.g. "logs").

Returns:

str: The local path to the downloaded file or folder.

Raises:

EntryNotFoundError: If the asset is not found in the catalog.

get_dataset_path
get_dataset_path(drive_path, base_name)

Return the local directory path for a dataset, downloading if needed.

If the dataset has not been downloaded yet, or if the recorded local_path no longer exists on disk, the download is triggered automatically.

Parameters:

drive_path (str, required): The slash-joined path to the folder containing the dataset, as shown in df["drive_path"].

base_name (str, required): The dataset identifier (the shared filename stem).

Returns:

str: The local directory path where the xdat files reside.

Raises:

EntryNotFoundError: If the dataset is not found in the catalog.
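The download-if-needed behavior of get_asset_path and get_dataset_path reduces to a check on the recorded local_path. The sketch below assumes a plain dict entry and a caller-supplied download function; ensure_local and fake_download are hypothetical stand-ins for the catalog internals.

```python
import os
import tempfile
from typing import Callable, Optional


def ensure_local(entry: dict, download: Callable[[], str]) -> str:
    """Hypothetical helper: return entry["local_path"], downloading first if needed."""
    local_path: Optional[str] = entry.get("local_path")
    if local_path is None or not os.path.exists(local_path):
        # Never downloaded, or the local copy was deleted: fetch it and record
        # the new location (the real Catalog persists this to its JSON file).
        local_path = download()
        entry["local_path"] = local_path
    return local_path


calls = []


def fake_download() -> str:
    # Stands in for a Drive download by creating an empty local directory.
    calls.append("download")
    return tempfile.mkdtemp()


entry = {"drive_path": "2026-02/reaching", "base_name": "rat01", "local_path": None}
first = ensure_local(entry, fake_download)   # triggers a download
second = ensure_local(entry, fake_download)  # path exists, so no download
print(len(calls), first == second)  # 1 True
```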

list_assets
list_assets(
    drive_path=None,
    drive_path_prefix=None,
    drive_path_contains=None,
    asset_type=None,
)

Query the asset catalog and return a filtered DataFrame.

All filters are applied together (AND semantics). Omitting all arguments returns the full asset catalog.

Parameters:

drive_path (str | None, default None): Exact drive_path match for the asset's parent folder (e.g. "2026-02-15_batch/reaching").

drive_path_prefix (str | None, default None): Return only rows whose drive_path starts with this string (e.g. "2026-02-15_batch").

drive_path_contains (str | None, default None): Return only rows whose drive_path contains this substring (e.g. "reaching").

asset_type (str | None, default None): Filter by "folder" or "file". When None, both types are returned.

Examples:

catalog.list_assets()                                          # everything
catalog.list_assets(drive_path="2026-02-15_batch/reaching")    # exact parent folder
catalog.list_assets(drive_path_prefix="2026-02-15_batch")      # subtree
catalog.list_assets(asset_type="folder")                       # folders only

list_datasets
list_datasets(
    drive_path=None,
    drive_path_prefix=None,
    drive_path_contains=None,
)

Query the dataset catalog and return a filtered DataFrame.

All filters are applied together (AND semantics). Omitting all arguments returns the full catalog.

Parameters:

drive_path (str | None, default None): Exact drive_path match (e.g. "2026-02-15_batch/reaching").

drive_path_prefix (str | None, default None): Return only rows whose drive_path starts with this string (e.g. "2026-02-15_batch").

drive_path_contains (str | None, default None): Return only rows whose drive_path contains this substring (e.g. "reaching").

Examples:

catalog.list_datasets()                                          # everything
catalog.list_datasets(drive_path="2026-02-15_batch/reaching")    # exact folder
catalog.list_datasets(drive_path_prefix="2026-02-15_batch")      # subtree
catalog.list_datasets(drive_path_contains="reaching")            # any depth
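The AND-combined filters shared by list_datasets and list_assets can be approximated over plain records. This is a stdlib sketch for illustration only (the real methods return a pandas DataFrame); filter_rows and the sample rows are hypothetical.

```python
from typing import Iterable, List, Optional


def filter_rows(
    rows: Iterable[dict],
    drive_path: Optional[str] = None,
    drive_path_prefix: Optional[str] = None,
    drive_path_contains: Optional[str] = None,
    asset_type: Optional[str] = None,
) -> List[dict]:
    """Hypothetical helper: keep rows that pass every given filter (None = wildcard)."""
    out = []
    for row in rows:
        p = row["drive_path"]
        if drive_path is not None and p != drive_path:
            continue
        if drive_path_prefix is not None and not p.startswith(drive_path_prefix):
            continue
        if drive_path_contains is not None and drive_path_contains not in p:
            continue
        if asset_type is not None and row.get("asset_type") != asset_type:
            continue
        out.append(row)
    return out


rows = [
    {"base_name": "rat01", "drive_path": "2026-02-15_batch/reaching"},
    {"base_name": "rat02", "drive_path": "2026-02-15_batch/grasping"},
    {"base_name": "rat03", "drive_path": "2026-03-01_batch/reaching/probe1"},
]
hits = filter_rows(rows, drive_path_prefix="2026-02", drive_path_contains="reaching")
print([r["base_name"] for r in hits])  # ['rat01']
```

Because the prefix test is a plain string comparison rather than a path-component match, drive_path_prefix="2026-02" also matches "2026-02-15_batch/...", which is what the Catalog example relies on.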

prefetch
prefetch(
    drive_path=None,
    drive_path_prefix=None,
    drive_path_contains=None,
    *,
    datasets=True,
    assets=True,
    asset_type=None,
    force=False,
)

Bulk-download matching datasets and assets, skipping already-local items.

Idempotent by default: entries whose recorded local_path already exists on disk are skipped. Running twice in a row issues no downloads on the second call. This makes prefetch safe to use both for proactive ("pre-warm before going offline") and reactive ("ensure everything under X is available") workflows.

The drive_path filters have the same semantics as list_datasets() / list_assets(): they are AND-combined, and any omitted filter acts as a wildcard.

Parameters:

drive_path (str | None, default None): Exact drive_path match.

drive_path_prefix (str | None, default None): Match rows whose drive_path starts with this string.

drive_path_contains (str | None, default None): Match rows whose drive_path contains this substring.

datasets (bool, default True): When False, no datasets are downloaded.

assets (bool, default True): When False, no assets are downloaded.

asset_type (str | None, default None): Restrict asset downloads to "folder" or "file" assets.

force (bool, default False): When True, download every matching entry regardless of whether it is already present on disk. Useful for refreshing stale or incomplete local copies.

Returns:

PrefetchResult: A PrefetchResult with per-category download and skip counts.

Examples:

catalog.prefetch()                                                   # everything
catalog.prefetch(drive_path_prefix="2026-02")                        # subtree
catalog.prefetch(drive_path_prefix="2026-02", assets=False)          # datasets only
catalog.prefetch(drive_path_prefix="2026-02", asset_type="folder")   # datasets + folder assets
catalog.prefetch(force=True)                                         # re-download everything
catalog.prefetch(drive_path_prefix="2026-02", force=True)            # re-download a subtree
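The skip-or-download loop that makes prefetch idempotent, and the PrefetchResult counts it reports, can be sketched as follows. The dataclass is a local stand-in mirroring the documented fields; prefetch_entries and fake_download are hypothetical, and only datasets are shown (assets would follow the same pattern).

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PrefetchResult:
    """Stand-in mirroring the documented PrefetchResult fields."""

    datasets_downloaded: int = 0
    datasets_skipped: int = 0
    assets_downloaded: int = 0
    assets_skipped: int = 0


def prefetch_entries(
    entries: List[dict], download: Callable[[dict], str], force: bool = False
) -> Tuple[int, int]:
    """Hypothetical helper: fetch entries not present locally; return (downloaded, skipped)."""
    downloaded = skipped = 0
    for entry in entries:
        local = entry.get("local_path")
        if not force and local is not None and os.path.exists(local):
            skipped += 1  # already on disk, so this entry is a no-op
            continue
        entry["local_path"] = download(entry)  # record where it landed
        downloaded += 1
    return downloaded, skipped


def fake_download(entry: dict) -> str:
    # Stands in for a Drive download by creating an empty local directory.
    return tempfile.mkdtemp()


datasets = [
    {"drive_path": "2026-02/reaching", "base_name": "rat01", "local_path": None},
    {"drive_path": "2026-02/reaching", "base_name": "rat02", "local_path": None},
]
first = PrefetchResult(*prefetch_entries(datasets, fake_download))
second = PrefetchResult(*prefetch_entries(datasets, fake_download))
print(first.datasets_downloaded, second.datasets_skipped)  # 2 2
```

The second call downloads nothing because every local_path now exists; passing force=True bypasses that check.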

scan
scan(*, flat=True)

Scan Drive and rebuild the catalog JSON.

Any existing local_path entries are preserved so a rescan doesn't forget which datasets or assets have already been downloaded.

Parameters:

flat (bool, default True): If True (the default), use a flat scan of all files visible to the service account; this is faster and works even when the root folder is inaccessible. Set to False for a recursive traversal from the root folder.

Returns:

ScanResult: A ScanResult with new/existing/removed counts for datasets and assets.
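One way to preserve local_path across a rescan and produce ScanResult-style counts is to key the old and newly scanned entries by (drive_path, base_name). This sketch assumes catalog entries are plain dicts; merge_scan is a hypothetical name, and only datasets are shown.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (drive_path, base_name)


def merge_scan(old: List[dict], scanned: List[dict]) -> Tuple[List[dict], Dict[str, int]]:
    """Hypothetical merge: carry local_path over from the old catalog and count changes."""
    old_by_key: Dict[Key, dict] = {(e["drive_path"], e["base_name"]): e for e in old}
    old_keys = set(old_by_key)
    new_keys = set()
    merged = []
    for entry in scanned:
        key = (entry["drive_path"], entry["base_name"])
        new_keys.add(key)
        previous = old_by_key.get(key)
        # Copy the entry and keep any recorded download location.
        merged.append(dict(entry, local_path=previous["local_path"] if previous else None))
    counts = {
        "datasets_new": len(new_keys - old_keys),
        "datasets_existing": len(new_keys & old_keys),
        "datasets_removed": len(old_keys - new_keys),
    }
    return merged, counts


old = [
    {"drive_path": "2026-02/reaching", "base_name": "rat01", "local_path": "/data/2026-02/reaching"},
    {"drive_path": "2026-02/reaching", "base_name": "rat09", "local_path": None},
]
scanned = [
    {"drive_path": "2026-02/reaching", "base_name": "rat01"},
    {"drive_path": "2026-02/grasping", "base_name": "rat02"},
]
merged, counts = merge_scan(old, scanned)
print(counts)  # {'datasets_new': 1, 'datasets_existing': 1, 'datasets_removed': 1}
```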

CatalogError

Bases: Exception

Base class for all radiens-drive-catalog errors.

Config dataclass

Configuration for radiens-drive-catalog.

All path fields (credentials_path, local_data_dir, catalog_path) support ~ and $ENV_VAR expansion and are resolved to absolute paths on construction. Typically created via Config.from_file() rather than directly.

Attributes:

credentials_path (str): Path to the Google service account credentials JSON file.

root_folder_id (str): Google Drive folder ID of the data root folder.

local_data_dir (str): Local directory where datasets are downloaded (non-xdat assets go under its assets/ subdirectory).

catalog_path (str): Path to the catalog JSON file (created by Catalog.scan()).

Functions

__post_init__
__post_init__()

Expand ~ and $ENV_VARS in all path fields and resolve to absolute paths.
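The expansion step can be reproduced with the standard library. This sketch assumes os.path.expandvars, os.path.expanduser, and os.path.abspath are applied in that order; the package may do it differently (for example with pathlib), and expand_path and the RADIENS_DATA variable are hypothetical.

```python
import os


def expand_path(path: str) -> str:
    """Hypothetical helper: expand $ENV_VARS and ~, then make the path absolute."""
    return os.path.abspath(os.path.expanduser(os.path.expandvars(path)))


os.environ["RADIENS_DATA"] = "/mnt/lab"  # hypothetical variable, set for the demo
print(expand_path("$RADIENS_DATA/xdat"))      # /mnt/lab/xdat on POSIX systems
print(expand_path("~/radiens/catalog.json"))  # a path under your home directory
print(expand_path("catalog.json"))            # relative paths resolve against the cwd
```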

from_file classmethod
from_file(path=None)

Load config from a JSON file.

Resolution order when path is None:

  1. RADIENS_DRIVE_CATALOG_CONFIG environment variable.
  2. .secrets/config.json in the current working directory.
  3. config.json in the current working directory.
  4. ~/.config/radiens-drive/config.json in the user's home directory.
  5. /etc/radiens-drive/config.json.

Parameters:

path (str | None, default None): Path to the config JSON file. When None, the resolution order above is used.

Returns:

Config: A Config instance with all paths expanded and resolved.

Raises:

FileNotFoundError: If no config file can be located.

JSONDecodeError: If the config file is not valid JSON.

TypeError: If the JSON fields do not match the Config field names.
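The resolution order above can be sketched as a search for the first existing candidate. find_config is a hypothetical helper; env, cwd, and home are parameters only so the logic can be exercised without touching the real environment. The sketch assumes an environment-variable path that does not exist is skipped like any other candidate; the real from_file() may instead fail immediately in that case.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "RADIENS_DRIVE_CATALOG_CONFIG"


def find_config(env: Mapping[str, str], cwd: Path, home: Path) -> Optional[Path]:
    """Hypothetical helper: first existing config file in the documented order."""
    candidates = []
    if ENV_VAR in env:
        candidates.append(Path(env[ENV_VAR]))
    candidates += [
        cwd / ".secrets" / "config.json",
        cwd / "config.json",
        home / ".config" / "radiens-drive" / "config.json",
        Path("/etc/radiens-drive/config.json"),
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # from_file() documents FileNotFoundError for this case


with tempfile.TemporaryDirectory() as tmp:
    cwd, home = Path(tmp, "work"), Path(tmp, "home")
    (cwd / ".secrets").mkdir(parents=True)
    (cwd / "config.json").write_text("{}")
    found_plain = find_config({}, cwd, home)   # falls through to cwd/config.json
    (cwd / ".secrets" / "config.json").write_text("{}")
    found_secret = find_config({}, cwd, home)  # .secrets/ now takes precedence
    print(found_plain.name, found_secret.parent.name)  # config.json .secrets
```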

DatasetEntry

Bases: TypedDict

One xdat dataset record stored in the catalog.

Represents a single neural recording dataset. The three xdat files (_data.xdat, .xdat.json, _timestamp.xdat) share a common base_name stem.

Attributes:

base_name (str): Shared filename stem across all three xdat files. Note that base_name is not globally unique; the pair (drive_path, base_name) uniquely identifies a recording.

drive_path (str): Slash-joined path from the root folder to the folder containing the dataset files (e.g. "2026-02-15_batch/reaching/probe1").

drive_file_ids (dict[str, str]): Maps file type labels ("data", "meta", "timestamp") to their Google Drive file IDs. May contain fewer than three keys if only some files were found during scanning.

local_path (str | None): Absolute path to the local directory where the dataset has been downloaded, or None if not yet downloaded.
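To make the base_name / drive_file_ids relationship concrete, the sketch below groups one folder's file listing into DatasetEntry-shaped dicts. The suffix-to-label mapping (_data.xdat to "data", .xdat.json to "meta", _timestamp.xdat to "timestamp") is an assumption inferred from the documented file names and labels; group_xdat_files and the sample IDs are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed mapping from filename suffix to drive_file_ids label.
SUFFIX_LABELS = {
    "_data.xdat": "data",
    ".xdat.json": "meta",
    "_timestamp.xdat": "timestamp",
}


def group_xdat_files(drive_path: str, files: List[Tuple[str, str]]) -> List[dict]:
    """Hypothetical helper: group (filename, drive_id) pairs into dataset records."""
    by_stem: Dict[str, Dict[str, str]] = {}
    for name, file_id in files:
        for suffix, label in SUFFIX_LABELS.items():
            if name.endswith(suffix):
                stem = name[: -len(suffix)]
                by_stem.setdefault(stem, {})[label] = file_id
                break
        # Files matching no suffix (e.g. notes.pptx) are not datasets.
    return [
        {"base_name": stem, "drive_path": drive_path,
         "drive_file_ids": ids, "local_path": None}
        for stem, ids in sorted(by_stem.items())
    ]


files = [
    ("rat01_data.xdat", "id-1"),
    ("rat01.xdat.json", "id-2"),
    ("rat01_timestamp.xdat", "id-3"),
    ("rat02_data.xdat", "id-4"),  # only one of the three files present
    ("notes.pptx", "id-5"),       # not an xdat file
]
entries = group_xdat_files("2026-02-15_batch/reaching/probe1", files)
print([(e["base_name"], sorted(e["drive_file_ids"])) for e in entries])
# [('rat01', ['data', 'meta', 'timestamp']), ('rat02', ['data'])]
```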

EntryNotFoundError

Bases: CatalogError

Raised when a dataset or asset cannot be found in the catalog.
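Because EntryNotFoundError derives from CatalogError, callers can catch either the specific error or the package-wide base class. The sketch redefines both classes locally so it runs on its own (in real code they are presumably importable from radiens_drive_catalog, as Catalog and Config are); lookup is a hypothetical stand-in for a catalog method.

```python
class CatalogError(Exception):
    """Local stand-in for the package's base error."""


class EntryNotFoundError(CatalogError):
    """Local stand-in: raised when a dataset or asset is not in the catalog."""


def lookup(catalog: dict, drive_path: str, base_name: str) -> dict:
    """Hypothetical lookup keyed by (drive_path, base_name)."""
    try:
        return catalog[(drive_path, base_name)]
    except KeyError:
        raise EntryNotFoundError(f"{drive_path}/{base_name} not in catalog") from None


catalog = {("2026-02/reaching", "rat01"): {"local_path": None}}
caught = None
try:
    lookup(catalog, "2026-02/reaching", "rat99")
except CatalogError as exc:  # the base class also catches EntryNotFoundError
    caught = type(exc).__name__
print(caught)  # EntryNotFoundError
```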

PrefetchResult dataclass

Summary of a Catalog.prefetch() call.

Attributes:

datasets_downloaded (int): Number of datasets fetched from Drive.

datasets_skipped (int): Number of datasets already available locally.

assets_downloaded (int): Number of assets fetched from Drive.

assets_skipped (int): Number of assets already available locally.

ScanResult dataclass

Summary of a Catalog.scan() call.

Attributes:

datasets_new (int): Datasets found on Drive that were not in the previous catalog.

datasets_existing (int): Datasets found on Drive that were already in the catalog.

datasets_removed (int): Datasets in the previous catalog that were not found on Drive.

assets_new (int): Assets found on Drive that were not in the previous catalog.

assets_existing (int): Assets found on Drive that were already in the catalog.

assets_removed (int): Assets in the previous catalog that were not found on Drive.