radiens_drive_catalog.catalog
catalog.py
The main interface for radiens-drive-catalog. The Catalog class wraps Drive scanning, local caching, and downloading behind a simple API.
Typical usage
from radiens_drive_catalog import Catalog, Config
config = Config.from_file("config.json")
catalog = Catalog(config)
Build or refresh the catalog from Drive
catalog.scan()
Query using convenience methods or pandas directly
df = catalog.list_datasets(drive_path_prefix="2026-02-15_batch")
assets = catalog.list_assets(asset_type="folder")
Get a local path, downloading if needed
path = catalog.get_dataset_path("2026-02-15_batch/reaching", "rat01_2026-02-14_probe1")
path = catalog.get_asset_path("2026-02-15_batch/reaching", "logs")
Download explicitly
catalog.download_dataset("2026-02-15_batch/reaching", "rat01_2026-02-14_probe1")
catalog.download_asset("2026-02-15_batch/reaching", "logs")
Bulk prefetch (idempotent — skips items already on disk)
result = catalog.prefetch(drive_path_prefix="2026-02-15_batch")
Access the raw DataFrames
catalog.df
catalog.assets_df
Classes
Catalog
Main interface for radiens-drive-catalog.
Wraps Google Drive scanning, local JSON catalog management, and file
download. Querying is done directly on the df and assets_df
DataFrames using standard pandas operations.
Recordings are uniquely identified by (drive_path, base_name) where
drive_path is the slash-joined path from the Drive root to the
containing folder.
Example
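As a rough illustration of the identity rule above, the sketch below builds a drive_path by slash-joining folder names and uses the (drive_path, base_name) pair as a dictionary key. The helper `make_key`, the `index` dictionary, and the "grasping" folder are hypothetical and are not part of the package.

```python
def make_key(folder_parts, base_name):
    """Return the (drive_path, base_name) pair for a recording.

    folder_parts lists the folder names from the Drive root down to the
    folder that contains the recording (hypothetical input format).
    """
    drive_path = "/".join(folder_parts)
    return (drive_path, base_name)


# Two recordings may share a base_name as long as their folders differ.
index = {
    make_key(["2026-02-15_batch", "reaching"], "rat01_2026-02-14_probe1"): "first",
    make_key(["2026-02-15_batch", "grasping"], "rat01_2026-02-14_probe1"): "second",
}

key = make_key(["2026-02-15_batch", "reaching"], "rat01_2026-02-14_probe1")
print(key[0])      # 2026-02-15_batch/reaching
print(index[key])  # first
```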
Attributes
assets_df
property
The full assets catalog as a pandas DataFrame. Columns: asset_name, asset_type, drive_path, drive_id, mime_type, local_path, upload_time
df
property
The full dataset catalog as a pandas DataFrame. Columns: base_name, drive_path, drive_file_ids, local_path, upload_time
Functions
__init__
Initialize a Catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | Config \| None | Package configuration. When omitted, … | None |
| quiet | bool | If … | False |
download_asset
Download an asset (file or folder) to the local assets directory.
Assets are stored under {local_data_dir}/assets/{drive_path}/{asset_name}.
For folder assets the entire subtree is downloaded recursively.
After a successful download, persists the local_path back to the
catalog JSON.
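The storage layout described above can be sketched with pathlib. This is only an illustration of the path template: the helper `asset_local_path` and the `/data/radiens` directory are assumptions, and the package may assemble the path differently.

```python
from pathlib import PurePosixPath


def asset_local_path(local_data_dir, drive_path, asset_name):
    """Build {local_data_dir}/assets/{drive_path}/{asset_name}."""
    return PurePosixPath(local_data_dir) / "assets" / drive_path / asset_name


# "/data/radiens" is a made-up local data directory.
path = asset_local_path("/data/radiens", "2026-02-15_batch/reaching", "logs")
print(path)  # /data/radiens/assets/2026-02-15_batch/reaching/logs
```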
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drive_path | str | The slash-joined path to the asset's parent folder (e.g. …). | required |
| asset_name | str | The file or folder name (e.g. …). | required |
Returns:
| Type | Description |
|---|---|
| str | The local path to the downloaded file or folder. |
Raises:
| Type | Description |
|---|---|
| EntryNotFoundError | If the asset is not found in the catalog. |
download_dataset
Download the xdat files for a dataset to the local data directory.
Files are stored under {local_data_dir}/{drive_path}/, mirroring
the Drive folder hierarchy. After a successful download, persists the
local_path back to the catalog JSON.
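A minimal sketch of the two steps above, assuming a catalog entry is a plain dictionary with the base_name, drive_path, and local_path columns listed for df. The helpers `dataset_local_dir` and `record_download`, the `/data/radiens` directory, and the JSON shape are all hypothetical; the actual on-disk catalog format is not shown in this reference.

```python
import json
from pathlib import PurePosixPath


def dataset_local_dir(local_data_dir, drive_path):
    """Build {local_data_dir}/{drive_path}, mirroring the Drive hierarchy."""
    return PurePosixPath(local_data_dir) / drive_path


def record_download(entry, local_data_dir):
    """Store the download directory on a catalog entry and return it."""
    local_dir = str(dataset_local_dir(local_data_dir, entry["drive_path"]))
    entry["local_path"] = local_dir
    return local_dir


entry = {
    "base_name": "rat01_2026-02-14_probe1",
    "drive_path": "2026-02-15_batch/reaching",
    "local_path": None,
}
record_download(entry, "/data/radiens")

# Serializing the updated entry stands in for rewriting the catalog JSON.
print(json.dumps(entry, indent=2))
```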
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drive_path | str | The slash-joined path to the folder containing the dataset (as shown in …). | required |
| base_name | str | The dataset identifier (shared filename stem). | required |
Returns:
| Type | Description |
|---|---|
| str | The local directory path where the files were written. |
Raises:
| Type | Description |
|---|---|
| EntryNotFoundError | If the dataset is not found in the catalog. |
get_asset_path
Return the local path for an asset, downloading if needed.
If the asset has not been downloaded yet, or if the recorded
local_path no longer exists on disk, the download is triggered
automatically.
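The download-if-needed rule can be sketched as follows. This is an illustration under assumptions, not the package's implementation: `resolve_local_path` and `fake_download` are hypothetical, and a catalog entry is modeled as a plain dictionary.

```python
import os
import tempfile


def resolve_local_path(entry, download):
    """Return the entry's local path, calling download(entry) when it is missing."""
    local_path = entry.get("local_path")
    if local_path is None or not os.path.exists(local_path):
        # Either never downloaded, or the recorded path was deleted or moved.
        local_path = download(entry)
        entry["local_path"] = local_path
    return local_path


calls = []


def fake_download(entry):
    """Stand-in downloader: creates a temp directory and records the call."""
    calls.append(entry["asset_name"])
    return tempfile.mkdtemp()


entry = {"asset_name": "logs", "local_path": None}
first = resolve_local_path(entry, fake_download)   # triggers a download
second = resolve_local_path(entry, fake_download)  # path exists, no download
os.rmdir(first)
third = resolve_local_path(entry, fake_download)   # path vanished, downloads again
print(len(calls))  # 2
os.rmdir(third)
```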
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drive_path | str | The slash-joined path to the asset's parent folder (e.g. …). | required |
| asset_name | str | The file or folder name (e.g. …). | required |
Returns:
| Type | Description |
|---|---|
| str | The local path to the downloaded file or folder. |
Raises:
| Type | Description |
|---|---|
| EntryNotFoundError | If the asset is not found in the catalog. |
get_dataset_path
Return the local directory path for a dataset, downloading if needed.
If the dataset has not been downloaded yet, or if the recorded
local_path no longer exists on disk, the download is triggered
automatically.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drive_path | str | The slash-joined path to the folder containing the dataset (as shown in …). | required |
| base_name | str | The dataset identifier (shared filename stem). | required |
Returns:
| Type | Description |
|---|---|
| str | The local directory path where the xdat files reside. |
Raises:
| Type | Description |
|---|---|
| EntryNotFoundError | If the dataset is not found in the catalog. |
list_assets
Query the asset catalog and return a filtered DataFrame.
All filters are applied together (AND semantics). Omitting all arguments returns the full asset catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drive_path | str \| None | Exact drive_path match. | None |
| drive_path_prefix | str \| None | Return only rows whose drive_path starts with this string. | None |
| drive_path_contains | str \| None | Return only rows whose drive_path contains this string. | None |
| asset_type | str \| None | Filter by asset_type (e.g. "folder"). | None |
Examples:
catalog.list_assets()  # everything
catalog.list_assets(drive_path="2026-02-15_batch/reaching")  # exact parent folder
catalog.list_assets(drive_path_prefix="2026-02-15_batch")  # subtree
catalog.list_assets(asset_type="folder")  # folders only
list_datasets
Query the dataset catalog and return a filtered DataFrame.
All filters are applied together (AND semantics). Omitting all arguments returns the full catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drive_path | str \| None | Exact drive_path match. | None |
| drive_path_prefix | str \| None | Return only rows whose drive_path starts with this string. | None |
| drive_path_contains | str \| None | Return only rows whose drive_path contains this string. | None |
Examples:
catalog.list_datasets()  # everything
catalog.list_datasets(drive_path="2026-02-15_batch/reaching")  # exact folder
catalog.list_datasets(drive_path_prefix="2026-02-15_batch")  # subtree
catalog.list_datasets(drive_path_contains="reaching")  # any depth
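The AND semantics of the drive_path filters can be illustrated on plain dictionaries. The sketch assumes that the prefix and contains filters are ordinary string tests (`str.startswith` and substring membership); the function `filter_rows` and the sample rows are hypothetical, and the package itself returns pandas DataFrames.

```python
def filter_rows(rows, drive_path=None, drive_path_prefix=None,
                drive_path_contains=None):
    """Keep rows that satisfy every filter that was supplied (AND semantics)."""
    matched = []
    for row in rows:
        path = row["drive_path"]
        if drive_path is not None and path != drive_path:
            continue
        if drive_path_prefix is not None and not path.startswith(drive_path_prefix):
            continue
        if drive_path_contains is not None and drive_path_contains not in path:
            continue
        matched.append(row)
    return matched


rows = [
    {"drive_path": "2026-02-15_batch/reaching", "base_name": "rat01"},
    {"drive_path": "2026-02-15_batch/grasping", "base_name": "rat02"},
    {"drive_path": "2026-03-01_batch/reaching", "base_name": "rat03"},
]

print(len(filter_rows(rows)))  # 3
print(len(filter_rows(rows, drive_path_prefix="2026-02-15_batch")))  # 2
both = filter_rows(rows, drive_path_prefix="2026-02-15_batch",
                   drive_path_contains="reaching")
print([r["base_name"] for r in both])  # ['rat01']
```

An omitted filter is simply never checked, which is why calling the function with no arguments returns every row.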
prefetch
prefetch(
drive_path=None,
drive_path_prefix=None,
drive_path_contains=None,
*,
datasets=True,
assets=True,
asset_type=None,
force=False,
)
Bulk-download matching datasets and assets, skipping already-local items.
Idempotent by default: entries whose recorded local_path already
exists on disk are skipped. Running twice in a row issues no downloads
on the second call. This makes prefetch safe to use both for
proactive ("pre-warm before going offline") and reactive ("ensure
everything under X is available") workflows.
The drive_path filters have the same semantics as in list_datasets and
list_assets: they are AND-combined, and any omitted filter acts as a
wildcard.
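The skip-if-local behavior can be sketched as below. This is a simplified model, not the package's code: `Counts` is a hypothetical stand-in for the downloaded/skipped tallies, `prefetch_entries` and `fake_download` are invented for the example, and entries are plain dictionaries.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical stand-in for the downloaded/skipped tallies."""
    downloaded: int = 0
    skipped: int = 0


def prefetch_entries(entries, download, force=False):
    """Download entries that are not on disk; skip the rest unless force is set."""
    counts = Counts()
    for entry in entries:
        local_path = entry.get("local_path")
        if not force and local_path and os.path.exists(local_path):
            counts.skipped += 1
            continue
        entry["local_path"] = download(entry)
        counts.downloaded += 1
    return counts


with tempfile.TemporaryDirectory() as root:
    def fake_download(entry):
        """Stand-in downloader: creates a directory named after the entry."""
        path = os.path.join(root, entry["base_name"])
        os.makedirs(path, exist_ok=True)
        return path

    entries = [{"base_name": "rat01"}, {"base_name": "rat02"}]
    first = prefetch_entries(entries, fake_download)
    second = prefetch_entries(entries, fake_download)
    forced = prefetch_entries(entries, fake_download, force=True)

print(first)   # Counts(downloaded=2, skipped=0)
print(second)  # Counts(downloaded=0, skipped=2)
print(forced)  # Counts(downloaded=2, skipped=0)
```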
Parameters:
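| Name | Type | Description | Default |
|---|---|---|---|
| drive_path | str \| None | Exact drive_path match. | None |
| drive_path_prefix | str \| None | Match rows whose drive_path starts with this string. | None |
| drive_path_contains | str \| None | Match rows whose drive_path contains this string. | None |
| datasets | bool | When True, include datasets. | True |
| assets | bool | When True, include assets. | True |
| asset_type | str \| None | Restrict to assets of this asset_type. | None |
| force | bool | When True, re-download items even if they already exist locally. | False |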
Returns:
| Type | Description |
|---|---|
| PrefetchResult | A PrefetchResult summarizing how many datasets and assets were downloaded or skipped. |
Examples:
catalog.prefetch()  # everything
catalog.prefetch(drive_path_prefix="2026-02")  # subtree
catalog.prefetch(drive_path_prefix="2026-02", assets=False)  # datasets only
catalog.prefetch(drive_path_prefix="2026-02", asset_type="folder")
catalog.prefetch(force=True)  # re-download everything
catalog.prefetch(drive_path_prefix="2026-02", force=True)  # re-download a subtree
scan
Scan Drive and rebuild the catalog JSON.
Any existing local_path entries are preserved so a rescan doesn't forget which datasets or assets have already been downloaded.
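One way to picture the rescan is as a merge keyed by (drive_path, base_name). The sketch below is an assumption about how such a merge could work, not the package's actual logic; `merge_scan` and the sample entries are hypothetical. The three counts correspond to the new, existing, and removed tallies reported by ScanResult.

```python
def merge_scan(previous, scanned):
    """Merge a fresh scan with the previous catalog, keeping known local paths.

    Both arguments map (drive_path, base_name) to an entry dict.
    Returns the merged catalog plus (new, existing, removed) counts.
    """
    merged = {}
    new = existing = 0
    for key, entry in scanned.items():
        entry = dict(entry)
        if key in previous:
            existing += 1
            # Carry over the recorded download location, if any.
            entry["local_path"] = previous[key].get("local_path")
        else:
            new += 1
            entry["local_path"] = None
        merged[key] = entry
    removed = sum(1 for key in previous if key not in scanned)
    return merged, (new, existing, removed)


previous = {
    ("batch/reaching", "rat01"): {"local_path": "/data/batch/reaching"},
    ("batch/reaching", "rat02"): {"local_path": None},
}
scanned = {
    ("batch/reaching", "rat01"): {},
    ("batch/reaching", "rat03"): {},
}

merged, counts = merge_scan(previous, scanned)
print(counts)  # (1, 1, 1)
print(merged[("batch/reaching", "rat01")]["local_path"])  # /data/batch/reaching
```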
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| flat | bool | If … | True |
Returns:
| Type | Description |
|---|---|
| ScanResult | A ScanResult summarizing new, existing, and removed datasets and assets. |
PrefetchResult
dataclass
Summary of a Catalog.prefetch call.
Attributes:
| Name | Type | Description |
|---|---|---|
| datasets_downloaded | int | Number of datasets fetched from Drive. |
| datasets_skipped | int | Number of datasets already available locally. |
| assets_downloaded | int | Number of assets fetched from Drive. |
| assets_skipped | int | Number of assets already available locally. |
ScanResult
dataclass
Summary of a Catalog.scan call.
Attributes:
| Name | Type | Description |
|---|---|---|
| datasets_new | int | Datasets found on Drive that were not in the previous catalog. |
| datasets_existing | int | Datasets found on Drive that were already in the catalog. |
| datasets_removed | int | Datasets in the previous catalog that were not found on Drive. |
| assets_new | int | Assets found on Drive that were not in the previous catalog. |
| assets_existing | int | Assets found on Drive that were already in the catalog. |
| assets_removed | int | Assets in the previous catalog that were not found on Drive. |