Usage Guide
First scan
Before you can query or download anything, build the catalog by scanning Drive:
from radiens_drive_catalog import Catalog, Config
config = Config.from_file("config.json")
catalog = Catalog(config)
catalog.scan()
scan() defaults to a flat listing of all files visible to the service account, then reconstructs folder paths via the parents field — this is faster and works even when the root folder is not directly accessible. Pass flat=False for a recursive traversal from the root folder instead. Either way, all xdat filesets and non-xdat assets are found and written to catalog_path as JSON. For large drives this may take a minute or two — Drive API list calls are paginated.
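The parents-based path reconstruction can be sketched in a few lines. This is an illustrative toy, not the library's actual code: the folder IDs and the single-parent assumption mirror the shape of the Drive API's files.list responses, where each file carries a parents field.

```python
# Sketch: rebuild slash-joined paths from a flat Drive listing.
# Keys are Drive folder IDs; "parent" mimics the Drive "parents" field.
folders = {
    "root": {"name": "", "parent": None},
    "f1": {"name": "2026-02-15_batch", "parent": "root"},
    "f2": {"name": "reaching", "parent": "f1"},
}

def folder_path(folder_id):
    """Walk parent links up to the root, joining folder names with slashes."""
    parts = []
    while folder_id is not None:
        entry = folders[folder_id]
        if entry["name"]:  # the root contributes no path segment
            parts.append(entry["name"])
        folder_id = entry["parent"]
    return "/".join(reversed(parts))

drive_path = folder_path("f2")  # "2026-02-15_batch/reaching"
```

Because only folder metadata is walked, this works even when the root folder itself is not directly listable, which is why the flat listing is the default.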
The catalog file persists between Python sessions. You don't need to call scan() again unless the Drive contents have changed.
Rescanning
Calling scan() again is safe and idempotent. It rebuilds the catalog from Drive but preserves any local_path values for datasets and assets you have already downloaded, so you won't lose track of local files after a rescan.
Datasets and assets that no longer exist on Drive are dropped from the catalog.
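One way the preservation-plus-pruning behavior can be pictured is as a left merge of the fresh scan against the old catalog. The column names match the catalog schema; the merge itself is only a sketch of the semantics, not the library's implementation:

```python
import pandas as pd

# Old catalog: one dataset already downloaded, one not.
old = pd.DataFrame({
    "drive_path": ["a/x", "a/y"],
    "base_name": ["d1", "d2"],
    "local_path": ["/data/a/x", None],
})

# Fresh scan: "a/y" vanished from Drive, "a/z" is new.
fresh = pd.DataFrame({
    "drive_path": ["a/x", "a/z"],
    "base_name": ["d1", "d3"],
})

# Left merge keeps only datasets still on Drive, carrying over
# any recorded local_path for survivors.
merged = fresh.merge(
    old[["drive_path", "base_name", "local_path"]],
    on=["drive_path", "base_name"],
    how="left",
)
```

After the merge, d1 retains its local_path, d2 is gone, and the new d3 has no local_path yet.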
Datasets
list() — filtered DataFrame
catalog.list() # all datasets
catalog.list(drive_path="2026-02-15_batch/reaching") # exact folder
catalog.list(drive_path_prefix="2026-02-15_batch") # full date subtree
catalog.list(drive_path_contains="reaching") # any depth
All three filters apply to the drive_path column (the slash-joined path from the Drive root to the folder containing the dataset). Multiple filters are combined with AND semantics. Omitting all filters returns the full catalog.
The return value is a pandas DataFrame with a reset index.
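The AND semantics are equivalent to stacking boolean masks on the drive_path column. A minimal sketch with a toy DataFrame (the real catalog has more columns):

```python
import pandas as pd

df = pd.DataFrame({"drive_path": [
    "2026-02-15_batch/reaching",
    "2026-02-15_batch/resting",
    "2026-03-01_batch/reaching",
]})

# Combining a prefix filter and a substring filter with AND,
# as list(drive_path_prefix=..., drive_path_contains=...) would.
prefix = "2026-02-15_batch"
substring = "reaching"
mask = (
    df["drive_path"].str.startswith(prefix)
    & df["drive_path"].str.contains(substring, regex=False)
)
result = df[mask].reset_index(drop=True)
```

Only the row matching both conditions survives, with the index reset as list() does.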
df — raw DataFrame
The full dataset catalog as a DataFrame with columns:
| Column | Type | Description |
|---|---|---|
| base_name | str | Dataset identifier (not globally unique; pair with drive_path) |
| drive_path | str | Slash-joined path from root to the containing folder |
| drive_file_ids | dict | Maps "data", "meta", "timestamp" to Drive file IDs |
| local_path | str \| None | Local directory path if downloaded, else None |
The DataFrame is cached in memory and invalidated automatically after scan() or download().
Working with the DataFrame directly
Since catalog.df is a standard pandas DataFrame, you can use the full pandas API:
# Find all datasets that aren't downloaded yet
catalog.df[catalog.df["local_path"].isna()]
# Exact path match
catalog.df[catalog.df["drive_path"] == "2026-02-15_batch/reaching"]
# Prefix / subtree
catalog.df[catalog.df["drive_path"].str.startswith("2026-02")]
# Substring search
catalog.df[catalog.df["drive_path"].str.contains("reaching")]
download() — explicit download
Downloads the three xdat files for a dataset. Files are stored under {local_data_dir}/{drive_path}/, mirroring the Drive folder hierarchy:
local_data_dir/
2026-02-15_batch/
reaching/
rat01_session3_data.xdat
rat01_session3.xdat.json
rat01_session3_timestamp.xdat
After a successful download, local_path is persisted back to the catalog JSON.
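The destination layout can be sketched with pathlib. The local_data_dir value here is an assumption standing in for whatever the config specifies:

```python
from pathlib import Path

local_data_dir = Path("/data")  # assumed config value, for illustration
drive_path = "2026-02-15_batch/reaching"
base_name = "rat01_session3"

# Mirror the Drive folder hierarchy under local_data_dir.
dataset_dir = local_data_dir / drive_path

# The three xdat files for one dataset share the base_name.
files = [
    dataset_dir / f"{base_name}_data.xdat",
    dataset_dir / f"{base_name}.xdat.json",
    dataset_dir / f"{base_name}_timestamp.xdat",
]
```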
get_path() — download if needed
Returns the local directory path for a dataset. If the dataset hasn't been downloaded yet — or if the recorded local_path no longer exists on disk — the download is triggered automatically. This is the most convenient entry point for analysis scripts:
import numpy as np
path = catalog.get_path("2026-02-15_batch/reaching", "rat01_session3")
data = np.fromfile(f"{path}/rat01_session3_data.xdat", dtype=np.int16)
Assets
Assets are non-xdat files and folders found alongside datasets on Drive: logs directories, PowerPoint slides, writeups, and similar content. They are discovered automatically during scan() and tracked in the same catalog file under a separate "assets" key.
What gets cataloged as an asset:
- Non-xdat files directly inside a date folder or experiment folder → file assets
- Subfolders of an experiment folder (e.g. logs/) → folder assets, also recursed for any xdat datasets they may contain
- Content deeper than the experiment level is not separately cataloged; it belongs to its parent folder asset and is downloaded with it
Date folders and experiment folders themselves are never cataloged as assets.
assets_df — raw DataFrame
The full asset catalog as a DataFrame with columns:
| Column | Type | Description |
|---|---|---|
| asset_name | str | File or folder name (e.g. "logs", "notes.pptx") |
| asset_type | "folder" \| "file" | Whether it is a directory or a file |
| drive_path | str | Slash-joined path to the asset's parent folder |
| drive_id | str | Google Drive ID of this file or folder |
| mime_type | str | MIME type as reported by Drive |
| local_path | str \| None | Local path if downloaded, else None |
Query with pandas directly:
# All folder assets
catalog.assets_df[catalog.assets_df["asset_type"] == "folder"]
# Assets within a specific date subtree
catalog.assets_df[catalog.assets_df["drive_path"].str.startswith("2026-02-15_batch")]
# Assets not yet downloaded
catalog.assets_df[catalog.assets_df["local_path"].isna()]
Identifying assets
Assets are uniquely identified by (drive_path, asset_name). drive_path is the slash-joined path to the parent folder — for example a logs/ folder inside 2026-02-15_batch/reaching/ has drive_path = "2026-02-15_batch/reaching" and asset_name = "logs". This means two logs/ folders from different experiments are distinct entries with different drive_path values.
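Selecting one asset by its identifying pair is a two-mask lookup on assets_df. A toy sketch with two same-named logs/ folders from different experiments:

```python
import pandas as pd

assets = pd.DataFrame({
    "drive_path": ["2026-02-15_batch/reaching", "2026-02-15_batch/resting"],
    "asset_name": ["logs", "logs"],
    "asset_type": ["folder", "folder"],
})

# The (drive_path, asset_name) pair pins down exactly one row,
# even though both assets are named "logs".
row = assets[
    (assets["drive_path"] == "2026-02-15_batch/reaching")
    & (assets["asset_name"] == "logs")
]
```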
download_asset() — explicit download
Downloads an asset to local_data_dir/assets/{drive_path}/{asset_name}:
local_data_dir/
assets/
2026-02-15_batch/
reaching/
logs/ ← entire folder subtree mirrored here
log_0215.txt
notes.pptx ← file asset
For folder assets the entire Drive subtree is downloaded recursively.
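The asset destination differs from the dataset layout only by the assets/ prefix and the trailing asset name. A pathlib sketch, again with an assumed local_data_dir:

```python
from pathlib import Path

local_data_dir = Path("/data")  # assumed config value, for illustration

# local_data_dir / "assets" / drive_path / asset_name
asset_path = local_data_dir / "assets" / "2026-02-15_batch/reaching" / "logs"
```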
get_asset_path() — download if needed
Returns the local path for an asset, triggering a download if it isn't already available locally or if the recorded path no longer exists on disk.