# Usage Guide
## First scan

Before you can query or download anything, build the catalog by scanning Drive:

```python
from radiens_drive_catalog import Catalog, Config

config = Config.from_file("config.json")
catalog = Catalog(config)
catalog.scan()
```
scan() traverses the entire Drive subtree rooted at root_folder_id, finds all xdat filesets and non-xdat assets, and writes the catalog to catalog_path as JSON. For large drives this may take a minute or two — Drive API list calls are paginated.
The catalog file persists between Python sessions. You don't need to call scan() again unless the Drive contents have changed.
## Rescanning
Calling scan() again is safe and idempotent. It rebuilds the catalog from Drive but preserves any local_path values for datasets and assets you have already downloaded, so you won't lose track of local files after a rescan.
Datasets and assets that no longer exist on Drive are dropped from the catalog.
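The preserve-then-rebuild behavior can be pictured as a left join of the freshly scanned entries against the old catalog. This is only a sketch in pandas with made-up rows, not the library's actual implementation:

```python
import pandas as pd

# Old catalog: "a" was downloaded, "b" was not.
old = pd.DataFrame({
    "base_name": ["a", "b"],
    "local_path": ["/data/a", None],
})

# Fresh scan: "b" no longer exists on Drive, "c" is new.
fresh = pd.DataFrame({"base_name": ["a", "c"]})

# Left join keeps every freshly scanned dataset, carries over any
# known local_path, and implicitly drops the vanished "b".
rebuilt = fresh.merge(old, on="base_name", how="left")
print(rebuilt)
```

The left join direction is the key design point: Drive is the source of truth for what exists, while the old catalog only contributes local download state.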
## Datasets
### list() — filtered DataFrame

```python
catalog.list()                                                     # all datasets
catalog.list(date_folder="2026-02-15_batch")                       # one date folder
catalog.list(date_folder="2026-02-15_batch", experiment="reaching")  # one experiment
catalog.list(experiment="reaching")                                # across all dates
```
Both parameters are exact-match filters against their respective catalog columns (date_folder and experiment). The full folder name must be passed — for example "2026-02-15_batch", not just "2026-02". Passing only date_folder returns everything within that date folder's tree. Passing experiment narrows to that experiment across all dates.
Both filters are optional — omitting one means "all values for that level".
The return value is a pandas DataFrame with a reset index.
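The exact-match semantics amount to boolean masks on the catalog DataFrame. An equivalent pandas sketch, with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    "base_name": ["rat01_session3", "rat02_session1", "rat01_session9"],
    "date_folder": ["2026-02-15_batch", "2026-02-15_batch", "2026-03-01_batch"],
    "experiment": ["reaching", "grasping", "reaching"],
})

# Equivalent of catalog.list(date_folder=..., experiment=...):
# both filters are exact string matches, combined with AND.
mask = (df["date_folder"] == "2026-02-15_batch") & (df["experiment"] == "reaching")
print(df[mask].reset_index(drop=True))  # only rat01_session3
```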
### df — raw DataFrame
The full dataset catalog as a DataFrame with columns:
| Column | Type | Description |
|---|---|---|
| base_name | str | Dataset identifier |
| date_folder | str \| None | Top-level date folder name |
| experiment | str \| None | Depth-2 folder name |
| drive_path | str | Full slash-joined path from root to containing folder |
| drive_file_ids | dict | Maps "data", "meta", "timestamp" to Drive file IDs |
| local_path | str \| None | Local directory path if downloaded, else None |
The DataFrame is cached in memory and invalidated automatically after scan() or download().
### status() — download overview

Returns the same DataFrame as df with an extra boolean column is_local, which is True when the dataset has been downloaded and its local path still exists on disk. Useful for a quick overview of download state.
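With a mock of the status() output, the quick-overview pattern might look like this (the rows are invented; the is_local column is as described above):

```python
import pandas as pd

# Mock of what catalog.status() returns
status = pd.DataFrame({
    "base_name": ["rat01_session3", "rat02_session1"],
    "is_local": [True, False],
})

# Quick overview: which datasets still need downloading?
missing = status.loc[~status["is_local"], "base_name"]
print(list(missing))  # ['rat02_session1']
```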
### Working with the DataFrame directly
Since catalog.df is a standard pandas DataFrame, you can use the full pandas API:
```python
# Count datasets per date folder
catalog.df.groupby("date_folder").size()

# Find all datasets that aren't downloaded yet
catalog.status().query("not is_local")

# List all unique experiments
catalog.df["experiment"].dropna().unique()
```
### download() — explicit download

Downloads the three xdat files for a dataset to local_data_dir (as specified in your config). Files are stored flat — all files land directly in local_data_dir, not in per-dataset subdirectories; the base_name prefix naturally groups them.
After a successful download, local_path is persisted back to the catalog JSON.
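Because the layout is flat, collecting one dataset's files locally is a simple prefix glob. A self-contained sketch using a temporary directory and invented file names (the exact file-name suffixes are assumptions, not the library's guaranteed naming):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    local_data_dir = Path(d)

    # Simulate a flat local_data_dir after two downloads.
    for name in ["rat01_session3_data.xdat", "rat01_session3_meta.xdat",
                 "rat01_session3_timestamp.xdat", "rat02_session1_data.xdat"]:
        (local_data_dir / name).touch()

    # The base_name prefix groups one dataset's files together.
    files = sorted(p.name for p in local_data_dir.glob("rat01_session3*"))
    print(files)  # the three rat01_session3 files, not rat02's
```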
### get_path() — download if needed
Returns the local directory path for a dataset. If the dataset hasn't been downloaded yet — or if the recorded local_path no longer exists on disk — the download is triggered automatically. This is the most convenient entry point for analysis scripts:
```python
import numpy as np

path = catalog.get_path("rat01_session3")
data = np.fromfile(f"{path}/rat01_session3_data.xdat", dtype=np.int16)
```
## Assets
Assets are non-xdat files and folders found alongside datasets on Drive: logs directories, PowerPoint slides, writeups, and similar content. They are discovered automatically during scan() and tracked in the same catalog file under a separate "assets" key.
What gets cataloged as an asset:
- Non-xdat files directly inside a date folder or experiment folder → file assets
- Subfolders of an experiment folder (e.g. logs/) → folder assets; these are also recursed for any xdat datasets they may contain
- Content deeper than the experiment level is not separately cataloged — it belongs to its parent folder asset and is downloaded with it
Date folders and experiment folders themselves are never cataloged as assets.
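The rules above can be expressed as a toy classifier. The depth convention and the helper itself are illustrative only, not part of the library's API:

```python
def classify(parent_depth: int, is_folder: bool, name: str) -> str:
    """Toy asset-rule sketch.

    parent_depth: 0 = item sits in the root folder, 1 = in a date
    folder, 2 = in an experiment folder, 3+ = deeper.
    """
    if name.endswith(".xdat"):
        return "dataset file"   # handled by dataset discovery, not as an asset
    if is_folder and parent_depth == 2:
        return "folder asset"   # e.g. logs/ inside an experiment folder
    if not is_folder and parent_depth in (1, 2):
        return "file asset"     # directly inside a date or experiment folder
    return "not cataloged"      # date/experiment folders themselves, or deeper content

print(classify(2, True, "logs"))         # folder asset
print(classify(1, False, "notes.pptx"))  # file asset
print(classify(1, True, "reaching"))     # not cataloged (experiment folder)
```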
### list_assets() — filtered DataFrame

```python
catalog.list_assets()                                  # all assets
catalog.list_assets(date_folder="2026-02-15_batch")    # one date folder
catalog.list_assets(experiment="reaching")             # one experiment
catalog.list_assets(date_folder="2026-02-15_batch", experiment="reaching", asset_type="folder")
```
Accepts the same date and experiment filters as list(), plus an optional asset_type filter ("folder" or "file").
### assets_df — raw DataFrame
The full asset catalog as a DataFrame with columns:
| Column | Type | Description |
|---|---|---|
| asset_name | str | File or folder name (e.g. "logs", "notes.pptx") |
| asset_type | "folder" \| "file" | Whether it is a directory or a file |
| date_folder | str \| None | Top-level date folder name |
| experiment | str \| None | Depth-2 folder name |
| drive_path | str | Slash-joined path to the asset's parent folder |
| drive_id | str | Google Drive ID of this file or folder |
| mime_type | str | MIME type as reported by Drive |
| local_path | str \| None | Local path if downloaded, else None |
### Identifying assets
Assets are uniquely identified by (drive_path, asset_name). drive_path is the slash-joined path to the parent folder — for example a logs/ folder inside 2026-02-15_batch/reaching/ has drive_path = "2026-02-15_batch/reaching" and asset_name = "logs". This means two logs/ folders from different experiments are distinct entries with different drive_path values.
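The identity rule can be checked directly in pandas: two logs/ folders from different experiments are distinct rows because their drive_path values differ (mock data below):

```python
import pandas as pd

assets = pd.DataFrame({
    "drive_path": ["2026-02-15_batch/reaching", "2026-02-15_batch/grasping"],
    "asset_name": ["logs", "logs"],
})

# Same asset_name, but the (drive_path, asset_name) pair is unique.
print(assets.duplicated(subset=["drive_path", "asset_name"]).any())  # False
print(assets.set_index(["drive_path", "asset_name"]).index.is_unique)  # True
```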
### download_asset() — explicit download
Downloads an asset to local_data_dir/assets/{drive_path}/{asset_name}:
```
local_data_dir/
  assets/
    2026-02-15_batch/
      reaching/
        logs/            ← entire folder subtree mirrored here
          log_0215.txt
        notes.pptx       ← file asset
```
For folder assets the entire Drive subtree is downloaded recursively.
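The destination path is mechanical to construct from the catalog columns. A sketch with pathlib (PurePosixPath so the separators match the layout above; the local_data_dir value is an assumed config setting):

```python
from pathlib import PurePosixPath

local_data_dir = PurePosixPath("/data")  # assumed value from config
drive_path = "2026-02-15_batch/reaching"
asset_name = "logs"

# local_data_dir/assets/{drive_path}/{asset_name}
dest = local_data_dir / "assets" / drive_path / asset_name
print(dest)  # /data/assets/2026-02-15_batch/reaching/logs
```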
### get_asset_path() — download if needed
Returns the local path for an asset, triggering a download if it isn't already available locally or if the recorded path no longer exists on disk.