Usage Guide
First scan
Before you can query or download anything, build the catalog by scanning Drive:
from radiens_drive_catalog import Catalog, Config
config = Config.from_file("config.json")
catalog = Catalog(config)
catalog.scan()
scan() defaults to a flat listing of all files visible to the service account, then reconstructs folder paths via the parents field — this is faster and works even when the root folder is not directly accessible. Pass flat=False for a recursive traversal from the root folder instead. Either way, all xdat filesets and non-xdat assets are found and written to catalog_path as JSON. For large drives this may take a minute or two — Drive API list calls are paginated.
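The parents-based path reconstruction can be sketched in a few lines. This is an illustrative toy, not the library's actual code: the folder IDs and the single-parent assumption mirror the shape of the Drive API's files.list responses, where each file carries a parents field.

```python
# Sketch: rebuild slash-joined paths from a flat Drive listing.
# Keys are Drive folder IDs; "parent" mimics the Drive "parents" field.
folders = {
    "root": {"name": "", "parent": None},
    "f1": {"name": "2026-02-15_batch", "parent": "root"},
    "f2": {"name": "reaching", "parent": "f1"},
}

def folder_path(folder_id):
    """Walk parent links up to the root, joining folder names with slashes."""
    parts = []
    while folder_id is not None:
        entry = folders[folder_id]
        if entry["name"]:  # the root contributes no path segment
            parts.append(entry["name"])
        folder_id = entry["parent"]
    return "/".join(reversed(parts))

drive_path = folder_path("f2")  # "2026-02-15_batch/reaching"
```

Because only folder metadata is walked, this works even when the root folder itself is not directly listable, which is why the flat listing is the default.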
The catalog file persists between Python sessions. You don't need to call scan() again unless the Drive contents have changed.
Rescanning
Calling scan() again is safe and idempotent. It rebuilds the catalog from Drive but preserves any local_path values for datasets and assets you have already downloaded, so you won't lose track of local files after a rescan.
Datasets and assets that no longer exist on Drive are dropped from the catalog.
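One way the preservation-plus-pruning behavior can be pictured is as a left merge of the fresh scan against the old catalog. The column names match the catalog schema; the merge itself is only a sketch of the semantics, not the library's implementation:

```python
import pandas as pd

# Old catalog: one dataset already downloaded, one not.
old = pd.DataFrame({
    "drive_path": ["a/x", "a/y"],
    "base_name": ["d1", "d2"],
    "local_path": ["/data/a/x", None],
})

# Fresh scan: "a/y" vanished from Drive, "a/z" is new.
fresh = pd.DataFrame({
    "drive_path": ["a/x", "a/z"],
    "base_name": ["d1", "d3"],
})

# Left merge keeps only datasets still on Drive, carrying over
# any recorded local_path for survivors.
merged = fresh.merge(
    old[["drive_path", "base_name", "local_path"]],
    on=["drive_path", "base_name"],
    how="left",
)
```

After the merge, d1 retains its local_path, d2 is gone, and the new d3 has no local_path yet.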
Datasets
list() — filtered DataFrame
catalog.list() # all datasets
catalog.list(drive_path="2026-02-15_batch/reaching") # exact folder
catalog.list(drive_path_prefix="2026-02-15_batch") # full date subtree
catalog.list(drive_path_contains="reaching") # any depth
All three filters apply to the drive_path column (the slash-joined path from the Drive root to the folder containing the dataset). Multiple filters are combined with AND semantics. Omitting all filters returns the full catalog.
The return value is a pandas DataFrame with a reset index.
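The AND semantics are equivalent to stacking boolean masks on the drive_path column. A minimal sketch with a toy DataFrame (the real catalog has more columns):

```python
import pandas as pd

df = pd.DataFrame({"drive_path": [
    "2026-02-15_batch/reaching",
    "2026-02-15_batch/resting",
    "2026-03-01_batch/reaching",
]})

# Combining a prefix filter and a substring filter with AND,
# as list(drive_path_prefix=..., drive_path_contains=...) would.
prefix = "2026-02-15_batch"
substring = "reaching"
mask = (
    df["drive_path"].str.startswith(prefix)
    & df["drive_path"].str.contains(substring, regex=False)
)
result = df[mask].reset_index(drop=True)
```

Only the row matching both conditions survives, with the index reset as list() does.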
df — raw DataFrame
The full dataset catalog as a DataFrame with columns:
| Column | Type | Description |
|---|---|---|
| base_name | str | Dataset identifier (not globally unique; pair with drive_path) |
| drive_path | str | Slash-joined path from root to the containing folder |
| drive_file_ids | dict | Maps "data", "meta", "timestamp" to Drive file IDs |
| local_path | str \| None | Local directory path if downloaded, else None |
The DataFrame is cached in memory and invalidated automatically after scan() or download().
Working with the DataFrame directly
Since catalog.df is a standard pandas DataFrame, you can use the full pandas API:
# Find all datasets that aren't downloaded yet
catalog.df[catalog.df["local_path"].isna()]
# Exact path match
catalog.df[catalog.df["drive_path"] == "2026-02-15_batch/reaching"]
# Prefix / subtree
catalog.df[catalog.df["drive_path"].str.startswith("2026-02")]
# Substring search
catalog.df[catalog.df["drive_path"].str.contains("reaching")]
download() — explicit download
Downloads the three xdat files for a dataset. Files are stored under {local_data_dir}/{drive_path}/, mirroring the Drive folder hierarchy:
local_data_dir/
2026-02-15_batch/
reaching/
rat01_session3_data.xdat
rat01_session3.xdat.json
rat01_session3_timestamp.xdat
After a successful download, local_path is persisted back to the catalog JSON.
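The destination layout can be sketched with pathlib. The local_data_dir value here is an assumption standing in for whatever the config specifies:

```python
from pathlib import Path

local_data_dir = Path("/data")  # assumed config value, for illustration
drive_path = "2026-02-15_batch/reaching"
base_name = "rat01_session3"

# Mirror the Drive folder hierarchy under local_data_dir.
dataset_dir = local_data_dir / drive_path

# The three xdat files for one dataset share the base_name.
files = [
    dataset_dir / f"{base_name}_data.xdat",
    dataset_dir / f"{base_name}.xdat.json",
    dataset_dir / f"{base_name}_timestamp.xdat",
]
```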
get_path() — download if needed
Returns the local directory path for a dataset. If the dataset hasn't been downloaded yet — or if the recorded local_path no longer exists on disk — the download is triggered automatically. This is the most convenient entry point for analysis scripts:
import numpy as np
path = catalog.get_path("2026-02-15_batch/reaching", "rat01_session3")
data = np.fromfile(f"{path}/rat01_session3_data.xdat", dtype=np.int16)
Assets
Assets are non-xdat files and folders found alongside datasets on Drive: logs directories, PowerPoint slides, writeups, and similar content. They are discovered automatically during scan() and tracked in the same catalog file under a separate "assets" key.
What gets cataloged as an asset:
- Non-xdat files directly inside a date folder or experiment folder → file assets
- Subfolders of an experiment folder (e.g. logs/) → folder assets, also recursed for any xdat datasets they may contain
- Content deeper than the experiment level is not separately cataloged; it belongs to its parent folder asset and is downloaded with it
Date folders and experiment folders themselves are never cataloged as assets.
assets_df — raw DataFrame
The full asset catalog as a DataFrame with columns:
| Column | Type | Description |
|---|---|---|
| asset_name | str | File or folder name (e.g. "logs", "notes.pptx") |
| asset_type | "folder" \| "file" | Whether it is a directory or a file |
| drive_path | str | Slash-joined path to the asset's parent folder |
| drive_id | str | Google Drive ID of this file or folder |
| mime_type | str | MIME type as reported by Drive |
| local_path | str \| None | Local path if downloaded, else None |
Query with pandas directly:
# All folder assets
catalog.assets_df[catalog.assets_df["asset_type"] == "folder"]
# Assets within a specific date subtree
catalog.assets_df[catalog.assets_df["drive_path"].str.startswith("2026-02-15_batch")]
# Assets not yet downloaded
catalog.assets_df[catalog.assets_df["local_path"].isna()]
Identifying assets
Assets are uniquely identified by (drive_path, asset_name). drive_path is the slash-joined path to the parent folder — for example a logs/ folder inside 2026-02-15_batch/reaching/ has drive_path = "2026-02-15_batch/reaching" and asset_name = "logs". This means two logs/ folders from different experiments are distinct entries with different drive_path values.
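Selecting one asset by its identifying pair is a two-mask lookup on assets_df. A toy sketch with two same-named logs/ folders from different experiments:

```python
import pandas as pd

assets = pd.DataFrame({
    "drive_path": ["2026-02-15_batch/reaching", "2026-02-15_batch/resting"],
    "asset_name": ["logs", "logs"],
    "asset_type": ["folder", "folder"],
})

# The (drive_path, asset_name) pair pins down exactly one row,
# even though both assets are named "logs".
row = assets[
    (assets["drive_path"] == "2026-02-15_batch/reaching")
    & (assets["asset_name"] == "logs")
]
```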
download_asset() — explicit download
Downloads an asset to local_data_dir/assets/{drive_path}/{asset_name}:
local_data_dir/
assets/
2026-02-15_batch/
reaching/
logs/ ← entire folder subtree mirrored here
log_0215.txt
notes.pptx ← file asset
For folder assets the entire Drive subtree is downloaded recursively.
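The asset destination differs from the dataset layout only by the assets/ prefix and the trailing asset name. A pathlib sketch, again with an assumed local_data_dir:

```python
from pathlib import Path

local_data_dir = Path("/data")  # assumed config value, for illustration

# local_data_dir / "assets" / drive_path / asset_name
asset_path = local_data_dir / "assets" / "2026-02-15_batch/reaching" / "logs"
```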
get_asset_path() — download if needed
Returns the local path for an asset, triggering a download if it isn't already available locally or if the recorded path no longer exists on disk.