Usage Guide

First scan

Before you can query or download anything, build the catalog by scanning Drive:

from radiens_drive_catalog import Catalog, Config

config = Config.from_file("config.json")
catalog = Catalog(config)

catalog.scan()

scan() traverses the entire Drive subtree rooted at root_folder_id, finds all xdat filesets and non-xdat assets, and writes the catalog to catalog_path as JSON. For large drives this may take a minute or two — Drive API list calls are paginated.

The catalog file persists between Python sessions. You don't need to call scan() again unless the Drive contents have changed.

Rescanning

Calling scan() again is safe and idempotent. It rebuilds the catalog from Drive but preserves any local_path values for datasets and assets you have already downloaded, so you won't lose track of local files after a rescan.

Datasets and assets that no longer exist on Drive are dropped from the catalog.
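The preserve-and-drop behavior can be pictured as a merge keyed by base_name. This is a hypothetical sketch of the described semantics, not the library's actual code; merge_local_paths and the dict shapes are illustrative:

```python
def merge_local_paths(old_catalog: dict, fresh_scan: dict) -> dict:
    """Rebuild from the fresh Drive scan, carrying over local_path for
    datasets that survive the rescan. Entries absent from fresh_scan
    (deleted on Drive) are dropped."""
    merged = {}
    for base_name, entry in fresh_scan.items():
        old = old_catalog.get(base_name)
        if old and old.get("local_path"):
            # Keep the locally recorded download location across rescans.
            entry = {**entry, "local_path": old["local_path"]}
        merged[base_name] = entry
    return merged

old = {"rat01_session3": {"local_path": "/data"}, "gone": {"local_path": "/x"}}
fresh = {"rat01_session3": {"local_path": None}, "new_ds": {"local_path": None}}
merged = merge_local_paths(old, fresh)
# merged keeps "/data" for rat01_session3, drops "gone", adds "new_ds"
```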


Datasets

list() — filtered DataFrame

catalog.list()                                                      # all datasets
catalog.list(date_folder="2026-02-15_batch")                        # one date folder
catalog.list(date_folder="2026-02-15_batch", experiment="reaching") # one experiment
catalog.list(experiment="reaching")                                 # across all dates

Both parameters are exact-match filters against their respective catalog columns (date_folder and experiment). The full folder name must be passed — for example "2026-02-15_batch", not just "2026-02". Passing only date_folder returns everything within that date folder's tree. Passing experiment narrows to that experiment across all dates.

Both filters are optional — omitting one means "all values for that level".

The return value is a pandas DataFrame with a reset index.
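The exact-match semantics can be reproduced with plain pandas on a toy catalog. This is an illustrative re-implementation under stated assumptions (column names from the catalog schema; list_datasets and the sample rows are invented for the example):

```python
import pandas as pd

# Toy stand-in for catalog.df; real rows come from scan().
df = pd.DataFrame({
    "base_name":   ["rat01_session3", "rat02_session1", "rat01_probe"],
    "date_folder": ["2026-02-15_batch", "2026-02-15_batch", "2026-03-01_batch"],
    "experiment":  ["reaching", "reaching", "probe_test"],
})

def list_datasets(df, date_folder=None, experiment=None):
    # Exact-match filters; omitting a parameter means "all values".
    mask = pd.Series(True, index=df.index)
    if date_folder is not None:
        mask &= df["date_folder"] == date_folder
    if experiment is not None:
        mask &= df["experiment"] == experiment
    return df[mask].reset_index(drop=True)

res = list_datasets(df, date_folder="2026-02-15_batch", experiment="reaching")
# 2 rows; a partial name like "2026-02" matches nothing
```

Note that equality (not substring matching) is what makes "2026-02" return an empty frame.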

df — raw DataFrame

catalog.df

The full dataset catalog as a DataFrame with columns:

Column | Type | Description
------ | ---- | -----------
base_name | str | Dataset identifier
date_folder | str \| None | Top-level date folder name
experiment | str \| None | Depth-2 folder name
drive_path | str | Full slash-joined path from root to containing folder
drive_file_ids | dict | Maps "data", "meta", "timestamp" to Drive file IDs
local_path | str \| None | Local directory path if downloaded, else None

The DataFrame is cached in memory and invalidated automatically after scan() or download().

status() — download overview

catalog.status()

Returns the same DataFrame as df with an extra boolean column is_local that is True when the dataset has been downloaded and the local path still exists on disk. Useful for a quick overview:

s = catalog.status()
print(s[["base_name", "date_folder", "experiment", "is_local"]])
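One plausible way the is_local column could be derived is sketched below. This is illustrative, not the library's code; the check shown (path recorded and still present on disk) mirrors the description above:

```python
import os
import pandas as pd

# Toy frame; local_path is recorded by download(), None if never downloaded.
df = pd.DataFrame({
    "base_name": ["rat01_session3", "rat02_session1"],
    "local_path": [os.getcwd(), None],  # the first path certainly exists
})

# is_local: downloaded AND the recorded path still exists on disk
df["is_local"] = [bool(p) and os.path.isdir(p) for p in df["local_path"]]
# → [True, False]
```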

Working with the DataFrame directly

Since catalog.df is a standard pandas DataFrame, you can use the full pandas API:

# Count datasets per date folder
catalog.df.groupby("date_folder").size()

# Find all datasets that aren't downloaded yet
catalog.status().query("not is_local")

# List all unique experiments
catalog.df["experiment"].dropna().unique()

download() — explicit download

local_path = catalog.download("rat01_session3")

Downloads the three xdat files for a dataset to local_data_dir (as specified in your config). Files are stored flat — all files land directly in local_data_dir, not in per-dataset subdirectories. The base_name prefix naturally groups them:

local_data_dir/
    rat01_session3_data.xdat
    rat01_session3.xdat.json
    rat01_session3_timestamp.xdat

After a successful download, local_path is persisted back to the catalog JSON.
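The three filenames follow directly from base_name, per the naming pattern in the listing above. xdat_filenames is a hypothetical helper, not part of the library's API:

```python
def xdat_filenames(base_name: str) -> list[str]:
    # Naming pattern from the flat layout: data, metadata (JSON), timestamps.
    return [
        f"{base_name}_data.xdat",
        f"{base_name}.xdat.json",
        f"{base_name}_timestamp.xdat",
    ]

names = xdat_filenames("rat01_session3")
# ["rat01_session3_data.xdat", "rat01_session3.xdat.json",
#  "rat01_session3_timestamp.xdat"]
```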

get_path() — download if needed

path = catalog.get_path("rat01_session3")

Returns the local directory path for a dataset. If the dataset hasn't been downloaded yet — or if the recorded local_path no longer exists on disk — the download is triggered automatically. This is the most convenient entry point for analysis scripts:

import numpy as np

path = catalog.get_path("rat01_session3")
data = np.fromfile(f"{path}/rat01_session3_data.xdat", dtype=np.int16)

Assets

Assets are non-xdat files and folders found alongside datasets on Drive: logs directories, PowerPoint slides, writeups, and similar content. They are discovered automatically during scan() and tracked in the same catalog file under a separate "assets" key.

What gets cataloged as an asset:

  • Non-xdat files directly inside a date folder or experiment folder → file assets
  • Subfolders of an experiment folder (e.g. logs/) → folder assets, and also recursed for any xdat datasets they may contain
  • Content deeper than the experiment level is not separately cataloged — it belongs to its parent folder asset and is downloaded with it

Date folders and experiment folders themselves are never cataloged as assets.
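The cataloging rules above can be sketched as a small decision function. This is a hypothetical illustration of the stated rules, not the library's implementation; classify and its depth convention are invented here (depth = how many folders below the root the item's parent sits: 1 = inside a date folder, 2 = inside an experiment folder):

```python
def classify(name: str, depth: int, is_folder: bool):
    """Return "file", "folder", or None if the item is not cataloged as an asset."""
    if is_folder:
        # Only subfolders of an experiment folder become folder assets;
        # date folders and experiment folders themselves are structure, not assets.
        return "folder" if depth == 2 else None
    # Non-xdat files directly inside a date or experiment folder are file assets.
    if depth in (1, 2) and ".xdat" not in name:
        return "file"
    # xdat files are handled as datasets; deeper content belongs to its
    # parent folder asset and is not separately cataloged.
    return None

classify("notes.pptx", 2, is_folder=False)        # "file"
classify("logs", 2, is_folder=True)               # "folder"
classify("rat01_data.xdat", 2, is_folder=False)   # None
classify("deep_note.txt", 3, is_folder=False)     # None
```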

list_assets() — filtered DataFrame

catalog.list_assets()                                                               # all assets
catalog.list_assets(date_folder="2026-02-15_batch")                                 # one date folder
catalog.list_assets(experiment="reaching")                                          # one experiment
catalog.list_assets(date_folder="2026-02-15_batch", experiment="reaching", asset_type="folder")

Accepts the same date and experiment filters as list(), plus an optional asset_type filter ("folder" or "file").

assets_df — raw DataFrame

catalog.assets_df

The full asset catalog as a DataFrame with columns:

Column | Type | Description
------ | ---- | -----------
asset_name | str | File or folder name (e.g. "logs", "notes.pptx")
asset_type | "folder" \| "file" | Whether it is a directory or a file
date_folder | str \| None | Top-level date folder name
experiment | str \| None | Depth-2 folder name
drive_path | str | Slash-joined path to the asset's parent folder
drive_id | str | Google Drive ID of this file or folder
mime_type | str | MIME type as reported by Drive
local_path | str \| None | Local path if downloaded, else None

Identifying assets

Assets are uniquely identified by (drive_path, asset_name). drive_path is the slash-joined path to the parent folder — for example a logs/ folder inside 2026-02-15_batch/reaching/ has drive_path = "2026-02-15_batch/reaching" and asset_name = "logs". This means two logs/ folders from different experiments are distinct entries with different drive_path values.
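The identity rule means the natural in-memory key is the tuple (drive_path, asset_name). A minimal illustration (the dict and entries are invented for the example):

```python
# Assets keyed by (drive_path, asset_name): the same folder name under
# different experiments yields distinct entries rather than a collision.
assets = {}
for drive_path, name in [
    ("2026-02-15_batch/reaching", "logs"),
    ("2026-02-15_batch/probe_test", "logs"),
]:
    assets[(drive_path, name)] = {"asset_name": name, "drive_path": drive_path}

len(assets)  # 2 — the two logs/ folders do not collide
```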

download_asset() — explicit download

local_path = catalog.download_asset("2026-02-15_batch/reaching", "logs")

Downloads an asset to local_data_dir/assets/{drive_path}/{asset_name}:

local_data_dir/
    assets/
        2026-02-15_batch/
            reaching/
                logs/           ← entire folder subtree mirrored here
                    log_0215.txt
                notes.pptx      ← file asset

For folder assets the entire Drive subtree is downloaded recursively.
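The destination path follows mechanically from the layout above. asset_local_path is a hypothetical helper, not part of the library's API:

```python
from pathlib import Path

def asset_local_path(local_data_dir: str, drive_path: str, asset_name: str) -> Path:
    # Mirrors the layout: local_data_dir/assets/{drive_path}/{asset_name}
    return Path(local_data_dir) / "assets" / drive_path / asset_name

p = asset_local_path("local_data_dir", "2026-02-15_batch/reaching", "logs")
# → local_data_dir/assets/2026-02-15_batch/reaching/logs
```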

get_asset_path() — download if needed

path = catalog.get_asset_path("2026-02-15_batch/reaching", "logs")

Returns the local path for an asset, triggering a download if it isn't already available locally or if the recorded path no longer exists on disk.