Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

No unreleased changes.

[0.0.11] - 2026-06-08

Added

RecordingEntry TypedDict — replaces DatasetEntry (fields unchanged).
DriveItemEntry TypedDict — replaces AssetEntry; asset_name/asset_type fields replaced by name: str and is_folder: bool.
AmbiguousRecordingError exception (subclass of CatalogError) — raised by get_recording() when more than one recording matches a given base_name.
Catalog.get_recording(base_name) — look up a recording by base name, requiring it to be unique across all drive paths. Raises EntryNotFoundError (zero matches) or AmbiguousRecordingError (multiple matches).
Catalog.file_tree(): returns a multi-line string rendering of the Drive folder hierarchy, annotating each entry with its kind ([recording], [folder], or [file]) and its local status ([local] or [not local]).
xdat_recording_exists_locally() utility — replaces xdat_dataset_exists_locally().
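
The lookup rule described for Catalog.get_recording() can be sketched as follows. This is a minimal, hypothetical reimplementation over a plain list of entries, not the package's code. The stand-in exception classes mirror the documented hierarchy, and the two-field RecordingEntry is a simplification of the real TypedDict.

```python
from typing import TypedDict


class CatalogError(Exception):
    """Stand-in for the package's base catalog error (hypothetical)."""


class EntryNotFoundError(CatalogError):
    """Raised when no entry matches (stand-in)."""


class AmbiguousRecordingError(CatalogError):
    """Raised when a base name matches more than one recording (stand-in)."""


class RecordingEntry(TypedDict):
    # Simplification: only the two identity fields are modeled here.
    drive_path: str
    base_name: str


def get_recording(entries: list[RecordingEntry], base_name: str) -> RecordingEntry:
    """Return the single recording with this base_name across all drive paths."""
    matches = [e for e in entries if e["base_name"] == base_name]
    if not matches:
        raise EntryNotFoundError(f"no recording named {base_name!r}")
    if len(matches) > 1:
        paths = ", ".join(sorted(e["drive_path"] for e in matches))
        raise AmbiguousRecordingError(
            f"{base_name!r} exists under multiple drive paths: {paths}"
        )
    return matches[0]
```

When a base name is ambiguous, the caller has to fall back to an API that takes both drive_path and base_name, such as get_recording_path().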

Changed

Breaking: DatasetEntry renamed to RecordingEntry; AssetEntry renamed to DriveItemEntry.
Breaking: Catalog.recordings_df replaces Catalog.df; Catalog.items_df replaces Catalog.assets_df.
Breaking: Catalog.list_recordings() replaces Catalog.list_datasets(); Catalog.list_items() replaces Catalog.list_assets(). The asset_type: str | None filter on list_items() is replaced by is_folder: bool | None.
Breaking: Catalog.download_recording() replaces Catalog.download_dataset(); Catalog.get_recording_path() replaces Catalog.get_dataset_path().
Breaking: Catalog.download_item() replaces Catalog.download_asset(); Catalog.get_item_path() replaces Catalog.get_asset_path().
Breaking: ScanResult fields renamed: datasets_new/existing/removed → recordings_new/existing/removed; assets_new/existing/removed → items_new/existing/removed.
Breaking: PrefetchResult fields renamed: datasets_downloaded/skipped → recordings_downloaded/skipped; assets_downloaded/skipped → items_downloaded/skipped.
Breaking: prefetch() parameter asset_type replaced by is_folder: bool | None.
Breaking: Item local paths simplified from {local_data_dir}/assets/{drive_path}/{name} to {local_data_dir}/{drive_path}/{name}, matching the same Drive-mirroring convention used for recordings.
scan_drive() and scan_drive_flat() now catalog all non-xdat files and all non-root folders as DriveItemEntry regardless of depth. The previous depth-based _classify_item heuristic is removed entirely — consumer code owns the semantic classification of items.
_load_catalog reads both new ("recordings"/"items") and old ("datasets"/"assets") JSON keys, and handles both name/is_folder and legacy asset_name/asset_type field names, so existing catalog files continue to load without error.
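
A minimal sketch of the backward-compatible loading described above. It assumes the legacy asset_type values were the strings "folder" and "file" (the values accepted by the 0.0.8 list_assets() filter). The names load_catalog_json and _normalize_item are hypothetical; the real _load_catalog does more, for example entry validation.

```python
import json
from typing import Any


def _normalize_item(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a legacy asset record (asset_name/asset_type) onto name/is_folder."""
    item = dict(raw)
    if "name" not in item and "asset_name" in item:
        item["name"] = item.pop("asset_name")
    if "is_folder" not in item and "asset_type" in item:
        # Assumption: legacy asset_type used the strings "folder" and "file".
        item["is_folder"] = item.pop("asset_type") == "folder"
    return item


def load_catalog_json(text: str) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Read recordings and items, accepting either the new or the old top-level keys."""
    data = json.loads(text)
    recordings = data.get("recordings", data.get("datasets", []))
    items = data.get("items", data.get("assets", []))
    return list(recordings), [_normalize_item(i) for i in items]
```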

Removed

DatasetEntry, AssetEntry — replaced by RecordingEntry and DriveItemEntry respectively.
xdat_dataset_exists_locally() — replaced by xdat_recording_exists_locally().
Depth-based _classify_item() internal function — catalog no longer applies structural heuristics to classify items.

[0.0.10] - 2026-06-04

Added

xdat_dataset_exists_locally() utility to verify all three canonical xdat files (_data.xdat, .xdat.json, _timestamp.xdat) exist locally.
Catalog.scan() now warns when a dataset on Drive is missing one or more required xdat files.
Catalog.get_dataset_path() and Catalog.get_asset_path() now automatically trigger a scan() if the requested entry is not found in the local catalog, improving resilience to out-of-date catalogs.
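
The three-file completeness check can be pictured like this. It assumes each canonical file name is the recording's base name followed by one of the listed suffixes, all in one directory. xdat_files_exist is a hypothetical name, not the exported utility.

```python
from pathlib import Path

# Assumption: canonical file names are base_name + suffix in a single directory;
# the real utility may build its paths differently.
XDAT_SUFFIXES = ("_data.xdat", ".xdat.json", "_timestamp.xdat")


def xdat_files_exist(directory: Path, base_name: str) -> bool:
    """Return True only if every canonical xdat file for base_name is present."""
    return all(
        (directory / f"{base_name}{suffix}").is_file() for suffix in XDAT_SUFFIXES
    )
```

Requiring all three files, rather than any one of them, is what keeps a partially downloaded recording from being reported as local.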

Changed

Catalog JSON writes are now atomic (using a temporary file and replace()) to prevent data loss during interruptions.
File downloads are now atomic (using a .tmp suffix) to ensure interrupted downloads do not leave partial or corrupted files on disk.
Catalog.scan() now reconciles local datasets by verifying the presence of all three xdat files, preventing a dataset from being marked "local" if it is incomplete.
Refactored Catalog internal path restoration logic into _restore_dataset_local_paths() and _restore_asset_local_paths().
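
Both atomicity changes rely on the write-to-a-temporary-file-then-rename pattern. Below is a minimal standard-library sketch for the JSON case. atomic_write_json is a hypothetical helper. os.replace() is only guaranteed atomic when source and target are on the same filesystem, which is why the temporary file is created next to the target.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, data: Any) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    # Create the temp file in the target's directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # swap the finished file into place
    except BaseException:
        # Remove the partial temp file; the original target is left untouched.
        Path(tmp_name).unlink(missing_ok=True)
        raise
```

A download can follow the same shape: stream into a .tmp file and rename it only after the last chunk has been written.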

Fixed

Improved validation of asset entries during catalog load; drive_path must now be a non-empty string.
Streamlined local path construction for assets in download_asset().

[0.0.9] - 2026-05-20

Added

upload_time column in Catalog.df and Catalog.assets_df: a UTC-aware datetime64 column populated from the Drive API's createdTime field during scan(). Rows without a recorded date are NaT. Existing catalogs without this field load cleanly — missing values default to None / NaT.

Changed

Catalog.prefetch() progress bar now covers only items being downloaded (not skipped ones), so the total count and ETA reflect real work. The bar is suppressed entirely when all matching items are already local.
Skipped items in prefetch() are logged at DEBUG level instead of being silently counted.
Per-item download log messages in Catalog.download_dataset() and Catalog.download_asset() lowered from INFO to DEBUG. The prefetch() progress bar is the intended progress signal for bulk operations; direct single-item calls remain observable via debug logging.
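
One way to get a progress bar that counts only real work is to partition the matching entries before creating the bar. The sketch below is an assumption about the approach, not the package's code. partition_for_prefetch and the is_local callback are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def partition_for_prefetch(
    entries: Iterable[T], is_local: Callable[[T], bool], force: bool = False
) -> tuple[list[T], list[T]]:
    """Split entries into (to_download, skipped) before any progress bar is created."""
    to_download: list[T] = []
    skipped: list[T] = []
    for entry in entries:
        # force=True re-downloads everything, regardless of local state.
        if force or not is_local(entry):
            to_download.append(entry)
        else:
            skipped.append(entry)
    return to_download, skipped
```

The to_download list could then drive something like tqdm(to_download, disable=quiet or not to_download), so the bar's total and ETA cover only downloads and the bar is suppressed when nothing needs fetching. The skipped list is what would be logged at DEBUG level.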

[0.0.8] - 2026-04-21

Added

Catalog.scan() now returns a ScanResult dataclass with counts of datasets_new, datasets_existing, datasets_removed, assets_new, assets_existing, and assets_removed — making it easy to inspect what changed after a Drive rescan. ScanResult.__str__ renders as a compact table.
ScanResult dataclass exported from the top-level package.
Progress bar during Drive file traversal in scan() (respects the quiet= flag passed to Catalog).
Progress bar in prefetch() showing overall item count and the name of the entry currently being downloaded (also respects quiet=).
Catalog.prefetch(drive_path, drive_path_prefix, drive_path_contains, datasets=True, assets=True, asset_type=None, force=False) — bulk, idempotent downloader that skips entries already present on disk. Pass force=True to re-download every matching entry regardless of local state. Returns a PrefetchResult with per-category download and skip counts.
PrefetchResult dataclass exported from the top-level package.
CatalogError and EntryNotFoundError exceptions exported from the top-level package. EntryNotFoundError (subclass of CatalogError) is raised when a dataset or asset lookup fails; CatalogError is raised for malformed catalog JSON.
Catalog.list_datasets(drive_path, drive_path_prefix, drive_path_contains) — convenience method returning a filtered dataset DataFrame.
Catalog.list_assets(drive_path, drive_path_prefix, drive_path_contains, asset_type) — convenience method returning a filtered asset DataFrame. Supports an additional asset_type="folder" or "file" filter.
quiet keyword argument on Catalog(...) suppresses tqdm progress bars during downloads and scans.
DatasetEntry TypedDict exported from the top-level package (previously only AssetEntry was exported).
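
Counting new, existing, and removed entries reduces to set arithmetic over identity keys. Below is a minimal sketch for datasets, keyed by the (drive_path, base_name) identity introduced in this release. RecordingDiff and diff_recordings are hypothetical names, and the real ScanResult also carries the asset counts.

```python
from dataclasses import dataclass


@dataclass
class RecordingDiff:
    """Hypothetical stand-in for the dataset half of ScanResult."""

    new: int
    existing: int
    removed: int


def diff_recordings(
    previous: set[tuple[str, str]], scanned: set[tuple[str, str]]
) -> RecordingDiff:
    """Compare (drive_path, base_name) keys from the old catalog and a fresh scan."""
    return RecordingDiff(
        new=len(scanned - previous),  # on Drive now, not in the old catalog
        existing=len(scanned & previous),  # present in both
        removed=len(previous - scanned),  # in the old catalog, gone from Drive
    )
```

Because the key includes drive_path, the same base name in two folders counts as two distinct datasets.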

Changed

Catalog.scan() return type changed from None to ScanResult.
Drive scan warnings (e.g. incomplete search results) are now emitted via logging.getLogger rather than warnings.warn.
Dataset identity is now (drive_path, base_name) instead of base_name alone, allowing distinct recordings with the same base name in different folders.
Dataset downloads now mirror Drive hierarchy under local storage: local_data_dir/{drive_path}/....
Catalog.df and Catalog.assets_df now use a path-first schema and no longer include legacy date_folder / experiment columns.
Breaking: Catalog.download() renamed to Catalog.download_dataset() for symmetry with download_asset().
Breaking: Catalog.get_path() renamed to Catalog.get_dataset_path() for symmetry with get_asset_path().
Catalog.download_dataset() and Catalog.get_dataset_path() now require both drive_path and base_name to identify a dataset.
Catalog JSON parse errors now raise CatalogError instead of ValueError. "Not found" errors from the single-item download_* and get_*_path methods now raise EntryNotFoundError instead of ValueError.

Removed

Breaking: Catalog(config_path=...) constructor shortcut removed. Use Catalog(Config.from_file("path")) or Catalog() (which calls Config.from_file() with auto-discovery).
Removed Catalog.list_experiment_paths(), Catalog.list_by_path(), and Catalog.status() convenience methods. Use Catalog.list_datasets() / Catalog.list_assets() or operate on the raw DataFrames directly.

Fixed

Catalog.scan() now preserves local_path for datasets by (drive_path, base_name) during refresh, preventing path collisions when multiple folders contain the same base_name.
py.typed marker (PEP 561) — the package now declares itself as typed, allowing mypy to use its annotations in downstream consumers.
Catalog.scan() now reconciles local_path against disk rather than trusting the previous catalog blindly. If the catalog is deleted, entries whose data already exists at the canonical local path (local_data_dir/{drive_path} for datasets, local_data_dir/assets/{drive_path}/{asset_name} for assets) are recovered automatically instead of triggering unnecessary re-downloads. Stale catalog paths that no longer exist on disk are also cleared.

[0.0.7] - 2026-04-08

Changed

Catalog.scan() now defaults to the flat scanner (flat=True), which is faster and works even when the root folder is inaccessible. Pass flat=False to use the recursive traversal.
Removed automatic fallback logic from scan(). The caller now chooses the scan strategy via the flat keyword argument.

[0.0.6] - 2026-04-07

Added

Catalog.list_experiment_paths() and Catalog.list_by_path() to help discover and filter datasets by their full Drive path.

Fixed

Depth-based asset cataloging: nested subfolders inside experiments (depth 3+) are now correctly cataloged as folder assets. Previously only direct subfolders (depth 2) were included.

[0.0.5] - 2026-04-06

Added

Flat-scan fallback: when the root folder is inaccessible (HTTP 403/404) or returns an empty listing, Catalog.scan() automatically falls back to a flat files.list() of all files visible to the service account and reconstructs the folder hierarchy from parents metadata. A UserWarning is emitted when the fallback activates.
scan_drive_flat() internal function for flat-listing all Drive files with paginated pageSize=1000 requests and parent-chain path reconstruction.
Catalog can now be created without a pre-built Config object: Catalog(config_path="/path/to/config.json") and Catalog() are both supported.
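
The parent-chain path reconstruction used by the flat scan can be sketched as below. It works on dicts shaped like Drive API v3 files.list() results requested with the id, name, and parents fields. It follows only the first parent and assumes the folder graph has no cycles. build_drive_paths is a hypothetical name, and pagination (pageSize=1000 with pageToken/nextPageToken) is left out.

```python
from typing import Any, Optional


def build_drive_paths(files: list[dict[str, Any]], root_id: str) -> dict[str, str]:
    """Return {file_id: "folder/sub/name"} for every file that descends from root_id."""
    by_id = {f["id"]: f for f in files}
    paths: dict[str, str] = {}
    for f in files:
        parts: list[str] = []
        current: Optional[dict[str, Any]] = f
        while current is not None:
            parts.append(current["name"])
            parents = current.get("parents") or []
            if not parents:
                break  # top-level item that is not under root_id
            parent_id = parents[0]
            if parent_id == root_id:
                paths[f["id"]] = "/".join(reversed(parts))
                break
            # None when the parent is not visible, which ends the walk unrecorded.
            current = by_id.get(parent_id)
    return paths
```

The root folder itself never needs to appear in the listing; only its ID is required, which is what lets this approach work when the root is inaccessible.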

Changed

Config.from_file() discovery now also checks ~/.config/radiens-drive/config.json and /etc/radiens-drive/config.json after the existing environment variable and local path checks.
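
The combined search order of Config.from_file() as of this release (the three 0.0.3 locations followed by the two added here) can be sketched as below. find_config is hypothetical. The sketch assumes .secrets/config.json is resolved against the working directory and that an environment variable pointing at a missing file falls through to the next candidate. The real method may behave differently on either point.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "RADIENS_DRIVE_CATALOG_CONFIG"


def find_config(
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
    environ: Mapping[str, str] = os.environ,
) -> Path:
    """Return the first existing config file, searching in the documented order."""
    cwd = cwd if cwd is not None else Path.cwd()
    home = home if home is not None else Path.home()
    candidates: list[Path] = []
    if environ.get(ENV_VAR):
        candidates.append(Path(environ[ENV_VAR]))
    candidates += [
        cwd / ".secrets" / "config.json",  # assumption: relative to the working directory
        cwd / "config.json",
        home / ".config" / "radiens-drive" / "config.json",
        Path("/etc/radiens-drive/config.json"),
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        "no config found; searched: " + ", ".join(str(c) for c in candidates)
    )
```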

[0.0.4] - 2026-04-02

Fixed

[0.0.3] - 2026-04-02

Changed

Config.from_file() now auto-discovers config files when called with no arguments: checks RADIENS_DRIVE_CATALOG_CONFIG env var, then .secrets/config.json, then config.json in the current working directory. Raises FileNotFoundError (previously ValueError) when no config is found.

[0.0.2] - 2026-04-02

Added

AssetEntry TypedDict for non-xdat Drive content (e.g. folders such as logs/, PowerPoint files, and write-ups).
scan_drive() now returns (datasets, assets) tuple. Assets are auto-discovered during scan: non-xdat files inside date or experiment folders become file assets; subfolders of experiment folders become folder assets and are still recursed for xdat datasets.
download_asset() in drive.py for downloading file or folder assets; folder assets are downloaded recursively, mirroring the Drive subtree.
Catalog.assets_df property — full asset catalog as a pandas DataFrame.
Catalog.list_assets() — query assets with optional date_folder, experiment, and asset_type filters.
Catalog.download_asset(drive_path, asset_name) — download an asset to local_data_dir/assets/{drive_path}/{asset_name}.
Catalog.get_asset_path(drive_path, asset_name) — return local path, downloading automatically if needed.
Catalog JSON format changed from a bare list to {"datasets": [...], "assets": [...]}. Old flat-list catalogs are migrated automatically on the next scan().
AssetEntry exported from the top-level package.

Changed

Catalog.list() and Catalog.list_assets(): renamed the date parameter to date_folder to match the catalog column name and make clear that an exact folder name (e.g. "2026-02-15_batch") is required, not a date prefix.

[0.0.1] - 2026-04-02

Added

Config dataclass with from_file() classmethod; supports ~ and $ENV_VAR expansion in path fields and RADIENS_DRIVE_CATALOG_CONFIG env var fallback.
build_drive_service() for authenticating with a Google service account.
scan_drive() for recursive Drive scanning; returns DatasetEntry records with date_folder, experiment, drive_path, and drive_file_ids.
download_dataset() for chunked download of xdat filesets to a local directory.
Catalog class with scan(), df, list(), download(), get_path(), and status().
MkDocs documentation site with Material theme, auto-generated API reference, and versioned deployment via mike.
CI/CD workflows: linting, type checking, tests, docs deployment, and PyPI publishing via GitHub Actions.
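
The ~ and $ENV_VAR expansion described for Config path fields corresponds to the standard library's os.path.expandvars() and os.path.expanduser(). A minimal sketch follows. expand_path_field is a hypothetical helper, and the order of the two calls is an assumption.

```python
import os
from pathlib import Path


def expand_path_field(value: str) -> Path:
    """Expand $ENV_VAR references, then a leading ~, in a config path string."""
    # expandvars runs first so that a variable whose value starts with "~"
    # is still turned into the home directory by expanduser.
    return Path(os.path.expanduser(os.path.expandvars(value)))
```

Note that os.path.expandvars() leaves references to unset variables in the string unchanged rather than raising.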