Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

No unreleased changes.

[0.0.8] - 2026-04-09

Added

  • Catalog.scan() now returns a ScanResult dataclass with counts of datasets_new, datasets_existing, datasets_removed, assets_new, assets_existing, and assets_removed — making it easy to inspect what changed after a Drive rescan. ScanResult.__str__ renders as a compact table.
  • ScanResult dataclass exported from the top-level package.
  • Progress bar during Drive file traversal in scan() (respects the quiet= flag passed to Catalog).
  • Progress bar in prefetch() showing overall item count and the name of the entry currently being downloaded (also respects quiet=).
  • Catalog.prefetch(drive_path, drive_path_prefix, drive_path_contains, datasets=True, assets=True, asset_type=None, force=False) — bulk, idempotent downloader that skips entries already present on disk. Pass force=True to re-download every matching entry regardless of local state. Returns a PrefetchResult with per-category download and skip counts.
  • PrefetchResult dataclass exported from the top-level package.
  • CatalogError and EntryNotFoundError exceptions exported from the top-level package. EntryNotFoundError (subclass of CatalogError) is raised when a dataset or asset lookup fails; CatalogError is raised for malformed catalog JSON.
  • Catalog.list_datasets(drive_path, drive_path_prefix, drive_path_contains) — convenience method returning a filtered dataset DataFrame.
  • Catalog.list_assets(drive_path, drive_path_prefix, drive_path_contains, asset_type) — convenience method returning a filtered asset DataFrame. Supports an additional asset_type="folder" or "file" filter.
  • quiet keyword argument on Catalog(...) suppresses tqdm progress bars during downloads.
  • DatasetEntry TypedDict exported from the top-level package (previously only AssetEntry was exported).
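A minimal sketch of what a ScanResult like the one described above might look like (the field names come from the entry; the exact table rendering is illustrative, not the package's actual output):

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class ScanResult:
    """Counts of catalog changes produced by a Drive rescan."""
    datasets_new: int = 0
    datasets_existing: int = 0
    datasets_removed: int = 0
    assets_new: int = 0
    assets_existing: int = 0
    assets_removed: int = 0

    def __str__(self) -> str:
        # Render as a compact two-column table: field name, then count.
        width = max(len(f.name) for f in fields(self))
        rows = [f"{f.name:<{width}}  {getattr(self, f.name)}"
                for f in fields(self)]
        return "\n".join(rows)

result = ScanResult(datasets_new=2, assets_existing=5)
print(result)
```

The frozen dataclass makes the result safe to pass around after a scan; callers inspect the counts directly or print the table.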

Changed

  • Catalog.scan() return type changed from None to ScanResult.
  • Drive scan warnings (e.g. incomplete search results) are now emitted via a standard library logger (logging.getLogger) rather than warnings.warn.
  • Dataset identity is now (drive_path, base_name) instead of base_name alone, allowing distinct recordings with the same base name in different folders.
  • Dataset downloads now mirror Drive hierarchy under local storage: local_data_dir/{drive_path}/....
  • Catalog.df and Catalog.assets_df now use a path-first schema and no longer include legacy date_folder / experiment columns.
  • Breaking: Catalog.download() renamed to Catalog.download_dataset() for symmetry with download_asset().
  • Breaking: Catalog.get_path() renamed to Catalog.get_dataset_path() for symmetry with get_asset_path().
  • Catalog.download_dataset() and Catalog.get_dataset_path() now require both drive_path and base_name to identify a dataset.
  • Catalog JSON parse errors now raise CatalogError instead of ValueError. "Not found" errors on single-item download / get_path methods now raise EntryNotFoundError instead of ValueError.
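The resulting exception hierarchy and catch pattern can be sketched as follows (a minimal stand-in for illustration; only the subclass relationship and the two error conditions are taken from the entries above, and lookup() is a hypothetical helper):

```python
class CatalogError(Exception):
    """Raised for malformed catalog JSON."""

class EntryNotFoundError(CatalogError):
    """Raised when a dataset or asset lookup fails."""

def lookup(name: str) -> str:
    # Hypothetical lookup used only to demonstrate the catch pattern.
    raise EntryNotFoundError(f"no dataset named {name!r}")

try:
    lookup("missing_recording")
except EntryNotFoundError as exc:
    outcome = f"not found: {exc}"
except CatalogError:
    # Because EntryNotFoundError subclasses CatalogError, callers who
    # only care that *something* went wrong can catch the base class.
    outcome = "catalog is malformed"
```

Catching CatalogError alone handles both failure modes; catching EntryNotFoundError first distinguishes a missing entry from corrupt catalog JSON.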

Removed

  • Breaking: Catalog(config_path=...) constructor shortcut removed. Use Catalog(Config.from_file("path")) or Catalog() (which calls Config.from_file() with auto-discovery).
  • Removed Catalog.list_experiment_paths(), Catalog.list_by_path(), and Catalog.status() convenience methods. Use Catalog.list_datasets() / Catalog.list_assets() or operate on the raw DataFrames directly.

Fixed

  • Catalog.scan() now preserves local_path for datasets by (drive_path, base_name) during refresh, preventing path collisions when multiple folders contain the same base_name.
  • py.typed marker (PEP 561) — the package now declares itself as typed, allowing mypy to use its annotations in downstream consumers.
  • Catalog.scan() now reconciles local_path against disk rather than trusting the previous catalog blindly. If the catalog is deleted, entries whose data already exists at the canonical local path (local_data_dir/{drive_path} for datasets, local_data_dir/assets/{drive_path}/{asset_name} for assets) are recovered automatically instead of triggering unnecessary re-downloads. Stale catalog paths that no longer exist on disk are also cleared.
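The reconciliation rule described above (trust the disk, not the previous catalog) can be sketched for the dataset case with pathlib; this is a simplified stand-in, not the package's implementation:

```python
import tempfile
from pathlib import Path
from typing import Optional

def reconcile_local_path(local_data_dir: Path, drive_path: str,
                         recorded_path: Optional[str]) -> Optional[str]:
    """Return a usable local path for a dataset, or None if nothing
    exists on disk (which would trigger a fresh download)."""
    canonical = local_data_dir / drive_path
    if recorded_path and Path(recorded_path).exists():
        return recorded_path        # previous catalog entry still valid
    if canonical.exists():
        return str(canonical)       # recover data found at the canonical path
    return None                     # stale or missing: clear the path

root = Path(tempfile.mkdtemp())
(root / "2026-02-15_batch" / "trial01").mkdir(parents=True)

# Catalog was deleted (no recorded path) but data exists on disk:
recovered = reconcile_local_path(root, "2026-02-15_batch/trial01", None)

# A stale recorded path that no longer exists on disk is cleared:
cleared = reconcile_local_path(root, "2026-03-01/trial02", "/gone/dir")
```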

[0.0.7] - 2026-04-08

Changed

  • Catalog.scan() now defaults to the flat scanner (flat=True), which is faster and works even when the root folder is inaccessible. Pass flat=False to use the recursive traversal.
  • Removed automatic fallback logic from scan(). The caller now chooses the scan strategy via the flat keyword argument.

[0.0.6] - 2026-04-07

Added

  • Catalog.list_experiment_paths() and Catalog.list_by_path() to help discover and filter datasets by their full Drive path.

Fixed

  • Depth-based asset cataloging: nested subfolders inside experiments (depth 3+) are now correctly cataloged as folder assets. Previously only direct subfolders (depth 2) were included.
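The depth rule can be illustrated with a small helper; the convention here (date folders at depth 1, experiments at depth 2) is an assumption for illustration, inferred from the entry above:

```python
def folder_depth(drive_path: str) -> int:
    """Depth of a Drive folder relative to the scan root, e.g.
    '2026-02-15_batch' -> 1, '2026-02-15_batch/exp1/logs' -> 3."""
    return len([part for part in drive_path.split("/") if part])

def is_folder_asset(drive_path: str) -> bool:
    # After the fix, any subfolder at depth 2 or deeper is cataloged;
    # before, only direct subfolders (exactly depth 2) were included.
    return folder_depth(drive_path) >= 2
```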

[v0.0.5] - 2026-04-06

Added

  • Flat-scan fallback: when the root folder is inaccessible (HTTP 403/404) or returns an empty listing, Catalog.scan() automatically falls back to a flat files.list() of all files visible to the service account and reconstructs the folder hierarchy from parents metadata. A UserWarning is emitted when the fallback activates.
  • scan_drive_flat() internal function for flat-listing all Drive files with paginated pageSize=1000 requests and parent-chain path reconstruction.
  • Catalog can now be created without a pre-built Config object: Catalog(config_path="/path/to/config.json") and Catalog() are both supported.

Changed

  • Config.from_file() discovery now also checks ~/.config/radiens-drive/config.json and /etc/radiens-drive/config.json after the existing environment variable and local path checks.
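Combined with the earlier discovery behavior from 0.0.3, the full search order can be sketched like this (a simplified stand-in; the candidate paths and env var name are taken from the entries, the function body is illustrative):

```python
import os
from pathlib import Path

def discover_config() -> Path:
    """Return the first existing config file in discovery order."""
    candidates = []
    env = os.environ.get("RADIENS_DRIVE_CATALOG_CONFIG")
    if env:
        candidates.append(Path(env))                     # 1. env var
    candidates += [
        Path(".secrets/config.json"),                    # 2. local secrets
        Path("config.json"),                             # 3. cwd
        Path.home() / ".config/radiens-drive/config.json",  # 4. XDG-style
        Path("/etc/radiens-drive/config.json"),          # 5. system-wide
    ]
    for path in candidates:
        if path.is_file():
            return path
    raise FileNotFoundError("no radiens-drive config found")
```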

[0.0.4] - 2026-04-02

[0.0.3] - 2026-04-02

Changed

  • Config.from_file() now auto-discovers config files when called with no arguments: checks RADIENS_DRIVE_CATALOG_CONFIG env var, then .secrets/config.json, then config.json in the current working directory. Raises FileNotFoundError (previously ValueError) when no config is found.

[0.0.2] - 2026-04-02

Added

  • AssetEntry TypedDict for non-xdat Drive content (folders like logs/, PowerPoints, writeups, etc.).
  • scan_drive() now returns (datasets, assets) tuple. Assets are auto-discovered during scan: non-xdat files inside date or experiment folders become file assets; subfolders of experiment folders become folder assets and are still recursed for xdat datasets.
  • download_asset() in drive.py for downloading file or folder assets; folder assets are downloaded recursively, mirroring the Drive subtree.
  • Catalog.assets_df property — full asset catalog as a pandas DataFrame.
  • Catalog.list_assets() — query assets with optional date_folder, experiment, and asset_type filters.
  • Catalog.download_asset(drive_path, asset_name) — download an asset to local_data_dir/assets/{drive_path}/{asset_name}.
  • Catalog.get_asset_path(drive_path, asset_name) — return local path, downloading automatically if needed.
  • Catalog JSON format changed from a bare list to {"datasets": [...], "assets": [...]}. Old flat-list catalogs are migrated automatically on the next scan().
  • AssetEntry exported from the top-level package.
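The automatic migration of old flat-list catalogs described above can be sketched as a small transformation (a simplified stand-in, assuming the old format was a bare JSON list of dataset entries):

```python
import json

def migrate_catalog(raw: str) -> dict:
    """Upgrade an old flat-list catalog to the keyed-dict format."""
    loaded = json.loads(raw)
    if isinstance(loaded, list):
        # Old format: a bare list of dataset entries; assets did not
        # exist yet, so the migrated catalog starts with an empty list.
        return {"datasets": loaded, "assets": []}
    return loaded  # already in the new format

old = '[{"base_name": "trial01"}]'
migrated = migrate_catalog(old)
```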

Changed

  • Catalog.list() and Catalog.list_assets(): renamed the date parameter to date_folder to match the catalog column name and make clear that an exact folder name (e.g. "2026-02-15_batch") is required, not a date prefix.

[0.0.1] - 2026-04-02

Added

  • Config dataclass with from_file() classmethod; supports ~ and $ENV_VAR expansion in path fields and RADIENS_DRIVE_CATALOG_CONFIG env var fallback.
  • build_drive_service() for authenticating with a Google service account.
  • scan_drive() for recursive Drive scanning; returns DatasetEntry records with date_folder, experiment, drive_path, and drive_file_ids.
  • download_dataset() for chunked download of xdat filesets to a local directory.
  • Catalog class with scan(), df, list(), download(), get_path(), and status().
  • MkDocs documentation site with Material theme, auto-generated API reference, and versioned deployment via mike.
  • CI/CD workflows: linting, type checking, tests, docs deployment, and PyPI publishing via GitHub Actions.