Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

No unreleased changes.

[0.0.10] - 2026-06-04

Added

  • xdat_dataset_exists_locally() utility to verify all three canonical xdat files (_data.xdat, .xdat.json, _timestamp.xdat) exist locally.
  • Catalog.scan() now warns when a dataset on Drive is missing one or more required xdat files.
  • Catalog.get_dataset_path() and Catalog.get_asset_path() now automatically trigger a scan() if the requested entry is not found in the local catalog, improving resilience to out-of-date catalogs.
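
The completeness check described above can be pictured with a minimal sketch. The three suffixes come from the entry itself; the function names and signatures here are hypothetical stand-ins, not the signature of xdat_dataset_exists_locally().

```python
from pathlib import Path

# Suffixes are taken from the changelog entry. Everything else in this
# block is an illustrative assumption, not the package's real API.
XDAT_SUFFIXES = ("_data.xdat", ".xdat.json", "_timestamp.xdat")


def missing_xdat_files(directory: Path, base_name: str) -> list[str]:
    """Return the names of required xdat files absent from `directory`."""
    return [
        f"{base_name}{suffix}"
        for suffix in XDAT_SUFFIXES
        if not (directory / f"{base_name}{suffix}").is_file()
    ]


def dataset_is_complete(directory: Path, base_name: str) -> bool:
    """True only when all three canonical files are present."""
    return not missing_xdat_files(directory, base_name)
```

Returning the list of missing names, rather than a bare boolean, makes it straightforward to emit the kind of warning scan() now produces for incomplete datasets.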

Changed

  • Catalog JSON writes are now atomic (using a temporary file and replace()) to prevent data loss during interruptions.
  • File downloads are now atomic (using a .tmp suffix) to ensure interrupted downloads do not leave partial or corrupted files on disk.
  • Catalog.scan() now reconciles local datasets by verifying the presence of all three xdat files, preventing a dataset from being marked "local" if it is incomplete.
  • Refactored Catalog internal path restoration logic into _restore_dataset_local_paths() and _restore_asset_local_paths().
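
The atomic-write pattern mentioned above (temporary file followed by a replace) typically looks like the following sketch. The helper name write_json_atomic is hypothetical, and the package's actual implementation may differ in detail.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write `payload` to `path` so readers never observe a partial file.

    Hypothetical helper illustrating the temp-file-then-replace pattern.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the destination directory so the final
    # rename never crosses a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the swap
        os.replace(tmp_name, path)  # swap the finished file into place
    except BaseException:
        # On any failure, remove the temp file and leave `path` untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the process is interrupted before os.replace() runs, the previous catalog file is still intact; the same idea applies to downloads written under a .tmp suffix and renamed on completion.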

Fixed

  • Improved validation of asset entries during catalog load; drive_path must now be a non-empty string.
  • Streamlined local path construction for assets in download_asset().

[0.0.9] - 2026-05-20

Added

  • upload_time column in Catalog.df and Catalog.assets_df: a UTC-aware datetime64 column populated from the Drive API's createdTime field during scan(). Rows without a recorded date are NaT. Existing catalogs without this field load cleanly — missing values default to None / NaT.

Changed

  • Catalog.prefetch() progress bar now covers only items being downloaded (not skipped ones), so the total count and ETA reflect real work. The bar is suppressed entirely when all matching items are already local.
  • Skipped items in prefetch() are logged at DEBUG level instead of being silently counted.
  • Per-item download log messages in Catalog.download_dataset() and Catalog.download_asset() lowered from INFO to DEBUG. The prefetch() progress bar is the intended progress signal for bulk operations; direct single-item calls remain observable via debug logging.
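
One way to picture the new progress accounting is to split the matching entries before any download starts. This is a sketch under assumptions: plan_prefetch is a hypothetical helper, and entries are modeled as plain dicts with name and local_path keys.

```python
import logging

logger = logging.getLogger(__name__)


def plan_prefetch(entries: list[dict], force: bool = False) -> tuple[list[dict], list[dict]]:
    """Split entries into (to_download, skipped).

    Hypothetical helper: an entry counts as local when it has a truthy
    `local_path`. With force=True nothing is skipped.
    """
    to_download: list[dict] = []
    skipped: list[dict] = []
    for entry in entries:
        if force or not entry.get("local_path"):
            to_download.append(entry)
        else:
            # Skips are reported at DEBUG level rather than counted silently.
            logger.debug("skipping %s (already local)", entry.get("name"))
            skipped.append(entry)
    return to_download, skipped
```

A progress bar would then use len(to_download) as its total and would not be created at all when that list is empty.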

[0.0.8] - 2026-04-21

Added

  • Catalog.scan() now returns a ScanResult dataclass with counts of datasets_new, datasets_existing, datasets_removed, assets_new, assets_existing, and assets_removed — making it easy to inspect what changed after a Drive rescan. ScanResult.__str__ renders as a compact table.
  • ScanResult dataclass exported from the top-level package.
  • Progress bar during Drive file traversal in scan() (respects the quiet= flag passed to Catalog).
  • Progress bar in prefetch() showing overall item count and the name of the entry currently being downloaded (also respects quiet=).
  • Catalog.prefetch(drive_path, drive_path_prefix, drive_path_contains, datasets=True, assets=True, asset_type=None, force=False) — bulk, idempotent downloader that skips entries already present on disk. Pass force=True to re-download every matching entry regardless of local state. Returns a PrefetchResult with per-category download and skip counts.
  • PrefetchResult dataclass exported from the top-level package.
  • CatalogError and EntryNotFoundError exceptions exported from the top-level package. EntryNotFoundError (subclass of CatalogError) is raised when a dataset or asset lookup fails; CatalogError is raised for malformed catalog JSON.
  • Catalog.list_datasets(drive_path, drive_path_prefix, drive_path_contains) — convenience method returning a filtered dataset DataFrame.
  • Catalog.list_assets(drive_path, drive_path_prefix, drive_path_contains, asset_type) — convenience method returning a filtered asset DataFrame. Supports an additional asset_type="folder" or "file" filter.
  • quiet keyword argument on Catalog(...) suppresses tqdm progress bars during downloads.
  • DatasetEntry TypedDict exported from the top-level package (previously only AssetEntry was exported).
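
For orientation, a result type with the six counters listed above could be sketched as follows. The field names come from the entry; the defaults, the frozen flag, and the exact table layout of __str__ are assumptions, so the real ScanResult may render differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    """Illustrative stand-in for the package's ScanResult."""

    datasets_new: int = 0
    datasets_existing: int = 0
    datasets_removed: int = 0
    assets_new: int = 0
    assets_existing: int = 0
    assets_removed: int = 0

    def __str__(self) -> str:
        # Column widths and ordering are guesses at a "compact table".
        rows = [f"{'':<10}{'new':>8}{'existing':>10}{'removed':>9}"]
        for kind in ("datasets", "assets"):
            new = getattr(self, f"{kind}_new")
            existing = getattr(self, f"{kind}_existing")
            removed = getattr(self, f"{kind}_removed")
            rows.append(f"{kind:<10}{new:>8}{existing:>10}{removed:>9}")
        return "\n".join(rows)
```

Printing such an object after a rescan gives one row for datasets and one for assets, with the new, existing, and removed counts side by side.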

Changed

  • Catalog.scan() return type changed from None to ScanResult.
  • Drive scan warnings (e.g. incomplete search results) are now emitted via logging.getLogger rather than warnings.warn.
  • Dataset identity is now (drive_path, base_name) instead of base_name alone, allowing distinct recordings with the same base name in different folders.
  • Dataset downloads now mirror Drive hierarchy under local storage: local_data_dir/{drive_path}/....
  • Catalog.df and Catalog.assets_df now use a path-first schema and no longer include legacy date_folder / experiment columns.
  • Breaking: Catalog.download() renamed to Catalog.download_dataset() for symmetry with download_asset().
  • Breaking: Catalog.get_path() renamed to Catalog.get_dataset_path() for symmetry with get_asset_path().
  • Catalog.download_dataset() and Catalog.get_dataset_path() now require both drive_path and base_name to identify a dataset.
  • Catalog JSON parse errors now raise CatalogError instead of ValueError. "Not found" errors on single-item download / get_path methods now raise EntryNotFoundError instead of ValueError.
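
The composite identity and the new exception hierarchy can be illustrated together. In this sketch the two exception classes are local stand-ins that mirror the documented subclass relationship, and find_dataset is a hypothetical lookup, not a Catalog method.

```python
class CatalogError(Exception):
    """Stand-in for the package's base catalog exception."""


class EntryNotFoundError(CatalogError):
    """Stand-in: raised when a dataset or asset lookup fails."""


def find_dataset(index: dict, drive_path: str, base_name: str) -> dict:
    """Look up a dataset by its (drive_path, base_name) identity.

    Hypothetical helper; `index` maps identity tuples to entry dicts.
    """
    key = (drive_path, base_name)
    try:
        return index[key]
    except KeyError:
        raise EntryNotFoundError(
            f"no dataset {base_name!r} under {drive_path!r}"
        ) from None
```

Because the key is a tuple, two recordings that share a base name in different folders no longer collide, and callers that previously caught ValueError would now catch EntryNotFoundError, or CatalogError to cover both cases.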

Removed

  • Breaking: Catalog(config_path=...) constructor shortcut removed. Use Catalog(Config.from_file("path")) or Catalog() (which calls Config.from_file() with auto-discovery).
  • Removed Catalog.list_experiment_paths(), Catalog.list_by_path(), and Catalog.status() convenience methods. Use Catalog.list_datasets() / Catalog.list_assets() or operate on the raw DataFrames directly.

Fixed

  • Catalog.scan() now preserves local_path for datasets by (drive_path, base_name) during refresh, preventing path collisions when multiple folders contain the same base_name.
  • Added the py.typed marker (PEP 561): the package now declares itself as typed, allowing mypy to use its annotations in downstream consumers.
  • Catalog.scan() now reconciles local_path against disk rather than trusting the previous catalog blindly. If the catalog is deleted, entries whose data already exists at the canonical local path (local_data_dir/{drive_path} for datasets, local_data_dir/assets/{drive_path}/{asset_name} for assets) are recovered automatically instead of triggering unnecessary re-downloads. Stale catalog paths that no longer exist on disk are also cleared.
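
The reconciliation rule in the last entry amounts to a three-way decision per entry. The sketch below is a simplification with a hypothetical function name; it only tests for path existence and says nothing about how the package computes the canonical path.

```python
from pathlib import Path
from typing import Optional


def reconcile_local_path(recorded: Optional[str], canonical: Path) -> Optional[str]:
    """Decide which local_path to keep for an entry after a rescan.

    Hypothetical helper: `recorded` is the path stored in the previous
    catalog (if any) and `canonical` is where the entry would be downloaded.
    """
    if recorded and Path(recorded).exists():
        return recorded  # previous catalog entry is still valid
    if canonical.exists():
        return str(canonical)  # data on disk, catalog lost or stale: recover
    return None  # nothing on disk: clear the stale path
```

With this rule, deleting the catalog file does not force a re-download as long as the data still sits at its canonical location.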

[0.0.7] - 2026-04-08

Changed

  • Catalog.scan() now defaults to the flat scanner (flat=True), which is faster and works even when the root folder is inaccessible. Pass flat=False to use the recursive traversal.
  • Removed automatic fallback logic from scan(). The caller now chooses the scan strategy via the flat keyword argument.

[0.0.6] - 2026-04-07

Added

  • Catalog.list_experiment_paths() and Catalog.list_by_path() to help discover and filter datasets by their full Drive path.

Fixed

  • Depth-based asset cataloging: nested subfolders inside experiments (depth 3+) are now correctly cataloged as folder assets. Previously only direct subfolders (depth 2) were included.

[0.0.5] - 2026-04-06

Added

  • Flat-scan fallback: when the root folder is inaccessible (HTTP 403/404) or returns an empty listing, Catalog.scan() automatically falls back to a flat files.list() of all files visible to the service account and reconstructs the folder hierarchy from parents metadata. A UserWarning is emitted when the fallback activates.
  • scan_drive_flat() internal function for flat-listing all Drive files with paginated pageSize=1000 requests and parent-chain path reconstruction.
  • Catalog can now be created without a pre-built Config object: Catalog(config_path="/path/to/config.json") and Catalog() are both supported.
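
Parent-chain path reconstruction, as used by the flat scan, can be sketched without touching the Drive API. The records below are modeled on the id, name, and parents fields that files.list() can return; build_paths is a hypothetical name and this is not the code of scan_drive_flat().

```python
def build_paths(files: list[dict]) -> dict[str, str]:
    """Map each file id to a slash-joined path rebuilt from `parents`.

    Illustrative sketch: a record whose parent is missing from the listing
    (for example an inaccessible root) is treated as top level.
    """
    by_id = {record["id"]: record for record in files}
    cache: dict[str, str] = {}

    def path_of(file_id: str) -> str:
        if file_id in cache:
            return cache[file_id]
        record = by_id[file_id]
        parents = record.get("parents") or []
        parent_id = parents[0] if parents else None
        if parent_id is None or parent_id not in by_id:
            result = record["name"]
        else:
            result = f"{path_of(parent_id)}/{record['name']}"
        cache[file_id] = result
        return result

    return {file_id: path_of(file_id) for file_id in by_id}
```

Caching each resolved path keeps the reconstruction linear in the number of files, which matters when the listing is paginated over many thousands of entries.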

Changed

  • Config.from_file() discovery now also checks ~/.config/radiens-drive/config.json and /etc/radiens-drive/config.json after the existing environment variable and local path checks.

[0.0.4] - 2026-04-02

Fixed

[0.0.3] - 2026-04-02

Changed

  • Config.from_file() now auto-discovers config files when called with no arguments: checks RADIENS_DRIVE_CATALOG_CONFIG env var, then .secrets/config.json, then config.json in the current working directory. Raises FileNotFoundError (previously ValueError) when no config is found.
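
The discovery order above can be summarized in a short sketch. The environment variable name and the candidate file names come from the entry. The function name is hypothetical, and two details are assumptions: .secrets/config.json is resolved relative to the working directory, and an environment variable pointing at a missing file falls through to the next candidate.

```python
import os
from pathlib import Path
from typing import Optional

ENV_VAR = "RADIENS_DRIVE_CATALOG_CONFIG"


def discover_config(cwd: Optional[Path] = None) -> Path:
    """Return the first existing config file in the documented order.

    Hypothetical helper, not the body of Config.from_file().
    """
    cwd = cwd or Path.cwd()
    candidates = []
    env_value = os.environ.get(ENV_VAR)
    if env_value:
        candidates.append(Path(env_value).expanduser())
    candidates.append(cwd / ".secrets" / "config.json")
    candidates.append(cwd / "config.json")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        "no config found; tried: " + ", ".join(str(c) for c in candidates)
    )
```

Listing the attempted locations in the error message tends to make a missing-config failure much easier to diagnose.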

[0.0.2] - 2026-04-02

Added

  • AssetEntry TypedDict for non-xdat Drive content (folders like logs/, PowerPoints, writeups, etc.).
  • scan_drive() now returns a (datasets, assets) tuple. Assets are auto-discovered during scan: non-xdat files inside date or experiment folders become file assets; subfolders of experiment folders become folder assets and are still recursed for xdat datasets.
  • download_asset() in drive.py for downloading file or folder assets; folder assets are downloaded recursively, mirroring the Drive subtree.
  • Catalog.assets_df property — full asset catalog as a pandas DataFrame.
  • Catalog.list_assets() — query assets with optional date_folder, experiment, and asset_type filters.
  • Catalog.download_asset(drive_path, asset_name) — download an asset to local_data_dir/assets/{drive_path}/{asset_name}.
  • Catalog.get_asset_path(drive_path, asset_name) — return local path, downloading automatically if needed.
  • Catalog JSON format changed from a bare list to {"datasets": [...], "assets": [...]}. Old flat-list catalogs are migrated automatically on the next scan().
  • AssetEntry exported from the top-level package.
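
The catalog format migration can be pictured as a small normalization step. This is a sketch only: normalize_catalog is a hypothetical name, and the package performs its migration during scan() rather than necessarily in a standalone function like this one.

```python
def normalize_catalog(raw: object) -> dict:
    """Coerce either catalog layout into {"datasets": [...], "assets": [...]}.

    Hypothetical helper: a bare list is treated as the legacy layout,
    which held datasets only.
    """
    if isinstance(raw, list):
        return {"datasets": list(raw), "assets": []}
    if isinstance(raw, dict):
        return {
            "datasets": list(raw.get("datasets", [])),
            "assets": list(raw.get("assets", [])),
        }
    raise ValueError(f"unexpected catalog payload of type {type(raw).__name__}")
```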

Changed

  • Catalog.list() and Catalog.list_assets(): renamed the date parameter to date_folder to match the catalog column name and make clear that an exact folder name (e.g. "2026-02-15_batch") is required, not a date prefix.

[0.0.1] - 2026-04-02

Added

  • Config dataclass with from_file() classmethod; supports ~ and $ENV_VAR expansion in path fields and RADIENS_DRIVE_CATALOG_CONFIG env var fallback.
  • build_drive_service() for authenticating with a Google service account.
  • scan_drive() for recursive Drive scanning; returns DatasetEntry records with date_folder, experiment, drive_path, and drive_file_ids.
  • download_dataset() for chunked download of xdat filesets to a local directory.
  • Catalog class with scan(), df, list(), download(), get_path(), and status().
  • MkDocs documentation site with Material theme, auto-generated API reference, and versioned deployment via mike.
  • CI/CD workflows: linting, type checking, tests, docs deployment, and PyPI publishing via GitHub Actions.
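
The ~ and $ENV_VAR expansion mentioned for Config path fields is commonly built from two standard-library calls. The helper name expand_path is hypothetical, and the actual Config code may apply the expansion differently.

```python
import os
from pathlib import Path


def expand_path(value: str) -> Path:
    """Expand $ENV_VAR references first, then a leading ~.

    Hypothetical helper. Unset variables are left in place unchanged,
    which is the documented behavior of os.path.expandvars.
    """
    return Path(os.path.expanduser(os.path.expandvars(value)))
```

Expanding variables before the tilde means a variable whose value itself starts with ~ is also resolved.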