Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
No unreleased changes.
[0.0.8] - 2026-04-09
Added
- Catalog.scan() now returns a ScanResult dataclass with counts of datasets_new, datasets_existing, datasets_removed, assets_new, assets_existing, and assets_removed, making it easy to inspect what changed after a Drive rescan.
- ScanResult.__str__ renders as a compact table.
- ScanResult dataclass exported from the top-level package.
- Progress bar during Drive file traversal in scan() (respects the quiet= flag passed to Catalog).
- Progress bar in prefetch() showing overall item count and the name of the entry currently being downloaded (also respects quiet=).
- Catalog.prefetch(drive_path, drive_path_prefix, drive_path_contains, datasets=True, assets=True, asset_type=None, force=False): bulk, idempotent downloader that skips entries already present on disk. Pass force=True to re-download every matching entry regardless of local state. Returns a PrefetchResult with per-category download and skip counts.
- PrefetchResult dataclass exported from the top-level package.
- CatalogError and EntryNotFoundError exceptions exported from the top-level package.
- EntryNotFoundError (subclass of CatalogError) is raised when a dataset or asset lookup fails; CatalogError is raised for malformed catalog JSON.
- Catalog.list_datasets(drive_path, drive_path_prefix, drive_path_contains): convenience method returning a filtered dataset DataFrame.
- Catalog.list_assets(drive_path, drive_path_prefix, drive_path_contains, asset_type): convenience method returning a filtered asset DataFrame. Supports an additional asset_type="folder" or "file" filter.
- quiet keyword argument on Catalog(...) suppresses tqdm progress bars during downloads.
- DatasetEntry TypedDict exported from the top-level package (previously only AssetEntry was exported).
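As a rough illustration of the ScanResult shape described above, the sketch below defines a stand-in dataclass with the six counters named in this entry. The field names come from the changelog; the class name ScanResultSketch and the table layout produced by __str__ are assumptions, and the real class may render differently.

```python
from dataclasses import dataclass


@dataclass
class ScanResultSketch:
    """Hypothetical stand-in for ScanResult; not the library's class."""

    datasets_new: int = 0
    datasets_existing: int = 0
    datasets_removed: int = 0
    assets_new: int = 0
    assets_existing: int = 0
    assets_removed: int = 0

    def __str__(self) -> str:
        # One row per kind, one column per state (layout is assumed).
        states = ("new", "existing", "removed")
        rows = [f"{'':10}" + "".join(f"{s:>10}" for s in states)]
        for kind in ("datasets", "assets"):
            counts = [getattr(self, f"{kind}_{s}") for s in states]
            rows.append(f"{kind:10}" + "".join(f"{c:>10}" for c in counts))
        return "\n".join(rows)


result = ScanResultSketch(datasets_new=2, datasets_existing=5, assets_removed=1)
print(result)
```

Printing the object gives a header row followed by one row each for datasets and assets.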
Changed
- Catalog.scan() return type changed from None to ScanResult.
- Drive scan warnings (e.g. incomplete search results) are now emitted via logging.getLogger rather than warnings.warn.
- Dataset identity is now (drive_path, base_name) instead of base_name alone, allowing distinct recordings with the same base name in different folders.
- Dataset downloads now mirror Drive hierarchy under local storage: local_data_dir/{drive_path}/....
- Catalog.df and Catalog.assets_df now use a path-first schema and no longer include legacy date_folder/experiment columns.
- Breaking: Catalog.download() renamed to Catalog.download_dataset() for symmetry with download_asset().
- Breaking: Catalog.get_path() renamed to Catalog.get_dataset_path() for symmetry with get_asset_path().
- Catalog.download_dataset() and Catalog.get_dataset_path() now require both drive_path and base_name to identify a dataset.
- Catalog JSON parse errors now raise CatalogError instead of ValueError.
- "Not found" errors on single-item download / get_path methods now raise EntryNotFoundError instead of ValueError.
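The move to a (drive_path, base_name) identity can be pictured as switching from a dict keyed by base name to one keyed by a tuple. The sketch below uses made-up paths and plain dicts; it is not the library's internal data structure.

```python
# Two recordings share a base name but live in different Drive folders.
# Paths and names here are made up for illustration.
entries = [
    {"drive_path": "2026-02-15_batch/exp_a", "base_name": "rec01"},
    {"drive_path": "2026-02-16_batch/exp_b", "base_name": "rec01"},
]

# Keyed by base_name alone, the second entry overwrites the first.
by_name = {e["base_name"]: e for e in entries}

# Keyed by (drive_path, base_name), both entries survive.
by_identity = {(e["drive_path"], e["base_name"]): e for e in entries}

print(len(by_name), len(by_identity))
```

With the tuple key both recordings stay addressable, which is why the download and path methods now need both values.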
Removed
- Breaking: Catalog(config_path=...) constructor shortcut removed. Use Catalog(Config.from_file("path")) or Catalog() (which calls Config.from_file() with auto-discovery).
- Removed Catalog.list_experiment_paths(), Catalog.list_by_path(), and Catalog.status() convenience methods. Use Catalog.list_datasets() / Catalog.list_assets() or operate on the raw DataFrames directly.
Fixed
- Catalog.scan() now preserves local_path for datasets by (drive_path, base_name) during refresh, preventing path collisions when multiple folders contain the same base_name.
- py.typed marker (PEP 561): the package now declares itself as typed, allowing mypy to use its annotations in downstream consumers.
- Catalog.scan() now reconciles local_path against disk rather than trusting the previous catalog blindly. If the catalog is deleted, entries whose data already exists at the canonical local path (local_data_dir/{drive_path} for datasets, local_data_dir/assets/{drive_path}/{asset_name} for assets) are recovered automatically instead of triggering unnecessary re-downloads. Stale catalog paths that no longer exist on disk are also cleared.
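The reconciliation rule in the last item can be sketched as a small decision function. The helper name reconcile_local_path and its signature are hypothetical, and the order of the checks is an assumption; the library's own logic may differ.

```python
import tempfile
from pathlib import Path
from typing import Optional


def reconcile_local_path(recorded: Optional[str], canonical: Path) -> Optional[str]:
    """Return the local path to store, trusting disk over the old catalog.

    Hypothetical helper; not part of the library's API.
    """
    # Keep a recorded path only if it still exists on disk.
    if recorded is not None and Path(recorded).exists():
        return recorded
    # Otherwise recover data already sitting at the canonical location.
    if canonical.exists():
        return str(canonical)
    # Nothing on disk: clear the stale path.
    return None


with tempfile.TemporaryDirectory() as tmp:
    local_data_dir = Path(tmp)
    canonical = local_data_dir / "2026-02-15_batch" / "exp_a"  # made-up path
    canonical.mkdir(parents=True)

    # Catalog was deleted (no recorded path) but the data is on disk.
    recovered = reconcile_local_path(None, canonical)
    recovered_ok = recovered == str(canonical)

    # Recorded path is stale and nothing exists at the canonical path.
    cleared = reconcile_local_path(
        str(local_data_dir / "gone"), local_data_dir / "missing"
    )
```

The first call models a deleted catalog with data still on disk; the second models a stale path that should be cleared.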
[0.0.7] - 2026-04-08
Changed
- Catalog.scan() now defaults to the flat scanner (flat=True), which is faster and works even when the root folder is inaccessible. Pass flat=False to use the recursive traversal.
- Removed automatic fallback logic from scan(). The caller now chooses the scan strategy via the flat keyword argument.
[0.0.6] - 2026-04-07
Added
- Catalog.list_experiment_paths() and Catalog.list_by_path() to help discover and filter datasets by their full Drive path.
Fixed
- Depth-based asset cataloging: nested subfolders inside experiments (depth 3+) are now correctly cataloged as folder assets. Previously only direct subfolders (depth 2) were included.
[0.0.5] - 2026-04-06
Added
- Flat-scan fallback: when the root folder is inaccessible (HTTP 403/404) or returns an empty listing, Catalog.scan() automatically falls back to a flat files.list() of all files visible to the service account and reconstructs the folder hierarchy from parents metadata. A UserWarning is emitted when the fallback activates.
- scan_drive_flat() internal function for flat-listing all Drive files with paginated pageSize=1000 requests and parent-chain path reconstruction.
- Catalog can now be created without a pre-built Config object: Catalog(config_path="/path/to/config.json") and Catalog() are both supported.
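Parent-chain path reconstruction, as mentioned for the flat scanner, can be sketched without any Drive calls. The records below are made up and only mimic the id, name, and parents fields of a flat listing; build_paths is a hypothetical helper, not the library's scan_drive_flat(), and it assumes the parent links form a tree.

```python
from typing import Dict, List

# Minimal file records in the shape of a flat listing: each has an id,
# a name, and a list of parent ids. The values are made up.
files = [
    {"id": "f1", "name": "2026-02-15_batch", "parents": ["root"]},
    {"id": "f2", "name": "exp_a", "parents": ["f1"]},
    {"id": "f3", "name": "rec01.xdat", "parents": ["f2"]},
]


def build_paths(records: List[dict]) -> Dict[str, str]:
    """Map each file id to a slash-joined path by walking parent ids."""
    by_id = {r["id"]: r for r in records}
    paths: Dict[str, str] = {}

    def resolve(file_id: str) -> str:
        if file_id in paths:
            return paths[file_id]
        record = by_id[file_id]
        parent_ids = record.get("parents") or []
        # A parent missing from the listing (e.g. an inaccessible root)
        # ends the chain, so the path starts at this record.
        if parent_ids and parent_ids[0] in by_id:
            path = resolve(parent_ids[0]) + "/" + record["name"]
        else:
            path = record["name"]
        paths[file_id] = path
        return path

    for r in records:
        resolve(r["id"])
    return paths


paths = build_paths(files)
print(paths["f3"])
```

Because the root id never appears in the listing, paths start at the first visible folder, which matches the case where the root itself is inaccessible.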
Changed
- Config.from_file() discovery now also checks ~/.config/radiens-drive/config.json and /etc/radiens-drive/config.json after the existing environment variable and local path checks.
[0.0.4] - 2026-04-02
Fixed
[0.0.3] - 2026-04-02
Changed
- Config.from_file() now auto-discovers config files when called with no arguments: checks RADIENS_DRIVE_CATALOG_CONFIG env var, then .secrets/config.json, then config.json in the current working directory. Raises FileNotFoundError (previously ValueError) when no config is found.
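The lookup order in this entry can be sketched as a first-match search. The helper discover_config is hypothetical. Two details are assumptions: that .secrets/config.json is resolved relative to the working directory, and that an env var pointing at a missing file falls through to the next candidate.

```python
import tempfile
from pathlib import Path
from typing import List, Mapping


def discover_config(env: Mapping[str, str], cwd: Path) -> Path:
    """Return the first existing config path, in the order listed above.

    Hypothetical helper; not the library's Config.from_file().
    """
    candidates: List[Path] = []
    env_value = env.get("RADIENS_DRIVE_CATALOG_CONFIG")
    if env_value:
        candidates.append(Path(env_value))
    candidates.append(cwd / ".secrets" / "config.json")
    candidates.append(cwd / "config.json")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError("no config file found")


with tempfile.TemporaryDirectory() as tmp:
    cwd = Path(tmp)
    (cwd / "config.json").write_text("{}")
    # Only config.json exists, so it is chosen.
    found_name = discover_config({}, cwd).name

    (cwd / ".secrets").mkdir()
    (cwd / ".secrets" / "config.json").write_text("{}")
    # .secrets/config.json now takes priority over config.json.
    found_parent = discover_config({}, cwd).parent.name
```

Passing the environment and working directory as arguments keeps the sketch testable without touching the real process state.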
[0.0.2] - 2026-04-02
Added
- AssetEntry TypedDict for non-xdat Drive content (folders like logs/, PowerPoints, writeups, etc.).
- scan_drive() now returns (datasets, assets) tuple. Assets are auto-discovered during scan: non-xdat files inside date or experiment folders become file assets; subfolders of experiment folders become folder assets and are still recursed for xdat datasets.
- download_asset() in drive.py for downloading file or folder assets; folder assets are downloaded recursively, mirroring the Drive subtree.
- Catalog.assets_df property: full asset catalog as a pandas DataFrame.
- Catalog.list_assets(): query assets with optional date_folder, experiment, and asset_type filters.
- Catalog.download_asset(drive_path, asset_name): download an asset to local_data_dir/assets/{drive_path}/{asset_name}.
- Catalog.get_asset_path(drive_path, asset_name): return local path, downloading automatically if needed.
- Catalog JSON format changed from a bare list to {"datasets": [...], "assets": [...]}. Old flat-list catalogs are migrated automatically on the next scan().
- AssetEntry exported from the top-level package.
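The catalog format change can be sketched as a loader that accepts both shapes. The function load_catalog and the sample entries are hypothetical; the sketch only shows the bare-list-to-dict upgrade, not how the library performs it during scan().

```python
import json
from typing import Any, Dict


def load_catalog(text: str) -> Dict[str, list]:
    """Parse catalog JSON, upgrading the old bare-list form.

    Hypothetical sketch; not the library's loader.
    """
    data: Any = json.loads(text)
    if isinstance(data, list):
        # Old format: a bare list of dataset entries, no assets.
        return {"datasets": data, "assets": []}
    return {
        "datasets": data.get("datasets", []),
        "assets": data.get("assets", []),
    }


old = load_catalog('[{"base_name": "rec01"}]')
new = load_catalog('{"datasets": [], "assets": [{"asset_name": "logs"}]}')
```

Both calls return the same two-key dict shape, so downstream code does not need to know which format was on disk.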
Changed
- Catalog.list() and Catalog.list_assets(): renamed the date parameter to date_folder to match the catalog column name and make clear that an exact folder name (e.g. "2026-02-15_batch") is required, not a date prefix.
[0.0.1] - 2026-04-02
Added
- Config dataclass with from_file() classmethod; supports ~ and $ENV_VAR expansion in path fields and RADIENS_DRIVE_CATALOG_CONFIG env var fallback.
- build_drive_service() for authenticating with a Google service account.
- scan_drive() for recursive Drive scanning; returns DatasetEntry records with date_folder, experiment, drive_path, and drive_file_ids.
- download_dataset() for chunked download of xdat filesets to a local directory.
- Catalog class with scan(), df, list(), download(), get_path(), and status().
- MkDocs documentation site with Material theme, auto-generated API reference, and versioned deployment via mike.
- CI/CD workflows: linting, type checking, tests, docs deployment, and PyPI publishing via GitHub Actions.
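The ~ and $ENV_VAR expansion mentioned for Config path fields can be approximated with the standard library. The helper expand_path and the DATA_ROOT variable are made up for this sketch, and the order of the two expansions is an assumption.

```python
import os


def expand_path(raw: str) -> str:
    """Expand $ENV_VAR references, then a leading ~ (assumed order)."""
    return os.path.expanduser(os.path.expandvars(raw))


os.environ["DATA_ROOT"] = "/srv/data"  # made-up variable for the demo
expanded = expand_path("$DATA_ROOT/recordings")
home_expanded = expand_path("~/recordings")
print(expanded)
```

The second call resolves against the current user's home directory, so its result depends on the machine it runs on.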