Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
No unreleased changes.
[0.0.10] - 2026-06-04
Added
- xdat_dataset_exists_locally() utility to verify that all three canonical xdat files (_data.xdat, .xdat.json, _timestamp.xdat) exist locally.
- Catalog.scan() now warns when a dataset on Drive is missing one or more required xdat files.
- Catalog.get_dataset_path() and Catalog.get_asset_path() now automatically trigger a scan() if the requested entry is not found in the local catalog, improving resilience to out-of-date catalogs.
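The three-file completeness check can be sketched with pathlib. This is a minimal illustration, not the package's implementation: the helper names has_all_xdat_files and missing_xdat_files are hypothetical, and the sketch assumes each file is named as the dataset's base_name followed by one of the three suffixes.

```python
from pathlib import Path

# Suffixes of the three canonical xdat files; the "base_name + suffix" naming is an assumption.
XDAT_SUFFIXES = ("_data.xdat", ".xdat.json", "_timestamp.xdat")


def missing_xdat_files(directory: Path, base_name: str) -> list[str]:
    """List expected xdat file names that are absent (useful for a scan-time warning)."""
    return [
        f"{base_name}{suffix}"
        for suffix in XDAT_SUFFIXES
        if not (Path(directory) / f"{base_name}{suffix}").is_file()
    ]


def has_all_xdat_files(directory: Path, base_name: str) -> bool:
    """Hypothetical helper: True only when every canonical file for base_name exists."""
    return not missing_xdat_files(directory, base_name)
```

Treating a dataset as local only when the missing list is empty is what keeps a half-downloaded fileset from being reported as available.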
Changed
- Catalog JSON writes are now atomic (using a temporary file and replace()) to prevent data loss during interruptions.
- File downloads are now atomic (using a .tmp suffix) to ensure interrupted downloads do not leave partial or corrupted files on disk.
- Catalog.scan() now reconciles local datasets by verifying the presence of all three xdat files, preventing a dataset from being marked "local" if it is incomplete.
- Refactored the internal Catalog path-restoration logic into _restore_dataset_local_paths() and _restore_asset_local_paths().
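The atomic-write technique can be sketched with the standard library: write to a temporary file in the same directory, flush it, then swap it into place with os.replace(). This is an illustrative sketch, not the package's code; the function name write_json_atomic is hypothetical. The same idea applies to downloads written to a .tmp path and renamed only once complete.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_json_atomic(path: Path, data: Any) -> None:
    """Write JSON so readers see either the old file or the complete new one, never a partial file.

    Hypothetical helper. os.replace() is atomic on POSIX when source and
    destination are on the same filesystem, hence the temp file lives next to the target.
    """
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # make sure bytes hit the disk before the swap
        os.replace(tmp_name, path)
    except BaseException:
        # The original file is untouched; remove the partial temp file and re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```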
Fixed
- Improved validation of asset entries during catalog load; drive_path must now be a non-empty string.
- Streamlined local path construction for assets in download_asset().
[0.0.9] - 2026-05-20
Added
- upload_time column in Catalog.df and Catalog.assets_df: a UTC-aware datetime64 column populated from the Drive API's createdTime field during scan(). Rows without a recorded date are NaT. Existing catalogs without this field load cleanly; missing values default to None/NaT.
Changed
- Catalog.prefetch() progress bar now covers only items being downloaded (not skipped ones), so the total count and ETA reflect real work. The bar is suppressed entirely when all matching items are already local.
- Skipped items in prefetch() are logged at DEBUG level instead of being silently counted.
- Per-item download log messages in Catalog.download_dataset() and Catalog.download_asset() were lowered from INFO to DEBUG. The prefetch() progress bar is the intended progress signal for bulk operations; direct single-item calls remain observable via debug logging.
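Sizing the progress bar to real work amounts to partitioning entries before any download starts. The sketch below is hypothetical (PrefetchPlan, plan_prefetch and the is_local callback are illustrative names, not the package's API); a caller would create a bar such as tqdm(total=len(plan.to_download)) only when plan.show_progress is true.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


@dataclass
class PrefetchPlan:
    """Hypothetical container: entries that need downloading vs. entries already local."""
    to_download: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)

    @property
    def show_progress(self) -> bool:
        # Suppress the bar entirely when there is nothing to download.
        return bool(self.to_download)


def plan_prefetch(
    entries: Iterable[str],
    is_local: Callable[[str], bool],
    force: bool = False,
) -> PrefetchPlan:
    """Split entries up front so the progress total counts only real downloads."""
    plan = PrefetchPlan()
    for entry in entries:
        if not force and is_local(entry):
            logger.debug("Skipping %s: already present locally", entry)
            plan.skipped.append(entry)
        else:
            plan.to_download.append(entry)
    return plan
```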
[0.0.8] - 2026-04-21
Added
- Catalog.scan() now returns a ScanResult dataclass with counts of datasets_new, datasets_existing, datasets_removed, assets_new, assets_existing, and assets_removed, making it easy to inspect what changed after a Drive rescan. ScanResult.__str__ renders as a compact table.
- ScanResult dataclass exported from the top-level package.
- Progress bar during Drive file traversal in scan() (respects the quiet= flag passed to Catalog).
- Progress bar in prefetch() showing the overall item count and the name of the entry currently being downloaded (also respects quiet=).
- Catalog.prefetch(drive_path, drive_path_prefix, drive_path_contains, datasets=True, assets=True, asset_type=None, force=False): bulk, idempotent downloader that skips entries already present on disk. Pass force=True to re-download every matching entry regardless of local state. Returns a PrefetchResult with per-category download and skip counts.
- PrefetchResult dataclass exported from the top-level package.
- CatalogError and EntryNotFoundError exceptions exported from the top-level package. EntryNotFoundError (a subclass of CatalogError) is raised when a dataset or asset lookup fails; CatalogError is raised for malformed catalog JSON.
- Catalog.list_datasets(drive_path, drive_path_prefix, drive_path_contains): convenience method returning a filtered dataset DataFrame.
- Catalog.list_assets(drive_path, drive_path_prefix, drive_path_contains, asset_type): convenience method returning a filtered asset DataFrame. The additional asset_type filter accepts "folder" or "file".
- quiet keyword argument on Catalog(...) suppresses tqdm progress bars during downloads.
- DatasetEntry TypedDict exported from the top-level package (previously only AssetEntry was exported).
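New, existing and removed counts of the kind ScanResult reports can be derived with set arithmetic over entry identities, for datasets keyed by (drive_path, base_name), the identity introduced in this release. The sketch below is a hypothetical stand-in (ScanCounts, diff_keys and render_table are illustrative names), not the package's ScanResult.

```python
from dataclasses import dataclass
from typing import Hashable, Iterable


@dataclass(frozen=True)
class ScanCounts:
    """Hypothetical per-category summary of one rescan."""
    new: int
    existing: int
    removed: int


def diff_keys(previous: Iterable[Hashable], current: Iterable[Hashable]) -> ScanCounts:
    """Compare catalog identities before and after a rescan."""
    before, after = set(previous), set(current)
    return ScanCounts(
        new=len(after - before),
        existing=len(after & before),
        removed=len(before - after),
    )


def render_table(rows: dict[str, ScanCounts]) -> str:
    """Fixed-width table, similar in spirit to a compact __str__ rendering."""
    lines = [f"{'':<10}{'new':>6}{'existing':>10}{'removed':>9}"]
    for label, c in rows.items():
        lines.append(f"{label:<10}{c.new:>6}{c.existing:>10}{c.removed:>9}")
    return "\n".join(lines)
```

Keying on the (drive_path, base_name) tuple rather than base_name alone means two recordings named rec1 in different folders count as two distinct entries.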
Changed
- Catalog.scan() return type changed from None to ScanResult.
- Drive scan warnings (e.g. incomplete search results) are now emitted through a logger obtained via logging.getLogger rather than warnings.warn.
- Dataset identity is now (drive_path, base_name) instead of base_name alone, allowing distinct recordings with the same base name in different folders.
- Dataset downloads now mirror the Drive hierarchy under local storage: local_data_dir/{drive_path}/...
- Catalog.df and Catalog.assets_df now use a path-first schema and no longer include the legacy date_folder / experiment columns.
- Breaking: Catalog.download() renamed to Catalog.download_dataset() for symmetry with download_asset().
- Breaking: Catalog.get_path() renamed to Catalog.get_dataset_path() for symmetry with get_asset_path().
- Catalog.download_dataset() and Catalog.get_dataset_path() now require both drive_path and base_name to identify a dataset.
- Catalog JSON parse errors now raise CatalogError instead of ValueError. "Not found" errors from the single-item download and path-lookup methods now raise EntryNotFoundError instead of ValueError.
Removed
- Breaking: Catalog(config_path=...) constructor shortcut removed. Use Catalog(Config.from_file("path")) or Catalog() (which calls Config.from_file() with auto-discovery).
- Removed the Catalog.list_experiment_paths(), Catalog.list_by_path(), and Catalog.status() convenience methods. Use Catalog.list_datasets() / Catalog.list_assets() or operate on the raw DataFrames directly.
Fixed
- Catalog.scan() now preserves local_path for datasets keyed by (drive_path, base_name) during refresh, preventing path collisions when multiple folders contain the same base_name.
- py.typed marker (PEP 561): the package now declares itself as typed, allowing mypy to use its annotations in downstream consumers.
- Catalog.scan() now reconciles local_path against disk rather than trusting the previous catalog blindly. If the catalog is deleted, entries whose data already exists at the canonical local path (local_data_dir/{drive_path} for datasets, local_data_dir/assets/{drive_path}/{asset_name} for assets) are recovered automatically instead of triggering unnecessary re-downloads. Stale catalog paths that no longer exist on disk are also cleared.
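The disk-first reconciliation rule can be sketched as a small decision function over the canonical paths listed above. This is a hypothetical illustration (canonical_dataset_dir, canonical_asset_path and reconcile_local_path are not the package's names), and it uses a plain existence check; a stricter check such as requiring all three xdat files would slot in at the same place.

```python
from pathlib import Path
from typing import Optional


def canonical_dataset_dir(local_data_dir: Path, drive_path: str) -> Path:
    # Datasets mirror the Drive hierarchy: local_data_dir/{drive_path}
    return Path(local_data_dir) / drive_path


def canonical_asset_path(local_data_dir: Path, drive_path: str, asset_name: str) -> Path:
    # Assets live under local_data_dir/assets/{drive_path}/{asset_name}
    return Path(local_data_dir) / "assets" / drive_path / asset_name


def reconcile_local_path(recorded: Optional[str], canonical: Path) -> Optional[str]:
    """Pick the local_path a refreshed entry should carry, trusting disk over the old catalog.

    Keep a recorded path that still exists; otherwise recover the canonical
    location if data is already there; otherwise clear the stale value.
    """
    if recorded and Path(recorded).exists():
        return recorded
    if canonical.exists():
        return str(canonical)
    return None
```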
[v0.0.7] - 2026-04-08
Changed
- Catalog.scan() now defaults to the flat scanner (flat=True), which is faster and works even when the root folder is inaccessible. Pass flat=False to use the recursive traversal.
- Removed automatic fallback logic from scan(). The caller now chooses the scan strategy via the flat keyword argument.
[v0.0.6] - 2026-04-07
Added
- Catalog.list_experiment_paths() and Catalog.list_by_path() to help discover and filter datasets by their full Drive path.
Fixed
- Depth-based asset cataloging: nested subfolders inside experiments (depth 3+) are now correctly cataloged as folder assets. Previously only direct subfolders (depth 2) were included.
[v0.0.5] - 2026-04-06
Added
- Flat-scan fallback: when the root folder is inaccessible (HTTP 403/404) or returns an empty listing, Catalog.scan() automatically falls back to a flat files.list() of all files visible to the service account and reconstructs the folder hierarchy from parents metadata. A UserWarning is emitted when the fallback activates.
- scan_drive_flat() internal function for flat-listing all Drive files with paginated pageSize=1000 requests and parent-chain path reconstruction.
- Catalog can now be created without a pre-built Config object: Catalog(config_path="/path/to/config.json") and Catalog() are both supported.
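Parent-chain path reconstruction works because a flat listing gives each file only its immediate parents; walking upward rebuilds the folder path. The sketch below is hypothetical (build_path and the DriveFile shape are illustrative, modelled on the name and parents fields of Drive file metadata) and follows only the first parent of each item.

```python
from typing import Dict, List, Optional, TypedDict


class DriveFile(TypedDict):
    """Subset of Drive file metadata assumed for this sketch."""
    name: str
    parents: List[str]


def build_path(file_id: str, files: Dict[str, DriveFile], root_id: Optional[str] = None) -> str:
    """Follow first parents upward until root_id or an id missing from the listing.

    A visited set guards against cycles in malformed metadata.
    """
    parts: List[str] = []
    seen = set()
    current: Optional[str] = file_id
    while current is not None and current != root_id and current in files and current not in seen:
        seen.add(current)
        entry = files[current]
        parts.append(entry["name"])
        parents = entry.get("parents") or []
        current = parents[0] if parents else None
    return "/".join(reversed(parts))
```

Stopping at an id that is absent from the listing is what lets this approach work when the root folder itself is not visible to the service account.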
Changed
- Config.from_file() discovery now also checks ~/.config/radiens-drive/config.json and /etc/radiens-drive/config.json after the existing environment variable and local path checks.
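Taken together with the 0.0.3 entry, the discovery order is: the RADIENS_DRIVE_CATALOG_CONFIG environment variable, .secrets/config.json, config.json in the working directory, then the two system-wide paths. The sketch below is a hypothetical illustration of that order (candidate_config_paths and discover_config are not the package's names); it assumes .secrets/ is resolved relative to the working directory and that a missing env-var target falls through to the next candidate.

```python
import os
from pathlib import Path
from typing import List, Optional

ENV_VAR = "RADIENS_DRIVE_CATALOG_CONFIG"


def candidate_config_paths(cwd: Optional[Path] = None) -> List[Path]:
    """Config locations in search order (relative paths resolved against cwd: an assumption)."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    candidates: List[Path] = []
    env_value = os.environ.get(ENV_VAR)
    if env_value:
        candidates.append(Path(env_value))
    candidates += [
        cwd / ".secrets" / "config.json",
        cwd / "config.json",
        Path.home() / ".config" / "radiens-drive" / "config.json",
        Path("/etc/radiens-drive/config.json"),
    ]
    return candidates


def discover_config(cwd: Optional[Path] = None) -> Path:
    """Return the first existing candidate, mirroring the FileNotFoundError behaviour."""
    for candidate in candidate_config_paths(cwd):
        if candidate.is_file():
            return candidate
    raise FileNotFoundError("no config file found in any searched location")
```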
[0.0.4] - 2026-04-02
Fixed
[0.0.3] - 2026-04-02
Changed
- Config.from_file() now auto-discovers config files when called with no arguments: it checks the RADIENS_DRIVE_CATALOG_CONFIG env var, then .secrets/config.json, then config.json in the current working directory. Raises FileNotFoundError (previously ValueError) when no config is found.
[0.0.2] - 2026-04-02
Added
- AssetEntry TypedDict for non-xdat Drive content (folders like logs/, PowerPoints, writeups, etc.).
- scan_drive() now returns a (datasets, assets) tuple. Assets are auto-discovered during scan: non-xdat files inside date or experiment folders become file assets; subfolders of experiment folders become folder assets and are still searched recursively for xdat datasets.
- download_asset() in drive.py for downloading file or folder assets; folder assets are downloaded recursively, mirroring the Drive subtree.
- Catalog.assets_df property: the full asset catalog as a pandas DataFrame.
- Catalog.list_assets(): query assets with optional date_folder, experiment, and asset_type filters.
- Catalog.download_asset(drive_path, asset_name): download an asset to local_data_dir/assets/{drive_path}/{asset_name}.
- Catalog.get_asset_path(drive_path, asset_name): return the local path, downloading automatically if needed.
- Catalog JSON format changed from a bare list to {"datasets": [...], "assets": [...]}. Old flat-list catalogs are migrated automatically on the next scan().
- AssetEntry exported from the top-level package.
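Reading both catalog layouts amounts to normalizing a bare list into the keyed form. The sketch below is hypothetical (normalize_catalog and load_catalog_text are illustrative names) and assumes a bare list holds dataset entries only, since assets did not exist before this release.

```python
import json
from typing import Any, Dict, List


def normalize_catalog(raw: Any) -> Dict[str, List[dict]]:
    """Return the {"datasets": [...], "assets": [...]} form for either layout."""
    if isinstance(raw, list):
        # Old flat layout: every entry is a dataset, no assets yet.
        return {"datasets": raw, "assets": []}
    if isinstance(raw, dict):
        return {
            "datasets": list(raw.get("datasets", [])),
            "assets": list(raw.get("assets", [])),
        }
    raise ValueError(f"unrecognized catalog layout: {type(raw).__name__}")


def load_catalog_text(text: str) -> Dict[str, List[dict]]:
    """Parse catalog JSON text and normalize it."""
    return normalize_catalog(json.loads(text))
```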
Changed
- Catalog.list() and Catalog.list_assets(): renamed the date parameter to date_folder to match the catalog column name and make clear that an exact folder name (e.g. "2026-02-15_batch") is required, not a date prefix.
[0.0.1] - 2026-04-02
Added
- Config dataclass with a from_file() classmethod; supports ~ and $ENV_VAR expansion in path fields and an RADIENS_DRIVE_CATALOG_CONFIG env var fallback.
- build_drive_service() for authenticating with a Google service account.
- scan_drive() for recursive Drive scanning; returns DatasetEntry records with date_folder, experiment, drive_path, and drive_file_ids.
- download_dataset() for chunked download of xdat filesets to a local directory.
- Catalog class with scan(), df, list(), download(), get_path(), and status().
- MkDocs documentation site with the Material theme, auto-generated API reference, and versioned deployment via mike.
- CI/CD workflows: linting, type checking, tests, docs deployment, and PyPI publishing via GitHub Actions.
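The ~ and $ENV_VAR expansion for path fields can be reproduced with os.path.expanduser and os.path.expandvars. This sketch is illustrative only: ExampleConfig and its credentials_path field are hypothetical, and it is not the package's Config class.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def expand_path(raw: str) -> Path:
    """Expand a leading ~ and any $ENV_VAR references in a path string."""
    return Path(os.path.expandvars(os.path.expanduser(raw)))


@dataclass
class ExampleConfig:
    """Hypothetical config with two path fields."""
    credentials_path: Path
    local_data_dir: Path

    @classmethod
    def from_dict(cls, raw: dict) -> "ExampleConfig":
        # Expand every path field once, at load time, so callers always see absolute paths.
        return cls(
            credentials_path=expand_path(raw["credentials_path"]),
            local_data_dir=expand_path(raw["local_data_dir"]),
        )
```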