Dropbox Data Processing System
==============================

Overview
--------

The Dropbox Data Processing System is a component of the DMS Datastore package that facilitates the collection, transformation, and storage of time-series data using a configuration-driven workflow.

Key Components
--------------

1. ``dropbox_data.py``

   Main processing script that reads a YAML specification file and processes data according to the defined rules.

2. ``dropbox_spec.yaml``

   YAML configuration that defines data sources, collection parameters, and metadata inference rules.

How it works
------------

The processing follows these steps:

1. Read a YAML specification.
2. For each entry: locate files, read the time series, augment it with metadata, and write standardized output files.

Usage
-----

Basic usage from Python::

    from dms_datastore.dropbox_data import dropbox_data

    dropbox_data("path/to/dropbox_spec.yaml")

Or run as a module::

    python -m dms_datastore.dropbox_data

Configuration Specification
---------------------------

Typical keys in the spec include ``dropbox_home``, ``dest``, and a ``data`` list whose entries contain ``collect`` and ``metadata`` sections. The spec also supports ``metadata_infer`` rules that use regular expressions.

Example configuration snippet::

    - name: USGS Aquarius flows
      skip: False
      collect:
        name: file_search
        recursive_search: True
        file_pattern: "Discharge.ft^3_s.velq@*.EntireRecord.csv"
        location: "//cnrastore-bdo/.../dropbox/usgs_aquarius_request_2020/**"
        reader: read_ts
      metadata:
        station_id: infer_from_agency_id
        source: aquarius
        agency: usgs
        param: flow
        sublocation: default
        unit: ft^3/s
      metadata_infer:
        regex: .*@(.*)\.EntireRecord.csv
        groups:
          1: agency_id

Key Classes and Functions
-------------------------

- ``DataCollector`` — handles file discovery based on patterns.
- ``get_spec`` — loads and caches the YAML spec.
- ``populate_meta`` — enriches metadata using the station database.
- ``infer_meta`` — extracts metadata from filenames via regex.
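The ``metadata_infer`` mechanism above can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name ``apply_infer_rule`` and the sample filename (including the agency id ``11455420``) are hypothetical, while the regex and group mapping are taken from the example spec.

```python
import re

def apply_infer_rule(filename, regex, groups):
    """Return metadata fields extracted from a filename via a regex rule.

    ``groups`` maps capture-group numbers to metadata field names,
    mirroring the ``groups`` mapping in the spec.
    """
    match = re.match(regex, filename)
    if match is None:
        return {}
    return {field: match.group(idx) for idx, field in groups.items()}

meta = apply_infer_rule(
    "Discharge.ft^3_s.velq@11455420.EntireRecord.csv",
    r".*@(.*)\.EntireRecord.csv",
    {1: "agency_id"},
)
# meta == {"agency_id": "11455420"}
```

With ``station_id: infer_from_agency_id`` in the spec, the inferred ``agency_id`` would then be used to look up the internal station id.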
Output
------

Processed files are saved in the destination directory. Filenames follow the pattern ``{source}_{station_id}_{agency_id}_{param}.csv`` and may be chunked by year.

Additional Notes
----------------

- Relies on a station database for lookups.
- Standardizes time series to include a ``value`` column and metadata headers.
- Files may be year-sharded for easier management.
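The filename convention above can be sketched with a small helper. This is illustrative only: ``output_filename`` is not part of ``dms_datastore``, the sample arguments are hypothetical, and the ``_{year}`` suffix for year-sharded files is an assumption, since the exact sharded form is not documented here.

```python
def output_filename(source, station_id, agency_id, param, year=None):
    """Build a filename following {source}_{station_id}_{agency_id}_{param}.csv.

    The optional year suffix for year-sharded output is an assumed form.
    """
    stem = f"{source}_{station_id}_{agency_id}_{param}"
    if year is not None:
        stem += f"_{year}"
    return stem + ".csv"

print(output_filename("aquarius", "fpt", "11447650", "flow"))
# aquarius_fpt_11447650_flow.csv
```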