Caching and Archiving for Time Series Dataframes

This notebook demonstrates a caching system for small Python projects. It is designed to solve two problems:

  • Avoid repeating expensive reading and processing chores, using the diskcache library for speed.

  • Provide automatic CSV file backup of the cached data, in addition to retrieving processed elevation data efficiently.

Accelerating a fetch

Let’s say you commonly find yourself writing code like this to take care of repeated downloading or processing chores. Maybe you are retrieving model output or observed data and then doing some light processing:

def get_data(station,variable,filter="none"):
    df = read_ts_repo(station,variable)
    df.columns=['value']
    if filter == "none":
        return df
    elif filter == "cosine_lanczos":
        df = df.interpolate(limit=4)     # so that cosine_lanczos doesn't expand small gaps
        return cosine_lanczos(df,'40H')

df0 = get_data(station="mab", variable="flow", filter="cosine_lanczos")
# ... do some plotting or further processing, etc

The function get_data can be a tedious bottleneck, particularly if you are developing or re-running several times in a row. It may be reasonable for the read to take a while the first time, when you are still coaxing it into shape. In this tutorial we describe a decorator that greatly accelerates the second and later invocations. Even if get_data takes seconds, the next invocation will take tenths or hundredths of a second.

All you need to do for optimal use is rename the function “get_data” to something more suitable as a csv file name (e.g. project_data) and apply a decorator:

@cache_dataframe()
def project_data(station,variable,filter="none"):
    """ Note that all three arguments must be passed using keyword argument syntax, and all three are used as cache keys.
        You can cherry-pick which arguments serve as keys, as will be shown later.
    """
    pass   # replace with the original processing logic

Archiving

The second service provided is that everything in the cache can be dumped into a sensible csv file. In the subsequent sections you will learn how to decorate your fetching function, dump the cache to csv, and reconstitute the cache (fairly) automatically.

[9]:
%load_ext autoreload
%autoreload 2

Import Necessary Libraries

We import pandas for data manipulation, the dms_datastore modules that provide time series reading and caching, and the cosine_lanczos filter from vtools.

[18]:

import pandas as pd
from dms_datastore.read_multi import *
from dms_datastore.caching import *
from vtools import cosine_lanczos

Function Definition with Caching

Here we define a function elev_data that reads elevation data for a given station and variable, applies a cosine lanczos filter, and returns both the original and filtered data concatenated as a DataFrame. This function is decorated with @cache_dataframe to enable caching of its results.

[32]:
@cache_dataframe()
def elev_data(station, variable, subloc):
    data = read_ts_repo(station, variable, subloc).loc[pd.Timestamp(2012,1,1):pd.Timestamp(2023,1,1)]
    filt = cosine_lanczos(data, '40H')
    out = pd.concat([data,filt], axis=1)
    out.columns = ["value","filt"]
    out = out.round(3)
    return out

Using the Caching System

We call the elev_data function with specific parameters to fetch data, which is cached automatically by the decorator. This step fetches data for two different stations, and the call for Martinez (mrz) happens twice: the first invocation takes 10 s, the second 0.2 s.

[20]:
LocalCache.instance().clear() # Clear the cache
df1 = elev_data(station="mrz", variable="elev", subloc="upper")   # Laborious
print(df1)
df2 = elev_data(station="mal", variable="elev", subloc="upper")
df1 = elev_data(station="mrz",variable="elev",subloc="upper")     # Cached
print(df1)

Cache instance created
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_1991.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_1992.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_1993.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_1994.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_1995.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_1996.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_1997.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_1998.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_1999.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2000.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2001.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2002.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2003.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2004.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2005.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2006.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2007.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2008.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2009.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2010.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2011.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2012.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2013.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2014.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2015.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2016.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2017.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2018.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2019.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2020.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2021.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2022.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2023.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mrz@upper_40_elev_2024.csv
transition
 [None, None]
here None None ts
                      value
datetime
1991-01-26 18:45:00  -1.81
1991-01-26 19:00:00  -1.53
1991-01-26 19:15:00  -1.26
1991-01-26 19:30:00  -0.98
1991-01-26 19:45:00  -0.74
...                    ...
2024-07-03 06:15:00   0.56
2024-07-03 06:30:00   0.47
2024-07-03 06:45:00   0.50
2024-07-03 07:00:00   0.67
2024-07-03 07:15:00   0.84

[1172307 rows x 1 columns]
<class 'pandas.core.frame.DataFrame'>
meta is False
d:\delta\models\vtools3\vtools\functions\filter.py:31: FutureWarning: 'H' is deprecated and will be removed in a future version, please use 'h' instead.
  cp = pd.tseries.frequencies.to_offset(cutoff_period)
                     value  filt
datetime
2018-01-01 00:00:00   4.91   NaN
2018-01-01 00:15:00   5.04   NaN
2018-01-01 00:30:00   5.13   NaN
2018-01-01 00:45:00   5.19   NaN
2018-01-01 01:00:00   5.21   NaN
...                    ...   ...
2022-12-31 23:00:00   4.63   NaN
2022-12-31 23:15:00   4.48   NaN
2022-12-31 23:30:00   4.28   NaN
2022-12-31 23:45:00   4.11   NaN
2023-01-01 00:00:00   3.94   NaN

[175297 rows x 2 columns]
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_1992.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_1993.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_1994.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_1995.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_1996.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_1997.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_1998.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_1999.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2000.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2001.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2002.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2003.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2004.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2005.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2006.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2007.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2008.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2009.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2010.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2011.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2012.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2013.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2014.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2015.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2016.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2017.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2018.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2019.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2020.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2021.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2022.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2023.csv
//cnrastore-bdo/Modeling_Data/repo/continuous/screened\des_mal@upper_60_elev_2024.csv
transition
 [None, None]
here None None ts
                      value
datetime
1992-08-04 14:00:00 -0.010
1992-08-04 14:15:00  0.320
1992-08-04 14:30:00  0.550
1992-08-04 14:45:00  0.750
1992-08-04 15:00:00  1.030
...                    ...
2024-07-03 06:15:00  2.024
2024-07-03 06:30:00  1.981
2024-07-03 06:45:00  1.802
2024-07-03 07:00:00  1.665
2024-07-03 07:15:00  1.577

[1118950 rows x 1 columns]
<class 'pandas.core.frame.DataFrame'>
meta is False
                     value  filt
datetime
2018-01-01 00:00:00   4.91   NaN
2018-01-01 00:15:00   5.04   NaN
2018-01-01 00:30:00   5.13   NaN
2018-01-01 00:45:00   5.19   NaN
2018-01-01 01:00:00   5.21   NaN
...                    ...   ...
2022-12-31 23:00:00   4.63   NaN
2022-12-31 23:15:00   4.48   NaN
2022-12-31 23:30:00   4.28   NaN
2022-12-31 23:45:00   4.11   NaN
2023-01-01 00:00:00   3.94   NaN

[175297 rows x 2 columns]
d:\delta\models\vtools3\vtools\functions\filter.py:31: FutureWarning: 'H' is deprecated and will be removed in a future version, please use 'h' instead.
  cp = pd.tseries.frequencies.to_offset(cutoff_period)
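
The difference between the laborious and the cached call can be measured with a simple timer. The sketch below is only an illustration of the pattern: slow_fetch is a hypothetical stand-in for the decorated function, the sleep replaces the real file reads, and functools.lru_cache (in-memory only) is swapped in for the disk-backed cache.

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def slow_fetch(station):
    time.sleep(0.2)            # stand-in for reading many yearly files
    return (station, "data")

t0 = time.perf_counter()
slow_fetch("mrz")              # miss: pays the full cost
first_call = time.perf_counter() - t0

t0 = time.perf_counter()
slow_fetch("mrz")              # hit: returns the stored result
second_call = time.perf_counter() - t0

print(f"first: {first_call:.3f} s, second: {second_call:.6f} s")
```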

Saving Cache to CSV

After fetching the data, which populates the cache, we save the cached data to CSV files. This ensures that we have a persistent copy of the cached data on disk.

This can be agonizingly slow.

[21]:
cache_to_csv()
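
The frames reloaded later in this notebook carry station, subloc and variable columns, which suggests that the key arguments are written alongside the data. The following is a rough sketch of that idea using only the standard library. It is an assumption about the layout, not the actual cache_to_csv code; the cache dict is hypothetical and the mal value is a made-up sample.

```python
import csv
import io

# Hypothetical cache contents: key arguments -> list of (timestamp, value) rows
cache = {
    ("mrz", "upper", "elev"): [("2018-01-01 00:00", 4.91), ("2018-01-01 00:15", 5.04)],
    ("mal", "upper", "elev"): [("2018-01-01 00:00", 2.02)],
}

buf = io.StringIO()            # stands in for an open csv file
writer = csv.writer(buf)
writer.writerow(["datetime", "station", "subloc", "variable", "value"])
for (station, subloc, variable), rows in cache.items():
    for stamp, value in rows:
        # Repeat the key on every row so one flat file can hold many entries
        writer.writerow([stamp, station, subloc, variable, value])

csv_text = buf.getvalue()
```

Repeating the key on every row costs disk space and write time, but it keeps the file flat and easy to inspect or filter with ordinary tools.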

Reloading Cached Data from CSV

You can load the CSV files back into the cache. This lets you distribute the data as small data packs in csv form and reconstitute the cache from them.

[31]:
# Clear the cache
cache = LocalCache.instance()
cache.clear()

print("Loading cache from csv")
load_cache_csv('elev_data.csv')
print("Done")

# Now try again
print("Here is the data from the usual API")
df1 = elev_data(station="mrz", variable="elev", subloc="upper")


print(df1)

# This is the code that directly pulls from the cache
print("Here it is using nuts and bolts calls. Can't think of any obvious reason to do it this way")
print(cache[generate_cache_key("elev_data", station="mrz", variable="elev", subloc="upper")])



Loading cache from csv
Done
Here is the data from the usual API
                    station subloc variable  value  filt
DatetimeIndex
2018-01-01 00:00:00     mrz  upper     elev   4.91   NaN
2018-01-01 00:15:00     mrz  upper     elev   5.04   NaN
2018-01-01 00:30:00     mrz  upper     elev   5.13   NaN
2018-01-01 00:45:00     mrz  upper     elev   5.19   NaN
2018-01-01 01:00:00     mrz  upper     elev   5.21   NaN
...                     ...    ...      ...    ...   ...
2022-12-31 23:00:00     mrz  upper     elev   4.63   NaN
2022-12-31 23:15:00     mrz  upper     elev   4.48   NaN
2022-12-31 23:30:00     mrz  upper     elev   4.28   NaN
2022-12-31 23:45:00     mrz  upper     elev   4.11   NaN
2023-01-01 00:00:00     mrz  upper     elev   3.94   NaN

[175297 rows x 5 columns]
Here it is using nuts and bolts calls. Can't think of any obvious reason to do it this way
                    station subloc variable  value  filt
DatetimeIndex
2018-01-01 00:00:00     mrz  upper     elev   4.91   NaN
2018-01-01 00:15:00     mrz  upper     elev   5.04   NaN
2018-01-01 00:30:00     mrz  upper     elev   5.13   NaN
2018-01-01 00:45:00     mrz  upper     elev   5.19   NaN
2018-01-01 01:00:00     mrz  upper     elev   5.21   NaN
...                     ...    ...      ...    ...   ...
2022-12-31 23:00:00     mrz  upper     elev   4.63   NaN
2022-12-31 23:15:00     mrz  upper     elev   4.48   NaN
2022-12-31 23:30:00     mrz  upper     elev   4.28   NaN
2022-12-31 23:45:00     mrz  upper     elev   4.11   NaN
2023-01-01 00:00:00     mrz  upper     elev   3.94   NaN

[175297 rows x 5 columns]
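
The station, subloc and variable columns in the output above hint at how a flat csv file can be split back into separate cache entries. Here is a minimal sketch of that regrouping with the standard library. It is not the load_cache_csv implementation: KEY_COLUMNS and restored are hypothetical names, and the sample rows are illustrative.

```python
import csv
import io
from collections import defaultdict

csv_text = """datetime,station,subloc,variable,value
2018-01-01 00:00,mrz,upper,elev,4.91
2018-01-01 00:15,mrz,upper,elev,5.04
2018-01-01 00:00,mal,upper,elev,2.02
"""

KEY_COLUMNS = ("station", "subloc", "variable")   # assumed key argument names

restored = defaultdict(list)
for row in csv.DictReader(io.StringIO(csv_text)):
    # Rows that share the same key columns belong to the same cache entry
    key = tuple(row[name] for name in KEY_COLUMNS)
    restored[key].append((row["datetime"], float(row["value"])))

mrz_rows = restored[("mrz", "upper", "elev")]
```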