{ "cells": [ { "cell_type": "markdown", "id": "b01f392d", "metadata": {}, "source": [ "# Caching and Archiving for Time Series Dataframes\n", "\n", "This notebook demonstrates the use of a caching system for small python projects designed to solve the following problems:\n", "\n", "- Avoid repeats of expensive reading and processing chores using the diskcache library (for speed)\n", "- Provide automatic csv file backup of data in addition retrieve processed elevation data efficiently.\n", "\n", "## Accelerating a fetch\n", "Let's say you commonly find yourself writing a code like this to take care of repeat downloading or processing chores. Maybe you are retrieving \n", "from models or observed data, and then doing some light processing:\n", "\n", "```python\n", "def get_data(station,variable,filter=\"none\"):\n", " df = read_ts_repo(station,variable)\n", " df.columns=['value']\n", " if filter == \"none\":\n", " return df\n", " elif filter == \"cosine_lanczos\":\n", " df = df.interpolate(limit=4) # so that cosine_lanczos doesn't expand small gaps \n", " return cosine_lanczos(df,'40H)\n", "\n", "df0 = get_data(station=\"mab\", variable=\"flow\", filter=\"cosine_lanczos\")\n", "# ... do some plotting or further processing, etc\n", "```\n", "\n", "The function `get_data` can be a tedious bottleneck, particularly if you are developing or re-running several times in a row. It may be reasonable for the read to take a little while the first time you are doing it and cajoling it. In this tutorial, we will describe a decorator that will greatly accelerate the second and later invocation. Even if `get_data` takes seconds, the next invocation will take tenths or hundredths. \n", "\n", "All you will need for optimal use is rename the function \"get_data\" something more reasonable for use as a csv file name (e.g. `project_data`) and use a decorator:\n", "\n", "```python\n", "@cache_dataframe\n", "def project_data(station,variable,filter=\"none\"):\n", " \"\"\" Note that all three arguments must be called useing keyword argument syntax and all three are used as keys. You can cherry\n", " pick this as will be shown later\n", " \"\"\"\n", " pass # replace with the original process or logic\n", " ...\n", "```\n", "## Archiving\n", "The second service provided is that everything in the cache can be dumped into a sensible csv file. In the subsequent sections you will learn how to decorate your fetching function, dump the cache to csv, re-constitute the cache (fairly) automatically.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 9, "id": "3bba4703", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The autoreload extension is already loaded. To reload it, use:\n", " %reload_ext autoreload\n" ] } ], "source": [ "%load_ext autoreload\n", "%autoreload 2 " ] }, { "cell_type": "markdown", "id": "b02ede4a", "metadata": {}, "source": [ "## Import Necessary Libraries\n", "\n", "We import `pandas` for data manipulation and modules from `dms_datastore` which provide functionalities for reading time series data and caching mechanisms.\n" ] }, { "cell_type": "code", "execution_count": 18, "id": "927806fd", "metadata": {}, "outputs": [], "source": [ "\n", "import pandas as pd\n", "from dms_datastore.read_multi import *\n", "from dms_datastore.caching import *\n", "from vtools import cosine_lanczos\n" ] }, { "cell_type": "markdown", "id": "8f10c1b9", "metadata": {}, "source": [ "## Function Definition with Caching\n", "\n", "Here we define a function `elev_data` that reads elevation data for a given station and variable, applies a cosine lanczos filter, and returns both the original and filtered data concatenated as a DataFrame. This function is decorated with `@cache_dataframe` to enable caching of its results.\n" ] }, { "cell_type": "code", "execution_count": 32, "id": "7b490295", "metadata": {}, "outputs": [], "source": [ "@cache_dataframe()\n", "def elev_data(station, variable, subloc):\n", " data = read_ts_repo(station, variable, subloc).loc[pd.Timestamp(2012,1,1):pd.Timestamp(2023,1,1)]\n", " filt = cosine_lanczos(data, '40H')\n", " out = pd.concat([data,filt], axis=1)\n", " out.columns = [\"value\",\"filt\"]\n", " out = out.round(3)\n", " return out\n" ] }, { "cell_type": "markdown", "id": "436f348b", "metadata": {}, "source": [ "## Using the Caching System\n", "\n", "We call the `elev_data` function with specific parameters to fetch data, which will be cached automatically due to our decorator. This step demonstrates fetching data for two different stations. The call for Martinez happens twice. The first invocation takes 10s, the second 0.2s.\n" ] }, { "cell_type": "code", "execution_count": 20, "id": "fda4aa83", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cache instance created\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_1991.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_1992.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_1993.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_1994.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_1995.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_1996.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_1997.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_1998.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_1999.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2000.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2001.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2002.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2003.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2004.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2005.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2006.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2007.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2008.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2009.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2010.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2011.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2012.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2013.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2014.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2015.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2016.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2017.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2018.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2019.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2020.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2021.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2022.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2023.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mrz@upper_40_elev_2024.csv\n", "transition\n", " [None, None]\n", "here None None ts\n", " value\n", "datetime \n", "1991-01-26 18:45:00 -1.81\n", "1991-01-26 19:00:00 -1.53\n", "1991-01-26 19:15:00 -1.26\n", "1991-01-26 19:30:00 -0.98\n", "1991-01-26 19:45:00 -0.74\n", "... ...\n", "2024-07-03 06:15:00 0.56\n", "2024-07-03 06:30:00 0.47\n", "2024-07-03 06:45:00 0.50\n", "2024-07-03 07:00:00 0.67\n", "2024-07-03 07:15:00 0.84\n", "\n", "[1172307 rows x 1 columns]\n", "\n", "meta is False\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "d:\\delta\\models\\vtools3\\vtools\\functions\\filter.py:31: FutureWarning: 'H' is deprecated and will be removed in a future version, please use 'h' instead.\n", " cp = pd.tseries.frequencies.to_offset(cutoff_period)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " value filt\n", "datetime \n", "2018-01-01 00:00:00 4.91 NaN\n", "2018-01-01 00:15:00 5.04 NaN\n", "2018-01-01 00:30:00 5.13 NaN\n", "2018-01-01 00:45:00 5.19 NaN\n", "2018-01-01 01:00:00 5.21 NaN\n", "... ... ...\n", "2022-12-31 23:00:00 4.63 NaN\n", "2022-12-31 23:15:00 4.48 NaN\n", "2022-12-31 23:30:00 4.28 NaN\n", "2022-12-31 23:45:00 4.11 NaN\n", "2023-01-01 00:00:00 3.94 NaN\n", "\n", "[175297 rows x 2 columns]\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_1992.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_1993.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_1994.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_1995.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_1996.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_1997.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_1998.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_1999.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2000.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2001.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2002.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2003.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2004.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2005.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2006.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2007.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2008.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2009.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2010.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2011.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2012.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2013.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2014.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2015.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2016.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2017.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2018.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2019.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2020.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2021.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2022.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2023.csv\n", "//cnrastore-bdo/Modeling_Data/repo/continuous/screened\\des_mal@upper_60_elev_2024.csv\n", "transition\n", " [None, None]\n", "here None None ts\n", " value\n", "datetime \n", "1992-08-04 14:00:00 -0.010\n", "1992-08-04 14:15:00 0.320\n", "1992-08-04 14:30:00 0.550\n", "1992-08-04 14:45:00 0.750\n", "1992-08-04 15:00:00 1.030\n", "... ...\n", "2024-07-03 06:15:00 2.024\n", "2024-07-03 06:30:00 1.981\n", "2024-07-03 06:45:00 1.802\n", "2024-07-03 07:00:00 1.665\n", "2024-07-03 07:15:00 1.577\n", "\n", "[1118950 rows x 1 columns]\n", "\n", "meta is False\n", " value filt\n", "datetime \n", "2018-01-01 00:00:00 4.91 NaN\n", "2018-01-01 00:15:00 5.04 NaN\n", "2018-01-01 00:30:00 5.13 NaN\n", "2018-01-01 00:45:00 5.19 NaN\n", "2018-01-01 01:00:00 5.21 NaN\n", "... ... ...\n", "2022-12-31 23:00:00 4.63 NaN\n", "2022-12-31 23:15:00 4.48 NaN\n", "2022-12-31 23:30:00 4.28 NaN\n", "2022-12-31 23:45:00 4.11 NaN\n", "2023-01-01 00:00:00 3.94 NaN\n", "\n", "[175297 rows x 2 columns]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "d:\\delta\\models\\vtools3\\vtools\\functions\\filter.py:31: FutureWarning: 'H' is deprecated and will be removed in a future version, please use 'h' instead.\n", " cp = pd.tseries.frequencies.to_offset(cutoff_period)\n" ] } ], "source": [ "LocalCache.instance().clear() # Clear the cache\n", "df1 = elev_data(station=\"mrz\", variable=\"elev\", subloc=\"upper\") # Laborious\n", "print(df1)\n", "df2 = elev_data(station=\"mal\", variable=\"elev\", subloc=\"upper\")\n", "df1 = elev_data(station=\"mrz\",variable=\"elev\",subloc=\"upper\") # Cached \n", "print(df1)\n" ] }, { "cell_type": "markdown", "id": "169fcc91", "metadata": {}, "source": [ "## Saving Cache to CSV\n", "\n", "After fetching and potentially caching the data, we proceed to save the cached data to CSV files. \n", "This ensures that we have a persistent copy of the cached data on disk. \n", "\n", "This can be agonizingly slow. \n" ] }, { "cell_type": "code", "execution_count": 21, "id": "2e0a6098", "metadata": {}, "outputs": [], "source": [ "cache_to_csv()\n" ] }, { "cell_type": "markdown", "id": "eed5b0fd", "metadata": {}, "source": [ "## Reloading Cached Data from CSV\n", "\n", "You can load the CSV files back into the cache. This way you can distribute the data as little data packs in csv form and then reconstitute.\n" ] }, { "cell_type": "code", "execution_count": 31, "id": "626fde19", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading cache from csv\n", "Done\n", "Here is the data from the usual API\n", " station subloc variable value filt\n", "DatetimeIndex \n", "2018-01-01 00:00:00 mrz upper elev 4.91 NaN\n", "2018-01-01 00:15:00 mrz upper elev 5.04 NaN\n", "2018-01-01 00:30:00 mrz upper elev 5.13 NaN\n", "2018-01-01 00:45:00 mrz upper elev 5.19 NaN\n", "2018-01-01 01:00:00 mrz upper elev 5.21 NaN\n", "... ... ... ... ... ...\n", "2022-12-31 23:00:00 mrz upper elev 4.63 NaN\n", "2022-12-31 23:15:00 mrz upper elev 4.48 NaN\n", "2022-12-31 23:30:00 mrz upper elev 4.28 NaN\n", "2022-12-31 23:45:00 mrz upper elev 4.11 NaN\n", "2023-01-01 00:00:00 mrz upper elev 3.94 NaN\n", "\n", "[175297 rows x 5 columns]\n", "Here it is using nuts and bolts calls. Can't think of any obvious reason to do it this way\n", " station subloc variable value filt\n", "DatetimeIndex \n", "2018-01-01 00:00:00 mrz upper elev 4.91 NaN\n", "2018-01-01 00:15:00 mrz upper elev 5.04 NaN\n", "2018-01-01 00:30:00 mrz upper elev 5.13 NaN\n", "2018-01-01 00:45:00 mrz upper elev 5.19 NaN\n", "2018-01-01 01:00:00 mrz upper elev 5.21 NaN\n", "... ... ... ... ... ...\n", "2022-12-31 23:00:00 mrz upper elev 4.63 NaN\n", "2022-12-31 23:15:00 mrz upper elev 4.48 NaN\n", "2022-12-31 23:30:00 mrz upper elev 4.28 NaN\n", "2022-12-31 23:45:00 mrz upper elev 4.11 NaN\n", "2023-01-01 00:00:00 mrz upper elev 3.94 NaN\n", "\n", "[175297 rows x 5 columns]\n" ] } ], "source": [ "# Clear the cache\n", "import timeit\n", "cache = LocalCache.instance()\n", "cache.clear()\n", "\n", "print(\"Loading cache from csv\")\n", "load_cache_csv('elev_data.csv')\n", "print(\"Done\")\n", "\n", "# Now try again\n", "print(\"Here is the data from the usual API\")\n", "df1 = elev_data(station=\"mrz\", variable=\"elev\", subloc=\"upper\")\n", "\n", "\n", "print(df1)\n", "\n", "# This is the code that directly pulls from the cache\n", "print(\"Here it is using nuts and bolts calls. Can't think of any obvious reason to do it this way\")\n", "print(cache[generate_cache_key(\"elev_data\", station=\"mrz\", variable=\"elev\", subloc=\"upper\")])\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "schism", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 5 }