{
"cells": [
{
"cell_type": "markdown",
"id": "436d5ccf",
"metadata": {},
"source": [
"# Merging and Splicing Time Series\n",
"This tutorial demonstrates the usage and difference between `ts_merge` and `ts_splice`, two methods for folding together time series into a combined data structure.\n",
"\n",
"- **`ts_merge`** blends multiple time series together based on priority, filling missing values. It potentiallyu uses all the input series at all timestamps.\n",
"- **`ts_splice`** stitches together time series in sequential time **blocks** without mixing values.\n",
"\n",
"We will describe the effect on regularly sampled series (which have the `freq` attribute) and on irregular. We will also explore the **`names`** argument, which controls how columns are selected or renamed in the merging/splicing process. There is a file-level command line tools for this as well in the `dms_datastore` package.\n",
"\n",
"## Prioritized filling on regular series\n",
"Let's begin by showing how `ts_merge` and `ts_splice` fold together two regular series but gappy \n",
"series on a prioritized basis.\n",
"\n",
"Here are the sample series:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e52fb077",
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'pd' is not defined",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn[1], line 4\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[38;5;66;03m# ========================================\u001b[39;00m\n\u001b[0;32m 2\u001b[0m \u001b[38;5;66;03m# 1️⃣ Creating Regular Time Series (1D Frequency with Missing Data)\u001b[39;00m\n\u001b[0;32m 3\u001b[0m \u001b[38;5;66;03m# ========================================\u001b[39;00m\n\u001b[1;32m----> 4\u001b[0m idx1 \u001b[38;5;241m=\u001b[39m \u001b[43mpd\u001b[49m\u001b[38;5;241m.\u001b[39mdate_range(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m2023-01-01\u001b[39m\u001b[38;5;124m\"\u001b[39m, periods\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m10\u001b[39m, freq\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m1D\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m 5\u001b[0m idx2 \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mdate_range(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m2023-01-01\u001b[39m\u001b[38;5;124m\"\u001b[39m, periods\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m12\u001b[39m, freq\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m1D\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m 6\u001b[0m idx3 \u001b[38;5;241m=\u001b[39m pd\u001b[38;5;241m.\u001b[39mdate_range(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m2022-12-31\u001b[39m\u001b[38;5;124m\"\u001b[39m, periods\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m14\u001b[39m, freq\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m1D\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
"\u001b[1;31mNameError\u001b[0m: name 'pd' is not defined"
]
}
],
"source": [
"# ========================================\n",
"# 1️⃣ Creating Regular Time Series (1D Frequency with Missing Data)\n",
"# ========================================\n",
"idx1 = pd.date_range(\"2023-01-01\", periods=10, freq=\"1D\")\n",
"idx2 = pd.date_range(\"2023-01-01\", periods=12, freq=\"1D\")\n",
"idx3 = pd.date_range(\"2022-12-31\", periods=14, freq=\"1D\")\n",
"\n",
"series1 = pd.Series([1, np.nan, 3, np.nan, 5, 6, np.nan, 8, 9, 10], index=idx1, name=\"A\")\n",
"series2 = pd.Series([np.nan, 2, np.nan, 4, np.nan, np.nan, 7, np.nan, np.nan, np.nan,3.,4.], index=idx2, name=\"A\")\n",
"series3 = pd.Series([1000.,1001., 1002., np.nan, 1004., np.nan, np.nan, 1007., np.nan, np.nan, np.nan,1005.,1006.,1007.], index=idx3, name=\"A\")\n",
"\n",
"print(\"Series 1 (Primary):\")\n",
"display(series1)\n",
"\n",
"print(\"\\nSeries 2 (Secondary - Fills Gaps):\")\n",
"display(series2)\n",
"\n",
"print(\"\\nSeries 3 (Tertiary - Fills Gaps):\")\n",
"display(series3)\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "b2175099",
"metadata": {},
"source": [
"And here is what it looks like spliced instead of merged."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5dd08914",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Merged Series with Prioritization:\n"
]
},
{
"data": {
"text/plain": [
"2022-12-31 1000.0\n",
"2023-01-01 1.0\n",
"2023-01-02 2.0\n",
"2023-01-03 3.0\n",
"2023-01-04 4.0\n",
"2023-01-05 5.0\n",
"2023-01-06 6.0\n",
"2023-01-07 7.0\n",
"2023-01-08 8.0\n",
"2023-01-09 9.0\n",
"2023-01-10 10.0\n",
"2023-01-11 3.0\n",
"2023-01-12 4.0\n",
"2023-01-13 1007.0\n",
"Freq: D, Name: A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# ========================================\n",
"# 2️⃣ Using `ts_merge()` with Prioritization\n",
"# ========================================\n",
"merged_series = ts_merge((series1, series2, series3))\n",
"print(\"\\nMerged Series with Prioritization:\")\n",
"display(merged_series)"
]
},
{
"cell_type": "markdown",
"id": "fab7780c",
"metadata": {},
"source": [
"## Splicing\n",
"Splicing marches through the prioritized list of input time series and exclusively uses values for the higher priority series one during the entire span of that series. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae88f210",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Spliced Series with Prioritization:\n"
]
},
{
"data": {
"text/plain": [
"2022-12-31 1000.0\n",
"2023-01-01 1001.0\n",
"2023-01-02 1002.0\n",
"2023-01-03 NaN\n",
"2023-01-04 1004.0\n",
"2023-01-05 NaN\n",
"2023-01-06 NaN\n",
"2023-01-07 1007.0\n",
"2023-01-08 NaN\n",
"2023-01-09 NaN\n",
"2023-01-10 NaN\n",
"2023-01-11 1005.0\n",
"2023-01-12 1006.0\n",
"2023-01-13 1007.0\n",
"Freq: D, Name: A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Spliced Series with Prioritization, Prefer first:\n"
]
},
{
"data": {
"text/plain": [
"2023-01-01 1.0\n",
"2023-01-02 NaN\n",
"2023-01-03 3.0\n",
"2023-01-04 NaN\n",
"2023-01-05 5.0\n",
"2023-01-06 6.0\n",
"2023-01-07 NaN\n",
"2023-01-08 8.0\n",
"2023-01-09 9.0\n",
"2023-01-10 10.0\n",
"2023-01-11 3.0\n",
"2023-01-12 4.0\n",
"2023-01-13 1007.0\n",
"Freq: D, Name: A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"spliced_series = ts_splice((series1, series2, series3))\n",
"print(\"\\nSpliced Series with Prioritization and default `prefer last`:\")\n",
"display(spliced_series)\n",
"spliced_first = ts_splice((series1, series2, series3),transition=\"prefer_first\")\n",
"print(\"\\nSpliced Series with Prioritization, Prefer first:\")\n",
"display(spliced_first)"
]
},
{
"cell_type": "markdown",
"id": "4a2478d5",
"metadata": {},
"source": [
"## Irregular series\n",
"\n",
"Now we will look at some irregular series and see the difference in output from ts_merge (which shuffles) and ts_splice (which exclusively uses values from one series at a time based on the span of the series and its priority)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a1d0dae",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Irregular Series 1:\n"
]
},
{
"data": {
"text/plain": [
"2023-01-01 1.0\n",
"2023-01-03 NaN\n",
"2023-01-07 3.0\n",
"2023-01-10 4.0\n",
"Name: A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Irregular Series 2:\n"
]
},
{
"data": {
"text/plain": [
"2023-01-02 10.0\n",
"2023-01-04 20.0\n",
"2023-01-08 NaN\n",
"2023-01-11 40.0\n",
"Name: A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Merged Irregular Series (May Shuffle Timestamps):\n"
]
},
{
"data": {
"text/plain": [
"2023-01-01 1.0\n",
"2023-01-02 10.0\n",
"2023-01-03 NaN\n",
"2023-01-04 20.0\n",
"2023-01-07 3.0\n",
"2023-01-08 NaN\n",
"2023-01-10 4.0\n",
"2023-01-11 40.0\n",
"Name: A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Spliced Irregular Series (prefer_last):\n"
]
},
{
"data": {
"text/plain": [
"2023-01-01 1.0\n",
"2023-01-02 10.0\n",
"2023-01-04 20.0\n",
"2023-01-08 NaN\n",
"2023-01-11 40.0\n",
"Name: A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"\n",
"# ========================================\n",
"# 3️⃣ Creating Irregular Time Series (No Freq Attribute)\n",
"# ========================================\n",
"idx_irreg1 = pd.to_datetime([\"2023-01-01\", \"2023-01-03\", \"2023-01-07\", \"2023-01-10\"])\n",
"idx_irreg2 = pd.to_datetime([\"2023-01-02\", \"2023-01-04\", \"2023-01-08\", \"2023-01-11\"])\n",
"\n",
"series_irreg1 = pd.Series([1, np.nan, 3, 4], index=idx_irreg1, name=\"A\")\n",
"series_irreg2 = pd.Series([10, 20, np.nan, 40], index=idx_irreg2, name=\"A\")\n",
"\n",
"print(\"\\nIrregular Series 1:\")\n",
"display(series_irreg1)\n",
"\n",
"print(\"\\nIrregular Series 2:\")\n",
"display(series_irreg2)\n",
"\n",
"# ========================================\n",
"# 4️⃣ Using `ts_merge()` with Irregular Time Series\n",
"# ========================================\n",
"merged_irregular = ts_merge((series_irreg1, series_irreg2))\n",
"print(\"\\nMerged Irregular Series (May Shuffle Timestamps):\")\n",
"display(merged_irregular)\n",
"\n",
"# ========================================\n",
"# 5️⃣ Using `ts_splice()` with Irregular Time Series\n",
"# ========================================\n",
"spliced_irregular = ts_splice((series_irreg1, series_irreg2), transition=\"prefer_last\")\n",
"print(\"\\nSpliced Irregular Series (prefer_last):\")\n",
"display(spliced_irregular)"
]
},
{
"cell_type": "markdown",
"id": "de365104",
"metadata": {},
"source": [
"## `Names` argument\n",
"\n",
"Finally let's look at some more intricate examples with mixed series and dataframes with differing numbers of columns and see how `names` can be used to make selections or unify poorly coordinated labels. Here are the series:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35cfc422",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Series 1:\n"
]
},
{
"data": {
"text/plain": [
"2023-01-01 1.0\n",
"2023-01-03 NaN\n",
"2023-01-05 3.0\n",
"2023-01-07 4.0\n",
"2023-01-09 5.0\n",
"Freq: 2D, Name: A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Series 2:\n"
]
},
{
"data": {
"text/plain": [
"2023-01-02 10.0\n",
"2023-01-04 20.0\n",
"2023-01-06 30.0\n",
"2023-01-08 NaN\n",
"2023-01-10 50.0\n",
"Freq: 2D, Name: B, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"DataFrame 1:\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 2023-01-01 | \n",
" 1.0 | \n",
" 10 | \n",
"
\n",
" \n",
" 2023-01-03 | \n",
" NaN | \n",
" 20 | \n",
"
\n",
" \n",
" 2023-01-05 | \n",
" 3.0 | \n",
" 30 | \n",
"
\n",
" \n",
" 2023-01-07 | \n",
" 4.0 | \n",
" 40 | \n",
"
\n",
" \n",
" 2023-01-09 | \n",
" 5.0 | \n",
" 50 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"2023-01-01 1.0 10\n",
"2023-01-03 NaN 20\n",
"2023-01-05 3.0 30\n",
"2023-01-07 4.0 40\n",
"2023-01-09 5.0 50"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"DataFrame 2:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 2023-01-02 | \n",
" 10.0 | \n",
" 100.0 | \n",
"
\n",
" \n",
" 2023-01-04 | \n",
" 20.0 | \n",
" 200.0 | \n",
"
\n",
" \n",
" 2023-01-06 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2023-01-08 | \n",
" 40.0 | \n",
" 400.0 | \n",
"
\n",
" \n",
" 2023-01-10 | \n",
" 50.0 | \n",
" 500.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"2023-01-02 10.0 100.0\n",
"2023-01-04 20.0 200.0\n",
"2023-01-06 NaN NaN\n",
"2023-01-08 40.0 400.0\n",
"2023-01-10 50.0 500.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"DataFrame 3:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
" C | \n",
"
\n",
" \n",
" \n",
" \n",
" 2023-01-02 | \n",
" 310.0 | \n",
" 100.0 | \n",
" 3100.0 | \n",
"
\n",
" \n",
" 2023-01-04 | \n",
" 320.0 | \n",
" 200.0 | \n",
" 3200.0 | \n",
"
\n",
" \n",
" 2023-01-06 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2023-01-08 | \n",
" 340.0 | \n",
" 400.0 | \n",
" 3400.0 | \n",
"
\n",
" \n",
" 2023-01-10 | \n",
" NaN | \n",
" 500.0 | \n",
" 3500.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C\n",
"2023-01-02 310.0 100.0 3100.0\n",
"2023-01-04 320.0 200.0 3200.0\n",
"2023-01-06 NaN NaN NaN\n",
"2023-01-08 340.0 400.0 3400.0\n",
"2023-01-10 NaN 500.0 3500.0"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from vtools import ts_merge, ts_splice # Assuming these functions are in merge.py\n",
"\n",
"# Create irregular time series\n",
"idx1 = pd.date_range(\"2023-01-01\", periods=5, freq=\"2D\")\n",
"idx2 = pd.date_range(\"2023-01-02\", periods=5, freq=\"2D\")\n",
"\n",
"series1 = pd.Series([1, np.nan, 3, 4, 5], index=idx1, name=\"A\")\n",
"series2 = pd.Series([10, 20, 30, np.nan, 50], index=idx2, name=\"B\")\n",
"\n",
"df1 = pd.DataFrame({\"A\": [1, np.nan, 3, 4, 5], \"B\": [10, 20, 30, 40, 50]}, index=idx1)\n",
"df2 = pd.DataFrame({\"A\": [10, 20, np.nan, 40, 50], \"B\": [100, 200, np.nan, 400, 500]}, index=idx2)\n",
"df3 = pd.DataFrame({\"A\": [310, 320, np.nan, 340, np.nan], \n",
" \"B\": [100, 200, np.nan, 400, 500],\n",
" \"C\": [3100, 3200, np.nan, 3400, 3500]\n",
" }, index=idx2)\n",
"\n",
"# Display Data\n",
"print(\"Series 1:\")\n",
"display(series1)\n",
"\n",
"print(\"Series 2:\")\n",
"display(series2)\n",
"\n",
"print(\"DataFrame 1:\")\n",
"display(df1)\n",
"\n",
"print(\"DataFrame 2:\")\n",
"display(df2)\n",
"\n",
"print(\"DataFrame 3:\")\n",
"display(df3)\n"
]
},
{
"cell_type": "markdown",
"id": "219a7aa6",
"metadata": {},
"source": [
"Here are some example usage:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6949e66",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Merged Series not renamed:\n"
]
},
{
"data": {
"text/plain": [
"2023-01-01 1.0\n",
"2023-01-02 2.0\n",
"2023-01-03 3.0\n",
"2023-01-04 4.0\n",
"2023-01-05 5.0\n",
"2023-01-06 6.0\n",
"2023-01-07 7.0\n",
"2023-01-08 8.0\n",
"2023-01-09 9.0\n",
"2023-01-10 10.0\n",
"2023-01-11 3.0\n",
"2023-01-12 4.0\n",
"Freq: D, Name: A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Merged Series renamed:\n"
]
},
{
"data": {
"text/plain": [
"2023-01-01 1.0\n",
"2023-01-02 2.0\n",
"2023-01-03 3.0\n",
"2023-01-04 4.0\n",
"2023-01-05 5.0\n",
"2023-01-06 6.0\n",
"2023-01-07 7.0\n",
"2023-01-08 8.0\n",
"2023-01-09 9.0\n",
"2023-01-10 10.0\n",
"2023-01-11 3.0\n",
"2023-01-12 4.0\n",
"Freq: D, Name: Renamed_A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Merged DataFrame without Selected Columns (names=None) results in an error if the columns don't match\n",
"Merged DataFrame without selected columns (names=None) for input DataFrames with matched columns:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
" B | \n",
"
\n",
" \n",
" \n",
" \n",
" 2023-01-01 | \n",
" 1.0 | \n",
" 10.0 | \n",
"
\n",
" \n",
" 2023-01-02 | \n",
" 10.0 | \n",
" 100.0 | \n",
"
\n",
" \n",
" 2023-01-03 | \n",
" NaN | \n",
" 20.0 | \n",
"
\n",
" \n",
" 2023-01-04 | \n",
" 20.0 | \n",
" 200.0 | \n",
"
\n",
" \n",
" 2023-01-05 | \n",
" 3.0 | \n",
" 30.0 | \n",
"
\n",
" \n",
" 2023-01-06 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2023-01-07 | \n",
" 4.0 | \n",
" 40.0 | \n",
"
\n",
" \n",
" 2023-01-08 | \n",
" 40.0 | \n",
" 400.0 | \n",
"
\n",
" \n",
" 2023-01-09 | \n",
" 5.0 | \n",
" 50.0 | \n",
"
\n",
" \n",
" 2023-01-10 | \n",
" 50.0 | \n",
" 500.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B\n",
"2023-01-01 1.0 10.0\n",
"2023-01-02 10.0 100.0\n",
"2023-01-03 NaN 20.0\n",
"2023-01-04 20.0 200.0\n",
"2023-01-05 3.0 30.0\n",
"2023-01-06 NaN NaN\n",
"2023-01-07 4.0 40.0\n",
"2023-01-08 40.0 400.0\n",
"2023-01-09 5.0 50.0\n",
"2023-01-10 50.0 500.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Merged DataFrame with Selected Columns A merges that column ([A,B] would have been OK too)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" A | \n",
"
\n",
" \n",
" \n",
" \n",
" 2023-01-01 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 2023-01-02 | \n",
" 10.0 | \n",
"
\n",
" \n",
" 2023-01-03 | \n",
" NaN | \n",
"
\n",
" \n",
" 2023-01-04 | \n",
" 20.0 | \n",
"
\n",
" \n",
" 2023-01-05 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 2023-01-06 | \n",
" NaN | \n",
"
\n",
" \n",
" 2023-01-07 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 2023-01-08 | \n",
" 40.0 | \n",
"
\n",
" \n",
" 2023-01-09 | \n",
" 5.0 | \n",
"
\n",
" \n",
" 2023-01-10 | \n",
" 50.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A\n",
"2023-01-01 1.0\n",
"2023-01-02 10.0\n",
"2023-01-03 NaN\n",
"2023-01-04 20.0\n",
"2023-01-05 3.0\n",
"2023-01-06 NaN\n",
"2023-01-07 4.0\n",
"2023-01-08 40.0\n",
"2023-01-09 5.0\n",
"2023-01-10 50.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Spliced Series with Renamed Column:\n"
]
},
{
"data": {
"text/plain": [
"2023-01-01 1.0\n",
"2023-01-02 2.0\n",
"2023-01-03 NaN\n",
"2023-01-04 4.0\n",
"2023-01-05 NaN\n",
"2023-01-06 NaN\n",
"2023-01-07 7.0\n",
"2023-01-08 NaN\n",
"2023-01-09 NaN\n",
"2023-01-10 NaN\n",
"2023-01-11 3.0\n",
"2023-01-12 4.0\n",
"Freq: D, Name: Renamed_A, dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Example: Using `names` to rename output columns\n",
"\n",
"# Merging without a rename\n",
"merged_series_named = ts_merge((series1, series2))\n",
"print(\"Merged Series not renamed:\")\n",
"display(merged_series_named)\n",
"\n",
"# Rename a single column\n",
"merged_series_named = ts_merge((series1, series2), names=\"Renamed_A\")\n",
"print(\"Merged Series renamed:\")\n",
"display(merged_series_named)\n",
"\n",
"# Select specific columns in DataFrame\n",
"try:\n",
" merged_df_named = ts_merge((df1, df2, df3), names=None)\n",
"except:\n",
" print(\"Merged DataFrame without Selected Columns (names=None) results in an error if the columns don't match\")\n",
"#display(merged_df_named)\n",
"\n",
"# Select specific columns in DataFrame\n",
"merged_df_named = ts_merge((df1, df2), names=None)\n",
"print(\"Merged DataFrame without selected columns (names=None) for input DataFrames with matched columns:\")\n",
"display(merged_df_named)\n",
"\n",
"\n",
"# Select specific columns in DataFrame\n",
"merged_df_named = ts_merge((df1, df2, df3), names=[\"A\"])\n",
"print(\"Merged DataFrame with Selected Columns A merges that column ([A,B] would have been OK too)\")\n",
"display(merged_df_named)\n",
"\n",
"\n",
"# Rename column in splicing\n",
"spliced_series_named = ts_splice((series1, series2), names=\"Renamed_A\", transition=\"prefer_last\")\n",
"print(\"Spliced Series with Renamed Column:\")\n",
"display(spliced_series_named)\n"
]
},
{
"cell_type": "markdown",
"id": "6baebda5",
"metadata": {},
"source": [
"## Summary\n",
"- **Use `ts_merge`** when you want to blend time series together, filling missing values in order of priority.\n",
"- **Use `ts_splice`** when you want to keep each time series separate and transition from one to another based on time.\n",
"- **The `names` argument** allows you to rename output columns or select specific columns when merging/splicing DataFrames.\n",
"\n",
"This notebook provides a clear comparison to help you decide which method best suits your use case.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "schism",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}