Merging and Splicing Time Series¶

This tutorial demonstrates the usage and difference between ts_merge and ts_splice, two methods for folding together time series into a combined data structure.

``ts_merge`` blends multiple time series together based on priority, filling missing values. It potentiallyu uses all the input series at all timestamps.
``ts_splice`` stitches together time series in sequential time blocks without mixing values.

We will describe the effect on regularly sampled series (which have the freq attribute) and on irregular. We will also explore the ``names`` argument, which controls how columns are selected or renamed in the merging/splicing process. There is a file-level command line tools for this as well in the dms_datastore package.

Prioritized filling on regular series¶

Let’s begin by showing how ts_merge and ts_splice fold together two regular series but gappy series on a prioritized basis.

Here are the sample series:

[ ]:

# ========================================
# 1️⃣ Creating Regular Time Series (1D Frequency with Missing Data)
# ========================================
idx1 = pd.date_range("2023-01-01", periods=10, freq="1D")
idx2 = pd.date_range("2023-01-01", periods=12, freq="1D")
idx3 = pd.date_range("2022-12-31", periods=14, freq="1D")

series1 = pd.Series([1, np.nan, 3, np.nan, 5, 6, np.nan, 8, 9, 10], index=idx1, name="A")
series2 = pd.Series([np.nan, 2, np.nan, 4, np.nan, np.nan, 7, np.nan, np.nan, np.nan,3.,4.], index=idx2, name="A")
series3 = pd.Series([1000.,1001., 1002., np.nan, 1004., np.nan, np.nan, 1007., np.nan, np.nan, np.nan,1005.,1006.,1007.], index=idx3, name="A")

print("Series 1 (Primary):")
display(series1)

print("\nSeries 2 (Secondary - Fills Gaps):")
display(series2)

print("\nSeries 3 (Tertiary - Fills Gaps):")
display(series3)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 4
      1 # ========================================
      2 # 1️⃣ Creating Regular Time Series (1D Frequency with Missing Data)
      3 # ========================================
----> 4 idx1 = pd.date_range("2023-01-01", periods=10, freq="1D")
      5 idx2 = pd.date_range("2023-01-01", periods=12, freq="1D")
      6 idx3 = pd.date_range("2022-12-31", periods=14, freq="1D")

NameError: name 'pd' is not defined

And here is what it looks like spliced instead of merged.

[ ]:

# ========================================
# 2️⃣ Using `ts_merge()` with Prioritization
# ========================================
merged_series = ts_merge((series1, series2, series3))
print("\nMerged Series with Prioritization:")
display(merged_series)


Merged Series with Prioritization:

2022-12-31    1000.0
2023-01-01       1.0
2023-01-02       2.0
2023-01-03       3.0
2023-01-04       4.0
2023-01-05       5.0
2023-01-06       6.0
2023-01-07       7.0
2023-01-08       8.0
2023-01-09       9.0
2023-01-10      10.0
2023-01-11       3.0
2023-01-12       4.0
2023-01-13    1007.0
Freq: D, Name: A, dtype: float64

Splicing¶

Splicing marches through the prioritized list of input time series and exclusively uses values for the higher priority series one during the entire span of that series.

[ ]:

spliced_series = ts_splice((series1, series2, series3))
print("\nSpliced Series with Prioritization and default `prefer last`:")
display(spliced_series)
spliced_first = ts_splice((series1, series2, series3),transition="prefer_first")
print("\nSpliced Series with Prioritization, Prefer first:")
display(spliced_first)


Spliced Series with Prioritization:

2022-12-31    1000.0
2023-01-01    1001.0
2023-01-02    1002.0
2023-01-03       NaN
2023-01-04    1004.0
2023-01-05       NaN
2023-01-06       NaN
2023-01-07    1007.0
2023-01-08       NaN
2023-01-09       NaN
2023-01-10       NaN
2023-01-11    1005.0
2023-01-12    1006.0
2023-01-13    1007.0
Freq: D, Name: A, dtype: float64


Spliced Series with Prioritization, Prefer first:

2023-01-01       1.0
2023-01-02       NaN
2023-01-03       3.0
2023-01-04       NaN
2023-01-05       5.0
2023-01-06       6.0
2023-01-07       NaN
2023-01-08       8.0
2023-01-09       9.0
2023-01-10      10.0
2023-01-11       3.0
2023-01-12       4.0
2023-01-13    1007.0
Freq: D, Name: A, dtype: float64

Irregular series¶

Now we will look at some irregular series and see the difference in output from ts_merge (which shuffles) and ts_splice (which exclusively uses values from one series at a time based on the span of the series and its priority)

[ ]:

# ========================================
# 3️⃣ Creating Irregular Time Series (No Freq Attribute)
# ========================================
idx_irreg1 = pd.to_datetime(["2023-01-01", "2023-01-03", "2023-01-07", "2023-01-10"])
idx_irreg2 = pd.to_datetime(["2023-01-02", "2023-01-04", "2023-01-08", "2023-01-11"])

series_irreg1 = pd.Series([1, np.nan, 3, 4], index=idx_irreg1, name="A")
series_irreg2 = pd.Series([10, 20, np.nan, 40], index=idx_irreg2, name="A")

print("\nIrregular Series 1:")
display(series_irreg1)

print("\nIrregular Series 2:")
display(series_irreg2)

# ========================================
# 4️⃣ Using `ts_merge()` with Irregular Time Series
# ========================================
merged_irregular = ts_merge((series_irreg1, series_irreg2))
print("\nMerged Irregular Series (May Shuffle Timestamps):")
display(merged_irregular)

# ========================================
# 5️⃣ Using `ts_splice()` with Irregular Time Series
# ========================================
spliced_irregular = ts_splice((series_irreg1, series_irreg2), transition="prefer_last")
print("\nSpliced Irregular Series (prefer_last):")
display(spliced_irregular)


Irregular Series 1:

2023-01-01    1.0
2023-01-03    NaN
2023-01-07    3.0
2023-01-10    4.0
Name: A, dtype: float64


Irregular Series 2:

2023-01-02    10.0
2023-01-04    20.0
2023-01-08     NaN
2023-01-11    40.0
Name: A, dtype: float64


Merged Irregular Series (May Shuffle Timestamps):

2023-01-01     1.0
2023-01-02    10.0
2023-01-03     NaN
2023-01-04    20.0
2023-01-07     3.0
2023-01-08     NaN
2023-01-10     4.0
2023-01-11    40.0
Name: A, dtype: float64


Spliced Irregular Series (prefer_last):

2023-01-01     1.0
2023-01-02    10.0
2023-01-04    20.0
2023-01-08     NaN
2023-01-11    40.0
Name: A, dtype: float64

`Names` argument¶

Finally let’s look at some more intricate examples with mixed series and dataframes with differing numbers of columns and see how names can be used to make selections or unify poorly coordinated labels. Here are the series:

[ ]:

import pandas as pd
import numpy as np
from vtools import ts_merge, ts_splice  # Assuming these functions are in merge.py

# Create irregular time series
idx1 = pd.date_range("2023-01-01", periods=5, freq="2D")
idx2 = pd.date_range("2023-01-02", periods=5, freq="2D")

series1 = pd.Series([1, np.nan, 3, 4, 5], index=idx1, name="A")
series2 = pd.Series([10, 20, 30, np.nan, 50], index=idx2, name="B")

df1 = pd.DataFrame({"A": [1, np.nan, 3, 4, 5], "B": [10, 20, 30, 40, 50]}, index=idx1)
df2 = pd.DataFrame({"A": [10, 20, np.nan, 40, 50], "B": [100, 200, np.nan, 400, 500]}, index=idx2)
df3 = pd.DataFrame({"A": [310, 320, np.nan, 340, np.nan],
                    "B": [100, 200, np.nan, 400, 500],
                    "C": [3100, 3200, np.nan, 3400, 3500]
                    }, index=idx2)

# Display Data
print("Series 1:")
display(series1)

print("Series 2:")
display(series2)

print("DataFrame 1:")
display(df1)

print("DataFrame 2:")
display(df2)

print("DataFrame 3:")
display(df3)

Series 1:

2023-01-01    1.0
2023-01-03    NaN
2023-01-05    3.0
2023-01-07    4.0
2023-01-09    5.0
Freq: 2D, Name: A, dtype: float64

Series 2:

2023-01-02    10.0
2023-01-04    20.0
2023-01-06    30.0
2023-01-08     NaN
2023-01-10    50.0
Freq: 2D, Name: B, dtype: float64

DataFrame 1:

	A	B
2023-01-01	1.0	10
2023-01-03	NaN	20
2023-01-05	3.0	30
2023-01-07	4.0	40
2023-01-09	5.0	50

DataFrame 2:

	A	B
2023-01-02	10.0	100.0
2023-01-04	20.0	200.0
2023-01-06	NaN	NaN
2023-01-08	40.0	400.0
2023-01-10	50.0	500.0

DataFrame 3:

	A	B	C
2023-01-02	310.0	100.0	3100.0
2023-01-04	320.0	200.0	3200.0
2023-01-06	NaN	NaN	NaN
2023-01-08	340.0	400.0	3400.0
2023-01-10	NaN	500.0	3500.0

Here are some example usage:

[ ]:

# Example: Using `names` to rename output columns

# Merging without a rename
merged_series_named = ts_merge((series1, series2))
print("Merged Series not renamed:")
display(merged_series_named)

# Rename a single column
merged_series_named = ts_merge((series1, series2), names="Renamed_A")
print("Merged Series renamed:")
display(merged_series_named)

# Select specific columns in DataFrame
try:
    merged_df_named = ts_merge((df1, df2, df3), names=None)
except:
    print("Merged DataFrame without Selected Columns (names=None) results in an error if the columns don't match")
#display(merged_df_named)

# Select specific columns in DataFrame
merged_df_named = ts_merge((df1, df2), names=None)
print("Merged DataFrame without selected columns (names=None) for input DataFrames with matched columns:")
display(merged_df_named)


# Select specific columns in DataFrame
merged_df_named = ts_merge((df1, df2, df3), names=["A"])
print("Merged DataFrame with Selected Columns A merges that column ([A,B] would have been OK too)")
display(merged_df_named)


# Rename column in splicing
spliced_series_named = ts_splice((series1, series2), names="Renamed_A", transition="prefer_last")
print("Spliced Series with Renamed Column:")
display(spliced_series_named)

Merged Series not renamed:

2023-01-01     1.0
2023-01-02     2.0
2023-01-03     3.0
2023-01-04     4.0
2023-01-05     5.0
2023-01-06     6.0
2023-01-07     7.0
2023-01-08     8.0
2023-01-09     9.0
2023-01-10    10.0
2023-01-11     3.0
2023-01-12     4.0
Freq: D, Name: A, dtype: float64

Merged Series renamed:

2023-01-01     1.0
2023-01-02     2.0
2023-01-03     3.0
2023-01-04     4.0
2023-01-05     5.0
2023-01-06     6.0
2023-01-07     7.0
2023-01-08     8.0
2023-01-09     9.0
2023-01-10    10.0
2023-01-11     3.0
2023-01-12     4.0
Freq: D, Name: Renamed_A, dtype: float64

Merged DataFrame without Selected Columns (names=None) results in an error if the columns don't match
Merged DataFrame without selected columns (names=None) for input DataFrames with matched columns:

	A	B
2023-01-01	1.0	10.0
2023-01-02	10.0	100.0
2023-01-03	NaN	20.0
2023-01-04	20.0	200.0
2023-01-05	3.0	30.0
2023-01-06	NaN	NaN
2023-01-07	4.0	40.0
2023-01-08	40.0	400.0
2023-01-09	5.0	50.0
2023-01-10	50.0	500.0

Merged DataFrame with Selected Columns A merges that column ([A,B] would have been OK too)

	A
2023-01-01	1.0
2023-01-02	10.0
2023-01-03	NaN
2023-01-04	20.0
2023-01-05	3.0
2023-01-06	NaN
2023-01-07	4.0
2023-01-08	40.0
2023-01-09	5.0
2023-01-10	50.0

Spliced Series with Renamed Column:

2023-01-01    1.0
2023-01-02    2.0
2023-01-03    NaN
2023-01-04    4.0
2023-01-05    NaN
2023-01-06    NaN
2023-01-07    7.0
2023-01-08    NaN
2023-01-09    NaN
2023-01-10    NaN
2023-01-11    3.0
2023-01-12    4.0
Freq: D, Name: Renamed_A, dtype: float64

Summary¶

Use ``ts_merge`` when you want to blend time series together, filling missing values in order of priority.
Use ``ts_splice`` when you want to keep each time series separate and transition from one to another based on time.
The ``names`` argument allows you to rename output columns or select specific columns when merging/splicing DataFrames.

This notebook provides a clear comparison to help you decide which method best suits your use case.

Navigation

Related Topics

Merging and Splicing Time Series¶

Prioritized filling on regular series¶

Splicing¶

Irregular series¶

`Names` argument¶

Summary¶

Merging and Splicing Time Series¶

Prioritized filling on regular series¶

Splicing¶

Irregular series¶

Names argument¶

Summary¶

`Names` argument¶