Time series concepts and manipulation

Time series class

The vtools.data.timeseries.TimeSeries class is the fundamental data structure for both regular and irregular time series. The class supports a number of slicing and querying methods, as well as simple index-aligned arithmetic.

Regular vs irregular

We use the term regular when the times correspond to an organized sampling interval. In the implementation, a regular time series is simply one that has a non-None value for the interval attribute. Rather than relying on this implementation, the preferred way to test this is the is_regular() method.
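
As a rough illustration of why the method is preferred over inspecting the attribute, here is a minimal stand-in class. It is not the vtools implementation, and the name SketchSeries is hypothetical. Code that calls is_regular() keeps working even if the internal representation of regularity changes.

```python
import datetime as dtm

class SketchSeries(object):
    """Hypothetical stand-in for a time series; not the vtools class."""
    def __init__(self, times, interval=None):
        self.times = list(times)
        # Mirrors the text: a regular series carries an interval,
        # an irregular one carries None.
        self.interval = interval

    def is_regular(self):
        # Callers ask the question here instead of testing the attribute.
        return self.interval is not None

start = dtm.datetime(1990, 1, 1)
hour = dtm.timedelta(hours=1)
regular = SketchSeries([start + i * hour for i in range(4)], interval=hour)
irregular = SketchSeries([start, start + 3 * hour, start + 7 * hour])

print(regular.is_regular())    # True
print(irregular.is_regular())  # False
```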

Creating time series

The time series class has a constructor, but the preferred “guaranteed backward compatible” way to create a time series is with the factory functions rts() for regular time series or its() for irregular. These terms are clarified in section Regular vs irregular.

The function rts() creates a time series from an array, a start date and an interval:

# Import VTools and numpy array creation function
from vtools.data.api import *
import datetime as dtm
import numpy as np
DATUM="datum"  

# Create the start time and interval
start=dtm.datetime(1990,1,1,0,0) # 01JAN1990 00:00
dt = hours(1)
n = 2400

# Create the data (see create_numpy_array.py for more examples)
x=np.arange(n)
data=np.cos(2.*np.pi*x/24.) + np.cos(2.*np.pi*x/12.5)

# Create the attribute dictionary, optional
props={DATUM:"NGVD88"}

# Create the series
ts=rts(data,start,dt,props)
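
Because a regular series is described by its start, interval and length, the time of each element is implied rather than stored explicitly by the caller. The sketch below shows that arithmetic with plain datetime objects. It uses timedelta(hours=1) as a stand-in for hours(1), whose return type is not assumed here, and time_at is a hypothetical helper.

```python
import datetime as dtm

start = dtm.datetime(1990, 1, 1, 0, 0)
dt = dtm.timedelta(hours=1)   # stand-in for hours(1)
n = 2400

def time_at(i):
    """Hypothetical helper: time of the i-th element of a regular series."""
    return start + i * dt

print(time_at(0))      # 1990-01-01 00:00:00
print(time_at(n - 1))  # 1990-04-10 23:00:00
```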


The function its() creates an irregular time series from a list or array of times and a matching list or array of data:

# Import VTools and numpy array creation function
from vtools.data.api import *
import datetime as dtm
from numpy import arange,sin,pi
DATUM="datum"

# Create the times and data. The its function accepts
# lists or arrays. Here lists and "append" are used,
# because in the typical situation you
# can't predict the number of times.
# Where performance is an issue (say >10000 entries),
# consider using numpy arrays instead of lists
times=[]
data=[]

times.append(dtm.datetime(1990,1,1,0,0))
data.append(1.)
times.append(dtm.datetime(1990,1,2,13,30))
data.append(10.)
times.append(dtm.datetime(1990,1,4,10,15))
data.append(7.)
times.append(dtm.datetime(1990,1,5,12,20))
data.append(1.)

# create the attribute dictionary, optional
props={DATUM:"NGVD88"}

# create the series
ts=its(times,data,props)

print(ts[0])                          # todo: time series elements don't print
print(ts[1])

If what you have is a start and end time, there are two options. One is to use rts_constant(), which creates a time series initialized to a constant. You can then access the data attribute if you want to alter it.
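
The bookkeeping behind a constant-filled series can be sketched in plain Python. This is not the rts_constant() implementation: constant_series is a hypothetical helper, and including the end time when it falls exactly on a step is an assumption that should be checked against rts_constant() itself.

```python
import datetime as dtm

def constant_series(start, end, interval, value=0.0):
    """Hypothetical helper: return (times, data) filled with a constant.

    Assumes the end time is included when it falls exactly on a time step.
    """
    n = int((end - start).total_seconds() // interval.total_seconds()) + 1
    times = [start + i * interval for i in range(n)]
    data = [value] * n
    return times, data

times, data = constant_series(dtm.datetime(1990, 1, 1),
                              dtm.datetime(1990, 1, 2),
                              dtm.timedelta(hours=1),
                              value=2.0)

# Alter the values afterwards, as you would through the data attribute
data[3] = 5.0
print(len(times))
print(times[-1])
```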

Properties: metadata about the series

Every time series has an attribute called props that stores some basic metadata about the series. These include some fairly universal quantities (e.g. units or type of period averaging) as well as some attributes that are data source specific. The exact names used for properties do matter if you are going to be interacting with data sources.

Timestamp conventions for period aggregated (e.g. ave) data

HEC-DSS and some real time storage programs have a convention of storing period-averaged data at the end of each period. This is hard to work with in terms of labels and quality control. Vtools interacts with data sources using their own conventions, but converts the data on import/export so that in the vtools environment the timestamp is at the beginning of the period. So, for instance, if you import DSS period averaged data, the timestamps will be translated from period-end to period-start on import and the reverse on export.

For plotting or cell-centered numerical work, it can also be convenient to have times and ticks that are period centered. The TimeSeries.centered(copy_data=[True,False]) method will create a period-centered version of the parent time series, with either shared or copied data.
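
The period-end to period-start translation amounts to moving each stamp by one interval. A minimal sketch with plain datetime objects, not the vtools import/export code, assuming daily averages:

```python
import datetime as dtm

interval = dtm.timedelta(days=1)

# Period-end stamps, as a period-end source would store daily averages:
# the average for 01JAN1990 is stamped at 02JAN1990 00:00.
period_end = [dtm.datetime(1990, 1, 2), dtm.datetime(1990, 1, 3)]

# On import: move each stamp to the beginning of its period.
period_start = [t - interval for t in period_end]

# On export: the reverse.
round_trip = [t + interval for t in period_start]

print(period_start[0])  # 1990-01-01 00:00:00
```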

Indexing, slicing

You can index or slice a series using integers or datetime objects. Indexing with a single index will return a single TimeSeriesElement. If the index is a slice using a start and stop index separated by a colon, the operation will return a TimeSeries with shared data. A slice in VTools follows the Python convention for the stop index – it is not included in the resulting slice. The behavior is demonstrated by this example:
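
A minimal sketch of the half-open convention, using plain Python lists and bisect rather than the vtools API itself. The helper slice_by_time is hypothetical, and the sample times and values are made up for illustration.

```python
import bisect
import datetime as dtm

start = dtm.datetime(1990, 1, 1)
times = [start + dtm.timedelta(hours=i) for i in range(6)]
values = [0., 1., 2., 3., 4., 5.]

def slice_by_time(t0, t1):
    """Hypothetical helper: values with t0 <= time < t1 (stop excluded)."""
    i0 = bisect.bisect_left(times, t0)
    i1 = bisect.bisect_left(times, t1)
    return values[i0:i1]

# Integer slice: the element at index 4 is not included
print(values[1:4])  # [1.0, 2.0, 3.0]

# Datetime slice: the element stamped 04:00 is not included
by_time = slice_by_time(dtm.datetime(1990, 1, 1, 1), dtm.datetime(1990, 1, 1, 4))
print(by_time)      # [1.0, 2.0, 3.0]
```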

This can be inconvenient, in which case you may want to consider using the TimeSeries.window() method for a shared memory copy or the TimeSeries.copy() method for a deep copy. If what you will be doing is replacing the contents of the series with another series of the same interval, use the replace() method.

Shifting series and centering period averaged data

A series can be shifted forward and back in time using the member function shift(), which also lets you decide whether you want to share data with the original or work on a copy. The centered() method is a special case that returns interval-centered data for period averaged time series. For instance, daily average data in vtools will be stamped at the beginning of the period by convention (see time stamping conventions). The appropriate centering for analysis is noon, and this is the shift that centered() applies.
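
The noon centering is simply a shift by half an interval. A sketch with plain datetime objects, where shift_times is a hypothetical helper and not the shift() method itself:

```python
import datetime as dtm

interval = dtm.timedelta(days=1)

# Daily averages stamped at the beginning of each period
stamps = [dtm.datetime(1990, 1, 1) + i * interval for i in range(3)]

def shift_times(times, delta):
    """Hypothetical helper: return a new list of times moved by delta."""
    return [t + delta for t in times]

# Centering a period-start series moves each stamp forward half an interval
centered = shift_times(stamps, interval // 2)
print(centered[0])  # 1990-01-01 12:00:00
```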

Iterating through time

Occasionally it is necessary to march through the elements of a time series.

Not often. You should avoid stepwise iteration as much as possible. Vtools, numpy and Python in general favor “functional programming” on entire arrays or time series. Besides not being “pythonic”, iterating carries a large speed penalty, at least in relative terms. When necessary, though, time series do provide their own iterator, so the following will traverse a series by time steps:

for elem in ts:
    print(elem)

The element in this case is an object of type TimeSeriesElement, which has just three attributes:

class TimeSeriesElement(time_data)

    ticks: Time point in long integer ticks of element
    time: Time at element
    value: Value at element

There is no connection back to the parent series, so mostly you can just use this for read-only access to data at one time step. A TimeSeriesElement can also be used to set data in the time series:

ts[element] = 7.0

although the use cases for this are few, and if you are using one series to replace another you should consider using TimeSeries.replace().

One example where iteration is very useful is when coordinating the traversal of a coarse and fine time series using itertools. The following comes from an example that coordinates the traversal of a two minute regular time series (gate_rts) and two daily series (gate_daily,qts) based on date:

import datetime as dtm
import numpy as np
from itertools import groupby
def elday(el):
    return dtm.datetime.combine(el.time.date(),dtm.datetime.min.time())

...

# on entry, gate_rts is an indicator (1,0) of whether gate is open
for i,(k,g) in enumerate( groupby(gate_rts,elday)):
    gtot = gate_daily[k].value
    daily_pump_ave = qts[k].value
    for el in g:
        if not np.isnan(el.value):
            # on exit, a daily pumping value has been distributed
            # over the times when the gate is open, and the (1,0) indicator
            # is replaced with the value
            gate_rts[el.time] = el.value*daily_pump_ave/gtot
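
The same grouping pattern can be tried without vtools. In this sketch the fine series is a list of (time, value) tuples and the daily series are dictionaries keyed by day. These are all stand-ins, and treating gtot as the fraction of the day the gate is open is an assumption made here so that the distributed values average back to the daily pumping value.

```python
import datetime as dtm
from itertools import groupby

def elday(pair):
    """Key function: midnight of the day containing the (time, value) pair."""
    t, _ = pair
    return dtm.datetime.combine(t.date(), dtm.datetime.min.time())

day1 = dtm.datetime(1990, 1, 1)
day2 = dtm.datetime(1990, 1, 2)

# Fine series, sorted by time: 1. when the gate is open, 0. when closed
fine = [(day1, 1.),
        (day1 + dtm.timedelta(hours=6), 0.),
        (day1 + dtm.timedelta(hours=12), 1.),
        (day1 + dtm.timedelta(hours=18), 0.),
        (day2, 1.),
        (day2 + dtm.timedelta(hours=12), 0.)]

# Coarse series keyed by day (assumed meanings, see the text above)
open_fraction = {day1: 0.5, day2: 0.5}
daily_pump = {day1: 10., day2: 4.}

# groupby yields one (day, group) pair per run of consecutive same-day items
distributed = {}
for day, group in groupby(fine, elday):
    for t, indicator in group:
        distributed[t] = indicator * daily_pump[day] / open_fraction[day]

print(distributed[day1])  # 20.0
```

Note that groupby only groups consecutive items, so the fine series must already be ordered in time, which a time series is by construction.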

Time series arithmetic

Most unary and binary operators that work in numpy work in vtools, only in an index aligned way:

ts3 = ts1 + ts2 # ts3[ some_time ] = ts1[ some_time ] + ts2[ some_time ]

The time range of the output (ts3 in the above example) will be the union of the time ranges of the input series.
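
Index-aligned addition over a union of times can be sketched with dictionaries keyed by time. This is not the vtools implementation: add_aligned is a hypothetical helper, and filling NaN where only one input has a value is an assumption made for the sketch.

```python
import datetime as dtm

t = [dtm.datetime(1990, 1, 1, h) for h in range(4)]
ts1 = {t[0]: 1., t[1]: 2., t[2]: 3.}
ts2 = {t[1]: 10., t[2]: 20., t[3]: 30.}

nan = float("nan")

def add_aligned(a, b):
    """Hypothetical helper: add two {time: value} maps on the union of times."""
    out = {}
    for time in sorted(set(a) | set(b)):
        # Assumption: a time present in only one input yields NaN
        out[time] = a.get(time, nan) + b.get(time, nan)
    return out

ts3 = add_aligned(ts1, ts2)
print(ts3[t[1]])  # 12.0
print(len(ts3))   # 4
```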

Unary operators and binary operators with scalars work the same way, except there is no need for time alignment:

is_pos = ts >= 0.     # produces a time series of boolean (True/False) values
ts23 = ts**(2./3.)    # each element raised to the two thirds power