vtools.data package¶
Submodules¶
vtools.data.dst module¶
Daylight Saving Time Conversion¶
This module provides the function dst_st()
for converting a pandas Series/DataFrame
with a naive DatetimeIndex that observes daylight saving time (DST) to a fixed
standard time zone (e.g., PST) using POSIX conventions.
See the automatic API documentation for details:
vtools.data.dst.dst_st()
- dst_st(ts, src_tz: str = 'US/Pacific', target_tz: str = 'Etc/GMT+8')[source]¶
Convert a pandas Series with a datetime index from a timezone-unaware index that observes DST (e.g., US/Pacific) to a fixed standard time zone (e.g., Etc/GMT+8) using POSIX conventions.
- Parameters:
- ts : pandas.Series
Time series with a naive (timezone-unaware) DatetimeIndex.
- src_tz : str, optional
Source timezone name (default is ‘US/Pacific’).
- target_tz : str, optional
Target standard timezone name (default is ‘Etc/GMT+8’).
- Returns:
- pandas.Series
Time series with index converted to the target standard timezone and made naive.
Notes
The function assumes the index is not already timezone-aware.
‘Etc/GMT+8’ is the correct tz name for UTC-8 (PST) in pytz; note the sign is reversed from what might be expected.
Handles ambiguous/nonexistent times due to DST transitions.
The returned index is naive (timezone-unaware) but represents the correct standard time.
If the input index is already timezone-aware, this function will raise an error.
Examples
>>> import pandas as pd
>>> from vtools import dst_st
>>> rng = pd.date_range("2023-11-05 00:00", "2023-11-05 04:00", freq="30min")
>>> ts = pd.Series(range(len(rng)), index=rng)
>>> converted = dst_st(ts)
>>> print(converted)
2023-11-05 00:00:00    0
2023-11-05 00:30:00    1
2023-11-05 01:00:00    2
2023-11-05 01:30:00    3
2023-11-05 02:30:00    5
2023-11-05 03:00:00    6
2023-11-05 03:30:00    7
2023-11-05 04:00:00    8
dtype: int64
vtools.data.gap module¶
- describe_null(dset, name, context=2)[source]¶
If dset is a DataFrame, run describe_series_gaps on each column. If it’s a Series, just run it once.
- describe_series_gaps(s: Series, name: str, context: int = 2)[source]¶
Print gaps in a single Series s, showing context non-null points before and after each gap, with an ellipsis marker in between.
- gap_count(ts, state='gap', dtype=<class 'int'>)[source]¶
Count missing data. Identifies gaps (runs of missing or non-missing data) and quantifies the length of each run in number of samples, which works better for regular series. Each time point receives the length of the run it belongs to.
- Parameters:
- ts : DataFrame
Time series to analyze
- state : str, one of 'gap'|'good'|'both'
State to count. If state is 'gap', block sizes of missing data are counted and reported for time points in the gap (every point in a given gap receives the same value); non-missing data have a size of zero. Setting state to 'good' inverts this: missing blocks are reported as zero and good data are counted.
- dtype : str or type
Data type of output; should be acceptable to pandas astype
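The run-length bookkeeping described above can be sketched in plain pandas. `gap_count_sketch` below is a hypothetical stand-in for gap_count with state='gap' (the name and implementation are illustrative, not the vtools source): every point in a NaN run is labeled with the run's length in samples, and non-missing points get zero.

```python
import numpy as np
import pandas as pd

def gap_count_sketch(s: pd.Series) -> pd.Series:
    """Label each point with the length (in samples) of the NaN run it
    belongs to; non-missing points get 0. Illustrative only -- the real
    gap_count also supports state='good'/'both' and DataFrames."""
    isna = s.isna()
    # Number consecutive runs of equal NaN-state, then broadcast each
    # run's length back onto its members.
    run_id = (isna != isna.shift()).cumsum()
    run_len = isna.groupby(run_id).transform("size")
    return run_len.where(isna, 0).astype(int)

idx = pd.date_range("2023-01-01", periods=6, freq="15min")
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0], index=idx)
print(gap_count_sketch(s).tolist())  # [0, 2, 2, 0, 1, 0]
```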
- gap_distance(ts, disttype='count', to='good')[source]¶
For each element of ts, count the distance to the nearest good or bad data.
- Parameters:
- ts : DataFrame
Time series to analyze
- disttype : str, one of 'count'|'freq'
If disttype = "count" the distance is a number of values. If disttype = "freq" it is in the units of ts.freq (so if freq == "15min" it is in minutes).
- to : str, one of 'bad'|'good'
If to = "good" this is the distance to the nearest good data (which is 0 for good data). If to = "bad", this is the distance to the nearest nan (which is 0 for nan).
- Returns:
- result : DataFrame
A new regular time series with the same freq as the argument, holding the distance to good/bad data.
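The distance computation can be illustrated in plain pandas. `gap_distance_sketch` is a hypothetical equivalent of gap_distance(ts, disttype='count', to='good'): for each point, the distance in samples to the nearest non-NaN point.

```python
import numpy as np
import pandas as pd

def gap_distance_sketch(s: pd.Series) -> pd.Series:
    """Distance in samples from each point to the nearest non-NaN point
    (0 for points that are themselves non-NaN). Illustrative sketch."""
    pos = pd.Series(np.arange(len(s), dtype=float), index=s.index)
    good_pos = pos.where(s.notna())   # positions of good points only
    fwd = good_pos.ffill()            # nearest good point at or before
    bwd = good_pos.bfill()            # nearest good point at or after
    # min skips NaN, so points before the first / after the last good
    # value still get a finite distance
    dist = pd.concat([(pos - fwd).abs(), (bwd - pos).abs()], axis=1).min(axis=1)
    return dist.astype(int)

s = pd.Series([np.nan, np.nan, 3.0, np.nan, 5.0],
              index=pd.date_range("2023-01-01", periods=5, freq="D"))
print(gap_distance_sketch(s).tolist())  # [2, 1, 0, 1, 0]
```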
- gap_size(ts)[source]¶
Identifies gaps (runs of missing data) and quantifies the length of the gap. Each time point receives the length of the run in terms of seconds or number of values in the time dimension, with non-missing data returning zero. Time is measured from when the data first starts being missing to when the data stops being missing.
- Parameters:
- ts : DataFrame
Time series to analyze
- Returns:
- result : DataFrame
A new regular time series with the same freq as the argument holding the size of the gap.
Examples
>>> ndx = pd.date_range(pd.Timestamp(2017,1,1,12), freq='15min', periods=10)
>>> vals0 = np.arange(0., 10., dtype='d')
>>> vals1 = np.arange(0., 10., dtype='d')
>>> vals2 = np.arange(0., 10., dtype='d')
>>> vals0[0:3] = np.nan
>>> vals0[7:-1] = np.nan
>>> vals1[2:4] = np.nan
>>> vals1[6] = np.nan
>>> vals1[9] = np.nan
>>> df = pd.DataFrame({'vals0': vals0, 'vals1': vals1, 'vals2': vals2}, index=ndx)
>>> out = gap_size(df)
>>> print(df)
                     vals0  vals1  vals2
2017-01-01 12:00:00    NaN    0.0    0.0
2017-01-01 12:15:00    NaN    1.0    1.0
2017-01-01 12:30:00    NaN    NaN    2.0
2017-01-01 12:45:00    3.0    NaN    3.0
2017-01-01 13:00:00    4.0    4.0    4.0
2017-01-01 13:15:00    5.0    5.0    5.0
2017-01-01 13:30:00    6.0    NaN    6.0
2017-01-01 13:45:00    NaN    7.0    7.0
2017-01-01 14:00:00    NaN    8.0    8.0
2017-01-01 14:15:00    9.0    NaN    9.0
>>> print(out)
                     vals0  vals1  vals2
2017-01-01 12:00:00   45.0    0.0    0.0
2017-01-01 12:15:00   45.0    0.0    0.0
2017-01-01 12:30:00   45.0   30.0    0.0
2017-01-01 12:45:00    0.0   30.0    0.0
2017-01-01 13:00:00    0.0    0.0    0.0
2017-01-01 13:15:00    0.0    0.0    0.0
2017-01-01 13:30:00    0.0   15.0    0.0
2017-01-01 13:45:00   30.0    0.0    0.0
2017-01-01 14:00:00   30.0    0.0    0.0
2017-01-01 14:15:00    0.0    0.0    0.0
vtools.data.sample_series module¶
- jay_flinchem_chirptest(c1=3.5, c2=5.5, c3=0.0002, c4=6.75)[source]¶
Approximation of the signal from Jay and Flinchem (1999), "A comparison of methods for analysis of tidal records containing multi-scale non-tidal background energy", which has a small tide with noisy, river-influenced amplitude and subtide.
vtools.data.timeseries module¶
Time series module. Helpers for creating regular and irregular time series, transforming irregular to regular, and analyzing gaps.
- class PchipInterpolator(x, y, axis=0, extrapolate=None)[source]¶
Bases:
CubicHermiteSpline
PCHIP 1-D monotonic cubic interpolation.
x and y are arrays of values used to approximate some function f, with y = f(x). The interpolant uses monotonic cubic splines to find the value of new points. (PCHIP stands for Piecewise Cubic Hermite Interpolating Polynomial.)
- Parameters:
- x : ndarray, shape (npoints,)
A 1-D array of monotonically increasing real values. x cannot include duplicate values (otherwise f is overspecified).
- y : ndarray, shape (…, npoints, …)
An N-D array of real values. y's length along the interpolation axis must be equal to the length of x. Use the axis parameter to select the interpolation axis.
- axis : int, optional
Axis in the y array corresponding to the x-coordinate values. Defaults to axis=0.
- extrapolate : bool, optional
Whether to extrapolate to out-of-bounds points based on first and last intervals, or to return NaNs.
See also
CubicHermiteSpline
Piecewise-cubic interpolator.
Akima1DInterpolator
Akima 1D interpolator.
CubicSpline
Cubic spline data interpolator.
PPoly
Piecewise polynomial in terms of coefficients and breakpoints.
Notes
The interpolator preserves monotonicity in the interpolation data and does not overshoot if the data is not smooth.
The first derivatives are guaranteed to be continuous, but the second derivatives may jump at \(x_k\).
Determines the derivatives at the points \(x_k\), \(f'_k\), by using PCHIP algorithm [1].
Let \(h_k = x_{k+1} - x_k\) and let \(d_k = (y_{k+1} - y_k) / h_k\) be the slopes at internal points \(x_k\). If the signs of \(d_k\) and \(d_{k-1}\) are different or either of them equals zero, then \(f'_k = 0\). Otherwise, it is given by the weighted harmonic mean
\[\frac{w_1 + w_2}{f'_k} = \frac{w_1}{d_{k-1}} + \frac{w_2}{d_k}\]
where \(w_1 = 2 h_k + h_{k-1}\) and \(w_2 = h_k + 2 h_{k-1}\).
The end slopes are set using a one-sided scheme [2].
References
[1]F. N. Fritsch and J. Butland, A method for constructing local monotone piecewise cubic interpolants, SIAM J. Sci. Comput., 5(2), 300-304 (1984). :doi:`10.1137/0905021`.
[2]see, e.g., C. Moler, Numerical Computing with Matlab, 2004. :doi:`10.1137/1.9780898717952`
- Attributes:
- axis
- c
- extrapolate
- x
Methods
- __call__(x[, nu, extrapolate]) : Evaluate the piecewise polynomial or its derivative.
- derivative([nu]) : Construct a new piecewise polynomial representing the derivative.
- antiderivative([nu]) : Construct a new piecewise polynomial representing the antiderivative.
- roots([discontinuity, extrapolate]) : Find real roots of the piecewise polynomial.
- datetime_elapsed(index_or_ts, reftime=None, dtype='d', inplace=False)[source]¶
Convert a time series or DatetimeIndex to an integer/double series of elapsed time
- Parameters:
- index_or_ts : DatetimeIndex or time series
Time series or index to be transformed
- reftime : Timestamp or something convertible to one, optional
The reference time from which elapsed time is measured. The default of None means the start of the series.
- dtype : str like 'i' or 'd', or type like int (Int64) or float (Float64)
Data type for output, which starts out as Float64 ('d') and gets converted, typically to Int64 ('i')
- inplace : bool
If input is a data frame, replaces the index in-place with no copy
- Returns:
- result
A new index using elapsed time from reftime as its value and of type dtype
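As a rough illustration (plain pandas, not the vtools API itself), the elapsed-time round trip that datetime_elapsed and its inverse elapsed_datetime describe can be sketched like this:

```python
import pandas as pd

# Round trip between a DatetimeIndex and elapsed seconds -- a sketch of
# what datetime_elapsed / elapsed_datetime compute, using pandas directly.
idx = pd.date_range("2023-01-01", periods=4, freq="15min")
reftime = idx[0]

# forward: datetimes -> elapsed seconds from reftime, as float ('d')
elapsed = (idx - reftime).total_seconds()
print(list(elapsed))  # [0.0, 900.0, 1800.0, 2700.0]

# inverse: elapsed seconds -> datetimes
restored = reftime + pd.to_timedelta(elapsed, unit="s")
print(restored.equals(idx))  # True
```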
- elapsed_datetime(index_or_ts, reftime=None, time_unit='s', inplace=False)[source]¶
Convert a time series or numerical Index to a Datetime index or series
- Parameters:
- index_or_ts : numerical Index or time series
Time series or index to be transformed, with values in elapsed seconds from reftime
- reftime : Timestamp or something convertible to one
The reference time from which the datetimes are evaluated.
- inplace : bool
If input is a data frame, replaces the index in-place with no copy
- Returns:
- result
A new DatetimeIndex (or series using it) inferred from elapsed time relative to reftime
- extrapolate_ts(ts, start=None, end=None, method='ffill', val=None)[source]¶
Extend a regular time series to a new start and/or end using a specified extrapolation method.
- Parameters:
- ts : pandas.Series or pandas.DataFrame
The input time series with a DateTimeIndex and a regular frequency.
- start : datetime-like, optional
The new starting time. If None, no extension is done before the existing data.
- end : datetime-like, optional
The new ending time. If None, no extension is done after the existing data.
- method : {'ffill', 'bfill', 'linear_slope', 'taper', 'constant'}, default 'ffill'
The method used to fill new values outside the original time range:
‘ffill’ : Forward-fill after the original data using its last value.
‘bfill’ : Backward-fill before the original data using its first value.
‘linear_slope’ : Bidirectional linear extrapolation using the first/last two points.
‘taper’ : One-sided linear interpolation to/from a specified value (val).
‘constant’ : One-sided constant value fill with val.
- val : float, optional
Required for ‘taper’ and ‘constant’. Specifies the value to use.
- Returns:
- extended : pandas.Series or pandas.DataFrame
The time series extended and filled using the selected method.
- Raises:
- ValueError
If extrapolation rules are violated based on the method.
If method requires or forbids val and it’s misused.
If frequency cannot be inferred.
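The simplest of the methods above, 'ffill', can be sketched in plain pandas. `extend_ffill` is a hypothetical stand-in that handles only the forward-fill path; the real extrapolate_ts also covers 'bfill', 'linear_slope', 'taper', and 'constant'.

```python
import pandas as pd

def extend_ffill(ts: pd.Series, end) -> pd.Series:
    """Extend a regular series to a later `end`, repeating the last
    value -- a minimal sketch of the 'ffill' method."""
    freq = pd.infer_freq(ts.index)
    if freq is None:
        raise ValueError("frequency cannot be inferred")
    # Reindex onto the extended range, then forward-fill the new tail
    new_index = pd.date_range(ts.index[0], end, freq=freq)
    return ts.reindex(new_index).ffill()

s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2023-01-01", periods=3, freq="h"))
out = extend_ffill(s, "2023-01-01 05:00")
print(out.tolist())  # [1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
```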
- is_regular(ts, raise_exception=False)[source]¶
Check if a pandas DataFrame, Series, or xarray object with a time axis (axis 0) has a regular time index.
- Regular means:
The index is unique.
The index equals a date_range spanning from the first to the last value with the inferred frequency.
- Parameters:
ts : DataFrame, Series, or xarray object
raise_exception : bool
If True, raises a ValueError when the index is not regular; otherwise, returns False.
- Returns:
bool : True if the time index is regular; False otherwise.
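The two conditions above (unique index; index equals a date_range at the inferred frequency) can be checked directly in pandas. `is_regular_sketch` is a hypothetical re-implementation, not the vtools source:

```python
import pandas as pd

def is_regular_sketch(ts) -> bool:
    """True if ts has a unique time index that equals a date_range
    spanning first to last value at the inferred frequency."""
    idx = ts.index
    if not idx.is_unique:
        return False
    freq = pd.infer_freq(idx)
    if freq is None:           # irregular spacing: no frequency to infer
        return False
    return idx.equals(pd.date_range(idx[0], idx[-1], freq=freq))

good = pd.Series(range(4), index=pd.date_range("2023-01-01", periods=4, freq="D"))
bad = good.drop(good.index[2])          # hole in the middle
print(is_regular_sketch(good), is_regular_sketch(bad))  # True False
```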
- rts(data, start, freq, columns=None, props=None)[source]¶
Create a regular or calendar time series from data and time parameters
- Parameters:
- data : array_like
Should be an array/list of values. There is no restriction on data type, but not all functionality (such as addition or interpolation) will work on all data.
- start : pandas.Timestamp
Timestamp, or a string or type that can be coerced to one.
- freq : time interval
Can also be a string representing a pandas freq.
- Returns:
- result : pandas.DataFrame
A regular time series with the freq attribute set
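In plain pandas, the construction rts describes amounts to pairing the values with a date_range that starts at `start` and steps by `freq`; the sketch below is a hypothetical equivalent, not the vtools implementation:

```python
import pandas as pd

# Build a regular series from values, a start time, and a pandas freq
data = [1.0, 2.0, 3.0, 4.0]
idx = pd.date_range(pd.Timestamp("2023-01-01"), periods=len(data), freq="15min")
df = pd.DataFrame({"value": data}, index=idx)
print(df.index.freq is not None)  # True -- the freq attribute is set
```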
- rts_formula(start, end, freq, valfunc=nan)[source]¶
Create a regular time series filled with constant value or formula based on elapsed seconds
- Parameters:
- start : pandas.Timestamp
Starting Timestamp, or a string or type that can be coerced to one.
- end : pandas.Timestamp
Ending Timestamp, or a string or type that can be coerced to one.
- freq : time interval
Can also be a string representing an interval.
- valfunc : constant or dict
Constant, or a dictionary that maps column names to lambdas evaluated on elapsed seconds from the start of the series. An example would be {"value": lambda x: np.nan}.
- Returns:
- result : pandas.DataFrame
A regular time series with the freq attribute set
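The valfunc idea (one lambda per column, evaluated on elapsed seconds from the start) can be sketched in plain pandas. This is a hypothetical equivalent of rts_formula, not the vtools source:

```python
import pandas as pd

# One column per dict entry; each lambda receives elapsed seconds
start, end, freq = pd.Timestamp("2023-01-01"), pd.Timestamp("2023-01-01 01:00"), "15min"
idx = pd.date_range(start, end, freq=freq)
elapsed = (idx - start).total_seconds()
valfunc = {"value": lambda x: x / 900.0}   # e.g. count of 15-min steps
df = pd.DataFrame({name: f(elapsed) for name, f in valfunc.items()}, index=idx)
print(df["value"].tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
```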
- time_overlap(ts0, ts1, valid=True)[source]¶
Check for overlapping time coverage between series. Returns a tuple of the start and end of the overlapping period. Only considers the time stamps of the start/end (possibly ignoring NaNs at the beginning if valid=True); it does not check for actual time stamp alignment.
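The overlap test amounts to intersecting the [first, last] index spans. `time_overlap_sketch` is a hypothetical illustration (it skips the valid=True NaN handling the real function offers):

```python
import pandas as pd

def time_overlap_sketch(ts0: pd.Series, ts1: pd.Series):
    """Return (start, end) of the overlapping period, or None if the
    index spans do not overlap. Considers only the endpoints."""
    start = max(ts0.index[0], ts1.index[0])
    end = min(ts0.index[-1], ts1.index[-1])
    return (start, end) if start <= end else None

a = pd.Series(1.0, index=pd.date_range("2023-01-01", periods=5, freq="D"))
b = pd.Series(2.0, index=pd.date_range("2023-01-03", periods=5, freq="D"))
print(time_overlap_sketch(a, b))
```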
vtools.data.vis_gap module¶
vtools.data.vtime module¶
Basic ops for creating, testing and manipulating times and time intervals. This module contains factory and helper functions for working with times and time intervals.
For time intervals (or deltas), VTools uses classes that are compatible with the "freq" argument of pandas functions such as date_range. VTools requires a time and time interval system that is consistent (e.g., time + n*interval makes sense) and that can be applied to both calendar-dependent and calendar-independent intervals. Because this requirement is not met by any one implementation, it is recommended that you always use the factory functions in this module for creating intervals or testing whether an interval is valid.
- dst_to_standard_naive(ts, dst_zone='US/Pacific', standard_zone='Etc/GMT+8')[source]¶
Convert a timezone-unaware series from a local (daylight-observing) time to standard time. This would be useful, say, for converting a series that is PDT during summer to one that is not. The routine mainly treats cases where the time stamps at DST interfaces are not redundant; if they are, you can probably use tz_convert and tz_localize with the ambiguous='infer' option and do the job more efficiently, but lots of databases don't store data this way.
The choice of the standard_zone is, it seems, buggy. The defaults are supposed to convert from PST/PDT to pure PST, and the latter should be GMT-8. In a sense, this function is included before the behavior is really understood.
Only regular series are accepted; this is a quirk of the implementation.
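For the easy case the docstring mentions, where stamps at the DST interface are redundant, the conversion can be done with pandas directly. This sketch uses only the pandas API (tz_localize / tz_convert), not the vtools routine:

```python
import pandas as pd

# Summer (PDT) stamps: localize to US/Pacific, convert to fixed PST
# (Etc/GMT+8 -- POSIX sign convention for UTC-8), then drop the tz.
rng = pd.date_range("2023-07-01 00:00", periods=3, freq="h")
s = pd.Series([1.0, 2.0, 3.0], index=rng)
localized = s.tz_localize("US/Pacific", ambiguous="infer")
standard = localized.tz_convert("Etc/GMT+8").tz_localize(None)  # naive PST
print(standard.index[0])  # 2023-06-30 23:00:00 -- PDT shifted back one hour
```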