vtools.functions package

Submodules

vtools.functions.climatology module

apply_climatology(climate, index=None, start=None, end=None, freq=None)[source]

Apply daily or monthly climatology to a new index or generate index from start/end/freq

Parameters:
climate : DataFrame or Series

DataFrame with integer index representing month of year (Jan=1) or day of year. Must be of size 12, 365, or 366. Day 366 will be inferred from day 365 value.

index : pandas.DatetimeIndex, optional

Locations to be inferred. If not provided, must specify start, end, and freq.

start : str or datetime-like, optional

Start date for generating index (used if index is None).

end : str or datetime-like, optional

End date for generating index (used if index is None).

freq : str, optional

Frequency string for generating index (used if index is None). E.g., ‘D’ for daily, ‘M’ for monthly.

Returns:
DataFrame or Series

Values extracted from climatology for the month or day at the specified index.

Notes

  • If index is not provided, start, end, and freq must be specified to generate a DatetimeIndex using pandas.date_range.

  • Backward compatible: original behavior is preserved if index is provided.
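
Examples

A minimal sketch of both calling modes (the monthly climatology values here are purely illustrative):

>>> import pandas as pd
>>> from vtools.functions.climatology import apply_climatology
>>> clim = pd.Series([float(m) for m in range(1, 13)], index=range(1, 13))  # monthly climatology, Jan=1
>>> dr = pd.date_range("2010-01-01", "2010-12-31", freq="D")
>>> on_index = apply_climatology(clim, index=dr)               # apply to an existing index
>>> generated = apply_climatology(clim, start="2010-01-01",
...                               end="2010-12-31", freq="D")  # or generate the index from start/end/freq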

climatology(ts, freq, nsmooth=None)[source]

Create a climatology on the columns of ts.

Parameters:
ts: DataFrame or Series
Data structure to be analyzed. Must have a length of at least 2*freq.
freq: period [“day”,”month”]
Period over which the climatology is analyzed
nsmooth: int

Window size (number of values) for pre-smoothing. This may not make sense for series that are not approximately regular. An odd number is usually best.

Returns:

out : DataFrame or Series

Data structure of the same type as ts, with integer index representing month (Jan=1) or day of year (1-365).
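
Examples

A minimal sketch using a synthetic multi-year daily series (the data are illustrative only):

>>> import numpy as np
>>> import pandas as pd
>>> from vtools.functions.climatology import climatology
>>> idx = pd.date_range("2000-01-01", "2005-12-31", freq="D")
>>> ts = pd.Series(np.sin(2 * np.pi * idx.dayofyear / 365.25), index=idx)
>>> monthly = climatology(ts, freq="month")          # integer index, Jan=1
>>> daily = climatology(ts, freq="day", nsmooth=7)   # day-of-year index, lightly pre-smoothed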

climatology_quantiles(ts, min_day_year, max_day_year, window_width, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])[source]

Create windowed quantiles across years on a time series.

Parameters:
ts: DataFrame or Series
Data structure to be analyzed.
min_day_year: int
Minimum Julian day to be considered
max_day_year: int
Maximum Julian day to be considered
window_width: int
Number of days to include, including the central day and days on each side. So for instance window_width=15 would span the central date and 7 days on each side
quantiles: array-like

Quantiles requested.

Returns:

out : DataFrame or Series

Data structure with Julian day as the index and quantiles as columns.
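
Examples

A sketch continuing the synthetic daily series ts from the climatology example above (illustrative values only):

>>> from vtools.functions.climatology import climatology_quantiles
>>> quants = climatology_quantiles(ts, min_day_year=1, max_day_year=365,
...                                window_width=15,
...                                quantiles=[0.25, 0.5, 0.75])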

vtools.functions.colname_align module

Column naming alignment utilities for time series composition functions.

This module provides decorators that standardize how functions like ts_merge, ts_splice, and transition_ts handle their names argument and enforce column consistency across multiple time series inputs.

Main features

  • Column consistency enforcement: Ensures that when names=None (default), all input DataFrames share identical columns. This prevents accidental creation of staggered or mismatched columns.

  • Centralized naming behavior: Applies uniform handling of names values:

    • None — require identical columns across all inputs and keep them.

    • str — require univariate inputs (single column each); output is a single-column DataFrame (or Series if all inputs were Series) with this name.

    • Iterable[str] — treated as a column selector: these columns are selected (and ordered) from the final output and must exist in every input.

  • Support for both list-style and pairwise APIs: Works for functions that accept a sequence of time series (like ts_merge/ts_splice) or two explicit series arguments (like transition_ts).

Usage pattern

Decorate your functions as follows:

@columns_aligned(mode="same_set")
@names_aligned(seq_arg=0, pre_rename=True)
def ts_splice(series, names=None, ...):
    ...

@columns_aligned(mode="same_set")
@names_aligned_pair(ts0_kw="ts0", ts1_kw="ts1")
def transition_ts(ts0, ts1, names=None, ...):
    ...

This ensures consistent semantics for all multi-series combination tools.
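
For illustration, a sketch of the names semantics described above as seen from a decorated caller such as ts_merge (the import path and data are assumptions, not part of this module):

>>> import pandas as pd
>>> from vtools.functions.merge import ts_merge  # assumed location of ts_merge
>>> idx = pd.date_range("2020-01-01", periods=4, freq="D")
>>> a = pd.DataFrame({"flow": [1.0, 2.0, None, None]}, index=idx)
>>> b = pd.DataFrame({"flow": [None, None, 3.0, 4.0]}, index=idx)
>>> merged = ts_merge((a, b))               # names=None: identical columns required and kept
>>> renamed = ts_merge((a, b), names="q")   # str: univariate inputs, single output column "q"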

_coerce_inputs_strict(seq, names)[source]

Strict input alignment policy:

  • names is None -> all inputs must have identical column lists (no unions/intersections).

  • names is str -> leave inputs as-is; final renaming happens via align_names(…).

  • names is list -> for each DataFrame, select exactly those columns; for a Series, only len==1 is allowed.

align_inputs_pair_strict(ts0_kw='ts0', ts1_kw='ts1', names_kw='names')[source]
align_inputs_strict(seq_arg=0, names_kw='names')[source]
align_names(result, names)[source]

vtools.functions.envelope module

class PchipInterpolator(x, y, axis=0, extrapolate=None)[source]

Bases: CubicHermiteSpline

PCHIP 1-D monotonic cubic interpolation.

x and y are arrays of values used to approximate some function f, with y = f(x). The interpolant uses monotonic cubic splines to find the value of new points. (PCHIP stands for Piecewise Cubic Hermite Interpolating Polynomial).

Parameters:
x : ndarray, shape (npoints, )

A 1-D array of monotonically increasing real values. x cannot include duplicate values (otherwise f is overspecified)

y : ndarray, shape (…, npoints, …)

A N-D array of real values. y’s length along the interpolation axis must be equal to the length of x. Use the axis parameter to select the interpolation axis.

axis : int, optional

Axis in the y array corresponding to the x-coordinate values. Defaults to axis=0.

extrapolate : bool, optional

Whether to extrapolate to out-of-bounds points based on first and last intervals, or to return NaNs.

See also

CubicHermiteSpline

Piecewise-cubic interpolator.

Akima1DInterpolator

Akima 1D interpolator.

CubicSpline

Cubic spline data interpolator.

PPoly

Piecewise polynomial in terms of coefficients and breakpoints.

Notes

The interpolator preserves monotonicity in the interpolation data and does not overshoot if the data is not smooth.

The first derivatives are guaranteed to be continuous, but the second derivatives may jump at \(x_k\).

Determines the derivatives at the points \(x_k\), \(f'_k\), by using PCHIP algorithm [1].

Let \(h_k = x_{k+1} - x_k\), and \(d_k = (y_{k+1} - y_k) / h_k\) are the slopes at internal points \(x_k\). If the signs of \(d_k\) and \(d_{k-1}\) are different or either of them equals zero, then \(f'_k = 0\). Otherwise, it is given by the weighted harmonic mean

\[\frac{w_1 + w_2}{f'_k} = \frac{w_1}{d_{k-1}} + \frac{w_2}{d_k}\]

where \(w_1 = 2 h_k + h_{k-1}\) and \(w_2 = h_k + 2 h_{k-1}\).

The end slopes are set using a one-sided scheme [2].

References

[1]

F. N. Fritsch and J. Butland, A method for constructing local monotone piecewise cubic interpolants, SIAM J. Sci. Comput., 5(2), 300-304 (1984). :doi:`10.1137/0905021`.

[2]

see, e.g., C. Moler, Numerical Computing with Matlab, 2004. :doi:`10.1137/1.9780898717952`

Methods

__call__(x[, nu, extrapolate])

Evaluate the piecewise polynomial or its derivative.

derivative([nu])

Construct a new piecewise polynomial representing the derivative.

antiderivative([nu])

Construct a new piecewise polynomial representing the antiderivative.

roots([discontinuity, extrapolate])

Find real roots of the piecewise polynomial.

__init__(x, y, axis=0, extrapolate=None)[source]
static _edge_case(h0, h1, m0, m1)[source]
static _find_derivatives(x, y)[source]
axis
c
extrapolate
x
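
Examples

A brief usage sketch of the interpolator (standard SciPy API):

>>> import numpy as np
>>> from scipy.interpolate import PchipInterpolator
>>> x = np.array([0.0, 1.0, 2.0, 3.0])
>>> y = np.array([0.0, 2.0, 1.0, 3.0])
>>> interp = PchipInterpolator(x, y)
>>> xi = np.linspace(0.0, 3.0, 7)
>>> yi = interp(xi)                    # monotonicity-preserving values at new points
>>> dydx = interp.derivative()(xi)     # first derivative, continuous by construction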
chunked_loess_smoothing(ts, window_hours=1.25, chunk_days=10, overlap_days=1)[source]

Apply LOESS smoothing in overlapping chunks to reduce computation time.

Parameters:
ts : pd.Series

Time series with datetime index and possible NaNs.

window_hours : float

LOESS smoothing window size in hours.

chunk_days : int

Core chunk size (e.g., 10 days).

overlap_days : int

Overlap added before and after each chunk to avoid edge effects.

Returns:
pd.Series

Smoothed series, NaNs where input is NaN or unsupported.
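
Examples

A minimal sketch on a synthetic 15-minute series (data and parameter choices are illustrative):

>>> import numpy as np
>>> import pandas as pd
>>> from vtools.functions.envelope import chunked_loess_smoothing
>>> idx = pd.date_range("2022-01-01", periods=96 * 30, freq="15min")
>>> vals = np.sin(2 * np.pi * np.arange(len(idx)) / 49.7) + 0.1 * np.random.randn(len(idx))
>>> noisy = pd.Series(vals, index=idx)
>>> smoothed = chunked_loess_smoothing(noisy, window_hours=1.25, chunk_days=10, overlap_days=1)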

filter_extrema_ngood(extrema_df, smoothed, series, loess_window_pts=25, n_good=3, sig_gap_minutes=45)[source]

Filter extrema based on local and contextual data quality criteria.

Parameters:
extrema_df : pd.DataFrame

DataFrame with columns ‘time’ and ‘value’ for candidate extrema.

smoothed : pd.Series

Smoothed version of the signal used for extrema detection.

series : pd.Series

Original time series (with gaps).

loess_window_pts : int

Number of points in the LOESS window.

n_good : int

Minimum number of non-NaN points required.

sig_gap_minutes : float

Threshold for detecting significant gaps (in minutes).

Returns:
pd.DataFrame

Filtered extrema DataFrame.

find_peaks(x, height=None, threshold=None, distance=None, prominence=None, width=None, wlen=None, rel_height=0.5, plateau_size=None)[source]

Find peaks inside a signal based on peak properties.

This function takes a 1-D array and finds all local maxima by simple comparison of neighboring values. Optionally, a subset of these peaks can be selected by specifying conditions for a peak’s properties.

Parameters:
x : sequence

A signal with peaks.

height : number or ndarray or sequence, optional

Required height of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required height.

threshold : number or ndarray or sequence, optional

Required threshold of peaks, the vertical distance to its neighboring samples. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required threshold.

distance : number, optional

Required minimal horizontal distance (>= 1) in samples between neighbouring peaks. Smaller peaks are removed first until the condition is fulfilled for all remaining peaks.

prominence : number or ndarray or sequence, optional

Required prominence of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required prominence.

width : number or ndarray or sequence, optional

Required width of peaks in samples. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required width.

wlen : int, optional

Used for calculation of the peaks prominences, thus it is only used if one of the arguments prominence or width is given. See argument wlen in peak_prominences for a full description of its effects.

rel_height : float, optional

Used for calculation of the peaks width, thus it is only used if width is given. See argument rel_height in peak_widths for a full description of its effects.

plateau_size : number or ndarray or sequence, optional

Required size of the flat top of peaks in samples. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied as the maximal required plateau size.

New in version 1.2.0.

Returns:
peaks : ndarray

Indices of peaks in x that satisfy all given conditions.

properties : dict

A dictionary containing properties of the returned peaks which were calculated as intermediate results during evaluation of the specified conditions:

  • ‘peak_heights’

    If height is given, the height of each peak in x.

  • ‘left_thresholds’, ‘right_thresholds’

    If threshold is given, these keys contain a peak's vertical distance to its neighbouring samples.

  • ‘prominences’, ‘right_bases’, ‘left_bases’

    If prominence is given, these keys are accessible. See peak_prominences for a description of their content.

  • ‘width_heights’, ‘left_ips’, ‘right_ips’

    If width is given, these keys are accessible. See peak_widths for a description of their content.

  • ‘plateau_sizes’, ‘left_edges’, ‘right_edges’

    If plateau_size is given, these keys are accessible and contain the indices of a peak’s edges (edges are still part of the plateau) and the calculated plateau sizes.

    New in version 1.2.0.

To calculate and return properties without excluding peaks, provide the open interval (None, None) as a value to the appropriate argument (excluding distance).

Warns:
PeakPropertyWarning

Raised if a peak’s properties have unexpected values (see peak_prominences and peak_widths).

Warning

This function may return unexpected results for data containing NaNs. To avoid this, NaNs should either be removed or replaced.

See also

find_peaks_cwt

Find peaks using the wavelet transformation.

peak_prominences

Directly calculate the prominence of peaks.

peak_widths

Directly calculate the width of peaks.

Notes

In the context of this function, a peak or local maximum is defined as any sample whose two direct neighbours have a smaller amplitude. For flat peaks (more than one sample of equal amplitude wide) the index of the middle sample is returned (rounded down in case the number of samples is even). For noisy signals the peak locations can be off because the noise might change the position of local maxima. In those cases consider smoothing the signal before searching for peaks or use other peak finding and fitting methods (like find_peaks_cwt).

Some additional comments on specifying conditions:

  • Almost all conditions (excluding distance) can be given as half-open or closed intervals, e.g., 1 or (1, None) defines the half-open interval \([1, \infty]\) while (None, 1) defines the interval \([-\infty, 1]\). The open interval (None, None) can be specified as well, which returns the matching properties without exclusion of peaks.

  • The border is always included in the interval used to select valid peaks.

  • For several conditions the interval borders can be specified with arrays matching x in shape which enables dynamic constraints based on the sample position.

  • The conditions are evaluated in the following order: plateau_size, height, threshold, distance, prominence, width. In most cases this order is the fastest one because faster operations are applied first to reduce the number of peaks that need to be evaluated later.

  • While indices in peaks are guaranteed to be at least distance samples apart, edges of flat peaks may be closer than the allowed distance.

  • Use wlen to reduce the time it takes to evaluate the conditions for prominence or width if x is large or has many local maxima (see peak_prominences).

New in version 1.1.0.

Examples

To demonstrate this function’s usage we use a signal x supplied with SciPy (see scipy.datasets.electrocardiogram). Let’s find all peaks (local maxima) in x whose amplitude lies above 0.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy.datasets import electrocardiogram
>>> from scipy.signal import find_peaks
>>> x = electrocardiogram()[2000:4000]
>>> peaks, _ = find_peaks(x, height=0)
>>> plt.plot(x)
>>> plt.plot(peaks, x[peaks], "x")
>>> plt.plot(np.zeros_like(x), "--", color="gray")
>>> plt.show()

We can select peaks below 0 with height=(None, 0) or use arrays matching x in size to reflect a changing condition for different parts of the signal.

>>> border = np.sin(np.linspace(0, 3 * np.pi, x.size))
>>> peaks, _ = find_peaks(x, height=(-border, border))
>>> plt.plot(x)
>>> plt.plot(-border, "--", color="gray")
>>> plt.plot(border, ":", color="gray")
>>> plt.plot(peaks, x[peaks], "x")
>>> plt.show()

Another useful condition for periodic signals can be given with the distance argument. In this case, we can easily select the positions of QRS complexes within the electrocardiogram (ECG) by demanding a distance of at least 150 samples.

>>> peaks, _ = find_peaks(x, distance=150)
>>> np.diff(peaks)
array([186, 180, 177, 171, 177, 169, 167, 164, 158, 162, 172])
>>> plt.plot(x)
>>> plt.plot(peaks, x[peaks], "x")
>>> plt.show()

Especially for noisy signals peaks can be easily grouped by their prominence (see peak_prominences). E.g., we can select all peaks except for the mentioned QRS complexes by limiting the allowed prominence to 0.6.

>>> peaks, properties = find_peaks(x, prominence=(None, 0.6))
>>> properties["prominences"].max()
0.5049999999999999
>>> plt.plot(x)
>>> plt.plot(peaks, x[peaks], "x")
>>> plt.show()

And, finally, let’s examine a different section of the ECG which contains beat forms of different shape. To select only the atypical heart beats, we combine two conditions: a minimal prominence of 1 and width of at least 20 samples.

>>> x = electrocardiogram()[17000:18000]
>>> peaks, properties = find_peaks(x, prominence=1, width=20)
>>> properties["prominences"], properties["widths"]
(array([1.495, 2.3  ]), array([36.93773946, 39.32723577]))
>>> plt.plot(x)
>>> plt.plot(peaks, x[peaks], "x")
>>> plt.vlines(x=peaks, ymin=x[peaks] - properties["prominences"],
...            ymax = x[peaks], color = "C1")
>>> plt.hlines(y=properties["width_heights"], xmin=properties["left_ips"],
...            xmax=properties["right_ips"], color = "C1")
>>> plt.show()
find_raw_extrema(smoothed, prominence=0.01)[source]

Find raw peaks and troughs using scipy.signal.find_peaks. Returns DataFrames for peaks and troughs.

generate_pink_noise(n, seed=None, scale=1.0)[source]

Generate pink (1/f) noise using the Voss-McCartney algorithm.

Parameters:
n : int

Number of samples to generate.

seed : int or None

Random seed for reproducibility.

scale : float

Standard deviation scaling factor for the noise.

Returns:
np.ndarray

Pink noise signal of length n.
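
Examples

A short sketch (argument values are illustrative):

>>> from vtools.functions.envelope import generate_pink_noise
>>> noise = generate_pink_noise(1000, seed=42, scale=0.5)   # ndarray of length 1000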

generate_simplified_mixed_tide(start_time='2022-01-01', ndays=40, freq='15min', A_M2=1.0, A_K1=0.5, A_O1=0.5, phase_D1=1.570795, noise_amplitude=0.08, return_components=False)[source]

Generate a simplified synthetic mixed semidiurnal/diurnal tide with explicit O1 and K1.

Parameters:
start_time : str

Start time for the series.

ndays : int

Number of days.

freq : str

Sampling interval.

A_M2 : float

Amplitude of M2.

A_K1 : float

Amplitude of K1.

A_O1 : float

Amplitude of O1.

phase_D1 : float

Common phase shift for O1 and K1.

return_components : bool

Whether to return individual components.

Returns:
pd.Series or pd.DataFrame

Combined tide or components with time index.
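
Examples

A short sketch (argument values are illustrative):

>>> from vtools.functions.envelope import generate_simplified_mixed_tide
>>> tide = generate_simplified_mixed_tide(start_time="2022-01-01", ndays=5, freq="15min",
...                                       A_M2=1.0, A_K1=0.5, A_O1=0.5)
>>> parts = generate_simplified_mixed_tide(ndays=5, return_components=True)  # individual components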

interpolate_envelope(anchor_df, series, max_anchor_gap_hours=36)[source]

Interpolate envelope using PCHIP, breaking if anchor points are too far apart.

lowess(endog, exog, frac=0.6666666666666666, it=3, delta=0.0, xvals=None, is_sorted=False, missing='drop', return_sorted=True)[source]

LOWESS (Locally Weighted Scatterplot Smoothing)

A lowess function that returns smoothed estimates of endog at the given exog values from points (exog, endog).

Parameters:
endog : 1-D numpy array

The y-values of the observed points

exog : 1-D numpy array

The x-values of the observed points

frac : float

Between 0 and 1. The fraction of the data used when estimating each y-value.

it : int

The number of residual-based reweightings to perform.

delta : float

Distance within which to use linear-interpolation instead of weighted regression.

xvals: 1-D numpy array

Values of the exogenous variable at which to evaluate the regression. If supplied, cannot use delta.

is_sorted : bool

If False (default), then the data will be sorted by exog before calculating lowess. If True, then it is assumed that the data is already sorted by exog. If xvals is specified, then it too must be sorted if is_sorted is True.

missing : str

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘drop’.

return_sorted : bool

If True (default), then the returned array is sorted by exog and has missing (nan or infinite) observations removed. If False, then the returned array is in the same length and the same sequence of observations as the input array.

Returns:
out : {ndarray, float}

The returned array is two-dimensional if return_sorted is True, and one dimensional if return_sorted is False. If return_sorted is True, then a numpy array with two columns. The first column contains the sorted x (exog) values and the second column the associated estimated y (endog) values. If return_sorted is False, then only the fitted values are returned, and the observations will be in the same order as the input arrays. If xvals is provided, then return_sorted is ignored and the returned array is always one dimensional, containing the y values fitted at the x values provided by xvals.

Notes

This lowess function implements the algorithm given in the reference below using local linear estimates.

Suppose the input data has N points. The algorithm works by estimating the smooth y_i by taking the frac*N closest points to (x_i,y_i) based on their x values and estimating y_i using a weighted linear regression. The weight for (x_j,y_j) is tricube function applied to abs(x_i-x_j).

If it > 1, then further weighted local linear regressions are performed, where the weights are the same as above times the _lowess_bisquare function of the residuals. Each iteration takes approximately the same amount of time as the original fit, so these iterations are expensive. They are most useful when the noise has extremely heavy tails, such as Cauchy noise. Noise with less heavy-tails, such as t-distributions with df>2, are less problematic. The weights downgrade the influence of points with large residuals. In the extreme case, points whose residuals are larger than 6 times the median absolute residual are given weight 0.

delta can be used to save computations. For each x_i, regressions are skipped for points closer than delta. The next regression is fit for the farthest point within delta of x_i and all points in between are estimated by linearly interpolating between the two regression fits.

Judicious choice of delta can cut computation time considerably for large data (N > 5000). A good choice is delta = 0.01 * range(exog).

If xvals is provided, the regression is then computed at those points and the fit values are returned. Otherwise, the regression is run at points of exog.

Some experimentation is likely required to find a good choice of frac and iter for a particular dataset.

References

Cleveland, W.S. (1979) “Robust Locally Weighted Regression and Smoothing Scatterplots”. Journal of the American Statistical Association 74 (368): 829-836.

Examples

The below allows a comparison between how different the fits from lowess for different values of frac can be.

>>> import numpy as np
>>> import statsmodels.api as sm
>>> lowess = sm.nonparametric.lowess
>>> x = np.random.uniform(low = -2*np.pi, high = 2*np.pi, size=500)
>>> y = np.sin(x) + np.random.normal(size=len(x))
>>> z = lowess(y, x)
>>> w = lowess(y, x, frac=1./3)

This gives a similar comparison for when it is 0 vs not.

>>> import numpy as np
>>> import scipy.stats as stats
>>> import statsmodels.api as sm
>>> lowess = sm.nonparametric.lowess
>>> x = np.random.uniform(low = -2*np.pi, high = 2*np.pi, size=500)
>>> y = np.sin(x) + stats.cauchy.rvs(size=len(x))
>>> z = lowess(y, x, frac= 1./3, it=0)
>>> w = lowess(y, x, frac=1./3)
main()[source]
savgol_filter(x, window_length, polyorder, deriv=0, delta=1.0, axis=-1, mode='interp', cval=0.0)[source]

Apply a Savitzky-Golay filter to an array.

This is a 1-D filter. If x has dimension greater than 1, axis determines the axis along which the filter is applied.

Parameters:
x : array_like

The data to be filtered. If x is not a single or double precision floating point array, it will be converted to type numpy.float64 before filtering.

window_length : int

The length of the filter window (i.e., the number of coefficients). If mode is ‘interp’, window_length must be less than or equal to the size of x.

polyorder : int

The order of the polynomial used to fit the samples. polyorder must be less than window_length.

deriv : int, optional

The order of the derivative to compute. This must be a nonnegative integer. The default is 0, which means to filter the data without differentiating.

delta : float, optional

The spacing of the samples to which the filter will be applied. This is only used if deriv > 0. Default is 1.0.

axis : int, optional

The axis of the array x along which the filter is to be applied. Default is -1.

mode : str, optional

Must be ‘mirror’, ‘constant’, ‘nearest’, ‘wrap’ or ‘interp’. This determines the type of extension to use for the padded signal to which the filter is applied. When mode is ‘constant’, the padding value is given by cval. See the Notes for more details on ‘mirror’, ‘constant’, ‘wrap’, and ‘nearest’. When the ‘interp’ mode is selected (the default), no extension is used. Instead, a degree polyorder polynomial is fit to the last window_length values of the edges, and this polynomial is used to evaluate the last window_length // 2 output values.

cval : scalar, optional

Value to fill past the edges of the input if mode is ‘constant’. Default is 0.0.

Returns:
y : ndarray, same shape as x

The filtered data.

See also

savgol_coeffs

Notes

Details on the mode options:

‘mirror’:

Repeats the values at the edges in reverse order. The value closest to the edge is not included.

‘nearest’:

The extension contains the nearest input value.

‘constant’:

The extension contains the value given by the cval argument.

‘wrap’:

The extension contains the values from the other end of the array.

For example, if the input is [1, 2, 3, 4, 5, 6, 7, 8], and window_length is 7, the following shows the extended data for the various mode options (assuming cval is 0):

mode       |   Ext   |         Input          |   Ext
-----------+---------+------------------------+---------
'mirror'   | 4  3  2 | 1  2  3  4  5  6  7  8 | 7  6  5
'nearest'  | 1  1  1 | 1  2  3  4  5  6  7  8 | 8  8  8
'constant' | 0  0  0 | 1  2  3  4  5  6  7  8 | 0  0  0
'wrap'     | 6  7  8 | 1  2  3  4  5  6  7  8 | 1  2  3

New in version 0.14.0.

Examples

>>> import numpy as np
>>> from scipy.signal import savgol_filter
>>> np.set_printoptions(precision=2)  # For compact display.
>>> x = np.array([2, 2, 5, 2, 1, 0, 1, 4, 9])

Filter with a window length of 5 and a degree 2 polynomial. Use the defaults for all other parameters.

>>> savgol_filter(x, 5, 2)
array([1.66, 3.17, 3.54, 2.86, 0.66, 0.17, 1.  , 4.  , 9.  ])

Note that the last five values in x are samples of a parabola, so when mode=’interp’ (the default) is used with polyorder=2, the last three values are unchanged. Compare that to, for example, mode=’nearest’:

>>> savgol_filter(x, 5, 2, mode='nearest')
array([1.74, 3.03, 3.54, 2.86, 0.66, 0.17, 1.  , 4.6 , 7.97])
select_salient_extrema(extrema, typ, spacing_hours=14, envelope_type='outer')[source]

Select salient extrema (HH/LL or HL/LH) using literal spacing-based OR logic.

Parameters:
extrema : pd.DataFrame with columns [“time”, “value”]

Candidate extrema.

typ : str

Either “high” or “low” (for peak or trough selection).

spacing_hours : float

Time window for neighbor comparison.

envelope_type : str

Either “outer” (default) or “inner” to switch saliency logic.

Returns:
pd.DataFrame

Extrema that passed the saliency test.

smooth_series(ts, window_hours=1.75)[source]
smooth_series2(series, window_pts=25, method='lowess', **kwargs)[source]

Smooth a time series using the specified method. Currently supports ‘lowess’, ‘moving_average’, or ‘savgol’.

tidal_envelope(series, smoothing_window_hours=2.5, n_good=3, peak_prominence=0.05, saliency_window_hours=14, max_anchor_gap_hours=36, envelope_type='outer')[source]

Compute the tidal envelope (high and low) of a time series using smoothing, extrema detection, and interpolation. This function processes a time series to extract its tidal envelope by smoothing the data, identifying significant peaks and troughs, filtering out unreliable extrema, selecting salient extrema within a specified window, and interpolating between anchor points to generate continuous envelope curves.

Parameters:
series : pandas.Series

Time-indexed series of water levels or similar data.

smoothing_window_hours : float, optional

Window size in hours for smoothing the input series (default is 2.5).

n_good : int, optional

Minimum number of good points required for an extremum to be considered valid (default is 3).

peak_prominence : float, optional

Minimum prominence of peaks/troughs to be considered as extrema (default is 0.05).

saliency_window_hours : float, optional

Window size in hours for selecting salient extrema (default is 14).

max_anchor_gap_hours : float, optional

Maximum allowed gap in hours between anchor points for interpolation (default is 36).

envelope_type : str, optional

Type of envelope to compute, e.g., “outer” (default is “outer”).

Returns:

env_high : pandas.Series

Interpolated high (upper) envelope of the input series.

env_low : pandas.Series

Interpolated low (lower) envelope of the input series.

anchor_highs : pandas.DataFrame

DataFrame of selected anchor points for the high envelope.

anchor_lows : pandas.DataFrame

DataFrame of selected anchor points for the low envelope.

smoothed : pandas.Series

Smoothed version of the input series.

Notes

This function assumes regular time intervals in the input series. If the frequency cannot be inferred, it is estimated from the first two timestamps.
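
Examples

A minimal sketch using the synthetic tide generator from this module; the unpacking order follows the documented return values:

>>> from vtools.functions.envelope import tidal_envelope, generate_simplified_mixed_tide
>>> series = generate_simplified_mixed_tide(ndays=40, freq="15min")
>>> env_high, env_low, anchor_highs, anchor_lows, smoothed = tidal_envelope(
...     series, smoothing_window_hours=2.5, peak_prominence=0.05, envelope_type="outer")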

vtools.functions.error_detect module

_nrepeat(ts)[source]

Series-only version

bounds_test(ts, bounds)[source]
despike(arr, n1=2, n2=20, block=10)[source]
example()[source]
example2()[source]
med_outliers(ts, level=4.0, scale=None, filt_len=7, range=(None, None), quantiles=(0.01, 0.99), copy=True, as_anomaly=False)[source]

Detect outliers by running a median filter, subtracting it from the original series and comparing the resulting residuals to a global robust range of scale (the interquartile range). Individual time points are rejected if the residual at that time point is more than level times the range of scale.

The original concept comes from Basu & Meckesheimer (2007), "Automatic outlier detection for time series: an application to sensor data", although they didn't use the interquartile range but rather expert judgment. To use this function effectively, you need to be thoughtful about what the interquartile range will be. For instance, for a strongly tidal flow station it is likely to be much larger than the plausible change over a single time step (see the scale argument below).

level: Number of times the scale or interquantile range the residual has to exceed for the data point to be rejected.

scale: Expert judgment of the scale of maximum variation over a time step.

If None, the interquartile range will be used. Note that for a strongly tidal station the interquartile range may substantially overestimate the reasonable variation over a single time step, in which case the filter will work fine, but level should be set to a number (less than one) accordingly.

filt_len: length of median filter, default is 7

quantiles: tuple of quantiles defining the measure of scale. Ignored

if scale is given directly. Default is interquartile range, and this is almost always a reasonable choice.

copy: if True, a copy is made leaving original series intact

You can also specify rejection of values based on a simple range

Returns: copy of series with outliers replaced by nan
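
Examples

A minimal sketch on a synthetic hourly series with one injected spike (data are illustrative):

>>> import numpy as np
>>> import pandas as pd
>>> from vtools.functions.error_detect import med_outliers
>>> idx = pd.date_range("2022-01-01", periods=500, freq="h")
>>> ts = pd.Series(np.sin(np.arange(500) / 12.0), index=idx)
>>> ts.iloc[100] = 25.0                                 # obvious spike
>>> screened = med_outliers(ts, level=4.0, filt_len=7)  # spike replaced by nan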

median_test(ts, level=4, filt_len=7, quantiles=(0.005, 0.095), copy=True)[source]
nrepeat(ts)[source]

Return the length of consecutive runs of repeated values

Parameters:
ts: DataFrame or Series
Returns:
Like-indexed series with lengths of runs. Nans will be mapped to 0
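
Examples

A short sketch (data are illustrative):

>>> import pandas as pd
>>> from vtools.functions.error_detect import nrepeat
>>> s = pd.Series([1.0, 1.0, 1.0, 2.0, 2.0, 3.0])
>>> runs = nrepeat(s)   # like-indexed series; each element holds the length of its run
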
steep_then_nan(ts, level=4.0, scale=None, filt_len=11, range=(None, None), quantiles=(0.01, 0.99), copy=True, as_anomaly=True)[source]

Detect outliers by running a median filter, subtracting it from the original series and comparing the resulting residuals to a global robust range of scale (the interquartile range). Individual time points are rejected if the residual at that time point is more than level times the range of scale.

The original concept comes from Basu & Meckesheimer (2007), although they didn't use the interquartile range but rather expert judgment. To use this function effectively, you need to be thoughtful about what the interquartile range will be. For instance, for a strongly tidal flow station it is likely to be much larger than the plausible change over a single time step (see the scale argument below).

level: Number of times the scale or interquantile range the residual has to exceed for the data point to be rejected.

scale: Expert judgment of the scale of maximum variation over a time step.

If None, the interquartile range will be used. Note that for a strongly tidal station the interquartile range may substantially overestimate the reasonable variation over a single time step, in which case the filter will work fine, but level should be set to a number (less than one) accordingly.

filt_len: length of median filter, default is 11

quantiles: tuple of quantiles defining the measure of scale. Ignored

if scale is given directly. Default is interquartile range, and this is almost always a reasonable choice.

copy: if True, a copy is made leaving original series intact

You can also specify rejection of values based on a simple range

Returns: copy of series with outliers replaced by nan

threshold(ts, bounds, copy=True)[source]

vtools.functions.example2 module

main()[source]

vtools.functions.filter module

Module contains filter used in tidal time series analysis.

_gf1d(ts, sigma, order, mode, cval, truncate)[source]
_lanczos_impl(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True, cosine_taper=False)[source]

Squared low-pass cosine Lanczos filter on a regular time series.

Parameters:
ts : DataFrame
filter_len : int or time_interval

Size of the Lanczos window; the default is the number of samples within filter_period*1.25.

cutoff_frequency : float, optional

Cutoff frequency expressed as a ratio of the Nyquist frequency; should be within the range (0,1). For example, if the sampling interval is 1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.

cutoff_period : string or _time_interval

Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a _time_interval (pandas freq). cutoff_frequency and cutoff_period can't be specified at the same time.

padtype : str or None, optional

Must be ‘odd’, ‘even’, ‘constant’, or None. This determines the type of extension to use for the padded signal to which the filter is applied. If padtype is None, no padding is used. The default is None.

padlen : int or None, optional

The number of elements by which to extend x at both ends of axis before applying the filter. This value must be less than x.shape[axis]-1. padlen=0 implies no padding. If padtype is not None and padlen is not given, padlen is set to 6*m.

fill_edge_nan : bool, optional

If padding is not used and fill_edge_nan is True, data at both ends of the result are filled with NaN to account for edge effects. This affects 2*m values at either end of the result. Default is True.

Returns:
result : TimeSeries

A new regular time series with the same interval as ts. If no padding is used, the beginning and ending 4*m values will be set to NaN to remove edge effects.

Raises:
ValueError

If the input time series is not regular; or cutoff_period and cutoff_frequency are both given; or neither cutoff_period nor cutoff_frequency is given; or padtype is not "odd", "even", "constant", or None; or padlen is larger than the data size.

butter(N, Wn, btype='low', analog=False, output='ba', fs=None)[source]

Butterworth digital and analog filter design.

Design an Nth-order digital or analog Butterworth filter and return the filter coefficients.

Parameters:
N : int

The order of the filter. For ‘bandpass’ and ‘bandstop’ filters, the resulting order of the final second-order sections (‘sos’) matrix is 2*N, with N the number of biquad sections of the desired system.

Wn : array_like

The critical frequency or frequencies. For lowpass and highpass filters, Wn is a scalar; for bandpass and bandstop filters, Wn is a length-2 sequence.

For a Butterworth filter, this is the point at which the gain drops to 1/sqrt(2) that of the passband (the “-3 dB point”).

For digital filters, if fs is not specified, Wn units are normalized from 0 to 1, where 1 is the Nyquist frequency (Wn is thus in half cycles / sample and defined as 2*critical frequencies / fs). If fs is specified, Wn is in the same units as fs.

For analog filters, Wn is an angular frequency (e.g. rad/s).

btype : {‘lowpass’, ‘highpass’, ‘bandpass’, ‘bandstop’}, optional

The type of filter. Default is ‘lowpass’.

analog : bool, optional

When True, return an analog filter, otherwise a digital filter is returned.

output : {‘ba’, ‘zpk’, ‘sos’}, optional

Type of output: numerator/denominator (‘ba’), pole-zero (‘zpk’), or second-order sections (‘sos’). Default is ‘ba’ for backwards compatibility, but ‘sos’ should be used for general-purpose filtering.

fs : float, optional

The sampling frequency of the digital system.

New in version 1.2.0.

Returns:
b, a : ndarray, ndarray

Numerator (b) and denominator (a) polynomials of the IIR filter. Only returned if output='ba'.

z, p, k : ndarray, ndarray, float

Zeros, poles, and system gain of the IIR filter transfer function. Only returned if output='zpk'.

sos : ndarray

Second-order sections representation of the IIR filter. Only returned if output='sos'.

See also

buttord, buttap

Notes

The Butterworth filter has maximally flat frequency response in the passband.

The 'sos' output parameter was added in 0.16.0.

If the transfer function form [b, a] is requested, numerical problems can occur since the conversion between roots and the polynomial coefficients is a numerically sensitive operation, even for N >= 4. It is recommended to work with the SOS representation.

Warning

Designing high-order and narrowband IIR filters in TF form can result in unstable or incorrect filtering due to floating point numerical precision issues. Consider inspecting output filter characteristics freqz or designing the filters with second-order sections via output='sos'.

Examples

Design an analog filter and plot its frequency response, showing the critical points:

>>> from scipy import signal
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> b, a = signal.butter(4, 100, 'low', analog=True)
>>> w, h = signal.freqs(b, a)
>>> plt.semilogx(w, 20 * np.log10(abs(h)))
>>> plt.title('Butterworth filter frequency response')
>>> plt.xlabel('Frequency [radians / second]')
>>> plt.ylabel('Amplitude [dB]')
>>> plt.margins(0, 0.1)
>>> plt.grid(which='both', axis='both')
>>> plt.axvline(100, color='green') # cutoff frequency
>>> plt.show()

Generate a signal made up of 10 Hz and 20 Hz, sampled at 1 kHz

>>> t = np.linspace(0, 1, 1000, False)  # 1 second
>>> sig = np.sin(2*np.pi*10*t) + np.sin(2*np.pi*20*t)
>>> fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
>>> ax1.plot(t, sig)
>>> ax1.set_title('10 Hz and 20 Hz sinusoids')
>>> ax1.axis([0, 1, -2, 2])

Design a digital high-pass filter at 15 Hz to remove the 10 Hz tone, and apply it to the signal. (It’s recommended to use second-order sections format when filtering, to avoid numerical error with transfer function (ba) format):

>>> sos = signal.butter(10, 15, 'hp', fs=1000, output='sos')
>>> filtered = signal.sosfilt(sos, sig)
>>> ax2.plot(t, filtered)
>>> ax2.set_title('After 15 Hz high-pass filter')
>>> ax2.axis([0, 1, -2, 2])
>>> ax2.set_xlabel('Time [seconds]')
>>> plt.tight_layout()
>>> plt.show()
butterworth(ts, cutoff_period=None, cutoff_frequency=None, order=4)[source]

Low-pass Butterworth-squared filter on a regular time series.

Parameters:
ts : DataFrame

Must be one or two dimensional, and regular.

order : int, optional

The default is 4.

cutoff_frequency : float, optional

Cutoff frequency expressed as a ratio of the Nyquist frequency; should be within the range (0,1). For a discretely sampled system, the Nyquist frequency is the fastest frequency that can be resolved by that sampling, which is half the sampling frequency. For example, if the sampling frequency is 1 sample/1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.

cutoff_period : string or _time_interval

Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a regular interval using the same rules as a pandas frequency. cutoff_frequency and cutoff_period can't be specified at the same time.

Returns:
result

A new regular time series with the same interval as ts.

Raises:
ValueError

If the input order is not even; or the input time series is not regular; or neither cutoff_period nor cutoff_frequency is given while the input time series interval is not 15 minutes or 1 hour; or cutoff_period and cutoff_frequency are both given.
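
Examples

A minimal sketch on a synthetic regular 15-minute series (the data and the "40h" cutoff are illustrative):

>>> import numpy as np
>>> import pandas as pd
>>> from vtools.functions.filter import butterworth
>>> idx = pd.date_range("2022-01-01", periods=96 * 30, freq="15min")
>>> stage = pd.DataFrame({"stage": np.sin(2 * np.pi * np.arange(len(idx)) / 49.7)}, index=idx)
>>> subtidal = butterworth(stage, cutoff_period="40h")   # cutoff period as a pandas-style interval string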

convert_span_to_nstep(freq, span)[source]
cosine_lanczos(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]
cosine_lanczos5(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]

Squared low-pass cosine Lanczos filter on a regular time series.

Parameters:
ts : DataFrame
filter_len : int or time_interval

Size of the Lanczos window; the default is the number of samples within filter_period*1.25.

cutoff_frequency : float, optional

Cutoff frequency expressed as a ratio of the Nyquist frequency; should be within the range (0,1). For example, if the sampling interval is 1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.

cutoff_period : string or _time_interval

Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a _time_interval (pandas freq). cutoff_frequency and cutoff_period can't be specified at the same time.

padtype : str or None, optional

Must be ‘odd’, ‘even’, ‘constant’, or None. This determines the type of extension to use for the padded signal to which the filter is applied. If padtype is None, no padding is used. The default is None.

padlen : int or None, optional

The number of elements by which to extend x at both ends of axis before applying the filter. This value must be less than x.shape[axis]-1. padlen=0 implies no padding. If padtype is not None and padlen is not given, padlen is set to 6*m.

fill_edge_nan : bool, optional

If padding is not used and fill_edge_nan is True, data at both ends of the result are filled with NaN to account for edge effects. This affects 2*m values at either end of the result. Default is True.

Returns:
result : TimeSeries

A new regular time series with the same interval as ts. If no padding is used, the beginning and ending 4*m values will be set to NaN to remove edge effects.

Raises:
ValueError

If the input time series is not regular; or cutoff_period and cutoff_frequency are both given; or neither cutoff_period nor cutoff_frequency is given; or padtype is not "odd", "even", "constant", or None; or padlen is larger than the data size.
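
Examples

A minimal sketch, assuming cosine_lanczos shares the interface documented above (the data and the "40h" cutoff are illustrative):

>>> import numpy as np
>>> import pandas as pd
>>> from vtools.functions.filter import cosine_lanczos
>>> idx = pd.date_range("2022-01-01", periods=96 * 30, freq="15min")
>>> stage = pd.DataFrame({"stage": np.sin(2 * np.pi * np.arange(len(idx)) / 49.7)}, index=idx)
>>> filtered = cosine_lanczos(stage, cutoff_period="40h", padtype="odd")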

filtfilt(b, a, x, axis=-1, padtype='odd', padlen=None, method='pad', irlen=None)[source]

Apply a digital filter forward and backward to a signal.

This function applies a linear digital filter twice, once forward and once backwards. The combined filter has zero phase and a filter order twice that of the original.

The function provides options for handling the edges of the signal.

The function sosfiltfilt (and filter design using output='sos') should be preferred over filtfilt for most filtering tasks, as second-order sections have fewer numerical problems.

Parameters:
b : (N,) array_like

The numerator coefficient vector of the filter.

a : (N,) array_like

The denominator coefficient vector of the filter. If a[0] is not 1, then both a and b are normalized by a[0].

x : array_like

The array of data to be filtered.

axis : int, optional

The axis of x to which the filter is applied. Default is -1.

padtype : str or None, optional

Must be ‘odd’, ‘even’, ‘constant’, or None. This determines the type of extension to use for the padded signal to which the filter is applied. If padtype is None, no padding is used. The default is ‘odd’.

padlen : int or None, optional

The number of elements by which to extend x at both ends of axis before applying the filter. This value must be less than x.shape[axis] - 1. padlen=0 implies no padding. The default value is 3 * max(len(a), len(b)).

method : str, optional

Determines the method for handling the edges of the signal, either “pad” or “gust”. When method is “pad”, the signal is padded; the type of padding is determined by padtype and padlen, and irlen is ignored. When method is “gust”, Gustafsson’s method is used, and padtype and padlen are ignored.

irlen : int or None, optional

When method is “gust”, irlen specifies the length of the impulse response of the filter. If irlen is None, no part of the impulse response is ignored. For a long signal, specifying irlen can significantly improve the performance of the filter.

Returns:
y : ndarray

The filtered output with the same shape as x.

See also

sosfiltfilt, lfilter_zi, lfilter, lfiltic, savgol_filter, sosfilt

Notes

When method is “pad”, the function pads the data along the given axis in one of three ways: odd, even or constant. The odd and even extensions have the corresponding symmetry about the end point of the data. The constant extension extends the data with the values at the end points. On both the forward and backward passes, the initial condition of the filter is found by using lfilter_zi and scaling it by the end point of the extended data.

When method is “gust”, Gustafsson’s method [1] is used. Initial conditions are chosen for the forward and backward passes so that the forward-backward filter gives the same result as the backward-forward filter.

The option to use Gustaffson’s method was added in scipy version 0.16.0.

References

[1]

F. Gustaffson, “Determining the initial states in forward-backward filtering”, Transactions on Signal Processing, Vol. 46, pp. 988-992, 1996.

Examples

The examples will use several functions from scipy.signal.

>>> import numpy as np
>>> from scipy import signal
>>> import matplotlib.pyplot as plt

First we create a one second signal that is the sum of two pure sine waves, with frequencies 5 Hz and 250 Hz, sampled at 2000 Hz.

>>> t = np.linspace(0, 1.0, 2001)
>>> xlow = np.sin(2 * np.pi * 5 * t)
>>> xhigh = np.sin(2 * np.pi * 250 * t)
>>> x = xlow + xhigh

Now create a lowpass Butterworth filter with a cutoff of 0.125 times the Nyquist frequency, or 125 Hz, and apply it to x with filtfilt. The result should be approximately xlow, with no phase shift.

>>> b, a = signal.butter(8, 0.125)
>>> y = signal.filtfilt(b, a, x, padlen=150)
>>> np.abs(y - xlow).max()
9.1086182074789912e-06

We get a fairly clean result for this artificial example because the odd extension is exact, and with the moderately long padding, the filter’s transients have dissipated by the time the actual data is reached. In general, transient effects at the edges are unavoidable.

The following example demonstrates the option method="gust".

First, create a filter.

>>> b, a = signal.ellip(4, 0.01, 120, 0.125)  # Filter to be applied.

sig is a random input signal to be filtered.

>>> rng = np.random.default_rng()
>>> n = 60
>>> sig = rng.standard_normal(n)**3 + 3*rng.standard_normal(n).cumsum()

Apply filtfilt to sig, once using the Gustafsson method, and once using padding, and plot the results for comparison.

>>> fgust = signal.filtfilt(b, a, sig, method="gust")
>>> fpad = signal.filtfilt(b, a, sig, padlen=50)
>>> plt.plot(sig, 'k-', label='input')
>>> plt.plot(fgust, 'b-', linewidth=4, label='gust')
>>> plt.plot(fpad, 'c-', linewidth=1.5, label='pad')
>>> plt.legend(loc='best')
>>> plt.show()

The irlen argument can be used to improve the performance of Gustafsson’s method.

Estimate the impulse response length of the filter.

>>> z, p, k = signal.tf2zpk(b, a)
>>> eps = 1e-9
>>> r = np.max(np.abs(p))
>>> approx_impulse_len = int(np.ceil(np.log(eps) / np.log(r)))
>>> approx_impulse_len
137

Apply the filter to a longer signal, with and without the irlen argument. The difference between y1 and y2 is small. For long signals, using irlen gives a significant performance improvement.

>>> x = rng.standard_normal(4000)
>>> y1 = signal.filtfilt(b, a, x, method='gust')
>>> y2 = signal.filtfilt(b, a, x, method='gust', irlen=approx_impulse_len)
>>> print(np.max(np.abs(y1 - y2)))
2.875334415008979e-10
firwin(numtaps, cutoff, *, width=None, window='hamming', pass_zero=True, scale=True, nyq=<object object>, fs=None)[source]

FIR filter design using the window method.

This function computes the coefficients of a finite impulse response filter. The filter will have linear phase; it will be Type I if numtaps is odd and Type II if numtaps is even.

Type II filters always have zero response at the Nyquist frequency, so a ValueError exception is raised if firwin is called with numtaps even and having a passband whose right end is at the Nyquist frequency.

Parameters:
numtaps : int

Length of the filter (number of coefficients, i.e. the filter order + 1). numtaps must be odd if a passband includes the Nyquist frequency.

cutoff : float or 1-D array_like

Cutoff frequency of filter (expressed in the same units as fs) OR an array of cutoff frequencies (that is, band edges). In the latter case, the frequencies in cutoff should be positive and monotonically increasing between 0 and fs/2. The values 0 and fs/2 must not be included in cutoff.

width : float or None, optional

If width is not None, then assume it is the approximate width of the transition region (expressed in the same units as fs) for use in Kaiser FIR filter design. In this case, the window argument is ignored.

window : string or tuple of string and parameter values, optional

Desired window to use. See scipy.signal.get_window for a list of windows and required parameters.

pass_zero : {True, False, ‘bandpass’, ‘lowpass’, ‘highpass’, ‘bandstop’}, optional

If True, the gain at the frequency 0 (i.e., the “DC gain”) is 1. If False, the DC gain is 0. Can also be a string argument for the desired filter type (equivalent to btype in IIR design functions).

New in version 1.3.0: Support for string arguments.

scale : bool, optional

Set to True to scale the coefficients so that the frequency response is exactly unity at a certain frequency. That frequency is either:

  • 0 (DC) if the first passband starts at 0 (i.e. pass_zero is True)

  • fs/2 (the Nyquist frequency) if the first passband ends at fs/2 (i.e the filter is a single band highpass filter); center of first passband otherwise

nyq : float, optional, deprecated

This is the Nyquist frequency. Each frequency in cutoff must be between 0 and nyq. Default is 1.

Deprecated since version 1.0.0: firwin keyword argument nyq is deprecated in favour of fs and will be removed in SciPy 1.14.0.

fs : float, optional

The sampling frequency of the signal. Each frequency in cutoff must be between 0 and fs/2. Default is 2.

Returns:
h : (numtaps,) ndarray

Coefficients of length numtaps FIR filter.

Raises:
ValueError

If any value in cutoff is less than or equal to 0 or greater than or equal to fs/2, if the values in cutoff are not strictly monotonically increasing, or if numtaps is even but a passband includes the Nyquist frequency.

See also

firwin2
firls
minimum_phase
remez

Examples

Low-pass from 0 to f:

>>> from scipy import signal
>>> numtaps = 3
>>> f = 0.1
>>> signal.firwin(numtaps, f)
array([ 0.06799017,  0.86401967,  0.06799017])

Use a specific window function:

>>> signal.firwin(numtaps, f, window='nuttall')
array([  3.56607041e-04,   9.99286786e-01,   3.56607041e-04])

High-pass (‘stop’ from 0 to f):

>>> signal.firwin(numtaps, f, pass_zero=False)
array([-0.00859313,  0.98281375, -0.00859313])

Band-pass:

>>> f1, f2 = 0.1, 0.2
>>> signal.firwin(numtaps, [f1, f2], pass_zero=False)
array([ 0.06301614,  0.88770441,  0.06301614])

Band-stop:

>>> signal.firwin(numtaps, [f1, f2])
array([-0.00801395,  1.0160279 , -0.00801395])

Multi-band (passbands are [0, f1], [f2, f3] and [f4, 1]):

>>> f3, f4 = 0.3, 0.4
>>> signal.firwin(numtaps, [f1, f2, f3, f4])
array([-0.01376344,  1.02752689, -0.01376344])

Multi-band (passbands are [f1, f2] and [f3,f4]):

>>> signal.firwin(numtaps, [f1, f2, f3, f4], pass_zero=False)
array([ 0.04890915,  0.91284326,  0.04890915])
gaussian_filter1d(input, sigma, axis=-1, order=0, output=None, mode='reflect', cval=0.0, truncate=4.0, *, radius=None)[source]

1-D Gaussian filter.

Parameters:
input : array_like

The input array.

sigma : scalar

standard deviation for Gaussian kernel

axis : int, optional

The axis of input along which to calculate. Default is -1.

order : int, optional

An order of 0 corresponds to convolution with a Gaussian kernel. A positive order corresponds to convolution with that derivative of a Gaussian.

output : array or dtype, optional

The array in which to place the output, or the dtype of the returned array. By default an array of the same dtype as input will be created.

mode : {‘reflect’, ‘constant’, ‘nearest’, ‘mirror’, ‘wrap’}, optional

The mode parameter determines how the input array is extended beyond its boundaries. Default is ‘reflect’. Behavior for each valid value is as follows:

‘reflect’ (d c b a | a b c d | d c b a)

The input is extended by reflecting about the edge of the last pixel. This mode is also sometimes referred to as half-sample symmetric.

‘constant’ (k k k k | a b c d | k k k k)

The input is extended by filling all values beyond the edge with the same constant value, defined by the cval parameter.

‘nearest’ (a a a a | a b c d | d d d d)

The input is extended by replicating the last pixel.

‘mirror’ (d c b | a b c d | c b a)

The input is extended by reflecting about the center of the last pixel. This mode is also sometimes referred to as whole-sample symmetric.

‘wrap’ (a b c d | a b c d | a b c d)

The input is extended by wrapping around to the opposite edge.

For consistency with the interpolation functions, the following mode names can also be used:

‘grid-mirror’

This is a synonym for ‘reflect’.

‘grid-constant’

This is a synonym for ‘constant’.

‘grid-wrap’

This is a synonym for ‘wrap’.

cvalscalar, optional

Value to fill past edges of input if mode is ‘constant’. Default is 0.0.

truncatefloat, optional

Truncate the filter at this many standard deviations. Default is 4.0.

radiusNone or int, optional

Radius of the Gaussian kernel. If specified, the size of the kernel will be 2*radius + 1, and truncate is ignored. Default is None.

Returns:
gaussian_filter1dndarray

Notes

The Gaussian kernel will have size 2*radius + 1 along each axis. If radius is None, a default radius = round(truncate * sigma) will be used.

Examples

>>> from scipy.ndimage import gaussian_filter1d
>>> import numpy as np
>>> gaussian_filter1d([1.0, 2.0, 3.0, 4.0, 5.0], 1)
array([ 1.42704095,  2.06782203,  3.        ,  3.93217797,  4.57295905])
>>> gaussian_filter1d([1.0, 2.0, 3.0, 4.0, 5.0], 4)
array([ 2.91948343,  2.95023502,  3.        ,  3.04976498,  3.08051657])
>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> x = rng.standard_normal(101).cumsum()
>>> y3 = gaussian_filter1d(x, 3)
>>> y6 = gaussian_filter1d(x, 6)
>>> plt.plot(x, 'k', label='original data')
>>> plt.plot(y3, '--', label='filtered, sigma=3')
>>> plt.plot(y6, ':', label='filtered, sigma=6')
>>> plt.legend()
>>> plt.grid()
>>> plt.show()
generate_godin_fir(freq)[source]

Generate the Godin filter impulse response for a given freq. freq is a pandas freq string.

godin(ts)[source]

Low-pass Godin filter a regular time series. Applies the \(\mathcal{A_{24}^{2}A_{25}}\) Godin filter [1]. The filter is generalized to be the equivalent of one boxcar of the length of the lunar diurnal (~25 hours) constituent and two of the solar diurnal (~24 hours), though the implementation combines these steps.

Parameters:
tsDataFrame
Returns:
resultDataFrame

A new regular time series with the same interval as ts.

Raises:
NotImplementedError

If input time series is not univariate

References

[1]

Godin (1972) Analysis of Tides
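A minimal usage sketch (assuming godin is imported from the vtools filter module; the 15-minute tidal signal below is synthetic and purely illustrative):

>>> import numpy as np
>>> import pandas as pd
>>> idx = pd.date_range("2020-01-01", periods=96*30, freq="15min")
>>> hrs = np.arange(len(idx)) * 0.25
>>> tide = pd.DataFrame({"elev": np.sin(2*np.pi*hrs/12.42)}, index=idx)
>>> lowpassed = godin(tide)   # tidal band removed; same 15-minute interval as the input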

hours(h)[source]

Create a time interval representing h hours

lanczos(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]
lfilter(b, a, x, axis=-1, zi=None)[source]

Filter data along one-dimension with an IIR or FIR filter.

Filter a data sequence, x, using a digital filter. This works for many fundamental data types (including Object type). The filter is a direct form II transposed implementation of the standard difference equation (see Notes).

The function sosfilt (and filter design using output='sos') should be preferred over lfilter for most filtering tasks, as second-order sections have fewer numerical problems.

Parameters:
barray_like

The numerator coefficient vector in a 1-D sequence.

aarray_like

The denominator coefficient vector in a 1-D sequence. If a[0] is not 1, then both a and b are normalized by a[0].

xarray_like

An N-dimensional input array.

axisint, optional

The axis of the input data array along which to apply the linear filter. The filter is applied to each subarray along this axis. Default is -1.

ziarray_like, optional

Initial conditions for the filter delays. It is a vector (or array of vectors for an N-dimensional input) of length max(len(a), len(b)) - 1. If zi is None or is not given then initial rest is assumed. See lfiltic for more information.

Returns:
yarray

The output of the digital filter.

zfarray, optional

If zi is None, this is not returned, otherwise, zf holds the final filter delay values.

See also

lfiltic

Construct initial conditions for lfilter.

lfilter_zi

Compute initial state (steady state of step response) for lfilter.

filtfilt

A forward-backward filter, to obtain a filter with zero phase.

savgol_filter

A Savitzky-Golay filter.

sosfilt

Filter data using cascaded second-order sections.

sosfiltfilt

A forward-backward filter using second-order sections.

Notes

The filter function is implemented as a direct II transposed structure. This means that the filter implements:

a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[M]*x[n-M]
                      - a[1]*y[n-1] - ... - a[N]*y[n-N]

where M is the degree of the numerator, N is the degree of the denominator, and n is the sample number. It is implemented using the following difference equations (assuming M = N):

a[0]*y[n] = b[0] * x[n]               + d[0][n-1]
  d[0][n] = b[1] * x[n] - a[1] * y[n] + d[1][n-1]
  d[1][n] = b[2] * x[n] - a[2] * y[n] + d[2][n-1]
...
d[N-2][n] = b[N-1]*x[n] - a[N-1]*y[n] + d[N-1][n-1]
d[N-1][n] = b[N] * x[n] - a[N] * y[n]

where d are the state variables.

The rational transfer function describing this filter in the z-transform domain is:

                    -1              -M
        b[0] + b[1]z  + ... + b[M] z
Y(z) = -------------------------------- X(z)
                    -1              -N
        a[0] + a[1]z  + ... + a[N] z

Examples

Generate a noisy signal to be filtered:

>>> import numpy as np
>>> from scipy import signal
>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> t = np.linspace(-1, 1, 201)
>>> x = (np.sin(2*np.pi*0.75*t*(1-t) + 2.1) +
...      0.1*np.sin(2*np.pi*1.25*t + 1) +
...      0.18*np.cos(2*np.pi*3.85*t))
>>> xn = x + rng.standard_normal(len(t)) * 0.08

Create an order 3 lowpass butterworth filter:

>>> b, a = signal.butter(3, 0.05)

Apply the filter to xn. Use lfilter_zi to choose the initial condition of the filter:

>>> zi = signal.lfilter_zi(b, a)
>>> z, _ = signal.lfilter(b, a, xn, zi=zi*xn[0])

Apply the filter again, to have a result filtered at an order the same as filtfilt:

>>> z2, _ = signal.lfilter(b, a, z, zi=zi*z[0])

Use filtfilt to apply the filter:

>>> y = signal.filtfilt(b, a, xn)

Plot the original signal and the various filtered versions:

>>> plt.figure
>>> plt.plot(t, xn, 'b', alpha=0.75)
>>> plt.plot(t, z, 'r--', t, z2, 'r', t, y, 'k')
>>> plt.legend(('noisy signal', 'lfilter, once', 'lfilter, twice',
...             'filtfilt'), loc='best')
>>> plt.grid(True)
>>> plt.show()
lowpass_cosine_lanczos_filter_coef(cf, m, normalize=True)[source]

Return the convolution coefficients for a low-pass cosine Lanczos filter.

Parameters:
cf: float

Cutoff frequency expressed as a ratio of a Nyquist frequency.

m: int

Size of the filtering window.

Returns:
results: list

Coefficients of filtering window.

lowpass_lanczos_filter_coef(cf, m, normalize=True, cosine_taper=False)[source]

Return the convolution coefficients for a low-pass Lanczos filter.

Parameters:
cffloat

Cutoff frequency expressed as a ratio of the Nyquist frequency.

mint

Size of the filtering window.

normalizebool, optional

Whether to normalize the filter coefficients so they sum to 1.

cosine_taperbool, optional

If True, applies a cosine-squared taper to the Lanczos window.

Returns:
resnp.ndarray

Coefficients of the filtering window.
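As a rough sketch of how these coefficients might be inspected (assumes the functions above are in scope; the window size m=70 is an arbitrary illustrative choice):

>>> import numpy as np
>>> from scipy.signal import freqz
>>> cf = (1.0/36.0) / 0.5                     # 36-hour cutoff on hourly data, as a ratio of Nyquist
>>> coefs = lowpass_lanczos_filter_coef(cf, 70)
>>> w, h = freqz(coefs, worN=2048)            # magnitude response is np.abs(h)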

minutes(m)[source]

Create a time interval representing m minutes

process_cutoff(cutoff_frequency, cutoff_period, freq)[source]
seconds(s)[source]

Create a time interval representing s seconds
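These helpers build pandas-style offsets that can be passed wherever the documentation above refers to an interval (a sketch; exact return types follow the vtools vtime conventions):

>>> import pandas as pd
>>> step = minutes(15)                        # e.g., the `res` argument of calculate_lag
>>> idx = pd.date_range("2020-01-01", periods=8, freq=step)
>>> window = hours(3)                         # e.g., a max_lag bracket of +/- 3 hours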

ts_gaussian_filter(ts, sigma, order=0, mode='reflect', cval=0.0, truncate=4.0)[source]

Column-wise Gaussian smoothing of a regular time series. Missing or irregular values are not handled, so this function is not much different from a rolling-window Gaussian average in pandas (ts.rolling(window=5, win_type='gaussian').mean()), which may be preferable when data are missing. The function is retained with the aspiration of eventually handling irregular series, which is not yet implemented.

Parameters:
tsDataFrame

The series to be smoothed

sigmaint or freq

The sigma scale of the smoothing (analogous to std. deviation), given as a number of steps or freq

Returns:
resultDataFrame

A new regular time series with the same interval as ts.
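A short sketch (assuming ts_gaussian_filter is in scope; the hourly random-walk data below are synthetic):

>>> import numpy as np
>>> import pandas as pd
>>> idx = pd.date_range("2021-06-01", periods=500, freq="h")
>>> df = pd.DataFrame({"flow": np.random.default_rng(0).standard_normal(500).cumsum()}, index=idx)
>>> smooth = ts_gaussian_filter(df, sigma=6)  # sigma expressed as a number of steps (6 hours here)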

vtools.functions.frequency_response module

butterworth(ts, cutoff_period=None, cutoff_frequency=None, order=4)[source]

Low-pass Butterworth-squared filter on a regular time series.

Parameters:
tsDataFrame

Must be one or two dimensional, and regular.

order: int ,optional

The default is 4.

cutoff_frequency: float,optional

Cutoff frequency expressed as a ratio of the Nyquist frequency; should be within the range (0,1). For a discretely sampled system, the Nyquist frequency is the fastest frequency that can be resolved by that sampling, which is half the sampling frequency. For example, if the sampling frequency is 1 sample/1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.

cutoff_periodstring or _time_interval

Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a regular interval using the same rules as a pandas frequency. cutoff_frequency and cutoff_period can't be specified at the same time.

Returns:
result

A new regular time series with the same interval as ts.

Raises:
ValueError

If the input order is not even; or the input time series is not regular; or neither cutoff_period nor cutoff_frequency is given while the input time series interval is not 15min or 1 hour; or cutoff_period and cutoff_frequency are given at the same time.
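A worked example of the cutoff arithmetic above, shown as a hedged sketch (assumes butterworth is in scope; the hourly series is synthetic):

>>> import numpy as np
>>> import pandas as pd
>>> idx = pd.date_range("2021-01-01", periods=1000, freq="h")
>>> wl = pd.DataFrame({"stage": np.sin(2*np.pi*np.arange(1000)/12.42)}, index=idx)
>>> # hourly sampling -> Nyquist = 0.5 cycles/hour; 36-hour cutoff -> (1/36)/0.5 ~= 0.056
>>> filtered = butterworth(wl, cutoff_frequency=(1.0/36.0)/0.5)
>>> filtered2 = butterworth(wl, cutoff_period="36h")   # equivalent, using a period string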

compare_response(cutoff_period)[source]

Generate frequency response plot of low-pass filters: cosine_lanczos, boxcar 24h, boxcar 25h, and godin.

Parameters:
cutoff_periodint

Low-pass filter cutoff period in number of hours.

Returns:
None.
cosine_lanczos(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]
cosine_lanczos5(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]

Squared low-pass cosine Lanczos filter on a regular time series.

Parameters:
tsDataFrame
filter_lenint, time_interval

Size of the Lanczos window; the default is the number of samples within filter_period*1.25.

cutoff_frequency: float,optional

Cutoff frequency expressed as a ratio of the Nyquist frequency; should be within the range (0,1). For example, if the sampling frequency is 1 sample/hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.

cutoff_periodstring or _time_interval

Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a _time_interval (pandas freq). cutoff_frequency and cutoff_period can't be specified at the same time.

padtypestr or None, optional

Must be ‘odd’, ‘even’, ‘constant’, or None. This determines the type of extension to use for the padded signal to which the filter is applied. If padtype is None, no padding is used. The default is None.

padlenint or None, optional

The number of elements by which to extend x at both ends of axis before applying the filter. This value must be less than x.shape[axis]-1. padlen=0 implies no padding. If padtype is not None and padlen is not given, padlen is set to 6*m.

fill_edge_nan: bool,optional

If padding is not used and fill_edge_nan is True, the resulting data at both ends are filled with NaN to account for edge effects. This affects 2*m values at either end of the result. Default is True.

Returns:
resultTimeSeries

A new regular time series with the same interval as ts. If no padding is used, the beginning and ending 4*m values will be set to NaN to remove edge effects.

Raises:
ValueError

If the input time series is not regular; or cutoff_period and cutoff_frequency are given at the same time; or neither cutoff_period nor cutoff_frequency is given; or padtype is not 'odd', 'even', 'constant', or None; or padlen is larger than the data size.
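A hedged usage sketch (cosine_lanczos shares this signature; the hourly data below are synthetic):

>>> import numpy as np
>>> import pandas as pd
>>> idx = pd.date_range("2021-01-01", periods=2000, freq="h")
>>> wl = pd.DataFrame({"stage": np.sin(2*np.pi*np.arange(2000)/12.42)}, index=idx)
>>> lp = cosine_lanczos(wl, cutoff_period="40h", padtype="odd")
>>> lp2 = cosine_lanczos(wl, cutoff_frequency=(1.0/40.0)/0.5)   # same cutoff, specified as a frequency ratio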

freqz(b, a=1, worN=512, whole=False, plot=None, fs=6.283185307179586, include_nyquist=False)[source]

Compute the frequency response of a digital filter.

Given the M-order numerator b and N-order denominator a of a digital filter, compute its frequency response:

            jw                 -jw              -jwM
   jw    B(e  )    b[0] + b[1]e    + ... + b[M]e
H(e  ) = ------ = -----------------------------------
            jw                 -jw              -jwN
         A(e  )    a[0] + a[1]e    + ... + a[N]e
Parameters:
barray_like

Numerator of a linear filter. If b has dimension greater than 1, it is assumed that the coefficients are stored in the first dimension, and b.shape[1:], a.shape[1:], and the shape of the frequencies array must be compatible for broadcasting.

aarray_like

Denominator of a linear filter. If b has dimension greater than 1, it is assumed that the coefficients are stored in the first dimension, and b.shape[1:], a.shape[1:], and the shape of the frequencies array must be compatible for broadcasting.

worN{None, int, array_like}, optional

If a single integer, then compute at that many frequencies (default is N=512). This is a convenient alternative to:

np.linspace(0, fs if whole else fs/2, N, endpoint=include_nyquist)

Using a number that is fast for FFT computations can result in faster computations (see Notes).

If an array_like, compute the response at the frequencies given. These are in the same units as fs.

wholebool, optional

Normally, frequencies are computed from 0 to the Nyquist frequency, fs/2 (upper-half of unit-circle). If whole is True, compute frequencies from 0 to fs. Ignored if worN is array_like.

plotcallable

A callable that takes two arguments. If given, the return parameters w and h are passed to plot. Useful for plotting the frequency response inside freqz.

fsfloat, optional

The sampling frequency of the digital system. Defaults to 2*pi radians/sample (so w is from 0 to pi).

New in version 1.2.0.

include_nyquistbool, optional

If whole is False and worN is an integer, setting include_nyquist to True will include the last frequency (Nyquist frequency) and is otherwise ignored.

New in version 1.5.0.

Returns:
wndarray

The frequencies at which h was computed, in the same units as fs. By default, w is normalized to the range [0, pi) (radians/sample).

hndarray

The frequency response, as complex numbers.

See also

freqz_zpk
sosfreqz

Notes

Using Matplotlib’s matplotlib.pyplot.plot() function as the callable for plot produces unexpected results, as this plots the real part of the complex transfer function, not the magnitude. Try lambda w, h: plot(w, np.abs(h)).

A direct computation via (R)FFT is used to compute the frequency response when the following conditions are met:

  1. An integer value is given for worN.

  2. worN is fast to compute via FFT (i.e., scipy.fft.next_fast_len(worN) equals worN).

  3. The denominator coefficients are a single value (a.shape[0] == 1).

  4. worN is at least as long as the numerator coefficients (worN >= b.shape[0]).

  5. If b.ndim > 1, then b.shape[-1] == 1.

For long FIR filters, the FFT approach can have lower error and be much faster than the equivalent direct polynomial calculation.

Examples

>>> from scipy import signal
>>> import numpy as np
>>> b = signal.firwin(80, 0.5, window=('kaiser', 8))
>>> w, h = signal.freqz(b)
>>> import matplotlib.pyplot as plt
>>> fig, ax1 = plt.subplots()
>>> ax1.set_title('Digital filter frequency response')
>>> ax1.plot(w, 20 * np.log10(abs(h)), 'b')
>>> ax1.set_ylabel('Amplitude [dB]', color='b')
>>> ax1.set_xlabel('Frequency [rad/sample]')
>>> ax2 = ax1.twinx()
>>> angles = np.unwrap(np.angle(h))
>>> ax2.plot(w, angles, 'g')
>>> ax2.set_ylabel('Angle (radians)', color='g')
>>> ax2.grid(True)
>>> ax2.axis('tight')
>>> plt.show()

Broadcasting Examples

Suppose we have two FIR filters whose coefficients are stored in the rows of an array with shape (2, 25). For this demonstration, we’ll use random data:

>>> rng = np.random.default_rng()
>>> b = rng.random((2, 25))

To compute the frequency response for these two filters with one call to freqz, we must pass in b.T, because freqz expects the first axis to hold the coefficients. We must then extend the shape with a trivial dimension of length 1 to allow broadcasting with the array of frequencies. That is, we pass in b.T[..., np.newaxis], which has shape (25, 2, 1):

>>> w, h = signal.freqz(b.T[..., np.newaxis], worN=1024)
>>> w.shape
(1024,)
>>> h.shape
(2, 1024)

Now, suppose we have two transfer functions, with the same numerator coefficients b = [0.5, 0.5]. The coefficients for the two denominators are stored in the first dimension of the 2-D array a:

a = [   1      1  ]
    [ -0.25, -0.5 ]
>>> b = np.array([0.5, 0.5])
>>> a = np.array([[1, 1], [-0.25, -0.5]])

Only a is more than 1-D. To make it compatible for broadcasting with the frequencies, we extend it with a trivial dimension in the call to freqz:

>>> w, h = signal.freqz(b, a[..., np.newaxis], worN=1024)
>>> w.shape
(1024,)
>>> h.shape
(2, 1024)
godin(ts)[source]

Low-pass Godin filter a regular time series. Applies the \(\mathcal{A_{24}^{2}A_{25}}\) Godin filter [1]. The filter is generalized to be the equivalent of one boxcar of the length of the lunar diurnal (~25 hours) constituent and two of the solar diurnal (~24 hours), though the implementation combines these steps.

Parameters:
tsDataFrame
Returns:
resultDataFrame

A new regular time series with the same interval as ts.

Raises:
NotImplementedError

If input time series is not univariate

References

[1]

Godin (1972) Analysis of Tides

inset_axes(parent_axes, width, height, loc='upper right', bbox_to_anchor=None, bbox_transform=None, axes_class=None, axes_kwargs=None, borderpad=0.5)[source]

Create an inset axes with a given width and height.

Both sizes used can be specified either in inches or percentage. For example,:

inset_axes(parent_axes, width='40%', height='30%', loc='lower left')

creates an inset axes in the lower left corner of parent_axes which spans over 30% in height and 40% in width of the parent_axes. Since the usage of .inset_axes may become slightly tricky when exceeding such standard cases, it is recommended to read the examples.

Parameters:
parent_axesmatplotlib.axes.Axes

Axes to place the inset axes.

width, heightfloat or str

Size of the inset axes to create. If a float is provided, it is the size in inches, e.g. width=1.3. If a string is provided, it is the size in relative units, e.g. width=’40%’. By default, i.e. if neither bbox_to_anchor nor bbox_transform are specified, those are relative to the parent_axes. Otherwise, they are to be understood relative to the bounding box provided via bbox_to_anchor.

locstr, default: ‘upper right’

Location to place the inset axes. Valid locations are ‘upper left’, ‘upper center’, ‘upper right’, ‘center left’, ‘center’, ‘center right’, ‘lower left’, ‘lower center’, ‘lower right’. For backward compatibility, numeric values are accepted as well. See the parameter loc of .Legend for details.

bbox_to_anchortuple or ~matplotlib.transforms.BboxBase, optional

Bbox that the inset axes will be anchored to. If None, a tuple of (0, 0, 1, 1) is used if bbox_transform is set to parent_axes.transAxes or parent_axes.figure.transFigure. Otherwise, parent_axes.bbox is used. If a tuple, can be either [left, bottom, width, height], or [left, bottom]. If the kwargs width and/or height are specified in relative units, the 2-tuple [left, bottom] cannot be used. Note that, unless bbox_transform is set, the units of the bounding box are interpreted in the pixel coordinate. When using bbox_to_anchor with tuple, it almost always makes sense to also specify a bbox_transform. This might often be the axes transform parent_axes.transAxes.

bbox_transform~matplotlib.transforms.Transform, optional

Transformation for the bbox that contains the inset axes. If None, a .transforms.IdentityTransform is used. The value of bbox_to_anchor (or the return value of its get_points method) is transformed by the bbox_transform and then interpreted as points in the pixel coordinate (which is dpi dependent). You may provide bbox_to_anchor in some normalized coordinate, and give an appropriate transform (e.g., parent_axes.transAxes).

axes_class~matplotlib.axes.Axes type, default: .HostAxes

The type of the newly created inset axes.

axes_kwargsdict, optional

Keyword arguments to pass to the constructor of the inset axes. Valid arguments include:

Properties: adjustable: {‘box’, ‘datalim’} agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array and two offsets from the bottom left corner of the image alpha: scalar or None anchor: (float, float) or {‘C’, ‘SW’, ‘S’, ‘SE’, ‘E’, ‘NE’, …} animated: bool aspect: {‘auto’, ‘equal’} or float autoscale_on: bool autoscalex_on: unknown autoscaley_on: unknown axes_locator: Callable[[Axes, Renderer], Bbox] axisbelow: bool or ‘line’ box_aspect: float or None clip_box: ~matplotlib.transforms.BboxBase or None clip_on: bool clip_path: Patch or (Path, Transform) or None facecolor or fc: color figure: ~matplotlib.figure.Figure frame_on: bool gid: str in_layout: bool label: object mouseover: bool navigate: bool navigate_mode: unknown path_effects: list of .AbstractPathEffect picker: None or bool or float or callable position: [left, bottom, width, height] or ~matplotlib.transforms.Bbox prop_cycle: ~cycler.Cycler rasterization_zorder: float or None rasterized: bool sketch_params: (scale: float, length: float, randomness: float) snap: bool or None subplotspec: unknown title: str transform: ~matplotlib.transforms.Transform url: str visible: bool xbound: (lower: float, upper: float) xlabel: str xlim: (left: float, right: float) xmargin: float greater than -0.5 xscale: unknown xticklabels: unknown xticks: unknown ybound: (lower: float, upper: float) ylabel: str ylim: (bottom: float, top: float) ymargin: float greater than -0.5 yscale: unknown yticklabels: unknown yticks: unknown zorder: float

borderpadfloat, default: 0.5

Padding between inset axes and the bbox_to_anchor. The units are axes font size, i.e. for a default font size of 10 points borderpad = 0.5 is equivalent to a padding of 5 points.

Returns:
inset_axesaxes_class

Inset axes object created.

Notes

The meaning of bbox_to_anchor and bbox_to_transform is interpreted differently from that of legend. The value of bbox_to_anchor (or the return value of its get_points method; the default is parent_axes.bbox) is transformed by the bbox_transform (the default is Identity transform) and then interpreted as points in the pixel coordinate (which is dpi dependent).

Thus, the following three calls are identical and create an inset axes with respect to the parent_axes:

axins = inset_axes(parent_axes, "30%", "40%")
axins = inset_axes(parent_axes, "30%", "40%",
                   bbox_to_anchor=parent_axes.bbox)
axins = inset_axes(parent_axes, "30%", "40%",
                   bbox_to_anchor=(0, 0, 1, 1),
                   bbox_transform=parent_axes.transAxes)
lanczos(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]
lowpass_cosine_lanczos_filter_coef(cf, m, normalize=True)[source]

Return the convolution coefficients for a low-pass cosine Lanczos filter.

Parameters:
cf: float

Cutoff frequency expressed as a ratio of a Nyquist frequency.

m: int

Size of the filtering window.

Returns:
results: list

Coefficients of filtering window.

lowpass_lanczos_filter_coef(cf, m, normalize=True, cosine_taper=False)[source]

Return the convolution coefficients for a low-pass Lanczos filter.

Parameters:
cffloat

Cutoff frequency expressed as a ratio of the Nyquist frequency.

mint

Size of the filtering window.

normalizebool, optional

Whether to normalize the filter coefficients so they sum to 1.

cosine_taperbool, optional

If True, applies a cosine-squared taper to the Lanczos window.

Returns:
resnp.ndarray

Coefficients of the filtering window.

main()[source]
mark_inset(parent_axes, inset_axes, loc1, loc2, **kwargs)[source]

Draw a box to mark the location of an area represented by an inset axes.

This function draws a box in parent_axes at the bounding box of inset_axes, and shows a connection with the inset axes by drawing lines at the corners, giving a “zoomed in” effect.

Parameters:
parent_axes~matplotlib.axes.Axes

Axes which contains the area of the inset axes.

inset_axes~matplotlib.axes.Axes

The inset axes.

loc1, loc2{1, 2, 3, 4}

Corners to use for connecting the inset axes and the area in the parent axes.

**kwargs

Patch properties for the lines and box drawn:

Properties: agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array and two offsets from the bottom left corner of the image alpha: unknown animated: bool antialiased or aa: bool or None capstyle: .CapStyle or {‘butt’, ‘projecting’, ‘round’} clip_box: ~matplotlib.transforms.BboxBase or None clip_on: bool clip_path: Patch or (Path, Transform) or None color: color edgecolor or ec: color or None facecolor or fc: color or None figure: ~matplotlib.figure.Figure fill: bool gid: str hatch: {‘/’, ‘\’, ‘|’, ‘-’, ‘+’, ‘x’, ‘o’, ‘O’, ‘.’, ‘*’} in_layout: bool joinstyle: .JoinStyle or {‘miter’, ‘round’, ‘bevel’} label: object linestyle or ls: {‘-’, ‘–’, ‘-.’, ‘:’, ‘’, (offset, on-off-seq), …} linewidth or lw: float or None mouseover: bool path_effects: list of .AbstractPathEffect picker: None or bool or float or callable rasterized: bool sketch_params: (scale: float, length: float, randomness: float) snap: bool or None transform: ~matplotlib.transforms.Transform url: str visible: bool zorder: float

Returns:
pp~matplotlib.patches.Patch

The patch drawn to represent the area of the inset axes.

p1, p2~matplotlib.patches.Patch

The patches connecting two corners of the inset axes and its area.

ts_gaussian_filter(ts, sigma, order=0, mode='reflect', cval=0.0, truncate=4.0)[source]

Column-wise Gaussian smoothing of a regular time series. Missing or irregular values are not handled, so this function is not much different from a rolling-window Gaussian average in pandas (ts.rolling(window=5, win_type='gaussian').mean()), which may be preferable when data are missing. The function is retained with the aspiration of eventually handling irregular series, which is not yet implemented.

Parameters:
tsDataFrame

The series to be smoothed

sigmaint or freq

The sigma scale of the smoothing (analogous to std. deviation), given as a number of steps or freq

Returns:
resultDataFrame

A new regular time series with the same interval as ts.

unit_impulse_ts(size, interval)[source]
Parameters:
sizeint

Length of the impulse time series; must be an odd number.

intervalstring

time series interval, such as “15min”

Returns:
pandas.DataFrame

Time series with value 0.0 everywhere except 1.0 at the midpoint of the series.
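One way this is typically used (a sketch, assuming godin from this module is in scope) is to examine a filter's impulse response:

>>> imp = unit_impulse_ts(2001, "15min")      # odd length, 15-minute interval
>>> resp = godin(imp)                         # impulse response of the Godin filter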

vtools.functions.interannual module

interannual(ts)[source]

Pivots the years of a multiyear series to columns and converts the index to elapsed time of year.

Parameters:
tsseries Univariate series
Returns:
annualDataFrame with year in the columns and elapsed time of year as index
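A brief sketch (assumes interannual is in scope; the daily series below is synthetic):

>>> import numpy as np
>>> import pandas as pd
>>> idx = pd.date_range("2015-01-01", "2018-12-31", freq="D")
>>> ser = pd.Series(np.random.default_rng(0).standard_normal(len(idx)), index=idx)
>>> wide = interannual(ser)                   # one column per year, indexed by elapsed time of year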
interannual_sample()[source]
interannual_ticks_labels(freq)[source]

Convenient ticks and labels for interannual

Parameters:
freqFrequency string for desired spacing. Must be “Q”,”QS”,”M”,”MS” for quarterly or monthly
Returns:
ticks_labelstuple of tick locations and labels

vtools.functions.interpolate module

Module for data interpolation using splines or interfaces unavailable in Pandas.

_monotonic_spline(x, y, xnew)[source]

Third order (M3-A) monotonicity-preserving spline.

Usage: interpolate.spline(x, y, xnew)

where

x are the sorted index values of the original data
y are the original data
xnew are the new locations for the spline

Reference: Huynh, HT, "Accurate Monotone Cubic Interpolation", SIAM J. Numer. Analysis V30 No. 1, pp 57-100. All equation numbers refer to this paper. The variable names are also almost the same. Double letters like "ee" indicate that the subscript should have "+1/2" added to it, and a number after the variable shows the "t" that the first member applies to.

_ratsp1(x, y, p, q, y0, yn)[source]

RATSP1 in Spath (1995)

interpolate_to_index(df, dest)[source]
monotonic_spline(ts, dest)[source]

Interpolating a regular time series (rts) to a finer rts by rational histospline.

The rational histospline preserves area under the curve. This is a good choice of spline for period averaged data where an interpolant is desired that is 'conservative'. Note that it is the underlying continuous interpolant that will be 'conservative', though, not the returned discrete time series, which merely samples the underlying interpolant.

Parameters:
tsPandas.DataFrame

Series to be interpolated, typically with DatetimeIndex

desta pandas freq code (e.g. ‘16min’ or ‘D’) or a DateTimeIndex
Returns:
resultDataFrame

A regular time series with same columns as ts, populated with instantaneous values and with an index of type DateTimeIndex

rhist(x, y, xnew, y0, yn, p, q)[source]

Histospline for arrays with tension. Based on the algorithm rhist2 in One Dimensional Spline Interpolation Algorithms by Helmuth Spath (1995).

Parameters:
xarray-like

Abscissa array of original data, of length n

yarray-like, dimension (n-1)

Values (mantissa) of original data giving the rectangle (average) values between x[i] and x[i+1]

xnewarray-like

Array of new locations at which to interpolate.

y0,ynfloat

Initial and terminal values

p,q: array-like, dimension (n-1)

Tension parameter; p and q are almost always the same. The higher p and q are for a particular x interval, the more rectangular the interpolant will look and the more positivity- and shape-preserving it is, at the expense of accuracy. For this routine any number p,q > -1 is allowed, although the bound routine doesn't use values less than zero.

Returns:
ynewarray-like

Array that interpolates the original data.

rhist_bound(x, y, xnew, y0, yn, p, lbound=None, maxiter=5, pfactor=2, floor_eps=0.001)[source]

Numpy implementation of the histospline with bounds. Histospline for arrays with lower-bound enforcement. This routine drives rhist() but tests that the output array observes the lower bound and adapts the tension parameters as needed.

This will not work exactly if the input array has values right on the lower bound. In this case, the parameter floor_eps allows you to specify a tolerance of bound violation to shoot for … and if it isn’t met in maxiter iterations the value is simply floored.

Parameters:
xarray-like

Abscissa array of original data to be interpolated, of length n

yarray-like, dimension (n-1)

Values (mantissa) of original data giving the rectangle (average) values between x[i] and x[i+1]

xnewarray-like

Array of new locations at which to interpolate.

y0,ynfloat

Initial and terminal values

p: float

Tension parameter. This starts out as a global scalar, but will be converted to an array and adapted locally. The higher this goes for a particular x interval, the more rectangular the interpolant will look and the more positivity and shape preserving it is at the expense of accuracy. A good number is 1, and for this routine, p > 0 is required because the adaptive process multiplies it by pfactor each iteration on the expectation that it will get bigger.

lbound: float

Lower bound to be enforced. If the original y’s are strictly above this value, the output has the potential to also be strictly above. If the original y’s lie on the lower bound, then the lower bound can only be enforced within a tolerance using the Spath algorithm … and once the values reach that tolerance they are floored. If lbound = None, this function behaves like rhist()

maxiterinteger

Number of times to increase p by multiplying it by pfactor before giving up on satisfying floor_eps.

pfactorfloat

Factor by which to multiply individual time step p

floor_epsfloat

Tolerance for lower bound violation at which the algorithm will be terminated and the bounds will be enforced by flooring.

Returns:
ynewarray-like

Array that interpolates the original data, on a curve that conserves mass and strictly observes the lower bound.

rhist_coef(x, y, y0, yn, p, q)[source]

Routine that produces coefficients for the histospline

rhist_example()[source]
rhist_val(xnew, x, p, q, a, b, c)[source]

Evaluate a histospline at new x points

rhistinterp(ts, dest, p=2.0, lowbound=None, tolbound=0.001, maxiter=5)[source]

Interpolate a regular time series (rts) to a finer rts by rational histospline.

The rational histospline preserves area under the curve. This is a good choice of spline for period averaged data where an interpolant is desired that is 'conservative'. Note that it is the underlying continuous interpolant that will be 'conservative', though, not the returned discrete time series, which merely samples the underlying interpolant.

Parameters:
tsPandas.DataFrame

Series to be interpolated, with period index and assuming time stamps at beginning of the period and no missing data

deststring or DateTimeIndex

A pandas freq code (e.g. ‘16min’ or ‘D’) or a DateTimeIndex

pfloat, optional

Spline tension, usually between 0 and 20. Must be > -1. For a 'sufficiently large' value of p, the interpolant will be monotonicity-preserving and will maintain strict positivity (always being strictly > lowbound). It will also preserve the original shape of the time series.

lowboundfloat, optional

Lower bound of interpolated values.

tolboundfloat, optional

Tolerance for determining if an input is on the bound.

Returns:
resultpandas.DataFrame

A regular time series with same columns as ts, populated with instantaneous values and with an index of type DateTimeIndex
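A hedged sketch of interpolating period-averaged daily data to 15-minute values (assumes rhistinterp is in scope; timestamps mark the start of each daily averaging period and the values are synthetic):

>>> import numpy as np
>>> import pandas as pd
>>> days = pd.date_range("2022-01-01", periods=30, freq="D")
>>> daily = pd.DataFrame({"flow": np.random.default_rng(1).uniform(100.0, 200.0, 30)}, index=days)
>>> fine = rhistinterp(daily, "15min", p=2.0, lowbound=0.0)   # conservative, non-negative interpolant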

tridiagonal(a, b, c, d)[source]

a is the lower band (with leading zero)
b is the center diagonal (length == nrow)
c is the upper band (trailing zero)
d is the right hand side

vtools.functions.lag_cross_correlation module

_coerce_fixed_duration(x, name)[source]
calculate_lag(lagged, base, max_lag, res, interpolate_method='linear')[source]

Calculate the shift in lagged that maximizes its cross-correlation with base.

Parameters:
base,lagged: :class:`Pandas.Series`

time series to compare. The result is relative to base

max_lag: interval

Maximum positive/negative time shift to consider in the cross-correlation (i.e., from -max_lag to +max_lag). Required windows in lagged will account for this bracket. For series dominated by a single frequency (e.g., 1 cycle/12.5 hours for tides), the algorithm can tolerate a range of 180 degrees (6 hours).

res: interval

Resolution of the analysis. The series lagged will be interpolated to this resolution using interpolate_method. The unit used here determines the type of the output. See the documentation of the interval concept, which is most compatible with pandas.tseries.offsets (rather than Timedelta) because of better math properties in things like division; vtime helpers like minutes(1) may be helpful.

interpolate_method: str, optional

Interpolate method to refine lagged to res. Must be compatible with pandas interpolation method names (and hence scipy)

Returns:
laginterval

Shift as a pandas.tseries.offsets subtype whose units match res. This shift is the apparent lateness (positive) or earliness (negative). It must be applied to base or removed from lagged to align the features.
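A hedged sketch using synthetic tide-like series and the vtime helpers mentioned above (hours, minutes); the one-hour offset built into lagged is roughly what calculate_lag should recover:

>>> import numpy as np
>>> import pandas as pd
>>> bidx = pd.date_range("2022-03-01", periods=960, freq="15min")
>>> bhrs = np.arange(960) * 0.25
>>> base = pd.Series(np.cos(2*np.pi*bhrs/12.42), index=bidx)
>>> lidx = pd.date_range("2022-02-28", periods=1200, freq="15min")   # covers base plus the lag bracket
>>> lhrs = np.asarray((lidx - bidx[0]) / pd.Timedelta(hours=1))
>>> lagged = pd.Series(np.cos(2*np.pi*(lhrs - 1.0)/12.42), index=lidx)   # runs about 1 hour late
>>> lag = calculate_lag(lagged, base, max_lag=hours(3), res=minutes(1))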

icrosscorr(lag, ts0, ts1)[source]

Lag-N cross correlation. Shifted data filled with NaNs

Parameters:
lagint, default 0
ts0, ts1pandas.Series objects of equal length
Returns:
crosscorrfloat
mincrosscorr(lag, ts0, ts1)[source]
to_offset(freq, is_period=False)

Return DateOffset object from string or datetime.timedelta object.

Parameters:
freqstr, datetime.timedelta, BaseOffset or None
Returns:
BaseOffset subclass or None
Raises:
ValueError

If freq is an invalid frequency

See also

BaseOffset

Standard kind of date increment used for a date range.

Examples

>>> from pandas.tseries.frequencies import to_offset
>>> to_offset("5min")
<5 * Minutes>
>>> to_offset("1D1h")
<25 * Hours>
>>> to_offset("2W")
<2 * Weeks: weekday=6>
>>> to_offset("2B")
<2 * BusinessDays>
>>> to_offset(pd.Timedelta(days=1))
<Day>
>>> to_offset(pd.offsets.Hour())
<Hour>

vtools.functions.merge module

_reindex_to_continuous(result, first_freq)[source]
align_inputs_strict(seq_arg=0, names_kw='names')[source]
reduce(function, iterable[, initial]) -> value

Apply a function of two arguments cumulatively to the items of a sequence or iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). If initial is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty.

ts_merge(series, names=None, strict_priority=False)[source]

Merge multiple time series together, prioritizing series in order.

Parameters:
seriessequence of pandas.Series or pandas.DataFrame

Higher priority first. All indexes must be DatetimeIndex.

namesNone, str, or iterable of str, optional
  • If None (default), inputs must share compatible column names.

  • If str, the output is univariate and will be named accordingly.

  • If iterable, it is used as a subset/ordering of columns.

strict_prioritybool, default False

If False (default): lower-priority data may fill NaNs in higher-priority series anywhere (traditional merge/overlay). If True: for each column, within the window [first_valid_index, last_valid_index] of any higher-priority series, lower-priority data are masked out — even if the higher-priority value is NaN. Outside those windows, behavior is unchanged.

Returns:
pandas.Series or pandas.DataFrame
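A compact sketch of the default (non-strict) merge behavior (assumes ts_merge is importable from the vtools merge module; data are synthetic):

>>> import numpy as np
>>> import pandas as pd
>>> a = pd.Series(np.arange(10.0), index=pd.date_range("2022-01-01", periods=10, freq="D"), name="flow")
>>> a.iloc[6] = np.nan                        # gap inside the higher-priority series
>>> b = pd.Series(np.full(12, -1.0), index=pd.date_range("2022-01-05", periods=12, freq="D"), name="flow")
>>> merged = ts_merge((a, b))                 # b fills the gap and extends the record past a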
ts_splice(series, names=None, transition='prefer_last', floor_dates=False)[source]

Splice multiple time series together, prioritizing series in patches of time.

Unlike ts_merge, which blends overlapping data points, ts_splice stitches together time series without overlap. The function determines when to switch between series based on a transition strategy.

Parameters:
seriestuple or list of pandas.DataFrame or pandas.Series

A tuple or list of time series. Each series must have a DatetimeIndex and consistent column structure.

namesNone, str, or iterable of str, optional
  • If None (default), all input series must share common column names, and the output will merge common columns.

  • If a str, all input series must have a single column, and the output will be a DataFrame with this name as the column name.

  • If an iterable of str, all input DataFrames must have the same number of columns matching the length of names, and these will be used for the output.

transition{‘prefer_first’, ‘prefer_last’} or list of pandas.Timestamp

Defines how to determine breakpoints between time series:
  • ‘prefer_first’: uses the earlier series in the list up to its last valid timestamp.
  • ‘prefer_last’: uses the later series starting from its first valid timestamp.
  • A list of specific timestamps can also be provided as transition points.

floor_datesbool, optional, default=False

If True, inferred transition timestamps (prefer_first or prefer_last) are floored to the beginning of the day. This can introduce NaNs if the input series are regular with a freq attribute.

Returns:
pandas.DataFrame or pandas.Series
  • If the input contains multi-column DataFrames, the output is a DataFrame with the same column structure.

  • If a collection of single-column Series is provided, the output will be a Series.

  • The output retains a freq attribute if all inputs share the same frequency.

See also

ts_merge

Merges series by filling gaps in order of priority.

Notes

  • The output time index is the union of input time indices.

  • If transition is ‘prefer_first’, gaps may appear in the final time series.

  • If transition is ‘prefer_last’, overlapping data is resolved in favor of later series.
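A matching hedged sketch for ts_splice (synthetic series; with ‘prefer_last’ the later series takes over from its first valid timestamp):

>>> import numpy as np
>>> import pandas as pd
>>> early = pd.Series(np.ones(6), index=pd.date_range("2022-01-01", periods=6, freq="D"), name="stage")
>>> late = pd.Series(np.zeros(6), index=pd.date_range("2022-01-04", periods=6, freq="D"), name="stage")
>>> spliced = ts_splice((early, late), transition="prefer_last")
>>> # values before 2022-01-04 come from `early`; from 2022-01-04 onward they come from `late`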

vtools.functions.neighbor_fill module

Neighbor-based time-series gap filling.

This module provides a single high-level API, fill_from_neighbor(), with pluggable backends for common algorithms used to infer a target series from one or more nearby stations. It is designed for operational use in Delta/Bay hydrodynamics workflows, but is intentionally general.

Highlights

  • Robust time alignment and optional resampling.

  • Multiple modeling strategies: OLS/robust, rolling regression, lagged elastic-net, and state-space/Kalman.

  • Forward-chaining (temporal) cross-validation utilities.

  • Optional regime stratification (e.g., barrier in/out, season).

  • Uncertainty estimates where available (analytic or residual-based).

  • Clear return structure with diagnostics for auditability.

Example

>>> res = fill_from_neighbor(
...     target=y, neighbor=x, method="state_space", lags=range(0, 4),
...     bounds=(0.0, None), regime=regime_series
... )
>>> filled = res["filled"]
>>> info = res["model_info"]

Notes

  • “Neighbor” can be one series or multiple (as a DataFrame); both are supported.

  • Missing data in the target are left as-is where the model cannot reasonably infer a value (e.g., no overlapping neighbor data). Where predictions exist, they are merged into the target to produce filled. DFM methods can carry through a gap in the neighbor.

class DFMFill(endog: DataFrame, factor: str = 'default', anomaly_mode: str = 'ar', anom_var: str = 'neighbor', rx_scale: float = 1.0)[source]

Bases: MLEModel

Bivariate DFM with level+slope common factor and optional anomalies.

Attributes:
param_names

(list of str) List of human readable parameter names (for parameters actually included in the model).

start_params

(array) Starting parameters for maximum likelihood estimation.

Methods

update(params[, transformed])

Update the parameters of the model

__init__(endog: DataFrame, factor: str = 'default', anomaly_mode: str = 'ar', anom_var: str = 'neighbor', rx_scale: float = 1.0)[source]
__module__ = 'vtools.functions.neighbor_fill'
_constrain(vec)[source]
property param_names

(list of str) List of human readable parameter names (for parameters actually included in the model).

property start_params: ndarray

(array) Starting parameters for maximum likelihood estimation.

update(params, transformed=True, **kwargs)[source]

Update the parameters of the model

Parameters:
paramsarray_like

Array of new parameters.

transformedbool, optional

Whether or not params is already transformed. If set to False, transform_params is called. Default is True.

Returns:
paramsarray_like

Array of parameters.

Notes

Since Model is a base class, this method should be overridden by subclasses to perform actual updating steps.

class ElasticNetCV(*, l1_ratio=0.5, eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, precompute='auto', max_iter=1000, tol=0.0001, cv=None, copy_X=True, verbose=0, n_jobs=None, positive=False, random_state=None, selection='cyclic')[source]

Bases: RegressorMixin, LinearModelCV

Elastic Net model with iterative fitting along a regularization path.

See glossary entry for cross-validation estimator.

Read more in the User Guide.

Parameters:
l1_ratiofloat or list of float, default=0.5

Float between 0 and 1 passed to ElasticNet (scaling between l1 and l2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2 This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and less close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1].

epsfloat, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphasint, default=100

Number of alphas along the regularization path, used for each l1_ratio.

alphasarray-like, default=None

List of alphas where to compute the models. If None alphas are set automatically.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

precompute‘auto’, bool or array-like of shape (n_features, n_features), default=’auto’

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument.

max_iterint, default=1000

The maximum number of iterations.

tolfloat, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

cvint, cross-validation generator or iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • int, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, KFold is used.

Refer User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

verbosebool or int, default=0

Amount of verbosity.

n_jobsint, default=None

Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

positivebool, default=False

When set to True, forces the coefficients to be positive.

random_stateint, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection{‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

See also

enet_path

Compute elastic net path with coordinate descent.

ElasticNet

Linear regression with combined L1 and L2 priors as regularizer.

Notes

In fit, once the best parameters l1_ratio and alpha are found through cross-validation, the model is fit again using the entire training set.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.

The parameter l1_ratio corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. More specifically, the optimization objective is:

1 / (2 * n_samples) * ||y - Xw||^2_2
+ alpha * l1_ratio * ||w||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2

If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:

a * L1 + b * L2

for:

alpha = a + b and l1_ratio = a / (a + b).

For an example, see examples/linear_model/plot_lasso_model_selection.py.

Examples

>>> from sklearn.linear_model import ElasticNetCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=2, random_state=0)
>>> regr = ElasticNetCV(cv=5, random_state=0)
>>> regr.fit(X, y)
ElasticNetCV(cv=5, random_state=0)
>>> print(regr.alpha_)
0.199...
>>> print(regr.intercept_)
0.398...
>>> print(regr.predict([[0, 0]]))
[0.398...]
Attributes:
alpha_float

The amount of penalization chosen by cross validation.

l1_ratio_float

The compromise between l1 and l2 penalization chosen by cross validation.

coef_ndarray of shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the cost function formula).

intercept_float or ndarray of shape (n_targets, n_features)

Independent term in the decision function.

mse_path_ndarray of shape (n_l1_ratio, n_alpha, n_folds)

Mean square error for the test set on each fold, varying l1_ratio and alpha.

alphas_ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)

The grid of alphas used for fitting, for each l1_ratio.

dual_gap_float

The dual gaps at the end of the optimization for the optimal alpha.

n_iter_int

Number of iterations run by the coordinate descent solver to reach the specified tolerance for the optimal alpha.

n_features_in_int

Number of features seen during fit.

New in version 0.24.

feature_names_in_ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

New in version 1.0.

Methods

path(X, y, *[, l1_ratio, eps, n_alphas, ...])

Compute elastic net path with coordinate descent.

set_fit_request(*[, sample_weight])

Request metadata passed to the fit method.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

__abstractmethods__ = frozenset({})
__annotations__ = {'_parameter_constraints': <class 'dict'>}
__doc__ = "Elastic Net model with iterative fitting along a regularization path.\n\n    See glossary entry for :term:`cross-validation estimator`.\n\n    Read more in the :ref:`User Guide <elastic_net>`.\n\n    Parameters\n    ----------\n    l1_ratio : float or list of float, default=0.5\n        Float between 0 and 1 passed to ElasticNet (scaling between\n        l1 and l2 penalties). For ``l1_ratio = 0``\n        the penalty is an L2 penalty. For ``l1_ratio = 1`` it is an L1 penalty.\n        For ``0 < l1_ratio < 1``, the penalty is a combination of L1 and L2\n        This parameter can be a list, in which case the different\n        values are tested by cross-validation and the one giving the best\n        prediction score is used. Note that a good choice of list of\n        values for l1_ratio is often to put more values close to 1\n        (i.e. Lasso) and less close to 0 (i.e. Ridge), as in ``[.1, .5, .7,\n        .9, .95, .99, 1]``.\n\n    eps : float, default=1e-3\n        Length of the path. ``eps=1e-3`` means that\n        ``alpha_min / alpha_max = 1e-3``.\n\n    n_alphas : int, default=100\n        Number of alphas along the regularization path, used for each l1_ratio.\n\n    alphas : array-like, default=None\n        List of alphas where to compute the models.\n        If None alphas are set automatically.\n\n    fit_intercept : bool, default=True\n        Whether to calculate the intercept for this model. If set\n        to false, no intercept will be used in calculations\n        (i.e. data is expected to be centered).\n\n    precompute : 'auto', bool or array-like of shape             (n_features, n_features), default='auto'\n        Whether to use a precomputed Gram matrix to speed up\n        calculations. If set to ``'auto'`` let us decide. The Gram\n        matrix can also be passed as argument.\n\n    max_iter : int, default=1000\n        The maximum number of iterations.\n\n    tol : float, default=1e-4\n        The tolerance for the optimization: if the updates are\n        smaller than ``tol``, the optimization code checks the\n        dual gap for optimality and continues until it is smaller\n        than ``tol``.\n\n    cv : int, cross-validation generator or iterable, default=None\n        Determines the cross-validation splitting strategy.\n        Possible inputs for cv are:\n\n        - None, to use the default 5-fold cross-validation,\n        - int, to specify the number of folds.\n        - :term:`CV splitter`,\n        - An iterable yielding (train, test) splits as arrays of indices.\n\n        For int/None inputs, :class:`~sklearn.model_selection.KFold` is used.\n\n        Refer :ref:`User Guide <cross_validation>` for the various\n        cross-validation strategies that can be used here.\n\n        .. versionchanged:: 0.22\n            ``cv`` default value if None changed from 3-fold to 5-fold.\n\n    copy_X : bool, default=True\n        If ``True``, X will be copied; else, it may be overwritten.\n\n    verbose : bool or int, default=0\n        Amount of verbosity.\n\n    n_jobs : int, default=None\n        Number of CPUs to use during the cross validation.\n        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.\n        ``-1`` means using all processors. 
See :term:`Glossary <n_jobs>`\n        for more details.\n\n    positive : bool, default=False\n        When set to ``True``, forces the coefficients to be positive.\n\n    random_state : int, RandomState instance, default=None\n        The seed of the pseudo random number generator that selects a random\n        feature to update. Used when ``selection`` == 'random'.\n        Pass an int for reproducible output across multiple function calls.\n        See :term:`Glossary <random_state>`.\n\n    selection : {'cyclic', 'random'}, default='cyclic'\n        If set to 'random', a random coefficient is updated every iteration\n        rather than looping over features sequentially by default. This\n        (setting to 'random') often leads to significantly faster convergence\n        especially when tol is higher than 1e-4.\n\n    Attributes\n    ----------\n    alpha_ : float\n        The amount of penalization chosen by cross validation.\n\n    l1_ratio_ : float\n        The compromise between l1 and l2 penalization chosen by\n        cross validation.\n\n    coef_ : ndarray of shape (n_features,) or (n_targets, n_features)\n        Parameter vector (w in the cost function formula).\n\n    intercept_ : float or ndarray of shape (n_targets, n_features)\n        Independent term in the decision function.\n\n    mse_path_ : ndarray of shape (n_l1_ratio, n_alpha, n_folds)\n        Mean square error for the test set on each fold, varying l1_ratio and\n        alpha.\n\n    alphas_ : ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)\n        The grid of alphas used for fitting, for each l1_ratio.\n\n    dual_gap_ : float\n        The dual gaps at the end of the optimization for the optimal alpha.\n\n    n_iter_ : int\n        Number of iterations run by the coordinate descent solver to reach\n        the specified tolerance for the optimal alpha.\n\n    n_features_in_ : int\n        Number of features seen during :term:`fit`.\n\n        .. versionadded:: 0.24\n\n    feature_names_in_ : ndarray of shape (`n_features_in_`,)\n        Names of features seen during :term:`fit`. Defined only when `X`\n        has feature names that are all strings.\n\n        .. 
versionadded:: 1.0\n\n    See Also\n    --------\n    enet_path : Compute elastic net path with coordinate descent.\n    ElasticNet : Linear regression with combined L1 and L2 priors as regularizer.\n\n    Notes\n    -----\n    In `fit`, once the best parameters `l1_ratio` and `alpha` are found through\n    cross-validation, the model is fit again using the entire training set.\n\n    To avoid unnecessary memory duplication the `X` argument of the `fit`\n    method should be directly passed as a Fortran-contiguous numpy array.\n\n    The parameter `l1_ratio` corresponds to alpha in the glmnet R package\n    while alpha corresponds to the lambda parameter in glmnet.\n    More specifically, the optimization objective is::\n\n        1 / (2 * n_samples) * ||y - Xw||^2_2\n        + alpha * l1_ratio * ||w||_1\n        + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2\n\n    If you are interested in controlling the L1 and L2 penalty\n    separately, keep in mind that this is equivalent to::\n\n        a * L1 + b * L2\n\n    for::\n\n        alpha = a + b and l1_ratio = a / (a + b).\n\n    For an example, see\n    :ref:`examples/linear_model/plot_lasso_model_selection.py\n    <sphx_glr_auto_examples_linear_model_plot_lasso_model_selection.py>`.\n\n    Examples\n    --------\n    >>> from sklearn.linear_model import ElasticNetCV\n    >>> from sklearn.datasets import make_regression\n\n    >>> X, y = make_regression(n_features=2, random_state=0)\n    >>> regr = ElasticNetCV(cv=5, random_state=0)\n    >>> regr.fit(X, y)\n    ElasticNetCV(cv=5, random_state=0)\n    >>> print(regr.alpha_)\n    0.199...\n    >>> print(regr.intercept_)\n    0.398...\n    >>> print(regr.predict([[0, 0]]))\n    [0.398...]\n    "
__init__(*, l1_ratio=0.5, eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, precompute='auto', max_iter=1000, tol=0.0001, cv=None, copy_X=True, verbose=0, n_jobs=None, positive=False, random_state=None, selection='cyclic')[source]
__module__ = 'sklearn.linear_model._coordinate_descent'
_abc_impl = <_abc._abc_data object>
_get_estimator()[source]

Model to be fitted after the best alpha has been determined.

_is_multitask()[source]

Bool indicating if class is meant for multidimensional target.

_more_tags()[source]
_parameter_constraints: dict = {'alphas': ['array-like', None], 'copy_X': ['boolean'], 'cv': ['cv_object'], 'eps': [<sklearn.utils._param_validation.Interval object>], 'fit_intercept': ['boolean'], 'l1_ratio': [<sklearn.utils._param_validation.Interval object>, 'array-like'], 'max_iter': [<sklearn.utils._param_validation.Interval object>], 'n_alphas': [<sklearn.utils._param_validation.Interval object>], 'n_jobs': [<class 'numbers.Integral'>, None], 'positive': ['boolean'], 'precompute': [<sklearn.utils._param_validation.StrOptions object>, 'array-like', 'boolean'], 'random_state': ['random_state'], 'selection': [<sklearn.utils._param_validation.StrOptions object>], 'tol': [<sklearn.utils._param_validation.Interval object>], 'verbose': ['verbose']}
static path(X, y, *, l1_ratio=0.5, eps=0.001, n_alphas=100, alphas=None, precompute='auto', Xy=None, copy_X=True, coef_init=None, verbose=False, return_n_iter=False, positive=False, check_input=True, **params)

Compute elastic net path with coordinate descent.

The elastic net optimization function varies for mono and multi-outputs.

For mono-output tasks it is:

1 / (2 * n_samples) * ||y - Xw||^2_2
+ alpha * l1_ratio * ||w||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2

For multi-output tasks it is:

(1 / (2 * n_samples)) * ||Y - XW||_Fro^2
+ alpha * l1_ratio * ||W||_21
+ 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2

where \(||W||_{21} = \sum_i \sqrt{\sum_j w_{ij}^2}\), i.e. the sum of the norms of each row.

Read more in the User Guide.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

Training data. Pass directly as Fortran-contiguous data to avoid unnecessary memory duplication. If y is mono-output then X can be sparse.

y{array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_targets)

Target values.

l1_ratiofloat, default=0.5

Number between 0 and 1 passed to elastic net (scaling between l1 and l2 penalties). l1_ratio=1 corresponds to the Lasso.

epsfloat, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphasint, default=100

Number of alphas along the regularization path.

alphasarray-like, default=None

List of alphas where to compute the models. If None alphas are set automatically.

precompute‘auto’, bool or array-like of shape (n_features, n_features), default=’auto’

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument.

Xyarray-like of shape (n_features,) or (n_features, n_targets), default=None

Xy = np.dot(X.T, y) that can be precomputed. It is useful only when the Gram matrix is precomputed.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

coef_initarray-like of shape (n_features, ), default=None

The initial values of the coefficients.

verbosebool or int, default=False

Amount of verbosity.

return_n_iterbool, default=False

Whether to return the number of iterations or not.

positivebool, default=False

If set to True, forces coefficients to be positive. (Only allowed when y.ndim == 1).

check_inputbool, default=True

If set to False, the input validation checks are skipped (including the Gram matrix when provided). It is assumed that they are handled by the caller.

**paramskwargs

Keyword arguments passed to the coordinate descent solver.

Returns:
alphasndarray of shape (n_alphas,)

The alphas along the path where models are computed.

coefsndarray of shape (n_features, n_alphas) or (n_targets, n_features, n_alphas)

Coefficients along the path.

dual_gapsndarray of shape (n_alphas,)

The dual gaps at the end of the optimization for each alpha.

n_iterslist of int

The number of iterations taken by the coordinate descent optimizer to reach the specified tolerance for each alpha. (Is returned when return_n_iter is set to True).

See also

MultiTaskElasticNet

Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer.

MultiTaskElasticNetCV

Multi-task L1/L2 ElasticNet with built-in cross-validation.

ElasticNet

Linear regression with combined L1 and L2 priors as regularizer.

ElasticNetCV

Elastic Net model with iterative fitting along a regularization path.

Notes

For an example, see examples/linear_model/plot_lasso_coordinate_descent_path.py.
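As a usage illustration (not part of the class documentation above), the same path computation is exposed as the module-level function enet_path, listed in the See Also section; a minimal sketch, assuming scikit-learn and its make_regression helper:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import enet_path

X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=0)
X = np.asfortranarray(X)  # Fortran-contiguous input avoids an internal copy, per the notes above

# alphas: penalty grid; coefs: (n_features, n_alphas); dual_gaps: (n_alphas,)
alphas, coefs, dual_gaps = enet_path(X, y, l1_ratio=0.5, n_alphas=20)
print(alphas.shape, coefs.shape, dual_gaps.shape)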

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ElasticNetCV

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns:
selfobject

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ElasticNetCV

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
selfobject

The updated object.
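A hedged sketch of how these request methods are typically used, assuming a scikit-learn version (1.4 or later) in which Pipeline participates in metadata routing; the weighting scheme here is purely illustrative:

import numpy as np
import sklearn
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

sklearn.set_config(enable_metadata_routing=True)

X, y = make_regression(n_features=5, random_state=0)
weights = np.ones(len(y))

# Route sample_weight to ElasticNetCV.fit; explicitly decline it for the scaler.
enet = ElasticNetCV(cv=3).set_fit_request(sample_weight=True)
scaler = StandardScaler().set_fit_request(sample_weight=False)

pipe = make_pipeline(scaler, enet)
pipe.fit(X, y, sample_weight=weights)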

class FillResult(filled: Series, yhat: Series, pi_lower: Series | None, pi_upper: Series | None, model_info: Dict[str, Any], metrics: Dict[str, float])[source]

Bases: object

Container for gap-filling outputs.

Parameters:
filledpd.Series

Target series with gaps filled where possible.

yhatpd.Series

Model predictions aligned to the union index used for fitting/prediction.

pi_lower, pi_upperOptional[pd.Series]

Prediction interval bounds where available; otherwise None.

model_infodict

Method, parameters, chosen lags, training window, etc.

metricsdict

Holdout scores (MAE/RMSE/R^2) using forward-chaining CV where configured.

Methods

to_dict

__eq__(other)

Return self==value.

__hash__ = None
__init__(filled: Series, yhat: Series, pi_lower: Series | None, pi_upper: Series | None, model_info: Dict[str, Any], metrics: Dict[str, float]) → None
__match_args__ = ('filled', 'yhat', 'pi_lower', 'pi_upper', 'model_info', 'metrics')
__module__ = 'vtools.functions.neighbor_fill'
__repr__()

Return repr(self).

__weakref__

list of weak references to the object (if defined)

filled: Series
metrics: Dict[str, float]
model_info: Dict[str, Any]
pi_lower: Series | None
pi_upper: Series | None
to_dict() → Dict[str, Any][source]
yhat: Series
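A minimal construction sketch, assuming FillResult is imported from vtools.functions.neighbor_fill (its __module__ above); in practice instances are produced by the gap-filling routines rather than built by hand, and the model_info and metrics values below are placeholders:

import pandas as pd
from vtools.functions.neighbor_fill import FillResult

idx = pd.date_range("2024-01-01", periods=4, freq="h")
filled = pd.Series([1.0, 1.1, 1.2, 1.3], index=idx)

result = FillResult(
    filled=filled,
    yhat=filled.copy(),
    pi_lower=None,                      # no prediction interval available
    pi_upper=None,
    model_info={"method": "example"},   # placeholder metadata
    metrics={"rmse": 0.0},              # placeholder holdout score
)
print(result.to_dict().keys())
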
class HuberT(t=1.345)[source]

Bases: RobustNorm

Huber’s T for M estimation.

Parameters:
tfloat, optional

The tuning constant for Huber’s t function. The default value is 1.345.

See also

statsmodels.robust.norms.RobustNorm

Methods

psi(z)

The psi function for Huber's t estimator

psi_deriv(z)

The derivative of Huber's t psi function

rho(z)

The robust criterion function for Huber's t.

weights(z)

Huber's t weighting function for the IRLS algorithm

__init__(t=1.345)[source]
__module__ = 'statsmodels.robust.norms'
_subset(z)[source]

Huber’s T is defined piecewise over the range for z

psi(z)[source]

The psi function for Huber’s t estimator

The analytic derivative of rho

Parameters:
zarray_like

1d array

Returns:
psindarray

psi(z) = z for |z| <= t

psi(z) = sign(z)*t for |z| > t

psi_deriv(z)[source]

The derivative of Huber’s t psi function

Notes

Used to estimate the robust covariance matrix.

rho(z)[source]

The robust criterion function for Huber’s t.

Parameters:
zarray_like

1d array

Returns:
rhondarray

rho(z) = .5*z**2 for |z| <= t

rho(z) = |z|*t - .5*t**2 for |z| > t

weights(z)[source]

Huber’s t weighting function for the IRLS algorithm

The psi function scaled by z

Parameters:
zarray_like

1d array

Returns:
weightsndarray

weights(z) = 1 for |z| <= t

weights(z) = t/|z| for |z| > t
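A short numerical sketch, assuming statsmodels is installed; the printed values follow the piecewise formulas documented above:

import numpy as np
from statsmodels.robust.norms import HuberT

norm = HuberT(t=1.345)
z = np.array([-3.0, -1.0, 0.0, 0.5, 2.0])

print(norm.rho(z))      # 0.5*z**2 inside |z| <= t, linear growth outside
print(norm.psi(z))      # z inside the threshold, sign(z)*t outside
print(norm.weights(z))  # 1 inside the threshold, t/|z| outside (IRLS weights)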

class KNeighborsRegressor(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)[source]

Bases: KNeighborsMixin, RegressorMixin, NeighborsBase

Regression based on k-nearest neighbors.

The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.

Read more in the User Guide.

New in version 0.9.

Parameters:
n_neighborsint, default=5

Number of neighbors to use by default for kneighbors() queries.

weights{‘uniform’, ‘distance’}, callable or None, default=’uniform’

Weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.

  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree

  • ‘kd_tree’ will use KDTree

  • ‘brute’ will use a brute-force search.

  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_sizeint, default=30

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

pfloat, default=2

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metricstr, DistanceMetric object or callable, default=’minkowski’

Metric to use for distance computation. Default is “minkowski”, which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in distance_metrics for valid metric values.

If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.

If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string.

If metric is a DistanceMetric object, it will be passed directly to the underlying computation routines.

metric_paramsdict, default=None

Additional keyword arguments for the metric function.

n_jobsint, default=None

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn’t affect fit() method.

See also

NearestNeighbors

Unsupervised learner for implementing neighbor searches.

RadiusNeighborsRegressor

Regression based on neighbors within a fixed radius.

KNeighborsClassifier

Classifier implementing the k-nearest neighbors vote.

RadiusNeighborsClassifier

Classifier implementing a vote among neighbors within a given radius.

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

Warning

Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.

https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsRegressor
>>> neigh = KNeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y)
KNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[0.5]
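The weights parameter above also accepts a callable; a hedged sketch using a Gaussian-kernel weighting (the bandwidth is arbitrary and purely illustrative):

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def gaussian_weights(distances):
    # Receives an array of neighbor distances and returns weights of the same shape.
    return np.exp(-(distances ** 2) / 0.5)

X = np.linspace(0.0, 3.0, 20).reshape(-1, 1)
y = np.sin(X).ravel()

knn = KNeighborsRegressor(n_neighbors=5, weights=gaussian_weights)
knn.fit(X, y)
print(knn.predict([[1.5]]))
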
Attributes:
effective_metric_str or callable

The distance metric to use. It will be same as the metric parameter or a synonym of it, e.g. ‘euclidean’ if the metric parameter set to ‘minkowski’ and p parameter set to 2.

effective_metric_params_dict

Additional keyword arguments for the metric function. For most metrics will be same with metric_params parameter, but may also contain the p parameter value if the effective_metric_ attribute is set to ‘minkowski’.

n_features_in_int

Number of features seen during fit.

New in version 0.24.

feature_names_in_ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

New in version 1.0.

n_samples_fit_int

Number of samples in the fitted data.

Methods

fit(X, y)

Fit the k-nearest neighbors regressor from the training dataset.

predict(X)

Predict the target for the provided data.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

__init__(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)[source]
__module__ = 'sklearn.neighbors._regression'
_abc_impl = <_abc._abc_data object>
_more_tags()[source]
_parameter_constraints: dict = {'algorithm': [<sklearn.utils._param_validation.StrOptions object>], 'leaf_size': [<sklearn.utils._param_validation.Interval object>], 'metric': [<sklearn.utils._param_validation.StrOptions object>, <built-in function callable>, <class 'sklearn.metrics._dist_metrics.DistanceMetric'>], 'metric_params': [<class 'dict'>, None], 'n_jobs': [<class 'numbers.Integral'>, None], 'n_neighbors': [<sklearn.utils._param_validation.Interval object>, None], 'p': [<sklearn.utils._param_validation.Interval object>, None], 'weights': [<sklearn.utils._param_validation.StrOptions object>, <built-in function callable>, None]}
fit(X, y)[source]

Fit the k-nearest neighbors regressor from the training dataset.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_samples) if metric=’precomputed’

Training data.

y{array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_outputs)

Target values.

Returns:
selfKNeighborsRegressor

The fitted k-nearest neighbors regressor.

predict(X)[source]

Predict the target for the provided data.

Parameters:
X{array-like, sparse matrix} of shape (n_queries, n_features), or (n_queries, n_indexed) if metric == ‘precomputed’

Test samples.

Returns:
yndarray of shape (n_queries,) or (n_queries, n_outputs), dtype=int

Target values.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KNeighborsRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
selfobject

The updated object.

class MLEModel(endog, k_states, exog=None, dates=None, freq=None, **kwargs)[source]

Bases: TimeSeriesModel

State space model for maximum likelihood estimation

Parameters:
endogarray_like

The observed time-series process \(y\)

k_statesint

The dimension of the unobserved state process.

exogarray_like, optional

Array of exogenous regressors, shaped nobs x k. Default is no exogenous regressors.

datesarray_like of datetime, optional

An array-like object of datetime objects. If a Pandas object is given for endog, it is assumed to have a DateIndex.

freqstr, optional

The frequency of the time-series. A Pandas offset or ‘B’, ‘D’, ‘W’, ‘M’, ‘A’, or ‘Q’. This is optional if dates are given.

**kwargs

Keyword arguments may be used to provide default values for state space matrices or for Kalman filtering options. See Representation, and KalmanFilter for more details.

See also

statsmodels.tsa.statespace.mlemodel.MLEResults
statsmodels.tsa.statespace.kalman_filter.KalmanFilter
statsmodels.tsa.statespace.representation.Representation

Notes

This class wraps the state space model with Kalman filtering to add in functionality for maximum likelihood estimation. In particular, it adds the concept of updating the state space representation based on a defined set of parameters, through the update method or updater attribute (see below for more details on which to use when), and it adds a fit method which uses a numerical optimizer to select the parameters that maximize the likelihood of the model.

The start_params attribute and the update method must be overridden in the child class (and the transform and untransform methods, if needed); a minimal subclass sketch follows the methods listing below.

Attributes:
ssmstatsmodels.tsa.statespace.kalman_filter.KalmanFilter

Underlying state space representation.

Methods

clone(endog[, exog])

Clone state space model with new data and optionally new specification

filter(params[, transformed, ...])

Kalman filtering

fit([start_params, transformed, ...])

Fits the model by maximum likelihood via Kalman filter.

fit_constrained(constraints[, start_params])

Fit the model with some parameters subject to equality constraints.

fix_params(params)

Fix parameters to specific values (context manager)

from_formula(formula, data[, subset])

Not implemented for state space models

handle_params(params[, transformed, ...])

Ensure model parameters satisfy shape and other requirements

hessian(params, *args, **kwargs)

Hessian matrix of the likelihood function, evaluated at the given parameters

impulse_responses(params[, steps, impulse, ...])

Impulse response function

initialize_approximate_diffuse([variance])

Initialize approximate diffuse

initialize_known(initial_state, ...)

Initialize known

initialize_statespace(**kwargs)

Initialize the state space representation

initialize_stationary()

Initialize stationary

loglike(params, *args, **kwargs)

Loglikelihood evaluation

loglikeobs(params[, transformed, ...])

Loglikelihood evaluation

observed_information_matrix(params[, ...])

Observed information matrix

opg_information_matrix(params[, ...])

Outer product of gradients information matrix

prepare_data()

Prepare data for use in the state space representation

score(params, *args, **kwargs)

Compute the score function at params.

score_obs(params[, method, transformed, ...])

Compute the score per observation, evaluated at params

set_conserve_memory([conserve_memory])

Set the memory conservation method

set_filter_method([filter_method])

Set the filtering method

set_inversion_method([inversion_method])

Set the inversion method

set_smoother_output([smoother_output])

Set the smoother output

set_stability_method([stability_method])

Set the numerical stability method

simulate(params, nsimulations[, ...])

Simulate a new time series following the state space model

simulation_smoother([simulation_output])

Retrieve a simulation smoother for the state space model.

smooth(params[, transformed, ...])

Kalman smoothing

transform_jacobian(unconstrained[, ...])

Jacobian matrix for the parameter transformation function

transform_params(unconstrained)

Transform unconstrained parameters used by the optimizer to constrained parameters used in likelihood evaluation

untransform_params(constrained)

Transform constrained parameters used in likelihood evaluation to unconstrained parameters used by the optimizer

update(params[, transformed, ...])

Update the parameters of the model
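A minimal subclass sketch (a local level model patterned after the statsmodels documentation; the parameter names and starting values are illustrative, not part of MLEModel itself):

import numpy as np
from statsmodels.tsa.statespace.mlemodel import MLEModel

class LocalLevel(MLEModel):
    # y_t = mu_t + e_t,   mu_t = mu_{t-1} + w_t
    def __init__(self, endog):
        super().__init__(endog, k_states=1, initialization='diffuse')
        self['design', 0, 0] = 1.0
        self['transition', 0, 0] = 1.0
        self['selection', 0, 0] = 1.0

    @property
    def param_names(self):
        return ['sigma2.measurement', 'sigma2.level']

    @property
    def start_params(self):
        return np.array([np.nanvar(self.endog), np.nanvar(self.endog)])

    def transform_params(self, unconstrained):
        return unconstrained ** 2      # keep the variances positive

    def untransform_params(self, constrained):
        return constrained ** 0.5

    def update(self, params, **kwargs):
        params = super().update(params, **kwargs)
        self['obs_cov', 0, 0] = params[0]
        self['state_cov', 0, 0] = params[1]

y = np.cumsum(np.random.default_rng(0).normal(size=200))
res = LocalLevel(y).fit(disp=False)
print(res.params)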

__getitem__(key)[source]
__init__(endog, k_states, exog=None, dates=None, freq=None, **kwargs)[source]
__module__ = 'statsmodels.tsa.statespace.mlemodel'
__setitem__(key, value)[source]
_clone_from_init_kwds(endog, **kwargs)[source]
_forecasts_error_partial_derivatives(params, transformed=True, includes_fixed=False, approx_complex_step=None, approx_centered=False, res=None, **kwargs)[source]
_get_extension_time_varying_matrices(params, exog, out_of_sample, extend_kwargs=None, transformed=True, includes_fixed=False, **kwargs)[source]

Get updated time-varying state space system matrices

Parameters:
paramsarray_like

Array of parameters used to construct the time-varying system matrices.

exogarray_like or None

New observations of exogenous regressors, if applicable.

out_of_sampleint

Number of new observations required.

extend_kwargsdict, optional

Dictionary of keyword arguments to pass to the state space model constructor. For example, for an SARIMAX state space model, this could be used to pass the concentrate_scale=True keyword argument. Any arguments that are not explicitly set in this dictionary will be copied from the current model instance.

transformedbool, optional

Whether or not start_params is already transformed. Default is True.

includes_fixedbool, optional

If parameters were previously fixed with the fix_params method, this argument describes whether or not start_params also includes the fixed parameters, in addition to the free parameters. Default is False.

_get_index_with_final_state()[source]
_get_init_kwds()[source]

return dictionary with extra keys used in model.__init__

_hessian_complex_step(params, **kwargs)[source]

Hessian matrix computed by second-order complex-step differentiation on the loglike function.

_hessian_finite_difference(params, approx_centered=False, **kwargs)[source]
_hessian_oim(params, **kwargs)[source]

Hessian matrix computed using the Harvey (1989) information matrix

_hessian_opg(params, **kwargs)[source]

Hessian matrix computed using the outer product of gradients information matrix

_hessian_param_defaults = [True, 'approx', None, False]
_hessian_param_names = ['transformed', 'hessian_method', 'approx_complex_step', 'approx_centered']
_loglike_param_defaults = [True, False, False]
_loglike_param_names = ['transformed', 'includes_fixed', 'complex_step']
property _res_classes
_score_complex_step(params, **kwargs)[source]
_score_finite_difference(params, approx_centered=False, **kwargs)[source]
_score_harvey(params, approx_complex_step=True, **kwargs)[source]
_score_obs_harvey(params, approx_complex_step=True, approx_centered=False, includes_fixed=False, **kwargs)[source]

Score

Parameters:
paramsarray_like, optional

Array of parameters at which to evaluate the loglikelihood function.

**kwargs

Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.

Notes

This method is from Harvey (1989), section 3.4.5

References

Harvey, Andrew C. 1990. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

_score_param_defaults = [True, False, 'approx', None, False]
_score_param_names = ['transformed', 'includes_fixed', 'score_method', 'approx_complex_step', 'approx_centered']
_validate_can_fix_params(param_names)[source]
_validate_out_of_sample_exog(exog, out_of_sample)[source]

Validate given exog as satisfactory for out-of-sample operations

Parameters:
exogarray_like or None

New observations of exogenous regressors, if applicable.

out_of_sampleint

Number of new observations required.

Returns:
exogarray or None

A numpy array of shape (out_of_sample, k_exog) if the model contains an exog component, or None if it does not.

_wrap_results(params, result, return_raw, cov_type=None, cov_kwds=None, results_class=None, wrapper_class=None)[source]
clone(endog, exog=None, **kwargs)[source]

Clone state space model with new data and optionally new specification

Parameters:
endogarray_like

The observed time-series process \(y\)

k_statesint

The dimension of the unobserved state process.

exogarray_like, optional

Array of exogenous regressors, shaped nobs x k. Default is no exogenous regressors.

kwargs

Keyword arguments to pass to the new model class to change the model specification.

Returns:
modelMLEModel subclass

Notes

This method must be implemented

filter(params, transformed=True, includes_fixed=False, complex_step=False, cov_type=None, cov_kwds=None, return_ssm=False, results_class=None, results_wrapper_class=None, low_memory=False, **kwargs)[source]

Kalman filtering

Parameters:
paramsarray_like

Array of parameters at which to evaluate the loglikelihood function.

transformedbool, optional

Whether or not params is already transformed. Default is True.

return_ssmbool,optional

Whether or not to return only the state space output or a full results object. Default is to return a full results object.

cov_typestr, optional

See MLEResults.fit for a description of covariance matrix types for results object.

cov_kwdsdict or None, optional

See MLEResults.get_robustcov_results for a description required keywords for alternative covariance estimators

low_memorybool, optional

If set to True, techniques are applied to substantially reduce memory usage. If used, some features of the results object will not be available (including in-sample prediction), although out-of-sample forecasting is possible. Default is False.

**kwargs

Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.

fit(start_params=None, transformed=True, includes_fixed=False, cov_type=None, cov_kwds=None, method='lbfgs', maxiter=50, full_output=1, disp=5, callback=None, return_params=False, optim_score=None, optim_complex_step=None, optim_hessian=None, flags=None, low_memory=False, **kwargs)[source]

Fits the model by maximum likelihood via Kalman filter.

Parameters:
start_paramsarray_like, optional

Initial guess of the solution for the loglikelihood maximization. If None, the default is given by Model.start_params.

transformedbool, optional

Whether or not start_params is already transformed. Default is True.

includes_fixedbool, optional

If parameters were previously fixed with the fix_params method, this argument describes whether or not start_params also includes the fixed parameters, in addition to the free parameters. Default is False.

cov_typestr, optional

The cov_type keyword governs the method for calculating the covariance matrix of parameter estimates. Can be one of:

  • ‘opg’ for the outer product of gradient estimator

  • ‘oim’ for the observed information matrix estimator, calculated using the method of Harvey (1989)

  • ‘approx’ for the observed information matrix estimator, calculated using a numerical approximation of the Hessian matrix.

  • ‘robust’ for an approximate (quasi-maximum likelihood) covariance matrix that may be valid even in the presence of some misspecifications. Intermediate calculations use the ‘oim’ method.

  • ‘robust_approx’ is the same as ‘robust’ except that the intermediate calculations use the ‘approx’ method.

  • ‘none’ for no covariance matrix calculation.

Default is ‘opg’ unless memory conservation is used to avoid computing the loglikelihood values for each observation, in which case the default is ‘approx’.

cov_kwdsdict or None, optional

A dictionary of arguments affecting covariance matrix computation.

opg, oim, approx, robust, robust_approx

  • ‘approx_complex_step’ : bool, optional - If True, numerical approximations are computed using complex-step methods. If False, numerical approximations are computed using finite difference methods. Default is True.

  • ‘approx_centered’ : bool, optional - If True, numerical approximations computed using finite difference methods use a centered approximation. Default is False.

methodstr, optional

The method determines which solver from scipy.optimize is used, and it can be chosen from among the following strings:

  • ‘newton’ for Newton-Raphson

  • ‘nm’ for Nelder-Mead

  • ‘bfgs’ for Broyden-Fletcher-Goldfarb-Shanno (BFGS)

  • ‘lbfgs’ for limited-memory BFGS with optional box constraints

  • ‘powell’ for modified Powell’s method

  • ‘cg’ for conjugate gradient

  • ‘ncg’ for Newton-conjugate gradient

  • ‘basinhopping’ for global basin-hopping solver

The explicit arguments in fit are passed to the solver, with the exception of the basin-hopping solver. Each solver has several optional arguments that are not the same across solvers. See the notes section below (or scipy.optimize) for the available arguments and for the list of explicit arguments that the basin-hopping solver supports.

maxiterint, optional

The maximum number of iterations to perform.

full_outputbool, optional

Set to True to have all available output in the Results object’s mle_retvals attribute. The output is dependent on the solver. See LikelihoodModelResults notes section for more information.

dispbool, optional

Set to True to print convergence messages.

callbackcallable callback(xk), optional

Called after each iteration, as callback(xk), where xk is the current parameter vector.

return_paramsbool, optional

Whether or not to return only the array of maximizing parameters. Default is False.

optim_score{‘harvey’, ‘approx’} or None, optional

The method by which the score vector is calculated. ‘harvey’ uses the method from Harvey (1989), ‘approx’ uses either finite difference or complex step differentiation depending upon the value of optim_complex_step, and None uses the built-in gradient approximation of the optimizer. Default is None. This keyword is only relevant if the optimization method uses the score.

optim_complex_stepbool, optional

Whether or not to use complex step differentiation when approximating the score; if False, finite difference approximation is used. Default is True. This keyword is only relevant if optim_score is set to ‘harvey’ or ‘approx’.

optim_hessian{‘opg’,’oim’,’approx’}, optional

The method by which the Hessian is numerically approximated. ‘opg’ uses outer product of gradients, ‘oim’ uses the information matrix formula from Harvey (1989), and ‘approx’ uses numerical approximation. This keyword is only relevant if the optimization method uses the Hessian matrix.

low_memorybool, optional

If set to True, techniques are applied to substantially reduce memory usage. If used, some features of the results object will not be available (including smoothed results and in-sample prediction), although out-of-sample forecasting is possible. Default is False.

**kwargs

Additional keyword arguments to pass to the optimizer.

Returns:
results

Results object holding results from fitting a state space model.

See also

statsmodels.base.model.LikelihoodModel.fit
statsmodels.tsa.statespace.mlemodel.MLEResults
statsmodels.tsa.statespace.structural.UnobservedComponentsResults
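An illustrative call pattern for fit, assuming statsmodels and a simple SARIMAX model; the optimizer and covariance choices are arbitrary examples of the options listed above:

import numpy as np
import statsmodels.api as sm

endog = np.random.default_rng(1).normal(size=200)
mod = sm.tsa.SARIMAX(endog, order=(1, 0, 0))

# Nelder-Mead optimizer with the observed-information covariance estimator.
res = mod.fit(method='nm', maxiter=500, cov_type='oim', disp=False)
print(res.params)
print(res.bse)
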
fit_constrained(constraints, start_params=None, **fit_kwds)[source]

Fit the model with some parameters subject to equality constraints.

Parameters:
constraintsdict

Dictionary of constraints, of the form param_name: fixed_value. See the param_names property for valid parameter names.

start_paramsarray_like, optional

Initial guess of the solution for the loglikelihood maximization. If None, the default is given by Model.start_params.

**fit_kwdskeyword arguments

fit_kwds are used in the optimization of the remaining parameters.

Returns:
resultsResults instance

Examples

>>> mod = sm.tsa.SARIMAX(endog, order=(1, 0, 1))
>>> res = mod.fit_constrained({'ar.L1': 0.5})
fix_params(params)[source]

Fix parameters to specific values (context manager)

Parameters:
paramsdict

Dictionary describing the fixed parameter values, of the form param_name: fixed_value. See the param_names property for valid parameter names.

Examples

>>> mod = sm.tsa.SARIMAX(endog, order=(1, 0, 1))
>>> with mod.fix_params({'ar.L1': 0.5}):
...     res = mod.fit()
classmethod from_formula(formula, data, subset=None)[source]

Not implemented for state space models

handle_params(params, transformed=True, includes_fixed=False, return_jacobian=False)[source]

Ensure model parameters satisfy shape and other requirements

hessian(params, *args, **kwargs)[source]

Hessian matrix of the likelihood function, evaluated at the given parameters

Parameters:
paramsarray_like

Array of parameters at which to evaluate the hessian.

*args

Additional positional arguments to the loglike method.

**kwargs

Additional keyword arguments to the loglike method.

Returns:
hessianndarray

Hessian matrix evaluated at params

Notes

This is a numerical approximation.

Both args and kwargs are necessary because the optimizer from fit must call this function and only supports passing arguments via args (for example scipy.optimize.fmin_l_bfgs_b).

impulse_responses(params, steps=1, impulse=0, orthogonalized=False, cumulative=False, anchor=None, exog=None, extend_model=None, extend_kwargs=None, transformed=True, includes_fixed=False, **kwargs)[source]

Impulse response function

Parameters:
paramsarray_like

Array of model parameters.

stepsint, optional

The number of steps for which impulse responses are calculated. Default is 1. Note that for time-invariant models, the initial impulse is not counted as a step, so if steps=1, the output will have 2 entries.

impulseint, str or array_like

If an integer, the state innovation to pulse; must be between 0 and k_posdef-1. If a str, it indicates which column of df the unit (1) impulse is given. Alternatively, a custom impulse vector may be provided; must be shaped k_posdef x 1.

orthogonalizedbool, optional

Whether or not to perform impulse using orthogonalized innovations. Note that this will also affect custom impulse vectors. Default is False.

cumulativebool, optional

Whether or not to return cumulative impulse responses. Default is False.

anchorint, str, or datetime, optional

Time point within the sample for the state innovation impulse. Type depends on the index of the given endog in the model. Two special cases are the strings ‘start’ and ‘end’, which refer to setting the impulse at the first and last points of the sample, respectively. Integer values can run from 0 to nobs - 1, or can be negative to apply negative indexing. Finally, if a date/time index was provided to the model, then this argument can be a date string to parse or a datetime type. Default is ‘start’.

exogarray_like, optional

New observations of exogenous regressors for out-of-sample periods, if applicable.

transformedbool, optional

Whether or not params is already transformed. Default is True.

includes_fixedbool, optional

If parameters were previously fixed with the fix_params method, this argument describes whether or not params also includes the fixed parameters, in addition to the free parameters. Default is False.

**kwargs

If the model has time-varying design or transition matrices and the combination of anchor and steps implies creating impulse responses for the out-of-sample period, then these matrices must have updated values provided for the out-of-sample steps. For example, if design is a time-varying component, nobs is 10, anchor=1, and steps is 15, a (k_endog x k_states x 7) matrix must be provided with the new design matrix values.

Returns:
impulse_responsesndarray

Responses for each endogenous variable due to the impulse given by the impulse argument. For a time-invariant model, the impulse responses are given for steps + 1 elements (this gives the “initial impulse” followed by steps responses for the important cases of VAR and SARIMAX models), while for time-varying models the impulse responses are only given for steps elements (to avoid having to unexpectedly provide updated time-varying matrices).

See also

simulate

Simulate a time series according to the given state space model, optionally with specified series for the innovations.

Notes

Intercepts in the measurement and state equation are ignored when calculating impulse responses.

TODO: add an option to allow changing the ordering for the orthogonalized option. Will require permuting matrices when constructing the extended model.
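An illustrative call, assuming statsmodels and a small SARIMAX model; the impulse responses are computed at the fitted parameters:

import numpy as np
import statsmodels.api as sm

endog = np.random.default_rng(0).normal(size=200)
mod = sm.tsa.SARIMAX(endog, order=(1, 0, 0))
res = mod.fit(disp=False)

# Responses of the observed series to a unit state innovation at the sample start.
irf = mod.impulse_responses(res.params, steps=10)
print(irf.shape)  # steps + 1 entries for this time-invariant model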

property initial_variance
property initialization
initialize_approximate_diffuse(variance=None)[source]

Initialize approximate diffuse

initialize_known(initial_state, initial_state_cov)[source]

Initialize known

initialize_statespace(**kwargs)[source]

Initialize the state space representation

Parameters:
**kwargs

Additional keyword arguments to pass to the state space class constructor.

initialize_stationary()[source]

Initialize stationary

loglike(params, *args, **kwargs)[source]

Loglikelihood evaluation

Parameters:
paramsarray_like

Array of parameters at which to evaluate the loglikelihood function.

transformedbool, optional

Whether or not params is already transformed. Default is True.

**kwargs

Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.

See also

update

modifies the internal state of the state space model to reflect new params

Notes

[1] recommend maximizing the average likelihood to avoid scale issues; this is done automatically by the base Model fit method.

References

[1]

Koopman, Siem Jan, Neil Shephard, and Jurgen A. Doornik. 1999. Statistical Algorithms for Models in State Space Using SsfPack 2.2. Econometrics Journal 2 (1): 107-60. doi:10.1111/1368-423X.00023.

property loglikelihood_burn
loglikeobs(params, transformed=True, includes_fixed=False, complex_step=False, **kwargs)[source]

Loglikelihood evaluation

Parameters:
paramsarray_like

Array of parameters at which to evaluate the loglikelihood function.

transformedbool, optional

Whether or not params is already transformed. Default is True.

**kwargs

Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.

See also

update

modifies the internal state of the Model to reflect new params

Notes

[1] recommend maximizing the average likelihood to avoid scale issues; this is done automatically by the base Model fit method.

References

[1]

Koopman, Siem Jan, Neil Shephard, and Jurgen A. Doornik. 1999. Statistical Algorithms for Models in State Space Using SsfPack 2.2. Econometrics Journal 2 (1): 107-60. doi:10.1111/1368-423X.00023.

observed_information_matrix(params, transformed=True, includes_fixed=False, approx_complex_step=None, approx_centered=False, **kwargs)[source]

Observed information matrix

Parameters:
paramsarray_like, optional

Array of parameters at which to evaluate the loglikelihood function.

**kwargs

Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.

Notes

This method is from Harvey (1989), which shows that the information matrix only depends on terms from the gradient. This implementation is therefore partially analytic and partially numeric: it uses the analytic formula for the information matrix together with numerically computed elements of the gradient.

References

Harvey, Andrew C. 1990. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

opg_information_matrix(params, transformed=True, includes_fixed=False, approx_complex_step=None, **kwargs)[source]

Outer product of gradients information matrix

Parameters:
paramsarray_like, optional

Array of parameters at which to evaluate the loglikelihood function.

**kwargs

Additional arguments to the loglikeobs method.

References

Berndt, Ernst R., Bronwyn Hall, Robert Hall, and Jerry Hausman. 1974. Estimation and Inference in Nonlinear Structural Models. NBER Chapters. National Bureau of Economic Research, Inc.

property param_names

(list of str) List of human readable parameter names (for parameters actually included in the model).

prepare_data()[source]

Prepare data for use in the state space representation

score(params, *args, **kwargs)[source]

Compute the score function at params.

Parameters:
paramsarray_like

Array of parameters at which to evaluate the score.

*args

Additional positional arguments to the loglike method.

**kwargs

Additional keyword arguments to the loglike method.

Returns:
scorendarray

Score, evaluated at params.

Notes

This is a numerical approximation, calculated using first-order complex step differentiation on the loglike method.

Both args and kwargs are necessary because the optimizer from fit must call this function and only supports passing arguments via args (for example scipy.optimize.fmin_l_bfgs_b).

score_obs(params, method='approx', transformed=True, includes_fixed=False, approx_complex_step=None, approx_centered=False, **kwargs)[source]

Compute the score per observation, evaluated at params

Parameters:
paramsarray_like

Array of parameters at which to evaluate the score.

**kwargs

Additional arguments to the loglike method.

Returns:
scorendarray

Score per observation, evaluated at params.

Notes

This is a numerical approximation, calculated using first-order complex step differentiation on the loglikeobs method.

set_conserve_memory(conserve_memory=None, **kwargs)[source]

Set the memory conservation method

By default, the Kalman filter computes a number of intermediate matrices at each iteration. The memory conservation options control which of those matrices are stored.

Parameters:
conserve_memoryint, optional

Bitmask value to set the memory conservation method to. See notes for details.

**kwargs

Keyword arguments may be used to influence the memory conservation method by setting individual boolean flags.

Notes

This method is rarely used. See the corresponding function in the KalmanFilter class for details.

set_filter_method(filter_method=None, **kwargs)[source]

Set the filtering method

The filtering method controls aspects of which Kalman filtering approach will be used.

Parameters:
filter_methodint, optional

Bitmask value to set the filter method to. See notes for details.

**kwargs

Keyword arguments may be used to influence the filter method by setting individual boolean flags. See notes for details.

Notes

This method is rarely used. See the corresponding function in the KalmanFilter class for details.

set_inversion_method(inversion_method=None, **kwargs)[source]

Set the inversion method

The Kalman filter may contain one matrix inversion: that of the forecast error covariance matrix. The inversion method controls how and if that inverse is performed.

Parameters:
inversion_methodint, optional

Bitmask value to set the inversion method to. See notes for details.

**kwargs

Keyword arguments may be used to influence the inversion method by setting individual boolean flags. See notes for details.

Notes

This method is rarely used. See the corresponding function in the KalmanFilter class for details.

set_smoother_output(smoother_output=None, **kwargs)[source]

Set the smoother output

The smoother can produce several types of results. The smoother output variable controls which are calculated and returned.

Parameters:
smoother_outputint, optional

Bitmask value to set the smoother output to. See notes for details.

**kwargs

Keyword arguments may be used to influence the smoother output by setting individual boolean flags.

Notes

This method is rarely used. See the corresponding function in the KalmanSmoother class for details.

set_stability_method(stability_method=None, **kwargs)[source]

Set the numerical stability method

The Kalman filter is a recursive algorithm that may in some cases suffer issues with numerical stability. The stability method controls what, if any, measures are taken to promote stability.

Parameters:
stability_methodint, optional

Bitmask value to set the stability method to. See notes for details.

**kwargs

Keyword arguments may be used to influence the stability method by setting individual boolean flags. See notes for details.

Notes

This method is rarely used. See the corresponding function in the KalmanFilter class for details.

simulate(params, nsimulations, measurement_shocks=None, state_shocks=None, initial_state=None, anchor=None, repetitions=None, exog=None, extend_model=None, extend_kwargs=None, transformed=True, includes_fixed=False, pretransformed_measurement_shocks=True, pretransformed_state_shocks=True, pretransformed_initial_state=True, random_state=None, **kwargs)[source]

Simulate a new time series following the state space model

Parameters:
paramsarray_like

Array of parameters to use in constructing the state space representation to use when simulating.

nsimulationsint

The number of observations to simulate. If the model is time-invariant this can be any number. If the model is time-varying, then this number must be less than or equal to the number of observations.

measurement_shocksarray_like, optional

If specified, these are the shocks to the measurement equation, \(\varepsilon_t\). If unspecified, these are automatically generated using a pseudo-random number generator. If specified, must be shaped nsimulations x k_endog, where k_endog is the same as in the state space model.

state_shocksarray_like, optional

If specified, these are the shocks to the state equation, \(\eta_t\). If unspecified, these are automatically generated using a pseudo-random number generator. If specified, must be shaped nsimulations x k_posdef where k_posdef is the same as in the state space model.

initial_statearray_like, optional

If specified, this is the initial state vector to use in simulation, which should be shaped (k_states x 1), where k_states is the same as in the state space model. If unspecified, but the model has been initialized, then that initialization is used. This must be specified if anchor is anything other than “start” or 0 (or else you can use the simulate method on a results object rather than on the model object).

anchorint, str, or datetime, optional

First period for simulation. The simulation will be conditional on all existing datapoints prior to the anchor. Type depends on the index of the given endog in the model. Two special cases are the strings ‘start’ and ‘end’. start refers to beginning the simulation at the first period of the sample, and end refers to beginning the simulation at the first period after the sample. Integer values can run from 0 to nobs, or can be negative to apply negative indexing. Finally, if a date/time index was provided to the model, then this argument can be a date string to parse or a datetime type. Default is ‘start’.

repetitionsint, optional

Number of simulated paths to generate. Default is 1 simulated path.

exogarray_like, optional

New observations of exogenous regressors, if applicable.

transformedbool, optional

Whether or not params is already transformed. Default is True.

includes_fixedbool, optional

If parameters were previously fixed with the fix_params method, this argument describes whether or not params also includes the fixed parameters, in addition to the free parameters. Default is False.

pretransformed_measurement_shocksbool, optional

If measurement_shocks is provided, this flag indicates whether it should be directly used as the shocks. If False, then it is assumed to contain draws from the standard Normal distribution that must be transformed using the obs_cov covariance matrix. Default is True.

pretransformed_state_shocksbool, optional

If state_shocks is provided, this flag indicates whether it should be directly used as the shocks. If False, then it is assumed to contain draws from the standard Normal distribution that must be transformed using the state_cov covariance matrix. Default is True.

pretransformed_initial_statebool, optional

If initial_state is provided, this flag indicates whether it should be directly used as the initial_state. If False, then it is assumed to contain draws from the standard Normal distribution that must be transformed using the initial_state_cov covariance matrix. Default is True.

random_state{None, int, Generator, RandomState}, optional

If random_state is None (or np.random), the numpy.random.RandomState singleton is used. If random_state is an int, a new numpy.random.RandomState instance is created, seeded with that value. If random_state is already a numpy.random.Generator or numpy.random.RandomState instance, then that instance is used.

Returns:
simulated_obsndarray

An array of simulated observations. If repetitions=None, then it will be shaped (nsimulations x k_endog) or (nsimulations,) if k_endog=1. Otherwise it will be shaped (nsimulations x k_endog x repetitions). If the model was given Pandas input then the output will be a Pandas object. If k_endog > 1 and repetitions is not None, then the output will be a Pandas DataFrame that has a MultiIndex for the columns, with the first level containing the names of the endog variables and the second level containing the repetition number.

See also

impulse_responses

Impulse response functions
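
A minimal usage sketch, assuming a statsmodels SARIMAX model on synthetic data (any MLEModel subclass exposing simulate works the same way):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))          # synthetic random-walk-like series

mod = sm.tsa.SARIMAX(y, order=(1, 0, 0))
res = mod.fit(disp=False)

# Three simulated paths of 50 observations, starting at the beginning of
# the sample (default anchor) and using the estimated parameters.
sims = mod.simulate(res.params, nsimulations=50, repetitions=3)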

simulation_smoother(simulation_output=None, **kwargs)[source]

Retrieve a simulation smoother for the state space model.

Parameters:
simulation_outputint, optional

Determines which simulation smoother output is calculated. Default is all (including state and disturbances).

**kwargs

Additional keyword arguments, used to set the simulation output. See set_simulation_output for more details.

Returns:
SimulationSmoothResults
smooth(params, transformed=True, includes_fixed=False, complex_step=False, cov_type=None, cov_kwds=None, return_ssm=False, results_class=None, results_wrapper_class=None, **kwargs)[source]

Kalman smoothing

Parameters:
paramsarray_like

Array of parameters at which to evaluate the loglikelihood function.

transformedbool, optional

Whether or not params is already transformed. Default is True.

return_ssmbool, optional

Whether or not to return only the state space output or a full results object. Default is to return a full results object.

cov_typestr, optional

See MLEResults.fit for a description of covariance matrix types for results object.

cov_kwdsdict or None, optional

See MLEResults.get_robustcov_results for a description of the required keywords for alternative covariance estimators.

**kwargs

Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.

property start_params

(array) Starting parameters for maximum likelihood estimation.

property state_names

(list of str) List of human readable names for unobserved states.

property tolerance
transform_jacobian(unconstrained, approx_centered=False)[source]

Jacobian matrix for the parameter transformation function

Parameters:
unconstrainedarray_like

Array of unconstrained parameters used by the optimizer.

Returns:
jacobianndarray

Jacobian matrix of the transformation, evaluated at unconstrained

See also

transform_params

Notes

This is a numerical approximation using finite differences. Note that in general complex step methods cannot be used because it is not guaranteed that the transform_params method is a real function (e.g. if Cholesky decomposition is used).

transform_params(unconstrained)[source]

Transform unconstrained parameters used by the optimizer to constrained parameters used in likelihood evaluation

Parameters:
unconstrainedarray_like

Array of unconstrained parameters used by the optimizer, to be transformed.

Returns:
constrainedarray_like

Array of constrained parameters which may be used in likelihood evaluation.

Notes

This is a no-op in the base class; subclasses should override it where appropriate.

untransform_params(constrained)[source]

Transform constrained parameters used in likelihood evaluation to unconstrained parameters used by the optimizer

Parameters:
constrainedarray_like

Array of constrained parameters used in likelihood evaluation, to be transformed.

Returns:
unconstrainedarray_like

Array of unconstrained parameters used by the optimizer.

Notes

This is a no-op in the base class; subclasses should override it where appropriate.
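
A minimal sketch of the usual override pattern in a subclass; the class name is hypothetical, and the state space setup, update, and other required pieces are omitted:

import statsmodels.api as sm

class MyModel(sm.tsa.statespace.MLEModel):
    # Only the parameter-transformation pair is shown here.

    def transform_params(self, unconstrained):
        # The optimizer works on the real line; squaring keeps a variance positive.
        return unconstrained ** 2

    def untransform_params(self, constrained):
        # Inverse mapping back to the unconstrained space.
        return constrained ** 0.5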

update(params, transformed=True, includes_fixed=False, complex_step=False)[source]

Update the parameters of the model

Parameters:
paramsarray_like

Array of new parameters.

transformedbool, optional

Whether or not params is already transformed. If set to False, transform_params is called. Default is True.

Returns:
paramsarray_like

Array of parameters.

Notes

Since Model is a base class, this method should be overridden by subclasses to perform actual updating steps.

class Pipeline(steps, *, memory=None, verbose=False)[source]

Bases: _BaseComposition

A sequence of data transformers with an optional final predictor.

Pipeline allows you to sequentially apply a list of transformers to preprocess the data and, if desired, conclude the sequence with a final predictor for predictive modeling.

Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit. The transformers in the pipeline can be cached using the memory argument.

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below. A step’s estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer removed by setting it to ‘passthrough’ or None.

For an example use case of Pipeline combined with GridSearchCV, refer to Selecting dimensionality reduction with Pipeline and GridSearchCV. The example Pipelining: chaining a PCA and a logistic regression shows how to grid search on a pipeline using ‘__’ as a separator in the parameter names.

Read more in the User Guide.

New in version 0.5.

Parameters:
stepslist of tuples

List of (name of step, estimator) tuples that are to be chained in sequential order. To be compatible with the scikit-learn API, all steps must define fit. All non-last steps must also define transform. See Combining Estimators for more details.

memorystr or object with the joblib.Memory interface, default=None

Used to cache the fitted transformers of the pipeline. The last step will never be cached, even if it is a transformer. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.

verbosebool, default=False

If True, the time elapsed while fitting each step will be printed as it is completed.

See also

make_pipeline

Convenience function for simplified pipeline construction.

Examples

>>> from sklearn.svm import SVC
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...                                                     random_state=0)
>>> pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
>>> # The pipeline can be used as any other estimator
>>> # and avoids leaking the test set into the train set
>>> pipe.fit(X_train, y_train).score(X_test, y_test)
0.88
>>> # An estimator's parameter can be set using '__' syntax
>>> pipe.set_params(svc__C=10).fit(X_train, y_train).score(X_test, y_test)
0.76
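
A small sketch of the memory caching option described above (the temporary cache directory and step choices are illustrative):

from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(random_state=0)
cachedir = mkdtemp()

# Fitted transformers are cached in cachedir, so refitting with identical
# parameters (e.g. inside a grid search) reuses the cached PCA fit.
# The final estimator is never cached.
pipe = Pipeline([('reduce_dim', PCA(n_components=5)),
                 ('clf', LogisticRegression())],
                memory=cachedir)
pipe.fit(X, y)
rmtree(cachedir)    # remove the cache directory when finished
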
Attributes:
named_stepsBunch

Access the steps by name.

classes_ndarray of shape (n_classes,)

The class labels.

n_features_in_int

Number of features seen during first step fit method.

feature_names_in_ndarray of shape (n_features_in_,)

Names of features seen during first step fit method.

Methods

decision_function(X, **params)

Transform the data, and apply decision_function with the final estimator.

fit(X[, y])

Fit the model.

fit_predict(X[, y])

Transform the data, and apply fit_predict with the final estimator.

fit_transform(X[, y])

Fit the model and transform with the final estimator.

get_feature_names_out([input_features])

Get output feature names for transformation.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

inverse_transform(Xt, **params)

Apply inverse_transform for each step in a reverse order.

predict(X, **params)

Transform the data, and apply predict with the final estimator.

predict_log_proba(X, **params)

Transform the data, and apply predict_log_proba with the final estimator.

predict_proba(X, **params)

Transform the data, and apply predict_proba with the final estimator.

score(X[, y, sample_weight])

Transform the data, and apply score with the final estimator.

score_samples(X)

Transform the data, and apply score_samples with the final estimator.

set_output(*[, transform])

Set the output container when "transform" and "fit_transform" are called.

set_params(**kwargs)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

transform(X, **params)

Transform the data, and apply transform with the final estimator.

__abstractmethods__ = frozenset({})
__annotations__ = {'_parameter_constraints': <class 'dict'>, 'steps': 'List[Any]'}
__doc__ = "\n    A sequence of data transformers with an optional final predictor.\n\n    `Pipeline` allows you to sequentially apply a list of transformers to\n    preprocess the data and, if desired, conclude the sequence with a final\n    :term:`predictor` for predictive modeling.\n\n    Intermediate steps of the pipeline must be 'transforms', that is, they\n    must implement `fit` and `transform` methods.\n    The final :term:`estimator` only needs to implement `fit`.\n    The transformers in the pipeline can be cached using ``memory`` argument.\n\n    The purpose of the pipeline is to assemble several steps that can be\n    cross-validated together while setting different parameters. For this, it\n    enables setting parameters of the various steps using their names and the\n    parameter name separated by a `'__'`, as in the example below. A step's\n    estimator may be replaced entirely by setting the parameter with its name\n    to another estimator, or a transformer removed by setting it to\n    `'passthrough'` or `None`.\n\n    For an example use case of `Pipeline` combined with\n    :class:`~sklearn.model_selection.GridSearchCV`, refer to\n    :ref:`sphx_glr_auto_examples_compose_plot_compare_reduction.py`. The\n    example :ref:`sphx_glr_auto_examples_compose_plot_digits_pipe.py` shows how\n    to grid search on a pipeline using `'__'` as a separator in the parameter names.\n\n    Read more in the :ref:`User Guide <pipeline>`.\n\n    .. versionadded:: 0.5\n\n    Parameters\n    ----------\n    steps : list of tuples\n        List of (name of step, estimator) tuples that are to be chained in\n        sequential order. To be compatible with the scikit-learn API, all steps\n        must define `fit`. All non-last steps must also define `transform`. See\n        :ref:`Combining Estimators <combining_estimators>` for more details.\n\n    memory : str or object with the joblib.Memory interface, default=None\n        Used to cache the fitted transformers of the pipeline. The last step\n        will never be cached, even if it is a transformer. By default, no\n        caching is performed. If a string is given, it is the path to the\n        caching directory. Enabling caching triggers a clone of the transformers\n        before fitting. Therefore, the transformer instance given to the\n        pipeline cannot be inspected directly. Use the attribute ``named_steps``\n        or ``steps`` to inspect estimators within the pipeline. Caching the\n        transformers is advantageous when fitting is time consuming.\n\n    verbose : bool, default=False\n        If True, the time elapsed while fitting each step will be printed as it\n        is completed.\n\n    Attributes\n    ----------\n    named_steps : :class:`~sklearn.utils.Bunch`\n        Dictionary-like object, with the following attributes.\n        Read-only attribute to access any step parameter by user given name.\n        Keys are step names and values are steps parameters.\n\n    classes_ : ndarray of shape (n_classes,)\n        The classes labels. Only exist if the last step of the pipeline is a\n        classifier.\n\n    n_features_in_ : int\n        Number of features seen during :term:`fit`. Only defined if the\n        underlying first estimator in `steps` exposes such an attribute\n        when fit.\n\n        .. versionadded:: 0.24\n\n    feature_names_in_ : ndarray of shape (`n_features_in_`,)\n        Names of features seen during :term:`fit`. 
Only defined if the\n        underlying estimator exposes such an attribute when fit.\n\n        .. versionadded:: 1.0\n\n    See Also\n    --------\n    make_pipeline : Convenience function for simplified pipeline construction.\n\n    Examples\n    --------\n    >>> from sklearn.svm import SVC\n    >>> from sklearn.preprocessing import StandardScaler\n    >>> from sklearn.datasets import make_classification\n    >>> from sklearn.model_selection import train_test_split\n    >>> from sklearn.pipeline import Pipeline\n    >>> X, y = make_classification(random_state=0)\n    >>> X_train, X_test, y_train, y_test = train_test_split(X, y,\n    ...                                                     random_state=0)\n    >>> pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])\n    >>> # The pipeline can be used as any other estimator\n    >>> # and avoids leaking the test set into the train set\n    >>> pipe.fit(X_train, y_train).score(X_test, y_test)\n    0.88\n    >>> # An estimator's parameter can be set using '__' syntax\n    >>> pipe.set_params(svc__C=10).fit(X_train, y_train).score(X_test, y_test)\n    0.76\n    "
__getitem__(ind)[source]

Returns a sub-pipeline or a single estimator in the pipeline

Indexing with an integer will return an estimator; using a slice returns another Pipeline instance which copies a slice of this Pipeline. This copy is shallow: modifying (or fitting) estimators in the sub-pipeline will affect the larger pipeline and vice-versa. However, replacing a value in step will not affect a copy.
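
For example, reusing the pipe object from the Examples section above (an illustrative sketch):

sub = pipe[:-1]          # a new Pipeline containing every step except the final SVC
first = pipe[0]          # the StandardScaler instance (indexing by position)
same = pipe['scaler']    # the same step, retrieved by name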

__init__(steps, *, memory=None, verbose=False)[source]
__len__()[source]

Returns the length of the Pipeline

__module__ = 'sklearn.pipeline'
__sklearn_is_fitted__()[source]

Indicate whether pipeline has been fit.

_abc_impl = <_abc._abc_data object>
_can_fit_transform()[source]
_can_inverse_transform()[source]
_can_transform()[source]
_check_method_params(method, props, **kwargs)[source]
property _estimator_type
property _final_estimator
_fit(X, y=None, routed_params=None)[source]
_iter(with_final=True, filter_passthrough=True)[source]

Generate (idx, (name, trans)) tuples from self.steps

When filter_passthrough is True, ‘passthrough’ and None transformers are filtered out.

_log_message(step_idx)[source]
_more_tags()[source]
_parameter_constraints: dict = {'memory': [None, <class 'str'>, <sklearn.utils._param_validation.HasMethods object>], 'steps': [<class 'list'>, <sklearn.utils._param_validation.Hidden object>], 'verbose': ['boolean']}
_required_parameters = ['steps']
_sk_visual_block_()[source]
_validate_steps()[source]
property classes_

The class labels. Only exists if the last step of the pipeline is a classifier.

decision_function(X, **params)[source]

Transform the data, and apply decision_function with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls decision_function method. Only valid if the final estimator implements decision_function.

Parameters:
Xiterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

**paramsdict of string -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

New in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:
y_scorendarray of shape (n_samples, n_classes)

Result of calling decision_function on the final estimator.

property feature_names_in_

Names of features seen during first step fit method.

fit(X, y=None, **params)[source]

Fit the model.

Fit all the transformers one after the other and sequentially transform the data. Finally, fit the transformed data using the final estimator.

Parameters:
Xiterable

Training data. Must fulfill input requirements of first step of the pipeline.

yiterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**paramsdict of str -> object
  • If enable_metadata_routing=False (default):

    Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

  • If enable_metadata_routing=True:

    Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True is set via set_config().

See Metadata Routing User Guide for more details.

Returns:
selfobject

Pipeline with fitted steps.

fit_predict(X, y=None, **params)[source]

Transform the data, and apply fit_predict with the final estimator.

Call fit_transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls fit_predict method. Only valid if the final estimator implements fit_predict.

Parameters:
Xiterable

Training data. Must fulfill input requirements of first step of the pipeline.

yiterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**paramsdict of str -> object
  • If enable_metadata_routing=False (default):

    Parameters to the predict called at the end of all transformations in the pipeline.

  • If enable_metadata_routing=True:

    Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

New in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See Metadata Routing User Guide for more details.

Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

Returns:
y_predndarray

Result of calling fit_predict on the final estimator.

fit_transform(X, y=None, **params)[source]

Fit the model and transform with the final estimator.

Fit all the transformers one after the other and sequentially transform the data. Only valid if the final estimator either implements fit_transform or fit and transform.

Parameters:
Xiterable

Training data. Must fulfill input requirements of first step of the pipeline.

yiterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**paramsdict of str -> object
  • If enable_metadata_routing=False (default):

    Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

  • If enable_metadata_routing=True:

    Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See Metadata Routing User Guide for more details.

Returns:
Xtndarray of shape (n_samples, n_transformed_features)

Transformed samples.

get_feature_names_out(input_features=None)[source]

Get output feature names for transformation.

Transform input features using the pipeline.

Parameters:
input_featuresarray-like of str or None, default=None

Input features.

Returns:
feature_names_outndarray of str objects

Transformed feature names.

get_metadata_routing()[source]

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRouter

A MetadataRouter encapsulating routing information.

get_params(deep=True)[source]

Get parameters for this estimator.

Returns the parameters given in the constructor as well as the estimators contained within the steps of the Pipeline.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsmapping of string to any

Parameter names mapped to their values.

inverse_transform(Xt, **params)[source]

Apply inverse_transform for each step in a reverse order.

All estimators in the pipeline must support inverse_transform.

Parameters:
Xtarray-like of shape (n_samples, n_transformed_features)

Data samples, where n_samples is the number of samples and n_features is the number of features. Must fulfill input requirements of last step of pipeline’s inverse_transform method.

**paramsdict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

New in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:
Xtndarray of shape (n_samples, n_features)

Inverse transformed data, that is, data in the original feature space.

property n_features_in_

Number of features seen during first step fit method.

property named_steps

Access the steps by name.

Read-only attribute to access any step by given name. Keys are steps names and values are the steps objects.

predict(X, **params)[source]

Transform the data, and apply predict with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict method. Only valid if the final estimator implements predict.

Parameters:
Xiterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

**paramsdict of str -> object
  • If enable_metadata_routing=False (default):

    Parameters to the predict called at the end of all transformations in the pipeline.

  • If enable_metadata_routing=True:

    Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

New in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True is set via set_config().

See Metadata Routing User Guide for more details.

Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

Returns:
y_predndarray

Result of calling predict on the final estimator.

predict_log_proba(X, **params)[source]

Transform the data, and apply predict_log_proba with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict_log_proba method. Only valid if the final estimator implements predict_log_proba.

Parameters:
Xiterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

**paramsdict of str -> object
  • If enable_metadata_routing=False (default):

    Parameters to the predict_log_proba called at the end of all transformations in the pipeline.

  • If enable_metadata_routing=True:

    Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

New in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See Metadata Routing User Guide for more details.

Returns:
y_log_probandarray of shape (n_samples, n_classes)

Result of calling predict_log_proba on the final estimator.

predict_proba(X, **params)[source]

Transform the data, and apply predict_proba with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict_proba method. Only valid if the final estimator implements predict_proba.

Parameters:
Xiterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

**paramsdict of str -> object
  • If enable_metadata_routing=False (default):

    Parameters to the predict_proba called at the end of all transformations in the pipeline.

  • If enable_metadata_routing=True:

    Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

New in version 0.20.

Changed in version 1.4: Parameters are now passed to the transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.

See Metadata Routing User Guide for more details.

Returns:
y_probandarray of shape (n_samples, n_classes)

Result of calling predict_proba on the final estimator.

score(X, y=None, sample_weight=None, **params)[source]

Transform the data, and apply score with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls score method. Only valid if the final estimator implements score.

Parameters:
Xiterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

yiterable, default=None

Targets used for scoring. Must fulfill label requirements for all steps of the pipeline.

sample_weightarray-like, default=None

If not None, this argument is passed as sample_weight keyword argument to the score method of the final estimator.

**paramsdict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

New in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:
scorefloat

Result of calling score on the final estimator.

score_samples(X)[source]

Transform the data, and apply score_samples with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls score_samples method. Only valid if the final estimator implements score_samples.

Parameters:
Xiterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

Returns:
y_scorendarray of shape (n_samples,)

Result of calling score_samples on the final estimator.

set_output(*, transform=None)[source]

Set the output container when “transform” and “fit_transform” are called.

Calling set_output will set the output of all estimators in steps.

Parameters:
transform{“default”, “pandas”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:
selfestimator instance

Estimator instance.
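
A short sketch of requesting pandas output, assuming a scikit-learn version that supports set_output (the column names and data are illustrative):

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = pd.DataFrame({'a': [0.0, 1.0, 2.0], 'b': [1.0, 1.0, 3.0]})

pipe = Pipeline([('scaler', StandardScaler())]).set_output(transform='pandas')
out = pipe.fit_transform(X)      # a DataFrame that keeps the columns 'a' and 'b'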

set_params(**kwargs)[source]

Set the parameters of this estimator.

Valid parameter keys can be listed with get_params(). Note that you can directly set the parameters of the estimators contained in steps.

Parameters:
**kwargsdict

Parameters of this estimator or parameters of estimators contained in steps. Parameters of the steps may be set using its name and the parameter name separated by a ‘__’.

Returns:
selfobject

Pipeline class instance.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') Pipeline

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
selfobject

The updated object.

transform(X, **params)[source]

Transform the data, and apply transform with the final estimator.

Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls transform method. Only valid if the final estimator implements transform.

This also works where final estimator is None in which case all prior transformations are applied.

Parameters:
Xiterable

Data to transform. Must fulfill input requirements of first step of the pipeline.

**paramsdict of str -> object

Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.

New in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:
Xtndarray of shape (n_samples, n_transformed_features)

Transformed data.

class RLM(endog, exog, M=None, missing='none', **kwargs)[source]

Bases: LikelihoodModel

Robust Linear Model

Estimate a robust linear model via iteratively reweighted least squares given a robust criterion estimator.

Parameters:
endogarray_like

A 1-d endogenous response variable. The dependent variable.

exogarray_like

A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user. See statsmodels.tools.add_constant().

Mstatsmodels.robust.norms.RobustNorm, optional

The robust criterion function for downweighting outliers. The current options are LeastSquares, HuberT, RamsayE, AndrewWave, TrimmedMean, Hampel, and TukeyBiweight. The default is HuberT(). See statsmodels.robust.norms for more information.

missingstr

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.

Examples

>>> import statsmodels.api as sm
>>> data = sm.datasets.stackloss.load()
>>> data.exog = sm.add_constant(data.exog)
>>> rlm_model = sm.RLM(data.endog, data.exog,
...                    M=sm.robust.norms.HuberT())
>>> rlm_results = rlm_model.fit()
>>> rlm_results.params
array([  0.82938433,   0.92606597,  -0.12784672, -41.02649835])
>>> rlm_results.bse
array([ 0.11100521,  0.30293016,  0.12864961,  9.79189854])
>>> rlm_results_HC2 = rlm_model.fit(cov="H2")
>>> rlm_results_HC2.params
array([  0.82938433,   0.92606597,  -0.12784672, -41.02649835])
>>> rlm_results_HC2.bse
array([ 0.11945975,  0.32235497,  0.11796313,  9.08950419])
>>> mod = sm.RLM(data.endog, data.exog, M=sm.robust.norms.Hampel())
>>> rlm_hamp_hub = mod.fit(scale_est=sm.robust.scale.HuberScale())
>>> rlm_hamp_hub.params
array([  0.73175452,   1.25082038,  -0.14794399, -40.27122257])
Attributes:
df_modelfloat

The degrees of freedom of the model. The number of regressors p less one for the intercept. Note that the reported model degrees of freedom does not count the intercept as a regressor, though the model is assumed to have an intercept.

df_residfloat

The residual degrees of freedom. The number of observations n less the number of regressors p. Note that here p does include the intercept as using a degree of freedom.

endogndarray

See above. Note that endog is a reference to the data so that if data is already an array and it is changed, then endog changes as well.

exogndarray

See above. Note that exog is a reference to the data so that if data is already an array and it is changed, then exog changes as well.

Mstatsmodels.robust.norms.RobustNorm

See above. Robust estimator instance instantiated.

nobsfloat

The number of observations n

pinv_wexogndarray

The pseudoinverse of the design / exogenous data array. Note that RLM has no whiten method, so this is just the pseudo inverse of the design.

normalized_cov_paramsndarray

The p x p normalized covariance of the design / exogenous data. This is approximately equal to (X.T X)^(-1)

Methods

deviance(tmp_results)

Returns the (unnormalized) log-likelihood from the M estimator.

fit([maxiter, tol, scale_est, init, cov, ...])

Fits the model using iteratively reweighted least squares.

information(params)

Fisher information matrix of model.

loglike(params)

Log-likelihood of model.

predict(params[, exog])

Return linear predicted values from a design matrix.

score(params)

Score vector of model.

__annotations__ = {}
__init__(endog, exog, M=None, missing='none', **kwargs)[source]
__module__ = 'statsmodels.robust.robust_linear_model'
_estimate_scale(resid)[source]

Estimates the scale based on the option provided to the fit method.

_initialize()[source]

Initializes the model for the IRLS fit.

Resets the history and number of iterations.

_update_history(tmp_results, history, conv)[source]
deviance(tmp_results)[source]

Returns the (unnormalized) log-likelihood from the M estimator.

fit(maxiter=50, tol=1e-08, scale_est='mad', init=None, cov='H1', update_scale=True, conv='dev', start_params=None)[source]

Fits the model using iteratively reweighted least squares.

The IRLS routine runs until the specified objective converges to tol or maxiter has been reached.

Parameters:
convstr

Indicates the convergence criteria. Available options are “coefs” (the coefficients), “weights” (the weights in the iteration), “sresid” (the standardized residuals), and “dev” (the un-normalized log-likelihood for the M estimator). The default is “dev”.

covstr, optional

‘H1’, ‘H2’, or ‘H3’. Indicates how the covariance matrix is estimated. Default is ‘H1’. See rlm.RLMResults for more information.

initstr

Specifies method for the initial estimates of the parameters. Default is None, which means that the least squares estimate is used. Currently it is the only available choice.

maxiterint

The maximum number of iterations to try. Default is 50.

scale_eststr or HuberScale()

‘mad’ or HuberScale(). Indicates the estimate to use for scaling the weights in the IRLS. The default is ‘mad’ (median absolute deviation). The other option is ‘HuberScale’ for Huber’s proposal 2, which has optional keyword arguments d, tol, and maxiter for specifying the tuning constant, the convergence tolerance, and the maximum number of iterations. See statsmodels.robust.scale for more information.

tolfloat

The convergence tolerance of the estimate. Default is 1e-8.

update_scalebool

If update_scale is False then the scale estimate for the weights is held constant over the iteration. Otherwise, it is updated for each fit in the iteration. Default is True.

start_paramsarray_like, optional

Initial guess of the solution of the optimizer. If not provided, the initial parameters are computed using OLS.

Returns:
resultsstatsmodels.rlm.RLMresults

Results instance
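
A brief sketch combining several of these options (the Tukey biweight norm and option values are illustrative choices, not defaults):

import statsmodels.api as sm

data = sm.datasets.stackloss.load()
exog = sm.add_constant(data.exog)

mod = sm.RLM(data.endog, exog, M=sm.robust.norms.TukeyBiweight())
# Converge on the coefficients, hold the scale estimate fixed, and use the
# H2 covariance estimator.
res = mod.fit(conv='coefs', update_scale=False, cov='H2', maxiter=100)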

information(params)[source]

Fisher information matrix of model.

Returns -1 * Hessian of the log-likelihood evaluated at params.

Parameters:
paramsndarray

The model parameters.

loglike(params)[source]

Log-likelihood of model.

Parameters:
paramsndarray

The model parameters used to compute the log-likelihood.

Notes

Must be overridden by subclasses.

predict(params, exog=None)[source]

Return linear predicted values from a design matrix.

Parameters:
paramsarray_like

Parameters of a linear model

exogarray_like, optional.

Design / exogenous data. Model exog is used if None.

Returns:
An array of fitted values.

score(params)[source]

Score vector of model.

The gradient of logL with respect to each parameter.

Parameters:
paramsndarray

The parameters at which to evaluate the score.

Returns:
ndarray

The score vector evaluated at the parameters.

class StandardScaler(*, copy=True, with_mean=True, with_std=True)[source]

Bases: OneToOneFeatureMixin, TransformerMixin, BaseEstimator

Standardize features by removing the mean and scaling to unit variance.

The standard score of a sample x is calculated as:

z = (x - u) / s

where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using transform().

Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).

For instance many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.

StandardScaler is sensitive to outliers, and the features may scale differently from each other in the presence of outliers. For an example visualization, refer to Compare StandardScaler with other scalers.

This scaler can also be applied to sparse CSR or CSC matrices by passing with_mean=False to avoid breaking the sparsity structure of the data.

Read more in the User Guide.

Parameters:
copybool, default=True

If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.

with_meanbool, default=True

If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

with_stdbool, default=True

If True, scale the data to unit variance (or equivalently, unit standard deviation).

See also

scale

Equivalent function without the estimator API.

PCA

Further removes the linear correlation across features with ‘whiten=True’.

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.

We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to affect model performance.

Examples

>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler()
>>> print(scaler.mean_)
[0.5 0.5]
>>> print(scaler.transform(data))
[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]
>>> print(scaler.transform([[2, 2]]))
[[3. 3.]]
Attributes:
scale_ndarray of shape (n_features,) or None

Per feature relative scaling of the data to achieve zero mean and unit variance. Generally this is calculated using np.sqrt(var_). If a variance is zero, we can’t achieve unit variance, and the data is left as-is, giving a scaling factor of 1. scale_ is equal to None when with_std=False.

New in version 0.17: scale_

mean_ndarray of shape (n_features,) or None

The mean value for each feature in the training set. Equal to None when with_mean=False and with_std=False.

var_ndarray of shape (n_features,) or None

The variance for each feature in the training set. Used to compute scale_. Equal to None when with_mean=False and with_std=False.

n_features_in_int

Number of features seen during fit.

New in version 0.24.

feature_names_in_ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

New in version 1.0.

n_samples_seen_int or ndarray of shape (n_features,)

The number of samples processed by the estimator for each feature. If there are no missing samples, the n_samples_seen will be an integer, otherwise it will be an array of dtype int. If sample_weights are used it will be a float (if no missing data) or an array of dtype float that sums the weights seen so far. Will be reset on new calls to fit, but increments across partial_fit calls.

Methods

fit(X[, y, sample_weight])

Compute the mean and std to be used for later scaling.

inverse_transform(X[, copy])

Scale back the data to the original representation.

partial_fit(X[, y, sample_weight])

Online computation of mean and std on X for later scaling.

set_fit_request(*[, sample_weight])

Request metadata passed to the fit method.

set_inverse_transform_request(*[, copy])

Request metadata passed to the inverse_transform method.

set_partial_fit_request(*[, sample_weight])

Request metadata passed to the partial_fit method.

set_transform_request(*[, copy])

Request metadata passed to the transform method.

transform(X[, copy])

Perform standardization by centering and scaling.

__annotations__ = {'_parameter_constraints': <class 'dict'>}
__doc__ = "Standardize features by removing the mean and scaling to unit variance.\n\n    The standard score of a sample `x` is calculated as:\n\n        z = (x - u) / s\n\n    where `u` is the mean of the training samples or zero if `with_mean=False`,\n    and `s` is the standard deviation of the training samples or one if\n    `with_std=False`.\n\n    Centering and scaling happen independently on each feature by computing\n    the relevant statistics on the samples in the training set. Mean and\n    standard deviation are then stored to be used on later data using\n    :meth:`transform`.\n\n    Standardization of a dataset is a common requirement for many\n    machine learning estimators: they might behave badly if the\n    individual features do not more or less look like standard normally\n    distributed data (e.g. Gaussian with 0 mean and unit variance).\n\n    For instance many elements used in the objective function of\n    a learning algorithm (such as the RBF kernel of Support Vector\n    Machines or the L1 and L2 regularizers of linear models) assume that\n    all features are centered around 0 and have variance in the same\n    order. If a feature has a variance that is orders of magnitude larger\n    than others, it might dominate the objective function and make the\n    estimator unable to learn from other features correctly as expected.\n\n    `StandardScaler` is sensitive to outliers, and the features may scale\n    differently from each other in the presence of outliers. For an example\n    visualization, refer to :ref:`Compare StandardScaler with other scalers\n    <plot_all_scaling_standard_scaler_section>`.\n\n    This scaler can also be applied to sparse CSR or CSC matrices by passing\n    `with_mean=False` to avoid breaking the sparsity structure of the data.\n\n    Read more in the :ref:`User Guide <preprocessing_scaler>`.\n\n    Parameters\n    ----------\n    copy : bool, default=True\n        If False, try to avoid a copy and do inplace scaling instead.\n        This is not guaranteed to always work inplace; e.g. if the data is\n        not a NumPy array or scipy.sparse CSR matrix, a copy may still be\n        returned.\n\n    with_mean : bool, default=True\n        If True, center the data before scaling.\n        This does not work (and will raise an exception) when attempted on\n        sparse matrices, because centering them entails building a dense\n        matrix which in common use cases is likely to be too large to fit in\n        memory.\n\n    with_std : bool, default=True\n        If True, scale the data to unit variance (or equivalently,\n        unit standard deviation).\n\n    Attributes\n    ----------\n    scale_ : ndarray of shape (n_features,) or None\n        Per feature relative scaling of the data to achieve zero mean and unit\n        variance. Generally this is calculated using `np.sqrt(var_)`. If a\n        variance is zero, we can't achieve unit variance, and the data is left\n        as-is, giving a scaling factor of 1. `scale_` is equal to `None`\n        when `with_std=False`.\n\n        .. versionadded:: 0.17\n           *scale_*\n\n    mean_ : ndarray of shape (n_features,) or None\n        The mean value for each feature in the training set.\n        Equal to ``None`` when ``with_mean=False`` and ``with_std=False``.\n\n    var_ : ndarray of shape (n_features,) or None\n        The variance for each feature in the training set. Used to compute\n        `scale_`. 
Equal to ``None`` when ``with_mean=False`` and\n        ``with_std=False``.\n\n    n_features_in_ : int\n        Number of features seen during :term:`fit`.\n\n        .. versionadded:: 0.24\n\n    feature_names_in_ : ndarray of shape (`n_features_in_`,)\n        Names of features seen during :term:`fit`. Defined only when `X`\n        has feature names that are all strings.\n\n        .. versionadded:: 1.0\n\n    n_samples_seen_ : int or ndarray of shape (n_features,)\n        The number of samples processed by the estimator for each feature.\n        If there are no missing samples, the ``n_samples_seen`` will be an\n        integer, otherwise it will be an array of dtype int. If\n        `sample_weights` are used it will be a float (if no missing data)\n        or an array of dtype float that sums the weights seen so far.\n        Will be reset on new calls to fit, but increments across\n        ``partial_fit`` calls.\n\n    See Also\n    --------\n    scale : Equivalent function without the estimator API.\n\n    :class:`~sklearn.decomposition.PCA` : Further removes the linear\n        correlation across features with 'whiten=True'.\n\n    Notes\n    -----\n    NaNs are treated as missing values: disregarded in fit, and maintained in\n    transform.\n\n    We use a biased estimator for the standard deviation, equivalent to\n    `numpy.std(x, ddof=0)`. Note that the choice of `ddof` is unlikely to\n    affect model performance.\n\n    Examples\n    --------\n    >>> from sklearn.preprocessing import StandardScaler\n    >>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]\n    >>> scaler = StandardScaler()\n    >>> print(scaler.fit(data))\n    StandardScaler()\n    >>> print(scaler.mean_)\n    [0.5 0.5]\n    >>> print(scaler.transform(data))\n    [[-1. -1.]\n     [-1. -1.]\n     [ 1.  1.]\n     [ 1.  1.]]\n    >>> print(scaler.transform([[2, 2]]))\n    [[3. 3.]]\n    "
__init__(*, copy=True, with_mean=True, with_std=True)[source]
__module__ = 'sklearn.preprocessing._data'
_more_tags()[source]
_parameter_constraints: dict = {'copy': ['boolean'], 'with_mean': ['boolean'], 'with_std': ['boolean']}
_reset()[source]

Reset internal data-dependent state of the scaler, if necessary.

__init__ parameters are not touched.

_sklearn_auto_wrap_output_keys = {'transform'}
fit(X, y=None, sample_weight=None)[source]

Compute the mean and std to be used for later scaling.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The data used to compute the mean and standard deviation used for later scaling along the features axis.

yNone

Ignored.

sample_weightarray-like of shape (n_samples,), default=None

Individual weights for each sample.

New in version 0.24: parameter sample_weight support to StandardScaler.

Returns:
selfobject

Fitted scaler.

inverse_transform(X, copy=None)[source]

Scale back the data to the original representation.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The data used to scale along the features axis.

copybool, default=None

Copy the input X or not.

Returns:
X_tr{ndarray, sparse matrix} of shape (n_samples, n_features)

Transformed array.

partial_fit(X, y=None, sample_weight=None)[source]

Online computation of mean and std on X for later scaling.

All of X is processed as a single batch. This is intended for cases when fit() is not feasible due to very large number of n_samples or because X is read from a continuous stream.

The algorithm for incremental mean and std is given in Equation 1.5a,b in Chan, Tony F., Gene H. Golub, and Randall J. LeVeque. “Algorithms for computing the sample variance: Analysis and recommendations.” The American Statistician 37.3 (1983): 242-247.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The data used to compute the mean and standard deviation used for later scaling along the features axis.

yNone

Ignored.

sample_weightarray-like of shape (n_samples,), default=None

Individual weights for each sample.

New in version 0.24: parameter sample_weight support to StandardScaler.

Returns:
selfobject

Fitted scaler.

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') StandardScaler

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns:
selfobject

The updated object.

set_inverse_transform_request(*, copy: bool | None | str = '$UNCHANGED$') StandardScaler

Request metadata passed to the inverse_transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
copystr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for copy parameter in inverse_transform.

Returns:
selfobject

The updated object.

set_partial_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') StandardScaler

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to partial_fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in partial_fit.

Returns:
selfobject

The updated object.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') StandardScaler

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
copystr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for copy parameter in transform.

Returns:
selfobject

The updated object.

transform(X, copy=None)[source]

Perform standardization by centering and scaling.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The data used to scale along the features axis.

copybool, default=None

Copy the input X or not.

Returns:
X_tr{ndarray, sparse matrix} of shape (n_samples, n_features)

Transformed array.

_add_lagged_X(X: DataFrame, lags: Iterable[int]) DataFrame[source]
_align(y: Series, X: Series | DataFrame, how: str = 'inner', allow_empty: bool = False) Tuple[Series, DataFrame][source]

Align target and neighbor(s) on a common DatetimeIndex.

_as_series_like(x: Series | DataFrame, name: str) DataFrame[source]
_assert_same_regular_grid(idx_y: DatetimeIndex, idx_x: DatetimeIndex) None[source]

Raise if the two indices are not on the same regular grid (same step & phase).

_compute_metrics(y_true: ndarray, y_pred: ndarray) Dict[str, float][source]
_dfm_params_to_vector(mod, params)[source]
Build transformed vector in the exact order of mod.param_names from either:
  • {‘transformed’: […], ‘param_names’: […]}, or

  • a constrained dict {‘q_beta’:…, ‘q_ax’:…, ‘r_y’:…, ‘r_x’:…, ‘phi_x’:…, ‘load’:…}

_fit_dfm(y, X, *, factor: str = 'default', anomaly_mode: str = 'ar', anom_var: str = 'neighbor', rx_scale: float = 3.0, maxiter: int = 80, disp: int = 0, params: dict | None = None)[source]
_fit_huber(y: Series, X: DataFrame) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]
_fit_lagged_elasticnet(y: Series, X: DataFrame, lags: Iterable[int], alphas: List[float] | None = None, l1_ratio: float = 0.2, n_splits: int = 3) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]
_fit_loess(y: Series, X: DataFrame, frac: float = 0.2) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]

Locally weighted regression (LOESS) smoother for neighbor fill.

_fit_ols(y: Series, X: DataFrame) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]
_fit_resid_interp(y: Series, X: DataFrame, kind: str = 'linear') Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]

Fill y using neighbor via interpolated residuals.

Steps:
  1. Fit baseline y ≈ a + b x on overlap (OLS; falls back to ratio if needed).

  2. Residuals r = y - (a + b x) on overlap.

  3. Interpolate r only inside gaps (bounded on both sides) using ‘linear’ or ‘pchip’.

  4. Reconstruct yhat = (a + b x) + r_interp wherever x is available.
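
The steps above can be sketched in a few lines of pandas/numpy. This is an illustrative re-implementation only, assuming y and x are pandas Series on the same DatetimeIndex; the helper itself also handles fallbacks (e.g. the ratio baseline) and gap bounding:

>>> import numpy as np
>>> import pandas as pd
>>> mask = y.notna() & x.notna()                          # overlap where both are present
>>> b, a = np.polyfit(x[mask].values, y[mask].values, 1)  # baseline y ~ a + b*x
>>> resid = y - (a + b * x)                               # residuals (NaN inside gaps)
>>> resid_interp = resid.interpolate(method="linear", limit_area="inside")
>>> yhat = (a + b * x) + resid_interp                     # reconstruct wherever x is available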

_fit_rolling_regression(y: Series, X: DataFrame, window: int, center: bool = False) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]
_fit_substitute(y: Series, X: Series | DataFrame)[source]
_forward_chain_splits(n: int, n_splits: int = 3, min_train: int = 50) List[Tuple[ndarray, ndarray]][source]

Generate forward-chaining train/test index splits for time series.

Parameters:
nint

Number of samples.

n_splitsint

How many folds.

min_trainint

Minimum size of the initial training window.

_inv_logit(z)[source]
_logit(x)[source]
_mask_overlap(y: Series, X: DataFrame) Tuple[Series, DataFrame][source]

Keep only timestamps where both y and ALL X columns are non-NaN.

_opt_debug(mod, res)[source]

Print start vs fitted params (both transformed & constrained).

_phi_from_logit(z)[source]
_suggest_lags(y: Series, x: Series, max_lag: int) List[int][source]

Suggest non-negative lags (in steps) by cross-correlation peak.

Returns a list of lags sorted by decreasing absolute correlation.

asdict(obj, *, dict_factory=<class 'dict'>)[source]

Return the fields of a dataclass instance as a new dictionary mapping field names to field values.

Example usage:

@dataclass
class C:
    x: int
    y: int

c = C(1, 2)
assert asdict(c) == {'x': 1, 'y': 2}

If given, ‘dict_factory’ will be used instead of built-in dict. The function applies recursively to field values that are dataclass instances. This will also look into built-in containers: tuples, lists, and dicts.

dataclass(cls=None, /, *, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False)[source]

Returns the same class as was passed in, with dunder methods added based on the fields defined in the class.

Examines PEP 526 __annotations__ to determine fields.

If init is true, an __init__() method is added to the class. If repr is true, a __repr__() method is added. If order is true, rich comparison dunder methods are added. If unsafe_hash is true, a __hash__() method is added. If frozen is true, fields may not be assigned to after instance creation. If match_args is true, the __match_args__ tuple is added. If kw_only is true, then by default all fields are keyword-only. If slots is true, a __slots__ attribute is added.

dfm_pack_params(model_info: dict) dict[source]

Return a portable blob of fitted DFM params.

Parameters:
model_infodict

Model info dictionary, typically from fill_from_neighbor.

Returns:
dict

Dictionary containing fitted DFM parameters with the following keys:

  • ‘param_names’: list of parameter names.

  • ‘transformed’: list of transformed parameter values.

  • ‘constrained’: dictionary of constrained parameter values.

  • ‘mle’: dictionary with optimizer info (optional).

  • ‘reused’: bool indicating if parameters were reused (optional).

Raises:
TypeError

If model_info is not a dictionary.

ValueError

If no fitted parameters are found in model_info.

fill_from_neighbor(target: Series, neighbor: Series | DataFrame, method: str = 'substitute', regime: Series | None = None, bounds: Tuple[float | None, float | None] = (None, None), *, params: dict | None = None, **kwargs) Dict[str, Any][source]

Fill gaps in target using information from neighbor.

This is a high-level wrapper with multiple method backends (OLS/robust, rolling regression, lagged regression, LOESS-in-time, Trimbur-style DFM variants, residual-interpolation baselines, or simple substitution). Inputs must already lie on the same regular time grid (same step and phase); this function does not resample.

Parameters:
targetpandas.Series

Target time series with a DatetimeIndex on a regular grid. Values may be NaN.

neighborpandas.Series or pandas.DataFrame

One or more neighbor series with a DatetimeIndex on the same grid as target (same step and phase). Values may be NaN.

method{‘substitute’, ‘ols’, ‘huber’, ‘rolling’, ‘lagged_reg’, ‘loess’, ‘dfm_trimbur_rw’, ‘dfm_trimbur_ar’, ‘resid_interp_linear’, ‘resid_interp_pchip’}

Algorithm to use:

  • 'substitute': pass-through neighbor after mean/scale alignment.

  • 'ols': ordinary least squares on overlap (optionally with lags).

  • 'huber': robust regression with Huber loss (optionally with lags).

  • 'rolling': rolling-window OLS in sample units (not time offsets).

  • 'lagged_reg': multivariate regression on specified neighbor lags.

  • 'loess': LOESS (time → value) smoothing using neighbor as scaffold.

  • 'dfm_trimbur_rw': dynamic factor model (Trimbur factor) with random-walk anomaly for the target.

  • 'dfm_trimbur_ar': dynamic factor model (Trimbur factor) with AR anomaly on the neighbor.

  • 'resid_interp_linear' / 'resid_interp_pchip': baseline y≈a+bx fit on overlap, then interpolate residuals (linear or PCHIP) across gaps.

regimepandas.Series, optional

Optional categorical series indexed like target to stratify fits (e.g., barrier in/out). If provided, models are fit per category and stitched back together.

bounds(float or None, float or None)

Lower/upper bounds to clip the final filled values (applied at the end).

paramsdict, optional

Pre-fitted/packed parameter blob for methods that support parameter reuse (e.g., the DFM backends). If provided, fitting is skipped and the supplied parameters are used directly.

**kwargs

Method-specific optional arguments. Unsupported keys are ignored unless otherwise noted. Typical extras by method:

Common
lagsint or Sequence[int], optional

Non-negative lags (in samples) for neighbor features. If an int m is provided, implementations may expand to range(0, m+1). Default behavior varies by method (often no lags or a small heuristic set).

seedint, optional

Random seed for any stochastic initializations (where applicable).

‘ols’

lags : int or Sequence[int], optional
add_const : bool, default True

Include an intercept term.

fit_intercept : bool, alias of add_const.

‘huber’

lags : int or Sequence[int], optional
huber_t : float, default 1.35

Huber threshold (in residual σ units).

maxiter : int, default 200
tol : float, default 1e-6

‘rolling’
windowint, required

Rolling window length in samples (integer). Time-offset strings (e.g., ‘14D’) are not supported here.

min_periodsint, optional

Minimum non-NaN samples required inside each window (default = window).

centerbool, default False

Whether to center the rolling window.

lagsint or Sequence[int], optional

If provided, each regression uses lagged neighbor columns inside the window.

‘lagged_reg’

lags : int or Sequence[int], recommended
alpha : float, optional

Ridge/L2 penalty (if the backend supports it).

l1_ratiofloat, optional

Elastic-net mixing (if the backend supports it).

standardizebool, default True

Standardize columns before regression.

‘loess’
fracfloat, default 0.25

LOESS span as a fraction of the data length (used in time→value smoothing).

itint, default 0

Number of robustifying reweighting iterations.

degreeint, default 1

Local polynomial degree.

‘dfm_trimbur_rw’ / ‘dfm_trimbur_ar’
rx_scalefloat, default 1.0

Relative scale factor for neighbor measurement noise.

maxiterint, default 80

Maximum optimizer iterations during parameter fitting.

dispint, default 0

Optimizer verbosity (0 = silent).

anom_var{‘target’,’neighbor’}, optional

Which series carries the anomaly/noise term (fixed by the variant, but may be overridden).

ar_orderint, optional

AR order for the anomaly in the '_ar' variant (default may be 1).

param_nameslist[str], optional

For advanced users: explicit parameter naming (used when packing).

# Note: DFM backends accept params=... at the top level for reuse.

‘resid_interp_linear’ / ‘resid_interp_pchip’
min_overlapint, default 3

Minimum overlapping samples required to fit the baseline y≈a+bx.

clip_residuals_sigmafloat, optional

Winsorize residuals before interpolation (σ units).

enforce_monotonebool, default False

For PCHIP path only: enforce monotonic segments where applicable.

Returns:
dict

Dictionary with the following keys:

yhatpandas.Series

Filled series on the same index as target.

pi_lowerpandas.Series or None

Lower uncertainty band (if the method provides one), otherwise None.

pi_upperpandas.Series or None

Upper uncertainty band (if the method provides one), otherwise None.

model_infodict

Method-specific diagnostics and metadata. Typical fields include: method, param_names, fitted_params (packed blob for reuse), scaling (means/stds used), goodness-of-fit (e.g., llf, aic, bic), and per-regime info when regime is provided.

Raises:
ValueError

If indices are not equally spaced, or grids mismatch in step or phase, or if required method-specific kwargs are missing (e.g., window for method='rolling').

KeyError

If an unknown method name is provided.
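
A minimal usage sketch with synthetic data, assuming fill_from_neighbor has been imported from its vtools.functions module; only keys documented above are used:

>>> import numpy as np
>>> import pandas as pd
>>> idx = pd.date_range("2020-01-01", periods=240, freq="h")
>>> neighbor = pd.Series(np.sin(np.arange(240) / 12.0), index=idx, name="nbr")
>>> target = 1.5 * neighbor + 0.1
>>> target.iloc[100:120] = np.nan                       # gap to be filled
>>> result = fill_from_neighbor(target, neighbor, method="ols")
>>> filled = result["yhat"]                             # filled series on the target index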

fit_loess_time_value(y: Series, X: DataFrame, frac_time: float = 0.05, min_neighbors: int = 25) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]

Two-dimensional LOESS-like smoother: y(t) ~ f(x(t), t), implemented as distance-weighted KNN in (time, value) space.

  • Avoids Series&DataFrame boolean broadcasting by reducing X→Series first.

  • Scales time and value so distances are comparable.

  • Predicts wherever neighbor is present.

load_dfm_params(path: str) Dict[str, Any][source]

Load a DFM parameter blob from YAML and validate minimal schema.

lowess(endog, exog, frac=0.6666666666666666, it=3, delta=0.0, xvals=None, is_sorted=False, missing='drop', return_sorted=True)[source]

LOWESS (Locally Weighted Scatterplot Smoothing)

A lowess function that returns smoothed estimates of endog at the given exog values from points (exog, endog).

Parameters:
endog1-D numpy array

The y-values of the observed points

exog1-D numpy array

The x-values of the observed points

fracfloat

Between 0 and 1. The fraction of the data used when estimating each y-value.

itint

The number of residual-based reweightings to perform.

deltafloat

Distance within which to use linear-interpolation instead of weighted regression.

xvals: 1-D numpy array

Values of the exogenous variable at which to evaluate the regression. If supplied, cannot use delta.

is_sortedbool

If False (default), then the data will be sorted by exog before calculating lowess. If True, then it is assumed that the data is already sorted by exog. If xvals is specified, then it too must be sorted if is_sorted is True.

missingstr

Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘drop’.

return_sortedbool

If True (default), then the returned array is sorted by exog and has missing (nan or infinite) observations removed. If False, then the returned array is in the same length and the same sequence of observations as the input array.

Returns:
out{ndarray, float}

The returned array is two-dimensional if return_sorted is True, and one dimensional if return_sorted is False. If return_sorted is True, then a numpy array with two columns. The first column contains the sorted x (exog) values and the second column the associated estimated y (endog) values. If return_sorted is False, then only the fitted values are returned, and the observations will be in the same order as the input arrays. If xvals is provided, then return_sorted is ignored and the returned array is always one dimensional, containing the y values fitted at the x values provided by xvals.

Notes

This lowess function implements the algorithm given in the reference below using local linear estimates.

Suppose the input data has N points. The algorithm works by estimating the smooth y_i by taking the frac*N closest points to (x_i,y_i) based on their x values and estimating y_i using a weighted linear regression. The weight for (x_j,y_j) is tricube function applied to abs(x_i-x_j).

If it > 1, then further weighted local linear regressions are performed, where the weights are the same as above times the _lowess_bisquare function of the residuals. Each iteration takes approximately the same amount of time as the original fit, so these iterations are expensive. They are most useful when the noise has extremely heavy tails, such as Cauchy noise. Noise with less heavy-tails, such as t-distributions with df>2, are less problematic. The weights downgrade the influence of points with large residuals. In the extreme case, points whose residuals are larger than 6 times the median absolute residual are given weight 0.

delta can be used to save computations. For each x_i, regressions are skipped for points closer than delta. The next regression is fit for the farthest point within delta of x_i and all points in between are estimated by linearly interpolating between the two regression fits.

Judicious choice of delta can cut computation time considerably for large data (N > 5000). A good choice is delta = 0.01 * range(exog).

If xvals is provided, the regression is then computed at those points and the fit values are returned. Otherwise, the regression is run at points of exog.

Some experimentation is likely required to find a good choice of frac and iter for a particular dataset.

References

Cleveland, W.S. (1979) “Robust Locally Weighted Regression and Smoothing Scatterplots”. Journal of the American Statistical Association 74 (368): 829-836.

Examples

The example below allows a comparison of how different the lowess fits can be for different values of frac.

>>> import numpy as np
>>> import statsmodels.api as sm
>>> lowess = sm.nonparametric.lowess
>>> x = np.random.uniform(low = -2*np.pi, high = 2*np.pi, size=500)
>>> y = np.sin(x) + np.random.normal(size=len(x))
>>> z = lowess(y, x)
>>> w = lowess(y, x, frac=1./3)

This gives a similar comparison for when it is 0 vs not.

>>> import numpy as np
>>> import scipy.stats as stats
>>> import statsmodels.api as sm
>>> lowess = sm.nonparametric.lowess
>>> x = np.random.uniform(low = -2*np.pi, high = 2*np.pi, size=500)
>>> y = np.sin(x) + stats.cauchy.rvs(size=len(x))
>>> z = lowess(y, x, frac= 1./3, it=0)
>>> w = lowess(y, x, frac=1./3)
save_dfm_params(params: Dict[str, Any], path: str) None[source]

Save a DFM parameter blob to YAML (preferred for this codebase). File extension may be .yaml or .yml. Other extensions raise.
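
A sketch of parameter reuse across calls, assuming these functions and fill_from_neighbor are imported from their vtools.functions module and that target and neighbor are prepared as described for fill_from_neighbor:

>>> res = fill_from_neighbor(target, neighbor, method="dfm_trimbur_ar")
>>> blob = dfm_pack_params(res["model_info"])           # portable blob of fitted params
>>> save_dfm_params(blob, "dfm_params.yaml")
>>> params = load_dfm_params("dfm_params.yaml")
>>> res2 = fill_from_neighbor(target, neighbor, method="dfm_trimbur_ar", params=params)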

write_filled_csv_with_yaml_header(filled: Series, path: str, model_info: Dict[str, Any], metrics: Dict[str, float] | None = None, extra_meta: Dict[str, Any] | None = None, float_format: str = '{:.6g}')[source]

Write a CSV file with a YAML-like header as #-comments.

Parameters:
filledpd.Series

Series to write; index must be a DatetimeIndex.

pathstr

Destination filepath.

model_infodict

Metadata from fill_from_neighbor; will be serialized.

metricsdict, optional

Metrics to include.

extra_metadict, optional

Any additional metadata.

float_formatstr

Format string for values.

vtools.functions.period_op module

period_op(ts, period='D', agg='mean', max_absent_frac=0.0)[source]
window_op(ts, window, period='D', agg='mean', max_absent_frac=0.0)[source]
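
These are period/window aggregation helpers. An illustrative call, assuming ts is a regular Series or DataFrame with a DatetimeIndex and that period_op is imported from vtools.functions.period_op (argument semantics follow the signature above):

>>> daily_mean = period_op(ts, period="D", agg="mean", max_absent_frac=0.0)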

vtools.functions.savitzky_golay module

_build_vander(x, degree)[source]
_evaluate_poly(c, x)[source]
_polyfit_window(x, y, w, degree)[source]
main()[source]
savgol_filter_numba(y, window_length, degree, error)[source]
savgol_filter_weighted(data, window_length, degree, error=None, cov_matrix=None, deriv=None, use_numba=True)[source]

Apply a Savitzky–Golay filter with weights to a univariate DataFrame or Series.

Parameters:
datapandas.DataFrame or pandas.Series

DataFrame or Series containing your data.

window_lengthint

Length of the filter window (must be odd).

degreeint

Degree of the polynomial fit.

errorpandas.Series, optional

Series containing the error (used to compute weights).

cov_matrix2D numpy array, optional

Covariance matrix for the errors.

derivint, optional

Derivative order to compute.

use_numbabool, optional

If True, uses the Numba-accelerated kernel.

Returns:
pandas.Series

Series of the filtered values.

Notes

The practical size of window_length depends on the data and the computational resources. Larger window lengths provide smoother results but require more computation and may not capture local variations well. It is recommended to experiment with different window lengths to find the optimal value for your specific application.

Some of the workflow derived from this work: https://github.com/surhudm/savitzky_golay_with_errors
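
Illustrative usage on a noisy synthetic series, assuming savgol_filter_weighted is imported from vtools.functions.savitzky_golay:

>>> import numpy as np
>>> import pandas as pd
>>> idx = pd.date_range("2020-01-01", periods=200, freq="15min")
>>> noisy = pd.Series(np.sin(np.arange(200) / 20.0) + 0.1 * np.random.randn(200), index=idx)
>>> smooth = savgol_filter_weighted(noisy, window_length=11, degree=3)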

savgol_filter_werror(y, window_length, degree, error=None, cov=None, deriv=None)[source]
solve_leastsq(yarr, ycov, vander, vanderT, deriv=None)[source]
solve_polyfit(xarr, yarr, degree, weight, deriv=None)[source]

vtools.functions.separate_species module

Separation of tidal data into species. The key function in this module is separate_species, which decomposes tides into subtidal, diurnal, semidiurnal and noise components.

The filters are long, so the time resolution of the amplitude may be limited. A demo function is also provided that reads tide series (6-minute interval) from input files, separates the species, writes the results and optionally plots an example.

create_arg_parser()[source]
main()[source]
plot_result(ts, ts_semi, ts_diurnal, ts_sub_tide, station)[source]
run_example()[source]

This is the data for the example. Note that you want the data to be at least 4 days longer than the desired output.

separate_species(ts, noise_thresh_min=40)[source]

Separate species into subtidal, diurnal, semidiurnal and noise components

Input:

ts: timeseries to be decomposed into species, assumed to be at six minute intervals. The filters used have long lengths, so avoid missing data and allow for four extra days' worth of data on each end.

Output:

four regular time series, representing subtidal, diurnal, semi-diurnal and noise
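
Illustrative call, assuming ts is a 6-minute water level series with at least four extra days of data on each end and that separate_species is imported from vtools.functions.separate_species; the return order follows the description above:

>>> sub_tide, diurnal, semidiurnal, noise = separate_species(ts, noise_thresh_min=40)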

write_th(filename, ts_output)[source]

This works fine for fairly big series

vtools.functions.skill_metrics module

_main()[source]
corr_coefficient(predictions, targets, method='pearson')[source]

Calculates the correlation coefficient (the ‘r’ in ‘r-squared’) between two series.

For time series where the targets are serially correlated and may span only a fraction of the natural variability, this statistic may not be appropriate and Murphy (1988) explains why caution should be exercised in using this statistic.

Parameters:
predictions, targetsarray_like

Time series to analyze

method{‘pearson’, ‘kendall’, ‘spearman’}

Correlation method, compatible with pandas.

Returns:
rfloat

Correlation coefficient

mean_error(predictions, targets, proportiontocut)[source]

Calculate the untrimmed mean error, discounting nan values

Parameters:
predictions, targetsarray_like

Time series or arrays to be analyzed

Returns:
meanfloat

Mean error

median_error(predictions, targets)[source]

Calculate the median error, discounting nan values

Parameters:
predictions, targetsarray_like

Time series or arrays to be analyzed

Returns:
medfloat

Median error

mse(predictions, targets)[source]

Mean squared error

Parameters:
predictions, targetsarray_like

Time series or arrays to analyze

Returns:
msevtools.data.timeseries.TimeSeries

Mean squared error between predictions and targets

rmse(predictions, targets)[source]

Root mean squared error

Parameters:
predictions, targetsarray_like

Time series or arrays to analyze

Returns:
rmsefloat

Root mean squared error

skill_score(predictions, targets, ref=None)[source]

Calculate a Nash-Sutcliffe-like skill score based on mean squared error

As per the discussion in Murphy (1988) other reference forecasts (climatology, harmonic tide, etc.) are possible.

Parameters:
predictions, targetsarray_like

Time series or arrays to be analyzed

Returns:
scorefloat

Skill score based on mean squared error
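
Illustrative usage with small synthetic arrays, assuming the metrics are imported from vtools.functions.skill_metrics:

>>> import pandas as pd
>>> obs = pd.Series([1.0, 2.0, 3.0, 4.0])
>>> mod = pd.Series([1.1, 1.9, 3.2, 3.8])
>>> rmse(mod, obs)                               # root mean squared error
>>> skill_score(mod, obs)                        # skill relative to the default reference
>>> corr_coefficient(mod, obs, method="pearson")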

tmean_error(predictions, targets, limits=None, inclusive=[True, True])[source]

Calculate the (possibly trimmed) mean error, discounting nan values

Parameters:
predictions, targetsarray_like

Time series or arrays to be analyzed

limitstuple(float)

Low and high limits for trimming

inclusive[boolean, boolean]

True if clipping is inclusive on the low/high end

Returns:
meanfloat

Trimmed mean error

willmott_score(predictions, targets, ref=None)[source]

Calculate the Willmott score (index of agreement)

As per the discussion in Murphy (1988) other reference forecasts (climatology, harmonic tide, etc.) are possible.

Parameters:
predictions, targetsarray_like

Time series or arrays to be analyzed

Returns:
scorefloat

Willmott score

vtools.functions.tidalhl module

cosine_lanczos(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]
filter_where_na(df, dfb)[source]

Remove values in df where dfb has NA values.

get_phase_diff(df1, df2, tolerance='4h')[source]
get_smoothed_resampled(df, cutoff_period='2h', resample_period='1min', interpolate_method='pchip')[source]

Resample the dataframe (indexed by time) to the regular period resample_period using the specified interpolation method.

Furthermore, a cosine Lanczos filter with the given cutoff_period is used to smooth the signal and remove high frequency noise.

Args:

df (DataFrame): A single column dataframe indexed by datetime

cutoff_period (str, optional): cutoff period for cosine lanczos filter. Defaults to ‘2h’.

resample_period (str, optional): Resample to regular period. Defaults to ‘1min’.

interpolate_method (str, optional): interpolation for resampling. Defaults to ‘pchip’.

Returns:

DataFrame: smoothed and resampled dataframe indexed by datetime

get_tidal_amplitude(dfh, dfl)[source]

Tidal amplitude given tidal highs and lows

Args:

dfh (DataFrame): Tidal highs time series

dfl (DataFrame): Tidal lows time series

Returns:

DataFrame: Amplitude timeseries, at the times of the low following the high being used for amplitude calculation

get_tidal_amplitude_diff(dfamp1, dfamp2, percent_diff=False, tolerance='4h')[source]

Get the difference of values within +/- tolerance (default 4 hours) of each other in the two amplitude arrays

Args:

dfamp1 (DataFrame): Amplitude time series

dfamp2 (DataFrame): Amplitude time series

percent_diff (bool, optional): If true do percent diff. Defaults to False.

Returns:

DataFrame: Difference dfamp1-dfamp2 or % Difference (dfamp1-dfamp2)/dfamp2*100 for values within +/- 4H of each other

get_tidal_hl(df, cutoff_period='2h', resample_period='1min', interpolate_method='pchip', moving_window_size='7h')[source]

Get Tidal highs and lows

Args:

df (DataFrame): A single column dataframe indexed by datetime

cutoff_period (str, optional): cutoff period for cosine lanczos filter. Defaults to ‘2h’.

resample_period (str, optional): Resample to regular period. Defaults to ‘1min’.

interpolate_method (str, optional): interpolation for resampling. Defaults to ‘pchip’.

moving_window_size (str, optional): moving window size to look for lows within. Defaults to ‘7h’.

Returns:

tuple of DataFrame: Tidal high and tidal low time series
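
Illustrative usage, assuming df is a single-column water level DataFrame indexed by datetime and the functions are imported from vtools.functions.tidalhl:

>>> dfh, dfl = get_tidal_hl(df, cutoff_period="2h", resample_period="1min")
>>> amp = get_tidal_amplitude(dfh, dfl)    # amplitude from paired highs and lows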

get_tidal_hl_rolling(df, cutoff_period='2h', resample_period='1min', interpolate_method='pchip', moving_window_size='7h')

Get Tidal highs and lows

Args:

df (DataFrame): A single column dataframe indexed by datetime

cutoff_period (str, optional): cutoff period for cosine lanczos filter. Defaults to ‘2h’.

resample_period (str, optional): Resample to regular period. Defaults to ‘1min’.

interpolate_method (str, optional): interpolation for resampling. Defaults to ‘pchip’.

moving_window_size (str, optional): moving window size to look for lows within. Defaults to ‘7h’.

Returns:

tuple of DataFrame: Tidal high and tidal low time series

get_tidal_hl_zerocrossing(df, round_to='1min')[source]

Finds the tidal high and low times using zero crossings of the first derivative.

This works for all situations but is not robust in the face of noise and perturbations in the signal

get_tidal_phase_diff(dfh2, dfl2, dfh1, dfl1, tolerance='4h')[source]

Calculates the phase difference between df2 and df1 tidal highs and lows

Scans +/- 4 hours in df1 to get the highs and lows in that window for df2, i.e. to get the tidal highs and lows at the times of df1

Args:

dfh2 (DataFrame): Timeseries of tidal highs

dfl2 (DataFrame): Timeseries of tidal lows

dfh1 (DataFrame): Timeseries of tidal highs

dfl1 (DataFrame): Timeseries of tidal lows

Returns:

DataFrame: Phase difference (dfh2-dfh1) and (dfl2-dfl1) in minutes

limit_to_indices(df, si, ei)[source]
lmax(arr)[source]

Local maximum: Returns value only when centered on maximum

lmin(arr)[source]

Local minimum: Returns value only when centered on minimum

periods_per_window(moving_window_size: str, period_str: str) int[source]

Number of periods that fit in the moving window

Args:

moving_window_size (str): moving window size as a string, e.g. ‘7H’ for 7 hours

period_str (str): period as a string, e.g. ‘1T’ for 1 minute

Returns:

int: number of periods in the moving window rounded to an integer
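
For example, a 7 hour window sampled at 1 minute should contain 420 periods; a sketch of the expected arithmetic:

>>> periods_per_window("7h", "1min")       # expected: 420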

tidal_highs(df, moving_window_size='7h')[source]

Tidal highs (could be up to two highs in a 25 hr period)

Args:

df (DataFrame): a time series with a regular frequency

moving_window_size (str, optional): moving window size to look for highs within. Defaults to ‘7h’.

Returns:

DataFrame: an irregular time series with highs at resolution of df.index

tidal_lows(df, moving_window_size='7h')[source]

Tidal lows (could be up to two lows in a 25 hr period)

Args:

df (DataFrame): a time series with a regular frequency

moving_window_size (str, optional): moving window size to look for lows within. Defaults to ‘7h’.

Returns:

DataFrame: an irregular time series with lows at resolution of df.index

where_changed(df)[source]
where_same(dfg, df)[source]

Return dfg only where its value is the same as df for the same time stamps, i.e. the intersection locations with df.

zerocross(df)[source]

Calculates the gradient of the time series and identifies locations where the gradient changes sign. Returns the time, rounded to the nearest minute, where the zero crossing happens (based on a linear derivative assumption).

vtools.functions.tidalhours module

Functions for analyzing tidal cycles from time series data.

This module provides functions to analyze tidal time series, identify slack water times, and map any time to its position within the tidal cycle (tidal hour). This is useful for tidal phase analysis in estuarine and coastal studies.

Functions

find_slack(jd, u, leave_mean=False, which=’both’)

Identifies the times of “slack water”—the moments when tidal current velocity (u) crosses zero.

hour_tide(jd, u=None, h=None, jd_new=None, leave_mean=False)

Calculates the “tidal hour” for each time point, i.e., the phase of the tidal cycle (0–12, where 0 is slack before ebb).

hour_tide_fn(jd, u, leave_mean=False)

Returns a function that computes tidal hour for arbitrary time points, based on the provided time/velocity series.

tidal_hour_signal(ts, filter=True)

Compute the tidal hour of a semidiurnal signal.

diff_h(tidal_hour_series)

Compute the time derivative of tidal hour.

cdiff(a, n=1, axis=-1)[source]

Like np.diff, but include difference from last element back to first.

Parameters:
aarray-like

Input array.

nint, optional

Order of the difference. Only n=1 is supported.

axisint, optional

Axis along which the difference is taken.

Returns:
ndarray

Array of differences, same shape as input.

Notes

This function computes the difference between consecutive elements of the input array, and also includes the difference from the last element back to the first, preserving the array shape.
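
Illustrative call on a small array; the exact placement of the wrap-around difference follows the implementation, so no output is shown:

>>> import numpy as np
>>> a = np.array([1.0, 3.0, 6.0, 2.0])
>>> d = cdiff(a)    # consecutive differences plus the last-to-first wrap; same shape as a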

cosine_lanczos5(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]

Squared low-pass cosine Lanczos filter on a regular time series.

Parameters:
tsDataFrame
filter_lenint, time_interval

Size of the Lanczos window; the default is the number of samples within filter_period*1.25.

cutoff_frequency: float,optional

Cutoff frequency expressed as a ratio of the Nyquist frequency; it should be within the range (0,1). For example, if the sampling frequency is 1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.

cutoff_periodstring or _time_interval

Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a _time_interval (Pandas freq). cutoff_frequency and cutoff_period can’t be specified at the same time.

padtypestr or None, optional

Must be ‘odd’, ‘even’, ‘constant’, or None. This determines the type of extension to use for the padded signal to which the filter is applied. If padtype is None, no padding is used. The default is None.

padlenint or None, optional

The number of elements by which to extend x at both ends of axis before applying the filter. This value must be less than x.shape[axis]-1. padlen=0 implies no padding. If padtype is not None and padlen is not given, padlen is set to 6*m.

fill_edge_nan: bool,optional

If padding is not used and fill_edge_nan is true, the resulting data on both ends are filled with NaN to account for edge effects. This affects 2*m values on either end of the result. Default is true.

Returns:
resultTimeSeries

A new regular time series with the same interval as ts. If no padding is used, the beginning and ending 4*m values will be set to NaN to remove edge effects.

Raises:
ValueError

If the input time series is not regular; or cutoff_period and cutoff_frequency are given at the same time; or neither cutoff_period nor cutoff_frequency is given; or padtype is not “odd”, “even”, “constant”, or None; or padlen is larger than the data size.
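
Illustrative usage, assuming ts is a regular (e.g. hourly) univariate DataFrame of water levels:

>>> subtidal = cosine_lanczos5(ts, cutoff_period="40h", padtype="odd")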

diff_h(tidal_hour_series)[source]

Compute the time derivative of tidal hour.

Parameters:
tidal_hour_seriespandas.Series

Output of tidal_hour_signal, indexed by datetime.

Returns:
pandas.Series

Time derivative of tidal hour (dH/dt) in hours/hour, indexed by datetime.

Notes

This derivative is often included to capture how rapidly the tidal phase is changing, which can be important in modeling flow reversals, estuarine dynamics, or for detecting slack tide conditions where the rate of change is near zero.

find_slack(jd, u, leave_mean=False, which='both')[source]

Identify slack water times from a velocity time series.

Parameters:
jdarray-like

Array of time values (Julian days or similar).

uarray-like

Array of velocity values (flood-positive).

leave_meanbool, optional

If False, removes the mean (low-frequency) component from u.

which{‘both’, ‘high’, ‘low’}, optional

Specifies which zero-crossings to return.

Returns:
jd_slackndarray

Array of times when slack water occurs.

start{‘ebb’, ‘flood’}

String indicating the initial state.

Notes

This function detects transitions in the velocity time series where the current reverses direction (i.e., crosses zero), which correspond to slack water events.
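
Illustrative usage on a synthetic semidiurnal current, assuming find_slack is imported from vtools.functions.tidalhours:

>>> import numpy as np
>>> jd = np.linspace(0.0, 3.0, 721)                 # three days at 6-minute steps
>>> u = np.sin(2 * np.pi * jd / 0.5175)             # ~12.42 hour (M2-like) current, flood-positive
>>> jd_slack, start = find_slack(jd, u, which="both")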

hilbert(x, N=None, axis=-1)[source]

Compute the analytic signal, using the Hilbert transform.

The transformation is done along the last axis by default.

Parameters:
xarray_like

Signal data. Must be real.

Nint, optional

Number of Fourier components. Default: x.shape[axis]

axisint, optional

Axis along which to do the transformation. Default: -1.

Returns:
xandarray

Analytic signal of x, of each 1-D array along axis

Notes

The analytic signal x_a(t) of signal x(t) is:

\[x_a = F^{-1}(F(x) 2U) = x + i y\]

where F is the Fourier transform, U the unit step function, and y the Hilbert transform of x. [1]

In other words, the negative half of the frequency spectrum is zeroed out, turning the real-valued signal into a complex signal. The Hilbert transformed signal can be obtained from np.imag(hilbert(x)), and the original signal from np.real(hilbert(x)).

References

[1]

Wikipedia, “Analytic signal”. https://en.wikipedia.org/wiki/Analytic_signal

[2]

Leon Cohen, “Time-Frequency Analysis”, 1995. Chapter 2.

[3]

Alan V. Oppenheim, Ronald W. Schafer. Discrete-Time Signal Processing, Third Edition, 2009. Chapter 12. ISBN 13: 978-1292-02572-8

Examples

In this example we use the Hilbert transform to determine the amplitude envelope and instantaneous frequency of an amplitude-modulated signal.

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy.signal import hilbert, chirp
>>> duration = 1.0
>>> fs = 400.0
>>> samples = int(fs*duration)
>>> t = np.arange(samples) / fs

We create a chirp of which the frequency increases from 20 Hz to 100 Hz and apply an amplitude modulation.

>>> signal = chirp(t, 20.0, t[-1], 100.0)
>>> signal *= (1.0 + 0.5 * np.sin(2.0*np.pi*3.0*t) )

The amplitude envelope is given by magnitude of the analytic signal. The instantaneous frequency can be obtained by differentiating the instantaneous phase in respect to time. The instantaneous phase corresponds to the phase angle of the analytic signal.

>>> analytic_signal = hilbert(signal)
>>> amplitude_envelope = np.abs(analytic_signal)
>>> instantaneous_phase = np.unwrap(np.angle(analytic_signal))
>>> instantaneous_frequency = (np.diff(instantaneous_phase) /
...                            (2.0*np.pi) * fs)
>>> fig, (ax0, ax1) = plt.subplots(nrows=2)
>>> ax0.plot(t, signal, label='signal')
>>> ax0.plot(t, amplitude_envelope, label='envelope')
>>> ax0.set_xlabel("time in seconds")
>>> ax0.legend()
>>> ax1.plot(t[1:], instantaneous_frequency)
>>> ax1.set_xlabel("time in seconds")
>>> ax1.set_ylim(0.0, 120.0)
>>> fig.tight_layout()
hour_tide(jd, u=None, h=None, jd_new=None, leave_mean=False, start_datum='ebb')[source]

Calculate tidal hour from a time series of velocity or water level.

Parameters:
jdarray-like

Time in days (e.g., Julian day, datenum, etc.).

uarray-like, optional

Velocity, flood-positive.

harray-like, optional

Water level, positive up.

jd_newarray-like, optional

Optional new time points to evaluate.

leave_meanbool, optional

By default, the time series mean is removed, but this can be disabled by passing True.

start_datum{‘ebb’, ‘flood’}, optional

Desired starting datum for tidal hour.

Returns:
ndarray

Array of tidal hour values (0–12) for each time point.

Notes

This function computes the phase of the tidal cycle (tidal hour) for each time point, based on either velocity or water level time series. The tidal hour is defined such that 0 corresponds to slack before ebb.
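
Illustrative usage on the same kind of synthetic velocity series shown for find_slack above:

>>> import numpy as np
>>> jd = np.linspace(0.0, 3.0, 721)
>>> u = np.sin(2 * np.pi * jd / 0.5175)
>>> th = hour_tide(jd, u=u, start_datum="ebb")      # tidal hour in [0, 12), 0 = slack before ebb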

hour_tide_fn(jd, u, start_datum='ebb', leave_mean=False)[source]

Return a function for extracting tidal hour from the time/velocity given.

Parameters:
jdarray-like

Time array.

uarray-like

Velocity array.

start_datum{‘ebb’, ‘flood’}, optional

Desired starting datum for tidal hour.

leave_meanbool, optional

If False, removes the mean (low-frequency) component from u.

Returns:
function

Function: fn(jd_new) → tidal hour array.

Notes

This function generates a callable that computes tidal hour for arbitrary time points, based on the provided time and velocity series. The tidal hour is referenced to slack water.

class interp1d(x, y, kind='linear', axis=-1, copy=True, bounds_error=None, fill_value=nan, assume_sorted=False)[source]

Bases: _Interpolator1D

Interpolate a 1-D function.

x and y are arrays of values used to approximate some function f: y = f(x). This class returns a function whose call method uses interpolation to find the value of new points.

Parameters:
x(npoints, ) array_like

A 1-D array of real values.

y(…, npoints, …) array_like

A N-D array of real values. The length of y along the interpolation axis must be equal to the length of x. Use the axis parameter to select correct axis. Unlike other interpolators, the default interpolation axis is the last axis of y.

kindstr or int, optional

Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of ‘linear’, ‘nearest’, ‘nearest-up’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, or ‘next’. ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point; ‘nearest-up’ and ‘nearest’ differ when interpolating half-integers (e.g. 0.5, 1.5) in that ‘nearest-up’ rounds up and ‘nearest’ rounds down. Default is ‘linear’.

axisint, optional

Axis in the y array corresponding to the x-coordinate values. Unlike other interpolators, defaults to axis=-1.

copybool, optional

If True, the class makes internal copies of x and y. If False, references to x and y are used. The default is to copy.

bounds_errorbool, optional

If True, a ValueError is raised any time interpolation is attempted on a value outside of the range of x (where extrapolation is necessary). If False, out of bounds values are assigned fill_value. By default, an error is raised unless fill_value="extrapolate".

fill_valuearray-like or (array-like, array_like) or “extrapolate”, optional
  • if a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes.

  • If a two-element tuple, then the first element is used as a fill value for x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value. Using a two-element tuple or ndarray requires bounds_error=False.

    New in version 0.17.0.

  • If “extrapolate”, then points outside the data range will be extrapolated.

    New in version 0.17.0.

assume_sortedbool, optional

If False, values of x can be in any order and they are sorted first. If True, x has to be an array of monotonically increasing values.

See also

splrep, splev

Spline interpolation/smoothing based on FITPACK.

UnivariateSpline

An object-oriented wrapper of the FITPACK routines.

interp2d

2-D interpolation

Notes

Calling interp1d with NaNs present in input values results in undefined behaviour.

Input values x and y must be convertible to float values like int or float.

If the values in x are not unique, the resulting behavior is undefined and specific to the choice of kind, i.e., changing kind will change the behavior for duplicates.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy import interpolate
>>> x = np.arange(0, 10)
>>> y = np.exp(-x/3.0)
>>> f = interpolate.interp1d(x, y)
>>> xnew = np.arange(0, 9, 0.1)
>>> ynew = f(xnew)   # use interpolation function returned by `interp1d`
>>> plt.plot(x, y, 'o', xnew, ynew, '-')
>>> plt.show()
Attributes:
fill_value

The fill value.

Methods

__call__(x)

Evaluate the interpolant

__init__(x, y, kind='linear', axis=-1, copy=True, bounds_error=None, fill_value=nan, assume_sorted=False)[source]

Initialize a 1-D linear interpolation class.


_call_linear(x_new)[source]
_call_linear_np(x_new)[source]
_call_nan_spline(x_new)[source]
_call_nearest(x_new)[source]

Find nearest neighbor interpolated y_new = f(x_new).

_call_previousnext(x_new)[source]

Use previous/next neighbor of x_new, y_new = f(x_new).

_call_spline(x_new)[source]
_check_and_update_bounds_error_for_extrapolation()[source]
_check_bounds(x_new)[source]

Check the inputs for being in the bounds of the interpolated data.

Parameters:
x_new : array
Returns:
out_of_bounds : bool array

The mask on x_new of values that are out of the bounds.

_evaluate(x_new)[source]

Actually evaluate the value of the interpolator.

_y_axis
_y_extra_shape
dtype
property fill_value

The fill value.

tidal_hour_signal(ts, filter=True)[source]

Compute the tidal hour of a semidiurnal signal.

Parameters:
ts : pandas.Series

Time series of water level or other semidiurnal signal. Must have a datetime index.

filter : bool, default True

Whether to apply a 40-hour cosine Lanczos filter to the input signal. If False, uses the raw signal.

Returns:
pandas.Series

Tidal hour as a float (range [0, 12)), indexed by datetime.

See also

diff_h

Compute the derivative (rate of change) of tidal hour.

cosine_lanczos

External function used to apply low-pass filtering.

Notes

This function returns the instantaneous phase-based tidal hour for a time series, assuming a semidiurnal signal. Optionally applies a cosine Lanczos low-pass filter (e.g., 40h) to isolate tidal components from subtidal or noisy fluctuations.

The tidal hour is computed using the phase of the analytic signal obtained via the Hilbert transform. This phase is then scaled to range from 0 to 12 hours to represent one semidiurnal tidal cycle. The output is a pandas Series aligned with the input time index.

The tidal hour is derived from the instantaneous phase of the analytic signal. This signal is computed as:

analytic_signal = ts + 1j * hilbert(ts)

The phase (angle) of this complex signal varies smoothly over time and reflects the oscillatory nature of the tide, allowing us to construct a continuous representation of “tidal time” even between extrema.

The use of the Hilbert transform provides a smooth interpolation of the signal’s phase progression, since it yields the narrow-band envelope and instantaneous phase of the dominant frequency component (assumed to be semidiurnal here).
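
To make the Hilbert-transform recipe above concrete, here is a minimal sketch (assuming scipy.signal.hilbert, which returns the analytic signal directly; the 40-hour cosine Lanczos pre-filtering step is omitted and the function name is hypothetical, so this is not the library implementation):

import numpy as np
import pandas as pd
from scipy.signal import hilbert

def tidal_hour_sketch(wl: pd.Series) -> pd.Series:
    """Illustrative only: map a demeaned semidiurnal signal to a 0-12 h phase."""
    demeaned = wl - wl.mean()                # remove the mean so the phase wraps cleanly
    analytic = hilbert(demeaned.to_numpy())  # analytic signal = x + i*H(x)
    phase = np.unwrap(np.angle(analytic))    # continuous instantaneous phase (radians)
    # Scale one full 2*pi cycle to 12 "tidal hours" and wrap into [0, 12)
    return pd.Series((phase / (2 * np.pi) * 12.0) % 12.0, index=wl.index)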

tidal_hour_signal2(ts: Series | DataFrame, filter: bool = True) → Series | DataFrame[source]

Calculate the tidal hour from a semidiurnal tidal signal.

Parameters:
ts : pd.Series or pd.DataFrame

Input time series of water levels (must have a datetime index).

filter : bool, optional

If True, apply a Lanczos filter to remove low-frequency components (default True). Note: this is the opposite sense of ‘leave_mean’ in the original implementation.

Returns:
pd.Series or pd.DataFrame

Tidal hour in datetime format (same shape as input)

Notes

The tidal hour represents the phase of the semidiurnal tide in temporal units. The calculation uses complex interpolation for smooth phase estimation:

  1. The Hilbert transform creates an analytic signal.

  2. The angle gives the instantaneous phase.

  3. Complex interpolation avoids phase jumps at 0/2π boundaries.

  4. This provides continuous phase evolution even during slack tides.

If h/u distinction is needed, consider applying diff_h to separate flood/ebb phases. The derivative was likely included in original code to identify phase reversals during tidal current analysis.
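
The “complex interpolation” idea can be illustrated with a small sketch (function name and arguments are hypothetical; this is an illustration of the technique, not the library’s implementation):

import numpy as np

def interp_phase_complex(t_new, t_known, phase_known):
    """Sketch: interpolate a wrapped phase via its unit-circle representation.

    Interpolating cos/sin separately and re-taking the angle avoids the
    discontinuity a direct interpolation of the phase would hit at 0/2*pi.
    """
    z = np.exp(1j * phase_known)
    z_new = np.interp(t_new, t_known, z.real) + 1j * np.interp(t_new, t_known, z.imag)
    return np.angle(z_new) % (2 * np.pi)   # back to a phase in [0, 2*pi)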

vtools.functions.transition module

class PchipInterpolator(x, y, axis=0, extrapolate=None)[source]

Bases: CubicHermiteSpline

PCHIP 1-D monotonic cubic interpolation.

x and y are arrays of values used to approximate some function f, with y = f(x). The interpolant uses monotonic cubic splines to find the value of new points. (PCHIP stands for Piecewise Cubic Hermite Interpolating Polynomial).

Parameters:
x : ndarray, shape (npoints, )

A 1-D array of monotonically increasing real values. x cannot include duplicate values (otherwise f is overspecified)

y : ndarray, shape (…, npoints, …)

A N-D array of real values. y’s length along the interpolation axis must be equal to the length of x. Use the axis parameter to select the interpolation axis.

axis : int, optional

Axis in the y array corresponding to the x-coordinate values. Defaults to axis=0.

extrapolate : bool, optional

Whether to extrapolate to out-of-bounds points based on first and last intervals, or to return NaNs.

See also

CubicHermiteSpline

Piecewise-cubic interpolator.

Akima1DInterpolator

Akima 1D interpolator.

CubicSpline

Cubic spline data interpolator.

PPoly

Piecewise polynomial in terms of coefficients and breakpoints.

Notes

The interpolator preserves monotonicity in the interpolation data and does not overshoot if the data is not smooth.

The first derivatives are guaranteed to be continuous, but the second derivatives may jump at \(x_k\).

Determines the derivatives at the points \(x_k\), \(f'_k\), by using PCHIP algorithm [1].

Let \(h_k = x_{k+1} - x_k\), and \(d_k = (y_{k+1} - y_k) / h_k\) are the slopes at internal points \(x_k\). If the signs of \(d_k\) and \(d_{k-1}\) are different or either of them equals zero, then \(f'_k = 0\). Otherwise, it is given by the weighted harmonic mean

\[\frac{w_1 + w_2}{f'_k} = \frac{w_1}{d_{k-1}} + \frac{w_2}{d_k}\]

where \(w_1 = 2 h_k + h_{k-1}\) and \(w_2 = h_k + 2 h_{k-1}\).

The end slopes are set using a one-sided scheme [2].
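
For context, a brief usage sketch of the class with illustrative values, showing the monotone, non-overshooting behavior described above:

import numpy as np
from scipy.interpolate import PchipInterpolator

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8])      # non-smooth data
pchip = PchipInterpolator(x, y)                # shape-preserving, no overshoot
x_new = np.linspace(0.0, 4.0, 9)
y_new = pchip(x_new)                           # evaluate the interpolant
slope = pchip.derivative()(x_new)              # first derivative is continuous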

References

[1]

F. N. Fritsch and J. Butland, A method for constructing local monotone piecewise cubic interpolants, SIAM J. Sci. Comput., 5(2), 300-304 (1984). :doi:`10.1137/0905021`.

[2]

see, e.g., C. Moler, Numerical Computing with Matlab, 2004. :doi:`10.1137/1.9780898717952`

Methods

__call__(x[, nu, extrapolate])

Evaluate the piecewise polynomial or its derivative.

derivative([nu])

Construct a new piecewise polynomial representing the derivative.

antiderivative([nu])

Construct a new piecewise polynomial representing the antiderivative.

roots([discontinuity, extrapolate])

Find real roots of the piecewise polynomial.

__init__(x, y, axis=0, extrapolate=None)[source]
static _edge_case(h0, h1, m0, m1)[source]
static _find_derivatives(x, y)[source]
axis
c
extrapolate
x
_parse_max_snap(max_snap)[source]
_resolve_gap_endpoints_subset_snap(ts0, ts1, window, max_snap=None)[source]
Contract:
  • If window is None:
    • If there’s a natural gap (ts0.last < ts1.first), use that full gap.

    • Otherwise (overlap/abut), return None to signal ‘no explicit gap’ (algorithms decide).

  • If window is provided:
    • Enforce: start < end; ts0 must have samples at or before start and ts1 at or after end. Otherwise a ValueError is raised.

    • If there is a natural gap AND (start,end) is a strict subset of it, expand start left and end right by up to max_snap (default 0) but never beyond the natural gap bounds. Otherwise, ignore max_snap.

    • Always snap endpoints to data: start_time = last ts0 sample <= effective start, end_time = first ts1 sample >= effective end.

Returns:

(start_time, end_time) or None if no explicit gap is to be used.
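
A hedged sketch of that contract follows (simplified: it assumes DatetimeIndex inputs and a single scalar max_snap rather than a pair, and is not the library’s actual code):

import pandas as pd

def resolve_gap_sketch(ts0, ts1, window=None, max_snap=None):
    """Illustrative reading of the contract above."""
    gap_lo, gap_hi = ts0.index[-1], ts1.index[0]
    natural_gap = gap_lo < gap_hi
    if window is None:
        return (gap_lo, gap_hi) if natural_gap else None
    start, end = pd.Timestamp(window[0]), pd.Timestamp(window[1])
    if not (start < end and ts0.index[0] <= start and end <= ts1.index[-1]):
        raise ValueError("window must satisfy start < end and be covered by the data")
    if natural_gap and gap_lo < start and end < gap_hi:
        # window is strictly inside the natural gap: widen by up to max_snap
        snap = pd.Timedelta(max_snap) if max_snap is not None else pd.Timedelta(0)
        start, end = max(start - snap, gap_lo), min(end + snap, gap_hi)
    # snap to actual samples: last ts0 sample <= start, first ts1 sample >= end
    start_time = ts0.index[ts0.index <= start][-1]
    end_time = ts1.index[ts1.index >= end][0]
    return (start_time, end_time)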

align_inputs_pair_strict(ts0_kw='ts0', ts1_kw='ts1', names_kw='names')[source]
transition_ts(ts0, ts1, method='linear', window=None, overlap=(0, 0), return_type='series', names=None, max_snap=None)[source]

Create a smooth transition between two aligned time series.

Parameters:
ts0 : pandas.Series or pandas.DataFrame

The initial time series segment. Must share the same frequency and type as ts1.

ts1 : pandas.Series or pandas.DataFrame

The final time series segment. Must share the same frequency and type as ts0.

method : {“linear”, “pchip”}, default=”linear”

The interpolation method to use for generating the transition.

window : [start, end] or None

If None and there’s a natural gap (ts0.last < ts1.first), that full gap is used. If provided, start < end must hold, ts0 must have samples at or before start, and ts1 must have samples at or after end.

names : None, str, or iterable of str, optional
  • If None (default), inputs must share compatible column names.

  • If str, the output is univariate and will be named accordingly.

  • If iterable, it is used as a subset/ordering of columns.

overlap : tuple of int or str, default=(0, 0)

Amount of overlap to use for interpolation anchoring in pchip mode. Each entry can be:
  • An integer: number of data points before/after to use.

  • A pandas-compatible frequency string, e.g., “2h” or “45min”.

max_snap : None | Timedelta-like | (Timedelta-like, Timedelta-like)

Optional widening ONLY when window is strictly inside the natural gap. Expands start earlier and end later by up to max_snap, but never past (ts0.last, ts1.first). Default None = no widening.

return_type : {“series”, “glue”}, default=”series”
  • “series”: returns the full merged series including ts0, transition, ts1.

  • “glue”: returns only the interpolated transition segment.

Returns:
pandas.Series or pandas.DataFrame

The resulting time series segment, either the full merged series or just the transition zone.

Raises:
ValueError

If ts0 and ts1 have mismatched types or frequencies, or if overlap exists but window is not specified.
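
A short usage sketch based on the signature above (the series values and the import path are illustrative assumptions):

import numpy as np
import pandas as pd
from vtools.functions.transition import transition_ts   # import path assumed from this page

idx0 = pd.date_range("2020-01-01", periods=48, freq="h")
idx1 = pd.date_range("2020-01-05", periods=48, freq="h")        # leaves a natural gap
ts0 = pd.Series(np.sin(np.arange(48) / 6.0), index=idx0, name="wl")
ts1 = pd.Series(np.sin(np.arange(48) / 6.0) + 0.5, index=idx1, name="wl")

full = transition_ts(ts0, ts1, method="linear")                 # fill the natural gap
glue = transition_ts(ts0, ts1, method="pchip", overlap=("6h", "6h"),
                     return_type="glue")                        # only the transition segment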

vtools.functions.unit_conversions module

Unit conversion helpers.

This module provides:
  • linear/affine converters for common engineering units: metres↔feet, cms↔cfs, °F↔°C (all functional, no in-place mutation).

  • Domain-specific conversions between electrical conductivity (EC, μS/cm) and practical salinity (PSU) at 25 °C, with optional Hill low-salinity correction and an accuracy-improving root-finding “refinement” step.

  • a general-purpose unit conversion function convert_units() that uses Pint by default (with an optional cf_units backend via an environment variable), and that has fast paths for the above common conversions.

Notes

  • PSU is treated here as a practical “unit” for salinity in workflows, even though in a strict metrological sense it is unitless.

  • The EC↔PSU conversions assume 25 °C and no explicit temperature dependence beyond the optional Hill correction.

References

Schemel, L.E. (2001) Empirical relationships between salinity and specific conductance in San Francisco Bay, California.

Hill, K. (low-salinity correction widely used in estuarine practice).

_get_converter(iu: str, ou: str)[source]

Return a callable(arr)->arr using Pint by default; cf_units if env-forced.

_norm(u: str) → str[source]

Normalize common shorthands to canonical spellings without destroying case needed by Pint (e.g., degC/degF).

_rewrap_like(values, arr)[source]
_want_cf_units() → bool[source]
celsius_to_fahrenheit(x)[source]

Convert °C to °F.

Parameters:
x : scalar | array-like | pd.Series | pd.DataFrame

Value(s) in degrees Celsius.

Returns:
same type as x

Value(s) in degrees Fahrenheit.

cfs_to_cms(x)[source]

Convert ft³/s to m³/s.

Parameters:
x : scalar | array-like | pd.Series | pd.DataFrame

Value(s) in cfs.

Returns:
same type as x

Value(s) in cubic meters per second.

cms_to_cfs(x)[source]

Convert m³/s to ft³/s.

Parameters:
x : scalar | array-like | pd.Series | pd.DataFrame

Value(s) in cms.

Returns:
same type as x

Value(s) in cfs.

convert_units(values, in_unit: str, out_unit: str)[source]

Convert array-like / pandas objects between units. Fast custom paths for EC↔PSU@25C, temperature, cfs↔cms, ft↔m; else Pint-backed.

Parameters:
values : array-like | pd.Series | pd.DataFrame
in_unit, out_unit : str

Unit strings. Shorthands like ‘cfs’, ‘cms’, ‘ft3/s’, ‘μS/cm’, ‘deg F’ are accepted.

Returns:
Same type as values, converted.
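
A brief usage sketch of convert_units() (unit spellings beyond the shorthands listed above, and the exact return values, are assumptions):

import pandas as pd
from vtools.functions.unit_conversions import convert_units   # module documented above

flow_cfs = pd.Series([100.0, 250.0, 400.0])
flow_cms = convert_units(flow_cfs, "cfs", "cms")      # fast path for cfs->cms
stage_m = convert_units([3.0, 6.5], "ft", "m")        # fast path for ft->m
temp_c = convert_units(68.0, "deg F", "deg C")        # expected ~20.0
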
ec_psu_25c(ec, hill_correction=True)[source]

Convert electrical conductivity (EC, μS/cm) to practical salinity (PSU) at 25 °C.

This implements the empirical relationship used for estuarine work, with an optional Hill correction that improves behavior at low salinities.

Parameters:
ec : array-like or scalar

Electrical conductivity in μS/cm.

hill_correction : bool, default True

Apply Hill low-salinity correction.

Returns:
ndarray or scalar

Practical salinity (PSU). For negative EC inputs:
  • scalar input → returns NaN

  • array input → returns NaN at those positions

Notes

  • Assumes temperature is 25 °C.

  • Negative EC values are internally floored to a small positive ratio for computation (R=1e-4); those outputs are then set to NaN on array paths (or NaN returned for scalar paths).
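
A short usage sketch of the EC↔PSU pair (the example conductivity values are purely illustrative):

import numpy as np
from vtools.functions.unit_conversions import ec_psu_25c, psu_ec_25c

ec = np.array([150.0, 2000.0, 35000.0])        # μS/cm, roughly fresh to brackish
psu = ec_psu_25c(ec, hill_correction=True)     # practical salinity at 25 °C
ec_back = psu_ec_25c(psu, refine=True)         # refined inverse; should round-trip closely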

fahrenheit_to_celsius(x)[source]

Convert °F to °C.

Parameters:
x : scalar | array-like | pd.Series | pd.DataFrame

Value(s) in degrees Fahrenheit.

Returns:
same type as x

Value(s) in degrees Celsius.

ft_to_m(x)[source]

Convert feet to metres.

Parameters:
x : scalar | array-like | pd.Series | pd.DataFrame

Value(s) in feet.

Returns:
same type as x

Value(s) in meters.

m_to_ft(x)[source]

Convert metres to feet.

Parameters:
x : scalar | array-like | pd.Series | pd.DataFrame

Value(s) in metres.

Returns:
same type as x

Value(s) in feet.

psu_ec_25c(psu, refine=True, hill_correction=True)[source]

Convert practical salinity (PSU) to EC (μS/cm) at 25 °C (vectorized).

Parameters:
psu : array-like or scalar

Practical salinity value(s).

refine : bool, default True

Use root finding via psu_ec_25c_scalar() for accuracy.

hill_correction : bool, default True

See psu_ec_25c_scalar().

Returns:
ndarray or scalar

EC in μS/cm. Scalar input returns a scalar; array-like input returns a NumPy array of the same shape.

psu_ec_25c_scalar(psu, refine=True, hill_correction=True)[source]

Convert practical salinity (PSU) to EC (μS/cm) at 25 °C for a scalar value.

Parameters:
psu : float

Practical salinity. Must be non-negative and ≤ ~35 for oceanic cases (a hard check is enforced near sea salinity when refine is True).

refine : bool, default True

If True, use a scalar root finder (Brent) to invert the EC→PSU mapping accurately. If False, use a closed-form Schemel-style polynomial approximation.

hill_correction : bool, default True

Only meaningful with refine=True; raises if refine=False and hill_correction=True.

Returns:
float

Electrical conductivity (μS/cm).

Raises:
ValueError

If psu < 0, if psu exceeds the sea-salinity cap in refine mode, or if an invalid combination of refine/hill_correction is requested.

Notes

  • The refinement typically converges in ~4–6 iterations.

  • The non-refined polynomial is faster but can drift on round trips (EC→PSU→EC).
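
The Brent-based refinement described above can be illustrated generically (this sketch is not the library’s psu_ec_25c_scalar or psu_ec_resid code; the function name and bracket are assumptions):

from scipy.optimize import brentq

def invert_forward_map(target_psu, forward, ec_lo=1.0, ec_hi=100000.0):
    """Find EC such that forward(EC) == target_psu for a monotone forward map.

    `forward` is any EC -> PSU function (e.g. ec_psu_25c); the bracket
    [ec_lo, ec_hi] must contain the answer.
    """
    return brentq(lambda ec: forward(ec) - target_psu, ec_lo, ec_hi)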

psu_ec_resid(x, psu, hill_correction)[source]

Module contents