vtools.functions package¶
Submodules¶
vtools.functions.climatology module¶
- apply_climatology(climate, index=None, start=None, end=None, freq=None)[source]¶
Apply daily or monthly climatology to a new index or generate index from start/end/freq
- Parameters:
- climate : DataFrame or Series
DataFrame with integer index representing month of year (Jan=1) or day of year. Must be of size 12, 365, or 366. Day 366 will be inferred from the day 365 value.
- index : pandas.DatetimeIndex, optional
Locations at which the climatology will be evaluated. If not provided, must specify start, end, and freq.
- start : str or datetime-like, optional
Start date for generating index (used if index is None).
- end : str or datetime-like, optional
End date for generating index (used if index is None).
- freq : str, optional
Frequency string for generating index (used if index is None). E.g., 'D' for daily, 'M' for monthly.
- Returns:
- DataFrame or Series
Values extracted from climatology for the month or day at the specified index.
Notes
If index is not provided, start, end, and freq must be specified to generate a DatetimeIndex using pandas.date_range.
Backward compatible: original behavior is preserved if index is provided.
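For orientation, here is a minimal usage sketch; the import path follows the module heading above and the synthetic monthly climatology is purely illustrative:
import pandas as pd
import numpy as np
from vtools.functions.climatology import apply_climatology

# Monthly climatology: integer index 1..12 (Jan=1), size 12
monthly = pd.Series(np.arange(1, 13, dtype=float), index=range(1, 13))

# Either pass an existing DatetimeIndex ...
idx = pd.date_range("2021-01-01", "2021-12-31", freq="D")
daily = apply_climatology(monthly, index=idx)

# ... or let the function build one from start/end/freq
daily2 = apply_climatology(monthly, start="2021-01-01", end="2021-12-31", freq="D")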
- climatology(ts, freq, nsmooth=None)[source]¶
Create a climatology on the columns of ts.
- Parameters:
- ts: DataFrame or Series
- Data structure to be analyzed. Must have a length of at least 2*freq.
- freq: period [“day”,”month”]
- Period over which the climatology is analyzed
- nsmooth: int
window size (number of values) of pre-smoothing. This may not make sense for series that are not approximately regular. An odd number is usually best.
- Returns:
out: DataFrame or Series
Data structure of the same type as ts, with an integer index representing month of year (Jan=1) or day of year (1-365).
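A hedged sketch of building a daily climatology from several years of synthetic data (import path per the module heading; parameter choices are illustrative):
import pandas as pd
import numpy as np
from vtools.functions.climatology import climatology

idx = pd.date_range("2015-01-01", "2020-12-31", freq="D")
ts = pd.Series(10.0 + 5.0 * np.sin(2 * np.pi * idx.dayofyear / 365.25), index=idx)

# Daily climatology with light pre-smoothing (an odd window is usually best)
clim = climatology(ts, freq="day", nsmooth=5)   # integer index, Jan 1 = day 1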
- climatology_quantiles(ts, min_day_year, max_day_year, window_width, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])[source]¶
Create windowed quantiles across years on a time series.
- Parameters:
- ts: DataFrame or Series
- DataStructure to be analyzed.
- min_day_year: int
- Minimum Julian day to be considered
- max_day_year: int
- Maximum Julian day to be considered
- window_width: int
- Number of days to include, including the central day and days on each side. So for instance window_width=15 would span the central date and 7 days on each side
- quantiles: array-like
quantiles requested
- Returns:
out: DataFrame or Series
Data structure with Julian day as the index and the requested quantiles as columns.
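A hedged sketch of windowed quantiles across years; the window_width of 15 spans the central day plus 7 days on each side, as described above:
import pandas as pd
import numpy as np
from vtools.functions.climatology import climatology_quantiles

idx = pd.date_range("2010-01-01", "2020-12-31", freq="D")
ts = pd.Series(10.0 + 5.0 * np.sin(2 * np.pi * idx.dayofyear / 365.25), index=idx)

quants = climatology_quantiles(ts, min_day_year=1, max_day_year=365,
                               window_width=15,
                               quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])
# quants is indexed by Julian day, with one column per requested quantile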
vtools.functions.colname_align module¶
Column naming alignment utilities for time series composition functions.
This module provides decorators that standardize how functions like
ts_merge, ts_splice, and transition_ts handle their names
argument and enforce column consistency across multiple time series inputs.
Main features¶
- Column consistency enforcement: Ensures that when names=None (default), all input DataFrames share identical columns. This prevents accidental creation of staggered or mismatched columns.
- Centralized naming behavior: Applies uniform handling of names values:
  - None — require identical columns across all inputs and keep them.
  - str — require univariate inputs (single column each); output is a single-column DataFrame (or Series if all inputs were Series) with this name.
  - Iterable[str] — treated as a column selector: these columns are selected (and ordered) from the final output and must exist in every input.
- Support for both list-style and pairwise APIs: Works for functions that accept a sequence of time series (like ts_merge/ts_splice) or two explicit series arguments (like transition_ts).
Usage pattern¶
Decorate your functions as follows:
@columns_aligned(mode="same_set")
@names_aligned(seq_arg=0, pre_rename=True)
def ts_splice(series, names=None, ...):
...
@columns_aligned(mode="same_set")
@names_aligned_pair(ts0_kw="ts0", ts1_kw="ts1")
def transition_ts(ts0, ts1, names=None, ...):
...
This ensures consistent semantics for all multi-series combination tools.
- _coerce_inputs_strict(seq, names)[source]¶
Strict input alignment policy:
- names is None -> all inputs must have identical column lists (no unions/intersections).
- names is str -> leave inputs as-is; final renaming happens via align_names(…).
- names is list -> for each DataFrame, select exactly those columns; for a Series, only len==1 is allowed.
vtools.functions.envelope module¶
- class PchipInterpolator(x, y, axis=0, extrapolate=None)[source]¶
Bases: CubicHermiteSpline
PCHIP 1-D monotonic cubic interpolation.
x and y are arrays of values used to approximate some function f, with y = f(x). The interpolant uses monotonic cubic splines to find the value of new points. (PCHIP stands for Piecewise Cubic Hermite Interpolating Polynomial).
- Parameters:
- x : ndarray, shape (npoints,)
A 1-D array of monotonically increasing real values. x cannot include duplicate values (otherwise f is overspecified).
- y : ndarray, shape (…, npoints, …)
An N-D array of real values. y's length along the interpolation axis must be equal to the length of x. Use the axis parameter to select the interpolation axis.
- axis : int, optional
Axis in the y array corresponding to the x-coordinate values. Defaults to axis=0.
- extrapolate : bool, optional
Whether to extrapolate to out-of-bounds points based on first and last intervals, or to return NaNs.
See also
CubicHermiteSpline : Piecewise-cubic interpolator.
Akima1DInterpolator : Akima 1D interpolator.
CubicSpline : Cubic spline data interpolator.
PPoly : Piecewise polynomial in terms of coefficients and breakpoints.
Notes
The interpolator preserves monotonicity in the interpolation data and does not overshoot if the data is not smooth.
The first derivatives are guaranteed to be continuous, but the second derivatives may jump at \(x_k\).
Determines the derivatives at the points \(x_k\), \(f'_k\), by using PCHIP algorithm [1].
Let \(h_k = x_{k+1} - x_k\), and \(d_k = (y_{k+1} - y_k) / h_k\) are the slopes at internal points \(x_k\). If the signs of \(d_k\) and \(d_{k-1}\) are different or either of them equals zero, then \(f'_k = 0\). Otherwise, it is given by the weighted harmonic mean
\[\frac{w_1 + w_2}{f'_k} = \frac{w_1}{d_{k-1}} + \frac{w_2}{d_k}\]
where \(w_1 = 2 h_k + h_{k-1}\) and \(w_2 = h_k + 2 h_{k-1}\).
The end slopes are set using a one-sided scheme [2].
References
[1] F. N. Fritsch and J. Butland, A method for constructing local monotone piecewise cubic interpolants, SIAM J. Sci. Comput., 5(2), 300-304 (1984). :doi:`10.1137/0905021`.
[2] See, e.g., C. Moler, Numerical Computing with Matlab, 2004. :doi:`10.1137/1.9780898717952`.
Methods
__call__(x[, nu, extrapolate]) : Evaluate the piecewise polynomial or its derivative.
derivative([nu]) : Construct a new piecewise polynomial representing the derivative.
antiderivative([nu]) : Construct a new piecewise polynomial representing the antiderivative.
roots([discontinuity, extrapolate]) : Find real roots of the piecewise polynomial.
- axis¶
- c¶
- extrapolate¶
- x¶
- chunked_loess_smoothing(ts, window_hours=1.25, chunk_days=10, overlap_days=1)[source]¶
Apply LOESS smoothing in overlapping chunks to reduce computation time.
- Parameters:
- ts : pd.Series
Time series with datetime index and possible NaNs.
- window_hours : float
LOESS smoothing window size in hours.
- chunk_days : int
Core chunk size (e.g., 10 days).
- overlap_days : int
Overlap added before and after each chunk to avoid edge effects.
- Returns:
- pd.Series
Smoothed series, NaNs where input is NaN or unsupported.
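A minimal sketch, assuming the function is imported from the envelope module named above; the synthetic series and gap are illustrative:
import pandas as pd
import numpy as np
from vtools.functions.envelope import chunked_loess_smoothing

idx = pd.date_range("2022-01-01", periods=30 * 96, freq="15min")   # 30 days of 15-min data
noisy = pd.Series(np.sin(2 * np.pi * np.arange(len(idx)) / 49.6), index=idx)
noisy.iloc[100:110] = np.nan          # NaNs in the input remain NaN in the output

smooth = chunked_loess_smoothing(noisy, window_hours=1.25, chunk_days=10, overlap_days=1)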
- filter_extrema_ngood(extrema_df, smoothed, series, loess_window_pts=25, n_good=3, sig_gap_minutes=45)[source]¶
Filter extrema based on local and contextual data quality criteria.
- Parameters:
- extrema_df : pd.DataFrame
DataFrame with columns 'time' and 'value' for candidate extrema.
- smoothed : pd.Series
Smoothed version of the signal used for extrema detection.
- series : pd.Series
Original time series (with gaps).
- loess_window_pts : int
Number of points in the LOESS window.
- n_good : int
Minimum number of non-NaN points required.
- sig_gap_minutes : float
Threshold for detecting significant gaps (in minutes).
- Returns:
- pd.DataFrame
Filtered extrema DataFrame.
- find_peaks(x, height=None, threshold=None, distance=None, prominence=None, width=None, wlen=None, rel_height=0.5, plateau_size=None)[source]¶
Find peaks inside a signal based on peak properties.
This function takes a 1-D array and finds all local maxima by simple comparison of neighboring values. Optionally, a subset of these peaks can be selected by specifying conditions for a peak’s properties.
- Parameters:
- x : sequence
A signal with peaks.
- height : number or ndarray or sequence, optional
Required height of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required height.
- threshold : number or ndarray or sequence, optional
Required threshold of peaks, the vertical distance to its neighboring samples. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required threshold.
- distance : number, optional
Required minimal horizontal distance (>= 1) in samples between neighbouring peaks. Smaller peaks are removed first until the condition is fulfilled for all remaining peaks.
- prominence : number or ndarray or sequence, optional
Required prominence of peaks. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required prominence.
- width : number or ndarray or sequence, optional
Required width of peaks in samples. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required width.
- wlen : int, optional
Used for calculation of the peaks prominences, thus it is only used if one of the arguments prominence or width is given. See argument wlen in peak_prominences for a full description of its effects.
- rel_height : float, optional
Used for calculation of the peaks width, thus it is only used if width is given. See argument rel_height in peak_widths for a full description of its effects.
- plateau_size : number or ndarray or sequence, optional
Required size of the flat top of peaks in samples. Either a number, None, an array matching x or a 2-element sequence of the former. The first element is always interpreted as the minimal and the second, if supplied, as the maximal required plateau size.
New in version 1.2.0.
- Returns:
- peaks : ndarray
Indices of peaks in x that satisfy all given conditions.
- properties : dict
A dictionary containing properties of the returned peaks which were calculated as intermediate results during evaluation of the specified conditions:
- 'peak_heights'
If height is given, the height of each peak in x.
- 'left_thresholds', 'right_thresholds'
If threshold is given, these keys contain a peak's vertical distance to its neighbouring samples.
- 'prominences', 'right_bases', 'left_bases'
If prominence is given, these keys are accessible. See peak_prominences for a description of their content.
- 'width_heights', 'left_ips', 'right_ips'
If width is given, these keys are accessible. See peak_widths for a description of their content.
- 'plateau_sizes', 'left_edges', 'right_edges'
If plateau_size is given, these keys are accessible and contain the indices of a peak's edges (edges are still part of the plateau) and the calculated plateau sizes.
New in version 1.2.0.
To calculate and return properties without excluding peaks, provide the open interval (None, None) as a value to the appropriate argument (excluding distance).
- Warns:
- PeakPropertyWarning
Raised if a peak’s properties have unexpected values (see peak_prominences and peak_widths).
Warning
This function may return unexpected results for data containing NaNs. To avoid this, NaNs should either be removed or replaced.
See also
find_peaks_cwt : Find peaks using the wavelet transformation.
peak_prominences : Directly calculate the prominence of peaks.
peak_widths : Directly calculate the width of peaks.
Notes
In the context of this function, a peak or local maximum is defined as any sample whose two direct neighbours have a smaller amplitude. For flat peaks (more than one sample of equal amplitude wide) the index of the middle sample is returned (rounded down in case the number of samples is even). For noisy signals the peak locations can be off because the noise might change the position of local maxima. In those cases consider smoothing the signal before searching for peaks or use other peak finding and fitting methods (like find_peaks_cwt).
Some additional comments on specifying conditions:
Almost all conditions (excluding distance) can be given as half-open or closed intervals, e.g., 1 or (1, None) defines the half-open interval \([1, \infty]\) while (None, 1) defines the interval \([-\infty, 1]\). The open interval (None, None) can be specified as well, which returns the matching properties without exclusion of peaks. The border is always included in the interval used to select valid peaks.
For several conditions the interval borders can be specified with arrays matching x in shape, which enables dynamic constraints based on the sample position.
The conditions are evaluated in the following order: plateau_size, height, threshold, distance, prominence, width. In most cases this order is the fastest one because faster operations are applied first to reduce the number of peaks that need to be evaluated later.
While indices in peaks are guaranteed to be at least distance samples apart, edges of flat peaks may be closer than the allowed distance.
Use wlen to reduce the time it takes to evaluate the conditions for prominence or width if x is large or has many local maxima (see peak_prominences).
New in version 1.1.0.
Examples
To demonstrate this function’s usage we use a signal x supplied with SciPy (see scipy.datasets.electrocardiogram). Let’s find all peaks (local maxima) in x whose amplitude lies above 0.
>>> import numpy as np >>> import matplotlib.pyplot as plt >>> from scipy.datasets import electrocardiogram >>> from scipy.signal import find_peaks >>> x = electrocardiogram()[2000:4000] >>> peaks, _ = find_peaks(x, height=0) >>> plt.plot(x) >>> plt.plot(peaks, x[peaks], "x") >>> plt.plot(np.zeros_like(x), "--", color="gray") >>> plt.show()
We can select peaks below 0 with height=(None, 0) or use arrays matching x in size to reflect a changing condition for different parts of the signal.
>>> border = np.sin(np.linspace(0, 3 * np.pi, x.size)) >>> peaks, _ = find_peaks(x, height=(-border, border)) >>> plt.plot(x) >>> plt.plot(-border, "--", color="gray") >>> plt.plot(border, ":", color="gray") >>> plt.plot(peaks, x[peaks], "x") >>> plt.show()
Another useful condition for periodic signals can be given with the distance argument. In this case, we can easily select the positions of QRS complexes within the electrocardiogram (ECG) by demanding a distance of at least 150 samples.
>>> peaks, _ = find_peaks(x, distance=150) >>> np.diff(peaks) array([186, 180, 177, 171, 177, 169, 167, 164, 158, 162, 172]) >>> plt.plot(x) >>> plt.plot(peaks, x[peaks], "x") >>> plt.show()
Especially for noisy signals peaks can be easily grouped by their prominence (see peak_prominences). E.g., we can select all peaks except for the mentioned QRS complexes by limiting the allowed prominence to 0.6.
>>> peaks, properties = find_peaks(x, prominence=(None, 0.6)) >>> properties["prominences"].max() 0.5049999999999999 >>> plt.plot(x) >>> plt.plot(peaks, x[peaks], "x") >>> plt.show()
And, finally, let’s examine a different section of the ECG which contains beat forms of different shape. To select only the atypical heart beats, we combine two conditions: a minimal prominence of 1 and width of at least 20 samples.
>>> x = electrocardiogram()[17000:18000] >>> peaks, properties = find_peaks(x, prominence=1, width=20) >>> properties["prominences"], properties["widths"] (array([1.495, 2.3 ]), array([36.93773946, 39.32723577])) >>> plt.plot(x) >>> plt.plot(peaks, x[peaks], "x") >>> plt.vlines(x=peaks, ymin=x[peaks] - properties["prominences"], ... ymax = x[peaks], color = "C1") >>> plt.hlines(y=properties["width_heights"], xmin=properties["left_ips"], ... xmax=properties["right_ips"], color = "C1") >>> plt.show()
- find_raw_extrema(smoothed, prominence=0.01)[source]¶
Find raw peaks and troughs using scipy.signal.find_peaks. Returns DataFrames for peaks and troughs.
- generate_pink_noise(n, seed=None, scale=1.0)[source]¶
Generate pink (1/f) noise using the Voss-McCartney algorithm.
- Parameters:
- n : int
Number of samples to generate.
- seed : int or None
Random seed for reproducibility.
- scale : float
Standard deviation scaling factor for the noise.
- Returns:
- np.ndarray
Pink noise signal of length n.
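A minimal sketch (import path per the module heading above; the argument values are arbitrary):
import numpy as np
from vtools.functions.envelope import generate_pink_noise

noise = generate_pink_noise(n=10000, seed=42, scale=0.05)
print(noise.shape)   # (10000,) ndarray of 1/f noise, scaled by 0.05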
- generate_simplified_mixed_tide(start_time='2022-01-01', ndays=40, freq='15min', A_M2=1.0, A_K1=0.5, A_O1=0.5, phase_D1=1.570795, noise_amplitude=0.08, return_components=False)[source]¶
Generate a simplified synthetic mixed semidiurnal/diurnal tide with explicit O1 and K1.
- Parameters:
- start_time : str
Start time for the series.
- ndays : int
Number of days.
- freq : str
Sampling interval.
- A_M2 : float
Amplitude of M2.
- A_K1 : float
Amplitude of K1.
- A_O1 : float
Amplitude of O1.
- phase_D1 : float
Common phase shift for O1 and K1.
- return_components : bool
Whether to return individual components.
- Returns:
- pd.Series or pd.DataFrame
Combined tide or components with time index.
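A minimal sketch generating a synthetic mixed tide; the keyword values mirror the defaults shown in the signature above:
from vtools.functions.envelope import generate_simplified_mixed_tide

tide = generate_simplified_mixed_tide(start_time="2022-01-01", ndays=40, freq="15min",
                                      A_M2=1.0, A_K1=0.5, A_O1=0.5,
                                      noise_amplitude=0.08)

# With return_components=True the individual constituents are returned as well
parts = generate_simplified_mixed_tide(ndays=5, return_components=True)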
- interpolate_envelope(anchor_df, series, max_anchor_gap_hours=36)[source]¶
Interpolate envelope using PCHIP, breaking if anchor points are too far apart.
- lowess(endog, exog, frac=0.6666666666666666, it=3, delta=0.0, xvals=None, is_sorted=False, missing='drop', return_sorted=True)[source]¶
LOWESS (Locally Weighted Scatterplot Smoothing)
A lowess function that outputs smoothed estimates of endog at the given exog values from points (exog, endog).
- Parameters:
- endog1-D numpy array
The y-values of the observed points
- exog1-D numpy array
The x-values of the observed points
- fracfloat
Between 0 and 1. The fraction of the data used when estimating each y-value.
- itint
The number of residual-based reweightings to perform.
- deltafloat
Distance within which to use linear-interpolation instead of weighted regression.
- xvals: 1-D numpy array
Values of the exogenous variable at which to evaluate the regression. If supplied, cannot use delta.
- is_sortedbool
If False (default), then the data will be sorted by exog before calculating lowess. If True, then it is assumed that the data is already sorted by exog. If xvals is specified, then it too must be sorted if is_sorted is True.
- missingstr
Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘drop’.
- return_sortedbool
If True (default), then the returned array is sorted by exog and has missing (nan or infinite) observations removed. If False, then the returned array is in the same length and the same sequence of observations as the input array.
- Returns:
- out{ndarray, float}
The returned array is two-dimensional if return_sorted is True, and one dimensional if return_sorted is False. If return_sorted is True, then a numpy array with two columns. The first column contains the sorted x (exog) values and the second column the associated estimated y (endog) values. If return_sorted is False, then only the fitted values are returned, and the observations will be in the same order as the input arrays. If xvals is provided, then return_sorted is ignored and the returned array is always one dimensional, containing the y values fitted at the x values provided by xvals.
Notes
This lowess function implements the algorithm given in the reference below using local linear estimates.
Suppose the input data has N points. The algorithm works by estimating the smooth y_i by taking the frac*N closest points to (x_i,y_i) based on their x values and estimating y_i using a weighted linear regression. The weight for (x_j,y_j) is tricube function applied to abs(x_i-x_j).
If it > 1, then further weighted local linear regressions are performed, where the weights are the same as above times the _lowess_bisquare function of the residuals. Each iteration takes approximately the same amount of time as the original fit, so these iterations are expensive. They are most useful when the noise has extremely heavy tails, such as Cauchy noise. Noise with less heavy-tails, such as t-distributions with df>2, are less problematic. The weights downgrade the influence of points with large residuals. In the extreme case, points whose residuals are larger than 6 times the median absolute residual are given weight 0.
delta can be used to save computations. For each x_i, regressions are skipped for points closer than delta. The next regression is fit for the farthest point within delta of x_i and all points in between are estimated by linearly interpolating between the two regression fits.
Judicious choice of delta can cut computation time considerably for large data (N > 5000). A good choice is
delta = 0.01 * range(exog). If xvals is provided, the regression is then computed at those points and the fit values are returned. Otherwise, the regression is run at points of exog.
Some experimentation is likely required to find a good choice of frac and iter for a particular dataset.
References
Cleveland, W.S. (1979) “Robust Locally Weighted Regression and Smoothing Scatterplots”. Journal of the American Statistical Association 74 (368): 829-836.
Examples
The below allows a comparison between how different the fits from lowess for different values of frac can be.
>>> import numpy as np >>> import statsmodels.api as sm >>> lowess = sm.nonparametric.lowess >>> x = np.random.uniform(low = -2*np.pi, high = 2*np.pi, size=500) >>> y = np.sin(x) + np.random.normal(size=len(x)) >>> z = lowess(y, x) >>> w = lowess(y, x, frac=1./3)
This gives a similar comparison for when it is 0 vs not.
>>> import numpy as np >>> import scipy.stats as stats >>> import statsmodels.api as sm >>> lowess = sm.nonparametric.lowess >>> x = np.random.uniform(low = -2*np.pi, high = 2*np.pi, size=500) >>> y = np.sin(x) + stats.cauchy.rvs(size=len(x)) >>> z = lowess(y, x, frac= 1./3, it=0) >>> w = lowess(y, x, frac=1./3)
- savgol_filter(x, window_length, polyorder, deriv=0, delta=1.0, axis=-1, mode='interp', cval=0.0)[source]¶
Apply a Savitzky-Golay filter to an array.
This is a 1-D filter. If x has dimension greater than 1, axis determines the axis along which the filter is applied.
- Parameters:
- xarray_like
The data to be filtered. If x is not a single or double precision floating point array, it will be converted to type
numpy.float64before filtering.- window_lengthint
The length of the filter window (i.e., the number of coefficients). If mode is ‘interp’, window_length must be less than or equal to the size of x.
- polyorderint
The order of the polynomial used to fit the samples. polyorder must be less than window_length.
- derivint, optional
The order of the derivative to compute. This must be a nonnegative integer. The default is 0, which means to filter the data without differentiating.
- deltafloat, optional
The spacing of the samples to which the filter will be applied. This is only used if deriv > 0. Default is 1.0.
- axisint, optional
The axis of the array x along which the filter is to be applied. Default is -1.
- modestr, optional
Must be ‘mirror’, ‘constant’, ‘nearest’, ‘wrap’ or ‘interp’. This determines the type of extension to use for the padded signal to which the filter is applied. When mode is ‘constant’, the padding value is given by cval. See the Notes for more details on ‘mirror’, ‘constant’, ‘wrap’, and ‘nearest’. When the ‘interp’ mode is selected (the default), no extension is used. Instead, a degree polyorder polynomial is fit to the last window_length values of the edges, and this polynomial is used to evaluate the last window_length // 2 output values.
- cvalscalar, optional
Value to fill past the edges of the input if mode is ‘constant’. Default is 0.0.
- Returns:
- yndarray, same shape as x
The filtered data.
See also
savgol_coeffs
Notes
Details on the mode options:
- ‘mirror’:
Repeats the values at the edges in reverse order. The value closest to the edge is not included.
- ‘nearest’:
The extension contains the nearest input value.
- ‘constant’:
The extension contains the value given by the cval argument.
- ‘wrap’:
The extension contains the values from the other end of the array.
For example, if the input is [1, 2, 3, 4, 5, 6, 7, 8], and window_length is 7, the following shows the extended data for the various mode options (assuming cval is 0):
mode       |   Ext   |         Input          |   Ext
-----------+---------+------------------------+---------
'mirror'   | 4  3  2 | 1  2  3  4  5  6  7  8 | 7  6  5
'nearest'  | 1  1  1 | 1  2  3  4  5  6  7  8 | 8  8  8
'constant' | 0  0  0 | 1  2  3  4  5  6  7  8 | 0  0  0
'wrap'     | 6  7  8 | 1  2  3  4  5  6  7  8 | 1  2  3
New in version 0.14.0.
Examples
>>> import numpy as np >>> from scipy.signal import savgol_filter >>> np.set_printoptions(precision=2) # For compact display. >>> x = np.array([2, 2, 5, 2, 1, 0, 1, 4, 9])
Filter with a window length of 5 and a degree 2 polynomial. Use the defaults for all other parameters.
>>> savgol_filter(x, 5, 2) array([1.66, 3.17, 3.54, 2.86, 0.66, 0.17, 1. , 4. , 9. ])
Note that the last five values in x are samples of a parabola, so when mode=’interp’ (the default) is used with polyorder=2, the last three values are unchanged. Compare that to, for example, mode=’nearest’:
>>> savgol_filter(x, 5, 2, mode='nearest') array([1.74, 3.03, 3.54, 2.86, 0.66, 0.17, 1. , 4.6 , 7.97])
- select_salient_extrema(extrema, typ, spacing_hours=14, envelope_type='outer')[source]¶
Select salient extrema (HH/LL or HL/LH) using literal spacing-based OR logic.
- Parameters:
- extrema : pd.DataFrame with columns ["time", "value"]
Candidate extrema.
- typ : str
Either "high" or "low" (for peak or trough selection).
- spacing_hours : float
Time window for neighbor comparison.
- envelope_type : str
Either "outer" (default) or "inner" to switch saliency logic.
- Returns:
- pd.DataFrame
Extrema that passed the saliency test.
- smooth_series2(series, window_pts=25, method='lowess', **kwargs)[source]¶
Smooth a time series using the specified method. Currently supports ‘lowess’, ‘moving_average’, or ‘savgol’.
- tidal_envelope(series, smoothing_window_hours=2.5, n_good=3, peak_prominence=0.05, saliency_window_hours=14, max_anchor_gap_hours=36, envelope_type='outer')[source]¶
Compute the tidal envelope (high and low) of a time series using smoothing, extrema detection, and interpolation. This function processes a time series to extract its tidal envelope by smoothing the data, identifying significant peaks and troughs, filtering out unreliable extrema, selecting salient extrema within a specified window, and interpolating between anchor points to generate continuous envelope curves.
- Parameters:
- series : pandas.Series
Time-indexed series of water levels or similar data.
- smoothing_window_hours : float, optional
Window size in hours for smoothing the input series (default is 2.5).
- n_good : int, optional
Minimum number of good points required for an extremum to be considered valid (default is 3).
- peak_prominence : float, optional
Minimum prominence of peaks/troughs to be considered as extrema (default is 0.05).
- saliency_window_hours : float, optional
Window size in hours for selecting salient extrema (default is 14).
- max_anchor_gap_hours : float, optional
Maximum allowed gap in hours between anchor points for interpolation (default is 36).
- envelope_type : str, optional
Type of envelope to compute, e.g., "outer" (default is "outer").
- Returns:
- env_high : pandas.Series
Interpolated high (upper) envelope of the input series.
- env_low : pandas.Series
Interpolated low (lower) envelope of the input series.
- anchor_highs : pandas.DataFrame
DataFrame of selected anchor points for the high envelope.
- anchor_lows : pandas.DataFrame
DataFrame of selected anchor points for the low envelope.
- smoothed : pandas.Series
Smoothed version of the input series.
Notes
This function assumes regular time intervals in the input series. If the frequency cannot be inferred, it is estimated from the first two timestamps.
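A hedged end-to-end sketch that pairs the synthetic tide generator above with tidal_envelope; the return order follows the Returns listing, and parameter values echo the documented defaults:
from vtools.functions.envelope import generate_simplified_mixed_tide, tidal_envelope

series = generate_simplified_mixed_tide(ndays=40, freq="15min")
env_high, env_low, anchor_highs, anchor_lows, smoothed = tidal_envelope(
    series,
    smoothing_window_hours=2.5,
    peak_prominence=0.05,
    saliency_window_hours=14,
    envelope_type="outer",
)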
vtools.functions.error_detect module¶
- med_outliers(ts, level=4.0, scale=None, filt_len=7, range=(None, None), quantiles=(0.01, 0.99), copy=True, as_anomaly=False)[source]¶
Detect outliers by running a median filter, subtracting it from the original series and comparing the resulting residuals to a global robust range of scale (the interquartile range). Individual time points are rejected if the residual at that time point is more than level times the range of scale.
The original concept comes from Basu & Meckesheimer (2007), "Automatic outlier detection for time series: an application to sensor data", although they didn't use the interquartile range but rather expert judgment. To use this function effectively, you need to be thoughtful about what the interquartile range will be. For instance, for a strongly tidal flow station it is likely to substantially overestimate the reasonable variation over a single time step, in which case level should be reduced accordingly (see the scale parameter below).
- level: Number of times the scale or interquantile range the residual has to reach
for the data point to be rejected.
- scale: Expert judgment of the scale of maximum variation over a time step.
If None, the interquartile range will be used. Note that for a strongly tidal station the interquartile range may substantially overestimate the reasonable variation over a single time step, in which case the filter will work fine, but level should be set to a number (less than one) accordingly.
filt_len: length of median filter, default is 7
- quantilestuple of quantiles defining the measure of scale. Ignored
if scale is given directly. Default is interquartile range, and this is almost always a reasonable choice.
copy: if True, a copy is made leaving original series intact
You can also specify rejection of values based on a simple range using the range argument.
Returns: copy of series with outliers replaced by nan
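A minimal sketch of the intended use, with an obviously spiked synthetic series; the import path follows the module heading above:
import pandas as pd
import numpy as np
from vtools.functions.error_detect import med_outliers

idx = pd.date_range("2022-01-01", periods=500, freq="15min")
ts = pd.Series(np.sin(2 * np.pi * np.arange(500) / 49.6), index=idx)
ts.iloc[[50, 200, 333]] += 8.0        # inject obvious spikes

clean = med_outliers(ts, level=4.0, filt_len=7, quantiles=(0.01, 0.99))
# clean is a copy of ts with the spiked points replaced by nan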
- nrepeat(ts)[source]¶
Return the length of consecutive runs of repeated values
- Parameters:
- ts: DataFrame or series
- Returns:
- Like-indexed series with lengths of runs. Nans will be mapped to 0
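A small sketch of nrepeat on a toy series (import path per the module heading above):
import numpy as np
import pandas as pd
from vtools.functions.error_detect import nrepeat

ts = pd.Series([1.0, 1.0, 1.0, 2.0, 2.0, np.nan, 3.0])
runs = nrepeat(ts)
# runs is a like-indexed series of run lengths; the NaN position maps to 0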
- steep_then_nan(ts, level=4.0, scale=None, filt_len=11, range=(None, None), quantiles=(0.01, 0.99), copy=True, as_anomaly=True)[source]¶
Detect outliers by running a median filter, subtracting it from the original series and comparing the resulting residuals to a global robust range of scale (the interquartile range). Individual time points are rejected if the residual at that time point is more than level times the range of scale.
The original concept comes from Basu & Meckesheimer (2007), although they didn't use the interquartile range but rather expert judgment. To use this function effectively, you need to be thoughtful about what the interquartile range will be. For instance, for a strongly tidal flow station it is likely to substantially overestimate the reasonable variation over a single time step, in which case level should be reduced accordingly (see the scale parameter below).
- level: Number of times the scale or interquantile range the residual has to reach
for the data point to be rejected.
- scale: Expert judgment of the scale of maximum variation over a time step.
If None, the interquartile range will be used. Note that for a strongly tidal station the interquartile range may substantially overestimate the reasonable variation over a single time step, in which case the filter will work fine, but level should be set to a number (less than one) accordingly.
filt_len: length of median filter, default is 11
- quantilestuple of quantiles defining the measure of scale. Ignored
if scale is given directly. Default is interquartile range, and this is almost always a reasonable choice.
copy: if True, a copy is made leaving original series intact
You can also specify rejection of values based on a simple range using the range argument.
Returns: copy of series with outliers replaced by nan
vtools.functions.example2 module¶
vtools.functions.filter module¶
This module contains filters used in tidal time series analysis.
- _lanczos_impl(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True, cosine_taper=False)[source]¶
squared low-pass cosine lanczos filter on a regular time series.
- Parameters:
- ts : DataFrame
- filter_len : int or time_interval
Size of lanczos window; the default is the number of samples within filter_period*1.25.
- cutoff_frequency : float, optional
Cutoff frequency expressed as a ratio of the Nyquist frequency, which should be within the range (0,1). For example, if the sampling frequency is 1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.
- cutoff_period : string or _time_interval
Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a _time_interval (pandas freq). cutoff_frequency and cutoff_period can't be specified at the same time.
- padtype : str or None, optional
Must be 'odd', 'even', 'constant', or None. This determines the type of extension to use for the padded signal to which the filter is applied. If padtype is None, no padding is used. The default is None.
- padlen : int or None, optional
The number of elements by which to extend x at both ends of axis before applying the filter. This value must be less than x.shape[axis]-1. padlen=0 implies no padding. If padtype is not None and padlen is not given, padlen is set to 6*m.
- fill_edge_nan : bool, optional
If padding is not used and fill_edge_nan is true, resulting data on both ends are filled with nan to account for edge effects. This affects 2*m values on either end of the result. Default is true.
- Returns:
- result : TimeSeries
A new regular time series with the same interval as ts. If no padding is used, the beginning and ending 4*m values will be set to nan to remove edge effects.
- Raises:
- ValueError
If the input time series is not regular; or cutoff_period and cutoff_frequency are given at the same time; or neither cutoff_period nor cutoff_frequency is given; or padtype is not 'odd', 'even', 'constant', or None; or padlen is larger than the data size.
- butter(N, Wn, btype='low', analog=False, output='ba', fs=None)[source]¶
Butterworth digital and analog filter design.
Design an Nth-order digital or analog Butterworth filter and return the filter coefficients.
- Parameters:
- Nint
The order of the filter. For ‘bandpass’ and ‘bandstop’ filters, the resulting order of the final second-order sections (‘sos’) matrix is
2*N, with N the number of biquad sections of the desired system.- Wnarray_like
The critical frequency or frequencies. For lowpass and highpass filters, Wn is a scalar; for bandpass and bandstop filters, Wn is a length-2 sequence.
For a Butterworth filter, this is the point at which the gain drops to 1/sqrt(2) that of the passband (the “-3 dB point”).
For digital filters, if fs is not specified, Wn units are normalized from 0 to 1, where 1 is the Nyquist frequency (Wn is thus in half cycles / sample and defined as 2*critical frequencies / fs). If fs is specified, Wn is in the same units as fs.
For analog filters, Wn is an angular frequency (e.g. rad/s).
- btype{‘lowpass’, ‘highpass’, ‘bandpass’, ‘bandstop’}, optional
The type of filter. Default is ‘lowpass’.
- analogbool, optional
When True, return an analog filter, otherwise a digital filter is returned.
- output{‘ba’, ‘zpk’, ‘sos’}, optional
Type of output: numerator/denominator (‘ba’), pole-zero (‘zpk’), or second-order sections (‘sos’). Default is ‘ba’ for backwards compatibility, but ‘sos’ should be used for general-purpose filtering.
- fsfloat, optional
The sampling frequency of the digital system.
New in version 1.2.0.
- Returns:
- b, a : ndarray, ndarray
Numerator (b) and denominator (a) polynomials of the IIR filter. Only returned if output='ba'.
- z, p, k : ndarray, ndarray, float
Zeros, poles, and system gain of the IIR filter transfer function. Only returned if output='zpk'.
- sos : ndarray
Second-order sections representation of the IIR filter. Only returned if output='sos'.
See also
buttord,buttap
Notes
The Butterworth filter has maximally flat frequency response in the passband.
The 'sos' output parameter was added in 0.16.0.
If the transfer function form [b, a] is requested, numerical problems can occur since the conversion between roots and the polynomial coefficients is a numerically sensitive operation, even for N >= 4. It is recommended to work with the SOS representation.
Warning
Designing high-order and narrowband IIR filters in TF form can result in unstable or incorrect filtering due to floating point numerical precision issues. Consider inspecting output filter characteristics freqz or designing the filters with second-order sections via output='sos'.
Examples
Design an analog filter and plot its frequency response, showing the critical points:
>>> from scipy import signal >>> import matplotlib.pyplot as plt >>> import numpy as np
>>> b, a = signal.butter(4, 100, 'low', analog=True) >>> w, h = signal.freqs(b, a) >>> plt.semilogx(w, 20 * np.log10(abs(h))) >>> plt.title('Butterworth filter frequency response') >>> plt.xlabel('Frequency [radians / second]') >>> plt.ylabel('Amplitude [dB]') >>> plt.margins(0, 0.1) >>> plt.grid(which='both', axis='both') >>> plt.axvline(100, color='green') # cutoff frequency >>> plt.show()
Generate a signal made up of 10 Hz and 20 Hz, sampled at 1 kHz
>>> t = np.linspace(0, 1, 1000, False) # 1 second >>> sig = np.sin(2*np.pi*10*t) + np.sin(2*np.pi*20*t) >>> fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True) >>> ax1.plot(t, sig) >>> ax1.set_title('10 Hz and 20 Hz sinusoids') >>> ax1.axis([0, 1, -2, 2])
Design a digital high-pass filter at 15 Hz to remove the 10 Hz tone, and apply it to the signal. (It’s recommended to use second-order sections format when filtering, to avoid numerical error with transfer function (
ba) format):>>> sos = signal.butter(10, 15, 'hp', fs=1000, output='sos') >>> filtered = signal.sosfilt(sos, sig) >>> ax2.plot(t, filtered) >>> ax2.set_title('After 15 Hz high-pass filter') >>> ax2.axis([0, 1, -2, 2]) >>> ax2.set_xlabel('Time [seconds]') >>> plt.tight_layout() >>> plt.show()
- butterworth(ts, cutoff_period=None, cutoff_frequency=None, order=4)[source]¶
low-pass butterworth-squared filter on a regular time series.
- Parameters:
- ts : DataFrame
Must be one or two dimensional, and regular.
- order : int, optional
The default is 4.
- cutoff_frequency : float, optional
Cutoff frequency expressed as a ratio of the Nyquist frequency, which should be within the range (0,1). For a discretely sampled system, the Nyquist frequency is the fastest frequency that can be resolved by that sampling, which is half the sampling frequency. For example, if the sampling frequency is 1 sample/1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.
- cutoff_period : string or _time_interval
Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a regular interval using the same rules as a pandas frequency. cutoff_frequency and cutoff_period can't be specified at the same time.
- Returns:
- result
A new regular time series with the same interval as ts.
- Raises:
- ValueError
If the input order is not even; or the input time series is not regular; or neither cutoff_period nor cutoff_frequency is given while the input time series interval is not 15min or 1 hour; or cutoff_period and cutoff_frequency are given at the same time.
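A minimal sketch of subtidal filtering with butterworth; the 40-hour cutoff and synthetic tide are illustrative choices, not requirements of the function:
import pandas as pd
import numpy as np
from vtools.functions.filter import butterworth

idx = pd.date_range("2022-01-01", periods=40 * 96, freq="15min")   # 40 days, 15-min data
tide = pd.DataFrame({"wl": np.sin(2 * np.pi * np.arange(len(idx)) / 49.6)}, index=idx)

# Specify either a cutoff period or a cutoff frequency, but not both
subtidal = butterworth(tide, cutoff_period="40h", order=4)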
- cosine_lanczos(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]¶
- cosine_lanczos5(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]¶
squared low-pass cosine lanczos filter on a regular time series.
- Parameters:
- ts : DataFrame
- filter_len : int or time_interval
Size of lanczos window; the default is the number of samples within filter_period*1.25.
- cutoff_frequency : float, optional
Cutoff frequency expressed as a ratio of the Nyquist frequency, which should be within the range (0,1). For example, if the sampling frequency is 1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.
- cutoff_period : string or _time_interval
Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a _time_interval (pandas freq). cutoff_frequency and cutoff_period can't be specified at the same time.
- padtype : str or None, optional
Must be 'odd', 'even', 'constant', or None. This determines the type of extension to use for the padded signal to which the filter is applied. If padtype is None, no padding is used. The default is None.
- padlen : int or None, optional
The number of elements by which to extend x at both ends of axis before applying the filter. This value must be less than x.shape[axis]-1. padlen=0 implies no padding. If padtype is not None and padlen is not given, padlen is set to 6*m.
- fill_edge_nan : bool, optional
If padding is not used and fill_edge_nan is true, resulting data on both ends are filled with nan to account for edge effects. This affects 2*m values on either end of the result. Default is true.
- Returns:
- result : TimeSeries
A new regular time series with the same interval as ts. If no padding is used, the beginning and ending 4*m values will be set to nan to remove edge effects.
- Raises:
- ValueError
If the input time series is not regular; or cutoff_period and cutoff_frequency are given at the same time; or neither cutoff_period nor cutoff_frequency is given; or padtype is not 'odd', 'even', 'constant', or None; or padlen is larger than the data size.
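The public cosine_lanczos wrapper listed above takes the same arguments; a hedged sketch on hourly data (the 40-hour cutoff is again just an example):
import pandas as pd
import numpy as np
from vtools.functions.filter import cosine_lanczos

idx = pd.date_range("2022-01-01", periods=60 * 24, freq="h")        # 60 days, hourly
wl = pd.DataFrame({"wl": np.sin(2 * np.pi * np.arange(len(idx)) / 12.4)}, index=idx)

filtered = cosine_lanczos(wl, cutoff_period="40h")   # edges are nan-filled by default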
- filtfilt(b, a, x, axis=-1, padtype='odd', padlen=None, method='pad', irlen=None)[source]¶
Apply a digital filter forward and backward to a signal.
This function applies a linear digital filter twice, once forward and once backwards. The combined filter has zero phase and a filter order twice that of the original.
The function provides options for handling the edges of the signal.
The function sosfiltfilt (and filter design using output='sos') should be preferred over filtfilt for most filtering tasks, as second-order sections have fewer numerical problems.
- Parameters:
- b(N,) array_like
The numerator coefficient vector of the filter.
- a(N,) array_like
The denominator coefficient vector of the filter. If
a[0]is not 1, then both a and b are normalized bya[0].- xarray_like
The array of data to be filtered.
- axisint, optional
The axis of x to which the filter is applied. Default is -1.
- padtypestr or None, optional
Must be ‘odd’, ‘even’, ‘constant’, or None. This determines the type of extension to use for the padded signal to which the filter is applied. If padtype is None, no padding is used. The default is ‘odd’.
- padlenint or None, optional
The number of elements by which to extend x at both ends of axis before applying the filter. This value must be less than
x.shape[axis] - 1.padlen=0implies no padding. The default value is3 * max(len(a), len(b)).- methodstr, optional
Determines the method for handling the edges of the signal, either “pad” or “gust”. When method is “pad”, the signal is padded; the type of padding is determined by padtype and padlen, and irlen is ignored. When method is “gust”, Gustafsson’s method is used, and padtype and padlen are ignored.
- irlenint or None, optional
When method is “gust”, irlen specifies the length of the impulse response of the filter. If irlen is None, no part of the impulse response is ignored. For a long signal, specifying irlen can significantly improve the performance of the filter.
- Returns:
- yndarray
The filtered output with the same shape as x.
See also
sosfiltfilt,lfilter_zi,lfilter,lfiltic,savgol_filter,sosfilt
Notes
When method is “pad”, the function pads the data along the given axis in one of three ways: odd, even or constant. The odd and even extensions have the corresponding symmetry about the end point of the data. The constant extension extends the data with the values at the end points. On both the forward and backward passes, the initial condition of the filter is found by using lfilter_zi and scaling it by the end point of the extended data.
When method is “gust”, Gustafsson’s method [1] is used. Initial conditions are chosen for the forward and backward passes so that the forward-backward filter gives the same result as the backward-forward filter.
The option to use Gustaffson’s method was added in scipy version 0.16.0.
References
[1]F. Gustaffson, “Determining the initial states in forward-backward filtering”, Transactions on Signal Processing, Vol. 46, pp. 988-992, 1996.
Examples
The examples will use several functions from scipy.signal.
>>> import numpy as np >>> from scipy import signal >>> import matplotlib.pyplot as plt
First we create a one second signal that is the sum of two pure sine waves, with frequencies 5 Hz and 250 Hz, sampled at 2000 Hz.
>>> t = np.linspace(0, 1.0, 2001) >>> xlow = np.sin(2 * np.pi * 5 * t) >>> xhigh = np.sin(2 * np.pi * 250 * t) >>> x = xlow + xhigh
Now create a lowpass Butterworth filter with a cutoff of 0.125 times the Nyquist frequency, or 125 Hz, and apply it to
xwith filtfilt. The result should be approximatelyxlow, with no phase shift.>>> b, a = signal.butter(8, 0.125) >>> y = signal.filtfilt(b, a, x, padlen=150) >>> np.abs(y - xlow).max() 9.1086182074789912e-06
We get a fairly clean result for this artificial example because the odd extension is exact, and with the moderately long padding, the filter’s transients have dissipated by the time the actual data is reached. In general, transient effects at the edges are unavoidable.
The following example demonstrates the option
method="gust".First, create a filter.
>>> b, a = signal.ellip(4, 0.01, 120, 0.125) # Filter to be applied.
sig is a random input signal to be filtered.
>>> rng = np.random.default_rng() >>> n = 60 >>> sig = rng.standard_normal(n)**3 + 3*rng.standard_normal(n).cumsum()
Apply filtfilt to sig, once using the Gustafsson method, and once using padding, and plot the results for comparison.
>>> fgust = signal.filtfilt(b, a, sig, method="gust") >>> fpad = signal.filtfilt(b, a, sig, padlen=50) >>> plt.plot(sig, 'k-', label='input') >>> plt.plot(fgust, 'b-', linewidth=4, label='gust') >>> plt.plot(fpad, 'c-', linewidth=1.5, label='pad') >>> plt.legend(loc='best') >>> plt.show()
The irlen argument can be used to improve the performance of Gustafsson’s method.
Estimate the impulse response length of the filter.
>>> z, p, k = signal.tf2zpk(b, a) >>> eps = 1e-9 >>> r = np.max(np.abs(p)) >>> approx_impulse_len = int(np.ceil(np.log(eps) / np.log(r))) >>> approx_impulse_len 137
Apply the filter to a longer signal, with and without the irlen argument. The difference between y1 and y2 is small. For long signals, using irlen gives a significant performance improvement.
>>> x = rng.standard_normal(4000) >>> y1 = signal.filtfilt(b, a, x, method='gust') >>> y2 = signal.filtfilt(b, a, x, method='gust', irlen=approx_impulse_len) >>> print(np.max(np.abs(y1 - y2))) 2.875334415008979e-10
- firwin(numtaps, cutoff, *, width=None, window='hamming', pass_zero=True, scale=True, nyq=<object object>, fs=None)[source]¶
FIR filter design using the window method.
This function computes the coefficients of a finite impulse response filter. The filter will have linear phase; it will be Type I if numtaps is odd and Type II if numtaps is even.
Type II filters always have zero response at the Nyquist frequency, so a ValueError exception is raised if firwin is called with numtaps even and having a passband whose right end is at the Nyquist frequency.
- Parameters:
- numtapsint
Length of the filter (number of coefficients, i.e. the filter order + 1). numtaps must be odd if a passband includes the Nyquist frequency.
- cutofffloat or 1-D array_like
Cutoff frequency of filter (expressed in the same units as fs) OR an array of cutoff frequencies (that is, band edges). In the latter case, the frequencies in cutoff should be positive and monotonically increasing between 0 and fs/2. The values 0 and fs/2 must not be included in cutoff.
- widthfloat or None, optional
If width is not None, then assume it is the approximate width of the transition region (expressed in the same units as fs) for use in Kaiser FIR filter design. In this case, the window argument is ignored.
- windowstring or tuple of string and parameter values, optional
Desired window to use. See scipy.signal.get_window for a list of windows and required parameters.
- pass_zero{True, False, ‘bandpass’, ‘lowpass’, ‘highpass’, ‘bandstop’}, optional
If True, the gain at the frequency 0 (i.e., the “DC gain”) is 1. If False, the DC gain is 0. Can also be a string argument for the desired filter type (equivalent to
btypein IIR design functions).New in version 1.3.0: Support for string arguments.
- scalebool, optional
Set to True to scale the coefficients so that the frequency response is exactly unity at a certain frequency. That frequency is either:
0 (DC) if the first passband starts at 0 (i.e. pass_zero is True)
fs/2 (the Nyquist frequency) if the first passband ends at fs/2 (i.e the filter is a single band highpass filter); center of first passband otherwise
- nyqfloat, optional, deprecated
This is the Nyquist frequency. Each frequency in cutoff must be between 0 and nyq. Default is 1.
Deprecated since version 1.0.0: firwin keyword argument nyq is deprecated in favour of fs and will be removed in SciPy 1.14.0.
- fsfloat, optional
The sampling frequency of the signal. Each frequency in cutoff must be between 0 and
fs/2. Default is 2.
- Returns:
- h(numtaps,) ndarray
Coefficients of length numtaps FIR filter.
- Raises:
- ValueError
If any value in cutoff is less than or equal to 0 or greater than or equal to
fs/2, if the values in cutoff are not strictly monotonically increasing, or if numtaps is even but a passband includes the Nyquist frequency.
See also
firwin2firlsminimum_phaseremez
Examples
Low-pass from 0 to f:
>>> from scipy import signal >>> numtaps = 3 >>> f = 0.1 >>> signal.firwin(numtaps, f) array([ 0.06799017, 0.86401967, 0.06799017])
Use a specific window function:
>>> signal.firwin(numtaps, f, window='nuttall') array([ 3.56607041e-04, 9.99286786e-01, 3.56607041e-04])
High-pass (‘stop’ from 0 to f):
>>> signal.firwin(numtaps, f, pass_zero=False) array([-0.00859313, 0.98281375, -0.00859313])
Band-pass:
>>> f1, f2 = 0.1, 0.2 >>> signal.firwin(numtaps, [f1, f2], pass_zero=False) array([ 0.06301614, 0.88770441, 0.06301614])
Band-stop:
>>> signal.firwin(numtaps, [f1, f2]) array([-0.00801395, 1.0160279 , -0.00801395])
Multi-band (passbands are [0, f1], [f2, f3] and [f4, 1]):
>>> f3, f4 = 0.3, 0.4 >>> signal.firwin(numtaps, [f1, f2, f3, f4]) array([-0.01376344, 1.02752689, -0.01376344])
Multi-band (passbands are [f1, f2] and [f3,f4]):
>>> signal.firwin(numtaps, [f1, f2, f3, f4], pass_zero=False) array([ 0.04890915, 0.91284326, 0.04890915])
- gaussian_filter1d(input, sigma, axis=-1, order=0, output=None, mode='reflect', cval=0.0, truncate=4.0, *, radius=None)[source]¶
1-D Gaussian filter.
- Parameters:
- inputarray_like
The input array.
- sigmascalar
standard deviation for Gaussian kernel
- axisint, optional
The axis of input along which to calculate. Default is -1.
- orderint, optional
An order of 0 corresponds to convolution with a Gaussian kernel. A positive order corresponds to convolution with that derivative of a Gaussian.
- outputarray or dtype, optional
The array in which to place the output, or the dtype of the returned array. By default an array of the same dtype as input will be created.
- mode{‘reflect’, ‘constant’, ‘nearest’, ‘mirror’, ‘wrap’}, optional
The mode parameter determines how the input array is extended beyond its boundaries. Default is ‘reflect’. Behavior for each valid value is as follows:
- ‘reflect’ (d c b a | a b c d | d c b a)
The input is extended by reflecting about the edge of the last pixel. This mode is also sometimes referred to as half-sample symmetric.
- ‘constant’ (k k k k | a b c d | k k k k)
The input is extended by filling all values beyond the edge with the same constant value, defined by the cval parameter.
- ‘nearest’ (a a a a | a b c d | d d d d)
The input is extended by replicating the last pixel.
- ‘mirror’ (d c b | a b c d | c b a)
The input is extended by reflecting about the center of the last pixel. This mode is also sometimes referred to as whole-sample symmetric.
- ‘wrap’ (a b c d | a b c d | a b c d)
The input is extended by wrapping around to the opposite edge.
For consistency with the interpolation functions, the following mode names can also be used:
- ‘grid-mirror’
This is a synonym for ‘reflect’.
- ‘grid-constant’
This is a synonym for ‘constant’.
- ‘grid-wrap’
This is a synonym for ‘wrap’.
- cvalscalar, optional
Value to fill past edges of input if mode is ‘constant’. Default is 0.0.
- truncatefloat, optional
Truncate the filter at this many standard deviations. Default is 4.0.
- radiusNone or int, optional
Radius of the Gaussian kernel. If specified, the size of the kernel will be
2*radius + 1, and truncate is ignored. Default is None.
- Returns:
- gaussian_filter1dndarray
Notes
The Gaussian kernel will have size
2*radius + 1along each axis. If radius is None, a defaultradius = round(truncate * sigma)will be used.Examples
>>> from scipy.ndimage import gaussian_filter1d >>> import numpy as np >>> gaussian_filter1d([1.0, 2.0, 3.0, 4.0, 5.0], 1) array([ 1.42704095, 2.06782203, 3. , 3.93217797, 4.57295905]) >>> gaussian_filter1d([1.0, 2.0, 3.0, 4.0, 5.0], 4) array([ 2.91948343, 2.95023502, 3. , 3.04976498, 3.08051657]) >>> import matplotlib.pyplot as plt >>> rng = np.random.default_rng() >>> x = rng.standard_normal(101).cumsum() >>> y3 = gaussian_filter1d(x, 3) >>> y6 = gaussian_filter1d(x, 6) >>> plt.plot(x, 'k', label='original data') >>> plt.plot(y3, '--', label='filtered, sigma=3') >>> plt.plot(y6, ':', label='filtered, sigma=6') >>> plt.legend() >>> plt.grid() >>> plt.show()
- generate_godin_fir(freq)[source]¶
Generate the Godin filter impulse response for a given freq. freq is a pandas freq.
- godin(ts)[source]¶
Low-pass Godin filter for a regular time series. Applies the \(\mathcal{A_{24}^{2}A_{25}}\) Godin filter [1]. The filter is generalized to be the equivalent of one boxcar of the length of the lunar diurnal (~25 hours) constituent and two of the solar diurnal (~24 hours), though the implementation combines these steps.
- Parameters:
- Returns:
- result
DataFrame A new regular time series with the same interval as ts.
- Raises:
- NotImplementedError
If input time series is not univariate
References
[1]Godin (1972) Analysis of Tides
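As a usage sketch (not part of the original docstring; the import path and data are assumptions), applied to an hourly, regular univariate DataFrame:
>>> import numpy as np
>>> import pandas as pd
>>> idx = pd.date_range("2020-01-01", periods=1000, freq="h")   # hypothetical hourly record
>>> ts = pd.DataFrame({"stage": np.sin(2 * np.pi * np.arange(1000) / 12.4)}, index=idx)
>>> from vtools.functions.filter import godin   # import path is an assumption
>>> tidally_averaged = godin(ts)   # same interval as ts; edge values are NaN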
- lanczos(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]¶
- lfilter(b, a, x, axis=-1, zi=None)[source]¶
Filter data along one-dimension with an IIR or FIR filter.
Filter a data sequence, x, using a digital filter. This works for many fundamental data types (including Object type). The filter is a direct form II transposed implementation of the standard difference equation (see Notes).
The function sosfilt (and filter design using output='sos') should be preferred over lfilter for most filtering tasks, as second-order sections have fewer numerical problems.
- Parameters:
- barray_like
The numerator coefficient vector in a 1-D sequence.
- aarray_like
The denominator coefficient vector in a 1-D sequence. If a[0] is not 1, then both a and b are normalized by a[0].
- xarray_like
An N-dimensional input array.
- axisint, optional
The axis of the input data array along which to apply the linear filter. The filter is applied to each subarray along this axis. Default is -1.
- ziarray_like, optional
Initial conditions for the filter delays. It is a vector (or array of vectors for an N-dimensional input) of length
max(len(a), len(b)) - 1. If zi is None or is not given then initial rest is assumed. See lfiltic for more information.
- Returns:
- yarray
The output of the digital filter.
- zfarray, optional
If zi is None, this is not returned, otherwise, zf holds the final filter delay values.
See also
lfilticConstruct initial conditions for lfilter.
lfilter_ziCompute initial state (steady state of step response) for lfilter.
filtfiltA forward-backward filter, to obtain a filter with zero phase.
savgol_filterA Savitzky-Golay filter.
sosfiltFilter data using cascaded second-order sections.
sosfiltfiltA forward-backward filter using second-order sections.
Notes
The filter function is implemented as a direct II transposed structure. This means that the filter implements:
a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[M]*x[n-M] - a[1]*y[n-1] - ... - a[N]*y[n-N]
where M is the degree of the numerator, N is the degree of the denominator, and n is the sample number. It is implemented using the following difference equations (assuming M = N):
a[0]*y[n] = b[0] * x[n] + d[0][n-1]
  d[0][n] = b[1] * x[n] - a[1] * y[n] + d[1][n-1]
  d[1][n] = b[2] * x[n] - a[2] * y[n] + d[2][n-1]
  ...
d[N-2][n] = b[N-1]*x[n] - a[N-1]*y[n] + d[N-1][n-1]
d[N-1][n] = b[N] * x[n] - a[N] * y[n]
where d are the state variables.
The rational transfer function describing this filter in the z-transform domain is:
Y(z) = (b[0] + b[1]*z^-1 + ... + b[M]*z^-M) / (a[0] + a[1]*z^-1 + ... + a[N]*z^-N) * X(z)
Examples
Generate a noisy signal to be filtered:
>>> import numpy as np
>>> from scipy import signal
>>> import matplotlib.pyplot as plt
>>> rng = np.random.default_rng()
>>> t = np.linspace(-1, 1, 201)
>>> x = (np.sin(2*np.pi*0.75*t*(1-t) + 2.1) +
...      0.1*np.sin(2*np.pi*1.25*t + 1) +
...      0.18*np.cos(2*np.pi*3.85*t))
>>> xn = x + rng.standard_normal(len(t)) * 0.08
Create an order 3 lowpass butterworth filter:
>>> b, a = signal.butter(3, 0.05)
Apply the filter to xn. Use lfilter_zi to choose the initial condition of the filter:
>>> zi = signal.lfilter_zi(b, a)
>>> z, _ = signal.lfilter(b, a, xn, zi=zi*xn[0])
Apply the filter again, to have a result filtered at an order the same as filtfilt:
>>> z2, _ = signal.lfilter(b, a, z, zi=zi*z[0])
Use filtfilt to apply the filter:
>>> y = signal.filtfilt(b, a, xn)
Plot the original signal and the various filtered versions:
>>> plt.figure
>>> plt.plot(t, xn, 'b', alpha=0.75)
>>> plt.plot(t, z, 'r--', t, z2, 'r', t, y, 'k')
>>> plt.legend(('noisy signal', 'lfilter, once', 'lfilter, twice',
...             'filtfilt'), loc='best')
>>> plt.grid(True)
>>> plt.show()
- lowpass_cosine_lanczos_filter_coef(cf, m, normalize=True)[source]¶
Return the convolution coefficients for a low-pass Lanczos filter.
- Parameters:
- cf: float
Cutoff frequency expressed as a ratio of the Nyquist frequency.
- m: int
Size of the filtering window.
- Returns:
- results: list
Coefficients of filtering window.
- lowpass_lanczos_filter_coef(cf, m, normalize=True, cosine_taper=False)[source]¶
Return the convolution coefficients for a low-pass Lanczos filter.
- Parameters:
- cffloat
Cutoff frequency expressed as a ratio of the Nyquist frequency.
- mint
Size of the filtering window.
- normalizebool, optional
Whether to normalize the filter coefficients so they sum to 1.
- cosine_taperbool, optional
If True, applies a cosine-squared taper to the Lanczos window.
- Returns:
- resnp.ndarray
Coefficients of the filtering window.
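As a brief sketch (import path assumed), computing coefficients for a 36-hour cutoff on hourly data (cf = (1/36)/0.5 ≈ 0.056); per the description above, normalize=True should make the coefficients sum to approximately 1:
>>> from vtools.functions.filter import lowpass_lanczos_filter_coef   # import path is an assumption
>>> coef = lowpass_lanczos_filter_coef(cf=0.056, m=60)
>>> float(sum(coef))   # expected to be ~1.0 when normalize=True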
- ts_gaussian_filter(ts, sigma, order=0, mode='reflect', cval=0.0, truncate=4.0)[source]¶
Column-wise Gaussian smoothing of regular time series. Missing/irregular values are not handled, which means this function is not much different from a rolling-window Gaussian average in pandas (e.g., ts.rolling(window=5, win_type='gaussian').mean()), which may be preferable in the case of missing data. This function has been kept around with handling of irregular series as an aspiration, but that is not yet implemented.
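A minimal sketch (import path assumed), applied to an hourly, regular DataFrame ts and showing the pandas rolling alternative mentioned above:
>>> from vtools.functions.filter import ts_gaussian_filter   # import path is an assumption
>>> smooth = ts_gaussian_filter(ts, sigma=3)   # column-wise Gaussian smoothing
>>> alt = ts.rolling(window=13, center=True, win_type="gaussian").mean(std=3)   # pandas analogue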
vtools.functions.frequency_response module¶
- butterworth(ts, cutoff_period=None, cutoff_frequency=None, order=4)[source]¶
Low-pass Butterworth-squared filter on a regular time series.
- Parameters:
- ts
DataFrame Must be one or two dimensional, and regular.
- order: int ,optional
The default is 4.
- cutoff_frequency: float,optional
Cutoff frequency expressed as a ratio of the Nyquist frequency; should be within the range (0,1). For a discretely sampled system, the Nyquist frequency is the fastest frequency that can be resolved by that sampling, which is half the sampling frequency. For example, if the sampling frequency is 1 sample/1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.
- cutoff_periodstring or _time_interval
Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a regular interval using the same rules as a pandas frequency. cutoff_frequency and cutoff_period can’t be specified at the same time.
- Returns:
- result
A new regular time series with the same interval as ts.
- Raises:
- ValueError
If the input order is not even; or the input time series is not regular; or neither cutoff_period nor cutoff_frequency is given while the input time series interval is not 15min or 1 hour; or cutoff_period and cutoff_frequency are given at the same time.
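A usage sketch (import path assumed); per the parameter description above, the two calls request roughly the same 36-hour cutoff for an hourly, regular DataFrame ts:
>>> from vtools.functions.filter import butterworth   # import path is an assumption
>>> filt1 = butterworth(ts, cutoff_period="36h")      # cutoff given as a pandas-style period
>>> filt2 = butterworth(ts, cutoff_frequency=0.056)   # same cutoff as a ratio of the Nyquist frequency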
- compare_response(cutoff_period)[source]¶
Generate frequency response plot of low-pass filters: cosine_lanczos, boxcar 24h, boxcar 25h, and godin.
- Parameters:
- cutoff_periodint
Low-pass filter cutoff period in number of hours.
- Returns:
- None.
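A minimal call (assuming the function is imported from this module):
>>> from vtools.functions.frequency_response import compare_response
>>> compare_response(40)   # plot filter responses for a 40-hour cutoff period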
- cosine_lanczos(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]¶
- cosine_lanczos5(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]¶
Squared low-pass cosine-Lanczos filter on a regular time series.
- Parameters:
- ts
DataFrame
- filter_lenint, time_interval
Size of the Lanczos window; the default is the number of samples within filter_period*1.25.
- cutoff_frequency: float,optional
Cutoff frequency expressed as a ratio of the Nyquist frequency; should be within the range (0,1). For example, if the sampling interval is 1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.
- cutoff_periodstring or _time_interval
Period corresponding to the cutoff frequency. If input as a string, it must be convertible to a _time_interval (pandas freq). cutoff_frequency and cutoff_period can’t be specified at the same time.
- padtypestr or None, optional
Must be ‘odd’, ‘even’, ‘constant’, or None. This determines the type of extension to use for the padded signal to which the filter is applied. If padtype is None, no padding is used. The default is None.
- padlenint or None, optional
The number of elements by which to extend x at both ends of axis before applying the filter. This value must be less than x.shape[axis]-1. padlen=0 implies no padding. If padtype is not None and padlen is not given, padlen is set to 6*m.
- fill_edge_nan: bool,optional
If padding is not used and fill_edge_nan is true, the resulting data on both ends are filled with nan to account for edge effects. This is 2*m on either end of the result. Default is true.
- Returns:
- result
TimeSeries A new regular time series with the same interval as ts. If no padding is used, the beginning and ending 4*m values will be set to nan to remove edge effects.
- Raises:
- ValueError
If the input time series is not regular; or cutoff_period and cutoff_frequency are given at the same time; or neither cutoff_period nor cutoff_frequency is given; or padtype is not "odd", "even", "constant", or None; or padlen is larger than the data size.
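A usage sketch (import path assumed), requesting a 40-hour cutoff with odd padding as described above, for an hourly regular DataFrame ts:
>>> from vtools.functions.filter import cosine_lanczos   # import path is an assumption
>>> lowpassed = cosine_lanczos(ts, cutoff_period="40h", padtype="odd")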
- freqz(b, a=1, worN=512, whole=False, plot=None, fs=6.283185307179586, include_nyquist=False)[source]¶
Compute the frequency response of a digital filter.
Given the M-order numerator b and N-order denominator a of a digital filter, compute its frequency response:
H(e^jw) = B(e^jw) / A(e^jw) = (b[0] + b[1]*e^-jw + ... + b[M]*e^-jwM) / (a[0] + a[1]*e^-jw + ... + a[N]*e^-jwN)
- Parameters:
- barray_like
Numerator of a linear filter. If b has dimension greater than 1, it is assumed that the coefficients are stored in the first dimension, and b.shape[1:], a.shape[1:], and the shape of the frequencies array must be compatible for broadcasting.
- aarray_like
Denominator of a linear filter. If b has dimension greater than 1, it is assumed that the coefficients are stored in the first dimension, and b.shape[1:], a.shape[1:], and the shape of the frequencies array must be compatible for broadcasting.
- worN{None, int, array_like}, optional
If a single integer, then compute at that many frequencies (default is N=512). This is a convenient alternative to:
np.linspace(0, fs if whole else fs/2, N, endpoint=include_nyquist)
Using a number that is fast for FFT computations can result in faster computations (see Notes).
If an array_like, compute the response at the frequencies given. These are in the same units as fs.
- wholebool, optional
Normally, frequencies are computed from 0 to the Nyquist frequency, fs/2 (upper-half of unit-circle). If whole is True, compute frequencies from 0 to fs. Ignored if worN is array_like.
- plotcallable
A callable that takes two arguments. If given, the return parameters w and h are passed to plot. Useful for plotting the frequency response inside freqz.
- fsfloat, optional
The sampling frequency of the digital system. Defaults to 2*pi radians/sample (so w is from 0 to pi).
New in version 1.2.0.
- include_nyquistbool, optional
If whole is False and worN is an integer, setting include_nyquist to True will include the last frequency (Nyquist frequency) and is otherwise ignored.
New in version 1.5.0.
- Returns:
- wndarray
The frequencies at which h was computed, in the same units as fs. By default, w is normalized to the range [0, pi) (radians/sample).
- hndarray
The frequency response, as complex numbers.
See also
freqz_zpk, sosfreqz
Notes
Using Matplotlib’s matplotlib.pyplot.plot() function as the callable for plot produces unexpected results, as this plots the real part of the complex transfer function, not the magnitude. Try lambda w, h: plot(w, np.abs(h)).
A direct computation via (R)FFT is used to compute the frequency response when the following conditions are met:
An integer value is given for worN.
worN is fast to compute via FFT (i.e., scipy.fft.next_fast_len(worN) equals worN).
The denominator coefficients are a single value (a.shape[0] == 1).
worN is at least as long as the numerator coefficients (worN >= b.shape[0]).
If b.ndim > 1, then b.shape[-1] == 1.
For long FIR filters, the FFT approach can have lower error and be much faster than the equivalent direct polynomial calculation.
Examples
>>> from scipy import signal
>>> import numpy as np
>>> b = signal.firwin(80, 0.5, window=('kaiser', 8))
>>> w, h = signal.freqz(b)
>>> import matplotlib.pyplot as plt
>>> fig, ax1 = plt.subplots()
>>> ax1.set_title('Digital filter frequency response')
>>> ax1.plot(w, 20 * np.log10(abs(h)), 'b')
>>> ax1.set_ylabel('Amplitude [dB]', color='b')
>>> ax1.set_xlabel('Frequency [rad/sample]')
>>> ax2 = ax1.twinx()
>>> angles = np.unwrap(np.angle(h))
>>> ax2.plot(w, angles, 'g')
>>> ax2.set_ylabel('Angle (radians)', color='g')
>>> ax2.grid(True)
>>> ax2.axis('tight')
>>> plt.show()
Broadcasting Examples
Suppose we have two FIR filters whose coefficients are stored in the rows of an array with shape (2, 25). For this demonstration, we’ll use random data:
>>> rng = np.random.default_rng()
>>> b = rng.random((2, 25))
To compute the frequency response for these two filters with one call to freqz, we must pass in b.T, because freqz expects the first axis to hold the coefficients. We must then extend the shape with a trivial dimension of length 1 to allow broadcasting with the array of frequencies. That is, we pass in b.T[..., np.newaxis], which has shape (25, 2, 1):
>>> w, h = signal.freqz(b.T[..., np.newaxis], worN=1024)
>>> w.shape
(1024,)
>>> h.shape
(2, 1024)
Now, suppose we have two transfer functions, with the same numerator coefficients b = [0.5, 0.5]. The coefficients for the two denominators are stored in the first dimension of the 2-D array a:
a = [    1      1  ]
    [ -0.25, -0.5 ]
>>> b = np.array([0.5, 0.5])
>>> a = np.array([[1, 1], [-0.25, -0.5]])
Only a is more than 1-D. To make it compatible for broadcasting with the frequencies, we extend it with a trivial dimension in the call to freqz:
>>> w, h = signal.freqz(b, a[..., np.newaxis], worN=1024)
>>> w.shape
(1024,)
>>> h.shape
(2, 1024)
- godin(ts)[source]¶
Low-pass Godin filter for a regular time series. Applies the \(\mathcal{A_{24}^{2}A_{25}}\) Godin filter [1]. The filter is generalized to be the equivalent of one boxcar of the length of the lunar diurnal (~25 hours) constituent and two of the solar diurnal (~24 hours), though the implementation combines these steps.
- Parameters:
- Returns:
- result
DataFrame A new regular time series with the same interval as ts.
- Raises:
- NotImplementedError
If input time series is not univariate
References
[1]Godin (1972) Analysis of Tides
- inset_axes(parent_axes, width, height, loc='upper right', bbox_to_anchor=None, bbox_transform=None, axes_class=None, axes_kwargs=None, borderpad=0.5)[source]¶
Create an inset axes with a given width and height.
Both sizes can be specified either in inches or as a percentage. For example:
inset_axes(parent_axes, width='40%', height='30%', loc='lower left')
creates an inset axes in the lower left corner of parent_axes which spans 30% in height and 40% in width of the parent_axes. Since the usage of .inset_axes may become slightly tricky when exceeding such standard cases, it is recommended to read the examples.
- Parameters:
- parent_axesmatplotlib.axes.Axes
Axes to place the inset axes.
- width, heightfloat or str
Size of the inset axes to create. If a float is provided, it is the size in inches, e.g. width=1.3. If a string is provided, it is the size in relative units, e.g. width=’40%’. By default, i.e. if neither bbox_to_anchor nor bbox_transform are specified, those are relative to the parent_axes. Otherwise, they are to be understood relative to the bounding box provided via bbox_to_anchor.
- locstr, default: ‘upper right’
Location to place the inset axes. Valid locations are ‘upper left’, ‘upper center’, ‘upper right’, ‘center left’, ‘center’, ‘center right’, ‘lower left’, ‘lower center’, ‘lower right’. For backward compatibility, numeric values are accepted as well. See the parameter loc of .Legend for details.
- bbox_to_anchortuple or ~matplotlib.transforms.BboxBase, optional
Bbox that the inset axes will be anchored to. If None, a tuple of (0, 0, 1, 1) is used if bbox_transform is set to parent_axes.transAxes or parent_axes.figure.transFigure. Otherwise, parent_axes.bbox is used. If a tuple, can be either [left, bottom, width, height], or [left, bottom]. If the kwargs width and/or height are specified in relative units, the 2-tuple [left, bottom] cannot be used. Note that, unless bbox_transform is set, the units of the bounding box are interpreted in the pixel coordinate. When using bbox_to_anchor with tuple, it almost always makes sense to also specify a bbox_transform. This might often be the axes transform parent_axes.transAxes.
- bbox_transform~matplotlib.transforms.Transform, optional
Transformation for the bbox that contains the inset axes. If None, a .transforms.IdentityTransform is used. The value of bbox_to_anchor (or the return value of its get_points method) is transformed by the bbox_transform and then interpreted as points in the pixel coordinate (which is dpi dependent). You may provide bbox_to_anchor in some normalized coordinate, and give an appropriate transform (e.g., parent_axes.transAxes).
- axes_class~matplotlib.axes.Axes type, default: .HostAxes
The type of the newly created inset axes.
- axes_kwargsdict, optional
Keyword arguments to pass to the constructor of the inset axes. Valid arguments include:
Properties: adjustable: {‘box’, ‘datalim’} agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array and two offsets from the bottom left corner of the image alpha: scalar or None anchor: (float, float) or {‘C’, ‘SW’, ‘S’, ‘SE’, ‘E’, ‘NE’, …} animated: bool aspect: {‘auto’, ‘equal’} or float autoscale_on: bool autoscalex_on: unknown autoscaley_on: unknown axes_locator: Callable[[Axes, Renderer], Bbox] axisbelow: bool or ‘line’ box_aspect: float or None clip_box: ~matplotlib.transforms.BboxBase or None clip_on: bool clip_path: Patch or (Path, Transform) or None facecolor or fc: color figure: ~matplotlib.figure.Figure frame_on: bool gid: str in_layout: bool label: object mouseover: bool navigate: bool navigate_mode: unknown path_effects: list of .AbstractPathEffect picker: None or bool or float or callable position: [left, bottom, width, height] or ~matplotlib.transforms.Bbox prop_cycle: ~cycler.Cycler rasterization_zorder: float or None rasterized: bool sketch_params: (scale: float, length: float, randomness: float) snap: bool or None subplotspec: unknown title: str transform: ~matplotlib.transforms.Transform url: str visible: bool xbound: (lower: float, upper: float) xlabel: str xlim: (left: float, right: float) xmargin: float greater than -0.5 xscale: unknown xticklabels: unknown xticks: unknown ybound: (lower: float, upper: float) ylabel: str ylim: (bottom: float, top: float) ymargin: float greater than -0.5 yscale: unknown yticklabels: unknown yticks: unknown zorder: float
- borderpadfloat, default: 0.5
Padding between inset axes and the bbox_to_anchor. The units are axes font size, i.e. for a default font size of 10 points borderpad = 0.5 is equivalent to a padding of 5 points.
- Returns:
- inset_axesaxes_class
Inset axes object created.
Notes
The meaning of bbox_to_anchor and bbox_to_transform is interpreted differently from that of legend. The value of bbox_to_anchor (or the return value of its get_points method; the default is parent_axes.bbox) is transformed by the bbox_transform (the default is Identity transform) and then interpreted as points in the pixel coordinate (which is dpi dependent).
Thus, the following three calls are identical and create an inset axes with respect to the parent_axes:
axins = inset_axes(parent_axes, "30%", "40%")
axins = inset_axes(parent_axes, "30%", "40%", bbox_to_anchor=parent_axes.bbox)
axins = inset_axes(parent_axes, "30%", "40%",
                   bbox_to_anchor=(0, 0, 1, 1), bbox_transform=parent_axes.transAxes)
- lanczos(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]¶
- lowpass_cosine_lanczos_filter_coef(cf, m, normalize=True)[source]¶
Return the convolution coefficients for a low-pass Lanczos filter.
- Parameters:
- cf: float
Cutoff frequency expressed as a ratio of the Nyquist frequency.
- m: int
Size of the filtering window.
- Returns:
- results: list
Coefficients of filtering window.
- lowpass_lanczos_filter_coef(cf, m, normalize=True, cosine_taper=False)[source]¶
Return the convolution coefficients for a low-pass Lanczos filter.
- Parameters:
- cffloat
Cutoff frequency expressed as a ratio of the Nyquist frequency.
- mint
Size of the filtering window.
- normalizebool, optional
Whether to normalize the filter coefficients so they sum to 1.
- cosine_taperbool, optional
If True, applies a cosine-squared taper to the Lanczos window.
- Returns:
- resnp.ndarray
Coefficients of the filtering window.
- mark_inset(parent_axes, inset_axes, loc1, loc2, **kwargs)[source]¶
Draw a box to mark the location of an area represented by an inset axes.
This function draws a box in parent_axes at the bounding box of inset_axes, and shows a connection with the inset axes by drawing lines at the corners, giving a “zoomed in” effect.
- Parameters:
- parent_axes~matplotlib.axes.Axes
Axes which contains the area of the inset axes.
- inset_axes~matplotlib.axes.Axes
The inset axes.
- loc1, loc2{1, 2, 3, 4}
Corners to use for connecting the inset axes and the area in the parent axes.
- **kwargs
Patch properties for the lines and box drawn:
Properties: agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array and two offsets from the bottom left corner of the image alpha: unknown animated: bool antialiased or aa: bool or None capstyle: .CapStyle or {‘butt’, ‘projecting’, ‘round’} clip_box: ~matplotlib.transforms.BboxBase or None clip_on: bool clip_path: Patch or (Path, Transform) or None color: color edgecolor or ec: color or None facecolor or fc: color or None figure: ~matplotlib.figure.Figure fill: bool gid: str hatch: {‘/’, ‘\’, ‘|’, ‘-’, ‘+’, ‘x’, ‘o’, ‘O’, ‘.’, ‘*’} in_layout: bool joinstyle: .JoinStyle or {‘miter’, ‘round’, ‘bevel’} label: object linestyle or ls: {‘-’, ‘–’, ‘-.’, ‘:’, ‘’, (offset, on-off-seq), …} linewidth or lw: float or None mouseover: bool path_effects: list of .AbstractPathEffect picker: None or bool or float or callable rasterized: bool sketch_params: (scale: float, length: float, randomness: float) snap: bool or None transform: ~matplotlib.transforms.Transform url: str visible: bool zorder: float
- Returns:
- pp~matplotlib.patches.Patch
The patch drawn to represent the area of the inset axes.
- p1, p2~matplotlib.patches.Patch
The patches connecting two corners of the inset axes and its area.
- ts_gaussian_filter(ts, sigma, order=0, mode='reflect', cval=0.0, truncate=4.0)[source]¶
Column-wise Gaussian smoothing of regular time series. Missing/irregular values are not handled, which means this function is not much different from a rolling-window Gaussian average in pandas (e.g., ts.rolling(window=5, win_type='gaussian').mean()), which may be preferable in the case of missing data. This function has been kept around with handling of irregular series as an aspiration, but that is not yet implemented.
vtools.functions.interannual module¶
vtools.functions.interpolate module¶
Module for data interpolation using splines or interfaces unavailable in Pandas.
- _monotonic_spline(x, y, xnew)[source]¶
Third-order (M3-A) monotonicity-preserving spline. Usage: interpolate.spline(x, y, xnew)
- where
x are the sorted index values of the original data, y are the original data values, and xnew are the new locations for the spline.
Reference: Huynh, HT, "Accurate Monotone Cubic Interpolation", SIAM J. Numer. Analysis V30 No. 1 pp 57-100. All equation numbers refer to this paper, and the variable names are almost the same. Double letters like "ee" indicate that the subscript should have "+1/2" added to it, and a number after the variable shows the "t" that the first member applies to.
- monotonic_spline(ts, dest)[source]¶
Interpolate a regular time series (rts) to a finer rts by rational histospline.
The rational histospline preserves area under the curve. This is a good choice of spline for period-averaged data where an interpolant is desired that is ‘conservative’. Note that it is the underlying continuous interpolant that will be ‘conservative’, though, not the returned discrete time series, which merely samples the underlying interpolant.
- Parameters:
- ts
Pandas.DataFrame Series to be interpolated, typically with DatetimeIndex
- desta pandas freq code (e.g. ‘16min’ or ‘D’) or a DateTimeIndex
- Returns:
- result
DataFrame A regular time series with same columns as ts, populated with instantaneous values and with an index of type DateTimeIndex
- rhist(x, y, xnew, y0, yn, p, q)[source]¶
Histospline for arrays with tension. Based on the algorithm rhist2 in One Dimensional Spline Interpolation Algorithms by Helmuth Spath (1995).
- Parameters:
- xarray-like
Abscissa array of original data, of length n
- yarray-like, dimension (n-1)
Values (mantissa) of original data giving the rectangle (average) values between x[i] and x[i+1]
- xnewarray-like
Array of new locations at which to interpolate.
- y0,ynfloat
Initial and terminal values
- p,q: array-like, dimension (n-1)
Tension parameter; p and q are almost always the same. The higher p and q are for a particular x interval, the more rectangular the interpolant will look and the more positivity- and shape-preserving it is, at the expense of accuracy. For this routine any number p, q > -1 is allowed, although the bound routine doesn’t use values less than zero.
- Returns:
- ynewarray-like
Array that interpolates the original data.
- rhist_bound(x, y, xnew, y0, yn, p, lbound=None, maxiter=5, pfactor=2, floor_eps=0.001)[source]¶
Numpy implementation of a histospline with bounds. Histospline for arrays with lower-bound enforcement. This routine drives rhist() but tests that the output array observes the lower bound and adapts the tension parameters as needed.
This will not work exactly if the input array has values right on the lower bound. In this case, the parameter floor_eps allows you to specify a tolerance of bound violation to shoot for … and if it isn’t met in maxiter iterations the value is simply floored.
- Parameters:
- xarray-like
Abscissa array of original data to be interpolated, of length n
- yarray-like, dimension (n-1)
Values (mantissa) of original data giving the rectangle (average) values between x[i] and x[i+1]
- xnewarray-like
Array of new locations at which to interpolate.
- y0,ynfloat
Initial and terminal values
- p: float
Tension parameter. This starts out as a global scalar, but will be converted to an array and adapted locally. The higher this goes for a particular x interval, the more rectangular the interpolant will look and the more positivity and shape preserving it is at the expense of accuracy. A good number is 1, and for this routine, p > 0 is required because the adaptive process multiplies it by pfactor each iteration on the expectation that it will get bigger.
- lbound: float
Lower bound to be enforced. If the original y’s are strictly above this value, the output has the potential to also be strictly above. If the original y’s lie on the lower bound, then the lower bound can only be enforced within a tolerance using the Spath algorithm … and once the values reach that tolerance they are floored. If lbound = None, this function behaves like rhist()
- maxiterinteger
Number of times to increase p by multiplying it by pfactor before giving up on satisfying floor_eps.
- pfactorfloat
Factor by which to multiply individual time step p
- floor_epsfloat
Tolerance for lower bound violation at which the algorithm will be terminated and the bounds will be enforced by flooring.
- Returns:
- ynewarray-like
Array that interpolates the original data, on a curve that conserves mass and strictly observes the lower bound.
- rhistinterp(ts, dest, p=2.0, lowbound=None, tolbound=0.001, maxiter=5)[source]¶
Interpolate a regular time series (rts) to a finer rts by rational histospline.
The rational histospline preserves area under the curve. This is a good choice of spline for period-averaged data where an interpolant is desired that is ‘conservative’. Note that it is the underlying continuous interpolant that will be ‘conservative’, though, not the returned discrete time series, which merely samples the underlying interpolant.
- Parameters:
- ts
Pandas.DataFrame Series to be interpolated, with period index and assuming time stamps at beginning of the period and no missing data
- deststring or
DateTimeIndex A pandas freq code (e.g. ‘16min’ or ‘D’) or a DateTimeIndex
- pfloat, optional
Spline tension, usually between 0 and 20. Must be > -1. For a ‘sufficiently large’ value of p, the interpolant will be monotonicity-preserving and will maintain strict positivity (always being strictly > lowbound). It will also preserve the original shape of the time series.
- lowboundfloat, optional
Lower bound of interpolated values.
- tolboundfloat, optional
Tolerance for determining if an input is on the bound.
- Returns:
- result
pandas.DataFrame A regular time series with same columns as ts, populated with instantaneous values and with an index of type DateTimeIndex
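A sketch of typical use (values are illustrative): daily period-averaged data interpolated to 15-minute instantaneous values while enforcing a lower bound of zero:
>>> import numpy as np
>>> import pandas as pd
>>> from vtools.functions.interpolate import rhistinterp
>>> pidx = pd.period_range("2020-01-01", periods=30, freq="D")
>>> daily = pd.DataFrame({"flow": np.random.default_rng(0).uniform(1.0, 5.0, 30)}, index=pidx)
>>> fine = rhistinterp(daily, "15min", p=2.0, lowbound=0.0)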
vtools.functions.lag_cross_correlation module¶
- calculate_lag(lagged, base, max_lag, res, interpolate_method='linear')[source]¶
Calculate the shift in lagged that maximizes cross-correlation with base.
- Parameters:
- base, lagged: pandas.Series
time series to compare. The result is relative to base
- max_lag: interval
Maximum positive/negative time shift to consider in cross-correlation (i.e., from -max_lag to +max_lag). The required windows in lagged will account for this bracket. For series dominated by a single frequency (e.g., 1 cycle/12.5 hours for tides), the algorithm can tolerate a range of 180 degrees (6 hours).
- res: interval
Resolution of analysis. The series lagged will be interpolated to this resolution using interpolate_method. The unit used here determines the type of the output. See the documentation of the interval concept, which is most compatible with pandas.tseries.offsets (not Timedelta) because of better math properties in things like division; vtime helpers like minutes(1) may be helpful.
- interpolate_method: str, optional
Interpolate method to refine lagged to res. Must be compatible with pandas interpolation method names (and hence scipy)
- Returns:
- laginterval
Shift as a pandas.tseries.offsets subtype that matches units with res. This shift is the apparent lateness (positive) or earliness (negative). It must be applied to base or removed from lagged to align the features.
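An illustrative sketch (base and lagged are assumed to be overlapping pandas.Series; to_offset is documented below):
>>> from pandas.tseries.frequencies import to_offset
>>> from vtools.functions.lag_cross_correlation import calculate_lag
>>> lag = calculate_lag(lagged, base, max_lag=to_offset("3h"), res=to_offset("1min"))
>>> lag   # e.g. <42 * Minutes> if lagged trails base by about 42 minutes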
- icrosscorr(lag, ts0, ts1)[source]¶
Lag-N cross correlation. Shifted data filled with NaNs
- Parameters:
- lagint, default 0
- ts0, ts1pandas.Series objects of equal length
- Returns:
- crosscorrfloat
- to_offset(freq, is_period=False)¶
Return DateOffset object from string or datetime.timedelta object.
- Parameters:
- freqstr, datetime.timedelta, BaseOffset or None
- Returns:
- BaseOffset subclass or None
- Raises:
- ValueError
If freq is an invalid frequency
See also
BaseOffsetStandard kind of date increment used for a date range.
Examples
>>> from pandas.tseries.frequencies import to_offset
>>> to_offset("5min")
<5 * Minutes>
>>> to_offset("1D1h")
<25 * Hours>
>>> to_offset("2W")
<2 * Weeks: weekday=6>
>>> to_offset("2B")
<2 * BusinessDays>
>>> to_offset(pd.Timedelta(days=1))
<Day>
>>> to_offset(pd.offsets.Hour())
<Hour>
vtools.functions.merge module¶
- reduce(function, iterable[, initial]) -> value¶
Apply a function of two arguments cumulatively to the items of a sequence or iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). If initial is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty.
- ts_merge(series, names=None, strict_priority=False)[source]¶
Merge multiple time series together, prioritizing series in order.
- Parameters:
- seriessequence of pandas.Series or pandas.DataFrame
Higher priority first. All indexes must be DatetimeIndex.
- namesNone, str, or iterable of str, optional
If None (default), inputs must share compatible column names.
If str, the output is univariate and will be named accordingly.
If iterable, it is used as a subset/ordering of columns.
- strict_prioritybool, default False
If False (default): lower-priority data may fill NaNs in higher-priority series anywhere (traditional merge/overlay). If True: for each column, within the window [first_valid_index, last_valid_index] of any higher-priority series, lower-priority data are masked out — even if the higher-priority value is NaN. Outside those windows, behavior is unchanged.
- Returns:
- pandas.Series or pandas.DataFrame
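A small sketch of the default (non-strict) priority behavior, with illustrative data:
>>> import numpy as np
>>> import pandas as pd
>>> from vtools.functions.merge import ts_merge
>>> idx = pd.date_range("2020-01-01", periods=6, freq="D")
>>> primary = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0, 6.0], index=idx, name="flow")
>>> backup = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=idx, name="flow")
>>> merged = ts_merge((primary, backup))
>>> # per the semantics above, backup fills only the NaNs: [1.0, 20.0, 3.0, 40.0, 5.0, 6.0]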
- ts_splice(series, names=None, transition='prefer_last', floor_dates=False)[source]¶
Splice multiple time series together, prioritizing series in patches of time.
Unlike ts_merge, which blends overlapping data points, ts_splice stitches together time series without overlap. The function determines when to switch between series based on a transition strategy.
- Parameters:
- seriestuple or list of pandas.DataFrame or pandas.Series
A tuple or list of time series. Each series must have a DatetimeIndex and consistent column structure.
- namesNone, str, or iterable of str, optional
If None (default), all input series must share common column names, and the output will merge common columns.
If a str, all input series must have a single column, and the output will be a DataFrame with this name as the column name.
If an iterable of str, all input DataFrames must have the same number of columns matching the length of names, and these will be used for the output.
- transition{‘prefer_first’, ‘prefer_last’} or list of pandas.Timestamp
Defines how to determine breakpoints between time series:
- ‘prefer_first’: uses the earlier series in the list up until its last valid timestamp.
- ‘prefer_last’: uses the later series starting from its first valid timestamp.
- A list of specific timestamps can also be provided as transition points.
- floor_datesbool, optional, default=False
If True, inferred transition timestamps (prefer_first or prefer_last) are floored to the beginning of the day. This can introduce NaNs if the input series are regular with a freq attribute.
- Returns:
- pandas.DataFrame or pandas.Series
If the input contains multi-column DataFrames, the output is a DataFrame with the same column structure.
If a collection of single-column Series is provided, the output will be a Series.
The output retains a freq attribute if all inputs share the same frequency.
See also
ts_mergeMerges series by filling gaps in order of priority.
Notes
The output time index is the union of input time indices.
If transition is ‘prefer_first’, gaps may appear in the final time series.
If transition is ‘prefer_last’, overlapping data is resolved in favor of later series.
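A sketch of splicing two overlapping records with transition='prefer_last' (illustrative data; continues the pandas import from the ts_merge example above):
>>> from vtools.functions.merge import ts_splice
>>> early = pd.Series(range(10), index=pd.date_range("2020-01-01", periods=10, freq="D"), name="flow")
>>> late = pd.Series(range(100, 110), index=pd.date_range("2020-01-06", periods=10, freq="D"), name="flow")
>>> spliced = ts_splice((early, late), transition="prefer_last")
>>> # early supplies values before 2020-01-06; late supplies values from its first valid timestamp onward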
vtools.functions.neighbor_fill module¶
Neighbor-based time-series gap filling.
This module provides a single high-level API, fill_from_neighbor(),
with pluggable backends for common algorithms used to infer a target series
from one or more nearby stations. It is designed for operational use in
Delta/Bay hydrodynamics workflows, but is intentionally general.
Highlights¶
Robust time alignment and optional resampling.
Multiple modeling strategies: OLS/robust, rolling regression, lagged elastic-net, and state-space/Kalman.
Forward-chaining (temporal) cross-validation utilities.
Optional regime stratification (e.g., barrier in/out, season).
Uncertainty estimates where available (analytic or residual-based).
Clear return structure with diagnostics for auditability.
Example¶
>>> res = fill_from_neighbor(
... target=y, neighbor=x, method="state_space", lags=range(0, 4),
... bounds=(0.0, None), regime=regime_series
... )
>>> filled = res["filled"]
>>> info = res["model_info"]
Notes¶
“Neighbor” can be one series or multiple (as a DataFrame); both are supported.
Missing data in the target are left as-is where the model cannot reasonably infer a value (e.g., no overlapping neighbor data). Where predictions exist, they are merged into the target to produce filled. DFM methods can carry through a gap in the neighbor.
- class DFMFill(endog: DataFrame, factor: str = 'default', anomaly_mode: str = 'ar', anom_var: str = 'neighbor', rx_scale: float = 1.0)[source]¶
Bases:
MLEModelBivariate DFM with level+slope common factor and optional anomalies.
- Attributes:
param_names(list of str) List of human readable parameter names (for parameters
start_params(array) Starting parameters for maximum likelihood estimation.
Methods
update(params[, transformed])Update the parameters of the model
- __init__(endog: DataFrame, factor: str = 'default', anomaly_mode: str = 'ar', anom_var: str = 'neighbor', rx_scale: float = 1.0)[source]¶
- __module__ = 'vtools.functions.neighbor_fill'¶
- property param_names¶
(list of str) List of human readable parameter names (for parameters actually included in the model).
- property start_params: ndarray¶
(array) Starting parameters for maximum likelihood estimation.
- update(params, transformed=True, **kwargs)[source]¶
Update the parameters of the model
- Parameters:
- paramsarray_like
Array of new parameters.
- transformedbool, optional
Whether or not params is already transformed. If set to False, transform_params is called. Default is True.
- Returns:
- paramsarray_like
Array of parameters.
Notes
Since Model is a base class, this method should be overridden by subclasses to perform actual updating steps.
- class ElasticNetCV(*, l1_ratio=0.5, eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, precompute='auto', max_iter=1000, tol=0.0001, cv=None, copy_X=True, verbose=0, n_jobs=None, positive=False, random_state=None, selection='cyclic')[source]¶
Bases:
RegressorMixin,LinearModelCVElastic Net model with iterative fitting along a regularization path.
See glossary entry for cross-validation estimator.
Read more in the User Guide.
- Parameters:
- l1_ratiofloat or list of float, default=0.5
Float between 0 and 1 passed to ElasticNet (scaling between l1 and l2 penalties). For
l1_ratio = 0the penalty is an L2 penalty. Forl1_ratio = 1it is an L1 penalty. For0 < l1_ratio < 1, the penalty is a combination of L1 and L2 This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and less close to 0 (i.e. Ridge), as in[.1, .5, .7, .9, .95, .99, 1].- epsfloat, default=1e-3
Length of the path.
eps=1e-3means thatalpha_min / alpha_max = 1e-3.- n_alphasint, default=100
Number of alphas along the regularization path, used for each l1_ratio.
- alphasarray-like, default=None
List of alphas where to compute the models. If None alphas are set automatically.
- fit_interceptbool, default=True
Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).
- precompute‘auto’, bool or array-like of shape (n_features, n_features), default=’auto’
Whether to use a precomputed Gram matrix to speed up calculations. If set to
'auto'let us decide. The Gram matrix can also be passed as argument.- max_iterint, default=1000
The maximum number of iterations.
- tolfloat, default=1e-4
The tolerance for the optimization: if the updates are smaller than
tol, the optimization code checks the dual gap for optimality and continues until it is smaller thantol.- cvint, cross-validation generator or iterable, default=None
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross-validation,
int, to specify the number of folds.
An iterable yielding (train, test) splits as arrays of indices.
For int/None inputs,
KFoldis used.Refer User Guide for the various cross-validation strategies that can be used here.
Changed in version 0.22:
cvdefault value if None changed from 3-fold to 5-fold.- copy_Xbool, default=True
If
True, X will be copied; else, it may be overwritten.- verbosebool or int, default=0
Amount of verbosity.
- n_jobsint, default=None
Number of CPUs to use during the cross validation.
Nonemeans 1 unless in ajoblib.parallel_backendcontext.-1means using all processors. See Glossary for more details.- positivebool, default=False
When set to
True, forces the coefficients to be positive.- random_stateint, RandomState instance, default=None
The seed of the pseudo random number generator that selects a random feature to update. Used when
selection== ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.- selection{‘cyclic’, ‘random’}, default=’cyclic’
If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.
See also
enet_pathCompute elastic net path with coordinate descent.
ElasticNetLinear regression with combined L1 and L2 priors as regularizer.
Notes
In fit, once the best parameters l1_ratio and alpha are found through cross-validation, the model is fit again using the entire training set.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.
The parameter l1_ratio corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. More specifically, the optimization objective is:
1 / (2 * n_samples) * ||y - Xw||^2_2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2
If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:
a * L1 + b * L2
for:
alpha = a + b and l1_ratio = a / (a + b).
For an example, see examples/linear_model/plot_lasso_model_selection.py.
Examples
>>> from sklearn.linear_model import ElasticNetCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=2, random_state=0)
>>> regr = ElasticNetCV(cv=5, random_state=0)
>>> regr.fit(X, y)
ElasticNetCV(cv=5, random_state=0)
>>> print(regr.alpha_)
0.199...
>>> print(regr.intercept_)
0.398...
>>> print(regr.predict([[0, 0]]))
[0.398...]
- Attributes:
- alpha_float
The amount of penalization chosen by cross validation.
- l1_ratio_float
The compromise between l1 and l2 penalization chosen by cross validation.
- coef_ndarray of shape (n_features,) or (n_targets, n_features)
Parameter vector (w in the cost function formula).
- intercept_float or ndarray of shape (n_targets, n_features)
Independent term in the decision function.
- mse_path_ndarray of shape (n_l1_ratio, n_alpha, n_folds)
Mean square error for the test set on each fold, varying l1_ratio and alpha.
- alphas_ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)
The grid of alphas used for fitting, for each l1_ratio.
- dual_gap_float
The dual gaps at the end of the optimization for the optimal alpha.
- n_iter_int
Number of iterations run by the coordinate descent solver to reach the specified tolerance for the optimal alpha.
- n_features_in_int
Number of features seen during fit.
New in version 0.24.
- feature_names_in_ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
New in version 1.0.
Methods
path(X, y, *[, l1_ratio, eps, n_alphas, ...])Compute elastic net path with coordinate descent.
set_fit_request(*[, sample_weight])Request metadata passed to the
fitmethod.set_score_request(*[, sample_weight])Request metadata passed to the
scoremethod.- __abstractmethods__ = frozenset({})¶
- __annotations__ = {'_parameter_constraints': <class 'dict'>}¶
- __doc__ = "Elastic Net model with iterative fitting along a regularization path.\n\n See glossary entry for :term:`cross-validation estimator`.\n\n Read more in the :ref:`User Guide <elastic_net>`.\n\n Parameters\n ----------\n l1_ratio : float or list of float, default=0.5\n Float between 0 and 1 passed to ElasticNet (scaling between\n l1 and l2 penalties). For ``l1_ratio = 0``\n the penalty is an L2 penalty. For ``l1_ratio = 1`` it is an L1 penalty.\n For ``0 < l1_ratio < 1``, the penalty is a combination of L1 and L2\n This parameter can be a list, in which case the different\n values are tested by cross-validation and the one giving the best\n prediction score is used. Note that a good choice of list of\n values for l1_ratio is often to put more values close to 1\n (i.e. Lasso) and less close to 0 (i.e. Ridge), as in ``[.1, .5, .7,\n .9, .95, .99, 1]``.\n\n eps : float, default=1e-3\n Length of the path. ``eps=1e-3`` means that\n ``alpha_min / alpha_max = 1e-3``.\n\n n_alphas : int, default=100\n Number of alphas along the regularization path, used for each l1_ratio.\n\n alphas : array-like, default=None\n List of alphas where to compute the models.\n If None alphas are set automatically.\n\n fit_intercept : bool, default=True\n Whether to calculate the intercept for this model. If set\n to false, no intercept will be used in calculations\n (i.e. data is expected to be centered).\n\n precompute : 'auto', bool or array-like of shape (n_features, n_features), default='auto'\n Whether to use a precomputed Gram matrix to speed up\n calculations. If set to ``'auto'`` let us decide. The Gram\n matrix can also be passed as argument.\n\n max_iter : int, default=1000\n The maximum number of iterations.\n\n tol : float, default=1e-4\n The tolerance for the optimization: if the updates are\n smaller than ``tol``, the optimization code checks the\n dual gap for optimality and continues until it is smaller\n than ``tol``.\n\n cv : int, cross-validation generator or iterable, default=None\n Determines the cross-validation splitting strategy.\n Possible inputs for cv are:\n\n - None, to use the default 5-fold cross-validation,\n - int, to specify the number of folds.\n - :term:`CV splitter`,\n - An iterable yielding (train, test) splits as arrays of indices.\n\n For int/None inputs, :class:`~sklearn.model_selection.KFold` is used.\n\n Refer :ref:`User Guide <cross_validation>` for the various\n cross-validation strategies that can be used here.\n\n .. versionchanged:: 0.22\n ``cv`` default value if None changed from 3-fold to 5-fold.\n\n copy_X : bool, default=True\n If ``True``, X will be copied; else, it may be overwritten.\n\n verbose : bool or int, default=0\n Amount of verbosity.\n\n n_jobs : int, default=None\n Number of CPUs to use during the cross validation.\n ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.\n ``-1`` means using all processors. See :term:`Glossary <n_jobs>`\n for more details.\n\n positive : bool, default=False\n When set to ``True``, forces the coefficients to be positive.\n\n random_state : int, RandomState instance, default=None\n The seed of the pseudo random number generator that selects a random\n feature to update. Used when ``selection`` == 'random'.\n Pass an int for reproducible output across multiple function calls.\n See :term:`Glossary <random_state>`.\n\n selection : {'cyclic', 'random'}, default='cyclic'\n If set to 'random', a random coefficient is updated every iteration\n rather than looping over features sequentially by default. 
This\n (setting to 'random') often leads to significantly faster convergence\n especially when tol is higher than 1e-4.\n\n Attributes\n ----------\n alpha_ : float\n The amount of penalization chosen by cross validation.\n\n l1_ratio_ : float\n The compromise between l1 and l2 penalization chosen by\n cross validation.\n\n coef_ : ndarray of shape (n_features,) or (n_targets, n_features)\n Parameter vector (w in the cost function formula).\n\n intercept_ : float or ndarray of shape (n_targets, n_features)\n Independent term in the decision function.\n\n mse_path_ : ndarray of shape (n_l1_ratio, n_alpha, n_folds)\n Mean square error for the test set on each fold, varying l1_ratio and\n alpha.\n\n alphas_ : ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)\n The grid of alphas used for fitting, for each l1_ratio.\n\n dual_gap_ : float\n The dual gaps at the end of the optimization for the optimal alpha.\n\n n_iter_ : int\n Number of iterations run by the coordinate descent solver to reach\n the specified tolerance for the optimal alpha.\n\n n_features_in_ : int\n Number of features seen during :term:`fit`.\n\n .. versionadded:: 0.24\n\n feature_names_in_ : ndarray of shape (`n_features_in_`,)\n Names of features seen during :term:`fit`. Defined only when `X`\n has feature names that are all strings.\n\n .. versionadded:: 1.0\n\n See Also\n --------\n enet_path : Compute elastic net path with coordinate descent.\n ElasticNet : Linear regression with combined L1 and L2 priors as regularizer.\n\n Notes\n -----\n In `fit`, once the best parameters `l1_ratio` and `alpha` are found through\n cross-validation, the model is fit again using the entire training set.\n\n To avoid unnecessary memory duplication the `X` argument of the `fit`\n method should be directly passed as a Fortran-contiguous numpy array.\n\n The parameter `l1_ratio` corresponds to alpha in the glmnet R package\n while alpha corresponds to the lambda parameter in glmnet.\n More specifically, the optimization objective is::\n\n 1 / (2 * n_samples) * ||y - Xw||^2_2\n + alpha * l1_ratio * ||w||_1\n + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2\n\n If you are interested in controlling the L1 and L2 penalty\n separately, keep in mind that this is equivalent to::\n\n a * L1 + b * L2\n\n for::\n\n alpha = a + b and l1_ratio = a / (a + b).\n\n For an example, see\n :ref:`examples/linear_model/plot_lasso_model_selection.py\n <sphx_glr_auto_examples_linear_model_plot_lasso_model_selection.py>`.\n\n Examples\n --------\n >>> from sklearn.linear_model import ElasticNetCV\n >>> from sklearn.datasets import make_regression\n\n >>> X, y = make_regression(n_features=2, random_state=0)\n >>> regr = ElasticNetCV(cv=5, random_state=0)\n >>> regr.fit(X, y)\n ElasticNetCV(cv=5, random_state=0)\n >>> print(regr.alpha_)\n 0.199...\n >>> print(regr.intercept_)\n 0.398...\n >>> print(regr.predict([[0, 0]]))\n [0.398...]\n "¶
- __init__(*, l1_ratio=0.5, eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, precompute='auto', max_iter=1000, tol=0.0001, cv=None, copy_X=True, verbose=0, n_jobs=None, positive=False, random_state=None, selection='cyclic')[source]¶
- __module__ = 'sklearn.linear_model._coordinate_descent'¶
- static path(X, y, *, l1_ratio=0.5, eps=0.001, n_alphas=100, alphas=None, precompute='auto', Xy=None, copy_X=True, coef_init=None, verbose=False, return_n_iter=False, positive=False, check_input=True, **params)¶
Compute elastic net path with coordinate descent.
The elastic net optimization function varies for mono and multi-outputs.
For mono-output tasks it is:
1 / (2 * n_samples) * ||y - Xw||^2_2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2
For multi-output tasks it is:
(1 / (2 * n_samples)) * ||Y - XW||_Fro^2 + alpha * l1_ratio * ||W||_21 + 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2
Where:
||W||_21 = \sum_i \sqrt{\sum_j w_{ij}^2}
i.e. the sum of norm of each row.
Read more in the User Guide.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
Training data. Pass directly as Fortran-contiguous data to avoid unnecessary memory duplication. If
yis mono-output thenXcan be sparse.- y{array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_targets)
Target values.
- l1_ratiofloat, default=0.5
Number between 0 and 1 passed to elastic net (scaling between l1 and l2 penalties).
l1_ratio=1corresponds to the Lasso.- epsfloat, default=1e-3
Length of the path.
eps=1e-3means thatalpha_min / alpha_max = 1e-3.- n_alphasint, default=100
Number of alphas along the regularization path.
- alphasarray-like, default=None
List of alphas where to compute the models. If None alphas are set automatically.
- precompute‘auto’, bool or array-like of shape (n_features, n_features), default=’auto’
Whether to use a precomputed Gram matrix to speed up calculations. If set to
'auto'let us decide. The Gram matrix can also be passed as argument.- Xyarray-like of shape (n_features,) or (n_features, n_targets), default=None
Xy = np.dot(X.T, y) that can be precomputed. It is useful only when the Gram matrix is precomputed.
- copy_Xbool, default=True
If
True, X will be copied; else, it may be overwritten.- coef_initarray-like of shape (n_features, ), default=None
The initial values of the coefficients.
- verbosebool or int, default=False
Amount of verbosity.
- return_n_iterbool, default=False
Whether to return the number of iterations or not.
- positivebool, default=False
If set to True, forces coefficients to be positive. (Only allowed when
y.ndim == 1).- check_inputbool, default=True
If set to False, the input validation checks are skipped (including the Gram matrix when provided). It is assumed that they are handled by the caller.
- **paramskwargs
Keyword arguments passed to the coordinate descent solver.
- Returns:
- alphasndarray of shape (n_alphas,)
The alphas along the path where models are computed.
- coefsndarray of shape (n_features, n_alphas) or (n_targets, n_features, n_alphas)
Coefficients along the path.
- dual_gapsndarray of shape (n_alphas,)
The dual gaps at the end of the optimization for each alpha.
- n_iterslist of int
The number of iterations taken by the coordinate descent optimizer to reach the specified tolerance for each alpha. (Is returned when
return_n_iteris set to True).
See also
MultiTaskElasticNetMulti-task ElasticNet model trained with L1/L2 mixed-norm as regularizer.
MultiTaskElasticNetCVMulti-task L1/L2 ElasticNet with built-in cross-validation.
ElasticNetLinear regression with combined L1 and L2 priors as regularizer.
ElasticNetCVElastic Net model with iterative fitting along a regularization path.
Notes
For an example, see examples/linear_model/plot_lasso_coordinate_descent_path.py.
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ElasticNetCV¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
- Returns:
- selfobject
The updated object.
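A brief usage sketch, assuming metadata routing has been enabled globally as described above:
>>> import sklearn
>>> from sklearn.linear_model import ElasticNetCV
>>> sklearn.set_config(enable_metadata_routing=True)
>>> est = ElasticNetCV().set_fit_request(sample_weight=True)   # route sample_weight to fit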
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') ElasticNetCV¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- selfobject
The updated object.
- class FillResult(filled: Series, yhat: Series, pi_lower: Series | None, pi_upper: Series | None, model_info: Dict[str, Any], metrics: Dict[str, float])[source]¶
Bases: object
Container for gap-filling outputs.
- Parameters:
- filledpd.Series
Target series with gaps filled where possible.
- yhatpd.Series
Model predictions aligned to the union index used for fitting/prediction.
- pi_lower, pi_upperOptional[pd.Series]
Prediction interval bounds where available; otherwise None.
- model_infodict
Method, parameters, chosen lags, training window, etc.
- metricsdict
Holdout scores (MAE/RMSE/R^2) using forward-chaining CV where configured.
Methods
to_dict
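A minimal construction sketch based on the dataclass signature above; the series values and dictionary contents are illustrative only:
>>> import pandas as pd
>>> from vtools.functions.neighbor_fill import FillResult
>>> idx = pd.date_range("2020-01-01", periods=3, freq="D")
>>> filled = pd.Series([1.0, 1.5, 2.0], index=idx)
>>> result = FillResult(filled=filled, yhat=filled, pi_lower=None, pi_upper=None,
...                     model_info={"method": "regression"}, metrics={"rmse": 0.1})
>>> result.metrics["rmse"]
0.1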
- class HuberT(t=1.345)[source]¶
Bases: RobustNorm
Huber’s T for M estimation.
- Parameters:
- tfloat, optional
The tuning constant for Huber’s t function. The default value is 1.345.
See also
statsmodels.robust.norms.RobustNorm
Methods
psi(z)The psi function for Huber's t estimator
psi_deriv(z)The derivative of Huber's t psi function
rho(z)The robust criterion function for Huber's t.
weights(z)Huber's t weighting function for the IRLS algorithm
- psi(z)[source]¶
The psi function for Huber’s t estimator
The analytic derivative of rho
- Parameters:
- zarray_like
1d array
- Returns:
- psindarray
psi(z) = z for |z| <= t
psi(z) = sign(z)*t for |z| > t
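A doctest-style sketch of the piecewise behavior above; the printed values simply follow from clipping at ±t and are not taken from the library docs:
>>> import numpy as np
>>> from statsmodels.robust.norms import HuberT
>>> norm = HuberT(t=1.345)
>>> norm.psi(np.array([-3.0, 0.5, 3.0])).tolist()   # clipped at +/- 1.345
[-1.345, 0.5, 1.345]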
- psi_deriv(z)[source]¶
The derivative of Huber’s t psi function
Notes
Used to estimate the robust covariance matrix.
- class KNeighborsRegressor(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)[source]¶
Bases: KNeighborsMixin, RegressorMixin, NeighborsBase
Regression based on k-nearest neighbors.
The target is predicted by local interpolation of the targets associated of the nearest neighbors in the training set.
Read more in the User Guide.
New in version 0.9.
- Parameters:
- n_neighborsint, default=5
Number of neighbors to use by default for kneighbors() queries.
- weights{‘uniform’, ‘distance’}, callable or None, default=’uniform’
Weight function used in prediction. Possible values:
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Uniform weights are used by default.
- algorithm{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’
Algorithm used to compute the nearest neighbors:
‘ball_tree’ will use BallTree
‘kd_tree’ will use KDTree
‘brute’ will use a brute-force search.
‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit() method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
- leaf_sizeint, default=30
Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
- pfloat, default=2
Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
- metricstr, DistanceMetric object or callable, default=’minkowski’
Metric to use for distance computation. Default is “minkowski”, which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in distance_metrics for valid metric values.
If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.
If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string.
If metric is a DistanceMetric object, it will be passed directly to the underlying computation routines.
- metric_paramsdict, default=None
Additional keyword arguments for the metric function.
- n_jobsint, default=None
The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn’t affect the fit() method.
See also
NearestNeighborsUnsupervised learner for implementing neighbor searches.
RadiusNeighborsRegressorRegression based on neighbors within a fixed radius.
KNeighborsClassifierClassifier implementing the k-nearest neighbors vote.
RadiusNeighborsClassifierClassifier implementing a vote among neighbors within a given radius.
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
Warning
Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsRegressor
>>> neigh = KNeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y)
KNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[0.5]
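A small variation on the example above, illustrating distance weighting; the prediction follows from inverse-distance weights on the two nearest neighbors (worked by hand, not copied from the library docs):
>>> neigh_d = KNeighborsRegressor(n_neighbors=2, weights='distance')
>>> neigh_d.fit(X, y)
KNeighborsRegressor(n_neighbors=2, weights='distance')
>>> print(neigh_d.predict([[1.2]]))   # neighbors 1 (d=0.2, y=0) and 2 (d=0.8, y=1)
[0.2]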
- Attributes:
- effective_metric_str or callable
The distance metric to use. It will be the same as the metric parameter or a synonym of it, e.g. ‘euclidean’ if the metric parameter is set to ‘minkowski’ and the p parameter to 2.
- effective_metric_params_dict
Additional keyword arguments for the metric function. For most metrics this will be the same as the metric_params parameter, but it may also contain the p parameter value if the effective_metric_ attribute is set to ‘minkowski’.
- n_features_in_int
Number of features seen during fit.
New in version 0.24.
- feature_names_in_ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
New in version 1.0.
- n_samples_fit_int
Number of samples in the fitted data.
Methods
fit(X, y)Fit the k-nearest neighbors regressor from the training dataset.
predict(X)Predict the target for the provided data.
set_score_request(*[, sample_weight])Request metadata passed to the score method.
- fit(X, y)[source]¶
Fit the k-nearest neighbors regressor from the training dataset.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_samples) if metric=’precomputed’
Training data.
- y{array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_outputs)
Target values.
- Returns:
- selfKNeighborsRegressor
The fitted k-nearest neighbors regressor.
- predict(X)[source]¶
Predict the target for the provided data.
- Parameters:
- X{array-like, sparse matrix} of shape (n_queries, n_features), or (n_queries, n_indexed) if metric == ‘precomputed’
Test samples.
- Returns:
- yndarray of shape (n_queries,) or (n_queries, n_outputs), dtype=double
Target values.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KNeighborsRegressor¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- selfobject
The updated object.
- class MLEModel(endog, k_states, exog=None, dates=None, freq=None, **kwargs)[source]¶
Bases: TimeSeriesModel
State space model for maximum likelihood estimation
- Parameters:
- endogarray_like
The observed time-series process y
- k_statesint
The dimension of the unobserved state process.
- exogarray_like, optional
Array of exogenous regressors, shaped nobs x k. Default is no exogenous regressors.
- datesarray_like of datetime, optional
An array-like object of datetime objects. If a Pandas object is given for endog, it is assumed to have a DateIndex.
- freqstr, optional
The frequency of the time-series. A Pandas offset or ‘B’, ‘D’, ‘W’, ‘M’, ‘A’, or ‘Q’. This is optional if dates are given.
- **kwargs
Keyword arguments may be used to provide default values for state space matrices or for Kalman filtering options. See Representation, and KalmanFilter for more details.
See also
statsmodels.tsa.statespace.mlemodel.MLEResults
statsmodels.tsa.statespace.kalman_filter.KalmanFilter
statsmodels.tsa.statespace.representation.Representation
Notes
This class wraps the state space model with Kalman filtering to add in functionality for maximum likelihood estimation. In particular, it adds the concept of updating the state space representation based on a defined set of parameters, through the update method or updater attribute (see below for more details on which to use when), and it adds a fit method which uses a numerical optimizer to select the parameters that maximize the likelihood of the model.
The start_params update method must be overridden in the child class (and the transform and untransform methods, if needed).
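The pattern below is a minimal sketch of such a subclass, modeled on the statsmodels local-level example; the parameter names, starting values, and square/square-root variance transforms are illustrative assumptions rather than part of this API:
>>> import numpy as np
>>> import statsmodels.api as sm
>>> class LocalLevel(sm.tsa.statespace.MLEModel):
...     """Illustrative local level model: y_t = mu_t + e_t, mu_t = mu_{t-1} + w_t."""
...     def __init__(self, endog):
...         super().__init__(endog, k_states=1, initialization='diffuse')
...         self['design', 0, 0] = 1.0
...         self['transition', 0, 0] = 1.0
...         self['selection', 0, 0] = 1.0
...     @property
...     def param_names(self):
...         return ['sigma2.measurement', 'sigma2.level']
...     @property
...     def start_params(self):
...         return np.array([1.0, 1.0])
...     def transform_params(self, unconstrained):
...         return unconstrained ** 2   # keep variances positive
...     def untransform_params(self, constrained):
...         return constrained ** 0.5
...     def update(self, params, **kwargs):
...         params = super().update(params, **kwargs)
...         self['obs_cov', 0, 0] = params[0]
...         self['state_cov', 0, 0] = params[1]
>>> # res = LocalLevel(endog).fit(disp=False)   # endog: any univariate series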
- Attributes:
- ssmstatsmodels.tsa.statespace.kalman_filter.KalmanFilter
Underlying state space representation.
Methods
clone(endog[, exog])Clone state space model with new data and optionally new specification
filter(params[, transformed, ...])Kalman filtering
fit([start_params, transformed, ...])Fits the model by maximum likelihood via Kalman filter.
fit_constrained(constraints[, start_params])Fit the model with some parameters subject to equality constraints.
fix_params(params)Fix parameters to specific values (context manager)
from_formula(formula, data[, subset])Not implemented for state space models
handle_params(params[, transformed, ...])Ensure model parameters satisfy shape and other requirements
hessian(params, *args, **kwargs)Hessian matrix of the likelihood function, evaluated at the given parameters
impulse_responses(params[, steps, impulse, ...])Impulse response function
initialize_approximate_diffuse([variance])Initialize approximate diffuse
initialize_known(initial_state, ...)Initialize known
initialize_statespace(**kwargs)Initialize the state space representation
initialize_stationary()Initialize stationary
loglike(params, *args, **kwargs)Loglikelihood evaluation
loglikeobs(params[, transformed, ...])Loglikelihood evaluation
observed_information_matrix(params[, ...])Observed information matrix
opg_information_matrix(params[, ...])Outer product of gradients information matrix
prepare_data()Prepare data for use in the state space representation
score(params, *args, **kwargs)Compute the score function at params.
score_obs(params[, method, transformed, ...])Compute the score per observation, evaluated at params
set_conserve_memory([conserve_memory])Set the memory conservation method
set_filter_method([filter_method])Set the filtering method
set_inversion_method([inversion_method])Set the inversion method
set_smoother_output([smoother_output])Set the smoother output
set_stability_method([stability_method])Set the numerical stability method
simulate(params, nsimulations[, ...])Simulate a new time series following the state space model
simulation_smoother([simulation_output])Retrieve a simulation smoother for the state space model.
smooth(params[, transformed, ...])Kalman smoothing
transform_jacobian(unconstrained[, ...])Jacobian matrix for the parameter transformation function
transform_params(unconstrained)Transform unconstrained parameters used by the optimizer to constrained parameters used in likelihood evaluation
untransform_params(constrained)Transform constrained parameters used in likelihood evaluation to unconstrained parameters used by the optimizer
update(params[, transformed, ...])Update the parameters of the model
- _forecasts_error_partial_derivatives(params, transformed=True, includes_fixed=False, approx_complex_step=None, approx_centered=False, res=None, **kwargs)[source]¶
- _get_extension_time_varying_matrices(params, exog, out_of_sample, extend_kwargs=None, transformed=True, includes_fixed=False, **kwargs)[source]¶
Get updated time-varying state space system matrices
- Parameters:
- paramsarray_like
Array of parameters used to construct the time-varying system matrices.
- exogarray_like or None
New observations of exogenous regressors, if applicable.
- out_of_sampleint
Number of new observations required.
- extend_kwargsdict, optional
Dictionary of keyword arguments to pass to the state space model constructor. For example, for an SARIMAX state space model, this could be used to pass the concentrate_scale=True keyword argument. Any arguments that are not explicitly set in this dictionary will be copied from the current model instance.
- transformedbool, optional
Whether or not start_params is already transformed. Default is True.
- includes_fixedbool, optional
If parameters were previously fixed with the fix_params method, this argument describes whether or not start_params also includes the fixed parameters, in addition to the free parameters. Default is False.
- _hessian_complex_step(params, **kwargs)[source]¶
Hessian matrix computed by second-order complex-step differentiation on the loglike function.
- _hessian_oim(params, **kwargs)[source]¶
Hessian matrix computed using the Harvey (1989) information matrix
- _hessian_opg(params, **kwargs)[source]¶
Hessian matrix computed using the outer product of gradients information matrix
- _hessian_param_defaults = [True, 'approx', None, False]¶
- _hessian_param_names = ['transformed', 'hessian_method', 'approx_complex_step', 'approx_centered']¶
- _loglike_param_defaults = [True, False, False]¶
- _loglike_param_names = ['transformed', 'includes_fixed', 'complex_step']¶
- property _res_classes¶
- _score_obs_harvey(params, approx_complex_step=True, approx_centered=False, includes_fixed=False, **kwargs)[source]¶
Score
- Parameters:
- paramsarray_like, optional
Array of parameters at which to evaluate the loglikelihood function.
- **kwargs
Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.
Notes
This method is from Harvey (1989), section 3.4.5
References
Harvey, Andrew C. 1990. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
- _score_param_defaults = [True, False, 'approx', None, False]¶
- _score_param_names = ['transformed', 'includes_fixed', 'score_method', 'approx_complex_step', 'approx_centered']¶
- _validate_out_of_sample_exog(exog, out_of_sample)[source]¶
Validate given exog as satisfactory for out-of-sample operations
- Parameters:
- exogarray_like or None
New observations of exogenous regressors, if applicable.
- out_of_sampleint
Number of new observations required.
- Returns:
- exogarray or None
A numpy array of shape (out_of_sample, k_exog) if the model contains an exog component, or None if it does not.
- _wrap_results(params, result, return_raw, cov_type=None, cov_kwds=None, results_class=None, wrapper_class=None)[source]¶
- clone(endog, exog=None, **kwargs)[source]¶
Clone state space model with new data and optionally new specification
- Parameters:
- endogarray_like
The observed time-series process y
- k_statesint
The dimension of the unobserved state process.
- exogarray_like, optional
Array of exogenous regressors, shaped nobs x k. Default is no exogenous regressors.
- kwargs
Keyword arguments to pass to the new model class to change the model specification.
- Returns:
- modelMLEModel subclass
Notes
This method must be implemented
- filter(params, transformed=True, includes_fixed=False, complex_step=False, cov_type=None, cov_kwds=None, return_ssm=False, results_class=None, results_wrapper_class=None, low_memory=False, **kwargs)[source]¶
Kalman filtering
- Parameters:
- paramsarray_like
Array of parameters at which to evaluate the loglikelihood function.
- transformedbool, optional
Whether or not params is already transformed. Default is True.
- return_ssmbool, optional
Whether or not to return only the state space output or a full results object. Default is to return a full results object.
- cov_typestr, optional
See MLEResults.fit for a description of covariance matrix types for results object.
- cov_kwdsdict or None, optional
See MLEResults.get_robustcov_results for a description of required keywords for alternative covariance estimators
- low_memorybool, optional
If set to True, techniques are applied to substantially reduce memory usage. If used, some features of the results object will not be available (including in-sample prediction), although out-of-sample forecasting is possible. Default is False.
- **kwargs
Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.
- fit(start_params=None, transformed=True, includes_fixed=False, cov_type=None, cov_kwds=None, method='lbfgs', maxiter=50, full_output=1, disp=5, callback=None, return_params=False, optim_score=None, optim_complex_step=None, optim_hessian=None, flags=None, low_memory=False, **kwargs)[source]¶
Fits the model by maximum likelihood via Kalman filter.
- Parameters:
- start_paramsarray_like, optional
Initial guess of the solution for the loglikelihood maximization. If None, the default is given by Model.start_params.
- transformedbool, optional
Whether or not start_params is already transformed. Default is True.
- includes_fixedbool, optional
If parameters were previously fixed with the fix_params method, this argument describes whether or not start_params also includes the fixed parameters, in addition to the free parameters. Default is False.
- cov_typestr, optional
The cov_type keyword governs the method for calculating the covariance matrix of parameter estimates. Can be one of:
‘opg’ for the outer product of gradient estimator
‘oim’ for the observed information matrix estimator, calculated using the method of Harvey (1989)
‘approx’ for the observed information matrix estimator, calculated using a numerical approximation of the Hessian matrix.
‘robust’ for an approximate (quasi-maximum likelihood) covariance matrix that may be valid even in the presence of some misspecifications. Intermediate calculations use the ‘oim’ method.
‘robust_approx’ is the same as ‘robust’ except that the intermediate calculations use the ‘approx’ method.
‘none’ for no covariance matrix calculation.
Default is ‘opg’ unless memory conservation is used to avoid computing the loglikelihood values for each observation, in which case the default is ‘approx’.
- cov_kwdsdict or None, optional
A dictionary of arguments affecting covariance matrix computation.
opg, oim, approx, robust, robust_approx
‘approx_complex_step’ : bool, optional - If True, numerical approximations are computed using complex-step methods. If False, numerical approximations are computed using finite difference methods. Default is True.
‘approx_centered’ : bool, optional - If True, numerical approximations computed using finite difference methods use a centered approximation. Default is False.
- methodstr, optional
The method determines which solver from scipy.optimize is used, and it can be chosen from among the following strings:
‘newton’ for Newton-Raphson
‘nm’ for Nelder-Mead
‘bfgs’ for Broyden-Fletcher-Goldfarb-Shanno (BFGS)
‘lbfgs’ for limited-memory BFGS with optional box constraints
‘powell’ for modified Powell’s method
‘cg’ for conjugate gradient
‘ncg’ for Newton-conjugate gradient
‘basinhopping’ for global basin-hopping solver
The explicit arguments in fit are passed to the solver, with the exception of the basin-hopping solver. Each solver has several optional arguments that are not the same across solvers. See the notes section below (or scipy.optimize) for the available arguments and for the list of explicit arguments that the basin-hopping solver supports.
- maxiterint, optional
The maximum number of iterations to perform.
- full_outputbool, optional
Set to True to have all available output in the Results object’s mle_retvals attribute. The output is dependent on the solver. See LikelihoodModelResults notes section for more information.
- dispbool, optional
Set to True to print convergence messages.
- callbackcallable callback(xk), optional
Called after each iteration, as callback(xk), where xk is the current parameter vector.
- return_paramsbool, optional
Whether or not to return only the array of maximizing parameters. Default is False.
- optim_score{‘harvey’, ‘approx’} or None, optional
The method by which the score vector is calculated. ‘harvey’ uses the method from Harvey (1989), ‘approx’ uses either finite difference or complex step differentiation depending upon the value of optim_complex_step, and None uses the built-in gradient approximation of the optimizer. Default is None. This keyword is only relevant if the optimization method uses the score.
- optim_complex_stepbool, optional
Whether or not to use complex step differentiation when approximating the score; if False, finite difference approximation is used. Default is True. This keyword is only relevant if optim_score is set to ‘harvey’ or ‘approx’.
- optim_hessian{‘opg’,’oim’,’approx’}, optional
The method by which the Hessian is numerically approximated. ‘opg’ uses outer product of gradients, ‘oim’ uses the information matrix formula from Harvey (1989), and ‘approx’ uses numerical approximation. This keyword is only relevant if the optimization method uses the Hessian matrix.
- low_memorybool, optional
If set to True, techniques are applied to substantially reduce memory usage. If used, some features of the results object will not be available (including smoothed results and in-sample prediction), although out-of-sample forecasting is possible. Default is False.
- **kwargs
Additional keyword arguments to pass to the optimizer.
- Returns:
- results
Results object holding results from fitting a state space model.
See also
statsmodels.base.model.LikelihoodModel.fit
statsmodels.tsa.statespace.mlemodel.MLEResults
statsmodels.tsa.statespace.structural.UnobservedComponentsResults
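A brief usage sketch; SARIMAX is used here only as a concrete MLEModel subclass, and endog stands for any univariate series:
>>> import statsmodels.api as sm
>>> mod = sm.tsa.SARIMAX(endog, order=(1, 0, 1))
>>> res = mod.fit(method='lbfgs', maxiter=100, disp=False)
>>> print(res.summary())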
- fit_constrained(constraints, start_params=None, **fit_kwds)[source]¶
Fit the model with some parameters subject to equality constraints.
- Parameters:
- constraintsdict
Dictionary of constraints, of the form param_name: fixed_value. See the param_names property for valid parameter names.
- start_paramsarray_like, optional
Initial guess of the solution for the loglikelihood maximization. If None, the default is given by Model.start_params.
- **fit_kwdskeyword arguments
fit_kwds are used in the optimization of the remaining parameters.
- Returns:
- resultsResults instance
Examples
>>> mod = sm.tsa.SARIMAX(endog, order=(1, 0, 1))
>>> res = mod.fit_constrained({'ar.L1': 0.5})
- fix_params(params)[source]¶
Fix parameters to specific values (context manager)
- Parameters:
- paramsdict
Dictionary describing the fixed parameter values, of the form param_name: fixed_value. See the param_names property for valid parameter names.
Examples
>>> mod = sm.tsa.SARIMAX(endog, order=(1, 0, 1))
>>> with mod.fix_params({'ar.L1': 0.5}):
...     res = mod.fit()
- classmethod from_formula(formula, data, subset=None)[source]¶
Not implemented for state space models
- handle_params(params, transformed=True, includes_fixed=False, return_jacobian=False)[source]¶
Ensure model parameters satisfy shape and other requirements
- hessian(params, *args, **kwargs)[source]¶
Hessian matrix of the likelihood function, evaluated at the given parameters
- Parameters:
- paramsarray_like
Array of parameters at which to evaluate the hessian.
- *args
Additional positional arguments to the loglike method.
- **kwargs
Additional keyword arguments to the loglike method.
- Returns:
- hessianndarray
Hessian matrix evaluated at params
Notes
This is a numerical approximation.
Both args and kwargs are necessary because the optimizer from fit must call this function and only supports passing arguments via args (for example scipy.optimize.fmin_l_bfgs_b).
- impulse_responses(params, steps=1, impulse=0, orthogonalized=False, cumulative=False, anchor=None, exog=None, extend_model=None, extend_kwargs=None, transformed=True, includes_fixed=False, **kwargs)[source]¶
Impulse response function
- Parameters:
- paramsarray_like
Array of model parameters.
- stepsint, optional
The number of steps for which impulse responses are calculated. Default is 1. Note that for time-invariant models, the initial impulse is not counted as a step, so if steps=1, the output will have 2 entries.
- impulseint, str or array_like
If an integer, the state innovation to pulse; must be between 0 and k_posdef-1. If a str, it indicates which column of df the unit (1) impulse is given. Alternatively, a custom impulse vector may be provided; must be shaped k_posdef x 1.
- orthogonalizedbool, optional
Whether or not to perform impulse using orthogonalized innovations. Note that this will also affect custom impulse vectors. Default is False.
- cumulativebool, optional
Whether or not to return cumulative impulse responses. Default is False.
- anchorint, str, or datetime, optional
Time point within the sample for the state innovation impulse. Type depends on the index of the given endog in the model. Two special cases are the strings ‘start’ and ‘end’, which refer to setting the impulse at the first and last points of the sample, respectively. Integer values can run from 0 to nobs - 1, or can be negative to apply negative indexing. Finally, if a date/time index was provided to the model, then this argument can be a date string to parse or a datetime type. Default is ‘start’.
- exogarray_like, optional
New observations of exogenous regressors for out-of-sample periods, if applicable.
- transformedbool, optional
Whether or not params is already transformed. Default is True.
- includes_fixedbool, optional
If parameters were previously fixed with the fix_params method, this argument describes whether or not params also includes the fixed parameters, in addition to the free parameters. Default is False.
- **kwargs
If the model has time-varying design or transition matrices and the combination of anchor and steps implies creating impulse responses for the out-of-sample period, then these matrices must have updated values provided for the out-of-sample steps. For example, if design is a time-varying component, nobs is 10, anchor=1, and steps is 15, a (k_endog x k_states x 7) matrix must be provided with the new design matrix values.
- Returns:
- impulse_responsesndarray
Responses for each endogenous variable due to the impulse given by the impulse argument. For a time-invariant model, the impulse responses are given for steps + 1 elements (this gives the “initial impulse” followed by steps responses for the important cases of VAR and SARIMAX models), while for time-varying models the impulse responses are only given for steps elements (to avoid having to unexpectedly provide updated time-varying matrices).
See also
simulateSimulate a time series according to the given state space model, optionally with specified series for the innovations.
Notes
Intercepts in the measurement and state equation are ignored when calculating impulse responses.
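A short sketch, again using SARIMAX as the concrete subclass; for this time-invariant model the result has steps + 1 = 11 entries, as described above:
>>> mod = sm.tsa.SARIMAX(endog, order=(1, 0, 0))
>>> res = mod.fit(disp=False)
>>> irf = mod.impulse_responses(res.params, steps=10, orthogonalized=True)
>>> len(irf)
11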
- property initial_variance¶
- property initialization¶
- initialize_statespace(**kwargs)[source]¶
Initialize the state space representation
- Parameters:
- **kwargs
Additional keyword arguments to pass to the state space class constructor.
- loglike(params, *args, **kwargs)[source]¶
Loglikelihood evaluation
- Parameters:
- paramsarray_like
Array of parameters at which to evaluate the loglikelihood function.
- transformedbool, optional
Whether or not params is already transformed. Default is True.
- **kwargs
Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.
See also
updatemodifies the internal state of the state space model to reflect new params
Notes
[1] recommend maximizing the average likelihood to avoid scale issues; this is done automatically by the base Model fit method.
References
[1]Koopman, Siem Jan, Neil Shephard, and Jurgen A. Doornik. 1999. Statistical Algorithms for Models in State Space Using SsfPack 2.2. Econometrics Journal 2 (1): 107-60. doi:10.1111/1368-423X.00023.
- property loglikelihood_burn¶
- loglikeobs(params, transformed=True, includes_fixed=False, complex_step=False, **kwargs)[source]¶
Loglikelihood evaluation
- Parameters:
- paramsarray_like
Array of parameters at which to evaluate the loglikelihood function.
- transformedbool, optional
Whether or not params is already transformed. Default is True.
- **kwargs
Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.
See also
updatemodifies the internal state of the Model to reflect new params
Notes
[1] recommend maximizing the average likelihood to avoid scale issues; this is done automatically by the base Model fit method.
References
[1]Koopman, Siem Jan, Neil Shephard, and Jurgen A. Doornik. 1999. Statistical Algorithms for Models in State Space Using SsfPack 2.2. Econometrics Journal 2 (1): 107-60. doi:10.1111/1368-423X.00023.
- observed_information_matrix(params, transformed=True, includes_fixed=False, approx_complex_step=None, approx_centered=False, **kwargs)[source]¶
Observed information matrix
- Parameters:
- paramsarray_like, optional
Array of parameters at which to evaluate the loglikelihood function.
- **kwargs
Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.
Notes
This method is from Harvey (1989), which shows that the information matrix only depends on terms from the gradient. This implementation is therefore partially analytic and partially numerical: it uses the analytic formula for the information matrix, with numerically computed elements of the gradient.
References
Harvey, Andrew C. 1990. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
- opg_information_matrix(params, transformed=True, includes_fixed=False, approx_complex_step=None, **kwargs)[source]¶
Outer product of gradients information matrix
- Parameters:
- paramsarray_like, optional
Array of parameters at which to evaluate the loglikelihood function.
- **kwargs
Additional arguments to the loglikeobs method.
References
Berndt, Ernst R., Bronwyn Hall, Robert Hall, and Jerry Hausman. 1974. Estimation and Inference in Nonlinear Structural Models. NBER Chapters. National Bureau of Economic Research, Inc.
- property param_names¶
(list of str) List of human readable parameter names (for parameters actually included in the model).
- score(params, *args, **kwargs)[source]¶
Compute the score function at params.
- Parameters:
- paramsarray_like
Array of parameters at which to evaluate the score.
- *args
Additional positional arguments to the loglike method.
- **kwargs
Additional keyword arguments to the loglike method.
- Returns:
- scorendarray
Score, evaluated at params.
Notes
This is a numerical approximation, calculated using first-order complex step differentiation on the loglike method.
Both args and kwargs are necessary because the optimizer from fit must call this function and only supports passing arguments via args (for example scipy.optimize.fmin_l_bfgs_b).
- score_obs(params, method='approx', transformed=True, includes_fixed=False, approx_complex_step=None, approx_centered=False, **kwargs)[source]¶
Compute the score per observation, evaluated at params
- Parameters:
- paramsarray_like
Array of parameters at which to evaluate the score.
- **kwargs
Additional arguments to the loglike method.
- Returns:
- scorendarray
Score per observation, evaluated at params.
Notes
This is a numerical approximation, calculated using first-order complex step differentiation on the loglikeobs method.
- set_conserve_memory(conserve_memory=None, **kwargs)[source]¶
Set the memory conservation method
By default, the Kalman filter computes a number of intermediate matrices at each iteration. The memory conservation options control which of those matrices are stored.
- Parameters:
- conserve_memoryint, optional
Bitmask value to set the memory conservation method to. See notes for details.
- **kwargs
Keyword arguments may be used to influence the memory conservation method by setting individual boolean flags.
Notes
This method is rarely used. See the corresponding function in the KalmanFilter class for details.
- set_filter_method(filter_method=None, **kwargs)[source]¶
Set the filtering method
The filtering method controls aspects of which Kalman filtering approach will be used.
- Parameters:
- filter_methodint, optional
Bitmask value to set the filter method to. See notes for details.
- **kwargs
Keyword arguments may be used to influence the filter method by setting individual boolean flags. See notes for details.
Notes
This method is rarely used. See the corresponding function in the KalmanFilter class for details.
- set_inversion_method(inversion_method=None, **kwargs)[source]¶
Set the inversion method
The Kalman filter may contain one matrix inversion: that of the forecast error covariance matrix. The inversion method controls how and if that inverse is performed.
- Parameters:
- inversion_methodint, optional
Bitmask value to set the inversion method to. See notes for details.
- **kwargs
Keyword arguments may be used to influence the inversion method by setting individual boolean flags. See notes for details.
Notes
This method is rarely used. See the corresponding function in the KalmanFilter class for details.
- set_smoother_output(smoother_output=None, **kwargs)[source]¶
Set the smoother output
The smoother can produce several types of results. The smoother output variable controls which are calculated and returned.
- Parameters:
- smoother_outputint, optional
Bitmask value to set the smoother output to. See notes for details.
- **kwargs
Keyword arguments may be used to influence the smoother output by setting individual boolean flags.
Notes
This method is rarely used. See the corresponding function in the KalmanSmoother class for details.
- set_stability_method(stability_method=None, **kwargs)[source]¶
Set the numerical stability method
The Kalman filter is a recursive algorithm that may in some cases suffer issues with numerical stability. The stability method controls what, if any, measures are taken to promote stability.
- Parameters:
- stability_methodint, optional
Bitmask value to set the stability method to. See notes for details.
- **kwargs
Keyword arguments may be used to influence the stability method by setting individual boolean flags. See notes for details.
Notes
This method is rarely used. See the corresponding function in the KalmanFilter class for details.
- simulate(params, nsimulations, measurement_shocks=None, state_shocks=None, initial_state=None, anchor=None, repetitions=None, exog=None, extend_model=None, extend_kwargs=None, transformed=True, includes_fixed=False, pretransformed_measurement_shocks=True, pretransformed_state_shocks=True, pretransformed_initial_state=True, random_state=None, **kwargs)[source]¶
Simulate a new time series following the state space model
- Parameters:
- paramsarray_like
Array of parameters to use in constructing the state space representation to use when simulating.
- nsimulationsint
The number of observations to simulate. If the model is time-invariant this can be any number. If the model is time-varying, then this number must be less than or equal to the number of observations.
- measurement_shocksarray_like, optional
If specified, these are the shocks to the measurement equation, ε_t. If unspecified, these are automatically generated using a pseudo-random number generator. If specified, must be shaped nsimulations x k_endog, where k_endog is the same as in the state space model.
- state_shocksarray_like, optional
If specified, these are the shocks to the state equation, η_t. If unspecified, these are automatically generated using a pseudo-random number generator. If specified, must be shaped nsimulations x k_posdef where k_posdef is the same as in the state space model.
- initial_statearray_like, optional
If specified, this is the initial state vector to use in simulation, which should be shaped (k_states x 1), where k_states is the same as in the state space model. If unspecified, but the model has been initialized, then that initialization is used. This must be specified if anchor is anything other than “start” or 0 (or else you can use the simulate method on a results object rather than on the model object).
- anchorint, str, or datetime, optional
First period for simulation. The simulation will be conditional on all existing datapoints prior to the anchor. Type depends on the index of the given endog in the model. Two special cases are the strings ‘start’ and ‘end’. start refers to beginning the simulation at the first period of the sample, and end refers to beginning the simulation at the first period after the sample. Integer values can run from 0 to nobs, or can be negative to apply negative indexing. Finally, if a date/time index was provided to the model, then this argument can be a date string to parse or a datetime type. Default is ‘start’.
- repetitionsint, optional
Number of simulated paths to generate. Default is 1 simulated path.
- exogarray_like, optional
New observations of exogenous regressors, if applicable.
- transformedbool, optional
Whether or not params is already transformed. Default is True.
- includes_fixedbool, optional
If parameters were previously fixed with the fix_params method, this argument describes whether or not params also includes the fixed parameters, in addition to the free parameters. Default is False.
- pretransformed_measurement_shocksbool, optional
If measurement_shocks is provided, this flag indicates whether it should be directly used as the shocks. If False, then it is assumed to contain draws from the standard Normal distribution that must be transformed using the obs_cov covariance matrix. Default is True.
- pretransformed_state_shocksbool, optional
If state_shocks is provided, this flag indicates whether it should be directly used as the shocks. If False, then it is assumed to contain draws from the standard Normal distribution that must be transformed using the state_cov covariance matrix. Default is True.
- pretransformed_initial_statebool, optional
If initial_state is provided, this flag indicates whether it should be directly used as the initial_state. If False, then it is assumed to contain draws from the standard Normal distribution that must be transformed using the initial_state_cov covariance matrix. Default is True.
- random_state{None, int, Generator, RandomState}, optional
If seed is None (or np.random), the numpy.random.RandomState singleton is used. If seed is an int, a new numpy.random.RandomState instance is used, seeded with seed. If seed is already a numpy.random.Generator or numpy.random.RandomState instance then that instance is used.
- Returns:
- simulated_obsndarray
An array of simulated observations. If repetitions=None, then it will be shaped (nsimulations x k_endog) or (nsimulations,) if k_endog=1. Otherwise it will be shaped (nsimulations x k_endog x repetitions). If the model was given Pandas input then the output will be a Pandas object. If k_endog > 1 and repetitions is not None, then the output will be a Pandas DataFrame that has a MultiIndex for the columns, with the first level containing the names of the endog variables and the second level containing the repetition number.
See also
impulse_responsesImpulse response functions
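A usage sketch conditional on fitted parameters; SARIMAX again stands in for any subclass, and the output shape follows the description above:
>>> mod = sm.tsa.SARIMAX(endog, order=(1, 0, 0))
>>> res = mod.fit(disp=False)
>>> sims = mod.simulate(res.params, nsimulations=50, repetitions=3)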
- simulation_smoother(simulation_output=None, **kwargs)[source]¶
Retrieve a simulation smoother for the state space model.
- Parameters:
- simulation_outputint, optional
Determines which simulation smoother output is calculated. Default is all (including state and disturbances).
- **kwargs
Additional keyword arguments, used to set the simulation output. See set_simulation_output for more details.
- Returns:
- SimulationSmoothResults
- smooth(params, transformed=True, includes_fixed=False, complex_step=False, cov_type=None, cov_kwds=None, return_ssm=False, results_class=None, results_wrapper_class=None, **kwargs)[source]¶
Kalman smoothing
- Parameters:
- paramsarray_like
Array of parameters at which to evaluate the loglikelihood function.
- transformedbool, optional
Whether or not params is already transformed. Default is True.
- return_ssmbool, optional
Whether or not to return only the state space output or a full results object. Default is to return a full results object.
- cov_typestr, optional
See MLEResults.fit for a description of covariance matrix types for results object.
- cov_kwdsdict or None, optional
See MLEResults.get_robustcov_results for a description of the required keywords for alternative covariance estimators.
- **kwargs
Additional keyword arguments to pass to the Kalman filter. See KalmanFilter.filter for more details.
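A minimal sketch of running the smoother at a given parameter vector without re-estimating (continuing the model built in the simulate example above; the result attributes shown follow the statsmodels results API):
>>> res_mle = mod.fit(disp=0)
>>> res_sm = mod.smooth(res_mle.params)     # full results object with Kalman smoothing applied
>>> states = res_sm.smoothed_state          # smoothed state estimates, shaped (k_states, nobs)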
- property start_params¶
(array) Starting parameters for maximum likelihood estimation.
- property state_names¶
(list of str) List of human readable names for unobserved states.
- property tolerance¶
- transform_jacobian(unconstrained, approx_centered=False)[source]¶
Jacobian matrix for the parameter transformation function
- Parameters:
- unconstrainedarray_like
Array of unconstrained parameters used by the optimizer.
- Returns:
- jacobianndarray
Jacobian matrix of the transformation, evaluated at unconstrained
Notes
This is a numerical approximation using finite differences. Note that in general complex step methods cannot be used because it is not guaranteed that the transform_params method is a real function (e.g. if Cholesky decomposition is used).
- transform_params(unconstrained)[source]¶
Transform unconstrained parameters used by the optimizer to constrained parameters used in likelihood evaluation
- Parameters:
- unconstrainedarray_like
Array of unconstrained parameters used by the optimizer, to be transformed.
- Returns:
- constrainedarray_like
Array of constrained parameters which may be used in likelihood evaluation.
Notes
This is a no-op in the base class; subclasses should override where appropriate.
- untransform_params(constrained)[source]¶
Transform constrained parameters used in likelihood evaluation to unconstrained parameters used by the optimizer
- Parameters:
- constrainedarray_like
Array of constrained parameters used in likelihood evaluation, to be transformed.
- Returns:
- unconstrainedarray_like
Array of unconstrained parameters used by the optimizer.
Notes
This is a no-op in the base class; subclasses should override where appropriate.
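For custom models, transform_params and untransform_params are typically overridden as a matched pair. A minimal, illustrative sketch (the class and parameter layout are hypothetical):
>>> import numpy as np
>>> from statsmodels.tsa.statespace.mlemodel import MLEModel
>>> class MyModel(MLEModel):
...     def transform_params(self, unconstrained):
...         # square the last element so that a variance parameter stays positive
...         constrained = np.array(unconstrained, copy=True)
...         constrained[-1] = constrained[-1] ** 2
...         return constrained
...     def untransform_params(self, constrained):
...         # inverse map back to the optimizer's unconstrained space
...         unconstrained = np.array(constrained, copy=True)
...         unconstrained[-1] = unconstrained[-1] ** 0.5
...         return unconstrained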
- update(params, transformed=True, includes_fixed=False, complex_step=False)[source]¶
Update the parameters of the model
- Parameters:
- paramsarray_like
Array of new parameters.
- transformedbool, optional
Whether or not params is already transformed. If set to False, transform_params is called. Default is True.
- Returns:
- paramsarray_like
Array of parameters.
Notes
Since Model is a base class, this method should be overridden by subclasses to perform actual updating steps.
- class Pipeline(steps, *, memory=None, verbose=False)[source]¶
Bases:
_BaseComposition
A sequence of data transformers with an optional final predictor.
Pipeline allows you to sequentially apply a list of transformers to preprocess the data and, if desired, conclude the sequence with a final predictor for predictive modeling.
Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit. The transformers in the pipeline can be cached using
the memory argument.
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below. A step’s estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer removed by setting it to ‘passthrough’ or None.
For an example use case of Pipeline combined with
GridSearchCV, refer to Selecting dimensionality reduction with Pipeline and GridSearchCV. The example Pipelining: chaining a PCA and a logistic regression shows how to grid search on a pipeline using ‘__’ as a separator in the parameter names.
Read more in the User Guide.
New in version 0.5.
- Parameters:
- stepslist of tuples
List of (name of step, estimator) tuples that are to be chained in sequential order. To be compatible with the scikit-learn API, all steps must define fit. All non-last steps must also define transform. See Combining Estimators for more details.
- memorystr or object with the joblib.Memory interface, default=None
Used to cache the fitted transformers of the pipeline. The last step will never be cached, even if it is a transformer. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute
named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.
- verbosebool, default=False
If True, the time elapsed while fitting each step will be printed as it is completed.
See also
make_pipelineConvenience function for simplified pipeline construction.
Examples
>>> from sklearn.svm import SVC
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...                                                     random_state=0)
>>> pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
>>> # The pipeline can be used as any other estimator
>>> # and avoids leaking the test set into the train set
>>> pipe.fit(X_train, y_train).score(X_test, y_test)
0.88
>>> # An estimator's parameter can be set using '__' syntax
>>> pipe.set_params(svc__C=10).fit(X_train, y_train).score(X_test, y_test)
0.76
- Attributes:
named_stepsBunchAccess the steps by name.
classes_ndarray of shape (n_classes,)The classes labels.
n_features_in_intNumber of features seen during first step fit method.
feature_names_in_ndarray of shape (n_features_in_,)Names of features seen during first step fit method.
Methods
decision_function(X, **params)Transform the data, and apply decision_function with the final estimator.
fit(X[, y])Fit the model.
fit_predict(X[, y])Transform the data, and apply fit_predict with the final estimator.
fit_transform(X[, y])Fit the model and transform with the final estimator.
get_feature_names_out([input_features])Get output feature names for transformation.
get_metadata_routing()Get metadata routing of this object.
get_params([deep])Get parameters for this estimator.
inverse_transform(Xt, **params)Apply inverse_transform for each step in a reverse order.
predict(X, **params)Transform the data, and apply predict with the final estimator.
predict_log_proba(X, **params)Transform the data, and apply predict_log_proba with the final estimator.
predict_proba(X, **params)Transform the data, and apply predict_proba with the final estimator.
score(X[, y, sample_weight])Transform the data, and apply score with the final estimator.
score_samples(X)Transform the data, and apply score_samples with the final estimator.
set_output(*[, transform])Set the output container when "transform" and "fit_transform" are called.
set_params(**kwargs)Set the parameters of this estimator.
set_score_request(*[, sample_weight])Request metadata passed to the score method.
transform(X, **params)Transform the data, and apply transform with the final estimator.
- __abstractmethods__ = frozenset({})¶
- __annotations__ = {'_parameter_constraints': <class 'dict'>, 'steps': 'List[Any]'}¶
- __getitem__(ind)[source]¶
Returns a sub-pipeline or a single estimator in the pipeline
Indexing with an integer will return an estimator; using a slice returns another Pipeline instance which copies a slice of this Pipeline. This copy is shallow: modifying (or fitting) estimators in the sub-pipeline will affect the larger pipeline and vice-versa. However, replacing a value in step will not affect a copy.
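For instance, reusing the pipe object from the class example above:
>>> first = pipe[0]     # a single estimator: the 'scaler' step
>>> head = pipe[:-1]    # a shallow-copied sub-pipeline without the final 'svc' step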
- __module__ = 'sklearn.pipeline'¶
- _abc_impl = <_abc._abc_data object>¶
- property _estimator_type¶
- property _final_estimator¶
- _iter(with_final=True, filter_passthrough=True)[source]¶
Generate (idx, (name, trans)) tuples from self.steps
When filter_passthrough is True, ‘passthrough’ and None transformers are filtered out.
- _parameter_constraints: dict = {'memory': [None, <class 'str'>, <sklearn.utils._param_validation.HasMethods object>], 'steps': [<class 'list'>, <sklearn.utils._param_validation.Hidden object>], 'verbose': ['boolean']}¶
- _required_parameters = ['steps']¶
- property classes_¶
The class labels. Only exists if the last step is a classifier.
- decision_function(X, **params)[source]¶
Transform the data, and apply decision_function with the final estimator.
Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls decision_function method. Only valid if the final estimator implements decision_function.
- Parameters:
- Xiterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- **paramsdict of string -> object
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.
New in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.
- Returns:
- y_scorendarray of shape (n_samples, n_classes)
Result of calling decision_function on the final estimator.
- property feature_names_in_¶
Names of features seen during first step fit method.
- fit(X, y=None, **params)[source]¶
Fit the model.
Fit all the transformers one after the other and sequentially transform the data. Finally, fit the transformed data using the final estimator.
- Parameters:
- Xiterable
Training data. Must fulfill input requirements of first step of the pipeline.
- yiterable, default=None
Training targets. Must fulfill label requirements for all steps of the pipeline.
- **paramsdict of str -> object
If enable_metadata_routing=False (default):
Parameters passed to the
fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
If enable_metadata_routing=True:
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.
Changed in version 1.4: Parameters are now passed to the
transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True is set via set_config().
See Metadata Routing User Guide for more details.
- Returns:
- selfobject
Pipeline with fitted steps.
- fit_predict(X, y=None, **params)[source]¶
Transform the data, and apply fit_predict with the final estimator.
Call fit_transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls fit_predict method. Only valid if the final estimator implements fit_predict.
- Parameters:
- Xiterable
Training data. Must fulfill input requirements of first step of the pipeline.
- yiterable, default=None
Training targets. Must fulfill label requirements for all steps of the pipeline.
- **paramsdict of str -> object
If enable_metadata_routing=False (default):
Parameters to the
predict called at the end of all transformations in the pipeline.
If enable_metadata_routing=True:
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.
New in version 0.20.
Changed in version 1.4: Parameters are now passed to the
transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.
See Metadata Routing User Guide for more details.
Note that while this may be used to return uncertainties from some models with
return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.
- Returns:
- y_predndarray
Result of calling fit_predict on the final estimator.
- fit_transform(X, y=None, **params)[source]¶
Fit the model and transform with the final estimator.
Fit all the transformers one after the other and sequentially transform the data. Only valid if the final estimator either implements fit_transform or fit and transform.
- Parameters:
- Xiterable
Training data. Must fulfill input requirements of first step of the pipeline.
- yiterable, default=None
Training targets. Must fulfill label requirements for all steps of the pipeline.
- **paramsdict of str -> object
If enable_metadata_routing=False (default):
Parameters passed to the
fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
If enable_metadata_routing=True:
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.
Changed in version 1.4: Parameters are now passed to the
transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.
See Metadata Routing User Guide for more details.
- Returns:
- Xtndarray of shape (n_samples, n_transformed_features)
Transformed samples.
- get_feature_names_out(input_features=None)[source]¶
Get output feature names for transformation.
Transform input features using the pipeline.
- Parameters:
- input_featuresarray-like of str or None, default=None
Input features.
- Returns:
- feature_names_outndarray of str objects
Transformed feature names.
- get_metadata_routing()[source]¶
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRouter
A
MetadataRouter encapsulating routing information.
- get_params(deep=True)[source]¶
Get parameters for this estimator.
Returns the parameters given in the constructor as well as the estimators contained within the steps of the Pipeline.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsmapping of string to any
Parameter names mapped to their values.
- inverse_transform(Xt, **params)[source]¶
Apply inverse_transform for each step in a reverse order.
All estimators in the pipeline must support inverse_transform.
- Parameters:
- Xtarray-like of shape (n_samples, n_transformed_features)
Data samples, where
n_samples is the number of samples and n_features is the number of features. Must fulfill input requirements of the last step of the pipeline’s inverse_transform method.
- **paramsdict of str -> object
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.
New in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.
- Returns:
- Xtndarray of shape (n_samples, n_features)
Inverse transformed data, that is, data in the original feature space.
- property n_features_in_¶
Number of features seen during first step fit method.
- property named_steps¶
Access the steps by name.
Read-only attribute to access any step by given name. Keys are steps names and values are the steps objects.
- predict(X, **params)[source]¶
Transform the data, and apply predict with the final estimator.
Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict method. Only valid if the final estimator implements predict.
- Parameters:
- Xiterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- **paramsdict of str -> object
If enable_metadata_routing=False (default):
Parameters to the
predict called at the end of all transformations in the pipeline.
If enable_metadata_routing=True:
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.
New in version 0.20.
Changed in version 1.4: Parameters are now passed to the
transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True is set via set_config().
See Metadata Routing User Guide for more details.
Note that while this may be used to return uncertainties from some models with
return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.
- Returns:
- y_predndarray
Result of calling predict on the final estimator.
- predict_log_proba(X, **params)[source]¶
Transform the data, and apply predict_log_proba with the final estimator.
Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict_log_proba method. Only valid if the final estimator implements predict_log_proba.
- Parameters:
- Xiterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- **paramsdict of str -> object
If enable_metadata_routing=False (default):
Parameters to the predict_log_proba called at the end of all transformations in the pipeline.
If enable_metadata_routing=True:
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.
New in version 0.20.
Changed in version 1.4: Parameters are now passed to the
transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.
See Metadata Routing User Guide for more details.
- Returns:
- y_log_probandarray of shape (n_samples, n_classes)
Result of calling predict_log_proba on the final estimator.
- predict_proba(X, **params)[source]¶
Transform the data, and apply predict_proba with the final estimator.
Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls predict_proba method. Only valid if the final estimator implements predict_proba.
- Parameters:
- Xiterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- **paramsdict of str -> object
If enable_metadata_routing=False (default):
Parameters to the predict_proba called at the end of all transformations in the pipeline.
If enable_metadata_routing=True:
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.
New in version 0.20.
Changed in version 1.4: Parameters are now passed to the
transform method of the intermediate steps as well, if requested, and if enable_metadata_routing=True.
See Metadata Routing User Guide for more details.
- Returns:
- y_probandarray of shape (n_samples, n_classes)
Result of calling predict_proba on the final estimator.
- score(X, y=None, sample_weight=None, **params)[source]¶
Transform the data, and apply score with the final estimator.
Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls score method. Only valid if the final estimator implements score.
- Parameters:
- Xiterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- yiterable, default=None
Targets used for scoring. Must fulfill label requirements for all steps of the pipeline.
- sample_weightarray-like, default=None
If not None, this argument is passed as
sample_weight keyword argument to the score method of the final estimator.
- **paramsdict of str -> object
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.
New in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.
- Returns:
- scorefloat
Result of calling score on the final estimator.
- score_samples(X)[source]¶
Transform the data, and apply score_samples with the final estimator.
Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls score_samples method. Only valid if the final estimator implements score_samples.
- Parameters:
- Xiterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- Returns:
- y_scorendarray of shape (n_samples,)
Result of calling score_samples on the final estimator.
- set_output(*, transform=None)[source]¶
Set the output container when “transform” and “fit_transform” are called.
Calling set_output will set the output of all estimators in steps.
- Parameters:
- transform{“default”, “pandas”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in version 1.4: “polars” option was added.
- Returns:
- selfestimator instance
Estimator instance.
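For example, to have every transformer in a pipeline return pandas DataFrames (requires a scikit-learn version that supports set_output):
>>> import pandas as pd
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> num_pipe = make_pipeline(StandardScaler()).set_output(transform="pandas")
>>> Xt = num_pipe.fit_transform(pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]}))
>>> list(Xt.columns)
['a', 'b']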
- set_params(**kwargs)[source]¶
Set the parameters of this estimator.
Valid parameter keys can be listed with
get_params(). Note that you can directly set the parameters of the estimators contained in steps.
- Parameters:
- **kwargsdict
Parameters of this estimator or parameters of estimators contained in steps. Parameters of the steps may be set using its name and the parameter name separated by a ‘__’.
- Returns:
- selfobject
Pipeline class instance.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') Pipeline¶
Request metadata passed to the
score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- selfobject
The updated object.
- transform(X, **params)[source]¶
Transform the data, and apply transform with the final estimator.
Call transform of each transformer in the pipeline. The transformed data are finally passed to the final estimator that calls transform method. Only valid if the final estimator implements transform.
This also works where final estimator is None in which case all prior transformations are applied.
- Parameters:
- Xiterable
Data to transform. Must fulfill input requirements of first step of the pipeline.
- **paramsdict of str -> object
Parameters requested and accepted by steps. Each step must have requested certain metadata for these parameters to be forwarded to them.
New in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.
- Returns:
- Xtndarray of shape (n_samples, n_transformed_features)
Transformed data.
- class RLM(endog, exog, M=None, missing='none', **kwargs)[source]¶
Bases:
LikelihoodModel
Robust Linear Model
Estimate a robust linear model via iteratively reweighted least squares given a robust criterion estimator.
- Parameters:
- endogarray_like
A 1-d endogenous response variable. The dependent variable.
- exogarray_like
A nobs x k array where nobs is the number of observations and k is the number of regressors. An intercept is not included by default and should be added by the user. See
statsmodels.tools.add_constant().
- Mstatsmodels.robust.norms.RobustNorm, optional
The robust criterion function for downweighting outliers. The current options are LeastSquares, HuberT, RamsayE, AndrewWave, TrimmedMean, Hampel, and TukeyBiweight. The default is HuberT(). See statsmodels.robust.norms for more information.
- missingstr
Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘none’.
Examples
>>> import statsmodels.api as sm
>>> data = sm.datasets.stackloss.load()
>>> data.exog = sm.add_constant(data.exog)
>>> rlm_model = sm.RLM(data.endog, data.exog, M=sm.robust.norms.HuberT())
>>> rlm_results = rlm_model.fit()
>>> rlm_results.params
array([ 0.82938433, 0.92606597, -0.12784672, -41.02649835])
>>> rlm_results.bse
array([ 0.11100521, 0.30293016, 0.12864961, 9.79189854])
>>> rlm_results_HC2 = rlm_model.fit(cov="H2")
>>> rlm_results_HC2.params
array([ 0.82938433, 0.92606597, -0.12784672, -41.02649835])
>>> rlm_results_HC2.bse
array([ 0.11945975, 0.32235497, 0.11796313, 9.08950419])
>>> mod = sm.RLM(data.endog, data.exog, M=sm.robust.norms.Hampel())
>>> rlm_hamp_hub = mod.fit(scale_est=sm.robust.scale.HuberScale())
>>> rlm_hamp_hub.params
array([ 0.73175452, 1.25082038, -0.14794399, -40.27122257])
- Attributes:
- df_modelfloat
The degrees of freedom of the model. The number of regressors p less one for the intercept. Note that the reported model degrees of freedom does not count the intercept as a regressor, though the model is assumed to have an intercept.
- df_residfloat
The residual degrees of freedom. The number of observations n less the number of regressors p. Note that here p does include the intercept as using a degree of freedom.
- endogndarray
See above. Note that endog is a reference to the data so that if data is already an array and it is changed, then endog changes as well.
- exogndarray
See above. Note that exog is a reference to the data so that if data is already an array and it is changed, then exog changes as well.
- Mstatsmodels.robust.norms.RobustNorm
See above. Robust estimator instance instantiated.
- nobsfloat
The number of observations n
- pinv_wexogndarray
The pseudoinverse of the design / exogenous data array. Note that RLM has no whiten method, so this is just the pseudo inverse of the design.
- normalized_cov_paramsndarray
The p x p normalized covariance of the design / exogenous data. This is approximately equal to (X.T X)^(-1)
Methods
deviance(tmp_results)Returns the (unnormalized) log-likelihood from the M estimator.
fit([maxiter, tol, scale_est, init, cov, ...])Fits the model using iteratively reweighted least squares.
information(params)Fisher information matrix of model.
loglike(params)Log-likelihood of model.
predict(params[, exog])Return linear predicted values from a design matrix.
score(params)Score vector of model.
- __annotations__ = {}¶
- __module__ = 'statsmodels.robust.robust_linear_model'¶
- _initialize()[source]¶
Initializes the model for the IRLS fit.
Resets the history and number of iterations.
- fit(maxiter=50, tol=1e-08, scale_est='mad', init=None, cov='H1', update_scale=True, conv='dev', start_params=None)[source]¶
Fits the model using iteratively reweighted least squares.
The IRLS routine runs until the specified objective converges to tol or maxiter has been reached.
- Parameters:
- convstr
Indicates the convergence criteria. Available options are “coefs” (the coefficients), “weights” (the weights in the iteration), “sresid” (the standardized residuals), and “dev” (the un-normalized log-likelihood for the M estimator). The default is “dev”.
- covstr, optional
‘H1’, ‘H2’, or ‘H3’ Indicates how the covariance matrix is estimated. Default is ‘H1’. See rlm.RLMResults for more information.
- initstr
Specifies method for the initial estimates of the parameters. Default is None, which means that the least squares estimate is used. Currently it is the only available choice.
- maxiterint
The maximum number of iterations to try. Default is 50.
- scale_eststr or HuberScale()
Indicates the estimate to use for scaling the weights in the IRLS, either ‘mad’ or HuberScale(). The default is ‘mad’ (median absolute deviation). The other option is ‘HuberScale’ for Huber’s proposal 2, which has optional keyword arguments d, tol, and maxiter for specifying the tuning constant, the convergence tolerance, and the maximum number of iterations. See statsmodels.robust.scale for more information.
- tolfloat
The convergence tolerance of the estimate. Default is 1e-8.
- update_scaleBool
If update_scale is False then the scale estimate for the weights is held constant over the iteration. Otherwise, it is updated for each fit in the iteration. Default is True.
- start_paramsarray_like, optional
Initial guess of the solution of the optimizer. If not provided, the initial parameters are computed using OLS.
- Returns:
- resultsstatsmodels.rlm.RLMresults
Results instance
- information(params)[source]¶
Fisher information matrix of model.
Returns -1 * Hessian of the log-likelihood evaluated at params.
- Parameters:
- paramsndarray
The model parameters.
- loglike(params)[source]¶
Log-likelihood of model.
- Parameters:
- paramsndarray
The model parameters used to compute the log-likelihood.
Notes
Must be overridden by subclasses.
- class StandardScaler(*, copy=True, with_mean=True, with_std=True)[source]¶
Bases:
OneToOneFeatureMixin, TransformerMixin, BaseEstimator
Standardize features by removing the mean and scaling to unit variance.
The standard score of a sample x is calculated as:
z = (x - u) / s
where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using
transform().
Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).
For instance many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.
StandardScaler is sensitive to outliers, and the features may scale differently from each other in the presence of outliers. For an example visualization, refer to Compare StandardScaler with other scalers.
This scaler can also be applied to sparse CSR or CSC matrices by passing with_mean=False to avoid breaking the sparsity structure of the data.
Read more in the User Guide.
- Parameters:
- copybool, default=True
If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
- with_meanbool, default=True
If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.
- with_stdbool, default=True
If True, scale the data to unit variance (or equivalently, unit standard deviation).
See also
scaleEquivalent function without the estimator API.
PCAFurther removes the linear correlation across features with ‘whiten=True’.
Notes
NaNs are treated as missing values: disregarded in fit, and maintained in transform.
We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to affect model performance.
Examples
>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler()
>>> print(scaler.mean_)
[0.5 0.5]
>>> print(scaler.transform(data))
[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]
>>> print(scaler.transform([[2, 2]]))
[[3. 3.]]
- Attributes:
- scale_ndarray of shape (n_features,) or None
Per feature relative scaling of the data to achieve zero mean and unit variance. Generally this is calculated using np.sqrt(var_). If a variance is zero, we can’t achieve unit variance, and the data is left as-is, giving a scaling factor of 1. scale_ is equal to None when with_std=False.
New in version 0.17: scale_
- mean_ndarray of shape (n_features,) or None
The mean value for each feature in the training set. Equal to
None when with_mean=False and with_std=False.
- var_ndarray of shape (n_features,) or None
The variance for each feature in the training set. Used to compute scale_. Equal to None when with_mean=False and with_std=False.
- n_features_in_int
Number of features seen during fit.
New in version 0.24.
- feature_names_in_ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
New in version 1.0.
- n_samples_seen_int or ndarray of shape (n_features,)
The number of samples processed by the estimator for each feature. If there are no missing samples, the
n_samples_seen will be an integer, otherwise it will be an array of dtype int. If sample_weights are used it will be a float (if no missing data) or an array of dtype float that sums the weights seen so far. Will be reset on new calls to fit, but increments across partial_fit calls.
Methods
fit(X[, y, sample_weight])Compute the mean and std to be used for later scaling.
inverse_transform(X[, copy])Scale back the data to the original representation.
partial_fit(X[, y, sample_weight])Online computation of mean and std on X for later scaling.
set_fit_request(*[, sample_weight])Request metadata passed to the fit method.
set_inverse_transform_request(*[, copy])Request metadata passed to the inverse_transform method.
set_partial_fit_request(*[, sample_weight])Request metadata passed to the partial_fit method.
set_transform_request(*[, copy])Request metadata passed to the transform method.
transform(X[, copy])Perform standardization by centering and scaling.
- __annotations__ = {'_parameter_constraints': <class 'dict'>}¶
- __doc__ = "Standardize features by removing the mean and scaling to unit variance.\n\n The standard score of a sample `x` is calculated as:\n\n z = (x - u) / s\n\n where `u` is the mean of the training samples or zero if `with_mean=False`,\n and `s` is the standard deviation of the training samples or one if\n `with_std=False`.\n\n Centering and scaling happen independently on each feature by computing\n the relevant statistics on the samples in the training set. Mean and\n standard deviation are then stored to be used on later data using\n :meth:`transform`.\n\n Standardization of a dataset is a common requirement for many\n machine learning estimators: they might behave badly if the\n individual features do not more or less look like standard normally\n distributed data (e.g. Gaussian with 0 mean and unit variance).\n\n For instance many elements used in the objective function of\n a learning algorithm (such as the RBF kernel of Support Vector\n Machines or the L1 and L2 regularizers of linear models) assume that\n all features are centered around 0 and have variance in the same\n order. If a feature has a variance that is orders of magnitude larger\n than others, it might dominate the objective function and make the\n estimator unable to learn from other features correctly as expected.\n\n `StandardScaler` is sensitive to outliers, and the features may scale\n differently from each other in the presence of outliers. For an example\n visualization, refer to :ref:`Compare StandardScaler with other scalers\n <plot_all_scaling_standard_scaler_section>`.\n\n This scaler can also be applied to sparse CSR or CSC matrices by passing\n `with_mean=False` to avoid breaking the sparsity structure of the data.\n\n Read more in the :ref:`User Guide <preprocessing_scaler>`.\n\n Parameters\n ----------\n copy : bool, default=True\n If False, try to avoid a copy and do inplace scaling instead.\n This is not guaranteed to always work inplace; e.g. if the data is\n not a NumPy array or scipy.sparse CSR matrix, a copy may still be\n returned.\n\n with_mean : bool, default=True\n If True, center the data before scaling.\n This does not work (and will raise an exception) when attempted on\n sparse matrices, because centering them entails building a dense\n matrix which in common use cases is likely to be too large to fit in\n memory.\n\n with_std : bool, default=True\n If True, scale the data to unit variance (or equivalently,\n unit standard deviation).\n\n Attributes\n ----------\n scale_ : ndarray of shape (n_features,) or None\n Per feature relative scaling of the data to achieve zero mean and unit\n variance. Generally this is calculated using `np.sqrt(var_)`. If a\n variance is zero, we can't achieve unit variance, and the data is left\n as-is, giving a scaling factor of 1. `scale_` is equal to `None`\n when `with_std=False`.\n\n .. versionadded:: 0.17\n *scale_*\n\n mean_ : ndarray of shape (n_features,) or None\n The mean value for each feature in the training set.\n Equal to ``None`` when ``with_mean=False`` and ``with_std=False``.\n\n var_ : ndarray of shape (n_features,) or None\n The variance for each feature in the training set. Used to compute\n `scale_`. Equal to ``None`` when ``with_mean=False`` and\n ``with_std=False``.\n\n n_features_in_ : int\n Number of features seen during :term:`fit`.\n\n .. versionadded:: 0.24\n\n feature_names_in_ : ndarray of shape (`n_features_in_`,)\n Names of features seen during :term:`fit`. Defined only when `X`\n has feature names that are all strings.\n\n .. 
versionadded:: 1.0\n\n n_samples_seen_ : int or ndarray of shape (n_features,)\n The number of samples processed by the estimator for each feature.\n If there are no missing samples, the ``n_samples_seen`` will be an\n integer, otherwise it will be an array of dtype int. If\n `sample_weights` are used it will be a float (if no missing data)\n or an array of dtype float that sums the weights seen so far.\n Will be reset on new calls to fit, but increments across\n ``partial_fit`` calls.\n\n See Also\n --------\n scale : Equivalent function without the estimator API.\n\n :class:`~sklearn.decomposition.PCA` : Further removes the linear\n correlation across features with 'whiten=True'.\n\n Notes\n -----\n NaNs are treated as missing values: disregarded in fit, and maintained in\n transform.\n\n We use a biased estimator for the standard deviation, equivalent to\n `numpy.std(x, ddof=0)`. Note that the choice of `ddof` is unlikely to\n affect model performance.\n\n Examples\n --------\n >>> from sklearn.preprocessing import StandardScaler\n >>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]\n >>> scaler = StandardScaler()\n >>> print(scaler.fit(data))\n StandardScaler()\n >>> print(scaler.mean_)\n [0.5 0.5]\n >>> print(scaler.transform(data))\n [[-1. -1.]\n [-1. -1.]\n [ 1. 1.]\n [ 1. 1.]]\n >>> print(scaler.transform([[2, 2]]))\n [[3. 3.]]\n "¶
- __module__ = 'sklearn.preprocessing._data'¶
- _parameter_constraints: dict = {'copy': ['boolean'], 'with_mean': ['boolean'], 'with_std': ['boolean']}¶
- _reset()[source]¶
Reset internal data-dependent state of the scaler, if necessary.
__init__ parameters are not touched.
- _sklearn_auto_wrap_output_keys = {'transform'}¶
- fit(X, y=None, sample_weight=None)[source]¶
Compute the mean and std to be used for later scaling.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The data used to compute the mean and standard deviation used for later scaling along the features axis.
- yNone
Ignored.
- sample_weightarray-like of shape (n_samples,), default=None
Individual weights for each sample.
New in version 0.24: parameter sample_weight support to StandardScaler.
- Returns:
- selfobject
Fitted scaler.
- inverse_transform(X, copy=None)[source]¶
Scale back the data to the original representation.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The data used to scale along the features axis.
- copybool, default=None
Copy the input X or not.
- Returns:
- X_tr{ndarray, sparse matrix} of shape (n_samples, n_features)
Transformed array.
- partial_fit(X, y=None, sample_weight=None)[source]¶
Online computation of mean and std on X for later scaling.
All of X is processed as a single batch. This is intended for cases when
fit() is not feasible due to a very large number of n_samples or because X is read from a continuous stream.
The algorithm for incremental mean and std is given in Equation 1.5a,b in Chan, Tony F., Gene H. Golub, and Randall J. LeVeque, “Algorithms for computing the sample variance: Analysis and recommendations.” The American Statistician 37.3 (1983): 242-247.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The data used to compute the mean and standard deviation used for later scaling along the features axis.
- yNone
Ignored.
- sample_weightarray-like of shape (n_samples,), default=None
Individual weights for each sample.
New in version 0.24: parameter sample_weight support to StandardScaler.
- Returns:
- selfobject
Fitted scaler.
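A short sketch of incremental scaling over batches (e.g. when the data does not fit in memory):
>>> import numpy as np
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler()
>>> for batch in np.array_split(np.random.randn(1000, 3), 10):
...     scaler = scaler.partial_fit(batch)        # statistics accumulate across batches
>>> Xt = scaler.transform(np.random.randn(5, 3))  # uses the pooled mean/std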
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') StandardScaler¶
Request metadata passed to the
fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
- Returns:
- selfobject
The updated object.
- set_inverse_transform_request(*, copy: bool | None | str = '$UNCHANGED$') StandardScaler¶
Request metadata passed to the
inverse_transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- copystr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for copy parameter in inverse_transform.
- Returns:
- selfobject
The updated object.
- set_partial_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') StandardScaler¶
Request metadata passed to the
partial_fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in partial_fit.
- Returns:
- selfobject
The updated object.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') StandardScaler¶
Request metadata passed to the
transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- copystr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for copy parameter in transform.
- Returns:
- selfobject
The updated object.
- transform(X, copy=None)[source]¶
Perform standardization by centering and scaling.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The data used to scale along the features axis.
- copybool, default=None
Copy the input X or not.
- Returns:
- X_tr{ndarray, sparse matrix} of shape (n_samples, n_features)
Transformed array.
- _align(y: Series, X: Series | DataFrame, how: str = 'inner', allow_empty: bool = False) Tuple[Series, DataFrame][source]¶
Align target and neighbor(s) on a common DatetimeIndex.
- _assert_same_regular_grid(idx_y: DatetimeIndex, idx_x: DatetimeIndex) None[source]¶
Raise if the two indices are not on the same regular grid (same step & phase).
- _dfm_params_to_vector(mod, params)[source]¶
- Build transformed vector in the exact order of mod.param_names from either:
{‘transformed’: […], ‘param_names’: […]}, or
a constrained dict {‘q_beta’:…, ‘q_ax’:…, ‘r_y’:…, ‘r_x’:…, ‘phi_x’:…, ‘load’:…}
- _fit_dfm(y, X, *, factor: str = 'default', anomaly_mode: str = 'ar', anom_var: str = 'neighbor', rx_scale: float = 3.0, maxiter: int = 80, disp: int = 0, params: dict | None = None)[source]¶
- _fit_huber(y: Series, X: DataFrame) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]¶
- _fit_lagged_elasticnet(y: Series, X: DataFrame, lags: Iterable[int], alphas: List[float] | None = None, l1_ratio: float = 0.2, n_splits: int = 3) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]¶
- _fit_loess(y: Series, X: DataFrame, frac: float = 0.2) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]¶
Locally weighted regression (LOESS) smoother for neighbor fill.
- _fit_ols(y: Series, X: DataFrame) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]¶
- _fit_resid_interp(y: Series, X: DataFrame, kind: str = 'linear') Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]¶
Fill y using neighbor via interpolated residuals.
- Steps:
Fit baseline y ≈ a + b x on overlap (OLS; falls back to ratio if needed).
Residuals r = y - (a + b x) on overlap.
Interpolate r only inside gaps (bounded on both sides) using ‘linear’ or ‘pchip’.
Reconstruct yhat = (a + b x) + r_interp wherever x is available.
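A minimal, illustrative sketch of the steps above using pandas; the function name and details here are hypothetical and not the vtools implementation:
>>> import numpy as np
>>> import pandas as pd
>>> def resid_interp_sketch(y, x, kind='linear'):
...     ok = y.notna() & x.notna()
...     b, a = np.polyfit(x[ok], y[ok], 1)                   # baseline y ≈ a + b x on the overlap
...     resid = y - (a + b * x)                               # residuals, NaN inside gaps
...     r_interp = resid.interpolate(method=kind, limit_area='inside')
...     return (a + b * x) + r_interp                         # defined wherever x is available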
- _fit_rolling_regression(y: Series, X: DataFrame, window: int, center: bool = False) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]¶
- _forward_chain_splits(n: int, n_splits: int = 3, min_train: int = 50) List[Tuple[ndarray, ndarray]][source]¶
Generate forward-chaining train/test index splits for time series.
- Parameters:
- nint
Number of samples.
- n_splitsint
How many folds.
- min_trainint
Minimum size of the initial training window.
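For orientation, forward-chaining splits can be generated as in the following generic sketch (not necessarily identical to the library's implementation): a growing training window followed by the next block of samples as the test set.

    import numpy as np

    def forward_chain_splits(n, n_splits=3, min_train=50):
        # Carve the samples after the initial training window into n_splits test blocks
        splits = []
        test_block = max(1, (n - min_train) // n_splits)
        for k in range(n_splits):
            train_end = min_train + k * test_block
            test_end = min(train_end + test_block, n)
            if train_end >= test_end:
                break
            splits.append((np.arange(0, train_end), np.arange(train_end, test_end)))
        return splits

    # 200 samples, 3 folds: train sizes 50, 100, 150, each tested on the following 50
    print(forward_chain_splits(200, n_splits=3, min_train=50))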
- _mask_overlap(y: Series, X: DataFrame) Tuple[Series, DataFrame][source]¶
Keep only timestamps where both y and ALL X columns are non-NaN.
- _suggest_lags(y: Series, x: Series, max_lag: int) List[int][source]¶
Suggest non-negative lags (in steps) by cross-correlation peak.
Returns a list of lags sorted by decreasing absolute correlation.
- asdict(obj, *, dict_factory=<class 'dict'>)[source]¶
Return the fields of a dataclass instance as a new dictionary mapping field names to field values.
Example usage:
  @dataclass
  class C:
      x: int
      y: int

  c = C(1, 2)
  assert asdict(c) == {'x': 1, 'y': 2}
If given, ‘dict_factory’ will be used instead of built-in dict. The function applies recursively to field values that are dataclass instances. This will also look into built-in containers: tuples, lists, and dicts.
- dataclass(cls=None, /, *, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False)[source]¶
Returns the same class as was passed in, with dunder methods added based on the fields defined in the class.
Examines PEP 526 __annotations__ to determine fields.
If init is true, an __init__() method is added to the class. If repr is true, a __repr__() method is added. If order is true, rich comparison dunder methods are added. If unsafe_hash is true, a __hash__() method function is added. If frozen is true, fields may not be assigned to after instance creation. If match_args is true, the __match_args__ tuple is added. If kw_only is true, then by default all fields are keyword-only. If slots is true, an __slots__ attribute is added.
- dfm_pack_params(model_info: dict) dict[source]¶
Return a portable blob of fitted DFM params.
- Parameters:
- model_infodict
Model info dictionary, typically from fill_from_neighbor.
- Returns:
- dict
Dictionary containing fitted DFM parameters with the following keys:
- ‘param_names’: list of parameter names.
- ‘transformed’: list of transformed parameter values.
- ‘constrained’: dictionary of constrained parameter values.
- ‘mle’: dictionary with optimizer info (optional).
- ‘reused’: bool indicating if parameters were reused (optional).
- Raises:
- TypeError
If model_info is not a dictionary.
- ValueError
If no fitted parameters are found in model_info.
- fill_from_neighbor(target: Series, neighbor: Series | DataFrame, method: str = 'substitute', regime: Series | None = None, bounds: Tuple[float | None, float | None] = (None, None), *, params: dict | None = None, **kwargs) Dict[str, Any][source]¶
Fill gaps in target using information from neighbor.
This is a high-level wrapper with multiple method backends (OLS/robust, rolling regression, lagged regression, LOESS-in-time, Trimbur-style DFM variants, residual-interpolation baselines, or simple substitution). Inputs must already lie on the same regular time grid (same step and phase); this function does not resample.
- Parameters:
- targetpandas.Series
Target time series with a DatetimeIndex on a regular grid. Values may be NaN.
- neighborpandas.Series or pandas.DataFrame
One or more neighbor series with a DatetimeIndex on the same grid as target (same step and phase). Values may be NaN.
- method{‘substitute’, ‘ols’, ‘huber’, ‘rolling’, ‘lagged_reg’, ‘loess’, ‘dfm_trimbur_rw’, ‘dfm_trimbur_ar’, ‘resid_interp_linear’, ‘resid_interp_pchip’}
Algorithm to use:
'substitute': pass-through neighbor after mean/scale alignment.
'ols': ordinary least squares on overlap (optionally with lags).
'huber': robust regression with Huber loss (optionally with lags).
'rolling': rolling-window OLS in sample units (not time offsets).
'lagged_reg': multivariate regression on specified neighbor lags.
'loess': LOESS (time → value) smoothing using neighbor as scaffold.
'dfm_trimbur_rw': dynamic factor model (Trimbur factor) with random-walk anomaly for the target.
'dfm_trimbur_ar': dynamic factor model (Trimbur factor) with AR anomaly on the neighbor.
'resid_interp_linear' / 'resid_interp_pchip': baseline y≈a+bx fit on overlap, then interpolate residuals (linear or PCHIP) across gaps.
- regimepandas.Series, optional
Optional categorical series indexed like target to stratify fits (e.g., barrier in/out). If provided, models are fit per category and stitched back together.
- bounds(float or None, float or None)
Lower/upper bounds to clip the final filled values (applied at the end).
- paramsdict, optional
Pre-fitted/packed parameter blob for methods that support parameter reuse (e.g., the DFM backends). If provided, fitting is skipped and the supplied parameters are used directly.
- **kwargs
Method-specific optional arguments. Unsupported keys are ignored unless otherwise noted. Typical extras by method:
- Common
- lagsint or Sequence[int], optional
Non-negative lags (in samples) for neighbor features. If an int m is provided, implementations may expand to range(0, m+1). Default behavior varies by method (often no lags or a small heuristic set).
- seedint, optional
Random seed for any stochastic initializations (where applicable).
- ‘ols’
lags : int or Sequence[int], optional
add_const : bool, default True
Include an intercept term.
fit_intercept : bool, alias of add_const.
- ‘huber’
lags : int or Sequence[int], optional
huber_t : float, default 1.35
Huber threshold (in residual σ units).
maxiter : int, default 200
tol : float, default 1e-6
- ‘rolling’
- windowint, required
Rolling window length in samples (integer). Time-offset strings (e.g., ‘14D’) are not supported here.
- min_periodsint, optional
Minimum non-NaN samples required inside each window (default = window).
- centerbool, default False
Whether to center the rolling window.
- lagsint or Sequence[int], optional
If provided, each regression uses lagged neighbor columns inside the window.
- ‘lagged_reg’
lags : int or Sequence[int], recommended
alpha : float, optional
Ridge/L2 penalty (if the backend supports it).
- l1_ratiofloat, optional
Elastic-net mixing (if the backend supports it).
- standardizebool, default True
Standardize columns before regression.
- ‘loess’
- fracfloat, default 0.25
LOESS span as a fraction of the data length (used in time→value smoothing).
- itint, default 0
Number of robustifying reweighting iterations.
- degreeint, default 1
Local polynomial degree.
- ‘dfm_trimbur_rw’ / ‘dfm_trimbur_ar’
- rx_scalefloat, default 1.0
Relative scale factor for neighbor measurement noise.
- maxiterint, default 80
Maximum optimizer iterations during parameter fitting.
- dispint, default 0
Optimizer verbosity (0 = silent).
- anom_var{‘target’,’neighbor’}, optional
Which series carries the anomaly/noise term (fixed by the variant, but may be overridden).
- ar_orderint, optional
AR order for the anomaly in the '_ar' variant (default may be 1).
- param_nameslist[str], optional
For advanced users: explicit parameter naming (used when packing).
Note: DFM backends accept params=... at the top level for reuse.
- ‘resid_interp_linear’ / ‘resid_interp_pchip’
- min_overlapint, default 3
Minimum overlapping samples required to fit the baseline y≈a+bx.
- clip_residuals_sigmafloat, optional
Winsorize residuals before interpolation (σ units).
- enforce_monotonebool, default False
For PCHIP path only: enforce monotonic segments where applicable.
- Returns:
- dict
Dictionary with the following keys:
- yhatpandas.Series
Filled series on the same index as target.
- pi_lowerpandas.Series or None
Lower uncertainty band (if the method provides one), otherwise None.
- pi_upperpandas.Series or None
Upper uncertainty band (if the method provides one), otherwise None.
- model_infodict
Method-specific diagnostics and metadata. Typical fields include:
method, param_names, fitted_params (packed blob for reuse), scaling (means/stds used), goodness-of-fit (e.g., llf, aic, bic), and per-regime info when regime is provided.
- Raises:
- ValueError
If indices are not equally spaced, or grids mismatch in step or phase, or if required method-specific kwargs are missing (e.g., window for method='rolling').
- KeyError
If an unknown method name is provided.
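A minimal usage sketch follows. The import path is assumed here (adjust to wherever fill_from_neighbor lives in your installation); the synthetic data are only for illustration.

    import numpy as np
    import pandas as pd
    from vtools.functions.neighbor_fill import fill_from_neighbor  # import path assumed

    # Two series on the same regular 15-minute grid
    idx = pd.date_range("2023-01-01", periods=2000, freq="15min")
    neighbor = pd.Series(np.sin(np.linspace(0.0, 60.0, len(idx))), index=idx, name="nbr")
    target = 1.2 * neighbor + 0.3
    target.iloc[500:650] = np.nan   # gap to be filled

    result = fill_from_neighbor(target, neighbor, method="ols", lags=2)
    filled = result["yhat"]         # filled series on target's index
    info = result["model_info"]     # diagnostics and packed parameters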
- fit_loess_time_value(y: Series, X: DataFrame, frac_time: float = 0.05, min_neighbors: int = 25) Tuple[Series, Series | None, Series | None, Dict[str, Any]][source]¶
Two-dimensional LOESS-like smoother: y(t) ~ f(x(t), t), implemented as distance-weighted KNN in (time, value) space.
Avoids Series&DataFrame boolean broadcasting by reducing X→Series first.
Scales time and value so distances are comparable.
Predicts wherever neighbor is present.
- load_dfm_params(path: str) Dict[str, Any][source]¶
Load a DFM parameter blob from YAML and validate minimal schema.
- lowess(endog, exog, frac=0.6666666666666666, it=3, delta=0.0, xvals=None, is_sorted=False, missing='drop', return_sorted=True)[source]¶
LOWESS (Locally Weighted Scatterplot Smoothing)
A lowess function that outputs smoothed estimates of endog at the given exog values from points (exog, endog)
- Parameters:
- endog1-D numpy array
The y-values of the observed points
- exog1-D numpy array
The x-values of the observed points
- fracfloat
Between 0 and 1. The fraction of the data used when estimating each y-value.
- itint
The number of residual-based reweightings to perform.
- deltafloat
Distance within which to use linear-interpolation instead of weighted regression.
- xvals: 1-D numpy array
Values of the exogenous variable at which to evaluate the regression. If supplied, cannot use delta.
- is_sortedbool
If False (default), then the data will be sorted by exog before calculating lowess. If True, then it is assumed that the data is already sorted by exog. If xvals is specified, then it too must be sorted if is_sorted is True.
- missingstr
Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no nan checking is done. If ‘drop’, any observations with nans are dropped. If ‘raise’, an error is raised. Default is ‘drop’.
- return_sortedbool
If True (default), then the returned array is sorted by exog and has missing (nan or infinite) observations removed. If False, then the returned array is in the same length and the same sequence of observations as the input array.
- Returns:
- out{ndarray, float}
The returned array is two-dimensional if return_sorted is True, and one dimensional if return_sorted is False. If return_sorted is True, then a numpy array with two columns. The first column contains the sorted x (exog) values and the second column the associated estimated y (endog) values. If return_sorted is False, then only the fitted values are returned, and the observations will be in the same order as the input arrays. If xvals is provided, then return_sorted is ignored and the returned array is always one dimensional, containing the y values fitted at the x values provided by xvals.
Notes
This lowess function implements the algorithm given in the reference below using local linear estimates.
Suppose the input data has N points. The algorithm works by estimating the smooth y_i by taking the frac*N closest points to (x_i,y_i) based on their x values and estimating y_i using a weighted linear regression. The weight for (x_j,y_j) is tricube function applied to abs(x_i-x_j).
If it > 1, then further weighted local linear regressions are performed, where the weights are the same as above times the _lowess_bisquare function of the residuals. Each iteration takes approximately the same amount of time as the original fit, so these iterations are expensive. They are most useful when the noise has extremely heavy tails, such as Cauchy noise. Noise with less heavy-tails, such as t-distributions with df>2, are less problematic. The weights downgrade the influence of points with large residuals. In the extreme case, points whose residuals are larger than 6 times the median absolute residual are given weight 0.
delta can be used to save computations. For each x_i, regressions are skipped for points closer than delta. The next regression is fit for the farthest point within delta of x_i and all points in between are estimated by linearly interpolating between the two regression fits.
Judicious choice of delta can cut computation time considerably for large data (N > 5000). A good choice is
delta = 0.01 * range(exog). If xvals is provided, the regression is then computed at those points and the fit values are returned. Otherwise, the regression is run at points of exog.
Some experimentation is likely required to find a good choice of frac and iter for a particular dataset.
References
Cleveland, W.S. (1979) “Robust Locally Weighted Regression and Smoothing Scatterplots”. Journal of the American Statistical Association 74 (368): 829-836.
Examples
The below allows a comparison between how different the fits from lowess for different values of frac can be.
>>> import numpy as np
>>> import statsmodels.api as sm
>>> lowess = sm.nonparametric.lowess
>>> x = np.random.uniform(low = -2*np.pi, high = 2*np.pi, size=500)
>>> y = np.sin(x) + np.random.normal(size=len(x))
>>> z = lowess(y, x)
>>> w = lowess(y, x, frac=1./3)
This gives a similar comparison for when it is 0 vs not.
>>> import numpy as np
>>> import scipy.stats as stats
>>> import statsmodels.api as sm
>>> lowess = sm.nonparametric.lowess
>>> x = np.random.uniform(low = -2*np.pi, high = 2*np.pi, size=500)
>>> y = np.sin(x) + stats.cauchy.rvs(size=len(x))
>>> z = lowess(y, x, frac= 1./3, it=0)
>>> w = lowess(y, x, frac=1./3)
- save_dfm_params(params: Dict[str, Any], path: str) None[source]¶
Save a DFM parameter blob to YAML (preferred for this codebase). File extension may be .yaml or .yml. Other extensions raise.
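Continuing the fill_from_neighbor sketch above, the fitted DFM parameters can be packed, saved, and reused so that later calls skip fitting (import path again assumed):

    from vtools.functions.neighbor_fill import (  # import path assumed
        fill_from_neighbor, dfm_pack_params, save_dfm_params, load_dfm_params,
    )

    # Fit once and persist the parameters
    first = fill_from_neighbor(target, neighbor, method="dfm_trimbur_rw")
    blob = dfm_pack_params(first["model_info"])
    save_dfm_params(blob, "dfm_params.yaml")    # .yaml or .yml only

    # Later: reuse the saved parameters instead of refitting
    params = load_dfm_params("dfm_params.yaml")
    later = fill_from_neighbor(target, neighbor, method="dfm_trimbur_rw", params=params)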
- write_filled_csv_with_yaml_header(filled: Series, path: str, model_info: Dict[str, Any], metrics: Dict[str, float] | None = None, extra_meta: Dict[str, Any] | None = None, float_format: str = '{:.6g}')[source]¶
Write a CSV file with a YAML-like header as #-comments.
- Parameters:
- filledpd.Series
Series to write; index must be a DatetimeIndex.
- pathstr
Destination filepath.
- model_infodict
Metadata from fill_from_neighbor; will be serialized.
- metricsdict, optional
Metrics to include.
- extra_metadict, optional
Any additional metadata.
- float_formatstr
Format string for values.
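Continuing the same sketch, the filled result and its metadata can be written together (the metric and metadata values below are illustrative only; import path assumed):

    from vtools.functions.neighbor_fill import write_filled_csv_with_yaml_header  # path assumed

    write_filled_csv_with_yaml_header(
        filled=result["yhat"],
        path="filled_target.csv",
        model_info=result["model_info"],
        metrics={"rmse_on_overlap": 0.02},    # illustrative
        extra_meta={"station": "example"},    # illustrative
    )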
vtools.functions.period_op module¶
vtools.functions.savitzky_golay module¶
- savgol_filter_weighted(data, window_length, degree, error=None, cov_matrix=None, deriv=None, use_numba=True)[source]¶
Apply a Savitzky–Golay filter with weights to a univariate DataFrame or Series.
- Parameters:
- datapandas.DataFrame or pandas.Series
DataFrame or Series containing your data.
- window_lengthint
Length of the filter window (must be odd).
- degreeint
Degree of the polynomial fit.
- errorpandas.Series, optional
Series containing the error (used to compute weights).
- cov_matrix2D numpy array, optional
Covariance matrix for the errors.
- derivint, optional
Derivative order to compute.
- use_numbabool, optional
If True, uses the Numba-accelerated kernel.
- Returns:
- pandas.Series
Series of the filtered values.
Notes
The practical size of window_length depends on the data and the computational resources. Larger window lengths provide smoother results but require more computation and may not capture local variations well. It is recommended to experiment with different window lengths to find the optimal value for your specific application.
Some of the workflow derived from this work: https://github.com/surhudm/savitzky_golay_with_errors
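Illustrative usage on synthetic data (module path per this page):

    import numpy as np
    import pandas as pd
    from vtools.functions.savitzky_golay import savgol_filter_weighted

    idx = pd.date_range("2023-01-01", periods=500, freq="h")
    noisy = pd.Series(
        np.sin(np.linspace(0.0, 20.0, len(idx))) + 0.1 * np.random.randn(len(idx)),
        index=idx,
    )

    smoothed = savgol_filter_weighted(noisy, window_length=21, degree=3)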
vtools.functions.separate_species module¶
Separation of tidal data into species. The key function in this module is separate_species, which decomposes tides into subtidal, diurnal, semidiurnal and noise components.
The filters are long, so the time resolution of the amplitude may be limited. A demo function is also provided that reads tide series (6 min interval) from input files, separates the species, writes results and optionally plots an example.
- run_example()[source]¶
This is the data for the example. Note that you want the data to be at least 4 days longer than the desired output
- separate_species(ts, noise_thresh_min=40)[source]¶
Separate species into subtidal, diurnal, semidiurnal and noise components
- Input:
ts: timeseries to be decomposed into species, assumed to be at six minute intervals. The filters used have long lengths, so avoid missing data and allow for four extra days' worth of data on each end.
- Output:
four regular time series, representing subtidal, diurnal, semi-diurnal and noise
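Illustrative usage on a synthetic 6-minute record, assuming the four components come back in the order listed above (pad the record by a few extra days on each end to limit edge effects):

    import numpy as np
    import pandas as pd
    from vtools.functions.separate_species import separate_species

    # Synthetic stage with diurnal and semidiurnal constituents, 6-minute sampling
    idx = pd.date_range("2023-01-01", "2023-02-01", freq="6min")
    hours = np.arange(len(idx)) * 0.1
    stage = pd.Series(
        0.5 * np.cos(2 * np.pi * hours / 23.93) + 1.0 * np.cos(2 * np.pi * hours / 12.42),
        index=idx,
    )

    subtidal, diurnal, semidiurnal, noise = separate_species(stage, noise_thresh_min=40)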
vtools.functions.skill_metrics module¶
- corr_coefficient(predictions, targets, method='pearson')[source]¶
Calculates the correlation coefficient (the ‘r’ in ‘r-squared’) between two series.
For time series where the targets are serially correlated and may span only a fraction of the natural variability, this statistic may not be appropriate and Murphy (1988) explains why caution should be exercised in using this statistic.
- Parameters:
- predictions, targetsarray_like
Time series to analyze
- method{‘pearson’, ‘kendall’, ‘spearman’}
Correlation method compatible with pandas.
- Returns:
- rfloat
Correlation coefficient
- mean_error(predictions, targets, proportiontocut)[source]¶
Calculate the untrimmed mean error, discounting nan values
- Parameters:
- predictions, targetsarray_like
Time series or arrays to be analyzed
- Returns:
- meanfloat
Mean error
- median_error(predictions, targets)[source]¶
Calculate the median error, discounting nan values
- Parameters:
- predictions, targetsarray_like
Time series or arrays to be analyzed
- Returns:
- medfloat
Median error
- mse(predictions, targets)[source]¶
Mean squared error
- Parameters:
- predictions, targetsarray_like
Time series or arrays to analyze
- Returns:
- msefloat
Mean squared error between predictions and targets
- rmse(predictions, targets)[source]¶
Root mean squared error
- Parameters:
- predictions, targetsarray_like
Time series or arrays to analyze
- Returns:
- rmsefloat
Root mean squared error
- skill_score(predictions, targets, ref=None)[source]¶
Calculate a Nash-Sutcliffe-like skill score based on mean squared error
As per the discussion in Murphy (1988) other reference forecasts (climatology, harmonic tide, etc.) are possible.
- Parameters:
- predictions, targetsarray_like
Time series or arrays to be analyzed
- Returns:
- skillfloat
Skill score: 1 - MSE(predictions, targets) / MSE(reference, targets)
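The MSE-based skill score discussed above is conventionally defined relative to a reference forecast; a minimal sketch of the formula, using the target mean as the default reference (the Nash-Sutcliffe convention), is shown below. The library implementation may differ in detail.

    import numpy as np

    def mse_skill(predictions, targets, ref=None):
        p = np.asarray(predictions, dtype=float)
        t = np.asarray(targets, dtype=float)
        r = np.full_like(t, np.nanmean(t)) if ref is None else np.asarray(ref, dtype=float)
        ok = ~np.isnan(p) & ~np.isnan(t) & ~np.isnan(r)
        mse_pred = np.mean((p[ok] - t[ok]) ** 2)
        mse_ref = np.mean((r[ok] - t[ok]) ** 2)
        return 1.0 - mse_pred / mse_ref   # 1 = perfect, 0 = no better than the reference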
- tmean_error(predictions, targets, limits=None, inclusive=[True, True])[source]¶
Calculate the (possibly trimmed) mean error, discounting nan values
- Parameters:
- predictions, targetsarray_like
Time series or arrays to be analyzed
- limitstuple(float)
Low and high limits for trimming
- inclusive[boolean, boolean]
True if clipping is inclusive on the low/high end
- Returns:
- meanfloat
Trimmed mean error
- willmott_score(predictions, targets, ref=None)[source]¶
Calculate the Willmott score (index of agreement).
As per the discussion in Murphy (1988) other reference forecasts (climatology, harmonic tide, etc.) are possible.
- Parameters:
- predictions, targetsarray_like
Time series or arrays to be analyzed
- Returns:
- scorefloat
Willmott score (index of agreement)
vtools.functions.tidalhl module¶
- cosine_lanczos(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]¶
- get_smoothed_resampled(df, cutoff_period='2h', resample_period='1min', interpolate_method='pchip')[source]¶
Resample the dataframe (indexed by time) to the regular period resample_period using the interpolate method.
Furthermore, the cosine Lanczos filter is used with a cutoff_period to smooth the signal and remove high-frequency noise.
Args:
df (DataFrame): A single column dataframe indexed by datetime
cutoff_period (str, optional): cutoff period for cosine lanczos filter. Defaults to ‘2h’.
resample_period (str, optional): Resample to regular period. Defaults to ‘1min’.
interpolate_method (str, optional): interpolation for resampling. Defaults to ‘pchip’.
Returns:
DataFrame: smoothed and resampled dataframe indexed by datetime
- get_tidal_amplitude(dfh, dfl)[source]¶
Tidal amplitude given tidal highs and lows
Args:
dfh (DataFrame): Tidal highs time series
dfl (DataFrame): Tidal lows time series
Returns:
DataFrame: Amplitude timeseries, at the times of the low following the high being used for amplitude calculation
- get_tidal_amplitude_diff(dfamp1, dfamp2, percent_diff=False, tolerance='4h')[source]¶
Get the difference of values within +/- 4H of values in the two amplitude arrays
Args:
dfamp1 (DataFrame): Amplitude time series
dfamp2 (DataFrame): Amplitude time series
percent_diff (bool, optional): If true do percent diff. Defaults to False.
Returns:
DataFrame: Difference dfamp1-dfamp2 or % Difference (dfamp1-dfamp2)/dfamp2*100 for values within +/- 4H of each other
- get_tidal_hl(df, cutoff_period='2h', resample_period='1min', interpolate_method='pchip', moving_window_size='7h')[source]¶
Get Tidal highs and lows
Args:
df (DataFrame): A single column dataframe indexed by datetime
cutoff_period (str, optional): cutoff period for cosine lanczos filter. Defaults to ‘2h’.
resample_period (str, optional): Resample to regular period. Defaults to ‘1min’.
interpolate_method (str, optional): interpolation for resampling. Defaults to ‘pchip’.
moving_window_size (str, optional): moving window size to look for lows within. Defaults to ‘7h’.
Returns:
tuple of DataFrame: Tidal high and tidal low time series
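Illustrative usage on synthetic water level data (module path per this page):

    import numpy as np
    import pandas as pd
    from vtools.functions.tidalhl import get_tidal_hl

    idx = pd.date_range("2023-01-01", "2023-01-15", freq="15min")
    hours = np.arange(len(idx)) * 0.25
    wl = pd.DataFrame({"stage": np.cos(2 * np.pi * hours / 12.42)}, index=idx)

    # Defaults: 2h cutoff, 1min resample, pchip interpolation, 7h search window
    highs, lows = get_tidal_hl(wl)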
- get_tidal_hl_rolling(df, cutoff_period='2h', resample_period='1min', interpolate_method='pchip', moving_window_size='7h')¶
Get Tidal highs and lows
Args:
df (DataFrame): A single column dataframe indexed by datetime
cutoff_period (str, optional): cutoff period for cosine lanczos filter. Defaults to ‘2h’.
resample_period (str, optional): Resample to regular period. Defaults to ‘1min’.
interpolate_method (str, optional): interpolation for resampling. Defaults to ‘pchip’.
moving_window_size (str, optional): moving window size to look for lows within. Defaults to ‘7h’.
Returns:
tuple of DataFrame: Tidal high and tidal low time series
- get_tidal_hl_zerocrossing(df, round_to='1min')[source]¶
Finds the tidal high and low times using zero crossings of the first derivative.
This works for all situations but is not robust in the face of noise and perturbations in the signal
- get_tidal_phase_diff(dfh2, dfl2, dfh1, dfl1, tolerance='4h')[source]¶
Calculates the phase difference between df2 and df1 tidal highs and lows
Scans +/- 4 hours in df1 to get the highs and lows in that window for df2 to get the tidal highs and lows at the times of df1
Args:
dfh2 (DataFrame): Timeseries of tidal highs
dfl2 (DataFrame): Timeseries of tidal lows
dfh1 (DataFrame): Timeseries of tidal highs
dfl1 (DataFRame): Timeseries of tidal lows
Returns:
DataFrame: Phase difference (dfh2-dfh1) and (dfl2-dfl1) in minutes
- periods_per_window(moving_window_size: str, period_str: str) int[source]¶
Number of period size in moving window
Args:
moving_window_size (str): moving window size as a string, e.g. ‘7H’ for 7 hours
period_str (str): period as a string, e.g. ‘1T’ for 1 minute
Returns:
int: number of periods in the moving window rounded to an integer
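For example:

    from vtools.functions.tidalhl import periods_per_window  # module path per this page

    n = periods_per_window("7h", "1min")   # 7 hours of 1-minute periods -> 420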
- tidal_highs(df, moving_window_size='7h')[source]¶
Tidal highs (could be up to two highs in a 25 hr period)
Args:
df (DataFrame): a time series with a regular frequency
moving_window_size (str, optional): moving window size to look for highs within. Defaults to ‘7h’.
Returns:
DataFrame: an irregular time series with highs at resolution of df.index
- tidal_lows(df, moving_window_size='7h')[source]¶
Tidal lows (could be up to two lows in a 25 hr period)
Args:
df (DataFrame): a time series with a regular frequency
moving_window_size (str, optional): moving window size to look for lows within. Defaults to ‘7h’.
Returns:
DataFrame: an irregular time series with lows at resolution of df.index
vtools.functions.tidalhours module¶
Functions for analyzing tidal cycles from time series data.
This module provides functions to analyze tidal time series, identify slack water times, and map any time to its position within the tidal cycle (tidal hour). This is useful for tidal phase analysis in estuarine and coastal studies.
Functions¶
- find_slack(jd, u, leave_mean=False, which=’both’)
Identifies the times of “slack water”—the moments when tidal current velocity (u) crosses zero.
- hour_tide(jd, u=None, h=None, jd_new=None, leave_mean=False)
Calculates the “tidal hour” for each time point, i.e., the phase of the tidal cycle (0–12, where 0 is slack before ebb).
- hour_tide_fn(jd, u, leave_mean=False)
Returns a function that computes tidal hour for arbitrary time points, based on the provided time/velocity series.
- tidal_hour_signal(ts, filter=True)
Compute the tidal hour of a semidiurnal signal.
- diff_h(tidal_hour_series)
Compute the time derivative of tidal hour.
- cdiff(a, n=1, axis=-1)[source]¶
Like np.diff, but include difference from last element back to first.
- Parameters:
- aarray-like
Input array.
- nint, optional
Order of the difference. Only n=1 is supported.
- axisint, optional
Axis along which the difference is taken.
- Returns:
- ndarray
Array of differences, same shape as input.
Notes
This function computes the difference between consecutive elements of the input array, and also includes the difference from the last element back to the first, preserving the array shape.
- cosine_lanczos5(ts, cutoff_period=None, cutoff_frequency=None, filter_len=None, padtype=None, padlen=None, fill_edge_nan=True)[source]¶
squared low-pass cosine lanczos filter on a regular time series.
- Parameters:
- tsDataFrame
- filter_lenint, time_interval
Size of Lanczos window; the default is the number of samples within filter_period*1.25.
- cutoff_frequency: float,optional
Cutoff frequency expressed as a ratio of the Nyquist frequency; it should be within the range (0,1). For example, if the sampling frequency is 1 hour, the Nyquist frequency is 1 sample/2 hours. If we want a 36 hour cutoff period, the frequency is 1/36 or 0.0278 cycles per hour. Hence the cutoff frequency argument used here would be 0.0278/0.5 = 0.056.
- cutoff_periodstring or _time_interval
Period of cutting off frequency. If input as a string, it must be convertible to a _time_interval (Pandas freq). cutoff_frequency and cutoff_period can’t be specified at the same time.
- padtypestr or None, optional
Must be ‘odd’, ‘even’, ‘constant’, or None. This determines the type of extension to use for the padded signal to which the filter is applied. If padtype is None, no padding is used. The default is None.
- padlenint or None, optional
The number of elements by which to extend x at both ends of axis before applying the filter. This value must be less than x.shape[axis]-1. padlen=0 implies no padding. If padtype is not None and padlen is not given, padlen is set to 6*m.
- fill_edge_nan: bool,optional
If padding is not used and fill_edge_nan is true, the resulting data on both ends are filled with nan to account for edge effects. This is 2*m on either end of the result. Default is true.
- Returns:
- resultTimeSeries
A new regular time series with the same interval as ts. If no padding is used, the beginning and ending 4*m of the resulting data will be set to nan to remove edge effects.
- Raises:
- ValueError
If the input timeseries is not regular; or cutoff_period and cutoff_frequency are given at the same time; or neither cutoff_period nor cutoff_frequency is given; or padtype is not “odd”, ”even”, ”constant”, or None; or padlen is larger than the data size
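Typical usage is to low-pass a stage record with a roughly 40-hour cutoff to remove tidal variability. The sketch below assumes the function is importable under the name listed on this page; in scripts it is commonly exposed simply as cosine_lanczos.

    import numpy as np
    import pandas as pd
    from vtools.functions.tidalhours import cosine_lanczos5  # import path/name assumed

    idx = pd.date_range("2023-01-01", periods=24 * 30, freq="h")
    hours = np.arange(len(idx), dtype=float)
    stage = pd.DataFrame({"stage": np.cos(2 * np.pi * hours / 12.42) + 0.01 * hours}, index=idx)

    filtered = cosine_lanczos5(stage, cutoff_period="40h")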
- diff_h(tidal_hour_series)[source]¶
Compute the time derivative of tidal hour.
- Parameters:
- tidal_hour_seriespandas.Series
Output of tidal_hour_signal, indexed by datetime.
- Returns:
- pandas.Series
Time derivative of tidal hour (dH/dt) in hours/hour, indexed by datetime.
Notes
This derivative is often included to capture how rapidly the tidal phase is changing, which can be important in modeling flow reversals, estuarine dynamics, or for detecting slack tide conditions where the rate of change is near zero.
- find_slack(jd, u, leave_mean=False, which='both')[source]¶
Identify slack water times from a velocity time series.
- Parameters:
- jdarray-like
Array of time values (Julian days or similar).
- uarray-like
Array of velocity values (flood-positive).
- leave_meanbool, optional
If False, removes the mean (low-frequency) component from u.
- which{‘both’, ‘high’, ‘low’}, optional
Specifies which zero-crossings to return.
- Returns:
- jd_slackndarray
Array of times when slack water occurs.
- start{‘ebb’, ‘flood’}
String indicating the initial state.
Notes
This function detects transitions in the velocity time series where the current reverses direction (i.e., crosses zero), which correspond to slack water events.
- hilbert(x, N=None, axis=-1)[source]¶
Compute the analytic signal, using the Hilbert transform.
The transformation is done along the last axis by default.
- Parameters:
- xarray_like
Signal data. Must be real.
- Nint, optional
Number of Fourier components. Default:
x.shape[axis]
- axisint, optional
Axis along which to do the transformation. Default: -1.
- Returns:
- xandarray
Analytic signal of x, of each 1-D array along axis
Notes
The analytic signal x_a(t) of signal x(t) is:
\[x_a = F^{-1}(F(x) 2U) = x + i y\]
where F is the Fourier transform, U the unit step function, and y the Hilbert transform of x. [1]
In other words, the negative half of the frequency spectrum is zeroed out, turning the real-valued signal into a complex signal. The Hilbert transformed signal can be obtained from np.imag(hilbert(x)), and the original signal from np.real(hilbert(x)).
References
[1]Wikipedia, “Analytic signal”. https://en.wikipedia.org/wiki/Analytic_signal
[2]Leon Cohen, “Time-Frequency Analysis”, 1995. Chapter 2.
[3]Alan V. Oppenheim, Ronald W. Schafer. Discrete-Time Signal Processing, Third Edition, 2009. Chapter 12. ISBN 13: 978-1292-02572-8
Examples
In this example we use the Hilbert transform to determine the amplitude envelope and instantaneous frequency of an amplitude-modulated signal.
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy.signal import hilbert, chirp
>>> duration = 1.0
>>> fs = 400.0
>>> samples = int(fs*duration)
>>> t = np.arange(samples) / fs
We create a chirp of which the frequency increases from 20 Hz to 100 Hz and apply an amplitude modulation.
>>> signal = chirp(t, 20.0, t[-1], 100.0)
>>> signal *= (1.0 + 0.5 * np.sin(2.0*np.pi*3.0*t) )
The amplitude envelope is given by magnitude of the analytic signal. The instantaneous frequency can be obtained by differentiating the instantaneous phase in respect to time. The instantaneous phase corresponds to the phase angle of the analytic signal.
>>> analytic_signal = hilbert(signal)
>>> amplitude_envelope = np.abs(analytic_signal)
>>> instantaneous_phase = np.unwrap(np.angle(analytic_signal))
>>> instantaneous_frequency = (np.diff(instantaneous_phase) /
...                            (2.0*np.pi) * fs)
>>> fig, (ax0, ax1) = plt.subplots(nrows=2)
>>> ax0.plot(t, signal, label='signal')
>>> ax0.plot(t, amplitude_envelope, label='envelope')
>>> ax0.set_xlabel("time in seconds")
>>> ax0.legend()
>>> ax1.plot(t[1:], instantaneous_frequency)
>>> ax1.set_xlabel("time in seconds")
>>> ax1.set_ylim(0.0, 120.0)
>>> fig.tight_layout()
- hour_tide(jd, u=None, h=None, jd_new=None, leave_mean=False, start_datum='ebb')[source]¶
Calculate tidal hour from a time series of velocity or water level.
- Parameters:
- jdarray-like
Time in days (e.g., Julian day, datenum, etc.).
- uarray-like, optional
Velocity, flood-positive.
- harray-like, optional
Water level, positive up.
- jd_newarray-like, optional
Optional new time points to evaluate.
- leave_meanbool, optional
By default, the time series mean is removed, but this can be disabled by passing True.
- start_datum{‘ebb’, ‘flood’}, optional
Desired starting datum for tidal hour.
- Returns:
- ndarray
Array of tidal hour values (0–12) for each time point.
Notes
This function computes the phase of the tidal cycle (tidal hour) for each time point, based on either velocity or water level time series. The tidal hour is defined such that 0 corresponds to slack before ebb.
- hour_tide_fn(jd, u, start_datum='ebb', leave_mean=False)[source]¶
Return a function for extracting tidal hour from the time/velocity given.
- Parameters:
- jdarray-like
Time array.
- uarray-like
Velocity array.
- start_datum{‘ebb’, ‘flood’}, optional
Desired starting datum for tidal hour.
- leave_meanbool, optional
If False, removes the mean (low-frequency) component from u.
- Returns:
- function
Function: fn(jd_new) → tidal hour array.
Notes
This function generates a callable that computes tidal hour for arbitrary time points, based on the provided time and velocity series. The tidal hour is referenced to slack water.
- class interp1d(x, y, kind='linear', axis=-1, copy=True, bounds_error=None, fill_value=nan, assume_sorted=False)[source]¶
Bases:
_Interpolator1D
Interpolate a 1-D function.
x and y are arrays of values used to approximate some function f:
y = f(x). This class returns a function whose call method uses interpolation to find the value of new points.
- Parameters:
- x(npoints, ) array_like
A 1-D array of real values.
- y(…, npoints, …) array_like
A N-D array of real values. The length of y along the interpolation axis must be equal to the length of x. Use the
axis parameter to select correct axis. Unlike other interpolators, the default interpolation axis is the last axis of y.
- kindstr or int, optional
Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of ‘linear’, ‘nearest’, ‘nearest-up’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, or ‘next’. ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point; ‘nearest-up’ and ‘nearest’ differ when interpolating half-integers (e.g. 0.5, 1.5) in that ‘nearest-up’ rounds up and ‘nearest’ rounds down. Default is ‘linear’.
- axisint, optional
Axis in the
y array corresponding to the x-coordinate values. Unlike other interpolators, defaults to axis=-1.
- copybool, optional
If True, the class makes internal copies of x and y. If False, references to x and y are used. The default is to copy.
- bounds_errorbool, optional
If True, a ValueError is raised any time interpolation is attempted on a value outside of the range of x (where extrapolation is necessary). If False, out of bounds values are assigned fill_value. By default, an error is raised unless
fill_value="extrapolate".- fill_valuearray-like or (array-like, array_like) or “extrapolate”, optional
if a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes.
If a two-element tuple, then the first element is used as a fill value for
x_new < x[0] and the second element is used for x_new > x[-1]. Anything that is not a 2-element tuple (e.g., list or ndarray, regardless of shape) is taken to be a single array-like argument meant to be used for both bounds as below, above = fill_value, fill_value. Using a two-element tuple or ndarray requires bounds_error=False.
New in version 0.17.0.
If “extrapolate”, then points outside the data range will be extrapolated.
New in version 0.17.0.
- assume_sortedbool, optional
If False, values of x can be in any order and they are sorted first. If True, x has to be an array of monotonically increasing values.
See also
splrep,splevSpline interpolation/smoothing based on FITPACK.
UnivariateSplineAn object-oriented wrapper of the FITPACK routines.
interp2d2-D interpolation
Notes
Calling interp1d with NaNs present in input values results in undefined behaviour.
Input values x and y must be convertible to float values like int or float.
If the values in x are not unique, the resulting behavior is undefined and specific to the choice of kind, i.e., changing kind will change the behavior for duplicates.
Examples
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy import interpolate
>>> x = np.arange(0, 10)
>>> y = np.exp(-x/3.0)
>>> f = interpolate.interp1d(x, y)
>>> xnew = np.arange(0, 9, 0.1)
>>> ynew = f(xnew)   # use interpolation function returned by `interp1d`
>>> plt.plot(x, y, 'o', xnew, ynew, '-')
>>> plt.show()
- Attributes:
fill_valueThe fill value.
Methods
__call__(x)Evaluate the interpolant
- __init__(x, y, kind='linear', axis=-1, copy=True, bounds_error=None, fill_value=nan, assume_sorted=False)[source]¶
Initialize a 1-D linear interpolation class.
- __module__ = 'scipy.interpolate._interpolate'¶
- __weakref__¶
list of weak references to the object (if defined)
- _check_bounds(x_new)[source]¶
Check the inputs for being in the bounds of the interpolated data.
- Parameters:
- x_newarray
- Returns:
- out_of_boundsbool array
The mask on x_new of values that are out of the bounds.
- _y_axis¶
- _y_extra_shape¶
- dtype¶
- property fill_value¶
The fill value.
- tidal_hour_signal(ts, filter=True)[source]¶
Compute the tidal hour of a semidiurnal signal.
- Parameters:
- tspandas.Series
Time series of water level or other semidiurnal signal. Must have a datetime index.
- filterbool, default True
Whether to apply a 40-hour cosine Lanczos filter to the input signal. If False, uses the raw signal.
- Returns:
- pandas.Series
Tidal hour as a float (range [0, 12)), indexed by datetime.
See also
diff_hCompute the derivative (rate of change) of tidal hour.
cosine_lanczosExternal function used to apply low-pass filtering.
Notes
This function returns the instantaneous phase-based tidal hour for a time series, assuming a semidiurnal signal. Optionally applies a cosine Lanczos low-pass filter (e.g., 40h) to isolate tidal components from subtidal or noisy fluctuations.
The tidal hour is computed using the phase of the analytic signal obtained via the Hilbert transform. This phase is then scaled to range from 0 to 12 hours to represent one semidiurnal tidal cycle. The output is a pandas Series aligned with the input time index.
The tidal hour is derived from the instantaneous phase of the analytic signal. This signal is computed as:
analytic_signal = ts + 1j * hilbert(ts)
The phase (angle) of this complex signal varies smoothly over time and reflects the oscillatory nature of the tide, allowing us to construct a continuous representation of “tidal time” even between extrema.
The use of the Hilbert transform provides a smooth interpolation of the signal’s phase progression, since it yields the narrow-band envelope and instantaneous phase of the dominant frequency component (assumed to be semidiurnal here).
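The phase-to-tidal-hour mapping described above can be sketched directly with scipy's analytic signal; this is an illustration of the idea, not the exact library code.

    import numpy as np
    import pandas as pd
    from scipy.signal import hilbert

    idx = pd.date_range("2023-01-01", periods=3 * 24 * 60, freq="min")
    t_hours = np.arange(len(idx)) / 60.0
    stage = np.cos(2 * np.pi * t_hours / 12.42)          # synthetic semidiurnal signal

    analytic = hilbert(stage - stage.mean())             # complex analytic signal
    phase = np.angle(analytic)                           # instantaneous phase, -pi..pi
    tidal_hour = (phase % (2 * np.pi)) / (2 * np.pi) * 12.0
    th = pd.Series(tidal_hour, index=idx)                # float values in [0, 12)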
- tidal_hour_signal2(ts: Series | DataFrame, filter: bool = True) Series | DataFrame[source]¶
Calculate the tidal hour from a semidiurnal tidal signal.
- Parameters:
- tspd.Series or pd.DataFrame
Input time series of water levels (must have datetime index)
- filterbool, optional
If True, apply Lanczos filter to remove low-frequency components (default True) Note this is opposite of ‘leave_mean’ in original implementation
- Returns:
- pd.Series or pd.DataFrame
Tidal hour in datetime format (same shape as input)
Notes
The tidal hour represents the phase of the semidiurnal tide in temporal units. The calculation uses complex interpolation for smooth phase estimation: 1. The Hilbert transform creates an analytic signal 2. The angle gives the instantaneous phase 3. Complex interpolation avoids phase jumps at 0/2π boundaries 4. This provides continuous phase evolution even during slack tides
If h/u distinction is needed, consider applying diff_h to separate flood/ebb phases. The derivative was likely included in original code to identify phase reversals during tidal current analysis.
vtools.functions.transition module¶
- class PchipInterpolator(x, y, axis=0, extrapolate=None)[source]¶
Bases:
CubicHermiteSpline
PCHIP 1-D monotonic cubic interpolation.
x and y are arrays of values used to approximate some function f, with y = f(x). The interpolant uses monotonic cubic splines to find the value of new points. (PCHIP stands for Piecewise Cubic Hermite Interpolating Polynomial).
- Parameters:
xandyare arrays of values used to approximate some function f, withy = f(x). The interpolant uses monotonic cubic splines to find the value of new points. (PCHIP stands for Piecewise Cubic Hermite Interpolating Polynomial).- Parameters:
- xndarray, shape (npoints, )
A 1-D array of monotonically increasing real values.
x cannot include duplicate values (otherwise f is overspecified)
- yndarray, shape (…, npoints, …)
A N-D array of real values.
y's length along the interpolation axis must be equal to the length of x. Use the axis parameter to select the interpolation axis.
- axisint, optional
Axis in the
y array corresponding to the x-coordinate values. Defaults to axis=0.
- extrapolatebool, optional
Whether to extrapolate to out-of-bounds points based on first and last intervals, or to return NaNs.
See also
CubicHermiteSplinePiecewise-cubic interpolator.
Akima1DInterpolatorAkima 1D interpolator.
CubicSplineCubic spline data interpolator.
PPolyPiecewise polynomial in terms of coefficients and breakpoints.
Notes
The interpolator preserves monotonicity in the interpolation data and does not overshoot if the data is not smooth.
The first derivatives are guaranteed to be continuous, but the second derivatives may jump at \(x_k\).
Determines the derivatives at the points \(x_k\), \(f'_k\), by using PCHIP algorithm [1].
Let \(h_k = x_{k+1} - x_k\), and \(d_k = (y_{k+1} - y_k) / h_k\) are the slopes at internal points \(x_k\). If the signs of \(d_k\) and \(d_{k-1}\) are different or either of them equals zero, then \(f'_k = 0\). Otherwise, it is given by the weighted harmonic mean
\[\frac{w_1 + w_2}{f'_k} = \frac{w_1}{d_{k-1}} + \frac{w_2}{d_k}\]
where \(w_1 = 2 h_k + h_{k-1}\) and \(w_2 = h_k + 2 h_{k-1}\).
The end slopes are set using a one-sided scheme [2].
References
[1]F. N. Fritsch and J. Butland, A method for constructing local monotone piecewise cubic interpolants, SIAM J. Sci. Comput., 5(2), 300-304 (1984). :doi:`10.1137/0905021`.
[2]see, e.g., C. Moler, Numerical Computing with Matlab, 2004. :doi:`10.1137/1.9780898717952`
Methods
__call__(x[, nu, extrapolate])Evaluate the piecewise polynomial or its derivative.
derivative([nu])Construct a new piecewise polynomial representing the derivative.
antiderivative([nu])Construct a new piecewise polynomial representing the antiderivative.
roots([discontinuity, extrapolate])Find real roots of the piecewise polynomial.
- __annotations__ = {}¶
- __doc__ = "PCHIP 1-D monotonic cubic interpolation.\n\n ``x`` and ``y`` are arrays of values used to approximate some function f,\n with ``y = f(x)``. The interpolant uses monotonic cubic splines\n to find the value of new points. (PCHIP stands for Piecewise Cubic\n Hermite Interpolating Polynomial).\n\n Parameters\n ----------\n x : ndarray, shape (npoints, )\n A 1-D array of monotonically increasing real values. ``x`` cannot\n include duplicate values (otherwise f is overspecified)\n y : ndarray, shape (..., npoints, ...)\n A N-D array of real values. ``y``'s length along the interpolation\n axis must be equal to the length of ``x``. Use the ``axis``\n parameter to select the interpolation axis.\n axis : int, optional\n Axis in the ``y`` array corresponding to the x-coordinate values. Defaults\n to ``axis=0``.\n extrapolate : bool, optional\n Whether to extrapolate to out-of-bounds points based on first\n and last intervals, or to return NaNs.\n\n Methods\n -------\n __call__\n derivative\n antiderivative\n roots\n\n See Also\n --------\n CubicHermiteSpline : Piecewise-cubic interpolator.\n Akima1DInterpolator : Akima 1D interpolator.\n CubicSpline : Cubic spline data interpolator.\n PPoly : Piecewise polynomial in terms of coefficients and breakpoints.\n\n Notes\n -----\n The interpolator preserves monotonicity in the interpolation data and does\n not overshoot if the data is not smooth.\n\n The first derivatives are guaranteed to be continuous, but the second\n derivatives may jump at :math:`x_k`.\n\n Determines the derivatives at the points :math:`x_k`, :math:`f'_k`,\n by using PCHIP algorithm [1]_.\n\n Let :math:`h_k = x_{k+1} - x_k`, and :math:`d_k = (y_{k+1} - y_k) / h_k`\n are the slopes at internal points :math:`x_k`.\n If the signs of :math:`d_k` and :math:`d_{k-1}` are different or either of\n them equals zero, then :math:`f'_k = 0`. Otherwise, it is given by the\n weighted harmonic mean\n\n .. math::\n\n \\frac{w_1 + w_2}{f'_k} = \\frac{w_1}{d_{k-1}} + \\frac{w_2}{d_k}\n\n where :math:`w_1 = 2 h_k + h_{k-1}` and :math:`w_2 = h_k + 2 h_{k-1}`.\n\n The end slopes are set using a one-sided scheme [2]_.\n\n\n References\n ----------\n .. [1] F. N. Fritsch and J. Butland,\n A method for constructing local\n monotone piecewise cubic interpolants,\n SIAM J. Sci. Comput., 5(2), 300-304 (1984).\n :doi:`10.1137/0905021`.\n .. [2] see, e.g., C. Moler, Numerical Computing with Matlab, 2004.\n :doi:`10.1137/1.9780898717952`\n\n "¶
- __module__ = 'scipy.interpolate._cubic'¶
- axis¶
- c¶
- extrapolate¶
- x¶
- _resolve_gap_endpoints_subset_snap(ts0, ts1, window, max_snap=None)[source]¶
- Contract:
- If window is None:
If there’s a natural gap (ts0.last < ts1.first), use that full gap.
Otherwise (overlap/abut), return None to signal ‘no explicit gap’ (algorithms decide).
- If window is provided:
Enforce: start < end; ts0 has a sample at or before start; ts1 has a sample at or after end. Else: ValueError.
If there is a natural gap AND (start,end) is a strict subset of it, expand start left and end right by up to max_snap (default 0) but never beyond the natural gap bounds. Otherwise, ignore max_snap.
Always snap endpoints to data: start_time = last ts0 sample <= effective start, end_time = first ts1 sample >= effective end.
- Returns:
(start_time, end_time) or None if no explicit gap is to be used.
- transition_ts(ts0, ts1, method='linear', window=None, overlap=(0, 0), return_type='series', names=None, max_snap=None)[source]¶
Create a smooth transition between two aligned time series.
- Parameters:
- ts0pandas.Series or pandas.DataFrame
The initial time series segment. Must share the same frequency and type as ts1.
- ts1pandas.Series or pandas.DataFrame
The final time series segment. Must share the same frequency and type as ts0.
- method{“linear”, “pchip”}, default=”linear”
The interpolation method to use for generating the transition.
- window[start, end] or None
If None and there’s a natural gap (ts0.last < ts1.first), that full gap is used. If provided, start<end, ts0 must have samples at/before start, ts1 at/after end.
- namesNone, str, or iterable of str, optional
If None (default), inputs must share compatible column names.
If str, the output is univariate and will be named accordingly.
If iterable, it is used as a subset/ordering of columns.
- overlaptuple of int or str, default=(0, 0)
Amount of overlap to use for interpolation anchoring in pchip mode. Each entry can be:
- An integer: number of data points before/after to use.
- A pandas-compatible frequency string: e.g., “2h” or “45min”.
- max_snapNone | Timedelta-like | (Timedelta-like, Timedelta-like)
Optional widening ONLY when window is strictly inside the natural gap. Expands start earlier and end later by up to max_snap, but never past (ts0.last, ts1.first). Default None = no widening.
- return_type{“series”, “glue”}, default=”series”
“series”: returns the full merged series including ts0, transition, ts1.
“glue”: returns only the interpolated transition segment.
- Returns:
- pandas.Series or pandas.DataFrame
The resulting time series segment, either the full merged series or just the transition zone.
- Raises:
- ValueError
If ts0 and ts1 have mismatched types or frequencies, or if the two series overlap in time but no explicit window is specified.
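A minimal usage sketch, assuming transition_ts is imported from a vtools.functions submodule (the exact import path is an assumption here) and that the two hourly series are separated by a natural gap:

    import pandas as pd
    from vtools.functions.transition import transition_ts   # module path assumed

    # Two hourly series; the interval 06:00-11:00 is missing.
    ts0 = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                    index=pd.date_range("2020-01-01 00:00", periods=6, freq="h"))
    ts1 = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
                    index=pd.date_range("2020-01-01 12:00", periods=6, freq="h"))

    # Full merged series with a linear ramp across the natural gap.
    full = transition_ts(ts0, ts1, method="linear", return_type="series")

    # Only the interpolated segment, anchored with 2 hours of data on each side.
    glue = transition_ts(ts0, ts1, method="pchip", overlap=("2h", "2h"),
                         return_type="glue")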
vtools.functions.unit_conversions module¶
Unit conversion helpers.
- This module provides:
- Linear/affine converters for common engineering units: metres↔feet, cms↔cfs, °F↔°C (all functional, no in-place mutation).
- Domain-specific conversions between electrical conductivity (EC, μS/cm) and practical salinity (PSU) at 25 °C, with optional Hill low-salinity correction and an accuracy-improving root-finding “refinement” step.
- A general-purpose unit conversion function convert_units() that uses Pint by default (with an optional cf_units backend via an environment variable) and has fast paths for the common conversions above.
Notes¶
PSU is treated here as a practical “unit” for salinity in workflows, even though in a strict metrological sense it is unitless.
The EC↔PSU conversions assume 25 °C and no explicit temperature dependence beyond the optional Hill correction.
References¶
Schemel, L.E. (2001) Empirical relationships between salinity and specific conductance in San Francisco Bay, California.
Hill, K. (low-salinity correction widely used in estuarine practice).
- _get_converter(iu: str, ou: str)[source]¶
Return a callable(arr)->arr using Pint by default; cf_units if env-forced.
- _norm(u: str) str[source]¶
Normalize common shorthands to canonical spellings without destroying case needed by Pint (e.g., degC/degF).
- celsius_to_fahrenheit(x)[source]¶
Convert °C to °F.
- Parameters:
- xscalar | array-like | pd.Series | pd.DataFrame
Value(s) in degrees Celsius.
- Returns:
- same type as x
Value(s) in degrees Fahrenheit.
- cfs_to_cms(x)[source]¶
Convert ft³/s to m³/s.
- Parameters:
- xscalar | array-like | pd.Series | pd.DataFrame
Value(s) in cfs.
- Returns:
- same type as x
Value(s) in cubic meters per second.
- cms_to_cfs(x)[source]¶
Convert m³/s to ft³/s.
- Parameters:
- xscalar | array-like | pd.Series | pd.DataFrame
Value(s) in cms.
- Returns:
- same type as x
Value(s) in cfs.
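A small example of the type-preserving converters above, assuming they are imported from vtools.functions.unit_conversions:

    import pandas as pd
    from vtools.functions.unit_conversions import cms_to_cfs, celsius_to_fahrenheit

    flow_cms = pd.Series([10.0, 20.0],
                         index=pd.date_range("2020-01-01", periods=2, freq="D"))
    flow_cfs = cms_to_cfs(flow_cms)          # Series in, Series out (same index)
    temp_f = celsius_to_fahrenheit(25.0)     # scalar in, scalar out (77.0)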
- convert_units(values, in_unit: str, out_unit: str)[source]¶
Convert array-like or pandas objects between units. Fast custom paths are used for EC↔PSU at 25 °C, temperature, cfs↔cms, and ft↔m; other conversions are Pint-backed.
- Parameters:
- valuesarray-like | pd.Series | pd.DataFrame
- in_unit, out_unitstr
Unit strings. Shorthands like ‘cfs’,’cms’,’ft3/s’,’μS/cm’,’deg F’ accepted.
- Returns:
- Same type as values, converted.
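A minimal example, assuming convert_units is imported from vtools.functions.unit_conversions and using only the shorthand spellings listed above:

    import pandas as pd
    from vtools.functions.unit_conversions import convert_units

    stage_ft = pd.Series([3.0, 4.5],
                         index=pd.date_range("2020-01-01", periods=2, freq="h"))
    stage_m = convert_units(stage_ft, "ft", "m")             # ft↔m fast path
    flow_cfs = convert_units([100.0, 250.0], "cms", "cfs")   # shorthand unit names
    temp_c = convert_units(pd.Series([68.0, 86.0]), "deg F", "degC")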
- ec_psu_25c(ec, hill_correction=True)[source]¶
Convert electrical conductivity (EC, μS/cm) to practical salinity (PSU) at 25 °C.
This implements the empirical relationship used for estuarine work, with an optional Hill correction that improves behavior at low salinities.
- Parameters:
- ecarray-like or scalar
Electrical conductivity in μS/cm.
- hill_correctionbool, default True
Apply Hill low-salinity correction.
- Returns:
- ndarray or scalar
Practical salinity (PSU). For negative EC inputs:
- scalar input → returns NaN
- array input → returns NaN at those positions
Notes
Assumes temperature is 25 °C.
Negative EC values are internally floored to a small positive ratio for computation (R=1e-4); those outputs are then set to NaN on array paths (or NaN returned for scalar paths).
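A small example, assuming ec_psu_25c is imported from vtools.functions.unit_conversions; the negative entry illustrates the NaN behavior described above:

    import numpy as np
    from vtools.functions.unit_conversions import ec_psu_25c

    ec = np.array([200.0, 10000.0, 45000.0, -5.0])   # μS/cm; last value is invalid
    psu = ec_psu_25c(ec)                             # NaN at the negative position
    psu_scalar = ec_psu_25c(35000.0)                 # scalar in, scalar out
    psu_no_hill = ec_psu_25c(ec, hill_correction=False)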
- fahrenheit_to_celsius(x)[source]¶
Convert °F to °C.
- Parameters:
- xscalar | array-like | pd.Series | pd.DataFrame
Value(s) in degrees Fahrenheit.
- Returns:
- same type as x
Value(s) in degrees Celsius.
- ft_to_m(x)[source]¶
Convert feet to metres.
- Parameters:
- xscalar | array-like | pd.Series | pd.DataFrame
Value(s) in feet.
- Returns:
- same type as x
Value(s) in metres.
- m_to_ft(x)[source]¶
Convert metres to feet.
- Parameters:
- xscalar | array-like | pd.Series | pd.DataFrame
Value(s) in metres.
- Returns:
- same type as x
Value(s) in feet.
- psu_ec_25c(psu, refine=True, hill_correction=True)[source]¶
Convert practical salinity (PSU) to EC (μS/cm) at 25 °C (vectorized).
- Parameters:
- psuarray-like or scalar
Practical salinity value(s).
- refinebool, default True
Use root finding via psu_ec_25c_scalar() for accuracy.
- hill_correctionbool, default True
See psu_ec_25c_scalar().
- Returns:
- ndarray or scalar
EC in μS/cm. Scalar input returns a scalar; array-like input returns a NumPy array of the same shape.
- psu_ec_25c_scalar(psu, refine=True, hill_correction=True)[source]¶
Convert practical salinity (PSU) to EC (μS/cm) at 25 °C for a scalar value.
- Parameters:
- psufloat
Practical salinity. Must be non-negative and ≤ ~35 for oceanic cases (a hard check is enforced near sea salinity when refine is True).
- refinebool, default True
If True, use a scalar root finder (Brent) to invert the EC→PSU mapping accurately. If False, use a closed-form Schemel-style polynomial approximation.
- hill_correctionbool, default True
Only meaningful with refine=True; raises if refine=False and hill_correction=True.
- Returns:
- float
Electrical conductivity (μS/cm).
- Raises:
- ValueError
If psu < 0, if psu exceeds the sea-salinity cap in refine mode, or if an invalid combination of refine/hill_correction is requested.
Notes
The refinement typically converges in ~4–6 iterations.
The non-refined polynomial is faster but can drift on round trips (EC→PSU→EC).
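A round-trip sketch, assuming both functions are imported from vtools.functions.unit_conversions; exact values depend on the refinement and Hill-correction options:

    import numpy as np
    from vtools.functions.unit_conversions import psu_ec_25c, ec_psu_25c

    psu = np.array([0.5, 5.0, 30.0])
    ec = psu_ec_25c(psu)        # refined (Brent root-finding) inversion
    back = ec_psu_25c(ec)       # round trip should closely recover psu

    # Faster polynomial approximation; hill_correction must be disabled when refine=False.
    ec_fast = psu_ec_25c(psu, refine=False, hill_correction=False)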
