Statistics

kpss(ts, regression, alpha)[source]

KPSS test.

Parameters:

ts (TimeSeries)
regression (str)
alpha (float)

Return type:

StationarityResult

pp(ts, regression, alpha)[source]

Phillips-Perron test (requires statsmodels).

Parameters:

ts (TimeSeries)
regression (str)
alpha (float)

Return type:

StationarityResult

summary(ts, regression, alpha)[source]

Run ADF + KPSS and return a combined verdict string.

Parameters:

ts (TimeSeries)
regression (str)
alpha (float)

Return type:

str

Notes

When statsmodels is installed, the adf and kpss methods automatically use its implementations, which have more accurate critical-value tables. The pp method always requires statsmodels. Install it with pip install statsmodels.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.stationarity import StationarityTester

>>> rng = np.random.default_rng(0)
>>> idx = pd.date_range("2020", periods=200, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(200), index=idx)
>>> r   = StationarityTester().adf(ts)
>>> r.is_stationary
True

adf(ts, *, maxlag=None, regression='c', alpha=0.05)[source]

Augmented Dickey-Fuller unit-root test.

H₀: The series has a unit root (is non-stationary). H₁: The series is stationary.

Reject H₀ (small p-value) → evidence of stationarity.

Parameters:

ts (TimeSeries) – Input series. NaN values are dropped before testing.
maxlag (int, optional) – Maximum lag to consider for AIC-based lag selection. Defaults to int(12 * (n / 100) ** 0.25) (Schwert 1989).
regression (str, optional) –
Deterministic terms to include in the test equation.
- "nc" — no constant, no trend.
- "c" — constant only (default).
- "ct" — constant + linear trend.
alpha (float, optional) – Significance level for is_stationary. Default 0.05.

Return type:

StationarityResult

Raises:

TypeError – If ts is not a TimeSeries.
ValueError – If there are fewer than 10 non-NaN observations or regression is invalid.
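The default maxlag rule quoted above can be evaluated by hand. The following is a minimal sketch of that formula only; the helper name schwert_maxlag is hypothetical and is not part of tseda.

```python
def schwert_maxlag(n_obs: int) -> int:
    """Evaluate the documented default: int(12 * (n / 100) ** 0.25)."""
    if n_obs < 1:
        raise ValueError("n_obs must be a positive integer")
    return int(12 * (n_obs / 100) ** 0.25)


# Default maximum lag for a few series lengths.
for n in (100, 150, 200, 300):
    print(n, schwert_maxlag(n))
```

The rule grows slowly with the sample size, so doubling the series length adds only a couple of candidate lags to the AIC search.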

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.stationarity import StationarityTester
>>> rng = np.random.default_rng(1)
>>> idx = pd.date_range("2020", periods=150, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(150), index=idx)
>>> StationarityTester().adf(ts).is_stationary
True

kpss(ts, *, regression='c', alpha=0.05)[source]

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) stationarity test.

H₀: The series is level (or trend) stationary. H₁: The series has a unit root.

Fail to reject H₀ (large p-value) → evidence of stationarity. This is the opposite null from ADF.

Parameters:

ts (TimeSeries) – Input series.
regression (str, optional) – "c" — test for level stationarity (default). "ct" — test for trend stationarity.
alpha (float, optional) – Significance level. Default 0.05.

Return type:

StationarityResult

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.stationarity import StationarityTester
>>> rng = np.random.default_rng(2)
>>> idx = pd.date_range("2020", periods=150, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(150), index=idx)
>>> StationarityTester().kpss(ts).is_stationary
True

pp(ts, *, regression='c', alpha=0.05)[source]

Phillips-Perron unit-root test.

Like ADF but uses a non-parametric correction for serial correlation (no lag selection required). Requires statsmodels.

H₀: The series has a unit root. H₁: The series is stationary.

Parameters:

ts (TimeSeries) – Input series.
regression (str, optional) – "c" (default) or "ct".
alpha (float, optional) – Significance level. Default 0.05.

Return type:

StationarityResult

Raises:

ImportError – If statsmodels is not installed.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.stationarity import StationarityTester
>>> rng = np.random.default_rng(3)
>>> idx = pd.date_range("2020", periods=200, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(200), index=idx)
>>> StationarityTester().pp(ts).is_stationary
True

summary(ts, *, regression='c', alpha=0.05)[source]

Run ADF + KPSS and return a human-readable combined verdict.

The two tests have opposite nulls, so their results can be reconciled:

ADF	KPSS	Verdict
stationary	stationary	Strong evidence of stationarity
stationary	non-stationary	Trend stationary; consider detrending
non-stationary	stationary	Difference stationary; try differencing
non-stationary	non-stationary	Strong evidence of non-stationarity
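The reconciliation in the table above amounts to a lookup on two boolean flags. The sketch below mirrors the table as documented; the function name combined_verdict is hypothetical, and the exact wording that summary() returns is not shown in this reference, so the strings are illustrative.

```python
def combined_verdict(adf_stationary: bool, kpss_stationary: bool) -> str:
    """Map the two is_stationary flags to the verdict listed in the table."""
    table = {
        (True, True): "Strong evidence of stationarity",
        (True, False): "Trend stationary; consider detrending",
        (False, True): "Difference stationary; try differencing",
        (False, False): "Strong evidence of non-stationarity",
    }
    return table[(adf_stationary, kpss_stationary)]


# ADF rejects the unit root and KPSS does not reject stationarity.
print(combined_verdict(True, True))
```

Keying the lookup on the pair of flags keeps all four cases visible in one place, which makes the two disagreement cases easy to audit.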

Parameters:

ts (TimeSeries) – Input series.
regression (str, optional) – Passed to both ADF and KPSS. Default "c".
alpha (float, optional) – Significance level. Default 0.05.

Returns:

Multi-line plain-English summary.

Return type:

str

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.stationarity import StationarityTester
>>> rng = np.random.default_rng(0)
>>> idx = pd.date_range("2020", periods=200, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(200), index=idx)
>>> print(StationarityTester().summary(ts))

class tseda.statistics.AutocorrelationResult(acf, pacf, lags, conf_lower, conf_upper, lb_statistic, lb_pvalue, n_lags, n_obs, is_white_noise, alpha)[source]

Bases: object

Immutable autocorrelation analysis result.

Parameters:

acf (ndarray)
pacf (ndarray)
lags (ndarray)
conf_lower (ndarray)
conf_upper (ndarray)
lb_statistic (ndarray)
lb_pvalue (ndarray)
n_lags (int)
n_obs (int)
is_white_noise (bool)
alpha (float)

acf

Autocorrelation function values at lags 0, 1, …, n_lags. acf[0] is always 1.0 (lag-0 autocorrelation).

Type: numpy.ndarray

pacf

Partial autocorrelation function values at lags 0, 1, …, n_lags. pacf[0] is always 1.0 by convention.

Type: numpy.ndarray

lags

Integer array [0, 1, …, n_lags].

Type: numpy.ndarray

conf_lower

Lower confidence bound at each lag at level 1 - alpha (95 % for the default alpha = 0.05), from Bartlett’s approximation.

Type: numpy.ndarray

conf_upper

Upper confidence bound at each lag at level 1 - alpha (95 % for the default alpha = 0.05).

Type: numpy.ndarray

lb_statistic

Ljung-Box Q-statistic at each lag from 1 to n_lags.

Type: numpy.ndarray

lb_pvalue

P-value of the Ljung-Box test at each lag.

Type: numpy.ndarray

n_lags

Number of lags requested (excluding lag 0).

Type: int

n_obs

Number of non-NaN observations used.

Type: int

is_white_noise

True when the Ljung-Box p-value at lag min(n_lags, 20) exceeds alpha.

Type: bool

alpha

Significance level used for is_white_noise and confidence bounds.

Type: float

__repr__()[source]

Return repr(self).

Return type: str

__init__(acf, pacf, lags, conf_lower, conf_upper, lb_statistic, lb_pvalue, n_lags, n_obs, is_white_noise, alpha)

Parameters:

acf (ndarray)
pacf (ndarray)
lags (ndarray)
conf_lower (ndarray)
conf_upper (ndarray)
lb_statistic (ndarray)
lb_pvalue (ndarray)
n_lags (int)
n_obs (int)
is_white_noise (bool)
alpha (float)

Return type:

None

class tseda.statistics.AutocorrelationAnalyzer[source]

Bases: object

Compute ACF, PACF, and Ljung-Box statistics for a TimeSeries.

This class is stateless.

analyze(ts, lags, alpha)[source]

Return an AutocorrelationResult.

Parameters:

ts (TimeSeries)
lags (int)
alpha (float)

Return type:

AutocorrelationResult

significant_lags(result, which)[source]

Return the lag numbers where the ACF or PACF falls outside its confidence bounds.

Parameters:

result (AutocorrelationResult)
which (str)

Return type:

ndarray

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.autocorrelation import AutocorrelationAnalyzer

AR(1) process:

>>> rng = np.random.default_rng(7)
>>> n   = 300
>>> idx = pd.date_range("2020", periods=n, freq="D")
>>> eps = rng.standard_normal(n)
>>> x   = np.zeros(n)
>>> for i in range(1, n): x[i] = 0.7 * x[i-1] + eps[i]
>>> ts  = TimeSeries(x, index=idx)
>>> r   = AutocorrelationAnalyzer().analyze(ts, lags=10)
>>> r.acf[1] > 0.5          # strong lag-1 autocorrelation
True
>>> r.is_white_noise         # definitely not white noise
False

analyze(ts, lags=40, *, alpha=0.05)[source]

Compute ACF, PACF, and Ljung-Box statistics.

Parameters:

ts (TimeSeries) – Input series. NaN values are dropped before analysis.
lags (int, optional) – Number of lags to compute (lag 0 is always included). Capped at n // 2. Default 40.
alpha (float, optional) – Significance level for confidence bounds and is_white_noise. Default 0.05.

Return type:

AutocorrelationResult

Raises:

TypeError – If ts is not a TimeSeries.
ValueError – If lags is out of range or there are fewer than 4 non-NaN observations.
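To make the three quantities concrete, here is a standard-library sketch of the textbook formulas for the sample ACF, Bartlett confidence half-widths, and the Ljung-Box Q-statistic. It is not the tseda implementation: the function names are hypothetical, and centering the bounds on zero and dividing the autocovariance by n are assumptions.

```python
import math
from statistics import NormalDist


def acf(x, n_lags):
    """Sample autocorrelation at lags 0..n_lags (autocovariance divided by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(n_lags + 1)]


def bartlett_halfwidth(rho, n, alpha=0.05):
    """Half-width of the band at lags 1..n_lags: z * sqrt((1 + 2*sum_{j<k} rho_j^2) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return [z * math.sqrt((1 + 2 * sum(r * r for r in rho[1:k])) / n)
            for k in range(1, len(rho))]


def ljung_box_q(rho, n):
    """Ljung-Box Q at h = 1..n_lags: n * (n + 2) * sum_{k<=h} rho_k^2 / (n - k)."""
    q, running = [], 0.0
    for k in range(1, len(rho)):
        running += rho[k] ** 2 / (n - k)
        q.append(n * (n + 2) * running)
    return q


x = [1.0, -1.0] * 4                # strongly alternating series
rho = acf(x, n_lags=2)
print(rho)                         # lag-0 value is 1.0 by construction
print(bartlett_halfwidth(rho, len(x)))
print(ljung_box_q(rho, len(x)))
```

The p-value at lag h is the upper tail of a chi-squared distribution with h degrees of freedom evaluated at Q; it is omitted here to keep the sketch free of third-party imports.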

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.autocorrelation import AutocorrelationAnalyzer
>>> idx = pd.date_range("2020", periods=50, freq="D")
>>> ts  = TimeSeries(np.ones(50), index=idx)
>>> r   = AutocorrelationAnalyzer().analyze(ts, lags=5)
>>> r.acf[0]
1.0

significant_lags(result, *, which='acf')[source]

Return the lag numbers (> 0) where the selected function (ACF or PACF) falls outside its confidence bounds.

Parameters:

result (AutocorrelationResult) – Output of analyze().
which (str, optional) – "acf" (default) or "pacf".

Returns:

Integer array of significant lag numbers.

Return type:

numpy.ndarray
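The selection rule can be sketched on plain lists. This is an illustration rather than the library code: the helper name lags_outside_bounds is hypothetical, and it assumes that the value and bound arrays are all indexed by lag with lag 0 at position 0.

```python
def lags_outside_bounds(values, lower, upper):
    """Return lags k > 0 whose value lies outside [lower[k], upper[k]]."""
    return [k for k in range(1, len(values))
            if values[k] < lower[k] or values[k] > upper[k]]


acf_values = [1.0, 0.6, 0.1, -0.4]   # index 0 is lag 0 and is skipped
lower = [-0.2] * 4
upper = [0.2] * 4
print(lags_outside_bounds(acf_values, lower, upper))
```

Lag 0 is excluded because its value is 1.0 by definition and would always count as significant.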

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.autocorrelation import AutocorrelationAnalyzer

>>> rng = np.random.default_rng(7)
>>> n   = 300
>>> idx = pd.date_range("2020", periods=n, freq="D")
>>> eps = rng.standard_normal(n)
>>> x   = np.zeros(n)
>>> for i in range(1, n): x[i] = 0.7 * x[i-1] + eps[i]
>>> ts  = TimeSeries(x, index=idx)
>>> r   = AutocorrelationAnalyzer().analyze(ts, lags=10)
>>> len(AutocorrelationAnalyzer().significant_lags(r)) > 0
True

Descriptive statistics for time series.

Provides a single DescriptiveStats result object and a stateless DescriptiveAnalyzer that computes it. All arithmetic uses numpy so there are no extra dependencies beyond the core stack.

The statistics reported go beyond what pandas.Series.describe() offers:

Robust location / spread (median, MAD, trimmed mean).
Shape (skewness, excess kurtosis).
Quantiles at multiple probability levels.
First/last value, range, coefficient of variation.
Counts of zero, strictly positive, and strictly negative values.
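To show why the robust measures in the list above are useful, the sketch below computes the median absolute deviation, a 5 % trimmed mean, and the coefficient of variation with the standard library. The function names are hypothetical, and flooring the number of trimmed values is an assumption; tseda may round differently.

```python
import statistics


def mad(x):
    """Median absolute deviation: median(|x - median(x)|)."""
    med = statistics.median(x)
    return statistics.median([abs(v - med) for v in x])


def trimmed_mean(x, proportion=0.05):
    """Mean after dropping the lowest and highest `proportion` of values."""
    s = sorted(x)
    k = int(len(s) * proportion)     # assumption: trimmed count is floored
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)


def coefficient_of_variation(x):
    """std / |mean| with ddof=1; nan when the mean is exactly zero."""
    mean = statistics.fmean(x)
    if mean == 0:
        return float("nan")
    return statistics.stdev(x) / abs(mean)


values = list(range(1, 20)) + [1000]   # 19 ordinary points and one outlier
print(statistics.fmean(values))        # pulled far upward by the outlier
print(trimmed_mean(values))            # outlier and smallest value removed
print(statistics.median(values), mad(values))
```

A single extreme value moves the arithmetic mean a long way, while the median, MAD, and trimmed mean stay close to the bulk of the data.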

Classes

DescriptiveStats: Frozen dataclass containing every computed statistic.
DescriptiveAnalyzer: Stateless analyzer that produces DescriptiveStats.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.descriptive import DescriptiveAnalyzer

>>> rng = np.random.default_rng(0)
>>> idx = pd.date_range("2020-01-01", periods=200, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(200), index=idx, name="returns")
>>> r   = DescriptiveAnalyzer().analyze(ts)
>>> round(r.mean, 3)
0.024

class tseda.statistics.descriptive.DescriptiveStats(n_total, n_valid, n_nan, pct_nan, mean, median, trimmed_mean, std, var, mad, cv, min, max, range, first, last, skewness, kurtosis, quantiles, n_zeros, n_positive, n_negative)[source]

Bases: object

Comprehensive descriptive statistics for a TimeSeries.

All statistics are computed on the non-NaN subset unless otherwise noted.

Parameters:

n_total (int)
n_valid (int)
n_nan (int)
pct_nan (float)
mean (float)
median (float)
trimmed_mean (float)
std (float)
var (float)
mad (float)
cv (float)
min (float)
max (float)
range (float)
first (float)
last (float)
skewness (float)
kurtosis (float)
quantiles (Dict[float, float])
n_zeros (int)
n_positive (int)
n_negative (int)

n_total

Total number of observations (including NaN).

Type: int

n_valid

Number of non-NaN observations.

Type: int

n_nan

Number of NaN observations.

Type: int

pct_nan

Percentage of NaN observations (0–100).

Type: float

mean

Arithmetic mean.

Type: float

median

50th percentile.

Type: float

std

Sample standard deviation (ddof=1).

Type: float

var

Sample variance (ddof=1).

Type: float

mad

Median absolute deviation: median(|x - median(x)|).

Type: float

trimmed_mean

Mean with the top and bottom 5 % of values removed.

Type: float

min

Minimum value.

Type: float

max

Maximum value.

Type: float

range

max - min.

Type: float

first

First (earliest) non-NaN value.

Type: float

last

Last (most recent) non-NaN value.

Type: float

cv

Coefficient of variation: std / |mean|. nan when mean == 0.

Type: float

skewness

Fisher’s moment coefficient of skewness (bias-corrected).

Type: float

kurtosis

Excess kurtosis (Fisher definition, bias-corrected). 0 for a normal distribution.

Type: float

quantiles

Mapping from probability level to quantile value. Keys: [0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99].

Type: dict of float → float

n_zeros

Number of exact zeros.

Type: int

n_positive

Number of strictly positive values.

Type: int

n_negative

Number of strictly negative values.

Type: int
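The skewness and kurtosis attributes above are described as bias-corrected, but this reference does not state the formulas. The sketch below uses the common adjusted Fisher-Pearson forms (the ones pandas uses for skew() and kurt()); treating them as what tseda computes is an assumption, and the function names are hypothetical.

```python
import math


def _central_moment(x, order):
    """Population central moment of the given order (divides by n)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def sample_skewness(x):
    """Adjusted Fisher-Pearson skewness: g1 * sqrt(n * (n - 1)) / (n - 2)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    g1 = _central_moment(x, 3) / _central_moment(x, 2) ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def sample_excess_kurtosis(x):
    """Bias-corrected excess kurtosis: (n-1)/((n-2)(n-3)) * ((n+1)*g2 + 6)."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    g2 = _central_moment(x, 4) / _central_moment(x, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_skewness(data))          # symmetric data, so the skewness is zero
print(sample_excess_kurtosis(data))   # flatter than normal, so the value is negative
```

Both corrections matter mainly for short series; as n grows the factors tend to 1 and the estimates approach the plain moment ratios.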

__repr__()[source]

Return repr(self).

Return type: str

__init__(n_total, n_valid, n_nan, pct_nan, mean, median, trimmed_mean, std, var, mad, cv, min, max, range, first, last, skewness, kurtosis, quantiles, n_zeros, n_positive, n_negative)

Parameters:

n_total (int)
n_valid (int)
n_nan (int)
pct_nan (float)
mean (float)
median (float)
trimmed_mean (float)
std (float)
var (float)
mad (float)
cv (float)
min (float)
max (float)
range (float)
first (float)
last (float)
skewness (float)
kurtosis (float)
quantiles (Dict[float, float])
n_zeros (int)
n_positive (int)
n_negative (int)

Return type:

None

class tseda.statistics.descriptive.DescriptiveAnalyzer[source]

Bases: object

Compute comprehensive descriptive statistics for a TimeSeries.

This class is stateless — one instance, many series.

analyze(ts)[source]

Return a DescriptiveStats for ts.

Parameters: ts (TimeSeries)
Return type: DescriptiveStats

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.descriptive import DescriptiveAnalyzer

>>> idx = pd.date_range("2020", periods=5, freq="D")
>>> ts  = TimeSeries([2.0, 4.0, 4.0, 4.0, 5.0], index=idx)
>>> r   = DescriptiveAnalyzer().analyze(ts)
>>> r.mean
3.8
>>> r.std
1.09...

analyze(ts)[source]

Compute descriptive statistics for ts.

Parameters:

ts (TimeSeries) – Input series.

Return type:

DescriptiveStats

Raises:

TypeError – If ts is not a TimeSeries.
ValueError – If ts has no non-NaN values.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.descriptive import DescriptiveAnalyzer

>>> idx = pd.date_range("2020", periods=4, freq="D")
>>> ts  = TimeSeries([1.0, 2.0, 3.0, 4.0], index=idx)
>>> r   = DescriptiveAnalyzer().analyze(ts)
>>> r.median
2.5
>>> r.n_positive
4

Stationarity testing for time series.

Three widely used tests are provided. ADF and KPSS are implemented with a dual-path strategy, while PP always requires statsmodels:

Primary path: a pure numpy / scipy implementation, so ADF and KPSS work without statsmodels.
Fast path: if statsmodels is installed, the well-tested statsmodels.tsa.stattools implementations are used instead, which have more reliable critical-value tables.
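The two paths described above rest on detecting an optional dependency at runtime. The following is a generic sketch of that pattern, not the tseda source; the function names has_statsmodels and choose_backend are hypothetical.

```python
import importlib.util


def has_statsmodels() -> bool:
    """Report whether statsmodels can be imported, without importing it."""
    return importlib.util.find_spec("statsmodels") is not None


def choose_backend() -> str:
    """Prefer statsmodels when present, else fall back to the numpy path."""
    return "statsmodels" if has_statsmodels() else "numpy"


print(choose_backend())
```

Checking with find_spec avoids paying the import cost until a test is actually run, and it keeps the package importable on systems where statsmodels is absent.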

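Test	H₀	Rejecting H₀ indicates	Notes
ADF	Unit root exists	Evidence against a unit root (stationarity)	Lag order selected by AIC
KPSS	Series is level (or trend) stationary	Evidence of non-stationarity	Opposite null from ADF
PP	Unit root exists	Evidence against a unit root (stationarity)	Robust to serial correlation without requiring lag selection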

The combined StationarityTester.summary() method reconciles the ADF and KPSS results and returns a human-readable verdict with a recommended action.

Classes

StationarityResult: Frozen dataclass for a single test’s output.
StationarityTester: Stateless tester.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.stationarity import StationarityTester

Stationary white noise:

>>> rng = np.random.default_rng(42)
>>> idx = pd.date_range("2020", periods=300, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(300), index=idx)
>>> r   = StationarityTester().adf(ts)
>>> r.is_stationary   # p < 0.05
True

Random walk (non-stationary):

>>> rw  = TimeSeries(np.cumsum(rng.standard_normal(300)), index=idx)
>>> r2  = StationarityTester().adf(rw)
>>> r2.is_stationary
False

class tseda.statistics.stationarity.StationarityResult(test_name, statistic, p_value, critical_values, n_lags, regression, is_stationary, alpha, interpretation)[source]

Bases: object

Immutable result of a stationarity test.

Parameters:

test_name (str)
statistic (float)
p_value (float)
critical_values (dict)
n_lags (int | None)
regression (str)
is_stationary (bool)
alpha (float)
interpretation (str)

test_name

Name of the test (e.g., "ADF").

Type: str

statistic

Test statistic value.

Type: float

p_value

Approximate p-value.

Type: float

critical_values

Critical values at standard significance levels ("1%", "5%", "10%").

Type: dict of str → float

n_lags

Number of lags used (None for tests that do not select lags).

Type: int or None

regression

Regression type used ("nc", "c", or "ct").

Type: str

is_stationary

Convenience flag. For ADF / PP: p_value < alpha (reject unit root → evidence of stationarity). For KPSS: p_value > alpha (fail to reject stationarity null).

Type: bool
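Because the tests have opposite null hypotheses, the same p-value leads to opposite flags. The sketch below restates the rule for is_stationary; the function name stationarity_flag is hypothetical, and the labels "KPSS" and "PP" are assumed by analogy with the documented "ADF".

```python
def stationarity_flag(test_name: str, p_value: float, alpha: float = 0.05) -> bool:
    """Apply the documented decision rule for the given test."""
    name = test_name.upper()
    if name in ("ADF", "PP"):
        # Null hypothesis is a unit root: a small p-value rejects it.
        return p_value < alpha
    if name == "KPSS":
        # Null hypothesis is stationarity: a large p-value fails to reject it.
        return p_value > alpha
    raise ValueError(f"unknown test: {test_name!r}")


print(stationarity_flag("ADF", 0.01))    # unit root rejected
print(stationarity_flag("KPSS", 0.01))   # stationarity rejected
```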

alpha

Significance level used to set is_stationary.

Type: float

interpretation

One-sentence plain-English summary of the result.

Type: str

__repr__()[source]

Return repr(self).

Return type: str

__init__(test_name, statistic, p_value, critical_values, n_lags, regression, is_stationary, alpha, interpretation)

Parameters:

test_name (str)
statistic (float)
p_value (float)
critical_values (dict)
n_lags (int | None)
regression (str)
is_stationary (bool)
alpha (float)
interpretation (str)

Return type:

None

class tseda.statistics.stationarity.StationarityTester[source]

Bases: object

Test a TimeSeries for stationarity.

The adf, kpss, and pp methods return a StationarityResult; summary returns a plain-text verdict string. The class is stateless.

adf(ts, maxlag, regression, alpha)[source]

Augmented Dickey-Fuller test.

Parameters:

ts (TimeSeries)
maxlag (int | None)
regression (str)
alpha (float)

Return type:

StationarityResult

kpss(ts, regression, alpha)[source]

KPSS test.

Parameters:

ts (TimeSeries)
regression (str)
alpha (float)

Return type:

StationarityResult

pp(ts, regression, alpha)[source]

Phillips-Perron test (requires statsmodels).

Parameters:

ts (TimeSeries)
regression (str)
alpha (float)

Return type:

StationarityResult

summary(ts, regression, alpha)[source]

Run ADF + KPSS and return a combined verdict string.

Parameters:

ts (TimeSeries)
regression (str)
alpha (float)

Return type:

str

Notes

When statsmodels is installed, the adf and kpss methods automatically use its implementations, which have more accurate critical-value tables. The pp method always requires statsmodels. Install it with pip install statsmodels.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.stationarity import StationarityTester

>>> rng = np.random.default_rng(0)
>>> idx = pd.date_range("2020", periods=200, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(200), index=idx)
>>> r   = StationarityTester().adf(ts)
>>> r.is_stationary
True

adf(ts, *, maxlag=None, regression='c', alpha=0.05)[source]

Augmented Dickey-Fuller unit-root test.

H₀: The series has a unit root (is non-stationary). H₁: The series is stationary.

Reject H₀ (small p-value) → evidence of stationarity.

Parameters:

ts (TimeSeries) – Input series. NaN values are dropped before testing.
maxlag (int, optional) – Maximum lag to consider for AIC-based lag selection. Defaults to int(12 * (n / 100) ** 0.25) (Schwert 1989).
regression (str, optional) –
Deterministic terms to include in the test equation.
- "nc" — no constant, no trend.
- "c" — constant only (default).
- "ct" — constant + linear trend.
alpha (float, optional) – Significance level for is_stationary. Default 0.05.

Return type:

StationarityResult

Raises:

TypeError – If ts is not a TimeSeries.
ValueError – If there are fewer than 10 non-NaN observations or regression is invalid.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.stationarity import StationarityTester
>>> rng = np.random.default_rng(1)
>>> idx = pd.date_range("2020", periods=150, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(150), index=idx)
>>> StationarityTester().adf(ts).is_stationary
True

kpss(ts, *, regression='c', alpha=0.05)[source]

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) stationarity test.

H₀: The series is level (or trend) stationary. H₁: The series has a unit root.

Fail to reject H₀ (large p-value) → evidence of stationarity. This is the opposite null from ADF.

Parameters:

ts (TimeSeries) – Input series.
regression (str, optional) – "c" — test for level stationarity (default). "ct" — test for trend stationarity.
alpha (float, optional) – Significance level. Default 0.05.

Return type:

StationarityResult

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.stationarity import StationarityTester
>>> rng = np.random.default_rng(2)
>>> idx = pd.date_range("2020", periods=150, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(150), index=idx)
>>> StationarityTester().kpss(ts).is_stationary
True

pp(ts, *, regression='c', alpha=0.05)[source]

Phillips-Perron unit-root test.

Like ADF but uses a non-parametric correction for serial correlation (no lag selection required). Requires statsmodels.

H₀: The series has a unit root. H₁: The series is stationary.

Parameters:

ts (TimeSeries) – Input series.
regression (str, optional) – "c" (default) or "ct".
alpha (float, optional) – Significance level. Default 0.05.

Return type:

StationarityResult

Raises:

ImportError – If statsmodels is not installed.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.statistics.stationarity import StationarityTester
>>> rng = np.random.default_rng(3)
>>> idx = pd.date_range("2020", periods=200, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(200), index=idx)
>>> StationarityTester().pp(ts).is_stationary
True

summary(ts, *, regression='c', alpha=0.05)[source]

Run ADF + KPSS and return a human-readable combined verdict.

The two tests have opposite nulls, so their results can be reconciled:

ADF	KPSS	Verdict
stationary	stationary	Strong evidence of stationarity
stationary	non-stationary	Trend stationary; consider detrending
non-stationary	stationary	Difference stationary; try differencing
non-stationary	non-stationary	Strong evidence of non-stationarity

Parameters:

ts (TimeSeries) – Input series.
regression (str, optional) – Passed to both ADF and KPSS. Default "c".
alpha (float, optional) – Significance level. Default 0.05.

Returns:

Multi-line plain-English summary.

Return type: