tseda.changepoint

Structural break (changepoint) detection for time series.

Public API

ChangepointReport: Frozen dataclass with changepoint positions, timestamps, scores, and a segment-label helper.
ChangepointDetector: Stateless detector providing CUSUM, binary segmentation, and variance-ratio methods.

class tseda.changepoint.ChangepointReport(changepoints, timestamps, n_changepoints, scores, method)[source]

Bases: object

Immutable changepoint detection result.

Parameters:

changepoints (List[int])
timestamps (DatetimeIndex)
n_changepoints (int)
scores (ndarray)
method (str)

changepoints

0-based integer positions of detected changepoints, sorted ascending. A changepoint at position k means the break occurs between observations k-1 and k.

Type: list of int

timestamps

Timestamps corresponding to each changepoint position.

Type: pandas.DatetimeIndex

n_changepoints

Number of detected changepoints.

Type: int

scores

Continuous changepoint score in [0, 1] for each observation. Higher values indicate stronger evidence of a structural break at or near that position.

Type: numpy.ndarray

method

Name of the detection method.

Type: str


segment_labels(n)[source]

Return a 0-indexed integer segment label for each of n observations.

Segment 0 spans [0, changepoints[0]), segment 1 spans [changepoints[0], changepoints[1]), and so on; the last segment spans [changepoints[-1], n).

Parameters: n (int) – Total number of observations.
Returns: Segment label of each observation, shape (n,).
Return type: numpy.ndarray of int

Examples

>>> from tseda.changepoint.detector import ChangepointReport
>>> import numpy as np, pandas as pd
>>> r = ChangepointReport(
...     changepoints=[3, 7],
...     timestamps=pd.date_range("2020", periods=10, freq="D")[[3, 7]],
...     n_changepoints=2,
...     scores=np.zeros(10),
...     method="test",
... )
>>> r.segment_labels(10).tolist()
[0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
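Because changepoints is sorted ascending, the label of observation i equals the number of changepoints that are less than or equal to i. A minimal sketch of that rule with numpy.searchsorted follows; segment_labels_sketch is a hypothetical helper, and this is not necessarily how segment_labels is implemented.

```python
import numpy as np


def segment_labels_sketch(changepoints, n):
    """Hypothetical re-implementation of ChangepointReport.segment_labels.

    Observation i gets label j when it lies in [changepoints[j-1], changepoints[j]),
    i.e. the label is the count of changepoints that are <= i.
    """
    positions = np.arange(n)
    # side="right" counts changepoints <= i, so position k itself starts a new segment
    return np.searchsorted(np.asarray(changepoints, dtype=int), positions, side="right")


print(segment_labels_sketch([3, 7], 10).tolist())  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
```

Using side="right" is what places observation k in the new segment, matching the convention that a changepoint at k falls between observations k-1 and k.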

__repr__()[source]

Return repr(self).

Return type: str


class tseda.changepoint.ChangepointDetector[source]

Bases: object

Detect structural breaks in a TimeSeries.

The class is stateless, so a single instance can be reused across many series.

Methods

cusum(ts, *, threshold=5.0, drift=0.5, target=None): Two-sided CUSUM control chart for mean shift. Returns a ChangepointReport.
binary_segmentation(ts, *, min_size=10, penalty=None): Recursive mean-shift changepoint detection. Returns a ChangepointReport.
variance_ratio(ts, *, window=30, alpha=0.05): Sliding F-test for variance shifts. Returns a ChangepointReport.
segment(ts, report): Segment labels and per-segment statistics. Returns a pandas.DataFrame.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.changepoint.detector import ChangepointDetector

Single level shift:

>>> rng  = np.random.default_rng(0)
>>> idx  = pd.date_range("2020", periods=200, freq="D")
>>> vals = np.concatenate([rng.standard_normal(100),
...                        rng.standard_normal(100) + 4.0])
>>> ts   = TimeSeries(vals, index=idx)
>>> det  = ChangepointDetector()
>>> r    = det.binary_segmentation(ts)
>>> abs(r.changepoints[0] - 100) <= 5   # within 5 obs of true break
True

cusum(ts, *, threshold=5.0, drift=0.5, target=None)[source]

Two-sided CUSUM (Cumulative Sum) control chart for mean shift.

CUSUM accumulates deviations from a target mean. When the cumulative sum exceeds a threshold (expressed in units of σ), a changepoint is signalled.

Parameters:

ts (TimeSeries) – Input series.
threshold (float, optional) – Decision interval h in multiples of σ (default 5.0). Higher values make the chart less sensitive and produce fewer false alarms.
drift (float, optional) – Allowance parameter k (default 0.5). Typically set to half the magnitude of the smallest shift to detect, in units of σ.
target (float, optional) – Reference (in-control) mean μ₀. Defaults to the series mean.

Return type:

ChangepointReport

Raises:

TypeError – If ts is not a TimeSeries.
ValueError – If ts has fewer than 10 non-NaN observations, or if threshold or drift is ≤ 0.

Notes

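The CUSUM chart for detecting upward shifts:

\[S_t^+ = \max\bigl(0,\; S_{t-1}^+ + (x_t - \mu_0) - k\sigma\bigr)\]

and, symmetrically, for downward shifts:

\[S_t^- = \max\bigl(0,\; S_{t-1}^- - (x_t - \mu_0) - k\sigma\bigr)\]

Here \(\mu_0\) is target, \(k\) is drift and \(h\) is threshold. A changepoint is signalled when \(S_t^+ > h\sigma\) or \(S_t^- > h\sigma\). After each signal the accumulator is reset to zero.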

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.changepoint.detector import ChangepointDetector
>>> rng  = np.random.default_rng(1)
>>> idx  = pd.date_range("2020", periods=300, freq="D")
>>> vals = np.concatenate([rng.standard_normal(150),
...                        rng.standard_normal(150) + 3.0])
>>> ts   = TimeSeries(vals, index=idx)
>>> r    = ChangepointDetector().cusum(ts, threshold=5.0, drift=0.5)
>>> r.n_changepoints >= 1
True
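The two-sided recursion in the Notes can be sketched in a few lines of NumPy. This is an illustration under assumptions the reference does not state: σ is taken as the sample standard deviation of the series, an alarm is reported at the observation where a sum first exceeds hσ, and both sums are reset after any alarm. cusum_sketch is a hypothetical helper, not part of tseda.

```python
import numpy as np


def cusum_sketch(x, threshold=5.0, drift=0.5, target=None):
    """Two-sided tabular CUSUM; returns the indices where an alarm fires.

    Assumptions: sigma is the sample standard deviation of x, and both
    accumulators are reset to zero after any alarm.
    """
    x = np.asarray(x, dtype=float)
    mu0 = np.mean(x) if target is None else target
    sigma = np.std(x, ddof=1)
    k, h = drift * sigma, threshold * sigma
    s_pos = s_neg = 0.0
    alarms = []
    for t, xt in enumerate(x):
        # Upward sum grows when x_t exceeds mu0 by more than k*sigma
        s_pos = max(0.0, s_pos + (xt - mu0) - k)
        # Downward sum grows when x_t falls below mu0 by more than k*sigma
        s_neg = max(0.0, s_neg - (xt - mu0) - k)
        if s_pos > h or s_neg > h:
            alarms.append(t)
            s_pos = s_neg = 0.0  # reset after each signal
    return alarms


# Noise-free step from 0 to 10 at position 50
x = np.concatenate([np.zeros(50), np.full(50, 10.0)])
print(cusum_sketch(x, target=0.0)[:2])  # [53, 57]
```

On this step the first alarm arrives three observations after the true break at position 50, which shows that a CUSUM alarm marks when a shift is detected rather than where it began. Because target stays fixed, the sketch keeps raising alarms every few observations while the shift persists; how ChangepointDetector.cusum turns alarms into reported changepoints is not described here.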

binary_segmentation(ts, *, min_size=10, penalty=None)[source]

Recursive binary segmentation for mean-shift changepoints.

For each segment, finds the split position that maximises the reduction in within-segment sum of squared errors (SSE). A split is accepted when this gain exceeds penalty, and the search then recurses on each of the two sub-segments.

Parameters:

ts (TimeSeries) – Input series.
min_size (int, optional) – Minimum number of observations per segment (default 10). Prevents detecting breaks on very small sub-sequences.
penalty (float, optional) – Minimum SSE gain required to accept a split. Defaults to n × σ², where σ is estimated from first differences. A higher penalty yields fewer changepoints.

Return type:

ChangepointReport

Notes

The algorithm has O(n²) time complexity per level of recursion. For very long series (n > 5000) consider using a larger min_size or restricting the search.
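The split gain has a closed form. For a segment [a, b) of n = b - a observations split at τ into n_L = τ - a and n_R = b - τ observations with means \(\bar{x}_L\) and \(\bar{x}_R\), the standard two-group sum-of-squares identity gives the expression below; whether tseda evaluates the gain this way is not stated in this reference.

\[\Delta\mathrm{SSE}(\tau) = \mathrm{SSE}_{[a,b)} - \mathrm{SSE}_{[a,\tau)} - \mathrm{SSE}_{[\tau,b)} = \frac{n_L\, n_R}{n}\,\bigl(\bar{x}_L - \bar{x}_R\bigr)^2\]

The split is accepted when the largest \(\Delta\mathrm{SSE}(\tau)\) over admissible τ (both parts at least min_size long) exceeds penalty.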

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.changepoint.detector import ChangepointDetector
>>> rng  = np.random.default_rng(0)
>>> idx  = pd.date_range("2020", periods=300, freq="D")
>>> vals = np.concatenate([rng.standard_normal(100),
...                        rng.standard_normal(100) + 5.0,
...                        rng.standard_normal(100)])
>>> ts   = TimeSeries(vals, index=idx)
>>> r    = ChangepointDetector().binary_segmentation(ts)
>>> r.n_changepoints
2
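A compact sketch of the recursion described above, written for a plain NumPy array. It rests on assumptions the reference does not pin down: the gain of a split is computed as n_L·n_R/n·(x̄_L - x̄_R)², which equals the SSE reduction, and the default penalty n × σ² uses σ estimated as std(diff(x))/√2, which is exact for i.i.d. noise. binseg_sketch is a hypothetical helper.

```python
import numpy as np


def binseg_sketch(x, min_size=10, penalty=None):
    """Recursive binary segmentation for mean shifts (illustrative sketch).

    Assumption: the default penalty is n * sigma**2, with sigma estimated as
    std(diff(x)) / sqrt(2), which is exact for i.i.d. noise.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if penalty is None:
        sigma = np.std(np.diff(x), ddof=1) / np.sqrt(2.0)
        penalty = n * sigma**2
    # Prefix sums give any segment mean in O(1)
    csum = np.concatenate([[0.0], np.cumsum(x)])
    found = []

    def split(a, b):
        # Candidate splits keep at least min_size observations on each side
        best_gain, best_tau = -np.inf, None
        for tau in range(a + min_size, b - min_size + 1):
            n_l, n_r = tau - a, b - tau
            mean_l = (csum[tau] - csum[a]) / n_l
            mean_r = (csum[b] - csum[tau]) / n_r
            gain = n_l * n_r / (b - a) * (mean_l - mean_r) ** 2
            if gain > best_gain:
                best_gain, best_tau = gain, tau
        if best_tau is not None and best_gain > penalty:
            found.append(best_tau)
            split(a, best_tau)  # recurse on the left sub-segment
            split(best_tau, b)  # recurse on the right sub-segment

    split(0, n)
    return sorted(found)


x = np.concatenate([np.zeros(100), np.full(100, 5.0), np.zeros(100)])
print(binseg_sketch(x))  # [100, 200]
```

The prefix sums let the sketch score each candidate split in constant time; on the noise-free three-level series it recovers both breaks.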

variance_ratio(ts, *, window=30, alpha=0.05)[source]

Detect variance shifts via a sliding two-sample F-test.

At each position, two adjacent windows of window observations each are compared. A significant difference in their variances (p < alpha) signals a variance-change changepoint.

Parameters:

ts (TimeSeries) – Input series.
window (int, optional) – Number of observations in each of the two adjacent windows, i.e. half the full comparison span. Default 30.
alpha (float, optional) – Significance level for the F-test. Default 0.05.

Returns:

A ChangepointReport whose changepoints are the positions with a significant variance shift; each run of consecutive significant positions is merged into its maximum-score position.

Return type:

ChangepointReport

Raises:

TypeError – If ts is not a TimeSeries.
ValueError – If window < 3 or alpha outside (0, 1).

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.changepoint.detector import ChangepointDetector
>>> rng  = np.random.default_rng(2)
>>> idx  = pd.date_range("2020", periods=200, freq="D")
>>> vals = np.concatenate([rng.standard_normal(100) * 0.5,
...                        rng.standard_normal(100) * 3.0])
>>> ts   = TimeSeries(vals, index=idx)
>>> r    = ChangepointDetector().variance_ratio(ts, window=20)
>>> r.n_changepoints >= 1
True
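The sliding test can be sketched with scipy.stats.f. Assumptions beyond the text above: candidate position t compares x[t-window:t] with x[t:t+window], the F-test is two-sided, and each significant run is collapsed to the position with the largest |log F|. That last choice avoids ties between positions whose p-values are all vanishingly small; tseda's documented scores lie in [0, 1], so they are computed differently. variance_ratio_sketch is a hypothetical helper, and SciPy is an assumed dependency.

```python
import numpy as np
from scipy import stats


def variance_ratio_sketch(x, window=30, alpha=0.05):
    """Sliding two-sample F-test for a change in variance (illustrative sketch).

    Assumptions: candidate t compares x[t-window:t] with x[t:t+window], the
    test is two-sided, and positions are ranked by |log F| (not tseda's scores).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    dof = window - 1
    score = np.zeros(n)
    significant = np.zeros(n, dtype=bool)
    for t in range(window, n - window + 1):
        v_left = np.var(x[t - window:t], ddof=1)
        v_right = np.var(x[t:t + window], ddof=1)
        if v_left == 0.0 or v_right == 0.0:
            continue  # F is undefined when a window is constant
        f_stat = v_right / v_left
        # Two-sided p-value for H0: both windows share the same variance
        p = min(1.0, 2.0 * min(stats.f.cdf(f_stat, dof, dof),
                               stats.f.sf(f_stat, dof, dof)))
        score[t] = abs(np.log(f_stat))
        significant[t] = p < alpha
    # Collapse each run of consecutive significant positions to its peak
    changepoints = []
    t = 0
    while t < n:
        if not significant[t]:
            t += 1
            continue
        start = t
        while t < n and significant[t]:
            t += 1
        changepoints.append(start + int(np.argmax(score[start:t])))
    return changepoints, score


# Deterministic alternating series whose spread jumps from 0.5 to 3.0 at 100
x = np.concatenate([0.5 * (-1.0) ** np.arange(100), 3.0 * (-1.0) ** np.arange(100)])
cps, score = variance_ratio_sketch(x, window=20)
print(100 in cps)  # True
```

Because the test runs at every position with the same alpha, a series with no variance change will still produce occasional spurious runs in this sketch; the reference does not say whether variance_ratio corrects for multiple testing.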

segment(ts, report)[source]

Return per-segment statistics for a changepoint report.

Parameters:

ts (TimeSeries) – The original series.
report (ChangepointReport) – Output of any detection method.

Returns:

One row per segment with columns: segment, start, end, n_obs, mean, std, min, max.

Return type:

pandas.DataFrame

Raises:

TypeError – If either argument has the wrong type.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.changepoint.detector import ChangepointDetector
>>> rng  = np.random.default_rng(0)
>>> idx  = pd.date_range("2020", periods=200, freq="D")
>>> vals = np.concatenate([rng.standard_normal(100),
...                        rng.standard_normal(100) + 4.0])
>>> ts   = TimeSeries(vals, index=idx)
>>> det  = ChangepointDetector()
>>> r    = det.binary_segmentation(ts)
>>> df   = det.segment(ts, r)
>>> len(df) == r.n_changepoints + 1
True
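The table returned by segment can be approximated with plain pandas. This sketch works on a pandas Series rather than a tseda TimeSeries, and it assumes that start and end are the first and last timestamps of each segment and that n_obs counts non-NaN values; the reference does not define these columns further. segment_stats_sketch is a hypothetical helper.

```python
import pandas as pd


def segment_stats_sketch(series, changepoints):
    """Per-segment summary table (illustrative sketch).

    Assumptions: `series` is a pandas Series with a DatetimeIndex, `start`
    and `end` are the first and last timestamps of each segment, and n_obs
    counts non-NaN values.
    """
    bounds = [0, *changepoints, len(series)]
    rows = []
    for seg, (a, b) in enumerate(zip(bounds[:-1], bounds[1:])):
        chunk = series.iloc[a:b]  # segment seg covers positions [a, b)
        rows.append({
            "segment": seg,
            "start": series.index[a],
            "end": series.index[b - 1],
            "n_obs": int(chunk.count()),
            "mean": chunk.mean(),
            "std": chunk.std(),
            "min": chunk.min(),
            "max": chunk.max(),
        })
    return pd.DataFrame(rows)


idx = pd.date_range("2020", periods=10, freq="D")
s = pd.Series([0.0] * 5 + [10.0] * 5, index=idx)
print(segment_stats_sketch(s, [5]))
```

The result has one row per segment, so its length is the number of changepoints plus one, as in the doctest above.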

The same classes are also documented under the module path tseda.changepoint.detector, as tseda.changepoint.detector.ChangepointReport and tseda.changepoint.detector.ChangepointDetector; those entries repeat the documentation above verbatim.