Forecastability

tseda.forecastability — Forecast readiness scoring and leakage detection.

Classes

ForecastabilityReport: Immutable result of ForecastabilityScorer.score().
ForecastabilityScorer: Composite 0–100 forecastability scorer.
LeakageReport: Immutable result of LeakageDetector.check().
LeakageDetector: Temporal and target leakage detector for feature sets.

class tseda.forecastability.ForecastabilityReport(score, sub_scores, recommended_model, recommended_diff, recommended_period, n_obs, pct_missing, pct_outlier, is_stationary, dominant_period)[source]

Bases: object

Immutable forecastability assessment.

Parameters:

score (float)
sub_scores (Dict[str, float])
recommended_model (str)
recommended_diff (int)
recommended_period (int | None)
n_obs (int)
pct_missing (float)
pct_outlier (float)
is_stationary (bool)
dominant_period (int | None)

score

Overall forecastability score in [0, 100]. Higher is better.

Type: float

sub_scores

Individual sub-scores (0–100 each) keyed by sub-score name.

Type: dict of str → float

recommended_model

Suggested modelling approach: "ARIMA", "SARIMA", "ETS", "Prophet", or "ML".

Type: str

recommended_diff

Recommended differencing order: 0 (already stationary) or 1.

Type: int

recommended_period

Dominant seasonal period detected, or None if no seasonality found.

Type: int or None

n_obs

Number of observations in the series.

Type: int

pct_missing

Percentage of observations that are NaN.

Type: float

pct_outlier

Percentage of observations flagged as outliers by the interquartile-range (IQR) rule.

Type: float

is_stationary

True when the augmented Dickey-Fuller (ADF) test rejects the unit-root null hypothesis at significance level alpha.

Type: bool

dominant_period

Same as recommended_period.

Type: int or None
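
The pct_missing and pct_outlier fields described above can be illustrated with a small standard-library sketch. It assumes Tukey's fences (1.5 × IQR beyond the quartiles), quartiles computed on the non-missing values, and a denominator of all observations. The source does not state which multiplier, quartile method, or denominator tseda uses, and the function names here are hypothetical.

```python
import math
import statistics


def pct_missing(values):
    """Percentage of entries that are NaN."""
    n_missing = sum(1 for v in values if math.isnan(v))
    return 100.0 * n_missing / len(values)


def pct_iqr_outliers(values, k=1.5):
    """Percentage of entries outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumptions: k=1.5 (Tukey's rule) and NaNs are dropped before the
    quartiles are computed; tseda may use different choices.
    """
    clean = [v for v in values if not math.isnan(v)]
    q1, _, q3 = statistics.quantiles(clean, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    n_out = sum(1 for v in clean if v < lo or v > hi)
    return 100.0 * n_out / len(values)


# Nine finite values (one extreme) plus one missing value.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0, float("nan")]
print(pct_missing(series))       # 10.0
print(pct_iqr_outliers(series))  # 10.0
```

Both figures are percentages in [0, 100], not fractions in [0, 1].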

score: float

sub_scores: Dict[str, float]

recommended_model: str

recommended_diff: int

recommended_period: int | None

n_obs: int

pct_missing: float

pct_outlier: float

is_stationary: bool

dominant_period: int | None

__repr__()[source]

Return repr(self).

Return type: str

__init__(score, sub_scores, recommended_model, recommended_diff, recommended_period, n_obs, pct_missing, pct_outlier, is_stationary, dominant_period)

Parameters:

score (float)
sub_scores (Dict[str, float])
recommended_model (str)
recommended_diff (int)
recommended_period (int | None)
n_obs (int)
pct_missing (float)
pct_outlier (float)
is_stationary (bool)
dominant_period (int | None)

Return type:

None

class tseda.forecastability.ForecastabilityScorer[source]

Bases: object

Assess how forecastable a TimeSeries is.

The scorer is stateless — calling score() multiple times is safe.

score(ts, period, alpha)[source]

Return a ForecastabilityReport with an overall 0–100 score.

Parameters:

ts (TimeSeries)
period (int | None)
alpha (float)

Return type:

ForecastabilityReport

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.forecastability.scorer import ForecastabilityScorer

>>> rng = np.random.default_rng(1)
>>> idx = pd.date_range("2020", periods=200, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(200), index=idx)
>>> r   = ForecastabilityScorer().score(ts)
>>> isinstance(r.score, float)
True

score(ts, *, period=None, alpha=0.05)[source]

Compute the forecastability score for ts.

Parameters:

ts (TimeSeries) – Input series.
period (int, optional) – Seasonal period. When None the period is detected automatically via the FFT periodogram.
alpha (float, optional) – Significance level used for stationarity and ACF tests. Default 0.05.

Return type:

ForecastabilityReport

Raises:

TypeError – If ts is not a TimeSeries.
ValueError – If period is given and is less than 2, or if ts has fewer than 4 observations.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.forecastability.scorer import ForecastabilityScorer
>>> rng = np.random.default_rng(2)
>>> idx = pd.date_range("2020", periods=365, freq="D")
>>> n   = 365
>>> seas = np.sin(2 * np.pi * np.arange(n) / 7) * 3
>>> ts  = TimeSeries(seas + rng.standard_normal(n) * 0.5, index=idx)
>>> r   = ForecastabilityScorer().score(ts, period=7)
>>> r.recommended_period
7
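
When period is None, the text above says the period is detected via the FFT periodogram. The sketch below shows the general idea with a plain discrete Fourier transform from the standard library. It is not tseda's implementation: the function name, the mean removal, and the "strongest peak wins" rule are assumptions made for illustration.

```python
import cmath
import math


def dominant_period(values):
    """Return the period (in samples) of the strongest periodogram peak.

    Assumption: plain DFT power on the mean-removed series, with the zero
    frequency excluded; tseda's actual detection rule may differ.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    best_k, best_power = None, 0.0
    # Frequencies k/n for k = 1 .. n//2 (k = 0 is the mean and is skipped).
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        return None  # constant series: no peak to report
    return round(n / best_k)


# Weekly sine wave sampled for 20 full cycles.
weekly = [3.0 * math.sin(2 * math.pi * t / 7) for t in range(140)]
print(dominant_period(weekly))  # 7
```

The direct DFT is O(n²) and is used here only to keep the sketch dependency-free; an FFT gives the same spectrum far faster.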

class tseda.forecastability.LeakageReport(has_temporal_leakage, has_target_leakage, temporal_leakage_columns, target_leakage_columns, target_leakage_correlations, temporal_peak_lags, horizon, n_features, n_obs, warnings)[source]

Bases: object

Immutable leakage detection result.

Parameters:

has_temporal_leakage (bool)
has_target_leakage (bool)
temporal_leakage_columns (List[str])
target_leakage_columns (List[str])
target_leakage_correlations (Dict[str, float])
temporal_peak_lags (Dict[str, int])
horizon (int)
n_features (int)
n_obs (int)
warnings (List[str])

has_temporal_leakage

True if any feature correlates more strongly with future values of the target than with its current or past values.

Type: bool

has_target_leakage

True if any feature's lag-0 correlation with the target exceeds target_corr_threshold.

Type: bool

temporal_leakage_columns

Names of feature columns flagged for temporal leakage.

Type: list of str

target_leakage_columns

Names of feature columns flagged for target leakage.

Type: list of str

target_leakage_correlations

Lag-0 Pearson correlation for each column in target_leakage_columns.

Type: dict of str → float

temporal_peak_lags

For each feature column, the lag at which the cross-correlation with the target is maximised. A positive lag means the feature correlates with future values of the target.

Type: dict of str → int

horizon

Forecast horizon passed to check().

Type: int

n_features

Number of feature columns examined.

Type: int

n_obs

Number of observations in the target series.

Type: int

warnings

Human-readable diagnostic messages.

Type: list of str
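
The sign convention of temporal_peak_lags can be made concrete with a standard-library sketch. It pairs feature[t] with target[t + lag], so a positive peak lag means the feature lines up with future target values. The helper names, the search window max_lag, and the use of the absolute correlation are assumptions; the source does not say how tseda computes the cross-correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def peak_lag(feature, target, max_lag):
    """Lag in [-max_lag, max_lag] maximising |corr(feature[t], target[t + lag])|."""
    n = len(target)
    best_lag, best_abs = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            f, y = feature[: n - lag], target[lag:]
        else:
            f, y = feature[-lag:], target[: n + lag]
        r = abs(pearson(f, y))
        if r > best_abs:
            best_lag, best_abs = lag, r
    return best_lag


rng = random.Random(0)
target = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Leaky feature: at time t it already holds the target from t + 2.
leaky = target[2:] + [0.0, 0.0]
# Legitimate feature: at time t it holds the target from t - 1.
lagged = [0.0] + target[:-1]

print(peak_lag(leaky, target, max_lag=5))   # 2
print(peak_lag(lagged, target, max_lag=5))  # -1
```

Under this convention, a feature whose peak lag is positive is a candidate for temporal leakage, while a negative peak lag is what an honestly lagged feature produces.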

has_temporal_leakage: bool

has_target_leakage: bool

temporal_leakage_columns: List[str]

target_leakage_columns: List[str]

target_leakage_correlations: Dict[str, float]

temporal_peak_lags: Dict[str, int]

horizon: int

n_features: int

n_obs: int

warnings: List[str]

__repr__()[source]

Return repr(self).

Return type: str

__init__(has_temporal_leakage, has_target_leakage, temporal_leakage_columns, target_leakage_columns, target_leakage_correlations, temporal_peak_lags, horizon, n_features, n_obs, warnings)

Parameters:

has_temporal_leakage (bool)
has_target_leakage (bool)
temporal_leakage_columns (List[str])
target_leakage_columns (List[str])
target_leakage_correlations (Dict[str, float])
temporal_peak_lags (Dict[str, int])
horizon (int)
n_features (int)
n_obs (int)
warnings (List[str])

Return type:

None

class tseda.forecastability.LeakageDetector[source]

Bases: object

Detect temporal and target leakage in a feature set.

The detector is stateless.

check(ts, horizon, features_df, target_corr_threshold)[source]

Return a LeakageReport.

Parameters:

ts (TimeSeries)
horizon (int)
features_df (DataFrame | None)
target_corr_threshold (float)

Return type:

LeakageReport

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.forecastability.leakage import LeakageDetector

Target leakage, where a feature is an exact copy of the target:

>>> rng = np.random.default_rng(0)
>>> n   = 80
>>> idx = pd.date_range("2020", periods=n, freq="D")
>>> y   = rng.standard_normal(n)
>>> ts  = TimeSeries(y, index=idx)
>>> feat = pd.DataFrame({"target_copy": y}, index=idx)
>>> r = LeakageDetector().check(ts, horizon=1, features_df=feat)
>>> r.has_target_leakage
True
>>> "target_copy" in r.target_leakage_columns
True

check(ts, horizon, *, features_df=None, target_corr_threshold=0.95)[source]

Check features_df for leakage against target ts.

Parameters:

ts (TimeSeries) – Target time series.
horizon (int) – Forecast horizon in time steps. Must be >= 1.
features_df (pandas.DataFrame, optional) – Feature matrix with the same DatetimeIndex as ts, one column per feature. When None the report is empty with a warning.
target_corr_threshold (float, optional) – Pearson r threshold above which a feature is flagged as target-leaking. Default 0.95.

Return type:

LeakageReport

Raises:

TypeError – If ts is not a TimeSeries.
ValueError – If horizon < 1, target_corr_threshold ∉ (0, 1], or features_df has a different number of rows from ts.
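
The ValueError conditions listed above can be read as three independent checks. The sketch below restates them with a hypothetical helper that takes plain integers in place of a TimeSeries and a DataFrame, so it runs without tseda or pandas. The error messages are illustrative, not the library's.

```python
def validate_check_args(n_obs, horizon, n_feature_rows=None,
                        target_corr_threshold=0.95):
    """Raise ValueError for the argument combinations listed under Raises.

    Hypothetical helper: n_obs stands in for len(ts) and n_feature_rows for
    len(features_df); None means no feature matrix was passed.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    if not 0.0 < target_corr_threshold <= 1.0:
        raise ValueError("target_corr_threshold must be in (0, 1]")
    if n_feature_rows is not None and n_feature_rows != n_obs:
        raise ValueError("features_df must have the same number of rows as ts")


def is_rejected(**kwargs):
    """Return True when validate_check_args raises ValueError."""
    try:
        validate_check_args(**kwargs)
    except ValueError:
        return True
    return False


print(is_rejected(n_obs=60, horizon=3))                             # False
print(is_rejected(n_obs=60, horizon=0))                             # True
print(is_rejected(n_obs=60, horizon=1, target_corr_threshold=0.0))  # True
print(is_rejected(n_obs=60, horizon=1, n_feature_rows=59))          # True
```

Note that the interval is half-open: a threshold of exactly 1.0 is accepted, while 0.0 is not.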

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.forecastability.leakage import LeakageDetector

>>> rng = np.random.default_rng(1)
>>> n   = 60
>>> idx = pd.date_range("2020", periods=n, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(n), index=idx)
>>> r   = LeakageDetector().check(ts, horizon=3)
>>> r.n_features
0

Forecastability scoring for time series.

Computes a composite 0–100 readiness score from six diagnostic sub-scores and recommends a modelling strategy.
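
One way a composite of weighted sub-scores can be formed is a weighted mean, sketched below. Everything specific in it is an assumption made for illustration: the weighted-mean rule itself, the placeholder names a to f, and their weights are not tseda's actual sub-scores or weights.

```python
def composite_score(sub_scores, weights):
    """Weighted mean of 0-100 sub-scores, clipped to [0, 100]."""
    if sub_scores.keys() != weights.keys():
        raise ValueError("sub_scores and weights must have the same keys")
    total_weight = sum(weights.values())
    raw = sum(sub_scores[k] * weights[k] for k in sub_scores) / total_weight
    return min(100.0, max(0.0, raw))


# Placeholder names and weights, NOT tseda's actual sub-scores.
weights = {"a": 0.25, "b": 0.25, "c": 0.125, "d": 0.125, "e": 0.125, "f": 0.125}
sub_scores = {"a": 80.0, "b": 60.0, "c": 100.0, "d": 40.0, "e": 20.0, "f": 0.0}
print(composite_score(sub_scores, weights))  # 55.0
```

Dividing by the total weight keeps the result on the 0 to 100 scale even if the weights do not sum to one.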

Sub-scores and weights

Classes

ForecastabilityReport: Frozen dataclass returned by ForecastabilityScorer.score().
ForecastabilityScorer: Stateless scorer.
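
What "frozen dataclass" means for callers is shown in the sketch below. MiniReport is a hypothetical three-field stand-in, not the real ForecastabilityReport, but the immutability behaviour comes from the standard dataclasses module and applies to any class declared with frozen=True.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MiniReport:
    """Hypothetical three-field stand-in for ForecastabilityReport."""

    score: float
    recommended_model: str
    recommended_period: Optional[int]


report = MiniReport(score=72.5, recommended_model="SARIMA", recommended_period=7)

try:
    report.score = 99.0  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# A modified copy is built with dataclasses.replace; the original is untouched.
adjusted = replace(report, score=99.0)

print(mutated)         # False
print(report.score)    # 72.5
print(adjusted.score)  # 99.0
```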

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.forecastability.scorer import ForecastabilityScorer

Simple AR(1) process — moderate forecastability:

>>> rng = np.random.default_rng(0)
>>> n   = 300
>>> idx = pd.date_range("2020-01-01", periods=n, freq="D")
>>> eps = rng.standard_normal(n)
>>> x   = np.zeros(n)
>>> for i in range(1, n): x[i] = 0.7 * x[i-1] + eps[i]
>>> ts  = TimeSeries(x, index=idx)
>>> r   = ForecastabilityScorer().score(ts)
>>> 0 <= r.score <= 100
True
>>> r.recommended_model in ("ARIMA", "SARIMA", "ETS", "Prophet", "ML")
True

class tseda.forecastability.scorer.ForecastabilityReport(score, sub_scores, recommended_model, recommended_diff, recommended_period, n_obs, pct_missing, pct_outlier, is_stationary, dominant_period)[source]

Bases: object

Immutable forecastability assessment.

Parameters:

score (float)
sub_scores (Dict[str, float])
recommended_model (str)
recommended_diff (int)
recommended_period (int | None)
n_obs (int)
pct_missing (float)
pct_outlier (float)
is_stationary (bool)
dominant_period (int | None)

score

Overall forecastability score in [0, 100]. Higher is better.

Type: float

sub_scores

Individual sub-scores (0–100 each) keyed by sub-score name.

Type: dict of str → float

recommended_model

Suggested modelling approach: "ARIMA", "SARIMA", "ETS", "Prophet", or "ML".

Type: str

recommended_diff

Recommended differencing order: 0 (already stationary) or 1.

Type: int

recommended_period

Dominant seasonal period detected, or None if no seasonality found.

Type: int or None

n_obs

Number of observations in the series.

Type: int

pct_missing

Percentage of observations that are NaN.

Type: float

pct_outlier

Percentage of observations flagged as outliers by the interquartile-range (IQR) rule.

Type: float

is_stationary

True when the augmented Dickey-Fuller (ADF) test rejects the unit-root null hypothesis at significance level alpha.

Type: bool

dominant_period

Same as recommended_period.

Type: int or None

score: float

sub_scores: Dict[str, float]

recommended_model: str

recommended_diff: int

recommended_period: int | None

n_obs: int

pct_missing: float

pct_outlier: float

is_stationary: bool

dominant_period: int | None

__repr__()[source]

Return repr(self).

Return type: str

__init__(score, sub_scores, recommended_model, recommended_diff, recommended_period, n_obs, pct_missing, pct_outlier, is_stationary, dominant_period)

Parameters:

score (float)
sub_scores (Dict[str, float])
recommended_model (str)
recommended_diff (int)
recommended_period (int | None)
n_obs (int)
pct_missing (float)
pct_outlier (float)
is_stationary (bool)
dominant_period (int | None)

Return type:

None

class tseda.forecastability.scorer.ForecastabilityScorer[source]

Bases: object

Assess how forecastable a TimeSeries is.

The scorer is stateless — calling score() multiple times is safe.

score(ts, period, alpha)[source]

Return a ForecastabilityReport with an overall 0–100 score.

Parameters:

ts (TimeSeries)
period (int | None)
alpha (float)

Return type:

ForecastabilityReport

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.forecastability.scorer import ForecastabilityScorer

>>> rng = np.random.default_rng(1)
>>> idx = pd.date_range("2020", periods=200, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(200), index=idx)
>>> r   = ForecastabilityScorer().score(ts)
>>> isinstance(r.score, float)
True

score(ts, *, period=None, alpha=0.05)[source]

Compute the forecastability score for ts.

Parameters:

ts (TimeSeries) – Input series.
period (int, optional) – Seasonal period. When None the period is detected automatically via the FFT periodogram.
alpha (float, optional) – Significance level used for stationarity and ACF tests. Default 0.05.

Return type:

ForecastabilityReport

Raises:

TypeError – If ts is not a TimeSeries.
ValueError – If period is given and is less than 2, or if ts has fewer than 4 observations.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.forecastability.scorer import ForecastabilityScorer
>>> rng = np.random.default_rng(2)
>>> idx = pd.date_range("2020", periods=365, freq="D")
>>> n   = 365
>>> seas = np.sin(2 * np.pi * np.arange(n) / 7) * 3
>>> ts  = TimeSeries(seas + rng.standard_normal(n) * 0.5, index=idx)
>>> r   = ForecastabilityScorer().score(ts, period=7)
>>> r.recommended_period
7

Leakage detection for time series feature sets.

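Two classes of leakage are detected:

Temporal leakage: a feature correlates more strongly with future values of the target than with its current or past values.
Target leakage: a feature's lag-0 Pearson correlation with the target exceeds target_corr_threshold.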

When features_df is None the report is returned with empty leakage sets and a warning that no features were provided.

Classes

LeakageReport: Frozen dataclass returned by LeakageDetector.check().
LeakageDetector: Stateless detector.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.forecastability.leakage import LeakageDetector

No leakage — lagged features only:

>>> rng  = np.random.default_rng(0)
>>> n    = 100
>>> idx  = pd.date_range("2020", periods=n, freq="D")
>>> y    = rng.standard_normal(n)
>>> ts   = TimeSeries(y, index=idx)
>>> feat = pd.DataFrame({"lag1": np.roll(y, 1), "lag2": np.roll(y, 2)}, index=idx)
>>> feat.iloc[:2] = np.nan
>>> r    = LeakageDetector().check(ts, horizon=5, features_df=feat)
>>> r.has_target_leakage
False
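
In the example above, np.roll performs a circular shift: the last values of y wrap around to the front, which is why the first two rows are then overwritten with NaN. A standard-library sketch of that behaviour follows. The helpers roll and lag are hypothetical names for illustration, and None stands in for NaN.

```python
def roll(values, shift):
    """Circular shift to the right, like numpy.roll for a 1-D sequence."""
    shift %= len(values)
    return values[-shift:] + values[:-shift] if shift else list(values)


def lag(values, shift):
    """Lagged copy whose first `shift` entries are masked as missing (None)."""
    return [None] * shift + values[: len(values) - shift]


y = [10, 20, 30, 40, 50]
print(roll(y, 2))  # [40, 50, 10, 20, 30]  (40 and 50 come from the future)
print(lag(y, 2))   # [None, None, 10, 20, 30]
```

Without the masking step, the wrapped values would place the end of the series at its start, so the leading rows of a rolled feature carry future information.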

class tseda.forecastability.leakage.LeakageReport(has_temporal_leakage, has_target_leakage, temporal_leakage_columns, target_leakage_columns, target_leakage_correlations, temporal_peak_lags, horizon, n_features, n_obs, warnings)[source]

Bases: object

Immutable leakage detection result.

Parameters:

has_temporal_leakage (bool)
has_target_leakage (bool)
temporal_leakage_columns (List[str])
target_leakage_columns (List[str])
target_leakage_correlations (Dict[str, float])
temporal_peak_lags (Dict[str, int])
horizon (int)
n_features (int)
n_obs (int)
warnings (List[str])

has_temporal_leakage

True if any feature correlates more strongly with future values of the target than with its current or past values.

Type: bool

has_target_leakage

True if any feature's lag-0 correlation with the target exceeds target_corr_threshold.

Type: bool

temporal_leakage_columns

Names of feature columns flagged for temporal leakage.

Type: list of str

target_leakage_columns

Names of feature columns flagged for target leakage.

Type: list of str

target_leakage_correlations

Lag-0 Pearson correlation for each column in target_leakage_columns.

Type: dict of str → float

temporal_peak_lags

For each feature column, the lag at which the cross-correlation with the target is maximised. A positive lag means the feature correlates with future values of the target.

Type: dict of str → int

horizon

Forecast horizon passed to check().

Type: int

n_features

Number of feature columns examined.

Type: int

n_obs

Number of observations in the target series.

Type: int

warnings

Human-readable diagnostic messages.

Type: list of str

has_temporal_leakage: bool

has_target_leakage: bool

temporal_leakage_columns: List[str]

target_leakage_columns: List[str]

target_leakage_correlations: Dict[str, float]

temporal_peak_lags: Dict[str, int]

horizon: int

n_features: int

n_obs: int

warnings: List[str]

__repr__()[source]

Return repr(self).

Return type: str

__init__(has_temporal_leakage, has_target_leakage, temporal_leakage_columns, target_leakage_columns, target_leakage_correlations, temporal_peak_lags, horizon, n_features, n_obs, warnings)

Parameters:

has_temporal_leakage (bool)
has_target_leakage (bool)
temporal_leakage_columns (List[str])
target_leakage_columns (List[str])
target_leakage_correlations (Dict[str, float])
temporal_peak_lags (Dict[str, int])
horizon (int)
n_features (int)
n_obs (int)
warnings (List[str])

Return type:

None

class tseda.forecastability.leakage.LeakageDetector[source]

Bases: object

Detect temporal and target leakage in a feature set.

The detector is stateless.

check(ts, horizon, features_df, target_corr_threshold)[source]

Return a LeakageReport.

Parameters:

ts (TimeSeries)
horizon (int)
features_df (DataFrame | None)
target_corr_threshold (float)

Return type:

LeakageReport

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.forecastability.leakage import LeakageDetector

Target leakage, where a feature is an exact copy of the target:

>>> rng = np.random.default_rng(0)
>>> n   = 80
>>> idx = pd.date_range("2020", periods=n, freq="D")
>>> y   = rng.standard_normal(n)
>>> ts  = TimeSeries(y, index=idx)
>>> feat = pd.DataFrame({"target_copy": y}, index=idx)
>>> r = LeakageDetector().check(ts, horizon=1, features_df=feat)
>>> r.has_target_leakage
True
>>> "target_copy" in r.target_leakage_columns
True

check(ts, horizon, *, features_df=None, target_corr_threshold=0.95)[source]

Check features_df for leakage against target ts.

Parameters:

ts (TimeSeries) – Target time series.
horizon (int) – Forecast horizon in time steps. Must be >= 1.
features_df (pandas.DataFrame, optional) – Feature matrix with the same DatetimeIndex as ts, one column per feature. When None the report is empty with a warning.
target_corr_threshold (float, optional) – Pearson r threshold above which a feature is flagged as target-leaking. Default 0.95.

Return type:

LeakageReport

Raises:

TypeError – If ts is not a TimeSeries.
ValueError – If horizon < 1, target_corr_threshold ∉ (0, 1], or features_df has a different number of rows from ts.

Examples

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> from tseda.forecastability.leakage import LeakageDetector

>>> rng = np.random.default_rng(1)
>>> n   = 60
>>> idx = pd.date_range("2020", periods=n, freq="D")
>>> ts  = TimeSeries(rng.standard_normal(n), index=idx)
>>> r   = LeakageDetector().check(ts, horizon=3)
>>> r.n_features
0