tseda — Time Series Exploratory Data Analysis

A comprehensive, dependency-light Python toolkit for understanding time series data before forecasting it.

Quick Start

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.cumsum(np.random.randn(365)), index=idx,
...                 name="stock_price", unit="USD")
>>> print(ts)

Modules

core

TimeSeries data structure and validators.

quality

Missing-value analysis, outlier detection, and duplicate checks.

statistics

Descriptive statistics, stationarity tests, and autocorrelation.

decomposition

Classical (additive/multiplicative) and STL decomposition.

seasonality

Period detection via periodogram and autocorrelation.

anomaly

Point and contextual anomaly detection.

changepoint

Structural break detection.

features

Temporal, statistical, and spectral feature extraction.

forecastability

Forecast-readiness scoring and data-leakage detection.

visualization

Matplotlib-based plot suite.

report

HTML and console report generation.

class tseda.TimeSeries(data, *, index=None, name='value', freq=None, unit=None, description=None)[source]

Bases: object

Univariate time series with a pandas.DatetimeIndex.

Parameters:
  • data (Union[ArrayLike, pd.Series]) –

    Numeric values. Accepted types:

  • index (Optional[DatetimeLike]) –

    Datetime timestamps aligned with data. When data is a pandas.Series with a pandas.DatetimeIndex this argument may be omitted. Accepted types:

  • name (str) – Short identifier for the series (used in plots and reports). Default "value".

  • freq (Optional[str]) – Pandas offset alias (e.g., "D", "h", "MS"). When None (default) the frequency is inferred automatically.

  • unit (Optional[str]) – Physical unit of the values (e.g., "USD", "°C"). Purely informational — used in axis labels.

  • description (Optional[str]) – Free-text description stored in metadata.

Raises:
  • TypeError – If data or index have an unsupported type.

  • ValueError – If data and index have different lengths, if index is not monotonically increasing, or if index contains duplicates.
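The three ValueError conditions above can be reproduced with plain pandas. The following is a minimal sketch, assuming a hypothetical helper named check_index; it is not taken from tseda's source and only mirrors the documented rules.

```python
import numpy as np
import pandas as pd


def check_index(values, index):
    """Hypothetical helper (not part of tseda): mirror the documented ValueError cases."""
    idx = pd.DatetimeIndex(index)
    if len(values) != len(idx):
        raise ValueError("data and index have different lengths")
    # Check duplicates first: a repeated timestamp still counts as
    # monotonically increasing (non-strict) in pandas.
    if idx.has_duplicates:
        raise ValueError("index contains duplicates")
    if not idx.is_monotonic_increasing:
        raise ValueError("index is not monotonically increasing")
    return idx


good = pd.date_range("2020-01-01", periods=3, freq="D")
check_index(np.array([1.0, 2.0, 3.0]), good)      # passes silently

shuffled = good[[0, 2, 1]]                         # out-of-order timestamps
try:
    check_index(np.array([1.0, 2.0, 3.0]), shuffled)
except ValueError as exc:
    message = str(exc)
```

Sorting and de-duplicating a raw index before construction (for example with sort_index() and a duplicate filter on the source Series) avoids both index-related errors.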

Examples

From a numpy array:

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> idx = pd.date_range("2020-01-01", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 11.5, 9.8, 12.0, 11.0], index=idx)
>>> ts.n
5

From a pandas Series:

>>> s = pd.Series([1, 2, 3], index=pd.date_range("2020", periods=3, freq="D"))
>>> ts = TimeSeries.from_series(s)

__init__(data, *, index=None, name='value', freq=None, unit=None, description=None)[source]

Parameters:
Return type:

None

classmethod from_series(series, *, name=None, freq=None, unit=None, description=None)[source]

Construct a TimeSeries from a pandas.Series.

Parameters:
Return type:

TimeSeries

Examples

>>> s = pd.Series([1.0, 2.0], index=pd.date_range("2020", periods=2, freq="D"))
>>> TimeSeries.from_series(s, name="x").name
'x'

classmethod from_arrays(values, index, *, name='value', freq=None, unit=None, description=None)[source]

Construct a TimeSeries from parallel arrays.

Parameters:
Return type:

TimeSeries

Examples

>>> import numpy as np, pandas as pd
>>> vals = np.array([1.0, 2.0, 3.0])
>>> idx  = pd.date_range("2021-01-01", periods=3, freq="D")
>>> TimeSeries.from_arrays(vals, idx).n
3

classmethod from_dataframe(df, column, *, name=None, freq=None, unit=None, description=None)[source]

Extract one column from a pandas.DataFrame.

Parameters:
Return type:

TimeSeries

Raises:

KeyError – If column is not in df.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"temp": [20.0, 21.0, 19.5]},
...                    index=pd.date_range("2020", periods=3, freq="D"))
>>> TimeSeries.from_dataframe(df, "temp").name
'temp'

property values: ndarray

1-D float64 array of observed values.

Returns:

A copy to protect the internal state.

Return type:

numpy.ndarray
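
Returning a copy means callers cannot change the series by mutating the array they receive. A toy illustration of that pattern follows; the Holder class is hypothetical and is not tseda's implementation.

```python
import numpy as np


class Holder:
    """Toy class (hypothetical) whose property returns a defensive copy."""

    def __init__(self, values):
        self._values = np.array(values, dtype=np.float64)

    @property
    def values(self):
        # Hand out a fresh array so callers cannot touch the internal buffer.
        return self._values.copy()


h = Holder([1.0, 2.0, 3.0])
snapshot = h.values
snapshot[0] = 99.0                 # mutates only the caller's copy
```

The trade-off is one allocation per access, so it can be worth binding the array to a local name inside tight loops.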

property index: DatetimeIndex

Datetime index of the series.

Return type:

pandas.DatetimeIndex

property n: int

Number of observations.

Return type:

int

property start: Timestamp

Timestamp of the first observation.

Return type:

pandas.Timestamp

property end: Timestamp

Timestamp of the last observation.

Return type:

pandas.Timestamp

property duration: Timedelta

Wall-clock span from the first to the last observation.

Return type:

pandas.Timedelta

property name: str

Short identifier for the series.

Return type:

str

property unit: str | None

Physical unit of the values, or None if unspecified.

Return type:

str or None

property description: str | None

Free-text description, or None if unspecified.

Return type:

str or None

property freq: str | None

Pandas offset alias (e.g., "D"), or None for irregular data.

Return type:

str or None

property freq_label: str

Human-readable frequency label (e.g., "Daily").

Return type:

str

property has_nan: bool

True when at least one value is NaN.

Return type:

bool

property n_nan: int

Number of NaN values.

Return type:

int

property is_regular: bool

True when all consecutive time gaps are identical.

A regular series has no missing timestamps (assuming a fixed sampling interval). An irregular series may be the result of market holidays, sensor outages, or event-driven sampling.

Return type:

bool
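
The "identical gaps" rule can be checked directly on a DatetimeIndex. This is a sketch under the assumption that regularity is judged purely on raw gap lengths; the function name gaps_are_identical is illustrative, and tseda's own check may differ (for example in how it treats business-day calendars).

```python
import numpy as np
import pandas as pd


def gaps_are_identical(index):
    """Illustrative check (an assumption, not tseda's code): True when all gaps match."""
    idx = pd.DatetimeIndex(index)
    if len(idx) < 3:
        return True                      # zero or one gap is trivially regular
    gaps = np.diff(idx.asi8)             # gaps between consecutive stamps, as integers
    return bool((gaps == gaps[0]).all())


daily = pd.date_range("2020-01-01", periods=10, freq="D")
business = pd.date_range("2020-01-01", periods=10, freq="B")   # skips weekends

daily_regular = gaps_are_identical(daily)
business_regular = gaps_are_identical(business)
```

Under this strict definition the business-day index is irregular, because the Friday-to-Monday gap is three days rather than one.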

to_series()[source]

Return the data as a pandas.Series.

The returned Series shares the DatetimeIndex of this TimeSeries, and its Series name is set to name.

Return type:

pandas.Series

to_frame()[source]

Return the data as a single-column pandas.DataFrame.

Returns:

Column name equals name.

Return type:

pandas.DataFrame

to_numpy()[source]

Return a copy of the raw values as a 1-D numpy array.

Return type:

numpy.ndarray

copy()[source]

Return a deep copy of this TimeSeries.

Return type:

TimeSeries

slice(start=None, end=None)[source]

Return a time-bounded subset of the series.

Both start and end are inclusive. Either may be None to leave that boundary open.

Parameters:
  • start (str | Timestamp | None) – Start timestamp (inclusive). Accepts any value parseable by pandas.Timestamp() (e.g., "2020-01-01").

  • end (str | Timestamp | None) – End timestamp (inclusive).

Return type:

TimeSeries

Raises:

ValueError – If the resulting slice is empty.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.arange(365.0), index=idx)
>>> q1 = ts.slice("2020-01-01", "2020-03-31")
>>> q1.n
91
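
The inclusive-on-both-ends behaviour matches label-based slicing in pandas, which is a convenient way to sanity-check expected lengths. The snippet below uses only pandas and does not show how tseda implements slice().

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=365, freq="D")
s = pd.Series(np.arange(365.0), index=idx)

# Label-based slicing with .loc keeps BOTH endpoints.
q1 = s.loc["2020-01-01":"2020-03-31"]
q1_len = len(q1)                         # 31 + 29 + 31 days (2020 is a leap year)
last_stamp = q1.index[-1]

# Leaving one side as None leaves that boundary open.
tail = s.loc[pd.Timestamp("2020-12-01"):None]
```

Because 2020 has 366 days, a 365-day range starting on 1 January ends on 30 December, so the open-ended tail holds 30 values.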

resample(freq, *, agg=AggMethod.MEAN)[source]

Resample the series to a new frequency.

Parameters:
  • freq (str) – Target pandas offset alias (e.g., "W", "MS").

  • agg (str | AggMethod) – Aggregation method. Either an AggMethod member or its string value. Default "mean".

Return type:

TimeSeries

Raises:

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.ones(365), index=idx)
>>> ts.resample("MS").n    # 12 monthly values
12
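
Since each AggMethod string value is documented as matching a pandas Resampler method name, the aggregation can be thought of as a lookup by name. A sketch of that idea, using a hypothetical helper resample_by_name on a plain pandas Series:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=365, freq="D")
s = pd.Series(np.ones(365), index=idx)


def resample_by_name(series, freq, agg="mean"):
    """Hypothetical helper: look up the Resampler method named by `agg`."""
    resampler = series.resample(freq)
    return getattr(resampler, agg)()


monthly_mean = resample_by_name(s, "MS", "mean")   # 12 values, all 1.0
monthly_sum = resample_by_name(s, "MS", "sum")     # number of days per month
```

With a series of ones, the "sum" aggregation simply counts the observations in each month, which makes partial months easy to spot.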

diff(periods=1, *, method=DiffMethod.SIMPLE)[source]

Difference the series.

Parameters:
  • periods (int) – Number of periods to lag. Default 1 (first difference).

  • method (str | DiffMethod) –

    One of:

    • "simple" – y[t] - y[t-k]

    • "log" – log(y[t]) - log(y[t-k])

    • "percent" – (y[t] - y[t-k]) / y[t-k]

Returns:

The leading NaN rows introduced by differencing are dropped.

Return type:

TimeSeries

Raises:

ValueError – If method is "log" or "percent" and the series contains non-positive values.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 11.0, 12.0, 11.0, 13.0], index=idx)
>>> ts.diff().values
array([ 1.,  1., -1.,  2.])
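
The three differencing formulas can be written out in NumPy to see how they relate. This is a sketch of the arithmetic only, with k standing in for the periods argument; it does not reproduce tseda's validation or index handling.

```python
import numpy as np

y = np.array([10.0, 11.0, 12.0, 11.0, 13.0])
k = 1  # lag, corresponding to the `periods` argument

simple = y[k:] - y[:-k]                    # y[t] - y[t-k]
log_diff = np.log(y[k:]) - np.log(y[:-k])  # log(y[t]) - log(y[t-k])
percent = (y[k:] - y[:-k]) / y[:-k]        # (y[t] - y[t-k]) / y[t-k]
```

Each result has n - k elements, which corresponds to dropping the leading NaN rows. The log and percent variants are linked by log_diff = log(1 + percent), so they are close to each other for small relative changes.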

log()[source]

Apply the natural logarithm element-wise.

Return type:

TimeSeries

Raises:

ValueError – If the series contains non-positive values.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> TimeSeries([1.0, np.e, np.e**2], index=idx).log().values
array([0., 1., 2.])

standardize()[source]

Standardise to zero mean and unit variance (z-score).

The transform is (x - mean) / std. NaN values are ignored when computing statistics but preserved in position.

Return type:

TimeSeries

Raises:

ValueError – If the standard deviation is zero (constant series).

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=4, freq="D")
>>> ts = TimeSeries([2.0, 4.0, 6.0, 8.0], index=idx)
>>> z = ts.standardize()
>>> round(float(z.values.mean()), 10)
0.0
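
The NaN handling described above can be sketched with NumPy's NaN-aware reductions. The choice of population standard deviation (ddof=0) here is an assumption, since the documentation does not say which estimator tseda uses.

```python
import numpy as np

x = np.array([2.0, np.nan, 4.0, 6.0, 8.0])

mean = np.nanmean(x)          # NaN-ignoring mean
std = np.nanstd(x)            # NaN-ignoring population std (ddof=0 is an assumption)
if std == 0:
    raise ValueError("constant series: standard deviation is zero")

z = (x - mean) / std          # NaN stays NaN, in the same position
```

The NaN at position 1 does not affect the statistics and is still NaN in the output.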

normalize(*, lower=0.0, upper=1.0)[source]

Min-max normalise the series to [lower, upper].

Parameters:
  • lower (float) – Target minimum value. Default 0.0.

  • upper (float) – Target maximum value. Default 1.0.

Return type:

TimeSeries

Raises:

ValueError – If the series has zero range (max == min) or lower >= upper.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([0.0, 5.0, 10.0], index=idx)
>>> ts.normalize().values
array([0. , 0.5, 1. ])
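
Min-max rescaling to an arbitrary target interval follows lower + (x - min) * (upper - lower) / (max - min). A minimal sketch of that formula, using a hypothetical function min_max and assuming NaN values are ignored when finding the extremes:

```python
import numpy as np


def min_max(x, lower=0.0, upper=1.0):
    """Illustrative min-max rescaling; not taken from tseda's source."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)
    if hi == lo:
        raise ValueError("series has zero range")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return lower + (x - lo) * (upper - lower) / (hi - lo)


unit = min_max([0.0, 5.0, 10.0])                               # maps to [0, 1]
symmetric = min_max([0.0, 5.0, 10.0], lower=-1.0, upper=1.0)   # maps to [-1, 1]
```

The midpoint of the input range always lands on the midpoint of the target interval, here 0.5 and 0.0 respectively.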

rolling(window, *, agg=AggMethod.MEAN, center=False, min_periods=None)[source]

Apply a rolling-window aggregation.

Parameters:
  • window (int) – Size of the rolling window in number of observations.

  • agg (str | AggMethod) – Aggregation method (default "mean").

  • center (bool) – Whether to set the window labels as the centre of the window (default False — trailing window).

  • min_periods (int | None) – Minimum number of non-NaN observations required to produce a value. Defaults to window.

Returns:

Leading/trailing NaNs introduced by the window are dropped.

Return type:

TimeSeries

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=6, freq="D")
>>> ts = TimeSeries([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)
>>> ts.rolling(3).values
array([2., 3., 4., 5.])
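
The effect of center on the result's timestamps is easiest to see in plain pandas. The snippet below is a sketch that mimics the documented behaviour (rolling mean followed by dropping NaNs); it is not tseda's code.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=6, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Trailing window: each value is labelled with the window's last timestamp.
trailing = s.rolling(3).mean().dropna()

# Centred window: each value is labelled with the window's middle timestamp.
centred = s.rolling(3, center=True).mean().dropna()
```

Both results hold the same four means, but the trailing version spans 3 to 6 January while the centred version spans 2 to 5 January.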

apply(func, *, name=None)[source]

Apply an arbitrary element-wise function to the values.

Parameters:
  • func (Callable[[ndarray], ndarray]) – Callable that takes a 1-D numpy.ndarray and returns a 1-D array of the same length.

  • name (str | None) – Name for the resulting series. Defaults to "f({self.name})".

Return type:

TimeSeries

Raises:

ValueError – If func changes the length of the array.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([1.0, 4.0, 9.0], index=idx)
>>> ts.apply(np.sqrt).values
array([1., 2., 3.])
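
The same-length contract on func can be illustrated with a small wrapper. The helper apply_checked below is hypothetical and only approximates the documented ValueError; the real check in tseda may be worded or ordered differently.

```python
import numpy as np


def apply_checked(values, func):
    """Hypothetical stand-in for the documented length check."""
    out = np.asarray(func(values))
    if out.shape != values.shape:
        raise ValueError(f"func changed the shape from {values.shape} to {out.shape}")
    return out


vals = np.array([1.0, 4.0, 9.0])
roots = apply_checked(vals, np.sqrt)        # same length: accepted
running = apply_checked(vals, np.cumsum)    # whole-array functions also qualify

try:
    apply_checked(vals, np.diff)            # returns n - 1 values
    rejected = False
except ValueError:
    rejected = True
```

Because func receives the whole array, any length-preserving callable satisfies this contract, not only strictly element-wise ones.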

__len__()[source]

Return type:

int

__contains__(timestamp)[source]

Check whether a timestamp exists in the index.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([1.0, 2.0, 3.0], index=idx)
>>> pd.Timestamp("2020-01-02") in ts
True

Parameters:

timestamp (object)

Return type:

bool

__getitem__(key)[source]

Positional indexing by integer or slice.

Parameters:

key (int | slice) –

  • int — return the scalar value at that position.

  • slice — return a new TimeSeries for that range.

Return type:

float | TimeSeries

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 20.0, 30.0, 40.0, 50.0], index=idx)
>>> ts[0]
10.0
>>> ts[-1]
50.0
>>> ts[1:3].values
array([20., 30.])
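
The int-versus-slice behaviour, together with timestamp membership, can be sketched with a toy container. MiniSeries is a hypothetical class written for illustration and should not be read as the implementation of TimeSeries.

```python
import numpy as np
import pandas as pd


class MiniSeries:
    """Toy container (not tseda.TimeSeries) showing int-vs-slice dispatch."""

    def __init__(self, values, index):
        self._values = np.asarray(values, dtype=float)
        self._index = pd.DatetimeIndex(index)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # A slice keeps values and timestamps aligned in a new object.
            return MiniSeries(self._values[key], self._index[key])
        if isinstance(key, (int, np.integer)):
            return float(self._values[key])
        raise TypeError(f"unsupported key type: {type(key).__name__}")

    def __len__(self):
        return len(self._values)

    def __contains__(self, timestamp):
        return pd.Timestamp(timestamp) in self._index


ms = MiniSeries([10.0, 20.0, 30.0, 40.0, 50.0],
                pd.date_range("2020-01-01", periods=5, freq="D"))
first, last = ms[0], ms[-1]
middle = ms[1:3]                      # positions 1 and 2 (end-exclusive)
```

Positional slices are end-exclusive, following normal Python semantics, whereas slice(start, end) on timestamps is documented as inclusive at both ends.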

__repr__()[source]

Return repr(self).

Return type:

str

class tseda.Frequency(*values)[source]

Bases: str, Enum

Canonical pandas offset aliases recognised by tseda.

The string value of each member is a valid freq argument to pandas.date_range() and pandas.Series.resample().

Examples

>>> Frequency.DAILY.value
'D'
>>> Frequency.DAILY == "D"
True
SECONDLY = 'S'
MINUTELY = 'min'
HOURLY = 'h'
DAILY = 'D'
BUSINESS_DAILY = 'B'
WEEKLY = 'W'
MONTHLY_START = 'MS'
MONTHLY_END = 'ME'
QUARTERLY_START = 'QS'
QUARTERLY_END = 'QE'
ANNUAL_START = 'YS'
ANNUAL_END = 'YE'
__repr__()

Return repr(self).
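
The equality with plain strings shown in the example comes from the str mix-in in the enum's bases. A standard-library sketch of the same pattern, using a stand-in enum named Freq (a hypothetical name, with only two members):

```python
from enum import Enum


class Freq(str, Enum):
    """Stand-in enum (hypothetical name) with the same `str, Enum` bases."""
    DAILY = "D"
    HOURLY = "h"


is_equal = Freq.DAILY == "D"          # str mix-in makes the member compare equal to "D"
by_value = Freq("h")                  # look a member up from its string value
is_str = isinstance(Freq.DAILY, str)  # members can be passed where a str is expected
```

When the alias itself is needed as text, reading .value is the safer choice, since the string formatting of mixed-in enum members has varied between Python versions.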

class tseda.AggMethod(*values)[source]

Bases: str, Enum

Aggregation functions available when resampling a TimeSeries.

The string value matches the pandas.core.resample.Resampler method name.

Examples

>>> AggMethod.MEAN.value
'mean'
MEAN = 'mean'
SUM = 'sum'
MIN = 'min'
MAX = 'max'
MEDIAN = 'median'
FIRST = 'first'
LAST = 'last'
STD = 'std'
VAR = 'var'
COUNT = 'count'
__repr__()

Return repr(self).

class tseda.DiffMethod(*values)[source]

Bases: str, Enum

Differencing mode for diff().

SIMPLE

y[t] - y[t-k] (standard first/kth difference).

LOG

log(y[t]) - log(y[t-k]) (log return, i.e. the change measured on a log scale).

PERCENT

(y[t] - y[t-k]) / y[t-k] (relative change).

SIMPLE = 'simple'
LOG = 'log'
PERCENT = 'percent'
__repr__()

Return repr(self).