Quickstart

tseda — Time Series Exploratory Data Analysis

A comprehensive, dependency-light Python toolkit for understanding time series data before forecasting it.

Quick Start

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries

>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.cumsum(np.random.randn(365)), index=idx,
...                 name="stock_price", unit="USD")
>>> print(ts)

Modules

core: TimeSeries data structure and validators.
quality: Missing-value analysis, outlier detection, and duplicate checks.
statistics: Descriptive statistics, stationarity tests, and autocorrelation.
decomposition: Classical (additive/multiplicative) and STL decomposition.
seasonality: Period detection via periodogram and autocorrelation.
anomaly: Point and contextual anomaly detection.
changepoint: Structural break detection.
features: Temporal, statistical, and spectral feature extraction.
forecastability: Forecast-readiness scoring and data-leakage detection.
visualization: Matplotlib-based plot suite.
report: HTML and console report generation.

class tseda.TimeSeries(data, *, index=None, name='value', freq=None, unit=None, description=None)[source]

Bases: object

Univariate time series with a pandas.DatetimeIndex.

Parameters:

data (Union[ArrayLike, pd.Series]) –
Numeric values. Accepted types:
- 1-D numpy.ndarray
- pandas.Series — values are extracted; the Series index is used unless index is also provided.
- list or tuple of numbers
index (Optional[DatetimeLike]) –
Datetime timestamps aligned with data. When data is a pandas.Series with a pandas.DatetimeIndex, this argument may be omitted. Accepted types:
- pandas.DatetimeIndex
- list / numpy.ndarray of datetime-like strings or numpy.datetime64 objects
name (str) – Short identifier for the series (used in plots and reports). Default "value".
freq (Optional[str]) – Pandas offset alias (e.g., "D", "h", "MS"). When None (default), the frequency is inferred automatically.
unit (Optional[str]) – Physical unit of the values (e.g., "USD", "°C"). Purely informational — used in axis labels.
description (Optional[str]) – Free-text description stored in metadata.

Raises:

TypeError – If data or index has an unsupported type.
ValueError – If data and index have different lengths, if index is not monotonically increasing, or if index contains duplicates.
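The ValueError conditions above can be pictured as a small stand-alone check. This is a plain-Python sketch, not tseda's validator: the names find_index_problem and check_index are hypothetical, and the order in which problems are reported is an assumption.

```python
from datetime import datetime


def find_index_problem(values, index):
    """Return a message for the first problem found, or None if all is well.

    Hypothetical helper for illustration; not tseda's validator.
    """
    if len(values) != len(index):
        return "data and index have different lengths"
    # One pass over neighbouring timestamps covers both ordering checks.
    # A repeated timestamp that is not adjacent shows up as an ordering problem.
    for earlier, later in zip(index, index[1:]):
        if later == earlier:
            return "index contains duplicates"
        if later < earlier:
            return "index is not monotonically increasing"
    return None


def check_index(values, index):
    """Raise ValueError in the situations listed under Raises."""
    problem = find_index_problem(values, index)
    if problem is not None:
        raise ValueError(problem)


days = [datetime(2020, 1, d) for d in (1, 2, 3)]
check_index([10.0, 11.5, 9.8], days)           # passes silently
print(find_index_problem([10.0, 11.5], days))  # data and index have different lengths
```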

Examples

From a numpy array:

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> idx = pd.date_range("2020-01-01", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 11.5, 9.8, 12.0, 11.0], index=idx)
>>> ts.n
5

From a pandas Series:

>>> s = pd.Series([1, 2, 3], index=pd.date_range("2020", periods=3, freq="D"))
>>> ts = TimeSeries.from_series(s)

__init__(data, *, index=None, name='value', freq=None, unit=None, description=None)[source]

Parameters:

data (ndarray | list | tuple | Series)
index (DatetimeIndex | Series | list | ndarray | None)
name (str)
freq (str | None)
unit (str | None)
description (str | None)

Return type:

None

classmethod from_series(series, *, name=None, freq=None, unit=None, description=None)[source]

Construct a TimeSeries from a pandas.Series.

Parameters:

series (Series) – Must have a pandas.DatetimeIndex.
name (str | None) – Override the Series’ .name attribute. When None the Series name (if any) is used, falling back to "value".
freq (str | None) – Forwarded to TimeSeries.__init__.
unit (str | None) – Forwarded to TimeSeries.__init__.
description (str | None) – Forwarded to TimeSeries.__init__.

Return type:

TimeSeries

Examples

>>> s = pd.Series([1.0, 2.0], index=pd.date_range("2020", periods=2, freq="D"))
>>> TimeSeries.from_series(s, name="x").name
'x'

classmethod from_arrays(values, index, *, name='value', freq=None, unit=None, description=None)[source]

Construct a TimeSeries from parallel arrays.

Parameters:

values (ndarray | list | tuple | Series) – 1-D numeric array.
index (DatetimeIndex | Series | list | ndarray) – Datetime-like array of the same length.
name (str) – Forwarded to TimeSeries.__init__.
freq (str | None) – Forwarded to TimeSeries.__init__.
unit (str | None) – Forwarded to TimeSeries.__init__.
description (str | None) – Forwarded to TimeSeries.__init__.

Return type:

TimeSeries

Examples

>>> import numpy as np, pandas as pd
>>> vals = np.array([1.0, 2.0, 3.0])
>>> idx  = pd.date_range("2021-01-01", periods=3, freq="D")
>>> TimeSeries.from_arrays(vals, idx).n
3

classmethod from_dataframe(df, column, *, name=None, freq=None, unit=None, description=None)[source]

Extract one column from a pandas.DataFrame.

Parameters:

df (DataFrame) – Source DataFrame. Must have a pandas.DatetimeIndex.
column (str) – Column name to extract.
name (str | None) – Series name to use instead of the column name. When None, the column name is used.
freq (str | None) – Forwarded to TimeSeries.__init__.
unit (str | None) – Forwarded to TimeSeries.__init__.
description (str | None) – Forwarded to TimeSeries.__init__.

Return type:

TimeSeries

Raises:

KeyError – If column is not in df.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"temp": [20.0, 21.0, 19.5]},
...                    index=pd.date_range("2020", periods=3, freq="D"))
>>> TimeSeries.from_dataframe(df, "temp").name
'temp'

property values: ndarray

1-D float64 array of observed values.

Returns: A copy to protect the internal state.
Return type: numpy.ndarray

property index: DatetimeIndex

Datetime index of the series.

Return type: pandas.DatetimeIndex

property n: int

Number of observations.

Return type: int

property start: Timestamp

Timestamp of the first observation.

Return type: pandas.Timestamp

property end: Timestamp

Timestamp of the last observation.

Return type: pandas.Timestamp

property duration: Timedelta

Wall-clock span from the first to the last observation.

Return type: pandas.Timedelta

property name: str

Short identifier for the series.

Return type: str

property unit: str | None

Physical unit of the values, or None if unspecified.

Return type: str or None

property description: str | None

Free-text description, or None if unspecified.

Return type: str or None

property freq: str | None

Pandas offset alias (e.g., "D"), or None for irregular data.

Return type: str or None

property freq_label: str

Human-readable frequency label (e.g., "Daily").

Return type: str

property has_nan: bool

True when at least one value is NaN.

Return type: bool

property n_nan: int

Number of NaN values.

Return type: int

property is_regular: bool

True when all consecutive time gaps are identical.

A regular series has no missing timestamps (assuming a fixed sampling interval). An irregular series may be the result of market holidays, sensor outages, or event-driven sampling.

Return type: bool
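The "identical consecutive gaps" test can be sketched with the standard library alone. The helper gaps_are_identical is hypothetical, and treating a series with fewer than two observations as regular is an assumption, since the text above does not say how tseda handles that case.

```python
from datetime import datetime, timedelta


def gaps_are_identical(timestamps):
    """Return True when every consecutive gap is the same (hypothetical helper)."""
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    # Zero or one distinct gap means the spacing is regular.
    return len(gaps) <= 1


start = datetime(2020, 1, 1)
daily = [start + timedelta(days=i) for i in range(5)]
with_outage = daily[:2] + daily[3:]  # drop 2020-01-03 to mimic a sensor outage

print(gaps_are_identical(daily))        # True
print(gaps_are_identical(with_outage))  # False
```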

to_series()[source]

Return the data as a pandas.Series.

The returned Series shares this object's DatetimeIndex and takes name as its Series name.

Return type: pandas.Series

to_frame()[source]

Return the data as a single-column pandas.DataFrame.

Returns: Column name equals name.
Return type: pandas.DataFrame

to_numpy()[source]

Return a copy of the raw values as a 1-D numpy array.

Return type: numpy.ndarray

copy()[source]

Return a deep copy of this TimeSeries.

Return type: TimeSeries

slice(start=None, end=None)[source]

Return a time-bounded subset of the series.

Both start and end are inclusive. Either may be None to leave that boundary open.

Parameters:

start (str | Timestamp | None) – Start timestamp (inclusive). Accepts any value parseable by pandas.Timestamp() (e.g., "2020-01-01").
end (str | Timestamp | None) – End timestamp (inclusive).

Return type:

TimeSeries

Raises:

ValueError – If the resulting slice is empty.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.arange(365.0), index=idx)
>>> q1 = ts.slice("2020-01-01", "2020-03-31")
>>> q1.n
91
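Inclusive bounds on a sorted index map naturally onto a pair of binary searches. The sketch below uses only the standard library; inclusive_bounds is a hypothetical helper and says nothing about how tseda itself selects rows.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta


def inclusive_bounds(index, start=None, end=None):
    """Positions [lo, hi) of a sorted index covered by start..end, both inclusive."""
    # bisect_left keeps a timestamp equal to start inside the range.
    lo = 0 if start is None else bisect_left(index, start)
    # bisect_right keeps a timestamp equal to end inside the range.
    hi = len(index) if end is None else bisect_right(index, end)
    return lo, hi


index = [date(2020, 1, 1) + timedelta(days=i) for i in range(365)]
lo, hi = inclusive_bounds(index, date(2020, 1, 1), date(2020, 3, 31))
print(hi - lo)  # 91: 31 (Jan) + 29 (Feb, leap year) + 31 (Mar)
```

Passing None for either argument leaves that side open, matching the description above.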

resample(freq, *, agg=AggMethod.MEAN)[source]

Resample the series to a new frequency.

Parameters:

freq (str) – Target pandas offset alias (e.g., "W", "MS").
agg (str | AggMethod) – Aggregation method. Either an AggMethod member or its string value. Default "mean".

Return type:

TimeSeries

Raises:

ValueError – If freq is not recognised by pandas.
AttributeError – If agg is not a valid resampler method.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.ones(365), index=idx)
>>> ts.resample("MS").n    # 12 monthly values
12
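To see why 365 daily values starting on 2020-01-01 collapse to 12 monthly ones, the grouping can be reproduced by hand. This is a standard-library sketch of month-start bucketing only; resample_month_start is a hypothetical name, and tseda's own method presumably delegates to pandas.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def resample_month_start(index, values, agg=mean):
    """Group observations by the first day of their month and aggregate each group."""
    buckets = defaultdict(list)
    for stamp, value in zip(index, values):
        buckets[stamp.replace(day=1)].append(value)
    labels = sorted(buckets)
    return labels, [agg(buckets[label]) for label in labels]


index = [date(2020, 1, 1) + timedelta(days=i) for i in range(365)]
labels, monthly = resample_month_start(index, [1.0] * 365)
print(len(monthly))  # 12
print(labels[0])     # 2020-01-01
```

Swapping agg for sum turns each bucket into a count of days here, because every value is 1.0.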

diff(periods=1, *, method=DiffMethod.SIMPLE)[source]

Difference the series.

Parameters:

periods (int) – Number of periods to lag. Default 1 (first difference).
method (str | DiffMethod) –
One of:
- "simple" — y[t] - y[t-k]
- "log" — log(y[t]) - log(y[t-k])
- "percent" — (y[t] - y[t-k]) / y[t-k]

Returns:

The leading NaN rows introduced by differencing are dropped.

Return type:

TimeSeries

Raises:

ValueError – If method is "log" or "percent" and the series contains non-positive values.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 11.0, 12.0, 11.0, 13.0], index=idx)
>>> ts.diff().values
array([ 1.,  1., -1.,  2.])
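The three formulas can be compared side by side on plain lists. The function difference below is a hypothetical stand-in written with the standard library; the positivity check mirrors the documented ValueError, while the error messages and the guard on periods are assumptions.

```python
import math


def difference(values, periods=1, method="simple"):
    """Lagged differences of a list; the first `periods` positions are dropped."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    pairs = list(zip(values[periods:], values[:-periods]))  # (y[t], y[t-k])
    if method == "simple":
        return [now - before for now, before in pairs]
    if method in ("log", "percent") and any(v <= 0 for v in values):
        raise ValueError("log and percent differencing need positive values")
    if method == "log":
        return [math.log(now) - math.log(before) for now, before in pairs]
    if method == "percent":
        return [(now - before) / before for now, before in pairs]
    raise ValueError(f"unknown method: {method!r}")


y = [10.0, 11.0, 12.0, 11.0, 13.0]
print(difference(y))             # [1.0, 1.0, -1.0, 2.0]
print(difference(y, periods=2))  # [2.0, 0.0, 1.0]
```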

log()[source]

Apply the natural logarithm element-wise.

Return type: TimeSeries
Raises: ValueError – If the series contains non-positive values.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> TimeSeries([1.0, np.e, np.e**2], index=idx).log().values
array([0., 1., 2.])

standardize()[source]

Standardise to zero mean and unit variance (z-score).

The transform is (x - mean) / std. NaN values are ignored when computing statistics but preserved in position.

Return type: TimeSeries
Raises: ValueError – If the standard deviation is zero (constant series).

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=4, freq="D")
>>> ts = TimeSeries([2.0, 4.0, 6.0, 8.0], index=idx)
>>> z = ts.standardize()
>>> round(float(z.values.mean()), 10)
0.0
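The NaN behaviour described above (skipped in the statistics, kept in position) is easy to check by hand. The sketch uses the standard library; zscore is a hypothetical name, and the population standard deviation (ddof=0) is an assumption, because the text does not say which estimator tseda uses.

```python
import math
from statistics import fmean, pstdev


def zscore(values):
    """(x - mean) / std with NaNs skipped in the statistics but kept in place."""
    observed = [v for v in values if not math.isnan(v)]
    mu = fmean(observed)
    sigma = pstdev(observed)  # assumption: population std (ddof=0)
    if sigma == 0:
        raise ValueError("cannot standardise a constant series")
    # NaN propagates through the arithmetic, so gaps stay where they were.
    return [(v - mu) / sigma for v in values]


z = zscore([2.0, 4.0, float("nan"), 6.0, 8.0])
print(math.isnan(z[2]))  # True: the gap is still at position 2
```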

normalize(*, lower=0.0, upper=1.0)[source]

Min-max normalise the series to [lower, upper].

Parameters:

lower (float) – Target minimum value. Default 0.0.
upper (float) – Target maximum value. Default 1.0.

Return type:

TimeSeries

Raises:

ValueError – If the series has zero range (max == min) or lower >= upper.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([0.0, 5.0, 10.0], index=idx)
>>> ts.normalize().values
array([0. , 0.5, 1. ])
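For reference, min-max scaling to an arbitrary target interval is lower + (x - min) * (upper - lower) / (max - min). A standard-library sketch follows; min_max is a hypothetical helper, and NaN handling is left out because the text above does not describe it.

```python
def min_max(values, lower=0.0, upper=1.0):
    """Linearly map values so that min -> lower and max -> upper."""
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("series has zero range")
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in values]


print(min_max([0.0, 5.0, 10.0]))              # [0.0, 0.5, 1.0]
print(min_max([0.0, 5.0, 10.0], lower=-1.0))  # [-1.0, 0.0, 1.0]
```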

rolling(window, *, agg=AggMethod.MEAN, center=False, min_periods=None)[source]

Apply a rolling-window aggregation.

Parameters:

window (int) – Size of the rolling window in number of observations.
agg (str | AggMethod) – Aggregation method (default "mean").
center (bool) – Whether to set the window labels as the centre of the window (default False — trailing window).
min_periods (int | None) – Minimum number of non-NaN observations required to produce a value. Defaults to window.

Returns:

Leading/trailing NaNs introduced by the window are dropped.

Return type:

TimeSeries

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=6, freq="D")
>>> ts = TimeSeries([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)
>>> ts.rolling(3).values
array([2., 3., 4., 5.])
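The output above has four values rather than six because the first two positions never see a full window. A standard-library sketch of the trailing case with the default min_periods (equal to window) makes this explicit; trailing_mean is a hypothetical helper.

```python
from statistics import fmean


def trailing_mean(values, window):
    """Mean of each full trailing window; incomplete leading windows are skipped."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # `end` is the exclusive right edge of each window.
    return [fmean(values[end - window:end]) for end in range(window, len(values) + 1)]


print(trailing_mean([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3))  # [2.0, 3.0, 4.0, 5.0]
```

A series of length n therefore yields n - window + 1 values.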

apply(func, *, name=None)[source]

Apply an arbitrary element-wise function to the values.

Parameters:

func (Callable[[ndarray], ndarray]) – Callable that takes a 1-D numpy.ndarray and returns a 1-D array of the same length.
name (str | None) – Name for the resulting series. Defaults to "f({self.name})".

Return type:

TimeSeries

Raises:

ValueError – If func changes the length of the array.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([1.0, 4.0, 9.0], index=idx)
>>> ts.apply(np.sqrt).values
array([1., 2., 3.])
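The length contract on func can be illustrated without tseda. In this standard-library sketch, apply_checked is a hypothetical helper that passes the whole sequence to func and rejects a result of a different length; the error message is an assumption.

```python
import math


def apply_checked(values, func):
    """Run func on the whole list and insist on an unchanged length."""
    result = list(func(values))
    if len(result) != len(values):
        raise ValueError(
            f"func changed the length from {len(values)} to {len(result)}"
        )
    return result


roots = apply_checked([1.0, 4.0, 9.0], lambda xs: [math.sqrt(x) for x in xs])
print(roots)  # [1.0, 2.0, 3.0]
```

A function such as lambda xs: xs[1:] would fail the check, since it drops an element.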

__len__()[source]

Return type: int

__contains__(timestamp)[source]

Check whether a timestamp exists in the index.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([1.0, 2.0, 3.0], index=idx)
>>> pd.Timestamp("2020-01-02") in ts
True

Parameters: timestamp (object)
Return type: bool

__getitem__(key)[source]

Positional indexing by integer or slice.

Parameters:

key (int | slice) –

int — return the scalar value at that position.
slice — return a new TimeSeries for that range.

Return type:

float | TimeSeries

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 20.0, 30.0, 40.0, 50.0], index=idx)
>>> ts[0]
10.0
>>> ts[-1]
50.0
>>> ts[1:3].values
array([20., 30.])
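Returning a scalar for an integer and a new container for a slice is a common pattern. The toy class below is a hypothetical illustration written with the standard library; it carries no timestamps and is not how TimeSeries is implemented.

```python
class MiniSeries:
    """Toy container (hypothetical) showing int versus slice indexing."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # A slice yields a new container of the same type.
            return MiniSeries(self._values[key])
        # An int yields the scalar; negative positions count from the end.
        return self._values[key]


ms = MiniSeries([10.0, 20.0, 30.0, 40.0, 50.0])
print(ms[0], ms[-1])  # 10.0 50.0
print(len(ms[1:3]))   # 2
```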

__repr__()[source]

Return repr(self).

Return type: str

class tseda.Frequency(*values)[source]

Bases: str, Enum

Canonical pandas offset aliases recognised by tseda.

The string value of each member is a valid freq argument to pandas.date_range() and pandas.Series.resample().

Examples

>>> Frequency.DAILY.value
'D'
>>> Frequency.DAILY == "D"
True

SECONDLY = 'S'

MINUTELY = 'min'

HOURLY = 'h'

DAILY = 'D'

BUSINESS_DAILY = 'B'

WEEKLY = 'W'

MONTHLY_START = 'MS'

MONTHLY_END = 'ME'

QUARTERLY_START = 'QS'

QUARTERLY_END = 'QE'

ANNUAL_START = 'YS'

ANNUAL_END = 'YE'

__repr__(): Return repr(self).

class tseda.AggMethod(*values)[source]

Bases: str, Enum

Aggregation functions available when resampling a TimeSeries.

The string value matches the pandas.core.resample.Resampler method name.

Examples

>>> AggMethod.MEAN.value
'mean'

MEAN = 'mean'

SUM = 'sum'

MIN = 'min'

MAX = 'max'

MEDIAN = 'median'

FIRST = 'first'

LAST = 'last'

STD = 'std'

VAR = 'var'

COUNT = 'count'

__repr__(): Return repr(self).

class tseda.DiffMethod(*values)[source]

Bases: str, Enum

Differencing mode for diff().

SIMPLE: y[t] - y[t-k] (standard first/kth difference).

LOG: log(y[t]) - log(y[t-k]) (log return / percent change in log scale).

PERCENT: (y[t] - y[t-k]) / y[t-k] (relative change).

SIMPLE = 'simple'

LOG = 'log'

PERCENT = 'percent'

__repr__(): Return repr(self).