Quickstart
tseda — Time Series Exploratory Data Analysis
A comprehensive, dependency-light Python toolkit for understanding time series data before forecasting it.
Quick Start
>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.cumsum(np.random.randn(365)), index=idx,
... name="stock_price", unit="USD")
>>> print(ts)
Modules
- core
TimeSeries data structure and validators.
- quality
Missing-value analysis, outlier detection, and duplicate checks.
- statistics
Descriptive statistics, stationarity tests, and autocorrelation.
- decomposition
Classical (additive/multiplicative) and STL decomposition.
- seasonality
Period detection via periodogram and autocorrelation.
- anomaly
Point and contextual anomaly detection.
- changepoint
Structural break detection.
- features
Temporal, statistical, and spectral feature extraction.
- forecastability
Forecast-readiness scoring and data-leakage detection.
- visualization
Matplotlib-based plot suite.
- report
HTML and console report generation.
- class tseda.TimeSeries(data, *, index=None, name='value', freq=None, unit=None, description=None)[source]
Bases: object
Univariate time series with a pandas.DatetimeIndex.
- Parameters:
data (Union[ArrayLike, pd.Series]) – Numeric values. Accepted types:
- 1-D numpy.ndarray
- pandas.Series: values are extracted; the Series index is used unless index is also provided.
index (Optional[DatetimeLike]) – Datetime timestamps aligned with data. When data is a pandas.Series with a pandas.DatetimeIndex this argument may be omitted. Accepted types:
- list / numpy.ndarray of datetime-like strings or numpy.datetime64 objects
name (str) – Short identifier for the series (used in plots and reports). Default "value".
freq (Optional[str]) – Pandas offset alias (e.g., "D", "h", "MS"). When None (default) the frequency is inferred automatically.
unit (Optional[str]) – Physical unit of the values (e.g., "USD", "°C"). Purely informational; used in axis labels.
description (Optional[str]) – Free-text description stored in metadata.
- Raises:
TypeError – If data or index have an unsupported type.
ValueError – If data and index have different lengths, if index is not monotonically increasing, or if index contains duplicates.
Examples
From a numpy array:
>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> idx = pd.date_range("2020-01-01", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 11.5, 9.8, 12.0, 11.0], index=idx)
>>> ts.n
5
From a pandas Series:
>>> s = pd.Series([1, 2, 3], index=pd.date_range("2020", periods=3, freq="D"))
>>> ts = TimeSeries.from_series(s)
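The ValueError conditions listed under Raises can be pictured with a small standard-library sketch. The helper name check_index is hypothetical and not part of tseda; it only mirrors the three documented checks (equal lengths, no duplicates, increasing order) and says nothing about how the library implements them or in which order it reports them.

```python
from datetime import datetime, timedelta


def check_index(values, index):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if len(values) != len(index):
        raise ValueError("data and index have different lengths")
    # Duplicates are checked first so they are reported separately from ordering.
    if len(set(index)) != len(index):
        raise ValueError("index contains duplicates")
    if any(later < earlier for earlier, later in zip(index, index[1:])):
        raise ValueError("index is not monotonically increasing")


start = datetime(2020, 1, 1)
good = [start + timedelta(days=i) for i in range(3)]
check_index([1.0, 2.0, 3.0], good)  # a valid index passes silently

try:
    check_index([1.0, 2.0, 3.0], list(reversed(good)))
except ValueError as exc:
    message = str(exc)  # the reversed index fails the ordering check
```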
- classmethod from_series(series, *, name=None, freq=None, unit=None, description=None)[source]
Construct a TimeSeries from a pandas.Series.
- Parameters:
series (Series) – Must have a pandas.DatetimeIndex.
name (str | None) – Override the Series’ .name attribute. When None the Series name (if any) is used, falling back to "value".
freq (str | None) – Forwarded to TimeSeries.__init__.
unit (str | None) – Forwarded to TimeSeries.__init__.
description (str | None) – Forwarded to TimeSeries.__init__.
- Return type:
TimeSeries
Examples
>>> s = pd.Series([1.0, 2.0], index=pd.date_range("2020", periods=2, freq="D"))
>>> TimeSeries.from_series(s, name="x").name
'x'
- classmethod from_arrays(values, index, *, name='value', freq=None, unit=None, description=None)[source]
Construct a TimeSeries from parallel arrays.
- Parameters:
values (ndarray | list | tuple | Series) – 1-D numeric array.
index (DatetimeIndex | Series | list | ndarray) – Datetime-like array of the same length.
name (str) – Forwarded to TimeSeries.__init__.
freq (str | None) – Forwarded to TimeSeries.__init__.
unit (str | None) – Forwarded to TimeSeries.__init__.
description (str | None) – Forwarded to TimeSeries.__init__.
- Return type:
TimeSeries
Examples
>>> import numpy as np, pandas as pd
>>> vals = np.array([1.0, 2.0, 3.0])
>>> idx = pd.date_range("2021-01-01", periods=3, freq="D")
>>> TimeSeries.from_arrays(vals, idx).n
3
- classmethod from_dataframe(df, column, *, name=None, freq=None, unit=None, description=None)[source]
Extract one column from a pandas.DataFrame.
- Parameters:
df (DataFrame) – Source DataFrame. Must have a pandas.DatetimeIndex.
column (str) – Column name to extract.
name (str | None) – Override the column name as the series name.
freq (str | None) – Forwarded to TimeSeries.__init__.
unit (str | None) – Forwarded to TimeSeries.__init__.
description (str | None) – Forwarded to TimeSeries.__init__.
- Return type:
TimeSeries
- Raises:
KeyError – If column is not in df.
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({"temp": [20.0, 21.0, 19.5]},
...                   index=pd.date_range("2020", periods=3, freq="D"))
>>> TimeSeries.from_dataframe(df, "temp").name
'temp'
- property values: ndarray
1-D float64 array of observed values.
- Returns:
A copy to protect the internal state.
- Return type:
numpy.ndarray
- property index: DatetimeIndex
Datetime index of the series.
- Return type:
pandas.DatetimeIndex
- property unit: str | None
Physical unit of the values, or None if unspecified.
- Return type:
str or None
- property description: str | None
Free-text description, or None if unspecified.
- Return type:
str or None
- property freq: str | None
Pandas offset alias (e.g., "D"), or None for irregular data.
- Return type:
str or None
- property is_regular: bool
True when all consecutive time gaps are identical.
A regular series has no missing timestamps (assuming a fixed sampling interval). An irregular series may be the result of market holidays, sensor outages, or event-driven sampling.
- Return type:
bool
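The "identical consecutive gaps" criterion can be illustrated with the standard library alone. The function gaps_are_identical is a hypothetical name used only for this sketch; how tseda treats edge cases such as a series with fewer than two points is not stated on this page, so the sketch simply treats them as regular.

```python
from datetime import datetime, timedelta


def gaps_are_identical(index):
    """Return True when every consecutive gap in `index` is the same."""
    gaps = {later - earlier for earlier, later in zip(index, index[1:])}
    return len(gaps) <= 1  # zero or one distinct gap size counts as regular here


start = datetime(2020, 1, 1)
daily = [start + timedelta(days=i) for i in range(5)]
with_outage = daily[:2] + daily[3:]  # drop one day to mimic a sensor outage

regular = gaps_are_identical(daily)          # all gaps are one day
irregular = gaps_are_identical(with_outage)  # one gap is two days
```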
- to_series()[source]
Return the data as a pandas.Series.
The returned Series uses the same DatetimeIndex and the name attribute as its Series name.
- Return type:
pandas.Series
- to_frame()[source]
Return the data as a single-column pandas.DataFrame.
- Returns:
Column name equals name.
- Return type:
pandas.DataFrame
- copy()[source]
Return a deep copy of this TimeSeries.
- Return type:
TimeSeries
- slice(start=None, end=None)[source]
Return a time-bounded subset of the series.
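Both start and end are inclusive. Either may be None to leave that boundary open.
- Parameters:
start – Inclusive lower bound, or None to leave the start open.
end – Inclusive upper bound, or None to leave the end open.
- Return type:
TimeSeries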
- Raises:
ValueError – If the resulting slice is empty.
Examples
>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.arange(365.0), index=idx)
>>> q1 = ts.slice("2020-01-01", "2020-03-31")
>>> q1.n
91
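The inclusive-bounds behaviour described above can be sketched without tseda. The helper inclusive_slice is hypothetical and works on plain lists of dates; it only illustrates that both end points are kept, that None leaves a side open, and that an empty result is rejected.

```python
from datetime import date, timedelta


def inclusive_slice(index, values, start=None, end=None):
    """Keep points with start <= t <= end; None leaves that side open."""
    kept = [
        (t, v)
        for t, v in zip(index, values)
        if (start is None or t >= start) and (end is None or t <= end)
    ]
    if not kept:
        raise ValueError("resulting slice is empty")
    return kept


index = [date(2020, 1, 1) + timedelta(days=i) for i in range(365)]
values = [float(i) for i in range(365)]

# January (31) + February (29, leap year) + March (31) = 91 points
q1 = inclusive_slice(index, values, date(2020, 1, 1), date(2020, 3, 31))
# Open-ended upper bound: everything from 1 December onwards
tail = inclusive_slice(index, values, start=date(2020, 12, 1))
```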
- resample(freq, *, agg=AggMethod.MEAN)[source]
Resample the series to a new frequency.
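- Parameters:
freq – Pandas offset alias of the target frequency.
agg – Aggregation method. Default AggMethod.MEAN.
- Return type:
TimeSeries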
- Raises:
ValueError – If freq is not recognised by pandas.
AttributeError – If agg is not a valid resampler method.
Examples
>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.ones(365), index=idx)
>>> ts.resample("MS").n  # 12 monthly values
12
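As a rough picture of what resampling daily data to month-start frequency with a mean aggregation does, the following standard-library sketch groups observations by the first day of their month. The function monthly_mean is a hypothetical stand-in, not tseda or pandas code, and it covers only this one frequency and aggregation.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_mean(index, values):
    """Average the values of each calendar month, labelled by month start."""
    buckets = defaultdict(list)
    for t, v in zip(index, values):
        buckets[t.replace(day=1)].append(v)  # bucket key is the month's first day
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}


index = [date(2020, 1, 1) + timedelta(days=i) for i in range(365)]
values = [1.0] * 365

monthly = monthly_mean(index, values)  # one entry per month of 2020
```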
- diff(periods=1, *, method=DiffMethod.SIMPLE)[source]
Difference the series.
- Parameters:
periods (int) – Number of periods to lag. Default 1 (first difference).
method (str | DiffMethod) – One of:
- "simple": y[t] - y[t-k]
- "log": log(y[t]) - log(y[t-k])
- "percent": (y[t] - y[t-k]) / y[t-k]
- Returns:
The leading NaN rows introduced by differencing are dropped.
- Return type:
TimeSeries
- Raises:
ValueError – If method is "log" or "percent" and the series contains non-positive values.
Examples
>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 11.0, 12.0, 11.0, 13.0], index=idx)
>>> ts.diff().values
array([ 1.,  1., -1.,  2.])
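The three differencing formulas can be worked through on the same numbers with plain Python. The function difference below is a hypothetical illustration of the documented formulas and of dropping the first k positions; it is not the library's implementation.

```python
import math


def difference(y, k=1, method="simple"):
    """Lag-k difference of a list; the first k positions are dropped."""
    if method in ("log", "percent") and any(v <= 0 for v in y):
        raise ValueError("log/percent differencing needs positive values")
    pairs = list(zip(y[k:], y[:-k]))  # (y[t], y[t-k]) for every valid t
    if method == "simple":
        return [cur - prev for cur, prev in pairs]
    if method == "log":
        return [math.log(cur) - math.log(prev) for cur, prev in pairs]
    if method == "percent":
        return [(cur - prev) / prev for cur, prev in pairs]
    raise ValueError(f"unknown method: {method!r}")


y = [10.0, 11.0, 12.0, 11.0, 13.0]
simple = difference(y)                     # [1.0, 1.0, -1.0, 2.0]
percent = difference(y, method="percent")  # first entry is (11 - 10) / 10
log_ret = difference(y, method="log")      # first entry is log(11 / 10)
```

For small relative moves the log and percent results are close to each other, which is why log differences are often read as approximate returns.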
- log()[source]
Apply the natural logarithm element-wise.
- Return type:
TimeSeries
- Raises:
ValueError – If the series contains non-positive values.
Examples
>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> TimeSeries([1.0, np.e, np.e**2], index=idx).log().values
array([0., 1., 2.])
- standardize()[source]
Standardise to zero mean and unit variance (z-score).
The transform is (x - mean) / std. NaN values are ignored when computing statistics but preserved in position.
- Return type:
TimeSeries
- Raises:
ValueError – If the standard deviation is zero (constant series).
Examples
>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=4, freq="D")
>>> ts = TimeSeries([2.0, 4.0, 6.0, 8.0], index=idx)
>>> z = ts.standardize()
>>> round(float(z.values.mean()), 10)
0.0
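The NaN handling described for standardize() can be illustrated as follows. The function zscore is hypothetical, and the use of the population standard deviation (dividing by n) is an assumption of this sketch; the page does not say which convention tseda uses.

```python
import math


def zscore(values):
    """(x - mean) / std with NaNs skipped in the statistics but kept in place."""
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
    if std == 0:
        raise ValueError("standard deviation is zero (constant series)")
    return [(v - mean) / std for v in values]  # NaN inputs remain NaN


z = zscore([2.0, 4.0, float("nan"), 6.0, 8.0])
```

The third position stays NaN, while the remaining four values are centred on zero.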
- normalize(*, lower=0.0, upper=1.0)[source]
Min-max normalise the series to [lower, upper].
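- Parameters:
lower – Lower bound of the target range. Default 0.0.
upper – Upper bound of the target range. Default 1.0.
- Return type:
TimeSeries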
- Raises:
ValueError – If the series has zero range (max == min) or lower >= upper.
Examples
>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([0.0, 5.0, 10.0], index=idx)
>>> ts.normalize().values
array([0. , 0.5, 1. ])
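Min-max scaling to an arbitrary range is the usual formula lower + (x - min) * (upper - lower) / (max - min). The sketch below applies it with plain Python; min_max is a hypothetical name, and the two error checks mirror the documented ValueError conditions.

```python
def min_max(values, lower=0.0, upper=1.0):
    """Rescale values linearly so that min maps to lower and max to upper."""
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("series has zero range")
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in values]


unit = min_max([0.0, 5.0, 10.0])                            # default [0, 1]
signed = min_max([0.0, 5.0, 10.0], lower=-1.0, upper=1.0)   # custom range
```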
- rolling(window, *, agg=AggMethod.MEAN, center=False, min_periods=None)[source]
Apply a rolling-window aggregation.
- Parameters:
window (int) – Size of the rolling window in number of observations.
agg (str | AggMethod) – Aggregation method (default "mean").
center (bool) – Whether to set the window labels as the centre of the window (default False, i.e. a trailing window).
min_periods (int | None) – Minimum number of non-NaN observations required to produce a value. Defaults to window.
- Returns:
Leading/trailing NaNs introduced by the window are dropped.
- Return type:
TimeSeries
Examples
>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=6, freq="D")
>>> ts = TimeSeries([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)
>>> ts.rolling(3).values
array([2., 3., 4., 5.])
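The default case (trailing window, mean aggregation, incomplete windows dropped) reduces to a few lines of plain Python. The function trailing_mean is a hypothetical illustration and does not cover center=True, other aggregations, or min_periods.

```python
def trailing_mean(values, window):
    """Mean of each full trailing window; incomplete leading windows are dropped."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


smoothed = trailing_mean([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
```

Six observations with a window of three leave four values, because the first two positions do not have a full window behind them.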
- apply(func, *, name=None)[source]
Apply an arbitrary element-wise function to the values.
- Parameters:
- Return type:
TimeSeries
- Raises:
ValueError – If func changes the length of the array.
Examples
>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([1.0, 4.0, 9.0], index=idx)
>>> ts.apply(np.sqrt).values
array([1., 2., 3.])
- __contains__(timestamp)[source]
Check whether a timestamp exists in the index.
Examples
>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([1.0, 2.0, 3.0], index=idx)
>>> pd.Timestamp("2020-01-02") in ts
True
- __getitem__(key)[source]
Positional indexing by integer or slice.
- Parameters:
key (int or slice) – Position or range to select:
- int: return the scalar value at that position.
- slice: return a new TimeSeries for that range.
- Return type:
float or TimeSeries
Examples
>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 20.0, 30.0, 40.0, 50.0], index=idx)
>>> ts[0]
10.0
>>> ts[-1]
50.0
>>> ts[1:3].values
array([20., 30.])
- class tseda.Frequency(*values)[source]
Canonical pandas offset aliases recognised by tseda.
The string value of each member is a valid freq argument to pandas.date_range() and pandas.Series.resample().
Examples
>>> Frequency.DAILY.value
'D'
>>> Frequency.DAILY == "D"
True
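That Frequency.DAILY == "D" evaluates to True suggests the enum also subclasses str; this is an inference from the example above, not something the page states. A minimal sketch of that pattern, using a hypothetical stand-in class Freq with two members copied from the list below:

```python
from enum import Enum


class Freq(str, Enum):
    """Hypothetical stand-in for a string-valued enum such as Frequency."""

    DAILY = "D"
    MONTHLY_START = "MS"


same = Freq.DAILY == "D"   # members compare equal to their string value
looked_up = Freq("MS")     # a raw alias can be turned back into a member
```

With this pattern a member can be passed wherever a plain alias string is expected, and user-supplied strings can be validated by calling the enum on them.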
- SECONDLY = 'S'
- MINUTELY = 'min'
- HOURLY = 'h'
- DAILY = 'D'
- BUSINESS_DAILY = 'B'
- WEEKLY = 'W'
- MONTHLY_START = 'MS'
- MONTHLY_END = 'ME'
- QUARTERLY_START = 'QS'
- QUARTERLY_END = 'QE'
- ANNUAL_START = 'YS'
- ANNUAL_END = 'YE'
- __repr__()
Return repr(self).
- class tseda.AggMethod(*values)[source]
Aggregation functions available when resampling a TimeSeries.
The string value matches the pandas.core.resample.Resampler method name.
Examples
>>> AggMethod.MEAN.value
'mean'
- MEAN = 'mean'
- SUM = 'sum'
- MIN = 'min'
- MAX = 'max'
- MEDIAN = 'median'
- FIRST = 'first'
- LAST = 'last'
- STD = 'std'
- VAR = 'var'
- COUNT = 'count'
- __repr__()
Return repr(self).
- class tseda.DiffMethod(*values)[source]
Differencing mode for diff().
- SIMPLE
y[t] - y[t-k] (standard first/kth difference).
- LOG
log(y[t]) - log(y[t-k]) (log return / percent change in log scale).
- PERCENT
(y[t] - y[t-k]) / y[t-k] (relative change).
- SIMPLE = 'simple'
- LOG = 'log'
- PERCENT = 'percent'
- __repr__()
Return repr(self).