tseda.core

Core data structures and validation utilities.

Public API

TimeSeries: Univariate time series with a DatetimeIndex.
ArrayLike: Type alias for 1-D numeric inputs.
DatetimeLike: Type alias for datetime-index inputs.
Frequency: Enum of recognised pandas offset aliases.
AggMethod: Enum of aggregation methods for resampling / rolling.
DiffMethod: Enum of differencing strategies.

TimeSeries

class tseda.core.timeseries.TimeSeries(data, *, index=None, name='value', freq=None, unit=None, description=None)[source]

Bases: object

Univariate time series with a pandas.DatetimeIndex.

Parameters:

data (Union[ArrayLike, pd.Series]) –
Numeric values. Accepted types:
- 1-D numpy.ndarray
- pandas.Series — values are extracted; the Series index is used unless index is also provided.
- list or tuple of numbers
index (Optional[DatetimeLike]) –
Datetime timestamps aligned with data. When data is a pandas.Series with a pandas.DatetimeIndex this argument may be omitted. Accepted types:
- pandas.DatetimeIndex
- list / numpy.ndarray of datetime-like strings or numpy.datetime64 objects
name (str) – Short identifier for the series (used in plots and reports). Default "value".
freq (Optional[str]) – Pandas offset alias (e.g., "D", "h", "MS"). When None (default) the frequency is inferred automatically.
unit (Optional[str]) – Physical unit of the values (e.g., "USD", "°C"). Purely informational — used in axis labels.
description (Optional[str]) – Free-text description stored in metadata.

Raises:

TypeError – If data or index has an unsupported type.
ValueError – If data and index have different lengths, if index is not monotonically increasing, or if index contains duplicates.

Examples

From a numpy array:

>>> import numpy as np, pandas as pd
>>> from tseda import TimeSeries
>>> idx = pd.date_range("2020-01-01", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 11.5, 9.8, 12.0, 11.0], index=idx)
>>> ts.n
5

From a pandas Series:

>>> s = pd.Series([1, 2, 3], index=pd.date_range("2020", periods=3, freq="D"))
>>> ts = TimeSeries.from_series(s)

__init__(data, *, index=None, name='value', freq=None, unit=None, description=None)[source]

Parameters:

data (ndarray | list | tuple | Series)
index (DatetimeIndex | Series | list | ndarray | None)
name (str)
freq (str | None)
unit (str | None)
description (str | None)

Return type:

None

classmethod from_series(series, *, name=None, freq=None, unit=None, description=None)[source]

Construct a TimeSeries from a pandas.Series.

Parameters:

series (Series) – Must have a pandas.DatetimeIndex.
name (str | None) – Override the Series’ .name attribute. When None the Series name (if any) is used, falling back to "value".
freq (str | None) – Forwarded to TimeSeries.__init__.
unit (str | None) – Forwarded to TimeSeries.__init__.
description (str | None) – Forwarded to TimeSeries.__init__.

Return type:

TimeSeries

Examples

>>> s = pd.Series([1.0, 2.0], index=pd.date_range("2020", periods=2, freq="D"))
>>> TimeSeries.from_series(s, name="x").name
'x'

classmethod from_arrays(values, index, *, name='value', freq=None, unit=None, description=None)[source]

Construct a TimeSeries from parallel arrays.

Parameters:

values (ndarray | list | tuple | Series) – 1-D numeric array.
index (DatetimeIndex | Series | list | ndarray) – Datetime-like array of the same length.
name (str) – Forwarded to TimeSeries.__init__.
freq (str | None) – Forwarded to TimeSeries.__init__.
unit (str | None) – Forwarded to TimeSeries.__init__.
description (str | None) – Forwarded to TimeSeries.__init__.

Return type:

TimeSeries

Examples

>>> import numpy as np, pandas as pd
>>> vals = np.array([1.0, 2.0, 3.0])
>>> idx  = pd.date_range("2021-01-01", periods=3, freq="D")
>>> TimeSeries.from_arrays(vals, idx).n
3

classmethod from_dataframe(df, column, *, name=None, freq=None, unit=None, description=None)[source]

Extract one column from a pandas.DataFrame.

Parameters:

df (DataFrame) – Source DataFrame. Must have a pandas.DatetimeIndex.
column (str) – Column name to extract.
name (str | None) – Series name to use instead of the column name. When None the column name is used.
freq (str | None) – Forwarded to TimeSeries.__init__.
unit (str | None) – Forwarded to TimeSeries.__init__.
description (str | None) – Forwarded to TimeSeries.__init__.

Return type:

TimeSeries

Raises:

KeyError – If column is not in df.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"temp": [20.0, 21.0, 19.5]},
...                    index=pd.date_range("2020", periods=3, freq="D"))
>>> TimeSeries.from_dataframe(df, "temp").name
'temp'

property values: ndarray

1-D float64 array of observed values.

Returns: A copy to protect the internal state.
Return type: numpy.ndarray

property index: DatetimeIndex

Datetime index of the series.

Return type: pandas.DatetimeIndex

property n: int

Number of observations.

Return type: int

property start: Timestamp

Timestamp of the first observation.

Return type: pandas.Timestamp

property end: Timestamp

Timestamp of the last observation.

Return type: pandas.Timestamp

property duration: Timedelta

Wall-clock span from the first to the last observation.

Return type: pandas.Timedelta

property name: str

Short identifier for the series.

Return type: str

property unit: str | None

Physical unit of the values, or None if unspecified.

Return type: str or None

property description: str | None

Free-text description, or None if unspecified.

Return type: str or None

property freq: str | None

Pandas offset alias (e.g., "D"), or None for irregular data.

Return type: str or None

property freq_label: str

Human-readable frequency label (e.g., "Daily").

Return type: str

property has_nan: bool

True when at least one value is NaN.

Return type: bool

property n_nan: int

Number of NaN values.

Return type: int

property is_regular: bool

True when all consecutive time gaps are identical.

A regular series has no missing timestamps (assuming a fixed sampling interval). An irregular series may be the result of market holidays, sensor outages, or event-driven sampling.

Return type: bool
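
As a rough illustration of the definition above, regularity can be checked by collecting the distinct gaps between consecutive timestamps. This is a standard-library sketch, not the tseda implementation; the helper name has_identical_gaps is hypothetical.

```python
from datetime import datetime, timedelta


def has_identical_gaps(timestamps):
    """Return True when every consecutive gap has the same length.

    Hypothetical helper mirroring the description of ``is_regular``.
    """
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    # Zero or one distinct gap means the spacing is uniform.
    return len(gaps) <= 1


daily = [datetime(2020, 1, 1) + timedelta(days=i) for i in range(5)]
with_gap = daily[:2] + daily[3:]  # drop 2020-01-03, as a holiday or outage would

regular = has_identical_gaps(daily)
irregular = has_identical_gaps(with_gap)
```

Here regular is True, while the series with a missing day has gaps of one and two days and is reported as irregular.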

to_series()[source]

Return the data as a pandas.Series.

The returned Series uses the same DatetimeIndex and the name attribute as its Series name.

Return type: pandas.Series

to_frame()[source]

Return the data as a single-column pandas.DataFrame.

Returns: A DataFrame whose single column is named after the name attribute.
Return type: pandas.DataFrame

to_numpy()[source]

Return a copy of the raw values as a 1-D numpy array.

Return type: numpy.ndarray

copy()[source]

Return a deep copy of this TimeSeries.

Return type: TimeSeries

slice(start=None, end=None)[source]

Return a time-bounded subset of the series.

Both start and end are inclusive. Either may be None to leave that boundary open.

Parameters:

start (str | Timestamp | None) – Start timestamp (inclusive). Accepts any value parseable by pandas.Timestamp() (e.g., "2020-01-01").
end (str | Timestamp | None) – End timestamp (inclusive).

Return type:

TimeSeries

Raises:

ValueError – If the resulting slice is empty.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.arange(365.0), index=idx)
>>> q1 = ts.slice("2020-01-01", "2020-03-31")
>>> q1.n
91
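
The inclusive bounds match how label-based slicing behaves on a pandas DatetimeIndex. The sketch below uses plain pandas rather than tseda, on the assumption that slice() follows the same convention.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=365, freq="D")
s = pd.Series(np.arange(365.0), index=idx)

# Label-based slicing with .loc includes both endpoints.
q1 = s.loc["2020-01-01":"2020-03-31"]
first_day, last_day = q1.index[0], q1.index[-1]

# Leaving one side open keeps everything from that boundary onwards.
from_december = s.loc["2020-12-01":]
```

Because 2020 is a leap year, the first quarter contains 31 + 29 + 31 = 91 days, which is where the count in the example above comes from.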

resample(freq, *, agg=AggMethod.MEAN)[source]

Resample the series to a new frequency.

Parameters:

freq (str) – Target pandas offset alias (e.g., "W", "MS").
agg (str | AggMethod) – Aggregation method. Either an AggMethod member or its string value. Default "mean".

Return type:

TimeSeries

Raises:

ValueError – If freq is not recognised by pandas.
AttributeError – If agg is not a valid resampler method.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020-01-01", periods=365, freq="D")
>>> ts = TimeSeries(np.ones(365), index=idx)
>>> ts.resample("MS").n    # 12 monthly values
12
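
One way to read the agg parameter and the AttributeError above is as a lookup of the method name on the pandas resampler. The sketch below shows that pattern with plain pandas; whether tseda dispatches exactly this way is an assumption, and resample_with is a hypothetical helper.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=365, freq="D")
s = pd.Series(np.ones(365), index=idx)


def resample_with(series, freq, agg="mean"):
    """Aggregate ``series`` into ``freq`` bins using the resampler method named ``agg``."""
    resampler = series.resample(freq)
    # getattr raises AttributeError when ``agg`` is not a resampler method.
    return getattr(resampler, agg)()


monthly_mean = resample_with(s, "MS")
monthly_sum = resample_with(s, "MS", agg="sum")
```

With a series of ones, the monthly sums simply count the days that fall in each month.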

diff(periods=1, *, method=DiffMethod.SIMPLE)[source]

Difference the series.

Parameters:

periods (int) – Number of periods to lag. Default 1 (first difference).
method (str | DiffMethod) –
One of:
- "simple": y[t] - y[t-k]
- "log": log(y[t]) - log(y[t-k])
- "percent": (y[t] - y[t-k]) / y[t-k]

Returns:

The leading NaN rows introduced by differencing are dropped.

Return type:

TimeSeries

Raises:

ValueError – If method is "log" or "percent" and the series contains non-positive values.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 11.0, 12.0, 11.0, 13.0], index=idx)
>>> ts.diff().values
array([ 1.,  1., -1.,  2.])
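
The three formulas can be compared side by side with a small standard-library sketch. The helper difference is hypothetical and works on plain lists; it only mirrors the arithmetic described above, including the dropped leading entries.

```python
import math


def difference(values, periods=1, method="simple"):
    """Hypothetical helper: k-th difference with the leading entries dropped."""
    pairs = list(zip(values[periods:], values[:-periods]))  # (y[t], y[t-k])
    if method == "simple":
        return [cur - prev for cur, prev in pairs]
    if method == "log":
        return [math.log(cur) - math.log(prev) for cur, prev in pairs]
    if method == "percent":
        return [(cur - prev) / prev for cur, prev in pairs]
    raise ValueError(f"unknown method: {method!r}")


y = [10.0, 11.0, 12.0, 11.0, 13.0]
simple = difference(y)
percent = difference(y, method="percent")
log_returns = difference(y, method="log")
two_step = difference(y, periods=2)
```

For small changes the log and percent results are close (here about 0.095 and 0.1 for the first step), which is why log differences are often read as approximate relative changes.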

log()[source]

Apply the natural logarithm element-wise.

Return type: TimeSeries
Raises: ValueError – If the series contains non-positive values.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> TimeSeries([1.0, np.e, np.e**2], index=idx).log().values
array([0., 1., 2.])

standardize()[source]

Standardise to zero mean and unit variance (z-score).

The transform is (x - mean) / std. NaN values are ignored when computing statistics but preserved in position.

Return type: TimeSeries
Raises: ValueError – If the standard deviation is zero (constant series).

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=4, freq="D")
>>> ts = TimeSeries([2.0, 4.0, 6.0, 8.0], index=idx)
>>> z = ts.standardize()
>>> round(float(z.values.mean()), 10)
0.0
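
The NaN behaviour described above can be sketched with the standard library. The helper zscore is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption of this sketch, since the choice is not stated here.

```python
import math


def zscore(values):
    """Hypothetical helper: (x - mean) / std, skipping NaN when computing the statistics."""
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    # Population standard deviation (ddof=0) is an assumption of this sketch.
    std = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
    if std == 0:
        raise ValueError("cannot standardise a constant series")
    # NaN propagates through the arithmetic, so missing values keep their position.
    return [(v - mean) / std for v in values]


z = zscore([2.0, 4.0, float("nan"), 6.0, 8.0])
```

The mean and standard deviation are computed from the four observed values only, and the third entry of z is still NaN.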

normalize(*, lower=0.0, upper=1.0)[source]

Min-max normalise the series to [lower, upper].

Parameters:

lower (float) – Target minimum value. Default 0.0.
upper (float) – Target maximum value. Default 1.0.

Return type:

TimeSeries

Raises:

ValueError – If the series has zero range (max == min) or lower >= upper.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([0.0, 5.0, 10.0], index=idx)
>>> ts.normalize().values
array([0. , 0.5, 1. ])
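
Min-max scaling is a linear map from [min, max] onto [lower, upper]. The following sketch spells out that arithmetic on a plain list; min_max_scale is a hypothetical helper and ignores NaN handling.

```python
def min_max_scale(values, lower=0.0, upper=1.0):
    """Hypothetical helper: linearly map [min, max] onto [lower, upper]."""
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("series has zero range")
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in values]


unit_interval = min_max_scale([0.0, 5.0, 10.0])
symmetric = min_max_scale([0.0, 5.0, 10.0], lower=-1.0, upper=1.0)
```

Both error conditions guard a division: a zero range would divide by zero, and lower >= upper would collapse or invert the target interval.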

rolling(window, *, agg=AggMethod.MEAN, center=False, min_periods=None)[source]

Apply a rolling-window aggregation.

Parameters:

window (int) – Size of the rolling window in number of observations.
agg (str | AggMethod) – Aggregation method (default "mean").
center (bool) – Whether to set the window labels as the centre of the window (default False — trailing window).
min_periods (int | None) – Minimum number of non-NaN observations required to produce a value. Defaults to window.

Returns:

Leading/trailing NaNs introduced by the window are dropped.

Return type:

TimeSeries

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=6, freq="D")
>>> ts = TimeSeries([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)
>>> ts.rolling(3).values
array([2., 3., 4., 5.])
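
The effect of center and min_periods is easiest to see on the underlying pandas operation. The sketch below uses plain pandas, assuming rolling() forwards these options unchanged.

```python
import pandas as pd

idx = pd.date_range("2020", periods=6, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Trailing window: each mean is labelled with the last timestamp in its window.
trailing = s.rolling(3).mean().dropna()

# Centred window: the same means, labelled with the middle timestamp instead.
centred = s.rolling(3, center=True).mean().dropna()

# A smaller min_periods lets partial windows at the start produce a value.
partial = s.rolling(3, min_periods=1).mean()
```

The trailing and centred results hold the same four means but start on 2020-01-03 and 2020-01-02 respectively; with min_periods=1 no leading values are lost.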

apply(func, *, name=None)[source]

Apply an arbitrary function to the values. The function receives the whole array and must return an array of the same length.

Parameters:

func (Callable[[ndarray], ndarray]) – Callable that takes a 1-D numpy.ndarray and returns a 1-D array of the same length.
name (str | None) – Name for the resulting series. Defaults to "f({self.name})".

Return type:

TimeSeries

Raises:

ValueError – If func changes the length of the array.

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([1.0, 4.0, 9.0], index=idx)
>>> ts.apply(np.sqrt).values
array([1., 2., 3.])

__len__()[source]

Return type: int

__contains__(timestamp)[source]

Check whether a timestamp exists in the index.

Parameters: timestamp (object)
Return type: bool

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=3, freq="D")
>>> ts = TimeSeries([1.0, 2.0, 3.0], index=idx)
>>> pd.Timestamp("2020-01-02") in ts
True

__getitem__(key)[source]

Positional indexing by integer or slice.

Parameters:

key (int | slice) –

int — return the scalar value at that position.
slice — return a new TimeSeries for that range.

Return type:

float | TimeSeries

Examples

>>> import pandas as pd, numpy as np
>>> idx = pd.date_range("2020", periods=5, freq="D")
>>> ts = TimeSeries([10.0, 20.0, 30.0, 40.0, 50.0], index=idx)
>>> ts[0]
10.0
>>> ts[-1]
50.0
>>> ts[1:3].values
array([20., 30.])

__repr__()[source]

Return repr(self).

Return type: str

Types & Enumerations

class tseda.core.types.Frequency(*values)[source]

Bases: str, Enum

Canonical pandas offset aliases recognised by tseda.

The string value of each member is a valid freq argument to pandas.date_range() and pandas.Series.resample().

Examples

>>> Frequency.DAILY.value
'D'
>>> Frequency.DAILY == "D"
True

SECONDLY = 'S'

MINUTELY = 'min'

HOURLY = 'h'

DAILY = 'D'

BUSINESS_DAILY = 'B'

WEEKLY = 'W'

MONTHLY_START = 'MS'

MONTHLY_END = 'ME'

QUARTERLY_START = 'QS'

QUARTERLY_END = 'QE'

ANNUAL_START = 'YS'

ANNUAL_END = 'YE'

__repr__(): Return repr(self).

class tseda.core.types.AggMethod(*values)[source]

Bases: str, Enum

Aggregation functions available when resampling a TimeSeries or applying a rolling window to it.

The string value matches the pandas.core.resample.Resampler method name.

Examples

>>> AggMethod.MEAN.value
'mean'

MEAN = 'mean'

SUM = 'sum'

MIN = 'min'

MAX = 'max'

MEDIAN = 'median'

FIRST = 'first'

LAST = 'last'

STD = 'std'

VAR = 'var'

COUNT = 'count'

__repr__(): Return repr(self).

class tseda.core.types.DiffMethod(*values)[source]

Bases: str, Enum

Differencing mode for diff().

SIMPLE: y[t] - y[t-k] (standard first/kth difference).

LOG: log(y[t]) - log(y[t-k]) (log return / percent change in log scale).

PERCENT: (y[t] - y[t-k]) / y[t-k] (relative change).

SIMPLE = 'simple'

LOG = 'log'

PERCENT = 'percent'

__repr__(): Return repr(self).

Validators

Input validation utilities for tseda.

Every public function in this module raises a descriptive TypeError or ValueError on bad input and returns the canonicalised value on success. All heavy lifting of data coercion lives here so that TimeSeries and analysis modules stay clean.

Functions

validate_data_array: Coerce arbitrary numeric input to a 1-D float64 numpy.ndarray.
validate_datetime_index: Coerce arbitrary input to a pandas.DatetimeIndex and check that it is sorted and duplicate-free.
validate_positive_int: Assert that a value is a positive integer.
validate_lags: Assert that the requested lag count is sensible relative to series length.
validate_freq_string: Assert that a string is a recognised pandas offset alias.

tseda.core.validator.validate_data_array(data, *, name='data')[source]

Coerce data to a 1-D float64 numpy.ndarray.

Parameters:

data (Any) –
Numeric input. Accepted types:
- numpy.ndarray — must be 1-D.
- pandas.Series — values extracted; index ignored.
- list or tuple — must be flat and numeric.
name (str) – Variable name used in error messages (default "data").

Returns:

1-D array of dtype float64. NaN values are preserved.

Return type:

numpy.ndarray

Raises:

TypeError – If data is not a recognised type.
ValueError – If data is not 1-D or contains non-numeric elements.

Examples

>>> import pandas as pd
>>> from tseda.core.validator import validate_data_array
>>> validate_data_array([1.0, 2.0, 3.0])
array([1., 2., 3.])
>>> validate_data_array(pd.Series([1, 2, 3]))
array([1., 2., 3.])
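
A coercion along these lines can be sketched with NumPy alone. The helper to_float_vector is hypothetical, covers only ndarray, list and tuple inputs (pandas.Series is left out to keep the sketch small), and is not the tseda implementation.

```python
import numpy as np


def to_float_vector(data, name="data"):
    """Hypothetical helper: coerce ``data`` to a 1-D float64 array or raise."""
    if not isinstance(data, (np.ndarray, list, tuple)):
        raise TypeError(f"{name} must be an ndarray, list, or tuple")
    try:
        arr = np.asarray(data, dtype=np.float64)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"{name} contains non-numeric elements") from exc
    if arr.ndim != 1:
        raise ValueError(f"{name} must be 1-D, got {arr.ndim} dimensions")
    return arr


vector = to_float_vector([1, 2, float("nan")])
```

Integers are widened to float64 and the NaN entry is kept, matching the return description above.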

tseda.core.validator.validate_datetime_index(index, *, name='index')[source]

Coerce index to a pandas.DatetimeIndex and check that it is sorted and duplicate-free. Unsorted or duplicated input is rejected, not repaired.

Parameters:

index (Any) –
Datetime-like input. Accepted types:
- pandas.DatetimeIndex
- pandas.Series with datetime dtype
- list or numpy.ndarray of datetime-like strings or numpy.datetime64 values
name (str) – Variable name used in error messages (default "index").

Returns:

Validated, monotonically increasing, duplicate-free index.

Return type:

pandas.DatetimeIndex

Raises:

TypeError – If index is not a recognised type.
ValueError – If index is not monotonically increasing or contains duplicates.

Examples

>>> import pandas as pd
>>> from tseda.core.validator import validate_datetime_index
>>> idx = pd.date_range("2020-01-01", periods=5, freq="D")
>>> validate_datetime_index(idx)
DatetimeIndex(['2020-01-01', ..., '2020-01-05'], dtype='datetime64[ns]', freq='D')
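
The two ValueError conditions map onto properties that pandas already exposes on an index. The sketch below uses plain pandas; to_datetime_index is a hypothetical helper and its error messages are illustrative only.

```python
import pandas as pd


def to_datetime_index(index, name="index"):
    """Hypothetical helper: build a DatetimeIndex and reject unsorted or duplicated input."""
    idx = pd.DatetimeIndex(index)
    if idx.has_duplicates:
        raise ValueError(f"{name} contains duplicate timestamps")
    if not idx.is_monotonic_increasing:
        raise ValueError(f"{name} must be monotonically increasing")
    return idx


ok = to_datetime_index(["2020-01-01", "2020-01-02", "2020-01-03"])
```

Note that the sketch validates the order and does not sort, in line with the Raises section above.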

tseda.core.validator.validate_positive_int(value, *, name='value')[source]

Assert that value is a positive integer.

Parameters:

value (Any) – The candidate value.
name (str) – Variable name used in error messages.

Returns:

The validated integer.

Return type:

int

Raises:

TypeError – If value is not an integer type.
ValueError – If value is less than 1.

Examples

>>> from tseda.core.validator import validate_positive_int
>>> validate_positive_int(5)
5
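
A check of this kind has to separate the type error from the range error. The standard-library sketch below shows one way to do so; check_positive_int is a hypothetical helper, and rejecting bool is a choice made for the sketch, not documented behaviour.

```python
import numbers


def check_positive_int(value, name="value"):
    """Hypothetical helper: return ``value`` as an int if it is an integer >= 1."""
    # bool is a subclass of int; rejecting it here is a choice made for this sketch.
    if isinstance(value, bool) or not isinstance(value, numbers.Integral):
        raise TypeError(f"{name} must be an integer, got {type(value).__name__}")
    if value < 1:
        raise ValueError(f"{name} must be >= 1, got {value}")
    return int(value)


window = check_positive_int(5, name="window")
```

Passing name lets the error message refer to the caller's parameter, for example "window must be >= 1, got 0".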

tseda.core.validator.validate_lags(lags, n, *, name='lags')[source]

Assert that lags is a sensible lag count for a series of length n.

The upper bound is n // 2 because computing autocorrelations at lags approaching n produces unreliable estimates.

Parameters:

lags (int) – Requested number of lags.
n (int) – Length of the time series.
name (str) – Variable name used in error messages.

Returns:

The validated lag count.

Return type:

int

Raises:

ValueError – If lags is not in [1, n // 2].

Examples

>>> from tseda.core.validator import validate_lags
>>> validate_lags(40, 100)
40
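
The reason for the n // 2 bound is that an autocorrelation at lag k is estimated from only n - k overlapping pairs. The sketch below restates the range check; check_lags is a hypothetical helper.

```python
def check_lags(lags, n, name="lags"):
    """Hypothetical helper: accept ``lags`` only when 1 <= lags <= n // 2."""
    upper = n // 2
    if not 1 <= lags <= upper:
        raise ValueError(
            f"{name} must be in [1, {upper}] for a series of length {n}, got {lags}"
        )
    return lags


n = 100
accepted = check_lags(40, n)
# At lag k only n - k pairs of observations overlap, so the bound keeps
# at least half of the series available for every estimated autocorrelation.
pairs_at_upper_bound = n - n // 2
```

For a series of length 100, the largest accepted lag is 50, where 50 pairs still contribute to the estimate.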

tseda.core.validator.validate_freq_string(freq, *, name='freq')[source]

Assert that freq is a non-empty string accepted by pandas.tseries.frequencies.to_offset().

Parameters:

freq (Any) – Candidate frequency string (e.g., "D", "h", "MS").
name (str) – Variable name used in error messages.

Returns:

The validated frequency string.

Return type:

str

Raises:

TypeError – If freq is not a string.
ValueError – If freq is not recognised by pandas.

Examples

>>> from tseda.core.validator import validate_freq_string
>>> validate_freq_string("D")
'D'
>>> validate_freq_string("15min")
'15min'
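
Since the check is defined in terms of pandas.tseries.frequencies.to_offset(), it can be sketched by letting pandas parse the string and discarding the result. The helper check_freq is hypothetical and its messages are illustrative.

```python
from pandas.tseries.frequencies import to_offset


def check_freq(freq, name="freq"):
    """Hypothetical helper: return ``freq`` unchanged if pandas can parse it."""
    if not isinstance(freq, str):
        raise TypeError(f"{name} must be a string, got {type(freq).__name__}")
    if not freq:
        raise ValueError(f"{name} must not be empty")
    # to_offset raises ValueError for strings it cannot interpret.
    to_offset(freq)
    return freq


daily = check_freq("D")
quarter_hour = check_freq("15min")
```

Multiples such as "15min" pass because to_offset() accepts an integer prefix in front of an alias.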