Module baseline

The baseline can be estimated using estimate_baseline().

Currently available algorithms are AsLS (asymmetric least squares with smoothness penalty), arPLS (asymmetrically reweighted penalized least squares) and FlatFit. The AsLS and arPLS algorithms were adapted from 10.1039/C4AN01061B.

As described in the paper, AsLS tends to underestimate the baseline. arPLS, on the other hand, can overestimate it, which may cut off the smallest peaks.

In general, I would recommend using FlatFit. If your data is very noisy, arPLS should be a good choice.

estimate_baseline()

This is a wrapper for the baseline estimation algorithms asls(), arpls() and flatfit().

mocca2.estimate_baseline(data: ndarray[Any, dtype[_ScalarType_co]] | Data2D, method: Literal['asls', 'arpls', 'flatfit'] = 'arpls', smoothness: float = 1.0, p: float | None = None, tol: float = 1e-07, max_iter: int | None = None, smooth_wl: int | None = None) -> ndarray[Any, dtype[_ScalarType_co]]

Estimates baseline using AsLS, arPLS or FlatFit algorithm

Parameters

data: NDArray | Data2D

Data with shape [N] or [sample, N]

method: Literal['asls', 'arpls', 'flatfit']

Possible baseline estimation methods are AsLS, arPLS and FlatFit. FlatFit and AsLS work well with smooth data; arPLS works better with noisy data.

smoothness: float

size of smoothness penalty

p: float | None

Asymmetry factor; its meaning differs between AsLS and arPLS

tol: float

maximum relative change of w for convergence

max_iter: int | None

maximum number of iterations. If not specified, guessed automatically

smooth_wl: int | None

if specified, applies a Savitzky-Golay filter (order 2) across the wavelength axis with the given window size

Returns

NDArray

values that minimize the asymmetric squared error with smoothness penalty, same shape as data

See details in the individual routines or at [StackOverflow](https://stackoverflow.com/a/50160920) and [10.1039/C4AN01061B](https://doi.org/10.1039/C4AN01061B).
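The wrapper accepts data with shape [N] or [sample, N], while the individual routines take 1D data. The following is a minimal sketch of how a 1D routine could be applied to either shape, assuming each row is processed independently. The names `baseline_rows` and `straight_line_baseline` are hypothetical and are not part of mocca2; the stand-in routine is deliberately trivial.

```python
import numpy as np

def baseline_rows(data, baseline_1d):
    """Apply a 1D baseline routine to a 1D array or to each row of a 2D array."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        return baseline_1d(data)
    if data.ndim == 2:
        # axis=1: every row (one sample) is passed to the 1D routine
        return np.apply_along_axis(baseline_1d, 1, data)
    raise ValueError("expected data with shape [N] or [sample, N]")

def straight_line_baseline(y):
    """Hypothetical stand-in routine: a line through the first and last points."""
    return np.linspace(y[0], y[-1], y.size)

spectra = np.array([[0.0, 5.0, 2.0],
                    [1.0, 9.0, 3.0]])
result_2d = baseline_rows(spectra, straight_line_baseline)
result_1d = baseline_rows(spectra[0], straight_line_baseline)
```

In both cases the result has the same shape as the input, matching the return value described above.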

AsLS()

Asymmetric Least Squares with smoothness penalty.

mocca2.baseline.asls.asls(data: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], smoothness: float, p: float, tol: float = 1e-07, max_iter: int | None = None, baseline_guess: ndarray[Any, dtype[_ScalarType_co]] | None = None) -> ndarray[Any, dtype[_ScalarType_co]]

AsLS: Asymmetric Least Squares with smoothness penalty

Parameters

data: ArrayLike

1D data

smoothness: float

size of smoothness penalty

p: float

asymmetry factor, w = p if y_fit < data else (1-p)

tol: float

maximum relative change of w for convergence

max_iter: int | None

maximum number of iterations

baseline_guess: ArrayLike | None

initial guess for baseline

Returns

NDArray

values that minimize the asymmetric squared error with smoothness penalty

Description

This routine finds the vector z that minimizes:

(y-z).T @ W @ (y-z) + smoothness * z.T @ D.T @ D @ z

where W = diag(w) with w = p if y > z else (1-p), and D is the second-order finite-difference matrix.

See details at [StackOverflow](https://stackoverflow.com/a/50160920) and [10.1039/C4AN01061B](https://doi.org/10.1039/C4AN01061B).
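The iteration described above can be sketched as follows. This is a minimal illustration written for this page, not the mocca2 implementation: it uses dense NumPy matrices for clarity (a sparse solver would be the usual choice for long signals), `asls_sketch` is a hypothetical name, and the weights follow the parameter description (w = p where the data lie above the fit).

```python
import numpy as np

def asls_sketch(y, smoothness, p, tol=1e-7, max_iter=50):
    """Illustrative AsLS loop: alternate a weighted solve and a weight update."""
    y = np.asarray(y, dtype=float)
    # Second-order finite differences: each row of D is [..., 1, -2, 1, ...]
    D = np.diff(np.eye(y.size), n=2, axis=0)
    penalty = smoothness * D.T @ D
    w = np.ones_like(y)
    z = y.copy()
    for _ in range(max_iter):
        # For fixed weights the minimizer solves (W + smoothness * D.T @ D) z = W y
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the fit (likely peaks) get weight p, the rest 1 - p
        w_new = np.where(y > z, p, 1.0 - p)
        if np.linalg.norm(w_new - w) <= tol * np.linalg.norm(w):
            break
        w = w_new
    return z

# Synthetic test signal: a sloped baseline plus one Gaussian-shaped peak
x = np.linspace(0.0, 1.0, 200)
true_baseline = 0.5 + 0.3 * x
y = true_baseline + np.exp(-((x - 0.5) / 0.03) ** 2)
z = asls_sketch(y, smoothness=1e5, p=0.01)
```

The values `smoothness=1e5` and `p=0.01` were picked for this synthetic signal only and are not recommended defaults.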

arPLS()

Asymmetrically reweighted Penalized Least Squares with smoothness penalty.

mocca2.baseline.arpls.arpls(data: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], smoothness: float, p: float = 2.0, tol: float = 1e-07, max_iter: int | None = None, baseline_guess: ndarray[Any, dtype[_ScalarType_co]] | None = None) -> ndarray[Any, dtype[_ScalarType_co]]

arPLS: Asymmetrically Reweighted Penalized Least Squares

Parameters

data: ArrayLike

1D data

smoothness: float

size of smoothness penalty

p: float

lower values shift the baseline lower

tol: float

maximum relative change of w for convergence

max_iter: int | None

maximum number of iterations

baseline_guess: ArrayLike | None

initial guess for baseline

Returns

NDArray

values that minimize the asymmetric squared error with smoothness penalty, and w

Description

This routine finds the vector z that minimizes:

(y-z).T @ W @ (y-z) + smoothness * z.T @ D.T @ D @ z

where W is a diagonal matrix of weights given by a nonlinear weighting function and D is the second-order finite-difference matrix.

See details at [10.1039/C4AN01061B](https://doi.org/10.1039/C4AN01061B).
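As a rough illustration of the nonlinear weighting, the sketch below follows the logistic weight update from the cited arPLS paper: weights are computed from the mean and standard deviation of the negative residuals. It is an assumption-laden sketch, not the mocca2 code. `arpls_sketch` is a hypothetical name, the matrices are dense for clarity, and because this page does not state how the `p` parameter enters the weights, the sketch has no `p` argument and uses the constant 2 from the paper's threshold.

```python
import numpy as np

def arpls_sketch(y, smoothness, tol=1e-7, max_iter=50):
    """Illustrative arPLS loop using the logistic weights of the cited paper."""
    y = np.asarray(y, dtype=float)
    D = np.diff(np.eye(y.size), n=2, axis=0)  # second-order differences
    penalty = smoothness * D.T @ D
    w = np.ones_like(y)
    z = y.copy()
    for _ in range(max_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        d = y - z
        negative = d[d < 0]  # residuals below the fit, treated as noise
        if negative.size < 2 or negative.std() == 0:
            break
        m, s = negative.mean(), negative.std()
        # Logistic weight: close to 1 within the noise band, close to 0 on peaks
        exponent = np.clip(2.0 * (d - (2.0 * s - m)) / s, -500.0, 500.0)
        w_new = np.where(d >= 0, 1.0 / (1.0 + np.exp(exponent)), 1.0)
        converged = np.linalg.norm(w_new - w) <= tol * np.linalg.norm(w)
        w = w_new
        if converged:
            break
    return z

# Noisy synthetic signal: sloped baseline, one peak, Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
true_baseline = 0.5 + 0.3 * x
y = true_baseline + np.exp(-((x - 0.5) / 0.03) ** 2) + rng.normal(0.0, 0.02, x.size)
z = arpls_sketch(y, smoothness=1e5)
```

Because the weights are estimated from the noise level, points within the noise band keep a weight near 1, which is consistent with the recommendation of arPLS for noisy data.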

FlatFit()

FlatFit algorithm with smoothness penalty. The details will be published soon in the MOCCA2 paper.

mocca2.baseline.flatfit.flatfit(data: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], smoothness: float, p: float) -> ndarray[Any, dtype[_ScalarType_co]]

FlatFit: least squares weighted by inverse scale of 1st and 2nd derivatives with smoothness penalty

Parameters

data: ArrayLike

1D data

smoothness: float

size of smoothness penalty

p: float

relative size of Savitzky-Golay filter

Returns

NDArray

values that minimize the asymmetric squared error with smoothness penalty

Description

This routine finds the vector z that minimizes:

(y-z).T @ W @ (y-z) + smoothness * z.T @ D.T @ D @ z

where W is determined by the slope and curvature at each point.
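For any fixed weights the objective above is quadratic in z, so the minimizer solves the linear system (W + smoothness * D.T @ D) z = W y. The sketch below shows only that solve step. The weight formula in the example is a made-up placeholder that merely down-weights points with large slope or curvature; it is not the FlatFit weighting, which this page does not specify. The names `penalized_fit` and `placeholder_weights` are hypothetical.

```python
import numpy as np

def penalized_fit(y, w, smoothness):
    """Minimize (y-z).T @ W @ (y-z) + smoothness * z.T @ D.T @ D @ z for fixed w."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    D = np.diff(np.eye(y.size), n=2, axis=0)  # second-order differences
    return np.linalg.solve(np.diag(w) + smoothness * D.T @ D, w * y)

def placeholder_weights(y):
    """Placeholder only (NOT the FlatFit formula): small weight on steep or curved points."""
    slope = np.gradient(y)
    curvature = np.gradient(slope)
    slope_scale = np.mean(np.abs(slope))
    curvature_scale = np.mean(np.abs(curvature))
    return 1.0 / (1.0 + (slope / slope_scale) ** 2 + (curvature / curvature_scale) ** 2)

# Synthetic signal: a sloped baseline plus one Gaussian-shaped peak
x = np.linspace(0.0, 1.0, 200)
true_baseline = 0.5 + 0.3 * x
y = true_baseline + np.exp(-((x - 0.5) / 0.03) ** 2)
w = placeholder_weights(y)
z = penalized_fit(y, w, smoothness=1e5)
```

Unlike AsLS and arPLS, this sketch computes the weights once from the data and solves a single linear system; whether FlatFit iterates is not stated here.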