Module baseline
The baseline can be estimated using estimate_baseline().
Currently available algorithms are AsLS (asymmetric least squares with smoothness penalty), arPLS (asymmetrically reweighted penalized least squares) and FlatFit. The AsLS and arPLS algorithms were adapted from 10.1039/C4AN01061B.
As described in the paper, AsLS is biased towards estimating the baseline lower than it actually is. On the other hand, arPLS can estimate the baseline too high, which can cut off some of the smallest peaks.
In general, I would recommend using FlatFit. If your data is very noisy, arPLS should be a good choice.
estimate_baseline()
This is a wrapper for the baseline estimation algorithms asls(), arpls() and flatfit().
- mocca2.estimate_baseline(data: ndarray[Any, dtype[_ScalarType_co]] | Data2D, method: Literal['asls', 'arpls', 'flatfit'] = 'arpls', smoothness: float = 1.0, p: float | None = None, tol: float = 1e-07, max_iter: int | None = None, smooth_wl: int | None = None) ndarray[Any, dtype[_ScalarType_co]]
Estimates baseline using AsLS, arPLS or FlatFit algorithm
Parameters
- data: NDArray | Data2D
Data with shape [N] or [sample, N]
- method: Literal['asls', 'arpls', 'flatfit']
Possible baseline estimation methods are AsLS, arPLS and FlatFit. FlatFit and AsLS work well with smooth data, arPLS works better with noisy data.
- smoothness: float
size of smoothness penalty
- p: float | None
Asymmetry factor, different for AsLS and arPLS
- tol: float
maximum relative change of w for convergence
- max_iter: int | None
maximum number of iterations. If not specified, guessed automatically
- smooth_wl: int | None
if specified, applies a Savitzky-Golay filter (order 2) across the wavelength axis with the given window size
Returns
- NDArray
values that minimize the asymmetric squared error with smoothness penalty, same shape as data
See details in the individual routines or at [StackOverflow](https://stackoverflow.com/a/50160920) and [10.1039/C4AN01061B](https://doi.org/10.1039/C4AN01061B).
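All three methods fit the baseline by solving a weighted least squares problem with a second-derivative smoothness penalty (the objective given in the sections below). As a standalone illustration of that core solve in plain numpy/scipy (not the mocca2 implementation, which may differ in details):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def penalized_wls(y, w, smoothness):
    """Solve (W + smoothness * D.T @ D) z = W @ y for the baseline z,
    where D is the second-order finite-difference operator and
    W = diag(w) holds the per-point weights."""
    n = len(y)
    # Second-derivative finite differences: (n-2) x n banded matrix
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    A = sparse.diags(w) + smoothness * (D.T @ D)
    return spsolve(A.tocsc(), w * y)

# With uniform weights this reduces to a plain Whittaker smoother
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 3, 200)) + 0.1 * rng.standard_normal(200)
z = penalized_wls(y, np.ones_like(y), smoothness=1e3)
```

The individual algorithms differ only in how the weights w are chosen and iteratively updated.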
AsLS()
Asymmetric Least Squares with smoothness penalty.
- mocca2.baseline.asls.asls(data: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], smoothness: float, p: float, tol: float = 1e-07, max_iter: int | None = None, baseline_guess: ndarray[Any, dtype[_ScalarType_co]] | None = None) ndarray[Any, dtype[_ScalarType_co]]
AsLS: Asymmetric Least Squares with smoothness penalty
Parameters
- data: ArrayLike
1D data
- smoothness: float
size of smoothness penalty
- p: float
asymmetry factor, w = p if y_fit < data else (1-p)
- tol: float
maximum relative change of w for convergence
- max_iter: int | None
maximum number of iterations
- baseline_guess: ArrayLike | None
initial guess for baseline
Returns
- NDArray
values that minimize the asymmetric squared error with smoothness penalty
Description
This routine finds the vector z that minimizes:
(y-z).T @ W @ (y-z) + smoothness * z.T @ D.T @ D @ z
where w = p if y > z else (1-p) and D is the finite difference operator for the second derivative.
See details at [StackOverflow](https://stackoverflow.com/a/50160920) and [10.1039/C4AN01061B](https://doi.org/10.1039/C4AN01061B).
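A rough standalone sketch of this iteration, assuming the usual AsLS scheme from the cited references (the parameter defaults here are illustrative, not mocca2's):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def asls_sketch(y, smoothness=1e5, p=0.01, max_iter=20, tol=1e-7):
    """Minimal AsLS: repeat the weighted Whittaker solve, reassigning
    w = p above the current fit (peaks) and 1 - p below it (baseline)."""
    n = len(y)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = smoothness * (D.T @ D)
    w = np.ones(n)
    z = y
    for _ in range(max_iter):
        z = spsolve((sparse.diags(w) + penalty).tocsc(), w * y)
        w_new = np.where(y > z, p, 1.0 - p)
        if np.max(np.abs(w_new - w)) < tol:
            break
        w = w_new
    return z

# Linear baseline plus one Gaussian peak: AsLS recovers the line
x = np.linspace(0, 1, 300)
y = 0.5 * x + np.exp(-((x - 0.5) ** 2) / 0.001)
baseline = asls_sketch(y)
```

Because a straight line has zero second-derivative penalty, the linear part of the baseline is recovered essentially exactly away from the peak.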
arPLS()
Asymmetrically reweighted Penalized Least Squares with smoothness penalty.
- mocca2.baseline.arpls.arpls(data: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], smoothness: float, p: float = 2.0, tol: float = 1e-07, max_iter: int | None = None, baseline_guess: ndarray[Any, dtype[_ScalarType_co]] | None = None) ndarray[Any, dtype[_ScalarType_co]]
arPLS: Asymmetrically Reweighted Penalized Least Squares
Parameters
- data: ArrayLike
1D data
- smoothness: float
size of smoothness penalty
- p: float
lower values shift the baseline lower
- tol: float
maximum relative change of w for convergence
- max_iter: int | None
maximum number of iterations
- baseline_guess: ArrayLike | None
initial guess for baseline
Returns
- NDArray
values that minimize the asymmetric squared error with smoothness penalty
Description
This routine finds the vector z that minimizes:
(y-z).T @ W @ (y-z) + smoothness * z.T @ D.T @ D @ z
where w is a nonlinear weighting function and D is the finite difference operator for the second derivative.
See details at [10.1039/C4AN01061B](https://doi.org/10.1039/C4AN01061B).
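The nonlinear weighting can be sketched as follows, following the logistic reweighting from the cited paper (10.1039/C4AN01061B). Treating p as the logistic steepness factor (the paper uses 2, matching the default above) is my assumption and may not match mocca2's exact use of the parameter:

```python
import numpy as np

def arpls_weights(y, z, p=2.0):
    """arPLS reweighting sketch: residuals d = y - z below the baseline
    estimate the noise (mean m, std s of the negative part); weights fall
    off logistically for points rising above that noise band."""
    d = y - z
    dn = d[d < 0]
    if dn.size == 0:
        return np.ones_like(d)
    m, s = dn.mean(), dn.std()
    if s == 0:
        return (d <= 0).astype(float)
    # Clip the exponent to avoid overflow for very large residuals
    return 1.0 / (1.0 + np.exp(np.clip(p * (d - (2.0 * s - m)) / s, -500, 500)))

# Noise floor around zero plus one tall peak: the peak gets near-zero weight
y = np.array([-0.1, -0.2, -0.15, -0.05, 5.0, -0.1, -0.2])
w = arpls_weights(y, np.zeros_like(y))
```

Points within the noise band keep weights near 1, so the baseline follows the noise mean instead of being dragged down, which is why arPLS handles noisy data better than AsLS.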
FlatFit()
FlatFit algorithm with smoothness penalty. The details will be published soon in the MOCCA2 paper.
- mocca2.baseline.flatfit.flatfit(data: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], smoothness: float, p: float) ndarray[Any, dtype[_ScalarType_co]]
FlatFit: least squares weighted by inverse scale of 1st and 2nd derivatives with smoothness penalty
Parameters
- data: ArrayLike
1D data
- smoothness: float
size of smoothness penalty
- p: float
relative size of Savitzky-Golay filter
Returns
- NDArray
values that minimize the asymmetric squared error with smoothness penalty
Description
This routine finds the vector z that minimizes:
(y-z).T @ W @ (y-z) + smoothness * z.T @ D.T @ D @ z
where W is determined by the slope and curvature at each point.
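Since the FlatFit details are not yet published, the following is only a hypothetical illustration of weighting by the inverse scale of the 1st and 2nd derivatives, using scipy's Savitzky-Golay derivatives; it is not the actual FlatFit algorithm:

```python
import numpy as np
from scipy.signal import savgol_filter

def derivative_weights(y, window=15):
    """Illustrative only (not the published FlatFit): weight each point by
    the inverse magnitude of its smoothed 1st and 2nd derivatives, so flat
    regions (baseline) dominate the penalized least squares fit."""
    d1 = savgol_filter(y, window, polyorder=2, deriv=1)
    d2 = savgol_filter(y, window, polyorder=2, deriv=2)
    scale = (np.abs(d1) / (np.abs(d1).max() + 1e-12)
             + np.abs(d2) / (np.abs(d2).max() + 1e-12))
    return 1.0 / (1.0 + scale**2)

# Flat signal with one Gaussian peak: weights drop where the signal bends
x = np.linspace(0, 1, 201)
y = np.exp(-((x - 0.5) ** 2) / 0.002)
w = derivative_weights(y)
```

Because no residual-based reweighting loop is needed, a scheme of this shape can estimate the baseline in a single solve.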