Module deconvolution

This module contains numerous routines for fitting peak shapes and deconvolution of 2D spectra.

Automatic deconvolution

The deconvolution process is inspired by article by Shimadzu 10.1016/j.chroma.2016.09.037.

The entire procedure is as follows:

Get initial guess of spectra
- if enough maxima are present, spectra at maxima are used
- otherwise, the timepoints are clustered based on cosine similarity and averaged spectra of clusters are used
Refine spectra by fitting peak shapes to some PeakModel
Optionally, unrestricted refinement of concentrations based on spectra

deconvolve_adaptive()

mocca2.deconvolve_adaptive(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]], model: PeakModel | Literal['BiGaussian', 'BiGaussianTailing', 'FraserSuzuki', 'Bemg'], max_mse: float, relaxe_concs: bool, min_comps: int, max_comps: int) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]], float]

Deconvolves data with increasingly more components until MSE limit is reached

Parameters

data: NDArray

2D data [wavelength, data]

model: PeakModel | Literal[‘BiGaussian’, ‘BiGaussianTailing’, ‘FraserSuzuki’]

mathematical model used for fitting shapes of components of peaks

max_mse: float

Maximum allowed MSE for termination

relaxe_concs: bool

If False, the fitted peak model functions are returned

Otherwise, the concentrations are refined with restricted least squares

min_comps: int

Minimum number of components that can be fitted

max_comps: int

Maximum number of components that can be fitted

Returns

NDArray: concentrations [compound, time]
NDArray: spectra [compound, wavelength], normalized such that mean = 1
float: MSE

deconvolve_fixed()

mocca2.deconvolve_fixed(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]], n_comps: int, model: PeakModel, relaxe_concs: bool) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]], float]

Deconvolves data with given number of components. Returns concentration, spectra and MSE

Parameters

data: NDArray

2D data [wavelength, data]

n_comps: int

how many components should be used for deconvolution

model: PeakModel

mathematical model used for fitting shapes of components of peaks

relaxe_concs: bool

If False, the fitted bigaussian functions are returned

Otherwise, the concentrations are refined with restricted least squares

Returns

NDArray: concentrations [compound, time]
NDArray: spectra [compound, wavelength], normalized such that mean = 1
float: MSE

Peak Models

Mathematical models that describe peak shapes. A nice summary of peak shapes is in 10.1016/S0922-3487(98)80024-5.

You can define your own peak shape model, but it must inherit from PeakModel and implement all relevant methods.

PeakModel

Abstract class that defines all methods that peak model must have.

class mocca2.deconvolution.peak_models.PeakModel

Abstract class for functions that model peak shape

val(t: float | ndarray[tuple[int, ...], dtype[_ScalarType_co]], *params: float) → float | ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Returns height of peak specified by params at given time t.

grad(t: float | ndarray[tuple[int, ...], dtype[_ScalarType_co]], *params: float) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Returns gradient w.r.t params. If t is array, the return shape is [parameter, t]

init_guess(height: float, maximum: float, width_left: float, width_right: float) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Creates initial guess of the model parameters from peak descriptors (height, location of maximum, left and right widths of peak)

get_bounds(max_t: float) → List[Tuple[float, float]]

Returns bounds for individual parameters in the shape [(p1 min, p1 max), (p2 min, p2 max), …].

max_t is the maximum t that will be passed into the model, and also the number of time points in the peak region.

n_params() → int: Returns number of parameters of the model

FraserSuzuki

Fraser-Suzuki peak model works very well on high-quality experimental data. It is described by the following equation.

\[f(t, h, \mu, \sigma, a) = h \cdot \exp \left( - \frac{1}{a^2} \ln^2 \left( 1 + \frac{a(t-\mu)}{\sigma} \right) \right)\]

class mocca2.deconvolution.peak_models.FraserSuzuki: Fraser-Suzuki peak model

BiGaussian

BiGaussian peak model is simple yet efficient for describing peaks. It is described by the following equation.

\[\begin{split}f(t, h, \mu, \sigma_1, \sigma_2) = \begin{cases} h \cdot \exp \left( -\frac{(t-\mu)^2}{2{\sigma_1}^2}\right) &\text{if } t < \mu \\ h \cdot \exp \left( -\frac{(t-\mu)^2}{2{\sigma_2}^2}\right) &\text{if } t \ge \mu \end{cases}\end{split}\]

class mocca2.deconvolution.peak_models.BiGaussian: BiGaussian peak model

BiGaussianTailing

BiGaussian with exponential tailing. This is suitable if the peaks have slowly decaying tails. It is described by the following equation.

\[\begin{split}f(t, h, \mu, \sigma_1, \sigma_2, t_h, t_w) = \begin{cases} h \cdot \exp \left( -\frac{(t-\mu)^2}{2{\sigma_1}^2}\right) &\text{if } t < \mu \\ h \cdot (1-t_h)\cdot \exp \left( -\frac{(t-\mu)^2}{2{\sigma_2}^2}\right) + h \cdot t_h \cdot \exp \left(-\frac{t-\mu}{t_w}\right) &\text{if } t \ge \mu \end{cases}\end{split}\]

class mocca2.deconvolution.peak_models.BiGaussianTailing: BiGaussian with exponential tailing peak model

BEMG

Bi-Exponentially Modified Gaussian function as described in 10.1016/j.chroma.2016.09.037. Very versatile model that can describe tailing and fronting.

class mocca2.deconvolution.peak_models.Bemg: Bi-Exponentially Modified Gaussian peak model

Fitting Peak Models

The peak shape, and optionally also the component spectra, can be fitted using fit_peak_model() to best explain the 2D chromatogram data.

Initial guess for spectra of individual components can be obtained from guess_spectra().

fit_peak_model_and_spectra()

mocca2.deconvolution.fit_peak_model.fit_peak_model(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]], model: PeakModel, spectra: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | None = None, n_compounds: int | None = None, adjust_spectra: bool = True, initial_params: None | ndarray[tuple[int, ...], dtype[_ScalarType_co]] = None, initial_concs: None | ndarray[tuple[int, ...], dtype[_ScalarType_co]] = None) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], float, ndarray[tuple[int, ...], dtype[_ScalarType_co]]]

Fits peak shapes to data. Spectra can be fixed or fitted too.

Parameters

data: NDArray: 2D chromatogram data, [wavelength, time]
model: PeakModel: mathematical model of the peak shape
spectra: NDArray | None: absorption spectra of individual components, [components, wavelength]. If not provided, guess_spectra() is used.
n_compounds: int | None = None: number of peaks to deconvolve, must be provided if spectra is None. Ignored if spectra are provided
adjust_spectra: bool: whether the spectra should be also fitted. When False, the spectra are kept constant
initial_params: None | NDArray: initial parameters for the models, flattened
initial_concs: None | NDArray: initial guess of the concentrations

Returns

NDArray:: Concentrations [component, time]
float: Mean Squared Error
NDArray:: The optimized parameters, flattened

guess_spectra()

mocca2.deconvolution.guess_spectra.guess_spectra(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]], n_compounds: int) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]

Clusters data in the peak to try to isolate dominant spectra.

Parameters

data: NDArray: 2D peak data [wavelength, time]
n_compounds: int: how many spectra should be generated

Returns

NDArray: The generated spectra [compound, wavelength]. Spectra are not normalized

Constrained Linear Problems

This submodule provides methods for estimating the non-negative concentrations or spectra that best fit the 2D chromatogram data, assuming the other information (concentrations or spectra) is known.

concentrations_from_spectra()

mocca2.deconvolution.nonnegative_lstsq.concentrations_from_spectra(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]], spectra: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], float]

Calculates positive concentrations such that the mean squared error is minimized. Returns concentrations and MSE

Parameters

data: NDArray: Absorbance data [wavelength, time]
spectra: NDArray: Spectra of the individual compounds [compound, wavelength]

Returns

NDArray: Positive concentrations that minimize the MSE [compound, time]
float: MSE

spectra_from_concentrations()

mocca2.deconvolution.nonnegative_lstsq.spectra_from_concentrations(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]], concentrations: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], float]

Calculates positive spectra such that the mean squared error is minimized. Returns spectra and MSE

Parameters

data: NDArray: Absorbance data [wavelength, time]
concentrations: NDArray: Concentrations of the individual compounds [compound, time]

Returns

NDArray: Positive spectra that minimize the MSE [compound, wavelength]
float: MSE