Module classes

This module defines light-weight classes for common data.

Data2D

The Data2D class stores absorbance data at given times and wavelengths.

class mocca2.classes.Data2D(time: ndarray[Any, dtype[_ScalarType_co]], wavelength: ndarray[Any, dtype[_ScalarType_co]], data: ndarray[Any, dtype[_ScalarType_co]])

2D chromatogram data

time: ndarray[Any, dtype[_ScalarType_co]]

Time points at which data was sampled

wavelength: ndarray[Any, dtype[_ScalarType_co]]

Wavelengths at which data was sampled

data: ndarray[Any, dtype[_ScalarType_co]]

Absorbances at the given wavelengths and times, indexed as data[wavelength, time]

closest_time(time: float) → Tuple[int, float]

Returns the index and value of the time point closest to the specified time

closest_wavelength(wavelength: float) → Tuple[int, float]

Returns the index and value of the wavelength point closest to the specified wavelength
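Both lookups amount to a nearest-neighbour search on a sampled axis. The following is a minimal NumPy sketch of that idea, not mocca2's implementation; `closest_index` is a hypothetical helper and the time axis is illustrative.

```python
from typing import Tuple

import numpy as np


def closest_index(axis: np.ndarray, value: float) -> Tuple[int, float]:
    """Return the index and value of the axis sample nearest to `value`."""
    idx = int(np.argmin(np.abs(axis - value)))
    return idx, float(axis[idx])


# Illustrative time axis: 0.0, 0.1, ..., 10.0
time = np.linspace(0.0, 10.0, 101)
idx, t = closest_index(time, 3.14)
print(idx, round(t, 6))  # 31 3.1
```

If two samples are equally close, `np.argmin` returns the first one.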

extract_time(min_time: float | None, max_time: float | None, inplace: bool = False) → Data2D

Extracts the data in the given time range

Parameters

min_time: float | None

Start time of the extracted segment

max_time: float | None

End time of the extracted segment

inplace: bool

If True, modifies the data in-place and returns self

Returns

Data2D

The data in the given time interval

extract_wavelength(min_wavelength: float | None, max_wavelength: float | None, inplace: bool = False) → Data2D

Extracts the data in the given wavelength range

Parameters

min_wavelength: float | None

Start wavelength of the extracted segment

max_wavelength: float | None

End wavelength of the extracted segment

inplace: bool

If True, modifies the data in-place and returns self

Returns

Data2D

The data in the given wavelength interval
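Both extraction methods select a window on one axis of data[wavelength, time]. A sketch with boolean masks follows; it assumes inclusive bounds and that None means "no limit on this side", which the signatures suggest but this page does not state. `range_mask` is a hypothetical helper, not part of mocca2.

```python
from typing import Optional

import numpy as np


def range_mask(axis: np.ndarray, lo: Optional[float], hi: Optional[float]) -> np.ndarray:
    """Boolean mask of axis points within [lo, hi]; None leaves that side open (assumed)."""
    mask = np.ones(axis.shape, dtype=bool)
    if lo is not None:
        mask &= axis >= lo
    if hi is not None:
        mask &= axis <= hi
    return mask


time = np.linspace(0.0, 5.0, 6)                     # 0, 1, ..., 5
wavelength = np.array([200.0, 250.0, 300.0, 350.0])
data = np.arange(24, dtype=float).reshape(4, 6)      # data[wavelength, time]

t_mask = range_mask(time, 1.0, 3.0)                  # time window [1, 3]
w_mask = range_mask(wavelength, None, 300.0)         # everything up to 300 nm
subset = data[w_mask][:, t_mask]
print(subset.shape)  # (3, 3)
```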

check_same_sampling(*others: Data2D, tol: float = 0.001, time: bool = True, wavelength: bool = True) → bool

Checks that the time and wavelength sampling are the same as in all of the provided data
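A sketch of one way such a comparison can work, assuming `tol` is an absolute, point-wise tolerance on the axis values; this page does not say whether mocca2 compares individual points or step sizes. `same_axis` is hypothetical.

```python
import numpy as np


def same_axis(a: np.ndarray, b: np.ndarray, tol: float = 1e-3) -> bool:
    """True if both axes have the same length and agree point-wise within tol."""
    # The shape check short-circuits before subtracting arrays of different lengths
    return a.shape == b.shape and bool(np.all(np.abs(a - b) <= tol))


t1 = np.linspace(0.0, 1.0, 11)
t2 = t1 + 5e-4                    # small jitter, within tol
t3 = np.linspace(0.0, 1.0, 12)    # different number of points
print(same_axis(t1, t2), same_axis(t1, t3))  # True False
```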

interpolate_time(time: ndarray[Any, dtype[_ScalarType_co]], kind: str = 'linear', inplace: bool = False) → Data2D

Interpolates the data to the given time points using the specified kind of interpolation; see scipy.interpolate.interp1d for details
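For kind='linear', resampling amounts to interpolating every wavelength row of data[wavelength, time] onto the new time axis. The sketch below swaps in `numpy.interp` for `scipy.interpolate.interp1d` to stay dependency-light; `interpolate_time_linear` is a hypothetical helper.

```python
import numpy as np


def interpolate_time_linear(old_time, data, new_time):
    """Linearly resample every wavelength row of data[wavelength, time] onto new_time."""
    return np.vstack([np.interp(new_time, old_time, row) for row in data])


old_time = np.array([0.0, 1.0, 2.0])
data = np.array([[0.0, 10.0, 20.0],
                 [5.0, 5.0, 5.0]])
resampled = interpolate_time_linear(old_time, data, np.array([0.5, 1.5]))
print(resampled)
# [[ 5. 15.]
#  [ 5.  5.]]
```

One behavioural difference: `np.interp` holds the end values constant outside the original time range, whereas `interp1d` raises a `ValueError` there by default.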

time_step() → float

Returns the sampling step of the time axis

wavelength_step() → float

Returns the sampling step of the wavelength axis

contract(method: Literal['mean', 'max', 'weighted_mean'] = 'mean', damping: float = 0.2) → ndarray[Any, dtype[_ScalarType_co]]

Contracts the first (wavelength) dimension of the 2D data to obtain 1D data for peak picking

Parameters

method: Literal[‘mean’, ‘max’, ‘weighted_mean’]

The method used for contraction. ‘weighted_mean’ weights the average over wavelengths by their standard deviation

damping: float

Damping factor for ‘weighted_mean’

Returns

Contracted 1D array
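Because data is indexed data[wavelength, time], contracting the first dimension means reducing along axis 0. The sketch below covers only 'mean' and 'max'; the 'weighted_mean' weighting and its damping are not reproduced, since this page does not give the formula. `contract_sketch` is hypothetical.

```python
import numpy as np


def contract_sketch(data: np.ndarray, method: str = "mean") -> np.ndarray:
    """Collapse the wavelength axis of data[wavelength, time] into a 1D trace."""
    if method == "mean":
        return data.mean(axis=0)
    if method == "max":
        return data.max(axis=0)
    raise ValueError("only 'mean' and 'max' are sketched here")


data = np.array([[1.0, 4.0, 2.0],
                 [3.0, 0.0, 2.0]])  # 2 wavelengths x 3 time points
print(contract_sketch(data, "mean"))  # [2. 2. 2.]
print(contract_sketch(data, "max"))   # [3. 4. 2.]
```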

to_dict() → Dict[str, Any]

Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → Data2D

Creates a Data2D object from a dictionary
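The to_dict/from_dict pair is a serialization round trip. The sketch shows the general pattern with a hypothetical `MiniData2D` stand-in: arrays are converted to nested lists so the dictionary survives JSON. The key names are assumptions and need not match mocca2's actual dictionary layout.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict

import numpy as np


@dataclass
class MiniData2D:
    """Hypothetical stand-in for Data2D; the dictionary keys are assumptions."""
    time: np.ndarray
    wavelength: np.ndarray
    data: np.ndarray

    def to_dict(self) -> Dict[str, Any]:
        # Arrays become nested lists so the result is JSON-serializable
        return {
            "time": self.time.tolist(),
            "wavelength": self.wavelength.tolist(),
            "data": self.data.tolist(),
        }

    @staticmethod
    def from_dict(d: Dict[str, Any]) -> "MiniData2D":
        return MiniData2D(np.asarray(d["time"]), np.asarray(d["wavelength"]), np.asarray(d["data"]))


original = MiniData2D(np.array([0.0, 0.5]), np.array([254.0]), np.array([[0.1, 0.2]]))
restored = MiniData2D.from_dict(json.loads(json.dumps(original.to_dict())))
print(np.array_equal(original.data, restored.data))  # True
```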

plot(ax: Axes | None = None, color: str = 'k', label: str | None = None, zero_line: bool = False) → Axes

Plots the data using matplotlib.pyplot.imshow

plot_2d(ax: Axes | None = None, colormap: str = 'gist_ncar', colorbar: bool = True) → Axes

Plots the heatmap for intensity against time and wavelength

Chromatogram

The Chromatogram class extends the Data2D class, adding more metadata.

class mocca2.Chromatogram(sample: Data2D | str, blank: Data2D | str | None = None, name: str | None = None, interpolate_blank=False)

Information about a single chromatogram, based on Data2D

peaks: List[Peak | DeconvolvedPeak]

Peaks in the chromatogram

name: str | None

Name of this chromatogram

sample_path: str | None

Filename of the raw chromatogram file

blank_path: str | None

Filename of the raw chromatogram file with blank

correct_baseline(method: Literal['asls', 'arpls', 'flatfit'] = 'flatfit', smoothness: float = 1.0, p: float | None = None, tol: float = 1e-07, max_iter: int | None = None, smooth_wl: int | None = None) → Chromatogram

Corrects the baseline using the AsLS, arPLS, or FlatFit algorithm

Parameters

data: NDArray | Data2D

Data with shape [N] or [sample, N]

method: Literal[‘asls’, ‘arpls’, ‘flatfit’]

Possible baseline estimation methods are AsLS, arPLS and FlatFit. FlatFit and AsLS work well with smooth data, while arPLS works better with noisy data

smoothness: float

Size of the smoothness penalty

p: float | None

Asymmetry factor; its meaning differs between AsLS and arPLS

tol: float

Maximum relative change of the weights w for convergence

max_iter: int | None

Maximum number of iterations. If not specified, it is guessed automatically

smooth_wl: int | None

If specified, applies a Savitzky-Golay filter (order 2) across the wavelength axis with the given window size

Returns

Chromatogram

Returns self
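For orientation, the sketch below is the textbook AsLS algorithm of Eilers and Boelens, written with dense NumPy matrices for brevity (suitable only for short signals). It is not mocca2's code: how mocca2's `smoothness` and `p` map onto `lam` and `p` here is an assumption, and FlatFit and arPLS are not shown. `asls_baseline` is a hypothetical name.

```python
import numpy as np


def asls_baseline(y: np.ndarray, lam: float = 1e6, p: float = 0.01, n_iter: int = 20) -> np.ndarray:
    """Asymmetric least squares baseline, dense version for short 1D signals."""
    n = y.size
    D = np.diff(np.eye(n), 2, axis=0)     # (n-2) x n second-difference operator
    penalty = lam * D.T @ D               # roughness penalty on the baseline
    w = np.ones(n)
    z = y.copy()
    for _ in range(n_iter):
        # Weighted Whittaker smoother: minimize sum w*(y - z)^2 + lam*|D z|^2
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline (peaks) get small weight p, points below get 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z


x = np.linspace(0.0, 1.0, 200)
true_baseline = 0.5 + 0.3 * x
y = true_baseline + 5.0 * np.exp(-((x - 0.5) ** 2) / (2 * 0.03 ** 2))
baseline = asls_baseline(y)
corrected = y - baseline
print(round(float(corrected.max()), 2))
```

A fixed iteration count is used instead of a convergence test on `w` to keep the sketch short.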

find_peaks(contraction: Literal['mean', 'max', 'weighted_mean'] = 'mean', min_rel_height: float = 0.01, min_height: float = 10.0, width_at: float = 0.1, expand_borders: bool = True, merge_overlapping: bool = True, split_threshold: float | None = 0.05, min_elution_time: float | None = None, max_elution_time: float | None = None) → Chromatogram

Finds all peaks in the contracted data. Assumes that the baseline is flat and centered around 0.

Parameters

contraction: Literal[‘mean’, ‘max’, ‘weighted_mean’]

Contraction method to project 2D data into 1D

min_rel_height: float

Minimum relative prominence of the peaks (relative to the highest peak)

min_height: float

Minimum prominence of the peaks

width_at: float

The peak width is measured at this fraction of the peak height

expand_borders: bool

If True, tries to find the peak borders. Otherwise, the borders from scipy are returned

merge_overlapping: bool

If True, also calls merge_overlapping_peaks before returning the peaks

split_threshold: float | None

Maximum height of the minimum separating two peaks for splitting, relative to the smaller of the two peaks

min_elution_time: float | None

If specified, peaks with a maximum before min_elution_time are omitted

max_elution_time: float | None

If specified, peaks with a maximum after max_elution_time are omitted

Returns

Chromatogram

Returns self

Description

  1. The peaks are picked using scipy.signal.find_peaks and filtered based on min_rel_height

  2. If min_elution_time or max_elution_time are specified, the peaks are filtered

  3. If expand_borders, the borders of the peaks are expanded down to baseline (up to estimated background noise)

  4. If merge_overlapping, any overlapping peaks are merged. See merge_overlapping_peaks

  5. If split_threshold is provided, merged peaks that are separated by a sufficiently deep minimum are split. See split_peaks
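Steps 1 and 2 of this procedure can be sketched with `scipy.signal.find_peaks`, passing `min_height` as the prominence threshold (the parameter is documented as a minimum prominence). Border expansion, merging and splitting (steps 3 to 5) are not reproduced. `pick_peaks` and the synthetic trace are illustrative, not mocca2 code.

```python
from typing import Optional

import numpy as np
from scipy.signal import find_peaks


def pick_peaks(y, time, min_rel_height=0.01, min_height=10.0,
               min_elution_time: Optional[float] = None,
               max_elution_time: Optional[float] = None) -> np.ndarray:
    """Pick maxima by prominence, filter by relative prominence, then by elution time."""
    idx, props = find_peaks(y, prominence=min_height)
    if idx.size == 0:
        return idx
    prominences = props["prominences"]
    # Keep peaks whose prominence is at least min_rel_height times the largest one
    idx = idx[prominences >= min_rel_height * prominences.max()]
    if min_elution_time is not None:
        idx = idx[time[idx] >= min_elution_time]
    if max_elution_time is not None:
        idx = idx[time[idx] <= max_elution_time]
    return idx


def gauss(t, mu, sigma):
    return np.exp(-((t - mu) ** 2) / (2 * sigma ** 2))


time = np.linspace(0.0, 10.0, 1001)
y = 100 * gauss(time, 2.0, 0.1) + 50 * gauss(time, 5.0, 0.1) + 20 * gauss(time, 8.0, 0.1)
print(pick_peaks(y, time, min_rel_height=0.3))                        # [200 500]
print(pick_peaks(y, time, min_rel_height=0.3, max_elution_time=4.0))  # [200]
```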

check_same_sampling(*others: Data2D, tol: float = 0.001, time: bool = True, wavelength: bool = True) → bool

Checks that the time and wavelength sampling are the same as in all of the provided data

closest_time(time: float) → Tuple[int, float]

Returns the index and value of the time point closest to the specified time

closest_wavelength(wavelength: float) → Tuple[int, float]

Returns the index and value of the wavelength point closest to the specified wavelength

contract(method: Literal['mean', 'max', 'weighted_mean'] = 'mean', damping: float = 0.2) → ndarray[Any, dtype[_ScalarType_co]]

Contracts the first (wavelength) dimension of the 2D data to obtain 1D data for peak picking

Parameters

method: Literal[‘mean’, ‘max’, ‘weighted_mean’]

The method used for contraction. ‘weighted_mean’ weights the average over wavelengths by their standard deviation

damping: float

Damping factor for ‘weighted_mean’

Returns

Contracted 1D array

deconvolve_peaks(model: PeakModel | Literal['BiGaussian', 'BiGaussianTailing', 'FraserSuzuki', 'Bemg'], min_r2: float, relaxe_concs: bool, max_comps: int) → Chromatogram

Deconvolves peaks with an increasing number of components until the required R2 (min_r2) is reached, using at most max_comps components. See deconvolve_adaptive() for details.

Parameters

model: PeakModel | Literal[‘BiGaussian’, ‘BiGaussianTailing’, ‘FraserSuzuki’, ‘Bemg’]

Mathematical model used for fitting the shapes of the peak components

min_r2: float

Minimum required R2 for deconvolution

relaxe_concs: bool

If False, the fitted peak model functions are returned. Otherwise, the concentrations are refined with restricted least squares

max_comps: int

Maximum number of components that can be fitted

Returns

Chromatogram

Returns self
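The adaptive strategy (add one component at a time until the fit is good enough or `max_comps` is reached) can be illustrated with a stand-in fitter. mocca2 fits peak-shape models such as BiGaussian; here a truncated SVD (best rank-n approximation) replaces that fit purely to keep the sketch short, and R2 is computed as 1 - SS_res/SS_tot, which is one common definition and an assumption here. All function names are hypothetical.

```python
from typing import Callable, Tuple

import numpy as np


def r_squared(data: np.ndarray, fit: np.ndarray) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((data - fit) ** 2)
    ss_tot = np.sum((data - data.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rank_n_fit(data: np.ndarray, n: int) -> np.ndarray:
    """Stand-in for peak-model fitting: best rank-n approximation via truncated SVD."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    return (u[:, :n] * s[:n]) @ vt[:n]


def deconvolve_adaptive_sketch(data, fit: Callable[[np.ndarray, int], np.ndarray],
                               min_r2: float, max_comps: int) -> Tuple[int, float, bool]:
    """Add components one at a time until R2 >= min_r2 or max_comps is reached."""
    r2 = float("-inf")
    for n in range(1, max_comps + 1):
        r2 = r_squared(data, fit(data, n))
        if r2 >= min_r2:
            return n, r2, True       # resolved
    return max_comps, r2, False      # not resolved within max_comps


t = np.linspace(0.0, 10.0, 101)
conc1, conc2 = np.exp(-(t - 4.0) ** 2), np.exp(-(t - 6.0) ** 2)
spec1, spec2 = np.array([1.0, 0.5, 0.1]), np.array([0.1, 0.5, 1.0])
data = np.outer(spec1, conc1) + np.outer(spec2, conc2)  # data[wavelength, time]
n, r2, resolved = deconvolve_adaptive_sketch(data, rank_n_fit, min_r2=0.999, max_comps=4)
print(n, resolved)  # 2 True
```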

extract_time(min_time: float | None, max_time: float | None, inplace: bool = False) → Data2D

Extracts the data in the given time range

Parameters

min_time: float | None

Start time of the extracted segment

max_time: float | None

End time of the extracted segment

inplace: bool

If True, modifies the data in-place and returns self

Returns

Data2D

The data in the given time interval

extract_wavelength(min_wavelength: float | None, max_wavelength: float | None, inplace: bool = False) → Data2D

Extracts the data in the given wavelength range

Parameters

min_wavelength: float | None

Start wavelength of the extracted segment

max_wavelength: float | None

End wavelength of the extracted segment

inplace: bool

If True, modifies the data in-place and returns self

Returns

Data2D

The data in the given wavelength interval

interpolate_time(time: ndarray[Any, dtype[_ScalarType_co]], kind: str = 'linear', inplace: bool = False) → Data2D

Interpolates the data to the given time points using the specified kind of interpolation; see scipy.interpolate.interp1d for details

plot_2d(ax: Axes | None = None, colormap: str = 'gist_ncar', colorbar: bool = True) → Axes

Plots the heatmap for intensity against time and wavelength

time_step() → float

Returns the sampling step of the time axis

wavelength_step() → float

Returns the sampling step of the wavelength axis

time: NDArray

Time points at which data was sampled

wavelength: NDArray

Wavelengths at which data was sampled

data: NDArray

Absorbances at the given wavelengths and times, indexed as data[wavelength, time]

all_components(sort_by: Callable[[Component], Any] | None = None) → List[Component]

Returns all peak components from this chromatogram

get_area_percent(wl_idx: int) → Dict[int, float]

Returns area % of individual components. Only deconvolved peaks are considered.

Parameters

wl_idx: int

Index of wavelength which will be used for calculating peak area

Returns

Dict[int, float]

Area % of individual compounds [compound_id -> area %]
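Area % normalizes each compound's peak area at the chosen wavelength by the sum over all compounds considered. A minimal sketch with illustrative numbers; `area_percent` is hypothetical and takes precomputed areas rather than a Chromatogram.

```python
from typing import Dict


def area_percent(areas: Dict[int, float]) -> Dict[int, float]:
    """Convert per-compound peak areas into area % of their sum."""
    total = sum(areas.values())
    return {compound_id: 100.0 * area / total for compound_id, area in areas.items()}


print(area_percent({1: 30.0, 2: 10.0}))  # {1: 75.0, 2: 25.0}
```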

get_integrals() → Dict[int, float]

Returns the integrals of individual components [compound_id -> integral]. Only deconvolved peaks are considered.

get_relative_integrals(relative_to: int) → Dict[int, float]

Returns the integrals of individual components relative to the specified compound. If the reference compound is not present, returns an empty dictionary. Only deconvolved peaks are considered.

Parameters

relative_to: int

All integrals will be divided by integral of compound with this ID

Returns

Dict[int, float]

Relative integrals of individual compounds [compound_id -> relative integral]
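The documented behaviour (divide by the reference compound's integral, or return an empty dictionary if it is missing) can be sketched on a plain dictionary of integrals such as the one `get_integrals()` returns. `relative_integrals` is a hypothetical helper and the numbers are illustrative.

```python
from typing import Dict


def relative_integrals(integrals: Dict[int, float], relative_to: int) -> Dict[int, float]:
    """Divide every integral by that of the reference compound; {} if it is absent."""
    if relative_to not in integrals:
        return {}
    reference = integrals[relative_to]
    return {compound_id: value / reference for compound_id, value in integrals.items()}


integrals = {0: 8.0, 1: 2.0, 2: 4.0}
print(relative_integrals(integrals, relative_to=1))  # {0: 4.0, 1: 1.0, 2: 2.0}
print(relative_integrals(integrals, relative_to=7))  # {}
```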

refine_peaks(compounds: Dict[int, Compound], model: PeakModel | Literal['BiGaussian', 'BiGaussianTailing', 'FraserSuzuki', 'Bemg'], relaxe_concs: bool, min_rel_integral: float) → Chromatogram

Refines the concentration profiles using the averaged spectra of the compounds.

Removes components with insufficient integrals.

to_dict() → Dict[str, Any]

Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → Chromatogram

Creates a Chromatogram object from a dictionary

plot(ax: Axes | None = None, color: str = 'k', label: str | None = None, plot_peaks: bool = True, zero_line: bool = False) → Axes

Plots the data using matplotlib.pyplot.imshow

Peak

The Peak class stores information about a single peak or multiple overlapping peaks.

class mocca2.classes.Peak(left: int, right: int, maximum: int, height: float, prominence: float, all_maxima: List[int] | None = None)

Information about a single peak

left: int

Index of peak start

right: int

Index of peak end

maximum: int

Index of peak maximum

height: float

Absolute height of the peak

prominence: float

Height of the peak from the base

all_maxima: List[int]

Indices of all maxima of the merged peaks

data(data: ndarray[Any, dtype[_ScalarType_co]] | Data2D) → ndarray[Any, dtype[_ScalarType_co]]

Returns part of the 2D data that contains this peak

time(data: ndarray[Any, dtype[_ScalarType_co]] | Data2D) → ndarray[Any, dtype[_ScalarType_co]]

Returns part of the timescale that contains this peak

to_dict() → Dict[str, Any]

Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → Peak

Creates a Peak object from a dictionary
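Since left and right are indices on the time axis, selecting a peak's part of data[wavelength, time] is column slicing. The sketch assumes `right` is inclusive, which this page does not state; `peak_slice` is a hypothetical helper, not mocca2's `Peak.data()`.

```python
import numpy as np


def peak_slice(data: np.ndarray, time: np.ndarray, left: int, right: int):
    """Cut the columns of data[wavelength, time] between left and right (inclusive, assumed)."""
    return data[:, left:right + 1], time[left:right + 1]


time = np.linspace(0.0, 1.0, 11)
data = np.tile(np.arange(11, dtype=float), (3, 1))  # 3 wavelengths x 11 time points
peak_data, peak_time = peak_slice(data, time, left=2, right=5)
print(peak_data.shape)  # (3, 4)
```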

DeconvolvedPeak

The DeconvolvedPeak class extends the Peak class, containing information about all deconvolved components of the peak.

class mocca2.classes.DeconvolvedPeak(peak: Peak, concentrations: ndarray[Any, dtype[_ScalarType_co]], spectra: ndarray[Any, dtype[_ScalarType_co]], residual_mse: float, r2: float, resolved: bool)

Information about peak and its deconvolved components

data(data: ndarray[Any, dtype[_ScalarType_co]] | Data2D) → ndarray[Any, dtype[_ScalarType_co]]

Returns part of the 2D data that contains this peak

time(data: ndarray[Any, dtype[_ScalarType_co]] | Data2D) → ndarray[Any, dtype[_ScalarType_co]]

Returns part of the timescale that contains this peak

left: int

Index of peak start

right: int

Index of peak end

maximum: int

Index of peak maximum

height: float

Absolute height of the peak

prominence: float

Height of the peak from the base

all_maxima: List[int]

Indices of all maxima of the merged peaks

residual_mse: float

Residual MSE after deconvolution

r2: float

R2 after deconvolution

components: List[Component]

Deconvolved components of the peak

resolved: bool

Whether the deconvolution sufficiently explains the peak

merge_same_components() → None

Merges all the components with identical ID (not None).

Concentrations are summed, and spectra are averaged, weighted by the concentration integrals.
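The merge rule can be written out directly: sum the concentration profiles and take a weighted average of the spectra, with each component's concentration integral as its weight. The sketch assumes all components share the same time window (equal time offsets), which is not stated here; `merge_same` is hypothetical and the numbers are illustrative.

```python
import numpy as np


def merge_same(concentrations, spectra):
    """Sum concentration profiles; average spectra weighted by concentration integrals."""
    concentrations = np.asarray(concentrations, dtype=float)  # (k, n_time)
    spectra = np.asarray(spectra, dtype=float)                # (k, n_wavelength)
    integrals = concentrations.sum(axis=1)                    # one weight per component
    merged_concentration = concentrations.sum(axis=0)
    merged_spectrum = np.average(spectra, axis=0, weights=integrals)
    return merged_concentration, merged_spectrum


conc, spec = merge_same(
    concentrations=[[1.0, 2.0, 1.0], [0.0, 1.0, 1.0]],  # integrals 4 and 2
    spectra=[[1.0, 1.0], [0.5, 1.5]],                    # both have mean 1
)
print(conc)  # [1. 3. 2.]
print(spec)  # ≈ [0.8333 1.1667]
```

With spectra normalized to mean 1, the weighted average also has mean 1, so the normalization is preserved.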

to_dict() → Dict[str, Any]

Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → DeconvolvedPeak

Creates a DeconvolvedPeak object from a dictionary

Component

The Component class stores information about a single peak component, usually a pure compound.

class mocca2.classes.Component(concentration: ndarray[Any, dtype[_ScalarType_co]], spectrum: ndarray[Any, dtype[_ScalarType_co]], time_offset: int = 0, peak_fraction: float = 1.0, compound_id: int | None = None)

Information about a single deconvolved component of a peak

concentration: ndarray[Any, dtype[_ScalarType_co]]

Concentration profile in the selected range

spectrum: ndarray[Any, dtype[_ScalarType_co]]

Spectrum of the component. Normalized such that mean = 1

elution_time: int

Index of time point with maximum concentration

integral: float

Integral (sum of individual time points) of the concentration of this component

compound_id: int | None

ID of compound, if assigned

peak_fraction: float

Fraction of the peak area that this component represents

get_area(wl_idx: int) → float

Returns peak area at given wavelength (specified by index)
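If a component contributes spectrum[w] * concentration[t] to data[wavelength, time] (the bilinear model implied by the separate concentration and spectrum attributes, assumed here), its area at one wavelength is the spectrum value times the concentration integral, with the integral taken as a sum over time points as defined above. `component_area` is a hypothetical helper.

```python
import numpy as np


def component_area(concentration: np.ndarray, spectrum: np.ndarray, wl_idx: int) -> float:
    """Area at one wavelength under the bilinear model: spectrum[wl_idx] * sum(concentration)."""
    return float(spectrum[wl_idx] * concentration.sum())


concentration = np.array([0.0, 0.5, 1.0, 0.5, 0.0])  # profile over time
spectrum = np.array([0.5, 1.5, 1.0])                 # normalized to mean 1

# Cross-check: rebuild the component's 2D contribution and sum one wavelength row
contribution = np.outer(spectrum, concentration)
print(component_area(concentration, spectrum, 1), contribution[1].sum())  # 3.0 3.0
```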

to_dict() → Dict[str, Any]

Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → Component

Creates a Component object from a dictionary

Compound

The Compound class stores information about a chemical compound.

class mocca2.classes.Compound(elution_time: int, spectrum: ndarray[Any, dtype[_ScalarType_co]], name: str | None = None, concentration_factor: float | None = None, concentration_factor_vs_istd: float | None = None)

Information about a single chemical compound

elution_time: int

Index of the elution time on the time scale

spectrum: ndarray[Any, dtype[_ScalarType_co]]

Absorption spectrum of the compound, normalized to mean = 1

name: str | None

Name of the compound

concentration_factor: float | None

Conversion factor to get absolute concentration, such that concentration = integral * concentration_factor

concentration_factor_vs_istd: float | None

Conversion factor to get the absolute concentration relative to the internal standard (ISTD), such that rel_conc = istd_conc * concentration_factor_vs_istd

absorption_maxima() → List[Tuple[int, float]]

Finds absorption maxima using second derivatives. Returns the indices of the absorption maxima and their relative heights.
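A common second-derivative approach is to look for negative local minima of d²A/dλ², which mark band centres (including shoulders). The sketch below uses `numpy.gradient` twice and reports heights relative to the spectrum maximum; it is an assumption about the general technique, not mocca2's exact procedure, and `absorption_maxima_sketch` is hypothetical.

```python
from typing import List, Tuple

import numpy as np


def absorption_maxima_sketch(spectrum: np.ndarray) -> List[Tuple[int, float]]:
    """Negative local minima of the second derivative, with heights relative to the maximum."""
    d2 = np.gradient(np.gradient(spectrum))
    maxima = []
    for i in range(1, len(spectrum) - 1):
        if d2[i] < 0 and d2[i] <= d2[i - 1] and d2[i] <= d2[i + 1]:
            maxima.append((i, float(spectrum[i] / spectrum.max())))
    return maxima


x = np.arange(100)
spectrum = np.exp(-((x - 30) ** 2) / 50) + 0.5 * np.exp(-((x - 70) ** 2) / 50)
print([(i, round(h, 3)) for i, h in absorption_maxima_sketch(spectrum)])  # [(30, 1.0), (70, 0.5)]
```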

to_dict() → Dict[str, Any]

Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → Compound

Creates a Compound object from a dictionary