Module classes

This module defines light-weight classes for common data.

Data2D

The Data2D class stores absorbance data at given times and wavelengths.

class mocca2.classes.Data2D(time: ndarray[tuple[int, ...], dtype[_ScalarType_co]], wavelength: ndarray[tuple[int, ...], dtype[_ScalarType_co]], data: ndarray[tuple[int, ...], dtype[_ScalarType_co]])

2D chromatogram data

time: ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Time points at which data was sampled

wavelength: ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Wavelengths at which data was sampled

data: ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Absorbances at given wavelength and time absorbance[wavelength, time]

closest_time(time: float) → Tuple[int, float]: Returns index and value of time point that is closest to specified time

closest_wavelength(wavelength: float) → Tuple[int, float]: Returns index and value of wavelength point that is closest to specified wavelength
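
As a rough illustration of what these two lookups return, the sketch below finds the nearest sample on a plain Python list. The helper closest_point is a hypothetical stand-in and not the mocca2 implementation; resolving ties toward the lower index is an assumption.

```python
from typing import Sequence, Tuple


def closest_point(axis: Sequence[float], target: float) -> Tuple[int, float]:
    """Return (index, value) of the axis sample nearest to target."""
    # min() keeps the first minimum, so ties resolve to the lower index
    idx = min(range(len(axis)), key=lambda i: abs(axis[i] - target))
    return idx, axis[idx]


time = [0.0, 0.5, 1.0, 1.5, 2.0]
idx, value = closest_point(time, 1.2)  # nearest sample to t = 1.2
```

The returned index can then be used to address the matching row or column of the absorbance array.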

extract_time(min_time: float | None, max_time: float | None, inplace: bool = False) → Data2D

Extracts the data in the given time range

Parameters

min_time: float | None: Start time of the extracted segment
max_time: float | None: End time of the extracted segment
inplace: bool: If True, modifies the data in-place and returns self

Returns

Data2D: The data in the given time interval
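
The range extraction can be pictured with the standalone sketch below, which works on plain lists laid out as data[wavelength][time]. The function extract_range is hypothetical; treating None as an open bound and treating both limits as inclusive are assumptions, not documented behaviour.

```python
from typing import List, Optional, Tuple


def extract_range(
    axis: List[float],
    data: List[List[float]],
    lo: Optional[float],
    hi: Optional[float],
) -> Tuple[List[float], List[List[float]]]:
    """Keep the columns of data whose axis value lies in [lo, hi]."""
    keep = [
        i for i, t in enumerate(axis)
        if (lo is None or t >= lo) and (hi is None or t <= hi)
    ]
    new_axis = [axis[i] for i in keep]
    # data is indexed [wavelength][time], so filter within each row
    new_data = [[row[i] for i in keep] for row in data]
    return new_axis, new_data


time = [0.0, 1.0, 2.0, 3.0]
absorbance = [[1, 2, 3, 4], [5, 6, 7, 8]]  # 2 wavelengths x 4 time points
sub_time, sub_abs = extract_range(time, absorbance, 1.0, None)
```

Extracting a wavelength range would follow the same pattern, filtering rows instead of columns.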

extract_wavelength(min_wavelength: float | None, max_wavelength: float | None, inplace: bool = False) → Data2D

Extracts the data in the given wavelength range

Parameters

min_wavelength: float | None: Start wavelength of the extracted segment
max_wavelength: float | None: End wavelength of the extracted segment
inplace: bool: If True, modifies the data in-place and returns self

Returns

Data2D: The data in the given wavelength interval

check_same_sampling(*others: Data2D, tol: float = 0.001, time: bool = True, wavelength: bool = True) → bool: Checks that the time sampling and wavelength sampling are the same as in all provided data
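
A minimal sketch of such a sampling check on two axes is given below. The helper same_sampling is hypothetical, and interpreting tol as an absolute, pointwise tolerance is an assumption; the library may compare the axes differently.

```python
from typing import Sequence


def same_sampling(a: Sequence[float], b: Sequence[float], tol: float = 0.001) -> bool:
    """True if both axes have equal length and pointwise differences within tol."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= tol for x, y in zip(a, b))


reference = [0.0, 1.0, 2.0]
matches = same_sampling(reference, [0.0, 1.0005, 2.0])   # within tolerance
differs = same_sampling(reference, [0.0, 1.01, 2.0])     # off by 0.01
```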

interpolate_time(time: ndarray[tuple[int, ...], dtype[_ScalarType_co]], kind: str = 'linear', inplace: bool = False) → Data2D: Interpolates the data to the given time points using specified interpolation, see scipy.interpolate.interp1d for details
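
For the default kind='linear', the resampling can be sketched without SciPy as follows. Both functions are hypothetical stand-ins operating on plain lists laid out as data[wavelength][time]; raising an error for points outside the sampled range is a choice made for this sketch.

```python
from bisect import bisect_right
from typing import List, Sequence


def interp_linear(x: Sequence[float], y: Sequence[float], new_x: Sequence[float]) -> List[float]:
    """Linearly interpolate y(x) at new_x; x must be strictly increasing."""
    out = []
    for t in new_x:
        if not x[0] <= t <= x[-1]:
            raise ValueError(f"{t} lies outside the sampled range")
        # index of the left neighbour, clamped so that j + 1 stays valid
        j = min(bisect_right(x, t) - 1, len(x) - 2)
        frac = (t - x[j]) / (x[j + 1] - x[j])
        out.append(y[j] + frac * (y[j + 1] - y[j]))
    return out


def interpolate_rows(time, data, new_time):
    """Apply the interpolation to every wavelength row of data[wavelength][time]."""
    return [interp_linear(time, row, new_time) for row in data]


resampled = interpolate_rows([0.0, 1.0], [[0.0, 2.0], [4.0, 8.0]], [0.5])
```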

time_step() → float: Returns the sampling step of the time axis

wavelength_step() → float: Returns the sampling step of the wavelength axis
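
For a uniformly sampled axis, the step can be estimated as the average spacing between consecutive points. The helper below is a hypothetical sketch under that uniform-sampling assumption; how the library handles irregular sampling is not stated here.

```python
from typing import Sequence


def sampling_step(axis: Sequence[float]) -> float:
    """Average spacing between consecutive samples of a monotonic axis."""
    if len(axis) < 2:
        raise ValueError("need at least two samples to define a step")
    return (axis[-1] - axis[0]) / (len(axis) - 1)


step = sampling_step([0.0, 0.5, 1.0, 1.5])
```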

contract(method: Literal['mean', 'max', 'weighted_mean'] = 'mean', damping: float = 0.2) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]

Contracts the first dimension of 2D data to get 1D data for peak picking

Parameters

method: Literal[‘mean’, ‘max’, ‘weighted_mean’]: The method that should be used for contraction. ‘weighted_mean’ weights the wavelengths by their average std
damping: float: damping factor for ‘weighted_mean’

Returns

Contracted 1D array
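
The 'mean' and 'max' contractions can be sketched on plain lists as shown below, with the first dimension taken to be wavelength as in absorbance[wavelength, time]. The function is a hypothetical stand-in; 'weighted_mean' is left out because its exact weighting and the role of damping are not specified here.

```python
from typing import List


def contract(data: List[List[float]], method: str = "mean") -> List[float]:
    """Collapse data[wavelength][time] over wavelength, one value per time point."""
    columns = list(zip(*data))  # transpose: one tuple of absorbances per time point
    if method == "mean":
        return [sum(col) / len(col) for col in columns]
    if method == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unsupported method: {method}")


absorbance = [[0.0, 2.0, 1.0], [0.0, 4.0, 3.0]]  # 2 wavelengths x 3 time points
mean_trace = contract(absorbance, "mean")
max_trace = contract(absorbance, "max")
```

The resulting 1D trace has one entry per time point, which is the shape that peak picking operates on.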

to_dict() → Dict[str, Any]: Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → Data2D: Creates a Data2D object from a dictionary
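
The to_dict and from_dict pair suggests a round trip like the one sketched below. TinyData2D is a hypothetical stand-in class, the key names are illustrative rather than the library's actual layout, and the JSON step assumes the dictionary holds only JSON-compatible values.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class TinyData2D:
    """Hypothetical stand-in for Data2D holding plain lists."""
    time: List[float]
    wavelength: List[float]
    data: List[List[float]]

    def to_dict(self) -> Dict[str, Any]:
        # the key names are illustrative, not the library's actual layout
        return {"time": self.time, "wavelength": self.wavelength, "data": self.data}

    @staticmethod
    def from_dict(d: Dict[str, Any]) -> "TinyData2D":
        return TinyData2D(d["time"], d["wavelength"], d["data"])


original = TinyData2D([0.0, 1.0], [220.0, 254.0], [[0.1, 0.2], [0.3, 0.4]])
restored = TinyData2D.from_dict(json.loads(json.dumps(original.to_dict())))
```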

plot(ax: Axes = None, color: str = 'k', label: str | None = None, zero_line: bool = False) → Axes: Plots the data using matplotlib.pyplot.imshow

plot_2d(ax: Axes = None, colormap: str = 'gist_ncar', colorbar: bool = True) → Axes: Plots the heatmap for intensity against time and wavelength

Chromatogram

The Chromatogram class extends the Data2D class, adding more metadata.

class mocca2.Chromatogram(sample: Data2D | str, blank: Data2D | str | None = None, name: str | None = None, interpolate_blank=False)

Information about a single chromatogram, based on Data2D

peaks: List[Peak | DeconvolvedPeak]: Peaks in the chromatogram

name: str | None: Name of this chromatogram

sample_path: str | None: Filename of the raw chromatogram file

blank_path: str | None: Filename of the raw chromatogram file with blank

correct_baseline(method: Literal['asls', 'arpls', 'flatfit'] = 'flatfit', smoothness: float = 1.0, p: float | None = None, tol: float = 1e-07, max_iter: int | None = None, smooth_wl: int | None = None) → Chromatogram

Corrects the baseline using AsLS, arPLS or FlatFit algorithm

Parameters

method: Literal[‘asls’, ‘arpls’, ‘flatfit’]: Possible baseline estimation methods are AsLS, arPLS and FlatFit. FlatFit and AsLS work well with smooth data, arPLS works better with noisy data
smoothness: float: size of smoothness penalty
p: float | None: Asymmetry factor, different for AsLS and arPLS
tol: float: maximum relative change of w for convergence
max_iter: int | None: maximum number of iterations. If not specified, guessed automatically
smooth_wl: int | None: if specified, applies Savitzky-Golay filter (order 2) across wavelength axis with given window size

Returns

Chromatogram: Returns self

find_peaks(contraction: Literal['mean', 'max', 'weighted_mean'] = 'mean', min_rel_height: float = 0.01, min_height: float = 10.0, width_at: float = 0.1, expand_borders: bool = True, merge_overlapping: bool = True, split_threshold: float | None = 0.05, min_elution_time: float | None = None, max_elution_time: float | None = None) → Chromatogram

Finds all peaks in contracted data. Assumes that baseline is flat and centered around 0.

Parameters

contraction: Literal[‘mean’, ‘max’, ‘weighted_mean’]: Contraction method to project 2D data into 1D
min_rel_height: float: minimum relative prominence of the peaks (relative to highest peak)
min_height: float: minimum prominence of the peaks
width_at: float: the peak width will be measured at this fraction of peak height
expand_borders: bool: if True, tries to find peak borders. Otherwise borders from scipy are returned
merge_overlapping: bool: if True, also calls the merge_overlapping_peaks before returning the peaks
split_threshold: float | None: maximum height of a minimum separating two peaks for splitting, relative to smaller of the peaks
min_elution_time: float | None: if specified, peaks with maximum before min_elution_time are omitted
max_elution_time: float | None: if specified, peaks with maximum after max_elution_time are omitted

Returns

Chromatogram: Returns self

Description

The peaks are picked using scipy.signal.find_peaks and filtered based on min_rel_height.
If min_elution_time or max_elution_time is specified, peaks with maxima outside that range are omitted.
If expand_borders is True, the borders of the peaks are expanded down to the baseline (up to the estimated background noise).
If merge_overlapping is True, any overlapping peaks are merged. See merge_overlapping_peaks.
If split_threshold is provided, merged peaks with a sufficiently deep minimum separating them are split. See split_peaks.
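
The first two filtering steps described above can be sketched in pure Python as follows. This is a simplified stand-in, not the library's code: it detects strict local maxima instead of calling scipy.signal.find_peaks, filters on absolute height rather than prominence, and expresses the elution window in indices rather than times. Border expansion, merging and splitting are not shown.

```python
from typing import List, Optional


def pick_peaks(
    signal: List[float],
    min_rel_height: float = 0.01,
    min_height: float = 10.0,
    min_idx: Optional[int] = None,
    max_idx: Optional[int] = None,
) -> List[int]:
    """Return indices of local maxima that pass the height and window filters."""
    # strict local maxima only; flat-topped peaks are ignored in this sketch
    maxima = [
        i for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    ]
    if not maxima:
        return []
    tallest = max(signal[i] for i in maxima)
    kept = []
    for i in maxima:
        if signal[i] < min_height or signal[i] < min_rel_height * tallest:
            continue  # too small in absolute or relative terms
        if min_idx is not None and i < min_idx:
            continue  # elutes before the allowed window
        if max_idx is not None and i > max_idx:
            continue  # elutes after the allowed window
        kept.append(i)
    return kept


trace = [0, 5, 0, 50, 0, 200, 0, 12, 0]
strict = pick_peaks(trace, min_rel_height=0.1)
```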

check_same_sampling(*others: Data2D, tol: float = 0.001, time: bool = True, wavelength: bool = True) → bool: Checks that the time sampling and wavelength sampling are the same as in all provided data

closest_time(time: float) → Tuple[int, float]: Returns index and value of time point that is closest to specified time

closest_wavelength(wavelength: float) → Tuple[int, float]: Returns index and value of wavelength point that is closest to specified wavelength

contract(method: Literal['mean', 'max', 'weighted_mean'] = 'mean', damping: float = 0.2) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]

Contracts the first dimension of 2D data to get 1D data for peak picking

Parameters

method: Literal[‘mean’, ‘max’, ‘weighted_mean’]: The method that should be used for contraction. ‘weighted_mean’ weights the wavelengths by their average std
damping: float: damping factor for ‘weighted_mean’

Returns

Contracted 1D array

deconvolve_peaks(model: PeakModel | Literal['BiGaussian', 'BiGaussianTailing', 'FraserSuzuki', 'Bemg'], min_r2: float, relaxe_concs: bool, max_comps: int) → Chromatogram

Deconvolves peaks with an increasing number of components until the MSE limit is reached. See deconvolve_adaptive() for details.

Parameters

model: PeakModel | Literal[‘BiGaussian’, ‘BiGaussianTailing’, ‘FraserSuzuki’, ‘Bemg’]: mathematical model used for fitting shapes of components of peaks
min_r2: float: Minimum required R2 for deconvolution
relaxe_concs: bool: If False, the fitted peak model functions are returned. Otherwise, the concentrations are refined with restricted least squares
max_comps: int: Maximum number of components that can be fitted

Returns

Chromatogram: Returns self

extract_time(min_time: float | None, max_time: float | None, inplace: bool = False) → Data2D

Extracts the data in the given time range

Parameters

min_time: float | None: Start time of the extracted segment
max_time: float | None: End time of the extracted segment
inplace: bool: If True, modifies the data in-place and returns self

Returns

Data2D: The data in the given time interval

extract_wavelength(min_wavelength: float | None, max_wavelength: float | None, inplace: bool = False) → Data2D

Extracts the data in the given wavelength range

Parameters

min_wavelength: float | None: Start wavelength of the extracted segment
max_wavelength: float | None: End wavelength of the extracted segment
inplace: bool: If True, modifies the data in-place and returns self

Returns

Data2D: The data in the given wavelength interval

interpolate_time(time: ndarray[tuple[int, ...], dtype[_ScalarType_co]], kind: str = 'linear', inplace: bool = False) → Data2D: Interpolates the data to the given time points using specified interpolation, see scipy.interpolate.interp1d for details

plot_2d(ax: Axes = None, colormap: str = 'gist_ncar', colorbar: bool = True) → Axes: Plots the heatmap for intensity against time and wavelength

time_step() → float: Returns the sampling step of the time axis

wavelength_step() → float: Returns the sampling step of the wavelength axis

time: NDArray: Time points at which data was sampled

wavelength: NDArray: Wavelengths at which data was sampled

data: NDArray: Absorbances at given wavelength and time absorbance[wavelength, time]

all_components(sort_by: Callable[[Component], Any] | None = None) → List[Component]: Returns all peak components from this chromatogram

get_area_percent(wl_idx: int) → Dict[int, float]

Returns area % of individual components. Only deconvolved peaks are considered.

Parameters

wl_idx: int: Index of wavelength which will be used for calculating peak area

Returns

Dict[int, float]: Area % of individual compounds [compound_id -> area %]
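
The area % calculation amounts to normalising the per-compound areas by their total. The sketch below shows this arithmetic on a plain dictionary; the function name and the choice to return an empty dictionary when the total area is zero are assumptions of this example.

```python
from typing import Dict


def area_percent(areas: Dict[int, float]) -> Dict[int, float]:
    """Convert absolute areas per compound ID to percentages of the total."""
    total = sum(areas.values())
    if total == 0:
        return {}
    return {cid: 100.0 * area / total for cid, area in areas.items()}


percentages = area_percent({1: 30.0, 2: 10.0})
```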

get_integrals() → Dict[int, float]: Returns integrals of individual components [compound_id -> integral]. Only deconvolved peaks are considered.

get_relative_integrals(relative_to: int) → Dict[int, float]

Returns integrals of individual components relative to the specified compound. If reference compound is not present, returns empty dictionary. Only deconvolved peaks are considered.

Parameters

relative_to: int: All integrals will be divided by integral of compound with this ID

Returns

Dict[int, float]: Relative integrals of individual compounds [compound_id -> relative integral]
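
The normalisation to a reference compound, including the empty result when that compound is absent, can be sketched as follows. The function operates on a plain dictionary of integrals and is a hypothetical stand-in for the method.

```python
from typing import Dict


def relative_integrals(integrals: Dict[int, float], relative_to: int) -> Dict[int, float]:
    """Divide every integral by the integral of the reference compound."""
    if relative_to not in integrals:
        return {}  # reference compound not present
    reference = integrals[relative_to]
    return {cid: value / reference for cid, value in integrals.items()}


integrals = {1: 50.0, 2: 25.0, 3: 100.0}
relative = relative_integrals(integrals, relative_to=1)
```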

refine_peaks(compounds: Dict[int, Compound], model: PeakModel | Literal['BiGaussian', 'BiGaussianTailing', 'FraserSuzuki', 'Bemg'], relaxe_concs: bool, min_rel_integral: float) → Chromatogram

Refines the concentration profiles using averaged spectra of the compounds.

Removes components with insufficient integrals.

to_dict() → Dict[str, Any]: Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → Chromatogram: Creates a Chromatogram object from a dictionary

plot(ax: Axes = None, color: str = 'k', label: str | None = None, plot_peaks: bool = True, zero_line: bool = False) → Axes: Plots the data using matplotlib.pyplot.imshow

Peak

The Peak class stores information about a single peak or multiple overlapping peaks.

class mocca2.classes.Peak(left: int, right: int, maximum: int, height: float, prominence: float, all_maxima: List[int] | None = None)

Information about a single peak

left: int: Index of peak start

right: int: Index of peak end

maximum: int: Index of peak maximum

height: float: Absolute height of the peak

prominence: float: Height of the peak from the base

all_maxima: List[int]: Indeces of all maxima of the merged peaks

data(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Data2D) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Returns part of the 2D data that contains this peak

time(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Data2D) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Returns part of the timescale that contains this peak
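
Since left and right are indices on the time axis, both accessors can be pictured as slices. The sketch below uses plain lists laid out as data[wavelength][time]; the helper is hypothetical, and treating right as inclusive is an assumption.

```python
from typing import List, Tuple


def peak_window(
    left: int, right: int, time: List[float], data: List[List[float]]
) -> Tuple[List[float], List[List[float]]]:
    """Slice the time axis and data[wavelength][time] to one peak's index range."""
    # assumption: `right` is inclusive, hence the + 1 in the slice
    window = slice(left, right + 1)
    return time[window], [row[window] for row in data]


time = [0.0, 0.1, 0.2, 0.3, 0.4]
absorbance = [[0, 1, 5, 1, 0], [0, 2, 9, 2, 0]]
peak_time, peak_data = peak_window(1, 3, time, absorbance)
```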

to_dict() → Dict[str, Any]: Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → Peak: Creates a Peak object from a dictionary

DeconvolvedPeak

The DeconvolvedPeak class extends the Peak class, containing information about all deconvolved components of the peak.

class mocca2.classes.DeconvolvedPeak(peak: Peak, concentrations: ndarray[tuple[int, ...], dtype[_ScalarType_co]], spectra: ndarray[tuple[int, ...], dtype[_ScalarType_co]], residual_mse: float, r2: float, resolved: bool)

Information about peak and its deconvolved components

data(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Data2D) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Returns part of the 2D data that contains this peak

time(data: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | Data2D) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Returns part of the timescale that contains this peak

left: int: Index of peak start

right: int: Index of peak end

maximum: int: Index of peak maximum

height: float: Absolute height of the peak

prominence: float: Height of the peak from the base

all_maxima: List[int]: Indeces of all maxima of the merged peaks

residual_mse: float: Residual MSE after deconvolution

r2: float: R2 after deconvolution

components: List[Component]: Deconvolved components of the peak

resolved: bool: This specifies, whether the deconvolution sufficiently explains the peak

merge_same_components() → None

Merges all the components with identical ID (not None).

Concentrations are added, spectra are averaged, weighted by concentration integral.
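
The merging rule can be illustrated with the sketch below, in which each component is a plain tuple of (compound_id, concentration profile, spectrum). The tuple layout and the function name are hypothetical; the sketch also assumes that all profiles cover the same time window and that the summed integrals are non-zero.

```python
from typing import Dict, List, Optional, Tuple

# (compound_id, concentration profile, spectrum)
Comp = Tuple[Optional[int], List[float], List[float]]


def merge_same(components: List[Comp]) -> List[Comp]:
    """Merge components sharing a non-None ID; leave the rest untouched."""
    groups: Dict[int, List[Comp]] = {}
    merged: List[Comp] = []
    for comp in components:
        if comp[0] is None:
            merged.append(comp)  # unassigned components are never merged
        else:
            groups.setdefault(comp[0], []).append(comp)
    for cid, group in groups.items():
        # concentrations are added point by point
        conc = [sum(vals) for vals in zip(*(c[1] for c in group))]
        # spectra are averaged, weighted by each concentration integral
        weights = [sum(c[1]) for c in group]
        total = sum(weights)
        spec = [
            sum(w * s for w, s in zip(weights, vals)) / total
            for vals in zip(*(c[2] for c in group))
        ]
        merged.append((cid, conc, spec))
    return merged


components = [
    (1, [1.0, 2.0], [2.0, 0.0]),     # integral 3
    (1, [0.0, 1.0], [0.0, 2.0]),     # integral 1
    (None, [1.0, 1.0], [1.0, 1.0]),  # no ID, kept as-is
]
result = merge_same(components)
```

In this example the two components with ID 1 collapse into one whose spectrum leans toward the component with the larger integral.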

to_dict() → Dict[str, Any]: Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → DeconvolvedPeak: Creates a DeconvolvedPeak object from a dictionary

Component

The Component class stores information about a single peak component, usually a pure compound.

class mocca2.classes.Component(concentration: ndarray[tuple[int, ...], dtype[_ScalarType_co]], spectrum: ndarray[tuple[int, ...], dtype[_ScalarType_co]], time_offset: int = 0, peak_fraction: float = 1.0, compound_id: int | None = None)

Information about a single deconvolved component of a peak

concentration: ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Concentration profile in the selected range

spectrum: ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Spectrum of the component. Normalized such that mean = 1

elution_time: int: Index of time point with maximum concentration

integral: float: Integral (sum of individual time points) of the concentration of this component

compound_id: int | None: ID of compound, if assigned

peak_fraction: float: Fraction of the peak area that this component represents

get_area(wl_idx: int) → float: Returns peak area at given wavelength (specified by index)
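
One plausible reading of how integral, spectrum and get_area relate is sketched below. It assumes that the absorbance of a component is the product of its concentration profile and its spectrum, so the area at one wavelength is the integral scaled by the spectrum value at that index. Both helpers are hypothetical, and the library may compute the area differently.

```python
from typing import List


def normalize_spectrum(raw: List[float]) -> List[float]:
    """Scale a spectrum so that its mean equals 1."""
    mean = sum(raw) / len(raw)
    return [value / mean for value in raw]


def component_area(concentration: List[float], spectrum: List[float], wl_idx: int) -> float:
    """Area at one wavelength, assuming absorbance = concentration * spectrum."""
    integral = sum(concentration)  # sum over the individual time points
    return integral * spectrum[wl_idx]


spectrum = normalize_spectrum([1.0, 3.0])  # mean of the raw spectrum is 2
concentration = [0.0, 2.0, 4.0, 2.0]       # integral is 8
area = component_area(concentration, spectrum, wl_idx=1)
```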

to_dict() → Dict[str, Any]: Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → Component: Creates a Component object from a dictionary

Compound

The Compound class stores information about a chemical compound.

class mocca2.classes.Compound(elution_time: int, spectrum: ndarray[tuple[int, ...], dtype[_ScalarType_co]], name: str | None = None, concentration_factor: float | None = None, concentration_factor_vs_istd: float | None = None)

Information about a single chemical compound

elution_time: int: Index of the elution time on the time scale

spectrum: ndarray[tuple[int, ...], dtype[_ScalarType_co]]: Absorption spectrum of the compound, normalized to mean = 1

name: str | None: Name of the compound

concentration_factor: float | None: Conversion factor to get absolute concentration, such that concentration = integral * concentration_factor

concentration_factor_vs_istd: float | None: Conversion factor to get absolute concentration relative to the internal standard (ISTD), such that rel_conc = istd_conc * concentration_factor_vs_istd
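
The relation concentration = integral * concentration_factor can be written out as the small helper below. The function name and the example numbers are hypothetical, and returning None for an uncalibrated compound is a choice made for this sketch.

```python
from typing import Optional


def absolute_concentration(
    integral: float, concentration_factor: Optional[float]
) -> Optional[float]:
    """Apply concentration = integral * concentration_factor, if calibrated."""
    if concentration_factor is None:
        return None  # no calibration available for this compound
    return integral * concentration_factor


calibrated = absolute_concentration(12.0, 0.25)
uncalibrated = absolute_concentration(12.0, None)
```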

absorption_maxima() → List[Tuple[int, float]]: Finds absorption maxima using 2nd derivatives. Returns indices of absorption maxima and relative heights.

to_dict() → Dict[str, Any]: Converts the data to a dictionary for serialization

static from_dict(data: Dict[str, Any]) → Compound: Creates a Compound object from a dictionary