Module classes
This module defines light-weight classes for common data.
Data2D
The Data2D class stores absorbance data at given times and wavelengths.
- class mocca2.classes.Data2D(time: ndarray[Any, dtype[_ScalarType_co]], wavelength: ndarray[Any, dtype[_ScalarType_co]], data: ndarray[Any, dtype[_ScalarType_co]])
2D chromatogram data
- time: ndarray[Any, dtype[_ScalarType_co]]
Time points at which data was sampled
- wavelength: ndarray[Any, dtype[_ScalarType_co]]
Wavelengths at which data was sampled
- data: ndarray[Any, dtype[_ScalarType_co]]
Absorbances at the given wavelengths and times, indexed as absorbance[wavelength, time]
- closest_time(time: float) Tuple[int, float]
Returns index and value of time point that is closest to specified time
- closest_wavelength(wavelength: float) Tuple[int, float]
Returns index and value of wavelength point that is closest to specified wavelength
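The lookup performed by closest_time and closest_wavelength can be illustrated with a minimal plain-Python sketch. The helper name closest_point is hypothetical and this is not the mocca2 implementation; it only shows the (index, value) return convention described above.

```python
from typing import Sequence, Tuple


def closest_point(axis: Sequence[float], target: float) -> Tuple[int, float]:
    """Return (index, value) of the axis entry nearest to target.

    Hypothetical helper for illustration only.
    """
    # Pick the index whose axis value has the smallest absolute distance.
    idx = min(range(len(axis)), key=lambda i: abs(axis[i] - target))
    return idx, axis[idx]


times = [0.0, 0.5, 1.0, 1.5, 2.0]
idx, value = closest_point(times, 1.2)
print(idx, value)
```

The returned index can then be used to slice the second axis of the data array, and the returned value tells you how far the actual sampling point is from the requested one.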
- extract_time(min_time: float | None, max_time: float | None, inplace: bool = False) Data2D
Extracts the data in the given time range
Parameters
- min_time: float | None
Start time of the extracted segment
- max_time: float | None
End time of the extracted segment
- inplace: bool
If True, modifies the data in-place and returns self
Returns
- Data2D
The data in the given time interval
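As a rough illustration of range extraction with optional bounds, the sketch below treats None as "no limit" on that side. The function name extract_time_range is hypothetical, the data is a nested list laid out as [wavelength][time], and the inclusive treatment of both bounds is an assumption rather than documented mocca2 behavior.

```python
from typing import List, Optional, Tuple


def extract_time_range(
    time: List[float],
    data: List[List[float]],
    min_time: Optional[float] = None,
    max_time: Optional[float] = None,
) -> Tuple[List[float], List[List[float]]]:
    """Keep only the time points inside [min_time, max_time].

    Hypothetical helper; bounds are assumed inclusive, None means unbounded.
    """
    keep = [
        i
        for i, t in enumerate(time)
        if (min_time is None or t >= min_time)
        and (max_time is None or t <= max_time)
    ]
    # Apply the same column selection to every wavelength row.
    return [time[i] for i in keep], [[row[i] for i in keep] for row in data]


time = [0.0, 1.0, 2.0, 3.0]
data = [[10, 11, 12, 13], [20, 21, 22, 23]]
sub_time, sub_data = extract_time_range(time, data, min_time=1.0)
print(sub_time, sub_data)
```

The same pattern applies to extract_wavelength, with the selection made along the first axis instead of the second.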
- extract_wavelength(min_wavelength: float | None, max_wavelength: float | None, inplace: bool = False) Data2D
Extracts the data in the given wavelength range
Parameters
- min_wavelength: float | None
Start wavelength of the extracted segment
- max_wavelength: float | None
End wavelength of the extracted segment
- inplace: bool
If True, modifies the data in-place and returns self
Returns
- Data2D
The data in the given wavelength interval
- check_same_sampling(*others: Data2D, tol: float = 0.001, time: bool = True, wavelength: bool = True) bool
Checks that the time sampling and wavelength sampling are the same as in all provided data
- interpolate_time(time: ndarray[Any, dtype[_ScalarType_co]], kind: str = 'linear', inplace: bool = False) Data2D
Interpolates the data to the given time points using specified interpolation, see scipy.interpolate.interp1d for details
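For kind='linear', resampling onto new time points amounts to piecewise-linear interpolation of each wavelength row. The sketch below is a dependency-free stand-in for that idea, not the scipy.interpolate.interp1d code path the library refers to. The helper name interp_linear is hypothetical, and the sketch assumes the new time points lie within the original range.

```python
from bisect import bisect_right
from typing import List, Sequence


def interp_linear(
    x: Sequence[float], y: Sequence[float], x_new: Sequence[float]
) -> List[float]:
    """Piecewise-linear interpolation of y(x) at the points x_new.

    Hypothetical helper; x must be sorted in ascending order.
    """
    out = []
    for xn in x_new:
        # Index of the right-hand neighbour, clamped to a valid segment.
        j = min(max(bisect_right(x, xn), 1), len(x) - 1)
        x0, x1 = x[j - 1], x[j]
        frac = (xn - x0) / (x1 - x0)
        out.append(y[j - 1] + frac * (y[j] - y[j - 1]))
    return out


old_time = [0.0, 1.0, 2.0]
new_time = [0.5, 1.5, 2.0]
data = [[0.0, 10.0, 40.0], [2.0, 2.0, 4.0]]  # laid out as [wavelength][time]
resampled = [interp_linear(old_time, row, new_time) for row in data]
print(resampled)
```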
- time_step() float
Returns the sampling step of the time axis
- wavelength_step() float
Returns the sampling step of the wavelength axis
- contract(method: Literal['mean', 'max', 'weighted_mean'] = 'mean', damping: float = 0.2) ndarray[Any, dtype[_ScalarType_co]]
Contracts the first dimension of 2D data to get 1D data for peak picking
Parameters
- method: Literal[‘mean’, ‘max’, ‘weighted_mean’]
The method that should be used for contraction. ‘weighted_mean’ weights the wavelengths by average std
- damping: float
damping factor for ‘weighted_mean’
Returns
Contracted 1D array
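The 'mean' and 'max' contractions can be sketched as a reduction over the wavelength axis, leaving one value per time point. This is a plain-Python illustration with a hypothetical function name (contract_sketch). The 'weighted_mean' method and its damping factor are left out because their exact weighting is not specified here.

```python
from typing import List, Sequence


def contract_sketch(data: Sequence[Sequence[float]], method: str = "mean") -> List[float]:
    """Collapse data[wavelength][time] to a 1D trace over time.

    Hypothetical helper covering only the 'mean' and 'max' methods.
    """
    # zip(*data) regroups the values so each tuple holds one time point
    # across all wavelengths.
    columns = list(zip(*data))
    if method == "mean":
        return [sum(col) / len(col) for col in columns]
    if method == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unsupported method: {method}")


data = [[0.0, 2.0, 4.0], [2.0, 4.0, 8.0]]
print(contract_sketch(data, "mean"))
print(contract_sketch(data, "max"))
```

The resulting 1D trace is what peak picking operates on.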
- to_dict() Dict[str, Any]
Converts the data to a dictionary for serialization
- plot(ax: Axes | None = None, color: str = 'k', label: str | None = None, zero_line: bool = False) Axes
Plots the data using matplotlib.pyplot.imshow
- plot_2d(ax: Axes | None = None, colormap: str = 'gist_ncar', colorbar: bool = True) Axes
Plots the heatmap for intensity against time and wavelength
Chromatogram
The Chromatogram class extends the Data2D class, adding more metadata.
- class mocca2.Chromatogram(sample: Data2D | str, blank: Data2D | str | None = None, name: str | None = None, interpolate_blank=False)
Information about single chromatogram, based on Data2D
- peaks: List[Peak | DeconvolvedPeak]
Peaks in the chromatogram
- name: str | None
Name of this chromatogram
- sample_path: str | None
Filename of the raw chromatogram file
- blank_path: str | None
Filename of the raw chromatogram file with blank
- correct_baseline(method: Literal['asls', 'arpls', 'flatfit'] = 'flatfit', smoothness: float = 1.0, p: float | None = None, tol: float = 1e-07, max_iter: int | None = None, smooth_wl: int | None = None) Chromatogram
Corrects the baseline using AsLS, arPLS or FlatFit algorithm
Parameters
- data: NDArray | Data2D
Data with shape [N] or [sample, N]
- method: Literal[‘asls’, ‘arpls’, ‘flatfit’]
Possible baseline estimation methods are AsLS, arPLS and FlatFit. FlatFit and AsLS work well with smooth data, arPLS works better with noisy data
- smoothness: float
size of smoothness penalty
- p: float | None
Asymmetry factor, different for AsLS and arPLS
- tol: float
maximum relative change of w for convergence
- max_iter: int | None
maximum number of iterations. If not specified, guessed automatically
- smooth_wl: int | None
if specified, applies Savitzky-Golay filter (order 2) across the wavelength axis with the given window size
Returns
- Chromatogram
Returns self
- find_peaks(contraction: Literal['mean', 'max', 'weighted_mean'] = 'mean', min_rel_height: float = 0.01, min_height: float = 10.0, width_at: float = 0.1, expand_borders: bool = True, merge_overlapping: bool = True, split_threshold: float | None = 0.05, min_elution_time: float | None = None, max_elution_time: float | None = None) Chromatogram
Finds all peaks in contracted data. Assumes that baseline is flat and centered around 0.
Parameters
- contraction: Literal[‘mean’, ‘max’, ‘weighted_mean’]
Contraction method to project 2D data into 1D
- min_rel_height: float
minimum relative prominence of the peaks (relative to highest peak)
- min_height: float
minimum prominence of the peaks
- width_at: float
the peak width will be measured at this fraction of peak height
- expand_borders: bool
if True, tries to find peak borders. Otherwise borders from scipy are returned
- merge_overlapping: bool
if True, also calls the merge_overlapping_peaks before returning the peaks
- split_threshold: float | None
maximum height of a minimum separating two peaks for splitting, relative to smaller of the peaks
- min_elution_time: float | None
if specified, peaks with maximum before min_elution_time are omitted
- max_elution_time: float | None
if specified, peaks with maximum after max_elution_time are omitted
Returns
- Chromatogram
Returns self
Description
The peaks are picked using scipy.signal.find_peaks and filtered based on min_rel_height and min_height.
If min_elution_time or max_elution_time are specified, peaks with maximum outside this range are omitted.
If expand_borders, the borders of the peaks are expanded down to baseline (up to estimated background noise).
If merge_overlapping, any overlapping peaks are merged. See merge_overlapping_peaks.
If split_threshold is provided, merged peaks with a sufficient minimum separating them are split. See split_peaks.
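The first two filtering steps described above can be approximated with a small dependency-free sketch. It is a simplified stand-in, not the library's implementation: it finds plain local maxima instead of calling scipy.signal.find_peaks, compares peak heights rather than prominences, and expresses the elution window as indices. The function name pick_peaks_sketch and the parameters min_idx and max_idx are hypothetical.

```python
from typing import List, Optional, Sequence


def pick_peaks_sketch(
    signal: Sequence[float],
    min_rel_height: float = 0.01,
    min_height: float = 10.0,
    min_idx: Optional[int] = None,
    max_idx: Optional[int] = None,
) -> List[int]:
    """Return indices of local maxima that pass height and window filters."""
    maxima = [
        i
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    ]
    if not maxima:
        return []
    tallest = max(signal[i] for i in maxima)
    # A peak must clear both the absolute and the relative threshold.
    threshold = max(min_height, min_rel_height * tallest)
    kept = []
    for i in maxima:
        if signal[i] < threshold:
            continue
        if min_idx is not None and i < min_idx:
            continue
        if max_idx is not None and i > max_idx:
            continue
        kept.append(i)
    return kept


signal = [0, 5, 0, 100, 20, 40, 0, 12, 0]
print(pick_peaks_sketch(signal, min_rel_height=0.2))
print(pick_peaks_sketch(signal, min_rel_height=0.2, max_idx=4))
```

Border expansion, merging and splitting are not covered by this sketch.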
- check_same_sampling(*others: Data2D, tol: float = 0.001, time: bool = True, wavelength: bool = True) bool
Checks that the time sampling and wavelength sampling are the same as in all provided data
- closest_time(time: float) Tuple[int, float]
Returns index and value of time point that is closest to specified time
- closest_wavelength(wavelength: float) Tuple[int, float]
Returns index and value of wavelength point that is closest to specified wavelength
- contract(method: Literal['mean', 'max', 'weighted_mean'] = 'mean', damping: float = 0.2) ndarray[Any, dtype[_ScalarType_co]]
Contracts the first dimension of 2D data to get 1D data for peak picking
Parameters
- method: Literal[‘mean’, ‘max’, ‘weighted_mean’]
The method that should be used for contraction. ‘weighted_mean’ weights the wavelengths by average std
- damping: float
damping factor for ‘weighted_mean’
Returns
Contracted 1D array
- deconvolve_peaks(model: PeakModel | Literal['BiGaussian', 'BiGaussianTailing', 'FraserSuzuki', 'Bemg'], min_r2: float, relaxe_concs: bool, max_comps: int) Chromatogram
Deconvolves peaks with increasingly more components until MSE limit is reached. See deconvolve_adaptive() for details.
Parameters
- model: PeakModel | Literal[‘BiGaussian’, ‘BiGaussianTailing’, ‘FraserSuzuki’, ‘Bemg’]
mathematical model used for fitting shapes of components of peaks
- min_r2: float
Minimum required R2 for deconvolution
- relaxe_concs: bool
If False, the fitted peak model functions are returned. Otherwise, the concentrations are refined with restricted least squares
- max_comps: int
Maximum number of components that can be fitted
Returns
- Chromatogram
Returns self
- extract_time(min_time: float | None, max_time: float | None, inplace: bool = False) Data2D
Extracts the data in the given time range
Parameters
- min_time: float | None
Start time of the extracted segment
- max_time: float | None
End time of the extracted segment
- inplace: bool
If True, modifies the data in-place and returns self
Returns
- Data2D
The data in the given time interval
- extract_wavelength(min_wavelength: float | None, max_wavelength: float | None, inplace: bool = False) Data2D
Extracts the data in the given wavelength range
Parameters
- min_wavelength: float | None
Start wavelength of the extracted segment
- max_wavelength: float | None
End wavelength of the extracted segment
- inplace: bool
If True, modifies the data in-place and returns self
Returns
- Data2D
The data in the given wavelength interval
- interpolate_time(time: ndarray[Any, dtype[_ScalarType_co]], kind: str = 'linear', inplace: bool = False) Data2D
Interpolates the data to the given time points using specified interpolation, see scipy.interpolate.interp1d for details
- plot_2d(ax: Axes | None = None, colormap: str = 'gist_ncar', colorbar: bool = True) Axes
Plots the heatmap for intensity against time and wavelength
- time_step() float
Returns the sampling step of the time axis
- wavelength_step() float
Returns the sampling step of the wavelength axis
- time: NDArray
Time points at which data was sampled
- wavelength: NDArray
Wavelengths at which data was sampled
- data: NDArray
Absorbances at the given wavelengths and times, indexed as absorbance[wavelength, time]
- all_components(sort_by: Callable[[Component], Any] | None = None) List[Component]
Returns all peak components from this chromatogram
- get_area_percent(wl_idx: int) Dict[int, float]
Returns area % of individual components. Only deconvolved peaks are considered.
Parameters
- wl_idx: int
Index of wavelength which will be used for calculating peak area
Returns
- Dict[int, float]
Area % of individual compounds [compound_id -> area %]
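Area percent is each compound's share of the total area, scaled to 100. The sketch below shows the arithmetic on a plain dictionary of areas keyed by compound ID. The function name area_percent and the example areas are hypothetical, and how the areas at the chosen wavelength are obtained is outside the scope of the sketch.

```python
from typing import Dict


def area_percent(areas: Dict[int, float]) -> Dict[int, float]:
    """Convert absolute areas to percentages of the total area."""
    total = sum(areas.values())
    return {cid: 100.0 * area / total for cid, area in areas.items()}


# Hypothetical areas at one wavelength, keyed by compound ID.
areas = {1: 30.0, 2: 10.0}
print(area_percent(areas))
```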
- get_integrals() Dict[int, float]
Returns integrals of individual components [compound_id -> integral]. Only deconvolved peaks are considered.
- get_relative_integrals(relative_to: int) Dict[int, float]
Returns integrals of individual components relative to the specified compound. If reference compound is not present, returns empty dictionary. Only deconvolved peaks are considered.
Parameters
- relative_to: int
All integrals will be divided by integral of compound with this ID
Returns
- Dict[int, float]
Relative integrals of individual compounds [compound_id -> relative integral]
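The relative-integral calculation, including the empty result when the reference compound is absent, can be sketched as follows. The function name relative_integrals and the example values are hypothetical. The sketch works on a plain dictionary of integrals rather than on a Chromatogram.

```python
from typing import Dict


def relative_integrals(integrals: Dict[int, float], relative_to: int) -> Dict[int, float]:
    """Divide every integral by the integral of the reference compound."""
    if relative_to not in integrals:
        # Mirrors the documented behavior: no reference, no result.
        return {}
    reference = integrals[relative_to]
    return {cid: value / reference for cid, value in integrals.items()}


# Hypothetical integrals keyed by compound ID.
integrals = {1: 30.0, 2: 10.0}
print(relative_integrals(integrals, relative_to=2))
print(relative_integrals(integrals, relative_to=5))
```

With an internal standard as the reference compound, the result is independent of injection volume, which is the usual reason for working with relative integrals.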
- refine_peaks(compounds: Dict[int, Compound], model: PeakModel | Literal['BiGaussian', 'BiGaussianTailing', 'FraserSuzuki', 'Bemg'], relaxe_concs: bool, min_rel_integral: float) Chromatogram
Refines the concentration profiles using averaged spectra of the compounds.
Removes components with insufficient integrals.
- to_dict() Dict[str, Any]
Converts the data to a dictionary for serialization
- static from_dict(data: Dict[str, Any]) Chromatogram
Creates a Chromatogram object from a dictionary
- plot(ax: Axes | None = None, color: str = 'k', label: str | None = None, plot_peaks: bool = True, zero_line: bool = False) Axes
Plots the data using matplotlib.pyplot.imshow
Peak
The Peak class stores information about a single peak or multiple overlapping peaks.
- class mocca2.classes.Peak(left: int, right: int, maximum: int, height: float, prominence: float, all_maxima: List[int] | None = None)
Information about single peak
- left: int
Index of peak start
- right: int
Index of peak end
- maximum: int
Index of peak maximum
- height: float
Absolute height of the peak
- prominence: float
Height of the peak from the base
- all_maxima: List[int]
Indices of all maxima of the merged peaks
- data(data: ndarray[Any, dtype[_ScalarType_co]] | Data2D) ndarray[Any, dtype[_ScalarType_co]]
Returns part of the 2D data that contains this peak
- time(data: ndarray[Any, dtype[_ScalarType_co]] | Data2D) ndarray[Any, dtype[_ScalarType_co]]
Returns part of the timescale that contains this peak
- to_dict() Dict[str, Any]
Converts the data to a dictionary for serialization
DeconvolvedPeak
The DeconvolvedPeak class extends the Peak class, containing information about all deconvolved components of the peak.
- class mocca2.classes.DeconvolvedPeak(peak: Peak, concentrations: ndarray[Any, dtype[_ScalarType_co]], spectra: ndarray[Any, dtype[_ScalarType_co]], residual_mse: float, r2: float, resolved: bool)
Information about peak and its deconvolved components
- data(data: ndarray[Any, dtype[_ScalarType_co]] | Data2D) ndarray[Any, dtype[_ScalarType_co]]
Returns part of the 2D data that contains this peak
- time(data: ndarray[Any, dtype[_ScalarType_co]] | Data2D) ndarray[Any, dtype[_ScalarType_co]]
Returns part of the timescale that contains this peak
- left: int
Index of peak start
- right: int
Index of peak end
- maximum: int
Index of peak maximum
- height: float
Absolute height of the peak
- prominence: float
Height of the peak from the base
- all_maxima: List[int]
Indices of all maxima of the merged peaks
- residual_mse: float
Residual MSE after deconvolution
- r2: float
R2 after deconvolution
- resolved: bool
Specifies whether the deconvolution sufficiently explains the peak
- merge_same_components() None
Merges all the components with identical ID (not None).
Concentrations are added, spectra are averaged, weighted by concentration integral.
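The merging rule can be illustrated on plain dictionaries: concentration profiles of components sharing an ID are summed point by point, and their spectra are averaged with each component's concentration integral as the weight. The dictionary layout and the function name merge_components_sketch are hypothetical; the real method operates on Component objects in place and returns None.

```python
from typing import Any, Dict, List


def merge_components_sketch(components: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge components with identical (non-None) compound_id."""
    groups: Dict[int, List[Dict[str, Any]]] = {}
    unassigned = []
    for comp in components:
        cid = comp["compound_id"]
        if cid is None:
            unassigned.append(comp)  # components without ID are never merged
        else:
            groups.setdefault(cid, []).append(comp)

    merged = []
    for cid, group in groups.items():
        # Weight of each component is the integral of its concentration.
        weights = [sum(c["concentration"]) for c in group]
        total = sum(weights)
        concentration = [sum(v) for v in zip(*(c["concentration"] for c in group))]
        spectrum = [
            sum(w * s for w, s in zip(weights, vals)) / total
            for vals in zip(*(c["spectrum"] for c in group))
        ]
        merged.append(
            {"compound_id": cid, "concentration": concentration, "spectrum": spectrum}
        )
    return merged + unassigned


components = [
    {"compound_id": 7, "concentration": [1.0, 2.0, 1.0], "spectrum": [1.0, 3.0]},
    {"compound_id": 7, "concentration": [0.0, 1.0, 1.0], "spectrum": [4.0, 0.0]},
    {"compound_id": None, "concentration": [0.0, 0.0, 1.0], "spectrum": [1.0, 1.0]},
]
result = merge_components_sketch(components)
print(result[0])
```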
- to_dict() Dict[str, Any]
Converts the data to a dictionary for serialization
- static from_dict(data: Dict[str, Any]) DeconvolvedPeak
Creates a DeconvolvedPeak object from a dictionary
Component
The Component class stores information about a single peak component, usually a pure compound.
- class mocca2.classes.Component(concentration: ndarray[Any, dtype[_ScalarType_co]], spectrum: ndarray[Any, dtype[_ScalarType_co]], time_offset: int = 0, peak_fraction: float = 1.0, compound_id: int | None = None)
Information about single deconvolved component of a peak
- concentration: ndarray[Any, dtype[_ScalarType_co]]
Concentration profile in the selected range
- spectrum: ndarray[Any, dtype[_ScalarType_co]]
Spectrum of the component. Normalized such that mean = 1
- elution_time: int
Index of time point with maximum concentration
- integral: float
Integral (sum of individual time points) of the concentration of this component
- compound_id: int | None
ID of compound, if assigned
- peak_fraction: float
Fraction of the peak area that this component represents
- get_area(wl_idx: int) float
Returns peak area at given wavelength (specified by index)
- to_dict() Dict[str, Any]
Converts the data to a dictionary for serialization
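The relationships between the Component attributes can be illustrated with plain lists. In this sketch the spectrum is normalized to mean = 1, the integral is the sum of the concentration profile, and the elution index is the position of the concentration maximum. The final line, which computes an area at one wavelength as integral times the normalized spectrum value, is an assumption about what get_area returns and is not confirmed by this reference. All names are hypothetical.

```python
from typing import List, Sequence


def normalize_spectrum(raw: Sequence[float]) -> List[float]:
    """Scale a spectrum so that its mean equals 1."""
    mean = sum(raw) / len(raw)
    return [value / mean for value in raw]


concentration = [0.0, 1.0, 3.0, 1.0, 0.0]
spectrum = normalize_spectrum([2.0, 4.0, 6.0])

# Integral: sum of the individual time points of the concentration profile.
integral = sum(concentration)

# Index of the time point with maximum concentration (within this profile).
peak_index = max(range(len(concentration)), key=concentration.__getitem__)

# ASSUMPTION: area at a wavelength index = integral * normalized absorbance.
wl_idx = 2
area_at_wl = integral * spectrum[wl_idx]

print(spectrum, integral, peak_index, area_at_wl)
```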
Compound
The Compound class stores information about a chemical compound.
- class mocca2.classes.Compound(elution_time: int, spectrum: ndarray[Any, dtype[_ScalarType_co]], name: str | None = None, concentration_factor: float | None = None, concentration_factor_vs_istd: float | None = None)
Information about single chemical compound
- elution_time: int
Index of the elution time on the time scale
- spectrum: ndarray[Any, dtype[_ScalarType_co]]
Absorption spectrum of the compound, normalized to mean = 1
- name: str | None
Name of the compound
- concentration_factor: float | None
Conversion factor to get absolute concentration, such that concentration = integral * concentration_factor
- concentration_factor_vs_istd: float | None
Conversion factor to get absolute concentration relative to ISTD, such that rel_conc = istd_conc * concentration_factor_vs_istd
- absorption_maxima() List[Tuple[int, float]]
Finds absorption maxima using 2nd derivatives. Returns indices of absorption maxima and relative heights.
- to_dict() Dict[str, Any]
Converts the data to a dictionary for serialization