Baseline Correction
Baseline correction is often overlooked, but it is crucial for accurate peak modelling and integration.
First, let's import MOCCA2.
from mocca2 import example_data, estimate_baseline
from matplotlib import pyplot as plt
The best way to correct the baseline is to subtract a blank run and then refine the remaining baseline.
# Load example chromatogram
chromatogram = example_data.example_1(substract_blank=True)
# Refine the baseline
chromatogram.correct_baseline()
# For comparison: blank subtracted, but baseline not refined
chromatogram_no_baseline = example_data.example_1(substract_blank=True)
# For comparison: raw data without blank subtraction
chromatogram_no_blank = example_data.example_1(substract_blank=False)
# Plot the chromatogram with corrected baseline
fig, ax = plt.subplots(figsize=(8, 5))
chromatogram_no_blank.plot(ax, color="green", label="No blank subtraction")
chromatogram_no_baseline.plot(ax, color="red", label="No baseline correction")
chromatogram.plot(ax, label="Corrected")
plt.legend()
# plt.savefig("docs/_static/ex_baseline_corrected.svg")
plt.show()
We can also look at different algorithms for baseline estimation. Let's pretend we don't have the blank run, so that we can compare the estimated baselines against it.
# Load example chromatogram
chromatogram = example_data.example_1(substract_blank=False)
# To make things faster, let's average the absorbance over all wavelengths
mean_absorbance = chromatogram.contract()
# Estimate baseline using different methods
baseline_asls = estimate_baseline(mean_absorbance, method="asls")
baseline_arpls = estimate_baseline(mean_absorbance, method="arpls")
baseline_flatfit = estimate_baseline(mean_absorbance, method="flatfit")
# Plot the result
fig, ax = plt.subplots(figsize=(8, 5))
chromatogram.plot(ax, label="Original")
ax.plot(chromatogram.time, baseline_asls, label="AsLS")
ax.plot(chromatogram.time, baseline_arpls, label="arPLS")
ax.plot(chromatogram.time, baseline_flatfit, label="FlatFit")
ax.set_ylim(-30, 75)
plt.legend()
# plt.savefig("docs/_static/ex_baseline_comparison.svg")
plt.show()
A few remarks:
- The blank has slightly lower intensity than the sample baseline, so it is worth refining the baseline even after the blank has been subtracted.
- AsLS significantly underestimates the baseline in the 0.5 - 1.5 minute region.
- The 0 - 0.5 minute region is hard to interpret because it contains positive and negative peaks caused by the solvents. No general baseline correction algorithm can be expected to correct these.
- The region under the peak at around 1.4 - 1.8 minutes shows how differently the methods treat the baseline under peaks. FlatFit seems to capture the bend in the baseline best.
For data that is not very noisy, I would recommend FlatFit. It is a very fast and stable algorithm that can also handle negative peaks.
The description of individual methods is in the baseline reference.
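To give a feel for what a penalized least squares estimator such as AsLS does, here is a minimal, self-contained sketch of the textbook asymmetric least squares scheme, using only NumPy. This is not MOCCA2's implementation: the function name `asls_baseline`, the parameters `lam`, `p` and `n_iter`, their default values, and the synthetic signal are all assumptions made for illustration.

```python
import numpy as np


def asls_baseline(y, lam=1e4, p=0.01, n_iter=10):
    """Illustrative asymmetric least squares baseline (not MOCCA2's code).

    lam   -- smoothness penalty; larger values give a stiffer baseline
    p     -- weight for points above the baseline (the presumed peaks)
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator, shape (n - 2, n)
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * D.T @ D
    w = np.ones(n)
    for _ in range(n_iter):
        # Weighted, smoothness-penalized least squares fit
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the fit get the small weight p, the rest 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z


# Synthetic test signal: linear drift plus a single Gaussian peak
t = np.linspace(0, 5, 300)
true_baseline = 5 + 2 * t
peak = 50 * np.exp(-((t - 2.5) ** 2) / (2 * 0.05**2))
signal = true_baseline + peak

baseline = asls_baseline(signal)
```

Because points above the current fit are weighted by the small value `p`, the fit is pulled towards the lower envelope of the signal. This is also why an estimator of this kind can underestimate the baseline, and why it has trouble with negative peaks. The dense solve is only practical for short signals; sparse matrices would be the usual choice for longer ones.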