November 26, 2025

Lower limits of detection or quantification

Lower limits of detection or quantification
Nikolas Siccha
Computational Scientist
Lower limits of detection or quantification

In several of our recent projects, but also generally throughout the field of pharmacometrics, it is common to encounter observations that are censored from below, usually due to physical limitations of the assay used to perform the measurement. Generally, there is a difference between the (lower) limit of detection (LOD) and the lower limit of quantification (LLOQ), with the former being the lowest amount/concentration of a substance which can be distinguished from the complete absence of that substance, and the latter being the lowest amount/concentration for which there can be reasonable certainty about the accuracy of the measurement. Both the LOD and the LLOQ are generally reported as single numbers, estimated during assay development or validation using statistical methods, which naturally come with their own modelling choices and uncertainties.

Furthermore, in addition to the lower limit of quantification, there will usually also be a corresponding upper limit of quantification (ULOQ), i.e. the highest amount/concentration for which we can be reasonably confident about the measurement. In practice (and neglecting potential matrix effects of the used dilutant) the ULOQ has less of a significance than the LLOQ, because it is (relatively) easy to dilute a sample such that the to be measured concentration falls below the ULOQ, but it is significantly harder (or impossible) to concentrate a sample such that the concentration rises above the LLOQ. Finally, while the LLOQ is generally lab and assay specific, it can still vary across measurements taken via the same assay, if e.g. the blood sample taken did not have a sufficiently large volume and had to be diluted to be analysed at all.

In an ideal world, i.e. when using a fully joint Bayesian model, one would model both the assay characteristics and the potential effects of e.g. dilution on these characteristics. However, the underlying data are sometimes easy, sometimes hard to obtain. And, in any case, for the point I want to make in this blog post, these real world complications are less relevant. That point is the following, and it should actually be entirely non-controversial (e.g. due to Beal (2001) [1] or Bergstrand and Karlsson (2009)) [2]:

The key takeaway

Do not simply replace the measurements below the LLOQ with arbitrary values (e.g. 0 or LLOQ/2), unless you want to bias your estimates, and also do not just drop them, unless you want to throw away potentially valuable information.

Of course, the amount of bias you introduce by splicing in arbitrary values into your data and treating them as real, or the importance of the information you lose by leaving out part of your data, will depend on your data and model. Still, it should be clear that there is simply no reason to risk it, especially as there are straightforward and routine ways to handle censored data.

A representative, (semi-)realistic toy model

To illustrate the above point, I will simulate from and fit a simple, parametric hierarchical dose-response model using Julia (Bezanson et al. (2017)) [3] and Stan (Carpenter et al. (2017)) [4], where the observation model allows for censored values. I will try to generate somewhat realistic, but not overly complex, data - meaning that the full pipeline from data generation to visualization of results should allow me to go get a coffee from my own kitchen, but not much more. The parameters of the dataset are the following:

  • There will be 5 different doses (1, 2, 4, 8, and 16), with successively doubling amounts,
  • there will be no placebo group or effect,
  • within each group there will be 11 different subjects from the same population and without any measured covariates,
  • for every subject we will simulate 10 measurements, placed equidistantly and starting just before dosing, i.e. with the first observation measuring the subject-level baseline,
  • every observation will follow the same simple proportional normal measurement model with values below the LLOQ censored,
  • the LLOQ will be at 0.1, which is 10% of the population baseline,
  • the dose-response relationship will be logistic with a fixed (i.e. shared across the population) intercept and slope, multiplied with a subject-level multiplier,
  • the time-dependent shape of the response will be described by the probability density function of a Gamma distribution with subject-level shape (constrained to be larger than 1) and rate parameters, chosen because it allows for non-trivial shapes and goes to zero if the time goes to zero or positive infinity,
  • every subject will have their own
    • constant baseline (pre-dosing and after washout),
    • drug effect multiplier, and
    • shape and rate parameters.

The dose-response model

Figure 1 visualizes the prior distribution (left) of the subject-level dose-response relationship and the realized true (right) subject-level dose-response relationship for the simulated data. In the simulated data realization, there is some variation due to the subject-level drug effect multipliers, but it is small relative to the overall increase in response due to dose. Do note the different response scales of the two subplots. In this plot and all upcoming plots, the shaded areas correspond to different central credible intervals (CrI) of the plotted quantity, specifically the 95%, 90%, 80% and 50% CrIs.

Figure 1: Prior (left) and simulated truth (right) subject-level dose-response relationship.

The simulated latent response

Figure 2 visualizes the simulated latent concentration trajectories of our quantity of interest for the middle dose group. Every curved line represents the response for one patient in this dose group. Dosing happens at time = 0. The horizontal red line represents the LLOQ at 0.1 and the horizontal purple line represents LLOQ/2.

Figure 2: The simulated latent concentration trajectories for the middle dose group. The horziontal red line represents the LLOQ at 0.1, and the horizontal purple line represents LLOQ/2.

The censored observations

Table 1 summarizes the key characteristics of the simulated observations generated for the different dose groups. The fraction of censored observations rises from 7% for the lowest dose to 80% for the highest dose, with an overall fraction of 47% of the observations censored.

Table 1: Dose group characteristics.
dose # patients # observations # censored observations
1 11 110 8 (7%)
2 11 110 29 (26%)
4 11 110 57 (51%)
8 11 110 81 (73%)
16 11 110 88 (80%)
all 55 550 263 (47%)

Figure 11 (at the very bottom of this post) visualizes all subject-level concentration trajectories and observations, stratified by dose.

Comparing the different options (M1, M3, and M5)

The three different methods of handling censored observations (“BQL observations”) that we will be comparing are commonly known as M1, M3, and M5, going back to Beal (2001) [1]. These methods are, in short,

  • M1 (ignore): ignore all BQL observations - this is an easy to implement option, which is known to potentially introduce a generally small bias due to survivorship bias.
  • M3 (full): treat BQL observations “properly” (in my opinion) - the likelihood for BQL observations is simply the probability mass below the LLOQ of the observation model (conditional on the underlying latent state and the observation noise scale),
  • M5 (half): replace all BQL observations by LLOQ / 2 - this is again easy to implement, but known to potentially introduce a large bias.

Using our standard analysis pipeline, implementing M3 for arbitrary likelihoods is of course not significantly more difficult than implementing M1 or M5. Throughout the rest of this post, M1 will generally be represented using orange, M3 will be blue, M5 will be purple, and the underlying known truth will be black. All upcoming visualizations use full Bayesian inference, i.e. are generated from draws from the different posteriors obtained via Markov Chain Monte Carlo.

Observing only some of the data

Obviously, if no observations are censored at all, all considered methods will be equivalent. Similarly, the difference between the methods will be more significant if there is a larger fraction of censored observations. For this first part, we will only condition on observations from the first/lowest dose group. With only 8 out of 110 observations censored, we do not expect large differences between the methods, and this is indeed what we observe below.

The inferred latent states

Figure 3 visualizes the different inferred latent concentration trajectories for dose group 1 for the three different methods. Based on the plots of the CrIs of these trajectories, the three methods are in broad agreement. If any method seems to have an advantage, it would be M5, as its visualized CrIs are the only ones which overlap with the deepest trough in this dose group. M5 appearing to slightly outperform the other methods is probably because the replacement of the few BQL observations by LLOQ / 2 “encourages” the posterior to be compatible with stronger responses, relative to the other methods.

M1 and M3 not covering these deeper troughs as well is probably due to the always present compromise between the prior of a model and the likelihood of the observations given that model - the observations conditioned on so far simply do not contain enough information to move the posterior further away from the prior. This is also compatible with the posterior for M1, which is missing the 8 censored observations, performing “worse” than M3 - because the posterior of M1 remains closer to the prior.

Figure 3: The inferred latent concentration trajectories for dose group 1 for the different methods (observed dose groups: 1). Black curved lines are the true subject-level trajectories, dots represent the generated observations with censored observations highlighted in red, the horziontal red line represents the LLOQ at 0.1, and the horizontal purple line represents LLOQ/2. Shaded areas represent 95%, 90%, 80% and 50% central CrIs.
Q-Q plots

A similar picture emerges when comparing Q-Q plots for the different methods, as visualized in Figure 4, though any potential advantage of M5 seems to be less apparent. In this plot, as well as the upcoming Figure 9, grey dots represent observations which were not conditioned on. As in Figure 3, the methods appear to be in broad agreement.

Figure 4: Q-Q plot comparing the true latent concentration with the inferred ones for the different methods (observed dose groups: 1). Grey dots represent observations which were not conditioned on, and the red and purple straight lines represent LLOQ and LLOQ/2 respectively.

The inferred dose-response relationship

Figure 5 visualizes the inferred dose-response relationship, comparing the underlying truth (leftmost subplot) with the posteriors due to the different methods. As before, the methods appear to be in broad agreement, with M3 maybe providing tighter estimates than M1, and M5 maybe being shifted downwards slightly compared to M1 and M3.

Figure 5: The inferred dose-response relationship for the different methods (observed dose groups: 1).

The inferred change from baseline by dose

A common quantity of interest is the (average) relative maximal change from baseline (usually out-of-sample), which incorporates the subject-specific baselines. Figure 6 compares the in-sample relative maximal change from baseline, again comparing the underlying truth (leftmost subplot) with the posteriors due to the different methods. As before, the methods appear to be in broad agreement, with this time no immediately obvious difference between the methods.

Figure 6: In-sample relative maximal change from baseline for the different methods (observed dose groups: 1).

Observing all data

The previous broad agreement between the methods disappears once we condition on all of the data - which is a) to be expected as the fraction of BQL observations rises dramatically, but also b) by construction, otherwise this blog post would be kind of pointless.

The inferred change from baseline by dose

Figure 7 is the analogue to Figure 6, with the difference between the two figures being the amount of data we condition on (only data from the lowest dose group in the latter, all data in the former). Here, there is a clear difference between the three methods, with the posterior due to M3 being closest to the ground truth, the posterior due to M1 being similar to the one due to M3 if wider, and the posterior due to M5 clearly being biased.

Figure 7: In-sample relative maximal change from baseline for the different methods (observed dose groups: 1, 2, 4, 8, and 16).

The inferred dose-response relationship

Figure 8 is the analogue to Figure 5, with the difference between the methods being less pronounced than in Figure 7, but still visible.

Figure 8: The inferred dose-response relationship for the different methods (observed dose groups: 1, 2, 4, 8, and 16).

QQ-plots

Figure 9 is the analogue to Figure 4, with the difference between the methods being extremely obvious - M3 is clearly very well calibrated, M1 has not moved far enough away from the prior yet due to the omission of a large fraction of observations, and M5 is clearly biased due to the replacement of the BQL observations with LLOQ / 2. Again, as previously in Figure 4, grey dots represent observations which were not conditioned on.

Figure 9: Q-Q plot comparing the true latent concentration with the inferred ones for the different methods (observed dose groups: 1, 2, 4, 8, and 16). Grey dots represent observations which were not conditioned on, and the red and purple straight lines represent LLOQ and LLOQ/2 respectively.

The inferred latent states

Finally, Figure 10 visualizes the different inferred latent concentration trajectories for dose group 16 for the three different methods, when conditioning on all data. As in the previous plots in this section, the bias introduced by both M1 and M5 is clearly visible, with the only method “perfectly” calibrated being M3.

Figure 10: The inferred latent concentration trajectories for dose group 16 for the different methods (observed dose groups: 1, 2, 4, 8, and 16). Black curved lines are the true subject-level trajectories, dots represent the generated observations with censored observations highlighted in red, the horziontal red line represents the LLOQ at 0.1, and the horizontal purple line represents LLOQ/2. Shaded areas represent 95%, 90%, 80% and 50% central CrIs.

Summary and caveats

As was probably obvious already from the subtitle, we have been able to observe how mishandling censored observations can introduce significant biases into our statistical methods. Doing “proper” joint modelling of all data should generally be the best way to make inferences and the downstream decisions dependent on the statistical analysis - at least it has been in this case study.

The big caveat is of course model misspecification. For this case study, the data generating process and the model used to fit the generated data have been identical, yielding the wonderfully calibrated posterior predictions visible e.g. in Figure 10. Generally, and especially for these kinds of settings where you have to essentially extrapolate into unobserved regions, the importance of the prior and the model cannot be overstated. Development of appropriate models and priors will generally involve iterative model development, critique, abandonment and extension.

Finally, to generate this blog post with (in my opinion) aesthetically pleasing and statistically interesting figures, some iteration and data and model selection has been necessary. Bayesian inference is hard!

Appendix

Figure 11 visualizes all subject-level trajectories (curved lines) and the corresponding generated observations (dots), stratified by dose, with censored observations highlighted in red. The horizontal red line again represents the LLOQ at 0.1 and the horizontal purple line represents LLOQ/2

Figure 11: Subject level concentration trajectories and observations, stratified by dose. Black curved lines are the true subject-level trajectories, dots represent the generated observations with censored observations highlighted in red, the horziontal red line represents the LLOQ at 0.1, and the horizontal purple line represents LLOQ/2.

References

[1] Beal, Stuart L. 2001. “Ways to Fit a PK Model with Some Data Below the Quantification Limit.” Journal of Pharmacokinetics and Pharmacodynamics 28 (5): 481–504. https://doi.org/10.1023/A:1012299115260.

[2] Bergstrand, Martin, and Mats O. Karlsson. 2009. “Handling Data Below the Limit of Quantification in Mixed Effect Models.” The AAPS Journal 11 (2): 371–80. https://doi.org/10.1208/s12248-009-9112-5.

[3] Bezanson, Jeff, Alan Edelman, Stefan Karpinski, and Viral B. Shah. 2017. “Julia: A Fresh Approach to Numerical Computing.” SIAM Review 59 (1): 65–98. https://doi.org/10.1137/141000671.

[4] Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan : A Probabilistic Programming Language.” Journal of Statistical Software 76 (1). https://doi.org/10.18637/jss.v076.i01.

Check other articles