The choice of prediction curve method and its effect on the estimated amount of DNA
Date
2024
Authors
Magee, Morgan
Abstract
Water samples containing environmental DNA (eDNA) were collected from multiple rivers where
oolichan fish (Thaleichthys pacificus) are known to spawn. Each sample was split into 8 technical
replicates and analyzed using quantitative real-time polymerase chain reaction (qPCR). In qPCR,
the amount of amplified DNA is measured in real time at the end of each heating and cooling
cycle. A cycle threshold (CT) value, the cycle at which the amplification signal crosses a set
detection threshold, was recorded for each replicate, or the replicate was recorded as N/A (not
available) if no DNA was detected. Data came from two labs, Bureau Veritas Lab (BVL) and the
University of Victoria (UVic), each of which produced a gblock data set and a field data set, for
four data sets in total. Each data set was analyzed with two assays, eTHPA2 and eTHPA6.
The gblock data consist of gblock samples, synthetically constructed genes of known
concentration (copy number), measured using qPCR. The field data consist of samples taken from
river sites in British Columbia where eDNA occurs naturally; the copy numbers of the field samples
are unknown. The field samples were analyzed by qPCR to determine the CT value of each technical
replicate. Each data set was split into two subsets, full detect and partial detect, resulting in
eight working data sets. The full-detect subsets contain samples with detections in all technical
replicates (8/8); the partial-detect subsets contain samples with fewer than 8/8 detects. For the
partial-detect data, a Binomial model was defined for the proportion of detects in a sample, where
a replicate with a CT value counts as an "event" and an N/A does not. Assuming the number of
target molecules in a sample follows a Poisson distribution with mean $\lambda$, a replicate is a
non-detect only when it contains no molecules, which happens with probability $e^{-\lambda}$, so the
detection probability is $p = 1 - e^{-\lambda}$. Solving for $\lambda$ gives the estimate
$\hat{\lambda} = -\ln(1 - \hat{p})$, where $\hat{p}$ is the estimated proportion of detects from the
Binomial model. This estimate is finite only when $\hat{p} < 1$, that is, for partial-detect samples.
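The full/partial split and the $\hat{\lambda}$ estimator can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes each sample is stored as a list of its replicate CT values with `None` standing in for N/A, and the function names, sample IDs, and CT numbers are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical representation: one sample = its technical replicates,
# each a CT value (float) or None for an N/A non-detect.
Replicates = Sequence[Optional[float]]


def detect_count(replicates: Replicates) -> int:
    """Number of replicates with a CT value (the Binomial 'events')."""
    return sum(ct is not None for ct in replicates)


def is_full_detect(replicates: Replicates) -> bool:
    """True when every replicate was detected (8/8 for 8 replicates)."""
    return detect_count(replicates) == len(replicates)


def lambda_hat(replicates: Replicates) -> float:
    """Poisson mean estimate lambda_hat = -ln(1 - p_hat).

    p_hat is the observed proportion of detects (the Binomial MLE).
    The estimate is infinite when p_hat == 1, i.e. for full-detect samples.
    """
    p_hat = detect_count(replicates) / len(replicates)
    if p_hat >= 1.0:
        return math.inf
    # log1p(-p) computes ln(1 - p) accurately, even for small p
    return -math.log1p(-p_hat)


def split_full_partial(
    samples: Dict[str, Replicates],
) -> Tuple[Dict[str, Replicates], Dict[str, Replicates]]:
    """Split sample_id -> replicates into full-detect and partial-detect subsets."""
    full = {k: v for k, v in samples.items() if is_full_detect(v)}
    partial = {k: v for k, v in samples.items() if not is_full_detect(v)}
    return full, partial


# Illustrative, made-up CT values
samples = {
    "site_A": [25.1, None, 26.0, 24.8, None, 25.5, None, 25.9],  # 5/8 detects
    "site_B": [22.3, 22.1, 22.6, 22.4, 22.0, 22.5, 22.2, 22.7],  # 8/8 detects
}
full, partial = split_full_partial(samples)
for sample_id, reps in partial.items():
    print(sample_id, round(lambda_hat(reps), 3))  # prints: site_A 0.981
```

For the 5/8 sample, $\hat{p} = 0.625$ and $\hat{\lambda} = -\ln(0.375) \approx 0.981$. Note that a 0/8 sample also lands in the partial-detect subset and gets $\hat{\lambda} = 0$.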
Standard (calibration) curves and prediction curves were built from the gblock data. Standard
curves, fitted to gblock samples of known copy number, relate CT and $\hat{\lambda}$ values to copy
number; they were used to estimate the copy numbers of samples with unknown copy number from their
CT or $\hat{\lambda}$ values. Prediction curves were used to estimate eTHPA6 CT or $\hat{\lambda}$
values from eTHPA2 CT or $\hat{\lambda}$ values. Four prediction curves were fitted to each gblock
data set: unweighted and weighted least squares regression, and unweighted and weighted orthogonal
(Deming) regression. Plots and model summaries of the four prediction curves were analyzed for each
data set. Based on this analysis and on recommendations in the literature, weighted orthogonal
regression was chosen as the best prediction model for each gblock data set. The prediction
curves were then applied to the corresponding field data to investigate how well the models
predict eTHPA6 values from eTHPA2 values. In every data set, the majority of final eTHPA6 copy
number values were well predicted, which indicated that the weighted Deming model was a good
prediction method.
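The prediction step, from an eTHPA2 CT to a predicted eTHPA6 CT to an eTHPA6 copy number, can be sketched as follows. This is a minimal Python sketch under assumptions not stated in the study: the standard curve is taken to be the common linear fit of CT on log10(copy number); the weighted Deming fit uses fixed, caller-supplied weights and the closed-form solution (some published weighted Deming procedures re-estimate the weights iteratively, which this sketch does not do); only the CT path is shown, not the $\hat{\lambda}$ path. All function names, weights, and numbers are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Line = Tuple[float, float]  # (intercept a, slope b) of y = a + b*x


def least_squares(x: Sequence[float], y: Sequence[float]) -> Line:
    """Ordinary least squares fit of y = a + b*x (error assumed in y only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def weighted_deming(
    x: Sequence[float],
    y: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    delta: float = 1.0,
) -> Line:
    """Closed-form Deming fit of y = a + b*x with fixed per-point weights.

    delta = var(error in y) / var(error in x); delta = 1 is orthogonal
    regression. weights=None (all equal) gives the unweighted fit.
    Assumes the weighted covariance sxy is nonzero.
    """
    w = [1.0] * len(x) if weights is None else list(weights)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    d = syy - delta * sxx
    b = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    return my - b * mx, b


def predict_ethpa6_copies(
    ct_ethpa2: float, prediction: Line, std_curve_ethpa6: Line
) -> float:
    """eTHPA2 CT -> predicted eTHPA6 CT -> eTHPA6 copy number.

    std_curve_ethpa6 is (a6, b6) in CT = a6 + b6*log10(copies), inverted here.
    """
    a, b = prediction
    ct_ethpa6 = a + b * ct_ethpa2
    a6, b6 = std_curve_ethpa6
    return 10 ** ((ct_ethpa6 - a6) / b6)


# Illustrative, made-up gblock data: log10(copies) vs eTHPA6 CT
std6 = least_squares([0, 1, 2, 3], [40.0, 36.7, 33.4, 30.1])
# Illustrative paired gblock CTs; the 1/CT^2 weights are a hypothetical choice
ct2 = [24.0, 27.5, 31.0, 34.2]
ct6 = [24.6, 28.0, 31.9, 35.0]
fit = weighted_deming(ct2, ct6, weights=[1 / c**2 for c in ct2])
print(predict_ethpa6_copies(30.0, fit, std6))
```

Least squares treats eTHPA2 as error-free, whereas Deming (orthogonal) regression allows measurement error in both assays, which is why the two families of fits can give different prediction curves on the same gblock data. As `delta` grows large, the Deming slope approaches the least squares slope.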
The purpose of this study was to determine the best statistical methods for eDNA assay
prediction for biologists and other researchers to use. Using the methods validated in this
study, researchers can link oolichan population estimates made with the older and newer assays,
draw conclusions about the health of the population, develop plans to protect it against
overharvesting, and carry out further conservation work.
Keywords
Environmental DNA