The choice of prediction curve method and its effect on the estimated amount of DNA

Date

2024

Authors

Magee, Morgan

Abstract

Water samples containing environmental DNA (eDNA) were collected from multiple rivers where oolichan (Thaleichthys pacificus) are known to spawn. Each sample was split into 8 technical replicates and analyzed using quantitative real-time polymerase chain reaction (qPCR), which quantifies the amount of DNA in real time at the end of each full cycle of heating and cooling. Each replicate was assigned a CT value from the qPCR experiment, or N/A (not available) if no DNA was detected. Four data sets from two labs were used: Bureau Veritas Lab (BVL) and the University of Victoria (UVic). Each lab has a gblock data set and a field data set, both run with the chemical assays eTHPA2 and eTHPA6. The gblock data consist of gblock samples, which are synthetically constructed genes of known concentration (copy number), measured using qPCR. The field data consist of samples taken from river sites in British Columbia where eDNA occurs naturally; their copy numbers were unknown, and qPCR was used to determine the CT value for each technical replicate. Each data set was split into two subsets, full detect and partial detect, resulting in eight working data sets. The full detect data sets comprised samples in which all technical replicates were detects (8/8). The partial detect data sets comprised samples with fewer than 8/8 detects. For the partial detect data, a Binomial model for the proportion of detects in a sample was defined, where a replicate with a CT value was an “event” and an N/A replicate was not. Assuming the number of molecules in a sample followed a Poisson distribution with mean λ, we estimated λ as λ̂ = −ln(1 − p̂), where p̂ is the estimated sample proportion of detects from the Binomial model. Standard (calibration) curves and prediction curves were built from the gblock data.
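The Poisson-based estimate for partial detect samples can be illustrated with a minimal sketch. It assumes p̂ is simply the observed fraction of detects among the technical replicates; the function name and the default of 8 replicates are illustrative choices, not taken from the thesis.

```python
import math


def estimate_lambda(detects, replicates=8):
    """Estimate the Poisson mean from the fraction of detected replicates.

    Assumption: p_hat is the raw proportion detects / replicates.
    Only partial detect samples are accepted, because p_hat = 1
    would give an infinite estimate.
    """
    if not 0 <= detects < replicates:
        raise ValueError("detects must satisfy 0 <= detects < replicates")
    p_hat = detects / replicates
    # -log1p(-p) equals -ln(1 - p) and is numerically stable for small p
    return -math.log1p(-p_hat)


# Example: 4 of 8 replicates returned a CT value
lam = estimate_lambda(4)
```

A sample with 8/8 detects has p̂ = 1, for which −ln(1 − p̂) is undefined, which is consistent with full detect samples being handled through their CT values instead.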
Standard curves were built from gblock data with known copy numbers and relate CT and λ̂ values to copy number. They were used to estimate copy numbers from CT or λ̂ values for samples with unknown copy numbers. Prediction curves were built by fitting least squares and orthogonal (Deming) regression, each in unweighted and weighted form, to the gblock data. They were used to estimate eTHPA6 CT or λ̂ values given eTHPA2 CT or λ̂ values. Plots and model summaries of the four prediction curves were analyzed for each data set. Based on this analysis and on recommendations in the literature, weighted orthogonal (Deming) regression was chosen as the best prediction model for each gblock data set. The prediction curves were then applied to the corresponding field data to investigate how well the models predict eTHPA6 values given eTHPA2 values. In all of the data sets, the majority of final eTHPA6 copy numbers were well predicted, which indicated that the weighted Deming model was a good prediction method. The purpose of this study was to determine the best statistical methods for eDNA assay prediction for biologists and other researchers to use. With the methods validated in this study, researchers can go on to connect oolichan population estimates made with the older and newer assays, draw conclusions about the health of the population, produce plans to safeguard it against overharvesting, and support further conservation work.
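To make the orthogonal regression step concrete, the following is a minimal sketch of the unweighted Deming fit using its standard closed-form slope. It is not the weighted variant selected in the study, which typically requires an iterative fit, and it is not the author's code. The function name, the `delta` parameter (assumed ratio of the error variance in y to the error variance in x), and the sample values are all hypothetical.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Unweighted Deming regression of y on x.

    delta is the assumed ratio var(error in y) / var(error in x);
    delta = 1.0 gives orthogonal regression. Returns (slope, intercept).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired values standing in for eTHPA2 (x) and eTHPA6 (y)
slope, intercept = deming_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Unlike ordinary least squares, this fit allows measurement error in both variables, which is the usual reason for preferring it when both the predictor and the response are measured values.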

Keywords

Environmental DNA
