Abstract
The amount of DNA in a sequencing experiment is often amplified by PCR, so the same original DNA fragment may be sequenced more than once, giving a ‘PCR duplicate’. Reads arising from distinct but identical original molecules can be indistinguishable from PCR duplicates, which can lead to an over-estimation of the PCR duplicate proportion. The PCR duplicate proportion, and other measures derived from it, are important statistics for quality assurance, experimental design, and the interpretation of sequencing experiments. Here we provide a full likelihood basis for a combinatorial approach that uses heterozygous SNPs, as implemented in our R package, and demonstrate the efficacy of the approach. We also discuss the association with DNA copy number and demonstrate the impact on inferring mitochondrial DNA copy number, a question that has recently featured in several high-profile cancer studies. This impact is explored through a simulation study.
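The two ideas in the abstract can be illustrated with a small sketch. This is not the authors' full likelihood or their R package. It is a deliberately simplified model, and every function name and number in it is hypothetical. In the model, each "duplicate set" is assumed to be exactly two reads with identical fragment coordinates that cover one heterozygous SNP. A true PCR duplicate always carries the same allele, while two independent molecules carry different alleles with probability 1/2 (balanced heterozygosity, no sequencing error). This gives P(discordant) = (1 − d)/2, where d is the fraction of sets that are true PCR duplicates. The second function uses a standard occupancy formula to show why regions with very high copy number, such as mtDNA, produce many coincidental "duplicates" that naive deduplication would remove.

```python
import math


def duplicate_proportion_mle(n_sets: int, n_discordant: int) -> float:
    """MLE of d, the fraction of two-read sets that are true PCR duplicates.

    Simplified model (assumption): P(discordant alleles) = (1 - d) / 2.
    The binomial MLE is 1 - 2k/n, clipped at 0 when k/n exceeds 1/2.
    """
    if n_sets <= 0:
        raise ValueError("need at least one informative duplicate set")
    if not 0 <= n_discordant <= n_sets:
        raise ValueError("n_discordant must lie in [0, n_sets]")
    return max(0.0, 1.0 - 2.0 * n_discordant / n_sets)


def log_likelihood(d: float, n_sets: int, n_discordant: int) -> float:
    """Binomial log-likelihood of d, up to an additive constant."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    p = (1.0 - d) / 2.0  # probability that a set shows discordant alleles
    k, n = n_discordant, n_sets
    if p == 0.0:
        # d = 1: discordant sets are impossible under the model
        return 0.0 if k == 0 else -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def expected_distinct_fragments(n_reads: float, n_configs: float) -> float:
    """Expected number of distinct fragment configurations among n_reads reads.

    Assumes each read comes from an independent molecule (no PCR duplicates)
    whose configuration is drawn uniformly from n_configs possibilities.
    """
    return n_configs * (1.0 - (1.0 - 1.0 / n_configs) ** n_reads)


if __name__ == "__main__":
    # Hypothetical counts: 200 informative sets, 30 with discordant alleles.
    print(duplicate_proportion_mle(200, 30))

    # Hypothetical regions of equal length with 100,000 fragment configurations
    # each; read counts are taken to be proportional to copy number (2 vs 500).
    n_configs = 100_000
    nuc_reads, mt_reads = 1_000, 250_000
    true_ratio = mt_reads / nuc_reads
    dedup_ratio = (expected_distinct_fragments(mt_reads, n_configs)
                   / expected_distinct_fragments(nuc_reads, n_configs))
    print(true_ratio, dedup_ratio)  # deduplicated ratio falls well below truth
```

Under these assumptions, the nuclear region loses almost nothing to deduplication, while the high-copy region saturates its fragment configurations. A copy-number ratio computed after naive deduplication is therefore biased downwards.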
Original language | English |
---|---|
Title of host publication | Recent developments in statistics and data science |
Subtitle of host publication | SPE2021, Évora, Portugal, October 13–16 |
Editors | Regina Bispo, Lígia Henriques-Rodrigues, Russell Alpizar-Jara, Miguel de Carvalho |
Place of Publication | Cham |
Publisher | Springer |
Pages | 259-279 |
Number of pages | 21 |
Volume | 398 |
ISBN (Electronic) | 9783031127663 |
ISBN (Print) | 9783031127656 |
DOIs | |
Publication status | Published - 29 Nov 2022 |
Event | XXV Congress of the Portuguese Statistical Society (online), Évora, Portugal; 13 Oct 2021 → 16 Oct 2021; conference number 25; http://www.spe2021.uevora.pt/en/inicio-english/ |
Publication series
Name | Springer proceedings in mathematics & statistics |
---|---|
Volume | 398 |
ISSN (Print) | 2194-1009 |
ISSN (Electronic) | 2194-1017 |
Conference
Conference | XXV Congress of the Portuguese Statistical Society |
---|---|
Abbreviated title | SPE |
Country/Territory | Portugal |
City | Évora |
Period | 13/10/21 → 16/10/21 |
Internet address | http://www.spe2021.uevora.pt/en/inicio-english/ |
Keywords
- Whole-genome sequencing
- DNA copy number
- Likelihood
- Quality control
- Mitochondria
- Cancer