TY - JOUR
T1 - Hidden copy number variation in the HapMap population
AU - Marioni, John C.
AU - White, Michael
AU - Tavaré, Simon
AU - Lynch, Andrew G.
PY - 2008/7/22
Y1 - 2008/7/22
AB - Recently, the extent of copy number variation (CNV) throughout the genome has been shown to be far greater than previously thought. Further, it has been demonstrated that specific copy number variable regions (CNVRs) are associated with particular diseases, suggesting that these genetic variations may have an important biological role. Hence, calling CNVRs and subsequently classifying samples as "losses" or "gains" is of great interest. A number of papers have been published containing classifications of CNVs, and here we show how the presence of pedigree information can be used for assessing the performance of those classification methods. In this article, by examining CNV classifications made in the HapMap samples, we show that estimates of the number of false-positive classifications per individual made by current approaches can be determined. Moreover, commonplace technologies for determining the locations of CNVRs aggregate information across the maternal and paternal chromosomes at the locus of interest. Here, we show that copy number variation on each chromosome can be inferred and, in particular, we discuss the existence of a class of CNVs that are inevitably misclassified and give an estimate of their prevalence. Although our focus is not on the development of calling algorithms per se, we describe and provide an example of how our model might be incorporated into the initial classification procedure to produce more robust results. Finally, we discuss how this methodology might be applied to future studies to obtain better estimates of the extent of CNV across the genome.
KW - Array CGH
KW - Classification
KW - Copy number variation
KW - HapMap project
KW - Pedigree information
UR - http://www.scopus.com/inward/record.url?scp=48249095891&partnerID=8YFLogxK
U2 - 10.1073/pnas.0711252105
DO - 10.1073/pnas.0711252105
M3 - Article
AN - SCOPUS:48249095891
SN - 0027-8424
VL - 105
SP - 10067
EP - 10072
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 29
ER -