BACKGROUND: The advent of affordable sequencing has enabled researchers to discover many variants contributing to disease, including rare variants. There are methods for determining the most informative individuals for sequencing, but the application of these methods is more complex when working with families. Sets of large families can be beneficial in finding rare variants, but it may be unfeasible to sequence all members of these family sets.
METHODS: Using simulated data from the Genetic Analysis Workshop 19, we apply multiple regression to identify cases and controls. To find the best controls for each case, we used kinship coefficients to match within families. Selected cases and controls were analyzed for rare variants, collapsed by gene, associated with hypertension using the family-based rare variant association test (FARVAT).
RESULTS: The gene with the strongest simulated effect, MAP4, did not meet the Bonferroni corrected significance threshold. However, analysis of cases and controls using our selection method substantially improved the significance of MAP4, despite the reduction in sample size.
CONCLUSIONS: Taking the additional steps to select the optimal cases and controls from large family data sets can help ensure that only informative individuals are included in analysis and may improve the ability to detect rare variants.