TY - JOUR
T1 - Inference of genomic landscapes using ordered Hidden Markov Models with emission densities (oHMMed)
AU - Vogl, Claus
AU - Karapetiants, Mariia
AU - Yıldırım, Burçin
AU - Kjartansdóttir, Hrönn
AU - Kosiol, Carolin
AU - Bergman, Juraj
AU - Majka, Michal
AU - Mikula, Lynette Caitlin
N1 - CV and BY were supported by the Austrian Science Fund (FWF; DK W1225-B20); MK and HK were supported by the Austrian Science Fund (FWF; SFB F6101 and F6106). This work was also partially funded by the Vienna Science and Technology Fund (WWTF) (10.47379/MA16061 to CK). LCM’s research was funded by the School of Biology at the University of St Andrews.
PY - 2024/4/16
Y1 - 2024/4/16
N2 - Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed, e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities; the corresponding R package of the same name is available on CRAN). These models identify the number of comparably homogeneous regions within autocorrelated observed sequences. The regions are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting transitions only between hidden states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred.
AB - Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed, e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities; the corresponding R package of the same name is available on CRAN). These models identify the number of comparably homogeneous regions within autocorrelated observed sequences. The regions are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, permitting transitions only between hidden states that are also neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred.
U2 - 10.1186/s12859-024-05751-4
DO - 10.1186/s12859-024-05751-4
M3 - Article
SN - 1471-2105
VL - 25
JO - BMC Bioinformatics
JF - BMC Bioinformatics
ER -