Abstract
Bayesian networks (BNs) are probabilistic graphical models that represent dependencies and independencies among variables. They have been widely applied in areas such as biology and medicine to untangle complex interrelationships, and they are increasingly used in fields such as social science, where data have different features, for example highly polychotomous (multi-category) variables. In score-based structure learning, a scoring function guides the selection of the most appropriate network. Among these functions, BDe requires specifying hyperparameters that determine the priors on the network parameters. This study evaluates the performance of four scoring functions (AIC, BIC, BDe, and log-likelihood), particularly on highly polychotomous data. We assessed the overall performance of each scoring function and, for BDe, varied its hyperparameter to evaluate its impact. Performance was significantly influenced by the number of nodes, network complexity, and sample size. BIC and BDe (with default hyperparameters) generally offered higher precision, especially with larger samples, whereas log-likelihood tended to overfit, showing high recall but low precision. AIC and BDe required careful tuning according to the number of discrete levels and the sample size. Optimizing the BDe hyperparameters was crucial for balancing model complexity and fit. We propose a simulation method for identifying the optimal hyperparameters when applying the BDe scoring function to real-world data. The study provides insights for making BN models more robust and accurate, and emphasizes the importance of considering sample size and the number of discrete levels when selecting and tuning scoring functions for BN structure learning.
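To make the four scores concrete, here is a minimal sketch of their local (per-node) forms for discrete data. It assumes the standard textbook definitions: log-likelihood from the counts N_ijk, AIC and BIC in the "higher is better" form (log-likelihood minus a penalty on the (r_i - 1) q_i free parameters), and BDe with a uniform prior (the BDeu variant), whose hyperparameter is an imaginary sample size `iss`. The function name `local_scores`, the row-tuple data layout, and the toy data are hypothetical illustrations. This is not the authors' code, which is in the GitHub repository listed below.

```python
import math
from collections import Counter


def local_scores(data, child, parents, levels, iss=1.0):
    """Local log-likelihood, AIC, BIC and BDeu scores of `child` given `parents`.

    Hypothetical helper for illustration. `data` is a list of row tuples of
    integer-coded categories, `levels[v]` is the number of categories of
    variable v, and `iss` is the BDeu imaginary sample size (an assumed default).
    """
    n = len(data)
    r = levels[child]                             # r_i: categories of the child
    q = math.prod(levels[p] for p in parents)     # q_i: parent configurations (1 if none)

    # N_ijk: counts of (parent configuration j, child category k)
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # N_ij: counts of parent configuration j
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)

    # Maximised log-likelihood: sum_jk N_ijk * log(N_ijk / N_ij)
    loglik = sum(c * math.log(c / n_ij[j]) for (j, _), c in n_ijk.items())

    k = (r - 1) * q                               # free parameters of this node
    aic = loglik - k
    bic = loglik - 0.5 * k * math.log(n)

    # BDeu: uniform Dirichlet prior spreading `iss` over all r*q cells.
    # Unobserved cells and configurations contribute zero, so only observed counts are summed.
    a_ijk = iss / (r * q)
    a_ij = iss / q
    bdeu = sum(math.lgamma(a_ij) - math.lgamma(a_ij + nj) for nj in n_ij.values())
    bdeu += sum(math.lgamma(a_ijk + c) - math.lgamma(a_ijk) for c in n_ijk.values())

    return {"loglik": loglik, "aic": aic, "bic": bic, "bdeu": bdeu}


# Toy data (hypothetical): column 0 is A, column 1 is B, each with 3 levels.
# B copies A in 30 rows; 3 rows break the pattern.
data = [(a, a) for a in (0, 1, 2) for _ in range(10)] + [(0, 1), (1, 2), (2, 0)]
levels = [3, 3]

s_none = local_scores(data, child=1, parents=[], levels=levels)
s_parent = local_scores(data, child=1, parents=[0], levels=levels)
print("B with no parents:", s_none)
print("B with parent A:  ", s_parent)

# Sensitivity of the BDeu score to its hyperparameter
for iss in (0.1, 1.0, 10.0):
    print(iss, local_scores(data, child=1, parents=[0], levels=levels, iss=iss)["bdeu"])
```

Because all four scores are decomposable, a network score is the sum of these local scores over all nodes, and comparing `s_parent` with `s_none` is the kind of local comparison a search procedure makes when considering the arc A → B. The log-likelihood can never decrease when a parent is added, which is why it tends to favour denser graphs unless a penalty (AIC, BIC) or a prior (BDe) counteracts it.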
Original language | English |
---|---|
Article number | 18 |
Pages (from-to) | 1-21 |
Number of pages | 21 |
Journal | Discover Data |
Volume | 3 |
Issue number | 1 |
Early online date | 19 May 2025 |
DOIs | |
Publication status | E-pub ahead of print - 19 May 2025 |
Keywords
- Bayesian networks
- Structure learning
- Scoring functions
- Polychotomous data
Datasets
- Evaluation of Bayesian network scoring functions in polychotomous data analysis (code)
Ke, X. (Creator), GitHub, 2025
https://github.com/thebestecho/Bayesian-network-scoring-functions
Dataset: Software