Identifying individuals with tuberculosis (TB) who are at high risk of onward transmission can guide disease prevention and public health strategies. Here, we train classification models to predict the first sampled isolates in Mycobacterium tuberculosis transmission clusters from demographic and disease data. We find that supervised learning, in particular balanced random forests, can be used to develop predictive models that identify people with TB who are more likely to be associated with TB cluster growth, with good model performance and AUCs of ≥ 0.
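To make the two technical ingredients named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary labels (1 for a first sampled isolate of a cluster, 0 otherwise, a hypothetical encoding) and shows the class-balanced bootstrap that a balanced random forest draws before growing each tree, together with a rank-based AUC calculation. The function names `balanced_bootstrap` and `roc_auc` are illustrative and do not come from the study.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, rng):
    """Return row indices of a class-balanced bootstrap sample.

    Every class is sampled with replacement down to the size of the
    rarest class, so each tree sees the same number of rows per class.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(rows) for rows in by_class.values())
    sample = []
    for rows in by_class.values():
        sample.extend(rng.choices(rows, k=n_min))
    return sample


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Tied scores count as half a win. Assumes labels are 0 or 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up labels: 3 positives among 12 cases (hypothetical data).
toy_labels = [1] * 3 + [0] * 9
toy_sample = balanced_bootstrap(toy_labels, random.Random(0))  # 3 rows per class
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs ranked correctly
```

Resampling to the minority-class size is what distinguishes a balanced random forest from a standard one when the outcome of interest is rare. In practice a maintained library implementation would be preferable to this hand-rolled sketch.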