Tuberculosis (TB) outbreaks in the United States can cause substantial illness. Using surveillance and genotyping data, we applied a plausible source–case algorithm to identify TB cases reported during 2018–2020 that were responsible for secondary cases attributed to recent Mycobacterium tuberculosis transmission during 2020–2022.
We used mixed models and a machine learning workflow to assess sociodemographic, clinical, and social risk factors associated with plausible sources. In mixed models, sputum smear positivity, cavitary disease, race/ethnicity other than non-Hispanic White or non-Hispanic Asian, age < 65 years, US birth, and homelessness were associated with plausible sources.
An adaptive boosting model achieved an area under the receiver operating characteristic curve of 0.81 on test data. Transmission was heterogeneous: the 8.1% of sources that were linked to 3–15 secondary cases accounted for 24.9% of transmission events.
Focusing case management and contact investigations on cases with the characteristics we identified could reduce M. tuberculosis transmission and improve TB prevention.
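The transmission heterogeneity reported above comes down to simple arithmetic: the share of plausible sources with several secondary cases, and the share of all transmission events linked to those sources. The following is a minimal sketch of that calculation, not the study's code. The function name, the threshold parameter, and the toy counts are illustrative assumptions, and each count is assumed to be the number of secondary cases linked to one plausible source.

```python
def transmission_concentration(secondary_counts, threshold=3):
    """Summarize how concentrated transmission is among plausible sources.

    secondary_counts: one integer per plausible source, giving the number
        of secondary cases linked to that source (assumed input format).
    threshold: minimum number of secondary cases for a source to count as
        "high-transmission" (3 mirrors the 3-15 range in the text).

    Returns (share_of_sources, share_of_events) as fractions in [0, 1].
    """
    if not secondary_counts:
        raise ValueError("secondary_counts must contain at least one source")

    total_sources = len(secondary_counts)
    total_events = sum(secondary_counts)

    # Sources at or above the threshold, and the events linked to them.
    high = [c for c in secondary_counts if c >= threshold]
    share_of_sources = len(high) / total_sources
    share_of_events = sum(high) / total_events if total_events else 0.0
    return share_of_sources, share_of_events


if __name__ == "__main__":
    # Hypothetical toy data, NOT the study's data: eight sources with one
    # secondary case each, one source with two, and one source with five.
    toy_counts = [1] * 8 + [2] + [5]
    sources, events = transmission_concentration(toy_counts)
    print(f"{sources:.1%} of sources account for {events:.1%} of events")
```

Applied to the real source–case links, the same two ratios would correspond to the 8.1% and 24.9% figures, provided a transmission event is counted as one source-to-secondary-case link.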