The gold standard for diagnosing pulmonary
In 2016, there were an estimated 10.4 million (95% uncertainty interval 8·8–12·2) new incident cases of
The gold standard for TB diagnosis is finding MTB in clinical samples of the patient that include sputum or other specimen [
In the present study, in order to overcome these two technical difficulties in TB detection, a method based on a molecular typing technique and mathematical modeling was proposed. Molecular typing methods based on variable number tandem repeat (VNTR) analysis have been used to identify MTB in epidemiologic studies [
A total of 516 clinical sputum specimens were collected from the Shanghai Pulmonary Hospital of Tongji University between February and October 2014. Among them, 9 were excluded due to uncertain diagnoses, and 13 were removed since they were obtained from cured patients whose TB results were affected by drugs. From the remaining 494 samples, 167 were confirmed as TB and 327 were found to be non-TB, including 4 samples of nontuberculous mycobacteria (NTM). The diagnosis of the TB cases was also based on X-ray imaging, clinical symptoms, therapeutic effects, and clinical history. Finally, 148 samples with complete data sheets and confirmed diagnosis were included in the study. The prior distribution of different subtypes of TB bacteria was calculated based on their VNTR results.
In the proposed method, the aforementioned MIRU-VNTR loci were used as the characteristics of different TB subtypes [
Calculation of the prior probability of each array.
Repeat counts | MTUB21 | MTUB04 | QUB-18 | QUB-26 | QUB-11b | MIRU31 | MIRU10 | MIRU26 |
---|---|---|---|---|---|---|---|---|
1 | | | | | | | | |
2 | | | | | | | | |
3 | | | | | | | | |
4 | | | | | | | | |
5 | | | | | | | | |
6 | | | | | | | | |
7 | | | | | | | | |
8 | | | | | | | | |
9 | | | | | | | | |
10 | | | | | | | | |
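The prior calculation can be illustrated with a minimal Python sketch. It assumes that the per-locus prior is the relative frequency of each repeat count among the typed samples and that the prior of a full array is the product of its per-locus frequencies, which treats the loci as independent. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import prod

LOCI = ["MTUB21", "MTUB04", "QUB-18", "QUB-26",
        "QUB-11b", "MIRU31", "MIRU10", "MIRU26"]

def locus_priors(arrays):
    """Relative frequency of each repeat count, computed separately per locus."""
    n = len(arrays)
    return [
        {rep: c / n for rep, c in Counter(a[i] for a in arrays).items()}
        for i in range(len(LOCI))
    ]

def array_prior(array, priors):
    """Prior of a full array as the product of its per-locus frequencies.

    Assumes the loci vary independently; a repeat count never seen at a
    locus contributes a frequency of 0.0.
    """
    return prod(priors[i].get(rep, 0.0) for i, rep in enumerate(array))

# Hypothetical toy data: four typed samples, one tuple of repeat counts each,
# ordered as in LOCI.
samples = [
    (5, 4, 8, 8, 6, 5, 3, 8),
    (5, 4, 8, 8, 6, 5, 3, 8),
    (4, 4, 9, 8, 6, 5, 3, 8),
    (5, 3, 8, 7, 5, 5, 2, 9),
]
priors = locus_priors(samples)
dominant_prior = array_prior((5, 4, 8, 8, 6, 5, 3, 8), priors)
```

Under the same independence assumption, multiplying the first-month frequencies reported in the Results for the array (5, 4, 8, 8, 6, 5, 3, 8), that is 0.52 × 0.70 × 0.34 × 0.41 × 0.49 × 0.67 × 0.76 × 0.50, gives roughly 0.006, the same order of magnitude as the occurrence rate of about 0.005 reported for the dominant subtype.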
MIRU-VNTR helps in extracting features of TB, and the chosen loci can be analyzed with PCR (a biological technique that amplifies the signal of such features) at the same temperature, yielding a clear signal. Features of TB can therefore be extracted in less time and at lower cost. However, contamination can also be amplified during the PCR process. To address this problem, it is important to know whether a repeated result is reasonable, that is, whether it could plausibly occur without contamination. Based on the prior distribution of each locus, the occurrence rate of different TB subtypes was calculated. The number of occurrences of one TB subtype in a batch of samples follows a binomial distribution. Using the binomial distribution, one can therefore evaluate whether a repetition is reasonable.
Each sample had a specific numerical array and its corresponding prior distribution. As the occurrence probability of any given array is very low, the probability of an array appearing more than once in a single batch is far lower. Knowing the sample number in the testing batch and the prior distribution of each array, the expected occurrence rate (EOR) of an array can be calculated based on the binomial distribution function using the following formula in the event of two repeated loci:
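As a hedged illustration of the binomial reasoning described here, the sketch below computes the probability that an array with prior p appears exactly k times, or at least k times, among n positive samples. Reading the EOR as a binomial tail probability is an assumption made for this example, and the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p):
    """P(X >= k): chance that an array with prior p appears k or more times in n samples."""
    # Complement of the probabilities of 0, 1, ..., k-1 occurrences.
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k))

# Example: an array with prior 0.005 in a batch of 1000 positive samples.
p_two_or_more = prob_at_least(2, n=1000, p=0.005)
```

For two occurrences the tail reduces to 1 − (1 − p)^n − n·p·(1 − p)^(n−1), so a repeat is unsurprising for a common array in a large batch and very unlikely for a rare array in a small one.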
Community surveys were conducted and approved by the ethical committees of the Shanghai Pulmonary Hospital of Tongji University. No human tissue was used in this study. Two of the coauthors of the paper provided clinical documents and clinical sputum specimens. The collected data included confirmed diagnosis results and MIRU-VNTR results from the sputum specimens. No private information was used in this study. The data were used only for research purposes. The content of the study was written in the informed consent form, which was signed by all patients. All acquired records and specimens used in the study were anonymized and could not be linked to any of the patients. The ethical committees of the Shanghai Pulmonary Hospital of Tongji University approved all the experimental protocols. The methods carried out in this study were in accordance with the approved guidelines.
A total of 148 samples confirmed as TB were collected over a two-month period, of which 92 were collected in the first month and 76 in the second. Table
Prior distribution of two test results.
Repeat number | Period | MTUB21 | MTUB04 | QUB-18 | QUB-26 | QUB-11b | MIRU31 | MIRU10 | MIRU26 |
---|---|---|---|---|---|---|---|---|---|
0 | 1st month | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
0 | 2nd month | 0.00 | 0.00 | 0.05 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
0 | Difference | 0.00 | 0.00 | −0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
1 | 1st month | 0.04 | 0.01 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |
1 | 2nd month | 0.07 | 0.01 | 0.01 | 0.02 | 0.02 | 0.01 | 0.01 | 0.01 |
1 | Difference | −0.02 | 0.00 | −0.01 | −0.02 | −0.01 | −0.01 | −0.01 | 0.00 |
2 | 1st month | 0.03 | 0.09 | 0.03 | 0.03 | 0.02 | 0.01 | 0.16 | 0.01 |
2 | 2nd month | 0.04 | 0.08 | 0.03 | 0.03 | 0.01 | 0.03 | 0.15 | 0.02 |
2 | Difference | −0.01 | 0.01 | 0.01 | 0.01 | 0.01 | −0.02 | 0.01 | −0.01 |
3 | 1st month | 0.07 | 0.09 | 0.04 | 0.02 | 0.07 | 0.10 | 0.76 | 0.04 |
3 | 2nd month | 0.08 | 0.18 | 0.04 | 0.03 | 0.09 | 0.08 | 0.74 | 0.03 |
3 | Difference | −0.02 | −0.09 | 0.00 | −0.01 | −0.02 | 0.02 | 0.02 | 0.01 |
4 | 1st month | 0.18 | 0.70 | 0.03 | 0.13 | 0.10 | 0.05 | 0.05 | 0.08 |
4 | 2nd month | 0.20 | 0.63 | 0.02 | 0.08 | 0.11 | 0.08 | 0.07 | 0.06 |
4 | Difference | −0.01 | 0.07 | 0.01 | 0.05 | −0.02 | −0.03 | −0.01 | 0.02 |
5 | 1st month | 0.52 | 0.12 | 0.07 | 0.04 | 0.21 | 0.67 | 0.01 | 0.09 |
5 | 2nd month | 0.49 | 0.11 | 0.07 | 0.04 | 0.21 | 0.68 | 0.02 | 0.07 |
5 | Difference | 0.03 | 0.01 | −0.01 | 0.00 | 0.00 | 0.00 | −0.01 | 0.02 |
6 | 1st month | 0.07 | 0.00 | 0.03 | 0.09 | 0.49 | 0.08 | 0.01 | 0.04 |
6 | 2nd month | 0.05 | 0.00 | 0.02 | 0.08 | 0.41 | 0.07 | 0.01 | 0.07 |
6 | Difference | 0.01 | 0.00 | 0.01 | 0.01 | 0.08 | 0.01 | 0.00 | −0.02 |
7 | 1st month | 0.01 | 0.00 | 0.05 | 0.13 | 0.10 | 0.01 | 0.00 | 0.08 |
7 | 2nd month | 0.01 | 0.00 | 0.06 | 0.16 | 0.14 | 0.01 | 0.00 | 0.08 |
7 | Difference | 0.00 | 0.00 | −0.01 | −0.02 | −0.04 | 0.00 | 0.00 | 0.00 |
8 | 1st month | 0.04 | 0.00 | 0.34 | 0.41 | 0.01 | 0.01 | 0.00 | 0.50 |
8 | 2nd month | 0.03 | 0.00 | 0.36 | 0.43 | 0.01 | 0.01 | 0.00 | 0.50 |
8 | Difference | 0.01 | 0.00 | −0.03 | −0.02 | 0.00 | 0.00 | 0.00 | 0.00 |
9 | 1st month | 0.03 | 0.00 | 0.18 | 0.12 | 0.00 | 0.00 | 0.00 | 0.14 |
9 | 2nd month | 0.03 | 0.00 | 0.17 | 0.11 | 0.00 | 0.00 | 0.00 | 0.14 |
9 | Difference | 0.01 | 0.00 | 0.02 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 |
10 | 1st month | 0.00 | 0.00 | 0.14 | 0.02 | 0.00 | 0.07 | 0.00 | 0.01 |
10 | 2nd month | 0.00 | 0.00 | 0.14 | 0.03 | 0.00 | 0.04 | 0.00 | 0.02 |
10 | Difference | 0.00 | 0.00 | 0.01 | −0.01 | 0.00 | 0.02 | 0.00 | −0.01 |
11 | 1st month | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
11 | 2nd month | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
11 | Difference | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
12 | 1st month | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
12 | 2nd month | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
12 | Difference | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
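The month-to-month comparison in the table above can be reproduced for a single locus with a short sketch. The helper name `compare_distributions` is hypothetical; the two summary values it returns (largest absolute difference and sum of absolute differences) are simple distances chosen for illustration, and the MTUB04 frequencies are copied from the table.

```python
def compare_distributions(first, second):
    """Per-repeat-count difference plus two summary distances between two priors."""
    keys = sorted(set(first) | set(second))
    # A repeat count missing from one distribution counts as frequency 0.0.
    diff = {k: first.get(k, 0.0) - second.get(k, 0.0) for k in keys}
    max_abs = max(abs(d) for d in diff.values())
    sum_abs = sum(abs(d) for d in diff.values())
    return diff, max_abs, sum_abs

# MTUB04 frequencies by repeat count (nonzero entries only), from the table.
mtub04_month1 = {1: 0.01, 2: 0.09, 3: 0.09, 4: 0.70, 5: 0.12}
mtub04_month2 = {1: 0.01, 2: 0.08, 3: 0.18, 4: 0.63, 5: 0.11}
diff, max_abs, sum_abs = compare_distributions(mtub04_month1, mtub04_month2)
```

Differences recomputed from the rounded table entries can deviate by 0.01 from the tabulated "Difference" rows, presumably because those rows were computed from unrounded frequencies.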
The occurrence rate of each strain of tubercle bacillus was calculated using the equation
MIRU-VNTR assays were classified as high frequency if the repeat distribution of any of its loci exceeded 0.1. This category comprised 864 TB subtypes, with occurrences ranging from 0.005 to 5.07
Distribution of VNTR assays. (a) The occurrence of high-frequency arrays: arrays with higher occurrence rates are rarer. (b) The reasonable repetition for batches of different sample numbers: repetition in a single batch is highly improbable for most TB subtypes. (c) The change of expected occurrence with the number of positive samples in one batch: the occurrence rate of a given array may be very low, but the likelihood of it occurring twice or more increases as the number of positive samples rises.
The EOR of an array increased with the number of positive samples in the batch (Figure
The results of the parts are shown in Figure
In this subsection, a Monte Carlo algorithm was used to simulate the application of the proposed method in a clinical laboratory. The steps of this procedure are illustrated in Figure
Simulating process using Monte Carlo.
The simulation time was set as 50 weeks, with an average of 1000 positive samples per simulation. The simulation results are demonstrated in Figure
Results of simulation. (a) The maximal difference between the distribution of all collected data and the distribution of data collected in each week: these differences are small. (b) The sum of absolute differences: the sum decreased as the accumulated sample count increased. (c) The rate of unreasonable repetition: the unreasonable repeated sample rate per week ranged from 0.05 to 0.25. (d) The accuracy of detection of contaminated samples: the accuracy ranged from 0.88 to 1.0.
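One way such a weekly simulation could be organized is sketched below. This is not the authors' implementation: the contamination model (a contaminated sample takes on the array of a clean sample), the independent sampling of loci, the significance level `alpha`, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter
from math import comb, prod

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

def simulate_week(locus_priors, n_samples, n_contaminated, alpha, rng):
    """Simulate one batch; return (flagged indices, contaminated indices, recall).

    locus_priors: one {repeat_count: frequency} dict per locus.
    Requires n_contaminated < n_samples so a clean source always exists.
    """
    def draw_array():
        # Each locus is drawn independently from its prior (an assumption).
        return tuple(
            rng.choices(list(d), weights=list(d.values()))[0] for d in locus_priors
        )

    batch = [draw_array() for _ in range(n_samples)]

    # Contamination model: a contaminated sample takes on the array of a clean one.
    contaminated = set(rng.sample(range(n_samples), n_contaminated))
    clean = [i for i in range(n_samples) if i not in contaminated]
    for idx in contaminated:
        batch[idx] = batch[rng.choice(clean)]

    # Flag every sample whose array repeats more often than chance would explain.
    counts = Counter(batch)
    flagged = set()
    for i, arr in enumerate(batch):
        if counts[arr] < 2:
            continue
        prior = prod(d[rep] for d, rep in zip(locus_priors, arr))
        if prob_at_least(counts[arr], n_samples, prior) < alpha:
            flagged.add(i)

    # Share of truly contaminated samples that were flagged.
    recall = len(flagged & contaminated) / len(contaminated) if contaminated else 1.0
    return flagged, contaminated, recall
```

Calling `simulate_week(priors, 1000, 29, 0.01, random.Random(0))` once per simulated week and collecting the returned recall would give a curve comparable in spirit to panel (d). Contaminated copies of a very common array are not flagged under this rule, which is why the recall can fall below 1.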
Data tested with the contamination function produced an average of 28.82 known contaminated samples each week, ranging from 17 to 38. In Figure
In Figure
When the proposed method was applied to the 148 TB cases, three repetitions were found. Because of their low occurrence rates, the method assumed that the samples were contaminated, resulting in a wrong diagnosis. The negative samples without contamination were expected to have no MIRU-VNTR features of TB. There was a very low probability that a sample may have a TB type with none of the MIRU-VNTR loci included in the study, which would lead to a misdiagnosis. In our dataset, no patient provided TB specimens without at least one of the MIRU-VNTR loci included in the study based on the 327 confirmed TB-negative cases.
Although the distributions of some MIRU-VNTR loci differed between the two test results, most of these differences were negligibly small. In the simulation, the difference between the distribution of all collected data and that of the data collected during each week gradually decreased as the total accumulation of collected data in our database increased. The results suggested that most TB bacteria do not mutate frequently. Additionally, it was found that the TB subtype (5-MTUB21, 4-MTUB04, 8-QUB-18, 8-QUB-26, 6-QUB-11b, 5-MIRU31, 3-MIRU10, and 8-MIRU26) was by far the most dominant subtype in the Shanghai area, with an appearance rate about 17 times that of the next most common subtype. Even though the exact reason why these loci remain stable has not yet been discussed, there is clinical significance in studying the effects (e.g., drug resistance) conferred by different types of tuberculosis. With the increasing accumulation of clinical information from patients, our database may contribute to making key inroads in this area of TB research.
Even small amounts of contamination, such as those introduced through aerosols, can be significantly amplified by PCR. Therefore, a longer time window of seven days was set for the tests. Based on the results, it was concluded that most arrays have an extremely low probability of appearing twice within a time window. A laboratory that can confirm the elimination of some forms of contamination can set an even shorter time window. In this study, it was found that the high-frequency loci have an inordinately large influence on the testing results. If such results were involved in cross-contamination, the contamination may have been mistaken for a normal result. It was considered reasonable for the samples with an occurrence rate >0.0003 to have two or more copies in a single time window containing 1000 positive samples. These samples were classified into group one (1), two (11), or three (61). The presence of more than two copies of some loci was also reasonable in this system. The most dominant subtype in the present study (5, 4, 8, 8, 6, 5, 3, 8) occurred 233 times, which was almost consistent with its occurrence rate of 0.005. Overall, the suggested testing method can account for a large majority of contamination incidents. However, if high-frequency loci occur twice or more in reasonable tested results, additional confirmation should be performed, such as a thorough review of the patient’s medical records.
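To make the idea of a reasonable number of copies concrete, the sketch below finds the smallest copy count that would be flagged for an array with a given occurrence rate in a window of n positive samples. The significance level `alpha` and the function names are assumptions; the text does not state which cutoff criterion produced the 0.0003 threshold.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

def flag_threshold(n, p, alpha):
    """Smallest copy count k >= 2 whose tail probability P(X >= k) is below alpha."""
    k = 2
    while k <= n and prob_at_least(k, n, p) >= alpha:
        k += 1
    return k

# Tail probability of a repeat at the 0.0003 occurrence rate in a 1000-sample window.
p_repeat = prob_at_least(2, 1000, 0.0003)

# Copy count at which a very rare array would be flagged, with an assumed alpha of 0.01.
rare_threshold = flag_threshold(1000, 1e-6, alpha=0.01)
```

With an occurrence rate of 0.0003 and 1000 positive samples, the probability of two or more copies is about 0.037, so whether a repeat at that rate counts as reasonable depends on the chosen `alpha`. For a very rare array the threshold is 2, meaning any repeat is flagged, while a common array such as the dominant subtype tolerates many more copies before being flagged.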
Theoretically, the possibility that two strains of TB in the same clinical laboratory will have the same genetic features, namely, the same MIRU-VNTR loci repeats, is extremely low. If this happens, it is very plausible that one strain has contaminated the other. Based on this hypothesis, a rapid and accurate clinical scheme for TB testing was developed. In the clinical laboratory, eight samples were measured for eight to ten genetic sites at once and the repeat numbers of each genetic site were recorded as the special identifier of that subtype of TB. According to the results, the highest occurrence rate for any subtype appearing multiple times was 0.005 and was that of 5-MTUB21, 4-MTUB04, 8-QUB-18, 8-QUB-26, 6-QUB-11b, 5-MIRU31, 3-MIRU10, and 8-MIRU26. Despite the fact that this rate was already very low, it still dwarfs those of other subtypes. If a sample with a low-repetition rate appears twice or more, there is a high risk of contamination being the cause. Intracontamination is generally caused by poor procedures followed by the operating technician. When this happens, the suspect samples are recollected and retested. Intercontamination can be derived from contact between patients or airborne tuberculosis in the laboratory. In order to identify and solve these problems, the repeated results, the activity range of the corresponding patients, and the appearance and duration of symptoms should be investigated, all of which are included in the proposed method. If factors stemming from intra- and intercontamination can be excluded, but repeated samples with low-occurrence rates are still detected, it can be speculated that a subtype of TB may be creating an epidemic.
Compared to traditional TB testing, the proposed MIRU-VNTR method in this study can process large numbers of samples in a very short time (Figure
Scheme flowchart based on our model.
In the present study, a method that can be widely used in epidemiological studies was proposed. However, due to the limited available volume of data, our method was insufficient to derive bias-free results. Based on the proposed testing scheme, digital MIRU-VNTR data extracted from collected sputum specimens can be automatically and constantly uploaded by clinicians using this scheme, allowing the analysis of large volumes of data and the acquisition of comprehensive and objective results. The data processing power of this technique may also aid researchers in determining the relationships between different TB subtypes and their clinical features, such as drug resistance [
The TB type data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.
Tienan Feng and Yan Cheng contributed equally to this work.
This work was supported by the National Natural Science Foundation of China (Grant No. 81371775), Shanghai Municipal Health Bureau (Grant No. 20144y0249), and Medical-Engineering Cross Fund of Shanghai Jiaotong University (Grant No. YG2017QN70).