Asthma is a major public health problem around the world, affecting individuals across the age spectrum from infants to older adults. Therefore, research on its pathogenesis and treatment has been a hot topic in the study of respiratory diseases. It is now well accepted that cell activity is closely related to the pathogenesis of asthma, and numerous basic and clinical studies focus on different types of relevant cells.
Because asthma is a typical example of type I hypersensitivity, research on the immune cells involved, such as lymphocytes, monocytes, and mast cells, is the most common. For instance, T cell subsets include CD4⁺, CD8⁺ T cells [
There is continued interest in targeting airway cells to develop new asthma treatments. Therefore, it has become imperative to analyze the current trends and future direction of asthma cell research. This study summarized asthma literature indexed in the Medical Literature Analysis and Retrieval System Online (MEDLINE) of the National Library of Medicine (NLM) over the past 30 years and explored the history and present state of asthma cell research using stem frequency ranking to provide ideas for future work.
Asthma cell research literature indexed in MEDLINE over the past 30 years was divided into three groups, each covering a 10-year retrieval period. Literature containing the keywords “Asthma” and “Cell” in the “Title” or “Abstract” fields was included for further investigation. The publication date limits for the three groups were “January 1, 1987, to December 31, 1996,” “January 1, 1997, to December 31, 2006,” and “January 1, 2007, to December 31, 2016,” respectively.
The search results of each decade were exported into a CSV file with information such as title and author. All the titles of each CSV file were saved as a text file for analysis with stem frequency rank.
Due to the large amount of data in the literature, we adopted Apache Hadoop, which is commonly used in big data analysis, as the data storage framework. As a file system supporting data-intensive distributed applications, Apache Hadoop has better distribution characteristics and provides file services with both reliability and mobility for program development [
The analysis of stem frequency ranking was handled using the Natural Language Toolkit (NLTK). NLTK is an important tool for processing human natural language and can be applied to word merging, text retrieval, statistics, and so on. The techniques applied in this study, such as “Word Frequency Accumulation,” “Stemming Processing,” and “Stop-word Filtering,” were all performed with NLTK [
Programming environment: an EC2 server on the Amazon Web Services (AWS) platform was selected as the programming environment.
Server model: t2.micro
Server location: Oregon, United States
Operating system and software: Ubuntu Server 14.04 with built-in Python language (version 2.7.3), Apache Spark (version 1.6.2), and NLTK (version 3.0).
Working process:
(1) Import a text file.
(2) Create the Spark context.
(3) Convert all text to lowercase.
(4) Remove punctuation, empty lines, and nonletter symbols.
(5) Use a stop word list to filter vocabulary irrelevant to the research, such as “they,” “where,” “to,” and “is.” Words that would distort the results, such as “review,” “asthma,” and “cell,” were also added to the stop word list for filtering.
(6) Apply stemming to reduce each word to its base form by removing its common morphological ending. In this study, we utilized PorterStemmer, a Python wrapper of the libstemmer library, to perform the stemming step.
(7) Rank all stems according to their frequency.
(8) List the top 50 stems, and output the results.
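The working process above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' Spark program: the function names `rank_stems` and `naive_stem`, the short stop word list, and the sample titles are all hypothetical, and `naive_stem` is a deliberately crude stand-in rather than the Porter algorithm. Punctuation is deleted rather than replaced by a space, and digits are kept, which appears consistent with stems such as “allergeninduc” and “th2” in the results but is an inference.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop word list. The study describes general
# stop words plus domain words such as "review", "asthma", and "cell".
STOP_WORDS = {"they", "where", "to", "is", "the", "of", "in", "and", "a",
              "review", "asthma", "cell", "cells"}


def naive_stem(word):
    """Stand-in stemmer (NOT Porter): drop one trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def rank_stems(titles, stem=naive_stem, stop_words=STOP_WORDS, top_n=50):
    """Return the top_n (stem, frequency) pairs across all titles."""
    counts = Counter()
    for title in titles:
        # Lowercase, then delete everything except letters, digits, whitespace.
        cleaned = re.sub(r"[^a-z0-9\s]", "", title.lower())
        for token in cleaned.split():
            stemmed = stem(token)
            # Filter stop words both before and after stemming.
            if token in stop_words or stemmed in stop_words:
                continue
            counts[stemmed] += 1
    return counts.most_common(top_n)


# Hypothetical sample titles, for illustration only.
sample_titles = [
    "Eosinophils in asthma",
    "Eosinophil activation and mast cells",
    "Mast cell role in T-cell responses",
]
for stem_text, frequency in rank_stems(sample_titles, top_n=5):
    print(stem_text, frequency)
```

With NLTK installed, the bound method `nltk.stem.PorterStemmer().stem` could be passed as the `stem` argument in place of the stand-in, which keeps the counting logic independent of the stemmer chosen.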
In the 1st, 2nd, and 3rd decades, 1331, 4393, and 7215 records were retrieved, respectively, which shows that the number of asthma cell research publications indexed by MEDLINE grew rapidly from 1987 to 2016; the number of records increased about 3.3-fold from the 1st to the 2nd decade and about 1.6-fold from the 2nd to the 3rd.
The top 50 stems of 3 decades are listed in Table
Top 50 stems of three decades.
Ranking | Stem (1st decade) | Frequency | Stem (2nd decade) | Frequency | Stem (3rd decade) | Frequency |
---|---|---|---|---|---|---|
1 | eosinophil | 147 | inflamm | 545 | inflamm | 1153 |
2 | activ | 115 | express | 425 | respons | 552 |
3 | respons | 113 | eosinophil | 358 | express | 537 |
4 | inflamm | 102 | receptor | 351 | t | 526 |
5 | t | 90 | activ | 341 | activ | 525 |
6 | express | 84 | respons | 339 | receptor | 508 |
7 | bronchoalveolar | 80 | t | 326 | mice | 476 |
8 | role | 76 | role | 321 | role | 429 |
9 | lavag | 75 | induc | 298 | epitheli | 422 |
10 | mast | 72 | mast | 232 | induc | 410 |
11 | lymphocyt | 71 | smooth | 229 | muscl | 406 |
12 | atop | 67 | muscl | 228 | smooth | 401 |
13 | cytokin | 61 | epitheli | 228 | inhibit | 401 |
14 | receptor | 60 | cytokin | 211 | immun | 336 |
15 | releas | 59 | product | 207 | mous | 333 |
16 | induc | 58 | inhibit | 205 | murin | 330 |
17 | inhibit | 55 | allergen | 200 | mast | 328 |
18 | allergen | 54 | atop | 177 | eosinophil | 294 |
19 | product | 53 | protein | 171 | remodel | 290 |
20 | epitheli | 53 | sputum | 168 | signal | 257 |
21 | inhal | 50 | gene | 166 | protein | 250 |
22 | adhes | 49 | murin | 160 | children | 245 |
23 | mediat | 47 | mice | 158 | cytokin | 244 |
24 | inflammatori | 46 | pulmonari | 148 | inflammatori | 235 |
25 | children | 45 | inflammatori | 146 | suppress | 234 |
26 | histamin | 43 | hyperrespons | 144 | hyperrespons | 232 |
27 | muscl | 41 | children | 136 | pulmonari | 230 |
28 | leukotrien | 40 | inhal | 136 | gene | 226 |
29 | antigen | 40 | th2 | 118 | product | 224 |
30 | smooth | 40 | remodel | 118 | th2 | 215 |
31 | fluid | 38 | inhibitor | 118 | modul | 213 |
32 | sodium | 38 | mechan | 116 | dendrit | 210 |
33 | ige | 37 | develop | 116 | develop | 207 |
34 | peripher | 34 | rat | 115 | allergen | 196 |
35 | macrophag | 34 | allergeninduc | 114 | attenu | 195 |
36 | pulmonari | 33 | immun | 113 | pathway | 190 |
37 | vitro | 32 | lymphocyt | 110 | differenti | 190 |
38 | mechan | 32 | infect | 106 | target | 185 |
39 | select | 31 | sensit | 102 | novel | 183 |
40 | protein | 30 | novel | 102 | infect | 177 |
41 | therapi | 28 | mediat | 100 | mediat | 176 |
42 | glucocorticoid | 28 | peripher | 100 | regulatori | 174 |
43 | bronchoconstrict | 28 | signal | 100 | type | 164 |
44 | guinea | 28 | leukotrien | 98 | sever | 164 |
45 | chang | 28 | allergi | 97 | rat | 164 |
46 | pig | 28 | growth | 97 | promot | 161 |
47 | tcell | 27 | chemokin | 96 | inhibitor | 156 |
48 | hyperrespons | 27 | modul | 93 | allergi | 156 |
49 | immun | 27 | mous | 93 | mechan | 151 |
50 | modul | 27 | kinas | 93 | potenti | 150 |
The shared stems of three decades. The orange, green, and purple bars show the frequencies of stems in the 1st, 2nd, and 3rd decade, respectively.
The shared stems of the first two decades. The orange and green bars show the frequencies of stems in the 1st and 2nd decade, respectively.
The shared stems of the last two decades. The green and purple bars show the frequencies of stems in the 2nd and 3rd decade, respectively.
The unique stems of three decades. The orange, green, and purple bars show the frequencies of stems in the 1st, 2nd, and 3rd decade, respectively.
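The grouping behind these four figures can be expressed with set operations. The sketch below is an assumption about how the groups were derived, not the authors' code: the function name `classify_stems` and the dictionary keys are hypothetical, and the three input lists are short excerpts of the top-50 table chosen for illustration, not the full lists.

```python
def classify_stems(first, second, third):
    """Split three top-stem lists into shared and decade-specific groups."""
    first, second, third = set(first), set(second), set(third)
    return {
        "shared_by_all": first & second & third,
        "first_two_only": (first & second) - third,
        "last_two_only": (second & third) - first,
        "unique_first": first - second - third,
        "unique_second": second - first - third,
        "unique_third": third - first - second,
    }


# Excerpts from the top-50 table (illustrative subsets only).
decade_1 = ["eosinophil", "atop", "inhal", "histamin"]
decade_2 = ["eosinophil", "atop", "inhal", "gene", "remodel", "sputum"]
decade_3 = ["eosinophil", "gene", "remodel", "pathway"]

groups = classify_stems(decade_1, decade_2, decade_3)
for name, stems in groups.items():
    print(name, sorted(stems))
```

Using sets keeps each comparison a single expression and makes the six groups mutually exclusive by construction.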
The mainstream research trends can be summarized from the stems shared by all three decades. First, experimental research attracted more attention from researchers than clinical research. “Children” is the only shared stem representing clinical research, reflecting the frequent occurrence of asthma among children; one study reported that asthma is common in children and is a leading cause of childhood hospitalization [
Second, the two frequent stems describing pathologic changes in asthma were “inflamm” and “hyperrespons.” “Inflamm” was also in the top 10 because airway inflammation is the main manifestation of asthma, with mechanistic research on aspects of inflammation such as etiological agents and influencing factors [
Third, in terms of the different types of cells related to asthma, eosinophils, mast cells, and T cells are the hot spots among immunocytes, according to the ranking results. Mast cells are a “first line of defense”: they are innate/adaptive immune cells that can be activated by allergen-IgE-specific triggers to release a wide range of mediators, and they are widely distributed in airway tissues exposed to the environment, so mast cells preempt the critical roles played by histamine and mucus secretion in causing airway obstruction [
Epithelia and smooth muscle cells (SMCs) are the hot spots of structural cell studies. Research has shown that airway epithelial barrier dysfunction may have important implications for asthma [
Several shifts can be identified by comparing the stems shared by the first two decades with those shared by the last two decades.
First, the phenotype definition of asthma has gradually become clearer. The shared stem “atop” in the first two decades shows that the terms “atopic” and “non-atopic” were often used to define the phenotypes of asthma due to the limited available data about asthma and atopy at that time [
Second, genetic studies and airway remodeling have received more attention. Along with novel experimental technologies applied in modeling and detection, more studies of signaling pathways [
Finally, regarding changes in therapeutic approaches, the shared stem “inhal” in the first two decades shows that inhaled treatment was mainstream in the early stage [
Several distinctive research hot spots can be identified from the unique stems of each decade. Studies in the first decade focused on two specific aspects. First, mechanistic research, including the release of cytokines [
The main hot spot drawn from the unique stems of the second decade is allergen-induced topics, such as airway hyperresponsiveness [
With the development of genetic technology, research on the immune response became prevalent in the third decade, and specific stems about its mechanism, regulation, and signaling pathways such as “pathway” [
The number of cell research studies of asthma indexed by MEDLINE has increased rapidly. According to the ranking list of frequent stems, scholars paid more attention to experimental research, especially mechanistic research, rather than clinical research. The immunocyte studies and structural cell research are the two main directions. Eosinophils, mast cells, and T cells are the hot spots of immunocyte studies, while epithelia and SMCs are the hot spots of structural cell research. The research trend is closely linked with the development of experimental technology, including animal models. Early studies featured basic research, but immunity research has dominated in the recent decade with the development of genetic technology.
Based on the stem rankings of three decades, future trends can be predicted in the following aspects: (1) The distinct definition of asthma phenotypes associated with genetic characteristics will provide benefits for basic studies and clinical therapy. For instance, personalized medicine treatment tailored to individual’s asthma phenotypes identified through biomarkers [
The authors declare no conflicts of interest regarding the publication of this paper.
Yi Shang and Wenchao Tang developed and designed the study. Wenchao Tang performed the programming. Wenchao Tang wrote the paper. Bin Xiao, Peitong Wen, Ruoyun Lyu, and Ke Ning reviewed and edited the manuscript. All authors read and approved the manuscript.
This work was supported by the National Natural Science Foundation of China (Grant no. 81403469).