Natural Language Processing (NLP) empowered mobile computing is the use of NLP techniques in mobile environments. Research in this field has drawn much attention, as reflected in the steadily increasing number of publications over the last five years. This study presents the status and development trends of the research field through an objective, systematic, and comprehensive review of relevant publications available from Web of Science. The analysis techniques used include a descriptive statistics method, a geographic visualization method, a social network analysis method, a latent Dirichlet allocation method, and an affinity propagation clustering method. We quantitatively analyze the publications in terms of statistical characteristics, geographical distribution, cooperation relationships, and topic discovery and distribution. This systematic analysis illustrates how publications in the field have evolved over time and identifies current research interests and potential directions for future research. Our work can potentially assist researchers in keeping abreast of the research status. It can also help in monitoring new scientific and technological developments in the research field.
With the development of mobile devices as well as the advances in wireless communication technologies, mobile computing is becoming a significantly important paradigm in today’s world of networked computing systems [
The NLP empowered mobile computing research field has attracted increasing interest from the scientific community, with annual publications in Web of Science (WoS) growing from 12 in 2000 to 55 in 2016. Some representative examples are as follows. Chen et al. [
Bibliometric analysis is defined as the use of statistical methods to evaluate scholarly publications within a certain field from an objective and quantitative perspective [
Bibliometric analysis has been widely applied in various fields to measure the quality and productivity of academic output and has demonstrated its effectiveness through long-term practice. Relevant studies have mainly focused on revealing statistical characteristics of publications, exploring collaboration relationships, and uncovering research themes and their evolution. Some examples are as follows. Geng et al. [
To the best of our knowledge, there is currently no scientific review of the NLP empowered mobile computing research field. Thus, in this study, we conduct a bibliometric analysis of publications retrieved from WoS for the years 2000–2016 to explore the status of the research field. The main objective is to address the following issues:
The rest of the paper is organized as follows. Section
Five different methods are applied to analyze research publications in the NLP empowered mobile computing field retrieved from WoS. The details of the methods are described in Section
Descriptive statistics are brief descriptive coefficients that summarize a collection of information, which can represent either an entire population or a sample. They are commonly divided into measures of central tendency and measures of variability. Measures of central tendency usually include the mean, median, and mode, while measures of variability generally include the standard deviation, minimum and maximum values, kurtosis, and skewness. Both kinds of measures are presented using graphs, tables, and general discussion to describe the data simply. This condenses large amounts of data into a manageable quantitative form that helps users understand the meaning of the data being analyzed.
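As a minimal sketch of these measures, the following Python snippet summarizes a small list of values with the standard library. The function name `describe` and the citation counts are hypothetical and serve only as an illustration; they are not data from this study.

```python
import statistics


def describe(values):
    """Return basic measures of central tendency and variability."""
    values = list(values)
    return {
        # Measures of central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Measures of variability
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical citation counts for seven publications (illustrative only).
citations = [3, 5, 5, 8, 12, 20, 59]
summary = describe(citations)
print(summary)
```

The single highly cited item pulls the mean well above the median, which is why both are usually reported together for skewed data such as citation counts.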
In this study, the descriptive statistics method was applied to characterize the retrieved publications, including publication distribution by year; the most influential publications; productive journals, authors, affiliations, and countries/regions; co-author, co-affiliation, and co-country/region publication distributions; and topic distribution by year.
Geographic visualization or Geovisualization is a set of tools and techniques supporting the analysis of geospatial or spatial data, emphasizing knowledge construction over knowledge storage or information transmission. By combining technologies, e.g., image processing, simulation, and virtual reality, computers can help present information in a way that patterns can be found. Geovisualization can be applied to all the stages of problem-solving in geographical analysis, from development of initial hypotheses to knowledge discovery, analysis, presentation, and evaluation. According to Tobler’s First Law of Geography [
In this study, we applied geographic visualization analysis to explore the geographical distribution of publications at the country/region level.
Social network analysis is a process of investigating social structures using networks and graph theory [
In this study, we applied social network analysis to explore the cooperation relationships among countries/regions, affiliations, and authors in the NLP empowered mobile computing research field. The cooperation was visualized using interactive force-directed networks. In the networks, nodes represented specific countries/regions, affiliations, or authors, and lines indicated cooperation. The size of a node represented the number of publications of a specific country/region, affiliation, or author. The width of a line reflected the cooperation frequency between two countries/regions, affiliations, or authors. Node color indicated the continent of a country/region, or the country/region of an affiliation or author. Users could explore the cooperation relationships for specific countries/regions, affiliations, or authors by dynamically dragging the nodes.
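The node sizes and line widths described above can be derived from per-publication entity lists. The following Python sketch shows one way to count them; the function name `build_cooperation_network` and the sample data are hypothetical, and the study's actual visualization tooling is not specified here.

```python
from collections import Counter
from itertools import combinations


def build_cooperation_network(publications):
    """Count node sizes and edge weights for a cooperation network.

    publications: iterable of lists, each holding the entities
    (countries/regions, affiliations, or authors) of one publication.
    Returns (node_size, edge_weight), where node_size counts publications
    per entity and edge_weight counts joint publications per entity pair.
    """
    node_size = Counter()
    edge_weight = Counter()
    for entities in publications:
        unique = sorted(set(entities))  # ignore repeats within one publication
        node_size.update(unique)
        for a, b in combinations(unique, 2):
            edge_weight[(a, b)] += 1  # pairs are stored in sorted order
    return node_size, edge_weight


# Hypothetical country lists for four publications (illustrative only).
pubs = [["USA", "CN"], ["USA"], ["CN", "USA", "UK"], ["UK"]]
nodes, edges = build_cooperation_network(pubs)
print(nodes)
print(edges)
```

Storing each pair in sorted order keeps the network undirected, so a publication by (USA, CN) and one by (CN, USA) strengthen the same line.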
Latent Dirichlet allocation (LDA), proposed by Blei [
LDA formally defines the following terms: A A A
LDA assumes the following generation process: The term distribution The proportions For each word
As for variational expectation-maximization (VEM) estimation, the log-likelihood for one document
Gibbs sampling defines a Markov chain in the space of possible variable assignments such that the stationary distribution of the Markov chain is the joint distribution over variables. Thus, it is a Markov Chain Monte Carlo method [
The perplexity, as shown in (
In (
Additionally, estimation using Gibbs sampling requires specification of values for the parameters of the prior distributions.
In this study, topic discovery and distribution were analyzed using LDA models with the following steps: We assigned the weights of segmented author keywords and Keywords Plus, publication title, and abstract as 0.4, 0.4 and 0.2, respectively, as determined in our former experiment [ Term Frequency-Inverse Document Frequencies (TF-IDF) were used to filter out unimportant terms. As one of the most popular term-weighting schemes, TF-IDF increases proportionally to the number of times a term appears in a publication but is often offset by the frequency of the term in the whole collection of publications. We calculated the TF-IDF values of all terms to sort the terms. By manually examining these ranked terms, we defined a threshold as 0.1 empirically. Only the terms with a TF-IDF value greater than the threshold were kept for further analysis. Through sampling, 16 different topic numbers were set to With an initialized By matching the topics detected by VEM and Gibbs sampling based on Hellinger distance, the best matches with the smallest distance could be identified. Hellinger distance is calculated as (
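The matching step can be sketched in Python as follows. This is an illustrative sketch, not the study's implementation: it assumes the common discrete form of the Hellinger distance with a 1/√2 scaling, and it pairs each topic with its nearest counterpart independently rather than enforcing a one-to-one assignment. The function names and the toy term distributions are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Assumes the form sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2)), which
    ranges from 0 (identical) to 1 (no overlap).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def best_matches(topics_a, topics_b):
    """Pair each topic in topics_a with its closest topic in topics_b.

    Returns (index_a, index_b, distance) tuples sorted by distance,
    so the best-matching pairs come first.
    """
    matches = []
    for i, p in enumerate(topics_a):
        j, d = min(
            ((j, hellinger(p, q)) for j, q in enumerate(topics_b)),
            key=lambda pair: pair[1],
        )
        matches.append((i, j, d))
    return sorted(matches, key=lambda m: m[2])


# Toy term distributions for two topics per estimation method (hypothetical).
vem_topics = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
gibbs_topics = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
for a, b, dist in best_matches(vem_topics, gibbs_topics):
    print(a, b, round(dist, 4))
```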
Affinity Propagation (AP) algorithm was proposed by Frey and Dueck [
AP algorithm takes
There are two types of messages contained in this technique. The responsibility
Responsibility and availability of message updates are
In our study, we applied the AP clustering method to the term-topic posterior probability matrix to cluster the topics identified by the LDA method.
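For readers unfamiliar with AP, the following pure-Python sketch illustrates the message-passing idea using the standard responsibility and availability updates from Frey and Dueck with damping. It is a simplified illustration, not the implementation used in this study. The similarity measure (negative squared distance), the preference value, the damping factor, the iteration count, and the toy data are all assumptions made for this example.

```python
def affinity_propagation(S, damping=0.5, iterations=100):
    """Cluster items from a square similarity matrix S (list of lists).

    S[k][k] holds the preference of item k to act as an exemplar.
    Returns a list giving, for each item, the index of its exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # relative to the best competing candidate exemplar.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: accumulated evidence from other items that k
        # is a good exemplar.
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]  # exclude the self term
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each item picks the exemplar maximizing availability + responsibility.
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy one-dimensional data with two obvious groups (hypothetical).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
preference = -1.0  # assumed value; lower preferences yield fewer clusters
S = [
    [preference if i == j else -((a - b) ** 2) for j, b in enumerate(points)]
    for i, a in enumerate(points)
]
print(affinity_propagation(S))
```

Unlike k-means, the number of clusters is not fixed in advance; it emerges from the preference values on the diagonal of the similarity matrix.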
Web of Science, a widely used and authoritative citation database, was used as the data source for retrieving research publications in the NLP empowered mobile computing field. First, a list of keywords related to “natural language processing” and “mobile computing” was determined by a domain expert. With “Science Citation Index Expanded” and “Social Sciences Citation Index” as indexes, publications used in this study were identified using the specific query in Table
The query used to retrieve research publications in the NLP empowered mobile computing field from WoS.
TS=((“natural language processing” OR “NLP” OR “semantic analysis” OR “bag of words” OR “word sense disambiguation” OR “named entity recognition” OR “NER” OR “sentiment analysis” OR “information extraction” “tokenization” OR “stemming” OR “lemmatization” OR “corpus” OR “stop words” OR “parts-of-speech” OR “language modeling” OR “n-grams” OR “syntactic analysis” OR “information retrieval” OR “language model”) AND (“mobile computing” OR “mobile” OR “smart device” OR “smartphone” OR “cellphone” OR “telephony device” OR “Cellular network” OR “Android” OR “iOS” OR “phone”))
The raw data of the 716 publications were downloaded as plain text. Key elements including title, author, journal, publication date, subject category, language, funding, author keywords, Keywords Plus, abstract, and author address, as well as number of citations, pages, and references, were extracted. To ensure that the publications were closely related to the research field, a domain expert manually verified each one. Eventually, 471 publications were identified as relevant for analysis. Further, the corresponding affiliations and countries/regions were identified from the author address information. Key terms were extracted from author keywords, Keywords Plus, title, and abstract.
The statistical characteristics of the publications are shown as Table
The statistical characteristics of the 471 publications.
Characteristics | Statistics |
---|---|
Total #pub. | 471 |
| |
#pub. with author keywords or Keywords Plus | 412 |
#unique publication sources | 287 |
#unique countries (or regions)/first countries (or regions) | 60; 52 |
#unique affiliations/first affiliations | 544; 345 |
#unique authors/first authors/last authors | 1,408; 451; 441 |
Average #citations | 10.42 |
Average #countries (or regions) in one pub. | 1.25 |
Average #affiliations in one pub. | 1.64 |
Average #authors in one pub. | 3.27 |
Average #funds in one pub. | 0.73 |
Average #pages in one pub. | 15.66 |
Average #references in one pub. | 33.29 |
Average #author keywords or Keywords Plus | 6.81 |
Average #words/characters in title | 10.57; 80.13 |
Average #words/characters in abstract | 186.40; 1,265.58 |
| |
Language distribution | English (98.73%); Estonian (0.42%); French (0.42%); Spanish (0.21%); Afrikaans (0.21%) |
| |
Subject category distribution (Top 10) | Computer Science (38.76%); Engineering (16.27%); Telecommunications (10.98%); Acoustics (5.82%); Information Science & Library Science (2.78%); Linguistics (2.51%); Psychology (2.12%); Operations Research & Management Science (1.85%); Business & Economics (1.32%); Communication (1.32%) |
| |
Top 10 terms in author keywords and Keywords Plus | Mobile (30.36%); Information (22.08%); Retrieval (16.77%); Recognition (16.56%); System (14.86%); Speech (14.01%); Model (12.10%); Network (12.10%); Language (11.04%); Analysis (9.98%) |
| |
Top 10 terms in titles | Mobile (34.18%); Information (17.83%); System (12.53%); Retrieval (12.10%); Recognition (10.62%); Speech (8.70%); Network (8.28%); Model (7.86%); Language (7.22%); Environment (6.37%) |
| |
Top 10 terms in abstracts | Mobile (66.67%); Information (56.90%); Paper (55.41%); System (48.20%); Result (46.07%); Data (38.00%); User (38.00%); Model (37.37%); Device (32.70%); Retrieval (31.42%) |
The distribution characteristics of the 471 publications are shown in Figure
Distribution characteristics of the 471 publications.
The total publications, total citations, average number of citations per publication, and the number of annual citations are demonstrated in Figure
The statistics of the 471 publications (the light blue bars indicate total publications and the red bars indicate total citations; the dark blue line indicates average citations per publication and the green line indicates annual citations).
The top 11 contributing journals in the research field are presented in Table
Top 11 contributing journals in the NLP empowered mobile computing research field.
Rank | Journals | SC | TP | % | TC | ACP | | ≥10 | T100 |
---|---|---|---|---|---|---|---|---|---|
1 | IEEE/ACM Transactions on Audio Speech and Language Processing | A; E | 25 | 5.31 | 447 | 17.88 | 11 | 12 | 11 |
2 | Speech Communication | A; CS | 11 | 2.34 | 179 | 16.27 | 6 | 6 | 5 |
3 | Computer Speech and Language | CS | 10 | 2.12 | 93 | 9.30 | 6 | 5 | 3 |
4 | Expert Systems with Applications | CS; E; OR&MS | 8 | 1.70 | 320 | 40.00 | 8 | 7 | 5 |
4 | IEEE Transactions on Consumer Electronics | E; T | 8 | 1.70 | 44 | 5.50 | 5 | 1 | 0 |
6 | Mobile Information Systems | CS; T | 7 | 1.49 | 95 | 13.57 | 3 | 2 | 2 |
6 | Multimedia Tools and Applications | CS; E | 7 | 1.49 | 71 | 10.14 | 3 | 1 | 1 |
6 | Personal and Ubiquitous Computing | CS; T | 7 | 1.49 | 67 | 9.57 | 4 | 3 | 1 |
9 | Information Sciences | CS | 6 | 1.27 | 85 | 14.17 | 5 | 3 | 3 |
10 | EURASIP Journal on Wireless Communications and Networking | E; T | 5 | 1.06 | 22 | 4.40 | 2 | 1 | 1 |
10 | IEICE Transactions on Information and Systems | CS | 5 | 1.06 | 11 | 2.20 | 2 | 0 | 0 |
To better measure the overall scientific importance of these 11 journals, 5 assessment indicators acquired from Scientific Journal Rankings were used: Impact Factor (IF), SCImago Journal Rank (SJR), 5-Year IF, Source Normalized Impact per Paper (SNIP), and CiteScore. IF reflects the yearly average number of citations to recent publications in a journal and is the primary, most widely used indicator for assessing a journal’s significance. SJR is a measure of the scientific influence of scholarly journals that accounts for both the number of citations received by a journal and the importance or prestige of the journals where such citations come from. 5-Year IF is calculated by dividing the number of citations received in a given year by publications from the previous five years by the number of publications the journal published in those five years. SNIP is defined as the ratio of the journal’s citation count per publication to the citation potential in its subject field. CiteScore, launched by Elsevier in December 2016, is calculated as the ratio of the total citations received in a given year by all publications published in a journal in the three previous years to the number of publications published in the journal in those three years.
Therefore, the 11 productive journals were compared by using their IF, SJR, 5-Year IF, SNIP, and CiteScore for year 2016, as shown in Figure
Comparisons of IF, SJR, 5-Year IF, SNIP, and CiteScore for the top 11 productive journals for year 2016.
The number of citations reflects the popularity and influence of a publication in the scientific community [
Top 15 most influential publications in the NLP empowered mobile computing research field.
Rank | Title | Author(s) | Year | TC | C/Y |
---|---|---|---|---|---|
1 | Energy-Efficient Link Adaptation in Frequency-Selective Channels | Miao G. W., et al. | 2010 | 376 | 53.71 |
2 | Text Entry for Mobile Computing: Models and Methods, Theory and Practice | MacKenzie I. S.; Soukoreff R. W. | 2002 | 172 | 11.47 |
3 | Cell-Phone-Induced Driver Distraction | Strayer D. L.; Drews F. A. | 2007 | 148 | 14.80 |
4 | A Vector Space Modeling Approach to Spoken Language Identification | Li H. Z., et al. | 2007 | 116 | 11.60 |
5 | Context-Aware System for Proactive Personalized Service Based on Context History | Hong J. Y., et al. | 2009 | 91 | 11.38 |
6 | More than Words: Social Networks' Text Mining for Consumer Brand Sentiments | Mostafa M. M. | 2013 | 88 | 22.00 |
7 | The Effect of Mobility-Induced Location Errors on Geographic Routing in Mobile Ad Hoc and Sensor Networks: Analysis and Improvement Using Mobility Prediction | Son, D. J., et al. | 2004 | 77 | 5.92 |
8 | A Personalized Tourist Trip Design Algorithm for Mobile Tourist Guides | Souffriau W., et al. | 2008 | 76 | 8.44 |
9 | D'Agents: Applications and Performance of a Mobile-Agent System | Gray R. S., et al. | 2002 | 73 | 4.87 |
10 | Optical Encryption and QR Codes: Secure and Noise-Free Information Retrieval | Barrera J. F., et al. | 2013 | 64 | 16.00 |
11 | Text-Dependent Speaker Verification: Classifiers, Databases and RSR2015 | Larcher A., et al. | 2014 | 60 | 20.00 |
12 | A Location-Aware Recommender System for Mobile Shopping Environments | Yang W. S., et al. | 2008 | 59 | 6.56 |
12 | An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email | Walker M. A. | 2000 | 59 | 3.47 |
14 | Landmark Recognition with Compact BoW Histogram and Ensemble ELM | Cao J. W., et al. | 2016 | 56 | 56.00 |
14 | Mobile-Agent Coordination Models for Internet Applications | Cabri G., et al. | 2000 | 56 | 3.29 |
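The C/Y values in the table are consistent with dividing total citations (TC) by the number of years between the publication year and 2017. That reference year is inferred from the table values rather than stated in the text, so the Python sketch below treats it as an assumption; the function name `citations_per_year` is likewise hypothetical.

```python
def citations_per_year(total_citations, publication_year, reference_year=2017):
    """Average citations per year since publication, rounded to 2 decimals.

    reference_year=2017 is an assumption inferred from the table values.
    """
    years = reference_year - publication_year
    if years <= 0:
        raise ValueError("reference_year must be later than publication_year")
    return round(total_citations / years, 2)


# Rows taken from the table: (TC, publication year).
for tc, year in [(376, 2010), (172, 2002), (56, 2016)]:
    print(year, tc, citations_per_year(tc, year))
```

This normalization explains why a 2016 publication with 56 citations ranks far higher on C/Y than a 2000 publication with the same citation count.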
The 471 publications involve 1,408 authors, of whom 451 are first authors and 441 are last authors. 20 authors have 3 or more publications, and 98 authors have 2 or more publications. The 20 most productive authors are listed in Table
The most productive authors in the NLP empowered mobile computing research field.
Rank | Name | Country | TP | TC | ACP | | T100 | | FP | LP | CP |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | | SG | 4 | 108 | 27.00 | 4 | 3 | 4 | 3 | 0 | 4 |
1 | | IT | 4 | 45 | 11.25 | 3 | 1 | 0 | 0 | 0 | 4 |
2 | | KR | 3 | 27 | 9.00 | 3 | 1 | 0 | 1 | 0 | 3 |
2 | | USA | 3 | 40 | 13.33 | 3 | 1 | 3 | 0 | 2 | 3 |
2 | | IT | 3 | 18 | 6.00 | 2 | 1 | 4 | 3 | 0 | 3 |
2 | | IT | 3 | 18 | 6.00 | 2 | 1 | 4 | 0 | 2 | 3 |
2 | | GR | 3 | 9 | 3.00 | 1 | 0 | 0 | 0 | 3 | 3 |
2 | | UK | 3 | 37 | 12.33 | 2 | 1 | 0 | 0 | 3 | 3 |
2 | | KR | 3 | 4 | 1.33 | 1 | 0 | 2 | 0 | 2 | 3 |
2 | | GR | 3 | 9 | 3.00 | 1 | 0 | 0 | 1 | 0 | 3 |
2 | | GR | 3 | 9 | 3.00 | 1 | 0 | 0 | 0 | 0 | 3 |
2 | | KR | 3 | 4 | 1.33 | 1 | 0 | 7 | 0 | 2 | 3 |
2 | | USA | 3 | 173 | 57.67 | 3 | 3 | 4 | 1 | 2 | 2 |
2 | | CN | 3 | 3 | 1.00 | 1 | 0 | 6 | 0 | 1 | 3 |
2 | | JP | 3 | 4 | 1.33 | 2 | 0 | 2 | 1 | 2 | 3 |
2 | | CA | 3 | 38 | 12.67 | 2 | 1 | 0 | 0 | 0 | 3 |
2 | | CN | 3 | 48 | 16.00 | 2 | 2 | 0 | 0 | 0 | 2 |
2 | | CN | 3 | 51 | 17.00 | 3 | 2 | 3 | 1 | 0 | 3 |
2 | | CN | 3 | 9 | 3.00 | 2 | 0 | 3 | 0 | 3 | 3 |
2 | | KR | 3 | 27 | 9.00 | 3 | 1 | 0 | 0 | 3 | 3 |
544 affiliations from 60 countries/regions have publications in the NLP empowered mobile computing research field. Table
The most productive affiliations in the NLP empowered mobile computing research field.
Rank | Name | Country | TP | TC | ACP | | T100 | | FP | CP |
---|---|---|---|---|---|---|---|---|---|---|
1 | | SG | 8 | 87 | 10.88 | 5 | 3 | 1 | 4 | 5 |
1 | | CN | 8 | 42 | 5.25 | 4 | 1 | 21 | 8 | 4 |
3 | | CN | 7 | 115 | 16.43 | 5 | 4 | 3 | 3 | 6 |
3 | | TW | 7 | 83 | 11.86 | 5 | 3 | 3 | 5 | 4 |
5 | | USA | 5 | 550 | 110.00 | 4 | 4 | 6 | 2 | 4 |
5 | | USA | 5 | 10 | 2.00 | 2 | 0 | 9 | 4 | 4 |
5 | | TW | 5 | 62 | 12.40 | 3 | 2 | 6 | 4 | 1 |
5 | | USA | 5 | 47 | 9.40 | 4 | 1 | 4 | 1 | 5 |
9 | | IN | 4 | 35 | 8.75 | 3 | 1 | 3 | 4 | 1 |
9 | | USA | 4 | 28 | 7.00 | 3 | 1 | 1 | 4 | 2 |
9 | | USA | 4 | 26 | 6.50 | 3 | 0 | 8 | 2 | 2 |
9 | | KR | 4 | 31 | 7.75 | 4 | 1 | 5 | 4 | 0 |
9 | | UK | 4 | 43 | 10.75 | 2 | 1 | 0 | 4 | 0 |
9 | | IT | 4 | 45 | 11.25 | 3 | 1 | 0 | 1 | 3 |
9 | | CN | 4 | 43 | 10.75 | 2 | 1 | 5 | 3 | 3 |
The 471 publications are from 60 countries/regions. The number of publications affiliated with 1 country/region range
The most productive countries/regions in the NLP empowered mobile computing research field.
Rank | Country | TP | TC | ACP | | T100 | FP (%) | Single-country/region ACP | Single-country/region TP (%) | International collaboration ACP | International collaboration TFC |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | USA | 105 | 1,795 | 17.10 | 22 | 32 | 77.14 | 20.78 | 60.00 | 11.57 | CN (12) |
2 | CN | 61 | 372 | 6.10 | 10 | 10 | 91.80 | 4.17 | 57.38 | 9.04 | USA (12) |
3 | UK | 44 | 418 | 9.50 | 12 | 11 | 61.36 | 11.68 | 63.64 | 5.69 | IE/CH |
4 | KR | 41 | 281 | 6.85 | 8 | 6 | 92.68 | 7.03 | 85.37 | 5.83 | CN/USA |
5 | TW | 37 | 399 | 10.78 | 11 | 11 | 94.59 | 11.07 | 81.08 | 9.57 | USA |
6 | JP | 24 | 77 | 3.21 | 3 | 1 | 79.17 | 1.44 | 75.00 | 8.50 | CN |
7 | IT | 21 | 299 | 14.24 | 10 | 9 | 80.95 | 13.19 | 76.19 | 17.60 | USA |
8 | AU | 18 | 218 | 12.11 | 7 | 7 | 61.11 | 18.00 | 38.89 | 8.36 | USA |
8 | CA | 18 | 313 | 17.39 | 9 | 4 | 88.89 | 20.38 | 72.22 | 9.60 | N/A |
10 | FR | 17 | 157 | 9.24 | 6 | 5 | 64.71 | 4.45 | 64.71 | 18.00 | CN/USA |
10 | GR | 17 | 38 | 2.24 | 3 | 0 | 100.00 | 2.24 | 100.00 | 0.00 | N/A |
10 | ES | 17 | 124 | 7.29 | 7 | 2 | 88.24 | 6.43 | 82.35 | 11.33 | USA |
13 | SG | 16 | 355 | 22.19 | 9 | 7 | 75.00 | 14.90 | 62.50 | 34.33 | USA |
14 | HK SAR | 15 | 98 | 6.53 | 6 | 2 | 53.33 | 9.17 | 40.00 | 4.78 | CN/USA |
15 | DE | 14 | 114 | 8.14 | 5 | 3 | 85.71 | 8.11 | 64.29 | 8.20 | CN |
Geographical distributions of the NLP empowered mobile computing research publications.
Since the publications are mainly distributed in the USA, China, England, and South Korea, we further explored the annual publication distributions for these 4 countries, as shown in Figure
Publication distributions by year for the top 4 countries/regions.
Figure
International collaborative publication distribution by year.
Figures
Institution-collaborative publication distribution by year.
Author-collaborative publication distribution by year.
Furthermore, the cooperation relations for specific countries/regions, affiliations, and authors were visualized with social network analysis. A cooperation network for 48 countries/regions is shown in Figure
Cooperation network of 48 countries/regions (node colors represent different continents, e.g., orange for Asia, blue for North America, green for Europe, red for Oceania, purple for Africa, and brown for South America). The network can be accessed via the link (
Cooperation network of 91 affiliations (node colors represent different countries/regions, e.g., red for the USA, pink for South Korea, and purple for Australia). The network can be accessed via the link (
Cooperation network of 65 authors (node colors represent different countries/regions, e.g., orange for South Korea, red for the USA, purple for Australia, green for China, and brown for Italy). The network can be accessed via the link (
By setting TF-IDF value threshold as 0.1, the terms were ranked by frequency. Table
Top 20 most frequent terms.
Rank | Stemmed terms | Total occurrences | Occurrences 2000–2008 | Occurrences 2009–2016 |
---|---|---|---|---|
1 | Agent | 369 | 250 | 119 |
2 | Image | 215 | 70 | 145 |
3 | Sentiment | 128 | 0 | 128 |
4 | Dialogue | 83 | 49 | 34 |
5 | Health | 81 | 2 | 79 |
6 | Music | 76 | 27 | 49 |
7 | Radio | 74 | 10 | 64 |
8 | Unit | 74 | 51 | 23 |
9 | Adaptation | 70 | 40 | 30 |
10 | Relevance | 69 | 29 | 40 |
11 | Geographic | 66 | 37 | 29 |
12 | Short Messages | 66 | 9 | 57 |
13 | Protocol | 65 | 20 | 45 |
14 | Chinese | 64 | 29 | 35 |
15 | Medical | 60 | 16 | 44 |
16 | Recommendation | 60 | 4 | 56 |
17 | Clustering | 54 | 20 | 34 |
18 | Privacy | 54 | 9 | 45 |
19 | Ad hoc | 53 | 9 | 44 |
20 | Traffic | 52 | 17 | 35 |
Top 15 most frequent terms for the top 10 best matching topics.
Topic | Potential theme | Top high frequency terms |
---|---|---|
36 | Mobile agent computing | Agent; Coordination; Java; Migration; Protocol; Mobile-agent; Failure; Itinerary; Filtering; Turkish; Attack; Commerce; Context-aware; Truncation; Crash |
| ||
11 | Mobile agent computing | Agent; Planning; Ontology; Cloud; Multi-agent; Net; Interoperability; Neural; Peer-to-Peer; Broadband; Instruction; Complementarity; Natural Language; Traffic; Grounding |
| ||
32 | Mobile privacy and security | Privacy; Private; Secure; Location-Based Services; Encryption; Points of Interest; Protection; Approximate; Attack; Path; Privacy-preserving; Streaming; Password; Protocol; Cryptosystem |
| ||
1 | Image and syllable events | Image; Particular Allophones; Re-ranking; Composite Phoneme; Simple Phonemes; Syllable; Thing; iPad; On-Premise Signs; Spreading; Bow; Modern Orthography; Arabic; Content-based; Descriptor |
| ||
4 | Mobile social media computing | Sentiment; Opinion; Twitter; Tweet; Customer; Suggestion; Emojis; Emotion; Micro-blog; Protest; Brand; Suggestive; Microblog; Orientation; Box |
| ||
8 | Mobile radio | Radio; Phone-in; Localization; Australian; Formulation; Island; Reporting; Talkback; Involvement; Caller; Dialogic; Stance; Backlinking; Cloud; French |
| ||
5 | Mobile location computing | Geographic; Relevance; Seeking; Innovation; Subspace; Tourism; Birthright; Firm; Flier; Sensing; TILES (Temporal, Identity, Location, Environmental and Social); Cross-space; Location-aware; Personalized; Reposting |
| ||
40 | Context-aware computing | Dialogue; Context-aware; Estonian; Clarification; Array; Problematic; Reformulation; Verbose; Email; Mobile Information Services enabled by Mobile Publishing; Non-understanding; Publishing; Agent; Directive; Reinforcement |
| ||
10 | Second screen response | Gesture; Debate; PreFrontal Cortex; Adult; Presidential; Walking; Facial; Twitter; Educational; Gait; Political; Touch; Biometrics; Blink; Cortex |
| ||
35 | Language learning and modeling | Chinese; Information Retrieval; Peer-to-Peer; Conditional Random Field; Update; Apprentice; Affordances; Disyllabic; Website; Workplace; Self-study; Skip-chain; Descriptive; Mobile Peer-to-Peer; Multilingual |
(a) Estimated
We used the AP clustering analysis to perform the cluster analysis of the 40 topics. One way for measuring topic similarity is based on term-level similarity with the hypothesis that topics may contain the same terms. The clustering result based on term-topic posterior probability matrix is shown in Figure
The visualized result of hierarchical clustering based on term-topic posterior probability matrix.
Identifying emerging research topics can provide valuable insights into the development of the research field. Likewise, identifying fading research topics can help in understanding how research hot spots evolve [
The trends of the 40 research topics during 2000–2016 (
This study provides an up-to-date bibliometric analysis of the publications in WoS during the years 2000–2016 in the NLP empowered mobile computing research field. Some interesting findings are discussed below.
The annual number of publications shows a significant growth trend, from 12 publications in 2000 to 55 publications in 2016. This indicates a growing interest in the research field.
The literature characteristics analysis shows that the 471 publications are widely dispersed across 287 journals. The 11 most productive journals together contribute about 21% of the total publications. The top 3 are
Top 3 most influential publications are: [
There are 1,408 authors and 544 affiliations involved in the publications. Most authors (79.18%) have only 1 publication, and 4.25% of the authors have 3 or more publications. The most productive authors are
Geographic visualization analysis shows that 60 countries/regions have contributed to the publications. The top 15 productive countries/regions are developed countries/regions, except for China. As the top 2, the USA and China have shown significant growth in the number of scientific publications since 2010. These numbers are predicted to continue to increase in the coming years. This partially reflects the need to develop NLP techniques for solving mobile computing issues.
Scientific collaboration analysis shows significant growth in international, institutional, and author collaborations. Through social network analysis, we found that researchers tend to collaborate with others within the same country or area, with institutions under similar administration, or with a neighboring country or area. However, some research institutions might have administrative arrangements separate from their associated universities or hospitals, and a researcher might be affiliated with multiple institutions. Co-authors might actually work together while being affiliated with different institutions. Therefore, it is worth noting that institution-level collaboration might not reflect the actual collaboration among institutions.
Most topics identified using LDA method are recognizable, as they are related to major issues in the research field. Due to space constraints, here we only provide interpretations of some representative topics.
Topic 36 and Topic 11 contain words such as “Agent”, “Mobile-agent”, “Multi-agent”, “Itinerary”, “Migration”, “Protocol”, and “Truncation”. Thus, Topic 36 and Topic 11 pertain to
Topic 32 discusses
Topic 1 discusses
Topic 4 mainly focuses on
Based on topic distributions, we found that
In the thematic analysis, the optimal number of topics was selected as 40 using a statistical measure of how well the model fits the data. However, mechanical reliance on statistical measures might lead to the selection of a less meaningful topic model [
Through the AP clustering analysis of the 40 topics, 8 clusters were identified, i.e.,
To the best of our knowledge, this study is the first to thoroughly explore the research status of the NLP empowered mobile computing research field from a statistical perspective. The study provides a comprehensive overview and an intellectual structure of the field from 2000 to 2016. The findings can potentially help researchers, especially newcomers, systematically understand the development of the field, learn the most influential journals, recognize potential academic collaborators, and trace research hotspots.
For future work, there are several directions. First, more comprehensive data should be included. Although WoS is widely used for bibliometric analysis due to its high authority, some relevant conference proceedings are not yet indexed in WoS. Second, we intend to employ different data clustering methods and compare their results for deeper cluster analysis.
We conducted a bibliometric analysis of natural language processing empowered mobile computing research publications from Web of Science published during the years 2000–2016. The literature characteristics were uncovered using a descriptive statistics method. Geographical publication distribution was explored using a geographic visualization method. By applying a social network analysis method, cooperation relationships among countries/regions, affiliations, and authors were displayed. Finally, topic discovery and distribution were presented using an LDA method and an AP clustering method. We believe the analysis can help researchers more systematically comprehend the collaboration patterns, the distribution of scholarly resources, and the research hot spots in the field.
Tianyong Hao and Yi Zhou are the corresponding authors.
The authors declare that they have no conflicts of interest.
The work was substantially supported by grants from the National Natural Science Foundation of China (no. 61772146), the Innovative School Project in Higher Education of Guangdong Province (no. YQ2015062), the Science and Technology Program of Guangzhou (no. 201604016136), and the Major Project of Frontier and Key Technical Innovation of Guangdong Province (no. 2014B010118003).