A Corpus-Based Study of Public Attitudes towards Coronavirus Vaccines

Since the beginning of 2020, COVID-19 has been sweeping the world on an unprecedented scale. As an important means of fighting the virus, vaccines have provoked heated discussions. Motivated by this practical concern, the present study aims to contribute to the understanding of public opinions about vaccines, which may provide implications for governments in making and implementing related policies. This research adopts a corpus-based approach in conjunction with Critical Discourse Analysis (CDA). Data for this study, drawn from the Coronavirus Corpus, show what people are actually saying in online newspapers and magazines in 20 different English-speaking countries. The collocations and frequency of the word “vaccine” are examined in context through concordance lines. Overall, this study reveals that the collocates of “vaccine” can be divided into several categories, and people’s major concerns about COVID-19 vaccination include global progress, equality, and the latest development.


Introduction
Outbreaks of infectious diseases such as Ebola, SARS, and H1N1 flu are common around the world. Since the first half of 2020, Coronavirus has been sweeping the world on an unprecedented scale, affecting every aspect of people's lives. The outbreak was first reported in Wuhan, a major city in China's Hubei province. In a very short time, the disease reached almost every part of China. As a result of mass migration during the Chinese Lunar New Year, the virus began to spread abroad. The World Health Organization (WHO) declared it a pandemic on March 11, 2020. At present, the global Coronavirus pandemic is still ongoing, posing a serious threat to the lives and safety of people all over the world. To some extent, we have entered a new postcorona era.
With no particularly effective drugs to treat Coronavirus, the development of vaccines has been racing against time since the outbreak. As an important means of fighting the virus, the development of vaccines has attracted the attention of people all over the world. However, although laboratories and medical institutions are vigilant in developing a vaccine, their attempts are not effective enough. Besides, because of the lack of knowledge and information about fatalities, diagnosis, medicines, and vaccines of COVID-19, unnecessary fear and nervousness are generated [1]. These factors led to a war of words and an exchange of blame. Governments, public health agencies, and awareness organizations must be prepared to tackle hesitancy and improve vaccine awareness so that, where possible, the population supports immunization. Antivaccination groups are now lobbying against the need for a vaccine in several nations, with others questioning the presence of COVID-19 entirely. Misinformation transmitted across different networks may have a major influence on acceptance.
However, there is only very limited research on public attitudes towards COVID-19 vaccines. Therefore, motivated by this practical concern, the present study aims to contribute to the understanding of public opinions about vaccines, which may provide implications for governments in making and implementing related policies.
This research uses a corpus-based approach in conjunction with Critical Discourse Analysis (CDA), following the Discourse Analytical Perspective in van Dijk [2], to probe public attitudes towards the COVID-19 vaccine. Data for this study, drawn from the Coronavirus Corpus, show what people are actually saying in online newspapers and magazines in 20 different English-speaking countries. The corpus was first released in May 2020. Today, it contains about 901 million words, and it keeps growing by 3-4 million words every day. It is the definitive record of the social, cultural, and economic impact of Coronavirus in 2020 and beyond. The collocations and frequency of the word "vaccine" can be examined in context through concordance lines.
Through word frequency, word collocation, sentiment analysis of adjectives, and other research methods, the public's attitude towards Coronavirus vaccines can be analyzed and perceived [3]. In addition, since the media is the main source from which the public acquires information about vaccine development, its role in describing the current phenomenon is also worth exploring. Specifically, this paper focuses on the following research questions: (1) What are the most common adjective collocations for "vaccines"? (2) How are "vaccines" portrayed contextually in online discourses? (3) What is the general picture of the public attitude towards Coronavirus vaccines?
This study reveals that the collocates of the word "vaccine" can be divided into several categories, indicating that people's major concerns about COVID-19 vaccination include global progress, equality, and the latest development. The rest of this paper is organized as follows. In Section 2, the theoretical framework and related studies are reviewed, introducing the history of vaccines, the research methods of CDA and corpus linguistics, and their applications in studies related to vaccines. In Section 3, the instruments used and the data collection procedures for this study are described, while in Section 4, the results of data analysis are reported. In Section 5, the research questions put forward in Section 1 are answered, policy implications are proposed for governments and other organizations, and further research directions are suggested accordingly.

Controversy over Vaccines and Vaccination.
Vaccination ranked first among the ten greatest public health achievements of the 20th century [4], since vaccines have saved millions of lives by preventing contagious diseases from spreading [5]. Vaccines are regarded as an important guarantee to maintain social security and improve social productivity. Smallpox vaccination was mandatory in Europe and North America in the 19th century; in the 20th century, certain vaccinations were also included in the admission requirements of public schools [6].
However, vaccination is controversial. Many of today's problems date back to the 1790s, when the first human vaccine was produced by Edward Jenner to prevent smallpox ([6]: 612). Since then, arguments over safety, money, and proper immunization schedules have aroused growing concern from the public, including parents, the medical profession, policymakers, and the media. Their attitudes vary from "awe of a seeming scientific miracle to skepticism and outright hostility" ([6]: 613). Margaret Chan, the Director-General of the World Health Organization, once described the wide distrust of vaccination as "worrisome" ([7]: 1151).
Increasing anxiety and panic have led some people to refuse vaccination. The antivaccination movement began to gain support in the United States at the end of the 19th century, when smallpox broke out. Parents' anxiety and protests were triggered by the smallpox vaccine itself, which they felt violated their personal rights. Tensions escalated when the government issued a mandatory vaccination program. More than 100 years ago, Jacobson, a resident of the city of Cambridge in Massachusetts, declined to be vaccinated against smallpox, arguing that the law violated his right to care for his own body. The Court dismissed the challenge raised by Jacobson. In order to protect the public's welfare, the U.S. Supreme Court ruled in 1905 that, in the case of infectious disease, the state of Massachusetts had the authority to protect the public with mandatory laws. That was the first U.S. case on the power of states in public health law [8]. There is often a dispute between preserving individual liberties and protecting public health, and a major challenge is how to balance individual rights with community needs. Certain religions and value systems have also fostered alternative views on vaccines. In general, moral opposition to vaccination is based on the ethical dilemmas involved in the use of human tissue cells to manufacture vaccines and on the conviction that such drugs, blood, or tissues should not be received from animals and that people should be cured by God or by natural means [9].
In the mid-1970s, an international dispute arose in Europe, Asia, Australia, and North America over the immunological protection of DTP [10]. The UK was once again the center of antivaccination protests nearly 25 years after the DTP controversy, this time against the MMR vaccine. As Johnston [11] observed, most opponents of vaccines are not entirely opposed to immunization. What they are concerned about is the safety and efficacy of vaccines. Therefore, they resist only some specific vaccines with potential safety hazards.

CDA and Corpus Linguistics.
Following the Discourse Analytical Approach in van Dijk [12], the CDA framework analyzes media texts not only at the level of the text but also in terms of their context. It has been described as a sort of study of dialogue that focuses on the examination of the social motivations or philosophy behind a speaker's language choices in a debate. The reason for using CDA to analyze these texts is that CDA is an approach to discourse research aimed at analyzing the use of language, whether spoken or written, in daily conversation. From this perspective, language is seen as a social activity [13] that reflects not only other social practices but also elements such as domination, opposition, control, and ideology [14].
Much of the research that has combined CDA with Corpus Linguistics (henceforth CL) has placed its emphasis on "the implications of lexical choices in the text" ([15]: 16) after analyzing ideological ideas in corpora. According to Hunston [16], such studies usually choose high-frequency words as their research emphasis. The use of CL tends to establish integrity and trustworthiness for the research results, since the researcher tries to gather a specialized corpus that is representative of the text type [16]. The textual study analyzes the language that writers use to express their views, together with its semantic and pragmatic meaning [17]. However, a qualitative method still needs to be adopted to analyze the newspaper's fiscal, political, and cultural values. The purpose of CL is not to discover the meaning of isolated words that are out of context. Instead, in a specific sense, the purpose is to identify patterns of word items used in discourse (consistent lines) and words that occur together (collocations) [17].

CDA and Media Discourse.
This research focuses on the analysis of reports in online newspapers, a typical discourse type. As the main function of news reports is to give definitions and labels to people's lives, their expression is the key to portraying real life. From the angle of linguistics, the most comprehensive study was done by Bednarek [18], who believes that the essence of media discourse is "manufactured." Bednarek and Caple [19] also show the embedded bias of newspapers. The media discourse of this study refers to texts about vaccination in the online magazines and newspapers that are deposited in the corpus. Their creators are journalists, and their readers are the general public. This research will concentrate on the discourse, adopt the method of van Dijk, and highlight the theme.

Related Studies.
As Coronavirus started to spread rapidly, the scientific community was motivated to work together in order to gather, organize, and analyze data. At the beginning of the outbreak, most of the scientific research was focused on biology or medicine, and linguistic research was rare.
Katermina and Yachenko [20] and Kim [21] examined the linguistic phenomena related to COVID-19 in English mass media texts to determine if mass media creates, reproduces, and transmits axiological values about COVID-19. The research studied the collocations of "COVID-19," "Coronavirus," "virus," and "disease." Their results revealed that COVID-19 is widely discussed in the mass media in many ways. For example, some conceptualize the virus as war. The researchers concluded that the COVID-19 pandemic had affected culture and language to the extent that it can be considered a social calamity.
A contrastive analysis conducted by Awad AlAfnan [22] studied the coverage of COVID-19 in two newspapers, America's Washington Post and China's People's Daily. It found that mass media expressed biased opinions in their coverage and declaration. In the study, both CDA and CL approaches were used, which revealed that "the Chinese virus," "Wuhan Virus," and "KongFlu" were used to refer to COVID-19. In the Washington Post, these labels were focused on creating a derogatory hollow among Asian Americans.
Azizan et al. [23] performed a study based on 15 Facebook posts about COVID-19. They used Constructive Discourse Analysis and Critical Discourse Analysis approaches. A thematic analysis was made to codify and categorize data, which revealed the creation of positive discourse. The posts highlighted the extensive use of collective pronouns such as "we" and "us" that symbolize unity and empowerment in dealing with COVID-19 among Malaysians. Regarding Facebook posts, four positive trends have been identified, which are faith, patriotism, call for heroism, and public awareness. The researchers concluded that individual actions subsequently served as a powerful buffer against negative discourse and as an attempt to make a difference to the emergence of subtle influence.
In most of the studies, Natural Language Processing (NLP), machine learning, sentiment analysis, and text mining were used for analysis by computer software. Many different aspects of COVID-19 and its impacts have been analyzed.
There are some studies on public opinions about the introduction of the COVID-19 vaccines. Some of these studies are based on social media texts, seeking to understand the general public's acceptance of or hesitancy about vaccines and the influencing factors (e.g., [24][25][26]). There are also studies designed to understand the attitudes of medical professionals towards vaccines, aiming to provide advice and guidance to nonprofessionals (e.g., [27,28]). Compared with most of the above studies, the most distinctive characteristic of the current study is that the texts in its corpus are news reports, drawn from the news media, journals, and magazines of 20 English-speaking countries, so it can better represent the global public attitude.

The Coronavirus Corpus.
The discourse analyzed in this study is from English Corpora (https://www.englishcorpora.org/). It is the most widely used collection of corpora in the world. Currently, the corpora are visited by more than 130,000 users every month from over 140 countries worldwide. Besides, hundreds of universities worldwide have applied for academic licenses, which allow their students and scholars to use the expanded and additional functions of the corpora.
The Coronavirus Corpus selected for this study is a subset of the NOW Corpus (News on the Web), a 14.3-billion-word corpus based on online newspapers and magazines from 2010 to the present time, with 180-200 million words added each month.
As claimed in the introduction of the corpus on the website, the Coronavirus Corpus can be regarded as "the definitive record of the social, cultural, and economic impact of the coronavirus (COVID-19) in 2020 and beyond" (https://www.english-corpora.org/corona/).

Data Collection and Analysis.
Data between January 2020 and the end of April 2021 were collected from the abovementioned Coronavirus Corpus. Two functions of this corpus were used in this study: the frequency list and collocation analysis. The frequency list displays the frequency of the searched words or phrases in 10-day increments since January 2020, which not only provides the total frequency of the retrieved words but also indicates how the frequencies change over a given period of time.
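The frequency list described above is produced by the corpus interface itself. As a rough illustration of the underlying idea only, the following minimal sketch groups dated texts into 10-day bins and normalizes the counts per million words. The function names and the input format (pairs of a date and a token list) are assumptions made for this example and are not part of the corpus tools.

```python
from collections import Counter
from datetime import date


def per_million(count: int, total_words: int) -> float:
    """Normalize a raw count to a frequency per million words."""
    return count / total_words * 1_000_000


def frequency_by_bin(docs, word, start, bin_days=10):
    """Per-million frequency of `word` in consecutive `bin_days`-day bins.

    `docs` is an iterable of (date, tokens) pairs; this input format is an
    assumption of the sketch, not the export format of the corpus.
    """
    hits = Counter()    # occurrences of the word per bin
    totals = Counter()  # total number of tokens per bin
    for day, tokens in docs:
        bin_index = (day - start).days // bin_days
        totals[bin_index] += len(tokens)
        hits[bin_index] += sum(1 for t in tokens if t.lower() == word)
    return {
        b: per_million(hits[b], totals[b])
        for b in sorted(totals)
        if totals[b] > 0
    }
```

Normalizing per million words makes bins of different sizes comparable, which matters here because the corpus grows unevenly over time.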
Collocation analysis was conducted to investigate people's attitudes towards COVID-19 vaccines; its results list the words that usually appear together with the searched word in context. Baker (2006: 114) emphasizes that collocation has an ideological nature and that the recurrent collocation of lexical items demonstrates the association of two concepts in people's minds. These words can be found on the node word's left (L) side and right (R) side. In this study, the span was set to two words to the left (2L) and two words to the right (2R) of the node word "vaccine*" (words starting with "vaccine," including "vaccines" and other compounds). The minimum frequency of collocation was set to 40, so the words shown in the list must have appeared around the node word over 40 times in the corpus. These settings are meant to filter out words that have no real meaning or do not occur frequently enough. Besides, in this research, mutual information 3 (MI3) is adopted to measure the effect size of associations between a node word and its collocates (see [29] for detailed discussion).
Figure 1 shows that the frequency of the word "vaccine" in the NOW Corpus has increased significantly since 2020. In 2020, its frequency per million words (PER MIL) was 203.06, 8-11 times that of the previous years, while in 2021, the frequency per million words was 613.87, three times that of 2020. There is no doubt that, while the epidemic is raging all over the world, people's attention to vaccines is bound to increase.
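The collocation settings described above (a 2L/2R span, a minimum frequency, and MI3 as the association measure) can be illustrated with a short sketch. It is not the implementation used by the corpus interface. It assumes one common formulation of MI3, namely log2(O^3 / E) with expected frequency E = f(node) × f(collocate) / N, and it treats the threshold as "at least `min_freq`". The function name and the token-list input are likewise hypothetical.

```python
import math
from collections import Counter


def collocates(tokens, node_prefix="vaccine", left=2, right=2, min_freq=40):
    """Rank words found within `left`/`right` tokens of any node word.

    A node word is any token starting with `node_prefix` (mirroring the
    "vaccine*" query). The score is MI3 = log2(O**3 / E), where
    E = f(node) * f(collocate) / N; this formulation is an assumption
    of the sketch.
    """
    lowered = [t.lower() for t in tokens]
    n = len(lowered)
    word_freq = Counter(lowered)
    node_freq = 0
    observed = Counter()
    for i, tok in enumerate(lowered):
        if not tok.startswith(node_prefix):
            continue
        node_freq += 1
        # Collect the words in the window around the node, excluding the node.
        window = lowered[max(0, i - left):i] + lowered[i + 1:i + 1 + right]
        observed.update(window)

    results = []
    for word, o in observed.items():
        if o < min_freq:
            continue  # drop collocates that are too rare
        expected = node_freq * word_freq[word] / n
        results.append((word, o, math.log2(o ** 3 / expected)))
    # Highest MI3 first.
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

Cubing the observed frequency gives more weight to frequent co-occurrences than plain mutual information does, which is why MI3 is less likely to promote rare word pairs to the top of the list.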

Frequencies.
From Figure 2, it is clear that the frequency of the word "vaccine" has been increasing since January 2020. Vaccination has been put on the agenda in Asian countries such as China as well as in European and American countries, and people around the world are paying increasing attention to vaccines and talking widely about them. In December 2020, many countries, including China and the United States, successively approved the conditional marketing of vaccines [32,33]. Therefore, since December 2020, the frequency of the word in the corpus has increased sharply and has remained high so far.

Collocation Analysis.
The result of collocation concordance is shown in Figure 3. Adjectives modifying the word "vaccine" are grouped into three emotional categories, i.e., positive, negative, and neutral.

Positive Vaccine Portrayals.
Based on the concordance results, the top 100 adjectives are taken to illustrate the point. Among the top 100 most frequently used adjectives, the positive and optimistic ones are "available," "effective," "safe," "successful," "promising," "leading," "good," "viable," "positive," and "eligible." Their specific rankings and frequencies of use are shown in Table 1.
Table 1 shows that the frequencies of the first and second adjectives, "available" and "effective," are very high, and that "safe" (ranked sixth) and "successful" (ranked tenth) are also a focus of public attention. This is quite reasonable: for a vaccine, as a special public security product, safety and effectiveness are two characteristics that cannot be ignored. In order to gain a deeper understanding of how these words are used contextually, we referred to the concordance lines. According to Barnbrook [34], the power of concordance lines lies in the way "of placing each word back in its original context, so that the details of its use and behaviour can be properly examined" (p. 65). The following examples are a random sample of concordance lines from a total of 2042 occurrences of the word "successful":
(1) Given that at least 200,000 in the U.S. and more than a million globally have already died from COVID-19, we desperately need as many of the vaccines under development to succeed as soon as possible. # Success in dealing with COVID-19 requires far more than successful vaccine development, however. We need to eradicate the happy talk that, somehow, the pandemic will end when a vaccine is approved. #
(2) "A successful vaccine against Sars-Cov-2 could be used to prevent infection, disease, and death in the whole population, with high-risk populations such as hospital workers and older adults, prioritized to receive the vaccination." #
(3) The global race to find a successful coronavirus vaccine continues at full pace. Meanwhile, the virus has claimed over 8 lakh lives across the world. #
All three examples above illustrate the public's desire for an effective vaccine, but they are slightly different. In the first example, we can see that the speaker wants to warn people not to rely entirely on the development of vaccines but to be prepared for a long battle.
When analyzing the specific context, we can uncover the deeper meaning behind the surface phenomena. This is an important reason why this study combines CDA with corpus linguistics: doing so makes full use of both the concordance function of corpus tools and the more detailed and critical analysis of CDA.

Negative Vaccine Portrayals.
Among the top 100, there are only five adjectives with negative ideas and attitudes (Table 2). However, it is noteworthy that the use of "potential," which ranks fifth, indicates that people still pay critical attention to the potential risks of vaccines. In addition, the use of words such as "so-called" and "adverse" reflects a high degree of public distrust of vaccines, especially in terms of the effect and quality of vaccines. This point is also well confirmed in the following specific contexts.
(1) # However, one controversial part of this potential "vaccine bubble" is that people will feel like their freedom will still be restricted unless they are part of the "bubble." #
(2) # In recent days, China has vowed to deploy a potential vaccine as a "global public good" that would be accessible and affordable. Toward that end, the World Health Organization also laid out principles to encourage collaboration and information sharing on a COVID-19 treatment. #
In the analysis of adjectives indicating negative emotions, the word "potential" is taken as an example. In one context, the word "potential" is used together with words such as risk or threat. In another context, the word "potential" suggests that vaccines will be developed in the future, and China will use vaccines for global health. The difference between these two examples lies in the two different senses in which the word "potential" is used. That is to say, the concordance results of the corpus do not distinguish between these meanings, so the senses need to be separated manually to obtain more precise and accurate results.

Neutral Vaccine Portrayals.
Most adjectives do not have a clear emotional tendency; thus, they are classified as neutral (Table 3). These neutral adjectives are divided into four categories (Table 4). The first category can be labeled as "region, country, and organization." The countries at the top of the list are "Chinese" and "Russian." In fact, China and Russia have been among the fastest in the world in terms of vaccine research and development, and their progress has attracted the attention of many countries. It is worth noting that SINOVAC, the major vaccine research company in China, is mentioned frequently. Also, the Global Alliance for Vaccines and Immunization (GAVI) is the only organization in the top 100. GAVI is a crucial public-private global health alliance established in 1999 that works with governments and nongovernmental organizations to promote global health and immunization. During the period selected for this study, it actively updated the latest vaccine research situation on its website and encouraged global vaccination. GAVI is co-leading COVAX, the vaccines pillar of the Access to COVID-19 Tools (ACT) Accelerator raised by WHO. COVAX has established a global risk-sharing mechanism for pooled procurement and equitable distribution of COVID-19 vaccines. It has transported 53 million COVID-19 vaccines to 121 participants so far and has played an important role in the transportation and distribution of vaccines around the world.
The second category is labeled as "medical terms," including information about the testing and production of vaccines and some words that are specific to the medical field of vaccines. This shows that the media consciously used medical terms and principles related to vaccines to give the public basic information about vaccines, so that the public can better understand how vaccines work.
The third category is labeled as "development process," mainly including adjectives related to the research and development process and time points of the vaccine. This indicates that the media has been paying close attention to the research and development process of vaccines all the time. This also fully reflects the urgent need for effective vaccines and the expectation that vaccine development can achieve a breakthrough.

Conclusions.
The research questions put forward in this study are as follows: (1) What are the most common adjective collocations for "vaccines"? (2) How are "vaccines" portrayed contextually in online discourses? (3) What is the general picture of the public attitude towards Coronavirus vaccines?
Next, these questions will be answered based on the analysis results obtained in this study.

Adjective Collocations for "Vaccines".
To answer the first two research questions, a total of 447 adjectives were included, each of which occurred more than 40 times. The percentage of these adjectives in each category is shown in Table 5.
Among the 447 adjectives, the number of positive adjectives was about twice the number of negative adjectives. Therefore, it can be concluded that people are more inclined to express positive and hopeful emotions when expressing their opinions on COVID-19 vaccines. Many positive expressions have appeared in reports describing the latest progress of the Coronavirus vaccines, which is bound to play an optimistic role in the fight against the Coronavirus. At the same time, the study found that many of the positive adjectives have antonyms among the negative adjectives, indicating that there are different views on the same issue among the public.
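The category percentages discussed above rest on a manual classification of the adjectives. Once such labels exist, the shares can be computed as in the following minimal sketch. The two small word sets contain only adjectives quoted in this paper and stand in for the full manual labeling, which is not reproduced here; the names `POSITIVE`, `NEGATIVE`, `label`, and `category_shares` are hypothetical.

```python
from collections import Counter

# Illustrative subsets only; the full labeling of 447 adjectives was manual.
POSITIVE = {"available", "effective", "safe", "successful", "promising"}
NEGATIVE = {"potential", "so-called", "adverse"}


def label(adjective: str) -> str:
    """Assign an adjective to a category; anything unlisted is neutral."""
    word = adjective.lower()
    if word in POSITIVE:
        return "positive"
    if word in NEGATIVE:
        return "negative"
    return "neutral"


def category_shares(adjectives):
    """Percentage of adjectives falling into each emotional category."""
    counts = Counter(label(a) for a in adjectives)
    total = sum(counts.values())
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    return {c: 100 * counts[c] / total
            for c in ("positive", "negative", "neutral")}
```

As the discussion of "potential" shows, a lookup of this kind cannot separate different senses of the same word, so the labels still need to be checked against the concordance lines.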

Public Attitude towards Vaccine.
The third research question, "What is the general picture of the public attitude towards Coronavirus vaccines?" can be answered based on the results of collocation analysis. Public concerns about vaccines fall into three categories. The first is the globalization of vaccines. As the first category of adjectives mentioned in Section 4, among the 447 adjectives, frequent occurrences of global, national  The second category is about the equal distribution of vaccines. Adjectives such as equitable, equal, and fair, which frequently appear in context, mostly reflect people's worry about whether everyone can be immunized. As a special public health measure, the distribution of vaccines can be a test for the governments of different countries and for WHO, and an improper sequence of vaccination may even lead to social disorder. The following example indicates concern of this type.
Climate activist Greta Thunberg says governments, vaccine developers, and the international community must "step up their game" to fight global vaccine inequity. The Swedish teen who inspired the Fridays for Future environmental movement cites estimates that one in four people in high-income countries have received coronavirus vaccines, compared with one in 500 in the middle- and lower-income countries. # Thunberg says it is "completely unethical that high-income countries are now vaccinating young and healthy people if that happens at the expense of people in risk groups and on the front lines in low- and middle-income countries." #
Finally, most people pay attention to the effect of vaccines. One group of frequently used adjectives consists of words like latest, novel, and update, which express concerns about the latest situation and progress. A careful study of the context of these adjectives shows that the discourse containing them is mainly about whether the vaccines now under development can resist the variants of the virus. In fact, this is a big problem we are facing. The Coronavirus is mutating at such a rapid rate that even vaccinated people may be infected again. The timeliness of vaccines has become one of the concerns of the public. The following statement is an example of this type.
# "The problem with a novel virus is it has got so much room to grow and shift and change to optimize itself," said Harvard School of Public Health's Dr. Michael Mina. "The question is how quickly is it going to keep updating itself" "Whatever mutation happens, the vaccine manufacturers can keep up with it, but given the multiplicity of strains circulating at one time, that may be a challenge," said Dr. Schaffner. # Dr. Mina, from Harvard, said he is less confident that the existing vaccines can be updated for any and all future variants.

Discussions.
This study mainly examined the public's attitude towards COVID-19 vaccines and the focus of their attention. It confirms the positive portrayal of those who choose to be vaccinated. In addition, this study analyzed the neutral adjectives and divided them into four categories. Such classification can help policymakers and healthcare practitioners to understand people's concerns about vaccines and help the media to understand what people most want to know. Based on these findings, it is recommended that governments and public health agencies improve their strategies to better communicate the benefits of vaccination to the public, i.e., to increase vaccine confidence. Such strategies could include increasing the transparency of data and introducing data from scientific studies to make vaccines more credible. For example, Chinese media has constantly reported the development of the COVID-19 vaccines in China and around the world from a very early stage, and there are a lot of discussions about this issue on social media platforms like Sina Weibo (a blogging platform where people leave comments and express attitudes) [25]. Moreover, public attitudes are influenced by many factors. Thus, it is suggested that strategies should be adopted to eliminate the public's negative emotions about vaccination and improve positive feelings at the same time [35]. The strategy called "prosocial motivation," which Jordan et al. [36] and Heffner et al. [37] found effective in making people take preventive measures, belongs to this category; it emphasizes how vaccination will protect the community and will bring life back to a state where everyone can be intimately connected. This position is supported by the study of Lyu et al. [38], which indicates that improving the public's pandemic experience and increasing their sentiment scores can promote their acceptance of the COVID-19 vaccines.
This study is limited to examining data from January 2020 to April 2021.
With the continuous development of the epidemic, human beings have entered the postvaccine era, with various virus variants emerging in an endless stream. It is believed that people's attitudes towards vaccines will also continue to evolve. At the same time, this corpus only contains data from a few countries, so a corpus composed of international newspapers and media will also be a valuable supplement to the study. Now, the practice of vaccination has entered a new stage, and the governments of all countries are vigorously advocating and encouraging people to be vaccinated against COVID-19. Although China adheres to the principle of voluntary vaccination, people's enthusiasm about vaccination is very high. However, at the same time, there are still people who refuse to receive the COVID-19 vaccine because of concerns about its safety. Lin et al. [40] investigate and summarize the steps taken by the Chinese government to control the spread of COVID-19 and reopen locked-down cities, emphasizing the effect of awareness diffusion in this process. Therefore, further research can be conducted to probe into the factors underlying different attitudes and ways to promote people's awareness so that both vaccines and vaccination policies can be improved accordingly.

Data Availability
The data analyzed in this study were taken from the Coronavirus Corpus from English Corpora (https://www.englishcorpora.org/).

Conflicts of Interest
The authors declare that they have no conflicts of interest.