Ensemble Text Summarization Model for COVID-19-Associated Datasets



Introduction to Text Summarization
As web development has advanced in recent years, the need for efficient text summarization techniques for a variety of practical uses is greater than ever. Text summaries are typically used to extract the key information from text documents and provide a meaningful summary of the content, and they are also seen as a beneficial remedy for information overload. Text summarization seeks to extract an appropriate representative subset of the provided text documents and to find their inherent semantic meaning by determining the key subjects of the textual content from conceptual viewpoints. The technical method of extracting and abstracting precise brief summaries from a large text source is known as text summarization [1]. Text summarization (TS) typically uses one of two main mechanisms: extractive or abstractive summarization. Extractive TS locates, highlights, and extracts the essential phrases from the source text before combining them to summarize the entire text. It is simple and consistently maintains proper grammatical structure. It has primarily been employed for lengthy texts that offer more focal points for summarization. A focal point is a location in the text passage to which the summarization process should pay special attention. Abstractive summarization (AS), on the other hand, summarizes documents while maintaining the keywords and phrases of the original content. It reduces some of the grammatical irregularities that extractive summarization (ES) produces.
Although the summary appears to be accurate, it is actually quite repetitious. Additionally, it was not effective for large text documents [2], since a single fixed-length vector used to summarize the given text sequence loses a significant amount of information. This shortcoming primarily affects text summarization accuracy. The encoder-decoder neural network (NN) model has performed very well when used with abstractive summarization for brief text [3]. For a long input text sequence, a multilayered long short-term memory (LSTM) network is used; it remembers the long sequence of text when predicting words, which effectively addresses the problem. For some predefined datasets, these models performed well. However, there are still significant research gaps and restrictions. The main deficiencies and limitations are as follows. The word embedding effect is inappropriate because the input datasets vary in their levels of ambiguity, which further prevents semantic textual entailment from working. Similarly, many NLP-based initiatives produce erroneous results due to a lack of contextual word representation. The important text summarization models are shown in Figure 1. By employing these summarization models (Seq2Seq, BERT, Attention), this research aims to derive valuable insights from the COVID-19 dataset, making it more accessible and comprehensible for analysis and decision-making in the context of the pandemic.
In light of the aforementioned drawbacks, we used the pre-trained language model known as Bidirectional Encoder Representations from Transformers (BERT) [4], which is frequently applied in natural language projects. Trained on a large data corpus, BERT is well equipped to provide superior word embeddings for sequences. The semantic importance of text documents can be efficiently estimated using the similarity of their vectors. For natural language processing (NLP) applications, Word2Vec (Word Vector) [5], GloVe (Global Vectors) [6], and BERT [7] are among the most often used word embeddings. These models take into account a number of strategies to condense textual information about the coronavirus. This work compares the performance of the models by presenting the benefits and drawbacks of each. Attention neural networks are used to construct an ensemble model. This research advances the field of text association by offering a novel COVID-19 text summarization model that has surpassed prior experiments. The entire process takes sentences from COVID-19 datasets, retrieves their distributional semantics using techniques such as Word2Vec, GloVe, and BERT, and then applies hierarchical clustering to group the sentences by semantic similarity. This sort of approach has been widely used for NLP tasks such as topic modeling, text summarization, information retrieval, and sentiment analysis. Hence, in this research work, we propose an ensemble model designed to be efficient in terms of both memory occupancy and computation while providing reasonably good performance for the assigned task.
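The grouping step described above can be illustrated with a minimal sketch. It assumes average-linkage agglomerative clustering over cosine similarity with a hypothetical similarity threshold of 0.8; the paper does not state the linkage criterion or threshold, and the toy vectors stand in for real Word2Vec, GloVe, or BERT sentence embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def agglomerative_clusters(vectors, threshold=0.8):
    """Merge the two most similar clusters (average linkage) until no
    pair of clusters reaches the similarity threshold (an assumed value)."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        sims = [cosine(vectors[i], vectors[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = linkage(clusters[x], clusters[y])
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy 3-dimensional "sentence embeddings" (hypothetical values).
toy_embeddings = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.1],
]
groups = agglomerative_clusters(toy_embeddings, threshold=0.8)
print(groups)
```

With these toy vectors, the first two and the last two sentences end up in separate groups; a representative sentence could then be drawn from each group.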

Related Works in Text Summarization and the Motivation
With the emergence of COVID-19, many research institutes such as the Allen Institute for AI [8] [...] In this connection, the authors of [10] deployed an NLP-based medical inference engine called WellAI to accumulate medical concepts with appropriate ranking mechanisms and produce a structured list of concepts with high precision and recall scores. The Tmcovid tool [11] was effectively used to populate sufficient bio-related concepts. Later, the sequence-to-sequence models proposed by the authors of [12] gained massive research attention for NN-based NLP systems and produced high-quality results with high precision.
Earlier, research communities widely used LSTM-based approaches in applications such as image captioning, text categorization, entity classification, and speech recognition. LSTM is a variant of the recurrent NN (RNN). It has been widely used for text summarization and made summarization possible, particularly abstractive summarization; it also scores comparatively well on extractive summarization. The authors of [13] proposed a novel approach that predicts the core parts of the input and applies the attention mechanism with suitable transformers to summarize and translate the given input effectively. The summarization process is mostly extractive because it can effectively detect the potential keywords of the input through weighting and ranking mechanisms [14]. Extractive summarization [15] is essentially a reproduction of the top-k-ranked sentences. The Document Understanding Conference (DUC) 2003 and 2004 [16] competitions standardized abstractive summarization and enabled practitioners to gather popular news stories on diverse topics from different sources and later to analyze the stories for summarization correctness.
In 2004, DUC-2004 recognized TOPIARY [16] for its attempt to couple linguistic techniques with unsupervised algorithms to provide standard compressed results. Later, DUC-2004 was used to recognize some abstractive summarization processes. It was also used to formulate the conventional phrase table based on machine translation approaches, compression using weighted tree transformation rules [17], and quasi-synchronous grammar approaches [18]. Latent semantic analysis (LSA) [19] is an algebraic learning algorithm that has been predominantly used in research fields such as information retrieval, text summarization, entity categorization, and image classification. Because it combines statistical and algebraic approaches, LSA can detect the inherent structure of words and their context by applying singular value decomposition (SVD) [20] to its input matrix and document representations. Conventional methods such as Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) [21] did not give a correct document matrix for the input. Hence, their results were not considered for further evaluation.
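To make the TF-IDF baseline mentioned above concrete, the following is a minimal sketch of one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency). The weighting variant, the whitespace tokenizer, and the toy documents are assumptions for illustration; the paper does not specify which TF-IDF formulation was tested.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using
    tf = count / document length and idf = ln(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical guideline snippets.
docs = [
    "covid vaccine guidance",
    "covid travel guidance",
    "vaccine dose schedule",
]
scores = tf_idf(docs)
print(scores[1])
```

In the second document, "travel" receives the highest weight because it occurs in no other document, whereas "covid" and "guidance" are shared and therefore down-weighted.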
Both BoW and TF-IDF models usually required some external documents to calculate the sentence similarity and process the results based on the document matrix generated for the supplemented document. Later, word embedding methods were introduced to find the context associated with each pair of words and to match words to their semantic roots. Word embedding continuously learns the vector representation of words and identifies syntactic and semantic inferences through neural network techniques. The advantage of the word embedding method is that it does not require any external document for knowledge evaluation and infers patterns from the context given in the entire document. However, inferring the correct context for the words requires some unlabeled input data for establishing the semantic space. Through its semantic space, the word embedding method can precisely define each word's meaning in the processed input document and infer the correct contexts associated with every document pair. Recently, the word embedding method has been extended to sentence- and document-level embedding [22]. Deep NN methods have gained immense popularity in recent times and have been widely used in applications such as text summarization, entity disambiguation, and fake news detection. More recently, deep neural network models have also been used for abstractive and extractive summarization. The extractive query-oriented summarization model can create a feature space from term frequencies and develops a local word vectorization for each vocabulary item in the input sentences. Likewise, the authors of [23] introduced an encoder-decoder model that increases the capabilities of convolutional neural networks (CNNs) through its attentional model. The CNN algorithm has been used predominantly in image processing, but in recent years its use on sequence data analysis, such as named entity recognition (NER) and natural language processing (NLP), has grown and advanced, specifically in the field of artificial intelligence. This model discards the full-sequence order while producing the hidden representation of the input document and fixes the number of iterations based on the n-gram model principle. Similarly, the authors of [24] used an RNN-based sequence model for ES, in which the top-k sentences were ranked based on a binary decision-making process. The authors of [25] attempted to use the attention mechanism to compute query relevance based on sentence ranking, which converged randomly for every iteration. Table 1 lists some of the standard NLP methods and techniques used for text summarization.

Comparative Analysis of Extractive Text Summarization (ES) Methods
In this study, we selected three baseline models for extractive summarization (BERT, the sequence-to-sequence mechanism, and the attention mechanism) and then present the ensemble approach to determine the variation in accuracy. In the experimental outcomes, we noted variations among the baseline models. We also examined the modest variations among the three baseline techniques that correspond to the NN translation model. In the NN model [31], the encoder is a bidirectional gated recurrent unit (GRU)-RNN, while the decoder is a unidirectional GRU-RNN. The decoder has the same hidden-state size as the encoder and attends over the source hidden states, and a SoftMax layer generates the target words from the extended vocabulary. The main motivation for employing a GRU-based encoder and decoder lies in the ability to train a unified model that operates on both source and target text sentences simultaneously, accommodating varying lengths of input and output text sequences. In Figure 2, these encoders and decoders are integrated into various deep learning models, including Seq2Seq, Attention, and BERT. The bidirectional aspect is an integral part of the encoder's architecture, incorporating self-attention. The precise position and order of words within a sequence play a pivotal role in comprehending the overall meaning of sentences, especially in text summarization. In the encoder, word embeddings take into account both word positioning and sentence order. A complete description of each of the three baseline models, along with an explanation of their standard operating methods and pertinent empirical analysis, is provided in the sections that follow. This research project entails experimenting with the pre-trained Attention, Seq2Seq, and BERT models. The ranking of the final summary text of the input sentences was determined using an ensemble of these three summarization models, and the top N summaries were gathered for performance comparison. The ensemble model and the baseline summarization models have also been compared at various ROUGE levels. The ensemble model for the task of text summarization is shown in Figure 2.

International Journal of Intelligent Systems
Table 1: Some of the NLP methods and techniques used for summarization.

Extractive Summarization (ES) Using BERT
Extractive summarization is highly difficult for many NLP systems, as noted by the authors of [32], but it has made good progress in recent years thanks to the development of the BERT model, which provides improved embeddings with transformer models. A good summarizer should be able to scan the full text for intrinsic meanings and select sentences based on how the articles are internally embedded. The TextRank model [33] was chosen as the foundational strategy to ensure accuracy. The key problem in evaluating text summarization is its dependence on specific benchmarked references or predetermined gold summaries. Finding the corpus needed to evaluate fresh information on unique subjects is becoming increasingly difficult. Therefore, the accepted Recall-Oriented Understudy for Gist Evaluation (ROUGE-N) metric is used as the standard measure for summary evaluation. ROUGE-N estimates the N-gram overlap between the gold summaries and the candidate summaries. To evaluate machine-generated content efficiently, ROUGE-N counts the matching words. The BERT model is particularly effective in parsing the meaning of the provided articles and papers; during preprocessing, stop words are eliminated, words are stemmed to their root terms, and all text is lowercased for simple transformation. Table 2 lists a few of the pre-trained BERT models.
To accomplish this objective, we tokenized the input material using the spaCy package [40] and embedded the significant tokens using BERT through the sentence transformer package to acquire insights into the provided articles/documents. The average number of tokens in the sentences is used to establish the document's standard mean, and the meaningful tokens are given more weight. To disambiguate and identify each sentence in the article or document, we additionally gave each sentence a weight. The algorithm that determines the score for each article category using the binning technique is responsible for the absolute labeling of the extracted summary. It also determines the exact match of the extracted summary through the subsequent stages. We significantly changed the BERT model to meet the needs of our extractive summarization. When creating the summary for a given document, we first determine the weight of each sentence in the document using the dot product between C and D.
After that, we order the sentences from the highest to the lowest weight and select the top k for the summary. The link between the summarized text and the original input material is determined using Recall-Oriented Understudy for Gist Evaluation (ROUGE), a common scoring algorithm. In essence, the precision calculation is done to guarantee the ROUGE-N accuracy rate. If a trigram of a candidate overlaps with the summary S, the dot product with C is discarded and the corresponding candidate sentence in D is removed. The ROUGE score serves as a standardized method for evaluating the performance of text summarization and text translation models. Several ROUGE variants exist to gauge the degree of correspondence between a generated summary and its reference summary, including ROUGE-N, ROUGE-L, and ROUGE-S. This metric provides a relative measurement that can be compared to human evaluations. In ROUGE-N, "N" is the N-gram length; for example, 1 or 2, denoting unigrams and bigrams, respectively. In ROUGE-L, "L" signifies the longest common subsequence (LCS) of words that match between the candidate text and the reference summary, with a strong emphasis on preserving word order. ROUGE-L is used when preserving word order in sentences is crucial, as is often the case in text summarization. The maximum length for position embedding in the original BERT model [41] is 512. We overcame this restriction by incorporating a few extra position embeddings in other encoder settings. To distinguish between distinct sentences in the input document, we additionally included some intermediary segment embeddings.
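As an illustration of the ROUGE-L idea described above, the sketch below computes the longest common subsequence (LCS) of words between a candidate and a reference, and derives precision, recall, and a balanced F-score from it. The whitespace tokenizer, the equal weighting of precision and recall, and the example sentences are assumptions made for illustration only.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists,
    computed row by row with dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return (precision, recall, F) based on the word-level LCS.
    F is the balanced harmonic mean (an assumed weighting)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    precision = lcs / len(cand) if cand else 0.0
    recall = lcs / len(ref) if ref else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical candidate and reference summaries.
p, r, f = rouge_l("wear a mask in public places",
                  "wear mask in all public places")
print(p, r, f)
```

Because the LCS respects word order but allows gaps, the five in-order words "wear mask in public places" are matched here even though neither sentence contains the other verbatim.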
Table 3 reports the performance of the BERT model that we ran on the COVID-19 dataset, together with the total running time of every forward pass of the BERT model.
In comparison to the other two models discussed in this study, the BERT model used during the summarization process achieved 40% accuracy while using only 20% of the test data. Finally, the dense layer of the model produces the condensed summary of the input text, while the dropout layer prevents overfitting. While training the model over many epochs, we employed the Adam optimizer with the cross-entropy loss function. In this study, the BERT model has been applied in two forms: BERT-base and BERT-large. BERT-base has 110 million parameters, 12 transformer layers, and 12 attention heads. BERT-large has 340 million parameters, 24 transformer layers, and 16 attention heads. The first layer is the input layer, which accepts input of the maximum length (512). This length was achieved by padding the input sentences. To prevent overfitting, the output of the transformer is passed as input to the dropout layer. Finally, the activated dense layer provides the summary of our text input.
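The padding step mentioned above can be sketched as follows. The helper name `pad_or_truncate`, the padding id of 0, and the sample token ids are hypothetical; the sketch only shows how a token-id sequence might be brought to a fixed length of 512 with an accompanying mask that marks real tokens.

```python
def pad_or_truncate(token_ids, max_len=512, pad_id=0):
    """Clip a token-id sequence to max_len and right-pad it with pad_id.
    Returns the fixed-length ids and a mask (1 = real token, 0 = padding)."""
    clipped = list(token_ids[:max_len])
    mask = [1] * len(clipped)
    padding = max_len - len(clipped)
    return clipped + [pad_id] * padding, mask + [0] * padding


# Illustrative ids only; a real pipeline would take them from a tokenizer.
ids, mask = pad_or_truncate([101, 7592, 2088, 102], max_len=8)
print(ids)
print(mask)
```

The mask lets downstream layers ignore padded positions, so that short guideline texts are not distorted by the padding values.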
The other model employed in this study is a sequence-to-sequence model with two encoder LSTM layers and two decoder layers. Here, the input sentences were padded to a maximum length of 30 before being processed through an embedding layer to create an embedding vector for each word in the input text. The output of the embedding layer then passes through two encoding LSTM layers, with padded input lengths of 300, and two decoding layers before being decoded. The attention layer produces a compressed summary of the provided test text data. The comparison results of various text summarization techniques are shown in Table 4.
Compared to other text summarization models, the BERT model has done remarkably well, as seen in Table 4. ROUGE-1 was used for the evaluation, taking into account fundamental features such as count vectorization, TF-IDF score, and the Soft Cosine Similarity measure. The algorithms listed in Table 4 were evaluated on the COVID-19 datasets, and their accuracy rates were recorded for benchmarking.

Sequence-to-Sequence (Seq2Seq) Mechanism to Summarize Text
Deep neural network models [42] have recently benchmarked their performance in text analytics, spanning several sectors. The recurrent neural network (RNN) model has mostly been employed for sequence modelling and language generation tasks. However, due to exploding gradient problems, the typical RNN model has had trouble training on datasets for text summarization. The long short-term memory (LSTM) model has typically been employed to address gradient difficulties, but it has not delivered adequate results for text summarization. Additionally, RNN-based computation had problems retaining earlier hidden states and handling long sequential dependencies. As a result, the RNN could not meet the memory and computation requirements of lengthy text document sequences. We therefore used large collections of lengthy texts as the input to the sequence-to-sequence deep model [43], with the condensed summary as the intended output. The developed model takes a long passage of text as input and outputs a concise summary of it. Assume that the input text T is a sequence of I words, T1, T2, ..., TI, drawn from a fixed-size vocabulary. The summarizer takes T as input and produces the condensed text S of length J, where S is substantially shorter than T (J < I). The straightforward sequence-to-sequence model [17] for text summarization is shown in Figure 3.
Encoder: The embedding layer first converts each word in the encoder input into an embedding word vector, giving a distributed representation of the entire sentence. In every iteration, we processed the text in both left-to-right and right-to-left directions using a bidirectional LSTM model [44].
Decoder: The decoder receives the final word of the input sentence, consumes it, and then uses a hidden layer unit to produce the output summary word. As the text is processed sequentially, the decoder feeds each generated word back as input to produce the following word in a greedy manner.
The stepwise procedure of the sequence-to-sequence (Seq2Seq) model is illustrated below:
Step 1: Let the lengthy text T be the input to the encoder and the summary text S be the output of the decoder of the Seq2Seq model. Let the top N most likely words be v1, v2, v3, ..., vN according to the decoder network output over the vocabulary V.
Text (T): In the United States of America, the coronavirus death is 1 Million.
Summary Text (S): US COVID-19 death 1M
Step 2: Given that S1 has already occurred, the next word in the sequence is predicted using the conditional probability $P(S_2 \mid T, S_1)$, by maximizing the joint probability of S1 and S2, $P(S_1, S_2 \mid T) = P(S_1 \mid T)\, P(S_2 \mid T, S_1)$.
Step 3: Similarly, the third word is determined using the conditional probability $P(S_3 \mid T, S_1, S_2)$, by maximizing the joint probability of S1, S2, and S3.
These steps are repeated until the end of the sentence is reached.
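The stepwise procedure can be sketched as a greedy decoding loop. In this sketch a hand-written probability table (`TOY_TABLE`, a purely hypothetical stand-in for a trained decoder conditioned on T) supplies the next-word distribution for each summary prefix; at every step the most probable word is chosen, and the chain-rule product of the chosen probabilities is accumulated.

```python
def greedy_decode(next_word_probs, max_len=10, end_token="<eos>"):
    """Pick the most probable next word at each step and multiply the
    step probabilities, following P(S1, S2 | T) = P(S1 | T) * P(S2 | T, S1)."""
    summary = []
    joint_prob = 1.0
    while len(summary) < max_len:
        distribution = next_word_probs(tuple(summary))
        word = max(distribution, key=distribution.get)
        joint_prob *= distribution[word]
        if word == end_token:
            break
        summary.append(word)
    return summary, joint_prob


# Hypothetical next-word distributions for one fixed input text T.
TOY_TABLE = {
    (): {"US": 0.7, "COVID-19": 0.2, "<eos>": 0.1},
    ("US",): {"COVID-19": 0.8, "death": 0.1, "<eos>": 0.1},
    ("US", "COVID-19"): {"death": 0.9, "<eos>": 0.1},
    ("US", "COVID-19", "death"): {"1M": 0.6, "<eos>": 0.4},
}


def toy_model(prefix):
    """Look up the distribution for a prefix; unknown prefixes end the summary."""
    return TOY_TABLE.get(prefix, {"<eos>": 1.0})


words, prob = greedy_decode(toy_model)
print(words, prob)
```

Greedy selection is only one decoding strategy; it maximizes each step locally and does not guarantee the globally most probable summary, which is why beam search is a common alternative.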
To summarize the COVID-19-related datasets, the hyperparameter set "transformer_prepend" was selected, and the tensor2tensor library was used for effective filtering and categorization. The most important hyperparameter differences are compared in Table 5.

Attention Mechanism for Text Summarization
Our model incorporates the attention mechanism [41] that enables the decoder to assign various weights and to review earlier words in the input sequence before generating the next word. The attention function of the decoder enables it to use contextual data pertaining to various input segments. Attention also ensures that the model uses several input segments with different weights, increasing the information coverage during the summarization phase [45]. When creating the relevant summary word in the output, the attention mechanism concentrates on and remembers only specific passages from the input text. Rather than encoding the input sentence into a single, fixed-length context vector, the attention model creates a context vector for each output step [46]. The attention mechanism takes into account every word in the summary output and generates only the most significant words from the input text by giving these words a higher weight. The attention mechanism [47] for condensing the content of the guidelines is shown in Figure 4.
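A minimal sketch of how an attention mechanism builds a per-step context vector is given below. It assumes dot-product scoring between the decoder state and each encoder state followed by a softmax; the source does not state which scoring function was used, and the vectors are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """Weight each encoder state by its (assumed) dot-product relevance to
    the current decoder state and return the weights and context vector."""
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[k] for w, state in zip(weights, encoder_states))
        for k in range(dim)
    ]
    return weights, context


# Three toy encoder hidden states and one decoder state (hypothetical).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_context([2.0, 0.0], encoder_states)
print(weights, context)
```

Because a new set of weights is computed for every decoder state, each output word gets its own context vector rather than sharing one fixed-length encoding of the whole input.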
Algorithm 1 gives the step-by-step procedure for accepting the text input and generating the condensed summary by applying the embedding and the encoder-decoder with an attention mechanism.
Frequently asked questions (FAQs) and their corresponding answers were also collected to construct the COVID-19 summarization dataset. Here, the question is viewed as the summary, and the corresponding answer is taken as its lengthy text. Analyzing the answer text and preparing the related question summary manually is time-consuming and leads to more ambiguity in the text summarization process.
Our objective is to summarize the lengthy guideline text using deep learning model-based techniques. The dataset consists of more than 500 guideline texts with related information such as summary guideline texts, HTML links, categories, countries, cities, regions, and GPS information. Initial data processing and data cleaning were applied to make the dataset suitable for building the model effectively and efficiently. We used the Keras library [51][52][53] to remove stop words, drop duplicates, and exclude NA (not available) summary/text values. Unwanted symbol characters and punctuation were removed without affecting the objective of the solution. A separate dictionary of words was also used to expand contracted words such as can't and couldn't. Special tokens such as <SOSTOK> and <EOSTOK> were added to the summary to mark its start and end. The model layers are: (1) embedding layer, (2) encoder LSTM layers (1 to 3), (3) decoder LSTM layer, (4) attention layer, and (5) dense layer. Our model does not learn the non-trainable parameters, which come from the weight vectors of the embedding matrix. The checkpoint facility in Keras saves the best weights and was used for early stopping of the model within 10 epochs. We used the embedding layer to convert the integer sequences of words in the text and summary, which act as one-hot indices, into vectors that capture their semantic meaning. The categorical cross-entropy cost function is used for fine-tuning the model. The epoch versus loss plot is shown in Figure 6.
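The cleaning steps described above can be sketched in plain Python. This is not the Keras-based pipeline used in the study; the contraction dictionary, the stop-word set, and the helper names (`clean_text`, `wrap_summary`, `prepare_pairs`) are hypothetical and deliberately tiny, and keeping stop words in the summary is an assumption.

```python
import re

# Small illustrative dictionaries (assumptions, not the study's resources).
CONTRACTIONS = {"can't": "cannot", "couldn't": "could not", "don't": "do not"}
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in"}


def clean_text(text, remove_stop_words=True):
    """Lowercase, expand contractions, strip symbols, optionally drop stop words."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(tokens)


def wrap_summary(summary):
    """Add the start and end tokens around a cleaned summary."""
    return "<SOSTOK> " + clean_text(summary, remove_stop_words=False) + " <EOSTOK>"


def prepare_pairs(rows):
    """Drop rows with missing values and exact duplicates, then clean the rest."""
    seen, pairs = set(), []
    for text, summary in rows:
        if not text or not summary:
            continue
        if (text, summary) in seen:
            continue
        seen.add((text, summary))
        pairs.append((clean_text(text), wrap_summary(summary)))
    return pairs


rows = [
    ("You can't enter the building without a mask!", "Masks required"),
    ("You can't enter the building without a mask!", "Masks required"),
    ("Travel is restricted.", None),
]
pairs = prepare_pairs(rows)
print(pairs)
```

The start and end tokens are added after cleaning so that the punctuation filter does not strip their angle brackets.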
During training, we evaluate the performance of the proposed model using hold-out validation and intensive training on the COVID-19 dataset. Then, we plot the main performance measures of the model at each training step, i.e., each epoch of the ensemble model. These learning curves help to review the model and diagnose learning problems such as overfitting or underfitting. An underfitting model has not learned the training dataset sufficiently and produces high training error values. An overfitting model, on the other hand, has learned the training data so well that it also captures the statistical noise and random fluctuations in the given training datasets.

ROUGE (Recall-Oriented Understudy for Gist Evaluation)
ROUGE is a metric for measuring the score/accuracy of the summarization task based on recall [54]. It computes the score from the number of overlapping (matched) words in the predicted and reference summaries.

$$\text{ROUGE Recall} = \frac{\text{count of overlapping words}}{\text{count of total words in reference summary}} \quad (1)$$

$$\text{ROUGE Precision} = \frac{\text{count of overlapping words}}{\text{count of total words in predicted summary}} \quad (2)$$

In ROUGE-N, the value N refers to the length of the overlapping n-grams. The score is expressed in terms of "o" and "p", where "o" refers to the count of overlapping words present in the predicted and reference summaries and "p" refers to the count of the predicted/proposed set of summaries produced by the algorithms.
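The recall and precision definitions in equations (1) and (2) extend naturally from words to n-grams. The sketch below is a minimal single-reference implementation with whitespace tokenization and clipped n-gram counts; the function names and example sentences are assumptions, and published ROUGE toolkits apply further normalization that is not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(predicted, reference, n=1):
    """ROUGE-N recall, precision, and balanced F for one summary pair."""
    pred = ngrams(predicted.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((pred & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(pred.values()) if pred else 0.0
    total = recall + precision
    f_score = 2 * recall * precision / total if total else 0.0
    return {"recall": recall, "precision": precision, "f": f_score}


# Hypothetical predicted and reference summaries.
predicted = "stay home and wear a mask"
reference = "wear a mask and stay at home"
rouge_1 = rouge_n(predicted, reference, n=1)
rouge_2 = rouge_n(predicted, reference, n=2)
print(rouge_1)
print(rouge_2)
```

The same pair scores much lower on bigrams than on unigrams, which illustrates why higher ROUGE levels are more sensitive to word order.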

Let us assume that we are calculating ROUGE-2, i.e., bigram matches. The numerator $\sum_{o}\sum_{p}$ loops through all bigrams in a single reference summary and counts the number of times an overlapping (matching) bigram is found in the candidate summary. This calculation is repeated over all reference summaries in our test set [7,55]. The denominator simply counts the total number of bigrams in all reference summaries. The ROUGE scores of the baseline BERT, Attention, and Seq2Seq pre-trained summarization models for the Top 7 Guideline Texts are shown in Tables 6-8, respectively. Figure 7 presents the ROUGE score chart of BERT for the Top 7 Guideline Texts. Figure 8 illustrates the ROUGE score chart of the Attention model for the Top 7 Guideline Texts. Figure 9 shows the ROUGE score chart of the Seq2Seq model for the Top 7 Guideline Texts. The precision, recall, and F-measure scores of ROUGE-i are denoted RiP, RiR, and RiF in the respective figures, where "Ri" refers to the ROUGE score at the ith level, "P" to precision, "R" to recall, and "F" to F-measure.
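For reference, the verbal description above agrees with the commonly used recall-oriented definition of ROUGE-N, written out below. The symbols $S$ (a reference summary), $\mathrm{gram}_N$ (an n-gram of length N), and $\mathrm{Count}_{\mathrm{match}}$ (the number of times an n-gram co-occurs in the candidate and the reference) are the conventional ones and are an assumption here; they are not the "o"/"p" notation used in the text.

```latex
\mathrm{ROUGE\text{-}N} =
\frac{\displaystyle\sum_{S \in \{\text{reference summaries}\}} \;
      \sum_{\mathrm{gram}_N \in S} \mathrm{Count}_{\mathrm{match}}(\mathrm{gram}_N)}
     {\displaystyle\sum_{S \in \{\text{reference summaries}\}} \;
      \sum_{\mathrm{gram}_N \in S} \mathrm{Count}(\mathrm{gram}_N)}
```

For ROUGE-2, the inner sums run over the bigrams of each reference summary, so the denominator is the total number of reference bigrams, as stated above.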
Table 9 shows the average ROUGE scores of the three models that we built using the deep learning approach. Comparing these scores, the pre-trained BERT model outperforms the others in summarizing the textual guidelines and generating the condensed summary of the COVID-19 dataset.
Figure 10 shows the details of the extractive text summary generated by the three baseline models.

Ensemble Approach of Text Document Summarization.
Finally, we integrated every model we created using the ensemble approach, which is commonly employed across machine learning tasks. The project represents experimental work using the Seq2Seq, Attention, and pre-trained BERT models. The ranking of the final summary text of the input sentences was determined using an ensemble of these three summarization models, and the top N summaries were gathered for performance comparison. The ensemble model and the baseline summarization models have also been compared at various ROUGE levels. The outcomes of the ensemble model for text summarization are shown in Table 10.
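One way such an ensemble ranking could work is sketched below. The combination rule is an assumption, since the text does not specify it: each model's candidate scores are min-max normalized, the normalized scores are averaged across the three models, and the top N candidates are returned. The model names, candidate ids, and scores are hypothetical.

```python
def min_max(scores):
    """Rescale a {candidate: score} dict to the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    return {k: (v - low) / span if span else 0.0 for k, v in scores.items()}


def ensemble_top_n(model_scores, n=2):
    """Average the normalized scores of all models (an assumed rule)
    and return the n best candidates, ties broken by candidate id."""
    normalized = [min_max(scores) for scores in model_scores.values()]
    candidates = set().union(*(scores.keys() for scores in normalized))
    combined = {
        c: sum(scores.get(c, 0.0) for scores in normalized) / len(normalized)
        for c in candidates
    }
    return sorted(combined, key=lambda c: (-combined[c], c))[:n]


# Hypothetical per-model scores for three candidate summaries.
model_scores = {
    "bert": {"s1": 0.9, "s2": 0.4, "s3": 0.1},
    "seq2seq": {"s1": 0.6, "s2": 0.7, "s3": 0.2},
    "attention": {"s1": 0.5, "s2": 0.3, "s3": 0.8},
}
top = ensemble_top_n(model_scores, n=2)
print(top)
```

Normalizing before averaging keeps a model with a wider raw score range from dominating the ranking; weighted averaging or rank voting would be equally plausible alternatives.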
The different levels of ROUGE scores were evaluated through the correlation coefficient between the ROUGE scores and the reference summary (Figure 11).

Conclusions
The COVID-19 datasets have been efficiently summarized in this study using BERT, sequence-to-sequence, and attention mechanisms. According to our analysis, the ensemble model fared well in the ROUGE evaluation. To produce an accurate summary for the loaded datasets, our ensemble model efficiently filters the semantic characteristics and extracts the implicit meaning of the words. The main benefit of the proposed ensemble model is that it uses hierarchical clustering to connect related sentences and the distributional semantics of the words for categorization. The integration of hierarchical clustering with the distributional semantic approach creates a robust framework for text summarization and helps to gain a granular understanding of the relationships between the sentences in the COVID-19 dataset. The word embedding models make it possible to categorize words into semantic clusters that reflect the appropriate meaning and context for the sentence. By employing a predefined threshold limit, this integrated technique facilitates the selection of the top-k summaries and produces effective results.

Limitation.
Even a large vocabulary size did not always help the analysis. Similarly, factual information was frequently reproduced incorrectly, with rare words inappropriately replaced by more common ones. This is considered a limitation of the model.

Future Work.
These tests were conducted using Google Colab on a single GPU resource. However, using fine-tuned models with large hyperparameter settings would not be suitable for efficient extractive summarization [56,57]. Additionally, rather than targeting domain-specific applications, we might search for an ensemble model that works for general extractive summarization; the proposed ensemble model can currently be used only with domain-specific datasets. Similarly, we can attempt abstractive summarization for datasets relevant to academia, which is expected to yield positive outcomes for dropout analysis. Although we have not significantly reduced the size of the pre-trained model, approaches such as pruning and quantization would be very beneficial for this model.

1.1. Structure of the Paper. The rest of the paper is organized as follows. Section 2 critically reviews the literature on the extractive summarization process and highlights the critical text summarization proposals to motivate the present work. Section 3 critically analyses the TS mechanisms implemented through three prominent text summarization techniques and highlights their limitations. These text summarization mechanisms are BERT, Sequence-to-Sequence, and Attention Mechanisms. Section 4 presents the results obtained based on the three summarization techniques used in the paper and highlights the underlying differences through appropriate measures, such as Recall-Oriented Understudy for Gist Evaluation (ROUGE). Conclusions and directions for further research are presented in Section 5.

[Table fragment on word embedding models; recoverable row labels: 2 GloVe, 3 Fast, 5 ELMo. Recoverable description text: represents the context words and the most frequent words from the input document; couples Word2Vec and matrix factorization methods to identify the co-occurrence list of words for probable context generation for OOV words and unorthodox content; finds the hidden-layer context for text generation; depends on word vectors and hence requires less memory for evaluation.]

(i) Step 1: Load the COVID-19-related datasets and feed them into the BERT model.
(ii) Step 2: Find the cosine-similarity matrix between the two vectors C and D, which have the same dimension as the BERT hidden layers.
(iii) Step 3: Calculate each labeled token's probability as the dot product of C and the token's representation in BERT's final hidden layer, followed by a softmax over all tokens in the document.
(iv) Step 4: The final summary of the BERT model is computed using the token return probability after the document's end, together with the calculated similarity vector D.
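Steps 2 and 3 above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the hidden states are small made-up vectors standing in for BERT's final-layer token representations, and the helper names (`cosine_similarity`, `softmax`, `token_probabilities`) are hypothetical.

```python
import math


def cosine_similarity(c, d):
    """Step 2: cosine similarity between two vectors of equal dimension."""
    if len(c) != len(d):
        raise ValueError("vectors must have equal dimensions")
    dot = sum(x * y for x, y in zip(c, d))
    norm_c = math.sqrt(sum(x * x for x in c))
    norm_d = math.sqrt(sum(y * y for y in d))
    if norm_c == 0.0 or norm_d == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_c * norm_d)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_probabilities(c, hidden_states):
    """Step 3: dot product of C with each token vector, then softmax over all tokens."""
    scores = [sum(x * h for x, h in zip(c, token)) for token in hidden_states]
    return softmax(scores)


# Toy stand-ins for BERT final hidden states (assumption: 3 tokens, 4 dimensions).
hidden_states = [
    [0.2, 0.1, 0.0, 0.5],
    [0.9, 0.3, 0.4, 0.1],
    [0.0, 0.2, 0.1, 0.0],
]
C = [1.0, 0.5, 0.5, 0.0]
D = [0.8, 0.4, 0.6, 0.1]

similarity = cosine_similarity(C, D)
probs = token_probabilities(C, hidden_states)
best_token = max(range(len(probs)), key=probs.__getitem__)
```

In a real pipeline the toy `hidden_states` would be replaced by the model's final-layer outputs; how C and D are learned is not specified in the steps above and is left out of the sketch.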

Figure 2: Architecture of the text summarization with ensemble model.

Figure 3: Simple sequence-to-sequence model for summarization of text.

Figure 4: Attention mechanism for summarizing the guideline text.
Figure 11 represents the comparison of the performance metrics of the ensemble model and baseline models.

Figure 5: Frequency distribution of vocabulary of guideline text and summary.

Table 2: Some of the pre-trained BERT models for evaluation.

Table 3: The average running time of a forward pass of the BERT model.

Table 4: Comparison result of text summarization algorithms.

Table 5: Selection parameter for hyperparameter settings.
Input: The guideline data related to COVID-19 is the input to the deep learning network; initialization of attention weight.

Table 9: Average ROUGE score results for the baseline models.

Table 10: Results of ensemble model for text summarization.

Figure 11: Comparison of the performance metrics of ensemble model and baseline models.

Text: "A novel coronavirus is a new coronavirus that has not been previously identified. The virus causing coronavirus disease 2019 (COVID-19), is not the same as the coronaviruses that commonly circulate among humans and cause mild illness, like the common cold. A diagnosis with coronavirus 229E, NL63, OC43, or HKU1 is not the same as a COVID-19 diagnosis. Patients with COVID-19 will be evaluated and cared for differently than patients with common coronavirus diagnosis."

Ground Truth: The virus causing coronavirus disease 2019 (COVID-19), is not the same as the coronaviruses that commonly circulate among humans and cause mild illness, like the common cold.

BERT: Novel coronavirus is a new virus. Coronavirus is not same as mild illness and cold. Covid-19 patients will be evaluated differently.

Seq-to-Seq Model: Novel coronavirus is a new coronavirus. Diagnosis for this virus is not 229E, NL63, OC43 or HKU1.

Attention Mechanism: Coronavirus has not identified previously. Coronavirus is different. It is not same as illness and cold. Evaluation for Covid-19 is different for patients.
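To make the ROUGE comparison concrete, the following is a minimal sketch of ROUGE-N computed as clipped n-gram overlap between a candidate summary and the ground truth. It is a simplified illustration, not the evaluation script used in the paper: the tokenizer is a plain lowercase alphanumeric split, no stemming or stop-word handling is applied, and the function names are hypothetical. Scores from this sketch should therefore not be expected to match the values reported in the tables.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_n(candidate, reference, n=1):
    """Return precision, recall and F1 of clipped n-gram overlap."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(tokenize(candidate))
    ref = ngrams(tokenize(reference))
    # Counter intersection clips each n-gram count to the smaller of the two.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Ground truth and BERT output from the example above.
ground_truth = (
    "The virus causing coronavirus disease 2019 (COVID-19), is not the same as "
    "the coronaviruses that commonly circulate among humans and cause mild "
    "illness, like the common cold."
)
bert_summary = (
    "Novel coronavirus is a new virus. Coronavirus is not same as mild illness "
    "and cold. Covid-19 patients will be evaluated differently."
)
scores = rouge_n(bert_summary, ground_truth, n=1)
```

The same function with `n=2` gives a ROUGE-2 style score; a longest-common-subsequence variant (ROUGE-L) would need a separate routine.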