SentMask: A Sentence-Aware Mask Attention-Guided Two-Stage Text Summarization Component

The text summarization task aims to generate succinct sentences that summarize what an article tries to express. Based on pretrained language models, combining extractive and abstractive summarization approaches has been widely adopted in text summarization tasks, and extract-then-abstract algorithms have proven effective in many existing pieces of research. However, this method suffers from semantic information loss throughout the extraction process, resulting in incomplete sentences being generated during the abstract phase. Besides, current research on text summarization emphasizes only word-level comprehension while paying little attention to understanding at the sentence level. To tackle this problem, in this paper, we propose the SentMask component. Taking into account that the semantics of sentences that are filtered out during the extraction process are also worth considering, the paper designs a sentence-aware mask attention mechanism in the process of generating a text summary. By applying the extractive approach, the paper first selects the most essential sentences to construct the initial summary phrases. This information leads the model to modify the weights of the attention mechanism, which provides supervision for the generative model to ensure that it focuses on the sentences that convey important semantics while not ignoring the others. The final summary is constructed based on the key information provided. The experimental results demonstrate that our model achieves higher ROUGE and BLEU scores compared to other baseline models on two benchmark datasets.


Introduction
With the rapid increase in the number of articles and papers, we find ourselves drowning in a sea of documents. The time-consuming and energy-draining reading process can be avoided by creating a concise abstract of a text that transmits its main concepts to the reader. But summarizing articles automatically is a difficult process, as it necessitates models that rewrite a long article into a concise and fluent version while preserving the essential information. In the area of automatic text summarization, extractive and abstractive methods are the two primary paradigms. To produce a summary, extractive [1] techniques select the salient phrases or sentences exactly from the original source, whereas abstractive [2] techniques generate new phrases and sentences from scratch. However, because relevant information is spread throughout all sentences rather than contained in a few, extractive models suffer from a lack of semantics and cohesiveness in summary sentences, as well as redundancy in certain summary sentences. On the other hand, abstractive summarization models suffer from the slow encoding of long documents and the unreliability of the generated summaries.
Recently, some researchers have tried to combine these two methods in an extract-then-abstract way [3, 4]. The work [3] proposes a hybrid framework, HYSUM, for text summarization, which maintains salient content by switching between rewriting sentences and copying sentences according to the degree of redundancy. The work [4] provides a hybrid abstractive-extractive method, which scans a document, produces prominent textual fragments that highlight its main ideas, and selects the important sentences by calculating BERTScore. These models design a two-stage pipeline that first picks out salient sentences from a source document and then rewrites the extracted sentences into a complete summary. However, most research using the extract-then-abstract framework generates summaries based solely on the extracted sentences, which loses robustness. In many cases, significant content might be filtered out by the extraction model, causing severe information loss in the generation process.
Furthermore, it is difficult to comprehend and generalize articles due to their rigorous grammatical statements. To maintain the consistency of professional grammatical definitions and logic within the original sentences, it is vital to preserve sentence-level information and semantics in summaries, which has also been ignored in previous works.
To overcome both of these issues while combining the benefits of both paradigms, in this paper we propose SentMask, a novel sentence-aware mask attention-guided two-stage text summarization component, which adaptively reduces the attention weights of filtered sentences by training neural networks. Taking Figure 1 as an example, the existing methods generate the summary according to the selected sentences extracted by the extractors only. However, the filtered sentences also contain information that should not be lost, such as "adverse events". Thus, the paper utilizes these sentences by reducing rather than deleting their attention weights.
An extractive summary extracts important sentences to form a summary that condenses the full text. During the extraction process, the model fully considers the semantic information between sentences. A generative summary generates words in order, forms sentences, and then forms a summary that condenses the entire article. During the generation process, the semantic information between words is fully considered by the model, but the emphasis on the semantic information between sentences is weakened. In order to make full use of the semantic information between each word and sentence, we employ an extractor to extract the initial summary and an abstractor to abstract the final summary. Therefore, our model takes into account both word-level and sentence-level information in the text generation process. Unlike other works that select important words and thereby break the semantics of the whole, the paper uses an extractor to select essential information at the sentence level, faithfully preserving the semantics of whole sentences. In this way, with the above issues solved, our model can avoid syntactic and incoherence errors in summary sentences and ensure that the generated phrases are flexible and stable. To better leverage the results of the extractor algorithm and preserve the necessary global information, the paper proposes a sentence-aware mask attention mechanism in our model.
The paper evaluates the efficacy of our semisupervised and supervised SentMask models, respectively. The semisupervised SentMask model consists of the TextRank algorithm [5] and a sequence-to-sequence model (Seq2Seq) [6], while the supervised SentMask model consists of the MemSum algorithm [7] and the BART [8] model. The paper leverages the extractor algorithm to extract important sentences for summarization. Based on its results, the paper then masks the other sentences by reducing rather than deleting their attention weights. The noise reduction capability of our model is demonstrated by the weight reduction of the information in trivial sentences, which, to some extent, relatively increases the weight of important information.
The following are our primary contributions: (1) The paper proposes a brand-new two-stage hybrid abstractive and extractive summary method. While acquiring the information of the salient sentences produced by the extractor, our abstractor also extracts knowledge in a specific way from the nonsalient sentences. Our method is implemented in semisupervised and supervised versions, which include unsupervised and supervised extractors, respectively. (2) The paper proposes a sentence mask module, a sentence-aware mask attention mechanism, and a mask-aware copy mechanism. The sentence mask module transforms a sample input into a mask matrix. The sentence-aware mask attention mechanism reduces the nonsalient sentences' attention weights rather than losing their information. The mask-aware copy mechanism copies only words from salient sentences, since there could be noise throughout the article. (3) The paper extensively evaluates SentMask on two benchmark datasets. The results of the experimental evaluation show that SentMask outperforms the current state of the art in these evaluations.

Traditional Summarization.
Several traditional summarization approaches for automatic summary generation have been advanced over the years, incorporating a variety of statistical-based [9], topic-based [10], graph-based [5], and semantic-based [11] techniques. For instance, the work [9] brings improvements by involving sentence position, sentence length, and keyword sentence features. The work [10] proposes a term frequency-inverse document frequency algorithm, which measures the importance of keywords based on their frequency of occurrence and uses it to assess each sentence; the abstract is extracted from the highest-scoring sentences. Biased TextRank [5] is a method for capturing closeness in meaning between graph nodes and a target text that depends on document representation models and similarity measurements. Latent semantic analysis [11] is an unsupervised technique that encodes text semantics based on the observed co-occurrence of words.
Traditional unsupervised text summarization models do not require any training data and generate the summary by accessing only the target documents. However, these traditional methodologies perform the summarization task using manually designed features, which show poor generalization ability on new data.

Neural Network Summarization.
The two most common types of study are extractive summarization and abstractive summarization. Extractive summarization methods commonly construct an encoder-decoder architecture, with a graph attention network [12] as the encoder and autoregressive [13] or nonautoregressive [14] decoders. The work [7] proposes a multistep extractive summarizer based on reinforcement learning over Markov decision processes, which considers information from the current extraction history.
In recent years, pretraining has been applied to several varieties of transformer architecture in various ways, including encoder-only pretraining models like XLNet [15], decoder-only pretraining models like GPT [16], and encoder-decoder pretraining models like T5 [17] and BART [8]. For instance, the work [18] distills large pretrained sequence-to-sequence transformer models into smaller ones for faster inference with the least amount of performance loss.
Two-stage document summarization systems have been developed in recent studies. The first stage of this framework usually involves extracting some segments of the original text, and the second stage involves selecting or modifying these segments. There are various extract-then-abstract summarization methods, such as extract-then-rewrite and extract-then-compress. Among extract-then-rewrite models, the method [19] employs a coarse-to-fine approach inspired by humans, extracting all relevant sentences first and then decoding them simultaneously. The work [20] introduces a novel training signal that employs reinforcement learning to directly maximize summary-level ROUGE scores. Among extract-then-compress models, the model [21] selects phrases from the document, identifies plausible compressions based on constituent parses, and rates those compressions using a neural network model to construct the final summary. The work [22] proposes a method for learning to select sentence singletons and pairs, which are subsequently employed by an abstractive summarizer to build a sentence-by-sentence summary, with singletons compressed and pairs fused.
Previous research using the extract-then-abstract framework generates summaries based solely on the extracted sentences, which loses the semantic information in the filtered sentences and causes severe information loss. To that end, the paper designs a sentence-aware mask attention-guided two-stage text summarization component, which captures the gist of the text.

Materials and Methods
In this section, the paper introduces our sentence-aware extract-then-abstract summarization framework in detail, as illustrated in Figure 2. It consists of four components: (1) an extractor, an importance-aware content selection component that utilizes the TextRank or MemSum [7] algorithm to extract and organize salient sentences; (2) an abstractor, a Seq2Seq- [6] or BART- [8] based abstract generation component with a sentence-aware mask attention mechanism that compresses and rephrases both the extracted sentences and the original article into a succinct summary; (3) the sentence-aware mask attention mechanism, a modified version of the attention weight mechanism that masks the nonsalient sentences; (4) the mask-aware copy mechanism, a modified version of the copy mechanism that copies words from the salient sentences rather than the whole article. The paper describes these components in detail as follows.

Extractor.
First, we split the article into sentences. Let x denote the original sentences of the article, which consists of a sequence of sentences (x = u_1, u_2, ..., u_m). Each u_i consists of a sequence of words. These sentences are constructed as a directed graph represented by a sentence similarity matrix with the TextRank algorithm, or input to a multistep episodic Markov decision process with historical awareness using the MemSum algorithm. After the extractor algorithm runs, a score is calculated for each sentence, which represents the "importance" of the sentence. The sentences are sorted in descending order of score, and the first K sentences with the highest scores are chosen as the draft, which serves as the input of the abstractor to form the final summary.

Figure 1: Sample summary of an article from the MS2 dataset corpus. Existing methods generate the summary based on the sentences selected by the extractor, while the paper reduces the attention weights of nonsalient sentences by using a mask attention matrix with the sentence-aware masked attention mechanism; the sentence mask module in our model transforms a sample input into a mask matrix.
x_E denotes the initial sentences extracted by the extractor algorithm, which are a subset of the sentences in x.
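For illustration, the selection step can be sketched as follows; here, score_sentences is a hypothetical stand-in for the TextRank or MemSum scorer, which the paper does not spell out in code.

```python
# A minimal sketch of the extraction step. The real scorer would be
# TextRank or MemSum; `score_sentences` is an assumed placeholder.
from typing import Callable, List, Tuple

def extract_salient(sentences: List[str],
                    score_sentences: Callable[[List[str]], List[float]],
                    k: int) -> Tuple[List[str], List[int]]:
    """Score every sentence, rank by score, and keep the K highest-scoring
    sentences as the draft x_E (returned in document order)."""
    scores = score_sentences(sentences)
    # Indices of sentences ranked by importance, highest first.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    selected = sorted(ranked[:k])  # restore document order for the draft
    return [sentences[i] for i in selected], selected
```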
So far, the paper has been discussing the sentence level. The extractor helps us to preserve whole-sentence semantics. The paper then converts this information to the word level, since the Seq2Seq and BART models take word-level information into account.
The paper utilizes a sentence mask module to transform a sample input into a mask matrix. The transformation of the input of the SentMask model is shown in Figure 3.
x_mask indicates whether each word is in the selected sentences. Its element m_i^k, corresponding to the k-th word of sentence u_i, is defined as follows:

m_i^k = 1 if u_i ∈ x_E, and m_i^k = 0 otherwise,

where x_mask will be the essential component for us to perform the sentence-aware mask attention mechanism, as it conveys information about how important each word is. To make it clear, the paper reformulates the mask at the word level, writing m_i for the mask value of the i-th word of the flattened article.
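A minimal sketch of the sentence mask module under this definition (the function name is illustrative):

```python
# Build the word-level mask x_mask from the extractor's selection:
# m_i^k = 1 when word k belongs to a selected sentence u_i, else 0.
from typing import List

def build_word_mask(sentences: List[List[str]], selected: List[int]) -> List[int]:
    """Flatten the article to the word level and mark each word with 1 if
    its sentence was extracted, 0 otherwise (the x_mask vector)."""
    chosen = set(selected)
    mask = []
    for i, sent in enumerate(sentences):
        mask.extend([1 if i in chosen else 0] * len(sent))
    return mask
```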

Abstractor.
After obtaining the initial salient textual fragments representing the source article's key points with the extractor, the paper generates the summary with the assistance of these extracted sentences. The paper uses a pretrained word representation to map each token to a vector. Then, the paper utilizes an abstractor to encode and decode the whole article, abstractor ∈ {Seq2Seq, BART}. The decoder is initialized with the encoder's last hidden state. In Seq2Seq, our encoder and decoder are GRU-based. h_t is the encoder's hidden state and s_t is the decoder's hidden state at time step t. The context vector is c_t = Σ_i a_{t,i} h_i.
In BART, our encoder and decoder follow the transformer architecture. h^E is the hidden state of the encoder, and h^D_t is the hidden state of the decoder at time step t.
Here, y_{t−1} denotes the word generated in the previous step. The paper uses the sentence-aware attention mechanism in both of our abstractors. In addition, the paper utilizes a mask-aware copy mechanism in the Seq2Seq.
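For concreteness, a minimal sketch of such a GRU-based encoder-decoder abstractor is given below; the layer sizes and names are illustrative assumptions, not the paper's exact configuration, and the sentence-aware attention and copy mechanism described next are omitted here.

```python
import torch
import torch.nn as nn

# A minimal GRU-based Seq2Seq abstractor: the decoder is initialized with
# the encoder's last hidden state, as described above.
class GRUSeq2Seq(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 128, hid_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        h, h_last = self.encoder(self.embed(src))     # h_t for every source word
        s, _ = self.decoder(self.embed(tgt), h_last)  # decoder starts from h_last
        return self.out(s)                            # logits over the vocabulary
```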

Sentence-Aware Mask Attention Mechanism.
Based on the attention mechanism, the paper proposes a sentence-aware mask attention mechanism, which is employed in both semisupervised and supervised modes. a_{t,i} is the attention score obtained by our sentence-aware mask attention mechanism. It consists of two parts: standard word-level attention and sentence-aware masked attention at the sentence level. The word-level attention is calculated by the associated phrase attention. In the masked sentence attention, the paper forces the model to focus on the important sentences extracted by the extractor algorithm. By combining the two attention scores with a hyperparameter as the weight, the paper can not only emphasize information from important sentences but also avoid losing the semantics of other sentences. In both Seq2Seq and BART, the two attention branches share the same attention logits and differ only in the additive mask η_i: when ζ = attn, η_i is the default attention mask; when ζ = mask, η_i = 0 if the i-th word belongs to a salient sentence (m_i = 1) and η_i = ξ otherwise. The two branches are combined as a_{t,i} = ϵ a^{attn}_{t,i} + (1 − ϵ) a^{mask}_{t,i}, where ξ and ϵ are the hyperparameters. The extension of the generation sources encourages the integrity of the sentence and increases the probability of correctness. For summary output, the final vocabulary distribution in BART at time step t is P = Dense(h^D_t), where Dense is a dense layer, while the preliminary vocabulary distribution in Seq2Seq at time step t is P_vocab = softmax(Dense([s_t; c_t])).
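A minimal PyTorch sketch of this combination, assuming additive logit masking with ξ and a convex combination with ϵ (consistent with the description above, though the paper's exact formulation may differ):

```python
import torch
import torch.nn.functional as F

# Sentence-aware mask attention sketch: the masked branch adds xi
# (e.g., -1e6) to the logits of words outside salient sentences, and the
# two branches are mixed with the hyperparameter eps.
def sentence_aware_attention(logits: torch.Tensor,     # (batch, src_len) logits e_{t,i}
                             word_mask: torch.Tensor,  # (batch, src_len), 1.0 = salient word
                             eps: float = 0.9,
                             xi: float = -1e6) -> torch.Tensor:
    a_attn = F.softmax(logits, dim=-1)                        # standard word-level attention
    a_mask = F.softmax(logits + xi * (1 - word_mask), dim=-1) # attention on salient words only
    return eps * a_attn + (1 - eps) * a_mask                  # combined score a_{t,i}
```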

Mask-Aware Copy Mechanism.
The copy mechanism in the Seq2Seq, following [23], uses the encoder's representation of words to select a word from the input instead of choosing from the whole vocabulary. When dealing with important words, this technique may be more reliable than generating from the full vocabulary. Because the hidden state of a word is governed jointly by its full context and lexical auxiliary features, the model can consistently produce accurate terms in the target vocabulary. The paper makes a modification to the original copy mechanism: it copies words only from important sentences, since there could be noise throughout the article. By limiting the scope, the model can more easily find the most probable word to generate. P_copy is calculated as

P_copy = σ(μ_2^T tanh(W_4 c_t + W_5 s_t + W_6 y_{t−1})),

where μ_2^T, W_4, W_5, and W_6 are trainable parameters, and σ denotes the sigmoid function.
The final prediction is obtained by merging the copy probability and the output of the decoder: P = (1 − P_copy) P_vocab + P_copy Σ_i a_{t,i} δ(y_t = x_i), where δ(y_t = x_i) equals 1 when the source word x_i matches the target word y_t and 0 otherwise. In conclusion, our SentMask model extends the Seq2Seq and BART models, respectively, with an important-sentence-guided masked attention strategy that enables the model to leverage both word-level and sentence-level information for final sequence generation. Taking advantage of the more condensed semantics at the word level while keeping the original sentence grammar at the sentence level, our SentMask model promotes the capacity to capture the gist of the input text, whether semisupervised or supervised.
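A minimal sketch of this merge step, under the assumption that the copy distribution scatters attention mass only from words of salient sentences (the mask-aware restriction); all names and shapes are illustrative:

```python
import torch

# Merge the generation and mask-aware copy distributions:
# P = (1 - P_copy) * P_vocab + P_copy * copy_dist, where copy_dist only
# accumulates attention from words inside salient sentences.
def final_distribution(p_vocab: torch.Tensor,    # (batch, vocab)
                       attn: torch.Tensor,       # (batch, src_len) scores a_{t,i}
                       src_ids: torch.Tensor,    # (batch, src_len) int64 token ids
                       word_mask: torch.Tensor,  # (batch, src_len), 1.0 = salient
                       p_copy: torch.Tensor      # (batch, 1) copy probability
                       ) -> torch.Tensor:
    copy_attn = attn * word_mask  # drop attention mass on non-salient words
    copy_attn = copy_attn / copy_attn.sum(-1, keepdim=True).clamp(min=1e-9)
    # Scatter the remaining attention onto the vocabulary positions of src words.
    copy_dist = torch.zeros_like(p_vocab).scatter_add_(1, src_ids, copy_attn)
    return (1 - p_copy) * p_vocab + p_copy * copy_dist
```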

Dataset.
To comprehensively investigate our proposed model, we employ two benchmark datasets for evaluation, which are common options in previous research: the Multi-Document Summarization of Medical Studies (MS2) benchmark dataset and the AESLC dataset. Both of them are open access: the MS2 dataset can be downloaded at https://paperswithcode.com/dataset/ms-2, and the AESLC dataset can be downloaded at https://github.com/ryanzhumich/AESLC. The statistical details of the two datasets are shown in Table 1. The following are brief summaries of these benchmark datasets.

MS2 [24]. The MS2 dataset is a scientific literature dataset with about 470k pages and 20k summaries. The paper removes contents that are excessively long or too short, and 20,434 papers are ultimately acquired as our corpus, with 16,112 documents for training, 2,277 for validation, and 2,045 for testing.

AESLC [25]. The AESLC dataset is obtained from the Enron dataset, which includes emails from staffers at the Enron Corporation, comprising 517,401 e-mail messages from 150 user mailboxes. After filtering and deduplicating, the paper obtains the final AESLC dataset.

Implementation and Evaluation
In this paper, we implement our SentMask based on Seq2Seq and BART, respectively, which is sufficient to demonstrate the effectiveness of the method. The paper sets ξ = −1e6 and uses PyTorch to implement our model. To demonstrate the performance of the proposed SentMask model, the paper compares it to many baselines with the same model size for a fair comparison, including the Lead3 algorithm, the TextRank algorithm, the GenCompareSum model [4], the Seq2Seq model, the Presumm model [26], the Global Encoding model [27], the Pointer-Generator model [23], the Transformer [28], the AESLC baseline [25], and BART [8]. There are some descriptions of these baselines as follows.

Lead3 Algorithm.
The Lead3 algorithm takes the top K sentences of the document as the summary.

TextRank Algorithm.
The TextRank algorithm determines each sentence's score based on how similar the sentences are to one another and then selects the top K scoring sentences.

GenCompareSum Model [4].
The GenCompareSum model is a hybrid extraction method, which generates salient text fragments representing the main points of a document and selects its most important sentences by calculating BERTScore.

Seq2Seq Model.
Seq2Seq is an encoder-decoder architecture, which consists of LSTM or GRU units.

Presumm Model [26].
The Presumm model is based on the BERT model, which can express the semantics of the document, obtain sentence representations, and improve the quality of the summary through fine-tuning.

Global Encoding Model [27].
The Global Encoding model is a Seq2Seq model, which employs a gated convolutional unit in the encoder for global encoding.

Pointer-Generator Model [23].
The Pointer-Generator is an encoder-decoder model that solves the OOV problem by controlling a pointer that lets the model copy tokens from the original context.

Transformer [28].
The Transformer is a brand-new, uncomplicated network architecture based solely on attention mechanisms.

AESLC Baseline [25].
The AESLC baseline consists of a multisentence extractor and a multisentence abstractor.

BART [8].
BART is a transformer-based model, which employs a bidirectional encoder with a number of denoising pretraining objectives.

For the evaluation of the quality of the experiment, the paper comprehensively evaluates the quality of the summaries generated by these baseline models from both automatic and human evaluation perspectives. Automated summary evaluation metrics, including ROUGE [29] and BLEU [30], are used to evaluate the quality of text summarization. In particular, the BLEU evaluation metric is an enhanced N-gram assessment metric, and its N-gram weights can be defined to conveniently fit models for different purposes and more accurately determine the consistency of the model.
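As an illustration, a per-summary evaluation along these lines could be computed with the third-party rouge-score and nltk packages; the paper does not name its evaluation tooling, so this is an assumed setup.

```python
# A minimal sketch of the automatic evaluation, assuming the `rouge-score`
# and `nltk` packages are installed.
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu

def evaluate(reference: str, candidate: str) -> dict:
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                      use_stemmer=True)
    rouge = {k: v.fmeasure for k, v in scorer.score(reference, candidate).items()}
    # BLEU with configurable n-gram weights, as described above
    # (uniform weights up to 4-grams shown here).
    bleu = sentence_bleu([reference.split()], candidate.split(),
                         weights=(0.25, 0.25, 0.25, 0.25))
    return {**rouge, "bleu": bleu}
```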

Automated Evaluation.
The experimental results on the MS2 and AESLC datasets are shown in Tables 2 and 3, respectively. The results show that the proposed SentMask model performs remarkably well on both text summarization datasets, demonstrating the effectiveness of our masked sentence attention mechanism.
Meanwhile, the improvements confirm that the multilayer neural network structure can capture further refined information from the original text, and that adding updated encoding information improves the expressive capacity that enables the model to generate summaries with few grammatical errors.

Human Evaluation.
To further assess the quality of the summaries produced by the SentMask model, the paper conducted a human evaluation using three typical indicators: informativeness, fluency, and faithfulness. The following are brief summaries of these human evaluation metrics.

Informativeness.
The informativeness of the summary is determined by how accurately it summarizes the material in the original article.

Faithfulness.
Faithfulness evaluates how well the facts in the summary match those of the original article.

Fluency.
The summary's fluency is determined by how few serious grammatical faults it contains.
The paper hires five native English speakers and randomly chooses 300 articles from the MS2 and AESLC datasets to evaluate the summaries of these baseline models and the SentMask model on the three aspects. The score ranges from 1 (poor) to 5 (outstanding). The findings in Table 4 demonstrate that, in terms of informativeness, fluency, and faithfulness, our SentMask model outperforms the other baseline models, which illustrates the value of the sentence-aware mask attention mechanism.

Ablation Study.
To obtain a more scientifically accurate explanation, an ablation study is conducted by removing some components of our model to verify their contribution. The paper conducts the ablation study with the semisupervised model and the supervised model, respectively, on the MS2 dataset. The paper conducts several experiments and ablation tests as follows.

SentMask-T.
It is our proposed semisupervised model. The sentences are first selected by the TextRank algorithm and then passed through the proposed SentMask neural network.

TextRank.
TextRank is a graph-based ranking model for natural language processing, which finds the most relevant sentences in an article.

SentMask-C.
It is our proposed supervised model. The MemSum algorithm generates the initial selected sentences, which are then passed through the proposed SentMask neural network.

MemSum.
MemSum is a historical-aware multistep episodic Markov decision process algorithm.

Table 4: The human evaluation results. The score is the average over the 300 articles from the MS2 and AESLC datasets, rated by 5 volunteers; each volunteer scores every article from 1 to 5.

To investigate how the hyperparameters affect the model's performance, the paper tries different hyperparameter settings in our ablation study. An essential hyperparameter is K, the number of highest-scoring sentences extracted by the extractor algorithm.
The paper performs a set of experiments with different selections of K to uncover its influence on the quality of the generated sentences. There are two ways to control K in the extractor algorithm: one is to control the percentage of selected sentences, and the other is to set K itself. The two settings are described as follows.
With percent = p_k in the extractor algorithm, the first p_k of the sentences are selected as the subsequent input sentences and as the nonmasked sentences. In our experiments, the paper tries different p_k ∈ {50%, 40%, 30%, 20%, 10%}. With top = K in the extractor algorithm, the first K sentences are selected as the subsequent input sentences and as the nonmasked sentences. The paper tries different K ∈ {5, 4, 3, 2, 1}. For the semisupervised model, the ROUGE-L and BLEU scores of the ablation models with different p_k are shown in Figure 4, and those with different top-K are illustrated in Figure 5. For the supervised model, the ROUGE-L and BLEU scores of the ablation models with different p_k are shown in Figure 6, and those with different top-K are illustrated in Figure 7.
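For illustration, the two ways of controlling K can be sketched as follows (names are illustrative):

```python
# Select sentence indices either by percentage (percent = p_k) or by a
# fixed count (top = K), mirroring the two settings described above.
import math
from typing import List, Optional

def choose_k(scores: List[float],
             percent: Optional[float] = None,
             top: Optional[int] = None) -> List[int]:
    """Return indices of the highest-scoring sentences: either the first
    `percent` fraction (e.g., 0.5 for 50%) or a fixed `top` count."""
    k = top if top is not None else max(1, math.ceil(len(scores) * percent))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```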
Overall, the ablation models, whether semisupervised or supervised, perform worse in terms of ROUGE-L and BLEU scores, demonstrating the effectiveness of the sentence-aware masked attention mechanism in our SentMask model. Across these figures, with different K, the result curves of the semisupervised SentMask model are more turbulent, while those of the supervised SentMask model are relatively stable. Thus, the performance of the semisupervised SentMask model is influenced significantly by the parameter K, while the supervised model is only slightly influenced. In addition, selecting the proper number of sentences is a crucial decision for our model. Comparatively speaking, the best setting of the hyperparameter K is to select the first 50% of the sentences of the source articles, for both the semisupervised and the supervised model.

Effect of the Hyper-Parameter.
To demonstrate our model's robustness under different parameters, the paper tries different ϵ from 0.6 to 0.95 for the semisupervised model and the supervised model on the MS2 dataset. According to the results in Figure 8, the proposed SentMask performs well regardless of the value of ϵ. SentMask-T performs best when ϵ = 0.9 on the MS2 dataset, and SentMask-C performs best when ϵ = 0.95. Note that the model mainly carries out the task of generating text abstracts, so the proportion of information from the attention mechanism represented by the masked-sentence strategy should be less than that of the original attention mechanism. Table 5 shows an example of summaries generated by different models.

Case Study.
In this example, the original article provides verification of acupuncture's efficacy and safety in relieving abdominal pain and distension associated with acute pancreatitis. The primary idea of this paper is definitely acupuncture's high efficacy and safety, and the research object is abdominal pain and distension in acute pancreatitis. However, the baseline models generate inappropriate summaries to varying degrees. In detail, the summary of the Lead3 algorithm contains duplicate information that does not represent the true abstract of this article, such as "Methods and Analysis". The TextRank algorithm risks ranking redundant sentences high and generates condensed sentences that are semantically similar, such as "safety of acupuncture", which appears twice in the summary text.
The Seq2Seq model creates a summary that solely comprises information related to acupuncture, not the efficacy or safety of acupuncture. Furthermore, it makes the mistake of redundantly repeating the word "acupuncture". The pointer network model generates an excessive number of words, emphasizing "acupuncture's effect" rather than "its efficacy and safety"; meanwhile, the trial method does not need to be included in the abstract of the paper. According to the summary of the Global Encoding model, "orthostatic hypotension and cardiovascular" is a component of the text, but not the main information. The main objective of the summary given by the Presumm model is "home-based ventilation in intensive care", which is inconsistent with the source. The summary generated by the BART model focuses on "pancreatitis" rather than "efficacy", which is inappropriate. Compared with these baseline models, the summary of our model is more coherent and semantically relevant to the source text. Our model focuses on information about the efficacy and safety of acupuncture rather than acupuncture itself, and points out that this is a systematic review and meta-analysis in its generated summary. Meanwhile, all the words generated by our model are the target words of the standard dataset, maintaining a high degree of conciseness. Therefore, our model can better consider grammatical word-level and sentence-level appearances simultaneously by masking the sentences to advise the generator. This indicates that the masked sentence attention in our model is able to capture substantial semantics and minimize noise information from the source article by inserting an original sentence pointer.

Conclusions
In this paper, we propose SentMask, a novel extract-then-abstract method for text summarization. By utilizing the sentence-aware mask attention mechanism, our method avoids the information loss caused by the extraction model. Besides, the paper utilizes a sentence-level extractor, which can preserve sentence-level semantics during generation. Experimental results for both the semisupervised and the supervised model demonstrate that our model can generate comprehensive summaries without suffering information loss.
In terms of future work, the paper will attempt to extend our solution in various directions. One possible direction is to take into account the varied connections among the words and sentences in articles. The paper will explore using the similarity of phrases, especially critical phrases, to further explore semantic relationships.

Data Availability
The data used to support the findings of this study are included in the article [24, 25]. The MS2 and AESLC datasets can be derived from https://paperswithcode.com/dataset/ms-2 and https://github.com/ryanzhumich/AESLC.