Adaptive Attention with Consumer Sentinel for Movie Box Office Prediction

To improve the accuracy of movie box office prediction, this paper proposes an LSTM model with adaptive attention and a consumer sentinel (LSTM-AACS). First, the factors influencing the movie box office are analyzed. To address the neglect of consumer groups in existing prediction models, we add consumer features and then quantify and normalize the box office influence factors. Second, we establish an LSTM (Long Short-Term Memory) box office prediction model and inject an attention mechanism to construct an adaptive attention model with consumer sentinel for movie box office prediction. Finally, a dataset of 10,398 movie box office records from a Kaggle competition is used to compare the prediction results of the LSTM-AACS model, the LSTM-Attention model, and the LSTM model. The results show that the relative error of LSTM-AACS prediction is 6.58%, which is lower than that of the other models used in the experiment.


Introduction
The movie box office, as an indicator of the level of film development, has attracted great attention from all walks of life. At present, movie box office prediction has become one of the most active research topics among scholars [1]. Linear and nonlinear regression models have been used to construct social media-driven movie box office prediction models [2]. A new method of movie box office prediction based on two-level and twice proxy variables was proposed [3], which can predict the first week's box office by using preindicators obtained before the movie is released. Other work mainly analyzed a single influencing factor of the movie box office [4]: the authors analyzed the influence of the celebrity effect on the box office and concluded that celebrity influence is positively related to box office. The competition among movies with similar release times was tested in a standard regression framework, and a more simplified empirical model was proposed [5]. A BRP feedback neural network was proposed to solve movie box office prediction and classification problems [6]. The prediction model using the BP neural network has the following shortcomings. (1) Binary values are used in the discretization of the model to quantify the various influencing factors of the movie box office [7]. These variables are not processed according to the actual situation, and the differences between influencing factors cannot be fully expressed.
(2) When a BP neural network is trained, it easily falls into local minima [8].
LSTM [9] is a type of recurrent neural network. A movie box office prediction method based on the LSTM model was proposed [10]. This model overcomes the BP neural network's reliance on simple Boolean coefficient values. It can map as many movie box office influence factors as possible between the input and the output. However, its analysis of the factors influencing the box office is not comprehensive, and the predicted results still have large relative errors.
To address the current movie box office prediction problem, this paper proposes an adaptive attention LSTM model with consumer sentinel. Compared with the traditional LSTM, this model introduces an attention mechanism with a consumer sentinel. On the one hand, it fully considers the impact of movie consumer information on the box office and thereby improves the model input. On the other hand, adaptive attention captures the effective input information more strongly, further improving the prediction accuracy. Specifically, the model is an LSTM model into which the adaptive attention mechanism (AAM) with consumer sentinel is injected. The consumer sentinel identifies the effect of the box office influencing factors from more dimensions and addresses the long-standing problem of ignoring consumer information in box office forecasting. The use of LSTM takes into account the random volatility and long time span of the movie box office, since LSTM retains information over long periods. Injecting adaptive attention captures the effective input information, which supports the accuracy of the prediction results. The proposed model provides a reference for film investors in risk control, can help in planning film release schedules, and has practical application prospects. The contributions of this paper can be summarized as follows. (1) To improve movie box office prediction accuracy, this paper proposes an LSTM model with an AAM and consumer sentinel (LSTM-AACS). It better captures consumer characteristics, thereby improving prediction accuracy. (2) The LSTM-AACS model is applied to movie box office prediction and achieves good results. The relative error of LSTM-AACS prediction is 6.58%, which is lower than that of the other models used in the experiment.

Literature Review
There are many factors, including investment, director, actors, and sequels, that play a role in promoting and guiding a film's box office. In [11], six independent variables were selected: film investment, film quality, director, actors, film sequel, and piracy. The authors established a linear regression model relating these influencing factors to the movie box office. A semiparametric method was proposed to deal with random effects in a nonparametric way [12]. Its example, which compares the reviews of movie critics, uses the adjacent-category logit model and the related baseline-category logit model. Although this method eliminates the influence of extreme data, it also underuses the information in the data.
The above research provides an important reference for selecting the factors that affect the movie box office. The Sawhney and Eliashberg model was then used to predict the cumulative number of viewers of a movie a few weeks after its release [13]. Its practical significance is that, during the life cycle of a movie release, movie theaters can dynamically adjust the projection strategy. For example, movie producers can expand or reduce the number of theaters showing the movie, change the projection period, and so on. However, this method has the following shortcomings. (1) When the multiple linear regression algorithm is used to predict the cumulative audience in the first week, few film influencing factors (number of film copies, user ratings, number of theaters, and audience age) are considered, and the special attributes with which a movie attracts its audience are ignored. This leads to an excessively large prediction error for the first week. (2) This error accumulates when the diffusion model is used to predict the number of viewers in the following weeks, which affects the final prediction accuracy.
Based on the multilayer neural network algorithm, multiple movie attributes that affect the box office were combined [14]. The authors proposed a movie box office classification model, used classification accuracy as the main index to evaluate its performance, and achieved good classification results. However, this method uses binary discrete numbers to quantify the various influencing factors of the movie box office, which is a coarse processing method. These variables are not quantified according to the actual situation, so they cannot fully reflect the differences among the influencing factors. In addition, the classification of the movie box office in the output layer of the prediction model is also coarse, making each box office class too broad. Such a classification is of little reference value for film investors and movie theaters seeking to control the cost of film production and screening. A multimodal deep neural network for predicting movie box office revenues was proposed [15]. A CNN was built to extract features from movie posters. Then, a multimodal deep neural network was built to leverage both the movie poster features and other movie-related data for revenue prediction. In addition, the features the CNN learned from movie posters were analyzed. However, the research did not focus on building more multimodal DNNs, nor did it incorporate audio and video data related to movies. In [16], a hybrid social recommender system utilizing a deep autoencoder network is introduced. The proposed approach employs collaborative and content-based filtering, as well as users' social influence. The social influence of each user is calculated based on his or her social characteristics and behaviors on Twitter. For evaluation, the required datasets were collected from MovieTweetings and the Open Movie Database. However, the dataset used in this study is not comprehensive enough and may limit the prediction accuracy.
The LSTM-AACS model used in this paper is an LSTM model based on adaptive attention. Much work has been proposed on attention-based LSTM models. An attention-based LSTM model was proposed for financial time series prediction [15], and the model's predictions can be intuitively understood through the attention vector. In addition, its focus on time and factors makes it easy to understand why certain trends are predicted for a given time series. The authors also modified the loss function of the attention model using weighted categorical cross-entropy. However, the model has a shortcoming: its error is small in long-term forecasts, but its short-term forecasts are not ideal and have high errors. A forecasting framework was established to predict the opening prices of stocks [16]. The authors processed stock data through a wavelet transform and used an attention-based LSTM neural network to predict the stock opening price, with excellent results. However, considering only the impact of historical data on price trends is too narrow and may not allow a full and accurate forecast of the price on a given day. An attention-based long short-term memory network for aspect-level sentiment classification was proposed [17]. The attention mechanism can concentrate on different parts of a sentence when different aspects are taken as input. However, its flaw is that different aspects are input separately, so it does not model multiple aspects simultaneously with the attention mechanism. An attention-based LSTM network was proposed for cross-language sentiment classification [18]. The authors used a bilingual bidirectional LSTM to model the sequences of words in the source and target languages. Based on the particularities of sentiment classification tasks, they proposed a hierarchical attention model that was jointly trained with the LSTM network. The model achieved gratifying results on a benchmark dataset with Chinese as the source language and English as the target language. However, the performance of the model was not evaluated on more datasets and more language pairs. An attention-based LSTM model for the task of hashtag recommendation was proposed [19]. The authors adopted the LSTM architecture to avoid hand-crafted features. Their model incorporates topic modeling into the LSTM architecture through an attention mechanism and inherits the advantages of both. Through evaluations run on a large dataset from Twitter, they demonstrated that the proposed method outperforms competitive baseline methods. However, the work does not consider the use of other types of data in microblogs for hashtag recommendation [20].
The main problems above are as follows. (1) The model performs well in short-term prediction, but the effect is not ideal in long-term prediction. (2) The input data of the model is not comprehensive, so high prediction accuracy is achieved only on a certain dataset. (3) The factors influencing the result are not considered comprehensively; for example, user information is ignored, resulting in low prediction accuracy. Based on the above problems, we propose an AAM with consumer sentinel for movie box office prediction. The consumer sentinel addresses the neglect of consumer groups in previous predictions, and the AAM captures effective input information well. Finally, the LSTM model based on these two components is used to predict the movie box office and is compared with other models. Experiments show that the prediction accuracy of the AAM-based box office prediction model with consumer sentinel is better than that of the other models used in the experiment.

Framework Design.
The framework is shown in Figure 1. This paper adds consumer information to the previously used movie box office influencing factors and injects an AAM into the LSTM neural network. Its structure is shown in the blue box: the consumer sentinel is input into the model as a feature and then combined with the attention mechanism to train the LSTM model. This improves the prediction accuracy.

Factors of a Movie.
This paper uses a statistical analysis of historical movie box office data in China combined with the actual situation of the movie market. It selects the director, actor, film genre, nation, and release date as the film's own influencing factors (the film's information input). It then assigns different weights to each factor. The calculation method is explained in detail in Section 4.1.

Consumer Groups.
In addition to the movie's own influencing factors mentioned in Section 3.2.1, this paper adds the age information of movie consumer groups, because every movie has its own audience. For example, military subjects are more suitable for viewers who are teenagers and older, while cartoons attract more children. Elderly people generally go to the cinema rarely. The age information of consumers is used as input information, and weights are assigned so that it jointly predicts the final box office of the movie.

Long- and Short-Term Memory Network Layer.
LSTM is an improved RNN (Recurrent Neural Network) model that alleviates the problems of gradient explosion and gradient vanishing during RNN training. Unlike the single tanh loop structure in a standard RNN, LSTM is a special network with three "gates" [21,22]: the forget gate, the input gate, and the output gate. The forget gate is responsible for choosing which past information to forget. The input gate determines which useful new information is stored in the cell state. The output gate determines the output information. The process by which the memory module updates its state and outputs information is as follows:
(1) The core of LSTM is the cell: the cell state is the memory conveyor belt of the entire module and changes over time. The conveyor belt itself cannot control which information is memorized; the forget gate, input gate, and output gate play the controlling role.
(2) Forget state information: take the input $x_t$ at the current moment and the memory unit state information $h_{t-1}$ at the previous moment, and then use the sigmoid function to output a value in $[0, 1]$ indicating the degree to which historical information needs to be retained:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right). \quad (1)$$
(3) Update state information: combine the old state information $f_t * C_{t-1}$ (used for forgetting) with the new candidate information $i_t * \tilde{C}_t$ to determine the updated information:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \quad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right), \quad C_t = f_t * C_{t-1} + i_t * \tilde{C}_t. \quad (2)$$
(4) Output information: first determine which part of the state will be output. Then obtain the memory unit output at the current time from the value of the output gate and the tanh-transformed state information of the memory unit:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \quad h_t = o_t * \tanh(C_t). \quad (3)$$
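The gate updates described in steps (2) to (4) can be sketched as a single time step in plain Python. This is a minimal illustration of the standard LSTM cell, not the authors' implementation: the function names (`lstm_step`, `affine`), the parameter dictionary keys, and the list-based vectors are all assumptions made for readability.

```python
import math

def sigmoid(values):
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step. p is a hypothetical dict with W_f, W_i, W_c, W_o and b_f, b_i, b_c, b_o."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = sigmoid(affine(p["W_f"], z, p["b_f"]))                      # forget gate
    i_t = sigmoid(affine(p["W_i"], z, p["b_i"]))                      # input gate
    c_tilde = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]   # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]  # state update
    o_t = sigmoid(affine(p["W_o"], z, p["b_o"]))                      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                # cell output
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the new cell state is exactly half the previous one. That makes a convenient sanity check before plugging in trained parameters.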

Adaptive Attention Mechanism Layer.
This paper adds an AAM layer [22] to the method, which can better capture the effective information in the movie box office data and grasp the core data information. It overcomes a limitation of the standard LSTM model, which uses the same state vector at every prediction step and therefore cannot fully learn the detailed information of the sequence encoding during prediction. The extension builds on the original LSTM model by adding two formulas, in which $W_i$ and $x_t$ are the input of the LSTM and $W_h \in \mathbb{R}^{d \times d}$ is a parameter matrix that the model needs to train. $C_t$ is a memory cell, called the sentinel gate. It is similar to the input gate, forget gate, and output gate in LSTM, and the structure of its formula is similar to (4). The vector $c$ in the AAM is expressed through $\beta_t \in [0, 1]$, which can be regarded as the sentinel gate in the true sense: it controls the degree to which the model pays attention to $C_t$. At the same time, the attention distribution $\alpha_t$ over the $K$ areas of the AAM is expanded to $\hat{\alpha}_t$ by splicing one element after $z_t$, so that $\hat{\alpha}_t$ has $K + 1$ elements. Finally, the output probability distribution is computed with $W_p$, a parameter matrix that the model needs to train. The output variable $h_t$ finally passes through a fully connected layer and a softmax classifier, which produces the movie box office prediction.
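The sentinel-based blending described above can be illustrated with a small sketch. It assumes the widely used adaptive attention formulation in which one sentinel score is appended to the $K$ attention scores before the softmax, the last softmax weight is taken as $\beta_t$, and the final context is $\beta_t s + (1 - \beta_t) c$. The paper's exact equations may differ, and the names `adaptive_context`, `region_vectors`, and `sentinel` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_context(region_scores, sentinel_score, region_vectors, sentinel):
    """Blend the attended context with a sentinel vector (assumed formulation).

    region_scores: K attention scores (z_t), one per input region.
    sentinel_score: the extra score spliced after z_t.
    Returns (c_hat, beta), where beta is the weight placed on the sentinel.
    """
    dim = len(sentinel)
    alpha = softmax(list(region_scores))                         # attention over K regions
    context = [sum(a * vec[d] for a, vec in zip(alpha, region_vectors))
               for d in range(dim)]
    alpha_hat = softmax(list(region_scores) + [sentinel_score])  # K + 1 elements
    beta = alpha_hat[-1]                                         # sentinel gate value in [0, 1]
    c_hat = [beta * sentinel[d] + (1.0 - beta) * context[d] for d in range(dim)]
    return c_hat, beta
```

When the sentinel score dominates the region scores, `beta` approaches 1 and the model relies almost entirely on the sentinel. When it is small, the output approaches the ordinary attention context.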

Normalization of Impact Factors.
This section elaborates on the factors that affect the movie box office and gives the corresponding definitions. It also gives the quantification process for each attribute of the movie box office data, in preparation for constructing the LSTM-AACS training set.

Director. The box office influence index $\mathrm{Dir}_i$ of director $i$ is defined over the following quantities: $i$ is the director number, $j$ denotes the $j$th movie filmed by director $i$, $k$ indicates the week of release, $m$ is the number of movies by director $i$ whose release times are closest to the current time, and $b_{jk}$ represents the box office during the $k$th week of the $j$th most recent movie. The box office influence weight $\mathrm{DirectorWeight}_i$ of the films directed by director $i$ is then obtained as follows:
$$\mathrm{DirectorWeight}_i = \frac{\lg\left(\mathrm{Dir}_i / \mathrm{Dir}_{\min}\right)}{\lg\left(\mathrm{Dir}_{\max} / \mathrm{Dir}_{\min}\right)}, \quad (12)$$
where
$$\mathrm{Dir}_{\max} = \max\{\mathrm{Dir}_1, \mathrm{Dir}_2, \ldots\}, \quad \mathrm{Dir}_{\min} = \min\{\mathrm{Dir}_1, \mathrm{Dir}_2, \ldots\}, \quad (13)$$
$i$ represents the number of the director, and $\mathrm{Dir}_i$ represents the influence of the $i$th director.
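The logarithmic normalization in (12) and (13) maps the smallest influence index to 0 and the largest to 1. A minimal sketch follows, assuming the influence indices have already been computed and are strictly positive. The function name `log_ratio_weights` and the error handling for degenerate inputs are illustrative choices, not part of the paper.

```python
import math

def log_ratio_weights(influence):
    """Map raw influence indices to [0, 1] via lg(x / min) / lg(max / min).

    influence: dict from an identifier (e.g. a director name) to a positive index.
    """
    lowest = min(influence.values())
    highest = max(influence.values())
    if lowest <= 0:
        raise ValueError("influence indices must be strictly positive")
    if highest == lowest:
        # Assumption: the formula is undefined when all indices are equal.
        raise ValueError("at least two distinct influence values are required")
    denom = math.log10(highest / lowest)
    return {key: math.log10(value / lowest) / denom
            for key, value in influence.items()}
```

For example, three hypothetical directors with indices 1, 10, and 100 receive weights 0, 0.5, and 1. The log scale prevents a single blockbuster director from compressing everyone else toward zero, as a linear min-max scaling would.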

Actor. The box office influence index $A_i$ of actor $i$ is defined over the following quantities: $i$ is the actor number, $j$ denotes the $j$th movie in which actor $i$ appeared, $k$ indicates the week of release, $m$ is the number of movies featuring actor $i$ whose release times are closest to the current time, $b_{jk}$ represents the box office during the $k$th week of the $j$th most recent movie, and $u_{ij}$ is the participation coefficient of actor $i$ in the $j$th movie he or she recently participated in. The coefficient is defined piecewise by the billing order $n$ of the actor, with $u_{ij} = 0.5$ for $n \in (5, +\infty)$, where $n$ is a positive integer indicating the order of actor $i$ in the $j$th movie. The box office influence weight $\mathrm{ActorWeight}_i$ of the films featuring actor $i$ is then obtained from the actor influence values, where $i$ represents the number of the actor and $A_i$ represents the influence of the $i$th actor.

Movie Genre. The box office influence index of movie genre $i$ is defined over the following quantities: $i$ is the genre number ($i = 1, 2, \ldots, 9$), $k$ indicates the week of release, $m$ represents the week of screening of genre $i$, $j$ denotes the $j$th movie belonging to genre $i$, and $b_{jk}$ represents the box office of the $j$th movie of genre $i$ in the $k$th week of its release. The box office influence weight $\mathrm{GenreWeight}_i$ of the films of genre $i$ is obtained from these genre influence values.

Nation. The box office influence index $N_i$ of movie nation $i$ is defined over the following quantities: $i$ is the nation number ($1 \le i \le 5$), with the values 1 to 5 corresponding to Europe and America, Japan and Korea, Hong Kong and Taiwan, Mainland China, and other regions; $k$ indicates the week of release; $m$ represents the total number of movies belonging to distribution area $i$; $j$ denotes the $j$th movie in distribution area $i$; and $b_{jk}$ represents the box office during the $k$th week of release of the $j$th movie in distribution area $i$. The box office influence weight $\mathrm{NationWeight}_i$ of the films from area $i$ is obtained from these values, where $i$ represents the serial number of the distribution area, $N_i$ represents the influence of distribution area $i$, and $N_j$ represents the influence of distribution area $j$.

Data. The box office influence index $D_i$ of release schedule $i$ is defined over the following quantities: $i$ is the schedule number ($1 \le i \le 4$), with the values 1 to 4 corresponding to the Lunar New Year slot, the May Day (May 1) slot, the summer slot, and the National Day (October 1) slot; $k$ indicates the week of release; $m$ represents the total number of movies released in schedule $i$; $j$ denotes the $j$th movie belonging to schedule $i$; and $b_{jk}$ represents the box office generated during the $k$th week of release of the $j$th movie in schedule $i$. The weight $\mathrm{DataWeight}_i$ measures the box office influence of release schedule $i$ on the movies released in that schedule, where $i$ represents the serial number of the schedule and $D_i$ represents the influence of schedule $i$.

Consumer Group.
This paper divides consumers into four age groups: under 18, 18-45, 46-69, and over 69. The box office influence index $\mathrm{Age}_i$ of age group $i$ is defined over the following quantities: $i$ is the age group number ($1 \le i \le 4$), with the values 1 to 4 corresponding to under 18 years old (excluding 18), 18-45 years old, 46-69 years old, and over 69 years old (excluding 69); $k$ indicates the week of release; $m$ represents the total number of movies in age group $i$; $j$ denotes the $j$th movie in age group $i$; and $b_{jk}$ represents the box office during the $k$th week of release of the $j$th movie in age group $i$. The box office influence weight $\mathrm{AgeWeight}_i$ of the films indexed by age group $i$ is obtained from $\mathrm{Age}_i$ and the extreme value $\mathrm{Age}_{\max} = \max\{\mathrm{Age}_1, \mathrm{Age}_2, \ldots\}$, where $i$ represents the serial number of the age group and $\mathrm{Age}_i$ represents the influence of age group $i$.
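The age boundaries above can be made explicit with a small helper that maps a consumer's age to the group index $i$. This is only an illustration of the stated boundaries (18 falls in group 2, and 69 falls in group 3). The function name `age_group` and the rejection of negative ages are assumptions.

```python
def age_group(age):
    """Return the age group index i in 1..4 for an age in whole years.

    1: under 18 (excluding 18), 2: 18-45, 3: 46-69, 4: over 69 (excluding 69).
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return 1
    if age <= 45:
        return 2
    if age <= 69:
        return 3
    return 4
```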

Model Parameters.
In the LSTM-AACS model, we set the dropout rate to 0.5. During training, mini-batch stochastic gradient descent is used to reduce the training loss, with a mini-batch size of 64. The prediction results are analyzed using the relative error. This paper uses the movie box office prediction dataset from a Kaggle competition. After the predicted box office values are obtained, the relative error of each prediction is computed from the predicted and actual values.
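A common definition of relative error, which the evaluation here appears to use, is $|\hat{y} - y| / |y| \times 100\%$, where $y$ is the actual box office and $\hat{y}$ the predicted one. The sketch below assumes that definition. The function names and the sample numbers in the usage note are illustrative and are not taken from the paper's tables.

```python
def relative_error(predicted, actual):
    """Relative error in percent: |predicted - actual| / |actual| * 100."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0

def mean_relative_error(predictions, actuals):
    """Average relative error in percent over paired predictions and actual values."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [relative_error(p, a) for p, a in zip(predictions, actuals)]
    return sum(errors) / len(errors)
```

For instance, a hypothetical prediction of 110 against an actual value of 100 gives a relative error of 10%. Averaging such errors over a test set yields the kind of figure reported per model.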

Error Comparison.
In the experiment, the results are analyzed by cross-validation. This paper takes the first 3000 of the 10,398 records as the training set and the last 7398 records as the test set. The model is trained for 30 learning passes, and then ten rounds of cross-validation are applied. The average relative errors of the models are shown in Table 1. It can be seen from Table 1 that, under ten rounds of cross-validation, the average relative errors of the LSTM time series model and the LSTM-Attention model are higher than the relative error of the model proposed in this paper. This shows that the LSTM-AACS model is better than the LSTM model and the general LSTM model with attention for movie box office prediction.
For several movies randomly selected from the test set, Figure 2 compares the relative errors of the models.
From the seven randomly selected movies, it can be seen that the relative error of the LSTM-AACS model on the test set is lower than that of the LSTM model and the LSTM-Attention model. The prediction results of the LSTM-AACS model are more accurate, and the performance is improved.

Result Comparison.
To examine both the long-term and short-term prediction capabilities of the model, we compare the long-term prediction capabilities of the LSTM-AACS model, the LSTM-Attention model, and the LSTM time series model. We also compare and analyze the movie box office data from the Kaggle competition. Additionally, we choose the box office data of a Maoyan movie to predict the short-term box office. Considering well-known movies of previous years, this paper chooses Dangal; My People, My Country; Wolf Warriors II; and Fast & Furious 7 and predicts their cumulative box office. For these movies, this paper compares the actual value, predicted value, absolute difference, and relative error of the three models. The specific results are shown in Table 2.
As can be seen from Table 2, the relative error of the LSTM-AACS model in predicting the cumulative box office of the above four movies is lower than that of the LSTM model and the LSTM-Attention model. This supports the feasibility of the LSTM-AACS model proposed in this paper for predicting movie box office. It also suggests that the LSTM-AACS model can provide a better evaluation reference for movie investors.

Conclusion
To address the neglect of consumer factors and the low prediction accuracy in movie box office prediction, this paper proposes an adaptive attention movie box office prediction model with consumer sentinel. The experimental results show that introducing consumer data into the prediction model improves the prediction accuracy beyond what a movie's own influencing factors achieve. Compared with a single LSTM model and an LSTM model with an attention mechanism, the LSTM model with AAM has better prediction capability for the movie box office. In the future, the model can be further optimized by enriching the features derived from expert experience, introducing more consumer characteristics, and adding movie reviews as an influencing factor.

Data Availability
The data used to support the findings of the study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.