The Power of Words: A Study of How Search Contents Can Affect Financial Decisions

As search engines have become the main information source in daily life, studies of online search behavior have gained popularity, along with growing knowledge of how search behavior can affect everyday decisions, e.g. what to purchase, where to travel and even how to define beauty. However, there is no consensus on whether it is the search behavior itself or the linguistic meaning behind it that affects these decisions. We analyzed the linguistic meanings of 13,915 English words, their search volumes obtained from Google Trends, and the profits they yielded in the US housing market under an automatic trading strategy. We found that the linguistic meanings of search contents can affect financial decision results when the words are grouped into clusters with unsupervised machine learning methods.


Introduction
In investigating how search behavior affects people's daily lives, the pioneers in this domain focused on confirming the correlation between search volumes and contemporaneous events in various fields. For example, the frequency with which specific cancers were queried on the Internet during 2001-2003 was found to be closely correlated with their actual incidence (Cooper et al. 2005). The counts of the top 300 search contents during 2001-2003 were claimed to be highly correlated with the unemployment figures presented by the US Bureau of Labor Statistics (Ettredge et al. 2005). Choi & Varian (2012) analyzed how search volume correlated with economic activities such as auto and home sales, international visitor statistics, and the US unemployment rate. They found a linear relationship between search behavior and the events that had happened. Later on, the focus in this domain shifted from confirming the correlation to "predicting the present", that is, to predicting events days or weeks in advance of their actual occurrence by analyzing search behavior on the Internet. Tobias et al. (2013), for instance, quantified Wikipedia usage patterns before stock market moves.
Recent studies have shown that search behavior can even change how people define outdoor beauty (Chanuki et al. 2015).
There has been considerable debate about using online search behavior as an information source to predict reality, because predictions based on search behavior can fail from time to time. For example, in 2008, researchers from Google claimed that they could "nowcast" the flu based on people's search volume on Google. It was a success at that time. However, their predictions after 2008 failed, and failed spectacularly, missing the peak of the 2013 flu season by 140 percent. Thereafter, many researchers and practitioners turned to exploring how online search behavior could be used effectively under certain restrictions or with specific techniques, and many successes in using search behavior data have been reported (Saiz & Simonsohn 2013; Olivola & Sagara 2009). Especially during the financially unstable years, online search behavior has been regarded as a new information source, sometimes even more reliable than the traditional ones (Moat et al. 2014), because the internet has become a main way for consumers to distribute and gather information. Sharad Goel (2010) predicted computer game sales and movie sales from consumers' online search behavior before the release dates. In prediction with search behavior, Google Trends has played an important role in significantly raising the accuracy. With Google Trends, Tobias Preis forecast the influenza outbreaks successfully in 2014. Susy Moat quantified the advantage of looking forward by finding that countries whose users, in 2012, searched for "2013" more than for "2012" in Google Trends had a much higher GDP level. In an emerging market, Tobias et al. (2013) employed Google Trends to make predictions ahead of financial market moves.
Electronic copy available at: https://ssrn.com/abstract=3508053
Besides stock markets, Hohenstatt, Kasbauer & Schafers (2011) found in a real estate market that online search behavior could predict aggregate changes in house prices in 20 major cities across the U.S., while Beracha & Wintoki (2013) claimed that cross-sectional differences in search behavior could predict cross-sectional differences in house price changes across more than 200 cities in the U.S. Wu & Brynjolfsson (2015) showed that Google Trends had good predictive ability in a real estate market. It is worth noting that Tsolacos (2012) and Dietzel, Braun & Schäfers (2014) explored the role of sentiment in predicting changes in house prices. Tsolacos (2012) revealed that the economic sentiment indicator (ESI) could generate advance signals for forecasting turning points of house prices in the real estate market. Dietzel, Braun & Schäfers (2014) took the internet search volume provided by Google Trends as a sentiment indicator and showed that it could improve commercial real estate forecasting models for transactions and price indices.
Although the correlation between search volume and actual events has been confirmed and the accuracy of prediction has been raised, the research subjects have been limited to a few types, e.g. search volume, frequency and headline topics. As for the size of the search contents, to our knowledge no more than ten types have been examined. Worse still, although some researchers such as Tsolacos (2012) and Dietzel, Braun & Schäfers (2014) examined the role of sentiment in predicting changes in house prices, the emotional meaning of the search contents themselves was not involved. Although emotional meaning is a pervasive aspect of how we interact with the world around us (Robinson & Altarriba, 2014), it has never been considered in research on search behavior. This study therefore takes the emotional meanings of online search contents as a case to explore the role of lexical meanings in predicting actual events. To be specific, the purpose of this study is twofold. One is to confirm, with a large set of search contents, the correlation between search volume and house price, and to verify the good predictive ability in a real estate market claimed by Hohenstatt, Kasbauer & Schafers (2011), Wu & Brynjolfsson (2015) and Tobias et al. (2014). The other is to explore the role of emotional meanings in predicting changes in house prices.

Materials and Methods
Three data sources were employed in this study: 1) Google Trends data were obtained from https://trends.google.com/trends/?geo=US. The search words were taken from the norms of Warriner (2014), 13,915 words altogether, and monthly Google Trends data for all 13,915 words were downloaded for March 2008 to January 2019; 2) the emotional meanings of the 13,915 English words were collected from the norms of Warriner (2014) along three dimensions, that is, valence (the pleasantness of the words), arousal (the intensity of emotion provoked by the words), and dominance (the degree of control exerted by the words); and 3) house prices were obtained from Zillow because investment in real estate is more common to [...]
Step 1: The Pearson correlation test was carried out to test the linear relationship between the standardized search volume of each word from the norms of Warriner (2014) and the house price from Zillow over all 131 months. The p-value threshold was preset at 0.001. A total of 10,341 words passed the test, which means that the standardized search volume of most words listed in the norms of Warriner is significantly correlated with the U.S. house price.
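The screening in Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run in the study: "standardized" is taken to mean a z-score, the function names are hypothetical, and the p-value threshold is expressed as a critical value of the t-statistic supplied by the caller (for 131 monthly observations and a two-sided threshold of 0.001, that value is roughly 3.4).

```python
from math import copysign, inf, sqrt
from statistics import mean, pstdev


def standardize(series):
    """Z-score a series (assumed meaning of 'standardized search volume')."""
    m, s = mean(series), pstdev(series)
    return [(x - m) / s for x in series]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def t_statistic(r, n):
    """t-statistic of a Pearson coefficient r computed from n observations."""
    if abs(r) >= 1.0:  # perfect correlation, avoid dividing by zero
        return copysign(inf, r)
    return r * sqrt((n - 2) / (1 - r * r))


def passes_test(volume, price, t_crit):
    """True if the word's search volume is significantly correlated with price.

    t_crit is the two-sided critical value matching the chosen p-value
    threshold and n - 2 degrees of freedom (an input, not computed here).
    """
    r = pearson_r(standardize(volume), price)
    return abs(t_statistic(r, len(price))) > t_crit
```

Because the Pearson coefficient is invariant to linear rescaling, standardizing the search volume does not change the outcome of the test; it is kept here only to mirror the description in Step 1.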
Step 2: Profits based on the 10,341 words: an automatic transaction method called the "Google Trends strategy" was implemented to calculate the profits of a portfolio. Profits can only be made in a trading strategy if at least some future changes in the house price are correctly anticipated, in particular around large fluctuations of the house price. The results are compared with the "Buy and Hold" strategy, whose profit is simply that made by the rise of the house price.
The thresholding period was set at six months, so the Google Trends strategy started from the seventh month. The standardized search volume of each word for each month was compared with its average search volume of the first six months. There were two positions, that is, a short position and a long position. Short position: if the search volume for one specific month went up as compared with the average search volume of the last six months, the house would be sold at the price offered on Zillow for that month. Long position: if the search volume for one specific month went down, the house would be bought. In this way, the cumulative profits of a strategy's portfolio for a word could be obtained on the basis of buying and selling actions. Thus, the Google Trends strategy gave us the profits of each word for every month from the seventh month onward. Finally, the profits for the 10,341 words were calculated.
Of course, in applying this approach to analyze the relationship between standardized search volume and fluctuation of house price, the transaction fees have been neglected in this hypothetical investment strategy.
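The trading rule described above can be sketched as follows. Several details are assumptions made only for illustration, since the text does not specify them: the baseline is a rolling average over the six preceding months, each position is opened at the current month's price and closed at the next month's price, short and long returns are treated symmetrically as price ratios, and transaction fees are ignored. The function names are hypothetical.

```python
from statistics import mean


def google_trends_strategy(volume, price, window=6):
    """Cumulative return multiplier of the hypothetical strategy (1.0 = no gain).

    volume: monthly search volume of one word
    price:  monthly house price for the same months
    window: number of preceding months used as the baseline (assumed rolling)
    """
    if len(volume) != len(price):
        raise ValueError("volume and price must cover the same months")
    wealth = 1.0
    # Trading starts after the first `window` months; the final month has no
    # following price at which a position could be closed.
    for t in range(window, len(price) - 1):
        baseline = mean(volume[t - window:t])
        if volume[t] > baseline:
            # Short position: sell at this month's price, buy back next month.
            wealth *= price[t] / price[t + 1]
        elif volume[t] < baseline:
            # Long position: buy at this month's price, sell next month.
            wealth *= price[t + 1] / price[t]
    return wealth


def buy_and_hold(price, window=6):
    """Return multiplier from buying at the first trading month and holding."""
    return price[-1] / price[window]
```

With a made-up series in which search volume rises just before a price drop and falls just before a recovery, the strategy gains on both moves while buy-and-hold ends flat:

```python
volume = [1, 1, 1, 1, 1, 1, 2, 0, 2]
price = [100, 100, 100, 100, 100, 100, 100, 80, 100]
print(google_trends_strategy(volume, price), buy_and_hold(price))
```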
Step 3: The Pearson correlation test was carried out to test the linearity between the emotional meanings of each of the 10341 words along three dimensions and their corresponding profits obtained in Step 2.
Step 4: Within a space of three emotional dimensions, emotional words tend to gather in clusters (Doerksen & Shimamura 2001). Considering this pattern, a machine learning method called "hierarchical clustering" was also applied to all the 13,915 words processed by the Google Trends strategy, in terms of their profits and their emotional meanings along three dimensions, and the Pearson correlation test was carried out within each cluster and between clusters.
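As a rough illustration of the clustering in Step 4, the sketch below implements a naive agglomerative (bottom-up) hierarchical clustering. The linkage criterion, the distance metric and any feature scaling used in the study are not stated, so average linkage with Euclidean distance is assumed here, and the function names and sample points are made up. The naive search over all cluster pairs is only practical for small inputs.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance between all members of clusters a and b."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative_clusters(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    points: list of equal-length numeric tuples, e.g.
            (valence, arousal, dominance, profit) per word
    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # combinations() yields index pairs (a, b) with a < b, so deleting
        # cluster b after the merge leaves index a valid.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


# Made-up (valence, arousal, dominance) triples for four words.
sample = [(7.5, 5.8, 6.4), (7.4, 5.7, 6.3), (2.0, 6.5, 3.0), (2.1, 6.4, 3.1)]
groups = agglomerative_clusters(sample, 2)
```

For a vocabulary of this study's size, an optimized library implementation would be the more realistic choice; the sketch is meant only to show the merge logic.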

Emotional Meaning in Predicting:
The cumulative profits and the emotional scores along three dimensions of the top words are presented in Table 1.
As shown in Table 1, "success" has the highest cumulative profit (167.04%), with its valence, arousal and dominance scores at 7.49, 5.8 and 6.38, respectively. Among these top words, "train" has the lowest cumulative profit (114.4%), with scores along the three dimensions at 6.36, 4.05 and 5.72.
Based on the 10,341 words, the Pearson correlation test was done between their cumulative profits and their emotional scores along three dimensions, but no strong correlation was found (P(valence) = 0.84; P(arousal) = 0.77; P(dominance) = 0.31). Taking the arousal scores of the 26th cluster as a case, the high correlations between the emotional scores of its 500 words and their cumulative profits are plotted in Figure 2.
Predicting between Clusters:
Another Pearson correlation test was adopted to analyze the relationship between the 26 clusters, based on the mean emotional scores of the 500 words within each cluster. As indicated in Figure 2, the cumulative profits of each cluster (plotted as the size of the red balls in Figure 2) are significantly related to the cluster mean scores along two dimensions, that is, Arousal (c = -0.94, df = 24, p < 0.001) and Dominance (c = -0.941, df = 24, p < 0.001), but not to Valence (c = -0.47, df = 24, p = 0.015, above the 0.001 threshold). As we can see from Figure 2, the mean Valence scores of the clusters vary only within a narrow range (between 3.93 and 4.91). The results indicate: (1) for the Valence dimension, the mean scores of the clusters are stable, and the variation between clusters does not affect the cumulative profits. In other words, if the words are to be analyzed in terms of emotional meaning, their valence cannot be used for predicting the profits; (2) for the Arousal dimension, the mean scores of the clusters are negatively correlated with the cumulative profits, that is, a cluster with a high mean Arousal score tends to have a low cumulative profit; (3) for the Dominance dimension, the mean scores share a similar pattern of correlation with cumulative profits: a cluster with a low mean score will most probably indicate high cumulative profits.
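The between-cluster coefficients reported above can be checked against their p-values with the standard t-transformation of a correlation coefficient. The worked values below assume that c denotes the Pearson coefficient r and that the tests are two-sided; the rounding is ours.

```latex
\begin{aligned}
t &= \frac{r\sqrt{df}}{\sqrt{1-r^{2}}}, \qquad df = n-2 = 24 \\
\text{Valence:}\quad t &= \frac{-0.47\sqrt{24}}{\sqrt{1-0.47^{2}}} \approx -2.61 \\
\text{Arousal:}\quad t &= \frac{-0.94\sqrt{24}}{\sqrt{1-0.94^{2}}} \approx -13.5
\end{aligned}
```

With 24 degrees of freedom, |t| of about 2.61 corresponds to a two-sided p-value of about 0.015, in line with the value reported for Valence, whereas the preset threshold of 0.001 requires |t| above roughly 3.75, which the Arousal coefficient exceeds by a wide margin.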
The differences in the distribution of the emotional meanings along a single dimension and the interaction along two dimensions might be the reasons why the arousal and dominance dimensions, rather than valence, could be used to predict the cumulative profits based on the analysis of emotional meanings between clusters.
Third, within a space of three emotional dimensions, emotional words tend to gather in clusters (Doerksen & Shimamura 2001). The emotional meanings within each cluster share identical features in emotional measurements, e.g. words referring to anger are predominantly low in valence and high in arousal and dominance. In a similar way, the emotional meanings of neighboring clusters have something in common in their emotional measurements. For example, Montefinese et al. (2014) found that Italian words referring to fear, sadness and despair were generally located in a similar region of the emotional space, low in valence and dominance and high in arousal.
The identical features within each cluster and the similarity between neighboring clusters in emotional measurements might be the reasons why the emotional meanings are found to be correlated with the profits along all three dimensions within a cluster and between neighboring clusters.
With regard to the role of emotional meanings in predicting the changes of house price, the stimulus-organism-response (S-O-R) Model developed by Mehrabian and Russell (1974) [...] Although Tsolacos (2012) and Dietzel, Braun & Schäfers (2014) examined the role of emotions (they used the term "sentiment") in predicting the changes of house price, the indicators they used were not as direct as those in our study. Tsolacos (2012) employed the economic sentiment indicator (ESI), which shows how people feel about the market or economy and quantifies, in a graphical or numerical index, how current beliefs and positions affect future behavior. In a more indirect way, Dietzel, Braun & Schäfers (2014) regarded the internet search volume as an appropriate indicator of emotion.

Conclusion
This study confirmed that search volume was highly correlated with the house price and could be employed for predicting the house price. When the emotional meanings of search contents were taken into consideration, the profits based on prediction were not correlated with the emotional meanings of the words as a whole along any of the three dimensions. However, the profits were positively correlated with emotional meaning along the valence, arousal and dominance dimensions within each cluster, and were negatively correlated between clusters along both the arousal and dominance dimensions.
To the best of our knowledge, this study is the first to confirm the correlation between search volume and house price and to verify the good predictive ability with a large set of search contents. This study is also the first to explore the role of emotional meanings in predicting changes in house prices. However, several limitations are evident in this study. First, although the large set of search contents is one of its original contributions compared with previous studies, 13,915 words are not enough to represent the 100,000 often-used words in English. Second, the search contents are analyzed only in terms of words rather than phrases.
Competing interests. The authors declare no competing interests.
Funding. This study was supported by National Natural Science Foundation of China (grant no. 71671019).