We have retrieved and analyzed several million Twitter messages corresponding to the Spanish general elections held on the 20th of December 2015 and repeated on the 26th of June 2016. The availability of data from two electoral campaigns that are very close in time allows us to compare collective behaviors of two analogous social systems with a similar context. By computing and analyzing the time series of daily activity, we have found a significant linear correlation between the activity of both elections. Additionally, we have revealed that the daily numbers of tweets, retweets, and mentions follow a power law with respect to the number of unique users that take part in the conversation. Furthermore, we have verified that the topologies of the networks of mentions and retweets do not change from one election to the other, indicating that their underlying dynamics are robust in the face of a change in social context. Hence, in the light of our results, there are several recurrent collective behavioral patterns that exhibit similar and consistent properties in different electoral campaigns.
Nowadays, social networking sites (SNS) are a well-established communication medium. They are used by a huge user base to share experiences, discuss opinions, read the news, etc. Twitter is one of the most dynamic SNS with respect to the interactions among users and one of the most powerful with respect to the potential information that can be extracted for research purposes. Moreover, in the last quarter of 2015 this social network had 305 million monthly active users, and the latest data show that the number of monthly active users had risen to 335 million by the second quarter of 2018 [
These facts have stimulated the development of research projects from a wide variety of fields, from sociology to network science, that have provided new perspectives to the study of user behavior. These studies unravel the emergent patterns in our collective behavior and show how they can be used to gain insight about relevant social topics like economy [
In this work, we are interested in electoral processes. In these kinds of contexts, people use social media as a communication medium to exchange opinions. However, there are also some users whose purpose is to influence the conversation in a way that may affect the voting choices of the people. Although these are online actions, they have an impact on the offline world.
The concise character of a tweet turns it into a powerful tool to share breaking news and take part in dynamic debates [
In the Obama campaign of 2008, the efficacy of Twitter as a communication medium in a political context was established for the first time [
Some lines of research are centered around predicting the outcome of elections following diverse techniques [
In a previous paper [
Although we compare some of our results to those presented in that work, in this study we do not aim to predict the outcome of the elections or simply study the relations among users and politicians. We take advantage of the availability of Twitter data gathered during two electoral campaigns (the Spanish general elections of 2015 and the repetition of the elections in 2016) that are very close in time to compare the collective user behavior manifested in two analogous social systems with a similar context. The individual activity and the political actors may be different; in fact, in the second election two parties formed a coalition, altering the political landscape. Our objective is to study and characterize emergent behaviors that are recurrently manifested in political contexts.
To this end, we have computed and compared time series of daily activity for both elections, finding that the temporal series of activity for both electoral periods are significantly correlated. Furthermore, our results suggest that the number of tweets, retweets, and mentions follow a power law with respect to the number of unique users that take part in the conversation. In order to explore the evolution of the interactions among users, we have built networks of mentions and retweets and studied their temporal evolution. This has enabled us to verify that they show similar topological properties in both electoral periods. Besides, we have studied the mention and retweet subgraphs induced by political users and obtained results that imply a lack of communication among different parties and are in agreement with previous works [
The paper is organised as follows. In the second section we describe the political context, some relevant aspects of the interaction mechanisms in Twitter, the characteristics of our dataset, and the methodology followed to build the networks of interactions. In the third section, we present and discuss our results with respect to the user activity and the evolution of the mention and retweet networks. We also use a metric to discuss the influence of regular users and politicians. Furthermore, we analyze the degree of debate among politicians of different parties. Finally, we summarize the main conclusions.
Spain has a bicameral parliamentary system, where the lower house is called Congress of Deputies and the upper house, the Senate. For elections to the Congress of Deputies, held every four years, each of the 50 provinces serves as an electoral district, with the number of deputies representing it determined by its population. Under a proportional representation electoral system governed by the d’Hondt formula, ballots are cast for a provincewide party list rather than for candidates representing individual constituencies.
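To illustrate how the d'Hondt formula turns the votes of one district into seats, the following minimal sketch awards each seat in turn to the list with the largest quotient votes/(seats won + 1). The function name `dhondt`, the party labels, the vote counts, and the tie-breaking rule are hypothetical, and any legal vote threshold is ignored.

```python
def dhondt(votes, seats):
    """Allocate `seats` among party lists with the d'Hondt highest-averages rule.

    votes: dict mapping a party label to its vote count in one district.
    Ties are broken by total votes and then by label (an assumption of this sketch).
    """
    won = {party: 0 for party in votes}
    for _ in range(seats):
        # The next seat goes to the list with the largest quotient v / (s + 1).
        best = max(votes, key=lambda p: (votes[p] / (won[p] + 1), votes[p], p))
        won[best] += 1
    return won


# Hypothetical district with 8 seats (illustrative numbers, not real data).
example = dhondt({"A": 100000, "B": 80000, "C": 30000, "D": 20000}, 8)
print(example)  # {'A': 4, 'B': 3, 'C': 1, 'D': 0}
```

In this example the smallest list obtains no seat even though it holds almost 9% of the votes, which shows how the rule favors larger lists in small districts.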
About four-fifths of the members of the Senate are directly elected via a plurality system at the provincial level. Each province is entitled to four representatives; voters cast ballots for three candidates, and those with the most votes are elected. The remainder of the senators are appointed by the regional legislatures. Because representation is not based upon population, in the Senate smaller and more-rural provinces generally are overrepresented in relation to their overall population [
On the 20th of December 2015, the Spanish general elections were held. The PP (Partido Popular, People’s Party) and the PSOE (Partido Socialista Obrero Español, Spanish Socialist Workers’ Party), which constituted the traditional two-party system, had lost a lot of social support, while the emerging parties Podemos (We Can) and Cs (Ciudadanos, Citizens) were on the rise. This caused a transition from a two-party system to a multiparty system [
In spite of that, the PP, which was holding the government, was still leading the polls. The rest of the parties were behind but not too far away. In fact, the support for the other three main parties fluctuated so much during the year before the election that it seemed impossible to predict, by looking at the polls, what the final ranking of votes would be [
The result was a fragmented parliament where no party held an absolute majority and large coalitions were needed to form a government. The votes and seats that each party obtained are displayed in Table
Results of the Spanish general elections of the 20th of December 2015 and the 26th of June 2016. Podemos and IU are together in 2016 because they formed a coalition called UP.
Party | Votes 2015 | Votes 2016 | Seats 2015 | Seats 2016 |
---|---|---|---|---|
PP | 7236965 | 7941236 | 123 | 137 |
PSOE | 5545315 | 5443846 | 90 | 85 |
Podemos | 5212711 | 5087538 | 69 | 71 |
IU | 926783 | | 2 | |
Cs | 3514528 | 3141570 | 40 | 32 |
Others | 2775011 | 2665069 | 26 | 25 |
Before this new election, one of the emerging parties, Podemos, formed a coalition with IU (Izquierda Unida, United Left). This alliance was called UP (Unidos Podemos, United We Can).
The 2016 election resulted in a parliament that was almost as fragmented as the one in 2015. The votes and seats obtained by each party are presented in Table
We have worked with Twitter messages retrieved with the Twitter Streaming API. This API allows downloading tweets that match a given set of keywords. Keywords for the 2015 election: Keywords for the 2016 election:
We have downloaded tweets during a period of more than two months before and after each election. However, the core of our analysis has been focused on the 15 days of the official electoral campaign, the
Besides the keywords used to retrieve the data, the most used hashtags that appear in our dataset are the following:
Top hashtags of 2015: #podemos, #psoe, #partidopopular, #ciudadanos, #l6elecciones, #7deldebatedecisivo, #votapsoe, #possible, #pp, #mivotocuenta, #españa, #españaenserio, #podemos20dic, #20dicpodemos, #podemosremontada, #votapp, #hevotado, #votapodemos20d
Top hashtags of 2016: #afavor, #unidospodemos, #l6elecciones, #votapsoe, #Debate13j, #psoe, #cambioamejor, #votapp, #españa, #ciudadanos, #partidopopular, #elcanvipossible, #unsiporelcambio, #avotar, #lasonrisadeunpais, #brexit, #eleccionesgenerales
We have also compiled lists of Twitter accounts associated with the four main parties (PP, PSOE, Podemos and Cs) and with IU. The latter is relevant because, in the 2016 election, as we explained in Section
In order to build the set of political accounts, we have looked into the Twitter lists (which are lists of accounts elaborated by the users) defined by relevant official accounts associated with each party. We have downloaded those lists that include politicians, political institutions, or supporters of the party. The total number of retrieved political users that participated in the conversation was 5227 in 2015, with an average number of followers of 4044, and 5012 in 2016, with an average number of followers of 4662.
In this section we briefly describe some of the characteristics of Twitter, specifically the means of communication among users, which are the source used to build the networks of interactions.
In Twitter there are several mechanisms of interaction among users. The first one is to
The
The
Here we will explain the methodology that we have adopted to build the networks of interactions from Twitter data. The interactions that we have analyzed are the
We have built mention networks by considering each user participating in the conversation as a node. Two nodes
The retweet network is built in a similar way as the mention network. The nodes are the users and user
Scheme that shows the method used to build the networks of retweets. In panel 1), user
Let us stress this idea: links in the retweet network join retweeters with original posters; there are no links between the middlemen that broadcast the original message. The reason for this choice of methodology is that the Twitter API used to download the data at that moment only provided information about the original poster of a tweet in the retweet metadata. The retweet network is then a directed and weighted network where the weights of the links from
We have considered two different temporal scales. On one hand, we have built daily networks, counting the 24 hours of a day from 4 AM (UTC) (in Spain, the local time corresponds to UTC+1 for winter time and UTC+2 for summer time) of that day until 4 AM of the following day. This way we capture full
In order to characterize the global activity of the users (number of tweets posted during a given period of time), we have computed time series of total daily activity for the whole period considered. Additionally, we have analyzed the temporal evolution of the distribution of daily activity per user.
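The daily binning described above, with days running from 4 AM (UTC) to 4 AM of the following day, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `(user, utc_datetime)` record layout, the function names, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAY_START = timedelta(hours=4)  # days run from 4 AM UTC to 4 AM UTC, as in the text


def activity_day(timestamp):
    """Return the calendar date that a UTC timestamp is assigned to."""
    return (timestamp - DAY_START).date()


def daily_activity(tweets):
    """tweets: iterable of (user, utc_datetime) pairs (hypothetical record layout).

    Returns (tweets per day, tweets per (day, user)).
    """
    per_day = Counter()
    per_day_user = Counter()
    for user, timestamp in tweets:
        day = activity_day(timestamp)
        per_day[day] += 1
        per_day_user[(day, user)] += 1
    return per_day, per_day_user


def activity_pmf(per_day_user, day):
    """Probability mass function of the number of tweets per user on `day`."""
    counts = Counter(n for (d, _), n in per_day_user.items() if d == day)
    total = sum(counts.values())
    return {activity: freq / total for activity, freq in sorted(counts.items())}


utc = timezone.utc
sample = [  # made-up records for illustration
    ("alice", datetime(2015, 12, 21, 3, 59, tzinfo=utc)),  # still counts as 20 December
    ("alice", datetime(2015, 12, 20, 12, 0, tzinfo=utc)),
    ("bob", datetime(2015, 12, 20, 18, 30, tzinfo=utc)),
    ("bob", datetime(2015, 12, 21, 4, 0, tzinfo=utc)),  # first instant of 21 December
]
per_day, per_day_user = daily_activity(sample)
```

Shifting every timestamp by four hours before taking the date keeps late-night activity attached to the day on which it started, which is the purpose of the 4 AM boundary.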
In the left panels of Figure
Left panels: time series of aggregated user activity per day for the 2015 (top) and 2016 (bottom) elections. The shadowed region corresponds to the days of electoral campaign. Right panel: linear regression between the activity time series of both elections. The lines correspond to the linear fits of the data including the day of the elections (red dashed line) and excluding it (black continuous line).
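A linear regression between two aligned daily activity series, with and without a single outlying day such as election day, can be sketched as below. The function names and the numbers are hypothetical, and the sketch assumes that both series are already aligned by campaign day.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)


def fit_with_and_without(x, y, skip_index):
    """Fit all points, then refit leaving out one index (for example, election day)."""
    keep = [i for i in range(len(x)) if i != skip_index]
    full = linear_fit(x, y)
    reduced = linear_fit([x[i] for i in keep], [y[i] for i in keep])
    return full, reduced


# Made-up daily tweet counts; the last entry plays the role of election day.
activity_a = [10, 20, 30, 40, 200]
activity_b = [12, 22, 28, 45, 150]
full, reduced = fit_with_and_without(activity_a, activity_b, skip_index=4)
print(full, reduced)
```

Comparing both fits shows how much a single high-activity day drives the slope and the correlation coefficient.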
In line with the existing literature [
In Spain, preelectoral silence is mandatory during the period that spans from 12AM of the day before the election to the closing of voting polls. This period is called
It is worth pointing out the similarity of the temporal evolution of the user activity in both elections. In Figure
Another relevant property of the user activity is displayed in Figure
Values of the exponents
| | |||
---|---|---|---|---|
| | | | |
Tweets | | | 0.96 | 0.85 |
Retweets | | | 0.94 | 0.80 |
Mentions | | | 0.89 | 0.66 |
Power law relationships of the total number of tweets, retweets, and mentions per day as a function of the number of unique users that participated in the conversation each day for the 2015 campaign (top) and the 2016 campaign (bottom). Note that the data corresponding to the day of the elections (marked with a circle) were not included in the fits.
Notice that, whereas in 2015 the growth for the three quantities was slightly super-linear with respect to the number of users, in 2016 we observe an approximately linear behavior. Hence, in 2015 when more users join the conversation, the activity experiences a proportionally higher increment than in 2016.
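The distinction between super-linear and linear growth can be made concrete by estimating the exponent of a relation of the form counts = c · users^β. The sketch below fits β by least squares on log-transformed values, a common choice for scaling relations between two measured quantities; the text does not state which fitting procedure was used, so this choice, the function name, and the synthetic data are assumptions.

```python
from math import exp, log


def scaling_exponent(users, counts):
    """Fit counts = c * users**beta by least squares on log-transformed values."""
    lx = [log(u) for u in users]
    ly = [log(c) for c in counts]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    return beta, exp(my - beta * mx)


# Synthetic data generated with beta = 1.2 (slightly super-linear).
users = [100, 1000, 10000]
tweets = [2 * u ** 1.2 for u in users]
beta, c = scaling_exponent(users, tweets)
print(round(beta, 3), round(c, 3))  # 1.2 2.0
```

With β > 1, doubling the number of unique users more than doubles the activity, whereas β ≈ 1 means that activity grows in proportion to the number of users.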
In order to further explore the characteristics of the user behavior, we have also analyzed the temporal evolution of the distribution of the daily user activity shown in the left panels of Figure
Temporal evolution of the distribution of activity per day in both electoral campaigns (2015 in the top panels and 2016 in the bottom panels). Left panels: probability mass functions (PMF) of the distribution of activity for each day in color code. Right panels: daily evolution of the
where in this case
In the right panels of Figure
where the prime denotes differentiation with respect to the first argument and
This technique has been shown to be more precise than a least-squares fit to the log-log plot, which usually yields incorrect results when estimating the parameters of power law distributions [
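A maximum-likelihood estimate of the exponent of a discrete power law can be sketched with the standard library alone. The sketch assumes the usual discrete form p(x) ∝ x^(-α) for integers x ≥ x_min, normalized by the Hurwitz zeta function, and it maximizes the log-likelihood numerically, which is equivalent to solving the condition on the zeta function and its derivative. The function names, the zeta approximation, and the search bounds are choices made for this illustration, not the authors' implementation.

```python
from math import log


def hurwitz_zeta(alpha, q, terms=50):
    """Approximate sum_{k>=0} (k + q)**(-alpha) for alpha > 1.

    The first `terms` terms are summed directly and the tail is estimated
    with the leading Euler-Maclaurin corrections.
    """
    head = sum((k + q) ** -alpha for k in range(terms))
    a = terms + q
    tail = a ** (1 - alpha) / (alpha - 1) + 0.5 * a ** -alpha + alpha * a ** (-alpha - 1) / 12
    return head + tail


def fit_discrete_power_law(data, x_min=1, lo=1.05, hi=6.0, tol=1e-6):
    """Maximum-likelihood exponent of p(x) ~ x**(-alpha) for integers x >= x_min."""
    tail = [x for x in data if x >= x_min]
    n, sum_log = len(tail), sum(log(x) for x in tail)

    def neg_log_likelihood(alpha):
        return n * log(hurwitz_zeta(alpha, x_min)) + alpha * sum_log

    # Golden-section search; the negative log-likelihood is convex in alpha.
    ratio = (5 ** 0.5 - 1) / 2
    while hi - lo > tol:
        left, right = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
        if neg_log_likelihood(left) < neg_log_likelihood(right):
            hi = right
        else:
            lo = left
    return (lo + hi) / 2


# Made-up numbers of tweets per user on one day.
print(fit_discrete_power_law([1, 1, 1, 2, 1, 3, 1, 2, 5, 1]))
```

The search interval [1.05, 6] is an arbitrary bracket for this sketch; an estimate that lands on either bound indicates that the bracket should be widened or that a power law is a poor description of the data.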
We can see that the values of the
The small fluctuations of the
We have analyzed the temporal evolution of the aggregated mention and retweet networks at two different temporal scales. On the one hand, we have aggregated the networks for the whole campaign period (plus the next three days); on the other hand, we have performed an analysis of the temporal evolution of the networks by aggregating the data for each day separately and computing time series for different metrics.
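The construction of a directed, weighted interaction network and its aggregation over several days can be sketched as follows. The direction of the links (from the retweeter to the original poster), the `(source, target)` record layout, and the function names are assumptions made for this illustration.

```python
from collections import Counter


def build_network(interactions):
    """Directed, weighted network stored as Counter({(source, target): weight}).

    interactions: iterable of (source, target) pairs. For retweets, the pair is
    assumed to be (retweeter, original poster), so intermediate retweeters
    never appear as targets.
    """
    network = Counter()
    for source, target in interactions:
        network[(source, target)] += 1  # the weight counts repeated interactions
    return network


def aggregate(daily_networks):
    """Sum the link weights of several daily networks into one network."""
    total = Counter()
    for network in daily_networks:
        total.update(network)
    return total


def node_count(network):
    """Number of distinct users that appear as source or target."""
    return len({user for link in network for user in link})


# Made-up retweets on two consecutive days.
day_1 = build_network([("b", "a"), ("c", "a"), ("b", "a")])
day_2 = build_network([("b", "a"), ("d", "c")])
campaign = aggregate([day_1, day_2])
print(campaign[("b", "a")], node_count(campaign))  # 3 4
```

Keeping one `Counter` per day makes it straightforward to compute daily time series of any metric and to obtain the aggregated network by summation.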
In Figure
Left panel: strongly connected component of the aggregated mention network for the 2015 electoral campaign. Right panel: strongly connected component of the aggregated retweet network for the 2015 electoral campaign. Colors correspond to the communities computed with the Louvain algorithm [
Every well-defined group seems to correspond to a political party. In the representation of the mention network, the different groups are well defined and the most central nodes, which correspond to the leaders of the communities, are politicians or political parties. In the retweet network, by contrast, nodes of different communities appear mixed in the center of the representation. Most of these nodes have high centralities and belong to different communication media. This is in good agreement with the literature [
We have computed several statistical properties of these networks: the number of nodes (
The properties of the aggregated networks are displayed in Table
General statistical properties of the retweet and mention networks aggregated for the whole period of study for both electoral campaigns.
| | | ||
---|---|---|---|---|
| | | | |
Nodes | 354079 | 319961 | 330071 | 296645 |
Links | 1361495 | 1251113 | 928492 | 852216 |
Density | 1.09E-05 | 1.22E-05 | 8.52E-06 | 9.68E-06 |
| 3.85 | 3.91 | 2.81 | 2.87 |
| | | | |
| | | | |
| | | | |
| | | | |
| 0.19 | 0.19 | 0.06 | 0.06 |
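The density values in the table can be checked from the numbers of nodes and links. The sketch below assumes the convention for directed networks without self-loops, density = L / (N (N − 1)), and it assumes that the unlabeled row holding 3.85, 3.91, 2.81, and 2.87 is the mean degree L / N; both assumptions reproduce the tabulated values.

```python
def density(nodes, links):
    """Density of a directed network without self-loops: L / (N * (N - 1))."""
    return links / (nodes * (nodes - 1))


def mean_degree(nodes, links):
    """Mean in-degree (equal to mean out-degree) of a directed network: L / N."""
    return links / nodes


# (nodes, links) for the four columns of the table, in the order they appear.
columns = [(354079, 1361495), (319961, 1251113), (330071, 928492), (296645, 852216)]
for n, l in columns:
    print(f"{density(n, l):.2E}  {mean_degree(n, l):.2f}")
# 1.09E-05  3.85
# 1.22E-05  3.91
# 8.52E-06  2.81
# 9.68E-06  2.87
```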
The higher average clustering
The results show very little change from one election to the other, suggesting that the underlying dynamics of the networks of interactions are consistent and, to some extent, independent of the context.
With respect to the temporal evolution of the exponent of the in-degree distribution for mention and retweet networks, which is shown in Figure
Daily evolution of the
In the case of the temporal evolution of the out-degree distribution exponent, displayed in Figure
We have explored the degree correlations of the networks and their evolution in several ways. First, we have computed the degree assortativities [
Assortativities of the different aggregated networks.
| | | ||
---|---|---|---|---|
| | | | |
out-in | -0.1096 | -0.1095 | -0.1234 | -0.1169 |
in-in | -0.0174 | -0.0162 | -0.0313 | -0.0287 |
in-out | -0.0006 | -0.0014 | 0.0109 | 0.0074 |
out-out | 0.0521 | 0.0488 | 0.0988 | 0.0892 |
Z-Scores of the assortativities of the different aggregated networks with respect to 500 realizations of the directed configuration model.
| | | ||
---|---|---|---|---|
| | | | |
out-in | -152 | -137 | -141 | -133 |
in-in | -22 | -19 | -31 | -31 |
in-out | 2 | 2 | 12 | 9 |
out-out | 100 | 95 | 107 | 113 |
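A directed degree assortativity and its Z-score against a degree-preserving null model can be sketched as below. The sketch treats "out-in" as the correlation between the out-degree of the source and the in-degree of the target of each link, which is an assumption about the notation. The null model shuffles link targets, which keeps every in-degree and out-degree but may create self-loops and repeated links; it is a simplified stand-in for the directed configuration model, and all function names are hypothetical.

```python
import random
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def degree_assortativity(edges, source_kind="out", target_kind="in"):
    """Correlation, over links, between a source degree and a target degree."""
    degree = {
        "out": Counter(s for s, _ in edges),
        "in": Counter(t for _, t in edges),
    }
    x = [degree[source_kind][s] for s, _ in edges]
    y = [degree[target_kind][t] for _, t in edges]
    return pearson(x, y)


def assortativity_z_score(edges, source_kind="out", target_kind="in",
                          realizations=500, seed=0):
    """Z-score of the observed assortativity with respect to shuffled networks."""
    rng = random.Random(seed)
    observed = degree_assortativity(edges, source_kind, target_kind)
    sources = [s for s, _ in edges]
    targets = [t for _, t in edges]
    null = []
    for _ in range(realizations):
        rng.shuffle(targets)  # rewires links while keeping all in- and out-degrees
        shuffled = list(zip(sources, targets))
        null.append(degree_assortativity(shuffled, source_kind, target_kind))
    return (observed - mean(null)) / pstdev(null)


# Toy network: one active user mentions three quiet users, and three
# occasional users all mention the same popular account.
toy = [("a", "x"), ("a", "y"), ("a", "z"), ("b", "h"), ("c", "h"), ("d", "h")]
print(degree_assortativity(toy))
```

In the toy network, sources with a high out-degree only reach targets with a low in-degree and vice versa, so the out-in assortativity is negative, the same sign as the out-in values reported in the tables.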
As we can see in Table
This order is maintained for the Z-Scores (see Table
With respect to the temporal evolution of the daily degree assortativities that is displayed in Figure
Daily evolution of the directed degree assortativities for mentions (@) and retweets (RT) networks for the 2015 (top) and 2016 (bottom) electoral campaigns.
These fluctuations seem to be caused by disruptive events, both exogenous and endogenous. On day -6 of 2015, a debate was held between the leader of the PP (the president at the time) and the leader of the PSOE. A similar but weaker pattern can be observed in the assortativity time series on day -13, which coincides with the other important debate of the campaign. On day -11 of 2016, we found a viral tweet that contained a comical video about Spanish politics. The retweets received by that particular tweet amounted to
In order to measure the global influence of a user on the network, we have used the user efficiency metric [
where
We have computed the distribution of efficiency for the global conversation including regular users and the politicians that participated in it. In order to compare the behavior of regular users with politicians, we have also considered the distribution of efficiency corresponding to the groups of accounts of the four main political parties described in Section
In Figure
Probability distributions of efficiency for Twitter accounts associated with each party and for the whole set of users. Top: 2015 electoral campaign. Bottom: 2016 electoral campaign. The value
We have compared the efficiency patterns of the political accounts of each party to the whole set of users for both elections. In order to do that, we have divided the interval that spans all the possible values of efficiency (shown in Figure
Probability differences of having an efficiency that falls within a given bin (
In order to assess the significance of this result, we have taken 100 samples of 1000 randomly chosen users and computed their efficiency probability differences with respect to the bulk of users as described above. The average values of the differences and their standard deviations are represented, respectively, as blue dots and a blue-grey shadow in Figure
In that figure, it can be noticed that accounts belonging to political parties tend to exhibit higher probabilities than regular users for efficiencies in the region
While most of the users act as passive listeners or broadcasters, Twitter conversations are usually driven by a small elite of influential accounts [
In order to analyze the communication among politicians, we have computed the subgraphs induced by the user accounts associated with political parties. Then, we have grouped nodes belonging to the same party in supernodes, obtaining a C-network. The resulting supernodes correspond to groups of users and the weights of the links among those supernodes are the sum of the weights of all the links that join a user from one group with a user of another in the original network. The resulting colored adjacency matrices are displayed in Figure
Adjacency matrices of C-networks where each supernode is the aggregation of political accounts belonging to a given party. Left panels correspond to mention networks and right panels to retweet networks. Top panels correspond to results of 2015 and bottom panels to results of 2016. Color is related to the proportion of mentions (retweets) directed from party
As we can see, in line with previous works [
We have included the party IU in this analysis to study the effect of the agreement between them and Podemos to form a coalition (UP) in the 2016 election. The results displayed in Figure
We have also computed the evolution of the assortative mixing [
Assortative mixing is a metric used to test whether the links of a network preferentially join nodes of the same kind or nodes of different kinds, or whether the connections are random. In our case, the different kinds would be the different parties in one case and the coalitions in the other. In order to compute the assortative mixing, the nodes are classified in groups and the proportions
Then,
This metric takes the value
In Figure
Daily evolution of the assortative mixing by party and coalition for the mention (@) and retweet (RT) networks in 2015 (top) and 2016 (bottom) electoral campaigns.
The drop in assortativity the day after the elections means that politicians of different parties interacted more with each other that day than during the campaign. This seems to be caused partially by an exchange of messages commenting on the consequences of the election results. There are tweets containing criticism of adversaries, congratulation messages to related parties, and tweets trying to convince or push potential allies to form coalitions. Notice, however, that this decrease, although significant, is not large: the value reached in 2015 is around 0.87 and in 2016 is 0.89. Consequently, we attribute the decrease both to the drop in the number of posted messages the day after the elections, which makes the data noisier, and to an increase in message exchange between parties.
The most remarkable feature of these time series is the difference between the elections of 2015 and 2016. In the latter, the parties IU and Podemos formed a coalition, a fact that is reflected here in the following way: whereas in 2015 coalition and party assortativities are almost equal, in 2016 the coalition assortativity is clearly higher for both networks. This means that the communication between users from Podemos and IU is high enough to lower the general assortativity.
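The difference between party and coalition assortativity can be illustrated with a small sketch. It groups users into supernodes, sums the link weights between groups, and evaluates Newman's assortativity coefficient r = (Σ_i e_ii − Σ_i a_i b_i) / (1 − Σ_i a_i b_i); using this standard expression is an assumption here, and the account names, links, and weights are made up.

```python
from collections import Counter


def assortative_mixing(weighted_links, group_of):
    """Newman's coefficient r = (sum_i e_ii - sum_i a_i b_i) / (1 - sum_i a_i b_i).

    weighted_links: dict {(source, target): weight}; group_of: dict {user: group}.
    """
    mixing = Counter()  # total link weight from group i to group j (supernode links)
    for (source, target), weight in weighted_links.items():
        mixing[(group_of[source], group_of[target])] += weight
    total = sum(mixing.values())
    groups = {g for pair in mixing for g in pair}
    trace = sum(mixing[(g, g)] for g in groups) / total
    a = {g: sum(w for (i, _), w in mixing.items() if i == g) / total for g in groups}
    b = {g: sum(w for (_, j), w in mixing.items() if j == g) / total for g in groups}
    expected = sum(a[g] * b[g] for g in groups)
    return (trace - expected) / (1 - expected)


# Hypothetical accounts and weighted mentions.
party = {"u1": "Podemos", "u2": "Podemos", "u3": "IU", "u4": "PP", "u5": "PP", "u6": "IU"}
links = {("u1", "u2"): 3, ("u3", "u1"): 2, ("u4", "u5"): 4, ("u3", "u6"): 1}
# Coalition labels: Podemos and IU are merged into UP.
coalition = {u: "UP" if p in ("Podemos", "IU") else p for u, p in party.items()}

r_party = assortative_mixing(links, party)
r_coalition = assortative_mixing(links, coalition)
print(round(r_party, 3), round(r_coalition, 3))  # 0.697 1.0
```

In this toy example the only links that cross party lines join IU with Podemos, so merging both parties into one coalition raises the coefficient to 1, which mirrors the gap between coalition and party assortativity described for 2016.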
In order to assess the relevance of this effect, we have performed a paired t-test on the assortativity time series, pairing each assortativity value from 2015 with its counterpart of 2016. The null hypothesis is that the assortative mixing values for both elections have the same expected values. The results presented in Table
P values of the paired t-test performed in the assortative mixing time series of parties and coalitions. The test has been carried out by taking each time series of 2015 and coupling it with its counterpart of 2016.
Network | Parties | Coalitions |
---|---|---|
Mentions | 0.0029 | 0.9 |
Retweets | 0.0001 | 0.3 |
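The statistic behind a paired t-test can be sketched with the standard library. The sketch pairs two equal-length daily series and returns the t statistic and its degrees of freedom; the p value follows from Student's t distribution with n − 1 degrees of freedom, which can be obtained, for example, with SciPy's `scipy.stats.ttest_rel`. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """t statistic and degrees of freedom of a paired t-test on two series."""
    if len(first) != len(second):
        raise ValueError("paired series must have the same length")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    # Mean difference divided by its standard error (sample standard deviation).
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Made-up values for four paired days.
t, dof = paired_t_statistic([1, 2, 3, 4], [0, 1, 1, 2])
print(round(t, 4), dof)  # 5.1962 3
```

Because the test works on the day-by-day differences, it removes the variation that both campaigns share and isolates the systematic shift between them.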
Our main goal in this work was to perform a comparative analysis of the user behavior in Twitter in two consecutive electoral campaigns in order to find the presence of correlations and recurrent patterns. To this end, we have analyzed temporal series and interaction networks corresponding to two Twitter datasets downloaded during the Spanish electoral campaigns of 2015 and 2016. Although the individual activity and the political actors may have changed, we have shown evidence of recurrent activity patterns in different political campaigns. In particular, the activity time series for both elections exhibit a significant correlation. Moreover, we have found a power law relationship between the daily rate of tweets (retweets and mentions) and the number of unique users. Finally, besides the behavioral stabilities mentioned above, we have been able to detect the effect of a political coalition in the interaction networks through the study of the evolution of their properties.
The results that we have obtained from the computation and analysis of the daily user activity time series for both elections indicate that they present a significant linear correlation. Additionally, by studying the distribution of user activity we have found that in both elections its exponent fluctuates in the same tight interval. The value of the exponent obtained in a previous work [
We have shown that the daily rate of tweets, retweets, and mentions follow a power law with respect to the number of unique users that participated in the conversation each day. However, whereas in 2015 the growth for the three quantities was slightly super-linear with respect to the number of users, in 2016 we observe an approximately linear behavior. Hence, in 2015, when more users join the conversation, the activity experiences a proportionally higher increment than in 2016.
We have assessed the consistency of the topology of the mention and retweet networks from one election to the other by computing the degree distribution and the degree correlations of the aggregated networks. The power law exponent of the degree distributions varies by at most 1% from one electoral period to the other, whereas the degree correlations shift by less than 10% from one year to the other. The values of these properties are also comparable to the results obtained in a previous work with a similar political context [
By computing the distribution of the user efficiency for regular users and the accounts associated with each party, we have shown that its functional form is not dependent on the chosen group of users or on the particular electoral period under study. This adds further evidence of the universality of the efficiency patterns shown by Morales et al. [
Our analysis of the mention and retweet C-networks induced by political accounts has enabled us to show the lack of debate among different political parties. This result is in good agreement with the existing literature [
In addition to the regularities in behavioral patterns that we have found by comparing two similar political contexts, several results are consistent with a previous study of the 2011 Spanish elections [
The data used to support the findings of this study are available from the corresponding author upon request.
An earlier version of this work was presented at the 9th International Conference on Complex Systems.
The authors declare that they have no conflicts of interest.
This work has been supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under Contract no. MTM2015-63914-P.