Are Health Videos from Hospitals, Health Organizations, and Active Users Available to Health Consumers? An Analysis of Diabetes Health Video Ranking in YouTube

Health consumers are increasingly using the Internet to search for health information. The existence of overloaded, inaccurate, obsolete, or simply incorrect health information available on the Internet is a serious obstacle for finding relevant and good-quality data that actually helps patients. Search engines of multimedia Internet platforms are thought to help users to find relevant information according to their search. But, is the information recovered by those search engines from quality sources? Is the health information uploaded from reliable sources, such as hospitals and health organizations, easily available to patients? The availability of videos is directly related to the ranking position in YouTube search. The higher the ranking of the information is, the more accessible it is. The aim of this study is to analyze the ranking evolution of diabetes health videos on YouTube in order to discover how videos from reliable channels, such as hospitals and health organizations, are evolving in the ranking. The analysis was done by tracking the ranking of 2372 videos on a daily basis during a 30-day period using 20 diabetes-related queries. Our conclusions are that the current YouTube algorithm favors the presence of reliable videos in upper rank positions in diabetes-related searches.


Introduction
The Internet is fast becoming one of the most common sources for health information, and studies show that people use the Internet to obtain information concerning health [1,2]. Since the arrival of social networks, there has been an important increase in users that trust the information deployed on the Internet. This is because the deviceindependent, quick, easy, and universal access from anywhere to the information provided gives the Internet a great advantage over other traditional sources of information such as encyclopedias.
In fact, according to literature, the Internet is currently the first source for medical information for patients concerning their illnesses [3][4][5]; they consult the web for further information on the illness itself, look for second opinions, search for others sharing similar health concerns, follow reports on personal health experiences [6], and even purchase drugs or medical treatment online [7].
However, users that are looking for health information on the Internet are not only patients that are worried about their illnesses, but also physicians fine-tuning their decisions on diagnosis and treatments based on the information gathered online [8]. A general problem when searching the Internet is the information overload and difficulty of finding the relevant data. In addition, too many websites include inaccurate, missing, obsolete, incorrect, biased, or misleading instructions that make it difficult for users to distinguish between trustworthy and specious information [9,10]. These inaccuracies provide erroneous information to patients causing misunderstandings [10,11] affecting themselves and other patients [12] and can lead to an unhealthy lifestyle, damaging the health of unconscious users, as in the case of anorexia [13], or sabotaging important prevention campaigns directed to the global population, as in the case of vaccines [14]. This problem is especially dangerous when users are not familiar with new technologies or when their health knowledge is limited. Also, certification approaches, such as the ethical HON code, are not solving the issue [15].
One of the most prominent Internet information providers is YouTube. YouTube is currently the most important video-sharing website on the Internet [16]. Each minute, 100 hours of video are uploaded to YouTube, over 6 billion hours of video are watched every month, and it has more than 1 billion unique visitors per month [17]. YouTube social media tools allow users to easily upload, view, and share videos and enable interaction by letting users rate videos and post comments. YouTube is increasingly being used to share health information offered by a variety of sources, including hospitals, organizations, governments, companies, and private users [18]. However, it may be difficult to find videos from trustworthy sources, since YouTube video ranking is known to favor content from popular sources (channels), meaning that, for example, hospital videos, where social interactions through likes/dislikes and comments are not so common, appear lower in the ranked list. Also, YouTube ranking does not focus on trustworthiness, and both misleading and incorrect videos may well be popular and therefore be given a high ranking [13,14].
Diabetes is one of the most common chronic diseases in the world. According to the available literature, there are currently 382 million people with diabetes, which represents a prevalence of 7.4 [19], and it is expected that the number of patients with diabetes will rise to 592 million by 2035, which will represent a prevalence of 9.1 [19]. Where diabetes is concerned, access to the correct information and selfmanagement of the disease by the patient can result in a clear increase in quality of life. It is therefore vital that precise and correct information is provided to patients with diabetes in order to effectively manage the disease. The Internet is one of the most important methods of self-education for diabetes patients [20]. The number of hospitals and health organizations dealing with patient education are increasing their presence on the Internet. However, the availability of the videos on YouTube might not depend only on the quality of the source. The quantity of videos uploaded daily on YouTube is so huge that it is possible that these educational videos might not be easily accessible to users.
With this problem in mind, the research question presented in this paper is the following: Are the diabetes-related health videos on YouTube posted by hospitals, health organizations, and active users easily available to health consumers? So, taking this into account, in this paper we analyze the stability of YouTube videos provided by trustworthy sources of information in order to know if these videos are actually accepted by users.
The structure of the paper is the following. The next section presents the background of the work where the related concepts and works are explained; then the methodology followed to configure white lists and explain how keywords were selected in order to perform the search is described.
In Results, the findings achieved are presented, and, subsequently, the authors discuss the results. Finally, conclusions are made.

Background
Health information on the Internet comes from different sources, not only from private persons reporting on personal experiences with disease, but also from hospitals, health organizations, governments, educational institutions, and profitmaking organizations. User studies have shown that the credibility of an information source is one of the most powerful factors affecting the process of selecting information [21]. For example, users are more likely to trust health information published or authorised by physicians or major health institutions [22][23][24] versus information provided by other sources. Such studies suggest that users show greater interest in health information published by professional sources, such as hospitals and health organizations, since these sources are considered more reliable than others available on the Internet.
A considerable amount of literature has been published on YouTube data analysis, such as studying relations between video ratings and their comments [25] or focusing on the social networking aspect of YouTube and social features [16,26]. Specifically in the case of health, the literature published shows concern about the content quality of YouTube videos. Although YouTube can be used to perform health research [27], there is not a specific health quality assessment in YouTube, and the patient can find erroneous information [28,29]. The literature advises that YouTube does not have clear and standardised mechanisms for patient assistance in the retrieval of quality information [30]. As a result, the literature is full of articles analyzing the quality of information available on specific subjects such as articles discussing the susceptibility of patients with vaccines [14], the knowledge available on rheumatoid arthritis [10], analysis of prostate cancer information [11], review of YouTube videos on first aid [31] and cardiopulmonary resuscitation [32], or analysis of patient resources for assessment of infantile spasms [33].
In general, the literature shows that the search for adequate health videos for patient education in YouTube is highly unpredictable in terms of health quality.
Since 2012 the YouTube search algorithm is based on Dwell Time, a paradigm instead of the use of number of clicks or likes [34]. This implies a change in the paradigm of ranking evolution. Before 2012 the YouTube ranking algorithm meant that low quality or misleading videos that appeared erroneously in top positions were watched regularly due to their high ranking position or misleading name. These unexpected clicks meant undeserved ranking for videos that could only be corrected by human engineers or negative likes. However, using the current algorithm, the Watch Time of the video is the leading feature for deciding the rank. That is, the algorithm considers the number of minutes that a video is seen by a user. Thus, if videos are interesting to users, their ranking should be stable, and, on the other hand, if videos are accessed by users and after some seconds the user realizes that this video is not the information that the user is looking Computational and Mathematical Methods in Medicine 3 for, the user stops the video, and the video is penalized in the ranking system.
So, according to the YouTube rank algorithm, the stability of the video ranking means that users accept the content of the video as correct. The aim of this paper is to analyze the behavior of health videos in the YouTube rank in order to evaluate if the videos from reliable sources have stability in YouTube rank. In order to achieve this, we have tracked the ranking of diabetes health videos on YouTube. This study aims to discover how Health Professional Channel videos from hospitals and health organizations are ranked on YouTube.

Methods
To analyze the stability of YouTube trustworthy videos in diabetes case, a case study has been selected as research methodology. To achieve that, we carried out the following steps.
First of all, we determined the research questions for making clear the research focus of the study. In this way, the research question presented in this paper is the following: Are the diabetes-related health videos on YouTube posted by hospitals, health organizations, and active users easily available to health consumers?
In order to answer this question we analyzed the stability of these videos, as we have pointed before, according to YouTube ranking algorithm.
After the determination of the research question, we perform a selection of keywords for formalizing the focus of the study. The list of keywords that we have used for performing the queries is shown as follows: After that, we selected a set of trustworthy information sources for being focused in the field and we identified a number of hospitals and health organizations publishing videos on YouTube (i.e., acting as YouTube channels) in order to create a white list of hospitals and health organizations. To create this list, we used the Ed Bennet Health Care Social Media List [18] and other secure media channels that we had identified in previous studies [35][36][37]. Besides taking these organizations into consideration, the literature shows that patients and caregivers have knowledge and experiences that they want to share and that patients like to receive information from peers. For this reason, we have included active users that predominantly produced diabetes videos in our white list. In total, our white list contains a total of 699 channels, where 651 were from hospitals, 30 were from health organizations, and 18 were managed by active users.
The next step after selecting the cases was the collection of data. In this way, in this study we obtained health videos from YouTube through textual search queries on diabetes-related issues. We set up a test environment, where 20 diabetesrelated queries were issued to YouTube over a 30-day period, from March until April 2013.
During this time, we collected the top daily 500 YouTube results for each query. Queries were issued using an anonymous profile to avoid any bias. The information collected included video name and identifier, channel identifier, and daily ranking position. Video and channel information was registered only the first time the video was detected. Later, only the ranking position was registered. The ranking orders were obtained by parsing the html of the result page in order to be sure that the data gathered was the same as users see, while video and channel information were collected through YouTube API version 2.0. Queries were issued with language option set to English. Figure 1 shows graphically the methodology followed in the collection of cases that was performed in two phases. First, we select the set of videos available on white lists that appears in the first set of queries performed with the 20 search keywords in YouTube server. This list of videos to follow was the cases to be daily analyzed in next phase. For that, in the second phase, the position of these videos was tracked for 30 days in order to evaluate their ranking stability in each search term query. Although the channels are active during the following phase, the new videos uploaded were not taken into account for the experiment in order to avoid bias and confusion. Using these daily searches we provide daily rank evolution statistics over the 20 search keywords.

Results
In this paper we wanted to check how fast it is possible to introduce new YouTube videos in the first ranking page for health purposes and how those results were so stable [38].
Using the 20 search terms that are shown in "List of Diabetes-Related Terms for YouTube Search" we tracked the  rank position of a total of 2372 YouTube health videos, included in white-listed channels, for the 30-day test period. The videos were uploaded from 73 hospital channels, 30 organization channels, and 18 user channels. We have analyzed the data using SPSS and Microsoft Excel to perform statistics and to plot the figures and charts.
In Figure 2 the distribution of white-listed videos compared with the rest is presented. As shown, the videos are distributed normally over the rank list, involving 9.71% of the total videos available in the top 500 rank. Figure 3 shows the distribution of videos grouped by type and rank. As shown, there are more videos from health organizations than from hospitals or active users. This is logical because in the white list there are more YouTube channels from health organizations than from hospitals and active users. In Figure 4, the standard deviation of the rank achieved during the study and grouped by the positions of the first day is presented. This graph shows a clear difference between being amongst the top positions and being in the last positions in the ranking system. It also demonstrates that the instability of the rank follows a logarithmic growth. The videos that are in the top positions have a greater rank stability than the rest. On one hand, the videos of the first 100 rank positions have a marked difference in their rank stability depending on their position, implying that videos in the very top positions are much more stable than the rest. On the other hand, the videos which are ranked lower than position 100 have their rank stability almost constant during the time of the study with a slight upward trend.
The effect shown in Figure 4 can be seen in another way in Figure 5 where the evolution of the ranks depending on time is visible. As shown, the evolution of the positions is more constant in videos in the top 100 positions than in the rest. This means that videos that are in top positions maintain their rank more easily than others. Figure 6 shows the behavior of the ranking of the videos for those videos ranked in the top 100 during the first day of the study. As demonstrated, the top twenty first ranked videos enjoy higher position stability than the rest of the ranked videos.
In more detail, Figure 7 shows the rank behavior of videos that are in top 20 positions. These are the most stable except for when the videos leave the top twenty first positions. The rationale behind this behavior is that the standard configuration of YouTube presents twenty videos per page. So videos falling closer to rank 20 are more likely to be relegated to the second page and therefore have less rank stability. This is because users are accustomed to visiting videos that are better ranked because they are in the top positions and therefore have more visibility. Another important feature of the figure is that the more visibility a video has, the less disruption is shown in the curve of the ranking evolution.  In addition to the evolution of the ranking of white list videos, we have analyzed the rank behavior of videos depending on the type of origin. Figure 8 shows the comparison of evolution of the rank of videos grouped in hospitals, organizations, or active users. The figure implies that although the slope of the evolution is similar in the three cases, in general, active users seem to be better ranked than hospitals and organizations.
Analyzing only the videos in the top 100 positions (Figure 9), the difference between the behavior of active users evolution rank and the rest is corrected. In this case the evolution of the rank over time is very similar. Finally, analyzing only those videos in the top 20 positions (Figure 10), hospitals and organizations have a better ranking than active users. However, the organizations and hospitals have decrease peaks in this graph while the evolution of active users is produced more gradually.

Discussion
Analyzing the results achieved, the rank evolution shows that for white list YouTube channels the videos that are located in the top positions are more stable than the rest. In addition, videos that make the first page (ranked 1-20) maintain their positions for a longer period of time. In fact, videos in top 5 positions maintained their ranking for practically the duration of the study period. According to YouTube ranking algorithm, those videos that are in the top positions but are not accepted by users (i.e., the user begins watching the video but ceases visualization quickly) are quickly rejected. So, according to our analysis, this lack of rejection shows that videos of white lists are well accepted by YouTube users in a diabetes based search. Thus, it would seem that although the positioning of health videos in YouTube in the very top positions (1)(2)(3)(4)(5) can be difficult, maintaining the videos in that high rank is not difficult for videos from white lists sources. This is proof of acceptance of users of trusted channels.
Although there are lots of videos in YouTube about diabetes, the sample that we analyzed has a correct significance (9.71% of the total videos) and they are well distributed over the ranking spectrum as can be seen on Figure 2. At the same time, the health organizations are the principal source of information (60%). Hospitals (22%) and active users (18%) are less active in creating contents for YouTube. However, the distribution of videos over the raking spectrum is similar as can be seen on Figure 3.
In general videos are stable in their position during the 30 days of the study. Figure 5 represents the average position of videos grouped by the position in first day. In this figure, videos of each group keep almost in the same position in average. However, if we see the evolution of rank in videos below the 100th position we can see that video position has a lower stability in time. In fact, Figure 4 shows how the standard deviation is decreased dramatically when the rank position of the video is lower than 100. According to this, when videos are displaced from the first page (each page has 20 videos), they suffer an important decrease of their rank. This is due to the loss of visibility. However, once this 7 important decrease occurs, the decline in rank is much more gradual, showing a good stability. In summary there is a big difference in the stability behavior between videos that are in first 100 positions and the rest. Also, trustworthy videos that are in first page (the first 20) have the best stability. So when a trustworthy video reaches the first 20 positions it has a high probability to keep its visibility.
According to the origin of the data, in general, active users are better ranked than health organizations or hospitals. Figure 8 shows the evolution of the rank grouped by origin. In this figure, it can be seen that the slope is similar in each group. It supposes that there are not significant differences in the behavioral obsolescence of videos, and the general decay of videos in YouTube rankings is similar. Figure 10 shows the average rank behavior of the top 20 videos grouped by origin. This figure shows that in top positions health organizations and hospitals are better ranked. In our opinion, this difference is due to a bias in the selection methodology. The users selected are the most active and visible and as a result, on average, they have a good rank. This would appear to be because users trust hospitals and organizations for knowledge on diabetes more than they trust other sources. Another interesting effect in the top positions is the marked decrease of mean rank of organizations. This effect is probably due to the higher production of videos from organizations. As shown in Figure 3 health organizations produce much more videos than active users. This means that new videos are continuously published and users see new videos instead of older ones. For this reason, although generally their videos are better ranked, their position decreases more quickly than less productive channels.
There are limitations to our work that should be noted. Firstly, even though our white lists of hospital and health organizations include a large number of channels, they cannot include every hospital and health organization available. The focus was not to track every relevant and trustworthy video in the result set from YouTube but rather to track videos from specific channel types that are assumed to be of interest to health consumers (i.e., hospitals, health organizations, and active users).
Also, it must be noted that we did not assess the quality of each video used in the study. Our study is based on the assumption that videos from hospitals and health organizations are of interest to health consumers and that availability of such videos is worthwhile investigating. In the light of user-interests in peer-to-peer healthcare, we also tracked availability of videos from active users, without making any claims regarding the trustworthiness of such videos. We are fully aware that videos from other channels (not included in any of our white lists) may provide useful and trustworthy information. We nevertheless believe that information provided by hospitals and health organizations is in general useful and of specific interest to health consumers, and we therefore chose to focus our study on such channels.
Furthermore, for each query we only examined the top 500 ranked videos from YouTube, when some queries return over 600.000 videos. However we have considered that a position over 500 is not significant in terms of availability to users. In addition, our test is limited to one month. In our opinion, this is enough time to evaluate the stability of YouTube videos due to the high quantity of videos uploaded daily.
Finally, the current study is based on diabetes queries. Although this study offers a basis for the behavior of rank in health related searches, these results cannot be directly extrapolated to other illnesses or health fields. In the future, we are planning to replicate our study using other terms in order to test if our findings can be extrapolated to other fields.

Conclusions
In order to analyze the ranking behavior of health videos in YouTube we have tracked the rank assigned by the YouTube search algorithm to trusted diabetes-related health videos for 30 days. For selected trusted videos, we defined a white list of dependable channels on videos published by hospitals, health organizations, and users actively publishing diabetes-related videos. Our findings show that the new YouTube algorithm, based on the number of visualization minutes of a video, makes the trusted videos stable in the top positions, implying that users accept these channels as a trustworthy information source.
Authors have shown how stable are YouTube rankings along the time and how difficult it is to introduce new reliable YouTube links in first ranking pages. Understanding how these ranking algorithms work is key in the public health field, as sometimes there are huge investments on the production of videos to promote digital health literacy but these videos are almost invisible due to a wrong search optimization strategy.
Additionally, results show the evidence that for YouTube and other searches criteria for reliable and valid health related information need to be different to the traditional Google algorithms, opening the way for new trustworthy health related searching engines.