User Evaluation of the Smartphone Screen Reader VoiceOver with Visually Disabled Participants

Touchscreen assistive technology is designed to support speech interaction between visually disabled people and mobile devices, allowing hand gestures to interact with a touch user interface. Globally, the World Health Organization estimates that around 285 million people are visually disabled, with two-thirds of them over 50 years old. This paper presents the user evaluation of VoiceOver, a built-in screen reader in Apple Inc. products, with a detailed analysis of the gesture interaction, familiarity and training by visually disabled users, and the system response. Six participants with prescribed visual disability took part in the tests in a usability laboratory under controlled conditions. Data were collected and analysed using a mixed methods approach, with quantitative and qualitative measures. The results showed that the participants found most of the hand gestures easy to perform, although they reported inconsistent responses and lack of information associated with several functionalities. User training on each gesture was reported as key to allowing the participants to perform certain difficult or unknown gestures. This paper also reports on how to perform mobile device user evaluations in a laboratory environment and provides recommendations on technical and physical infrastructure.


Introduction
Over the last decade, touchscreen technology has been increasingly used not only across multiple types of devices, such as smartphones and tablets [1][2][3], but also in photocopying machines, automated teller machines (ATMs), and ticket machines in bus and railway stations and airports. Reviews from the perspective of human factors and ergonomics and studies of people with developmental disabilities pointed out the relevance of the specific context of system interaction in order to maximize safety, performance, and user satisfaction [4] and the need for more research [5]. Touchscreens require the use of fingers and a choreography of gestures for interaction between the user and the device's user interface (UI) [6,7]. However, this type of screen interaction can be a challenge for visually disabled users, because the screens are designed to give visual feedback while the system is in use [8].
The World Health Organization (WHO) estimates that the number of people with visual disability is around 285 million globally and that about two-thirds of them are older than 50 years [9,10]. Traditionally, visually disabled people have used different assistive technology devices, such as an external keyboard, a braille terminal, or a screen reader that provides speech feedback related to the visual elements on the screen. Mobile phones with physical buttons are still functional for many visually disabled people because the surface and texture of the buttons provide tactile guidance when using the device. However, this type of communication device has become less popular in favour of smartphones with touchscreens, which currently dominate the market. Smartphones with touchscreen interaction mainly incorporate visual and sound feedback for communication with the user. This type of communication makes UI navigation challenging for visually disabled people, who cannot see the screen in sufficient detail and must use buttons without tactile feedback [11]. Several solutions are available in the market to improve the accessibility of smartphone technology for visually disabled people [12][13][14]. Some of these solutions are standalone products, and others are used in conjunction with other technology. One of the products available is VoiceOver [12], the integrated screen reader in Apple Inc. products. VoiceOver allows users to interact with the UI through gestures, with speech feedback to guide the navigation.
The screen reader has been included in Apple Inc. products in Mac OS X 10.4 since April 2005, in the iPhone 3GS (OS 3.0) since June 2009, and in the iPad (OS 3.2) since its introduction in April 2010. VoiceOver has to be activated in the device's settings, and when activated, the device provides speech feedback when a user interacts using hand gestures on the touchscreen.
There are different gestures that can be performed on the UI, and they provide immediate feedback interpreted by the screen reader. For instance, a one-finger tap and drag reads the item in the cursor (the selected item), and a four-finger tap near the top of the screen reads the first item at the top. The gestures must be made with the fingers; the screen reader does not respond to voice commands or sense motion.
In this context, the research project "Visually impaired users touching the screen - A user evaluation of assistive technology" aimed at evaluating the accessibility and usability of a screen reader for touchscreens in smartphones [15]. This paper presents the results from the evaluation of the usability and the accessibility of the screen reader VoiceOver (iOS 7.1.2), which is an integrated functionality in iPhone mobile devices. In addition, the paper provides recommendations on technical and physical infrastructure to perform an evaluation of mobile devices in a laboratory environment.
The three research questions (RQs) targeted by this study were as follows:
RQ1: What is the user experience of visually disabled users when interacting with VoiceOver?
RQ2: How does the VoiceOver screen reader respond to a set of 16 performed hand gestures during a user evaluation?
RQ3: What technical infrastructure can be suitable for an evaluation of mobile assistive technology with visually disabled users?
Following this introduction, the research methodology and the technical test infrastructure are described. The results are presented based on the user evaluation outcomes and the experience related to the test infrastructure. Furthermore, a discussion of the main results is provided, followed by a summary of the research contributions and conclusions.

Materials and Methods
A mixed methods research approach was employed in the evaluation of the screen reader [16][17][18], with quantitative and qualitative measures. The evaluation was conducted in three phases: (1) individual user training at the participant's home and introduction to the gestures a few days before the test, supplemented by written instructions sent by e-mail; (2) a usability test in a controlled laboratory environment, including a pretest interview for collecting participant background information; and (3) a posttest interview for qualitative analysis of the test output. The research team had three members whose backgrounds were in health technology, educational training with assistive technology, and clinical practice. All research team members had professional experience in working with people with visual disabilities.
In the initial preparation of the study, phone interviews were made with three key informants with expertise in visual disabilities, who worked at the Norwegian State Agency for Special Needs Education Service (StatPed) [19]. The goal of the interviews with the key informants was to gather insights on assistive technology for visually disabled people. Based on the interviews, a pilot test of the evaluation was prepared with a comparison of Android and Apple tablet devices. Two volunteers from the Norwegian Association of the Blind and Partially Sighted [20] participated in the pilot test, running several tasks. Afterwards, a focus group interview was conducted in order to better understand the interactions and any problems that the users found. In the phone interviews and also in the pilot test, the informants explained that, in their experience, the iPhone was the most commonly used and preferred smartphone among their visually disabled peers. Based on that information, an iPhone 4 (iOS 7.1.2) device was chosen for the study (the device can be seen in Figure 1) because it was widely available and had the VoiceOver screen reader integrated. The tasks were inspired by the descriptions of the standard gestures in the VoiceOver guide manual [21].

Recruitment of Participants.
The recruitment of participants was made in collaboration with the Norwegian Association of the Blind and Partially Sighted [20]. In addition, the professional network of one of the researchers with expertise in teaching and user training of assistive technology was used to support the recruitment process. The first contact made with the participants was a phone conversation to inform them about the study.
The second contact was an e-mail with information about the study and a consent form to be signed by each participant. Six visually disabled people were recruited to participate in the user evaluation; see Table 1 for the distribution of participants. They had a mean age of 42.8 years and an average of 1.9 years of user experience with VoiceOver. All the participants had previous experience with using a screen reader for desktop and/or laptop computers.

Test Procedure.
In the first phase of the evaluation, each participant had individual user training at home (Figure 2) on 16 specific hand gestures for screen interaction. The individual user training lasted 15-30 minutes (with an average of 21.7 minutes) and was led by a member of the research team.
The gestures that each participant knew in advance, and those learned during the training, were registered during the training session.
The second phase was executed in a usability laboratory. One of the researchers acted as the moderator and sat beside the test participant. The participants were informed about the subsequent test and signed a consent form before the test began. Demographic information and user experience with specific technical devices were also collected. Each user evaluation followed the same test plan, with a set of 16 tasks related to the use of gestures for touchscreen interaction. The moderator guided the participants through the tasks and asked them to speak aloud while solving them (Figure 3), following a think-aloud protocol [22][23][24]. The task solving was followed by a posttest individual interview (third phase). The participants were asked to score the gesture performance and task solving, choosing among three categories: "easy," "medium," or "difficult." In addition, problems or obstacles observed or reported were discussed.
The interviews also covered the general user experience with the smartphone and the first-time use of VoiceOver.
Each test session (second and third phases) lasted between 90 and 120 minutes, and a total of six test sessions were run across three separate days.

Technical and Physical Test Infrastructure.
The evaluation was executed in the usability laboratory at the Centre for eHealth of the University of Agder, Norway [25]. The usability laboratory consisted of two rooms, a test room and a control room, connected through a one-way mirror giving a view into the test room. In the test room, the moderator was placed together with a test participant, and in the control room, two observers followed the test on monitors and directly through the one-way mirror. The technical and physical infrastructure is described in Figure 4.

Data Collection.
The test sessions were audio-visually recorded in the F4V video file format. The recordings from two audio-visual sources were merged into one video file using the software Wirecast v.4.3.1 [26], with multiple video perspectives and one single audio channel. The files were exported to the Windows Media Video (WMV) format and then imported to the qualitative software tool QSR NVivo 10 [27]. The recordings were transcribed verbatim and categorized for a qualitative content analysis [28]. Quantitative measurements of the time and number of attempts in the task solving were made as a part of the analysis of the recordings. In addition, the research team made annotations during the test sessions that were included in the data collection (Figure 5).
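The quantitative measures described above (number of attempts and task-solving time, averaged per task) can be illustrated with a minimal Python sketch. The record layout, the function name `summarize_tasks`, and all values below are assumptions made for illustration; they are not the study's data or the tooling the research team used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotations taken from the recordings:
# (participant_id, task_number, attempts, seconds).
# The values are illustrative only and are NOT data from the study.
observations = [
    ("P1", 1, 2, 14.0),
    ("P2", 1, 4, 22.0),
    ("P1", 4, 1, 3.0),
    ("P2", 4, 1, 5.0),
]


def summarize_tasks(records):
    """Return {task: (mean attempts, mean seconds)} across participants."""
    by_task = defaultdict(list)
    for _participant, task, attempts, seconds in records:
        by_task[task].append((attempts, seconds))
    return {
        task: (
            mean([attempts for attempts, _ in rows]),
            mean([seconds for _, seconds in rows]),
        )
        for task, rows in sorted(by_task.items())
    }


summary = summarize_tasks(observations)
for task, (avg_attempts, avg_seconds) in summary.items():
    print(f"Task {task}: {avg_attempts:.1f} attempts, {avg_seconds:.1f} s on average")
```

Keeping one record per participant and task makes it straightforward to recompute the averages if a recording is re-annotated.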

Ethical Approval.
The Norwegian Centre for Research Data [29] approved this study with the project number 40636. All participants received verbal and written information about the project and the confidential treatment of their collected data. They were informed that their participation was voluntary, and each participant signed a consent form. The participants were aware that they could withdraw at any time without giving a reason. In that case, their data would be withdrawn and deleted. For health and safety reasons, each test participant was thoroughly informed about the physical environment before entering the test room, and the participants were never left alone in the laboratory facilities.

Results
All six participants went through the laboratory test. The test results are presented in three categories: user training, quantitative metrics from the user tests, and qualitative outcome of the posttest interviews.

Pretest User Training.
The familiarity with the VoiceOver gestures registered in the user training is presented in Table 2. The registration showed that all participants knew the double tap gesture (number 4) and the three-finger flick to the left or right (number 10). Five out of six were familiar with the one-finger tap gestures (numbers 1-3). For gesture numbers 6 and 7, the four-finger tap at the top or the bottom of the screen, five out of six participants did not know them in advance.

User Evaluations.
The quantitative measurements from the user evaluations are presented in Table 3, in six columns.
The first column describes the 16 VoiceOver standard gestures that were used to solve the associated task.
The tasks are described in the second column. The third column displays the average number of attempts needed for the task solving. The fourth column shows the average task-solving time, measured in seconds. The fifth column presents the system response to the gesture interaction, differentiated into the categories "consequent" and "inconsequent" speech feedback. Consequent speech feedback refers to sufficient and adequate information in the system response, and inconsequent feedback to insufficient or missing information in the system response. In usability studies, task accuracy is often categorized as completed or not completed [23,30]. In this particular test, there was an additional variable related to the task performance: the feedback that the system provided when a participant performed a specific action. The categories chosen were therefore "consequent feedback" or "inconsequent feedback" to the specific hand gesture performed. "Consequent feedback" referred to the system appropriately providing feedback that corresponded to the hand gesture performed by a participant. "Inconsequent feedback" referred to system feedback that did not correspond to the hand gesture performed by a participant, or to the absence of any feedback. The sixth column specifies the type of inconsequent response that occurred.
The three different one-finger tap gestures (tasks 1-3) for speaking the item in the cursor required many attempts to succeed. The system response was consequent. The double tap and split-tap gestures (tasks 4-5) were easy and fast to perform for the participants. The four-finger tap gestures at the top and bottom of the screen (tasks 6-7) were reported as technically difficult to perform by the participants, which was also indicated by the time for the task solving. The two-finger flick up and down gestures, to read the page from the top or bottom (tasks 8-9), were easy to perform and showed consequent speech feedback. The three-finger flick and tap gestures (tasks 10-11) were reported as easy to perform, but there was an inconsequent system response related to insufficient speech feedback when trying to inform about the current page. For the rotor-related tasks, 12 and 13, two of the participants needed several attempts (7 and 41) to find the rotor settings, but adjusting the speed of the speech feedback was easier. The three-finger double and triple tap gestures (tasks 14-15) were easy to perform, with quick task solving.
The two-finger double tap in task 16, to terminate a phone call, was easy to perform, but there was inconsequent feedback from the system and the phone call was not terminated in three out of six tests.
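The consequent/inconsequent coding of the system response can be summarized as a simple proportion per task. The sketch below is one possible way to compute it; the function name `consequent_rate` and the boolean coding are assumptions, and only the three-out-of-six outcome for task 16 comes from the text.

```python
def consequent_rate(responses):
    """Share of trials in which the system feedback matched the gesture.

    `responses` holds one boolean per trial: True for consequent feedback,
    False for inconsequent feedback (mismatched or absent).
    """
    if not responses:
        raise ValueError("no trials recorded")
    return sum(responses) / len(responses)


# Task 16 (two-finger double tap to end a call): the text reports that the
# call was not terminated in three of six tests. The per-participant order
# below is an assumption made for illustration.
task_16 = [True, True, True, False, False, False]
print(f"Task 16 consequent feedback: {consequent_rate(task_16):.0%}")
```

A per-task rate of this kind separates gesture difficulty (attempts, time) from system reliability, which is the distinction the fifth column of Table 3 is meant to capture.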

Posttest Interviews.
The participants graded the performance of gestures and task solving (Table 4) during the individual posttest interview.
Five of the gestures in the task solving were categorized as "easy" to perform, such as the one-finger double tap and the three-finger double and triple taps. Six gestures were categorized as "easy" or "medium," such as the one-finger flick up and down and the three-finger tap. Some gestures were categorized as "difficult" by two participants, such as the four-finger tap at the bottom and the top of the screen and the two-finger double tap. The task for the two-finger double tap was termination of a phone call, and in the interviews, the participants confirmed that, both during the test and in general, the gesture was associated with inconsistent system behaviour. For the rotor-related gestures, one participant emphasised the importance of user training to succeed with the specific use of the rotor function.
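The three-category grading lends itself to a per-gesture tally. The following Python sketch shows one way to do this; the function name `tally_grades` and the example grades are hypothetical and do not reproduce Table 4.

```python
from collections import Counter

# The three grading categories used in the posttest interview.
GRADES = ("easy", "medium", "difficult")


def tally_grades(grades):
    """Count how many participants chose each category for one gesture."""
    counts = Counter(grades)
    unknown = set(counts) - set(GRADES)
    if unknown:
        raise ValueError(f"unexpected grade(s): {sorted(unknown)}")
    # Always report all three categories, including those with zero votes.
    return {grade: counts[grade] for grade in GRADES}


# Hypothetical grades from six participants for a single gesture;
# illustrative values only, NOT the study's results.
example_gesture = ["easy", "medium", "difficult", "medium", "difficult", "easy"]
print(tally_grades(example_gesture))
```

Reporting zero counts explicitly keeps the rows comparable across gestures when the tallies are laid out as a table.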
Regarding the first-time user experience, all participants needed user training to be able to start using the smartphone and to activate the screen reader VoiceOver. Three had family or friends who helped them with the first-time use, one went to a course organized by the Norwegian Association of the Blind and Partially Sighted, and two figured it out by themselves, explaining that VoiceOver itself provides user training and guidance by informing about which gesture to perform for an action. Four participants stated: "It was a bit complicated with the first-time setup of the new phone with Apple ID and activation of VoiceOver; besides that, it is easy to use." [...] "After user training, when I understood how the system worked, I found it easy to use." [...] "The functions make sense, and there is a logical structure." [...] "It was terrible in the beginning, because I knew none of the gestures and I wanted to throw the phone away, but the price stopped me from doing it ... now I find it fantastic!" Two participants highlighted the benefits of the smartphone: "I like that I can buy it myself in the store; I did not need to apply for and receive assistive technology from the municipal services." [...] "This is the first device I use with built-in accessibility, as the screen reader is included."
Two participants described how the use of the screen reader had increased their self-management: "I feel more included in society; now I can use the Internet and check the same apps as other people do, such as Facebook, the weather forecast, and reading news." [...] "It is a feeling of freedom when the phone can read messages for you when you are outdoors. Before, I had to ask people I did not know to read from the screen if I received a message; I can now manage it myself, and that is a new world for me." In addition, one participant expressed: "VoiceOver has made my life much easier and I have become much more independent. Everyone with a visual impairment should use a phone with it." However, user text input with the VoiceOver keyboard was reported as complicated by four participants, and, for this reason, those participants preferred to use an external keyboard. Another participant stated: "It was hard in the beginning with the virtual keyboard, but with some training I overcame the difficulties." Five participants said that at home they preferred to use a desktop or laptop computer with reading list because text input was quicker than on the smartphone, and that they relied on the latter when away from home. Two participants expressed that it was easier for them to navigate on a small screen than on a larger tablet screen.

Discussion
This paper has presented a user evaluation of the Apple screen reader VoiceOver (iOS 7.1.2) with six visually disabled participants.
The aim was to identify challenges related to the performance of the standard VoiceOver gestures and evaluate the associated system response. Considering the sensory limitation of the target user group, the screen reader was expected to be intuitive with an optimal presentation of the functionality and distribution of the UI. The study showed that most of the gestures were easy to perform for the participants; however, some gestures were unfamiliar to the participants, especially those connected to the rotor function. The possibility of receiving individual user training before the evaluation was an advantage for succeeding with the practical use of those gestures.
The system appropriately responded to the users' hand gestures, but inconsistent responses and lack of information were reported in the two-finger flick up, three-finger flick to the left or right, three-finger and double-finger taps.
The three research questions (RQs) formulated at the beginning of this paper are answered below based on the results from the study.
RQ1 asked about the user experience when interacting with VoiceOver. The user experience with VoiceOver in general was positive, as the function was described as increasing self-management and supporting independence. Most of the gestures were both reported and observed as easy to perform, with some exceptions. The two most difficult ones reported by the participants were the four-finger tap and the two-finger double tap gestures. The four-finger tap on the bottom or on the top of the screen, used to read the content of the UI from the respective side, was explicitly reported as difficult to perform.
RQ2 asked about the system response to the 16 hand gestures made on the touchscreen mobile device. The speech feedback appropriately responded during the test with useful information for participants to navigate through the UI, but a few inconsistent responses to correctly performed gestures were registered, such as with the two-finger double tap to terminate a phone call.
The phone call was terminated correctly in only 3 out of 6 tests, which can be considered a weakness in the system with a negative consequence for the users, since speaking on the phone is one of the most frequently used functions. Other user problems identified were related to the three-finger flick to the left or right for swiping between screens, where the speech feedback was inconsistent and lacked information.
RQ3 asked about recommended technical infrastructure in evaluations of mobile assistive technology with visually disabled users. A suitable infrastructure would be one that optimizes the data collection and allows an effective retrospective analysis under more demanding conditions than other user evaluations. In addition, the comfort, safety, and trust of the visually disabled test participants are crucial to avoid interference with and distortion of the test results. The technical and physical infrastructure described in Figure 4 serves as an example of a controlled scenario for an evaluation with the same type of technology and participants.
The video recordings require sufficient quality to allow zooming in on the user interface and the finger interactions in detail. Professional video software is needed to substantially reduce the playback speed for optimal viewing and retrospective analysis. In addition, the data should be collected with synchronized audio and video signals, because streaming over a network usually introduces latency. The synchronization is of high importance for the retrospective analysis, as the gestures and finger interactions with a mobile device's screen are often made at high speed. Another issue experienced, and specific to tests with visually disabled participants, was that the sound from VoiceOver interfered and overlapped with the sound from the test participant and the moderator in the recordings from the table microphone unit.
This might complicate the retrospective analysis, and based on that experience, we recommend using several microphones to record the sound sources separately.
This study of the screen reader VoiceOver had some limitations, such as the number of test participants (n = 6) and the fact that tests were conducted only in a usability laboratory setting. However, the participants, with a distribution in their ages and smartphone skills, meaningfully represented the user group of visually disabled users of smartphones. Other studies have shown that a small number of participants in usability studies can be sufficient for obtaining valid results [31][32][33]. The laboratory setting allowed the collection of detailed research data under controlled conditions. The collected data material was analysed in detail to study the interaction between the visually disabled user and the UI touchscreen. Furthermore, the application of mixed methods research, combining laboratory tests with detailed interviews, provided insights into the user experiences, as well as the benefits and barriers of using the VoiceOver function.

Conclusions
This study was made as a part of the project "Visually impaired users touching the screen - A user evaluation of assistive technology" that aimed at evaluating the usability and accessibility of the screen reader VoiceOver. The main contribution of this study lies in the detailed analysis of the interaction with gestures between the visually disabled participants and the screen reader, preceding the responses from the system. In general, most of the hand gestures were easy to perform for the participants, although user training played a key role in the understanding and successful performance of specific complex gestures. Without training, participants would not have been able to perform such gestures. The system response and speech feedback were in most cases correct, but some functionalities of the system might be improved. The results presented are in line with other studies on assistive technologies and visually disabled users [34][35][36].
The methodological procedures with the use of mixed methods, combining quantitative laboratory tests with qualitative interviews and observations, can be recommended to other studies of similar characteristics. The test procedure with user training on the specific hand gestures in advance reduced the memory load in the laboratory test situation, as all the participants were familiar with the gestures and could focus on performing the tasks. The application of a think-aloud protocol in the usability laboratory together with posttest interviews is strongly recommended for other studies related to touchscreen assistive technology because they may provide a more comprehensive result.
In terms of future work, it is proposed to validate the laboratory results in the field and address research with a larger sample size focusing on text input and navigation using VoiceOver on a smartphone or tablet device. A comparison between the screen readers VoiceOver from Apple Inc. and TalkBack, which is mainly developed for Android devices, could illustrate differences across platforms. The integration of VoiceOver in the Apple Watch provides new opportunities for studying user-friendliness and accessibility for visually disabled users. A comparison with the use of VoiceOver on a desktop or laptop computer, which is generally more command based, could be easily made in a similar usability laboratory. Finally, newer models of iPhone to date, such as 8 and Xs, provide more tactile feedback through vibration during interactions than previous versions, and the impact of those functions for visually disabled users would be interesting to evaluate.

Figure 1: The smartphone used in the test.

Figure 2: User training of VoiceOver gestures at a participant's home.

Figure 3: The moderator (left) guiding a participant (right) through the task solving in the test room.

Figure 4: The technical and physical test infrastructure.

Figure 5: The control room showing the visual access to the test room through the one-way mirror.

Table 1: The background of the test participants.

Table 2: Familiarity per participant with the VoiceOver gestures in the pretest user training.

Table 3: Quantitative metrics of the user evaluations.

Table 4: The grading of the task solving made by the participants in the posttest interview (n = 6).

Tap with one finger and swipe to right or left. Speak the item in the cursor: find the app Calendar.