A Rollercoaster to Model Touch Interactions during Turbulence

We contribute to a project introducing the use of a large single touch-screen as a concept for future airplane cockpits. Human-machine interaction in this new type of cockpit must be optimised to cope with normal use as well as with moments of turbulence (which can occur during flights with varying degrees of severity). We propose an original experimental setup for reproducing turbulence (not limited to aviation) based on a touch-screen mounted on a rollercoaster. Participants had to repeatedly solve three basic touch interactions: a single click, a one-finger drag-and-drop, and a zoom operation involving a 2-finger pinching gesture. The completion times of the different tasks as well as the number of unnecessary interactions with the screen constitute the collected user data. We also propose a data analysis and statistical method to combine user performance with observed turbulence, including acceleration and jerk along the different axes. We then report some of the implications of severe turbulence on touch interaction and make recommendations as to how this can be accommodated in future design solutions.


Introduction
The future of aviation and motorized vehicle control will potentially include the integration of touch-sensitive interfaces into the control displays [1]. This would afford a more intuitive, direct, and tangible exchange of information between the user and system, as there is no displacement between control and feedback [2]. This research seeks to explore the effects of turbulence on the user's ability to complete simple touch-sensitive tasks and has been conducted as a part of the ODICIS project (ODICIS European project (2009-2012) http://web.archive.org/web/2013/www.odicis.org/) on "One Display for a Cockpit Interactive Solution", a European initiative that aims to create a prototype of a future airplane cockpit. The main objective of this project is to develop a large seamless touch-sensitive display to be used in cockpits (Figure 1), and this paper is part of a series of experiments [3,4]. We argue that the focus of future designs should be on balancing the affordances and constraints of the technology (i.e., multitouch interfaces) and physicality (i.e., vibrations and turbulence) depending on the safety requirements surrounding a given task. Panning a large map of the flight plan while stationary is different from doing checks while taxiing to the runway, which in turn is different from doing engine checks during turbulent conditions. If touch displays are to be implemented successfully into the cockpits of planes or other vehicles, an understanding of how turbulence affects this type of interaction is needed.
We seek to make a contribution to this matter by studying atomic touch interactions such as clicking, dragging, and pinching during various degrees of turbulence along the three axes, using acceleration and also jerk (variation of acceleration) to characterise turbulence. We also provide an original motion simulation approach using a rollercoaster, with appropriate data processing and statistical analysis, which are generic and thus may also be used in domains other than aviation.

Background
Touch-screens have been appearing in a multitude of shapes and sizes since the 1960s [5]. The possibility of direct interaction had an immediate appeal owing to its circumvention of the need for unnecessary cognitive mapping between an input device and interface [2]. The low cognitive load results in easily learnt interaction behaviour. The fact that there is no need for an independent input device and the relative robustness of the screen itself make this type of interaction relevant in mobile and turbulent situations where simplicity of interaction is of great value [6], e.g., in an airplane cockpit [1,3,7] or while controlling other vehicles.
However, touch interaction also has some innate constraints. In the 1980s and early 1990s multiple studies were done that identified touch interaction as a method that was fast and intuitive, but error prone and imprecise compared with other input modalities [8]. The drawbacks of touch interfaces are occlusion (i.e., that the hand covers part of the screen while a selection is being completed), the fact that targets must be larger than the width of the finger (rendering touch a lower resolution selection tool compared to, for instance, the mouse or even touch completed with a stylus) [6], and finally the loss of tactile feedback. Various strategies have been used to create solutions that overcome occlusion and enable high precision selection [4,6,8-13].
Safety is one of the main concerns when incorporating touch-sensitive displays into vehicular control, as the consequences of errors are potentially enormous. One way of dealing with safety concerns with tactile interaction is to implement selection sequences of varying complexity, including confirmations [14]. In any case, how turbulence affects touch interaction is of fundamental importance.
The effects of vibrations and turbulence on manual controls have been explored mainly in laboratory environments [15-17] (Lin, 2010), where the amplitude and directions of the vibrations can be controlled [18]. An example of this is a 6-degree-of-freedom (lateral, longitudinal, vertical, pitch, roll, and yaw) Stewart platform [19-21]. This type of reproduction of turbulence is a research field in itself and has been formalised, for instance, in procedures [22] used by NASA (National Aeronautics and Space Administration, USA). One of the main problems with Stewart platforms is forward kinematics: it is very difficult to create a realistic simulation of forward longitudinal motion. Another problem with common Full Flight Simulators (FFS) is that they cannot be extreme enough for the purpose, in particular regarding sustained acceleration higher than 1 g for several seconds.
More recently, however, Hourlier, Guérard, Barou, and Servantie [23] developed and tested scenarios containing simulated turbulence with the help of expert pilots. Their platform was a 6-axis hexapod that could mimic and exceed up to 8 g's of acceleration in a reliable way. This was used to create 6 turbulence profiles that were tested on the expert pilots, who rated the realism and severity of the experienced turbulence, in order to obtain turbulence conditions from mild to severe (according to the FAA (Federal Aviation Administration, USA) turbulence reporting criteria table). Of relevance, a study by Cockburn et al. [24] compared touch and a trackball as input devices during turbulent events simulated with a hexapod (cf. [23]) that delivered vibration along three axes. Results depended on precision requirements of the task such that low precision tasks elicited more accurate and faster performance in the touch input condition, irrespective of vibration level, relative to the trackball input condition. Furthermore, the trackball input condition resulted in more accurate and faster performance in tasks that required high precision, regardless of vibration level, compared to the touch input condition.
Research done on the effects of physical vibrations has for decades focused on tasks such as reading and writing [18,25-27], though some research has since been done on exploring situational impairments that can occur when using touch on a mobile device while walking and in other contexts [28,29]. An abundance of research on vibrations is concerned with the effects on manual and visual tasks [15-17, 27, 30]; however, Dodd et al. [31] studied the effect of three levels of turbulence on data entry performance. They compared a control condition without turbulence against light and moderate levels of turbulence (e.g., 35°/s for yaw, pitch and roll and 18°/s for heave, surge and sway for the medium turbulence condition), which elicited the highest levels of data entry errors and perceived mental workload and slower completion times. Lancaster, de Mers, Rogers, Smart, and Whitlow [32] had worked on a related project but used a moving van to simulate turbulence during touch interaction. They collected completion time, data entry errors, perceived workload, and electromyography in order to obtain an objective measure of touch-interaction performance during moderate turbulence.
Another experiment which was designed to explore interactions that could be applied to various vehicle controls was the study by Lin et al. [33], who compared the performance of a trackball, a touch-screen, and a mouse in a vibrating environment. They initially tested the input devices in a static environment and found that the touch-screen performed the best. However, the error rates of the touch-screen increased severely under conditions of vibration. Taking several performance measures into account, they concluded that the mouse was the best device under conditions of vibration, where the trackball suffered mainly from long completion times and the touch-screen from high error rates [33]. Similarly, Mansfield et al. [17] also found the mouse to be superior to the touch-screen in an experiment simulating the conditions of PC use for train passengers. McDowell et al. [21] found that terrain-induced ride motion of a simulated military vehicle degrades the reaction time and accuracy of operators (who are not looking outside and thus lack external visual cues). Studying the effect of motion on the use of a touch-screen system for tasks unrelated to driving in a military ground vehicle, Salmon et al. [34] found a positive learning effect, with a degradation of usability due to higher motion, but without characterising their "low" and "high" levels of motion in physical terms (e.g., acceleration along different axes) or the physical abilities of the simulator. As opposed to the elementary touch interactions that we use, Salmon et al.'s experiment tested a number of higher-level tasks such as writing, reading, and a combination of panning and zooming.
Other experiments completed under static conditions show that the touch-screen and mouse excel in different areas, with the mouse being most proficient in dragging tasks and touch for pointing [35]. In the context of a static aircraft cockpit, Stanton et al. [1] found that touch-screens provide a higher usability than trackball, rotary controller, and touch pad, but at the cost of some higher body discomfort.
As it is a younger technology, there is still much room for improvement in the touch technology itself, unlike that of the mouse. For instance, it will be a fundamental improvement when advanced haptic feedback, such as vibrotactile effects, becomes a standard part of touch-screen interfaces. The assumption is that many of the errors that occur for touch under turbulent conditions are due to difficulties in hand and eye coordination [18,25]. Haptic feedback could aid in eliminating these errors [36].
We believe that a better understanding of touch interactions during turbulence will allow designers to circumvent some of the problems by optimising visualisation and interaction, thereby allowing the advantages of direct manipulation [37] to be realised in a vibrating turbulent environment. This paper seeks to contribute in that direction, including aspects not addressed in the above-mentioned literature such as the role of jerk (variation of acceleration) and differences among the atomic touch interactions, namely clicking, dragging, and pinching.

Experimental Context
In the ODICIS project, the use of a large single screen with multitouch capabilities without haptic feedback is a fixed choice. Indeed, despite the acknowledged advantages of vibrotactile feedback [36], no technology was available with reasonable effort at the time of the project that could cope with the large screen areas needed and with multifinger and multiuser interaction. This large touch-screen must be optimised to suit the needs of pilots both in normal and abnormal contexts. In addition to this, many other lower-level requirements have to be addressed prior to seeking a flight certification, such as redundancy and tolerance to shocks, fire, smoke, extreme temperature, and electrical constraints. The screen and its operators will have to withstand alterations such as sun reflections, dust, fingerprints, and various vibrations and turbulence (which happen at different levels of severity during flights). In this paper, we focus on a single type of perturbation, namely, turbulence. Subsequently, some general rules can be derived on how best to deal with high and low turbulence. In order to create comparable and reproducible results, a certain level of control is required when conducting experiments. The experimental setting also had to be robust enough to run many sessions with different parameters (e.g., different tasks, different test subjects) while maintaining the same sequences and levels of turbulence. These requirements precluded conducting experiments during real flights. A setting was required where turbulence could be produced in a controlled manner.
Due to the cost and difficulty of accessing a platform simulating motion (such as in some flight simulators) and also due to the already-mentioned physical limitations of Stewart platforms [38], we went in search of an alternative way of exploring the effects of turbulence and vibration on touch interaction. The opportunity arose of using a rollercoaster at the Tivoli Gardens (Tivoli A/S: http://www.tivoli.dk), an amusement park in Copenhagen, Denmark. The chosen rollercoaster was Odinexpressen, a "powered coaster" constructed in 1985. It is about 300 m long and has a speed of about 60 km/h (Figure 2).
An amusement park ride made sense for several reasons not limited to the financial cost. First, realism in regard to the user experience was attained, even if realism in regard to a flight experience was less accurate: some observed turbulence such as strong lateral acceleration is not common for aircraft but may be of interest for other domains and can be filtered out in the data if so desired. The turbulence experienced by the user was severe and, most importantly, occurred in a repeatable sequence, meaning that each test subject experienced the same pattern of turbulence.
Two rounds of experiments were conducted: the "autumn trials" on 29 November 2010 and the "spring trials" on 25 May 2011. History of this article: The experiments occurred during a European research project and, after a long analysis phase, resulted in an extensive internal report. We then attempted a shorter academic publication, which, after a long review process, got some justified critique requiring some changes. We implemented most of the required changes but never finalised them due to each of the authors having changed mission or job after the end of the project. An invitation from the journal Advances in Human-Computer Interaction (AHCI) in May 2018 acted as a catalyst to implement the remaining changes, with, in particular, a significant effort to incorporate the most recent references.

Hardware Setup.
To simulate the touch-screen that will be used in ODICIS, a state-of-the-art 22″ (56 cm) LCD touch-screen was bought. The chosen model was from the 3M Company (the multitouch display M2256PW) and was about the closest we could get to the high specification characteristics targeted by the ODICIS screen (which uses a projection system and infrared cameras for the tactile input), in particular in size and glass surface. The screen uses P-MVA technology, is multitouch and able to track up to 20 fingers with capacitive sensing, and has a resolution of 1680 × 1050 pixels, a video response time of 8 ms, and a hardware touch point speed of 6 ms.
Significant efforts were devoted to the safe mounting of the screen on a wagon of the rollercoaster. This involved building a metallic structure that would ensure the participants' safety as well as fulfilling the general safety regulations of the amusement park, while simultaneously allowing quick assembly and removal without damaging the wagon. The main safety concern was to ensure that the screen was securely fastened and could not fall off and thus injure either the participants or park visitors. An overview of the setup can be seen in Figure 3.
Each wagon consisted of two pairs of seats. Participants would sit on the rear pair of seats and a large wooden box containing all the necessary equipment was placed in the front seat foot area. The energy was provided by a 12 V lead-acid battery of 110 Ah.
The box contained a laptop computer with a solid-state drive (a traditional hard-disk drive would not have been able to sustain the vibrations). The laptop PC was mounted in a docking station for more convenient connectivity (e.g., more ports, and screws to secure the video cable). A spatial sensor (compass 3-axis, gyroscope 3-axis, accelerometer 3-axis) was taped to the lid of the box to collect the turbulence data through a USB connection. The model (PhidgetSpatial: http://www.phidgets.com/products.php?product_id=1056) from the Phidgets Company was the "1056 - PhidgetSpatial 3/3/3". The gyroscope worked in the range of ±400°/s with a resolution of 0.02°/s and a drift of about 4°/minute. The accelerometer had a resolution of 228 μg, a range of ±5 g (±49 m/s²), and a noise level at 128 Hz of 300 μg on the x and y axes and 300 μg on the z axis. A GPS device was also embedded for some of the sessions, but the positioning lacked any useful degree of accuracy.
The sensor was queried every 32 ms, that is, at 31.25 Hz, and the nine parameters of the sensor were logged (a sample is provided by Figure 19). Due to the way the motion sensor was mounted, we had the following mapping between the three accelerometer channels and the conventional terminology: A0 = -pitch axis (lateral), A1 = -roll axis (longitudinal), A2 = yaw axis (vertical). Since the accelerometer reports resistance to acceleration, a positive value was to be understood as the direction of the force (e.g., like gravity directed towards the centre of earth) and the opposite of the movement. Furthermore, for the mapping of the three gyroscope channels we had G0 = -pitch; G1 = -roll; G2 = +yaw.

Software Setup.
The intent of these initial experiments was to gather some general information about the effects of turbulence on the use of a touch-screen. The focus was on fundamental interaction principles (e.g., click, drag, zoom), and not on the use of a full and complex user interface such as what would be found in a plane cockpit. The tools employed to program the software for the experiments had to allow for fast prototyping and robust high-level tactile interaction: the Microsoft Windows Presentation Foundation (WPF) was selected on Windows 7, programmed in .NET 4 with C# (C Sharp) and XAML (Extensible Application Markup Language). The chosen touch-screen was also directly compatible with the tactile functionalities of Windows 7. The software also logged user activity (e.g., time and location of the clicks, drags) as well as successes and failures in relation to the parameters of the scenarios.

Participants.
During the spring trials, 12 participants (5 female) did a total of 33 turns in the rollercoaster, and 2 of our participants were civil aircraft pilots. All had tried a touch-screen and a rollercoaster before. However, none had tried a touch-screen in a rollercoaster before. Due to various technical difficulties, where the system would occasionally need to be restarted, only 20 of the turns were used for data analysis. Some participants did more than one turn, and this was accounted for in the subsequent data analysis.

Tasks.
Each participant did one or more sessions, consisting of 3 consecutive rounds on the rollercoaster without a stop. The ride itself lasted approximately 3 minutes, which was the normal operational procedure of the chosen rollercoaster.
The software written for the experiments looped through three types of tasks that required a user action to solve them: tap, drag and drop, and zoom. The starting point of the sequence was randomly selected for each session. As soon as a user solved a task, the following task was automatically activated.
Figure 4: A yellow circle needed to be clicked by the user (various sizes and locations).

Tap Action.
The first type of task to solve was very simple: it required the user to tap (touch with one finger) a yellow circle (cf. Figure 4) displayed at a random location on screen and at a random size ranging from 2 cm to 7.5 cm. The minimum size of the yellow circle was chosen so that a fingertip could be placed on it for a click.

Drag and Drop Action.
The second type of task still only required one-finger interaction. In order to complete it, the user needed to drag a red disc into a specific location (drop area) on the screen represented by a blue circle (cf. Figure 5(a)). When the red disc was moved over the blue destination circle, the colour changed to a cyan border and yellow background (cf. Figure 5(b)) to indicate that the task could be completed if the red disc was dropped at this precise location.
The location of the drop area was randomly distributed on the screen, and the initial location of the red circle was where it was left at the end of the previous drag and drop task. The software ensured that the random location of the drop area was such that the red circle was not already at the destination.

Zoom Action.
The third and last type of task was solved using a two-finger pinching gesture which is translated into zoom, modelled on the zoom gesture used on mainstream touch interfaces such as that of the smartphone or tablet. The objective of this task was to change the size (zoom in or zoom out) of an orange solid square to make it match a preexisting blue square reference frame. This would initially be smaller or larger than the orange solid square that was the target for manipulation. The width of the border of the blue frame expressed a constant tolerance of about 1.4 cm.
Two fingertips (not necessarily from the same hand) were placed on the orange solid square (Figure 6(a)). By increasing or decreasing the distance between the fingers, the size of the orange square increased or decreased accordingly (Figure 6(b)). The objective was to match the size of the orange square to that of the reference frame: when the two matched, the frame would change from blue to cyan (Figure 6(c)). At this point, the user could end the manipulation by removing both fingers from the screen. If the user continued the manipulation, the target would be over- or undershot (e.g., from (a) to (b)) and would require further adjustment of the orange square.
The locations of both the orange square and the blue reference frame were always at the centre of the screen, also during manipulations of the orange square. The size of the orange square was left from the previous task of the same type; the minimum size was 2.8 cm, so that two fingertips could be placed on it and manipulate it. The reference frame was of random size, with a minimum size just large enough to include the orange square (2.8 cm inside, 5.5 cm outside) and a maximum size of 17.5 cm.

Data Characterisation
Before starting the analyses, some descriptive statistics were made to better understand the sensor data, the variable properties of the three task types, and the global performance of the participants in terms of completion time and number of errors. Figure 7 characterises the three accelerometer axes (A0, A1, A2) as well as the total acceleration (A), which was calculated as the Euclidean norm of the three accelerometer axes. The reported values were aggregated by sessions using a "root mean square" function (RMS). With earth's standard gravity being 1 g ≈ 9.8 m/s², a constant 1 g was added to A2 (vertical acceleration) to make the comparison with the two other axes A0 (lateral) and A1 (longitudinal) more straightforward. On the rollercoaster used, there was no looping, so this approximation should be acceptable. One can see from Figure 7 that there was less acceleration along the longitudinal axis (A1), with a high density at a low value of around 0.1 g. Conversely, lateral acceleration (A0) was dominant.
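As an illustration of the aggregation described above, the following minimal sketch computes the total acceleration and a per-session RMS. The sample values are hypothetical, and the assumption that the raw vertical channel reads about -1 g at rest (so that adding 1 g removes the gravity offset) is ours; the function names are illustrative and not taken from the project software.

```python
import math

def total_acceleration(a0, a1, a2):
    """Euclidean norm of the three accelerometer axes (all in g)."""
    return math.sqrt(a0 * a0 + a1 * a1 + a2 * a2)

def rms(values):
    """Root mean square of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical (lateral A0, longitudinal A1, vertical A2) samples in g.
# Assumption: the raw vertical channel is about -1 g at rest.
samples = [(0.3, 0.1, -1.0), (-0.4, 0.0, -0.8), (0.0, -0.1, -1.2)]

# Add a constant 1 g to the vertical axis, as described in the text.
compensated = [(a0, a1, a2 + 1.0) for a0, a1, a2 in samples]

rms_a0 = rms([s[0] for s in compensated])                        # lateral RMS
rms_total = rms([total_acceleration(*s) for s in compensated])   # total RMS
```

In this form, each session would be reduced to one RMS value per axis plus one for the total acceleration, which is the kind of quantity a density plot such as Figure 7 can then summarise.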

Characterisation of the Physical Turbulence.
From the point of view of the participants, when the global speed of the rollercoaster (in m/s) was constant (i.e., for the three axes), no turbulence was experienced at all, even when speed was high but constant. Furthermore, when global acceleration (in m/s²) was constant, participants adapted to it, as we all do with the constant 1 g acceleration due to earth's gravity.
Jerk, or the variation of acceleration over time (in m/s³), should thus be a more straightforward way of characterising the difficulty associated with the experienced turbulence. Figure 8 reports the density distribution of jerk, averaged by RMS over all the sessions. The calculation of jerk was however sensitive to the sampling frequency and smoothing, in particular because the accelerometer data is by nature very noisy.
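A minimal sketch of one way to derive jerk from logged acceleration is given below, assuming a first-order finite difference over the 32 ms sampling interval. The differencing scheme and the sample values are assumptions made for illustration; the exact scheme used in the analysis is not specified here.

```python
import math

SAMPLE_INTERVAL_S = 0.032  # the sensor was queried every 32 ms
G = 9.8                    # m/s^2 per g, the approximation used in the text

def jerk(acc_g, dt=SAMPLE_INTERVAL_S):
    """First-order finite difference of acceleration (in g), returned in m/s^3."""
    return [(b - a) * G / dt for a, b in zip(acc_g, acc_g[1:])]

def rms(values):
    """Root mean square of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical lateral acceleration samples in g.
acc = [0.00, 0.01, 0.03, 0.02, 0.02]
j = jerk(acc)       # one value fewer than the number of samples
j_rms = rms(j)      # aggregated jerk for the period
```

Because each difference is divided by a small time step, sensor noise is amplified, which is consistent with the sensitivity to sampling frequency and smoothing noted above.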
Finally, as an attempt to better characterise the experienced turbulence, a Fourier analysis was done, but no major frequencies (in Hz) were identified; the signal was closer to white noise than the clean frequencies found, e.g., in flight simulator platforms.
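For illustration only, the sketch below shows how a dominant frequency would appear in such an analysis, using a naive discrete Fourier transform on a synthetic 4 Hz sine sampled at 31.25 Hz. The test signal and function name are hypothetical; the recorded data showed no such peak.

```python
import cmath
import math

SAMPLE_RATE_HZ = 31.25  # one sample every 32 ms

def dft_magnitudes(signal, sample_rate_hz):
    """Naive O(n^2) DFT: (frequency in Hz, magnitude) pairs below Nyquist, DC excluded."""
    n = len(signal)
    result = []
    for k in range(1, n // 2):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(signal))
        result.append((k * sample_rate_hz / n, abs(coeff) / n))
    return result

# Synthetic signal: a clean 4 Hz sine over 125 samples (4 seconds).
signal = [math.sin(2 * math.pi * 4.0 * t / SAMPLE_RATE_HZ) for t in range(125)]
spectrum = dft_magnitudes(signal, SAMPLE_RATE_HZ)
peak = max(spectrum, key=lambda fm: fm[1])  # (frequency, magnitude) of the largest bin
```

With a signal close to white noise, the magnitudes would instead be spread over all frequency bins without a clear maximum.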

Characterisation of the Tasks Properties.
Some descriptive statistics were also made to characterise the tasks given to the participants. Figure 9 reports the density distributions of the following: the distance from the centre of the screen of the circle to touch for the "tap" task in red; the distance to the target for the "drag" task in green; and the distance to adjust for the "zoom" task in blue.

Characterisation of the Participants' Performance.
The two main participants' performance indicators were described by a density distribution of their completion time in Figure 10 and number of errors in Figure 11. Completion time was defined as being from the time when the task was displayed to the participant until the task was solved. One can see in Figure 10 that the "drag" task was the slowest to perform, while "tap" was the fastest.
Errors were defined as actions outside the target areas, e.g., a click outside the circle for the "tap" task, a drop outside the target for the "drag" task, a zoom adjustment outside the range for the "zoom" task, and a number of variations. One can see in Figure 11 that the "drag" task typically led to more errors per session than the two other types of tasks. A few outliers had a number of errors higher than 15.

Data Analysis
This section describes in particular the construction of the statistical analysis of the spring data. The analysis divides each task into two parts: from the time when the task was proposed until the first action was taken (part 1) and from the first action to completion (part 2). Sensor data were smoothed both in each period and in the whole period from initiation to completion with a Gaussian kernel and a bandwidth of 0.6, using the function ksmooth() from the R statistical package. Furthermore, a constant 1 g was added to the A2 (vertical acceleration) as a rough compensation for earth's natural gravity and to ease the comparison between the three axes. After this, a root mean square (RMS) was calculated for part 1, part 2, and the total, based on the smoothed sensor data. The Total RMS was used for descriptive purposes, but not for the modelling.
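The smoothing step can be sketched as a Nadaraya-Watson estimate with a Gaussian kernel. The version below is an illustrative Python approximation, not the R implementation: it follows the documented convention of ksmooth() that the kernel quartiles lie at ±0.25 × bandwidth, but it omits the truncation of far-away points, and the unit of the bandwidth (here taken to be the unit of the time axis) is an assumption.

```python
import math

def gaussian_ksmooth(x, y, bandwidth):
    """Nadaraya-Watson kernel regression with a Gaussian kernel, evaluated at each x.

    The bandwidth is converted to a standard deviation so that the kernel
    quartiles sit at +/- 0.25 * bandwidth (0.3706506 = 0.25 / qnorm(0.75)).
    """
    sd = 0.3706506 * bandwidth
    smoothed = []
    for xi in x:
        weights = [math.exp(-0.5 * ((xj - xi) / sd) ** 2) for xj in x]
        total = sum(weights)  # at least 1.0, since the point weights itself
        smoothed.append(sum(w * yj for w, yj in zip(weights, y)) / total)
    return smoothed

# Hypothetical example: a single spike in an otherwise flat signal.
times = [i * 0.032 for i in range(5)]   # seconds, 32 ms apart
spike = [0.0, 0.0, 1.0, 0.0, 0.0]
smoothed = gaussian_ksmooth(times, spike, bandwidth=0.6)
```

The RMS for part 1, part 2, and the total would then be computed on the smoothed series rather than on the raw samples.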

Analysis of Completion Time.
The analysis of the completion time was performed in three steps:
First, the initiation time (part 1) was log-transformed and modelled through a linear model with systematic effects of person ID, task type, distance to screen centre (for "tap"), drag length (for "drag"), zoom type (in/out), numerical zoom-in distance, numerical zoom-out distance, accelerometer RMS values (A0, A1, A2), gyroscope RMS values (G0, G1, G2), and the calculated jerk RMS (J0, J1, J2) for the relevant period, all interactions between sensors, and all interactions between task-related covariates (i.e., possibly predictive variables) and sensors. The end-model served both as input for a model for the whole task completion period and as a result in itself. To justify the inclusion of the many interaction terms, the p-values of the interaction terms in the basic model were plotted in a quantile-quantile (Q-Q) plot against the uniform distribution. If the effect of the interaction terms was artificial, and if apparent significance was only a result of mass testing, the p-values should be approximately uniformly distributed. However, for both this part and part 2, a distinct concave pattern emerged, indicating too many low p-values to conform to a hypothesis of no interaction effects. The model was subsequently reduced through significance testing at a 1% level.
Second, the time from initiation to completion (part 2) was analysed in a similar way.
Third, the significant factors from the first period (person ID, task type related covariates, and sensors related to the first part) were combined with significant factors from the second period (person ID, task type related covariates, and sensors related to the second part), to form the covariates of a model describing the log-transformed full completion time. This model was subsequently reduced through significance testing at a 1% level.
After model reduction, the significant effects were entered into a mixed effects model for each of the three models, where person ID was designated as a random effect, and coefficients were reestimated to provide a model formally independent of the current panel (cf. Table 3).
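The Q-Q check of interaction p-values against the uniform distribution, used in the first step, can be sketched as follows. The p-values are hypothetical and the plotting positions (i + 0.5)/n are one common convention, chosen here as an assumption.

```python
def uniform_qq_points(p_values):
    """Pair each expected uniform quantile with the corresponding sorted p-value."""
    n = len(p_values)
    ordered = sorted(p_values)
    expected = [(i + 0.5) / n for i in range(n)]
    return list(zip(expected, ordered))

def fraction_below_diagonal(points):
    """Share of points whose observed p-value is below its uniform expectation."""
    return sum(1 for expected, observed in points if observed < expected) / len(points)

# Hypothetical p-values of interaction terms, with many small values.
p_interactions = [0.001, 0.004, 0.01, 0.03, 0.2, 0.6]
points = uniform_qq_points(p_interactions)
share_low = fraction_below_diagonal(points)
```

If the interaction terms carried no effect, the points would scatter around the diagonal; a large share of points below it corresponds to the excess of low p-values described above.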
The results operate with "effects on average". This means that if covariates cov1 and cov2 enter into the model as $\beta_1\,\mathrm{cov1} + \beta_2\,\mathrm{cov2} + \beta_3\,\mathrm{cov1}\cdot\mathrm{cov2}$, the "average effect" of covariates cov1 and cov2 is $\beta_1 + \beta_3\,\mathrm{mean}(\mathrm{cov2})$ and $\beta_2 + \beta_3\,\mathrm{mean}(\mathrm{cov1})$, respectively.
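As a small worked example of this definition, the sketch below evaluates the two average effects for a model with two covariates and one interaction term. The coefficient names b1, b2, b3 and all numerical values are hypothetical.

```python
from statistics import mean

def average_effects(b1, b2, b3, cov1, cov2):
    """Average effects of cov1 and cov2 in the model b1*cov1 + b2*cov2 + b3*cov1*cov2."""
    effect_cov1 = b1 + b3 * mean(cov2)
    effect_cov2 = b2 + b3 * mean(cov1)
    return effect_cov1, effect_cov2

# Hypothetical coefficients and covariate values.
effect1, effect2 = average_effects(0.5, -0.2, 0.1, cov1=[1, 2, 3], cov2=[4, 6])
```

Here the interaction term shifts the effect of cov1 upwards and cancels the main effect of cov2, which illustrates why main effects alone can be misleading when interactions are present.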

Analysis of the Number of Errors.
The number of errors (i.e., the number of erroneous or superfluous user actions), given that the number of errors was positive, was modelled by a Poisson regression model with the natural logarithm as link, with explanatory variables as for completion time.
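In generic notation, a Poisson regression with a natural logarithm link can be written as below. The symbols ($N_i$ for the number of errors in task $i$, $x_{ik}$ for the explanatory variables, $\beta_k$ for the coefficients) are our own choice for illustration, and the sketch leaves aside the conditioning on a positive number of errors.

```latex
N_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \beta_0 + \sum_{k} \beta_k \, x_{ik}
```

With this link, a one-unit increase in $x_{ik}$ multiplies the expected number of errors by $e^{\beta_k}$, so effects are read as relative rather than absolute changes.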
Due to the many interaction terms and limited number of data points, a specific estimation method was chosen. As base model, the model with no interaction terms was chosen, and interaction terms (and any removed main effects) were included successively with subsequent model reduction at a 5% level: first accelerometer data, then gyroscope data, and then jerk for the first period, then the similar variables for the second period. After this, all six sets of interaction terms were included again in the same order, and lastly the model was reduced at a 1% level. After model reduction, the significant effects were entered into a Poisson random effect model with person ID designated as a random effect, and further reduced.

Relative Importance of the Three Axes.
It is also desirable to specify which types of turbulence, and along which physical axis, have most effect on the participants' performance. By selectively removing the covariates of interest (e.g., A0 = lateral acceleration) and estimating the precision of fit of the statistical model by means of an Akaike information criterion (AIC), this approach allows one to estimate the correlation between turbulence along one axis and the participants' performance. Indeed, models with poorer fit are a sign of a higher importance of the variable being removed.
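The comparison can be sketched on a deliberately small example: a least-squares model of log completion time with and without one turbulence covariate, compared by AIC. The data are hypothetical, the models are far simpler than those of the study, and the Gaussian AIC is written up to an additive constant that is identical for both models.

```python
import math

def gaussian_aic(residuals, n_params):
    """AIC of a least-squares fit, up to a constant shared by models on the same data."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params

def residuals_mean_only(y):
    """Residuals of the reduced model (intercept only)."""
    m = sum(y) / len(y)
    return [v - m for v in y]

def residuals_line(x, y):
    """Residuals of the full model (intercept and slope on x)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Hypothetical data: lateral acceleration RMS (g) and log completion time.
lateral_rms = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
log_time = [0.05, 0.22, 0.28, 0.45, 0.48, 0.63]

aic_with = gaussian_aic(residuals_line(lateral_rms, log_time), n_params=3)
aic_without = gaussian_aic(residuals_mean_only(log_time), n_params=2)
delta_aic = aic_without - aic_with  # positive: removing the covariate worsens the fit
```

Repeating this for each axis and ranking the resulting AIC differences gives a relative ordering of the axes by their contribution to the fit.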

Results
The results are divided into four sections: results based on "real" data (i.e., limited to basic operations such as smoothing and averaging), results based on the statistical analysis, further constructions from the model, and results based on a questionnaire.

Descriptive Statistics.
The descriptive statistics are based on real data and not on a statistical model. In Figures 12 and 13, low jerks are defined as <2.05 m/s² and high jerks as >3.75 m/s² to form three even-sized groups (low, medium, high). It is remarkable in Figure 12 that between the low and high jerk levels, there is a +55% increase in task completion time for "tap", as much as +178% for "drag", and +81% for "zoom". Though Figure 12 uses jerk to characterise the turbulence, the figure is similar when acceleration is used instead (cf. Figures 20 and 22).
Similarly, one can see in Figure 13 that by far the sharpest increase in errors from the low to high jerk conditions occurred for the "drag" task (+1499%); the error increase was lower for "zoom" (+362%) and lowest for "tap" (+247%). The figure based on acceleration was similar (cf. Figure 21).
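The percentage increases reported above compare the lowest and highest thirds of the observed turbulence. A minimal sketch of that computation is given below; the function name and the sample values are hypothetical, and the split is made by rank rather than by the fixed thresholds used for the figures.

```python
from statistics import mean

def tertile_increase(turbulence, value):
    """Percent change in the mean of `value` between the lowest and
    highest thirds of `turbulence` (observations are split by rank)."""
    order = sorted(range(len(turbulence)), key=lambda i: turbulence[i])
    third = len(order) // 3
    low = [value[i] for i in order[:third]]
    high = [value[i] for i in order[-third:]]
    return 100.0 * (mean(high) - mean(low)) / mean(low)

# Hypothetical jerk levels and completion times (illustration only).
jerk = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
completion_time = [1.0, 1.2, 0.8, 1.3, 1.5, 1.4, 1.6, 1.4, 1.5]
increase = tertile_increase(jerk, completion_time)
print(f"{increase:+.0f}%")
```

The same function applies to error counts by passing them as `value` instead of completion times.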
While Figures 12 and 13 were representative, the relation between the duration of a task and the level of turbulence was nontrivial. Indeed, it is natural that longer tasks have a level of turbulence closer to average, because turbulence is averaged over a longer duration. Therefore, high levels of turbulence are mainly expected to occur for short tasks. In order to visualise this effect, a second view of the same data is depicted in Figure 14, which has its axes swapped compared to Figure 12 and uses a different aggregation.
It is notable from Figure 14 that the confidence intervals for jerk are wide for the "drag" and "zoom" tasks that have a high completion time. This illustrates that "drag" and "zoom" tasks with a high completion time occur both with high levels of jerk as well as with jerk levels closer to average. However, Figure 14 also shows that tasks with a low (and, to a lesser degree, medium) completion time do not typically occur with high levels of jerk, which dispelled our initial concern.
One can see from Figure 15 that for relatively high jerks, the apparent effect on completion time gets less clear, which can be partially explained by the influence of turbulence other than jerk (e.g., sustained acceleration, rotations) and by a lower amount of data. The "tap" task seemed to be more impacted by high levels of jerk (i.e., above 8 m/s²) than was the case with "zoom", but this apparent trend would need to be refined by additional data points at those higher levels of jerk.
Concerning the angular speeds (°/s) around the three axes as provided by the gyroscope, Table 1 again shows that, for the task completion time, "drag" was more sensitive to such movements than "zoom", with "tap" being the least sensitive (see also Figures 23, 24, and 25). The increase in completion time was calculated between the lower third and the upper third levels of angular speed. The time to first action was not significantly impacted by the angular speed.
The analysis did not focus much more on the effect of gyroscope data. From the raw angular speed (°/s) provided by the sensor, it would have been necessary to calculate angular acceleration (°/s²) and then angular jerk (°/s³) in order to pursue the analysis, as was done to infer jerk from the accelerometer data.
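Deriving angular acceleration and angular jerk from sampled angular speed can be sketched with successive finite differences. The forward-difference scheme, the 10 Hz sampling interval, and the sample values below are assumptions made for the illustration; in practice the signal would likely need smoothing first, since differentiation amplifies sensor noise.

```python
def derivative(samples, dt):
    """Forward finite difference of a uniformly sampled signal."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]

# Hypothetical angular speed samples (deg/s) at an assumed 10 Hz sampling rate.
dt = 0.1
angular_speed = [0.0, 1.0, 3.0, 6.0, 10.0]
angular_acceleration = derivative(angular_speed, dt)   # deg/s^2
angular_jerk = derivative(angular_acceleration, dt)    # deg/s^3
print(angular_acceleration)
print(angular_jerk)
```

Each differentiation shortens the series by one sample, so the jerk series here has two fewer points than the raw angular speed.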

Statistical Modelling.
The statistical analysis reported in this section allows asking some questions at a higher level than permitted by the raw data collected during the experiments. In particular, it is possible to analyse the influence of some physical parameters more independently, as opposed to the raw data where these parameters are highly intertwined.
During the construction of the model for number of errors, the "drag" task was the only task that interacted with the turbulence measurements, meaning that the level of turbulence had a different impact on the number of errors for the "drag" task than for the two other tasks. It was estimated that for the "drag" task, the average number of errors was 5.25 (confidence interval ci=2.70:9.25), while the number of errors for other tasks was 4.15 (ci=2.23:7.07). Corrected for turbulence, the average errors were 0.12 (ci=0.08:0.17) for the "drag" task and 0.12 (ci=0.05:0.23) for the other tasks.

Turbulence along the Three Axes.
It was possible to estimate along which axis turbulence had the greatest impact on participants' performance, by removing in turn the relevant covariates from the statistical model and calculating the precision of fit with AIC (Akaike information criterion).
For the results reported in Table 2, higher AIC scores indicate poorer goodness of fit and thus higher importance of the covariate being removed. This shows that the combination of acceleration (A) and jerk (J) has the most importance in the model of participants' performance. Acceleration alone is the second most important predictor, followed by gyroscope (G) data and finally jerk alone. The results were consistent for the three axes.
Furthermore, Table 2 shows that the turbulence along the vertical axis was the most important, followed by the lateral axis and then the longitudinal axis.

Constructions from the Model.
While the statistical analysis yields some significant results on the interaction between various variables, some variables, such as total acceleration, were not in the model (only the three distinct axes were in the model). Therefore, in this section, we provide some results where one variable was reconstructed and analysed while the other parameters (with which it interacts) were kept at their average value.

Table 1: Effect of angular speed on the task completion time. [Table body not recovered. Columns: lower third, upper third, and increase of completion time for "tap", "drag", and "zoom"; rows per gyroscope axis, e.g., G0 (pitch).]

Effect of Total Acceleration.
With acceleration on the three axes being the strongest indicator for performance in our statistical analysis, the total acceleration was reconstructed to provide more tangible and comprehensive results. Figure 16 shows that the task completion time increased exponentially with higher acceleration. The completion time differed significantly (p<0.001) between the three task types at the intercept (i.e., when acceleration is zero). Furthermore, one can see that acceleration had a similar type of exponential effect on the completion time of the three types of tasks, and this was verified by the fact that the gradients of the three curves in Figure 16 did not differ significantly (z-test, p>0.16).

Effect of Total Jerk.
Total jerk was reconstructed in an identical manner. Figure 17 shows that the task completion time increases exponentially with higher jerk. The effect of jerk, as visible from the gradients of the three curves, was less pronounced for "tap" (slope of 1.20) than for the other two task types (z-test, p<0.03). The gradient for "zoom" (1.80) was in turn lower than the gradient for "drag" (1.98), which would suggest that "zoom" is less affected by jerk than "drag", but the difference was not large enough to be significant (p=0.39).
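A z-test for the difference between two estimated slopes can be sketched as below. The two slopes are the "drag" and "zoom" gradients quoted above, but the standard errors are hypothetical placeholders (they are not reported in this section), so the resulting p-value is illustrative only. The sketch also assumes that the two estimates are independent, which may not hold for slopes taken from a single fitted model.

```python
import math

def slope_difference_z_test(b1, se1, b2, se2):
    """Two-sided z-test for the difference between two slope estimates,
    assuming the estimates are independent and approximately normal."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, standard normal
    return z, p

# Slopes from the text; the standard errors (0.2) are hypothetical.
z, p = slope_difference_z_test(1.98, 0.2, 1.80, 0.2)
print(f"z = {z:.2f}, p = {p:.2f}")
```

A difference of 0.18 between the gradients is small relative to any plausible standard error of this size, which is consistent with the non-significant result reported for "drag" versus "zoom".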

Debriefing.
After the experiments of the autumn session alone, 8 participants were asked some general follow-up questions. One question was about the level of difficulty when using the touch-screen during turbulence, reported on a Likert scale ranging from 1 to 5 (1 being easy and 5 being difficult), as shown in Figure 18.
Finally, the participants were asked which type of interaction they found the hardest. Six participants found "zoom" the most difficult, and the remaining two found "drag" the most difficult.
There was a general consensus that it was quite difficult to use the touch-screen during turbulence. One especially interesting comment made by a civil aircraft pilot was about the experience of the turbulence: "It is really not the same as the turbulence on board a flight, because those are more up and down, here you were thrown from side to side [...] However, the sense of forward movement is better than in a flight simulator". This means that even though the turbulence cannot be truly equated with that of a flight, the sense of propulsion could. Whereas the level of difficulty was considered high by the participants, the perceived cognitive load of the tasks was considered low (no one gave more than a medium score of difficulty). This matched the intention of the design of the experiment, as the focus was on interaction and not on task comprehension.

Table 3: Coefficients for fixed effects in the mixed effects linear regression of log (completion time) on task type and sensor RMS, where participants were included as a random effect. p-values for main effects include removal of any interaction terms. Estimate for participants is the random effect parameter. Fields marked with an asterisk (*) are for covariates that interact with other variables.

Discussion
Overall "tap" had the fastest task completion time, followed by "zoom" and then "drag", with the last two being close. This was not surprising, as "tap" was the simplest selection strategy consisting of one target and one action. It would be interesting to see what would happen when using tap in a complex interface. From the descriptive statistics, "drag" was the task type that was by far the most affected by turbulence, both in terms of completion time and number of errors. Furthermore, while "tap" was the least affected by turbulence for the major part of the range of acceleration and jerk experienced during the experiments, there were some subtle indications from the descriptive statistics that this could change for higher levels of turbulence. This would require additional experiments to form a satisfactory conclusion. This data pattern was similar to the results of Cockburn et al. [24], which showed that their slider/dragging condition had the most errors as vibration increased and that easy tasks such as tapping/clicking were not very affected by vibration/turbulence. An interesting result of the statistical analysis is that turbulence along the vertical axis was the most important predictor for human performance in our experiments.
The statistical analysis also confirmed the intuitive hypothesis that longer drag and zoom lengths lead to a longer task completion time.
Constructions from the statistical analysis revealed that acceleration had a similar type of effect on the completion time of the three task types and that this effect is exponential. However, Dodd et al. [31] and Lancaster et al. [32] found a monotonic increase in completion time and number of errors as a function of acceleration, which could be due to the differences between turbulence simulator settings and the different stimuli/tasks (i.e., tasks and stimuli were similar but not identical) used between these two experiments and the current study. Additionally, these research groups did not thoroughly analyse their data in terms of jerk, which is a strength of the analyses executed in this current study. One can thus infer that the longer the tasks take to perform, the more sensitive they are to acceleration. For jerk, "tap" was least affected. The trend reported by the descriptive statistics of "zoom" as less sensitive than "drag" to jerk was still visible in the statistical analysis, but not enough to be significant.
From the above, one can infer that the intrinsic task duration seems to be a more important parameter than the type of task as regards sensitivity to turbulence.
A great deal of attention was given to the data provided by the accelerometer, but only to a lesser extent to the data from the gyroscope. It might have been interesting to study the effects of the variations of angular acceleration (°/s²) and even angular jerk (°/s³), as these are likely to have a greater effect than the angular speed alone (°/s) used in this article, but this is a task for future research.

Figure 19: Illustration of the variation of the different sensor values during a typical session, grouped by acceleration (accelerometer), angular rotation (gyroscope), and magnetic field (compass). For legibility, only one accelerometer out of three is plotted.
In these experiments, the focus was not set on the different types of activation for clicking, as there was neither sufficient time to train the participants nor sufficiently long sessions to test the different variants. However, the strategy chosen for these experiments, namely, of activation on release of the finger (events typically called "Mouse-Up" or "Touch-Up"), should be favoured [38], as it allows adjusting or cancelling a click while the finger is still in contact with the screen.

Conclusion
The experiments served their purpose well, namely, to gather some information about the general effect of turbulence on the use of a tactile user interface and to propose a data analysis and statistical method for pursuing more experiments.
The main findings of the paper concern a method for correlating user performance (completion time, number of errors) and sensor data (accelerometer, gyroscope), the effect of turbulence on different touch-interaction tasks, the effect of the drag and drop distance during turbulence, and the effect of the relative zoom distance during turbulence. Furthermore, we could infer a new hypothesis, namely, that the longer tasks take to perform (without turbulence), the more severely turbulence will affect them.
Although we could not confirm one of our initial hypotheses (that jerk alone would be a better predictor for human performance than acceleration alone), we did find that jerk was a significant predictor and that the combination of acceleration with jerk was the best predictor. Additional analyses would be needed to refine the calculation of jerk.
Although conditions were sometimes challenging, no major difficulty was discovered that would prohibit the use of tactile interfaces during the type of turbulence we could produce. Anecdotal evidence confirmed this: one of the aircraft pilots participating in the tests said that, though he was extremely sceptical about the use of tactile interaction prior to the experiments, after having tried it he was much more convinced that tactile approaches could be viable.