Where to Start? Exploring the Efficiency of Translation Movements on Multitouch Devices

Predicting the efficiency of interaction techniques can be crucial for designing user interfaces. While models like Fitts’ law make general predictions, there is little research on how efficiency varies under different conditions like in which screen region a movement starts and in which direction it is going, and whether the surface is horizontal or vertical. This study investigates these aspects with regard to translation movements on a touch screen, using an extended Fitts’ law setup and considering arm kinematics. The results show that on horizontal displays translation is faster and causes less arm fatigue than on vertical ones. Also, on horizontal displays, we identified screen regions and movement directions that allow significantly faster movement compared to others. Finally, movements that employ shorter kinematic chains (e.g. just the wrist) are significantly faster than those that use longer ones (e.g. wrist, elbow, shoulder). We suggest adjustments to Fitts’ original formulation. In the future, our findings can inform or partially automate positioning decisions in interaction design.


Introduction
For many devices multitouch has become the standard interaction technique. Predicting the performance of interaction techniques can be important for various reasons, e.g. if an application is highly time/cost critical (industry, logistics, communications) or needs very high precision (medical, military). Even for applications where performance is not the first priority, it is still a criterion to evaluate the user interface [7]: Nielsen et al. [18] report that in 75% of the 57 studies they evaluated the user preferred the system with the best performance. Ben-Bassat et al. [2] show that the user would choose the system with the better performance and ignore the design, even if they have to expect monetary loss. Other studies could show that users rate a design worse after usage if the usability was low on effectiveness [22] or efficiency [18].
We find multitouch techniques on a broad range of devices, from smartphones and tablets to touch-sensitive tables and display walls. The performance of even basic interaction techniques like the translation and rotation of objects will likely differ depending on the size and orientation (horizontal/vertical) of the device and the position and posture of the user and his or her arm and hand. However, current models like Fitts' law [6] do not take such conditions into account yet. Although Fitts' law has been confirmed for different input devices like mouse, pen input and multitouch [5], there have been only a few extensions so far [1,4]. In prior work we have started to look at how the screen region affects the performance of translation and rotation techniques as a first step to explore the conditions under which interaction techniques perform best [17].
In this paper, we focus exclusively on translation movements (dragging). However, in our study we include many different conditions like screen orientation (horizontal vs. vertical), screen regions (20 regions where the movement can start) and movement direction (eight directions). Based on existing work [3,23] we hypothesized that performance will differ with respect to screen orientation and screen region. We also look at the concept of fatigue as a possible cause for performance differences and as an important aspect in the subjective evaluation of multitouch displays [14]. Various studies have shown that the length of the kinematic chains used to execute a task plays a decisive role in terms of fatigue and performance. Hincapié-Ramos et al. [9], for instance, showed that fatigue increases when working with extended arms, which constitutes a longer kinematic chain. Other studies found higher performance for shorter kinematic chains [12].
We hypothesize that there is an increased perceived fatigue for vertical displays [3]. In accordance with [9] and [12] we hypothesize that users will use longer kinematic chains on vertical displays which decreases performance.
Our main contributions are the following: We present significant empirical findings concerning the performance of translation movements on horizontal vs. vertical displays, considering start points in various screen regions and various movement directions. We show that horizontal displays outperform vertical ones and identify various regions on a horizontal screen which perform better than others. Finally, we show correlations between performance and kinematic chains based on a manual video analysis of kinematic chains.

Related Work
Most of the research on multitouch deals with the problem of selection/tapping in terms of efficiency, precision or both [13,20], or focuses on different interaction techniques for manipulating objects (translation, rotation or both) with multiple degrees of freedom in 2D [16] or 3D [11]. There is relatively little research on the translation/dragging task for multitouch in terms of efficiency in different areas of the display [3,23]. Bi et al. [3] divided the multitouch display into different cells to measure the performance of different tasks for each cell in relation to the position of the multitouch display. The displays were placed around the keyboard (left, bottom, right and top) and as a vertical screen. The tasks used in the study were a one-finger gesture task and two docking tasks with translation, rotation and scaling. In contrast to our experiment they used fewer screen locations (nine) and the direction of the one-finger drag gesture was limited to up, down, left or right. For one-handed tasks cells close to the keyboard performed best. For two-handed tasks the placement on bottom and top had the best performance. The design of Weiss et al. [23] only included up and down movements.
There are two lines of research where vertical and horizontal displays were at the center of attention. In the first line of research, the two orientations are compared in terms of efficiency and performance [8,19]. In the second line of research, solutions for the integration of both screen orientations are explored [23]. Hancock et al. [8], for instance, compared the direct input with a pen input on a vertical and horizontal display surface. With a selection task on menus they tried to find out which regions are faster and easier to reach for the hand used. Based on their findings they suggested an adaptive interface to detect handedness because handedness influences the performance of the selection. For instance, the left hand is faster for upper-left and lower-right regions while the upper-right and lower-left regions are faster for the right hand. Pedersen et al. [19] found that tapping was performed 5% faster on the vertical surface, whereas dragging was performed 5% faster and with fewer errors on the horizontal surface. In contrast to our experiment they compared tapping and dragging tasks on large multitouch displays where participants were standing. Additionally, the participants were free to choose the left or right hand for interaction. In contrast, the BendDesk [23] was constructed as a combined horizontal and vertical display, connected by a curved region. The authors studied this curved area and, among other things, compared down-up movements that cross through all three areas. One of their findings was that dragging on a planar surface is faster and straighter than dragging across the curve. Given that the distances were constant for all dragging tasks, Fitts' law would have predicted constant movement durations over all areas.
Since Fitts published his formula which predicts that the time to acquire a target is logarithmically related to the distance over the target size [6], there has been more research on this topic and Fitts' formula has been confirmed for different input devices like mouse, pen input or multitouch [5].
There has also been work claiming that Fitts' law was unsatisfactory and suggesting to extend it for a 2D task [24] or for touch input [4]. Additionally, the authors [17] proposed to consider the direction of the movement. Weiss et al. [23] found indications that the interaction zone has an influence on the task completion time.

Participants
Participants were recruited via noticeboards on university campus and through academic mailing lists. 16 subjects (7 male, 9 female) took part in the study and were paid 10 Euros. The age varied between 19 and 33 with an average of 24 years. In terms of handedness 13 subjects were right-handed and three were left-handed but used the right hand for controlling mouse and touch interfaces.

Apparatus
We used a 22-inch multitouch screen (3M model M2256PW) with 1680x1050 pixels and <6ms touch response time. The application was developed in Java 8 with JavaFX and was run on an iMac. The screen was used in two orientations, vertical and horizontal. In the vertical setup the display stood on a desk, in the horizontal setup it was placed on a low table so that the surface was at a height of 73 cm = 28.7 inches (Fig. 1a and 1b). The participants sat on a static chair centered in front of the display. Interactions were tracked by three webcams from the left (side view, cam 3), from above (top view, cam 1) and from behind the participant (shoulder view, cam 2). Figure 1c shows the webcam setup for the vertical display setup.

Tasks
The task required moving a circular cursor (grey circle with a red cross) into a target area marked by a dashed circle (see Figure 2a). The target area was 1.5 cm in diameter. As soon as the cursor was selected by touching it, the cursor changed to monochrome colors (see Figure 2b). A trial was rated successful if the center of the cross was located inside the target area when lifting the finger off the screen. Success was signaled by a green check mark (Figure 2c). It was not necessary to achieve a perfect match between cursor and target area. But if the center of the cross remained outside the target area, a "sad smiley" appeared to signal failure (see Figure 2d). In this case, the corresponding trial was repeated at the end of the set. The beginning of a new trial and success or failure of a trial were accompanied by distinct sounds.

Material
For a thorough analysis the surface area of the screen needed to be completely covered by movement paths. A movement path was described by a start and an end point, marked by two different circular areas. Start points were evenly distributed through a pattern as broad as possible. The multiplication with end points showed that a high number of start points would have led to an unacceptably high number of trials per orientation. Therefore, the number of start points was set to 20, so that the display was still optimally covered with points in reasonable distances and the study was still feasible in an adequate time frame and with adequate effort (see Figure 3). Possible end points were computed for each start point with four possible distances (2.5 cm, 5 cm, 10 cm and 20 cm) going in eight possible directions (0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°). The combination of distances and directions resulted in 32 different theoretical end points per start point (see Figure 3b). Some of the end points were not reachable because of the screen limits, so such points were removed (see Figure 4a). Figure 4b shows all used configurations for the respective start points. The total number of configurations was thus reduced from 640 to 388. Each configuration occurred once in each set for the horizontal and the vertical orientation.

Fig. 2: Tasks. (a) Task is shown
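The construction of candidate end points described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the physical screen size (roughly 47.4 cm × 29.6 cm for a 22-inch 16:10 panel), the evenly spaced 5 × 4 start grid, the angle convention and the reachability margin of half a target diameter are all hypothetical choices and not taken from the study, so the number of reachable end points produced here need not equal 388.

```python
import math

DISTANCES_CM = (2.5, 5.0, 10.0, 20.0)
DIRECTIONS_DEG = (0, 45, 90, 135, 180, 225, 270, 315)

# Assumed (hypothetical) active area of a 22-inch 16:10 panel in cm.
SCREEN_W_CM = 47.4
SCREEN_H_CM = 29.6


def start_points(cols=5, rows=4):
    """Evenly spaced grid of start points (the exact layout is an assumption)."""
    return [((c + 0.5) * SCREEN_W_CM / cols, (r + 0.5) * SCREEN_H_CM / rows)
            for r in range(rows) for c in range(cols)]


def candidate_end_points(start):
    """All 4 distances x 8 directions = 32 theoretical end points for one start."""
    sx, sy = start
    points = []
    for distance in DISTANCES_CM:
        for angle in DIRECTIONS_DEG:
            rad = math.radians(angle)
            points.append((distance, angle,
                           sx + distance * math.cos(rad),
                           sy + distance * math.sin(rad)))
    return points


def on_screen(x, y, margin=0.75):
    """Keep end points whose target circle fits on the screen (assumed rule)."""
    return (margin <= x <= SCREEN_W_CM - margin
            and margin <= y <= SCREEN_H_CM - margin)


starts = start_points()
candidates = [(s, p) for s in starts for p in candidate_end_points(s)]
reachable = [(s, p) for s, p in candidates if on_screen(p[2], p[3])]
print(len(candidates), "theoretical configurations,", len(reachable), "kept")
```

The 20 × 32 = 640 theoretical configurations follow directly from the text; only the filtering step depends on the assumed geometry.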

Procedure
The study was conducted in a lab with a supervisor and took about 1:15 h per participant. Each subject was briefed using written instructions while allowing for clarification questions.
Participants completed a pre-test questionnaire for demographic data. To make subjects familiar with the device and its use (strategy and optimal finger movements) each session began with a training phase of 30 randomized trials using the same configurations across subjects.
All tasks were conducted first in the horizontal, then in the vertical condition, or vice versa (setup order was balanced across subjects). Each condition took about 30 minutes to complete with a short break in between. After switching display orientation another training phase was conducted with a different training set. For every orientation condition, the task set consisted of at least 388 trials presented in four blocks separated by three breaks of 20 seconds. The actual number of trials for each subject depended on the individual error rate. Every failed trial was repeated at the end of the current set. Every subject received different task sets with 388 distinct configurations. The order of configurations was pseudo-random under the condition that neither start point nor movement direction be the same in two immediately subsequent trials. After completing all tasks, subjects answered a questionnaire with subjective ratings concerning the comparison of both orientation setups.
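One possible way to produce such a constrained pseudo-random order is a randomized greedy construction with restarts, sketched below in Python. The function name `constrained_order`, the tuple layout `(start_point, direction, distance)` and the restart strategy are assumptions made for illustration; the study does not state how its orders were generated.

```python
import random


def constrained_order(configs, rng, max_restarts=1000):
    """Shuffle configs so that consecutive trials differ in start point and direction.

    configs: list of (start_point, direction, distance) tuples (hypothetical layout).
    rng: a random.Random instance, so that orders are reproducible per subject.
    """
    for _ in range(max_restarts):
        remaining = list(configs)
        rng.shuffle(remaining)
        order = []
        while remaining:
            prev = order[-1] if order else None
            # Indices of configurations that may follow the previous trial.
            allowed = [i for i, c in enumerate(remaining)
                       if prev is None or (c[0] != prev[0] and c[1] != prev[1])]
            if not allowed:
                break  # dead end: start over with a fresh shuffle
            order.append(remaining.pop(rng.choice(allowed)))
        if not remaining:
            return order
    raise RuntimeError("no valid order found within max_restarts")


# Usage with a hypothetical set of 20 start points x 8 directions.
configs = [(s, d, 5.0) for s in range(20) for d in range(0, 360, 45)]
order = constrained_order(configs, random.Random(1))
print(order[:3])
```

Passing a seeded `random.Random` makes each subject's order reproducible while still differing between subjects.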

Design
We devised a within-subject design with two independent variables: (a) display orientation (horizontal, vertical) and (b) configuration (start point, direction). Training phases were not included in the analysis of the trials, so that 16 subjects × 2 screen orientations × 4 blocks × 97 configurations per block (388 configurations in sum) = 12416 data items were analyzed.
The following data were measured as dependent variables:

Results
We analyzed our data such that concrete design recommendations could be generated. Therefore, we looked at screen areas. First, we analyzed screen halves (top/bottom half and left/right). Second, we defined three functional areas. We compared the performance of these areas. Performance was measured for horizontal and vertical displays considering start point, movement direction and varying distances. We measured in terms of mean-of-means throughput (TP) after Soukoreff and MacKenzie [21]. The difficulty of each configuration was measured using the Index of Difficulty (ID) with the Shannon formula because it always gives a positive rating for the index of task difficulty [15]. TP combines speed and accuracy into a single dependent measure and is calculated by:

TP = \frac{1}{y} \sum_{i=1}^{y} \left( \frac{1}{x} \sum_{j=1}^{x} \frac{ID_{ij}}{MT_{ij}} \right)

where y is the number of subjects, and x is the number of movement conditions. MT_{ij} is the mean time over all trials for this condition. The units of throughput are bits per second (or bps).
An advantage of TP is that these data are normally distributed (Shapiro-Wilk test: W = 0.9858, p = 0.8047), which is often a requirement for statistical tests. In contrast, distance times are log-normally distributed (see Fig. 6).
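As a minimal sketch of this measure, the following Python snippet computes the Shannon index of difficulty, ID = log2(D/W + 1), and a mean-of-means throughput over subjects and movement conditions. The function names and the input layout are assumptions for illustration, and the numbers in the usage example are made up rather than taken from the study.

```python
import math


def shannon_id(distance, width):
    """Shannon formulation of the index of difficulty in bits."""
    return math.log2(distance / width + 1.0)


def mean_of_means_throughput(data):
    """Mean over subjects of the per-subject mean of ID / MT.

    data[i][j] is a pair (id_bits, mean_time_s) for subject i and
    movement condition j (hypothetical layout).
    """
    per_subject = []
    for conditions in data:
        ratios = [id_bits / mean_time for id_bits, mean_time in conditions]
        per_subject.append(sum(ratios) / len(ratios))
    return sum(per_subject) / len(per_subject)


# Usage with made-up mean times for two subjects and two distances
# (target width 1.5 cm as in the task description).
ids = [shannon_id(d, 1.5) for d in (5.0, 20.0)]
data = [
    [(ids[0], 0.30), (ids[1], 0.50)],
    [(ids[0], 0.25), (ids[1], 0.45)],
]
print(mean_of_means_throughput(data), "bits per second")
```

Averaging per subject first, and only then across subjects, keeps a subject with many repeated trials from dominating the result.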

Q-Q plot for "log-normal" distribution (quantiles for dt_data against quantiles from the "log-normal" distribution)

Comparing display orientation
A two-tailed paired t-test showed that translation movements on a horizontal surface (M = 8.67, SD = 2.02) outperformed the ones on a vertical surface (M = 7.53, SD = 1.46); t(15) = 3.62, p < 0.002, Cohen's d = 0.907 (see Fig. 7), which is also clearly visible in the heat maps in Fig. 8.

Comparing functional areas
For the horizontal condition, we found differences between areas (see Fig. 8). A pairwise t-test with Bonferroni-Holm correction between the corners shows that the bottom-left corner of the horizontal display is the best corner in terms of performance (see Tab. 1). For vertical displays there were no significant differences.
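For readers who want to retrace the orientation comparison, a paired t statistic and a paired-samples effect size can be computed as sketched below. The helper name `paired_t_and_d` is hypothetical, and the effect size shown is the mean difference divided by the standard deviation of the differences, which is one of several common definitions of Cohen's d; the study does not state which one it used. The throughput values in the usage example are invented.

```python
import math
import statistics


def paired_t_and_d(a, b):
    """Return (t, d, df) for paired samples a and b.

    t is the paired t statistic, d the mean difference divided by the
    standard deviation of the differences, df the degrees of freedom.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff
    return t, d, n - 1


# Usage with invented per-subject throughput values (bits per second).
horizontal = [9.0, 8.5, 10.0, 7.5]
vertical = [8.0, 8.0, 8.5, 7.0]
t, d, df = paired_t_and_d(horizontal, vertical)
print(f"t({df}) = {t:.3f}, d = {d:.3f}")
```

With this definition d equals t divided by the square root of the number of subjects; for the reported t(15) = 3.62 and 16 subjects this gives about 0.905, close to the reported value of 0.907.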
An additional pairwise t-test with the same correction for the edges shows significant differences with large effect sizes in performance (see Tab. 2): the bottom edge yields particularly high performance.

For the vertical condition there were no significant differences for these movements. Although we found several statistically significant differences between directions in the horizontal and vertical condition, it was difficult to find systematic patterns. Figure 9 illustrates both the best and worst directions. The stars indicate the level of significance: * p < 0.05; ** p < 0.01; *** p < 0.001.

Comparing movement axes
We slightly abstracted away from actual directions by collapsing two opposing directions into one axis (for the labels see Fig. 10). A pairwise comparison of axes in different areas showed, in some areas, a highly significant difference between the performance of the axes. Figure 11 shows only significant pairs. For instance, the V axis showed significant differences to the H axis and the D2 axis in the zones Z21, Z22, Z31, Z32, Z33.
Kinematic chains
For the analysis of the kinematic chains the sessions were recorded by three cameras (see Fig. 1 for the setup and Fig. 12 for an example screenshot of the video). The recorded material was annotated by two independent coders using the ANVIL video annotation tool [10]. For each translation movement the coders annotated the employed kinematic chain in the user's arm. A kinematic chain is a series of connected joints, e.g. finger-hand-forearm. The more joints there are involved, the longer the kinematic chain. To categorize a user movement we defined four kinematic chains: finger, finger-hand, finger-hand-forearm and finger-hand-forearm-upper arm. Each chain type is denoted by the topmost joint in the chain, i.e. finger, hand, forearm and upper arm. For instance, if the target was moved by the finger without moving the hand, elbow or shoulder, then we categorized it as "finger". If the elbow was involved, then "forearm" was annotated. A chi-square statistic shows that the orientation of the display influences the usage of the kinematic chain, χ²(3, n = 12335) = 423.210, p < 0.001. On horizontal displays, users employed shorter kinematic chains. Table 3 illustrates the usage in percent. We can see that the values for finger, hand and forearm increase up to 200% in comparison to the vertical display.
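The chi-square test of independence between display orientation and annotated chain type can be sketched as follows. The counts in the usage example are invented placeholders, not the values behind Table 3; only the 2 × 4 table shape, and hence the three degrees of freedom, follows from the text.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: horizontal, vertical. Columns: finger, hand, forearm, upper arm.
# The counts are hypothetical and only illustrate the call.
counts = [
    [300, 900, 2500, 2500],
    [150, 450, 1800, 3700],
]
statistic, df = chi_square_independence(counts)
print(f"chi-square({df}) = {statistic:.2f}")
```

The statistic is then compared against the chi-square distribution with the returned degrees of freedom to obtain a p-value.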
T-tests for the horizontal condition show that shorter chains (finger, hand, forearm) (M = 9.36, SD = 2.27) have significantly higher performance than longer chains (upper arm) (M = 8.47, SD = 1.95); t(15) = 6.593, p < 0.001. The same applies for

Fig. 12: Screenshots of the three camera perspectives.

Adjusting Fitts' law
We adjusted Fitts' law to include the factors of screen orientation, start point and direction. We performed a multiple (linear) regression with the mentioned factors using the Shannon formulation for the index of difficulty (ID). For the computation we indexed the real values for start x-point (startX), start y-point (startY), screen orientation and direction (angle) to calculate the regression coefficients. We used values from 1 to 5 for the five possible starting x-coordinates, 1 to 4 for the starting y-coordinates and 1 to 8 for the angles in clockwise direction (1 = 0° ... 8 = 315°). For screen orientation we used 1 for horizontal and 2 for vertical. We randomly divided our data into a training set and a test set, each of which contained 50% of vertical and 50% of horizontal data points. We used the training set to generate the model and derived the following parameters (startX, orientation and angle) and regression coefficients after the multiple regression (the starting y-coordinate was not significant for this model, p > 0.1):

The first line is the regular Fitts' law formulation (from now on called Fitts' model). We used the training data to compute the coefficients for the regular Fitts' model, too, and derived y = 0.1605x + 0.089.
Comparing the actual test data (green) with the predictions of the regular Fitts' model (red) and of our adjusted Fitts' model (blue) over all IDs (Figure 13a), both models achieve a similarly high correlation (Fitts' model R² = 0.98, adjusted model R² = 0.92). However, if we take the other factors into account, like orientation, starting point and direction, we get a better picture of the performance of the two models (see Figures 13b, 13c, 13d). Here, the adjusted model is more precise than the predicted constant time of Fitts' model. It is clearly visible how the adjusted model accounts for our findings: the prediction for surface orientation, for instance, shows the different performances for horizontal and vertical displays (see Fig. 13b), where horizontal outperforms vertical. In Figure 13c the prediction shows that the left side is faster than the right side of the display. And as we can see in Fig

Questionnaire and interviews
The direct comparison between horizontal and vertical display orientation shows an explicit preference for a horizontal multitouch display over all categories among the subjects (find all results in Table 4). Most of the subjects perceive the horizontal orientation as faster, simpler and operable with less effort compared to the vertical orientation. Furthermore, subjects feel safer and even for the correction of errors like re-adjusting due to missing the target area the horizontal display is preferred over the vertical display.
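A multiple linear regression of this kind can be sketched with ordinary least squares. The snippet below assumes a model of the form MT = a + b·ID + c·startX + d·orientation + e·angle, which is our reading of the description and not necessarily the exact published formulation. The function names are hypothetical and the pure-Python solver merely stands in for whatever statistics package was actually used.

```python
import math


def shannon_id(distance, width):
    """Shannon formulation of the index of difficulty in bits."""
    return math.log2(distance / width + 1.0)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_adjusted_model(rows):
    """Least-squares fit via the normal equations.

    rows: tuples (id_bits, start_x, orientation, angle, movement_time).
    Returns [intercept, b_id, b_start_x, b_orientation, b_angle].
    """
    design = [[1.0, r[0], r[1], r[2], r[3]] for r in rows]
    target = [r[4] for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_fitts(id_bits, slope=0.1605, intercept=0.089):
    """Regular Fitts' model with the coefficients reported in the text."""
    return slope * id_bits + intercept


# Prediction of the regular model for a 10 cm movement to a 1.5 cm target.
print(predict_fitts(shannon_id(10.0, 1.5)))
```

Fitting on the training half and evaluating on the held-out half, as described above, then amounts to calling `fit_adjusted_model` on the training rows and applying the resulting coefficients to the test rows.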

Discussion
Our results first of all show that the performance of a translation movement depends on the orientation of the display, the point where the interaction starts, the direction in which the translation moves and which kinematic chain the user used. The horizontal display yielded the best performance and lowest perceived fatigue compared to the vertical one. Additionally, users used shorter kinematic chains. Our results show that kinematic chains influence performance: shorter kinematic chains (involving finger, hand, forearm) yield higher performance than longer ones (involving the upper arm) which is in accordance with earlier findings [12]. However, how do kinematic chains relate to the concept of fatigue? A recent study by Hincapié-Ramos et al. [9] showed that what they call the "bent arm position" is the least tiring of all positions they tested for a selection task on a 2D plane. This clearly corresponds to our notion of a short kinematic chain. Therefore, we can assume that kinematic chains correspond not only with performance but also with fatigue: shorter kinematic chains cause less fatigue than longer kinematic chains. Taken together, these results imply that it may be desirable to train users accordingly, i.e. to use shorter kinematic chains to reduce fatigue and increase performance.
On horizontal displays, performance also differs across the screen. The left half yields higher performance than the right half. Also the bottom half outperforms the top half. The bottom edge and the bottom-left corner seem to be particularly good areas for high performance. This confirms UI decisions like placing a start button in the lower left corner or putting a dock bar along the bottom edge. It contradicts the UI decision of placing a menu along the right edge (e.g. Windows charm bar). However, this only applies to horizontal displays. Vertical displays have more homogeneous performance characteristics.
In terms of movement direction we found some patterns for high performance. Upward motion is performed faster than downward motion (in the horizontal condition). This was partially confirmed when looking at motion axes where the vertical axis and the diagonal from lower-left to upper-right are particularly good performance-wise. This is relevant if virtual objects (UI elements, photos, documents) have to be dragged to a target area. According to our results source and target positions should be located in the lower and upper screen regions respectively and ideally be on the vertical or the mentioned diagonal axis.
In prior work we found that combining the movement of translation and rotation achieves higher performance when directed to the right than when directed to the left [17]. Our current study shows that this does not hold for translation-only movements. It can be concluded that the direction preference found earlier must be due to the rotation part or the combination of rotation and translation, which is an interesting refinement of our earlier result.
Regarding Fitts' law, we found contradictory data since there are significant differences in performance depending on the start position and movement direction. However, Fitts' law would have predicted constant movement durations over all areas and directions. This confirms earlier works [23,17]. So we suggest adapting Fitts' law with additional parameters to factor in the start position, the screen orientation and the direction of the movement. Our adaptation of Fitts' law gives a better approximation and prediction of the expected performance although the measured time data were not involved in the modeling of the prediction formula. In our case the adapted version predicted the different performances between the directions, orientations and x-coordinates. We are aware of the fact that our extension is only the simplest approach to adapt Fitts' law, but this should demonstrate the necessity of extending Fitts' law to achieve usable predictions.
Finally, we would like to point out that the conditions have to be extended to achieve a higher generalizability of the outcomes. Possible extensions are outlined in future work (Section 5).

Conclusion
We presented a study to systematically explore the performance of translation movements on a multitouch display. Our results show that the performance varies significantly and with large effect size depending on surface orientation (vertical vs. horizontal), movement start point and movement direction.
We showed that horizontal screens yield the highest performance and are subjectively preferred over vertical ones. We also found specific areas where the performance outperformed other areas of the display in both conditions, vertical and horizontal. Most differences occur on the horizontal display, which means that the optimization potential is higher with horizontal displays. And we could show that the direction of the movement influences performance. These findings contradict Fitts' law, which predicts constant movement time over all areas and along all directions. Therefore, we suggest extending Fitts' law using a simple linear combination that factors in display orientation, start point and direction.
In terms of ergonomics we could show that orientation influences how the user executes movements. On horizontal displays users employ shorter kinematic chains compared to movements on a vertical screen. We interpret this as a possible cause for the higher performance and the lower perceived fatigue on horizontal screens.

Future Work
Future work should investigate more in-depth the correlation between performance, perceived fatigue and the usage of shorter kinematic chains. Our experimental design could be used to examine a wider array of conditions. It would be interesting to include smaller and larger form factors like smartphones and phablets, which are often operated by thumb, or multitouch tables and large touch walls where users may have to stretch or even walk to reach certain areas. Other important aspects to consider are different postures (sitting, standing) and different screen angles (45° and others).
Additionally, we need to check whether and how handedness affects our results. It would also be interesting to look at cultural differences (e.g. different reading and writing directions).
Ultimately, our results should make it possible to generate recommendations e.g. in the form of heat maps for specific devices, and to automatically evaluate user interfaces. Future interfaces may even constantly adapt the UI layout depending on user characteristics based on results such as ours.