Pupil-assisted target selection (PATS): State of the art and future prospects

Affective interaction is assumed to advance the interaction between humans and technical systems. Treating pupil size as affective information, we review the approaches to pupil-based interaction presented so far. Three approaches are based on different mechanisms: pupil size changes produced by covertly attending to a set of stimuli that change in brightness, changes in pupil diameter actively produced by the user, and changes occurring automatically as a by-product of cognitive processes. While the first two approaches can produce effective but relatively slow target selections, the latter might allow faster selections but has to be combined with other selection mechanisms. A reanalysis of two data sets obtained with this most promising approach, which uses cognitive processes as the basis for pupil-based target selection, shows that despite differences in task and brightness, signal dynamics are characterized by comparable latencies and effect sizes. These may lay the foundation for future investigations into pupil diameter changes accompanying object selection in human-robot interaction. Finally, we suggest further investigations into the subject to determine when and why pupil diameter changes accompany object selections.


I. INTRODUCTION
During human-human interaction, gaze behavior reveals a lot of information about the persons involved. Object fixation in particular provides a reliable cue to a person's attentional focus. Moreover, one can even assume that eye movements communicate complex intentions [1]. Modern interfaces have taken advantage of the possibilities of human gaze behavior. Human-robot interaction, for example, could likewise be made intuitive and lead to true companions that are able to understand the subtle movements of the eye. As one such subtle movement, pupil-based information has recently gained more interest.
Pupil diameter constitutes a significant source of implicit information as it reflects changes in autonomic activity that arise from various cognitive and/or affective processes. Analogous to eye movements, pupil size changes are suggested to also play a crucial role in social interaction. For example, [2] report that the perception of sad facial expressions is moderated by pupil diameter, assuming constricted pupils to correlate with high attributions of valence and arousal. Kret, Fischer and de Dreu [3] demonstrate that dilations may be of special importance in group interactions; here, increases and decreases of an opponent's pupil diameter significantly alter our willingness to cooperate and are mimicked during social interaction. As such, we may have developed the ability to infer information on the internal states of human counterparts from their pupil size changes. To improve data exchange between user and computer, modern interaction concepts attempt to assess and process the same type of information via pupil diameter. During the last years, the perspective on pupil size changes has been extended considerably: until recently, pupil diameter had been regarded as a passive information channel that may be employed solely to assess mental workload or a user's emotional state [4]. However, Ehlers and colleagues [5] show that pupil size changes are, at least to a certain degree, subject to cognitive control and may provide an alternative way of target selection and thus of active human-computer interaction.
In the following, we briefly summarize the existing approaches that integrate pupil diameter into gaze-based selection, with the aim of clarifying the underlying mechanisms and evaluating them for future applications. Three kinds of approaches can be distinguished, depending on the functional mechanisms they make use of: stimulus-driven pupil-based selection, active control of pupil size, and automatic pupil size changes occurring as a by-product of cognitive processing. Although each approach produces effective target selections, automatic pupil diameter variations led to faster selections than the other approaches. In order to estimate the best achievable performance using automatic pupil size changes for target selection, two data sets using this approach were reanalyzed and compared.

A. Approaches in pupil-based HCI
1) Stimulus-driven pupil-based selection
One suggestion for pupil-assisted target selection is based on the finding that the pupil adapts to brightness changes of a displayed stimulus when attention is covertly directed towards it, even when it is not fixated [6,7]. Mathôt, Melmi, van der Linden and van der Stigchel [8] present a method for inferring, via pupil size, which of several objects changing in brightness is attended. Two objects are displayed that repeatedly turn either from bright to dark or vice versa. Live monitoring of pupil diameter allows the attended object to be inferred, since oscillations in diameter correspond to the changes in brightness of the attended object. These binary selections can also be used for selections among more stimuli if they are applied sequentially. It has been shown that selections from a set of up to eight objects can be performed in this way: all eight objects are displayed first, the four stimuli that include the attended one are identified, and these are then split further into two objects and so forth, using the same mechanism. Thus, intended objects can be detected merely by analyzing, in real time, the changes in pupil diameter that occur due to the covert shift of attention to targets changing in brightness.
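The sequential scheme described above can be sketched as a repeated halving of the candidate set. The following Python sketch is illustrative only and is not the implementation of [8]: the function names, the covariance-based decoder, and the neglect of pupil response latency are assumptions made for this example.

```python
def attended_group(pupil_mm, brightness_a):
    """Guess which of two counter-phased groups is attended.

    Assumption (not from [8]): the pupil is smaller while the attended
    group is bright, so a negative covariance between the pupil trace and
    the brightness of group A points to group A. Response latency of the
    pupil is ignored here.
    """
    n = len(pupil_mm)
    mean_p = sum(pupil_mm) / n
    mean_b = sum(brightness_a) / n
    cov = sum((p - mean_p) * (b - mean_b)
              for p, b in zip(pupil_mm, brightness_a))
    return "a" if cov < 0 else "b"


def select_by_halving(objects, decide):
    """Narrow a set of objects down to one by repeated binary decisions.

    `decide(group_a, group_b)` is a hypothetical callback returning "a"
    or "b"; in a live system it would wrap a pupil-based decoder such as
    `attended_group`.
    """
    candidates = list(objects)
    steps = 0
    while len(candidates) > 1:
        half = len(candidates) // 2
        group_a, group_b = candidates[:half], candidates[half:]
        candidates = group_a if decide(group_a, group_b) == "a" else group_b
        steps += 1
    return candidates[0], steps
```

With eight objects, three binary decisions are needed, which illustrates why this approach is effective but comparatively slow: every decision requires several brightness cycles to be observed.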

2) Active control on implicit events
Recent studies indicate that pupil size changes do not simply reflect sympathetic activity; at least to a certain degree, the associated dynamics are subject to cognitive control, albeit with individually varying success and over differing durations [9][10][11][12]. Given that valid and continuous feedback is provided, participants can deliberately expand their pupil diameter by up to one millimeter beyond the baseline mean [5]. Pupil enlargements that arise from this ability occur slowly but with strong effects, and maximum values are usually reached about three to five seconds after feedback onset [5,10,11]. This finding can also be used to enable active computer input: Stoll and colleagues [13] applied pupil dilations due to arithmetic processing to establish communication with locked-in patients, however in a form that is hardly applicable in practice. Recently, Ehlers, Strauch and Huckauf [14] utilized self-induced dilations as a selection criterion in a simple search-and-select task and report promising results regarding speed and accuracy. Actively controlled pupil size changes may be usable for complex application scenarios that are associated with high user demands and difficult or uncertain selection decisions.

3) Automatic pupil size changes
Pupil diameter changes are connected with Locus Coeruleus activity [15]. This correlation appears to be almost synchronous in time and thus offers information about a user's attentional state with high temporal resolution [16]. For example, pupils dilate when a relevant object is spotted during visual search [17], and during a decision for "yes" in comparison to a decision for "no" [18]. Given the similarity to the process of selection in human-computer interaction (search and select if correct), these findings suggest that mouse clicks can be predicted on the basis of pupil diameter [19]. Using such automatic responses of the pupil, it has been shown that targets can be selected in an easy as well as in a hard task [21]. This finding was replicated in a second investigation on eye-typing, in which selections via dwell-time are sped up if the pupil dilates beyond a threshold [22]. However, on an individual trial basis, automatic pupil diameter changes provided only limited classification performance in a machine learning investigation [20].

4) Evaluation
Taken together, approaches 1 and 2 show that pupil diameter changes can be used for effective target selection. Nevertheless, with future applications in mind, one might argue that both the stimulus-driven and the active control approach place a high load on the user. In this respect, automatic pupil size changes (3) seem to be more promising for applications, although they likely have to be combined with other input mechanisms to be effective. In addition, automatic pupil size changes promise high precision regarding their timing relative to other cognitive activities. They should therefore be faster than the other approaches and easily applicable as assistance in existing interfaces.

B. Assessing the potential of automatic pupil size changes for target selection
Automatic pupil size changes thus seem to be a promising mechanism for assisting target selection. In the following, this method is assessed in more detail.
There are already two interfaces using an implicit pupil-assisted input option. In a first laboratory-based interface [21], the task is to select a target item congruent to a central reference from a set of circularly arranged distractors. While gaze direction indicates the position, the selection is performed via a combination of dwell-time and pupil size changes. While gaze highlights a fixated item, either a (previously calibrated) increase in pupil diameter within a window of 666 ms or a one-second dwell-time serves as the selection mechanism, depending on which is reached first. Average selection time, defined as the time from the moment gaze entered the area of interest around the selectable object until the selection is performed, is 840 ms, with 5.44 % of selections being false positives. A similar task that features only a one-second dwell-time leads to 0.6 % false positives. While the pupil-assisted target selection mode cannot outperform dwell-time, 40.83 % of selections were sped up. Besides the live implementation and the combination of dwell-time and pupil diameter oscillations, the major contribution is the signal dynamics reported. These comprise a dilation that starts prior to fixating a later selected shape and reaches its maximum at about 400 ms post fixation, a constriction reaching its minimum at 750 ms, and a subsequent dilation at about 1500 ms post shape fixation. Thus, while pupil diameter could not outperform the dwell-time, specific dynamics were at least found on average.
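The race between dwell-time and pupil criterion can be illustrated with a short Python sketch. It is not the implementation used in [21]: the function name, the sample format, and in particular the choice to measure the dilation relative to the diameter at fixation onset are assumptions made for this example; only the 666 ms window and the one-second dwell-time are taken from the description above.

```python
def race_selection(samples, threshold_mm, window_ms=666, dwell_ms=1000):
    """Decide which criterion triggers a selection first.

    samples: list of (t_ms, diameter_mm) pairs, ordered by time, with
             t_ms counted from the onset of the fixation (assumption).
    threshold_mm: previously calibrated dilation criterion (hypothetical
             per-user value).
    Returns ("pupil", t_ms), ("dwell", t_ms), or None if the fixation
    ended before either criterion was met.
    """
    if not samples:
        return None
    onset_diameter = samples[0][1]  # assumption: dilation relative to onset
    for t_ms, diameter in samples:
        if t_ms <= window_ms and diameter - onset_diameter >= threshold_mm:
            return ("pupil", t_ms)
        if t_ms >= dwell_ms:
            return ("dwell", t_ms)
    return None
```

In this form, the pupil criterion can only ever shorten a selection; if it is never met, the interface behaves like a plain one-second dwell-time interface.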
Based on these observations, an eye-typing interface was realized in which subjects had to eye-type a text using an onscreen keyboard outside the laboratory, next to a window [22]. Here, about the lower half of the screen is covered by an onscreen keyboard with a total of 33 selectable keys. Instead of using a race-horse algorithm between dwell-time and the employed pupil dilation criterion, a dwell-time of 1.5 s was lowered whenever a predefined increase in pupil diameter was registered within 360 ms. The selection could be further accelerated when a subsequent constriction was found within another 360 ms. This way, pupil-assisted typing enabled a selection time of 1.04 s at the cost of 1.10 % false positive selections. As in [21], signal dynamics were recorded; all subjects show the prototypical sequence of dilation, constriction and dilation, but the signal dynamics are shifted in temporal phase and differ in the size of dilation and constriction [22]. This pattern was found not only between, but also within subjects.
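The adaptive dwell-time logic can be sketched as follows. This Python sketch is a rough illustration rather than the procedure of [22]: the reduced dwell-times of 1000 ms and 700 ms are placeholder values that do not come from the source, and measuring the dilation relative to fixation onset and the constriction relative to the running peak are likewise assumptions.

```python
def adaptive_dwell_ms(samples, dilation_mm, constriction_mm,
                      base_ms=1500, window_ms=360,
                      after_dilation_ms=1000, after_constriction_ms=700):
    """Return the dwell-time required for the current fixation.

    samples: list of (t_ms, diameter_mm) pairs since fixation onset.
    dilation_mm / constriction_mm: predefined criteria (hypothetical).
    after_dilation_ms / after_constriction_ms: placeholder values for the
    lowered dwell-times; the source does not state them.
    """
    if not samples:
        return base_ms
    onset_diameter = samples[0][1]
    dilation_t = None  # time at which the dilation criterion was met
    peak = None        # largest diameter seen since then
    for t_ms, diameter in samples:
        if dilation_t is None:
            if t_ms <= window_ms and diameter - onset_diameter >= dilation_mm:
                dilation_t, peak = t_ms, diameter
        else:
            peak = max(peak, diameter)
            within = t_ms - dilation_t <= window_ms
            if within and peak - diameter >= constriction_mm:
                return after_constriction_ms
    return after_dilation_ms if dilation_t is not None else base_ms
```

Unlike a race between two criteria, the pupil signal never triggers a selection by itself in this scheme; it only shortens the waiting time, which is one plausible reason for the lower false positive rate reported above.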
Both existing applications of automatic pupil size changes produced effective target selections. However, it is still unclear whether the observed characteristics can be compared: the two studies used different eye trackers, different users, different tasks, and different lighting conditions. It is therefore unclear whether there are general thresholds that may be used in pupil-assisted target selection or whether these thresholds depend on the test conditions. In the present reanalysis, pupil dynamics are compared in order to learn about general and specific characteristics of using pupil dilations in target selection applications.
Determining selection-related pupil size changes is crucial for the development of thresholds for pupil-assisted selection. While the results of [22] point towards a dilation of approximately 0.15 mm, it is unclear to what extent a similar analysis would alter the dynamics presented by [21]. Moreover, effect sizes of the average pupil size changes are not entirely clear, since baselines differed between the two studies. Klingner [17] describes that unaligned pupil dynamics can lead to distorted estimates of the effect size. Following this appraisal, for this review we reanalyzed the data from [21] by realigning the signal dynamics at the first local maximum, in accordance with [17,22]. In addition, signal changes are made comparable by utilizing the same baseline.

A. Original studies
1) Data set I [21]
Twenty-four users (MAge = 24.08, female = 18) had to select, from a set of circularly arranged stimuli, the one that mirrored a simultaneously depicted target. Data were collected in a windowless laboratory with an SMI iViewX XTM Hi Speed 1250 eye-tracker at a distance of 65 cm from the monitor (BenQ XL2720Z, 1920*1080 Pixels, 60 Hz). The tracker signal was downsampled to 60 Hz. Further details are given in [21].
2) Data set II [22]
Twenty-one users (MAge = 22.76, female = 13) eye-typed a standard text. The study was conducted in a room next to a window during winter; brightness ranged from 45 to 3500 lx. Pupil size changes and gaze were tracked using an SMI RED 120 eye-tracker; the monitor (DELL P2210, 1650*1050 Pixels, Hz) was situated 40 cm from the user. The tracker signal was downsampled to 55 Hz. Further details are given in [22]. PsychoPy version 1.81.02 was employed in both studies as experimental software for the task, the display of visual stimuli, the underlying selection mechanisms, and data extraction [23].

3) Data analysis
Both [21] and [22] provide pupil dynamics around the fixation of later selected objects. However, the baselines differ between the two investigations, which affects the comparability of the average dynamics. As a first step, average dynamics were therefore reanalyzed: for both data sets, the average pupil diameter in the interval from 700 ms to 600 ms prior to fixation served as a baseline and was subtracted from the subsequent values, providing a better estimate of average diameter changes around fixation.
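The baseline correction described above amounts to a simple subtraction per trial. The following Python sketch illustrates it under stated assumptions: the function name and the trial format (time stamps in ms relative to fixation onset, diameters in mm) are chosen for this example and are not taken from the original analysis scripts.

```python
def baseline_correct(times_ms, diameters_mm, start_ms=-700, end_ms=-600):
    """Subtract the mean diameter of the baseline interval from a trial.

    times_ms: sample times relative to fixation onset (negative = before).
    The default interval of 700 ms to 600 ms prior to fixation follows
    the text; the data layout is an assumption.
    """
    baseline = [d for t, d in zip(times_ms, diameters_mm)
                if start_ms <= t <= end_ms]
    if not baseline:
        raise ValueError("no samples inside the baseline interval")
    mean = sum(baseline) / len(baseline)
    return [d - mean for d in diameters_mm]
```

After this step, each trial is expressed as a change in millimeters relative to its own pre-fixation level, so that traces from different sessions and lighting conditions can be averaged on a common scale.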
As a second step, the dynamics of every trial were aligned at the first local maximum within an interval from 300 ms prior to fixation until 700 ms post fixation; the maximum was set to zero on the time axis. In order to arrive at interpretable signal dynamics, we considered only those trials for which at least 400 ms of signal were present, so that the signal preceding the fixation is meaningful. This is also a precondition for proper baseline registration and ensures signals of equal length, which prevents distortions of the average dynamics that may arise from signals of unequal length. An alignment of dynamics is regarded as a substantial step towards the estimation of an underlying effect size [17,22]. This in turn is crucial for the determination of dilation criteria for selection in future interfaces. The correct estimation of the effect size of the first local maximum is especially important, since this signal change is considered most promising for interfaces that could become faster and/or less error-prone than dwell-time based selection [21].
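The alignment step can be sketched in Python as follows. The sketch is an illustration under assumptions, not the original analysis code: the function names and the definition of a local maximum (a sample larger than its predecessor and not smaller than its successor) are chosen for this example.

```python
def first_local_maximum(times_ms, diameters_mm, start_ms=-300, end_ms=700):
    """Return the index of the first local maximum inside the window.

    Assumption: a local maximum is a sample that is larger than the
    previous sample and at least as large as the next one.
    Returns None if the window contains no such sample.
    """
    for i in range(1, len(times_ms) - 1):
        inside = start_ms <= times_ms[i] <= end_ms
        rising = diameters_mm[i - 1] < diameters_mm[i]
        not_rising_after = diameters_mm[i] >= diameters_mm[i + 1]
        if inside and rising and not_rising_after:
            return i
    return None


def align_to_first_maximum(times_ms, diameters_mm):
    """Shift the time axis so that the first local maximum lies at 0 ms.

    Returns (shifted_times, diameters), or None for trials without a
    local maximum in the window, which would be excluded from averaging.
    """
    i = first_local_maximum(times_ms, diameters_mm)
    if i is None:
        return None
    t0 = times_ms[i]
    return [t - t0 for t in times_ms], list(diameters_mm)
```

Averaging trials after this shift prevents peaks that occur at slightly different latencies from cancelling each other out, which is why the aligned averages show larger dilations than the fixation-locked averages.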

III. RESULTS
Fig. 1 illustrates the newly baseline-corrected data of [21] (Fig. 1a) and [22] (Fig. 1b), adjusted to similar scaling, as average dynamics when fixating a later selected shape or key. The dynamics reveal a comparable effect size, although the pupil dilates descriptively more for the data of [21] (0.06 mm) than for the data of [22] (0.03 mm). At the first local maximum, both dilations differ significantly from the baseline, as shown by the functional confidence intervals. The difference between the data sets is especially noteworthy at the second local peak; it can, however, be explained by differences in the task and might be an artifact of the emotional valuation of successful selections, which were non-correctable in [21] but correctable in [22]. Moreover, task and apparatus have led to a descriptively higher variance between subjects in [21] compared to [22], as visible from the size of the functional confidence intervals. The timing is highly comparable between both investigations, revealing a consistent sequence of dilation, constriction and subsequent dilation. The first maximum is found at about 400 ms post fixation; a minimum is found at 750 ms [21] and at 1000 ms [22], respectively. The second local maximum is found at 1600 ms [21] and 1800 ms [22]. Thus, the data of [22] show a slightly delayed response in comparison to [21].
Fig. 2 depicts the dynamics when aligned to the first maximum between 300 ms prior to the first fixation of a later selected key and 700 ms post fixation. On average, the pupil dilates by 0.12 mm [21] (Fig. 2a) and by a slightly larger 0.14 mm [22] (Fig. 2b). The larger second local maximum in Fig. 1a leads to a continuously dilated pupil in Fig. 2a, which is not observed in Fig. 2b. Variance between average signals is slightly higher for the data of [21] than for [22]. The sequence is comparable between both dynamics, except for the prolonged dilation in Fig. 2a compared to a pupil at baseline level in Fig. 2b. The slope to the maximum is of comparable size for both signals.

IV. DISCUSSION
In the field of pupil-based target selection, three approaches can be distinguished. Stimulus-driven pupil-based selection requires rather long viewing durations for a stimulus until a certain frequency of brightness changes can be recognized in the respective pupil dilations. Although this method works effectively, its application is strongly restricted by the necessary changes in brightness. Active control of pupil dilation works without the necessity of changing the stimulus material. Nevertheless, although it is also effective, actively controlling pupil size is demanding and time-consuming. Hence, for both approaches, the potential for application seems to be clearly limited.
Automatic pupil size changes occur as a by-product of cognitive processing. Although the respective dilations might be smaller than those of the aforementioned approaches, their latencies are much shorter. This suggests that such pupil movements can in fact be employed to assist and improve target selection. The only two user studies reported to use automatic pupil size changes in real time seem to confirm this suggestion. However, the two studies differ in many aspects. Therefore, it was unclear whether both data sets rely on comparable pupil responses.
The aim of the present data analysis was to evaluate commonalities and differences of pupil size changes in two data sets. Despite the huge differences in data collection in terms of task, users, eye trackers, and brightness, the data show comparable pupil dynamics: at about 300 ms after fixating the later selected target, a first maximum of pupil size can be observed. As further analysis shows, the maximal dilation relative to fixation onset is about 0.15 mm in size. The comparably large pupil dilations reported in [21], and now reported for the first time for the data obtained in [22], argue for the potential of pupil size changes as a variable that substantially contributes to an adaptive dwell-time. With the current analysis, the findings of [22] regarding the effect size of the dilation are replicated for the selection from the set of nine objects [21]. The dilation appears to be slightly larger during the search and select task described in [21] than during the operation of the onscreen keyboard described in [22]. It would be interesting to see whether such differences can be confirmed in further studies. Since automatic changes in pupil diameter show a disappointing classification performance for selections in an offline machine learning investigation [20], it has to be investigated under which circumstances these diameter changes occur. Future investigations of diameter changes accompanying selection should address uncertainty, since high uncertainty can lead to dilations that systems could mistake for a decision for "yes" [24]. Moreover, further psychophysiological variables readable from the eye might be investigated that could increase intention prediction performance.
The reported size of dilations should be robust to signal noise. The dilations were evident with two different eye trackers, so we expect them to be detectable with other eye trackers as well. The time window of about 300 ms means that such pupil size changes should be detectable even with low-cost trackers working at low sampling rates.
Analyzing data sets in comparison with each other seems to be a promising tool for detecting commonalities in signal dynamics, which can then be used for modelling. For the pupil dilations at issue, the current findings suggest that the respective pupil responses can hardly be used in a single-channel application, since the effect sizes might still be too small to be reliably detected in about 95 % of all trials. Nevertheless, the current values suggest that pupil dilations may be successfully used to assist other modalities in target selection.
Taken together, pupil diameter provides a large amount of implicit data. As an assistive input mechanism in dwell-time based concepts, it may constitute a promising supplement to make eyes-only interaction faster and more reliable. However, the true impact of pupil-driven information in HCI has yet to be determined. In particular, this involves a deeper understanding of the relationship between diameter changes and the underlying cognitive and/or affective processes. Further research on combining pupil size changes with other modalities may provide a promising way of incorporating implicit information channels like pupil size changes into the explicit control of interfaces.

Fig. 1. Signal dynamics aligned at the fixation of a later selected object, in data set I based on [21] (n = 1671; left, 1a) and in data set II based on [22] (n = 1814; right, 1b). Pupil dynamics are normalized to a foregoing baseline mean; shaded bars mark functional 95 % confidence intervals.

Fig. 2. Signal dynamics aligned at the local maximum occurring between 300 ms prior to fixation and 700 ms after fixation for data set I [21] (n = 847; left, 2a) and for data set II [22] (n = 1301; right, 2b). Pupil dynamics are corrected for the first value as a baseline to increase comparability; shaded bars mark functional 95 % confidence intervals.