The Stack Archive

Controlling an iPhone with your face

Tue 4 Oct 2016

Since iOS 7 the iPhone has been capable of limited face-gesture control via its unintuitively named Switch Control feature, which accepts a small set of head-movement gestures as triggers for events such as returning to the phone’s home screen.

The iPhone uses Apple’s FaceTime camera to register the movement, and the feature is buried in a group of other possible types of trigger, in a feature-set primarily aimed at disabled or physically impaired users.

Third-party vendors have taken some interest in the possibility of face-based user controls, coming up with novel products such as FaceBrowse, a full-featured, Safari-based, hands-free mobile web browser. In the less-restricted fixed-device space, a researcher from San Diego turned his face into an effective video remote control, and a Windows face-control program emerged a year later.

But researchers from the University of Toronto have a wider vision for face-controlled smartphones, and have conducted research to try out three new approaches to face-engaged interaction.

‘[T]here are many situations where touch is limited. For instance, when outside during a cold winter, due to limitations of capacitive sensors, users have to take off their gloves to touch the screen, e.g., in order to change the playback of songs; when one hand is otherwise encumbered, users have trouble performing zoom (pinch) actions on their phones, e.g., in navigating maps. Under these circumstances, users could benefit from mechanisms that augment the touch input, although they may not use the augmented gestures all the time’

The techniques tried out ranged from those which add facial monitoring triggers to (traditional hands-based) direct interaction, to others which almost or completely replace direct interaction.

‘3D map viewer: (a) normal 2D viewing, (b) rotating the phone to enter the 3D view mode, and (c)(d) moving the head to glimpse left or right side of the 3D buildings.’

The software the team used was built over the native face-detection APIs in iOS, and the researchers created a face-input processor to handle large-angle face rotations. In order to keep energy and CPU consumption to a minimum, the recogniser was given a frame rate of 16fps and video dimensions of 480×640 pixels.

The 3D Map Viewer segment of the experiment lets the user activate 3D maps view by tilting the phone, and then maps the viewport to the user’s head inclinations, enabling the user to have a more natural interaction with the virtual environment. The researchers suggest that the facility has potential in first-person perspective gaming, virtual tourism and previewing the streets ahead at intersections.
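As a rough illustration of that head-to-viewport mapping, the Python sketch below turns a head yaw angle into a camera heading. It is not the researchers' code: the function name `viewport_heading`, the gain of 1.5 and the 30-degree clamp are all assumptions chosen for the example.

```python
def viewport_heading(base_heading_deg, head_yaw_deg,
                     gain=1.5, max_offset_deg=30.0):
    """Return the camera heading after applying a head-driven offset.

    base_heading_deg: heading of the 3D map camera with the head centred.
    head_yaw_deg:     head rotation relative to the screen (0 = facing it).
    gain, max_offset_deg: illustrative tuning values, not from the study.
    """
    # Amplify the head movement, then clamp so that a large head turn
    # cannot swing the view past the chosen limit.
    offset = gain * head_yaw_deg
    offset = max(-max_offset_deg, min(max_offset_deg, offset))
    # Wrap the result into the 0-360 degree range used for headings.
    return (base_heading_deg + offset) % 360.0


if __name__ == "__main__":
    for yaw in (-40, -10, 0, 10, 40):
        print(yaw, viewport_heading(90.0, yaw))
```

Clamping the offset is one plausible way to keep the view stable when the tracker briefly reports an extreme angle; a real implementation would probably also smooth the signal over several frames.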

The innovations were tested on six males and three females aged 23-28, and included ‘multi-scale scrolling’, in which content scrolls more slowly as the viewer leans in towards the device, and ‘one hand navigator’, which interactively uses the distance between face and screen to control both the zoom level and orientation of the content. One participant commented on the latter feature: “[one hand navigator] is easier to mix zooming and rotation. [...] I want to have it for my Google Map”. Several of the test subjects said that they would like to be able to use these distance-based features on a daily basis.
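The article does not say how scroll speed is derived from face distance, so the following Python sketch simply assumes a linear ramp between a "near" and a "far" distance. The names `scroll_gain` and `scaled_scroll`, and every numeric default, are hypothetical.

```python
def scroll_gain(face_distance_cm, near_cm=20.0, far_cm=45.0,
                min_gain=0.2, max_gain=1.0):
    """Return a scroll-speed multiplier that shrinks as the face gets closer.

    All thresholds are illustrative assumptions, not values from the study.
    """
    if far_cm <= near_cm:
        raise ValueError("far_cm must be greater than near_cm")
    # Normalise the distance to 0..1 (0 = leaning in, 1 = arm's length).
    t = (face_distance_cm - near_cm) / (far_cm - near_cm)
    t = min(1.0, max(0.0, t))
    # Linear interpolation between the slowest and fastest gains.
    return min_gain + t * (max_gain - min_gain)


def scaled_scroll(finger_delta_px, face_distance_cm):
    """Scale a finger drag (in pixels) by the distance-dependent gain."""
    return finger_delta_px * scroll_gain(face_distance_cm)


if __name__ == "__main__":
    for distance in (15, 20, 32.5, 45, 60):
        print(distance, scaled_scroll(100, distance))
```

Under these assumptions the same 100-pixel drag moves the content about five times further at arm's length than it does when the user is leaning in to read.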

‘Coarse-to-fine text edit: (a)(b) first touching the screen to set a rough cursor position, (c)(d) then using head gestures to move cursor in a finer level.’

Some of the techniques devised need only minimal touch interaction. In the Coarse-To-Fine Text Edit module, the researchers address the clumsiness of some existing approaches to editing text on small devices:

‘Certain text editing tasks can be difficult to perform on smartphone touch screens due to the limited screen space and imprecision of finger input. For example, in cursor positioning, most commercial devices apply the “finger hold” gesture to trigger a virtual magnification lens with fixed offset to the touch point, allowing the user to see beneath their finger (similar to [30]). While this is functionally complete, it reduces context of the text surrounding the cursor. Also, it can be difficult for making cursor adjustments near the edges of the screen, because the finger likely slips off the screen thus cannot be sensed.’

Coarse-to-Fine text editing requires just one initial tap to set the cursor position, after which head movements adjust it. The principal advantage is that the user no longer needs to zoom in and lose context, and none of the interface is hidden by the user’s hand.
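The two-stage idea can be sketched in a few lines of Python. This is an assumed model rather than the researchers' implementation: the function `fine_cursor`, the dead zone and the degrees-per-character step are invented for illustration.

```python
def fine_cursor(tap_index, head_yaw_deg, text_length,
                deg_per_char=3.0, dead_zone_deg=2.0):
    """Combine a coarse tap position with a fine head-driven offset.

    tap_index:     character index chosen by the initial tap (coarse stage).
    head_yaw_deg:  head rotation since the tap; positive moves right.
    deg_per_char and dead_zone_deg are illustrative assumptions.
    """
    magnitude = abs(head_yaw_deg)
    if magnitude < dead_zone_deg:
        # Ignore small involuntary head movements.
        offset = 0
    else:
        steps = int((magnitude - dead_zone_deg) / deg_per_char)
        offset = steps if head_yaw_deg > 0 else -steps
    # Keep the cursor inside the text.
    return max(0, min(text_length, tap_index + offset))


if __name__ == "__main__":
    text = "Controlling an iPhone with your face"
    for yaw in (0.0, 8.0, -11.0):
        print(yaw, fine_cursor(10, yaw, len(text)))
```

The dead zone is a design choice made for this example: without one, the cursor would drift whenever the user's head was not perfectly still.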


‘Touch-free menu: using the relative orientation between the device and face angle to select menu items.’

The Touch-Free Menu module lets users choose options from a pie menu by tilting their heads and/or their phones. The relative orientation between face and device brings an option into focus, and the focused option is auto-engaged after a couple of seconds.
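A dwell-based selector of this kind might look like the Python sketch below. It is only a guess at the mechanism: the class `DwellMenu`, the 80-degree usable tilt range and the exact two-second dwell are assumptions, and timestamps are passed in by the caller so that the logic is easy to test.

```python
class DwellMenu:
    """Pick a menu item from a relative tilt angle, confirming by dwell time."""

    def __init__(self, items, dwell_s=2.0, span_deg=80.0):
        self.items = list(items)
        self.dwell_s = dwell_s      # assumed "couple of seconds"
        self.span_deg = span_deg    # assumed usable tilt range
        self._focused = None
        self._since = None

    def slice_for(self, relative_angle_deg):
        """Map an angle in [-span/2, +span/2] to an item index."""
        half = self.span_deg / 2.0
        angle = max(-half, min(half, relative_angle_deg))
        t = (angle + half) / self.span_deg
        return min(int(t * len(self.items)), len(self.items) - 1)

    def update(self, device_roll_deg, face_roll_deg, now_s):
        """Feed one tracker sample; return an item once the dwell completes."""
        idx = self.slice_for(device_roll_deg - face_roll_deg)
        if idx != self._focused:
            # Focus moved to a new slice, so restart the dwell timer.
            self._focused, self._since = idx, now_s
            return None
        if now_s - self._since >= self.dwell_s:
            self._since = now_s  # avoid re-firing on every later frame
            return self.items[idx]
        return None


if __name__ == "__main__":
    menu = DwellMenu(["play", "pause", "next", "previous"])
    for t, roll in [(0.0, -30), (1.0, -28), (2.0, -31), (2.5, 10), (4.6, 12)]:
        print(t, menu.update(roll, 0.0, t))
```

Using the difference between device roll and face roll means that tilting the phone and tilting the head are interchangeable, which matches the description of the module above.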

Among the more esoteric experiments in the research is Expressive Flicking, which uses motion sensing and touch together to fold multi-stage navigation gestures into single triggers, such as scrolling over varying distances (paragraphs, pages, sections, chapters, etc.). This was the only experiment to receive less than 9 on the 1-11 Likert scale from the participants, who felt that the same results could be achieved with different gestures.

University alumnus Christopher Wang is among the many who have investigated expression recognition, presumably a potential extension to the possibilities of face-based control systems. However, since the subjects in the study featured here expressed some concern about how they appeared to others whilst performing some of the more vigorous face-control tests, it is perhaps doubtful whether users will be willing to scrunch and gurn their way through their smartphones in a public context.

