
Some fun videos from the eye tracking setup and video analysis I built
for a study with colleagues from Utrecht University back in 2014.

Details in Hessels, Cornelissen, Hooge, & Kemner (2017)
"Gaze Behavior to Faces During Dyadic Interaction"


TL;DR:
A lot of research into face perception and face scanning uses static images or videos that do not actually interact with you. So, we built a setup to track gaze during live interactions. The videos where people try to stare at one another without laughing show that it got pretty realistic (and they're a funny bonus).


[Video: two participants trying to stare at each other without laughing]

Back in 2014, Chantal Kemner's lab at Utrecht University wanted to know where people look while they interact with another person. There are many publications about where people look in faces, and many of them claim to investigate the "social" behavior of looking another person in the face. At the same time, almost all of them use images or videos of faces. "Social" implies some sort of interaction, but videos and images do not respond to anything you do. To address this, the lab needed a setup that allowed accurate gaze measurement during a live interaction between two people.

One requirement was that no eye tracking glasses could be used: they obscure parts of the face, which could affect where people look (and very young participants are likely to play with them). That meant a screen-based eye tracker, and therefore screens between the two people. Another requirement was that people needed to be able to make eye contact. If you have ever been on a video call, you may have noticed that the other person never seems to be looking at you, even when they look at your eyes on the screen; only if they look straight into the camera does it appear as eye contact on the other side. To solve this we used the principle of a teleprompter, where the camera sits behind a mirror that reflects the display (this is also how a news anchor can read their lines while looking right into the camera). By aligning the cameras with the eyes of both participants, looking at each other's eyes gives the impression of eye contact.

This might all sound pretty artificial, but the experience turned out to be very life-like. Staring one another in the face without responding was just as awkward as it is in real life. In the video above you can see how that usually went.
Another challenge (hard to imagine these days) was tracking the face of the other person. Neural networks capable of robust face tracking were not widely available at the time. Videos were usually "coded" by research assistants, by hand: for each video frame an assistant would click the nose, the eyes, and the mouth of the person in the frame. That's 30 frames per second × 4 clicks per frame = 120 clicks per second of video!

Since nobody on the team wanted the job, and frankly it seemed like something a computer should do, we used a method where an operator only had to click the facial landmarks once, along with selecting a region of the face. Our code would find trackable points in the selected area (any points, not necessarily facial landmarks) and use the deformation of these points to deform the landmarks the operator had selected. This way many frames could be annotated unsupervised, until the participant inevitably touched their face and covered too many tracking points, at which point an alert asked a human to come take a look. If you look closely at the video below you will see little dots on the eyes, nose, and mouth. The lines around these landmarks are the edges of Voronoi cells, which are a neat way Roy made the definition of so-called regions of interest more objective (but that's another story and another publication).
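
For the curious, here is a rough sketch (in Python, with OpenCV) of how that kind of semi-automatic landmark propagation can work. To be clear, this is not the code from the study: the video file name, the landmark coordinates, the selected region, and the thresholds below are made-up placeholders, and our original implementation differed in its details. The idea is the same, though: find trackable points in the region the operator selected, follow them from frame to frame with optical flow, and reuse their overall deformation to move the clicked landmarks along.

```python
import cv2
import numpy as np

# Placeholder inputs standing in for the operator's one-time clicks on the
# first frame (these coordinates and the file name are made up for the sketch).
landmarks = np.array([[210., 180.],   # left eye
                      [290., 180.],   # right eye
                      [250., 230.],   # nose
                      [250., 290.]],  # mouth
                     dtype=np.float32)
face_box = (150, 120, 200, 220)        # operator-selected region: x, y, w, h

cap = cv2.VideoCapture("dyad_recording.mp4")   # placeholder file name
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Find any trackable points inside the selected face region --
# they do not have to be facial landmarks themselves.
mask = np.zeros_like(prev_gray)
x, y, w, h = face_box
mask[y:y + h, x:x + w] = 255
prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                   qualityLevel=0.01, minDistance=5,
                                   mask=mask)

MIN_POINTS = 30   # below this, stop and ask a human to take a look

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Track the support points from the previous frame to this one.
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                   prev_pts, None)
    good_prev = prev_pts[status.flatten() == 1]
    good_next = next_pts[status.flatten() == 1]

    if len(good_next) < MIN_POINTS:
        print(f"frame {frame_idx}: too few points left, needs a manual check")
        break

    # Estimate how the cloud of support points deformed, then apply the
    # same transform to the operator-clicked landmarks.
    M, _ = cv2.estimateAffinePartial2D(good_prev, good_next)
    if M is None:
        print(f"frame {frame_idx}: could not estimate motion, needs a check")
        break
    landmarks = cv2.transform(landmarks.reshape(-1, 1, 2), M).reshape(-1, 2)

    prev_gray, prev_pts = gray, good_next.reshape(-1, 1, 2)
    frame_idx += 1

cap.release()
print("last tracked landmark positions:\n", landmarks)
```

From the propagated landmark positions, Voronoi cells like the ones visible in the video can be computed as well (scipy.spatial.Voronoi is one way to do it), so that regions of interest do not depend on anyone hand-drawing outlines; the details of how we actually defined those regions are in the other publication mentioned above.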
