This novel approach to automatically detecting eye contact during face-to-face interactions employs a deep neural network model that is as accurate as expert human raters. The approach captures eye contact via a wearable point-of-view (POV) camera and then uses a deep learning model to analyze the recorded video, eliminating the laborious and often subjective hand-coding process. The innovation will be an instrumental tool for clinicians and researchers as they analyze gaze behavior in social settings and medical screenings.
The innovation uses a residual neural network (ResNet) deep learning architecture. An observer wears a low-cost pair of glasses with a POV camera—serving as a video recorder—embedded in the bridge. When the subject makes eye contact with the observer, the subject is looking directly into the camera. Computer software processes the video captured by the camera, solving a binary classification problem: for each frame, it determines whether the subject is making eye contact with the observer.
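The per-frame pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the trained ResNet is replaced by a hypothetical scoring function, and the names (`classify_frames`, `score_fn`, `threshold`) are placeholders chosen for clarity.

```python
# Sketch of per-frame binary eye-contact classification.
# A trained ResNet would map each decoded video frame to a probability
# of eye contact; here a stand-in score function plays that role.

def classify_frames(frames, score_fn, threshold=0.5):
    """Return a binary label per frame (1 = eye contact, 0 = no eye contact)."""
    return [1 if score_fn(frame) >= threshold else 0 for frame in frames]

# Stand-in probabilities a trained model might emit for five consecutive frames.
scores = [0.1, 0.8, 0.9, 0.4, 0.7]
labels = classify_frames(scores, score_fn=lambda s: s)
print(labels)  # → [0, 1, 1, 0, 1]
```

Thresholding each frame independently is the simplest formulation of the binary classification problem; in practice the threshold could be tuned to trade precision against recall.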
To demonstrate the approach, Georgia Tech researchers trained a deep convolutional neural network using a dataset of more than 4 million annotated images of 103 subjects with diverse demographic backgrounds. The network achieved overall precision of 0.936 and recall of 0.943 on 18 set-aside validation subjects. This performance is on par with that of 10 trained human coders (mean precision of 0.918 and recall of 0.946), demonstrating that a deep learning model can produce automated coding with a reliability level comparable to human coders.
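Precision and recall here have their standard definitions over frame-level predictions: precision is the fraction of predicted eye-contact frames that truly are eye contact, and recall is the fraction of true eye-contact frames the model catches. The snippet below illustrates the arithmetic on hypothetical counts (chosen only to reproduce figures of the same magnitude as those reported; they are not the study's actual confusion matrix).

```python
# Precision = TP / (TP + FP); Recall = TP / (TP + FN),
# where TP/FP/FN are true positives, false positives, false negatives.

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical frame counts for illustration only.
p, r = precision_recall(tp=936, fp=64, fn=57)
print(round(p, 3), round(r, 3))  # → 0.936 0.943
```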
- High performance: Detects eye contact in POV camera video with reliability equivalent to expert human raters
- Efficient: Facilitates automatic detection of eye contact using computer vision methods
- Reliable: Demonstrates the feasibility of substituting automated analysis for human coding
- Inconspicuous camera: Enables normal face-to-face interactions without distractions, which is especially useful for patients with autism spectrum disorder (ASD)
This innovation supports numerous applications in clinical and social psychology research settings:
- Understanding social interactions
- Screening for numerous medical and/or psychiatric conditions (e.g., ASD, attention deficit/hyperactivity disorder [ADHD], fragile X syndrome, social anxiety/behavioral inhibition)
- Evaluating developmental milestones
- Interviewing job applicants
Gaze behavior is a key foundation of face-to-face social interaction. Eye contact serves multiple functions in social communication, including the establishment and recognition of relationships and the expression of interest and attentiveness. Atypical eye contact and abnormal gaze patterns are often indicators of numerous medical and/or psychiatric conditions. In particular, decreased eye contact is included in diagnostic criteria for ASD and is also a focus of early screening and treatment.
A variety of technologies automate the measurement of gaze behavior; eye tracking is the best-known example. Conventional monitor-based eye tracking is unsuitable for measuring real-world aspects of social gaze during face-to-face interactions. While wearable eye trackers can measure gaze behavior, they are expensive and burdensome to wearers, and they pose particular challenges for subjects with compliance, distraction, or fatigue issues. Moreover, because eye trackers only provide the point of gaze in a captured video recording, manual region-of-interest annotation must be performed on the video to identify gaze targets, thus limiting scalability.
Georgia Tech’s approach for automatically detecting eye contact demonstrates that it is feasible to use automated analysis for a range of applications.