GazeGNN: A Gaze-Guided Graph Neural Network for Chest X-ray Classification

Bin Wang1, Hongyi Pan1, Armstrong Aboah2, Zheyuan Zhang1, Elif Keles3, Drew Torigian3, Baris Turkbey4, Elizabeth Krupinski2, Jayaram Udupa3, Ulas Bagci1
1Northwestern University   2University of Arizona   3University of Pennsylvania   4National Institutes of Health
WACV 2024

GazeGNN integrates raw eye-gaze data with image features through a unified representation graph, enabling real-time chest X-ray disease classification without costly visual attention map preprocessing.

Abstract

Eye tracking research is important in computer vision because it can help us understand how humans interact with the visual world. Specifically for high-risk applications, such as in medical imaging, eye tracking can help us to comprehend how radiologists and other medical professionals search, analyze, and interpret images for diagnostic and clinical purposes. Hence, the application of eye-tracking techniques in disease classification has become increasingly popular in recent years. Contemporary works usually transform gaze information collected by eye-tracking devices into visual attention maps (VAMs) to supervise the learning process. However, this is a time-consuming preprocessing step, which prevents us from applying eye tracking to radiologists’ daily work. To solve this problem, we propose a novel gaze-guided graph neural network (GNN), GazeGNN, to leverage raw eye-gaze data without converting it into VAMs. In GazeGNN, to directly integrate eye gaze into image classification, we create a unified representation graph that models both images and gaze pattern information. With this benefit, we develop a real-time, real-world, end-to-end disease classification algorithm for the first time in the literature. This achievement demonstrates the practicality and feasibility of integrating real-time eye-tracking techniques into the daily work of radiologists. To the best of our knowledge, GazeGNN is the first work that adopts a GNN to integrate image and eye-gaze data. Our experiments on the public chest X-ray dataset show that our proposed method exhibits the best classification performance compared to existing methods.

Motivation

Transforming gaze data into dense visual attention maps (VAMs) incurs heavy preprocessing and discards temporal cues. By representing sparse fixations as nodes in a graph and reasoning jointly over image and gaze nodes, GazeGNN:

  • Eliminates VAM generation, cutting inference latency from 9.2 s to 0.35 s.
  • Retains rich temporal‑spatial gaze patterns that correlate with expert diagnostic reasoning.
  • Enables end‑to‑end training on limited annotated scans via weight sharing across graph nodes.

Method

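As described above, GazeGNN represents a chest X-ray and the radiologist's raw gaze as a single graph: image patches and individual fixations become nodes, edges connect spatially related nodes, and a GNN whose weights are shared across all nodes predicts the disease label end to end, with no attention-map rendering in between. The snippet below is a minimal PyTorch sketch of that idea, not the authors' released implementation; the 32-pixel patches, kNN edges, layer widths, and the (x, y, duration) fixation features are illustrative assumptions.

  # Minimal sketch: image patches + raw gaze fixations as one graph, classified by a small GNN.
  # Illustration only; patch size, feature choices, and the kNN edge rule are assumptions.
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  def build_graph(image, fixations, patch=32, k=4):
      """image: (3, H, W) tensor; fixations: (F, 3) rows of (x, y, duration)."""
      C, H, W = image.shape
      # Image nodes: non-overlapping patches, flattened to raw pixel vectors.
      patches = image.unfold(1, patch, patch).unfold(2, patch, patch)
      patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, C * patch * patch)
      gh, gw = H // patch, W // patch
      ys, xs = torch.meshgrid(torch.arange(gh), torch.arange(gw), indexing="ij")
      patch_pos = torch.stack([(xs + 0.5) * patch, (ys + 0.5) * patch], dim=-1).reshape(-1, 2).float()
      # Gaze nodes: raw fixation coordinates and duration (no attention-map rendering).
      gaze_pos = fixations[:, :2].float()
      pos = torch.cat([patch_pos, gaze_pos], dim=0)
      # Edges: connect every node to its k nearest neighbours in image space.
      dist = torch.cdist(pos, pos)
      knn = dist.topk(k + 1, largest=False).indices[:, 1:]   # drop self-match
      n = pos.shape[0]
      adj = torch.zeros(n, n)
      adj.scatter_(1, knn, 1.0)
      adj = ((adj + adj.t()) > 0).float() + torch.eye(n)     # symmetric, with self-loops
      return patches, fixations.float(), adj

  class GazeGraphNet(nn.Module):
      """Two GCN-style message-passing layers shared by image and gaze nodes."""
      def __init__(self, patch_dim, gaze_dim=3, hidden=128, num_classes=3):
          super().__init__()
          self.embed_patch = nn.Linear(patch_dim, hidden)
          self.embed_gaze = nn.Linear(gaze_dim, hidden)
          self.w1 = nn.Linear(hidden, hidden)   # weights shared across all graph nodes
          self.w2 = nn.Linear(hidden, hidden)
          self.head = nn.Linear(hidden, num_classes)

      def forward(self, patches, fixations, adj):
          h = torch.cat([self.embed_patch(patches), self.embed_gaze(fixations)], dim=0)
          a_norm = adj / adj.sum(1, keepdim=True)   # row-normalised adjacency
          h = F.relu(self.w1(a_norm @ h))           # message passing, layer 1
          h = F.relu(self.w2(a_norm @ h))           # message passing, layer 2
          return self.head(h.mean(dim=0))           # mean-pool all nodes -> class logits

  # Example: one 224x224 image with five recorded fixations (x, y, duration).
  img = torch.rand(3, 224, 224)
  fix = torch.tensor([[60., 80., 0.3], [100., 90., 0.2], [150., 120., 0.5],
                      [170., 60., 0.1], [30., 200., 0.4]])
  patches, gaze, adj = build_graph(img, fix)
  logits = GazeGraphNet(patch_dim=patches.shape[1])(patches, gaze, adj)

Because the same message-passing weights are applied to every patch and fixation node, the model stays compact and can be trained end to end on limited annotated scans, which is the weight-sharing property noted in the motivation above.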
Results

GazeGNN surpasses prior gaze‑aware and gaze‑free methods on MIMIC‑CXR three‑class disease classification while maintaining real‑time speed.


BibTeX

@inproceedings{wang2024gazegnn,
  title={Gazegnn: A gaze-guided graph neural network for chest x-ray classification},
  author={Wang, Bin and Pan, Hongyi and Aboah, Armstrong and Zhang, Zheyuan and Keles, Elif and Torigian, Drew and Turkbey, Baris and Krupinski, Elizabeth and Udupa, Jayaram and Bagci, Ulas},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2194--2203},
  year={2024}
}