Modeling and automatically predicting outcomes are fundamental steps in the assessment of psychomotor skills and are essential for providing timely feedback to trainees. Video-based assessment (VBA) of airway skills is challenging because it requires analyzing both spatial and temporal dynamics. Human gaze carries valuable information about visual attention, which can be used to extract more relevant features that better reflect these dynamics. We therefore combine gaze data with a spatiotemporal attention mechanism to evaluate surgical skills from full-length videos.
Our modeling approach consists of three steps. First, gaze data are used to create a visual mask with pronounced spatiotemporal correlations; the mask guides patch selection and avoids unnecessary computation by omitting irrelevant patches. To counteract gaze measurement inaccuracies, the visual mask is refined with an isotropic Gaussian convolution, so that regions around missing or inaccurate gaze points are still included and neighboring spatial locations remain visible. Second, a self-supervised network extracts feature maps from the videos, while an attention module generates attention maps from the visual mask. Third, a fusion module combines the feature maps with the attention maps to produce a classification score.
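As an illustration of the first step only, the following is a minimal sketch of how a gaze-derived visual mask might be smoothed with an isotropic Gaussian and then used to select patches. It is not the implementation used in this work: the function names (`gaze_mask`, `select_patches`), the patch size, the value of `sigma`, and the threshold are all hypothetical choices made for the example.

```python
import numpy as np

def _gaussian_kernel1d(sigma, truncate=4.0):
    # 1-D Gaussian kernel, normalized to sum to 1 (hypothetical helper).
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaze_mask(gaze_xy, frame_hw, sigma=4.0):
    """Build a smoothed visual mask for one frame from (x, y) gaze points.

    Assumption: gaze coordinates are in pixels, and the kernel
    (2 * 4 * sigma + 1 samples) is shorter than both frame dimensions.
    """
    h, w = frame_hw
    mask = np.zeros((h, w), dtype=np.float64)
    for x, y in gaze_xy:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < w and 0 <= yi < h:  # silently skip off-frame samples
            mask[yi, xi] += 1.0
    k = _gaussian_kernel1d(sigma)
    # Isotropic Gaussian = the same 1-D kernel applied along rows, then columns.
    mask = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, mask)
    mask = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, mask)
    peak = mask.max()
    return mask / peak if peak > 0 else mask

def select_patches(mask, patch=16, threshold=0.05):
    """Return a boolean grid marking patches whose mean mask value is high enough."""
    h, w = mask.shape
    gh, gw = h // patch, w // patch
    grid = mask[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch)
    return grid.mean(axis=(1, 3)) >= threshold

# Usage: one gaze point at the center of a 64x64 frame.
mask = gaze_mask([(32, 32)], (64, 64), sigma=4.0)
keep = select_patches(mask, patch=16, threshold=0.05)
```

In this sketch the smoothing spreads each gaze sample over its neighborhood, so patches adjacent to a slightly misplaced gaze point are still retained, while patches far from any gaze point are dropped before feature extraction.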
We evaluated our model on an endotracheal intubation task performed on an airway manikin in a laboratory environment. Eighteen subjects performed 10 repetitions per day over three days. The dataset consists of 454 videos, each with its corresponding gaze data, from the 18 subjects (129 unsuccessful and 325 successful trials). The model achieved 93% accuracy, 90% F1-score, 96% sensitivity, and 92% specificity. Further analysis confirmed the effectiveness of the visual mask and attention maps in enhancing the performance of the model.
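For readers less familiar with the reported metrics, the sketch below shows how accuracy, F1-score, sensitivity, and specificity are conventionally computed from binary predictions. The function name `binary_metrics` and the toy labels are hypothetical and are not data from this study; the choice of which class counts as positive is likewise an assumption of the example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Standard binary classification metrics.

    Assumption: both classes occur in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Toy labels for illustration only (not results from the study).
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 1]
metrics = binary_metrics(toy_true, toy_pred)
```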
Keywords
ASSESSMENT
Additional Keywords
video-based assessment, human gaze, attention mechanism, endotracheal intubation.