Abstract No.: FP2-2-2
Title: Interrater Reliability Among the Human Observers and Machine in Videofluoroscopic Examination
Affiliation: Dankook University Hospital, Department of Rehabilitation Medicine1; Dankook University, Department of Nanobiomedical Science & BK21 PLUS NBM Research Center for Regenerative Medicine2; Dankook University, Department of Software Engineering3
Authors: Yuna Kim1*, Joo Young Ko1, Hyun Il Kim1, Geun Seok Park1, Tae Uk Kim1, Seo Young Kim1, Jung Keun Hyun1,2, Seong Jae Lee1†
Objective
Interpretation of the videofluoroscopic swallowing study (VFSS) depends on subjective visual judgement and is a time-consuming task that requires constant concentration. Previous studies have demonstrated that the interrater reliability of videofluoroscopy is only poor to fair. The authors previously developed a deep learning model that detects airway invasion in VFSS images in an automated manner with significant accuracy. The aim of this study was to assess the interrater reliability among human observers and the deep learning model in the detection of airway invasion from VFSS images, and to evaluate the clinical usefulness of the deep learning model.

Methods
One hundred seventy-seven VFSS video files, each containing a single swallowing event, were collected. Every effort was made to distribute age, gender, diet viscosity, and degree of airway invasion (including laryngeal penetration and aspiration) evenly when selecting the files. The presence or absence of airway invasion in each file was judged by three physiatrists and by the deep learning model. Physiatrist 1 had more than 20 years of experience in VFSS analysis, Physiatrist 2 had 10 years, and Physiatrist 3 was a novice. They inspected the files blindly in separate locations, and discussion was not allowed. The judgements of the three physiatrists and the computer were collected. The interrater reliability between each pair of physiatrists, and between the deep learning model and each physiatrist, was analyzed using Cohen's kappa statistics. The interrater reliability among the three physiatrists, and among all four raters including the deep learning model, was analyzed using Fleiss' kappa statistics.
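
As an illustration of this statistical workflow, the sketch below computes pairwise Cohen's kappa and group Fleiss' kappa in Python using scikit-learn and statsmodels. This is not the authors' code: the ratings array holds synthetic placeholder judgements and the rater labels are assumed for readability; only the shape (177 swallows x 4 raters, binary labels) follows the abstract.

    from itertools import combinations

    import numpy as np
    from sklearn.metrics import cohen_kappa_score
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    rng = np.random.default_rng(0)
    raters = ["Physiatrist 1", "Physiatrist 2", "Physiatrist 3", "Model"]
    # Placeholder judgements: 177 swallows x 4 raters,
    # 0 = no airway invasion, 1 = airway invasion.
    ratings = rng.integers(0, 2, size=(177, 4))

    # Pairwise agreement: Cohen's kappa for every pair of raters.
    for (i, a), (j, b) in combinations(enumerate(raters), 2):
        kappa = cohen_kappa_score(ratings[:, i], ratings[:, j])
        print(f"{a} vs {b}: kappa = {kappa:.3f}")

    # Group agreement: Fleiss' kappa among the three physiatrists,
    # then among all four raters including the deep learning model.
    for cols, label in [([0, 1, 2], "three physiatrists"),
                        ([0, 1, 2, 3], "all four raters")]:
        table, _ = aggregate_raters(ratings[:, cols])
        print(f"Fleiss' kappa, {label}: {fleiss_kappa(table):.3f}")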

Results
The interrater reliabilities in all cases showed substantial agreement. The kappa coefficients were as follows: between each pair of physiatrists, κ = 0.649, 0.679, and 0.751; between the deep learning model and each physiatrist, κ = 0.649, 0.679, and 0.751; among the three physiatrists, κ = 0.688; and among all four raters including the deep learning model, κ = 0.649 (Table 1).
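
For reference, Cohen's kappa corrects the observed proportion of agreement $p_o$ for the proportion of agreement expected by chance $p_e$:

    \kappa = \frac{p_o - p_e}{1 - p_e}

On the widely used Landis and Koch benchmarks, coefficients between 0.61 and 0.80 indicate substantial agreement; every coefficient reported above (0.649 to 0.751) falls within this band.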

Conclusion
These results show that the interrater reliability between the deep learning model and the physiatrists was not inferior to that among the physiatrists themselves. Clinical use of the deep learning model in VFSS analysis is promising, although further research is needed to improve its accuracy and generalizability.
File 1: Table1.jpg