Abstract No.: P 1-19
Title: EHR-Based Prediction of Future Incidence of Alzheimer’s Disease Using Machine Learning
Affiliations: Yonsei University College of Medicine, Department of Rehabilitation Medicine, Gangnam Severance Hospital and Rehabilitation Institute of Neuromuscular Disease (1); Brookhaven National Laboratory, Computational Science Initiative (2); National Health Insurance Service Ilsan Hospital, Department of Neurology (3); Columbia University, Department of Psychiatry (4); National Health Insurance Service Ilsan Hospital, Research and Analysis Team (5); National Health Insurance Service Ilsan Hospital, Department of Physical Medicine and Rehabilitation (6); Columbia University, Data Science Institute (7)
Authors: Han Eol Cho (1)*, Ji Hwan Park (2), Jong Hun Kim (3), Melanie Wall (4), Yaakov Stern (4), Hyunsun Lim (5), Shinjae Yoo (2), Justin Byun (1), Gun Jae Lee (6), Jiook Cha (4,7)†, Hyoung-Seop Kim (6)†
Background: Accurate prediction of future incidence of Alzheimer’s disease (AD) may facilitate intervention strategies to delay disease onset. Existing AD risk prediction models require collection of biospecimens (genetic, CSF, or blood samples), cognitive testing, or brain imaging. In contrast, electronic health records (EHR) provide an opportunity to build a completely automated risk prediction model based on individuals’ history of health and healthcare use. We tested machine learning models to predict future incidence of AD using administrative EHR in individuals aged 65 or older.

Methods: We obtained de-identified EHR from Korean adults older than 65 years (N=40,736), collected between 2002 and 2010 in the Korean National Health Insurance Service database system. Consisting of the Participant Insurance Eligibility database, the Healthcare Utilization database, and the Health Screening database, our EHR contain 4,894 unique clinical features, including ICD-10 codes, medication codes, laboratory values, history of personal and family illness, and socio-demographics. Our event of interest was new incidence of AD, defined from the EHR using AD diagnostic codes and prescriptions of anti-dementia medication. Two definitions were considered: a more stringent one requiring both a diagnosis and dementia medication, resulting in n=614 cases (“definite AD”), and a more liberal one requiring only diagnostic codes (n=2,026; “probable AD”). We trained and validated a random forest, a support vector machine, and a logistic regression model to predict incident AD 1, 2, 3, and 4 years ahead, using the EHR available since 2002. The length of the EHR used in the models ranged from 1,571 to 2,239 days. Model training, validation, and testing were done using nested, stratified 5-fold cross-validation, repeated 5 times.
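
The splitting scheme described in the Methods can be sketched as follows. This is a minimal illustration of repeated, nested, stratified 5-fold cross-validation using only the Python standard library; it is not the authors' pipeline. The function names (stratified_folds, nested_stratified_cv), the random seed, and the toy label vector (15 cases among 1,000 people, roughly the 1.5% rate implied by 614 of 40,736) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Split sample positions into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for pos, y in enumerate(labels):
        by_class[y].append(pos)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        # Deal each class out round-robin so every fold gets its share.
        for n, pos in enumerate(members):
            folds[n % k].append(pos)
    return folds


def nested_stratified_cv(labels, k=5, repeats=5, seed=0):
    """Yield (repeat, outer, inner, train_idx, val_idx, test_idx) tuples.

    The outer test fold is held out for reporting; the inner folds split the
    remaining data into training and validation sets (e.g. for tuning).
    """
    rng = random.Random(seed)
    for r in range(repeats):
        outer = stratified_folds(labels, k, rng)
        for i, test_idx in enumerate(outer):
            dev_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
            dev_labels = [labels[j] for j in dev_idx]
            inner = stratified_folds(dev_labels, k, rng)
            for m, val_pos in enumerate(inner):
                val_idx = [dev_idx[p] for p in val_pos]
                val_set = set(val_idx)
                train_idx = [j for j in dev_idx if j not in val_set]
                yield r, i, m, train_idx, val_idx, test_idx


# Hypothetical toy outcome vector: 1 = incident AD, 0 = control.
labels = [1] * 15 + [0] * 985
splits = list(nested_stratified_cv(labels))
print(len(splits), "train/validation/test splits")
```

In a full analysis, each model would be fitted on train_idx, tuned on val_idx, and scored once on test_idx, so that the reported performance never uses data seen during tuning.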

Results: The average duration of EHR was 1,936 days in AD cases and 2,694 days in controls. For predicting future incidence of AD using the “definite AD” outcome, the best-performing machine learning models reached an AUC of 0.781 for 1-year prediction, 0.739 for 2-year, 0.686 for 3-year, and 0.662 for 4-year prediction. Using the “probable AD” outcome, the best-performing models reached an AUC of 0.730 for 1-year prediction, 0.645 for 2-year, 0.575 for 3-year, and 0.602 for 4-year prediction. Important clinical features selected in logistic regression included hemoglobin level (b=-0.902), age (b=0.689), urine protein level (b=0.303), prescription of Lodopin (antipsychotic drug) (b=0.303), and prescription of Nicametate Citrate (vasodilator) (b=-0.297).
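
For readers less familiar with the AUC values reported above, the following sketch shows one standard way to compute the area under the ROC curve: as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). The function name roc_auc and the four-person toy data are assumptions for illustration; the abstract does not state which software the authors used.

```python
def roc_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formula, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank within the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one control")
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy data: two controls (0) and two cases (1) with model scores.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)  # 0.75: three of the four case-control pairs are ranked correctly
```

Read this way, an AUC of 0.781 means that a case was ranked above a control in about 78% of case-control pairs, while 0.5 corresponds to chance-level ranking.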

Conclusion: This study demonstrates that administrative EHR can be used to detect risk for incident AD. This approach could enable risk-based stratification of older adults for better-targeted clinical trials.
File.1: 1.jpg
Figure 1. CONSORT diagram.
File.2: 2.jpg
Figure 2. Receiver operating characteristic (ROC) plots for 0-, 1-, 2-, 3-, and 4-year prediction. Incident AD was defined based on ICD-10 AD codes and anti-dementia medication for AD (“definite AD”) or based on AD codes only (“probable AD”).
File.3: table1.JPG
Table 1. Top ten features and weights from logistic regression (0-year prediction).