Summary: | The clinical course of multiple sclerosis (MS) is highly variable, and collecting research data on it is costly and time-consuming. Much is known about the genetic risk of developing MS, but little is understood about how genetics affects the clinical course. This work applies natural language processing techniques to electronic medical records (EMR) to identify MS patients and key clinical traits of disease course. The identification algorithm found 5,789 individuals with MS. Additional algorithms with high precision and specificity were developed to extract detailed features of the clinical course of MS, including clinical subtype, presence of oligoclonal bands, year of diagnosis, year and origin of first symptom, Expanded Disability Status Scale scores, timed 25-foot walk scores, and MS medications. DNA was available for 1,221 individuals through BioVU. These samples and 2,587 control samples were genotyped on the ImmunoChip. After extensive sample and SNP quality control, replication of known MS risk loci confirmed that the genetic architecture of this EMR-derived population is similar to that of other published MS datasets. Genetic analyses were performed on seven clinical traits extracted from the medical records: age at diagnosis, age at and CNS origin of first neurological symptom, presence of oligoclonal bands, Multiple Sclerosis Severity Score, timed 25-foot walk, and time to secondary progressive MS. No single result stood out, but many interesting results warrant further investigation. This work shows the potential of EMR-derived data for research studies of disease course.
|