Semantic Linkages of Obsessions From an International Obsessive-Compulsive Disorder Mobile App Data Set: Big Data Analytics Study

Background: Obsessive-compulsive disorder (OCD) is characterized by recurrent intrusive thoughts, urges, or images (obsessions) and repetitive physical or mental behaviors (compulsions). Previous factor analytic and clustering studies suggest the presence of three or four subty...

Bibliographic Details
Main Authors: Jamie D Feusner, Reza Mohideen, Stephen Smith, Ilyas Patanam, Anil Vaitla, Christopher Lam, Michelle Massi, Alex Leow
Format: Article
Language: English
Published: JMIR Publications 2021-06-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2021/6/e25482
id doaj-a20a0b77fd354f509ba39a9d6429a8b7
record_format Article
spelling doaj-a20a0b77fd354f509ba39a9d6429a8b7
2021-06-21T12:47:13Z
eng
JMIR Publications
Journal of Medical Internet Research
ISSN 1438-8871
2021-06-01
Volume 23, Issue 6, e25482
DOI 10.2196/25482
Semantic Linkages of Obsessions From an International Obsessive-Compulsive Disorder Mobile App Data Set: Big Data Analytics Study
Jamie D Feusner https://orcid.org/0000-0002-0391-345X
Reza Mohideen https://orcid.org/0000-0002-1242-3081
Stephen Smith https://orcid.org/0000-0003-1098-363X
Ilyas Patanam https://orcid.org/0000-0002-1244-5360
Anil Vaitla https://orcid.org/0000-0003-2485-8063
Christopher Lam https://orcid.org/0000-0001-5306-8429
Michelle Massi https://orcid.org/0000-0002-4206-9386
Alex Leow https://orcid.org/0000-0002-5660-8651
https://www.jmir.org/2021/6/e25482
collection DOAJ
language English
format Article
sources DOAJ
author Jamie D Feusner
Reza Mohideen
Stephen Smith
Ilyas Patanam
Anil Vaitla
Christopher Lam
Michelle Massi
Alex Leow
author_sort Jamie D Feusner
title Semantic Linkages of Obsessions From an International Obsessive-Compulsive Disorder Mobile App Data Set: Big Data Analytics Study
publisher JMIR Publications
series Journal of Medical Internet Research
issn 1438-8871
publishDate 2021-06-01
description Background: Obsessive-compulsive disorder (OCD) is characterized by recurrent intrusive thoughts, urges, or images (obsessions) and repetitive physical or mental behaviors (compulsions). Previous factor analytic and clustering studies suggest the presence of three or four subtypes of OCD symptoms. However, these studies have relied on predefined symptom checklists, which are limited in breadth and may be biased toward researchers’ previous conceptualizations of OCD.
Objective: In this study, we examine a large data set of freely reported obsession symptoms obtained from an OCD mobile app as an alternative to uncovering potential OCD subtypes. From this, we examine data-driven clusters of obsessions based on their latent semantic relationships in the English language using word embeddings.
Methods: We extracted free-text entry words describing obsessions in a large sample of users of a mobile app, NOCD. Semantic vector space modeling was applied using the Global Vectors for Word Representation algorithm. A domain-specific extension, Mittens, was also applied to enhance the corpus with OCD-specific words. The resulting representations provided linear substructures of the word vector in a 100-dimensional space. We applied principal component analysis to the 100-dimensional vector representation of the most frequent words, followed by k-means clustering to obtain clusters of related words.
Results: We obtained 7001 unique words representing obsessions from 25,369 individuals. Heuristics for determining the optimal number of clusters pointed to a three-cluster solution for grouping subtypes of OCD. The first had themes relating to relationship and just-right; the second had themes relating to doubt and checking; and the third had themes relating to contamination, somatic, physical harm, and sexual harm. All three clusters showed close semantic relationships with each other in the central area of convergence, with themes relating to harm. An equal-sized split-sample analysis across individuals and a split-sample analysis over time both showed overall stable cluster solutions. Words in the third cluster were the most frequently occurring words, followed by words in the first cluster.
Conclusions: The clustering of naturally acquired obsessional words resulted in three major groupings of semantic themes, which partially overlapped with predefined checklists from previous studies. Furthermore, the closeness of the overall embedded relationships across clusters and their central convergence on harm suggests that, at least at the level of self-reported obsessional thoughts, most obsessions have close semantic relationships. Harm to self or others may be an underlying organizing theme across many obsessions. Notably, relationship-themed words, not previously included in factor-analytic studies, clustered with just-right words. These novel insights have potential implications for understanding how an apparent multitude of obsessional symptoms are connected by underlying themes. This observation could aid exposure-based treatment approaches and could be used as a conceptual framework for future research.
url https://www.jmir.org/2021/6/e25482
_version_ 1721368081952931840