Summary: | Within well-established fields of biomedical science, we identify “gaps”: topical areas of investigation that might be expected to occur but are missing. We define a field by carrying out a topical PubMed query and analyzing the Medical Subject Headings (MeSH) by which the set of retrieved articles is indexed. MeSH terms that occur in >1% of the articles are examined pairwise to see how often they are predicted to co-occur within individual articles (assuming that they are independent of each other). Pairs of MeSH terms that are predicted to co-occur in at least 10 articles, yet are not observed to co-occur in any article, are defined as “gaps” and were studied further in a corpus of 10 disease-related article sets and 10 article sets related to biological processes. Overall, articles that filled gaps were cited more heavily than non-gap-filling articles and were 61% more likely to be published in multidisciplinary high-impact journals. Nine different features of these “gaps” were characterized and tested to learn which, if any, correlate with the appearance of one or more articles containing both MeSH terms within the next 5 years. Several different types of gaps were identified, each having a distinct combination of predictive features: (a) those arising as a byproduct of MeSH indexing rules; (b) those having little biological meaning; (c) those representing “low hanging fruit” for immediate exploitation; and (d) those representing gaps across disciplines or subdisciplines that do not talk to each other or work together. We have built a free, open tool called “Mine the Gap!” that identifies and characterizes the “gaps” for any PubMed query, which can be accessed via the Anne O’Tate value-added PubMed search interface (http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/AnneOTate.cgi).
|
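The gap criterion described in the summary can be illustrated with a minimal Python sketch. This is not the code behind “Mine the Gap!”; it is an illustration under stated assumptions. The function name `find_gaps`, its parameters, and the representation of each article as a set of MeSH terms are all hypothetical. The expected co-occurrence count under independence is taken to be N · (n_a / N) · (n_b / N) = n_a · n_b / N, where N is the number of retrieved articles and n_a, n_b are the numbers of articles indexed with each term.

```python
from collections import Counter
from itertools import combinations


def find_gaps(articles, min_term_frac=0.01, min_expected=10.0):
    """Return (term_a, term_b, expected) for term pairs that never co-occur.

    articles: list of sets of MeSH terms, one set per retrieved article
              (hypothetical input format, assumed for illustration).
    """
    n = len(articles)

    # Number of articles indexed with each term.
    term_counts = Counter(t for terms in articles for t in set(terms))

    # Keep only terms that occur in more than min_term_frac of the articles.
    frequent = sorted(t for t, c in term_counts.items() if c / n > min_term_frac)
    frequent_set = set(frequent)

    # Observed co-occurrence counts, keyed by alphabetically ordered pairs.
    observed = Counter()
    for terms in articles:
        kept = sorted(frequent_set & set(terms))
        observed.update(combinations(kept, 2))

    gaps = []
    for a, b in combinations(frequent, 2):
        # Expected co-occurrences if the two terms were independent.
        expected = term_counts[a] * term_counts[b] / n
        if expected >= min_expected and observed[(a, b)] == 0:
            gaps.append((a, b, expected))

    # Largest expected count first.
    return sorted(gaps, key=lambda g: -g[2])
```

Calling `find_gaps` on a list of MeSH term sets returns the candidate gaps ordered by how strongly the independence assumption predicts that the two terms should have appeared together. The two thresholds default to the values stated in the summary (>1% of articles, at least 10 predicted co-occurrences).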