Summary: | There is much current interest in computer-based methods for selecting structurally diverse subsets of chemical databases, <i>e.g.</i>, for inclusion in biological screening programmes or for the construction of combinatorial libraries. This paper summarises recent work in Sheffield on <i>dissimilarity-based compound selection</i>, which seeks to identify a maximally dissimilar subset of the molecules in a database. In practice, the approach seeks to identify a <i>near</i> maximally dissimilar subset, since finding the <i>most</i> dissimilar subset would require a combinatorial algorithm that considers every possible subset of the required size drawn from the dataset.
|
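The abstract does not say which selection algorithm the Sheffield work uses. As a hedged illustration of what a "near maximally dissimilar" selection can look like, the sketch below implements the widely used greedy MaxMin heuristic. It assumes fingerprints are stored as Python ints used as bitsets and that dissimilarity is 1 minus the Tanimoto coefficient. The function names `tanimoto` and `maxmin_select`, the bitset representation and the fixed seed compound are all illustrative assumptions, not the paper's method.

```python
from typing import List, Sequence


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity between two fingerprints stored as int bitsets (assumed encoding)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


def maxmin_select(fps: Sequence[int], n: int, seed_index: int = 0) -> List[int]:
    """Greedy MaxMin sketch: repeatedly add the compound whose smallest
    dissimilarity to the already-selected subset is largest.

    Gives a near maximally dissimilar subset without enumerating all subsets.
    The seed compound (index 0 by default) is an arbitrary, assumed choice.
    """
    if not 0 < n <= len(fps):
        raise ValueError("n must be between 1 and the number of compounds")
    selected = [seed_index]
    chosen = {seed_index}
    # min_dist[i]: smallest dissimilarity (1 - Tanimoto) from compound i to any selected compound
    min_dist = [1.0 - tanimoto(fp, fps[seed_index]) for fp in fps]
    while len(selected) < n:
        # Pick the unselected compound farthest from its nearest selected neighbour
        best = max((i for i in range(len(fps)) if i not in chosen), key=lambda i: min_dist[i])
        selected.append(best)
        chosen.add(best)
        # Only the newly added compound can lower each nearest-neighbour distance
        for i, fp in enumerate(fps):
            d = 1.0 - tanimoto(fp, fps[best])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


if __name__ == "__main__":
    # Toy 6-bit fingerprints, invented purely for illustration
    fps = [0b1111, 0b1110, 0b0001, 0b110000, 0b110001]
    print(maxmin_select(fps, 3))  # prints [0, 3, 2]
```

Each iteration updates one distance per compound, so selecting n of N compounds costs on the order of n × N dissimilarity calculations, whereas an exhaustive search would have to score all C(N, n) subsets (for example, `math.comb(1000, 10)` is roughly 2.6 × 10^23).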