Data Mining and Knowledge Discovery: A Guided Approach Based on Monotone Boolean Functions
This dissertation deals with an important problem in Data Mining and Knowledge Discovery (DM & KD), and Information Technology (IT) in general. It addresses the problem of efficiently learning monotone Boolean functions via membership queries to oracles. The monotone Boolean function can be thou...
Main Author: | Torvik, Vetle Ingvald |
---|---|
Other Authors: | Jerry L. Trahan |
Format: | Others |
Language: | en |
Published: | LSU 2001 |
Subjects: | Engineering Science (Interdepartmental Program) |
Online Access: | http://etd.lsu.edu/docs/available/etd-1204101-184456/ |
id |
ndltd-LSU-oai-etd.lsu.edu-etd-1204101-184456 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-LSU-oai-etd.lsu.edu-etd-1204101-184456
2013-01-07T22:47:47Z
Data Mining and Knowledge Discovery: A Guided Approach Based on Monotone Boolean Functions
Torvik, Vetle Ingvald
Engineering Science (Interdepartmental Program)
Jerry L. Trahan; Lynn R. LaMotte; T. Warren Liao; Evangelos Triantaphyllou; Jianhua Chen; Manoj K. Chari
LSU
2001-12-14
text
application/pdf
http://etd.lsu.edu/docs/available/etd-1204101-184456/
en
unrestricted
I hereby grant to LSU or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University Libraries in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. |
collection |
NDLTD |
language |
en |
format |
Others |
sources |
NDLTD |
topic |
Engineering Science (Interdepartmental Program) |
spellingShingle |
Engineering Science (Interdepartmental Program) Torvik, Vetle Ingvald Data Mining and Knowledge Discovery: A Guided Approach Based on Monotone Boolean Functions |
description |
This dissertation deals with an important problem in Data Mining and Knowledge Discovery (DM & KD), and Information Technology (IT) in general. It addresses the problem of efficiently learning monotone Boolean functions via membership queries to oracles. The monotone Boolean function can be thought of as a phenomenon, such as breast cancer or a computer crash, together with a set of predictor variables. The oracle can be thought of as an entity that knows the underlying monotone Boolean function, and provides a Boolean response to each query. In practice, it may take the shape of a human expert, or it may be the outcome of performing tasks such as running experiments or searching large databases.
Monotone Boolean functions have general knowledge representation power and arise frequently in applications. A key goal of this dissertation is to demonstrate the wide spectrum of important real-life applications that can be analyzed using the newly proposed computational approaches. Breast cancer diagnosis, computer crashes, college acceptance policies, and record linkage in databases are used here to demonstrate this point and to illustrate the algorithmic details. Monotone Boolean functions have the added benefit of being intuitive. This property is perhaps the most important one in learning environments, especially when human interaction is involved, since people tend to make better use of knowledge they can easily interpret, understand, validate, and remember.
The main goal of this dissertation is to design new algorithms that can minimize the average number of queries used to completely reconstruct monotone Boolean functions defined on a finite set of vectors V = {0,1}^n. The optimal query selections are found via a recursive algorithm in exponential time (in the size of V). The optimality conditions are then summarized in the simple form of evaluative criteria, which are near optimal and only take polynomial time to compute. Extensive unbiased empirical results show that the evaluative criterion approach is far superior to any of the existing methods. In fact, the reduction in average number of queries increases exponentially with the number of variables n, and faster than exponentially with the oracle's error rate. |
author2 |
Jerry L. Trahan |
author_facet |
Jerry L. Trahan Torvik, Vetle Ingvald |
author |
Torvik, Vetle Ingvald |
author_sort |
Torvik, Vetle Ingvald |
title |
Data Mining and Knowledge Discovery: A Guided Approach Based on Monotone Boolean Functions |
title_sort |
data mining and knowledge discovery: a guided approach based on monotone boolean functions |
publisher |
LSU |
publishDate |
2001 |
url |
http://etd.lsu.edu/docs/available/etd-1204101-184456/ |
work_keys_str_mv |
AT torvikvetleingvald dataminingandknowledgediscoveryaguidedapproachbasedonmonotonebooleanfunctions |
_version_ |
1716476315366850560 |
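
The learning problem described in the abstract can be illustrated with a small sketch. A Boolean function f on {0,1}^n is monotone if v <= w componentwise implies f(v) <= f(w), so a single oracle answer can classify many vectors at once: a response of 1 at v settles every vector above v, and a response of 0 settles every vector below it. The code below is only an illustration under stated assumptions, not the dissertation's algorithm: the function names `leq` and `reconstruct`, the error-free oracle, and the "balance" rule used to pick the next query are all hypothetical choices made for this example.

```python
from itertools import product


def leq(a, b):
    """Componentwise order: a <= b iff a_i <= b_i for every i."""
    return all(x <= y for x, y in zip(a, b))


def reconstruct(n, oracle):
    """Infer a monotone Boolean function on {0,1}^n via membership queries.

    Returns (values, num_queries), where values maps each vector to 0 or 1.
    Assumes the oracle is error-free and the target function is monotone.
    """
    vectors = list(product((0, 1), repeat=n))
    values = {}
    queries = 0
    while len(values) < len(vectors):
        unknown = [v for v in vectors if v not in values]

        # Illustrative selection rule (an assumption, not the source's
        # criterion): prefer the vector whose answer classifies many
        # unknown vectors whichever way the oracle responds.
        def score(v):
            up = sum(1 for w in unknown if leq(v, w))
            down = sum(1 for w in unknown if leq(w, v))
            return min(up, down)

        q = max(unknown, key=score)
        answer = oracle(q)
        queries += 1

        # Propagate the answer using monotonicity.
        for w in unknown:
            if answer == 1 and leq(q, w):
                values[w] = 1
            elif answer == 0 and leq(w, q):
                values[w] = 0
    return values, queries


if __name__ == "__main__":
    # Hypothetical target: f(v) = v1 AND (v2 OR v3), which is monotone.
    target = lambda v: int(v[0] and (v[1] or v[2]))
    values, queries = reconstruct(3, target)
    print(f"classified {len(values)} vectors using {queries} queries")
```

Each pass rescans all unclassified vectors, so the sketch is practical only for small n. Its purpose is to show how monotonicity lets the number of queries fall below the 2^n vectors being classified.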