Graph Prototype Generation for Graph Classification Using Genetic Algorithms and Graph Mining


Bibliographic Details
Main Authors: Yang, Shu-Hsin, 楊書欣
Other Authors: Soo, Von-Wun
Format: Others
Language: en_US
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/26856511044490406949
Description
Summary: Master's thesis === National Tsing Hua University === Department of Computer Science === Academic year 97 === Graphs have recently become widely used to represent structured objects. Here, the term ‘graph’ means a set of labeled or unlabeled vertices together with directed or undirected, labeled or unlabeled edges. For example, a graph can represent the structure of chemical compounds, the links between web pages, and many other kinds of structured data. Although graph-based data is becoming increasingly popular, the lack of powerful analytic tools is its major weakness and bottleneck. To address this, various approaches map graph-based data into feature vectors so that statistical classifiers such as Support Vector Machines (SVM) and Boosting can be applied to classify the graphs. In this thesis, we propose a different view of the graph classification problem: a prototype generation approach. That is, we directly use the features of graphs (node labels, edge labels, connected components, subgraphs) to generate prototypes for each class that maximize the difference between intra-class similarity and inter-class similarity. In this approach, a graph-based genetic algorithm with its own genetic operators is used to generate offspring, and gSpan (graph-based Substructure pattern mining) [Yan & Han 2002] is used to mine subgraphs in order to compute the fitness of the selected prototypes. The classification accuracy of our method is close to the best among the compared approaches that use statistical classifiers. It can also be applied to almost any approach, provided that a precise objective is given.
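
The objective described in the summary (intra-class similarity minus inter-class similarity) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the graph encoding (a set of labeled edges), the Jaccard `similarity` function, and the names `fitness` and `classify` are all assumptions made for illustration. In particular, Jaccard overlap of edges stands in for the subgraph-based comparison that the thesis performs with gSpan, and the genetic operators that produce candidate prototypes are omitted.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed encoding: a graph is a set of (source label, edge label, target label) triples.
Edge = Tuple[str, str, str]
Graph = FrozenSet[Edge]


def similarity(a: Graph, b: Graph) -> float:
    """Jaccard overlap of labeled edges (a stand-in for a subgraph-based measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fitness(prototype: Graph, own_class: List[Graph], other_classes: List[Graph]) -> float:
    """Mean similarity to the prototype's own class minus mean similarity to the rest."""
    intra = sum(similarity(prototype, g) for g in own_class) / len(own_class)
    inter = sum(similarity(prototype, g) for g in other_classes) / len(other_classes)
    return intra - inter


def classify(graph: Graph, prototypes: Dict[str, Graph]) -> str:
    """Assign the label of the most similar class prototype."""
    return max(prototypes, key=lambda label: similarity(graph, prototypes[label]))


# Toy data: two classes of tiny "molecule" graphs (hypothetical).
class_a = [
    frozenset({("C", "single", "O"), ("C", "single", "C")}),
    frozenset({("C", "single", "O"), ("C", "double", "O")}),
]
class_b = [
    frozenset({("N", "single", "H"), ("C", "single", "N")}),
    frozenset({("N", "single", "H"), ("N", "double", "O")}),
]

# Pick the fittest prototype for class A from two hand-written candidates.
candidates = [
    frozenset({("C", "single", "O")}),
    frozenset({("C", "single", "N")}),
]
best_a = max(candidates, key=lambda p: fitness(p, class_a, class_b))
```

In a genetic algorithm, a function like `fitness` would score each candidate prototype in the population, and selection, crossover, and mutation would replace the fixed `candidates` list used here.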