Towards completely automatized HTML form discovery on the web

Bibliographic Details
Main Author: Moraes, Maurício Coutinho
Other Authors: Heuser, Carlos Alberto
Format: Others
Language: English
Published: 2013
Online Access: http://hdl.handle.net/10183/70194
Description
Summary: The forms discovered by our proposal can be directly used as training data by some form classifiers. Our experimental validation used thousands of real Web forms, divided into six domains, including a representative subset of the publicly available DeepPeep form base (DEEPPEEP, 2010; DEEPPEEP REPOSITORY, 2011). Our results show that it is feasible to mitigate the demanding manual work required by two cutting-edge form classifiers (i.e., GFC and DSFC (BARBOSA; FREIRE, 2007a)), at the cost of a relatively small loss in effectiveness.