A Method for Filtering Pages by Similarity Degree based on Dynamic Programming

To obtain the target webpages from many webpages, we proposed a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method needs to use one of three same relationships proposed between two nodes, so we give the definition of the three same relationships. The b...


Bibliographic Details
Main Authors: Ziyun Deng, Tingqin He
Format: Article
Language: English
Published: MDPI AG 2018-12-01
Series: Future Internet
Subjects: method for filtering pages; similarity degree; dynamic programming; combination method
Online Access: https://www.mdpi.com/1999-5903/10/12/124
id doaj-a7221c34585a47fd97844c1250564638
record_format Article
spelling doaj-a7221c34585a47fd97844c1250564638; 2020-11-24T21:23:00Z; eng; MDPI AG; Future Internet; ISSN 1999-5903; 2018-12-01; vol. 10, no. 12, art. 124; DOI 10.3390/fi10120124; fi10120124; A Method for Filtering Pages by Similarity Degree based on Dynamic Programming; Ziyun Deng (College of Economics and Trade, Changsha Commerce & Tourism College, Changsha 410116, China); Tingqin He (National Supercomputing Center in Changsha, Hunan University, Changsha 410116, China); https://www.mdpi.com/1999-5903/10/12/124; method for filtering pages; similarity degree; dynamic programming; combination method
collection DOAJ
language English
format Article
sources DOAJ
author Ziyun Deng
Tingqin He
spellingShingle Ziyun Deng
Tingqin He
A Method for Filtering Pages by Similarity Degree based on Dynamic Programming
Future Internet
method for filtering pages
similarity degree
dynamic programming
combination method
author_facet Ziyun Deng
Tingqin He
author_sort Ziyun Deng
title A Method for Filtering Pages by Similarity Degree based on Dynamic Programming
title_sort method for filtering pages by similarity degree based on dynamic programming
publisher MDPI AG
series Future Internet
issn 1999-5903
publishDate 2018-12-01
description To obtain the target webpages from many webpages, we proposed a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method needs to use one of three same relationships proposed between two nodes, so we give the definition of the three same relationships. The biggest innovation of MFPSDDP is that it does not need to know the structures of webpages in advance. First, we address the design ideas with queue and double threads. Then, a dynamic programming algorithm for calculating the length of the longest common subsequence and a formula for calculating similarity are proposed. Further, for obtaining detailed information webpages from 200,000 webpages downloaded from the famous website "www.jd.com", we choose the same relationship Completely Same Relationship (CSR) and set the similarity threshold to 0.2. The Recall Ratio (RR) of MFPSDDP is in the middle in the four filtering methods compared. When the number of webpages filtered is nearly 200,000, the PR of MFPSDDP is highest in the four filtering methods compared, which can reach 85.1%. The PR of MFPSDDP is 13.3 percentage points higher than the PR of a Method for Filtering Pages by Containing Strings (MFPCS).
topic method for filtering pages
similarity degree
dynamic programming
combination method
url https://www.mdpi.com/1999-5903/10/12/124
work_keys_str_mv AT ziyundeng amethodforfilteringpagesbysimilaritydegreebasedondynamicprogramming
AT tingqinhe amethodforfilteringpagesbysimilaritydegreebasedondynamicprogramming
AT ziyundeng methodforfilteringpagesbysimilaritydegreebasedondynamicprogramming
AT tingqinhe methodforfilteringpagesbysimilaritydegreebasedondynamicprogramming
_version_ 1725994036024573952
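
Illustrative note: the description above mentions a dynamic programming algorithm for the length of the longest common subsequence (LCS), a similarity formula, and a similarity threshold of 0.2. This record does not reproduce the article's formula or its node definitions, so the following is only a minimal sketch under stated assumptions: pages are modelled as flat sequences of node labels, the "same relationship" is plain equality, similarity is taken as LCS length divided by the longer sequence length, and a page is kept when similarity is at least the threshold. The names lcs_length, similarity and keep_page are hypothetical and are not taken from the article.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def lcs_length(
    a: Sequence[T],
    b: Sequence[T],
    same: Callable[[T, T], bool] = lambda x, y: x == y,
) -> int:
    """Length of the longest common subsequence of a and b (standard DP)."""
    # prev[j] is the LCS length of the already-processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if same(x, y):
                # Matching nodes extend the best subsequence of both prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one node from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(b)]


def similarity(a: Sequence[T], b: Sequence[T]) -> float:
    """ASSUMED formula: LCS length over the longer length (not from the article)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def keep_page(candidate: Sequence[T], reference: Sequence[T],
              threshold: float = 0.2) -> bool:
    """Keep a page whose similarity to a reference page reaches the threshold.

    The 0.2 default comes from the description; using >= is an assumption.
    """
    return similarity(candidate, reference) >= threshold


if __name__ == "__main__":
    reference = ["html", "body", "div", "span", "p"]
    candidate = ["html", "body", "div", "p"]
    print(lcs_length(candidate, reference))
    print(keep_page(candidate, reference))
```

A different "same relationship" between nodes could be tried by passing another comparison function as the same argument of lcs_length.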