A Method for Filtering Pages by Similarity Degree based on Dynamic Programming
To obtain target webpages from a large set of webpages, we propose a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method relies on one of three “same relationships” defined between two nodes, so we define all three. The b...
Main Authors: | Ziyun Deng, Tingqin He |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2018-12-01 |
Series: | Future Internet |
Subjects: | method for filtering pages; similarity degree; dynamic programming; combination method |
Online Access: | https://www.mdpi.com/1999-5903/10/12/124 |
id |
doaj-a7221c34585a47fd97844c1250564638 |
---|---|
record_format |
Article |
spelling |
doaj-a7221c34585a47fd97844c1250564638; 2020-11-24T21:23:00Z; eng; MDPI AG; Future Internet; 1999-5903; 2018-12-01; 10; 12; 124; 10.3390/fi10120124; fi10120124; A Method for Filtering Pages by Similarity Degree based on Dynamic Programming; Ziyun Deng (0), Tingqin He (1); (0) College of Economics and Trade, Changsha Commerce & Tourism College, Changsha 410116, China; (1) National Supercomputing Center in Changsha, Hunan University, Changsha 410116, China; To obtain target webpages from a large set of webpages, we propose a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method relies on one of three “same relationships” defined between two nodes, so we define all three. The main innovation of MFPSDDP is that it does not need to know the structure of the webpages in advance. First, we present the design, which uses a queue and two threads. Then, we propose a dynamic programming algorithm for calculating the length of the longest common subsequence, together with a formula for calculating similarity. Further, to obtain the detailed-information webpages from 200,000 webpages downloaded from the website “www.jd.com”, we choose the Completely Same Relationship (CSR) and set the similarity threshold to 0.2. The Recall Ratio (RR) of MFPSDDP ranks in the middle of the four filtering methods compared. When the number of webpages filtered is nearly 200,000, the PR of MFPSDDP is the highest of the four filtering methods compared, reaching 85.1%. The PR of MFPSDDP is 13.3 percentage points higher than that of a Method for Filtering Pages by Containing Strings (MFPCS); https://www.mdpi.com/1999-5903/10/12/124; method for filtering pages; similarity degree; dynamic programming; combination method |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Ziyun Deng Tingqin He |
title |
A Method for Filtering Pages by Similarity Degree based on Dynamic Programming |
publisher |
MDPI AG |
series |
Future Internet |
issn |
1999-5903 |
publishDate |
2018-12-01 |
description |
To obtain target webpages from a large set of webpages, we propose a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method relies on one of three “same relationships” defined between two nodes, so we define all three. The main innovation of MFPSDDP is that it does not need to know the structure of the webpages in advance. First, we present the design, which uses a queue and two threads. Then, we propose a dynamic programming algorithm for calculating the length of the longest common subsequence, together with a formula for calculating similarity. Further, to obtain the detailed-information webpages from 200,000 webpages downloaded from the website “www.jd.com”, we choose the Completely Same Relationship (CSR) and set the similarity threshold to 0.2. The Recall Ratio (RR) of MFPSDDP ranks in the middle of the four filtering methods compared. When the number of webpages filtered is nearly 200,000, the PR of MFPSDDP is the highest of the four filtering methods compared, reaching 85.1%. The PR of MFPSDDP is 13.3 percentage points higher than that of a Method for Filtering Pages by Containing Strings (MFPCS). |
topic |
method for filtering pages; similarity degree; dynamic programming; combination method |
url |
https://www.mdpi.com/1999-5903/10/12/124 |
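
The description mentions a dynamic programming algorithm for the length of the longest common subsequence (LCS), a similarity formula, and a similarity threshold of 0.2. The record does not reproduce the paper's formula or its definition of the Completely Same Relationship, so the following is only a minimal sketch under stated assumptions: pages are represented as hypothetical sequences of node labels, two nodes count as "the same" when their labels are equal, and similarity is assumed to be the LCS length divided by the length of the longer sequence. The function names are illustrative and are not taken from the paper.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming recurrence, keeping only two rows.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching items extend the best subsequence by one.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """ASSUMED formula: LCS length over the longer sequence length.

    The paper's actual similarity formula is not given in this record.
    """
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def keep_page(candidate: Sequence[str], reference: Sequence[str],
              threshold: float = 0.2) -> bool:
    """Keep a page when its similarity to a reference reaches the threshold.

    0.2 is the threshold quoted in the description; comparing with >= is
    an assumption.
    """
    return similarity(candidate, reference) >= threshold
```

For instance, `similarity(["html", "body", "div"], ["html", "body", "span", "div"])` evaluates to 0.75 under this assumed normalisation, so `keep_page` would retain that page at the 0.2 threshold.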