Profile-Based Focused Crawling for Social Media-Sharing Websites
We present a novel profile-based focused crawling system for dealing with the increasingly popular social media-sharing websites. In this system, we treat the user profiles as ranking criteria for guiding the crawling process. Furthermore, we divide a user's profile into two...
Main Authors: | Zhang Zhiyong, Nasraoui Olfa |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2009-01-01 |
Series: | EURASIP Journal on Image and Video Processing |
Online Access: | http://jivp.eurasipjournals.com/content/2009/856037 |
id | doaj-3bd5c2fa3d764c44aff14b7cf85365ab |
---|---|
record_format | Article |
spelling | doaj-3bd5c2fa3d764c44aff14b7cf85365ab 2020-11-25T00:54:37Z eng SpringerOpen EURASIP Journal on Image and Video Processing 1687-5176 1687-5281 2009-01-01 2009 1 856037 Profile-Based Focused Crawling for Social Media-Sharing Websites Zhang Zhiyong Nasraoui Olfa We present a novel profile-based focused crawling system for dealing with the increasingly popular social media-sharing websites. In this system, we treat the user profiles as ranking criteria for guiding the crawling process. Furthermore, we divide a user's profile into two parts, an internal part, which comes from the user's own contribution, and an external part, which comes from the user's social contacts. In order to expand the crawling topic, a cotagging topic-discovery scheme was adopted for social media-sharing websites. In order to efficiently and effectively extract data for the focused crawling, a path string-based page classification method is first developed for identifying list pages, detail pages, and profile pages. The identification of the correct type of page is essential for our crawling, since we want to distinguish between list, profile, and detail pages in order to extract the correct information from each type of page, and subsequently estimate a reasonable ranking for each link that is encountered while crawling. Our experiments prove the robustness of our profile-based focused crawler, as well as a significant improvement in harvest ratio, compared to breadth-first and online page importance computation (OPIC) crawlers, when crawling the Flickr website for two different topics. http://jivp.eurasipjournals.com/content/2009/856037 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Zhang Zhiyong Nasraoui Olfa |
spellingShingle | Zhang Zhiyong Nasraoui Olfa Profile-Based Focused Crawling for Social Media-Sharing Websites EURASIP Journal on Image and Video Processing |
author_facet | Zhang Zhiyong Nasraoui Olfa |
author_sort | Zhang Zhiyong |
title | Profile-Based Focused Crawling for Social Media-Sharing Websites |
title_short | Profile-Based Focused Crawling for Social Media-Sharing Websites |
title_full | Profile-Based Focused Crawling for Social Media-Sharing Websites |
title_fullStr | Profile-Based Focused Crawling for Social Media-Sharing Websites |
title_full_unstemmed | Profile-Based Focused Crawling for Social Media-Sharing Websites |
title_sort | profile-based focused crawling for social media-sharing websites |
publisher | SpringerOpen |
series | EURASIP Journal on Image and Video Processing |
issn | 1687-5176 1687-5281 |
publishDate | 2009-01-01 |
description | We present a novel profile-based focused crawling system for dealing with the increasingly popular social media-sharing websites. In this system, we treat the user profiles as ranking criteria for guiding the crawling process. Furthermore, we divide a user's profile into two parts, an internal part, which comes from the user's own contribution, and an external part, which comes from the user's social contacts. In order to expand the crawling topic, a cotagging topic-discovery scheme was adopted for social media-sharing websites. In order to efficiently and effectively extract data for the focused crawling, a path string-based page classification method is first developed for identifying list pages, detail pages, and profile pages. The identification of the correct type of page is essential for our crawling, since we want to distinguish between list, profile, and detail pages in order to extract the correct information from each type of page, and subsequently estimate a reasonable ranking for each link that is encountered while crawling. Our experiments prove the robustness of our profile-based focused crawler, as well as a significant improvement in harvest ratio, compared to breadth-first and online page importance computation (OPIC) crawlers, when crawling the Flickr website for two different topics. |
url | http://jivp.eurasipjournals.com/content/2009/856037 |
work_keys_str_mv | AT zhangzhiyong profilebasedfocusedcrawlingforsocialmediasharingwebsites AT nasraouiolfa profilebasedfocusedcrawlingforsocialmediasharingwebsites |
_version_ | 1725233607474151424 |