A general framework for scraping newspaper websites

Data streaming is now one of the most common ways for websites and applications to supply end users with the latest articles and news. Because many news websites and companies are founded every day, such data centers must be flexible, and it must be easy to introduce a new website to k...


Bibliographic Details
Main Author:	Tasim, Taner
Format:	Others
Language:	English
Published:	Linnéuniversitetet, Institutionen för datavetenskap (DV) 2016
Subjects:	Software Engineering Programvaruteknik
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-59044

id	ndltd-UPSALLA1-oai-DiVA.org-lnu-59044
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-lnu-59044 2018-01-14T05:11:47Z A general framework for scraping newspaper websites eng Tasim, Taner Linnéuniversitetet, Institutionen för datavetenskap (DV) 2016 Software Engineering Programvaruteknik Data streaming is now one of the most common ways for websites and applications to supply end users with the latest articles and news. Because many news websites and companies are founded every day, such data centers must be flexible, and it must be easy to introduce a new website to keep track of. The main goal of this project is to investigate two frameworks in which implementing a robot for a given website takes an acceptable amount of time. This is a challenging task: first, it aims to optimize a framework, which means achieving the same result with less effort; second, the frameworks will ultimately be used by professors and students, so quality and robustness play a big role. To address this challenge, two different types of news websites were investigated, and through this process the approximate time needed to implement a single robot was determined. With that time in mind, the new frameworks were implemented with the goal of reducing the time needed to implement a new web robot. The results are two general frameworks for two different types of websites, in which implementing a robot does not take much effort or time. The implementation time for a new robot was reduced from 18 hours to approximately 4 hours. Student thesis info:eu-repo/semantics/bachelorThesis text http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-59044 application/pdf info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Software Engineering Programvaruteknik
spellingShingle	Software Engineering Programvaruteknik Tasim, Taner A general framework for scraping newspaper websites
description	Data streaming is now one of the most common ways for websites and applications to supply end users with the latest articles and news. Because many news websites and companies are founded every day, such data centers must be flexible, and it must be easy to introduce a new website to keep track of. The main goal of this project is to investigate two frameworks in which implementing a robot for a given website takes an acceptable amount of time. This is a challenging task: first, it aims to optimize a framework, which means achieving the same result with less effort; second, the frameworks will ultimately be used by professors and students, so quality and robustness play a big role. To address this challenge, two different types of news websites were investigated, and through this process the approximate time needed to implement a single robot was determined. With that time in mind, the new frameworks were implemented with the goal of reducing the time needed to implement a new web robot. The results are two general frameworks for two different types of websites, in which implementing a robot does not take much effort or time. The implementation time for a new robot was reduced from 18 hours to approximately 4 hours.
author	Tasim, Taner
author_facet	Tasim, Taner
author_sort	Tasim, Taner
title	A general framework for scraping newspaper websites
title_short	A general framework for scraping newspaper websites
title_full	A general framework for scraping newspaper websites
title_fullStr	A general framework for scraping newspaper websites
title_full_unstemmed	A general framework for scraping newspaper websites
title_sort	general framework for scraping newspaper websites
publisher	Linnéuniversitetet, Institutionen för datavetenskap (DV)
publishDate	2016
url	http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-59044
work_keys_str_mv	AT tasimtaner ageneralframeworkforscrapingnewspaperwebsites AT tasimtaner generalframeworkforscrapingnewspaperwebsites
_version_	1718609862009552896


Similar Items