Staged Event-Driven Architecture As A Micro-Architecture Of Distributed And Pluginable Crawling Platform

There are many crawling systems available on the market, but they are rather closed systems dedicated to performing a particular kind and class of tasks with a predefined scope, strategy, etc. In real life, however, there are meaningful groups of users (e.g. marketing, criminal or governmental analys...

Bibliographic Details
Main Authors: Leszek Siwik, Kamil Wlodarczyk, Mateusz Kluczny
Format: Article
Language: English
Published: AGH University of Science and Technology Press 2013-01-01
Series: Computer Science
Online Access: http://journals.agh.edu.pl/csci/article/download/266/746
id doaj-074e020d0ea84692b614ad4d2827d91d
record_format Article
spelling doaj-074e020d0ea84692b614ad4d2827d91d; 2020-11-25T00:02:29Z; eng; AGH University of Science and Technology Press; Computer Science; ISSN 1508-2806; 2013-01-01; 14; 4; 645; DOI 10.7494/csci.2013.14.4.645; Staged Event-Driven Architecture As A Micro-Architecture Of Distributed And Pluginable Crawling Platform; Leszek Siwik [0], Kamil Wlodarczyk [1], Mateusz Kluczny [2]; [0] University of Science and Technology, [1] AGH University of Science and Technology, [2] University of Science and Technology; http://journals.agh.edu.pl/csci/article/download/266/746
collection DOAJ
language English
format Article
sources DOAJ
author Leszek Siwik
Kamil Wlodarczyk
Mateusz Kluczny
title Staged Event-Driven Architecture As A Micro-Architecture Of Distributed And Pluginable Crawling Platform
publisher AGH University of Science and Technology Press
series Computer Science
issn 1508-2806
publishDate 2013-01-01
description There are many crawling systems available on the market, but they are rather closed systems dedicated to performing a particular kind and class of tasks with a predefined scope, strategy, etc. In real life, however, there are meaningful groups of users (e.g. marketing, criminal or governmental analysts) who require more than yet another crawling system dedicated to predefined tasks. They need an easy-to-use, user-friendly, all-in-one studio not only for executing and running internet robots and crawlers, but also for (graphically) (re)defining and (re)composing crawlers according to dynamically changing requirements and use cases. To realize this idea, the Cassiopeia framework has been designed and developed. One has to remember, however, that the enormous size and structural complexity of the WWW make developing effective internet robots, and even more so developing a framework supporting graphical composition of robots, a really challenging task from a technical and architectural point of view. The crucial aspect in the context of crawling efficiency and scalability is the concurrency model applied. There are two typical concurrency management models: classical concurrency based on a pool of threads and processes, and event-driven concurrency. Neither is an ideal approach. That is why research on alternative models is still being conducted to propose an efficient and convenient architecture for concurrent and distributed applications. One promising model is the staged event-driven architecture, which to some extent mixes both of the above-mentioned classical approaches and provides additional benefits, such as splitting an application into separate stages connected by event queues, which is of interest given the requirements for crawler (re)composition.
The goal of this paper is to present the idea and the PoC implementation of the Cassiopeia framework, with special attention paid to its crucial architectural element, i.e. the design, implementation and application of the staged event-driven architecture as the micro-architecture of Cassiopeia's agents, i.e. its key computational and processing units.
url http://journals.agh.edu.pl/csci/article/download/266/746
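
The description above characterizes a staged event-driven architecture as an application split into separate stages connected by event queues, combining thread-pool and event-driven concurrency. A minimal sketch of that general idea follows. It is not taken from the Cassiopeia framework: the names Stage, run_pipeline and FAKE_WEB, the three stages (fetch, parse, store), and the in-memory "web" used instead of real HTTP requests are all assumptions made for illustration.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker thread to exit


class Stage:
    """One stage: an input event queue drained by a small private thread pool.

    Hypothetical illustration only; not Cassiopeia's actual agent code.
    """

    def __init__(self, name, handler, workers=2):
        self.name = name
        self.handler = handler      # callable: event -> iterable of output events
        self.inbox = queue.Queue()  # event queue feeding this stage
        self.next_stage = None      # downstream stage, set by connect()
        self._threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]

    def connect(self, next_stage):
        """Wire this stage's output to next_stage's queue; return next_stage."""
        self.next_stage = next_stage
        return next_stage

    def start(self):
        for t in self._threads:
            t.start()

    def _run(self):
        while True:
            event = self.inbox.get()
            try:
                if event is _STOP:
                    return
                for out in self.handler(event):
                    if self.next_stage is not None:
                        self.next_stage.inbox.put(out)
            finally:
                self.inbox.task_done()

    def drain_and_stop(self):
        """Wait until every queued event is handled, then stop the workers."""
        self.inbox.join()
        for _ in self._threads:
            self.inbox.put(_STOP)
        for t in self._threads:
            t.join()


# Assumed stand-in for the network: URL -> page body.
FAKE_WEB = {
    "http://example.test/a": "alpha beta gamma",
    "http://example.test/b": "delta",
}


def fetch(url):
    yield (url, FAKE_WEB.get(url, ""))


def parse(page):
    url, body = page
    yield (url, len(body.split()))  # toy "analysis": count words


def run_pipeline(urls):
    results = {}
    lock = threading.Lock()

    def store(record):
        url, word_count = record
        with lock:
            results[url] = word_count
        return ()  # final stage emits nothing

    stages = [Stage("fetch", fetch), Stage("parse", parse), Stage("store", store)]
    stages[0].connect(stages[1]).connect(stages[2])
    for stage in stages:
        stage.start()
    for url in urls:
        stages[0].inbox.put(url)
    for stage in stages:  # drain in pipeline order so no event is lost
        stage.drain_and_stop()
    return results
```

Because each stage only reads from its own queue and writes to the next one, a stage can be replaced or re-wired through connect() without touching the others, which is the property the description relates to crawler (re)composition.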