Generation of a Social Network Graph by Using Apache Spark

We plan to develop a method for clustering a social network graph. Testing the method requires a generated graph similar in structure to existing social networks. The article presents an algorithm for distributed graph generation. We took into account basic properties such as power-la...

Bibliographic Details
Main Authors: Y. A. Belov, S. I. Vovchok
Format: Article
Language: English
Published: Yaroslavl State University 2016-12-01
Series: Modelirovanie i Analiz Informacionnyh Sistem
Subjects: social network; generation
Online Access: https://www.mais-journal.ru/jour/article/view/414
id doaj-e06fa6b5312345a8a3c7b05672e3a1de
record_format Article
spelling doaj-e06fa6b5312345a8a3c7b05672e3a1de 2021-07-29T08:15:22Z
volume 23
issue 6
pages 777-783
doi 10.18255/1818-1015-2016-6-777-783
affiliation P.G. Demidov Yaroslavl State University (Y. A. Belov, S. I. Vovchok)
collection DOAJ
language English
format Article
sources DOAJ
author Y. A. Belov
S. I. Vovchok
title Generation of a Social Network Graph by Using Apache Spark
publisher Yaroslavl State University
series Modelirovanie i Analiz Informacionnyh Sistem
issn 1818-1015
2313-5417
publishDate 2016-12-01
description We plan to develop a method for clustering a social network graph. Testing the method requires a generated graph similar in structure to existing social networks. The article presents an algorithm for distributed generation of such a graph. It takes into account basic properties such as the power-law distribution of the number of user communities, dense intersections in social networks, and others. The algorithm also addresses problems present in similar works by other authors, for example multiple edges arising during generation. A distinctive feature of the algorithm is that it is parameterized by the number of communities rather than by the number of connected users, as in other works. This choice reflects how the structure of an existing social network evolves. The paper lists the properties of the social network graph and gives a table of the variables the algorithm needs. It then presents a step-by-step generation algorithm together with the mathematical parameters calculated for it. Generation is performed in a distributed way with the Apache Spark framework, and the paper describes in detail how tasks are divided with it. The algorithm uses the Erdős-Rényi model of random graphs, chosen as the most suitable and the easiest to implement. The main advantages of the method are low resource usage compared with similar generators and execution speed. Speed comes from distributed execution and from the fact that network users always have unique numbers and are kept ordered by them, so no sorting is needed. The algorithm supports the development of an efficient clustering method and may also be useful in other areas, for example search engines for social networks.
topic social network
generation
url https://www.mais-journal.ru/jour/article/view/414
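
The generation idea summarized in the description can be illustrated with a small single-machine sketch. It is not the authors' algorithm and does not use Apache Spark. It only shows, under stated assumptions, how community-based generation with the Erdős-Rényi model can avoid multiple edges. The function names, the parameters `alpha`, `k_max` and `p`, and the reading of the power law as "number of communities per user" are all hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def sample_membership_count(rng, alpha, k_max):
    """Draw k in 1..k_max with probability proportional to k**(-alpha).

    A truncated discrete power law; alpha and k_max are illustrative
    parameters, not values taken from the article.
    """
    ks = list(range(1, k_max + 1))
    weights = [k ** (-alpha) for k in ks]
    return rng.choices(ks, weights=weights, k=1)[0]


def generate_graph(num_users, num_communities, alpha=2.5, k_max=5, p=0.3, seed=0):
    """Return (communities, edges) for a toy community-based random graph.

    Assumptions of this sketch:
      * users are numbered 0..num_users-1 and visited in that order,
        so every community's member list is already sorted;
      * inside each community every pair of members is linked with
        probability p (Erdős-Rényi G(n, p) restricted to the community);
      * edges are stored as (u, v) with u < v in a set, so a pair that
        shares several communities still yields at most one edge.
    """
    rng = random.Random(seed)
    k_max = min(k_max, num_communities)
    communities = [[] for _ in range(num_communities)]

    # Assign each user to a power-law-distributed number of distinct communities.
    for user in range(num_users):
        k = sample_membership_count(rng, alpha, k_max)
        for c in rng.sample(range(num_communities), k):
            communities[c].append(user)

    # Generate edges community by community; the set removes duplicates.
    edges = set()
    for members in communities:
        for u, v in combinations(members, 2):
            if rng.random() < p:
                edges.add((u, v))
    return communities, edges


if __name__ == "__main__":
    communities, edges = generate_graph(num_users=1000, num_communities=50)
    print(len(edges), "distinct edges")
```

Because each community is processed independently, the per-community loop is the kind of work that could in principle be split into separate tasks in a distributed framework. How the article actually divides tasks in Apache Spark is not reproduced here.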