Entropic Statistical Description of Big Data Quality in Hotel Customer Relationship Management

Customer Relationship Management (CRM) is a fundamental tool in the hospitality industry, and it can be regarded as a big-data scenario because of the large number of records that managers handle every year. Data quality is crucial for the success of these systems, and one of the main issu...

Bibliographic Details
Main Authors: Lydia González-Serrano, Pilar Talón-Ballestero, Sergio Muñoz-Romero, Cristina Soguero-Ruiz, José Luis Rojo-Álvarez
Format: Article
Language: English
Published: MDPI AG 2019-04-01
Series: Entropy
Subjects:
Online Access: https://www.mdpi.com/1099-4300/21/4/419
id doaj-13344e3cceae4e17821305e58c37c8d1
record_format Article
spelling doaj-13344e3cceae4e17821305e58c37c8d1 2020-11-24T21:21:15Z eng
MDPI AG, Entropy, ISSN 1099-4300, 2019-04-01, vol. 21, no. 4, art. 419
DOI 10.3390/e21040419 (e21040419)
Entropic Statistical Description of Big Data Quality in Hotel Customer Relationship Management
Lydia González-Serrano, Department of Business and Management, Rey Juan Carlos University, 28943 Madrid, Spain
Pilar Talón-Ballestero, Department of Business and Management, Rey Juan Carlos University, 28943 Madrid, Spain
Sergio Muñoz-Romero, Department of Business and Management, Rey Juan Carlos University, 28943 Madrid, Spain
Cristina Soguero-Ruiz, Department of Business and Management, Rey Juan Carlos University, 28943 Madrid, Spain
José Luis Rojo-Álvarez, Department of Business and Management, Rey Juan Carlos University, 28943 Madrid, Spain
collection DOAJ
language English
format Article
sources DOAJ
author Lydia González-Serrano
Pilar Talón-Ballestero
Sergio Muñoz-Romero
Cristina Soguero-Ruiz
José Luis Rojo-Álvarez
title Entropic Statistical Description of Big Data Quality in Hotel Customer Relationship Management
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2019-04-01
description Customer Relationship Management (CRM) is a fundamental tool in the hospitality industry, and it can be regarded as a big-data scenario because of the large number of records that managers handle every year. Data quality is crucial for the success of these systems. One of the main issues that businesses in general, and hospitality businesses in particular, must solve in this setting is the identification of duplicated customers. This problem has received little attention in the recent literature, probably in part because it is not easy to state in statistical terms. In the present work, we formulate duplicated-customer identification as a large-scale data analysis problem, and we propose and benchmark a general-purpose solution for it. Our system consists of four basic elements: (a) a generic feature representation for the customer fields in a simple table-shaped database; (b) an efficient distance for comparing feature values, namely the Levenshtein distance computed with the Wagner-Fischer algorithm; (c) a big-data implementation using basic map-reduce techniques to readily support the comparison of strategies; (d) an X-from-M criterion to identify the possible neighbors of a duplicated-customer candidate. We analyze the mass density function of the distances in the CRM text-based fields and characterize their behavior and consistency in terms of the entropy and the mutual information of these fields. Our experiments on a large CRM from a multinational hospitality chain show that the distance distributions are statistically consistent for each feature, and that neighborhood thresholds are adjusted automatically by the system in a first step and can subsequently be tuned more finely according to the manager's experience. The entropy distributions for the different variables, as well as the mutual information between pairs, are characterized by multimodal profiles, in which a wide gap between close and far fields is often present. This motivates the proposal of the so-called X-from-M strategy, which is shown to be computationally affordable and can provide the expert with a reduced number of duplicate candidates to supervise, with low X values being enough to ensure the sensitivity required at the automatic detection stage. The proposed system supports the benefits of big-data technologies in CRM scenarios for hotel chains and, rather than relying on ad-hoc heuristic rules, it promotes the research and development of theoretically principled approaches.
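The building blocks named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example records, and the thresholds are hypothetical, and the X-from-M rule is read here as "flag a pair when at least X of the M compared fields fall within their distance thresholds", which is an assumption, since the record does not define the criterion. The map-reduce layer is omitted.

```python
from collections import Counter
from math import log2


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic program (two rows)."""
    if len(a) < len(b):
        a, b = b, a  # iterate the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def entropy(values) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mutual_information(xs, ys) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def x_from_m(rec_a: dict, rec_b: dict, thresholds: dict, x: int) -> bool:
    """Assumed reading of X-from-M: at least x of the M fields are 'close'."""
    close = sum(levenshtein(rec_a[f], rec_b[f]) <= t
                for f, t in thresholds.items())
    return close >= x


# Hypothetical records and thresholds, for illustration only.
rec_a = {"name": "Maria", "surname": "Gonzalez", "email": "mgonzalez@example.com"}
rec_b = {"name": "Mara", "surname": "Gonzales", "email": "m.gonzalez@example.com"}
rec_c = {"name": "Peter", "surname": "Smith", "email": "psmith@example.org"}
thresholds = {"name": 1, "surname": 1, "email": 2}  # M = 3 fields

is_candidate = x_from_m(rec_a, rec_b, thresholds, x=2)
is_unrelated = x_from_m(rec_a, rec_c, thresholds, x=2)
```

In this sketch, entropy and mutual_information would be applied to the lists of per-field distances collected over many record pairs, which is one plausible way to characterize the distance distributions the description refers to.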
topic Customer Relationship Management
hospitality industry
big data
duplicate detection
name matching
Levenshtein distance
X-from-M strategy
entropy
mutual information
mass density function
url https://www.mdpi.com/1099-4300/21/4/419