A machine learning approach to enhance the privacy of customers

During a phone call between a customer and a company representative, a great deal of information is exchanged: everything from the customer's name, identification number, and home address to conversations about the weather and other everyday topics. Knowledge of its customer base is an important part of a company's business. There is therefore a...

Bibliographic Details
Main Authors:	Anderberg, Jesper; Fathullah, Nazdar
Format:	Others
Language:	English
Published:	Malmö universitet, Fakulteten för teknik och samhälle (TS) 2019
Subjects:	Engineering and Technology Teknik och teknologier
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20629

id	ndltd-UPSALLA1-oai-DiVA.org-mau-20629
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-mau-20629 | 2020-10-28T05:38:26Z | A machine learning approach to enhance the privacy of customers | eng | Anderberg, Jesper; Fathullah, Nazdar | Malmö universitet, Fakulteten för teknik och samhälle (TS) | Malmö universitet/Teknik och samhälle | 2019 | Engineering and Technology; Teknik och teknologier | Student thesis | info:eu-repo/semantics/bachelorThesis | text | http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20629 | Local 30440 | application/pdf | info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Engineering and Technology Teknik och teknologier
description	During a phone call between a customer and a company representative, a wide range of information is exchanged, from the customer's name, identification number, and home address to small talk about the weather and other general topics. A company's knowledge of its customers is a vital part of its business, so analyzing these conversations in the form of transcripts may be necessary to develop and improve its overall customer service and customer knowledge. However, new legislation such as the GDPR requires special care when storing personal information. In this work, we examine the possibility of using two machine learning algorithms to classify data from a transcribed phone call so that sensitive information can be left out. The machine learning model is built following an iterative system development method. The Naive Bayes and Support Vector Machine algorithms are used to classify sensitive data, such as a person's name and location. The system is evaluated using 10-fold cross-validation, learning curves, classification reports, and ROC curves. The results show that the algorithm achieves higher accuracy when the dataset contains more data samples than when it contains fewer. Finally, pre-processing the data increases the accuracy of the machine learning models.
author	Anderberg, Jesper; Fathullah, Nazdar
title	A machine learning approach to enhance the privacy of customers
publisher	Malmö universitet, Fakulteten för teknik och samhälle (TS)
publishDate	2019
url	http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20629
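The abstract describes classifying the tokens of a transcribed call as sensitive (for example a person's name or a location) with Naive Bayes and a Support Vector Machine, evaluated with 10-fold cross-validation. The record does not include the thesis code, so what follows is only a minimal, dependency-free Python sketch of that idea under stated assumptions: each token gets a binary label (S = sensitive, O = ordinary), the feature set in the hypothetical token_features helper is invented, the SVM is trained with the Pegasos sub-gradient method (a swapped-in choice; the abstract does not say how its SVM was trained), and the toy transcript lines are made up. Only mean cross-validated token accuracy is computed; learning curves, classification reports, and ROC curves are left out.

```python
import math
import random
from collections import Counter, defaultdict

SENSITIVE, ORDINARY, BIAS = "S", "O", "<bias>"  # assumed labels; BIAS is the SVM bias feature


def token_features(tokens, i):
    """Hypothetical features: the word, its capitalisation and its neighbours."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return [f"w={tokens[i].lower()}", f"cap={tokens[i][:1].isupper()}",
            f"prev={prev}", f"next={nxt}"]


def to_examples(sentences):
    """Flatten (tokens, labels) pairs into per-token (features, label) examples."""
    return [(token_features(toks, i), lab)
            for toks, labs in sentences for i, lab in enumerate(labs)]


class NaiveBayes:
    """Multinomial Naive Bayes over feature counts with add-one smoothing."""
    def fit(self, examples):
        self.class_counts = Counter(lab for _, lab in examples)
        self.feat_counts = defaultdict(Counter)
        for feats, lab in examples:
            self.feat_counts[lab].update(feats)
        self.vocab = {f for feats, _ in examples for f in feats}
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        known = [f for f in feats if f in self.vocab]  # unseen features are ignored
        def log_posterior(lab):  # unnormalised log P(lab | feats)
            counts = self.feat_counts[lab]
            denom = sum(counts.values()) + len(self.vocab)
            return math.log(self.class_counts[lab] / total) + sum(
                math.log((counts[f] + 1) / denom) for f in known)
        return max(self.class_counts, key=log_posterior)


class LinearSVM:
    """Binary linear SVM trained with Pegasos (stochastic sub-gradient, hinge loss + L2)."""
    def __init__(self, lam=0.01, epochs=20, seed=0):
        self.lam, self.epochs, self.seed, self.w = lam, epochs, seed, {}

    def _score(self, feats):
        return sum(self.w.get(f, 0.0) for f in feats + [BIAS])

    def fit(self, examples):
        rng, data, t = random.Random(self.seed), list(examples), 0
        self.w = defaultdict(float)
        for _ in range(self.epochs):
            rng.shuffle(data)
            for feats, lab in data:
                t += 1
                eta = 1.0 / (self.lam * t)        # Pegasos step size
                y = 1.0 if lab == SENSITIVE else -1.0
                margin = y * self._score(feats)   # computed with the old weights
                for f in self.w:                  # L2 shrinkage of every weight
                    self.w[f] *= 1.0 - eta * self.lam
                if margin < 1.0:                  # hinge loss active: step towards y
                    for f in feats + [BIAS]:
                        self.w[f] += eta * y
        return self

    def predict(self, feats):
        return SENSITIVE if self._score(feats) > 0 else ORDINARY


def cross_val_accuracy(make_model, sentences, k=10):
    """k-fold cross-validation over whole sentences; returns mean token accuracy."""
    folds = [f for f in (sentences[i::k] for i in range(k)) if f]  # drop empty folds
    scores = []
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = make_model().fit(to_examples(train))
        examples = to_examples(test)
        scores.append(sum(model.predict(f) == y for f, y in examples) / len(examples))
    return sum(scores) / len(scores)


def redact(text, model):
    """Replace every token the model labels as sensitive with [REDACTED]."""
    tokens = text.split()
    return " ".join("[REDACTED]" if model.predict(token_features(tokens, i)) == SENSITIVE
                    else tok for i, tok in enumerate(tokens))


# Toy, hand-labelled transcript lines, invented for illustration only.
RAW = [("my name is Anna Berg", "OOOSS"), ("I live in Malmo", "OOOS"),
       ("the weather is nice today", "OOOOO"), ("this is Erik calling from Lund", "OOSOOS"),
       ("can you check my invoice", "OOOOO"), ("my name is Sara Holm", "OOOSS"),
       ("I moved to Uppsala last year", "OOOSOO"), ("it is raining in the city", "OOOOOO")]
SENTENCES = [(text.split(), list(labels)) for text, labels in RAW]
```

For example, redact("hello my name is Nils", NaiveBayes().fit(to_examples(SENTENCES))) returns "hello my name is [REDACTED]", since the unseen word is capitalised and follows "is". Calling cross_val_accuracy(LinearSVM, SENTENCES, k=4) uses four folds only because the toy set has eight sentences, whereas the abstract reports 10-fold cross-validation; accuracy on eight invented lines says nothing about the thesis results.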

Similar Items