Developing Amaia: A Conversational Agent for Helping Portuguese Entrepreneurs—An Extensive Exploration of Question-Matching Approaches for Portuguese

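This paper describes how we tackled the development of Amaia, a conversational agent for Portuguese entrepreneurs. After introducing the domain corpus used as Amaia’s Knowledge Base (KB), we make an extensive comparison of approaches for automatically matching user requests with Frequently Asked Questions (FAQs) in the KB, covering Information Retrieval (IR), approaches based on static and contextual word embeddings, and a model of Semantic Textual Similarity (STS) trained for Portuguese, which achieved the best performance. We further describe how we decreased the model’s complexity and improved scalability, with minimal impact on performance. In the end, Amaia combines an IR library and an STS model with reduced features. Towards a more human-like behavior, Amaia can also answer out-of-domain questions, based on a second corpus integrated in the KB. Such interactions are identified with a text classifier, also described in the paper.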

Bibliographic Details
Main Authors: José Santos, Luís Duarte, João Ferreira, Ana Alves, Hugo Gonçalo Oliveira
Format: Article
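Language: English
Published: MDPI AG 2020-09-01
Series: Information
Subjects: semantic textual similarity; question answering; conversational agents; machine learning; information retrieval; text classification
Online Access: https://www.mdpi.com/2078-2489/11/9/428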
id doaj-032549f437564037b0ef2955aa7ca917
record_format Article
spelling doaj-032549f437564037b0ef2955aa7ca917
2020-11-25T03:37:37Z
eng
MDPI AG
Information
2078-2489
2020-09-01
11
428
428
10.3390/info11090428
José Santos (CISUC, DEI, University of Coimbra, 3030-290 Coimbra, Portugal)
Luís Duarte (CISUC, DEI, University of Coimbra, 3030-290 Coimbra, Portugal)
João Ferreira (CISUC, DEI, University of Coimbra, 3030-290 Coimbra, Portugal)
Ana Alves (CISUC, DEI, University of Coimbra, 3030-290 Coimbra, Portugal)
Hugo Gonçalo Oliveira (CISUC, DEI, University of Coimbra, 3030-290 Coimbra, Portugal)
collection DOAJ
language English
format Article
sources DOAJ
author José Santos
Luís Duarte
João Ferreira
Ana Alves
Hugo Gonçalo Oliveira
title Developing Amaia: A Conversational Agent for Helping Portuguese Entrepreneurs—An Extensive Exploration of Question-Matching Approaches for Portuguese
publisher MDPI AG
series Information
issn 2078-2489
publishDate 2020-09-01
description This paper describes how we tackled the development of Amaia, a conversational agent for Portuguese entrepreneurs. After introducing the domain corpus used as Amaia’s Knowledge Base (KB), we make an extensive comparison of approaches for automatically matching user requests with Frequently Asked Questions (FAQs) in the KB, covering Information Retrieval (IR), approaches based on static and contextual word embeddings, and a model of Semantic Textual Similarity (STS) trained for Portuguese, which achieved the best performance. We further describe how we decreased the model’s complexity and improved scalability, with minimal impact on performance. In the end, Amaia combines an IR library and an STS model with reduced features. Towards a more human-like behavior, Amaia can also answer out-of-domain questions, based on a second corpus integrated in the KB. Such interactions are identified with a text classifier, also described in the paper.
topic semantic textual similarity
question answering
conversational agents
machine learning
information retrieval
text classification
url https://www.mdpi.com/2078-2489/11/9/428