Alternate Low-Rank Matrix Approximation in Latent Semantic Analysis

Latent semantic analysis (LSA) is a mathematical/statistical technique for discovering hidden concepts that relate terms and documents within a document collection (i.e., a large corpus of text). Each term and each document in the corpus is expressed as a vector whose elements correspond to these concepts, forming a term-document matrix. LSA then computes a low-rank approximation to the term-document matrix in order to remove irrelevant information, extract the more important relations, and reduce computational time. The irrelevant information, called "noise," has no noteworthy effect on the meaning of the document collection, and removing it is an essential step in LSA. The singular value decomposition (SVD) has been the main tool for obtaining this low-rank approximation. However, document collections are dynamic (i.e., the term-document matrix is subject to repeated updates), so the approximation must be renewed, either by recomputing the SVD or by updating it. The computational cost of recomputing or updating the SVD of the term-document matrix is very high when new terms and/or documents are added to a preexisting collection. This issue opened the door to using other matrix decompositions for LSA, such as ULV- and URV-based decompositions. This study shows that the truncated ULV decomposition (TULVD) is a good alternative to the SVD in LSA modeling.
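
To make the low-rank approximation step concrete, here is a minimal sketch of LSA via a rank-k truncated SVD in Python/NumPy. The toy term-document matrix and the variable names are illustrative assumptions, not taken from the paper, and the sketch shows the standard SVD-based step the abstract describes rather than the authors' TULVD method.

```python
# Minimal, illustrative sketch of LSA via a rank-k truncated SVD.
# NOT the paper's TULVD implementation; it only demonstrates the
# low-rank approximation step described in the abstract.
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# Entries are raw term counts; real systems typically use tf-idf.
A = np.array([
    [2, 0, 1, 0],   # "matrix"
    [1, 1, 0, 0],   # "rank"
    [0, 2, 0, 1],   # "semantic"
    [0, 1, 2, 2],   # "document"
], dtype=float)

k = 2  # target rank: number of latent concepts to keep

# Full SVD, then truncate to the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Rank-k approximation A_k = U_k diag(s_k) V_k^T ("noise" removed).
A_k = Uk @ np.diag(sk) @ Vtk

# Documents in the k-dimensional concept space: columns of diag(s_k) V_k^T.
doc_concepts = np.diag(sk) @ Vtk
print("rank-%d approximation error (Frobenius): %.4f"
      % (k, np.linalg.norm(A - A_k)))
print("document coordinates in concept space:\n", doc_concepts.T)
```

The rank k controls how many latent concepts are retained; in practice k is far smaller than the number of terms or documents, which is what yields the reduction in noise and computational time.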

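The abstract also notes that the approximation must be renewed as the collection changes and that recomputing or updating the SVD is expensive. For context, below is a sketch of "folding-in," a standard low-cost update from the LSA literature; it is not the TULVD update studied in the paper, and Uk and sk are assumed to come from a previous truncated SVD as in the sketch above.

```python
# Sketch of "folding-in": project a new document into an existing
# k-dimensional concept space without recomputing the SVD.
# A well-known LSA technique, shown only to motivate why full SVD
# recomputation is avoided; the paper's TULVD update differs.
import numpy as np

def fold_in_document(Uk: np.ndarray, sk: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Map a new document's term vector q into concept space:
    q_hat = Sigma_k^{-1} U_k^T q."""
    return np.diag(1.0 / sk) @ (Uk.T @ q)

# Usage, reusing Uk and sk from the previous sketch:
# q = np.array([1, 0, 1, 0], dtype=float)  # counts over the 4 known terms
# print(fold_in_document(Uk, sk, q))
```

Folding-in is cheap but lets the decomposition drift from the true SVD as updates accumulate, which is precisely the trade-off that motivates alternatives such as the TULVD.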

Bibliographic Details
Main Authors: Fahrettin Horasan, Hasan Erbay, Fatih Varçın, Emre Deniz
Format: Article
Language: English
Published: Hindawi Limited, 2019-01-01
Series: Scientific Programming
ISSN: 1058-9244, 1875-919X
Affiliation (all authors): Computer Engineering Department, Engineering Faculty, Kırıkkale University, Yahşihan, 71450 Kırıkkale, Turkey
Online Access: http://dx.doi.org/10.1155/2019/1095643