Automatically Detecting Errors in Employer Industry Classification Using Job Postings

Abstract: In the recruitment domain, knowing the employer industry of each job is important for gaining insight into the demand in each industry. The existing system at CareerBuilder uses an employer name normalization system and an employer knowledge base (KB) to infer the employer industry of a job. How...

Bibliographic Details
Main Authors: Alan Chern, Qiaoling Liu, Josh Chao, Mahak Goindani, Faizan Javed
Format: Article
Language: English
Published: SpringerOpen 2018-08-01
Series: Data Science and Engineering
Subjects: Employer industry classification; Job postings; Multiclass classification; Error detection
Online Access: http://link.springer.com/article/10.1007/s41019-018-0071-7
id doaj-3156dbfafb134ee696ad2d0f1e515874
record_format Article
spelling doaj-3156dbfafb134ee696ad2d0f1e515874; 2021-03-02T04:34:43Z; eng; SpringerOpen; Data Science and Engineering; ISSN 2364-1185, 2364-1541; 2018-08-01; vol. 3, no. 3, pp. 221-231; doi 10.1007/s41019-018-0071-7
Automatically Detecting Errors in Employer Industry Classification Using Job Postings
Alan Chern (CareerBuilder); Qiaoling Liu (CareerBuilder); Josh Chao (CareerBuilder); Mahak Goindani (Purdue University); Faizan Javed (CareerBuilder)
collection DOAJ
language English
format Article
sources DOAJ
author Alan Chern
Qiaoling Liu
Josh Chao
Mahak Goindani
Faizan Javed
title Automatically Detecting Errors in Employer Industry Classification Using Job Postings
publisher SpringerOpen
series Data Science and Engineering
issn 2364-1185
2364-1541
publishDate 2018-08-01
description Abstract: In the recruitment domain, knowing the employer industry of each job is important for gaining insight into the demand in each industry. The existing system at CareerBuilder uses an employer name normalization system and an employer knowledge base (KB) to infer the employer industry of a job. However, errors may occur both when the employer of a job is computed and when the employer KB is populated with industry attributes. Because the KB is huge, detecting these errors manually is not feasible, so in this paper we use machine learning techniques to detect them automatically. Observing that the main jobs posted by an employer often relate to the employer's industry (e.g., truck driver jobs often correspond to employers in the transportation industry), we develop a system that classifies the industry of an employer using job posting data. We aggregate the job postings of each employer and derive features from employer names, employer descriptions, job titles, and job descriptions to predict the industry of the employer. Two models are used for classification: (1) support vector machine (SVM) and (2) random forest. Our experiments show that random forest is more effective than SVM at identifying errors in the existing industry classification system, achieving precision 0.69, recall 0.78, and F-score 0.73. It is especially better at handling mixed feature vectors when normalization errors occur. We also observe that our models generally perform better at detecting errors for industries that have higher error rates.
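The error-detection idea in the abstract (aggregate the postings of each employer, predict an industry, and flag employers whose prediction disagrees with the KB) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the paper uses SVM and random forest, whereas this sketch swaps in a simple nearest-centroid classifier over word counts so that it stays self-contained. All function names, employer names, and postings are hypothetical toy values. The last helper shows that the reported F-score is consistent with the harmonic mean of the reported precision and recall.

```python
import math
from collections import Counter, defaultdict

def aggregate_postings(postings):
    """Merge all postings of one employer into a single bag of words."""
    bags = defaultdict(Counter)
    for employer, text in postings:
        bags[employer].update(text.lower().split())
    return dict(bags)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def train_centroids(bags, labels):
    """Stand-in classifier (NOT the paper's SVM or random forest):
    one summed word-count vector per industry."""
    centroids = defaultdict(Counter)
    for employer, bag in bags.items():
        centroids[labels[employer]].update(bag)
    return dict(centroids)

def predict(bag, centroids):
    """Return the industry whose centroid is most similar to the bag."""
    return max(sorted(centroids), key=lambda ind: cosine(bag, centroids[ind]))

def flag_errors(bags, kb_industry, centroids):
    """Employers whose predicted industry disagrees with the KB entry."""
    return sorted(e for e, bag in bags.items()
                  if predict(bag, centroids) != kb_industry[e])

def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical employers with trusted industry labels (toy data).
train_postings = [
    ("Acme Freight", "truck driver cdl long haul"),
    ("Acme Freight", "dispatcher truck route"),
    ("City Clinic", "registered nurse patient care"),
    ("City Clinic", "medical assistant clinic patient"),
]
train_labels = {"Acme Freight": "transportation", "City Clinic": "healthcare"}
centroids = train_centroids(aggregate_postings(train_postings), train_labels)

# Hypothetical KB entries to audit; "Road Kings" is deliberately mislabeled.
kb_postings = [
    ("Road Kings", "truck driver local route"),
    ("Mercy Care", "nurse patient care home"),
]
kb_industry = {"Road Kings": "healthcare", "Mercy Care": "healthcare"}

flagged = flag_errors(aggregate_postings(kb_postings), kb_industry, centroids)
print(flagged)                            # employers to review
print(round(f_score(0.69, 0.78), 2))      # figures quoted in the abstract
```

In this toy run, the employer whose postings are dominated by truck driver vocabulary is flagged because the KB lists it under healthcare. A real system would replace the centroid step with a trained multiclass model and richer features from employer names and descriptions.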
topic Employer industry classification
Job postings
Multiclass classification
Error detection
url http://link.springer.com/article/10.1007/s41019-018-0071-7