Automatically Detecting Errors in Employer Industry Classification Using Job Postings
Abstract: In the recruitment domain, knowing the employer industry of jobs is important for gaining insight into the demand in each industry. The existing system at CareerBuilder uses an employer name normalization system and an employer knowledge base (KB) to infer the employer industry of a job. How...
Main Authors: | Alan Chern, Qiaoling Liu, Josh Chao, Mahak Goindani, Faizan Javed |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2018-08-01 |
Series: | Data Science and Engineering |
Subjects: | Employer industry classification; Job postings; Multiclass classification; Error detection |
Online Access: | http://link.springer.com/article/10.1007/s41019-018-0071-7 |
id |
doaj-3156dbfafb134ee696ad2d0f1e515874 |
---|---|
record_format |
Article |
spelling |
doaj-3156dbfafb134ee696ad2d0f1e515874; 2021-03-02T04:34:43Z; eng; SpringerOpen; Data Science and Engineering; ISSN 2364-1185, 2364-1541; 2018-08-01; vol. 3, no. 3, pp. 221-231; 10.1007/s41019-018-0071-7; Automatically Detecting Errors in Employer Industry Classification Using Job Postings; Alan Chern (CareerBuilder), Qiaoling Liu (CareerBuilder), Josh Chao (CareerBuilder), Mahak Goindani (Purdue University), Faizan Javed (CareerBuilder); Abstract: In the recruitment domain, knowing the employer industry of jobs is important for gaining insight into the demand in each industry. The existing system at CareerBuilder uses an employer name normalization system and an employer knowledge base (KB) to infer the employer industry of a job. However, errors may occur both when the employer of a job is computed and when the employer KB is constructed with its industry attributes. Since the KB is huge, it is not feasible to detect the errors manually. Therefore, in this paper we use machine learning techniques to detect the errors automatically. Based on the observation that the main jobs posted by an employer often relate to the employer's industry (e.g., truck driver jobs often correspond to employers in the transportation industry), we develop a system that classifies the industry of an employer using job posting data. We aggregate job postings from an employer and derive features from employer names, employer descriptions, job titles, and job descriptions to predict the industry of the employer. Two models are used for classification: (1) support vector machine (SVM) and (2) random forest. Our experiments show that random forest is more effective than SVM in identifying the errors in the existing industry classification system, achieving precision 0.69, recall 0.78, and F-score 0.73. In particular, it handles mixed feature vectors better when normalization errors occur. We also observe that our models generally perform better in detecting errors for industries that have higher error rates.; http://link.springer.com/article/10.1007/s41019-018-0071-7; Employer industry classification; Job postings; Multiclass classification; Error detection |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Alan Chern Qiaoling Liu Josh Chao Mahak Goindani Faizan Javed |
author_sort |
Alan Chern |
title |
Automatically Detecting Errors in Employer Industry Classification Using Job Postings |
title_sort |
automatically detecting errors in employer industry classification using job postings |
publisher |
SpringerOpen |
series |
Data Science and Engineering |
issn |
2364-1185 2364-1541 |
publishDate |
2018-08-01 |
description |
Abstract: In the recruitment domain, knowing the employer industry of jobs is important for gaining insight into the demand in each industry. The existing system at CareerBuilder uses an employer name normalization system and an employer knowledge base (KB) to infer the employer industry of a job. However, errors may occur both when the employer of a job is computed and when the employer KB is constructed with its industry attributes. Since the KB is huge, it is not feasible to detect the errors manually. Therefore, in this paper we use machine learning techniques to detect the errors automatically. Based on the observation that the main jobs posted by an employer often relate to the employer's industry (e.g., truck driver jobs often correspond to employers in the transportation industry), we develop a system that classifies the industry of an employer using job posting data. We aggregate job postings from an employer and derive features from employer names, employer descriptions, job titles, and job descriptions to predict the industry of the employer. Two models are used for classification: (1) support vector machine (SVM) and (2) random forest. Our experiments show that random forest is more effective than SVM in identifying the errors in the existing industry classification system, achieving precision 0.69, recall 0.78, and F-score 0.73. In particular, it handles mixed feature vectors better when normalization errors occur. We also observe that our models generally perform better in detecting errors for industries that have higher error rates. |
topic |
Employer industry classification Job postings Multiclass classification Error detection |
url |
http://link.springer.com/article/10.1007/s41019-018-0071-7 |
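
The error-detection idea summarized in the description field can be illustrated with a minimal sketch. It assumes that an employer is flagged as a likely KB error whenever a classifier's predicted industry disagrees with the industry stored in the KB, and it scores those flags against a set of known errors. The employer names, labels, and function names below are hypothetical and are not taken from the article; the classifier itself (SVM or random forest) is out of scope here and is represented only by its predictions. The last line simply checks that the reported F-score is consistent with the reported precision and recall.

```python
def flag_errors(kb_industry, predicted_industry):
    """Flag employers whose predicted industry disagrees with the KB label.

    Employers without a prediction are not flagged.
    """
    return {
        employer
        for employer, kb_label in kb_industry.items()
        if predicted_industry.get(employer, kb_label) != kb_label
    }


def precision_recall_f1(flagged, true_errors):
    """Score flagged employers against the set of known KB errors."""
    true_positives = len(flagged & true_errors)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(true_errors) if true_errors else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: KB labels, classifier predictions, known errors.
kb = {
    "acme trucking": "Transportation",
    "city clinic": "Retail",        # wrong in the KB (hypothetical)
    "bright bank": "Finance",
    "fresh mart": "Retail",
}
predicted = {
    "acme trucking": "Transportation",
    "city clinic": "Healthcare",
    "bright bank": "Retail",        # classifier mistake, a false alarm
    "fresh mart": "Retail",
}
known_errors = {"city clinic"}

flagged = flag_errors(kb, predicted)
precision, recall, f1 = precision_recall_f1(flagged, known_errors)

# Consistency check on the figures reported in the abstract.
reported_f = round(2 * 0.69 * 0.78 / (0.69 + 0.78), 2)
```

On this toy data the sketch flags two employers, one of which is a genuine KB error, so precision is lower than recall; the abstract reports the same pattern for random forest.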