The Analysis of Use Optical Character Recognition to Establish the Full-text Retrieval Database:A Case Study of the Anthology of Chinese Literature in Ming
Master's === National Chengchi University === Graduate Institute of Library, Information and Archival Studies === 105 === Digital archives convert collections into digital images and place them online for users to browse, which helps preserve the collections and promote their content. In an era of information explosion, however, digital archives can...
Main Authors: | Tsai, Han Wei, 蔡瀚緯 |
---|---|
Other Authors: | Lin, Chiao Min |
Format: | Others |
Language: | zh-TW |
Online Access: | http://ndltd.ncl.edu.tw/handle/fa92n7 |
id |
ndltd-TW-105NCCU5447031 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-105NCCU5447031 2019-05-15T23:39:15Z http://ndltd.ncl.edu.tw/handle/fa92n7 The Analysis of Use Optical Character Recognition to Establish the Full-text Retrieval Database:A Case Study of the Anthology of Chinese Literature in Ming 運用光學字元辨識技術建置數位典藏全文資料庫之評估:以明人文集為例 Tsai, Han Wei 蔡瀚緯 Master's, National Chengchi University, Graduate Institute of Library, Information and Archival Studies, 105 Digital archives convert collections into digital images and place them online for users to browse, which helps preserve the collections and promote their content. In an era of information explosion, however, digital archives cannot help users retrieve information within the collections by recording metadata alone. Only when full-text retrieval is built in can digital archives give users quick access to the information they want, and optical character recognition (OCR) can help produce that full text. This study examines how the format of ancient books and the quality of their images affect recognition results by running OCR software on ancient books of the Ming dynasty. It also explores institutional and individual views and considerations through in-depth interviews with institutional staff who have experience with full-text digital archive projects. The results show that both the format and the image quality of ancient books influence recognition results; the interviews overall, however, suggest that the technology has overcome the limitations of format provided that image quality is high. That is, the quality of the book images is the most influential factor in recognition results. Although OCR has made breakthroughs in assisting the construction of full-text databases, most institutions have not yet applied it to converting their digital archives to full text, owing to unfamiliarity with the technology, budget, human resources, and other factors.
The study suggests that an institution interested in a full-text digital archive project should first develop a proper standard operating procedure (SOP) and understand the condition of the collections it intends to convert to text, so that it can adopt a suitable input method. Second, the institution should communicate more with the OCR company to determine whether the chosen collection is cost-effective to process. Finally, considering both institutions and users, the study suggests that institutions cooperate with OCR companies in the future so that users can select collections for OCR themselves and return the proofread full text to the institution as feedback. This would both reveal users' needs and reduce the institution's proofreading costs. Lin, Chiao Min 林巧敏 thesis 173 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chengchi University === Graduate Institute of Library, Information and Archival Studies === 105 === Digital archives convert collections into digital images and place them online for users to browse, which helps preserve the collections and promote their content. In an era of information explosion, however, digital archives cannot help users retrieve information within the collections by recording metadata alone. Only when full-text retrieval is built in can digital archives give users quick access to the information they want, and optical character recognition (OCR) can help produce that full text.
This study examines how the format of ancient books and the quality of their images affect recognition results by running OCR software on ancient books of the Ming dynasty. It also explores institutional and individual views and considerations through in-depth interviews with institutional staff who have experience with full-text digital archive projects. The results show that both the format and the image quality of ancient books influence recognition results; the interviews overall, however, suggest that the technology has overcome the limitations of format provided that image quality is high. That is, the quality of the book images is the most influential factor in recognition results. Although OCR has made breakthroughs in assisting the construction of full-text databases, most institutions have not yet applied it to converting their digital archives to full text, owing to unfamiliarity with the technology, budget, human resources, and other factors.
The study suggests that an institution interested in a full-text digital archive project should first develop a proper standard operating procedure (SOP) and understand the condition of the collections it intends to convert to text, so that it can adopt a suitable input method. Second, the institution should communicate more with the OCR company to determine whether the chosen collection is cost-effective to process. Finally, considering both institutions and users, the study suggests that institutions cooperate with OCR companies in the future so that users can select collections for OCR themselves and return the proofread full text to the institution as feedback. This would both reveal users' needs and reduce the institution's proofreading costs.
|
author2 |
Lin, Chiao Min |
author_facet |
Lin, Chiao Min Tsai, Han Wei 蔡瀚緯 |
author |
Tsai, Han Wei 蔡瀚緯 |
spellingShingle |
Tsai, Han Wei 蔡瀚緯 The Analysis of Use Optical Character Recognition to Establish the Full-text Retrieval Database:A Case Study of the Anthology of Chinese Literature in Ming |
author_sort |
Tsai, Han Wei |
title |
The Analysis of Use Optical Character Recognition to Establish the Full-text Retrieval Database:A Case Study of the Anthology of Chinese Literature in Ming |
title_short |
The Analysis of Use Optical Character Recognition to Establish the Full-text Retrieval Database:A Case Study of the Anthology of Chinese Literature in Ming |
title_full |
The Analysis of Use Optical Character Recognition to Establish the Full-text Retrieval Database:A Case Study of the Anthology of Chinese Literature in Ming |
title_fullStr |
The Analysis of Use Optical Character Recognition to Establish the Full-text Retrieval Database:A Case Study of the Anthology of Chinese Literature in Ming |
title_full_unstemmed |
The Analysis of Use Optical Character Recognition to Establish the Full-text Retrieval Database:A Case Study of the Anthology of Chinese Literature in Ming |
title_sort |
analysis of use optical character recognition to establish the full-text retrieval database:a case study of the anthology of chinese literature in ming |
url |
http://ndltd.ncl.edu.tw/handle/fa92n7 |
work_keys_str_mv |
AT tsaihanwei theanalysisofuseopticalcharacterrecognitiontoestablishthefulltextretrievaldatabaseacasestudyoftheanthologyofchineseliteratureinming AT càihànwěi theanalysisofuseopticalcharacterrecognitiontoestablishthefulltextretrievaldatabaseacasestudyoftheanthologyofchineseliteratureinming AT tsaihanwei yùnyòngguāngxuézìyuánbiànshíjìshùjiànzhìshùwèidiǎncángquánwénzīliàokùzhīpínggūyǐmíngrénwénjíwèilì AT càihànwěi yùnyòngguāngxuézìyuánbiànshíjìshùjiànzhìshùwèidiǎncángquánwénzīliàokùzhīpínggūyǐmíngrénwénjíwèilì AT tsaihanwei analysisofuseopticalcharacterrecognitiontoestablishthefulltextretrievaldatabaseacasestudyoftheanthologyofchineseliteratureinming AT càihànwěi analysisofuseopticalcharacterrecognitiontoestablishthefulltextretrievaldatabaseacasestudyoftheanthologyofchineseliteratureinming |
_version_ |
1719150209355743232 |