Preparing Existing Metadata for Repository Batch Import: A Recipe for a Fickle Food
In 2016, the University of Waterloo began offering a mediated copyright review and deposit service to support the growth of our institutional repository UWSpace. This resulted in the need to batch import large lists of published works into the institutional repository quickly and accurately. A range of methods have been proposed for harvesting publications metadata en masse, but many technological solutions can easily become detached from a workflow that is both reproducible for support staff and applicable to a range of situations. Many repositories offer the capacity for batch upload via CSV, so our method provides a template Python script that leverages the Habanero library for populating CSV files with existing metadata retrieved from the CrossRef API. In our case, we have combined this with useful metadata contained in a TSV file downloaded from Web of Science in order to enrich our metadata as well. The appeal of this ‘low-maintenance’ method is that it provides more robust options for gathering metadata semi-automatically, and only requires the user’s ability to access Web of Science and the Python program, while still remaining flexible enough for local customizations.
Main Authors: | William Roy, Chris Gray |
---|---|
Format: | Article |
Language: | English |
Published: | Code4Lib, 2018-11-01 |
Series: | Code4Lib Journal |
ISSN: | 1940-5758 |
Online Access: | https://journal.code4lib.org/articles/13895 |
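
The article's script itself is not reproduced in this record, but the first step the abstract describes — using the Habanero library to pull existing metadata from the CrossRef API into a CSV suitable for batch upload — can be sketched roughly as follows. The DOI list, column names, and output filename here are illustrative assumptions, not the authors' actual template.

```python
# A minimal sketch of the CrossRef-to-CSV step described in the abstract.
# Assumes habanero is installed (pip install habanero); the DOIs and CSV
# columns below are placeholders, not the authors' actual script.
import csv

from habanero import Crossref

cr = Crossref()

# Hypothetical list of DOIs for the works to be deposited.
dois = ["10.1000/example.doi.1", "10.1000/example.doi.2"]

rows = []
for doi in dois:
    # cr.works(ids=...) returns the CrossRef record under 'message'.
    message = cr.works(ids=doi)["message"]
    rows.append({
        "doi": doi,
        "title": (message.get("title") or [""])[0],
        "authors": "; ".join(
            f"{a.get('family', '')}, {a.get('given', '')}"
            for a in message.get("author", [])
        ),
        "journal": (message.get("container-title") or [""])[0],
        "date_issued": "-".join(
            str(part) for part in message["issued"]["date-parts"][0]
        ),
        "publisher": message.get("publisher", ""),
    })

with open("batch_import.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```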
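
The abstract also mentions enriching those rows with metadata from a Web of Science TSV export. A rough sketch of that merge, keyed on DOI, might look like the following; the file names and the 'DI' (DOI) and 'AB' (abstract) column tags are assumptions about a typical WoS tab-delimited export, and may need adjusting for a local export format.

```python
# A rough sketch of the Web of Science enrichment step: join a WoS
# tab-delimited export onto the CrossRef CSV by DOI. File names and the
# 'DI'/'AB' field tags are assumed, not taken from the authors' script.
import csv

# Index the WoS rows by lower-cased DOI ('utf-8-sig' tolerates a BOM).
with open("wos_export.tsv", encoding="utf-8-sig") as f:
    wos_by_doi = {
        row["DI"].lower(): row
        for row in csv.DictReader(f, delimiter="\t")
        if row.get("DI")
    }

# Read the CrossRef CSV and attach the WoS abstract where one is available.
with open("batch_import.csv", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    wos = wos_by_doi.get(row["doi"].lower(), {})
    row["abstract"] = wos.get("AB", "")

with open("batch_import_enriched.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```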