Unlike the majority of the content on the Digitizers’ blog, this post does not involve pretty pictures or interesting nuggets of historical information. Instead, it covers a very different aspect of the work performed in the Digital Initiatives unit: metadata, and more specifically descriptive metadata. It is perhaps less glamorous than the maps and images we typically share in this forum, but without a significant amount of behind-the-scenes metadata work, searching for a particular image in our digital collections would be akin to looking for a needle in an enormous pile of very similar needles. While the following procedures will probably be most interesting to our colleagues in libraries, archives, and other information science disciplines who deal with these kinds of issues on a daily basis, we also hope to show the average user both how and why we spend a significant amount of our time developing and implementing useful metadata for our collections.
BC Bibliography Metadata Overview
The BC Bibliography collection is inspired by (and based closely on) a three-volume printed bibliography of the same name. In a sense, these volumes contain only metadata. That is, the title, author, and descriptive data included in the printed bibliography serve as a guide for locating the texts and documents they refer to.
In much the same way, the metadata we create provides a means for users to locate the digital documents in our collections. The metadata schema for the BC Bibliography digital collection is designed specifically to supplement full text searches with results for related terms and to provide a faceted browsing experience in which users can select terms from several categorical lists to narrow down their results.
Preparing the metadata from the printed bibliography for use in an online environment is typically accomplished with what is known in information-professional circles as a “crosswalk,” which matches metadata fields from one “standard” schema to another. While the name calls to mind a nice clear path, in the real world things are almost never that simple. The fields in one schema seldom match up 1:1 with the fields in the desired schema, meaning that data will need to be either split from a single field into multiple fields or vice versa. In practice this is akin to fitting a square peg in a round hole.
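To make the idea concrete, here is a minimal Python sketch of a crosswalk. It is only an illustration: the source field names ("Title", "Imprint", "Note", "Annotation") and the target field names are hypothetical and are not taken from the actual BC Bibliography schema. It shows a straight 1:1 mapping, one field split into two, and two fields merged into one.

```python
def crosswalk(record):
    """Map a source record (a dict) onto a hypothetical target schema."""
    target = {}

    # 1:1 mapping: the field is copied across with whitespace trimmed.
    target["title"] = record.get("Title", "").strip()

    # One-to-many: assume the imprint looks like "Place, Publisher, Year"
    # and split the trailing year into its own field.
    imprint = record.get("Imprint", "")
    publisher, sep, year = imprint.rpartition(",")
    if sep:
        target["publisher"] = publisher.strip()
        target["date"] = year.strip()
    else:
        # No comma found, so nothing can safely be split off.
        target["publisher"] = imprint.strip()
        target["date"] = ""

    # Many-to-one: merge several free-text fields into one description.
    notes = [record.get(key, "").strip() for key in ("Note", "Annotation")]
    target["description"] = " ".join(note for note in notes if note)

    return target
```

Even in this toy version, the split step has to guess at the structure of the source field, which is exactly where real crosswalks tend to get messy.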
The final metadata for the BC Bibliography digital collection involved crosswalking three (or more) separate metadata sources into one. Because we wanted to provide the most complete metadata possible, we decided to include not only the data from the original print bibliographies (which are missing subject data crucial to the desired faceted browsing experience), but also data from the contributing institution’s library catalogue record and the most complete record available through WorldCat (an industry-standard catalogue record aggregator). As one might imagine, this presented a host of issues, including both the basic logistics of how to collect and combine all this data and how to handle the numerous duplicates and inconsistencies inherent to mashing together so many disparate sets of data. Furthermore, the Library of Congress subject headings we were able to collect through this method proved unsuitable for the type of faceted browsing planned for the BC Bibliography collection, and needed to be split into their constituent phrases (rather than kept in the familiar dash-separated format). Ultimately we were able to combine these records, split and reformat the subject fields, normalize the data, and remove all duplicates to end up with a properly formatted tab-separated text file for batch uploading into our CONTENTdm-based collection. If you’d like to know more about how it was all done, read on…
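As a rough illustration of the split, normalize, and de-duplicate steps described above, here is a minimal Python sketch. It is not the process we actually used. The function names, the column names, and the choice of "; " to join multiple subject terms in one field are all assumptions made for the example, and it assumes headings use a double hyphen ("--") between subdivisions.

```python
import csv
import io
import re


def split_subjects(headings):
    """Split dash-separated subject headings into unique constituent terms."""
    seen = set()
    terms = []
    for heading in headings:
        # Assumed convention: subdivisions are separated by "--".
        for part in re.split(r"\s*--\s*", heading):
            term = part.strip().rstrip(".")
            key = term.casefold()  # compare case-insensitively
            if term and key not in seen:
                seen.add(key)
                terms.append(term)
    return terms


def to_tsv(rows, fieldnames):
    """Serialize a list of dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical usage: headings for one item gathered from several sources.
subjects = split_subjects([
    "British Columbia -- History.",
    "Indians of North America--British Columbia",
])
row = {"title": "Example title", "subject": "; ".join(subjects)}
tsv_text = to_tsv([row], ["title", "subject"])
```

Keeping the first spelling encountered for each term is a simplification; a real workflow would likely check terms against a controlled vocabulary instead.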