Library metadata standards in the Web 2.0 era

Wednesday 29 Oct 2014

Posted by Dominic in Information Management

Tags

authority control, BIBFRAME, Charles Ammi Cutter, DITA, information architecture, information management, information needs, information organisation, information retrieval, LCSH, Library of Congress, library OPACs, MARC, markup languages, metadata, programming languages, Web 2.0, WorldCat, XML

I shall begin this blog post with a pair of suitably dramatic quotations, purely in order to get the reader’s attention:

Metadata absolutely tells you everything about somebody’s life. If you have enough metadata, you don’t really need content.

—Stewart Baker, former General Counsel of the National Security Agency.

We kill people based on metadata.

—Michael Hayden, former Director of the NSA and the Central Intelligence Agency.

(Thanks to David Cole of the New York Review of Books.)

Of course, not all applications of metadata (from the Greek meta-, meaning “change” or “beyond”; used in English to indicate the abstraction of a concept, in this case “data about data”) are literally a matter of life or death (although a trained cataloguer like me might beg to differ!). In the Information Age, however, collecting, maintaining and providing access to metadata is one of the most important issues facing any library, or indeed any organisation that requires a constant supply of relevant, good-quality information to function effectively.

The mainstay of the bibliographic metadata framework used in libraries is the Machine-Readable Cataloguing (MARC) record. This is a means of recording bibliographic metadata (in up to 999 separate fields and further sub-fields) in such a way that it can be read by a machine (but also by a trained human), essentially by providing a repeatable framework with clearly defined parameters that an automated process can understand. It was developed by the Library of Congress (LC) and first trialled in the mid-1960s; the LC’s importance and prestige in the LIS sector, both in the United States and abroad, ensured that it soon spread around the world.
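To make the field-and-sub-field structure concrete, here is a minimal sketch in Python of how a MARC-style record might be modelled. It is a simplification under my own assumptions: the class names are invented, real MARC records also carry a leader, control fields and indicators that are omitted here, and the record content is illustrative rather than copied from any catalogue. The tags used are the standard ones for main entry (100), title statement (245) and topical subject (650).

```python
from dataclasses import dataclass, field


@dataclass
class Subfield:
    code: str   # single-character sub-field code, e.g. "a"
    value: str


@dataclass
class Field:
    tag: str    # three-digit field tag, "001" to "999"
    subfields: list[Subfield] = field(default_factory=list)

    def get(self, code: str) -> list[str]:
        """Return the values of every sub-field with the given code."""
        return [s.value for s in self.subfields if s.code == code]


@dataclass
class Record:
    fields: list[Field] = field(default_factory=list)

    def by_tag(self, tag: str) -> list[Field]:
        """Return every field carrying the given tag (some tags may repeat)."""
        return [f for f in self.fields if f.tag == tag]


# Illustrative record only; not a real catalogue entry.
record = Record([
    Field("100", [Subfield("a", "Cutter, Charles A.")]),
    Field("245", [Subfield("a", "Rules for a dictionary catalog")]),
    Field("650", [Subfield("a", "Cataloging")]),
    Field("650", [Subfield("a", "Library catalogs")]),
])

title = record.by_tag("245")[0].get("a")[0]
subjects = [f.get("a")[0] for f in record.by_tag("650")]
```

Because every field has a fixed tag and every sub-field a fixed code, an automated process can pull out the title or the subjects without understanding anything else about the record, which is the point of the "machine-readable" part of the name.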

A key reason for libraries to adopt the MARC record format for bibliographic metadata was to save time and resources. Provided that all cataloguers used the same standards, each new book or similar item only needed to be catalogued once, with the cataloguing authority then sharing its record with other libraries. (The official MARC standards website links to numerous sets of authority codes for display in the relevant fields, such as country of origin, language, and how named individuals and organisations are related to the item being catalogued; this does not include further cataloguing standards such as the Anglo-American Cataloguing Rules and the comparatively new Resource Description and Access.) Initially this sharing was carried out by the LC offering a subscription service to computer-printed catalogue cards, but the development of the Internet in subsequent decades allowed for the much quicker transfer of information through cyberspace. The Online Computer Library Center (OCLC), which administers WorldCat, the world’s largest online public-access catalogue (OPAC), dates back to 1967.

The development of online library catalogues sounded the death knell for the traditional card index systems, but the underlying metadata standards remained the same. (Photo credit: Ted Eytan)

Yet even as the methods of communication of, and access to, library resources changed, the metadata standards (albeit with periodic revisions) remained substantially the same. Just as the Dewey Decimal Classification and the Library of Congress Classification schemes date from the second half of the nineteenth century, so the principles for a library catalogue are still underpinned by Charles Ammi Cutter’s original objectives from the same period. Thus a modern library OPAC, whilst much more convenient to use than its cumbersome predecessor, is still used mostly to search by author, title or subject (the last of these either directly by using controlled vocabulary such as the Library of Congress Subject Headings (LCSH), or by proxy with a classification system).

The development of more general markup languages and data-interchange formats for describing metadata, such as Extensible Markup Language (XML) and JavaScript Object Notation (JSON), has opened up many new avenues for library applications, as this twelve-year-old prediction indicates:

[XML] has the potential to exceed the impact of MARC on librarianship. While MARC is limited to bibliographic description […] XML provides a highly-effective framework for encoding anything from a bibliographic record for a book to the book itself.

—Roy Tennant, editor of XML in Libraries (2002).
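As a rough illustration of what encoding a bibliographic record in XML (and then JSON) can look like, here is a short Python sketch using only the standard library. The element names are my own invention for the purposes of the example and do not follow any published schema; the subject terms are likewise illustrative.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified record: element names are invented for illustration.
XML_RECORD = """
<record>
  <editor>Tennant, Roy</editor>
  <title>XML in Libraries</title>
  <year>2002</year>
  <subject>XML</subject>
  <subject>Libraries</subject>
</record>
"""


def record_to_dict(xml_text: str) -> dict:
    """Parse the XML and collect each element's text under its tag name.

    Every value is a list, so that repeated elements (such as subjects)
    are handled in the same way as single ones.
    """
    root = ET.fromstring(xml_text.strip())
    result: dict = {}
    for child in root:
        result.setdefault(child.tag, []).append((child.text or "").strip())
    return result


record = record_to_dict(XML_RECORD)
as_json = json.dumps(record, indent=2)  # the same record, re-expressed as JSON
```

The same information survives the trip from one format to the other, which is what makes these formats useful for data migration and systems interoperability.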

The contributing authors to XML in Libraries identify seven key applications for the language:

Library catalogue records
Interlibrary loans
Cataloguing and indexing
Collection development
Databases
Data migration
Systems interoperability

Whilst most of these developments have taken place “behind the scenes”, one visible indicator of progress is that many academic library OPACs, such as that of my home institution, City University London, can export a bibliographic record in a format that various reference management programs understand. This is a similar process to exporting a website using XML (WordPress allows me to export the entire contents of this blog in this way, if I so choose).
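To show the kind of conversion involved, here is a minimal Python sketch that writes a record out in RIS, one of the plain-text formats commonly read by reference managers. The dictionary keys and the sample record are assumptions of mine, and only a handful of RIS tags are covered; a real export would handle many more fields and item types.

```python
def to_ris(record: dict) -> str:
    """Serialise a simple book record as RIS text.

    Each RIS line is a two-letter tag, two spaces, a hyphen, a space
    and then the value; "TY" opens the record and "ER" closes it.
    """
    lines = ["TY  - BOOK"]
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    lines.append(f"TI  - {record['title']}")
    if "year" in record:
        lines.append(f"PY  - {record['year']}")
    if "publisher" in record:
        lines.append(f"PB  - {record['publisher']}")
    lines.append("ER  - ")
    return "\n".join(lines)


# Hypothetical record, purely for illustration.
example = {
    "authors": ["Example, Ann", "Sample, Bob"],
    "title": "An Example Monograph",
    "year": 2014,
}
ris_text = to_ris(example)
```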

Yet despite these developments, and other features of modern library OPACs about which I have already posted (such as the use of “professional” colour schemes and the embedding of multimedia material using APIs), the metadata itself remains static, even as Web 2.0 becomes ever more dynamic and engaging for its users. This is why the LC is developing a new model, the Bibliographic Framework Initiative (BIBFRAME), which aims (eventually) to replace the ageing MARC records.

The key advantage that BIBFRAME has over MARC is that it is designed around a Linked Data model. MARC records have certain “linking fields” in which such data can be added (for example, the name of a series within which an individual monograph is published), but my impression from using the records regularly as an information professional is that each one is a discrete entity with the links tacked on as an afterthought. That is only to be expected, given that the format predates the practice of hyperlinking by several decades. In a BIBFRAME record, every aspect of metadata is relational and therefore searchable; this allows the searcher to move beyond Cutter’s simple objectives to achieve far greater precision in terms of information retrieval. But that’s not all. Reading the BIBFRAME overview, for me the key element is:

Information Resources can then be re-assembled into a coherent architecture that allows for cooperative cataloging at a far more granular level than before. Then, as we leverage the Web as an architecture for data, whenever updates to these Resources are performed (e.g. someone adds new information about a Person, new mappings related to a Subject, etc.) notification events can occur to automatically update systems that reference these Resources. Further, these information assets can now be more effectively utilized at a granular level and provide a richer substrate in which local collections, special collections and third party data can easily annotate and contextualize cooperative library content.
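The two ideas in that passage, metadata held as relationships between resources and notifications when a resource is updated, can be sketched in a few lines of Python. This is only a toy under my own assumptions: the identifiers and predicate names below are invented, and it does not use BIBFRAME's actual vocabulary or any Linked Data library.

```python
from typing import Callable, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """A toy store of subject-predicate-object statements."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self.subscribers: dict[str, list[Callable[[Triple], None]]] = {}

    def subscribe(self, resource: str, callback: Callable[[Triple], None]) -> None:
        """Ask to be told whenever a new statement about `resource` is added."""
        self.subscribers.setdefault(resource, []).append(callback)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        triple = (subject, predicate, obj)
        if triple in self.triples:
            return  # already known, so nothing to announce
        self.triples.add(triple)
        for callback in self.subscribers.get(subject, []):
            callback(triple)

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[Triple]:
        """Return all statements matching the parts given (None matches anything)."""
        return sorted(
            t for t in self.triples
            if subject in (None, t[0])
            and predicate in (None, t[1])
            and obj in (None, t[2])
        )


store = TripleStore()
store.add("work:1", "title", "Rules for a dictionary catalog")
store.add("work:1", "creator", "person:cutter")
store.add("work:2", "creator", "person:cutter")

# A system that references the Person resource registers for updates.
notifications: list[Triple] = []
store.subscribe("person:cutter", notifications.append)
store.add("person:cutter", "name", "Charles Ammi Cutter")

works_by_cutter = [s for s, _, _ in store.match(predicate="creator", obj="person:cutter")]
```

Because the person is a resource in its own right rather than a string repeated in each record, the name only has to be added or corrected once, and every work that points to it benefits.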

This allows for far more user participation in the creation and dissemination of metadata, essentially opening up the process beyond library and related employees to the library users themselves. Instead of a series of static pages, however large or however interlinked they may be, imagine a library OPAC that invites anyone who accesses it to edit the catalogue records in the manner of Wikipedia, or to link to non-traditional external sources of information, such as a LibraryThing profile or a book review posted on a personal blog. This would require a great deal of programming in order to produce an interface that the layperson can use without special training, and a level of professional monitoring to prevent vandalism, but the example of Wikipedia is encouraging. In under fifteen years of existence, the English-language version alone has accumulated over 4.5 million articles (all freely usable, of course), written almost exclusively by millions of volunteers (of whom approximately 100,000 are regular users), whilst the Wikimedia Foundation (which also administers many other similar projects using the same collaborative model) boasts a shade over 200 paid employees. Despite its voluntary and open nature, it has effective systems for dealing with vandalism and ensuring quality control, as shown in a number of studies which have compared its accuracy to that of conventional encyclopaedias.

Web 2.0 is defined by the interaction between its content and its users, and its pervasiveness in modern society is testament to how popular it is. In an era when libraries of all types are continually striving to improve user participation—whether due to government funding cuts or otherwise—perhaps it is fair to say that this process should be carried out, as a matter of some urgency, at its heart: the metadata that underlies all library resources.

Walking in a multimedia wonderland

Monday 20 Oct 2014

Posted by Dominic in Information Architecture

Tags

application programming interfaces, DITA, Google Maps, information architecture, library OPACs, multimedia, programming languages, shortcodes, SlideShare, SoundCloud, Web 2.0, website design, WordPress, YouTube

In our DITA lecture today, we learnt about Application Programming Interfaces (APIs), which provide a simplified programming platform that can in turn be used to share data between web services. I am using the free WordPress.com service rather than the more advanced WordPress.org equivalent, which gives its users more flexibility in return for downloading the software’s source code and running (and modifying) it in isolation. WordPress.com uses a stripped-down version of HTML which does not allow the dynamic embedding of content from other web services using standard HTML and JavaScript. This is due to the security concerns associated with running an extremely extensive shared platform: if one user were to introduce a vulnerability by embedding content in this way, it could compromise many other users’ accounts in turn.
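To give a flavour of what such a restriction involves, here is a small Python sketch of an allow-list filter that keeps a few harmless formatting tags and discards scripts and frames along with their contents. It is emphatically not WordPress's own filter: the allow-list is hypothetical, and for simplicity it throws away all attributes (so links lose their targets), which a real service would handle more carefully.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"a", "b", "em", "i", "p", "strong"}  # hypothetical allow-list
DROP_WITH_CONTENT = {"script", "iframe"}             # removed together with their contents


class Sanitiser(HTMLParser):
    """Rebuilds the input, keeping only allow-listed tags and plain text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.output: list[str] = []
        self.skip_depth = 0  # greater than zero while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.output.append(f"<{tag}>")  # attributes are deliberately discarded

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.output.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.output.append(escape(data))


def sanitise(html_text: str) -> str:
    parser = Sanitiser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.output)


cleaned = sanitise("<p>Hello <script>alert(1)</script><b>world</b></p>")
```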

WordPress instead uses its own system of shortcodes which can be used to embed multimedia content from a predefined range of approved web services (listed here). Readers who are already familiar with my blog will have noticed the Twitter widget in the right-hand sidebar, but a substantial number of other services can also be embedded within the main text of each blog post. So far in the course of my posts, I have restricted myself to formatting tweaks and displaying freely-licensed images that I have uploaded to my allowance within the WordPress servers, but the possibilities extend far more widely than that. So, please join me on a voyage of discovery and adventure as I investigate what is possible!

First off, I will start with a platform that I have already embedded in this blog: Twitter. In addition to adding a customised Twitter timeline through the Widgets interface, WordPress also allows me to embed individual tweets. Using the tweet I used to announce my previous blog post as an example, it couldn’t be simpler to embed, as all I have to do is copy and paste the URL:

My thoughts on yesterday's #dita lab experiment on searching techniques in information retrieval. http://t.co/iylT763KkM #citylis

— Dom Allington-Smith (@domallsmi) October 14, 2014

However, using the “tweet” shortcode (putting the word in square brackets within the text editor) allows me further options for customisation, in this case reducing the width, aligning to the right, and hiding the media attachment linked to in the tweet:

My thoughts on yesterday's #dita lab experiment on searching techniques in information retrieval. http://t.co/iylT763KkM #citylis
— Dom Allington-Smith (@domallsmi) October 14, 2014
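For readers curious about how a shortcode of this kind might be read by the software, here is a minimal Python sketch of a parser for the square-bracket syntax. The parsing approach is my own and not WordPress's implementation; the tweet URL is a placeholder, and the option names `align`, `width` and `hide_media` are my best understanding of the options I used above rather than something to rely on.

```python
import re
import shlex

# A shortcode is a name plus optional arguments inside square brackets.
SHORTCODE = re.compile(r"\[(\w+)([^\]]*)\]")


def parse_shortcode(text: str):
    """Return (name, positional_arguments, options) for the first shortcode found.

    Returns None if the text contains no shortcode.
    """
    match = SHORTCODE.search(text)
    if match is None:
        return None
    name, rest = match.group(1), match.group(2)
    positional, options = [], {}
    for token in shlex.split(rest):  # shlex strips the quotes around values
        key, sep, value = token.partition("=")
        if sep and key.isidentifier():
            options[key] = value
        else:
            positional.append(token)  # e.g. the URL of the tweet
    return name, positional, options


# Placeholder URL and assumed option names, for illustration only.
parsed = parse_shortcode(
    "[tweet https://twitter.com/example/status/1 "
    "align='right' width='350' hide_media='true']"
)
```

The design choice worth noting is that the author never writes HTML or JavaScript: the shortcode only names an approved service and a few options, and the platform generates the embed code itself.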

Another popular API that can be embedded is Google Maps, which is extremely suitable for customisation given its fundamentally interactive and quantitative (in terms of co-ordinates) nature. Google Maps features a “share and embed” option in the bottom right-hand corner of the screen, which produces a string of code in standard HTML.

<a href="https://www.google.com/maps/embed?pb=!1m18!1m12!1m3!1d2482.2083652802517!2d-0.10233599999996856!3d51.527738000000014!2m3!1f0!2f0!3f0!3m2!1i1024!2i768!4f13.1!3m3!1m2!1s0x48761ca7b1d83351%3A0x570d19c20ab22a83!2sCity+University+London!5e0!3m2!1sen!2suk!4v1413804696127">https://www.google.com/maps/embed?pb=!1m18!1m12!1m3!1d2482.2083652802517!2d-0.10233599999996856!3d51.527738000000014!2m3!1f0!2f0!3f0!3m2!1i1024!2i768!4f13.1!3m3!1m2!1s0x48761ca7b1d83351%3A0x570d19c20ab22a83!2sCity+University+London!5e0!3m2!1sen!2suk!4v1413804696127</a>

Now, I already said that WordPress doesn’t support the full version of this language, but it in fact converts this string automatically (I disabled this above by using the “code” tag to preserve the original structure) into its own house style to produce a fully-functional embedded map.
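The quantitative nature of the map is visible in the embed string itself. As a rough sketch, the Python below pulls a pair of co-ordinates out of a fragment of the address quoted above. I am not aware of any published description of this parameter format, so reading the `!2d` value as longitude and the `!3d` value as latitude is an assumption on my part, albeit one consistent with the numbers pointing to central London.

```python
import re

# Fragment of the "pb" parameter from the embed address quoted above.
PB_FRAGMENT = "!1d2482.2083652802517!2d-0.10233599999996856!3d51.527738000000014"

NUMBER = r"(-?\d+(?:\.\d+)?)"


def extract_coordinates(pb: str):
    """Return (latitude, longitude) if both markers are present, else None.

    Assumption: "!2d" introduces the longitude and "!3d" the latitude.
    """
    lon = re.search(r"!2d" + NUMBER, pb)
    lat = re.search(r"!3d" + NUMBER, pb)
    if lon is None or lat is None:
        return None
    return float(lat.group(1)), float(lon.group(1))


coordinates = extract_coordinates(PB_FRAGMENT)
```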

And that’s not all! All sorts of multimedia can be embedded, from various other APIs. A YouTube video URL can be posted either by itself, or with a “youtube” shortcode for customisation options, to produce this:

Audio files hosted on various platforms can also be embedded. For example, SoundCloud even has a “WordPress code” tick-box available when sharing:

WordPress also supports the embedding of resources produced by Google Docs and Microsoft Office Live, in addition to similar services such as SlideShare, whose sharing interface also features a special WordPress option:

Finally, a number of meta-shortcodes create impromptu links to other pages within the blog itself, a selection of which follow.

(Archive)

(Category – general)

(Sitemap)

The Library of Tomorrow
- About me

A full list of shortcodes supported by WordPress can be viewed here. Clearly there are some omissions: for example, it’s not possible just yet to embed a photo from Flickr or the lead of a Wikipedia article (as Google has recently begun doing in many of its search results pages), but it is an ongoing process with a great deal of potential. Using APIs to embed different web services within one another is now a fundamental feature of Web 2.0, and websites that lack such features already look dated.

From the library management perspective, you may wish to refer back to the links I provided to the Russell Group universities’ library OPACs in my earlier post. A quick survey reveals embedded Twitter timelines, image slideshows, walkthrough video guides, interactive tours, and other such interactive features. Many of these features are hosted on the institutions’ own servers, but many also make use of the API technology that underlies the sharing of information between web services throughout the modern Web.

The Library of Tomorrow

~ thoughts and reflections on the world of Library and Information Science
