
I shall begin this blog post with a pair of suitably dramatic quotations, purely in order to get the reader’s attention:

Metadata absolutely tells you everything about somebody’s life. If you have enough metadata, you don’t really need content.

Stewart Baker, former General Counsel of the National Security Agency.

We kill people based on metadata.

Michael Hayden, former Director of the NSA and the Central Intelligence Agency.

(Thanks to David Cole of the New York Review of Books.)

Of course, not all applications of metadata (from the Greek meta-, meaning “after” or “beyond”; used in English to indicate the abstraction of a concept, in this case “data about data”) are literally a matter of life-or-death (although a trained cataloguer like me might beg to differ!). In the Information Age, however, the collection and maintenance of, and access to, metadata are among the most important issues facing any library, or indeed any organisation that requires a constant supply of relevant, good-quality information to function effectively.

The mainstay of the bibliographic metadata framework used in libraries is the Machine-Readable Cataloguing (MARC) record. This is a means of recording bibliographic metadata (in up to 999 separate fields and further sub-fields) in such a way that it can be read by a machine (but also by a trained human), essentially by providing a repeatable framework with clearly defined parameters that an automated process can understand. It was developed by the Library of Congress (LC) and first trialled in the mid-1960s; the LC’s importance and prestige in the LIS sector, both in the United States and abroad, ensured that it soon spread around the world.
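That repeatable framework of numbered fields and lettered sub-fields can be illustrated in miniature. The sketch below uses plain Python dictionaries rather than the actual exchange format (ISO 2709, a packed binary layout with a leader and directory); the tags 245 (title statement) and 650 (topical subject) are real MARC tags, but the record content is invented for illustration.

```python
# A MARC record reduced to its logical shape: numbered fields, each
# with two indicator characters and a list of coded subfields.
record = {
    "245": [  # Title statement
        {
            "indicators": ("1", "0"),
            "subfields": [("a", "Cataloguing in the digital age :"),
                          ("b", "an introduction /"),
                          ("c", "A. N. Author.")],
        }
    ],
    "650": [  # Subject added entry, topical term (repeatable)
        {"indicators": (" ", "0"),
         "subfields": [("a", "Cataloging.")]},
        {"indicators": (" ", "0"),
         "subfields": [("a", "Metadata.")]},
    ],
}

def subfield_values(record, tag, code):
    """Collect every value of a given subfield code under a tag."""
    return [value
            for field in record.get(tag, [])
            for sf_code, value in field["subfields"]
            if sf_code == code]

print(subfield_values(record, "650", "a"))
# -> ['Cataloging.', 'Metadata.']
```

Because every field sits behind a predictable tag and subfield code, an automated process (or a trained human) can pull out exactly the piece of metadata it needs.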

A key reason for libraries to adopt the MARC record format for bibliographic metadata was to save time and resources. Provided that all cataloguers used the same standards (the official MARC standards website links to numerous sets of authority codes for display in the relevant fields, such as country of origin, language, and how named individuals and organisations are related to the item being catalogued, although it does not cover further cataloguing standards such as the Anglo-American Cataloguing Rules and the comparatively new Resource Description and Access), each new book or similar item needed to be catalogued only once, with the cataloguing authority then sharing its record with other libraries. Initially this was carried out by the LC offering a subscription service to computer-printed catalogue cards, but the development of the Internet in subsequent decades soon allowed for the much quicker transfer of information through cyberspace. The Online Computer Library Center (OCLC), which administers WorldCat, the world’s largest online public-access catalogue (OPAC), also dates back to the mid-1960s.

LC card catalogue

The development of online library catalogues sounded the death knell for the traditional card index systems, but the underlying metadata standards remained the same. (Photo credit: Ted Eytan)

Yet even as the methods of communication of, and access to, library resources changed, the metadata standards (albeit with periodic revisions) remained substantially the same. Just as the Dewey Decimal Classification and the Library of Congress Classification schemes date from the second half of the nineteenth century, so the principles for a library catalogue are still underpinned by Charles Ammi Cutter’s original objectives from the same period. Thus a modern library OPAC, whilst much more convenient to use than its cumbersome predecessor, is still used mostly to search by author, title or subject (the latter either directly by using controlled vocabulary such as the Library of Congress Subject Headings (LCSH), or by proxy with a classification system).
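Cutter’s access points translate directly into the indexes an OPAC maintains behind the scenes. The toy sketch below keys invented sample records on author, title and subject; however modern the interface, these are still the lookups being performed.

```python
# Cutter-style access points as simple inverted indexes
# (sample records are invented).
from collections import defaultdict

records = [
    {"id": 1, "author": "Cutter, Charles Ammi",
     "title": "Rules for a Dictionary Catalog",
     "subjects": ["Cataloging"]},
    {"id": 2, "author": "Tennant, Roy",
     "title": "XML in Libraries",
     "subjects": ["XML (Document markup language)", "Libraries"]},
]

indexes = {"author": defaultdict(set), "title": defaultdict(set),
           "subject": defaultdict(set)}
for rec in records:
    indexes["author"][rec["author"].lower()].add(rec["id"])
    indexes["title"][rec["title"].lower()].add(rec["id"])
    for subj in rec["subjects"]:
        indexes["subject"][subj.lower()].add(rec["id"])

def search(index_name, term):
    """Exact-match lookup on a single access point."""
    return sorted(indexes[index_name].get(term.lower(), set()))

print(search("subject", "Cataloging"))  # -> [1]
```

A controlled vocabulary such as LCSH effectively standardises the keys of the subject index, so that all searchers and cataloguers converge on the same terms.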

The recent development of more general computer markup and data-interchange languages to describe metadata, such as Extensible Markup Language (XML) and JavaScript Object Notation (JSON), has opened up many new avenues for library applications, as this twelve-year-old prediction indicates:

[XML] has the potential to exceed the impact of MARC on librarianship. While MARC is limited to bibliographic description […] XML provides a highly-effective framework for encoding anything from a bibliographic record for a book to the book itself.

—Roy Tennant, editor of XML in Libraries (2002).
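Tennant’s point can be shown in miniature: the same tagged data that MARC carries can be expressed in XML and then handled with entirely generic tools. The snippet below follows the general shape of the Library of Congress’s MARCXML schema (record, datafield, subfield elements); the record content itself is invented.

```python
# Parsing a MARCXML-style bibliographic record with Python's
# standard-library XML tools; no library-specific software needed.
import xml.etree.ElementTree as ET

marcxml = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">XML in Libraries</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Library science</subfield>
  </datafield>
</record>
"""

ns = {"marc": "http://www.loc.gov/MARC21/slim"}
root = ET.fromstring(marcxml)

# Any XPath-capable tool can now pull out the title field:
title = root.find('marc:datafield[@tag="245"]/marc:subfield[@code="a"]', ns)
print(title.text)  # -> XML in Libraries
```

This is exactly the flexibility Tennant describes: the same generic machinery could just as well encode the full text of the book, not merely its bibliographic description.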

The contributing authors to XML in Libraries identify seven key applications for the language:

  1. Library catalogue records
  2. Interlibrary loans
  3. Cataloguing and indexing
  4. Collection development
  5. Databases
  6. Data migration
  7. Systems interoperability

Whilst most of these developments have taken place “behind the scenes”, one visible indicator of progress is that many academic library OPACs, such as that of my home institution, City University London, have the functionality to export a bibliographic record in a format which can be understood by various reference management programmes—a similar process to exporting a website (such as the entire contents of this blog, which WordPress allows me to do if I so choose) using XML.
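The export functionality amounts to mapping catalogue fields onto a plain-text interchange format that reference managers such as Zotero, EndNote and Mendeley can import. The sketch below targets the RIS format (its two-letter tags TY, AU, TI, PY and ER are real); the field mapping is deliberately simplified and the sample record is invented.

```python
# A simplified OPAC "export citation" routine producing RIS output.
def to_ris(rec):
    lines = ["TY  - BOOK"]                   # record type
    for author in rec["authors"]:
        lines.append(f"AU  - {author}")      # author tag is repeatable
    lines.append(f"TI  - {rec['title']}")    # title
    lines.append(f"PY  - {rec['year']}")     # publication year
    lines.append("ER  - ")                   # end-of-record marker
    return "\n".join(lines)

rec = {"authors": ["Tennant, Roy"],
       "title": "XML in Libraries", "year": 2002}
print(to_ris(rec))
```

A real OPAC would of course draw these values from the underlying MARC fields rather than a hand-built dictionary, but the principle is the same: structured metadata in, structured metadata out.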

Yet despite these developments, and other features of modern library OPACs about which I have already posted—such as the use of “professional” colour schemes and the embedding of multimedia material using APIs—the metadata itself remains static, even as Web 2.0 becomes ever-more dynamic and engaging for its users. This is why the LC is developing a new model called the Bibliographic Framework Initiative (BIBFRAME), which aims to (eventually) replace the ageing MARC records.

The key advantage that BIBFRAME has over MARC is that it is designed around a Linked Data model. MARC records have certain “linking fields” in which such data can be added—for example, the name of a series within which an individual monograph is published—but my impression from using the records regularly as an information professional is that each one is a discrete entity with the links tacked on as an afterthought—which is only to be expected, given that the format predates the practice of hyperlinking by several decades. In a BIBFRAME record, every aspect of metadata is relational and therefore searchable; this allows the searcher to move beyond Cutter’s simple objectives to achieve far greater precision in terms of information retrieval. But that’s not all. Reading the BIBFRAME overview, for me the key element is:
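The difference between a discrete record and a relational one can be sketched with the basic unit of Linked Data, the subject–predicate–object triple. The identifiers and predicate names below are invented stand-ins for illustration, not actual BIBFRAME vocabulary; the point is the traversal they make possible.

```python
# Bibliographic statements as a set of triples: every aspect of the
# metadata is a link that can be followed or queried in its own right.
triples = {
    ("work:1", "title", "XML in Libraries"),
    ("work:1", "editor", "person:tennant"),
    ("person:tennant", "name", "Tennant, Roy"),
    ("work:1", "subject", "topic:xml"),
    ("topic:xml", "label", "XML (Document markup language)"),
}

def objects(subject, predicate):
    """All objects linked from `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Follow the links: from the work, through its editor, to that
# person's name -- a hop a flat, self-contained record cannot express.
editor = objects("work:1", "editor").pop()
print(objects(editor, "name"))  # -> {'Tennant, Roy'}
```

Because the person and the topic are entities in their own right rather than strings buried inside one record, correcting or enriching them once benefits every work that links to them.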

Information Resources can then be re-assembled into a coherent architecture that allows for cooperative cataloging at a far more granular level than before. Then, as we leverage the Web as an architecture for data, whenever updates to these Resources are performed (e.g. someone adds new information about a Person, new mappings related to a Subject, etc.) notification events can occur to automatically update systems that reference these Resources. Further, these information assets can now be more effectively utilized at a granular level and provide a richer substrate in which local collections, special collections and third party data can easily annotate and contextualize cooperative library content.

This allows for far more user participation in the creation and dissemination of metadata, essentially opening up the process beyond library and related employees to the library users themselves. Instead of a series of static pages, however large or however interlinked they may be, imagine a library OPAC that invites anyone who accesses it to edit the catalogue records in the manner of Wikipedia, or to link to non-traditional external sources of information, such as a LibraryThing profile or a book review posted on a personal blog. This would require a great deal of programme-writing in order to produce an interface that the layperson can use without special training, and a level of professional monitoring to prevent vandalism, but the example of Wikipedia is encouraging: in under fifteen years of existence, the English-language version alone has amassed over 4.5 million articles (all freely usable, of course), written almost exclusively by millions of volunteers (of whom approximately 100,000 are regular contributors), whilst the Wikimedia Foundation (which also administers many other similar projects using the same collaborative model) boasts a shade over 200 paid employees. Despite its voluntary and open nature, it has effective systems for dealing with vandalism and ensuring quality control, as shown in a number of studies which have compared its accuracy to that of conventional encyclopaedias.
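The “notification events” the BIBFRAME overview describes can be sketched as a small publish–subscribe mechanism: each system registers an interest in a shared resource and is called back whenever that resource changes. This is a generic pattern sketch under my own invented names, not BIBFRAME’s actual machinery.

```python
# Shared resources with change notification: updating a resource once
# propagates to every system that references it.
from collections import defaultdict

subscribers = defaultdict(list)   # resource id -> registered callbacks
store = {}                        # resource id -> current data

def subscribe(resource_id, callback):
    """Register a system's interest in a shared resource."""
    subscribers[resource_id].append(callback)

def update(resource_id, data):
    """Change a shared resource and notify every referencing system."""
    store[resource_id] = data
    for callback in subscribers[resource_id]:
        callback(resource_id, data)

log = []
# Two hypothetical systems, a local OPAC and a union catalogue,
# both reference the same Person resource:
subscribe("person:tennant", lambda rid, d: log.append(("opac", rid, d)))
subscribe("person:tennant", lambda rid, d: log.append(("union-cat", rid, d)))

update("person:tennant", {"name": "Tennant, Roy"})
print(len(log))  # -> 2: both subscribed systems saw the change
```

Add user contributions on top of this and a single well-monitored edit, say a corrected name or a newly linked book review, could ripple out to every catalogue that references the resource.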

Web 2.0 is defined by the interaction between its content and its users, and its pervasiveness in modern society is testament to how popular it is. In an era when libraries of all types are continually striving to improve user participation—whether due to government funding cuts or otherwise—perhaps it is fair to say that this process should be carried out, as a matter of some urgency, at its heart: the metadata that underlies all library resources.