After last week’s experiments with word clouds and other forms of text analysis, our class took a step back yesterday to look at the wider implications of data mining (of which text analysis is a subset). These include questions of representation and of legality: for example, should data mining be exempt from copyright laws, allowing researchers to access the full texts of copyrighted works in order to feed them through computer programs and applications, with only the general conclusions, rather than the full individual texts, being made available to the public?
Google Books’s ongoing digitisation project is a good example of the legal challenges involved; the associated website includes a brief (and self-justifying!) outline. It is worth bearing in mind that the aims of this particular project go beyond data mining, but one of its most visible outcomes is the Google Books Ngram Viewer, a tool which allows anyone to search the entire corpus of digitised material for particular words or phrases (known as “n-grams”: contiguous sequences of n words) and chart how frequently they occur over time. The viewer covers the years 1800 to 2008 and incorporates corpora in various languages. The total number of books in the various corpora was 5.2 million in 2012 and is growing rapidly; it still represents a small fraction of the total number of texts published or still available, but an increasingly representative one. Some institutions, such as the Royal Dutch Library (Koninklijke Bibliotheek), have also used the tool’s API to create their own “mashups”: in this case, a Dutch-language version of the tool with its own corpus.
The Google Books Ngram Viewer can be used, in a manner similar to word clouds, to provide a quick and easy-to-understand overview of the criteria searched for. For example, this search of the most notable leaders of the Soviet Union/Russian Federation since the October Revolution in 1917 produces the following results from Google’s English corpus, showing rapid rises for each new leader as he assumed power, followed by either sustained or transient interest thereafter. One can easily imagine a humanities scholar using the tool as a starting-point for further research (perhaps using the further links to precise listings within Google Books that are conveniently placed underneath the generated graph!).
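To give a rough sense of what the viewer computes, here is a minimal Python sketch that calculates the relative frequency of a word or phrase in a toy corpus grouped by year. This is my own simplification of the n-gram idea, not Google’s actual implementation, and the corpus below is invented purely for illustration:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def relative_frequency(corpus_by_year, phrase):
    """For each year, the phrase's share of all n-grams of the same length."""
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in corpus_by_year.items():
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result

# Invented toy corpus: a handful of sentences per "year"
corpus = {
    1950: ["stalin addressed the party", "the party praised stalin"],
    1990: ["gorbachev met yeltsin", "yeltsin addressed the crowd"],
}
print(relative_frequency(corpus, "stalin"))
```

Plotting those per-year values is, in essence, what the Ngram Viewer’s graphs show, albeit over millions of books and with smoothing applied.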
The tool can also be useful for LIS research: this graph shows the n-gram trends for several literacy concepts that we discussed in our Foundation module last week.
The graphs can also be embedded using an API, although not on WordPress.com, whose limited HTML functionality prevents this! (Please click to enlarge.)
Many of the institutions and publishers that do collaborate with Google perhaps do so reluctantly, unwilling to take on a corporate behemoth of such immense proportions. However, there are many examples of research projects in which the corpus, data mining and text analysis are carried out with much greater co-operation. One of these is Old Bailey Online, a project funded and otherwise supported by a variety of institutions and sources, which provides a digital archive of the court’s proceedings between 1674 and 1913. The website has a search engine, but also an API Demonstrator, which allows the results of database queries to be exported to the reference management system Zotero and to Voyant Tools, the suite of data visualisation applications which I used last week.
It is therefore possible to carry out complex searches, analyse the results at a superficial level (yet one that can identify key research questions), and then go through particularly interesting texts within the corpus in more detail. The latter is conventional “close reading”; the newer methods of data mining and text analysis have been dubbed “distant reading” by the digital humanities scholar Franco Moretti.
One of the reasons for making the archive publicly available is so that those with an interest in genealogy can research their family history; sadly, my almost-unique surname prevents me from carrying out a search along those lines without further genealogical research of my own! Nevertheless, a search of the complete archive for cases in which someone was found guilty of “wounding” but also found to be “insane” produced a corpus which I was able to visualise in a number of ways: for example, in addition to the word clouds covered last week, I produced a graph showing the incidence of the different weapons commonly used across the corpus of cases (which is listed in chronological order).
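To give a flavour of how such a graph might be produced, here is a minimal Python sketch that counts how many cases in a corpus mention each weapon term at least once. The case texts and the term list are invented for illustration; the real corpus would come from the Old Bailey Online export:

```python
from collections import Counter

def weapon_incidence(cases, weapon_terms):
    """Count how many case texts mention each weapon term at least once.

    `cases` is assumed to be in chronological order, matching the corpus
    described above; `weapon_terms` is a hand-picked list of words.
    """
    counts = Counter()
    for text in cases:
        words = set(text.lower().split())  # each case counted once per term
        for term in weapon_terms:
            if term in words:
                counts[term] += 1
    return counts

# Invented case summaries, in chronological order
cases = [
    "the prisoner struck the deceased with a poker",
    "a knife was found upon the prisoner",
    "he threatened the constable with a knife",
]
print(weapon_incidence(cases, ["knife", "poker", "pistol"]))
```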
The keywords can then be further analysed with a collocation tool, and by close reading in the corpus reader.
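Collocation analysis simply asks which words tend to appear near a given keyword. A minimal sketch of the idea, using an invented snippet of text and a fixed window either side of the keyword (real tools such as Voyant offer far more sophisticated options):

```python
from collections import Counter

def collocates(tokens, keyword, window=3):
    """Count words appearing within `window` tokens either side of `keyword`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

# Invented snippet of a trial transcript
tokens = "the prisoner drew a knife and the knife was bloody".split()
print(collocates(tokens, "knife", window=2).most_common(3))
```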
During this process, I noticed that the integration between Old Bailey Online and Voyant Tools was particularly impressive: the export interface was extremely easy to use, and the common English stopwords list was applied automatically (unlike when text is entered manually, as it was last week).
Some research projects take this process a stage further, and create their own customised data mining and data visualisation tools to integrate all aspects of the project within the same digital framework. Although this takes a significant amount of work, it also produces potentially the most convenient and “future-proof” solution (in the sense that the project does not have to rely upon an external partner). Utrecht University in the Netherlands currently has several text-mining research projects listed on its Digital Humanities website. Unfortunately, many of them are still in the early stages of development and do not provide access to the data being used, but a good example is the Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic (CKCC) project, whose corpus comprises 20,000 letters sent between seventeenth-century scholars (mostly) resident in the Dutch Republic.
The project, again funded by grants from an assortment of sponsors, is clear in its aims:
One of the main targets of this project is to create free, online access to historical sources, open to researchers from various disciplines all over the world. Unlike earlier printed editions of correspondences, this source is not static but of a highly dynamic nature, making it possible for the scholarly community to discuss problems, share information and add transcriptions or footnotes. One of the great advantages of this project is its ability to invoke new questions, new interpretations and new information, and to bring all this together on an expanding website.
To this end, the project’s website includes a Virtual Research Environment (VRE)—the ePistolarium—which allows anyone to search the corpus and produce visualisations from the data produced. The search engine offers a plethora of options: one can search by sender, recipient (or combine the two), people named in the letter, geographical location of sender or recipient, and date. There is also an algorithm that allows for a similarity search, whereby letters are ranked and retrieved based on similarities within the text.
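The project does not document its ranking algorithm in detail, but a common approach to this kind of similarity search is to compare term-frequency vectors using cosine similarity. Here is a minimal Python sketch with invented letter texts; this is a generic illustration of the technique, not the ePistolarium’s actual method:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_search(query, letters):
    """Rank letters (by index) from most to least similar to the query text."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(text.lower().split())), i)
              for i, text in enumerate(letters)]
    return sorted(scored, reverse=True)

# Invented letter texts, loosely in the spirit of the corpus
letters = [
    "observations on the rings of saturn",
    "a new design for a pendulum clock",
    "further observations of saturn through the telescope",
]
ranked = similarity_search("observations of saturn", letters)
print(ranked)
```

A production system would typically weight terms (e.g. with TF-IDF) and normalise spelling variants, which matters greatly for seventeenth-century texts, but the ranking principle is the same.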
A search for the complete correspondence available of Christiaan Huygens—one of the most prominent and well-represented individuals within the corpus—produces a list of results which can be ordered using six different criteria: date, sender, recipient, sender location, recipient location, and text search score (if performing a free text search in the body of the letters). The transcribed contents of each individual letter can also be read, along with its associated metadata, important keywords, and similar texts that are retrieved using the aforementioned similarity search tool. Each letter can also be sent to an e-mail address, or be shared on Facebook or Twitter, but unfortunately there are no permalinks as yet. The search results as a whole can also be exported as a CSV (Comma Separated Values) file, for those who may wish to perform their own further analysis.
This would appear to be unnecessary, however, as the project has several different data visualisation tools that are fully integrated with the VRE. (The output for each visualisation is also available to download in JSON format, although it is not yet possible to embed any of them using an API.) The first of these is a map, in which the geographic location metadata associated with each letter is used to plot lines between senders and recipients on a map, in this case the correspondence of Huygens:
A movable timeline, in three different scales, allows the user to view patterns of Huygens’s correspondence in chronological order:
A network visualisation shows all the individuals to whom Huygens sent letters, and from whom he received them.
Finally, a “cocitation graph” shows the names of individuals, contemporary or otherwise, who feature in the correspondence. I believe that this visualisation is of the greatest value, as it allows us to view those who could be described as the intellectual influencers of Huygens and his peers, and acts as a useful starting-point for further research on this topic, which would involve close reading of the letters themselves. The project’s website includes a page of initial research experiments conducted with the tool.
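The underlying idea of a cocitation graph can be sketched very simply: count how often pairs of names are mentioned together in the same letter, and draw an edge for each pair. A toy Python example, with invented letters and a hand-picked name list:

```python
from collections import Counter
from itertools import combinations

def cocitations(letters, names):
    """Count pairs of names mentioned together in the same letter text."""
    pairs = Counter()
    for text in letters:
        present = sorted(n for n in names if n.lower() in text.lower())
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs

# Invented letter texts for illustration
letters = [
    "I have read Descartes and discussed him with Mersenne",
    "Descartes replies to Mersenne on the nature of light",
    "Galileo's observations deserve wider notice",
]
print(cocitations(letters, ["Descartes", "Mersenne", "Galileo"]))
```

The resulting pair counts are exactly the edge weights of a cocitation graph; a real system would also need named-entity recognition to find the names in the first place, rather than a fixed list.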
(Committed readers of this blog may notice a certain similarity between these latter two visualisations and those produced by the TAGS Explorer tool that I wrote about some weeks ago.)
It is clear from my own experience with these projects, and the topics that I have covered in previous blog posts, that the “distant reading” of large-scale datasets through various forms of data mining is a crucial part of contemporary humanities research. Our role as information professionals must therefore be to understand these tools and technologies fully, in order to advance the knowledge that they can produce, or help to produce, throughout academia and the wider world. It is worth noting once more, however, that these techniques should supplement traditional research rather than replace it, so we must also endeavour to keep our feet on the ground.