Patrick Egan | Digital Humanist and Ethnomusicologist in Cork, Ireland

Digital Archiving – Programming

In the last blog, “Digital Archiving – Workflow Setup”, we looked at setting up the scanning environment for the Cork LGBT Archive. Once this was achieved, our Assistant Digital Archivist, Mira Dean, got to work on the archival boxes.

Archival boxes of the Cork LGBT Archive

Scanning for Omeka

The Cork LGBT Archive website is built on the Omeka software. According to the official website, https://omeka.org, Omeka is a Swahili word that means to display or “lay out wares”. The software is described as “a free, open-source content management system for online digital collections. As a web application, it allows users to publish and exhibit cultural heritage objects, and extend its functionality with themes and plugins”. Orla had been working on Omeka since we first met in 2014, so I had a good understanding of how it was set up and the plugins that were installed at various stages along the way.

Subject headings

For a digital archivist, one of the most important skills is knowing the Dublin Core standard for metadata description. Dublin Core arrived on the archival scene in the 1990s, as a standard that could work across description throughout the archives world. One of the advantages of this is that it allows archives to be connected across geographical boundaries. For example, if I wanted to find all the “Gay Activists” activities that occurred throughout the 1980s in Ireland and across Europe, those items would have already been described and categorised by archivists in every country across the continent.

As I had already been working with both the Boole Library in UCC and the Library of Congress in Washington DC, I was very familiar with this standard. In the past seven years, I had also gained experience in working with Linked Open Data, so from a programming point of view, I knew the potential for the metadata in this archive. This is one of the most appealing things about the technological side of the Cork LGBT Archive: it has already been set up with standards like Dublin Core and Linked Open Data, and it is also a place where you can introduce new ideas to improve the work and presentation of the Archive.

So, for “subject” description, it was possible to use Dublin Core, but also some headings from Linked Open Data. One of the more interesting plugins that Orla had previously installed was called “LC Suggest”, which means Library of Congress Suggest. This plugin is used in the subject description, and provides built-in subject headings that have already been used by the Library. Orla has also encouraged a number of previous digital archivists and “crowd-sourcers” to utilise LC Suggest, and another excellent resource named “Homosaurus”. Homosaurus.org is an international linked data vocabulary service that enhances the Cork LGBT Archive, as in the future it will be possible for machines to discover a whole host of information using the subject headings that were entered.

Other common Dublin Core descriptions include: Description, Creator, Source, Publisher, Identifier, and Coverage.

As a newcomer to such a specific area, sexual health in the gay community 1980-2000, there was some work and reading to be done in order to become more familiar with the correct use of subject headings and description of resources. I made sure to read through and think about the documents that I was encountering. One of the more helpful documents for this purpose was Kieran Rose’s book, Diverse Communities, which we scanned and described in July. It is an excellent read, and a very interesting insight into gay and lesbian politics in Ireland during the 1980s and 1990s.

In all, between June 1st and July 31st, we completed just about 400 items, fully scanned, described and published. But whilst scanning and description were in full flow, there was also time to explore the technological side of the project. Orla’s son, Jacob, was with us from the start, and has proven to be a superb addition to the team, both in terms of design work and web coding.

Digital work – Extracting Subject Headings

Quite early on in our work, we discovered that we could use the Omeka API (Application Programming Interface). In this case, the API is basically all the data from the Omeka database, made available at a URL such as: https://corklgbtarchive.com/api/items. To a developer, this is gold. It means that you can search through the whole website and extract information on any aspect of the descriptions. I recognised early on that this was a brilliant opportunity to get a list of all the headings that had previously been added into the sexual health section. Some subject headings (as seen below) included AIDS Activism, Gay Health Action (Ireland), Safe Sex in AIDS Prevention, Safer Sex, AIDS Education, AIDS Awareness, Sexual Health, Donating Blood, AIDS Testing.
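To give a rough idea of what that looks like in practice, here is a minimal Python sketch that pulls the “Subject” values out of a single item. It assumes the Omeka Classic JSON layout, where each item carries an `element_texts` list and each entry names its element. The sample item and the function names `fetch_items` and `subject_headings` are invented for this illustration and are not taken from Jacob’s application.

```python
import json
from urllib.request import urlopen

API_URL = "https://corklgbtarchive.com/api/items"


def fetch_items(url=API_URL):
    """Download one page of items from an Omeka API endpoint as a list of dicts."""
    with urlopen(url) as response:
        return json.load(response)


def subject_headings(item):
    """Return the 'Subject' values attached to one item (assumed JSON layout)."""
    return [
        entry["text"].strip()
        for entry in item.get("element_texts", [])
        if entry.get("element", {}).get("name") == "Subject"
    ]


# Hypothetical item, trimmed to the fields used here.
sample_item = {
    "id": 1,
    "element_texts": [
        {"text": "Example leaflet", "element": {"name": "Title"}},
        {"text": "AIDS Activism", "element": {"name": "Subject"}},
        {"text": "Safer Sex", "element": {"name": "Subject"}},
    ],
}

print(subject_headings(sample_item))
```

With a live connection, `fetch_items()` could supply real items in place of `sample_item`; the sketch keeps the network call separate so that the extraction logic can be tried offline.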


www.corklgbtarchive.com subject headings for AIDS related documents

However, in order to get the full list, we needed to write an application. Jacob is already fluent in Python, JavaScript, PHP, Svelte and a host of other programming languages, and he knows GitHub as well. So it was very easy to work with him and request, “hey, can you build an application that extracts all the subject headings for sexual health?” Jacob got to work right away, and within a short time had created not only an application for extracting subject headings, but one that also lists private items. He also made it interoperable, so that any other Omeka user could use it on their website.

Programming the API script to extract Subject Heading information from collections on www.corklgbtarchive.com

Jacob’s work portfolio can be seen on his personal website, www.JacobEM.com, and each time that an application is made, he has published it on his www.GitHub.com account as a free-to-use repository. The full code for Jacob’s subject heading extractor, plus a step-by-step process is available for the world to use (whether programmer or not) at: https://jacobem.com/blog?q=omeka-website-metadata-extrators

The full list of subject headings was extracted using Jacob’s application, and this was then loaded into Excel and printed out, so that any time the subject heading field was being filled out, a host of possible headings were available.
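The step from extracted headings to a spreadsheet can be sketched roughly as follows. This is not Jacob’s code, just one possible approach using Python’s standard library: it counts how often each heading appears and writes an alphabetical list as CSV, which Excel can open. The item layout, the sample data and the function names `count_headings` and `headings_to_csv` are assumptions made for the example.

```python
import csv
import io
from collections import Counter


def count_headings(items):
    """Count how often each 'Subject' heading appears across a list of items."""
    counts = Counter()
    for item in items:
        for entry in item.get("element_texts", []):
            if entry.get("element", {}).get("name") == "Subject":
                counts[entry["text"].strip()] += 1
    return counts


def headings_to_csv(counts, out):
    """Write headings in alphabetical order, one per row, with their counts."""
    writer = csv.writer(out)
    writer.writerow(["Subject heading", "Items"])
    for heading in sorted(counts, key=str.lower):
        writer.writerow([heading, counts[heading]])


# Hypothetical items, trimmed to the fields used here.
items = [
    {"element_texts": [
        {"text": "Safer Sex", "element": {"name": "Subject"}},
        {"text": "AIDS Education", "element": {"name": "Subject"}},
    ]},
    {"element_texts": [
        {"text": "Safer Sex", "element": {"name": "Subject"}},
    ]},
]

buffer = io.StringIO()
headings_to_csv(count_headings(items), buffer)
print(buffer.getvalue())
```

To produce a file rather than printing, the same function can be given a file opened with `open("headings.csv", "w", newline="", encoding="utf-8-sig")`; the file name is arbitrary, and the `utf-8-sig` encoding is a common way to help Excel read accented characters correctly.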

Final list of subject headings for use with AIDS-related items, in what is called the “AIDS Activism Sexual Health Ireland” collection

Digital Work – Database Scrolls!

After the success of extracting all the subject headings and working with Jacob on the API, it became clear that coding was an excellent way to monitor the Archive, to get a number of different overviews and to think about ways to create digital visualisations (more on visualisations in later blog posts). One of the most important aspects of my role as digital archivist was to identify the exact items that were in the Cork LGBT Archive website and to compare these to the items that had already been uploaded to the Digital Repository of Ireland (DRI). I found that if I could cross-check both, it would be possible to fill in the gaps with material that had not yet been ingested into the DRI resource.
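In code, a cross-check like this comes down to comparing two lists of titles. The sketch below is only an illustration of the idea, not the process we actually followed: the titles are invented, and matching on a lower-cased, whitespace-trimmed title is an assumption that would miss items whose titles differ between the two sites.

```python
def normalise(title):
    """Lower-case a title and collapse whitespace so near-identical titles match."""
    return " ".join(title.lower().split())


def missing_from_dri(archive_titles, dri_titles):
    """Return archive titles, sorted alphabetically, with no match in the DRI list."""
    dri_seen = {normalise(title) for title in dri_titles}
    return sorted(
        (title for title in archive_titles if normalise(title) not in dri_seen),
        key=normalise,
    )


# Hypothetical titles for illustration only.
archive_titles = ["Safer Sex Leaflet", "Internet Outreach Report", "Newsletter No. 1"]
dri_titles = ["safer sex  leaflet", "Newsletter No. 1"]

print(missing_from_dri(archive_titles, dri_titles))
```

Anything the function returns is a candidate for ingestion into the DRI, although a human check is still needed for titles that were worded differently in each place.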

Jacob’s success with the extraction of subject headings was indicative of what we could achieve – we set about getting a full list of items up to mid-July. This was quite straightforward, and again Jacob shared his work on GitHub for the world to use: https://github.com/yakowa/omeka-title-desc-extractor
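For readers curious about what getting a full list involves, here is a rough Python sketch of walking through an API page by page and collecting each item’s title and description. Jacob’s actual code is in the repository linked above; this version is a simplified illustration. It assumes that the API accepts a `page` query parameter and returns an empty list once the pages run out, and the helper names and sample data are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, page):
    """Fetch one page of items; the 'page' query parameter is an assumption."""
    with urlopen(f"{base_url}?{urlencode({'page': page})}") as response:
        return json.load(response)


def all_items(base_url, fetch=fetch_page):
    """Yield items page by page until an empty page comes back."""
    page = 1
    while True:
        batch = fetch(base_url, page)
        if not batch:
            return
        yield from batch
        page += 1


def first_text(item, element_name):
    """Return the first value of a named element on an item, or '' if absent."""
    for entry in item.get("element_texts", []):
        if entry.get("element", {}).get("name") == element_name:
            return entry["text"].strip()
    return ""


def title_and_description(item):
    """Return a (title, description) pair for one item."""
    return first_text(item, "Title"), first_text(item, "Description")


# Stand-in for the network call so the sketch runs offline (hypothetical data).
fake_pages = {
    1: [{"element_texts": [
        {"text": "Newsletter No. 1", "element": {"name": "Title"}},
        {"text": "First issue.", "element": {"name": "Description"}},
    ]}],
    2: [{"element_texts": [
        {"text": "Safer Sex Leaflet", "element": {"name": "Title"}},
    ]}],
}


def fake_fetch(base_url, page):
    return fake_pages.get(page, [])


rows = sorted(
    title_and_description(item)
    for item in all_items("https://example.org/api/items", fake_fetch)
)
print(rows)
```

Passing the fetch function in as a parameter keeps the paging logic separate from the network, so the same loop can be tested with sample data before it is pointed at a live site.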

I decided to utilise this digital opportunity by creating a physical, visual overview of the Archive. After using Jacob’s description extractor, I was able to edit some of the fields in Excel and use some HTML formatting software to display titles and descriptions side-by-side. I then split the title and description fields into columns and sorted them alphabetically. Next, in order to get a run-down of all the 160 or so items in the DRI, I created a list from the website. This was then sorted alphabetically and compared to the Cork LGBT Archive listing, in order to see them side by side and to find which items had not yet been added to the DRI. The result was very interesting:

Surprises!

As the work moved along in July and August, we stumbled upon some fascinating documents from the archive. Ironically, whilst working on digital scans and extracting data, we discovered a document from the Southern Gay Men’s Health Project on “Internet Outreach”. This document, written in 2003, talks about the potential of the internet as a powerful tool for Outreach.

Internet Outreach Report / Update from 2003, the first piece of evidence concerning the use of the World Wide Web to spread these messages.

Another surprise that developed during these busy summer months was our most important Spotify playlist! Mira’s job description included getting everyone to get up and walk around the office space every hour in order to stretch the legs and take a break. For Orla, this then evolved into a way to illustrate the process involved with the workflow (scan to Omeka to DRI to Europeana) through movement, which then developed into the Shag Pack (see Love Shack from the B-52’s below). The Spotify playlist continues to evolve. If you enjoy our playlist, please like it; further suggestions are most welcome!
