Endangered Archives Programme
Six million objects. One new website. Just seven weeks
The Endangered Archives Programme (EAP) contributes to the preservation of archival material that is in danger of destruction, neglect or physical deterioration worldwide.
Delivered by the British Library and funded by Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin, EAP supports the preservation of important, at-risk collections of photographs, documents, manuscripts and other items from around the world. It facilitates digital capture of these items and shares over six million images online through its new website, built by Cogapp.
The Programme has funded over 320 projects in 80 countries around the world.
Goals for the online archive project
- Display high-resolution, zoomable versions of all six million images using the open source Universal Viewer and leveraging IIIF (International Image Interoperability Framework)
- Bring the site into the British Library brand
- Make the site accessible across devices
- Improve the user experience, including powerful search
- Increase stability and scalability
- Deliver rapidly, with less than two months from project start to launch
What they said
The new site delivers six million images via the open source Universal Viewer and IIIF. Every high-res image can be zoomed, panned, rotated and shared.
The images on the old site were medium resolution and not zoomable, making some portions of text illegible. For the new site, we made the images high-res, flexible and dynamic using the IIIF Image API. See technical approach below for more detail on how we did it.
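To illustrate what the IIIF Image API makes possible, here is a minimal sketch of how a viewer requests a zoomed, cropped or rotated rendition simply by changing segments of a URL. The server address and image identifier are hypothetical, and this is not the code running on the EAP site; the URL pattern itself (identifier, region, size, rotation, quality, format) comes from the IIIF Image API specification.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The size keyword "max" is valid in Image API 2.1 and 3.0.
    """
    # Slashes inside the identifier must be percent-encoded.
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
BASE = "https://images.example.org/iiif"

# The whole image at the largest size the server offers.
full_view = iiif_image_url(BASE, "EAP001/1/1/5")

# A 512 x 512 pixel detail, scaled to 256 pixels wide and rotated 90 degrees.
detail = iiif_image_url(BASE, "EAP001/1/1/5",
                        region="1024,2048,512,512", size="256,", rotation=90)

print(full_view)
print(detail)
```

Because every tile and rotation is just another URL, a viewer can fetch only the part of the image the user is looking at, which is what keeps very large images responsive.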
Brand and experience
The British Library brand is supported online by their global experience language (GEL). The GEL offers a consistent, user-tested way to use the Library’s numerous online sites and systems.
Bringing the Endangered Archives Programme site into the GEL involved combining existing GEL code with new code created by Cogapp, and testing the whole interface for a high standard of functionality, accessibility, performance, and cross-device compatibility.
Content management, search and systems
Content is editable using Drupal, an open source content management system.
Search is powered by Solr, on top of a Harvester/Mill system similar to those used for the Clyfford Still Museum Online Collection, the Qatar Digital Library, the Yiddish Book Center and other Cogapp projects.
Our system harvests records from the British Library's internal archive management system, then the Mill processes them into a format ready to ingest into Apache Solr. Once in Solr, the metadata is indexed and searchable, with additional features such as faceted search filters.
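The harvest, mill and search flow can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the production Harvester/Mill: the record shape, the Solr field names (`title_t`, `language_ss`, `project_s`) and the core URL are all hypothetical. The query parameters (`q`, `fq`, `facet`, `facet.field`, `rows`) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def mill(record):
    """Turn one harvested record into a flat document ready for Solr.

    The input keys and output field names are assumptions for this sketch.
    """
    reference = record["reference"]
    return {
        "id": reference,
        "title_t": record.get("title", "").strip(),
        # De-duplicate and sort so facet values are stable.
        "language_ss": sorted(set(record.get("languages", []))),
        # Assume the first path segment of the reference names the project.
        "project_s": reference.split("/")[0],
    }


def facet_query(core_url, text, filters):
    """Build a Solr select URL with facet counts and filter queries."""
    params = [
        ("q", text),
        ("facet", "true"),
        ("facet.field", "language_ss"),
        ("facet.field", "project_s"),
        ("rows", "10"),
    ]
    # Each selected facet value becomes an fq (filter query) parameter.
    params += [("fq", f'{field}:"{value}"') for field, value in filters.items()]
    return f"{core_url}/select?{urlencode(params)}"


harvested = {
    "reference": "EAP001/1/1",
    "title": "  Parish register ",
    "languages": ["Spanish", "Latin", "Spanish"],
}
document = mill(harvested)
url = facet_query("http://localhost:8983/solr/eap", "register",
                  {"language_ss": "Latin"})
print(document)
print(url)
```

Keeping the Mill as a pure transformation from harvested record to Solr document means it can be re-run over the whole archive whenever the index schema changes.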
The Endangered Archives Programme has been running for over a decade, and has amassed over six million images. The previous EAP site could not process all of the images, meaning some archives were unavailable to users. The scale of some archives regularly caused the site to crash, requiring effort from Library staff to rectify.
The new site is built on solid, scalable infrastructure enabling all of the images to be presented together online for the first time, and with the ability to add many, many more as new digitisation projects are commissioned.
Deliver in seven weeks
One of the reasons the Library chose us as its partner on this project is our proven track record of delivering to tight timescales. The stakeholder meeting to review the website was critical to the Programme’s continued funding, and its date was immovable.
We’re able to deliver quality quickly thanks to our strong agile processes and experienced team.
Other recent immovable deadlines we’ve delivered to include a visit from the President, the birth of a baby, the snap UK general election and the grand opening of a building.
Technical approach
Using IIIF provides integration with Universal Viewer as well as out-of-the-box compatibility with other viewers using the IIIF standard, such as Mirador and the Internet Archive Book Reader.
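That interoperability comes from describing each item in a standard JSON document that any IIIF viewer can read. As a hedged sketch, here is a minimal single-image manifest in the shape defined by the IIIF Presentation API 3.0. We are assuming that API version purely for illustration, and all identifiers and URLs below are hypothetical.

```python
import json


def minimal_manifest(manifest_id, label, image_service, width, height):
    """Return a one-canvas manifest shaped like IIIF Presentation API 3.0."""
    canvas_id = f"{manifest_id}/canvas/1"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",  # the image "paints" the canvas
                    "target": canvas_id,
                    "body": {
                        "id": f"{image_service}/full/max/0/default.jpg",
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                        # Points the viewer at the zoomable image service.
                        "service": [{
                            "id": image_service,
                            "type": "ImageService3",
                            "profile": "level1",
                        }],
                    },
                }],
            }],
        }],
    }


manifest = minimal_manifest(
    "https://example.org/iiif/EAP001-1-1/manifest",   # hypothetical
    "Example archival file",
    "https://images.example.org/iiif/EAP001-1-1-5",   # hypothetical
    width=4000,
    height=3000,
)
print(json.dumps(manifest, indent=2))
```

A viewer only needs the manifest URL, so the same document can be opened in Universal Viewer, Mirador or any other compliant client without changes on the server.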
The Endangered Archives Programme has over 300,000 archive data records, and we imported all of these into Apache Solr to allow for rapid searching, filtering and retrieving of this data.
We used the Drupal CMS to allow the Library to add and edit details for the hundreds of individual projects that contribute these archives, making sure to account for the different languages and scripts used around the world.
To deliver high-quality imagery, we used the high-resolution TIFF-format master images that the Library holds as preservation copies. With more than six million images, this equated to over 200TB of data! We needed a way to transfer this data quickly and efficiently from British Library storage to the new website servers.
We recommended using the Amazon Snowball service, which bypasses the internet, making the data transfer cost-effective and significantly faster (even for an organisation like the British Library with incredible broadband).
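Some back-of-envelope arithmetic shows why shipping the data physically is attractive. The sketch below uses the 200TB and six million figures quoted above as round numbers (both are stated as "over", so the results are lower bounds), treats a terabyte as 10^12 bytes, and assumes hypothetical sustained link speeds that are not taken from the project.

```python
TOTAL_BYTES = 200 * 10**12      # "over 200TB", using decimal terabytes
IMAGE_COUNT = 6_000_000         # "more than six million images"
SECONDS_PER_DAY = 86_400


def average_image_megabytes():
    """Average size of one master TIFF, in megabytes."""
    return TOTAL_BYTES / IMAGE_COUNT / 10**6


def transfer_days(link_gbps):
    """Days needed to move everything over a link running flat out.

    link_gbps is a hypothetical sustained throughput in gigabits per second;
    real transfers are slower because of protocol overhead and contention.
    """
    total_bits = TOTAL_BYTES * 8
    seconds = total_bits / (link_gbps * 10**9)
    return seconds / SECONDS_PER_DAY


print(f"Average master image: {average_image_megabytes():.1f} MB")
for gbps in (1, 10):
    print(f"{gbps} Gbit/s link: {transfer_days(gbps):.1f} days")
```

Even a dedicated 1 Gbit/s connection would be saturated for more than two weeks under these assumptions, before allowing for retries or other traffic.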
We then created a system that automatically detects each new TIFF image upload and converts it to JPEG2000, a format from which the IIPImage service can dynamically serve web-friendly JPEG images.
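The detection step can be pictured as a filter over newly uploaded files. The following is a simplified sketch, not the production system: the storage key layout and the `.jp2` naming convention are assumptions, and the actual image conversion is deliberately left out.

```python
from pathlib import PurePosixPath

TIFF_SUFFIXES = {".tif", ".tiff"}


def derivative_key(tiff_key):
    """Name of the JPEG2000 derivative for a TIFF key (assumed convention)."""
    return str(PurePosixPath(tiff_key).with_suffix(".jp2"))


def pending_conversions(uploaded_keys, existing_derivatives):
    """Pair each uploaded TIFF that has no derivative yet with its target key."""
    existing = set(existing_derivatives)
    jobs = []
    for key in uploaded_keys:
        if PurePosixPath(key).suffix.lower() not in TIFF_SUFFIXES:
            continue  # ignore anything that is not a TIFF master
        target = derivative_key(key)
        if target not in existing:
            jobs.append((key, target))
    return jobs


# Hypothetical keys, for illustration only.
uploads = ["EAP001/1/1/001.tif", "EAP001/1/1/002.TIF", "notes.txt"]
already_converted = {"EAP001/1/1/001.jp2"}

jobs = pending_conversions(uploads, already_converted)
print(jobs)
```

Checking for an existing derivative makes the step safe to re-run, which matters when millions of files arrive in batches.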
Our system categorises the images by the archival file and project that they belong to, extracts their height and width, and stores all this information in Apache Solr.
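As an illustration of the dimension-extraction step, the sketch below reads width and height directly from the header of a classic TIFF file using only the Python standard library, then builds a record for indexing. It is not the code used on the project: the identifier scheme (project, then file, then image number) and the field names are assumptions, and BigTIFF files are not handled.

```python
import struct

TAG_IMAGE_WIDTH = 256
TAG_IMAGE_LENGTH = 257   # TIFF's name for image height


def tiff_dimensions(data):
    """Return (width, height) from the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"      # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"      # big-endian byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("unsupported TIFF variant")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    found = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i   # each IFD entry is 12 bytes
        tag, field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag not in (TAG_IMAGE_WIDTH, TAG_IMAGE_LENGTH):
            continue
        if field_type == 3:    # SHORT, stored in the first 2 value bytes
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
        elif field_type == 4:  # LONG, uses all 4 value bytes
            (value,) = struct.unpack(endian + "I", data[start + 8:start + 12])
        else:
            raise ValueError("unexpected field type for image dimension")
        found[tag] = value
    return found[TAG_IMAGE_WIDTH], found[TAG_IMAGE_LENGTH]


def image_record(identifier, data):
    """Build an index record; identifier layout and field names are assumed."""
    parts = identifier.split("/")
    width, height = tiff_dimensions(data)
    return {"id": identifier, "project_s": parts[0],
            "file_s": "/".join(parts[:-1]), "width_i": width, "height_i": height}


# A tiny hand-built TIFF header with two IFD entries, for demonstration.
sample = (b"II" + struct.pack("<HI", 42, 8) + struct.pack("<H", 2)
          + struct.pack("<HHII", TAG_IMAGE_WIDTH, 4, 1, 4000)
          + struct.pack("<HHIHH", TAG_IMAGE_LENGTH, 3, 1, 3000, 0)
          + struct.pack("<I", 0))
record = image_record("EAP001/1/1/5", sample)
print(record)
```

Reading only the header avoids decoding tens of megabytes of pixel data per image just to learn its size.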
The final system automatically scales up and processes images as fast as they can be imported to AWS: a rate of around eight images per second for direct import from the Snowball appliance.
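To put that rate in context, a quick calculation shows how long the full collection takes at eight images per second. The image count is the round "six million" figure used throughout, so treat the result as an approximation.

```python
IMAGES_PER_SECOND = 8
IMAGE_COUNT = 6_000_000
SECONDS_PER_DAY = 86_400

images_per_day = IMAGES_PER_SECOND * SECONDS_PER_DAY
total_seconds = IMAGE_COUNT / IMAGES_PER_SECOND
total_days = total_seconds / SECONDS_PER_DAY

print(f"Images processed per day: {images_per_day:,}")
print(f"Time for six million images: {total_days:.1f} days")
```

At that pace the whole backlog clears in under nine days of continuous running, which fits comfortably inside a seven-week delivery window.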
All these systems are running on Amazon Web Services (AWS) infrastructure so that we can quickly scale the system to meet current and future needs.
Find out more
Inspired? To talk to us about IIIF, agile and online archives, please get in touch.