Cogapp Refinery Services

Scalable cloud-based image and data processing

Thoughtful application of automated enrichment tools

Cogapp has been building large-scale digital platforms for some of the most recognised cultural institutions for around 30 years.

We have distilled all of this experience into Cogapp Refinery Services. CRS is a performant, scalable and robust pipeline for the enrichment and presentation of large collections of digital material.

Talk to us about your data processing and enrichment plans


We employ many techniques to enhance and enrich digital images, audio and video; examples include:

  • OCR of handwritten text
  • Named entity recognition
  • Language recognition
  • Image categorisation
  • Metadata extraction

For a given corpus of material, we work closely with an institution’s domain experts to select the best set of tools to apply to the material. Using this domain knowledge in combination with the chosen toolset provides a service that is greater than the sum of its parts.

Screenshot of Arabic text with OCR overlay of co-ordinates highlighted.
Illustrative image showing how OCR of non-Latin scripts captures the position of each word, in this case Arabic text from Defending the Qurʼan Against Slander [F-1-6] (6/222), Qatar National Library, 12978, in Qatar Digital Library <https://www.qdl.qa/archive/qnlhc/12978.6> [accessed 11 July 2019]

Modern, scalable, serverless infrastructure

At its heart, CRS is based on modern, scalable cloud infrastructure. Every job is different, so CRS was developed to be highly modular. We work with our clients to understand their goals, and together we develop a plan for deploying a pipeline that meets and, where possible, exceeds requirements.

CRS uses a serverless technology stack. This allows resources to scale up and down on demand to match a project’s needs. As a result, the system can accommodate many thousands of concurrent jobs, which keeps our pipelines performant at scale.
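
To give a feel for what running many independent jobs at once looks like, here is a minimal local sketch. It is not CRS code: the `enrich` function and the item names are hypothetical, and a thread pool stands in for the serverless workers a real deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor


def enrich(item_id: str) -> dict:
    """Hypothetical stand-in for one enrichment job, e.g. OCR of a single page."""
    return {"id": item_id, "status": "done"}


def run_batch(item_ids: list[str], max_workers: int = 8) -> list[dict]:
    """Run one independent job per item and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() schedules every job on the pool and preserves input order.
        return list(pool.map(enrich, item_ids))


results = run_batch([f"page-{n}" for n in range(100)])
print(len(results), results[0])
```

Because each job depends only on its own input, the same batch can be spread over more workers without changing the code that defines the job.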

Diagram of indicative processing pipeline


The highly modular nature of CRS makes it extremely flexible in terms of output. The system can be configured to update third-party systems directly or simply output results to an agreed intermediary format for later ingestion.
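
As an illustration of the modular idea, the sketch below chains small enrichment steps and writes the result to a simple intermediary format. Everything in it is an assumption made for the example: the module names, the record fields and the choice of JSON Lines are hypothetical and do not describe the actual CRS modules or formats.

```python
import json
from typing import Callable

Record = dict
Module = Callable[[Record], Record]


def detect_language(record: Record) -> Record:
    # Placeholder: a real module would call a language-identification tool.
    # "und" is the ISO 639 code for an undetermined language.
    return {**record, "language": "und"}


def count_words(record: Record) -> Record:
    # Adds a simple whitespace-based word count of the record's text.
    return {**record, "word_count": len(record.get("text", "").split())}


def run_pipeline(record: Record, modules: list[Module]) -> Record:
    """Pass a record through each module in turn."""
    for module in modules:
        record = module(record)
    return record


def to_json_lines(records: list[Record]) -> str:
    """Intermediary output for later ingestion: one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


pipeline = [detect_language, count_words]
enriched = [run_pipeline({"id": "doc-1", "text": "An example page of text"}, pipeline)]
print(to_json_lines(enriched))
```

Swapping `to_json_lines` for a function that posts each record to a third-party system would change the output target without touching the modules themselves.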

Cogapp is very familiar with the world of metadata standards; we apply this experience so that our clients can use any output produced as efficiently as possible.

Some example outputs offered by various modules include:

  • OCR output
    • ALTO XML
    • Full text
  • Resized and cropped image sets
    • Based upon output from other CRS modules, e.g. custom crops around interesting objects that computer vision has identified within an image
  • IIIF endpoints
    • For images and/or metadata
  • PDFs
    • Fully searchable for text-based content
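
To show how two of these outputs can work together, the sketch below reads word positions from a small ALTO XML fragment and turns one of them into a IIIF Image API request for just that region. It is an illustration under stated assumptions rather than CRS code: the sample ALTO, the `https://example.org/iiif` server and the `page/1` identifier are made up, and the ALTO coordinates are assumed to be in pixels.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Made-up ALTO fragment; coordinates are assumed to be in pixels.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Refinery" HPOS="120" VPOS="80" WIDTH="310" HEIGHT="64"/>
    <String CONTENT="Services" HPOS="450" VPOS="82" WIDTH="300" HEIGHT="62"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def word_boxes(alto_xml: str) -> list[dict]:
    """Return the text and bounding box of every ALTO String element."""
    root = ET.fromstring(alto_xml)
    boxes = []
    for el in root.iter():
        # Compare local names so any ALTO namespace version is accepted.
        if el.tag.rsplit("}", 1)[-1] != "String":
            continue
        boxes.append({
            "text": el.get("CONTENT", ""),
            "x": round(float(el.get("HPOS", "0"))),
            "y": round(float(el.get("VPOS", "0"))),
            "w": round(float(el.get("WIDTH", "0"))),
            "h": round(float(el.get("HEIGHT", "0"))),
        })
    return boxes


def iiif_crop_url(base_url: str, identifier: str, box: dict, size: str = "max") -> str:
    """Build a IIIF Image API URL: {id}/{region}/{size}/{rotation}/{quality}.{format}."""
    region = f"{box['x']},{box['y']},{box['w']},{box['h']}"
    # The identifier is percent-encoded, including any slashes it contains.
    return f"{base_url.rstrip('/')}/{quote(identifier, safe='')}/{region}/{size}/0/default.jpg"


boxes = word_boxes(SAMPLE_ALTO)
url = iiif_crop_url("https://example.org/iiif", "page/1", boxes[0])
print(boxes[0]["text"], url)
```

The same approach would apply to a bounding box produced by a computer vision module: any region expressed as x, y, width and height can be requested from a IIIF image server in this way.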

Cogapp has delivered many award-winning projects over the years. Should it be required, we can of course accommodate custom web development alongside CRS to help present your digital assets in the most appropriate and engaging way for your audience.

Automation should always start with a conversation

Cogapp has been privileged to work on high-end, high-volume applications of Artificial Intelligence, Machine Learning, Natural Language Processing and Computer Vision.

Get in touch to speak with us about how we can bring the benefits of these technologies to your projects, both practically and at scale.

Start the conversation