VTL: Virtual Transcription Laboratory
Overview
Virtual Transcription Laboratory (VTL) empowers researchers, cultural heritage professionals, and data providers to unlock and enrich text-based cultural heritage (CH) materials at scale. By combining advanced automated processing with expert human input, it delivers flexible, reusable workflows for text recognition and semantic enrichment. Whether working with historical newspapers, manuscripts, or ancient tablets, VTL supports the transformation of diverse CH objects into structured, searchable, and meaningful data.
Feature List
- End-to-end workflows for digitisation, text recognition, and enrichment of CH materials
- Hybrid processing: combines automation (e.g. ATR, NER) with human curation for high-quality results
- Smart discovery tools: find and retrieve datasets through powerful metadata and catalogue queries
- Supports a wide range of text types – from manuscripts to newspapers, cuneiform to oral history
- Modular and flexible: workflows adapt to different languages, scripts, and research needs
- Toolchain interoperability: integrates cutting-edge tools with existing CHI and research infrastructures
- Multi-actor support: for CH institutions, researchers as users, and researchers as data creators
- Promotes open, FAIR data practices in the CH and DH research communities
- Unlocks “hidden” collections by improving access to poorly digitised or under-described datasets
- Scenario-driven innovation: supports tailored use cases that inspire reuse, annotation, and new insights
User Experience
Milena, a dedicated collection manager at a public library in a mid-sized European city, treasures a fading archive of 19th-century local newspapers—written in a non-Latin script and vital to the region’s history. With limited funding and only a part-time student assistant, she’s managed to digitise a handful of titles. Yet, the quality varies, and the bulk of the collection—some still on fragile microfilm—remains inaccessible.
Milena dreams of making the papers fully searchable for local historians, genealogists, and university students. But OCR tools struggle with the script, producing error-ridden results.
One day, she discovers the Virtual Transcription Laboratory (VTL) within the Cultural Heritage Cloud. Curious, she uploads a test batch of TIFF newspaper scans. The VTL’s OCR Dashboard, tailored for historical newspapers and specific languages, impresses her. After reviewing and correcting results through its interface, she begins uploading larger volumes.
Milena also explores IIIF publication options and shares her collections in the Cultural Heritage Cloud Knowledge Base, enabling search, article segmentation, and entity recognition. The enriched content travels far—reaching Jay, a Bulgarian-American illustrator in California. Jay, captivated by a news piece echoing Balkan folk theatre, dives in. Inspired, he creates a community around 19th-century shadow theatre on the Cultural Heritage Cloud, uploading his own collection and refining the metadata.
His work leads to a beautifully illustrated book of Balkan folktales—praised by UNESCO for celebrating intangible cultural heritage. And Milena, quietly watching from her library, smiles. Her fragile newspapers have found new life, far beyond the city’s borders.
Why on the ECCCH?
The Cultural Heritage Cloud will offer new means of discovering resources (especially cultural heritage objects) by providing comprehensive descriptions of them through a dedicated catalogue as well as through a knowledge graph. It will simplify the processing of datasets by offering easily obtainable computing capacities for processing tasks, flexible in terms of available computing power (GPUs, RAM, time). These resources will be allocated in different ways depending on institutional or other collaboration contexts, ensuring they are free at the point of use.
These capacities can be used to run services (Software as a Service) integrated into the Cultural Heritage Cloud, or to run custom services deployable on on-demand virtual machines, using state-of-the-art deployment procedures and container-based virtualisation, e.g. Docker.
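As a rough illustration of the container-based deployment route mentioned above, a custom processing service might be packaged in a Dockerfile along these lines. This is a minimal sketch, not a prescribed ECCCH procedure: the base image, file names, port, and `serve.py` entry point are all hypothetical.

```dockerfile
# Hypothetical container for a custom text-recognition service.
# Image contents, port, and the serve.py entry point are illustrative only.
FROM python:3.11-slim

WORKDIR /app

# Install the service's dependencies first so this layer is cached
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the service code itself
COPY . .

# Expose the service's HTTP endpoint
EXPOSE 8080
CMD ["python", "serve.py"]
```

Such an image could then be built and started on an on-demand virtual machine with standard Docker commands, e.g. `docker build -t atr-service .` followed by `docker run -p 8080:8080 atr-service`.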
These computing capacities will be integrated with personal storage capacities, allowing users to access data, store newly generated data, and share it flexibly with colleagues across institutional boundaries. These storage capacities will be federated with institutional storage and that of other infrastructures, allowing data to move seamlessly between them.
All these offerings will be seamlessly available thanks to Federated Identity, allowing users to access resources and services with their local account (provided by their home institution).