Thursday, 6 June 2013

BT Digital Archives

A few weeks ago, on 15 May 2013, several British Library folks involved in Digital Scholarship went to an event hosted by Coventry University about the digitisation of the BT Archive and the subsequent creation of a digital library for this material. There were a number of speakers from the various collaborative partners in the project – Coventry University, the BT Archive, and The National Archives.

BT is the world’s oldest communications company with a history stretching back to the start of UK telecommunications in 1846. This history is reflected in their archive somewhere beneath Holborn which contains 2.5 to 3 kilometres of shelving. As well as material about BT’s development as a UK company, the Archive contains a lot of material about the global history of telecommunications and the international development of an information society built on a foundation of telecommunication.

From Flickr user: tarotastic

David Hay, the Head of Archives at BT, gave an introduction to the digitisation project which began in January 2012 and is due to be completed by July 2013. The project digitised approximately 8% of the entire collection, with selection criteria focusing primarily on science and technology-related material and aiming for a balance between breadth, depth, and “pretty stuff”. Formats include photographs (tens of thousands), research reports, and subject/registry files. Their audience focus was primarily on Higher Education with a secondary audience of the general public. The bid for JISC funding was for 450,000 digital images and ultimately they produced 481,099 over a 10-month period.

Chris Mumby from The National Archives gave a talk on the project workflow: how the items make their way from Holborn to Kew, from conservation to imaging, and from internal OCR to external transcription. As a collaborative project, different aspects of the workflow are handled by different partners and different work-packages:
  • Project Management package – Coventry University
  • Digitisation package – BT Archive and The National Archives
    • Scoping at BT Archive
    • Items transported to TNA at Kew
    • Conservation by TNA staff
    • Imaging by TNA staff
    • Metadata creation on a mixed model
      • OCR text created at TNA
      • Transcription services from an external provider
    • Items transported back to BT Archive
  • Portal creation package – BT Archive and Coventry University
  • Academic package – Coventry University and BT Archive

The BT Archives project is of a similar scope and size to the project on which I work and so the issues and proposed solutions related to workflow management were familiar to me. The need for consistency was mentioned a lot: David Hay emphasised the need to define language and nomenclature at the start of a project (using glossaries, stylesheets, guidelines, etc.) in order to ensure effective communication between areas; he also discussed the need for consistency in statistics gathering and metrics as well as the need to occasionally be flexible with milestones and timescales. For ensuring consistency in file-naming conventions, TNA found it best to create a folder structure and filenames that reflect BT cataloguing conventions (which meant learning the unique cataloguing conventions and ‘finding number’ constructions of the Archive) and replicate the archive file structures.
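
To make the file-naming idea concrete, here is a minimal sketch in Python of deriving a folder path and filename from a catalogue reference. The reference format (`ABC/12/3`), the `image_path` function, and the zero-padded page suffix are all assumptions made for illustration; the talk did not describe TNA's actual scheme in this detail.

```python
import re
from pathlib import PurePosixPath


def image_path(finding_number: str, page: int, ext: str = "tif") -> PurePosixPath:
    """Map a catalogue reference to a folder path and filename.

    Hypothetical scheme: each level of the reference becomes a folder,
    and the filename repeats the whole reference plus a page number.
    """
    # Split the reference on slashes or whitespace; drop empty pieces.
    parts = [p for p in re.split(r"[/\s]+", finding_number.strip()) if p]
    if not parts:
        raise ValueError("empty finding number")
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Repeating the full reference in the filename means a file that is
    # moved out of its folder can still be traced back to the catalogue.
    stem = "_".join(parts)
    return PurePosixPath(*parts) / f"{stem}_{page:04d}.{ext}"


print(image_path("ABC/12/3", 7))
```

Mirroring the catalogue hierarchy in the folders, as the speakers described, means anyone who knows the archive's references can find the images without a separate lookup table.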

The speakers also emphasised the importance of strict project management and consistency in approach from the very start of the project. David mentioned the importance of thorough scoping at the start. Catt Baum, the Conservation Manager at TNA, said that conservation is often overlooked in digitisation projects: since there’s no point putting an image online from a file that can’t be read, it needs to be built into workflows from the start.

The iconic trimphone. From Flickr user: jovike

Both David Hay and Chris Mumby made points about minimal levels of metadata creation. Transferring data between work-packages and third-party developers creates added levels of complication so they advocated not digitising uncatalogued material and not starting projects in which cataloguing has to be built into the workflow. Their metadata creation focused on OCR of selected categories of material and some enhancing of catalogue records for ‘category B’ engineering reports.

Other speakers gave presentations on the interesting research that was being done with the digitised materials: Hilary Nesi discussed corpus query tools for text mining and linguistic analysis of the correspondence materials in the Archive (and emphasised the importance of defining what a ‘letter’ is early on in the project); Gemma Tombs talked about problem-based learning and using archive materials to construct learning scenarios for undergraduates; Martin Woolley talked about academic projects in design and his case studies of the design of the iconic trimphone and its place in UK design philosophy.


Overall the message from these presentations was that it’s what we do with the content – interesting and useful academic research, defining historical narratives – that is important rather than the physical processes of digitisation. Digitisation is a means of overcoming the constraints and limitations of the physical world and setting content free as digital material. 
