Over the last decade, we have made substantial progress addressing the challenges of digital preservation. We have identified some of the real problems, which were not always what we expected them to be; we have built tools together through international collaborations; and we have started to put sustainable frameworks into place. By encouraging interaction among problem owners, practitioners, and technologists we have made substantial progress on the path from pilot to production.
The environment in which we operate, however, continues to change. Some of our progress has addressed the unique issues that arose in early digital materials - the ones that form the bulk of today's digital collections. Recent and emerging trends in digital content challenge our assumptions and established approaches. These include dynamic elements in HTML5 and ePUB3, information-hiding services such as link shorteners, and content that reflects the ongoing conversation about it. Furthermore, many large-scale content holders are under significant budget pressure. This raises issues around staffing, skills, training, and the cost of services and technology.
This presentation will discuss risk as the central motivator for digital preservation. We discuss why we need risk assessment for digital collections. We present examples of risks for digital objects and, in particular, discuss how risks extend to the various components of the environment in which digital objects live. We discuss how risk assessment ties in with other digital preservation activities, such as characterization of digital objects, preservation watch, preservation policies, and preservation planning, and how these finally result in the prioritization of the identified preservation actions. We present examples of risk categories that have been used in practice (a pragmatic approach at the British Library, as well as the thorough risk assessment handbook of The National Archives, UK) and in theory (the Planets risk model). We discuss how risk metrics are computed from information such as probability and impact measures, which risk management approaches exist (such as avoidance and mitigation), and, finally, the difficulties in assessing risks.
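As a concrete illustration of the risk metrics mentioned above, a common pragmatic scheme scores each risk as the product of its probability and impact and prioritises preservation actions by that score. The sketch below is a minimal, hypothetical example of that scheme; the risk names and scales are illustrative and are not taken from the Planets risk model or any institution's handbook.

```python
# Hypothetical risk-scoring sketch: score = probability x impact,
# as used in simple risk matrices. Names and scales are illustrative.

def risk_score(probability: float, impact: float) -> float:
    """Combine a probability (0-1) with an impact measure (e.g. 1-5)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * impact

def prioritise(risks: dict) -> list:
    """Order risk names by descending score, to prioritise preservation actions."""
    return sorted(risks, key=lambda name: risk_score(*risks[name]), reverse=True)

# Illustrative risks: (probability, impact)
risks = {
    "format obsolescence": (0.6, 4.0),
    "storage media failure": (0.3, 5.0),
    "loss of rendering environment": (0.5, 3.0),
}
print(prioritise(risks))  # "format obsolescence" (score 2.4) comes first
```

A real assessment would of course rest on how probability and impact are estimated in the first place, which is exactly the difficulty the presentation discusses.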
We all know that we should have policies. OAIS tells us so; Audit & Certification authorities tell us so. But most of us have just plunged in without policies, or have written one as an afterthought in preparation for audits. Inge Angevaare of the Dutch Digital Preservation Coalition NCDD analyses recent examples of digital preservation policymaking in the Netherlands in order to determine what worked, what did not, and why. For rather than an afterthought, policies should be tools for practitioners.
The Wellcome Library is a collecting institution specialising in the history of medicine and allied disciplines. As a subject specialist institution we need to collect both digital and analogue information to remain relevant, but as a collecting institution we do not have control over the formats of material we receive. In addition, as with most archives, our collecting policy has traditionally been reactive, rather than proactive. Whilst this approach has worked for paper archives, it is not suitable for digital material which becomes obsolete quickly.
This presentation will look at our experiences of working with digital archives, and how this has changed work at the Wellcome Library, focussing particularly on the difficulties of working with a diverse, often obsolete, range of formats. One of the ways of solving some of the problems associated with this would be normalization, but this approach has advantages and disadvantages, both of which will be discussed.
With increased use of the Internet, the ways of disseminating knowledge have fundamentally changed. Firstly, progress in digitisation offers scientists the opportunity to share their knowledge worldwide, and, secondly, it is becoming apparent that research data has acquired a new value, so that its long-term preservation will be a key issue in the years to come.
In order to be able to offer the scientists at ETH the opportunity to preserve their research data in the long term, the ETH-Bibliothek has launched the ‘Digital curation’ project. The project group began its work on 1 October 2010.
So as to gain an overview of how research data is currently handled in the 16 departments, the survey "Handling research data" is being distributed with the assistance of the IT support managers in the various departments. The questionnaire gives scientists the opportunity to describe how research data is actually handled in practice in their own professorship or research group, and will thus provide important information for the work of the ‘Digital curation’ project group.
The questionnaire consists of two parts. In the first part, we want to know how research data is handled in general. The questions are based on a survey carried out by the "Research data" working group of the Leibniz Association. The second part includes some specific questions about the research data, based on the paper "Conducting a Data Interview" by Witt & Carlson, Purdue University Libraries, 2010.
The paper proceeds from the experience with digital preservation gained by the author over several years at the National Library of the Czech Republic, as well as through participation in international projects. The primary accent will be placed on the necessary preparations and on fundamental decisions and changes connected with digital preservation. Digital preservation affects an institution in complex ways and creates the need to transform routine approaches and to make organisational changes. An institution's unpreparedness for relatively fundamental changes can become a more serious hurdle for digital preservation than a lack of financial means for investment. The contextual selection of holdings for digital preservation will also be mentioned.
This presentation will provide some introductory comments regarding the environment within which we work, and in particular the increasing volumes and complexity of digital material that we have to manage. This will be followed by a description of the work being done in New Zealand to leverage the National Library's work on the National Digital Heritage Archive to support the public record and Archives New Zealand's Government Digital Archive project. Finally, it will look at the status of digital preservation internationally, including the need to ensure a continued global, collaborative and interoperable approach to digital preservation research and practice.
Professionals working in and with digital archives have accumulated a tremendous number of theories, models and practices to address the problems of digital preservation. The designated community, significant properties, the representation model of PREMIS, the Performance Model of the National Archives of Australia and PLATO are just a few of them. But preservation planning remains a difficult task. How can the different approaches be integrated into a framework that maintains consistency over the long term?
The Nestor working group on digital preservation has written a guideline which sketches the flow from data to information, and further on to limited information objects. It shows how these objects can be maintained by grouping them, according to their purposes of use, into subtypes that share the same significant properties. On this basis, the necessary preservation actions can be chosen consistently for the whole life cycle of the objects. The model also gives hints for further automation.
The "Nestor Leitfaden für Digitale Bestandserhaltung" ("Nestor Guideline for Digital Preservation") will be published in November 2011. An English version is planned.
Two major planning instruments have been developed by the nestor network and the WissGrid project. The nestor ingest guide gives an overview of the necessary tasks during the ingest. Ingest is regarded as the transfer of responsibility and not necessarily the transfer of data. The importance of this phase has been emphasized by several publications which identify it as the single largest cost and risk factor.
The WissGrid data management blueprint and checklist focuses on research data, but also takes a broader approach. It tries to guide researchers and information specialists not only in planning the ingest but through the whole lifecycle from data creation to ingest, preservation and reuse.
In this workshop session we introduce both guides and discuss our experiences. As overarching theme we want to address the relation of lifecycle planning (WissGrid) to individual process planning (nestor), their complementarities and the advantages of the approaches. Harry Enke (Leibniz Institute for Astrophysics Potsdam) will explain the WissGrid approach and Karsten Huth (State Archive of Saxony) will introduce the nestor guide. The session will be moderated by Jens Ludwig (Goettingen State and University Library).
At the highest level of any business process, it takes mission and vision, combined with the right policies and standards, to achieve long-term objectives. What, then, is the importance of best practices, tools and workflow? It still takes a carpenter to build a house, even though the hammer and saw have largely been replaced by prefabricated and standard components.
Digital archivists and librarians will increasingly make use of tools, best practices and standards; this will allow them to make effective use of planning, pricing and workflow tools. Over the last decade, R&D has resulted in the production of several tools for planning, pricing and workflow. Most of these tools are still "under construction", and in this respect the sector still lacks mature digital preservation tools.
Unlike traditional tools, software tools are almost always "under construction", and it is therefore important that the sector start using these tools, provide feedback and support their improvement. One of the features of the Plato planning tool is to support the practitioner in collecting the information required for the planning process. It allows practitioners to connect policy to the requirements of a collection.
What do we need representation information registries for? How do they help solve the core problems of maintaining long-term access to digital assets? How do we create more high quality content for registries? What's the best way to share information? What tools do we need to help us collect, filter and apply representation information? And what is a file format anyway?
These and other questions will be covered in this interactive discussion-based workshop.
This workshop will cover digital preservation of audiovisual content and its access requirements, and summarise the results of a decade of Presto projects. PrestoPRIME is about digital preservation technology applied to audiovisual content.
Policies encapsulate the 'what' of an organisation or service. They describe the intentions of the organisation, but not how those intentions are to be implemented or executed. As such, policies make it easier for others to understand the purpose and intentions of your organisation. They also help to ensure that an organisation's business processes are in line with the intentions of the organisation. Policies often contain legal and domain-specific elements. Policies are essential when organisations interact with each other, as they illustrate what constraints each has to live within. Policies themselves are often natural-language documents that are not implementable on their own. A procedure needs to be followed that results in implementable processes which enforce the policy, with each workflow corresponding to a particular policy statement. The process must be traceable, such that the link between policies and processes is captured.
This workshop will look at the process of creating policies based on the policy creation experience from the German National Library and will fold in lessons learnt from that process. The material will form the basis for discussion on the process of creating policies from workshop participants. The workshop will also look at the process of deriving processes from policies based on the work carried out in the SHAMAN and PASOPOL groups. The workshop will also describe the tools that can be used to create policies. The intention is to gather feedback on the policy creation process to ensure a robust set of tools exist for policy creation. In addition, a description of the policy derivation process will be given with the intention of gathering feedback on the approach and an assessment of its usefulness.
This presentation starts with an overview of the approaches to planning, implementing, and carrying out ingest of digital entities into curation and preservation repositories. The primary assumption shaping the analysis is that repositories will be ingesting a wide range of types of digital objects, often involving complex inter-relationships between objects, from a variety of information-producing and information-using organisations or individuals. Effective ingest depends upon a rich understanding of the digital entities being ingested; such an understanding requires collaboration between producing organisations and repositories. In fact, one of the weakest aspects of the OAIS model is its delineation of the ingest process. The process of establishing agreement between creating and using organisations and curating and preservation repositories requires communication, negotiation, planning, testing and evaluation. The presentation concludes by suggesting some generic approaches to ingest, and in doing so lays a foundation for the following presentations, which investigate particular aspects or instances of ingest.
Audiovisual content is more like books than like digital library content: at present, it is mainly on shelves. Much of it is analogue, and much of the digital content is not in files on mass storage systems, but on things like CDs, DVDs, DAT tapes and other digital carriers that are not files. So ingest for such content does not begin with reading files: it begins with creating files, by digitising analogue content and 'ripping' digital content from physical (non-file) carriers. All of this puts audiovisual ingest one or two generations of technology behind ingest of file-based content, so questions about "how is a SIP formed?", when applied to audiovisual content, are likely to get the response "what's a SIP?". In addition, audiovisual content is about 100 times larger than other kinds of content; has 'wrapper' formats with complex structures, of types (e.g. MXF) unrecognised by standard digital library tools; has a time dimension, synchronising audio, video and subtitles, that has to be understood and preserved; and has multiple encodings and proxies that need to be managed through cycles of obsolescence. This paper will introduce the real issues in audiovisual ingest.
Format identification is a prerequisite for preserving, and providing continuing access to, any digital object. It determines the more detailed characterisation of that object which may be possible, and informs preservation planning at a fundamental level. This presentation will describe the approach to characterisation, and specifically format identification, developed by the author at The National Archives in the UK, and how it is being implemented in a range of settings, including the Parliamentary Archives.
It will begin by considering the characterisation of digital objects, as a key stage in repository ingest workflows, and as an essential element of preservation planning and action. It will describe the distinctions between format identification, validation, and metadata extraction, as fundamental aspects of the characterisation process, and review the range of format identification techniques available. The DROID tool was developed to implement some of these techniques, using a combination of internal and external signatures. This presentation will describe this approach, assessing its strengths and limitations, and suggest areas for future research and enhancement; these must be discussed in the context of the evolving debate about the role of technical registries such as PRONOM and GDFR. It will consider how format identification may be applied to different types of material, and in a range of different scenarios. It will also describe recent developments of DROID, including its use as part of the PLANETS framework, and the concept of collection profiling.
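Internal signatures of the kind DROID matches are byte patterns, often at the start of a file, that are characteristic of a format. The following is a minimal sketch of that matching technique; the signature table is a tiny illustrative subset, not PRONOM's actual signature file, and real DROID signatures are considerably richer (offsets, wildcards, priorities).

```python
# Minimal sketch of internal-signature (magic number) matching.
# The table below is an illustrative subset of well-known signatures,
# not the PRONOM signature file that DROID actually uses.

SIGNATURES = {
    b"%PDF-": "Portable Document Format (PDF)",
    b"\x89PNG\r\n\x1a\n": "Portable Network Graphics (PNG)",
    b"PK\x03\x04": "ZIP container (also used by DOCX, EPUB, ODF...)",
}

def identify(header: bytes) -> str:
    """Match the first bytes of a file against known internal signatures."""
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            return fmt
    return "unknown"

def identify_file(path: str) -> str:
    """Read just enough of the file to attempt an identification."""
    with open(path, "rb") as f:
        return identify(f.read(16))

print(identify(b"%PDF-1.7 ..."))  # -> Portable Document Format (PDF)
```

The ZIP entry also illustrates a known limitation of pure magic-number matching: DOCX, EPUB and ODF files all begin with the same ZIP signature, which is one reason DROID supplements internal signatures with external signatures (such as file extensions) and container inspection.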
Finally, it will consider how tools such as DROID can be used practically, as components in an ingest workflow, with particular reference to current work by the Parliamentary Archives to develop a digital repository. Format identification can be used to drive other characterisation processes, and examples of this will be described. It will describe the types of metadata created, and how they can be utilised during ingest. The roles and requirements of depositors, archivists and end users will also be addressed. Lastly, the lessons learned from the practical implementation of characterisation workflows during ingest, and the integration of varied tools to achieve this, will be discussed.
In mid-2006, the Portuguese National Archives (Directorate-General of the Portuguese Archives) launched a project called RODA (Repository of Authentic Digital Objects), aiming to identify and bring together all the necessary technology, human resources and political support to carry out long-term preservation of digital materials being produced by the Portuguese public administration. Among the original goals of RODA was the development of a digital repository capable of ingesting, managing and providing access to the various types of digital objects produced by national public institutions. The development of such a repository should be supported by open-source technologies and, as much as possible, be based on existing standards such as OAIS, METS, EAD and PREMIS.
The end result of the project is a full-fledged repository that supports the integration of preservation action tasks, with an architecture that allows the continuous adoption of new functionality and technologies. It defines a data model that meets all the metadata requirements of a digital preservation repository, and a complex ingest workflow that validates and adapts all information to this curation environment.
For more information visit http://hdl.handle.net/1822/9408
The National Library of Finland is responsible for the collection, preservation and accessibility of Finland's published national heritage, and for its other unique collections. This presentation will give a general overview of the several processes employed in digitization and in the handling of electronic legal deposit. METS has been chosen as the container format for digitized materials, and a considerable amount of effort has been put into creating adequate METS profiles. As METS will be heavily relied upon as a container format, the practicalities are discussed in some depth. Regarding electronic legal deposit, the National Library has concentrated on large-scale web harvesting. The depositing of e-books is being tested with publishers. Future plans concerning digital preservation will be presented, especially the National Digital Library initiative.
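To make the container role of METS concrete, the sketch below builds a minimal METS skeleton with the Python standard library: a header, a descriptive metadata section, a file section and a structural map. The element names follow the METS schema, but the attribute values and the overall shape are illustrative only, not the National Library of Finland's actual METS profiles (a schema-valid document would also need, for example, metadata wrapped inside the dmdSec and file locations inside the file elements).

```python
# Illustrative METS skeleton using only the standard library.
# Element names follow the METS schema (http://www.loc.gov/METS/);
# attribute values and profile details are hypothetical.
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

mets = ET.Element(q("mets"))
ET.SubElement(mets, q("metsHdr"), CREATEDATE="2011-09-01T00:00:00")
ET.SubElement(mets, q("dmdSec"), ID="dmd1")           # descriptive metadata
file_sec = ET.SubElement(mets, q("fileSec"))          # inventory of files
grp = ET.SubElement(file_sec, q("fileGrp"), USE="master")
ET.SubElement(grp, q("file"), ID="f1", MIMETYPE="image/tiff")
struct = ET.SubElement(mets, q("structMap"), TYPE="physical")
ET.SubElement(struct, q("div"), TYPE="page", ORDER="1")

print(ET.tostring(mets, encoding="unicode"))
```

Even this skeleton shows why METS suits the container role: descriptive metadata, the file inventory and the structural map travel together in one package document.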
One of the greatest challenges in digital preservation is preparing records metadata of good quality. Quite a lot of metadata has already been created with records and resides in electronic records management systems (ERMS). A significant part of this information retains value even after the records' active phase has ended. Not losing this data in the archiving process is no easy task for public archives, because the ingest functions of the archive must be very flexible yet, on the other hand, quite standardised. Archives have to deal with the various ERMS that are used in agencies. The National Archives of Estonia has created the universal archiving module (UAM), which allows agencies to prepare records for archiving. The tool allows the structure of records to be rearranged and additional descriptions to be added to them. It is also possible to check whether the data prepared for archiving meets the rules set by the archival institution. Transmission to the National Archives can be done manually (e.g. saving submission information packages to DVDs) or using the Estonian secured internet layer X-Road. The packages arrive in the archive through a gateway called Kleio, where additional checks are made before transfer into the digital repository.
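The kind of rule check described for the UAM can be pictured as a required-fields validation applied before transfer. The sketch below is hypothetical: the field names and the flat record structure are illustrative, not the actual rules or data model of the National Archives of Estonia.

```python
# Hypothetical pre-transfer rule check: verify that each record prepared
# for archiving carries the metadata fields the archive requires.
# Field names are illustrative only.

REQUIRED_FIELDS = {"title", "creator", "date_created", "access_restriction"}

def check_record(record: dict) -> list:
    """Return the missing required fields (an empty list means the record passes)."""
    return sorted(REQUIRED_FIELDS - record.keys())

def check_package(records: list) -> bool:
    """A submission package passes only if every record in it passes."""
    return all(not check_record(r) for r in records)

record = {"title": "Annual report", "creator": "Ministry X", "date_created": "2009"}
print(check_record(record))  # -> ['access_restriction']
```

In practice such checks run twice in the workflow described above: once in the agency's preparation tool, and again at the archive's gateway before the package enters the repository.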
The three Goportis partners – the German National Library of Science and Technology (TIB), the German National Library of Medicine (ZB MED) and the German National Library of Economics (ZBW) – started the implementation of a collaboratively operated digital preservation system in 2010. Organisational, technological and institutional needs and requirements were identified during a pilot phase. The heterogeneous holdings of the three partners and their different technological infrastructures call for a flexible ingest structure, which fulfils each institution's needs and simultaneously forms a solid framework with regard to trustworthiness. The presentation highlights decisions which needed to be made along the way and shows varying ingest flows, ranging from (semi-)automated submission applications to manual ingests.