
Menno Polak, University of Amsterdam

"The preservation of research archives and data from the universities in The Netherlands"

I would like to report on a conference on ‘archives of science’ (a term I will specify later), held at the University of Amsterdam on 29 November of last year. I report on it here because I took the initiative for it after the CASE meeting in Edinburgh, hoping to find out what the situation in the Netherlands was. But let me first explain how I became interested in the matter in the first place.

The Archive Project

Three years ago the University of Amsterdam hired me as project manager for what was called the Archive Project. The project aimed to clear an enormous backlog, some 125 years of neglected work: searching for, gathering and making accessible the archives of university faculties, departments, institutes, etc. At the end of the project these archives will be transferred to the Amsterdam Municipal Archives, a public repository, and will thus enter the public domain. All this is prescribed by the Public Records Acts of 1962 and 1995, which require that the records of public universities be treated in the same way as other government records.

In searching for these historic archives I have visited all university buildings and the rooms of all entities of the university organisation where records are kept. This sounds much more sophisticated than it really is: wet cellars, dirty attics, neglected closets and locked rooms, sometimes with the key missing. The attic in which the archive of the Faculty of Philosophy was kept had been used as a home by a family of pigeons. You really do not want me to elaborate on what that did to the papers we found there. Moreover, the university is presently housed in more than a hundred buildings all over Amsterdam. The University of Amsterdam is a very large and complex organisation with an ever-changing institutional structure. It is both a research university and an important educational institution. Obviously, I cannot seriously claim to have been everywhere and to have seen everything, but I have done the best I can.

The Archive Project is primarily concerned with administrative documents dating from the period 1877 to 1997. Administrative here means that much of it deals with budgets, personnel, buildings and so on, none of which is the core business of a university. All of it exists only to support the primary processes: providing academic education and doing scientific research.

To what extent has the core business, education and research (I shall concentrate on research), left traces in the university archives? I will come to that in a minute. Because the Archive Project covers the archives of the whole university, my interest is not limited to science in the narrow sense of the hard, natural sciences, as it would be for most CASE members. The problem obviously extends to the social sciences and the humanities as well.

As will become clear later on, which research files from the past have been preserved is by and large a matter of coincidence. So I will concentrate on the preservation of research data to be produced in the future, rather than on existing (or non-existent) research material from the past. What can be done proactively? Should the university develop a policy on research data? Looking to the future has one important advantage: the material will almost always be in digital form. I will therefore deal here with future digital research data.

Concentrating on the future of course has the added advantage that I do not come into conflict with Godelieve Bolten. Those of you who were present in Edinburgh will remember that she reported on the activities of the Center for Scientific Archives at the State Archive of North-Holland, which is exclusively interested in existing material.


The ‘Archival Police’

The Archive Project was set up following an enquiry by the national public records inspectorate into the management of university archives in the Netherlands. The results were far from uplifting, not only at our University of Amsterdam but elsewhere too, fortunately or unfortunately, whichever you prefer. I will not go into detail, except for this story: when the inspectors visited the Faculty of Law, the dean of the faculty opened the door to the room in which the records were supposedly kept, with some flourish, only to find it completely empty...

Interestingly, the inspectors never looked into, or asked about, the preservation of research files or data. It was only in their final report that they addressed the issue in passing. I quote:

‘Researchers often take research files home. In some cases these files end up in the State Archives of North-Holland, which collects important archives of science. [This is Godelieve's Center for Scientific Archives, to which I referred earlier.] In our view [the inspectors continued] in both instances this practice is contrary to the Public Records Act, because research is the task of the universities and researchers do their work as government employees. The manuscripts of the researchers therefore appear to be part of the university archives.’

In short: research archives are university archives and, as such, have to be transferred to public repositories. The sole argument given was that researchers do their research as government employees. What kind of archival material was meant was not specified; the report refers only to "manuscripts".

 
Mutual obligations

If it is true that research materials are subject to the Public Records Act, two sets of obligations arise. The public repository is obliged to guarantee the long-term preservation of, and continued access to, this material. The universities, for their part, are required to collect research material from researchers who, as the quotation above shows, are completely unused to this kind of procedure. The university administration must then select which material is worth transferring to the public repository. Lastly, the university needs to make the material accessible. All this at the university's own expense.

Anyone familiar with the financial position of the universities in The Netherlands will understand that this requirement is utterly unrealistic. And not for financial reasons alone: researchers may need to be convinced of the usefulness of this new requirement, and there is no framework for the selection and classification of these highly specialised scientific documents.

The representative of the National Archives at the conference reiterated that research files are part of the university archives and as such fall under the Public Records Act, but shied away from the consequences. As I said, two sets of obligations arise, yet the National Archives are apparently no keener to keep their end of the bargain than the universities are to keep theirs. Understandably so: looking to the future, the sheer potential volume of the material and the problems of selecting and cataloguing it give everyone serious misgivings.

 
Research material

What research material did we find? As I have already hinted, not very much. And what we found is unevenly distributed: some material in one place, none at all elsewhere. As I said, I have been in all (or most) places where university archives are kept. The obvious conclusion is that research material, where it is preserved at all, is not stored with the administrative material. I was looking for research material in the wrong place.

It is interesting to look at the administrative material for comparison. From what is left of it we can draw some conclusions, none of them very surprising:

· How much material survives depends partly on the storage space available. Especially in new buildings, no one, it seems, ever realised that the paperwork the administrations produce must be kept somewhere. Where no provision is made, the older material tends to be thrown out at some point.

· The same is true for departments and institutes that have moved a lot. The simplest method of dealing with a large amount of paper that has to be moved time and time again from one building to another is to move it to the bin.

· Perhaps one should even conclude that the National Archives and the inspectors are wrong: research materials are not part of the university archive, physically at least. The structure and contents of archives generally reflect the reality of the organisation that produced them, and the administrative workflow is quite separate from the workflow of research. Two principles of archival science may be in conflict here.


Motives for preservation

All this is not to say that we should resign ourselves to the fact that this material is usually taken 'home' or destroyed or simply vanishes into thin air.

What arguments do we have to preserve the research material?

Of course it can be considered part of our cultural heritage. But two other considerations come before this argument. In the first place, it should be possible to reproduce the research and, in so doing, check the results. This does not necessarily refer to scientific fraud, although there is no reason to be hypocritical about it. We all know how easy and how tempting it can be to leave out or skip over that one unpleasant finding that does not fit our expectations.

Obviously, the possibility of repeating the research is a limited motive for preserving research material, limited also in time. How long a period is involved depends on the discipline. Across disciplines it may range from 10 years, as was mentioned in Edinburgh for SLAC or CERN, to several decades in other fields. At the very time I was writing this, a biologist on the radio said that as a student he had worked on the taxonomy of the South-East Asian freshwater globefish. He claimed that his work had been read only by someone who studied the taxonomy of the South-East Asian saltwater globefish. Other than that, someone would study this particular species again perhaps once in a hundred years. Even if this is somewhat exaggerated, the taxonomy of 26 million animal species is obviously not studied every 10 years.

A third motive, besides cultural heritage and replication of research, is that data, although gathered in the course of one research project, may serve multiple purposes. They may subsequently be used in other fields, to solve different problems, or to solve the same problem on a larger scale.

It is particularly with this multiple use of data in mind that a committee of the OECD [Organisation for Economic Co-operation and Development] has in recent years written a report on access to scientific data, which it calls "the floating capital of the global science system". The report led to an OECD declaration on 'Open access to research data from public funding', signed in January 2004 by the ministers of all OECD member states. One of the major Dutch newspapers summarised the report concisely in a headline: "Hand over those data!" A more extensive summary can be found in an issue of Science from March 2004.

At the Amsterdam conference, the Dutch member of this Committee for Scientific and Technological Policy (CSTP) reported on its work. The central point is that the potential for international cooperation has grown enormously thanks to the present cyberinfrastructure. The premise is Open Access: "Digital research data should be publicly available to society subject only to legitimate restrictions." The question that remains is: which restrictions are legitimate?

The principles for data access formulated by the ministers are fairly abstract: Transparency, Legal conformity, Formal responsibility, Professionalism, Protection of intellectual property, Interoperability, Quality and security, Efficiency and Accountability. Moving from this abstract level to practical matters, the Dutch member of the CSTP pinpointed two issues:

· Sustainable archiving will be a problem that needs to be addressed.

· Funding conditions will play an important role in the implementation.


Counter-arguments

Three questions, or counter-arguments, continue to be asked.

1. Why are the data included in the published work insufficient?

2. Do you want to keep everything any researcher has ever produced?

3. Who is going to foot the bill?


As to the first question (the data are in the published work, aren’t they?): obviously not. The raw data are not necessarily included in the publication, and I suppose not always even the operational, analysed data. The publication contains the data that support the argument. There is nothing wrong with this, but usually far more data are available than can be printed in the text. In fact there is a fourth reason here to preserve research data: why would we jeopardize data painstakingly collected in archives, laboratories or surveys? I will return to the other two questions later.


Classification of research material

So the question is not whether the material should be preserved at all, but exactly what material we are discussing:

 
Archival material (Public Records Act)

1. administrative documents
2. academic correspondence

Material not evidently archival

3. raw data
4. analysed data

Non-archival documents

5. publications


This is my provisional classification of the material in question. Part of it consists of university records (to be transferred to a public repository), part of data, and part of published material, which is the libraries' concern. The problem is thus narrowed down to the data, both raw and operational.

 
DANS and DARE

But it is not just the public records offices that are potentially involved. There are other, very different initiatives as well; not all of them are new, but several appear to have been given a new lease of life of late. In the Netherlands the social sciences have long preserved digital files of empirical social science data. A similar 'repository' for data from historical research has existed since the late 1980s. Both data banks will shortly be united in one Data Archive and Network Service (DANS), a joint enterprise of the Royal Academy of Sciences and the main funding agency. One of the speakers at the conference had taken the initiative at the time to set up the repository for historical data files, and will probably be DANS's first director. DANS is directed primarily at the humanities and the social sciences. The reasons for its founding are similar to the arguments I gave earlier.

Another speaker at the conference discussed DARE, short for Digital Academic Repositories, in which the universities cooperate. Institutional repositories are well known in other countries as well, but I believe the Dutch were the first to actually build a national network of them. DARE is based on the idea that exposing metadata according to the OAI-PMH [Open Archives Initiative Protocol for Metadata Harvesting] standard makes the metadata harvestable. At first, DARE aimed exclusively at the digital preservation of and access to digital academic publications. The key word here is "publications". In the Netherlands DARE is closely affiliated with the university libraries. The libraries, it has been said, will become increasingly invisible as organisations while penetrating ever further into the primary processes of the university. The speaker introduced the term “libratory” for this.
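The idea of harvestable metadata can be made concrete with a minimal sketch. It assumes, purely as an illustration and not as a description of DARE's actual set-up, that a repository exposes unqualified Dublin Core metadata over OAI-PMH, the Open Archives Initiative's harvesting protocol. The endpoint URL, identifiers and records are hypothetical, and the sample response is simplified. A harvester sends a `ListRecords` request and reads the identifier, title and type of each record; the same mechanism would let it find a dataset next to the publication based on it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and its unqualified Dublin Core format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list[dict]:
    """Extract identifier, title and type from each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS) if dc is not None else None,
            "type": dc.findtext("dc:type", namespaces=NS) if dc is not None else None,
        })
    return records


# Hypothetical, simplified response (real headers also carry a datestamp):
# an article and the dataset it is based on.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Article on historical census data</dc:title>
          <dc:type>Text</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repository.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Raw census data underlying the article</dc:title>
          <dc:type>Dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    for rec in parse_records(SAMPLE):
        print(rec["identifier"], rec["type"], rec["title"])
```

Because every record carries the same standardised fields, a harvester needs no knowledge of how each repository is organised internally; that is what makes the metadata harvestable across institutions.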

Databases and research archives are generally not publications, and initially did not fit the DARE concept. Two issues come into focus, however:

1. Would it not be logical to keep the file of digital raw data on which a publication is based together with the text of the publication itself? As long as publications were always in print, the cost of printing the raw data was often prohibitive. In a digital environment this is no longer necessarily the case, or at least not to the same extent.

2. Is there, in this digital environment, still a relevant distinction between a publication and a database, or does the distinction fade? How should we regard source editions, produced by historians but also by statisticians? Editing historical sources (my original field, by the way) may have seemed a little out of date ten years ago, but with the great digitization projects now under way it is more prominent than ever. The distinction between a book or article and the publication of a collection of data is becoming less and less clear.

Both DARE and DANS are at an early stage of their development. At the conference (and elsewhere) it was clear that they tread carefully around one another. What is the difference? The precursors of DANS are older and so far technologically ‘old-fashioned’, whereas DARE is based on state-of-the-art search engine technology. Broadly speaking, one could say that DANS has stressed preservation, whereas DARE focuses on access.

 
Concluding remarks

1. An important consideration is that research materials play very different roles in the various academic disciplines. Consider two extremes. Most "hard scientists" consider the preservation of their data superfluous: in a laboratory environment, at any rate, the data can be generated again and again, usually with improved equipment each time. In archaeology the opposite is true. An excavation cannot be undone or repeated; once a site is excavated, the source of the data is destroyed. Here it is absolutely crucial to record the finds and findings.

2. So finally, let me return to the two questions to which I still owe you an answer.

Do I want to keep every scrap of paper (or every set of data) that any researcher has ever produced? Of course not. But that makes it all the more pressing to determine who will decide what to keep and what to discard. Let us face it: so far the decision has been left to chance. It is largely coincidental which research material survives and which does not. Is that an acceptable “solution”? Is it a solution at all?

The answer, of course, is that I am not qualified to judge the value of research material across the board, but then who would be? Obviously it is up to the researchers themselves. Representatives of each discipline will have to decide, perhaps after some time has elapsed, what is worthwhile and what is not.

3. Who will pay for preservation, cataloguing and so on? Let me finish with that quintessentially Dutch question. I would say that, again, it is the responsibility of the researchers themselves. The funding plan of each project should determine what is to happen to the research data and set aside a small portion of the funding for it, just as provision is made in advance for publishing the results of the research.



