Data Management: A few of my favourite things

Last year I had the good fortune to help draft a research data management policy for a Dutch research organisation, or rather an umbrella organisation of research institutes. The draft policy is now under discussion among the institutes and the umbrella organisation. That consultancy project has given me a welcome opportunity to (re)read a number of data policies from academic organisations and funders, as well as guides to developing a data management policy by Hodson and Molloy, the Australian National Data Service, and the Digital Curation Centre.

Research data management, or RDM, is a hotly debated and partly uncharted territory, where different organisations place different markers: some universities take two pages to communicate their policy, others more than ten. At the moment about half of the Dutch research universities have published an RDM policy; the others, as well as most universities of applied sciences, are in the process of designing one. The situation at other research and science institutes in The Netherlands is equally mixed, partly because of international collaborations with partners that may or may not have (or be bound by) data policies. In any case, a data policy is a crucial element within a researcher’s sphere of influence.

This blogpost is a highly subjective selection of a few good practices. Firstly, it is subjective because the choice of existing policies that I’ve read or skimmed was strictly my own, and the links at the end of this post cover only some Dutch research organisations (no foreign examples, no funders). Secondly, the order in which I read them has certainly affected my “Oh, this is great” and “Yeah, I know” reactions, possibly favouring some policies or sections over others for no better reason than “newness” in the eyes of this reader. And thirdly, from the many, many aspects of research data management I’ve chosen just a few that I consider relevant for anyone interested in designing or implementing a data policy. So please don’t be offended if your policy isn’t mentioned.

    1 – Who’s responsible?

A common feature of the RDM policies on my reading list is a two-tiered approach: the policy is issued at the highest level of the organisation, typically by the Executive Board (in Dutch “College van Bestuur”), and it delegates to departments and/or graduate schools the responsibility both for refining the policy – by adding discipline-specific information on e.g. data types, research methods and appropriate data repositories – and for monitoring and enforcing its implementation. This is a sensible way to handle the objection “Sure, we should manage our data well, but in my line of work this is slightly different”.

So much for policy making; let’s look at responsibilities for the data themselves. In its 2013 advisory report, the Schuyt Committee concluded that responsibility for research data lies at three levels: 1) the individual researcher; 2) the research institution (e.g. by providing facilities and assistance, and through PhD supervisors setting a good example); and 3) informal research networks, including journals.

In my policy collection, responsibility is sometimes mentioned only briefly; presumably it is by now a common principle that the primary responsibility for RDM lies with the (principal) researcher or scientist. At the more explicit end of the spectrum we find, for instance, the University of Groningen. Not only does its RDM policy web page open with the line “Scientific research is a responsible job”, the policy also devotes a good two pages to detailing roles and responsibilities throughout the organisation.

    2 – What to keep for the long term?

Before most of the current policy documents appeared, the Netherlands Code of Conduct for Academic Practice already set a minimum retention period for raw research data. The Code also states that “all steps taken must be properly reported and their execution must be properly monitored (lab journals, progress reports, documentation of arrangements and decisions, etc.)”. In a similar vein, the Humboldt-Universität zu Berlin makes clear in its policy and guidelines that the data management plan itself should already describe how the data will be documented; moreover, this documentation and the metadata should comply with the standards of the relevant scholarly community.

The Guideline on Data Handling and Methods Reporting from the Tilburg School of Social and Behavioral Sciences refines the university-wide policy: “Staff and PhD students must ensure that there is a data package for each empirical article or book/chapter of which they are author (…)”. The guideline then describes eight requirements that the data package must meet: for instance, it must contain not only all the digital research material used in the project that is needed to replicate the research, but also detailed information on who collected the data and where, as well as – when appropriate – computer scripts or statistical logbooks. I particularly like the short discussions of the motivation behind each requirement.

The only thing I’m still looking for is a better name for this kind of package: despite the explicit requirements, “data package” might still suggest that the data alone would be enough. Elsewhere I’ve proposed the name “replication package”, as shorthand for “everything a fellow researcher would need to replicate the particular study”, but that name unfortunately ignores the potential for reuse. And I’m afraid that “archival package” would only confirm the misconception that archives exist merely to keep your stuff from disappearing…

    3 – The visibility advantage

… whereas in reality research data archives contribute to data reuse, repurposing (answering questions other than the original one), efficiency, thrift (spending research funds well), and of course to the increased visibility of researchers, data scientists, scientific programmers and their research organisations. From that last point of view it certainly makes sense that Radboud University has decided to include a list of stored datasets in its self-evaluation under the Dutch Standard Evaluation Protocol. In this Protocol, datasets as well as, e.g., software tools explicitly qualify as research products achieved in the domains of research and society.

I’d like to see more examples of research institutes that register datasets, and in particular data use, as an indicator in their self-assessment. For one thing, more information about data availability and use could shed light, across more disciplines, on the relation between Open Access to data and the number of citations. Keeping track of open data use is hard – if “they” don’t cite your data properly, how would you know about reuse, unless you work in a fairly small research field? – but joining forces might get us somewhere.

All Dutch universities and the Royal Netherlands Academy of Arts and Sciences offer public information about managing research data (and so do other research organisations not mentioned here). The following links point to a data (management) policy where available; otherwise they refer to other RDM information. All links were accessed in September 2017.

 

Disclaimer: I carried out the consultancy project as a DANS employee, but the views presented here are my own and not necessarily my employer’s.
