Sharing and publishing data

The openness and sharing of research data promotes its reuse. Reuse of research data benefits not only the author of the data, but also other researchers and society as a whole.

Should I make my data available?

Research publishers and funding agencies increasingly require that research data be made openly accessible upon completion of the research. The openness and sharing of research data promotes its reuse, which benefits not only the author of the data, but also other researchers and societies the world over. When making data available, however, it is recommended that licenses be used, as they allow researchers to specify the degree of publicity and the user rights for their data.

Licenses as a tool for openness

Unlike scientific articles, which are either published or unpublished, research data can have varying degrees of publicity. The degree of publicity can be determined according to either legal criteria or technical, storage-related criteria. A good overview of the technical degrees of openness is provided by Tim Berners-Lee's 5-star deployment scheme, supplemented by Linked Data Finland's two additional stars:

1* Publish your data in any format (e.g. PDF) under an open license (e.g. CC BY)
2* Publish your data in a structured format, e.g. XML or CSV table
3* Use open, non-proprietary file formats, e.g. CSV, not Excel
4* Use a URI to identify your data
5* Link your data to other data to provide context
6* Provide the data together with documentation of its schema (automatic)
7* Automatic validation (and correction) of data quality
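
The star levels above can be read as a cumulative checklist. The following is a minimal sketch of that reading, not an official scoring tool: it assumes that each star requires all lower stars to be met (which matches Berners-Lee's original scheme, and is only assumed here for stars 6 and 7), and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical self-assessment of a dataset, one flag per star."""
    open_license: bool            # 1* published under an open license
    structured_format: bool       # 2* structured format such as XML or CSV
    non_proprietary_format: bool  # 3* open file format, e.g. CSV, not Excel
    uses_uris: bool               # 4* data is identified with URIs
    linked_to_other_data: bool    # 5* linked to other data for context
    schema_documented: bool       # 6* schema documentation is provided
    quality_validated: bool       # 7* data quality is validated


def star_rating(profile: DatasetProfile) -> int:
    """Count stars from 1* upwards, stopping at the first unmet criterion.

    Assumption: the levels are cumulative, so a higher star never counts
    if a lower one is missing.
    """
    criteria = [
        profile.open_license,
        profile.structured_format,
        profile.non_proprietary_format,
        profile.uses_uris,
        profile.linked_to_other_data,
        profile.schema_documented,
        profile.quality_validated,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file published under CC BY, without URIs or links to other data.
csv_under_cc_by = DatasetProfile(True, True, True, False, False, False, False)
print(star_rating(csv_under_cc_by))  # 3
```

Under this reading, a dataset without an open license receives no stars at all, however well structured it is.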

What is completely openly available might only be information about the data, with access to and actual use of the data itself requiring a user rights agreement, i.e. license approval or a specifically drafted agreement. Publishing research data online does not in itself mean that users can do anything they want with the data. Indeed, published data can be viewed without permission, but not necessarily used or edited. Care must also be taken when linking data.

The terms of use for data are always set by the author of the research data or by the party to whom the author has transferred the copyright (Copyright Act 404/1961). Data may be either completely open or restricted to a certain purpose (usually scientific research). The data may contain sensitive (Personal Data Act 523/1999, section 11) or confidential (Act on the Openness of Government Activities 621/1999, section 24) information, or business or trade secrets, which must be omitted when publishing research results. Taking restricted data into use (e.g. downloading it) may require entering into a specific agreement with the author of the data.

The National Research Data Initiative (TTA) recommends the use of Creative Commons Finland licenses (CC BY), unless the data content requires otherwise. Creative Commons Finland's operations are overseen by the Helsinki Institute of Information Technology (HIIT) and Aalto University School of Arts, Design and Architecture Media Lab.

Example of license use

The Etsin - Research Data Finder states in the data information under what terms the data may be used. The data author may grant the user permission to use the data in any way they see fit, i.e. the author relinquishes all rights. Even in this case, however, it should be kept in mind that both the Copyright Act and the responsible conduct of research require that the author be mentioned in accordance with good practice. This is the case even if the data is published under a CC0 license. Alternatively, the author may set certain terms regarding use of the data and grant permission to use the data under these terms, i.e. he or she licenses the data. If the data is published under an open license (e.g. CC0 or CC BY 4.0), the publisher of the data permits everyone who downloads it to use and edit the data under the terms specified in the license. In this case, users do not have to draft or sign a specific agreement, and the author of the data does not need to know who is using the data.

The Creative Commons Attribution 4.0 license also serves as a template for a working group whose task is to draft a JHS Public Administration recommendation on an internationally compatible open data user license to be applied to Finnish public administration data. An interim version of the JHS XXX Open data user license recommendation (in Finnish only), which is applicable to all public administration data, has been published. The final recommendation will be published in machine-readable form on the www.yhteentoimivuus.fi website (Finnish only). There is also a website to help in choosing a license.

Embargo

Conducting research on global phenomena often requires the laborious and expensive collection of data, for example through experiments or measurements, as well as high availability of the data around the world. In such cases, a proprietary period or embargo is often set for the publishing of data, during which the authors of the data can prepare their research and article before the data is published for use by the scientific community and in further research. The duration of the embargo can vary by publisher, funding agency or research organisation. It is typically around 1-3 years from the date when data collection begins.

In the case of scientific publications (e.g. articles), an embargo is a delay period set by the publisher, during which the article may not be published for free access online. An embargo period runs from the date a journal is released, either online or in print, whichever version comes first. The embargo period varies from publisher to publisher, normally ranging from 6 to 12 months. Not all publishers have delay restrictions.
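
The date rule for publication embargoes can be illustrated with a short calculation. This is a sketch under stated assumptions rather than any publisher's actual rule: the function names are hypothetical, the embargo length is given in whole calendar months, and an end date that falls on a non-existent day (e.g. 31 September) is moved back to the last day of that month. Whether the article may be opened on the resulting date or only the day after depends on the publisher's policy.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter than the start day allows, the result
    is clamped to the last day of the target month (an assumption).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def embargo_end(online: Optional[date], in_print: Optional[date],
                embargo_months: int) -> date:
    """Compute when an embargo ends.

    The embargo runs from the earlier of the online and print release
    dates; either may be None if that version does not exist.
    """
    releases = [d for d in (online, in_print) if d is not None]
    if not releases:
        raise ValueError("at least one release date is required")
    return add_months(min(releases), embargo_months)


# Online release precedes print, with a 6-month embargo.
print(embargo_end(date(2015, 3, 31), date(2015, 4, 15), 6))  # 2015-09-30
```

For a 12-month embargo the same call with `embargo_months=12` simply moves the end date one year on from the first release.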