At the Research Town Hall “Caring for Your Data,” Dr. Michael Pencina, Vice Dean for Data Science and Information Technology at the School of Medicine and Professor of Biostatistics and Bioinformatics, talked about the importance of the FAIR Principles and invited the audience to “focus on what we can do with data and not on what we can’t do” in terms of the potential to analyze and compare data with current and expanding technologies.
Research has entered a new era where expectations of data management are higher than ever. The National Science Foundation (NSF) now requires that proposals include a formal Data Management Plan (DMP) (http://duke.is/5jN3Ft). The National Institutes of Health (NIH) does not yet mandate a DMP, but it does require that proposals include a Data Sharing Plan so that “data is as widely and freely available as possible” (http://duke.is/utSwiY). Perhaps even more daunting than these requirements from funding agencies are the increasing volumes of data that researchers generate.
To maintain data integrity, the research team must consider how to 1) best organize and store data through potentially complex data processing steps, 2) best document the research processes applied to the data, and 3) ensure that everyone who handles and accesses the data is following the established data management practices. Consider a typical research project: there are multiple types of data, including data files in different and unique formats (not all of which are natively accessible to a scientist or bioinformatician without the aid of additional software and transformation), numerous data files collected over the course of the project, and many human and mechanical “hands” involved in each step. Given the scope of the challenges, what can the average research group do to establish and maintain rigorous data management practices despite resource constraints?
Rather than trying to create the perfect data infrastructure, researchers should focus on creating the best data management strategy that their research unit can actually implement: progress, not perfection. A good strategy is to focus on managing data according to the FAIR principles: Findable, Accessible, Interoperable, and Reusable (details available at https://www.nature.com/articles/sdata201618.pdf). These principles are intended to improve data management and support good data stewardship, which should improve publication quality because the data contained therein are reviewable and reusable.
Findable: In order for data to be useful, they must be findable, both today and 5 years from now. Bear in mind that unique, detailed identifiers and metadata (information about the data) must be readily searchable by both humans and machines. Proper organization and indexing of the data and metadata are critical for this process.
Accessible: Data must be accessible to users once found. This means that the language and format of the data and metadata must be understandable by both human and machine users. Accessibility in this context does not necessarily mean open access; sensitive data and metadata may require an authentication or authorization protocol to access.
Interoperable: Interoperability in data management means that the data and metadata can be integrated with other data using a range of applications and workflows. Data will not be useful long-term if the data can only be manipulated on a limited range of platforms.
Reusable: Finally, data need to be reusable, because reuse is critical for ensuring reproducibility in research. The data and metadata should be richly described, including licensing information, detailed provenance, and relevant community standards.
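As a concrete illustration of the four principles above, the following is a minimal sketch of a machine-readable metadata record that could be stored alongside a data file. It is only an example under stated assumptions: the function name, the field names, the sample data, and the file path are all hypothetical and do not come from any Duke, NSF, or NIH standard. A research group would normally adopt the metadata schema used in its own community.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def build_metadata_record(data_bytes, title, license_name, derived_from, file_format):
    """Return a dict describing one data file (all field names are illustrative)."""
    return {
        # Findable: a unique identifier plus a searchable, descriptive title
        "identifier": str(uuid.uuid4()),
        "title": title,
        # Accessible / Interoperable: state the format so people and tools can read it
        "format": file_format,
        # Reusable: licensing information and provenance (which files this came from)
        "license": license_name,
        "derived_from": derived_from,
        # A checksum lets anyone confirm the file has not changed since it was described
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
        "created": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical example data and file path, for illustration only
record = build_metadata_record(
    data_bytes=b"sample_id,value\nS1,0.42\n",
    title="Example assay results (hypothetical)",
    license_name="CC-BY-4.0",
    derived_from=["raw/plate_01.csv"],
    file_format="text/csv",
)

# JSON is plain text, so both humans and software can read and index the record
metadata_json = json.dumps(record, indent=2, sort_keys=True)
print(metadata_json)
```

Saving a record like this next to each data file is one low-cost way to make progress toward FAIR data without building new infrastructure, since the record can be searched, validated, and shared using ordinary tools.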
If you are interested in creating your own data management plan, the Duke Libraries have resources to help you (https://library.duke.edu/data/data-management). ASIST is working closely with the Duke Libraries to ensure that these data management plans leverage the appropriate data management resources available at Duke. Developing and then implementing a data management plan that follows the FAIR principles will help to ensure that you and your research team are able to publish and disseminate data with confidence, knowing that the results can be traced back to initial data files and analyses at any time.
Do you need more support or information about data management? If so, we are here to ASIST.