Lessons Learned

The Practice of Reproducible Research


Kathryn Huff

Sutardja Dai Hall

Jan. 27, 2017

  • "Reading brings us unknown friends." – Honoré de Balzac
  • Incentives
  • Pain Points
  • Recommendations from the Authors
  • A Little Data
  • Needs

Incentives

  • verifiability
  • collaboration
  • efficiency
  • extensibility
  • "focus on science"
  • "forced planning"
  • "safety for evolution"

Pain Points

  • People and Skills
  • Dependencies, Build Systems, and Packaging
  • Hardware Access
  • Testing
  • Publishing
  • Data Versioning
  • Time and Incentives
  • Data restrictions

Recommendations

  • version control your code
  • open your data
  • automate everywhere possible
  • document your processes
  • test everything (see the test sketch after this list)
  • use free and open tools
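
As a sketch of the "test everything" point, here is a minimal pytest-style unit test in Python; the function and file names are hypothetical illustrations, not from the talk.

    # test_stats.py -- minimal "test everything" sketch; run with: pytest test_stats.py
    import pytest

    def running_mean(values):
        """Return the arithmetic mean of a non-empty sequence."""
        if not values:
            raise ValueError("running_mean() requires at least one value")
        return sum(values) / len(values)

    def test_running_mean_simple():
        # A known input/output pair pins the behavior down.
        assert running_mean([1.0, 2.0, 3.0]) == 2.0

    def test_running_mean_empty_input():
        # Edge cases are where silent errors hide.
        with pytest.raises(ValueError):
            running_mean([])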

Recommendations: Continued

  • avoid excessive dependencies
  • or at least package their installation
  • host code on a collaborative platform (e.g. GitHub)
  • get DOIs for data and code
  • plain text data is preferred: it is timeless
  • explicitly set seeds (see the sketch after this list)
  • workflow frameworks can be overkill
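
As an illustration of the seed-setting bullet, a minimal Python sketch; the seed value and the NumPy dependency are assumptions for illustration, not from the talk.

    # Minimal "explicitly set seeds" sketch (illustrative).
    import random
    import numpy as np

    SEED = 42  # record the seed alongside your results

    random.seed(SEED)                  # Python's built-in RNG
    rng = np.random.default_rng(SEED)  # an explicit, reproducible NumPy Generator

    # Any stochastic step now replays identically from run to run.
    sample = rng.normal(size=5)

With the seed fixed and recorded, a reader re-running the analysis gets the same random draws (given the same library versions).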

Recommendations: Outliers

... in our estimation, if someone were to try to reproduce our research, it would probably be more natural for them to write their own scripts; this has the additional benefit that they might not fall into any error we may have accidentally introduced in our scripts.

Recommendations: Outliers

Scientific funding and the number of scientists available to do the work are finite. Therefore, not every scientific result can, or should, be reproduced.

A Little Data

  • tools
  • languages
  • testing

Emergent Needs

  • Common-denominator tools should support reproducibility.
  • Improved configuration and build systems.
  • Reproducibility at scale for HPC.
  • Standardized hardware configurations for limited-availability experimental apparatuses.
  • Better understanding of incentives for unit testing.
  • Greater adoption of unit testing irrespective of programming language.
  • Broader community adoption of publication formats that allow parallel editing.
  • Broader adoption of data storage, versioning, and management tools.
  • Increased community recognition of the benefits of reproducibility.
  • Incentive systems for contexts where reproducibility is not self-incentivizing.
  • Standards around scrubbed and representative data.
  • Community adoption of file format standards within some domains.
  • Domain standards that translate well outside their own scientific communities.

Acknowledgements

  • BIDS
  • Justin Kitzes
  • Fatma Imamoglu
  • Daniel Turek
  • Ben Marwick
  • Chapter Authors
  • Case Study Authors
  • Reproducibility Working Group


THE END

Katy Huff

katyhuff.github.io/2017-01-27-bids
Lessons Learned by Kathryn Huff is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at http://katyhuff.github.io/2017-01-27-bids.