What Can Academia Learn From Open Source?


Kathryn Huff. February 2, 2015

Academia Town Hall hosted by UW eScience and GitHub

Fission.

\[\sigma(E,\vec{r},\hat{\Omega},T,t,x,i)\]

Chain reaction
Software Carpentry, cyclus, pyne, THW, book

Academia is...

  • a community
  • engaged in expanding knowledge
  • through education
  • and research.

Or is it?

Science

  • builds and organizes knowledge
  • tests explanations about the universe
  • systematically,
  • objectively,
  • transparently,
  • and reproducibly.

Otherwise it's not science.

Science relies on

  • peer review,
  • skepticism,
  • transparency,
  • attribution,
  • accountability,
  • collaboration,
  • and impact.

Since the 6th century BCE, academic science has been perfecting these tenets.

Open source software is now superior at all of them.

Peer Review, Skepticism, and Transparency

“ Organized Skepticism. Scientists are critical: All ideas must be tested and are subject to rigorous structured community scrutiny.” - R.K. Merton, 1942
“ The scientific method’s central motivation is the ubiquity of error—the awareness that mistakes and self-delusion can creep in absolutely anywhere and that the scientist’s effort is primarily expended in recognizing and rooting out error. ” - Donoho, 2009.

Peer Review For Code

  • Good: divide and conquer
  • Better: shared repository
  • Best: peer-reviewed pull requests

Error Detection

  • Good: show results to experts
  • Better: integration testing, pull-request code review
  • Best: unit test suite, continuous integration
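A minimal sketch of what one entry in a unit test suite might look like. The function decay_fraction and its tests are hypothetical illustrations, not code from cyclus or pyne; a continuous integration service would simply run tests like these on every push.

```python
import math


def decay_fraction(half_life, elapsed):
    """Return the fraction of a sample remaining after `elapsed` time.

    Hypothetical example function; both arguments share the same time unit.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (elapsed / half_life)


# Each test checks one small, independently verifiable fact.
def test_nothing_decays_at_time_zero():
    assert decay_fraction(10.0, 0.0) == 1.0


def test_half_remains_after_one_half_life():
    assert math.isclose(decay_fraction(10.0, 10.0), 0.5)


def test_rejects_nonpositive_half_life():
    try:
        decay_fraction(0.0, 1.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a zero half-life")
```

A test runner such as pytest or nose would be expected to collect the test_ functions automatically when pointed at the file.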

Analysis

  • Good: pencil and calculator
  • Better: spreadsheets, MATLAB, Mathematica
  • Best: scripting, open source libraries, modern programming language

API Design

  • Good: single block of procedural code
  • Better: separate functions
  • Best: small, testable functions, grouped into classes, DRY


DRY: Don't Repeat Yourself. Code replication is bug proliferation.

Attribution and Accountability


“ But what if they scoop me? ” - Someone in this room, probably.

Publishing First

  • Good: share data once all possible papers have been published
  • Better: share data as soon as there is a pre-print
  • Best: share, with a license, while working, if not sooner

Congratulations: your online repository history is an insurance policy against thievery.

Producing Quality

  • Good: being accountable for each paper you publish
  • Better: being accountable for the released version of the code
  • Best: being accountable for each line of code

Accountability: git records the author of every commit, and git blame traces each line to the commit that last changed it, for provenance and accountability.
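As a rough illustration of line-level provenance, the sketch below counts how many lines of a file are attributed to each author. It assumes git is installed and reads the output of git blame --line-porcelain, which repeats the full commit header (including an "author" field) for every line and prefixes each line of file content with a tab. The helper names are hypothetical.

```python
import subprocess
from collections import Counter


def blame_porcelain(path):
    """Return the raw `git blame --line-porcelain` output for a tracked file."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def lines_per_author(porcelain_text):
    """Count the lines attributed to each author in line-porcelain output."""
    counts = Counter()
    for line in porcelain_text.split("\n"):
        # Header fields start at column zero; file content starts with a tab,
        # so source lines that happen to begin with "author " are not counted.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts
```

Inside a repository, lines_per_author(blame_porcelain("some_file.py")) would give a per-author line count, where some_file.py stands in for any tracked file.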

Collaboration and Impact

“ Two of the biggest challenges scientists and other programmers face when working with code and data are keeping track of changes (and being able to revert them if things go wrong), and collaborating on a program or dataset. ” - Wilson, et al. 2014.
“ If a piece of scientific software is released in the forest, does it change the field? ”

Teamwork

  • Good: weekly research meetings, year-long tasks
  • Better: daily conversations, month-long goals
  • Best: agile development, pair programming, issue tracking

Distribution Control

  • Good: "email to request access"
  • Better: license file
  • Best: license file, citation file, DOI, forkable online repository

Community Adoption

  • Good: none, internal use only
  • Better: online repository, developer email online
  • Best: issue tracker, user/developer listhost(s), online documentation

Extending Software

  • Good: hand over a zip file, theory paper
  • Better: rely on comments in code, example input file
  • Best: version controlled repository, automated documentation, test suite

Unique Issue in Nuclear Engineering


Export control is serious.

Export Control is a big deal in nuclear

Papers!

Acknowledgements

  • GitHub and UW eScience!
  • Nuclear Science and Security Consortium
  • Berkeley Institute for Data Science

THE END

Katy Huff

katyhuff.github.io/town-hall
What Can Academia Learn From Open Source? by Kathryn Huff is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at http://katyhuff.github.io/town-hall.