## Reproducibility and Transparency: examples from the UN Global Platform

*http://joe.peskett.projects.officialstatistics.org/NTTS/*

Joe Peskett @joepeskett
## Importance of Transparency and Reproducibility
* National statistical offices (NSOs) are not immune to the reproducibility crisis
* Need to build trust between researchers, policy makers and the public
## The Goal: Trusted Methods
* ***Minimum***: open code and access to data
* ***Ideal***: an environment to run code on original and new data
## What is a container?

*“An application or method, plus all its dependencies, libraries and other binaries, and configuration files needed to run it, bundled into one package.”*
## Why use containers?
* A consistent environment
* Run on different infrastructure, both local and cloud
* Isolated from other environments
* Reusable containers for specific use cases
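As a rough illustration of running a packaged method the same way on any machine, the sketch below builds a `docker run` command without executing it. It assumes Docker is the container runtime; the image name `example/method`, the tag, and the paths are hypothetical and do not come from the UN Global Platform.

```python
import shlex
from typing import List


def docker_run_command(image: str, tag: str, data_dir: str, args: List[str]) -> List[str]:
    """Build (but do not execute) a `docker run` command for a packaged method."""
    # Mount the input data read-only so the container cannot modify it.
    mount = f"{data_dir}:/data:ro"
    # --rm removes the container afterwards, so every run starts from the same image.
    return ["docker", "run", "--rm", "-v", mount, f"{image}:{tag}", *args]


# Hypothetical image, tag, and paths, used only for illustration.
command = docker_run_command(
    "example/method", "1.0.0", "/srv/input", ["--input", "/data/survey.csv"]
)
print(shlex.join(command))
```

The same command could be issued on a laptop or a cloud host, because everything the method needs is inside the image rather than on the machine that runs it.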
## Version Control
* Every version of a method has its own container and API
* Allows for breaking changes without breaking pipelines
* No need to build new containers for experimental purposes
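A minimal sketch of why per-version endpoints protect pipelines, assuming a simple in-memory registry stands in for the separate containers and APIs. The method name `canopy_cover` and both version numbers are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (method name, version) -> callable.
# In a real deployment each entry would be its own container and API.
REGISTRY: Dict[Tuple[str, str], Callable[[float], float]] = {}


def register(name: str, version: str):
    """Decorator that publishes a method under an explicit version."""
    def wrap(func):
        key = (name, version)
        if key in REGISTRY:
            # Published versions are never overwritten.
            raise ValueError(f"{name} {version} is already published")
        REGISTRY[key] = func
        return func
    return wrap


@register("canopy_cover", "1.0.0")
def canopy_cover_v1(green_fraction: float) -> float:
    # Original behaviour: returns a fraction between 0 and 1.
    return green_fraction


@register("canopy_cover", "2.0.0")
def canopy_cover_v2(green_fraction: float) -> float:
    # Breaking change: returns a percentage instead of a fraction.
    return green_fraction * 100.0


def call(name: str, version: str, value: float) -> float:
    """Look up the pinned version and run it."""
    return REGISTRY[(name, version)](value)
```

A pipeline that calls `call("canopy_cover", "1.0.0", x)` keeps getting fractions after version 2.0.0 is published, and moves to percentages only when its pinned version is changed.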
## Model versioning
* Data versioning vs. parameter versioning
* New data is used for training an existing model
* New design of the model (features, architecture, etc.)
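One way to keep the two kinds of change apart is to give a model a two-part version, one part derived from the training data and one from the model design. The sketch below is an assumption about how this could be done, not the platform's scheme; `fingerprint`, `ModelVersion`, and `version_model` are hypothetical names, and treating the slide's "parameter versioning" as a change of design is also an assumption.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(obj) -> str:
    """Short, stable hash of any JSON-serialisable object."""
    encoded = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


@dataclass(frozen=True)
class ModelVersion:
    data_version: str    # changes when the training data changes
    design_version: str  # changes when features or architecture change

    @property
    def tag(self) -> str:
        return f"data-{self.data_version}_design-{self.design_version}"


def version_model(training_rows, design) -> ModelVersion:
    """Derive both version parts from the inputs that define the model."""
    return ModelVersion(fingerprint(training_rows), fingerprint(design))


# Illustrative values only.
design = {"features": ["ndvi", "height"], "layers": 2}
base = version_model([[1, 2], [3, 4]], design)
retrained = version_model([[1, 2], [3, 4], [5, 6]], design)
redesigned = version_model([[1, 2], [3, 4]], {**design, "layers": 3})
```

Retraining on new data changes only `data_version`, while a new architecture changes only `design_version`, so the tag shows which kind of change produced a given model.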
## APIs and Transparency
* ***“In general terms, it is a set of clearly defined methods of communication between various software components”***
* APIs can be called from simple apps or a variety of different programs
* We can make certain methods publicly available, to demonstrate the outputs generated from given inputs
* Track and understand use of methods and data
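The points above can be sketched as a single request handler that takes JSON in, returns JSON out, and counts calls per method. This is a simplified stand-in that assumes no particular web framework; the `mean` method, the `METHODS` table, and the `USAGE` counter are hypothetical.

```python
import json
from collections import Counter
from typing import Callable, Dict

# Hypothetical public methods; the real platform methods are not shown here.
METHODS: Dict[str, Callable[[dict], dict]] = {
    "mean": lambda inputs: {"mean": sum(inputs["values"]) / len(inputs["values"])},
}

USAGE: Counter = Counter()  # number of successful lookups per method


def handle_request(body: str) -> str:
    """Take a JSON request, run the named method, and return a JSON response."""
    request = json.loads(body)
    name = request.get("method")
    if name not in METHODS:
        return json.dumps({"status": "error", "message": f"unknown method: {name}"})
    inputs = request.get("inputs", {})
    USAGE[name] += 1
    outputs = METHODS[name](inputs)
    # Echo the inputs next to the outputs so anyone can check the result.
    return json.dumps(
        {"status": "ok", "method": name, "inputs": inputs, "outputs": outputs}
    )
```

Because the handler only exchanges JSON strings, it could sit behind a web app, a notebook, or another program, and the `USAGE` counter gives a basic record of which methods are being used.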
## An example - Urban Forests
* Four methods, piped together
* Final pipeline is composed in an asynchronous manner
* Different stages of the pipeline can be run standalone, and exchanged if required
* Combination of programming languages
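A minimal sketch of a four-stage asynchronous pipeline using `asyncio`, where each stage is an independent coroutine that can be run alone or swapped out. The stage names and the values they compute are hypothetical placeholders; the actual Urban Forests methods are not reproduced here.

```python
import asyncio
from typing import Awaitable, Callable, List

Stage = Callable[[dict], Awaitable[dict]]


# Hypothetical stages standing in for the four methods.
async def fetch_images(item: dict) -> dict:
    await asyncio.sleep(0)  # stands in for a non-blocking API call
    return {**item, "images": 3}


async def segment_vegetation(item: dict) -> dict:
    await asyncio.sleep(0)
    return {**item, "green_pixels": item["images"] * 100}


async def score_street(item: dict) -> dict:
    await asyncio.sleep(0)
    return {**item, "score": item["green_pixels"] / 1000}


async def publish(item: dict) -> dict:
    await asyncio.sleep(0)
    return {**item, "published": True}


async def run_pipeline(item: dict, stages: List[Stage]) -> dict:
    """Pass one item through each stage in order."""
    for stage in stages:
        item = await stage(item)
    return item


async def run_many(items: List[dict], stages: List[Stage]) -> List[dict]:
    """Process several items concurrently; results keep the input order."""
    return await asyncio.gather(*(run_pipeline(i, stages) for i in items))


DEFAULT_STAGES: List[Stage] = [fetch_images, segment_vegetation, score_street, publish]
```

A call such as `asyncio.run(run_many([{"street": "A"}, {"street": "B"}], DEFAULT_STAGES))` processes both items concurrently. Replacing one entry in the stage list exchanges that stage without touching the others, and if each stage were an API call to a separate container, the stages could be written in different programming languages.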
## Pipeline Design
## Method Overview
## Example Inputs/Outputs
## Documentation Overview
## Key Points
* We can build small web apps/notebooks to demonstrate our methods working in a transparent way
* Easy, reliable and reusable deployment of methods
* Access to methods and data is always controlled through API keys
*Thank you for listening*

**Any questions?**

joe.peskett@officialstatistics.org