name: title class: Left, middle background-image: url(images/rawpixel/nasa-sun.jpg) background-size: cover # .whiteinline[.fancy[Basic recommendations for reproducible projects]] ### .whiteinline[[2.1] Recomendations and practices] .whiteinline[Reproducible Research Practices (RRP'21) · April 2021] .right[.whiteinline[Carlos Granell · Sergi Trilles]] .right[.whiteinline[Universitat Jaume I]] ??? Solar Observations taken during the Transit of Venus. Originally from NASA. Source: [Rawpixel(https://www.rawpixel.com/image/440254/solar-observations) --- name: spacecraft class: bottom, middle background-image: url(images/rawpixel/nasa-cygnus-spacecraft.jpg) background-size: cover --- template: spacecraft ## Nobel laureate and physics professor [Richard Feynman](https://en.wikipedia.org/wiki/Richard_Feynman) .large[“Science is a way of trying not to fool yourself. The principle is that you must not fool yourself, and you are the easiest person to fool.”] --- ## .center[Computational reproducibility] ### If you were to reproduce a paper... 1. .large[check the published manuscript to reproduce,] 1. .large[download associated data and code (and computational environment),] 1. .large[run the analytical stuff, and ] 1. .large[compare the results to the published results. ] --- class: center, top ## Ok, I got it. But getting started is difficult! -- ### "I have never heard of reproducibility before" -- ### "I'm not good at using tools. _You said github?_" -- ### "I have never been trained" -- ### "I don't know where to find tutorials, guidelines..." -- ### "I'm superbusy! I do not have enough time" -- ### "I do theoretical stuff. Reproduction is not for me" --- class: center, middle # No matter your excuse --- class: center, middle # .fatinline[Reproducible Research Practices] ### are central to enabling transparent and honest science --- class: center, middle # You should ask ### How do I start to make my next paper more reproducible? -- ### Which aspects should I improve the most? -- ### What steps or recommendations can I follow? --- class: center, left # Simple reproducible project workflows & recommendations ### Inspired by British Ecological Society (BES), [guides to better science](https://www.britishecologicalsociety.org/publications/guides-to/): .acidinline[Reproducible code] guide <a name=cite-cooper2019></a>[[CH19](https://www.britishecologicalsociety.org/publications/guides-to/)] and .acidinline[Data Management] <a name=cite-harrison2018></a>[[Har18](https://www.britishecologicalsociety.org/publications/guides-to/)] ### .heatinline[Passport for Open Science: A Practical Guide for PhD Students] <a name=cite-barthez2020></a>[[BL20](https://www.ouvrirlascience.fr/passport-for-open-science-a-practical-guide-for-phd-students/)] ### .coldinline[The Turing Way: A Handbook for Reproducible Data Science] <a name=cite-turingway2019></a>[[The+19](https://the-turing-way.netlify.app/welcome.html)] ??? Look at the checklist in these guides. --- class: inverse, center, middle # Basic recommedations to get started? --- class: center, middle # One project = One folder --- class: center, top # What's a project? -- ### one experiment, ### your ideas for future research, ### regular meeting notes/minutes ### a review paper, conference presentations ### a book, teaching materials ### PhD thesis manuscript --- class: center, middle # Choose your best way to organise a folder .large[Make sure it's consistent, informative and work for you] (eg: my [folder template](https://github.com/cgranell/rr-template)) --- class: center, middle # Consistent naming convention --- # .center[Consistent naming convention] .pull-left[ .center[<img src="images/phdcomics1531.gif" width="60%" />] ] .pull-right[ - Name things in machine- and human-readable manner - Order by default: - For scripts, start file names with numbers indicating the position of the script in the analysis: `01_download_data.R` - for data, you can use dates as prefix: `20200115_survey.csv` ] ??? [Source](http://phdcomics.com/comics/archive.php?comicid=1531) --- class: center, middle # Never touch raw data --- class: center, middle # Choose a coding style --- class: center, middle # Document, document, and document. .large[Add README file(s)] --- class: center # README file(s) .large[Include a __README file__ in the root folder to describe the project, provide basic orientation to use your code, data, etc. ] .large[Include __README files__ to describe metadata/complex content o subfolders] .large[Automatically visualised when landing on a GitHub repository, because it's written in [Markdown](https://www.markdownguide.org/) (Week 3)] --- class: center, middle # Add a LICENSE file(s) --- # .center[Licences] .pull-left[ ### A license is a contract between the authors and the users. - Without a license, **copyright** is automatically attached to your work. - Always **add a license to software** you plan to make public - Permissive = attribution (recommended for academic work) - Copyleft = share-alike (derivate work maintain same license as the original) - Always **add a license to data** you plan to make public ] .pull-right[ ### Further readings: - Visit [choosealicense](https://choosealicense.com/) or [OSI-approved licenses](https://opensource.org/licenses) - <a name=cite-morin2012></a>[[MUS12](https://dx.doi.org/10.1371/journal.pcbi.1002598)]: _A Quick Guide to Software Licensing for the Scientist-Programmer_ - <a name=cite-jolly2012></a>[[JAB12](https://dx.doi.org/10.1371/journal.pcbi.1002766)]: _Ten Simple Rules to Protect Your Intellectual Property_ ] --- class: center, middle # Use Version Control Systems (VCS) --- # .center[Version Control Systems] .pull-left[ ### Turn you project folder into a local version control repository - to keep track changes of your work - to provide a complete historical log of your project - to handle properly text formats (code, text documents, markdown documents) as opposed to rich/binary formats (Word). ] .pull-right[ ### Tools: - SVN, Mercurial, [Git](https://git-scm.com/) ### Additional readings: - [Getting started with git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) - [An introduction to Git: what it is, and how to use it](https://www.freecodecamp.org/news/what-is-git-and-how-to-use-it-c341b049ae61/) - <a name=cite-perez2016></a>[[Per+16](https://doi.org/10.1371/journal.pcbi.1004947)], <a name=cite-bryan2020></a>[[BH20](https://happygitwithr.com/)] ] --- class: center, middle # Use online code repositories services --- # .center[Online code repositories services] .pull-left[ ### Web-based platform for hosting code repositories - VCS + collaborative features ] .pull-right[ ### Tools: - [GitLab](https://about.gitlab.com/), [Bitbucket](https://bitbucket.org/), [GitHub](https://github.com/) ### Additional readings: - [Cookiecutter Data Science: Best practices for organising your repository for easy version control](https://drivendata.github.io/cookiecutter-data-science/) - [[Per+16](https://doi.org/10.1371/journal.pcbi.1004947)], [[BH20](https://happygitwithr.com/)] ] --- # Summary 1. .large[One project = One folder] 1. .large[Choose your best way to organise a folder] 1. .large[Consistent naming convention] 1. .large[Never touch raw data] 1. .large[Choose a coding style] 1. .large[Document, document, document] 1. .large[LICENSE file(s)] 1. .large[Use Version Control Systems (VCS)] 1. .large[Use online code repositories services] --- # References <a name=bib-barthez2020></a>[Baker, Anne-Sophie and Bernard Larrouturou](#cite-barthez2020) (2020). _Passport For Open Science_. URL: [https://www.ouvrirlascience.fr/passport-for-open-science-a-practical-guide-for-phd-students/](https://www.ouvrirlascience.fr/passport-for-open-science-a-practical-guide-for-phd-students/). <a name=bib-bryan2020></a>[Bryan, Jenny and Jim Hester](#cite-bryan2020) (2020). _Happy Git and GitHub for the useR_. URL: [https://happygitwithr.com/](https://happygitwithr.com/). <a name=bib-cooper2019></a>[Cooper, Natalie and Pen-Yuah Hsing](#cite-cooper2019) (2019). _Reproducible code_. URL: [https://www.britishecologicalsociety.org/publications/guides-to/](https://www.britishecologicalsociety.org/publications/guides-to/). <a name=bib-harrison2018></a>[Harrison, Kate](#cite-harrison2018) (2018). _Data management_. URL: [https://www.britishecologicalsociety.org/publications/guides-to/](https://www.britishecologicalsociety.org/publications/guides-to/). <a name=bib-jolly2012></a>[Jolly, M, AC Fletcher AC, et al.](#cite-jolly2012) (2012). "Ten Simple Rules to Protect Your Intellectual Property". In: _PLoS Computacional Biology_ 8 (11), p. e1002766. URL: [https://dx.doi.org/10.1371/journal.pcbi.1002766](https://dx.doi.org/10.1371/journal.pcbi.1002766). --- # References <a name=bib-morin2012></a>[Morin, A, J Urban, et al.](#cite-morin2012) (2012). "A Quick Guide to Software Licensing for the Scientist-Programmer". In: _PLoS Computacional Biology_ 8.7, p. e1002598. URL: [https://dx.doi.org/10.1371/journal.pcbi.1002598](https://dx.doi.org/10.1371/journal.pcbi.1002598). <a name=bib-perez2016></a>[Perez-Riverol, Y, L Gatto, et al.](#cite-perez2016) (2016). "Ten simple rules for taking advantage of Git and GitHub". In: _PLoS computational biology_ 12.7. URL: [https://doi.org/10.1371/journal.pcbi.1004947](https://doi.org/10.1371/journal.pcbi.1004947). <a name=bib-turingway2019></a>[The Turing Way Community, Becky Arnold, et al.](#cite-turingway2019) (2019). "The Turing Way: A Handbook for Reproducible Data Science". . This work was supported by The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/, particularly the "Tools, Practices and Systems" theme within that grant, and by The Alan Turing Institute under the EPSRC grant EP/N510129/1. DOI: [10.5281/zenodo.3233986](https://doi.org/10.5281%2Fzenodo.3233986). URL: [https://the-turing-way.netlify.app/welcome.html](https://the-turing-way.netlify.app/welcome.html).