name: title
class: Left, middle
<!--background-image: url(images/rawpixel/nasa-sun.jpg)
background-size: cover-->

# Before analysis · Basics to get started

### [Act II] Recommendations and practices for open and reproducible research

.large[Reproducible Research Practices (RRP'23) · April 2023]

.right[Carlos Granell · Sergi Trilles]

.right[Universitat Jaume I]

???

Solar Observations taken during the Transit of Venus. Originally from NASA. Source: [Rawpixel](https://www.rawpixel.com/image/440254/solar-observations)

---
class: inverse, bottom, middle

## Nobel laureate and physics professor Richard Feynman

.large[“Science is a way of trying not to fool yourself. The principle is that you must not fool yourself, and you are the easiest person to fool.”]

???

[Richard Feynman](https://en.wikipedia.org/wiki/Richard_Feynman)

---
class: left

## If you were to reproduce a paper...

- .large[checking the published manuscript]
    - .small[<a name=cite-keshav2007></a>[[Kes07](https://doi.org/10.1145/1273445.1273458)]] - *How to Read a Paper*
    - .small[<a name=cite-nust2018_readpaper></a>[[NBM18](https://arxiv.org/abs/1806.09525)]] - *How to Read a Research Compendium*

- .large[looking for published data and code, then]

- .large[comparing the results of that data and code to the published results]

- .large[if they are the _same_, the study is reproducible; otherwise, it is not*]

???

\* or partially reproducible. Remember: reproducibility is not a binary flag, but a spectrum

---
class: center, middle

## But if you were to .gray.bg-blue[write] a reproducible paper

---
class: center, top

# Getting started is difficult!

--

.huge["I have never heard of reproducibility before"]

--

.huge["I'm not good at using tools. _You said GitHub and Docker?_"]

--

.huge["I have never been trained"]

--

.huge["I don't know where to find tutorials, guidelines..."]

--

.huge["I'm superbusy! I do not have enough time"]

--

.huge["I do theoretical stuff.
Reproduction is not for me"]

---
class: center, middle

# No matter your excuse

---
class: center, middle

# Reproducible Research Practices

### are central to enabling not only .gray.bg-blue[reproducible] but .gray.bg-blue[transparent, open and honest] science too

---
class: center, top

# You should ask

--

.huge[Which habits should I adopt to change my daily routine?]

--

.huge[Which aspects should I improve the most?]

--

.huge[What steps or recommendations can I follow?]

---
class: left

# Key guidelines & recommendations

- .large[Inspired by the British Ecological Society (BES) [guides to better science](https://www.britishecologicalsociety.org/publications/guides-to/): the Reproducible Code guide .small[<a name=cite-cooper2019></a>[[CH19](https://www.britishecologicalsociety.org/publications/guides-to/)]] and the Data Management guide .small[<a name=cite-harrison2018></a>[[Har18](https://www.britishecologicalsociety.org/publications/guides-to/)]]]

- .large[Passport For Open Science - A Practical Guide for PhD Students .small[<a name=cite-barthez2020></a>[[Ber+22](https://www.ouvrirlascience.fr/passport-for-open-science-a-practical-guide-for-phd-students/)]]]

- .large[A Beginner's Guide to Conducting Reproducible Research .small[<a name=cite-alston2021></a>[[AR21](https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1002/bes2.1801)]]]

- .large[The Turing Way: A Handbook for Reproducible Data Science .small[<a name=cite-turingway2019></a>[[The19](https://the-turing-way.netlify.app/welcome.html)]]]

- .large[Reproducible Publications at AGILE Conferences: Guidelines for Authors, Scientific Reviewers, and Reproducibility Reviewers .small[<a name=cite-nust2020></a>[[Nus+20](https://osf.io/numa5/)]]]

???

Look at the checklists in these guides.

---
class: inverse, center, middle

# Basic recommendations & practices to get started

---
name: rec01
class: inverse, center, middle

# .blue.bg-white[\#1]

# ONE project = ONE folder

---
class: left

### What's a project?
.pull-left[

- .large[an experiment in a PhD project]

- .large[a master's thesis project]

- .large[ideas for future research]

- .large[regular meeting notes/minutes]

- .large[teaching materials]

]

--

.pull-right[

- .large[a review paper]

- .large[a conference presentation]

- .large[workshop/seminar materials]

- .large[a book]

- .large[a PhD thesis manuscript]

]

---
name: rec02
class: inverse, center, middle

# .blue.bg-white[\#2]

# Choose your best way to organise a project/folder

---
class: left

### Project/folder structure

.huge[Make sure it's consistent, informative, and works for you]

.pull-left[

- .large[`data`]

- .large[`data-raw`]

- .large[`figs`]

]

.pull-right[

- .large[`scripts`]

- .large[`reports`]

- .large[`notes`, ...]

]

.large[See [example repository structure](https://the-turing-way.netlify.app/project-design/project-repo/project-repo-advanced.html) .small[[[The19](https://the-turing-way.netlify.app/welcome.html)]]]

--

.huge[And **stick to it**]

---
name: rec03
class: inverse, center, middle

# .blue.bg-white[\#3]

# Choose a consistent file naming convention

---
class: left

### File naming convention

.pull-left[

.center[<img src="images/phdcomics1531.gif" width="75%" />]

]

.pull-right[

- .large[File names are **machine-readable**, **human-readable**, and play well with **default ordering**]

- .large[Script file names begin with numbers/letters to indicate the sequence in the analysis: `01_download_data.R`]

- .large[Data file names begin with a date prefix (YYYYMMDD): `20200115_survey.csv`]

]

???
[Source](http://phdcomics.com/comics/archive.php?comicid=1531)

---
name: rec04
class: inverse, center, middle

# .blue.bg-white[\#4]

# Never ever touch raw data

---
class: left

### Raw data

.huge[Remember [Newton's letter to Flamsteed](https://cgranell.github.io/rrp-uji/slides/slides11_01.html#newton)]

- .large[Store raw data permanently (in the `data-raw` folder)]

- .large[Use scripts to produce derived, clean datasets for analyses (in the `data` folder)]

- .large[Document the process (simple steps, diagrams, contents of the datasets and their provenance) in a plain text README file (see [Recommendation \#6](#rec06))]

---
name: rec05
class: inverse, center, middle

# .blue.bg-white[\#5]

# Use open data formats

---
class: left

### Open formats

.huge[Use open, text-based formats whenever possible]

.huge[Alternatively, provide the data in an open format alongside the proprietary one]

- .large[.xls + .csv]

---
name: rec06
class: inverse, center, middle

# .blue.bg-white[\#6]

# Document, document, and document

---
class: left

### README file(s)

- .large[Include (at least) a __README file__ in the root folder to describe the project and give basic guidance on using your code, data, etc.
]
    - Suggestions for writing a good [README](https://www.makeareadme.com/) and [GitHub's README documentation](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-readmes)
    - If your project is on GitHub, README files are rendered automatically when written in [Markdown](https://www.markdownguide.org/)

- .large[Include (if required) __README files__ in each subfolder to describe metadata/complex content]

- .large[Keep track of ideas, discussions and decisions about the project (in the `notes` folder)]

- .large[Use a simple formatting language/tool for writing plain text files that can be version controlled]

---
name: rec07
class: inverse, center, middle

# .blue.bg-white[\#7]

# Add a data license

---
class: left, middle

### A .gray.bg-blue[license] is a .gray.bg-blue[contract] between the authors and the users

.large[Without a license, **copyright** is automatically attached to your work]

.large[If you need it, add a **LICENSE file** to apply a license to your work]

.large[.small[<a name=cite-jolly2012></a>[[JAB12](https://doi.org/10.1371/journal.pcbi.1002766)]]: _Ten Simple Rules to Protect Your Intellectual Property_]

---
class: left

### Data licenses

.huge[Always add a [license](https://the-turing-way.netlify.app/reproducible-research/licensing/licensing-data.html) to the data you plan to make public]

- .large[Creative Commons Licenses]: [BY SA NC ND](https://creativecommons.org/about/cclicenses/)

- .large[CC0]: CC Zero allows creators to put their works into the worldwide public domain

- .large[[Open Data Commons](https://opendatacommons.org/licenses/index.html)]

<br/>

.large[**[What UJI recommends...](https://www.uji.es/serveis/cd/bib/pco/base/drets2/)**]

---
name: rec08
class: inverse, center, middle

# .blue.bg-white[\#8]

# Add a software license

---
class: left

### Software licenses

.huge[Always add a
[license](https://the-turing-way.netlify.app/reproducible-research/licensing/licensing-software.html) to the software you plan to make public]

.pull-left[

- .large[**Permissive** = attribution (recommended for academic work)]

]

.pull-right[

- .large[**Copyleft** = share-alike (derivative works keep the same license as the original)]

]

<br/>

.large[**What UJI recommends...**]

--

.large[Not specified]

---
class: left

### .small[<a name=cite-morin2012></a>[[MUS12](https://doi.org/10.1371/journal.pcbi.1002598)]]: _A Quick Guide to Software Licensing for the Scientist-Programmer_

<img src="https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002598.g002&type=large" width="50%" style="display: block; margin: auto;" />

???

Source: [https://doi.org/10.1371/journal.pcbi.1002598.g002](https://doi.org/10.1371/journal.pcbi.1002598.g002)

---
name: rec09
class: inverse, center, middle

# .blue.bg-white[\#9]

# Learn/use version control systems

---
class: left

### Version control systems

.huge[Turn your project folder into a local version control repository]

- .large[to keep track of changes to your work over time, whether collaborative (working with others) or non-collaborative (tracking file histories)]

- .large[to properly handle text formats (code, text documents, Markdown documents) as opposed to rich/binary formats (Word)]

---
class: left

### Version control systems

.huge[Tools:]

- .large[SVN, Mercurial, [Git](https://git-scm.com/)]

.huge[Readings:]

- .large[[Getting started with git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)]

- .large[[An introduction to Git: what it is, and how to use it](https://www.freecodecamp.org/news/what-is-git-and-how-to-use-it-c341b049ae61/)]

- .large[Ten simple rules for taking advantage of Git and GitHub .small[<a name=cite-perez2016></a>[[Per+16](https://doi.org/10.1371/journal.pcbi.1004947)]]]

- .large[Excuse Me, Do You Have a Moment to Talk About Version Control?
.small[<a name=cite-bryan2018></a>[[Bry18](https://doi.org/10.1080/00031305.2017.1399928)]]]

---
name: rec10
class: inverse, center, middle

# .blue.bg-white[\#10]

# Learn/use online (Git) repository hosting services

---
class: left

### Online (Git) repository hosting services

.huge[They make it easier for individuals and teams to use Git for version control and collaboration]

- .large[VCS + collaborative features + development support features]

.huge[Tools]: .large[[GitLab](https://about.gitlab.com/), [Bitbucket](https://bitbucket.org/), [GitHub](https://github.com/)]

.huge[Readings:]

- .large[[Cookiecutter Data Science: Best practices for organising your repository for easy version control](https://drivendata.github.io/cookiecutter-data-science/)]

- .large[Ten simple rules for taking advantage of Git and GitHub .small[[[Per+16](https://doi.org/10.1371/journal.pcbi.1004947)]]]

- .large[Happy Git and GitHub for the useR .small[<a name=cite-bryan2020></a>[[BH20](https://happygitwithr.com/)]]]

---
name: summary
class: inverse, center, middle

# Summary

---

- .large[[One project = One folder](#rec01)]

- .large[[Choose your best way to organise a project/folder](#rec02)]

- .large[[Choose a consistent file naming convention](#rec03)]

- .large[[Never ever touch raw data](#rec04)]

- .large[[Use open data formats](#rec05)]

- .large[[Document, document, and document](#rec06)]

- .large[[Add a data license](#rec07)]

- .large[[Add a software license](#rec08)]

- .large[[Learn/use version control systems](#rec09)]

- .large[[Learn/use online (Git) repository hosting services](#rec10)]

---

# References

.tiny[
<a name=bib-keshav2007></a>[Keshav, S.](#cite-keshav2007) (2007). "How to Read a Paper". In: _ACM SIGCOMM Computer Communication Review_ 37.3, pp. 83–84. ISSN: 0146-4833. URL: [https://doi.org/10.1145/1273445.1273458](https://doi.org/10.1145/1273445.1273458).

<a name=bib-jolly2012></a>[Jolly, M, AC Fletcher, et al.](#cite-jolly2012) (2012). "Ten Simple Rules to Protect Your Intellectual Property".
In: _PLoS Computational Biology_ 8.11, p. e1002766. URL: [https://doi.org/10.1371/journal.pcbi.1002766](https://doi.org/10.1371/journal.pcbi.1002766).

<a name=bib-morin2012></a>[Morin, A, J Urban, et al.](#cite-morin2012) (2012). "A Quick Guide to Software Licensing for the Scientist-Programmer". In: _PLoS Computational Biology_ 8.7, p. e1002598. URL: [https://doi.org/10.1371/journal.pcbi.1002598](https://doi.org/10.1371/journal.pcbi.1002598).

<a name=bib-perez2016></a>[Perez-Riverol, Y, L Gatto, et al.](#cite-perez2016) (2016). "Ten simple rules for taking advantage of Git and GitHub". In: _PLoS Computational Biology_ 12.7. URL: [https://doi.org/10.1371/journal.pcbi.1004947](https://doi.org/10.1371/journal.pcbi.1004947).

<a name=bib-bryan2018></a>[Bryan, Jennifer](#cite-bryan2018) (2018). "Excuse Me, Do You Have a Moment to Talk About Version Control?" In: _The American Statistician_ 72.1, pp. 20-27. URL: [https://doi.org/10.1080/00031305.2017.1399928](https://doi.org/10.1080/00031305.2017.1399928).

<a name=bib-harrison2018></a>[Harrison, Kate](#cite-harrison2018) (2018). _Data management_. URL: [https://www.britishecologicalsociety.org/publications/guides-to/](https://www.britishecologicalsociety.org/publications/guides-to/).

<a name=bib-nust2018_readpaper></a>[Nust, Daniel, Carl Boettiger, et al.](#cite-nust2018_readpaper) (2018). _How to Read a Research Compendium_. URL: [https://arxiv.org/abs/1806.09525](https://arxiv.org/abs/1806.09525).

<a name=bib-cooper2019></a>[Cooper, Natalie and Pen-Yuah Hsing](#cite-cooper2019) (2019). _Reproducible code_. URL: [https://www.britishecologicalsociety.org/publications/guides-to/](https://www.britishecologicalsociety.org/publications/guides-to/).

<a name=bib-turingway2019></a>[The Turing Way Community](#cite-turingway2019) (2019). _The Turing Way: A Handbook for Reproducible Data Science_. URL: [https://the-turing-way.netlify.app/welcome.html](https://the-turing-way.netlify.app/welcome.html).
<a name=bib-bryan2020></a>[Bryan, Jenny and Jim Hester](#cite-bryan2020) (2020). _Happy Git and GitHub for the useR_. URL: [https://happygitwithr.com/](https://happygitwithr.com/).

<a name=bib-nust2020></a>[Nust, Daniel, Frank Ostermann, et al.](#cite-nust2020) (2020). _Reproducible Publications at AGILE Conferences: Guidelines for Authors, Scientific Reviewers, and Reproducibility Reviewers_. URL: [https://osf.io/numa5/](https://osf.io/numa5/).

<a name=bib-alston2021></a>[Alston, Jesse M. and Jessica A. Rick](#cite-alston2021) (2021). "A Beginner's Guide to Conducting Reproducible Research". In: _The Bulletin of the Ecological Society of America_ 102.2, p. e01801. eprint: https://esajournals.onlinelibrary.wiley.com/doi/pdf/10.1002/bes2.1801. URL: [https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1002/bes2.1801](https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1002/bes2.1801).

<a name=bib-barthez2020></a>[Berti, Johann, Marin Dacos, et al.](#cite-barthez2020) (2022). _Passport For Open Science - A Practical Guide for PhD Students_. URL: [https://www.ouvrirlascience.fr/passport-for-open-science-a-practical-guide-for-phd-students/](https://www.ouvrirlascience.fr/passport-for-open-science-a-practical-guide-for-phd-students/).
]
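
---
class: left

### Sketch: one project folder and consistent file names

As a rough illustration of recommendations \#1 to \#3, the sketch below creates a single project folder with the subfolders listed on the *Project/folder structure* slide and builds file names that follow the naming convention. The function names (`scaffold_project`, `script_name`, `data_name`), the `README.md` placeholder text and the choice of Python are assumptions made for this example; they do not come from the cited guides.

```python
from datetime import date
from pathlib import Path

# Subfolder names taken from the "Project/folder structure" slide.
FOLDERS = ["data", "data-raw", "figs", "scripts", "reports", "notes"]


def scaffold_project(root):
    """Create the project folder, its subfolders, and a placeholder README.md.

    Hypothetical helper: existing folders and an existing README are left untouched.
    """
    root = Path(root)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(
            f"# {root.name}\n\nDescribe the project here.\n", encoding="utf-8"
        )
    return root


def script_name(step, description, ext="R"):
    """Build a script file name with a zero-padded step prefix."""
    return f"{step:02d}_{description}.{ext}"


def data_name(day, description, ext="csv"):
    """Build a data file name with a YYYYMMDD date prefix."""
    return f"{day:%Y%m%d}_{description}.{ext}"
```

For instance, `scaffold_project("my-project")` would create `my-project/data`, `my-project/data-raw` and the other subfolders, while `script_name(1, "download_data")` and `data_name(date(2020, 1, 15), "survey")` give the two file names used as examples on the *File naming convention* slide. Adapt the folder list to whatever structure works for you.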