After analysis

Writing and sharing reproducible resources

Carlos Granell

GEOTEC, Universitat Jaume I

Apr 30, 2024

[#21] “Resources (data, code, …) available upon request”

NEVER

Data and code must be made available at prepublication or submission time (SHOW ME), rather than postpublication (TRUST ME)

Example

Stodden, Seiler, and Ma (2018) analysed 204 computational articles published in Science under the policy “data and code available postpublication upon request”. They:

  • Received data and/or code from the authors for 44% of the articles;

  • Were able to reproduce the findings for 26%;

  • Concluded that the policy is an improvement over no policy, but insufficient for reproducibility

[#22] Report software tool versions you use

Software version

  • Specify versions of software tools/components used in the paper (packages, libraries, frameworks, etc.)

  • A different version can lead to different results!
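For Python-based analyses, the versions to report can be collected programmatically rather than by hand. The following is a minimal sketch using only the standard library; the function name `report_versions` and the package names in the usage lines are illustrative assumptions, not part of the original slides.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Return a mapping of component name to version string."""
    # Always include the interpreter itself.
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly instead of silently skipping it.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Hypothetical list: replace with the packages your paper relies on.
    for name, version in report_versions(["numpy", "scipy"]).items():
        print(f"{name}=={version}")
```

The printed `name==version` lines can be pasted into the paper or kept alongside the code.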

Example

Check “Story 4: Different Versions of Code, External Libraries, or Compilers can Challenge Reproducibility” (Mesnard and Barba 2017) to understand why using different software versions makes a difference in some disciplines

[#23] Cite the software you use

Do I cite all software used in my paper?

Cite software that’s KEY to research results

How to cite the R engine

citation()
To cite R in publications use:

  R Core Team (2023). _R: A Language and Environment for Statistical
  Computing_. R Foundation for Statistical Computing, Vienna, Austria.
  <https://www.R-project.org/>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2023},
    url = {https://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.

How to cite R packages

citation("palmerpenguins")
To cite palmerpenguins in publications use:

  Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer
  Archipelago (Antarctica) penguin data. R package version 0.1.0.
  https://allisonhorst.github.io/palmerpenguins/. doi:
  10.5281/zenodo.3960218.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {palmerpenguins: Palmer Archipelago (Antarctica) penguin data},
    author = {Allison Marie Horst and Alison Presmanes Hill and Kristen B Gorman},
    year = {2020},
    note = {R package version 0.1.0},
    doi = {10.5281/zenodo.3960218},
    url = {https://allisonhorst.github.io/palmerpenguins/},
  }

  • Do you cite the SciPy library in a footnote or as a regular article instead?
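Python has no single built-in counterpart to R's `citation()`, so a software reference often has to be assembled by hand. The sketch below formats a BibTeX `@Manual` entry in the same shape as the R output above; the function `software_bibtex` and the citation key are hypothetical, and the field values in the usage lines are copied from the palmerpenguins entry shown earlier.

```python
def software_bibtex(key, title, authors, year, version, url, doi=None):
    """Format a BibTeX @Manual entry for a software package."""
    fields = [
        ("title", title),
        ("author", " and ".join(authors)),  # BibTeX separates authors with "and"
        ("year", str(year)),
        ("note", f"Version {version}"),
        ("url", url),
    ]
    if doi:
        fields.append(("doi", doi))
    body = "\n".join(f"  {name} = {{{value}}}," for name, value in fields)
    return f"@Manual{{{key},\n{body}\n}}"


if __name__ == "__main__":
    print(software_bibtex(
        key="horst2020",  # hypothetical citation key
        title="palmerpenguins: Palmer Archipelago (Antarctica) penguin data",
        authors=["Allison Marie Horst", "Alison Presmanes Hill", "Kristen B Gorman"],
        year=2020,
        version="0.1.0",
        url="https://allisonhorst.github.io/palmerpenguins/",
        doi="10.5281/zenodo.3960218",
    ))
```

Where a package publishes its own preferred citation, that should take precedence over a generated entry.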

[#24] DASA Section

DASA = Data and Software Availability

AGILE Reproducible Paper Guidelines (Nüst et al. 2020)

“The DASA section provides references to where data, software and documentation is available (e.g., paper section or README file) and under what conditions (e.g., copyright, licenses or access procedures for protected data). It should be concise and contain persistent links to repositories using Digital Object Identifiers (DOI).”
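Since the guideline asks for persistent links using DOIs, a draft DASA section can be checked automatically before submission. The sketch below is one possible check, not part of the AGILE guidelines; the regular expression is a common approximation of modern DOI syntax and the function name `find_dois` is an assumption.

```python
import re

# Approximate pattern for modern DOIs: "10.", a 4-9 digit registrant, "/", a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(text):
    """Return the DOIs found in text, with trailing punctuation removed."""
    return [m.group(0).rstrip(".,;:)]") for m in DOI_PATTERN.finditer(text)]


if __name__ == "__main__":
    draft = "Code is archived at https://doi.org/10.5281/zenodo.3960218."
    dois = find_dois(draft)
    if dois:
        print("DOIs found:", dois)
    else:
        print("No DOI found: add a persistent link to the DASA section.")
```

A draft that links only to a code hosting site would return an empty list, which signals that the resources still need to be archived in a repository that issues DOIs.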

DASA Examples

“All analyses were performed using R Statistical Software (v4.3.0; R Core Team 2023). Penguin data was obtained via the palmerpenguins R package (v0.1.0; Horst, Hill, and Gorman 2020).”

Based on citation() examples in recommendation 23

“The input data for this work are the full texts of GIScience conference proceedings from the years 2012 to 2018. The paper assessment results and source code of figures are published at https://github.com/nuest/reproducible-research-at-giscience and archived on Zenodo [Daniel Nüst et al., 2020]. The used computing environment is https://github.com/rocker-org/binder/ pinning the R version to 3.6.3 and R packages to the https://mran.microsoft.com/timemachine of July 5th 2019.”

Extracted from (Ostermann et al. 2021)

[#25] Share pre-prints

Preprints are versions of papers that have not yet been peer reviewed by a journal

  • Sharing preprints increases access to and visibility of your work

  • Before sharing (depositing): choose an appropriate license

Tips

Preprint and published versions can be merged in Google Scholar for citation

Where to deposit?

  • arXiv: physics, mathematics, computer science, quantitative biology, statistics, electrical engineering and systems science.
  • bioRxiv: biological sciences.
  • ChemRxiv: chemical sciences.
  • EarthArXiv: Earth science and related domains of planetary science.
  • PsyArXiv: psychology, powered by OSF Preprints.
  • SocArXiv: social sciences, powered by OSF Preprints.

Sharing preprints according to UJI

Open access dissemination of research results

“What type of publication must be deposited? Scientific articles. Also recommended: monographs, book chapters, conference papers, etc.

“Regarding the version of the article to be deposited, the published document in PDF format is recommended. If the editorial policy does not allow it, the postprint or the preprint will be deposited.”

Institutional declaration in favour of promoting open access at the Universitat Jaume I

“When open-access dissemination of the revised full text is not permitted, depositing the author’s unrevised version (preprint) will be requested”

Code of Good Practice in Research and Doctoral Studies

“[Researchers] will save the different versions of their publications (preprint, postprint and publisher’s PDF) in order to be able to deposit the document corresponding to the copyright and intellectual property rights in the UJI Repository”

“When the revised full text (postprint) is not allowed to be disseminated in open access, the author’s unrevised version (preprint) must be deposited.”

[#26] Share data/code

As part of the writing and sharing process…

  • Document and deposit your data/code resources

  • Include them as references in your paper

  • Cite them (most likely in the DASA section)

  • See flowchart in Recommendation 13

[#27] Research compendia

A research compendium is…

  • Collection of data, code, products (reports, questionnaires, etc.) of a research project that are archived together

  • Standardised and easily recognisable way to organise digital materials of a research project

  • Basic principles when creating a research compendium
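As an illustration of the "standardised and easily recognisable" layout described above, the sketch below creates an empty compendium skeleton. The folder names (`data`, `code`, `products`) mirror the components listed in the first bullet but are an assumption, as is the function name `create_compendium`; communities and templates differ in the exact layout they recommend.

```python
from pathlib import Path


def create_compendium(root, folders=("data", "code", "products")):
    """Create a minimal compendium skeleton and return its root path."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for folder in folders:
        (root / folder).mkdir(exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        # Never overwrite an existing README.
        readme.write_text(
            "# Project title\n\nDescribe the data, code and products here.\n",
            encoding="utf-8",
        )
    return root


if __name__ == "__main__":
    # Hypothetical project name.
    print(create_compendium("my-paper-compendium"))
```

Running the function twice is safe: existing folders and an existing README are left untouched.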

Tools to share every step of the scientific process

[#28] Hello Quarto (and friends)

knitr (Yihui Xie 2015) started in 2011, RMarkdown (Y. Xie, Allaire, and Grolemund 2018) in 2014

Quarto (Allaire 2023) started in 2022

  • Weave together text and code to produce formatted output such as documents, web pages, blog posts, books, articles

  • Dynamic document: reproducible figures and tables are created with code and integrated into documents so that they are automatically updated when analyses are re-run
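To make the idea of a dynamic document concrete, the sketch below assembles the text of a minimal Quarto file with a YAML header and one executable Python chunk. The helper `minimal_qmd` and the file name in the comment are hypothetical; the chunk fence is built at runtime only so that the example nests cleanly here.

```python
FENCE = "`" * 3  # three backticks, built at runtime


def minimal_qmd(title, code):
    """Return the text of a minimal Quarto document with one Python chunk."""
    lines = [
        "---",
        f'title: "{title}"',
        "format: html",
        "jupyter: python3",
        "---",
        "",
        "The value below is recomputed every time the document is rendered.",
        "",
        FENCE + "{python}",
        "#| echo: false",  # hide the code, show only its output
        code,
        FENCE,
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Save the output as e.g. paper.qmd, then run: quarto render paper.qmd
    print(minimal_qmd("Penguins", "print(1 + 1)"))
```

Because the output is produced by the chunk at render time, changing the code or the data changes the document without any manual copy and paste.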

Quarto with R

Quarto with Python

[#29] Reproducible (interactive) manuscripts

The future is interactive AND reproducible papers

[#30] Spread the word

Dear colleagues,

Cultural change

Community effort

References

Allaire, JJ. 2023. Quarto: R Interface to ’Quarto’ Markdown Publishing System. https://CRAN.R-project.org/package=quarto.
Caprarelli, Graziella, Brian Sedora, Mia Ricci, Shelley Stall, and Matthew Giampoala. 2023. “Notebooks Now! The Future of Reproducible Research.” Earth and Space Science 10 (12). https://doi.org/10.1029/2023ea003458.
Dhar, Payal. 2023. “Octopus and ResearchEquals Aim to Break the Publishing Mould.” Nature. Springer Science and Business Media LLC. https://doi.org/10.1038/d41586-023-00861-0.
Hohman, Fred, Matthew Conlen, Jeffrey Heer, and Duen Chau. 2020. “Communicating with Interactive Articles.” Distill 5 (9). https://doi.org/10.23915/distill.00028.
Mesnard, Olivier, and Lorena A. Barba. 2017. “Reproducible and Replicable Computational Fluid Dynamics: It’s Harder Than You Think.” Computing in Science & Engineering 19 (4): 44–55. https://doi.org/10.1109/MCSE.2017.3151254.
Nüst, Daniel, Frank Ostermann, Rusne Sileryte, Barbara Hofer, Carlos Granell, Marta Teperek, Anita Graser, et al. 2020. “Reproducible Publications at AGILE Conferences: Guidelines for Authors, Scientific Reviewers, and Reproducibility Reviewers.” https://doi.org/10.17605/OSF.IO/CB7Z8.
Ostermann, Frank O., Daniel Nüst, Carlos Granell, Barbara Hofer, and Markus Konkol. 2021. “Reproducible Research and GIScience: An Evaluation Using GIScience Conference Papers.” In 11th International Conference on Geographic Information Science (GIScience 2021) - Part II, edited by Krzysztof Janowicz and Judith A. Verstegen, 208:2:1–16. Leibniz International Proceedings in Informatics (LIPIcs). Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.GIScience.2021.II.2.
Smith, AM, DS Katz, and KE Niemeyer. 2016. “Software Citation Principles.” PeerJ Computer Science 2: e86. https://doi.org/10.7717/peerj-cs.86.
Stodden, Victoria, Jennifer Seiler, and Zhaokun Ma. 2018. “An Empirical Analysis of Journal Policy Effectiveness for Computational Reproducibility.” Proceedings of the National Academy of Sciences 115 (11): 2584–89. https://doi.org/10.1073/pnas.1708290115.
Xie, Y, JJ Allaire, and G Grolemund. 2018. R Markdown: The Definitive Guide. CRC Press. https://bookdown.org/yihui/rmarkdown/.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.