9  Recommendations and practices: before, during and after

Researchers and scientists in academic institutions are expected to publish at a regular rate: the more, the better. It may not be formally demanded, but scientific publications certainly give a boost to a researcher’s career (Szabo 2025). These incentives do not necessarily align with the goals of reproducibility, openness, and transparency. For example, the “publish or perish” culture encourages researchers to favour quantity over quality, leading to rushed publications that may lack thorough documentation and sharing of data and code (Figure 9.1).

Szabo, Csaba. 2025. Unreliable: Bias, Fraud, and the Reproducibility Crisis in Biomedical Research. Columbia University Press.
Figure 9.1: On the subject of incentives in research. Source: Leonid Schneider, ForBetterScience.com.

It is crucial to adopt practices that promote reproducibility at every stage of the research process, from planning to execution to dissemination. In this regard, Alston and Rick (2021) outlined a series of basic actions and recommendations for making research more reproducible, organised around three stages of a research project: (1) before, (2) during, and (3) after analysis.

Alston, Jesse M., and Jessica A. Rick. 2021. “A Beginner’s Guide to Conducting Reproducible Research.” The Bulletin of the Ecological Society of America 102 (2): e01801. https://doi.org/10.1002/bes2.1801.

```mermaid
flowchart LR
  A[BEFORE ANALYSIS] --> B[DURING ANALYSIS]
  B --> C[AFTER ANALYSIS]
```

We borrow this simple outline to better categorise and explain the set of practical recommendations and practices that follow. Note that most of the following recommendations are already widely accepted best practices for scientific research, and that striving for a reasonable level of reproducibility is more achievable than you may expect. Essentially, researchers must think carefully before, during, and after analysis to ensure the reproducibility, openness, and transparency of their work.

9.1 Before analysis: planning for reproducibility


“Science is a way of trying not to fool yourself. The principle is that you must not fool yourself, and you are the easiest person to fool.”

Nobel laureate and physics professor Richard Feynman

This slide deck, titled “Before analysis: planning for reproducibility”, focuses on the organisational steps a researcher must take before a single line of code is written or any data is collected. The central thesis is that reproducibility is not something you “fix” at the end of a research project; it is a mindset that must be integrated into the planning phase to avoid the “reproducibility debt” that accumulates through disorganised workflows.

The presentation emphasises that the “research journey” is often messy and non-linear. To counter this natural entropy, researchers should adopt a “project-centric” approach. This involves treating every research question as a self-contained unit where data, code, and documentation are inextricably linked. It highlights that the primary beneficiary of these practices is often your “future self,” who will inevitably forget the logic behind current decisions. It advocates for moving away from “hand-crafted” manual processes toward automated, scripted workflows that ensure the path from raw data to final results is transparent and repeatable.

List of recommendations/practices:

  • [#1] Adopt a project-oriented workflow: Use dedicated folders to ensure that all paths are relative to the project root, making the project portable across different computers.

  • [#2] Organise files logically: Use a consistent folder structure (e.g., separate folders for /data, /scripts, /results, and /docs).

  • [#3] Use consistent, machine-readable file names: Do not rely on filenames like final_v2_REALLY_FINAL.docx. Avoid spaces and special characters. Use delimiters like underscores or hyphens to make files easy to search and sort. Use chronological markers like ISO 8601 date formats (YYYY-MM-DD) in filenames so they sort naturally by time.

  • [#4] Separate raw data from processed data: Never overwrite your raw data. Treat it as “read-only” and save any cleaned versions in a separate directory.

  • [#5] Leverage open data formats: Use open, text-based formats whenever possible. For example, use CSV instead of Excel for tabular data, and use plain text files for code and documentation.

  • [#6] Document and write “README” files early: Create a README.md file at the start of the project to describe the purpose, structure, and requirements of the research. This serves as a guide for both current and future collaborators (including your future self).

  • [#7] Plan for data sharing: Identify a suitable license for your data (e.g., CC0 or CC BY).

  • [#8] Plan for code sharing: Identify a suitable license for your code and software artefacts (e.g., MIT or GPL).

  • [#9] Use version control: Use Git to track changes systematically and provide a “time machine” for your code (local repository).

  • [#10] Use online remote repositories: Use GitHub to ease collaborative development and sharing of code and software artefacts (remote repository).
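Several of the recommendations above (a project-oriented layout, a consistent folder structure, machine-readable file names, and an early README) can be sketched as a small initialisation script. The following is a minimal illustration, not a prescribed tool; the folder names follow the structure suggested in recommendation #2, and the helper names are our own.

```python
from datetime import date
from pathlib import Path


def init_project(root: str) -> Path:
    """Create a minimal project skeleton with a project-oriented
    layout: raw data kept separate from processed data (rec. #4),
    and dedicated folders for scripts, results, and docs (rec. #2)."""
    project = Path(root)
    for sub in ("data/raw", "data/processed", "scripts", "results", "docs"):
        (project / sub).mkdir(parents=True, exist_ok=True)
    # Write a README early (rec. #6): purpose, structure, requirements.
    (project / "README.md").write_text(
        "# Project title\n\n"
        "Purpose, structure, and requirements go here.\n"
    )
    return project


def dated_filename(stem: str, ext: str) -> str:
    """Machine-readable file name with an ISO 8601 date prefix so
    files sort naturally by time (rec. #3); no spaces, underscore
    and hyphen as delimiters."""
    return f"{date.today().isoformat()}_{stem}.{ext}"
```

For example, `dated_filename("field-survey", "csv")` yields a name of the form `2025-01-31_field-survey.csv`, which sorts chronologically alongside its siblings.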

9.2 During analysis: reproducible workflows


“The ideas we can most trust are those that have been the most tried and tested.”

Philosopher of science Karl Popper

This slide deck, titled “During Research: reproducible workflows”, focuses on the execution phase of a project. While the previous stage (“Before”) dealt with setup and preparation, this one provides the practical habits required to maintain a reproducible record as the research actually happens.

The core idea is that reproducibility is a continuous process, not a final step. It introduces the concept of the “computational environment” — the idea that your results are a product of not just your data, but the specific versions of software, libraries, and operating systems you use.

The deck also warns against “manual interventions” (like fixing a typo in a CSV file by hand), as these create “dark matter” in the research process, that is, steps that happened but left no record. Instead, it advocates for a script-based workflow where every transformation of the data is recorded in code. A significant portion of the deck is dedicated to literate programming (using tools like Quarto), which allows researchers to weave together their narrative, their code, and their results into a single, dynamic document that automatically updates when the data changes.
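A scripted fix of the kind described above takes only a few lines, and unlike a manual edit it leaves a complete record while keeping the raw file untouched. The sketch below uses Python's standard csv module; the file names, column name, and the specific typo are hypothetical.

```python
import csv
from pathlib import Path


def fix_species_typo(src: Path, dst: Path) -> int:
    """Apply a known correction to the 'species' column by script,
    writing the cleaned file separately so the raw data stays
    read-only. Returns the number of rows corrected."""
    corrections = {"Pinus sylvstris": "Pinus sylvestris"}  # hypothetical typo
    fixed = 0
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["species"] in corrections:
                row["species"] = corrections[row["species"]]
                fixed += 1
            writer.writerow(row)
    return fixed
```

Because the correction lives in code, it is visible in version control, can be reviewed, and is re-applied automatically if the raw data is ever re-exported.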

List of recommendations/practices:

  • [#11] Open data does not guarantee reproducible data: Sharing data alone is not enough. Without the code, documentation, and computational environment used to analyse it, others cannot regenerate the published results.

  • [#12] Pay attention to data that is relevant for reproducibility: Beyond the raw data, keep the metadata, parameter settings, and intermediate datasets that the analysis depends on; losing any of these can make the results impossible to regenerate.

  • [#13] Adopt FAIR principles for data management with caution: FAIR data is not necessarily reproducible data. FAIR focuses on making data Findable, Accessible, Interoperable, and Reusable, but it does not guarantee that the data is well-documented or that the analysis can be reproduced. To ensure reproducibility, you need to go beyond FAIR and also focus on the quality of documentation, version control, and providing clear instructions for how to use the data.

  • [#14] Use open source tools whenever possible: Proprietary software puts a license barrier between your work and anyone trying to reproduce it. Open source tools such as R and Python can be freely installed by anyone, on any machine.

  • [#15] Learn/use scripting languages: Avoid manual “point-and-click” operations in software like Excel. If a change is made to the data, it should be documented in a script. AI assistants can be used to generate code snippets, but the researcher should understand and review the code to ensure it is correct and reproducible.

  • [#16] Embrace notebooks: Use Quarto or similar tools to keep your text and code (scripts) in the same file. This prevents, for example, “copy-paste” errors between your statistics software and your word processor.

  • [#17] Manage software dependencies: Save information about your software versions and package dependencies. This allows others to recreate the exact technical conditions under which your results were generated.

  • [#18] Record your computational environment: Note the operating system, language version, and key libraries that produced your results; in R, for instance, sessionInfo() captures much of this automatically.

  • [#19] Adopt FAIR principles for software/code management: The FAIR principles have been extended to research software (the FAIR4RS principles); apply them so that your code is as findable, accessible, interoperable, and reusable as your data.

  • [#20] Automate workflows: Use tools such as Make to automate the execution of your analysis pipeline. This ensures that all steps are executed in the correct order and that any change to the data or code triggers the necessary updates to the results.
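Recommendations #17 and #18 amount to writing down, in machine-readable form, everything needed to recreate the computational environment. The sketch below does this with Python's standard library (platform, sys, and importlib.metadata); in R, sessionInfo() plays a similar role. The function name is our own.

```python
import platform
import sys
from importlib import metadata


def environment_record() -> str:
    """Collect the basic facts needed to recreate the computational
    environment: operating system, interpreter version, and the
    exact versions of all installed packages."""
    lines = [
        f"os: {platform.platform()}",
        f"python: {sys.version.split()[0]}",
        "packages:",
    ]
    for dist in sorted(
        metadata.distributions(), key=lambda d: (d.metadata["Name"] or "").lower()
    ):
        lines.append(f"  {dist.metadata['Name']}=={dist.version}")
    return "\n".join(lines)
```

Writing this record to a file (e.g., alongside the results) at the end of every run means each set of outputs carries a snapshot of the conditions that produced it.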

9.3 After analysis: writing and sharing reproducible resources


“I do not mind if you think slowly, but I do object when you publish more quickly than you think.”

Nobel laureate and physics professor Wolfgang Pauli

This presentation, titled “After analysis: writing and sharing reproducible resources,” focuses on the final stages of the research lifecycle: how to document, cite, and share materials so that others (and your future self) can verify and build upon your work.

The core message is a shift from “Trust Me” (results published without access to data/code) to “Show Me” (pre-publication transparency). It argues that the common phrase “data and code available upon request” is insufficient and often leads to a dead end. Instead, reproducibility requires proactive sharing of all computational artefacts, including software versions, specific libraries, and raw data, at the time of submission.

The deck emphasises that software is a primary research object, not just a tool. This means researchers must explicitly cite the software and packages they use, just as they would cite a journal article. The presentation advocates for the use of Research Compendia and modern publishing tools like Quarto to create “dynamic documents” where the analysis and the narrative are linked. The ultimate goal is to move towards “reproducible manuscripts” that allow readers to interact with the data and code directly within the article, fostering a culture of openness and verification in scientific research and publishing.

List of recommendations/practices:

  • [#21] Move beyond “Available Upon Request”: Never rely on post-publication requests. Data and code must be available at the time of preprint posting or journal submission.

  • [#22] Report software versions: Always specify the exact versions of packages, libraries, and frameworks used. A change in version can lead to different results.

  • [#23] Cite the software you use: Treat software as a first-class citizen. Use commands like citation() in R to get proper references for the R engine and specific packages.

  • [#24] Include a DASA Section: Add a “Data and Software Availability” (DASA) section to your paper. This section should provide persistent links (like DOIs) to repositories and describe the conditions for access.

  • [#25] Share preprints: Deposit versions of your paper in preprint repositories (such as arXiv, bioRxiv, or SocArXiv) before formal journal review to increase visibility and access.

  • [#26] Formally share data and code: Document and deposit your resources in permanent repositories (like Zenodo or OSF) and link them to your GitHub account for versioned archiving.

  • [#27] Create research compendia: Organise all digital materials (data, code, reports) in a standardised, recognisable collection to make the entire project easy to navigate.

  • [#28] Use literate programming tools: Adopt tools like Quarto or RMarkdown to create dynamic documents where code and text are woven together, ensuring figures and tables update automatically.

  • [#29] Aim for reproducible manuscripts: Look toward the future of publishing—interactive papers that allow readers to engage with the data and code directly within the article.

  • [#30] Spread the word: Reproducibility is a community effort and a cultural change. Encourage colleagues to adopt these practices.
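Recommendations #22 and #24 require looking up the exact versions of the packages used in an analysis; in R, citation() provides the corresponding references directly. The Python sketch below gathers installed versions for a DASA section using the standard importlib.metadata module; the function name is our own.

```python
from importlib import metadata


def package_versions(packages):
    """Look up the installed version of each named package, e.g. to
    report exact versions in a Data and Software Availability (DASA)
    section. Missing packages are flagged rather than guessed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions
```

A call like `package_versions(["numpy", "pandas"])` returns a dictionary mapping each package to its exact installed version, ready to paste into the availability section of a manuscript.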

9.4 Additional resources and readings