03 Quarto Document using palmerpenguins

Notebooks for Academics

guided-exercise
palmerpenguins is a great dataset for data exploration & visualization
Author

Carlos Granell

Published

May 2, 2024

Create an RStudio Project on Posit Cloud using an existing template

  • In Your Workspace, click New Project > New RStudio Project
  • Select File > New File > Quarto Document… to create the file notebook.qmd
  • Open it and delete all the content of the body, i.e. everything below the YAML header

Structure of the notebook using markdown

  • Write main headings (‘##’):

    • Introduction
    • Data & methods
    • Exploratory analysis
    • Conclusion
    • References
  • Add YAML options for authoring and table of contents

YAML header: authoring and table of contents
---
title: "Notebook using `palmerpenguins`"
author: "Carlos Granell"
description: "`palmerpenguins` is a great dataset for data exploration & visualization"
date: "04/20/2024"
format: 
  html:
    toc: true
    toc-location: left
    toc-title: Contents
---
  • Numbering sections:

    • Add YAML option number-sections: true
    • Add “{.unnumbered}” next to the References heading to make it an unnumbered section
Body: Non-numbered section
## References {.unnumbered}    

Set up code libraries

  • Select File > New File > R Script to create the file install.R
  • Add the packages used in the notebook
install.R
install.packages("knitr")
install.packages("rmarkdown")
install.packages("palmerpenguins")
install.packages("tidyverse")
install.packages("gt")
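As an optional variant of install.R, the sketch below installs only the packages that are not already available, so the script is cheap to re-run. It uses base R only; the CRAN mirror passed to repos is our own choice, and on Posit Cloud you can drop that argument to use the repository already configured there.

```r
# install.R (variant): install only the packages that are missing.
# Assumption: the CRAN mirror below is our choice; any CRAN mirror works.
pkgs <- c("knitr", "rmarkdown", "palmerpenguins", "tidyverse", "gt")

# Packages not found in any library on .libPaths()
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
} else {
  message("All packages are already installed.")
}
```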
  • Open the R script and run it (e.g. with the Source button) to install the packages
  • Add a setup code chunk to notebook.qmd
Body: 'setup' chunk
```{r}
#| label: setup
#| echo: false
#| warning: false

library(tidyverse) # tidy data management, similar to Python's pandas
library(palmerpenguins) # our working dataset
library(gt)  # tabular data

```

Citations

  • Select File > New File > Text File to create the file references.bib
  • Add a YAML option to support citations: bibliography: references.bib
  • In the Console, type citation("palmerpenguins"), then copy and paste the BibTeX entry into references.bib
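Instead of copying the entry by hand, you can write it from the Console. This is a minimal sketch, assuming references.bib sits in the project root: toBibtex() turns the citation into BibTeX lines, and the file is opened in append mode so existing entries are kept. Entries generated from a package's CITATION file may come without a citation key (the first line then reads `@Manual{,`), so the sketch inserts one. The key horst2020 is a hypothetical name of our own; any unique key works, and you would cite it in the text as [@horst2020].

```r
# Append the palmerpenguins BibTeX entry to references.bib.
# Assumption: "horst2020" is a hypothetical key of our own choosing.
bib <- toBibtex(citation("palmerpenguins"))

# If the entry has no key (e.g. "@Manual{,"), insert one after the brace;
# entries that already have a key are left unchanged
bib <- sub("^(@[A-Za-z]+\\{),$", "\\1horst2020,", bib)

con <- file("references.bib", open = "a")  # "a" = append, keep existing entries
writeLines(bib, con)
close(con)
```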

Write the Introduction section

This notebook explores the palmerpenguins dataset as a Quarto (http://quarto.org) document. The dataset comes in its own package, so we install the palmerpenguins package (Horst, Hill, and Gorman 2020).

Write Dataset section

  • Add a subheading “Dataset” (###) to the “Data & methods” section

Data were collected and made available by Dr. Kristen Gorman (https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network (https://lternet.edu/).

The palmerpenguins package contains two datasets:

  • penguins: a simplified version of the raw data
  • penguins_raw: contains all the variables with their original names

In this notebook we use the penguins dataset, which contains the following 8 variables:

[1] "species"           "island"            "bill_length_mm"   
[4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
[7] "sex"               "year"             
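The variable list above is printed R output. A chunk along these lines would produce it; the chunk label is our own choice, and echo: false hides the code so only the result appears in the rendered notebook. The dim() call is an addition that reports the number of rows and columns.

```{r}
#| label: penguins-variables
#| echo: false

# Already loaded in the setup chunk; repeated so this chunk also runs on its own
library(palmerpenguins)

# Names of the 8 variables in the penguins data frame
names(penguins)

# Number of observations (rows) and variables (columns)
dim(penguins)
```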

Write Methods section

  • Add a subheading “Methods” (###) to the “Data & methods” section

For data manipulation, we use the tidyverse meta-package (Wickham et al. 2019). For visualization, we use the ggplot2 package (Wickham 2016) and the gt package (Iannone et al. 2024).

The ggplot2 package, which is already included in the tidyverse, is an R package for creating visualizations based on the Grammar of Graphics (https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448), originally developed by Leland Wilkinson.

The gt package offers a different, easy-to-use set of functions that help us build display tables from tabular data. The gt philosophy states that a comprehensive collection of table parts can be used to create a broad range of functional tables. The output formats that gt currently supports are HTML, LaTeX, and RTF.
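As a sketch of how these packages fit together, the chunk below uses dplyr (part of the tidyverse) to summarise the data by species and gt to display the result. The label, caption, column labels and the choice of summary are our own; the tbl- prefix on the label lets Quarto number the table and cross-reference it as @tbl-species-summary.

```{r}
#| label: tbl-species-summary
#| tbl-cap: "Number of penguins and mean body mass by species."

# Repeated from the setup chunk so this chunk also runs on its own
library(tidyverse)
library(palmerpenguins)
library(gt)

# One row per species: count and mean body mass (missing values ignored)
species_summary <- penguins %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mean_body_mass_g = mean(body_mass_g, na.rm = TRUE)
  )

# Turn the summary into a display table with readable labels
species_table <- species_summary %>%
  gt() %>%
  cols_label(
    species = "Species",
    n = "Count",
    mean_body_mass_g = "Mean body mass (g)"
  ) %>%
  fmt_number(columns = mean_body_mass_g, decimals = 0)

species_table
```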

Write Exploratory analysis section

  • Explanation in class: code chunks to produce output figures, YAML options for code execution, and freeze.
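A minimal figure chunk of this kind might look as follows. The library() calls repeat those in the setup chunk so the chunk runs on its own; the label, caption and aesthetics are our own choices. The fig- prefix makes the figure numbered and referable in the text as @fig-mass-flipper, and warning: false hides ggplot2's warning about rows with missing values.

```{r}
#| label: fig-mass-flipper
#| fig-cap: "Flipper length versus body mass of the penguins, by species."
#| warning: false

library(tidyverse)
library(palmerpenguins)

# Scatter plot: one point per penguin, coloured by species
mass_flipper_plot <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = body_mass_g, colour = species)
) +
  geom_point(alpha = 0.7) +
  labs(
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    colour = "Species"
  ) +
  theme_minimal()

# Printing the plot object renders the figure in the notebook
mass_flipper_plot
```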

Write Conclusions section

This notebook demonstrates that the palmerpenguins package is a great dataset for getting started with computational notebooks in Quarto (https://quarto.org/docs/output-formats/html-basics.html). We have learned YAML options (table of contents, section numbering, HTML and PDF output), how to create content with markdown (headings, tables, figures, citations, cross-references, margin content), how to execute code chunks and display their output, and how to specify options for code chunks.

Support PDF format

  • Add YAML options for PDF
YAML header: PDF output format
---
format: 
  html:
    toc: true
    toc-location: left
    toc-title: Contents
  pdf:
    toc: true
    colorlinks: true
---
  • A LaTeX distribution such as TinyTeX is required to compile PDF documents. Install it with the following command:
Terminal
quarto install tinytex

References

Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.
Iannone, Richard, Joe Cheng, Barret Schloerke, Ellis Hughes, Alexandra Lauer, and JooYoung Seo. 2024. gt: Easily Create Presentation-Ready Display Tables. https://gt.rstudio.com.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.