Reproducible Manuscripts with Quarto

Mine Çetinkaya-Rundel

Duke University + Posit
mine.quarto.pub/manuscripts-conf23

Full complexity spectrum of reproducible scientific projects

Simplest

Can run all code in a single file, and don’t mind running it over and over again with each edit.



e.g., Data Science 101 HW 1, a Stat 101 final project, a blog post, a tutorial, a not-too-extensive consulting report, etc.

Simple

Can run all code in a single file, and don’t mind running it over and over again with each edit, and need an output that conforms to journal style.



or

formatted with journal style

e.g., a not-too-computational journal article.

but science is rarely simple…

  • multiple collaborators, each with their favorite computing language and code editor
  • multiple stages of a project, each differing in what can feasibly be re-run with each edit and what needs to be cached

More complex




Even more complex




Leveraging Quarto for fully reproducible scientific manuscripts

Aside: What is in a notebook?

A notebook is a document that contains both code and narrative:

  • Jupyter notebooks (.ipynb)
  • Quarto documents (.qmd) – a potential mindshift

Current state of affairs

Most computational science is born in notebooks

  • Peer-review and publication workflows don’t support notebooks as research outputs
  • The more complex scenarios involve a lot of manual finagling to bring the project to the journal submission stage
  • Often during this process reproducibility is lost, or takes a back seat to formatting requirements
  • Final submission rarely captures all computations, which are, at best, relegated to supplementary materials

and ends in PDF or Word documents

https://data.agu.org/notebooks-now

  • Funded through a grant from the Alfred P. Sloan Foundation to the American Geophysical Union (AGU)
  • Broad collaboration between open source communities, open science organizations, and software tool makers

Roadmap to fully reproducible scientific manuscripts

that are more than PDFs rendered from a single qmd file

An end-to-end scholarly publishing workflow that treats Jupyter and Quarto notebooks as a primary element of the scientific record.

A publication process that elevates transparent and reproducible work by authors, where data and software, together with narrative, are documented, shared, and archived.

New forms of credit for the wider research community, including research software engineers.

Quarto can…

  • be authored in your favorite code editor
  • render from qmd or Jupyter notebook to PDF, Word, HTML, etc.
  • execute code in R, Python, and more
  • apply journal styles to your outputs with Quarto extensions
  • publish to GitHub Pages, Netlify, and more
  • orchestrate multiple inputs and outputs with Quarto projects
  • orchestrate multiple inputs and outputs with embedded computing using a new Quarto project type: manuscript

A new project type: manuscript

Quarto manuscript

Quarto manuscripts (Quarto 1.4+), in addition to doing everything you can do with journal articles, can

  • produce manuscripts in multiple formats (including LaTeX or MS Word formats required by journals), and give readers easy access to all of the formats through a website

  • publish computations from one or more notebooks alongside the manuscript, allowing readers to dive into your code and view it or interact with it in a virtual environment

Let’s write a manuscript

Getting started

  • Approach 1: Start from scratch
    • Creating a Quarto manuscript
      • RStudio: New Project > New Directory > Quarto Manuscript
      • Terminal: quarto create project manuscript <name>
    • Add manuscript content
  • Approach 2: Start with a sample from https://quarto.org/docs/manuscripts
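Whichever approach you take, the project is defined by its _quarto.yml file. Below is a minimal sketch of what that file might contain, assuming the article lives in index.qmd (the file used on the front-matter slide later on). The manuscript: article key reflects our understanding of Quarto 1.4+; the file generated by quarto create project manuscript is the authoritative starting point.

```yaml
# _quarto.yml: a minimal manuscript project (sketch, Quarto 1.4+)
project:
  type: manuscript        # the new project type

manuscript:
  article: index.qmd      # main article source (assumed to be index.qmd)

format:
  html: default           # website that gives readers access to every format
  docx: default           # Word output, e.g. for journals that require it
```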

Manuscripts ♥️ Git + GitHub

Track your project with Git and host on GitHub for easy publishing.
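One way to get that easy publishing is a GitHub Actions workflow that renders the project and pushes the result to GitHub Pages. This sketch is modeled on the Quarto publishing actions (quarto-dev/quarto-actions). It assumes a main branch, that quarto publish gh-pages has been run once locally to set up the gh-pages branch, and that the runner does not need to execute code (for example, because computational results are frozen and committed); otherwise, add steps that install R or Python and the packages the notebooks need.

```yaml
# .github/workflows/publish.yml (sketch)
on:
  workflow_dispatch:        # allow manual runs from the Actions tab
  push:
    branches: main          # re-publish on every push to main (assumed branch name)

name: Quarto Publish

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: write       # needed to push to the gh-pages branch
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Render and publish to GitHub Pages
        uses: quarto-dev/quarto-actions/publish@v2
        with:
          target: gh-pages
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```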

A finished product

Multiple formats from one source


In _quarto.yml of the project:

---
format:
  html:
    theme: cosmo
    toc-location: left
    comments: 
      hypothesis: true
    citations-hover: true
    crossrefs-hover: true
  agu-pdf: default
  docx: default
  jats: default
---

Rich front matter

In index.qmd of the project:

---
title: La Palma Earthquakes
author:
  - name: Steve Purves
    orcid: 0000-0002-0760-5497
    corresponding: true
    email: steve@curvenote.com
    roles:
      - Investigation
      - Project administration
      - Software
      - Visualization
    affiliations:
      - Curvenote
  - name: Rowan Cockett
    orcid: 0000-0002-7859-8394
    corresponding: false
    roles: []
    affiliations:
      - Curvenote
license: CC BY-SA 4.0
keywords:
  - La Palma
  - Earthquakes
date: '2022-05-11'
abstract: |
  In September 2021, a significant jump in seismic activity on the island of La Palma (Canary Islands, Spain) signaled the start of a volcanic crisis that still continues at the time of writing. Earthquake data is continually collected and published by the Instituto Geográfico Nacional (IGN). We have created an accessible dataset from this and completed preliminary data analysis which shows seismicity originating at two distinct depths, consistent with the model of a two-reservoir system feeding the currently very active volcano.
keypoints:
  - You may specify 1 to 3 keypoints for this PDF template
  - These keypoints are complete sentences and less than or equal to 140 characters
  - 'They are specific to this PDF template, so they will not appear in other exports'
citation:
  container-title: Notebooks Now!
draft: false
bibliography: references.bib
echo: false
---

Rich front matter

from source → only the relevant / required metadata in the manuscript:


Embedded computations
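Embedding starts from the notebooks listed in the project file. Here is a sketch, assuming a hypothetical notebook at notebooks/explore-earthquakes.ipynb. Listing it under manuscript: notebooks publishes a rendered copy alongside the article. A figure in that notebook whose cell carries a label such as fig-spatial-plot (also hypothetical) can then be pulled into index.qmd with Quarto's embed shortcode: {{< embed notebooks/explore-earthquakes.ipynb#fig-spatial-plot >}}.

```yaml
# _quarto.yml (excerpt, sketch): notebooks published with the manuscript
manuscript:
  article: index.qmd
  notebooks:
    - notebook: notebooks/explore-earthquakes.ipynb   # hypothetical path
      title: Exploring earthquake data                # label shown to readers
```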

What’s next?

Actually dive into the code

  • We’ve seen that you can peruse the code underlying the figures and tables in the manuscript

  • What if you wanted to interact with the code – in a computational environment that’s just a click away and that has all the software and packages needed to reproduce the manuscript?

Back in 2019…

Binder with Quarto

Set it up with the quarto use binder command:
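Binder can only rebuild the environment if the project declares its dependencies. As we understand it, quarto use binder draws on standard dependency files, and Binder itself (via repo2docker) reads files such as a conda environment.yml. Below is a sketch of such a file for a Python-based manuscript. The environment name and package list are hypothetical and should mirror whatever the notebooks actually import. Pinning versions brings the Binder session closer to the environment that produced the published results.

```yaml
# environment.yml (sketch): hypothetical conda environment for the notebooks
name: manuscript-env          # hypothetical name
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pandas                    # hypothetical analysis dependencies
  - matplotlib
```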


Rewind, and get started again

https://quarto.org/docs/manuscripts

Thank you!

mine.quarto.pub/manuscripts-conf23

github.com/mine-cetinkaya-rundel/quarto-manuscripts