Writing Reproducible Research Reports

Table of Contents

1. Writing Reproducible Reports

1.1. Topic Goal

To introduce writing reproducible research reports: reports that mix text, code, analyses, and output into a single document so that readers can see the report and the methods used to generate the analyses and are free to re-evaluate the claims by editing the source for the "viewable" document.

Ka : Introducing Reproducible Reports (i2c4p) from Britt Anderson on Vimeo.

Introducing the tools we will need for reproducible report generation in Emacs with Babel

1.2. Many Paths - One Destination

There is no standard for the software and tools needed to write a reproducible report. What is common is not software or tooling, but rather the final product. To be a reproducible report the reader should be able to repeat the analyses reported. That requires access to the code used to generate the analyses and the data on which the analyses were done. It should include the detailed specifics about how the data was collected, but for many (though not all experiments) the collection of the data itself is not a part of the reproducible report. If you string together a writing process that accomplishes that end then you will have produced a reproducible report. The "best" tool for the job will depend on your domain, what other workers in your area are used to and find customary, and practical concerns such as the speed of development and the ease of use.

For this topic we will be using emacs as the text editor and a pipeline that produces a pdf file as the viewable result. That pdf will be produced from a combination of code, text, and mark-up in a single "org" file. We will use the ability of emacs and its org babel system to execute code and insert the results into our document.

The production of a reproducible research report is inspired by ideas from literate programming. One can tangle the original file into only its code components. One weaves the text and code into the documentation.

Other tools that are used to accomplish similar results are RStudio and related tools. One can even write a technical book using this pipeline.

Jupyter notebooks also allow for a combination of code, text and graphics.

To compare them for their ease of use and fit to your use case will require trying them all, eventually. For now we will focus on one particular method.

Kb : What are Reproducible Research Reports (i2c4p) from Britt Anderson on Vimeo.

What is a reproducible research report. What tools do you need and how does it relate to literate programming?

1.3. Preliminaries

1.3.1. Latex Installing

LaTex is a type setting program. It is widely used in the mathematical and computing communities for its ability to produce precisely and correctly typeset mathematical notation. But it has grown well beyond those roots. We will use it as the method to go from a plain .tex file to a .pdf file.

Kc : LaTeX tools for report generation (i2c4p) from Britt Anderson on Vimeo.

Getting the latex packages you will need from Ubuntu for these exercises. What is LaTeX?

Our emacs editor will be able to export our org file into a tex file. To go the rest of the way to a pdf we will need various libraries and packages available through our Linux installation. Since these will be used by emacs, but not part of emacs, we will use the system package manager to get and install them. It is quite possible to use latex and all the related tools from the command line and it can often be helpful to do so. It is also possible to use these libraries from other editors. Thus we will install them with apt, a package manager for *buntus.

sudo apt install texlive-latex-recommended
sudo apt install texlive-latex-extra

There is also a "full" package as well that you can install instead of the above.

LaTeX could be a whole lecture (maybe even a course) in itself. We will not go into that here, but we will just use it behind the scenes. If you are interested though there are some nice tutorials.

1.3.2. LaTeX Tutorials for Beginners

1.3.3. Pdf Viewers

There are many pdf viewers that work well in the linux world. A very minimal version is xpdf whereas a much more full feature viewer is okular. However, we are working in emacs and it will be convenient to install a pdf-viewer that works in that environment. Emacs comes with a "doc-view" that will render a pdf, but it is not a very pretty view and our ability to interact and edit the pdf is limited. We will use a much more full featured and modern pdf tool called pdf-tools.

The Github Repo for PdfTools provides extensive documentation on use and installation. I have copied the minimal version here to get us going. Note that the pdf-tools instructions describes a package manager called aptitude. You can use apt instead (https://www.tecmint.com/difference-between-apt-and-aptitude/).

It is common for the Ubuntu packages to come in both regular and development versions. As a rule of thumb if you are unsure which to install always pick the development packages (noted by the -dev at the end). This will include some extra files and details, but you will often need those extra files when trying to "make" or assemble code yourself.

Kd : PdfTools (i2c4p) from Britt Anderson on Vimeo.

How to view and annotate pdfs in Emacs. The PdfTools package installed.

1.3.4. Preparing Xubuntu

You may already have many of the files you will need to build the pdf-tools package, but just in case make sure to install:

  1. gcc
  2. g++
  3. make
  4. automake
  5. autoconf
  6. libpng-dev
  7. zlib1g-dev
  8. libpoppler-glib-dev
  9. libpoppler-private-dev
  10. imagemagick

1.3.5. Preparing Emacs

Run M-x package-list-packages to refresh and display available packages. Search for let-alist and tablist. Mark them both with an i and then type x to execute the installation.

Then search for pdf-tools make it with an i and then enter x. This will download the pdf-tools package files to the correct directory (usually ~/.emacs.d/elpa/pdf-tools).

1.3.6. Building Pdf-Tools

Edit your init.el file and add the line

(pdf-tools-install)

The easiest way to trigger the build will be to save everything. Exit emacs. And then restart emacs.

On your first restart you will be asked whether to rebuild an epdf program. Say yes. Pdf-Tools will then figure out that you are on a -buntu and will ask for your super user password so that it can download from the Ubuntu package system any further packages or software it requires. At the end of that process you will see a message telling you if the compiliation completed successfully. If so, you can open a pdf in emacs just like any other file, C-x C-f.

Test it out with a pdf. If you don't have one handy just down load one via scholar.google.com to your desktop and then open the file from within Emacs.

1.4. Writing a simple report

  1. Start Emacs
  2. We will use R. In the future you might prefer to use the knitr package and rmarkdown package, but here we will use org-babel .
  3. Testing
    • If you have emacs correctly installed and a working latex installation than you should be able to open testLatex.org (look in codeExamples/reportGen/) in emacs and type C-c C-e l p and you will see a new pdf file in your current directory. You can open up the pdf and view it in Emacs with the C-x C-f.
    • If you have R and ESS installed properly you can open up testRBabel.org from the same source diretory, and run the same command. Now the new pdf will have have code and a figure. Look at the code in the .org file. You know how to write that already. You now know how to generate a basic reproducible report.

Ke : Report Generation Demo 1: Basics (i2c4p) from Britt Anderson on Vimeo.

The basics of report generation in emacs, org, and babel. The compilation steps and stages for going through LaTeX to a pdf.

1.5. Writing a not-so-simple report

1.5.1. Testing the inclusion of source code

Go back to the file you wrote in Rmd and which has the for-loop. Now convert that file to an org file (just do a save as: C-x C-w substituting .org for .rmd). Edit the code blocks that you wrote as R chunks to be source blocks in org mode. You have seen examples of this already, but there is a reminder here. Now compile it with the C-c C-e l p command to generate a pdf. Keep tweaking until you get it to work.

1.5.2. Tables.

Pretty much happens automatically. You may have to play with the header arguments to get exactly the look you want. The header arguments are those things appearing on the +#Begin_Src line.

d <- data.frame(foo=c('a','b','c'), bar=c(1.0/3.0,22,32))
d
  foo        bar
1   a  0.3333333
2   b 22.0000000
3   c 32.0000000

Kf : Demo 2: Code and Tables (i2c4p) from Britt Anderson on Vimeo.

Continuing our demonstration of report generation we now show how to include inline code, tables, and figures.

Try adding this to your prior file just to make sure you can get the table to execute.

1.5.3. What is an inline result?

An inline result is one that appears at the correct place in the text.

Imagine that you are doing an analysis for a report and instead of having to cut and paste from some other statistical program you can just magically have the result appear in the right location in the text! Try it. Here is an example to adapt into your current file. Try it with a different number of values and a different type of random number (e.g. how about an exponential distribution)?

The mean of 100 mean 0 normally distributed numbers is 0.0498254924850924.

1.5.4. Can I include references?

Yes. Of course you can. And you should.

Kg : Report Demo 3: Including Citations (i2c4p) from Britt Anderson on Vimeo.

The final demo of report generation shows you how to get and use bibtex formatted references for managing citations in your report.

  1. Using bibtex (biber and biblatex)

    Biblatex is the modern version of this, but it can be a bit easier to get started with than bibtex. Both of them use the same basic format for saved text files to keep all the references that you may ever want to use. They allow you to easily reformat and reuse your references in multiple documents or flip from APA style to another reference style with a few characters of text. Using an alphabetical format that requires changing to a cite in numerical order? Just change the bibtex command and recompile your document. Done.

  2. Testing

    Open up the test.bib file in the codeExamples/reportGen/ sub-directory. This will show you an example of a book and an article reference. Make sure that testRBabelBib.org and test.bib are in the same directory and that it is your current directory. If you have the right packages for latex installed, you should be able to C-c C-e l l to produce a file ending with .tex as an extension. Open this in emacs and enter C-c C-c . Ask for latex. Then repeat the C-c C-c for bibtex and twice more for latex and finally pdflatex. If all this seems repetitive it is possible to set things up in your =init.el` file to avoid this repetition, but this is already a pretty heavy lesson. Now look at the pdf file in emacs. You have a correctly edited, cited, and hyper linked pdf file. Amazing?

2. Assessment: Report Generation with Org Babel

2.1. Purpose

Gain familiarity with report generation using org and babel and the inclusion of code blocks, graphics, and reference sections

2.2. Task

Provide an org file and its companion bib file that has a

  1. Title
  2. Author
  3. Sections for
    • Introduction
    • Methods
    • Results
    • Conclusions
    • References
  4. Just have the headings and some minimal text. The purpose here is to get a working skeleton of a file that you can use and adapt later on.
  5. Cut and paste from your earlier examples of Rmd and Org files using R and Python to have some analysis in the results section. Make sure it includes as a minimum:
    • one code block
    • one figure
    • one inline code reference
    • one new citation other than the ones that were used in the example. There are many ways to get references in the bibtex format. Scholar Google can export that format, and so will most of the major journal web pages.

2.3. Comments

I will evaluate submissions by compiling from org to pdf in my version of emacs. Either I will be able to produce a pdf of your report or I won't. If I can, and it meets all the requirements above then you will get full credit.

Date: 2023-02-13 Mon 00:00

Author: Britt Anderson

Created: 2023-05-29 Mon 11:07

Validate