R packages part 2: on a roll

In part 2: versioning, GitHub release, data, other files, testing, and vignettes.

See part 1 for R package set up, directory structure, the DESCRIPTION file, writing R code for packages, using roxygen2 to write documentation and define the package namespace, and a simple build protocol.

Sources:

R packages by Hadley Wickham
Developer guidelines from Bioconductor
Writing R extensions by R Core Team
Developing R packages by Jeff Leek

Versioning

Use an x.y.z versioning scheme, starting with 0.1.0 as recommended by Jeff Leek. Before release on CRAN or Bioconductor, keep x at 0 and increase y with every major redesign. Each time a change is made public (pushed to GitHub), increase z by one.

When submitting a package to Bioconductor, submit it as version 0.99.0 so that it gets bumped up to 1.0.0 at the next Bioconductor release. Bioconductor uses even y for packages in release and odd y for packages in development. Every time a Bioconductor release increases y to the next even number, bump up the GitHub (devel) version to the next odd number. Continue to increase z with every public change, resetting it to zero whenever y increases. To signify a major redesign with increased x, set y to 99 in the development version, and then the next Bioconductor release will be (x+1).0.0.
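The bumping rules described here can be sketched in a few lines of base R. The helper names bump_version() and bioc_branch() are hypothetical (they are not part of any package), and the sketch assumes versions are always written with exactly three components.

```r
# Hypothetical helper: bump one component of an x.y.z version string.
# Bumping y resets z to zero; bumping x resets both y and z.
bump_version <- function(version, part = c("z", "y", "x")) {
  part <- match.arg(part)
  v <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(v) == 3, !anyNA(v))
  if (part == "z") {
    v[3] <- v[3] + 1L
  } else if (part == "y") {
    v[2] <- v[2] + 1L
    v[3] <- 0L
  } else {
    v[1] <- v[1] + 1L
    v[2:3] <- 0L
  }
  paste(v, collapse = ".")
}

# Hypothetical helper: even y means release, odd y means devel
# (the Bioconductor convention described in the text).
bioc_branch <- function(version) {
  y <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]][2])
  if (y %% 2L == 0L) "release" else "devel"
}

bump_version("0.1.0", "z")  # a public change
bump_version("0.1.4", "y")  # a major redesign before release
bioc_branch("0.99.0")       # a new submission sits in devel
bioc_branch("1.0.0")        # the first release
```

Keeping the rule in a function like this is only a convenience; the version still has to be written into the DESCRIPTION file by hand or by other tooling.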

GitHub for R packages

For an in-depth explanation of version control and code release using git and GitHub for R package development, see Hadley Wickham’s chapter. In brief:

use git version control for local package development from the start (can turn on within RStudio),
create GitHub repo with same name as the package,
set remote origin of the local git repo to git@github.com:username/packagename.git,
write a README.md file to describe the package to GitHub users - include installation instructions e.g. devtools::install_github('username/packagename'),
push public versions to GitHub, bumping up the version number z and ensuring R CMD check passes.

The URL and BugReports fields in the DESCRIPTION file can point to the package’s GitHub site and issues page. Use git tags to mark important versions.
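As a small illustration of the two DESCRIPTION fields, the sketch below parses a made-up DESCRIPTION fragment with base R's read.dcf(). The package name, username and URLs are placeholders, not a real repository.

```r
# A made-up DESCRIPTION fragment; 'username' and 'packagename' are placeholders.
description_text <- c(
  "Package: packagename",
  "Version: 0.1.0",
  "URL: https://github.com/username/packagename",
  "BugReports: https://github.com/username/packagename/issues"
)

# read.dcf() accepts a connection, so the fragment can be parsed in memory.
con <- textConnection(description_text)
fields <- read.dcf(con, fields = c("URL", "BugReports"))
close(con)

fields[1, "URL"]
fields[1, "BugReports"]
```

For an installed package, the same fields can be read with utils::packageDescription().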

Data

Data can be included in an R package as a means of: data release and sharing; checking package behavior with automated tests; and demonstrating package functions through examples or vignettes.

If the package is primarily a vehicle for data release and sharing, then the included functions should be minimal. If the package is primarily designed as analysis software, then the included data should be small. Large examples and vignettes for software packages can use large datasets from separate data packages. Note that data packages in Bioconductor are not limited by the same size restrictions as apply to software packages (see here).

R data objects to be exported to the user (for examples and vignettes) go in data/. The code to generate these objects goes in a data-raw/ directory. Run devtools::use_data_raw() to create the data-raw/ directory and add it to .Rbuildignore. In the data generation scripts, use devtools::use_data(object) to create a .rda file in data/ with the same name as the object. Set LazyData: true in the DESCRIPTION file. With lazy-loading, these data objects can be accessed directly in examples or vignettes without being loaded explicitly, and they only take up memory in an R session once they are used.
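A data generation script might look something like the sketch below. It approximates the effect of the use_data() helper with base save(), partly because the helper has since moved to the usethis package in newer devtools releases; the helper itself may use different compression settings. The object name my_dataset and its contents are made up, and a temporary directory stands in for the package root.

```r
# Sketch of a data-raw/ script. A temporary directory stands in for the
# package root so the example can be run anywhere.
pkg_root <- file.path(tempdir(), "packagename")
dir.create(file.path(pkg_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Build the object (made-up example data).
my_dataset <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Save it as data/my_dataset.rda, so the file name matches the object name.
rda_path <- file.path(pkg_root, "data", "my_dataset.rda")
save(my_dataset, file = rda_path)

# Check the round trip by loading into a fresh environment.
check_env <- new.env()
loaded_names <- load(rda_path, envir = check_env)
identical(check_env$my_dataset, my_dataset)
```

Keeping the script in data-raw/ means the .rda file can always be regenerated from source.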

Objects in data/ are exported to the user (no export tag required), and should, therefore, be documented. Write a roxygen comment block for each data object in the R/<pkgname>.R file, with the data object name as an uncommented string beneath. Example from ggplot2 package:

#' Prices of 50,000 round cut diamonds.
#'
#' A dataset containing the prices and other attributes of almost 54,000
#' diamonds.
#'
#' @format A data frame with 53940 rows and 10 variables:
#' \describe{
#'   \item{price}{price, in US dollars}
#'   \item{carat}{weight of the diamond, in carats}
#'   ...
#' }
#' @source \url{http://www.diamondse.info/}
"diamonds"

R data objects to be hidden from the user (for internal use within functions) go in R/sysdata.rda. As above, put the code to generate these objects in data-raw/, and use devtools::use_data(object1, object2, internal=TRUE) to save them to R/sysdata.rda. These objects will be available internally via lazy-loading (no need to explicitly load).

Raw data (not parsed into an R data object) is stored in inst/extdata/, and can be used to give examples of loading and parsing data from scratch. Reach these files using system.file('extdata', 'filename.csv', package='<pkgname>').
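One thing to watch for is that system.file() returns an empty string, rather than an error, when the file is not found. The wrapper below is a hypothetical convenience (extdata_path() is not an existing function) that turns the empty string into an explicit error.

```r
# Hypothetical helper: return the path to a file in an installed package's
# extdata/ directory, failing loudly if it is not there.
extdata_path <- function(filename, pkg) {
  path <- system.file("extdata", filename, package = pkg)
  if (!nzchar(path)) {
    stop("File '", filename, "' not found in extdata/ of package '", pkg, "'")
  }
  path
}

# system.file() itself silently returns "" for a missing file.
missing_path <- system.file("extdata", "no_such_file.csv", package = "base")
nzchar(missing_path)

# Typical use inside an example or vignette (placeholder names):
# raw <- read.csv(extdata_path("filename.csv", "packagename"))
```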

Note that some data can also be directly encoded in the R source files.

Installed files

The inst/ directory can house any miscellaneous files. Its contents are copied into the top-level directory of the package when it is installed. Some common files include:

inst/AUTHOR to describe non-standard authorship,
inst/COPYRIGHT to describe non-standard copyright,
inst/CITATION to give citation instructions,
inst/extdata as described above.

A message at package start up (see part 1) is a good way to direct users to this information. For example, users could be instructed to run citation('<pkgname>') to see citation instructions, and file.show(system.file("LICENSE", package='<pkgname>')) to see the LICENSE file.
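A start-up message along these lines could be written as follows. This is a minimal sketch: the wording of the message is made up, and placing the function in a file such as R/zzz.R is a common convention rather than a requirement.

```r
# Would live in one of the package's R/ files (R/zzz.R is a common choice).
# R calls .onAttach() when the package is attached with library().
.onAttach <- function(libname, pkgname) {
  packageStartupMessage(
    "To cite this package, run citation('", pkgname, "').\n",
    "To read the license, run ",
    "file.show(system.file('LICENSE', package = '", pkgname, "'))."
  )
}
```

Using packageStartupMessage() rather than message() lets users silence the text with suppressPackageStartupMessages().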

Testing

Run devtools::use_testthat() to: create the directory tests/testthat/, write the file tests/testthat.R, and add testthat to the Suggests field in DESCRIPTION. Don’t edit the tests/testthat.R file - it ensures the tests are run during R CMD check.

Test scripts go in the tests/testthat/ directory, and their names must start with test. Run devtools::test() to execute all test scripts.

Group related tests in the same file:

library(<pkgname>)
context("Context for this group of tests")

test_that("Expectation being tested", {
	expect_equal(my_func(input1), output1)
	expect_equal(my_func(input2), output2)
	expect_equal(my_func(input3), output3)
})

test_that("Expectation being tested", {
	expect_error(my_func(input1), "part of expected error msg")
	expect_error(my_func(input2), "part of expected error msg")
	expect_error(my_func(input3), "part of expected error msg")
})

The first argument to test_that should complete the sentence “Test that …”. The second argument is a code block containing one or more expectations. Types of expect_ function include:

equal (within numeric tolerance)
identical (no tolerance)
match (against a regular expression), with variants that match a specific kind of output:
- output
- message
- warning
- error
is (inherits from specified class)
true
false
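To make the list concrete, here is a small self-contained sketch that exercises most of these expectations. It assumes testthat is installed, and safe_divide() is a hypothetical function invented for the example.

```r
library(testthat)

# Hypothetical function under test.
safe_divide <- function(x, y) {
  if (y == 0) stop("division by zero is not allowed")
  x / y
}

test_that("safe_divide returns the quotient", {
  # equal: compared within a numeric tolerance
  expect_equal(safe_divide(1, 3), 0.3333333, tolerance = 1e-6)
  # identical: no tolerance
  expect_identical(safe_divide(6, 3), 2)
  expect_true(is.numeric(safe_divide(1, 2)))
  expect_false(is.na(safe_divide(1, 2)))
})

test_that("conditions and printed output are matched against patterns", {
  expect_error(safe_divide(1, 0), "division by zero")
  expect_warning(warning("value is deprecated"), "deprecated")
  expect_message(message("loading data"), "loading")
  expect_output(print("done"), "done")
  expect_match("testing is fun", "^testing")
})
```

In a package, the library(testthat) call and the function definition would not be needed in the test file, since tests/testthat.R loads testthat and the package itself.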

To calculate the percentage of a package covered by tests, run covr::package_coverage().

To test whether the R code in the package passes lintr's style checks, add lintr to the Suggests field in DESCRIPTION and include the following as a test:

if (requireNamespace("lintr", quietly = TRUE)) {
  context("lints")
  test_that("Package style conforms to linters", {
    lintr::expect_lint_free()
  })
}

Vignettes

Vignettes are tutorials stepping through one or more use cases of the package, using small real datasets.

Run devtools::use_vignette('vig_title') to create the template file vignettes/vig_title.Rmd and add knitr to the Suggests and VignetteBuilder fields in DESCRIPTION. In addition, manually add rmarkdown and BiocStyle to the Suggests field in DESCRIPTION.

Example metadata header:

---
title: "Vignette Title"
author: "Vignette Author"
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    toc: true
    fig_caption: yes
---

Include this code chunk at the beginning:

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
```

Write the text of the vignette in R-flavoured markdown, using BiocStyle macros to refer to other R packages.

Weave code and results through the text using knitr syntax. The data used for the vignette examples can be stored in inst/extdata (to demonstrate how to load raw data) or in data/ (to demonstrate how to work with loaded data). Remember that Shift+Alt+K shows the RStudio keyboard shortcuts, including those for running code chunks.

At the end of the vignette, include a Session Information section with the output from devtools::session_info().

July 29, 2015
