August 23, 2015Nicola Roberts Reading time ~9 minutes

R packages part 3: full tilt

In part 3: S4 classes and methods, compiled code, and automated checking with Travis CI.

See part 1 for R package set up, directory structure, the DESCRIPTION file, writing R code for packages, using roxygen2 to write documentation and define the package namespace, and a simple build protocol.

See part 2 for versioning, GitHub release, data, other files, testing, and vignettes.

Sources:

R packages by Hadley Wickham
Advanced R by Hadley Wickham
Developer guidelines from Bioconductor
Writing R extensions by R Core Team
R object oriented programming by Robert Stojnić and Laurent Gatto

S4 classes and methods

If defining new S4 classes and methods, add the methods package to the Imports field in DESCRIPTION, and include the following in pkgname.R:

#' @import methods 
NULL

Define a new class as shown below. The class name should use UpperCamelCase. The slots can be of type ANY (no type restriction), a base type, S4 class, or S3 class registered with setOldClass(). To allow multiple classes in a slot, use setClassUnion(). The validity argument is a function of the object returning TRUE or FALSE. Other useful arguments to setClass are contains for class inheritance, and prototype for default slot values. For class inheritance from another package, use @importClassesFrom pkg ClassName (and add package to Imports field in DESCRIPTION). If you want others to extend your class, @export it; if you want others to create instances of the class but not extend it, just @export the constructor function.

#' An example S4 class for members of The Beatles
#'
#' @slot name First name
#' @slot ranking Your ranking of favourites, from 1-4
setClass("BeatlesMember",
         slots = list(
             name = "character",
             ranking = "numeric"),
         validity = function(object) {
             is_valid <- TRUE
             if (! object@name %in% c("John", "Paul", "George", "Ringo")) {
                 is_valid <- FALSE
                 message("Name is not one of John, Paul, George, or Ringo")
             }
             if (! object@ranking %in% 1:4) {
                 is_valid <- FALSE
                 message("Ranking must be 1, 2, 3 or 4")
             }
             return(is_valid)
         })


# Constructor function
#' Create an instance of BeatlesMember
#'
#' @param name First name of Beatle's member
#' @param ranking Your ranking of favourites, from 1-4
#'
#' @export
#'
#' @examples
#' beatles_member('John', 1)
beatles_member <- function(name, ranking){
    ans <- new("BeatlesMember", name=name, ranking=ranking)
    return(ans)
}

Functions of S4 classes may be written as “regular” functions, or as S4 methods dispatched via a generic function. As a general rule, write S4 generics and methods if the function is a common task that could have multiple class-specific implementations, e.g. plot, append, sort, unique, as.data.frame etc. Setters and getters are also convenient as S4 methods. In contrast, if the function is highly specific to your package, just implement it as a regular function (checking the input has the correct class).

A generic function dispatches a method implementation specific to the class of the argument. The generic function must be defined before the method. If it has already been defined in another package (e.g. BiocGenerics), then use that pre-existing definition by including the roxygen comment #' @importMethodsFrom pkg generic.name above your method definition (and add package to Imports field in DESCRIPTION). To define a new generic, use the setGeneric function, and possibly @export it to users.

To define a method, use the setMethod function as shown below, with the argument name/s to the function exactly matching the argument name/s in the generic (even if it was defined by someone else). @export every method, and possibly use @describeIn to merge documentation with the class or the generic.

#' @importMethodsFrom BiocGenerics as.data.frame
#' @describeIn BeatlesMember convert to data.frame
#' @param x Object of class BeatlesMember
#' @export
setMethod("as.data.frame",
          signature = "BeatlesMember",
          definition = function(x, ...) {
              ans <- data.frame(name=x@name, ranking=x@ranking)
              return(ans)
          })

Slots of an S4 object can be accessed (get and set) via @ or slot(). However, this requires specific knowledge of the implementation (slot names). Defined accessor methods are a better approach for general users. Remember to check validObject(x)==TRUE before returning an object with a new slot value.

# only set the generic like this if it does not already exist
#' Get 'name' slot from S4 class
#' @param x Object of S4 class with slot 'name'
setGeneric("name", function(x, ...) standardGeneric("name"))

#' @describeIn BeatlesMember get name value
#' @export
setMethod("name", "BeatlesMember", function(x) x@name)

#' Set 'name' slot from S4 class
#' @param x Object of S4 class with slot 'name'
setGeneric("name<-", function(x, value) standardGeneric("name<-"))

#' @describeIn BeatlesMember set name value
#' @param value replacement value
#' @export
setReplaceMethod("name",
                 "BeatlesMember",
                 function(x, value) {
                     x@name <- value
                     if (validObject(x)) return(x)
                 })

A special method called show controls how the object is printed to console. Note that the show generic is provided by the methods package.

setMethod("show",
          "BeatlesMember",
          function(object) {
              cat("Object of class", class(object), "\n")
              cat(" name:", object@name, "\n")
              cat(" ranking:", object@ranking, "\n")
          })

When working with S4, the code must be loaded in the order: classes, generics, methods+other. The default is for R package code to load alphabetically by file name. To ensure classes and generics are loaded first, they could be placed in files aaa-classes.R and aaa-generics.R respectively. Alternatively, use the @include roxygen tag at the top of a file to list all the other source code files that should be loaded beforehand. This information is used to set the Collates field in DESCRIPTION (specifies a non-default load order).

Compiled code

Any C/C++ code (including header files) belong in the src/ directory.

Instructions for C++ or C

Set up a src/.gitignore file to ignore *.o, *.so and *.dll files (this will be auto-generated if you run devtools::use_rcpp()).

To access compiled C/C++ functions from R through the .Call() function, include the roxygen tag @useDynLib pkgname (for all compiled routines, place in pkgname.R) or @useDynLib pkgname routine (for a specific routine, place alongside wrapper R function).

Write .onUnload() function (place in pkgname.R) to clean up when the package is unloaded.

.onUnload <- function(libpath){
  library.dynam.unload("pkgname", libpath)
}

C with the R API

Using C in R packages is only recommended for legacy code.

C files must include:

#include <R.h>
#include <Rinternals.h>

To interface with R, C functions must both input and output SEXP (S expression) types (first and last steps are usually conversion between SEXP and C types). Remember to PROTECT() (and later UNPROTECT()) any SEXP object created in C to save it from R’s garbage collector.

Compiled functions should be called via a wrapper R function (with accompanying roxygen documentation), and the input classes can be checked within the wrapper.

illustrate <- function(x, y) {
  .Call('illustrate', PACKAGE='pkgname', x, y)
}

More info on using C with R is available here and here.

C++ with Rcpp

Run devtools::use_rcpp() to add Rcpp to the LinkingTo and Imports fields in DESCRIPTION and set up the .gitignore file as described above.

Include roxygen tag @importFrom Rcpp sourceCpp in pkgname.R (don’t actually need sourceCpp, but a bug in R means something has to be imported so the internal Rcpp code gets properly loaded).

C++ files must include:

#include <Rcpp.h>
using namespace Rcpp;

Rcpp will do the hard work of setting up functions of SEXP objects for you, so just write the C++ functions in terms of native C++ types, and preface the function with the special comment // [[Rcpp::export]]. Run devtools::document() and then build and reload the package. This automatically calls Rcpp::compileAttributes() and auto-generates the files src/RcppExports.cpp and R/RcppExports.R.

The auto-generated functions in src/RcppExports.cpp act as the go-between for the SEXP types passed to/from R and the C++ types passed to/from your other C++ functions.

The auto-generated R/RcppExports.R file contains wrapper functions for calling the compiled C++ functions. This file shouldn’t be edited directly, so any documentation should be written alongside the source C++ code. Write roxygen comment blocks in C++ as in R, just using //' at the start of each line instead of #' . Note that the roxygen line //' @export makes the R wrapper function available to the user, while the non-roxygen line // [[Rcpp::export]] just makes the C++ function available to the R wrapper function (via the SEXP translator function made in src/RcppExports.cpp).

More info on using C++ with R is available here and here.

Automated checking with Travis CI

During development, run devtools::check() and strive to eliminate all errors, warnings, and notes from these checks.

To automatically run R CMD check after every push to GitHub:

Run devtools::use_travis() to generate the .travis.yml config file and update .Rbuildignore accordingly. The basic config options are automatically included, with more advanced options described in the docs. Push to GitHub.
Log in to Travis (linked to your GitHub account), and turn on the repo you want to test (will automatically run after every push to GitHub).
Embed Travis status image (failing/passing) in the README.md:

[![Build Status](https://travis-ci.org/<USR>/<REPO>.svg?branch=master)](https://travis-ci.org/<USR>/<REPO>)

For BioConductor, the package must also pass BiocCheck::BiocCheck("/path/to/pkg"). To automate with Travis, add the following code to .travis.yml (taken from Przemol).

bioc_required: true
bioc_packages:
  - BiocCheck

after_script:
  - ls -lah
  - FILE=$(ls -1t *.tar.gz | head -n 1)
  - Rscript -e "library(BiocCheck); BiocCheck(\"${FILE}\")"

Submission to CRAN or Bioconductor

Info on submitting package to CRAN.

Info on submitting package to BioConductor is here, here and here.

July 29, 2015Nicola Roberts Reading time ~7 minutes

R packages part 2: on a roll

In part 2: versioning, GitHub release, data, other files, testing, and vignettes.

Sources:

R packages by Hadley Wickham
Developer guidelines from Bioconductor
Writing R extensions by R Core Team
Developing R packages by Jeff Leek

Versioning

Use x.y.z versioning scheme, starting with 0.1.0 as recommended by Jeff Leek. Before release on CRAN or Bioconductor, keep x at 0 and increase y with every major redesign. Each time a change is made public (pushed to GitHub), increase z by one.

When submitting a package to Bioconductor, submit as version 0.99.0 so it gets bumped up to 1.0.0 on the next Bioconductor release. Bioconductor uses even y for packages in release and odd y for packages in development. Every time the Bioconductor release version increases y to the next even, bump up the GitHub (devel) version to the next odd. Continue to increase z with every public change (bumps back to zero with every y increase). To signify a major redesign with increased x, set y to 99 in the development version, and then the next Bioconductor release will be (x+1).0.0.

GitHub for R packages

For an in-depth explanation of version control and code release using git and GitHub for R package development, see Hadley Wickham’s chapter. In brief:

use git version control for local package development from the start (can turn on within RStudio),
create GitHub repo with same name as the package,
set remote origin of the local git repo to git@github.com:username/packagename.git,
write a README.md file to describe the package to GitHub users - include installation instructions e.g. devtools::install_github('username/packagename'),
push public versions to GitHub, bumping up the version number z and ensuring R CMD CHECK passes.

The URL and BugReports fields in the DESCRIPTION file can point to the package’s GitHub site and issues page. Use git tags to mark important versions.

Data

Data can be included in an R package as a means of: data release and sharing; checking package behavior with automated tests; and demonstrating package functions through examples or vignettes.

If the package is primarily a vehicle for data release and sharing, then the included functions should be minimal. If the package is primarily designed as analysis software, then the included data should be small. Large examples and vignettes for software packages can use large datasets from separate data packages. Note that data packages in Bioconductor are not limited by the same size restrictions as apply to software packages (see here).

R data objects to be exported to the user (for examples and vignettes) go in data/. The code to generate these objects goes in a data-raw/ directory. Run devtools::use_data_raw() to both create the data-raw/ directory, and add it to .Rbuildignore. In the data generation scripts, use devtools::use_data(object) to create a .rda file with the same name as the object in data/. Set LazyData: true in the DESCRIPTION file. Lazy-loading allows these data objects to be accessed directly in examples or vignettes (don’t require explicit loading), and only take up memory in an R session when called upon.

Objects in data/ are exported to the user (no export tag required), and should, therefore, be documented. Write a roxygen comment block for each data object in the R/<pkgname>.R file, with the data object name as an uncommented string beneath. Example from ggplot2 package:

#' Prices of 50,000 round cut diamonds.
#'
#' A dataset containing the prices and other attributes of almost 54,000
#' diamonds.
#'
#' @format A data frame with 53940 rows and 10 variables:
#' \describe{
#'   \item{price}{price, in US dollars}
#'   \item{carat}{weight of the diamond, in carats}
#'   ...
#' }
#' @source \url{http://www.diamondse.info/}
"diamonds"

R data objects to be hidden from the user (for internal use within functions) go in R/sysdata.rda. As above, put the code to generate these objects in data-raw/, and use devtools::use_data(object1, object2, internal=TRUE) to save them to R/sysdata.rda. These objects will be available internally via lazy-loading (no need to explicitly load).

Raw data (not parsed into an R data object) is stored in inst/extdata/, and can be used to give examples of loading and parsing data from scratch. Reach these files using system.file('extdata', 'filename.csv', package='<pkgname>').

Note that some data can also be directly encoded in the R source files.

Installed files

The inst/ directory can house any miscellaneous files, and these are copied into the top-level directory when the package is installed. Some common files include:

inst/AUTHOR to describe non-standard authorship,
inst/COPYRIGHT to describe non-standard copyright,
inst/CITATION to give citation instructions,
inst/extdata as described above.

A message at package start up (see part 1) is a good way to direct users to this information. For example, users could be instructed to run citation('<pkgname>') to see citation instructions, and file.show(system.file("LICENSE", package='<pkgname>')) to see the LICENSE file.

Testing

Run devtools::use_testthat() to: create the directory tests/testthat/, write the file tests/testthat.R, and add testthat to the Suggests field in DESCRIPTION. Don’t edit the tests/testthat.R file - it ensures the tests are run during R CMD CHECK.

Test scripts go in the tests/testthat/ directory, and their names must start with test. Run devtools::test() to execute all test scripts.

Group related tests in the same file:

library(<pkgname>)
context("Context for this group of tests")

test_that("Expectation being tested", {
	expect_equal(my_func(input1), output1)
	expect_equal(my_func(input2), output2)
	expect_equal(my_func(input3), output3)
})

test_that("Expectation being tested", {
	expect_error(my_func(input1), "part of expected error msg")
	expect_error(my_func(input2), "part of expected error msg")
	expect_error(my_func(input3), "part of expected error msg")
})

The first argument to test_that should complete the sentence “Test that …”. The second argument is a code block containing one or more expectations. Types of expect_ function include:

equal (within numeric tolerance)
identical (no tolerance)
match (against a regular expression)
Variants of expect_match:
- output
- message
- warning
- error
is (inherits from specified class)
true
false

To calculate the percentage of a package covered by tests, run covr::package_coverage().

To test if the R code in the package follows the lintr style guide, add lintr to the Suggests field in DESCRIPTION and include as a test:

if (requireNamespace("lintr", quietly = TRUE)) {
  context("lints")
  test_that("Package style conforms to linters", {
    lintr::expect_lint_free()
  })
}

Vignettes

Vignettes are tutorials stepping through one or more use cases of the package, using small real datasets.

Run devtools::use_vignette('vig_title') to create the template file vignettes/vig_title.Rmd and add knitr to the Suggests and VignetteBuilder fields in DESCRIPTION. In addition, manually add rmarkdown and BiocStyle to the Suggests field in DESCRIPTION.

Example metadata header:

---
title: "Vignette Title"
author: "Vignette Author"
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output: BiocStyle::html_document:
  toc: true
  fig_caption: yes
---

Include this code chunk at the beginning:

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
```

Write the text of the vignette in R-flavoured markdown, using BiocStyle macros to refer to other R packages.

Weave code and results through the text using knitr syntax. The data used for the vignette examples can be stored in inst/extdata (demonstrate how to load raw data) or in data/ (demonstrate how to work with loaded data). Remember to use Shift+Alt+K to see RStudio shortcuts - includes shortcuts for running code chunks.

At the end of the vignette, include a Session Information section with the output from devtools::session_info().

approaching significance

Latest Posts