Code tools

This page lists the internal STAPM R packages. These code tools help to improve code quality, promote reproducible analysis, and enhance knowledge management. This post is a good explainer of why developing internal R packages is useful.

Developing code tools as modular, quality-assured R functions saves time, because we avoid re-writing code for the same elements in each project, and gives greater consistency across projects and code. Writing code in a modular form (as functions) also makes it easier to share useful elements of code between projects. New projects can extend functions, add new versions of functions, or add further functions that bring new aspects to our modelling. Each function is accompanied by a description of what that element of code does.

The R packages that we have developed are described below. For installation instructions, the documentation of individual functions and technical vignettes, click on the package links below to go to each package’s webpage. Each package has its own accompanying vignettes, which explain in more detail what its functions do and how to use them. Each function has its own help file, which explains its inputs, processes and outputs. The package webpages also contain links to key outputs produced by the package code.


Tobacco and alcohol policy modelling workflow of R packages that inform the core STAPM modelling.



Publishing and promoting the software and the outputs it produces

The goal is to publish and promote the software developed in the form of these R packages. To this end, use-case scenarios, worked examples and “how to” resources are being developed. This process is supported by two-way engagement with academic collaborators and knowledge exchange partners who might make use of the STAPM code base.

Inspired by the ideas behind the R package “vertical” (Vuorre and Crump 2021), we have begun to integrate code that produces outputs within some of the STAPM R packages. For example,

Any outputs of the nature described above are highlighted and linked to on the package webpages below.

Data is only attached to packages when it is publicly available or sufficiently derived from the original source through the processes applied in the STAPM code to meet information governance requirements (see the Data page for the information governance policies followed).



Open source R packages

Some of the STAPM family of R packages are made open access via GitHub and the Open Science Framework under a GPL v3 licence. The code that has been made open source is generally code that is used for building model inputs and for conducting certain functions within the simulation process. The code has been made open source for the following two reasons:

  • Transparency. Open science, allowing review and feedback to the project team on the code and methods used.

  • Methodology sharing. So that people can understand the code and methods used and might use aspects of them in their own work, e.g., because they are doing something related but not identical and would like to “dip into” elements of this code for inspiration.

Other R packages are not publicly available. The code that is not publicly available is generally code that is used for running the model and processing outcomes for particular projects. The sharing of this more sensitive code is subject to review and discussion among the wider team in the Sheffield Centre for Health and Related Research.



R packages that process data on tobacco and alcohol consumption

The R packages in this section all contain functions that prepare inputs to the STAPM microsimulation of tobacco and/or alcohol consumption.

The hseclean package contains functions to process the Health Survey for England data and the Scottish Health Survey data to inform the characteristics of the synthetic population sample. It contains functions that:
  • read the data, rename, organise and process the variables required
  • impute missing data
  • summarise data

The toolkitr package contains functions to process the Smoking Toolkit Study (STS) and Alcohol Toolkit Study (ATS) data to produce monthly cross-sectional data on drinking and smoking behaviours and attitudes. It contains functions that:
  • read the data, rename, organise and process the variables required
  • summarise data

The smktrans package contains functions that estimate annual probabilities of smoking initiation, quitting and relapse from several years of cross-sectional smoking survey data. Estimates are stratified by age, sex, cohort/period, and Index of Multiple Deprivation quintiles.

The alc.tools package contains functions that support the microsimulation of the dynamics of individual tobacco and alcohol consumption. Whereas the smktrans package supports the microsimulation of whether individuals currently smoke, the alc.tools package supports the microsimulation of the average number of cigarettes smoked per day by current smokers. It also contains functions to support the microsimulation of whether or not someone currently drinks and, for current drinkers, of the average number of UK standard units of alcohol they drink. Data inputs to the functions in the alc.tools package are prepared by functions in the hseclean package.