Code tools

This page lists the internal STAPM R packages. These code tools help to improve code quality, promote reproducible analysis, and enhance knowledge management. This post is a good explainer of why developing internal R packages is useful.

Developing code tools as modular, quality-assured R functions saves time, because we avoid rewriting code for the same elements in each project. It also gives greater consistency across projects and code. Writing code in modular form (as functions) also means that we can more easily share useful elements of code between projects. New projects can extend functions, add new versions of functions, or add functions that bring new aspects to our modelling. Each function is accompanied by a description of what that element of code does.

The R packages that we have developed are described below. Click on a package link to go to that package's webpage, which gives the installation instructions, the documentation of individual functions and the technical vignettes. Each package has its own vignettes, which explain in more detail what its functions do and how to use them, and each function has its own help file, which explains its inputs, processes and outputs. The package webpages also link to key outputs produced by the package code.

Figure: Workflow of the R packages that inform the core STAPM tobacco and alcohol policy modelling.

Publishing and promoting the software and the outputs it produces

The goal is to publish and promote the software developed in the form of these R packages. To this end, use-case scenarios, worked examples and “how to” resources are being developed. This process is supported by two-way engagement with academic collaborators and knowledge exchange partners who might make use of the STAPM code base.

Inspired by the ideas behind the R package “vertical” (Vuorre and Crump 2021), we have begun to integrate code that produces outputs within some of the STAPM R packages. For example,

The smktrans STAPM R package is open access and contains embedded code that runs the package functions to produce estimates of smoking state transition probabilities. These estimates are published openly alongside the package code.
In the mort.tools STAPM R package, embedded code uses the package functions to estimate death rates from causes related to tobacco and alcohol consumption. The resulting outputs are linked to data visualisations that serve as quality assurance checks on the code.

Outputs of this kind are highlighted and linked from the package webpages listed below.

Data are only attached to packages when they are publicly available, or when the processing applied in the STAPM code has derived them sufficiently from the original source to meet information governance requirements (see the Data page for the information governance policies followed).

Open source R packages

Some of the STAPM family of R packages are made open access via GitHub and the Open Science Framework under a GPL v3 licence. The code that has been made open source is generally used for building model inputs and for carrying out certain functions within the simulation process. It has been made open source for two reasons:

Transparency. In line with open science, others can review the code and methods used and give feedback to the project team.
Methodology sharing. Others can understand the code and methods used and reuse aspects of them in their own work, e.g., when they are working on a related but different problem and would like to “dip into” elements of this code for inspiration.

Other R packages are not publicly available. The code that is not publicly available is generally code that is used for running the model and processing outcomes for particular projects. The sharing of this more sensitive code is subject to review and discussion among the wider team in the Sheffield Centre for Health and Related Research.

R packages that run the models

The main functions to run the models.

The stapmr package is the main package that we use to run the models. It contains functions that:

simulate individual-level smoking and/or drinking behaviour

apply mortality rates to simulate individual deaths

apply policy/intervention effects

calculate health and economic outcomes
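As an illustration of the mortality step, one common way to turn a mortality rate into simulated individual deaths is shown below. This is an assumption for illustration, not a description of the stapmr code. It supposes an annual mortality rate $m$ for the individual's demographic group, assumed constant over the year, converted to a probability of death and compared against a uniform random draw $u_j$ for individual $j$.

$$
p = 1 - e^{-m}, \qquad \text{individual } j \text{ dies in the year if } u_j < p, \quad u_j \sim \mathrm{Uniform}(0, 1)
$$

For example, a rate of $m = 0.01$ per person-year gives $p = 1 - e^{-0.01} \approx 0.00995$, slightly below the rate itself.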

R packages that process data on tobacco and alcohol consumption

The R packages in this section all contain functions that prepare inputs to the STAPM microsimulation of tobacco and/or alcohol consumption.

The hseclean package contains functions to process the Health Survey for England data and the Scottish Health Survey data to inform the characteristics of the synthetic population sample. It contains functions that:

read the data, rename, organise and process the variables required

impute missing data

summarise data

The toolkitr package contains functions to process Smoking Toolkit Study (STS) and Alcohol Toolkit Study (ATS) data into monthly cross-sectional data on drinking and smoking behaviours and attitudes. It contains functions that:

read the data, rename, organise and process the variables required

summarise data

The smktrans package contains functions that estimate annual probabilities of smoking initiation, quitting and relapse from several years of cross-sectional smoking survey data. Estimates are stratified by age, sex, cohort/period, and Index of Multiple Deprivation quintiles.
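The following example shows how annual initiation, quit and relapse probabilities could move a population between the never, current and former smoker states in one year. It is an illustrative three-state model that ignores mortality and ageing. The symbols are hypothetical, and this is not necessarily how smktrans or stapmr structure the calculation. Here $\pi^{\mathrm{init}}$, $\pi^{\mathrm{quit}}$ and $\pi^{\mathrm{rel}}$ are the annual probabilities for one age, sex, cohort/period and IMD quintile stratum. The row vector $\mathbf{s}_t = (s^N_t, s^C_t, s^F_t)$ holds the proportions of never, current and former smokers in year $t$. The rows of $P$ are the "from" states and the columns are the "to" states, in the same order.

$$
\mathbf{s}_{t+1} = \mathbf{s}_t P, \qquad
P = \begin{pmatrix}
1 - \pi^{\mathrm{init}} & \pi^{\mathrm{init}} & 0 \\
0 & 1 - \pi^{\mathrm{quit}} & \pi^{\mathrm{quit}} \\
0 & \pi^{\mathrm{rel}} & 1 - \pi^{\mathrm{rel}}
\end{pmatrix}
$$

Take hypothetical values of $\mathbf{s}_t = (0.5, 0.2, 0.3)$, $\pi^{\mathrm{init}} = 0.02$, $\pi^{\mathrm{quit}} = 0.05$ and $\pi^{\mathrm{rel}} = 0.03$. Current smoking prevalence in the following year is then $0.5 \times 0.02 + 0.2 \times 0.95 + 0.3 \times 0.03 = 0.209$.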

The alc.tools package contains functions that support the microsimulation of the dynamics of individual tobacco and alcohol consumption. Whereas the smktrans package supports the microsimulation of whether individuals currently smoke, the alc.tools package supports the microsimulation of the average number of cigarettes smoked per day by current smokers. It also supports the microsimulation of whether or not someone currently drinks and, for current drinkers, of the average number of UK standard units of alcohol they drink. Data inputs to the functions in the alc.tools package are prepared by functions in the hseclean package.

R packages for disease epidemiology

Functions to assign the relative risk of disease to individuals and to summarise this risk, e.g., to calculate population attributable fractions.

The tobalcepi package contains functions to assign relative risks of disease to individuals based on their tobacco and/or alcohol consumption, and to estimate the population attributable fractions of disease. The model currently estimates the effects of tobacco and alcohol consumption on the risks of developing 84 ICD-10 defined categories of disease.
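For reference, the standard (Levin-type) definition of a population attributable fraction in terms of relative risks is given below. It is shown as background and is not necessarily the exact calculation implemented in tobalcepi. Exposure categories are indexed by $i$, with population prevalence $p_i$ (summing to 1) and relative risk $RR_i$, where $RR_i = 1$ for the unexposed category. The second form uses relative risks $RR_j$ assigned to individuals $j$ with survey weights $w_j$. The use of weights is an assumption here.

$$
\mathrm{PAF} = \frac{\sum_i p_i (RR_i - 1)}{1 + \sum_i p_i (RR_i - 1)} = \frac{\overline{RR} - 1}{\overline{RR}},
\qquad \overline{RR} = \frac{\sum_j w_j RR_j}{\sum_j w_j}
$$

For example, suppose 20% of the population has $RR = 2$ and the rest have $RR = 1$. Then $\overline{RR} = 1.2$ and $\mathrm{PAF} = 0.2 / 1.2 \approx 0.167$.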

R packages that model policy effects

Functions to estimate the effects of certain types of policy or intervention on tobacco and/or alcohol consumption.

The pricepol package contains functions to model the effects of price policies on tobacco and alcohol consumption.

The taxsim.post package contains functions to produce standardised outputs from the tax and price policy model TAX-sim.

R packages that model economic outcomes

Functions to estimate the economic outcomes of changes to tobacco and/or alcohol consumption.

The econcalc package contains functions to process the results of the simulation to obtain economic outcomes, including total tax revenue, total retail revenue, total consumer spending, mean prices paid by consumers, and the distributions of prices and net retail revenues.

The tobalciomodel package applies input-output modelling methods to estimate the impact of changes in demand for alcohol and tobacco products on the economy. The macroeconomic outcomes modelled include gross value added (GVA) and employment.
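The core of a demand-driven input-output model is the Leontief relationship, shown below as background. The exact specification used in tobalciomodel (e.g., which multipliers or induced effects are included) is not described here and may differ. The vector $\mathbf{x}$ is gross output by sector and $\mathbf{f}$ is final demand. $A$ is the matrix of technical coefficients, where $a_{ik}$ is the input from sector $i$ required per unit of output of sector $k$. The terms $v_k$ and $e_k$ are the base-year GVA and employment of sector $k$. The ratios $v_k / x_k$ and $e_k / x_k$ are assumed to stay constant.

$$
\mathbf{x} = A\mathbf{x} + \mathbf{f} \;\Rightarrow\; \Delta\mathbf{x} = (I - A)^{-1}\,\Delta\mathbf{f},
\qquad \Delta \mathrm{GVA} = \sum_k \frac{v_k}{x_k}\,\Delta x_k,
\qquad \Delta \mathrm{Emp} = \sum_k \frac{e_k}{x_k}\,\Delta x_k
$$

As a hypothetical two-sector example, take $A = \begin{pmatrix} 0.2 & 0.1 \\ 0.3 & 0.1 \end{pmatrix}$ and a fall of 10 in final demand for sector 1. This gives $\Delta\mathbf{x} = (I - A)^{-1}(-10, 0)^\top \approx (-13.04, -4.35)^\top$. Output falls in both sectors, and the total fall exceeds the initial change in demand because suppliers' sales fall too.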

R packages that model health outcomes

Functions to estimate the health outcomes of changes to tobacco and/or alcohol consumption.

The mort.tools package contains code to read, clean and analyse cause-specific mortality data.

Data for England and Wales are provided by the Office for National Statistics.

Data for Scotland are from National Records of Scotland.

The hesr package is a collection of functions to read, clean and analyse Admitted Patient Care data from the Hospital Episode Statistics. This provides cause-specific estimates of morbidity, rates of hospital admissions, and costs of hospital admissions.

The qalyr package is a collection of functions to support the estimation of health state utility values. The main data source is the Health Outcomes Data Repository (HODaR).
