This page lists the internal STAPM R packages. These code tools help to improve code quality, promote reproducible analysis, and enhance knowledge management. This post is a good explainer of why developing internal R packages is useful.
Developing code tools as quality-assured, modular R functions saves time, because we avoid re-writing code for the same elements in each project, and it gives greater consistency across projects. Writing code in a modular form (as functions) also means that we can more easily share useful elements of code between projects. New projects can extend functions, add new versions of functions, or add functions that bring new aspects to our modelling. Each function is accompanied by a description of what that element of code does.
The R packages that we have developed are described below. For installation instructions, documentation of individual functions and technical vignettes, click on the package links to go to each package’s webpage. Each package has its own vignettes, which explain in more detail what its functions do and how to use them. Each function has its own help file, which explains its inputs, processes and outputs. The package webpages also contain links to key outputs produced by the package code.
Tobacco and alcohol policy modelling workflow of R packages that inform the core STAPM modelling.
The goal is to publish and promote the software developed in the form of these R packages. To this end, use-case scenarios, worked examples and “how to” resources are being developed. This process is supported by two-way engagement with academic collaborators and knowledge exchange partners who might make use of the STAPM code base.
Inspired by the ideas behind the R package “vertical” (Vuorre and Crump 2021), we have begun to integrate code that produces outputs within some of the STAPM R packages. For example,
The smktrans STAPM R package is open access and includes embedded code that runs the package functions to produce smoking state transition probability estimates. These estimates are also made open access alongside the R package code.
The mort.tools STAPM R package includes embedded code that uses the package functions to estimate death rates from causes related to tobacco and alcohol consumption. The outputs are linked to data visualisations that serve as quality assurance checks on the code.
Any outputs of the nature described above are highlighted and linked to on the package webpages below.
Data are attached to packages only when they are publicly available, or when the processing applied in the STAPM code has derived them far enough from the original source to meet information governance requirements (see the Data page for the information governance policies followed).
Some of the STAPM family of R packages are made open access via GitHub and the Open Science Framework under a GPL v3 license. The code that has been made open source is generally code used for building model inputs and for carrying out certain functions within the simulation process. The code has been made open source for the following two reasons:
Transparency. Open science, allowing review and feedback to the project team on the code and methods used.
Methodology sharing. So that people can understand the code and methods used and reuse aspects of them in their own work, e.g. because they are working on a related but not identical problem and would like to “dip into” elements of this code for inspiration.
Other R packages are not publicly available. The code that is not publicly available is generally code that is used for running the model and processing outcomes for particular projects. The sharing of this more sensitive code is subject to review and discussion among the wider team in the Sheffield Centre for Health and Related Research.
The main functions to run the models.
The R packages in this section all contain functions that prepare inputs to the STAPM microsimulation of tobacco and/or alcohol consumption.
The smktrans package contains functions that estimate annual probabilities of smoking initiation, quitting and relapse from several years of cross-sectional smoking survey data. Estimates are stratified by age, sex, cohort/period, and Index of Multiple Deprivation quintiles.
The alc.tools package contains functions that support the microsimulation of the dynamics of individual tobacco and alcohol consumption. Whilst the smktrans package supports the microsimulation of whether individuals currently smoke, the alc.tools package contains functions that support the microsimulation of the average number of cigarettes smoked per day by current smokers. The alc.tools package also contains functions to support the microsimulation of whether or not someone currently drinks, and of the average number of UK standard units of alcohol they drink if they are current drinkers. Data inputs to the functions in the alc.tools package are prepared by functions in the hseclean package.
Functions to assign the relative risk of disease to individuals and to summarise this risk, e.g. to calculate population attributable fractions.
The tobalcepi package contains functions to assign relative risks of disease to individuals based on their tobacco and/or alcohol consumption, and to estimate the population attributable fractions of disease. The model currently estimates the effects of tobacco and alcohol consumption on the risks of developing 84 ICD-10 defined categories of disease.
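To make the idea of a population attributable fraction (PAF) concrete, the following is a minimal sketch in R. It is not code from the tobalcepi package: the function name `calc_paf`, its arguments and the example relative risks are all hypothetical, and it assumes the standard individual-level formula PAF = (mean RR - 1) / mean RR, where each person's relative risk is measured against a reference group with RR = 1.

```r
# Hypothetical helper for illustration only; not a tobalcepi function.
# rr:     relative risk assigned to each individual (1 = no excess risk)
# weight: optional survey weights, defaulting to equal weights
calc_paf <- function(rr, weight = rep(1, length(rr))) {
  stopifnot(length(rr) == length(weight), all(rr > 0), all(weight >= 0))

  # Weighted mean relative risk in the population
  mean_rr <- sum(weight * rr) / sum(weight)

  # Share of disease risk that would be removed if everyone had RR = 1
  (mean_rr - 1) / mean_rr
}

# Made-up relative risks for five individuals; two have no excess risk
rr_example <- c(1, 1, 1.5, 2, 4)
paf_example <- calc_paf(rr_example)
print(round(paf_example, 3))
```

With these made-up values the mean relative risk is 1.9, so the sketch returns 0.9 / 1.9, or about 0.474. A relative risk below 1 (a protective association) would pull the result down and can make it negative.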
Functions to estimate the effects of certain types of policy or intervention on tobacco and/or alcohol consumption.
The pricepol package contains functions to model the effects of price policies on tobacco and alcohol consumption.
The taxsim.post package contains functions to produce standardised outputs from the tax and price policy model TAX-sim.
Functions to estimate the economic outcomes of changes to tobacco and/or alcohol consumption.
The econcalc package contains functions to process the results of the simulation to obtain economic outcomes, including total tax revenue, total retail revenue, total consumer spending, mean prices paid by consumers, and the distributions of prices and net retail revenues.
The tobalciomodel package applies input-output modelling methods to estimate the impact of changes in demand for alcohol and tobacco products on the economy. Macroeconomic outcomes modelled include gross value added (GVA) and employment.
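As a rough illustration of the input-output logic, and not a statement of the exact method implemented in tobalciomodel, the standard demand-driven Leontief model can be written as below. The notation is assumed here for illustration: A is the matrix of technical coefficients, Δf is the vector of changes in final demand by sector, Δx is the resulting change in sector output, and v and e are vectors of GVA and employment per unit of output.

```latex
\begin{aligned}
\Delta x &= (I - A)^{-1}\,\Delta f, \\
\Delta \mathrm{GVA} &= v^{\top}\,\Delta x, \\
\Delta \mathrm{Employment} &= e^{\top}\,\Delta x .
\end{aligned}
```

Here (I - A)^{-1} is the Leontief inverse, so Δx captures both the direct change in output and the indirect changes along supply chains.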
Functions to estimate the health outcomes of changes to tobacco and/or alcohol consumption.
The hesr package is a collection of functions to read, clean and analyse Admitted Patient Care data from the Hospital Episode Statistics. These functions provide cause-specific estimates of morbidity, rates of hospital admissions, and costs of hospital admissions.
The qalyr package is a collection of functions to support the estimation of health state utility values. The main data source is the Health Outcomes Data Repository (HODaR).