The aim of this section is to explain the workflow and style with which we develop the Sheffield Tobacco and Alcohol Policy Modelling. The approach is based on the model quality assurance guidance developed by the section of Health Economics and Decision Science within ScHARR at The University of Sheffield, national guidance including the Aqua Book (2015) and the Macpherson review (2013), and learning from other people’s working practices. We also follow the guidelines for developing health economic models to support decision-making (Squires et al. 2016).
The materials on this page help to transfer knowledge within the modelling team of how we go about the Sheffield Tobacco and Alcohol Policy Modelling. The materials are continuously added to, edited and reviewed, and are organised by topic, e.g. how to process a particular dataset or how to prepare a certain model input.
The main introductory resource is this introduction to R for STAPM. The idea is that the resources below provide pointers to material that would be useful to read to become a proficient R user and to become familiar with how data and code are used in the STAPM modelling.
The principle we apply to coding and the use of data is to separate the development of code tools (which we also call “software” or “R packages”) from the use of those code tools to conduct projects. This means that the code developed in the STAPM modelling takes one of two forms: re-usable functions within R packages, or project-specific code that applies those functions for the purposes of a particular project. The code tools we develop in the form of R packages are the reason why we call STAPM a “modelling platform”: it comprises a set of data and functions to process that data, which can be used and adapted for a range of projects.
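As a minimal sketch of this separation (the function name and data below are hypothetical, for illustration only):

```r
# --- Re-usable code tool: a function of the kind that would be
# exported from a STAPM R package (hypothetical example) ---
prop_current_smokers <- function(data) {
  # Proportion of individuals recorded as current smokers
  mean(data$smoker == "current")
}

# --- Project-specific code: a short analysis script that applies
# the packaged function to project data ---
project_data <- data.frame(
  smoker = c("current", "former", "never", "current")
)
prop_current_smokers(project_data)
#> [1] 0.5
```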
Much of the code uses the syntax of the data.table R package. It is worth familiarising yourself with this syntax as it differs from base R and the tidyverse.
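For example, the basic data.table form is `dt[i, j, by]` (filter rows, compute, group), and columns can be added by reference with `:=`. A small self-contained illustration with made-up data:

```r
library(data.table)

# Made-up data for illustration
dt <- data.table(
  sex   = c("Male", "Female", "Male", "Female"),
  age   = c(25, 40, 33, 58),
  units = c(12, 4, 20, 8)
)

# dt[i, j, by]: filter rows in i, compute in j, group with by
dt[age >= 30, .(mean_units = mean(units)), by = sex]

# Add a column by reference (modifies dt in place, without copying)
dt[, drinker := units > 0]
```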
Getting started in R:
Reproducible research practices:
Graphics:
We want to promote a working culture that expects thorough quality assurance (QA), and in which everyone has the confidence to point out potential QA issues, normalising the process of finding and reporting errors. QA is conducted as a collaborative process during model building. Responsibility for QA and sign-off for the modelling should be split between two “senior responsible individual” roles: the person responsible for maintaining the code tools, and the person responsible for each project that uses the code tools (e.g. the principal investigator or lead analyst). Those accountable for the analysis may or may not be directly involved in coding themselves, but they need to ensure that coding is being carried out to a suitable standard. Our QA should cover inputs, processes and outputs. This QA process takes time, which should be taken into account in the timetabling and costing of projects.
Inputs
Input data should be checked to make sure it is up to date, has sensible distributions, and is being used appropriately and accurately. Instructions for how to update data inputs (e.g. as new years of data become available) should be added to R package and project documentation as appropriate.
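A hedged sketch of the kind of basic input checks this implies (the variable names and plausible ranges are hypothetical):

```r
library(data.table)

# Hypothetical processed input data
set.seed(1)
inputs <- data.table(
  age          = sample(16:89, 500, replace = TRUE),
  weekly_units = rgamma(500, shape = 2, scale = 5)
)

# Check that values fall within plausible ranges
stopifnot(all(inputs$age >= 16 & inputs$age <= 89),
          all(inputs$weekly_units >= 0))

# Inspect the distribution for anything implausible
summary(inputs$weekly_units)
hist(inputs$weekly_units, main = "Weekly alcohol units", xlab = "Units")
```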
Processes
Processes are tested iteratively by the model developer as development proceeds. Code is then checked by other individuals in the development team through analyst code review and discussion within the project team. We do not tend to double-code functions unless this is the most efficient way to assure the code. Instead, we assign who will review different parts of the code, aiming for two people to have looked at each element. Where possible we build in automated tests during model development, to test individual model components and conditions. These tests can be integrated into the code itself (e.g. within a function), built into the unit tests run on functions, or embedded within the analysis code for a particular project. If the model is complex, we aim to produce simplified worked examples that help to check that the model is working as expected, e.g. by checking data from intermediate steps.
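A small sketch of what a unit test on a function might look like, using the testthat package (the function shown is hypothetical, not part of the STAPM packages):

```r
library(testthat)

# Hypothetical helper of the kind that might live in a STAPM package
rescale_to_sum <- function(x, total = 1) {
  x / sum(x) * total
}

test_that("rescale_to_sum preserves the requested total", {
  p <- rescale_to_sum(c(2, 3, 5))
  expect_equal(sum(p), 1)
  expect_true(all(p >= 0))
})
```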
Outputs
Outputs are checked by running simplified worked examples that serve to check the model and to understand its gearing. Such end-to-end testing is useful when adapting existing models because it allows comparison of “known good outputs” with new outputs, to ensure that no major differences have been introduced. These example model runs are presented to the project team for discussion and sense checks. These checks look at both the final result and the intermediate results, which means the analysis should be broken down and presented in stages that everyone can understand. We then think about how we might conduct: internal validation (checking model outputs against source data); cross validation (checking model outputs against outputs of similar models); and external validation (checking model outputs against external data not used during model development).
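One way such a comparison against known good outputs might be coded (the file paths below are hypothetical):

```r
# Compare new outputs with stored "known good outputs"
known_good <- readRDS("tests/known_good_output.rds")  # hypothetical path
new_output <- readRDS("output/model_output.rds")      # hypothetical path

# all.equal() permits a small numeric tolerance, so harmless
# floating-point differences do not trigger a failure
comparison <- all.equal(known_good, new_output, tolerance = 1e-8)

if (!isTRUE(comparison)) {
  warning("New outputs differ from known good outputs:\n",
          paste(comparison, collapse = "\n"))
}
```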
QA Reporting
For the STAPM R packages, we have set up a basic procedure for noting errors that were found, how they were detected and how they were fixed (see the Issues Tracking Sheet on our Management page). For projects, a short description of the QA activities undertaken should be added to the final project documentation (e.g. logging the checks undertaken).
Below is a growing list of introductory notes on using the STAPM code. The list is being added to and the notes are being developed all the time.
Synthetic populations
Smoking state transition probabilities
Running a micro-simulation of smoking behaviour
Further reading:
Running a micro-simulation of price policies
The notes in this section provide training on how to prepare inputs for, run, and post-process outputs from the Tobacco and Alcohol Tax and Price Intervention (TAX-sim) model, to produce analyses of policies designed to affect the price of tobacco and alcohol.
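As orientation only, the three stages might look something like the sketch below; none of the function names or file paths shown are the real TAX-sim interface, and the model run is a placeholder:

```r
# 1. Prepare inputs: a synthetic population and a price policy scenario
population <- readRDS("intermediate_data/synthetic_population.rds")  # hypothetical path
price_policy <- list(product = "cigarettes", price_change = 0.10)    # +10% price

# 2. Run the model (stand-in for the TAX-sim model run)
run_price_simulation <- function(pop, policy) {
  # placeholder: the real model simulates changes in purchasing and
  # consumption in response to the new prices
  pop
}
sim_output <- run_price_simulation(population, price_policy)

# 3. Post-process outputs, e.g. summarise consumption, spending and
# tax revenue by population subgroup for reporting
```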
Further reading:
Analysis of hospital admissions