Improving Epidemiology Productivity and the Availability of Information

State: IN Type: Promising Practice Year: 2020

PH Issue: Timely access to information in a low resource environment

Public health needs timely, accurate information to track and improve its operations, to identify and monitor community needs, to direct interventions to those with the greatest risks, to evaluate the impact of policies and interventions, and to support our community partners. But most LHD epidemiology departments have little capacity to inform the many decisions that affect health in their jurisdictions. For over a decade, the MCPHD epidemiology group has built a set of practices and resources to maximize our ability to provide timely information. This model practice describes the principles, practices, and resources that we have found to be most valuable.


To make timely information about public health operations and community health more readily available to people making decisions that affect the community's health, including program managers, public health leadership, community partners, policy makers, and others.

Guiding principles

This document describes processes we have developed over the past 15 years to achieve this goal. At the core of the processes are four principles:

  1. Provide decision support.
  2. Work as a member of a team
  3. Preserve an "information audit trail" for all work products
  4. Build for today and next year


To establish processes and systems which:

  1. foster information "self-service";
  2. preserve an information audit trail;
  3. speed and simplify epidemiologists' work in generating information;
  4. increase epidemiologists' productivity;
  5. improve the quality of information;
  6. minimize duplication of effort; and
  7. minimize disruptions to productivity from position turnover.

Implementation & Activities

The "LHD and Community Collaboration" section will describe specific activities through which we implement our principles, and work toward those objectives.


  • Established a user-friendly interactive reporting tool which staff throughout MCPHD use for information about their daily operations.
  • Created a library of flexible analysis programs which epidemiologists use to quickly fulfill data requests.
  • Established an "information audit trail" for each work product, to track down and correct errors.
  • Continuously improved the quality of our output.
  • Almost all new analyses re-use existing analysis programs.
  • Standard processes make it easy for staff to work together or to pick up each other's work, as needed.

Objectives met

The results described above have produced the improvements described in our objectives, and support continued improvement.

Success factors

The main factor in the success of this practice is how consistently we use it, a consistency which is reinforced by group discussions, peer coaching, and semi-annual training sessions to assure a common understanding and competency regarding the practices.  Another important success factor is our readiness to discuss and alter our practices.

Public Health impact

The impact has been an increasingly rapid response to information requests, and an increase in the amount of information that each epidemiologist is able to provide. That good-quality, timely, relevant information results in decisions that are based on better information.

Context (Our LHD & Population)

The approximately 700 staff of the Marion County Public Health Department (MCPHD) serve Indianapolis and the remainder of the county, about 960,000 residents. About two-thirds of the residents are White, one-quarter African American, and one-tenth Hispanic, with a 20% poverty rate. Indiana has less per capita federal and state public health funding than almost every other state. Our epidemiology group has 15 epidemiologists (about 8 FTE on grants) and two technical and support staff. The Epi group has used some of the practices described here for over 15 years, from when the group had only 4 members; the practices have scaled up well.

Population data geeks, not 1:1 case investigators

In some public health agencies, epidemiologists are hired within specific programs, and focus on case investigation or balance skills in analysis and scientific methods with content expertise in programs like Chronic Disease.  At MCPHD, epidemiology is a centralized function providing technical support to all MCPHD programs; except during acute outbreaks, epidemiology staff do not do case investigation. The epidemiologist's role is to understand programs' or external partners' information needs, and use data and analysis skills to provide the information needed for the client's decisions. While some of us have good content knowledge, we focus more on contributing science, methodology, and data tools expertise, and we rely on program staff for content expertise. The group mission is "to generate timely information that improves decisions affecting health in Marion County".


As noted by the Institute of Medicine (and many other experts), the US Public Health system is low on resources but has high demand ("For the Public's Health: Investing in a Healthier Future" IOM 2012), including regarding data and information ("The Role of Measurement in Action and Accountability" IOM 2011). So we in PH are constantly making decisions about how to allocate resources between important demands. But which choice will have the best impact? We can only guess, based on the best information we can gather before the decision is made. We owe it to our constituents to create the best decision support systems we can, within our resource constraints. Within PH, epidemiology often provides much of that decision support. At MCPHD, we make that decision support role an explicit, central function of the Epidemiology group. Below are the principles and tools we have developed for that function.


1. Provide decision support.

Our focus is providing the right information to the right place at the right time. The program staff and other customers know their business better than we do, and deserve respect for their content expertise and knowledge of program operations; we epis bring skills in data management, analysis, and interpretation. We are often important partners but seldom the leaders of program initiatives. If we can 

2. Work as a member of a team

We all use the same analysis tools and datasets. We all adhere to certain standards in documenting and storing our work and work products. Epi staff are evaluated on how they contribute to the success of the whole group; they are expected to readily seek or provide help to others in the group, and to make improvements in our tools or processes to the benefit of everyone.

3. Preserve an "information audit trail" for all work products

If the information we provide is not reliable and trustworthy, we provide little value. Everything we create should have an audit trail that tracks its creation from the initial data to the final results, so that we can recreate the same results, find the origin of any errors, and correct them. As a side effect, the audit trail makes it easy to find and copy or borrow from old analyses.

4. Build for today and next year

Improving PH is a marathon, not a sprint. We try to keep one eye on our long-term goals, and shape our work to build toward those goals as well as dealing with today's urgent demands. While it is quicker and easier in the short term to just put out today's fires, staff are expected to do the extra work that keeps us organized and increasingly efficient. The extra work that keeps us organized includes things like logging each request in the request database or filling in the standard header in analytic programs. We become increasingly efficient by taking time to create tools and analysis programs that are set up for easy re-use, and by keeping a lookout for how to trim our work processes. We try to "ratchet progress", making and stabilizing small improvements so that we don't lose ground.




You can implement these tools one (or a few) at a time; I would not try to implement them all at once. Some of them, like analytic datasets, can be built upon over time, so you can start small. I keep the "build for today and next year" principle in mind, making small improvements, but in a way that can be sustained and built upon.

We continue to develop all of these tools, adjusting them as we figure out how to do things better.

Analytic data sets

Rather than having data analysts start with the raw data sources, our analysts use well-organized, standardized "analytic data sets" which we create from the raw data. For each data source, we have a "data prep" program that automatically runs on an appropriate schedule to add new data to that data source's analytic data set. The data prep program

  • runs quality checks on the data and generates reports, email messages, and "unexpected value" data sets to preserve and communicate the data quality results,
  • assigns standard names and values for variables that appear in multiple data sets, such as race or gender,
  • converts variables into standard types with standard values that are easy to use in analyses, such as (0, 1) for ("No", "Yes") type variables,
  • assigns meaningful variable names, labels, and data value formats to each variable in the analytic data set, and
  • labels the dataset, makes it read-only, and stores it where all analysts can access it.

The data prep programs can include code that addresses many of the subtleties and complexities in our data sources. For instance, we drop duplicate or invalid records, keeping only the most accurate record; we set unexpected values to missing; and we add useful computed variables like BMI or the Kotelchuck index. This results in simpler analysis programs and more consistent results across analysts, and helps analysts avoid the kind of errors that occur when someone is using a data set that is not as straightforward as it appears.
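The data prep steps described above might look roughly like the following. This is a minimal sketch, not the group's actual code: the group works in SAS and R, and Python is used here only for illustration. The field names, code tables, and the rule "keep the most recently updated record" (a stand-in for "most accurate") are all assumptions.

```python
# Hypothetical code tables; the real standard values are not given in this document.
RACE_CODES = {"W": "White", "WHITE": "White", "B": "Black", "BLACK": "Black"}
YES_NO = {"yes": 1, "y": 1, "no": 0, "n": 0}


def prep_record(raw):
    """Return (clean_record, problems) for one raw record."""
    problems = []
    clean = {"id": raw["id"], "updated": raw["updated"]}

    # Standard name and values for a variable shared by many data sets.
    race = RACE_CODES.get(str(raw.get("race_cd", "")).strip().upper())
    if race is None:
        problems.append(("race_cd", raw.get("race_cd")))
    clean["race"] = race

    # Yes/No text becomes 1/0; anything unexpected is set to missing (None).
    smoker = YES_NO.get(str(raw.get("smoker", "")).strip().lower())
    if smoker is None:
        problems.append(("smoker", raw.get("smoker")))
    clean["smoker"] = smoker

    # Computed variable: BMI = weight (kg) / height (m) squared.
    wt, ht = raw.get("weight_kg"), raw.get("height_m")
    clean["bmi"] = round(wt / ht ** 2, 1) if wt and ht else None
    return clean, problems


def build_analytic_dataset(raw_records):
    """Clean every record, keep the latest record per id, collect problems."""
    latest, unexpected = {}, []
    for raw in raw_records:
        clean, problems = prep_record(raw)
        unexpected.extend((raw["id"], field, value) for field, value in problems)
        kept = latest.get(clean["id"])
        # Assumption: the most recently updated record stands in for "most accurate".
        if kept is None or clean["updated"] > kept["updated"]:
            latest[clean["id"]] = clean
    return list(latest.values()), unexpected
```

The second return value plays the role of the "unexpected value" data set: it preserves what was set to missing so that data quality problems can be reported rather than silently dropped.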

Poor man's data warehouse

All of our frequently used data sets are stored as analytic data sets in one place, accessible to all of our analysts. Most data sets are not linked together yet, although we are working toward that. But at least all analysts have access to shared, core datasets, documented and set up for easy analysis, and automatically updated frequently (usually daily or monthly).

We currently store our analytic data sets as SAS data sets, but we are assessing whether to store them in an SQL database, so that they are more readily accessible via software other than SAS.

Limited number of tools

In order to foster shared expertise and to assure that one epidemiologist can readily pass their work to another, we try to limit the number of software tools we use.

  • We try to assure that any new data system implemented within the agency allows backend access to the data by ODBC, so that we can easily integrate it into our current data flow.
  • We encourage programs to use Crystal Reports or Viya (see "Interactive reporting tool, suitable for the computer-shy" below) for getting data from their systems, rather than using reporting tools within each system, so that we can more easily support them and integrate data across programs.
  • Within epidemiology, everyone uses SAS or R, and within R, we try to use the same set of packages, so that we can understand and use each other's code.

If it makes a lot of sense to use some other tool, we might, but we have a strong bias toward figuring out how to get things done with our main tools, rather than using something new.

Interactive reporting tool, suitable for the computer-shy

One of the best things we've done to provide decision support within the agency is to implement a user-friendly, interactive reporting tool, which allows staff throughout the agency to view and analyze data about their activities. The tool allows them to cut their data in almost any way they would like, to drill down to record-level information, and to create dashboards. We have put a notable amount of effort into developing the tool, training agency staff in how to use it, and creating initial reports for them to use. Ongoing maintenance and training take about ¼ FTE. Epidemiologists who support specific programs, like Chronic Disease, also support that program's use of the tool. We recently switched from using Futrix (which is no longer available) to using SAS Viya.

Data request ID ("DR number")

The "DR number" is the spine that holds our operations together. Each work product we produce is assigned a data request ID (the "DR number"), which identifies that product. The DR number is included in the filename of almost any file associated with that request, such as the analytic program, the program log, program output, or documents containing results. The DR number is also included in the results, such as in a footnote on any graphs, or in page footers of documents. So if someone has a question about something we have produced, we can ask them to look for the DR number on that result, and then quickly find the program that produced that result or other relevant information.
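As a rough illustration of how a DR number can tie files and results together, here is a small Python sketch. The ID format ("DR" followed by digits), the helper names, and the footnote wording are assumptions made for this example; the document does not specify them, and the group's own programs are written in SAS or R.

```python
import re

# Assumed format for illustration only: "DR" followed by digits, e.g. "DR01234".
DR_PATTERN = re.compile(r"DR\d+")


def dr_filename(dr_number, description, extension):
    """Build a file name that starts with the DR number."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_").lower()
    return f"{dr_number}_{slug}.{extension}"


def dr_footnote(dr_number):
    """Text to place in a graph footnote or page footer."""
    return f"Data request {dr_number}"


def find_dr_number(text):
    """Pull the DR number back out of a file name, footnote, or footer."""
    match = DR_PATTERN.search(text)
    return match.group(0) if match else None
```

With helpers like these, a question about a chart can be traced back to its folder: read the footnote, extract the ID with `find_dr_number`, and look for files whose names start with it.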

Data request database

Our data request database simplifies use of the DR number and preservation of an information audit trail. When we receive a request for information, the request is logged in the data request database, which generates the unique DR number. The database includes a button which creates a folder with the DR number, creates some program templates within that folder, and creates a parallel folder in our code repository.

The database includes tools to help us look up past requests that might be similar to the current request, and to find analytic code, output, and other work products from those past requests. Often, rather than creating a new request, we use the database to find something we've produced already that satisfies the current need.

The database also helps us assess our productivity, operations, and distribution of clients, to help us with continuous quality improvement.
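The core behavior described here (log a request, get back a unique DR number, search past requests) can be sketched in a few lines. The group's database is built in Microsoft Access; the sketch below uses Python's built-in `sqlite3` module purely as an illustrative stand-in, and the table layout, column names, and "DR" plus five digits format are assumptions.

```python
import sqlite3


def open_request_db(path=":memory:"):
    """Open (or create) a request log; table and columns are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS requests (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               requester TEXT NOT NULL,
               description TEXT NOT NULL,
               received TEXT NOT NULL
           )"""
    )
    return conn


def log_request(conn, requester, description, received):
    """Log a request and return its generated DR number."""
    cur = conn.execute(
        "INSERT INTO requests (requester, description, received) VALUES (?, ?, ?)",
        (requester, description, received),
    )
    conn.commit()
    return f"DR{cur.lastrowid:05d}"


def find_similar(conn, keyword):
    """Return (DR number, description) for past requests mentioning keyword."""
    rows = conn.execute(
        "SELECT id, description FROM requests WHERE description LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return [(f"DR{rid:05d}", desc) for rid, desc in rows]
```

Letting the database generate the ID, rather than having staff pick one, is what keeps DR numbers unique; the keyword search is the simplest possible version of looking up similar past requests before starting new work.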

The data request database was created in Microsoft Access. A version set up to be deployed by other analysis groups is available at

Version control for analysis programs

We use Subversion to maintain a history of changes to each of our analytic programs. Subversion is a freely downloadable, open source version control system (see This is part of our "information audit trail". Through it, program versions that produced old work products can be restored, and we can quickly recover from erroneous changes to analytic programs. Good use of version control requires staff training and cultivating good version control habits.

Structured work product storage

Files associated with each work product are stored in that work product's system folder, with the folder name beginning with the request's DR number. All of the DR folders are in one place, being subfolders of our Data Requests folder.

A folder with the same name is created on our Subversion version control server, where the analytic program files are then archived.

Automated request folder setup

Creating system and version control folders adds steps that are tempting to skip when you get an urgent data request. Our data request database includes a button which creates those folders, and creates some program templates within the system folder, so we can minimize the "overhead" required to conform to our group standards.
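A one-step folder setup of this kind could be sketched as follows. This is an illustration under stated assumptions, not the Access button's actual logic: the folder naming scheme, the template text, and the function name are hypothetical, and a plain directory stands in for the folder that would be created on the version control server.

```python
from pathlib import Path

# Hypothetical starter template; the group's real header is not reproduced here.
TEMPLATE = "# Data request: {dr}\n# Title: {title}\n# Author:\n# Purpose:\n"


def setup_request_folders(dr_number, title, requests_root, repo_root):
    """Create the DR work folder, a starter program, and a parallel code folder."""
    folder_name = f"{dr_number} {title}"
    work_dir = Path(requests_root) / folder_name
    code_dir = Path(repo_root) / folder_name  # stand-in for the repository folder
    work_dir.mkdir(parents=True, exist_ok=True)
    code_dir.mkdir(parents=True, exist_ok=True)

    program = work_dir / f"{dr_number}_analysis.py"
    if not program.exists():  # never overwrite work that is already there
        program.write_text(TEMPLATE.format(dr=dr_number, title=title))
    return work_dir, code_dir, program
```

Making the function safe to run twice matters in practice: an analyst in a hurry can click the button again without risking work already saved in the folder.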

Standard analysis program headers

We have a standard header that we include in each of our SAS or R analytic programs. The header makes it easier for coworkers to understand each other's programs, automates the inclusion of the DR number in results, automates saving the log file, and prompts other standard programming practices we have adopted. It also includes questions to complete for a code audit.

Preserve program logs

Our standard headers automatically generate analysis program log files, documenting how the output was created, and any potential errors. These are saved in that data request's system folder.
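To make the idea concrete, here is a minimal sketch of what the automated part of such a header might do: open a log file named with the DR number inside the request folder, and hand back a footnote for results. The group's headers are written for SAS and R; this Python version, its function names, the log file naming, and the footnote wording are assumptions for illustration only.

```python
import logging
from pathlib import Path


def start_program(dr_number, program_name, request_dir):
    """Open a log file in the request folder and return (logger, footnote)."""
    log_path = Path(request_dir) / f"{dr_number}_{program_name}.log"
    logger = logging.getLogger(f"{dr_number}.{program_name}")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, mode="w")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.info("Started %s for %s", program_name, dr_number)
    footnote = f"Data request {dr_number}"  # to be placed on graphs and in footers
    return logger, footnote


def finish_program(logger):
    """Flush and close the log so the file is complete on disk."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
```

An analysis program would call `start_program` at the top, write warnings or notes through the returned logger as it runs, and call `finish_program` at the end, so the saved log records how the output was produced.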

Code audits

For the most part, each analytic program is reviewed by another member of the Epidemiology staff, as a quality check. This helps identify errors, assures conformance to the group standards, and, most of all, helps us learn coding skills from each other, speeding how quickly folks in the group move up the learning curve.

As with other processes, we tested and modified the audit process a few times to trim it down to its essentials, and continue to modify it to keep it quick but valuable.

Semi-annual training

Twice a year the Epidemiology group meets to review our standard processes, to assure that everyone understands them the same way, and to discuss how we can trim or otherwise improve them. Even after several years, the sessions always produce changes in our processes or follow-up sessions for folks who want more focused training.

Orientation of new staff

Our checklist for orienting new staff includes several items to orient the staff to some of the tools and processes described here. It also includes reviewing a fairly extensive document describing our processes, and another document with information about our main data sets.

We aim for thorough orientation and periodic re-training in the group's standard processes and the use of the group's core tools.

Consumer input

Over the years, we've made several attempts to formalize consumer input about the value of our work products and our services in general. We haven't found a way to do this that has been both informative and easily sustained. We currently have at least annual, informal discussions with key customers, and (sometimes) send out an annual survey. We have tried per-work-product surveys, but haven't maintained them, gotten a good response rate, or found them to be very informative.

Rules versus Guidelines

As they said in Pirates of the Caribbean, these are "more what you'd call 'guidelines' than actual rules." We try to use these tools and processes consistently, but staff have leeway to do what they think is best.

Tools per objective

  1. foster information "self-service": Interactive reporting tool
  2. preserve an information audit trail: Data request ID, Data request database, Version control for analysis programs, Preserve program logs
  3. speed and simplify epidemiologists' work in generating information: Analytic data sets, Poor man's data warehouse, Data request database
  4. increase epidemiologists' productivity: Data request database, Automated request folder setup
  5. improve the quality of information: Analytic data sets, Poor man's data warehouse, Consumer input
  6. minimize duplication of effort: Analytic data sets, Limited number of tools, Standard analysis program headers
  7. minimize disruptions to productivity from position turnover: Limited number of tools, Structured work product storage, Standard analysis program headers, Code audits, Semi-annual training, Orientation of new staff


We have evaluated and improved components of our practices often over the past 15 years. Here are some things that have resulted.


  • Established a user-friendly interactive reporting tool which combines information from almost all MCPHD programs into one interface, through which all MCPHD staff can review their historical operations data up through the prior day's work.
  • Created a library of flexible analysis programs and code snippets which epidemiologists use to quickly fulfill many data requests.
  • Established an "information audit trail" for each work product, which allows us to readily identify and correct the source of any errors.
  • The quality of our output continuously improves as old analysis programs get refined and augmented, as they get re-used.
  • Almost all new analyses re-use existing analysis programs, with simple changes to the time period or stratification categories, or with more modifications to improve the program for future re-use.
  • Our centralized organization of programs and work products makes it much easier than it had been for us to fill in for each other, if an epidemiologist is absent, or for an epidemiologist new to a role to find and re-use the programs of their predecessor.
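The re-use pattern in the list above, where a new request needs only a different time period or stratification variable, can be illustrated with a small parameterized function. This is a generic sketch, not one of the group's library programs (which are written in SAS or R); the function name, record layout, and default date field are assumptions.

```python
from collections import Counter
from datetime import date


def count_by(records, stratify_by, start, end, date_field="event_date"):
    """Count records dated within [start, end] for each value of one variable."""
    counts = Counter(
        rec[stratify_by]
        for rec in records
        if start <= rec[date_field] <= end
    )
    return dict(counts)
```

A new request for a different year or a different breakdown then means changing the arguments (for example `stratify_by="zip"` and a new `start` and `end`) rather than writing a new program.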

We have sustained these practices for up to 15 years, and will continue. We keep working to streamline them. The FTE required for many is pretty minimal, and more than pays for itself in time saved compared with a less organized and less systematic approach.
