A HIGHLY INEFFICIENT METHOD OF DETERMINING STD INCIDENCE RATES PER GENDER PER COUNTY

Upon completing the training courses, my first indulgence in SAS Studio was to write code that, given a spreadsheet of all reported cases, would return calculated STD incidence rates for each Texas county and the male and female populations within them. This soon proved to be quite the challenge for a very silly reason– the Chlamydia datafile I was trying to import (.XLSX of all cases from 2018-2022)was simply too darn big! With a filtered dataset that only contained cases from 2022, I successfully imported a relatively manageable file into SAS, and my program was finally executed to completion (albeit, painfully slow). 

Note: The following code was tweaked to yield incidence rate data tables for Chlamydia each year w/in 2018 – 2021 and for Gonorrhea + Syphilis as well! (update the filename, filepath, and year as desired) 

Here is how I did it:

  1. Import the Texas 2018-2022 Population datafile into the program using PROC IMPORT; I created a PROC SQL step to ensure the file wasn’t already opened/imported within the program. I then used a ‘data’ step to limit the columns imported into the program from the original spreadsheet to just the relevant four (County Co_fips Total TotalMale TotalFemale) and keep the population counts to those from 2022 only. 
  2. I similarly imported the Chlamydia 2022 Excel spreadsheet using PROC IMPORT, incorporating the same PROC SQL step to REFILE the data file if it has already been imported. Then, a new table ‘counts’ was created that would +1 for every row with the same county, and would also distinguish gender by creating a separate count +1 for each row where sex = Male and sex = Female (per county). An essential step was to run a PROC SORT nodupkey to clean up the output data before the incidence rate calculation step could be performed.
  3. Finally, I could run an incidence rate calculating step. I accomplished this using PROC SQL ‘create table’ because I required my output data in a format that could be exported as a tab-delimited file for the next steps. I created a combined rate table, along with three separate male, female, and total tables to be used for other county and gender-specific queries. Here is a snipped from one of the four output tables. I exported each table for use in the next phase of my project… map creation