IN WHICH TEXAS COUNTIES ARE WOMEN MORE AFFECTED BY GONORRHEA? PART I

CREATING MAPS IN SAS

I updated the previous code to create an additional table that calculates the difference in incidence rate between the male and female populations in each county.

 

Tab-delimited datalines are read into the new table ‘county_diffrate’, with columns ‘name’, ‘county’, and ‘rateddiff’.

This simple subtraction gave me a column ‘diff’ in my output table that contained either a negative or a positive value. I exported the table as a tab-delimited file for my graph creation program.
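As a rough sketch of that step: the input table `county_rates`, its columns `female_rate` and `male_rate`, the direction of the subtraction, and the output path are all assumptions made for illustration, not the names used in the original program.

```sas
/* Hypothetical input: county_rates, one row per county, with numeric
   columns female_rate and male_rate (assumed names). */
data county_diff;
    set county_rates;
    /* Assumed direction: a positive value means the female rate is higher. */
    diff = female_rate - male_rate;
run;

/* Write the result as a tab-delimited file; the path is a placeholder. */
proc export data=county_diff
    outfile="/path/to/county_diff.txt"
    dbms=tab
    replace;
run;
```

If the subtraction runs the other way (male minus female), the sign of ‘diff’ flips, and so does the meaning of the colors assigned later.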

Let the graph-making begin! 

  1. In a new SAS program, I read the tab-delimited file into a data table I named ‘county_diffrate’ using a DATALINES statement. I made sure to specify the formats of the character and numeric columns.
  2. Map creation is made easy with SAS libraries. Simply reference the desired map data with a DATA desired_map_name; SET maps(gfk).insert_geography_of_interest; step. SAS offers many different sets of open and public-access geospatial data. GfK map data sets contain X, Y, LONG, and LAT values that can be used for annotation (discussed further in Step X).
    • You can use a WHERE statement to limit the map area to the region(s) of interest. For this project, I indicated ‘US-48’ to set my ‘texas_state’ map to include just Texas counties!
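A minimal sketch of this step, assuming the MAPSGFK library is installed and that the state code is stored in the ID1 variable of MAPSGFK.US_COUNTIES (the variable name is my assumption; only the value ‘US-48’ and the table name ‘texas_state’ come from the text above):

```sas
/* Copy the GfK county boundaries, keeping only Texas.
   Assumption: ID1 holds the state code, 'US-48' for Texas. */
data texas_state;
    set mapsgfk.us_counties;
    where id1 = 'US-48';
run;
```

Running PROC CONTENTS on the map data set first is an easy way to confirm which variable actually holds the state code in your installation.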
  3. PROC GMAP creates a map in the results section of a SAS program. I ran PROC UNIVARIATE on my ‘county_diffrate’ data to determine the 1%, 25%, 50%, 75%, and 99% quantiles, which I used to set the 5 midpoints (levels=) for the map legend range. By doing this, counties with a negligible difference between male and female rates [near zero] could be identified and assigned a neutral PATTERN (discussed further in Step 4). It is important to understand the ‘ID’ and ‘CHORO’ statements in order to create the graph you desire.
    • The ID statement identifies the variable(s) in the map and response data sets that define the map areas. My maps.gfk-derived texas_state and my county_diffrate (response) data set share the variable ‘County’, which helps SAS identify the map area to which a response value (each row in county_diffrate) belongs.
      • Note: Every variable that is listed in the ID statement must appear in both the map and response data sets. The variable identified by the id-variable(s) argument can be of type numeric or character and should have the same name, type, and length in both the response and map data sets. 
    • The CHORO statement specifies the response variable(s) that contain the data represented on the map by patterns that fill the map areas. In the case of my desired map, this data was found within the ‘rateddiff’ column of my county_diffrate table. The CHORO statement can also be used to enhance the appearance of the map, modify map area patterns and legend, as well as use an Annotate data set (see Step X).
      • syntax: CHORO response-variable(s) </ option(s)>;
    • Here’s what we have so far
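The pieces described in this step could be put together roughly as follows. This is a sketch rather than the original program: the table and column names come from the text, while LEVELS=5 is a stand-in for the quantile-based legend settings, which are not shown here.

```sas
/* Print the distribution of the rate difference. The default
   PROC UNIVARIATE output includes a quantiles table with the
   1%, 25%, 50%, 75%, and 99% values. */
proc univariate data=county_diffrate;
    var rateddiff;
run;

/* Basic choropleth: COUNTY links each response row to its map area.
   LEVELS=5 asks for five response levels (a stand-in setting). */
proc gmap map=texas_state data=county_diffrate;
    id county;
    choro rateddiff / levels=5;
run;
quit;
```

To control the legend ranges directly, the CHORO statement also accepts a MIDPOINTS= value list in place of LEVELS=.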
  4. Now, I wanted to enhance my map with a custom PATTERN set, a title, a background color, and a border! I adjusted the cdefault option in accordance with my chosen pattern range [See Resources page for helpful color palettes and other design tools]. In this particular case, I wanted my cdefault [counties with no data] to match the color of counties with negligible rate differences between male and female populations. With these changes, my map was now looking far more aesthetically pleasing AND informative!
    • Counties colored light pink indicated a slightly higher gonorrhea incidence rate in females, and the darker pink indicated a significantly higher rate in females; likewise, the light green and darker green indicated higher rates in males.
    • Counties colored beige were those with no data [zero reported cases] or a negligible difference in incidence rate between males and females.
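One way these enhancements might look in code is sketched below. The hex colors, the title text, the white background, and LEVELS=5 are placeholders of mine, and the pattern order assumes that positive ‘rateddiff’ values mean a higher rate in females.

```sas
/* Reset graphics options, then draw a border around the output
   and set a background color (both values are placeholders). */
goptions reset=all border cback=white;

title "Difference in gonorrhea incidence rate by sex, Texas counties";

/* Patterns are applied to the response levels from lowest to highest.
   Assumption: negative = higher in males, positive = higher in females. */
pattern1 value=msolid color=cx4DAC26; /* dark green  */
pattern2 value=msolid color=cxB8E186; /* light green */
pattern3 value=msolid color=cxF5F5DC; /* beige       */
pattern4 value=msolid color=cxF1B6DA; /* light pink  */
pattern5 value=msolid color=cxD01C8B; /* dark pink   */

proc gmap map=texas_state data=county_diffrate;
    id county;
    /* CDEFAULT= colors counties with no response data; here it
       matches the neutral (beige) middle level. */
    choro rateddiff / levels=5 cdefault=cxF5F5DC coutline=gray;
run;
quit;
```

Matching CDEFAULT= to the middle pattern keeps counties without data visually neutral, so they are not read as a male or female excess.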
  5. My satisfaction did not last long, and my fingers were itching to incorporate further enhancements to the graph. I wanted to create a border around the 11 public health regions of Texas in addition to the already defined counties, and I wanted to label major cities to make the map easier to decipher. Follow along as I work through these tasks in Part II.