Visualizing Microbiome Data: Choropleth Style

Recently I’ve needed to visualize spatial changes in my microbiome data in a way that is easy for other people to interpret. The best solution I’ve come across is simply projecting the data onto a drawing of my organism. Like this:

Figure: Alpha diversity of samples from pieces of a seagrass plant (Zostera marina) projected onto a drawing of that plant.

I’ve used SitePainter to produce these in the past, and in some ways it’s great. I got the figure I wanted and I’ve received a lot of positive feedback about it. The only problem is it has a steep learning curve and is the most dysfunctional GUI I’ve ever come across (generating the figure above took several days of my time, the first time). So when it became necessary to produce over forty more images like it I decided to search for a better way.

My first instinct was that there should be an easy way to do it in R; after all, people produce pretty colored maps in R all the time. I reasoned it shouldn’t be too hard to feed the program an SVG file instead of a map and project my data onto that. It turns out I was wrong. Making a choropleth (the fancy name for a map colored by data) is incredibly difficult if your image file does not have latitude and longitude coordinates.

After many hours of searching the internet I was starting to wonder whether I should give up when I found this great tutorial about creating choropleths in Python from an SVG file. It was super helpful and I’ve used my dubious Python “skills” to modify it slightly to produce many maps instead of just one. If all you need is one map, I highly encourage you to use the original script from the tutorial. It’s a great little script and a lot less messy than my own. If you need to loop through many columns of data and produce many maps at once, my script may be more useful.

So without further ado, I’ll take you through my process to go from an SVG drawing and a table of data to a colored map like the one I made for k00390.

The first thing you’ll need is an SVG file. A lot of clipart comes in this format and some vector-editing programs (like Adobe Illustrator or the open source option, Inkscape) will even convert images into vector format. When you’re creating your picture, you’ll need to make sure that every unit you want colored is represented by a single polygon (an object you can “fill” in the program).

Once you’ve finished producing your graphic you’ll need to make sure that each of the polygon paths has a name that corresponds to the sample name you want to color it by. In Inkscape you can do this by editing the ID field of the Object Properties window (to see the Object Properties dialog press Shift+Ctrl+O). As with anything that involves coding, it’s probably a good idea to avoid strange characters in the names you choose. It’s also important that each ID in your picture is unique.

Naming objects with sample IDs in Inkscape.
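
Before moving on, it can help to check which IDs actually ended up in the SVG file. The snippet below is only a sketch using Python’s standard library, not part of the script in this post; the helper names path_ids and duplicate_ids and the tiny example SVG are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# SVG elements live in this XML namespace
SVG_NS = '{http://www.w3.org/2000/svg}'

def path_ids(svg_text):
    """Return the id of every <path> element that has one (hypothetical helper)."""
    root = ET.fromstring(svg_text)
    return [p.get('id') for p in root.iter(SVG_NS + 'path') if p.get('id')]

def duplicate_ids(ids):
    """Return the IDs that appear more than once (hypothetical helper)."""
    return sorted(name for name, count in Counter(ids).items() if count > 1)

# Made-up example drawing with one ID used twice by mistake
example_svg = '''<svg xmlns="http://www.w3.org/2000/svg">
  <path id="leaf1" d="M0 0 L1 0 L1 1 Z"/>
  <path id="leaf2" d="M2 0 L3 0 L3 1 Z"/>
  <path id="leaf2" d="M4 0 L5 0 L5 1 Z"/>
</svg>'''

ids = path_ids(example_svg)
print(ids)
print(duplicate_ids(ids))
```

For a real drawing you would read the file contents into svg_text instead of using the example string.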


The next thing you’ll need is a data file. The file should be a tab-delimited text file. Each row should be a different sample and columns should represent the data you want to project onto the image. In my case, I’m projecting predicted KEGG ortholog counts onto an image of seagrass. The first column of this file should contain the IDs you used to name the polygons and should be titled “SVGID”.
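
To make the expected layout concrete, here is a small sketch that reads a made-up table in this format and builds the same nested dictionary the script uses. The sample names (leaf1, leaf2, root1) and the counts are invented for illustration; only the “SVGID” header and the pandas calls come from the script in this post.

```python
import io
import pandas as pd

# Made-up tab-delimited data: one row per sample, one column per KO
example = (u"SVGID\tK00390\tK00001\n"
           u"leaf1\t12\t0\n"
           u"leaf2\t3\t7\n"
           u"root1\t0\t5\n")

df = pd.read_csv(io.StringIO(example), sep="\t")

# Basic sanity checks on the layout described above
if df.columns[0] != "SVGID":
    raise ValueError("the first column must be titled SVGID")
if not df["SVGID"].is_unique:
    raise ValueError("each SVGID may only appear once")

# Every column after the first is one map to draw
ko_ids = list(df.columns[1:])

# Nested lookup: dictionary[sample ID][KO] -> value
dictionary = df.set_index("SVGID").T.to_dict()
print(ko_ids)
print(dictionary["leaf1"]["K00390"])
```

To check a real file, pass its path to pd.read_csv in place of the io.StringIO object.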


With your data file and your SVG file in hand you’re ready to color your picture. Let’s take a look at the script:

#!/usr/bin/python
### color_map.py

from BeautifulSoup import BeautifulSoup
import pandas as pd

##Map Options:

#Map color Options:

#greens
#colors = ["#f7fcf5", "#e5f5e0", "#c7e9c0", "#a1d99b", "#74c476", "#41ab5d", "#238b45", "#006d2c", "#00441b"]

#blues
colors = ["#fff7fb", "#ece7f2", "#d0d1e6", "#a6bddb", "#74a9cf", "#3690c0", "#0570b0", "#045a8d", "#023858"]

#blue-purple color-brewer palette
#colors = ["#f7fcfd", "#e0ecf4","#bfd3e6", "#9ebcda","#8c96c6", "#8c6bb1", "#88419d", "#810f7c", "#4d004b"]

#purple
#colors = ["#fcfbfd", "#efedf5", "#dadaeb", "#bcbddc", "#9e9ac8", "#807dba", "#6a51a3", "#54278f", "#3f007d"]

#Reading in the Metadata CSV file as a pandas data frame:
df = pd.read_csv('/path/to/your/csv/file/here.csv', sep="\t")

#Creating dictionary of IDs and KO values
svgid = df.iloc[:,0]
KO_id = list(df.columns.values)
del KO_id[0:1]
dictionary = df.set_index('SVGID').T.to_dict()

# Load the SVG map
svg = open('path/to/your/svg/file/here.svg', 'r').read()

# Load into Beautiful Soup
soup = BeautifulSoup(svg, selfClosingTags=['defs','sodipodi:namedview'])

# Find polygon paths
paths = soup.findAll('path')

# Number of color increments to be used
groups = 9 

# Polygon style
path_style = 'font-size:12px;fill-rule:nonzero;stroke:#000000;stroke-opacity:1;stroke-width:1.0;\
stroke-miterlimit:4;stroke-dasharray:none;stroke-linecap:butt;marker-start:none;\
stroke-linejoin:bevel;fill:'

# Create SVG for each KO
for i in KO_id:
    # Color the polygons based on KO counts
    min_value = float(min(df.loc[:,i]))
    max_value = float(max(df.loc[:,i]))
    increment = (max_value - min_value)/float(groups)
    print i
    print "Min = %s" %min_value
    print "Max = %s" %max_value
    print "Increment: %s" %increment

    for p in paths:

        try:
            kegg = dictionary[p["id"]][i]
        except KeyError:
            # Skip paths with no ID or no matching row in the data file
            continue

        if kegg > (max_value-increment):
            color_class = 8
        elif kegg > (max_value-increment*2):
            color_class = 7
        elif kegg > (max_value-increment*3):
            color_class = 6
        elif kegg > (max_value-increment*4):
            color_class = 5
        elif kegg > (max_value-increment*5):
            color_class = 4
        elif kegg > (max_value-increment*6):
            color_class = 3
        elif kegg > (max_value-increment*7):
            color_class = 2
        elif kegg > (max_value-increment*8):
            color_class = 1
        else:
            color_class = 0

        color = colors[color_class]
        p['style'] = path_style + color

    # Write one SVG per column, after every polygon has been colored
    svg_name = "/path/to/svg/output/file/choropleth_" + str(i) + ".svg"
    svg_c = open(svg_name,'wt')
    svg_c.write(soup.prettify())
    svg_c.close()
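
A note on versions: the script above imports BeautifulSoup 3 and uses print statements, so it needs Python 2. If that isn’t available to you, the coloring step can be roughly sketched with Python 3’s standard library instead. This is not the method used in this post, just an illustration; the function name fill_paths, the example SVG, and the shortened style string are all made up.

```python
import xml.etree.ElementTree as ET

SVG_NS = 'http://www.w3.org/2000/svg'
# Keep the default SVG namespace free of ns0: prefixes when writing out
ET.register_namespace('', SVG_NS)

def fill_paths(svg_text, fills, base_style='stroke:#000000;stroke-width:1.0;fill:'):
    """Set the style of each <path> whose id is in `fills` (hypothetical helper).

    `fills` maps a path ID to a hex color. Paths without a matching ID
    are left untouched.
    """
    root = ET.fromstring(svg_text)
    for path in root.iter('{%s}path' % SVG_NS):
        color = fills.get(path.get('id'))
        if color is not None:
            path.set('style', base_style + color)
    return ET.tostring(root, encoding='unicode')

# Made-up drawing with two polygons; only leaf1 has data
example_svg = '''<svg xmlns="http://www.w3.org/2000/svg">
  <path id="leaf1" d="M0 0 L1 0 L1 1 Z"/>
  <path id="leaf2" d="M2 0 L3 0 L3 1 Z"/>
</svg>'''

colored = fill_paths(example_svg, {'leaf1': '#023858'})
print(colored)
```

One caveat with this approach: files saved by Inkscape carry extra namespaces, and ElementTree renames any prefix it has not been told about (to ns0, ns1 and so on). The output is still valid XML, but it will look different from the original file.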

Before you run color_map.py, there are a couple of things you’ll need to change and some things you may want to edit.

What you must change:

Your data file:
On the line below you’ll want to put the path to your data file in the first set of single quotes.

#Reading in the Metadata CSV file as a pandas data frame:
df = pd.read_csv('/path/to/your/csv/file/here.csv', sep="\t")

Your SVG file:
On the line below you’ll want to put the path to your SVG file in the first set of single quotes.

# Load the SVG map
svg = open('path/to/your/svg/file/here.svg', 'r').read()

What you might want to change:

The palette:
I have several palettes of different colors commented out at the top of the code. To use one simply change which line is commented out. You can also create your own palettes using the same format. The online tool Color Brewer (http://colorbrewer2.org/) creates palettes for just this kind of use and can be very helpful for finding new palettes.
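
If you want a quick palette of your own with a different number of colors, one rough option is to blend between two hex colors. The sketch below does a plain linear blend in RGB, which is not how the Color Brewer palettes are designed, so treat the result as a starting point. The function names hex_to_rgb and make_palette are made up for illustration.

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    h = hex_color.lstrip('#')
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def make_palette(start, end, n):
    """Return n hex colors blended linearly from start to end (hypothetical helper)."""
    s = hex_to_rgb(start)
    e = hex_to_rgb(end)
    palette = []
    for step in range(n):
        # t runs from 0.0 (start color) to 1.0 (end color)
        t = step / float(n - 1) if n > 1 else 0.0
        rgb = tuple(int(round(a + (b - a) * t)) for a, b in zip(s, e))
        palette.append('#%02x%02x%02x' % rgb)
    return palette

# Nine steps between the lightest and darkest of the blues used in this post
colors = make_palette('#fff7fb', '#023858', 9)
print(colors)
```

The list has the same format as the palettes in the script, with the lightest color first.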

The name of the output file:
On the line below you’ll want to put the path to the folder where the maps should be saved. The script adds the name of each data column after “choropleth_” to name the files.

        #create file to write to:
        svg_name = "/path/to/svg/output/file/choropleth_" + str(i) + ".svg"

The number of bins (recommended for more advanced users only):
The data you provide is separated into a number of equally-sized bins for color-coding. You can control this number by changing the groups number in this line:

# Number of color increments to be used
groups = my_group_number

I’ve set it at nine, but you can set it to whatever you want. Just remember that you must have the same number of colors in your palette as bins for your data, and you need to change the if/elif chain inside the loop accordingly.
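
If you do change the number of bins, rewriting the if/elif chain by hand gets tedious. Here is a sketch of a small function that applies the same comparisons as the chain in the script for any number of groups. The function name color_class_for is made up, and this is my reading of the chain rather than code from the original script.

```python
def color_class_for(value, min_value, max_value, groups):
    """Return the palette index (0 to groups - 1) for value (hypothetical helper).

    Mirrors the if/elif chain: test value > max - increment*k for
    k = 1, 2, ... and use color class groups - k on the first match.
    """
    increment = (max_value - min_value) / float(groups)
    for k in range(1, groups):
        if value > max_value - increment * k:
            return groups - k
    # Nothing matched, so the value falls in the lowest bin
    return 0

# With nine groups between 0 and 9, each bin is one unit wide
print(color_class_for(9, 0.0, 9.0, 9))    # top bin
print(color_class_for(1.5, 0.0, 9.0, 9))  # second-lowest bin
print(color_class_for(0, 0.0, 9.0, 9))    # lowest bin
```

Inside the script’s inner loop this would stand in for the whole chain, for example color_class = color_class_for(kegg, min_value, max_value, groups), with the palette still needing one color per group.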

Once you’re done editing, run your script and enjoy!


About Hannah Holland-Moritz

Hannah Holland-Moritz is a graduate student working in Noah Fierer’s lab. She graduated from UC Davis in June 2014 with a major in Biochemistry and Molecular Biology and a minor in Bioinformatics, and most recently spent a gap year working in Jonathan Eisen’s lab on the microbiome of seagrasses. Interested in Evolution, Ecology, Bioinformatics and all things microbial, she plans to pursue a career in research.