Recently I’ve needed to visualize spatial changes in my microbiome data that are easily interpretable by other people. The best solution I’ve come across is simply projecting the data onto a drawing of my organism. Like this:
I’ve used SitePainter to produce these in the past, and in some ways it’s great. I got the figure I wanted and I’ve received a lot of positive feedback about it. The only problem is it has a steep learning curve and is the most dysfunctional GUI I’ve ever come across (generating the figure above took several days of my time, the first time). So when it became necessary to produce over forty more images like it I decided to search for a better way.
My first instinct was that there should be an easy way to do it in R; after all, people produce pretty colored maps in R all the time. I reasoned it shouldn’t be too hard to feed the program an SVG file instead of a map and project my data onto that. It turns out I was wrong. Making a choropleth (the fancy name for a map colored by data) in R is incredibly difficult if your image file does not have latitude and longitude coordinates.
After many hours of searching the internet I was starting to wonder whether I should give up when I found this great tutorial about creating choropleths in Python from an SVG file. It was super helpful, and I’ve used my dubious Python “skills” to modify it slightly to produce many maps instead of just one. If all you need is one map, I highly encourage you to use the original script from the tutorial. It’s a great little script and a lot less messy than my own. If you need to loop through many columns of data and produce many maps at once, my script may be more useful.
So without further ado, I’ll take you through my process to go from this…
The first thing you’ll need is an SVG file. A lot of clipart comes in this format and some vector-editing programs (like Adobe Illustrator or the open source option, Inkscape) will even convert images into vector format. When you’re creating your picture, you’ll need to make sure that every unit you want colored is represented by a single polygon (an object you can “fill” in the program).
Once you’ve finished producing your graphic you’ll need to make sure that each of the polygon paths has a name that corresponds to the sample name you want to color it by. In Inkscape you can do this by editing the ID field of the Object Properties window (to see the Object Properties dialog press Shift+Ctrl+O). As with anything that involves coding, it’s probably a good idea to avoid strange characters in the names you choose. It’s also important that each ID in your picture is unique.
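If you want to double-check the IDs before coloring anything, here is a minimal sketch that lists the ID of every path and reports any that are used more than once. It is written for Python 3 with only the standard library, and the tiny drawing and the function names (path_ids, duplicate_ids) are made up for illustration; they are not part of my script.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tag names in a parsed SVG carry the SVG namespace as a prefix.
SVG_NS = "{http://www.w3.org/2000/svg}"

def path_ids(svg_text):
    """Return the id attribute of every <path> element, in document order."""
    root = ET.fromstring(svg_text)
    return [p.get("id") for p in root.iter(SVG_NS + "path")]

def duplicate_ids(ids):
    """Return the sorted IDs that occur more than once (missing IDs are ignored)."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if i is not None and n > 1)

# Hypothetical three-path drawing in which "leaf_1" was used twice by mistake.
example_svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <path id="leaf_1" d="M0 0 L10 0 L10 10 Z"/>
  <path id="root_1" d="M0 20 L10 20 L10 30 Z"/>
  <path id="leaf_1" d="M0 40 L10 40 L10 50 Z"/>
</svg>"""

ids = path_ids(example_svg)
print(ids)                 # every path ID found in the drawing
print(duplicate_ids(ids))  # IDs that need renaming
```

For a real drawing you would parse the file instead of a string, for example with ET.parse("drawing.svg").getroot(), where the file name is a placeholder.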
The next thing you’ll need is a data file. The file should be a tab-delimited text file (the comments in my script call it a CSV, but it is read with a tab separator). Each row should be a different sample and each column should hold the data you want to project onto the image. In my case, I’m projecting predicted KEGG ortholog counts onto an image of seagrass. The first column of this file should contain the IDs you used to name the polygons and should be titled “SVGID”.
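To make the layout concrete, here is a small sketch of what such a file could look like and how to sanity-check it. The sample names and KO columns are invented, and load_table is a hypothetical helper written for Python 3 with the standard library only; my script reads the file with pandas instead.

```python
import csv
import io

# Hypothetical data: three samples (rows) and two KEGG ortholog columns.
example_tsv = (
    "SVGID\tK00001\tK00002\n"
    "leaf_1\t12\t0\n"
    "leaf_2\t7\t3\n"
    "root_1\t1\t25\n"
)

def load_table(handle):
    """Read a tab-delimited table into {SVGID: {column: float value}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    if not reader.fieldnames or reader.fieldnames[0] != "SVGID":
        raise ValueError('the first column must be titled "SVGID"')
    table = {}
    for row in reader:
        svgid = row.pop("SVGID")
        if svgid in table:
            raise ValueError("duplicate SVGID: %s" % svgid)
        table[svgid] = {col: float(val) for col, val in row.items()}
    return table

table = load_table(io.StringIO(example_tsv))
print(table["root_1"]["K00002"])  # prints 25.0
```

With a real file you would pass an open file handle instead of the io.StringIO wrapper.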
With your data file and your SVG file in hand you’re ready to color your picture. Let’s take a look at the script:
#!/usr/bin/python
### color_map.py

from BeautifulSoup import BeautifulSoup
import pandas as pd

## Map options:
# Map color options:
# greens
#colors = ["#f7fcf5", "#e5f5e0", "#c7e9c0", "#a1d99b", "#74c476", "#41ab5d", "#238b45", "#006d2c", "#00441b"]
# blues
colors = ["#fff7fb", "#ece7f2", "#d0d1e6", "#a6bddb", "#74a9cf", "#3690c0", "#0570b0", "#045a8d", "#023858"]
# blue-purple color-brewer palette
#colors = ["#f7fcfd", "#e0ecf4","#bfd3e6", "#9ebcda","#8c96c6", "#8c6bb1", "#88419d", "#810f7c", "#4d004b"]
# purple
#colors = ["#fcfbfd", "#efedf5", "#dadaeb", "#bcbddc", "#9e9ac8", "#807dba", "#6a51a3", "#54278f", "#3f007d"]

# Reading in the metadata file (tab-delimited) as a pandas data frame:
df = pd.read_csv('/path/to/your/csv/file/here.csv', sep="\t")

# Creating dictionary of IDs and KO values
svgid = df.iloc[:,0]
KO_id = list(df.columns.values)
del KO_id[0:1]  # drop the SVGID column so only data columns remain
dictionary = df.set_index('SVGID').T.to_dict()

# Load the SVG map
svg = open('path/to/your/svg/file/here.svg', 'r').read()

# Load into Beautiful Soup
soup = BeautifulSoup(svg, selfClosingTags=['defs','sodipodi:namedview'])

# Find polygon paths
paths = soup.findAll('path')

# Number of color increments to be used
groups = 9

# Polygon style (the fill color is appended to the end of this string)
path_style = ('font-size:12px;fill-rule:nonzero;stroke:#000000;stroke-opacity:1;stroke-width:1.0;'
              'stroke-miterlimit:4;stroke-dasharray:none;stroke-linecap:butt;marker-start:none;'
              'stroke-linejoin:bevel;fill:')

# Create SVG for each KO
for i in KO_id:
    # Color the polygons based on KO counts
    min_value = float(min(df.loc[:,i]))
    max_value = float(max(df.loc[:,i]))
    increment = (max_value - min_value)/float(groups)
    print(i)
    print("Min = %s" % min_value)
    print("Max = %s" % max_value)
    print("Increment: %s" % increment)

    for p in paths:
        # Skip paths with no ID or with an ID that is not in the data file
        try:
            kegg = dictionary[p["id"]][i]
        except KeyError:
            continue

        if kegg > (max_value-increment):
            color_class = 8
        elif kegg > (max_value-increment*2):
            color_class = 7
        elif kegg > (max_value-increment*3):
            color_class = 6
        elif kegg > (max_value-increment*4):
            color_class = 5
        elif kegg > (max_value-increment*5):
            color_class = 4
        elif kegg > (max_value-increment*6):
            color_class = 3
        elif kegg > (max_value-increment*7):
            color_class = 2
        elif kegg > (max_value-increment*8):
            color_class = 1
        else:
            color_class = 0

        color = colors[color_class]
        p['style'] = path_style + color

    # create file to write to (one SVG per data column):
    svg_name = "/path/to/svg/output/file/choropleth_" + str(i) + ".svg"
    svg_c = open(svg_name,'wt')
    svg_c.write(soup.prettify())
    svg_c.close()
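One caveat: the import line from BeautifulSoup import BeautifulSoup refers to the old BeautifulSoup 3 package, which as far as I know only runs on Python 2. If you are on Python 3, the core step (finding each path by ID and rewriting its style) can be sketched with the standard library instead. This is not the tutorial’s method, just a rough alternative; color_paths, its shortened base_style and the two-path drawing are all invented for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Keep the default SVG namespace unprefixed when the tree is written back out.
ET.register_namespace("", SVG_NS)

def color_paths(svg_text, fills, base_style="stroke:#000000;stroke-width:1.0;fill:"):
    """Return SVG text in which each <path> whose id is a key of `fills` gets that fill."""
    root = ET.fromstring(svg_text)
    for path in root.iter("{%s}path" % SVG_NS):
        fill = fills.get(path.get("id"))
        if fill is None:
            continue  # no data for this polygon, so leave it untouched
        path.set("style", base_style + fill)
    return ET.tostring(root, encoding="unicode")

# Hypothetical two-path drawing; only "leaf_1" has data.
example_svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <path id="leaf_1" d="M0 0 L10 0 L10 10 Z"/>
  <path id="root_1" d="M0 20 L10 20 L10 30 Z"/>
</svg>"""

colored = color_paths(example_svg, {"leaf_1": "#023858"})
print(colored)
```

Files saved by Inkscape carry extra namespaces (sodipodi, inkscape). ElementTree will rename their prefixes on output unless you register them the same way, so test this on a copy of your drawing first.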
There are a couple of things you’ll need to change and some things you may want to edit.
What you must change:
Your data file:
On the line below you’ll want to put the path to your data file in the first set of single quotes.
#Reading in the Metadata CSV file as a pandas data frame:
df = pd.read_csv('/path/to/your/csv/file/here.csv', sep="\t")
Your SVG file:
On the line below you’ll want to put the path to your SVG file in the first set of single quotes.
# Load the SVG map
svg = open('path/to/your/svg/file/here.svg', 'r').read()
What you might want to change:
The palette:
I have several palettes of different colors commented out at the top of the code. To use one, simply change which line is commented out. You can also create your own palettes using the same format: a list of hex color strings ordered from lightest to darkest. The online tool ColorBrewer (http://colorbrewer2.org/) creates palettes for just this kind of use and can be very helpful for finding new palettes.
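If you need a palette with a different number of colors, one rough option is to interpolate between a light and a dark color. The sketch below (Python 3; make_palette and hex_to_rgb are hypothetical helpers, not part of my script) does a plain linear blend in RGB. ColorBrewer palettes are hand-tuned and will usually look better, so treat this as a fallback.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to a tuple of three integers (0-255)."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))

def make_palette(light, dark, steps):
    """Return `steps` hex colors evenly spaced from `light` to `dark`."""
    if steps < 2:
        raise ValueError("a palette needs at least two colors")
    start, end = hex_to_rgb(light), hex_to_rgb(dark)
    palette = []
    for k in range(steps):
        t = k / (steps - 1)  # 0.0 for the first color, 1.0 for the last
        rgb = [round(a + (b - a) * t) for a, b in zip(start, end)]
        palette.append("#%02x%02x%02x" % tuple(rgb))
    return palette

# Nine steps between the two ends of the blues palette used in the script.
colors = make_palette("#fff7fb", "#023858", 9)
print(colors)
```

The resulting list has the same format as the palettes in the script, so it can be assigned to colors directly.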
The name of the output file:
On the line below you can change the folder and the file name prefix of the output. The script appends the name of each data column, so every column gets its own SVG file.
#create file to write to:
svg_name = "/path/to/svg/output/file/choropleth_" + str(i) + ".svg"
The number of bins (recommended for more advanced users only):
The data you provide is separated into a number of equal-width bins for color-coding (each bin covers the same range of values). You can control this number by changing the groups number in this line:
# Number of color increments to be used
groups = my_group_number
I’ve set it at nine, but you can set it to whatever you want. Just remember that you must have the same number of colors in your palette as bins for your data, and you need to change the chain of if/elif comparisons inside the loop accordingly.
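If editing the if/elif chain by hand sounds error-prone, the same comparisons can be written as a short loop that works for any number of bins. This is a sketch rather than what my script does; color_class here is a hypothetical function that mirrors the chain (a value sitting exactly on a boundary falls into the lower bin, as in the original).

```python
def color_class(value, min_value, max_value, groups):
    """Return the bin index (0 .. groups - 1) of `value` in equal-width bins."""
    increment = (max_value - min_value) / float(groups)
    # Walk down from the top bin, exactly like the if/elif chain in the script.
    for k in range(1, groups):
        if value > max_value - increment * k:
            return groups - k
    return 0

# Blues palette from the script; its length must match the number of bins.
colors = ["#fff7fb", "#ece7f2", "#d0d1e6", "#a6bddb", "#74a9cf",
          "#3690c0", "#0570b0", "#045a8d", "#023858"]
groups = 9
assert len(colors) == groups, "palette and bin count must match"

# Made-up counts ranging from 0 to 90, so each bin is 10 wide.
for count in (0, 45, 80, 90):
    print(count, colors[color_class(count, 0.0, 90.0, groups)])
```

Inside the script’s loop, the whole chain would then shrink to a single call such as color_class(kegg, min_value, max_value, groups).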
Once you’re done editing, run your script and enjoy!