So, for a bit, I kept messing up nested list comprehensions when I tried to use them. Since I’m usually writing in Python to Get All The Things Done, my usual response is, “Oh, I should figure out what I’m doing wrong sometime, but I’ll just do a while loop or use reduce or whatever for now.”
But the other day, I happened on a nice concrete example that clarified things for me, so I figured, maybe I should link it, and share another such example.
Suppose we have an XML file that contains timing results for requests to groups of related URLs (I have such a file from my attempts to benchmark a rewritten server at work). In this case, there are two levels of grouping: one for requests that correspond to a particular geospatial location, and one for a pair of related queries for two different types of information about that location (specifically, a KML request and a request for the imagery corresponding to that particular KML file). The XML looks sort of like this:
<location>
  <tile>
    <request type="kml" duration="100000" bytes="3150">
      <![CDATA[url]]>
    </request>
    <request type="img" duration="102500" bytes="5316">
      <![CDATA[url]]>
    </request>
  </tile>
  ...
</location>
...
(Duration is in microseconds, if you’re wondering why it’s so freaking huge.)
Well, suppose I want to load all that data into Excel or Matlab or something. The easiest way to do that is to convert all that data into CSV. But I want to make sure to preserve an easy way to keep track of the location grouping and the tile grouping, since the whole point is to allow arbitrary analysis of the data set, and those attributes may be important to some particular statistics the analyst may want.
I can do something like the following to generate the appropriate CSV file:
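from lxml import etree as ET

def make_csv(xmlfile):
    # Columns: location index, tile index, type, duration, bytes, URL
    csvline = '{0},{1},{2},{3},{4},{5}'
    locations = ET.parse(xmlfile).findall('.//location')
    # The for clauses read like nested for loops:
    # outermost (locations) first, innermost (requests) last.
    return '\n'.join([
        csvline.format(i, j,
                       req.get('type', ''),
                       req.get('duration', ''),
                       req.get('bytes', ''),
                       req.text if req.text else '')
        for i, loc in enumerate(locations, 1)   # 1-based location index
        for j, tile in enumerate(loc, 1)        # 1-based tile index within the location
        for req in tile
    ])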
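For what it's worth, the thing that makes this click for me is that the for clauses in the comprehension above are in the same order you'd write them as plain nested loops. Here's a rough sketch of that comparison. To keep it self-contained I'm assuming the standard library's xml.etree.ElementTree in place of lxml, and the sample data, the `<results>` root element, and the names `rows_with_loops` and `rows_with_comprehension` are all made up for illustration.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml (assumption)

# Made-up sample data; the <results> root element is an assumption.
SAMPLE = """<results>
  <location>
    <tile>
      <request type="kml" duration="100000" bytes="3150"><![CDATA[url-a]]></request>
      <request type="img" duration="102500" bytes="5316"><![CDATA[url-b]]></request>
    </tile>
  </location>
  <location>
    <tile>
      <request type="kml" duration="90000" bytes="2000"><![CDATA[url-c]]></request>
    </tile>
  </location>
</results>"""

CSVLINE = '{0},{1},{2},{3},{4},{5}'


def rows_with_loops(locations):
    """Build the CSV rows with explicit nested for loops."""
    rows = []
    for i, loc in enumerate(locations, 1):      # outermost loop
        for j, tile in enumerate(loc, 1):       # middle loop
            for req in tile:                    # innermost loop
                rows.append(CSVLINE.format(
                    i, j,
                    req.get('type', ''),
                    req.get('duration', ''),
                    req.get('bytes', ''),
                    req.text if req.text else ''))
    return rows


def rows_with_comprehension(locations):
    """Same traversal; the for clauses keep the same order as the loops."""
    return [CSVLINE.format(i, j,
                           req.get('type', ''),
                           req.get('duration', ''),
                           req.get('bytes', ''),
                           req.text if req.text else '')
            for i, loc in enumerate(locations, 1)
            for j, tile in enumerate(loc, 1)
            for req in tile]


locations = ET.fromstring(SAMPLE).findall('.//location')
loop_rows = rows_with_loops(locations)
comp_rows = rows_with_comprehension(locations)
print('\n'.join(comp_rows))
```

Both functions should give the same rows in the same order, so the mental trick is: write the nested loops, then move the innermost expression to the front and drop the colons.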
I’m not sure this is necessarily the most efficient way for me to do this, but oh well. Hopefully, this will both help me remember how to use nested comprehensions, and make it easier for someone else trying to figure out nested list comprehensions in the future (via the linked post if nothing else).
© Copyright 2010