matplotlib - Python - reading a CSV and grouping data by a column
I am working with a CSV file that has 3 columns, like this:
    timestamp, value, label
    15:22:57, 849, cpu pid=26298:percent
    15:22:57, 461000, jmx mb
    15:22:58, 28683, disks i/o
    15:22:58, 3369078, memory pid=26298:unit=mb:resident
    15:22:58, 0, jmx 31690:gc-time
    15:22:58, 0, cpu pid=26298:percent
    15:22:58, 503000, jmx mb
The 'label' column contains a small number of distinct values (say a total of 5), which include spaces, colons, and other special characters.

What I am trying to achieve is to plot time against each metric (either on the same plot or on separate ones). I can do this with matplotlib, but first I need to group the [timestamp, value] pairs according to the 'label'.

I looked into csv.DictReader to get the labels and itertools.groupby to group by the 'label', but I am struggling to do this in a proper 'pythonic' way.

Any suggestions?

Thanks
You don't need groupby; you want to use collections.defaultdict to collect series of [timestamp, value] pairs keyed by label:
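    from collections import defaultdict
    import csv

    per_label = defaultdict(list)

    # Python 3: open in text mode with newline=''.
    # On Python 2, use open(inputfilename, 'rb') instead.
    with open(inputfilename, newline='') as inputfile:
        reader = csv.reader(inputfile)
        next(reader, None)  # skip the header row

        for timestamp, value, label in reader:
            # Strip the padding around each field and convert the value to a float.
            per_label[label.strip()].append([timestamp.strip(), float(value)])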
Now per_label is a dictionary with the labels as keys and a list of [timestamp, value] pairs as values; I've stripped off whitespace (your input sample has a lot of whitespace) and turned the value column into floats.
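For comparison, here is a rough sketch of what the itertools.groupby route would look like. It is not needed for the defaultdict approach; it only illustrates why groupby is more awkward here: groupby merges only adjacent items with equal keys, so the rows have to be sorted by label first. The `rows` list is a hypothetical stand-in for rows that have already been parsed and stripped.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, already-parsed rows: (timestamp, value, label).
rows = [
    ("15:22:57", 849.0, "cpu pid=26298:percent"),
    ("15:22:57", 461000.0, "jmx mb"),
    ("15:22:58", 0.0, "cpu pid=26298:percent"),
    ("15:22:58", 503000.0, "jmx mb"),
]

by_label = itemgetter(2)

# groupby() only groups consecutive items, so sort by label first.
# sorted() is stable, so rows keep their original order within each label.
grouped = {
    label: [[timestamp, value] for timestamp, value, _ in group]
    for label, group in groupby(sorted(rows, key=by_label), key=by_label)
}
```

The extra sort is the main cost; the defaultdict version makes a single pass over the file and does not care about row order.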
For your (limited) input sample, the defaultdict code results in:
    {'cpu pid=26298:percent': [['15:22:57', 849.0], ['15:22:58', 0.0]],
     'disks i/o': [['15:22:58', 28683.0]],
     'jmx 31690:gc-time': [['15:22:58', 0.0]],
     'jmx mb': [['15:22:57', 461000.0], ['15:22:58', 503000.0]],
     'memory pid=26298:unit=mb:resident': [['15:22:58', 3369078.0]]}
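From a dictionary shaped like this, plotting is mostly a matter of splitting each list of pairs into x and y values. The following is a minimal sketch, not a definitive implementation: the helper names `to_series` and `plot_per_label` are made up for illustration, and it assumes the timestamps are always in HH:MM:SS form (strptime then places them on the default date 1900-01-01, which is fine for a single-day plot).

```python
from datetime import datetime


def to_series(pairs, time_format="%H:%M:%S"):
    """Split [timestamp, value] pairs into parallel lists of datetimes and values."""
    times = [datetime.strptime(timestamp, time_format) for timestamp, _ in pairs]
    values = [value for _, value in pairs]
    return times, values


def plot_per_label(per_label):
    """Draw one line per label on a shared time axis (requires matplotlib)."""
    # Imported lazily so to_series() stays usable without matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.dates import DateFormatter

    fig, ax = plt.subplots()
    for label, pairs in sorted(per_label.items()):
        times, values = to_series(pairs)
        ax.plot(times, values, marker="o", label=label)
    ax.xaxis.set_major_formatter(DateFormatter("%H:%M:%S"))
    ax.set_xlabel("timestamp")
    ax.set_ylabel("value")
    ax.legend()
    fig.autofmt_xdate()
    plt.show()


# Two labels from the sample output, used as stand-in data.
sample = {
    "cpu pid=26298:percent": [["15:22:57", 849.0], ["15:22:58", 0.0]],
    "jmx mb": [["15:22:57", 461000.0], ["15:22:58", 503000.0]],
}
times, values = to_series(sample["jmx mb"])
```

Calling plot_per_label(per_label) puts every metric on one set of axes. Since the metrics in the sample differ by several orders of magnitude, one subplot per label (for example via plt.subplots(nrows=len(per_label), sharex=True)) may turn out to be easier to read.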