matplotlib - Python - reading a CSV and grouping data by a column

- June 15, 2013

I am working with a CSV file that has 3 columns, like this:

timestamp, value, label
15:22:57, 849, cpu pid=26298:percent
15:22:57, 461000, jmx mb
15:22:58, 28683, disks i/o
15:22:58, 3369078, memory pid=26298:unit=mb:resident
15:22:58, 0, jmx 31690:gc-time
15:22:58, 0, cpu pid=26298:percent
15:22:58, 503000, jmx mb

The 'label' column contains a small set of distinct values (say a total of 5), which include spaces, colons, and other special characters.

What I am trying to achieve is to plot time against each metric (either on the same plot or on separate ones). I can do this with matplotlib, but first I need to group the [timestamp, value] pairs according to the 'label'.

I looked into csv.DictReader to get the labels and itertools.groupby to group by the 'label', but I am struggling to do this in a proper 'pythonic' way.

Any suggestions?

Thanks

You don't need groupby; you want to use collections.defaultdict to collect a series of [timestamp, value] pairs keyed by label:

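from collections import defaultdict
import csv

per_label = defaultdict(list)

# In Python 3 the csv module expects a text-mode file opened with newline=''
with open(inputfilename, newline='') as inputfile:
    reader = csv.reader(inputfile)
    next(reader, None)  # skip the header row

    for timestamp, value, label in reader:
        # strip the padding around each field and convert the value to a float
        per_label[label.strip()].append([timestamp.strip(), float(value)])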

Now per_label is a dictionary with the labels as keys and a list of [timestamp, value] pairs as values; I've stripped off the whitespace (your input sample has a lot of whitespace) and turned the value column into floats.

For your (limited) input sample this results in:

{'cpu pid=26298:percent': [['15:22:57', 849.0], ['15:22:58', 0.0]],
 'disks i/o': [['15:22:58', 28683.0]],
 'jmx 31690:gc-time': [['15:22:58', 0.0]],
 'jmx mb': [['15:22:57', 461000.0], ['15:22:58', 503000.0]],
 'memory pid=26298:unit=mb:resident': [['15:22:58', 3369078.0]]}
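Since the goal was to plot each metric against time, here is a rough sketch of how the per_label dictionary could be handed to matplotlib. The helper names to_series and plot_per_label are made up for illustration, and the sketch assumes every timestamp is in HH:MM:SS form as in the sample (strptime then places them all on 1900-01-01, which is fine for a same-day plot).

```python
from datetime import datetime


def to_series(pairs):
    """Split [timestamp, value] pairs into parallel lists of datetimes and values.

    Hypothetical helper; assumes timestamps look like '15:22:57'.
    """
    times = [datetime.strptime(ts, "%H:%M:%S") for ts, _ in pairs]
    values = [value for _, value in pairs]
    return times, values


def plot_per_label(per_label):
    """Draw one line per label on a shared time axis and return the figure."""
    # Imported here so that to_series stays usable without matplotlib installed
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates

    fig, ax = plt.subplots()
    for label, pairs in sorted(per_label.items()):
        times, values = to_series(pairs)
        ax.plot(times, values, marker="o", label=label)

    # Show only the time of day on the x axis
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
    ax.set_xlabel("time")
    ax.set_ylabel("value")
    ax.legend()
    fig.autofmt_xdate()
    return fig


# A small sample shaped like the per_label dictionary above
sample = {
    "cpu pid=26298:percent": [["15:22:57", 849.0], ["15:22:58", 0.0]],
    "jmx mb": [["15:22:57", 461000.0], ["15:22:58", 503000.0]],
}
times, values = to_series(sample["jmx mb"])
```

Calling plot_per_label(per_label) followed by matplotlib.pyplot.show() should display the chart. Your metrics are on very different scales (percentages next to megabytes), so one subplot per label may read better than a shared axis; that is a judgment call, not something the data requires.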
