file - Write data to disk in Python as a background process

I have a program in Python that does the following:

    for j in xrange(200):
        # 1) compute a bunch of data
        # 2) write the data to disk

1) takes 2-5 minutes
2) takes ~1 minute

Note that there is too much data to keep it all in memory.

Ideally, I would like to write the data to disk in a way that avoids idling the CPU. Is this possible in Python? Thanks!

You could try using multiple processes, like this:

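    import multiprocessing as mp

    def compute(j):
        # compute a bunch of data for iteration j
        return data

    def write(data):
        # write the data to disk (runs in the main process)
        pass

    if __name__ == '__main__':
        pool = mp.Pool()
        for j in xrange(200):
            # queue compute(j); write is called with its return value
            pool.apply_async(compute, args=(j, ), callback=write)
        pool.close()   # no more tasks will be submitted
        pool.join()    # wait for all workers to finish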

pool = mp.Pool() creates a pool of worker processes. By default, the number of workers equals the number of CPU cores the machine has.
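
As a minimal sketch of the default versus an explicit pool size (written for Python 3; square is a hypothetical stand-in task, not part of the original program), the following prints the CPU count that multiprocessing reports and then runs a pool limited to two workers:

```python
import multiprocessing as mp


def square(x):
    # Hypothetical stand-in for real work; must be defined at module level
    # so that it can be pickled and sent to a worker process.
    return x * x


if __name__ == "__main__":
    # The default pool size is derived from the machine's CPU count.
    print("CPUs reported:", mp.cpu_count())

    # Passing processes=... overrides the default.
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))
```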

Each pool.apply_async call queues a task to be run by a worker in the pool of worker processes. When a worker is available, it runs compute(j). When the worker returns a value, data, a thread in the main process runs the callback function write(data), with data being the data returned by the worker.
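
The flow described above can be tried with a small, self-contained sketch. It is written for Python 3 (so range replaces xrange), and several things in it are assumptions for illustration only: compute returns a tiny list instead of real data, write appends to an in-memory list instead of writing to disk, and pool_factory is a hypothetical parameter that lets the same function be run with ThreadPool, which exposes the same apply_async interface as mp.Pool but uses threads.

```python
import multiprocessing as mp
from multiprocessing.pool import ThreadPool  # same interface as mp.Pool, but threads


def compute(j):
    # Stand-in for the expensive computation: returns (j, payload).
    return j, [j * k for k in range(5)]


def run_jobs(n_jobs, pool_factory=mp.Pool):
    """Run compute(j) for each j and collect the results via a callback."""
    written = []

    def write(data):
        # Called in the parent process, on the pool's result-handling thread.
        # A real version would write `data` to disk here.
        written.append(data)

    pool = pool_factory()
    for j in range(n_jobs):
        pool.apply_async(compute, args=(j,), callback=write)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait until all tasks and their callbacks have finished
    return written


if __name__ == "__main__":
    results = run_jobs(8)
    # Completion order is not guaranteed, so sort before printing.
    print(sorted(j for j, _ in results))
```

Because the callback runs in the parent process, it does not need to be picklable; only compute, its arguments, and its return value cross the process boundary.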

Some caveats:

  • The data has to be picklable, since it is being communicated from the worker process to the main process via a Queue.
  • There is no guarantee that the order in which the workers complete their tasks is the same as the order in which the tasks were sent to the pool. So the order in which the data is written to disk may not correspond to j ranging from 0 to 199. One way around this problem would be to write the data to a sqlite (or other kind of) database, with j as one of the fields of data. Then, when you wish to read the data in order, you could SELECT * FROM table ORDER BY j.
  • Using multiple processes will increase the amount of memory required, as data is generated by the worker processes and data waiting to be written to disk accumulates in the Queue. You might be able to reduce the amount of memory required by using NumPy arrays. If that is not possible, then you might have to reduce the number of processes:

    pool = mp.Pool(processes=1)

    That will create one worker process (to run compute), leaving the main process to run write. Since compute takes longer than write, the Queue won't get backed up with more than one chunk of data waiting to be written to disk. However, you would still need enough memory to compute one chunk of data while writing a different chunk of data to disk.

    If you do not have enough memory to do both simultaneously, then you have no choice: your original code, which runs compute and write sequentially, is the only way.
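
The database idea from the second caveat might look roughly like the sketch below (Python 3, standard library only). The table name chunks, its columns, and the helper names are all hypothetical, and the payload is stored as a pickled blob purely for illustration. The connection is opened with check_same_thread=False on the assumption that write_chunk will be called from the pool's callback thread rather than from the thread that opened the connection.

```python
import pickle
import sqlite3


def open_store(path=":memory:"):
    # Hypothetical schema: one row per chunk, keyed by the loop index j.
    # check_same_thread=False allows use from the pool's callback thread.
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks (j INTEGER PRIMARY KEY, payload BLOB)"
    )
    return conn


def write_chunk(conn, j, data):
    # Serialize the chunk and store it under its index j.
    blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO chunks (j, payload) VALUES (?, ?)", (j, blob))


def read_in_order(conn):
    # Rows come back sorted by j, regardless of the order they were written in.
    rows = conn.execute("SELECT j, payload FROM chunks ORDER BY j")
    return [(j, pickle.loads(blob)) for j, blob in rows]


if __name__ == "__main__":
    conn = open_store()
    for j in (2, 0, 1):  # simulate workers finishing out of order
        write_chunk(conn, j, {"j": j})
    print([j for j, _ in read_in_order(conn)])
```

For this to plug into the callback, compute would need to return j alongside the data, for example as a (j, data) tuple.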
