performance - Why is computing point distances so slow in Python?


My Python program was too slow. So I profiled it and found that most of the time was being spent in a function that computes the distance between two points (a point is a list of 3 Python floats):

    import math

    def get_dist(pt0, pt1):
        # Euclidean distance between two 3-element points
        val = 0
        for i in range(3):
            val += (pt0[i] - pt1[i]) ** 2
        val = math.sqrt(val)
        return val
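For readers who want to reproduce the profiling step, here is a minimal sketch of how a hot spot like `get_dist` might be found with the standard `cProfile` module. It is written for Python 3; the `workload` driver and its sizes are hypothetical stand-ins for the real program, which is not shown in the question.

```python
import cProfile
import io
import math
import pstats
import random


def get_dist(pt0, pt1):
    """Euclidean distance between two 3-element points (same logic as above)."""
    val = 0.0
    for i in range(3):
        val += (pt0[i] - pt1[i]) ** 2
    return math.sqrt(val)


def workload(num=20000):
    """Hypothetical driver: sum the distances between random pairs of points."""
    rng = random.Random(0)
    pts = [[rng.random() for _ in range(3)] for _ in range(num)]
    return sum(get_dist(pts[i], pts[rng.randrange(num)]) for i in range(num))


def profile_report():
    """Run workload() under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

In the report, the `ncalls` and `cumtime` columns for `get_dist` show how often it is called and how much of the run it accounts for.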

To analyze why this function was so slow, I wrote two test programs, one in Python and one in C++, that do a similar computation. They compute the distance between 1 million pairs of points. (The test code in Python and C++ is below.)

The Python computation takes 2 seconds, while the C++ computation takes 0.02 seconds. A 100x difference!

Why is Python code so much slower than C++ code for such simple math computations? How do I speed it up to match the C++ performance?

The Python code used for testing:

    import math, random, time

    num = 1000000

    # Generate random points and numbers

    pt_list = []
    rand_list = []

    for i in range(num):
        pt = []
        for j in range(3):
            pt.append(random.random())
        pt_list.append(pt)
        rand_list.append(random.randint(0, num - 1))

    # Compute

    beg_time = time.clock()
    dist = 0

    for i in range(num):
        pt0 = pt_list[i]
        ri  = rand_list[i]
        pt1 = pt_list[ri]

        val = 0
        for j in range(3):
            val += (pt0[j] - pt1[j]) ** 2
        val = math.sqrt(val)

        dist += val

    end_time = time.clock()
    elap_time = (end_time - beg_time)

    print elap_time
    print dist

The C++ code used for testing:

    #include <cstdlib>
    #include <iostream>
    #include <ctime>
    #include <cmath>

    struct Point
    {
        double v[3];
    };

    int num = 1000000;

    int main()
    {
        // Allocate memory
        Point** pt_list = new Point*[num];
        int* rand_list = new int[num];

        // Generate random points and numbers
        for ( int i = 0; i < num; ++i )
        {
            Point* pt = new Point;

            for ( int j = 0; j < 3; ++j )
            {
                const double r = (double) rand() / (double) RAND_MAX;
                pt->v[j] = r;
            }

            pt_list[i] = pt;
            rand_list[i] = rand() % num;
        }

        // Compute

        clock_t beg_time = clock();
        double dist = 0;
        for ( int i = 0; i < num; ++i )
        {
            const Point* pt0 = pt_list[i];
            int r = rand_list[i];
            const Point* pt1 = pt_list[r];

            double val = 0;
            for ( int j = 0; j < 3; ++j )
            {
                const double d = pt0->v[j] - pt1->v[j];
                val += ( d * d );
            }

            val = sqrt(val);
            dist += val;
        }
        clock_t end_time = clock();
        double sec_time = (end_time - beg_time) / (double) CLOCKS_PER_SEC;

        std::cout << sec_time << std::endl;
        std::cout << dist << std::endl;

        return 0;
    }

A sequence of optimizations:

The original code, with small changes

    import math, random, time

    num = 1000000

    # Generate random points and numbers

    # Change #1: Sometimes it's good not to have too much randomness.
    # This is one of those cases.
    # Changing the code shouldn't change the results.
    # Using a fixed seed ensures that the changes are valid.
    # The final 'print dist' should yield the same result regardless of optimizations.
    # Note: There's nothing magical about this seed.
    # I randomly picked a hash tag from a git log.
    random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)

    pt_list = []
    rand_list = []

    for i in range(num):
        pt = []
        for j in range(3):
            pt.append(random.random())
        pt_list.append(pt)

    # Change #2: rand_list is computed in a separate loop.
    # This ensures that upcoming optimizations will get the same results as
    # this unoptimized version.
    for i in range(num):
        rand_list.append(random.randint(0, num - 1))

    # Compute

    beg_time = time.clock()
    dist = 0

    for i in range(num):
        pt0 = pt_list[i]
        ri  = rand_list[i]
        pt1 = pt_list[ri]

        val = 0
        for j in range(3):
            val += (pt0[j] - pt1[j]) ** 2
        val = math.sqrt(val)

        dist += val

    end_time = time.clock()
    elap_time = (end_time - beg_time)

    print elap_time
    print dist


Optimization #1: Put the code in a function.

The first optimization (not shown) is to embed all of the code except the import in a function. This simple change offers a 36% performance boost on my computer.
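Since this step is not shown, here is a minimal sketch of what it might look like, written for Python 3 (so `time.perf_counter` and `range` stand in for `time.clock` and `xrange`). The helper names `make_data` and `total_distance` and the default seed are illustrative, not taken from the original code. The usual explanation for the gain is that CPython reads function locals faster than module-level globals; the size of the gain will vary with machine and interpreter version.

```python
import math
import random
import time


def make_data(num, seed=12345):
    """Build the point list and the list of random partner indices."""
    rng = random.Random(seed)
    pt_list = [[rng.random() for _ in range(3)] for _ in range(num)]
    rand_list = [rng.randrange(num) for _ in range(num)]
    return pt_list, rand_list


def total_distance(pt_list, rand_list):
    """Same computation as the original loop, but every name is a local."""
    dist = 0.0
    for i in range(len(pt_list)):
        pt0 = pt_list[i]
        pt1 = pt_list[rand_list[i]]
        val = 0.0
        for j in range(3):
            val += (pt0[j] - pt1[j]) ** 2
        dist += math.sqrt(val)
    return dist


def main(num=100000):
    """Time only the distance computation, as the original script does."""
    pt_list, rand_list = make_data(num)
    beg_time = time.perf_counter()
    dist = total_distance(pt_list, rand_list)
    elap_time = time.perf_counter() - beg_time
    return elap_time, dist


if __name__ == "__main__":
    elap_time, dist = main()
    print(elap_time)
    print(dist)
```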


Optimization #2: Eschew the ** operator.

You don't use pow(d,2) in your C code because everyone knows that is suboptimal in C. It's also suboptimal in Python. Python's ** is smart; it evaluates x**2 as x*x. However, it takes time to be smart. You know you want d*d, so use it. Here's the computation loop with that optimization:

    for i in range(num):
        pt0 = pt_list[i]
        ri  = rand_list[i]
        pt1 = pt_list[ri]

        val = 0

        for j in range(3):
            d = pt0[j] - pt1[j]
            val += d*d

        val = math.sqrt(val)

        dist += val
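To check this on your own interpreter, a small `timeit` comparison along these lines can help. This is a sketch for Python 3; the function names and the test value 0.37 are arbitrary choices, and the relative timings will differ between Python versions.

```python
import timeit


def square_with_pow(d):
    """Square using the ** operator."""
    return d ** 2


def square_with_mul(d):
    """Square using plain multiplication."""
    return d * d


def compare(number=200000):
    """Return (seconds for d ** 2, seconds for d * d) over `number` runs.

    d is bound in the setup string rather than written as a literal, so the
    compiler cannot fold the whole expression into a constant.
    """
    t_pow = timeit.timeit("d ** 2", setup="d = 0.37", number=number)
    t_mul = timeit.timeit("d * d", setup="d = 0.37", number=number)
    return t_pow, t_mul


if __name__ == "__main__":
    t_pow, t_mul = compare()
    print("d ** 2:", t_pow)
    print("d * d :", t_mul)
```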


Optimization #3: Be pythonic.

Your Python code looks a whole lot like your C code. You aren't taking advantage of the language.

    import math, random, time, itertools

    def main (num=1000000) :
        # This small optimization speeds things up by a couple percent.
        sqrt = math.sqrt

        # Generate random points and numbers

        random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)

        def random_point () :
            return [random.random(), random.random(), random.random()]

        def random_index () :
           return random.randint(0, num-1)

        # Big optimization:
        # Build the lists with list comprehensions instead of append loops,
        # and look up each random partner point once, up front, so the timed
        # loop below does no indexing by position.
        # Note that these comprehensions still create full lists, not
        # iterators: pt_list has to be a real list because it is indexed at
        # random on the next line.
        pt_list = [random_point() for x in xrange(num)]
        rand_pts = [pt_list[random_index()] for x in xrange(num)]

        # Compute

        beg_time = time.clock()
        dist = 0

        # Don't loop on a range. That's C-like.
        # Instead loop on an iterable, preferably one that doesn't create the
        # collection on which the iteration is to occur.
        # This is particularly important when the collection is large.
        for (pt0, pt1) in itertools.izip (pt_list, rand_pts) :

            # Small optimization: the inner loop is inlined,
            # and the intermediate variable 'val' is eliminated.
            d0 = pt0[0]-pt1[0]
            d1 = pt0[1]-pt1[1]
            d2 = pt0[2]-pt1[2]

            dist += sqrt(d0*d0 + d1*d1 + d2*d2)

        end_time = time.clock()
        elap_time = (end_time - beg_time)

        print elap_time
        print dist

    main()
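The listing above targets Python 2. For anyone running Python 3, a rough port might look like the sketch below: `itertools.izip` and `xrange` no longer exist (the built-in `zip` and `range` are lazy), `time.clock` was removed in Python 3.8 in favour of `time.perf_counter`, and `print` is a function. The split into `sum_distances` and `main`, and the `seed` parameter, are choices made for this sketch, so its numeric result will not match the Python 2 output.

```python
import math
import random
import time


def sum_distances(pt_list, rand_pts):
    """Sum the distances between corresponding 3-element points."""
    sqrt = math.sqrt  # local alias, as in the listing above
    dist = 0.0
    # zip is lazy in Python 3, so no intermediate list of pairs is built.
    for pt0, pt1 in zip(pt_list, rand_pts):
        d0 = pt0[0] - pt1[0]
        d1 = pt0[1] - pt1[1]
        d2 = pt0[2] - pt1[2]
        dist += sqrt(d0 * d0 + d1 * d1 + d2 * d2)
    return dist


def main(num=1000000, seed=0):
    """Generate the data, then time only the distance computation."""
    rng = random.Random(seed)
    pt_list = [[rng.random(), rng.random(), rng.random()] for _ in range(num)]
    rand_pts = [pt_list[rng.randrange(num)] for _ in range(num)]

    beg_time = time.perf_counter()
    dist = sum_distances(pt_list, rand_pts)
    elap_time = time.perf_counter() - beg_time
    return elap_time, dist


if __name__ == "__main__":
    elap_time, dist = main()
    print(elap_time)
    print(dist)
```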


Update

Optimization #4: Use numpy.

The following takes 1/40th the time of the original version in the timed section of the code. Not quite as fast as C, but close.

Note the commented out, "Mondo slow" computation. That takes ten times as long as the original version. There is an overhead cost to using numpy. The setup takes quite a bit longer in the code that follows compared to that in my non-numpy optimization #3.

Bottom line: You need to take care when using numpy, and the setup costs might be significant.

    import numpy, random, time

    def main (num=1000000) :

        # Generate random points and numbers

        random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)

        def random_point () :
            return [random.random(), random.random(), random.random()]

        def random_index () :
           return random.randint(0, num-1)

        # pt_list is a (num, 3) array; rand_pts picks num random rows from it.
        pt_list = numpy.array([random_point() for x in xrange(num)])
        rand_pts = pt_list[[random_index() for x in xrange(num)],:]

        # Compute

        beg_time = time.clock()

        # Mondo slow.
        # dist = numpy.sum (
        #            numpy.apply_along_axis (
        #                numpy.linalg.norm, 1, pt_list - rand_pts))

        # Mondo fast.
        dist = numpy.sum ((numpy.sum ((pt_list-rand_pts)**2, axis=1))**0.5)

        end_time = time.clock()
        elap_time = (end_time - beg_time)

        print elap_time
        print dist

    main()
