Python line profiler is a very convenient package that allows you to easily see the time taken for each line of code to execute. However, a fatal flaw is that it does not support profiling in multiprocessing, and there has been an open issue on Github since 2016. Here, I provide a hacky workaround for using line profiler in multiprocessing.
I saw this method on a Japanese website, even though I can’t read Japanese, the Python code written by the Japanese person is the same as mine, so I can still adapt it for use.
Original Usage
Assuming that we are calculating the sum of each number to the power of 1.5 by multiprocessing, we use the following inefficient code for illustrative purposes:
import math
import multiprocessing as mp
def power3(num):
return num * num * num
def child(num):
num = math.sqrt(num)
num = power3(num)
return num
pool = mp.Pool()
total = 0
for res in pool.imap_unordered(child, list(range(5))):
total += res
print(total)
pool.close()
pool.join()
Using the child
function to call power3
in parallel to calculate the sum of 1.5 to the power of 1 to 5. If we add @profile
on both functions for profiling, the following result will be generated:
Timer unit: 1e-06 s
Total time: 0 s
File: q.py
Function: power3 at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @profile
5 def power3(num):
6 return num * num * num
Total time: 0 s
File: q.py
Function: child at line 8
Line # Hits Time Per Hit % Time Line Contents
==============================================================
8 @profile
9 def child(num):
10 num = math.sqrt(num)
11 num = power3(num)
12 return num
Generates nothing!
Profile each subprocess
We can solve this issue by profiling each subprocess:
import math
import multiprocessing as mp
from line_profiler import LineProfiler
def wrap(num):
prof = LineProfiler()
prof.add_function(child)
prof.add_function(power3)
res = prof.runcall(child, num)
# prof.print_stats()
prof.dump_stats('demo_%d.lprof' % num)
return res
def power3(num):
return num * num * num
def child(num):
num = math.sqrt(num)
num = power3(num)
return num
pool = mp.Pool()
timings = dict()
unit = None
total = 0
for res in pool.imap_unordered(wrap, list(range(5))):
total += res
print(total)
pool.close()
pool.join()
After execution, we use prof.dump_stats('demo_%d.lprof' % num)
to save the log of each subprocess. We can also use print_stats
to print out the results. Because they are executed by different processes at the same time, the logs will be mixed together and difficult to read. After saving them, we can use python -m line_profiler demo_0.lprof
to see the execution time of each line of code in the subprocess (demo_0.lprof
can be replaced with other file names). The following is an example:
>> python -m line_profiler demo_0.lprof
Timer unit: 1e-06 s
Total time: 1e-06 s
File: q.py
Function: power3 at line 16
Line # Hits Time Per Hit % Time Line Contents
==============================================================
16 def power3(num):
17 1 1.0 1.0 100.0 return num * num * num
Total time: 1.4e-05 s
File: q.py
Function: child at line 20
Line # Hits Time Per Hit % Time Line Contents
==============================================================
20 def child(num):
21 1 11.0 11.0 78.6 num = math.sqrt(num)
22 1 3.0 3.0 21.4 num = power3(num)
23 1 0.0 0.0 0.0 return num
Hits
is 1 because it only calculate the running time of 1 subprocess here.
Combining logs from multiple subprocesses
We create five subprocesses here, so there are five .lprof
files. Is there a way to merge them into the same file? You can use prof.get_stats()
to extract the profiling results, and it is easy to merge multiple results together after observing the content. Then, use the show_text
function provided by line_profiler
to write to a file:
import math
import multiprocessing as mp
from line_profiler import LineProfiler, show_text
def wrap(num):
prof = LineProfiler()
prof.add_function(child)
prof.add_function(power3)
res = prof.runcall(child, num)
return res, prof.get_stats()
def power3(num):
return num * num * num
def child(num):
num = math.sqrt(num)
num = power3(num)
return num
def merge_timing(timings, new_timings):
for key, stats in new_timings.items():
if key not in timings:
timings[key] = stats
else:
cur_dict = dict((lineno, (hit, ts)) for lineno, hit, ts in timings[key])
for lineno, hit, ts in stats:
hit_now, ts_now = cur_dict.get(lineno, (0, 0))
cur_dict[lineno] = (hit_now + hit, ts_now + ts)
cur = sorted([(lineno, hit, ts) for lineno, (hit, ts) in cur_dict.items()])
timings[key] = cur
pool = mp.Pool()
timings = dict()
unit = None
total = 0
for res, r in pool.imap_unordered(wrap, list(range(5))):
total += res
merge_timing(timings, r.timings)
unit = r.unit
print(total)
show_text(timings, unit, stream=open('prof.txt', 'w'))
pool.close()
pool.join()
The result of prof.txt
:
Timer unit: 1e-06 s
Total time: 3e-06 s
File: qq.py
Function: power3 at line 14
Line # Hits Time Per Hit % Time Line Contents
==============================================================
14 def power3(num):
15 5 3.0 0.6 100.0 return num * num * num
Total time: 5.8e-05 s
File: qq.py
Function: child at line 18
Line # Hits Time Per Hit % Time Line Contents
==============================================================
18 def child(num):
19 5 47.0 9.4 81.0 num = math.sqrt(num)
20 5 10.0 2.0 17.2 num = power3(num)
21 5 1.0 0.2 1.7 return num
As we can see, the hits are all five, indicating that we have merged the results of all five subprocesses. As for opening a pull request to the original repo, let’s leave that to the more experienced experts!