Using line profiler in Python Multiprocessing

September 26, 2020

5-Minute Read

884 words

Python line profiler is a very convenient package that allows you to easily see the time taken for each line of code to execute. However, a fatal flaw is that it does not support profiling in multiprocessing, and there has been an open issue on Github since 2016. Here, I provide a hacky workaround for using line profiler in multiprocessing.

I saw this method on a Japanese website, even though I can’t read Japanese, the Python code written by the Japanese person is the same as mine, so I can still adapt it for use.

Original Usage

Assuming that we are calculating the sum of each number to the power of 1.5 by multiprocessing, we use the following inefficient code for illustrative purposes:

import math
import multiprocessing as mp

def power3(num):
    return num * num * num

def child(num):
    num = math.sqrt(num)
    num = power3(num)
    return num

pool = mp.Pool()
total = 0
for res in pool.imap_unordered(child, list(range(5))):
    total += res
print(total)
pool.close()
pool.join()

Using the child function to call power3 in parallel to calculate the sum of 1.5 to the power of 1 to 5. If we add @profile on both functions for profiling, the following result will be generated:

Timer unit: 1e-06 s

Total time: 0 s
File: q.py
Function: power3 at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def power3(num):
     6                                               return num * num * num

Total time: 0 s
File: q.py
Function: child at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def child(num):
    10                                               num = math.sqrt(num)
    11                                               num = power3(num)
    12                                               return num

Generates nothing!

Profile each subprocess

We can solve this issue by profiling each subprocess:

import math
import multiprocessing as mp
from line_profiler import LineProfiler


def wrap(num):
    prof = LineProfiler()
    prof.add_function(child)
    prof.add_function(power3)
    res = prof.runcall(child, num)
    # prof.print_stats()
    prof.dump_stats('demo_%d.lprof' % num)
    return res


def power3(num):
    return num * num * num


def child(num):
    num = math.sqrt(num)
    num = power3(num)
    return num


pool = mp.Pool()
timings = dict()
unit = None
total = 0
for res in pool.imap_unordered(wrap, list(range(5))):
    total += res
print(total)
pool.close()
pool.join()

After execution, we use prof.dump_stats('demo_%d.lprof' % num) to save the log of each subprocess. We can also use print_stats to print out the results. Because they are executed by different processes at the same time, the logs will be mixed together and difficult to read. After saving them, we can use python -m line_profiler demo_0.lprof to see the execution time of each line of code in the subprocess (demo_0.lprof can be replaced with other file names). The following is an example:

>> python -m line_profiler demo_0.lprof 
Timer unit: 1e-06 s

Total time: 1e-06 s
File: q.py
Function: power3 at line 16

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    16                                           def power3(num):
    17         1          1.0      1.0    100.0      return num * num * num

Total time: 1.4e-05 s
File: q.py
Function: child at line 20

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           def child(num):
    21         1         11.0     11.0     78.6      num = math.sqrt(num)
    22         1          3.0      3.0     21.4      num = power3(num)
    23         1          0.0      0.0      0.0      return num

Hits is 1 because it only calculate the running time of 1 subprocess here.

Combining logs from multiple subprocesses

We create five subprocesses here, so there are five .lprof files. Is there a way to merge them into the same file? You can use prof.get_stats() to extract the profiling results, and it is easy to merge multiple results together after observing the content. Then, use the show_text function provided by line_profiler to write to a file:

import math
import multiprocessing as mp
from line_profiler import LineProfiler, show_text


def wrap(num):
    prof = LineProfiler()
    prof.add_function(child)
    prof.add_function(power3)
    res = prof.runcall(child, num)
    return res, prof.get_stats()


def power3(num):
    return num * num * num


def child(num):
    num = math.sqrt(num)
    num = power3(num)
    return num


def merge_timing(timings, new_timings):
    for key, stats in new_timings.items():
        if key not in timings:
            timings[key] = stats
        else:
            cur_dict = dict((lineno, (hit, ts)) for lineno, hit, ts in timings[key])
            for lineno, hit, ts in stats:
                hit_now, ts_now = cur_dict.get(lineno, (0, 0))
                cur_dict[lineno] = (hit_now + hit, ts_now + ts)
            cur = sorted([(lineno, hit, ts) for lineno, (hit, ts) in cur_dict.items()])
            timings[key] = cur


pool = mp.Pool()
timings = dict()
unit = None
total = 0
for res, r in pool.imap_unordered(wrap, list(range(5))):
    total += res
    merge_timing(timings, r.timings)
    unit = r.unit
print(total)
show_text(timings, unit, stream=open('prof.txt', 'w'))
pool.close()
pool.join()

The result of prof.txt:

Timer unit: 1e-06 s

Total time: 3e-06 s
File: qq.py
Function: power3 at line 14

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    14                                           def power3(num):
    15         5          3.0      0.6    100.0      return num * num * num

Total time: 5.8e-05 s
File: qq.py
Function: child at line 18

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    18                                           def child(num):
    19         5         47.0      9.4     81.0      num = math.sqrt(num)
    20         5         10.0      2.0     17.2      num = power3(num)
    21         5          1.0      0.2      1.7      return num

As we can see, the hits are all five, indicating that we have merged the results of all five subprocesses. As for opening a pull request to the original repo, let’s leave that to the more experienced experts!

Reference

line_profiler with multiprocessing by floatnflow

Using line profiler in Python Multiprocessing

Original Usage

Profile each subprocess

Combining logs from multiple subprocesses

Reference

Recent Posts

Why Unlimited Swiping Right Fails Men: A Data Scientist’s View

Five Years of Investment Experience and Changes

Semi Join and Anti Join in SQL

Sending Google Analytics Events on Every Screen in Flutter

Building a Standout Entry-Level Machine Learning Engineer Resume

Categories

Tags