Issue
I know that most numpy operations release the GIL, so most of them get the expected speed-up from Python multithreading. But I found it odd that this is not the case with numpy.linalg.inv.
Experiment
I tried the following:
import numpy as np
import time
from concurrent.futures import ThreadPoolExecutor

def numpy_op(arr):
    # do matrix inversion here
    return np.linalg.inv(arr)

num_workers = 8
np.random.seed(42)
args = [np.random.randn(10000, 10000) for _ in range(num_workers)]

# parallelize with thread pool
s_time = time.time()
with ThreadPoolExecutor(max_workers=num_workers) as executor:
    res = list(executor.map(numpy_op, args))
print(f'multithreading: {time.time()-s_time:.2f}s')

# sequential code
s_time = time.time()
res = []
for arg in args:
    res.append(numpy_op(arg))
print(f'sequential: {time.time()-s_time:.2f}s')
Results
- multithreading: 36.44s
- sequential: 28.14s
However, multithreading speeds things up as expected if I run some other numpy operation instead of matrix inversion, for example:
def numpy_op(arr):
    # some random elementwise numpy operations here
    return (arr**2 + 2) ** 0.5
With this operation, the results are
- multithreading: 0.62s
- sequential: 3.71s
Version
I am using Python 3.9.10 and numpy 1.24.3.
Solution
The issue here is that the implementation of np.linalg.inv is itself multithreaded. To see this effect, add these lines to the top of your example:
import os
# must be set before numpy is imported, since OpenBLAS reads it at load time
os.environ["OPENBLAS_NUM_THREADS"] = "1"
This caps the number of threads available to such operations at one (assuming your numpy was built against OpenBLAS; see threadpoolctl for other cases).
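If you would rather not rely on environment variables, the threadpoolctl package can limit BLAS threads at runtime, whatever backend your numpy was linked against. A minimal sketch (the 1000x1000 array is just a stand-in for your own workload):

from threadpoolctl import threadpool_limits
import numpy as np

# BLAS-backed calls inside this block run single-threaded,
# leaving the cores free for your own ThreadPoolExecutor workers
with threadpool_limits(limits=1, user_api="blas"):
    result = np.linalg.inv(np.random.randn(1000, 1000))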
On my machine, this change flips the results:
- Sequential: 31.5 s -> 64.5 s
- Multithreaded: 56.0 s -> 26.8 s
I don't know if there is a comprehensive list of multithreaded operations in numpy, but it is something to consider when optimizing multicore throughput.
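If you want to see which BLAS your numpy is linked against, and how many threads each of its pools is currently using, threadpoolctl's introspection helper is one option (numpy.show_config() prints similar build information):

import numpy as np  # importing numpy loads its BLAS library
from threadpoolctl import threadpool_info

for pool in threadpool_info():
    # each entry reports the library (e.g. openblas or mkl),
    # its API kind, and the current thread count
    print(pool["internal_api"], pool["user_api"], pool["num_threads"])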
Answered By - Richard Sheridan