
由于GIL的存在,python多线程无法完成多核并行运算,因此在处理CPU-bound的工作时,可以利用multiprocessing模块实现多进程编程。与多线程相比,多进程有以下优点:
- 不受GIL限制
- 在Mac和Linux系统下效率高
1. The Process class
In multiprocessing, processes are spawned by creating a Process object and then calling its start() method. Process follows the API of threading.Thread.
|
The multiprocessing module also introduces APIs which do not have analogs in the threading module.
- Queue and Pipe
- Pool
Note: on windows, if __name__ == '__main__' is especially important since on windows, this module spawns processes by creating new python interpreter which is different from the fork() method in Unix. And because of this, windows will take more time and resources to spawn processes compared to Unix.
use set_start_method('spawn') to set start method
2. Communication between processes
Use pipe and queue to avoid having to use any synchronization primitives like locks.
Queue(FIFO)
class multiprocessing.Queue([maxsize])
|
Queue example:
|
3. Pool
The Pool class represents a pool of worker processes. It has methods which allows tasks to be offloaded to the worker processes in a few different ways.
One can create a pool of processes which will carry out tasks submitted to it with the Pool class.
|
If you need to run a function in a separate process, but want the current process to block until that function returns, use map or apply, otherwise, use map_async or apply_async.
map(func, iterable[, chunksize]): same like map function, applies the same function to many arguments.
map_async(func[, args[, kwds[, callback[, error_callback]]]]): call returns immediately instead of waiting for the result. You call its get() method to retrieve the result of the function call. When the function is complete, callback() is called. This can be used instead of calling get().
An example of callback function:
|
starmap_async(func, iterable[, chunksize]): Like map() except that the elements of the iterable are expected to be iterables that are unpacked as arguments.
An example of starmap_async:
|
4. Sharing state between processes
Shared memory
Using Value and Array to share data:
|
Server process
A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.
Server process managers are more flexible than using shared memory objects because they can be made to support arbitrary object types. Also, a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory.
5. The multiprocessing.dummy module
multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.
Test the efficiency of multithreading and multiprocessing
|
Result:
|




近期评论