Message Passing Interface (MPI) is a standardized and portable message-passing standard designed by a group of researchers from academia and industry to function on a wide variety of parallel computing architectures.
OpenMPI
- One implementation of MPI
- https://en.wikipedia.org/wiki/Open_MPI
MPICH
- MPICH is one of the most popular implementations of MPI
- https://en.wikipedia.org/wiki/MPICH
Differences between OpenMPI and MPICH
It is important to recognize that MPICH and Open-MPI are designed to meet different needs. MPICH is meant to be a high-quality reference implementation of the latest MPI standard and the basis for derivative implementations that meet special-purpose needs. Open-MPI targets the common case, both in terms of usage and network conduits.
Others
- MPICH 3.0.4 and later on Mac, Linux SMPs and SGI SMPs.
- MVAPICH2 2.0a and later on Linux InfiniBand clusters.
- CrayMPI 6.1.0 and later on Cray XC30.
- SGI MPT 2.09 on SGI SMPs.
- OpenMPI development version on Mac.
ARMCI
ARMCI: Aggregate Remote Memory Copy Interface
The purpose of the Aggregate Remote Memory Copy Interface (ARMCI) library is to provide general-purpose, efficient, and widely portable remote memory access (RMA) operations (one-sided communication), optimized for contiguous and noncontiguous (strided, scatter/gather, I/O vector) data transfers.
ARMCI is compatible with MPI. It is a standalone system that can be used to support user-level libraries and applications that use MPI or PVM (Parallel Virtual Machine).
Note: MPI (Message Passing Interface) is a description of an interface that can be used to communicate between computational nodes to coordinate calculations. MPI alone is not a usable framework; you need an implementation of this interface, which is available for many systems and programming languages.
- http://hpc.pnl.gov/armci/
ARMCI-MPI
- A completely rewritten implementation of ARMCI
- https://wiki.mpich.org/armci-mpi/index.php/Main_Page
UCX
- It’s ROCm-aware
- http://www.openucx.org/introduction/
MPI Communication
Point-to-point communication
MPI_Send
MPI_Send(
    void* data,
    int count,
    MPI_Datatype datatype,
    int destination,
    int tag,
    MPI_Comm communicator)
Sometimes there are cases when A might have to send many different types of messages to B. Instead of B having to go through extra measures to differentiate all these messages, MPI allows senders and receivers to also specify message IDs with the message (known as tags). When process B only requests a message with a certain tag number, messages with different tags will be buffered by the network until B is ready for them.
- tag: identifies a specific message
MPI_Recv
MPI_Recv(
    void* data,
    int count,
    MPI_Datatype datatype,
    int source,
    int tag,
    MPI_Comm communicator,
    MPI_Status* status)
Even though the message is routed to B, process B still has to acknowledge that it wants to receive A’s data. Once it does, the data is transmitted; process A is notified that the transfer completed and may go back to work.
Broadcast
MPI_Bcast
: one process (root) sends the same data to all processes in a communicator.
          +-> 0 (A)
          |
          +-> 1 (A)
0(A)  =>  |
          +-> 2 (A)
          |
          +-> 3 (A)

MPI_Bcast(
    void* data,
    int count,
    MPI_Datatype datatype,
    int root,
    MPI_Comm communicator)
- One of the main uses of broadcasting is to send out user input to a parallel program, or send out configuration parameters to all processes.
Scatter
MPI_Scatter
: similar to MPI_Bcast, but sends chunks of an array to different processes in the order of process rank.
                +-> 0 (A)
                |
                +-> 1 (B)
0(A,B,C,D)  =>  |
                +-> 2 (C)
                |
                +-> 3 (D)

MPI_Scatter(
    void* send_data,
    int send_count,
    MPI_Datatype send_datatype,
    void* recv_data,
    int recv_count,
    MPI_Datatype recv_datatype,
    int root,
    MPI_Comm communicator)
Gather
MPI_Gather
: inverse of MPI_Scatter; gathers data from all processes to a single process (root).
0 (A) --+
        |
1 (B) --+
        |  =>  0(A,B,C,D)
2 (C) --+
        |
3 (D) --+

MPI_Gather(
    void* send_data,
    int send_count,
    MPI_Datatype send_datatype,
    void* recv_data,
    int recv_count,
    MPI_Datatype recv_datatype,
    int root,
    MPI_Comm communicator)
Allgather
MPI_Allgather
: like MPI_Gather + MPI_Bcast; the same as MPI_Gather but without a root process, so every process receives the gathered result.
0 (A) --+        +-- 0 (A,B,C,D)
        |        |
1 (B) --+        +-- 1 (A,B,C,D)
        |   =>   |
2 (C) --+        +-- 2 (A,B,C,D)
        |        |
3 (D) --+        +-- 3 (A,B,C,D)

MPI_Allgather(
    void* send_data,
    int send_count,
    MPI_Datatype send_datatype,
    void* recv_data,
    int recv_count,
    MPI_Datatype recv_datatype,
    MPI_Comm communicator)
Reduce
MPI_Reduce
: similar to MPI_Gather, but the root receives the reduced result of all the data.
0 (3) --+
        |
1 (2) --+ (MPI_SUM)
        | ========>  0(12)
2 (5) --+
        |
3 (2) --+

0 (3, 1) --+
           |
1 (2, 3) --+ (MPI_SUM)
           | ========>  0(12, 9)
2 (5, 4) --+
           |
3 (2, 1) --+

MPI_Reduce(
    void* send_data,
    void* recv_data,
    int count,
    MPI_Datatype datatype,
    MPI_Op op,
    int root,
    MPI_Comm communicator)
Allreduce
MPI_Allreduce
: reduces the values and distributes the result to all processes.
0 (3, 1) --+              +-- 0 (12, 9)
           |              |
1 (2, 3) --+ (MPI_SUM)    +-- 1 (12, 9)
           | ========>    |
2 (5, 4) --+              +-- 2 (12, 9)
           |              |
3 (2, 1) --+              +-- 3 (12, 9)

MPI_Allreduce(
    void* send_data,
    void* recv_data,
    int count,
    MPI_Datatype datatype,
    MPI_Op op,
    MPI_Comm communicator)
NCCL
- https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/overview.html
- https://developer.nvidia.com/nccl
- https://devblogs.nvidia.com/fast-multi-gpu-collectives-nccl/
- https://my.oschina.net/u/1459307/blog/1650028
- https://baijiahao.baidu.com/s?id=1581386178946489641&wfr=spider&for=pc