Discontinuous Galerkin Library
#include "dg/algorithm.h"

Classes

struct  dg::MPIGather< Vector >
 Optimized MPI Gather operation. More...
 
struct  dg::MPIKroneckerGather< Vector >
 Communicator for asynchronous communication of MPISparseBlockMat. More...
 

Functions

template<class MessageType >
std::map< int, MessageType > dg::mpi_permute (const std::map< int, MessageType > &messages, MPI_Comm comm)
 Exchange messages between processes in a communicator.
 
template<class ContainerType >
void dg::mpi_gather (const thrust::host_vector< std::array< int, 2 > > &gather_map, const ContainerType &gatherFrom, ContainerType &result, MPI_Comm comm)
 Un-optimized distributed gather operation.
 
template<class ContainerType >
void dg::mpi_scatter (const thrust::host_vector< std::array< int, 2 > > &scatter_map, const ContainerType &toScatter, ContainerType &result, MPI_Comm comm, bool resize_result=false)
 Un-optimized distributed scatter operation.
 
template<class Integer >
thrust::host_vector< std::array< Integer, 2 > > dg::mpi_invert_permutation (const thrust::host_vector< std::array< Integer, 2 > > &p, MPI_Comm comm)
 Invert a globally bijective index map.
 

Detailed Description

MPI distributed gather and scatter operations

In order to understand what this is about you should first really(!) understand what gather and scatter operations are, so grab pen and paper!

Note
The dg library only implements optimized MPI gather operations. There is an un-optimized dg::mpi_scatter though. The reason is that gather operations are easier to understand and implement than scatter operations, which may involve a reduction when several buffer elements map to the same vector element.

Primer: Gather and scatter operations

First, we note that gather and scatter are most often used in the context of memory buffers. The buffer needs to be filled with values (gather) or these values need to be written back into their original places (scatter).

Imagine a buffer vector w and an index map \( \text{g}\) that assigns to every index \( i\) of w an index \( \text{g}[i]\) into a source vector v.

We can now define: Gather values from v and put them into w according to \( w[i] = v[\text{g}[i]] \)

Loosely, we think of scatter as the reverse operation, i.e. take the values in w and write them back into v. However, simply writing \( v[\text{g}[j]] = w[j] \) is a very bad definition. What should happen if \( \text{g}[j] = \text{g}[k]\) for some \( j \neq k\)? What if some elements \( v_i\) are not mapped at all?
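
To make the two operations concrete, here is a minimal serial sketch in plain C++ (illustrative only, not part of the dg API) of the gather loop and of why the naive scatter loop is ill-defined:

#include <vector>
#include <cstddef>

// Gather: w[i] = v[g[i]]; every buffer entry reads exactly one source entry,
// so the loop is trivially parallel.
std::vector<double> gather( const std::vector<size_t>& g, const std::vector<double>& v)
{
    std::vector<double> w( g.size());
    for( size_t i = 0; i < g.size(); i++)
        w[i] = v[ g[i]];
    return w;
}

// Naive scatter: v[g[j]] = w[j]; if g[j] == g[k] for j != k the result depends
// on the order of the writes, and elements of v that g never references keep
// their old values. A well-defined scatter therefore needs a rule (for example
// a sum reduction) for duplicate indices.
void scatter_naive( const std::vector<size_t>& g, const std::vector<double>& w, std::vector<double>& v)
{
    for( size_t j = 0; j < w.size(); j++)
        v[ g[j]] = w[j]; // last write wins; ambiguous for duplicate g[j]
}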

It is more accurate to represent the gather and scatter operation by a matrix.

Gather matrix: A matrix \( G\) of size \( m \times N\) is a gather matrix if it consists of only 1's and 0's and has exactly one "1" in each row. \( m\) is the buffer size, \( N \) is the vector size, and \( N\) may be smaller than, equal to, or larger than \( m\). If \( \text{g}[i]\) is the index map then

\[ G_{ij} := \delta_{\text{g}[i] j}\]

We have \( w = G v\)

Scatter matrix: A matrix \( S \) is a scatter matrix if its transpose is a gather matrix.

This means that \( S\) has size \( N \times m \), consists of only 1's and 0's and has exactly one "1" in each column. If \( \text{g}[j]\) is the index map then

\[ S_{ij} := \delta_{i \text{g}[j]}\]

We have \( v = S w\)
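
As a small worked example (not part of the original documentation), take \( N = m = 3\) and the index map \( g = (2, 0, 2)\). Then

\[ G = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad S = G^T = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}\]

\[ w = G v = (v_2, v_0, v_2)^T, \qquad S w = (w_1, 0, w_0 + w_2)^T\]

Applying \( S\) sums the two buffer entries that were gathered from \( v_2\) and writes a zero into position 1, which \( g\) never references; this is the reduction mentioned in the Note above.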

Among scatter and gather matrices, permutations are especially interesting: a matrix is a permutation if and only if it is both a scatter and a gather matrix. In such a case it is square \( m \times m\) and

\[ P^{-1} = P^T\]

The buffer \( w\) and vector \( v\) then have the same size \( m\).


MPI distributed gather and scatter

Now we turn to the case that v and w are distributed across processes. Accordingly, the index map \( g\) is also distributed across processes (in the same way w is). The elements of \( g\) are global indices into v that have to be transformed to pairs

\[ i = [r, j]\]

where j is the local index into v and r is the rank within the communicator, according to a user-provided function. The user has to provide the index map as a vector of such pairs.
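
A minimal sketch of such a user-provided conversion, assuming the global vector is distributed in contiguous chunks of equal size chunk_size per rank (the function name and the assumption are illustrative, not part of the dg API):

#include <array>
#include <thrust/host_vector.h>

// Convert global indices into {rank, local index} pairs for a contiguous,
// equally sized decomposition of the global vector v.
thrust::host_vector<std::array<int,2>> global2pair(
    const thrust::host_vector<int>& global_idx, int chunk_size)
{
    thrust::host_vector<std::array<int,2>> pairs( global_idx.size());
    for( unsigned u = 0; u < global_idx.size(); u++)
        pairs[u] = { global_idx[u] / chunk_size,   // rank r that owns the element
                     global_idx[u] % chunk_size }; // local index j on that rank
    return pairs;
}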

Imagine now that we want to perform a globally distributed gather operation. Notice that there is a bootstrap problem involved: the given index map tells each rank from where to receive data, but each rank also needs to know where to send its own data. This means that in order to set up the communication we need to communicate to start with (the dg::mpi_permute function does exactly that):
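
For example, a sketch of this bootstrap (assuming the index map has already been converted to {rank, local index} pairs as above; make_recv_map is an illustrative helper, not part of the dg API):

#include <map>
#include <array>
#include <thrust/host_vector.h>
#include "dg/algorithm.h"

// recvIdx[r] collects the local indices this process wants to gather from rank r.
std::map<int, thrust::host_vector<int>> make_recv_map(
    const thrust::host_vector<std::array<int,2>>& gather_map)
{
    std::map<int, thrust::host_vector<int>> recvIdx;
    for( unsigned u = 0; u < gather_map.size(); u++)
        recvIdx[ gather_map[u][0]].push_back( gather_map[u][1]);
    return recvIdx;
}

// ... inside an MPI program:
// auto recvIdx = make_recv_map( gather_map);
// auto sendIdx = dg::mpi_permute( recvIdx, comm);
// // sendIdx[r] now lists the local indices that rank r wants to receive from us,
// // i.e. every process knows what to send.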

In total we thus describe the global gather as

\[ w = G v = G_1 P_{G,MPI} G_2 v\]

where \( G_2\) gathers locally from v, \( P_{G,MPI}\) is the MPI communication step that moves the gathered values between processes, and \( G_1\) gathers locally from the received data into w.

The global scatter operation is then simply

\[ v = S w = G_2^T P^T_{G,MPI} G^T_1 w = S_2 P_{S,MPI} S_1 w \]

(The scatter operation is constructed in the same way as the gather operation; only the execution differs.)

Note
If the scatter/gather operations are part of a matrix-vector multiplication then \( G_1\) or \( S_1\) can be absorbed into the matrix

\[ M v = R G v = R G_1 P_{G,MPI} G_2 v = R' P_{G,MPI} G_2 v\]

If R is a coo (coordinate format) matrix, the simple way to obtain R' is to replace its column indices with the map \( g_1\).
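
A minimal sketch of that replacement for a coo (row, column, value) matrix; the names are illustrative and not taken from the dg API:

#include <vector>

// Absorb the local gather G_1 (index map g1) into a coo matrix R: replacing
// each column index j by g1[j] yields R' = R G_1, whose columns then index
// directly into the communication buffer.
void absorb_gather( std::vector<int>& column_indices, const std::vector<int>& g1)
{
    for( auto& col : column_indices)
        col = g1[ col];
}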

Note
To give the involved vectors unique names, we call v the "vector", \( s = G_2 v\) the "store", and \( b = P s\) the "buffer".

For the scatter operation we analogously have

\[ M v = S C v = S_2 P_{S,MPI} S_1 C v = S_2 P_{S,MPI} C' v\]

Again, if C is a coo matrix the simple way to obtain C' is to replace its row indices with the map \( g_1\).

Simplifications can be achieved if \( G_2 = S_2 = I\) is the identity or if \( P_{G,MPI} = P_{S,MPI} = P_{MPI}\) is symmetric, which means that in-place communication can be used.

Note
Locally, a gather operation is trivially parallel, but a scatter operation in general is not (because of the possible reduction operation).

Function Documentation

◆ mpi_gather()

template<class ContainerType >
void dg::mpi_gather ( const thrust::host_vector< std::array< int, 2 > > & gather_map,
const ContainerType & gatherFrom,
ContainerType & result,
MPI_Comm comm )

Un-optimized distributed gather operation.

Template Parameters
ContainerType  A (host) vector
Parameters
gather_map  Each element is a {rank within comm, local index on that rank} pair, which is equivalent to the global address of a vector element in gatherFrom
gatherFrom  Local part of the vector from which the calling and other ranks can gather elements
result  On output, has the same size as gather_map and contains the elements that gather_map referenced
comm  The MPI communicator within which to exchange elements
See also
MPI distributed gather and scatter operations
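
A usage sketch (the decomposition and values are made up for illustration; assumes MPI is initialized and dg/algorithm.h is included): every rank gathers the 0th element of the next rank and its own 1st element.

int rank, size;
MPI_Comm_rank( MPI_COMM_WORLD, &rank);
MPI_Comm_size( MPI_COMM_WORLD, &size);
// Every rank owns two local elements, all equal to its rank
thrust::host_vector<double> gatherFrom( 2, (double)rank);
thrust::host_vector<std::array<int,2>> gather_map( 2);
gather_map[0] = { (rank+1)%size, 0}; // 0th element of the next rank
gather_map[1] = { rank, 1};          // 1st element of this rank
thrust::host_vector<double> result( gather_map.size());
dg::mpi_gather( gather_map, gatherFrom, result, MPI_COMM_WORLD);
// Now result[0] == (double)((rank+1)%size) and result[1] == (double)rank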

◆ mpi_invert_permutation()

template<class Integer >
thrust::host_vector< std::array< Integer, 2 > > dg::mpi_invert_permutation ( const thrust::host_vector< std::array< Integer, 2 > > & p,
MPI_Comm comm )

Invert a globally bijective index map.

Parameters
p  Each element is a {rank, local index on that rank} pair, which is equivalent to the global address of a vector element
Attention
Must be bijective, i.e. globally distinct elements in p must map to distinct elements in the result and all elements of the result must be mapped
Parameters
comm  The MPI communicator within which to exchange elements
Returns
inverse map
See also
MPI distributed gather and scatter operations
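
A usage sketch (illustrative; assumes MPI is initialized as in the other examples): every local index i is mapped to local index i on the next rank, which is globally bijective.

int rank, size;
MPI_Comm_rank( MPI_COMM_WORLD, &rank);
MPI_Comm_size( MPI_COMM_WORLD, &size);
unsigned local_size = 4;
thrust::host_vector<std::array<int,2>> p( local_size);
for( unsigned i = 0; i < local_size; i++)
    p[i] = { (rank+1)%size, (int)i};
auto p_inv = dg::mpi_invert_permutation( p, MPI_COMM_WORLD);
// The inverse map should point back to the previous rank, i.e.
// p_inv[i] == { (rank+size-1)%size, i}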

◆ mpi_permute()

template<class MessageType >
std::map< int, MessageType > dg::mpi_permute ( const std::map< int, MessageType > & messages,
MPI_Comm comm )

Exchange messages between processes in a communicator.

This happens in two communication phases (find more details in MPI distributed gather and scatter)

  1. Call MPI_Allgather with given communicator to setup the communication pattern among processes
  2. Call MPI_Alltoallv to send the actual messages.

For example

int rank, size;
MPI_Comm_rank( MPI_COMM_WORLD, &rank);
MPI_Comm_size( MPI_COMM_WORLD, &size);
// Send an integer to the next process in a "circle"
std::map<int, int> messages = { {(rank + 1)%size, rank}};
INFO( "Rank "<<rank<<" send message "<<messages[(rank+1)%size]<<"\n");
auto recv = dg::mpi_permute( messages, MPI_COMM_WORLD);
// Each rank received a message from the previous rank
INFO("Rank "<<rank<<" received message "<<recv[(rank+size-1)%size]<<"\n");
CHECK( recv[(rank+size-1)%size] == (rank+size-1)% size);
// If we permute again we send everything back
auto messages_num = dg::mpi_permute( recv, MPI_COMM_WORLD);
CHECK( messages == messages_num);
Template Parameters
MessageType  Can be one of the following
  1. A primitive type like int or double
  2. A (host) vector of primitive types like std::vector<int> or thrust::host_vector<double>
  3. A (host) vector of std::array of primitive types like thrust::host_vector<std::array<double,3>>
Parameters
messages  (in) messages[rank] contains the message that the calling process sends to the process rank within comm
comm  The MPI communicator within which to exchange messages. All processes in comm need to call this function.
Returns
received_messages[rank] contains the message that the calling process received from the process rank within comm
Note
This can be used to bootstrap MPI gather operations if messages is an index map "recvIdx" of local indices of messages to receive from each rank, because it "tells" every process which messages to send.
This function is a (self-inverse) permutation, i.e.
recvIdx == dg::mpi_permute( dg::mpi_permute( recvIdx, comm), comm);
It can also be used to invert a bijective MPI gather map, see dg::mpi_invert_permutation.
See also
MPI distributed gather and scatter operations

◆ mpi_scatter()

template<class ContainerType >
void dg::mpi_scatter ( const thrust::host_vector< std::array< int, 2 > > & scatter_map,
const ContainerType & toScatter,
ContainerType & result,
MPI_Comm comm,
bool resize_result = false )

Un-optimized distributed scatter operation.

Template Parameters
ContainerType  A (host) vector
Parameters
scatter_map  Each element is a {rank, local index on that rank} pair, which is equivalent to the global address of an element in result
Attention
Must be injective, i.e. globally distinct elements in toScatter must map to distinct elements in result
Parameters
toScatter  Same size as scatter_map. The scatter_map tells where each element in this vector is sent to
result  In principle we must know the size of result beforehand (because how else did you come up with a scatter_map?)
comm  The MPI communicator within which to exchange elements
resize_result  If true, resize result to the correct size (mainly needed for dg::mpi_invert_permutation)
See also
MPI distributed gather and scatter operations
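
A usage sketch (illustrative; assumes MPI is initialized as in the other examples): every rank scatters its local values to the corresponding positions on the next rank.

int rank, size;
MPI_Comm_rank( MPI_COMM_WORLD, &rank);
MPI_Comm_size( MPI_COMM_WORLD, &size);
unsigned local_size = 4;
thrust::host_vector<double> toScatter( local_size, (double)rank);
thrust::host_vector<std::array<int,2>> scatter_map( local_size);
for( unsigned i = 0; i < local_size; i++)
    scatter_map[i] = { (rank+1)%size, (int)i}; // element i goes to index i on the next rank
thrust::host_vector<double> result( local_size);
dg::mpi_scatter( scatter_map, toScatter, result, MPI_COMM_WORLD);
// Every element of result now holds the rank of the previous process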