Discontinuous Galerkin Library
#include "dg/algorithm.h"
Classes

struct dg::MPIGather< Vector >
    Optimized MPI Gather operation.
struct dg::MPIKroneckerGather< Vector >
    Communicator for asynchronous communication of MPISparseBlockMat.

Functions

template<class MessageType>
std::map< int, MessageType > dg::mpi_permute (const std::map< int, MessageType > &messages, MPI_Comm comm)
    Exchange messages between processes in a communicator.
template<class ContainerType>
void dg::mpi_gather (const thrust::host_vector< std::array< int, 2 > > &gather_map, const ContainerType &gatherFrom, ContainerType &result, MPI_Comm comm)
    Un-optimized distributed gather operation.
template<class ContainerType>
void dg::mpi_scatter (const thrust::host_vector< std::array< int, 2 > > &scatter_map, const ContainerType &toScatter, ContainerType &result, MPI_Comm comm, bool resize_result=false)
    Un-optimized distributed scatter operation.
template<class Integer>
thrust::host_vector< std::array< Integer, 2 > > dg::mpi_invert_permutation (const thrust::host_vector< std::array< Integer, 2 > > &p, MPI_Comm comm)
    Invert a globally bijective index map.
In order to understand what this is about you should first really(!) understand what gather and scatter operations are, so grab pen and paper! We recommend to think in terms of gather rather than dg::mpi_scatter, though. The reason is that gather operations are easier to understand and implement, because scatter operations may involve a reduction. First, we note that gather and scatter are most often used in the context of memory buffers. The buffer needs to be filled with values (gather) or these values need to be written back into the original place (scatter).
Imagine a buffer vector w and an index map \( \text{g}[i]\) that gives to every index \( i\) in this vector w an index \( \text{g}[i]\) into a source vector v.
We can now define: Gather values from v and put them into w according to \( w[i] = v[\text{g}[i]] \)
Loosely we think of Scatter as the reverse operation, i.e. take the values in w and write them back into v. However, simply writing \( v[\text{g}[j]] = w[j] \) is a very bad definition. What should happen if \( g[j] = g[k]\) for some j and k? What if some indices \( v_i\) are not mapped at all?
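To make the problem concrete, here is a minimal serial sketch (our illustration, not library code) with an index map that repeats one index and skips others:

    #include <cassert>
    #include <vector>

    int main()
    {
        std::vector<double> v = {10., 20., 30., 40.}; // source vector, N = 4
        std::vector<int>    g = {2, 0, 2};            // index map, m = 3
        std::vector<double> w( g.size());             // buffer

        // Gather: w[i] = v[g[i]] is always well defined ...
        for( unsigned i = 0; i < g.size(); i++)
            w[i] = v[g[i]];
        assert( w[0] == 30. && w[1] == 10. && w[2] == 30.);

        // ... while the naive scatter v[g[j]] = w[j] is not: g[0] == g[2] == 2,
        // so v[2] depends on the order of the writes, and v[1], v[3] are never
        // written at all.
        return 0;
    }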
It is more accurate to represent the gather and scatter operation by a matrix.
Gather matrix: A matrix \( G\) of size \( m \times N\) is a gather matrix if it consists of only 1's and 0's and has exactly one "1" in each row. Here \( m\) is the buffer size, \( N \) is the vector size, and \( N\) may be smaller than, equal to, or larger than \( m\). If \( \text{g}[i]\) is the index map then
\[ G_{ij} := \delta_{\text{g}[i] j}\]
We have \( w = G v\)
Scatter matrix: A matrix \( S \) is a scatter matrix if its transpose is a gather matrix.
This means that \( S\) has size \( N \times m \), consists of only 1's and 0's, and has exactly one "1" in each column. If \( \text{g}[j]\) is the index map then
\[ S_{ij} := \delta_{i \text{g}[j]}\]
We have \( v = S w\)
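As a concrete illustration (ours, reusing the example values \( N=4\), \( m=3\) and \( \text{g} = \{2,0,2\}\) from above), the two matrices read

\[ G = \begin{pmatrix} 0&0&1&0\\ 1&0&0&0\\ 0&0&1&0\end{pmatrix}, \qquad S = G^T = \begin{pmatrix} 0&1&0\\ 0&0&0\\ 1&0&1\\ 0&0&0\end{pmatrix} \]

so that \( Gv = (v_2, v_0, v_2)^T\), while \( Sw = (w_1,\ 0,\ w_0 + w_2,\ 0)^T\) makes both the reduction (two "1"s in the third row of \( S\)) and the unmapped elements (zero rows of \( S\)) explicit.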
A number of useful statements follow from these definitions. Of the scatter and gather matrices, permutations are especially interesting: a matrix is a permutation if and only if it is both a scatter and a gather matrix. In that case it is square, \( m \times m\), and
\[ P^{-1} = P^T.\]
The buffer \( w\) and vector \( v\) then have the same size \( m\).
The following statements hold for a permutation: if \( \vec i\) denotes the vector with entries \( i_j = j\), then \( m := G \vec i = g\) reproduces the index map itself, while its inverse is obtained by applying the transpose \( G' := G^T = S\), which is again a gather matrix:
\[ m^{-1} = G' \vec i = S \vec i\]
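A serial sketch of this statement (our illustration): the inverse map is obtained by scattering the indices \( 0,\dots,m-1\) with the map itself:

    #include <cassert>
    #include <vector>

    int main()
    {
        std::vector<int> g = {2, 0, 3, 1};   // a bijective index map
        std::vector<int> g_inv( g.size());
        for( unsigned j = 0; j < g.size(); j++)
            g_inv[ g[j]] = j;                // scatter the index vector
        // g[ g_inv[i]] == i for all i
        assert( (g_inv == std::vector<int>{1, 3, 0, 2}) );
        return 0;
    }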
Now we turn to the case that v and w are distributed across processes. Accordingly, the index map \( g\) is also distributed across processes (in the same way w is). The elements of \( g\) are global indices into v that have to be transformed to pairs
\[ i = [r, j]\]
where j is the local index into v and r is the rank within the communicator, according to a user provided function. The user has to provide the index map as a vector of such pairs.
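A sketch of such a user provided function (the name global2pair and the assumption of equally sized contiguous chunks per rank are ours, not part of the library):

    #include <array>
    #include <cassert>

    // Convert a global index into v into a {rank, local index} pair, assuming
    // v is stored in contiguous chunks of 'chunk' elements per rank.
    std::array<int,2> global2pair( int globalIndex, int chunk)
    {
        return { globalIndex / chunk, globalIndex % chunk};
    }

    int main()
    {
        // With 4 elements per rank, global index 9 lives on rank 2, local index 1
        assert( (global2pair( 9, 4) == std::array<int,2>{2, 1}) );
        return 0;
    }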
Imagine now that we want to perform a globally distributed gather operation. Notice that there is a bootstrap problem involved: the given index map tells each rank from where to receive data, but each rank also needs to know where to send its own data to. This means that in order to set up the communication we need to communicate to begin with (the dg::mpi_permute function does exactly that):
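A hedged sketch of this bootstrap step (the variable names are ours, and we assume thrust::host_vector<int> is an admissible MessageType): every rank groups the local indices it wants to receive by owning rank, and learns in return which of its own elements it has to send:

    #include <map>
    #include <mpi.h>
    #include <thrust/host_vector.h>
    #include "dg/algorithm.h"

    int main( int argc, char* argv[])
    {
        MPI_Init( &argc, &argv);
        int rank, size;
        MPI_Comm_rank( MPI_COMM_WORLD, &rank);
        MPI_Comm_size( MPI_COMM_WORLD, &size);

        // Each rank wants local index 0 from its right neighbour:
        thrust::host_vector<std::array<int,2>> gather_map( 1);
        gather_map[0] = std::array<int,2>{ (rank + 1) % size, 0};

        // Group the requested local indices by the rank that owns them ...
        std::map<int, thrust::host_vector<int>> recv_idx;
        for( unsigned u = 0; u < gather_map.size(); u++)
            recv_idx[ gather_map[u][0]].push_back( gather_map[u][1]);

        // ... and exchange: send_idx[r] now lists the local indices that rank r
        // wants to receive from the calling process
        auto send_idx = dg::mpi_permute( recv_idx, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }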
In total we thus describe the global gather as
\[ w = G v = G_1 P_{G,MPI} G_2 v\]
The global scatter operation is then simply
\[ v = S w = G_2^T P^T_{G,MPI} G^T_1 w = S_2 P_{S,MPI} S_1 w \]
(The scatter operation is constructed the same way as the gather operation; only the execution differs.)
The same decomposition applies to the MPI distributed matrix-vector multiplication. For a row distributed matrix \( M = R G\), where \( R\) is a purely local matrix and \( G\) gathers the required elements of v, we have
\[ M v = R G v = R G_1 P_{G,MPI} G_2 v = R' P_{G,MPI} G_2 v.\]
If R is a coo formatted matrix, the simple way to obtain R' is to replace each column index j by \( g_1[j]\).
For a column distributed matrix \( M = S C\), where \( C\) is a purely local matrix and \( S\) scatters the result back, we have
\[ M v = S C v = S_2 P_{S,MPI} S_1 C v = S_2 P_{S,MPI} C' v. \]
Again, if C is a coo formatted matrix, the simple way to obtain C' is to replace each row index i by \( g_1[i]\).
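A serial sketch of this coo remark (the struct and function names are ours, not library types): multiplying a coo matrix by a gather matrix from the right only re-labels its column indices, and multiplying by a scatter matrix from the left only re-labels its row indices:

    #include <vector>

    struct CooMatrix // minimal coo format: one (row, col, value) triplet per nonzero
    {
        std::vector<int> rows, cols;
        std::vector<double> vals;
    };

    // R' = R G_1, where G_1 is the gather matrix with index map g1:
    // the nonzero at column j moves to column g1[j]
    CooMatrix right_multiply_gather( CooMatrix R, const std::vector<int>& g1)
    {
        for( unsigned k = 0; k < R.cols.size(); k++)
            R.cols[k] = g1[ R.cols[k]];
        return R;
    }

    // C' = S_1 C, where S_1 = G_1^T is the scatter matrix with index map g1:
    // the nonzero at row i moves to row g1[i]
    CooMatrix left_multiply_scatter( CooMatrix C, const std::vector<int>& g1)
    {
        for( unsigned k = 0; k < C.rows.size(); k++)
            C.rows[k] = g1[ C.rows[k]];
        return C;
    }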
Simplifications can be achieved if \( G_2 = S_2 = I\) is the identity or if \( P_{G,MPI} = P_{S,MPI} = P_{MPI}\) is symmetric, which means that in-place communication can be used.
void dg::mpi_gather( const thrust::host_vector< std::array< int, 2 > >& gather_map,
                     const ContainerType& gatherFrom,
                     ContainerType& result,
                     MPI_Comm comm )
Un-optimized distributed gather operation.
ContainerType | A (host) vector |
gather_map | Each element consists of {rank within comm, local index on that rank} pairs, which is equivalent to the global address of a vector element in gatherFrom |
gatherFrom | Local part of the vector from which the calling and other ranks can gather indices |
result | (Same size as gather_map on output) On output contains the elements that gather_map referenced |
comm | The MPI communicator within which to exchange elements |
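A hedged usage sketch (ours, not from the library documentation), in which every rank gathers the first element of its right neighbour's local vector:

    #include <array>
    #include <mpi.h>
    #include <thrust/host_vector.h>
    #include "dg/algorithm.h"

    int main( int argc, char* argv[])
    {
        MPI_Init( &argc, &argv);
        int rank, size;
        MPI_Comm_rank( MPI_COMM_WORLD, &rank);
        MPI_Comm_size( MPI_COMM_WORLD, &size);

        thrust::host_vector<double> gatherFrom( 3, double(rank)); // local part of v
        // One {rank, local index} pair per element we want to receive
        thrust::host_vector<std::array<int,2>> gather_map( 1);
        gather_map[0] = std::array<int,2>{ (rank + 1) % size, 0};

        thrust::host_vector<double> result( gather_map.size());
        dg::mpi_gather( gather_map, gatherFrom, result, MPI_COMM_WORLD);
        // result[0] now equals (rank + 1) % size, the value stored by the neighbour

        MPI_Finalize();
        return 0;
    }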
thrust::host_vector< std::array< Integer, 2 > >
dg::mpi_invert_permutation( const thrust::host_vector< std::array< Integer, 2 > >& p, MPI_Comm comm )
Invert a globally bijective index map.
p | Each element consists of {rank, local index on that rank} pairs, which is equivalent to the global address of a vector element |
Each element in toScatter must map to a distinct element in result and all elements in result must be mapped, i.e. the index map must be globally bijective.
comm | The MPI communicator within which to exchange elements |
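A hedged sketch (ours) that inverts a simple cyclic permutation:

    #include <array>
    #include <mpi.h>
    #include <thrust/host_vector.h>
    #include "dg/algorithm.h"

    int main( int argc, char* argv[])
    {
        MPI_Init( &argc, &argv);
        int rank, size;
        MPI_Comm_rank( MPI_COMM_WORLD, &rank);
        MPI_Comm_size( MPI_COMM_WORLD, &size);

        // p maps local element i to element i on the right neighbour,
        // which is a globally bijective map
        thrust::host_vector<std::array<int,2>> p( 2);
        for( int i = 0; i < 2; i++)
            p[i] = std::array<int,2>{ (rank + 1) % size, i};

        auto p_inv = dg::mpi_invert_permutation( p, MPI_COMM_WORLD);
        // p_inv[i] should now be { (rank + size - 1) % size, i },
        // i.e. point to the left neighbour

        MPI_Finalize();
        return 0;
    }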
std::map< int, MessageType > dg::mpi_permute( const std::map< int, MessageType >& messages, MPI_Comm comm )
Exchange messages between processes in a communicator.
This happens in two communication phases (find more details in MPI distributed gather and scatter):
1. MPI_Allgather with the given communicator to set up the communication pattern among processes
2. MPI_Alltoallv to send the actual messages
For example:
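(This is a sketch of ours, not the original snippet; we assume thrust::host_vector<double> is an admissible MessageType.)

    #include <iostream>
    #include <map>
    #include <mpi.h>
    #include <thrust/host_vector.h>
    #include "dg/algorithm.h"

    int main( int argc, char* argv[])
    {
        MPI_Init( &argc, &argv);
        int rank, size;
        MPI_Comm_rank( MPI_COMM_WORLD, &rank);
        MPI_Comm_size( MPI_COMM_WORLD, &size);

        // messages[r] is what the calling process sends to rank r
        std::map<int, thrust::host_vector<double>> messages;
        for( int r = 0; r < size; r++)
            messages[r] = thrust::host_vector<double>( 2, double(rank));

        // received[r] is what rank r sent to the calling process
        auto received = dg::mpi_permute( messages, MPI_COMM_WORLD);
        for( const auto& pair : received)
            std::cout << "Rank "<<rank<<" received "<<pair.second[0]
                      << " from rank "<<pair.first<<"\n";

        MPI_Finalize();
        return 0;
    }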
MessageType | Can be one of the following
messages | (in) messages[rank] contains the message that the calling process sends to the process rank within comm |
comm | The MPI communicator within which to exchange messages. All processes in comm need to call this function. |
Returns: received_messages, where received_messages[rank] contains the message that the calling process received from the process with the given rank in comm
See also dg::mpi_invert_permutation
void dg::mpi_scatter( const thrust::host_vector< std::array< int, 2 > >& scatter_map,
                      const ContainerType& toScatter,
                      ContainerType& result,
                      MPI_Comm comm,
                      bool resize_result = false )
Un-optimized distributed scatter operation.
ContainerType | A (host) vector |
scatter_map | Each element consists of {rank, local index on that rank} pairs, which is equivalent to the global address of an element in result |
Each element in toScatter must map to a distinct element in result.
toScatter | Same size as scatter_map. The scatter_map tells where each element in this vector is sent to |
result | In principle we must know the size of result beforehand (how else would one have come up with a scatter_map?) |
comm | The MPI communicator within which to exchange elements |
resize_result | If true we resize the result to the correct size (mainly needed for dg::mpi_invert_permutation ) |
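A hedged usage sketch (ours), the reverse of the dg::mpi_gather example above: every rank sends one value to its right neighbour:

    #include <array>
    #include <mpi.h>
    #include <thrust/host_vector.h>
    #include "dg/algorithm.h"

    int main( int argc, char* argv[])
    {
        MPI_Init( &argc, &argv);
        int rank, size;
        MPI_Comm_rank( MPI_COMM_WORLD, &rank);
        MPI_Comm_size( MPI_COMM_WORLD, &size);

        // scatter_map[j] tells where toScatter[j] is written to
        thrust::host_vector<double> toScatter( 1, double(rank));
        thrust::host_vector<std::array<int,2>> scatter_map( 1);
        scatter_map[0] = std::array<int,2>{ (rank + 1) % size, 0};

        // Every rank receives exactly one element, so we know the result size
        thrust::host_vector<double> result( 1);
        dg::mpi_scatter( scatter_map, toScatter, result, MPI_COMM_WORLD);
        // result[0] now equals (rank + size - 1) % size, the value sent by the
        // left neighbour

        MPI_Finalize();
        return 0;
    }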