Discontinuous Galerkin Library
#include "dg/algorithm.h"
Classes

struct dg::BijectiveComm< Index, Vector >
    Perform bijective gather and its transpose (scatter) operation across processes on distributed vectors using MPI.
struct dg::SurjectiveComm< Index, Vector >
    Perform surjective gather and its transpose (scatter) operation across processes on distributed vectors using MPI.
struct dg::GeneralComm< Index, Vector >
    Perform general gather and its transpose (scatter) operation across processes on distributed vectors using MPI.
struct dg::aCommunicator< LocalContainer >
    Struct that performs collective scatter and gather operations across processes on distributed vectors using MPI.
struct dg::RowColDistMat< LocalMatrixInner, LocalMatrixOuter, Collective >
    Distributed memory matrix class with asynchronous communication.
struct dg::MPIDistMat< LocalMatrix, Collective >
    Distributed memory matrix class.
struct dg::MPI_Vector< container >
    MPI vector class.
struct dg::NearestNeighborComm< Index, Buffer, Vector >
    Communicator for asynchronous nearest neighbor communication.
Enumerations

enum dg::dist_type { dg::row_dist = 0, dg::col_dist = 1 }
    Type of distribution of MPI distributed matrices.
Functions

template<class ConversionPolicy, class real_type>
dg::MIHMatrix_t< real_type > dg::convert (const dg::IHMatrix_t< real_type > &global, const ConversionPolicy &policy)
    Convert a (row-distributed) matrix with local row and global column indices to a row-distributed MPI matrix.

template<class ConversionPolicy, class real_type>
dg::IHMatrix_t< real_type > dg::convertGlobal2LocalRows (const dg::IHMatrix_t< real_type > &global, const ConversionPolicy &policy)
    Convert a (column-distributed) matrix with global row and column indices to a row-distributed matrix.

template<class ConversionPolicy, class real_type>
void dg::convertLocal2GlobalCols (dg::IHMatrix_t< real_type > &local, const ConversionPolicy &policy)
    Convert a matrix with local column indices to a matrix with global column indices.
In this section the blas functions are implemented for MPI+X hardware architectures, where X is e.g. a CPU, a GPU, or an accelerator card. The general idea is to separate global communication from local computations and thus readily reuse the existing, optimized library for the local part.
Note: Include mpi.h before any other FELTOR header file.

In FELTOR each MPI process usually gets an equally sized chunk of a vector. In particular, the dg::aRealMPITopology2d and dg::aRealMPITopology3d classes represent the standard Cartesian process and point distribution (meaning every process gets an equally sized 2d / 3d box out of the global domain). The corresponding MPI vector structure in FELTOR is the dg::MPI_Vector, which is nothing but a wrapper around a container type object and an MPI_Comm. With this, the dg::blas1 functions can readily be implemented by simply redirecting to the implementation for the container type. The only functions that need additional communication are dg::blas1::dot and dg::blas2::dot (via MPI_Allreduce).
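The following minimal sketch illustrates this design. It assumes that the dg::MPI_Vector constructor takes the local container and a communicator and that the dg::blas1 functions accept MPI vectors directly; in practice the vectors would usually be created from an MPI grid via dg::evaluate.

    #include <mpi.h>                 // include before any other FELTOR header
    #include <thrust/host_vector.h>
    #include "dg/algorithm.h"

    int main( int argc, char* argv[])
    {
        MPI_Init( &argc, &argv);
        // every process holds an equally sized local chunk
        thrust::host_vector<double> chunk( 100, 1.0);
        dg::MPI_Vector<thrust::host_vector<double>> x( chunk, MPI_COMM_WORLD);
        dg::MPI_Vector<thrust::host_vector<double>> y( chunk, MPI_COMM_WORLD);

        // purely local: forwards to the shared memory implementation
        dg::blas1::axpby( 2., x, 3., y);    // y = 2*x + 3*y
        // needs communication: local dot product followed by MPI_Allreduce
        double norm2 = dg::blas1::dot( y, y);
        MPI_Finalize();
        return 0;
    }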
Contrary to a vector, a matrix can be distributed among processes in two ways: row-wise and column-wise. When we implement a matrix-vector multiplication, the order of communication and computation depends on the distribution of the matrix.
In a row-distributed matrix each process holds the rows of the matrix that correspond to the portion of the dg::MPI_Vector it holds; every process thus holds the same number of rows. When we implement a matrix-vector multiplication, each process first has to gather all the elements of the input vector it needs to compute its part of the output. In general this requires MPI communication (read the documentation of dg::aCommunicator for more information on how global scatter/gather operations work). After the elements have been gathered into a buffer, the local matrix-vector multiplication can be executed. Formally, the gather operation can be written as a matrix \(G\) of \(1\)'s and \(0\)'s and we write
\[ M v = R\cdot G v \]
where \(R\) is the row-distributed matrix with indices modified to point into a buffer vector and \(G\) is the gather matrix, in which the MPI communication takes place. In this way we achieve a simple split between communication \( w=Gv\) and computation \( Rw\). Since the computation of \( R w\) is entirely local, we can reuse the existing implementation for shared memory systems. Correspondingly, we define the structure dg::MPIDistMat as a simple wrapper around a LocalMatrix type object and an instance of a dg::aCommunicator.
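Conceptually, applying a row-distributed matrix therefore follows the pattern below. The Communicator and LocalMatrix interfaces are hypothetical and chosen for illustration only; the actual dg::MPIDistMat implementation may name these operations differently.

    // Sketch of y = M v for a row-distributed matrix M = R*G.
    template<class Communicator, class LocalMatrix, class LocalVector>
    void row_dist_symv( const Communicator& gather,  // represents G (MPI communication)
                        const LocalMatrix& R,        // local rows, columns index the buffer
                        const LocalVector& x_local,  // local chunk of the input vector
                        LocalVector& y_local)        // local chunk of the output vector
    {
        // 1. communication: w = G v (gather all needed elements into a buffer)
        LocalVector buffer = gather.global_gather( x_local);
        // 2. computation: y = R w (entirely local, reuses the shared memory code)
        R.symv( buffer, y_local);
    }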
In a column-distributed matrix each process holds the columns of the matrix that correspond to the portion of the dg::MPI_Vector it holds; each process thus holds the same number of columns. In a column-distributed matrix the local matrix-vector multiplication can be executed first, because each process already has all the vector elements it needs. However, the resulting elements have to be communicated back to the processes they belong to. Furthermore, a process has to sum all elements it receives from other processes for the same index. This is a scatter-and-reduce operation, and it can be written as a scatter matrix \(S\) (s.a. dg::aCommunicator)
\[ M v= S\cdot C v \]
where \(S\) is the scatter matrix and \(C\) is the column-distributed matrix with modified indices. Again, we can reuse our shared memory algorithms to implement the local matrix-vector operation \( w=Cv\) before the communication step \( S w\). This is also implemented in dg::MPIDistMat.
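The column-distributed case mirrors this pattern with the two steps reversed. Again, the interfaces in the sketch are hypothetical and serve only to make the order of operations explicit.

    // Sketch of y = M v for a column-distributed matrix M = S*C.
    template<class Communicator, class LocalMatrix, class LocalVector>
    void col_dist_symv( const Communicator& scatter, // represents S (MPI communication)
                        const LocalMatrix& C,        // local columns, rows index a buffer
                        const LocalVector& x_local,  // local chunk of the input vector
                        LocalVector& y_local)        // local chunk of the output vector
    {
        // 1. computation: w = C v (entirely local, each process already has its columns)
        LocalVector buffer( C.num_rows());
        C.symv( x_local, buffer);
        // 2. communication: y = S w (send each element to its owner and sum duplicates)
        scatter.global_scatter_reduce( buffer, y_local);
    }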
The dg::RowColDistMat goes one step further for a row-distributed matrix and separates the matrix \( R = R_{inner} + R_{outer}\) into a part that can be computed entirely on the local process and a part that needs communication. This enables the implementation of overlapping communication and computation (s.a. dg::NearestNeighborComm).
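The overlap can be sketched as follows; the nonblocking interface names are hypothetical and only indicate where the communication of the boundary elements is initiated and completed.

    // Sketch of y = (R_inner + R_outer) w with overlapped communication.
    template<class Communicator, class LocalMatrix, class LocalVector>
    void row_col_dist_symv( const Communicator& comm, const LocalMatrix& R_inner,
                            const LocalMatrix& R_outer, const LocalVector& x_local,
                            LocalVector& y_local)
    {
        // 1. start nonblocking exchange of boundary elements (e.g. MPI_Isend/MPI_Irecv)
        auto request = comm.global_gather_init( x_local);
        // 2. overlap: compute the part that only needs local data
        R_inner.symv( x_local, y_local);                 // y = R_inner * x
        // 3. wait for the boundary buffer to arrive (e.g. MPI_Waitall)
        LocalVector buffer = comm.global_gather_wait( request);
        // 4. add the contribution that needed communication
        R_outer.symv( 1., buffer, 1., y_local);          // y += R_outer * buffer
    }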
It turns out that a row-distributed matrix can be transposed by transposing both the local matrix and the gather matrix (s.a. dg::transpose):
\[ M^\mathrm{T} = G^\mathrm{T} R^\mathrm{T}\]
The result is then a column-distributed matrix. Analogously, the transpose of a column-distributed matrix is a row-distributed matrix.
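This identification follows directly from the factorizations above, since the transpose of a gather matrix is a scatter matrix:
\[ M^\mathrm{T} = (R\cdot G)^\mathrm{T} = G^\mathrm{T} R^\mathrm{T} = S\cdot C, \qquad S := G^\mathrm{T}, \quad C := R^\mathrm{T}\]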
You can create a row-distributed MPI matrix given its local parts on each process with local row and global column indices with our dg::convert function. If you have a column-distributed matrix given by its local parts on each process with global row and local column indices, you can use a combination of dg::convertLocal2GlobalCols and dg::convertGlobal2LocalRows to bring it to a row-distributed form. The latter can then be used in dg::convert again.
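A possible usage sketch (it assumes the dg::MPIGrid2d constructor signature shown below, that the grid expects a two-dimensional Cartesian communicator, and that an MPI grid fulfils the ConversionPolicy requirements):

    // Sketch: convert a locally assembled matrix into a row-distributed MPI matrix.
    int size;
    MPI_Comm_size( MPI_COMM_WORLD, &size);
    int dims[2] = {0,0}, periods[2] = {1,1};
    MPI_Dims_create( size, 2, dims);          // 2d Cartesian process distribution
    MPI_Comm comm2d;
    MPI_Cart_create( MPI_COMM_WORLD, 2, dims, periods, 1, &comm2d);

    dg::MPIGrid2d grid( 0., 1., 0., 1., 3, 20, 20, comm2d);
    dg::IHMatrix_t<double> local_part;        // fill with local row, global column indices
    // ... assemble local_part, e.g. an interpolation onto scattered points ...
    dg::MIHMatrix_t<double> mpi_matrix = dg::convert( local_part, grid);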
enum dg::dist_type

Type of distribution of MPI distributed matrices.
template<class ConversionPolicy, class real_type>
dg::MIHMatrix_t< real_type > dg::convert( const dg::IHMatrix_t< real_type > &global, const ConversionPolicy &policy)

Convert a (row-distributed) matrix with local row and global column indices to a row-distributed MPI matrix.

Template Parameters
    ConversionPolicy    (can be one of the MPI grids) has to provide the members required for the index conversion
Parameters
    global    the local part of the matrix (different on each process) with global column indices and num_cols, but local row indices and num_rows
    policy    the conversion object
template<class ConversionPolicy, class real_type>
dg::IHMatrix_t< real_type > dg::convertGlobal2LocalRows( const dg::IHMatrix_t< real_type > &global, const ConversionPolicy &policy)

Convert a (column-distributed) matrix with global row and column indices to a row-distributed matrix.

Send all elements whose global row index does not belong to the calling process to the process where they belong. This can be used to convert a column-distributed matrix to a row-distributed matrix, as shown in the sketch below.

Template Parameters
    ConversionPolicy    (can be one of the MPI grids) has to provide the members required for the index conversion
Parameters
    global    the column indices and num_cols need to be global, the row indices and num_rows local
    policy    the conversion object
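A sketch of the combination described above (it assumes col_dist is a dg::IHMatrix_t<double> assembled with global row and local column indices, and grid is an MPI grid that fulfils the ConversionPolicy requirements):

    // Sketch: bring a column-distributed matrix into row-distributed form.
    dg::convertLocal2GlobalCols( col_dist, grid);       // column indices: local -> global
    dg::IHMatrix_t<double> row_dist =
        dg::convertGlobal2LocalRows( col_dist, grid);   // rows are sent to their owners
    dg::MIHMatrix_t<double> mpi_matrix = dg::convert( row_dist, grid); // final MPI matrix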
template<class ConversionPolicy, class real_type>
void dg::convertLocal2GlobalCols( dg::IHMatrix_t< real_type > &local, const ConversionPolicy &policy)

Convert a matrix with local column indices to a matrix with global column indices.

Simply calls policy.local2globalIdx for every column index.

Template Parameters
    ConversionPolicy    (can be one of the MPI grids) has to provide a local2globalIdx member for the index conversion
Parameters
    local    the column indices and num_cols need to be local, will be global on output
    policy    the conversion object