Discontinuous Galerkin Library
#include "dg/algorithm.h"
BLAS level 2 routines: Matrix-Vector

\( \alpha M \cdot x + \beta y\) and \( x^T M y \) More...


Namespaces

namespace  dg::blas2
 BLAS Level 2 routines.
 

Functions

template<class ContainerType1 , class MatrixType , class ContainerType2 >
auto dg::blas2::dot (const ContainerType1 &x, const MatrixType &m, const ContainerType2 &y)
 \( x^T M y\); Binary reproducible general dot product
 
template<class MatrixType , class ContainerType >
get_value_type< MatrixType > dg::blas2::dot (const MatrixType &m, const ContainerType &x)
 \( x^T M x\); Binary reproducible general dot product
 
template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::blas2::symv (get_value_type< ContainerType1 > alpha, MatrixType &&M, const ContainerType1 &x, get_value_type< ContainerType1 > beta, ContainerType2 &y)
 \( y = \alpha M x + \beta y\)
 
template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::blas2::symv (MatrixType &&M, const ContainerType1 &x, ContainerType2 &y)
 \( y = M x\)
 
template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::blas2::gemv (get_value_type< ContainerType1 > alpha, MatrixType &&M, const ContainerType1 &x, get_value_type< ContainerType1 > beta, ContainerType2 &y)
 Alias for blas2::symv \( y = \alpha M x + \beta y \);.
 
template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::blas2::gemv (MatrixType &&M, const ContainerType1 &x, ContainerType2 &y)
 Alias for blas2::symv \( y = M x\);.
 
template<class Stencil , class ContainerType , class ... ContainerTypes>
void dg::blas2::parallel_for (Stencil f, unsigned N, ContainerType &&x, ContainerTypes &&... xs)
 \( f(i, x_0, x_1, ...)\ \forall i\); Customizable and generic for loop
 
template<class FunctorType , class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::blas2::stencil (FunctorType f, MatrixType &&M, const ContainerType1 &x, ContainerType2 &y)
 \( F(M, x, y)\)
 
template<class MatrixType , class AnotherMatrixType >
void dg::blas2::transfer (const MatrixType &x, AnotherMatrixType &y)
 \( y = x\); Generic way to copy and/or convert a Matrix type to a different Matrix type
 
template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::apply (get_value_type< ContainerType1 > alpha, MatrixType &&M, const ContainerType1 &x, get_value_type< ContainerType1 > beta, ContainerType2 &y)
 Alias for dg::blas2::symv \( y = \alpha M(x) + \beta y \);.
 
template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::apply (MatrixType &&M, const ContainerType1 &x, ContainerType2 &y)
 Alias for dg::blas2::symv \( y = M( x)\);.
 

Detailed Description

\( \alpha M \cdot x + \beta y\) and \( x^T M y \)

Note
Successive calls to blas routines are executed sequentially
A manual synchronization of threads or devices is never needed in an application using these functions. All functions returning a value block until the value is ready.

Function Documentation

◆ apply() [1/2]

template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::apply ( get_value_type< ContainerType1 > alpha,
MatrixType && M,
const ContainerType1 & x,
get_value_type< ContainerType1 > beta,
ContainerType2 & y )
inline

Alias for dg::blas2::symv \( y = \alpha M(x) + \beta y \);.

This alias exists for code readability: if your matrix is not actually a matrix but a functor, it may seem unnatural to write blas2::symv in your code, especially if M is non-linear.
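
For example (a minimal sketch, not taken from the library's test suite; it assumes a functor passed to the alpha/beta overload is called like M( alpha, x, beta, y), as documented for dg::blas2::symv below):

// hypothetical functor acting as the identity operator I
auto identity = []( double alpha, const std::vector<double>& x, double beta,
    std::vector<double>& y)
{
    for( unsigned i=0; i<x.size(); i++)
        y[i] = alpha*x[i] + beta*y[i];
};
std::vector<double> x = {1,2,3}, y = {10,10,10};
dg::apply( 2., identity, x, 1., y); // y = 2*I(x) + 1*y
// y now contains {12, 14, 16}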

◆ apply() [2/2]

template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::apply ( MatrixType && M,
const ContainerType1 & x,
ContainerType2 & y )
inline

Alias for dg::blas2::symv \( y = M( x)\);.

This alias exists for code readability: if your matrix is not actually a matrix but a functor, it may seem unnatural to write blas2::symv in your code, especially if M is non-linear.
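
For example (a minimal sketch, assuming a lambda with the functor signature void operator()( const ContainerType0&, ContainerType1&) documented for dg::blas2::symv below):

// hypothetical non-linear functor: square each element
auto square = []( const std::vector<double>& x, std::vector<double>& y)
{
    for( unsigned i=0; i<x.size(); i++)
        y[i] = x[i]*x[i];
};
std::vector<double> x = {1,2,3}, y(3);
dg::apply( square, x, y); // reads more naturally than dg::blas2::symv( square, x, y)
// y now contains {1, 4, 9}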

◆ dot() [1/2]

template<class ContainerType1 , class MatrixType , class ContainerType2 >
auto dg::blas2::dot ( const ContainerType1 & x,
const MatrixType & m,
const ContainerType2 & y )
inline

\( x^T M y\); Binary reproducible general dot product

This routine computes the scalar product defined by the symmetric positive definite matrix M

\[ x^T M y = \sum_{i,j=0}^{N-1} x_i M_{ij} y_j \]

For example

dg::DVec two( 100,2), three(100,3);
int result = dg::blas2::dot(two, 42., three);
CHECK( result == 25200); //100*(2*42*3)

Or a more elaborate use case

// This code snippet demonstrates how to discretize and compute the norm of a
// function on both shared and distributed memory systems
#ifdef WITH_MPI
// In an MPI environment define a 2d Cartesian communicator
MPI_Comm comm2d = dg::mpi_cart_create( MPI_COMM_WORLD, {0,0}, {1,1});
#endif
// create a grid of the domain [0,2]x[0,2] with 20 cells in x and y and
// 3 polynomial coefficients
dg::x::Grid2d g2d( 0, 2, 0, 2, 3, 20, 20
#ifdef WITH_MPI
, comm2d // in MPI distribute among processes
#endif
);
// define the function to integrate
const double amp = 2.;
auto function = [=] (double x, double y){
    return amp*exp(x)*exp(y);
};
// create the Gaussian weights (volume form) for the integration
const dg::x::DVec w2d = dg::create::weights( g2d);
// discretize the function on the grid
const dg::x::DVec vec = dg::evaluate( function, g2d);
// now compute the scalar product (the L2 norm)
double square_norm = dg::blas2::dot( vec, w2d, vec);
// norm is now (or at least converges to): (e^4-1)^2
CHECK( fabs( square_norm - (exp(4)-1)*(exp(4)-1)) < 1e-6);
Attention
If one of the input vectors contains Inf or NaN, or the product of the input numbers reaches Inf or NaN, then the behaviour is undefined and the function may throw. See dg::ISNFINITE and dg::ISNSANE in that case.
Note
Our implementation guarantees binary reproducible results up to and excluding the last mantissa bit of the result. Furthermore, the sum is computed with infinite precision and the result is then rounded to the nearest double precision number. Although the products are not computed with infinite precision, the order of multiplication is guaranteed. This is possible with the help of an adapted version of the dg::exblas library and works for single and double precision.
Attention
Binary reproducible results are only guaranteed for float or double input. All other value types redirect to dg::blas1::vdot( dg::Product(), x, m, y);
Parameters
x    Left input
m    The diagonal Matrix.
y    Right input (may alias x)
Returns
Generalized scalar product. If x and y are vectors of containers and m is not, then we sum the results of dg::blas2::dot( x[i], m, y[i])
Note
This routine is always executed synchronously due to the implicit memcpy of the result. With MPI the result is broadcast to all processes. Also note that the behaviour is undefined when one of the containers contains NaN.
Template Parameters
MatrixType       MatrixType has to have a category derived from AnyVectorTag and must be compatible with the ContainerTypes
ContainerType    Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ dot() [2/2]

template<class MatrixType , class ContainerType >
get_value_type< MatrixType > dg::blas2::dot ( const MatrixType & m,
const ContainerType & x )
inline

\( x^T M x\); Binary reproducible general dot product

Alias for dg::blas2::dot( x,m,x)

\[ x^T M x = \sum_{i,j=0}^{N-1} x_i M_{ij} x_j \]
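
For example (a minimal sketch in the spirit of the examples above; the numbers are purely illustrative):

dg::DVec w( 100, 0.5), x( 100, 2.);
double norm2 = dg::blas2::dot( w, x); // same as dg::blas2::dot( x, w, x)
CHECK( norm2 == 200); // 100*(0.5*2*2)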

Parameters
m    The diagonal Matrix
x    Right input
Returns
Generalized scalar product. If x is a vector of containers and m is not, then we sum the results of dg::blas2::dot( m, x[i])
Note
This routine is always executed synchronously due to the implicit memcpy of the result.
This routine is equivalent to the call dg::blas2::dot( x, m, x);, which should be preferred because it is more explicit.
Template Parameters
MatrixType       MatrixType has to have a category derived from AnyVectorTag and must be compatible with the ContainerTypes
ContainerType    Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ gemv() [1/2]

template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::blas2::gemv ( get_value_type< ContainerType1 > alpha,
MatrixType && M,
const ContainerType1 & x,
get_value_type< ContainerType1 > beta,
ContainerType2 & y )
inline

Alias for blas2::symv \( y = \alpha M x + \beta y \);.

This alias exists for code readability: if your "symmetric matrix" is not actually a symmetric matrix, it may seem unnatural to write blas2::symv in your code.
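
For example (a minimal sketch, analogous to the symv examples below; the 2x2 matrix values are purely illustrative):

auto mat = dg::SquareMatrix<double>({0,1, 2,3}); // not symmetric
std::vector<double> x = {1,1}, y(2, 10.);
dg::blas2::gemv( 2., mat, x, 1., y); // y = 2*M*x + y
CHECK( y == std::vector<double>{12, 20});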

◆ gemv() [2/2]

template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::blas2::gemv ( MatrixType && M,
const ContainerType1 & x,
ContainerType2 & y )
inline

Alias for blas2::symv \( y = M x\);.

This alias exists for code readability: if your "symmetric matrix" is not actually a symmetric matrix, it may seem unnatural to write blas2::symv in your code.

◆ parallel_for()

template<class Stencil , class ContainerType , class ... ContainerTypes>
void dg::blas2::parallel_for ( Stencil f,
unsigned N,
ContainerType && x,
ContainerTypes &&... xs )
inline

\( f(i, x_0, x_1, ...)\ \forall i\); Customizable and generic for loop

Attention
Only works for shared memory vectors (or scalars): no MPI, no recursive vectors (see the reasons in the Note below).
For trivially parallel operations (no neighboring points involved) use dg::blas1::subroutine

This routine loops over an arbitrary user-defined "loop body" functor f with an arbitrary number of arguments \( x_s\) elementwise

\[ f(i, x_{0}, x_{1}, ...) \]

where i iterates from 0 to a given size N. The order of iterations is undefined. It is equivalent to the following

for( unsigned i=0; i<N; i++)
    f( i, &x_0[0], &x_1[0], ...);

With this function very general for-loops can be parallelized like for example

// Compute forward difference of vector
std::vector<double> vec1 = {11,12,13}, vec2(3);
dg::blas2::parallel_for( []DG_DEVICE( unsigned i, const double* x, double* y, int mod)
    {
        y[i] = x[(i+1)%mod] - x[i];
    }, 3, vec1, vec2, 3);
CHECK( vec2 == std::vector{ 1.,1.,-2.});

or

// Compute transpose of vector
const unsigned nx = 3, ny = 3;
std::vector<double> vec = {11,12,13, 21,22,23, 31,32,33}, vecT(9);
dg::blas2::parallel_for( [nx,ny]DG_DEVICE( unsigned k, const double* ii, double* oo)
    {
        unsigned i = k/nx, j = k%nx;
        oo[j*ny+i] = ii[i*nx+j];
    }, nx*ny, vec, vecT);
CHECK( vecT == std::vector<double>{ 11,21,31, 12,22,32, 13,23,33});
Note
In a way this function is a generalization of dg::blas1::subroutine to non-trivial parallelization tasks. However, this comes at a price: this function only works for containers with the dg::SharedVectorTag and scalar types. The reason it cannot work for MPI is that the stencil (and thus the communication pattern) is unknown. However, it can serve as an important building block for other parallel functions like dg::blas2::stencil.
This is the closest function we have to kokkos::parallel_for of the Kokkos library.
Parameters
f     the loop body
N     the total number of iterations in the for loop
x     the first argument
xs    other arguments
Attention
The user has to decide whether or not it is safe to alias input or output vectors. If in doubt, do not alias output vectors.
Template Parameters
Stencil          A function or functor with an arbitrary number of arguments and no return type; the first argument is an unsigned (the loop iteration), afterwards it takes a const_pointer_type argument (const pointer to the first element in the vector) for each input argument in the call and a pointer_type argument (pointer to the first element in the vector) for each output argument. Scalars are forwarded "as is" (scalar_type). Stencil must be callable on the device in use. In particular, with CUDA it must be a functor (not a function) and its signature must contain the __device__ specifier (s.a. DG_DEVICE).
ContainerType    Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the SharedVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp)
  • int, double and other primitive types ...

◆ stencil()

template<class FunctorType , class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::blas2::stencil ( FunctorType f,
MatrixType && M,
const ContainerType1 & x,
ContainerType2 & y )
inline

\( F(M, x, y)\)

This routine calls

\[ F(i, [M], x, y) \]

for all \( i \in [0,N[\), where N is the number of rows in M, using dg::blas2::parallel_for, where [M] depends on the matrix type:

  • for a csr matrix it is [M] = m.row_offsets, m.column_indices, m.values

Possible shared memory implementation

dg::blas2::parallel_for( F, m.num_rows, m.row_offsets, m.column_indices, m.values, x, y);

Other matrix types have not yet been implemented.

Note
Since the matrix is known, a communication pattern is available and thus the function works in parallel for MPI (unlike dg::blas2::parallel_for).
In a way this function is a generalization of dg::blas2::parallel_for to MPI vectors, at the cost of having to encode the communication stencil in the matrix M and of allowing only one vector argument.
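
For example, a functor that simply computes the i-th row of the sparse matrix-vector product could look as follows (a minimal, hypothetical sketch; the name RowSymv and the index type are assumptions, only the calling convention f( i, row_offsets, column_indices, values, x, y) is taken from this documentation):

struct RowSymv
{
    template<class Index, class real_type>
    DG_DEVICE void operator()( unsigned i, const Index* row_offsets,
        const Index* column_indices, const real_type* values,
        const real_type* x, real_type* y) const
    {
        real_type sum = 0;
        // sum the i-th row of the csr matrix times x
        for( Index k = row_offsets[i]; k < row_offsets[i+1]; k++)
            sum += values[k]*x[column_indices[k]];
        y[i] = sum; // equivalent to y = M x for this row
    }
};
// usage sketch (M a sparse matrix, x and y matching vectors):
// dg::blas2::stencil( RowSymv(), M, x, y);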
Parameters
f    The filter function is called like f( i, m.row_offsets_ptr, m.column_indices_ptr, m.values_ptr, x_ptr, y_ptr)
M    The Matrix.
x    input vector
y    contains the solution on output (may not alias x)
Template Parameters
FunctorType    A type that is callable like void operator()( unsigned, pointer, [m_pointers], const_pointer). For GPU vectors the functor must be callable on the device.
MatrixType     So far only one of the dg::SparseMatrix types or their MPI variants is allowed
See also
dg::CSRMedianFilter, dg::create::window_stencil
Template Parameters
ContainerType    Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ symv() [1/2]

template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::blas2::symv ( get_value_type< ContainerType1 > alpha,
MatrixType && M,
const ContainerType1 & x,
get_value_type< ContainerType1 > beta,
ContainerType2 & y )
inline

\( y = \alpha M x + \beta y\)

This routine computes

\[ y = \alpha M x + \beta y \]

where \( M\) is a matrix (or functor that is called like M(alpha,x,beta,y)).

Note
There is nothing that prevents M from being non-symmetric or even non-linear. In this sense the term "symv" (symmetric Matrix-Vector multiplication) is misleading. For better code readability we introduce the aliases dg::blas2::gemv (general Matrix-Vector multiplication) and dg::apply (general, possibly non-linear functor application).

For example

auto mat = dg::SquareMatrix<double>({0,1,2, 3,4,5, 6,7,8});
std::vector<double> vec = {1,2,3}, res(3, 1000);
dg::blas2::symv( 0.5, mat, vec, 2., res);
CHECK( res == std::vector<double>{2004, 2013, 2022});

or a more elaborate use case

// This code snippet demonstrates how to derive a function on a device
#ifdef WITH_MPI
// In an MPI environment define a 2d Cartesian communicator
MPI_Comm comm2d = dg::mpi_cart_create( MPI_COMM_WORLD, {0,0}, {0,1});
#endif
// create a grid of the domain [0,2]x[0,2] with 20 cells in x and y and
// 3 polynomial coefficients
dg::x::Grid2d g2d( 0, 2, 0, 2, 3, 20, 20, dg::DIR, dg::PER
#ifdef WITH_MPI
, comm2d // in MPI distribute among processes
#endif
);
// define a function to derive
auto function = [](double x, double y){
    return sin(x)*sin(y);
};
//Define a device matrix: the derivative in x direction
//(dg::create::dx and the dg::x::DMatrix alias are assumed here so that the snippet is self-contained)
const dg::x::DMatrix dx = dg::create::dx( g2d);
//discretize the function on the grid and transfer the result to the device
const dg::x::DVec x = dg::evaluate( function, g2d);
//allocate memory for the result
dg::x::DVec y(x);
//apply the derivative to x and store result in y
dg::blas2::symv(dx, x, y);
//or equivalently
dg::blas2::symv(1., dx, x, 0., y);
Parameters
alpha    A Scalar
M        The Matrix.
x        input vector
beta     A Scalar
y        contains the solution on output (may not alias x)
Attention
y may not alias x; the only exception is if MatrixType has the AnyVectorTag.
If y on input contains a NaN or Inf, it may contain NaN or Inf on output as well even if beta is zero.
Template Parameters
MatrixType       Any class for which a specialization of TensorTraits exists and defines a tensor_category derived from AnyMatrixTag. Furthermore, any functor/lambda type with signature void operator()( const ContainerType0&, ContainerType1&). For example, in case of SelfMadeMatrixTag only those blas2 functions that have a corresponding member function in the Matrix class (e.g. symv( const ContainerType0&, ContainerType1&);) can be called. If a Container has the RecursiveVectorTag, then the matrix is applied to each of the elements unless the type has the SelfMadeMatrixTag or is a Functor type.
ContainerType    Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ symv() [2/2]

template<class MatrixType , class ContainerType1 , class ContainerType2 >
void dg::blas2::symv ( MatrixType && M,
const ContainerType1 & x,
ContainerType2 & y )
inline

\( y = M x\)

This routine computes

\[ y = M x \]

where \( M\) is a matrix (or functor that is called like M(x,y)).

Note
There is nothing that prevents M from being non-symmetric or even non-linear. In this sense the term "symv" (symmetric Matrix-Vector multiplication) is misleading. For better code readability we introduce the aliases dg::blas2::gemv (general Matrix-Vector multiplication) and dg::apply (general, possibly non-linear functor application).

For example

auto mat = dg::SquareMatrix<double>({0,1,2, 3,4,5, 6,7,8});
std::vector<double> vec = {1,2,3}, res(3);
dg::blas2::symv( mat, vec, res);
CHECK( res == std::vector<double>{8, 26, 44});

or a more elaborate use case

// This code snippet demonstrates how to derive a function on a device
#ifdef WITH_MPI
// In an MPI environment define a 2d Cartesian communicator
MPI_Comm comm2d = dg::mpi_cart_create( MPI_COMM_WORLD, {0,0}, {0,1});
#endif
// create a grid of the domain [0,2]x[0,2] with 20 cells in x and y and
// 3 polynomial coefficients
dg::x::Grid2d g2d( 0, 2, 0, 2, 3, 20, 20, dg::DIR, dg::PER
#ifdef WITH_MPI
, comm2d // in MPI distribute among processes
#endif
);
// define a function to derive
auto function = [](double x, double y){
    return sin(x)*sin(y);
};
//Define a device matrix: the derivative in x direction
//(dg::create::dx and the dg::x::DMatrix alias are assumed here so that the snippet is self-contained)
const dg::x::DMatrix dx = dg::create::dx( g2d);
//discretize the function on the grid and transfer the result to the device
const dg::x::DVec x = dg::evaluate( function, g2d);
//allocate memory for the result
dg::x::DVec y(x);
//apply the derivative to x and store result in y
dg::blas2::symv(dx, x, y);
//or equivalently
dg::blas2::symv(1., dx, x, 0., y);
Parameters
M    The Matrix.
x    input vector
y    contains the solution on output (may not alias x)
Attention
y may not alias x; the only exception is if MatrixType has the AnyVectorTag and ContainerType1 == ContainerType2.
If y on input contains a NaN or Inf, it is not a priori clear whether it will contain a NaN or Inf on output as well. Our own matrix formats overwrite y and work correctly, but for third-party libraries it is worth double-checking.
Template Parameters
MatrixType       Any class for which a specialization of TensorTraits exists and defines a tensor_category derived from AnyMatrixTag. Furthermore, any functor/lambda type with signature void operator()( const ContainerType0&, ContainerType1&). For example, in case of SelfMadeMatrixTag only those blas2 functions that have a corresponding member function in the Matrix class (e.g. symv( const ContainerType0&, ContainerType1&);) can be called. If a Container has the RecursiveVectorTag, then the matrix is applied to each of the elements unless the type has the SelfMadeMatrixTag or is a Functor type.
ContainerType    Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ transfer()

template<class MatrixType , class AnotherMatrixType >
void dg::blas2::transfer ( const MatrixType & x,
AnotherMatrixType & y )
inline

\( y = x\); Generic way to copy and/or convert a Matrix type to a different Matrix type

e.g. from CPU to GPU, or double to float, etc.
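
For example (a minimal sketch; dg::HMatrix, dg::DMatrix and dg::create::dx are assumed aliases/factory functions of the dg library and g2d an existing grid):

dg::HMatrix host_matrix = dg::create::dx( g2d); // assemble the matrix on the host
dg::DMatrix device_matrix;
dg::blas2::transfer( host_matrix, device_matrix); // copy/convert it to the device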

Template Parameters
MatrixType           Any class for which a specialization of TensorTraits exists and defines a tensor_category derived from AnyMatrixTag. Furthermore, any functor/lambda type with signature void operator()( const ContainerType0&, ContainerType1&). For example, in case of SelfMadeMatrixTag only those blas2 functions that have a corresponding member function in the Matrix class (e.g. symv( const ContainerType0&, ContainerType1&);) can be called. If a Container has the RecursiveVectorTag, then the matrix is applied to each of the elements unless the type has the SelfMadeMatrixTag or is a Functor type.
AnotherMatrixType    Another Matrix type
Parameters
x    source
y    sink
Note
y gets resized properly