Discontinuous Galerkin Library
#include "dg/algorithm.h"
BLAS level 1 routines: Vector-Vector

\( f( x_{0i}, x_{1i}, x_{2i}, ...) \) and \( x^T y\)


Namespaces

namespace  dg::blas1
 BLAS Level 1 routines.
 

Functions

template<class Functor , class ContainerType , class ... ContainerTypes>
auto dg::blas1::vdot (Functor f, const ContainerType &x, const ContainerTypes &...xs) -> std::invoke_result_t< Functor, dg::get_value_type< ContainerType >, dg::get_value_type< ContainerTypes >... >
 \( \sum_i f(x_{0i}, x_{1i}, ...)\) Extended Precision transform reduce
 
template<class ContainerType1 , class ContainerType2 >
auto dg::blas1::dot (const ContainerType1 &x, const ContainerType2 &y)
 \( x^T y\) Binary reproducible Euclidean dot product between two vectors
 
template<class ContainerType , class OutputType , class BinaryOp , class UnaryOp = IDENTITY>
OutputType dg::blas1::reduce (const ContainerType &x, OutputType zero, BinaryOp binary_op, UnaryOp unary_op=UnaryOp())
 \( f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_{N-1}) \) Custom (transform) reduction
 
template<class ContainerTypeIn , class ContainerTypeOut >
void dg::blas1::copy (const ContainerTypeIn &source, ContainerTypeOut &target)
 \( y=x \)
 
template<class ContainerType , class value_type >
void dg::blas1::scal (ContainerType &x, value_type alpha)
 \( x = \alpha x\)
 
template<class ContainerType , class value_type >
void dg::blas1::plus (ContainerType &x, value_type alpha)
 \( x = x + \alpha \)
 
template<class ContainerType , class ContainerType1 , class value_type , class value_type1 >
void dg::blas1::axpby (value_type alpha, const ContainerType1 &x, value_type1 beta, ContainerType &y)
 \( y = \alpha x + \beta y\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 , class value_type2 >
void dg::blas1::axpbypgz (value_type alpha, const ContainerType1 &x, value_type1 beta, const ContainerType2 &y, value_type2 gamma, ContainerType &z)
 \( z = \alpha x + \beta y + \gamma z\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::axpby (value_type alpha, const ContainerType1 &x, value_type1 beta, const ContainerType2 &y, ContainerType &z)
 \( z = \alpha x + \beta y\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, value_type1 beta, ContainerType &y)
 \( y = \alpha x_1 x_2 + \beta y\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDot (const ContainerType1 &x1, const ContainerType2 &x2, ContainerType &y)
 \( y = x_1 x_2 \)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class value_type , class value_type1 >
void dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, const ContainerType3 &x3, value_type1 beta, ContainerType &y)
 \( y = \alpha x_1 x_2 x_3 + \beta y\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::pointwiseDivide (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, value_type1 beta, ContainerType &y)
 \( y = \alpha x_1/ x_2 + \beta y \)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDivide (const ContainerType1 &x1, const ContainerType2 &x2, ContainerType &y)
 \( y = x_1/ x_2\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class ContainerType4 , class value_type , class value_type1 , class value_type2 >
void dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &y1, value_type1 beta, const ContainerType3 &x2, const ContainerType4 &y2, value_type2 gamma, ContainerType &z)
 \( z = \alpha x_1y_1 + \beta x_2y_2 + \gamma z\)
 
template<class ContainerType , class ContainerType1 , class UnaryOp >
void dg::blas1::transform (const ContainerType1 &x, ContainerType &y, UnaryOp op)
 \( y = op(x)\)
 
template<class ContainerType , class BinarySubroutine , class Functor , class ContainerType0 , class ... ContainerTypes>
void dg::blas1::evaluate (ContainerType &y, BinarySubroutine f, Functor g, const ContainerType0 &x0, const ContainerTypes &...xs)
 \( f(g(x_0,x_1,...), y)\)
 
template<class Subroutine , class ContainerType , class ... ContainerTypes>
void dg::blas1::subroutine (Subroutine f, ContainerType &&x, ContainerTypes &&... xs)
 \( f(x_0, x_1, ...)\); Customizable and generic blas1 function
 
template<class ContainerType0 , class BinarySubroutine , class Functor , class ContainerType1 , class ... ContainerTypes>
void dg::blas1::kronecker (ContainerType0 &y, BinarySubroutine f, Functor g, const ContainerType1 &x0, const ContainerTypes &...xs)
 \( f(g(x_{0i_0},x_{1i_1},...), y_I)\) (Kronecker evaluation)
 
template<class from_ContainerType , class ContainerType , class ... Params>
void dg::assign (const from_ContainerType &from, ContainerType &to, Params &&... ps)
 Generic way to assign the contents of a from_ContainerType object to a ContainerType object optionally given additional parameters.
 
template<class ContainerType , class from_ContainerType , class ... Params>
ContainerType dg::construct (const from_ContainerType &from, Params &&... ps)
 Generic way to construct an object of ContainerType given a from_ContainerType object and optional additional parameters.
 
template<class ContainerType , class Functor , class ... ContainerTypes>
auto dg::kronecker (Functor &&f, const ContainerType &x0, const ContainerTypes &... xs)
 \( y_I = f(x_{0i_0}, x_{1i_1}, ...) \) Memory allocating version of dg::blas1::kronecker
 

Detailed Description

\( f( x_{0i}, x_{1i}, x_{2i}, ...) \) and \( x^T y\)

Note
Successive calls to blas routines are executed sequentially
A manual synchronization of threads or devices is never needed in an application using these functions. All functions returning a value block until the value is ready.
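
For example, the following sketch (assuming dg::DVec inputs) chains two blas1 calls without any manual synchronization; the call to dg::blas1::dot blocks until its value is ready:

dg::DVec x( 100, 2.), y( 100, 3.);
dg::blas1::axpby( 1., x, 1., y);      // y = x + y, may execute asynchronously on a GPU
double norm = dg::blas1::dot( y, y);  // blocks and returns the finished result
CHECK( norm == 2500.); // 100*(5*5)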

Function Documentation

◆ assign()

template<class from_ContainerType , class ContainerType , class ... Params>
void dg::assign ( const from_ContainerType & from,
ContainerType & to,
Params &&... ps )
inline

Generic way to assign the contents of a from_ContainerType object to a ContainerType object optionally given additional parameters.

The idea of this function is to convert between types with the same data layout but different execution policies (e.g. from a thrust::host_vector to a thrust::device_vector). If the layout differs, additional parameters can be used to achieve what you want.

For example

//Assign a host vector to a device vector
dg::HVec host( 100, 1.);
dg::DVec device(100);
dg::assign( host, device );
CHECK( device == dg::DVec( 100, 1.));
//Assign a host vector to all elements of a std::vector of 3 dg::DVec
std::vector<dg::DVec> device_vec;
dg::assign( host, device_vec, 3);
REQUIRE( device_vec.size() == 3);
for( unsigned u=0; u<3; u++)
    CHECK( device_vec[u] == device);
Parameters
from  source vector
to    target vector, contains a copy of from on output (memory is automatically resized if necessary)
ps    additional parameters usable for the transfer operation
Note
it is possible to assign a from_ContainerType to a std::array<ContainerType, N> (all elements are initialized with from_ContainerType) and also a std::vector<ContainerType> ( the desired size of the std::vector must be provided as an additional parameter)
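
As a sketch of the std::array case mentioned in the note (continuing the example above with host and device; the call pattern is assumed analogous to the std::vector case, only without a size parameter):

std::array<dg::DVec, 3> device_arr;
dg::assign( host, device_arr);
for( unsigned u=0; u<3; u++)
    CHECK( device_arr[u] == device);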
Template Parameters
from_ContainerType  must have the same data policy derived from AnyVectorTag as ContainerType (with the exception of std::array and std::vector) but can have a different execution policy
Params              in some cases additional parameters that are necessary to assign objects of type ContainerType
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ axpby() [1/2]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::axpby ( value_type alpha,
const ContainerType1 & x,
value_type1 beta,
const ContainerType2 & y,
ContainerType & z )
inline

\( z = \alpha x + \beta y\)

This routine computes

\[ z_i = \alpha x_i + \beta y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three(100,3), result(100);
dg::blas1::axpby( 2, two, 3., three, result);
CHECK( result == dg::DVec( 100, 13.)); // 2*2+3*3
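Since scalar arguments are interpreted as constant vectors, a call like the following sketch should also work (continuing the snippet above, with the scalar 1 taking the place of x; assumed from the documented scalar rule):

dg::blas1::axpby( 2., 1., 3., three, result);
CHECK( result == dg::DVec( 100, 11.)); // 2*1+3*3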
Parameters
alpha  Scalar
x      ContainerType x, may alias z
beta   Scalar
y      ContainerType y, may alias z
z      (write-only) ContainerType z contains solution on output
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ axpby() [2/2]

template<class ContainerType , class ContainerType1 , class value_type , class value_type1 >
void dg::blas1::axpby ( value_type alpha,
const ContainerType1 & x,
value_type1 beta,
ContainerType & y )
inline

\( y = \alpha x + \beta y\)

This routine computes

\[ y_i = \alpha x_i + \beta y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three(100,3);
dg::blas1::axpby( 2, two, 3., three);
CHECK( three == dg::DVec( 100, 13.)); // 2*2+3*3
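As a sketch of the recursive dispatch mentioned above, a std::vector<dg::DVec> is processed entry by entry (vector names chosen here for illustration):

std::vector<dg::DVec> xs( 2, dg::DVec(100,2.)), ys( 2, dg::DVec(100,3.));
dg::blas1::axpby( 2., xs, 3., ys);
for( unsigned u=0; u<2; u++)
    CHECK( ys[u] == dg::DVec( 100, 13.)); // 2*2+3*3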
Parameters
alpha  Scalar
x      ContainerType x, may alias y
beta   Scalar
y      (read/write) ContainerType y contains solution on output
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ axpbypgz()

template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 , class value_type2 >
void dg::blas1::axpbypgz ( value_type alpha,
const ContainerType1 & x,
value_type1 beta,
const ContainerType2 & y,
value_type2 gamma,
ContainerType & z )
inline

\( z = \alpha x + \beta y + \gamma z\)

This routine computes

\[ z_i = \alpha x_i + \beta y_i + \gamma z_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three(100,3), result(100,1);
dg::blas1::axpbypgz( 2., two, 3., three, 4., result);
CHECK( result == dg::DVec( 100, 17.)); // 2*2+3*3+4*1
Parameters
alpha  Scalar
x      ContainerType x, may alias result
beta   Scalar
y      ContainerType y, may alias result
gamma  Scalar
z      (read/write) ContainerType contains solution on output
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ construct()

template<class ContainerType , class from_ContainerType , class ... Params>
ContainerType dg::construct ( const from_ContainerType & from,
Params &&... ps )
inline

Generic way to construct an object of ContainerType given a from_ContainerType object and optional additional parameters.

The idea of this function is to convert between types with the same data layout but different execution policies (e.g. from a thrust::host_vector to a thrust::device_vector). If the layout differs, additional parameters can be used to achieve what you want.

For example

//Construct a device vector from host vector
dg::HVec host( 100, 1.);
auto device = dg::construct<dg::DVec>( host );
CHECK( device == dg::DVec( 100, 1.));
// Construct an array of vectors
auto device_arr = dg::construct<std::array<dg::DVec, 3>>( host );
for( unsigned u=0; u<3; u++)
    CHECK( device_arr[u] == device);
//Construct a std::vector of 3 dg::DVec from a host vector
auto device_vec = dg::construct<std::vector<dg::DVec>>( host, 3);
REQUIRE( device_vec.size() == 3);
for( unsigned u=0; u<3; u++)
    CHECK( device_vec[u] == device);
// Can also be used to change value type, e.g. complex:
auto complex_dev = dg::construct<dg::cDVec>( host );
CHECK( complex_dev == dg::cDVec( 100, thrust::complex<double>(1.)));
Parameters
from  source vector
ps    additional parameters necessary to construct a ContainerType object
Returns
from converted to the new format (memory is allocated accordingly)
Note
it is possible to construct a std::array<ContainerType, N> (all elements are initialized with from_ContainerType) and also a std::vector<ContainerType> ( the desired size of the std::vector must be provided as an additional parameter) given a from_ContainerType
Template Parameters
from_ContainerType  must have the same data policy derived from AnyVectorTag as ContainerType (with the exception of std::array and std::vector) but can have a different execution policy
Params              in some cases additional parameters that are necessary to construct objects of type ContainerType
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ copy()

template<class ContainerTypeIn , class ContainerTypeOut >
void dg::blas1::copy ( const ContainerTypeIn & source,
ContainerTypeOut & target )
inline

\( y=x \)

explicit pointwise assignment \( y_i = x_i\)

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), two_copy(100);
dg::blas1::copy( two, two_copy);
CHECK( two_copy == two);
Parameters
source  vector to copy
target  (write-only) destination
Note
in contrast to the dg::assign functions the copy function uses the execution policy to determine the implementation and thus works only on types with same execution policy
catches self-assignment
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
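For instance, a sketch of the distinction made above: assigning the scalar 0 via copy overwrites every element and therefore clears NaN:

dg::DVec y( 100, NAN);
dg::blas1::copy( 0., y);
CHECK( y == dg::DVec( 100, 0.));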
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ dot()

template<class ContainerType1 , class ContainerType2 >
auto dg::blas1::dot ( const ContainerType1 & x,
const ContainerType2 & y )
inline

\( x^T y\) Binary reproducible Euclidean dot product between two vectors

This routine computes

\[ x^T y = \sum_{i=0}^{N-1} x_i y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2.0), three(100,3.0);
double result = dg::blas1::dot(two, three);
CHECK( result == 600.0); //100*(2*3)

or

std::vector<thrust::complex<double>> c0( 100, {1,1}), c1(100, {1,-1});
auto cresult = dg::blas1::dot(c0, c1);
// Result is of complex type
CHECK( cresult == thrust::complex<double>{200,0});
Attention
if one of the input vectors contains Inf or NaN, or the product of the input numbers reaches Inf or NaN, then the behaviour is undefined and the function may throw. See dg::ISNFINITE and dg::ISNSANE in that case
Note
Our implementation guarantees binary reproducible results. The sum is computed with infinite precision and the result is rounded to the nearest double precision number. This is possible with the help of an adapted version of the dg::exblas library and works for single and double precision.
Attention
Binary Reproducible results are only guaranteed for float or double input. All other value types redirect to dg::blas1::vdot( dg::Product(), x, y);
Parameters
x  Left Container
y  Right Container, may alias x
Returns
Scalar product as defined above
Note
This routine is always executed synchronously due to the implicit memcpy of the result. With MPI the result is broadcast to all processes.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ evaluate()

template<class ContainerType , class BinarySubroutine , class Functor , class ContainerType0 , class ... ContainerTypes>
void dg::blas1::evaluate ( ContainerType & y,
BinarySubroutine f,
Functor g,
const ContainerType0 & x0,
const ContainerTypes &... xs )
inline

\( f(g(x_0,x_1,...), y)\)

This routine elementwise evaluates

\[ f(g(x_{0i}, x_{1i}, ...), y_i) \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

auto function = [](double x, double y){
    return sin(x)*sin(y);
};
dg::HVec pi2(20, M_PI/2.), pi3( 20, 3*M_PI/2.), result(20, 0);
dg::blas1::evaluate( result, dg::equals(), function, pi2, pi3);
CHECK( result == dg::HVec( 20, -1.)); // sin(M_PI/2.)*sin(3*M_PI/2.) = -1
Template Parameters
BinarySubroutine  Functor with signature: void ( value_type_g, value_type_y&) i.e. it reads the first (and second) and writes into the second argument
Functor           signature: value_type_g operator()( value_type_x0, value_type_x1, ...)
Attention
Both BinarySubroutine and Functor must be callable on the device in use. In particular, with CUDA they must be functor types (not functions) and their signatures must contain the __device__ specifier. (s.a. DG_DEVICE)
Parameters
y   contains result
f   The subroutine, for example dg::equals or dg::plus_equals, see dg::blas1::evaluate binary operators for a collection of predefined functors to use here
g   The functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here
x0  first input
xs  more input
Note
all aliases allowed
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ kronecker() [1/2]

template<class ContainerType0 , class BinarySubroutine , class Functor , class ContainerType1 , class ... ContainerTypes>
void dg::blas1::kronecker ( ContainerType0 & y,
BinarySubroutine f,
Functor g,
const ContainerType1 & x0,
const ContainerTypes &... xs )
inline

\( f(g(x_{0i_0},x_{1i_1},...), y_I)\) (Kronecker evaluation)

This routine elementwise evaluates

\[ f(g(x_{0i_0}, x_{1i_1}, ..., x_{(n-1)i_{n-1}}), y_{((i_{n-1} N_{n-2} +...)N_1+i_1)N_0+i_0}) \]

for all combinations of input values. \( N_i\) is the size of the vector \( x_i\). The first index \( i_0\) is the fastest varying in the output, then \( i_1\), etc. If \( x_i\) is a scalar then the size \( N_i = 1\).

Attention
None of the \( x_i\) or \( y\) can have the dg::RecursiveVectorTag

The size of the output \( y\) must match the product of sizes of input vectors i.e.

\[ N_y = \prod_{i=0}^{n-1} N_i \]

The order of evaluations is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

auto function = []( double x, double y) {
    return x+y;
};
std::vector<double> xs{1,2,3,4}, ys{ 10,20,30,40}, y(16, 0);
dg::blas1::kronecker( y, dg::equals(), function, xs, ys);
// Note how xs varies fastest in result:
std::vector<double> result = { 11,12,13,14,21,22,23,24,31,32,33,34,41,42,43,44};
CHECK( y == result);
// Note that the following code is equivalent
// but that dg::blas1::kronecker has a performance advantage since
// it never explicitly forms XS or YS
std::vector<double> XS(16), YS(16);
for( unsigned i=0; i<4; i++)
    for( unsigned k=0; k<4; k++)
    {
        XS[i*4+k] = xs[k];
        YS[i*4+k] = ys[i];
    }
dg::blas1::evaluate( y, dg::equals(), function, XS, YS);
CHECK( y == result);
// Finally, we could also write
dg::blas1::kronecker( XS, dg::equals(), []( double x, double y){ return x;}, xs, ys);
dg::blas1::kronecker( YS, dg::equals(), []( double x, double y){ return y;}, xs, ys);
dg::blas1::evaluate( y, dg::equals(), function, XS, YS);
CHECK( y == result);
Note
This function is trivially parallel and the MPI version simply calls the appropriate shared memory version. The user is responsible for making sure that the result has the correct communicator.
For the function \( f(x_0, x_1, ..., x_{n-1}) = x_0 x_1 ... x_{n-1} \), dg::blas1::kronecker( y, dg::equals(), f, x_0, x_1, ...) computes the actual Kronecker product of the arguments in reversed order

\[ y = x_{n-1} \otimes x_{n-2} \otimes ... \otimes x_1 \otimes x_0\]

(or the outer product). With this behaviour we can, e.g. in Cartesian coordinates, naturally define functions \( f(x,y,z)\), evaluate them on product space coordinates, and have \( x \) as the fastest varying coordinate in memory.
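A sketch of this outer-product behaviour (vector names chosen here for illustration):

std::vector<double> x{1,2}, w{10,100}, P(4, 0);
dg::blas1::kronecker( P, dg::equals(), []( double xi, double wi){ return xi*wi;}, x, w);
// x varies fastest in memory, i.e. P is the Kronecker product of w and x
CHECK( P == std::vector<double>{ 10,20,100,200});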
Template Parameters
BinarySubroutine  Functor with signature: void ( value_type_g, value_type_y&) i.e. it reads the first (and second) and writes into the second argument
Functor           signature: value_type_g operator()( value_type_x0, value_type_x1, ...)
Attention
Both BinarySubroutine and Functor must be callable on the device in use. In particular, with CUDA they must be functor types (not functions) and their signatures must contain the __device__ specifier. (s.a. DG_DEVICE)
Parameters
y   contains result (size of y must match the product of sizes of \( x_i\))
f   The subroutine, for example dg::equals or dg::plus_equals, see dg::blas1::evaluate binary operators for a collection of predefined functors to use here
g   The functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here
x0  first input
xs  more input
Note
all aliases allowed
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
dg::kronecker

◆ kronecker() [2/2]

template<class ContainerType , class Functor , class ... ContainerTypes>
auto dg::kronecker ( Functor && f,
const ContainerType & x0,
const ContainerTypes &... xs )

\( y_I = f(x_{0i_0}, x_{1i_1}, ...) \) Memory allocating version of dg::blas1::kronecker

In a shared memory space with serial execution this function is implemented roughly as in the following pseudo-code

// assume x0, xs are host vectors
size = product( x0.size(), xs.size(), ...)
using value_type = decltype( f(x0[0], xs[0]...));
thrust::host_vector<value_type> y(size);
dg::blas1::kronecker( y, dg::equals(), f, x0, xs...);
Note
The return type is inferred from the execution_policy and the tensor_category of the input vectors. It is unspecified. It is such that the resulting vector is exactly compatible in a call to dg::blas1::kronecker( result, dg::equals(), f, x0, xs...);

The MPI distributed version of this function is implemented as

MPI_Comm comm_kron = dg::mpi_cart_kron( x0.communicator(), xs.communicator()...);
return MPI_Vector{dg::kronecker( f, x0.data(), xs.data()...), comm_kron}; // a dg::MPI_Vector
Attention
In particular this means that in MPI all the communicators in the input argument vectors need to be Cartesian communicators that were created from a common Cartesian root communicator, and both the root and all sub communicators need to be registered in the dg library through calls to dg::register_mpi_cart_sub or dg::mpi_cart_sub. Further, the order of input communicators must match the dimensions in the common root communicator (see dg::mpi_cart_kron), i.e. currently in MPI it is not possible to transpose with this function.

The rationale for this behaviour is that:

  1. the MPI standard has no easy way of finding a common ancestor to Cartesian sub communicators
  2. the MPI standard has no easy way of re-joining previously split Cartesian communicators
  3. we want to avoid creating a new communicator every time this function is called.

For example

auto function = []( double x, double y) {
    return x+y;
};
std::vector<double> xs{1,2,3,4}, ys{ 10,20,30,40};
auto y = dg::kronecker( function, xs, ys);
// Note how xs varies fastest in result:
std::vector<double> result = { 11,12,13,14,21,22,23,24,31,32,33,34,41,42,43,44};
CHECK( y == result);
Template Parameters
Functor  signature: value_type_g operator()( value_type_x0, value_type_x1, ...)
Attention
Functor must be callable on the device in use. In particular, with CUDA it must be a functor type (not a function) and its signature must contain the __device__ specifier. (s.a. DG_DEVICE)
Parameters
f   The functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here
x0  first input
xs  more input
Returns
newly allocated result (size of container matches the product of sizes of \( x_i\))
Note
all aliases allowed
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
dg::blas1::kronecker dg::mpi_cart_kron

◆ plus()

template<class ContainerType , class value_type >
void dg::blas1::plus ( ContainerType & x,
value_type alpha )
inline

\( x = x + \alpha \)

This routine computes

\[ x_i + \alpha \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2);
dg::blas1::plus( two, 3.);
CHECK( two == dg::DVec( 100, 5.));
Parameters
alpha  Scalar
x      (read/write) x
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ pointwiseDivide() [1/2]

template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDivide ( const ContainerType1 & x1,
const ContainerType2 & x2,
ContainerType & y )
inline

\( y = x_1/ x_2\)

Divides two vectors element by element:

\[ y_i = x_{1i}/x_{2i}\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three( 100,3), result(100);
dg::blas1::pointwiseDivide( two, three, result );
CHECK( result == dg::DVec( 100, 2./3.));
Parameters
x1  ContainerType x1
x2  ContainerType x2, may alias x1
y   (write-only) ContainerType y contains result on output (may alias x1 and/or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ pointwiseDivide() [2/2]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::pointwiseDivide ( value_type alpha,
const ContainerType1 & x1,
const ContainerType2 & x2,
value_type1 beta,
ContainerType & y )
inline

\( y = \alpha x_1/ x_2 + \beta y \)

Divides two vectors element by element:

\[ y_i = \alpha x_{1i}/x_{2i} + \beta y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three( 100,3), result(100,1);
dg::blas1::pointwiseDivide( 3, two, three, 5, result );
CHECK( result == dg::DVec( 100, 7.)); // 3*2/3+5*1
Parameters
alpha  scalar
x1     ContainerType x1
x2     ContainerType x2, may alias x1
beta   scalar
y      (read/write) ContainerType y contains result on output (may alias x1 and/or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ pointwiseDot() [1/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDot ( const ContainerType1 & x1,
const ContainerType2 & x2,
ContainerType & y )
inline

\( y = x_1 x_2 \)

Multiplies two vectors element by element:

\[ y_i = x_{1i}x_{2i}\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three( 100,3), result(100);
dg::blas1::pointwiseDot( two, three, result );
CHECK( result == dg::DVec( 100, 6.)); // 2*3
Parameters
x1  ContainerType x1
x2  ContainerType x2, may alias x1
y   (write-only) ContainerType y contains result on output (may alias x1 or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ pointwiseDot() [2/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class value_type , class value_type1 >
void dg::blas1::pointwiseDot ( value_type alpha,
const ContainerType1 & x1,
const ContainerType2 & x2,
const ContainerType3 & x3,
value_type1 beta,
ContainerType & y )
inline

\( y = \alpha x_1 x_2 x_3 + \beta y\)

Multiplies three vectors element by element:

\[ y_i = \alpha x_{1i}x_{2i}x_{3i} + \beta y_i\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three( 100,3), four(100,4), result(100,6);
dg::blas1::pointwiseDot(2., two, three, four, -4., result );
CHECK( result == dg::DVec( 100, 24.)); // 2*2*3*4-4*6
Parameters
alpha  scalar
x1     ContainerType x1
x2     ContainerType x2, may alias x1
x3     ContainerType x3, may alias x1 and/or x2
beta   scalar
y      (read/write) ContainerType y contains result on output (may alias x1, x2 or x3)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ pointwiseDot() [3/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::pointwiseDot ( value_type alpha,
const ContainerType1 & x1,
const ContainerType2 & x2,
value_type1 beta,
ContainerType & y )
inline

\( y = \alpha x_1 x_2 + \beta y\)

Multiplies two vectors element by element:

\[ y_i = \alpha x_{1i}x_{2i} + \beta y_i\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three( 100,3), result(100,6);
dg::blas1::pointwiseDot(2., two, three, -4., result );
CHECK( result == dg::DVec( 100, -12.)); // 2*2*3-4*6
Parameters
alpha  scalar
x1     ContainerType x1
x2     ContainerType x2, may alias x1
beta   scalar
y      (read/write) ContainerType y contains result on output (may alias x1 or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ pointwiseDot() [4/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class ContainerType4 , class value_type , class value_type1 , class value_type2 >
void dg::blas1::pointwiseDot ( value_type alpha,
const ContainerType1 & x1,
const ContainerType2 & y1,
value_type1 beta,
const ContainerType3 & x2,
const ContainerType4 & y2,
value_type2 gamma,
ContainerType & z )

\( z = \alpha x_1y_1 + \beta x_2y_2 + \gamma z\)

Multiplies and adds vectors element by element:

\[ z_i = \alpha x_{1i}y_{1i} + \beta x_{2i}y_{2i} + \gamma z_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two(100,2), three(100,3), four(100,4), five(100,5), result(100,6);
dg::blas1::pointwiseDot(2., two, three, -4., four, five, 2., result );
CHECK( result == dg::DVec( 100, -56.)); // 2*2*3-4*4*5+2*6
Parameters
alpha  scalar
x1     ContainerType x1
y1     ContainerType y1
beta   scalar
x2     ContainerType x2
y2     ContainerType y2
gamma  scalar
z      (read/write) ContainerType z contains result on output
Note
all aliases are allowed
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ reduce()

template<class ContainerType , class OutputType , class BinaryOp , class UnaryOp = IDENTITY>
OutputType dg::blas1::reduce ( const ContainerType & x,
OutputType zero,
BinaryOp binary_op,
UnaryOp unary_op = UnaryOp() )
inline

\( f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_{N-1}) \) Custom (transform) reduction

This routine computes

\[ s = f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_i) \otimes \dots \otimes f(x_{N-1}) \]

where \( \otimes \) is an arbitrary commutative and associative binary operator, \( f\) is an optional unary operator and

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

Note
numerical addition/multiplication is not exactly associative, which means that the associated reduction loses precision due to inexact arithmetic. For binary reproducible, exactly rounded results use the dg::blas1::dot function. However, this function is more general and faster to execute than dg::blas1::dot.

For example

//Check if a vector contains Inf or NaN
thrust::device_vector<double> x( 100, NAN );
bool hasnan = false;
hasnan = dg::blas1::reduce( x, false, thrust::logical_or<bool>(),
    dg::ISNFINITE<double>());
CHECK( hasnan == true);

or

// Find minimum and maximum of a vector
for( int u=0; u<100; u++)
    x[u] = (double)(u-10)*(u-10);
// Notice the zero elements of the min and max functions
double min = dg::blas1::reduce( x, +1e308, thrust::minimum<double>());
double max = dg::blas1::reduce( x, -1e308, thrust::maximum<double>());
CHECK( min == 0);
CHECK( max == 89*89);
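A plain sum can also be written as a reduction; this is a sketch and, unlike dg::blas1::dot, it is not binary reproducible:

dg::DVec ones( 100, 1.);
double sum = dg::blas1::reduce( ones, 0., thrust::plus<double>());
CHECK( sum == 100.);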
Parameters
x     Container to reduce
zero  The neutral element with respect to binary_op, that is x == binary_op( zero, x). Determines the OutputType, so make sure the type is clear to the compiler (e.g. write (double)0 instead of 0 if you want double output)
Attention
In the current implementation zero is used to initialize partial sums e.g. when reducing MPI Vectors so it is important that zero is actually the neutral element. The reduction will yield wrong results if it is not.
Parameters
binary_op  an associative and commutative binary operator
unary_op   a unary operator applied to each element of x
Returns
Custom reduction as defined above
Note
This routine is always executed synchronously due to the implicit memcpy of the result. With MPI the result is broadcast to all processes.
Template Parameters
BinaryOp    Functor with signature: value_type operator()( value_type, value_type), must be associative and commutative. value_type must be compatible with OutputType
UnaryOp     a unary operator. The argument type must be compatible with get_value_type<ContainerType>. The return type must be convertible to OutputType
OutputType  The type of the result. Inferred from zero, so make sure zero's type is clear to the compiler.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
For partial reductions see dg::Average

◆ scal()

template<class ContainerType , class value_type >
void dg::blas1::scal ( ContainerType & x,
value_type alpha )
inline

\( x = \alpha x\)

This routine computes

\[ \alpha x_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2);
dg::blas1::scal( two, 0.5);
CHECK( two == dg::DVec( 100, 1));
Parameters
alpha  Scalar
x      (read/write) x
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type: Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>
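
As a further sketch, scal applies recursively to the container types listed above; here with a std::array (the concrete values are illustrative assumptions).

// Sketch: scaling a std::array, one of the supported container types
std::array<double,4> arr = {1., 2., 3., 4.}, expected = {2., 4., 6., 8.};
dg::blas1::scal( arr, 2.);
CHECK( arr == expected);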

◆ subroutine()

template<class Subroutine , class ContainerType , class ... ContainerTypes>
void dg::blas1::subroutine ( Subroutine f,
ContainerType && x,
ContainerTypes &&... xs )
inline

\( f(x_0, x_1, ...)\); Customizable and generic blas1 function

This routine evaluates an arbitrary user-defined subroutine f with an arbitrary number of arguments \( x_s\) elementwise

\[ f(x_{0i}, x_{1i}, ...) \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), four(100,4);
dg::blas1::subroutine( []DG_DEVICE( double x, double y, double& z){
z = 7*x + y +z;
}, two, 3., four);
CHECK( four == dg::HVec( 100, 21.)); // 7*2+3+4
Parameters
f: the subroutine; see the subroutines group for a collection of predefined subroutines to use here
x: the first argument
xs: other arguments
Note
This function can compute any trivially parallel expression for any number of input and output arguments. In this sense it replaces all other blas1 functions except the scalar product, which is not trivially parallel.
Attention
The user has to decide whether or not it is safe to alias input or output vectors. If in doubt, do not alias output vectors.
Template Parameters
Subroutine: a function or functor with an arbitrary number of arguments and no return type; taking a value_type argument for each input argument in the call and a value_type& argument for each output argument. Subroutine must be callable on the device in use. In particular, with CUDA it must be a functor (not a function) and its signature must contain the __device__ specifier. (s.a. DG_DEVICE)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType: Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
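
To illustrate the note above, an axpby-like update \( y = 0.5 x + 2 y\) can be expressed as a subroutine. This is a minimal sketch in the style of the example above; the concrete coefficients are illustrative assumptions.

// Sketch: y = 0.5*x + 2*y written as a subroutine
dg::DVec x( 100, 2.), y( 100, 3.);
dg::blas1::subroutine( []DG_DEVICE( double xi, double& yi){
    yi = 0.5*xi + 2.*yi;
}, x, y);
CHECK( y == dg::DVec( 100, 7.)); // 0.5*2 + 2*3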

◆ transform()

template<class ContainerType , class ContainerType1 , class UnaryOp >
void dg::blas1::transform ( const ContainerType1 & x,
ContainerType & y,
UnaryOp op )
inline

\( y = op(x)\)

This routine computes

\[ y_i = op(x_i) \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), result(100);
dg::blas1::transform( two, result, []DG_DEVICE( double x){ return exp(x);});
CHECK( result == dg::DVec( 100, exp(2.)));
Parameters
x: ContainerType x, may alias y
y: (write-only) ContainerType y contains the result, may alias x
op: unary functor to use on every element
Template Parameters
UnaryOp: Functor with signature: value_type operator()( value_type)
Note
UnaryOp must be callable on the device in use. In particular, with CUDA it must be of functor type (not a function) and its signature must contain the __device__ specifier. (s.a. DG_DEVICE)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType: Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
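
Since x may alias y, transform can also be applied in place. A minimal sketch, with an illustrative sqrt lambda:

// Sketch: in-place transform, exploiting that x may alias y
dg::DVec v( 100, 4.);
dg::blas1::transform( v, v, []DG_DEVICE( double x){ return sqrt(x);});
CHECK( v == dg::DVec( 100, 2.));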

◆ vdot()

template<class Functor , class ContainerType , class ... ContainerTypes>
auto dg::blas1::vdot ( Functor f,
const ContainerType & x,
const ContainerTypes &... xs ) -> std::invoke_result_t<Functor, dg::get_value_type<ContainerType>, dg::get_value_type<ContainerTypes>...>

\( \sum_i f(x_{0i}, x_{1i}, ...)\) Extended Precision transform reduce

This routine computes

\[ \sum_i f(x_{0i}, x_{1i}, ...)\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

// A fun way to compute the size of a vector
dg::DVec v( 100,2);
unsigned size = dg::blas1::vdot( []DG_DEVICE(double x){ return 1u;},
v);
CHECK( size == 100u); //100*1

or

// Compute the weighted norm of a complex vector
std::vector<double> ww( 100, 42.);
std::vector<std::complex<double>> cc( 100, {1,1});
// Use auto to allow changing std::complex to thrust::complex
// Use norm instead of std::norm and let ADL work
double nrm = dg::blas1::vdot([](double w, auto z){ return w*norm(z);},
ww, cc);
CHECK( nrm == 100*42*2.0);
Note
The main motivator for this version of dot is that it works for complex numbers.
Attention
If one of the input vectors contains Inf or NaN, or the product of the input numbers reaches Inf or NaN, then the behaviour is undefined and the function may throw. See dg::ISNFINITE and dg::ISNSANE in that case.
Note
This implementation does not guarantee binary reproducible results. The sum is computed with extended precision and the result is rounded to the nearest double precision number. This is possible with the help of an adapted version of the dg::exblas library and works for single and double precision.
Template Parameters
Functor: signature: value_type_g operator()( value_type_x0, value_type_x1, ...)
Attention
Functor must be callable on the device in use. In particular, with CUDA it must be a functor type (not a function) and its signature must contain the __device__ specifier. (s.a. DG_DEVICE)
Parameters
f: The functor to evaluate; see dg::blas1::evaluate and the large collection of predefined variadic functors to use here
x: First input
xs: More input (may alias x)
Returns
Scalar product as defined above
Note
This routine is always executed synchronously due to the implicit memcpy of the result. With MPI the result is broadcast to all processes.
Template Parameters
ContainerType: Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
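
For comparison with dg::blas1::dot, the plain Euclidean product \( x^T y\) can also be expressed through vdot. A minimal sketch with illustrative values; note that dg::blas1::dot remains the binary reproducible routine for this purpose.

// Sketch: x^T y written as a transform reduction with vdot
dg::DVec x( 100, 2.), y( 100, 3.);
double result = dg::blas1::vdot( []DG_DEVICE( double xi, double yi){
    return xi*yi;}, x, y);
CHECK( result == 600.); // 100 * 2 * 3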