Discontinuous Galerkin Library
#include "dg/algorithm.h"

\( f( x_{0i}, x_{1i}, x_{2i}, ...) \) and \( x^T y\) More...

Collaboration diagram for BLAS level 1 routines: Vector-Vector:

Namespaces

namespace  dg::blas1
 BLAS Level 1 routines.
 

Functions

template<class ContainerType1 , class ContainerType2 >
get_value_type< ContainerType1 > dg::blas1::dot (const ContainerType1 &x, const ContainerType2 &y)
 \( x^T y\) Binary reproducible Euclidean dot product between two vectors More...
 
template<class ContainerType , class OutputType , class BinaryOp , class UnaryOp = IDENTITY>
OutputType dg::blas1::reduce (const ContainerType &x, OutputType zero, BinaryOp binary_op, UnaryOp unary_op=UnaryOp())
 \( f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_{N-1}) \) Custom (transform) reduction More...
 
template<class ContainerTypeIn , class ContainerTypeOut >
void dg::blas1::copy (const ContainerTypeIn &source, ContainerTypeOut &target)
 \( y=x \) More...
 
template<class ContainerType >
void dg::blas1::scal (ContainerType &x, get_value_type< ContainerType > alpha)
 \( x = \alpha x\) More...
 
template<class ContainerType >
void dg::blas1::plus (ContainerType &x, get_value_type< ContainerType > alpha)
 \( x = x + \alpha \) More...
 
template<class ContainerType , class ContainerType1 >
void dg::blas1::axpby (get_value_type< ContainerType > alpha, const ContainerType1 &x, get_value_type< ContainerType > beta, ContainerType &y)
 \( y = \alpha x + \beta y\) More...
 
template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::axpbypgz (get_value_type< ContainerType > alpha, const ContainerType1 &x, get_value_type< ContainerType > beta, const ContainerType2 &y, get_value_type< ContainerType > gamma, ContainerType &z)
 \( z = \alpha x + \beta y + \gamma z\) More...
 
template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::axpby (get_value_type< ContainerType > alpha, const ContainerType1 &x, get_value_type< ContainerType > beta, const ContainerType2 &y, ContainerType &z)
 \( z = \alpha x + \beta y\) More...
 
template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDot (get_value_type< ContainerType > alpha, const ContainerType1 &x1, const ContainerType2 &x2, get_value_type< ContainerType > beta, ContainerType &y)
 \( y = \alpha x_1 x_2 + \beta y\) More...
 
template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDot (const ContainerType1 &x1, const ContainerType2 &x2, ContainerType &y)
 \( y = x_1 x_2 \) More...
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 >
void dg::blas1::pointwiseDot (get_value_type< ContainerType > alpha, const ContainerType1 &x1, const ContainerType2 &x2, const ContainerType3 &x3, get_value_type< ContainerType > beta, ContainerType &y)
 \( y = \alpha x_1 x_2 x_3 + \beta y\) More...
 
template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDivide (get_value_type< ContainerType > alpha, const ContainerType1 &x1, const ContainerType2 &x2, get_value_type< ContainerType > beta, ContainerType &y)
 \( y = \alpha x_1/ x_2 + \beta y \) More...
 
template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDivide (const ContainerType1 &x1, const ContainerType2 &x2, ContainerType &y)
 \( y = x_1/ x_2\) More...
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class ContainerType4 >
void dg::blas1::pointwiseDot (get_value_type< ContainerType > alpha, const ContainerType1 &x1, const ContainerType2 &y1, get_value_type< ContainerType > beta, const ContainerType3 &x2, const ContainerType4 &y2, get_value_type< ContainerType > gamma, ContainerType &z)
 \( z = \alpha x_1y_1 + \beta x_2y_2 + \gamma z\) More...
 
template<class ContainerType , class ContainerType1 , class UnaryOp >
void dg::blas1::transform (const ContainerType1 &x, ContainerType &y, UnaryOp op)
 \( y = op(x)\) More...
 
template<class ContainerType , class BinarySubroutine , class Functor , class ContainerType0 , class ... ContainerTypes>
void dg::blas1::evaluate (ContainerType &y, BinarySubroutine f, Functor g, const ContainerType0 &x0, const ContainerTypes &...xs)
 \( f(g(x_0,x_1,...), y)\) More...
 
template<class Subroutine , class ContainerType , class ... ContainerTypes>
void dg::blas1::subroutine (Subroutine f, ContainerType &&x, ContainerTypes &&... xs)
 \( f(x_0, x_1, ...)\); Customizable and generic blas1 function More...
 

Detailed Description

\( f( x_{0i}, x_{1i}, x_{2i}, ...) \) and \( x^T y\)

Note
Successive calls to blas routines are executed sequentially
A manual synchronization of threads or devices is never needed in an application using these functions. All functions returning a value block until the value is ready.

Function Documentation

◆ axpby() [1/2]

template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::axpby ( get_value_type< ContainerType >  alpha,
const ContainerType1 &  x,
get_value_type< ContainerType >  beta,
const ContainerType2 &  y,
ContainerType &  z 
)
inline

\( z = \alpha x + \beta y\)

This routine computes

\[ z_i = \alpha x_i + \beta y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two( 100,2), three(100,3), result(100);
dg::blas1::axpby( 2, two, 3., three, result); // result[i] = 13 (2*2+3*3)
void axpby(get_value_type< ContainerType > alpha, const ContainerType1 &x, get_value_type< ContainerType > beta, ContainerType &y)
Definition: blas1.h:231
thrust::device_vector< double > DVec
Device Vector. The device can be an OpenMP parallelized cpu or a gpu. This depends on the value of th...
Definition: typedefs.h:23
Parameters
alphaScalar
xContainerType x may alias z
betaScalar
yContainerType y may alias z
z(write-only) ContainerType z contains solution on output
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ axpby() [2/2]

template<class ContainerType , class ContainerType1 >
void dg::blas1::axpby ( get_value_type< ContainerType >  alpha,
const ContainerType1 &  x,
get_value_type< ContainerType >  beta,
ContainerType &  y 
)
inline

\( y = \alpha x + \beta y\)

This routine computes

\[ y_i = \alpha x_i + \beta y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two( 100,2), three(100,3);
dg::blas1::axpby( 2, two, 3., three); // three[i] = 13 (2*2+3*3)
Parameters
alphaScalar
xContainerType x may alias y
betaScalar
y(read/write) ContainerType y contains solution on output
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ axpbypgz()

template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::axpbypgz ( get_value_type< ContainerType >  alpha,
const ContainerType1 &  x,
get_value_type< ContainerType >  beta,
const ContainerType2 &  y,
get_value_type< ContainerType >  gamma,
ContainerType &  z 
)
inline

\( z = \alpha x + \beta y + \gamma z\)

This routine computes

\[ z_i = \alpha x_i + \beta y_i + \gamma z_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two(100,2), five(100,5), result(100, 12);
dg::blas1::axpbypgz( 2.5, two, 2., five, -3.,result);
// result[i] = -21 (2.5*2+2*5-3*12)
void axpbypgz(get_value_type< ContainerType > alpha, const ContainerType1 &x, get_value_type< ContainerType > beta, const ContainerType2 &y, get_value_type< ContainerType > gamma, ContainerType &z)
Definition: blas1.h:265
Parameters
alphaScalar
xContainerType x may alias result
betaScalar
yContainerType y may alias result
gammaScalar
z(read/write) ContainerType contains solution on output
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ copy()

template<class ContainerTypeIn , class ContainerTypeOut >
void dg::blas1::copy ( const ContainerTypeIn &  source,
ContainerTypeOut &  target 
)
inline

\( y=x \)

explicit pointwise assignment \( y_i = x_i\)

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

Parameters
sourcevector to copy
target(write-only) destination
Note
in contrast to the dg::assign functions the copy function uses the execution policy to determine the implementation and thus works only on types with same execution policy
catches self-assignment
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ dot()

template<class ContainerType1 , class ContainerType2 >
get_value_type< ContainerType1 > dg::blas1::dot ( const ContainerType1 &  x,
const ContainerType2 &  y 
)
inline

\( x^T y\) Binary reproducible Euclidean dot product between two vectors

This routine computes

\[ x^T y = \sum_{i=0}^{N-1} x_i y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three(100,3);
double result = dg::blas1::dot( two, three); // result = 600 (100*(2*3))
get_value_type< ContainerType1 > dot(const ContainerType1 &x, const ContainerType2 &y)
Binary reproducible Euclidean dot product between two vectors
Definition: blas1.h:87
Attention
if one of the input vectors contains Inf or NaN or the product of the input numbers reaches Inf or Nan then the behaviour is undefined and the function may throw. See dg::ISNFINITE and dg::ISNSANE in that case
Note
Our implementation guarantees binary reproducible results. The sum is computed with infinite precision and the result is rounded to the nearest double precision number. This is possible with the help of an adapted version of the dg::exblas library and works for single and double precision.
Parameters
xLeft Container
yRight Container may alias x
Returns
Scalar product as defined above
Note
This routine is always executed synchronously due to the implicit memcpy of the result. With mpi the result is broadcasted to all processes.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ evaluate()

template<class ContainerType , class BinarySubroutine , class Functor , class ContainerType0 , class ... ContainerTypes>
void dg::blas1::evaluate ( ContainerType &  y,
BinarySubroutine  f,
Functor  g,
const ContainerType0 &  x0,
const ContainerTypes &...  xs 
)
inline

\( f(g(x_0,x_1,...), y)\)

This routine elementwise evaluates

\[ f(g(x_{0i}, x_{1i}, ...), y_i) \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

double function( double x, double y) {
return sin(x)*sin(y);
}
dg::HVec pi2(20, M_PI/2.), pi3( 20, 3*M_PI/2.), result(20, 0);
dg::blas1::evaluate( result, dg::equals(), function, pi2, pi3);
// result[i] = sin(M_PI/2.)*sin(3*M_PI/2.) = -1
#define M_PI
M_PI is non-standard ... so MSVC complains.
Definition: functors.h:6
void evaluate(ContainerType &y, BinarySubroutine f, Functor g, const ContainerType0 &x0, const ContainerTypes &...xs)
Definition: blas1.h:556
thrust::host_vector< double > HVec
Host Vector.
Definition: typedefs.h:19
Definition: subroutines.h:22
Template Parameters
BinarySubroutineFunctor with signature: void ( value_type_g, value_type_y&) i.e. it reads the first (and second) and writes into the second argument
Functorsignature: value_type_g operator()( value_type_x0, value_type_x1, ...)
Attention
Both BinarySubroutine and Functor must be callable on the device in use. In particular, with CUDA they must be functor tpyes (not functions) and their signatures must contain the __device__ specifier. (s.a. DG_DEVICE)
Parameters
ycontains result
fThe subroutine, for example dg::equals or dg::plus_equals, see dg::blas1::evaluate binary operators for a collection of predefined functors to use here
gThe functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here
x0first input
xsmore input
Note
all aliases allowed
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ plus()

template<class ContainerType >
void dg::blas1::plus ( ContainerType &  x,
get_value_type< ContainerType >  alpha 
)
inline

\( x = x + \alpha \)

This routine computes

\[ x_i + \alpha \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two( 100,2);
dg::blas1::plus( two, 3. )); // two[i] = 5.
void plus(ContainerType &x, get_value_type< ContainerType > alpha)
Definition: blas1.h:207
Parameters
alphaScalar
x(read/write) x
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ pointwiseDivide() [1/2]

template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDivide ( const ContainerType1 &  x1,
const ContainerType2 &  x2,
ContainerType &  y 
)
inline

\( y = x_1/ x_2\)

Divides two vectors element by element:

\[ y_i = x_{1i}/x_{2i}\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two( 100,2), three( 100,3), result(100);
dg::blas1::pointwiseDivide( two, three, result );
// result[i] = -0.666... (2/3)
void pointwiseDivide(get_value_type< ContainerType > alpha, const ContainerType1 &x1, const ContainerType2 &x2, get_value_type< ContainerType > beta, ContainerType &y)
Definition: blas1.h:428
Parameters
x1ContainerType x1
x2ContainerType x2 may alias x1
y(write-only) ContainerType y contains result on output ( may alias x1 and/or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ pointwiseDivide() [2/2]

template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDivide ( get_value_type< ContainerType >  alpha,
const ContainerType1 &  x1,
const ContainerType2 &  x2,
get_value_type< ContainerType >  beta,
ContainerType &  y 
)
inline

\( y = \alpha x_1/ x_2 + \beta y \)

Divides two vectors element by element:

\[ y_i = \alpha x_{1i}/x_{2i} + \beta y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two( 100,2), three( 100,3), result(100,1);
dg::blas1::pointwiseDivide( 3, two, three, 5, result );
// result[i] = 7 (3*2/3+5*1)
Parameters
alphascalar
x1ContainerType x1
x2ContainerType x2 may alias x1
betascalar
y(read/write) ContainerType y contains result on output ( may alias x1 and/or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ pointwiseDot() [1/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDot ( const ContainerType1 &  x1,
const ContainerType2 &  x2,
ContainerType &  y 
)
inline

\( y = x_1 x_2 \)

Multiplies two vectors element by element:

\[ y_i = x_{1i}x_{2i}\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two( 100,2), three( 100,3), result(100);
dg::blas1::pointwiseDot( two, three, result ); // result[i] = 6.
void pointwiseDot(get_value_type< ContainerType > alpha, const ContainerType1 &x1, const ContainerType2 &x2, get_value_type< ContainerType > beta, ContainerType &y)
Definition: blas1.h:336
Parameters
x1ContainerType x1
x2ContainerType x2 may alias x1
y(write-only) ContainerType y contains result on output ( may alias x1 or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ pointwiseDot() [2/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 >
void dg::blas1::pointwiseDot ( get_value_type< ContainerType >  alpha,
const ContainerType1 &  x1,
const ContainerType2 &  x2,
const ContainerType3 &  x3,
get_value_type< ContainerType >  beta,
ContainerType &  y 
)
inline

\( y = \alpha x_1 x_2 x_3 + \beta y\)

Multiplies three vectors element by element:

\[ y_i = \alpha x_{1i}x_{2i}x_{3i} + \beta y_i\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two( 100,2), three( 100,3), four(100,4), result(100,6);
dg::blas1::pointwiseDot(2., two, three, four, -4., result );
// result[i] = 24. (2*2*3*4-4*6)
Parameters
alphascalar
x1ContainerType x1
x2ContainerType x2 may alias x1
x3ContainerType x3 may alias x1 and/or x2
betascalar
y(read/write) ContainerType y contains result on output ( may alias x1,x2 or x3)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ pointwiseDot() [3/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDot ( get_value_type< ContainerType >  alpha,
const ContainerType1 &  x1,
const ContainerType2 &  x2,
get_value_type< ContainerType >  beta,
ContainerType &  y 
)
inline

\( y = \alpha x_1 x_2 + \beta y\)

Multiplies two vectors element by element:

\[ y_i = \alpha x_{1i}x_{2i} + \beta y_i\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two( 100,2), three( 100,3), result(100,6);
dg::blas1::pointwiseDot(2., two, three, -4., result );
// result[i] = -12. (2*2*3-4*6)
Parameters
alphascalar
x1ContainerType x1
x2ContainerType x2 may alias x1
betascalar
y(read/write) ContainerType y contains result on output ( may alias x1 or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ pointwiseDot() [4/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class ContainerType4 >
void dg::blas1::pointwiseDot ( get_value_type< ContainerType >  alpha,
const ContainerType1 &  x1,
const ContainerType2 &  y1,
get_value_type< ContainerType >  beta,
const ContainerType3 &  x2,
const ContainerType4 &  y2,
get_value_type< ContainerType >  gamma,
ContainerType &  z 
)

\( z = \alpha x_1y_1 + \beta x_2y_2 + \gamma z\)

Multiplies and adds vectors element by element:

\[ z_i = \alpha x_{1i}y_{1i} + \beta x_{2i}y_{2i} + \gamma z_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two(100,2), three(100,3), four(100,5), five(100,5), result(100,6);
dg::blas1::pointwiseDot(2., two, three, -4., four, five, 2., result );
// result[i] = -56.
Parameters
alphascalar
x1ContainerType x1
y1ContainerType y1
betascalar
x2ContainerType x2
y2ContainerType y2
gammascalar
z(read/write) ContainerType z contains result on output
Note
all aliases are allowed
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ reduce()

template<class ContainerType , class OutputType , class BinaryOp , class UnaryOp = IDENTITY>
OutputType dg::blas1::reduce ( const ContainerType &  x,
OutputType  zero,
BinaryOp  binary_op,
UnaryOp  unary_op = UnaryOp() 
)
inline

\( f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_{N-1}) \) Custom (transform) reduction

This routine computes

\[ s = f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_i) \otimes \dots \otimes f(x_{N-1}) \]

where \( \otimes \) is an arbitrary commutative and associative binary operator, \( f\) is an optional unary operator and

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

Note
numerical addition/multiplication is not exactly associative which means that the associated reduction looses precision due to inexact arithmetic. For binary reproducible exactly rounded results use the dg::blas1::dot function. However, this function is more general and faster to execute than dg::blas1::dot.

For example

//Check if a vector contains Inf or NaN
thrust::device_vector<double> x( 100);
bool hasnan = false;
hasnan = dg::blas1::reduce( x, false, thrust::logical_or<bool>(),
std::cout << "x contains Inf or NaN "<<std::boolalpha<<hasnan<<"\n";
OutputType reduce(const ContainerType &x, OutputType zero, BinaryOp binary_op, UnaryOp unary_op=UnaryOp())
Custom (transform) reduction
Definition: blas1.h:139
@ x
x direction
Definition: functors.h:239
Parameters
xContainer to reduce
zeroThe neutral element with respect to binary_op that is x == binary_op( zero, x) . Determines the OutputType so make sure to make the type clear to the compiler (e.g. write (double)0 instead of 0 if you want double output)
Attention
In the current implementation zero is used to initialize partial sums e.g. when reducing MPI Vectors so it is important that zero is actually the neutral element. The reduction will yield wrong results if it is not.
Parameters
binary_opan associative and commutative binary operator
unary_opa unary operator applies to each element of x
Returns
Custom reduction as defined above
Note
This routine is always executed synchronously due to the implicit memcpy of the result. With mpi the result is broadcasted to all processes
Template Parameters
BinaryOpFunctor with signature: value_type operator()( value_type, value_type) , must be associative and commutative. value_tpye must be compatible with OutputType
UnaryOpa unary operator. The argument type must be compatible with get_value_type<ContainerType>. The return type must be convertible to OutputType
OutputTypeThe type of the result. Infered from zero so make sure zero's type is clear to the compiler.
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ scal()

template<class ContainerType >
void dg::blas1::scal ( ContainerType &  x,
get_value_type< ContainerType >  alpha 
)
inline

\( x = \alpha x\)

This routine computes

\[ \alpha x_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two( 100,2);
dg::blas1::scal( two, 0.5 )); // result[i] = 1.
void scal(ContainerType &x, get_value_type< ContainerType > alpha)
Definition: blas1.h:185
Parameters
alphaScalar
x(read/write) x
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ subroutine()

template<class Subroutine , class ContainerType , class ... ContainerTypes>
void dg::blas1::subroutine ( Subroutine  f,
ContainerType &&  x,
ContainerTypes &&...  xs 
)
inline

\( f(x_0, x_1, ...)\); Customizable and generic blas1 function

This routine evaluates an arbitrary user-defined subroutine f with an arbitrary number of arguments \( x_s\) elementwise

\[ f(x_{0i}, x_{1i}, ...) \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

struct Routine{
void operator()( double x, double y, double& z){
z = 7*x+y + z ;
}
};
dg::DVec two( 100,2), four(100,4);
dg::blas1::subroutine( Routine(), two, 3., four);
// four[i] now has the value 21 (7*2+3+4)
#define DG_DEVICE
Expands to __host__ __device__ if compiled with nvcc else is empty.
Definition: functions.h:9
void subroutine(Subroutine f, ContainerType &&x, ContainerTypes &&... xs)
; Customizable and generic blas1 function
Definition: blas1.h:618
@ z
z direction
@ y
y direction
Parameters
fthe subroutine, see dg::blas1::subroutine subroutines for a collection of predefind subroutines to use here
xthe first argument
xsother arguments
Note
This function can compute any trivial parallel expression for any number of input and output arguments, which is quite remarkable really. In this sense it replaces all other blas1 functions except the scalar product, which is not trivially parallel.
Attention
The user has to decide whether or not it is safe to alias input or output vectors. If in doubt, do not alias output vectors.
Template Parameters
Subroutinea function or functor with an arbitrary number of arguments and no return type; taking a value_type argument for each input argument in the call and a value_type& argument for each output argument. Subroutine must be callable on the device in use. In particular, with CUDA it must be a functor (not a function) and its signature must contain the __device__ specifier. (s.a. DG_DEVICE)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ transform()

template<class ContainerType , class ContainerType1 , class UnaryOp >
void dg::blas1::transform ( const ContainerType1 &  x,
ContainerType &  y,
UnaryOp  op 
)
inline

\( y = op(x)\)

This routine computes

\[ y_i = op(x_i) \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

dg::DVec two( 100,2), result(100);
// result[i] = 7.389056... (e^2)
void transform(const ContainerType1 &x, ContainerType &y, UnaryOp op)
Definition: blas1.h:524
Definition: functors.h:86
Parameters
xContainerType x may alias y
y(write-only) ContainerType y contains result, may alias x
opunary Operator to use on every element
Template Parameters
UnaryOpFunctor with signature: value_type operator()( value_type)
Note
UnaryOp must be callable on the device in use. In particular, with CUDA it must be of functor tpye (not a function) and its signatures must contain the __device__ specifier. (s.a. DG_DEVICE)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerTypeAny class for which a specialization of TensorTraits exists and which fulfills the requirements of the there defined data and execution policies derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system