Discontinuous Galerkin Library
#include "dg/algorithm.h"
\( f( x_{0i}, x_{1i}, x_{2i}, ...) \) and \( x^T y\)
Namespaces | |
namespace | dg::blas1 |
BLAS Level 1 routines. | |
Functions | |
template<class Functor , class ContainerType , class ... ContainerTypes> | |
auto | dg::blas1::vdot (Functor f, const ContainerType &x, const ContainerTypes &...xs) -> std::invoke_result_t< Functor, dg::get_value_type< ContainerType >, dg::get_value_type< ContainerTypes >... > |
\( \sum_i f(x_{0i}, x_{1i}, ...)\) Extended Precision transform reduce | |
template<class ContainerType1 , class ContainerType2 > | |
auto | dg::blas1::dot (const ContainerType1 &x, const ContainerType2 &y) |
\( x^T y\) Binary reproducible Euclidean dot product between two vectors | |
template<class ContainerType , class OutputType , class BinaryOp , class UnaryOp = IDENTITY> | |
OutputType | dg::blas1::reduce (const ContainerType &x, OutputType zero, BinaryOp binary_op, UnaryOp unary_op=UnaryOp()) |
\( f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_{N-1}) \) Custom (transform) reduction | |
template<class ContainerTypeIn , class ContainerTypeOut > | |
void | dg::blas1::copy (const ContainerTypeIn &source, ContainerTypeOut &target) |
\( y=x \) | |
template<class ContainerType , class value_type > | |
void | dg::blas1::scal (ContainerType &x, value_type alpha) |
\( x = \alpha x\) | |
template<class ContainerType , class value_type > | |
void | dg::blas1::plus (ContainerType &x, value_type alpha) |
\( x = x + \alpha \) | |
template<class ContainerType , class ContainerType1 , class value_type , class value_type1 > | |
void | dg::blas1::axpby (value_type alpha, const ContainerType1 &x, value_type1 beta, ContainerType &y) |
\( y = \alpha x + \beta y\) | |
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 , class value_type2 > | |
void | dg::blas1::axpbypgz (value_type alpha, const ContainerType1 &x, value_type1 beta, const ContainerType2 &y, value_type2 gamma, ContainerType &z) |
\( z = \alpha x + \beta y + \gamma z\) | |
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 > | |
void | dg::blas1::axpby (value_type alpha, const ContainerType1 &x, value_type1 beta, const ContainerType2 &y, ContainerType &z) |
\( z = \alpha x + \beta y\) | |
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 > | |
void | dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, value_type1 beta, ContainerType &y) |
\( y = \alpha x_1 x_2 + \beta y\) | |
template<class ContainerType , class ContainerType1 , class ContainerType2 > | |
void | dg::blas1::pointwiseDot (const ContainerType1 &x1, const ContainerType2 &x2, ContainerType &y) |
\( y = x_1 x_2 \) | |
template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class value_type , class value_type1 > | |
void | dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, const ContainerType3 &x3, value_type1 beta, ContainerType &y) |
\( y = \alpha x_1 x_2 x_3 + \beta y\) | |
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 > | |
void | dg::blas1::pointwiseDivide (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, value_type1 beta, ContainerType &y) |
\( y = \alpha x_1/ x_2 + \beta y \) | |
template<class ContainerType , class ContainerType1 , class ContainerType2 > | |
void | dg::blas1::pointwiseDivide (const ContainerType1 &x1, const ContainerType2 &x2, ContainerType &y) |
\( y = x_1/ x_2\) | |
template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class ContainerType4 , class value_type , class value_type1 , class value_type2 > | |
void | dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &y1, value_type1 beta, const ContainerType3 &x2, const ContainerType4 &y2, value_type2 gamma, ContainerType &z) |
\( z = \alpha x_1y_1 + \beta x_2y_2 + \gamma z\) | |
template<class ContainerType , class ContainerType1 , class UnaryOp > | |
void | dg::blas1::transform (const ContainerType1 &x, ContainerType &y, UnaryOp op) |
\( y = op(x)\) | |
template<class ContainerType , class BinarySubroutine , class Functor , class ContainerType0 , class ... ContainerTypes> | |
void | dg::blas1::evaluate (ContainerType &y, BinarySubroutine f, Functor g, const ContainerType0 &x0, const ContainerTypes &...xs) |
\( f(g(x_0,x_1,...), y)\) | |
template<class Subroutine , class ContainerType , class ... ContainerTypes> | |
void | dg::blas1::subroutine (Subroutine f, ContainerType &&x, ContainerTypes &&... xs) |
\( f(x_0, x_1, ...)\); Customizable and generic blas1 function | |
template<class ContainerType0 , class BinarySubroutine , class Functor , class ContainerType1 , class ... ContainerTypes> | |
void | dg::blas1::kronecker (ContainerType0 &y, BinarySubroutine f, Functor g, const ContainerType1 &x0, const ContainerTypes &...xs) |
\( f(g(x_{0i_0},x_{1i_1},...), y_I)\) (Kronecker evaluation) | |
template<class from_ContainerType , class ContainerType , class ... Params> | |
void | dg::assign (const from_ContainerType &from, ContainerType &to, Params &&... ps) |
Generic way to assign the contents of a from_ContainerType object to a ContainerType object optionally given additional parameters. | |
template<class ContainerType , class from_ContainerType , class ... Params> | |
ContainerType | dg::construct (const from_ContainerType &from, Params &&... ps) |
Generic way to construct an object of ContainerType given a from_ContainerType object and optional additional parameters. | |
template<class ContainerType , class Functor , class ... ContainerTypes> | |
auto | dg::kronecker (Functor &&f, const ContainerType &x0, const ContainerTypes &... xs) |
\( y_I = f(x_{0i_0}, x_{1i_1}, ...) \) Memory allocating version of dg::blas1::kronecker | |
\( f( x_{0i}, x_{1i}, x_{2i}, ...) \) and \( x^T y\)
inline void dg::assign (const from_ContainerType &from, ContainerType &to, Params &&... ps)
Generic way to assign the contents of a from_ContainerType object to a ContainerType object, optionally given additional parameters.
The idea of this function is to convert between types with the same data layout but different execution policies (e.g. from a thrust::host_vector to a thrust::device_vector). If the layout differs, additional parameters can be used to achieve what you want.
For example
from | source vector |
to | target vector contains a copy of from on output (memory is automatically resized if necessary) |
ps | additional parameters usable for the transfer operation |
Note: a from_ContainerType can also be assigned to a std::array<ContainerType, N> (all elements are initialized with from_ContainerType) and to a std::vector<ContainerType> (the desired size of the std::vector must be provided as an additional parameter).
from_ContainerType | must have the same data policy derived from AnyVectorTag as ContainerType (with the exception of std::array and std::vector) but can have a different execution policy |
Params | in some cases additional parameters that are necessary to assign objects of Type ContainerType |
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
inline void dg::blas1::axpby (value_type alpha, const ContainerType1 &x, value_type1 beta, const ContainerType2 &y, ContainerType &z)
\( z = \alpha x + \beta y\)
This routine computes
\[ z_i = \alpha x_i + \beta y_i \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
alpha | Scalar |
x | ContainerType x may alias z |
beta | Scalar |
y | ContainerType y may alias z |
z | (write-only) ContainerType z contains solution on output |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
value_type | Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType> |
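The elementwise rule for this routine can be illustrated with a minimal serial sketch in plain C++. This is not the library's implementation (which dispatches on the container's tags); the name `axpby_sketch` and the restriction to `std::vector<double>` are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference for z_i = alpha*x_i + beta*y_i.
// Each z_i depends only on x_i and y_i, so x or y may alias z.
inline void axpby_sketch(double alpha, const std::vector<double>& x,
                         double beta, const std::vector<double>& y,
                         std::vector<double>& z)
{
    // The documented routine leaves mismatched sizes undefined;
    // this sketch rejects them explicitly instead.
    assert(x.size() == y.size() && y.size() == z.size());
    for (std::size_t i = 0; i < z.size(); ++i)
        z[i] = alpha * x[i] + beta * y[i];
}
```

Calling `axpby_sketch(2.0, x, 0.5, y, y)` overwrites y in place, which mirrors the "y may alias z" remark in the parameter table.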
inline void dg::blas1::axpby (value_type alpha, const ContainerType1 &x, value_type1 beta, ContainerType &y)
\( y = \alpha x + \beta y\)
This routine computes
\[ y_i = \alpha x_i + \beta y_i \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
alpha | Scalar |
x | ContainerType x may alias y |
beta | Scalar |
y | (read/write) ContainerType y contains solution on output |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
inline void dg::blas1::axpbypgz (value_type alpha, const ContainerType1 &x, value_type1 beta, const ContainerType2 &y, value_type2 gamma, ContainerType &z)
\( z = \alpha x + \beta y + \gamma z\)
This routine computes
\[ z_i = \alpha x_i + \beta y_i + \gamma z_i \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
alpha | Scalar |
x | ContainerType x may alias z |
beta | Scalar |
y | ContainerType y may alias z |
gamma | Scalar |
z | (read/write) ContainerType z contains solution on output |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
value_type | Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType> |
inline ContainerType dg::construct (const from_ContainerType &from, Params &&... ps)
Generic way to construct an object of ContainerType given a from_ContainerType object and optional additional parameters.
The idea of this function is to convert between types with the same data layout but different execution policies (e.g. from a thrust::host_vector to a thrust::device_vector). If the layout differs, additional parameters can be used to achieve what you want.
For example
from | source vector |
ps | additional parameters necessary to construct a ContainerType object |
Returns: from converted to the new format (memory is allocated accordingly)
Note: it is also possible to construct a std::array<ContainerType, N> (all elements are initialized with from_ContainerType) and a std::vector<ContainerType> (the desired size of the std::vector must be provided as an additional parameter) given a from_ContainerType.
from_ContainerType | must have the same data policy derived from AnyVectorTag as ContainerType (with the exception of std::array and std::vector) but can have a different execution policy |
Params | in some cases additional parameters that are necessary to construct objects of Type ContainerType |
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
inline void dg::blas1::copy (const ContainerTypeIn &source, ContainerTypeOut &target)
\( y=x \)
Explicit pointwise assignment \( y_i = x_i\), where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
source | vector to copy |
target | (write-only) destination |
Note: in contrast to the dg::assign functions, the copy function uses the execution policy to determine the implementation and thus works only on types with the same execution policy.
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
inline auto dg::blas1::dot (const ContainerType1 &x, const ContainerType2 &y)
\( x^T y\) Binary reproducible Euclidean dot product between two vectors
This routine computes
\[ x^T y = \sum_{i=0}^{N-1} x_i y_i \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
or
Attention: if the input contains Inf or NaN, or the product of the input numbers reaches Inf or NaN, then the behaviour is undefined and the function may throw. See dg::ISNFINITE and dg::ISNSANE in that case.
Note: the binary reproducible summation builds on the dg::exblas library and works for single and double precision.
See also: dg::blas1::vdot( dg::Product(), x, y);
x | Left Container |
y | Right Container may alias x |
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
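To see why a binary reproducible dot product is worth having when the order of iterations is undefined, consider a naive accumulation in plain C++. The sketch below is only a motivating illustration; `naive_dot` is a hypothetical helper and says nothing about how the library or dg::exblas computes the sum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive left-to-right dot product in double precision: the rounding error
// depends on the order in which the products are accumulated.
inline double naive_dot(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        sum += x[i] * y[i];
    return sum;
}
```

With IEEE 754 doubles and y = (1, 1, 1), the input x = (1e16, 1, -1e16) gives 0 because 1e16 + 1 rounds back to 1e16, whereas the permuted input x = (1e16, -1e16, 1) gives 1. A parallel reduction that changes the accumulation order can therefore change the result of a naive sum.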
inline void dg::blas1::evaluate (ContainerType &y, BinarySubroutine f, Functor g, const ContainerType0 &x0, const ContainerTypes &...xs)
\( f(g(x_0,x_1,...), y)\)
This routine elementwise evaluates
\[ f(g(x_{0i}, x_{1i}, ...), y_i) \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
BinarySubroutine | Functor with signature: void ( value_type_g, value_type_y&) i.e. it reads the first (and second) and writes into the second argument |
Functor | signature: value_type_g operator()( value_type_x0, value_type_x1, ...) |
Note: BinarySubroutine and Functor must be callable on the device in use. In particular, with CUDA they must be functor types (not functions) and their signatures must contain the __device__ specifier (see also DG_DEVICE).
y | contains result |
f | The subroutine, for example dg::equals or dg::plus_equals , see dg::blas1::evaluate binary operators for a collection of predefined functors to use here |
g | The functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here |
x0 | first input |
xs | more input |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
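The division of labour between the subroutine f and the functor g can be sketched serially in plain C++ for two inputs. Everything named here (`evaluate_sketch`, `PlusEquals`, `Product2`) is a hypothetical stand-in written for this illustration, not the library's implementation or its predefined functors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference for f(g(x0_i, x1_i), y_i) with two inputs.
// g computes a value from the inputs; f combines that value with y_i in place.
template <class BinarySubroutine, class Functor>
void evaluate_sketch(std::vector<double>& y, BinarySubroutine f, Functor g,
                     const std::vector<double>& x0, const std::vector<double>& x1)
{
    assert(x0.size() == y.size() && x1.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        f(g(x0[i], x1[i]), y[i]);
}

// Stand-in for an accumulating subroutine: y += value.
struct PlusEquals
{
    void operator()(double value, double& y) const { y += value; }
};

// Stand-in functor: product of the two inputs.
struct Product2
{
    double operator()(double a, double b) const { return a * b; }
};
```

With these stand-ins, `evaluate_sketch(y, PlusEquals(), Product2(), x0, x1)` adds the pointwise product of x0 and x1 onto y; swapping in an assigning subroutine would overwrite y instead.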
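inline void dg::blas1::kronecker (ContainerType0 &y, BinarySubroutine f, Functor g, const ContainerType1 &x0, const ContainerTypes &...xs)
\( f(g(x_{0i_0},x_{1i_1},...), y_I)\) (Kronecker evaluation)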
This routine elementwise evaluates
\[ f(g(x_{0i_0}, x_{1i_1}, ..., x_{(n-1)i_{n-1}}), y_{((i_{n-1} N_{n-2} +...)N_1+i_1)N_0+i_0}) \]
for all combinations of input values. \( N_i\) is the size of the vector \( x_i\). The **first index \(i_0\) is the fastest varying in the output**, then \( i_1\), etc. If \( x_i\) is a scalar then the size \( N_i = 1\).
dg::RecursiveVectorTag
The size of the output \( y\) must match the product of sizes of input vectors i.e.
\[ N_y = \prod_{i=0}^{n-1} N_i \]
The order of evaluations is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
Note: if g returns the product of its arguments, then dg::blas1::kronecker(y, dg::equals(), g, x_0, x_1, ...) computes the actual Kronecker product (or outer product) of the arguments in reversed order
\[ y = x_{n-1} \otimes x_{n-2} \otimes ... \otimes x_1 \otimes x_0\]
With this behaviour we can, e.g. in Cartesian coordinates, naturally define functions \( f(x,y,z)\), evaluate them on product space coordinates and have **\( x \) as the fastest varying coordinate in memory**.
BinarySubroutine | Functor with signature: void ( value_type_g, value_type_y&) i.e. it reads the first (and second) and writes into the second argument |
Functor | signature: value_type_g operator()( value_type_x0, value_type_x1, ...) |
Note: BinarySubroutine and Functor must be callable on the device in use. In particular, with CUDA they must be functor types (not functions) and their signatures must contain the __device__ specifier (see also DG_DEVICE).
y | contains result (size of y must match the product of sizes of \( x_i\)) |
f | The subroutine, for example dg::equals or dg::plus_equals , see dg::blas1::evaluate binary operators for a collection of predefined functors to use here |
g | The functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here |
x0 | first input |
xs | more input |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
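The index convention of the Kronecker evaluation (first index fastest in the output) can be made concrete with a serial sketch for two inputs in plain C++. The helper `kronecker_sketch` is hypothetical, fixes the subroutine f to plain assignment, and is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference for two inputs:
// y[i1*N0 + i0] = g(x0[i0], x1[i1]), so i0 varies fastest in memory.
template <class Functor>
void kronecker_sketch(std::vector<double>& y, Functor g,
                      const std::vector<double>& x0, const std::vector<double>& x1)
{
    const std::size_t N0 = x0.size(), N1 = x1.size();
    // The output size must equal the product of the input sizes.
    assert(y.size() == N0 * N1);
    for (std::size_t i1 = 0; i1 < N1; ++i1)
        for (std::size_t i0 = 0; i0 < N0; ++i0)
            y[i1 * N0 + i0] = g(x0[i0], x1[i1]);
}
```

For x0 = (1, 2, 3), x1 = (10, 20) and g the product of its arguments, y becomes (10, 20, 30, 20, 40, 60), i.e. the entries of x0 are contiguous in memory for each fixed entry of x1.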
auto dg::kronecker (Functor &&f, const ContainerType &x0, const ContainerTypes &... xs)
\( y_I = f(x_{0i_0}, x_{1i_1}, ...) \) Memory allocating version of dg::blas1::kronecker
In a shared memory space with serial execution this function is implemented roughly as in the following pseudo-code
The return type is determined by the execution_policy and the tensor_category of the input vectors. It is unspecified. It is such that the resulting vector is exactly compatible in a call to dg::blas1::kronecker( result, dg::equals(), f, x0, xs...);
The MPI distributed version of this function is implemented as
The input communicators must be created with dg::register_mpi_cart_sub or dg::mpi_cart_sub. Further, the order of input-communicators must match the dimensions in the common root communicator (see dg::mpi_cart_kron), i.e. currently in MPI it is not possible to transpose with this function.
The rationale for this behaviour is that:
For example
Functor | signature: value_type_g operator()( value_type_x0, value_type_x1, ...) |
Note: Functor must be callable on the device in use. In particular, with CUDA it must be a functor type (not a function) and its signature must contain the __device__ specifier (see also DG_DEVICE).
f | The functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here |
x0 | first input |
xs | more input |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
inline void dg::blas1::plus (ContainerType &x, value_type alpha)
\( x = x + \alpha \)
This routine computes
\[ x_i = x_i + \alpha \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
alpha | Scalar |
x | (read/write) x |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
value_type | Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType> |
inline void dg::blas1::pointwiseDivide (const ContainerType1 &x1, const ContainerType2 &x2, ContainerType &y)
\( y = x_1/ x_2\)
Divides two vectors element by element:
\[ y_i = x_{1i}/x_{2i}\]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
x1 | ContainerType x1 |
x2 | ContainerType x2 may alias x1 |
y | (write-only) ContainerType y contains result on output ( may alias x1 and/or x2) |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
inline void dg::blas1::pointwiseDivide (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, value_type1 beta, ContainerType &y)
\( y = \alpha x_1/ x_2 + \beta y \)
Divides two vectors element by element:
\[ y_i = \alpha x_{1i}/x_{2i} + \beta y_i \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
alpha | scalar |
x1 | ContainerType x1 |
x2 | ContainerType x2 may alias x1 |
beta | scalar |
y | (read/write) ContainerType y contains result on output ( may alias x1 and/or x2) |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
value_type | Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType> |
inline void dg::blas1::pointwiseDot (const ContainerType1 &x1, const ContainerType2 &x2, ContainerType &y)
\( y = x_1 x_2 \)
Multiplies two vectors element by element:
\[ y_i = x_{1i}x_{2i}\]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
x1 | ContainerType x1 |
x2 | ContainerType x2 may alias x1 |
y | (write-only) ContainerType y contains result on output ( may alias x1 or x2) |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
inline void dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, const ContainerType3 &x3, value_type1 beta, ContainerType &y)
\( y = \alpha x_1 x_2 x_3 + \beta y\)
Multiplies three vectors element by element:
\[ y_i = \alpha x_{1i}x_{2i}x_{3i} + \beta y_i\]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
alpha | scalar |
x1 | ContainerType x1 |
x2 | ContainerType x2 may alias x1 |
x3 | ContainerType x3 may alias x1 and/or x2 |
beta | scalar |
y | (read/write) ContainerType y contains result on output ( may alias x1,x2 or x3) |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
value_type | Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType> |
inline void dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, value_type1 beta, ContainerType &y)
\( y = \alpha x_1 x_2 + \beta y\)
Multiplies two vectors element by element:
\[ y_i = \alpha x_{1i}x_{2i} + \beta y_i\]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
alpha | scalar |
x1 | ContainerType x1 |
x2 | ContainerType x2 may alias x1 |
beta | scalar |
y | (read/write) ContainerType y contains result on output ( may alias x1 or x2) |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
value_type | Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType> |
void dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &y1, value_type1 beta, const ContainerType3 &x2, const ContainerType4 &y2, value_type2 gamma, ContainerType &z)
\( z = \alpha x_1y_1 + \beta x_2y_2 + \gamma z\)
Multiplies and adds vectors element by element:
\[ z_i = \alpha x_{1i}y_{1i} + \beta x_{2i}y_{2i} + \gamma z_i \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
alpha | scalar |
x1 | ContainerType x1 |
y1 | ContainerType y1 |
beta | scalar |
x2 | ContainerType x2 |
y2 | ContainerType y2 |
gamma | scalar |
z | (read/write) ContainerType z contains result on output |
Note: dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y, while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
value_type | Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType> |
inline OutputType dg::blas1::reduce (const ContainerType &x, OutputType zero, BinaryOp binary_op, UnaryOp unary_op=UnaryOp())
\( f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_{N-1}) \) Custom (transform) reduction
This routine computes
\[ s = f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_i) \otimes \dots \otimes f(x_{N-1}) \]
where \( \otimes \) is an arbitrary commutative and associative binary operator, \( f\) is an optional unary operator and i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.
For example
or
x | Container to reduce |
zero | The neutral element with respect to binary_op that is x == binary_op( zero, x) . Determines the OutputType so make sure to make the type clear to the compiler (e.g. write (double)0 instead of 0 if you want double output) |
Note: zero is used to initialize partial sums, e.g. when reducing MPI vectors, so it is important that zero is actually the neutral element. The reduction will yield wrong results if it is not.
binary_op | an associative and commutative binary operator |
unary_op | a unary operator applied to each element of x |
BinaryOp | Functor with signature: value_type operator()( value_type, value_type), must be associative and commutative. value_type must be compatible with OutputType |
UnaryOp | a unary operator. The argument type must be compatible with get_value_type<ContainerType>. The return type must be convertible to OutputType |
OutputType | The type of the result. Inferred from zero, so make sure zero's type is clear to the compiler. |
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
See also dg::Average
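The reduction semantics described above can be sketched on a plain std::vector. The helpers `reduce_sketch`, `sum_sketch` and `max_abs_sketch` are hypothetical names and not part of the dg library; the sketch assumes C++17 and uses std::transform_reduce only to mirror the documented behaviour (apply unary_op to every element, then combine the results with binary_op starting from zero).

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical stand-in for dg::blas1::reduce on a std::vector
// (an illustration of the documented semantics, not the dg implementation).
template <class T, class OutputType, class BinaryOp, class UnaryOp>
OutputType reduce_sketch(const std::vector<T>& x, OutputType zero,
                         BinaryOp binary_op, UnaryOp unary_op)
{
    // std::transform_reduce is allowed to regroup and reorder the operands,
    // which is why binary_op has to be associative and commutative.
    return std::transform_reduce(x.begin(), x.end(), zero, binary_op, unary_op);
}

// Sum of all elements: 0 is the neutral element of +,
// and the cast fixes OutputType to double.
inline double sum_sketch(const std::vector<double>& x)
{
    return reduce_sketch(x, (double)0,
                         [](double a, double b) { return a + b; },
                         [](double v) { return v; });
}

// Largest absolute value: 0 is neutral for max here because |x_i| >= 0.
inline double max_abs_sketch(const std::vector<double>& x)
{
    return reduce_sketch(x, (double)0,
                         [](double a, double b) { return a > b ? a : b; },
                         [](double v) { return std::fabs(v); });
}
```

Passing a plain `0` as zero would make OutputType an int in this sketch, which truncates every partial result; this is the pitfall the description of zero warns about.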
\( x = \alpha x\)
This routine computes
\[ x_i = \alpha x_i \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on the given template parameters. For a full set of rules please refer to The dg dispatch system.
alpha | Scalar |
x | (read/write) x |
dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
value_type | Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType> |
\( f(x_0, x_1, ...)\) Customizable and generic blas1 function
This routine evaluates an arbitrary user-defined subroutine f with an arbitrary number of arguments \( x_s\) elementwise
\[ f(x_{0i}, x_{1i}, ...) \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on the given template parameters. For a full set of rules please refer to The dg dispatch system.
f | the subroutine, see dg::blas1::subroutine subroutines for a collection of predefined subroutines to use here |
x | the first argument |
xs | other arguments |
This function can express all other blas1 functions except the scalar product, which is not trivially parallel.
Subroutine | a function or functor with an arbitrary number of arguments and no return type, taking a value_type argument for each input argument in the call and a value_type& argument for each output argument. Subroutine must be callable on the device in use. In particular, with CUDA it must be a functor (not a function) and its signature must contain the __device__ specifier (see also DG_DEVICE). |
dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
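As a rough illustration of the calling convention for Subroutine (inputs by value, outputs by reference), the following sketch applies a user-defined subroutine elementwise to any number of std::vector<double> arguments. `subroutine_sketch` is a hypothetical helper, assumes C++17, handles neither scalar arguments nor recursive vectors, and is not how the dg dispatch system is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for dg::blas1::subroutine on std::vector arguments.
template <class Subroutine, class Vector, class... Vectors>
void subroutine_sketch(Subroutine f, Vector& x, Vectors&... xs)
{
    // The documentation leaves mismatched sizes undefined;
    // this sketch checks them in debug builds instead.
    assert(((xs.size() == x.size()) && ...));
    for (std::size_t i = 0; i < x.size(); ++i)
        f(x[i], xs[i]...);  // one call per index i with the i-th element of every argument
}
```

A call such as `subroutine_sketch([](double a, double b, double& c) { c = a * b + 1.0; }, x, y, z)` would then write \( z_i = x_i y_i + 1 \) into z, with x and y read-only because the lambda takes them by value.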
\( y = op(x)\)
This routine computes
\[ y_i = op(x_i) \]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on the given template parameters. For a full set of rules please refer to The dg dispatch system.
x | ContainerType x may alias y |
y | (write-only) ContainerType y contains result, may alias x |
op | unary operator to apply to every element |
UnaryOp | Functor with signature: value_type operator()( value_type) |
UnaryOp must be callable on the device in use. In particular, with CUDA it must be of functor type (not a function) and its signature must contain the __device__ specifier (see also DG_DEVICE).
dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
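The elementwise rule \( y_i = op(x_i) \), including the case where x aliases y, can be sketched with std::transform on a std::vector<double>. `transform_sketch` is a hypothetical helper used for illustration only and is not the dg implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for dg::blas1::transform( x, y, op) on std::vector<double>.
template <class UnaryOp>
void transform_sketch(const std::vector<double>& x, std::vector<double>& y, UnaryOp op)
{
    // The documentation leaves mismatched sizes undefined;
    // this sketch checks them in debug builds instead.
    assert(x.size() == y.size());
    // std::transform permits the output range to start at the input range,
    // so passing the same vector for x and y works in place.
    std::transform(x.begin(), x.end(), y.begin(), op);
}
```

Calling `transform_sketch(v, v, [](double a) { return a * a; })` squares v in place, which corresponds to the "may alias" remark in the parameter list.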
auto dg::blas1::vdot (Functor f, const ContainerType &x, const ContainerTypes &...xs) -> std::invoke_result_t<Functor, dg::get_value_type<ContainerType>, dg::get_value_type<ContainerTypes>...>
\( \sum_i f(x_{0i}, x_{1i}, ...)\) Extended Precision transform reduce
This routine computes
\[ \sum_i f(x_{0i}, x_{1i}, ...)\]
where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on the given template parameters. For a full set of rules please refer to The dg dispatch system.
The difference to dot is that it works for complex numbers.
If the inputs contain Inf or NaN, or the product of the input numbers reaches Inf or NaN, then the behaviour is undefined and the function may throw. See dg::ISNFINITE and dg::ISNSANE in that case.
The implementation uses the dg::exblas library and works for single and double precision.
Functor | signature: value_type_g operator()( value_type_x0, value_type_x1, ...) |
Functor must be callable on the device in use. In particular, with CUDA it must be of functor type (not a function) and its signature must contain the __device__ specifier (see also DG_DEVICE).
f | The functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here |
x | First input |
xs | More input (may alias x) |
ContainerType | Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them |
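The transform-reduce \( \sum_i f(x_{0i}, x_{1i}) \) can be sketched for two real-valued std::vector<double> inputs as follows. `vdot_sketch` is a hypothetical helper. Kahan compensated summation is swapped in here as a simple way to reduce rounding error; it is not the accumulation scheme of dg::exblas and does not give the same guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for dg::blas1::vdot with two real-valued inputs.
// Uses Kahan compensated summation, which is an assumption of this sketch
// and not the extended-precision method of the library.
template <class Functor>
double vdot_sketch(Functor f, const std::vector<double>& x, const std::vector<double>& y)
{
    // The documentation leaves mismatched sizes undefined;
    // this sketch checks them in debug builds instead.
    assert(x.size() == y.size());
    double sum = 0.0;
    double compensation = 0.0;  // running estimate of the low-order bits lost so far
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const double term = f(x[i], y[i]) - compensation;
        const double next = sum + term;
        compensation = (next - sum) - term;  // what was lost when forming next
        sum = next;
    }
    return sum;
}
```

With `f = [](double a, double b) { return a * b; }` this reduces to an ordinary dot product, and passing the same vector twice shows that the inputs may alias.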