Discontinuous Galerkin Library
#include "dg/algorithm.h"
BLAS level 1 routines: Vector-Vector

\( f( x_{0i}, x_{1i}, x_{2i}, ...) \) and \( x^T y\)


Namespaces

namespace  dg::blas1
 BLAS Level 1 routines.
 

Functions

template<class Functor , class ContainerType , class ... ContainerTypes>
auto dg::blas1::vdot (Functor f, const ContainerType &x, const ContainerTypes &...xs) -> std::invoke_result_t< Functor, dg::get_value_type< ContainerType >, dg::get_value_type< ContainerTypes >... >
 \( \sum_i f(x_{0i}, x_{1i}, ...)\) Extended Precision transform reduce
 
template<class ContainerType1 , class ContainerType2 >
auto dg::blas1::dot (const ContainerType1 &x, const ContainerType2 &y)
 \( x^T y\) Binary reproducible Euclidean dot product between two vectors
 
template<class ContainerType , class OutputType , class BinaryOp , class UnaryOp = IDENTITY>
OutputType dg::blas1::reduce (const ContainerType &x, OutputType zero, BinaryOp binary_op, UnaryOp unary_op=UnaryOp())
 \( f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_{N-1}) \) Custom (transform) reduction
 
template<class ContainerTypeIn , class ContainerTypeOut >
void dg::blas1::copy (const ContainerTypeIn &source, ContainerTypeOut &target)
 \( y=x \)
 
template<class ContainerType , class value_type >
void dg::blas1::scal (ContainerType &x, value_type alpha)
 \( x = \alpha x\)
 
template<class ContainerType , class value_type >
void dg::blas1::plus (ContainerType &x, value_type alpha)
 \( x = x + \alpha \)
 
template<class ContainerType , class ContainerType1 , class value_type , class value_type1 >
void dg::blas1::axpby (value_type alpha, const ContainerType1 &x, value_type1 beta, ContainerType &y)
 \( y = \alpha x + \beta y\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 , class value_type2 >
void dg::blas1::axpbypgz (value_type alpha, const ContainerType1 &x, value_type1 beta, const ContainerType2 &y, value_type2 gamma, ContainerType &z)
 \( z = \alpha x + \beta y + \gamma z\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::axpby (value_type alpha, const ContainerType1 &x, value_type1 beta, const ContainerType2 &y, ContainerType &z)
 \( z = \alpha x + \beta y\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, value_type1 beta, ContainerType &y)
 \( y = \alpha x_1 x_2 + \beta y\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDot (const ContainerType1 &x1, const ContainerType2 &x2, ContainerType &y)
 \( y = x_1 x_2 \)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class value_type , class value_type1 >
void dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, const ContainerType3 &x3, value_type1 beta, ContainerType &y)
 \( y = \alpha x_1 x_2 x_3 + \beta y\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::pointwiseDivide (value_type alpha, const ContainerType1 &x1, const ContainerType2 &x2, value_type1 beta, ContainerType &y)
 \( y = \alpha x_1/ x_2 + \beta y \)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDivide (const ContainerType1 &x1, const ContainerType2 &x2, ContainerType &y)
 \( y = x_1/ x_2\)
 
template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class ContainerType4 , class value_type , class value_type1 , class value_type2 >
void dg::blas1::pointwiseDot (value_type alpha, const ContainerType1 &x1, const ContainerType2 &y1, value_type1 beta, const ContainerType3 &x2, const ContainerType4 &y2, value_type2 gamma, ContainerType &z)
 \( z = \alpha x_1y_1 + \beta x_2y_2 + \gamma z\)
 
template<class ContainerType , class ContainerType1 , class UnaryOp >
void dg::blas1::transform (const ContainerType1 &x, ContainerType &y, UnaryOp op)
 \( y = op(x)\)
 
template<class ContainerType , class BinarySubroutine , class Functor , class ContainerType0 , class ... ContainerTypes>
void dg::blas1::evaluate (ContainerType &y, BinarySubroutine f, Functor g, const ContainerType0 &x0, const ContainerTypes &...xs)
 \( f(g(x_0,x_1,...), y)\)
 
template<class Subroutine , class ContainerType , class ... ContainerTypes>
void dg::blas1::subroutine (Subroutine f, ContainerType &&x, ContainerTypes &&... xs)
 \( f(x_0, x_1, ...)\); Customizable and generic blas1 function
 
template<class ContainerType0 , class BinarySubroutine , class Functor , class ContainerType1 , class ... ContainerTypes>
void dg::blas1::kronecker (ContainerType0 &y, BinarySubroutine f, Functor g, const ContainerType1 &x0, const ContainerTypes &...xs)
 \( f(g(x_{0i_0},x_{1i_1},...), y_I)\) (Kronecker evaluation)
 
template<class from_ContainerType , class ContainerType , class ... Params>
void dg::assign (const from_ContainerType &from, ContainerType &to, Params &&... ps)
 Generic way to assign the contents of a from_ContainerType object to a ContainerType object optionally given additional parameters.
 
template<class ContainerType , class from_ContainerType , class ... Params>
ContainerType dg::construct (const from_ContainerType &from, Params &&... ps)
 Generic way to construct an object of ContainerType given a from_ContainerType object and optional additional parameters.
 
template<class ContainerType , class Functor , class ... ContainerTypes>
auto dg::kronecker (Functor &&f, const ContainerType &x0, const ContainerTypes &... xs)
 \( y_I = f(x_{0i_0}, x_{1i_1}, ...) \) Memory allocating version of dg::blas1::kronecker
 

Detailed Description

\( f( x_{0i}, x_{1i}, x_{2i}, ...) \) and \( x^T y\)

Note
Successive calls to blas routines are executed sequentially
A manual synchronization of threads or devices is never needed in an application using these functions. All functions returning a value block until the value is ready.
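
For example, the following sketch (assuming dg::DVec inputs) chains two blas1 calls without any manual synchronization; the call to dg::blas1::dot blocks until its value is ready:

dg::DVec x( 100, 2.), y( 100, 3.);
dg::blas1::axpby( 1., x, 1., y);      // y = x + y, may execute asynchronously on a GPU
double norm = dg::blas1::dot( y, y);  // blocks and returns the finished result
CHECK( norm == 2500.); // 100*(5*5)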

Function Documentation

◆ assign()

template<class from_ContainerType , class ContainerType , class ... Params>
void dg::assign ( const from_ContainerType & from,
ContainerType & to,
Params &&... ps )
inline

Generic way to assign the contents of a from_ContainerType object to a ContainerType object optionally given additional parameters.

The idea of this function is to convert between types with the same data layout but different execution policies (e.g. from a thrust::host_vector to a thrust::device_vector). If the layout differs, additional parameters can be used to achieve what you want.

For example

//Assign a host vector to a device vector
dg::HVec host( 100, 1.);
dg::DVec device(100);
dg::assign( host, device );
CHECK( device == dg::DVec( 100, 1.));
//Assign a host vector to all elements of a std::vector of 3 dg::DVec
std::vector<dg::DVec> device_vec;
dg::assign( host, device_vec, 3);
REQUIRE( device_vec.size() == 3);
for( unsigned u=0; u<3; u++)
    CHECK( device_vec[u] == device);
Parameters
from  source vector
to    target vector, contains a copy of from on output (memory is automatically resized if necessary)
ps    additional parameters usable for the transfer operation
Note
it is possible to assign a from_ContainerType to a std::array<ContainerType, N> (all elements are initialized with from_ContainerType) and also a std::vector<ContainerType> ( the desired size of the std::vector must be provided as an additional parameter)
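
As a sketch of the std::array case mentioned in the note (continuing the example above with host and device; the call pattern is assumed analogous to the std::vector case, only without a size parameter):

std::array<dg::DVec, 3> device_arr;
dg::assign( host, device_arr);
for( unsigned u=0; u<3; u++)
    CHECK( device_arr[u] == device);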
Template Parameters
from_ContainerType  must have the same data policy derived from AnyVectorTag as ContainerType (with the exception of std::array and std::vector) but can have a different execution policy
Params              in some cases additional parameters that are necessary to assign objects of type ContainerType
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ axpby() [1/2]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::axpby ( value_type alpha,
const ContainerType1 & x,
value_type1 beta,
const ContainerType2 & y,
ContainerType & z )
inline

\( z = \alpha x + \beta y\)

This routine computes

\[ z_i = \alpha x_i + \beta y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three(100,3), result(100);
dg::blas1::axpby( 2, two, 3., three, result);
CHECK( result == dg::DVec( 100, 13.)); // 2*2+3*3
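Since scalar arguments are interpreted as constant vectors, a call like the following sketch should also work (continuing the snippet above, with the scalar 1 taking the place of x; assumed from the documented scalar rule):

dg::blas1::axpby( 2., 1., 3., three, result);
CHECK( result == dg::DVec( 100, 11.)); // 2*1+3*3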
Parameters
alpha  Scalar
x      ContainerType x, may alias z
beta   Scalar
y      ContainerType y, may alias z
z      (write-only) ContainerType z contains solution on output
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ axpby() [2/2]

template<class ContainerType , class ContainerType1 , class value_type , class value_type1 >
void dg::blas1::axpby ( value_type alpha,
const ContainerType1 & x,
value_type1 beta,
ContainerType & y )
inline

\( y = \alpha x + \beta y\)

This routine computes

\[ y_i = \alpha x_i + \beta y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three(100,3);
dg::blas1::axpby( 2, two, 3., three);
CHECK( three == dg::DVec( 100, 13.)); // 2*2+3*3
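As a sketch of the recursive dispatch mentioned above, a std::vector<dg::DVec> is processed entry by entry (vector names chosen here for illustration):

std::vector<dg::DVec> xs( 2, dg::DVec(100,2.)), ys( 2, dg::DVec(100,3.));
dg::blas1::axpby( 2., xs, 3., ys);
for( unsigned u=0; u<2; u++)
    CHECK( ys[u] == dg::DVec( 100, 13.)); // 2*2+3*3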
Parameters
alpha  Scalar
x      ContainerType x, may alias y
beta   Scalar
y      (read/write) ContainerType y contains solution on output
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ axpbypgz()

template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 , class value_type2 >
void dg::blas1::axpbypgz ( value_type alpha,
const ContainerType1 & x,
value_type1 beta,
const ContainerType2 & y,
value_type2 gamma,
ContainerType & z )
inline

\( z = \alpha x + \beta y + \gamma z\)

This routine computes

\[ z_i = \alpha x_i + \beta y_i + \gamma z_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three(100,3), result(100,1);
dg::blas1::axpbypgz( 2., two, 3., three, 4., result);
CHECK( result == dg::DVec( 100, 17.)); // 2*2+3*3+4*1
Parameters
alpha  Scalar
x      ContainerType x, may alias result
beta   Scalar
y      ContainerType y, may alias result
gamma  Scalar
z      (read/write) ContainerType contains solution on output
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ construct()

template<class ContainerType , class from_ContainerType , class ... Params>
ContainerType dg::construct ( const from_ContainerType & from,
Params &&... ps )
inline

Generic way to construct an object of ContainerType given a from_ContainerType object and optional additional parameters.

The idea of this function is to convert between types with the same data layout but different execution policies (e.g. from a thrust::host_vector to a thrust::device_vector). If the layout differs, additional parameters can be used to achieve what you want.

For example

//Construct a device vector from host vector
dg::HVec host( 100, 1.);
auto device = dg::construct<dg::DVec>( host );
CHECK( device == dg::DVec( 100, 1.));
// Construct an array of vectors
auto device_arr = dg::construct<std::array<dg::DVec, 3>>( host );
for( unsigned u=0; u<3; u++)
    CHECK( device_arr[u] == device);
//Construct a std::vector of 3 dg::DVec from a host vector
auto device_vec = dg::construct<std::vector<dg::DVec>>( host, 3);
REQUIRE( device_vec.size() == 3);
for( unsigned u=0; u<3; u++)
    CHECK( device_vec[u] == device);
// Can also be used to change value type, e.g. complex:
auto complex_dev = dg::construct<dg::cDVec>( host );
CHECK( complex_dev == dg::cDVec( 100, thrust::complex<double>(1.)));
Parameters
from  source vector
ps    additional parameters necessary to construct a ContainerType object
Returns
from converted to the new format (memory is allocated accordingly)
Note
it is possible to construct a std::array<ContainerType, N> (all elements are initialized with from_ContainerType) and also a std::vector<ContainerType> ( the desired size of the std::vector must be provided as an additional parameter) given a from_ContainerType
Template Parameters
from_ContainerType  must have the same data policy derived from AnyVectorTag as ContainerType (with the exception of std::array and std::vector) but can have a different execution policy
Params              in some cases additional parameters that are necessary to construct objects of type ContainerType
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ copy()

template<class ContainerTypeIn , class ContainerTypeOut >
void dg::blas1::copy ( const ContainerTypeIn & source,
ContainerTypeOut & target )
inline

\( y=x \)

explicit pointwise assignment \( y_i = x_i\)

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), two_copy(100);
dg::blas1::copy( two, two_copy);
CHECK( two_copy == two);
Parameters
source  vector to copy
target  (write-only) destination
Note
in contrast to the dg::assign functions the copy function uses the execution policy to determine the implementation and thus works only on types with same execution policy
catches self-assignment
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
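For instance, a sketch of the distinction made above: assigning the scalar 0 via copy overwrites every element and therefore clears NaN:

dg::DVec y( 100, NAN);
dg::blas1::copy( 0., y);
CHECK( y == dg::DVec( 100, 0.));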
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ dot()

template<class ContainerType1 , class ContainerType2 >
auto dg::blas1::dot ( const ContainerType1 & x,
const ContainerType2 & y )
inline

\( x^T y\) Binary reproducible Euclidean dot product between two vectors

This routine computes

\[ x^T y = \sum_{i=0}^{N-1} x_i y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2.0), three(100,3.0);
double result = dg::blas1::dot(two, three);
CHECK( result == 600.0); //100*(2*3)

or

std::vector<thrust::complex<double>> c0( 100, {1,1}), c1(100, {1,-1});
auto cresult = dg::blas1::dot(c0, c1);
// Result is of complex type
CHECK( cresult == thrust::complex<double>{200,0});
Attention
if one of the input vectors contains Inf or NaN, or the product of the input numbers reaches Inf or NaN, then the behaviour is undefined and the function may throw. See dg::ISNFINITE and dg::ISNSANE in that case
Note
Our implementation guarantees binary reproducible results. The sum is computed with infinite precision and the result is rounded to the nearest double precision number. This is possible with the help of an adapted version of the dg::exblas library and works for single and double precision.
Attention
Binary Reproducible results are only guaranteed for float or double input. All other value types redirect to dg::blas1::vdot( dg::Product(), x, y);
Parameters
x  Left Container
y  Right Container, may alias x
Returns
Scalar product as defined above
Note
This routine is always executed synchronously due to the implicit memcpy of the result. With MPI the result is broadcast to all processes.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ evaluate()

template<class ContainerType , class BinarySubroutine , class Functor , class ContainerType0 , class ... ContainerTypes>
void dg::blas1::evaluate ( ContainerType & y,
BinarySubroutine f,
Functor g,
const ContainerType0 & x0,
const ContainerTypes &... xs )
inline

\( f(g(x_0,x_1,...), y)\)

This routine elementwise evaluates

\[ f(g(x_{0i}, x_{1i}, ...), y_i) \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

auto function = [](double x, double y){
    return sin(x)*sin(y);
};
dg::HVec pi2(20, M_PI/2.), pi3( 20, 3*M_PI/2.), result(20, 0);
dg::blas1::evaluate( result, dg::equals(), function, pi2, pi3);
CHECK( result == dg::HVec( 20, -1.)); // sin(M_PI/2.)*sin(3*M_PI/2.) = -1
Template Parameters
BinarySubroutine  Functor with signature: void ( value_type_g, value_type_y&) i.e. it reads the first (and second) and writes into the second argument
Functor           signature: value_type_g operator()( value_type_x0, value_type_x1, ...)
Attention
Both BinarySubroutine and Functor must be callable on the device in use. In particular, with CUDA they must be functor types (not functions) and their signatures must contain the __device__ specifier. (s.a. DG_DEVICE)
Parameters
y   contains result
f   The subroutine, for example dg::equals or dg::plus_equals, see dg::blas1::evaluate binary operators for a collection of predefined functors to use here
g   The functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here
x0  first input
xs  more input
Note
all aliases allowed
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ kronecker() [1/2]

template<class ContainerType0 , class BinarySubroutine , class Functor , class ContainerType1 , class ... ContainerTypes>
void dg::blas1::kronecker ( ContainerType0 & y,
BinarySubroutine f,
Functor g,
const ContainerType1 & x0,
const ContainerTypes &... xs )
inline

\( f(g(x_{0i_0},x_{1i_1},...), y_I)\) (Kronecker evaluation)

This routine elementwise evaluates

\[ f(g(x_{0i_0}, x_{1i_1}, ..., x_{(n-1)i_{n-1}}), y_{((i_{n-1} N_{n-2} +...)N_1+i_1)N_0+i_0}) \]

for all combinations of input values. \( N_i\) is the size of the vector \( x_i\). The first index \( i_0\) is the fastest varying in the output, then \( i_1\), etc. If \( x_i\) is a scalar then the size \( N_i = 1\).

Attention
None of the \( x_i\) or \( y\) can have the dg::RecursiveVectorTag

The size of the output \( y\) must match the product of sizes of input vectors i.e.

\[ N_y = \prod_{i=0}^{n-1} N_i \]

The order of evaluations is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

auto function = []( double x, double y) {
    return x+y;
};
std::vector<double> xs{1,2,3,4}, ys{ 10,20,30,40}, y(16, 0);
dg::blas1::kronecker( y, dg::equals(), function, xs, ys);
// Note how xs varies fastest in result:
std::vector<double> result = { 11,12,13,14,21,22,23,24,31,32,33,34,41,42,43,44};
CHECK( y == result);
// Note that the following code is equivalent
// but that dg::blas1::kronecker has a performance advantage since
// it never explicitly forms XS or YS
std::vector<double> XS(16), YS(16);
for( unsigned i=0; i<4; i++)
    for( unsigned k=0; k<4; k++)
    {
        XS[i*4+k] = xs[k];
        YS[i*4+k] = ys[i];
    }
dg::blas1::evaluate( y, dg::equals(), function, XS, YS);
CHECK( y == result);
// Finally, we could also write
dg::blas1::kronecker( XS, dg::equals(), []( double x, double y){ return x;}, xs, ys);
dg::blas1::kronecker( YS, dg::equals(), []( double x, double y){ return y;}, xs, ys);
dg::blas1::evaluate( y, dg::equals(), function, XS, YS);
CHECK( y == result);
Note
This function is trivially parallel and the MPI version simply calls the appropriate shared memory version. The user is responsible for making sure that the result has the correct communicator.
For the function \( f(x_0, x_1, ..., x_{n-1}) = x_0 x_1 ... x_{n-1} \), dg::blas1::kronecker( y, dg::equals(), f, x_0, x_1, ...) computes the actual Kronecker product of the arguments in reversed order

\[ y = x_{n-1} \otimes x_{n-2} \otimes ... \otimes x_1 \otimes x_0\]

(or the outer product). With this behaviour we can, e.g. in Cartesian coordinates, naturally define functions \( f(x,y,z)\), evaluate them on product space coordinates, and have \( x \) as the fastest varying coordinate in memory.
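A sketch of this outer-product behaviour (vector names chosen here for illustration):

std::vector<double> x{1,2}, w{10,100}, P(4, 0);
dg::blas1::kronecker( P, dg::equals(), []( double xi, double wi){ return xi*wi;}, x, w);
// x varies fastest in memory, i.e. P is the Kronecker product of w and x
CHECK( P == std::vector<double>{ 10,20,100,200});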
Template Parameters
BinarySubroutine  Functor with signature: void ( value_type_g, value_type_y&) i.e. it reads the first (and second) and writes into the second argument
Functor           signature: value_type_g operator()( value_type_x0, value_type_x1, ...)
Attention
Both BinarySubroutine and Functor must be callable on the device in use. In particular, with CUDA they must be functor types (not functions) and their signatures must contain the __device__ specifier. (s.a. DG_DEVICE)
Parameters
y   contains result (size of y must match the product of sizes of \( x_i\))
f   The subroutine, for example dg::equals or dg::plus_equals, see dg::blas1::evaluate binary operators for a collection of predefined functors to use here
g   The functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here
x0  first input
xs  more input
Note
all aliases allowed
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
dg::kronecker

◆ kronecker() [2/2]

template<class ContainerType , class Functor , class ... ContainerTypes>
auto dg::kronecker ( Functor && f,
const ContainerType & x0,
const ContainerTypes &... xs )

\( y_I = f(x_{0i_0}, x_{1i_1}, ...) \) Memory allocating version of dg::blas1::kronecker

In a shared memory space with serial execution this function is implemented roughly as in the following pseudo-code

// assume x0, xs are host vectors
size = product( x0.size(), xs.size(), ...)
using value_type = decltype( f(x0[0], xs[0]...));
thrust::host_vector<value_type> y(size);
dg::blas1::kronecker( y, dg::equals(), f, x0, xs...);
Note
The return type is inferred from the execution_policy and the tensor_category of the input vectors. It is unspecified. It is such that the resulting vector is exactly compatible in a call to dg::blas1::kronecker( result, dg::equals(), f, x0, xs...);

The MPI distributed version of this function is implemented as

MPI_Comm comm_kron = dg::mpi_cart_kron( x0.communicator(), xs.communicator()...);
return MPI_Vector{dg::kronecker( f, x0.data(), xs.data()...), comm_kron}; // a dg::MPI_Vector
Attention
In particular this means that in MPI all the communicators in the input argument vectors need to be Cartesian communicators that were created from a common Cartesian root communicator, and both the root and all sub communicators need to be registered in the dg library through calls to dg::register_mpi_cart_sub or dg::mpi_cart_sub. Further, the order of input communicators must match the dimensions in the common root communicator (see dg::mpi_cart_kron), i.e. currently in MPI it is not possible to transpose with this function.

The rationale for this behaviour is that:

  1. the MPI standard has no easy way of finding a common ancestor to Cartesian sub communicators
  2. the MPI standard has no easy way of re-joining previously split Cartesian communicators
  3. we want to avoid creating a new communicator every time this function is called.

For example

auto function = []( double x, double y) {
    return x+y;
};
std::vector<double> xs{1,2,3,4}, ys{ 10,20,30,40};
auto y = dg::kronecker( function, xs, ys);
// Note how xs varies fastest in result:
std::vector<double> result = { 11,12,13,14,21,22,23,24,31,32,33,34,41,42,43,44};
CHECK( y == result);
Template Parameters
Functor  signature: value_type_g operator()( value_type_x0, value_type_x1, ...)
Attention
Functor must be callable on the device in use. In particular, with CUDA it must be a functor type (not a function) and its signature must contain the __device__ specifier. (s.a. DG_DEVICE)
Parameters
f   The functor to evaluate, see A large collection and dg::blas1::evaluate variadic functors for a collection of predefined functors to use here
x0  first input
xs  more input
Returns
newly allocated result (size of container matches the product of sizes of \( x_i\))
Note
all aliases allowed
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
dg::blas1::kronecker dg::mpi_cart_kron

◆ plus()

template<class ContainerType , class value_type >
void dg::blas1::plus ( ContainerType & x,
value_type alpha )
inline

\( x = x + \alpha \)

This routine computes

\[ x_i + \alpha \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2);
dg::blas1::plus( two, 3.);
CHECK( two == dg::DVec( 100, 5.));
Parameters
alpha  Scalar
x      (read/write) x
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ pointwiseDivide() [1/2]

template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDivide ( const ContainerType1 & x1,
const ContainerType2 & x2,
ContainerType & y )
inline

\( y = x_1/ x_2\)

Divides two vectors element by element:

\[ y_i = x_{1i}/x_{2i}\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three( 100,3), result(100);
dg::blas1::pointwiseDivide( two, three, result );
CHECK( result == dg::DVec( 100, 2./3.));
Parameters
x1  ContainerType x1
x2  ContainerType x2, may alias x1
y   (write-only) ContainerType y contains result on output (may alias x1 and/or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ pointwiseDivide() [2/2]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::pointwiseDivide ( value_type alpha,
const ContainerType1 & x1,
const ContainerType2 & x2,
value_type1 beta,
ContainerType & y )
inline

\( y = \alpha x_1/ x_2 + \beta y \)

Divides two vectors element by element:

\[ y_i = \alpha x_{1i}/x_{2i} + \beta y_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three( 100,3), result(100,1);
dg::blas1::pointwiseDivide( 3, two, three, 5, result );
CHECK( result == dg::DVec( 100, 7.)); // 3*2/3+5*1
Parameters
alpha  scalar
x1     ContainerType x1
x2     ContainerType x2, may alias x1
beta   scalar
y      (read/write) ContainerType y contains result on output (may alias x1 and/or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ pointwiseDot() [1/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 >
void dg::blas1::pointwiseDot ( const ContainerType1 & x1,
const ContainerType2 & x2,
ContainerType & y )
inline

\( y = x_1 x_2 \)

Multiplies two vectors element by element:

\[ y_i = x_{1i}x_{2i}\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three( 100,3), result(100);
dg::blas1::pointwiseDot( two, three, result );
CHECK( result == dg::DVec( 100, 6.)); // 2*3
Parameters
x1  ContainerType x1
x2  ContainerType x2, may alias x1
y   (write-only) ContainerType y contains result on output (may alias x1 or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system

◆ pointwiseDot() [2/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class value_type , class value_type1 >
void dg::blas1::pointwiseDot ( value_type alpha,
const ContainerType1 & x1,
const ContainerType2 & x2,
const ContainerType3 & x3,
value_type1 beta,
ContainerType & y )
inline

\( y = \alpha x_1 x_2 x_3 + \beta y\)

Multiplies three vectors element by element:

\[ y_i = \alpha x_{1i}x_{2i}x_{3i} + \beta y_i\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three( 100,3), four(100,4), result(100,6);
dg::blas1::pointwiseDot(2., two, three, four, -4., result );
CHECK( result == dg::DVec( 100, 24.)); // 2*2*3*4-4*6
Parameters
alpha  scalar
x1     ContainerType x1
x2     ContainerType x2, may alias x1
x3     ContainerType x3, may alias x1 and/or x2
beta   scalar
y      (read/write) ContainerType y contains result on output (may alias x1, x2 or x3)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ pointwiseDot() [3/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class value_type , class value_type1 >
void dg::blas1::pointwiseDot ( value_type alpha,
const ContainerType1 & x1,
const ContainerType2 & x2,
value_type1 beta,
ContainerType & y )
inline

\( y = \alpha x_1 x_2 + \beta y\)

Multiplies two vectors element by element:

\[ y_i = \alpha x_{1i}x_{2i} + \beta y_i\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), three( 100,3), result(100,6);
dg::blas1::pointwiseDot(2., two, three, -4., result );
CHECK( result == dg::DVec( 100, -12.)); // 2*2*3-4*6
Parameters
alpha  scalar
x1     ContainerType x1
x2     ContainerType x2, may alias x1
beta   scalar
y      (read/write) ContainerType y contains result on output (may alias x1 or x2)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ pointwiseDot() [4/4]

template<class ContainerType , class ContainerType1 , class ContainerType2 , class ContainerType3 , class ContainerType4 , class value_type , class value_type1 , class value_type2 >
void dg::blas1::pointwiseDot ( value_type alpha,
const ContainerType1 & x1,
const ContainerType2 & y1,
value_type1 beta,
const ContainerType3 & x2,
const ContainerType4 & y2,
value_type2 gamma,
ContainerType & z )

\( z = \alpha x_1y_1 + \beta x_2y_2 + \gamma z\)

Multiplies and adds vectors element by element:

\[ z_i = \alpha x_{1i}y_{1i} + \beta x_{2i}y_{2i} + \gamma z_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two(100,2), three(100,3), four(100,4), five(100,5), result(100,6);
dg::blas1::pointwiseDot(2., two, three, -4., four, five, 2., result );
CHECK( result == dg::DVec( 100, -56.)); // 2*2*3-4*4*5+2*6
Parameters
alpha  scalar
x1     ContainerType x1
y1     ContainerType y1
beta   scalar
x2     ContainerType x2
y2     ContainerType y2
gamma  scalar
z      (read/write) ContainerType z contains result on output
Note
all aliases are allowed
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type  Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>

◆ reduce()

template<class ContainerType , class OutputType , class BinaryOp , class UnaryOp = IDENTITY>
OutputType dg::blas1::reduce ( const ContainerType & x,
OutputType zero,
BinaryOp binary_op,
UnaryOp unary_op = UnaryOp() )
inline

\( f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_{N-1}) \) Custom (transform) reduction

This routine computes

\[ s = f(x_0) \otimes f(x_1) \otimes \dots \otimes f(x_i) \otimes \dots \otimes f(x_{N-1}) \]

where \( \otimes \) is an arbitrary commutative and associative binary operator, \( f\) is an optional unary operator and

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

Note
numerical addition/multiplication is not exactly associative, which means that the associated reduction loses precision due to inexact arithmetic. For binary reproducible, exactly rounded results use the dg::blas1::dot function. However, this function is more general and faster to execute than dg::blas1::dot.

For example

//Check if a vector contains Inf or NaN
thrust::device_vector<double> x( 100, NAN );
bool hasnan = false;
hasnan = dg::blas1::reduce( x, false, thrust::logical_or<bool>(),
    dg::ISNFINITE<double>());
CHECK( hasnan == true);

or

// Find minimum and maximum of a vector
for( int u=0; u<100; u++)
    x[u] = (double)(u-10)*(u-10);
// Notice the zero elements of the min and max functions
double min = dg::blas1::reduce( x, +1e308, thrust::minimum<double>());
double max = dg::blas1::reduce( x, -1e308, thrust::maximum<double>());
CHECK( min == 0);
CHECK( max == 89*89);
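A plain sum can also be written as a reduction; this is a sketch and, unlike dg::blas1::dot, it is not binary reproducible:

dg::DVec ones( 100, 1.);
double sum = dg::blas1::reduce( ones, 0., thrust::plus<double>());
CHECK( sum == 100.);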
Parameters
x     Container to reduce
zero  The neutral element with respect to binary_op, that is x == binary_op( zero, x). Determines the OutputType, so make sure the type is clear to the compiler (e.g. write (double)0 instead of 0 if you want double output)
Attention
In the current implementation zero is used to initialize partial sums e.g. when reducing MPI Vectors so it is important that zero is actually the neutral element. The reduction will yield wrong results if it is not.
Parameters
binary_op  an associative and commutative binary operator
unary_op   a unary operator applied to each element of x
Returns
Custom reduction as defined above
Note
This routine is always executed synchronously due to the implicit memcpy of the result. With MPI the result is broadcast to all processes.
Template Parameters
BinaryOp    Functor with signature: value_type operator()( value_type, value_type), must be associative and commutative. value_type must be compatible with OutputType
UnaryOp     a unary operator. The argument type must be compatible with get_value_type<ContainerType>. The return type must be convertible to OutputType
OutputType  The type of the result. Inferred from zero, so make sure zero's type is clear to the compiler.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
For partial reductions see dg::Average

◆ scal()

template<class ContainerType , class value_type >
void dg::blas1::scal ( ContainerType & x,
value_type alpha )
inline

\( x = \alpha x\)

This routine computes

\[ \alpha x_i \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2);
dg::blas1::scal( two, 0.5);
CHECK( two == dg::DVec( 100, 1));
Parameters
alpha  Scalar
x      (read/write) x
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType  Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
Template Parameters
value_type: Any type that can be used in an arithmetic operation with dg::get_value_type<ContainerType>
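
As a further sketch, scal applies recursively to the container types listed above; here with a std::array (the concrete values are illustrative assumptions).

// Sketch: scaling a std::array, one of the supported container types
std::array<double,4> arr = {1., 2., 3., 4.}, expected = {2., 4., 6., 8.};
dg::blas1::scal( arr, 2.);
CHECK( arr == expected);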

◆ subroutine()

template<class Subroutine , class ContainerType , class ... ContainerTypes>
void dg::blas1::subroutine ( Subroutine f,
ContainerType && x,
ContainerTypes &&... xs )
inline

\( f(x_0, x_1, ...)\); Customizable and generic blas1 function

This routine evaluates an arbitrary user-defined subroutine f with an arbitrary number of arguments \( x_s\) elementwise

\[ f(x_{0i}, x_{1i}, ...) \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), four(100,4);
dg::blas1::subroutine( []DG_DEVICE( double x, double y, double& z){
z = 7*x + y +z;
}, two, 3., four);
CHECK( four == dg::HVec( 100, 21.)); // 7*2+3+4
Parameters
f: the subroutine; see the subroutines group for a collection of predefined subroutines to use here
x: the first argument
xs: other arguments
Note
This function can compute any trivially parallel expression for any number of input and output arguments. In this sense it replaces all other blas1 functions except the scalar product, which is not trivially parallel.
Attention
The user has to decide whether or not it is safe to alias input or output vectors. If in doubt, do not alias output vectors.
Template Parameters
Subroutine: a function or functor with an arbitrary number of arguments and no return type; taking a value_type argument for each input argument in the call and a value_type& argument for each output argument. Subroutine must be callable on the device in use. In particular, with CUDA it must be a functor (not a function) and its signature must contain the __device__ specifier. (s.a. DG_DEVICE)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType: Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
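
To illustrate the note above, an axpby-like update \( y = 0.5 x + 2 y\) can be expressed as a subroutine. This is a minimal sketch in the style of the example above; the concrete coefficients are illustrative assumptions.

// Sketch: y = 0.5*x + 2*y written as a subroutine
dg::DVec x( 100, 2.), y( 100, 3.);
dg::blas1::subroutine( []DG_DEVICE( double xi, double& yi){
    yi = 0.5*xi + 2.*yi;
}, x, y);
CHECK( y == dg::DVec( 100, 7.)); // 0.5*2 + 2*3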

◆ transform()

template<class ContainerType , class ContainerType1 , class UnaryOp >
void dg::blas1::transform ( const ContainerType1 & x,
ContainerType & y,
UnaryOp op )
inline

\( y = op(x)\)

This routine computes

\[ y_i = op(x_i) \]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

dg::DVec two( 100,2), result(100);
dg::blas1::transform( two, result, []DG_DEVICE( double x){ return exp(x);});
CHECK( result == dg::DVec( 100, exp(2.)));
Parameters
x: ContainerType x, may alias y
y: (write-only) ContainerType y contains the result, may alias x
op: unary functor to use on every element
Template Parameters
UnaryOp: Functor with signature: value_type operator()( value_type)
Note
UnaryOp must be callable on the device in use. In particular, with CUDA it must be of functor type (not a function) and its signature must contain the __device__ specifier. (s.a. DG_DEVICE)
Attention
only result vectors that are write-only and do not alias input vectors contain correct results when the result vector contains NaN or Inf on input. In particular, dg::blas1::scal( y, 0 ) does not remove NaN or Inf from y while dg::blas1::copy( 0, y ) does.
Template Parameters
ContainerType: Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
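
Since x may alias y, transform can also be applied in place. A minimal sketch, with an illustrative sqrt lambda:

// Sketch: in-place transform, exploiting that x may alias y
dg::DVec v( 100, 4.);
dg::blas1::transform( v, v, []DG_DEVICE( double x){ return sqrt(x);});
CHECK( v == dg::DVec( 100, 2.));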

◆ vdot()

template<class Functor , class ContainerType , class ... ContainerTypes>
auto dg::blas1::vdot ( Functor f,
const ContainerType & x,
const ContainerTypes &... xs ) -> std::invoke_result_t<Functor, dg::get_value_type<ContainerType>, dg::get_value_type<ContainerTypes>...>

\( \sum_i f(x_{0i}, x_{1i}, ...)\) Extended Precision transform reduce

This routine computes

\[ \sum_i f(x_{0i}, x_{1i}, ...)\]

where i iterates over all elements inside the given vectors. The order of iterations is undefined. Scalar arguments to container types are interpreted as vectors with all elements constant. If ContainerType has the RecursiveVectorTag, i recursively loops over all entries. If the vector sizes do not match, the result is undefined. The compiler chooses the implementation and parallelization of this function based on given template parameters. For a full set of rules please refer to The dg dispatch system.

For example

// A fun way to compute the size of a vector
dg::DVec v( 100,2);
unsigned size = dg::blas1::vdot( []DG_DEVICE(double x){ return 1u;},
v);
CHECK( size == 100u); //100*1

or

// Compute the weighted norm of a complex vector
std::vector<double> ww( 100, 42.);
std::vector<std::complex<double>> cc( 100, {1,1});
// Use auto to allow changing std::complex to thrust::complex
// Use norm instead of std::norm and let ADL work
double nrm = dg::blas1::vdot([](double w, auto z){ return w*norm(z);},
ww, cc);
CHECK( nrm == 100*42*2.0);
Note
The main motivator for this version of dot is that it works for complex numbers.
Attention
If one of the input vectors contains Inf or NaN, or the product of the input numbers reaches Inf or NaN, then the behaviour is undefined and the function may throw. See dg::ISNFINITE and dg::ISNSANE in that case.
Note
This implementation does not guarantee binary reproducible results. The sum is computed with extended precision and the result is rounded to the nearest double precision number. This is possible with the help of an adapted version of the dg::exblas library and works for single and double precision.
Template Parameters
Functor: signature: value_type_g operator()( value_type_x0, value_type_x1, ...)
Attention
Functor must be callable on the device in use. In particular, with CUDA it must be a functor type (not a function) and its signature must contain the __device__ specifier. (s.a. DG_DEVICE)
Parameters
f: The functor to evaluate; see dg::blas1::evaluate and the large collection of predefined variadic functors to use here
x: First input
xs: More input (may alias x)
Returns
Scalar product as defined above
Note
This routine is always executed synchronously due to the implicit memcpy of the result. With MPI the result is broadcast to all processes.
Template Parameters
ContainerType: Any class for which a specialization of TensorTraits exists and which fulfills the requirements of the data and execution policies defined there, derived from AnyVectorTag and AnyPolicyTag. Among others
  • dg::HVec (serial), dg::DVec (cuda / omp), dg::MHVec (mpi + serial) or dg::MDVec (mpi + cuda / omp)
  • std::vector<dg::DVec> (vector of shared device vectors), std::array<double, 4> (array of 4 doubles) or std::map < std::string, dg::DVec> ( a map of named vectors)
  • double (scalar) and other primitive types ...
If there are several ContainerTypes in the argument list, then TensorTraits must exist for all of them
See also
See The dg dispatch system for a detailed explanation of our type dispatch system
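
For comparison with dg::blas1::dot, the plain Euclidean product \( x^T y\) can also be expressed through vdot. A minimal sketch with illustrative values; note that dg::blas1::dot remains the binary reproducible routine for this purpose.

// Sketch: x^T y written as a transform reduction with vdot
dg::DVec x( 100, 2.), y( 100, 3.);
double result = dg::blas1::vdot( []DG_DEVICE( double xi, double yi){
    return xi*yi;}, x, y);
CHECK( result == 600.); // 100 * 2 * 3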