Computer Physics Communications Program Library: Programs in Physics & Physical Chemistry |

aeyx_v1_0.tar.gz (140 Kbytes) | ||
---|---|---|

Manuscript Title: GPU-Accelerated Adjoint Algorithmic Differentiation | ||

Authors: Felix Gremse, Andreas Höfter, Lukas Razik, Fabian Kiessling, Uwe Naumann | ||

Program title: AD-GPU | ||

Catalogue identifier: AEYX_v1_0 | ||

Distribution format: tar.gz | ||

Journal reference: Comput. Phys. Commun. 200 (2016) 300 | ||

Programming language: C++ and CUDA. | ||

Computer: Any computer with a compatible C++ compiler and a GPU with CUDA capability 3.0 or higher. | ||

Operating system: Windows 7 or Linux. | ||

RAM: 16 Gbyte | ||

Keywords: Adjoint Algorithmic Differentiation, GPU Programming. | ||

PACS: 02.60.Jh, 02.60.Pn, 03.65.Fd, 02.20.-a, 07.05.Bx. | ||

Classification: 4.9, 4.12, 6.1, 6.5. | ||

External routines: CUDA 6.5, Intel MKL (optional) and routines from BLAS, LAPACK and CUBLAS | ||

Nature of problem: Gradients are required for many optimization problems, e.g. classifier training or nonlinear image reconstruction. Often, the function whose gradient is required can be implemented as a computer program, and algorithmic differentiation methods can then be used to compute the gradient. Depending on the approach, this may demand excessive computational resources, i.e. memory and arithmetic operations. GPUs provide massive computational resources but require special care to distribute the workload across many lightweight threads. | ||

Solution method: Adjoint algorithmic differentiation allows efficient computation of gradients of cost functions given as computer programs. In theory, the gradient can be computed with a number of arithmetic operations similar to that of one function evaluation. Making optimal use of parallel processors and limited memory is a major challenge, which can be mitigated by vectorization. | ||
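The adjoint idea can be illustrated with a minimal scalar tape in C++. This is only a sketch under stated assumptions: the names `Tape`, `Var`, `input`, `sine` and `gradient` are hypothetical and are not part of AD-GPU, which works on matrices and vectors on the GPU. The forward pass records each operation with its local partial derivatives, and a single reverse sweep over the tape yields the derivative of the output with respect to every recorded value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal tape for illustration; not the AD-GPU API.
struct Tape {
    // Each node stores up to two parent indices and the matching
    // local partial derivatives of the recorded operation.
    struct Node {
        std::size_t parent[2];
        double partial[2];
    };
    std::vector<Node> nodes;

    std::size_t push(std::size_t p0, double d0, std::size_t p1, double d1) {
        Node n = {{p0, p1}, {d0, d1}};
        nodes.push_back(n);
        return nodes.size() - 1;
    }
};

// A value together with its position on the tape.
struct Var {
    Tape* tape;
    std::size_t index;
    double value;
};

// Register an independent variable; it points at itself with zero partials.
Var input(Tape& t, double v) {
    std::size_t self = t.nodes.size();
    std::size_t idx = t.push(self, 0.0, self, 0.0);
    return {&t, idx, v};
}

Var operator+(Var a, Var b) {
    std::size_t idx = a.tape->push(a.index, 1.0, b.index, 1.0);
    return {a.tape, idx, a.value + b.value};
}

Var operator*(Var a, Var b) {
    std::size_t idx = a.tape->push(a.index, b.value, b.index, a.value);
    return {a.tape, idx, a.value * b.value};
}

Var sine(Var a) {
    std::size_t idx = a.tape->push(a.index, std::cos(a.value), a.index, 0.0);
    return {a.tape, idx, std::sin(a.value)};
}

// Reverse sweep: propagate adjoints from the output back to all inputs.
// The cost is one pass over the tape, independent of the number of inputs.
std::vector<double> gradient(const Tape& t, const Var& output) {
    assert(output.index < t.nodes.size());
    std::vector<double> adjoint(t.nodes.size(), 0.0);
    adjoint[output.index] = 1.0;
    for (std::size_t i = t.nodes.size(); i-- > 0;) {
        const Tape::Node& n = t.nodes[i];
        adjoint[n.parent[0]] += n.partial[0] * adjoint[i];
        adjoint[n.parent[1]] += n.partial[1] * adjoint[i];
    }
    return adjoint;
}
```

For example, after recording `Var f = x * y + sine(x)` for inputs `x` and `y`, the entries `gradient(t, f)[x.index]` and `gradient(t, f)[y.index]` hold the two partial derivatives. The tape grows with every recorded operation, which is the memory pressure the entry refers to.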

Restrictions: To use the GPU-accelerated adjoint algorithmic differentiation method, the cost function must be implemented using the provided AD-GPU intrinsics for matrix and vector operations. | ||
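To indicate what differentiating at the level of whole matrix and vector operations involves, the following C++ sketch pairs a matrix-vector product with its adjoint. It is an assumption-laden illustration: `Matrix`, `matvec` and `matvec_adjoint` are hypothetical names, the loops run on the CPU, and the actual AD-GPU intrinsics may look quite different. For y = A x, the adjoint updates are A_bar += y_bar x^T and x_bar += A^T y_bar, so one recorded operation replaces many scalar tape entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix; not an AD-GPU type.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Forward operation: y = A * x.
std::vector<double> matvec(const Matrix& A, const std::vector<double>& x) {
    assert(x.size() == A.cols);
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t j = 0; j < A.cols; ++j)
            y[i] += A(i, j) * x[j];
    return y;
}

// Adjoint operation: given y_bar (the derivative of the cost with respect
// to y), accumulate A_bar += y_bar * x^T and x_bar += A^T * y_bar.
void matvec_adjoint(const Matrix& A, const std::vector<double>& x,
                    const std::vector<double>& y_bar,
                    Matrix& A_bar, std::vector<double>& x_bar) {
    assert(x.size() == A.cols && y_bar.size() == A.rows);
    assert(A_bar.rows == A.rows && A_bar.cols == A.cols);
    assert(x_bar.size() == A.cols);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t j = 0; j < A.cols; ++j) {
            A_bar(i, j) += y_bar[i] * x[j];
            x_bar[j] += A(i, j) * y_bar[i];
        }
    }
}
```

In a GPU setting, both the forward and the adjoint loops would typically map onto library routines such as those from BLAS or CUBLAS listed under external routines, rather than hand-written loops.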

Unusual features: GPU acceleration. | ||

Additional comments: The code uses some features of C++11, e.g. std::shared_ptr. Alternatively, the Boost library can be used. | ||

Running time: The example program runs in a few minutes; reproducing the performance measurements can take up to a few hours. |
