Computer Physics Communications Program Library

adul_v2_0.tar.gz (178 Kbytes)
Manuscript Title: ARVO-CL: The OpenCL Version of the ARVO Package - An Efficient Tool for Computing the Accessible Surface Area and the Excluded Volume of Proteins via Analytical Equations
Authors: Ján Buša Jr., Shura Hayryan, Ming-Chya Wu, Ján Buša, Chin-Kun Hu
Program title: ARVO-CL
Catalogue identifier: ADUL_v2_0
Distribution format: tar.gz
Journal reference: Comput. Phys. Commun. 183 (2012) 2494
Programming language: C, OpenCL.
Computer: PC Pentium; SPP'2000.
Operating system: All OpenCL capable systems.
Has the code been vectorised or parallelized?: Parallelized using GPUs. A serial version (non-GPU) is also included in the package.
Keywords: ARVO, Proteins, Solvent accessible area, Excluded volume, Stereographic projection, OpenCL package.
PACS: 87.14.Ee, 87.15.Aa, 02.60.Jh, 05.10.-a.
Classification: 3.

External routines: cl.hpp (http://www.khronos.org/registry/cl/api/1.1/cl.hpp)

Does the new version supersede the previous version?: Yes

Nature of problem:
Molecular mechanics computations, continuum percolation

Solution method:
Numerical algorithm based on analytical formulas, obtained after applying the stereographic transformation.

Reasons for new version:
During the past decade we have published a number of protein-structure-related algorithms and software packages [1, 2, 3, 4, 5, 6], which have received considerable attention from researchers, and interesting applications of these packages have been found. For example, ARVO [4] has been used to show that the ratios of volume V to surface area A for proteins in the Protein Data Bank (PDB) are distributed in a narrow range [7]. Such a result is useful for finding native structures of proteins.
Therefore, we consider that there is a demand to revise and modernize these tools and to make them more efficient. Here we present the new version of the ARVO package. The original ARVO package was written in FORTRAN. One reason for the new version is to rewrite it in C, making it more accessible to younger researchers who are not familiar with FORTRAN. Another, more important, reason is to exploit the speed-up offered by modern graphics cards. We also want to eliminate the need to re-compile the program for every molecule. For this purpose, we have added the possibility of using general pdb [8] files as input. Once compiled, the program can process any number of input files successively. Finally, we revisited the algorithm and removed unnecessary memory usage to make the package more efficient.

Summary of revisions:
  1. New tool. ARVO is designed to calculate the volume and accessible surface area of an arbitrary system of overlapping spheres (representing atoms), biomolecules being just one, albeit important, application. The user provides the coordinates and radii of the spheres as well as the radius of the probe sphere (a water molecule for biomolecules). In the old version of ARVO the input data were hard-coded, which made it necessary to recompile the program after every change in input data. In the current version a module called 'input structure' has been created to read the data from an independent external file. The coordinates and radii are stored in a file with extension *.ats (see the directory 'input' in the package). Each line in the file corresponds to one sphere (atom) and has the format

    24.733     -4.992     -13.256     2.800

    The first three numbers are the (x, y, z) coordinates of the atom and the last one is the radius. It is important to remember that the radius of the probe sphere must already be added to this number. In the above example, the value 2.800 is obtained by the formula "sphere radius + probe sphere radius". For an arbitrary system of spheres the *.ats file is created by the user. For proteins, 'input structure' takes as input a file compatible with the Protein Data Bank (pdb) format [8] and creates a corresponding *.ats file. It automatically assigns radii to the individual spheres and (optionally) adds the probe sphere (water molecule) radius to all radii. As output it produces a file containing the coordinates of the spheres together with their radii; this file serves directly as input for ARVO. Using an external tool allows users to create their own mappings of atoms to radii without re-compiling either 'input structure' or ARVO.
    It is again the user's responsibility to assign a proper radius to each type of atom. One can use any of the published standard sets of radii (see, for example, [9, 10, 11, 12, 13]). Alternatively, the user can assign their own radii directly in the module 'input structure'. The radii are assigned in a special file with extension *.pds (see the documentation), which consists of lines like ATOM CA ALA 2.0, read as "the C-alpha atom of Alanine has radius 2.0 Angstroms". For testing we provide the file rashin.pds, where the radii are assigned according to [12].
    The output file contains only recognized atoms. Atoms that were not recognized (i.e. are not part of the mapping) are written to a separate log file, allowing the user to review and correct the mapping files later.
  2. The Language. Implementing the program in C is a natural first step when translating a program into OpenCL. This implementation is a line-by-line rewrite of the original FORTRAN version of ARVO.
  3. OpenCL implementation. OpenCL [14] is an open standard for parallel programming of heterogeneous systems. Unlike other parallelization technologies such as CUDA [15] or ATI Stream [16], which are tied to specific hardware (produced by NVIDIA and ATI, respectively), OpenCL is vendor-independent, and programs written in OpenCL can run on hardware from any company supporting the standard, including AMD, INTEL, and NVIDIA. Programs written in OpenCL can run without much change on both CPUs and GPUs.

Improvements as compared with the original version:
Support for input files in the format created by 'input structure'; passing of parameters (the input file name) via the command line; dynamically sized arrays, removing the need to re-compile the program after any change in the size of structures; memory allocation according to the real demands of the application; replacement of the north pole test by a slight radius reduction (see below).
To compile an OpenCL program, one needs to download and install the appropriate driver and software development kit (SDK). The program itself consists of two parts: one running on the CPU and one running on the GPU. The CPU part initializes communication between the computer and the GPU, loads data, and processes and exports results. The GPU performs the parallel part of the calculation: the search for neighboring atoms and the computation of each atom's contribution to the total area and volume of the molecule. For details of the algorithm, please see references [3, 4].
When programming in OpenCL, more attention must be paid to memory usage than in a classical approach. Device memory is usually limited and, therefore, some changes to the original algorithm are necessary. First, unlike in the FORTRAN version of the program, no structures containing the list of neighboring atoms are created. The search for neighbors is done on the fly, while the contribution of each individual atom is being calculated.
The idea behind the north pole check and molecule rotation [4, Sec. 4.7] has been changed. If, during the north pole test, the north pole of the active sphere lies close to the surface of a neighboring sphere, the radius of that neighboring sphere is multiplied by 0.9999 instead of rotating the whole molecule. This allows the algorithm to continue normally. Changing the radius of one atom changes the area and volume of that atom by 0.02% and 0.03%, respectively. Since an atom's contribution to the total area (volume) of the protein is usually only a fraction of the atom's total area (volume), and since the protein contains many atoms, the change in total area (volume) is much smaller than 0.02% (0.03%). Tests showed relative errors ranging from 10^-8 up to 10^-4. An additional benefit of this approach is that the whole molecule is not rotated, so no errors that would arise during such a rotation are introduced. We even found a protein (1S1I, with 31938 atoms) for which, after several hundred rotations, ARVO was unable to find a position in which the original north pole test would pass. For such proteins the new approach is the only one possible.
Some data obtained using the north pole test (with rotation) and without it (with radius reduction) are summarized in Table 1. The radius of the water molecule was set to 1.4 Å, and Rashin's set of van der Waals radii of atoms [12] was used. The first column contains the protein name and the number of atoms. The second and third columns contain the volume and surface area obtained using the original ARVO algorithm [4] with the conventional north pole test and rotation, each followed by the difference resulting from the new approach. The fourth column shows the number of rotations in the original version and the number of atoms whose radius has been reduced in the new version. The relative errors of the volume and area obtained with radius reduction are shown in the last columns. It can be seen clearly that the error is negligible.

protein  atoms #   volume           diff        area            diff       rotat.  reduct.  δvolume[%]    δarea[%]
3rn3        957      23951.180469  -0.000025     6858.322636  -0.000007      3       1     -1.04·10^-7   -1.02·10^-7
3cyt       1600      40875.867395  -0.001575    11455.474832   0.001415      3       4     -3.85·10^-6    1.24·10^-4
2act       1657      38608.243038   0.049480     9054.007350   0.001733      4       2      1.28·10^-4    1.91·10^-5
2brd       1738      43882.735479  -0.000344    10918.203529  -0.000097     21       1     -7.84·10^-7   -8.88·10^-7
8tln       2455      56698.988883  -0.000966    12496.978064   0.000459     15       4     -1.70·10^-6    3.67·10^-6
1rr8       4108     105841.502192  -0.000699    27983.159772  -0.000214     18       4     -6.60·10^-7   -7.65·10^-7
1xi5      15696    1743445.092001   0.007709   863139.882703   0.000070      1       1      4.42·10^-7    8.11·10^-9

Table 1: Comparison of volumes and surface areas of different proteins obtained by the original ARVO and by the new version, using the different strategies for dealing with the "north pole". For each protein the table lists the PDB ID and the number of atoms; the volume and surface area obtained with the original ARVO, each followed by the difference introduced by the new approach; the number of rotations of the molecule in the original ARVO and the number of atoms whose radius has been reduced in the new version; and the relative errors of the volume and the area.

The disadvantage is that calculations using OpenCL are, by default, done in single precision only. This comes from the fact that the OpenCL standard does not include double precision floating point operations in its core but only as an extension, so the availability of double precision calculations depends on the device (CPU, GPU) vendor. Switching to double precision degrades performance (calculations in double precision are 2 to 8 times slower than the same calculations in single precision). Another problem is that after using the double precision switch, all calculations are done in double precision, which leads to problems with insufficient memory. This can be bypassed by explicitly switching to single precision where possible, but it requires careful modification of the whole program source. Since double precision was available on our GPU (NVIDIA GTX 480), we decided to use double precision only for the critical parts of the algorithm (such as the integral calculation), leaving the non-critical parts in single precision. This allowed us to speed up the calculation and obtain acceptable results.
Results of the test calculations are given in Table 2. All calculations except 2brd0 were performed with a water radius of 1.4 Å. The first column contains the protein name and the number of atoms. The next columns contain the computation times in seconds (FORTRAN/CPU and OpenCL/GPU) and the speed-up (the time on the CPU divided by the time on the GPU), followed by the volume and area calculated in FORTRAN together with the differences relative to the results obtained by OpenCL. As one can see, the area and volume obtained using FORTRAN (in double precision) and the OpenCL implementation (a combination of single and double precision) are practically the same. This is even clearer from the relative errors of the OpenCL implementation, shown in the last columns. As for the computation time, the FORTRAN (C) implementation is preferable when the calculation takes less than about 2 seconds, because in the case of OpenCL some time - about 0.3 s to 1.5 s on the test configuration - is needed for initializing the device and starting the communication. The speed-up is clearly visible for large proteins, where the parallel approach can be exploited, but the complexity of the protein must be taken into account as well. Compare the times for 2brd (water radius 1.4 Å) and 2brd0 (water radius 0 Å); the difference lies in the number of neighbors (overlapping spheres). While for water radius 1.4 Å the number of neighbors is high and using the GPU is efficient, for water radius 0 Å it is better to use the CPU. All results were obtained on a test configuration with an Intel Core i7 930 CPU running at 2.8 GHz and an NVIDIA GeForce GTX 480 GPU.

protein  atoms #  time F95 [s]  time OpenCL [s]  speed-up   volume           diff        area            diff        δvolume[%]    δarea[%]
1eca       1031       8.23           1.37          6.01      26072.003069    0.004310     7004.168138    0.000498     1.65·10^-5    7.11·10^-6
2ptn       1629      13.72           1.52          9.01      39273.220933   -0.007906     9227.570716   -0.005795    -2.01·10^-5   -6.28·10^-5
2brd       1738      15.77           1.59          9.91      43882.735136   -0.006326    10918.203432    0.001471    -1.44·10^-5    1.35·10^-5
2brd0      1738       0.29           0.32          0.91      22412.825807   -0.020471    22546.123881   -0.008437    -9.13·10^-5   -9.17·10^-4
8tln       2455      23.32           1.70         13.74      56698.988550   -0.003028    12496.977990   -0.008708    -5.34·10^-6   -4.64·10^-4
1rr8       4108      30.89           1.75         17.67     105841.501492    0.020445    27983.159558   -0.000802     1.93·10^-5   -2.87·10^-6
1s1i      31938     286.81           8.45         33.95     816980.348702   -1.140763   253160.674893    0.049478    -1.40·10^-4    1.95·10^-5

Table 2: Comparative data on precision and computation time for the FORTRAN vs. OpenCL implementations of ARVO. The structure of the columns is similar to that of Table 1, with the computation times and the speed-up (CPU time divided by GPU time) added. Note that the last protein (1s1i) was not calculated with the FORTRAN implementation, because we were not able to find a rotation for which the north pole test would pass; the comparison presented there is between the C and OpenCL versions.

At the time of writing, OpenCL allowed only 1/4 of the total memory of a device (CPU, GPU) to be allocated by a single allocation call. This can be bypassed by four individual allocation calls, each requesting 1/4 of the device's memory. It is advisable to use a dedicated GPU for the calculations, since sharing a GPU between calculations and graphics display can lead to unexpected results due to concurrent access to device memory.

Restrictions:
The program does not account for possible cavities inside the molecule. The current version works in a combination of single and double precision (see Summary of revisions for details).

Running time:
Depends on the size of the molecule under consideration. For molecules whose running time was less than about 2 seconds in the old version, the performance is likely to decrease. The picture changes considerably for larger molecules (speed-ups of up to 34 were obtained on the test configuration).

References:
[1] F. Eisenmenger, U. H. E. Hansmann, S. Hayryan, C.-K. Hu, Comput. Phys. Commun. 138 (2001) 192.
[2] F. Eisenmenger, U. H. E. Hansmann, S. Hayryan, C.-K. Hu, Comput. Phys. Commun. 174 (2006) 422.
[3] S. Hayryan, C.-K. Hu, J. Skrivánek, E. Hayryan, I. Pokorný, J. Comput. Chem. 26 (2005) 334.
[4] J. Busa, J. Dzurina, E. Hayryan, S. Hayryan, C.-K. Hu, J. Plavka, I. Pokorný, J. Skrivánek, M.-C. Wu, Comput. Phys. Commun. 165 (2005) 59.
[5] J. Busa, S. Hayryan, C.-K. Hu, J. Skrivánek, M.-C. Wu, J. Comput. Chem. 30 (2009) 346.
[6] J. Busa, S. Hayryan, C.-K. Hu, J. Skrivánek, M.-C. Wu, Comput. Phys. Commun. 181 (2010) 2116.
[7] M.-C. Wu, M. S. Li, W.-J. Ma, M. Kouza, C.-K. Hu, EPL 96 (2011) 68005.
[8] http://www.rcsb.org
[9] B. Lee, F. M. Richards, J. Mol. Biol. 55 (1971) 379.
[10] F. M. Richards, Annu. Rev. Biophys. Bioeng. 6 (1977) 151.
[11] A. Shrake, J. A. Rupley, J. Mol. Biol. 79 (1973) 351.
[12] A. A. Rashin, M. Iofin, B. Honig, Biochemistry 25 (1986) 3619.
[13] C. Chothia, Nature 248 (1974) 338.
[14] http://www.khronos.org/opencl/
[15] http://www.nvidia.com/object/cuda_home_new.html
[16] http://www.amd.com/stream