If you are going to do a lot of work on the Power nodes or the x86 GPU nodes, you might want to add the following to your .bashrc file to set up your environment.

#Aliases to get an interactive session on the Power nodes with Kepler GPUs
alias g8='srun   -N 1 --tasks-per-node=20 -p ppc --time=1:00:00                   --gres=gpu:kepler:4  --x11 --pty bash -l'
alias g8.1='srun -N 1 --tasks-per-node=20 -p ppc --time=1:00:00 --nodelist=ppc001 --gres=gpu:kepler:4  --x11 --pty bash -l'
alias g8.2='srun -N 1 --tasks-per-node=20 -p ppc --time=1:00:00 --nodelist=ppc002 --gres=gpu:kepler:4  --x11 --pty bash -l'

#Alias to get an interactive session on the x86 node with a Pascal GPU
alias gpu4='srun -N 1 --tasks-per-node=16 -p gpu --time=1:00:00 --nodelist=gpu004  --gres=gpu:pascal:1 --x11 --pty bash -l'

if [[ $(hostname) = *"ppc"* ]]; then
	source /etc/profile
	export PATH=/software/apps/ddt/bin:$PATH
	export LD_LIBRARY_PATH=/software/apps/ddt/lib:$LD_LIBRARY_PATH
	module load XL
	module load OpenMPI
	module load CUDA
	module load PGI/18.4
else
	module load PrgEnv/svn/1.6.11
	module load PrgEnv/libs/cuda/8.0
fi

Note: if you have added the blurb given above to your .bashrc file, you can remove the following two lines from the buildit file discussed below.

module purge
module load PGI/18.4

These examples exercise the GPUs on the Power nodes in various ways. To build/run these examples:

  1. In a new directory, download the examples.
  2. Uncompress the tarball:
    tar -xzf gpu.tgz
  3. Get an interactive session on ppc001 or ppc002.
    srun -N 1 --tasks-per-node=1 -p ppc-build --share --time=1:00:00 --gres=gpu:kepler:4 --pty bash -l
  4. Run the script buildit. This sets up the environment and does a make.
  5. Exit the interactive session.
  6. Run the batch script:
    sbatch -p ppc power_script


A script that sets up the environment and then does a make.


Makefile for the examples.


This program returns the number of GPUs detected on a node. It should be 4 for ppc001 and ppc002. If not, there is a problem with your environment.


A very simple OpenACC program.


Jacobi relaxation calculation in OpenACC and OpenMP. This is from the NVIDIA workshop.


Timer code for laplace2d.c.

A matrix multiply in CUDA from

cuFFT library example.

C and Fortran CUDA programs. The CPU code accepts the grid and block dimensions and then calls the kernel. Note that the number of threads launched is the product of the grid and block dimensions. The kernel simply fills in an array of length 6*(# threads); the first element of each set of 6 is a thread number, followed by blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y, and threadIdx.z. Finally, the CPU prints this array. The file "input" is for this program.


Input for testinput.f90.


A script for running the examples.