Debugging Programs whilst not bugging yourself

The purpose of this article is to show how you can get error information from programs without resorting to a full-blown debugger session. We will look at some examples in Fortran and C and show various compiler options that can give you additional information. We will touch on the gdb debugger, but only using one simple command. We will start with some definitions and then look first at Fortran, because the information is easier to get there. Then we will look at some C examples.

The examples shown here are serial, not parallel MPI. If the examples were parallel the output would be similar except there would be multiple copies.

Some definitions:

Segmentation fault

Segmentation faults are most often caused by memory errors; that is, your program is trying to access memory it does not own. This can occur when your program tries to access an array element that is out of bounds or tries to use a pointer that is not allocated. It is possible to "get lucky": you can access an array element out of bounds or use an invalid pointer and not generate an error. In Fortran, you can force (most) array and pointer accesses to be checked by specifying compile time options.
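
C itself does not check array accesses, so an out of bounds access goes unnoticed unless you test for it yourself. As a minimal sketch of what a bounds check does, the helper below refuses to read outside an array. The name checked_get is hypothetical and is not part of the example programs in this article.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper for illustration only: reads a[i] only if i is
   inside [0, n). Returns 0 on success and -1 on a bad argument or an
   out of bounds index, in which case *out is left untouched. */
int checked_get(const double *a, size_t n, size_t i, double *out)
{
    if (a == NULL || out == NULL)
        return -1;
    if (i >= n) {
        fprintf(stderr, "index %zu is out of range (array has %zu elements)\n", i, n);
        return -1;
    }
    *out = a[i];
    return 0;
}
```

Calling checked_get(grid, 3, 30000, &v) on a three element array returns -1 and prints a message instead of touching memory the program does not own.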

Arithmetic Error

Arithmetic errors are the result of illegal mathematical operations, such as dividing a number by zero, or passing an illegal value to a function, such as sqrt(-1.0). You can also have overflow errors, which mean that an operation has returned a value outside of what the computer can represent, such as x=(1e100)**100.

Arithmetic errors do not normally cause program termination but you might get NaN or Inf printed instead of "normal" numbers. These stand for Not a Number and Infinity. In Fortran, you can force program termination on arithmetic errors by specifying compile time options. For C programs you can also force termination but it requires some simple program modification. We will look at both cases.
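
Because the program keeps running, one way to notice these values in C is to test results explicitly. The following is only a sketch: classify_result and the RESULT_ names are hypothetical, while isnan and isinf are the standard C99 macros from math.h.

```c
#include <math.h>

/* Hypothetical result codes used only in this sketch. */
enum result_kind { RESULT_FINITE = 0, RESULT_INF = 1, RESULT_NAN = 2 };

/* Classify a floating point value using the C99 macros isnan and isinf. */
enum result_kind classify_result(double x)
{
    if (isnan(x))
        return RESULT_NAN;   /* for example 0.0/0.0 or sqrt(-1.0) */
    if (isinf(x))
        return RESULT_INF;   /* for example 1.0/0.0 or an overflow */
    return RESULT_FINITE;
}
```

A check like this after a suspicious calculation tells you that something went wrong, but not where. The compiler options and exception traps discussed later address that.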

Core file and core dump

When a program terminates abnormally it will sometimes produce a core file or core dump. The normal name for these files is core.##### where ##### represents the process number. Core files contain a description of the state of a program when it terminates. Sometimes you can find information about the program termination from a core file, including the line number that was executing at the time.

The generation of core files is disabled by default on RA. Please contact tkaiser@mines.edu if you would like to be able to generate core files.
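
On POSIX systems one setting that controls core files is the per-process core file size limit, the value that the bash command ulimit -c reports. Whether this is the mechanism used to disable core files on RA is an assumption on our part. As a sketch, a program can query its own limit with getrlimit; the function name core_files_allowed is hypothetical.

```c
#include <sys/resource.h>

/* Returns 1 if the soft core file size limit is greater than zero,
   0 if it is zero, and -1 if the limit could not be read. */
int core_files_allowed(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    return lim.rlim_cur > 0 ? 1 : 0;
}
```

A return value of 1 only means the limit does not forbid a core file. Other system settings can still prevent one from being written.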

Our Fortran example

Our Fortran example is based on the Stommel program that is used to teach MPI programming. It has been stripped down to have just enough to do some real calculation. Then we added some code to produce errors. We also added a call to get_command_argument to get a command line argument if it is present. If there is a command line argument of 1, 2, or 3 the program will produce an error. If there are no command line arguments then the program will run to completion without error and print a value of about 2.68E+07.


program stommel
    implicit none
    real, allocatable :: psi(:,:)         ! our calculation grid
    real, allocatable :: new_psi(:,:)     ! temp grid
    real :: diff
    integer nx,ny,ierr,len,stat
    character (len=7) :: aline
! the subroutine get_command_argument is part of the Fortran 2003
! standard.  it is accepted by most Fortran compilers as an 
! extension.  here, it returns the first command line argument as
! a string "aline" if it is present.  if the program is not given
! any command line arguments then stat indicates an error.  
!
! if we have a command line argument then we read from it into the 
! integer ierr.  ierr is used to force one of our error conditions.
    call get_command_argument(1,aline,len,stat)
    if(stat .eq. 0)then
      read(aline,*)ierr
    else
      ierr=0
    endif
! set the grid size
    nx=300 
    ny=300
! allocate the grid to size nx * ny plus the boundary cells
    allocate(psi(0:nx+1,0:ny+1))
    allocate(new_psi(0:nx+1,0:ny+1))
! set the values of the grid to 1
    psi=1.0
! psi(0,0) we set to zero so that we can force a
! divide by 0.0 later
    psi(0,0)=0.0
! do a Jacobi iteration
    call do_jacobi(psi,new_psi,diff,1,nx,1,ny,ierr)
    write(*,*)diff
end program stommel
!*********************
subroutine do_jacobi(psi,new_psi,diff,i1,i2,j1,j2,ierr)
    implicit none
    integer,intent(in) :: i1,i2,j1,j2,ierr
    real,dimension(i1-1:i2+1,j1-1:j2+1):: psi
    real,dimension(i1-1:i2+1,j1-1:j2+1):: new_psi
    real diff
    real, parameter :: a1=1.0,a2=1.0,a3=1.0,a4=1.0,a5=1.0
    integer i,j
    diff=0.0
    do j=j1,j2
        do i=i1,i2
            new_psi(i,j)=a1*psi(i+1,j) + a2*psi(i-1,j) + &
                         a3*psi(i,j+1) + a4*psi(i,j-1) - &
                         a5*(i+j)
            diff=diff+abs(new_psi(i,j)-psi(i,j))
         enddo
     enddo
! error 1 -- arithmetic error divide by zero 
     if(ierr .eq. 1)then
       psi(1,1)=1.0/psi(0,0)
       write(*,*)psi(0,0),psi(1,1)
     endif
! error 2 -- array access out of bounds
     if(ierr .eq. 2)then
       psi(30000,30000)=1.0
     endif
! error 3 -- arithmetic error value out of bounds.  we
!            do this inside of the subroutine asinerror
!            to show another level in our call tree.
     if(ierr .eq. 3)then
       call asinerror(3.0)
     endif
     psi(i1:i2,j1:j2)=new_psi(i1:i2,j1:j2)
end subroutine do_jacobi
!*********************
subroutine asinerror(x)
! error 3 -- arithmetic error value out of bounds
     y=asin(x)
     write(*,*)"asin(",x,") returned ",y
end subroutine

We will look at default builds using the Intel (ifort), Portland Group (pgf90), and NAG (nagfor) compilers. The compile lines are:

[tkaiser@ra state]$ ifort test.f90 -o intel1
[tkaiser@ra state]$ pgf90 test.f90 -o pg1   
[tkaiser@ra state]$ nagfor test.f90 -o nag1

Intel Compiler

Running the Intel compiled version of the program with no command line argument, and then with each of the three arguments, we get:

[tkaiser@ra state]$ ./intel1
  26820002.0
  
  
[tkaiser@ra state]$ ./intel1 1
  0.000000000000000E+000 Infinity               
  26820002.0
  
  
[tkaiser@ra state]$ ./intel1 2
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
intel1             0000000000403AE0  Unknown               Unknown  Unknown
intel1             000000000040307C  Unknown               Unknown  Unknown
libc.so.6          00000036BA91C3FB  Unknown               Unknown  Unknown
intel1             0000000000402FAA  Unknown               Unknown  Unknown


[tkaiser@ra state]$ ./intel1 3
 asin(   3.00000000000000      ) returned  NaN                    
  26820002.0

When we run the program with the command line arguments of 1, 2, or 3 we clearly have error conditions but we are not given any indication of where the problems are happening.

Let's concentrate on the segmentation fault and look at options that can tell you where the error is occurring. In theory the -traceback option will allow Fortran programs to print the call tree for an error, that is, the line that caused the error, the name of the routine containing it, and where that routine was called from. What we find is that -traceback is often not enough. Let's look at some examples.

[tkaiser@ra state]$ ifort -traceback  test.f90 -o intel2
[tkaiser@ra state]$ ./intel2 2
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
intel2             0000000000403B13  MAIN__                     38  test.f90
intel2             000000000040307C  Unknown               Unknown  Unknown
libc.so.6          00000036BA91C3FB  Unknown               Unknown  Unknown
intel2             0000000000402FAA  Unknown               Unknown  Unknown
[tkaiser@ra state]$ 

With just the -traceback option we get an error on line 38. This is an improvement but it is actually giving us the line number for

call do_jacobi(psi,new_psi,diff,1,nx,1,ny,ierr)

not the line within do_jacobi that had the problem.

The default level of optimization for Fortran compiles is -O2. In this case the optimization is causing the whole do_jacobi routine to be inlined, so there is no information given for the real line that caused the problem. There are a few things we can do about this. We can set the optimization to -O1 or we can add -debug inline-debug-info. On its own, -debug inline-debug-info actually disables optimization. Some other options that we can use are -g or just -debug. These two options add enhanced debugging information that is useful when using debuggers. They also, by default, reduce the level of optimization. We can manually set different levels of optimization. For example:

[tkaiser@ra state]$ ifort -o case02  test.f90 -traceback       -O1
[tkaiser@ra state]$ ./case02 2
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
case02             0000000000403662  do_jacobi_                 67  test.f90
case02             00000000004033C6  MAIN__                     38  test.f90
case02             000000000040307C  Unknown               Unknown  Unknown
libc.so.6          00000036BA91C3FB  Unknown               Unknown  Unknown
case02             0000000000402FAA  Unknown               Unknown  Unknown
[tkaiser@ra state]$ 

Here we get the correct line number for the line in do_jacobi that caused the problem (67) and the line number from which do_jacobi was called (38).

Another example is:

[tkaiser@ra state]$ ifort -o case07  test.f90 -traceback     -debug inline-debug-info -O2
[tkaiser@ra state]$ ./case07 2
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
case07             0000000000403BFF  MAIN__                     67  test.f90
case07             000000000040307C  Unknown               Unknown  Unknown
libc.so.6          00000036BA91C3FB  Unknown               Unknown  Unknown
case07             0000000000402FAA  Unknown               Unknown  Unknown
[tkaiser@ra state]$ 

Here we get the correct line number that caused the problem (67) but the routine name is wrong.

Our three types of results are:

Line 38 = MAIN__ and line 67 = do_jacobi_
    Correctly shows that the error was on line 67 of do_jacobi and that this routine was called from line 38 of the main routine.
Line 38 = MAIN__
    Says that the problem occurred in a routine called from line 38 of main.
Line 67 = MAIN__
    Gives the correct line number for the error, but the routine name is not correct.

We ran 32 cases using different levels of optimization, with and without the -g, -debug, and -debug inline-debug-info options. The outputs were always similar to one of the three types given above. The best results are returned with optimization set to -O1: the traceback correctly shows that the error was on line 67 of do_jacobi and that this routine was called from line 38 of the main routine. If you want a higher level of optimization then you should specify -debug inline-debug-info, but realize that the results need to be looked at carefully because they may be misleading.

Case  -traceback  Optimization  -debug inline-debug-info  -debug  -g  Routines reported
1     x                                                               MAIN__
2     x           -O1                                                 MAIN__, do_jacobi_
3     x           -O2                                                 MAIN__
4     x           -O3                                                 MAIN__
5     x                         x                                     MAIN__, do_jacobi_
6     x           -O1           x                                     MAIN__, do_jacobi_
7     x           -O2           x                                     MAIN__
8     x           -O3           x                                     MAIN__
9     x                                                   x           MAIN__, do_jacobi_
10    x           -O1                                     x           MAIN__, do_jacobi_
11    x           -O2                                     x           MAIN__
12    x           -O3                                     x           MAIN__
13    x                         x                         x           MAIN__, do_jacobi_
14    x           -O1           x                         x           MAIN__, do_jacobi_
15    x           -O2           x                         x           MAIN__
16    x           -O3           x                         x           MAIN__
17    x                                                           x   MAIN__, do_jacobi_
18    x           -O1                                             x   MAIN__, do_jacobi_
19    x           -O2                                             x   MAIN__
20    x           -O3                                             x   MAIN__
21    x                         x                                 x   MAIN__, do_jacobi_
22    x           -O1           x                                 x   MAIN__, do_jacobi_
23    x           -O2           x                                 x   MAIN__
24    x           -O3           x                                 x   MAIN__
25    x                                                   x       x   MAIN__, do_jacobi_
26    x           -O1                                     x       x   MAIN__, do_jacobi_
27    x           -O2                                     x       x   MAIN__
28    x           -O3                                     x       x   MAIN__
29    x                         x                         x       x   MAIN__, do_jacobi_
30    x           -O1           x                         x       x   MAIN__, do_jacobi_
31    x           -O2           x                         x       x   MAIN__
32    x           -O3           x                         x       x   MAIN__

Where two routine names are listed, MAIN__ was reported for line 38 and do_jacobi_ for line 67. A single MAIN__ was reported either for line 38 (as in case 1) or for line 67 (as in case 7).
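
The 32 cases are the 4 optimization settings times the 8 on/off combinations of the three debug options. The sketch below rebuilds a compile line from a case number. It assumes the cases were numbered with the optimization level varying fastest, followed by -debug inline-debug-info, -debug, and -g. That numbering matches the case02 and case07 compile lines shown earlier but is otherwise our reading of the table, and the order of the options on the line is also a guess. The function name case_command is hypothetical.

```c
#include <stdio.h>

/* Writes the assumed ifort command line for case n (1..32) into buf.
   Returns 0 on success, -1 if n is out of range or buf is too small. */
int case_command(int n, char *buf, size_t len)
{
    static const char *opt[] = { "", " -O1", " -O2", " -O3" };
    int k, written;

    if (n < 1 || n > 32)
        return -1;
    k = n - 1;                      /* bits 0-1: optimization, bits 2-4: flags */
    written = snprintf(buf, len,
                       "ifort -o case%02d test.f90 -traceback%s%s%s%s",
                       n,
                       (k & 4)  ? " -debug inline-debug-info" : "",
                       (k & 8)  ? " -debug" : "",
                       (k & 16) ? " -g" : "",
                       opt[k & 3]);
    return (written < 0 || (size_t)written >= len) ? -1 : 0;
}
```

Under these assumptions case 7 comes out as -traceback with -debug inline-debug-info and -O2, and case 2 as -traceback with -O1, the two compile lines shown in the examples.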

Portland Group Compiler

The Portland Group Compiler has a -traceback option also. However, it does not currently work. This has been reported to the company. We will expand this article when the option is fixed.

You can sometimes get a traceback from PG compiled programs from a core dump if you compile using the -g flag. See the section "Getting traceback information from Core Dumps" below.

NAG compiler

The traceback option for the NAG compiler is -gline. Its behavior is a little strange. If we just use this option we don't get a traceback.

[tkaiser@ra state]$ nagfor  -gline  test.f90 -o case01
NAG Fortran Compiler Release 5.2(668)
Extension: test.f90, line 40: GET_COMMAND_ARGUMENT intrinsic procedure
[NAG Fortran Compiler normal termination, 1 warning]

[tkaiser@ra state]$ ./case01 2
Segmentation fault (core dumped)
[tkaiser@ra state]$ 

If we add the -C option we will get a traceback. The -C option causes a number of different runtime checks to be done. See the man page.

[tkaiser@ra state]$ nagfor -O4 -gline -C test.f90 -o case02
NAG Fortran Compiler Release 5.2(668)
Extension: test.f90, line 40: GET_COMMAND_ARGUMENT intrinsic procedure
[NAG Fortran Compiler normal termination, 1 warning]


[tkaiser@ra state]$ ./case02 2
Runtime Error: test.f90, line 67: Subscript 1 of PSI (value 30000) is out of range (0:301)
Program terminated by fatal error
test.f90, line 67: Error occurred in DO_JACOBI
test.f90, line 38: Called by STOMMEL
Aborted (core dumped)
[tkaiser@ra state]$ 

Divide by zero and argument out of range - Floating point exceptions

If we run our program with an input of 1 or 3 then we will get a divide by zero on line 62 or an invalid argument error in the asin function.

By default, the NAG compiled programs will exit in either of these two cases. On the other hand, the Intel and Portland Group compiled programs will not exit but just report Not a Number (NaN) or Infinity (Inf) for results. The following table shows the compiler options to enable/disable exit on floating point exceptions. If an exception is enabled then the program will always exit when it occurs. If not, it will most likely only exit if the problem causes something else to go wrong, such as a memory access error.

Turning on exception handling in Fortran

Compiler  Exit on exception  Continue on exception
nagfor    -
ifort     -fpe0              -
pgf90     -Ktrap=fp,inv      -
Turning on run time checking in Fortran

Below we have the compile line options to perform various types of run time checking for our three compilers. For the NAG and Intel compilers there are a number of sub options that can be specified. Please see the man page. Note that turning on these options can reduce the speed of your program so they should most likely only be used during debugging.

Portland Group pgf90
    -C
        Add array bounds checking; the same as -Mbounds.
    -Mchkptr
        Check for unintended dereferencing of NULL pointers.
Intel ifort
    -check
        Do extensive run time checks.
NAG nagfor
    -C
        Do extensive run time checks.

C programs aren't any easier

We have seen that for Fortran programs we can sometimes set compile line options to get tracebacks and turn "abort at floating point exceptions" on or off. For C programmers things are not that easy. The traceback options only work in rare cases. For example, in theory, a C program compiled with the Intel compiler with the -traceback option will return a trace on error if it is linked as a subroutine to a Fortran program.

Also, there are no compile line options to enable aborts at floating point exceptions. There is however a subroutine call to do this. We will look at that subroutine call using a C version of our program.

Our C program


#define FLT double
#define INT int
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <fenv.h>
#include <signal.h>
#include <execinfo.h>


FLT **matrix(INT nrl,INT nrh,INT ncl,INT nch);
void do_jacobi(FLT ** psi,FLT ** new_psi,FLT *diff,INT i1,INT i2,INT j1,INT j2);
int feenableexcept(int);
void fpehandler(int sig_num);
void asinerror(FLT x);

INT nx,ny;
INT ierr,inter,isig;

int main(int argc, char **argv)
{
FLT mydiff;
FLT **psi;     /* our calculation grid */
FLT **new_psi; /* temp storage for the grid */
INT i,j;


    ierr=0;
    inter=0;
/*
! if we have a command line argument then we read from it into the 
! integer ierr.  ierr is used to force one of our error conditions.
*/
    if(argc > 1) {
        sscanf(argv[1],"%d",&ierr);
    }

/* if there is a second argument then enable exception handling. 
   if the value is greater than 0 also install a signal handler. 
*/
    if(argc > 2) {
        feenableexcept(FE_ALL_EXCEPT);
        sscanf(argv[2],"%d",&isig);
        if(isig > 0) {
            signal(SIGFPE, fpehandler);
        }
    }

    nx=300;
    ny=300;
    psi=    matrix((INT)(0),(INT)(nx+1),(INT)(0),(INT)(ny+1));
    new_psi=matrix((INT)(0),(INT)(nx+1),(INT)(0),(INT)(ny+1));
/* set initial guess for the value of the grid */
    for(i=0;i<=nx+1;i++)
        for(j=0;j<=ny+1;j++)
          psi[i][j]=1.0;
    psi[0][0]=0.0;
    do_jacobi(psi,new_psi,&mydiff,1,nx,1,ny);
    printf("%g\n",mydiff);
    return 0;
}

void do_jacobi(FLT ** psi,FLT ** new_psi,FLT *diff,INT i1,INT i2,INT j1,INT j2){
/*
! does a single Jacobi iteration step
! input is the grid and the indices for the interior cells
! new_psi is temp storage for the updated grid
! output is the updated grid in psi and diff which is
! the sum of the differences between the old and new grids
*/
    INT i,j;
    FLT a1,a2,a3,a4,a5;
    a1=1.0;a2=1.0;a3=1.0;a4=1.0;a5=1.0;
    *diff=0.0;
    for(j=j1;j<=j2;j++){
        for(i=i1;i<=i2;i++) {
            new_psi[i][j]=a1*psi[i+1][j] + a2*psi[i-1][j] +
                         a3*psi[i][j+1] + a4*psi[i][j-1] -
                         a5*(i+j);
            *diff=*diff+fabs(new_psi[i][j]-psi[i][j]);
        }
    }
    for(i=i1;i<=i2;i++)
        for(j=j1;j<=j2;j++)
           psi[i][j]=new_psi[i][j];
/*
!error 1 -- arithmetic error divide by zero
*/
    if(ierr == 1 ) {
        psi[1][1]=1.0/psi[0][0];
        printf("%g %g\n",psi[0][0],psi[1][1]);
    }
/*
! error 2 -- array access out of bounds
*/
    if(ierr == 2) {
        psi[3000][3000]=1.0;
    }
/*
! error 3 -- arithmetic error value out of bounds.  we
!            do this inside of the subroutine asinerror
!            to show another level in our call tree.
*/
    if(ierr == 3) {
        asinerror(3.0);
    }
}

void asinerror(FLT x) {
    FLT y;
    y=asin(x);
    printf("asin(%g) returned %g\n",x,y);
}

FLT **matrix(INT nrl,INT nrh,INT ncl,INT nch)
{
    INT i;
    FLT **m;
    m=(FLT **) malloc((unsigned) (nrh-nrl+1)*sizeof(FLT*));
    if (!m){
        printf("allocation failure 1 in matrix()\n");
        exit(1);
    }
    m -= nrl;
    for(i=nrl;i<=nrh;i++) {
        if(i == nrl){ 
            m[i]=(FLT *) malloc((unsigned) (nrh-nrl+1)*(nch-ncl+1)*sizeof(FLT));
            if (!m[i]){
                printf("allocation failure 2 in matrix()\n");
                exit(1);
            }
            m[i] -= ncl;
        }
        else {
            m[i]=m[i-1]+(nch-ncl+1);
        }
    }
    return m;
}


/* signal handler for signal(SIGFPE, fpehandler) - floating point exceptions */
void fpehandler(int sig_num)
{
    void *array[10];
    size_t size;
    char **strings;
    int i;
    
    printf("SIGFPE: floating point exception occurred, exiting with signal %d.\n",sig_num);
    
/* 
! the following can provide additional traceback information
!   but it is not particularly useful because it is addresses
!   not line numbers
*/
   
/*
// get void*'s for all entries on the stack
    size = backtrace(array, 10);
// print out all the frames to stderr
    backtrace_symbols_fd(array, size, 2);

    size = backtrace (array, 10);
    strings = backtrace_symbols(array, size);
    printf ("Obtained %zd stack frames.\n", size);
    for (i = 0; i < size; i++)
        printf ("%s\n", strings[i]);

    free (strings);
*/
    abort();
}

Again we read the first command line argument as an integer. If the value is 1 we get a divide by zero which returns "inf". 2 gives us a Segmentation fault. 3 gives a value of "nan" for asin(3.0). We get the same results for the Intel (icc), Portland Group (pgcc), and gcc compilers.

[tkaiser@ra state]$ gcc stc_00.c -o stc_00g -lm

[tkaiser@ra state]$ ./stc_00g 1
0 inf
2.682e+07

[tkaiser@ra state]$ ./stc_00g 2
Segmentation fault (core dumped)

[tkaiser@ra state]$ ./stc_00g 3
asin(3) returned nan
2.682e+07

Now notice in our program we test for the presence of a second command line argument. If we have one then we call

    if(argc > 2) {
        feenableexcept(FE_ALL_EXCEPT);
        sscanf(argv[2],"%d",&isig);
        if(isig > 0) {
            signal(SIGFPE, fpehandler);
        }
    }

The routine feenableexcept turns on checking for exceptions. An exception in this case is a problem in a calculation. The macro FE_ALL_EXCEPT says to check for several types of exceptions including divide by zero and invalid arguments. When an exception occurs there is a signal that is generated and sent to the program.

Think of a signal to a program as it being hit with a hammer. For example, when you do a control-C to kill a program it is being sent a signal to terminate. Signals can be ignored, or they can be handled in a default way, or they can be handled by calling a routine that you specify.

The routine "signal" sets up a handler for a signal. In our case if isig is greater than 0 then we call the routine signal(SIGFPE, fpehandler). Then whenever your program gets a SIGFPE signal that a floating point exception has occurred, whatever it is doing it stops and calls the routine "fpehandler". Our routine fpehandler is:

void fpehandler(int sig_num)
{
        printf("SIGFPE: floating point exception occurred, exiting with signal %d.\n",sig_num);
        abort();
}

This will give us a little more information than just the default exception handler. Let's look at the differences, first with no exception testing, then with exception testing turned on but using the default handler, then finally with our handler.

[tkaiser@ra state]$ ./stc_00g 1  
0 inf
2.682e+07
[tkaiser@ra state]$ 

[tkaiser@ra state]$ ./stc_00g 1 0
Floating point exception (core dumped)

[tkaiser@ra state]$ ./stc_00g 1 1
SIGFPE: floating point exception occurred, exiting with signal 8.
Aborted (core dumped)

In the first case the divide by zero does not cause program termination. In the second case it does cause program termination. In the final case we call our error handler before termination. The handler routine could be written to provide additional information. For example there is a backtrace_symbols routine that can be called that gives hex addresses for all of the routines in the call tree. This might be useful to some people.
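
As a sketch of that idea, the commented-out block in fpehandler can be reduced to a small routine built on backtrace and backtrace_symbols_fd, which are GNU C library routines declared in execinfo.h. The name print_call_stack and the limit MAX_FRAMES are hypothetical.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 32   /* arbitrary limit chosen for this sketch */

/* Writes the current call stack to stderr, one frame per line, and
   returns the number of frames that were captured. */
int print_call_stack(void)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* writes directly to the file descriptor without calling malloc */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```

The output consists of addresses, not line numbers. With gcc, linking with -rdynamic lets the routine names appear next to the addresses.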

Getting traceback information from Core Dumps

You might have noticed that there is the statement (core dumped) in the output of many of the runs. You most likely will not see this in your output on RA. A core dump is a file that contains a description of the state of a program when it has an abnormal exit. The generation of core files is disabled by default on RA. It can be enabled but even then it will only work in some cases. Please email tkaiser@mines.edu for information.

Core dumps often contain traceback information that can be read using a debugger such as gdb. Debuggers are in general rather complicated, but here is a useful script that drives the gdb debugger for you.

traceback

#!/bin/bash 
gdb $1 -c core* << END
bt
list
quit
END

Given the name of a program that terminated with a core dump, the script starts gdb, reads the core file, prints the backtrace along with a short listing, and exits. We need to be sure that the only core file in the directory is the one generated by the program. Here is an example:

[tkaiser@ra state]$ icc stc_00.c -g -O0  -o stc_00g -lm
[tkaiser@ra state]$ rm core*
[tkaiser@ra state]$ ./stc_00g 3 0
Floating point exception (core dumped)


[tkaiser@ra state]$ ./traceback stc_00g
GNU gdb 6.8-HEAD-2009-12-22-08:58
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Reading symbols from /lib64/tls/libm.so.6...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./stc_00g 3 0'.
Program terminated with signal 8, Arithmetic exception.
[New process 6745]
#0  0x0000000000401a6f in asin ()
(gdb) #0  0x0000000000401a6f in asin ()
#1  0x000000000040142f in asinerror (x=3) at /state/partition1/tkaiser/stc_00.c:111
#2  0x0000000000401412 in do_jacobi (psi=0x50f010, new_psi=0x50f990, diff=0x7fbfffdd08, i1=1, i2=300, j1=1, j2=300) at /state/partition1/tkaiser/stc_00.c:105
#3  0x000000000040112d in main (argc=3, argv=0x7fbfffde28) at /state/partition1/tkaiser/stc_00.c:58

(gdb) 19	
20	int main(int argc, char **argv)
21	{
22	FLT mydiff;
23	FLT **psi;     /* our calculation grid */
24	FLT **new_psi; /* temp storage for the grid */
25	INT i,j;
26	
27	
28	    ierr=0;
(gdb) [tkaiser@ra state]$ 

The lines beginning with #0 through #3 give the traceback for the program, showing that there was a problem with the asin function.

The amount of information you can obtain from a core file is again dependent on the compile options. In general the higher the optimization the less useful the information. Programs compiled without the -g option can also have much of the useful information stripped.

Summary

Current generation optimizing compilers have the "feature" that by default there is very little information provided when a program terminates improperly or when there is an arithmetic problem. Fortran compilers have some compiler options that enable additional information to be provided. It is possible to get C programs to terminate on arithmetic errors by setting run time flags using the feenableexcept call. Core files contain information about the state of a program at exit. These can be "read" using a few simple commands and the gdb debugger.

Files

  1. test.f90
  2. stc_00.c
  3. traceback

Author Information

Timothy H. Kaiser, Ph.D.
tkaiser@mines.edu
Colorado School of Mines

Date:

January 2010, Revised May 2011