Fortran: coding for scientists, by scientists

Come back through time to the days of FORTRAN – the language of fluid dynamics, computational physics and more.

FORTRAN (it dropped the caps in 1990) is the oldest high-level language still written today. It’s now over 55 years old and still in widespread use in the sciences, in high-performance computing, and in supercomputers. Its real strength is in numerical computation and complicated mathematical models (making it also popular in finance); and its position is hard to assail given the vast Fortran code library of numerical computation routines that’s available. There are even people still using fixed-format F77 (see below), although most modern users have shifted to the easier free-format. It’s probably not your language of choice for shiny Web 2.0 development, but it’s fascinating to have a look at something with such a venerable and successful history.

History

John Backus, at IBM, proposed developing Fortran in 1953, with the aim of producing a more practical alternative to assembly language. Assembly language (now as then) uses mnemonics like ADD to describe basic operations, which are then translated into machine code (as you’ll know from the assembly tutorial on page 106). This is one up from programming directly in machine code (as in the very earliest computers), but is still massively time-consuming and makes bugs very difficult to find. Backus’ idea was to create a program that could turn something like mathematical notation into machine code. This wasn’t a totally novel idea. UNIVAC had Short Code, which did something similar but very slowly around 1949, and Grace Hopper at UNIVAC was proposing a similar idea to Backus’ at about the same time (resulting in FLOW-MATIC, an ancestor of COBOL). But in 1953, there was nothing that really looked like a useful, functional, high-level language. IBM gave Backus the go-ahead.

Backus’ team had completed a draft specification by 1954, and the first compiler was delivered in April 1957. An important aspect was that it was an optimising compiler; the language would (understandably) never be popular if it couldn’t compete on performance with hand-coded assembler, which had been the main problem with Short Code. FORTRAN’s focus was on numerical computation, as that was the chief use for IBM’s machines at the time. It caught on extremely quickly, making it the first successful high-level language (that is, a language which is at least to some extent removed from the details of machine code).

The major advantage, and the reason for its popularity, was that programs could now be written much more quickly. Although FORTRAN was initially very limited in scope, it enabled the programmer to think in terms of algorithms, without then having to translate that by hand into machine instructions. Instead, the compiler did that hard work – and did it well. Its rapid spread led to manufacturers developing compilers for many different computers, making it arguably the first cross-platform language. This in turn helped popularise it further, because FORTRAN programs written for one computer could be run on another computer; something that just wasn’t possible with assembler.

This first version of FORTRAN had 32 statements, including flow control (IF, GOTO, etc), I/O, and assignment statements. It was stored on punchcards, with one card per line of code. FORTRAN II, in 1958, allowed the programmer to create subroutines and functions, including passing parameters by reference, and global (or COMMON) variables. Subroutines could not however be recursive, as computers at the time didn’t have a ‘stack’ concept. FORTRAN III was never released, and FORTRAN IV, begun in 1961, had a few minor improvements.

FORTRAN 66 was the next big step, providing an ‘industry-standard’ version of the language, based on FORTRAN IV. They also defined Basic FORTRAN, which removed all machine-dependent features (ie anything that relied on a particular brand of computer to work). Afterwards, however, compilers with various extensions were released to take advantage of other possible features, which led the ANSI committee to revise the standard again. This took nearly a decade, but the update was finally circulated in 1977, resulting in FORTRAN 77, perhaps the most historically important dialect. Unlike earlier versions, F77 code is still running out in the world, and FORTRAN compilers now available will compile F77 as well as later versions. It had improved support for structured programming, now the norm in programming, much improved character input/output support (previously characters were unsupported, and had to be placed into numeric variables via the Hollerith constants technique, which was not very portable), support for lexical string comparison (demanded by the US Dept of Defence), and various loop extensions. F77 was the “Standard FORTRAN” for nearly fifteen years.

The next major rewrite, after many delays, was Fortran 90 (caps now abandoned). (For more on the delays and the byzantine behind-the-scenes wrangling, see Brian Meek’s somewhat world-weary piece The Fortran Saga (https://www.fortran.com/forsaga.html).) The biggest change was that free-form source input was now allowed (see the section on fixed-form source). Array operations were also finally added, as was recursion, operator overloading, dynamic memory allocation, and much improved data structure handling. Modules were also now available, improving program structure and reflecting new programming practice since 1977. However, all F77 features were retained, and any compliant F77 program should also be a compliant F90 program. Fortran 95 was a minor revision, although it deleted a few features (dating back to pre-F77) that had been labelled obsolete in Fortran 90.

Another major revision has since been published, Fortran 2003 (and a minor update in 2008). This includes, among other things, object-oriented programming support, improved floating-point handling, and improved I/O. However, if you want to learn Fortran today, it’s usually advised that you start with 90/95, and then go on to learn about the new features added by 2003. The rest of this article will use Fortran 95.

Image

Hello there! Check out the user input boxout below and see if you can improve this code to ask your name.

Hello World

To get started with Fortran, the easiest compiler to get hold of is the GNU compiler GFortran. Install the gfortran package for your distribution, or download it from the GNU website and compile from source. Other compilers include G95 (free) and NAG (paid). Some compilers include extensions beyond the Fortran standard, which can be handy but equally lock you into that compiler; here we’ll stick with the official standard.

You can run GFortran with the command gfortran or f95; I’ll use f95. Once you’ve got the compiler installed, save this as hello.f95):

! Hello World program hello print *,”Hello World” end program hello

Compile it with f95 -o hello hello.f95. -o hello specifies the output file; without this option, the default is to compile to a.out. Run it with ./hello and you should get the traditional Hello World output.

Looking at the source, you can see that comments start with !, and that programs begin with program <name> and end with end program <name>. print * means print to screen (print), with the appropriate format for the output (*). Both double and single quotes can be used from Fortran 90 onwards (single only for FORTRAN 77). Indentation isn’t required in free format source code, but it does make for code that’s easier to read.

Fixed Format and Free Format

Prior to Fortran90, Fortran code had to be written in fixed format. Initially, this was because FORTRAN in its earliest days was written on punch cards, and a specific format of card was expected. The basic rules looked like this:

  • Maximum line length 72 characters. To continue a line, put any character in column 6 on the next (continuation) line.
  • The first 6 columns must be empty (so each line starts with 6 spaces), unless the line is a continuation line or a comment.
  • Comment lines have * or c in the first column.
  • Spaces are ignored altogether. endprogram and end program are read exactly the same.

So the “Hello World” program above would look like this in fixed format:

c Hello World program hello print *,’Hello World’ end program hello

This is, as you will readily understand, a bit of a faff. Happily, as of Fortran 90, free format was introduced; though fixed format is still understood by Fortran compilers, to make sure old code could still be used. Free format looks a lot more like any other modern language (all the rest of the code in this tutorial is in free format), but there are still some rules:

Lines can be up to 132 characters long; to continue a line use & at the end of the first line.

If the split is in the middle of a name, use another & at the start of the next line:

character :: longname*100 longname = “this is a terribly long line, so you need an ampersand here & &and another one here”

Comments start with !.

Whitespace doesn’t matter in some places: ENDIF and end if are treated the same, and indentation is not significant. However, you can’t have embedded spaces in variable names or in numbers, as you could in earlier FORTRAN.

You can put multiple statements on one line by separating them with ; (although the next person reading your code may or may not thank you for it).

Most compilers assume free format if the source filename ends in .f95 or .f90, and fixed format if it
is .for. You can also specify -ffixed-form or -ffree-form on the Linux command line when invoking the GCC compiler.

The logistic equation

The logistic equation is an example of a very basic population model that shows some distinctive characteristics of feedback systems. It can be defined in a couple of ways, but we’ll use this one:

Pi+1 = R * Pi * (1 - Pi)

Pi is the population at generation i. R is a growth rate term; you could think of it as representing the resources available. A more basic version of this equation would be Pi+1 = R * Pi Pi, in which the population at generation i+1 is purely dependent on the population at generation i plus this growth rate. A system with better resources would grow faster. The (1 – Pi) term is added to represent the idea that as populations grow larger, expansion becomes harder. What we might expect is that over term, the value converges, and eventually Pi and Pi+1 become the same.

So here’s a first go at an iterative program that looks at what does happen to population over time (specifically, over 1,000 generations). Save this as logistic.f95:

! Logistic equation program logistic implicit none integer :: i, n real :: R, x, xnext character(len=30) :: f n = 1000 R = 1.5 x = 0.01 f = “(i6, f12.6)” do i = 1, n xnext = R * x * (1-x) write (*,f) i, xnext x = xnext end do end program logistic

Any line beginning ! is a comment. implicit none is an inheritance from FORTRAN 77 and earlier. Originally, Fortran had implicit variable typing, done by variable name. Variables beginning i, j, k, l, m, or n were taken to be integers (since scientists and mathematicians – and Fortran was written by scientists – expect i and j to be integers), and all other names were real numbers, unless explicitly declared to be strings or characters. Due to the desire for backwards compatibility, this had to still work in F90 and onwards; but it’s not a good modern programming practice. The solution was implicit none, which turns off implicit typing and requires all variables to be explicitly declared.

The next three lines do the declaring, creating integers, real numbers (real has 6 decimal digits of precision), and a string, which is declared as a character array of length 30. Other basic data types are double precision (13 digits of precision), logical (true/false, like a Boolean variable), and complex (complex numbers). Fortran 90 also introduced derived types so you can create your own types. Note that we are specifying the initial value of P0 (x = 0.01) as well as R.

The string f will be a formatting line for the output; the format here is (i6, f12.6). i6 is an integer format, requiring up to 6 positions (ie it could output a number up to 999999, or -99999 as the negative sign takes up an output position). If you tried to output 1000000 it would print ****** to warn you that the number was too large for the format string. f12.6 is for real numbers, and indicate 12 positions, of which 1 will be the decimal point, and 6 will be to the right of the decimal point. So you could print up to 99999.999999. If there are more than 6 numbers after the decimal point it will be rounded appropriately. Again, if the whole number is too large, it prints asterisks.

The main body of the program is the do loop, which loops for i between 1 and n (here 1000). It calculates xnext (xi+1 in the equation) from x (xi), outputs i and xnext to screen, then stores the calculated value in x for the next loop. Compile and run as follows:

f95 -o logistic logistic.f95 ./logistic > plot.dat

This saves the output as a file, but it’s just a file of numbers. To view it as a plot, you’ll need to use Gnuplot and Evince (or another PostScript viewer). Save this as plot.gp:

set term postscript enh color set output “plot.ps” plot “plot.dat” u 1.2

The details of Gnuplot are outside the scope of this tutorial, but basically this sets up your terminal, sets the output filename, and tells Gnuplot to plot the contents of plot.dat using the first two columns as x and y axes. Generate and view the plot with

gnuplot plot.gp evince plot.ps

If you look at the output file plot.dat you’ll see that the equation rapidly converges to 0.333333. This suggests that a stable value for the population is 0.333333. This might be the same as the carrying capacity of the environment (ie the population level that the environment can support), but it might also be lower. Try changing the value of x to see if the start size of the population has an effect; and R to look at what effect the growth rate has.

Image

The interesting stuff is as R gets bigger, especially between about 3 and 4.

Multiple loops

Here’s another version, with multiple loops. This will allow us to look at what happens to the convergence point with different values of R (x here stays the same).

! Logistic equation program logistic implicit none integer :: i, j, m, n real :: R, x, xnext character(len=30) :: f m = 4000 n = 1000 f = “(i6, f12.6)” do i = 1, m R = i R = R/1000 x = 0.01 do j = 1, n xnext = R * x * (1-x) write (*,f) R, xnext x = xnext end do end do end program logistic

The meat of the program is still the same (that inner do loop, now labelled with j). But it now runs once for each value of R between 1 and 4. Since i varies between 1 and 4000, and R is defined as 1/1000 of i, it jumps in 0.001 steps. Note that if you try to define R = i/1000 it won’t work; dividing an integer always results in an integer. Instead, define R as i (translating it into a real number) then divide it by 1000.

Because of the way this is plotted, the long thick lines for R between 1 and 1.5 actually represent the population converging to a specific level (the top of the line, which is the population stable point for that value of R and x. (You could edit the code to look for the convergence point and only plot that, if it exists.) As R gets bigger, the graph gets more complicated. At some values there seem to be multiple convergence points; towards the far right of the graph, what we’re seeing is actually a chaotic effect.

A couple of potential improvements would be to try changing the initial value of x to see what happens at different R values; and to test for a convergence point and to plot only that convergence point if it exists, rather than all the points that lead it there. This could make the graph a bit clearer. But this initial graph should be enough to demonstrate the possibilities of feedback equations like this. If this sort of numerical modelling is your bag, you can find sample Fortran programs online for nearly anything you want to do. There’s extensive documentation available, although Fortran programs can be a bit of a nightmare to debug as error messages are not always readily comprehensible. Try it out, and enjoy that sense of hacker companionship linking you to those 1950s IBM pioneers who paved the way for the vast field of modern languages.

User input

You could set up the program to ask you for a specific version of x and R. Add these lines:

integer :: outf

character(len=30) :: outfile

outf = 3

outfile = “plot_v1.dat”

print *, “Please enter an initial x value between 0 and 1”

read *, x

print *, “Please enter an initial R value greater than 1”

read *, R

open (unit=outf, file=outfile, action=”write”, status=”replace”)

! Do loop and other variables as before, but alter the write line:

write (outf, f) i,xnext

outf and outfile are used to write to a specific file, rather than specifying the file on the command line. This is because if you use > to redirect on the command line, the print lines will also be redirected. You won’t see them, and they’ll cause problems in the data file. With open, as here, you need to provide an integer filehandle (should be greater than 3, as 0, 1, and 2 are the system filehandles; and Fortran itself reserves 5 and 6), a string filename, and options: here, the write action, and replace rather than append status.