Introducing the Windows Coarray Library
Posted on 2018-01-23
With the release of Simply Fortran version 2.41, developers on Windows now have access to true multi-image coarray support in Fortran via our Windows Coarray Library. For those unfamiliar with coarrays, they are a Fortran 2008 feature (though they existed as extensions long before the standard) for parallel processing. When used properly, this programming paradigm allows developers with modern processors to fully exploit multiple processing cores in a standard, portable manner.
A simple "hello world" example might appear as:
    program Hello_World
        implicit none
        integer :: i
        character(len=20) :: name[*]

        if (this_image() == 1) then
            write(*,'(a)',advance='no') 'Enter your name: '
            read(*,'(a)') name
            do i = 2, num_images()
                name[i] = name
            end do
        end if

        sync all

        write(*,'(3a,i0)') 'Hello ',trim(name),' from image ', this_image()
    end program Hello_World
In the above example, we're defining a coarray name that has one entry per image. As the program proceeds, we ask for the user's name from the first image of our program only. Once we have the name, we provide this name to all the other running images:
    do i = 2, num_images()
        name[i] = name
    end do
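Note that accessing the entries for another image requires using square brackets. After ensuring all the images are synchronized (so that we're sure they've all been provided with the user's name), we say hello from every available image. On a dual-core system the program prints one greeting per image: if the user enters Jane, for instance, the output is the two lines "Hello Jane from image 1" and "Hello Jane from image 2", in whichever order the images reach the write statement.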
This example is, of course, trivial. A more exciting example would involve getting actual computations done in parallel, and there are plenty of problems that can be written to do so. Linked below is a simple diffusion example that exploits coarrays to allow parallel computations:
How the Windows Coarray Library Works
Our coarray library is unusual in that it does not require any external control program or runtime. When coarrays are enabled, the compiled executable contains additional code to manage launching additional copies of itself. When first executed, your coarray-enabled executable starts "worker" copies of itself, all under the control of the original executable. By default, the total number of copies of the executable started equals the number of processors reported by Windows. To start a different number, launch any coarray-enabled executable with the command-line argument --coarray-number-images=n, where n is the desired total number of images.
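A quick way to see how many images were actually started is to have each image report itself. The program below is a minimal sketch using only the standard this_image() and num_images() intrinsics; the program name image_report is hypothetical.

```fortran
program image_report
    implicit none

    ! Every image runs this same program. The intrinsics tell each copy
    ! which image it is and how many images exist in total.
    write(*,'(a,i0,a,i0)') 'Image ', this_image(), ' of ', num_images()

end program image_report
```

Assuming the resulting executable is named image_report.exe, starting it as image_report.exe --coarray-number-images=4 should, per the description above, produce one line from each of four images, in no particular order.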
The Windows Coarray Library facilitates communication, data transfer, and synchronization between images using a combination of native Windows synchronization objects and temporary disk access. This combination ensures maximum portability between different versions of Windows, and executables using the Windows Coarray Library should run on any modern Windows system regardless of whether Simply Fortran is present on that system. While the synchronization objects carry very little speed penalty, the disk access can cause data transfers to be somewhat slower.
Basic communication and data transfer are handled via a temporary SQLite database that the coarray executable configures upon execution. The database is not designed to be user-accessible, so we won't explain its details here. It has, however, been optimized for this specific use case: thanks to options enabled within SQLite, images reading and writing simultaneously should not trigger exclusive database locking.
To keep the database fast, only data transfers smaller than 64 bytes are handled within the database itself; larger transfers use unique, single-use temporary files. Although this approach still requires disk access, it allows other data transfer operations between images to progress without waiting for a particular transfer to complete.
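To get a feel for which side of the 64-byte figure a given transfer falls on, the size of a payload can be estimated with the standard storage_size intrinsic, which reports the size of one element in bits. The following is only a rough sketch: the variable names and the helper routine report are illustrative, it assumes 8-bit bytes, and it ignores any bookkeeping overhead the library may add to a transfer. The text above does not say how a transfer of exactly 64 bytes is treated, so the sketch only distinguishes "under 64 bytes" from "64 bytes or more".

```fortran
program transfer_sizes
    implicit none

    integer, parameter :: threshold_bytes = 64   ! figure quoted in the text above
    integer            :: counter
    character(len=20)  :: name
    double precision   :: x(10000)

    ! storage_size is an inquiry function, so the variables need no values
    call report('default integer scalar', storage_size(counter) / 8)
    call report('character(len=20)', storage_size(name) / 8)
    call report('double precision x(10000)', size(x) * (storage_size(x) / 8))

contains

    ! Print the estimated payload size and which side of the threshold it is on
    subroutine report(label, nbytes)
        character(len=*), intent(in) :: label
        integer, intent(in)          :: nbytes

        if (nbytes < threshold_bytes) then
            write(*,'(a,a,i0,a)') label, ': ', nbytes, ' bytes, under 64 bytes'
        else
            write(*,'(a,a,i0,a)') label, ': ', nbytes, ' bytes, 64 bytes or more'
        end if
    end subroutine report

end program transfer_sizes
```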
While the average developer may not care about how specifically we've implemented coarrays, the above information should be taken into consideration when planning development. Here are two important tips based on the design described:
- If transferring an array of values, transfer it as a whole array.
- Only transfer what needs to be transferred.
The first tip addresses a common and unnecessary practice in some Fortran code. Developers should remember that modern Fortran supports vectorized, whole-array operations: arrays and array sections can be assigned directly rather than by looping over each element. Using vectorized assignment or access becomes even more important with coarrays because our coarray implementation has a non-negligible fixed cost per transfer regardless of the quantity of data. If we assign each element of a vector coarray on another image via a loop, we request a separate data transfer for every element. The following would be painfully slow for copying a 10000-element vector:
    ! image 2 stands in for any remote image
    do i=1,10000
        x(i)[2] = x(i)
    end do
To take advantage of transferring an entire block of memory, we should use the following:
    x(1:10000)[2] = x(1:10000)
The vectorized operation will occur quite rapidly compared to the first, but the outcome will be the same.
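The difference can be measured directly. The program below is a minimal sketch, not a benchmark from our test suite: it assumes at least two images are running, uses image 2 as an arbitrary target, and times both approaches with the standard system_clock intrinsic. The program and variable names are illustrative.

```fortran
program transfer_timing
    implicit none

    integer, parameter :: n = 10000
    real    :: x(n)[*]
    integer :: i, t0, t1, t2, rate

    x = real(this_image())
    sync all                       ! every image has initialised its copy of x

    if (this_image() == 1 .and. num_images() >= 2) then
        call system_clock(t0, rate)

        ! One transfer request per element
        do i = 1, n
            x(i)[2] = x(i)
        end do
        call system_clock(t1)

        ! One transfer request for the whole array
        x(1:n)[2] = x(1:n)
        call system_clock(t2)

        write(*,'(a,f10.4,a)') 'Element by element: ', real(t1 - t0) / real(rate), ' s'
        write(*,'(a,f10.4,a)') 'Whole array:        ', real(t2 - t1) / real(rate), ' s'
    end if

    sync all                       ! image 2 may safely read x after this point
end program transfer_timing
```

Actual timings will depend on the machine and its disk, so no particular numbers are suggested here.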
The second tip is more general with respect to parallel computing. While above we suggest using vectorized transfers for assigning a relatively large array, the operation would be far too costly if we only needed to transfer the end elements, for example. If what we actually needed was to set both ends to be equal, we'd only want to call:
    ! image 2 stands in for any remote image
    x(1)[2] = x(1)
    x(10000)[2] = x(10000)
Again, this simplification would be faster even if we weren't using coarrays, but it becomes drastically more important when it lets us avoid writing the entire array out to disk for another image to read.
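The same idea appears in many parallel grid computations, where each image only needs the edge values of its neighbours rather than their whole arrays. The following is a minimal sketch of that pattern under assumed names (u, left, right) and placeholder data; it is not taken from any particular application.

```fortran
program halo_exchange
    implicit none

    integer, parameter :: n = 10000
    real    :: u(0:n+1)[*]      ! interior cells 1..n plus one ghost cell per side
    integer :: me, left, right

    me    = this_image()
    left  = me - 1
    right = me + 1

    u = 0.0
    u(1:n) = real(me)           ! placeholder data: each image stores its own index
    sync all                    ! neighbours are initialised before anyone writes to them

    ! Push one edge value into each neighbour's ghost cell: two scalar transfers
    if (left >= 1)             u(n+1)[left] = u(1)
    if (right <= num_images()) u(0)[right]  = u(n)
    sync all                    ! ghost cells are filled before they are read

    write(*,'(a,i0,a,f6.1,a,f6.1)') 'Image ', me, ': left ghost = ', u(0), &
                                     ', right ghost = ', u(n+1)
end program halo_exchange
```

Each image sends at most two single values per exchange, regardless of how large n is, which keeps every transfer small.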
While we've tried to ensure our Windows Coarray Library is fully functional, we've still marked it as experimental. We've tested the library with an assortment of Fortran programs using coarrays, from trivial all the way to a coarray test suite that exercises the library. When users do find failure conditions, we'd love to hear about them to improve Simply Fortran and to make sure we're providing the best product to our end users.