3. Remote Memory Access

Athapascan-0 Formats provides functions that give access to memory regions on any node executing the parallel program. Such a region is accessed through a DMA region address, which is defined by an address in a node together with a format. DMA regions can be read and written using any basic or user-defined format.

3.1. DMA Regions

The structure that describes remotely accessible data on a given node is called a DMA region. It must be created with the function a0NewDMARegion() by one of the threads of the node that holds the data. To make a DMA region known to threads on remote nodes, it can be sent in a message.

  a0tDMARegion MyRegion;
  int Data;
  ...
  /* create a DMA region address to the integer "Data" */
  a0NewDMARegion(&MyRegion, A0FInt, &Data, 1);
  /* send it to somebody */
  a0FSend(..., &Request, A0FDMARegion, &MyRegion, 1);

3.2. Reading and Writing Remotely

The functions that read from and write to DMA regions are a0FRead() and a0FWrite(), respectively. Asynchronous versions also exist (a0IFRead() and a0IFWrite()). A format must be given to specify where the data read is to be stored, or where the data to be written is taken from. The format is always accompanied by an address and a count, indicating that count elements of the given format are to be loaded or stored contiguously, starting at the given address. The format used remotely, on the node that contains the accessed DMA region, is defined inside the DMA region descriptor itself.

By default the whole DMA region is transferred. When only part of a region is to be accessed, the functions a0FReadPart() and a0FWritePart() can be used. They take the starting element and the number of elements of the DMA region to transfer.

  /* remotely */
  int Data[100];
  ...
  /* create DMA region and send it to someone */
  a0NewDMARegion(&MyRegion, A0FInt, &Data, 100);
  a0FSend(..., &Request, A0FDMARegion, &MyRegion, 1);

  /* locally */
  int PartData[2];
  int AllData[100];
  ...
  /* receive the DMA region */
  a0FReceive(..., &Request, A0FDMARegion, &RemoteRegion, 1);
  /* read all elements */
  a0FRead(&RemoteRegion, &Request, A0FInt, &AllData, 100);
  ...
  /* write only a part (elements 7 and 8: start=7 size=2) */
  a0FWritePart(&RemoteRegion, 7, 2, &Request, A0FInt, &PartData, 2);
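The start/count arithmetic behind a partial transfer can be pictured with a small local model. The sketch below is not Athapascan-0 code: region_model, model_read_part() and model_write_part() are hypothetical names, and the transfer is a plain memcpy() inside one process. It assumes that elements are stored contiguously and numbered from 0, which the text above does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DMA region descriptor. It is NOT the real
   a0tDMARegion, whose layout is not described in this section. */
typedef struct {
    void  *base;       /* address of the first element */
    size_t elem_size;  /* size of one element, in bytes */
    size_t count;      /* number of elements in the region */
} region_model;

/* Copy `count` elements, starting at element `start`, out of the region. */
void model_read_part(const region_model *r, size_t start, size_t count,
                     void *dst)
{
    /* the requested part must lie inside the region */
    assert(start <= r->count && count <= r->count - start);
    memcpy(dst, (const char *)r->base + start * r->elem_size,
           count * r->elem_size);
}

/* Copy `count` elements from `src` into the region, starting at `start`. */
void model_write_part(region_model *r, size_t start, size_t count,
                      const void *src)
{
    assert(start <= r->count && count <= r->count - start);
    memcpy((char *)r->base + start * r->elem_size, src,
           count * r->elem_size);
}
```

In this model, writing two elements with start 7 touches elements 7 and 8 only and leaves the rest of the region unchanged, which mirrors the a0FWritePart() call above.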

3.3. Memory Locks

Memory locks allow cooperating threads to perform consistent operations on DMA regions, but do not guarantee exclusive access. That is, the threads may still access DMA regions without using memory locks, possibly resulting in inconsistencies.
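The difference between a cooperative lock and exclusive access can be illustrated with a single-process model. This is only a sketch under stated assumptions: guarded_int and the functions below are hypothetical and unrelated to the Athapascan-0 API, and a non-blocking try-lock is used so that the example stays single-threaded, whereas a0Lock() blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a value plus an advisory "locked" flag. */
typedef struct {
    int  value;   /* the shared data */
    bool locked;  /* set while some cooperating party holds the lock */
} guarded_int;

/* Take the lock if it is free; report whether it was taken. */
bool guarded_try_lock(guarded_int *g)
{
    if (g->locked)
        return false;
    g->locked = true;
    return true;
}

/* Release a lock that is currently held. */
void guarded_unlock(guarded_int *g)
{
    assert(g->locked);
    g->locked = false;
}

/* Cooperating update: modifies the value only while holding the lock. */
bool cooperative_add(guarded_int *g, int delta)
{
    if (!guarded_try_lock(g))
        return false;          /* somebody else holds the lock */
    g->value += delta;
    guarded_unlock(g);
    return true;
}

/* Non-cooperating update: ignores the lock, and nothing prevents it. */
void unchecked_store(guarded_int *g, int v)
{
    g->value = v;
}
```

While the lock is held, cooperative_add() refuses to touch the value, but unchecked_store() still succeeds. Consistency therefore depends on every thread following the locking discipline.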

The following example sets up a lock on a DMA region. After acquiring the lock, the code reads the DMA region, modifies the value, and writes it back. Then the lock is released. The functions a0Lock() and a0Unlock() block the calling thread until the lock has been acquired or released, respectively.

  a0tLock lock;
  a0tDMARegion region;
  int count;

  a0SetLockDMARegion(&lock, 0, &region);
  a0Lock(&lock, 1, NULL);
  a0FRead(&region, NULL, A0FInt, &count, 1);
  count++;
  a0FWrite(&region, NULL, A0FInt, &count, 1);
  a0Unlock(&lock, 1, NULL);

To reduce the amount of communication, combined versions of read-and-lock and write-and-unlock are provided. The same code can then be written as:

  a0tLock lock;
  a0tDMARegion region;
  int count;

  a0FReadLock(&lock, &region, NULL, A0FInt, &count, 1);
  count++;
  a0FWriteUnlock(&lock, NULL, A0FInt, &count, 1);

The locking mechanism also allows parts of DMA regions to be locked. More than one thread may hold a lock on a given DMA region, provided the locks refer to disjoint parts of that region. Locking is based on the memory addresses of the parts referred to, so locking different DMA regions that cover the same memory addresses may result in a conflict.
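Since locks are tied to memory addresses, whether two locked parts conflict comes down to whether their byte ranges intersect. The following is a minimal sketch of such a test, not the library's actual implementation: locked_range, range_of() and ranges_conflict() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a locked part: a byte range in node memory. */
typedef struct {
    uintptr_t start;   /* address of the first locked byte */
    size_t    length;  /* number of locked bytes */
} locked_range;

/* Byte range covered by `count` elements of size `elem_size`,
   starting at element `first` of the data located at `base`. */
locked_range range_of(const void *base, size_t elem_size,
                      size_t first, size_t count)
{
    locked_range r;
    assert(base != NULL);
    r.start  = (uintptr_t)base + first * elem_size;
    r.length = count * elem_size;
    return r;
}

/* Two non-empty ranges conflict when they share at least one byte. */
bool ranges_conflict(locked_range a, locked_range b)
{
    return a.length > 0 && b.length > 0 &&
           a.start < b.start + b.length &&
           b.start < a.start + a.length;
}
```

With a 10-element array, elements 0 to 4 and elements 5 to 6 give disjoint ranges and could be locked by different threads. A range over elements 4 to 6 conflicts with both, even if it was described through a different region that refers to the same array.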

To set several locks, the program must first build a lock table. All the memory regions in the table are locked in a predefined order, to avoid deadlocks. It is considered a programming error for a thread to call a0Lock() while it already holds a lock. The following lines lock a DMA region and elements 5 and 6 of another. The elements of the first region are then read and written into the locked part of the second. Finally, both locks are released.

  a0tLock lock[2];
  a0tDMARegion region1, region2;
  float foo[2];

  a0SetLockDMARegion(    lock, 0, &region1);
  a0SetLockDMARegionPart(lock, 1, &region2, 5, 2);
  a0Lock(lock, 2, NULL);
  a0FRead(&region1, NULL, A0FFloat, foo, 2);
  foo[0] += foo[1];
  a0FWritePart(&region2, 5, 2, NULL, A0FFloat, foo, 2);
  a0Unlock(lock, 2, NULL);
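The text says that the entries of a lock table are locked in a predefined order but does not say which order. One common choice, assumed here purely for illustration, is increasing memory address: if every thread acquires its locks in that order, two threads can never each hold a lock the other is waiting for. The names lock_entry and order_lock_table() are hypothetical and not part of Athapascan-0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical lock table entry: the byte range to be locked. */
typedef struct {
    uintptr_t start;   /* address of the first byte to lock */
    size_t    length;  /* number of bytes to lock */
} lock_entry;

/* qsort() comparison callback: order entries by increasing start address. */
static int by_address(const void *pa, const void *pb)
{
    const lock_entry *a = pa;
    const lock_entry *b = pb;
    return (a->start > b->start) - (a->start < b->start);
}

/* Put the table into the global acquisition order (assumed: by address).
   The caller would then acquire the locks from table[0] to table[n-1]. */
void order_lock_table(lock_entry *table, size_t n)
{
    assert(table != NULL || n == 0);
    if (n > 1)
        qsort(table, n, sizeof table[0], by_address);
}
```

Because the order depends only on the addresses and not on the order in which the program filled the table, two threads that list the same regions in opposite orders still acquire them in the same sequence.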

The optimisation offered by a0FReadLock() is also available when acquiring a table of locks. The functions a0SetLockRead() and a0SetUnlockWrite() mark lock table elements to be read at lock time or written at unlock time, respectively. The code above can be rewritten as shown below: the read is performed together with the lock, and the write together with the unlock.

  a0tLock lock[2];
  a0tDMARegion region1, region2;
  float foo[2];

  a0SetLockDMARegion(    lock, 0, &region1);
  a0SetLockRead(         lock, 0, A0FFloat, foo, 2);
  a0SetLockDMARegionPart(lock, 1, &region2, 5, 2);
  a0SetUnlockWrite(      lock, 1, A0FFloat, foo, 2);
  a0Lock(lock, 2, NULL);
  foo[0] += foo[1];
  a0Unlock(lock, 2, NULL);

Note: the functions a0SetLockRead() and a0SetUnlockWrite() must be called after a0SetLockDMARegion() or a0SetLockDMARegionPart() and before a0Lock().