
Fault Tolerance Research @ Open Systems Laboratory

Transparent Checkpoint/Restart in Open MPI


Known Issues

Below is a list of known issues:

  • Ticket #1769.
    Applications that use MPI_ANY_SOURCE and MPI_ANY_TAG occasionally hang while a checkpoint is being taken.
  • Ticket #1539.
    There is currently a bug in the OpenIB BTL when it is used with threads enabled. A flaw in the driver's locking logic can occasionally cause it to deadlock. This bug is unrelated to the checkpoint/restart functionality itself, but it can occur if you enable the checkpoint/restart coordination thread while using the OpenIB BTL. To track this issue, see the ticket above for more details. In the meantime, a patch attached to the ticket works around the problem by removing the problematic locks. In our testing this patch did not lead to incorrect behavior, but apply it with care, because it has not been certified by the OpenIB BTL maintainers.