In this paper, we describe an execution-retry method for bypassing software faults in message-passing applications. Building on checkpointing and message logging, we demonstrate the use of message replaying and message reordering as two mechanisms for achieving fast, localized recovery. Our approach gradually increases the rollback distance and the number of affected processes each time a retry fails, and is therefore named progressive retry. An ...
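
The escalation idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the `RetryStep` structure, the `progressive_retry` function, the caller-supplied `execute` callback, and the use of list reversal as a stand-in for message reordering are all assumptions introduced here for illustration. The sketch only shows the control flow in which each failed retry is followed by one with a larger rollback distance or more affected processes, trying replay in the logged order before trying a reordered delivery.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class RetryStep:
    """One escalation level. Hypothetical structure, not taken from the paper."""
    rollback_distance: int        # checkpoints to roll back (assumed unit)
    processes: Tuple[str, ...]    # processes included in the rollback
    reorder: bool                 # False: replay logged order; True: reorder


def progressive_retry(
    steps: Sequence[RetryStep],
    logged_messages: Sequence[str],
    execute: Callable[[RetryStep, List[str]], bool],
) -> Optional[RetryStep]:
    """Try each step in order and return the first one that succeeds.

    `execute` is a caller-supplied stand-in for rolling back the listed
    processes and re-running them with the given message delivery order.
    """
    for step in steps:
        messages = list(logged_messages)  # never mutate the caller's log
        if step.reorder:
            # Stand-in for choosing a different delivery order; a real
            # system would have to respect dependencies between messages.
            messages.reverse()
        if execute(step, messages):
            return step
    return None  # every retry in the plan failed


if __name__ == "__main__":
    log = ["m1", "m2", "m3"]
    # Assumed escalation plan: small and local first, wider and deeper later.
    plan = [
        RetryStep(1, ("P1",), reorder=False),
        RetryStep(1, ("P1",), reorder=True),
        RetryStep(2, ("P1", "P2"), reorder=False),
        RetryStep(2, ("P1", "P2"), reorder=True),
    ]

    def toy_execute(step: RetryStep, messages: List[str]) -> bool:
        # Toy fault model: the fault is bypassed only when two processes
        # roll back and the delivery order differs from the logged one.
        return len(step.processes) >= 2 and messages != log

    print(progressive_retry(plan, log, toy_execute))
```

Ordering the plan from the cheapest retry to the most disruptive one keeps recovery localized whenever an early step succeeds; the wider rollbacks are reached only after the narrower ones have failed.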