next up previous contents
Next: Clocks for Timing Up: Checkpointing/Recovery Methods Previous: Checkpointing Invocation   Contents

Recovery Invocation

Recovering from a checkpoint is a two-phase operation for which the flesh provides distinct timebins for recovery routines to be scheduled at:

CCTK_RECOVER_PARAMETERS
This time bin is executed before CCTK_STARTUP, in which the parameter file is parsed. From these parameter settings, the recovery routines should decide whether recovery was requested, and if so, restore all parameters from the checkpoint file and overwrite those which aren't steerable.
The flesh loops over all registered recovery routines to find out whether recovery was requested. Each recovery routine should, therefore, return a positive integer value for successful parameter recovery (causing a shortcut of the loop over all registered recovery routines), zero for no recovery, or a negative value to flag an error.
If recovery was requested, but no routine could successfully recover parameters, the flesh will abort the run with an error message. If no routine recovered any parameters, i.e. if all parameter recovery routines returned zero, then this indicates that this run is not a recovery run.
If parameter recovery was performed successfully, the scheduler will set the recovered flag which--in combination with the setting of the Cactus::recovery_mode parameter--decides whether any thorn routine scheduled for CCTK_INITIAL and CCTK_POSTINITIAL will be called. The default is to not execute these initial time bins during recovery, because the initial data will be set up from the checkpoint file during the following CCTK_RECOVER_VARIABLES time bin.
CCTK_RECOVER_VARIABLES
Recovery routines scheduled for this time bin are responsible for restoring the contents of all grid variables with storage assigned from the checkpoint.
Depending on the setting of Cactus::recovery_mode, they should also decide how to treat errors in recovering individual grid variables. Strict recovery (which is the default) requires all variables to be restored successfully (and will stop the code if not), whereas a relaxed mode could, e.g. allow for grid variables, which are not found in the checkpoint file, to be gracefully ignored during recovery.


next up previous contents
Next: Clocks for Timing Up: Checkpointing/Recovery Methods Previous: Checkpointing Invocation   Contents