Next: Clocks for Timing
Up: Checkpointing/Recovery Methods
Previous: Checkpointing Invocation
Contents
Recovering from a checkpoint is a two-phase operation for which the flesh
provides distinct timebins for recovery routines to be scheduled at:
- CCTK_RECOVER_PARAMETERS
- This time bin is executed before
CCTK_STARTUP, in which the parameter file is parsed. From these parameter
settings, the recovery routines should decide whether recovery was
requested, and if so, restore all parameters from the checkpoint file and
overwrite those which aren't steerable.
The flesh loops over all registered recovery routines to find out
whether recovery was requested. Each recovery routine should, therefore,
return a positive integer value for successful parameter recovery (causing
a shortcut of the loop over all registered recovery routines),
zero for no recovery, or a negative value to flag an error.
If recovery was requested, but no routine could successfully recover
parameters, the flesh will abort the run with an error message.
If no routine recovered any parameters, i.e. if all parameter
recovery routines returned zero, then this indicates that this run
is not a recovery run.
If parameter recovery was performed successfully, the scheduler will set the
recovered flag which--in combination with the setting of the Cactus::recovery_mode parameter--decides whether any thorn routine
scheduled for CCTK_INITIAL and CCTK_POSTINITIAL will be called.
The default is to not execute these initial time bins during recovery,
because the initial data will be set up from the checkpoint file during the
following CCTK_RECOVER_VARIABLES time bin.
- CCTK_RECOVER_VARIABLES
- Recovery routines scheduled for this time bin are responsible for restoring
the contents of all grid variables with storage assigned from the
checkpoint.
Depending on the setting of Cactus::recovery_mode, they should also
decide how to treat errors in recovering individual grid variables. Strict
recovery (which is the default) requires all variables to be restored
successfully (and will stop the code if not), whereas a relaxed mode
could, e.g. allow for grid variables, which are not found in the checkpoint
file, to be gracefully ignored during recovery.
Next: Clocks for Timing
Up: Checkpointing/Recovery Methods
Previous: Checkpointing Invocation
Contents
|