Next: Adding a Recovery Method Up: Providing Your own Checkpointing/Recovery Previous: Providing Your own Checkpointing/Recovery   Contents

Adding a Checkpointing Method

Checkpointing is similar to performing full output of an individual grid variable, except that

  1. the output is done for all grid variables existing in a grid hierarchy
  2. in addition to the contents of all variables, the current settings of all parameters are saved, along with other information needed to recover from the checkpoint at a later time

A thorn routine providing this checkpointing capability should register itself with the flesh's scheduler at the CPINITIAL (for initial data checkpoints), CHECKPOINT (for periodic checkpoints of evolution data), and TERMINATE time bins (for checkpointing the last timestep of a simulation).

It should also decide whether checkpointing is needed by evaluating the corresponding checkpoint parameters of IOUtil (see section A6.9).
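
The decision logic for the three time bins might be sketched as follows. This is a minimal illustration, not IOUtil's actual code: the struct, enum, and function names are hypothetical, and the fields are modelled on IOUtil's checkpoint_ID, checkpoint_every, and checkpoint_on_terminate parameters, whose exact names and semantics should be checked against IOUtil's documentation. In a real thorn the values would come from the parameter declarations rather than from a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the checkpoint parameters read from IOUtil. */
typedef struct {
  bool checkpoint_ID;            /* checkpoint the initial data? */
  int  checkpoint_every;         /* checkpoint every N iterations; <= 0 disables */
  bool checkpoint_on_terminate;  /* checkpoint the last timestep? */
} CheckpointParams;

/* Hypothetical tags for the three time bins the routine is scheduled at. */
typedef enum { BIN_CPINITIAL, BIN_CHECKPOINT, BIN_TERMINATE } TimeBin;

/* Returns true if a checkpoint should be written in the given time bin. */
bool CheckpointNeeded(const CheckpointParams *p, TimeBin bin, int iteration)
{
  assert(p != NULL);
  switch (bin) {
    case BIN_CPINITIAL:
      return p->checkpoint_ID;
    case BIN_CHECKPOINT:
      /* periodic checkpoint: only on multiples of checkpoint_every */
      return p->checkpoint_every > 0 && iteration % p->checkpoint_every == 0;
    case BIN_TERMINATE:
      return p->checkpoint_on_terminate;
  }
  return false;
}
```

Keeping the decision in one function lets the routines registered at the three time bins share the same logic and differ only in the bin tag they pass in.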

Before dumping the contents of a distributed grid array into a checkpoint file, the variable should be synchronized, unless the scheduler has already synchronized it implicitly.

To gather the current parameter values you can use the C routine

  char *IOUtil_GetAllParameters (const cGH *GH, int all);
from thorn IOUtil. This routine returns the parameter settings as a single large, allocated string. Its second argument, all, flags whether all parameter settings should be gathered (all != 0) or only those which have been set before (all == 0). Note that you should always save all parameters in a checkpoint so that the same state can be reproduced after recovery.
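
One way to store the returned parameter string is as a length-prefixed record, sketched below. The helpers WriteParameterString and ReadParameterString are hypothetical and not part of IOUtil, and the record layout is an assumption chosen for illustration; a real checkpoint method would use whatever its file format provides for string data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write the parameter string as a 64-bit length
   followed by the raw bytes. Returns 0 on success, -1 on I/O error. */
int WriteParameterString(FILE *fp, const char *params)
{
  assert(fp != NULL && params != NULL);
  uint64_t len = (uint64_t) strlen(params);
  if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
  if (len > 0 && fwrite(params, 1, (size_t) len, fp) != (size_t) len) return -1;
  return 0;
}

/* Hypothetical counterpart: read the record back into a malloc'ed,
   NUL-terminated string. Returns NULL on error; the caller frees it. */
char *ReadParameterString(FILE *fp)
{
  assert(fp != NULL);
  uint64_t len;
  if (fread(&len, sizeof len, 1, fp) != 1) return NULL;
  char *buf = malloc((size_t) len + 1);
  if (buf == NULL) return NULL;
  if (len > 0 && fread(buf, 1, (size_t) len, fp) != (size_t) len) {
    free(buf);
    return NULL;
  }
  buf[len] = '\0';
  return buf;
}

/* In a thorn this might be used as follows (assumption: the caller owns
 * the string returned by IOUtil_GetAllParameters and frees it):
 *
 *   char *params = IOUtil_GetAllParameters (GH, 1);
 *   WriteParameterString (fp, params);
 *   free (params);
 */
```

The length is written in the machine's native byte order, so this layout is only suitable when the checkpoint is recovered on a compatible architecture.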

As additional data necessary for proper recovery, the following information must be saved in a checkpoint file:

  • the current main loop index (used by the driver as the main evolution loop index)
  • the current CCTK iteration number (GH->cctk_iteration)
  • the physical simulation time (GH->cctk_time)
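
The three items above could be collected in a small header record, as in the following sketch. The struct, the function names, and the text layout are hypothetical; the sketch also assumes that the loop index and iteration number fit in an int and that the simulation time can be stored as a double.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record of the extra data needed for recovery. */
typedef struct {
  int    main_loop_index;  /* driver's main evolution loop index */
  int    cctk_iteration;   /* copied from GH->cctk_iteration */
  double cctk_time;        /* copied from GH->cctk_time (assumed double) */
} CheckpointHeader;

/* Write the header as text; 17 significant digits preserve a double
   exactly. Returns 0 on success, -1 on I/O error. */
int WriteCheckpointHeader(FILE *fp, const CheckpointHeader *h)
{
  assert(fp != NULL && h != NULL);
  int n = fprintf(fp,
                  "main_loop_index=%d\ncctk_iteration=%d\ncctk_time=%.17g\n",
                  h->main_loop_index, h->cctk_iteration, h->cctk_time);
  return n < 0 ? -1 : 0;
}

/* Read the header back. Returns 0 on success, -1 if the input is malformed. */
int ReadCheckpointHeader(FILE *fp, CheckpointHeader *h)
{
  assert(fp != NULL && h != NULL);
  int n = fscanf(fp,
                 " main_loop_index=%d cctk_iteration=%d cctk_time=%lf",
                 &h->main_loop_index, &h->cctk_iteration, &h->cctk_time);
  return n == 3 ? 0 : -1;
}
```

A text layout is used here because it is independent of byte order and struct padding; a binary checkpoint format would store the same three values as attributes or fixed-size fields instead.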

Moreover, it may be useful to save information about the I/O mode used to create the checkpoint (chunked or unchunked, serial or parallel I/O), the list of active thorns, and this run's Cactus version ID (for compatibility checking at recovery time).

