Prerequisites:
- SAGA with File and Job package
- Optional:
    * SAGA CPR for Migol
    * SAGA
- SAGA Python bindings (PYTHONPATH must be set accordingly)
- NAMD installation

There are currently three versions of the REMD-Manager:

REMDManager-v2.1.py:
- Support for CPR/Migol
- Optional: SAGA Glide-In/BigJob abstraction (requires Advert Service)
- Optional: Adaptive temperature sampling (supports a variable number of replicas)
- For configuration of Glide-In refer to v3.0
- prepare_MAMD_config() : NPT.conf is changed if necessary (i.e. after an exchange step, new temperatures are assigned)
- get_energy() : the energy is read from "output.txt"; this file name should be set at initialization
- do_exchange() : the exchange is attempted here

0.) Setup
    - Install SAGA/Globus (Migol optional)
    - Optional for Glide-In: Make sure the advert service is configured to use PostgreSQL by checking
      $SAGA_LOCATION/share/saga/saga_adaptor_default_advert.ini:

        [saga.adaptors.default_advert]
        name = default_advert

        [saga.adaptors.default_advert.preferences]
        dbtype = postgresql

        [saga.adaptors.default_advert.preferences.postgresql]
        # The set of parameters used in the dbconnect string
        # for PostgreSQL is the same as accepted by the PQconnectdb
        # function from the libpq library (see here:
        # http://www.postgresql.org/docs/8.1/interactive/libpq.html#LIBPQ-CONNECT)
        dbconnect = dbname=advertdb;host=fortytwo.cct.lsu.edu;port=5432;user=SAGA;password=SAGA_client

    - A Globus credential is required for running REMD via the Globus adaptor
        * LONI: https://docs.loni.org/wiki/Requesting_a_LONI_Grid_Certificate
        * TeraGrid:
            - http://www.teragrid.org/userinfo/access/sso_nontgca.php
            - http://www.ncsa.uiuc.edu/UserInfo/Grid/Security/GetUserCert.html
    - Initialize a proxy certificate with grid-proxy-init before each run
    - Test Globus:
        globusrun -a -r qb1.loni.org

1.) Setup NAMD and RE:
    - Configure NAMD installation.
      On QB the following softenv entries can be added to the .soft file:
        +mvapich-1.0-intel10.1
        +namd-2.6-mvapich
        +nwchem-5.1-mpich
        +mpich-1.2.7p1-Intel-fc-9.1+gcc-3.4.6

    - Create the input file for NAMD. For the first run simply use the template from the svn:
        cp NPT.conf.template NPT.conf

    - Adjust the configuration file (re_manager_v1.conf):
        * adjust working directories (they must exist on the file system!)
        * adjust allocation

        remote_host : qb1.loni.org qb1.loni.org
        replica_count : 2
        remote_host_local_scheduler : pbs pbs
        workingdirectory : /work/luckow/replica/1 /work/luckow/replica/2
        executable : /usr/local/packages/namd-2.6-mvapich-1.0-intel10.1/namd2 /usr/local/packages/namd-2.6-mvapich-1.0-intel10.1/namd2
        queue : workq workq
        project : loni_jha_big loni_jha_big
        arguments : NPT.conf
        totalcputime : 1
        numberofprocesses : 16
        exchange_count : 10
        stage_in_file : 310K-init.coor 310K-init.xsc NPT.conf parm99bs0_all.prm sbox_init.pdb sbox_init.psf
        temperature : 300 310
        advert_host : fortytwo.cct.lsu.edu

      Ensure that the specified working directories exist. You can create the directories like this:
        mkdir /work/luckow/replica/1
        mkdir /work/luckow/replica/2
        ...

2.) Check configuration in REMDManager-v2.1.py. For a simple scenario use:

        """ Config parameters (will be moved to config file in the future) """
        CPR = False
        SCP = False
        GlideIn = False
        AdaptiveSampling = False

3.) The REMDgManager can be run as follows:

        $ grid-proxy-init
        $ python REMDManager-v2.1.py --type=REMD --configfile=re_manager_v1.conf

******************************************************************************************************************************

REMDManager-v3.0.py:
- Refactoring of version 2.1
- Improved configuration mechanism
- Support for CPR/Migol
- Mandatory: SAGA Glide-In/BigJob abstraction (requires Advert Service)
- Adaptive replica size (variable number of MPI processes per replica)

0.) Setup SAGA

1.) RE configuration

1a) Configure Replica-Agent (it is located in the ../bigjob directory):
    - ../bigjob/advert_launcher.py (or ../bigjob/advert_launcher.sh)
        - polls the advert service for jobs
        - executes and monitors jobs
        - see also the README in the ../bigjob/ directory

    * Agent Wrapper bash script (in ../bigjob/):
        $ cp advert_launcher.sh.template advert_launcher.sh
      This is the wrapper script for the Replica-Agent. It ensures that the environment is
      correctly set (MPI version, NAMD version etc.). If necessary, adjust the environment
      settings in this file.

    * Agent Configuration (in ../bigjob/):
        $ cp advert_launcher.conf.template advert_launcher.conf
      Adjust the shell and MPI version if necessary.

1b.) Replica-Manager:
    - REMDManager-v3.0.py
    - spawns Replica-Agents (Glide-Ins) and executes replica jobs through Glide-Ins

        [DEFAULT]
        # RE Manager settings
        re_agent: $HOME/tmp/REMDgManager/src/replica_launcher.sh
        arguments: NPT.conf
        total_number_replica: 16
        number_of_mpi_processes : 16
        exchange_count : 5
        stage_in_file : 310K-init.coor 310K-init.xsc NPT.conf parm99bs0_all.prm sbox_init.pdb sbox_init.psf
        temperature : 300 310 320 330 340 350 360 370 380 390 400 410 420 430 440 450
        advert_host : fortytwo.cct.lsu.edu
        cpr: False
        scp: False
        glide_in: True
        adaptive_sampling: False
        adaptive_replica_size: True

        [QB]
        host: qb1.loni.org
        gridftp_url: qb1.loni.org
        scheduler: pbs
        number_glide_in: 2
        number_nodes: 128
        executable: /usr/local/packages/namd-2.6-mvapich-1.0-intel10.1/namd2
        jobtype: mpi
        working_dir_root: /work/luckow/replica/
        allocation: loni_jha_big
        queue: workq
        userproxy:

      Ensure that the specified working directories exist on all machines. You can create the
      directories like this:
        mkdir /work/luckow/replica/1
        mkdir /work/luckow/replica/2
        ...

3.) Run RE simulation:

        $ grid-proxy-init
        $ python REMDManager-v3.0.py --type=REMD --configfile=re_manager_v3.conf

TODO:
- relative paths for file staging
- automatic directory creation
- cancellation of jobs via the Globus adaptor does not work correctly

Last updated: 10/01/2008
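
Note on working directories: the v3.0 instructions ask for one working directory per replica under working_dir_root, and automatic directory creation is still listed under TODO. The following is a minimal sketch of how that step might be scripted. It is not part of REMDManager. The helper names (replica_dirs, create_dirs) are hypothetical, the numbering scheme <working_dir_root>/1 ... <working_dir_root>/N is inferred from the mkdir examples above, and the check that the temperature list has one entry per replica is an assumption based on the sample configuration. The sketch only creates directories on the local file system; on a remote machine it would have to be run there.

```python
import configparser
import os
import tempfile

# Shortened stand-in for re_manager_v3.conf; {root} is filled in by the caller.
SAMPLE_CONFIG = """
[DEFAULT]
total_number_replica: 4
temperature : 300 310 320 330

[QB]
host: qb1.loni.org
working_dir_root: {root}
"""


def replica_dirs(config_text, section):
    """Return one working directory path per replica for the given resource section."""
    # interpolation=None keeps values such as $HOME or paths containing '%' untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    # Keys from [DEFAULT] are visible in every section.
    count = parser.getint(section, "total_number_replica")
    temperatures = parser.get(section, "temperature").split()

    # Assumption: one temperature per replica, as in the sample configuration.
    if len(temperatures) != count:
        raise ValueError(
            "expected %d temperatures, found %d" % (count, len(temperatures))
        )

    root = parser.get(section, "working_dir_root")
    # Assumption: directories are numbered 1..N below working_dir_root.
    return [os.path.join(root, str(i)) for i in range(1, count + 1)]


def create_dirs(paths):
    """Create each directory; directories that already exist are left alone."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


if __name__ == "__main__":
    # Demonstration against a temporary directory instead of /work/...
    with tempfile.TemporaryDirectory() as root:
        dirs = replica_dirs(SAMPLE_CONFIG.format(root=root), "QB")
        create_dirs(dirs)
        for d in dirs:
            print(d, os.path.isdir(d))
```

To use it with a real configuration, read the contents of re_manager_v3.conf into a string and pass it together with the resource section name (for example "QB") to replica_dirs.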