Mastering the use of hpx_run.py

If Your HPX Application Hangs

If your application seems to be hanging, there is an easy way to kill all of the hung processes. For a multi-locality run on a single SMP machine, just press Ctrl-C and hpx_run.py will take care of killing them. Running on a cluster is more of a challenge, because hpx_run.py works by sending the command to the other nodes over ssh. If you Ctrl-C out of a cluster run, the processes on the other nodes keep running. Whenever this happens, hpx_run.py writes a script called cleanup*.sh that you can run to send the kill command to the hanging processes.

Note that, as of right now, hpx_run.py has only been tested for distributed multi-locality runs on Arete, and only when the script is launched from one of the compute nodes. If you try to run on other clusters, your mileage may vary.
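To make the idea behind the cleanup script concrete, here is a minimal Python sketch of sending a kill command to each node over ssh. It is not the contents of cleanup*.sh, which are not shown in this tutorial. The function name build_kill_commands, the host names, and the PIDs are all made up for illustration, and the sketch only builds the command lines instead of running them.

```python
import shlex


def build_kill_commands(pids_by_host, signal="TERM"):
    """Return one ssh command line per host that kills the given PIDs.

    Hypothetical helper: hpx_run.py's real cleanup script may look different.
    """
    commands = []
    for host, pids in sorted(pids_by_host.items()):
        if not pids:
            continue  # nothing left to kill on this node
        remote = "kill -{} {}".format(signal, " ".join(str(p) for p in pids))
        # Quote the remote command so ssh passes it along as one argument.
        commands.append("ssh {} {}".format(shlex.quote(host), shlex.quote(remote)))
    return commands


# Made-up hosts and PIDs, for illustration only.
for line in build_kill_commands({"node0": [4321], "node1": [4322, 4323]}):
    print(line)
```

Printing the commands first, rather than executing them, makes it easy to check what would be killed before anything is sent to the other nodes.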

Silent Mode

There is a trick for cutting down the amount of text that hpx_run.py prints. If all you care about is the output from your program, use the --shhh option (given as -s in the examples below) to significantly reduce what hpx_run.py prints. This is what it looks like in action:

$ hpx_run.py -s -l 1:1 "fibonacci"
Runtime 'rts0' stdout:
elapsed: 0.019939, result: 55

$ hpx_run.py -s -l 2:1 "fibonacci"
Runtime 'rts1' stdout:
elapsed: 0.129291, result: 55

Debug Mode

I told you in the first tutorial that hpx_run.py just automates setting up multi-locality runs. If you want to see which commands it actually runs, use the --debug option (given as -d in the example below) to do a dry run: it does not start anything, it only prints the commands it would have run. This is what it looks like in action:

$ hpx_run.py -d -l 2:1 "fibonacci"
System view:
        2 nodes:
        Node 'node1' with 1 cores
        Node 'node0' with 1 cores

Locality set:
        2 localities:
        Locality 'L0' with 1 threads
        Locality 'L1' with 1 threads

Distributed runtime:
        2 local instances:
        Runtime 'rts1' with 1 threads
        Runtime 'rts0' with 1 threads

Local runtime instance 'rts1':
        HPX command: fibonacci -r -a localhost:2222 -x localhost:2223 -l 2 -t 1 
        Environment: (Empty)

Local runtime instance 'rts0':
        HPX command: fibonacci -w -a localhost:2222 -x localhost:2224 -l 2 -t 1 
        Environment: (Empty)
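If you want to reuse the commands from a dry run, for example to start one instance by hand, a small script can pull them out of the output. The following Python sketch assumes only the layout shown in the listing above ("Local runtime instance '<name>':" followed by an "HPX command:" line). The function name extract_hpx_commands is made up for illustration and is not part of hpx_run.py.

```python
def extract_hpx_commands(dry_run_text):
    """Map each runtime instance name to its HPX command string.

    Assumes the dry-run layout shown in this tutorial; other versions of
    hpx_run.py may print something different.
    """
    commands = {}
    current = None
    for line in dry_run_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Local runtime instance"):
            # Header looks like: Local runtime instance 'rts1':
            current = stripped.split("'")[1]
        elif stripped.startswith("HPX command:") and current is not None:
            commands[current] = stripped[len("HPX command:"):].strip()
    return commands


# Excerpt copied from the dry-run output above.
sample = """\
Local runtime instance 'rts1':
        HPX command: fibonacci -r -a localhost:2222 -x localhost:2223 -l 2 -t 1
        Environment: (Empty)

Local runtime instance 'rts0':
        HPX command: fibonacci -w -a localhost:2222 -x localhost:2224 -l 2 -t 1
        Environment: (Empty)
"""

for name, command in extract_hpx_commands(sample).items():
    print(name, "->", command)
```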

Setting Base Port

If you run on a system shared with other people (such as Castor and Pollux), you might run into port conflicts with others running HPX code. By default, hpx_run.py starts with a base port of 2222 and counts up from there. You can make it start from a different port with the --port option (given as -p in the example below). This is illustrated below:

$ hpx_run.py -d -p 2829 -l 2:1 "fibonacci"
System view:
        2 nodes:
        Node 'node1' with 1 cores
        Node 'node0' with 1 cores

Locality set:
        2 localities:
        Locality 'L0' with 1 threads
        Locality 'L1' with 1 threads

Distributed runtime:
        2 local instances:
        Runtime 'rts1' with 1 threads
        Runtime 'rts0' with 1 threads

Local runtime instance 'rts1':
        HPX command: fibonacci -r -a localhost:2829 -x localhost:2830 -l 2 -t 1 
        Environment: (Empty)

Local runtime instance 'rts0':
        HPX command: fibonacci -w -a localhost:2829 -x localhost:2831 -l 2 -t 1 
        Environment: (Empty)
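In both dry runs shown in this tutorial, every instance gets the base port for its -a address, and the -x addresses count up from the base port plus one. The Python sketch below reproduces that pattern so you can predict which ports a run will occupy before picking a base port. It is inferred from the two listings only and is not hpx_run.py's actual code. The function name assign_ports is made up for illustration.

```python
def assign_ports(base_port, num_instances):
    """Return (shared_port, per_instance_ports) counting up from base_port.

    Inferred from the dry-run listings in this tutorial: the -a address uses
    the base port, and each instance's -x address uses the next free port.
    """
    shared_port = base_port
    per_instance_ports = [base_port + 1 + i for i in range(num_instances)]
    return shared_port, per_instance_ports


shared, ports = assign_ports(2829, 2)
print("-a port:", shared)
print("-x ports:", ports)
```

Under this assumption a run with N instances occupies N + 1 consecutive ports, so two users sharing a machine should pick base ports at least that far apart.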

GDB Mode

You can use the --use_gdb option to tell hpx_run.py to run the application inside GDB. This is useful when you are debugging an application that runs on multiple localities.