If your application appears to hang, there is an easy way to kill all of the hung processes.
If you are running multiple localities on a single SMP machine, all you have to do is press Ctrl-C and hpx_run.py will take care of killing all the hanging processes.
However, running on a cluster is more of a challenge, because hpx_run.py works by sending the command to the other nodes over ssh.
If you Ctrl-C out of a cluster run, the processes on the other nodes keep running.
Whenever this happens, hpx_run.py writes a script called cleanup*.sh that you can use to send the kill command to the hanging processes automatically.
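To make the idea concrete, here is a minimal Python sketch of what such a cleanup step boils down to: one ssh invocation per remote process, each running kill on that node. This is not the content of the generated cleanup*.sh. The function name build_cleanup_commands, the (host, pid) input format, and the host names are assumptions made for illustration. The sketch only builds the command strings and does not run anything.

```python
import shlex


def build_cleanup_commands(processes):
    """Build one ssh command per (host, pid) pair.

    Hypothetical helper for illustration only; it returns the command
    strings and never executes them.
    """
    commands = []
    for host, pid in processes:
        # int() rejects anything that is not a plain process id.
        remote = f"kill {int(pid)}"
        # Quote both parts so the remote command reaches ssh as one argument.
        commands.append(f"ssh {shlex.quote(host)} {shlex.quote(remote)}")
    return commands


if __name__ == "__main__":
    # Made-up hosts and pids, purely as an example.
    for command in build_cleanup_commands([("node0", 4242), ("node1", 4243)]):
        print(command)
```

Printing the commands first, instead of running them, lets you check that they target the right nodes and process ids before anything gets killed.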
Another thing to note is that, as of right now, hpx_run.py has only been tested for distributed multi-locality runs on Arete, and only when the script is started from one of the compute nodes.
If you try to run on other clusters, your mileage may vary.
There is a trick you can use to minimize the amount of text that hpx_run.py prints.
If all you care about is the output from the program, you can use the --shhh option (given as -s in the example) to significantly reduce what hpx_run.py prints.
This is what it looks like in action:
$ hpx_run.py -s -l 1:1 "fibonacci"
Runtime 'rts0' stdout: elapsed: 0.019939, result: 55
$ hpx_run.py -s -l 2:1 "fibonacci"
Runtime 'rts1' stdout: elapsed: 0.129291, result: 55
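If you want to consume the quiet output from a script, a line like the ones above is easy to parse. Here is a minimal Python sketch, assuming the program prints "elapsed: <seconds>, result: <integer>" exactly as fibonacci does in the sample. The function name parse_result is hypothetical, and other programs will print something different.

```python
import re

# Matches the "elapsed: ..., result: ..." part of the sample output.
RESULT_LINE = re.compile(r"elapsed:\s*([0-9.]+),\s*result:\s*(-?\d+)")


def parse_result(output):
    """Return (elapsed_seconds, result) from the output, or None if absent."""
    match = RESULT_LINE.search(output)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = "Runtime 'rts0' stdout: elapsed: 0.019939, result: 55"
    print(parse_result(sample))
```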
I told you in the first tutorial that hpx_run.py just automates setting up multi-locality runs.
If you want to see which commands it actually runs, you can use the --debug option (given as -d in the example) to do a dry run, where it doesn't start anything but just prints the commands it would have run.
This is what it looks like in action:
$ hpx_run.py -d -l 2:1 "fibonacci"
System view: 2 nodes:
  Node 'node1' with 1 cores
  Node 'node0' with 1 cores
Locality set: 2 localities:
  Locality 'L0' with 1 threads
  Locality 'L1' with 1 threads
Distributed runtime: 2 local instances:
  Runtime 'rts1' with 1 threads
  Runtime 'rts0' with 1 threads
Local runtime instance 'rts1':
  HPX command: fibonacci -r -a localhost:2222 -x localhost:2223 -l 2 -t 1
  Environment: (Empty)
Local runtime instance 'rts0':
  HPX command: fibonacci -w -a localhost:2222 -x localhost:2224 -l 2 -t 1
  Environment: (Empty)
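The HPX commands in the dry-run output above follow a regular pattern, which can be sketched in Python. This is an illustration reconstructed from that one sample, not hpx_run.py's actual code. The function name build_hpx_commands is hypothetical, and the rule that the first listed instance gets -r while all others get -w is an assumption based on the two commands shown.

```python
def build_hpx_commands(app, localities, threads, base_port=2222, host="localhost"):
    """Rebuild command lines shaped like the ones in the dry-run sample.

    Hypothetical sketch: every instance shares the -a address on the base
    port, and each instance gets its own -x port counting up from there.
    """
    shared_address = f"{host}:{base_port}"
    commands = []
    for index in range(localities):
        # Assumption: first listed instance uses -r, the rest use -w.
        role = "-r" if index == 0 else "-w"
        own_address = f"{host}:{base_port + 1 + index}"
        commands.append(
            f"{app} {role} -a {shared_address} -x {own_address} "
            f"-l {localities} -t {threads}"
        )
    return commands


if __name__ == "__main__":
    for command in build_hpx_commands("fibonacci", localities=2, threads=1):
        print(command)
```

With the default base port and two localities, the sketch produces the same two command lines as the sample, which makes it easy to see how the port numbers are handed out.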
If you run on a system shared with other people (such as Castor and Pollux), you might run into port conflicts with others running HPX code.
By default, hpx_run.py starts with a base port of 2222 and counts up from there.
You can make it start from a different port by using the --port option (given as -p in the example).
This is illustrated below:
$ hpx_run.py -d -p 2829 -l 2:1 "fibonacci"
System view: 2 nodes:
  Node 'node1' with 1 cores
  Node 'node0' with 1 cores
Locality set: 2 localities:
  Locality 'L0' with 1 threads
  Locality 'L1' with 1 threads
Distributed runtime: 2 local instances:
  Runtime 'rts1' with 1 threads
  Runtime 'rts0' with 1 threads
Local runtime instance 'rts1':
  HPX command: fibonacci -r -a localhost:2829 -x localhost:2830 -l 2 -t 1
  Environment: (Empty)
Local runtime instance 'rts0':
  HPX command: fibonacci -w -a localhost:2829 -x localhost:2831 -l 2 -t 1
  Environment: (Empty)
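Before picking a value for --port on a shared machine, it can help to check which ports are currently free. The following Python sketch does that with the standard socket module. The helper names port_is_free and first_free_base_port are hypothetical, and the number of consecutive ports a run needs is an assumption: the sample shows three ports in use for two localities.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def first_free_base_port(start, count, limit=65535):
    """Return the lowest base >= start with `count` consecutive free ports."""
    base = start
    while base + count - 1 <= limit:
        if all(port_is_free(base + offset) for offset in range(count)):
            return base
        base += 1
    raise RuntimeError("no free port range found")


if __name__ == "__main__":
    # Assumption: two localities use three ports, as in the sample output.
    print(first_free_base_port(2222, count=3))
```

A port that is free at check time can still be taken by someone else before your run starts, so treat the result as a good guess rather than a guarantee.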
You can use the --use_gdb option to tell hpx_run.py to run the application inside GDB.
This is useful when you are trying to debug your application using multiple localities.