Page dragnet:benchmarks_of_the_lotaas_pipelines (current revision 2017-03-08 15:27, external edit; previous revision 2015-08-21 00:13 by Sotiris Sanidas)
accelsearch (zmax=0;
accelsearch (zmax=50;
accelsearch (zmax=50;
accelsearch (zmax=200;
**Total time spent for the second large set of DM trials (4000-10000)**\\
accelsearch (zmax=0;
accelsearch (zmax=50;
accelsearch (zmax=50;
accelsearch (zmax=200;
^ % time alloc. ^ zmax=0;
^ fil conversion ^ 3 ^ 1
^ rfifind ^ 9 ^ 3 ^ 6 ^
^ dedispersion ^ 37 ^ 16 ^ 25 ^
^ sp search ^ 14 ^ 5
^ realfft ^ 3 ^ 1 ^ 2 ^
^ rednoise ^ 3 ^ 1
^ accelsearch ^ 18 ^ 67 ^ 46 ^
^ folding ^ 12 ^ 5 ^ 8 ^
^ data copying/etc ^ 1 ^ 1 ^ 1 ^
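Each column of the table gives the share of the per-beam processing time spent in each stage, so absolute stage times follow from multiplying by the per-beam total. A minimal sketch using the zmax=0 column (the only column whose header is legible here; its values sum to 100%). The 10-hour total in the usage example is a hypothetical placeholder, since the per-beam totals are not legible on this page.

```python
# Percent of per-beam processing time per stage, copied from the zmax=0 column above.
ZMAX0_PERCENT = {
    "fil conversion": 3,
    "rfifind": 9,
    "dedispersion": 37,
    "sp search": 14,
    "realfft": 3,
    "rednoise": 3,
    "accelsearch": 18,
    "folding": 12,
    "data copying/etc": 1,
}


def stage_hours(percent_by_stage, total_hours):
    """Convert a percentage allocation into absolute hours per stage."""
    total_pct = sum(percent_by_stage.values())
    # A complete column must add up to 100%; refuse to scale a partial one.
    if total_pct != 100:
        raise ValueError(f"allocations sum to {total_pct}%, expected 100%")
    return {stage: total_hours * pct / 100 for stage, pct in percent_by_stage.items()}


if __name__ == "__main__":
    # HYPOTHETICAL total of 10 hours per beam, for illustration only.
    for stage, hours in stage_hours(ZMAX0_PERCENT, total_hours=10.0).items():
        print(f"{stage:>18}: {hours:.1f} h")
```

Checking the sum first catches a column with a missing row, which matters here because some rows of the other columns are incomplete.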
Total processing time per beam (zmax=0;
Total processing time per beam (zmax=50;
Total processing time per beam (zmax=50;
Total processing time per beam (zmax=200;
==== Performance of the LOTAAS v.1 GPU pipeline on cartesius ====
==== Data transferring ====
32-bit to 8-bit downsampling on CEP2 (per observation):
Transferring from CEP2 to LTA (per observation):
Observation downloading on cartesius (1 core): ~8 hours\\
Observation downloading on cartesius (home area, 8 jobs in parallel.sh):<
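Going from 32-bit to 8-bit samples shrinks the data volume by a factor of 4 before it is transferred. The exact requantization done on CEP2 is not described on this page; the sketch below assumes a simple linear min/max scaling onto 0-255, a hypothetical stand-in that only illustrates the arithmetic and the size reduction.

```python
from array import array


def requantize_to_uint8(samples):
    """Linearly map float samples onto 0..255 (ASSUMED global min/max scaling)."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant input
    # round() picks the nearest level (ties to even); clamp into the 8-bit range.
    return array("B", (min(255, max(0, round((s - lo) / span * 255))) for s in samples))


if __name__ == "__main__":
    raw = array("f", [0.0, 0.5, 1.0, 0.25])  # 32-bit floats, 4 bytes each
    packed = requantize_to_uint8(raw)         # 8-bit unsigned ints, 1 byte each
    print(list(packed), raw.itemsize * len(raw), "->", packed.itemsize * len(packed), "bytes")
```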
==== Benchmarks for filterbank creation with psrfits2fil ====
psrfits2fil was run with different numbers of parallel processes. The following plot shows the time needed to create the fil files for each number of parallel psrfits2fil instances.\\

Using the same disk, the following cases were tried: 1,
for 2 disks: 1,

{{dragnet:

Using multithreading with 2 disks gives smooth, linear scaling up to 24 cores; beyond that, performance becomes slightly worse, probably because of I/O.
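A scaling measurement like this one can be reproduced by launching a batch of commands with at most N running at once and timing the whole batch. A minimal sketch; the command is a placeholder, because the exact psrfits2fil invocation and file paths used for these benchmarks are not given here, so the usage example times a trivial Python command instead.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def time_parallel_batch(commands, n_parallel):
    """Run every command with at most n_parallel processes at once.

    Returns the wall-clock time in seconds for the whole batch.
    Raises subprocess.CalledProcessError if any command exits non-zero.
    """
    start = time.perf_counter()
    # Threads only wait on the child processes; the actual work runs in separate processes.
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        # list() consumes the iterator so that any worker exception is re-raised here.
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder job; substitute the real psrfits2fil command line for each input file.
    job = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    for n in (1, 2, 4):
        print(f"{n} instance(s): {time_parallel_batch([job] * n, n):.2f} s")
```

Threads are enough here because each worker just blocks on its child process, so the Python global interpreter lock does not limit the parallelism.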

Using the above results, I extrapolated the time each work strategy would need to compute 32 filterbanks.\\

{{dragnet:

With a single disk, the fastest execution time is achieved with 4 psrfits2fil instances running in parallel. Beyond that, disk I/O probably dominates and evens out the results: performance decreases gradually, likely because the throughput is already saturated and extra instances only add more I/O calls.\\

With 2 disks, performance is significantly better, and the best result is achieved with 24 psrfits2fil instances in parallel, although its advantage over the neighbouring instance counts is small.\\
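The extrapolation to 32 filterbanks can be written out explicitly. The method behind the plot is not spelled out, so this sketch assumes the 32 files are processed in successive batches of N parallel instances, each batch taking the measured wall time t(N), giving T(32) = ceil(32/N) · t(N). The timings in the usage example are hypothetical placeholders, not the measured values.

```python
import math


def extrapolate_total(measured, n_files=32):
    """Estimate the time to process n_files for each parallel-instance count N.

    measured maps N -> wall time (s) for one batch of N parallel instances.
    ASSUMPTION: files are processed in ceil(n_files / N) back-to-back batches.
    """
    return {n: math.ceil(n_files / n) * t for n, t in measured.items()}


def best_strategy(measured, n_files=32):
    """Return (N, estimated total time) for the smallest estimate."""
    totals = extrapolate_total(measured, n_files)
    n_best = min(totals, key=totals.get)
    return n_best, totals[n_best]


if __name__ == "__main__":
    # HYPOTHETICAL batch timings in seconds, for illustration only.
    example = {1: 100.0, 4: 180.0, 8: 400.0}
    print(extrapolate_total(example))
    print(best_strategy(example))
```

The ceiling matters when N does not divide 32: with 24 instances, two batches are needed, so the second batch runs only 8 files but still costs a full t(24). The same calculation applies to the 32 rfifind masks.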

==== rfifind benchmarks ====
I ran the same tests twice.

I created RFI masks by running rfifind in parallel for 4,
The following plots show the number of parallel rfifind instances (x-axis) against the time taken for all of them to complete (y-axis).\\

{{dragnet:
{{dragnet:

In the following plots, I extrapolated the above results to find the optimal number of parallel jobs for computing 32 RFI masks.

{{dragnet:
{{dragnet:

From the above, we can conclude that using 1 or 2 disks makes little difference. Hyperthreading also works smoothly, and the best strategy is indeed to run the maximum possible number of rfifind instances in parallel.
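Since each rfifind test was run twice, the two runs can be combined before comparing instance counts. A minimal sketch, assuming each run is stored as a dict mapping instance count to seconds; averaging the runs and reporting their relative spread is an illustration, not necessarily how the plots were produced, and the timings below are hypothetical.

```python
def combine_runs(run_a, run_b):
    """Merge two benchmark runs given as {instances: seconds} dicts.

    Only instance counts present in both runs are kept. Returns
    {instances: (mean_seconds, relative_spread)}, where
    relative_spread = |a - b| / mean.
    """
    combined = {}
    for n in sorted(run_a.keys() & run_b.keys()):
        a, b = run_a[n], run_b[n]
        mean = (a + b) / 2
        combined[n] = (mean, abs(a - b) / mean if mean else 0.0)
    return combined


if __name__ == "__main__":
    # HYPOTHETICAL timings in seconds, for illustration only.
    first = {4: 90.0, 8: 120.0, 16: 200.0}
    second = {4: 110.0, 8: 118.0, 16: 210.0}
    for n, (mean, spread) in combine_runs(first, second).items():
        print(f"{n:>3} instances: {mean:.1f} s (runs differ by {spread:.0%})")
```

A small relative spread between the two runs suggests the differences between instance counts are real rather than run-to-run noise.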

==== Cartesius Benchmarks ====

Processing 1 full pointing on cartesius using either /dev/shm or HDDs:

{{dragnet:
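Whether a pointing can be processed in /dev/shm depends on its data fitting in the RAM-backed file system. A minimal sketch of choosing a working directory: take the first candidate that exists and has enough free space, so the fast option is tried first and an HDD path is the fallback. The candidate paths and the 50 GB requirement are hypothetical; the actual cartesius paths and data volumes are not given here.

```python
import os
import shutil


def choose_workdir(required_bytes, candidates):
    """Return the first existing directory in candidates with enough free space.

    Candidates are tried in order, so list the fast option (e.g. /dev/shm) first.
    Raises RuntimeError if none qualifies.
    """
    for path in candidates:
        if os.path.isdir(path) and shutil.disk_usage(path).free >= required_bytes:
            return path
    raise RuntimeError(f"no candidate has {required_bytes} bytes free: {candidates}")


if __name__ == "__main__":
    # HYPOTHETICAL: 50 GB needed; "/scratch/lotaas" is an illustrative HDD path.
    try:
        print(choose_workdir(50 * 10**9, ["/dev/shm", "/scratch/lotaas", os.getcwd()]))
    except RuntimeError as err:
        print(err)
```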