====== Benchmarks of the LOTAAS pipelines ======

accelsearch (-zmax=0 ; -numharm=16): 1min20sec\\
accelsearch (-zmax=50 ; -numharm=16): 12min\\
accelsearch (-zmax=50 ; -numharm=8): 5min\\
accelsearch (-zmax=200 ; -numharm=8): 26min\\
plots: 20sec\\
python sifting and folding: 21min\\
rednoise: 3.5min\\
accelsearch (zmax=0;numharm=16): 21min\\
accelsearch (zmax=50;numharm=16): 192min\\
accelsearch (zmax=50;numharm=8): 80min\\
accelsearch (zmax=200;numharm=8): 416min\\
  
**Total time spent for the second large set of DM trials (4000-10000)**\\
rednoise: 2min\\
accelsearch (zmax=0;numharm=16): 11min\\
accelsearch (zmax=50;numharm=16): 96min\\
accelsearch (zmax=50;numharm=8): 40min\\
accelsearch (zmax=200;numharm=8): 208min\\
  
  
^ % time alloc. ^ zmax=0;numharm=16 ^ zmax=50;numharm=16 ^ zmax=50;numharm=8 ^ zmax=200;numharm=8 ^
^ fil conversion ^  3  ^  1  ^  2  ^  <1  ^
^ rfifind ^  9  ^  3  ^  6  ^  2  ^
^ dedispersion ^  37  ^  16  ^  25  ^  8  ^
^ sp search ^  14  ^  5  ^  9  ^  3  ^
^ realfft ^  3  ^  1  ^  2  ^  <1  ^
^ rednoise ^  3  ^  1  ^  2  ^  <1  ^
^ accelsearch ^  18  ^  67  ^  46  ^  81  ^
^ folding ^  12  ^  5  ^  8  ^  3  ^
^ data copying/etc ^  1  ^  1  ^  1  ^  <1  ^
  
  
Total processing time per beam (zmax=0;numharm=16): ~3 hours\\
Total processing time per beam (zmax=50;numharm=16): ~7 hours\\
Total processing time per beam (zmax=50;numharm=8): ~5 hours\\
Total processing time per beam (zmax=200;numharm=8): ~13h40m\\
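
Schematically, the parameter sweep above can be reproduced with a small timing loop. A minimal sketch follows, assuming a single dedispersed, FFT'ed time series; the file name is a placeholder, while -zmax and -numharm are the actual accelsearch options varied here.

<code python>
import subprocess
import time

FFT_FILE = "beam_DM100.00.fft"  # hypothetical dedispersed, FFT'ed time series

# Time one accelsearch run for each (zmax, numharm) pair benchmarked above.
for zmax, numharm in [(0, 16), (50, 16), (50, 8), (200, 8)]:
    t0 = time.time()
    subprocess.run(
        ["accelsearch", "-zmax", str(zmax), "-numharm", str(numharm), FFT_FILE],
        check=True,
    )
    print(f"zmax={zmax} numharm={numharm}: {time.time() - t0:.0f} s")
</code>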
  
==== Performance of the LOTAAS v.1 GPU pipeline on cartesius ====
  
  
==== Data transferring (CEP2/LTA) ====

32-bit to 8-bit downsampling on CEP2 (per observation): 6-8 hours\\
Transferring from CEP2 to LTA (per observation): 2-3 hours\\
Observation downloading on cartesius (1 core): ~8 hours\\
Observation downloading on cartesius (home area, 8 jobs in parallel.sh): <2 hours\\
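
A minimal sketch of the 8-jobs-in-parallel download strategy, assuming a plain list of staged URLs and wget as the retrieval tool; the actual LTA retrieval command is not shown on this page and may differ.

<code python>
import subprocess
from concurrent.futures import ThreadPoolExecutor

urls = []  # hypothetical: one staged LTA URL per observation file

def fetch(url):
    # Placeholder retrieval command; substitute the real LTA download tool.
    subprocess.run(["wget", "-q", url], check=True)

# Eight concurrent downloads, mirroring the "8 jobs in parallel" setup.
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(fetch, urls))
</code>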

==== Benchmarks for filterbank creation with psrfits2fil ====

psrfits2fil was executed with different numbers of parallel processes. The following plot shows the time needed to create the .fil files for various numbers of parallel psrfits2fil instances.\\

Using the same disk, the following instance counts were tried: 1, 3, 4, 5, 8, 12, 16; anything above 16 is an extrapolation.\\
For 2 disks: 1, 4, 8, 12, 16, 20, 24, 28, 32.

{{dragnet:benchmarks:psrfits2fil1a.png?400}}

Running multiple instances with 2 disks gives smooth, linear scaling up to 24 cores; beyond that it turns slightly worse, probably due to I/O.

Using the above results, I extrapolated the time needed with each strategy to create 32 filterbanks.\\

{{dragnet:benchmarks:psrfits2fil1b.png?400}}

When using the same disk, the fastest execution time is achieved with 4 psrfits2fil instances running in parallel. Above that, disk I/O evens out the results and performance decreases gradually, probably because of the increased number of I/O calls once the throughput is saturated.\\

Using 2 disks, the performance is significantly better; the best results are achieved with 24 psrfits2fil instances in parallel, although the differences remain small.\\

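A minimal sketch of such a sweep, assuming 32 hypothetical input files and a schematic psrfits2fil invocation (any extra options are omitted):

<code python>
import subprocess
import time
from multiprocessing import Pool

FITS_FILES = [f"beam{i:02d}.fits" for i in range(32)]  # hypothetical inputs

def convert(fits):
    # Schematic invocation; real runs may need extra psrfits2fil options.
    subprocess.run(["psrfits2fil", fits], check=True)

# Convert all 32 files with an increasing number of parallel workers and
# record the wall time for each strategy.
for nworkers in (1, 4, 8, 12, 16, 24, 32):
    t0 = time.time()
    with Pool(nworkers) as pool:
        pool.map(convert, FITS_FILES)
    print(f"{nworkers:2d} workers: {time.time() - t0:.0f} s")
</code>
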
==== rfifind benchmarks ====

I ran the same tests twice.

I created RFI masks by running rfifind in parallel on 4, 8, 12, 16, 20, 24, 28 and 32 cores (above 16 cores, hyperthreaded).\\
The following plots show the number of parallel rfifind instances executed (x-axis) against the time taken for all of them to complete (y-axis).\\

{{dragnet:benchmarks:rfifind1a.png?400}}
{{dragnet:benchmarks:rfifind2a.png?400}}

In the following plots, I extrapolated the above results to find the optimal number of parallel jobs for computing 32 RFI masks.

{{dragnet:benchmarks:rfifind1b.png?400}}
{{dragnet:benchmarks:rfifind2b.png?400}}
  
From the above, we can conclude that using 1 or 2 disks does not make a big difference. Also, hyperthreading works smoothly, and the best strategy is indeed to run the maximum possible number of rfifind instances in parallel.
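
A minimal sketch of that maximum-parallelism strategy, launching one rfifind per beam at once and waiting for the whole batch; the beam file names and the -time value are placeholders, while -time and -o are standard rfifind options.

<code python>
import subprocess

beams = [f"beam{i:02d}.fil" for i in range(32)]  # hypothetical inputs

# Launch one rfifind per beam simultaneously, then wait for the batch.
procs = [
    subprocess.Popen(["rfifind", "-time", "2.0", "-o", fil[:-4], fil])
    for fil in beams
]
for p in procs:
    p.wait()
</code>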
  
==== Cartesius Benchmarks ====
  
Processing 1 full pointing on cartesius, using either /dev/shm or HDDs:
  
{{dragnet:benchmarks:cartesius_bm1.png?400}}
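
A minimal sketch of the /dev/shm variant, staging the input into the RAM-backed tmpfs before processing; the paths and the processing command are placeholders.

<code python>
import shutil
import subprocess

SRC = "/data/pointing_beam00.fil"    # hypothetical input on HDD
dst = shutil.copy(SRC, "/dev/shm/")  # stage into RAM-backed tmpfs

# Placeholder processing step; any I/O-heavy stage benefits similarly.
subprocess.run(["rfifind", "-time", "2.0", "-o", "beam00", dst], check=True)
</code>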