--- dragnet:benchmarks_of_the_lotaas_pipelines [2016-08-06 10:56] – Sotiris Sanidas
+++ dragnet:benchmarks_of_the_lotaas_pipelines [2017-03-08 15:27] (current) – external edit 127.0.0.1
@@ Line 62: / Line 62: @@
-==== Data transfering (CEP$/LTA) ====
+==== Data transferring (CEP$/LTA) ====
 -bit to 8-bit downsampling on CEP2 (per observation): 6-8 hours\\
@@ Line 71: / Line 71: @@
 ==== Benchmarks for filterbank creation with psrfits2fil ====
-A series of tests were ran on dragnet (drg01), directly (now through slurm), on fits files from a random LOTAAS observation. Number of cores,means number of different fits files processed simultaneously. The total time is an extrapolation of this benchmark.\\
-Only one test was ran for each occasion. Repeating them showed differences in execution time, but I doubt that the results change qualitatively.
+psrfits2fil was executed with different numbers of parallel processes. The following plot shows the time needed to create the fil files for various numbers of parallel psrfits2fil instances.\\
-=== Input/Output on the same disk ===
+With input and output on the same disk, the following numbers of parallel instances were tried: 1, 3, 4, 5, 8, 12, 16. Anything above 16 is just an extrapolation.\\
--core:354sec\\
+With 2 disks (input and output on different disks): 1, 4, 8, 12, 16, 20, 24, 28, 32.\\
-Total time for 16 files:5664sec\\
--cores:630sec\\
+{{dragnet:benchmarks:psrfits2fil1a.png?400}}
-Total time for 16 files:3360sec\\
--cores:720sec\\
+With 2 disks, multithreading gives smooth, linear scaling up to 24 cores; beyond that, performance gets slightly worse, probably due to I/O.
-Total time for 16 files:2880sec\\
--cores:1106sec\\
+Using the above results, I extrapolated the time needed with each work strategy to compute 32 filterbanks.\\
-Total time for 16 files:3539sec\\
--cores:1735sec\\
+{{dragnet:benchmarks:psrfits2fil1b.png?400}}
-Total time for 16 files:3360sec\\
--cores:3778sec\\
+When using the same disk, the fastest execution time is achieved with 4 psrfits2fil instances running in parallel. Above that, disk I/O probably becomes the bottleneck in every case and performance decreases gradually, likely because of the increased number of I/O calls once the disk throughput is already saturated.\\
-Total time for 16 files:3778sec\\
-When writing the filterbank in the same disk with the fits file, the best performance is achieved by having 4 psrfits2fil instances running.
+With 2 disks, performance is significantly better; the best result is achieved with 24 psrfits2fil instances in parallel, although its advantage over the neighbouring instance counts is small.\\
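The extrapolations above reduce to simple arithmetic: if one batch of n simultaneous instances takes t(n) seconds, processing N files takes roughly (N/n) × t(n), or ceil(N/n) × t(n) if a partial final batch is assumed to cost as much as a full one. The following is a minimal sketch under that assumption. The function names and the `whole_batches` switch are hypothetical, the page does not state which rounding the plots use, and the timings in the usage note are made-up placeholders, not measurements.

```python
import math
from typing import Dict


def extrapolate_total(batch_time: float, n_parallel: int, n_files: int,
                      whole_batches: bool = True) -> float:
    """Estimate wall-clock time to process n_files.

    batch_time is the measured time for one batch of n_parallel
    simultaneous instances. With whole_batches=True a partial last batch
    is assumed to cost as much as a full one (ceil); otherwise time
    scales proportionally with n_files / n_parallel.
    """
    if n_parallel <= 0 or n_files < 0:
        raise ValueError("n_parallel must be positive, n_files non-negative")
    batches = n_files / n_parallel
    if whole_batches:
        batches = math.ceil(batches)
    return batches * batch_time


def best_strategy(batch_times: Dict[int, float], n_files: int,
                  whole_batches: bool = True) -> int:
    """Return the parallelism with the smallest extrapolated total time."""
    return min(batch_times,
               key=lambda n: extrapolate_total(batch_times[n], n, n_files,
                                               whole_batches))
```

With placeholder timings of {4: 600, 8: 1100, 24: 2500} seconds and 32 files, rounding up to whole batches favours 8 instances (4400 s), while proportional scaling favours 24 (about 3333 s). The rounding assumption can therefore change which strategy looks best when N is not a multiple of n.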
-=== Input/output on different disks ===
+==== rfifind benchmarks ====
--core:329sec\\
+I ran the same tests twice.
-Total time for 16 files:5264sec\\
--cores:523sec\\
+I created RFI masks by running rfifind in parallel on 4, 8, 12, 16, 20, 24, 28 and 32 cores (above 16, hyperthreaded).\\
-Total time for 16 files:2092sec\\
+The following plots show the number of rfifind instances executed in parallel (x-axis) against the time taken for them to complete (y-axis).\\
--cores:901sec\\
+{{dragnet:benchmarks:rfifind1a.png?400}}
-Total time for 16 files:1802sec\\
+{{dragnet:benchmarks:rfifind2a.png?400}}
--cores:1513sec\\
+In the following plots, I extrapolated the above results to find the optimal number of parallel jobs for computing 32 RFI masks.\\
-Total time for 16 files:1513sec\\
--cores (hyperthreaded):2252sec scales perfectly!\\
+{{dragnet:benchmarks:rfifind1b.png?400}}
--cores (hyperthreaded):3346sec performance loss already evident;~340 sec slower than expected
+{{dragnet:benchmarks:rfifind2b.png?400}}
-Using all the cores in the dragnet nodes gives the best performance. Moreover, multithreading behaves exactly as having the extra cores.
+From the above, we can conclude that using 1 or 2 disks does not make a big difference. Hyperthreading also works smoothly, and the best strategy is indeed to run the maximum possible number of rfifind instances in parallel.
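Running many independent rfifind (or psrfits2fil) jobs with a fixed cap on concurrency can be sketched with Python's standard library. This is a minimal illustration, not the pipeline's actual scheduler (the tests were run directly and later through slurm): `run_in_parallel` and the `build_command` callback are hypothetical, and the real rfifind command line is deliberately left to the caller. Defaulting to `os.cpu_count()` uses every logical CPU, hyperthreads included, in line with the finding that running the maximum number of instances works best.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def run_in_parallel(
    inputs: Sequence[str],
    build_command: Callable[[str], List[str]],
    max_workers: Optional[int] = None,
) -> List[int]:
    """Run one external process per input, at most max_workers at a time.

    Defaults to os.cpu_count(), i.e. all logical CPUs including
    hyperthreads. Returns the exit codes in input order.
    """
    if max_workers is None:
        max_workers = os.cpu_count() or 1

    def run_one(path: str) -> int:
        # A thread is enough here: it only waits on its child process.
        return subprocess.run(build_command(path)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(run_one, inputs))


if __name__ == "__main__":
    # Hypothetical stand-in command; replace with the real rfifind call.
    fake_inputs = [f"beam{i:02d}.fil" for i in range(4)]
    codes = run_in_parallel(fake_inputs, lambda p: [sys.executable, "-c", "pass"])
    print(codes.count(0), "of", len(codes), "jobs succeeded")
```

Threads rather than processes are used for the pool because each worker spends its time blocked on an external program, so the Python GIL is not a limiting factor.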
-Moving a fil file from /data2 to /data1 takes 1 minute.
+==== Cartesius Benchmarks ====
+Processing 1 full pointing on Cartesius using either /dev/shm or HDDs:
+{{dragnet:benchmarks:cartesius_bm1.png?400}}