dragnet:cluster_benchmark

RDMA (Remote Direct Memory Access) allows an application to directly access memory on another node. Although some initial administration is set up via the OS kernel, the actual transfer commands and completion handling do not go through the kernel. This also avoids data copies on the sender and receiver and reduces CPU usage.
  
Typical applications that may use RDMA are those that use MPI (Message Passing Interface), such as COBALT, or (hopefully) the LUSTRE client. NFS can also be set up to use RDMA. You can program directly against the ''verbs'' and ''rdma-cm'' C APIs and link to those libraries, but be aware that extending some code to do this is not a one-hour task... (Undoubtedly, there is also a Python module that either merely wraps these APIs or even makes life easier.)
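Before programming against those APIs, it can be useful to check which RDMA devices and ports the verbs layer actually sees. A minimal sketch, assuming the standard libibverbs utilities are installed (the ''drg01'' prompt is just an example node):

  [amesfoort@drg01 ~]$ ibv_devices     # list RDMA devices known to libibverbs
  [amesfoort@drg01 ~]$ ibv_devinfo     # per-device and per-port details (state, link layer, MTU)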
  
We used the ''qperf'' benchmark and got the following bandwidth and latency numbers between two ''drgXX'' nodes (TCP/UDP/SCTP over IP also included, but not as fast as mentioned above):
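A rough sketch of how such numbers can be gathered (the ''drg01''/''drg02'' host names are placeholders for any two nodes; the test list can be trimmed or extended): start ''qperf'' without arguments on one node so it acts as the server, then run the tests from the other node. The ''rc_*'' tests use RDMA over the reliable connected transport; the others run over IP.

  [amesfoort@drg01 ~]$ qperf
  [amesfoort@drg02 ~]$ qperf drg01 rc_bw rc_lat tcp_bw tcp_lat udp_bw udp_lat sctp_bw sctp_lat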
===== drgXX nodes =====
Scratch space on ''drgXX'' is at ''/data1'' and ''/data2'' (benchmarked individually). A transfer size of 4k vs 64k does not appear to matter. We reach up to 288 MiB/s. Copying a large file using cp(1) reaches 225 - 279 MiB/s.
  
Another cp test on an 85+% full target filesystem:

  [amesfoort@drg23 data1]$ time (cp /data2/L412984/L412984_SAP000_B045_S0_P000_bf.raw /data2/L412984/L412984_SAP000_B046_S0_P000_bf.raw Ltmp && sync)
  
  real 9m56.566s
  user 0m0.799s
  sys 4m20.007s

With a file size of 2 * 75931582464 bytes, that's a read/write speed of 242.8 MiB/s.
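That figure follows directly from the numbers above: two files of 75931582464 bytes each, copied in 9m56.566s = 596.566 s. For example:

  [amesfoort@drg23 data1]$ python -c 'print(2 * 75931582464 / (9*60 + 56.566) / 2**20)'
  242.769...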
===== dragproc node =====
  
On ''dragproc'' at ''/data'', a transfer size of 64k may perform somewhat better than 4k, but not consistently. We reach 490 - 530 MiB/s.
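Such a transfer-size comparison can be redone with ''dd'' (this applies equally to ''/data1''/''/data2'' on the ''drgXX'' nodes). The file name and the 8 GiB size below are arbitrary; ''conv=fdatasync'' makes dd include the final flush to disk in its timing:

  [amesfoort@dragproc ~]$ dd if=/dev/zero of=/data/ddtest.tmp bs=4k count=2097152 conv=fdatasync
  [amesfoort@dragproc ~]$ dd if=/dev/zero of=/data/ddtest.tmp bs=64k count=131072 conv=fdatasync
  [amesfoort@dragproc ~]$ rm /data/ddtest.tmp

Note that writing from /dev/zero measures writes only, whereas the cp tests above read and write at the same time.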
  