Why does HPC use an IB network?

“Do not let the network become a bottleneck for the computing efficiency of multi-node parallelism.”

“Why does HPC use an IB network?” This phrasing is actually imprecise; the question should be: why does HPC use RDMA?

When running multi-node parallel jobs in a High-Performance Computing (HPC) cluster, a computing network with high bandwidth and low latency is required for real-time communication between compute kernels. With the traditional communication method, Ethernet TCP/IP, data must first be handed to the operating system kernel and then copied from the kernel into the target computer’s memory, which drives up latency and consumes CPU computing resources.
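To make that copy concrete, here is a minimal sketch (not from the original article) of the traditional path: an ordinary TCP send, where the `write()` call traps into the kernel and the kernel copies the user buffer into its own socket buffers before the NIC transmits anything. The peer address and port are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Minimal TCP send: each write() traps into the kernel, which copies
 * the user buffer into kernel socket buffers before the NIC sees it.
 * The receiver incurs the mirror-image copy out of its kernel. */
int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(5000);                    /* hypothetical port */
    inet_pton(AF_INET, "192.0.2.10", &addr.sin_addr); /* hypothetical peer */

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        perror("connect");
        return 1;
    }
    const char *msg = "halo exchange payload";
    write(fd, msg, strlen(msg));   /* user -> kernel copy happens here */
    close(fd);
    return 0;
}
```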

RDMA (Remote Direct Memory Access) transfers data directly from one computer’s memory to another’s without involving the operating system kernel. This avoids kernel interference and reduces both latency and CPU usage.
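By contrast, an RDMA transfer is posted to the NIC directly from user space. Below is a minimal sketch using the libibverbs verbs API; it assumes a connected RC queue pair and registered memory, and it omits connection setup and the out-of-band exchange of the remote address and rkey. `post_rdma_write` is an illustrative helper name, not a library function.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post a one-sided RDMA WRITE: the local NIC places `len` bytes from
 * local_buf directly into the remote node's registered memory. Neither
 * side's kernel, and no remote CPU cycles, take part in the transfer.
 * Assumes qp is a connected RC queue pair, mr registers local_buf, and
 * remote_addr/rkey were exchanged out of band (e.g. over TCP). */
int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                    void *local_buf, size_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,            /* local key from ibv_reg_mr() */
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof wr);
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided write   */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;  /* ask for completion */
    wr.wr.rdma.remote_addr = remote_addr;        /* target VA on peer  */
    wr.wr.rdma.rkey        = rkey;               /* peer's memory key  */

    return ibv_post_send(qp, &wr, &bad_wr);      /* 0 on success */
}
```

Because the write is one-sided, the remote CPU is not notified of the transfer at all; the receiver learns the data has arrived only through an application-level convention.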

The communication protocols that currently support RDMA include the following (a code sketch follows the list):
InfiniBand (IB): A network protocol designed to support RDMA from the very beginning. Since it is an entirely new network technology, dedicated IB NICs and switches are required.

RDMA over Converged Ethernet (RoCE): Runs the RDMA protocol over Ethernet, which enables RDMA on standard Ethernet infrastructure (switches), except that the network card must be a special NIC supporting RoCE.

Internet Wide Area RDMA Protocol (iWARP): Runs RDMA over TCP, which likewise enables RDMA on standard Ethernet infrastructure (switches), except that the network card must be a special NIC supporting iWARP.
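One practical point worth noting: despite their different wire protocols, applications typically drive all three technologies through the same verbs API (libibverbs), so, for example, enumerating RDMA-capable devices is identical code whether the adapter is an IB HCA, a RoCE NIC, or an iWARP RNIC. A minimal sketch:

```c
#include <infiniband/verbs.h>
#include <stdio.h>

/* List every RDMA-capable device visible to libibverbs. The same code
 * discovers InfiniBand HCAs, RoCE-capable Ethernet NICs, and iWARP
 * RNICs, since all three plug into the verbs interface. */
int main(void) {
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }
    for (int i = 0; i < num; i++)
        printf("device %d: %s (node type: %s)\n", i,
               ibv_get_device_name(devs[i]),
               ibv_node_type_str(devs[i]->node_type));
    ibv_free_device_list(devs);
    return 0;
}
```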

These three mainstream RDMA technologies fall into two camps. One is InfiniBand (IB); the other is the Ethernet-based RDMA technologies, RoCE and iWARP. InfiniBand and RoCE are strongly backed by the InfiniBand Trade Association (IBTA), benefiting from its robust support and industry advocacy. Mellanox Technologies, an Israeli company, emerged as the pioneer in this domain and was acquired by NVIDIA for approximately $6.9 billion, a deal completed in April 2020. iWARP, by contrast, sits in the IEEE/IETF camp and is promoted mainly by Chelsio.

High-performance computing involves processing massive amounts of data, so efficient communication technology is required for rapid data transmission and low-latency computing. That is undoubtedly the most suitable application scenario for the IB network. It is fair to say that the IB network was designed for HPC.