Development Trends and Analysis of Liquid Cooling Technology in Data Centers: Part 2

In this second part, let’s explore the opportunities and challenges.

First, let’s look at the distinction between system solutions and IT solutions. Consider how people keep cool: we can use a fan, turn on the air conditioning, go swimming, or take a shower. Any method that keeps us comfortable counts as an effective solution.

Data centers are different. If I opt for immersion cooling, whether single-phase or two-phase, or for spray cooling, I must consider the compatibility of every IT component and the overall operation of the cooling system, and I must also design the complete solution around them. This is what we mean by a system solution. For localized improvements, such as cooling only the CPUs or GPUs, it may be enough to consider an IT solution, which is a different matter.

When an enterprise sets out to run artificial intelligence projects today, there are still many factors to weigh in deciding whether to adopt liquid cooling. These include costs, such as construction cost per kW and total cost of ownership (TCO), as well as various factors tied to IT output. Beyond these, the factors shown on the left side are the drivers for deploying liquid cooling and the benefits it yields. For business development and innovation, as mentioned earlier, when data center space and power are limited, liquid cooling can increase IT output and thereby support business growth.
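To make the power-constrained case concrete, here is a minimal sketch assuming the site has a fixed facility power envelope and that cooling efficiency is summarized by PUE (power usage effectiveness, i.e. total facility power divided by IT power). The 10 MW envelope and the PUE values of 1.5 and 1.2 are illustrative assumptions, not figures from this talk.

```python
def it_capacity_kw(facility_kw: float, pue: float) -> float:
    """IT load that a fixed facility power envelope can support at a given PUE.

    PUE = total facility power / IT power, so IT power = facility power / PUE.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return facility_kw / pue


# Hypothetical 10 MW site: compare an air-cooled design with a liquid-cooled one.
FACILITY_KW = 10_000
air = it_capacity_kw(FACILITY_KW, pue=1.5)     # illustrative air-cooled PUE
liquid = it_capacity_kw(FACILITY_KW, pue=1.2)  # illustrative liquid-cooled PUE

print(f"air-cooled IT load:    {air:,.0f} kW")
print(f"liquid-cooled IT load: {liquid:,.0f} kW")
print(f"extra IT output:       {liquid / air - 1:.0%}")
```

Under these assumed PUE values, the same power envelope supports about 25% more IT load; the real gain depends on the PUE a given site actually achieves with each approach.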

But is liquid cooling actually necessary when a PUE (power usage effectiveness) of 1.2 is required?
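One way to frame this question is to work backwards from the target. Since PUE is total facility power divided by IT power, a PUE of 1.2 leaves only 0.2 kW of non-IT overhead per kW of IT load, and that budget must cover cooling as well as power distribution losses and other facility loads. The sketch below computes the cooling power that remains; the 1,000 kW IT load and 50 kW of other overhead are hypothetical inputs.

```python
def max_cooling_kw(it_kw: float, target_pue: float, other_overhead_kw: float) -> float:
    """Largest cooling power that still meets target_pue.

    From PUE = (IT + cooling + other) / IT, solving for cooling gives
    cooling = IT * (PUE - 1) - other. A negative result means the target is
    unreachable even with zero cooling power, so it is clamped to 0.
    """
    budget = it_kw * (target_pue - 1.0) - other_overhead_kw
    return max(budget, 0.0)


def pue(it_kw: float, cooling_kw: float, other_overhead_kw: float) -> float:
    """PUE from its definition: total facility power / IT power."""
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw


# Hypothetical 1,000 kW IT hall with 50 kW of distribution losses, lighting, etc.
budget = max_cooling_kw(1_000, 1.2, 50)
print(f"cooling budget at PUE 1.2: {budget:.0f} kW")
print(f"check: PUE = {pue(1_000, budget, 50):.2f}")
```

If an air-cooled design can deliver the required cooling within that budget, liquid cooling is not strictly needed to hit the target; if it cannot, the shortfall is what liquid cooling would have to close.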

And if I intend to build AI servers, is liquid cooling an absolute necessity? We know that NVIDIA’s solutions are not limited to liquid cooling. Answering these questions, including those of stability and cost, is of the utmost importance.

On cost, note that a data center has a relatively long lifecycle, while IT equipment has a much shorter one. Deciding mid-life, say after 3 or 4 years of operation, whether to adopt liquid cooling, or whether to choose cold plates or immersion, is therefore a complex dilemma. Legacy applications must also be accounted for, as must the question of whether existing servers can keep running in the new environment.
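The lifecycle mismatch can be illustrated with simple arithmetic. Assume, purely for illustration, a 15-year facility life and a 4-year server refresh cycle (neither figure comes from the talk). A cooling decision made at year 4 must then serve every server generation still to come, and each generation may have different cooling requirements.

```python
import math


def remaining_it_generations(facility_life_yrs: float, it_life_yrs: float,
                             decision_year: float) -> int:
    """Number of IT refresh generations (counting a final partial one) that the
    facility must still host after a cooling decision made at decision_year."""
    remaining = facility_life_yrs - decision_year
    if remaining <= 0:
        return 0
    return math.ceil(remaining / it_life_yrs)


# Hypothetical: 15-year facility, 4-year servers.
print(remaining_it_generations(15, 4, 0))  # decided at build time: 4 generations
print(remaining_it_generations(15, 4, 4))  # decided at year 4: 11 years left, 3 generations
```

Under these assumptions, a mid-life retrofit still has to accommodate three future server generations whose cooling needs are not yet known, while also supporting whatever legacy equipment remains in service.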

Another significant concern is standardization. At present, each vendor offers its own solution, and from IT equipment to coolant distribution units (CDUs) to the wider liquid cooling system, each layer follows its own standards. Does choosing a particular solution mean being locked into that vendor’s offerings, or does it allow interchangeability and interoperability across multiple solutions? This is a substantial challenge for many users.

Users without strong control capabilities tend to make more conservative technology choices. If selecting a technology means being bound to it, the user risks giving up control. Unless the entire supply chain can be controlled, decisions are usually constrained by existing conditions.

Fire safety must also be factored in. Oil-based coolants, for instance, may pose flammability risks. Does using such coolants in high-density data centers comply with fire safety regulations? Can they be approved and recognized as safe to use? Given the current absence of comprehensive standards, experimenting may be bold, but if something goes wrong, the user bears the consequences. There are also other ecological factors to consider, which I won’t go into here.

Cooling Technology Overview

Besides the single-phase and two-phase liquid cooling technologies mentioned earlier, there is also full-coverage cold plate liquid cooling. Unlike traditional cold plates, which cool only the core components, a full-coverage cold plate covers every component, so that all of them receive liquid cooling.

In the past, full-coverage cold plates faced significant challenges because many components, such as memory and hard drives, are pluggable and not standardized in form, so they could not be covered. In today’s AI scenarios, however, AI boards can be designed as integrated units with all components attached to the mainboard, which makes full-coverage cooling achievable. Another important recent trend is the sharp drop in SSD (solid-state drive) prices: compared with the beginning of the year, SSD prices have fallen significantly while capacities have grown greatly, so storage capacity is no longer a constraint. Because SSDs can now take the place of HDDs (hard disk drives), the HDD heat dissipation problem that previously hindered this technology is no longer relevant.

Regarding immersion liquid cooling technology, I believe that in the long run, single-phase immersion cooling is a superior data center solution.

However, immersion liquid cooling still faces many challenges, mainly in the following areas:

  • Is immersion liquid cooling suitable for all scenarios?
  • How compatible is immersion liquid cooling with IT equipment? Could it damage the equipment? Material compatibility needs further research.
  • Can traditional air-cooled equipment be immersed directly? This needs more validation.
  • What technical standards should coolants meet? Safety (fire and human safety), compatibility (electronic and electrical), heat dissipation performance, GWP/ODP (global warming potential and ozone depletion potential), and so on.
  • How to choose between single-phase and two-phase?
  • Fluorinated liquids or oil-based coolants? Fluorinated liquids are expensive, volatile, and require more complex systems, but they offer good compatibility with servers. Oil-based coolants are cheaper and less volatile and allow simpler systems, but handling them during component replacement can be difficult and may require additional equipment. From the standpoint of coolant choice, then, it is hard to say which technology will ultimately win.
  • How to address operational challenges?