Machine learning (software that gets smarter by learning from data) is exploding in popularity. To keep up, servers, the workhorses of the Internet, will have to get a lot smarter. Specifically, they’ll need greater computing horsepower to process image, voice, and other cognitive workloads. Today, servers are primarily composed of processors, memory, storage, and networking. Servers built for machine learning, however, will need an additional ingredient: graphics processing units (GPUs). ARK estimates that GPUs could account for up to 75% of the value of these servers as machine learning becomes a primary workload. This dramatic shift in the hardware mix will siphon value away from the central processing unit (CPU), to the benefit of GPU manufacturers.
The Internet is built on web servers—computers dedicated to serving up webpages and applications that operate 24/7. For the most part, servers use the same components as PCs. CPUs are the most valuable component in traditional servers, followed by memory and storage, as shown in the chart below. Supporting components like the motherboard, network adaptors, power supply, cooling, and chassis make up the rest. This recipe in varying combinations, scaled to millions of units, is what makes up the modern cloud datacenter.
Machine learning, the core technology behind services like Siri, Skype Translator, and Google Photos, adds a new workload to datacenters. Although today’s servers can handle these requests at modest scale, as voice and video usage continues to increase, cloud service providers will need accelerators, such as GPUs, to improve performance, cost, and energy efficiency.
In image classification, for example, GPUs outperform CPUs more than tenfold once total cost of ownership, including hardware and electricity, is taken into account, as shown in the chart below.
Machine learning can be broken into two parts: training and inference. Training involves feeding an artificial neural network, the key data structure used for machine learning, large amounts of data to help it learn. This step takes tremendous processing power and is typically done on high-performance servers with multiple GPUs. Once trained, the neural network is deployed in a production environment, where it can answer user queries such as Siri voice commands. This second step is called inference; it is generally a simpler problem but also benefits from GPU acceleration.
If ARK’s projections are correct, increased adoption of machine learning will drive greater adoption of GPUs, and the value mix of server hardware will look very different.
Adding a single GPU to a server changes the hardware value chain substantially. In the example below, the component cost of a basic “inference server” would be roughly 25% CPU and 25% GPU, with the balance going to memory, storage, networking, and other components. This build assumes a mid-range CPU and a mid-range GPU, each costing approximately $400.
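The arithmetic behind this split can be sketched in a few lines of Python. The $400 CPU and GPU prices and their 25% shares come from the example; the $800 lumped into "rest" is not an ARK figure but is backed out from those numbers, and the function name is purely illustrative.

```python
def value_shares(costs):
    """Return each component's share of the total bill of materials."""
    total = sum(costs.values())
    return {name: cost / total for name, cost in costs.items()}


# CPU and GPU prices are taken from the example in the text.
# "rest" is an assumption derived from it: if $400 is 25% of the build,
# the total is $1,600, which leaves $800 for memory, storage,
# networking, and other components combined.
inference_server = {"cpu": 400, "gpu": 400, "rest": 800}

shares = value_shares(inference_server)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Under these assumptions the CPU and the GPU each account for a quarter of the build, so a single mid-range card already matches the CPU's share of the hardware value.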
The value shift is far more pronounced with multiple GPUs in High Performance Computing (HPC) servers. HPC servers tackle the most demanding applications such as scientific research, oil and gas exploration, computational finance, and big data analysis. Unlike commodity web servers, which are designed for maximum scale, HPC servers are designed for maximum performance and require accelerators like GPUs. In the example above, modeled after the popular Dell C4130, four NVIDIA Tesla K40 GPUs are installed in a single server. At $2,700 a card, $10,800 in total, the GPUs make up almost half of the server’s bill of materials.
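The same arithmetic applies to the four-GPU build. The card count and the $2,700 price are from the text; the server's total bill of materials is not stated, so the sketch below only backs out the range implied by "almost half". Treating that phrase as a GPU share between 45% and 50% is an assumption made for illustration, as are the function names.

```python
def gpu_outlay(cards, price_per_card):
    """Total spend on GPUs for one server."""
    return cards * price_per_card


def implied_bom(gpu_cost, gpu_share):
    """Total bill of materials implied by a given GPU share of value."""
    return gpu_cost / gpu_share


# Four Tesla K40 cards at $2,700 each, as stated in the text.
gpu_cost = gpu_outlay(4, 2700)

# Assumption: "almost half" is read as a share between 45% and 50%.
bom_low = implied_bom(gpu_cost, 0.50)
bom_high = implied_bom(gpu_cost, 0.45)

print(f"GPU spend: ${gpu_cost:,}")
print(f"Implied server cost: ${bom_low:,.0f} to ${bom_high:,.0f}")
```

If that reading of "almost half" is roughly right, the rest of the server (CPUs, memory, storage, and everything else) costs about as much as the four GPUs combined.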
Facebook’s “Big Sur” machine learning server supports up to eight GPUs.
When researchers discovered that GPUs could be used to accelerate the training of neural networks, they created servers with even greater GPU density than HPC servers. Both Facebook [FB] and Baidu [BIDU] have deployed 4U servers with up to eight GPUs per server for this purpose. Based on ARK’s research and analysis, GPUs make up as much as 75% of Facebook’s Big Sur server hardware cost, with other components playing mostly a supporting role.
Having witnessed the market potential for accelerators, competitors are catching on. Intel [INTC] is developing “Knights Landing,” a 72-core chip that will deliver GPU-level computing performance. Intel also has acquired Altera, with the long-term goal of integrating field-programmable gate arrays (FPGAs) into its chips to accelerate data center applications. Meanwhile, NVIDIA [NVDA] is set to release its Pascal chip this year with substantial performance, memory, and bandwidth improvements.
Will all servers include GPUs some day? Probably not. But ARK believes two trends are clear. First, if Moore’s Law is slowing down, accelerators such as GPUs will be one of the best ways to scale performance in the absence of CPU improvements. Second, web services are shifting from “dumb” requests, like serving a webpage, to “smart” requests, like calling artificial intelligence assistants. As ARK’s research illustrates, when “intelligence” permeates a greater portion of web services, the need for GPUs will increase. In the very long run, if all software is imbued with intelligence and servers exist predominantly to serve intelligence, then the notion that all servers will include GPUs isn’t so far-fetched after all.