Cerebras Systems, a U.S. startup, is positioning itself as a formidable competitor to Nvidia in the AI chip market with its Wafer Scale Engine processors. These iPad-sized chips aim to accelerate AI workloads while significantly reducing costs.
On Tuesday, Cerebras unveiled a new service called “Cerebras Inference,” which lets customers run their AI programs on its proprietary chips. The startup asserts that its technology can run generative AI tasks up to 20 times faster, and at one-fifth the cost, of Nvidia’s industry-standard GPUs, such as the H100.
Cerebras Inference focuses on inference, the stage where a trained model generates new output, such as predicting the next word in a text. The company claims its platform is the “fastest AI inference solution in the world.” To back this claim, Cerebras is using its Wafer Scale Engine chips to power Meta’s Llama 3.1, an open-source large language model, and reports that Llama 3.1 delivers responses with virtually no lag on its hardware.
Cerebras highlights that its chips can run the 8 billion parameter version of Llama 3.1 at a rate of 1,800 tokens per second, translating to the ability to generate a 1,300-word article in just one second. For the more powerful 70 billion parameter version, the chips can produce 450 tokens per second. These benchmarks, according to Cerebras, surpass the token-per-second performance of AI cloud providers such as Amazon AWS, Microsoft Azure, and Groq.
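The jump from tokens per second to words per second can be sanity-checked with a rough rule of thumb of about 0.75 English words per token (the exact ratio depends on the tokenizer); a minimal sketch:

```python
# Rough sketch: convert a quoted token throughput into an estimated word rate.
# Assumes ~0.75 words per token, a common rule of thumb for English text;
# the true ratio varies with the tokenizer and the content.
WORDS_PER_TOKEN = 0.75

def words_per_second(tokens_per_second: float) -> float:
    """Estimate words generated per second from a token throughput figure."""
    return tokens_per_second * WORDS_PER_TOKEN

# Figures quoted for Llama 3.1 on Cerebras Inference:
print(words_per_second(1800))  # 8B model  -> 1350.0 words/s, roughly a 1,300-word article per second
print(words_per_second(450))   # 70B model -> 337.5 words/s
```

At 1,800 tokens per second, this works out to roughly 1,350 words per second, consistent with the company’s 1,300-word-article claim.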
In terms of cost, Cerebras Inference is highly competitive. The service is offered at a fraction of the price of GPU-based alternatives, with pay-as-you-go pricing starting at just 10 cents per million tokens for the Llama 3.1 8B model and 60 cents for the 70B model. This contrasts sharply with OpenAI’s pricing, which ranges from $2.50 to $15 per million tokens.
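To make the pricing gap concrete, the per-million-token rates above can be applied to a hypothetical workload; a minimal sketch, using the figures quoted in this article (actual OpenAI pricing varies by model and by input versus output tokens):

```python
# Rough cost comparison at the per-million-token rates quoted above.
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD for a given number of tokens at a per-million-token rate."""
    return tokens / 1_000_000 * price_per_million_usd

tokens = 10_000_000  # a hypothetical workload of ten million tokens

print(cost_usd(tokens, 0.10))   # Cerebras, Llama 3.1 8B  -> $1.00
print(cost_usd(tokens, 0.60))   # Cerebras, Llama 3.1 70B -> $6.00
print(cost_usd(tokens, 2.50))   # OpenAI, low end of quoted range  -> $25.00
print(cost_usd(tokens, 15.00))  # OpenAI, high end of quoted range -> $150.00
```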
Cerebras’ Wafer Scale Engine chips are manufactured by Taiwan’s TSMC, the same contract chip maker responsible for Nvidia’s AI GPUs. In March, Cerebras introduced its third-generation chip, the WSE-3, which boasts an impressive 4 trillion transistors and 900,000 AI cores, further solidifying its potential to disrupt the AI hardware market.