IBM recently announced that it will make its Spyre AI accelerator commercially available. The Spyre accelerator brings low-latency inferencing for in-system generative and agentic AI to compatible IBM computers.
IBM Spyre accelerator in PCIe form.
IBM previewed the Spyre AI accelerator for IBM Z at Hot Chips 2024, with the goal of scaling the "enterprise AI workloads of tomorrow." Now, the company has set an initial general-availability date: Spyre will ship for IBM z17 and LinuxONE 5 mainframes on October 28, 2025. Compatibility with IBM Power11 servers will follow in December 2025.
The Spyre Accelerator's Hardware
The Spyre accelerator is powered by a 32-core, 25.6-billion-transistor system on chip (SoC) with up to 1 TB of memory. The accelerator is designed to work with IBM’s Telum II processor and comes in a PCIe card form factor. Up to 48 cards can be deployed in an IBM Z or LinuxONE mainframe system, and up to 16 cards in an enterprise Power11 server.
The product came out of a research effort to bring generative AI capability directly into the enterprise IT environment. The IBM Research AI Hardware Center produced the initial iteration of Spyre as a prototype chip. From there, the team worked with the enterprise infrastructure development group to rapidly iterate the design and develop the hardware architecture. The result is a 5-nm SoC built into a 75-W PCIe card.
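To put the card counts in perspective, the quoted figures can be multiplied out. The short Python sketch below uses only the numbers given above (32 cores per card, 75 W per card, and the 48-card and 16-card maximums). It assumes that every card runs at its full 75-W rating, so the power totals are rough upper bounds for the cards alone, not measured system figures.

```python
# Figures quoted in the article.
CORES_PER_CARD = 32
WATTS_PER_CARD = 75  # assumption: treated as a per-card ceiling
MAX_CARDS = {"IBM Z / LinuxONE": 48, "Power11": 16}


def aggregate(cards):
    """Return total accelerator cores and card power for a given card count."""
    return {"cores": cards * CORES_PER_CARD, "watts": cards * WATTS_PER_CARD}


totals = {system: aggregate(cards) for system, cards in MAX_CARDS.items()}

for system, t in totals.items():
    print(f"{system}: {t['cores']} accelerator cores, up to {t['watts']} W for the cards")
```

A fully populated mainframe would therefore expose 1,536 accelerator cores within roughly 3.6 kW of card power, and a fully populated Power11 server 512 cores within roughly 1.2 kW.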
The Spyre processor is optimized for matrix multiplication, the mainstay of AI computation, and for efficient core-to-core data transfer. The cores support int4 and int8 numeric formats for fast AI processing. The result is a processor that is faster and more energy efficient at AI than a multipurpose CPU, but still integrates easily within enterprise computing systems. The October release includes the latest version of z/OS, 3.2, with complete Spyre acceleration support.
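Low-precision formats such as int8 speed up matrix multiplication because the heavy arithmetic is done on small integers, with a floating-point scale applied only at the end. The pure-Python sketch below illustrates the general idea using symmetric per-tensor quantization. It is a generic textbook scheme chosen for illustration, not a description of how Spyre or IBM's software stack quantizes models, and the function names are hypothetical.

```python
def quantize_int8(matrix):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for row in matrix for v in row)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [
        [max(-127, min(127, round(v / scale))) for v in row] for row in matrix
    ]
    return quantized, scale


def matmul(a, b):
    """Plain matrix product; works for integer or float entries."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def quantized_matmul(a, b):
    """Quantize both inputs, multiply as integers, then rescale to floats."""
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    accumulated = matmul(qa, qb)  # integer-only multiply-accumulate
    return [[v * scale_a * scale_b for v in row] for row in accumulated]


a = [[0.5, -1.0], [2.0, 0.25]]
b = [[1.0, 0.0], [-0.5, 1.5]]

exact = matmul(a, b)
approx = quantized_matmul(a, b)
max_error = max(
    abs(e - q) for row_e, row_q in zip(exact, approx) for e, q in zip(row_e, row_q)
)
print("exact: ", exact)
print("approx:", approx)
print("max error:", max_error)
```

The quantized result differs slightly from the floating-point one. That small loss of precision is the price paid for cheaper integer arithmetic.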
IBM and a World of Transactions
According to IBM data, 70% of the world’s transactions, by dollar value, are run through IBM systems. Much of the computing work surrounding these transactions can benefit from AI augmentation. However, prior to the Spyre accelerator, mission-critical generative AI either had to run on non-optimized systems or be outsourced to cloud-based AI resources. Either approach introduces performance and security challenges. The new accelerator enables an optimized AI architecture within supported IBM enterprise systems.
Prototype Spyre accelerator chip as shown at Hot Chips 2024.
Spyre allows large operators to directly integrate generative and predictive AI applications within their own computing architecture. This promotes better security, higher performance, and internal control. AI within the mainframe will automate processes, improve and optimize apps, and develop bespoke models. z/OS 3.2 adds support for AI-driven data access methods as well.
The agentic AI that Spyre supports differs from conventional AI in that AI agents, rather than humans, drive and direct the generative process. Many generative AI applications rely on human or programmed guidance when dealing with multi-step processes. Agentic AI uses AI agents to anticipate needs and drive multi-step workflows. IBM explains, for example, that AI agents will not only tell you the best time to go on a climbing expedition but will also reserve a flight and hotel. While conventional AI would likely not connect the two tasks, IBM’s agentic AI would infer the need for the second task based on the first.
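The control flow behind that climbing example can be sketched in a few lines of Python. This is a toy illustration only: a real agent would use a model to infer follow-on tasks, whereas the hypothetical FOLLOW_ON_TASKS table below hard-codes them as a stand-in. None of the names reflect an IBM API.

```python
from collections import deque

# Hypothetical stand-in for the model's inference step: which tasks a
# completed task implies. A real agent would infer these, not look them up.
FOLLOW_ON_TASKS = {
    "find best climbing dates": ["reserve flight", "reserve hotel"],
}


def run_agent(goal):
    """Work through a goal and any follow-on tasks it implies, in order."""
    pending = deque([goal])
    completed = []
    while pending:
        task = pending.popleft()
        completed.append(task)  # placeholder for actually performing the task
        for follow_on in FOLLOW_ON_TASKS.get(task, []):
            if follow_on not in completed and follow_on not in pending:
                pending.append(follow_on)
    return completed


workflow = run_agent("find best climbing dates")
print(workflow)
```

The user asks only for the best dates, yet the loop also queues the flight and hotel reservations, which mirrors the agent connecting the second task to the first.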
Past Meets Future in Spyre Accelerators
One of IBM’s modernized AI applications involves updating and translating COBOL programs into modern languages. COBOL, or “Common Business-Oriented Language,” harkens back to the early days of electronic computing. First released in 1960, it was one of the first compiled languages available for civilian business use. COBOL was the mainstay for financial and administrative applications for decades.
Today, however, there are few experts in the language, and it has largely been sunsetted for new development. Even so, millions of lines of COBOL code still exist in mission-critical systems throughout the world. The IBM watsonx Code Assistant for Z uses AI to allow non-COBOL experts to upgrade this obsolete code. One of the design goals of the Spyre accelerator was to support this specific need.
What’s Next for Spyre?
IBM envisions a long future for Spyre and follow-on products. In the company's view, mainframe and enterprise server architectures that combine high-capability conventional computing with dedicated AI hardware will become the norm. Such systems can apply low-latency AI to myriad applications. A Spyre-accelerated system would deliver faster and more accurate fraud detection, improved code generation and code updating, and AI applications with more comprehensive agentic service levels. IBM even sees the potential for bringing large language model (LLM) training into the mainframe.
All images used courtesy of IBM.