Software Designed Out Memory

In late March, Google announced software that reduces the amount of Random Access Memory (RAM) needed to run AI inference workloads by up to six times.

The effect on the market was dramatic. Memory prices, which had been rising all year and were expected to keep rising into 2027, temporarily fell by as much as 20% over six days for some products. Investors reacted too: shares of Samsung Electronics fell 4.8%, SK Hynix declined 5.9%, and Micron Technology dropped 3.4%.

Although the market has partly bounced back since the initial jitters, the announcement has real implications for the future of enterprise IT hardware, especially because developers are already using the software.

What Does This Mean for Our Customers?

Less memory-hungry AI applications mean that users can do a lot more with less RAM, and operate more sustainably as a result. Here are five examples:

1. Extending the Life of Existing Technology

The software makes more efficient use of existing hardware. Rather than relying on new, denser memory modules for inference workloads, better compression lets current technology do more. Existing hardware therefore has plenty of useful life left in it. The software targets memory, but storage is affected as well, so the benefit is twofold.

2. Local Models and Efficiency in the Networks

Being able to run more inference on less hardware opens up the possibility of running more models locally rather than in the cloud. This reduces not only the load on data centres but also network traffic. Since activities such as streaming have been calculated to use 70% of their energy in the networks, running AI closer to the user also has implications for the global energy use of AI workloads.

3. Smart Sensors Could Get Smarter Still

Commentators have already pointed to the possibility of mobile phones processing AI locally. The same principle holds for other connected devices, notably smart sensors. From a sustainability perspective, one downside of the Internet of Things is that the energy saved through real-time monitoring and adjustment of energy use can be offset by the energy consumed in transmitting the data. Local smart models, which process data on the device and send less of it over the network, can change this.

4. Lower Cost Bar for Deploying AI

This kind of technology also lowers the financial bar to deploying AI models, because using less memory makes each deployment cheaper. Organisations could scale inference workloads without a matching rise in the cost of on-premises infrastructure or cloud services. That could remove a key reason why over 80% of companies are pulling back from AI deployments: they are not seeing a return on investment (ROI).

5. Moving into More Useful AI Applications for Business

One commentator interviewed by InfoWorld stated: “The moment you move beyond toy prompts and start working with long documents, multi-step workflows, or anything that needs context to persist, memory becomes the constraint.”

If this is the case, then making inference less memory-hungry does not just reduce the cost of deploying agentic AI; it also makes agentic AI more useful. Companies could get better performance from their deployments, at a lower cost and faster than before, which makes agentic AI a more attractive investment.

A Word to the Wise

Despite all the benefits of doing more with less, the memory market is unlikely to collapse any time soon. Experience tells us that efficiency gains rarely halve the amount of material used; more often, usage doubles as lower costs attract new uses (an effect known as the Jevons paradox). So if components are in high demand now, they are likely to remain in high demand as the cost of deploying AI falls and more organisations get on board.
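The arithmetic behind this rebound effect can be sketched in a few lines of Python. The figures are purely hypothetical assumptions chosen for illustration, not market data: 1,000 inference workloads needing 48 GB of memory each, a six-fold reduction in memory per workload, and deployments growing twelve-fold because inference becomes cheaper. The helper `total_memory_demand` is a made-up name for this example.

```python
# Hypothetical illustration of the rebound effect (Jevons paradox).
# All figures below are assumptions for illustration, not market data.

REDUCTION_FACTOR = 6  # memory needed per workload falls "by up to six times"


def total_memory_demand(workloads: int, gb_per_workload: float) -> float:
    """Return the total memory (GB) needed to serve a number of workloads."""
    return workloads * gb_per_workload


# Assumed starting point: 1,000 workloads at 48 GB each.
workloads_before = 1_000
gb_before = 48.0

# Assumed response: cheaper inference leads to twelve times as many workloads,
# each needing six times less memory than before.
workloads_after = workloads_before * 12
gb_after = gb_before / REDUCTION_FACTOR

demand_before = total_memory_demand(workloads_before, gb_before)
demand_after = total_memory_demand(workloads_after, gb_after)

# Total demand only falls if deployments grow by less than the reduction factor.
break_even_workloads = workloads_before * REDUCTION_FACTOR

print(f"Demand before: {demand_before:,.0f} GB")
print(f"Demand after:  {demand_after:,.0f} GB")
print(f"Break-even deployments: {break_even_workloads:,}")
```

With these assumed numbers, total demand doubles from 48,000 GB to 96,000 GB. More generally, overall memory demand only shrinks if the number of workloads grows by less than the six-fold efficiency gain.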

If you would like to know more about how Techbuyer can help with your memory or hardware needs, contact our sales team.