AI was supposed to transform inventory management. With enough data and computing power, AI systems should be able to simply decide how much to order, when to order it, and where to send it—better than any human planner ever could.
In practice, many of these systems fail. They make counterintuitive decisions, require endless tuning, and struggle to perform reliably once deployed. A new paper co-authored by Will Ma, Roderick H. Cushman Associate Professor of Business at CBS, shows why this happens and how the largest e-commerce retailer in China was able to fix it.
Studying inventory decisions at Alibaba’s Tmall platform, Ma and his co-authors Yaqi Xie, Xinru Hao, Jiaxi Liu, Linwei Xin, Lei Cao, and Yidong Zhang demonstrate that AI works best not when it is given complete freedom, but when it is guided by the same basic logic that experienced inventory managers already use, logic rooted in decades of inventory research. Their practical approach, called DeepStock, blends modern “Deep Reinforcement Learning” AI with classic “Base Stock” inventory logic and has proven robust enough to manage inventory for more than one million product-warehouse combinations in the real world.
Ma noted that “AI has unbelievable promise, extracting demand signals from large-scale data and optimizing inventory accordingly.” However, practical deployments require a “hybrid” approach that builds structured inventory logic into the AI’s decisions. Incorporating operational domain knowledge in this way offers a general recipe for getting the most out of AI’s power in the future.
Putting AI to the Test
Ma and his co-researchers examined inventory management as a recurring, high-stakes decision problem. Every few days, firms must decide how much inventory to order for each product, knowing that demand is uncertain and that deliveries can arrive with delays.
Modern AI systems can draw on a vast range of inputs to inform these decisions, including past sales, demand forecasts, upcoming promotions, seasonality, supplier lead times, current inventory levels, and even recent social media trends. The authors’ central question was not whether AI could process this information, but how much freedom it should be given in turning data into decisions.
To answer that question, they compared several widely used AI approaches for inventory planning. Some methods learn by estimating the long-term value of different actions; others directly reward actions that perform well in simulated scenarios. The authors then introduced a critical change: instead of allowing the AI to output any order quantity it wanted, they constrained its decisions using familiar inventory logic.
For example, rather than deciding an order quantity from scratch, the AI first learned a sensible target inventory level and then ordered only what was needed to reach that target. In other cases, the model was structured so that stronger demand signals naturally translated into larger orders.
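The order-up-to idea described above can be illustrated with a minimal Python sketch. This is not the DeepStock implementation: the function names, the `days_of_cover` parameter, and the linear target rule are hypothetical stand-ins for a learned model, and counting in-transit stock toward the inventory position is an assumption based on standard base-stock practice.

```python
def base_stock_order(target_level: float, on_hand: float, in_transit: float) -> float:
    """Order only what is needed to reach the target level, never a negative amount."""
    # Assumption: stock already ordered but not yet delivered counts toward the position.
    inventory_position = on_hand + in_transit
    return max(0.0, target_level - inventory_position)


def target_from_signal(demand_signal: float, days_of_cover: float = 3.0) -> float:
    """Hypothetical placeholder for the learned model: the target rises with demand."""
    return days_of_cover * max(0.0, demand_signal)


if __name__ == "__main__":
    # Same stock on hand, weaker vs. stronger demand signal.
    weak = base_stock_order(target_from_signal(100.0), on_hand=150.0, in_transit=50.0)
    strong = base_stock_order(target_from_signal(200.0), on_hand=150.0, in_transit=50.0)
    # Inventory already above the target: nothing is ordered.
    full = base_stock_order(target_from_signal(100.0), on_hand=500.0, in_transit=200.0)
    print(weak, strong, full)
```

Because the order is computed from a target rather than chosen freely, two of the behaviors described in this article follow by construction in this sketch: a higher demand signal never produces a smaller order, and no order is placed when inventory already exceeds the target.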
The models were evaluated using large historical datasets covering tens of thousands of products and then validated through live deployments across Alibaba’s network, where inventory decisions must be made repeatedly across more than one million product–warehouse combinations.
How Guardrails Made AI Better
Across simulations and real-world deployments, the results were consistent. AI systems performed better when they were forced to respect basic inventory logic.
First, structured models were far easier to train. Unconstrained AI systems often worked well only after extensive trial and error, with performance swinging sharply depending on technical tuning choices. When inventory logic was built directly into the decision process, performance improved more quickly and remained stable across a wide range of settings.
Second, these systems avoided common operational mistakes. They learned not to place large orders when inventory was already high and to respond more aggressively when demand picked up—behaviors that experienced managers expect, but that unconstrained AI systems sometimes failed to learn reliably.
Third, the approach proved its value where it mattered most: in live operations. Alibaba initially tested the system on a subset of products and then rolled it out more broadly. The AI reduced how long products sat in warehouses without increasing the frequency of stockouts, a rare and valuable outcome in inventory management. This suggests that DeepStock is stocking the right inventory at the right place at the right time.
These improvements held at remarkable scale. By late 2025, the system was managing inventory decisions for more than one million product–warehouse combinations across the Tmall platform. Even modest reductions in inventory holding time, without sacrificing service levels, translated into substantial financial impact, including an estimated $50 million reduction in inventory and over $1.8 million in annual capital cost savings (all figures converted to USD).
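As a rough check on how the two reported figures relate, the short calculation below divides the annual capital cost savings by the inventory reduction. It assumes, purely for illustration, that the savings are a simple annual rate applied to the freed-up inventory value; the article does not say how the savings were calculated, and the variable names are hypothetical.

```python
# Figures as reported in the article (converted to USD).
inventory_reduction_usd = 50_000_000
annual_capital_savings_usd = 1_800_000  # reported as "over $1.8 million"

# Assumption: savings = annual capital cost rate x inventory reduction.
implied_rate = annual_capital_savings_usd / inventory_reduction_usd
print(f"Implied annual capital cost rate: {implied_rate:.1%}")
```

Under that assumption, the reported numbers correspond to an annual capital cost of roughly 3.6% of the inventory value released.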
What It Takes for AI to Work in the Real World
In complex operational settings, unconstrained AI systems must learn even the most basic rules from scratch, making them brittle and difficult to deploy at scale. In other words, more freedom is not always better, especially in practice, where training data and training time are limited.
By embedding well-understood business logic directly into AI systems, firms can make these tools easier to train, easier to trust, and more effective in real-world environments. The success of DeepStock suggests that the future of AI in operations lies not in replacing human judgment, but in encoding it.
AI delivers the greatest value when it reflects how decisions are actually made—under uncertainty, time pressure, and operational constraints. When designed this way, AI can move beyond pilots and experiments to become reliable infrastructure running the core of large organizations.
Read the full article by Will (Wei) Ma, Roderick H. Cushman / Columbia Business School











