With the arrival of reasoning and inference-time compute, we are at an inflection point in the AI computing journey. Finally, revenue generation from AI models is aligning with the cost of AI compute: the more you think (compute), the better the outcomes (results, productivity). This makes intuitive sense, and it is something we have used as a guiding principle from the early days of d-Matrix. A person with multiple advanced degrees is not necessarily smarter than the high-school graduate who acquires domain-specific knowledge, then thinks and works hard to apply that knowledge to their professional outcomes.
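To make the "more thinking, better outcomes" intuition concrete, here is a toy back-of-the-envelope illustration (ours, not from the original post) of one popular flavor of inference-time compute, self-consistency: sample several independent reasoning chains and take a majority vote on the answer. It assumes each chain is independently correct with a fixed probability, which is a simplification, but it shows why spending more compute at inference time tends to improve results.

```python
import math

def majority_vote_accuracy(p_single, n_samples):
    """Probability that a majority vote over n independent reasoning chains is
    correct, when each chain alone is correct with probability p_single.
    Ties are counted as wrong, to stay conservative."""
    need = n_samples // 2 + 1  # strict majority
    return sum(
        math.comb(n_samples, k) * p_single**k * (1 - p_single)**(n_samples - k)
        for k in range(need, n_samples + 1)
    )

# More inference-time compute (more sampled chains) -> better expected outcome.
for n in (1, 3, 9, 27):
    print(f"{n:2d} chains -> accuracy {majority_vote_accuracy(0.6, n):.3f}")
```

With a per-chain accuracy of 60%, voting over 27 chains already pushes expected accuracy well above 80% in this toy model.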
The DeepSeek AI reasoning models (R1, R1-Zero), built with innovative techniques like reinforcement learning over Chain-of-Thought (CoT) reasoning and knowledge distillation, have put open-source reasoning models on the map. Not only does DeepSeek beat top proprietary models on capability, the innovative techniques used have made these models much more affordable. Imagine if you could capture the stream of thinking behind the critical life decisions you have made, distill it, and then transfer it to your kids. Wouldn't that be cool! Distillation leads to smaller models, and smaller models that are trained to think more can match the best outcomes in their chosen areas. And the sweetener: these models are openly available for everyone to try and use. DeepSeek AI has likely catalyzed the age of inference.
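For readers who want to see what distillation means mechanically, here is a minimal sketch (illustrative only, not DeepSeek's training code): a small "student" model is trained to match the softened output distribution of a large "teacher" model. The function name and toy tensors are ours.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label knowledge distillation: train the student to match the
    teacher's softened output distribution via KL divergence."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature**2

# Toy usage: a batch of 4 positions over a 32-token vocabulary.
teacher_logits = torch.randn(4, 32)                        # from the large teacher
student_logits = torch.randn(4, 32, requires_grad=True)    # from the small student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```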
At d-Matrix, we have been big proponents of inference efficiency at scale and of open source from the early years. We built the company around creating a foundational inference compute solution, with open standards as our guiding principle. As we move into the 'inference era', with multiple forces coming together – reasoning, inference-time compute, open-weight models, distillation, quantized numerics, energy-efficient architectures, interactive use cases, and ROI – we hope to make our mark in this new world with our newly launched Corsair platform, which emphasizes 'Do More with Less'.
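Of those forces, quantized numerics is easy to make concrete. The sketch below is a generic symmetric int8 example, not the actual numerics format used in Corsair: storing 8-bit integers plus one scale factor in place of 32-bit floats cuts memory footprint (and data movement) roughly 4x, at a small accuracy cost.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: 8-bit integers plus one scale."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

weights = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
print("max abs error:", np.abs(weights - recovered).max())
print("memory: fp32 =", weights.nbytes, "bytes, int8 =", q.nbytes, "bytes")
```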
More Inference with Less Cost ($), Less Energy (W), Less Time (s)
Our computing platform emphasizes:
Disaggregated hardware with an open software stack = Adoptability and cost efficiency = Less Cost ($)
Digital In-Memory Compute (DIMC) = Energy efficiency = Less Energy (W)
Memory-compute integration with chiplets = Low-latency batched inference = Less Time (s)
All this, to make Gen AI commercially viable for everyone. We would love to speak with you if we can help you on this journey.
A version of this was originally posted on LinkedIn by Sid Sheth on January 27th, 2025.
#Reasoning #R1 #Inference #AI #DIMC #chiplets #ROI