In this talk, d-Matrix CTO/Cofounder Sudeep Bhoja discusses the impact that the release of the DeepSeek R1 model is having on inference compute.
Stepping through the evolution of reasoning models and the significance of inference-time compute in enhancing model performance, Sudeep examines the techniques, methods, and implications in detail.
Highlights:
- Reasoning models rely on “inference-time compute” and will unlock the golden age of inference.
- DeepSeek R1 is only the first of many open models that will compete with frontier models. Distillation makes smaller models much more capable.
- Model architecture and algorithmic techniques can unlock efficiency today.
- Models are highly memory-bound, so GPUs end up underutilized.
- Deploying on an efficient inference compute platform like d-Matrix Corsair results in faster responses, cost savings, and energy efficiency.
Inference-Time Compute: Sudeep shares the characteristics of inference-time compute on models large and small, noting that the more computation you do during inference, the better the model's output gets. There is a balancing act, though: as you enhance the model with more inference-time compute, latency also increases, leaving users waiting longer to see a response.
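The talk does not prescribe one specific technique, but self-consistency (sample N chains of thought, then majority-vote the final answers) is a common way to spend inference-time compute. The sketch below illustrates the compute-versus-latency trade-off; `sample_answer` is a hypothetical stub standing in for a real model call, not an API from the talk.

```python
# Illustrative sketch of one inference-time compute technique: self-consistency
# (best-of-N sampling with majority voting). `sample_answer` is a hypothetical
# stand-in for a sampled chain-of-thought call to a reasoning model.
import random
from collections import Counter

def sample_answer(prompt: str, temperature: float = 0.8) -> str:
    # Hypothetical model call: returns the final answer extracted from one
    # sampled reasoning trace. Stubbed here with a noisy distribution.
    return random.choice(["42", "42", "42", "41"])

def self_consistency(prompt: str, n_samples: int) -> str:
    # More samples = more inference-time compute = better answers on average,
    # but latency and cost grow proportionally with n_samples.
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?", n_samples=16))
```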
Reviewing performance numbers, he steps through how synthetic datasets are generated from these new open-source models and what distillation into smaller models involves. By creating a distillation dataset from a larger teacher model and running supervised fine-tuning on smaller student models, those student models become much more capable.
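As a rough illustration of that recipe, here is a minimal sketch of the two stages: teacher-generated synthetic data, then supervised fine-tuning of the student, assuming a Hugging Face-style workflow. The checkpoint names, prompt, and hyperparameters are placeholders, not the ones DeepSeek or d-Matrix used.

```python
# Sketch of distillation: a large teacher generates reasoning traces, and a
# small student is supervised fine-tuned (SFT) on them. Names are placeholders.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

TEACHER = "org/large-reasoning-model"  # hypothetical teacher checkpoint
STUDENT = "org/small-base-model"       # hypothetical student checkpoint

tok = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER)

# Step 1: have the teacher generate reasoning traces for a set of prompts.
prompts = ["Q: If x + 3 = 7, what is x? Think step by step.\nA:"]
records = []
for p in prompts:
    inputs = tok(p, return_tensors="pt").to(teacher.device)
    out = teacher.generate(**inputs, max_new_tokens=512,
                           do_sample=True, temperature=0.7)
    records.append({"text": tok.decode(out[0], skip_special_tokens=True)})

# Step 2: supervised fine-tuning of the student on the teacher's traces.
student_tok = AutoTokenizer.from_pretrained(STUDENT)
student_tok.pad_token = student_tok.pad_token or student_tok.eos_token
student = AutoModelForCausalLM.from_pretrained(STUDENT)

ds = Dataset.from_list(records).map(
    lambda ex: student_tok(ex["text"], truncation=True, max_length=1024),
    remove_columns=["text"],
)
Trainer(
    model=student,
    args=TrainingArguments(output_dir="distilled-student", num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(student_tok, mlm=False),
).train()
```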
Finally, Sudeep explains that reasoning models are highly memory-bound and end up underutilizing GPUs that were optimized for training. He highlights the potential of new architectures and purpose-built ASICs like our d-Matrix Corsair, which delivers efficient inference-time compute, dramatically reduces latency, improves energy efficiency, and is ideal for the age of inference.
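A back-of-the-envelope roofline estimate shows why autoregressive decode is memory-bound. The GPU figures below are approximate public specs for an NVIDIA H100, chosen here purely for illustration; they are assumptions, not numbers from the talk.

```python
# Why batch-1 decode is memory-bound: every generated token requires reading
# all model weights once, so arithmetic intensity (FLOPs per byte moved) is
# tiny compared to what a training-optimized GPU needs to stay busy.
params = 70e9        # 70B-parameter model (assumed example size)
bytes_per_param = 2  # FP16/BF16 weights

flops_per_token = 2 * params                 # ~2 FLOPs per parameter per token
bytes_per_token = params * bytes_per_param   # all weights read once per token

peak_flops = 989e12  # ~989 TFLOPS dense BF16 (approx. H100 SXM spec)
peak_bw = 3.35e12    # ~3.35 TB/s HBM3 bandwidth (approx. H100 SXM spec)

intensity = flops_per_token / bytes_per_token  # ~1 FLOP per byte
balance = peak_flops / peak_bw                 # ~295 FLOPs/byte to saturate compute

print(f"arithmetic intensity: {intensity:.1f} FLOPs/byte")
print(f"GPU balance point:    {balance:.0f} FLOPs/byte")
print(f"compute utilization:  {intensity / balance:.1%}")  # well under 1%
```

With an intensity near 1 FLOP/byte against a balance point near 300, the GPU's math units sit mostly idle while it streams weights from memory, which is the underutilization Sudeep describes and the gap an inference-first architecture like Corsair targets.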