kilotaras

Alibaba Cloud claims to reduce the number of Nvidia GPUs used for serving unpopular models by 82% (emphasis mine)

> 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found

Instead of 1192 GPUs they now use 213 for serving those requests.
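As a quick sanity check on the headline figure, the 82% reduction follows directly from the two GPU counts in the article (a minimal sketch; the 1192 and 213 figures are from the source, the rounding is mine):

```python
# Verify the claimed ~82% reduction from 1192 GPUs down to 213.
before = 1192
after = 213
reduction = (before - after) / before
print(f"{reduction:.1%}")  # prints "82.1%"
```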

djoldman

Key paragraph:

> However, a small handful of models such as Alibaba’s Qwen and DeepSeek are most popular for inference, with most other models only sporadically called upon. This leads to resource inefficiency, with 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found.

hunglee2

The US attempt to slow down China's technological development succeeds in preventing China from directly following the same path, but it may backfire by forcing Chinese innovation in a different direction. The overall outcome for us all may be increased efficiency as a result of this forced innovation, especially if Chinese companies continue to open-source their advances. So we may, in the end, have reason to thank the US for its civilisational gatekeeping.

braza

Does someone know if there's some equivalent of those engineering/research blogs for Chinese companies?

I used to follow the ones from Western companies, but honestly, after a certain point I'd like to see case studies from what I consider a good engineering benchmark for everyone who doesn't work in FAANG.

shoeb00m

Would this make cloud providers running low-volume fine-tuned models more economically viable?

checker659

They are working with tiny models. Not sure how well it'd scale to bigger models (if at all).

ibejoeb

Sounds like this virtual GPU is a separate scheduler. I wonder what kind of latency is introduced by marshaling all that data around.

mighmi

To what extent is this practice applicable to other workloads?

throwaway48476

It's easy enough for a well-resourced entity to take a pre-trained model and deploy it on new hardware to save on the NVDA tax. It's far less likely for research and model training to happen outside the mature NVDA ecosystem.

catigula

Sounds like they stopped doing something stupid.

lnxg33k1

Lots of shareholders here, move along, there is nothing to read

t0lo

Is this another nail in the gpu/ai stock market bubble coffin?

wslh

How feasible is it that, within a horizon of 5 years, new optimized "equations" will cut the need for more GPUs?
