Loading a model takes at least a few seconds, usually more, depending on model size, disk &#x2F; network speed and a bunch of other factors.If you&#x27;re using an efficient inference engine like VLLM, you&#x27;re adding compilation into the mix, and not all of that is fully cached yet.If that kind of latency isn&#x27;t acceptable to you, you have to keep the models loaded.This (along with batching) is why large local models are a dumb and wasteful idea if you&#x27;re not serving them at enterprise scale.

Models take a lot of VRAM which is tightly coupled to the GPU so yeah, it&#x27;s basically sitting there with the model waiting for use. I&#x27;m sure they probably do idle out but a few minutes of idle time is a lot of waste--possibly the full 82% mentioned. In this case they optimized by letting the GPUs load multiple models and sharing the load out by token.

&gt; I guess I’d assumed this sort of thing would be allocated dynamicallyAt the scale of a hyperscaler I think Alibaba is the one that would be doing that. AWS, Azure and I assume Alibaba do lease&#x2F;rent data centers, but someone has to own the servers &#x2F; GPU racks. I know there are specialized companies like nscale (and more further down the chain) in the mix, but I always assumed they only lease out fixed capacity.

The paper is about techniques to do that dynamic allocation to maximize utilization without incurring unacceptable latencies. If you let a GPU sit idle for several minutes after serving a single request, you&#x27;re setting money on fire. So they reuse it for a different model as soon as possible, starting even before the first request is finished, because: If you don&#x27;t have a dedicated GPU for a model, are you going to wait for a multi-gigabyte transfer before each request? So they have a dedicated GPU (or two, one for prefill, one for decode) for a group of models that are processed in an interleaved fashion, scheduled such that they stay within the latency budget.

&gt;Do the GPUs just sit there with the models on them when the models are not in useI&#x27;ve assumed that as well. It makes sense to me since loading up a model locally takes a while. I wonder if there is some sort of better way I&#x27;m not in the know about. That or too GPU poor to know about.

the models are huge, so not a single (latest gen) one can fit on a single GPU.It&#x27;s likely that these are small unpopular (non flagship) models, or that they only pack eg one layer of each model.

I’m slightly confuse as to how all this works. Do the GPUs just sit there with the models on them when the models are not in use?I guess I’d assumed this sort of thing would be allocated dynamically. Of course, there’s a benefit to minimizing the number of times you load a model. But surely if a GPU+model is idle for more than a couple minutes it could be freed?(I’m not an AI guy, though—actually I’m used to asking SLURM for new nodes with every run I do!)

So 82% of 17.7%?14.5% is worth a raise at least. But it’s still misleading.

Really:&quot;A paper presented at SOSP 2025 details how token-level scheduling helped one GPU serve multiple LLMs, reducing demand from 1,192 to 213 H20s.&quot;Which, if you scale it, matches the GPs statement.

Not really, Figure 1(a) of the paper says that the 17.7% are relative to a total of 30k GPUs (i.e. 5310 GPUs for handling those 1.35% of requests) and the reduction is measured in a smaller beta deployment with only 47 different models (vs. the 733 &quot;cold&quot; models overall.) Naïve extrapolation by model count suggests they would need 3321 GPUs to serve all cold models, a 37.5% reduction to before. (Or 6.6% reduction of the full 30k-GPU cluster.)

If you&#x27;re one who sees progress as an end goal unto itself, what you describe is a good thing. When one party is attempting novel solutions to outcompete the competition we will be faster to whatever the next change is.That said, I&#x27;m not sure what the US policies specifically have to do with this. Countries are always in competition with one another, and if one industry or technology is considered a national security threat they will guard it.

In the past, software and computer engineers would tackle problems head-on, designing algorithms and finding creative solutions.thanks to the US restrictions on semiconductor industry (Chinese), Chinese engineers are being forced to innovate and find their own ways to overcome challenges like the old school engineers (What Silicon Valley used to be)

Alibaba Cloud claims to reduce Nvidia GPU used for serving unpopular models by 82% (emphasis mine)&gt; 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers foundInstead of 1192 GPUs they now use 213 for serving those requests.

these other models are likely much smaller

Key paragraph:&gt; However, a small handful of models such as Alibaba’s Qwen and DeepSeek are most popular for inference, with most other models only sporadically called upon. This leads to resource inefficiency, with 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found.

Ok, we&#x27;ve changed the URL above (from <a href="https:&#x2F;&#x2F;www.scmp.com&#x2F;business&#x2F;article&#x2F;3329450&#x2F;alibaba-cloud-claims-slash-nvidia-gpu-use-82-new-pooling-system" rel="nofollow">https:&#x2F;&#x2F;www.scmp.com&#x2F;business&#x2F;article&#x2F;3329450&#x2F;alibaba-cloud-...</a>), and will put the link to the paper in the top text. Thanks!

better link <a href="https:&#x2F;&#x2F;www.tomshardware.com&#x2F;tech-industry&#x2F;semiconductors&#x2F;alibaba-says-new-pooling-system-cut-nvidia-gpu-use-by-82-percent" rel="nofollow">https:&#x2F;&#x2F;www.tomshardware.com&#x2F;tech-industry&#x2F;semiconductors&#x2F;al...</a>paper 
<a href="https:&#x2F;&#x2F;dl.acm.org&#x2F;doi&#x2F;10.1145&#x2F;3731569.3764815" rel="nofollow">https:&#x2F;&#x2F;dl.acm.org&#x2F;doi&#x2F;10.1145&#x2F;3731569.3764815</a>

&gt; most of what western companies have created has had significant contribution by Chinese scientists or manufacturing, without which those companies would have nothing. If you look at the names of AI researchers there&#x27;s a strong pattern even if some are currently plying their trade in the west.While I don&#x27;t disagree with your overall point, it&#x27;s important to recognize that this is only a phenomenon of the last ~30 years, and to avoid falling into the trapn of Han racial chauvinism. E.g. there were ~no Chinese scientists in Germany in the 70s but they were heavily innovating nevertheless.

The whole “China copies everything” narrative is becoming less and less true.It’s funny - it’s at the point with Chinese manufacturing for niche electronic goods (e.g rooftop van air conditioner) where some Chinese brands are more trustworthy - more value for your money and sometimes even better overall quality. With American brands you gotta make sure you’re not overpaying for dated tech that is inefficient. Maybe the same will happen with LLMs.

Ironically, the best way America could have prevented China’s rise in tech was by stapling green cards to diplomas of Chinese citizens who completed their higher education in the U.S. like the plan in the early 2010s.

&gt; History has shown that withholding technology from China does not significantly stop them and they&#x27;ll achieve it (or better) in a small number of years.It&#x27;s worked for a very long time for aircraft.China has been pushing to build its own aircraft for &gt;23 years. It took 14 years for COMAC to get its first regional jet flying commercial flights on a Chinese airline, and 21 years to get a narrow-body plane flying a commercial flight on a Chinese airline.If for no technical reasons and purely political, COMAC may still be decades away from being able to fly to most of the world.Likewise, in ~5 years, China may be able to build Chips that are as good as Nvidia after Nvidia&#x27;s 90% profit margin - i.e. they are 1&#x2F;10th as good for the price - but since they can buy them for cost - they&#x27;re they same price for performance and good enough.If for purely political reasons, China may never be able to export these chips to most of the world - which limits their scale - which makes it harder to make them cost effective compared to Western chips.

Re: Western. A similar thing plays out when the term &quot;international community&quot; is used in news. It refers to the US and its major allies which means US, Canada, Western Europe, Japan, Australia and New Zealand more or less.

It&#x27;s just straight up low expectations and underestimation derived from racism in the assumption that Americans are smarter and more capable, and Chinese are only good for copying designs and making things we come up with. The idea that they can&#x27;t do that like we can is pervasive.

Nobody thinks the Japanese aren’t “civilized.” “Western” is just a euphemism for “rich and orderly.”

&gt; look at the namesWhy would I do that tho? If we look at the names of scientists&#x2F;researchers&#x2F;engineers&#x2F;businessmen, the conclusion would be that the US has contributed nothing to the world. Europeans did all the hard work!

this is true for anyone - create challenges, and you optimize efficiency elsewhere.Also, isn&#x27;t this the usual path to better computer science? Reducing computation needs by making better&#x2F;more efficient algorithms? The whole &quot;trillions of dollars of brute force GPU strength&quot; proposed by Altman, Nadella, Musk et al just seems to reinforce that these are business people at heart, not engineers&#x2F;computer scientists...

The whole &quot;western&quot; or &quot;the west&quot; always makes me laugh. Half the time it&#x27;s a dog whistle for &quot;white&quot;. Like many right-wing commentators love saying &quot;Western Values&quot; to avoid saying &quot;white, Euro-centric, Christian values&quot;.Mexico is a modern country, an industrialized country, a country that is exactly as &quot;western&quot; as the US or Canada. They have the same religious beliefs, speak a dialect of a European language. They have European style cities, a long history of cultural contributions. Yet they&#x27;re not white enough to be part of &quot;The West&quot;.I think at this point we should be honest with ourselves in it&#x27;s usage. 90% of the time it&#x27;s a racist dog whistle.

I am not sure exactly to what degree, but &quot;I hate the term &#x27;western&#x27; because some &#x27;weste[r]ners&#x27; use it to separated what they think are &#x27;civilized&#x27; from &#x27;uncivilized&#x27;&quot; is definitely a bit of an antiquated perspective at this point; almost like a justification to hold on to other older perspectives about &quot;racism&quot;. I have started resorting to using terms like European Cultural Block because of it in certain communities that understand contemporary topics and have an advanced understanding of the world.Your first statement is not likely unique to China though, even though they have demonstrated that in about the last 40 years, which I don&#x27;t really think qualifies as &quot;history&quot;. What it does demonstrate is that societies that have a certain kind of ethnic self-respect and can cast off the detrimental influences of foreign, hostile, and even enemy elements to pursue their own self-interest and survival will succeed, regardless of hurdles placed before them.It&#x27;s really just a story of personal development and either escaping, evading, and avoiding detrimental, toxic people and their behaviors. All of humanity that all has to currently still share a single planet with ZERO save spots, would be better off if we all not just allowed each other to be ourselves in our won places without others subverting, subjugating, infiltrating, dominating, poisoning, or polluting any other people on the planet. Then everyone can decide if we want to be friends or not friends with each other, collaborate and be friendly or simply avoid each other. We do not have to like each other to get along if everyone agrees on a base understanding that no people can parasitize and abuse and manipulate any others.

western is a cultural term derived from a geographic one. The US is also not &#x27;western&#x27; strictly geographically as it is not in western europe, neither is australia. But they both originated from Britain&#x27;s empire and share in it&#x27;s cultural ancestry. It means &quot;western europe and it&#x27;s cultural derivatives&quot;. Spain and Portugal&#x27;s empire fell away long before britain and france&#x27;s and they don&#x27;t have similar geopolitical relations like NATO, so it&#x27;s hard to consider their former colonies&#x2F;upstarts part of the same sphere of cultural influence.China for sure will catch up, the question is what they will do with it. They&#x27;re not ambitious like the US&#x2F;West. The US wanted influence all over the world as an extension of the cold war and to keep economic interests safeguarded. But China just doesn&#x27;t operate that way. They&#x27;re more hands-off. They could be opening up alibaba cloud datacenters all over the US, offering it as an AWS&#x2F;Azure alternative, funding tons of startups all over europe, the US,etc... to exert their influence, but they won&#x27;t. They have a more long-term low-and-slow approach to global domination. The &quot;100 year marathon&quot; as they called it, which they&#x27;ll win for sure.China&#x27;s greatest weakness is not just their lack of ambition,but their command-economy. They&#x27;re doing capitalism but with central control of the economy. It intertwines government policy with corporate policy, making it harder to do business overseas (like with bytedance&#x2F;tiktok).

&gt;History has shown that withholding technology from China does not significantly stop them and they&#x27;ll achieve it (or better) in a small number of years.I don&#x27;t think you can really produce a definite counterfactual that they would or wouldn&#x27;t have taken longer or shorter without it, but certainly they were pushing for self sufficiency long before technology restrictions. But we&#x27;re not going to be handing our technologies to our competitors on a silver platter, and it&#x27;s also best for businesses to start weaning themselves off the Chiinese market. Virtually every market reliant on them today is in big trouble.As for hubris, I think that&#x27;s more a projection of your part if you want to start bringing up race cards with regards to contributions, that kind of argument would be applicable to everyone. And AI research is highly diverse and international, Chinese names don&#x27;t dominate the list more than Turks, Greeks, Malaysians, etc.

&gt; History has shown that withholding technology from China does not significantly stop them and they&#x27;ll achieve it (or better) in a small number of years.Really? How long has China been attempting to build their own jet engines? How long have they been attempting to build competitive CPUs?History has shown withholding tech successfully keeps them at least a generation behind the west.In some fields like CPUs they “make up for it” by just building larger clusters, but ultimately history does not show what you’re claiming. The only thing it shows is that we need to be even more diligent in protecting IP because a large portion of their catching up is a direct result of stealing the tech they were cut off from.

In a way withholding a tech becomes a signal saying &quot;Hey this is important&quot; so the result is China dedicates more resources to researching it lol.

It’s helpful to think of westernism as a platonic ideal. Individually derived reason and virtue, superior to state and sometimes ‘gods’ as a tradition to drive up the total survivability, richness, and stability of the community.Concepts that enable the individual should empower a chosen configuration of society not the other way around.Contrast this with non westernism where either education of the individual is not valued or the state is the primary goal over the individual.I’ve worked with states governments and individuals around the world for 20 years and find this very useful definition. What’s confusing is the nations who have half adopted westernism but don’t fully due to either caste systems or government dominated thinking.It’s an arrow towards rationalism over tradition, individualism over collectivism, flatness over hierarchy, and future over past. But only the limit of the resources any given society has.

Name one thing China has invented first in LLMs that the “west” adopted as a standard

I find &quot;western&quot; is often used to disparage &quot;western thought&quot;, as in it can&#x27;t grasp the deep wisdom of those mysterious orientals that transcends normal logic and reason. Declaring such a split is the underpinning of a whole lot of woo-woo beliefs.

History has shown that withholding technology from China does not significantly stop them and they&#x27;ll achieve it (or better) in a small number of years.In many senses there&#x27;s hubris in the western* view of China accomplishments: most of what western companies have created has had significant contribution by Chinese scientists or manufacturing, without which those companies would have nothing. If you look at the names of AI researchers there&#x27;s a strong pattern even if some are currently plying their trade in the west.---* I hate the term &quot;western&quot; because some &quot;westeners&quot; use it to separated what they think are &quot;civilized&quot; from &quot;uncivilized&quot;, hence for them LATAM is not &quot;western&quot; even though everything about LATAM countries is western.

&gt; But, China&#x27;s greatest weakness is their lack of ambition and focus on regional matters like Taiwan and south china sea, instead of winning over western europe and india.That&#x27;s a strength. Them not having interest in global domination and regime change other than their backyard is what allows them to easily make partners in Africa and LATAM, the most important regions for raw materials.

China&#x27;s greatest weakness is that their working-age population has already peaked and is in the process of plummeting, which will continue over the coming decades.

I went to a school that was heavy on immigrants and had lots of 1st gen citizens as students and all they did was advocate against people like me for admissions and for preferential admissions for their own group. So in my opinion, skilled immigration is not a transfer of talent but an expansion of the upper classes who go to war with each other over a small number of seats. Ironically this zero sum game keeps overall skill levels the same. For every immigrant, say, one citizen loses a seat somewhere.

&gt; But now the US is trying to … compete against 10x population and lack of similar levels of internal strife and fissures.I can’t tell whether you think the anti-immigration stance is a good thing or bad thing.

The greatest weakness of the US is its utter lack of self awareness and its ambition to dominate others. Nobody is looking for another &quot;leader&quot;, people just want to live well without a bully on their neck. So, many countries that are not part of the US closed club are welcoming China as a new business partner.

&gt;But now the US is trying to be xenophobic like China, restrict tech import&#x2F;export like China but compete against 10x population and lack of similar levels of internal strife and fissures.Do I infer correctly that you believe that China has less internal strife and fissures than the US has?

&gt; But, China&#x27;s greatest weakness is their lack of ambition and focus on regional matters like Taiwan and south china sea, instead of winning over western europe and india.How can they have international hegemony before they clear their regional order? China is more interested in aligning Taiwan than invading; though it’ll probably invade if it can’t align it diplomatically.China is probably not interested in continuing the current Western-style order but to implement their own sino-stuff. At least with the CCP at the helm.

Nobody is anti-immigrant outside of a small pocket of anti-H1B folks in the tech community. People are, however, anti-illegal-immigrant, which is completely different.

I think anti-immigrant rhetoric will have the most impact against the US. A lot of the people innovating on this stuff are being maligned and leaving in droves.Aside from geography, attracting talent from all over the world is the one edge the US has a nation over countries like China. But now the US is trying to be xenophobic like China, restrict tech import&#x2F;export like China but compete against 10x population and lack of similar levels of internal strife and fissures.The world, even Europe is looking for a new country to take on a leader&#x2F;superpower role. China isn&#x27;t there yet, but it might get there in a few years after their next-gen fighter jets and catching up to ASML.But, China&#x27;s greatest weakness is their lack of ambition and focus on regional matters like Taiwan and south china sea, instead of winning over western europe and india.

&gt; China has an import ban on chipsOnly in response to the US banning the export of the high-end GPUs China wanted. The import ban is the Chinese government burning the the landing ships, it clearly communicates to everyone that there is no going back, and total commitment is expected.

The US is certainly slowing down China considerably. China would certainly not have an import ban on Blackwell GPUs if they were made available. And upstream, the ban on EUV and other high end semiconductor production equipment has severely limited china’s capacity to produce logic and DRAM (including HBM).

Would they have done that if the US had been more &quot;reliable&quot; in providing the chips and didn&#x27;t cut them off in the first place?The point still stands that the US instigated the split.

The US isn&#x27;t slowing China anymore.China has an import ban on chips [1] so its irrelevant what the US does.[1]: <a href="https:&#x2F;&#x2F;www.cnbc.com&#x2F;2025&#x2F;09&#x2F;17&#x2F;nvidia-ceo-disappointed-after-reports-china-has-banned-its-ai-chips.html" rel="nofollow">https:&#x2F;&#x2F;www.cnbc.com&#x2F;2025&#x2F;09&#x2F;17&#x2F;nvidia-ceo-disappointed-afte...</a>

I&#x27;ve been in Chile, Peru, Colombia, Panama, and Costa Rica.The streets are flooded with cheap Chinese cars and I see more BYD than American cars. If the car wasn&#x27;t made in Japan or Korea which probably account for most of the cars, it was likely made in China. Moreover, I haven&#x27;t been in countries with the closest ties to China.

The premature optimizer is never the innovator.Japan eventually stopped that role and their products improved greatly.

Tbh this whole situation reminds of how Japan excelled in making a lot more with a lot less after WW2, e.g., fuel-efficient engines, light cars, etc. these constraints were not present in the US (and to some extent in Europe), and resulted in US cars being completely not competitive in non-US markets.

But they aren&#x27;t keeping upThey are lauded for the ability to cost ratio, or their ability to parameter ratio, but virtually everyone using LLMs for productive work are using ChatGPT&#x2F;Gemini&#x2F;Claude.They are kind of like Huffy bicycles. Good value, work well, but if you go to any serious event, no one will be riding one.

way too early to say thatwhile qwen, deepseek and kimi are opensourced and good, they are preferred because of their insane token ratio, they use a lot less for more, but a by product is that they are less accurate
it is amazing progress by the chinese companies, but they definitely can improve a lot more

Have you tried using those models? qwen for example cant even do something as basic as clustering analysis on a list of integers, hell it goes off the rails when just reading said integers from a file - starts babbling about determining number of digits, indexes, tries concatenating numbers together into one big string, no idea wtf is going on with that model.

too early to call a winner, though it is disappointing to see US withdrawal from open source. Still the main outcome of open source is distribution &#x2F; diffusion of the idea, so it will inevitably mean US open source will come back, hopefully via some grass roots maniac, there will be a Linus-like character emerge at some point

may backfire? it&#x27;s a bit too late for that.go to 2024, western labs were crushing it.it&#x27;s now 2025, and from china, we have deepseek, qwen, kimi, glm, ernie and many more capable models keeping up with western labs. there are actually now more chinese labs releasing sota models than western labs.

positive take: a bifurcated tech tree might give us (humanity) a better chance of faster advancement, as it would be a persistent A&#x2F;B test in live environment. Where I would join you in the crossing of fingers is to ensure such A&#x2F;B testing is competitive but not destructive. We may even evolve to a situation of complementarity, an American Ying vs the Chinese Yang. Lets hope so!

Fingers crossed for convergence rather than divergence in the technical standards.Although the way hings are going it looks like the 2 stacks will diverge sooner rather than later , with the US+ banning the use of CHN models while simultaneosly banning the export of it quasi-open models. 
We may very well end up in a situation like the old PAL vs NTSC video standard where the PAL(EU&#x2F;Asia&#x2F;AFrica) and NTSC(America&#x27;s&#x2F;Japan) gradually converged with the adoption of digital formats. Instead here would be a divergence based on geopolitical considerations.

You mean, thank the US for their FAILED &quot;civilizational&quot; gate keeping.

Qwen&#x27;s max series had always been closed weight, it&#x27;s not a policy change like you are alluding.What exactly is Huawei&#x27;s flagship series anyway? Because their PanGu line is open-weight, but Huawei is as of yet not in the LLM making business, their models are only meant to signal that it&#x27;s possible to do training and inference on their hardware, that&#x27;s all. No one actually uses those models.

Small counterpoint but there are also 2 new players putting out SOTA open source models (Moonshots Kimi and zhipus GLM) so we&#x27;re still seeing the same number of models overall, just via newer entrants.

Their are signs that China is not open sourcing their SOTA models anymore. Both Huawei and Qwen (Qwen-Max, WAN 2.5) and have launched flagship models which are yet to be opensourced.

That&#x27;s how it usually goes, fully expected

I want China to release GPUs with a ton of VRAM, 128gb - 256gb. It doesn’t matter if they are half as fast as Nvidia because having a big model at a reasonable speed is better than not being to run them at all. AMD could have done this and have had a massive impact on nvidia’s market share but they choose not to because reasons.

Every single sentence you wrote is untrue and can be disproven by empirical evidence. You can learn about it here[1][2].1. <a href="https:&#x2F;&#x2F;itif.org&#x2F;publications&#x2F;2024&#x2F;09&#x2F;16&#x2F;china-is-rapidly-becoming-a-leading-innovator-in-advanced-industries" rel="nofollow">https:&#x2F;&#x2F;itif.org&#x2F;publications&#x2F;2024&#x2F;09&#x2F;16&#x2F;china-is-rapidly-be...</a>2. <a href="https:&#x2F;&#x2F;www.economist.com&#x2F;science-and-technology&#x2F;2024&#x2F;06&#x2F;12&#x2F;china-has-become-a-scientific-superpower" rel="nofollow">https:&#x2F;&#x2F;www.economist.com&#x2F;science-and-technology&#x2F;2024&#x2F;06&#x2F;12&#x2F;...</a>

China&#x27;s innovation relies on the stolen western IP, without it, China is nothing. Also USSR&#x2F;Russia is no longer a scientific powerhouse that can supply China with some military innovation. A dictatorship combined with cheap labour it 100% guarantees that the country&#x27;s innovation is stunted, no matter what the Chinese propaganda claims.

I was doing this in the 70-80s with electronics from Hong Kong and Japan. The nice cheap stuff ( I was very young ) was all sheets in things I basically had to pattern match against notes of others on BBS and meetups.

Another outcome may be that we now have to learn Chinese to understand their datasheets ...

&quot;... instead of spending the time and money for research and engineering...&quot;China has plenty of R&amp;D and science now.

It&#x27;s much easier to copy what others are doing instead of spending the time and money for research and engineering. It&#x27;s also much easier if you steal the tech. I could never have invented a bicycle but I can sure make a copy of one.

China is a nation of engineers...The US has been relying in on H-1B immigrants. Science is under attack. The truth is the US already lost: <a href="https:&#x2F;&#x2F;youtu.be&#x2F;whVlI6H4d-4" rel="nofollow">https:&#x2F;&#x2F;youtu.be&#x2F;whVlI6H4d-4</a>

I believe this is an Pollyanna take on AI. There is nothing about humans that tells us humans will bring AI to fruition for the other humans and a mountain of evidence showing how it will be used to abuse humans instead....for profits&#x2F;power&#x2F;whatever horse shit the masters of the universe have decided upon.

The US attempt to slow down China&#x27;s technological development succeeds on the basis of preventing China from directly following the same path, but may backfire in the sense it forces innovation by China in a different direction. The overall outcome for us all may be increase efficiency as a result of this forced innovation, especially if Chinese companies continue to open source their advances, so we may in the end have reason to thank the US for their civilisational gate keeping

The company blogs of Chinese companies will often do articles like this[1] talking about a new innovation or optimization that they did, but this will be often just mixed in with marketing articles too.I would also assume there&#x27;s a lot of content in the native Chinese forums, which unfortunately, as an English-speaking person, I wouldn&#x27;t be able to easily refer to :([1] <a href="https:&#x2F;&#x2F;www.alibabacloud.com&#x2F;blog&#x2F;how-does-alibaba-ensure-the-performance-of-system-components-in-a-10000-node-kubernetes-cluster_595469" rel="nofollow">https:&#x2F;&#x2F;www.alibabacloud.com&#x2F;blog&#x2F;how-does-alibaba-ensure-th...</a>

Does someone know if there&#x27;s some equivalent of those engineering&#x2F;research blogs for Chinese companies?I used to follow the ones from Western companies, but honestly, after some point in time, I would like to see some cases from what I consider is a good benchmark for everyone that does not work in FAANG in terms of engineering.

Would this make cloud providers running low volume fine-tuned models more economically viable?

They&#x27;re all LLMs, so no, not tiny, but not exactly huge either:&gt; Our current deployment runs in a cross-region cluster comprising 213 H20 GPUs, serving twenty-eight 1.8–7B models (TP=1) and nineteen 32–72B models (TP=4).

They are working with tiny models. Not sure how well it&#x27;d scale to bigger models (if at all).

Sounds like this virtual GPU is a separate scheduler. I wonder what kind of latency is introduced by marshaling all that data around.

This is for platforms that serve many different models, most of which have very low usage. e.g. huggingface, civitai

To what extent is this practice applicable to other loads?

Its easy enough for a a well resourced entity to take a pre trained model and deploy it on new hardware to save on the NVDA tax. It&#x27;s far less likely for research and model training to happen outside the mature NVDA ecosystem.

Sounds like they stopped doing something stupid.

Lots of shareholders here, move along, there is nothing to read

Is this another nail in the gpu&#x2F;ai stock market bubble coffin?

How feasible is that in an horizon of 5 years new optimized &quot;equations&quot; will cut the need for more GPUs?

Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system