> Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen — across all 60 layers and every component of the network. Other finetunes cannot be explained as interpolations.
I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
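The issue's claim can be checked one tensor at a time. Here is a minimal sketch, assuming you have the same-named tensor from all three checkpoints flattened to lists of floats. If R = a*N + (1 - a)*Q, then R - Q = a*(N - Q), so the least-squares estimate is a = <R - Q, N - Q> / ||N - Q||^2. The leftover residual shows how well a pure interpolation explains R. The helper name `fit_blend` and the toy numbers are hypothetical and do not come from the issue.

```python
import math


def fit_blend(rio, nex, qwen):
    """Fit rio ~ a*nex + (1 - a)*qwen for one flattened tensor (hypothetical helper).

    Returns (a, relative_residual). A relative residual near 0 means the
    tensor is explained by a pure interpolation of the two parents.
    """
    diff = [n - q for n, q in zip(nex, qwen)]      # N - Q
    target = [r - q for r, q in zip(rio, qwen)]    # R - Q
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("parents are identical for this tensor; a is undefined")
    # Least-squares projection of (R - Q) onto (N - Q).
    a = sum(t * d for t, d in zip(target, diff)) / denom
    resid = [t - a * d for t, d in zip(target, diff)]
    norm_target = math.sqrt(sum(t * t for t in target)) or 1.0
    return a, math.sqrt(sum(e * e for e in resid)) / norm_target


# Toy example: a tensor built as an exact 0.6/0.4 blend.
nex = [0.5, -1.0, 2.0, 0.25]
qwen = [0.1, 0.3, -0.7, 1.5]
rio = [0.6 * n + 0.4 * q for n, q in zip(nex, qwen)]
a, rel = fit_blend(rio, nex, qwen)
```

For the toy blend, `a` comes out at 0.6 up to floating-point error, with a residual of essentially zero. A real check would loop over every tensor in the checkpoints. Seeing the same `a` and a near-zero residual everywhere matches the pattern the issue describes, whereas an ordinary fine-tune would not be expected to fit an interpolation that tightly.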
zinodaur
Oh no, someone is profiting off of their work without proper attribution!?!?
unrvl22
The municipality of Rio de Janeiro (via its IT company IplanRIO) released Rio-3.5-Open-397B, presented as a homegrown Qwen3.5 fine-tune that beats comparable open models on benchmarks. The linked issue argues it's actually a weighted merge of ~60% Nex-N2 Pro + ~40% Qwen3.5-397B-A17B, with Nex-N2 having been released only about a week earlier.
jordz
Can someone please explain or link to some information about how models are merged? Is this genuinely merging weights mathematically, or some kind of distillation? (Presumably not the latter, if they've done zero training as the post suggests.)
The model's webpage at https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B says it's a merge now. It previously didn't contain this paragraph:
>The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was upload instead of the final distilled model. We are sorry for the confusion and apologize profusely.
Incidentally, are people using GitHub issues as blogs now?
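On the question of how merging works: the simplest kind of merge is pure arithmetic, with no training at all. Here is a minimal sketch, assuming two checkpoints with identical architecture and tensor names, represented as dicts that map tensor names to flattened lists of floats. Each output tensor is w*A + (1 - w)*B. Real tooling operates on framework tensors and sharded weight files. The helper `linear_merge` and the tensor names below are hypothetical.

```python
def linear_merge(model_a, model_b, weight_a):
    """Return weight_a * model_a + (1 - weight_a) * model_b, tensor by tensor.

    Hypothetical helper: checkpoints are dicts of name -> flat list of floats.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("checkpoints have different tensor names")
    merged = {}
    for name, tensor_a in model_a.items():
        tensor_b = model_b[name]
        if len(tensor_a) != len(tensor_b):
            raise ValueError(f"shape mismatch in {name}")
        # Elementwise interpolation; no gradients or training involved.
        merged[name] = [weight_a * x + (1.0 - weight_a) * y
                        for x, y in zip(tensor_a, tensor_b)]
    return merged


# Toy checkpoints with made-up tensor names.
model_1 = {"layers.0.mlp.up": [1.0, 2.0], "layers.0.attn.q": [0.0, -1.0]}
model_2 = {"layers.0.mlp.up": [3.0, 0.0], "layers.0.attn.q": [1.0, 1.0]}
merged = linear_merge(model_1, model_2, 0.6)
```

Here `merged["layers.0.mlp.up"]` is [1.8, 1.2] up to floating-point rounding. Averaging weights like this is generally only expected to produce a working model when both inputs share an architecture and, typically, a common base checkpoint. The name and shape checks catch the most obvious incompatibilities, but they cannot tell you whether the merged model will actually be any good.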
fkozlowski
I'm honestly surprised that they even had the inclination to attempt creating a model. I guess it's bullish that a municipal IT department had the guts to try this?
MadrasTh0rn
Not surprised
ekjhgkejhgk
One funny thing about incompetence is that they don't have the competence to know that their incompetence is straightforward to verify by a competent person.
jrm4
“Well, Steve (Jobs), I think it’s more like we both had this rich neighbor named Xerox, and I broke into his house to steal the TV set, but I found out that you had already stolen it.”
-- Bill Gates
AnotherGoodName
It's fascinating that this worked, though. Can we just merge all the open-weight models and get something better?
yieldcrv
Didn’t the last thread about this have someone from the lab or an enthusiast in Rio saying exactly that?
It's a fine-tune of Qwen
Not a conspiracy
diego_moita
WHAT!? There are thieves in Rio de Janeiro?
Oh, I am so SHOCKED, so SHOCKED! /s
Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de bandido" (Gangster's Land).
Kinda like Chicago in the '20s or Naples and Palermo in the '90s.
alfiedotwtf
Wasn’t it already obvious given the awfully familiar parameter numbers?