I pay for Kagi to get better search results. Lately, I’ve felt that Kagi’s search has been just as full of low-information and AI generated results as Google. I’ve been wondering why I’m still paying for it. This seemed like a good litmus test. Unfortunately, Kagi displays pretty much the same results as Google for nanoclaw.
Growtika
A couple years back John Reilly posted on HN "How I ruined my SEO" and I helped him fix it for free. He wrote about the whole thing here: https://johnnyreilly.com/how-we-fixed-my-seo
Happy to do the same for you if you want.
The quickest win in your case: map all the backlinks the .net site got (happy to pull this for you), then email every publication that linked to it. "Hey, you covered NanoClaw but linked to a fake site, here's the real one." You'd be surprised how many will actually swap the link. That alone could flip things.
Beyond that there's some technical SEO stuff on nanoclaw.dev that would help - structured data, schema, signals for search engines and LLMs. Happy to walk you through it.
update: ok this is getting more traction than I expected so let me give some practical stuff.
1. Google Search Console - did you add and verify nanoclaw.dev there? If not, do it now and submit your sitemap. Basic but critical.
2. I checked the fake site and it actually doesn't have that many backlinks, so the situation is more winnable than it looks.
3. Your GitHub repo has tons of high-quality backlinks, which is great. Reach out to those places and tell the story; I'm sure a few will add a link to your actual site. That alone makes you way more resilient to fakers going forward. This is only happening because everything is so new. Here's a list with all the backlinks pointing to your repo: https://docs.google.com/spreadsheets/d/1bBrYsppQuVrktL1lPfNm...
4. Open social profiles for the project - Twitter/X, LinkedIn page if you want. This helps search engines build a knowledge graph around NanoClaw. Then add Organization and sameAs schema markup to nanoclaw.dev connecting all the dots (your site, the GitHub repo, the social profiles). This is how you tell Google "these all belong to the same entity."
5. One more thing - you had a chance to link to nanoclaw.dev from this HN thread, but you linked to your tweet instead. Totally get it, but a strong link from a front-page HN post with all this traffic and engagement would do real work for your site's authority. If it's not crossing any rule (specific use case here, so maybe check with the mods haha), drop a comment here with a link to nanoclaw.dev. I don't think anyone here would mind if it gets you a few steps closer to beating that fake site.
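For point 4 above, here's a minimal sketch of what that Organization + sameAs markup could look like, generated with Python's json module. The repo and social URLs below are illustrative placeholders, not the project's real ones:

```python
import json

# Minimal Organization schema tying the site, repo, and socials together.
# Every URL except nanoclaw.dev is a made-up placeholder.
schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "NanoClaw",
    "url": "https://nanoclaw.dev",
    "sameAs": [
        "https://github.com/OWNER/nanoclaw",  # real repo URL goes here
        "https://x.com/PROJECT_HANDLE",       # project social profiles
    ],
}

# Emit the payload for a <script type="application/ld+json"> tag in <head>.
print(json.dumps(schema, indent=2))
```

The output goes into a `<script type="application/ld+json">` tag in the page head; Google's structured-data docs cover the Organization type and sameAs property.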
jackfranklyn
The structured data point in the top comment is spot on. Added Organization and SoftwareApplication schema to my own project recently and the shift in how Google indexes you is real - went from being treated as a random domain to Google actually understanding what the site represents.
What's maddening about this whole situation though is that Google already has every signal it needs. The GitHub repo links to nanoclaw.dev. The npm package links to it. The commit history proves authorship. But apparently domain age and raw backlink count still trump verified ownership signals. The system rewards whoever stakes out the domain first, not whoever actually built the thing.
AznHisoka
I'm looking at this from a third-party point of view (definitely not claiming the .net "deserves" to rank higher)
1) the .net version has a couple of very high authority links, namely from theregister and thenewstack (both of which have had lots of engagement).
I highly doubt it would have ranked without those links.
2) it's only been a week. Give Google time to understand which pages should rank higher.
3) Google is biased towards sites that cover a topic earlier than others.
I’ve seen pages that are still top 3 for a particular competitive query years later, simply because they were one of the first to write about it.
Suggestions: give it time. Meanwhile I would recommend linking to your website rather than your github everywhere you mention it, to give it a boost
uyzstvqs
I did some experimenting using different search engines and AIs. Here are the results:
Google and Brave linked to the official GitHub repo followed by the fake domain. DuckDuckGo and Bing linked to the fake domain first, followed by the official GitHub. Mojeek gave higher ranking to two third party articles, but linked to both the official GitHub and website without fakes. Qwant was the worst, as the official website was the second result amongst multiple fake websites and an unrelated GitHub repo.
Then there are the AIs. ChatGPT, Google AI mode, Gemini, Grok, Perplexity, and Brave Search "Ask" all linked to the official website, and some added the GitHub repo as well. DuckDuckGo Search Assist linked to just the official GitHub. Google AI mode, Gemini, and Grok also explicitly warned about the fake websites. Copilot got the official website and GitHub right, but linked to a presumably fake X account as well.
Conclusion: Google, Brave and Mojeek win in search. AI is very good and clearly beats search overall. Google AI mode, Gemini and Grok stand out in quality.
markus_zhang
My advice to all OSS developers: if you open source your project, expect it to be abused in all possible ways. Don't open source if you have anxiety over it. It is how the world works, whether we like it or not.
I appreciate that you open source your projects for us to study. But TBH, please help yourself first.
ariehkovler
It's worse than that. There's a SECOND imitator that I actually stumbled on today while looking something up about nanoclaw - nanoclawS [dot] io - and that one's harvesting email addresses.
The obvious risk here is a bait and switch, where one of these sites switches their link to the Github repo to point to a malicious imitator repo instead.
One approach would be to go after the sites themselves, not their Google ranking. See if their hosts are willing to take them down. Is there anything you can assert copyright over to hang a DMCA request on? That's hard for an open source project, I guess. And the fake sites aren't (yet) doing any actual scamming.
Good luck, though!
bob1029
Losing the SEO battle is a lot like losing money on the stock market. The system you are fighting is incredibly efficient and will never in a trillion years give a single shit about your specific concerns. You can hire lawyers and spend time complaining about it all day on social media. But you'll rarely get a drop of blood out of this stone. The best you can do is to step back, reevaluate your understanding of the market, and adjust your strategy.
allthetime
Piggybacking on the Claw hype, surprised when someone piggybacks on you...
GeoAtreides
And I'm losing the sanity battle for my own mind with all these AI-generated posts. Please, I beg you: two lines by your own hand are worth 100,000 generated tokens.
dirk94018
We had a similar experience — looks like someone used AI to clone our site's design and structure at linuxtoaster.com. The real issue Gavriel is highlighting goes beyond SEO. The cost of creating a convincing copycat site just went to zero. Anyone can feed a successful page to an LLM and get a polished clone in minutes. And for open source projects it's even worse — they can clone your website AND clone your code, have an AI rebrand it, and ship a convincing-looking alternative overnight.
MarkSweep
The link on GitHub to the real site is marked with rel="nofollow". I wonder if it would make sense for GitHub to remove nofollow in some circumstances. Perhaps based on some sort of reputation system or if the site links back to the repo with a <link rel="self" href="..." /> in the header? Presumably that would help the real site rank higher when the repo ranks highly.
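The reciprocal-link check proposed here could be sketched with Python's stdlib HTML parser. Note that `rel="self"` pointing back at a repo is this comment's hypothetical convention, not an existing standard, and all URLs below are made up:

```python
from html.parser import HTMLParser

class LinkBackChecker(HTMLParser):
    """Scans a page for <link rel="self" href=REPO_URL>, in the spirit of
    the reputation check suggested above (rel="self" is hypothetical)."""

    def __init__(self, repo_url: str):
        super().__init__()
        self.repo_url = repo_url
        self.links_back = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "self" and a.get("href") == self.repo_url:
            self.links_back = True

# Made-up repo URL and page for illustration.
repo = "https://github.com/example/nanoclaw"
page = '<html><head><link rel="self" href="https://github.com/example/nanoclaw"></head></html>'

checker = LinkBackChecker(repo)
checker.feed(page)
print(checker.links_back)  # prints True
```

A site passing a check like this could, under the proposal, earn a followed link from its repo page.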
Sweepi
> When you Google "NanoClaw," a fake website ranks #2 globally, right below the project's GitHub.
Unfortunately, the fake website [.net] is also #3 on Kagi, and #1 on Duckduckgo.
On Kagi, the Github is #1 and nanoclaw.dev is #4, but only if you count "Interesting Finds".
On Duckduckgo, the Github is #2 and nanoclaw.dev is nowhere to be found.
Neither of these projects requires payment anywhere, but tons of sites pop up trying to "sell" them. I wouldn't even know what that means, and I'm kind of tempted to drop in a credit card to see what happens. Would they just auto-send you a link to the public repo?
Most of it is quite lazy and hasn't quite kept up with modern AI capabilities. They mostly just scrape the text I wrote and present it with some screenshots that I created. I can imagine a future where
- really nice landing pages are generated
- the product is entirely rebranded
- marketing is automated (linkedin, google ads, etc)
and someone develops an autonomous system that finds high-quality yet unknown open source projects, and redeploys and sells them online for actual money.
signorovitch
> This isn't an SEO problem. This is a Google problem.
I've tested a few of the big search engines, and nanoclaw.dev is never on the first page.
Gemini was also unable to find the .dev, even in "Research Mode." The only way I was able to get a direct link to nanoclaw.dev was with chatgpt, which found it by scraping the GitHub (it also spat out links to a couple of other copies it found from google.)
Seems this is a wider SEO issue, one which infiltrates even the technology supposed to replace it.
tracker1
Do what Louis Rossman did... just ask Google's AI what you need to change on your site... Apparently that's the secret now.
networkcat
Before installing new software, I usually visit its GitHub page or Wikipedia entry first and click through to the official site from there. I just don't trust the 'official' sites that pop up in Google search results. How many of you do the same?
youknownothing
> I've done everything you're supposed to do and more.
By the sound of it, everything except reporting it? Winning at SEO just means appearing above them in search results, but the fake page shouldn't just lose the race; it should be taken down.
lol. This gets worse with AI search. If Google can't figure out the canonical source from a GitHub repo linking directly to the official site, LLMs definitely can't. And once an AI overview bakes the fake site into its knowledge graph, you're not just losing Google rankings, imo; you're losing the models too. Registering every TLD on day 1 is now just table stakes for any OSS project, which still doesn't seem fair.
throwaway85825
People forget that Google is a malware services company. A significant part of their revenue is fake OBS malware and the like.
samuelknight
Copycats are not a new problem. You can be completely open source and have a trademark on the project name.
azangru
> So I built a real website.
That was two weeks ago.
Is Google supposed to make drastic updates to its index within two weeks?
lucasluitjes
I've been annoyed with Google search quality lately and was wondering how the others fared on this specific issue. Turns out, mostly not much better.
Bing, DuckDuckGo, Qwant, Ecosia, and Brave all had the github repo and nanoclaw.net (the fake homepage) in first or second place. Marginalia had fascinating results about biology, but only tangentially related Nanoclaw results: not the github repo, nor the fake or real homepage.
Mojeek was the exception, sort of. It had some random news sites up top, but the github repo in 2nd place and nanoclaw.dev (the real homepage) in the 4th place. The fake nanoclaw.net did not show.
Kagi is the only one I couldn't try because apparently I used up my free credits a year back. Can anyone see how they compare?
WD-42
Is there an acronym for “AI generated, didn’t read”?
jccooper
I don't see that Google cares much about backlinks any more. Seems like it's all about "content" keywords and maybe a little time-on-site. The domain is a huge signal, which is probably where the problem comes from here.
Sadly, Google's generally better against all the new AI-generated content farms than other players, so maybe they're still running PageRank somewhere.
vegasbrianc
SEO is broken at the moment. With Google Overviews just killing organic SEO, it is becoming less and less relevant, unfortunately.
theanonymousone
I saw this some time ago with Bing and OpenCode:
"If I search for "opencode GitHub" in Bing, a random fork is returned"
Just an FYI, but I don't know if being in the website field of GitHub really helps since there's a rel nofollow on the link.
bubblewand
Yeah, Google stopped even trying to usefully index most of the web around ‘08 or ‘09 or so. Was super obvious when it happened and it’s been that way ever since. Your GitHub is up there because it’s a blessed website, your personal site isn’t and will struggle mightily to rank even when you search exact, unusual phrases on it, if it’s like most of the rest of the Web on Google these days.
Get more traffic (make sure google analytics sees it, IDK but that probably matters because monopoly) and it might help.
Most of the other indices aren’t much better. Turns out fighting spam is expensive, easier to just do a combo of boosting really big sites and blessed spammers that use your ad network.
elevation
This project was launched very quickly and may not have had a large budget for extra domains.
But for entities with a bit more time, you can prevent this scenario by acquiring the .com/.net variant domains before launching.
roywiggins
I'll be honest, I'd take this more seriously if this post didn't read like ChatGPT output. If you won't spend the effort to use your own words why should I stir myself to care?
Sorry, I'll put it in hand-crafted ChatGPTese:
## The Slop Problem
Every post sounds the same. No intelligence. No individuality. Just pure, clean LLM slop. Let's dive in.
- Every post has LLM tells. This is key.
- Posts get upvoted anyway. Nobody seems to notice or indeed care.
- People acclimate to the slop. This isn't just a coincidence. This is a real shift in standards. When people read enough of this, they begin to think it sounds normal.
## The Replying Dilemma
Should you engage with the content, when there is a real person involved? On the one hand, they put their name on it, and probably the details are drawn from their prompt, so it can be said to fairly represent what they wanted to say. So maybe ragging on their ChatGPT prose is being mean. On the other hand, if nobody ever mentions this, the acclimatization will only get worse as the rising tide of slop overwhelms any other style of writing.
## The "Snobbery is good actually" Option
Relentlessly bully people for their half-baked LLM copy. Make it your whole personality. Go insane.
## The "Giving Up" Solution
Learn to stop worrying and love the LLM.
ryandrake
> I don't want to be playing this game. I want to be writing code, building community, pushing features, fixing bugs.
Then just write code, build features, and fix bugs. Nobody is forcing you to fix search engines' problems. If you're not making money off of traffic, then why worry so much about SEO? Just do your thing. If it really bothers you, put a little note on your GitHub warning people about the fake site, and get on with your life.
iamacyborg
Google is absolutely idiotic sometimes.
We (as in the team that helped fork and migrate the PoE1 wiki) set up a new domain for the Path of Exile 2 wiki, which is being hosted by the folks at Grinding Gear Games and linked on the official website and in multiple places on the highly trafficked subreddit.
Despite this, Google has decided that the site is not relevant and shouldn't appear anywhere in search results, despite the wiki for the first game appearing everywhere.
tmaly
Wasn't one of the original ideas of NFTs to essentially identify the original creator?
alexpham14
Oof, this is exactly the nightmare scenario for “repo-first” OSS.
The weird bit isn’t that a scraper site exists, it’s that Google can’t do the obvious graph join: query == project name, #1 result is the repo, repo declares Homepage = X, yet Google still boosts an imposter domain. That’s not “SEO”, that’s the ranking system refusing to treat maintainer-declared canonical as a strong signal. Early domain squatters get to “set the default” purely by being first, then they can flip the content later once trust is baked in.
People keep saying “tell users to bookmark the real URL” like that scales. Most people will click the second link and assume it’s official. If Google can’t solve this class of problem, their “AI answers” are going to be a bigger mess than blue links ever were.
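The "graph join" being described is simple enough to sketch: if the top result is a repo whose declared homepage also appears in the results, promote that homepage above lookalike domains. A toy version in Python, with made-up URLs standing in for a search index and repo metadata:

```python
def promote_declared_homepage(results, repo_homepages):
    """results: ranked list of URLs.
    repo_homepages: repo URL -> the homepage that repo declares.
    If the #1 result is a repo, move its declared homepage to #2."""
    top = results[0]
    homepage = repo_homepages.get(top)
    if homepage and homepage in results:
        rest = [u for u in results if u not in (top, homepage)]
        return [top, homepage] + rest
    return results

# Toy data mirroring the thread (the repo URL is a placeholder).
ranked = [
    "https://github.com/example/nanoclaw",  # the repo, ranked #1
    "https://nanoclaw.net",                 # imposter domain
    "https://nanoclaw.dev",                 # maintainer-declared homepage
]
declared = {"https://github.com/example/nanoclaw": "https://nanoclaw.dev"}

print(promote_declared_homepage(ranked, declared))
# the declared homepage now outranks the imposter
```

This is obviously not how a real ranker works; the point is just that the signal (repo declares Homepage = X) is already machine-readable.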
bakugo
> I don't want to be playing this game. I want to be writing code
I assume the "I" here refers to Claude, who seemingly wrote the entire project AND the linked post.
ZoomZoomZoom
This is a Google problem, but only secondarily.
The crux of the matter is that there's nothing that protects an open project besides reputation, and nowadays in the digital space it can be cheaply farmed.
Laws could help, but they only work when you undertake purposeful actions to be covered by them, like registering a trademark, and that's never cheap.
Imagine you're in a local band playing shows. It's 3 months old and you have no issued records. A second band, tighter with venues, takes your name and starts performing under your moniker. You have no money to take that to court, and good luck making a case. You can't do anything besides screaming on the web or, I don't know, kicking a few butts. You change your name.
renegat0x0
- I think I was upset when Google allowed a fake ad for VLC to appear high in the ranking
- I hate that Google returns content farms instead of product web pages
- I hate that Google provides a page of 10 useful links, later links are just pure garbage. I think that something in Google engine is profoundly broken
- I maintain my own search index, but it requires a lot of effort, and attention. I do insert links if I find them worthy. I think more people should have their personal search indexes. Mine is below. I am quite happy that problems like these do not affect me that much
> This isn't an SEO problem. This is a Google problem.
Sorry, but this is an SEO problem. The fake site has probably been linked to by a number of high-SEO outlets. What you should do is contact them and ask them to fix the links (to point to your site), which they should be happy to do.
rocketvole
I think OrcaSlicer suffers from the same issue. Not really sure why some OSS projects struggle with this and others (Notepad++) don't.
MagicMoonlight
A guy that stole someone else’s idea by making a shinier website getting mad that someone stole his idea by making a shinier website. Such is life.
boredhedgehog
> The person running nanoclaw[.]net can put anything they want on that page tomorrow. A crypto scam. A phishing page. Malicious download links. They could fork the GitHub repo, inject malicious code, and link to it from the site that Google is telling thousands of people is legitimate.
A lot of handwringing about hypotheticals. The page is up there because it links to the official repo. Changing that will quickly tank its search rank.
barelysapient
The more things change the more they stay the same.
shevy-java
I noticed this a few years ago. Google has been ruining its search engine deliberately. I could explain the things Google did here, but other websites and videos already explain it, including the why (though there is some speculation as to why).
These days I even find, e.g., Qwant sometimes having better results than Google search. I see it as a positive thing though: I can soon stop using Google search, so that's one less Google product. One day I will be Google-free. It will be a happy day. I really think Google must cease to exist.
(The only sad thing is how bad the other search engines are. While Google search sucks nowadays, I consistently get even worse results with, e.g., DuckDuckGo. And I think part of the reason is that the world wide web also sucks a LOT more compared to the old days. Google is partially responsible for this too, by the way, which just reinforces the idea that Google must die.)
keybored
Live by bots, die by bots.
imp0cat
It's simple really, .net > .dev.
keiferski
Suddenly the pre-Google Yahoo model of curated links is starting to seem relevant again.
Curation in general is probably a skill that will become more and more in demand as the Internet fills up with AI slop.
ChrisArchitect
Two weeks? Hardly enough for the correct url to take over. A correct url with no history/presence that came out of nowhere as far as the engine is concerned. It will happen most likely tho, thanks to the links from the project etc, but might take a bit of time since the other url is established. "losing the battle" now perhaps, but not for long most likely.
Imustaskforhelp
Duckduckgo actually shows nanoclaw.net as the first result and the github page as second.
Another point: DDG's AI feature actually references nanoclaw.net as a source.
Damn, I booted up Orion (Kagi) and even Kagi shows nanoclaw.net as the third result, after the github page with qwibitai and another github page under your (previous?) github username, i.e. gavrielc, which when clicked also leads to the same github page.
There is an "Interesting Finds" section in Kagi which references the website, but it still shows the nanoclaw.net page earlier, and the nanoclaw.dev entry is so easy to miss that the first time I didn't even notice it.
I expected better from DDG/Kagi, to be honest. I also tried Brave and it had the same issue; Brave even has its own independent index, and even that struggles with this.
Let's hope this gets fixed quickly. Also a good reminder to prefer opening github links over websites: I must admit that even as a tech-savvy person I could have fallen for the nanoclaw.net link, given it's second in basically every search engine.
dumbfounder
DMCA?
Imustaskforhelp
Another comment from me, but here are all the search engines I looked at:
From 1-5, all referenced .net before .dev, and DDG referenced .net before the github page. Marginalia didn't give me the .net, .dev, or github link, but rather docker.com and some other tech articles.
Mojeek and Yandex.ru DID give me .dev links before .net at the time of writing.
I literally opened these two as a joke, especially Mojeek, not expecting much, but I just know the names of lots of search engines, so I tried.
Mojeek and Yandex.ru have surprised me, although I think Yandex.ru might have referenced the .dev because of https://nanoclaw.dev/ru/, as it points to this.
Mojeek seems interesting now from this observation.
I also wanted to try Swisscows, but it looks like they have become 100% premium; I remember being able to search for free, but now a popup comes up.
I also tried Baidu (a Chinese search engine) and it gave results in Chinese; Firefox Translate stuttered and didn't work when I tried to translate. I don't know Chinese, so I pasted the results into Claude, and they link to neither .net nor .dev, just Chinese sites.
With all of this, I think we know one provider (Mojeek) that won. A lot of the engines on these lists are actually not independent, except Mojeek, Brave, and probably Yandex.ru.
So I guess the main takeaway is that independent search engines can be interesting. They can still be hit or miss, but the more independent search engines the merrier: some might miss, but some will also hit.
My comment definitely reads like a reputation bonus for Mojeek. Well, anything for more independent search engines, imo. I looked at their about page and it seems that they are a single person (Marc Smith). Fascinating stuff.
I know marginalia_nu is on HN, so maybe Marginalia and Mojeek can share some index together. Anyway, this was a fun experiment. I hope the community tries out any search engines I may have missed and shares insights if a particular engine gives interesting results.
Drupon
Sorry Gavriel Cohen, but this Google search placement was promised to the other person thousands of years ago.
newswasboring
I fell for this yesterday, but for zeroclaw, not nanoclaw. I found this website[1] through Brave search, I think. I was not paying too much attention as I was under the influence; it points to the wrong repo[2] and the instructions install from that. I didn't like zeroclaw anyway, so I tried to uninstall it and only then realized I was on a forked repo.
Gavriel is freaking out over nothing while making rookie mistakes, pretending not to be in an SEO war.
It's literally not his problem that some people click a scam link; he still has 18,000 github stars. It's just a bifurcated audience of undiscerning people.
He's overly worried about a perfect, unanimous impression when he shouldn't be.
Now he's wasting his money on SEO tweaks and domain names while saying he only wants to code. Then focus on coding! Not buying obscure TLDs and vibecoding sitemaps while wondering what he did wrong.
yeesh, some people can't handle a little fame
csomar
It's worse. I wrote about this a couple of weeks ago [1]. With AI responses and Google pulling results from different sources, you could potentially hijack other brands with your own fake content (e.g., a phone number).
>We trust Google to surface reliable information about elections. Vaccines. Medical conditions. Financial decisions. And they can't get this right?
Actually I don't trust Google and I don't expect it to surface reliable information. I expect it to surface information and I will dig through it and judge for myself whether it is reliable or not.
gjsman-1000
Steve Jobs famously never allowed free meals at Apple.
Humans are psychologically incapable of assigning respect to things that are free; across the board - not donating to open-source, maxing out every dollar of food stamps, refusing to pay a dollar for an app if it has a free tier, even companies like AWS ripping off open source without any qualms. If you got an offer for a free relationship no strings attached, would you take it seriously? If someone on a street corner has artwork for $5 or $500, it could be the same piece of art, but which one gets more attention on first glance?
If you want your work to be respected, do not make it open source. Your odds are slightly better at succeeding at acting. Remember that 97% of public GitHub repos have zero external users.
I pay for Kagi to get better search results. Lately, I’ve felt that Kagi’s search has been just as full of low-information and AI generated results as Google. I’ve been wondering why I’m still paying for it. This seemed like a good litmus test. Unfortunately, Kagi displays pretty much the same results as Google for nanoclaw.
A couple years back John Reilly posted on HN "How I ruined my SEO" and I helped him fix it for free. He wrote about the whole thing here: https://johnnyreilly.com/how-we-fixed-my-seo
Happy to do the same for you if you want.
The quickest win in your case: map all the backlinks the .net site got (happy to pull this for you), then email every publication that linked to it. "Hey, you covered NanoClaw but linked to a fake site, here's the real one." You'd be surprised how many will actually swap the link. That alone could flip things.
Beyond that there's some technical SEO stuff on nanoclaw.dev that would help - structured data, schema, signals for search engines and LLMs. Happy to walk you through it.
update: ok this is getting more traction than I expected so let me give some practical stuff.
1. Google Search Console - did you add and verify nanoclaw.dev there? If not, do it now and submit your sitemap. Basic but critical.
2. I checked the fake site and it actually doesn't have that many backlinks, so the situation is more winnable than it looks.
3. Your GitHub repo has tons of high quality backlinks which is great. Outreach to those places, tell the story. I'm sure a few will add a link to your actual site. That alone makes you way more resilient to fakers going forward. This is only happening because everything is so new. Here's a list with all the backlinks pointing to your repo:
https://docs.google.com/spreadsheets/d/1bBrYsppQuVrktL1lPfNm...
4. Open social profiles for the project - Twitter/X, LinkedIn page if you want. This helps search engines build a knowledge graph around NanoClaw. Then add Organization and sameAs schema markup to nanoclaw.dev connecting all the dots (your site, the GitHub repo, the social profiles). This is how you tell Google "these all belong to the same entity."
5. One more thing - you had a chance to link to nanoclaw.dev from this HN thread but you linked to your tweet instead. Totally get it, but a strong link from a front page HN post with all this traffic and engagement would do real work for your site's authority. If it's not crossing any rule (specific use case here so maybe check with the mods haha) drop a comment here with a link to nanoclaw.dev. I don't think anyone here would mind if it will get you few steps closer towards winning that fake site
The structured data point in the top comment is spot on. Added Organization and SoftwareApplication schema to my own project recently and the shift in how Google indexes you is real - went from being treated as a random domain to Google actually understanding what the site represents.
What's maddening about this whole situation though is that Google already has every signal it needs. The GitHub repo links to nanoclaw.dev. The npm package links to it. The commit history proves authorship. But apparently domain age and raw backlink count still trump verified ownership signals. The system rewards whoever stakes out the domain first, not whoever actually built the thing.
I’m looking at this from a 3rd party of view (definitely not claiming the .net “deserves” to rank higher)
1) the .net version has a couple of very high authority links, namely from theregister and thenewstack (both of which have had lots of engagement).
I highly doubt it would have ranked without those links.
2) its only been a week. Give Google time to understand which pages should rank higher.
3) Google is biased towards sites that cover a topic earlier than others.
I’ve seen pages that are still top 3 for a particular competitive query years later, simply because they were one of the first to write about it.
Suggestions: give it time. Meanwhile I would recommend linking to your website rather than your github everywhere you mention it, to give it a boost
I did some experimenting using different search engines and AIs. Here's the results:
Google and Brave linked to the official GitHub repo followed by the fake domain. DuckDuckGo and Bing linked to the fake domain first, followed by the official GitHub. Mojeek gave higher ranking to two third party articles, but linked to both the official GitHub and website without fakes. Qwant was the worst, as the official website was the second result amongst multiple fake websites and an unrelated GitHub repo.
Then there the AIs. ChatGPT, Google AI mode, Gemini, Grok, Perplexity, and Brave Search "Ask" all linked to the official website, and some added the GitHub repo as well. DuckDuckGo Search Assist linked to just the official GitHub. Google AI mode, Gemini and Grok also explicitly warned about the fake websites. Copilot got the official website and GitHub right, but linked to a presumably fake X account as well.
Conclusion: Google, Brave and Mojeek win in search. AI is very good and clearly beats search overall. Google AI mode, Gemini and Grok stand out in quality.
My advice to all OSS developers: if you open source your project, expect it to be abused in all possible ways. Don't open source if you have anxiety over it. It is how the world works, whether we like it or not.
I appreciate that you open source your projects for us to study. But TBH, please help yourself first.
It's worse than that. There's a SECOND imitator that I actually stumbled on today while looking something up about nanoclaw - nanoclawS [dot] io - and that one's harvesting email addresses.
The obvious risk here is a bait and switch, where one of these sites switches their link to the Github repo to point to a malicious imitator repo instead.
One approach would be to go after the sites themselves, not their Google ranking. See if their hosts are willing to take them down. Is there anything you can assert copyright over to hang a DMCA request on? That's hard for an open source project, I guess. And the fake sites aren't (yet) doing any actual scamming.
Good luck, though!
Losing the SEO battle is a lot like losing money on the stock market. The system you are fighting is incredibly efficient and will never in a trillion years give a single shit about your specific concerns. You can hire lawyers and spend time complaining about it all day on social media. But you'll rarely get a drop of blood out of this stone. The best you can do is to step back, reevaluate your understanding of the market, and adjust your strategy.
Piggybacking on the Claw hype, surprised when someone piggybacks on you...
And I'm losing the sanity battle for my own mind with all these AI-generated posts. Please, I beg you: two lines by your hand are worth 100,000 generated tokens.
We had a similar experience — looks like someone used AI to clone our site's design and structure at linuxtoaster.com. The real issue Gavriel is highlighting goes beyond SEO. The cost of creating a convincing copycat site just went to zero. Anyone can feed a successful page to an LLM and get a polished clone in minutes. And for open source projects it's even worse — they can clone your website AND clone your code, have an AI rebrand it, and ship a convincing-looking alternative overnight.
The link on GitHub to the real site is marked with rel="nofollow". I wonder if it would make sense for GitHub to remove nofollow in some circumstances. Perhaps based on some sort of reputation system or if the site links back to the repo with a <link rel="self" href="..." /> in the header? Presumably that would help the real site rank higher when the repo ranks highly.
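A minimal sketch of the reciprocal check this comment proposes, assuming a hypothetical `<link rel="self">` convention (GitHub has no such mechanism today; the page content, URLs, and function names below are all made up for illustration):

```python
from html.parser import HTMLParser

class RelSelfFinder(HTMLParser):
    """Collects href values of <link rel="self"> tags found in a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "self" and "href" in a:
            self.hrefs.append(a["href"])

def site_claims_repo(page_html: str, repo_url: str) -> bool:
    """True if the page declares the given repo as its canonical project home."""
    finder = RelSelfFinder()
    finder.feed(page_html)
    return repo_url in finder.hrefs

# Hypothetical homepage for the real project site:
page = '''<html><head>
<link rel="self" href="https://github.com/gavrielc/nanoclaw" />
</head><body>NanoClaw</body></html>'''

print(site_claims_repo(page, "https://github.com/gavrielc/nanoclaw"))  # True
```

GitHub could run a check like this before dropping nofollow: the repo declares a homepage, and the homepage declares the repo back, so the claim is mutual rather than one-sided.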
> When you Google "NanoClaw," a fake website ranks #2 globally, right below the project's GitHub.
Unfortunately, the fake website [.net] is also #3 on Kagi, and #1 on Duckduckgo. On Kagi, the Github is #1 and nanoclaw.dev is #4, but only if you count "Interesting Finds". On Duckduckgo, the Github is #2 and nanoclaw.dev is nowhere to be found.
I've been developing and maintaining https://canine.sh and https://hellocsv.github.io/HelloCSV/ for some time now, and it's really odd what pops up when you google these.
Neither of these projects requires payment anywhere, but tons of sites pop up trying to "sell" them. I wouldn't even know what that means, and I'm kind of tempted to drop in a credit card to see what happens. Would they just auto-send you a link to the public repo?
Most of it is quite lazy and hasn't kept up with modern AI capabilities. They mostly just scrape the text I wrote and present it with some screenshots that I created. I can imagine a future where
- really nice landing pages are generated
- the product is entirely rebranded
- marketing is automated (linkedin, google ads, etc)
and someone develops an autonomous system that finds high-quality yet unknown open source projects, redeploys them, and sells them online for actual money.
> This isn't an SEO problem. This is a Google problem.
I've tested on a few of the big search engines, and nanoclaw.dev is never in the first page.
Gemini was also unable to find the .dev, even in "Research Mode." The only way I was able to get a direct link to nanoclaw.dev was with ChatGPT, which found it by scraping the GitHub repo (it also spat out links to a couple of other copies it found via Google).
Seems this is a wider SEO issue, one which infiltrates even the technology supposed to replace it.
Do what Louis Rossman did... just ask Google's AI what you need to change on your site... Apparently that's the secret now.
Before installing new software, I usually visit its GitHub page or Wikipedia entry first and click through to the official site from there. I just don't trust the 'official' sites that pop up in Google search results. How many of you do the same?
> I've done everything you're supposed to do and more.
By the sound of it, everything except reporting it? Winning at SEO just means appearing above them in search results, but the fake page shouldn't just lose the race; it should be taken down.
ICANN specifies how to deal with this kind of issue: https://www.icann.org/en/system/files/files/submitting-dns-a...
lol, this gets worse with AI search. If Google can't figure out the canonical source from a GitHub repo linking directly to the official site, LLMs definitely can't. And once an AI overview bakes the fake site into its knowledge graph, you're not just losing Google rankings imo, you're losing the models too. Registering every TLD on day 1 is now just table stakes for any OSS project, which still doesn't seem fair.
People forget that Google is a malware services company. A significant part of their revenue is fake OBS malware and the like.
Copycats are not a new problem. You can be completely open source and have a trademark on the project name.
> So I built a real website. That was two weeks ago.
Is Google supposed to have drastic updates to its index over 2 weeks?
I've been annoyed with Google search quality lately and was wondering how the others fared on this specific issue. Turns out, mostly not much better.
Bing, DuckDuckGo, Qwant, Ecosia, Brave all had the github repo and nanoclaw.net (the fake homepage) in the first or second place. Marginalia had fascinating results about biology but only tangentially related Nanoclaw results, not the github repo or either the fake or real homepage.
Mojeek was the exception, sort of. It had some random news sites up top, but the github repo in 2nd place and nanoclaw.dev (the real homepage) in the 4th place. The fake nanoclaw.net did not show.
Kagi is the only one I couldn't try because apparently I used up my free credits a year back. Can anyone see how they compare?
Is there an acronym for “AI generated, didn’t read”?
I don't see that Google cares much about backlinks any more. Seems like it's all about "content" keywords and maybe a little time-on-site. The domain is a huge signal, which is probably where the problem comes from here.
Sadly, Google's generally better against all the new AI-generated content farms than other players, so maybe they're still running PageRank somewhere.
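For what it's worth, the backlink effect described a few comments up (two high-authority links propping up the fake .net site) falls straight out of classic PageRank. Here's a toy power-iteration sketch; the page names, link graph, and damping factor are all made up for illustration and are obviously nothing like Google's real system:

```python
# Toy PageRank over a 5-page graph. Two high-authority outlets link to the
# fake site; the repo's homepage link is rel=nofollow, so in this model it
# contributes nothing to the official site.
links = {
    "register":     ["fake.net"],   # outlet links to the fake site
    "newstack":     ["fake.net"],   # another outlet, same mistake
    "fake.net":     ["github"],     # fake site links to the real repo
    "github":       [],             # homepage link is nofollow: no followable outlinks
    "official.dev": ["github"],     # real site links back to the repo
}
pages = sorted(set(links) | {t for ts in links.values() for t in ts})
n = len(pages)
d = 0.85  # damping factor
rank = {p: 1.0 / n for p in pages}

for _ in range(50):
    new = {p: (1 - d) / n for p in pages}
    # Spread rank from dangling pages (no outlinks) uniformly.
    dangling = sum(rank[p] for p in pages if not links.get(p))
    for p in pages:
        new[p] += d * dangling / n
    for src, targets in links.items():
        if targets:
            share = d * rank[src] / len(targets)
            for t in targets:
                new[t] += share
    rank = new

top = sorted(rank, key=rank.get, reverse=True)
print(top[:2])  # ['github', 'fake.net']
```

Even in this toy, the result mirrors the real SERP: repo first, fake site second, and the official site stuck at the no-inlinks baseline — which is why swapping the outlets' backlinks over to the real site matters more than any on-page tweak.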
SEO is broken at the moment. With Google Overviews just killing organic SEO, it is becoming less and less relevant, unfortunately.
I saw this some time ago with Bing and OpenCode:
"If I search for "opencode GitHub" in Bing, a random fork is returned"
https://news.ycombinator.com/item?id=46573286
Just an FYI, but I don't know if being in the website field on GitHub really helps, since there's a rel="nofollow" on the link.
Yeah, Google stopped even trying to usefully index most of the web around ‘08 or ‘09 or so. Was super obvious when it happened and it’s been that way ever since. Your GitHub is up there because it’s a blessed website, your personal site isn’t and will struggle mightily to rank even when you search exact, unusual phrases on it, if it’s like most of the rest of the Web on Google these days.
Get more traffic (make sure Google Analytics sees it; IDK, but that probably matters, because monopoly) and it might help.
Most of the other indices aren’t much better. Turns out fighting spam is expensive, easier to just do a combo of boosting really big sites and blessed spammers that use your ad network.
This project was launched very quickly, and may not have had a large budget for extra domains.
But for entities with a bit more time, you can prevent this scenario by acquiring the .com/.net variant domains before launching.
I'll be honest, I'd take this more seriously if this post didn't read like ChatGPT output. If you won't spend the effort to use your own words why should I stir myself to care?
Sorry, I'll put it in hand-crafted ChatGPTese:
## The Slop Problem
Every post sounds the same. No intelligence. No individuality. Just pure, clean LLM slop. Let's dive in.
- Every post has LLM tells. This is key.
- Posts get upvoted anyway. Nobody seems to notice or indeed care.
- People acclimate to the slop. This isn't just a coincidence. This is a real shift in standards. When people read enough of this, they begin to think it sounds normal.
## The Replying Dilemma
Should you engage with the content, when there is a real person involved? On the one hand, they put their name on it, and probably the details are drawn from their prompt, so it can be said to fairly represent what they wanted to say. So maybe ragging on their ChatGPT prose is being mean. On the other hand, if nobody ever mentions this, the acclimatization will only get worse as the rising tide of slop overwhelms any other style of writing.
## The "Snobbery is good actually" Option
Relentlessly bully people for their half-baked LLM copy. Make it your whole personality. Go insane.
## The "Giving Up" Solution
Learn to stop worrying and love the LLM.
> I don't want to be playing this game. I want to be writing code, building community, pushing features, fixing bugs.
Then just write code, build features, and fix bugs. Nobody is forcing you to fix search engines' problems. If you're not making money off of traffic, then why worry so much about SEO? Just do your thing. If it really bothers you, put a little note on your GitHub warning people about the fake site, and get on with your life.
Google is absolutely idiotic sometimes.
We (as in the team that helped fork and migrate the PoE1 wiki) set up a new domain for the Path of Exile 2 wiki, which is being hosted by the folks at Grinding Gear Games and linked on the official website and in multiple places on the highly trafficked subreddit.
Despite this, Google has decided that the site is not relevant and shouldn't appear anywhere in search results, despite the wiki for the first game appearing everywhere.
Wasn't one of the original ideas of NFTs to essentially identify the original creator?
Oof, this is exactly the nightmare scenario for “repo-first” OSS.
The weird bit isn’t that a scraper site exists, it’s that Google can’t do the obvious graph join: query == project name, #1 result is the repo, repo declares Homepage = X, yet Google still boosts an imposter domain. That’s not “SEO”, that’s the ranking system refusing to treat maintainer-declared canonical as a strong signal. Early domain squatters get to “set the default” purely by being first, then they can flip the content later once trust is baked in.
People keep saying “tell users to bookmark the real URL” like that scales. Most people will click the second link and assume it’s official. If Google can’t solve this class of problem, their “AI answers” are going to be a bigger mess than blue links ever were.
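The "graph join" described above can be sketched as a trivial re-ranking pass. This is purely a hypothetical heuristic with made-up URLs — real rankers weigh far more signals, and that's partly the point: the signal exists, it's just not being used:

```python
def rerank(results: list[str], repo_homepages: dict[str, str]) -> list[str]:
    """If the top result is a repo whose maintainer-declared homepage also
    appears in the results, promote that homepage to directly below the repo."""
    if not results:
        return results
    top = results[0]
    homepage = repo_homepages.get(top)  # homepage declared on the repo, if any
    if homepage and homepage in results[1:]:
        rest = [r for r in results[1:] if r != homepage]
        return [top, homepage] + rest
    return results

# Hypothetical SERP for the query "nanoclaw":
serp = [
    "github.com/gavrielc/nanoclaw",  # repo ranks first
    "nanoclaw.net",                  # imposter ranks second
    "nanoclaw.dev",                  # maintainer-declared homepage, buried
]
declared = {"github.com/gavrielc/nanoclaw": "nanoclaw.dev"}
print(rerank(serp, declared))
# ['github.com/gavrielc/nanoclaw', 'nanoclaw.dev', 'nanoclaw.net']
```

One join on a signal the engine already has, and the imposter drops below the canonical site.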
> I don't want to be playing this game. I want to be writing code
I assume the "I" here refers to Claude, who seemingly wrote the entire project AND the linked post.
This is a google problem, but only secondary.
The crux of the matter is that there's nothing that protects an open project besides reputation, and nowadays in the digital space it can be cheaply farmed.
Laws could help, but they only work when you take purposeful actions to be covered by them, like registering a trademark, and that's never cheap.
Imagine you're in a local band playing shows. It's three months old and has no released records. A second band, tighter with venues, takes your name and starts performing under your moniker. You have no money to take that to court, and good luck making a case. You can't do anything besides screaming on the web or, I don't know, kicking a few butts. You change your name.
- I think I was upset when Google allowed fake ad for VLC to appear high in ranking
- I hate that Google returns content farms instead of product web pages
- I hate that Google provides one page of ~10 useful links; everything after that is pure garbage. I think something in Google's engine is profoundly broken
- I maintain my own search index, but it requires a lot of effort, and attention. I do insert links if I find them worthy. I think more people should have their personal search indexes. Mine is below. I am quite happy that problems like these do not affect me that much
https://github.com/rumca-js/Internet-Places-Database
> This isn't an SEO problem. This is a Google problem.
Sorry, but this is an SEO problem. The fake site has probably been linked to by a number of high-SEO outlets. What you should do is contact them and ask them to fix the links (to point to your site), which they should be happy to do.
I think OrcaSlicer suffers from the same issue. Not really sure why some OSS projects struggle with this issue and others (Notepad++) don't.
A guy that stole someone else’s idea by making a shinier website getting mad that someone stole his idea by making a shinier website. Such is life.
> The person running nanoclaw[.]net can put anything they want on that page tomorrow. A crypto scam. A phishing page. Malicious download links. They could fork the GitHub repo, inject malicious code, and link to it from the site that Google is telling thousands of people is legitimate.
A lot of handwringing about hypotheticals. The page is up there because it links the official repo. Changing that will quickly tank its search rank.
The more things change the more they stay the same.
I noticed this a few years ago. Google has been ruining its search engine, deliberately so. I could explain the things Google did here, but other websites and videos already explain it, including the why (though there is some speculation as to why).
These days I even find e.g. Qwant sometimes having better results than Google Search. I see it as a positive thing, though: I can soon stop using Google Search, so that's one less Google product. One day I will be Google-free. It will be a happy day. I really think Google must cease to exist.
(The only sad thing is how bad the other search engines are. While Google Search sucks nowadays, I consistently get even worse results with e.g. DuckDuckGo. I think part of the reason is that the world wide web also sucks a LOT more compared to the old days. Google is partially responsible for that too, by the way, which just reinforces the idea that Google must die.)
Live by bots, die by bots.
It's simple really, .net > .dev.
Suddenly the pre-Google Yahoo model of curated links is starting to seem relevant again.
Curation in general is probably a skill that will become more and more in demand as the Internet fills up with AI slop.
Two weeks? Hardly enough time for the correct URL to take over: a URL with no history or presence that, as far as the engine is concerned, came out of nowhere. It will most likely happen, thanks to the links from the project etc., but it might take a bit of time since the other URL is established. "Losing the battle" now, perhaps, but most likely not for long.
DuckDuckGo actually shows nanoclaw.net as the first result and the GitHub page as second.
Another point: DDG's AI feature actually references nanoclaw.net as a source.
Damn, I booted up Orion (Kagi), and even Kagi shows nanoclaw.net as the third result after the GitHub page, alongside qwibitai and another GitHub page under your (previous?) GitHub username (gavrielc), which when clicked also leads to the same GitHub repo.
There is an "interesting finds" section in Kagi which references the website, but it still shows the nanoclaw.net page earlier, and the nanoclaw.dev entry is so inconspicuous that the first time through I didn't even notice it.
I honestly expected better from DDG/Kagi. I also tried Brave and it had the same issue; Brave even has its own independent index, and even that struggles with this.
Let's hope this gets patched quickly. It's also a good reminder to prefer opening GitHub links over websites: even as a tech-savvy person, I admit I could have fallen for the nanoclaw.net link, given that it's second in pretty much all search engines.
DMCA?
Another comment here but here are all the search engines I looked at:
1. DDG 2. Kagi 3. Brave 4. Ecosia 5. Startpage 6. Marginalia 7. Mojeek 8. Yandex.ru
Engines 1-5 all ranked .net before .dev, and DDG even ranked .net before the GitHub repo. Marginalia gave me neither .net, .dev, nor the GitHub link, just docker.com and some other tech articles.
Mojeek and Yandex.ru DID give me .dev links before .net at the time of writing.
I literally opened these two as a joke, especially Mojeek, not expecting much; I just happen to know the names of lots of search engines, so I tried them.
Mojeek and Yandex.ru have surprised me, although I think Yandex.ru might have surfaced the .dev because of https://nanoclaw.dev/ru/, as that's what it points to.
Mojeek seems interesting after this observation.
I also wanted to try Swisscows, but it looks like they've gone fully premium; I remember being able to search for free, but now a popup appears.
I also tried Baidu (a Chinese search engine). It gave results in Chinese; Firefox Translate stuttered and didn't work when I tried it, and since I don't know Chinese I pasted the results into Claude. It links to neither .net nor .dev, just Chinese sites.
With all of these observations, I think we do know one provider (Mojeek) that won. A lot of the engines on this list aren't actually independent, except Mojeek, Brave, and probably Yandex.ru.
So I guess the main takeaway is that independent search engines can be interesting. They can still be hit or miss, but the more independent search engines the merrier: some might miss, but some will also hit.
My comment definitely reads like a reputation bonus for Mojeek. Well, anything for more independent search engines, imo. I looked at their about page, and it seems they were started by a single person (Marc Smith). Fascinating stuff.
I know marginalia_nu is on HN, so maybe Marginalia and Mojeek can share some index data. Anyway, this was a fun, exciting experiment to do. I hope the community tries out other search engines I may have missed and shares insights if a particular engine gives interesting results.
Sorry Gavriel Cohen, but this Google search placement was promised to the other person thousands of years ago.
I fell for this yesterday, but for ZeroClaw, not NanoClaw. I found this website[1] through Brave Search, I think. I wasn't paying too much attention (I was under the influence); it points to the wrong repo[2], and its instructions install from that. I didn't like ZeroClaw anyway, so I tried to uninstall it, and only then realized I was on a forked repo.
[1] https://zeroclaw.net/ [2] https://github.com/openagen/zeroclaw
Gavriel is freaking out over nothing while making rookie mistakes and pretending not to be in an SEO war.
It's literally not his problem that some people click a scam link; he still has 18,000 GitHub stars. It's just a bifurcated audience of undiscerning people.
He's overly worried about a perfect, unanimous impression when he shouldn't be.
Now he's wasting his money on SEO tweaks and domain names while saying he only wants to code. Then focus on coding! Not buying obscure TLDs and vibecoding sitemaps while wondering what he did wrong.
Yeesh, some people can't handle a little fame.
It’s worse. I wrote about this a couple weeks ago [1]. With AI responses and Google pulling results from different sources, you could potentially hijack other brands with your own fake content (e.g., a phone number).
1: https://codeinput.com/blog/google-seo
>We trust Google to surface reliable information about elections. Vaccines. Medical conditions. Financial decisions. And they can't get this right?
Actually I don't trust Google and I don't expect it to surface reliable information. I expect it to surface information and I will dig through it and judge for myself whether it is reliable or not.
Steve Jobs famously never allowed free meals at Apple.
Humans are psychologically incapable of assigning respect to things that are free; across the board - not donating to open-source, maxing out every dollar of food stamps, refusing to pay a dollar for an app if it has a free tier, even companies like AWS ripping off open source without any qualms. If you got an offer for a free relationship no strings attached, would you take it seriously? If someone on a street corner has artwork for $5 or $500, it could be the same piece of art, but which one gets more attention on first glance?
If you want your work to be respected, do not make it open source. Your odds are slightly better at succeeding at acting. Remember that 97% of public GitHub repos have zero external users.