This should be a warning to anyone running GCP. They suspend accounts left, right and centre without even thinking about what they're doing. It seems like they use Gemini 3.1 Pro to make their production decisions.
TK has a history of absolutely destroying the culture of the place, as at OCI, and from what I've heard has done something similar at GCP. GCP and Google are completely different entities in how they work. Don't expect Google quality from the name. It's just like those old brands that now put their name on cheap licensed products, like Nokia (an exaggeration, I know, but not far from the truth).
Not only that, they are known to shut off services randomly, giving you something like 6 months to migrate. They have lots of engineers with little to do, so they put them on migrating internal users off those services; most of their customers don't have that luxury. There was a brilliant article on this by an ex-GCP employee that I can't find right now.
Avoid GCP like the plague if you are serious about your business.
"Railway owns our vendor choices, and we ultimately own this one. Your customers don't care whether the failure was Google or Railway; they see your product. Your uptime is our responsibility, and we'll keep delivering on it."
Kudos to them for acknowledging it and not doing PR speak. It shows it was an architectural failure on their part to trust GCP, and they are working to fix it. Should they have seen it coming? Yes. But better late than never.
Jgrubb
Railway hasn't had the best month in the tech press, has it? And in both cases it was an automated process belonging to some other party that put them there, damaging their reputation.
I was going to talk to our Google rep about them killing the Gemini CLI, but this is way more concerning.
majdalsado
Unfortunately, we had to do an emergency migration to Azure yesterday because of this. Thankfully our DB was not hosted on Railway, and we were back up in a couple of hours.
As much as we loved the simplicity they provided, there have just been too many mishaps and shortcomings for us to keep running a B2B enterprise app on their infrastructure.
Sad day :(
myself248
How many trains were delayed or incorrectly routed as a result?
ryanSrich
Question: for a smaller SaaS tool, or even an internal product, if a team doesn't want to manage AWS or another IaaS provider, what are the best alternatives to the following?
1.) Vercel - having a bad month
2.) Supabase - having a bad month
3.) Railway - now having a bad month
teraflop
> May 19, 22:10 UTC - Our automated monitoring detected API health check failures and paged our on-calls, who started investigating the issue.
> At 22:20 UTC on May 19, Google Cloud placed Railway’s production account into a suspended status incorrectly, as part of an automated action.
If the timestamps are accurate, what was causing the errors 10 minutes before the account was suspended?
The simplest explanation is just that one or the other of these timestamps is wrong, which wouldn't be a big deal. But if the timestamps aren't known with certainty, it seems very odd to include them in the writeup as though they are certain, even though they are very obviously inconsistent with each other.
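The inconsistency is easy to check mechanically. A minimal Python sketch using only the two times quoted from the postmortem (both on May 19, UTC, so only the time of day matters); the function name `effect_lag_minutes` is illustrative:

```python
from datetime import datetime

# Times quoted from the postmortem, both on May 19 (UTC). Only the time of day
# is needed, so strptime's default date (1900-01-01) is fine for both.
DETECTED = datetime.strptime("22:10", "%H:%M")   # health checks fail, on-calls paged
SUSPENDED = datetime.strptime("22:20", "%H:%M")  # account placed into suspended status


def effect_lag_minutes(cause: datetime, effect: datetime) -> float:
    """Minutes from the stated cause to the observed effect.

    A negative value means the effect was observed before its supposed cause.
    """
    return (effect - cause).total_seconds() / 60


lag = effect_lag_minutes(SUSPENDED, DETECTED)
print(f"lag = {lag:+.0f} min")  # prints "lag = -10 min"
```

A negative lag is exactly the oddity raised here: either the failing health checks had another cause, or one of the two timestamps is off.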
dan_sbl
> As a side effect, Terms-of-service acceptance records were also reset, prompting users to re-accept on their next visit to the dashboard.
Don't get me wrong: the rest of this mess falls pretty clearly on Google Cloud, but this one feels like something Railway did to themselves.
Bender
I've read all the threads and their main page and I still don't really understand what this service is. Is this like a commercial alternative to Gerrit? What do people use this for?
I'm not a developer, just curious what this is.
dantillberg
What drives Google to apply these actions so completely and immediately, versus a more deliberate approach, with notification and delay before action, manual review for paying customers, or a warning to resolve within X hours/days? Once or twice could be errors or bad implementation, but these can't explain away the pattern.
It would seem that Google's counsel has deemed that whenever _____ is detected, the company must immediately and completely sever the business relationship. What is that driving concern? Is it sanctions enforcement? CSAM? Something else?
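For comparison, the graduated approach described in this comment is simple to express. A hypothetical Python sketch: the category names, the 72-hour grace period and the `decide` function are assumptions for illustration, not anything Google is known to run:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Action(Enum):
    SUSPEND_NOW = "suspend immediately"
    MANUAL_REVIEW = "hold for human review before acting"
    WARN_WITH_DEADLINE = "notify the customer and give them time to resolve"


# Hypothetical: the few categories where immediate action might be legally
# required (the comment's own guesses). Everything else takes a slower path.
MANDATORY_IMMEDIATE = frozenset({"sanctions", "csam"})


@dataclass
class Decision:
    action: Action
    deadline: Optional[datetime] = None  # only set for WARN_WITH_DEADLINE


def decide(flagged_at: datetime, category: str, is_paying: bool,
           grace: timedelta = timedelta(hours=72)) -> Decision:
    """Pick an enforcement action for an automated flag."""
    if category in MANDATORY_IMMEDIATE:
        return Decision(Action.SUSPEND_NOW)
    if is_paying:
        # Paying customers get a human in the loop before anything happens.
        return Decision(Action.MANUAL_REVIEW)
    # Everyone else gets a warning and `grace` time to fix the problem.
    return Decision(Action.WARN_WITH_DEADLINE, deadline=flagged_at + grace)


now = datetime(2024, 1, 1, 12, 0)  # arbitrary example time
print(decide(now, "billing_anomaly", is_paying=True).action)
```

Only the hard-coded categories bypass notification; a false positive anywhere else costs a review or a warning email rather than a production outage.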
beauregardener
Sadly, my Railway project is still having issues 24 hours later. We've already started an emergency migration away from the Railway backend :(
nosefrog
It's highly unlikely that GCP really banned their account with no reason and no warning, but GCP is probably not going to go public with the real reason.
"Your customers don't care whether the failure was Google or Railway; they see your product. Your uptime is our responsibility, and we'll keep delivering on it." - Thanks Claude!
indrex
Had a similar experience with GCP. They terminated our VMs six times and responded zero times.
alansaber
Unfortunately we've also had a litany of problems with our GCP deployment and chose to remove them completely as a service provider.
mellosouls
Even if it ultimately turns out to be "Google's fault" (as this report seems to be saying), Railway say they own the incident but make no apology here.
whirlwin
The RCA and preventive measures were a pleasant read. I have a lot of respect for companies that put real effort into incident reports like these. It makes them appear very professional, rather than just blaming the cloud provider outright.
theredleft
back to on-prem
loxodrome
I will definitely not be signing up on GCP because of this.
stefan_
It's reassuring to know they will ban a million dollar enterprise customer just like they will ban your GMail of 20 years.
rurban
Google, the new Microsoft!
siliconc0w
Why would you use an infrastructure provider on top of another infrastructure provider? It adds cost and risk, it's always going to be a leaky abstraction, and it's not hard to learn how to use GCP or AWS correctly - especially with agents.
FajitaNachos
19 minutes from detection to getting the Google account restored is pretty awesome, honestly.
1970-01-01
They forgot to get reimbursement for downtime. A free month of GCP is better than nothing.
pm90
I don’t understand why Google still has TK helming GCP when it’s obviously not achieved the kind of success it should have. Google’s infra is some of the best in the world, yet GCP is meh. It continues to underperform and seems content to be a distant 3rd behind AWS and Azure.
delduca
Flagged by some AI automation.
in_a_society
Google has a culture problem. This is not something that can change easily nor will it change when it’s not recognized as being an issue within their organization.
Among my C-suite peers, the conversation is that GCP cannot even be in the consideration set until several years have passed without this kind of incident.
koliber
Now given the logic that you can't be dependent on any one service to run your SaaS, how does Railway convince its customers to run their SaaS on a single service?
ibejoeb
I've been getting serious, recently, about moving all my workloads to equipment that I control in datacenters with which I have professional relationships. It's less expensive, easier, and this kind of nonsense doesn't happen. These cloud providers need to step back and observe how terrible they've made these products. Footguns everywhere, pricing that is impossible to forecast or reason about, broken APIs, and automated self destruction. Then you have third-party providers sitting on top of them, adding another layer of each antifeature. Crazy.
corndoge
> Your customers don't care whether the failure was Google or Railway; they see your product.
Refreshing. So tired of businesses blaming their vendors. Oh it wasn't us spamming you text messages and emails, it was Shopify. Oh, our delivery guarantee said 2 days and it's been a week? That's not us, it's UPS.
I don't care. I didn't pay UPS or Shopify. I paid you.
_justme
Does this qualify for an entry on killedbygoogle.com?
charcircuit
> Google Cloud placed Railway’s production account into a suspended status incorrectly, as part of an automated action.
There is no justification given for why this action was incorrect. It's possible they actually did something wrong.
llmslave
Major infra provider -> has no backups/game plan if GCP goes down
tamimio
> Railway’s production account into a suspended status incorrectly, as part of an automated action.
Be it individuals or companies, now is the best time to ditch all dependence on cloud and SaaS providers: since they are all using automated AI, more and more of these incidents will occur.
AtNightWeCode
So, what was the reason for the account suspension? Why did it happen? I know Google can be a bit stupid with their automatons, but I am a bit skeptical here. There are sites more critical than Railway hosted on GCP.
Edit: Gemini (unironically) found the ex-GCP employee's article on Google Cloud deprecations mentioned above, a very good read: https://steve-yegge.medium.com/dear-google-cloud-your-deprec...
"Finally, we are in planning to remove Google Cloud services from our data plane’s hot path, and keeping them only for secondary/failover."
That's pretty clear. Google can no longer be trusted as a B2B service provider.
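Railway hasn't said how this will be implemented. As a rough illustration of the general "off the hot path, failover only" pattern, a minimal Python sketch; the backend names and the boolean `healthy` flag (standing in for real health checks) are hypothetical:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Backend:
    name: str
    healthy: bool  # stand-in for the result of a real health check


def pick_backend(primary: Backend, failovers: Sequence[Backend]) -> Backend:
    """Serve from the primary; use failovers, in order, only when it is down."""
    if primary.healthy:
        return primary
    for backend in failovers:
        if backend.healthy:
            return backend
    raise RuntimeError("no healthy backend available")


primary = Backend("primary", healthy=False)     # hypothetical hot-path provider
gcp = Backend("gcp-failover", healthy=True)     # hypothetical secondary
print(pick_backend(primary, [gcp]).name)  # prints "gcp-failover"
```

The point of the design is that a provider-side suspension of the failover only costs redundancy, not uptime, as long as the primary stays healthy.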
The interesting and yet-to-be-explained part is why Google flagged the account.
Put all the timestamps you want in the post mortem about what you observed, but you haven't addressed the root cause.
The "this doesn't make sense" part of the story likely has a real explanation that nobody wants to reveal yet.
This isn’t the first time Google Cloud has seriously messed with a customer’s account: https://cloud.google.com/blog/products/infrastructure/detail...
Duplicate of:
https://news.ycombinator.com/item?id=48201484
Related discussion during the incident:
https://news.ycombinator.com/item?id=48201484
Perfect reminder that it's time to use Google Takeout while I still can.
tl;dr: AI suspended an almost-billion-dollar startup's account.