Cloudflare Down Again – and DownDetector Is Also Down

https://downdetector.tr/

294 points by bakigul - 134 comments

JoeWiltshire [3 hidden]5 mins ago
https://downdetectorsdowndetectorsdowndetectorsdowndetector.... reports that https://downdetectorsdowndetectorsdowndetector.com/ is down, guessing downdetectorsdowndetectorsdowndetector runs via cloudflare!
klas_segeljakt [3 hidden]5 mins ago
It's downdetectors all the way down
zombot [3 hidden]5 mins ago
Is that the same as "All downdetectors are down"?
salviati [3 hidden]5 mins ago
People confused by this reference can read https://en.wikipedia.org/wiki/Turtles_all_the_way_down
wiz21c [3 hidden]5 mins ago
no, it's some turtle-ish dialect
johncolanduoni [3 hidden]5 mins ago
We need a set of down detectors that all detect if each other are down, and produce an answer via quorum.
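
A minimal sketch of what such a quorum could look like, assuming each detector independently reports whether the target is up, and that a detector which is itself unreachable reports nothing. The function name `quorum_verdict` and the report format are made up for illustration; this is not how any real down detector is known to work.

```python
from collections import Counter
from typing import Mapping, Optional


def quorum_verdict(
    reports: Mapping[str, Optional[bool]], total_detectors: int
) -> Optional[bool]:
    """Return True (up) or False (down) if a strict majority of ALL
    detectors agree, otherwise None (no quorum).

    `reports` maps a detector name to its observation of the target:
    True = up, False = down, None = the detector itself did not answer.
    """
    # Only detectors that actually answered get a vote.
    votes = Counter(v for v in reports.values() if v is not None)
    # Strict majority of the full set, not just of those that answered.
    needed = total_detectors // 2 + 1
    for verdict, count in votes.items():
        if count >= needed:
            return verdict
    return None
```

With three detectors, two agreeing reports are enough for a verdict. If two of the three are unreachable, the function returns `None` instead of guessing, which is the partition case raised in the replies.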
ipnon [3 hidden]5 mins ago
I think CAP theorem says this is impossible.
johncolanduoni [3 hidden]5 mins ago
Only if the internet partitions in such a way that a quorum is impossible. Cloudflare is not quite that essential.
AmbroseBierce [3 hidden]5 mins ago
Snark detectors are all the way up tho
rozenmd [3 hidden]5 mins ago
guess that explains the random spike in people searching downdetector in OnlineOrNot: https://onlineornot.com/website-down-checker?requestId=RlByu...
yellowbanana [3 hidden]5 mins ago
This is art
LAC-Tech [3 hidden]5 mins ago
I don't understand, why don't they just make the whole internet out of down detectors?
Oarch [3 hidden]5 mins ago
Always has been.
skywhopper [3 hidden]5 mins ago
Every network card is a down detector.
zombot [3 hidden]5 mins ago
I always wondered who detects when downdetector.com is down.
Poudlardo [3 hidden]5 mins ago
haha this website actually is useful
lopatin [3 hidden]5 mins ago
I think it's down though
justmarc [3 hidden]5 mins ago
Down detector slop
loopdoend [3 hidden]5 mins ago
No idea why the media always relies on/cites this useless site...
dtf [3 hidden]5 mins ago
Edinburgh Airport is also down, suspending all flights after an "IT issue with our air traffic control provider". Not sure if this is coincidental, but the timing is rather suspicious!
johncolanduoni [3 hidden]5 mins ago
Maybe the little plane icons on the ATC screens are a PNG hosted on some Cloudflare domain.
AmbroseBierce [3 hidden]5 mins ago
They laughed at my base64 encoded icons, now there, enjoy your downtime.
Nextgrid [3 hidden]5 mins ago
Not a safety-critical system but I know passenger information screens in at least some airports are just full-screen browsers displaying a SaaS-hosted webpage.
Popeyes [3 hidden]5 mins ago
Fontawesome?
Maxion [3 hidden]5 mins ago
minified js dependency loaded from some CDN?
hoherd [3 hidden]5 mins ago
These days planes got problems with all kinds of clouds.
ErroneousBosh [3 hidden]5 mins ago
Annoyingly I wanted to fly a parcel from Edinburgh up to Stornoway, but it's looking like I'd be quicker driving the seven hours up to the ferry terminal myself.
noosphr [3 hidden]5 mins ago
I know I shouldn't, but I really can't help myself: https://blog.cloudflare.com/20-percent-internet-upgrade/
jsheard [3 hidden]5 mins ago
It could always be worse: https://en.wikipedia.org/wiki/Cloudbleed

They haven't had an incident that bad since they switched from C to Rust.

noosphr [3 hidden]5 mins ago
Yes, it's been a great two months.
winternewt [3 hidden]5 mins ago
What are you implying by linking to that article?
amiga386 [3 hidden]5 mins ago
In the chain of events that led to Cloudflare's largest ever outage, code they'd rewritten from C to Rust was a significant factor.

They expected a maximum config file size but an upstream error meant it was much larger than normal. Their Rust code parsed a fraction of the config, then did ".unwrap()" and panicked, crashing the entire program.

This validated a number of things that programmers say in response to Rust advocates who relentlessly badger people in pursuit of mindshare and adoption:

* memory errors are not the only category of errors, or security flaws. A language claiming magic bullets for one thing might nonetheless be worse at another thing.

* there is no guarantee that if you write in <latest hyped language> your code will have fewer errors. If anything, you'll add new errors during the rewrite

* Rust has footguns like any other language. If it gains common adoption, there will be doofus programmers using it too, just like the other languages. What will the errors of Rust doofuses look like, compared to C, C++, C#, Java, JavaScript, Python, Ruby, etc. doofuses?

* availability is orthogonal to security. While there is a huge interest in remaining secure, if you design for "and it remains secure because it stops as soon as there's an error", have you considered what negative effects a widespread outage would cause?
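
The availability point can be sketched in a few lines. The example below is Python rather than Rust and is not Cloudflare's code; `MAX_ENTRIES`, `parse_config`, and `reload_config` are hypothetical names. It assumes a config with a hard entry limit and shows one alternative to crashing on an oversized file: reject the bad input but keep serving with the last known good config.

```python
MAX_ENTRIES = 200  # hypothetical limit, chosen only for illustration


class ConfigTooLarge(Exception):
    """Raised when a config has more entries than the parser allows."""


def parse_config(lines):
    """Strict parser: returns the non-empty entries or raises."""
    entries = [line.strip() for line in lines if line.strip()]
    if len(entries) > MAX_ENTRIES:
        raise ConfigTooLarge(
            f"{len(entries)} entries exceeds limit of {MAX_ENTRIES}"
        )
    return entries


def reload_config(lines, last_known_good):
    """Try the new config; on failure keep the previous one.

    Returns (config, ok). ok=False means the new config was rejected
    and the caller should alert, but the process keeps running.
    """
    try:
        return parse_config(lines), True
    except ConfigTooLarge:
        return last_known_good, False
```

Whether falling back is actually safer depends on the system: for a security rule set, running on stale rules is a trade-off of its own, not a free win.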

RamRodification [3 hidden]5 mins ago
I'm not the person you are replying to, but like all of technology, you just find the latest (or most public) change made, and then fire your blame-cannon at it.

Excel crashed? Must be that new WiFi they installed!

ErroneousBosh [3 hidden]5 mins ago
"Ever since you replaced my wiper blades the clutch has been slipping"
skywhopper [3 hidden]5 mins ago
Cloudflare was crowing that their services were better because “We write a lot of Rust, and we’ve gotten pretty good at it.”

The last outage was in fact partially due to a Rust panic because of some sloppy code.

Yes, these complex systems are way more complex than just which language they use. But Cloudflare is the one who made the oversimplified claim that using Rust would necessarily make their systems better. It’s not so simple.

stingraycharles [3 hidden]5 mins ago
“haha rust is bad” or something, it’s a silly take. these things hardly, if ever, are due to programming language choice and rather due to complicated interactions between different systems.
nixosbestos [3 hidden]5 mins ago
Do you have a point? Curious if you can articulate one or if there's gonna be a "I'm just asking questions man" response.
7-Zark-7 [3 hidden]5 mins ago
The old guard has left as they were too much of an expense in this cost-cutting age... without mentors, crap creeps in and now we are seeing what happens when people who don't know how things work are in charge...
cowsandmilk [3 hidden]5 mins ago
I’m sick of people saying this. The truth at every cloud provider:

1. There were outages under the old guard.

2. The new guard is operating systems that are larger than what the old guard operated.

gosub100 [3 hidden]5 mins ago
You don't think companies try to save costs by hiring the cheapest and dumbest people?
kylecazar [3 hidden]5 mins ago
I don't think that's the exact mechanism, no.

They might go on hiring freezes more often, reconsider a role, and in some cases pass on someone too expensive... But I don't think many companies are actively out trawling for "cheap and dumb".

I'm sure you'll find some, but not Cloudflare.

bflesch [3 hidden]5 mins ago
Why not build the next cloudflare then, I'm sure it is appreciated by HN folks.
ai-christianson [3 hidden]5 mins ago
Or maybe we just move away from cloudflare-like services altogether.
bflesch [3 hidden]5 mins ago
Ideally, yes. Maybe someone can build a CDN on top of uncloud.
smallerize [3 hidden]5 mins ago
And just live with high latency?
jsheard [3 hidden]5 mins ago
The website you're using right now is hosted from a single location without any kind of CDN, so unless by coincidence you just happen to live next door then you seem to be managing. Not bundling 40MB of Javascript or doing 50 roundtrips to load a page goes a long way.
bflesch [3 hidden]5 mins ago
What is "high latency" nowadays? If people wouldn't bundle 30mb into every html page it wouldn't be needed.

Also cloudflare is needed due to DDOS and abuse from rogue actors, which are mostly located in specific areas. Residential IP ranges in democratic countries are not causing the issues.

ptidhomme [3 hidden]5 mins ago
Aren't botnets targeting cheap and unsecured consumer devices specifically in the residential IP ranges?
steve1977 [3 hidden]5 mins ago
That stupid Cloudflare check page often adds orders of magnitude more latency than a few thousand miles of cables would. Also most applications and websites are not that sensitive to latency anyway, at least when done properly.
cess11 [3 hidden]5 mins ago
Sure, why not?
simultsop [3 hidden]5 mins ago
To mars data center right?
dontlaugh [3 hidden]5 mins ago
With what capital exactly?
yatralalala [3 hidden]5 mins ago
The Internet is no longer decentralised.

Some interesting DNS data https://news.ycombinator.com/item?id=46159249

huijzer [3 hidden]5 mins ago
Hence why I wrote a post on 18th of Nov (previous Cloudflare outage): https://huijzer.xyz/posts/123/do-not-put-your-site-behind-cl....

That blog post made it to the front page of HN and my site did not go down. Nor did any DDoS network take the site out even though I also challenged them last time by commenting that I would be okay with a DDoS. I would figure out a way around it.

In general, marketing often works via fear, that's why Cloudflare has those blog posts talking about "largest botnet ever". Advertisement for medicine for example also works often via fear. "Take this or you die", essentially.

peanut-walrus [3 hidden]5 mins ago
Cloudflare is widely used because it's the easiest way to run a website for free or expose local services to the internet. I think for most cloudflare users, the ddos protection is not the main reason they're using it.
miyuru [3 hidden]5 mins ago
I am using cloudflare because the origin servers are IPv6 only.
bo1024 [3 hidden]5 mins ago
Cloudflare hosts websites for free?
em500 [3 hidden]5 mins ago
Yes, marketing often works via fear. And decision making in organizations often works through blame shifting and diffusion of accountability. So organizations will just stick with centralization and Cloudflare, AWS, Microsoft et al regardless of technical concerns.
f311a [3 hidden]5 mins ago
> A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning. This was not an attack; the change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components. We will share more information as we have it today.
tietjens [3 hidden]5 mins ago
does this mean we can blame React Server Components for something new?
osener [3 hidden]5 mins ago
Listen to the sound of HN hawks erupting with joy when they realize they can blame JS, React, RSC, Rust, Cloudflare, and the cloud all for one outage.
tietjens [3 hidden]5 mins ago
For those hawks, Christmas has come early.
johncolanduoni [3 hidden]5 mins ago
I always suspected RSC was actually a secret Facebook plan to sabotage the React ecosystem now that their competitors all use it to some degree. Now I’m convinced.
girvo [3 hidden]5 mins ago
I mean RSC wasn’t really even the FB folks as far as I remember, they barely control React anymore
Jsmith4523 [3 hidden]5 mins ago
You know what, maybe AI is taking all the goddamn jobs
akKsbba [3 hidden]5 mins ago
They’re a global company that offshores with location based pay and utilizes H1Bs. I think that’s the first thing to look at. You get what you pay for.

Stop trying to devalue labor. Not much sympathy when you’re obviously cutting corners.

_fizz_buzz_ [3 hidden]5 mins ago
Just because someone is on an H1B visa doesn't mean they know less. It's a bit rich to blame this on foreign workers even though nothing is known about who or what caused this outage.
greenchair [3 hidden]5 mins ago
Knowledge + tech skills are not the only factors that lead to subpar outcomes with these scenarios. In my experience the thing that causes the most problems with H1Bs is the weak English and related communication issues.
codingdave [3 hidden]5 mins ago
In my experience, the communication problems stem from the Americans who expect perfect English from all others. English is spoken across the entire business world between people for whom it is not their first language. The accents and broken English are epic in many organizations. Yet they work through it and get things done together.

If you work harder at taking the burden upon yourself to understand others, you might be surprised how well people can learn to communicate despite differing backgrounds.

akKsbba [3 hidden]5 mins ago
I’m blaming it on paying workers less. H1B, location based pay, offshoring, etc. are all ways to pay workers less.
renegade-otter [3 hidden]5 mins ago
The problem with H1B is that these people are effectively prisoners. The market is not so hot right now even for those who have leverage, but combine it with the visa system and you get this "gotta do the needful" attitude to please the bosses, rushing broken fixes to production.
ivanbalepin [3 hidden]5 mins ago
if this is referring to Cloudflare, they are not yet particularly known for any major non-sales layoffs, ai or not.
immibis [3 hidden]5 mins ago
They pretty much said this. All the big companies that had recent outages are companies that publicly embraced vibe coding.
GaryBluto [3 hidden]5 mins ago
In the 80s, a "series" of fires broke out and destroyed many homes and businesses in England, all of which had a print of a painting known as 'The Crying Boy'. The painting has ever since been rumoured to be haunted.

Obviously, 'The Crying Boy' was not the cause of the fires, it was just that most homes in 80s England had that print, as it was a popular one, and people found a pattern where there wasn't one.

jasonvorhe [3 hidden]5 mins ago
correlation, causation, yadda yadda. They already explained that it was some react server component update. sure, could've also been done with some ai assist but we don't know.

These companies also don't vibe code (which would involve just prompting without editing code yourself, at least that's the most common definition).

I really hope news like these won't be followed by comments like these (not criticism of you personally) until the AI hype dies down a bit. It's getting really tiresome to always read the same oversimplified takes every time there's some outage involving centralized entities such as cloudflare instead of talking about the elephant in the room, which is their attempt of doing MITM on the majority of internet users.

johncolanduoni [3 hidden]5 mins ago
All the big companies embraced vibe coding, so I’m not sure there was a natural experiment here.
poszlem [3 hidden]5 mins ago
This ignores all the companies that publicly embraced vibe coding and did NOT have outages. Not a huge fan of vibe coding, but let's keep the populism to minimum here.
lxgr [3 hidden]5 mins ago
On top of that, humans are more than capable of causing high-impact outages as well. (It's easier with massive unforced centralization, of course.)
thenthenthen [3 hidden]5 mins ago
We seriously need to start thinking about Up-detectors
ablation [3 hidden]5 mins ago
Already being discussed here: https://news.ycombinator.com/item?id=46158191
bobowzki [3 hidden]5 mins ago
I host my company's website on Cloudflare Pages using Cloudflare's DNS. I don't want to move to 100% self hosting but I would like to have a self-hosted backup. Has anyone solved this?
skywhopper [3 hidden]5 mins ago
Having a self-hosted “backup” that is ready to go at any time means having a self-hosted server that’s always on, basically. There are lots of cheap colo or VM options out there. But the problem is going to be dealing with an outage… how do you switch DNS over when your DNS provider is down?

Well, one way is to use a different DNS provider than either of your hosting options.

You can see this is getting complicated. Might be better to take the downtime.

But if I had to make a real recommendation, I’m not aware of any time in the last decade that a static site deployed on AWS S3/CloudFront would have actually been unavailable.

stanislavb [3 hidden]5 mins ago
I "envy" DownDetector these days ... I wanna know how much money they are making out of these Cloudflare downs...
hdgvhicv [3 hidden]5 mins ago
And once again simple self hosted services remain up.
sneak [3 hidden]5 mins ago
No; a lot of people still put those behind cloudflare.
steve1977 [3 hidden]5 mins ago
I would not call these simple and self-hosted then.
willswire [3 hidden]5 mins ago
Woke up this morning with my iPhone and Apple Watch suddenly in a different time zone. Anyone else experience this?
testplzignore [3 hidden]5 mins ago
Are you by chance on an airplane?
Maksadbek [3 hidden]5 mins ago
Do we have a downdetector for downdetector?
pyuser583 [3 hidden]5 mins ago
I assume this is why Claude stopped working
lionkor [3 hidden]5 mins ago
There are other LLMs you can ask to be absolutely, 100% sure.
Maxion [3 hidden]5 mins ago
You're absolutely right – Here is a list of current SOTA models that you can try!

Would you want me to:

- Create a list of all LLM models released in the past few months

- Let you know why my existence means you can't afford RAM anymore

- Help you learn subsistence farming so that you can feed your family in the coming AI future?

FranklinMaillot [3 hidden]5 mins ago
Not sure if this is related, but has anyone seen their allowance used up unexpectedly fast? Had Claude Code Web showing service disruption warnings, and all of a sudden I'm at 92% usage.

I'm on the pro plan, only using Sonnet and Haiku. I almost never hit the 5-hour limit, let alone in less than 2 hours.

CGamesPlay [3 hidden]5 mins ago
Did you accidentally hit tab to turn on “always thinking”? It burns tokens much faster.
jakewins [3 hidden]5 mins ago
PascalStehling [3 hidden]5 mins ago
downdetector's downdetector shows that downdetector should not be down. Something is wrong here.

https://downdetectorsdowndetector.com/

terom [3 hidden]5 mins ago
downdetectorsdowndetector.com does not load the results as part of the HTML, nor does it do any API requests to retrieve the status. Instead, the obfuscated javascript code contains a `generateMockStatus()` function that has parts like `responseTimeMs: randomInt(...)` and a hardcoded `status: up` / `httpStatus: 200`. I didn't reverse-engineer the entire script, but based on it incorrectly showing downdetector.com as being up today, I'm pretty sure that downdetectorsdowndetector.com is just faking the results.

downdetectorsdowndetectorsdowndetector.com and downdetectorsdowndetectorsdowndetectorsdowndetector.com seem like they might be legit. One has the results in the HTML, the other fetches some JSON from a backend (`status4.php`).
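
For contrast, a real check has to make an actual request. A rough sketch in Python, assuming a plain HTTP GET is enough to decide up or down; `check_status` and its `opener` parameter are hypothetical, and the result keys only mirror the field names seen in the obfuscated script.

```python
import time
import urllib.error
import urllib.request


def check_status(url, opener=urllib.request.urlopen, timeout=5.0):
    """Fetch `url` once and report what was actually observed.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            http_status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        http_status = err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        http_status = None
    elapsed_ms = int((time.monotonic() - started) * 1000)
    up = http_status is not None and 200 <= http_status < 400
    return {
        "status": "up" if up else "down",
        "httpStatus": http_status,
        "responseTimeMs": elapsed_ms,
    }
```

Unlike a hardcoded `status: up`, the result here can only say "up" if a response with a 2xx or 3xx status actually came back.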

ericcurtin [3 hidden]5 mins ago
Time to use some local ai with Docker Model Runner :)

No cloudflare no problem

https://github.com/docker/model-runner

jacquesm [3 hidden]5 mins ago
Neat trick: just do job interviews when Cloudflare is down...
dev_l1x_be [3 hidden]5 mins ago
Instead of figuring out a novel way of distributing content in a stateful way with security and redundancy in mind, we have created the current centralised monstrosity that we call the modern web. ¯\_(ツ)_/¯
privera13 [3 hidden]5 mins ago
Down of a System
ndsipa_pomu [3 hidden]5 mins ago
WAKE UP!
ractive [3 hidden]5 mins ago
Grab a brush and put a little makeup.
camillomiller [3 hidden]5 mins ago
The entirety of shopify was down too for 30 minutes.
bflesch [3 hidden]5 mins ago
How are these clowns deploying stuff on a Friday, it is unbelievable to me. It is not even funny any more. It seems cloudflare is held together by marketing only. They should stop all of these stupid initiatives and keep their stack simple.

And I'm 100% sure the management responsible for this is already fueling up the ferraris to drive to their beach house. All of us make them rich and they keep on enshittifying their product out of pure hubris.

throwaway_x235 [3 hidden]5 mins ago
> How are these clowns deploying stuff on a Friday, it is unbelievable to me

I have stopped fighting this battle at work. Despite Friday being one of the most important days of the week for our customers, people still push out the latest commit 10 minutes before they leave in the afternoon. Going on a weekend trip home to your family? No problem, just deploy and be offline for hours while you are traveling...

The response was that my way of thinking is "old school". Modern development is "fail fast" and that CI/CD with good tests and rollback fixes everything. Being afraid of deploys is "so 2010s"... The problem is that our tests don't cover everything, not all deploys can be rolled back quickly, and the person who knows what their commit actually does is unavailable!

We have had multiple issues with late Friday commits, but somehow we keep doing this. Funnily enough, I have noticed a pattern. Many devs only do this a few times due to the massive backlash from customers when they are fixing the bug. So gradually they learn to deploy at less busy times. The problem is that not enough of them have learned this lesson, or they are too invested in their point of view to change. It seems that some individuals learn the hard way, but the organization has not learned or is too afraid to push for a change.

pessimizer [3 hidden]5 mins ago
If you are a monopoly, there is no incentive to do anything well. You've saturated the market, the incentive is to cut costs.

In fact, there are incentives for public failures: they'll help the politicians that you bought sell the legislation that you wrote explaining how national security requires that the taxpayer write a check to your stockholders/owners in return for nothing.

heisenbit [3 hidden]5 mins ago
If the deployment was related to the React Server issue then maybe it was unavoidable.
uyzstvqs [3 hidden]5 mins ago
Cloudflare's entire WAF depending on React is an issue in itself IMO.
steve1977 [3 hidden]5 mins ago
Everything related to React is avoidable by not using React.
bflesch [3 hidden]5 mins ago
Yes, but a hotfix was already in place. They chose to deploy the "proper fix" this morning, and obviously it went wrong. Also they didn't do a phased rollout because it impacted their high-value customers such as shopify as well as claude, causing significant damages. Their procedures are not good.
giuscri [3 hidden]5 mins ago
how do you tell they deployed new stuff?
bflesch [3 hidden]5 mins ago
See https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q

"A change made to how Cloudflare's Web Application Firewall parses requests caused Cloudflare's network to be unavailable for several minutes this morning. This was not an attack; the change was deployed by our team to help mitigate the industry-wide vulnerability disclosed this week in React Server Components."

The bug has been known for several days, and the hotfix was already in place. So they worked on the "final fix" and chose to deploy it on a friday morning.

tonyhart7 [3 hidden]5 mins ago
You know it's bad when DownDetector is also down
poulpy123 [3 hidden]5 mins ago
what about downdowndetectordetector ?
noosphr [3 hidden]5 mins ago
Another dozen or so of these and the self mutilation that tech companies have engaged in over the last few years with mass lay-offs should finally end.

Extrapolating at current rates I guess that means April 2026.

echelon [3 hidden]5 mins ago
Appears to be fixed now. Just lost 30 minutes of working.

If this is unwrap() again, we need to have a talk about Rust panic safety.

jasmes [3 hidden]5 mins ago
Time to rewrite Rust’s unwrap() in Rust obviously.
Maxion [3 hidden]5 mins ago
Does it make it worse or better if I say it's RSC?

https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q

johncolanduoni [3 hidden]5 mins ago
Well, technically RSC was messed up, and then the hotfix for the messed up RSC was itself messed up. I guess there’s a lot of blame to go around.
jacquesm [3 hidden]5 mins ago
Now multiply that 'just' by the number of people affected.
LightBug1 [3 hidden]5 mins ago
Ok, at what point is "We use Cloudflare" going to be a supply-chain red marker?

At what point does the cost outweigh the benefit?

B4n4n4 [3 hidden]5 mins ago
LinkedIn down
jacquesm [3 hidden]5 mins ago
That's a net positive then.
borplk [3 hidden]5 mins ago
Incoming "Look at all this cool postmortem stuff about our fuckup" blog post. It's getting a bit old guys.
Vivianfromtheo [3 hidden]5 mins ago
Crunchyroll down too got me and the anime community stressed
A_D_E_P_T [3 hidden]5 mins ago
If Crunchyroll is down for 30 minutes it's nbd, because you know they'll be back. If the pirate sites are down for any duration, it can be very stressful, because they can be gone for good.
pappya_coder [3 hidden]5 mins ago
Yes
Folyd [3 hidden]5 mins ago
I can confirm it's down again
RockRobotRock [3 hidden]5 mins ago
[flagged]
JimmaDaRustla [3 hidden]5 mins ago
This made getting paged at 4am worth it
sebmellen [3 hidden]5 mins ago
Me too man