> What I had missed is that we deployed a new internal service last week that sent less than three GetPostRecord requests per second, but it did sometimes send batches of 15-20 thousand URIs at a time. Typically, we'd probably be doing between 1-50 post lookups per request.
That’ll do it.
98codes [3 hidden]5 mins ago
Ahh, the three relevant numbers in development: 0, 1, and infinity.
jandrese [3 hidden]5 mins ago
The incredible part about this is because their backend is all TCP/IP they were literally exhausting the ports by leaving all 65k of them in TIME_WAIT, and the workaround was to start randomizing the localhost address to give them another trillion ports or so.
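As a rough check on the "another trillion ports or so" figure, here is a minimal sketch. It assumes the workaround drew addresses from the whole 127.0.0.0/8 loopback block, which the comment does not spell out. The helper names `loopback_tuple_space` and `random_loopback_source` are hypothetical, for illustration only.

```python
import ipaddress
import random

# Assumption: the entire 127.0.0.0/8 block is usable as loopback (true on Linux).
LOOPBACK = ipaddress.ip_network("127.0.0.0/8")

# Upper bound of ports 1-65535 per address; the real ephemeral range is smaller.
PORTS_PER_ADDRESS = 65535


def loopback_tuple_space() -> int:
    """Upper bound on distinct (address, port) pairs across the loopback block."""
    return LOOPBACK.num_addresses * PORTS_PER_ADDRESS


def random_loopback_source(rng: random.Random) -> str:
    """Pick a random loopback address, skipping the first and last in the block."""
    offset = rng.randrange(1, LOOPBACK.num_addresses - 1)
    return str(LOOPBACK.network_address + offset)


if __name__ == "__main__":
    print(f"{loopback_tuple_space():,} address/port pairs")
    print(random_loopback_source(random.Random()))
```

The block has 2^24 addresses, so the bound comes out to about 1.1 trillion pairs, which matches the comment. On Linux, a client could pass such an address to `socket.create_connection(..., source_address=(addr, 0))`; whether the actual fix varied the source or the destination address is not stated here.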
bombcar [3 hidden]5 mins ago
Zero, one, many, many thousands.
LoganDark [3 hidden]5 mins ago
And then they fix the issue by using multiple localhost IPs rather than, perhaps, not sending 15-20 thousand URIs at a time.
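One way to read "not sending 15-20 thousand URIs at a time" is to cap the batch size on the caller's side. A minimal sketch follows; the `chunked` helper and the limit of 50 (taken from the "1-50 post lookups per request" figure quoted above) are illustrative assumptions, not what Bluesky actually did.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Assumption: 50 mirrors the upper end of the typical per-request lookup count.
MAX_URIS_PER_REQUEST = 50


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Hypothetical URIs standing in for one oversized batch.
    uris = [f"at://example/post/{i}" for i in range(20_000)]
    batches = list(chunked(uris, MAX_URIS_PER_REQUEST))
    print(len(batches), "requests of at most", MAX_URIS_PER_REQUEST, "URIs")
```

Splitting the batch only bounds the size of each request. The caller would still need to limit how many of those requests run concurrently to avoid the same connection spike.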
odo1242 [3 hidden]5 mins ago
They mentioned it was a temporary fix that they removed after finding and fixing the true root cause, though.
htx80nerd [3 hidden]5 mins ago
less than ideal if I had to be frank.
drewg123 [3 hidden]5 mins ago
Golang's use of a potentially unbounded number of threads is just insane. I used to be fairly bullish on golang, but this, combined with the fact that it's garbage collected, makes me feel it's just unsuitable for production use.
tapoxi [3 hidden]5 mins ago
I don't really understand this architecture, but I thought Bluesky was distributed like Mastodon? How can it have an outage?
The simple answer is that atproto works like the web & search engines, where the apps aggregate from the distributed accounts. So the proper analogy here would be like yahoo going down in 1999.
tapoxi [3 hidden]5 mins ago
This is a fantastic write-up, thanks for sharing!
isodev [3 hidden]5 mins ago
Google and MSN Search were already available at this time. Also websites used to publish webrings and there was IRC and forums to ask people about things.
isodev [3 hidden]5 mins ago
It’s more of a concept of a plan for being distributed. I even went through the trouble of hosting my own PDS and still, I was unable to use the service during the outage.
Retr0id [3 hidden]5 mins ago
Mastodon infra can have outages, too.
tapoxi [3 hidden]5 mins ago
It's just confined to one instance if it goes down, not all of Mastodon.
LoganDark [3 hidden]5 mins ago
A web interface and home server can have an outage. Bluesky is just a web interface and home server.
mwkaufma [3 hidden]5 mins ago
Tell us more about this buggy "new internal service" that's scraping batch data :P
goekjclo [3 hidden]5 mins ago
> The timing of these log spikes lined up with drops in user-facing traffic, which makes sense. Our data plane heavily uses memcached to keep load off our main Scylla database, and if we're exhausting ports, that's a huge problem.
I expect this is common.
pembrook [3 hidden]5 mins ago
Distributed social media goes down? hrmmm.
Email and the internet don't have "downtime." Certain key infra providers do of course. ISPs can go down. DNS providers can go down. But the internet and email itself can't go down absent a global electricity outage.
You haven't built a decentralized protocol until you reach that standard imo. Otherwise it's just "distributed protocol" cosplay. Nice virtue signaling hat, but you aren't the real thing. Kind of like how everybody has been amnesia'd into thinking Obsidian is open source when it really isn't.
iAMkenough [3 hidden]5 mins ago
Bluesky is a provider. Blacksky didn’t go down.
gsibble [3 hidden]5 mins ago
Did all 3 users notice?
ffsm8 [3 hidden]5 mins ago
Naw, only one did. Turns out the other two were his socket accounts he used to upvote and comment on his own content.
Okay, nuff trolling for today
electrondood [3 hidden]5 mins ago
Great write up... curious about the RCA. Thanks!
rvz [3 hidden]5 mins ago
Thank you for the post mortem on this outage.
templar_snow [3 hidden]5 mins ago
[flagged]
lavela [3 hidden]5 mins ago
Why?
jonstaab [3 hidden]5 mins ago
nostr never goes down
jandrese [3 hidden]5 mins ago
If nostr went down would people even notice?
jonstaab [3 hidden]5 mins ago
probably not
pfraze [3 hidden]5 mins ago
All support to other decentralizers but nothing never goes down.
jonstaab [3 hidden]5 mins ago
1000x redundancy makes it vanishingly unlikely. Although I know we're due for a pole shift so all bets are off I suppose.
jmclnx [3 hidden]5 mins ago
Light blue on a dark blue background. That is a new one. I have seen grey text on light grey, but blue on blue?
The article does work in lynx, at least I can read it.