This should have end up differently

Fediverse memes
44 Posts 19 Posters 0 Views
  • R redacted@lemmy.zip

    Why does turning off bots turn off federation?

    dave@lemmy.nz
    wrote last edited by dave@lemmy.nz
    #19

    Cloudflare's bot detection triggers the blocking because federation looks a lot like a bot (well, it is a bot).

    For example, Lemmy.world will send my instance hundreds of thousands if not millions of requests a day, in a near steady stream. It's telling my instance about every post, comment, or vote. AI scrapers send hundreds of thousands of requests or millions in a near steady stream each day.

    For all intents and purposes, federation is bot traffic and looks just like it. Typically I block by identifying high-traffic ASNs (an ASN is a group of IPs run by the same entity; I block at that level because blackhat AI scrapers use many IPs) and showing a Cloudflare challenge (which will typically have a 0% pass rate). If the traffic comes from one IP then it's probably a federated instance, but I typically see many IPs from the same area with requests spread evenly across them.

    I also try to exclude federation/API endpoints, which can help stop false positives as scrapers are generally loading the web page.

    This is something Lemmy (and PieFed, Mbin) admins try to help each other with by sharing strategies, because one day a bot will find you and suddenly your instance is down because it is hammering you too hard.

    I bet if you are in China, Brazil, Singapore, Argentina, etc then you will see a lot of blocked content on Lemmy, as this is often where the bot traffic comes from (Google, Facebook, OpenAI, Amazon, etc will typically respect the robots.txt so US traffic is less of an issue).
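A minimal sketch of the rule order described in this post, assuming a simplified request model. The path prefixes, the ASN numbers, and the `decide` function are hypothetical placeholders for illustration; they are not Cloudflare's rule syntax or any instance's real configuration.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only.
FEDERATION_PREFIXES = ("/inbox", "/api/")  # assumed federation/API paths
HIGH_TRAFFIC_ASNS = {64500, 64501}         # ASNs reserved for documentation (RFC 5398)


@dataclass
class Request:
    path: str
    asn: int


def decide(req: Request) -> str:
    """Return "allow" or "challenge" for a single request."""
    # Federation and API endpoints are excluded first, so other instances
    # are never shown a challenge they cannot solve.
    if req.path.startswith(FEDERATION_PREFIXES):
        return "allow"
    # Web page loads from an ASN flagged as high traffic get a challenge.
    if req.asn in HIGH_TRAFFIC_ASNS:
        return "challenge"
    return "allow"
```

The order matters: checking the endpoint before the ASN is what keeps a federated instance hosted in a flagged network from being challenged.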

      cooper8@feddit.online
      wrote last edited by
      #20

      The thing that confuses me is, wouldn't a whitelist for federated instances and request frequency throttling at the account level solve this issue?

      I suppose this would require that the client not have a public front end that keeps full navigation functionality, but for a smaller instance that seems like an easy sacrifice to make in exchange for stability.

      "But then how will new instances get federated?" Maybe they have to actually talk to the admins of other instances to get vouched into the whitelist. Just because the network is distributed doesn't mean it needs to be fully inclusive by default, and in fact it explicitly isn't.

      I'm assuming I'm missing something super basic that makes all this not enough, bots spoofing the requests with the credentials of a whitelisted instance maybe?

      Seems like maybe the instances should have encrypted keys that handshake each other with batch requests.

      Am I on to something or just wildly gesticulating?

        dave@lemmy.nz
        wrote last edited by
        #21

        There are thousands of instances and it's not really about admins. If a Mastodon user wants to go and follow a Lemmy community, they can. They shouldn't need to ask their admin to contact the admin of the Lemmy instance to be allowed to.

        However, there is something called Fediseer which allows a chain of trust. Some instances guarantee other instances who then guarantee others down a chain. If an instance turns out bad then their guarantor can revoke it and any instances lower in the chain (that the spammy instance guarantees) also lose their trusted status. It doesn't share IPs to my knowledge though, and outbound IPs are different than the inbound one on the domain if there is a CDN like Cloudflare in the mix. The intent is actually to identify and block instances set up to spam (or other reasons to defederate).

        I think the other part missing is that it's not just instances. If you upload an image to Lemmy.world and then someone on feddit.online views it, the feddit.online user's browser loads that image directly from Lemmy.world. That means if you block any IP that's not an instance, people won't be able to see content uploaded by your users. So you have to be able to tell what is a Brazil-hosted AI bot and what's a Brazilian user viewing a meme your user uploaded.

        There are of course different parts that you can or can't block, which is basically the idea: working out which endpoints can be blocked and which will break things for genuine users. Static images can basically be ignored because Cloudflare will cache them, but thousands of post or feed loads in a hurry can bring down an instance.
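The chain-of-trust idea described for Fediseer earlier in this post can be sketched as a small tree where revoking one instance also revokes everything it vouched for. This is only an illustration of the cascading revocation: the `TrustChain` class and its method names are hypothetical and are not Fediseer's actual API or data model, and it assumes each instance has a single guarantor.

```python
from collections import defaultdict


class TrustChain:
    """Toy model of guarantees that cascade down a chain."""

    def __init__(self, roots):
        self.trusted = set(roots)
        self.guarantees = defaultdict(set)  # guarantor -> instances it vouches for

    def guarantee(self, guarantor, instance):
        if guarantor not in self.trusted:
            raise ValueError(f"{guarantor} is not trusted and cannot guarantee others")
        self.guarantees[guarantor].add(instance)
        self.trusted.add(instance)

    def revoke(self, instance):
        # Remove the instance from whichever guarantor vouched for it.
        for vouched in self.guarantees.values():
            vouched.discard(instance)
        # Walk down the chain: everything it guaranteed loses trust too.
        stack = [instance]
        while stack:
            current = stack.pop()
            self.trusted.discard(current)
            stack.extend(self.guarantees.pop(current, ()))


chain = TrustChain({"a.example"})
chain.guarantee("a.example", "b.example")
chain.guarantee("b.example", "c.example")
chain.guarantee("a.example", "d.example")
chain.revoke("b.example")  # c.example loses its status along with b.example
```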

        • dave@lemmy.nzD dave@lemmy.nz

          Running an instance without Cloudflare in front is hard work, because AI scrapers bring it to its knees. It's a never-ending battle to block them even with Cloudflare, but at least Cloudflare can help reduce the load, and even the free version comes with many tools to identify and block problematic bots.

          Though if you turn on bot blocking you break federation, so you have to be a lot more refined in your security rules.

          dragonfucker@lemmy.nz
          wrote last edited by
          #22

          What about Anubis?

            dave@lemmy.nz
            wrote last edited by
            #23

            Yeah so Anubis is like a Cloudflare challenge, it fits into a certain part of the process.

            My point is basically that Cloudflare provides a service that stands in for many things an admin could be doing. There are many instances that don't use Cloudflare, and I commend them for that. It's more work but certainly possible.

            There's also the additional problem that AI bots are breaking through Anubis so it can't be the only line of defence.

            E.g. https://news.ycombinator.com/item?id=44914773

              dragonfucker@lemmy.nz
              wrote last edited by
              #24

              Interesting, thanks

                redacted@lemmy.zip
                wrote last edited by
                #25

                Thank you for the detailed response 🙂 I even understood most of it

                • kierunkowy74@piefed.socialK kierunkowy74@piefed.social

                  Apparently there is a decentralized internet out there. Just we are not experiencing it right now. Skill issue, huh?

                  insert cursed wojak reaction

                  irelephant@lemmy.dbzer0.com
                  wrote last edited by
                  #26

                  cursed wojak

                  • slothrop@lemmy.caS slothrop@lemmy.ca

                    sh.itjust.works was out

                    kierunkowy74@piefed.social
                    wrote last edited by
                    #27

                    crap.itdidnt.work

                      cooper8@feddit.online
                      wrote last edited by cooper8@feddit.online
                      #28

                      Fediseer seems like a good solution, essentially a whitelist vouch system with vouching at second hand.

                      Regarding the media hosting, again it seems like something that could rely on a method of identifying the user request directly with their user account before responding to the request. Cookies could be an option for this, though they are falling out of favor. Alternatively, and more securely, it could be a cryptographic handshake where the user's home instance and the instance hosting the post generate a public key using their two private keys for the user, and the user provides the public key when making pull requests from the federated instance. The keys could be batch generated when an instance first federates content with another and then assigned to user accounts the first time the user makes a pull request through a link from their home instance to the federated instance.

                      Secure Scuttlebutt Protocol already developed the encryption methodology that could be cross-applied for a lot of this: https://ssbc.github.io/scuttlebutt-protocol-guide/ though I am of course not suggesting SSP be adopted whole cloth, and there are a bunch of other open-source projects with encryption that could be used. This is just the one that comes to mind.

                      (edit: also I am in favor of finding methodologies that work whether CloudFlare is used by the instance or not, obviously CloudFlare has advantages but as we have seen also is a vulnerability of the network.)

                        dave@lemmy.nz
                        wrote last edited by
                        #29

                        Regarding the media hosting, again it seems like something that could rely on a method of identifying the user request directly with their user account before responding to the request.

                        Yeah, so far it works to just check for a JWT in the cookie (regardless of what it is) to allow logged-in users to bypass the rules. This works on Lemmy because the bots aren't specifically targeting Lemmy so they don't try to fake this (although if they were, they could just make an instance and our instances would send them all the data lol).
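A minimal sketch of that kind of presence check, assuming the token lives in a cookie named `jwt` (an assumption for illustration; adjust to whatever the frontend actually sets). It only tests that the value has the three dot-separated segments of a JWT and deliberately does not verify the signature, matching the "regardless of what it is" approach above.

```python
from http.cookies import CookieError, SimpleCookie


def has_jwt_cookie(cookie_header: str, name: str = "jwt") -> bool:
    """Return True if the Cookie header carries something shaped like a JWT."""
    cookies = SimpleCookie()
    try:
        cookies.load(cookie_header)
    except CookieError:
        # A malformed header is treated the same as a missing cookie.
        return False
    morsel = cookies.get(name)
    if morsel is None:
        return False
    # A JWT is three non-empty segments: header.payload.signature
    parts = morsel.value.split(".")
    return len(parts) == 3 and all(parts)
```

Because nothing is verified, this only filters out bots that never send the cookie at all; a scraper that targeted the check could fake it trivially.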

                        Alternately, and more securely, it could be a cryptographic handshake where the user’s home instance and the instance hosting the post generate a public key using their two private keys for the user, and the user provides the public key when making pull requests from the federated instance.

                        This is already basically how ActivityPub works for communication between instances. But the activities are one thing, it's the page loads that are the killer because of the database queries needed to compile a unique, sorted home page of subscriptions. You could block logged out users but that impacts many lurkers.

                        For media, that's difficult as media is often being loaded from a remote instance that doesn't know who you are, along with the problem that the media provider is not technically part of Lemmy (it's a separate service called pict-rs) so doesn't know if you're logged in. I'm not sure how that works on PieFed or Mbin, but regardless you might not be logged in at all, and you should still be allowed to browse content.

                        Lemmy has a proxy option where the instance can fetch content from the other servers to provide to the user, which does get around this issue for logged-out users. But the proxy caches the media, and when this happens you are now the host of whatever media is in any post that made its way to your instance, along with all the legal risks that involves.

                        (edit: also I am in favor of finding methodologies that work whether CloudFlare is used by the instance or not, obviously CloudFlare has advantages but as we have seen also is a vulnerability of the network.)

                        All of the things being discussed around mitigations in Cloudflare are also possible to do without Cloudflare, but it just means setting it all up yourself. I'll just wait for someone smarter than me to build a tool I can host myself that does all this automatically, then I'll consider it 😅

                        • kierunkowy74@piefed.socialK kierunkowy74@piefed.social

                          Mine were PieFed.social, PieFed.zip and fedit.pl. I am aware of lemmy.world too. Probably there were more

                          ulterno@programming.dev
                          wrote last edited by
                          #30

                          Yeah mine too.
                          I started wondering if the requests from desktop Lemmy apps also go through Cloudflare (probably do).

                          • kierunkowy74@piefed.socialK kierunkowy74@piefed.social
                            This post did not contain any content.
                            cheesenoodle@lemmy.world
                            wrote last edited by
                            #31

                            So my takeaway from this thread is existing mega corporations have found a legal way (deliberately or not) to run endless denial of service attacks on potential competition?

                              oftheair@lemmy.blahaj.zone
                              wrote last edited by
                              #32

                              because AI scrapers bring it to its knees

                              There are (at least) three pieces of web software that protect against AI scrapers currently, so it should be more than possible without Cloudflare.

                                dave@lemmy.nz
                                wrote last edited by
                                #33

                                It's not even possible to do a good job of it with Cloudflare. What are the three you are referring to? The most commonly known one is Anubis, and Codeberg found that AI bots had learnt to solve its challenges.

                                  oftheair@lemmy.blahaj.zone
                                  wrote last edited by oftheair@lemmy.blahaj.zone
                                  #34

                                  Okay, seems there are only two, as it seems Nepenthes is no longer developed.

                                  • Anubis
                                  • Iocaine
                                    dave@lemmy.nz
                                    wrote last edited by
                                    #35

                                    Yeah so Anubis is the bot-blocking one, already breached by bots.

                                    Iocaine is an LLM maze and poisoner, intended to trap a bot, but your site still needs the resources to serve all the requests, and it's not clear what happens when a user is accidentally identified as a bot.

                                      oftheair@lemmy.blahaj.zone
                                      wrote last edited by
                                      #36

                                      Ah, okay.

                                      Thanks for the info!

                                        rekabis@lemmy.ca
                                        wrote last edited by
                                        #37

                                        Lemmy.world will send my instance hundreds of thousands if not millions of requests a day, in a near steady stream. It's telling my instance about every post, comment, or vote.

                                        And yet, federation means that each instance should know all the other domain names, yes? So do daily DNS lookups of every federated domain and auto-whitelist the IP addresses they resolve to.

                                        Sure, if you have to then configure cloudflare with these IPs, it’ll require an API to do so automatically.

                                        But otherwise if you are running some sort of throttling protection on the actual box or VM the instance is sitting on, it should be rather trivial to update it directly, especially if said throttling software is doing Linux correctly and drawing its whitelist from a flat file.
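The flat-file approach suggested here could look roughly like the sketch below. The function names and file layout are hypothetical, and the resolver is injectable so the logic can be exercised without network access. Note the caveat raised earlier in the thread: when an instance sits behind a CDN, the addresses its domain resolves to may not be the addresses its outbound federation traffic comes from, so a list built this way can miss real instances.

```python
import socket
from pathlib import Path
from typing import Callable, Iterable


def resolve(domain: str) -> set[str]:
    """Look up every IPv4/IPv6 address the domain currently resolves to."""
    infos = socket.getaddrinfo(domain, 443, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def build_allowlist(
    domains: Iterable[str],
    resolver: Callable[[str], set[str]] = resolve,
) -> list[str]:
    """Resolve each federated domain and return the sorted, de-duplicated IPs."""
    addresses: set[str] = set()
    for domain in domains:
        try:
            addresses |= resolver(domain)
        except OSError:
            # Lookup failed (socket.gaierror is an OSError); skip this domain.
            continue
    return sorted(addresses)


def write_allowlist(path: str, addresses: Iterable[str]) -> None:
    """Write one address per line, the flat-file format assumed here."""
    Path(path).write_text("".join(f"{address}\n" for address in addresses))
```

Run from a daily cron job or timer, `write_allowlist("allowlist.txt", build_allowlist(domains))` would refresh the file that the throttling software reads; the filename is a placeholder.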

                                          oftheair@lemmy.blahaj.zone
                                          wrote last edited by
                                          #38

                                          Guess they didn't live up to their name.
