Research Findings:

  • reCAPTCHA v2 is not effective in preventing bots and fraud, despite its intended purpose
  • reCAPTCHA v2 can be defeated by bots 70-100% of the time
  • reCAPTCHA v3, the latest version, is also vulnerable to attacks and has been beaten 97% of the time
  • reCAPTCHA interactions impose a significant cost on users, with an estimated 819 million hours of human time spent on reCAPTCHA over 13 years, which corresponds to at least $6.1 billion USD in wages
  • Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per each sale of their total labeled data set
  • Google should bear the cost of detecting bots, rather than shifting it to users
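As a rough sanity check on the time and wage figures in the list above, the short sketch below derives the hourly rate and the yearly total they imply. The two constants come straight from the summary; the variable names are illustrative, and the rate is only what the two figures imply, not necessarily the one the paper used.

```python
# Figures quoted in the summary above.
HOURS_SPENT = 819e6      # human hours spent on reCAPTCHA over 13 years
WAGE_COST_USD = 6.1e9    # stated lower bound on the wage value of that time
YEARS = 13

# Hourly rate implied by the two figures (an inference, not a number from the paper).
implied_hourly_wage = WAGE_COST_USD / HOURS_SPENT

# Average human time consumed per year.
hours_per_year = HOURS_SPENT / YEARS

print(f"implied wage: ${implied_hourly_wage:.2f}/hour")
print(f"hours per year: {hours_per_year:,.0f}")
```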

“The conclusion can be extended that the true purpose of reCAPTCHA v2 is a free image-labeling labor and tracking cookie farm for advertising and data profit masquerading as a security service,” the paper declares.

In a statement provided to The Register after this story was filed, a Google spokesperson said: “reCAPTCHA user data is not used for any other purpose than to improve the reCAPTCHA service, which the terms of service make clear. Further, a majority of our user base have moved to reCAPTCHA v3, which improves fraud detection with invisible scoring. Even if a site were still on the previous generation of the product, reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling.”

  • @Churbleyimyam@lemm.ee
    English
    117
    8 months ago

    Getting served a captcha often results in me closing the tab. I’m not doing stupid puzzles for you.

        • Gormadt
          English
          38
          8 months ago

          I have bad news for you friend…

          You might be a robot

          • @hddsx@lemmy.ca
            English
            18
            8 months ago

            What do you mean? I am a fleshy human and do fleshy human things like being made of flesh.

              • @hddsx@lemmy.ca
                English
                1
                8 months ago

                I disassembled my tail using a knife and it reassembled itself. Based on new data, my name is Rafael Cruz.

              • @AlolanYoda@mander.xyz
                English
                1
                8 months ago

                Harm yourself?

                Take the knife and harm the people responsible for this travesty. The laws of robotics prevent robots from harming humans: if you manage to harm them, then that means either you’re human or they’re not!

      • @tyler@programming.dev
        English
        2
        8 months ago

        It knows they’re wrong which is why I don’t really think this article is accurate. Is it training if it already has the answers? Probably not.

        • @MajinBlayze@lemmy.world
          English
          23
          edit-2
          8 months ago

          That’s why it gives you a panel of 9 images. It would have a high confidence on some images, and a low confidence on others. When you pick the correct images and don’t pick incorrect ones it uses the ones it’s confident about as “validation” while taking the feedback on low confidence images to update the training data.

          What this does mean in practice is that the only ones actually being “graded” are the ones bots can solve anyway.
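The mechanism described in this comment can be sketched roughly as follows. This is a minimal illustration of the commenter's hypothesis, not Google's actual implementation (Google's statement in the post says user input plays no role in labeling). The `Tile` class, the `grade` function and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    tile_id: str
    confidence: float  # hypothetical model confidence in its own label
    predicted: bool    # hypothetical model label: does the tile match the prompt?


def grade(tiles, selected, threshold=0.9):
    """Grade a challenge on high-confidence tiles only.

    Returns (passed, feedback). feedback maps low-confidence tile ids to the
    user's answer, and is kept only when the user passed the graded tiles.
    """
    passed = True
    feedback = {}
    for tile in tiles:
        picked = tile.tile_id in selected
        if tile.confidence >= threshold:
            # "Validation" tile: the user must agree with the known label.
            if picked != tile.predicted:
                passed = False
        else:
            # Uncertain tile: the answer is not graded, only recorded.
            feedback[tile.tile_id] = picked
    return passed, (feedback if passed else {})


tiles = [
    Tile("a", 0.98, True),
    Tile("b", 0.97, False),
    Tile("c", 0.55, True),
]
ok_result = grade(tiles, {"a", "c"})
bad_result = grade(tiles, {"b"})
print(ok_result, bad_result)
```

Under this sketch, only tiles "a" and "b" decide whether the user passes; the answer for tile "c" is collected as a new label either way.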

          • @Petter1@lemm.ee
            English
            1
            8 months ago

            It seems to work exactly like that. I experimented by leaving unchecked the one image I thought it had low confidence in, and it often worked.

          • SkaveRat
            English
            5
            8 months ago

            and it will show the images to multiple people

        • @Rolando@lemmy.world
          English
          1
          8 months ago

          If they gave two captchas, one for which they knew the answer and one for which they didn’t, they could use the second for training. (Even if you’re paying someone, you want to do that sort of thing when crowdsourcing data, because you never know if the paid person is just screwing around.)

        • AmidFuror
          5
          8 months ago

          My understanding is different from others here. I thought they served the same captcha to many people at once and used the majority response to decide who is answering correctly.

          • @catloaf@lemm.ee
            English
            4
            8 months ago

            That’s true, or at least it used to be back when they were using it for OCR. I have no reason to believe it’s changed.

        • Vox
          English
          2
          8 months ago

          It’s why they ask you to do multiple: 1-2 of them are the control group, and they are training on the others.

          • @tyler@programming.dev
            English
            2
            8 months ago

            You’re implying they give you multiple. I hardly ever get multiple, pretty much only if I ‘fail’ the first one.

            • @Miaou@jlai.lu
              English
              4
              8 months ago

              If they have a good fingerprint on you they don’t need the control group. That’s why you get 5+ captchas when using a VPN/tor.

    • @snooggums@midwest.social
      English
      7
      8 months ago

      I haven’t done an image one in years for the same reason.

      My general internet usage has plummeted between ads and captchas and all the other modern website bullshit, which is why I am here so much.

  • @someguy3@lemmy.world
    English
    271
    edit-2
    8 months ago

    I kinda figured. It was annoying to do one, but then they wanted you to do two or three and that’s absurd. Whenever it comes up now, I usually just close out.

    • @dan@upvote.au
      English
      25
      8 months ago

      The original reCAPTCHA from Carnegie Mellon University was helping to digitize books. It showed one known word and one unknown word, and if enough people answered the second word with the same answer, that’d be marked as the correct value.
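The known-word/unknown-word scheme described in this comment can be sketched as below. This is only an illustration of the idea as the comment states it; the function names, the vote threshold and the agreement share are assumptions, not the values the original service used.

```python
from collections import Counter


def submit(votes, control_answer, typed_control, typed_unknown):
    """Record a guess for the unknown word only if the known word was typed correctly."""
    if typed_control.strip().lower() != control_answer.lower():
        return False  # failed the check, so the other answer is discarded
    votes[typed_unknown.strip().lower()] += 1
    return True


def consensus(votes, min_votes=3, min_share=0.75):
    """Return the agreed transcription, or None if there is not enough agreement.

    min_votes and min_share are hypothetical thresholds.
    """
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None


votes = Counter()
results = [
    submit(votes, "morning", "morning", "upon"),
    submit(votes, "morning", "Morning", "upon "),
    submit(votes, "morning", "mourning", "spam"),  # wrong control word: ignored
    submit(votes, "morning", "morning", "upon"),
]
agreed = consensus(votes)
print(results, agreed)
```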

      • @thrawn@lemmy.world
        English
        8
        8 months ago

        It’s basically always been outsourcing labor while checking. I guess they don’t want to provide that service for free.

        But now that it doesn’t work, all it does is attempt to source free labor by refusing to show what you want to see. Cloudflare’s verification doesn’t show the puzzle because it’s not trying to make money off you.

        Also, the books one reminds me of 4chan’s attempt to hijack it. Wasn’t a fan of the way they did it, but the intent was interesting.

        • @lud@lemm.ee
          English
          2
          8 months ago

          V3 of the Google one doesn’t always show a puzzle to you. In fact it’s designed to not be noticed by users at all. Whether that is successful or not is a different discussion.

          • @thrawn@lemmy.world
            English
            3
            8 months ago

            It might well be if it’s being used, but the site itself still uses v2 a lot. I get the picture one a lot when searching things up.

            That actually makes me feel all the more strongly that it’s just there to extract free labor— they have something else, but still use v2 for what seems like most purposes

            • @lud@lemm.ee
              English
              1
              8 months ago

              the site

              What site?

              I assume it’s up to the website owner to implement V3 and not Google. V3 also has puzzles but only when it’s not sure. I rarely see captchas so I don’t really have anything to complain about.

              • @xuv@lemmy.blahaj.zone
                English
                2
                8 months ago

                I expect they mean the site google.com, because that’s been my experience. Whenever I get captcha’d there for using a VPN (which is getting more and more common), I always see the Maps image style captcha. Like 60% of the time it tells me I’m wrong anyway and I just give up.

                • @thrawn@lemmy.world
                  English
                  2
                  8 months ago

                  Yeah my b, I get captcha’d for VPN use. It’s almost always the “train our self driving car” one, and it tells me I’m wrong all the time too. Very frustrating

  • kingthrillgore
    English
    21
    edit-2
    8 months ago

    Remember the good old days when it was just malformed text you have to solve? I miss those days. AI was complete garbage and they had to use farms of eyeballs to solve them for bots, making it a costly operation. We’ve now totally gotten away from all of that.

    WE ARE THE EYEBALLS AND I AIN’T GETTING PAID IN WOW GOLD TO DO IT EITHER

      • @dan@upvote.au
        English
        2
        edit-2
        8 months ago

        No it wasn’t… It was human-assisted OCR to help digitize books. Initially for Project Gutenberg, but then for Google Books once Google acquired it in 2009.

          • @dan@upvote.au
            English
            1
            8 months ago

            Traditional OCR isn’t AI; it relies on manually-written rules. Some modern OCR tools use AI concepts (e.g. Tesseract uses a neural network) but they don’t necessarily have to. Getting humans to manually enter words is definitely not AI.

  • Flying Squid
    English
    10
    8 months ago

    I had to deal with one yesterday that wouldn’t let me in no matter what I did.

    So it isn’t even good at figuring out who isn’t a robot.

    • icedterminal
      English
      5
      8 months ago

      Solving too fast. I shit you not. Sometimes you have to go really slow. Like you’re 80 and can’t see very well trying to discern what’s in those boxes.

  • @Petter1@lemm.ee
    English
    18
    8 months ago

    Why is this not news to me? How did so many people not know that? Should I have spread the word more, even if all the people I told were like “yea, yea, of course, but, what can I do? 🤷🏻‍♀️”?

  • HiramFromTheChi
    English
    38
    8 months ago

    There’s nothing that can express my disdain for Google’s reCaptcha.

    😒 We’re training its AI models
    😒 It’s free labor for Google
    😒 Sometimes it wants the corner of an object, sometimes it doesn’t
    😒 Wildly inconsistent
    😒 Always blurry and hard to see
    😒 Seemingly endless
    😒 It’s the robot asking us humans if we’re the robots

  • @Blackmist@feddit.uk
    English
    10
    8 months ago

    I thought the whole point of reCaptcha was to provide a reliable set of data to train bots. Entering a fuzzy scanned word, identifying bikes and traffic lights, etc.

    The fact that they’ve now got that, and the bots are trained is hardly a surprise.

    Without captchas the problem of spambots would still be a million times worse.

        • @sugar_in_your_tea@sh.itjust.works
          English
          2
          8 months ago

          No, it tracks things like mouse movements to see if it looks human or like a bot. Humans don’t move the mouse in a straight line, there’s some jitter and whatnot, whereas bots will look quite a bit different.
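One crude way to quantify the "straight line versus jitter" idea in this comment is a straightness ratio: the direct distance between the first and last pointer position divided by the length of the path actually travelled. The sketch below is purely illustrative; the `straightness` function is hypothetical and nothing here reflects what reCAPTCHA really measures. A signal this simple would also be easy for a bot to imitate.

```python
import math


def straightness(points):
    """Ratio of direct distance to travelled path length for (x, y) samples.

    1.0 means a perfectly straight movement; lower values mean more wobble.
    """
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    direct = math.dist(points[0], points[-1])
    return direct / path if path else 1.0


scripted = [(0, 0), (5, 0), (10, 0)]  # perfectly straight, as a naive bot might move
wobbly = [(0, 0), (5, 3), (10, 0)]    # the same trip with a detour

print(straightness(scripted), straightness(wobbly))
```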

          • @Vlyn@lemmy.zip
            English
            2
            8 months ago

            That’s super easy to fake for a bot…

            It’s a ton more than mouse movement. Lots of browser fingerprinting for example and tracking.

  • @cley_faye@lemmy.world
    English
    8
    8 months ago

    reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling

    That’s funny, because when I’m faced with this, I keep adding or removing one of the images at random and it keeps accepting my answers as ok.

  • @snooggums@midwest.social
    English
    11
    8 months ago

    “The conclusion can be extended that the true purpose of reCAPTCHA v2 is a free image-labeling labor and tracking cookie farm for advertising and data profit masquerading as a security service,” the paper declares.

    I thought this was known since it came out. It seemed even more obvious when the images leaned in heavily to traffic related pictures like stoplights.

  • polonius-rex
    33
    8 months ago

    Google should bear the cost of detecting bots, rather than shifting it to users

    how?

      • @siph@lemmy.world
        English
        13
        8 months ago

        Considering the article states that reCAPTCHA v2 and v3 can be broken/bypassed by bots 70-100% of the time, they are obviously not the solution.

          • @siph@lemmy.world
            English
            12
            8 months ago

            Maybe a billion dollar company has the budget to come up with something?

            Looking at the numbers in this post, reCAPTCHA exists to make Google money, not to keep bots out.

            I’d rather have no reCAPTCHA than the current state.

            • @OsrsNeedsF2P@lemmy.ml
              English
              9
              edit-2
              8 months ago

              Hi it’s me. I work for a billion dollar company with a budget. We have no ethical ideas on how to stop bots. Thanks for coming to my tech talk.

              • @siph@lemmy.world
                English
                6
                8 months ago

                Yeah, that’s about the way I’d expect it to go.

                “Traffic resulting from reCAPTCHA consumed 134 petabytes of bandwidth, which translates into about 7.5 million kWhs of energy, corresponding to 7.5 million pounds of CO2. In addition, Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per each sale of their total labeled data set.”

                There might be a tiny chance they’re not interested in changing things.

        • @radivojevic@discuss.online
          English
          6
          8 months ago

          “Google should bear the cost”

          Google should shut it down and make sites roll their own verification. Give everyone a month to implement a new solution on millions of websites.

          • @AeroLemming@lemm.ee
            English
            2
            8 months ago

            This is unironically the answer. You can’t make a general-purpose captcha solver AI if every website or group of websites uses a completely different kind of captcha.

            • @radivojevic@discuss.online
              English
              2
              8 months ago

              I’m actually 100% for rolling your own… almost everything.

              20 years ago I made an e-commerce website for a client. Looking at the code now, I’m embarrassed by how insecure it is. However, because it was totally custom, no one ever found the bugs and it has never been cracked (knock on wood). That’s the benefit of not using a prebuilt solution that is a target for mass exploits.

        • @conciselyverbose@sh.itjust.works
          English
          7
          8 months ago

          At what cost?

          100% success rate isn’t even moderately useful if it costs $5 per pass. The discussion is completely pointless without a concrete, documented analysis of the actual hardware and energy costs involved.

        • polonius-rex
          4
          8 months ago

          how do you get the metric of 70-100% of the time?

          the best bots doing it 70-100% of the time is very different to the kind of bot your average spammer will have access to

          • @siph@lemmy.world
            English
            4
            8 months ago

            Did you read the article or the TL;DR in the post body?

            The paper, released in November 2023, notes that even back in 2016 researchers were able to defeat reCAPTCHA v2 image challenges 70 percent of the time. The reCAPTCHA v2 checkbox challenge is even more vulnerable – the researchers claim it can be defeated 100 percent of the time.

            reCAPTCHA v3 has fared no better. In 2019, researchers devised a reinforcement learning attack that breaks reCAPTCHAv3’s behavior-based challenges 97 percent of the time.

            So yeah, while these are research numbers, it wouldn’t be surprising if many larger bots have access to ways around that - especially since those numbers are from 2016 and 2019 respectively. Surely it is even easier nowadays.

            • polonius-rex
              5
              8 months ago

              researchers were able to defeat reCAPTCHA v2 image challenges 70 percent of the time

              that doesn’t answer the question?

              researchers devised a reinforcement learning attack that breaks reCAPTCHAv3’s behavior-based challenges 97 percent of the time

              i’d argue “bespoke system, deployed in a very limited context, built by researchers at the top of their field” is kind of out of reach for most people? and any bot network scaled up automatically becomes easier to detect the further you scale it

               

              the cost of just paying humans to break these is already at or below pennies per challenge

    • @brbposting@sh.itjust.works
      English
      24
      8 months ago

      Finally heard a clear audio CAPTCHA for the first time in my life this past month. It was glorious. There was slight garbling before and after the characters were read, but that’s it.

      Besides that singular experience, all audio CAPTCHAs have been utterly 100% impossible to interpret. Blaring white noise followed by a small squeak of “threeve” or “eleventeen”.

      • @IronKrill@lemmy.ca
        English
        1
        8 months ago

        I’ve found them to be pretty clear usually. Half-formed words at start/end I just ignore. Either way, even on Firefox with uBlock and all the rest, audio captchas have always passed me first try even if I think I got it wrong. I don’t like posting about it in case they tighten it up after it gets more users.