Love or hate just please explain why. This isn’t my area of expertise so I’d love to hear your opinions, especially if you’re particularly well versed or involved. If you have any literature, studies or websites let me know.

  • CodenameDarlen@lemmy.world
    link
    fedilink
    arrow-up
    31
    arrow-down
    2
    ·
    edit-2
    2 days ago

    They’re annoying to be honest.

    I used Qwen 3.5 for some research a few weeks ago, at first the good thing was every sentence was referenced by a link from the internet. So I naturally thought “well, it’s actually researching for me, so no hallucination, good”. Then I decided to look into the linked URLs and it was hallucinating text AND linking random URL to those texts (???), nothing that the AI outputs was really in the web page that was linked. The subject was the same, output and URLs, but it was not extracting actual text from the pages, it was linking a random URL and hallucinating the text.

    Related to code (that’s my area, I’m a programmer), I tried to use Qwen Code 3.5 to vibe code a personal project that was already initialized and basically working. But it just struggles to keep consistency, it took me a lot of hours just prompting the LLM and in the end it made a messy code base hard to be maintained, I asked to write tests as well and after I checked manually the tests they were just bizarre, they were passing but it didn’t cover the use cases properly, a lot of hallucination just to make the test pass. A programmer doing it manually could write better code and keep it maintainable at least, writing tests that covers actual use cases and edge cases.

    Related to images, I can spot from very far most of the AI generated art, there’s something on it that I can’t put my finger on but I somehow know it’s AI made.

    In conclusion, they’re not sustainable, they make half-working things, it generates more costs than income, besides the natural resources it uses.

    This is very concerning in my opinion, given the humanity history, if we rely on half-done things it might lead us to very problematic situations. I’m just saying, the next Chernobyl disaster might have some AI work behind it.

    • Buckshot@programming.dev
      link
      fedilink
      arrow-up
      12
      arrow-down
      1
      ·
      2 days ago

      Had the same research issue from multiple models. The website it linked existed and was relevant but often the specific page was hallucinated or just didn’t say what it said it did.

      In the end it probably created more work than it saved.

      Also a programmer and i find it OK for small stuff but anything beyond 1 function and it’s just unmaintainable slop. I tried vibe coding a project just to see what i was missing. Its fine, it did the job, but only if I dont look at the code. Its insecure, inefficient, and unmaintainable.

      • CodenameDarlen@lemmy.world
        link
        fedilink
        arrow-up
        4
        ·
        2 days ago

        I agree, I assumed this error was LLM related not Qwen itself. I think LLMs aren’t able to fit the referenced URL within the text extracted from it. They probably do some extensive research (I remember it searched like 20-40 sites), but it’s up to the LLM if it’ll use an exact mention of a given web page or not. So that’s the problem…

        Also it’s a complete mess to build frontend, if you ask a single landing page or pretty common interface it may be able to build something reasonable good, but for more complex layouts it’ll struggle a lot.

        I think this happens because it’s hard to test interfaces. I never got deep into frontend testing but I know there are ways to write actual visual tests for it, but the LLM can’t assimilate the code and an image easily, we’d need to take constant screenshots of the result, feed it back to the LLM and ask it to fix until the interface matches what you want. We’d need a vision capable mode more a coding one.

        I mean you may get good results for average and common layouts, but if you try anything different you’ll see a huge struggle from LLMs.

    • leoj@piefed.social
      link
      fedilink
      English
      arrow-up
      1
      ·
      2 days ago

      For context and to your knowledge of the field, is Qwen 3.5 supposed to be cutting edge?