• 0 Posts
  • 11 Comments
Joined 3 years ago
Cake day: August 4th, 2023

  • Traditional software was developed by humans as an artifact that got better to the degree that humans improved it for some task, but improvement was never guaranteed. Windows 11 is proof of that, and there is a laundry list of regressions and bugs introduced into software developed by humans. I acknowledge you say “usually” and “especially for open source”; I lukewarmly agree with that statement but disagree that large LLMs or other generative models will follow this trend. I merely want to point out that software usually picks up bugs as it’s developed, which are hopefully fixed by people who can reason over the code.

    Which brings us to AI models, and really they should just be called transformer models; they are statistical tensor product machines. They are not software in a traditional sense. They are trained to match their training input in a statistical sense. If the input data is corrupted, the model will actually get worse over time, not better. If the data is biased, it will get worse over time, not better. With the amount of slop generated on the web, it is extraordinarily hard to denoise and decide what’s good data and what’s bad data that shouldn’t be used for training. Which means the scaling we’ve seen with increased data will not necessarily hold. And there’s not a clear indication that scaling the model size, which is largely already impractical, is having some synergistic or emergent effect as hoped and hyped.

    Also, we’re really not in the infancy of AI. Maybe the infancy of widespread hype for it, but the idea of using tensor products for statistical learning algorithms goes back at least as far as Smolensky, maybe before, and that was what, 1990?

    We are, I’d say, in the infancy of quantum-style compute, so there we really don’t have much to draw on beyond theoretical models.

    Generative LLMs have largely plateaued, in my opinion.


  • In my experience it is obvious. Calling people on it also usually makes them feel embarrassed. I put something like “I can just ask an LLM myself if I wanted this output. Please provide your own commentary.” If I were a manager and I had an employee just copy-pasting that kind of output, I’d probably wonder if that employee actually contributes anything.



  • This already happens intrinsically in the models. The tokens are abstracted in the internal layers and only translated in the output layer back to next token prediction. Training visual models is slightly different because you’re not outputting tokens but pixel values (or possibly bounding boxes or edges, but not usually; conversely if not generative you may be predicting labels which could theoretically be in token space).

    The field itself is actually fairly stagnant in architecture. It’s still just attention layers all the way down. It’s just adding more context length and more layers and wider layers while training on more data. I personally think this approach will never achieve AGI or anything like it. It will get better at perfectly reciting its training data, but I don’t expect truly emergent phenomena to occur with these architectures just because they’re very big. They’ll be decent chatbots, but we already have that, and they’ll just consume ever more resources for vanishingly small improvements. They won’t functionally improve any true logical capability beyond regurgitating logical paths already trodden in their training data, and even that in a very brittle way, because they do not actually understand the logic or why the logic is valid; they have no true state model of the objects described in the token space they’re traversing probabilistically.



  • Current models are speculated to have 700 billion parameters or more. At 32-bit precision (single-precision float, 4 bytes per parameter), that’s 2.8TB of RAM per model, or about 10 of these units. There are ways to lower it, but if you’re trying to run full precision (say for training) you’d use over 2x this, maybe something like 4x depending on how you store gradients and updates, and then running full precision I’d reckon at 32-bit probably. It’s possible, I suppose, that they train at 32-bit, but I’d be kind of surprised.

    Edit: Also, they don’t release it anymore but some folks think newer models are like 1.5 trillion parameters. So figure around 2-3x that number above for newer models. The only real strategy for these guys is bigger. I think it’s dumb, and the returns are diminishing rapidly, but you got to sell the investors. If reciting nearly whole works verbatim is easy now, it’s going to be exact if they keep going. They’ll approach parameter spaces that can just straight up save things into their parameter spaces.
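
    For anyone who wants to redo the arithmetic above, here is a rough sketch. It only counts bytes for the weights; the function name and the `overhead_factor` knob (standing in for the "2x to 4x" training overhead from gradients and updates) are my own illustrative assumptions, and the parameter counts are the speculated figures from this comment, not published numbers.

```python
# Back-of-the-envelope memory estimate: parameters * bytes per parameter.
# Assumption: decimal terabytes (1 TB = 1e12 bytes), weights only.

TB = 10**12


def model_memory_bytes(n_params, bytes_per_param=4, overhead_factor=1.0):
    """Bytes needed to hold the weights, optionally scaled by a
    hypothetical overhead factor for gradients/optimizer state."""
    return n_params * bytes_per_param * overhead_factor


if __name__ == "__main__":
    # Speculated sizes mentioned in the comment, not confirmed figures.
    sizes = {"700B": 700 * 10**9, "1.5T": 1500 * 10**9}
    precisions = {"16-bit": 2, "32-bit": 4}

    for label, n_params in sizes.items():
        for name, nbytes in precisions.items():
            inference = model_memory_bytes(n_params, nbytes) / TB
            # 4x is the upper end of the rough training overhead guessed above.
            training = model_memory_bytes(n_params, nbytes, 4.0) / TB
            print(f"{label} @ {name}: ~{inference:.1f} TB weights, "
                  f"~{training:.1f} TB with 4x overhead")
```

    The 700B case at 4 bytes per parameter comes out to the 2.8TB figure above; at 2 bytes per parameter (half precision) it would be half that.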





  • Looking through, it seems like for the most part these are very niche and/or require the user to be using SSO or enterprise recovery options and/or to change and rotate keys or resync often. I think few people using this for personal use would be interacting with that attack surface or accepting organizational invites, but it is serious for organizations (probably why they’re trying to address this quickly).

    Honestly, I think a server in Bitwarden’s fleet being covertly controlled and undetected while also performing these attacks is unlikely. Certainly less likely than passwords being stolen from individual site hacks, or probably even from banks. At that point, it would just be easier to do these types of manipulations directly on bank accounts or crypto wallets or email accounts than here. Then again, if you crack a wallet like this you theoretically get all the goodies to those too, I suppose, for a possibly short time (assuming the user wasn’t also using 2FA that isn’t email based).

    Not to minimize these issues; they need to fix them. I’m just trying to ascertain how severe they are and whether individual users should have much cause for concern.