If AI is so good at coding … where are the open source contributions?
scruiser @awful.systems
The promptfarmers can push hallucination rates incrementally lower by spending 10x compute on training (and training on 10x the data and paying 10x the runtime cost), but they're already consuming a plurality of all VC funding, so they can't 10x many more times without going bust entirely. And they aren't going to get hallucinations down to 0%: they're intrinsic to how LLMs operate, and no patch with run-time inference or multiple tries or RAG will eliminate that.
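To put some toy numbers on the diminishing returns: suppose hallucination rate followed a power law in training compute (the constants below are completely made up for illustration, not fitted to anything). Each successive 10x of spend buys a smaller absolute improvement, and the rate never hits zero:

```python
# Toy model: hallucination rate as a power law in compute,
# rate = base * compute^(-alpha). Constants are invented purely
# to illustrate diminishing returns under exploding cost.
base, alpha = 0.30, 0.15

for tenx in range(6):  # 0 through 5 successive 10x-ings of compute
    compute = 10 ** tenx
    rate = base * compute ** (-alpha)
    print(f"{tenx} tenfold increases: cost x{compute:>7,}, "
          f"hallucination rate ~{rate:.1%}")
```

Under those (again, made-up) numbers, five 10x-ings multiply your costs by 100,000 while only dragging the rate from 30% down to ~5%. The asymptote isn't zero.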
And as for newer models... o3 actually had a higher hallucination rate because trying to squeeze rational logic out of the models with fine-tuning just breaks them in a different direction.
I will acknowledge that in domains with analytically verifiable answers you can check the LLM's output that way, but at that point it's no longer primarily an LLM: you've got an entire expert system or proof assistant or whatever that can operate independently of the LLM, and the LLM is just providing creative input.
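A minimal sketch of that generate-and-verify pattern (the `llm_propose` stub here stands in for whatever model you'd actually call, and in this toy it just guesses randomly; the point is that the verifier does the real work and never trusts the model):

```python
import random

def llm_propose(n: int) -> tuple[int, int]:
    """Stand-in for an LLM proposing a factorization of n.
    Here it just guesses; a real model would be the 'creative input'."""
    a = random.randint(2, n - 1)
    return a, n // a

def verify(n: int, a: int, b: int) -> bool:
    """Analytic check, fully independent of the model: is a*b really n?"""
    return a * b == n and a > 1 and b > 1

def factor_with_llm(n: int, tries: int = 1000) -> tuple[int, int] | None:
    for _ in range(tries):
        a, b = llm_propose(n)
        if verify(n, a, b):   # only verified answers ever get returned
            return a, b
    return None               # no hallucination escapes the verifier

print(factor_with_llm(91))    # (7, 13) or (13, 7) once a guess lands
```

The correctness guarantee lives entirely in `verify`, which is an expert system in miniature. The model only narrows the search, which is exactly why the combined system's reliability isn't really the LLM's reliability at all.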