AI solves every river crossing puzzle, we can go home now

Gemini - River Crossing Puzzle Solution Explained

I think this summarizes in one conversation what is so fucking irritating about this thing: I am supposed to believe that it wrote that code.
No siree, no RAG, no trickery with training a model to transform the code while maintaining an identical expression graph, it just goes from word-salading all over the place on a natural language task to outputting 100 lines of coherent code.
Although that does suggest a new dunk on computer touchers of the AI enthusiast kind: you can point at that and say that coding clearly does not require any logical reasoning.
(Also, as usual with AI, it is not always that good. Sometimes it fucks up the code, too.)
It is funny how, when generating the code, it suddenly appears to have "understood" what the instruction "The dog can not be left unattended" means, while that was clearly not the case for the natural language output.
That's what I was going to say. The natural language version actually claims that it leaves the dog behind unattended in every step, even though the following step continues as though it still has the dog and not whichever vegetable it brought back in the previous step.
Either it's not actually good at natural language processing or some element of the solution isn't surviving the shift from the river_cross() tool to natural language output. Whatever actual state it's tracking internally doesn't track to the output past the headline.
Other funny thing: it only became a fully automatic plagiarism machine when it claimed that it wrote the code (referring to itself by name which is a dead giveaway that the system prompt makes it do that).
I wonder if code is where they will ultimately get nailed to the wall for willful copyright infringement. Code is too brittle for their standard approach, "we sort of blurred a lot of works together so it's ours now, transformative use, fuck you, prove that you don't just blur other people's work together, huh?".
But also for a piece of code, you can very easily test if the code has the same "meaning" - you can implement a parser that converts code to an expression graph, and then compare that. Which makes it far easier to output code that is functionally identical to the code they are plagiarizing, but looks very different.
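A rough sketch of that kind of comparison, using Python's standard `ast` module. This is only an illustration under loud assumptions: the names `fingerprint` and `same_structure` are made up for this example, and it only normalizes identifier names, so it catches renamed copies but not reordered statements or deeper rewrites. A real expression-graph comparison would need a lot more than this.

```python
import ast


class _Canon(ast.NodeTransformer):
    """Rename every identifier to a positional placeholder (v0, v1, ...)."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        # The same original name always maps to the same placeholder.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)  # continue into arguments and body
        return node


def fingerprint(source: str) -> str:
    """Hypothetical helper: dump the AST with identifier names normalized."""
    tree = _Canon().visit(ast.parse(source))
    # ast.dump leaves out line/column info by default; comments never reach the AST.
    return ast.dump(tree)


def same_structure(a: str, b: str) -> bool:
    """True if two snippets differ only in naming, comments, or layout."""
    return fingerprint(a) == fingerprint(b)
```

Two functions that differ only in variable names and comments get the same fingerprint, while changing an operator or a constant produces a different one.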
But also I estimate approximately 0% probability that the assholes working on that wouldn't have banter between themselves about copyright laundering.
edit: Another thing is that since it can have no conception of its own of what "correct" behavior is for a piece of code being plagiarized, it would also plagiarize all the security exploits.
This hasn't been a big problem for the industry, because only short snippets were being cut and pasted (how to make some stupid API call, etc), but with generative AI whole implementations are going to get plagiarized wholesale.
Unlike any other work, code comes with its own built-in, essentially irremovable "watermark" in the form of security exploits. In several thousand lines of code, there would be enough "watermark" for identification.
To give an example, Warner Brothers got sued by Bethesda for stealing code from Fallout Shelter when making their Westworld mobile game, with Bethesda pointing to a bug that appeared in early versions of FO Shelter as evidence of stolen code.
I'd say that's incredibly unlikely unless an LLM suddenly blurts out Tesla's entire self-driving codebase.
The code itself is probably among the least behind-a-moat things in software development, that's why so many big players are fine with open sourcing their stuff.
@HedyL @diz I kinda wonder if this would work better if it just was worded the other way round: "must be supervised always"
If I understand correctly, LLMs have difficulties encoding negative correlations (not, un-, ...)
Edit: or maybe not, seeing it did this transformation already in the introduction and still lets the dog escape on the very first turn
That is not equivalent, though; other solutions to "can not be left unattended" exist; just ask Kristi Noem.