vrighter

Yes, but now you've shifted the problem again. You went from detecting infinite sites by detecting loops in an infinite tree that has no loops (or has infinitely many distinct URLs), to somehow keeping a list of all those infinitely many distinct URLs to avoid visiting one twice (which you wouldn't anyway, because there are infinitely many links), to assuming you already have a list of which sites these are, so you can avoid them and therefore don't have to worry about detecting them (the very thing you started with).

It's ok to admit that your initial idea was wrong. You did not solve a coding problem. You changed the requirements so it's not your problem anymore.

And storing a domain whitelist wouldn't work either, btw. A tarpit entrance is just one URL among lots of legitimate ones, on legitimate domains.

6mo ago

AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt

It's one domain, with infinitely many pages under that domain. Limiting the maximum number of visits per domain is a very different thing from trying to detect loops that aren't there. You are now making a completely different argument. In fact it sounds suspiciously like the only thing I said they could do: have some arbitrary threshold beyond which they give up... because there's no way of detecting it otherwise.
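For what it's worth, here is roughly what such an arbitrary per-domain threshold could look like. This is only a minimal sketch: `DomainBudget`, its `allow` method, the limit of 3 and the example hostnames are all made up for illustration, not taken from any real crawler.

```python
from collections import Counter
from urllib.parse import urlsplit


class DomainBudget:
    """Hypothetical helper: refuse further fetches once a host hits its cap."""

    def __init__(self, max_pages_per_domain):
        self.max_pages = max_pages_per_domain
        self.counts = Counter()  # pages already allowed, keyed by hostname

    def allow(self, url):
        host = urlsplit(url).hostname or ""
        if self.counts[host] >= self.max_pages:
            return False  # budget spent: give up on this host
        self.counts[host] += 1
        return True


budget = DomainBudget(max_pages_per_domain=3)  # arbitrary threshold
urls = [f"https://tarpit.example/page/{i}" for i in range(5)]
urls.append("https://other.example/")
decisions = [budget.allow(u) for u in urls]
print(decisions)
```

Note that nothing here detects a tarpit. It just stops after N pages, and it would cut off a large legitimate site at exactly the same point.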

6mo ago

AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt

What part of "they do not repeat" do you still not get? You can put them in a list, but you won't ever get a hit, so it'd just be wasting memory.

6mo ago

Facebook admits that the Linux topic crackdown was 'in error' and has been fixed

funny how they never make errors in favour of anyone else but themselves

6mo ago

Astral is building a new static type checker for Python, from scratch, in Rust

I don't believe it's possible either. For example, the tree walker of the ast module takes the node passed to it, checks its type, gets the type's name, then looks up a method with that dynamically built name in your implementation of the tree walker. If that method exists (the user might not have implemented a visit method for that type of node), it calls it and passes the node to it. All of this happens at runtime.
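A rough sketch of that dispatch, assuming it works the way I described. `MiniVisitor` and `NameCollector` are hypothetical stand-ins written for this comment, not the actual source of `ast.NodeVisitor`, though they follow the same name-based lookup pattern.

```python
import ast


class MiniVisitor:
    """Hypothetical re-creation of name-based visitor dispatch."""

    def visit(self, node):
        # Build the method name from the node's runtime class, e.g. "visit_Name".
        method_name = "visit_" + type(node).__name__
        # Fall back to generic_visit when the subclass defines no such method.
        handler = getattr(self, method_name, self.generic_visit)
        return handler(node)

    def generic_visit(self, node):
        # Default behaviour: just keep walking into the children.
        for child in ast.iter_child_nodes(node):
            self.visit(child)


class NameCollector(MiniVisitor):
    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        # Only ever reached through the string lookup in visit().
        self.names.append(node.id)


collector = NameCollector()
collector.visit(ast.parse("total = price * quantity"))
print(collector.names)  # ['total', 'price', 'quantity']
```

Which method gets called depends on a string assembled at runtime, so a static checker cannot know in general what `handler` is or what it returns.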

6mo ago

deepseek

but you can't train it yourself

6mo ago

Permanently Deleted

So? It won't have any effect on China, because last I checked, US laws apply only in the US.

6mo ago

AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt

sure, if you have enough memory to store a list of all guids.

6mo ago

AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt

An infinite loop detector detects when you're going round in circles. It can't detect when you're going down an infinitely deep acyclic graph, because that, by definition, doesn't have any loops for it to detect. The best a crawler can do is have a threshold after which it gives up.
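To make that concrete, here is a toy sketch under made-up assumptions: `links` fakes a tarpit where every page links to two brand-new URLs, and `crawl` is a plain breadth-first crawler with a visited set. Neither is anyone's real implementation.

```python
from collections import deque


def links(url):
    """Hypothetical tarpit: each page links to two URLs never seen before."""
    return [url + "/a", url + "/b"]


def crawl(start, max_pages):
    seen = {start}            # the "loop detector": URLs already queued
    queue = deque([start])
    fetched = 0
    revisits = 0              # how often the visited set actually helped
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        for nxt in links(url):
            if nxt in seen:
                revisits += 1
            else:
                seen.add(nxt)
                queue.append(nxt)
    return fetched, revisits, len(seen)


fetched, revisits, remembered = crawl("https://tarpit.example", max_pages=1000)
print(fetched, revisits, remembered)  # 1000 0 2001
```

The visited set never gets a single hit and only grows. The one thing that ends the crawl is `max_pages`, which is the arbitrary threshold again.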

6mo ago

Open-source Deepseek R1 dethrones commercial AI, now allegedly being hit by cyberattack

I'm not seeing any reasoning, that was the point of my comment. That's why I said "supposed"

6mo ago

deepseek

They also call "outputs that fit the learned probability distribution, but that I personally don't like/agree with" "hallucinations". They also call "showing your working" "reasoning". The LLM space has redefined a lot of words. I see no problem with defining words. It's nondeterministic, true, but its purpose is to take input and compile it into weights that are supposed to be executed in some sort of runtime. I don't see myself as redefining the word. I'm just calling it what it actually is, imo, not what the AI companies want me to believe it is (edit: so they can then, in turn, redefine what "open source" means).

6mo ago

deepseek

it's just a different paradigm. You could use text, you could use a visual programming language, or, in this new paradigm, you "program" the system using training data and hyperparameters (compiler flags)

6mo ago

Open-source Deepseek R1 dethrones commercial AI, now allegedly being hit by cyberattack

So... with all the supposed reasoning stuff they can do, and the supposed "extrapolation of knowledge", they cannot figure out that a tail is part of a cat, or which part it is.

6mo ago

deepseek

No, it's not. It's equivalent to me releasing obfuscated Java bytecode (which, by this definition, is just data, because it needs a runtime to execute) while keeping the Java source code itself to myself.

Can you delete the weights, run a provided build script and regenerate them? No? Then it's not open source.