Posts
11
Comments
107
Joined
2 yr. ago

  • This is true only if the decisions were made independently. If you allow people to make a decision after they've seen the metrics, this no longer holds.

    Here's an example of the first. You go to a farmers' market with a cow and ask everyone to write down on a piece of paper what they think it weighs. If you collect the replies and average them, the mean will be quite close to the real weight. A mix of non-experts and experts will somehow iron out a good answer.

    Now take the average experience of choosing a restaurant. One has just opened, with great food and great staff, but only 5 reviews, averaging 3.8 or so. Another restaurant nearby has been open for 3-4 years and has 1000 reviews, averaging maybe 3.9. People will usually pick the one with more reviews because the extra information makes it feel like the safer option. However, if you were to hide the ratings and ask them to choose by just looking at the venue and the menu, they would probably choose the first one.

    Group dynamics are quite interesting, and the psychology behind this is quite funky sometimes :D
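
To make the independence point concrete, here is a rough simulation sketch in Python. Everything in it is an assumption for illustration: the cow's weight, the noise level, and especially the `pull` parameter, which models people nudging their guess towards the average they have already seen. It is a toy model, not a claim about how real crowds behave.

```python
import random
import statistics

TRUE_WEIGHT = 600.0  # hypothetical cow weight in kg (made up for the demo)
NOISE = 100.0        # hypothetical spread of individual guesses


def independent_guesses(n, rng):
    """Everyone writes down a guess without seeing anyone else's."""
    return [rng.gauss(TRUE_WEIGHT, NOISE) for _ in range(n)]


def herded_guesses(n, rng, pull=0.8):
    """Each person sees the running average first and moves towards it."""
    guesses = []
    for _ in range(n):
        guess = rng.gauss(TRUE_WEIGHT, NOISE)
        if guesses:
            seen = statistics.fmean(guesses)
            # Assumed herding rule: blend the private guess with what was seen.
            guess = (1 - pull) * guess + pull * seen
        guesses.append(guess)
    return guesses


def average_error(make_guesses, trials=500, crowd=100, seed=0):
    """Mean absolute error of the crowd average over many repeated markets."""
    rng = random.Random(seed)
    errors = [
        abs(statistics.fmean(make_guesses(crowd, rng)) - TRUE_WEIGHT)
        for _ in range(trials)
    ]
    return statistics.fmean(errors)


err_independent = average_error(independent_guesses)
err_herded = average_error(herded_guesses)
print(f"independent crowd: off by {err_independent:.1f} kg on average")
print(f"herded crowd:      off by {err_herded:.1f} kg on average")
```

In this toy model the herded crowd's average lands further from the true weight, because the first few guesses end up dominating everyone who comes after them.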

  • Be wary that their docs are so-so. Nanonets OCR, Mistral OCR and MinerU will also extract formulas and images.

    One other model I forgot to mention is Docling. This one is quite quick to set up in a Docker container, and comes with a web interface ready to go where you can upload documents. It more or less follows the PaddleOCR pipeline, but also allows you to use VLMs.

    Good luck!

  • If you find that OCR doesn't get you very far, maybe try a small VLM to parse PNGs of the pages. For example, Nanonets OCR will do this, although it is quite slow if you don't have a GPU. It will give you a Markdown version of the page, which you can then translate with another tool.

    PaddleOCR might also be useful, since it focuses on Chinese, but it's more difficult to set up. To add to this, some other options are MinerU and Mistral OCR (this one is paid, but you can test it for free if you upload your document to Mistral's library).

  • Sure, if all politicians make all their data available to the public. Their phone chat messages, photos taken, everything.

    No...? Then don't bring it up ever again. Initiatives like these will only make it look like you're a villain if you want privacy.

  • I was hoping a dbzer0 PieFed instance would happen sometime in the future! I would totally use it, since it has some pretty cool features that Lemmy has been quite slow in implementing. For example, merged communities that cover one specific topic.

  • All the ones I mentioned can be installed with pip or uv if I am not mistaken. It would probably be more finicky than containers that you can put behind a reverse proxy, but it is possible if you wish to go that route. Ollama will also run system-wide, so any project will be able to use its API without you having to create a separate environment and download the same model twice in order to use it.
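
As a rough sketch of what "any project can use its API" looks like in practice, here is a standard-library-only Python call to a local Ollama server. It assumes Ollama is listening on its default address (`http://localhost:11434`) and that you have already pulled a model; the model name `llama3.2` in the usage note is just a placeholder, not something from the thread.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; adjust if yours differs


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, base_url=OLLAMA_URL, timeout=120):
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, something like `generate("llama3.2", "Say hi in one sentence.")` should return the reply as a string. Since this is plain HTTP, no per-project environment or second model download is needed.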

  • Ollama for the API, which you can integrate into Open WebUI. I believe you can also integrate image generation with ComfyUI.

    It's less of a hassle to use Docker for Open WebUI, but Ollama works as a regular CLI tool.

  • You really think this is all him? Guy's got a full team running the media. The Heritage Foundation's got its grimy hands all over the country.

  • Look into Cosmic DE. It has a similar vibe if you set up tiling, but without all the headache of configuring all the components so that it is usable.

  • Permanently Deleted

  • Didn't this guy just go on French national TV and say that Macron is a dictator?

    I think if this guy gets voted in, Romania might say bye bye to its EU funds.

  • I heard that Poland is also cheering for some MAGA guy in the next election... Troubling times ahead.

    For Romania, there might still be a chance in the run-off. However, the difference between the two candidates was quite large (20% difference; 1.8 million votes). Similarly, the other candidates seemed to have voters that would rather vote for the nazi. Most likely all hope is lost, but that 1% chance is still there.

  • You're right! Sorry for the typo. The older nomic-embed-text model is often used in examples, but granite-embedding is a more recent and smaller one for English-only text (30M parameters). If your use case is multilingual, they also offer a bigger one (278M parameters) that can handle English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese (Simplified). I would test them out a bit to see what works best for you.

    Furthermore, if you're not dependent on MariaDB for something else in your system, there are also some other vector databases I would recommend. Qdrant also works quite well, and you can integrate it pretty easily in something like LangChain. It really depends on how much you want to push your RAG workflow, but let me know if you have any other questions.
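
If it helps, here is a minimal sketch of the embedding step on its own, before any vector database gets involved. It assumes a local Ollama server on the default port, its `/api/embed` endpoint, and a model tag of `granite-embedding:30m` (check `ollama list` for the exact tag on your machine). The `rank` helper is a hypothetical stand-in for what Qdrant or MariaDB would do at scale.

```python
import json
import math
import urllib.request


def embed(texts, model="granite-embedding:30m", base_url="http://localhost:11434"):
    """Ask a local Ollama server for one embedding vector per input text."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["embeddings"]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

Typical use would be to call `embed(docs)` once, `embed([question])` per query, and then `rank(...)` to pick the closest chunks. A brute-force ranking like this is fine for a few thousand chunks; beyond that, a proper vector store is the better route.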

  • Have a look at Ollama embeddings. Easy to set up and the models are much smaller than a typical LLM.

  • No one is stopping them from buying meat though? There are still big sections full of it in stores that you see whenever you go grocery shopping.

    Heck, I would ban the ads on all food products if it were me. Same with pharmaceuticals.

  • My bad, I initially read that we should give Florida to Russia in exchange for Ukraine getting back Crimea, haha. I suggested giving Alaska instead because it was also part of Russia before, so they could spin the narrative much as they did with Ukraine.

    But yeah, I don't think the clown in the White House can really understand the situation unless you put it in perspective for him.

  • Why not give Alaska back to Russia? You know, if we're at the stage where we say we cede territories...

  • It would be a shame if someone were to make a post with their office locations across Europe and share it in all the European communities on Lemmy...

  • What's the plan against this? It's pretty clear that this type of grifting works. Hungary kept Orban in power for far too long, and now Romania might be next.

  • Wait, you didn't need a passport before, did you? Since it was still EU travel, you could get in with just a national ID, or am I wrong?

  • Romania has previously jumped into a war, only to change sides later. I wouldn't be surprised if they end up taking the bait before the upcoming presidential elections. From what I've heard, the far right candidate that's left in the race is betting that he will get the votes of everyone that voted for Calin Georgescu. His platform? Being a boot licker for Trump.

    Troubling times for the Balkans.

  • Free and Open Source Software @beehaw.org

    Open Source Text-to-Speech and Speech-to-Text on Android?

    Europe @feddit.org

    Advancing European Sovereignty in HPC with RISC-V

    Europe @feddit.org

    EU Digital Sovereignty - Time to provide alternatives to US/Chinese big tech

    Europe @feddit.org

    We can all help Ukraine - UNITED24

    Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ @lemmy.dbzer0.com

    Archiving papers using Zotero headless?

    Technology @lemmy.world

    Redox OS 0.9.0 - Redox - Your Next(Gen) OS

    Linux @lemmy.ml

    Poll: GUI framework for widgets/apps in Wayland

    Arch Linux @lemmy.ml

    Installing AUR packages after using archinstall

    Modded Minecraft @sopuli.xyz

    Improving server performance for All the Mods 8

    Linux @lemmy.ml

    Jump from Arch to NixOS?

    Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ @lemmy.dbzer0.com

    Sites or Trackers for Exam Dumps