Free Open-Source Artificial Intelligence
- Introducing LLM360: Fully Transparent Open-Source LLMs (www.llm360.ai)
We are thrilled to introduce LLM360, an initiative to open source LLMs that fosters transparency, trust, and collaborative research. When releasing models under LLM360, we strive to make all details of the LLMs accessible to everyone.
- Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance (venturebeat.com)
However, the seeming lack of safety guardrails also may present a challenge to policymakers and regulators.
Original Mistral AI blog: https://mistral.ai/news/mixtral-of-experts/
- mistral-8x7b-chat (huggingface.co: mattshumer/mistral-8x7b-chat)
> A very capable chat model built on top of the new Mistral MoE model, trained on the SlimOrca dataset for 1 epoch, using QLoRA.
- SuperDuperDB - Integrating AI directly with your Databases
https://github.com/SuperDuperDB/superduperdb
- Anybody know what's up with the abstrusegoose comic?
cross-posted from: https://lemmy.ca/post/10994517
> Sorry if this isn't relevant to the community, but couldn't think of anywhere better to post. I saw something curious in my RSS comics feed last night for the Abstruse Goose comic. The author is fairly prolific and used to post comics based on math, technology, etc. His site and archive of comics has now been replaced with a single cryptic message:
>
> "AGI will not be designed by humans. It will be evolved through relentless evolutionary computational processes designed by humans."
>
> Very curious! Anybody have any theories on what is going on? I can't imagine what his motivation might be :)
- QuIP#: SOTA 2 bit LLMs (github.com: Cornell-RelaxML/quip-sharp)
>Large language models (LLMs) exhibit amazing performance on a wide variety of tasks such as text modeling and code generation. However, they are also very large. For example, Llama 2 70B has 70 billion parameters that require 140GB of memory to store in half precision. This presents many challenges, such as needing multiple GPUs just to serve a single LLM. To address these issues, researchers have developed compression methods that reduce the size of models without destroying performance.
>
>One class of methods, post-training quantization, compresses trained model weights into lower precision formats to reduce memory requirements. For example, quantizing a model from 16 bit to 2 bit precision would reduce the size of the model by 8x, meaning that even Llama 2 70B would fit on a single 24GB GPU. In this work, we introduce QuIP#, which combines lattice codebooks with incoherence processing to create state-of-the-art 2 bit quantized models. These two methods allow QuIP# to significantly close the gap between 2 bit quantized LLMs and unquantized 16 bit models.
Project Page: https://cornell-relaxml.github.io/quip-sharp/
Code: https://github.com/Cornell-RelaxML/quip-sharp
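To make the abstract's numbers concrete, here is a minimal back-of-the-envelope sketch (plain Python, no dependencies, and ignoring QuIP#'s small codebook overhead) of how parameter count and bits per weight translate into the memory figures quoted above:

```py
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed just to store the model weights."""
    return n_params * bits_per_weight / 8 / 1e9  # bits -> bytes -> decimal GB

# Llama 2 70B in half precision (16-bit): ~140 GB, as quoted in the abstract.
print(weight_memory_gb(70e9, 16))  # 140.0

# The same weights at 2-bit: ~17.5 GB, an 8x reduction that fits on a 24GB GPU.
print(weight_memory_gb(70e9, 2))   # 17.5
```

The same arithmetic is why activations, KV cache, and runtime overhead still matter in practice: the figures above only cover the stored weights.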
- Mistral AI bucks release trend by dropping torrent link to new open source LLM (venturebeat.com)
Mistral AI released a new LLM with nothing but a torrent link — creating community buzz for the stark contrast with Google's Gemini launch.
Nitter “original” with magnet link: https://nitter.net/MistralAI/status/1733150512395038967
- ByteDance AI promises an open-weight GPT stronger than Gemini, dropping soon
Nitter “original” thread: https://nitter.net/QuanquanGu/status/1732484036160012798
- Fine Tuning Mistral 7B on Magic the Gathering Drafts (generallyintelligent.substack.com)
Tips, examples, and thoughts from an exploration of the world of fine tuning
cross-posted from: https://derp.foo/post/467324
> There is a discussion on Hacker News, but feel free to comment here as well.
- Joint initiative for trustworthy AI (actu.epfl.ch)
ETH Zurich and EPFL are launching the “Swiss AI Initiative”, whose purpose is to position Switzerland as a leading global hub for the development and implementation of transparent and reliable artificial intelligence (AI). The new Alps supercomputer based at the Swiss National Supercomputing Centre ...
- New technique to run 70B LLM inference on a single 4GB GPU (ai.gopubby.com)
Large language models require huge amounts of GPU memory. Is it possible to run inference on a single GPU? If so, what is the minimum GPU…
- Seamless Communication - A family of AI translation models that enable more natural and authentic communication across languages (ai.meta.com)
A significant step towards removing language barriers through expressive, fast and high-quality AI translation
SeamlessM4T
---
SeamlessM4T is our foundational all-in-one Massively Multilingual and Multimodal Machine Translation model delivering high-quality translation for speech and text in nearly 100 languages.
SeamlessM4T models support the tasks of:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)
🌟 We are releasing SeamlessM4T v2, an updated version with our novel UnitY2 architecture. This new model improves over SeamlessM4T v1 in quality as well as inference latency in speech generation tasks.
To learn more about the collection of SeamlessM4T models, the approach used in each, their language coverage and their performance, visit the SeamlessM4T README or 🤗 Model Card
Code: https://github.com/facebookresearch/seamless_communication
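If you want to try these tasks locally, here is a hedged sketch using the Hugging Face transformers integration; the model id and generate() arguments are written from memory of that integration, so verify them against the README and model card linked above.

```py
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# Text-to-speech translation (T2ST): English text in, French speech out.
text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")
audio = model.generate(**text_inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()

# Text-to-text translation (T2TT): same inputs, but skip speech generation.
tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
print(processor.decode(tokens[0].tolist()[0], skip_special_tokens=True))
```

Speech inputs (S2ST, S2TT, ASR) follow the same pattern, with an audio array passed to the processor instead of text.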
- Introducing SDXL Turbo
SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. A real-time demo is available here: http://clipdrop.co/stable-diffusion-turbo
Key Takeaways:
- SDXL Turbo achieves state-of-the-art performance with a new distillation technology, enabling single-step image generation with unprecedented quality, reducing the required step count from 50 to just one.
- See our research paper for specific technical details regarding the model’s new distillation technique that leverages a combination of adversarial training and score distillation.
- Download the model weights and code on Hugging Face, currently being released under a non-commercial research license that permits personal, non-commercial use.
- Test SDXL Turbo on Stability AI’s image editing platform Clipdrop, with a beta demonstration of the real-time text-to-image generation capabilities.
Model weights and code: https://huggingface.co/stabilityai/sdxl-turbo
Demo: https://clipdrop.co/stable-diffusion-turbo
Paper: https://stability.ai/research/stability-ai-adversarial-diffusion-distillation
Blogpost: https://stability.ai/news/stability-ai-sdxl-turbo
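The single-step generation described above maps onto a couple of pipeline arguments. Here is a hedged sketch with diffusers, following my reading of the model card linked above (one inference step, guidance disabled because the model is distilled without classifier-free guidance):

```py
import torch
from diffusers import AutoPipelineForText2Image

# Load the SDXL Turbo weights from the Hugging Face repo linked above.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Single-step generation: SDXL Turbo is distilled for num_inference_steps=1,
# and guidance_scale=0.0 turns off classifier-free guidance.
image = pipe(
    prompt="a photo of a red fox in a snowy forest",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("fox.png")
```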
- What's the deal with LlamaCPP and caching?
I'm curious what it is doing from a top down perspective.
I've been playing with a 70B chat model that has several datasets fine-tuned on top of Llama 2. There are some unusual features somewhere in this LLM and I am not sure whether they come from training or from something else (unusual layers?). The model has built-in roleplaying stories I've never seen other models perform. These stories are not in the Oobabooga Textgen WebUI. The model can do stuff like a Roman gladiator story, and some NSFW stuff. These are not very realistic stories and play out with the depth of a child's videogame. They are structured so rigidly that they feel like they are coming from a hidden system context.
Take the gladiator story: it plays out like Tekken on the original PlayStation, and no amount of dialogue context about how real gladiators behaved will change the story flow. I tried adding that gladiators were mostly nonlethal fighters and showmen, more closely aligned with the wrestler-actors that were popular in the 80's and 90's, but no amount of input into the dialogue or system contexts changed the story from a constant series of lethal encounters. These stories could override pretty much anything I added to system context in Textgen.
There was one story that turned an escape room into objectification of women, and another where name-1 is basically a Loki-like character that makes the user question what is really happening by taking on elements in system context but changing them slightly. I had 5 characters in system context and it shifted between them circumstantially, in a storytelling fashion that was highly intentional with each shift. (I know exactly what a bad system context can do, and what errors look like in practice, especially with this model. I am 100% certain these are either (over)trained or programmatic in nature.) Asking the model to generate a list of built-in roleplaying stories produces a similar list the couple of times I cared to ask. I try to stay away from these "built-in" roleplays as they all seem rather poorly written; I think this model does far better when I write the entire story in system context. One of the main things the built-in stories do that surprises me is maintaining a consistent set of character identities and features throughout the story. The user can pick a trident or gladius, drop into a dialogue that is far longer than the batch size, and then return with the same weapon in the next fight. Normally I would expect that kind of persistence only if the detail was added to the system context.
Is this behavior part of some deeper layer of llama.cpp that I do not see in the Python version or Textgen source, like is there an additional persistent context stored in the cache?
- Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (replicate.com: nateraw/video-llava)
Abstract
>The Large Vision-Language Model (LVLM) has enhanced the performance of various downstream tasks in visual-language understanding. Most existing approaches encode images and videos into separate feature spaces, which are then fed as inputs to large language models. However, due to the lack of unified tokenization for images and videos, namely misalignment before projection, it becomes challenging for a Large Language Model (LLM) to learn multi-modal interactions from several poor projection layers. In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM. As a result, we establish a simple but robust LVLM baseline, Video-LLaVA, which learns from a mixed dataset of images and videos, mutually enhancing each other. Video-LLaVA achieves superior performances on a broad range of 9 image benchmarks across 5 image question-answering datasets and 4 image benchmark toolkits. Additionally, our Video-LLaVA also outperforms Video-ChatGPT by 5.8%, 9.9%, 18.6%, and 10.1% on MSRVTT, MSVD, TGIF, and ActivityNet, respectively. Notably, extensive experiments demonstrate that Video-LLaVA mutually benefits images and videos within a unified visual representation, outperforming models designed specifically for images or videos.
Paper: https://arxiv.org/abs/2311.10122
Code: https://github.com/PKU-YuanGroup/Video-LLaVA
Demo: https://huggingface.co/spaces/LanguageBind/Video-LLaVA
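Since the link above points at a Replicate deployment, calling it from Python might look like the following hedged sketch; the input field names are assumptions on my part, and community models on Replicate usually need an explicit version pin, so check the model page for the real schema.

```py
import replicate

# Requires REPLICATE_API_TOKEN to be set in the environment.
# Input keys below are illustrative; community models typically also need a
# version pin, e.g. "nateraw/video-llava:<version-hash>" from the model page.
output = replicate.run(
    "nateraw/video-llava",
    input={
        "video_path": "https://example.com/clip.mp4",
        "text_prompt": "Describe what happens in this video.",
    },
)
print(output)
```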
- StyleTTS 2
Abstract
>In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style diffusion and adversarial training with large speech language models (SLMs) to achieve human-level TTS synthesis. StyleTTS 2 differs from its predecessor by modeling styles as a latent random variable through diffusion models to generate the most suitable style for the text without requiring reference speech, achieving efficient latent diffusion while benefiting from the diverse speech synthesis offered by diffusion models. Furthermore, we employ large pre-trained SLMs, such as WavLM, as discriminators with our novel differentiable duration modeling for end-to-end training, resulting in improved speech naturalness. StyleTTS 2 surpasses human recordings on the single-speaker LJSpeech dataset and matches it on the multispeaker VCTK dataset as judged by native English speakers. Moreover, when trained on the LibriTTS dataset, our model outperforms previous publicly available models for zero-shot speaker adaptation. This work achieves the first human-level TTS synthesis on both single and multispeaker datasets, showcasing the potential of style diffusion and adversarial training with large SLMs.
Paper: https://arxiv.org/abs/2306.07691
Code: https://github.com/yl4579/StyleTTS2
Colab: https://colab.research.google.com/github/yl4579/StyleTTS2/blob/main/
- LLaVA-Plus - Large Language and Vision Assistants that Plug and Learn to Use Skills
Abstract
>LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models. It maintains a skill repository of pre-trained vision and vision-language models and can activate relevant tools based on users' inputs to fulfill real-world tasks. LLaVA-Plus is trained on multimodal instruction-following data to acquire the ability to use tools, covering visual understanding, generation, external knowledge retrieval, and compositions. Empirical results show that LLaVA-Plus outperforms LLaVA in existing capabilities and exhibits new ones. It is distinct in that the image query is directly grounded and actively engaged throughout the entire human-AI interaction sessions, significantly improving tool use performance and enabling new scenarios.
Paper: https://arxiv.org/abs/2311.05437
Code: https://github.com/LLaVA-VL/LLaVA-Plus-Codebase
Demo: https://llavaplus.ngrok.io/
Dataset: https://huggingface.co/datasets/LLaVA-VL/llava-plus-data
Model: https://llava-vl.github.io/llava-plus/
- An open-source computer vision framework to build and deploy apps in minutes without worrying about multimedia pipelines (github.com: pipeless-ai/pipeless)
- Llama 2 / WizardLM Megathread
Llama 2 & WizardLM Megathread
Starting another model megathread to aggregate resources for any newcomers.
It's been a while since I've had a chance to chat with some of these models, so let me know some of your favorites in the comments below.
There are many to choose from - sharing your experience could help someone else decide which to download for their use-case.
Thread Models:
---
Quantized Base Llama-2 Chat Models
Llama-2-7b-Chat
GPTQ
GGUF
AWQ
---
Llama-2-13B-chat
GPTQ
GGUF
AWQ
---
Llama-2-70B-chat
GPTQ
GGUF
AWQ
---
Quantized WizardLM Models
WizardLM-7B-V1.0+
GPTQ
GGUF
AWQ
---
WizardLM-13B-V1.0+
GPTQ
GGUF
AWQ
---
WizardLM-30B-V1.0+
GPTQ
- WizardLM-30B-uncensored-GPTQ
- WizardLM-Uncensored-SuperCOT-StoryTelling-30B-GPTQ
- WizardLM-33B-V1.0-Uncensored-GPTQ
GGUF
- WizardLM-30B-GGUF
- WizardLM-Uncensored-SuperCOT-StoryTelling-30B-GGUF
- WizardLM-33B-V1.0-Uncensored-GGUF
AWQ
---
Llama 2 Resources
> LLaMA 2 is a large language model developed by Meta and is the successor to LLaMA 1. LLaMA 2 is available for free for research and commercial use through providers like AWS, Hugging Face, and others. LLaMA 2 pretrained models are trained on 2 trillion tokens and have double the context length of LLaMA 1. Its fine-tuned models have been trained on over 1 million human annotations.
Llama 2 Benchmarks
> Llama 2 shows strong improvements over prior LLMs across diverse NLP benchmarks, especially as model size increases: on well-rounded language tests like MMLU and AGIEval, Llama-2-70B scores 68.9% and 54.2% - far above MPT-7B, Falcon-7B, and even the 65B Llama 1 model.
Llama 2 Tutorials
Tutorials by James Briggs (also linked above) are quick, hands-on ways for you to experiment with Llama 2 workflows. See also a poor man's guide to fine-tuning Llama 2. Check out Replicate if you want to host Llama 2 with an easy-to-use API.
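If you'd rather skip a full web UI and just poke at one of the GGUF quantizations listed above, here is a hedged sketch using llama-cpp-python; the file name is an example of TheBloke's naming scheme, not a specific release.

```py
from llama_cpp import Llama

# Path to a downloaded GGUF file, e.g. one of TheBloke's Llama-2-7B-chat quants.
llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096, n_gpu_layers=-1)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GPTQ vs GGUF in two sentences."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```

Set `n_gpu_layers=0` if you want to stay entirely on CPU.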
---
Did I miss any models? What are some of your favorites? Which family/foundation/fine-tuning should we cover next?
- We're building FOSAI models! Cast your votes and pick your tunings.
Hey everyone!
I think it's time we had a fosai model on HuggingFace. I'd like to start collecting ideas, strategies, and approaches for fine-tuning our first community model.
I'm open to hearing what you think we should do. We will release more in time. This is just the beginning.
For now, I say let's pick a current open-source foundation model and fine-tune on datasets we all curate together, built around a loose concept of using a fine-tuned LLM to teach ourselves more bleeding-edge technologies (and how to build them using technical tools and concepts).
FOSAI is a non-profit movement. You own everything fosai as much as I do. It is synonymous with the concept of FOSS. It is for everyone to champion as they see fit. Anyone is welcome to join me in training or tuning using the workflows I share along the way.
You are encouraged to leverage fosai tools to create and express ideas of your own. All fosai models will be licensed under Apache 2.0. I am open to hearing thoughts if other licenses should be considered.
---
We're Building FOSAI Models! 🤖
Our goal is to fine-tune a foundation model and open-source it. We're going to start with one foundation family with smaller parameters (7B/13B) then work our way up to 40B (or other sizes), moving to the next as we vote on what foundation we should fine-tune as a community.
---
Fine-Tuned Use Case ☑️
Technical
- FOSAI Model Idea #1 - Research & Development Assistant
- FOSAI Model Idea #2 - Technical Project Manager
- FOSAI Model Idea #3 - Personal Software Developer
- FOSAI Model Idea #4 - Life Coach / Teacher / Mentor
- FOSAI Model Idea #5 - FOSAI OS / System Assistant
Non-Technical
- FOSAI Model Idea #6 - Dungeon Master / Lore Master
- FOSAI Model Idea #7 - Sentient Robot Character
- FOSAI Model Idea #8 - Friendly Companion Character
- FOSAI Model Idea #9 - General RPG or Sci-Fi Character
- FOSAI Model Idea #10 - Philosophical Character
OR
FOSAI Foundation Model ☑️
---
Foundation Model ☑️
(Pick one)
Mistral
Llama 2
Falcon
..(Your Submission Here)
---
Model Name & Convention
snake_case_example
CamelCaseExample
kebab-case-example
0.) FOSAI ☑️
fosai-7B
fosai-13B
1.) FOSAI Assistant ☑️
fosai-assistant-7B
fosai-assistant-13B
2.) FOSAI Atlas ☑️
fosai-atlas-7B
fosai-atlas-13B
3.) FOSAI Navigator ☑️
fosai-navigator-7B
fosai-navigator-13B
4.) ?
---
Datasets ☑️
TBD!
What datasets do you think we should fine-tune on?
---
Alignment ☑️
To embody open-source mentalities, I think it's worth releasing both censored and uncensored versions of our models. This is something I will consider as we train and fine-tune over time. Like any tool, you are responsible for your usage and how you choose to incorporate it into your business and/or personal life.
---
License ☑️
All fosai models will be licensed under Apache 2.0. I am open to hearing thoughts if other licenses should be considered.
This will be a fine-tuned model, so it may inherit some of the permissions and license agreements of its foundation model and may have other implications depending on your country or local law.
Generally speaking, you can expect all fosai models to be commercially viable, based on the choice of foundation family and the fine-tuning applied on top of it.
---
Costs
I will be personally covering all training and deployment costs. This may change if I choose to put together some sort of patronage, but for now - don't worry about this. I will be using something like RunPod or some other custom deployed solution for training.
---
Cast Your Votes! ☑️
Share Your Ideas & Vote in the Comments Below! ✅
What do you want to see out of this first community model? What are some of the fine-tuning ideas you've wanted to try, but never had the time or chance to test? Let me know in the comments and we'll brainstorm together.
I am in no rush to get this out, so I will leave this up for everyone to see and interact with until I feel we have a solid direction we can all agree upon. There will be plenty more opportunities to create, curate, and customize the fosai models I plan to release in the future.
Update [10/25/23]: I may have found a fine-tuning workflow for both Llama (2) and Mistral, but I haven't had any time to validate the first test run. Once I have a chance to do this and test some inference, I'll update this post with the workflow, the models, and some sample output with example datasets. Unfortunately, I have run out of personal funds to allocate to training, so it is unclear when I will be able to make another attempt if this first one doesn't pan out. Will keep everyone posted as we approach the end of 2023.
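For anyone who wants to experiment with a fine-tuning workflow like the one mentioned in the update above, here is a heavily hedged QLoRA-style sketch using transformers, peft, and trl. The base model, dataset, and SFTTrainer arguments are illustrative (trl's API has shifted between versions), so treat this as a starting point rather than the workflow referenced in the update.

```py
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer

base_model = "mistralai/Mistral-7B-v0.1"  # or a Llama 2 base, per the vote above

# 4-bit quantized base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Low-rank adapters trained on top of the frozen 4-bit weights.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Example instruction dataset with a plain "text" column; swap in whatever we curate.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",   # depends on the dataset schema
    max_seq_length=1024,
)
trainer.train()
trainer.save_model("fosai-7B-qlora-adapter")
```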
- LM Studio - A new tool to discover, download, and run local LLMs
Hey everyone!
I don't think I've shared this one before, so allow me to introduce you to 'LM Studio' - a new application that is tailored to LLM developers and enthusiasts.
Check it out!
- https://lmstudio.ai/
- https://github.com/lmstudio-ai
---
With LM Studio, you can ...
>🤖 - Run LLMs on your laptop, entirely offline
>👾 - Use models through the in-app Chat UI or an OpenAI compatible local server
>📂 - Download any compatible model files from HuggingFace 🤗 repositories
>🔭 - Discover new & noteworthy LLMs in the app's home page
>LM Studio supports any ggml Llama, MPT, and StarCoder model on Hugging Face (Llama 2, Orca, Vicuna, Nous Hermes, WizardCoder, MPT, etc.)
>Minimum requirements: M1/M2 Mac, or a Windows PC with a processor that supports AVX2. Linux is under development.
>Made possible thanks to the llama.cpp project.
>We are expanding our team. See our careers page.
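If you end up using the OpenAI-compatible local server mentioned above, a client sketch could look like the following; the port is an assumption based on LM Studio's defaults, so adjust the base URL to whatever the in-app server tab shows.

```py
from openai import OpenAI

# Point the OpenAI client at the local LM Studio server instead of api.openai.com.
# The port is an assumed default; the API key is ignored for local inference.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # LM Studio serves whichever model you have loaded
    messages=[{"role": "user", "content": "Summarize what GGUF is in one sentence."}],
)
print(response.choices[0].message.content)
```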
---
Love seeing these new tools come out! Especially with the new gguf format being widely adopted.
The regularly updated and curated list of new LLM releases they provide through this platform is enough for me to keep it installed.
I'll be tinkering plenty when I have the time this week. I'll be sure to let everyone know how it goes! In the meantime, if you do end up giving LM Studio a try - let us know your thoughts and experience with it in the comments below.
- Combining 'LocalAI' + 'Continue' to Create a Private Co-Pilot Coding Assistant!
Hello everyone!
I am working on figuring out better workflows and bringing back a more consistent post schedule. In the meantime, I'd like to leave you with a new update from LocalAI & Continue.
Check these projects out! More info from the Continue & LocalAI teams below:
Continue
- https://continue.dev/
- https://github.com/continuedev/continue
The open-source autopilot for software development: a VS Code extension that brings the power of ChatGPT to your IDE.
LocalAI
- https://localai.io/basics/news/
- https://github.com/go-skynet/LocalAI
LocalAI is a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families that are compatible with the ggml format. Does not require GPU.
Combining the Power of Continue + LocalAI!
- https://github.com/go-skynet/LocalAI/blob/master/examples/continue/README.md
---
Note
From this release, the llama backend supports only gguf files (see 943 ). LocalAI however still supports ggml files: we ship a version of llama.cpp from before that change in a separate backend named llama-stable, so ggml files can still be loaded. If you were specifying the llama backend manually to load ggml files, from this release you should use llama-stable instead, or not specify a backend at all (LocalAI will handle this automatically).
Continue
This document presents an example of integration with continuedev/continue.
For a live demonstration, please click on the link below:
Integration Setup Walkthrough
- As outlined in `continue`'s documentation, install the Visual Studio Code extension from the marketplace and open it.

- In this example, LocalAI will download the gpt4all model and set it up as "gpt-3.5-turbo". Refer to the `docker-compose.yaml` file for details.

```bash
# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/continue

# Start with docker-compose
docker-compose up --build -d
```

- Type `/config` within Continue's VSCode extension, or edit the file located at `~/.continue/config.py` on your system with the following configuration:

```py
from continuedev.src.continuedev.libs.llm.openai import OpenAI, OpenAIServerInfo

config = ContinueConfig(
    ...
    models=Models(
        default=OpenAI(
            api_key="my-api-key",
            model="gpt-3.5-turbo",
            openai_server_info=OpenAIServerInfo(
                api_base="http://localhost:8080",
                model="gpt-3.5-turbo"
            )
        )
    ),
)
```

This setup enables you to make queries directly to your model running in the Docker container. Note that the `api_key` does not need to be properly set up; it is included here as a placeholder.

If editing the configuration seems confusing, you may copy and paste the provided default `config.py` file over the existing one in `~/.continue/config.py` after initializing the extension in the VSCode IDE.

Additional Resources
- Free Open-Source AI LLM Guide (Summer 2023)
Hello everyone!
We have officially hit 1,000 subscribers! How exciting!! Thank you for being a member of !fosai@lemmy.world. Whether you're a casual passerby, a hobby technologist, or an up-and-coming AI developer - I sincerely appreciate your interest and support in a future that is free and open for all.
It can be hard to keep up with the rapid developments in AI, so I have decided to pin this at the top of our community to be a frequently updated LLM-specific resource hub and model index for all of your adventures in FOSAI.
The ultimate goal of this guide is to become a gateway resource for anyone looking to get into free open-source AI (particularly text-based large language models). I will be doing a similar guide for image-based diffusion models soon!
In the meantime, I hope you find what you're looking for! Let me know in the comments if there is something I missed so that I can add it to the guide for everyone else to see.
---
Getting Started With Free Open-Source AI
Have no idea where to begin with AI / LLMs? Try starting with our Lemmy Crash Course for Free Open-Source AI.
When you're ready to explore more resources see our FOSAI Nexus - a hub for all of the major FOSS & FOSAI on the cutting/bleeding edges of technology.
If you're looking to jump right in, I recommend downloading oobabooga's text-generation-webui and installing one of the LLMs from TheBloke below.
When you're ready, give https://fosai.xyz a visit and check out some of the resources I've placed on there for the community.
Try both GGML and GPTQ variants to see which model type performs to your preference. See the hardware tables below to get a better idea of which parameter size you might be able to run (3B, 7B, 13B, 30B, 70B).
8-bit System Requirements
| Model | VRAM Used | Minimum Total VRAM | Card Examples | RAM/Swap to Load* |
|-----------|-----------|--------------------|-------------------|-------------------|
| LLaMA-7B | 9.2GB | 10GB | 3060 12GB, 3080 10GB | 24 GB |
| LLaMA-13B | 16.3GB | 20GB | 3090, 3090 Ti, 4090 | 32 GB |
| LLaMA-30B | 36GB | 40GB | A6000 48GB, A100 40GB | 64 GB |
| LLaMA-65B | 74GB | 80GB | A100 80GB | 128 GB |
4-bit System Requirements
| Model | Minimum Total VRAM | Card Examples | RAM/Swap to Load* |
|-----------|--------------------|--------------------------------|-------------------|
| LLaMA-7B | 6GB | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060 | 6 GB |
| LLaMA-13B | 10GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 | 12 GB |
| LLaMA-30B | 20GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 32 GB |
| LLaMA-65B | 40GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000 | 64 GB |
*System RAM (not VRAM) is utilized to initially load a model. You can use swap space if you do not have enough RAM to support your LLM.
When in doubt, try starting with 3B or 7B models and work your way up to 13B+.
---
FOSAI Resources
Fediverse / FOSAI
LLM Leaderboards
LLM Search Tools
---
Large Language Model Hub
oobabooga
text-generation-webui - a big community favorite: a Gradio web UI by oobabooga designed for running almost any free, open-source large language model downloaded off of HuggingFace, including (but not limited to) LLaMA, llama.cpp, GPT-J, Pythia, OPT, and many others. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation. It is highly compatible with many formats.
Exllama
A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs.
gpt4all
Open-source assistant-style large language models that run locally on your CPU. GPT4All is an ecosystem to train and deploy powerful and customized large language models that run locally on consumer-grade processors.
TavernAI
The original branch of software SillyTavern was forked from. This chat interface offers very similar functionality but has less cross-client compatibility with other chat and API interfaces (compared to SillyTavern).
SillyTavern
Developer-friendly, Multi-API (KoboldAI/CPP, Horde, NovelAI, Ooba, OpenAI+proxies, Poe, WindowAI(Claude!)), Horde SD, System TTS, WorldInfo (lorebooks), customizable UI, auto-translate, and more prompt options than you'd ever want or need. Optional Extras server for more SD/TTS options + ChromaDB/Summarize. Based on a fork of TavernAI 1.2.8
Koboldcpp
A self contained distributable from Concedo that exposes llama.cpp function bindings, allowing it to be used via a simulated Kobold API endpoint. What does it mean? You get llama.cpp with a fancy UI, persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios and everything Kobold and Kobold Lite have to offer. In a tiny package around 20 MB in size, excluding model weights.
KoboldAI-Client
This is a browser-based front-end for AI-assisted writing with multiple local & remote AI models. It offers the standard array of tools, including Memory, Author's Note, World Info, Save & Load, adjustable AI settings, formatting options, and the ability to import existing AI Dungeon adventures. You can also turn on Adventure mode and play the game like AI Dungeon Unleashed.
h2oGPT
h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document question-answer capabilities. Documents help to ground LLMs against hallucinations by providing them context relevant to the instruction. h2oGPT is a fully permissive Apache V2 open-source project for 100% private and secure use of LLMs and document embeddings for document question-answer.
---
Models
The Bloke
The Bloke is a developer who frequently releases quantized (GPTQ) and optimized (GGML) open-source, user-friendly versions of AI Large Language Models (LLMs).
These conversions of popular models can be configured and installed on personal (or professional) hardware, bringing bleeding-edge AI to the comfort of your home.
Support TheBloke here.
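If you'd rather pull one of TheBloke's quantized files from a script than through a web UI, here is a hedged sketch using huggingface_hub; the repo and file names are illustrative of his naming scheme, so confirm them on the actual model card.

```py
from huggingface_hub import hf_hub_download

# Download a single 4-bit GGUF quant from one of TheBloke's repos.
# Repo id and filename are illustrative; confirm them on the model card.
path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
    local_dir="./models",
)
print(f"Saved to {path}")
```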
---
70B
---
30B
---
13B
---
7B
---
More Models
---
More General AI/LLM Resources
Awesome-LLM: https://github.com/Hannibal046/Awesome-LLM
Awesome Jailbreaks: https://github.com/0xk1h0/ChatGPT_DAN
Awesome Prompts: https://github.com/f/awesome-chatgpt-prompts
Prompt-Engineering-Guide: https://github.com/dair-ai/Prompt-Engineering-Guide
AI Explained (Great channel for AI news): https://piped.video/channel/UCNJ1Ymd5yFuUPtn21xtRbbw
Lex Fridman (In depth podcasts): https://piped.video/channel/UCSHZKyawb77ixDdsGog4iWA
---
LLM Leaderboards
LLM Logic Tests:
https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit#gid=2011456595
llm-leaderboard: https://github.com/LudwigStumpp/llm-leaderboard
Chat leaderboard: https://chat.lmsys.org/?leaderboard
Gotzmann LLM Score v2.4: https://docs.google.com/spreadsheets/d/1ikqqIaptv2P4_15Ytzro46YysCldKY7Ub2wcX5H1jCQ/edit#gid=0
LLM Worksheet: https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYpb63e1ZR3aePczz3zlbJW-Y4/edit#gid=0
CanAiCode Leaderboard: https://huggingface.co/spaces/mike-ravkine/can-ai-code-results
AlpacaEval Leaderboard https://tatsu-lab.github.io/alpaca_eval/
Measuring Massive Multitask Language Understanding: https://github.com/hendrycks/test
Awesome-LLM-Benchmark: https://github.com/SihyeongPark/Awesome-LLM-Benchmark
---
Places to Find Models
Discover the LLMs: https://llm.extractum.io/
Open LLM Models List: https://github.com/underlines/awesome-marketing-datascience/blob/master/llm-model-list.md
OSS_LLMs: https://docs.google.com/spreadsheets/d/1PtrPwDV8Wcdhzh-N_Siaofc2R6TImebnFvv0GuCCzdo/edit#gid=0
OpenLLaMA: An Open Reproduction of LLaMA: https://github.com/openlm-research/open_llama
open-llms: https://github.com/eugeneyan/open-llms
---
Training & Datasets
Uncensored Models: https://erichartford.com/uncensored-models
LLMsPracticalGuide: https://github.com/Mooler0410/LLMsPracticalGuide
awesome-chatgpt-dataset: https://github.com/voidful/awesome-chatgpt-dataset
awesome-instruction-dataset: https://github.com/yaodongC/awesome-instruction-dataset
---
GL, HF!
Are you an LLM Developer? Looking for a shoutout or project showcase? Send me a message and I'd be more than happy to share your work and support links with the community.
If you haven't already, consider subscribing to the free open-source AI community at !fosai@lemmy.world where I will do my best to make sure you have access to free open-source artificial intelligence on the bleeding edge.
Thank you for reading!
Update #1 [7/29/23]: I have officially converted this resource into a website! Bookmark and visit https://www.fosai.xyz/ for more insights and information!
Update #2! [9/22/23]: This guide may be outdated! All `GGML` model file formats have been deprecated in favor of llama.cpp's new `GGUF` format - the new and improved successor to the now-legacy `GGML` format. Visit TheBloke on HuggingFace to find all kinds of new `GGUF` models to choose from. Use interfaces like oobabooga or llama.cpp to run `GGUF` models locally. Keep your eye out for more platforms to adopt the new `GGUF` format as it gathers traction and popularity. Looking for something new? Check out LM Studio, a new tool for researching and developing open-source large language models. I have also updated our sidebar - double check for anything new there or at FOSAI▲XYZ!