Running local LLMs with GPT4All and LM Studio

When it's so easy to go local, why would you use the cloud?


In my day-to-day work, I'm finding that I'm more likely to run a local, open source Large Language Model (LLM) on my machine than I am to open up a browser and use a commercial offering. GPT4All and LM Studio provide simple, straightforward options for running LLMs locally without a lot of configuration overhead.

Running LLMs locally provides a ton of benefits including:

  • No caps on query submissions. I'm not going to get throttled because I'm not paying for the latest and greatest model.
  • Total privacy. All chats stay on my machine, which is great if I need to use the LLM to process sensitive data that I don't want in the cloud.
  • The ability to select the appropriate model for the task at hand. Running LLMs locally means I can use just about any model available on HuggingFace.
  • Options for tuning the model for your specific use case. It's possible to add a system prompt and tune parameters such as the context length, temperature, and whatnot. At this point, I don't know what all of the sliders adjust, but it's always nice to have the option to tinker under the hood.

Of course, there are tradeoffs. Running LLMs locally means that you are limited by the compute power of your machine, which affects both the speed of responses and the size of the model that you can run. However, for most of the fuzzy tasks that I use an LLM for, a smaller model (27B parameters or fewer) is totally sufficient. 8B or 9B parameters seems to strike the right balance between speed, output quality, and sophistication.

I even used local LLMs to help prepare this blog post:

  • I dictated a rough draft of the content using the built-in Audio Recorder in Obsidian.
  • Then, I used Whisper's Turbo model to transcribe it (there's a sketch of this step after the list).
  • Finally, I used gemma-2-9b-it-GGUF in LM Studio to clean up the wonky formatting in the .txt file from Whisper using this prompt:
Clean this document up by placing paragraph breaks in logical places and closing up excessive whitespace and trailing spaces. Paragraphs should break after cohesive ideas and should be roughly 3–6 sentences, with preference for shorter paragraphs.
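
For reference, here's a minimal sketch of the transcription step done from Python rather than a GUI. It assumes the open source openai-whisper package (plus ffmpeg) is installed, and the filename draft.m4a is a placeholder:

```python
# pip install openai-whisper  (ffmpeg must also be on your PATH)
import whisper

# Load the Turbo model; it downloads on first use
model = whisper.load_model("turbo")

# Transcribe the dictated audio (placeholder filename)
result = model.transcribe("draft.m4a")

# Write the raw transcript to a .txt file for cleanup in LM Studio
with open("draft.txt", "w") as f:
    f.write(result["text"])
```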

This process yielded about 780 words of content, but for me there's a huge difference between writing at the keyboard and dictating, so the bulk of what I dictated needed to be rewritten. The value of using LLMs during the process is that they helped me take a first pass at getting my thoughts out while generating lots of great reference material.

My setup for running LLMs

I've been using two applications, GPT4All and LM Studio, to run LLMs locally on my machine. At the time of writing, I'm using an M1 MacBook Pro at home and an M3 MacBook Pro at work, both with 32GB of RAM. Both machines have plenty of compute power for the LLM tasks I run. 16GB of RAM would probably be sufficient, but YMMV depending on which LLM you use and what you throw at it.

GPT4All and LM Studio: two great options for trying this at home

GPT4All and LM Studio are two plug-and-play apps for running LLMs locally. Both apps work well, but I prefer LM Studio for its interface and overall selection of models.

I discovered GPT4All through Mozilla's IRL Podcast, which featured Andriy Mulyar, founder and CTO of Nomic, the company behind GPT4All. This is what he had to say:

One of the biggest focuses that we have around GPT4All is making sure that privacy is the first thing we think about. In some sense, one of the core reasons behind why we even built GPT4All and the ecosystem of models that came in with it was because of all these large issues and concerns about privacy with people using OpenAI’s models.

That was enough to convince me to give it a shot. After using GPT4All for a bit, I bobbed over to Reddit and discovered LM Studio. For the past few months, I've gone back and forth between the two for different tasks. My evaluation has been completely non-scientific, but LM Studio simply feels better to me. I love the ethos and vision of Nomic and GPT4All, but LM Studio wins out in terms of simplicity, features, and overall user experience.

Some of the features I really like in LM Studio include:

  • Staff Picks of recommended models.
  • Model descriptions pulled directly from HuggingFace, plus a link to the model page for more details.
  • User, Power User, and Developer modes to make your experience as simple or as customizable as you like.
  • The option to add system prompts to constrain the output of a model (e.g. keep all responses to three sentences, respond only in haikus, etc.); see the sketch after this list.
  • The ability to create folders to organize chats, which is extremely useful if you use different models to assist with different parts of a project or if you need to start multiple chats to get the model to produce usable output and you want to trace your steps.
  • Basic Retrieval Augmented Generation (RAG) capabilities. You can attach a document alongside the prompt and have its contents incorporated into the response. It's handy if you want to query a document or request a summary.
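
On that system prompt point: LM Studio can also run a local server that speaks the OpenAI API, which makes this scriptable. Here's a minimal sketch assuming you've started the server on its default port (1234) with a Gemma model loaded; the model identifier is whatever your copy of LM Studio reports, not gospel:

```python
# pip install openai  (LM Studio's local server is OpenAI-compatible)
from openai import OpenAI

# Point the client at LM Studio instead of the cloud; no real key needed
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="gemma-2-9b-it",  # use the identifier LM Studio shows for your loaded model
    messages=[
        # A system prompt constraining the output, as described above
        {"role": "system", "content": "Keep all responses to three sentences."},
        {"role": "user", "content": "Why would I run an LLM locally?"},
    ],
    temperature=0.7,  # one of the tunable parameters mentioned earlier
)
print(response.choices[0].message.content)
```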
💡 Both LM Studio and GPT4All provide (limited) capabilities for Retrieval Augmented Generation (RAG). In short, you can feed the model a document, or a collection of documents, and have it process that as part of the input. Performing RAG locally has lots of interesting use cases, but I think I'd need beefier hardware and a bigger model to really make use of it. RAG on a local machine does well with one or two documents, but when I prompted GPT4All to summarize my Obsidian vault, it only ever summarized about five of my notes.
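
The simplest way to approximate that document-feeding behavior yourself is plain context stuffing: paste the whole file into the prompt. This sketch reuses the hypothetical server setup from above (the file path is a placeholder, and this isn't necessarily how either app implements RAG internally):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Read the document to summarize (placeholder path)
with open("notes/meeting-notes.md") as f:
    document = f.read()

# Naive approach: stuff the entire document into the context window.
# This works for one or two files, but a whole Obsidian vault won't fit
# in a small local model's context, which is likely why vault-scale
# summaries fall over.
response = client.chat.completions.create(
    model="gemma-2-9b-it",
    messages=[
        {"role": "user", "content": f"Summarize this document:\n\n{document}"},
    ],
)
print(response.choices[0].message.content)
```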

Getting started with local LLMs

Running local LLMs is a simple, highly practical way to leverage AI capabilities in your day-to-day work without privacy concerns or feeding your data to the big commercial models. I recommend downloading GPT4All and LM Studio and giving them a try.

I tend to favor gemma-2-9b-it-GGUF in LM Studio and Meta-Llama-3.1-8B-Instruct-GGUF in GPT4All. Both models provide a great starting point and take up roughly 5GB of hard drive space. Give 'em a whirl and see if you feel like you really need ChatGPT, Gemini, or Claude.
