S02

Local vs Cloud LLMs

Running AI on your computer vs using online services

Type | What It Means | Examples
-----|---------------|---------
Cloud | Use someone else's servers via internet | ChatGPT, Claude, Gemini
Local | Run the AI on your own computer | Ollama, LM Studio

Both have their place. Let's figure out which is right for you.


Cloud LLMs (ChatGPT, Claude, etc.)

How It Works

Your device → Internet → Company's servers → Response back to you

The AI runs on powerful computers owned by OpenAI, Anthropic, or Google. You just send messages and get responses.

Advantages

Advantage | Why It Matters
----------|---------------
No setup | Create account, start chatting
Most powerful models | GPT-4, Claude 3.5, etc. are only available here
No hardware needed | Works on any phone, tablet, or computer
Always updated | Companies improve models without you doing anything
Free tiers | Casual use costs nothing

Disadvantages

Disadvantage | Why It Matters
-------------|---------------
Privacy concerns | Your conversations go through their servers
Internet required | No connection = no AI
Can be slow | Server load affects response time
Usage limits | Free tiers have caps; heavy use costs money
Data policy concerns | May be used for training (check settings)
Censorship/guardrails | Some topics are restricted

Best For

  • Most people, most of the time
  • When you need the smartest models
  • When you don't want to deal with technical setup
  • When privacy isn't a critical concern

Local LLMs (Ollama, LM Studio)

How It Works

Your device → Your device
(Everything stays on your computer)

You download the AI model and run it on your own hardware. No internet needed once set up.

Advantages

Advantage | Why It Matters
----------|---------------
Complete privacy | Data never leaves your computer
Works offline | Use on planes, in remote areas, anywhere
No usage limits | Run as much as you want
No subscription fees | Free forever after setup
Full control | No censorship or content restrictions
Customizable | Can fine-tune for specific tasks

Disadvantages

Disadvantage | Why It Matters
-------------|---------------
Hardware requirements | Need a decent computer (see below)
Smaller models | Best models don't run locally (yet)
Setup required | Need some technical comfort
Slower | Unless you have a good GPU
No internet features | Can't search web, access current info
Your responsibility | Updates, troubleshooting, etc.

Best For

  • Privacy-conscious users
  • Developers building applications
  • Offline use cases
  • Learning how AI works
  • Avoiding monthly subscriptions

Hardware Requirements for Local LLMs

Minimum Specs

Component | Minimum | Recommended
----------|---------|------------
RAM | 8 GB | 16-32 GB
Storage | 10 GB free | 50+ GB free
CPU | Modern 4-core | 8+ cores
GPU | Not required | NVIDIA with 8+ GB VRAM

What Can You Run?

Your Hardware | What Models Work | Quality
--------------|------------------|--------
Basic laptop (8GB RAM) | 3B parameter models | Basic, but fast
Good laptop (16GB RAM) | 7-8B parameter models | Decent
Gaming PC (32GB RAM + GPU) | 13-70B parameter models | Good
Workstation (64GB+ RAM, pro GPU) | 70B+ parameter models | Great

Model Size Guide

Model Size | RAM Needed | Quality Comparison
-----------|------------|-------------------
3B | 4-6 GB | Basic assistant
7-8B | 8-10 GB | Like GPT-3.5
13B | 16 GB | Better reasoning
30-34B | 24-32 GB | Approaching GPT-4 quality
70B | 48+ GB | Strong performance
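The RAM figures above follow from simple arithmetic: each parameter costs a fraction of a byte once quantized, plus runtime overhead. Here is a rough sketch; the 4-bit default and 20% overhead factor are illustrative assumptions, and real usage also needs headroom for the OS and a context-length-dependent cache, which is why the table's numbers run higher:

```python
# Rough RAM estimate for running a quantized local model.
# Assumptions (illustrative): weights dominate memory use, plus ~20%
# overhead for runtime buffers and cache. 4-bit quantization = 0.5
# bytes per weight; 8-bit = 1 byte; fp16 = 2 bytes.

def estimate_ram_gb(params_billions: float, bits_per_weight: int = 4,
                    overhead: float = 1.2) -> float:
    """Estimate RAM (GB): parameters x bytes-per-weight x overhead."""
    bytes_per_weight = bits_per_weight / 8           # e.g. 4 bits = 0.5 bytes
    weights_gb = params_billions * bytes_per_weight  # 1B params ~ 1 GB at 1 byte each
    return round(weights_gb * overhead, 1)

for size in (3, 8, 13, 70):
    print(f"{size}B @ 4-bit: ~{estimate_ram_gb(size)} GB")
```

Doubling the bit-width roughly doubles the estimate, which is why the same model can fit on one laptop at 4-bit but not at 8-bit.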

How to Get Started: Cloud

Option 1: ChatGPT

  1. Go to chat.openai.com
  2. Click "Sign Up"
  3. Create account (email or Google)
  4. Start chatting

Free tier: GPT-3.5 unlimited, limited GPT-4o

Option 2: Claude

  1. Go to claude.ai
  2. Click "Try Claude"
  3. Create account
  4. Start chatting

Free tier: Generous daily limits

Option 3: Gemini

  1. Go to gemini.google.com
  2. Sign in with Google account
  3. Start chatting

Free tier: Unlimited basic use


How to Get Started: Local

Option 1: Ollama (Easiest for Mac/Linux)

# Install (Mac/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama3.2

# That's it! Start chatting.

Windows: Download from ollama.com

Option 2: LM Studio (Easiest for Everyone)

  1. Go to lmstudio.ai
  2. Download for your OS
  3. Open the app
  4. Browse and download a model (app handles this)
  5. Click "Chat" and start talking

Best for: People who prefer graphical interfaces

Option 3: Jan (Simple, Cross-Platform)

  1. Go to jan.ai
  2. Download the app
  3. Download a model through the app
  4. Chat

Best for: Beginners who want a ChatGPT-like experience locally


Which Model Should You Try?

For most people starting out:

Llama 3.2 3B - Good balance of quality and speed, runs on most computers

# With Ollama
ollama run llama3.2

If you have more RAM (16GB+):

Llama 3.1 8B - Better quality, still reasonable speed

ollama run llama3.1:8b

Privacy Deep Dive

Cloud Privacy Concerns

Concern | Reality | Mitigation
--------|---------|-----------
Conversations stored | Usually yes, for service improvement | Check data settings
Used for training | Sometimes; opt-out usually available | Disable in settings
Employees can see chats | Rarely, for safety review | Don't share secrets
Data breaches possible | Small risk, but big companies are targets | Avoid very sensitive info

What NOT to Put in Cloud LLMs

  • Passwords or secret keys
  • Financial account numbers
  • Medical records
  • Confidential business strategies
  • Personal secrets you wouldn't share
  • Client confidential information

When Local is Worth the Effort

  • Processing sensitive documents
  • Journaling or personal reflection
  • Business confidential work
  • Medical or legal information
  • Anything you wouldn't email

Cost Comparison

Cloud Costs

Service | Free Tier | Paid Tier
--------|-----------|----------
ChatGPT | GPT-3.5 unlimited | $20/month for Plus
Claude | ~100 messages/day | $20/month for Pro
Gemini | Unlimited basic | $20/month for Advanced
API (per token) | Varies | $0.50-30 per million tokens

  • Typical user: free tier is enough
  • Power user: $20/month
  • Developer building apps: $10-100/month depending on usage
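Per-million-token API pricing turns into per-request cost with simple arithmetic. A quick sketch; the $3 input / $15 output rates are illustrative only, so check your provider's pricing page for real numbers:

```python
# Per-request API cost from per-million-token prices.
# The example rates below are illustrative, not any provider's actual pricing.

def api_cost_usd(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in USD for one request at the given per-million-token rates."""
    return round(input_tokens / 1e6 * in_price_per_m
                 + output_tokens / 1e6 * out_price_per_m, 4)

# A typical chat turn: 2,000 tokens in, 500 tokens out, at $3/$15 per million
print(api_cost_usd(2_000, 500, 3.0, 15.0))  # 0.0135
```

At about a penny per exchange, casual API use is cheap; costs add up only with long contexts or high volume.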

Local Costs

Component | One-time Cost | Ongoing Cost
----------|---------------|-------------
Software | Free (Ollama, LM Studio) | $0
Models | Free (open weights) | $0
Electricity | $0 | ~$2-10/month if heavy use
Hardware (if upgrading) | $500-2000+ | $0

  • If you already have a decent computer: essentially free
  • If you need to upgrade: significant upfront cost, but no subscription
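Whether an upgrade pays off against a $20/month plan is a simple break-even calculation. A back-of-envelope sketch, ignoring electricity and hardware resale value for simplicity:

```python
# Break-even: how many months of a cloud subscription a one-time hardware
# purchase replaces. Electricity and resale value are ignored for simplicity;
# the $20/month default matches the paid tiers above.
import math

def breakeven_months(hardware_cost: float, monthly_plan: float = 20.0) -> int:
    return math.ceil(hardware_cost / monthly_plan)

print(breakeven_months(500))    # 25 months (just over 2 years)
print(breakeven_months(2000))   # 100 months (over 8 years)
```

The takeaway: a modest upgrade can pay for itself within a couple of years of heavy use, while a high-end workstation rarely makes sense on cost grounds alone.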


The Hybrid Approach

Many people use both:

Task | Use
-----|----
General questions | Cloud (faster, smarter)
Sensitive documents | Local (privacy)
Creative writing | Either (test both)
Coding help | Cloud (better for now)
Offline work | Local (obviously)
Learning how LLMs work | Local (full access)

The best of both worlds:

  • Keep a cloud account for powerful models
  • Set up a local option for privacy-sensitive tasks
  • Use whichever fits each situation

Quick Decision Guide

Use Cloud If:

  • [ ] You want the smartest AI available
  • [ ] You don't want technical setup
  • [ ] Privacy isn't a major concern
  • [ ] You're always online
  • [ ] You want features like web search, image generation

Use Local If:

  • [ ] Privacy is important to you
  • [ ] You work with sensitive information
  • [ ] You want to avoid subscriptions
  • [ ] You're often offline
  • [ ] You want to learn how AI works
  • [ ] You have decent hardware

Use Both If:

  • [ ] You want flexibility
  • [ ] Different tasks have different needs
  • [ ] You're a curious person who likes options
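The checklists above can be sketched as a toy scoring function. This is purely illustrative; real decisions involve judgment, not a formula, and the four flags here are a deliberate simplification:

```python
# Toy encoding of the decision guide: tally local-leaning and cloud-leaning
# answers and suggest a path. Illustrative only, not a real recommendation engine.

def suggest(privacy_matters: bool, often_offline: bool,
            wants_best_models: bool, has_decent_hardware: bool) -> str:
    local_score = privacy_matters + often_offline + has_decent_hardware
    cloud_score = wants_best_models + (not has_decent_hardware)
    if local_score and cloud_score:
        return "both"       # reasons point in both directions
    return "local" if local_score else "cloud"

print(suggest(privacy_matters=True, often_offline=False,
              wants_best_models=True, has_decent_hardware=True))  # both
```

Note that "both" is the most common outcome, which matches the hybrid approach above.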

Model Comparison (Honest Assessment)

Capability | Best Cloud | Best Local | Gap
-----------|------------|------------|----
General chat | GPT-4/Claude | Llama 3.1 70B | Noticeable
Simple tasks | Any | Any 7B model | Small
Coding | Claude/GPT-4 | DeepSeek Coder | Moderate
Creative writing | Claude | Llama 3.1 | Moderate
Reasoning | GPT-4/Claude | Llama 3.1 70B | Noticeable
Speed | Fast | Depends on hardware | Varies

Bottom line: Cloud models are still noticeably better, but local models are improving rapidly and are "good enough" for many tasks.


Key Takeaways

  1. Cloud is easier, local is more private - choose based on your needs
  2. Free cloud tiers are generous - most people never need to pay
  3. Local is surprisingly good - especially for 7B+ models
  4. You can use both - many people do
  5. Hardware matters for local - but it's getting better
  6. Start with cloud - add local when you have a reason to

What's Next?

  • Using cloud? Just go to Claude.ai or ChatGPT and start
  • Trying local? Download LM Studio and try Llama 3.2
  • Ready to go deep? Check out the Deep Dive modules based on Karpathy's video

Original content comparing local and cloud approaches.