What is Stable Diffusion?
Stable Diffusion is an open-source AI image generation model released by Stability AI in 2022. Unlike Midjourney or DALL-E, which run on private servers, Stable Diffusion runs on your own computer. The model weights (the actual AI brain) are free to download and use.
Because it's open-source, thousands of developers have built interfaces, extensions, and fine-tuned versions on top of it. The community around Stable Diffusion is massive - Civitai, the main model-sharing site, hosts over 100,000 custom models.
The tradeoff versus paid tools: more setup work, more technical knowledge required, and hardware requirements. The reward: complete control, unlimited generation, and zero subscription costs once you're set up.
Did you know? Stable Diffusion is completely free and open-source. SDXL generates images at 1024x1024 base resolution. ComfyUI and Automatic1111 are the two most popular user interfaces for running it.
Source: Stability AI documentation and community statistics, 2025
Hardware Requirements
The minimum to run Stable Diffusion locally is a GPU with 4GB VRAM. At 4GB, generation is slow and you're limited to smaller image sizes. 8GB is where Stable Diffusion starts feeling comfortable. 12GB+ is where you can run SDXL smoothly and work with upscalers.
| GPU | VRAM | Performance | Recommendation |
|---|---|---|---|
| RTX 3060 / 4060 | 12GB / 8GB | Good | Solid entry point |
| RTX 3070 / 4070 | 8GB / 12GB | Very good | Comfortable for most work |
| RTX 3080 / 4080 | 10-16GB | Excellent | Power user territory |
| RTX 4090 | 24GB | Maximum | Professional/heavy use |
| No GPU / Cloud | N/A | Variable | Google Colab / RunPod |
AMD GPUs: AMD cards (RX 6000/7000 series) work via ROCm on Linux. Windows support exists but is more complex to set up. If you're on Windows and considering new hardware specifically for Stable Diffusion, NVIDIA is the easier choice.
Apple Silicon (M1/M2/M3): Stable Diffusion runs via Core ML on Apple Silicon. Performance is good for an integrated solution. The M2 Max and M3 Pro/Max chips have enough unified memory to run SDXL well.
No GPU? Use Google Colab (free tier available, intermittent GPU access) or RunPod ($0.20-0.50/hour for a dedicated GPU). Both let you run full Stable Diffusion in the cloud without owning hardware.
Installation Options
There are three main ways to get started. Pick based on your technical comfort and hardware.
Option 1: Automatic1111 (Beginner Recommended)
Automatic1111 (A1111) is the most beginner-friendly local option. It has a traditional web UI that runs in your browser.
- Install Python 3.10.6 - Download from python.org. Important: check "Add Python to PATH" during installation.
- Install Git - Download from git-scm.com and install with default settings.
- Download the web UI - From the AUTOMATIC1111/stable-diffusion-webui GitHub page, either clone the repository with Git or download the release zip and extract it.
- Run webui-user.bat - On Windows, double-click this file. It downloads all dependencies automatically (this takes 5-15 minutes on first run).
- Open your browser - The UI opens at http://localhost:7860 automatically when ready.
Option 2: ComfyUI (Advanced)
ComfyUI uses a node-based visual workflow. More flexible and increasingly used by professionals. The learning curve is steeper but worth it if you want full control over the generation pipeline.
Option 3: Google Colab (No GPU Required)
Several free Colab notebooks give you Stable Diffusion access with no local setup. Search "stable diffusion automatic1111 colab" for current notebooks. The free Colab tier gives intermittent GPU access. Colab Pro ($10/month) gives reliable GPU sessions.
Your First Image Generation
Once A1111 is running in your browser, here's how to generate your first image.
- Find the text prompt box - The large text area at the top labeled "Prompt." This is where you describe what you want.
- Type a simple prompt - Start simple: "a golden retriever puppy in a field of flowers, digital photography, bright colors"
- Set basic parameters - Width/Height: 512x512 for SD 1.5, 1024x1024 for SDXL. Steps: 20 is a good starting point. CFG Scale: 7 is default.
- Click Generate - The big orange button. Watch the progress bar. Your first image appears in seconds (or a minute+ on slower hardware).
- Iterate - If you don't like it, adjust the prompt and generate again. This is the whole workflow.
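The same five steps can also be scripted: A1111 exposes an HTTP API when launched with the --api flag, and its /sdapi/v1/txt2img endpoint accepts the same parameters you set in the UI. A minimal sketch using only the standard library (the helper names build_payload and generate are ours, not part of A1111):

```python
import json
from urllib import request

def build_payload(prompt, negative="", width=512, height=512, steps=20, cfg_scale=7):
    """Assemble a request body for A1111's /sdapi/v1/txt2img endpoint."""
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "width": width,
        "height": height,
        "steps": steps,
        "cfg_scale": cfg_scale,
    }

def generate(payload, base_url="http://localhost:7860"):
    """POST to a running A1111 instance (launched with --api); returns base64 images."""
    req = request.Request(
        base_url + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["images"]

payload = build_payload(
    "a golden retriever puppy in a field of flowers, digital photography, bright colors",
    negative="blurry, low quality",
)
# generate(payload)  # uncomment with A1111 running locally with --api
```

This is handy once you move past manual iteration and want to batch-generate variations of a prompt.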
What is CFG Scale?
CFG Scale (Classifier-Free Guidance) controls how closely the model follows your prompt. Low values (3-5) give more creative freedom. High values (10-15) stick closer to your exact words. 7 is a reliable default that balances creativity with instruction-following.
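Under the hood, CFG is a simple linear extrapolation: at each sampling step the model predicts noise twice, once with your prompt and once unconditionally, and the scale pushes the result away from the unconditional prediction toward the conditional one. A toy sketch with plain floats (real implementations do this on full latent tensors):

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the prompt-conditioned one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Where the two predictions agree, scale has no effect; where they differ,
# a higher scale amplifies the prompt's influence.
print(cfg_combine([0.0, 1.0], [1.0, 1.0], 7.0))  # [7.0, 1.0]
```

This is why very high CFG values produce oversaturated, "burned" images: the extrapolation overshoots.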
Understanding Checkpoints and Models
The base Stable Diffusion model is a starting point. The community has created thousands of fine-tuned versions called checkpoints that specialize in specific styles or subjects.
Download checkpoints from Civitai.com. Place the downloaded .safetensors files in the stable-diffusion-webui/models/Stable-diffusion/ folder. Click the refresh button in A1111 and the checkpoint appears in the dropdown.
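If a download doesn't appear in the dropdown, a quick way to confirm it landed in the right place is to scan the folder the same way the refresh button does. A small sketch (the helper name list_checkpoints is ours):

```python
from pathlib import Path

def list_checkpoints(models_dir):
    """Return checkpoint filenames in a models/Stable-diffusion/ folder -
    the files A1111's dropdown shows after a refresh."""
    exts = {".safetensors", ".ckpt"}  # .ckpt is the older pickle-based format
    return sorted(p.name for p in Path(models_dir).iterdir() if p.suffix in exts)
```

Point it at stable-diffusion-webui/models/Stable-diffusion/ - anything it doesn't list (wrong folder, wrong extension, incomplete download) won't show up in A1111 either.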
Popular beginner checkpoints:
- DreamShaper XL - Great all-around model, handles portraits and landscapes well
- Realistic Vision - Photorealistic human portraits
- Juggernaut XL - Highly detailed photorealistic output
- Anything V5 - Anime and illustrated art style
- AbsoluteReality - Hyper-realistic photography style
Each checkpoint has a different "sweet spot" for prompting, and each is built on a specific base model (SD 1.5 or SDXL), which determines the resolution you should generate at. The Civitai page for each model includes example prompts from the creator - start with those to understand how the model responds.
Essential Prompting Techniques
Stable Diffusion prompting has some quirks compared to Midjourney. Here's what matters most.
Positive vs Negative prompts: A1111 has two prompt fields. The top one is your positive prompt (what you want). The bottom is the negative prompt (what you don't want). Use the negative prompt heavily - it dramatically improves output quality.
Universal negative prompt:
ugly, blurry, low quality, worst quality, normal quality, text, watermark, signature, extra limbs, deformed hands, bad anatomy, mutated, disfigured
This single negative prompt eliminates most of the obvious quality issues you'll see in beginners' generations.
Quality boosters (add to positive prompt):
masterpiece, best quality, highly detailed, 8k resolution, professional photography
Keyword weighting: You can give keywords extra importance with parentheses: (red dress:1.4) multiplies the attention weight of "red dress" by 1.4. Useful for making sure key elements don't get lost in complex prompts.
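A1111 parses this syntax before the prompt is encoded. A simplified sketch of how explicit (text:weight) spans are split out (the real parser also handles nested parentheses, bare (parens) for a 1.1x boost, and [brackets] for de-emphasis; parse_weights is our name):

```python
import re

_WEIGHTED = re.compile(r"\(([^():]+):([0-9.]+)\)")

def parse_weights(prompt):
    """Split a prompt into (text, weight) pairs; unweighted text gets 1.0."""
    parts, pos = [], 0
    for m in _WEIGHTED.finditer(prompt):
        before = prompt[pos:m.start()].strip(" ,")
        if before:
            parts.append((before, 1.0))
        parts.append((m.group(1), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, 1.0))
    return parts

print(parse_weights("(red dress:1.4), walking on a beach"))
# [('red dress', 1.4), ('walking on a beach', 1.0)]
```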
Prompt length: SD handles longer prompts better than Midjourney. 50-100 words in the positive prompt is fine. Be specific and descriptive.
Must-Have Extensions
A1111's extension system lets you add capabilities. These are the ones worth installing.
ControlNet: The single most powerful extension. Lets you control the pose, depth, and composition of generated images using reference images. Essential for character consistency and controlled compositions. Install from the Extensions tab in A1111.
ADetailer: Automatically fixes faces in your images. SD often produces slightly blurry or distorted faces in full-body shots. ADetailer detects faces and upscales/refines them automatically. Huge quality improvement for portraits.
Ultimate SD Upscale: Upscales images to 2x or 4x resolution while adding detail. Turns a 512x512 into a crisp 2048x2048. Essential if you need high-resolution output.
Aspect Ratio Helper: Makes it easy to set standard aspect ratios (16:9, 4:5, 9:16) without doing math. Small but useful quality-of-life addition.
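The math such a helper does for you is straightforward: keep the total pixel count near the model's native resolution and snap each side to a round multiple (SD's latent space requires multiples of 8; 64 is a common safer choice that matches SDXL's training buckets). A sketch under those assumptions (the function name is ours):

```python
def dims_for_ratio(ratio_w, ratio_h, base=1024, multiple=64):
    """Compute width/height for an aspect ratio, keeping total pixels near
    base*base and snapping each side to `multiple`."""
    target_pixels = base * base
    aspect = ratio_w / ratio_h
    h = (target_pixels / aspect) ** 0.5
    w = h * aspect

    def snap(x):
        return max(multiple, int(round(x / multiple)) * multiple)

    return snap(w), snap(h)

print(dims_for_ratio(16, 9))  # (1344, 768) - SDXL's usual 16:9 size
```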
Troubleshooting Common Issues
"CUDA out of memory" error: Your GPU ran out of VRAM. Try: reduce image resolution to 512x512, reduce batch size to 1, add --medvram to the launch command in webui-user.bat, or enable "Tiled VAE" in the settings.
Blurry faces: Install ADetailer extension. Alternatively, use the "Restore faces" checkbox in the settings. For portraits, upscale after generation and detail will improve.
Wrong colors or composition: Increase the CFG Scale from 7 to 9-12 to make the model follow your prompt more closely. Also check that you're using the right checkpoint for your style.
Generation is very slow: Check that you're using the GPU and not CPU. In A1111, look at the bottom status bar - it should show your GPU. If it's using CPU, check that PyTorch was installed with CUDA support during setup.
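You can also check from Python inside the environment A1111 uses: torch.cuda.is_available() is PyTorch's standard CUDA test. A small sketch (the diagnose_gpu helper is ours):

```python
def diagnose_gpu():
    """Report whether PyTorch can see a CUDA GPU; returns a message string."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        return "CUDA OK: " + torch.cuda.get_device_name(0)
    return "PyTorch is CPU-only; reinstall it with CUDA support"

print(diagnose_gpu())
```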
Images look low quality despite good prompt: Switch to a better checkpoint. The base SD 1.5 and SDXL models produce average-quality output. Community fine-tuned models from Civitai are dramatically better for most use cases.
Safety Note
Stable Diffusion has fewer built-in content filters than commercial tools. The default installation includes NSFW filters but they can be bypassed. Be thoughtful about what you generate and don't share or publish content that violates platform terms of service or community standards.