Interactive Guide to LoRA Dataset Creation

Build Your Perfect Character Dataset for LoRA

This guide translates the complexity of creating a high-quality LoRA dataset into an interactive experience. Move from theory to practice and learn how to meticulously curate data to train a Stable Diffusion model that consistently generates your unique character.

The Blueprint for Success

A high-fidelity LoRA model is built on a foundation of specific data principles. Nine essential criteria define an optimal dataset for character training.

Curation Workbench

This is where theory becomes practice. This section walks through the key parameters of your dataset, from image count to file structure and captioning strategy.

1. Define Your Dataset Size

The number of images in your dataset affects how thoroughly the LoRA learns the character. While quality trumps quantity, there are recommended ranges for different training goals.

40 Images: Recommended for a robust model

2. Structure Your Files

Correct file organization is critical for training scripts like Kohya_ss. The folder name tells the trainer how many times to repeat each image per epoch and defines your character's unique trigger word. It combines three parts, the repeat count, the instance token, and the class token, as in the example below.

Instance Token (Unique Trigger)

Class Token (Category)

Repeats: 10

Your Training Folder Name:

10_skswoman woman
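
The naming pattern above can be sketched in a few lines of Python. This is a minimal illustration, not part of Kohya_ss itself: the helper names `training_folder_name` and `images_seen_per_epoch` are hypothetical, and the single-word check on the instance token is an assumption made here for safety rather than a documented trainer requirement.

```python
import re


def training_folder_name(repeats: int, instance_token: str, class_token: str) -> str:
    """Build a '<repeats>_<instance token> <class token>' folder name."""
    if repeats < 1:
        raise ValueError("repeats must be a positive integer")
    instance_token = instance_token.strip()
    class_token = class_token.strip()
    if not instance_token or not class_token:
        raise ValueError("both tokens must be non-empty")
    # Assumption: the trigger is kept as one word so it stays distinct from the class.
    if re.search(r"\s", instance_token):
        raise ValueError("instance token should be a single word")
    return f"{repeats}_{instance_token} {class_token}"


def images_seen_per_epoch(image_count: int, repeats: int) -> int:
    """Each image is shown `repeats` times per epoch."""
    return image_count * repeats


print(training_folder_name(10, "skswoman", "woman"))  # 10_skswoman woman
print(images_seen_per_epoch(40, 10))                  # 400
```

With 40 images and 10 repeats, the trainer sees 400 image samples per epoch, which is a useful number to keep in mind when choosing the epoch count.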

3. Master the Captioning Rule

Captions are your primary tool for controlling the LoRA. They teach the model which features are changeable and which are part of the character's core identity. The rule is simple but powerful.

✅ Tag This (To Make It Changeable)

Tag any attribute you want to control with prompts during generation.

Clothing: `red dress`, `blue jacket`
Expressions: `smiling`, `surprised look`
Backgrounds: `outdoors`, `at a cafe`
Actions: `standing`, `holding a book`

❌ Don't Tag This (To Keep It Consistent)

Do NOT tag the core features of your character. The LoRA will learn these implicitly.

Inherent traits: `blue eyes`, `freckles`
Signature style: `specific hairstyle`
Body shape: `athletic build`
Unique features: `a small scar`
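
The tagging rule can be applied mechanically when captions are assembled from a raw tag list. The sketch below is one possible way to do it, under some assumptions: captions are comma-separated tags, the trigger word is placed first, and `build_caption` and `CORE_TRAITS` are hypothetical names chosen for this example.

```python
def build_caption(trigger: str, tags: list[str], core_traits: set[str]) -> str:
    """Return a caption that keeps changeable tags and drops core-identity tags.

    `core_traits` is expected to contain lowercase tag strings.
    """
    kept: list[str] = []
    for tag in tags:
        normalized = tag.strip().lower()
        # Skip blanks, duplicates, and anything that belongs to the core identity.
        if not normalized or normalized in core_traits or normalized in kept:
            continue
        kept.append(normalized)
    return ", ".join([trigger, *kept])


# Core features that should NOT be tagged, taken from the examples above.
CORE_TRAITS = {"blue eyes", "freckles", "athletic build", "a small scar"}

raw_tags = ["red dress", "blue eyes", "smiling", "outdoors", "freckles", "standing"]
print(build_caption("skswoman", raw_tags, CORE_TRAITS))
# skswoman, red dress, smiling, outdoors, standing
```

Only the attributes you want to control at generation time remain in the caption; the untagged core traits are left for the LoRA to absorb into the trigger word.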

Image Collection Blueprint: 150 Shots for Perfection

This section provides an exact breakdown of 150 images to build a comprehensive LoRA dataset for your naked female character. Since clothing and poses will be controlled externally, the focus here is on varying facial expressions, lighting, camera angles, and subtle body views to capture the character's core identity. Progress through these tiers to incrementally enhance your model. Hover over an image description and click "Show Prompt" for a suggested generation prompt for each image.

Tier 1: Foundation (Images 1-25) - Core Identity

Focus on establishing the character's fundamental facial features, body proportions, and overall likeness from standard angles under simple lighting. These images form the bedrock of your LoRA.

Tier 2: Expansion (Images 26-60) - Adding Nuance

Expand the dataset by introducing a broader range of facial expressions, varied lighting conditions, and subtle camera angles. This tier helps the LoRA learn to generalize the character's features under diverse visual stimuli.

Tier 3: Depth (Images 61-100) - Detailed Features & Complex Expressions

Dive deeper into the character's intricate details and emotional range. This tier focuses on capturing more complex expressions and using dramatic lighting to highlight features, along with closer body shots.

Tier 4: Versatility (Images 101-150) - Advanced Variations & Environment Cues

The final tier pushes the boundaries with advanced lighting, extreme angles, and a wide array of complex expressions. These images will ensure your LoRA is highly robust and can generate your character in almost any scenario, even with very subtle background hints.
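
The four tiers can be written down as data, which makes it easy to confirm that the ranges are contiguous and add up to 150. This is only a bookkeeping sketch; the `Tier` type and `check_tiers` helper are hypothetical names introduced here.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    first: int  # first image number in the tier (inclusive)
    last: int   # last image number in the tier (inclusive)

    @property
    def size(self) -> int:
        return self.last - self.first + 1


TIERS = [
    Tier("Foundation", 1, 25),
    Tier("Expansion", 26, 60),
    Tier("Depth", 61, 100),
    Tier("Versatility", 101, 150),
]


def check_tiers(tiers: list[Tier], expected_total: int) -> int:
    """Verify the tiers are contiguous, start at 1, and cover `expected_total` images."""
    next_index = 1
    for tier in tiers:
        if tier.first != next_index:
            raise ValueError(f"{tier.name} should start at image {next_index}")
        next_index = tier.last + 1
    total = sum(tier.size for tier in tiers)
    if total != expected_total:
        raise ValueError(f"tiers cover {total} images, expected {expected_total}")
    return total


for tier in TIERS:
    print(f"{tier.name}: {tier.size} images")
print("Total:", check_tiers(TIERS, 150))
```

The tier sizes come out as 25, 35, 40, and 50 images, so each stage adds more material than the one before it.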

Final Preparation Checklist

Before you start training, run through this checklist. It summarizes the key preparation steps to ensure your dataset is optimized for the best possible results.
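
One preparation step that lends itself to automation is confirming that every image has a non-empty caption file next to it. The sketch below assumes captions are stored as text files sharing the image's base name (for example `001.png` with `001.txt`); the `.txt` suffix, the list of image suffixes, and the helper name `find_caption_problems` are assumptions for illustration, so adjust them to match your trainer's settings.

```python
from pathlib import Path

# Assumption: these are the image formats present in the dataset folder.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def find_caption_problems(folder: str, caption_suffix: str = ".txt") -> dict[str, list[str]]:
    """List images whose caption file is missing or empty."""
    missing: list[str] = []
    empty: list[str] = []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption = image.with_suffix(caption_suffix)
        if not caption.is_file():
            missing.append(image.name)
        elif not caption.read_text(encoding="utf-8").strip():
            empty.append(image.name)
    return {"missing": missing, "empty": empty}
```

Calling `find_caption_problems("10_skswoman woman")` on the training folder returns two lists of file names; both should be empty before a training run starts.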

Embrace the Iterative Process

LoRA training is a cycle: train, test, analyze, and refine your dataset. Don't aim for perfection on the first try. Use your initial results to identify weaknesses and improve your data for the next training run. Happy training!

For any help, here is my Instagram:

leo.tdo