CREATIVE · TUTORIAL · CHARACTER SHEETS

How to build a character sheet.

10 MIN READ · PUBLISHED MAY 2026 · FILED UNDER CREATIVE · IMAGE

· What it is

A character sheet is a small set of reference images of the same person — front, three-quarter, profile, full body, two expressions — that you feed to an image model as the "this is the character" anchor.

Without one, every prompt generates a slightly different person. With one, you can drop the same character into a hundred scenes and keep them recognisable. It is the single highest-leverage piece of preparation in any AI image workflow that involves a recurring person — a brand persona, a graphic novel protagonist, a spokesperson, yourself.

I built mine because I needed the portrait series cycling in the hero of this site to feel like the same person across six wildly different moods (studio, warehouse, cinematic, profile, mid-shot, stage). Before the sheet, every regeneration was a slightly different bald man. After the sheet, the system finally understood there was one specific bald man.

There are three ways to build one. I'll walk you through them in order of effort, with the tradeoffs honestly named, so you can pick the one that fits the size of the problem you actually have.

01 Google Flow · Use Character

Fastest path. Lowest control. When to use it: you're inside Flow already, you need consistency across a single video sequence or storyboard, and you can live with mild drift between shots.

How it works. Open Flow, start a new project, enable the Use Character feature in the scene setup. Upload one or two reference images — front-facing, clean even lighting, no heavy shadows. Add a short text description ("bald man, mid-40s, light stubble, dark streetwear, no jewellery"). Generate. Flow conditions every subsequent shot in that project on the reference.

Tradeoffs.

  • Pro: Ten minutes end to end. No model files, no compositing, no command line, no dataset prep. The character travels automatically through every shot in the project.
  • Con: Confined to Flow's pipeline — the character does not export cleanly to other tools. Drift increases as scene complexity grows: hair texture, eye colour, jaw shape will wobble across longer sequences. There is no way to fine-tune.

Best for: storyboards, mood reels, pitch decks — anything where the character only needs to read as "the same person, roughly" rather than "the same person, exactly."

02 Reference-shot pipeline, manually composited

Most control. Moderate effort. When to use it: you want maximum control over which "version" of the character becomes canonical, and you want a sheet that is portable across multiple tools (Midjourney --cref, Nano Banana reference-attach, Imagen 3 character mode, Flux ControlNet, etc.).

This is the method I used for the portrait series on this site. The six images cycling in the hero are the output. The character sheet that anchors them is the input.

How it works.

  1. Pick one anchor reference. Start with one image that you genuinely like — sharp focus, even lighting, neutral expression, three-quarter face is usually safest. Everything that follows is conditioned on this one image, so it has to be good. Spend an hour on this step alone.
  2. Enumerate the angles you need. A minimal sheet is six or seven panels: front portrait, three-quarter left, three-quarter right, full profile, three-quarter body shot, plus one or two expression variants. A richer sheet adds a back-of-head shot and a low-angle hero shot.
  3. Generate each angle with reference conditioning. In Midjourney that is --cref [your image URL] --cw 100. In Nano Banana it is the reference-attach feature. In Imagen 3 it is character mode. Generate four to eight variants per angle.
  4. Cherry-pick. For each angle, choose the variant where the face matches the anchor most cleanly — same jaw, same eye spacing, same nose ridge. This is the slow part and it matters. You are training your own eye to spot drift.
  5. Composite. Drop the picks into a single sheet image in Photoshop, Affinity, Figma, or GIMP. A clean 2×4 or 3×3 grid is enough. Label each panel with the angle name.
  6. Use the sheet as your new reference. You now have a portable character. Attach the sheet (or a single panel from it) as the reference image in any new generation, in any tool.
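
Steps 2, 3 and 5 are mechanical enough to script. Here is a minimal Python sketch under stated assumptions: it builds one Midjourney prompt per angle using the --cref and --cw flags from step 3, then works out where each panel sits in the composite grid. The function names, the example.com URL, the "smiling" expression label and the 16-pixel gutter are placeholders, not part of any tool.

```python
from dataclasses import dataclass

# Angle names from step 2; the expression label is an illustrative placeholder.
ANGLES = [
    "front portrait",
    "three-quarter left",
    "three-quarter right",
    "full profile",
    "three-quarter body shot",
    "expression: smiling",
]


def midjourney_prompt(description, angle, anchor_url, weight=100):
    """Build one prompt string with the --cref / --cw flags quoted in step 3."""
    if not 0 <= weight <= 100:
        raise ValueError("--cw takes a value from 0 to 100")
    return f"{description}, {angle} --cref {anchor_url} --cw {weight}"


@dataclass(frozen=True)
class Panel:
    label: str
    x: int  # left edge of the panel, in pixels
    y: int  # top edge of the panel, in pixels


def grid_layout(labels, cols, panel_w, panel_h, gutter=16):
    """Return (sheet_width, sheet_height, panels) for a row-by-row grid."""
    if cols < 1:
        raise ValueError("cols must be at least 1")
    rows = -(-len(labels) // cols)  # ceiling division
    panels = [
        Panel(
            label,
            gutter + (i % cols) * (panel_w + gutter),
            gutter + (i // cols) * (panel_h + gutter),
        )
        for i, label in enumerate(labels)
    ]
    sheet_w = gutter + cols * (panel_w + gutter)
    sheet_h = gutter + rows * (panel_h + gutter)
    return sheet_w, sheet_h, panels


if __name__ == "__main__":
    description = "bald man, mid-40s, light stubble, dark streetwear, no jewellery"
    for angle in ANGLES:
        # Placeholder URL: swap in the address of your own anchor image.
        print(midjourney_prompt(description, angle, "https://example.com/anchor.png"))

    width, height, panels = grid_layout(ANGLES, cols=3, panel_w=512, panel_h=512)
    print(f"sheet: {width}x{height}")
    for panel in panels:
        print(f"{panel.label}: ({panel.x}, {panel.y})")
```

The x and y values are top-left corners, so they can be handed to whatever does the actual pasting (Pillow's Image.paste accepts a position like this) or used as guides in Figma or Photoshop.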

Tradeoffs.

  • Pro: You control every panel. You can mix tools per angle — Nano Banana might nail your front shots while Midjourney nails the profile. The resulting sheet is portable across any tool that accepts a reference image, including ones that don't exist yet.
  • Con: Two to four hours per character. Requires basic compositing skill. Output quality is entirely a function of the source reference, so pick the anchor carefully — a mediocre anchor produces a mediocre sheet no matter how good the downstream generations are.

Best for: brand work, personal sites, anything where the character will appear dozens of times and consistency matters more than the evening it took to build the sheet.

03 Train a Flux1D LoRA

Highest consistency. Largest upfront cost. When to use it: you're building a brand around a specific character or persona that will appear hundreds of times. You are willing to invest a weekend or two the first time you do it. You are comfortable opening a terminal and editing a YAML file.

How it works.

  1. Gather a dataset. Fifteen to thirty-plus images of the character. Mix of angles, lighting setups, expressions, distances. Crop each one to a consistent square or portrait aspect (1024×1024 or 1024×1280 are the current Flux1D sweet spots). Remove blurry, watermarked, or off-character images. The dataset is the entire game.
  2. Caption each image. This is the step most beginners underweight. Describe everything in the image except the character itself — clothing, pose, environment, lighting, mood, expression. The model will learn the character as the "constant" that appears under every caption, and the captions teach it which variables you want to be able to control later. Use a unique trigger word (e.g. rtnl_man) at the start of every caption.
  3. Pick a training environment.
    • RunPod or Vast.ai (cloud GPU rental): roughly $3–10 per full training run. Cheapest option. Requires comfort with SSH and a Linux shell.
    • Replicate's Flux LoRA trainer (managed): pay per training run, decent defaults, point-and-click. Easiest entry path.
    • Local training on a 24 GB+ GPU (3090, 4090, A6000): zero marginal cost if you already own the hardware.
  4. Train. Reasonable starting settings: 1500–3000 total steps, learning rate between 1e-4 and 4e-4, batch size 1, network dimension 16–32. Watch the loss converge but do not overtrain — past a certain point the model starts memorising backgrounds and clothing instead of generalising the character.
  5. Test in ComfyUI or Forge. Load the Flux1D base model plus your new LoRA file at strength 0.7–1.0. Generate test prompts that include your trigger word. Bad outputs almost always mean the dataset is bad, not that the training is bad, so go back to step 1.
  6. Iterate. Prune the weakest three to five images from the dataset. Retrain. Repeat until the LoRA reliably produces the character in arbitrary scenes without prompting tricks.
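
Steps 1, 2 and 4 each hide a bit of bookkeeping that is easy to sketch in Python. The helpers below are illustrative and not part of any trainer. center_crop_box computes a centred crop at the target aspect ratio. write_captions writes a same-named .txt file next to each image with the trigger word first; this sidecar layout is an assumption about your trainer, so check what yours expects. passes_per_image shows what a step count means for a given dataset size.

```python
from pathlib import Path

TRIGGER = "rtnl_man"  # the example trigger word from step 2
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def center_crop_box(width, height, target_w=1024, target_h=1024):
    """Largest centred box with the target aspect ratio: (left, top, right, bottom)."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        # Source is too wide: keep the full height and trim the sides.
        crop_w, crop_h = round(height * target_ratio), height
    else:
        # Source is too tall (or already matches): keep the full width.
        crop_w, crop_h = width, round(width / target_ratio)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def write_captions(dataset_dir, captions, trigger=TRIGGER):
    """Write one sidecar caption per image; return the images that had no caption.

    `captions` maps an image file name to a description of everything
    except the character: clothing, pose, environment, lighting, mood.
    """
    dataset_dir = Path(dataset_dir)
    missing = []
    for image in sorted(dataset_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        text = captions.get(image.name)
        if not text:
            missing.append(image.name)
            continue
        caption = f"{trigger}, {text.strip()}\n"
        image.with_suffix(".txt").write_text(caption, encoding="utf-8")
    return missing


def passes_per_image(total_steps, n_images, batch_size=1):
    """Average number of times each image is seen during training."""
    return total_steps * batch_size / n_images
```

With 20 images at batch size 1, 1500 to 3000 steps works out to each image being seen 75 to 150 times, which is one way to sanity-check a step count against the size of your dataset. The crop box uses the (left, top, right, bottom) order that Pillow's Image.crop accepts.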

Tradeoffs.

  • Pro: Highest consistency available today. The character is portable across any image tool that loads Flux LoRAs (a list that is growing fast). Once trained, additional generations cost nothing extra if you run them on your own hardware. The LoRA file is a few hundred megabytes and lives on your own disk.
  • Con: One to two weekends the first time you do this. Requires technical comfort with model files, ComfyUI graphs, dataset captioning, and basic command-line work. Dataset quality dictates everything — there is no amount of training that fixes a bad dataset, and that is the discovery most first-time trainers make the hard way.

Best for: long-running brand work, character-driven projects (graphic novels, recurring advertising creative, a personal site you intend to keep updating for years), or any case where Method 02's portable sheet stops being enough.

· What I actually use

The portrait series on this site is Method 02. One anchor reference, generated through Nano Banana with reference-attach, with text-prompt overrides for the six moods (studio, warehouse, cinematic, profile, mid-shot, stage). Total build time across all six was about three evenings of cherry-picking, plus an hour of compositing.

I am working on a Method 03 LoRA of the same character. When that ships, the cycling images here will be replaced with LoRA-generated variants, and the training notebook and dataset captioning notes will go up under Technical · Deep End.

If you build one yourself using any of these methods, send it to me. I would like to see it.