Learning ComfyUI for Better Art
***

In 2022, I was having fun with Automatic1111. At the time, it was the state of the art in text-to-image generation. Anyone could make a good-looking image just by picking the right words. But it stopped being fun once I had to really think about what to make next.

In 2023, I paid for $10 worth of cloud GPU credits at runpod.io and that has lasted me to this day. I still have $2.29 worth. It allowed me to experiment with all kinds of different GPUs (even an A40) for running fine-tuning jobs and using Stable Diffusion.

In 2024, I solved my storage and compute concerns1 and one of the first things I wanted to go back to was using Stable Diffusion. Luckily, the tools have evolved and gotten even more incredible since I last dived into it.

The first thing I had to learn was ComfyUI. After using it for a while now, I’m convinced that ComfyUI is the best way to use diffusion models right now. It’s a much more expressive process than Automatic1111 was in 2022. You can generate exactly what you need and even extend it with custom nodes.

ComfyUI default output
The default image prompt for ComfyUI

Starting Out

Load Checkpoint

Double-clicking anywhere in the canvas will give you a search box. Type “Load Checkpoint” to create a node that lets you choose a diffusion checkpoint. It has a few outputs, which you can see as color-coded dots on the right side of the node.

  • The Model (Purple) is the diffusion model being loaded; it’s what generates the image. Different models have different styles.
  • The CLIP2 (Yellow) is the text encoder that turns your prompt into guidance for the image. Different checkpoints ship with different CLIP weights.
  • The VAE3 (Red) translates between pixel space and the latent space the model works in; its decoder is what turns the finished latent into the final image. Most of the time, the VAE is baked into the checkpoint. Sometimes it isn’t!

KSampler Node

All of these outputs are then fed into a KSampler node, which does the actual diffusion. The KSampler also requires a “Latent Image” as input.

The latent image is typically created using an “Empty Latent Image” node with a custom width and height. For Stable Diffusion (and XL) there are recommendations for the size and aspect ratios of the latent image.4
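
Under the hood, the Empty Latent Image node just creates a blank tensor in the model’s latent space. Here’s a rough sketch of what that looks like, assuming the standard Stable Diffusion VAE with its 8x downsampling:

```python
import torch

def empty_latent(width: int, height: int, batch_size: int = 1) -> torch.Tensor:
    # The SD VAE downsamples images by 8x, so the latent "canvas" is
    # width/8 x height/8 with 4 channels. This is also why widths and
    # heights should be multiples of 8 (ideally 64).
    return torch.zeros([batch_size, 4, height // 8, width // 8])

latent = empty_latent(1024, 1024)   # 1024x1024 is SDXL's native size; SD 1.5 prefers 512x512
print(latent.shape)                 # torch.Size([1, 4, 128, 128])
```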

Some handy tips

  • In the KSampler node, set control_after_generate to increment; it makes it much easier to track the last 10+ seeds you’ve used, whereas randomize makes that harder.
  • Good values for steps and CFG depend heavily on your checkpoint. Check the checkpoint’s CivitAI description/gallery for recommendations.
  • There are custom nodes that make it easy to remember the recommended width/height.
  • Dragging an input/output and dropping it onto the empty grid will show you suggested nodes for that connection.

Conditioning Prompts

Clip Text Encode Node

Conditioning is what ComfyUI calls the positive/negative prompt inputs. You provide them by connecting “CLIP Text Encode (Prompt)” nodes to the KSampler: one “Positive” and one “Negative” to help guide the image, specifying what you do and don’t want to see.
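
At this point the whole basic text-to-image pipeline is wired up: checkpoint → prompts → KSampler → VAE decode → save. The entire graph is really just data; if you enable the dev mode options and use “Save (API Format)”, it comes out as JSON roughly like the sketch below (node IDs, prompt text, and the checkpoint filename are placeholders):

```python
# A minimal text-to-image graph in ComfyUI's API (JSON) format. Connections
# are [source_node_id, output_index] pairs.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",      # positive prompt
          "inputs": {"text": "a cozy cabin in the woods, golden hour",
                     "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",      # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
}
```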

IPAdapters

Simple image generation is just one part of ComfyUI. What makes ComfyUI special is how it lets you combine nodes in novel ways to create as many images as needed in a “workflow.”

Custom Node Manager

By this point, you’ll probably need some custom nodes to extend ComfyUI. To get started with custom nodes, install ltdrdata/ComfyUI-Manager by following its guide, and it’s a whole new game.

A powerful one to start with is IPAdapter. People often brand it as a “one-shot LoRA,” meaning you can generate images in a specific style from just one input image (also known as style transfer). Using it is pretty easy; once you’ve got the IPAdapter models and the IPAdapter Plus nodes installed, it looks like this:

IPAdapter ComfyUI

The IPAdapter Unified Loader feeds into the IPAdapter node, which modifies the model used by the KSampler. The model then has this “new style” applied based on the input image (blue wire). Adjusting the weight, start_at, and end_at values controls how much and when the IPAdapter is applied.
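
For reference, the same wiring expressed in the API format looks roughly like this. The class and input names come from the IPAdapter Plus pack and can change between versions, so treat them as assumptions and check what your install actually exposes:

```python
# Sketch of the IPAdapter fragment in API format, plugging into the earlier
# workflow dict. Class/input names and the preset string are assumptions
# based on the ComfyUI_IPAdapter_plus pack; verify against your install.
ipadapter_nodes = {
    "10": {"class_type": "LoadImage",               # the style/reference image
           "inputs": {"image": "reference.png"}},
    "11": {"class_type": "IPAdapterUnifiedLoader",
           "inputs": {"model": ["1", 0],            # MODEL from Load Checkpoint
                      "preset": "PLUS (high strength)"}},
    "12": {"class_type": "IPAdapter",
           "inputs": {"model": ["11", 0],
                      "ipadapter": ["11", 1],
                      "image": ["10", 0],
                      "weight": 0.8,                # how strongly the style applies
                      "start_at": 0.0,              # fraction of steps where it kicks in
                      "end_at": 1.0}},
    # The KSampler's "model" input then points at ["12", 0] instead of the
    # raw checkpoint model.
}
```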

IPAdapter is also good in instances where you want the face to match an input image. With additional tools (insightface and FaceID), the outputs can be really accurate. There are some licensing concerns around insightface, though, if you’re using it commercially.

YouTube - Ultimate Guide to IPAdapter on ComfyUI - This video is a great guide for learning IPAdapter and a useful reference for installing insightface.

ControlNet

ControlNet is a much stronger way of controlling the shape of the output image. It’s good at taking a sketch (or other types of input) and figuring out how to make an image that aligns with it. It can use depth maps, Canny edges, and even OpenPose poses as input.

ControlNet ComfyUI
Using a sketch to generate an image

Applying a ControlNet is really simple: just add the Load ControlNet and Apply ControlNet nodes. Lowering the strength (weight) value gives the model more freedom to deviate from the control image; raising it forces the output to follow it more closely.
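
In the API format, the ControlNet fragment slots in between the prompt conditioning and the KSampler. A minimal sketch using ComfyUI’s built-in nodes (the image and model filenames are placeholders for whatever is in your models/controlnet folder):

```python
# Sketch of the ControlNet fragment in API format, plugging into the earlier
# workflow dict. The sketch/depth/pose image comes in via LoadImage.
controlnet_nodes = {
    "20": {"class_type": "LoadImage",
           "inputs": {"image": "sketch.png"}},
    "21": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "control_v11p_sd15_scribble.pth"}},
    "22": {"class_type": "ControlNetApply",
           "inputs": {"conditioning": ["2", 0],   # positive prompt conditioning
                      "control_net": ["21", 0],
                      "image": ["20", 0],
                      # Lower strength = more freedom to deviate from the sketch.
                      "strength": 0.9}},
    # The KSampler's "positive" input then points at ["22", 0].
}
```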

There are many kinds of ControlNet models, far too many to include in this post. I’d recommend reading the GitHub page for this ComfyUI node. It explains many of the ControlNets visually.

Face Fixer

One node that’s been useful for me is the FaceDetailer from ComfyUI Impact Pack. It’s very easy to use; all it needs is the image itself and something that detects the location of every face in it (a bbox detector).

In this case, I’m using “UltralyticsDetectorProvider” to detect the faces for the best results.

This extra step of detailing the face often makes the output much better.

Face Detailer

SwarmUI

One of the last pieces to all of this was giving others an easy way to use the workflows I’m making. I could’ve turned my ComfyUI workflow into an API; instead, I’ve put it into SwarmUI. I’m still experimenting with ways to let others use my workflows.

Settings Panel in SwarmUI
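
If you do want to go the API route, ComfyUI itself exposes an HTTP endpoint that queues a workflow saved in the API format. A minimal sketch, assuming a local server on the default port and the workflow dict from earlier:

```python
import json
import urllib.request

def queue_prompt(workflow: dict, host: str = "http://127.0.0.1:8188") -> dict:
    # POST the API-format workflow to ComfyUI's /prompt endpoint; the server
    # queues it and returns a prompt_id you can use to track the job.
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{host}/prompt", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

result = queue_prompt(workflow)  # `workflow` is the API-format dict from earlier
print(result["prompt_id"])
```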

Final Tips

  • You can drop an image you generated into ComfyUI and it will recreate the node graph that made it (the workflow is embedded in the image’s metadata).
  • Make backups of your workflows. I’ve already had to start from scratch more than once.
  • Look for other workflows online, there are some great ones out there on YouTube/CivitAI. Search around for what you’re looking for.

Custom Node Starter Pack

It may seem like a lot, but each of these has its own reason for being installed. I’d consider this a solid starter pack; if you’re generating videos, for example, you’ll need more.

  • ComfyUI-Manager
  • ComfyUI Impact Pack
  • TensorRT Node for ComfyUI
  • ComfyUI’s ControlNet Auxiliary Preprocessors
  • Efficiency Nodes for ComfyUI Version 2.0+
  • UltimateSDUpscale
  • MTB Nodes
  • Comfyroll Studio
  • ComfyUI_IPAdapter_plus
  • rgthree’s ComfyUI Nodes
  • Save Image with Generation Metadata
  • ComfyUI-SDXL-EmptyLatentImage
  • ComfyUI Nodes for External Tooling
  • ComfyUI Inpaint Nodes
  • Use Everywhere (UE Nodes)
  • KJNodes for ComfyUI
  • segment anything
  • comfyui-mixlab-nodes
  • ComfyUI Workspace Manager - Comfyspace
  • Crystools
  • hd-nodes-comfyui

As mentioned, this list isn’t complete; as you try workflows from other people, ComfyUI may ask you to install additional nodes. That’s also a great way of picking up “the essentials”: open custom workflows and click “Install Missing Custom Nodes.”