Bring You up to Speed with Comfy Creator

Paul Fidika
6 min read · Feb 23, 2024


(Note: Comfy Creator is not yet publicly available; expect it mid-March 2024.)

Comfy Creator (formerly void.tech) is a fork of ComfyUI:

ComfyUI is an inference pipeline for producing media with AI. It started as a pipeline for stable diffusion models (which are models that take text and produce an image), but it’s become much, much broader than that.

ComfyUI is the best open-source pipeline for producing media currently in existence. It is better than ForgeUI, Automatic1111, Fooocus, and InvokeAI.

ComfyUI was written by a single anonymous developer, who has since been hired by StabilityAI.

ComfyUI can be divided up into two parts: the graph-editor front-end, and the inference backend.

Front-End

The graph editor looks like this:

This workflow (1) loads a model (UNet, CLIP, and VAE pairing), (2) encodes two pieces of text (positive and negative) into embedding space using CLIP, (3) runs the diffusion process on an empty latent image using the positive and negative embeddings as conditioning and the UNet, (4) decodes the image from a latent into pixels using the VAE.

This workflow compiles down to a workflow-api JSON format that looks like this:

```json
{
  "3": {
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "class_type": "KSampler",
    "_meta": { "title": "KSampler" }
  },
  "4": {
    "inputs": {
      "ckpt_name": "sdxl_anime\\break_domain_xl_v05g.safetensors"
    },
    "class_type": "CheckpointLoaderSimple",
    "_meta": { "title": "Load Checkpoint" }
  },
  "5": {
    "inputs": { "width": 512, "height": 512, "batch_size": 1 },
    "class_type": "EmptyLatentImage",
    "_meta": { "title": "Empty Latent Image" }
  },
  "6": {
    "inputs": {
      "text": "beautiful scenery nature glass bottle landscape, , purple galaxy bottle,",
      "clip": ["4", 1]
    },
    "class_type": "CLIPTextEncode",
    "_meta": { "title": "CLIP Text Encode (Prompt)" }
  },
  "7": {
    "inputs": {
      "text": "text, watermark",
      "clip": ["4", 1]
    },
    "class_type": "CLIPTextEncode",
    "_meta": { "title": "CLIP Text Encode (Prompt)" }
  },
  "8": {
    "inputs": {
      "samples": ["3", 0],
      "vae": ["4", 2]
    },
    "class_type": "VAEDecode",
    "_meta": { "title": "VAE Decode" }
  },
  "9": {
    "inputs": {
      "filename_prefix": "ComfyUI",
      "images": ["8", 0]
    },
    "class_type": "SaveImage",
    "_meta": { "title": "Save Image" }
  }
}
```

This JSON is passed to the server backend (the API). Each two-element array, such as ["4", 0], is a link: it means "take output slot 0 of node 4 and feed it into this input."
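ComfyUI's server accepts this format over HTTP at its /prompt endpoint (by default on port 8188). A minimal sketch of submitting the graph above from Python, using only the standard library:

```python
import json
import urllib.request

COMFY_HOST = "127.0.0.1:8188"  # ComfyUI's default local address

def build_prompt_payload(workflow: dict) -> bytes:
    # The server expects the workflow-api graph wrapped under a "prompt" key.
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_workflow(workflow: dict, host: str = COMFY_HOST) -> dict:
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response contains a prompt_id you can later use to poll /history.
        return json.loads(resp.read())
```

The server queues the job, executes the graph node by node, and writes the result to its output folder.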

If you want to run ComfyUI, check out Scott Detweiler’s tutorial series (he’s also a StabilityAI employee):

https://www.youtube.com/watch?v=AbB33AxrcZo&t=3s

Unfortunately, ComfyUI’s frontend code is a complete disaster, and it is built around LiteGraph.js, a barely-maintained, eight-year-old graph-editor library.

After spending a month failing to refactor ComfyUI’s frontend, we discarded all of it and rewrote it in TypeScript + React, building it around React Flow, a modern graph-editor library.

https://reactflow.dev/

https://github.com/comfy-creator/Comfy-Creator

We are releasing Comfy Creator’s front-end under a source-available, non-commercial license. This is so that users can run Comfy Creator locally, on their own machine, without censorship, while also preventing our competitors from trivially cloning our repo and launching a copy-cat service.

Locally running, uncensored models are important to prevent us from becoming Google:

Comfy Creator’s motto is “Don’t Be Evil”; this was Larry Page and Sergey Brin’s original, long-since-abandoned motto for Google.

In the future, the entire world will run on AI models; giving individuals control over their own models, rather than gating access to them and using AI to perpetuate the gatekeepers’ biases and values, is important for the future of democracy and capitalism.

Backend

ComfyUI’s backend is written in Python + PyTorch. It takes the above-mentioned JSON and runs it as a workflow. It also provides an API to the front-end.

Comfy Creator Server is a fork of ComfyUI, and is released under a GNU GPL 3.0 license; just like the original ComfyUI. GPL 3.0 requires all forks to use the same license as the original.

ComfyUI was built only to run locally on the user’s machine. It is important that Comfy Creator be able to run locally, but users should also have the option to access more powerful remote GPUs as well.

There are a half-dozen startups that provide dockerized, cloud-hosted instances of ComfyUI.

However, the cheapest and easiest way to run dockerized ComfyUI is to simply run it yourself.

Services such as RunPod provide one-click deploys of ComfyUI containers using a template.

However, Comfy Creator is better. We’ve rebuilt the ComfyUI server around gRPC rather than REST, and built a distributed-queue system using Apache Pulsar; users can place a workflow on our queue, and one of the workers in our worker-cloud will pick it up, process the job, and return media to the user.
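As a rough sketch of that queue flow (the broker URL, topic name, and job envelope below are illustrative assumptions, not Comfy Creator’s actual schema), assuming the `pulsar-client` package:

```python
import json

def encode_job(workflow: dict, user_id: str) -> bytes:
    """Wrap a workflow-api graph in a job envelope for the queue."""
    return json.dumps({"user_id": user_id, "workflow": workflow}).encode("utf-8")

def decode_job(payload: bytes) -> dict:
    return json.loads(payload.decode("utf-8"))

def submit_job(workflow: dict, user_id: str) -> None:
    """Producer side: place a workflow on the shared queue."""
    import pulsar  # pip install pulsar-client
    client = pulsar.Client("pulsar://localhost:6650")  # assumed broker URL
    producer = client.create_producer("persistent://public/default/workflows")
    producer.send(encode_job(workflow, user_id))
    client.close()

def run_worker() -> None:
    """Worker side: any GPU node in the pool can pick up the next job."""
    import pulsar
    client = pulsar.Client("pulsar://localhost:6650")
    consumer = client.subscribe(
        "persistent://public/default/workflows",
        subscription_name="gpu-workers",
        consumer_type=pulsar.ConsumerType.Shared,  # Shared = work-queue semantics
    )
    msg = consumer.receive()
    job = decode_job(msg.data())
    # ... run the workflow on the local GPU, upload the media, notify the user ...
    consumer.acknowledge(msg)
    client.close()
```

A Shared subscription distributes messages round-robin across workers, which is what turns a Pulsar topic into a job queue.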

Comfy Creator is the first serverless version of ComfyUI.

Serverless Comfy Creator is far more economical than the dockerized ComfyUI our competitors are using. You do not need to spend minutes spinning up a Docker container on a machine with a dedicated GPU, which sits idle most of the time and needs to be shut down when you’re done; simply submit your workflow to a single API endpoint, get your results, and forget about it.

Stable Diffusion Architecture

Stable Diffusion 1, 2, and Midjourney were originally built on OpenAI’s DALLE-2 architecture, published in April 2022.

This popularized diffusion processes, replacing GANs as the new state of the art. Here’s a tutorial on how to build Stable Diffusion from scratch using PyTorch.

Diffusion models have become increasingly large over time:

Stable Diffusion 1: 1B parameters

Stable Diffusion XL: 3B parameters + (fine tuner)

Stable Cascade: 3.6B + 1.5B parameters

Stable Diffusion 3: up to 8B parameters
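For a sense of what those parameter counts mean in practice, here is a back-of-the-envelope estimate of the VRAM needed just to hold the weights at 16-bit precision (two bytes per parameter; activations and optimizer state excluded):

```python
def fp16_weight_gb(params_billions: float) -> float:
    """Approximate memory for model weights alone at fp16 (2 bytes/param)."""
    return params_billions * 1e9 * 2 / 2**30  # bytes -> GiB

for name, size in [("SD 1", 1.0), ("SDXL (base)", 3.0), ("SD 3 (largest)", 8.0)]:
    print(f"{name}: ~{fp16_weight_gb(size):.1f} GiB")
# → SD 1: ~1.9 GiB
# → SDXL (base): ~5.6 GiB
# → SD 3 (largest): ~14.9 GiB
```

This is why the largest models push past what consumer GPUs can hold, and why remote GPUs matter.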

DALLE-3 and Midjourney do not disclose their model sizes, but I consider DALLE-3 and Midjourney v6 to be the best image-generation models currently available (as of Feb 2024).

Training Your Own Models

First, pick a base model you want to train on top of.

Then use Kohya-SS to create a LoRA that modifies the original model however you like. A LoRA is a small set of layers trained to modify the original, much larger model in desirable ways. It is much easier to train 9M parameters (a typical LoRA size) from scratch than to fine-tune 3B parameters (the size of SDXL).
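To make the idea concrete, here is a minimal LoRA sketch in PyTorch (a hypothetical wrapper for illustration, not Kohya-SS’s actual implementation): the base layer’s weights are frozen, and only a small low-rank down/up projection pair is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank delta: W·x + (α/r)·B·A·x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A: project to rank r
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B: project back up
        nn.init.zeros_(self.up.weight)  # zero init => LoRA starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))
```

With rank 8 on a 768-wide layer, the trainable delta is only 2 × 768 × 8 ≈ 12K parameters versus 590K in the frozen base layer, which is where the 9M-vs-3B asymmetry comes from.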
