Bring You up to Speed with Comfy Creator

Paul Fidika
6 min read · Feb 23, 2024


(Note: Comfy Creator is not yet publicly available; expect it mid-March 2024.)

Comfy Creator (formerly void.tech) is a fork of ComfyUI:

ComfyUI is an inference pipeline for producing media with AI. It started as a pipeline for stable diffusion models (which are models that take text and produce an image), but it’s become much, much broader than that.

ComfyUI is the best open-source pipeline for producing media currently in existence. It is better than ForgeUI, Automatic1111, Fooocus, and InvokeAI.

ComfyUI was written by a single anonymous developer, who has since been hired by StabilityAI.

ComfyUI can be divided up into two parts: the graph-editor front-end, and the inference backend.

Front-End

The graph editor looks like this:

This workflow (1) loads a model (UNet, CLIP, and VAE pairing), (2) encodes two pieces of text (positive and negative) into embedding space using CLIP, (3) runs the diffusion process on an empty latent image using the positive and negative embeddings as conditioning and the UNet, (4) decodes the image from a latent into pixels using the VAE.

This workflow compiles down to a workflow-api JSON format that looks like this:

```json
{
  "3": {
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "class_type": "KSampler",
    "_meta": { "title": "KSampler" }
  },
  "4": {
    "inputs": {
      "ckpt_name": "sdxl_anime\\break_domain_xl_v05g.safetensors"
    },
    "class_type": "CheckpointLoaderSimple",
    "_meta": { "title": "Load Checkpoint" }
  },
  "5": {
    "inputs": { "width": 512, "height": 512, "batch_size": 1 },
    "class_type": "EmptyLatentImage",
    "_meta": { "title": "Empty Latent Image" }
  },
  "6": {
    "inputs": {
      "text": "beautiful scenery nature glass bottle landscape, , purple galaxy bottle,",
      "clip": ["4", 1]
    },
    "class_type": "CLIPTextEncode",
    "_meta": { "title": "CLIP Text Encode (Prompt)" }
  },
  "7": {
    "inputs": {
      "text": "text, watermark",
      "clip": ["4", 1]
    },
    "class_type": "CLIPTextEncode",
    "_meta": { "title": "CLIP Text Encode (Prompt)" }
  },
  "8": {
    "inputs": {
      "samples": ["3", 0],
      "vae": ["4", 2]
    },
    "class_type": "VAEDecode",
    "_meta": { "title": "VAE Decode" }
  },
  "9": {
    "inputs": {
      "filename_prefix": "ComfyUI",
      "images": ["8", 0]
    },
    "class_type": "SaveImage",
    "_meta": { "title": "Save Image" }
  }
}
```

This JSON is passed to the server backend (the API). Each two-element array, such as ["4", 0], is a link: it means "take output slot 0 of node 4 and feed it into this input."
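ComfyUI's server accepts this format over HTTP at its /prompt endpoint (by default on port 8188). A minimal sketch of submitting the graph above from Python, using only the standard library:

```python
import json
import urllib.request

COMFY_HOST = "127.0.0.1:8188"  # ComfyUI's default local address

def build_prompt_payload(workflow: dict) -> bytes:
    # The server expects the workflow-api graph wrapped under a "prompt" key.
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_workflow(workflow: dict, host: str = COMFY_HOST) -> dict:
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response contains a prompt_id you can later use to poll /history.
        return json.loads(resp.read())
```

The server queues the job, executes the graph node by node, and writes the result to its output folder.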

If you want to run ComfyUI, check out Scott Detweiler’s tutorial series (he’s also a StabilityAI employee):

https://www.youtube.com/watch?v=AbB33AxrcZo&t=3s

Unfortunately, ComfyUI’s frontend code is a complete disaster, and it is built around LiteGraph.js, a barely-maintained, eight-year-old graph-editor library.

After spending a month failing to refactor ComfyUI’s frontend, we discarded all of it and rewrote it in TypeScript + React, building it around React Flow, a modern graph-editor library.

https://reactflow.dev/

https://github.com/comfy-creator/Comfy-Creator

We are releasing Comfy Creator’s front-end under a source-available, non-commercial license. This is so that users can run Comfy Creator locally, on their own machine, without censorship, while also preventing our competitors from trivially cloning our repo and launching a copy-cat service.

Locally running, uncensored models are important to prevent us from becoming Google:

Comfy Creator’s motto is “Don’t Be Evil”; this was Larry Page and Sergey Brin’s original, long-since-abandoned motto for Google.

In the future, the entire world will run on AI models; giving individuals control over their own models, rather than gating access to them and using AI to perpetuate the gatekeepers’ biases and values, is important for the future of democracy and capitalism.

Backend

ComfyUI’s backend is written in Python + PyTorch. It takes the above-mentioned JSON and runs it as a workflow. It also provides an API to the front-end.

Comfy Creator Server is a fork of ComfyUI, and is released under a GNU GPL 3.0 license; just like the original ComfyUI. GPL 3.0 requires all forks to use the same license as the original.

ComfyUI was built only to run locally on the user’s machine. It is important that Comfy Creator be able to run locally, but users should also have the option to access more powerful remote GPUs as well.

There are a half-dozen startups that provide dockerized, cloud-hosted instances of ComfyUI.

However, the cheapest and easiest way to run dockerized ComfyUI is to simply run it yourself.

Services such as RunPod provide one-click deploys of ComfyUI containers using a template.

However, Comfy Creator is better. We’ve rebuilt the ComfyUI server around gRPC rather than REST, and built a distributed-queue system using Apache Pulsar; users can place a workflow on our queue, and one of the workers in our worker-cloud will pick it up, process the job, and return media to the user.
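As a rough sketch of that queue flow (the broker URL, topic name, and job envelope below are illustrative assumptions, not Comfy Creator’s actual schema), assuming the `pulsar-client` package:

```python
import json

def encode_job(workflow: dict, user_id: str) -> bytes:
    """Wrap a workflow-api graph in a job envelope for the queue."""
    return json.dumps({"user_id": user_id, "workflow": workflow}).encode("utf-8")

def decode_job(payload: bytes) -> dict:
    return json.loads(payload.decode("utf-8"))

def submit_job(workflow: dict, user_id: str) -> None:
    """Producer side: place a workflow on the shared queue."""
    import pulsar  # pip install pulsar-client
    client = pulsar.Client("pulsar://localhost:6650")  # assumed broker URL
    producer = client.create_producer("persistent://public/default/workflows")
    producer.send(encode_job(workflow, user_id))
    client.close()

def run_worker() -> None:
    """Worker side: any GPU node in the pool can pick up the next job."""
    import pulsar
    client = pulsar.Client("pulsar://localhost:6650")
    consumer = client.subscribe(
        "persistent://public/default/workflows",
        subscription_name="gpu-workers",
        consumer_type=pulsar.ConsumerType.Shared,  # Shared = work-queue semantics
    )
    msg = consumer.receive()
    job = decode_job(msg.data())
    # ... run the workflow on the local GPU, upload the media, notify the user ...
    consumer.acknowledge(msg)
    client.close()
```

A Shared subscription distributes messages round-robin across workers, which is what turns a Pulsar topic into a job queue.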

Comfy Creator is the first serverless version of ComfyUI.

Serverless Comfy Creator is far more economical than the dockerized ComfyUI our competitors are using. You do not need to spend minutes spinning up a Docker container on a machine with a dedicated GPU, which sits idle most of the time and needs to be shut down when you’re done; simply submit your workflow to a single API endpoint, get your results, and forget about it.

Stable Diffusion Architecture

Stable Diffusion 1, 2, and Midjourney were originally built on OpenAI’s DALLE-2 architecture, published in April 2022.

This popularized diffusion processes, replacing GANs as the new state of the art. Here’s a tutorial on how to build Stable Diffusion from scratch using PyTorch.

Diffusion models have become increasingly large over time:

Stable Diffusion 1: 1B parameters

Stable Diffusion XL: 3B parameters + (fine tuner)

Stable Cascade: 3.6B + 1.5B parameters

Stable Diffusion 3: up to 8B parameters
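For a sense of what those parameter counts mean in practice, here is a back-of-the-envelope estimate of the VRAM needed just to hold the weights at 16-bit precision (two bytes per parameter; activations and optimizer state excluded):

```python
def fp16_weight_gb(params_billions: float) -> float:
    """Approximate memory for model weights alone at fp16 (2 bytes/param)."""
    return params_billions * 1e9 * 2 / 2**30  # bytes -> GiB

for name, size in [("SD 1", 1.0), ("SDXL (base)", 3.0), ("SD 3 (largest)", 8.0)]:
    print(f"{name}: ~{fp16_weight_gb(size):.1f} GiB")
# → SD 1: ~1.9 GiB
# → SDXL (base): ~5.6 GiB
# → SD 3 (largest): ~14.9 GiB
```

This is why the largest models push past what consumer GPUs can hold, and why remote GPUs matter.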

DALLE-3 and Midjourney do not disclose their model sizes, but I consider DALLE-3 and Midjourney v6 to be the best image-generation models currently available (as of Feb 2024).

Training Your Own Models

First, pick a base model you want to train on top of.

Then use Kohya-SS to create a LoRA that modifies the original model however you like. A LoRA is a small set of layers trained to modify the original, much larger model in desirable ways. It is much easier to train 9M parameters (a typical LoRA size) from scratch than to fine-tune 3B parameters (the size of SDXL).
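To make the idea concrete, here is a minimal LoRA sketch in PyTorch (a hypothetical wrapper for illustration, not Kohya-SS’s actual implementation): the base layer’s weights are frozen, and only a small low-rank down/up projection pair is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank delta: W·x + (α/r)·B·A·x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A: project to rank r
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B: project back up
        nn.init.zeros_(self.up.weight)  # zero init => LoRA starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))
```

With rank 8 on a 768-wide layer, the trainable delta is only 2 × 768 × 8 ≈ 12K parameters versus 590K in the frozen base layer, which is where the 9M-vs-3B asymmetry comes from.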
