Tuesday, May 2, 2023

Replicating MidJourney within Stable Diffusion

MidJourney allows me to produce crazy amazing portraits. There is an open source alternative to MidJourney called "Stable Diffusion".

Over the last 5 days I challenged myself to reproduce the look of MidJourney in Stable Diffusion. I'm liking the result.

MidJourney image:


Stable Diffusion image: 


Stable Diffusion is an open-source text-to-image program you run on your local computer via Python. So... if you can get it to do what you want, it is free, with the caveat that the resulting image will be 512x512. This may be a bit technical, but I wanted to help anyone trying to get Stable Diffusion to produce decent images. It is very doable.

This tutorial is decent but it is lacking in optimization:

https://www.youtube.com/watch?v=Bdl-jWR3Ukc


Here are some things to understand:

First off, you won't produce great images without a negative prompt. I stumbled onto this negative prompt and it was a game changer:

easynegative, badhandv4.pt, bad quality, normal quality, worst quality, (((duplicate))), bad art, mutated, extra limbs, extra legs, extra arms, bad anatomy, (blurry image:1.1), (blurry picture:1.1), (worst quality, low quality:1.4), (out of frame), duplication, (folds:1.7), lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, (extra arms), (extra legs), fused fingers, too many fingers, long neck, username, watermark, signature, monochrome, deformed legs, face out of frame, head out of frame, head cropped, face cropped, same face twins
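If you want to test a negative prompt without clicking through the UI each time, the AUTOMATIC1111 webui exposes an HTTP API (assuming that's the webui you're running, started with the `--api` flag). The helper below is my own illustration of assembling the request, not part of the webui itself:

```python
import json
import urllib.request

# Paste the full negative prompt from above here; truncated for brevity.
NEGATIVE_PROMPT = "easynegative, badhandv4.pt, bad quality, normal quality, worst quality"

def build_txt2img_payload(prompt, negative_prompt=NEGATIVE_PROMPT,
                          steps=20, cfg_scale=7, batch_size=1,
                          width=512, height=512):
    """Bundle prompt + negative prompt into a txt2img API payload."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "batch_size": batch_size,
        "width": width,
        "height": height,
        "restore_faces": True,
    }

if __name__ == "__main__":
    payload = build_txt2img_payload("girl in a field")
    req = urllib.request.Request(
        "http://127.0.0.1:7860/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # urllib.request.urlopen(req) returns base64-encoded images in the
    # response's "images" field (only works while the webui is running).
```

Same knobs as the UI: steps, CFG scale, batch size, restore faces.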

AI images are produced by feeding your text into a sophisticated prebuilt knowledge pathway called a model. The model is a file GBs in size and comes in two varieties: checkpoint (the old style) and safetensors (the new style). The one commonly used is Stable Diffusion v1.5 (v1-5-pruned.ckpt), which is 7GB. Models are built by sifting through millions of images that are tagged with relevant information. This isn't something the average person can do; it costs companies millions of dollars and a ton of processing power.

However...

You can use AI tools to graft new images into an existing model that replace a concept. This is the hard part. You are trying to "lightly" replace a concept without overdoing it. If you do it wrong Stable Diffusion ignores your prompt text and only produces images like what you give it. This is called overfitting. If you do it right you can get images to change their appearance based on the text of the prompt.                                                                               

The tool I use to graft images into an existing model is Dreambooth. I will assume you have installed Stable Diffusion and Dreambooth (see the youtube video). You also need a graphics card with at least 12GB of VRAM. I have the GeForce RTX 3080 Ti, which has 12GB of VRAM.

It is important to understand that VRAM capacity is your enemy. It has caused me a lot of grief finding the perfect settings.

You will need a good model to start with. v1.5 raw is not good enough. There are people that graft generic images onto v1.5 for you to start from. 

I recommend https://civitai.com/models/9114/consistent-factor 

Forget what the youtube video tells you. You will need 75 512x512 images that represent your subject in a variety of settings and poses. You can do it with fewer, but VRAM will force you to train at a smaller resolution. My setup allows you to train at 384x384 with 12GB of VRAM (which is really good). There are sites that allow you to quickly resize images to 512x512 (see the youtube video for details).

I built 75 images in MidJourney in the style I wanted, resized them all to 512x512, and put them in one directory.
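If you'd rather do the resizing locally instead of through a website, a short Python script with the Pillow library handles it in bulk (a sketch; the folder names are placeholders for wherever you keep your images):

```python
from pathlib import Path
from PIL import Image

def resize_to_512(img: Image.Image) -> Image.Image:
    """Center-crop to a square, then scale to 512x512."""
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    return img.resize((512, 512), Image.LANCZOS)

def resize_folder(src="raw_images", dst="training_images"):
    """Resize every PNG in src and write the results to dst."""
    Path(dst).mkdir(exist_ok=True)
    for path in Path(src).glob("*.png"):
        resize_to_512(Image.open(path)).save(Path(dst) / path.name)
```

Center-cropping first keeps the subject from being squashed when the source image isn't square.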

Once you have 75 images continue...

1. Download the Consistent Factor 4GB safetensors file and put it in your models/Stable-diffusion directory.

2. Go to the Dreambooth tab and create a new model based on that safetensors model. Once it is built, click 'load settings'.

3. Once it is loaded, click 'performance wizard' on the settings tab. Switch to the concepts tab and click 'training wizard (person)'.

4. Click 'save settings' and 'load settings' because I don't trust this tool.

'settings' tab entries of interest:

> Training Steps Per Image (Epochs) = 150

> Save model frequency = 25 (creates a safetensors model every 25 epochs)

> Save preview frequency = 5 (show previews every 5 epochs)

> batch size = 1

> Learning rate = 0.000005

> Max Resolution = 384 (if you run out of memory slide this down and retry until it works)

'concepts' tab entries of interest:

> we are only doing 1 concept

> Dataset Directory = your 75 image directory

> Classification Dataset Directory = create a directory for this (dreambooth will fill it for you)

> Instance prompt = girl

> Class prompt = girl

> Sample image prompt = girl in a field

> Sample negative prompt = the one I gave above

> Class images per instance = 4 (300 ideal images / 75 provided images = 4)

> Number of samples to generate = 4

> Sample CFG scale = 7

> Sample Steps = 20


Once this is all entered, click 'save settings' and 'load settings'.


The commonly accepted minimum number of images to train on is 300. When you have fewer, you ideally want your image count to divide evenly into 300. You then take the quotient (4 in my case) and enter it as Class images per instance.
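The arithmetic is simple enough to sketch in Python (the 300-image target is the rule of thumb above, not a hard requirement):

```python
def class_images_per_instance(num_images: int, target: int = 300) -> int:
    """Class images Dreambooth should generate per training image,
    so num_images * result lands near the ~300-image target."""
    return max(1, round(target / num_images))

# 75 training images -> 300 / 75 = 4 class images per instance
```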


Our goal is to get Max Resolution as high as possible. 

> 75 images + a learning rate of 0.000005 + Class images per instance of 4 is the magic combination

> If you have more images, you need more VRAM

> If you have fewer images, you need more class images, which also needs more VRAM and slows down processing

> If you train slower (ex: 0.000002), you need more VRAM


An epoch is one training pass over your images. With Training Steps Per Image (Epochs) set to 150, the run will be 150 epochs.

> Each epoch can take anywhere between 45 and 75 seconds, depending on the complexity of your 75 images.
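From those per-epoch timings you can estimate when the first saved model (epoch 25) will appear; a quick back-of-the-envelope sketch:

```python
def minutes_until_save(save_every: int = 25,
                       seconds_per_epoch: float = 60.0) -> float:
    """Rough wall-clock minutes until Dreambooth writes its next model file."""
    return save_every * seconds_per_epoch / 60.0

# At 45-75 seconds per epoch, the epoch-25 save lands roughly 19 to 31
# minutes into the run.
```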


After every 25 epochs, your model will be saved in models\Stable-diffusion

> Once you have this model (your goal) you can test it (see below)


Click 'Train' and let it go until it reaches epoch 26, then click 'cancel' and let it wrap up.

> It will have saved a safetensors model after 25 epochs.

> Go to txt2img tab

> refresh the checkpoint list and pick your model/checkpoint

> prompt = girl in a field

> negative prompt = what I gave above

> click 'restore faces'

> slide batch size to 8

> click 'generate'


If your model is good, the person will be in a variety of poses in the style you want. If your model overfits, your text will have no impact on the result (which may still be ok). If your model is crappy, you will see weird artifacts/glitches.


NOTE: all images you produce via txt2img tab are stored in outputs\txt2img-images


If you have a bad model it likely means your images are too different from each other.

You have 3 options to fix this:

1. You can try adding/removing words from the prompt to see if it gets better

2. You use this model as a starting point to run another 25 epochs for another save

3. Replace your images

25 epochs is ideal because, if it works, the model is likely not overfitted and stays flexible. The more epochs you run, the more the concept of 'girl' gets hardwired to your images, ignoring other words in your prompt.

Once you have this working you can do whatever you want. This is how people do deepfakes. You could replace 15 images in your 75-image set with images of a famous person, build a new model, and they will show up in your results.

Wednesday, April 15, 2020

New game underway



Since the summer of 2017 I have been working part time on a game written in C++ using the UE4 engine (see above).

I recently played through Half-Life: Alyx, and it became apparent that UE4 is lacking. I have stopped development and turned all my attention to learning how to write a game engine. I purposely wrote the game with no levels built in UE4 and no scripting in UE4.

It may be unrealistic to swap out UE4 for my own engine but I'm going to give it a shot. Even if I fail I expect I will learn a ton.

I am following the game engine series by "The Cherno". This guy is incredibly knowledgeable. I have decided to become a 'Partner' Patreon member and am diving deep into all his videos.

OpenGL is the current series I am starting

Tuesday, September 10, 2019

Classic WOW Design


I've never been a serious WOW player. I've never even taken a character to max level. If there is such a thing as a filthy casual, that's me (except I take showers).

So when WOW Classic came out, I figured I'd give it a fresh go. As a game designer, these are my thoughts so far:

Traveling to places requires a lot of walking
Some quests require you to walk for minutes to reach your destination. It forces you to learn the landscape. It increases immersion. As far as I'm concerned, flying mounts should have never been added to the game. It detaches you from the environment.

Energy/Mana management is important
As a mage, it usually takes half my mana to kill a single enemy. If another enemy attacks me, I could run out of mana and die ... so after every kill I stop to refill my mana.

Attacking 2 enemies/mobs at once, usually means you will die
I'm used to Destiny and Diablo, where you can steamroll over stuff. No can do in WOW. If I see two enemies grouped together, I pass on attacking them solo. That adds a tactical component to the game. I hate that Destiny does not have this. Destiny's design approach is just to flood an area with mobs and rely on your twitch skills or constant re-spawning to get thru an area. It didn't use to be like that. In the original Destiny 1, you could slowly work your way through a nightfall solo and get a halo achievement for that. Not any more. You play fast, accurate, and furious or you suck.

The world feels vast
It may look primitive, but the environment goes on without pause. Modern games like The Division, Destiny, or Anthem put too much emphasis on fantastic visuals. These visuals require significant load times, which they hide with maps or flying ships or, in Anthem's case... not well at all. I think there is a case for simpler graphics that allow the landscape to stretch on forever.

People are friendly
Very rarely do I encounter anyone that is not friendly. This is due to several factors:
1. People playing Classic WOW are probably mostly veteran players who are well mannered.
2. Buffs matter. Putting a buff on someone as you pass by is a way of saying hi in a meaningful way.
3. You need to cooperate to beat stuff and being caustic isn't doing you any favors.

Respawning is something you need to pay attention to
If you need to penetrate into an enemy camp and kill high level mobs to get there, you need to watch your back. Enemies are going to respawn behind you and could trap you. And... if you die in an enemy camp, you may not find a safe spot to revive.

Some non-dungeon areas require grouping with others
I played a werewolf area last night that required groups of people working together. I really want to complete every quest, and there was no way I was completing those quests without the help of others. There was one quest (ambush) where I got what I needed and I just stuck around for another 15 minutes until everyone had a chance to complete it.

Money is tight
Gold... OMG... It is hard to get this stuff. Having to earn money makes you appreciate it. You don't have enough to spend it on everything you want (training, profession, items) so you need to carefully choose what you want.

Leveling is slow
It takes hours to go up one level. When you do level up, you stop to put a point in your talent tree... because those upgrades matter.

Quests are important and take time to complete
In modern wow, I didn't pay much attention to quest text. In classic, I read it all because I know I'm not going to level out of an area before I complete the quests. Last night I was struggling to find anything to do in an area that I was not done with. That forced me to go into harder areas to complete lingering quests.

The death penalty is rough
If you die, you may walk minutes to get back to your body. After dying a few times, you think really hard about that. I'd say it isn't quite as punishing as Dark Souls where you have to fight your way back to your body to get the resources you lost, but it is punishing.

Closing thoughts
WOW Classic is a solid old-school game. Modern games try to hook you with quick leveling, no death penalties, and freemium addictive RNG/gambling aspects. WOW Classic has none of that, and it is refreshing.

Wednesday, October 25, 2017

The State of VR




Oculus Rift is clearly experiencing trouble selling their headset. In May they had to shut down their Film Studio and they've slashed the price of their headset several times.

Is this a clear sign that VR is a failure?
I don't think so.

I am bullish on VR. It is not an if but a when. It is the when that is the issue.

Processing Power Limitations
I think the problem is we don't have the processing power, or appropriate pricing for that hardware, for VR to become mainstream. To render each eye at 90 frames per second and deliver an AAA game experience is impossible with today's hardware. I have struggled with this a lot in my game design. I have taken the route of allowing the game to automatically adjust scene complexity based on frame rate. The result is little to no stutter, but the visuals can't come close to approaching Destiny.

Similarities to Tablets
When Tablets first hit the scene, the size of the screen was small, the interface was clunky and the processing power was not up to the task. Now everyone has a tablet. Technology and Software have arrived to allow us to have a powerful touch computer in our hands. VR is in a similar state. The hardware just isn't there and it is probably 10 years down the road before cheap fast hardware is available.

Saturated Market
VR at this point is pretty saturated for what it can deliver. Because of hardware limitations it feels less like a gaming platform and more like a gimmick, like the Wii. Once you become accustomed to it, you find most experiences shallow, and they don't stack up to AAA titles.

In the next few years we are likely going to see VR stagnate. This will continue until hardware becomes powerful enough.

Standing vs Sitting
One of the other problems with VR is that you need to stand for an immersive experience. This has already been tried three times: Wii, Xbox, and PlayStation. I think people are tired of setting up sensor bars and getting off the couch. You need a very compelling reason to get people off the couch, and current VR hardware and software are not cutting it.

Summary
All of this sounds pretty dire but I think this is good for VR developers who are in it for the long term. I do this as a hobby and don't have to worry about making money from VR. I can build the type of experience I want and know it will be unique because there is little competition in this space.

Monday, September 25, 2017

Game Design Observations


My game is going to be large and open for the player to go in whatever direction they want. I've been playing Destiny 2 pretty heavily the last week and thought I'd look at some popular games and make some observations.

Here are 5 games that I think are worth evaluating:

Destiny:


Diablo:

Zelda:

Pokemon Go:

Skyrim:


What drives you to explore?
Destiny: Search for better loot.
Diablo: Search for better loot.
Zelda: Search for better loot & Story
Pokemon Go: Find rare Pokemon
Skyrim: Quests. Loot is mostly irrelevant.

How does your character level up?
Destiny: Items (Light Level)
Diablo: Experience & items
Zelda: Experience
Pokemon Go: Not relevant
Skyrim: Using the ability you want to level up

What keeps you playing?
Destiny: loot and friends playing together
Diablo: trying to reach highest greater rift
Zelda: end of story
Pokemon Go: Find rare pokemon and level up existing ones
Skyrim: Completing quests, finding new areas, raising character level

What sucks?
Destiny: Once you hit 260 light, it becomes a grind. You are often forced to use weapons you hate.
Diablo: Once you reach Torment 10 or so, it becomes a grind. Power level has gotten out of control.
Zelda: Weapons break.
Pokemon Go: It is hard to find rare Pokemon
Skyrim: Everything starts to look the same

What 1 improvement would I add?
Destiny: A wider range of improvements on the high end. Light level should keep going to 400+ with high level content available to dedicated players.
Diablo: Fix legendary gems. They are a storage nightmare.
Zelda: I didn't finish it. I really didn't like constantly breaking weapons. This felt like Halo.
Pokemon Go: Make the Pokemon locator work again... idiots.
Skyrim: Make loot more valuable

Reflections on how to design my game:
1. There needs to be something happening around you constantly.
2. Enemies need to be frequent, varied
3. Events should occur that you can find and complete
4. You can find items/enemies via a locator
5. Character level should be unlimited and driven off gear.
6. Gear can be upgraded without a great hassle.
7. There needs to be quests to complete with rewards.
8. Areas of the game get harder as you go further driving you to improve your gear.

Monday, September 18, 2017

Destiny 2 Graphics




I've played a lot of Destiny over the last few years. Yesterday I got the itch to play again so I picked up Destiny 2 ... and ... played it for most of the day.


Man... that game's graphics are amazing. I am officially intimidated by it.
But again... they have spent hundreds of millions of dollars on it so I expect it but still...



I think what they do is build the scene with a multi-texture material. When they are all done, anything that is static is converted into a single landscape mesh using that material. The result is stunning.


Tuesday, September 12, 2017

Working on Urban Layout



I've been working on an urban layout now that I know the scale I need to construct the world.

Here is a sample I put together yesterday:

You can teleport in all directions forever and it will continue to build a random city with no stutter. This is obviously very primitive but I was more concerned with performance and how things looked at this scale and whether it would continue to look vast (it does).

Next I am going to add elevations for the urban layout and see what that looks like.

Just for a bit of fun, I adjusted the layout to not have a road :)