A tutorial on how to use GPT-3 and DALL-E to generate original content for the funny pages
I spent the last month exploring how to use OpenAI’s GPT-3 text generator and their DALL-E text-to-image system to create new comic strips without writing any source code. Note that both systems are in beta, and after I used up my free credits, they started charging a nominal fee. But the results are pretty good! And OpenAI grants users the right to use the material they generate for commercial purposes.
In this article, I’ll provide a little background on GPT-3 and DALL-E, explain how I used these systems to create comics, and show the results. Along the way, I’ll highlight the new text generation features in GPT-3 and explain how I coerced DALL-E to draw the same character with different views and poses.
I previously wrote about how to use OpenAI’s GPT-3 text generator to create funny captions for memes. With that project, AI-Memer, I learned that GPT-3 could be funny.
OpenAI trained their GPT-3 neural network with a massive amount of text from books and web pages, including comedic material. Here’s the breakdown of the training data from OpenAI’s paper:
According to OpenAI, a “token” is about 3/4 of a word. The training set adds up to roughly 500 billion tokens, so at 500 words per page, that’s about 750 million pages of text.
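To check that arithmetic, the short Python sketch below plugs in the per-dataset token counts reported in Table 2.2 of the GPT-3 paper (Common Crawl, WebText2, Books1, Books2, and Wikipedia), which add up to roughly 499 billion tokens. The 3/4-word-per-token ratio and the 500-words-per-page figure are the rough rules of thumb from the text, not exact constants.

```python
# Training-set sizes in billions of tokens, from Table 2.2 of the GPT-3 paper
# (Brown et al., 2020, "Language Models are Few-Shot Learners").
TOKENS_BILLIONS = {
    "Common Crawl (filtered)": 410,
    "WebText2": 19,
    "Books1": 12,
    "Books2": 55,
    "Wikipedia": 3,
}

WORDS_PER_TOKEN = 0.75  # rule of thumb: a token is about 3/4 of a word
WORDS_PER_PAGE = 500    # rough page size used in the text


def pages_of_text(tokens: float) -> float:
    """Convert a token count into an approximate number of printed pages."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


total_tokens = sum(TOKENS_BILLIONS.values()) * 1e9
pages = pages_of_text(total_tokens)
print(f"{total_tokens / 1e9:.0f} billion tokens is about {pages / 1e6:.1f} million pages")
# prints: 499 billion tokens is about 748.5 million pages
```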
To use GPT-3, you first have to create an account on OpenAI’s website. After you sign up, OpenAI gives you US$18 worth of credits that you can use before it starts charging you. You can then navigate to the Playground, enter a prompt, and it will generate new text.
My first step in creating new comics was generating a title from a theme concept. This was easy to do using the GPT-3 user interface; all I did was ask. I entered “Create wacky titles for a humorous comic strip about college sports” and hit the Submit button. Note that I used the default settings for Model, Temperature, etc., and these worked well.
The system showed the results.
1. The Benchwarmers
2. Intramural Idiots
3. Out in Left Field
4. One Hitter Wonders
5. Bleacher Creatures
6. The Walk-ons
7. Touchdowns and Turnovers
8. The Mascots
9. Mark Madness
10. Shot Clock Shenanigans
I liked “Mark Madness,” so I used that one. If none of them had been any good, I could have clicked the Undo last button and hit Submit again to create some new ones. GPT-3 has lots of ideas.
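The Playground displays the list for you to read, but if you send the same prompt through OpenAI’s API instead, the completion comes back as a single string. A small helper like the hypothetical `parse_numbered_list` below (my own sketch, not part of any OpenAI library) splits a numbered list like the one above into individual candidate titles, stripping any surrounding quotes.

```python
import re

# Matches lines such as "9. Mark Madness" or '2. "G33k5 Uprising"'.
NUMBERED_ITEM = re.compile(r'^\s*\d+\.\s*(.+?)\s*$')


def parse_numbered_list(completion: str) -> list:
    """Return the items of a numbered list, with surrounding double quotes removed."""
    items = []
    for line in completion.splitlines():
        match = NUMBERED_ITEM.match(line)
        if match:
            items.append(match.group(1).strip('"'))
    return items


sample = """1. The Benchwarmers
2. Intramural Idiots
9. Mark Madness
10. Shot Clock Shenanigans"""

titles = parse_numbered_list(sample)
print(titles)
# prints: ['The Benchwarmers', 'Intramural Idiots', 'Mark Madness', 'Shot Clock Shenanigans']
```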
Note that the generated titles are not guaranteed to be original, so I ran a quick Google search and didn’t find any existing comics named “Mark Madness.” All good.
Creating Characters with GPT-3
To create the characters for the new comic strip, I crafted the prompt “Create lead characters for a new comic strip about college sports called Mark Madness” and hit Submit. Here is the prompt, followed by the generated text.
Create lead characters for a new comic strip about college sports called "Mark Madness."
1. Mark: A talented but under-appreciated college basketball player who dreams of making it to the NBA.
2. Jimmy Chitwood: Mark's talented but egotistical teammate who always seems to steal the spotlight.
3. Suzie Hurley: Mark's girlfriend and fellow student, who is always trying to get him to focus on his studies.
4. Mr. McNulty: Mark's crusty old basketball coach who is always harping on him to "play with more passion."
5. Dean Snyder: The dean of Mark's college, a highly respected academic institution that is also known for its Division I athletics.
These characters seem pretty good. For example, GPT-3 figured out that the lead character should be a college basketball player named Mark.
But note that there does not seem to be much cultural diversity in the set of characters. The creators of GPT-3 discussed the issue of bias and lack of cultural representation in their paper.
Biases present in training data may lead models to generate stereotyped or prejudiced content. This is concerning, since model bias could harm people in the relevant groups in different ways by entrenching existing stereotypes and producing demeaning portrayals amongst other potential harms. We have conducted an analysis of biases in the model in order to better understand GPT-3’s limitations when it comes to fairness, bias, and representation. — Tom Brown, et al., OpenAI
They go on to discuss their analysis of bias and cultural underrepresentation in the model with respect to gender, race, and religion.
Editing Text with Commands
OpenAI is debuting a new text editing feature in GPT-3, currently in beta. Instead of just creating text by responding to prompts, you can now use the system to edit any existing prose using text commands. To test this, I used the list of characters above and asked the system to “Make the characters more culturally and ethnically diverse.”
Here are the results.
1. Mark: A talented but under-appreciated college basketball player who dreams of making it to the NBA.
2. Enrique: Mark's talented but egotistical teammate who always seems to steal the spotlight.
3. Sandhya: Mark's girlfriend and fellow student, who is always trying to get him to focus on his studies.
4. Mr. McNulty: Mark's crusty old basketball coach who is always harping on him to "play with more passion."
5. Dean Wong: The dean of Mark's college, a highly respected academic institution that is also known for its Division I athletics.
OK, it changed some names (Jimmy Chitwood to Enrique, Suzie Hurley to Sandhya, and Dean Snyder to Dean Wong) to make the characters more diverse.
Creating a Scene
The final step in text generation is asking GPT-3 to create a scene given the title and characters. Here is the prompt I used, followed by the newly generated text.
Create a scene with dialog for a comic strip about a college sports called "Mark Madness."
CHARACTERS:
Mark: A talented but under-appreciated college basketball player who dreams of making it to the NBA.
Enrique: Mark's talented but egotistical teammate who always seems to steal the spotlight.
Sandhya: Mark's girlfriend and fellow student, who is always trying to get him to focus on his studies.
SETTING: The basketball court after a big game.
Mark: Wow, we really killed it out there today!
Enrique: Yeah, no thanks to you. You're lucky I was there to carry the team.
That was a fine bit of dialog. Next up, I explain how I used DALL-E to render this scene.
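Before moving on to DALL-E, note that the scene prompt above has a simple, repeatable structure: an instruction line naming the strip and its topic, then a CHARACTERS: block with one “Name: description” line per character. If you generate many scenes, a small template function can assemble that layout for you. The `build_scene_prompt` helper below is a hypothetical sketch of that idea; it only builds the text, which you would then paste into the Playground or send to the API.

```python
def build_scene_prompt(title: str, topic: str, characters: dict) -> str:
    """Assemble a scene prompt: an instruction line followed by a CHARACTERS: block.

    `characters` maps each character's name to a one-line description.
    """
    lines = [
        f'Create a scene with dialog for a comic strip about {topic} called "{title}."',
        "",
        "CHARACTERS:",
    ]
    # One "Name: description" line per character, in insertion order.
    lines += [f"{name}: {description}" for name, description in characters.items()]
    # A trailing newline leaves the model positioned to continue after the cast list.
    return "\n".join(lines) + "\n"


cast = {
    "Mark": "A talented but under-appreciated college basketball player who dreams of making it to the NBA.",
    "Enrique": "Mark's talented but egotistical teammate who always seems to steal the spotlight.",
    "Sandhya": "Mark's girlfriend and fellow student, who is always trying to get him to focus on his studies.",
}
print(build_scene_prompt("Mark Madness", "college sports", cast))
```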
Last month, I wrote about how to use OpenAI’s DALL-E image creation system to create digital art. It’s a paid system, currently in beta, that you can sign up for on OpenAI’s website. You type in a text prompt and hit the Generate button, and about 20 seconds later, you see four versions of the image described in your prompt. It’s pretty easy to use.
For the Mark Madness comic, I used this prompt, “A comic set in a basketball court with two characters wearing a red uniform, Mark, a talented college basketball player, and Enrique, an egotistical teammate.” Here are the four images DALL-E created.
I liked the second one, but I needed to clean it up a bit in Photoshop. I also added the title and the dialog generated by GPT-3, replacing the gibberish text in the word balloons. Here are the before and after images.
This is kinda like one of those visual puzzles where you need to find the ten differences. Can you spot them all? 🙂
I like the overall look of the comic, with some lovely details like the red vertical bar that matches the team color. The characters’ eyes, however, did not match at all, so I used a little trick in Photoshop to replicate Mark’s right eye and Enrique’s left eye. I used the Comic Sans font for the dialog, of course.
I used the same process to create the title, character descriptions, and scene for a second comic strip using GPT-3. This time I started with the prompt “Create wacky titles for a humorous comic about a garage band.” The best title of the batch was “One Gig at a Time.” After generating the text for the characters, I used this prompt to generate images: “A comic with three musicians, Clive on guitar, Jemma on bass, and Raj playing electric piano.”
Here are the results.
This time the first image seemed to be the best. Here’s what it looks like before and after I cleaned it up and added the title in Photoshop.
I kept the bold black look of the initial design. I had to squeeze the image down a little to make room for the title on top. Not much cleanup was needed, but I did add the stream of musical notes in Photoshop.
For the third comic, I wanted to see if I could create a three-panel comic strip. I started with this prompt for GPT-3: “Create wacky titles for a new techy comic strip.”
It came up with some good ones.
1. "ML Mayhem"
2. "G33k5 Uprising"
3. "Coding Crazy"
4. "Cyber security Shenanigans"
5. "The Great Cloud Catastrophe"
Creating Characters for M.L. Mayhem
I liked ML Mayhem, but I stylized it as M.L. Mayhem. I then asked GPT-3 to create the lead characters. Here is the prompt, followed by the generated text.
Create lead characters for a new techy comic strip called "M.L. Mayhem"
1. Millie is a brilliant young coder who spends her days creating new ways to make life easier for her friends and family.
2. Lily Nguyen is Millie's best friend and closest confidante. She's always there to offer a listening ear.
3. Mayhem is the name of the extremely powerful AI system that Millie accidentally created. It has a mind of its own and is always causing havoc for Millie and Lily.
OK, I see what it did there. The strip’s name is M.L. Mayhem, so it named the two main protagonists Millie and Lily. Huh, M and L. Very clever! And the AI system that wreaks havoc is called Mayhem. Neat and tidy.
Creating Scenes for Three Panels
Next, I asked GPT-3 to generate the scenes and dialog for three panels.
Create a scene and dialog for a three-panel techy comic strip called "M.L. Mayhem."
CHARACTERS:
Millie is a brilliant young coder who spends her days creating new ways to make life easier for her friends and family.
Lily Nguyen is Millie's best friend and closest confidante. She's always there to offer a listening ear.
Mayhem is the name of the extremely powerful AI system that Millie accidentally created. It has a mind of its own and is always causing havoc for Millie and Lily.
PANEL 1 SCENE: Millie is sitting at her computer, her fingers flying across the keyboard as she codes.
MILLIE: I'm almost finished with my latest project. I can't wait to show Lily!
PANEL 2 SCENE: Lily is sitting on the couch, reading a book.
LILY: I can't wait to see what Millie has been working on. She's so brilliant!
PANEL 3 SCENE: Millie's computer screen is flashing red and an error message is displayed.
MILLIE: Oh no! What have I done?
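GPT-3 returned the script in a consistent PANEL n SCENE: format, with each line of dialog tagged by the speaker’s name in capitals. That makes it easy to pull apart mechanically, for example to turn each scene description into a DALL-E prompt and each line of dialog into a word balloon. The `parse_panels` function below is a hypothetical sketch that assumes exactly that format; it treats any all-caps word of two or more letters followed by a colon as a speaker tag, so other capitalized words followed by a colon would confuse it.

```python
import re

# "PANEL 1 SCENE:" style headers; the panel number is captured so that
# re.split() keeps it in the output list.
PANEL_HEADER = re.compile(r"PANEL (\d+) SCENE:\s*")
# Assumed speaker tag: an all-caps name followed by a colon, e.g. "MILLIE:".
SPEAKER = re.compile(r"\b([A-Z]{2,}):\s*")


def parse_panels(script: str) -> list:
    """Split a generated script into panels with a scene and (speaker, line) pairs."""
    panels = []
    # parts == [text before PANEL 1, "1", body 1, "2", body 2, ...]
    parts = PANEL_HEADER.split(script)
    for number, body in zip(parts[1::2], parts[2::2]):
        # pieces == [scene description, speaker, line, speaker, line, ...]
        pieces = SPEAKER.split(body)
        dialog = [(speaker, line.strip())
                  for speaker, line in zip(pieces[1::2], pieces[2::2])]
        panels.append({"panel": int(number), "scene": pieces[0].strip(), "dialog": dialog})
    return panels


script = """PANEL 1 SCENE: Millie is sitting at her computer, her fingers flying across the keyboard as she codes.
MILLIE: I'm almost finished with my latest project. I can't wait to show Lily!
PANEL 2 SCENE: Lily is sitting on the couch, reading a book.
LILY: I can't wait to see what Millie has been working on. She's so brilliant!
PANEL 3 SCENE: Millie's computer screen is flashing red and an error message is displayed.
MILLIE: Oh no! What have I done?"""

for panel in parse_panels(script):
    print(panel["panel"], panel["scene"])
    for speaker, line in panel["dialog"]:
        print(f"  {speaker}: {line}")
```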
Creating the Artwork for a Multi-panel Comic Strip
For this task, I really stretched the capabilities of DALL-E. The previous two comics had only one panel, but this one needed three panels with a consistent look for the characters.
Here are the three prompts I used to create the base images.
Panel 1: "A techy comic strip where Millie, a brilliant young coder, is sitting at her computer typing."
Panel 2: "A techy comic where Lily Nguyen is sitting on the couch, reading a book."
Panel 3: "A techy comic, Millie, a college student with red hair and a green shirt, is working on her computer, the screen is showing an error message, and her friend Lily Nguyen, wearing a black shirt and blue sweatpants, is watching in horror."
Note that I had to wordsmith the prompts a bit to try to get some level of visual consistency for the characters. DALL-E generated four images for each of the three prompts, and here are the best ones.
In general, these were pretty good. But, in addition to the gibberish text, there was a major problem with visual consistency in the third panel; DALL-E didn’t know what the characters were supposed to look like exactly. It just did the best it could, given the prompts.
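One way to make that prompt wordsmithing more repeatable (a suggestion, not the process used for this strip) is to keep a hypothetical “character sheet” that fixes each character’s visual description and splice the identical wording into every panel prompt. The descriptions below are borrowed from the panel 3 prompt. This only keeps the prompts consistent; as the panels above show, it does not guarantee that DALL-E will draw the characters consistently.

```python
# Hypothetical character sheet: each character's visual description is fixed
# once (wording borrowed from the panel 3 prompt) and reused in every panel.
CHARACTER_SHEET = {
    "Millie": "Millie, a college student with red hair and a green shirt,",
    "Lily": "Lily Nguyen, wearing a black shirt and blue sweatpants,",
}
STYLE = "A techy comic"


def panel_prompt(action: str) -> str:
    """Expand {Millie}/{Lily} placeholders so every prompt describes the cast identically.

    Placeholders are assumed to be used as sentence subjects, since each
    description ends with a comma.
    """
    return f"{STYLE}: " + action.format(**CHARACTER_SHEET)


actions = [
    "{Millie} is sitting at her computer typing.",
    "{Lily} is sitting on the couch, reading a book.",
    "{Millie} is working on her computer, the screen is showing an error message, "
    "and her friend {Lily} is watching in horror.",
]
for action in actions:
    print(panel_prompt(action))
```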
Creating a Character Study
To make the characters fully realized, I first touched up the image in Photoshop and then created a template with a 3x2 grid and the rendering of Millie in the top-left. I left five of the six panels transparent. The size of the template was 1024x1024, DALL-E’s working resolution.
I then uploaded the template into DALL-E and edited the image using the prompt, “A character study for a comic of Millie, a brilliant young college student and coder, showing her from the side and front in six different poses.” Here are the results.
Apparently, DALL-E knows what a character study is and did a reasonably good job of rendering our protagonist in different poses.
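If you want to build a character-study template like this one in code rather than by hand, the first step is working out where the grid cells fall. Because 1024 isn’t divisible by 3, the columns can’t all be the same width. The hypothetical `grid_cells` helper below assumes “3x2” means three columns by two rows and returns (left, top, right, bottom) boxes, the same box order that image libraries such as Pillow use for cropping and pasting.

```python
def grid_cells(width: int, height: int, cols: int, rows: int) -> list:
    """Return (left, top, right, bottom) boxes for a cols x rows grid, row by row.

    Boundaries are computed as i * size // n, so the cells tile the canvas
    exactly even when the size is not divisible by the count.
    """
    xs = [c * width // cols for c in range(cols + 1)]
    ys = [r * height // rows for r in range(rows + 1)]
    return [(xs[c], ys[r], xs[c + 1], ys[r + 1])
            for r in range(rows) for c in range(cols)]


# 1024x1024 template: the reference drawing goes in the top-left cell and
# the other five cells stay transparent.
cells = grid_cells(1024, 1024, cols=3, rows=2)
for index, box in enumerate(cells):
    role = "Millie (pasted in)" if index == 0 else "transparent (left for DALL-E to fill)"
    print(index, box, role)
```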
To create the final panel for the strip, I chose one of the poses and pasted it into Photoshop along with part of an image of Lily over a transparent background. I erased parts of their faces to see if DALL-E could fill in their expressions given the prompt, “A techy comic with Millie, a college student working on her computer, the screen showing a dire error message, and her friend Lily Nguyen watching in horror.” Here are the results.
The system did a good job filling in the details, like drawing Millie’s hands typing on her keyboard and giving her a dual-monitor setup.
Here’s what the final three-panel comic strip looks like after some more cleanup in Photoshop, along with the dialog from GPT-3 in the word bubbles.
Comparing the two systems, I found that GPT-3 does a much better job generating the text than DALL-E does generating the artwork. Although the initial images for the comics look OK as concept renderings, they need some cleanup before they can be used in production. But the main problem is that DALL-E doesn’t render the characters consistently from panel to panel. The primary limitation is DALL-E’s 1024x1024 canvas. OpenAI recently added a new “outpainting” feature that makes it easier to extend generated images, but it works piecemeal: it only ever considers a single 1024x1024 frame when generating. To render visual elements like comic strip characters consistently, the system would need something like an “upload reference frames” feature.
I want to thank Jennifer Lim for her help with this article.