Generating Super Mario Bros Levels With Text-Conditional Diffusion Models
This page presents research in Procedural Content Generation via Machine Learning done
in collaboration with undergraduate students Olivia Kilday, Bess Hagan, Emilio Salas, and Reid Williams
as part of Southwestern University's Summer research program.
Source code for training your own Mario Diffusion models is available on GitHub.
The GitHub repository also has simple instructions for downloading our models from Hugging Face and generating your own Mario
levels without even training a model!
Our publication has been accepted to the 21st AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE 2025),
but an arXiv pre-print explaining our approach and results is available here.
The online appendix for the paper is included in the arXiv version, and is also available separately at this link.
Video Demonstrating Our Interactive GUI For Level Creation
Model-Generated Scenes
Each collection of images shows level scenes generated by a trained model, with the input prompt shown for each image. The same prompts are used for every model: the first 5 are taken
from real data, and the next 5 are randomly generated prompts not in the original data set. The first number in each image filename only specifies the order, 0 through 9.
The second number in each filename is the caption adherence score: the degree to which the generated image actually matches the caption shown in the filename. Caption
adherence scores range from -1.0 to 1.0, with 1.0 being a perfect score.
Regular Captions
The caption shown is the input to the diffusion model.
Though not shown in the file names, the actual input prompt for each image also includes an explicit mention of each feature that is completely absent from the scene.
For example, prompts would contain phrases such as "no enemies. no pipes. no rectangular block clusters."
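As a rough illustration of how such a prompt could be assembled, the sketch below appends a "no X." phrase for every feature that a caption never mentions. It is a minimal sketch and is not taken from the project's repository: the feature vocabulary, the keyword stems, and the function name `add_absence_phrases` are all hypothetical, and the real caption vocabulary is larger.

```python
# Hypothetical feature vocabulary (plural name -> keyword stem to search for).
# These entries are assumptions for illustration only.
FEATURE_KEYWORDS = {
    "enemies": "enem",
    "pipes": "pipe",
    "coins": "coin",
    "rectangular block clusters": "rectangular block cluster",
}


def add_absence_phrases(caption, feature_keywords=FEATURE_KEYWORDS):
    """Append 'no <feature>.' for each feature the caption never mentions."""
    text = caption.lower()
    # A feature counts as mentioned if its stem appears anywhere in the caption.
    absent = [name for name, stem in feature_keywords.items() if stem not in text]
    parts = [caption.strip()] + [f"no {name}." for name in absent]
    return " ".join(part for part in parts if part)


print(add_absence_phrases("full floor. two coins."))
# full floor. two coins. no enemies. no pipes. no rectangular block clusters.
```

Matching on a stem rather than the full plural keeps singular mentions such as "one pipe" from being reported as absent.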
Though not shown in the file names, each of these input prompts was combined with a separate corresponding negative prompt.
So, if the input prompt did not make any mention of coins or pipes, then the corresponding negative prompt would include "coins. pipes."
within its collection of phrases.