AI Dungeon Master vs ChatGPT: A Side-by-Side

A lot of people have tried using ChatGPT as a D&D dungeon master. It works, sort of, and that's the problem. It works well enough to get you excited, and then it falls apart in ways that take a few sessions to notice. This post is about where ChatGPT-as-DM is actually fine, where it isn't, and how a purpose-built AI dungeon master compares.

The short version

Feature	ChatGPT as DM	Purpose-built AI DM (TableForge)
Rules adjudication	LLM improvises	Programmatic rules engine
Dice rolls	LLM "rolls" (not actually random)	Real random number generation
Character sheets	You maintain by hand	Updates automatically
Cross-session memory	Whatever you paste back in	Persistent and structured
Multiplayer	One person at a time	Up to 6 players, real-time or async
Combat tracking	Drifts within a few rounds	Tracks initiative, conditions, HP
Spell slot tracking	You hope it remembers	Engine tracks every slot
Cost	$20/mo for ChatGPT Plus	Free tier, then $14.99/mo

The rest of this post is the long version.

Where ChatGPT-as-DM is actually fine

We don't want to oversell the gap. ChatGPT and similar general-purpose LLMs are genuinely good at some parts of running a game.

One-shots and short scenarios. A two-hour session with a clean start and end is well within what ChatGPT can handle. The memory problem hasn't kicked in yet, the rules drift hasn't accumulated, and the narration is often vivid.

Creative warmups. If you're a human DM prepping a session and want to brainstorm NPCs, locations, or plot hooks, ChatGPT is a great writing partner. This is the LLM doing exactly what LLMs are good at.

Prompt experimentation. Curious about what an AI DM could feel like? Try ChatGPT for an hour. It's a low-cost way to figure out whether the category interests you before paying for a dedicated tool.

Solo freeform play. If your idea of a game is more "collaborative fiction" than "D&D campaign," ChatGPT is fine. You'll do the dice rolls yourself, you'll handle your own character sheet, and the AI will give you atmosphere.

These are real use cases. We don't want anyone reading this and thinking ChatGPT is useless for tabletop. It isn't.

Where ChatGPT-as-DM breaks down

The problems start in the second session and get worse from there.

No persistent memory. ChatGPT's memory feature helps with general preferences, but it doesn't structure campaign state. You can paste session summaries back in, but you're effectively being your own scribe and database for the campaign. The NPC you befriended three sessions ago will get forgotten or subtly recharacterized. Promises won't be kept. Quest threads will quietly fade. The campaign won't hold.

No real dice. When ChatGPT "rolls" dice, it generates text that looks like a die roll. It is not random in any meaningful statistical sense. Across a long campaign this matters. You'll get unrealistic streaks, and (more subtly) the AI will sometimes generate the result it thinks is dramatically appropriate, which is exactly what you don't want from dice.
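To make "real dice" concrete, here is a minimal Python sketch of rolling from an actual random number generator. It is an illustration, not TableForge's code: the roll function and its tiny notation parser are hypothetical names we made up for this post. By default it draws from random.SystemRandom, which pulls from the operating system's entropy source. A seeded random.Random can be passed in instead when you want repeatable rolls for testing.

```python
import random
import re

# Matches simple notation like "d20", "2d6", or "2d6+3" (hypothetical mini-parser).
DICE_RE = re.compile(r"^(\d*)d(\d+)([+-]\d+)?$")


def roll(notation, rng=None):
    """Roll dice written as NdS+M and return (individual_rolls, total)."""
    # Default to OS-backed randomness; callers may inject a seeded RNG for tests.
    rng = rng or random.SystemRandom()
    match = DICE_RE.match(notation.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized dice notation: {notation!r}")
    count = int(match.group(1) or 1)      # "d20" means one die
    sides = int(match.group(2))
    modifier = int(match.group(3) or 0)   # "+3" or "-1", optional
    if count < 1 or sides < 1:
        raise ValueError("dice count and sides must be at least 1")
    rolls = [rng.randint(1, sides) for _ in range(count)]
    return rolls, sum(rolls) + modifier


if __name__ == "__main__":
    rolls, total = roll("2d6+3")
    print(f"rolled {rolls}, total {total}")
```

The point of the design is that the number exists before any narration does. The storyteller gets told what was rolled and never gets a vote on it.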

No rules engine. This is the big one. Ask ChatGPT to adjudicate a grapple check and it might get it right. Ask it to track concentration across a five-round fight while managing action economy and three players' reactions and it will eventually invent rulings that don't exist. Concrete things that go wrong: a paladin smites after seeing the damage roll (in 5e the decision comes when the attack hits, before damage is rolled); a spell's damage type is misremembered; concentration silently survives a failed save; bonus actions and reactions get conflated.
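Take the concentration case. In 5e, a caster who takes damage while concentrating makes a Constitution save against a DC of 10 or half the damage taken, whichever is higher, and a failure ends the spell. Here is a rough Python sketch of what "code adjudicates this" can look like. The Caster class and function names are hypothetical, and this is a simplified illustration, not TableForge's engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caster:
    name: str
    con_save_bonus: int
    concentrating_on: Optional[str] = None  # spell name, or None


def concentration_dc(damage: int) -> int:
    """DC is 10 or half the damage taken (rounded down), whichever is higher."""
    return max(10, damage // 2)


def check_concentration(caster: Caster, damage: int, d20_roll: int) -> bool:
    """Return True if the caster is still concentrating after taking damage.

    The d20 result is passed in so the roll can come from a real RNG elsewhere.
    """
    if caster.concentrating_on is None:
        return False  # nothing to maintain
    if d20_roll + caster.con_save_bonus >= concentration_dc(damage):
        return True
    # Failed save: the spell ends, and the state change is recorded immediately.
    caster.concentrating_on = None
    return False


if __name__ == "__main__":
    cleric = Caster("Mira", con_save_bonus=2, concentrating_on="Bless")
    held = check_concentration(cleric, damage=30, d20_roll=12)
    print(held, cleric.concentrating_on)
```

Because the failed save clears the spell in the same function that checks it, there is no path where concentration "silently survives."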

No automatic character sheet. You maintain HP, spell slots, conditions, and inventory by hand. This is fine for one or two sessions and exhausting by session five.

No real multiplayer. ChatGPT is one-on-one. There are workarounds (one person plays "host" and relays for others) but they're awkward enough that they usually don't survive past the first session.

Combat drift. This is what kills most ChatGPT campaigns. Combat is the densest application of rules in D&D, and it's where LLM-as-DM systems fail most visibly. The AI loses track of who has already acted, which conditions are active, how much HP each combatant has left, and whether a caster is still concentrating on a spell. You can correct it manually, but at that point you're DMing the AI, not playing.
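None of this state is hard for ordinary code to hold. As a sketch (hypothetical class names, not TableForge's implementation), here is a bare-bones Python encounter tracker that keeps turn order, round count, HP, and conditions in one place. It makes two simplifying assumptions: initiative ties keep the order the combatants were listed in, and HP simply bottoms out at zero with no death-save handling.

```python
from dataclasses import dataclass, field


@dataclass
class Combatant:
    name: str
    initiative: int
    hp: int
    max_hp: int
    conditions: set = field(default_factory=set)


class Encounter:
    def __init__(self, combatants):
        # Highest initiative first. sorted() is stable, so ties keep input order
        # (a simplification; tables often break ties another way).
        self.order = sorted(combatants, key=lambda c: c.initiative, reverse=True)
        self.turn_index = 0
        self.round = 1

    @property
    def current(self):
        return self.order[self.turn_index]

    def find(self, name):
        for combatant in self.order:
            if combatant.name == name:
                return combatant
        raise KeyError(name)

    def end_turn(self):
        """Advance to the next combatant, starting a new round after the last."""
        self.turn_index += 1
        if self.turn_index == len(self.order):
            self.turn_index = 0
            self.round += 1
        return self.current

    def damage(self, name, amount):
        """Subtract HP, never dropping below zero, and return the new HP."""
        target = self.find(name)
        target.hp = max(0, target.hp - amount)
        return target.hp

    def add_condition(self, name, condition):
        self.find(name).conditions.add(condition)


if __name__ == "__main__":
    enc = Encounter([Combatant("Goblin", 12, 7, 7), Combatant("Mira", 17, 24, 24)])
    print(enc.current.name, enc.round)
    enc.damage("Goblin", 5)
    enc.end_turn()
    print(enc.current.name, enc.find("Goblin").hp)
```

Whose turn it is becomes a lookup, not a recollection, which is exactly the part a chat transcript can't guarantee.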

How a purpose-built AI DM is different

A tool built specifically for running tabletop campaigns handles these problems architecturally, not by hoping the LLM gets it right.

In TableForge's case: a dedicated rules engine (real code, not an LLM) adjudicates dice, combat, conditions, and spell slots. Character sheets update automatically based on engine output. Campaign memory is persistent across sessions in structured form: NPCs, locations, decisions, quest state, consequences. Multiplayer is first-class, supporting up to six players in real-time or async. The LLM still does what it's good at (narration, NPC voice, atmosphere, reacting to your choices) and stops being asked to do what it isn't good at.
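The general shape of that split can be sketched in a few lines of Python. To be clear, this is an illustration of the pattern, not TableForge's code: resolve_attack, AttackResult, and narration_prompt are hypothetical names, and the rules are simplified (a natural 20 always hits and a natural 1 always misses, but critical-hit damage is left out). Code decides what happened, and the language model only receives the finished outcome to describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackResult:
    attacker: str
    target: str
    hit: bool
    damage: int
    target_hp_after: int


def resolve_attack(attacker, target, attack_roll, attack_bonus,
                   armor_class, damage_roll, target_hp):
    """Decide the outcome in code. Dice values are passed in already rolled."""
    if attack_roll == 20:
        hit = True            # natural 20 always hits
    elif attack_roll == 1:
        hit = False           # natural 1 always misses
    else:
        hit = attack_roll + attack_bonus >= armor_class
    damage = damage_roll if hit else 0
    return AttackResult(attacker, target, hit, damage, max(0, target_hp - damage))


def narration_prompt(result):
    """Build the text handed to the narrator; the numbers are already final."""
    outcome = f"hits for {result.damage} damage" if result.hit else "misses"
    return (
        "Narrate this outcome without changing any numbers: "
        f"{result.attacker} attacks {result.target} and {outcome}. "
        f"{result.target} has {result.target_hp_after} HP left."
    )


if __name__ == "__main__":
    result = resolve_attack("Mira", "Goblin", attack_roll=14, attack_bonus=5,
                            armor_class=15, damage_roll=8, target_hp=7)
    print(narration_prompt(result))
```

The result object is frozen, so nothing downstream of the engine can quietly revise it. The narrator can be as dramatic as it likes about a miss, but it can't turn it into a hit.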

The point isn't that ChatGPT is bad. The point is that running a long campaign is a different problem than chatting, and it benefits from infrastructure built for it.

How to decide

If you want to play D&D for an evening with a friend and don't care if the rules drift, try ChatGPT first. It costs nothing extra if you already have a subscription, and there's nothing to install.

If you want to run a campaign that holds together across sessions, with real dice and a working character sheet and the option to invite friends, you want a purpose-built tool. There are a few in the category. We make one.

A pragmatic order of operations: try ChatGPT for an hour. Notice where it works and where it doesn't. If the category interests you, try a free tier of a dedicated tool. Pick the one that runs your game the way you want it run.

Your first session on TableForge is free. Try it.