Lesson 6: Terminal Protocol
Terminal applications don't write "move cursor to row 5, column 10, print 'hello' in bold red." They write escape sequences: byte strings that the terminal emulator interprets. After this lesson, you will understand the VT100 state machine, be able to parse any CSI sequence by hand, and be able to build a VT parser that correctly handles the byte stream that Ghostty consumes.
Why a State Machine?
The terminal protocol is in-band. The same byte stream contains printable text AND control commands. A byte A (0x41) could be a literal 'A' or part of a cursor movement sequence (\e[10A = move cursor up 10 lines). The parser must track which context it's in.
stateDiagram-v2
[*] --> Ground
Ground --> Escape: 0x1B (ESC)
note right of Ground: any other byte → emit char to grid
Escape --> CSI_Entry: '[' (0x5B)
Escape --> OSC_String: ']' (0x5D)
note right of Escape: any other byte → exec ESC command<br/>then return to Ground
CSI_Entry --> CSI_Param: 0x30-0x39 (digits)
CSI_Entry --> CSI_Param: 0x3B (semicolon)
CSI_Entry --> CSI_Intermediate: 0x20-0x2F
note right of CSI_Entry: 0x40-0x7E → exec CSI<br/>then return to Ground
CSI_Param --> CSI_Param: 0x30-0x39 (digits)
CSI_Param --> CSI_Param: 0x3B (semicolon)
CSI_Param --> CSI_Intermediate: 0x20-0x2F
note right of CSI_Param: 0x40-0x7E → exec CSI<br/>then return to Ground
CSI_Intermediate --> CSI_Intermediate: 0x20-0x2F
note right of CSI_Intermediate: 0x40-0x7E → exec CSI<br/>then return to Ground
The core idea: the parser is always in exactly one state. The next byte determines the transition. This is a deterministic finite automaton (DFA).
The canonical reference: Paul Williams' "A parser for DEC's ANSI-compatible video terminals" (vtparse) defines the state machine. Ghostty's StreamHandler.zig implements a Zig version. The states, transitions, and actions are standardized. Your job is to implement them correctly, not cleverly.
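A minimal sketch of that idea in C, covering only the Ground, Escape and CSI states. All names here (`demo_state_t`, `demo_parser_t`, `demo_feed`) are hypothetical and not taken from Ghostty or vtparse. The sketch also simplifies: C0 controls, OSC, DCS and the other ESC commands are left out, and a "dispatch" only records the final byte.

```c
#include <stdint.h>

/* Hypothetical reduced state set: Ground, Escape and the CSI states only. */
typedef enum {
    ST_GROUND, ST_ESCAPE, ST_CSI_ENTRY, ST_CSI_PARAM,
    ST_CSI_INTERMEDIATE, ST_CSI_IGNORE
} demo_state_t;

typedef struct {
    demo_state_t state;
    int printed;         /* bytes that would be written to the grid */
    int dispatched;      /* CSI sequences that completed */
    uint8_t last_final;  /* final byte of the most recent CSI sequence */
} demo_parser_t;

/* Feed one byte: the current state plus this byte decide the next state. */
static void demo_feed(demo_parser_t *p, uint8_t b)
{
    if (b == 0x1B) {                 /* ESC restarts a sequence from any state */
        p->state = ST_ESCAPE;
        return;
    }
    switch (p->state) {
    case ST_GROUND:
        if (b >= 0x20 && b != 0x7F)  /* printable byte; C0 handling omitted */
            p->printed++;
        break;
    case ST_ESCAPE:                  /* only '[' is recognised in this sketch */
        p->state = (b == '[') ? ST_CSI_ENTRY : ST_GROUND;
        break;
    case ST_CSI_ENTRY:
    case ST_CSI_PARAM:
    case ST_CSI_INTERMEDIATE:
    case ST_CSI_IGNORE:
        if (b >= 0x40 && b <= 0x7E) {            /* final byte */
            if (p->state != ST_CSI_IGNORE) {
                p->dispatched++;
                p->last_final = b;
            }
            p->state = ST_GROUND;
        } else if (b >= 0x20 && b <= 0x2F) {     /* intermediate byte */
            if (p->state != ST_CSI_IGNORE)
                p->state = ST_CSI_INTERMEDIATE;
        } else if (b >= 0x30 && b <= 0x3F) {     /* parameter byte */
            if (p->state == ST_CSI_INTERMEDIATE)
                p->state = ST_CSI_IGNORE;        /* param after intermediate is invalid */
            else if (p->state != ST_CSI_IGNORE)
                p->state = ST_CSI_PARAM;
        }
        break;
    }
}
```

Feeding the bytes of `A\e[10AB` through `demo_feed` prints two characters and dispatches one CSI sequence whose final byte is `A`: the same byte value is text in Ground and a command inside CSI.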
The State Machine States
| State | Meaning | Exit condition |
|---|---|---|
| Ground | Normal text | ESC (0x1B) moves to Escape; other C0 controls (0x00-0x1F) execute immediately and stay in Ground |
| Escape | Saw ESC, deciding what's next | Next byte: [ → CSI, ] → OSC, P → DCS, ( etc → charset select |
| CSI Entry | Saw ESC [, no parameter bytes yet | Digit, ;, or private marker (0x3C-0x3F, e.g. ? for DEC private modes) → Param; 0x20-0x2F → Intermediate; final byte in 0x40-0x7E executes the sequence |
| CSI Param | Collecting parameter bytes | Final byte executes; digits and ; stay in Param; 0x20-0x2F (space, $, etc.) → Intermediate |
| CSI Intermediate | Collecting intermediate bytes (0x20-0x2F), e.g. the space in \e[2 q, which sets the cursor style | Final byte executes; a parameter byte here is invalid → Ignore |
| CSI Ignore | Invalid sequence: discard bytes until the final byte | Final byte returns to Ground (no action) |
| DCS Entry/Param/Data | Device Control String (e.g. \eP...\e\) | ST (\e\) terminates |
| OSC String | Operating System Command (e.g. set window title) | ST or BEL terminates |
| SOS/PM/APC String | Rare string types | ST terminates |
C0 and C1 Control Characters
C0 (0x00-0x1F): single-byte controls embedded in the text stream.
| Byte | Name | Meaning |
|---|---|---|
| \0 | NUL | Ignored (padding) |
| \a | BEL | Bell (audible beep or visual flash) |
| \b | BS | Backspace (move cursor left) |
| \t | HT | Horizontal tab (move to next tab stop) |
| \n | LF | Line feed (move cursor down, may scroll) |
| \v | VT | Vertical tab |
| \f | FF | Form feed (clear screen sometimes) |
| \r | CR | Carriage return (move cursor to column 0) |
| \x0E | SO | Shift Out (select G1 character set) |
| \x0F | SI | Shift In (select G0 character set) |
| \x1B | ESC | Escape: starts an escape sequence |
| \x7F | DEL | Delete (ignored in terminal protocol) |
C1 (0x80-0x9F): 8-bit equivalents of escape sequences. Rarely used; 7-bit ESC-prefixed forms are standard.
CR vs LF: carriage return moves the cursor to the leftmost column. Line feed moves the cursor down one row (scrolling if at the bottom). Unix uses \n (LF) for newlines. Terminals on Unix usually have ONLCR set, so an output LF is translated to CR+LF. This is why raw text files look correct in the terminal: the line discipline adds the CR before the bytes reach the emulator.
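A short sketch of how a parser might execute these controls against a cursor. The struct and function names are hypothetical, tab stops are assumed to sit every 8 columns, and scrolling is reduced to a counter. Note that LF leaves the column alone: the CR arrives as its own byte.

```c
#include <stdint.h>

/* Hypothetical cursor model; rows/cols are the grid size, scrolled counts scroll-ups. */
typedef struct { int row, col, rows, cols, scrolled; } demo_cursor_t;

/* Executes a C0 control. Returns 1 if the byte was handled here, 0 otherwise. */
static int demo_exec_c0(demo_cursor_t *c, uint8_t b)
{
    switch (b) {
    case '\r':                       /* CR: leftmost column, row unchanged */
        c->col = 0;
        return 1;
    case '\n':                       /* LF: down one row, column unchanged */
        if (c->row == c->rows - 1)
            c->scrolled++;           /* at the bottom: scroll instead of moving */
        else
            c->row++;
        return 1;
    case '\b':                       /* BS: left one column, stops at column 0 */
        if (c->col > 0)
            c->col--;
        return 1;
    case '\t': {                     /* HT: next tab stop (assumed every 8 columns) */
        int next = (c->col / 8 + 1) * 8;
        c->col = next < c->cols ? next : c->cols - 1;
        return 1;
    }
    default:
        return 0;                    /* not a control this sketch knows */
    }
}
```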
CSI Sequences (Control Sequence Introducer)
Format: ESC [ followed by parameter bytes (0x30-0x3F), intermediate bytes (0x20-0x2F), and a final byte (0x40-0x7E).
\e[<params><intermediates><final>
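To make the format concrete, here is a sketch that splits the bytes following ESC [ into a private marker, numeric parameters, an intermediate byte and the final byte. It works on a complete buffer rather than byte by byte, which a real streaming parser would not do. `demo_csi_t` and `demo_parse_csi` are hypothetical names, and the limit of 16 parameters and the clamp at 65535 are assumptions of this sketch.

```c
#include <stddef.h>

#define DEMO_MAX_PARAMS 16

typedef struct {
    int params[DEMO_MAX_PARAMS];  /* -1 marks an omitted parameter */
    int count;
    char private_marker;          /* byte in 0x3C-0x3F such as '?', or 0 */
    char intermediate;            /* last byte in 0x20-0x2F, or 0 */
    char final;                   /* byte in 0x40-0x7E, or 0 if malformed */
} demo_csi_t;

/* Parses the bytes after ESC [ (the ESC [ itself is not part of s). */
static demo_csi_t demo_parse_csi(const char *s, size_t len)
{
    demo_csi_t out = { {0}, 0, 0, 0, 0 };
    size_t i = 0;
    int cur = -1;                 /* -1 until the first digit of a parameter */

    if (i < len && s[i] >= 0x3C && s[i] <= 0x3F)
        out.private_marker = s[i++];

    for (; i < len; i++) {
        char b = s[i];
        if (b >= '0' && b <= '9') {
            cur = (cur < 0 ? 0 : cur) * 10 + (b - '0');
            if (cur > 65535)
                cur = 65535;      /* clamp so hostile input cannot overflow */
        } else if (b == ';') {
            if (out.count < DEMO_MAX_PARAMS)
                out.params[out.count++] = cur;
            cur = -1;
        } else {
            break;
        }
    }
    if (out.count < DEMO_MAX_PARAMS)
        out.params[out.count++] = cur;  /* the parameter before the final byte */

    for (; i < len && s[i] >= 0x20 && s[i] <= 0x2F; i++)
        out.intermediate = s[i];

    if (i + 1 == len && s[i] >= 0x40 && s[i] <= 0x7E)
        out.final = s[i];
    return out;
}
```

For the input `10;20H` this yields parameters 10 and 20 with final byte `H`; for `?25h` it yields private marker `?`, parameter 25 and final byte `h`. Keeping omitted parameters as -1 lets each command apply its own default later.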
Cursor movement and erase
| Sequence | Action |
|---|---|
| \e[nA | CUU: cursor up n lines |
| \e[nB | CUD: cursor down n lines |
| \e[nC | CUF: cursor forward n columns |
| \e[nD | CUB: cursor back n columns |
| \e[nE | CNL: cursor next line (down n, column 0) |
| \e[nF | CPL: cursor previous line (up n, column 0) |
| \e[nG | CHA: cursor horizontal absolute (column n, 1-indexed) |
| \e[n;mH | CUP: cursor position (row n, column m, 1-indexed) |
| \e[nJ | ED: erase in display (0 = cursor to end, 1 = start to cursor, 2 = all, 3 = scrollback) |
| \e[nK | EL: erase in line (0 = cursor to end, 1 = start to cursor, 2 = all) |
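The movement sequences in the table can be sketched as one dispatch function. The names are hypothetical, the cursor is stored 0-indexed internally, and scroll margins and origin mode are ignored. Two details are easy to get wrong: an omitted or zero parameter means 1 for these sequences, and the cursor is clamped to the grid rather than wrapping.

```c
/* Hypothetical screen model: cursor position is 0-indexed internally. */
typedef struct { int row, col, rows, cols; } demo_screen_t;

static int demo_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* n and m are the raw CSI parameters; pass -1 for an omitted parameter. */
static void demo_csi_cursor(demo_screen_t *s, char final, int n, int m)
{
    int n1 = n <= 0 ? 1 : n;   /* omitted or 0 defaults to 1 */
    int m1 = m <= 0 ? 1 : m;

    switch (final) {
    case 'A': s->row -= n1; break;                       /* CUU */
    case 'B': s->row += n1; break;                       /* CUD */
    case 'C': s->col += n1; break;                       /* CUF */
    case 'D': s->col -= n1; break;                       /* CUB */
    case 'E': s->row += n1; s->col = 0; break;           /* CNL */
    case 'F': s->row -= n1; s->col = 0; break;           /* CPL */
    case 'G': s->col = n1 - 1; break;                    /* CHA, 1-indexed on the wire */
    case 'H': s->row = n1 - 1; s->col = m1 - 1; break;   /* CUP, 1-indexed on the wire */
    default:  return;                                    /* not a movement sequence */
    }
    s->row = demo_clamp(s->row, 0, s->rows - 1);
    s->col = demo_clamp(s->col, 0, s->cols - 1);
}
```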
SGR — Select Graphic Rendition
\e[<params>m
Each parameter sets a display attribute. Parameters are applied in sequence. An empty parameter means 0 (reset).
| Code | Effect |
|---|---|
| 0 | Reset all attributes |
| 1 | Bold |
| 2 | Dim (faint) |
| 3 | Italic |
| 4 | Underline |
| 5 | Slow blink |
| 7 | Inverse (swap fg/bg) |
| 8 | Invisible (hidden) |
| 9 | Strikethrough |
| 21 | Double underline |
| 22 | Normal intensity (not bold/dim) |
| 23 | Not italic |
| 24 | Not underlined |
| 25 | Not blinking |
| 27 | Not inverse |
| 28 | Not hidden |
| 29 | Not strikethrough |
| 30-37 | Foreground color (standard: black, red, green, yellow, blue, magenta, cyan, white) |
| 38;5;n | Foreground 256-color (n = 0-255) |
| 38;2;r;g;b | Foreground true color (24-bit RGB) |
| 39 | Default foreground |
| 40-47 | Background color (standard) |
| 48;5;n | Background 256-color |
| 48;2;r;g;b | Background true color |
| 49 | Default background |
| 53 | Overline |
| 55 | Not overline |
| 90-97 | Bright foreground |
| 100-107 | Bright background |
Accumulate, then render. SGR parameters are cumulative. \e[1;4m is bold + underline. \e[1m\e[4m also produces bold + underline. The parser should maintain an attribute state. Each SGR sequence modifies the state. When characters are printed, they use the current attribute state. Don't re-render on every SGR — render when characters arrive.
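A sketch of that attribute state and the function that folds one SGR parameter list into it. The type and flag names are hypothetical and only a subset of the codes in the table is handled. It assumes parameters arrive as an int array with -1 for omitted values, and it handles only the semicolon-separated form of the extended colors.

```c
#include <stdint.h>

/* Hypothetical attribute model for this sketch; a real terminal tracks more. */
enum { DEMO_BOLD = 1, DEMO_DIM = 2, DEMO_ITALIC = 4, DEMO_UNDERLINE = 8, DEMO_INVERSE = 16 };

typedef struct {
    uint8_t kind;       /* 0 = default, 1 = palette index stored in r, 2 = RGB */
    uint8_t r, g, b;
} demo_color_t;

typedef struct {
    unsigned flags;
    demo_color_t fg, bg;
} demo_attrs_t;

static demo_color_t demo_palette(int index)
{
    demo_color_t c = { 1, (uint8_t)index, 0, 0 };
    return c;
}

static demo_color_t demo_rgb(int r, int g, int b)
{
    demo_color_t c = { 2, (uint8_t)r, (uint8_t)g, (uint8_t)b };
    return c;
}

/* Applies the parameters of one SGR sequence, left to right, to the state in *a. */
static void demo_apply_sgr(demo_attrs_t *a, const int *p, int n)
{
    const demo_attrs_t reset = { 0, { 0, 0, 0, 0 }, { 0, 0, 0, 0 } };

    if (n == 0) { *a = reset; return; }      /* \e[m behaves like \e[0m */
    for (int i = 0; i < n; i++) {
        int c = p[i] < 0 ? 0 : p[i];         /* omitted parameter means 0 */
        switch (c) {
        case 0:  *a = reset; break;
        case 1:  a->flags |= DEMO_BOLD; break;
        case 2:  a->flags |= DEMO_DIM; break;
        case 3:  a->flags |= DEMO_ITALIC; break;
        case 4:  a->flags |= DEMO_UNDERLINE; break;
        case 7:  a->flags |= DEMO_INVERSE; break;
        case 22: a->flags &= ~(unsigned)(DEMO_BOLD | DEMO_DIM); break;
        case 23: a->flags &= ~(unsigned)DEMO_ITALIC; break;
        case 24: a->flags &= ~(unsigned)DEMO_UNDERLINE; break;
        case 27: a->flags &= ~(unsigned)DEMO_INVERSE; break;
        case 39: a->fg = reset.fg; break;
        case 49: a->bg = reset.bg; break;
        case 38:
        case 48: {                           /* extended color consumes extra params */
            demo_color_t *dst = (c == 38) ? &a->fg : &a->bg;
            if (i + 2 < n && p[i + 1] == 5) {
                *dst = demo_palette(p[i + 2]);
                i += 2;
            } else if (i + 4 < n && p[i + 1] == 2) {
                *dst = demo_rgb(p[i + 2], p[i + 3], p[i + 4]);
                i += 4;
            } else {
                return;                      /* truncated: stop rather than misread the rest */
            }
            break;
        }
        default:
            if (c >= 30 && c <= 37)        a->fg = demo_palette(c - 30);
            else if (c >= 40 && c <= 47)   a->bg = demo_palette(c - 40);
            else if (c >= 90 && c <= 97)   a->fg = demo_palette(c - 90 + 8);
            else if (c >= 100 && c <= 107) a->bg = demo_palette(c - 100 + 8);
            break;
        }
    }
}
```

The point to notice is the index skip inside the 38/48 case: the values after `38;2` are color components, not attribute codes, so a parser that treats every parameter independently would misread them.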
DEC Private Modes
CSI sequences with ? prefix: \e[?<params><final>.
| Sequence | Action |
|---|---|
| \e[?25h | DECTCEM: show cursor |
| \e[?25l | DECTCEM: hide cursor |
| \e[?1049h | Alternate screen buffer (enter) |
| \e[?1049l | Alternate screen buffer (exit) |
| \e[?1000h | Enable mouse tracking (button events) |
| \e[?1002h | Enable mouse tracking (button + motion) |
| \e[?1003h | Enable mouse tracking (any motion) |
| \e[?1000l / 1002l / 1003l | Disable mouse tracking |
| \e[?2004h | Bracketed paste mode |
| \e[?2004l | Disable bracketed paste |
The DEC private modes are critical for terminal applications. vim uses ?1049h to enter the alternate screen. tmux uses ?1000h for mouse support. Ghostty must handle every one of these correctly.
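A sketch of how the modes in the table might be stored and toggled once the CSI parser has seen the `?` marker and a final byte of `h` or `l`. The struct is hypothetical, and collapsing the three mouse modes into one field is a simplification made here, not necessarily how Ghostty tracks them.

```c
#include <stdbool.h>

/* Hypothetical mode store covering only the modes listed in the table. */
typedef struct {
    bool cursor_visible;
    bool alt_screen;
    bool bracketed_paste;
    int mouse_mode;        /* 0 = off, otherwise 1000, 1002 or 1003 */
} demo_modes_t;

/* final is 'h' (set) or 'l' (reset). Returns false for anything unrecognised. */
static bool demo_set_private_mode(demo_modes_t *m, int mode, char final)
{
    bool on;
    if (final == 'h')      on = true;
    else if (final == 'l') on = false;
    else                   return false;

    switch (mode) {
    case 25:   m->cursor_visible = on;  return true;
    case 1049: m->alt_screen = on;      return true;
    case 2004: m->bracketed_paste = on; return true;
    case 1000:
    case 1002:
    case 1003:
        if (on)
            m->mouse_mode = mode;
        else if (m->mouse_mode == mode)
            m->mouse_mode = 0;          /* resetting a mode that is not active is a no-op */
        return true;
    default:
        return false;                   /* unknown mode: ignore rather than guess */
    }
}
```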
OSC Sequences (Operating System Command)
Format: ESC ] followed by the command and body, terminated by ST (\e\) or BEL (\a).
\e]<command>;<body>\e\ (ST-terminated)
\e]<command>;<body>\a (BEL-terminated)
| Command | Meaning | Body |
|---|---|---|
| 0, 1, 2 | Set window/icon title | Text string |
| 4 | Set/query color palette | Color index;color spec |
| 7 | Set current directory | file:// URL |
| 8 | Set hyperlink | params;URI (empty URI to close) |
| 10, 11 | Set/get foreground/background color | Color spec |
| 52 | Clipboard access (read/write) | Base64 data |
| 133 | Shell integration | ;A = prompt start, ;B = prompt end (command input starts), ;C = command output starts, ;D = command finished (exit code follows) |
OSC 52 (clipboard) is a security boundary. Ghostty requires explicit user action (or configuration) to allow programs to read the clipboard. Without this gate, any program whose output reaches the terminal, including a remote process over SSH, could emit an OSC 52 query and receive the clipboard contents in reply.
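Once the terminator has been seen, an OSC payload still has to be split into its command number and body. A minimal sketch, assuming the bytes between ESC ] and the terminator have already been collected into a buffer; `demo_osc_t` and `demo_parse_osc` are hypothetical names, and the body is returned as a pointer into that buffer rather than copied.

```c
#include <stddef.h>

typedef struct {
    int command;        /* e.g. 2 for "set window title" */
    const char *body;   /* points into the payload, not NUL-terminated */
    size_t body_len;
} demo_osc_t;

/* Parses "<command>;<body>". Returns 0 on success, -1 if the payload is malformed. */
static int demo_parse_osc(const char *payload, size_t len, demo_osc_t *out)
{
    size_t i = 0;
    int cmd = 0;

    if (len == 0 || payload[0] < '0' || payload[0] > '9')
        return -1;                       /* must start with a decimal number */
    while (i < len && payload[i] >= '0' && payload[i] <= '9') {
        if (cmd > 9999)
            return -1;                   /* implausibly large command number */
        cmd = cmd * 10 + (payload[i] - '0');
        i++;
    }
    if (i < len && payload[i] != ';')
        return -1;                       /* junk between the number and the body */
    if (i < len)
        i++;                             /* skip the ';' separator */

    out->command = cmd;
    out->body = payload + i;
    out->body_len = len - i;
    return 0;
}
```

For the payload `2;hello` this gives command 2 with a five-byte body. A gate such as the OSC 52 check described above would sit after this step, keyed on `command`.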
DCS Sequences (Device Control String)
Format: ESC P followed by parameter bytes, optional intermediate bytes, a final byte that selects the command, and then the data, terminated by ST.
| Sequence | Purpose |
|---|---|
| \eP+q...\e\ | Request termcap/terminfo strings (used by neovim) |
| \eP$q...\e\ | DECRQSS: request status string (e.g. what are the current SGR attributes or scroll margins?) |
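| \eP!\|...\e\ | DECRPTUI: terminal unit ID report, sent by the terminal in reply to a tertiary device attributes request |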
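Sixel graphics are carried in DCS sequences (\ePq...\e\): each data byte encodes a vertical strip of six pixels, and the decoded image is drawn over grid cells. Ghostty's own image support is built on the Kitty graphics protocol, which uses APC sequences instead. Either way, this is how terminal image protocols work: not as a separate channel, but as specially interpreted escape sequences in the text stream.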
The Kitty Keyboard Protocol
The standard terminal protocol encodes keyboard input poorly. Modifiers (Ctrl, Alt, Super) are hacky (Ctrl+A = byte 0x01, but Ctrl+Shift+A = ?). The kitty keyboard protocol fixes this with explicit key event encoding:
\e[<key>;<modifiers>u // Key press
\e[<key>;<modifiers>:<action>u // Key event with action (press/repeat/release)
| Modifier bit | Key |
|---|---|
| 0x01 | Shift |
| 0x02 | Alt |
| 0x04 | Ctrl |
| 0x08 | Super (Windows/Command key) |
| 0x10 | Hyper |
| 0x20 | Meta |
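Each physical key has a Unicode codepoint (e.g., 'a' = 97). The kitty protocol sends the base key plus modifier flags. On the wire, the modifiers field is 1 plus the sum of the bits above, so Ctrl+A is \e[97;5u and Ctrl+Shift+A is \e[97;6u. Ghostty can distinguish Ctrl+Shift+A from Ctrl+A from plain A, which the traditional protocol cannot do.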
Ghostty uses the kitty protocol for input reporting. This enables key combinations like Ctrl+Shift+C (copy) to be distinguished from Ctrl+C (SIGINT). The terminal emulator maps these to application-level actions without ambiguity.
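A sketch of the encoding side, limited to the plain `\e[<key>;<modifiers>u` form without the action field. The function name and the `KM_` constants are hypothetical. Whether a given key is reported in this form at all depends on which enhancement flags the application has requested, which this sketch does not model.

```c
#include <stdio.h>

/* Modifier bits from the table above (names are this sketch's own). */
enum { KM_SHIFT = 0x01, KM_ALT = 0x02, KM_CTRL = 0x04, KM_SUPER = 0x08 };

/*
 * Writes the CSI u encoding of a key into buf and returns snprintf's result.
 * codepoint is the unshifted key (97 for the A key); mods is a bitmask.
 */
static int demo_encode_key(char *buf, size_t size, unsigned codepoint, unsigned mods)
{
    if (mods == 0)                       /* modifiers field may be omitted when empty */
        return snprintf(buf, size, "\x1b[%uu", codepoint);
    /* the transmitted value is 1 + bitmask, so "no modifiers" would be 1 */
    return snprintf(buf, size, "\x1b[%u;%uu", codepoint, mods + 1);
}
```

With this, Ctrl+A and Ctrl+Shift+A produce different byte strings (`\e[97;5u` and `\e[97;6u`), whereas the legacy encoding collapses both to the single byte 0x01.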
Project: vt-parse
Build a VT parser from scratch:
- State machine: implement the full DFA (Ground, Escape, CSI Entry/Param/Intermediate/Ignore, OSC, DCS, SOS/PM/APC)
- CSI dispatch: parse parameters (numbers separated by ;, with defaults for omitted params), private markers and intermediate bytes, and call the appropriate action
- SGR state: maintain attribute state (fg, bg, bold, italic, underline, inverse). Update on SGR sequences. Apply to characters written to the grid.
- Terminal grid: fixed-size 2D array of cells. Each cell: codepoint, fg color, bg color, attribute flags
- Test it: pipe the output of ls --color=auto, vim, htop, and cat of a file with ANSI art through your parser and verify the grid state matches the expected output
// Minimal interface
#include <stdint.h>

typedef enum { GROUND, ESCAPE, CSI_ENTRY, CSI_PARAM, CSI_INTERMEDIATE, CSI_IGNORE,
               OSC_STRING, DCS_ENTRY, DCS_PARAM, DCS_DATA, DCS_PASSTHROUGH,
               SOS_PM_APC_STRING } vt_state_t;

// Parser context: current vt_state_t, accumulated params, grid. Fields are yours to design.
typedef struct vt vt_t;

// Called for each byte from the PTY master. Updates vt->state, accumulates params,
// and dispatches actions (print_char, cursor_up, erase_display, set_sgr, etc.)
void vt_parse_byte(vt_t *vt, uint8_t byte);
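For the grid part of the project, one possible starting point is sketched below. Everything here is an assumption made for illustration: the 24x80 size, the packed 32-bit colors, the field names, and the choice to wrap lazily (the cursor may rest one column past the right edge until the next character arrives). Scrolling is left out.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_ROWS 24
#define DEMO_COLS 80

/* One cell of the grid: what to draw and how to draw it. */
typedef struct {
    uint32_t codepoint;
    uint32_t fg, bg;      /* packed colors; the encoding is up to you */
    uint16_t flags;       /* bold, italic, underline, inverse, ... */
} demo_cell_t;

typedef struct {
    demo_cell_t cells[DEMO_ROWS][DEMO_COLS];
    int row, col;                 /* col may equal DEMO_COLS: wrap is pending */
    uint32_t cur_fg, cur_bg;      /* current SGR state, copied into each new cell */
    uint16_t cur_flags;
} demo_grid_t;

static void demo_grid_init(demo_grid_t *g)
{
    memset(g, 0, sizeof *g);
}

/* Writes one character at the cursor using the current attribute state. */
static void demo_grid_put(demo_grid_t *g, uint32_t codepoint)
{
    if (g->col >= DEMO_COLS) {            /* wrap only when another char arrives */
        g->col = 0;
        if (g->row < DEMO_ROWS - 1)
            g->row++;                     /* scrolling at the bottom is omitted */
    }
    demo_cell_t *c = &g->cells[g->row][g->col];
    c->codepoint = codepoint;
    c->fg = g->cur_fg;
    c->bg = g->cur_bg;
    c->flags = g->cur_flags;
    g->col++;
}
```

Copying the attribute state into the cell at print time is what makes "accumulate, then render" work: SGR sequences only touch `cur_fg`, `cur_bg` and `cur_flags`, and existing cells keep whatever they were written with.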
Ghostty Source to Study
| File | What to study |
|---|---|
| src/termio/StreamHandler.zig | VT parser state machine: the full DFA implementation |
| src/terminal/Terminal.zig | Terminal state: grid, cursor, margins, SGR attributes, tab stops |
| src/terminfo.zig | Terminfo database loading and response generation |
Self-Check
Can you:
- Draw the VT100 state machine (all 10+ states and transitions) from memory
- Parse \e[38;2;255;128;0;1;4mHello\e[0m by hand: what colors is "Hello" rendered in, and with what attributes?
- Explain the difference between CSI, OSC, and DCS: prefix, purpose, and termination
- Why does \e[?1049h cause "the screen to clear" when entering vim? (It switches to the alternate screen buffer.)
- Write the CSI sequence to move the cursor to row 10, column 20
- Explain why the kitty keyboard protocol exists and what problem it solves
- Trace the byte sequence of a key press through the parser: from keyboard input to grid update