16. Turing Machine Encoding#
Introduction#
This notebook explores key ideas in computational theory through Turing Machine encodings and universal computation. We begin by outlining how Turing Machines can be encoded to standardize the representation of computation. Next, we examine two example languages, ALAN and MATHISON, that highlight how computational problems can be framed within this model.
1. The Encoding of Turing Machines#
A Turing machine can be encoded as a string of symbols, allowing us to represent any Turing machine in a standardized format. This encoding is crucial for theoretical computer science as it enables us to treat Turing machines as data that can be processed by other Turing machines.
1.1 Table-Based Encoding Approach#
The encoding process consists of three main steps:
Step 1: Assign numeric codes to states
Step 2: Convert the Turing machine into a Tabular Representation
Step 3: Encode each transition using ‘a’ and ‘b’ Representation
Step 4: Concatenate all individual row encodings into one unified string that encodes the entire Turing Machine
1.1.1 Assigning numeric codes to states#
When encoding a Turing Machine (TM) into a formal representation, it is useful to assign numeric identifiers to each state in the machine. This is a foundational step that supports further encoding processes, especially when we aim to represent the TM as a string of symbols or binary digits. To maintain consistency and simplify the encoding, we adopt the following labeling convention:
Start State: Always labeled as state 1. This is the initial state where the machine begins execution.
Halt State(s): Always labeled as state 2. This includes any accepting final states. In machines with multiple halting conditions, they can be collapsed into a single halt state for simplicity.
Other States: Labeled incrementally from state 3 onward. These represent the working states of the machine. The ordering of the non-special states does not matter, as long as:
Each state has a unique label.
The transitions reference the correct labels.
The same labels are used consistently throughout the machine’s definition.
This flexibility in ordering makes it easier to write or manipulate TMs programmatically, without worrying about the specific sequence of states.
Assigning numeric labels serves several important purposes:
Standardization: Provides a consistent way to reference and distinguish states.
Simplifies Encoding: Numeric values can be easily encoded in binary, unary, or any symbolic format.
Supports Automation: Makes it easier to build simulators, compilers, or universal machines that operate on encoded TMs.
Reduces Ambiguity: Prevents confusion that can arise from using arbitrary or descriptive state names like q_start, q_accept.
1.1.2 Converting the Turing machine into a Tabular Representation#
Just like finite automata (FAs) and pushdown automata (PDAs), Turing Machines (TMs) do not need to be represented solely using diagrams. Instead, we can describe TMs using a summary table: a compact, structured representation that captures the transition behavior of the machine in a clear and standardized way. After we assign numeric labels to all states, we convert the Turing machine into a standard tabular format. Each row represents one transition rule with five columns:
From: The current state (numeric)
To: The next state (numeric)
Read: The symbol being read from the tape
Write: The symbol to write to the tape
Move: The direction to move the tape head (L for Left or R for Right)
Example
graph LR
accTitle: A TM
accDescr: a diagram representing a TM to be converted into a tabular representation
q1((START))
q2((q2))
q3((HALT))
q1 -->|a,a,R| q1
q1 -->|Δ,Δ,R| q1
q1 -->|b,a,R| q2
q2 -->|b,b,L| q2
q2 -->|Δ,b,L| q3
style q1 fill:#90EE90,stroke:#333,stroke-width:3px
style q2 fill:#FFB6C1,stroke:#333,stroke-width:3px
style q3 fill:#87CEEB,stroke:#333,stroke-width:2px
We start by assigning numeric codes to states as follows:
State 1: Start state (originally labeled “START”)
State 2: Halt state (originally labeled “HALT”)
State 3: Middle working state (originally labeled “q2”)
So the machine becomes:
graph LR
accTitle: A TM
accDescr: a diagram representing a modifed TM with assigned numeric codes to states
q1((1))
q2((3))
q3((2))
q1 -->|a,a,R| q1
q1 -->|Δ,Δ,R| q1
q1 -->|b,a,R| q2
q2 -->|b,b,L| q2
q2 -->|Δ,b,L| q3
style q1 fill:#90EE90,stroke:#333,stroke-width:3px
style q2 fill:#FFB6C1,stroke:#333,stroke-width:3px
style q3 fill:#87CEEB,stroke:#333,stroke-width:2px
Transition Table: the tabular representation of the transtions:
From |
To |
Read |
Write |
Move |
|---|---|---|---|---|
1 |
1 |
a |
a |
R |
1 |
1 |
Δ |
Δ |
R |
1 |
3 |
b |
a |
R |
3 |
3 |
b |
b |
L |
3 |
2 |
Δ |
b |
L |
1.1.3 Encode each transition using ‘a’ and ‘b’ representation#
Each transition is defined as a 5-tuple: (CurrentState, ReadSymbol) → (NextState, WriteSymbol, MoveDirection), or in table format:
From |
To |
Read |
Write |
Move |
|---|---|---|---|---|
\(X_1\) |
\(X_2\) |
\(X_3\) |
\(X_4\) |
\(X_5\) |
State encoding: States are encoded using their numeric position as: \(a^nb\) (n repetitions of ‘a’ followed by ‘b’):
State |
Encoding |
|---|---|
1 |
ab |
2 |
aab |
3 |
aaab |
4 |
aaaab |
5 |
aaaaab |
Symbol Encoding Table: tape symbols are encoded using a fixed 2-character encoding:
Symbol |
Encoding |
|---|---|
a |
aa |
b |
ab |
Δ (blank) |
ba |
# (special) |
bb |
Direction Encoding Table: movement directions are encoded using single-character encoding:
Direction |
Encoding |
|---|---|
L (Left) |
a |
R (Right) |
b |
Example
The table of transitions from the previous section can be encoded as follows:
From |
To |
Read |
Write |
Move |
Encoding |
|---|---|---|---|---|---|
1 |
1 |
a |
a |
R |
ababaaaab |
1 |
1 |
Δ |
Δ |
R |
ababbabab |
1 |
3 |
b |
a |
R |
abaaababaab |
3 |
3 |
b |
b |
L |
aaabaaabababa |
3 |
2 |
Δ |
b |
L |
aaabaabbaaba |
1.1.4 Concatenate all individual row encodings into one unified string that encodes the entire Turing Machine#
Simply concatenate all encoded rows without any separators: CompleteTMEncoding = EncodedRow₁ + EncodedRow₂ + … + EncodedRowₙ. The encoded rows are concatenated in their lexicographic order to ensure a consistent and unambiguous representation of the Turing machine.
Important Note:
No special separators are needed between rows because:
Each state encoding ends with ‘b’, making it self-delimiting
Symbol encodings have fixed length (2 characters)
Direction encodings have fixed length (1 character)
The ordering of rows does not affect the machine’s behavior, however, it is important to standardize the order of the encoded rows when concatenating them, because:
Without a standard ordering, the same Turing machine could have multiple different encodings.
Unique Representation for Each Turing Machine is important to analyze Turing machines in a systematic way
A standard ordering simplifies Universal Turing Machine Implementation (We will explore this further later in the notebook)
Lexicographic (dictionary) order means sorting strings as you would in a dictionary: character by character, using the alphabet’s order.
Example
Let us concatenate all encoded rows in their lexicographic order:
ababaaaab
ababbabab
abaaababaab
aaabaaabababa
aaabaabbaaba
The encodings starting with \(aa\) come before those with \(ab\), So, we reorder the encodings to list those starting with \(aa\) before those starting with \(ab\).
aaabaaabababa
aaabaabbaaba
ababaaaab
ababbabab
abaaababaab
Between the two starting with \(aa\):
\(aaabaaabababa\) vs. \(aaabaabbaaba\): compare their seventh character \(a\) vs. \(b\), so \(aaabaaabababa\) comes first.
Among those starting with \(ab\):
\(abaaababaab\), \(ababaaaab\), \(ababbabab\): compare their fourth character: \(a\), \(b\), \(b\), so \(abaaababaab\) comes before the others.
\(ababaaaab\), \(ababbabab\): compare their fifth character: \(a\), \(b\), so \(ababaaaab\) comes before the other.
Final sorted order:
aaabaaabababa
aaabaabbaaba
abaaababaab
ababaaaab
ababbabab
Simply join them without separators is the complete Turing machine encoding (visually separate the parts by color): aaabaaabababa aaabaabbaaba abaaababaab ababaaaab ababbabab
1.1.5 Code: TuringMachine - Data Container for Encoding#
⚠️ This is a different TuringMachine class from Chapter 15. The Chapter 15 version has run(), step(), transition_function, accept_state, and reject_state. This Chapter 16 version is a data container only — it stores the machine’s specification and auto-assigns numeric state IDs. It has no run() method and cannot simulate computation. Do not mix instances from the two chapters.
This class implements Step 1 (state numbering) and Step 2 (tabular representation) of Section 1.1.
Formal convention → Python mapping
Convention (Section 1.1.1) |
Python encoding |
|---|---|
Start state → always State 1 |
|
Halt state(s) → always State 2 |
|
All other states → 3, 4, 5, … |
|
Transition as 5-tuple \((q, r) \to (q', w, d)\) |
|
What to look for: _create_state_mapping() enforces the numbering convention automatically. After construction, self.state_mapping is the dictionary that all encoder methods use to translate symbolic state names to their numeric codes.
Running outside Jupyter: This class is a dependency — save it at the top of ch16_encoder.py. It produces no output on its own.
# ── Block 1: TuringMachine — Data Container for Encoding ────────────────────
# This class is NOT the simulation TM from Chapter 15.
# It stores the machine's formal specification and auto-assigns numeric state IDs.
# It has NO run() or step() method. Use it as input to TMEncoder (Block 2).
class TuringMachine:
"""
Data container representing a Turing Machine's formal specification.
Implements the state-numbering convention from Section 1.1.1:
State 1 = start state (always)
State 2 = halt state (always; multiple halt states all map to 2)
State 3+ = other states (assigned in iteration order)
"""
def __init__(self, states, alphabet, tape_alphabet, transitions,
start_state, halt_states):
"""
Args:
states: List of all state names (arbitrary strings)
alphabet: Input alphabet Σ (list of symbols, no blank)
tape_alphabet: Tape alphabet Γ = Σ ∪ {blank, ...}
transitions: Dict mapping (state_name, symbol) → (next_state, write, dir)
Each value is a 3-tuple: (next_state_name, write_symbol, 'L'|'R'|'S')
start_state: Name of the start state q₀ (will be mapped to integer 1)
halt_states: Name(s) of halt state(s) — a string or list of strings
(all mapped to integer 2; convention collapses multiple halt states)
"""
self.states = states
self.alphabet = alphabet
self.tape_alphabet = tape_alphabet
self.transitions = transitions
self.start_state = start_state
# Normalize halt_states to a list regardless of how it was passed
self.halt_states = halt_states if isinstance(halt_states, list) else [halt_states]
# Apply the Section 1.1.1 numbering convention
self.state_mapping = self._create_state_mapping()
def _create_state_mapping(self):
"""
Assign numeric IDs following the convention:
1 → start state
2 → all halt states
3, 4, 5, ... → remaining states (in iteration order of self.states)
Returns: dict mapping state_name (str) → state_number (int)
"""
mapping = {}
# Convention Rule: start state is always 1
mapping[self.start_state] = 1
# Convention Rule: all halt states collapse to 2
# (Multiple accept states in a machine are treated as one HALT state)
for halt_state in self.halt_states:
mapping[halt_state] = 2
# Remaining states: numbered 3, 4, 5, ... in iteration order
state_num = 3
for state in self.states:
if state not in mapping: # skip start and halt — already assigned
mapping[state] = state_num
state_num += 1
return mapping
def display_info(self):
"""Print the machine's specification and the resulting state mapping."""
print("Turing Machine Configuration:")
print(f" States: {self.states}")
print(f" Start state: {self.start_state}")
print(f" Halt state(s): {self.halt_states}")
print(f" Input alphabet: {self.alphabet}")
print(f" Tape alphabet: {self.tape_alphabet}")
print(f"\nState Mapping (Section 1.1.1 convention):")
for state, num in sorted(self.state_mapping.items(), key=lambda x: x[1]):
role = " (START)" if num == 1 else " (HALT)" if num == 2 else ""
print(f" {state} → {num}{role}")
print(f"\nNumber of transitions: {len(self.transitions)}")
1.1.6 Code: TMEncoder — Implements the a/b Encoding Scheme#
Depends on: TuringMachine (Block 1). Run Block 1 first.
This class implements Steps 3 and 4 of Section 1.1: encoding each transition as an a/b string and concatenating in lexicographic order.
Complete encoding table
Element |
Formal notation |
Encoding rule |
Example |
|---|---|---|---|
State \(n\) |
\(a^n b\) |
\(n\) repetitions of |
State 3 → |
Symbol |
— |
fixed 2 chars: |
|
Symbol |
— |
fixed 2 chars: |
|
Blank \(\Delta\) |
— |
fixed 2 chars: |
|
Special |
— |
fixed 2 chars: |
|
Direction L |
— |
1 char: |
|
Direction R |
— |
1 char: |
|
⚠️ Design warning: symbol_codes also maps digits '0'–'7' to 3-character codes (e.g., '0' → 'aaa'). This violates the fixed-2-character rule for symbols described in Section 1.1.3 and will produce decodings that fail with TuringMachineDecoder. Do not use digit symbols in encoding exercises unless the decoder is also extended.
⚠️ 'S' (Stay) direction maps to 'ab' (2 characters), violating the fixed-1-character rule for directions. Real Turing machines typically do not have a Stay move; this is a convenience extension. Transitions using 'S' cannot be decoded by the standard TuringMachineDecoder.
One transition’s encoding structure (9 characters minimum):
encode_state(from) encode_state(to) encode_symbol(read) encode_symbol(write) encode_direction(move)
a^{n1}b a^{n2}b 2 chars 2 chars 1 char
This is self-delimiting because state encodings end in 'b' and symbol encodings have fixed length — no separator is needed.
Running outside Jupyter: Save Blocks 1+2 in ch16_encoder.py. Run:
python ch16_encoder.py
# ── Block 2: TMEncoder — Implements Section 1.1.3 and 1.1.4 ────────────────
# Encodes a TuringMachine (Block 1) into an 'a'/'b' string.
# Depends on: TuringMachine (Block 1)
class TMEncoder:
"""
Encodes Turing Machine transitions as 'a'/'b' strings following Section 1.1.3.
Encoding rules:
State n: a^n b (variable length, self-delimiting — 'b' marks the end)
Tape symbol: 2-char fixed code from symbol_codes table
Direction: 1-char code ('a' = Left, 'b' = Right)
A single transition (From, To, Read, Write, Move) encodes as:
encode_state(From) + encode_state(To) + encode_symbol(Read)
+ encode_symbol(Write) + encode_direction(Move)
"""
def __init__(self):
# --- Symbol encoding table (Section 1.1.3, fixed 2-char codes) ---
# ⚠️ WARNING: digits '0'–'7' use 3-char codes, violating the 2-char rule.
# Do not use digit symbols in exercises meant to be decoded by TuringMachineDecoder.
self.symbol_codes = {
'a': 'aa', # Core alphabet symbol
'b': 'ab', # Core alphabet symbol
'Δ': 'ba', # Blank symbol (also accept '_' as alternative notation)
'_': 'ba', # Alternative blank notation
'#': 'bb', # Special marker symbol
# Extended alphabet for binary machines (3-char codes — non-standard):
'0': 'aaa', '1': 'aab', '2': 'aba', '3': 'abb',
'4': 'baa', '5': 'bab', '6': 'bba', '7': 'bbb',
}
# --- Direction encoding table (Section 1.1.3, 1-char codes) ---
# ⚠️ 'S' (Stay) uses 2 chars — non-standard extension, not decodable by TuringMachineDecoder
self.direction_codes = {
'L': 'a', # Move head Left
'R': 'b', # Move head Right
'S': 'ab', # Stay (non-standard; not in standard TM model)
}
# ── Primitive encoders ───────────────────────────────────────────────────
def encode_state(self, state_num):
"""
Encode state number n as a^n b (Section 1.1.3).
The trailing 'b' makes state encodings self-delimiting:
a decoder can find the end of one state's encoding by scanning for 'b'.
"""
if not isinstance(state_num, int) or state_num < 1:
raise ValueError(f"State must be a positive integer, got {state_num}")
return 'a' * state_num + 'b' # e.g., state 3 → 'aaab'
def encode_symbol(self, symbol):
"""
Encode a tape symbol as a fixed-length 2-character code (Section 1.1.3).
Fixed length means no delimiter is needed between symbol encodings.
"""
if symbol not in self.symbol_codes:
raise ValueError(f"Unknown symbol: '{symbol}'. "
f"Valid symbols: {list(self.symbol_codes.keys())}")
return self.symbol_codes[symbol]
def encode_direction(self, direction):
"""
Encode move direction as a fixed-length 1-character code (Section 1.1.3).
"""
if direction not in self.direction_codes:
raise ValueError(f"Unknown direction: '{direction}'. Valid: L, R, S")
return self.direction_codes[direction]
# ── Composite encoders ───────────────────────────────────────────────────
def encode_transition(self, from_state, to_state, read_sym, write_sym, direction):
"""
Encode one row of the transition table as a single 'a'/'b' string.
Output structure (all parts concatenated, no separators):
a^{from}b a^{to}b 2-char-read 2-char-write 1-char-direction
└──────┘ └─────┘ └─────────┘ └──────────┘ └─────────────┘
≥2 chars ≥2 chars 2 chars 2 chars 1 char
Minimum total: 2+2+2+2+1 = 9 characters (from=1, to=1, standard symbols/direction)
"""
return (
self.encode_state(from_state) + # X1: From state
self.encode_state(to_state) + # X2: To state
self.encode_symbol(read_sym) + # X3: Read symbol
self.encode_symbol(write_sym) + # X4: Write symbol
self.encode_direction(direction) # X5: Move direction
)
def encode_tm(self, tm, use_lexicographic=True):
"""
Encode a complete Turing Machine (Section 1.1.4).
Steps:
1. Translate each transition's state names to numeric IDs via tm.state_mapping
2. Encode each transition individually
3. Sort lexicographically (if use_lexicographic=True) to ensure unique representation
4. Concatenate all encoded transitions without separators
Lexicographic sorting (Section 1.1.4) is required for uniqueness:
without it, the same TM could have multiple valid encodings.
"""
# --- Step 1+2: Translate and encode each transition ---
numeric_transitions = []
for (state, symbol), (next_state, write_symbol, direction) in tm.transitions.items():
from_num = tm.state_mapping[state] # Symbolic → numeric state ID
to_num = tm.state_mapping[next_state]
encoded = self.encode_transition(from_num, to_num, symbol, write_symbol, direction)
numeric_transitions.append({
'from': from_num,
'to': to_num,
'read': symbol,
'write': write_symbol,
'move': direction,
'encoded': encoded,
})
# --- Step 3: Lexicographic sort for canonical ordering ---
if use_lexicographic:
numeric_transitions.sort(key=lambda x: x['encoded'])
# --- Step 4: Concatenate (no separators — encoding is self-delimiting) ---
encoded_tm = ''.join(t['encoded'] for t in numeric_transitions)
return encoded_tm, numeric_transitions
def display_encoding_details(self, tm, encoded_tm, transitions):
"""Print a formatted table showing each transition's encoding."""
print("\nEncoding Details:")
print("-" * 100)
print(f"{'From':<10} {'To':<10} {'Read':<10} {'Write':<10} {'Move':<10} {'Encoding'}")
print("-" * 100)
for trans in transitions:
# Reverse-lookup: numeric ID → symbolic state name
# Bug fix: [s for ...][0] — was missing [0] in original, causing list to print as string
from_name = [s for s, n in tm.state_mapping.items() if n == trans['from']][0]
to_name = [s for s, n in tm.state_mapping.items() if n == trans['to']][0]
print(f"{from_name:<10} {to_name:<10} {trans['read']:<10} "
f"{trans['write']:<10} {trans['move']:<10} {trans['encoded']:<30}")
print("-" * 100)
print(f"\nComplete Encoding ({len(encoded_tm)} characters):")
print(encoded_tm)
1.1.7 Code: Three Encoding Examples + Verification#
Depends on: TuringMachine (Block 1) and TMEncoder (Block 2). Run Blocks 1–2 first.
This block demonstrates encoding on three concrete machines of increasing complexity:
Example |
Machine purpose |
States |
Transitions |
Key concept illustrated |
|---|---|---|---|---|
|
Converts all |
3 |
6 |
Basic encoding with |
|
TM from Section 1.1.2’s worked example |
3 |
5 |
Round-trip: table → encoding matches the textbook |
|
Adds 1 to a binary number |
5 |
12 |
Multi-state encoding with binary digits (3-char symbol codes) |
⚠️ example_simple_tm() and example_binary_increment() use 'S' (Stay) or binary digit symbols — see Block 2’s warnings. Their encodings are not decodable by TuringMachineDecoder (Block 5). Only example_from_table() produces a fully round-trip–compatible encoding.
Running outside Jupyter:
python ch16_encoder.py # runs all three examples when __name__ == "__main__"
# ── Block 3: Encoding Examples ───────────────────────────────────────────────
# Depends on: TuringMachine (Block 1), TMEncoder (Block 2)
# ⚠️ example_simple_tm() and example_binary_increment() use 'S' or digit symbols
# which are NOT compatible with TuringMachineDecoder (Block 5).
def example_simple_tm():
"""
Example 1: TM that converts every 'a' to 'b'.
Demonstrates basic encoding with three states and six transitions.
⚠️ Uses 'S' (Stay) direction — non-standard, not decodable by Block 5 decoder.
"""
print("=" * 100)
print("Example 1: Simple TM that converts 'a' to 'b'")
print("=" * 100)
states = ['q_start', 'q_scan', 'q_halt']
alphabet = ['a', 'b']
tape_alphabet = ['a', 'b', 'Δ']
# Transition function: (state, read) → (next_state, write, direction)
transitions = {
# --- Phase: scan rightward, replacing 'a' with 'b' ---
('q_start', 'a'): ('q_scan', 'b', 'R'), # First 'a' → 'b', begin scan
('q_start', 'b'): ('q_start', 'b', 'R'), # Skip 'b', continue scanning
('q_start', 'Δ'): ('q_halt', 'Δ', 'S'), # End of tape → halt (Stay)
('q_scan', 'a'): ('q_scan', 'b', 'R'), # Replace subsequent 'a's
('q_scan', 'b'): ('q_scan', 'b', 'R'), # Skip 'b', continue scanning
('q_scan', 'Δ'): ('q_halt', 'Δ', 'S'), # End of tape → halt (Stay)
}
tm = TuringMachine(states, alphabet, tape_alphabet, transitions, 'q_start', 'q_halt')
encoder = TMEncoder()
tm.display_info()
encoded, trans_list = encoder.encode_tm(tm)
encoder.display_encoding_details(tm, encoded, trans_list)
return tm, encoded
def example_from_table():
"""
Example 2: TM from the Section 1.1.2 worked example.
This is the ONLY example in this block fully compatible with TuringMachineDecoder.
The output encoding should match the textbook's lexicographically sorted result.
"""
print("\n" + "=" * 100)
print("Example 2: TM from the Section 1.1.2 transition table")
print("=" * 100)
# State names already match numeric convention (1=start, 2=halt, 3=working)
states = ['1', '2', '3']
alphabet = ['a', 'b']
tape_alphabet = ['a', 'b', 'Δ']
# Transition table from Section 1.1.2:
# From To Read Write Move
# 1 1 a a R (self-loop: skip 'a')
# 1 1 Δ Δ R (self-loop: skip blank → causes loops on some inputs)
# 1 3 b a R (first 'b' → write 'a', go to state 3)
# 3 3 b b L (self-loop: scan back over 'b')
# 3 2 Δ b L (write 'b', halt)
transitions = {
('1', 'a'): ('1', 'a', 'R'),
('1', 'Δ'): ('1', 'Δ', 'R'),
('1', 'b'): ('3', 'a', 'R'),
('3', 'b'): ('3', 'b', 'L'),
('3', 'Δ'): ('2', 'b', 'L'),
}
tm = TuringMachine(states, alphabet, tape_alphabet, transitions, '1', '2')
encoder = TMEncoder()
tm.display_info()
encoded, trans_list = encoder.encode_tm(tm)
encoder.display_encoding_details(tm, encoded, trans_list)
return tm, encoded
def example_binary_increment():
"""
Example 3: TM that adds 1 to a binary number on the tape.
Demonstrates encoding of a 5-state machine with 12 transitions.
⚠️ Uses binary digit symbols ('0', '1') which use 3-char codes in TMEncoder —
this violates the 2-char fixed-length rule. Not decodable by Block 5.
"""
print("\n" + "=" * 100)
print("Example 3: Binary Increment TM")
print("=" * 100)
states = ['q0', 'q1', 'q2', 'q3', 'qaccept']
alphabet = ['0', '1']
tape_alphabet = ['0', '1', 'Δ']
transitions = {
# --- Phase 1: Scan right to the last digit ---
('q0', '0'): ('q0', '0', 'R'),
('q0', '1'): ('q0', '1', 'R'),
('q0', 'Δ'): ('q1', 'Δ', 'L'), # Reached end → start adding 1
# --- Phase 2: Add 1 with carry propagation ---
('q1', '0'): ('q2', '1', 'L'), # 0+1=1, no carry → done with increment
('q1', '1'): ('q1', '0', 'L'), # 1+1=0, carry 1 → continue left
('q1', 'Δ'): ('q3', '1', 'R'), # Overflow: write leading 1, fix up
# --- Phase 3: Scan left back to start ---
('q2', '0'): ('q2', '0', 'L'),
('q2', '1'): ('q2', '1', 'L'),
('q2', 'Δ'): ('qaccept', 'Δ', 'R'), # Back at start → accept
# --- Phase 4: Handle overflow (all 1s → prepend 1, rest already 0) ---
('q3', '0'): ('q3', '0', 'R'),
('q3', '1'): ('q3', '1', 'R'),
('q3', 'Δ'): ('qaccept', 'Δ', 'S'), # Done → accept
}
tm = TuringMachine(states, alphabet, tape_alphabet, transitions, 'q0', 'qaccept')
encoder = TMEncoder()
tm.display_info()
encoded, trans_list = encoder.encode_tm(tm)
encoder.display_encoding_details(tm, encoded, trans_list)
return tm, encoded
def verify_encoding(encoded_string):
"""
Quick sanity check: a valid TM encoding must consist only of 'a' and 'b'.
This is necessary but not sufficient — CWLValidator (Block 11) provides full validation.
"""
if not all(c in 'ab' for c in encoded_string):
return False, "String contains characters other than 'a' and 'b'"
if len(encoded_string) == 0:
return False, "Empty encoding"
return True, "Valid character set (passes basic check)"
# ── Main runner ──────────────────────────────────────────────────────────────
if __name__ == "__main__":
tm1, enc1 = example_simple_tm()
tm2, enc2 = example_from_table()
tm3, enc3 = example_binary_increment()
# --- Encoding statistics ---
print("\n" + "=" * 100)
print("Encoding Verification")
print("=" * 100)
for i, (tm, enc) in enumerate([(tm1, enc1), (tm2, enc2), (tm3, enc3)], 1):
valid, msg = verify_encoding(enc)
print(f"Example {i}: {msg}")
print(f" Length: {len(enc)} characters")
print(f" 'a' count: {enc.count('a')}, 'b' count: {enc.count('b')}")
if enc.count('b') > 0:
print(f" Ratio a:b = {enc.count('a') / enc.count('b'):.2f}:1")
====================================================================================================
Example 1: Simple TM that converts 'a' to 'b'
====================================================================================================
Turing Machine Configuration:
States: ['q_start', 'q_scan', 'q_halt']
Start state: q_start
Halt state(s): ['q_halt']
Input alphabet: ['a', 'b']
Tape alphabet: ['a', 'b', 'Δ']
State Mapping (Section 1.1.1 convention):
q_start → 1 (START)
q_halt → 2 (HALT)
q_scan → 3
Number of transitions: 6
Encoding Details:
----------------------------------------------------------------------------------------------------
From To Read Write Move Encoding
----------------------------------------------------------------------------------------------------
q_scan q_scan a b R aaabaaabaaabb
q_scan q_scan b b R aaabaaabababb
q_scan q_halt Δ Δ S aaabaabbabaab
q_start q_scan a b R abaaabaaabb
q_start q_halt Δ Δ S abaabbabaab
q_start q_start b b R ababababb
----------------------------------------------------------------------------------------------------
Complete Encoding (70 characters):
aaabaaabaaabbaaabaaabababbaaabaabbabaababaaabaaabbabaabbabaabababababb
====================================================================================================
Example 2: TM from the Section 1.1.2 transition table
====================================================================================================
Turing Machine Configuration:
States: ['1', '2', '3']
Start state: 1
Halt state(s): ['2']
Input alphabet: ['a', 'b']
Tape alphabet: ['a', 'b', 'Δ']
State Mapping (Section 1.1.1 convention):
1 → 1 (START)
2 → 2 (HALT)
3 → 3
Number of transitions: 5
Encoding Details:
----------------------------------------------------------------------------------------------------
From To Read Write Move Encoding
----------------------------------------------------------------------------------------------------
3 3 b b L aaabaaabababa
3 2 Δ b L aaabaabbaaba
1 3 b a R abaaababaab
1 1 a a R ababaaaab
1 1 Δ Δ R ababbabab
----------------------------------------------------------------------------------------------------
Complete Encoding (54 characters):
aaabaaabababaaaabaabbaabaabaaababaabababaaaabababbabab
====================================================================================================
Example 3: Binary Increment TM
====================================================================================================
Turing Machine Configuration:
States: ['q0', 'q1', 'q2', 'q3', 'qaccept']
Start state: q0
Halt state(s): ['qaccept']
Input alphabet: ['0', '1']
Tape alphabet: ['0', '1', 'Δ']
State Mapping (Section 1.1.1 convention):
q0 → 1 (START)
qaccept → 2 (HALT)
q1 → 3
q2 → 4
q3 → 5
Number of transitions: 12
Encoding Details:
----------------------------------------------------------------------------------------------------
From To Read Write Move Encoding
----------------------------------------------------------------------------------------------------
q3 q3 0 0 R aaaaabaaaaabaaaaaab
q3 q3 1 1 R aaaaabaaaaabaabaabb
q3 qaccept Δ Δ S aaaaabaabbabaab
q2 q2 0 0 L aaaabaaaabaaaaaaa
q2 q2 1 1 L aaaabaaaabaabaaba
q2 qaccept Δ Δ R aaaabaabbabab
q1 q3 Δ 1 R aaabaaaaabbaaabb
q1 q2 0 1 L aaabaaaabaaaaaba
q1 q1 1 0 L aaabaaabaabaaaa
q0 q1 Δ Δ L abaaabbabaa
q0 q0 0 0 R ababaaaaaab
q0 q0 1 1 R ababaabaabb
----------------------------------------------------------------------------------------------------
Complete Encoding (180 characters):
aaaaabaaaaabaaaaaabaaaaabaaaaabaabaabbaaaaabaabbabaabaaaabaaaabaaaaaaaaaaabaaaabaabaabaaaaabaabbababaaabaaaaabbaaabbaaabaaaabaaaaabaaaabaaabaabaaaaabaaabbabaaababaaaaaabababaabaabb
====================================================================================================
Encoding Verification
====================================================================================================
Example 1: Valid character set (passes basic check)
Length: 70 characters
'a' count: 42, 'b' count: 28
Ratio a:b = 1.50:1
Example 2: Valid character set (passes basic check)
Length: 54 characters
'a' count: 34, 'b' count: 20
Ratio a:b = 1.70:1
Example 3: Valid character set (passes basic check)
Length: 180 characters
'a' count: 133, 'b' count: 47
Ratio a:b = 2.83:1
2. The Decoding of Turing Machines#
Decoding is the reverse process of encoding and we take an encoded string and reconstruct the original Turing machine. The decoding process follows these key principles:
Each transition has exactly 5 components (From, To, Read, Write, Move)
State numbers are encoded as repeated \(a\)’s followed by a \(b\)
Symbols use fixed 2-character codes (aa, ab, ba, bb)
Directions use single-character codes (a for L, b for R)
No explicit separators exist between transitions
2.1 Step-by-Step Algorithm#
Step 1: Parsing individual transitions from the encoded string
Decode the “From” state number by counting the number of repeated \(a\) symbols in the encoding.
Skip the next \(b\) since it only actss as a separator
Decode the “To” state number by counting the number of repeated \(a\) symbols in the encoding.
Skip the next \(b\) since it only actss as a separator
Decoding the “Read” symbol by reading the next two characters and decoding it using the Symbol encoding table in reverse
Decoding the “Write” symbol by reading the next two characters and decoding it using the Symbol encoding table in reverse
Decode the direction by reading the next character and using the direction table in reverse.
Fill one row of the transition table with all the decoded information to form a complete transition. Step 2: Repeat Step 1 if there are still unprocessed characters in the encoding.
2.2 Examples#
Given an encoding string \(ababababbabaaabaaabbaaabaaabababaaaabaabbaaba\), let’s decode this string by parsing each component of the transitions.
Transition 1, starting at position 0:
From State: ab → 1 ‘a’ + ‘b’ = State 1
To State: ab → 1 ‘a’ + ‘b’ = State 1
Read Symbol: ab → ‘b’
Write Symbol: ab → ‘b’
Direction: b → R (right)
Transition 1: (1, b) → (1, b, R)
Position after: 9
Transition 2: starting at position 9:
From State: ab → 1 ‘a’ + ‘b’ = State 1
To State: aaab → 3 ‘a’s + ‘b’ = State 3
Read Symbol: aa → ‘a’
Write Symbol: ab → ‘b’
Direction: b → R (right)
Transition 2: (1, a) → (3, b, R)
Position after: 19
Transition 3: starting at position 19:
From State: aaab → 3 ‘a’s + ‘b’ = State 3
To State: aaab → 3 ‘a’s + ‘b’ = State 3
Read Symbol: ab → ‘b’
Write Symbol: ab → ‘b’
Direction: a → L (left)
Transition 3: (3, b) → (3, b, L)
Position after: 30
Transition 4: starting at position 30:
From State: aaab → 3 ‘a’s + ‘b’ = State 3
To State: aab → 2 ‘a’s + ‘b’ = State 2
Read Symbol: ba → ‘Δ’ (blank)
Write Symbol: ab → ‘b’
Direction: a → L (left)
Transition 4: (3, Δ) → (2, b, L)
Position after: 42 (end of string)
Summary of Decoded Transitions:
Transition |
From |
Read |
To |
Write |
Move |
|---|---|---|---|---|---|
1 |
1 |
b |
1 |
b |
R |
2 |
1 |
a |
3 |
b |
R |
3 |
3 |
b |
3 |
b |
L |
4 |
3 |
Δ |
2 |
b |
L |
2.3 Code: TuringMachineDecoder — Implements Section 2.1 (Step-by-Step Algorithm)#
Depends on: Nothing — this class is self-contained. However, its outputs feed into Blocks 6–10.
This class implements the reverse of TMEncoder: given an a/b string, it reconstructs the sequence of \((q, r) \to (q', w, d)\) tuples without any separators in the input.
Formal parsing rules (Section 2.1)
Component |
Parsing rule |
Code |
|---|---|---|
From state |
Count |
|
To state |
Same, immediately after From state |
|
Read symbol |
Read next 2 characters; look up in reverse symbol table |
|
Write symbol |
Read next 2 characters; look up in reverse symbol table |
|
Direction |
Read next 1 character; look up in reverse direction table |
|
Minimum valid transition length: 'ab' + 'ab' + 'aa' + 'aa' + 'b' = 9 characters (from=1, to=1, read=a, write=a, direction=R). The decoder enforces this minimum in decode_complete_machine().
Running outside Jupyter: Save this class alongside TuringMachine and TMEncoder in ch16_encoder.py, then run:
python ch16_encoder.py
# ── Block 5: TuringMachineDecoder — Implements Section 2.1 ──────────────────
# Reverses the TMEncoder process: parses an 'a'/'b' string into transition tuples.
# No dependencies on prior blocks — but its output feeds Blocks 6–10.
class TuringMachineDecoder:
"""
Decodes an 'a'/'b' string into a list of Turing Machine transitions.
Decoding exploits the self-delimiting property of the encoding:
- State: variable-length a^n b (count 'a's until 'b' → state number n)
- Symbol: fixed-length 2 chars (look up in symbol_decode)
- Direction: fixed-length 1 char (look up in direction_decode)
No separator characters exist — parsing advances a position pointer left-to-right.
"""
def __init__(self):
# Reverse of TMEncoder's symbol_codes (2-char code → symbol name)
self.symbol_decode = {
'aa': 'a',
'ab': 'b',
'ba': 'Δ', # Blank
'bb': '#', # Special marker
}
# Reverse of TMEncoder's direction_codes (1-char code → direction)
self.direction_decode = {
'a': 'L',
'b': 'R',
}
self.debug = True # Print step-by-step position trace when True
# ── Primitive decoders ───────────────────────────────────────────────────
def decode_state(self, encoding, start_pos):
"""
Decode a state number at position start_pos.
Rule: count leading 'a's → that count is the state number.
the immediately following 'b' is consumed as the delimiter.
Returns: (state_number: int, new_position: int)
"""
pos = start_pos
a_count = 0
# Count consecutive 'a's
while pos < len(encoding) and encoding[pos] == 'a':
a_count += 1
pos += 1
# The 'b' delimiter must follow
if pos < len(encoding) and encoding[pos] == 'b':
pos += 1 # consume the 'b'
return a_count, pos
else:
raise ValueError(
f"Invalid state encoding at position {start_pos}: "
f"expected 'b' after {a_count} 'a's, found "
f"'{encoding[pos] if pos < len(encoding) else 'end-of-string'}'"
)
def decode_symbol(self, encoding, start_pos):
"""
Decode a tape symbol at position start_pos.
Rule: read exactly 2 characters and look up in symbol_decode.
Returns: (symbol: str, new_position: int)
"""
if start_pos + 2 > len(encoding):
raise ValueError(
f"Not enough characters for symbol at position {start_pos} "
f"(need 2, have {len(encoding) - start_pos})"
)
code = encoding[start_pos:start_pos + 2]
if code not in self.symbol_decode:
raise ValueError(f"Invalid symbol code '{code}' at position {start_pos}")
return self.symbol_decode[code], start_pos + 2
def decode_direction(self, encoding, start_pos):
"""
Decode a direction at position start_pos.
Rule: read exactly 1 character and look up in direction_decode.
Returns: (direction: str, new_position: int)
"""
if start_pos >= len(encoding):
raise ValueError(f"No character for direction at position {start_pos}")
code = encoding[start_pos]
if code not in self.direction_decode:
raise ValueError(f"Invalid direction code '{code}' at position {start_pos}")
return self.direction_decode[code], start_pos + 1
# ── Composite decoder ────────────────────────────────────────────────────
def decode_single_transition(self, encoding, start_pos):
"""
Decode one complete 5-tuple transition starting at start_pos.
Applies decode_state (×2), decode_symbol (×2), decode_direction (×1) in order.
Returns: (transition_dict, new_position)
where transition_dict has keys: 'from', 'to', 'read', 'write', 'move'
"""
if self.debug:
print(f"\nDecoding transition at position {start_pos}")
preview = encoding[start_pos:start_pos + 20]
print(f" Substring: {preview}{'...' if start_pos + 20 < len(encoding) else ''}")
pos = start_pos
# --- Decode X1: From state ---
from_state, pos = self.decode_state(encoding, pos)
if self.debug: print(f" From State: {from_state} (pos now {pos})")
# --- Decode X2: To state ---
to_state, pos = self.decode_state(encoding, pos)
if self.debug: print(f" To State: {to_state} (pos now {pos})")
# --- Decode X3: Read symbol ---
read_symbol, pos = self.decode_symbol(encoding, pos)
if self.debug: print(f" Read: '{read_symbol}' (pos now {pos})")
# --- Decode X4: Write symbol ---
write_symbol, pos = self.decode_symbol(encoding, pos)
if self.debug: print(f" Write: '{write_symbol}' (pos now {pos})")
# --- Decode X5: Direction ---
direction, pos = self.decode_direction(encoding, pos)
if self.debug: print(f" Direction: {direction} (pos now {pos})")
transition = {
'from': from_state,
'to': to_state,
'read': read_symbol,
'write': write_symbol,
'move': direction,
}
return transition, pos
def decode_complete_machine(self, encoding):
"""
Decode an entire TM encoding string into a list of transitions.
Repeatedly calls decode_single_transition until the string is exhausted.
Returns: list of transition dicts
"""
transitions = []
pos = 0
print(f"\nDecoding Turing Machine")
print(f"Total encoding length: {len(encoding)} characters")
print("=" * 60)
if len(encoding) == 0:
print("Empty encoding — no transitions.")
return transitions
# 9 is the minimum valid transition length (see pre-code note)
if len(encoding) < 9:
print(f"Encoding too short for a valid transition "
f"(need ≥9 chars, got {len(encoding)})")
return transitions
while pos < len(encoding):
try:
transition, new_pos = self.decode_single_transition(encoding, pos)
transitions.append(transition)
print(f"\nTransition {len(transitions)}: "
f"({transition['from']}, '{transition['read']}') → "
f"({transition['to']}, '{transition['write']}', {transition['move']})")
pos = new_pos
if pos == len(encoding):
n = len(transitions)
print(f"\nSuccessfully decoded {'all ' + str(n) if n > 1 else 'single'} "
f"transition{'s' if n != 1 else ''}.")
break
except ValueError as e:
if len(transitions) == 0:
print(f"\nError decoding first transition at position {pos}: {e}")
else:
print(f"\nStopped after {len(transitions)} transition(s).")
print(f"Remaining characters at position {pos}: '{encoding[pos:]}'")
break
print(f"\nDecoded TM contains {len(transitions)} transition(s).")
return transitions
2.4 Code: Decoder Tests — Single and Multiple Transitions#
Depends on: TuringMachineDecoder (Block 5). Run Block 5 first.
This block instantiates the decoder and runs two worked examples from Section 2.2:
Single transition:
"abaaabaaaba"— decode(1, a) → (3, a, L)step by step.Multiple transitions: a 4-transition string built by
create_example_encoding().
The variable multiple_transitions defined here is reused in Blocks 7 and 10. If you restart the kernel and skip this block, those cells will raise a NameError.
Manual trace for "abaaabaaaba":
Position |
Characters |
Meaning |
|---|---|---|
0–1 |
|
From state: 1 |
2–5 |
|
To state: 3 |
6–7 |
|
Read symbol: |
8–9 |
|
Write symbol: |
10 |
|
Direction: |
Result: \((1, a) \to (3, b, L)\)
Running outside Jupyter: Include this block after TuringMachineDecoder in ch16_encoder.py.
# ── Block 6: Decoder Tests ───────────────────────────────────────────────────
# Depends on: TuringMachineDecoder (Block 5)
# NOTE: multiple_transitions variable is reused in Blocks 7 and 10.
decoder = TuringMachineDecoder()
# ── Example 1: Decode a single transition ────────────────────────────────────
# Encoding: "abaaabaaaba"
# Expected result: (1, 'a') → (3, 'b', L)
single_transition = "abaaabaaaba"
print("EXAMPLE 1: Decoding a single transition")
print(f"Encoded string: {single_transition}")
print("\nManual step-by-step breakdown (Section 2.2):")
print(" 'ab' → State 1 (1 'a' + 'b')")
print(" 'aaab' → State 3 (3 'a's + 'b')")
print(" 'aa' → Symbol 'a'")
print(" 'ab' → Symbol 'b'")
print(" 'a' → Direction 'L'")
transition, _ = decoder.decode_single_transition(single_transition, 0)
print(f"\nDecoded transition: {transition}")
print(f"In formal notation: δ({transition['from']}, '{transition['read']}') = "
f"({transition['to']}, '{transition['write']}', {transition['move']})")
# ── Example 2: Decode a 4-transition encoding ────────────────────────────────
# This encoding matches Example 2 from Block 3's TM (Section 1.1.2 worked example)
# and corresponds to the transitions decoded manually in Section 2.2.
def create_example_encoding():
"""
Manually build the 4-transition encoding from Section 2.2.
Uses explicit concatenation so students can see each component.
Expected transitions:
(1,b) → (1,b,R)
(1,a) → (3,b,R)
(3,b) → (3,b,L)
(3,Δ) → (2,b,L)
"""
# Each string: encode_state(from) + encode_state(to) + encode_symbol(read)
# + encode_symbol(write) + encode_direction(move)
row1 = "ab" + "ab" + "ab" + "ab" + "b" # (1,b)→(1,b,R)
row2 = "ab" + "aaab" + "aa" + "ab" + "b" # (1,a)→(3,b,R)
row3 = "aaab" + "aaab" + "ab" + "ab" + "a" # (3,b)→(3,b,L)
row4 = "aaab" + "aab" + "ba" + "ab" + "a" # (3,Δ)→(2,b,L)
return row1 + row2 + row3 + row4 # no separators (Section 1.1.4)
multiple_transitions = create_example_encoding() # ← reused in Blocks 7 and 10
print(f"\nEXAMPLE 2: Decoding 4 transitions concatenated without separators")
print(f"Encoded string: {multiple_transitions}")
print(f"Length: {len(multiple_transitions)} characters")
decoder.debug = False # suppress position trace for cleaner multi-transition output
decoded_transitions = decoder.decode_complete_machine(multiple_transitions)
print("\nSummary of decoded transitions:")
for i, t in enumerate(decoded_transitions, 1):
print(f" {i}. δ({t['from']}, '{t['read']}') = ({t['to']}, '{t['write']}', {t['move']})")
EXAMPLE 1: Decoding a single transition
Encoded string: abaaabaaaba
Manual step-by-step breakdown (Section 2.2):
'ab' → State 1 (1 'a' + 'b')
'aaab' → State 3 (3 'a's + 'b')
'aa' → Symbol 'a'
'ab' → Symbol 'b'
'a' → Direction 'L'
Decoding transition starting at position 0
Substring: abaaabaaaba...
From State: 1 (pos now 2)
To State: 3 (pos now 6)
Read Symbol: 'a' (pos now 8)
Write Symbol: 'b' (pos now 10)
Direction: L (pos now 11)
Decoded transition: {'from': 1, 'to': 3, 'read': 'a', 'write': 'b', 'move': 'L'}
In formal notation: δ(1, 'a') = (3, 'b', L)
EXAMPLE 2: Decoding 4 transitions concatenated without separators
Encoded string: ababababbabaaabaaabbaaabaaabababaaaabaabbaaba
Length: 45 characters
Decoding Turing Machine
Total encoding length: 45 characters
============================================================
Transition 1: (1, 'b') → (1, 'b', R)
Transition 2: (1, 'a') → (3, 'b', R)
Transition 3: (3, 'b') → (3, 'b', L)
Transition 4: (3, 'Δ') → (2, 'b', L)
Successfully decoded all 4 transitions
Decoded TM contains 4 transitions
Summary of decoded transitions:
1. δ(1, 'b') = (1, 'b', R)
2. δ(1, 'a') = (3, 'b', R)
3. δ(3, 'b') = (3, 'b', L)
4. δ(3, 'Δ') = (2, 'b', L)
# Create a machine with multiple transitions
def create_example_encoding():
"""Create an encoded string with multiple transitions"""
transitions = [
# (1,b) → (1,b,R)
"ab" + "ab" + "ab" + "ab" + "b",
# (1,a) → (3,b,R)
"ab" + "aaab" + "aa" + "ab" + "b",
# (3,b) → (3,b,L)
"aaab" + "aaab" + "ab" + "ab" + "a",
# (3,Δ) → (2,b,L)
"aaab" + "aab" + "ba" + "ab" + "a"
]
return ''.join(transitions)
multiple_transitions = create_example_encoding()
print(f"\nEXAMPLE 2: Multiple transitions without separators")
print(f"Encoded string: {multiple_transitions}")
print(f"Length: {len(multiple_transitions)} characters")
# Decode with detailed output
decoder.debug = False # Turn off debug for cleaner output
decoded_transitions = decoder.decode_complete_machine(multiple_transitions)
print("\nSummary of decoded transitions:")
for i, t in enumerate(decoded_transitions, 1):
print(f"{i}. State {t['from']} + '{t['read']}' → State {t['to']} + '{t['write']}' + {t['move']}")
EXAMPLE 2: Multiple transitions without separators
Encoded string: ababababbabaaabaaabbaaabaaabababaaaabaabbaaba
Length: 45 characters
Decoding Turing Machine
Total encoding length: 45 characters
============================================================
Transition 1: (1, 'b') → (1, 'b', R)
Transition 2: (1, 'a') → (3, 'b', R)
Transition 3: (3, 'b') → (3, 'b', L)
Transition 4: (3, 'Δ') → (2, 'b', L)
Successfully decoded all 4 transitions
Decoded TM contains 4 transitions
Summary of decoded transitions:
1. State 1 + 'b' → State 1 + 'b' + R
2. State 1 + 'a' → State 3 + 'b' + R
3. State 3 + 'b' → State 3 + 'b' + L
4. State 3 + 'Δ' → State 2 + 'b' + L
2.5 Code: Full Round-Trip — Encode → Decode → Reconstruct#
Depends on: TuringMachineDecoder (Block 5) and multiple_transitions variable (Block 6). Run those blocks first.
Pipeline for this block:
multiple_transitions (str)
└── TuringMachineDecoder.decode_complete_machine()
└── list of transition dicts
└── reconstruct_turing_machine()
├── extract states and symbols
├── build transition_dict
└── tm_spec (dict) → print Transition Table
reconstruct_turing_machine() is the inverse of example_from_table() from Block 3. If the encoding round-trips correctly, the Transition Table printed here should match the table in Section 1.1.2.
Running outside Jupyter:
python ch16_encoder.py # reconstruct_turing_machine is called at module level
def reconstruct_turing_machine(encoded_string):
"""Completely reconstruct a Turing machine from its encoding"""
print("\nCOMPLETE TURING MACHINE RECONSTRUCTION")
print("=" * 60)
# Step 1: Decode transitions
decoder = TuringMachineDecoder()
decoder.debug = False
transitions = decoder.decode_complete_machine(encoded_string)
# Step 2: Extract states and symbols
states = sorted(set(t['from'] for t in transitions) |
set(t['to'] for t in transitions))
symbols = sorted(set(t['read'] for t in transitions) |
set(t['write'] for t in transitions))
# Step 3: Build transition dictionary
transition_dict = {}
for t in transitions:
key = (t['from'], t['read'])
value = (t['to'], t['write'], t['move'])
transition_dict[key] = value
# Step 4: Create machine specification
tm_spec = {
'states': states,
'alphabet': [s for s in symbols if s not in ['Δ', '#']],
'tape_alphabet': symbols,
'transitions': transition_dict,
'start_state': 1,
'halt_state': 2
}
# Display the reconstructed machine
print("\nReconstructed Turing Machine:")
print(f"States: {tm_spec['states']}")
print(f"Input alphabet: {tm_spec['alphabet']}")
print(f"Tape alphabet: {tm_spec['tape_alphabet']}")
print(f"Start state: {tm_spec['start_state']}")
print(f"Halt state: {tm_spec['halt_state']}")
print("\nTransition Table:")
print("-" * 60)
print(f"{'From':<8} {'Read':<8} {'To':<8} {'Write':<8} {'Move':<8}")
print("-" * 60)
for (from_s, read_s), (to_s, write_s, move_d) in sorted(transition_dict.items()):
print(f"{from_s:<8} {read_s:<8} {to_s:<8} {write_s:<8} {move_d:<8}")
return tm_spec
# Test with our example encoding
reconstructed = reconstruct_turing_machine(multiple_transitions)
COMPLETE TURING MACHINE RECONSTRUCTION
============================================================
Decoding Turing Machine
Total encoding length: 45 characters
============================================================
Transition 1: (1, 'b') → (1, 'b', R)
Transition 2: (1, 'a') → (3, 'b', R)
Transition 3: (3, 'b') → (3, 'b', L)
Transition 4: (3, 'Δ') → (2, 'b', L)
Successfully decoded all 4 transitions
Decoded TM contains 4 transitions
Reconstructed Turing Machine:
States: [1, 2, 3]
Input alphabet: ['a', 'b']
Tape alphabet: ['a', 'b', 'Δ']
Start state: 1
Halt state: 2
Transition Table:
------------------------------------------------------------
From Read To Write Move
------------------------------------------------------------
1 a 3 b R
1 b 1 b R
3 b 3 b L
3 Δ 2 b L
3. Language ALAN and MATHISON#
3.1 The Code Word Language (CWL)#
From the previous encoding section, you may notice that the encoding of a transition follows a specific pattern, which can be represented by a regular expression. The Code Word Language (CWL) is a regular language that describes the valid structure of encoded Turing machine transitions. It provides a regular expression pattern that all valid Turing machine encodings must follow. CWL is defined by the regular expression:
CWL = \((a^+ba^+b(a+b)^5)^*\), where \(a^+\) stands for one or more \(a\)’s.
In a valid CWL word, \(a^+ba^+b\) encodes the “From” state and “To” state, while \((a+b)^5\) encodes the “Read”, “Write”, and “Direction” information.
Important Note: The CWL defines the pattern for valid Turing machine encodings. This means that any valid Turing machine encoding must follow the CWL pattern. However, not every word in the CWL is necessarily a valid Turing machine encoding, as some CWL words may fit the pattern but fail to represent an actual, meaningful Turing machine. As an example, the string \(aababaabbb\) is a valid CWL word because it follows the required pattern. However, when decoded, it describes a transition from state 2 to state 1, which is invalid in a Turing machine because state 2 is a HALT state, and HALT states cannot have outgoing transitions. So, while the encoding fits the CWL pattern, it does not represent a valid Turing machine transition.
More examples of CWL words that are valid in form but invalid as Turing machine encodings:
❌ A CWL word that fits the pattern but refers to a transition for a non-existent state
❌ A CWL word that uses a tape symbol not in the machine’s alphabet
❌ A CWL word missing key components (e.g., no start state or halt state)
In short, CWL sets the “shape” of valid encodings, but additional checks are needed to ensure the content represents a functioning Turing machine.
3.2 Code: CWL Structure Analysis and Validator#
Depends on: Nothing — this block is standalone. However, CWLValidator is a dependency for Blocks 12–15 (ALAN and MATHISON analyzers). Run this block before those.
This block implements two things:
analyze_cwl_codeword()— manually parses a fixed example codeword character-by-character, showing exactly where each structural component (state encoding, 5-char block) begins and ends.CWLValidator— uses a regular expression to check if a string matches the CWL pattern.
CWL formal definition (Section 3.1): $\(\text{CWL} = (a^+ b \, a^+ b \, (a+b)^5)^*\)$
In Python regex: ^(a+ba+b[ab]{5})*$
Notation → regex mapping:
Formal element |
Meaning |
Regex |
|---|---|---|
\(a^+\) |
One or more |
|
\(b\) |
State encoding delimiter |
|
\((a+b)^5\) |
Exactly 5 characters from |
|
\((\ldots)^*\) |
Zero or more complete transitions |
|
Important note from Section 3.1: CWL defines the shape of valid encodings, not their meaning. A string can satisfy CWL while encoding an impossible TM (e.g., a transition leaving state 2, or a state with no path from state 1). CWL is a necessary but not sufficient condition for a valid TM encoding.
Running outside Jupyter:
python ch16_cwl.py
# ── Block 11: CWL Analysis and Validation ───────────────────────────────────
# CWLValidator is a dependency for ALANAnalyzer (Block 12) and MATHISONAnalyzer (Block 14).
# Run this block before Blocks 12–15.
import re
def analyze_cwl_codeword():
"""
Manually parse a single CWL codeword to show the structural decomposition.
Illustrates how the a^+b a^+b [ab]{5} pattern maps to formal TM components.
"""
print("\nANATOMY OF A CWL CODE WORD")
print("=" * 60)
# Example transition: (1, a) → (3, Δ, R)
# Encoding: ab [state1] + aaab [state3] + ab [read='b'] + ba [write='Δ'] + b [dir=R]
codeword = "abaaababbabb" # Note: corrected from original for clarity
print(f"\nExample code word: {codeword}")
print(f"Expected parse: (1, 'b') → (3, 'Δ', R)")
print("\nDetailed breakdown:")
pos = 0
# --- Parse first state (From): a^n b ---
state1_start = pos
while pos < len(codeword) and codeword[pos] == 'a':
pos += 1
pos += 1 # consume the 'b' delimiter
state1_str = codeword[state1_start:pos]
print(f" Position {state1_start:2d}–{pos-1:2d}: '{state1_str}' "
f"→ State {state1_str.count('a')} (From)")
# --- Parse second state (To): a^n b ---
state2_start = pos
while pos < len(codeword) and codeword[pos] == 'a':
pos += 1
pos += 1
state2_str = codeword[state2_start:pos]
print(f" Position {state2_start:2d}–{pos-1:2d}: '{state2_str}' "
f"→ State {state2_str.count('a')} (To)")
# --- Parse the fixed 5-character block [ab]{5} ---
five_chars = codeword[pos:pos + 5]
print(f" Position {pos:2d}–{pos+4:2d}: '{five_chars}' → Symbol/Direction block")
# Interpret each sub-field of the 5-char block
sym_decode = {'aa': 'a', 'ab': 'b', 'ba': 'Δ', 'bb': '#'}
dir_decode = {'a': 'L', 'b': 'R'}
read_code = five_chars[0:2]
write_code = five_chars[2:4]
dir_code = five_chars[4]
print(f"\n Five-character block breakdown:")
print(f" Read (chars 1–2): '{read_code}' → '{sym_decode.get(read_code, '?')}'")
print(f" Write (chars 3–4): '{write_code}' → '{sym_decode.get(write_code, '?')}'")
print(f" Dir (char 5): '{dir_code}' → '{dir_decode.get(dir_code, '?')}'")
return codeword
example_codeword = analyze_cwl_codeword()
class CWLValidator:
"""
Validates whether a string belongs to CWL (Code Word Language).
CWL = (a+ b a+ b (a|b){5})*
A string in CWL has the correct *shape* for a TM encoding, but may still
describe an invalid machine (e.g., transition from HALT state, missing START state).
CWL membership is necessary but not sufficient for a valid TM encoding.
"""
def __init__(self):
# Build the regex for CWL = (a+ba+b[ab]{5})*
# a+ : one or more 'a's (state encoding prefix)
# b : state encoding delimiter
# [ab]{5}: exactly 5 bits of symbol+direction information
self.pattern = r'^(a+ba+b[ab]{5})*$'
self.regex = re.compile(self.pattern)
def is_valid_cwl(self, cwl_string):
"""
Return True iff cwl_string matches the CWL pattern.
Uses fullmatch to require the pattern to cover the entire string.
"""
return bool(self.regex.fullmatch(cwl_string))
ANATOMY OF A CWL CODE WORD
============================================================
Example code word: abaaababbabb
Expected parse: (1, 'b') → (3, 'Δ', R)
Detailed breakdown:
Position 0– 1: 'ab' → State 1 (From)
Position 2– 5: 'aaab' → State 3 (To)
Position 6–10: 'abbab' → Symbol/Direction block
Five-character block breakdown:
Read (chars 1–2): 'ab' → 'b'
Write (chars 3–4): 'ba' → 'Δ'
Dir (char 5): 'b' → 'R'
import re
class CWLValidator:
def __init__(self):
# Build the regex pattern for CWL
# a+ means one or more 'a's
# (a+b) means either 'a' or 'b'
# {5} means exactly 5 occurrences
self.pattern = r'^(a+ba+b[ab]{5})*'
self.regex = re.compile(self.pattern)
def is_valid_cwl(self, cwl_string):
"""
Checks if the provided string is a valid CWL word.
Returns True if it matches the pattern, False otherwise.
"""
return bool(self.regex.fullmatch(cwl_string))
3.3 Language ALAN#
Since any Turing Machine can be encoded as a string, we can use this string as input to another Turing Machine, or even feed it back into the original machine using its own encoding. This leads to the fascinating idea that TMs can reason about other TMs (or even themselves).
3.3.1 Definition of Language ALAN#
We define the language ALAN:
ALAN = { all words in CWL that are either:
Not accepted by the Turing Machines they represent, or
Do not represent any valid Turing Machine at all }
We introduce ALAN to formalize a particular class of decision problems. It includes all strings that either fail to encode a valid TM or, if they do encode a TM, are not accepted by the machine they describe. In other words, ALAN captures invalid encodings and self-rejecting behaviors.
3.3.2 Examples of Language ALAN#
consider the following three TM encodings:
\(aababaaaaa\): It does not represent a valid TM because it encodes an invalid transition leaving a HALT state. This string is in language ALAN.
\(aaabaabaaaaa\): It does not represent a valid TM because it lacks any transitions involving the START state. This string is in language ALAN.
\(abaabababb\): It represents a valid TM, and after decoding, the TM is shown below. When we feed its own encoding string as input, the machine rejects it. This string is in language ALAN.
graph LR
accTitle: A TM
accDescr: a diagram representing a TM decoded from a given encoding
q1((START))
q2((HALT))
q1 -->|b,b,R| q2
style q1 fill:#90EE90,stroke:#333,stroke-width:3px
style q2 fill:#87CEEB,stroke:#333,stroke-width:2px
\(abaaabaaaabaaabaaabaaaabaaabaabababa\): It represents a valid TM, and after decoding, the TM is shown below. When we feed its own encoding string as input, the machine accepts it. This string is NOT in language ALAN.
graph LR
accTitle: A TM
accDescr: a diagram representing a TM decoded from a given encoding
q1((START))
q2((3))
q3((HALT))
q1 -->|a,a,R| q2
q2 -->|a,a,R| q2
q2 -->|b,b,L| q3
style q1 fill:#90EE90,stroke:#333,stroke-width:3px
style q2 fill:#FFB6C1,stroke:#333,stroke-width:3px
style q3 fill:#87CEEB,stroke:#333,stroke-width:2px
3.3.3 ALAN is NOT Recursively Enumerable#
We shall now prove that the language ALAN is not recursively enumerable. We will do this by contradiction.
Step 1: Suppose, for the sake of contradiction, that ALAN is recursively enumerable. This means there exists a Turing Machine, call it \(T\), such that \(T\) accepts exactly all words in ALAN. Let \(code(T)\) denote the encoding of the machine \(T\) itself.
Step 2: Ask the Critical Question: Is \(code(T)\) a word in ALAN? There are exactly two possibilities:
Yes, \(code(T) \in ALAN\).
No, \(code(T) \notin ALAN\).
Let’s explore each.
Step 3: Case 1 - Assume \(code(T) \in ALAN\), by the definition of ALAN:
Either \(code(T)\) does not represent a valid TM, but we know it does (it’s the encoding of \(T\)).
Or, \(T\) does not accept \(code(T)\).
But wait! We assumed \(T\) accepts all words in ALAN. So if \(code(T) \in ALAN\), \(T\) must accept it. This Contradicts the definition of ALAN, which says that if \(code(T) \in ALAN\), then \(T\) does not accept it.
Step 4: Case 2 — Assume \(code(T) \notin ALAN\), then by the definition of ALAN:
Either \(code(T)\) is a valid encoding and \(T\) accepts \(code(T)\), or
\(code(T)\) is not a valid encoding (which it is, by construction).
So we get: \(code(T)\) is accepted by \(T\).
But \(T\) only accepts words in ALAN. Therefore, \(code(T)\) must be a word in ALAN. Contradiction again.
Step 5: Conclude the Contradiction. In both cases, we reach a contradiction:
If \(code(T) \in ALAN\) → contradiction.
If \(code(T) \notin ALAN\) → contradiction.
Therefore, our initial assumption that ALAN is recursively enumerable must be false.
We conclude that ALAN is not recursively enumerable. This proof mirrors the classic diagonalization arguments used in computability theory, showing that certain languages cannot be captured by any Turing Machine.
3.3.4 Code: ALANAnalyzer — Implements Section 3.3’s ALAN Membership Test#
Depends on: CWLValidator (Block 11) and TuringMachineDecoder (Block 5). Run both blocks first.
ALAN is defined in Section 3.3.1 as: $\(\text{ALAN} = \{w \in \text{CWL} \mid w \text{ does not encode a valid TM, or the TM it encodes does not accept } w\}\)$
The membership test is a four-step cascade:
Step |
Test |
If fails → |
Result |
|---|---|---|---|
1 |
Is |
Exit |
NOT in ALAN (not even CWL-shaped) |
2 |
Does |
Exit |
IN ALAN (“invalid TM” criterion) |
3 |
Does the TM pass validity checks? |
Exit |
IN ALAN (structural invalidity) |
4 |
Does the TM accept |
Stay |
IN ALAN if NO; NOT in ALAN if YES |
Self-reference at Step 4: The machine is run on its own encoding string — this is the computability-theory concept of a machine processing its own description. Section 3.3.3 uses exactly this self-reference to prove ALAN is not recursively enumerable.
Running outside Jupyter:
python ch16_alan.py
# ── Block 12: ALANAnalyzer — ALAN Membership (Section 3.3) ──────────────────
# Depends on: CWLValidator (Block 11), TuringMachineDecoder (Block 5)
#
# ALAN = { w ∈ CWL | w does not encode a valid TM, OR the TM it encodes rejects w }
# Proof that ALAN is not RE: Section 3.3.3 (diagonalization argument)
class ALANAnalyzer:
"""
Determines whether a string belongs to language ALAN (Section 3.3.1).
Membership test cascade (four steps):
Step 1: CWL membership check
Step 2: Structural decode (does it parse as a TM?)
Step 3: TM validity check (has START, HALT, no duplicates, no HALT transitions)
Step 4: Self-simulation (does the TM accept its own encoding?)
"""
def __init__(self):
self.cwl_validator = CWLValidator() # Block 11
self.tm_decoder = TuringMachineDecoder() # Block 5
self.tm_decoder.debug = True # show step-by-step decode trace
def analyze_alan_membership(self, cwl_string):
"""
Four-step ALAN membership test.
Returns: (in_alan: bool, reason: str)
"""
print(f"\nAnalyzing ALAN membership for: {cwl_string[:30]}{'...' if len(cwl_string)>30 else ''}")
print("=" * 60)
# --- Step 1: CWL check ---
# ALAN ⊆ CWL by definition; anything outside CWL is simply not in ALAN
if not self.cwl_validator.is_valid_cwl(cwl_string):
print("✗ Not in CWL → not in ALAN")
return False, "not_cwl"
print("✓ String is in CWL")
# --- Step 2: Decode attempt ---
# If decoding raises an exception, the encoding is malformed → in ALAN
try:
transitions = self.tm_decoder.decode_complete_machine(cwl_string)
print(f"✓ Decoded {len(transitions)} transition(s)")
except Exception as e:
print(f"✓ String is in ALAN (invalid TM encoding: {e})")
return True, "invalid_tm"
# --- Step 3: TM validity checks ---
# A CWL string may parse but still describe an impossible machine
validity_issues = self.check_tm_validity(transitions)
if validity_issues:
print(f"✓ String is in ALAN (structural issue: {validity_issues[0]})")
return True, validity_issues[0]
print("✓ Represents a structurally valid TM")
# --- Step 4: Self-simulation (the self-reference step) ---
# Run the decoded TM on cwl_string itself (its own encoding)
# This implements the "does T accept code(T)?" question from Section 3.3.3
accepts_self = self.simulate_tm_on_self(transitions, cwl_string)
if accepts_self:
print("✗ TM accepts its own encoding → NOT in ALAN")
return False, "accepts_self"
else:
print("✓ TM does not accept its own encoding → IS in ALAN")
return True, "rejects_self"
def check_tm_validity(self, transitions):
"""
Check structural validity of a decoded TM.
Returns a list of issue strings (empty = valid).
"""
issues = []
states = set()
for t in transitions:
states.add(t['from'])
states.add(t['to'])
# Must have a START state (State 1) — Section 1.1.1
if 1 not in states:
issues.append("missing_start_state")
# Must have a HALT state (State 2) — Section 1.1.1
if 2 not in states:
issues.append("missing_halt_state")
# HALT state must be reachable (at least one transition leads to State 2)
if 2 in states and not any(t['to'] == 2 for t in transitions):
issues.append("unreachable_halt_state")
# No duplicate (from, read) pairs — δ must be a function
transition_keys = [(t['from'], t['read']) for t in transitions]
if len(transition_keys) != len(set(transition_keys)):
issues.append("duplicate_transitions")
return issues
def simulate_tm_on_self(self, transitions, input_string):
"""
Run the decoded TM on input_string (its own encoding) for up to max_steps.
This is the self-reference simulation from Section 3.3.3.
Returns True if the TM accepts, False if it rejects or times out.
"""
print("\nSimulating TM on its own encoding (Step 4)...")
# Build δ as a dict: (from_state, read_symbol) → (to_state, write, direction)
trans_dict = {(t['from'], t['read']): (t['to'], t['write'], t['move'])
for t in transitions}
# Initialize tape with the TM's own encoding (not empty — this is self-simulation)
tape = list(input_string) + ['Δ'] * 1000
head = 0
state = 1 # Always start in State 1
steps = 0
max_steps = 10000
while steps < max_steps:
# --- Check for accept state ---
if state == 2:
print(f" Reached HALT state (State 2) after {steps} steps → ACCEPT")
return True
# --- Read current symbol ---
current_symbol = tape[head] if head < len(tape) else 'Δ'
# --- Look up transition ---
key = (state, current_symbol)
if key not in trans_dict:
print(f" No transition for ({state}, '{current_symbol}') → implicit REJECT")
return False
next_state, write_symbol, direction = trans_dict[key]
# --- Execute transition ---
tape[head] = write_symbol
state = next_state
if direction == 'L':
head = max(0, head - 1)
elif direction == 'R':
head += 1
steps += 1
print(f" Exceeded {max_steps} steps → treating as non-acceptance (loop)")
return False
3.3.5: Code: ALAN Examples#
Depends on: ALANAnalyzer (Block 12), which in turn depends on CWLValidator (Block 11) and TuringMachineDecoder (Block 5). Run those blocks first.
This block instantiates ALANAnalyzer and tests four example strings from Section 3.3.2.
def demonstrate_alan_examples():
"""Show various examples of strings in and not in ALAN"""
print("\nEXAMPLES OF ALAN MEMBERSHIP")
print("=" * 60)
examples = [
{
'string': 'ababbabbb',
'description': 'Simple TM with one transition: (1,b)→(1,b,R)',
'expected': 'Likely in ALAN (no halt state reachable)'
},
{
'string': 'abaabbaba',
'description': 'TM that immediately halts: (1,a)→(2,a,L)',
'expected': 'Check if it accepts strings starting with "a"'
},
{
'string': 'aaababbabbb',
'description': 'TM with start state 3 (invalid)',
'expected': 'In ALAN (no state 1)'
},
{
'string': 'abaaabaaabb' + 'aaaabaabbabba',
'description': 'Two transitions: (1,a)→(3,a,R), (4,a)→(2,b,R)',
'expected': 'In ALAN (disconnected states)'
}
]
for ex in examples:
print(f"\nExample: {ex['description']}")
print(f"String: {ex['string']}")
print(f"Expected: {ex['expected']}")
in_alan, reason = alan_analyzer.analyze_alan_membership(ex['string'])
print(f"Result: {'IN ALAN' if in_alan else 'NOT IN ALAN'} (reason: {reason})")
demonstrate_alan_examples()
EXAMPLES OF ALAN MEMBERSHIP
============================================================
Example: Simple TM with one transition: (1,b)→(1,b,R)
String: ababbabbb
Expected: Likely in ALAN (no halt state reachable)
Analyzing ALAN membership for: ababbabbb...
============================================================
✓ String is in CWL
Decoding Turing Machine
Total encoding length: 9 characters
============================================================
Decoding transition starting at position 0
Substring: ababbabbb...
From State: 1 (pos now 2)
To State: 1 (pos now 4)
Read Symbol: 'Δ' (pos now 6)
Write Symbol: '#' (pos now 8)
Direction: R (pos now 9)
Transition 1: (1, 'Δ') → (1, '#', R)
Successfully decoded single transition
Decoded TM contains a single transition
✓ Successfully decoded 1 transitions
✓ String is in ALAN (Reason: missing_halt_state)
Result: IN ALAN (reason: missing_halt_state)
Example: TM that immediately halts: (1,a)→(2,a,L)
String: abaabbaba
Expected: Check if it accepts strings starting with "a"
Analyzing ALAN membership for: abaabbaba...
============================================================
✗ Not in CWL, therefore not in ALAN
Result: NOT IN ALAN (reason: not_cwl)
Example: TM with start state 3 (invalid)
String: aaababbabbb
Expected: In ALAN (no state 1)
Analyzing ALAN membership for: aaababbabbb...
============================================================
✓ String is in CWL
Decoding Turing Machine
Total encoding length: 11 characters
============================================================
Decoding transition starting at position 0
Substring: aaababbabbb...
From State: 3 (pos now 4)
To State: 1 (pos now 6)
Read Symbol: 'Δ' (pos now 8)
Write Symbol: '#' (pos now 10)
Direction: R (pos now 11)
Transition 1: (3, 'Δ') → (1, '#', R)
Successfully decoded single transition
Decoded TM contains a single transition
✓ Successfully decoded 1 transitions
✓ String is in ALAN (Reason: missing_halt_state)
Result: IN ALAN (reason: missing_halt_state)
Example: Two transitions: (1,a)→(3,a,R), (4,a)→(2,b,R)
String: abaaabaaabbaaaabaabbabba
Expected: In ALAN (disconnected states)
Analyzing ALAN membership for: abaaabaaabbaaaabaabbabba...
============================================================
✓ String is in CWL
Decoding Turing Machine
Total encoding length: 24 characters
============================================================
Decoding transition starting at position 0
Substring: abaaabaaabbaaaabaabb...
From State: 1 (pos now 2)
To State: 3 (pos now 6)
Read Symbol: 'a' (pos now 8)
Write Symbol: 'b' (pos now 10)
Direction: R (pos now 11)
Transition 1: (1, 'a') → (3, 'b', R)
Decoding transition starting at position 11
Substring: aaaabaabbabba...
From State: 4 (pos now 16)
To State: 2 (pos now 19)
Read Symbol: 'Δ' (pos now 21)
Write Symbol: '#' (pos now 23)
Direction: L (pos now 24)
Transition 2: (4, 'Δ') → (2, '#', L)
Successfully decoded all 2 transitions
Decoded TM contains 2 transitions
✓ Successfully decoded 2 transitions
✓ Represents a valid TM
Simulating TM on its own encoding...
No transition for (3, 'b') - rejecting
✓ TM does not accept its own encoding - IS in ALAN
Result: IN ALAN (reason: rejects_self)
3.4 Language MATHISON#
Now, let’s consider the other side of the story: what if the CWL words are accepted by the Turing Machines they encode? If we collect all valid TM encodings that are accepted by their respective machines, what language do we obtain?
3.4.1 Definition of Language MATHISON#
We define the language MATHISON:
MATHISON = all words in CWL that:
Are accepted by the Turing Machines they represent, and
Do represent valid Turing Machines }
In other words: the set of all encoded Turing Machines \(T\) such that \(T\), when given its own encoding \(code(T)\) as input, accepts that input.
3.4.2 Examples of Language MATHISON#
Given a TM encoding \(abaabaaabbababababb\)
Decode the first transition: \(abaabaaabb\)
ab = State 1
aab = State 2
aa = Symbol a
ab = Symbol b
b = Direction R
Decode the second transition: \(ababababb\)
ab = State 1
ab = State 1
ab = Symbol b
ab = Symbol b
b = Direction R
Turing Machine:
(1, a) → (2, b, R)
(1, b) → (1, b, R)
This Turing Machine reads a single ‘a’ and moves to the HALT state to accept the string. When we feed its own encoding as input, it accepts it, meaning its encoding is a word in Language MATHISON.
3.4.3 MATHISON is Recursively Enumerable#
To prove that MATHISON is recursively enumerable, we observe that each word in the language represents a valid Turing Machine \(T\) and that this machine accepts the word (its own encoding). Since the word is an encoding of a Turing Machine, we can reconstruct the machine from the encoding. By simulating this reconstructed machine on its own encoding, we know it eventually halts and accepts its encoding. This means we have a Turing Machine that accepts every word in MATHISON.
For words not in the language, the machine may reject or loop forever, which is acceptable for a recognizer. However, we cannot build a Turing Machine that always guarantees rejection for non-members, because determining whether a machine accepts an input is equivalent to solving the Halting Problem, which is undecidable. Therefore, while we can recognize MATHISON by simulating \(T\) on \(w\) and accepting if it halts and accepts, we cannot decide MATHISON because we cannot always detect non-accepting cases (especially when the machine runs forever).
Therefore, we have shown that there exists a Turing Machine that recognizes MATHISON. Thus, MATHISON is recursively enumerable.
The question we have yet to answer is: can we simulate the behavior of any Turing Machine on any arbitrary input string, including its own encoding as input? The answer is yes — we can construct a Universal Turing Machine capable of performing this simulation, which we will explore in the following section.
3.4.4 Complement of Recursively Enumerable Languages#
We are now ready to prove that If \(L\) is a recursively enumerable language, its complement \(L'\) is not necessarily recursively enumerable.
Since \(CWL\) is a regular language, its complement \(CWL'\) is also regular.
Because all regular languages are recursively enumerable, \(CWL'\) is recursively enumerable.
Consider the union language \(L= CWL' \cup MATHISON\). Since both \(CWL'\) and MATHISON are recursively enumerable, their union \(L\) is also recursively enumerable (because the class of recursively enumerable languages is closed under union).
Observe that the complement of \(L\) is exactly the language ALAN: \(L'= ALAN\). But we know that ALAN is not recursively enumerable.
Therefore, we have an example where: \(L\) is recursively enumerable, but its complement \(L'\) is not recursively enumerable.
3.4.5 Code: MATHISONAnalyzer — Implements Section 3.4’s MATHISON Membership Test#
Depends on: CWLValidator (Block 11) and TuringMachineDecoder (Block 5). Run those blocks first.
MATHISON is defined in Section 3.4.1 as: $\(\text{MATHISON} = \{w \in \text{CWL} \mid w \text{ encodes a valid TM that accepts } w\}\)$
MATHISON is the complement of ALAN within CWL (i.e., ALAN ∪ MATHISON = CWL for valid TM encodings). Section 3.4.3 proves MATHISON is recursively enumerable.
Running outside Jupyter:
python ch16_mathison.py
# ── Block 14: MATHISONAnalyzer — MATHISON Membership (Section 3.4) ───────────
# Depends on: CWLValidator (Block 11), TuringMachineDecoder (Block 5)
#
# MATHISON = { w ∈ CWL | w encodes a valid TM T, and T accepts w }
# MATHISON is the complement of ALAN within valid CWL words.
# Proof that MATHISON is RE: Section 3.4.3
#
# ⚠️ Bug fixed: original _simulate_on_empty_tape simulated on blank tape,
# contradicting the definition. Renamed to _simulate_on_own_encoding
# and now passes cwl_string as the tape content.
class MATHISONAnalyzer:
"""
Determines whether a string belongs to language MATHISON (Section 3.4.1).
Membership test cascade (same four steps as ALAN, but Step 4 inverted):
Step 1: CWL check
Step 2: Structural decode
Step 3: TM validity check
Step 4: Self-simulation — TM accepts its encoding? → YES → in MATHISON
"""
def __init__(self):
self.cwl_validator = CWLValidator() # Block 11
self.tm_decoder = TuringMachineDecoder() # Block 5
self.tm_decoder.debug = False
self.max_steps = 10000
def is_in_mathison(self, cwl_string, verbose=True):
"""
Four-step MATHISON membership test.
Returns: (in_mathison: bool, reason: str, steps_taken: int)
"""
if verbose:
print(f"\nAnalyzing MATHISON membership for: "
f"{cwl_string[:30]}{'...' if len(cwl_string)>30 else ''}")
print("=" * 60)
# --- Step 1: CWL check ---
if not self.cwl_validator.is_valid_cwl(cwl_string):
if verbose: print("✗ Not in CWL → not in MATHISON")
return False, "not_cwl", 0
if verbose: print("✓ String is in CWL")
# --- Step 2: Decode ---
try:
transitions = self.tm_decoder.decode_complete_machine(cwl_string)
if verbose: print(f"✓ Decoded {len(transitions)} transition(s)")
except Exception as e:
if verbose: print(f"✗ Invalid TM encoding: {e}")
return False, "invalid_tm", 0
# --- Step 3: Validity checks ---
if not self._is_valid_tm(transitions):
if verbose: print("✗ Missing START or HALT state → not in MATHISON")
return False, "invalid_structure", 0
if verbose: print("✓ Represents a structurally valid TM")
# --- Step 4: Self-simulation (MATHISON criterion: TM must ACCEPT its encoding) ---
# Bug fix: simulate on cwl_string (own encoding), not on empty tape
halts, steps, crash_reason = self._simulate_on_own_encoding(
transitions, cwl_string, verbose
)
if halts:
if verbose:
print(f"✓ TM halts and accepts its own encoding after {steps} steps → IN MATHISON")
return True, "accepts_self", steps
else:
reason = crash_reason if crash_reason else "timeout"
if verbose:
print(f"✗ TM does not accept its own encoding "
f"({'crash: ' + crash_reason if crash_reason else 'did not halt'}) "
f"→ NOT IN MATHISON")
return False, reason, steps
def _is_valid_tm(self, transitions):
"""Minimal validity: START (1) and HALT (2) states must both appear."""
states = set(t['from'] for t in transitions) | set(t['to'] for t in transitions)
return 1 in states and 2 in states
def _simulate_on_own_encoding(self, transitions, cwl_string, verbose=False):
"""
Simulate the decoded TM with cwl_string as the tape input.
This correctly implements the MATHISON criterion: T accepts code(T).
Bug fix from original: was named _simulate_on_empty_tape and initialized tape to blanks.
MATHISON requires the machine to accept its OWN encoding, not an empty tape.
Returns: (halts: bool, steps_taken: int, crash_reason: str|None)
"""
# δ as a Python dict
trans_dict = {(t['from'], t['read']): (t['to'], t['write'], t['move'])
for t in transitions}
# Tape initialized with the TM's own encoding (not blanks — that was the bug)
tape = list(cwl_string) + ['Δ'] * 1000
head = 0
state = 1 # q₀ = State 1
steps = 0
visited_configs = set() # Cycle detection
while steps < self.max_steps:
# --- Accept check ---
if state == 2:
return True, steps, None
# --- Cycle detection ---
tape_segment = ''.join(tape[max(0, head-5):head+6])
config = (state, head, tape_segment)
if config in visited_configs:
if verbose: print(f" Cycle detected at step {steps}")
return False, steps, None
visited_configs.add(config)
# --- Read and transition ---
current_symbol = tape[head] if head < len(tape) else 'Δ'
key = (state, current_symbol)
if key not in trans_dict:
if state == 2:
return True, steps, None # Halt state with no transition = accept
return False, steps, None # No transition = implicit reject
next_state, write_symbol, direction = trans_dict[key]
# --- Boundary check before moving left ---
if direction == 'L' and head == 0:
if verbose:
print(f" CRASH at step {steps}: attempted left-move from cell 0")
return False, steps, "left_boundary_crash"
# --- Execute transition ---
tape[head] = write_symbol
state = next_state
if direction == 'L':
head -= 1
elif direction == 'R':
head += 1
if head >= len(tape):
tape.extend(['Δ'] * 1000)
steps += 1
return False, steps, None # Exceeded max_steps
3.4.6 Code: MATHISON Examples#
Depends on: MATHISONAnalyzer (Block 14), CWLValidator (Block 11), TuringMachineDecoder (Block 5). Run those blocks first.
This block tests six encodings with different MATHISON outcomes, including crash cases (left-boundary violations) and loops.
# ── Block 15: MATHISON Examples ──────────────────────────────────────────────
# Depends on: MATHISONAnalyzer (Block 14)
# Bug fix: mathison_analyzer is instantiated here before demonstrate_mathison_examples()
# ⚠️ Bug fix: original code called demonstrate_mathison_examples() without
# instantiating mathison_analyzer first → NameError in notebook output.
mathison_analyzer = MATHISONAnalyzer() # ← this line was missing in original
def demonstrate_mathison_examples():
"""
Test six TM encodings and report MATHISON membership.
Uses the corrected _simulate_on_own_encoding (not empty-tape simulation).
"""
print("\nEXAMPLES OF MATHISON MEMBERSHIP")
print("=" * 60)
examples = [
{
'name': 'Immediate Halt (Right)',
'encoding': 'abaabbabbb', # δ(1,Δ) = (2,Δ,R): halts immediately on blank
'description': 'Reads blank → HALT immediately. '
'Own encoding starts with "a" not blank, so...',
'expected': False, # Encoding starts with 'a', no δ(1,'a') → rejects self
},
{
'name': 'Crash on Left Move',
'encoding': 'abaabbaaba', # δ(1,Δ) = (2,Δ,L): tries L from cell 0
'description': 'Tries to move left from cell 0 on first step → crash',
'expected': False,
},
{
'name': 'Write Then Halt',
'encoding': 'abaabbabbabaababbaba', # δ(1,Δ)=(1,b,R), δ(1,b)=(2,b,L)
'description': 'Writes b rightward, then moves left. '
'Own encoding starts with "a", so transition for "a" missing.',
'expected': False, # No δ(1,'a') → implicit reject of own encoding
},
{
'name': 'Infinite Right Scan',
'encoding': 'abaabababb', # δ(1,Δ) = (1,Δ,R): rightward loop forever
'description': 'Loops rightward on blanks — never halts on own encoding',
'expected': False,
},
{
'name': 'Accepts Starting \'a\'',
# δ(1,a)=(2,a,R): accepts any string starting with 'a'
# Own encoding starts with 'a' (all TM encodings do) → accepts self!
'encoding': 'abaabaaab',
'description': 'Accepts strings starting with "a". '
'Own encoding starts with "a" → MATHISON!',
'expected': True,
},
{
'name': 'Accepts Starting \'ab\'',
# δ(1,a)=(3,a,R), δ(3,b)=(2,b,R): accepts strings starting with 'ab'
# All TM encodings start with 'ab' → accepts self!
'encoding': 'abaaaabaabaaabaaabb',
'description': 'Accepts strings starting with "ab". '
'All TM encodings start with "ab" → MATHISON!',
'expected': True,
},
]
for ex in examples:
print(f"\n{'─'*60}")
print(f"Name: {ex['name']}")
print(f"Description: {ex['description']}")
print(f"Encoding: {ex['encoding']}")
# Check CWL membership first
if mathison_analyzer.cwl_validator.is_valid_cwl(ex['encoding']):
# Decode to show the transitions
mathison_analyzer.tm_decoder.debug = False
try:
transitions = mathison_analyzer.tm_decoder.decode_complete_machine(
ex['encoding']
)
for t in transitions:
print(f" Transition: δ({t['from']}, '{t['read']}') = "
f"({t['to']}, '{t['write']}', {t['move']})")
except Exception:
print(" (Unable to decode)")
else:
print(" ✗ Not a valid CWL string")
result, reason, steps = mathison_analyzer.is_in_mathison(ex['encoding'], verbose=False)
correct = (result == ex['expected'])
print(f"Expected in MATHISON: {ex['expected']}")
print(f"Actual result: {'IN MATHISON' if result else 'NOT IN MATHISON'} "
f"(reason: {reason}, steps: {steps})")
print(f"{'✓ Correct' if correct else '✗ Incorrect'}")
demonstrate_mathison_examples()
EXAMPLES OF MATHISON MEMBERSHIP
============================================================
────────────────────────────────────────────────────────────
Name: Immediate Halt (Right)
Description: Reads blank → HALT immediately. Own encoding starts with "a" not blank, so...
Encoding: abaabbabbb
Decoding Turing Machine
Total encoding length: 10 characters
============================================================
Transition 1: (1, 'Δ') → (2, '#', R)
Successfully decoded single transition
Decoded TM contains a single transition
Transition: δ(1, 'Δ') = (2, '#', R)
Decoding Turing Machine
Total encoding length: 10 characters
============================================================
Transition 1: (1, 'Δ') → (2, '#', R)
Successfully decoded single transition
Decoded TM contains a single transition
Expected in MATHISON: False
Actual result: IN MATHISON (reason: halts, steps: 1)
✗ Incorrect
────────────────────────────────────────────────────────────
Name: Crash on Left Move
Description: Tries to move left from cell 0 on first step → crash
Encoding: abaabbaaba
Decoding Turing Machine
Total encoding length: 10 characters
============================================================
Transition 1: (1, 'Δ') → (2, 'b', L)
Successfully decoded single transition
Decoded TM contains a single transition
Transition: δ(1, 'Δ') = (2, 'b', L)
Decoding Turing Machine
Total encoding length: 10 characters
============================================================
Transition 1: (1, 'Δ') → (2, 'b', L)
Successfully decoded single transition
Decoded TM contains a single transition
Expected in MATHISON: False
Actual result: NOT IN MATHISON (reason: left_boundary_crash, steps: 0)
✓ Correct
────────────────────────────────────────────────────────────
Name: Write Then Halt
Description: Writes b rightward, then moves left. Own encoding starts with "a", so transition for "a" missing.
Encoding: abaabbabbabaababbaba
✗ Not a valid CWL string
Expected in MATHISON: False
Actual result: NOT IN MATHISON (reason: not_cwl, steps: 0)
✓ Correct
────────────────────────────────────────────────────────────
Name: Infinite Right Scan
Description: Loops rightward on blanks — never halts on own encoding
Encoding: abaabababb
Decoding Turing Machine
Total encoding length: 10 characters
============================================================
Transition 1: (1, 'b') → (2, 'b', R)
Successfully decoded single transition
Decoded TM contains a single transition
Transition: δ(1, 'b') = (2, 'b', R)
Decoding Turing Machine
Total encoding length: 10 characters
============================================================
Transition 1: (1, 'b') → (2, 'b', R)
Successfully decoded single transition
Decoded TM contains a single transition
Expected in MATHISON: False
Actual result: NOT IN MATHISON (reason: timeout, steps: 0)
✓ Correct
────────────────────────────────────────────────────────────
Name: Accepts Starting 'a'
Description: Accepts strings starting with "a". Own encoding starts with "a" → MATHISON!
Encoding: abaabaaab
✗ Not a valid CWL string
Expected in MATHISON: True
Actual result: NOT IN MATHISON (reason: not_cwl, steps: 0)
✗ Incorrect
────────────────────────────────────────────────────────────
Name: Accepts Starting 'ab'
Description: Accepts strings starting with "ab". All TM encodings start with "ab" → MATHISON!
Encoding: abaaaabaabaaabaaabb
✗ Not a valid CWL string
Expected in MATHISON: True
Actual result: NOT IN MATHISON (reason: not_cwl, steps: 0)
✗ Incorrect
4. Practice Exercises#
4.1 State Numbering Convention#
Given the following Turing machine with descriptive state names, apply the standard numbering convention:
States: q_init, q_loop, q_check, q_final, q_process
Start state: q_init
Halt state: q_final
4.2 Encode a Single Transition#
Encode the following transition using the ‘a’ and ‘b’ encoding scheme:
Transition: (State 3, symbol ‘b’) → (State 5, symbol ‘a’, move Left)
4.3 Transition Table Creation#
Create a transition table for a TM that:
Starts in state 1
If it reads ‘a’, it changes it to ‘b’ and halts (state 2)
If it reads ‘b’, it moves right staying in state 1
If it reads blank (Δ), it writes ‘a’ and halts
Task: Create the table with columns: From, To, Read, Write, Move
4.4 Decode a Complete Transition#
Decode the following string: \(aabaaabbabaa\)
4.5 Decode Multiple Transitions#
Decode the following string: \(ababbabaabaabaabbaba\)
4.6 CWL Membership#
Determine if the following strings belong to CWL. If not, explain why:
abaabbabbb
abcabbabbb
abaabbabb
bbaabbabbb
abaabbabbbaabaabbaba
4.7 CWL Pattern Analysis#
Given that \(CWL = (a^+ba^+b(a+b)^5)^*\), explain why a string of length 15 might or might not be in CWL.
4.8 ALAN Membership Analysis#
For each encoding below, determine if it belongs to ALAN and explain why:
aaabaaabbabbb (TM with no state 1)
ababbabbb (TM that loops in state 1)
abaabbaba (TM that halts immediately)
4.9 Self-Reference Problem#
Consider the TM encoded by \(abaabbaaabaaabaaababbb\),
Does this TM accept its own encoding?
Is this encoding in ALAN?
Trace the first 5 steps of execution when the TM runs on its own encoding
4.10 MATHISON Quick Check#
Without full simulation, determine which encodings likely belong to MATHISON:
abaabbaab
ababbaabb
abaaabbaaabaaabaabbaba
4.11 TM Encoder Implementation#
Create a class that encodes Turing Machines into the ‘a’ and ‘b’ string representation. The encoder should handle state numbering conventions and produce lexicographically sorted encodings.
Requirements: implement a TMEncoder class with the following methods:
encode_state(n) - Encodes state number n as a^n b
encode_transition(from, to, read, write, move) - Encodes a single transition
encode_machine(transitions) - Encodes complete TM with lexicographic sorting
4.12 TM Decoder with Error Handling#
Create a robust decoder that can handle malformed encodings and provide detailed error messages.
Requirements: implement a TMDecoder class that:
Decodes state numbers from a^n b patterns
Decodes symbols and directions
Handles incomplete or invalid encodings gracefully
Returns structured transition data
4.13 CWL Validator#
Implement a validator that checks if a string belongs to the Code Word Language (CWL).
Requirements: create a function that validates CWL membership using the pattern: \(CWL = (a^+ba^+b(a+b)^5)^*\)
4.14 ALAN Membership Checker#
Implement a complete ALAN membership checker that determines if a CWL string belongs to language ALAN.
Requirements: create a system that:
Validates CWL membership
Decodes the TM
Checks TM validity
Simulates the TM on its own encoding
Determines ALAN membership
5. Further Reading#
“Introduction to the Theory of Computation” by Michael Sipser, Chapter 3
“Introduction to Computer Theory” by Daniel I.A. Cohen, Chapter 23
“Automata Theory, Languages, and Computation” by Hopcroft, Motwani, and Ullman, Chapter 9