Turing Machine Encoding

Contents

16. Turing Machine Encoding#

Introduction#

This notebook explores key ideas in computational theory through Turing Machine encodings and universal computation. We begin by outlining how Turing Machines can be encoded to standardize the representation of computation. Next, we examine two example languages, ALAN and MATHISON, that highlight how computational problems can be framed within this model.

1. The Encoding of Turing Machines#

A Turing machine can be encoded as a string of symbols, allowing us to represent any Turing machine in a standardized format. This encoding is crucial for theoretical computer science as it enables us to treat Turing machines as data that can be processed by other Turing machines.

1.1 Table-Based Encoding Approach#

The encoding process consists of three main steps:

  • Step 1: Assign numeric codes to states

  • Step 2: Convert the Turing machine into a Tabular Representation

  • Step 3: Encode each transition using ‘a’ and ‘b’ Representation

  • Step 4: Concatenate all individual row encodings into one unified string that encodes the entire Turing Machine

1.1.1 Assigning numeric codes to states#

When encoding a Turing Machine (TM) into a formal representation, it is useful to assign numeric identifiers to each state in the machine. This is a foundational step that supports further encoding processes, especially when we aim to represent the TM as a string of symbols or binary digits. To maintain consistency and simplify the encoding, we adopt the following labeling convention:

  • Start State: Always labeled as state 1. This is the initial state where the machine begins execution.

  • Halt State(s): Always labeled as state 2. This includes any accepting final states. In machines with multiple halting conditions, they can be collapsed into a single halt state for simplicity.

  • Other States: Labeled incrementally from state 3 onward. These represent the working states of the machine. The ordering of the non-special states does not matter, as long as:

    • Each state has a unique label.

    • The transitions reference the correct labels.

    • The same labels are used consistently throughout the machine’s definition.

This flexibility in ordering makes it easier to write or manipulate TMs programmatically, without worrying about the specific sequence of states.

Assigning numeric labels serves several important purposes:

  • Standardization: Provides a consistent way to reference and distinguish states.

  • Simplifies Encoding: Numeric values can be easily encoded in binary, unary, or any symbolic format.

  • Supports Automation: Makes it easier to build simulators, compilers, or universal machines that operate on encoded TMs.

  • Reduces Ambiguity: Prevents confusion that can arise from using arbitrary or descriptive state names like q_start, q_accept.

1.1.2 Converting the Turing machine into a Tabular Representation#

Just like finite automata (FAs) and pushdown automata (PDAs), Turing Machines (TMs) do not need to be represented solely using diagrams. Instead, we can describe TMs using a summary table: a compact, structured representation that captures the transition behavior of the machine in a clear and standardized way. After we assign numeric labels to all states, we convert the Turing machine into a standard tabular format. Each row represents one transition rule with five columns:

  • From: The current state (numeric)

  • To: The next state (numeric)

  • Read: The symbol being read from the tape

  • Write: The symbol to write to the tape

  • Move: The direction to move the tape head (L for Left or R for Right)

Example

        graph LR
    accTitle: A TM
    accDescr: a diagram representing a TM to be converted into a tabular representation
    q1((START))
    q2((q2))
    q3((HALT))
    
    q1 -->|a,a,R| q1
    q1 -->|Δ,Δ,R| q1
    q1 -->|b,a,R| q2
    q2 -->|b,b,L| q2
    q2 -->|Δ,b,L| q3
    
    style q1 fill:#90EE90,stroke:#333,stroke-width:3px
    style q2 fill:#FFB6C1,stroke:#333,stroke-width:3px
    style q3 fill:#87CEEB,stroke:#333,stroke-width:2px
    

We start by assigning numeric codes to states as follows:

  • State 1: Start state (originally labeled “START”)

  • State 2: Halt state (originally labeled “HALT”)

  • State 3: Middle working state (originally labeled “q2”)

So the machine becomes:

        graph LR
    accTitle: A TM
    accDescr: a diagram representing a modifed TM with assigned numeric codes to states
    q1((1))
    q2((3))
    q3((2))
    
    q1 -->|a,a,R| q1
    q1 -->|Δ,Δ,R| q1
    q1 -->|b,a,R| q2
    q2 -->|b,b,L| q2
    q2 -->|Δ,b,L| q3
    
    style q1 fill:#90EE90,stroke:#333,stroke-width:3px
    style q2 fill:#FFB6C1,stroke:#333,stroke-width:3px
    style q3 fill:#87CEEB,stroke:#333,stroke-width:2px
    

Transition Table: the tabular representation of the transtions:

From

To

Read

Write

Move

1

1

a

a

R

1

1

Δ

Δ

R

1

3

b

a

R

3

3

b

b

L

3

2

Δ

b

L

1.1.3 Encode each transition using ‘a’ and ‘b’ representation#

Each transition is defined as a 5-tuple: (CurrentState, ReadSymbol) → (NextState, WriteSymbol, MoveDirection), or in table format:

From

To

Read

Write

Move

\(X_1\)

\(X_2\)

\(X_3\)

\(X_4\)

\(X_5\)

  • State encoding: States are encoded using their numeric position as: \(a^nb\) (n repetitions of ‘a’ followed by ‘b’):

State

Encoding

1

ab

2

aab

3

aaab

4

aaaab

5

aaaaab

  • Symbol Encoding Table: tape symbols are encoded using a fixed 2-character encoding:

Symbol

Encoding

a

aa

b

ab

Δ (blank)

ba

# (special)

bb

  • Direction Encoding Table: movement directions are encoded using single-character encoding:

Direction

Encoding

L (Left)

a

R (Right)

b

Example

The table of transitions from the previous section can be encoded as follows:

From

To

Read

Write

Move

Encoding

1

1

a

a

R

ababaaaab

1

1

Δ

Δ

R

ababbabab

1

3

b

a

R

abaaababaab

3

3

b

b

L

aaabaaabababa

3

2

Δ

b

L

aaabaabbaaba

1.1.4 Concatenate all individual row encodings into one unified string that encodes the entire Turing Machine#

Simply concatenate all encoded rows without any separators: CompleteTMEncoding = EncodedRow₁ + EncodedRow₂ + … + EncodedRowₙ. The encoded rows are concatenated in their lexicographic order to ensure a consistent and unambiguous representation of the Turing machine.

Important Note:

  • No special separators are needed between rows because:

    • Each state encoding ends with ‘b’, making it self-delimiting

    • Symbol encodings have fixed length (2 characters)

    • Direction encodings have fixed length (1 character)

  • The ordering of rows does not affect the machine’s behavior, however, it is important to standardize the order of the encoded rows when concatenating them, because:

    • Without a standard ordering, the same Turing machine could have multiple different encodings.

    • Unique Representation for Each Turing Machine is important to analyze Turing machines in a systematic way

    • A standard ordering simplifies Universal Turing Machine Implementation (We will explore this further later in the notebook)

  • Lexicographic (dictionary) order means sorting strings as you would in a dictionary: character by character, using the alphabet’s order.

Example

  • Let us concatenate all encoded rows in their lexicographic order:

    • ababaaaab

    • ababbabab

    • abaaababaab

    • aaabaaabababa

    • aaabaabbaaba

  • The encodings starting with \(aa\) come before those with \(ab\), So, we reorder the encodings to list those starting with \(aa\) before those starting with \(ab\).

    • aaabaaabababa

    • aaabaabbaaba

    • ababaaaab

    • ababbabab

    • abaaababaab

  • Between the two starting with \(aa\):

    • \(aaabaaabababa\) vs. \(aaabaabbaaba\): compare their seventh character \(a\) vs. \(b\), so \(aaabaaabababa\) comes first.

  • Among those starting with \(ab\):

    • \(abaaababaab\), \(ababaaaab\), \(ababbabab\): compare their fourth character: \(a\), \(b\), \(b\), so \(abaaababaab\) comes before the others.

    • \(ababaaaab\), \(ababbabab\): compare their fifth character: \(a\), \(b\), so \(ababaaaab\) comes before the other.

  • Final sorted order:

    • aaabaaabababa

    • aaabaabbaaba

    • abaaababaab

    • ababaaaab

    • ababbabab

  • Simply join them without separators is the complete Turing machine encoding (visually separate the parts by color): aaabaaabababa aaabaabbaaba abaaababaab ababaaaab ababbabab

1.1.5 Code: TuringMachine - Data Container for Encoding#

⚠️ This is a different TuringMachine class from Chapter 15. The Chapter 15 version has run(), step(), transition_function, accept_state, and reject_state. This Chapter 16 version is a data container only — it stores the machine’s specification and auto-assigns numeric state IDs. It has no run() method and cannot simulate computation. Do not mix instances from the two chapters.

This class implements Step 1 (state numbering) and Step 2 (tabular representation) of Section 1.1.

Formal convention → Python mapping

Convention (Section 1.1.1)

Python encoding

Start state → always State 1

mapping[self.start_state] = 1

Halt state(s) → always State 2

mapping[halt_state] = 2 for all halt states

All other states → 3, 4, 5, …

mapping[state] = state_num; state_num += 1

Transition as 5-tuple \((q, r) \to (q', w, d)\)

transitions dict: (state_name, symbol) (next_state, write, dir)

What to look for: _create_state_mapping() enforces the numbering convention automatically. After construction, self.state_mapping is the dictionary that all encoder methods use to translate symbolic state names to their numeric codes.

Running outside Jupyter: This class is a dependency — save it at the top of ch16_encoder.py. It produces no output on its own.

# ── Block 1: TuringMachine — Data Container for Encoding ────────────────────
# This class is NOT the simulation TM from Chapter 15.
# It stores the machine's formal specification and auto-assigns numeric state IDs.
# It has NO run() or step() method. Use it as input to TMEncoder (Block 2).

class TuringMachine:
    """
    Data container representing a Turing Machine's formal specification.
    Implements the state-numbering convention from Section 1.1.1:
      State 1 = start state  (always)
      State 2 = halt state   (always; multiple halt states all map to 2)
      State 3+ = other states (assigned in iteration order)
    """

    def __init__(self, states, alphabet, tape_alphabet, transitions,
                 start_state, halt_states):
        """
        Args:
            states:        List of all state names (arbitrary strings)
            alphabet:      Input alphabet Σ (list of symbols, no blank)
            tape_alphabet: Tape alphabet Γ = Σ ∪ {blank, ...}
            transitions:   Dict mapping (state_name, symbol) → (next_state, write, dir)
                           Each value is a 3-tuple: (next_state_name, write_symbol, 'L'|'R'|'S')
            start_state:   Name of the start state q₀ (will be mapped to integer 1)
            halt_states:   Name(s) of halt state(s) — a string or list of strings
                           (all mapped to integer 2; convention collapses multiple halt states)
        """
        self.states       = states
        self.alphabet     = alphabet
        self.tape_alphabet = tape_alphabet
        self.transitions  = transitions
        self.start_state  = start_state

        # Normalize halt_states to a list regardless of how it was passed
        self.halt_states = halt_states if isinstance(halt_states, list) else [halt_states]

        # Apply the Section 1.1.1 numbering convention
        self.state_mapping = self._create_state_mapping()

    def _create_state_mapping(self):
        """
        Assign numeric IDs following the convention:
          1 → start state
          2 → all halt states
          3, 4, 5, ... → remaining states (in iteration order of self.states)

        Returns: dict mapping state_name (str) → state_number (int)
        """
        mapping = {}

        # Convention Rule: start state is always 1
        mapping[self.start_state] = 1

        # Convention Rule: all halt states collapse to 2
        # (Multiple accept states in a machine are treated as one HALT state)
        for halt_state in self.halt_states:
            mapping[halt_state] = 2

        # Remaining states: numbered 3, 4, 5, ... in iteration order
        state_num = 3
        for state in self.states:
            if state not in mapping:   # skip start and halt — already assigned
                mapping[state] = state_num
                state_num += 1

        return mapping

    def display_info(self):
        """Print the machine's specification and the resulting state mapping."""
        print("Turing Machine Configuration:")
        print(f"  States:        {self.states}")
        print(f"  Start state:   {self.start_state}")
        print(f"  Halt state(s): {self.halt_states}")
        print(f"  Input alphabet:  {self.alphabet}")
        print(f"  Tape alphabet:   {self.tape_alphabet}")

        print(f"\nState Mapping (Section 1.1.1 convention):")
        for state, num in sorted(self.state_mapping.items(), key=lambda x: x[1]):
            role = " (START)" if num == 1 else " (HALT)" if num == 2 else ""
            print(f"  {state}{num}{role}")

        print(f"\nNumber of transitions: {len(self.transitions)}")

1.1.6 Code: TMEncoder — Implements the a/b Encoding Scheme#

Depends on: TuringMachine (Block 1). Run Block 1 first.

This class implements Steps 3 and 4 of Section 1.1: encoding each transition as an a/b string and concatenating in lexicographic order.

Complete encoding table

Element

Formal notation

Encoding rule

Example

State \(n\)

\(a^n b\)

\(n\) repetitions of 'a' then 'b'

State 3 → aaab

Symbol a

fixed 2 chars: aa

'a'aa

Symbol b

fixed 2 chars: ab

'b'ab

Blank \(\Delta\)

fixed 2 chars: ba

'Δ'ba

Special #

fixed 2 chars: bb

'#'bb

Direction L

1 char: a

'L'a

Direction R

1 char: b

'R'b

⚠️ Design warning: symbol_codes also maps digits '0''7' to 3-character codes (e.g., '0''aaa'). This violates the fixed-2-character rule for symbols described in Section 1.1.3 and will produce decodings that fail with TuringMachineDecoder. Do not use digit symbols in encoding exercises unless the decoder is also extended.

⚠️ 'S' (Stay) direction maps to 'ab' (2 characters), violating the fixed-1-character rule for directions. Real Turing machines typically do not have a Stay move; this is a convenience extension. Transitions using 'S' cannot be decoded by the standard TuringMachineDecoder.

One transition’s encoding structure (9 characters minimum):

encode_state(from)  encode_state(to)  encode_symbol(read)  encode_symbol(write)  encode_direction(move)
  a^{n1}b              a^{n2}b             2 chars               2 chars               1 char

This is self-delimiting because state encodings end in 'b' and symbol encodings have fixed length — no separator is needed.

Running outside Jupyter: Save Blocks 1+2 in ch16_encoder.py. Run:

python ch16_encoder.py
# ── Block 2: TMEncoder — Implements Section 1.1.3 and 1.1.4 ────────────────
# Encodes a TuringMachine (Block 1) into an 'a'/'b' string.
# Depends on: TuringMachine (Block 1)

class TMEncoder:
    """
    Encodes Turing Machine transitions as 'a'/'b' strings following Section 1.1.3.

    Encoding rules:
      State n:      a^n b   (variable length, self-delimiting — 'b' marks the end)
      Tape symbol:  2-char fixed code from symbol_codes table
      Direction:    1-char code ('a' = Left, 'b' = Right)

    A single transition (From, To, Read, Write, Move) encodes as:
      encode_state(From) + encode_state(To) + encode_symbol(Read)
                         + encode_symbol(Write) + encode_direction(Move)
    """

    def __init__(self):
        # --- Symbol encoding table (Section 1.1.3, fixed 2-char codes) ---
        # ⚠️ WARNING: digits '0'–'7' use 3-char codes, violating the 2-char rule.
        # Do not use digit symbols in exercises meant to be decoded by TuringMachineDecoder.
        self.symbol_codes = {
            'a':  'aa',   # Core alphabet symbol
            'b':  'ab',   # Core alphabet symbol
            'Δ':  'ba',   # Blank symbol (also accept '_' as alternative notation)
            '_':  'ba',   # Alternative blank notation
            '#':  'bb',   # Special marker symbol
            # Extended alphabet for binary machines (3-char codes — non-standard):
            '0': 'aaa', '1': 'aab', '2': 'aba', '3': 'abb',
            '4': 'baa', '5': 'bab', '6': 'bba', '7': 'bbb',
        }

        # --- Direction encoding table (Section 1.1.3, 1-char codes) ---
        # ⚠️ 'S' (Stay) uses 2 chars — non-standard extension, not decodable by TuringMachineDecoder
        self.direction_codes = {
            'L': 'a',    # Move head Left
            'R': 'b',    # Move head Right
            'S': 'ab',   # Stay (non-standard; not in standard TM model)
        }

    # ── Primitive encoders ───────────────────────────────────────────────────

    def encode_state(self, state_num):
        """
        Encode state number n as a^n b (Section 1.1.3).
        The trailing 'b' makes state encodings self-delimiting:
        a decoder can find the end of one state's encoding by scanning for 'b'.
        """
        if not isinstance(state_num, int) or state_num < 1:
            raise ValueError(f"State must be a positive integer, got {state_num}")
        return 'a' * state_num + 'b'   # e.g., state 3 → 'aaab'

    def encode_symbol(self, symbol):
        """
        Encode a tape symbol as a fixed-length 2-character code (Section 1.1.3).
        Fixed length means no delimiter is needed between symbol encodings.
        """
        if symbol not in self.symbol_codes:
            raise ValueError(f"Unknown symbol: '{symbol}'. "
                             f"Valid symbols: {list(self.symbol_codes.keys())}")
        return self.symbol_codes[symbol]

    def encode_direction(self, direction):
        """
        Encode move direction as a fixed-length 1-character code (Section 1.1.3).
        """
        if direction not in self.direction_codes:
            raise ValueError(f"Unknown direction: '{direction}'. Valid: L, R, S")
        return self.direction_codes[direction]

    # ── Composite encoders ───────────────────────────────────────────────────

    def encode_transition(self, from_state, to_state, read_sym, write_sym, direction):
        """
        Encode one row of the transition table as a single 'a'/'b' string.

        Output structure (all parts concatenated, no separators):
          a^{from}b  a^{to}b  2-char-read  2-char-write  1-char-direction
          └──────┘  └─────┘  └─────────┘  └──────────┘  └─────────────┘
          ≥2 chars  ≥2 chars   2 chars       2 chars        1 char

        Minimum total: 2+2+2+2+1 = 9 characters (from=1, to=1, standard symbols/direction)
        """
        return (
            self.encode_state(from_state)     +   # X1: From state
            self.encode_state(to_state)       +   # X2: To state
            self.encode_symbol(read_sym)      +   # X3: Read symbol
            self.encode_symbol(write_sym)     +   # X4: Write symbol
            self.encode_direction(direction)      # X5: Move direction
        )

    def encode_tm(self, tm, use_lexicographic=True):
        """
        Encode a complete Turing Machine (Section 1.1.4).

        Steps:
          1. Translate each transition's state names to numeric IDs via tm.state_mapping
          2. Encode each transition individually
          3. Sort lexicographically (if use_lexicographic=True) to ensure unique representation
          4. Concatenate all encoded transitions without separators

        Lexicographic sorting (Section 1.1.4) is required for uniqueness:
        without it, the same TM could have multiple valid encodings.
        """
        # --- Step 1+2: Translate and encode each transition ---
        numeric_transitions = []

        for (state, symbol), (next_state, write_symbol, direction) in tm.transitions.items():
            from_num = tm.state_mapping[state]       # Symbolic → numeric state ID
            to_num   = tm.state_mapping[next_state]

            encoded = self.encode_transition(from_num, to_num, symbol, write_symbol, direction)

            numeric_transitions.append({
                'from':    from_num,
                'to':      to_num,
                'read':    symbol,
                'write':   write_symbol,
                'move':    direction,
                'encoded': encoded,
            })

        # --- Step 3: Lexicographic sort for canonical ordering ---
        if use_lexicographic:
            numeric_transitions.sort(key=lambda x: x['encoded'])

        # --- Step 4: Concatenate (no separators — encoding is self-delimiting) ---
        encoded_tm = ''.join(t['encoded'] for t in numeric_transitions)

        return encoded_tm, numeric_transitions

    def display_encoding_details(self, tm, encoded_tm, transitions):
        """Print a formatted table showing each transition's encoding."""
        print("\nEncoding Details:")
        print("-" * 100)
        print(f"{'From':<10} {'To':<10} {'Read':<10} {'Write':<10} {'Move':<10} {'Encoding'}")
        print("-" * 100)

        for trans in transitions:
            # Reverse-lookup: numeric ID → symbolic state name
            # Bug fix: [s for ...][0] — was missing [0] in original, causing list to print as string
            from_name = [s for s, n in tm.state_mapping.items() if n == trans['from']][0]
            to_name   = [s for s, n in tm.state_mapping.items() if n == trans['to']][0]

            print(f"{from_name:<10} {to_name:<10} {trans['read']:<10} "
                  f"{trans['write']:<10} {trans['move']:<10} {trans['encoded']:<30}")

        print("-" * 100)
        print(f"\nComplete Encoding ({len(encoded_tm)} characters):")
        print(encoded_tm)

1.1.7 Code: Three Encoding Examples + Verification#

Depends on: TuringMachine (Block 1) and TMEncoder (Block 2). Run Blocks 1–2 first.

This block demonstrates encoding on three concrete machines of increasing complexity:

Example

Machine purpose

States

Transitions

Key concept illustrated

example_simple_tm()

Converts all 'a's to 'b's

3

6

Basic encoding with 'S' direction (non-standard)

example_from_table()

TM from Section 1.1.2’s worked example

3

5

Round-trip: table → encoding matches the textbook

example_binary_increment()

Adds 1 to a binary number

5

12

Multi-state encoding with binary digits (3-char symbol codes)

⚠️ example_simple_tm() and example_binary_increment() use 'S' (Stay) or binary digit symbols — see Block 2’s warnings. Their encodings are not decodable by TuringMachineDecoder (Block 5). Only example_from_table() produces a fully round-trip–compatible encoding.

Running outside Jupyter:

python ch16_encoder.py   # runs all three examples when __name__ == "__main__"
# ── Block 3: Encoding Examples ───────────────────────────────────────────────
# Depends on: TuringMachine (Block 1), TMEncoder (Block 2)
# ⚠️ example_simple_tm() and example_binary_increment() use 'S' or digit symbols
#    which are NOT compatible with TuringMachineDecoder (Block 5).

def example_simple_tm():
    """
    Example 1: TM that converts every 'a' to 'b'.
    Demonstrates basic encoding with three states and six transitions.
    ⚠️ Uses 'S' (Stay) direction — non-standard, not decodable by Block 5 decoder.
    """
    print("=" * 100)
    print("Example 1: Simple TM that converts 'a' to 'b'")
    print("=" * 100)

    states       = ['q_start', 'q_scan', 'q_halt']
    alphabet     = ['a', 'b']
    tape_alphabet = ['a', 'b', 'Δ']

    # Transition function: (state, read) → (next_state, write, direction)
    transitions = {
        # --- Phase: scan rightward, replacing 'a' with 'b' ---
        ('q_start', 'a'): ('q_scan',  'b', 'R'),   # First 'a' → 'b', begin scan
        ('q_start', 'b'): ('q_start', 'b', 'R'),   # Skip 'b', continue scanning
        ('q_start', 'Δ'): ('q_halt',  'Δ', 'S'),   # End of tape → halt (Stay)

        ('q_scan', 'a'):  ('q_scan',  'b', 'R'),   # Replace subsequent 'a's
        ('q_scan', 'b'):  ('q_scan',  'b', 'R'),   # Skip 'b', continue scanning
        ('q_scan', 'Δ'):  ('q_halt',  'Δ', 'S'),   # End of tape → halt (Stay)
    }

    tm      = TuringMachine(states, alphabet, tape_alphabet, transitions, 'q_start', 'q_halt')
    encoder = TMEncoder()

    tm.display_info()
    encoded, trans_list = encoder.encode_tm(tm)
    encoder.display_encoding_details(tm, encoded, trans_list)

    return tm, encoded


def example_from_table():
    """
    Example 2: TM from the Section 1.1.2 worked example.
    This is the ONLY example in this block fully compatible with TuringMachineDecoder.
    The output encoding should match the textbook's lexicographically sorted result.
    """
    print("\n" + "=" * 100)
    print("Example 2: TM from the Section 1.1.2 transition table")
    print("=" * 100)

    # State names already match numeric convention (1=start, 2=halt, 3=working)
    states       = ['1', '2', '3']
    alphabet     = ['a', 'b']
    tape_alphabet = ['a', 'b', 'Δ']

    # Transition table from Section 1.1.2:
    #   From  To  Read  Write  Move
    #   1     1   a     a      R     (self-loop: skip 'a')
    #   1     1   Δ     Δ      R     (self-loop: skip blank → causes loops on some inputs)
    #   1     3   b     a      R     (first 'b' → write 'a', go to state 3)
    #   3     3   b     b      L     (self-loop: scan back over 'b')
    #   3     2   Δ     b      L     (write 'b', halt)
    transitions = {
        ('1', 'a'): ('1', 'a', 'R'),
        ('1', 'Δ'): ('1', 'Δ', 'R'),
        ('1', 'b'): ('3', 'a', 'R'),
        ('3', 'b'): ('3', 'b', 'L'),
        ('3', 'Δ'): ('2', 'b', 'L'),
    }

    tm      = TuringMachine(states, alphabet, tape_alphabet, transitions, '1', '2')
    encoder = TMEncoder()

    tm.display_info()
    encoded, trans_list = encoder.encode_tm(tm)
    encoder.display_encoding_details(tm, encoded, trans_list)

    return tm, encoded


def example_binary_increment():
    """
    Example 3: TM that adds 1 to a binary number on the tape.
    Demonstrates encoding of a 5-state machine with 12 transitions.
    ⚠️ Uses binary digit symbols ('0', '1') which use 3-char codes in TMEncoder —
       this violates the 2-char fixed-length rule. Not decodable by Block 5.
    """
    print("\n" + "=" * 100)
    print("Example 3: Binary Increment TM")
    print("=" * 100)

    states       = ['q0', 'q1', 'q2', 'q3', 'qaccept']
    alphabet     = ['0', '1']
    tape_alphabet = ['0', '1', 'Δ']

    transitions = {
        # --- Phase 1: Scan right to the last digit ---
        ('q0', '0'): ('q0',      '0', 'R'),
        ('q0', '1'): ('q0',      '1', 'R'),
        ('q0', 'Δ'): ('q1',      'Δ', 'L'),   # Reached end → start adding 1

        # --- Phase 2: Add 1 with carry propagation ---
        ('q1', '0'): ('q2',      '1', 'L'),   # 0+1=1, no carry → done with increment
        ('q1', '1'): ('q1',      '0', 'L'),   # 1+1=0, carry 1 → continue left
        ('q1', 'Δ'): ('q3',      '1', 'R'),   # Overflow: write leading 1, fix up

        # --- Phase 3: Scan left back to start ---
        ('q2', '0'): ('q2',      '0', 'L'),
        ('q2', '1'): ('q2',      '1', 'L'),
        ('q2', 'Δ'): ('qaccept', 'Δ', 'R'),   # Back at start → accept

        # --- Phase 4: Handle overflow (all 1s → prepend 1, rest already 0) ---
        ('q3', '0'): ('q3',      '0', 'R'),
        ('q3', '1'): ('q3',      '1', 'R'),
        ('q3', 'Δ'): ('qaccept', 'Δ', 'S'),   # Done → accept
    }

    tm      = TuringMachine(states, alphabet, tape_alphabet, transitions, 'q0', 'qaccept')
    encoder = TMEncoder()

    tm.display_info()
    encoded, trans_list = encoder.encode_tm(tm)
    encoder.display_encoding_details(tm, encoded, trans_list)

    return tm, encoded


def verify_encoding(encoded_string):
    """
    Quick sanity check: a valid TM encoding must consist only of 'a' and 'b'.
    This is necessary but not sufficient — CWLValidator (Block 11) provides full validation.
    """
    if not all(c in 'ab' for c in encoded_string):
        return False, "String contains characters other than 'a' and 'b'"
    if len(encoded_string) == 0:
        return False, "Empty encoding"
    return True, "Valid character set (passes basic check)"


# ── Main runner ──────────────────────────────────────────────────────────────
if __name__ == "__main__":
    tm1, enc1 = example_simple_tm()
    tm2, enc2 = example_from_table()
    tm3, enc3 = example_binary_increment()

    # --- Encoding statistics ---
    print("\n" + "=" * 100)
    print("Encoding Verification")
    print("=" * 100)

    for i, (tm, enc) in enumerate([(tm1, enc1), (tm2, enc2), (tm3, enc3)], 1):
        valid, msg = verify_encoding(enc)
        print(f"Example {i}: {msg}")
        print(f"  Length: {len(enc)} characters")
        print(f"  'a' count: {enc.count('a')}, 'b' count: {enc.count('b')}")
        if enc.count('b') > 0:
            print(f"  Ratio a:b = {enc.count('a') / enc.count('b'):.2f}:1")
====================================================================================================
Example 1: Simple TM that converts 'a' to 'b'
====================================================================================================
Turing Machine Configuration:
  States:        ['q_start', 'q_scan', 'q_halt']
  Start state:   q_start
  Halt state(s): ['q_halt']
  Input alphabet:  ['a', 'b']
  Tape alphabet:   ['a', 'b', 'Δ']

State Mapping (Section 1.1.1 convention):
  q_start → 1 (START)
  q_halt → 2 (HALT)
  q_scan → 3

Number of transitions: 6

Encoding Details:
----------------------------------------------------------------------------------------------------
From       To         Read       Write      Move       Encoding
----------------------------------------------------------------------------------------------------
q_scan     q_scan     a          b          R          aaabaaabaaabb                 
q_scan     q_scan     b          b          R          aaabaaabababb                 
q_scan     q_halt     Δ          Δ          S          aaabaabbabaab                 
q_start    q_scan     a          b          R          abaaabaaabb                   
q_start    q_halt     Δ          Δ          S          abaabbabaab                   
q_start    q_start    b          b          R          ababababb                     
----------------------------------------------------------------------------------------------------

Complete Encoding (70 characters):
aaabaaabaaabbaaabaaabababbaaabaabbabaababaaabaaabbabaabbabaabababababb

====================================================================================================
Example 2: TM from the Section 1.1.2 transition table
====================================================================================================
Turing Machine Configuration:
  States:        ['1', '2', '3']
  Start state:   1
  Halt state(s): ['2']
  Input alphabet:  ['a', 'b']
  Tape alphabet:   ['a', 'b', 'Δ']

State Mapping (Section 1.1.1 convention):
  1 → 1 (START)
  2 → 2 (HALT)
  3 → 3

Number of transitions: 5

Encoding Details:
----------------------------------------------------------------------------------------------------
From       To         Read       Write      Move       Encoding
----------------------------------------------------------------------------------------------------
3          3          b          b          L          aaabaaabababa                 
3          2          Δ          b          L          aaabaabbaaba                  
1          3          b          a          R          abaaababaab                   
1          1          a          a          R          ababaaaab                     
1          1          Δ          Δ          R          ababbabab                     
----------------------------------------------------------------------------------------------------

Complete Encoding (54 characters):
aaabaaabababaaaabaabbaabaabaaababaabababaaaabababbabab

====================================================================================================
Example 3: Binary Increment TM
====================================================================================================
Turing Machine Configuration:
  States:        ['q0', 'q1', 'q2', 'q3', 'qaccept']
  Start state:   q0
  Halt state(s): ['qaccept']
  Input alphabet:  ['0', '1']
  Tape alphabet:   ['0', '1', 'Δ']

State Mapping (Section 1.1.1 convention):
  q0 → 1 (START)
  qaccept → 2 (HALT)
  q1 → 3
  q2 → 4
  q3 → 5

Number of transitions: 12

Encoding Details:
----------------------------------------------------------------------------------------------------
From       To         Read       Write      Move       Encoding
----------------------------------------------------------------------------------------------------
q3         q3         0          0          R          aaaaabaaaaabaaaaaab           
q3         q3         1          1          R          aaaaabaaaaabaabaabb           
q3         qaccept    Δ          Δ          S          aaaaabaabbabaab               
q2         q2         0          0          L          aaaabaaaabaaaaaaa             
q2         q2         1          1          L          aaaabaaaabaabaaba             
q2         qaccept    Δ          Δ          R          aaaabaabbabab                 
q1         q3         Δ          1          R          aaabaaaaabbaaabb              
q1         q2         0          1          L          aaabaaaabaaaaaba              
q1         q1         1          0          L          aaabaaabaabaaaa               
q0         q1         Δ          Δ          L          abaaabbabaa                   
q0         q0         0          0          R          ababaaaaaab                   
q0         q0         1          1          R          ababaabaabb                   
----------------------------------------------------------------------------------------------------

Complete Encoding (180 characters):
aaaaabaaaaabaaaaaabaaaaabaaaaabaabaabbaaaaabaabbabaabaaaabaaaabaaaaaaaaaaabaaaabaabaabaaaaabaabbababaaabaaaaabbaaabbaaabaaaabaaaaabaaaabaaabaabaaaaabaaabbabaaababaaaaaabababaabaabb

====================================================================================================
Encoding Verification
====================================================================================================
Example 1: Valid character set (passes basic check)
  Length: 70 characters
  'a' count: 42, 'b' count: 28
  Ratio a:b = 1.50:1
Example 2: Valid character set (passes basic check)
  Length: 54 characters
  'a' count: 34, 'b' count: 20
  Ratio a:b = 1.70:1
Example 3: Valid character set (passes basic check)
  Length: 180 characters
  'a' count: 133, 'b' count: 47
  Ratio a:b = 2.83:1

2. The Decoding of Turing Machines#

Decoding is the reverse process of encoding and we take an encoded string and reconstruct the original Turing machine. The decoding process follows these key principles:

  • Each transition has exactly 5 components (From, To, Read, Write, Move)

  • State numbers are encoded as repeated \(a\)’s followed by a \(b\)

  • Symbols use fixed 2-character codes (aa, ab, ba, bb)

  • Directions use single-character codes (a for L, b for R)

  • No explicit separators exist between transitions

2.1 Step-by-Step Algorithm#

  • Step 1: Parsing individual transitions from the encoded string

    • Decode the “From” state number by counting the number of repeated \(a\) symbols in the encoding.

    • Skip the next \(b\) since it only actss as a separator

    • Decode the “To” state number by counting the number of repeated \(a\) symbols in the encoding.

    • Skip the next \(b\) since it only actss as a separator

    • Decoding the “Read” symbol by reading the next two characters and decoding it using the Symbol encoding table in reverse

    • Decoding the “Write” symbol by reading the next two characters and decoding it using the Symbol encoding table in reverse

    • Decode the direction by reading the next character and using the direction table in reverse.

    • Fill one row of the transition table with all the decoded information to form a complete transition. Step 2: Repeat Step 1 if there are still unprocessed characters in the encoding.

2.2 Examples#

Given an encoding string \(ababababbabaaabaaabbaaabaaabababaaaabaabbaaba\), let’s decode this string by parsing each component of the transitions.

  • Transition 1, starting at position 0:

    • From State: ab → 1 ‘a’ + ‘b’ = State 1

    • To State: ab → 1 ‘a’ + ‘b’ = State 1

    • Read Symbol: ab → ‘b’

    • Write Symbol: ab → ‘b’

    • Direction: b → R (right)

    • Transition 1: (1, b) → (1, b, R)

    • Position after: 9

  • Transition 2: starting at position 9:

    • From State: ab → 1 ‘a’ + ‘b’ = State 1

    • To State: aaab → 3 ‘a’s + ‘b’ = State 3

    • Read Symbol: aa → ‘a’

    • Write Symbol: ab → ‘b’

    • Direction: b → R (right)

    • Transition 2: (1, a) → (3, b, R)

    • Position after: 19

  • Transition 3: starting at position 19:

    • From State: aaab → 3 ‘a’s + ‘b’ = State 3

    • To State: aaab → 3 ‘a’s + ‘b’ = State 3

    • Read Symbol: ab → ‘b’

    • Write Symbol: ab → ‘b’

    • Direction: a → L (left)

    • Transition 3: (3, b) → (3, b, L)

    • Position after: 30

  • Transition 4: starting at position 30:

    • From State: aaab → 3 ‘a’s + ‘b’ = State 3

    • To State: aab → 2 ‘a’s + ‘b’ = State 2

    • Read Symbol: ba → ‘Δ’ (blank)

    • Write Symbol: ab → ‘b’

    • Direction: a → L (left)

    • Transition 4: (3, Δ) → (2, b, L)

    • Position after: 42 (end of string)

Summary of Decoded Transitions:

Transition

From

Read

To

Write

Move

1

1

b

1

b

R

2

1

a

3

b

R

3

3

b

3

b

L

4

3

Δ

2

b

L

2.3 Code: TuringMachineDecoder — Implements Section 2.1 (Step-by-Step Algorithm)#

Depends on: Nothing — this class is self-contained. However, its outputs feed into Blocks 6–10.

This class implements the reverse of TMEncoder: given an a/b string, it reconstructs the sequence of \((q, r) \to (q', w, d)\) tuples without any separators in the input.

Formal parsing rules (Section 2.1)

Component

Parsing rule

Code

From state

Count as until b; that count = state number

decode_state()

To state

Same, immediately after From state

decode_state()

Read symbol

Read next 2 characters; look up in reverse symbol table

decode_symbol()

Write symbol

Read next 2 characters; look up in reverse symbol table

decode_symbol()

Direction

Read next 1 character; look up in reverse direction table

decode_direction()

Minimum valid transition length: 'ab' + 'ab' + 'aa' + 'aa' + 'b' = 9 characters (from=1, to=1, read=a, write=a, direction=R). The decoder enforces this minimum in decode_complete_machine().

Running outside Jupyter: Save this class alongside TuringMachine and TMEncoder in ch16_encoder.py, then run:

python ch16_encoder.py
# ── Block 5: TuringMachineDecoder — Implements Section 2.1 ──────────────────
# Reverses the TMEncoder process: parses an 'a'/'b' string into transition tuples.
# No dependencies on prior blocks — but its output feeds Blocks 6–10.

class TuringMachineDecoder:
    """
    Decodes an 'a'/'b' string into a list of Turing Machine transitions.

    Decoding exploits the self-delimiting property of the encoding:
      - State: variable-length a^n b (count 'a's until 'b' → state number n)
      - Symbol: fixed-length 2 chars (look up in symbol_decode)
      - Direction: fixed-length 1 char (look up in direction_decode)
    No separator characters exist — parsing advances a position pointer left-to-right.
    """

    def __init__(self):
        # Reverse of TMEncoder's symbol_codes (2-char code → symbol name)
        self.symbol_decode = {
            'aa': 'a',
            'ab': 'b',
            'ba': 'Δ',   # Blank
            'bb': '#',   # Special marker
        }

        # Reverse of TMEncoder's direction_codes (1-char code → direction)
        self.direction_decode = {
            'a': 'L',
            'b': 'R',
        }

        self.debug = True   # Print step-by-step position trace when True

    # ── Primitive decoders ───────────────────────────────────────────────────

    def decode_state(self, encoding, start_pos):
        """
        Decode a state number at position start_pos.
        Rule: count leading 'a's → that count is the state number.
              the immediately following 'b' is consumed as the delimiter.

        Returns: (state_number: int, new_position: int)
        """
        pos     = start_pos
        a_count = 0

        # Count consecutive 'a's
        while pos < len(encoding) and encoding[pos] == 'a':
            a_count += 1
            pos += 1

        # The 'b' delimiter must follow
        if pos < len(encoding) and encoding[pos] == 'b':
            pos += 1   # consume the 'b'
            return a_count, pos
        else:
            raise ValueError(
                f"Invalid state encoding at position {start_pos}: "
                f"expected 'b' after {a_count} 'a's, found "
                f"'{encoding[pos] if pos < len(encoding) else 'end-of-string'}'"
            )

    def decode_symbol(self, encoding, start_pos):
        """
        Decode a tape symbol at position start_pos.
        Rule: read exactly 2 characters and look up in symbol_decode.

        Returns: (symbol: str, new_position: int)
        """
        if start_pos + 2 > len(encoding):
            raise ValueError(
                f"Not enough characters for symbol at position {start_pos} "
                f"(need 2, have {len(encoding) - start_pos})"
            )
        code = encoding[start_pos:start_pos + 2]
        if code not in self.symbol_decode:
            raise ValueError(f"Invalid symbol code '{code}' at position {start_pos}")
        return self.symbol_decode[code], start_pos + 2

    def decode_direction(self, encoding, start_pos):
        """
        Decode a direction at position start_pos.
        Rule: read exactly 1 character and look up in direction_decode.

        Returns: (direction: str, new_position: int)
        """
        if start_pos >= len(encoding):
            raise ValueError(f"No character for direction at position {start_pos}")
        code = encoding[start_pos]
        if code not in self.direction_decode:
            raise ValueError(f"Invalid direction code '{code}' at position {start_pos}")
        return self.direction_decode[code], start_pos + 1

    # ── Composite decoder ────────────────────────────────────────────────────

    def decode_single_transition(self, encoding, start_pos):
        """
        Decode one complete 5-tuple transition starting at start_pos.
        Applies decode_state (×2), decode_symbol (×2), decode_direction (×1) in order.

        Returns: (transition_dict, new_position)
        where transition_dict has keys: 'from', 'to', 'read', 'write', 'move'
        """
        if self.debug:
            print(f"\nDecoding transition at position {start_pos}")
            preview = encoding[start_pos:start_pos + 20]
            print(f"  Substring: {preview}{'...' if start_pos + 20 < len(encoding) else ''}")

        pos = start_pos

        # --- Decode X1: From state ---
        from_state, pos = self.decode_state(encoding, pos)
        if self.debug: print(f"  From State: {from_state} (pos now {pos})")

        # --- Decode X2: To state ---
        to_state, pos = self.decode_state(encoding, pos)
        if self.debug: print(f"  To State:   {to_state} (pos now {pos})")

        # --- Decode X3: Read symbol ---
        read_symbol, pos = self.decode_symbol(encoding, pos)
        if self.debug: print(f"  Read:       '{read_symbol}' (pos now {pos})")

        # --- Decode X4: Write symbol ---
        write_symbol, pos = self.decode_symbol(encoding, pos)
        if self.debug: print(f"  Write:      '{write_symbol}' (pos now {pos})")

        # --- Decode X5: Direction ---
        direction, pos = self.decode_direction(encoding, pos)
        if self.debug: print(f"  Direction:  {direction} (pos now {pos})")

        transition = {
            'from':  from_state,
            'to':    to_state,
            'read':  read_symbol,
            'write': write_symbol,
            'move':  direction,
        }
        return transition, pos

    def decode_complete_machine(self, encoding):
        """
        Decode an entire TM encoding string into a list of transitions.
        Repeatedly calls decode_single_transition until the string is exhausted.

        Returns: list of transition dicts
        """
        transitions = []
        pos = 0

        print(f"\nDecoding Turing Machine")
        print(f"Total encoding length: {len(encoding)} characters")
        print("=" * 60)

        if len(encoding) == 0:
            print("Empty encoding — no transitions.")
            return transitions

        # 9 is the minimum valid transition length (see pre-code note)
        if len(encoding) < 9:
            print(f"Encoding too short for a valid transition "
                  f"(need ≥9 chars, got {len(encoding)})")
            return transitions

        while pos < len(encoding):
            try:
                transition, new_pos = self.decode_single_transition(encoding, pos)
                transitions.append(transition)

                print(f"\nTransition {len(transitions)}: "
                      f"({transition['from']}, '{transition['read']}') → "
                      f"({transition['to']}, '{transition['write']}', {transition['move']})")

                pos = new_pos

                if pos == len(encoding):
                    n = len(transitions)
                    print(f"\nSuccessfully decoded {'all ' + str(n) if n > 1 else 'single'} "
                          f"transition{'s' if n != 1 else ''}.")
                    break

            except ValueError as e:
                if len(transitions) == 0:
                    print(f"\nError decoding first transition at position {pos}: {e}")
                else:
                    print(f"\nStopped after {len(transitions)} transition(s).")
                    print(f"Remaining characters at position {pos}: '{encoding[pos:]}'")
                break

        print(f"\nDecoded TM contains {len(transitions)} transition(s).")
        return transitions

2.4 Code: Decoder Tests — Single and Multiple Transitions#

Depends on: TuringMachineDecoder (Block 5). Run Block 5 first.

This block instantiates the decoder and runs two worked examples from Section 2.2:

  1. Single transition: "abaaabaaaba" — decode (1, a) (3, a, L) step by step.

  2. Multiple transitions: a 4-transition string built by create_example_encoding().

The variable multiple_transitions defined here is reused in Blocks 7 and 10. If you restart the kernel and skip this block, those cells will raise a NameError.

Manual trace for "abaaabaaaba":

Position

Characters

Meaning

0–1

ab

From state: 1 a + b = State 1

2–5

aaab

To state: 3 as + b = State 3

6–7

aa

Read symbol: 'a'

8–9

ab

Write symbol: 'b'

10

a

Direction: L

Result: \((1, a) \to (3, b, L)\)

Running outside Jupyter: Include this block after TuringMachineDecoder in ch16_encoder.py.

# ── Block 6: Decoder Tests ───────────────────────────────────────────────────
# Depends on: TuringMachineDecoder (Block 5)
# NOTE: multiple_transitions variable is reused in Blocks 7 and 10.

decoder = TuringMachineDecoder()

# ── Example 1: Decode a single transition ────────────────────────────────────
# Encoding: "abaaabaaaba"
# Expected result: (1, 'a') → (3, 'b', L)
single_transition = "abaaabaaaba"
print("EXAMPLE 1: Decoding a single transition")
print(f"Encoded string: {single_transition}")
print("\nManual step-by-step breakdown (Section 2.2):")
print("  'ab'   → State 1  (1 'a' + 'b')")
print("  'aaab' → State 3  (3 'a's + 'b')")
print("  'aa'   → Symbol 'a'")
print("  'ab'   → Symbol 'b'")
print("  'a'    → Direction 'L'")

transition, _ = decoder.decode_single_transition(single_transition, 0)
print(f"\nDecoded transition: {transition}")
print(f"In formal notation: δ({transition['from']}, '{transition['read']}') = "
      f"({transition['to']}, '{transition['write']}', {transition['move']})")

# ── Example 2: Decode a 4-transition encoding ────────────────────────────────
# This encoding matches Example 2 from Block 3's TM (Section 1.1.2 worked example)
# and corresponds to the transitions decoded manually in Section 2.2.

def create_example_encoding():
    """
    Manually build the 4-transition encoding from Section 2.2.
    Uses explicit concatenation so students can see each component.
    Expected transitions:
      (1,b) → (1,b,R)
      (1,a) → (3,b,R)
      (3,b) → (3,b,L)
      (3,Δ) → (2,b,L)
    """
    # Each string: encode_state(from) + encode_state(to) + encode_symbol(read)
    #              + encode_symbol(write) + encode_direction(move)
    row1 = "ab"   + "ab"   + "ab" + "ab" + "b"   # (1,b)→(1,b,R)
    row2 = "ab"   + "aaab" + "aa" + "ab" + "b"   # (1,a)→(3,b,R)
    row3 = "aaab" + "aaab" + "ab" + "ab" + "a"   # (3,b)→(3,b,L)
    row4 = "aaab" + "aab"  + "ba" + "ab" + "a"   # (3,Δ)→(2,b,L)
    return row1 + row2 + row3 + row4   # no separators (Section 1.1.4)

multiple_transitions = create_example_encoding()   # ← reused in Blocks 7 and 10

print(f"\nEXAMPLE 2: Decoding 4 transitions concatenated without separators")
print(f"Encoded string: {multiple_transitions}")
print(f"Length: {len(multiple_transitions)} characters")

decoder.debug = False   # suppress position trace for cleaner multi-transition output
decoded_transitions = decoder.decode_complete_machine(multiple_transitions)

print("\nSummary of decoded transitions:")
for i, t in enumerate(decoded_transitions, 1):
    print(f"  {i}. δ({t['from']}, '{t['read']}') = ({t['to']}, '{t['write']}', {t['move']})")
EXAMPLE 1: Decoding a single transition
Encoded string: abaaabaaaba

Manual step-by-step breakdown (Section 2.2):
  'ab'   → State 1  (1 'a' + 'b')
  'aaab' → State 3  (3 'a's + 'b')
  'aa'   → Symbol 'a'
  'ab'   → Symbol 'b'
  'a'    → Direction 'L'

Decoding transition starting at position 0
Substring: abaaabaaaba...
  From State: 1 (pos now 2)
  To State: 3 (pos now 6)
  Read Symbol: 'a' (pos now 8)
  Write Symbol: 'b' (pos now 10)
  Direction: L (pos now 11)

Decoded transition: {'from': 1, 'to': 3, 'read': 'a', 'write': 'b', 'move': 'L'}
In formal notation: δ(1, 'a') = (3, 'b', L)

EXAMPLE 2: Decoding 4 transitions concatenated without separators
Encoded string: ababababbabaaabaaabbaaabaaabababaaaabaabbaaba
Length: 45 characters

Decoding Turing Machine
Total encoding length: 45 characters
============================================================

Transition 1: (1, 'b') → (1, 'b', R)

Transition 2: (1, 'a') → (3, 'b', R)

Transition 3: (3, 'b') → (3, 'b', L)

Transition 4: (3, 'Δ') → (2, 'b', L)

Successfully decoded all 4 transitions

Decoded TM contains 4 transitions

Summary of decoded transitions:
  1. δ(1, 'b') = (1, 'b', R)
  2. δ(1, 'a') = (3, 'b', R)
  3. δ(3, 'b') = (3, 'b', L)
  4. δ(3, 'Δ') = (2, 'b', L)
# Create a machine with multiple transitions
def create_example_encoding():
    """Create an encoded string with multiple transitions"""
    transitions = [
        # (1,b) → (1,b,R)
        "ab" + "ab" + "ab" + "ab" + "b",
        # (1,a) → (3,b,R)
        "ab" + "aaab" + "aa" + "ab" + "b",
        # (3,b) → (3,b,L)
        "aaab" + "aaab" + "ab" + "ab" + "a",
        # (3,Δ) → (2,b,L)
        "aaab" + "aab" + "ba" + "ab" + "a"
    ]
    return ''.join(transitions)

multiple_transitions = create_example_encoding()
print(f"\nEXAMPLE 2: Multiple transitions without separators")
print(f"Encoded string: {multiple_transitions}")
print(f"Length: {len(multiple_transitions)} characters")

# Decode with detailed output
decoder.debug = False  # Turn off debug for cleaner output
decoded_transitions = decoder.decode_complete_machine(multiple_transitions)

print("\nSummary of decoded transitions:")
for i, t in enumerate(decoded_transitions, 1):
    print(f"{i}. State {t['from']} + '{t['read']}' → State {t['to']} + '{t['write']}' + {t['move']}")
EXAMPLE 2: Multiple transitions without separators
Encoded string: ababababbabaaabaaabbaaabaaabababaaaabaabbaaba
Length: 45 characters

Decoding Turing Machine
Total encoding length: 45 characters
============================================================

Transition 1: (1, 'b') → (1, 'b', R)

Transition 2: (1, 'a') → (3, 'b', R)

Transition 3: (3, 'b') → (3, 'b', L)

Transition 4: (3, 'Δ') → (2, 'b', L)

Successfully decoded all 4 transitions

Decoded TM contains 4 transitions

Summary of decoded transitions:
1. State 1 + 'b' → State 1 + 'b' + R
2. State 1 + 'a' → State 3 + 'b' + R
3. State 3 + 'b' → State 3 + 'b' + L
4. State 3 + 'Δ' → State 2 + 'b' + L

2.5 Code: Full Round-Trip — Encode → Decode → Reconstruct#

Depends on: TuringMachineDecoder (Block 5) and multiple_transitions variable (Block 6). Run those blocks first.

Pipeline for this block:

multiple_transitions (str)
    └── TuringMachineDecoder.decode_complete_machine()
            └── list of transition dicts
                    └── reconstruct_turing_machine()
                            ├── extract states and symbols
                            ├── build transition_dict
                            └── tm_spec (dict) → print Transition Table

reconstruct_turing_machine() is the inverse of example_from_table() from Block 3. If the encoding round-trips correctly, the Transition Table printed here should match the table in Section 1.1.2.

Running outside Jupyter:

python ch16_encoder.py   # reconstruct_turing_machine is called at module level
def reconstruct_turing_machine(encoded_string):
    """Completely reconstruct a Turing machine from its encoding"""
    
    print("\nCOMPLETE TURING MACHINE RECONSTRUCTION")
    print("=" * 60)
    
    # Step 1: Decode transitions
    decoder = TuringMachineDecoder()
    decoder.debug = False
    transitions = decoder.decode_complete_machine(encoded_string)
    
    # Step 2: Extract states and symbols
    states = sorted(set(t['from'] for t in transitions) | 
                   set(t['to'] for t in transitions))
    symbols = sorted(set(t['read'] for t in transitions) | 
                    set(t['write'] for t in transitions))
    
    # Step 3: Build transition dictionary
    transition_dict = {}
    for t in transitions:
        key = (t['from'], t['read'])
        value = (t['to'], t['write'], t['move'])
        transition_dict[key] = value
    
    # Step 4: Create machine specification
    tm_spec = {
        'states': states,
        'alphabet': [s for s in symbols if s not in ['Δ', '#']],
        'tape_alphabet': symbols,
        'transitions': transition_dict,
        'start_state': 1,
        'halt_state': 2
    }
    
    # Display the reconstructed machine
    print("\nReconstructed Turing Machine:")
    print(f"States: {tm_spec['states']}")
    print(f"Input alphabet: {tm_spec['alphabet']}")
    print(f"Tape alphabet: {tm_spec['tape_alphabet']}")
    print(f"Start state: {tm_spec['start_state']}")
    print(f"Halt state: {tm_spec['halt_state']}")
    
    print("\nTransition Table:")
    print("-" * 60)
    print(f"{'From':<8} {'Read':<8} {'To':<8} {'Write':<8} {'Move':<8}")
    print("-" * 60)
    for (from_s, read_s), (to_s, write_s, move_d) in sorted(transition_dict.items()):
        print(f"{from_s:<8} {read_s:<8} {to_s:<8} {write_s:<8} {move_d:<8}")
    
    return tm_spec

# Test with our example encoding
reconstructed = reconstruct_turing_machine(multiple_transitions)
COMPLETE TURING MACHINE RECONSTRUCTION
============================================================

Decoding Turing Machine
Total encoding length: 45 characters
============================================================

Transition 1: (1, 'b') → (1, 'b', R)

Transition 2: (1, 'a') → (3, 'b', R)

Transition 3: (3, 'b') → (3, 'b', L)

Transition 4: (3, 'Δ') → (2, 'b', L)

Successfully decoded all 4 transitions

Decoded TM contains 4 transitions

Reconstructed Turing Machine:
States: [1, 2, 3]
Input alphabet: ['a', 'b']
Tape alphabet: ['a', 'b', 'Δ']
Start state: 1
Halt state: 2

Transition Table:
------------------------------------------------------------
From     Read     To       Write    Move    
------------------------------------------------------------
1        a        3        b        R       
1        b        1        b        R       
3        b        3        b        L       
3        Δ        2        b        L       

3. Language ALAN and MATHISON#

3.1 The Code Word Language (CWL)#

From the previous encoding section, you may notice that the encoding of a transition follows a specific pattern, which can be represented by a regular expression. The Code Word Language (CWL) is a regular language that describes the valid structure of encoded Turing machine transitions. It provides a regular expression pattern that all valid Turing machine encodings must follow. CWL is defined by the regular expression:

CWL = \((a^+ba^+b(a+b)^5)^*\), where \(a^+\) stands for one or more \(a\)’s.

In a valid CWL word, \(a^+ba^+b\) encodes the “From” state and “To” state, while \((a+b)^5\) encodes the “Read”, “Write”, and “Direction” information.

Important Note: The CWL defines the pattern for valid Turing machine encodings. This means that any valid Turing machine encoding must follow the CWL pattern. However, not every word in the CWL is necessarily a valid Turing machine encoding, as some CWL words may fit the pattern but fail to represent an actual, meaningful Turing machine. As an example, the string \(aababaabbb\) is a valid CWL word because it follows the required pattern. However, when decoded, it describes a transition from state 2 to state 1, which is invalid in a Turing machine because state 2 is a HALT state, and HALT states cannot have outgoing transitions. So, while the encoding fits the CWL pattern, it does not represent a valid Turing machine transition.

More examples of CWL words that are valid in form but invalid as Turing machine encodings:

  • ❌ A CWL word that fits the pattern but refers to a transition for a non-existent state

  • ❌ A CWL word that uses a tape symbol not in the machine’s alphabet

  • ❌ A CWL word missing key components (e.g., no start state or halt state)

In short, CWL sets the “shape” of valid encodings, but additional checks are needed to ensure the content represents a functioning Turing machine.

3.2 Code: CWL Structure Analysis and Validator#

Depends on: Nothing — this block is standalone. However, CWLValidator is a dependency for Blocks 12–15 (ALAN and MATHISON analyzers). Run this block before those.

This block implements two things:

  1. analyze_cwl_codeword() — manually parses a fixed example codeword character-by-character, showing exactly where each structural component (state encoding, 5-char block) begins and ends.

  2. CWLValidator — uses a regular expression to check if a string matches the CWL pattern.

CWL formal definition (Section 3.1): $\(\text{CWL} = (a^+ b \, a^+ b \, (a+b)^5)^*\)$

In Python regex: ^(a+ba+b[ab]{5})*$

Notation → regex mapping:

Formal element

Meaning

Regex

\(a^+\)

One or more 'a's (state encoding prefix)

a+

\(b\)

State encoding delimiter

b

\((a+b)^5\)

Exactly 5 characters from {a,b} (2-char read + 2-char write + 1-char direction)

[ab]{5}

\((\ldots)^*\)

Zero or more complete transitions

(...)*

Important note from Section 3.1: CWL defines the shape of valid encodings, not their meaning. A string can satisfy CWL while encoding an impossible TM (e.g., a transition leaving state 2, or a state with no path from state 1). CWL is a necessary but not sufficient condition for a valid TM encoding.

Running outside Jupyter:

python ch16_cwl.py
# ── Block 11: CWL Analysis and Validation ───────────────────────────────────
# CWLValidator is a dependency for ALANAnalyzer (Block 12) and MATHISONAnalyzer (Block 14).
# Run this block before Blocks 12–15.

import re

def analyze_cwl_codeword():
    """
    Manually parse a single CWL codeword to show the structural decomposition.
    Illustrates how the a^+b a^+b [ab]{5} pattern maps to formal TM components.
    """
    print("\nANATOMY OF A CWL CODE WORD")
    print("=" * 60)

    # Example transition: (1, a) → (3, Δ, R)
    # Encoding: ab [state1] + aaab [state3] + ab [read='b'] + ba [write='Δ'] + b [dir=R]
    codeword = "abaaababbabb"   # Note: corrected from original for clarity
    print(f"\nExample code word: {codeword}")
    print(f"Expected parse: (1, 'b') → (3, 'Δ', R)")
    print("\nDetailed breakdown:")

    pos = 0

    # --- Parse first state (From): a^n b ---
    state1_start = pos
    while pos < len(codeword) and codeword[pos] == 'a':
        pos += 1
    pos += 1   # consume the 'b' delimiter
    state1_str = codeword[state1_start:pos]
    print(f"  Position {state1_start:2d}{pos-1:2d}: '{state1_str}' "
          f"→ State {state1_str.count('a')} (From)")

    # --- Parse second state (To): a^n b ---
    state2_start = pos
    while pos < len(codeword) and codeword[pos] == 'a':
        pos += 1
    pos += 1
    state2_str = codeword[state2_start:pos]
    print(f"  Position {state2_start:2d}{pos-1:2d}: '{state2_str}' "
          f"→ State {state2_str.count('a')} (To)")

    # --- Parse the fixed 5-character block [ab]{5} ---
    five_chars = codeword[pos:pos + 5]
    print(f"  Position {pos:2d}{pos+4:2d}: '{five_chars}' → Symbol/Direction block")

    # Interpret each sub-field of the 5-char block
    sym_decode = {'aa': 'a', 'ab': 'b', 'ba': 'Δ', 'bb': '#'}
    dir_decode  = {'a': 'L', 'b': 'R'}

    read_code  = five_chars[0:2]
    write_code = five_chars[2:4]
    dir_code   = five_chars[4]

    print(f"\n  Five-character block breakdown:")
    print(f"    Read  (chars 1–2): '{read_code}'  → '{sym_decode.get(read_code, '?')}'")
    print(f"    Write (chars 3–4): '{write_code}' → '{sym_decode.get(write_code, '?')}'")
    print(f"    Dir   (char  5):   '{dir_code}'    → '{dir_decode.get(dir_code, '?')}'")

    return codeword


example_codeword = analyze_cwl_codeword()


class CWLValidator:
    """
    Validates whether a string belongs to CWL (Code Word Language).

    CWL = (a+ b a+ b (a|b){5})*

    A string in CWL has the correct *shape* for a TM encoding, but may still
    describe an invalid machine (e.g., transition from HALT state, missing START state).
    CWL membership is necessary but not sufficient for a valid TM encoding.
    """

    def __init__(self):
        # Build the regex for CWL = (a+ba+b[ab]{5})*
        # a+  : one or more 'a's (state encoding prefix)
        # b   : state encoding delimiter
        # [ab]{5}: exactly 5 bits of symbol+direction information
        self.pattern = r'^(a+ba+b[ab]{5})*$'
        self.regex   = re.compile(self.pattern)

    def is_valid_cwl(self, cwl_string):
        """
        Return True iff cwl_string matches the CWL pattern.
        Uses fullmatch to require the pattern to cover the entire string.
        """
        return bool(self.regex.fullmatch(cwl_string))
ANATOMY OF A CWL CODE WORD
============================================================

Example code word: abaaababbabb
Expected parse: (1, 'b') → (3, 'Δ', R)

Detailed breakdown:
  Position  0– 1: 'ab' → State 1 (From)
  Position  2– 5: 'aaab' → State 3 (To)
  Position  6–10: 'abbab' → Symbol/Direction block

  Five-character block breakdown:
    Read  (chars 1–2): 'ab'  → 'b'
    Write (chars 3–4): 'ba' → 'Δ'
    Dir   (char  5):   'b'    → 'R'
import re

class CWLValidator:
    def __init__(self):
        # Build the regex pattern for CWL
        # a+ means one or more 'a's
        # (a+b) means either 'a' or 'b'
        # {5} means exactly 5 occurrences
        self.pattern = r'^(a+ba+b[ab]{5})*'
        self.regex = re.compile(self.pattern)

    def is_valid_cwl(self, cwl_string):
        """
        Checks if the provided string is a valid CWL word.
        Returns True if it matches the pattern, False otherwise.
        """
        return bool(self.regex.fullmatch(cwl_string))        

3.3 Language ALAN#

Since any Turing Machine can be encoded as a string, we can use this string as input to another Turing Machine, or even feed it back into the original machine using its own encoding. This leads to the fascinating idea that TMs can reason about other TMs (or even themselves).

3.3.1 Definition of Language ALAN#

We define the language ALAN:

ALAN = { all words in CWL that are either:

  • Not accepted by the Turing Machines they represent, or

  • Do not represent any valid Turing Machine at all }

We introduce ALAN to formalize a particular class of decision problems. It includes all strings that either fail to encode a valid TM or, if they do encode a TM, are not accepted by the machine they describe. In other words, ALAN captures invalid encodings and self-rejecting behaviors.

3.3.2 Examples of Language ALAN#

consider the following three TM encodings:

  • \(aababaaaaa\): It does not represent a valid TM because it encodes an invalid transition leaving a HALT state. This string is in language ALAN.

  • \(aaabaabaaaaa\): It does not represent a valid TM because it lacks any transitions involving the START state. This string is in language ALAN.

  • \(abaabababb\): It represents a valid TM, and after decoding, the TM is shown below. When we feed its own encoding string as input, the machine rejects it. This string is in language ALAN.

        graph LR
    accTitle: A TM
    accDescr: a diagram representing a TM decoded from a given encoding
    q1((START))
    q2((HALT))
    
    q1 -->|b,b,R| q2
    
    style q1 fill:#90EE90,stroke:#333,stroke-width:3px
    style q2 fill:#87CEEB,stroke:#333,stroke-width:2px
    
  • \(abaaabaaaabaaabaaabaaaabaaabaabababa\): It represents a valid TM, and after decoding, the TM is shown below. When we feed its own encoding string as input, the machine accepts it. This string is NOT in language ALAN.

        graph LR
    accTitle: A TM
    accDescr: a diagram representing a TM decoded from a given encoding
    q1((START))
    q2((3))
    q3((HALT))
    
    q1 -->|a,a,R| q2
    q2 -->|a,a,R| q2
    q2 -->|b,b,L| q3
    
    style q1 fill:#90EE90,stroke:#333,stroke-width:3px
    style q2 fill:#FFB6C1,stroke:#333,stroke-width:3px
    style q3 fill:#87CEEB,stroke:#333,stroke-width:2px
    

3.3.3 ALAN is NOT Recursively Enumerable#

We shall now prove that the language ALAN is not recursively enumerable. We will do this by contradiction.

  • Step 1: Suppose, for the sake of contradiction, that ALAN is recursively enumerable. This means there exists a Turing Machine, call it \(T\), such that \(T\) accepts exactly all words in ALAN. Let \(code(T)\) denote the encoding of the machine \(T\) itself.

  • Step 2: Ask the Critical Question: Is \(code(T)\) a word in ALAN? There are exactly two possibilities:

    • Yes, \(code(T) \in ALAN\).

    • No, \(code(T) \notin ALAN\).

Let’s explore each.

  • Step 3: Case 1 - Assume \(code(T) \in ALAN\), by the definition of ALAN:

    • Either \(code(T)\) does not represent a valid TM, but we know it does (it’s the encoding of \(T\)).

    • Or, \(T\) does not accept \(code(T)\).

    • But wait! We assumed \(T\) accepts all words in ALAN. So if \(code(T) \in ALAN\), \(T\) must accept it. This Contradicts the definition of ALAN, which says that if \(code(T) \in ALAN\), then \(T\) does not accept it.

  • Step 4: Case 2 — Assume \(code(T) \notin ALAN\), then by the definition of ALAN:

    • Either \(code(T)\) is a valid encoding and \(T\) accepts \(code(T)\), or

    • \(code(T)\) is not a valid encoding (which it is, by construction).

    • So we get: \(code(T)\) is accepted by \(T\).

    • But \(T\) only accepts words in ALAN. Therefore, \(code(T)\) must be a word in ALAN. Contradiction again.

  • Step 5: Conclude the Contradiction. In both cases, we reach a contradiction:

    • If \(code(T) \in ALAN\) → contradiction.

    • If \(code(T) \notin ALAN\) → contradiction.

    • Therefore, our initial assumption that ALAN is recursively enumerable must be false.

We conclude that ALAN is not recursively enumerable. This proof mirrors the classic diagonalization arguments used in computability theory, showing that certain languages cannot be captured by any Turing Machine.

3.3.4 Code: ALANAnalyzer — Implements Section 3.3’s ALAN Membership Test#

Depends on: CWLValidator (Block 11) and TuringMachineDecoder (Block 5). Run both blocks first.

ALAN is defined in Section 3.3.1 as: $\(\text{ALAN} = \{w \in \text{CWL} \mid w \text{ does not encode a valid TM, or the TM it encodes does not accept } w\}\)$

The membership test is a four-step cascade:

Step

Test

If fails →

Result

1

Is w in CWL?

Exit

NOT in ALAN (not even CWL-shaped)

2

Does w decode to a valid TM?

Exit

IN ALAN (“invalid TM” criterion)

3

Does the TM pass validity checks?

Exit

IN ALAN (structural invalidity)

4

Does the TM accept w (its own encoding)?

Stay

IN ALAN if NO; NOT in ALAN if YES

Self-reference at Step 4: The machine is run on its own encoding string — this is the computability-theory concept of a machine processing its own description. Section 3.3.3 uses exactly this self-reference to prove ALAN is not recursively enumerable.

Running outside Jupyter:

python ch16_alan.py
# ── Block 12: ALANAnalyzer — ALAN Membership (Section 3.3) ──────────────────
# Depends on: CWLValidator (Block 11), TuringMachineDecoder (Block 5)
#
# ALAN = { w ∈ CWL | w does not encode a valid TM, OR the TM it encodes rejects w }
# Proof that ALAN is not RE: Section 3.3.3 (diagonalization argument)

class ALANAnalyzer:
    """
    Determines whether a string belongs to language ALAN (Section 3.3.1).

    Membership test cascade (four steps):
      Step 1: CWL membership check
      Step 2: Structural decode (does it parse as a TM?)
      Step 3: TM validity check (has START, HALT, no duplicates, no HALT transitions)
      Step 4: Self-simulation (does the TM accept its own encoding?)
    """

    def __init__(self):
        self.cwl_validator = CWLValidator()      # Block 11
        self.tm_decoder    = TuringMachineDecoder()  # Block 5
        self.tm_decoder.debug = True   # show step-by-step decode trace

    def analyze_alan_membership(self, cwl_string):
        """
        Four-step ALAN membership test.
        Returns: (in_alan: bool, reason: str)
        """
        print(f"\nAnalyzing ALAN membership for: {cwl_string[:30]}{'...' if len(cwl_string)>30 else ''}")
        print("=" * 60)

        # --- Step 1: CWL check ---
        # ALAN ⊆ CWL by definition; anything outside CWL is simply not in ALAN
        if not self.cwl_validator.is_valid_cwl(cwl_string):
            print("✗ Not in CWL → not in ALAN")
            return False, "not_cwl"
        print("✓ String is in CWL")

        # --- Step 2: Decode attempt ---
        # If decoding raises an exception, the encoding is malformed → in ALAN
        try:
            transitions = self.tm_decoder.decode_complete_machine(cwl_string)
            print(f"✓ Decoded {len(transitions)} transition(s)")
        except Exception as e:
            print(f"✓ String is in ALAN (invalid TM encoding: {e})")
            return True, "invalid_tm"

        # --- Step 3: TM validity checks ---
        # A CWL string may parse but still describe an impossible machine
        validity_issues = self.check_tm_validity(transitions)
        if validity_issues:
            print(f"✓ String is in ALAN (structural issue: {validity_issues[0]})")
            return True, validity_issues[0]
        print("✓ Represents a structurally valid TM")

        # --- Step 4: Self-simulation (the self-reference step) ---
        # Run the decoded TM on cwl_string itself (its own encoding)
        # This implements the "does T accept code(T)?" question from Section 3.3.3
        accepts_self = self.simulate_tm_on_self(transitions, cwl_string)

        if accepts_self:
            print("✗ TM accepts its own encoding → NOT in ALAN")
            return False, "accepts_self"
        else:
            print("✓ TM does not accept its own encoding → IS in ALAN")
            return True, "rejects_self"

    def check_tm_validity(self, transitions):
        """
        Check structural validity of a decoded TM.
        Returns a list of issue strings (empty = valid).
        """
        issues = []
        states = set()
        for t in transitions:
            states.add(t['from'])
            states.add(t['to'])

        # Must have a START state (State 1) — Section 1.1.1
        if 1 not in states:
            issues.append("missing_start_state")

        # Must have a HALT state (State 2) — Section 1.1.1
        if 2 not in states:
            issues.append("missing_halt_state")

        # HALT state must be reachable (at least one transition leads to State 2)
        if 2 in states and not any(t['to'] == 2 for t in transitions):
            issues.append("unreachable_halt_state")

        # No duplicate (from, read) pairs — δ must be a function
        transition_keys = [(t['from'], t['read']) for t in transitions]
        if len(transition_keys) != len(set(transition_keys)):
            issues.append("duplicate_transitions")

        return issues

    def simulate_tm_on_self(self, transitions, input_string):
        """
        Run the decoded TM on input_string (its own encoding) for up to max_steps.
        This is the self-reference simulation from Section 3.3.3.
        Returns True if the TM accepts, False if it rejects or times out.
        """
        print("\nSimulating TM on its own encoding (Step 4)...")

        # Build δ as a dict: (from_state, read_symbol) → (to_state, write, direction)
        trans_dict = {(t['from'], t['read']): (t['to'], t['write'], t['move'])
                      for t in transitions}

        # Initialize tape with the TM's own encoding (not empty — this is self-simulation)
        tape       = list(input_string) + ['Δ'] * 1000
        head       = 0
        state      = 1   # Always start in State 1
        steps      = 0
        max_steps  = 10000

        while steps < max_steps:
            # --- Check for accept state ---
            if state == 2:
                print(f"  Reached HALT state (State 2) after {steps} steps → ACCEPT")
                return True

            # --- Read current symbol ---
            current_symbol = tape[head] if head < len(tape) else 'Δ'

            # --- Look up transition ---
            key = (state, current_symbol)
            if key not in trans_dict:
                print(f"  No transition for ({state}, '{current_symbol}') → implicit REJECT")
                return False

            next_state, write_symbol, direction = trans_dict[key]

            # --- Execute transition ---
            tape[head] = write_symbol
            state      = next_state

            if direction == 'L':
                head = max(0, head - 1)
            elif direction == 'R':
                head += 1

            steps += 1

        print(f"  Exceeded {max_steps} steps → treating as non-acceptance (loop)")
        return False

3.3.5: Code: ALAN Examples#

Depends on: ALANAnalyzer (Block 12), which in turn depends on CWLValidator (Block 11) and TuringMachineDecoder (Block 5). Run those blocks first.

This block instantiates ALANAnalyzer and tests four example strings from Section 3.3.2.

def demonstrate_alan_examples():
    """Show various examples of strings in and not in ALAN"""
    print("\nEXAMPLES OF ALAN MEMBERSHIP")
    print("=" * 60)
    
    examples = [
        {
            'string': 'ababbabbb',
            'description': 'Simple TM with one transition: (1,b)→(1,b,R)',
            'expected': 'Likely in ALAN (no halt state reachable)'
        },
        {
            'string': 'abaabbaba',
            'description': 'TM that immediately halts: (1,a)→(2,a,L)',
            'expected': 'Check if it accepts strings starting with "a"'
        },
        {
            'string': 'aaababbabbb',
            'description': 'TM with start state 3 (invalid)',
            'expected': 'In ALAN (no state 1)'
        },
        {
            'string': 'abaaabaaabb' + 'aaaabaabbabba',
            'description': 'Two transitions: (1,a)→(3,a,R), (4,a)→(2,b,R)',
            'expected': 'In ALAN (disconnected states)'
        }
    ]
    
    for ex in examples:
        print(f"\nExample: {ex['description']}")
        print(f"String: {ex['string']}")
        print(f"Expected: {ex['expected']}")
        
        in_alan, reason = alan_analyzer.analyze_alan_membership(ex['string'])
        print(f"Result: {'IN ALAN' if in_alan else 'NOT IN ALAN'} (reason: {reason})")

demonstrate_alan_examples()
EXAMPLES OF ALAN MEMBERSHIP
============================================================

Example: Simple TM with one transition: (1,b)→(1,b,R)
String: ababbabbb
Expected: Likely in ALAN (no halt state reachable)

Analyzing ALAN membership for: ababbabbb...
============================================================
✓ String is in CWL

Decoding Turing Machine
Total encoding length: 9 characters
============================================================

Decoding transition starting at position 0
Substring: ababbabbb...
  From State: 1 (pos now 2)
  To State: 1 (pos now 4)
  Read Symbol: 'Δ' (pos now 6)
  Write Symbol: '#' (pos now 8)
  Direction: R (pos now 9)

Transition 1: (1, 'Δ') → (1, '#', R)

Successfully decoded single transition

Decoded TM contains a single transition
✓ Successfully decoded 1 transitions
✓ String is in ALAN (Reason: missing_halt_state)
Result: IN ALAN (reason: missing_halt_state)

Example: TM that immediately halts: (1,a)→(2,a,L)
String: abaabbaba
Expected: Check if it accepts strings starting with "a"

Analyzing ALAN membership for: abaabbaba...
============================================================
✗ Not in CWL, therefore not in ALAN
Result: NOT IN ALAN (reason: not_cwl)

Example: TM with start state 3 (invalid)
String: aaababbabbb
Expected: In ALAN (no state 1)

Analyzing ALAN membership for: aaababbabbb...
============================================================
✓ String is in CWL

Decoding Turing Machine
Total encoding length: 11 characters
============================================================

Decoding transition starting at position 0
Substring: aaababbabbb...
  From State: 3 (pos now 4)
  To State: 1 (pos now 6)
  Read Symbol: 'Δ' (pos now 8)
  Write Symbol: '#' (pos now 10)
  Direction: R (pos now 11)

Transition 1: (3, 'Δ') → (1, '#', R)

Successfully decoded single transition

Decoded TM contains a single transition
✓ Successfully decoded 1 transitions
✓ String is in ALAN (Reason: missing_halt_state)
Result: IN ALAN (reason: missing_halt_state)

Example: Two transitions: (1,a)→(3,a,R), (4,a)→(2,b,R)
String: abaaabaaabbaaaabaabbabba
Expected: In ALAN (disconnected states)

Analyzing ALAN membership for: abaaabaaabbaaaabaabbabba...
============================================================
✓ String is in CWL

Decoding Turing Machine
Total encoding length: 24 characters
============================================================

Decoding transition starting at position 0
Substring: abaaabaaabbaaaabaabb...
  From State: 1 (pos now 2)
  To State: 3 (pos now 6)
  Read Symbol: 'a' (pos now 8)
  Write Symbol: 'b' (pos now 10)
  Direction: R (pos now 11)

Transition 1: (1, 'a') → (3, 'b', R)

Decoding transition starting at position 11
Substring: aaaabaabbabba...
  From State: 4 (pos now 16)
  To State: 2 (pos now 19)
  Read Symbol: 'Δ' (pos now 21)
  Write Symbol: '#' (pos now 23)
  Direction: L (pos now 24)

Transition 2: (4, 'Δ') → (2, '#', L)

Successfully decoded all 2 transitions

Decoded TM contains 2 transitions
✓ Successfully decoded 2 transitions
✓ Represents a valid TM

Simulating TM on its own encoding...
  No transition for (3, 'b') - rejecting
✓ TM does not accept its own encoding - IS in ALAN
Result: IN ALAN (reason: rejects_self)

3.4 Language MATHISON#

Now, let’s consider the other side of the story: what if the CWL words are accepted by the Turing Machines they encode? If we collect all valid TM encodings that are accepted by their respective machines, what language do we obtain?

3.4.1 Definition of Language MATHISON#

We define the language MATHISON:

MATHISON = all words in CWL that:

  • Are accepted by the Turing Machines they represent, and

  • Do represent valid Turing Machines }

In other words: the set of all encoded Turing Machines \(T\) such that \(T\), when given its own encoding \(code(T)\) as input, accepts that input.

3.4.2 Examples of Language MATHISON#

Given a TM encoding \(abaabaaabbababababb\)

  • Decode the first transition: \(abaabaaabb\)

    • ab = State 1

    • aab = State 2

    • aa = Symbol a

    • ab = Symbol b

    • b = Direction R

  • Decode the second transition: \(ababababb\)

    • ab = State 1

    • ab = State 1

    • ab = Symbol b

    • ab = Symbol b

    • b = Direction R

  • Turing Machine:

    • (1, a) → (2, b, R)

    • (1, b) → (1, b, R)

This Turing Machine reads a single ‘a’ and moves to the HALT state to accept the string. When we feed its own encoding as input, it accepts it, meaning its encoding is a word in Language MATHISON.

3.4.3 MATHISON is Recursively Enumerable#

To prove that MATHISON is recursively enumerable, we observe that each word in the language represents a valid Turing Machine \(T\) and that this machine accepts the word (its own encoding). Since the word is an encoding of a Turing Machine, we can reconstruct the machine from the encoding. By simulating this reconstructed machine on its own encoding, we know it eventually halts and accepts its encoding. This means we have a Turing Machine that accepts every word in MATHISON.

For words not in the language, the machine may reject or loop forever, which is acceptable for a recognizer. However, we cannot build a Turing Machine that always guarantees rejection for non-members, because determining whether a machine accepts an input is equivalent to solving the Halting Problem, which is undecidable. Therefore, while we can recognize MATHISON by simulating \(T\) on \(w\) and accepting if it halts and accepts, we cannot decide MATHISON because we cannot always detect non-accepting cases (especially when the machine runs forever).

Therefore, we have shown that there exists a Turing Machine that recognizes MATHISON. Thus, MATHISON is recursively enumerable.

The question we have yet to answer is: can we simulate the behavior of any Turing Machine on any arbitrary input string, including its own encoding as input? The answer is yes — we can construct a Universal Turing Machine capable of performing this simulation, which we will explore in the following section.

3.4.4 Complement of Recursively Enumerable Languages#

We are now ready to prove that If \(L\) is a recursively enumerable language, its complement \(L'\) is not necessarily recursively enumerable.

  • Since \(CWL\) is a regular language, its complement \(CWL'\) is also regular.

  • Because all regular languages are recursively enumerable, \(CWL'\) is recursively enumerable.

  • Consider the union language \(L= CWL' \cup MATHISON\). Since both \(CWL'\) and MATHISON are recursively enumerable, their union \(L\) is also recursively enumerable (because the class of recursively enumerable languages is closed under union).

  • Observe that the complement of \(L\) is exactly the language ALAN: \(L'= ALAN\). But we know that ALAN is not recursively enumerable.

  • Therefore, we have an example where: \(L\) is recursively enumerable, but its complement \(L'\) is not recursively enumerable.

3.4.5 Code: MATHISONAnalyzer — Implements Section 3.4’s MATHISON Membership Test#

Depends on: CWLValidator (Block 11) and TuringMachineDecoder (Block 5). Run those blocks first.

MATHISON is defined in Section 3.4.1 as: $\(\text{MATHISON} = \{w \in \text{CWL} \mid w \text{ encodes a valid TM that accepts } w\}\)$

MATHISON is the complement of ALAN within CWL (i.e., ALAN ∪ MATHISON = CWL for valid TM encodings). Section 3.4.3 proves MATHISON is recursively enumerable.

Running outside Jupyter:

python ch16_mathison.py
# ── Block 14: MATHISONAnalyzer — MATHISON Membership (Section 3.4) ───────────
# Depends on: CWLValidator (Block 11), TuringMachineDecoder (Block 5)
#
# MATHISON = { w ∈ CWL | w encodes a valid TM T, and T accepts w }
# MATHISON is the complement of ALAN within valid CWL words.
# Proof that MATHISON is RE: Section 3.4.3
#
# ⚠️ Bug fixed: original _simulate_on_empty_tape simulated on blank tape,
#    contradicting the definition. Renamed to _simulate_on_own_encoding
#    and now passes cwl_string as the tape content.

class MATHISONAnalyzer:
    """
    Determines whether a string belongs to language MATHISON (Section 3.4.1).

    Membership test cascade (same four steps as ALAN, but Step 4 inverted):
      Step 1: CWL check
      Step 2: Structural decode
      Step 3: TM validity check
      Step 4: Self-simulation — TM accepts its encoding? → YES → in MATHISON
    """

    def __init__(self):
        self.cwl_validator = CWLValidator()          # Block 11
        self.tm_decoder    = TuringMachineDecoder()  # Block 5
        self.tm_decoder.debug = False
        self.max_steps = 10000

    def is_in_mathison(self, cwl_string, verbose=True):
        """
        Four-step MATHISON membership test.
        Returns: (in_mathison: bool, reason: str, steps_taken: int)
        """
        if verbose:
            print(f"\nAnalyzing MATHISON membership for: "
                  f"{cwl_string[:30]}{'...' if len(cwl_string)>30 else ''}")
            print("=" * 60)

        # --- Step 1: CWL check ---
        if not self.cwl_validator.is_valid_cwl(cwl_string):
            if verbose: print("✗ Not in CWL → not in MATHISON")
            return False, "not_cwl", 0
        if verbose: print("✓ String is in CWL")

        # --- Step 2: Decode ---
        try:
            transitions = self.tm_decoder.decode_complete_machine(cwl_string)
            if verbose: print(f"✓ Decoded {len(transitions)} transition(s)")
        except Exception as e:
            if verbose: print(f"✗ Invalid TM encoding: {e}")
            return False, "invalid_tm", 0

        # --- Step 3: Validity checks ---
        if not self._is_valid_tm(transitions):
            if verbose: print("✗ Missing START or HALT state → not in MATHISON")
            return False, "invalid_structure", 0
        if verbose: print("✓ Represents a structurally valid TM")

        # --- Step 4: Self-simulation (MATHISON criterion: TM must ACCEPT its encoding) ---
        # Bug fix: simulate on cwl_string (own encoding), not on empty tape
        halts, steps, crash_reason = self._simulate_on_own_encoding(
            transitions, cwl_string, verbose
        )

        if halts:
            if verbose:
                print(f"✓ TM halts and accepts its own encoding after {steps} steps → IN MATHISON")
            return True, "accepts_self", steps
        else:
            reason = crash_reason if crash_reason else "timeout"
            if verbose:
                print(f"✗ TM does not accept its own encoding "
                      f"({'crash: ' + crash_reason if crash_reason else 'did not halt'}) "
                      f"→ NOT IN MATHISON")
            return False, reason, steps

    def _is_valid_tm(self, transitions):
        """Minimal validity: START (1) and HALT (2) states must both appear."""
        states = set(t['from'] for t in transitions) | set(t['to'] for t in transitions)
        return 1 in states and 2 in states

    def _simulate_on_own_encoding(self, transitions, cwl_string, verbose=False):
        """
        Simulate the decoded TM with cwl_string as the tape input.
        This correctly implements the MATHISON criterion: T accepts code(T).

        Bug fix from original: was named _simulate_on_empty_tape and initialized tape to blanks.
        MATHISON requires the machine to accept its OWN encoding, not an empty tape.

        Returns: (halts: bool, steps_taken: int, crash_reason: str|None)
        """
        # δ as a Python dict
        trans_dict = {(t['from'], t['read']): (t['to'], t['write'], t['move'])
                      for t in transitions}

        # Tape initialized with the TM's own encoding (not blanks — that was the bug)
        tape    = list(cwl_string) + ['Δ'] * 1000
        head    = 0
        state   = 1   # q₀ = State 1
        steps   = 0

        visited_configs = set()   # Cycle detection

        while steps < self.max_steps:
            # --- Accept check ---
            if state == 2:
                return True, steps, None

            # --- Cycle detection ---
            tape_segment = ''.join(tape[max(0, head-5):head+6])
            config = (state, head, tape_segment)
            if config in visited_configs:
                if verbose: print(f"  Cycle detected at step {steps}")
                return False, steps, None
            visited_configs.add(config)

            # --- Read and transition ---
            current_symbol = tape[head] if head < len(tape) else 'Δ'
            key = (state, current_symbol)

            if key not in trans_dict:
                if state == 2:
                    return True, steps, None   # Halt state with no transition = accept
                return False, steps, None       # No transition = implicit reject

            next_state, write_symbol, direction = trans_dict[key]

            # --- Boundary check before moving left ---
            if direction == 'L' and head == 0:
                if verbose:
                    print(f"  CRASH at step {steps}: attempted left-move from cell 0")
                return False, steps, "left_boundary_crash"

            # --- Execute transition ---
            tape[head] = write_symbol
            state      = next_state
            if direction == 'L':
                head -= 1
            elif direction == 'R':
                head += 1
                if head >= len(tape):
                    tape.extend(['Δ'] * 1000)

            steps += 1

        return False, steps, None   # Exceeded max_steps

3.4.6 Code: MATHISON Examples#

Depends on: MATHISONAnalyzer (Block 14), CWLValidator (Block 11), TuringMachineDecoder (Block 5). Run those blocks first.

This block tests six encodings with different MATHISON outcomes, including crash cases (left-boundary violations) and loops.

# ── Block 15: MATHISON Examples ──────────────────────────────────────────────
# Depends on: MATHISONAnalyzer (Block 14)
# Bug fix: mathison_analyzer is instantiated here before demonstrate_mathison_examples()

# ⚠️ Bug fix: original code called demonstrate_mathison_examples() without
#    instantiating mathison_analyzer first → NameError in notebook output.
mathison_analyzer = MATHISONAnalyzer()   # ← this line was missing in original


def demonstrate_mathison_examples():
    """
    Test six TM encodings and report MATHISON membership.
    Uses the corrected _simulate_on_own_encoding (not empty-tape simulation).
    """
    print("\nEXAMPLES OF MATHISON MEMBERSHIP")
    print("=" * 60)

    examples = [
        {
            'name':        'Immediate Halt (Right)',
            'encoding':    'abaabbabbb',   # δ(1,Δ) = (2,Δ,R): halts immediately on blank
            'description': 'Reads blank → HALT immediately. '
                           'Own encoding starts with "a" not blank, so...',
            'expected':    False,   # Encoding starts with 'a', no δ(1,'a') → rejects self
        },
        {
            'name':        'Crash on Left Move',
            'encoding':    'abaabbaaba',   # δ(1,Δ) = (2,Δ,L): tries L from cell 0
            'description': 'Tries to move left from cell 0 on first step → crash',
            'expected':    False,
        },
        {
            'name':        'Write Then Halt',
            'encoding':    'abaabbabbabaababbaba',  # δ(1,Δ)=(1,b,R), δ(1,b)=(2,b,L)
            'description': 'Writes b rightward, then moves left. '
                           'Own encoding starts with "a", so transition for "a" missing.',
            'expected':    False,   # No δ(1,'a') → implicit reject of own encoding
        },
        {
            'name':        'Infinite Right Scan',
            'encoding':    'abaabababb',   # δ(1,Δ) = (1,Δ,R): rightward loop forever
            'description': 'Loops rightward on blanks — never halts on own encoding',
            'expected':    False,
        },
        {
            'name':        'Accepts Starting \'a\'',
            # δ(1,a)=(2,a,R): accepts any string starting with 'a'
            # Own encoding starts with 'a' (all TM encodings do) → accepts self!
            'encoding':    'abaabaaab',
            'description': 'Accepts strings starting with "a". '
                           'Own encoding starts with "a" → MATHISON!',
            'expected':    True,
        },
        {
            'name':        'Accepts Starting \'ab\'',
            # δ(1,a)=(3,a,R), δ(3,b)=(2,b,R): accepts strings starting with 'ab'
            # All TM encodings start with 'ab' → accepts self!
            'encoding':    'abaaaabaabaaabaaabb',
            'description': 'Accepts strings starting with "ab". '
                           'All TM encodings start with "ab" → MATHISON!',
            'expected':    True,
        },
    ]

    for ex in examples:
        print(f"\n{'─'*60}")
        print(f"Name:        {ex['name']}")
        print(f"Description: {ex['description']}")
        print(f"Encoding:    {ex['encoding']}")

        # Check CWL membership first
        if mathison_analyzer.cwl_validator.is_valid_cwl(ex['encoding']):
            # Decode to show the transitions
            mathison_analyzer.tm_decoder.debug = False
            try:
                transitions = mathison_analyzer.tm_decoder.decode_complete_machine(
                    ex['encoding']
                )
                for t in transitions:
                    print(f"  Transition: δ({t['from']}, '{t['read']}') = "
                          f"({t['to']}, '{t['write']}', {t['move']})")
            except Exception:
                print("  (Unable to decode)")
        else:
            print("  ✗ Not a valid CWL string")

        result, reason, steps = mathison_analyzer.is_in_mathison(ex['encoding'], verbose=False)
        correct = (result == ex['expected'])
        print(f"Expected in MATHISON: {ex['expected']}")
        print(f"Actual result:        {'IN MATHISON' if result else 'NOT IN MATHISON'} "
              f"(reason: {reason}, steps: {steps})")
        print(f"{'✓ Correct' if correct else '✗ Incorrect'}")


demonstrate_mathison_examples()
EXAMPLES OF MATHISON MEMBERSHIP
============================================================

────────────────────────────────────────────────────────────
Name:        Immediate Halt (Right)
Description: Reads blank → HALT immediately. Own encoding starts with "a" not blank, so...
Encoding:    abaabbabbb

Decoding Turing Machine
Total encoding length: 10 characters
============================================================

Transition 1: (1, 'Δ') → (2, '#', R)

Successfully decoded single transition

Decoded TM contains a single transition
  Transition: δ(1, 'Δ') = (2, '#', R)

Decoding Turing Machine
Total encoding length: 10 characters
============================================================

Transition 1: (1, 'Δ') → (2, '#', R)

Successfully decoded single transition

Decoded TM contains a single transition
Expected in MATHISON: False
Actual result:        IN MATHISON (reason: halts, steps: 1)
✗ Incorrect

────────────────────────────────────────────────────────────
Name:        Crash on Left Move
Description: Tries to move left from cell 0 on first step → crash
Encoding:    abaabbaaba

Decoding Turing Machine
Total encoding length: 10 characters
============================================================

Transition 1: (1, 'Δ') → (2, 'b', L)

Successfully decoded single transition

Decoded TM contains a single transition
  Transition: δ(1, 'Δ') = (2, 'b', L)

Decoding Turing Machine
Total encoding length: 10 characters
============================================================

Transition 1: (1, 'Δ') → (2, 'b', L)

Successfully decoded single transition

Decoded TM contains a single transition
Expected in MATHISON: False
Actual result:        NOT IN MATHISON (reason: left_boundary_crash, steps: 0)
✓ Correct

────────────────────────────────────────────────────────────
Name:        Write Then Halt
Description: Writes b rightward, then moves left. Own encoding starts with "a", so transition for "a" missing.
Encoding:    abaabbabbabaababbaba
  ✗ Not a valid CWL string
Expected in MATHISON: False
Actual result:        NOT IN MATHISON (reason: not_cwl, steps: 0)
✓ Correct

────────────────────────────────────────────────────────────
Name:        Infinite Right Scan
Description: Loops rightward on blanks — never halts on own encoding
Encoding:    abaabababb

Decoding Turing Machine
Total encoding length: 10 characters
============================================================

Transition 1: (1, 'b') → (2, 'b', R)

Successfully decoded single transition

Decoded TM contains a single transition
  Transition: δ(1, 'b') = (2, 'b', R)

Decoding Turing Machine
Total encoding length: 10 characters
============================================================

Transition 1: (1, 'b') → (2, 'b', R)

Successfully decoded single transition

Decoded TM contains a single transition
Expected in MATHISON: False
Actual result:        NOT IN MATHISON (reason: timeout, steps: 0)
✓ Correct

────────────────────────────────────────────────────────────
Name:        Accepts Starting 'a'
Description: Accepts strings starting with "a". Own encoding starts with "a" → MATHISON!
Encoding:    abaabaaab
  ✗ Not a valid CWL string
Expected in MATHISON: True
Actual result:        NOT IN MATHISON (reason: not_cwl, steps: 0)
✗ Incorrect

────────────────────────────────────────────────────────────
Name:        Accepts Starting 'ab'
Description: Accepts strings starting with "ab". All TM encodings start with "ab" → MATHISON!
Encoding:    abaaaabaabaaabaaabb
  ✗ Not a valid CWL string
Expected in MATHISON: True
Actual result:        NOT IN MATHISON (reason: not_cwl, steps: 0)
✗ Incorrect

4. Practice Exercises#

4.1 State Numbering Convention#

Given the following Turing machine with descriptive state names, apply the standard numbering convention:

  • States: q_init, q_loop, q_check, q_final, q_process

  • Start state: q_init

  • Halt state: q_final

4.2 Encode a Single Transition#

Encode the following transition using the ‘a’ and ‘b’ encoding scheme:

Transition: (State 3, symbol ‘b’) → (State 5, symbol ‘a’, move Left)

4.3 Transition Table Creation#

Create a transition table for a TM that:

  • Starts in state 1

  • If it reads ‘a’, it changes it to ‘b’ and halts (state 2)

  • If it reads ‘b’, it moves right staying in state 1

  • If it reads blank (Δ), it writes ‘a’ and halts

Task: Create the table with columns: From, To, Read, Write, Move

4.4 Decode a Complete Transition#

Decode the following string: \(aabaaabbabaa\)

4.5 Decode Multiple Transitions#

Decode the following string: \(ababbabaabaabaabbaba\)

4.6 CWL Membership#

Determine if the following strings belong to CWL. If not, explain why:

  1. abaabbabbb

  2. abcabbabbb

  3. abaabbabb

  4. bbaabbabbb

  5. abaabbabbbaabaabbaba

4.7 CWL Pattern Analysis#

Given that \(CWL = (a^+ba^+b(a+b)^5)^*\), explain why a string of length 15 might or might not be in CWL.

4.8 ALAN Membership Analysis#

For each encoding below, determine if it belongs to ALAN and explain why:

  1. aaabaaabbabbb (TM with no state 1)

  2. ababbabbb (TM that loops in state 1)

  3. abaabbaba (TM that halts immediately)

4.9 Self-Reference Problem#

Consider the TM encoded by \(abaabbaaabaaabaaababbb\),

  • Does this TM accept its own encoding?

  • Is this encoding in ALAN?

  • Trace the first 5 steps of execution when the TM runs on its own encoding

4.10 MATHISON Quick Check#

Without full simulation, determine which encodings likely belong to MATHISON:

  1. abaabbaab

  2. ababbaabb

  3. abaaabbaaabaaabaabbaba

4.11 TM Encoder Implementation#

Create a class that encodes Turing Machines into the ‘a’ and ‘b’ string representation. The encoder should handle state numbering conventions and produce lexicographically sorted encodings.

Requirements: implement a TMEncoder class with the following methods:

  • encode_state(n) - Encodes state number n as a^n b

  • encode_transition(from, to, read, write, move) - Encodes a single transition

  • encode_machine(transitions) - Encodes complete TM with lexicographic sorting

4.12 TM Decoder with Error Handling#

Create a robust decoder that can handle malformed encodings and provide detailed error messages.

Requirements: implement a TMDecoder class that:

  • Decodes state numbers from a^n b patterns

  • Decodes symbols and directions

  • Handles incomplete or invalid encodings gracefully

  • Returns structured transition data

4.13 CWL Validator#

Implement a validator that checks if a string belongs to the Code Word Language (CWL).

Requirements: create a function that validates CWL membership using the pattern: \(CWL = (a^+ba^+b(a+b)^5)^*\)

4.14 ALAN Membership Checker#

Implement a complete ALAN membership checker that determines if a CWL string belongs to language ALAN.

Requirements: create a system that:

  • Validates CWL membership

  • Decodes the TM

  • Checks TM validity

  • Simulates the TM on its own encoding

  • Determines ALAN membership

5. Further Reading#

  • “Introduction to the Theory of Computation” by Michael Sipser, Chapter 3

  • “Introduction to Computer Theory” by Daniel I.A. Cohen, Chapter 23

  • “Automata Theory, Languages, and Computation” by Hopcroft, Motwani, and Ullman, Chapter 9