Turing Machine Languages

15. Turing Machine Languages#

Introduction#

This section explores the theoretical foundations of Turing machine languages, including recursively enumerable languages, and properties of these languages.

Code: The `TuringMachine` Class — Shared Infrastructure#

This class is the foundation for every code example in this chapter. Run this cell before any other cell or you will get a NameError. It translates the standard 7-tuple formal definition of a Turing machine into a Python object.

Formal definition → Python mapping

Formal notation	Python attribute / parameter
Set of states \(Q\)	`self.states` (a Python `set`)
Input alphabet \(\Sigma\)	`self.alphabet`
Tape alphabet \(\Gamma\)	`self.tape_alphabet` (must include blank \(\Delta\))
Transition function \(\delta : Q \times \Gamma \to Q \times \Gamma \times \{L, R\}\)	`self.transition_function` — a `dict` keyed by `(state, symbol)` tuples
Start state \(q_0\)	`self.start_state`
Accept state \(q_{accept}\)	`self.accept_state`
Reject state \(q_{reject}\)	`self.reject_state`
Tape contents	`self.tape` — a Python `list` of single-character strings
Read/write head position	`self.head_position` — a zero-based integer index

What to look for: The step() method encodes one application of \(\delta\). The run() method encodes the definition of acceptance: the machine accepts \(w\) if and only if it reaches accept_state within max_steps. The 'loop' return value is an approximation — a real TM may loop beyond max_steps; the simulator cannot distinguish a slow-halting machine from a non-halting one.

Running outside Jupyter: This class is not meant to be executed standalone — it is a dependency. Save it in a file called turing_machine.py and import it in any script that calls create_aa_substring_tm(), create_start_with_b_tm(), or create_starts_with_a_tm().

# TuringMachine class — implements the 7-tuple (Q, Σ, Γ, δ, q0, q_accept, q_reject)
# from the formal definition (see Section 3.1 of Sipser, or textbook Section 13.1).
# This class is SHARED BY ALL examples in Chapter 15. Run this cell first.

class TuringMachine:
    def __init__(self, states, alphabet, tape_alphabet, transition_function,
                 start_state, accept_state, reject_state):
        # --- Phase 1: Store the machine's formal specification ---
        self.states = states                         # Q: finite set of states
        self.alphabet = alphabet                     # Σ: input alphabet (no blank)
        self.tape_alphabet = tape_alphabet           # Γ: tape alphabet (Σ ⊆ Γ, includes blank)
        self.transition_function = transition_function  # δ: (Q \ {q_acc, q_rej}) × Γ → Q × Γ × {L,R}
        self.start_state = start_state               # q_0 ∈ Q
        self.accept_state = accept_state             # q_accept ∈ Q
        self.reject_state = reject_state             # q_reject ∈ Q, q_reject ≠ q_accept

        # --- Phase 2: Initialize runtime state (reset on each call to run()) ---
        self.tape = []           # Γ*: tape contents as a list (grows rightward only)
        self.head_position = 0   # Integer index into self.tape; 0 = leftmost cell
        self.current_state = start_state

    def step(self):
        """
        Execute one step of δ.
        Returns True if the machine is still running, False if it has halted.

        Formal rule: if current_state ∈ {q_accept, q_reject}, the machine has halted —
        no further transitions are defined (Sipser: "once the machine enters q_accept or
        q_reject, it halts immediately").
        """
        # --- Halting check: δ is undefined on accept/reject states ---
        if self.current_state in [self.accept_state, self.reject_state]:
            return False  # Machine already halted; caller should not call step() again

        # --- Read the current tape symbol (treat out-of-bounds as blank) ---
        # A real TM tape is infinite; the list is finite, so reading past its end = blank (△)
        current_symbol = (self.tape[self.head_position]
                          if self.head_position < len(self.tape)
                          else '△')

        # --- Look up δ(current_state, current_symbol) ---
        if (self.current_state, current_symbol) in self.transition_function:
            new_state, write_symbol, direction = (
                self.transition_function[(self.current_state, current_symbol)]
            )

            # --- Write phase: δ writes write_symbol to the current cell ---
            if self.head_position < len(self.tape):
                self.tape[self.head_position] = write_symbol
            else:
                self.tape.append(write_symbol)  # Extend tape rightward on first visit

            # --- Move phase: δ moves head Left or Right ---
            if direction == 'R':
                self.head_position += 1
            elif direction == 'L':
                # max(0, ...) enforces the left-end boundary of the tape
                self.head_position = max(0, self.head_position - 1)

            self.current_state = new_state
            return True  # Machine took a step; still running

        else:
            # No transition defined → implicit rejection
            # (This models a machine that crashes on undefined (state, symbol) pairs.)
            self.current_state = self.reject_state
            return False

    def run(self, input_string, max_steps=1000):
        """
        Run the TM on input_string for up to max_steps steps.

        Returns one of three strings that correspond directly to the three
        classification categories from the textbook:
          'accept'  → w ∈ L(T)  (machine halted in q_accept)
          'reject'  → w ∉ L(T)  (machine halted in q_reject, explicitly or implicitly)
          'loop'    → machine did not halt within max_steps (approximates non-termination)

        Note: 'loop' is a simulation artifact. A true TM either loops forever or halts;
        the simulator uses max_steps as a cutoff because it cannot solve the Halting Problem.
        """
        # --- Phase 1: Initialize tape and head position from input ---
        self.tape = list(input_string)   # Each character becomes one tape cell
        self.head_position = 0
        self.current_state = self.start_state

        # --- Phase 2: Step until halt or step limit ---
        steps = 0
        while steps < max_steps:
            if self.current_state == self.accept_state:
                return 'accept'
            elif self.current_state == self.reject_state:
                return 'reject'

            if not self.step():
                break   # step() returned False → machine halted (accept or reject handled above)
            steps += 1

        # --- Phase 3: Classify the outcome ---
        if steps >= max_steps:
            return 'loop'    # Did not halt within budget → treat as non-termination
        else:
            return 'reject'  # Halted (via step() returning False) but not in accept state

1. Recursively Enumerable Languages#

1.1 Defintion#

A language \(L\) is recursively enumerable if there exists a Turing machine \(T\) such that:

For every string \(w \in L\), \(T\) halts and accepts \(w\)
For every string \(s \notin L\), \(T\) either rejects \(s\) or loops forever

We say a recursively enumerable language \(L\) is Turing-recognizable since there exists a Turing machine that accepts every string in \(L\). In other words, the machine will eventually halt and accept any input that belongs to the language. Another way to describe this is that there’s a Turing machine that can enumerate all the strings in \(L\), one by one. If a string is in \(L\), the machine will eventually recognize it and accept it. Later in this section, we’ll introduce Turing-decidable languages, which are quite different from Turing-recognizable ones.

1.2 TM Examples#

1.2.1 Example 1#

The following Turing machine has three states:

State 1 (START state)
State 2 (intermediate state)
State 3 (HALT state)

The transitions are:

From state 1 to state 2: On reading ‘a’, write ‘a’, move Right (a, a, R)
From state 2 to state 3 (HALT): On reading ‘a’, write ‘a’, move Right (a, a, R)
From state 2 to state 1: On reading ‘b’, write ‘b’, move Right (b, b, R)
From state 1 to state 1: On reading ‘b’, write ‘b’, move Right (b, b, R)
From state 1 to state 1: On reading blank (Δ), write blank, move Right (Δ, Δ, R)

        flowchart LR
    accTitle: A Example TM
    accDescr: a diagram representing a TM that has three states
    start((1 START))
    state2((2))
    halt[HALT 3]
    
    %% Self loop on start state
    start -->|"(b, b, R)/(Δ, Δ, R)"| start
    
    %% Transition from start to state2
    start -->|"(a, a, R)"| state2
    
    %% Transition from state2 to halt
    state2 -->|"(a, a, R)"| halt
    
    %% Transition from state2 back to start
    state2 -->|"(b, b, R)"| start
    
    %% Style to match the image
    classDef default fill:#fff,stroke:#333,stroke-width:1px
    classDef halt fill:#fff,stroke:#333,stroke-width:1px,rx:15,ry:15
    
    class halt halt

Accept Class: Based on the transitions, the machine accepts if and only if it reads \(aa\) somewhere in the input string. More precisely, it accepts an input string if it finds an \(a\) followed immediately by another \(a\). Examples of strings in the accept class include \(aab\), \(baa\), \(baaba\).

\(ACCEPT(T)\): any strings with double \(a\).

Reject Class: This machine doesn’t have an explicit reject state, but it can implicitly reject if it reaches a configuration where no transition is defined. For a string ending in \(a\) but without \(aa\), the machine will process the string and never encounter two consecutive \(a\)’s. When it reaches the final \(a\), it will transition from state 1 to state 2. Since there are no more characters, it will not move to state 3, but it also has no transition defined for state 2 with blank input. This causes the machine to reject the string (by halting without reaching the accept state). Examples of rejected strings include \(ba\), \(baba\), \(bbba\).

\(REJECT(T)\): any strings without double \(a\) and end in \(a\).

Loop Class: By analyzing the two loop transitions at state 1, we see that the loop class contains all strings that do not contain the substring \(aa\) and end in \(b\). For such strings, the machine will process each symbol without ever encountering two consecutive \(a\)’s. When it reaches the final \(b\), it either remains in state 1 or transitions from state 2 back to state 1. With no further input to process, the machine enters an infinite loop in state 1, causing the machine to loop forever. Examples of strings causing loops include \(b\), \(ab\), \(bab\).

\(LOOP(T)\): any strings without double \(a\) and end in \(b\).

Code Example: Recursively Enumerable Language (the “contains aa” TM)#

Depends on: The TuringMachine class defined in the cell above. Run that cell first.

This block implements the three-state TM from Section 1.2.1 and then classifies strings into the three formal categories (ACCEPT, REJECT, LOOP) defined in Section 1.

Formal notation → Python mapping for this TM

Textbook element	Python encoding
\(Q = \{1, 2, 3\}\)	`states = {'1', '2', '3'}`
\(\Sigma = \{a, b\}\)	`alphabet = {'a', 'b'}`
\(\delta(1, a) = (2, a, R)\)	`('1', 'a'): ('2', 'a', 'R')`
\(\delta(2, a) = (3, a, R)\) (HALT = accept)	`('2', 'a'): ('3', 'a', 'R')`
\(\delta(2, b) = (1, b, R)\)	`('2', 'b'): ('1', 'b', 'R')`
\(\delta(1, b) = (1, b, R)\)	`('1', 'b'): ('1', 'b', 'R')`
\(\delta(1, \Delta) = (1, \Delta, R)\)	`('1', '△'): ('1', '△', 'R')`
No transition for \((2, \Delta)\)	Key `('2', '△')` absent from dict → implicit reject

What to look for: contains_aa, is_reject_class, and is_loop_class are Python oracles — they compute directly what the TM computes via state transitions. Comparing tm.run(test) against these oracles is how the test harness verifies the TM behaves correctly. Notice that the three predicates are mutually exclusive and exhaustive: every string over \(\{a, b\}^*\) falls into exactly one category.

Running outside Jupyter: Save this cell (and the TuringMachine class) in a single file, say ch15_example1.py. Replace any display() calls with print(). Then run:

python ch15_example1.py

# ── Block 2: Example 1 ── Recursively Enumerable Language ──────────────────
# Implements the 3-state TM from Section 1.2.1 that recognizes L = {w ∈ {a,b}* | w contains "aa"}
# Depends on: TuringMachine class (Block 1 must be run first)

def create_aa_substring_tm():
    """
    Construct the TM from the Section 1.2.1 transition diagram.

    State roles (formal → Python string):
      State 1 (START)   : haven't seen an 'a' yet in the current run
      State 2           : just saw exactly one 'a'; looking for a second consecutive 'a'
      State 3 (HALT)    : q_accept — two consecutive 'a's confirmed

    ACCEPT(T)  = strings containing substring "aa"            = {w | "aa" ⊆ w}
    REJECT(T)  = strings without "aa" that end in 'a'         = {w | "aa" ⊄ w ∧ w ends in 'a'}
    LOOP(T)    = strings without "aa" that end in 'b' (or ε)  = {w | "aa" ⊄ w ∧ (w = ε ∨ w ends in 'b')}
    """
    states = {'1', '2', '3'}
    alphabet = {'a', 'b'}
    tape_alphabet = {'a', 'b', '△'}   # △ = blank; Γ = Σ ∪ {△}

    # δ encoded as a dict: (current_state, read_symbol) → (next_state, write_symbol, direction)
    # Each entry corresponds to one labeled arrow in the Section 1.2.1 diagram.
    transition_function = {
        # ── Transitions out of State 1 (START) ──────────────────────────────
        # δ(1, a) = (2, a, R): first 'a' seen; move to "waiting for second 'a'" state
        ('1', 'a'): ('2', 'a', 'R'),

        # δ(1, b) = (1, b, R): 'b' resets progress; self-loop keeps scanning
        ('1', 'b'): ('1', 'b', 'R'),

        # δ(1, △) = (1, △, R): blank causes an infinite loop on strings ending in 'b' or ε
        # Design choice: no explicit reject state → strings without "aa" ending in 'b' loop forever
        ('1', '△'): ('1', '△', 'R'),

        # ── Transitions out of State 2 (intermediate) ───────────────────────
        # δ(2, a) = (3, a, R): second consecutive 'a' found → accept (HALT)
        ('2', 'a'): ('3', 'a', 'R'),

        # δ(2, b) = (1, b, R): the 'a' streak is broken; reset to START state
        ('2', 'b'): ('1', 'b', 'R'),

        # Note: ('2', '△') is intentionally absent.
        # If State 2 reads a blank, no transition fires → implicit rejection.
        # This accounts for strings like "ba", "baba" that end in 'a' without "aa".
    }

    start_state  = '1'
    accept_state = '3'        # q_accept = State 3 (HALT in the diagram)
    reject_state = 'reject'   # q_reject: not in diagram; used only for implicit rejection

    return TuringMachine(states, alphabet, tape_alphabet, transition_function,
                         start_state, accept_state, reject_state)


# ── Python oracles: compute classification directly (used to verify TM behavior) ──

def contains_aa(string):
    """
    Oracle for ACCEPT(T): True iff w ∈ L = {w | "aa" ⊆ w}.
    Used only for testing — the TM computes this via state transitions, not string search.
    """
    return 'aa' in string

def is_reject_class(string):
    """
    Oracle for REJECT(T): strings without "aa" that end in 'a'.
    The TM reaches State 2 on the final 'a' and then reads blank → no transition → implicit reject.
    """
    return not contains_aa(string) and string.endswith('a')

def is_loop_class(string):
    """
    Oracle for LOOP(T): strings without "aa" that end in 'b' or are empty.
    The TM loops on δ(1, △) = (1, △, R) forever.
    """
    return not contains_aa(string) and not string.endswith('a')


# ── Test harness ─────────────────────────────────────────────────────────────

def test_tm():
    """
    Run the TM on representative strings from each classification class
    and compare against the Python oracles.
    """
    tm = create_aa_substring_tm()

    # --- ACCEPT class: strings containing "aa" ---
    accept_tests = ["aa", "aab", "baa", "baab", "ababaa"]
    print("ACCEPT CLASS TESTS (should all be 'accept'):")
    for test in accept_tests:
        result   = tm.run(test)
        expected = "accept" if contains_aa(test) else "not accept"
        print(f"  String: '{test}', Result: {result}, Expected: {expected}")

    # --- REJECT class: no "aa", ends in 'a' ---
    reject_tests = ["a", "ba", "baba", "bbba"]
    print("\nREJECT CLASS TESTS (should all be 'reject'):")
    for test in reject_tests:
        result   = tm.run(test)
        expected = "reject" if is_reject_class(test) else "not reject"
        print(f"  String: '{test}', Result: {result}, Expected: {expected}")

    # --- LOOP class: no "aa", ends in 'b' or empty ---
    loop_tests = ["", "b", "bb", "bab"]
    print("\nLOOP CLASS TESTS (should all be 'loop'):")
    for test in loop_tests:
        result   = tm.run(test)
        expected = "loop" if is_loop_class(test) else "not loop"
        print(f"  String: '{test}', Result: {result}, Expected: {expected}")

if __name__ == "__main__":
    test_tm()

ACCEPT CLASS TESTS (should all be 'accept'):
  String: 'aa', Result: accept, Expected: accept
  String: 'aab', Result: accept, Expected: accept
  String: 'baa', Result: accept, Expected: accept
  String: 'baab', Result: accept, Expected: accept
  String: 'ababaa', Result: accept, Expected: accept

REJECT CLASS TESTS (should all be 'reject'):
  String: 'a', Result: reject, Expected: reject
  String: 'ba', Result: reject, Expected: reject
  String: 'baba', Result: reject, Expected: reject
  String: 'bbba', Result: reject, Expected: reject

LOOP CLASS TESTS (should all be 'loop'):
  String: '', Result: loop, Expected: loop
  String: 'b', Result: loop, Expected: loop
  String: 'bb', Result: loop, Expected: loop
  String: 'bab', Result: loop, Expected: loop

1.2.2 Example 2#

The following Turing machine has three states:

START (1): The initial state
2: An intermediate state
HALT: The accepting state

The transitions are:

From START to 2:
- When reading ‘Δ’ (blank), write ‘b’, move Right
- When reading ‘a’, write ‘b’, move Right
From START to HALT:
- When reading ‘b’, write ‘b’, move Right
From 2 to 2:
- When reading ‘a’, write ‘b’, move Right
- When reading ‘b’, write ‘b’, move Right
- When reading ‘Δ’ (blank), write ‘b’, move Right

        stateDiagram-v2
    accTitle: A Example TM
    accDescr: a diagram representing a TM that has three states
    START: START (1)
    INTERMEDIATE: 2
    HALT: HALT

    START --> INTERMEDIATE : (Δ,b,R)/(a,b,R)
    START --> HALT : (b,b,R)
    INTERMEDIATE --> INTERMEDIATE : (a,b,R)/(b,b,R)/(Δ,b,R)

This Turing machine accepts strings that start with \(b\). Therefore, the language recognized by this machine is \(L = b(a+b)^*\)

Accept Class: Strings that start with \(b\) are accepted. When the machine reads \(b\) in the start state (1), it transitions directly to the HALT state. Examples: \(b\), \(ba\), \(bb\), \(baa\).

\(ACCEPT(T)\): L

Reject Class: This machine doesn’t explicitly reject any strings - it either accepts or loops.

\(REJECT(T): \varnothing\)

Loop Class: All strings that don’t start with \(b\) will cause the machine to loop. If the first symbol is \(a\)’ or \(Δ\), the machine transitions to state 2. Once in state 2, the machine loops indefinitely, as all transitions from 2 lead back to itself. Examples: \(a\), \(aa\), \(ab\).

\(LOOP(T)\): \(L'\), the complement of \(L\).

Since we can construct a Turing machine that accepts strings in \(L = b(a+b)^*\) and either rejects or loops on strings not in \(L\), we conclude that \(L\) is a recursively enumerable language.

Code Example: Recursively Enumerable Language (the “starts with b” TM)#

Depends on: The TuringMachine class (Block 1). Run Block 1 first.

This block implements the three-state TM from Section 1.2.2 that recognizes \(L = b(a+b)^*\).

Formal notation → Python mapping for this TM

Textbook element	Python encoding
\(Q = \{\text{START}, 2, \text{HALT}\}\)	`states = {'x1', 'x2', 'HALT'}`
\(\delta(\text{START}, b) = (\text{HALT}, b, R)\)	`('x1', 'b'): ('HALT', 'b', 'R')`
\(\delta(\text{START}, a) = (2, b, R)\)	`('x1', 'a'): ('x2', 'b', 'R')`
\(\delta(\text{START}, \Delta) = (2, b, R)\)	`('x1', '△'): ('x2', 'b', 'R')`
\(\delta(2, \cdot) = (2, b, R)\) for all \(\cdot \in \Gamma\)	`('x2', 'a')`, `('x2', 'b')`, `('x2', '△')` all → `('x2', 'b', 'R')`
REJECT(T) = ∅	No `reject_state` transitions in diagram; machine only accepts or loops

What to look for: The analyze_start_with_b_tm() function runs the TM on a broad set of strings and groups results automatically — this is an empirical language description approach. It identifies the pattern from behavior rather than from the transition table. Compare it to reading the language directly from the diagram.

Running outside Jupyter: Save in ch15_example2.py alongside the TuringMachine class and run:

python ch15_example2.py

# ── Block 3: Example 2 ── Recursively Enumerable Language ──────────────────
# Implements the 3-state TM from Section 1.2.2 that recognizes L = b(a+b)*
# Depends on: TuringMachine class (Block 1 must be run first)

def create_start_with_b_tm():
    """
    Construct the TM from the Section 1.2.2 transition diagram.

    State roles:
      x1 (START) : reading the first character — branch to HALT if 'b', else loop via x2
      x2          : entered when first character is NOT 'b'; self-loops forever on all input
      HALT        : q_accept — first character was 'b'

    ACCEPT(T)  = b(a+b)*        (strings starting with 'b')
    REJECT(T)  = ∅               (no explicit reject state in diagram)
    LOOP(T)    = complement of L (strings NOT starting with 'b', including ε)
    """
    states       = {'x1', 'x2', 'HALT'}
    alphabet     = {'a', 'b', '△'}
    tape_alphabet = {'a', 'b', '△'}

    transition_function = {
        # ── Transitions out of x1 (START) ────────────────────────────────────
        # δ(x1, b) = (HALT, b, R): first symbol is 'b' → accept immediately
        ('x1', 'b'): ('HALT', 'b', 'R'),

        # δ(x1, a) = (x2, b, R): first symbol is not 'b' → enter the eternal loop state
        # Design: the machine overwrites 'a' with 'b' (modifying the tape), but this
        # does not affect acceptance since the machine loops forever from x2.
        ('x1', 'a'): ('x2', 'b', 'R'),

        # δ(x1, △) = (x2, b, R): empty string → first "symbol" is blank → loop state
        ('x1', '△'): ('x2', 'b', 'R'),

        # ── Transitions out of x2 (eternal-loop state) ───────────────────────
        # δ(x2, ·) = (x2, b, R) for all · ∈ Γ
        # This implements LOOP(T) = L' (complement of L):
        # once in x2, the machine scans right forever, never halting.
        ('x2', 'a'): ('x2', 'b', 'R'),
        ('x2', 'b'): ('x2', 'b', 'R'),
        ('x2', '△'): ('x2', 'b', 'R'),
    }

    start_state  = 'x1'
    accept_state = 'HALT'
    reject_state = 'reject'   # Not in diagram; required by TuringMachine constructor

    return TuringMachine(states, alphabet, tape_alphabet, transition_function,
                         start_state, accept_state, reject_state)


def analyze_start_with_b_tm():
    """
    Empirically determine the language by running the TM on a broad test set
    and grouping results into ACCEPT / REJECT / LOOP classes.

    This approach identifies the language from observed behavior — useful when the
    transition table is complex and the accepted language is not immediately obvious.
    """
    tm = create_start_with_b_tm()

    # Test a representative sample of strings over {a, b}* up to length 3
    test_strings = ['', 'a', 'b', 'aa', 'ab', 'bb', 'aaa', 'aab', 'aba',
                    'abb', 'baa', 'bab', 'bba', 'bbb']
    results = {}

    for string in test_strings:
        results[string] = tm.run(string)   # 'accept', 'reject', or 'loop'

    # --- Partition results into three formal classes ---
    accepted_strings = [s for s, r in results.items() if r == 'accept']
    rejected_strings = [s for s, r in results.items() if r == 'reject']
    looping_strings  = [s for s, r in results.items() if r == 'loop']

    print("Analysis of Turing Machine Language:")
    print(f"  ACCEPT(T): {accepted_strings}")
    print(f"  REJECT(T): {rejected_strings}")
    print(f"  LOOP(T):   {looping_strings}")

    # --- Infer the language pattern from ACCEPT(T) ---
    print("\nLanguage Description:")
    if accepted_strings and all(s.startswith('b') for s in accepted_strings):
        print("  This TM accepts exactly the strings that start with 'b'.")
        print("  Formal language: L = b(a+b)*")
    elif not accepted_strings:
        print("  This TM accepts no strings (L = ∅).")
    else:
        print("  Pattern: [determine from accepted_strings]")

    return results


# ── Standalone test ──────────────────────────────────────────────────────────

if __name__ == "__main__":
    # Quick spot-check on individual inputs
    tm = create_start_with_b_tm()
    spot_checks = ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

    print("Spot-check results:")
    for test in spot_checks:
        print(f"  Input: '{test}', Result: {tm.run(test)}")

    print("\n" + "=" * 50)
    analyze_start_with_b_tm()

Spot-check results:
  Input: '', Result: loop
  Input: 'a', Result: loop
  Input: 'b', Result: accept
  Input: 'aa', Result: loop
  Input: 'ab', Result: loop
  Input: 'ba', Result: accept
  Input: 'bb', Result: accept

==================================================
Analysis of Turing Machine Language:
  ACCEPT(T): ['b', 'bb', 'baa', 'bab', 'bba', 'bbb']
  REJECT(T): []
  LOOP(T):   ['', 'a', 'aa', 'ab', 'aaa', 'aab', 'aba', 'abb']

Language Description:
  This TM accepts exactly the strings that start with 'b'.
  Formal language: L = b(a+b)*

2. Recursive Language#

2.1 Defintion#

A language \(L\) is recursive if there exists a Turing machine \(T\) such that:

For every string \(w \in L\), \(T\) halts and accepts \(w\)
For every string \(s \notin L\), \(T\) rejects \(s\)
\(T\) must terminates on all inputs after a finite number of steps

Think of a recursive language as a language for which you can write a computer program that always finishes running and always tells you correctly whether the input belongs to the language. We say a recursive language \(L\) is Turing-decidable since there exists a Turing machine that decides membership for every string in \(L\). That is, the machine will halt on all inputs: it accepts strings that belong to \(L\) and rejects those that don’t, with no possibility of looping forever. This makes recursive languages more powerful in a practical sense, because you can always determine whether a given string is in the language. Unlike Turing-recognizable languages, recursive languages guarantee both recognition and rejection through halting behavior.

2.2 Examples#

This is a simple Turing machine with just two states:

START (the initial state)
HALT (the accepting state)

There is a single transition:

From START state to HALT state when reading ‘a’, writing ‘a’, and moving the tape head right (a,a,R)

        stateDiagram-v2
    accTitle: A Simple TM
    accDescr: a diagram representing a TM that has two states
    START: START
    HALT: HALT

    START --> HALT : (a,a,R)

The language being accepted by this Turing machine contains all strings starting with \(a\).

\(ACCEPT(T): a(a+b)^*\)
\(REJECT(T): \Lambda + b(a+b)^*\)
\(LOOP(T): \varnothing\)

This language is recursive (decidable) because the machine accepting it always terminates on any input. For any input string, the machine gives a definitive “yes” or “no” answer in a finite number of steps.

2.3 Code: Recursive (Decidable) Language — the “starts with a” TM#

Depends on: The TuringMachine class (Block 1). Run Block 1 first.

⚠️ Name collision: This cell defines a function called test_tm(), the same name used in Block 2. In a live Jupyter kernel, running this cell will overwrite the Block 2 test_tm(). If you need to re-run Block 2’s tests, re-execute Block 2 after this cell. To avoid confusion in teaching, consider renaming this to test_recursive_tm().

This block implements the two-state TM from Section 2.2 that decides \(L = a(a+b)^*\). The critical difference from Examples 1 and 2 is that LOOP(T) = ∅ — the machine always halts, making \(L\) recursive (decidable).

Formal notation → Python mapping for this TM

Textbook element	Python encoding
\(Q = \{\text{START}, \text{HALT}\}\)	`states = {'START', 'HALT'}`
\(\delta(\text{START}, a) = (\text{HALT}, a, R)\)	`('START', 'a'): ('HALT', 'a', 'R')`
All other \((q, \sigma)\) pairs undefined	Missing keys → implicit rejection
ACCEPT(T) = \(a(a+b)^*\)	Any string whose first character is `'a'`
REJECT(T) = \(\Lambda + b(a+b)^*\)	Empty string or strings starting with non-`'a'`
LOOP(T) = ∅	No string loops — this is what makes the language recursive

Dependency diagram for this block:

TuringMachine (Block 1)
    └── create_starts_with_a_tm()   → returns a TuringMachine instance
            └── test_tm()
                    ├── tm.run(test)         (calls TuringMachine.run)
                    └── starts_with_a(test)  (oracle for comparison)

Running outside Jupyter:

python ch15_example3.py

# ── Block 4: Recursive (Decidable) Language ─────────────────────────────────
# Implements the 2-state TM from Section 2.2 that DECIDES L = a(a+b)*
# Depends on: TuringMachine class (Block 1 must be run first)
#
# ⚠️  WARNING: This cell defines test_tm(), which was also defined in Block 2.
#     Running this cell in a live kernel will overwrite Block 2's test_tm().
#     Consider renaming this to test_recursive_tm() to avoid the collision.

def create_starts_with_a_tm():
    """
    Construct the TM from the Section 2.2 transition diagram (two states only).

    This TM is a DECIDER for L = a(a+b)*:
      - It accepts every w ∈ L (starts with 'a')
      - It rejects every w ∉ L (empty string or starts with non-'a')
      - LOOP(T) = ∅  ← this is the defining property of a recursive language

    The single transition δ(START, a) = (HALT, a, R) means:
      - Any string starting with 'a' → immediately accepts after one step
      - Any other first symbol       → no transition fires → implicit rejection
    """
    states       = {'START', 'HALT'}
    alphabet     = {'a', 'b', 'c'}       # Σ: can extend without changing acceptance behavior
    tape_alphabet = {'a', 'b', 'c', '△'} # Γ = Σ ∪ {blank}

    # δ contains exactly ONE transition: the entire decision is made on the first symbol.
    # Every other (state, symbol) pair is undefined → implicit rejection → always halts.
    # This guarantees LOOP(T) = ∅, making L recursive.
    transition_function = {
        # δ(START, a) = (HALT, a, R): first symbol is 'a' → accept
        ('START', 'a'): ('HALT', 'a', 'R'),
        # ('START', 'b'), ('START', '△'), etc. are all absent → reject immediately
    }

    start_state  = 'START'
    accept_state = 'HALT'
    reject_state = 'REJECT'  # Implicit rejections transition here internally

    return TuringMachine(states, alphabet, tape_alphabet, transition_function,
                         start_state, accept_state, reject_state)


def starts_with_a(string):
    """
    Oracle for ACCEPT(T): True iff w ∈ L = a(a+b)*.
    Equivalent to the TM's decision, but computed directly in Python.
    """
    return string.startswith('a') if string else False  # ε ∉ L (empty string rejected)


# ⚠️  NAME COLLISION: this function is also defined in Block 2 as test_tm().
# Running this cell overwrites that definition. Rename to test_recursive_tm() if needed.
def test_tm():
    """
    Verify the decider by comparing tm.run() against the starts_with_a() oracle.
    Every result should be either 'accept' or 'reject' — never 'loop'.
    """
    tm = create_starts_with_a_tm()

    # --- Strings that should be ACCEPTED: L = a(a+b)* ---
    accept_tests = ["a", "aa", "ab", "abc"]
    print("ACCEPT TESTS (strings starting with 'a'):")
    for test in accept_tests:
        result   = tm.run(test)
        expected = "accept" if starts_with_a(test) else "reject"
        print(f"  String: '{test}', Result: {result}, Expected: {expected}")

    # --- Strings that should be REJECTED: ε + b(a+b)* ---
    # Crucially: none of these should return 'loop'
    reject_tests = ["", "b", "ba", "c", "cab"]
    print("\nREJECT TESTS (strings not starting with 'a'):")
    for test in reject_tests:
        result   = tm.run(test)
        expected = "reject" if not starts_with_a(test) else "accept"
        print(f"  String: '{test}', Result: {result}, Expected: {expected}")

if __name__ == "__main__":
    test_tm()

ACCEPT TESTS (strings starting with 'a'):
  String: 'a', Result: accept, Expected: accept
  String: 'aa', Result: accept, Expected: accept
  String: 'ab', Result: accept, Expected: accept
  String: 'abc', Result: accept, Expected: accept

REJECT TESTS (strings not starting with 'a'):
  String: '', Result: reject, Expected: reject
  String: 'b', Result: reject, Expected: reject
  String: 'ba', Result: reject, Expected: reject
  String: 'c', Result: reject, Expected: reject
  String: 'cab', Result: reject, Expected: reject

3. What Makes a Language Decidable or Recognizable?#

By definition, if there exists at least one Turing machine that satisfies the criteria for recognizing or deciding a language, we classify the language accordingly:

If there is a Turing machine that accepts all strings in the language and terminates on all inputs (accepts or rejects), then the language is called recursive (decidable).
If there is a Turing machine that accepts all strings in the language but may loop forever on inputs not in the language, then the language is called recursively enumerable ( recognizable or semi-decidable).

The classification of the language depends on the existence of such a machine, not on the behavior of every possible machine for that language. So even if one Turing machine for a language loops on some inputs, that doesn’t mean the language is not recursive. There might be another Turing machine that terminates on all inputs. If even one such halting machine exists, the language is recursive. Conversely, if no machine can decide the language without possibly looping on some inputs, but there is at least one machine that can accept all valid strings, then the language is recursively enumerable but not recursive.

The key points are:

Language classification is based on what is possible, not on what every implementation does.
The presence of one looping machine does not mean the language is not recursive.
What matters is whether a halting machine (for recursive) or an accepting machine (for recursively enumerable) exists.

4. Key Properties and Features#

4.1 Dovetailing Construction#

4.1.1 What is Dovetailing#

Dovetailing is a technique used to simulate multiple Turing machines at the same time, by interleaving their steps. Instead of running one machine to completion before starting the next, dovetailing runs a little bit of each machine in turn, switching back and forth rapidly. When we have multiple machines that might run forever, dovetailing ensures we don’t get stuck waiting on one machine that never halts. Instead, we give each machine a slice of time, so if any one of them halts, we will find out eventually.

4.1.2 How does it work#

Suppose we want to simulate two machines \(T_1\) and \(T_2\) on the same input \(w\):

Run step 1 of \(T_1\)
Run step 1 of \(T_2\)
Run step 2 of \(T_1\)
Run step 2 of \(T_2\)
Run step 3 of \(T_1\)
Run step 3 of \(T_2\)
… and so on.

You keep alternating steps between the two machines. As soon as either \(T_1\) or \(T_2\) halts and accepts, the dovetailing simulation can accept the input immediately.

4.2 Union of Recursively Enumerable Languages#

The class of recursively enumerable languages is closed under union, which means if \(L_1\) and \(L_2\) are recursively enumerable languages, then their union \(L_1 \cup L_2\) is also recursively enumerable.

To prove correctness, we construct a Turing machine that accepts \(L_1 \cup L_2\) by simulating both machines for \(L_1\) and \(L_2\) in parallel. The new machine accepts any input string if either original machine accepts it, and it loops indefinitely on strings not in the union, as allowed for recursively enumerable languages. Let \(T_1\) be a Turing machine that recognizes \(L_1\). Let \(T_2\) be a Turing machine that recognizes \(L_2\). An important property of recursively enumerable languages is that Turing machines recognizing them are not required to terminate on strings not in the language. For strings \(w\), the behavior of our union machine \(T_3\) should be:

If \(w \in L_1\) or \(w \in L_2\), then \(T_3\) should accept \(w\).
If \(w \notin L_1\) and \(w \notin L_2\), then \(T_3\) should not accept \(w\), it can either:
- Explicitly reject \(w\), or
- Loop forever (not terminate at all)

The key insight here is that for an recursively enumerable language recognizer, we only guarantee acceptance for strings in the language. For strings not in the language, there is no requirement to reach a specific decision. Our union machine \(T_3\) should preserve this characteristic. This behavior is naturally achieved by the dovetailing construction:

If \(w \in L_1\), then \(T_1\) will eventually accept it, and \(T_3\) will detect this and accept it.
If \(w \in L_2\), then \(T_2\) will eventually accept it, and \(T_3\) will detect this and accept it.
If \(w \notin L_1\) and \(w \notin L_2\), then \(T_3\) should either reject it or loop forever.

Now here’s an interesting question: can we build a Turing machine that accepts all strings in the union of two recursively enumerable languages, and rejects all strings not in the union without looping forever? If the answer is yes, meaning we can build such a machine or find an algorithm that always halts with the correct answer, then by definition, the union of the two languages would be recursive, not just recursively enumerable. The answer is actually No, we cannot always build a Turing machine that both accepts all strings in the union of two recursively enumerable languages AND rejects all strings not in the union, unless both original languages are already recursive (decidable). We’ll explore the detailed proof after we discuss another key property of recursive languages. Look at it from a different angle: the construction has to deal with situations where \(T_1\) runs forever on some input \(w \notin L_1\) and \(T_2\) also runs forever on that same \(w \notin L_2\). In these cases, there’s no way for the combined machine \(T_3\) to decide in finite time that \(w\) isn’t in the union. Since we are building an recursively enumerable recognizer, looping on strings not in the language is normal and actually expected. So let us revise the third step of the dovetailing construction as follows:

If \(w \notin L_1\) and \(w \notin L_2\), then \(T_3\) will run forever without terminating.

In practical implementations of the union construction, we typically do not add explicit rejection logic. Instead:

The dovetailing process continues indefinitely.
If either \(T_1\) or \(T_2\) accepts, \(T_3\) immediately accepts.
If both \(T_1\) and \(T_2\) eventually reject, \(T_3\) could be designed to run forever. Alternatively, we can modify \(T_1\) and \(T_2\) so that, instead of rejecting words they don’t accept, they run forever on those inputs.
If either \(T_1\) or \(T_2\) runs forever, \(T_3\) does’t need any need special handling, because the original machine is already running indefinitely.

On the union machine \(T_3\):

\(ACCEPT(T_3) = ACCEPT(T_1) + ACCEPT(T_2)\)
\(REJECT(T_3) = \varnothing\)
\(LOOP(T_3) = REJECT(T_1) + REJECT(T_2) + LOOP(T_1) + LOOP(T_2)\)

4.3 Complement of Recursively Enumerable Languages#

The complement of a language \(L\), denoted as \(L'\), is defined as the set of all strings over the alphabet that are not in \(L\): \(L' = \{w \in \Sigma^* | w \notin L\}\). The vertical bar \(|\) means “such that.” If \(L\) is a recursively enumerable language, its complement \(L'\) is not necessarily recursively enumerable. Recursively enumerable languages are not closed under complement. We will prove this in the next section by introducing how Turing machines can be encoded, along with a few interesting new languages.

4.4 Complement of Recursive Languages#

If the language \(L\) is recursive, then its complement \(L'\) is also recursive. In other words, the recursive languages are closed under complementation. To prove this, it is often easier to use Post machines instead of Turing machines. Although Post machines are not covered in detail in this textbook, they are explained in the further reading section. The key idea is to modify a Post machine for a recursive language so that it handles every possible input character by explicitly adding transitions to REJECT states. By adding REJECT states, we ensure the machine has a defined transition for every input symbol in every configuration. This turns the machine’s transition function into a total function. To recognize the complement of the language, we simply flip the ACCEPT and REJECT states. This creates a new Post machine that accepts exactly the strings not in the original language and still terminates on all inputs. Since this new machine also never loops, the complement language is recursive. Finally, because every Post machine has an equivalent Turing machine, this result holds for Turing machines as well.

However, this approach does not work for recursively enumerable languages. We cannot use the same reasoning to show that the complement of a recursively enumerable language is also recursively enumerable, because a Post machine for an recursively enumerable language might loop forever on some inputs. Simply swapping the ACCEPT and REJECT states doesn’t resolve this as any input that originally caused the machine to loop will still loop in the new machine. As a result, the machine cannot decide those inputs, and the complement language may not be recursively enumerable.

This property reveals several important insights:

Decidability Means Complete Knowledge: If a problem is decidable (recursive), we have complete knowledge about both positive and negative instances. We can determine with certainty whether any string is in the language or not.
Contrast with recursively enumerable Languages: recursively enumerable languages are not closed under complementation. There exist recursively enumerable languages whose complements are not recursively enumerable.

4.5 Recursively Enumerable Complement Implies Recursiveness#

If a language \(L\) is recursively enumerable and its complement \(L'\) is also recursively enumerable, then \(L\) is recursive (decidable). To prove this, we know that both languages are recursively enumerable, so there must exist two Turing machines \(T_1\) and \(T_2\) such that \(T_1\) recognizes \(L\) and \(T_2\) recognizes \(L'\):

\(ACCEPT(T_1) = L\),
\(ACCEPT(T_2) = L'\)

We can construct a Turing machine \(T_3\) that decides \(L\) as follows: On input \(w\), run \(T_1\) and \(T_2\) in parallel using the dovetailing technique:

If \(T_1\) accepts \(w\), then \(T_3\) ACCEPT
If \(T_2\) accepts \(w\), then \(T_3\) REJECT

The reason why it works is because:

If \(w \in L\), then \(T_1\) will eventually accept \(w\), and \(T_3\) will accept it
If \(w \notin L\), that means \(w \in L'\), then \(T_2\) will eventually accept \(w\), and \(T_3\) will reject it
For any input \(w\), either \(T_1\) or \(T_2\) must eventually accept it (as \(w\) must be either in \(L\) or in \(L'\)), so \(T_3\) always terminates. Thus, \(T_3\) decides \(L\), making \(L\) recursive.

When we say that we can build something, such as a machine or algorithm, it is important to back that up by showing how the construction is actually done. Demonstrating the “can” part is what turns a claim into a proven fact. However, in this case, the detailed construction process is quite lengthy and goes beyond the scope of this textbook. For those interested in a deeper understanding, the full construction is available in the further reading section.

In practical terms, this theorem says if you can enumerate all the “yes” instances AND all the “no” instances of a problem, then you can decide the problem. However, if you can only enumerate one set (either “yes” or “no” instances), then the problem might be undecidable.

4.6 Intersection of Recursively Enumerable Languages#

If \(L_1\) and \(L_2\) are recursively enumerable languages, then their intersection \(L_1 \cap L_2\) is also recursively enumerable. This means the class of recursively enumerable languages is closed under intersection operations. To prove this, we need to construct a Turing machine \(L_3\) that recognizes \(L_1 \cap L_2\). Unlike the union construction, we need both machines to accept for \(L_3\) to accept:

Step 1: Create a two-track Turing machine that copies the input from track 1 to track 2, then moves the head back to the start and begins running \(L_1\) on track 1.
Step 2: Modify \(L_1\) to operate only on the top track, and change its HALT state so that it rewinds the tape head and then starts \(L_2\).
Step 3: Modify \(L_2\) to operate only on the bottom track without changing its HALT state.

This combined machine first runs \(L_1\) on the input. If \(L_1\) accepts it, it then runs \(L_2\) on the same input. The machine accepts only if both \(L_1\) and \(L_2\) accept. Therefore, it recognizes exactly the strings in the intersection of the two languages. Some inputs may still be rejected or cause one of the machines to loop forever. However, since we are only building a recognizer for a recursively enumerable intersection, this behavior is acceptable and does not affect the correctness of the construction.

5. Practice Exercises#

5.1#

Give an example of a language that is recursively enumerable but not recursive. Explain why it falls into this category.

5.2#

Prove that every recursive language is recursively enumerable, but not every recursively enumerable language is recursive.

5.3#

Let \(L_1\) and \(L_2\) be recursively enumerable languages. Which of the following must also be recursively enumerable? Justify your answer.

\(L_1 \cup L_2\)
\(L_1 \cap L_2\)
\(L_1 - L_2\)

5.4 Binary String Parity Checker (Recursive Language)#

Design a Turing machine that accepts binary strings with an even number of 1s. This is a recursive language because the machine should always halt and give a definitive answer.

Requirements:

Accept strings like “00”, “11”, “0011”, “1100”
Reject strings like “1”, “01”, “101”, “111”
The machine must always halt (no infinite loops)

5.5 Palindrome Recognizer (Recursively Enumerable)#

Create a Turing machine that recognizes palindromes over the alphabet {a, b}. The machine should accept palindromes and may loop on non-palindromes.

Requirements:

Accept palindromes like “a”, “b”, “aa”, “bb”, “aba”, “bab”, “abba”
May reject or loop on non-palindromes like “ab”, “abc”, “abab”

5.6 Union of Two Languages (Dovetailing)#

Implement the dovetailing technique to create a Turing machine that accepts the union of two recursively enumerable languages:

L1: Strings containing “aa”
L2: Strings starting with “b”

Requirements:

Use dovetailing to simulate both machines in parallel
Accept if either machine accepts
Demonstrate the union property

5.7 Complement Language Analysis#

Given a recursive language L = {strings over {a,b} that start with ‘a’}, create its complement and demonstrate that the complement is also recursive.

Requirements:

Implement the original language recognizer
Implement the complement language recognizer
Show both always halt (proving recursiveness)

5.8 Language Intersection#

Create a Turing machine that recognizes the intersection of two languages:

L1: Binary strings with even number of 1s
L2: Binary strings with even length

Requirements:

Implement both individual language recognizers
Create the intersection recognizer using sequential composition
Test with various inputs

6. Further Reading#

“Introduction to the Theory of Computation” by Michael Sipser, Chapter 3
“Introduction to Computer Theory” by Daniel I.A. Cohen, Chapter 23
“Automata Theory, Languages, and Computation” by Hopcroft, Motwani, and Ullman, Chapter 8

Turing Machine Languages

Contents

15. Turing Machine Languages#

Introduction#

Code: The TuringMachine Class — Shared Infrastructure#

1. Recursively Enumerable Languages#

1.1 Defintion#

1.2 TM Examples#

1.2.1 Example 1#

Code Example: Recursively Enumerable Language (the “contains aa” TM)#

1.2.2 Example 2#

Code Example: Recursively Enumerable Language (the “starts with b” TM)#

2. Recursive Language#

2.1 Defintion#

2.2 Examples#

2.3 Code: Recursive (Decidable) Language — the “starts with a” TM#

3. What Makes a Language Decidable or Recognizable?#

4. Key Properties and Features#

4.1 Dovetailing Construction#

4.1.1 What is Dovetailing#

4.1.2 How does it work#

4.2 Union of Recursively Enumerable Languages#

4.3 Complement of Recursively Enumerable Languages#

4.4 Complement of Recursive Languages#

4.5 Recursively Enumerable Complement Implies Recursiveness#

4.6 Intersection of Recursively Enumerable Languages#

5. Practice Exercises#

5.1#

5.2#

5.3#

5.4 Binary String Parity Checker (Recursive Language)#

5.5 Palindrome Recognizer (Recursively Enumerable)#

5.6 Union of Two Languages (Dovetailing)#

5.7 Complement Language Analysis#

5.8 Language Intersection#

6. Further Reading#

Code: The `TuringMachine` Class — Shared Infrastructure#