Set Theory

1. Set Theory#

Before we discuss languages and computability, we’ll want some foundational understanding of set theory, which we’ll use frequently in the rest of this book.

After completing this chapter you will:

Know the definition of a set.
Understand set containment and equality.
Understand the basic set operations like union and intersection.

1. Basic Definitions#

The fundamental object in mathematics - that serves as the basis for all other mathematical structures and indeed all of mathematics itself - is the set.

Definition - A set is an unordered collection of distinct objects, called elements or members. A set is said to contain its elements. We write \(a \in A\) to denote that \(a\) is an element of the set \(A\). The notation \(a \notin A\) denotes that \(a\) is not an element of the set \(A\).

Note - The elements within a set must be distinct, and so we can’t have repeated elements. So, the collection of numbers \(\{1,2,3,4,5\}\) would be a set, but the collection \(\{1,1,2,3,5\}\) would not be, as \(1\) appears twice.

Some examples of sets include:

The set of all vowels in the English alphabet: \(\{a,e,i,o,u\}\).
The set of all living former U.S. Presidents: \(\{Clinton,Bush,Obama,Biden\}\)
The set of all positive integers less than \(5\): \(\{1,2,3,4\}\).
The set of all integers, typically written \(\mathbb{Z}\).
The set of all rational numbers, typically written \(\mathbb{Q}\).
The set of all real numbers, typically written \(\mathbb{R}\).
The set of all complex numbers, typcially written \(\mathbb{C}\).

One of the fundamental data structures in Python is the set. To specify a set we can use curly braces or the \(set()\) constructor. The empty set must be created with \(set()\), not \(\{\}\).

# Using curly braces
my_set = {1, 2, 3}

# Using the set() constructor
another_set = set([1, 2, 2, 3, 4])  # duplicates removed
print(another_set)  # {1, 2, 3, 4}

# Empty set must be created with set(), not {}
empty_set = set()

{1, 2, 3, 4}

We can add or remove elements from a set using the add and remove methods. The remove method will throw an error if you ask to remove an element that isn’t in the set. The discard method works just like remove, except it doesn’t throw an error in that case.

s = {1, 2, 3}

# Add an element
s.add(4)  # {1, 2, 3, 4}
s

{1, 2, 3, 4}

# Remove an element
s.remove(2)  # KeyError if 2 not found
s

{1, 3, 4}

If we run that remove method again, we get:

# Remove an element
s.remove(2)  # KeyError if 2 not found
s

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[4], line 2
      1 # Remove an element
----> 2 s.remove(2)  # KeyError if 2 not found
      3 s

KeyError: 2

Because \(2\) is no longer in the set. However, if instead we use the \(discard\) method, it works the same, except without the error.

s = {1, 2, 3}

# Discard an element
s.discard(2) # No error if 2 not found
s

# Discard an element
s.discard(2) # No error if 2 not found
s

2. Containment, Equality, and Cardinality#

We say that two sets are equal if and only if they have the same elements. We write \(A = B\) if the set \(A\) equals the set \(B\). A set \(A\) is a subset of \(B\) if every element of \(A\) is also an element of \(B\), and we write this as \(A \subseteq B\). A set \(A\) is a superset of \(B\) if every element of \(B\) is also an element of \(A\), and we write this as \(A \supseteq B\). Note that if \(A\) is both a subset and superset of \(B\), then \(A = B\). Sometimes we want to denote that \(A\) is a proper subset of \(B\), which means \(A\) is a subset of \(B\) and \(A \neq B\). In this case, we write it as \(A \subset B\).

Examples - The set of all positive integers less than \(5\) is a subset of the integers, \(\{1,2,3,4\} \subseteq \mathbb{Z}\). The set of integers is a subset of the rational numbers, \(\mathbb{Z} \subseteq \mathbb{Q}\), the set of rational numbers is a subset of the real numbers, \(\mathbb{Q} \subseteq \mathbb{R}\), and the set of real numbers is a subset of the complex numbers, \(\mathbb{R} \subseteq \mathbb{C}\).

It’s possible for a set to contain other sets. For example, the set of all sets of distinct positive numbers that sum to \(5\) would be \(\{\{5\},\{4,1\},\{3,2\}\}\). It’s also possible for a set to contain nothing. This is called the empty set, denoted \(\emptyset\).

The number of elements in a set is called the cardinality of a set. If the set is finite this is straightforward. However, if the set is infinite the question is quite subtle - there are actually different degrees of infinity! In particular, regarding our examples above, the cardinality of the integers is equal to the cardinality of the rational numbers, and the cardinality of the real numbers is equal to the cardinality of the complex numbers, but the cardinality of the real numbers is greater than the cardinality of the rational numbers! We won’t concern ourselves with the different infinite cardinals much in this book, but they will come up (in disguise) when we discuss decidability in Turing Machines.

Examples -

The cardinality of the set of vowels is \(5\).
The cardinality of the set of U.S. states is \(50\).
The cardinality of the set of integers is infinite, or more precisely countably infinite.

The powerset of a set is the set of all its subsets. So, for example, the subsets of \(A = \{1,2,3\}\) are \(\emptyset\), \(\{1\}\), \(\{2\}\), \(\{3\}\), \(\{1,2\}\), \(\{1,3\}\), \(\{2,3\}\), \(\{1,2,3\}\), and so its powerset, written \(2^{A}\), is \(\{\emptyset,\{1\},\{2\},\{3\},\{1,2\},\{1,3\},\{2,3\},\{1,2,3\}\}\). The notation is based on the fact that if the cardinality of \(A\), \(|A|\), is finite, then the cardinality of its powerset is \(2^{|A|}\). In particular, we note that for our example \(|A| = 3\), while \(|2^{A}| = 8\).

In Python, we can check if an element is in a set with the in operator. This returns a boolean (True of False)

s = {1, 2, 3}
1 in s

4 in s

We can also check if a set is a subset of another with the issubset method.

t = {'a','b','c'}
u = {'a','b'}
u.issubset(t)

t.issubset(u)

We can use the “==” operator to check for equality of two sets.

s = {1,2,3}
t = {3,2,1}
u = {1,2,4}

print(s == t)

Note that the two sets \(s\) and \(t\) are equal, even though we specified their elements in a different order. In a set, order doesn’t matter.

print(s == u)

We can get the cardinality of a set with the \(len()\) function:

len(s)

len({1,3,5,7})

Note - All sets you’ll use in Python are finite.

Python has no built in powerset method. In fact, in Python a set object cannot be an element of another set. The technical reason for this is set objects are mutable and therefore not hashable, but getting into what that means, why it presents a limitation, and how to work around it would take us too deep into the specifics of Python’s implementation and too far away from this course.

3. The Cartesian Product#

Definition - The Cartesian product of sets \(A\) and \(B\), denoted \(A \times B\), is the set of all ordered pairs \((a,b)\), where \(a \in A\) and \(b \in B\).

\[A \times B = \{(a,b)| a \in A \land b \in B\}\]

Note - The operator \(\land\) is the “and” operator. So, the statement “\(a \in A \land b \in B\)” means \(a\) in \(A\) and \(b\) in \(B\).

For example, the 2-dimension plane, \(\mathbb{R}^{2}\), is the Cartesian product of the real line with itself \(\mathbb{R} \times \mathbb{R}\).

Example - If \(A = \{1,2\}\) and \(B = \{a,b,c\}\), then the Cartesian product of the two sets is

\[ A \times B = \{(1,a),(2,a),(1,b),(2,b),(1,c),(2,c)\} \]

The cardinality of the Cartesian product of \(A\) and \(B\) will be \(|A||B|\).

Example - \(|A \times B|\), the cardinality of \(A \times B\), is \(8\), which is \(|A||B| = 2 \times 4\).

Python has no built in function for generating the Cartesian product of two sets, but we can easily write one:

def cartesian_product(A: set, B: set) -> set[tuple]:
    """
    Return the Cartesian product A × B as a set of (a, b) tuples.
    Example: cartesian_product({1,2}, {'x'}) -> {(1, 'x'), (2, 'x')}
    """
    return {(a, b) for a in A for b in B}

A = {1,2}
B = {'a','b','c'}
cartesian_product(A,B)

4. Set Operations#

Definitions - Let \(A\) and \(B\) be sets.

The union of the sets \(A\) and \(B\), denoted \(A \cup B\), is the set that contains those elements in either \(A\), or \(B\), or both.

The intersection of the sets \(A\) and \(B\), denoted \(A \cap B\), is the set containing those elements in both \(A\) and \(B\).

Two sets are called disjoint if their intersection is the empty set.

The difference of sets \(A\) and \(B\), denoted \(A - B\), is the set of all elements in \(A\) that are not in \(B\).

Examples - The union of the sets \(A = \{1,2,3\}\) and \(B = \{1,3,5\}\) is \(A \cup B = \{1,2,3,5\}\). Their intersection is \(A \cap B = \{1,3\}\). Their difference is \(A - B = \{2\}\).

The union operator in Python is “|”. The intersection operator is “&”. The difference operator is “-“.

A = {1,2,3}
B = {1,3,5};

Union

A|B

Intersection

A&B

Difference

A-B

Sometimes, sets are defined in relation to a larger universal set that contains all elements that can appear in sets. For example, you could define sets as being subsets of the real numbers, where the real number line is the universal set. In this context, it makes sense to talk about the complement of a set.

Definition - Let \(U\) be a defined universal set. The complement of the set \(A\), denoted by \(\overline{A}\), is the complement of \(A\) with respect to \(U\). It is the set of all elements in \(U\) not in \(A\).

Example - The complement of the positive integers within the set of all integers, \(\mathbb{Z}\), is the set of non-negative integers.

If we take the union of many sets \(A_{1},A_{2},\ldots,A_{n}\) we can write it as:

\[\displaystyle A_{1} \cup A_{2} \cup \cdots \cup A_{n} = \bigcup_{i = 1}^{n}A_{i}\]

If we take the intersection of many sets \(A_{1},A_{2},\ldots,A_{n}\) we can write it as:

\[\displaystyle A_{1} \cap A_{2} \cap \cdots \cap A_{n} = \bigcap_{i = 1}^{n}A_{i}\]

5. Naive Set Theory#

We’ve played a bit loose with our definition of a set, and in particular we never made precise what we meant by an “object” that can go in a set. Treating set theory like this is called “naive set theory”, and if you go looking for problems with this naive approach, you can find them. In particular, you can run into contradictions like the set of all sets that do not contain themselves. We won’t deal with these issues in this class, but they are serious, and the formal foundations of set theory require a more precise set of fundamental definitions and specifications to fix them.

This contradictory notion of the set of all sets that do not contain themselves was discovered and popularized by Bertrand Russell. A great book about Russell is the graphic novel Logicomix.

6. Practice Exercises#

Exercise 1 - What is the Cartesian product of the sets \(A = \{1,2,5\}\) and \(B = \{x,y,z\}\)? What is its cardinality?

Exercise 2 - What is are the union and intersection of the sets \(A = \{a,b,c\}\) and \(B = \{a,e,i,o,u\}\).

Exercise 3 - What is the powerset of the set \(P = \{1,3,7\}\).

7. References and Further Reading#

Introduction to the Theory of Computation (Third Edition) by Michael Sipser

Section 0.2 - Mathematical Notions and Terminology
Naive Set Theory by Paul Halmos.
Logicomix