prep
Overview description of the prep work for the sprint
🧠 Why we use types
In real life, there are some impossible operations. Can you divide seven by yellow? Can you set fire to a sound? These don’t make sense. The same is true in programming.
Given these functions:
def half(value):
    return value / 2
def double(value):
    return value * 2
def second(value):
    return value[1]
Consider these blocks of code:
print(half(22))
print(half("hello"))
print(half("22"))
Is half("22") meant to return 11 (because the string should be converted to a number)? Or return 2 (because it’s the first half of the string)? Or error, because it doesn’t make sense?
What is half("hello") meant to do? It probably doesn’t make sense.
print(double(22))
print(double("hello"))
print(double("22"))
Does double("hello") make sense? If so, what do you expect it to return?
print(second(22))
print(second(0x16))
print(second("hello"))
print(second("22"))
How about second(22)? Should it treat 22 like a stringified version of the decimal representation of the number 22 and return 2? But 22 is the same number as 0x16. Should second(0x16) convert 0x16 to decimal before returning the second character? Or should it remember that the original number was written in hexadecimal and return 6?
Intent
The intent of these functions is probably that half and double are expected to operate on numbers, and second is expected to operate on strings (and/or maybe lists).
But Python lets us write all of these things. Some of them, like half("hello"), will error when they run, maybe breaking our program. Others, like double("22"), will succeed but in surprising ways, which may cause our program to give subtly incorrect results later on.
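Here is a minimal sketch of the first kind of failure: calling half with a string only fails at the moment the division actually runs, with a TypeError. The try/except is there purely so we can observe the error; it is not how you would normally handle this.

```python
def half(value):
    return value / 2

try:
    half("hello")
except TypeError as error:
    # Python only discovers the problem when this line runs:
    # it doesn't know how to divide a str by an int.
    print(f"half('hello') failed: {error}")

print(half(22))  # 11.0
```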
✍️exercise
Predict what double("22") will do. Then run the code and check. Did it do what you expected? Why did it return the value it did?
In a program as simple as the one above, it’s easy for us to run the program manually and see the errors (if we add enough logging). But as programs get bigger, these things get harder to spot.
This gets even harder when code is only sometimes executed. For instance, consider this Node.js program:
import process from "node:process";
import readline from "node:readline";

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
});

rl.question("What URL should we fetch?\n> ", async (url) => {
  const response = await fetch(url);
  if (!response.ok) {
    if (response.body.toLowerCase().includes("permission")) {
      console.error("You didn't have permission to get that URL");
    } else {
      console.error(`The request failed - body: ${response.body}`);
    }
    process.exit(1);
  }
  const body = await response.json();
  // TODO: Do something with the response.
  rl.close();
});
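There is a bug here. response.body is a ReadableStream, not a string (to read the body as a string we would need await response.text()). So if a user ever tries to fetch a URL which returns an error status code (anything outside the 200-299 range), our program will crash: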
% node fetch.js
What URL should we fetch?
> http://www.google.com/beepboop
file:///Users/dwh/tmp/jsplay/fetch.js:12
if (response.body.toLowerCase().includes("permission")) {
^
TypeError: response.body.toLowerCase is not a function
at file:///Users/dwh/tmp/jsplay/fetch.js:12:23
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
Node.js v22.11.0
The code we wrote was wrong. It could never have been correct. After a fetch, response.body.toLowerCase() never makes sense. Ideally we shouldn’t have needed to wait until running the code (in fact, running exactly that line of code with exactly that data) to find this out.
Types
This is where types come in.
Imagine if we could run some analysis over our code that told us “You’re calling double with a string, but double expects a number, you have a bug”. Or told us “You’re calling response.body.toLowerCase() but response.body is a ReadableStream, which doesn’t have a method toLowerCase, you have a bug”.
Now we wouldn’t need to keep running our program with lots of different inputs every time we change it. The type analysis could tell us “You have a bug here, you should fix it”, without having to run the program, and without having to think about different possible inputs.
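As a sketch of the information that analysis needs, here are half, double and second with type annotations recording the intent described earlier (numbers for half and double, strings for second). The annotations are our assumption about that intent. With them in place, a checker such as mypy should flag a call like double("hello") as passing the wrong type, without running anything. Python itself still ignores the annotations at runtime.

```python
def half(value: float) -> float:
    # Annotated to accept numbers. Passing an int is fine too:
    # type checkers treat int as compatible with float.
    return value / 2

def double(value: float) -> float:
    return value * 2

def second(value: str) -> str:
    # Annotated to accept strings only, so a type checker
    # would report second(22) as an error.
    return value[1]

print(half(22))
print(double(22))
print(second("hello"))
# double("hello")  # A type checker such as mypy would report this call.
```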
Support for type checking
Different languages have different levels of support for checking types.
Some languages, like Java, C++, Rust, and Go, require you to write what types you expect function parameters to have.
Other languages, like JavaScript and Python, don’t require this, but you can add this information and have it checked by tools like mypy (for Python) or JSDoc type annotations (for JavaScript).
Languages with optional type checking perform good checks when you do add this type information. If you don’t add type annotations to all of your code, they will perform fewer checks. Sometimes they will infer the correct types based on what you have annotated. Other times they will just ignore code with no annotations and not give you errors about it, even if it’s wrong.
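A small sketch of both behaviours, assuming mypy with its default settings. In the annotated function, mypy infers the type of the local variable words by itself. The body of the unannotated function is not checked at all by default (mypy’s --check-untyped-defs option changes that). The function names here are just for illustration.

```python
def count_words(text: str) -> int:
    # We never annotated `words`, but mypy infers it is a list of str
    # because it knows what str.split() returns.
    words = text.split()
    return len(words)

def count_words_unannotated(text):
    # No annotations: with default settings mypy skips this body,
    # so a typo like text.splt() here would only show up at runtime.
    words = text.split()
    return len(words)

print(count_words("divide seven by yellow"))  # 4
print(count_words_unannotated("set fire to a sound"))  # 5
```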
Limits of type checking
Types can be really useful for detecting bugs. But there are limits to what kind of bugs type checking can detect.
Take this code:
def double(number):
    return number * 3
print(double(10))
This code has a bug.
✍️exercise
Even though we’re calling double with the correct type, something is wrong. Either the name of double is wrong (it should be called triple), or what it’s doing is wrong (it should do * 2, not * 3).
Type checking can’t catch this bug. All of the types are correct. Not all bugs are type errors. But checking for type errors can get rid of a lot of bugs.
🔎 Type checking with mypy
Mypy is a tool which enables type checking in Python code.
Reading
✍️exercise
Do not run the following code.
This code contains bugs related to types. They are bugs mypy can catch.
Read this code to understand what it’s trying to do. Add type annotations to the function parameters and return types in this code. Run the code through mypy, and fix all of the bugs that show up. When you’re confident all of the type annotations are correct, and the bugs are fixed, run the code and check it works.
def open_account(balances, name, amount):
    balances[name] = amount
def sum_balances(accounts):
    total = 0
    for name, pence in accounts.items():
        print(f"{name} had balance {pence}")
        total += pence
    return total
def format_pence_as_string(total_pence):
    if total_pence < 100:
        return f"{total_pence}p"
    pounds = int(total_pence / 100)
    pence = total_pence % 100
    return f"£{pounds}.{pence:02d}"
balances = {
    "Sima": 700,
    "Linn": 545,
    "Georg": 831,
}
open_account("Tobi", 9.13)
open_account("Olya", "£7.13")
total_pence = sum_balances(balances)
total_string = format_pence_as_str(total_pence)
print(f"The bank accounts total {total_string}")
📝 Classes and objects
We’ve already seen that objects can group together related, named data. We can write:
imran = {
    "name": "Imran",
    "age": 22,
    "preferred_operating_system": "Ubuntu",
}
eliza = {
    "name": "Eliza",
    "age": 34,
    "preferred_operating_system": "Arch Linux",
}
This allows us to pass around the values of imran or eliza, and access all of the related information while we do.
We’ve also already seen that it is useful to know that you can’t call .lower() on the value 2.
It would be useful for a type checker to tell us if we try to access a property of an object that that object doesn’t have:
imran = {
    "name": "Imran",
    "age": 22,
    "preferred_operating_system": "Ubuntu",
}
print(imran["name"])
print(imran["address"])
This code doesn’t work, but mypy can’t tell us this. As far as it is concerned, a dictionary is a dictionary - it could contain any keys!
Instead, we can use a class.
💡Tip
The word object has a lot of uses.
In JavaScript, we don’t have a “dictionary” type, we call them objects. Sometimes these objects are just dictionaries - collections of key-value pairs. Other times they are instances of a specific class.
In general, people use the word object both to mean “collection of key-value pairs” and “instance of a class”. You often need to work out which they mean from context.
class Person:
    def __init__(self, name: str, age: int, preferred_operating_system: str):
        self.name = name
        self.age = age
        self.preferred_operating_system = preferred_operating_system
imran = Person("Imran", 22, "Ubuntu")
print(imran.name)
print(imran.address)
eliza = Person("Eliza", 34, "Arch Linux")
print(eliza.name)
print(eliza.address)
This code is saying: “There’s a category of object called Person. Every instance of Person has a name, an age, and a preferred_operating_system”. It then makes two instances of Person, and uses them.
The method called __init__ is called a constructor: it is what Python calls when we construct a new instance of the class.
Exercise
Save the above code to a file, and run it through mypy.
Read the error, and make sure you understand what it’s telling you.
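To see when the constructor runs, here is a sketch of the same Person class with an extra print in __init__ (the print is only there for illustration). Writing Person(...) creates a new instance and then calls __init__ on it, passing that new instance as self.

```python
class Person:
    def __init__(self, name: str, age: int, preferred_operating_system: str):
        # Runs automatically when we write Person(...); `self` is the new instance.
        print(f"Constructing a Person called {name}")
        self.name = name
        self.age = age
        self.preferred_operating_system = preferred_operating_system

imran = Person("Imran", 22, "Ubuntu")  # prints: Constructing a Person called Imran
eliza = Person("Eliza", 34, "Arch Linux")  # prints: Constructing a Person called Eliza
print(imran.age)  # 22
print(eliza.preferred_operating_system)  # Arch Linux
```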
You can use the names of classes in type annotations just like you can use types like str or int:
def is_adult(person: Person) -> bool:
    return person.age >= 18
print(is_adult(imran))
Exercise
Add the is_adult code to the file you saved earlier.
Run it through mypy - notice that no errors are reported. mypy understands that Person has a property named age, so is happy with the function.
Write a new function in the file that accepts a Person as a parameter and tries to access a property that doesn’t exist. Run it through mypy and check that it does report an error.
📝 Methods
We’ve seen that we can take instances of classes as function parameters:
def is_adult(person: Person) -> bool:
    return person.age >= 18
We’ve also seen types that have methods on them, e.g. "abc".upper(). This looks a bit different from functions we define ourselves (which may look like upper("abc")).
Methods are just like functions, but they are attached to a class.
We could rewrite our is_adult function as a method on Person:
class Person:
    def __init__(self, name: str, age: int, preferred_operating_system: str):
        self.name = name
        self.age = age
        self.preferred_operating_system = preferred_operating_system
    def is_adult(self) -> bool:
        return self.age >= 18
imran = Person("Imran", 22, "Ubuntu")
print(imran.is_adult())
This has a few advantages over free functions.
✍️exercise
Think of the advantages of using methods instead of free functions. Write them down in your notebook.
Some answers, to compare with after you’ve listed your own:
- Ease of documentation - it’s easier to find all of the things related to a string (or a Person) if they’re attached to that type.
- Encapsulation - if we change the implementation of Person (e.g. we start storing a date of birth instead of an age), it’s more obvious what things we need to change.
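One way to check that methods really are “just like functions, but attached to a class”: when we write imran.is_adult(), Python looks up is_adult on the Person class and passes imran in as self. A small sketch showing the two equivalent calls:

```python
class Person:
    def __init__(self, name: str, age: int, preferred_operating_system: str):
        self.name = name
        self.age = age
        self.preferred_operating_system = preferred_operating_system

    def is_adult(self) -> bool:
        return self.age >= 18

imran = Person("Imran", 22, "Ubuntu")

# Method-call syntax: Python passes `imran` as `self` for us.
print(imran.is_adult())  # True
# The same function, looked up on the class, with `imran` passed explicitly.
print(Person.is_adult(imran))  # True
```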
✍️exercise
Change the Person class to take a date of birth (using the standard library’s datetime.date class) and store it in a field instead of age.
Update the is_adult method to act the same as before.
📝 Dataclasses
We’ve seen that grouping together fields and methods into a class can help us encapsulate them. We can define a class whose purpose is just to group together related data, and provide access to it.
Our Person class is an example of this. We just store some data in it (and maybe add some methods that just read that data).
If a class is just a place to group related data, it is sometimes called a value object.
There are several methods we can implement on classes that have obvious implementations for value objects.
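Equality is one: ideally two value objects are equal if all of their fields are equal. But by default, Python compares instances of a class by identity (are they the very same object?), so two separately constructed objects with identical fields are not equal: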
class Person:
    def __init__(self, name: str, age: int, preferred_operating_system: str):
        self.name = name
        self.age = age
        self.preferred_operating_system = preferred_operating_system
imran = Person("Imran", 22, "Ubuntu")
imran2 = Person("Imran", 22, "Ubuntu")
print(imran == imran2) # Prints False
Similarly, it’s useful when we print a value object to see its type and fields. But this is not the case with objects by default:
class Person:
    def __init__(self, name: str, age: int, preferred_operating_system: str):
        self.name = name
        self.age = age
        self.preferred_operating_system = preferred_operating_system
imran = Person("Imran", 22, "Ubuntu")
print(imran) # Prints <__main__.Person object at 0x1048b5a90>
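We could write these methods by hand. This sketch adds an __eq__ that compares fields and a __repr__ that shows them; these are the standard Python hooks used by == and, when a class defines no __str__, by print. It works, but it is repetitive, and every new field means updating both methods.

```python
class Person:
    def __init__(self, name: str, age: int, preferred_operating_system: str):
        self.name = name
        self.age = age
        self.preferred_operating_system = preferred_operating_system

    def __eq__(self, other: object) -> bool:
        # Two Persons are equal if all of their fields are equal.
        if not isinstance(other, Person):
            return NotImplemented
        return (
            self.name == other.name
            and self.age == other.age
            and self.preferred_operating_system == other.preferred_operating_system
        )

    def __repr__(self) -> str:
        # print() falls back to __repr__ because this class defines no __str__.
        return (
            f"Person(name={self.name!r}, age={self.age!r}, "
            f"preferred_operating_system={self.preferred_operating_system!r})"
        )

imran = Person("Imran", 22, "Ubuntu")
imran2 = Person("Imran", 22, "Ubuntu")
print(imran == imran2)  # True
print(imran)  # Person(name='Imran', age=22, preferred_operating_system='Ubuntu')
```

One side effect worth knowing: defining __eq__ without also defining __hash__ makes instances unhashable, so these Person objects can no longer go in a set.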
Python has a useful decorator called dataclass which generates some of these methods for us. In fact, it even generates the constructor for us.
from dataclasses import dataclass
@dataclass(frozen=True)
class Person:
    name: str
    age: int
    preferred_operating_system: str
imran = Person("Imran", 22, "Ubuntu") # We can call this constructor - @dataclass generated it for us.
print(imran) # Prints Person(name='Imran', age=22, preferred_operating_system='Ubuntu')
imran2 = Person("Imran", 22, "Ubuntu")
print(imran == imran2) # Prints True
The dataclass decorator generated a constructor, a __repr__ method (which print and string formatting use, because the default __str__ falls back to __repr__), and a custom __eq__ method (which is called when comparing two values with ==). This saves us having to write all of that code.
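The example above also passes frozen=True. As a short sketch of what that adds: assigning to a field of a frozen dataclass raises dataclasses.FrozenInstanceError, and because the instances can’t change, the generated methods also include __hash__, so equal values can be used as set members or dict keys.

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class Person:
    name: str
    age: int
    preferred_operating_system: str

imran = Person("Imran", 22, "Ubuntu")

try:
    # Not allowed at runtime; mypy should also flag this assignment.
    imran.age = 23
except FrozenInstanceError:
    print("Can't change a frozen Person")

# Equal frozen instances hash the same, so the set keeps only one of them.
people = {imran, Person("Imran", 22, "Ubuntu")}
print(len(people))  # 1
```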
Other languages have a similar idea of a value type, and tools to help make them, such as Java’s record classes and C#’s structure types.
✍️exercise
Write a Person class using @dataclass which uses a datetime.date for date of birth, rather than an int for age.
Re-add the is_adult method to it.
🧠 Generics
Sometimes we want to reason about more complicated type relationships than “this field is a string”. Lists and dicts are examples of this. We may want to reason that every value in a list is a string.
Consider this code:
from dataclasses import dataclass
@dataclass(frozen=True)
class Person:
    name: str
    children: list
fatma = Person(name="Fatma", children=[])
aisha = Person(name="Aisha", children=[])
imran = Person(name="Imran", children=[fatma, aisha])
def print_family_tree(person: Person) -> None:
    print(person.name)
    for child in person.children:
        print(f"- {child.name} ({child.age})")
print_family_tree(imran)
There is a bug in this code. Can you spot it?
Run your code through mypy. Does mypy spot it?
In some languages, like Java, C#, Rust, or Go, type information is required: you can’t write code without it. This means those languages can do more checks, and give better error messages. We call these statically typed languages.
In other languages, like Python and JavaScript, type information is optional. Because of this, tools that check types are sometimes less strict. If they don’t know what type something has, they often skip the checks involving it.
That’s what’s happening here. Person.children is a list, but mypy doesn’t know what type of thing is in the list. It doesn’t even know that everything in the list has the same type: ["hello", 7, True] is a legal list in Python.
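A quick sketch of why this matters: with a bare list annotation, nothing stops us filling children with values that aren’t people. Python builds this object without complaint, and mypy has no element type to compare against either. The variable name odd_family is just for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str
    children: list  # A bare `list`: no information about what the elements are.

# Neither Python nor mypy objects to these "children", even though none are people.
odd_family = Person(name="Imran", children=["hello", 7, True])
print(odd_family.children)  # ['hello', 7, True]
```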
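We can use generics to tell mypy what type of thing a list contains. Here, List["Person"] says that every element of children is a Person:
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Person:
    name: str
    children: List["Person"]

fatma = Person(name="Fatma", children=[])
aisha = Person(name="Aisha", children=[])

imran = Person(name="Imran", children=[fatma, aisha])

def print_family_tree(person: Person) -> None:
    print(person.name)
    for child in person.children:
        print(f"- {child.name} ({child.age})")

print_family_tree(imran)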
Run this code through mypy.
Now that we’ve told mypy that Person.children is a list of type Person (the children annotation on line 7), it can identify that the child variable in the for loop on line 16 is of type Person. Because of this, it can tell us that child.age on line 17 doesn’t exist.
📝Note
Most generics don’t need the types to be quoted. Normally you’d just write List[Person]. But inside a type definition itself (i.e. inside the Person class), the Person type doesn’t exist yet, so we need to quote it.
It’s kind of annoying, but don’t worry about it too much.
✍️exercise
print on line 17 - we do want to print the children’s ages. (Feel free to invent the ages of Imran’s children.)
⌨️ Type-guided refactorings
Using classes and objects can help us to understand and maintain codebases, particularly as they grow.
We’ve already identified that using methods instead of free functions can help us to encapsulate information. If we change our class from storing age as an int to storing date of birth as a datetime.date, it’s easier to know what we’re likely to need to change.
Type checking can also help us with this. If you have some code which accesses imran.age, and we remove the age field, we can run mypy: it can tell us “Here are all of the places you also need to change your code”.
Take this file as an example. It is a program that works out what laptops could be allocated to what people based on their preferred operating system.
from dataclasses import dataclass
from typing import List
@dataclass(frozen=True)
class Person:
    name: str
    age: int
    preferred_operating_system: str
@dataclass(frozen=True)
class Laptop:
    id: int
    manufacturer: str
    model: str
    screen_size_in_inches: float
    operating_system: str
def find_possible_laptops(laptops: List[Laptop], person: Person) -> List[Laptop]:
    possible_laptops = []
    for laptop in laptops:
        if laptop.operating_system == person.preferred_operating_system:
            possible_laptops.append(laptop)
    return possible_laptops
people = [
    Person(name="Imran", age=22, preferred_operating_system="Ubuntu"),
    Person(name="Eliza", age=34, preferred_operating_system="Arch Linux"),
]
laptops = [
    Laptop(id=1, manufacturer="Dell", model="XPS", screen_size_in_inches=13, operating_system="Arch Linux"),
    Laptop(id=2, manufacturer="Dell", model="XPS", screen_size_in_inches=15, operating_system="Ubuntu"),
    Laptop(id=3, manufacturer="Dell", model="XPS", screen_size_in_inches=15, operating_system="ubuntu"),
    Laptop(id=4, manufacturer="Apple", model="macBook", screen_size_in_inches=13, operating_system="macOS"),
]
for person in people:
    possible_laptops = find_possible_laptops(laptops, person)
    print(f"Possible laptops for {person.name}: {possible_laptops}")
Let’s imagine we want to change our code. We don’t want to say “Every person has one preferred operating system” any more. We want to let people have a list of operating systems they prefer (in order). So we could say “Imran prefers Ubuntu most of all, and then Arch Linux, but will not use macOS”.
✍️exercise
Try changing the type annotation of Person.preferred_operating_system from str to List[str].
Run mypy on the code.
It tells us different places that our code is now wrong, because we’re passing values of the wrong type.
We probably also want to rename our field - lists are plural. Rename the field to preferred_operating_systems.
Run mypy again.
Fix all of the places that mypy tells you need changing. Make sure the program works as you’d expect.
The bigger (and more complicated) our codebase is, the more useful it is that mypy tells us what code needs changing. This is even more useful when we start working with code we didn’t write ourselves, or that we wrote long ago. Instead of needing to read all of the code and search around to try to work out where we need to change an age to date_of_birth, or a preferred_operating_system to a preferred_operating_systems (and maybe change from an == check to an in check), mypy can just tell us “here are all of the places that are wrong”.
🗄️ Enums
In the laptops example, we were using strings to store operating systems. Using strings is often problematic because they can take lots of different values. When we have a known set of possible values it is useful to ensure only those values can occur.
Some common problems with strings:
- Case sensitivity - are "macOS" and "MacOS" the same? Should they be?
- Spaces - are "ArchLinux" and "Arch Linux" the same? Should they be?
- Normalised values - are "Arch Linux" and "Arch" the same? Should they be?
- Typos - is "Arc Linux" meant to be "Arch Linux"? Or is it a separate operating system?
In fact, in the previous example, the laptop with id 3 never matched anyone’s preferred operating system, because its operating system was spelled ubuntu, not Ubuntu.
We can use enums to represent that only some values are allowed, and make sure we’re always using the same ones. This is similar to how in HTML we can use an <input type="number"> instead of an <input type="text"> to restrict what a user can enter into a form.
In Python, we can define an enum as a new type. This is like bool - bool is a type which has two possible values (True and False). We can make enums that have any number of possible values, and we can choose the values’ names.
from enum import Enum
class OperatingSystem(Enum):
    MACOS = "macOS"
    ARCH = "Arch Linux"
    UBUNTU = "Ubuntu"
This defines a new type called OperatingSystem which has three possible values - MACOS, ARCH, and UBUNTU. We can use this type in a type annotation to make sure that we’re only passed one of these values. If someone makes a typo in one of these values, mypy will catch it and tell us that OperatingSystem.UBUNT or OperatingSystem.macOS or OperatingSystem.NIX doesn’t exist.
from dataclasses import dataclass
from enum import Enum
from typing import List
class OperatingSystem(Enum):
    MACOS = "macOS"
    ARCH = "Arch Linux"
    UBUNTU = "Ubuntu"
@dataclass(frozen=True)
class Person:
    name: str
    age: int
    preferred_operating_system: OperatingSystem
@dataclass(frozen=True)
class Laptop:
    id: int
    manufacturer: str
    model: str
    screen_size_in_inches: float
    operating_system: OperatingSystem
def find_possible_laptops(laptops: List[Laptop], person: Person) -> List[Laptop]:
    possible_laptops = []
    for laptop in laptops:
        if laptop.operating_system == person.preferred_operating_system:
            possible_laptops.append(laptop)
    return possible_laptops
people = [
    Person(name="Imran", age=22, preferred_operating_system=OperatingSystem.UBUNTU),
    Person(name="Eliza", age=34, preferred_operating_system=OperatingSystem.ARCH),
]
laptops = [
    Laptop(id=1, manufacturer="Dell", model="XPS", screen_size_in_inches=13, operating_system=OperatingSystem.ARCH),
    Laptop(id=2, manufacturer="Dell", model="XPS", screen_size_in_inches=15, operating_system=OperatingSystem.UBUNTU),
    Laptop(id=3, manufacturer="Dell", model="XPS", screen_size_in_inches=15, operating_system=OperatingSystem.UBUNTU),
    Laptop(id=4, manufacturer="Apple", model="macBook", screen_size_in_inches=13, operating_system=OperatingSystem.MACOS),
]
for person in people:
    possible_laptops = find_possible_laptops(laptops, person)
    print(f"Possible laptops for {person.name}: {possible_laptops}")
We know that when we save data, transfer it across a network, or take user input, everything comes in as bytes. A typical pattern in software is to accept a string in the user input, and convert it to an enum before passing it into any other function. If the string wasn’t a valid operating system we know about, we will reject it and give an error when we first accept it. All of our other functions can take an OperatingSystem as a parameter, and know that any value it’s given must be a valid operating system. This restricts where we need to worry about incorrect input - once we’ve checked that the string was correct one time, the rest of our code doesn’t have to worry about incorrect strings.
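A minimal sketch of that pattern, reusing the OperatingSystem enum from above. OperatingSystem("Ubuntu") looks up a member by its value and raises ValueError for an unknown value. The helper name parse_operating_system is our own hypothetical choice; only this function deals with raw strings, and everything after it can rely on having a real OperatingSystem.

```python
from enum import Enum

class OperatingSystem(Enum):
    MACOS = "macOS"
    ARCH = "Arch Linux"
    UBUNTU = "Ubuntu"

def parse_operating_system(raw: str) -> OperatingSystem:
    # Hypothetical helper: convert untrusted text into an OperatingSystem once,
    # at the edge of the program. Surrounding whitespace is stripped first.
    try:
        return OperatingSystem(raw.strip())
    except ValueError:
        valid = ", ".join(system.value for system in OperatingSystem)
        raise ValueError(f"Unknown operating system {raw!r}; expected one of: {valid}") from None

print(parse_operating_system("Ubuntu"))  # OperatingSystem.UBUNTU
print(OperatingSystem.UBUNTU.value)  # Ubuntu
```

The check is deliberately strict: "ubuntu" is rejected rather than guessed at, which is exactly the case that silently lost laptop 3 in the string-based version.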
✍️exercise
Write a program which:
- Already has a list of Laptop objects that a library has to lend out.
- Accepts user input to create a new Person - it should use the input function to read a person’s name, age, and preferred operating system.
- Tells the user how many laptops the library has that have that operating system.
- If there is an operating system that has more laptops available, tells the user that if they’re willing to accept that operating system they’re more likely to get a laptop.
You should convert the age and preferred operating system input from the user into more constrained types as quickly as possible, and should output errors to stderr and terminate the program with a non-zero exit code if the user enters bad values.
🌳 Inheritance
Classes can extend other classes to share most of their functionality but add or replace some of it.
Read the following code:
from typing import Iterable, Optional
class ImmutableNumberList:
    # We accept any `Iterable[int]` here, so can construct with a list, a set, or anything else that can be iterated.
    def __init__(self, elements: Iterable[int]):
        # We copy the elements so that if someone mutates the passed in elements list, our copy won't be mutated.
        self.elements = [element for element in elements]
    def first(self) -> Optional[int]:
        if not self.elements:
            return None
        return self.elements[0]
    def last(self) -> Optional[int]:
        if not self.elements:
            return None
        return self.elements[-1]
    def length(self) -> int:
        return len(self.elements)
    def largest(self) -> Optional[int]:
        # To find the largest element, we need to go through the entire list (which may take some time).
        if not self.elements:
            return None
        largest = self.elements[0]
        for element in self.elements:
            if element > largest:
                largest = element
        return largest
# A SortedImmutableNumberList is the same as an ImmutableNumberList,
# but it changes some aspects.
class SortedImmutableNumberList(ImmutableNumberList):
    def __init__(self, elements: Iterable[int]):
        # We do extra work here when constructing the list,
        # to make sure the elements are sorted.
        # This takes more time than the ImmutableNumberList version would.
        super().__init__(sorted(elements))
    # This method overrides (replaces) the method with the same name on the super-class.
    def largest(self) -> Optional[int]:
        # Because we know the elements were already sorted in the constructor,
        # we can implement finding the largest number faster.
        # We don't need to look through every element - we know the largest element is at the end.
        # Because we did extra work one time before (in the constructor),
        # we can avoid re-doing that work every time someone calls `largest()`.
        return self.last()
    def max_gap_between_values(self) -> Optional[int]:
        if not self.elements:
            return None
        previous_element = None
        max_gap = -1
        for element in self.elements:
            if previous_element is not None:
                gap = element - previous_element
                if gap > max_gap:
                    max_gap = gap
            previous_element = element
        return max_gap
values = SortedImmutableNumberList([1, 19, 7, 13, 4])
print(values.largest())
print(values.max_gap_between_values())
unsorted_values = ImmutableNumberList([1, 19, 7, 13, 4])
print(unsorted_values.largest())
print(unsorted_values.max_gap_between_values()) # This doesn't work - the superclass doesn't define this method.
We have two classes that behave mostly the same. They both have a constructor, and four methods (first, last, largest, length). SortedImmutableNumberList also has an extra method, max_gap_between_values, which ImmutableNumberList does not have.
The largest implementation is different for the two classes, and they have different trade-offs. If we will frequently need to get the largest value from the list, ImmutableNumberList is going to be slower, because it looks through every element every time it needs to find the largest value. If instead we will create lots of lists but rarely ask for their largest value, ImmutableNumberList is going to be faster, because it does less work in the constructor (it doesn’t sort the elements).
Reading
Programmers used to use inheritance a lot. Over time, many people have come to prefer composition over inheritance.
Have a read of this article describing the differences between composition and inheritance and this article exploring when each makes sense.
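As a rough sketch of the composition alternative: instead of extending ImmutableNumberList, a sorted list class can hold one inside it and forward only the calls it wants to offer. SortedNumberList is a hypothetical name, and ImmutableNumberList here is a cut-down copy of the class above so the sketch runs on its own.

```python
from typing import Iterable, Optional

class ImmutableNumberList:
    # A cut-down version of the class above, with just the methods this sketch needs.
    def __init__(self, elements: Iterable[int]):
        self.elements = [element for element in elements]

    def last(self) -> Optional[int]:
        if not self.elements:
            return None
        return self.elements[-1]

    def length(self) -> int:
        return len(self.elements)

class SortedNumberList:
    # Composition: a SortedNumberList *has* an ImmutableNumberList inside it,
    # rather than *being* one (no inheritance).
    def __init__(self, elements: Iterable[int]):
        self._numbers = ImmutableNumberList(sorted(elements))

    def length(self) -> int:
        # Forward ("delegate") the call to the wrapped object.
        return self._numbers.length()

    def largest(self) -> Optional[int]:
        # Sorted on construction, so the largest element is the last one.
        return self._numbers.last()

values = SortedNumberList([1, 19, 7, 13, 4])
print(values.largest())  # 19
print(values.length())  # 5
```

The trade-off: each method we want to expose has to be forwarded by hand (like length above), but SortedNumberList only offers what it deliberately chooses to, so adding methods to ImmutableNumberList can’t accidentally change it.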
✍️exercise
Play computer with this code. Predict what you expect each line will do. Then run the code and check your predictions. (If any lines cause errors, you may need to comment them out to check later lines).
from typing import List
class Parent:
    def __init__(self, first_name: str, last_name: str):
        self.first_name = first_name
        self.last_name = last_name
    def get_name(self) -> str:
        return f"{self.first_name} {self.last_name}"
class Child(Parent):
    def __init__(self, first_name: str, last_name: str):
        super().__init__(first_name, last_name)
        self.previous_last_names: List[str] = []
    def change_last_name(self, last_name: str) -> None:
        self.previous_last_names.append(self.last_name)
        self.last_name = last_name
    def get_full_name(self) -> str:
        suffix = ""
        if len(self.previous_last_names) > 0:
            suffix = f" (née {self.previous_last_names[0]})"
        return f"{self.first_name} {self.last_name}{suffix}"
person1 = Child("Elizaveta", "Alekseeva")
print(person1.get_name())
print(person1.get_full_name())
person1.change_last_name("Tyurina")
print(person1.get_name())
print(person1.get_full_name())
person2 = Parent("Elizaveta", "Alekseeva")
print(person2.get_name())
print(person2.get_full_name())
person2.change_last_name("Tyurina")
print(person2.get_name())
print(person2.get_full_name())