Copyright 2020 Doulos Ltd
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
This document is about recommended coding styles and idioms that are specific to Python. Many general recommendations about coding style apply equally to Python and to other languages (e.g. use meaningful variable names, keep most functions short, insert blank lines between functions and between groups of related statements), but these are not included in this document.
In what follows, recommended Python coding styles and idioms are tagged as GOOD, and counter-examples are tagged as BAD. The counter-examples are not necessarily bad in themselves, but each has been chosen to contrast with a particular GOOD example, just to make a specific point. Some of the points made are mere conventions that reflect common practice in the Python community. Many of the points made are not black-and-white, but require judgement and interpretation. Sometimes you may need to be a bit forgiving because it is not always possible to capture the full spirit of a coding guideline in a short example. Often you will need to "close your eyes" to certain aspects of an example, because the example has been contrived to illustrate a specific point. For example, many of the variable and function names below are single letters rather than meaningful words (because they have no meaning in these contrived examples!). Please don't get hung up or dogmatic about any of this, but try to understand the spirit!
There is an executable version of these coding guidelines available on Google Colab, allowing you to run every Python code fragment from this document, online, without having to install Python on your computer.
Start from the design principles of the Python language, as outlined in the Zen of Python:
import this
Always use 4 spaces for indentation, as opposed to 1 space, or 2 spaces, or variable spacing, or tabs. Avoid using tabs for indentation because they are sometimes displayed differently in different editors, which can cause confusion and waste time, especially when editors convert spaces to tabs or vice-versa.
GOOD
def f():
    if True:
        a = 0
        while False:
            b = 1
        c = 2
    else:
        d = 3
f()
BAD
def f():
 if True:
   a = 0
   while False:
          b = 1
   c = 2
 else:
       d = 3
f()
Put one space either side of the assignment symbol and either side of operators:
GOOD
a = 2
if a == 2:
    b = 2 + 2
BAD
a=2
if a==2:
    b= 2+2
Put one space immediately after , and : in all lists, argument lists, and dictionaries:
GOOD
x = [4, 5, 6]
def f(a, b, c):
    return {1: a, 2: b, 3: c}
f(x[1], x[0], x[2])
BAD
x = [4,5,6]
def f(a,b,c):
    return {1:a,2:b,3:c}
f(x[1],x[0],x[2])
But do not put white space immediately inside parentheses, brackets, or braces:
GOOD
x = (2 + (3 * 4))
y = {'a': 1, 'b': 2, 'c': [3, 4]}
BAD
x = ( 2 + ( 3 * 4 ) )
y = { 'a': 1, 'b': 2, 'c': [ 3, 4 ] }
Although white space should be put around = in general, do not put white space around = when setting default values for arguments or when passing keyword arguments:
GOOD
def f(arg1, arg2=0, arg3=0):
    return arg1 + arg2 + arg3
f(1, arg3=3)
BAD
def f(arg1, arg2 = 0, arg3 = 0):
    return arg1 + arg2 + arg3
f(1, arg3 = 3)
In long expressions, consider putting white space only around lower priority operators (or use parentheses to group operations):
GOOD
a = b = c = x = y = 1
a, b, c, x, y = [1] * 5
y = a*x + b*y + c
y
Do not use ; to put more than one statement on the same line:
GOOD
a = 1
b = a + 2
BAD
a = 1; b = a + 2
Avoid using the line continuation character \. Instead, take advantage of the rule that allows line breaks within parentheses. Where line breaks are necessary within long expressions, prefer to insert line breaks immediately before operators.
GOOD
def f(
        a_very_very_long_variable_name,
        another_very_very_long_variable_name,
        yet_another_very_very_long_variable_name):
    x = (a_very_very_long_variable_name
         + another_very_very_long_variable_name
         + yet_another_very_very_long_variable_name)
    return x
f(1, 1, 1)
BAD
def f( \
        a_very_very_long_variable_name, \
        another_very_very_long_variable_name, \
        yet_another_very_very_long_variable_name):
    x = a_very_very_long_variable_name \
        + another_very_very_long_variable_name \
        + yet_another_very_very_long_variable_name
    return x
f(1, 1, 1)
In control statements, always follow : with a line break. Avoid writing statements on the same line as :
condition = True
GOOD
def f():
    print('Hello')
if condition:
    f()
BAD
def f(): print('Hello')
if condition: f()
Use the variable name _ where the value of a variable is not being used:
GOOD
for i in range(8):
    print(i, end='')
for _ in range(8): # Value of loop variable is unused
    print('~', end='')
def f():
    return (1, 2)
v, _ = f() # Second value of tuple is unused
v
Avoid reading the value of variable _:
BAD
print(_)
Use snake_case for variable, function, and attribute names.
Use CamelCase for class names.
Consider using the __ prefix for attributes that are private to a class, particularly where the same attribute name is used in a sub-class:
GOOD
class MyClass: # Camel case
    def __init__(self, data1):
        self.__data = data1 # __ prefix
    def get(self):
        return self.__data
class MySubClass(MyClass): # Camel case
    def __init__(self, data1, data2):
        super().__init__(data1)
        self.__data = data2 # __ prefix
    def get(self):
        return self.__data
my_object = MySubClass(1, 2) # Snake case
assert my_object.get() == 2
assert MyClass.get(my_object) == 1
try:
    my_object.__data # __ prefix mangles the name so it is not visible outside the class
except AttributeError as details:
    print(details)
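The name mangling behind the __ prefix can be made visible by listing the attributes that an instance actually stores. The sketch below uses hypothetical class names Base and Derived (they are not part of the example above). Each class's __data is stored under a name of the form _ClassName__data, which is why the two attributes do not clash.

```python
class Base:
    def __init__(self):
        self.__data = 1       # Stored on the instance as _Base__data


class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__data = 2       # Stored on the instance as _Derived__data


obj = Derived()
attribute_names = sorted(vars(obj))   # Names actually held by the instance
print(attribute_names)
```

Note that the mangled names remain reachable from outside the class, so the __ prefix protects against accidental clashes rather than enforcing privacy.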
Use keyword arguments (as opposed to positional arguments) where they help to improve readability:
def add_and_print(a, b, quiet=False, base8=False, base16=False):
    assert not (base8 and base16)
    if not quiet:
        aplusb = a + b
        if base8:
            print(oct(aplusb))
        elif base16:
            print(hex(aplusb))
        else:
            print(aplusb)
GOOD
add_and_print(2, 2, base16=True)
BAD
add_and_print(2, 2, False, False, True)
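One way to enforce this style, offered here as a sketch rather than as part of the guideline itself, is to make the optional flags keyword-only by placing a bare * in the parameter list. The function name add_and_format is hypothetical, and it returns a string instead of printing so that the result is easy to check.

```python
def add_and_format(a, b, *, base8=False, base16=False):
    # Parameters after the bare * can only be passed by keyword
    assert not (base8 and base16)
    total = a + b
    if base8:
        return oct(total)
    if base16:
        return hex(total)
    return str(total)


result = add_and_format(2, 2, base16=True)

try:
    add_and_format(2, 2, False, True)   # Positional flags are rejected
except TypeError as details:
    rejected = True
    print(details)
else:
    rejected = False
```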
Avoid wildcard imports, and generally prefer not to import individual names from modules. Instead, import the whole module, and prefix the required function or variable or class name with the module name:
GOOD
import math
math.pi, math.sin(math.pi)
NOT ENCOURAGED
from math import pi
from math import sin
pi, sin(pi)
BAD
from math import *
pi, sin(pi)
While importing individual variables or functions is discouraged, it is fine to import individual modules from packages:
GOOD
from matplotlib import pyplot
pyplot.show()
Only use import X as Y where Y is a standard or well-known abbreviation for X:
GOOD
import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.array([1, 2, 3]), np.array([30, 0, 20]))
plt.show()
BAD
import numpy as n
import matplotlib.pyplot as p
p.plot(n.array([1, 2, 3]), n.array([30, 0, 20]))
p.show()
Take advantage of lists, tuples, dictionaries, iterators, and generators. They are very convenient, very Pythonic, and have many "batteries included".
GOOD
Initialize three variables with one assignment:
a, b, c = 1, 2, 3
Swap the values of a and b with one assignment:
a, b = b, a
An example of iterator unpacking:
a, *b, c = range(1, 6)
a, b, c
To initialize several variables to the same value, take advantage of the fact that you can chain the assignment operator. You don't even need a tuple:
GOOD
a = b = c = 0
To create a fixed-size list populated with a default value, use the * operator:
GOOD
N = 8
list1 = [0] * N
list1
list2 = [None] * N
list2
BAD
list1 = [0 for _ in range(N)]
list2 = [None for _ in range(N)]
This is only bad in the sense that using the * operator to build a list is such a neat trick that avoids the need for a for loop altogether. List comprehensions are good ...
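One caveat is worth keeping in mind: the * operator repeats references to the same object. That is harmless for immutable values such as 0 or None, but with a mutable element (for example an empty list) every slot refers to one shared object. A minimal sketch of the difference, using hypothetical variable names:

```python
N = 3

shared = [[]] * N                      # N references to the same inner list
shared[0].append(1)                    # Appears to change every element

independent = [[] for _ in range(N)]   # N distinct inner lists
independent[0].append(1)               # Changes only the first element

print(shared)
print(independent)
```

So for lists of mutable objects, the list comprehension is the appropriate tool.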
Use list comprehensions in preference to for loops to populate lists, but only where the list comprehension is easy to understand. A for loop may still be preferred for more complicated or obscure cases.
GOOD
my_list = [i * i for i in range(5)]
my_list
BAD
my_list = []
for i in range(5):
    my_list.append(i * i)
my_list
This is only bad in the sense that a list comprehension is a tidier way to achieve the same thing. There is nothing wrong with using a loop to append to an empty list.
Use generators or generator expressions to generate a series of objects in cases where the overhead of building a fully-populated list in memory is not necessary. That is, in cases where a series of objects are ultimately consumed one-by-one.
GOOD
my_generator = (n * n for n in range(5))
my_generator
for i in my_generator:
    print(i, end=', ')
BAD
my_list = [i * i for i in range(5)]
my_list
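As a small sketch of consuming values one by one, a generator expression can be passed straight to a function such as sum, so no intermediate list is built. The variable names are illustrative only. The example also shows that a generator can be consumed only once.

```python
squares = (n * n for n in range(5))

total = sum(squares)       # Consumes the generator one item at a time
leftover = list(squares)   # The generator is now exhausted

print(total, leftover)
```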
Take advantage of functional programming features such as generators, lambda, map, filter, and zip (perhaps to avoid the need to construct lists in memory), but only where the code is easy to understand.
GOOD
my_generator = (n * n for n in range(5))
The generator yields up objects one at a time, and the map applies the given function to each object in turn to create a new iterable.
my_map = map(lambda x: x + 1, my_generator) # Increment each value from my_generator
for i in my_map:
    print(i, end=', ')
A generator function, that is, a function that uses the yield statement, is a very convenient way for a user to define an iterator.
def my_generator(N): # Generate a series of integers from 0 to N-1
    for i in range(N):
        #print(f'yield {i}')
        yield i
for j in my_generator(5):
    print(j, end=', ')
A filter applies a given function to each object from the iterable my_generator(10) in turn, and itself only returns those objects for which the lambda returns True.
for n in filter(lambda arg: arg % 2, my_generator(10)):
    print(n, end=', ')
A zip is like a clothing zipper, only extended to multiple dimensions. The first item from each of the iterables passed to zip is combined into a tuple, then the next item from each iterable, and so on.
z = zip(range(3), [4.0, 5.0, 6.0], ['abc', 'def', 'ghi'])
for i in z:
    print(i)
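One detail of zip that is easy to overlook is that it stops as soon as the shortest iterable is exhausted. The sketch below, with illustrative variable names, contrasts this with itertools.zip_longest from the Standard Library, which pads the shorter iterables instead.

```python
import itertools

short = list(zip(range(3), 'ab'))   # Stops after the second item

padded = list(itertools.zip_longest(range(3), 'ab', fillvalue='-'))

print(short)
print(padded)
```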
BAD
Avoid nesting zip, map, filter, generators, and similar if the code becomes hard to understand.
x = zip(map(lambda x: x * x,
            (i for i in range(10) if (i % 2) == 0)),
        filter(lambda x: x % 3, (i for i in range(8))))
What the heck?
tuple(x)
GOOD
It is much better to break up the code into readable parts, following the Python language design principle that flat is better than nested. This map generates the squares of the even integers between 0 and 9.
m = map(lambda x: x * x, (i for i in range(10) if (i % 2) == 0))
itertools.tee is being used to replicate the iterator so that we can show the value of m for debug purposes. The call to tuple(tmp) will exhaust the iterator tmp, so we need a second copy of the iterator to pass forward into the zip.
import itertools
tmp, m = itertools.tee(m)
tuple(tmp)
This filter generates the integers between 0 and 7 that are not divisible by 3.
f = filter(lambda x: x % 3, (i for i in range(8)))
Again, itertools.tee is being used purely so we can show the intermediate value of f before it is consumed by the zip below.
tmp, f = itertools.tee(f)
tuple(tmp)
x = zip(m, f)
tuple(x)
Avoid using lambda where def would do perfectly well:
GOOD
def f(arg1, arg2):
    return arg1 + arg2
f(1, 2)
BAD
f = lambda arg1, arg2: arg1 + arg2
f(1, 2)
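One practical reason for this guideline is that a function created with def carries its own name, which shows up in tracebacks and debugging output, whereas every lambda is simply named <lambda>. The sketch below uses hypothetical names and also shows a case where a short lambda passed directly as an argument is a reasonable choice.

```python
def add(arg1, arg2):
    return arg1 + arg2


add_lambda = lambda arg1, arg2: arg1 + arg2   # Discouraged: a def in disguise

names = (add.__name__, add_lambda.__name__)

# A short, throw-away lambda used as a sort key is fine
by_last_char = sorted(['abc', 'xya', 'mnb'], key=lambda s: s[-1])

print(names, by_last_char)
```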
When comparing a value to the special value None, prefer the is operator to the == operator, because the meaning of == can (in theory) be redefined using magic methods.
GOOD
def f(arg1=None):
    if arg1 is None: # Arg1 is absent
        return 0
    else:
        return arg1
f()
BAD
def f(arg1=None):
    if arg1 == None: # Might not work if == has been redefined
        return 0
    else:
        return arg1
f()
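To make the risk concrete, here is a contrived sketch of a class that redefines == through the __eq__ magic method. The class name AlwaysEqual is hypothetical. The == comparison is fooled, whereas the is operator tests object identity and cannot be redefined.

```python
class AlwaysEqual:
    def __eq__(self, other):
        return True              # Claims to be equal to anything


obj = AlwaysEqual()
equals_none = (obj == None)      # Misleading result from the redefined ==
is_none = (obj is None)          # Identity test is unaffected

print(equals_none, is_none)
```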
Similarly, when treating values as truth values, avoid writing == True or > 0.
GOOD
condition = True
if condition:
    pass
BAD
condition = True
if condition == True:
    pass
GOOD
my_list = [1, 2, 3]
if my_list:
    print(my_list)
BAD
my_list = [1, 2, 3]
if len(my_list) > 0:
    print(my_list)
Take advantage of the in operator whenever you need to do an existence check. That is, don't write a for loop or use multiple or operators to test for the existence of an item in a dictionary, set, or sequence.
GOOD
my_list = [1, 2, 3]
item = 2
if item in my_list:
    print("Item found")
BAD
if my_list[0] == item or my_list[1] == item or my_list[2] == item:
    print("Item found")
BAD
for i in my_list:
    if i == item:
        print("Item found")
        break
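The in operator works on dictionaries, sets, and strings as well as lists. The following sketch uses made-up data to show what is actually being tested in each case. Note in particular that in applied to a dictionary checks the keys, not the values.

```python
ages = {'ann': 31, 'bob': 27}

key_found = 'ann' in ages               # Tests the dictionary keys
value_as_key = 31 in ages               # 31 is a value, not a key
value_found = 31 in ages.values()       # Tests the dictionary values

is_vowel = 'e' in set('aeiou')          # Set membership
substring_found = 'yth' in 'Python'     # Substring test for strings

print(key_found, value_as_key, value_found, is_vowel, substring_found)
```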
Only use the try...except syntax to handle genuine exceptions, not programming errors and bugs. Genuine exceptions are events, typically unusual events, that are best handled outside of the normal program control flow of functions, loops, and conditional statements. Keep try blocks short, because the longer the try block, the more likely you are to have programming errors (bugs) being caught as exceptions. Avoid catching unnamed exceptions, because every run-time programming error raises an exception.
GOOD
x, y = 12, 4
a = "abc"
try:
    b = a[x // y] # The try-except handles division-by-zero, but there is an unanticipated bug (index out of range)
except ZeroDivisionError:
    print("Unexpected divide-by-zero")
BAD
try:
    x, y = 12, 4
    a = "abc"
    b = a[x // y] # Bug - index out of range
except: # Unnamed exception catches all exceptions, so the out-of-range error is hidden
    print("Exception caught")
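One way to keep a try block short is to use the optional else clause, which runs only when no exception was raised. In the sketch below (the function name element_at_ratio is hypothetical), only the division is guarded, so an out-of-range index is not mistaken for the anticipated divide-by-zero and still surfaces as an IndexError.

```python
def element_at_ratio(text, x, y):
    try:
        index = x // y            # Only the division is inside the try block
    except ZeroDivisionError:
        return None               # The one anticipated, genuine exception
    else:
        return text[index]        # A bad index here is not caught above


print(element_at_ratio("abc", 4, 2))
print(element_at_ratio("abc", 1, 0))
```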
Do not use exceptions where normal control flow statements such as if or break would do just as well.
my_list = [1, 2, 3, -1, 4]
GOOD
for i in my_list:
    if i < 0:
        print('Negative value found')
        break # Jump out of loop
BAD
try:
    for i in my_list:
        if i < 0:
            raise Exception # Jump out of loop
except:
    print('Negative value found')
Do use context managers whenever appropriate (the with construct), because context managers are the cleanest way to ensure that any resources allocated during execution are tidied up at the right time, an obvious example being opening and closing files:
GOOD
with open('Python style v2.ipynb') as file:
    for line in file:
        print(line, end='')
        if "metadata" in line:
            break
BAD
file = open('Python style v2.ipynb')
for line in file:
    print(line, end='')
    if "metadata" in line:
        break
file.close() # Without a context manager, need to remember to close the file explicitly
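The benefit of the with construct is most visible when something goes wrong part-way through. The sketch below writes to a temporary file (the file name example.txt and the simulated failure are purely illustrative) and checks that the file has been closed both after normal completion and after an exception.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')

    with open(path, 'w') as file:
        file.write('first line\n')
    closed_after_with = file.closed        # Closed on normal exit

    first_line = None
    try:
        with open(path) as file:
            first_line = file.readline()
            raise RuntimeError('simulated failure')
    except RuntimeError:
        closed_after_error = file.closed   # Closed even though an exception was raised

print(closed_after_with, closed_after_error)
```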
Use the @property decorator when you want to define custom setter and getter methods that are called whenever a given property of an instance object is accessed. This is more of an advanced coding idiom, but does make it possible to add any kind of custom behavior or side-effect to a simple instance object property reference. When used appropriately, in a way that is clear and natural to the user of the class, this is very Pythonic. There are other ways of achieving the same thing in Python, but this seems to have become the preferred idiom:
GOOD
class C:
    def __init__(self):
        self.__x = None
        self.get_count = 0
        self.set_count = 0
    @property
    def x(self): # Getter method for property x
        self.get_count += 1 # Custom behavior
        return self.__x
    @x.setter
    def x(self, value): # Setter method for property x
        self.set_count += 1 # Custom behavior
        self.__x = value
    @x.deleter
    def x(self):
        print(f'Deleted after {self.get_count} gets and {self.set_count} sets')
        del self.__x
c = C()
c.x = 1
c.x = 2
assert c.x == 2
del c.x
Remember the Python design principle of batteries included. Do use the "batteries" that are built into the Python language and the Standard Library, rather than reinventing the wheel and coding up such features from scratch. Here are a few common examples, but the list could go on and on:
Use the Standard Library wherever it offers the feature you need. For example, creating file system directory paths.
directory = "my_dir"
file = "my_file"
GOOD
import os
os.path.join(directory, file)
BAD
directory + "/" + file
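The Standard Library also offers the pathlib module as an object-oriented alternative to os.path. The sketch below uses pathlib.PurePosixPath only so that the printed result is the same on every platform. In real code that touches the file system, pathlib.Path would be the usual choice.

```python
import pathlib

directory = "my_dir"
file = "my_file"

path = pathlib.PurePosixPath(directory) / file   # The / operator joins path components

print(path, path.name, path.parent)
```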
Use the built-in sorted function or sort method to sort lists or strings, as opposed to implementing your own sorting algorithm from scratch.
GOOD
list_of_chars = sorted("Monty Python")
list_of_chars
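Both sorted and the sort method accept key and reverse arguments, which cover most custom orderings without any hand-written sorting code. A short sketch with made-up data:

```python
chars = sorted("Monty Python", key=str.lower)   # Case-insensitive ordering

words = ['pear', 'Apple', 'fig']
by_length = sorted(words, key=len)              # Returns a new list
words.sort(reverse=True)                        # Sorts the existing list in place

print(chars)
print(by_length, words)
```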
Use join to convert lists of characters to strings (with or without a separator between the characters):
GOOD
''.join(list_of_chars)
BAD
result = ''
for char in list_of_chars:
    result += char
result
Use enumerate if you need an index variable when scanning through a collection.
my_collection = {'a': 1, 'b': 2, 'c': 3}
GOOD
for i, key in enumerate(my_collection): # Iterating over a dictionary yields its keys
    print(i, key, my_collection[key])
BAD
i = 0
for key in my_collection:
    print(i, key, my_collection[key])
    i += 1
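As a further sketch, enumerate takes an optional start argument, and it combines naturally with the items method of a dictionary when both the key and the value are needed. The variable rows is introduced here purely for illustration.

```python
my_collection = {'a': 1, 'b': 2, 'c': 3}

# Number the entries from 1 and unpack each (key, value) pair
rows = [(i, key, value)
        for i, (key, value) in enumerate(my_collection.items(), start=1)]

print(rows)
```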
Use f-strings whenever you need to format a text string.
GOOD
a, b = 2, 3
print(f"a = {a}, b = {b}, a + b = {a + b}")
BAD
print("a = {0}, b = {1}, a + b = {2}".format(a, b, a + b))
BAD
print("a = %d, b = %d, a + b = %d" % (a, b, a + b)) # Python 2 style % formatting
BAD
print("a = " + str(a) + ", b = " + str(b) + ", a + b = " + str(a + b))
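f-strings also accept a format specification after a colon inside the braces, which removes the need for separate rounding or padding code. A few common cases, using illustrative values:

```python
value = 3.14159
count = 255
name = 'abc'
big = 1234567

rounded = f"{value:.2f}"    # Two decimal places
as_hex = f"{count:#x}"      # Hexadecimal with 0x prefix
padded = f"{name:>6}"       # Right-aligned in a field of width 6
grouped = f"{big:,}"        # Thousands separators

print(rounded, as_hex, padded, grouped)
```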
Use the pytest module for unit testing. pytest is not part of the Standard Library, so usually needs to be installed using pip. The point is that pytest has become the de facto standard for unit testing in the Python world, so you should use it rather than inventing your own unit testing framework. (unittest is an older unit testing framework that is part of the Standard Library, but pytest is more Pythonic and has become more popular in recent years.)
Unfortunately it is hard to illustrate pytest from within a Jupyter Notebook, and unit testing is anyway such a broad topic that it is better described elsewhere.
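As a rough sketch only, a pytest test file is just a module containing functions whose names start with test_ and which use plain assert statements. The file name test_average.py and the function average below are hypothetical. Running the pytest command in the directory containing such a file would collect and run the two test functions.

```python
# Contents of a hypothetical file named test_average.py

def average(a, b):
    # Function under test (normally imported from the module being tested)
    return (a + b) // 2


def test_average_of_equal_values():
    assert average(4, 4) == 4


def test_average_rounds_down():
    assert average(2, 3) == 2
```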
These are general coding guidelines that would apply for any programming language, but which have a particular twist for Python.
Do use assert statements to catch potential bugs and write defensive code. assert statements should be used as checks that expected conditions do indeed hold. A failing assert statement should always indicate an unexpected programming error. Never use assert instead of if or print, because Python programs can be executed with assertions disabled.
Because Python is a dynamic language without compile-time type checking, assertions can be a useful way to check the type of incoming argument values:
GOOD
def f(a, b):
    assert type(a) is int
    assert type(b) is int
    return (a + b) // 2
f(2, 3)
f(2, 'b')
It is best to test just a single condition with each assert, because this makes it easier to debug assertion failures.
BAD
def f(a, b):
    assert type(a) is int and type(b) is int
    return (a + b) // 2
f(2, 'b')
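The earlier warning that assertions can be disabled (for example by running Python with the -O option) matters most when checking data that comes from outside the program. The sketch below, with a hypothetical function parse_age, uses if and raise for the check that must always run and keeps assert for an internal sanity check only.

```python
def parse_age(text):
    age = int(text)
    if age < 0:                    # Always checked, even with assertions disabled
        raise ValueError(f'age must not be negative: {age}')
    assert type(age) is int        # Internal sanity check: failure would mean a bug
    return age


print(parse_age('42'))
```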
Wherever possible, re-write code to avoid the need for comments. But do write comments to explain anything surprising or obscure.
An assert is sometimes better than a comment:
GOOD
def f(arg):
    assert type(arg) is float
    return 10 ** arg
f(0.5)
BAD
def f(arg):
    # arg should be a float
    return 10 ** arg
f(0.5)
Do use docstrings to add appropriate documentation to modules, classes, and functions:
GOOD
"""This is a docstring that describes the purpose of this module"""
class C:
    """This is a docstring that describes the purpose of this class"""
    def f(self):
        """This is a docstring that describes the purpose of this function"""
        pass
def g():
    """This is a docstring that describes the purpose of this function"""
    pass
help(C)
Avoid global and nonlocal variables whenever possible. The alternative would be to make the objects locally accessible without reference to global or nonlocal variables. This implies passing the objects as mutable arguments, that is, objects containing variables that can be assigned within functions by calling setter methods of the object. For example:
GOOD
class C:
    def set(self, arg):
        self.__v = arg
    def get(self):
        return self.__v
def set_v(arg): # A mutable object is passed as an argument
    arg.set(1) # Modifies the value of the object
def get_v(arg):
    print(arg.get())
def f():
    obj = C() # obj is local to f, not global, and there are no global or nonlocal accesses to obj
    set_v(obj)
    get_v(obj)
f()
BAD
def set_v():
    global v
    v = 2
def get_v():
    print(v)
def f():
    set_v()
    get_v()
f()
del v # Serves no purpose, but illustrates that v is defined and visible at global scope
Exactly the same point can be illustrated using nonlocal scope rather than global scope.
GOOD
def g():
    def set_v(arg):
        arg.set(3)
    def get_v(arg):
        print(arg.get())
    def f():
        obj = C()
        set_v(obj)
        get_v(obj)
    f()
g()
BAD
def g():
    def set_v():
        nonlocal v
        v = 4
    def get_v():
        print(v)
    def f():
        set_v()
        get_v()
    f()
    del v # Serves no purpose, but illustrates that v is defined and visible at nonlocal scope
g()
In the same spirit of avoiding global variables, avoid static methods in Python, because their only significance is their scope (defined within a class). If you want variables that are common to all instance objects of a given class, it is better to use class methods.
GOOD
class C:
    @classmethod
    def set(cls, arg):
        cls.classdata = arg
    @classmethod
    def get(cls):
        return cls.classdata
obj1 = C()
obj2 = C() # Two independent instance objects sharing a common class variable
obj1.set(5)
obj2.get()
BAD
class C:
    @staticmethod
    def set(arg):
        global classdata
        classdata = arg
    @staticmethod
    def get():
        return classdata
obj1 = C()
obj2 = C() # Two independent instance objects sharing a common global variable
obj1.set(6)
obj2.get()
del classdata # Serves no purpose, but illustrates that classdata is global