Structure of a Python project
Objectives of this section
- Understand the difference between Python modules and packages.
- Define the tree structure of your project (src vs flat layout).
- Understand what import does and control the use of your package.
In this section, we'll take a look at the structure of a Python project, with examples of tree structures. These tree structures are not fixed, but they are used in most scientific computing projects (numpy, pandas, scikit-learn, etc.).
A Python library consists of:
- Python files with the extension .py made for import, called modules,
- directories containing Python files, called packages.
Python files with the extension .py designed to execute code are called scripts.
Modules and packages can be used in the Python interpreter using the import statement.
A number of different concepts are commonly referred to by the word package. A Python project that you want to distribute is commonly called a package, but a directory containing Python files that you can import is also called a package. We will call the former a distribution (or project) package and the latter a package (or library).
When designing your own library, it's important to understand how modules and packages work, so that you can define the behavior the end user gets when importing all or part of the package.
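As a quick illustration of the module/package distinction, the sketch below inspects two standard-library imports. It relies on the fact that a package is a module object that carries a __path__ attribute. The choice of json and json.decoder as examples, and the is_package helper, are ours and not part of the course material.

```python
import json          # a package: the directory json/ with an __init__.py
import json.decoder  # a module inside that package: json/decoder.py


def is_package(module):
    """Return True if the imported object is a package (it has __path__)."""
    return hasattr(module, "__path__")


print(is_package(json))          # True
print(is_package(json.decoder))  # False
```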
Module
To illustrate the use of a module in Python, we're going to write a calculator that only knows how to add and subtract.
Here is the calculator_mod.py file in the examples/calculator_mod directory:
"""
Calculator module
"""

def add(a, b):
    """
    return a + b
    """
    return a + b

def sub(a, b):
    """
    return a - b
    """
    return a - b
Using a module
Note: To try the examples below, activate your pipenv environment and go to the examples/calculator_mod directory:
pipenv shell
cd examples/calculator_mod
There are several ways to import a module using the import
keyword.
- You can import a module via its name.
import calculator_mod
calculator_mod.add(1, 2)
- You can import some objects of a module.
from calculator_mod import sub
sub(1, 2)
- You can import a module by modifying its name.
import calculator_mod as calc
calc.add(1, 2)
- You can import all the public objects of a module.
from calculator_mod import *
add(1, 2)
This type of import should be avoided: it hides where each name comes from and can silently overwrite names that are already defined.
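To make the risk concrete, here is a small sketch using two standard-library modules (math and cmath, chosen by us for illustration) that both export a function named sqrt. The second star import replaces the first one without any warning.

```python
from math import *   # brings in sqrt, exp, pi, ...
from cmath import *  # silently rebinds sqrt, exp, ... to the complex versions

# sqrt is now cmath.sqrt, so a negative argument no longer raises ValueError.
value = sqrt(-1)
print(value)  # 1j
```

With explicit imports (import math, import cmath), the reader can always tell which sqrt is being called.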
import explicitly defines certain module attributes:
- __dict__: dictionary used by the module for attribute namespace
- __name__: module name
- __file__: module file
- __doc__: module documentation
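These attributes can be inspected on any imported module. The sketch below uses the standard-library json package as an arbitrary example. The exact path printed for __file__ depends on your installation.

```python
import json

print(json.__name__)             # json
print(json.__file__)             # location of json/__init__.py on your system
print(json.__doc__ is not None)  # the module docstring, when one is defined

# The namespace is a plain dictionary: attribute access reads from it.
same_object = json.__dict__["dumps"] is json.dumps
print(same_object)               # True
```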
Warning: While a program is running, a module is imported only once: its code is not executed again by later import statements.
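Python keeps the modules it has already imported in the sys.modules dictionary. The following sketch (again using json as an arbitrary example) shows that a second import returns the very same object.

```python
import sys
import json

first = sys.modules["json"]   # the cached module object

import json as json_again     # a second import does not re-execute the file

print(json_again is first)    # True
```

This is why editing a module file has no effect on an interpreter session that has already imported it.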
Module Execution
To execute a module as a script, you can add the following test at the end of a module:
if __name__ == '__main__':
    print(add(1, 2))
Here's the new calculator_mod.py:
"""
Calculator module
"""

def add(a, b):
    """
    return a + b
    """
    return a + b

def sub(a, b):
    """
    return a - b
    """
    return a - b

if __name__ == '__main__':
    print(add(1, 2))
The module can now be executed.
Exercise: Add the two lines above to calculator_mod.py and test it:
python3 calculator_mod.py
Note: The if block is only executed when the module is run as the main program. It is not executed when the module is imported.
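The effect of the __name__ test can be checked without leaving Python. The sketch below writes a shortened copy of the module to a temporary directory and runs it twice with the standard runpy module: once with __name__ set to "__main__" (as when run as a script) and once with the module's own name (as when imported). The helper run_with_name is hypothetical and exists only for this demonstration.

```python
import contextlib
import io
import runpy
import tempfile
from pathlib import Path

SOURCE = '''
def add(a, b):
    return a + b

if __name__ == "__main__":
    print(add(1, 2))
'''


def run_with_name(run_name):
    """Execute the module source with the given __name__ and return its output."""
    with tempfile.TemporaryDirectory() as directory:
        path = Path(directory, "calculator_mod.py")
        path.write_text(SOURCE)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            runpy.run_path(str(path), run_name=run_name)
        return buffer.getvalue()


as_script = run_with_name("__main__")        # behaves like: python3 calculator_mod.py
as_import = run_with_name("calculator_mod")  # behaves like: import calculator_mod

print(repr(as_script))  # '3\n'
print(repr(as_import))  # ''
```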
Package
As mentioned in the introduction, a package is a set of Python modules.
Let's take a look at the following project structures:
- single module layout
- flat layout
- src layout
Single module layout
examples/calculator_mod/
├── calculator_mod.py
├── pyproject.toml
├── README.md
└── LICENSE.txt
A standalone module is placed directly under the project root, instead of inside a package folder. This is different from the flat and src layouts, where there is at least one top-level package folder.
Flat layout
tree examples/calculator_flat/
examples/calculator_flat_layout/
├── calculator
│   ├── __init__.py
│   └── operator
│       ├── __init__.py
│       ├── add.py
│       └── sub.py
├── pyproject.toml
├── README.md
└── LICENSE.txt
This tree is called flat-layout.
At the root we find the pyproject.toml file, which describes the project and configures the distribution build, the README and the license file.
The root also contains a package called calculator, which holds a sub-package called operator containing two modules (add.py and sub.py).
Src layout
tree examples/calculator_src/
examples/calculator_src_layout/
├── src
│   └── calculator
│       ├── __init__.py
│       └── operator
│           ├── __init__.py
│           ├── add.py
│           └── sub.py
├── pyproject.toml
├── README.md
└── LICENSE.txt
This tree is called src-layout. The same packages as in flat-layout are found in the src folder rather than in the root directory.
The src-layout requires the project to be installed in order to execute its code, which is not the case with the flat-layout (you only need to go to the project root).
This means that src-layout implies an additional step in the project development flow (in general, an editable installation is used for development and a normal installation is used for testing).
More information at https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/
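The practical difference between the two layouts can be reproduced with throwaway projects. The sketch below creates an empty package either directly under a temporary project root or under src/, then starts a fresh interpreter in that root and tries to import it without installing anything. The package name layout_demo_pkg and the helper importable_from_root are hypothetical. The sketch assumes a default interpreter configuration, in which python -c puts the current directory on the search path.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

PACKAGE = "layout_demo_pkg"  # hypothetical name, chosen to avoid clashes


def importable_from_root(package_parent):
    """Create a project whose package lives in package_parent ('' or 'src')
    and try to import it from the project root, without installing it."""
    with tempfile.TemporaryDirectory() as root:
        package_dir = Path(root, package_parent, PACKAGE)
        package_dir.mkdir(parents=True)
        (package_dir / "__init__.py").write_text("")
        result = subprocess.run(
            [sys.executable, "-c", f"import {PACKAGE}"],
            cwd=root,
            capture_output=True,
        )
        return result.returncode == 0


flat_ok = importable_from_root("")    # package directly under the root
src_ok = importable_from_root("src")  # package under src/

print(flat_ok, src_ok)
```

Under these assumptions the flat project imports and the src project does not, which is why the src-layout needs an installation step (editable or not) before its code can be used.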
__init__.py file
The __init__.py file:
- It is required for Python to treat a directory as a regular package (namespace packages are the exception).
- It can be empty.
- It may contain initialization code.
- It may contain imports.
- It may contain the __all__ variable.
Let's take the example of the file calculator/operator/__init__.py:
__all__ = ['add', 'sub']
In this way, we can import the add and sub modules by simply doing:
from calculator.operator import *
Then access their functions by doing:
print(add.add(1, 2))
print(sub.sub(1, 2))
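The __all__ variable plays the same filtering role in any module, not only in an __init__.py. As a minimal sketch, the code below builds a throwaway module in memory (the name all_demo is hypothetical) that defines two functions but lists only one of them in __all__, then performs a star import into an empty namespace.

```python
import sys
import types

# Build a throwaway module in memory (hypothetical name: all_demo).
demo = types.ModuleType("all_demo")
exec(
    "__all__ = ['add']\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "def sub(a, b):\n"
    "    return a - b\n",
    demo.__dict__,
)
sys.modules["all_demo"] = demo  # make it visible to the import statement

# Run the star import in a separate namespace so we can inspect the result.
namespace = {}
exec("from all_demo import *", namespace)

print("add" in namespace)  # True: listed in __all__
print("sub" in namespace)  # False: public, but not listed in __all__
```

The sub function remains reachable through an explicit import: __all__ only restricts what the star form brings in.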
You can also use relative imports when you want to use functions from your package in different modules.
Let's add these lines to the calculator_flat/calculator/__init__.py file:
from . import operator
from .operator import *
from .operator.add import add
We then have the following behavior:
import calculator
calculator.add(1, 2)
calculator.sub.sub(2,3)
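Exercise: What else can we do?
- calculator.add.add(1, 2)?
- calculator.sub.sub(1, 2)?
- calculator.operator.sub(1, 2)?
- calculator.operator.add.add(1, 2)?
Solution:
- no: calculator.add is the add function (the last import overrides the module of the same name), and a function has no add attribute.
- yes: calculator.sub is the sub module brought in by the star import.
- no: calculator.operator.sub is a module, and a module is not callable.
- yes: calculator.operator.add is still the add module.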
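The answers can be verified by rebuilding the package in a temporary directory. The sketch below recreates the calculator package with the two __init__.py files described in this section, imports it, and evaluates each expression of the exercise. The helper works is hypothetical, and the temporary directory is put on sys.path for this demonstration only.

```python
import sys
import tempfile
from pathlib import Path

# Recreate calculator/ and calculator/operator/ in a temporary directory.
root = Path(tempfile.mkdtemp())
operator_dir = root / "calculator" / "operator"
operator_dir.mkdir(parents=True)
(operator_dir / "add.py").write_text("def add(a, b):\n    return a + b\n")
(operator_dir / "sub.py").write_text("def sub(a, b):\n    return a - b\n")
(operator_dir / "__init__.py").write_text("__all__ = ['add', 'sub']\n")
(root / "calculator" / "__init__.py").write_text(
    "from . import operator\n"
    "from .operator import *\n"
    "from .operator.add import add\n"
)

sys.path.insert(0, str(root))  # for this demonstration only
import calculator


def works(expression):
    """Return True if the expression evaluates without an error."""
    try:
        eval(expression, {"calculator": calculator})
    except (AttributeError, TypeError):
        return False
    return True


expressions = [
    "calculator.add.add(1, 2)",
    "calculator.sub.sub(1, 2)",
    "calculator.operator.sub(1, 2)",
    "calculator.operator.add.add(1, 2)",
]
results = {expression: works(expression) for expression in expressions}
for expression, ok in results.items():
    print(expression, "->", "yes" if ok else "no")
```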
Searching for modules and packages
For Python to import a module, the module must be located in one of the directories of the module search path. The sys module exposes this list of directories as sys.path.
Exercise: display your sys.path
import sys
print(sys.path)
Solution:
python3 -c "import sys;print(sys.path)"
['', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/home/pierron/.local/lib/python3.8/site-packages', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages']
pipenv run python3 -c "import sys;print(sys.path)"
['', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/home/pierron/.local/share/virtualenvs/splienart-P0OIA1Oy/lib/python3.8/site-packages']
Notice the difference: the paths after lib-dynload are replaced by the site-packages directory of the virtual environment.
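If you are unsure which interpreter you are running, one common check (not part of the original exercise) compares sys.prefix with sys.base_prefix. Environments created by venv and by recent versions of virtualenv make these two values differ. The sketch also lists the site-packages entries currently on the search path.

```python
import sys

# In a virtual environment, sys.prefix points at the environment while
# sys.base_prefix still points at the interpreter it was created from.
in_virtualenv = sys.prefix != sys.base_prefix
print(in_virtualenv)

# The site-packages directories that are currently searched.
site_dirs = [entry for entry in sys.path if entry.endswith("site-packages")]
print(site_dirs)
```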
In the output above, Python includes the current directory in the search path (the empty string '' at the start of the list), but you can also add directories at runtime, since sys.path is just a list.
sys.path.append("/home/my_project/")
print(sys.path)
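The effect of such an append can be observed with a throwaway module. The sketch below writes a small module (hypothetical name path_demo_module) into a temporary directory, shows that it cannot be imported at first, then appends the directory to sys.path and imports it. It is meant only to illustrate the mechanism.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE = "path_demo_module"  # hypothetical module name

with tempfile.TemporaryDirectory() as directory:
    Path(directory, f"{MODULE}.py").write_text("ANSWER = 42\n")

    # The directory is not on sys.path yet, so the import fails.
    try:
        importlib.import_module(MODULE)
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    sys.path.append(directory)     # the directory is now searched
    importlib.invalidate_caches()  # make sure the finders see the new file
    module = importlib.import_module(MODULE)
    found_after = module.ANSWER == 42

    sys.path.remove(directory)     # leave sys.path as we found it

print(found_before, found_after)   # False True
```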
When you want to import foo, here is the order in which CPython looks for it in each directory of sys.path:
- foo/__init__.py (a package directory)
- foo.so (foo.pyd on Windows), a compiled extension module
- foo.py
- foo.pyc
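You can check the precedence on your own interpreter. The sketch below puts a package directory and a plain .py file with the same (hypothetical) name order_demo side by side, then asks the standard path-based finder which one it would load, without actually importing anything. On CPython, the package directory is expected to take precedence over the same-named .py file.

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path

NAME = "order_demo"  # hypothetical module name

with tempfile.TemporaryDirectory() as directory:
    # Two candidates with the same name in the same directory.
    Path(directory, f"{NAME}.py").write_text("")
    Path(directory, NAME).mkdir()
    Path(directory, NAME, "__init__.py").write_text("")

    # Ask the finder what it would load from this directory.
    spec = PathFinder.find_spec(NAME, [directory])
    chosen = Path(spec.origin).name
    is_package = spec.submodule_search_locations is not None

print(chosen)      # __init__.py
print(is_package)  # True
```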
Warning: Adding a new entry to the sys.path list is shown only to illustrate how Python package import works. Under no circumstances should you use this method for your application. In the next section, we'll show you how to do things properly.