Configuring Python Workspace: Poetry

In the previous article, I described my approach to configuring a Python workspace. I mentioned there that I do not use poetry because it “cannot be used to specify dependencies when you work with Jupyter notebooks”. However, readers (@BasicWolf and @iroln) from the Russian tech website Habr recommended that I take a closer look at poetry, as it apparently can fulfil all my requirements. “Two heads are better than one”, so I started to explore this tool more deeply. Indeed, I have managed to fulfil all my requirements with it, although it needs some additional configuration. In this post, I describe how to configure poetry to meet my requirements and how to use it.

Poetry

I have modified my script for configuring a Python workspace so that poetry can be used for dependency management. In particular, compared to the previous version, I have added the following part:

if [ "${USE_POETRY:-0}" -eq 1 ]; then
    echo "Installing poetry..."
    pyenv activate tools3
    curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

    source $HOME/.poetry/env

    # configuring poetry to create venv directories inside the project
    poetry config virtualenvs.in-project true

    # adding lookup of bash completion files in user's directory
    mkdir -p ~/.bash_completion.d/
    if ! [ -f ~/.bash_completion ]; then
        echo '' >> ~/.bash_completion
        echo 'for bcfile in ~/.bash_completion.d/* ; do' >> ~/.bash_completion
        echo '    [ -f "$bcfile" ] && . $bcfile' >> ~/.bash_completion
        echo 'done' >> ~/.bash_completion
        
        echo '' >> ~/.bashrc
        echo 'if [ -f ~/.bash_completion ]; then' >> ~/.bashrc
        echo '    source ~/.bash_completion' >> ~/.bashrc
        echo 'fi' >> ~/.bashrc
    fi

    if ! [ -f ~/.bash_completion.d/poetry.bash-completion ]; then
        poetry completions bash > ~/.bash_completion.d/poetry.bash-completion
    fi
    pyenv deactivate
else
    echo "Installing virtualenv..."
    pyenv activate tools3
    pip install virtualenv
    pyenv deactivate

    mkdir -p ~/.bash_fns
    
    curl -L https://raw.githubusercontent.com/zyrikby/blog_related/master/2020-02-configuring-python-workspace/pip_functions.sh > ~/.bash_fns/pip_functions.sh
    
    echo '' >> ~/.bashrc
    echo 'if [ -f ~/.bash_fns/pip_functions.sh ]; then' >> ~/.bashrc
    echo '    source ~/.bash_fns/pip_functions.sh' >> ~/.bashrc
    echo 'fi' >> ~/.bashrc

    #activating for current shell session
    source ~/.bash_fns/pip_functions.sh
fi

The if statement checks whether the user wants to use poetry for dependency management or prefers my custom solution (described in the previous article). If USE_POETRY is equal to 1, the script installs poetry and configures it. Let’s consider in detail what happens there.

pyenv activate tools3
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python # installing poetry

In this code fragment, the script activates the “tools3” pyenv global environment and installs poetry according to the official recommendation. Poetry adds its directory to the PATH in the .profile file, so the poetry commands become available only after the next login. To make them available right away, the script runs the command source $HOME/.poetry/env, which activates poetry for the current shell session.

The command poetry config virtualenvs.in-project true tells poetry to create the virtual environment directory (.venv) inside the project directory. By default, poetry uses a separate cache directory where it stores all virtual environment related files. In the previous article, I mentioned that in this case VSCode would not be able to activate the virtual environment automatically. Thus, this setting is required to fulfil my requirement for automatic activation of the virtual environment.
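As a rough illustration of what the in-project setting buys you, the Python sketch below looks for an interpreter inside the .venv directory of a project. The helper name in_project_python is hypothetical, and the layout (bin/python on POSIX systems, Scripts\python.exe on Windows) is the one produced by standard virtual environments; this is not how VSCode itself performs the lookup.

```python
import os
import tempfile
from pathlib import Path


def in_project_python(project_dir):
    """Return the interpreter inside <project_dir>/.venv, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    venv = Path(project_dir) / ".venv"
    # Windows environments use Scripts\python.exe, POSIX ones use bin/python.
    relative = "Scripts/python.exe" if os.name == "nt" else "bin/python"
    candidate = venv / relative
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    # A fresh directory has no .venv yet, so nothing is found.
    with tempfile.TemporaryDirectory() as tmp:
        print(in_project_python(tmp))  # None
```

With the default cache-based location, such a fixed relative path would not exist, which is why tools that only look inside the project folder cannot find the environment.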

The following fragment is required to activate bash completion for the poetry commands. The poetry documentation recommends enabling bash completion with the command poetry completions bash > /etc/bash_completion.d/poetry.bash-completion. Unfortunately, in (K)Ubuntu 18.04 this command fails with a Permission denied error, because an ordinary user is not allowed to write to the /etc/bash_completion.d/ directory. To avoid this error, you can either write the file with elevated privileges (note that the redirection is performed by your shell, so simply prefixing the command with sudo is not enough; you would need, for example, a root shell) or make bash load completion files from a directory inside your home directory:

# adding lookup of bash completion files in user's directory
mkdir -p ~/.bash_completion.d/
if ! [ -f ~/.bash_completion ]; then
    echo '' >> ~/.bash_completion
    echo 'for bcfile in ~/.bash_completion.d/* ; do' >> ~/.bash_completion
    echo '    [ -f "$bcfile" ] && . $bcfile' >> ~/.bash_completion
    echo 'done' >> ~/.bash_completion
    
    echo '' >> ~/.bashrc
    echo 'if [ -f ~/.bash_completion ]; then' >> ~/.bashrc
    echo '    source ~/.bash_completion' >> ~/.bashrc
    echo 'fi' >> ~/.bashrc
fi

After that, you just need to add the poetry bash completion file to this directory if it is not there yet:

if ! [ -f ~/.bash_completion.d/poetry.bash-completion ]; then
    poetry completions bash > ~/.bash_completion.d/poetry.bash-completion
fi
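Both fragments are guarded so that running the installation script twice does not duplicate lines in your shell configuration. The Python sketch below shows the same idea in a slightly different form: instead of checking whether a file exists, as the script does, it checks whether the snippet is already present in the target file. The helper ensure_block and the SNIPPET constant are hypothetical names used only for this illustration.

```python
import tempfile
from pathlib import Path

# Assumed snippet, mirroring the lines the script appends to ~/.bashrc.
SNIPPET = "if [ -f ~/.bash_completion ]; then\n    source ~/.bash_completion\nfi"


def ensure_block(path, block):
    """Append block to the file at path unless it is already there.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(path)
    existing = path.read_text() if path.exists() else ""
    if block in existing:
        return False
    with path.open("a") as handle:
        handle.write("\n" + block + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rc_file = Path(tmp) / "bashrc"  # stand-in for ~/.bashrc
        print(ensure_block(rc_file, SNIPPET))  # True: snippet appended
        print(ensure_block(rc_file, SNIPPET))  # False: already present
```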

After you have executed this installation script, please log out and log back in to activate the configuration.

Development Workflows with Poetry

Poetry stores the project configuration in the pyproject.toml file. Mainly, you use it to specify your package and development dependencies (they are all stored in the same file but in different TOML sections). When you install dependencies, poetry creates the poetry.lock file, where it stores the exact versions of the dependencies after it has resolved them. You should put this file under version control so that your collaborators can later replicate your virtual environment.
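To make the two dependency sections more concrete, here is a minimal Python sketch that reads them from a pyproject.toml text. It assumes Python 3.11 or newer, where tomllib is part of the standard library (on older interpreters it falls back to the third-party tomli package, if installed). The SAMPLE text and the function name list_dependencies are made up for the illustration; poetry itself does not expose this function.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: third-party backport is installed

# Hypothetical sample using the section names described above.
SAMPLE = """
[tool.poetry]
name = "demo"
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.8"
numpy = "1.18.0"

[tool.poetry.dev-dependencies]
pylint = "^2.4.4"
"""


def list_dependencies(pyproject_text):
    """Return package and development dependencies as two dictionaries."""
    poetry = tomllib.loads(pyproject_text).get("tool", {}).get("poetry", {})
    return {
        "package": dict(poetry.get("dependencies", {})),
        "dev": dict(poetry.get("dev-dependencies", {})),
    }


if __name__ == "__main__":
    for section, deps in list_dependencies(SAMPLE).items():
        print(section, deps)
```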

If you do not find something in this tutorial, please refer to the official documentation of this tool.

Python Development Workflow

As I described in my previous article, I use Python mainly to prepare some scripts required for my research. They are not supposed to be installed into the system. Thus, most often I make a root directory for the scripts related to a research project and initialize the poetry tool there:

$ mkdir -p ~/projects/new_scripts_proj
$ cd ~/projects/new_scripts_proj
$ poetry init

This command asks several questions about the project settings and, as a result, creates a pyproject.toml file inside the directory with the following content (I have not added any dependencies yet):

[tool.poetry]
name = "new_scripts_proj"
version = "0.1.0"
description = ""
authors = ["Yury Zhauniarovich <email@email.com>"]

[tool.poetry.dependencies]
python = "^3.8"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"

Now, you can add your dependencies either directly to this file (note that there are two different sections for package and development dependencies) or using the poetry add command:

$ poetry add "numpy==1.18.0"

The poetry add command adds the dependency to the pyproject.toml file and installs it. Thus, when you run this command for the first time, it also creates a .venv directory with the virtual environment files inside the project folder and automatically generates the poetry.lock file. The --dev option allows you to add a development dependency. For instance, you can install pylint using the following command: poetry add --dev pylint. After these operations, your pyproject.toml should look similar to this:

[tool.poetry]
name = "new_scripts_proj"
version = "0.1.0"
description = ""
authors = ["Yury Zhauniarovich <email@email.com>"]

[tool.poetry.dependencies]
python = "^3.8"
numpy = "1.18.0"

[tool.poetry.dev-dependencies]
pylint = "^2.4.4"

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"

As you can see, the numpy library has been added to the package dependencies with the exact version we have provided. At the same time, the pylint package has been added to development dependencies, and its version is specified using caret requirements. You can read about all ways of specifying dependency versions supported by poetry in the documentation.
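As a rough sketch of what a caret requirement means: according to the poetry documentation, it allows updates that do not change the leftmost non-zero component of the version. The function below computes the resulting bounds for simple numeric versions with up to three components. The name caret_bounds is hypothetical, and pre-release tags and other edge cases are deliberately ignored.

```python
def caret_bounds(version):
    """Return (inclusive lower bound, exclusive upper bound) for ^version.

    Simplified sketch: only plain numeric versions such as "2.4.4" or "3.8".
    """
    parts = [int(p) for p in version.split(".")]
    lower = parts + [0] * (3 - len(parts))
    # Position of the leftmost non-zero component; if every given component
    # is zero, the last given component is the one that may not change.
    index = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = lower[:index] + [lower[index] + 1] + [0] * (2 - index)
    return ".".join(map(str, lower)), ".".join(map(str, upper))


if __name__ == "__main__":
    print(caret_bounds("2.4.4"))  # ('2.4.4', '3.0.0')
    print(caret_bounds("3.8"))    # ('3.8.0', '4.0.0')
    print(caret_bounds("0.2.3"))  # ('0.2.3', '0.3.0')
```

So pylint = "^2.4.4" accepts any version from 2.4.4 up to, but not including, 3.0.0, while numpy = "1.18.0" accepts exactly one version.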

You can list all dependencies in the current virtual environment using the poetry show command. By default, this command shows all dependencies and their versions; however, it has several useful options. For instance, the --tree option shows the dependency tree, while --no-dev suppresses the output for development dependencies. The most useful option, though, is -o or --outdated, which shows the list of outdated packages. For instance, in the output below we can see that the latest available version of numpy is 1.18.1.

$ poetry show
astroid           2.3.3  An abstract syntax tree for Python with inference support.
isort             4.3.21 A Python utility / library to sort Python imports.
lazy-object-proxy 1.4.3  A fast and thorough lazy object proxy.
mccabe            0.6.1  McCabe checker, plugin for flake8
numpy             1.18.0 NumPy is the fundamental package for array computing with Python.
pylint            2.4.4  python code static checker
six               1.14.0 Python 2 and 3 compatibility utilities
wrapt             1.11.2 Module for decorators, wrappers and monkey patching.

$ poetry show --tree
numpy 1.18.0 NumPy is the fundamental package for array computing with Python.
pylint 2.4.4 python code static checker
├── astroid >=2.3.0,<2.4
│   ├── lazy-object-proxy >=1.4.0,<1.5.0 
│   ├── six >=1.12,<2.0 
│   └── wrapt >=1.11.0,<1.12.0 
├── colorama *
├── isort >=4.2.5,<5
└── mccabe >=0.6,<0.7

$ poetry show --no-dev
numpy 1.18.0 NumPy is the fundamental package for array computing with Python.

$ poetry show -o
numpy 1.18.0 1.18.1 NumPy is the fundamental package for array computing with Python.
wrapt 1.11.2 1.12.0 Module for decorators, wrappers and monkey patching.

Poetry can also be used to update dependencies. You can either update all package dependencies using the poetry update command, or update a particular library by providing its name as an argument. By default, this command updates dependencies to the latest allowed versions. This means that in our case the command poetry update numpy will not update the numpy library, because only version 1.18.0 is allowed by our pyproject.toml file. If you want to update this dependency, you first need to modify pyproject.toml and weaken the specified constraint, e.g., by changing the line to numpy = "1.18.*". You can read more about how to specify dependency versions in the documentation.
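The difference between the exact pin and the wildcard can be sketched in a few lines of Python. This is only an illustration of the "latest allowed version" idea, not poetry's resolver: it handles exact pins and trailing wildcards only, and the AVAILABLE list is a hypothetical stand-in for the versions published in the package index.

```python
def parse(version):
    """Turn "1.18.1" into (1, 18, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, constraint):
    """Check an exact pin ("1.18.0") or a trailing wildcard ("1.18.*")."""
    if constraint.endswith(".*"):
        prefix = parse(constraint[:-2])
        return parse(version)[:len(prefix)] == prefix
    return parse(version) == parse(constraint)


def latest_allowed(available, constraint):
    """Return the newest version matching the constraint, or None."""
    allowed = [v for v in available if satisfies(v, constraint)]
    return max(allowed, key=parse) if allowed else None


# Hypothetical list of published versions, for illustration only.
AVAILABLE = ["1.17.5", "1.18.0", "1.18.1", "1.19.0"]

if __name__ == "__main__":
    print(latest_allowed(AVAILABLE, "1.18.0"))  # 1.18.0
    print(latest_allowed(AVAILABLE, "1.18.*"))  # 1.18.1
```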

After the project is created, you can run VSCode in this directory, and it should automatically activate the virtual environment thanks to the configuration we have made previously. However, if you use other tools, you may need to run some commands in the created virtual environment. In this case, you can use one of the following two commands: poetry run <command> or poetry shell. The former runs a single command, while the latter spawns a new shell session with the virtual environment activated.
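If you are not sure whether a command really runs inside the virtual environment, a small check like the following Python sketch may help. It relies on the fact that, for environments created by the standard venv module (and, as far as I know, recent virtualenv versions), sys.prefix differs from sys.base_prefix. The function name in_virtualenv and the file name used below are arbitrary.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside an environment sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Saved under a name of your choice, e.g. check_env.py, it can be started with poetry run python check_env.py; the reported interpreter should then be located in the .venv directory of the project.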

If you plan to develop a package, you may consider creating a project using the poetry new command. In this case, poetry will create some boilerplate code and files so that you can start developing your package faster:

$ poetry new --src poetry-demo 
Created package poetry_demo in poetry-demo
$ cd poetry-demo/
$ tree
.
├── pyproject.toml
├── README.rst
├── src
│   └── poetry_demo
│       └── __init__.py
└── tests
    ├── __init__.py
    └── test_poetry_demo.py

The --src option puts the sources of the package into the src directory. If you plan to distribute packages, this is a preferred way of organizing your project.

Then, you can use the commands poetry build and poetry publish to build the package and publish it on PyPI. However, as I mentioned previously, I currently do not publish packages, so if you need this functionality, please refer to the poetry documentation.

Finally, if you work with someone who still uses the requirements.txt file to define dependencies, poetry can export the dependencies into this format:

$ poetry export -f requirements.txt
numpy==1.18.0 \
    --hash=sha256:b091e5d4cbbe79f0e8b6b6b522346e54a282eadb06e3fd761e9b6fafc2ca91ad \
    --hash=sha256:443ab93fc35b31f01db8704681eb2fd82f3a1b2fa08eed2dd0e71f1f57423d4a \
    --hash=sha256:88c5ccbc4cadf39f32193a5ef22e3f84674418a9fd877c63322917ae8f295a56 \
    --hash=sha256:e1080e37c090534adb2dd7ae1c59ee883e5d8c3e63d2a4d43c20ee348d0459c5 \
    --hash=sha256:f084d513de729ff10cd72a1f80db468cff464fedb1ef2fea030221a0f62d7ff4 \
    --hash=sha256:1baefd1fb4695e7f2e305467dbd876d765e6edd30c522894df76f8301efaee36 \
    --hash=sha256:cc070fc43a494e42732d6ae2f6621db040611c1dde64762a40c8418023af56d7 \
    --hash=sha256:6f8113c8dbfc192b58996ee77333696469ea121d1c44ea429d8fd266e4c6be51 \
    --hash=sha256:a30f5c3e1b1b5d16ec1f03f4df28e08b8a7529d8c920bbed657f4fde61f1fbcd \
    --hash=sha256:3c68c827689ca0ca713dba598335073ce0966850ec0b30715527dce4ecd84055 \
    --hash=sha256:f6a7421da632fc01e8a3ecd19c3f7350258d82501a646747664bae9c6a87c731 \
    --hash=sha256:905cd6fa6ac14654a6a32b21fad34670e97881d832e24a3ca32e19b455edb4a8 \
    --hash=sha256:854f6ed4fa91fa6da5d764558804ba5b0f43a51e5fe9fc4fdc93270b052f188a \
    --hash=sha256:ac3cf835c334fcc6b74dc4e630f9b5ff7b4c43f7fb2a7813208d95d4e10b5623 \
    --hash=sha256:62506e9e4d2a39c87984f081a2651d4282a1d706b1a82fe9d50a559bb58e705a \
    --hash=sha256:9d6de2ad782aae68f7ed0e0e616477fbf693d6d7cc5f0f1505833ff12f84a673 \
    --hash=sha256:1c35fb1131362e6090d30286cfda52ddd42e69d3e2bf1fea190a0fad83ea3a18 \
    --hash=sha256:56710a756c5009af9f35b91a22790701420406d9ac24cf6b652b0e22cfbbb7ff \
    --hash=sha256:03bbde29ac8fba860bb2c53a1525b3604a9b60417855ac3119d89868ec6041c3 \
    --hash=sha256:712f0c32555132f4b641b918bdb1fd3c692909ae916a233ce7f50eac2de87e37 \
    --hash=sha256:a9d72d9abaf65628f0f31bbb573b7d9304e43b1e6bbae43149c17737a42764c4

If you do not need hashes, add the --without-hashes option to the command. In this case, the output of the command should be more familiar.
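If you already have an exported file with hashes and only want to see the pinned versions, the format is easy to post-process. The Python sketch below joins the backslash-continued lines and drops the --hash options. The function name pinned_requirements is hypothetical, and SAMPLE is a shortened copy of the output shown above.

```python
# Shortened copy of the exported output above (first two hashes only).
SAMPLE = """numpy==1.18.0 \\
    --hash=sha256:b091e5d4cbbe79f0e8b6b6b522346e54a282eadb06e3fd761e9b6fafc2ca91ad \\
    --hash=sha256:443ab93fc35b31f01db8704681eb2fd82f3a1b2fa08eed2dd0e71f1f57423d4a
"""


def pinned_requirements(exported_text):
    """Return the requirement lines without their --hash options."""
    # A backslash at the end of a line continues the requirement.
    joined = exported_text.replace("\\\n", " ")
    result = []
    for line in joined.splitlines():
        tokens = [t for t in line.split() if not t.startswith("--hash=")]
        if tokens:
            result.append(" ".join(tokens))
    return result


if __name__ == "__main__":
    print(pinned_requirements(SAMPLE))  # ['numpy==1.18.0']
```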

Data Analysis Workflow

The Data Analysis Workflow is similar to the Python Development Workflow: you create a new directory, initialize a new poetry project, and install all necessary dependencies. Then, you just need to run Jupyter Notebook from the virtual environment:

$ poetry run jupyter notebook

Note that we do not need to install Jupyter Notebook into our environment. It is globally available thanks to the pyenv jupyter global environment. However, the configuration that we have made makes the packages installed by poetry in the virtual environment available to it.
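A quick way to confirm that the notebook really sees the project packages is to run a check like the one below in the first cell. It is only a sketch: the helper name missing_packages is hypothetical, and the list of names should be replaced with the top-level packages of your own project.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "numpy" is the dependency added earlier; adjust the list as needed.
    missing = missing_packages(["numpy"])
    print("missing packages:", missing)
```

An empty list means that every listed package can be imported from the notebook kernel.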

Conclusion

Personally, from now on I plan to use poetry for dependency management because it fulfils my requirements. Moreover, contrary to my custom approach described in the previous article, which relies on pip and thus lacks a dependency resolver, poetry employs one, so you should always get consistent versions of the dependencies in your virtual environment.

However, if you mostly develop Python packages rather than simple scripts, you may want to consider a more advanced project management tool called DepHell. It was also recommended to me; however, I have found it quite complicated to adapt to my scenarios.

Yury Zhauniarovich
Lead Data Scientist
Independent Cyber Security Researcher