In the previous article, I shared my setup for producing graphs for research papers. However, while working on figures for a new paper recently, I discovered that this setup needed updating: matplotlib 3.6 and later emit a warning that the bundled seaborn styles are deprecated. In this article, I provide the updates to the setup described in the previous article.
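As a rough illustration of the kind of change involved (a minimal sketch, not necessarily the exact update described in the article): starting with matplotlib 3.6 the bundled seaborn styles are available under names prefixed with `seaborn-v0_8-`. The helper name `pick_seaborn_style` and the choice of the `whitegrid` style are assumptions made for this example only.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt


def pick_seaborn_style(name="whitegrid"):
    """Return a seaborn-like style name that the installed matplotlib provides.

    Hypothetical helper for illustration: it prefers the renamed style
    (matplotlib >= 3.6), then the legacy name, then matplotlib's default.
    """
    for candidate in (f"seaborn-v0_8-{name}", f"seaborn-{name}"):
        if candidate in plt.style.available:
            return candidate
    return "default"


style = pick_seaborn_style("whitegrid")
plt.style.use(style)  # no deprecation warning when the renamed style is picked
```

Checking `plt.style.available` first keeps the same notebook usable on both older and newer matplotlib installations.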
When you write a scientific paper, one of the most common tasks is to analyze the obtained results and design beautiful graphs explaining them. Currently, in the research community, Python’s ecosystem is the most popular for achieving these goals. It provides web-based interactive computational environments (e.g., Jupyter Notebook/Lab) to write code and describe the results, and the pandas and matplotlib libraries to analyze data and produce graphs, respectively. Unfortunately, because of their rich functionality, it is hard to start using them effectively in your everyday research activities when you begin your path as a researcher. In this article, I would like to share some tips and tricks on how to employ the matplotlib library to produce nice graphs for research papers.
Nowadays, it is quite popular to store semi-structured information in JSON format. Indeed, JSON files have a fairly simple structure and are easy for humans to read. JSON syntax allows one to represent complex dependencies in data and avoid data duplication. Moreover, all modern programming languages have libraries that facilitate parsing JSON and storing data in this format. Not surprisingly, JSON is extensively used to return data in Application Programming Interfaces (APIs).
At the same time, data analysts prefer to deal with structured data represented in the form of series and dataframes. Unfortunately, transforming JSON data into a structured format is not that straightforward. Previously, I preferred to write code that manually parsed complex JSON files and created a pandas dataframe from the parsed data. However, I have recently discovered a pandas function called json_normalize that saved me some time in my projects. In this article, I explain how you can start using it in your projects.
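To give a feel for what json_normalize does, here is a minimal sketch. The sample records, field names, and variable names are invented for illustration and do not come from the article.

```python
import pandas as pd

# Hypothetical nested records, similar to what an API might return.
records = [
    {"id": 1, "user": {"name": "Alice", "city": "Berlin"}, "tags": ["a", "b"]},
    {"id": 2, "user": {"name": "Bob", "city": "Paris"}, "tags": []},
]

# Nested dictionaries are flattened into dot-separated columns
# (for example "user.name"); lists are kept as-is in a single column.
flat = pd.json_normalize(records)

# Hypothetical orders, each holding a list of items.
orders = [
    {
        "order_id": 10,
        "customer": {"name": "Alice"},
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
    },
    {
        "order_id": 11,
        "customer": {"name": "Bob"},
        "items": [{"sku": "C3", "qty": 5}],
    },
]

# record_path selects the nested list that becomes the rows;
# meta copies parent-level fields onto every row produced from that list.
items = pd.json_normalize(
    orders,
    record_path="items",
    meta=["order_id", ["customer", "name"]],
)
```

The first call yields one row per record, while the second yields one row per item, with the order and customer fields repeated alongside it.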
In my previous articles (Configuring Python Workspace and Configuring Python Workspace: Poetry), I described how I use pyenv to create several virtual environments. Over time, the tools installed in these environments become outdated, and you need a way to update them. I have developed a pyenv plugin that updates all packages in all or selected pyenv environments, and in this post I describe how to use it.
Recently, I updated my operating system, and as part of this process I installed the latest version of poetry (a tool for Python dependency management). When I started a new project following my usual routine, I discovered that poetry could not install development dependencies and exited with a weird SolverProblemError error.
Recently, I participated in a project at AI Superior aimed at analyzing a dataset with sensitive data. Since the data had to remain private, we initially shared the dataset through a secure channel and took measures to prevent its accidental distribution (we put the dataset in a separate directory and configured git to ignore this folder and other directories containing intermediate processing results). However, while working on this project, I noticed that Jupyter Notebook, a de facto standard tool for data analysis, may itself be a source of sensitive data leakage.
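One possible mitigation can be sketched as follows, under the assumption that the leakage in question comes from cell outputs saved inside the .ipynb file (the excerpt above does not spell out the mechanism). The function name `strip_outputs` and the sample notebook are hypothetical; the sketch relies only on the fact that a notebook file is a JSON document whose code cells carry `outputs` and `execution_count` fields.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts of code cells.

    Hypothetical helper for illustration; it modifies the given
    notebook dictionary in place and returns it.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# A tiny in-memory notebook standing in for the parsed content of an .ipynb file.
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["df.head()"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 3,
                    "metadata": {},
                    "data": {"text/plain": ["<rows of sensitive data>"]},
                }
            ],
        },
    ],
}

clean = strip_outputs(nb)
serialized = json.dumps(clean, indent=1)  # what would be written back to disk
```

Running something like this before committing keeps the code in version control while leaving the rendered data out of the file.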
In the previous article, I described how poetry can be used to configure a Python workspace and to create a new Python package project. Although poetry creates the structure of a package and adds some boilerplate code, developing this package in VSCode requires some additional configuration. In this post, I describe how to start developing a new Python package project in VSCode.
In the previous article, I described my approach to configuring a Python workspace. I mentioned there that I do not use poetry because it “cannot be used to specify dependencies when you work with Jupyter notebooks”. However, people (@BasicWolf and @iroln) from the Russian tech website Habr recommended that I take a closer look at poetry, as it apparently can fulfil all my requirements. “Two heads are better than one”, so I started to explore this tool in more depth. Indeed, I managed to fulfil all my requirements with it, although some configuration was needed. In this post, I describe how to configure poetry to meet my requirements and how to use it.