Nowadays, it is a quite popular to store semi-structured information using JSON format. Indeed, JSON files have quite simple structure and can be easily read by human beings. JSON syntax allows one to represent complex dependencies in data and avoid data duplication. Moreover, all modern programming languages have libraries that facilitate JSON parsing and storing data into this format. Not surprisingly, JSON is extensively used to return data in Application Programming Interfaces (APIs) .
At the same time, data analysts prefer to deal with structured data represented in the form of series and dataframes. Unfortunately, transforming JSON data into structured format is not that straightforward. Previously, I preferred to develop code to parse manually complex JSON files and create a pandas dataframe from the parsed data. However, recently I have discovered a pandas function called
json_normalize that saved me some time in my projects. In this article, I explain how you can start using it in your projects.
In my previous articles (Configuring Python Workspace and Configuring Python Workspace: Poetry), I have described how I use pyenv to create several virtual environments. With the lapse of time, the tools that you install in these environments become outdated and you need a tool to update them. I develop a pyenv plugin that updates all packages in all or particular pyenv environments and in this post I describe how to use it.
Recently, I have updated my operating system, and as a part of this process I have installed the latest poetry version (a tool for Python dependency management). When I have started a new project using my typical routine, I have discovered that poetry cannot install development dependencies exiting with a weird
Recently, I have participated in a project at AI Superior aimed at the analysis of a dataset with sensitive data. So as the data have to remain private, initially we shared the dataset through a secure channel and took measures to prevent its accidental distribution (we put the dataset in a separate directory and configured git to ignore this folder and other directories containing intermediate processing results). However, working on this project I have noticed that Jupyter notebook, that is a kind of standard tool used for data analysis, may be a source of sensitive data leakage.
In the previous article, I have described how poetry can be used to configure Python workspace and to create a new Python package project. Although poetry creates the structure of a package and adds some boilerplate code, in order to develop this package in VSCode we need to do some additional configurations. In this post, I describe how to start developing a new Python package project in VSCode.
In the previous article, I have described my approach to configure Python workspace. I mentioned there that I do not use poetry because it “cannot be used to specify dependencies when you work with Jupyter notebooks”. However, people (@BasicWolf and @iroln) from the Russian tech website Habr recommended me to look at poetry closer, as it apparently can fulfil all my requirements. “Two heads are better than one”, and I started to explore this tool deeper. Indeed, I have managed to fulfil all my requirements with this tool but with some configurations. In this post, I describe how to configure it to meet my requirements and how to use it.
I like Python. For the last several years, I have used it extensively in my research. There are a lot of useful libraries, and it is an equally powerful language for writing simple scripts, producing large systems, doing data analysis and machine learning. It is very laconic and allows you to use different programming paradigms. It is quite easy to start developing in Python: modern operating systems are either already supplied with a Python interpreter or provide you with an easy way to install it. However, when you start developing more professionally using this language, you discover that its ecosystem is quite complicated. In this article, I try to shed a bit more light on the topic how to configure Python workspace.