A short story moving Minions into Python 3

Shkelqim Memolla
9 min read · Mar 25, 2022

It all started on…

one day in mid-2020, when we were discussing some potential features that could be added to one of our main services, which we call Minion. The new changes would improve some sections of the service, but on the other hand would add more tech debt. Yes, that’s right. We’d be improving an already outdated service running on Python 2. That was the moment we realized that Python 2 had been past its end-of-life since January 2020, and that the idea of moving to a newer version had been delayed and ignored. It was time to start a more interesting challenge: moving away from Python 2 to Python 3. Before talking about the hows and whats, let’s have a quick overview of …

The current setup

The PGS team at Stylight plays an important role in the product. That means taking ownership of, and being responsible for, different services ranging from internal ones to external ones, i.e. those for Stylight’s partners.

The Minion code mentioned above is shared among three main services:

  • A job processing service built with Python 2 and RQ
  • An internal API service built with Python 2 and GraphQL
  • A data consuming service built with Python 2 and Tornado

With Python being the primary language in our team, that means many services need to be ported to Python 3.

All our services run inside Docker containers in AWS. A simplified pipeline is shown below:

CircleCI Pipeline

Having the same code shared among multiple services poses the threat of breaking them all at once. In order to ensure a smooth transition we had to come up with a …

Migration strategy

In the testing step of our CircleCI pipeline we run all unit and integration tests together, and that usually takes some time. Since we were planning to extend our pipeline to support Python 3, we had to improve the testing time. With some minor effort we changed our CircleCI configuration to run the tests in parallel, with four containers instead of a single one.

Running tests in parallel in CircleCI

Before starting any development, we wanted to make sure that any file change is tested with a Python 3 interpreter as well. That defines the first step in the migration strategy:

  1. Adapt CircleCI pipeline to build and run the new changes with Python 3. The setup would be changed to:

In this way we’d ensure that a migrated file would build and run successfully with Python 3. One thing worth emphasizing is that we’d maintain a list of already migrated files/folders, so that the Python 3 pipeline in CircleCI would run only on those files.

Now that the build setup is ready, the question is where to start, i.e. which Python file should be picked first? And that brings us to the second step:
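As a rough illustration of the allowlist idea, the sketch below decides whether a changed file falls under an already migrated path. The paths and the helper name `is_migrated` are hypothetical; how the real list was stored and wired into CircleCI is not covered in this post.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist of migrated files/folders (not the real Minion layout).
MIGRATED = ["minion/utils", "minion/jobs/base.py"]


def is_migrated(path, migrated=MIGRATED):
    """Return True if `path` is a migrated file or lives inside a migrated folder."""
    candidate = PurePosixPath(path)
    for entry in migrated:
        entry_path = PurePosixPath(entry)
        if candidate == entry_path or entry_path in candidate.parents:
            return True
    return False


changed = ["minion/utils/text.py", "minion/jobs/base.py", "minion/jobs/run.py"]
# Only these files would be handed to the Python 3 job.
py3_targets = [path for path in changed if is_migrated(path)]
```

Matching on path components rather than string prefixes avoids treating a sibling folder such as `minion/utils_extra` as migrated.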

2. Create a topological sort of file interdependency

Here we wanted to make sure that the least referenced file/module would be picked first, so that the changes are minimized and isolated. For that we used graphviz to generate the following graph:

A representation of module relationships. From top to bottom, with the top nodes having no dependencies and the bottom nodes depending on the top ones. PS: This graph is for one service only.

When generating this graph we made sure to exclude any third-party packages in order to keep it simple. From there it was easy to generate the file names in the same order. At the end of this step we had in our hands a huge list of Python files sorted according to the order of relevancy and imports.
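The ordering idea can be sketched with the standard library alone. This is not the graphviz-based tooling we used; it is a minimal illustration that assumes Python 3.9+ (for `graphlib`), module sources that the Python 3 parser accepts, and hypothetical module names. Third-party and standard library imports are ignored, as in the graph above.

```python
import ast
from graphlib import TopologicalSorter


def local_imports(source, local_modules):
    """Return the first-party modules imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        # Keep only modules that belong to the codebase itself.
        found.update(name for name in names if name in local_modules)
    return found


def porting_order(sources):
    """Order modules so that each one comes after the local modules it imports."""
    graph = {
        name: local_imports(code, sources.keys()) - {name}
        for name, code in sources.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical three-module codebase.
sources = {
    "jobs": "from models import Thing\nimport utils\n",
    "models": "import utils\n",
    "utils": "import os\n",
}
order = porting_order(sources)
```

Here `utils` comes out first because it imports nothing from the codebase, so porting it touches no other local module.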

3. Divide and Conquer

Each team member would now pick a file, put their name on it and start migrating it to Python 3. In this way we’d know who is tackling what and which file to pick up next.

We followed three simple conventions:

  • Add a “@PORTED” line at the top of the file denoting that this file is already migrated to Python 3
  • Introduce six to have compatible code for both Python versions
  • Add a “@PyCompat” line at the beginning of a block of code where six was used.

The reason for the first one was that at the end of the port we’d expect not to find any Python file without the “@PORTED” annotation. That would imply all files were modified at some point and successfully migrated.

For the second point, we wanted to maintain compatibility between the two versions and perform incremental migrations in smaller batches.

The last point would serve more as a reminder of which blocks needed to be cleaned up after the code was fully migrated.
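A check along the following lines could list the files that still lack the header. It is only a sketch: the helper names, the sample files and the choice to look at the first five lines are assumptions, not our actual tooling.

```python
def is_ported(source, marker="@PORTED", header_lines=5):
    """Return True if the marker appears within the first few lines of a file."""
    return any(marker in line for line in source.splitlines()[:header_lines])


def unported(files):
    """Given {path: source}, return the sorted paths still missing the marker."""
    return sorted(path for path, source in files.items() if not is_ported(source))


# Hypothetical file contents, keyed by path.
files = {
    "minion/utils.py": "# @PORTED\nimport six\n",
    "minion/jobs.py": "import utils\n",
}
remaining = unported(files)
```

An empty result would mean every file has been touched by the migration.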

4. Porting the Codebase

This is the step where we would start making changes. But before diving into them, we had to know what to look for. For that we created a file shared within the team containing all incompatibilities between the Python versions. It would act as a reference while going through the code.

4.1. Dealing with incompatibilities

The overall migration process went smoothly. We had some challenges in these three areas though:

  • Iterators & Lists

In Python 2, dict keys/values/items, range, map and filter all return lists, whereas in Python 3 they return lazy objects: dictionary views, a range object, and iterators for map and filter. The existing test cases would not catch these issues. What helped here was the incompatibility list and new test cases.
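A small illustration of the two pitfalls we kept running into, run under Python 3 (the dictionary is made up for the example):

```python
prices = {"shoes": 50, "bags": 80}

# Python 2: prices.keys() is a list, so prices.keys()[0] works.
# Python 3: it is a view, and indexing it raises TypeError.
try:
    prices.keys()[0]
    indexable = True
except TypeError:
    indexable = False

# Portable fix: build a list explicitly wherever a list is really needed.
first_key = sorted(prices)[0]

# Python 3: map() returns a one-shot iterator, so a second pass sees nothing.
doubled = map(lambda value: value * 2, prices.values())
first_pass = list(doubled)
second_pass = list(doubled)
```

The second case is the nastier one, since nothing fails: the code simply processes an empty sequence the second time around.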

  • Strings & Bytes

Strings are used almost everywhere in the codebase. Compared to the iterators, the issues related to strings and bytes most of the time surfaced by themselves through the existing tests. What turned out to be challenging was that some places required strings to be returned while others required bytes. Sometimes that resulted in a domino effect of breaking some methods while fixing others. In short, we had to make sure that the inputs, outputs and conversions of strings to bytes and vice versa were consistent.
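One common way to keep such conversions consistent is to funnel them through a pair of small helpers at the boundaries. The sketch below shows the idea for Python 3 only; the helper names and the UTF-8 default are assumptions, not necessarily what the Minion code uses.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str, decoding only when it is bytes."""
    return value.decode(encoding) if isinstance(value, bytes) else value


def to_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, encoding only when it is str."""
    return value.encode(encoding) if isinstance(value, str) else value


raw = "Zürich".encode("utf-8")

# In Python 3 bytes and str never compare equal, which is how many of
# these bugs showed up in tests.
same = raw == "Zürich"
```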

  • File Readers & Streams

We work with files and streams a lot, and by that I mean really a lot. Files play a critical role in our day-to-day job, and for that we have written efficient and optimized parsers and streams.

While porting these parts of the code we had to be careful not to introduce extra operations. What do I mean by that? Imagine these cases:

In Python 2 -> read from file as strings -> return strings -> caller expects strings.

In Python 3 -> read from file as bytes -> return bytes -> caller expects strings.

In this case we’d be breaking the caller’s contract by returning incompatible data types. The other option would be to convert the bytes into strings and keep the same behavior as before. But this would result in an extra step, which could be costly. What we did was adapt the original caller and its chain of callers. Of course this resulted in more code changes, but the same performance was kept.
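To make the trade-off concrete, here is a minimal sketch of a parser that stays in bytes from the stream to the caller, so no decode step is added on the hot path. The function name and the tab-separated format are made up for the illustration.

```python
import io


def iter_fields(stream, sep=b"\t"):
    """Yield the fields of each line of a binary stream as lists of bytes."""
    for line in stream:
        # Both the stripped newline and the separator are bytes, so no
        # decoding happens anywhere in the loop.
        yield line.rstrip(b"\n").split(sep)


stream = io.BytesIO(b"shoes\t50\nbags\t80\n")
records = list(iter_fields(stream))
```

The callers then have to accept bytes too, which is exactly the chain of changes described above.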

PS: This section is so important for us that we usually ask many interview questions about it.

4.2. Replacing libraries

Python 3 comes with extra standard libraries compared to Python 2. In this step we noted down all our installed packages and made a cross check on what’s already present in the Python 3 standard library. With that in mind we replaced packages such as:

  • mock with unittest.mock
  • unicodecsv with csv

and a bunch of others.

In terms of incompatible changes, there were not many. This step turned out to be relatively easy compared with the others.
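For the two packages named above, the standard library replacements look roughly like this in Python 3. The CSV content and the `fetch` stand-in are invented for the example.

```python
import csv
import io
from unittest import mock  # replaces the third-party `mock` package


def read_rows(text_stream):
    """Parse CSV from a text stream; Python 3's csv module works on str directly."""
    return list(csv.reader(text_stream))


rows = read_rows(io.StringIO("name,price\nZürich bag,80\n"))

# A stand-in for some collaborator, built with the standard library mock.
fetch = mock.Mock(return_value=rows)
result = fetch("catalog.csv")
fetch.assert_called_once_with("catalog.csv")
```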

4.3. Moving Datadog outside the service

Collecting metrics and stats is done via Datadog. The issue was that the agent we had in place was installed inside the service itself. Even after porting the codebase we’d still have external dependencies on Python 2.

The advantage was that the service was already containerized, so it would be easier to have a newer agent in a separate container. Thus moving the agent to a Python 3 Docker container required just a few changes to the existing setup.

Splitting Datadog from the service

So far we had a codebase which would run on both Python 2 and 3. All the changes we had made were tested with both versions, but production was still running on Python 2. Clearly one thing was missing:

5. Switching to the Python 3 interpreter in production

This was the moment we had all been waiting for. Then one Monday morning we all gathered, looked each other in the eyes and nodded in agreement. We pressed the button, and all watched like little children to see what was going to happen.

Enabling Python 3 workflow

What could go wrong?

Not much actually. We could still easily roll back without any effort. In the end we had code that was compatible with both versions. During this time we just had to sit back and monitor the service.

Did anything go wrong with all the precautions in place?

The entire migration process was done progressively, i.e. once a portion was ported it would be deployed immediately. That significantly reduced the number of issues that could pop up in the final state. From time to time we had minor issues which were resolved easily. Once we switched to Python 3 there was still one issue that slipped through the cracks, not a minor one but also not too critical. Without going into too much detail, there were a few jobs containing datetime objects that had been pickled with Python 2 and were un-pickled from Redis with Python 3, and that caused them to fail. This was initially mitigated with an intermediary step that overrode the default functionality of rq.job.loads to unpickle in bytes, and later fixed with the RQ package upgrade.
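The failure mode can be reproduced with the standard pickle module alone. The sketch below is not our RQ patch: the payload is a hand-written sample resembling what Python 2 would write for datetime.datetime(2015, 1, 1) with protocol 2, and `loads_py2` is a hypothetical helper showing the "unpickle in bytes" idea.

```python
import datetime
import pickle

# Hand-written sample resembling a protocol 2 pickle of
# datetime.datetime(2015, 1, 1) as written by Python 2 (an assumption).
PY2_PAYLOAD = (
    b"\x80\x02cdatetime\ndatetime\nq\x00U\n"
    b"\x07\xdf\x01\x01\x00\x00\x00\x00\x00\x00"
    b"q\x01\x85q\x02Rq\x03."
)


def loads_py2(data):
    """Unpickle data written by Python 2, keeping its 8-bit strings as bytes."""
    return pickle.loads(data, encoding="bytes")


# With the default ASCII encoding, the packed datetime state cannot be decoded.
try:
    pickle.loads(PY2_PAYLOAD)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

restored = loads_py2(PY2_PAYLOAD)
```

Note that with encoding="bytes" every Python 2 str in the payload comes back as bytes, so this is a stopgap rather than a general solution.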

6. Removing Python 2 code

Remember the annotations introduced at the beginning, i.e. “@PyCompat” and “@PORTED”?

This was the step where we hit CTRL+F for those occurrences. As a requirement, all the Python files listed earlier should contain the “@PORTED” header line. That translates to opening such files one by one, removing any code block with the “@PyCompat” annotation and finally removing the “@PORTED” header.

Then we got rid of Python 2 specific Docker files and adapted the CircleCI config to permanently remove the Python 2 jobs.

Aftermath

This turned out to be a really interesting challenge. In the end we were left with a newly polished service, free of Python 2 and lovelier to work with. We, as engineers, had the chance to dive deep into different sorts of topics in both Python versions, including their source code.

To summarize what helped:

  • Having a clear migration strategy
  • Running/building the codebase with both Python versions
  • Testing, testing and testing. Not just unit testing, but also the integration testing with the help of staging environments.

What’s next on the roadmap?

As mentioned in the beginning, the Minion code is shared among three services. Our goal is to split them into smaller microservices. Does that sound like a challenge you could take on? If so, we have many open positions and would encourage you to apply.

I hope you learned something through this reading and here’s a mandatory reference:

Minion codebase in Stylight

Author: Shkelqim Memolla

Hey there! I’m Shkelqim (pretty hard to pronounce, I know), a Software Engineer with a keen eye for clean, maintainable and efficient code. Feel free to say hi or connect! This is my comfort place where I usually write down my thoughts on various topics.

Shkelqim Memolla

Lead Engineer @Stylight & METU graduate. Think it, build it, ship it!