Kyle Edwards

Python Concurrency and Asyncio

Introduction to Asynchronous Python

Modern languages like Python and Node.js implement a flavor of concurrency called cooperative concurrency. This model is not primarily concerned with parallelism or multithreading, but rather with the ability of a single thread to juggle various tasks by means of an event loop. When one task needs to wait for some asynchronous process to complete, whether that's a timer, an HTTP request, or a database query, it can yield its execution back to the loop so something else can run.

The Pythonic way of running cooperative, asynchronous code is the asyncio module and the async/await syntax. When you await in an async function, execution is yielded up the call stack so something else can run. You're effectively saying, "Hey, I know what I'm about to do is going to take a second, so go do something else instead."
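A minimal sketch of that idea (the function names here are invented for illustration): while the coroutine is parked on `await asyncio.sleep`, the event loop is free to run other work.

```python
import asyncio

async def fetch_data() -> str:
    # Simulate a slow I/O operation; await yields control back to the event loop
    await asyncio.sleep(0.1)
    return "payload"

async def main() -> None:
    result = await fetch_data()
    print(result)

asyncio.run(main())
```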

These features were originally accessible through generator-based coroutines, but now you can define asynchronous functions with the async def syntax. These functions, when called, don't run their bodies or return their return value. Instead they return an awaitable coroutine object. Additional awaitable mechanisms include Tasks and Futures.
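This is easy to observe directly. Calling an async def function runs none of its body; you get a coroutine object back, which only produces the return value when driven to completion (here via asyncio.run):

```python
import asyncio

async def greet() -> str:
    return "hello"

coro = greet()                # nothing has run yet; this is just a coroutine object
print(type(coro).__name__)    # coroutine
print(asyncio.run(coro))      # hello
```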

Tasks provide a way to schedule coroutines concurrently on the event loop and provide some high-level functionality around cancellation. Futures are lower-level objects that represent the eventual result of an asynchronous operation. Futures are analogous to Promises in JavaScript.
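A small sketch of Task-based scheduling (worker names and delays are made up): wrapping coroutines in Tasks with asyncio.create_task schedules them on the loop immediately, so the two sleeps overlap instead of running back to back.

```python
import asyncio

async def worker(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name

async def main() -> list[str]:
    # create_task schedules each coroutine on the event loop right away;
    # both sleeps then run concurrently rather than sequentially.
    t1 = asyncio.create_task(worker("a", 0.1))
    t2 = asyncio.create_task(worker("b", 0.1))
    return await asyncio.gather(t1, t2)

print(asyncio.run(main()))   # ['a', 'b']
```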

Generators and AsyncIO

You can think of a generator as an extension of a function that has the ability to store its internal state, including a frame containing its local variables. Generators in Python can even have values sent into them with .send(). This means that your application's execution can jump in and out of a coroutine and its caller.
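For example (a toy accumulator, not from any library), each .send() resumes the generator where it paused, with its internal state intact:

```python
def accumulator():
    total = 0
    while True:
        value = yield total   # execution pauses here; `total` is preserved in the frame
        total += value

gen = accumulator()
next(gen)            # advance to the first yield
print(gen.send(10))  # 10
print(gen.send(5))   # 15
```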

Coroutines are an opinionated implementation of an asynchronous pattern built using generators. In Python 3.4, the coroutine() function simply returns the passed-in generator; most of the functionality around coroutines lives in the Task implementation. There, coroutines are run on the event loop, yielding their execution whenever possible. The Task is then rescheduled on the event loop with loop.call_soon(task._step), unless that task is complete or an exception was caught. This is known as a trampoline pattern, where work reschedules itself. Because of how event loops iterate through callbacks, trampolines (usually) nicely interleave workloads. On each iteration, the loop:

  1. Calls the callbacks that are currently ready.
  2. Polls for I/O using the selector.
  3. Schedules the resulting callbacks for the next iteration.
  4. Checks whether any call_later (scheduled) callbacks are now due.
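The trampoline idea can be sketched with a toy loop (this is a drastic simplification invented for illustration, not asyncio's actual implementation): each piece of work does one step, then reschedules itself with call_soon, so two workloads interleave naturally.

```python
import collections

class ToyLoop:
    """Drastically simplified event loop: a FIFO of ready callbacks."""
    def __init__(self):
        self.ready = collections.deque()

    def call_soon(self, cb):
        self.ready.append(cb)

    def run(self):
        while self.ready:
            self.ready.popleft()()

def make_counter(loop, name, n, out):
    def step(i=0):
        if i < n:
            out.append(f"{name}{i}")
            loop.call_soon(lambda: step(i + 1))  # trampoline: reschedule next step
    return step

loop = ToyLoop()
out = []
loop.call_soon(make_counter(loop, "a", 2, out))
loop.call_soon(make_counter(loop, "b", 2, out))
loop.run()
print(out)   # ['a0', 'b0', 'a1', 'b1'] — the two workloads interleave
```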

It's important to note that on a single thread, only one coroutine can execute at a given time. There is NO PARALLELISM in cooperative concurrency. This is why it's extremely important to use non-blocking code whenever possible, which either yields at an extremely low level or breaks the work into small enough chunks to give other operations a chance to get time on the event loop.

Coroutines store their internal state in cr_-prefixed member variables, while generators store theirs in gi_-prefixed ones. Even in the C implementation of the coroutine code, coroutines reuse some generator methods.
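You can see the parallel attributes from the REPL (trivial throwaway functions below):

```python
async def coro_fn():
    pass

def gen_fn():
    yield

c = coro_fn()
g = gen_fn()
# Both hold a suspended frame until they run to completion
print(c.cr_frame is not None, g.gi_frame is not None)   # True True
c.close()   # avoid a "coroutine was never awaited" RuntimeWarning
```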

A quirk about Python Tasks is that inner async functions run as regular generator coroutines and don't get wrapped as Tasks. See PEP 380 for more information.

Async code works entirely on the basis of code that schedules its subsequent workload as a callback on the event loop. The event loop responds to these callbacks, alongside OS-level IO events, making it possible to interleave workloads on a single thread.

In modern Python (3.8+), Futures are still extensions of generator-based coroutines; however, they implement the __await__ dunder method to support the Awaitable interface. They also support the legacy __iter__ method for older Python concurrency code using the yield from syntax. The yield from syntax is interesting because it shows that users of these generators simply wrapped the underlying generator and re-yielded its output as their own until it was complete.
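That delegation behavior is plain generator semantics (toy functions below): yield from transparently re-yields everything the inner generator produces, and the inner generator's return value becomes the value of the yield from expression.

```python
def inner():
    yield 1
    yield 2
    return "done"          # becomes the value of the `yield from` expression

def outer():
    result = yield from inner()   # transparently re-yields inner's values
    yield result

print(list(outer()))   # [1, 2, 'done']
```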

At the core of coroutine __await__s, an awaited coroutine eventually terminates by simply yielding a value. I'm not currently sure whether that yield is understood as the process still working or as being done. For example, how does a library like httpx deal with this?

Awaitables, Coroutines, Futures, and Tasks

Cancelling

When you use something like an asyncio.wait_for timeout, it cancels the internal Task. If asyncio.gather is cancelled, it cancels the running awaitables as well. When a coroutine gets cancelled, any currently running await will receive a CancelledError exception.
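A small demonstration (function names invented): on timeout, wait_for cancels the inner task, the CancelledError surfaces at its pending await, and wait_for then raises TimeoutError to the caller. Re-raising CancelledError after cleanup keeps the task properly marked as cancelled.

```python
import asyncio

async def slow():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        print("cancelled")   # the pending await received the cancellation
        raise                # re-raise so the task is actually marked cancelled

async def main():
    try:
        await asyncio.wait_for(slow(), timeout=0.1)
    except asyncio.TimeoutError:
        print("timed out")

asyncio.run(main())   # prints "cancelled" then "timed out"
```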

Questions

Why is it not better to just run a bunch of different python workers as separate OS threads?

Because the Python runtime requires a lot of baseline memory, and using cooperative coroutines (green threads) is a good tradeoff to achieve concurrency.

Debugging

Set the PYTHONASYNCIODEBUG=1 and PYTHONTRACEMALLOC=1 environment variables to enable useful asynchronous traces. Also, using type declarations will help in identifying where coroutines are not awaited properly.

Async IO Design Patterns

Explore: Can you mock awaitables by supporting a .__await__() dunder method?
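Exploring that question: it does appear to work. Any object whose __await__ returns a generator-iterator satisfies the Awaitable protocol, and the generator's return value becomes the awaited result. The class below is an invented sketch, not an established mocking pattern.

```python
import asyncio

class MockAwaitable:
    """An object made awaitable purely via __await__ (exploratory sketch)."""
    def __init__(self, value):
        self.value = value

    def __await__(self):
        if False:
            yield           # makes this a generator function (never actually yields)
        return self.value   # a generator's return value is the awaited result

async def main():
    result = await MockAwaitable(42)
    print(result)

asyncio.run(main())   # prints 42
```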

Hesitations and Caveats

There are some possible problems with AsyncIO, however it seems as long as you keep a few considerations in mind, you should be fine. See Lynn Root - Advanced asyncio: Solving Real-world Production Problems.

Additional Resources

Keep In Mind

AsyncIO and Threads

Threaded code tends to be blocking, and async code can often be thread-unsafe. One possible way to bridge that gap between the two is to use a thread-safe queue to pass messages between threads.

Async queues are not thread-safe by default, but you may be able to coerce safety by inserting into and reading from an async queue via loop.call_soon_threadsafe.
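One possible shape for that bridge (the producer setup here is invented for illustration): the worker thread never touches the asyncio.Queue directly; it asks the loop to do the put on the loop's own thread via call_soon_threadsafe.

```python
import asyncio
import threading

async def main():
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue[str] = asyncio.Queue()

    def producer():
        # Runs in a worker thread: hand items to the loop thread safely
        # instead of mutating the async queue from this thread.
        for item in ("one", "two"):
            loop.call_soon_threadsafe(queue.put_nowait, item)

    t = threading.Thread(target=producer)
    t.start()
    first = await queue.get()
    second = await queue.get()
    t.join()
    print(first, second)   # one two

asyncio.run(main())
```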

Threading and the GIL

Due to Python's Global Interpreter Lock (GIL), threading does not help with CPU-bound problems. I'm kind of running this point into the ground, but only use threading with I/O-bound code.

Additional Resources

Parallels to Node.js

Node.js is built on top of libuv rather than the native loop implementation Python uses; however, it's possible to replace Python's event loop using a library like uvloop. Most asynchronous calls in Node can be run on the event loop in the same thread, but file I/O is not: it runs in a separate thread via the thread pool. This is a particular issue in asynchronous Python. Even libraries like aiofiles seem to defer file work to a ThreadPoolExecutor, whereas Node.js uses a preallocated thread pool with 4 threads.

Node.js asynchronous code uses asynchronous primitives whenever possible, and then uses thread pools for I/O- and CPU-bound operations that are not supported by OS-level primitives like epoll, kqueue, etc.

This is what the asyncio and aio* community-driven projects are trying to attain.

Additional Resources