An Introduction to Asynchronous Programming using Python’s asyncio
Asynchronous programming solves a VERY specific problem: if you have a program that sits there waiting for some other routine to complete, but (crucially) could be doing other useful work while it waits, then you might want to use asynchronous programming.
(This is a Jupyter Notebook! Follow along with the most recent version on GitHub)
What might a routine look like that waits around? Perhaps an i/o routine!? Yes! That’s where the python package name comes from: asyncio provides a way to start an i/o operation and then do something else while you wait for it to finish. Another big use of asynchronous programming is user interfaces, where you are waiting for input most of the time but need to quickly process input when it arrives and then go back to waiting for more.
Figure 1: Proof why it’s a good thing I’m not a front-end developer and a diagram of why asynchronous programming can be faster.
You might rightfully be asking how/why this is different from using threads. Well, it’s similar but different. You could do everything here by spawning multiple threads, but threads aren’t free as far as resources go: you may be in a situation where you need to use only one thread. More importantly, within a single thread, execution looks like the synchronous example in Figure 1 above: if the thread has to wait for something, it blocks until that something completes, unless you’re using asynchronous programming.
Enough talking, let’s get started with some code! The syntax for asyncio got much cleaner and easier to follow/read as of version 3.6, so I need to make sure we’re using version 3.6 or later.
import sys
# compare as a tuple so a hypothetical future major version wouldn't wrongly fail a minor-version check
assert sys.version_info >= (3, 6)
import asyncio
import time
The key object here which allows asynchronous programming to work is called the event loop:
loop = asyncio.get_event_loop()
# loop.set_debug(True)  # this provides MUCH more helpful error messages; always do this when debugging/developing
# Not set here to keep the notebook output looking cleaner
You place objects called coroutines into the loop. A coroutine is an object which knows how to cede control back to the event loop and then let the event loop know when it is ready to resume control. The simplest coroutine is one that just sleeps by ceding control to the event loop and then prints something:
async def simple_sleeping_coroutine(label):
    waiting_time_start = time.time()
    await asyncio.sleep(1.0)
    print(f"routine {label} waited {time.time() - waiting_time_start} s")
There are a couple of new things here for a lot of programmers. First off, the keyword async tells the interpreter that we are writing a coroutine and thus that it is an object which can be put into the event loop. Then the keyword await (expression) is where the magic happens! It tells the interpreter: “OK! I’m done until the expression is done computing, then I need to get control back eventually.” Crucially, that expression must itself be awaitable (such as another coroutine) so that it, too, can talk to the event loop. True to the spirit of the python syntax which came before it, await is just like reading plain language: await this result to be finished. We now have a coroutine; let’s put this guy into the event loop.
loop.run_until_complete(simple_sleeping_coroutine("a"))
This is a rather simple example: we put only one coroutine on the event loop and we ran it until it completed. The real power comes from placing multiple coroutines on the event loop, which we can do using asyncio.gather.
loop.run_until_complete(asyncio.gather(simple_sleeping_coroutine("a"), simple_sleeping_coroutine("b")))
You will note that the above call output [None, None], and you can indeed use return statements inside of coroutines, like so:
async def less_simple_sleeping_coroutine(label):
    waiting_time_start = time.time()
    await asyncio.sleep(1.0)
    sleeping_time_s = time.time() - waiting_time_start
    print(f"routine {label} waited {sleeping_time_s} s")
    return sleeping_time_s
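Just to see that in action, running it on its own hands the return value back through run_until_complete:

loop.run_until_complete(less_simple_sleeping_coroutine("a"))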
You can also use a comprehension to make multiple coroutines: calling a coroutine function just creates an object, which doesn’t run until the event loop schedules it, so there is no reason you can’t.
loop.run_until_complete(asyncio.gather(*[less_simple_sleeping_coroutine(i) for i in range(5)]))
It’s also worth noting that because only one coroutine is ever running at a time, you can use a simple collection like a list instead of the thread-safe containers you would need with a threading or multiprocessing solution. Let me reiterate, since this is really, really cool and does a lot to make asynchronous programming so attractive as a paradigm: you can use any data structure you want when programming asynchronously. That may be hard to follow, so here’s an example:
collection_list = []

async def collection_example_sleeping_coroutine(label):
    waiting_time_start = time.time()
    await asyncio.sleep(1.0)
    sleeping_time_s = time.time() - waiting_time_start
    print(f"routine {label} waited {sleeping_time_s} s")
    collection_list.append(sleeping_time_s)  # safe without locks: only one coroutine runs at a time
loop.run_until_complete(asyncio.gather(*[collection_example_sleeping_coroutine(i) for i in range(5)]))
collection_list
You might be wondering why the routines printed their output when they did. The short answer is that the event loop does what it will do: it tries its best to schedule efficiently, but you can’t be guaranteed, unless you program it to do so, that one routine will come before another.
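If you do need one routine to finish before another starts, the simplest way is to await them sequentially inside a wrapper coroutine. Here is a minimal sketch using the coroutine from above (run_in_order is just an illustrative name, and you do give up the concurrency between the two):

async def run_in_order():
    # "second" is not even created until "first" has completely finished
    await less_simple_sleeping_coroutine("first")
    await less_simple_sleeping_coroutine("second")

loop.run_until_complete(run_in_order())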
The End of Contrived Examples: a webcrawler
OK, well, that’s all been well and cool, but these are still completely contrived examples, since all they did was sleep; no i/o was done… until now! So let’s examine something this would be very useful for: web-crawling. Say you have a list of websites you want to grab data from:
url_test_list = [
'http://ziprecruiter.com',
'http://google.com',
'http://reddit.com',
'http://news.ycombinator.com/',
'http://httpbin.org/',
'http://scouting.org',
'https://en.wikipedia.org/wiki/Main_Page',
'https://www.amazon.com/',
'http://github.com',
'http://fakey.fakefake'
]
These websites will all take a different amount of time to load based on a bunch of factors: how much data they’re serving, how the code that delivers that data is written, and many other things. That alone isn’t necessarily enough to justify using asynchronous programming, but say we want to do something with the data once we’ve fetched it: perhaps count all instances of < and > in the html of the returned website. That is indeed something we can do while waiting for another website to load, so this is a PERFECT instance of how asyncio should be used!
So with that in mind, let’s write our fetcher coroutine. We’re going to need a connection library whose calls are themselves coroutines. Asyncio has network connection primitives, but in the interest of conciseness I’m going to use aiohttp, which provides everything we need for this example in a simple package. You can also easily write servers with that package, but we’re just focusing on a client for now. OK!
import aiohttp
import async_timeout

# this is basically just taken from the example on their website
async def async_fetch_text(url):
    try:
        # these are coroutine context managers: pretty cool!
        async with aiohttp.ClientSession() as session:  # wait to get the session, cede control to the event loop
            async with async_timeout.timeout(10):
                async with session.get(url) as response:
                    return await response.text()
    except aiohttp.ClientConnectorError:
        print(f"cannot connect to {url}")
        return ""
# this doesn't really need to be async because it needs the entire thread's attention to calculate
def count_angle_brackets(text_input):
    left_count = 0
    right_count = 0
    for char in text_input:
        if char == '<':
            left_count += 1
        if char == '>':
            right_count += 1
    return left_count, right_count
async def async_worker(url):
    start_time = time.time()
    url_text = await async_fetch_text(url)
    print(f"fetching {url} took {time.time() - start_time} s")
    count = count_angle_brackets(url_text)
    return url, count
%time loop.run_until_complete(asyncio.gather(*[async_worker(url) for url in url_test_list]))
Just to convince you that this is indeed faster than doing it synchronously, let’s quickly do it the old-fashioned synchronous way!
import requests

def sync_fetch_text(url):
    try:
        r = requests.get(url)
        return r.text
    except requests.ConnectionError:
        print(f"Cannot connect to {url}")
        return ""
def sync_worker(url):
    start_time = time.time()
    url_text = sync_fetch_text(url)
    print(f"fetching {url} took {time.time() - start_time} s")
    count = count_angle_brackets(url_text)
    return url, count
%time [sync_worker(url) for url in url_test_list]
Beautiful! The asynchronous solution is much quicker!
Various Tips
There are a few more things worth covering in a basic introduction:
Another way to start the event loop running
The event loop can be started in another way. So far we used gather and run_until_complete, but the loop also has a run_forever method; to get there, you have to register tasks with the loop first, like so:
loop.create_task(less_simple_sleeping_coroutine("a"))
loop.create_task(less_simple_sleeping_coroutine("b"))
But if we run the loop forever, we can’t run anything more in the python interpreter! So I find it useful to define a coroutine which stops the loop after a while, to avoid literally running forever.
async def time_limiter(s_to_wait):
    await asyncio.sleep(s_to_wait)
    loop.stop()
loop.create_task(time_limiter(3))
%time loop.run_forever()
This pattern would also make it possible to use a coroutine with a non-terminating loop to keep sending requests off to somewhere for some reason or another: very powerful!
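For instance, here is a minimal sketch of such a non-terminating coroutine. The heartbeat name, the print, and the one-second interval are just placeholders for whatever periodic work (e.g. firing off a request) you actually need; time_limiter from above keeps the demonstration from running forever:

async def heartbeat():
    while True:  # never returns on its own; it runs until the loop itself stops
        print(f"heartbeat at {time.time():.1f}")  # stand-in for real periodic work
        await asyncio.sleep(1.0)

loop.create_task(heartbeat())  # the still-pending heartbeat task is simply abandoned when the loop stops
loop.create_task(time_limiter(3))
loop.run_forever()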
Firing off a New Thread
If you have a routine you want to fire off in its own thread or process to run in parallel, you can do that using loop.run_in_executor. You might do this because you simply need to run a legacy function merged with your fancy modern asynchronous code. Or perhaps you have a heavy task you want to send off to its own process. loop.run_in_executor needs an Executor object to tell it how to run, a function handle, and then the arguments to that function. It is super useful for anything that needs its own thread or process and for wrapping legacy code that you don’t have the time to re-write; it provides a nice bridge between asynchronous programming and parallel programming. But how do you use it?
# Thread execution:
def fibonacci(n):
    # deliberately naive recursion so the work is CPU-heavy
    if n <= 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
async def async_fib_worker_thread(i):
    start_time = time.time()
    # passing None as the executor uses the loop's default thread pool
    result = await loop.run_in_executor(None, fibonacci, i)
    print(f"Fib_{i}={result} calculated in {time.time() - start_time}")
    return result
%time fib_numbers = loop.run_until_complete(asyncio.gather(*[async_fib_worker_thread(i) for i in range(20,34)]))
print(fib_numbers)
Note, in case you weren’t certain this would happen, that the output of run_until_complete is in order, despite the fact that the calculations did not finish in order. This one you actually can rely on: gather returns its results in the order of the awaitables you passed in, not the order in which they completed.
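Relatedly, if you want each result as soon as it is ready rather than in input order, asyncio.as_completed yields futures in the order they finish. A minimal sketch (fib_in_completion_order is just an illustrative name):

async def fib_in_completion_order():
    tasks = [async_fib_worker_thread(i) for i in (30, 22, 26)]
    for finished in asyncio.as_completed(tasks):
        result = await finished  # smaller inputs should usually come back first
        print(result)

loop.run_until_complete(fib_in_completion_order())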
Now let’s spin up new processes:
# Process Execution
import concurrent.futures

process_executor = concurrent.futures.ProcessPoolExecutor()

async def async_fib_worker_proc(i):
    start_time = time.time()
    result = await loop.run_in_executor(process_executor, fibonacci, i)
    print(f"Fib_{i}={result} calculated in {time.time() - start_time}")
    return result
%time fib_numbers = loop.run_until_complete(asyncio.gather(*[async_fib_worker_proc(i) for i in range(20,34)]))
print(fib_numbers)
The process-based version should have been much quicker, at the expense of using more system resources: each coroutine got its own process, so the work could run truly in parallel with the main event-loop thread, whereas in the thread-based version all the threads had to share time with each other (CPython’s global interpreter lock means only one thread runs Python code at a time).
A Note on the Event Loop
In order to save resources, we really ought to close the event loop when we’re done with it. Don’t worry! You can get a new one back if you need it, but the old one is gone forever:
loop.close()
# to get it back:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
Anyone can develop their own event loop, though, and some people have gone and done exactly that on top of libuv, the same event loop which powers node.js; they call it uvloop, and it claims to be 2-4x faster than the core python event loop for certain examples.
(Before you ask: the python core (CPython) developers don’t make it the default loop because it would introduce extra dependencies, and they try to keep the number of dependencies in core python as low as possible.)
Let’s see if uvloop is indeed faster for our web crawling example!
%time loop.run_until_complete(asyncio.gather(*[async_worker(url) for url in url_test_list]))
loop.close() # since we're done with it!
# from the uvloop docs
import uvloop
# Tell asyncio to use uvloop to create new event loops
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop) # always register the new event loop with asyncio or it can get confused and give you odd errors when it tries to use an event loop other than the one you want it to
%time loop.run_until_complete(asyncio.gather(*[async_worker(url) for url in url_test_list]))
So indeed a bit faster! Very cool!
Further Reading
Even the core developers of python/asyncio will admit the current documentation is pretty terrible. But if you followed everything I wrote, I hope the documentation will now make a lot more sense. What you can do with asyncio is very, very rich: this guide only scratched the surface. The video linked below is a great resource, and here are some other good resources to learn more:
- http://pyvideo.org/europython-2016/asyncawait-in-python-35-and-why-it-is-awesome.html
- https://docs.python.org/3/library/asyncio.html#module-asyncio
- https://hackernoon.com/asyncio-for-the-working-python-developer-5c468e6e2e8e
- https://medium.com/python-pandemonium/asyncio-coroutine-patterns-beyond-await-a6121486656f
- https://pymotw.com/3/asyncio/
Acknowledgements
I want to thank Yury Selivanov, because this document is essentially lecture notes on his videos that I wrote while trying to wrap my head around asynchronous programming. I also want to thank my employer, ZipRecruiter, for requiring that I understand asynchronous programming for a recent project and for letting me write and post this.
And thank you for reading!