Is it possible? Probably not until recently. Many large companies have been investigating migrations to other programming languages to boost their operating performance and save on server costs, but there is no real need. Python can be the right tool for the job, and there is a lot of work happening around performance in the community. CPython 3.6 boosted overall interpreter performance with its new dictionary implementation, and CPython 3.7 is going to be even faster thanks to a faster calling convention and dictionary lookup caches. For number-crunching tasks you can use PyPy with its just-in-time code compilation. PyPy can recently run the NumPy test suite and has improved overall compatibility with C extensions drastically. Later this year PyPy is expected to reach Python 3.5 conformance.
All this great work inspired me to innovate in one of the areas in which Python is used extensively: web and micro-services development.
Enter Japronto!
Japronto is a brand new micro-framework tailored for your micro-services needs. Its main goals are to be fast, scalable and lightweight. It lets you do both synchronous and asynchronous programming with asyncio, and it’s shamelessly fast. Even faster than NodeJS and Go.
This micro-benchmark was done with a “Hello world!” application, but it clearly demonstrates the server-framework overhead of a number of solutions. These results were obtained on an AWS c4.2xlarge instance with 8 vCPUs, launched in the São Paulo region with default shared tenancy, HVM virtualization and magnetic storage. The machine was running Ubuntu 16.04.1 LTS (Xenial Xerus) with a Linux 4.4.0–53-generic x86_64 kernel. The OS reported a Xeon® CPU E5–2666 v3 @ 2.90GHz. I used Python 3.6, freshly compiled from source. To be fair, all the contestants (including Go) were running a single worker process. Servers were load tested using wrk with 1 thread, 100 connections and 24 simultaneous (pipelined) requests per connection (a cumulative parallelism of 2,400 requests).
HTTP pipelining is crucial here since it’s one of the optimizations that Japronto takes into account when executing requests. Most servers execute requests from pipelining clients in the same fashion they would from non-pipelining clients and don’t try to optimize for it (in fact Sanic and Meinheld would also silently drop requests from pipelining clients, which is a violation of the HTTP 1.1 protocol). In simple terms, pipelining is a technique in which the client doesn’t need to wait for a response before sending subsequent requests over the same TCP connection. To ensure the integrity of the communication, the server sends back the responses in the same order the requests were received.
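On the wire, pipelining simply means several requests concatenated back to back on one connection; a minimal sketch in Python (hypothetical payload, for illustration only):

```python
# Sketch: what a pipelined HTTP/1.1 exchange looks like on the wire.
# The client writes several requests without waiting for responses;
# a compliant server must answer them in the order they arrived.
request = (
    b"GET /hello HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"\r\n"
)

# Three requests leave the client in a single send() call.
pipelined = request * 3

# The server splits the buffer on the empty-line delimiter and
# must produce three responses, preserving order on the connection.
requests = [r for r in pipelined.split(b"\r\n\r\n") if r]
```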
The gory details of optimizations
When many small GET requests are pipelined together by the client, there is a great chance they are going to arrive in one TCP packet (thanks to Nagle’s algorithm) on the server side and be read back by one system call. Doing a system call and moving data from kernel space to user space is a very expensive operation compared to, say, moving memory inside process space. That’s why doing as few system calls as possible (but no fewer) is important. When Japronto receives data and successfully parses several requests out of it, it tries to execute all of them as fast as possible, glue the responses back together in the correct order and write them back in one system call. In fact the kernel can aid with the gluing part, thanks to scatter/gather I/O system calls, which Japronto doesn’t use yet. Beware that this is not always possible, since some of the requests could take too long and waiting for them would needlessly increase latency. Care needs to be taken when tuning heuristics that weigh the cost of system calls against the expected request completion time.
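The gluing part can be sketched with os.writev, the gather half of the scatter/gather I/O mentioned above (which, as noted, Japronto itself doesn’t use yet). This sketch assumes a POSIX system and uses a pipe in place of a socket:

```python
import os

# Responses for three pipelined requests, kept as separate buffers.
responses = [
    b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok",
    b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok",
    b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok",
]

r, w = os.pipe()
# writev hands all buffers to the kernel in a single system call,
# so user space never has to copy them into one contiguous buffer.
written = os.writev(w, responses)
os.close(w)

# The other end sees one contiguous, ordered byte stream.
data = os.read(r, 4096)
os.close(r)
```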
Besides delaying writes for pipelined clients, there are several other techniques employed in the code. Japronto is written almost entirely in C. The parser, protocol, connection reaper, router, request and response objects are written as C extensions. Japronto tries hard to delay the creation of Python counterparts of its internal structures until asked for explicitly. For example, the headers dictionary won’t be created until it is requested in a view. All the token boundaries are already marked during parsing, but normalization of the header keys and creation of the str objects happens lazily, when they are accessed for the first time.
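The lazy-materialization idea can be sketched in pure Python (hypothetical names and boundary format; Japronto implements this in C):

```python
class Request:
    """Keeps the raw header bytes plus token boundaries; the Python
    dict is only built on first access (a pure-Python sketch of the
    lazy approach described above)."""

    def __init__(self, raw, boundaries):
        self._raw = raw                # raw bytes as read from the socket
        self._boundaries = boundaries  # (name_start, name_end, val_start, val_end)
        self._headers = None           # dict created lazily

    @property
    def headers(self):
        if self._headers is None:
            # Normalize key case and decode to str only now.
            self._headers = {
                self._raw[ns:ne].decode('ascii').title():
                self._raw[vs:ve].decode('ascii')
                for ns, ne, vs, ve in self._boundaries
            }
        return self._headers


raw = b"host: example.com\r\ncontent-length: 0\r\n"
req = Request(raw, [(0, 4, 6, 17), (19, 33, 35, 36)])
# Nothing has been built yet; the dict appears on first access.
```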
Japronto relies on the excellent picohttpparser C library for parsing the status line, headers and chunked HTTP message body. picohttpparser directly employs the text-processing instructions found in modern CPUs with SSE4.2 extensions (almost any x86_64 CPU from the last 10 years has them) to quickly match the boundaries of HTTP tokens. The I/O is handled by the super awesome uvloop, which itself is a wrapper around libuv. At the lowest level this is a bridge to the epoll system call, providing asynchronous notifications on read/write readiness.
Python is a garbage-collected language, and care needs to be taken when designing high-performance systems not to needlessly increase pressure on the GC. The internal design of Japronto tries to avoid reference cycles and do as few allocations/deallocations as possible. It does so by preallocating some objects in so-called arenas. It also tries to reuse Python objects for future requests if they are no longer referenced, instead of throwing them away.
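The reuse idea can be sketched as a simple freelist (a hypothetical sketch, not Japronto’s actual arena code):

```python
class Pool:
    """Tiny freelist: hands back recycled objects instead of
    allocating new ones, reducing allocator and GC pressure."""

    def __init__(self, factory, reset):
        self._factory = factory  # creates a fresh object when the pool is empty
        self._reset = reset      # wipes an object before it is reused
        self._free = []

    def acquire(self):
        if self._free:
            return self._free.pop()
        return self._factory()

    def release(self, obj):
        self._reset(obj)
        self._free.append(obj)


pool = Pool(factory=dict, reset=dict.clear)
a = pool.acquire()
a['path'] = '/'
pool.release(a)       # object goes back to the pool, wiped
b = pool.acquire()    # same object comes back instead of a fresh allocation
```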
All the allocations are done as multiples of 4KB, and the internal structures are laid out carefully so that data used frequently together is close enough in memory, minimizing the possibility of cache misses. Japronto tries not to copy between buffers unnecessarily and does many operations in place. For example, the path is percent-decoded in place before it is matched in the router.
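The decode-before-matching order can be sketched with the standard library (hypothetical route table; note that Python strings are immutable, so unlike the in-place C version this sketch copies):

```python
from urllib.parse import unquote

routes = {'/hello world': 'hello_view'}

def match(raw_path):
    # Decode once, before routing, so the router always compares
    # canonical paths; Japronto does this step in place in C.
    return routes.get(unquote(raw_path))
```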
Call for help
I’ve been working on Japronto continuously for the last 3 months, often during weekends as well as normal working days. This was only possible thanks to taking a break from my regular job as a programmer and putting all my effort into this project. I think it’s time to share the fruit of my work with the community.
Currently Japronto implements pretty solid feature-set:
- HTTP 1.x implementation with support for chunked uploads
- Full support for HTTP pipelining
- Keep-alive connections with configurable reaper
- Support for synchronous and asynchronous views
- Master-multiworker model based on forking
- Support for code reloading on changes
- Simple routing
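The synchronous/asynchronous views point means a handler may be either a plain function or a coroutine; a hypothetical dispatcher built on the standard library (not Japronto’s actual code) can branch on that distinction:

```python
import asyncio
import inspect

def sync_view(request):
    # An ordinary function: returns its result directly.
    return 'sync: ' + request

async def async_view(request):
    # A coroutine: may await I/O (e.g. a database call) in between.
    await asyncio.sleep(0)
    return 'async: ' + request

async def dispatch(view, request):
    # Accept both kinds of views, as a framework supporting
    # synchronous and asynchronous handlers must.
    if inspect.iscoroutinefunction(view):
        return await view(request)
    return view(request)

async def main():
    return await asyncio.gather(dispatch(sync_view, '/a'),
                                dispatch(async_view, '/b'))

results = asyncio.run(main())
```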
I would like to look into WebSockets and streaming HTTP responses asynchronously next. There is a lot of work to be done on documentation, and testing could definitely benefit from some help. If you would like to help, please contact me directly on Twitter (@squeaky_pl) or GitHub. The project repository is located at https://github.com/squeaky-pl/japronto.
Also, if your company is looking for a Python developer who is a performance freak and also does DevOps, I am open to hearing about it. I will consider positions worldwide.
The other contestants
Looking closer at the other contestants, we can see that shiny new NodeJS is almost as fast as Go (to be honest, I was pretty disappointed with Go’s performance in this micro-benchmark). We can also see that the Meinheld WSGI server is almost on par with NodeJS and Go. Despite its inherently blocking design, it is a great performer compared to the preceding four, which are asynchronous Python solutions. So never trust anyone who says that asynchronous systems are always speedier; they are almost always more concurrent, but there is much more to it than that.
Final words
None of the techniques mentioned here are really specific to Python. They could probably be employed in other languages like Ruby, JavaScript or even PHP. I would be interested in doing such work as well, but sadly this will not happen unless somebody funds it.
I would like to thank the Python community for their continuous investment in performance engineering: namely Victor Stinner @VictorStinner, INADA Naoki @methane, Yury Selivanov @1st1, and the entire PyPy team.
For the love of Python.