This blog post dives into an interesting finding: two frequently used system calls (gettimeofday, clock_gettime) are much slower on AWS EC2.
Linux provides a mechanism for speeding up those two frequently used system calls by implementing the system call code in userland and avoiding the switch to the kernel entirely. This is done via the vDSO (virtual dynamic shared object), a virtual shared library provided by the kernel that is mapped into the address space of every running program.
On EC2, the two system calls listed cannot use the vDSO as they normally would on other systems. This is because the virtualized clock source on Xen (and some KVM configurations) does not support reading the time in userland via the vDSO.
There is no safe workaround for this; the user may decide to change their clock source to tsc by writing to a file in sysfs, but this is considered dangerous. Continue reading to learn more and to see the results of a microbenchmark.
To quickly confirm if your system is affected by this issue, you can compile and run the following program with strace:
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    int main(int argc, char *argv[])
    {
        struct timeval tv;
        int i = 0;

        /* Call gettimeofday 100 times so strace has something to count. */
        for (; i < 100; i++) {
            gettimeofday(&tv, NULL);
        }

        return 0;
    }
Compile with gcc -o test test.c and then trace with strace -ce gettimeofday ./test:
    % strace -ce gettimeofday ./test
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
      0.00    0.000000           0       100           gettimeofday
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000000                   100           total
As you can see, strace counted 100 calls to gettimeofday. This means that the vDSO is not being used and real system calls are being made, causing a context switch to the kernel. The Linux vDSO was designed with gettimeofday in mind (in fact, it’s even mentioned in the vDSO man page). Any system call that is passed through the vDSO is executed completely in userland, causing no context switch to the kernel. As a result, any system call that successfully uses the vDSO will not appear in strace output.
Continue reading to learn why exactly this is and to see some interesting profiling data.
There are a few important things the reader will need to be familiar with in order to follow the explanation and code snippets that illustrate this issue.
Linux system calls and the vDSO
Before proceeding with this post, it is strongly recommended that the reader carefully read our previous post detailing how system calls work on Linux: The Definitive Guide to Linux System Calls.
As described in detail in that blog post, the vDSO is essentially a shared library that is provided by the kernel and mapped into every process’s address space. When the gettimeofday, clock_gettime, getcpu, or time system calls are made, glibc will attempt to call the code provided by the vDSO. This code will access the needed data without entering the kernel, saving the process the overhead of making a real system call.
Because system calls made via the vDSO do not enter the kernel, strace is not notified that the vDSO system call was made. As a result, a program which calls gettimeofday successfully via the vDSO will not show gettimeofday in the strace output. You would need to use ltrace instead. Learn more about how strace works by reading an older post of ours.
On AWS EC2, gettimeofday appears in the strace output. This is because the vDSO falls back to a regular system call in certain situations.
Linux timekeeping
There are many different devices that can be used for timekeeping on x86 systems running the Linux kernel. Each has its own benefits and drawbacks. More detailed information on each is conveniently presented in the kernel source in Documentation/virtual/kvm/timekeeping.txt.
It’s important to understand that virtualization introduces many complexities when it comes to timekeeping. Some examples include:
- The virtual machines running on a host now all share the same source of time, but it is impossible for every VM to have its time updated at exactly the same instant. Moreover, a VM may have interrupts disabled while executing critical sections of the kernel while the hypervisor may be happily generating timer interrupts.
- Certain timekeeping systems (like the Time Stamp Counter) are themselves virtualized; reads from the TSC register may impose a performance penalty yielding inaccurate readings and backwards time drift.
- Migration of VMs between hypervisors with different CPUs may be problematic if the timekeeping system relies on the clock rate of the processor.
The folks at VMware published a very interesting paper describing these and other timekeeping issues. The information is presented as being specific to VMware, but most of it is generally applicable to any virtualization system.
In order to deal with these and other issues, KVM and Xen provide their own timekeeping systems: the KVM PVclock and the Xen time implementation. In the Linux kernel each of these is commonly referred to as a clocksource.
The system’s current clocksource can be found by checking the file /sys/devices/system/clocksource/clocksource0/current_clocksource.
This is the clocksource which will be consulted when system calls like gettimeofday or clock_gettime are executed.
Let’s take a look at the vDSO code implementing gettimeofday for more clarity. Remember, this code is packaged with the kernel, but is actually run completely in userland.
If we examine the code in arch/x86/vdso/vclock_gettime.c and check the vDSO implementations for gettimeofday (__vdso_gettimeofday) and clock_gettime (__vdso_clock_gettime), we’ll find that both pieces of code have a similar conditional near the end of the function:
    if (ret == VCLOCK_NONE)
        return vdso_fallback_gtod(tv, tz);
(The code for __vdso_clock_gettime has the same check, but calls vdso_fallback_gettime instead.)
If ret is set to VCLOCK_NONE, this indicates that the system’s current clocksource does not support the vDSO. In this case, the vdso_fallback_gtod failsafe function is called, which simply executes a system call normally: by entering the kernel and incurring all the normal overhead.
But in which cases does ret get set to VCLOCK_NONE?
If we follow the code backward from this point, we’ll find that ret is set to the vclock_mode field of the current clocksource. Clocksources that can be read from userland, such as tsc, have their vclock_mode fields set to an identifier other than VCLOCK_NONE.
On the other hand, clocksources such as:
- the Xen time implementation, and
- systems where either CONFIG_PARAVIRT_CLOCK is not enabled in the kernel configuration or the CPU does not provide a paravirtualized clock feature
all have their vclock_mode fields set to VCLOCK_NONE (0).
AWS EC2 uses Xen. Xen’s default clocksource (xen) has its vclock_mode field set to VCLOCK_NONE, which means EC2 instances will always fall back to using the slower system call path; the vDSO will never be used.
But, what effect does this have on performance?
The following microbenchmark measures the difference in wall clock time between fast, vDSO-enabled gettimeofday calls and regular, slow gettimeofday system calls.
To test this, we’ll run the sample program above with three different loop counts on an EC2 instance with the clocksource set to xen and then again with the clocksource set to tsc.
It is not safe to switch the clocksource to tsc on EC2. It is unlikely, but possible, that this can lead to unexpected backwards clock drift. Do not do this on your production systems.
Experiment setup
All tests were run:
- with the Amazon Linux AMI 2016.09.1 (HVM), SSD Volume Type (AMI: ami-f173cc91),
- on an m4.xlarge instance size,
- in the us-west-2c availability zone.
We’ll time the execution of the program using the time program. Readers may wonder: “how can you use the time program if you are potentially destabilizing the clocksource?”
Luckily, the kernel developer Ingo Molnar wrote a program for detecting time warps: time-warp-test.c. Note that you will need to modify this program slightly for 64-bit x86 systems.
We ran this helpful program while performing our experiments to help detect if any time warps were experienced. There were none.
There are a few things that someone wishing to get a more scientific result could do to increase confidence in the result:
- Backward clock drift is unlikely, but possible. Running the experiment many times, gathering the outcomes, and running a statistical analysis can help filter out bad data points.
- The experiment can be re-run on a non-virtualized system that would not suffer from clock drift. The experiment can be performed first by running the test program provided to test the vDSO. Then, the program can be modified to call the syscall directly.
For our purposes, running the time warp test to detect time warps while performing the experiment was sufficient.
Results
The results of this microbenchmark show that the vDSO method takes about 77% less wall clock time, which makes it roughly 4.4 times faster:
A tight loop of 5 million calls to gettimeofday:
- vDSO enabled:
- real: 0m0.123s
- user: 0m0.120s
- sys: 0m0.000s
- regular system call:
- real: 0m0.547s
- user: 0m0.120s
- sys: 0m0.424s
A tight loop of 50 million calls to gettimeofday:
- vDSO enabled:
- real: 0m1.225s
- user: 0m1.224s
- sys: 0m0.000s
- regular system call:
- real: 0m5.459s
- user: 0m1.316s
- sys: 0m4.140s
A tight loop of 500 million calls to gettimeofday:
- vDSO enabled:
- real: 0m12.247s
- user: 0m12.244s
- sys: 0m0.000s
- regular system call:
- real: 0m54.606s
- user: 0m13.192s
- sys: 0m41.412s
Patches to Xen in progress
The proper fix for this issue would be to add vDSO support to the xen clocksource. Luckily, there are some patches in the works that aim to do just that.
Until this change (or one like it) is merged to the kernel and deployed to EC2, the gettimeofday and clock_gettime system calls will take roughly 4.4 times as long on EC2 as they otherwise could.
As expected, the vDSO system call path is measurably faster than the normal system call path. This is because the vDSO system call path avoids a context switch into the kernel. Remember: vDSO system calls will not appear in strace output if they successfully pass through the vDSO. If they are unable to use the vDSO for some reason, they will fall back to regular system calls and will appear in strace output.
There are some patches in the works to add vDSO support to Xen, but there’s no telling when this will be available in AWS EC2.
Until a change like this is deployed to EC2, gettimeofday and clock_gettime will take roughly 4.4 times as long as they would with the vDSO.
Using strace on your applications incurs overhead while it is in use, but it provides invaluable insight into what exactly your applications are doing. All programmers deploying software to production environments should regularly strace their applications in development mode and question all output they find.