Quantcast
Channel: Hacker News
Viewing all articles
Browse latest Browse all 25817

XACT: Lock-Free Multi-CAS for C++/x64 Built on TSX

$
0
0

README.md

Lock-free N-way atomic compare-and-swap:

TSX-based concurrency primitives for C++/linux/x64

Background

Intel's TSX instructions, included in some Haswell processors and more widely available in recent Skylake chips, implement a restricted form of hardware transactional memory. TSX is powerful, but has some severe limitations: it gets cranky when you touch too much memory at once, and it likes to mysteriously abort transactions. Intel hasn't released many details on the underlying algorithms. You get the idea: it isn't really suitable for general purpose transactional memory.

Currently, TSX instructions are seeing some use in glibc and libraries like ConcurrencyKit to optimize lock-based concurrency control. While this approach enables some impressive speedups, it ends up presenting the same lock-based, blocking API to the programmer.

XACT takes a middle road: it doesn't use TSX for full-blown transactional memory, but also doesn't just use it to transparently elide locks. Instead, it aims to provide a higher-level interface to a small set of non-lock-based primitives.

The basic idea is to use just enough TSX to get the job done. XACT intentionally keeps hardware-level transactions very short, with as few memory accesses as necessary, to avoid the spurious failures and aborts that seem to have limited TSX's use so far.

One fundamental primitive in lock-free programming is compare-and-swap, as provided by the cmpxchg x86 instruction. This is "single" CAS: it operates on a single memory location at a time, which places a number of constraints on its applications. Double-CAS, or DCAS, comes up a lot in papers, but isn't meaningfully supported by any modern mainstream architecture. This is unfortunate, as people have been dreaming up algorithms requiring DCAS for a long time now.

I'm also pretty tired of plain single CAS, so XACT's first main feature is a lock-free, multi-way, atomic compare-and-swap operation. This allows for double-CAS as well as triple- and quadruple-CAS. And also 8-way CAS. As a bonus, N-way atomic stores, loads, and fetch-adds are also supported.

Implementation

XACT's underlying primitives are written in NASM assembly, but its main interface is in C++. The C++ layer handles some logic on top of the assembly primitives, deals with alignment of the provided AtomicU64 type, and provides some level of type safety and human-friendliness.

If you're interested in the underlying details, the assembly parts are documented decently well.

API

See the example below, as well as some API notes here.

Multi-CAS Example

An ordinary CAS operation on X86-64 operates on a single memory location, like this:

#include<atomic>
#include<cassert>intmain() {
    std::atomic<uint64_t> atom {10};uint64_t expected = 10;uint64_t desired = 20;// if `atom` is currently equal to `expected`,// its value is replaced with `desired` and the method returns true.assert(atom.compare_exchange_strong(expected, desired));assert(atom.load() == 20);// if `atom` does not equal `expected`,// the CAS fails and `atom`'s value is unchanged.
    desired = 100;
    expected = 99;assert(!atom.compare_exchange_strong(expected, desired));assert(atom.load() == 20);
}

There's still no DCAS instruction. But with TSX and XACT, you can now perform atomic loads / stores / CAS / fetch-adds on multiple memory locations simultaneously like this:

#include<array>
#include<xact/AtomicU64.h>
#include<xact/FixedAtomicU64Group.h>usingnamespacestd;using AtomGroup = xact::FixedAtomicU64Group<4>;staticconstsize_tkAtoms = 4;intmain() {
  std::array<xact::AtomicU64, kAtoms> atoms {1, 2, 3, 4};
  AtomGroup group{{&atoms[0], &atoms[1], &atoms[2], &atoms[3]}};
  std::array<uint64_t, kAtoms> newValues {10, 20, 30, 40};
  std::array<uint64_t, kAtoms> expectedValues {1, 2, 3, 4};for (;;) {if (group.compareExchange(expectedValues, newValues)) {break;
    }
  }
}

(See the extended, two-thread example here.)

Above, XACT's FixedAtomicU64Group is an array type containing references to multiple AtomicU64 instances. You can think of it as a composite interface: where calling compareExchange() on a single AtomicU64 will either atomically replace its value or fail, calling compareExchange on a FixedAtomicU64Group will either atomically replace the values of all members or completely fail.

The API allows up to 64-way CAS, but I haven't pushed it too hard yet. 8- and 16- way CAS appear to be reliable. The practical limit will depend on your particular hardware and workload, and on Intel's magic pixie dust.

Benchmarks

See here for some recent, preliminary benchmarks.

Building

Library Dependencies

  • XACT itself has no external dependencies on other libraries.
  • Unit tests depend on GTest (includes as a git submodule) and libglog
  • Benchmarks rely on Google's Benchmark library (included as a submodule) and libglog

Compiler/System Requirements

Building XACT requires:

  • CMake
  • A C++-11-compatible compiler (tested on Clang and GCC)
  • The Netwide Assembler (NASM)
  • An Intel x86-64 processor with TSX extensions (currently, this will probably be a Skylake chip).
  • Linux

Build Instructions

make deps
mkdir buildcd build && cmake ../
make xact
make test# optional
make install

Roadmap

Future ideas:

  • More general transactional operations, e.g. a write to two locations conditional on the values at four other locations.
  • A generalized predicate-based compare and swap. (testing for not just equality but <=, >, etc.)
  • Spinlock-based fallbacks, both for non-TSX chips and for frequently failing transactions.

License

MIT


Viewing all articles
Browse latest Browse all 25817

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>