
Uber London loses licence to operate



Uber will not be issued a new private hire licence, Transport for London (TfL) has said.

TfL concluded the ride-hailing app firm was not fit and proper to hold a London private hire operator licence.

It said it took the decision on the grounds of "public safety and security implications".

Confirming it would appeal against the decision, Uber said it showed the world "far from being open, London is closed to innovative companies".

Some 3.5 million passengers and 40,000 drivers use the Uber app in London.


What does the ruling mean?

In a statement, Uber said: "Transport for London and the mayor have caved in to a small number of people who want to restrict consumer choice."

Uber's general manager in London Tom Elvidge said: "To defend the livelihoods of all those drivers, and the consumer choice of millions of Londoners who use our app, we intend to immediately challenge this in the courts."

He said Uber operated in more than 600 cities around the world, including more than 40 towns and cities in the UK.

TfL's concerns include Uber's approach to carrying out background checks on drivers and reporting serious criminal offences.

Image caption: Taxi drivers have been campaigning against Uber, such as this "go slow" protest in 2014

There had been growing speculation that the app could be banned from London.

Opponents of the firm claim it causes gridlocked roads and does not do enough to regulate its drivers.

One driver with Uber in London said: "I don't think it is a fair decision. Uber offers a flexible schedule, and a weekly income."


Analysis: From BBC technology correspondent Rory Cellan-Jones

Throughout its short, tempestuous life, Uber has clashed with regulators around the world - and more often than not it has come out on top.

Its tactic has often been to arrive in a city, break a few rules, and then apologise when it's rapped over the knuckles. Some regulators have backed down, others have run the company out of town.

In London, despite protests from angry taxi drivers, the company has had a relatively easy ride until now.

But a wave of bad publicity about its corporate culture, its lax attitude to checks on its drivers and its treatment of this freelance army seems to have spurred TfL into action.

Make no mistake, Uber will use every legal avenue to fight this ban. It will argue that consumers, in the shape of the millions of mainly young Londoners who rely on its service, will be seriously let down if it can no longer operate.

But the courts will have to balance that with the serious concerns about public safety raised by TfL.


Mayor of London Sadiq Khan said in a statement: "I fully support TfL's decision - it would be wrong if TfL continued to license Uber if there is any way that this could pose a threat to Londoners' safety and security.

"Any operator of private hire services in London needs to play by the rules."

Across the world, Uber has been pushed out or denied access by local licensing laws.

Legislators in Darwin, in Australia's Northern Territory, are debating whether to allow Uber to return after a raft of reforms designed to open up the ride-sharing market were announced.

Uber is currently fighting a test case in Denmark after four of its drivers were found to be in violation of the country's laws requiring taxi meters.

General secretary of the Licensed Taxi Drivers' Association Steve McNamara said it was the "right call" not to re-license Uber in London.

"We expect Uber will again embark on a spurious legal challenge against the mayor and TfL, and we will urge the court to uphold this decision," he said.

"This immoral company has no place on London's streets."

'Luddite decision'

Labour MP Wes Streeting, chairman of the All Party Parliamentary Group on Taxis, said: "This is a courageous decision by the mayor and Transport for London.

"It finally draws a line in the sand to make it clear that no company, however big and powerful, will be allowed to flout our laws and regulations or jeopardise Londoners' safety without facing serious consequences."

However, David Leam, of London First which campaigns for business in the capital, said London needed to be open to new ideas, business and services.

He said: "This will be seen as a Luddite decision by millions of Londoners and international visitors who use Uber, and will also hit London's reputation as a global tech hub."

James Farrar, chairman of the Independent Workers' Union of Great Britain's United Private Hire Drivers branch, said: "This is a devastating blow for 30,000 Londoners who now face losing their job and being saddled with unmanageable vehicle-related debt.

"To strip Uber of its licence after five years of laissez-faire regulation is a testament to a systemic failure at TfL."

Uber has 21 days to appeal TfL's decision.


Timeline: Uber

  • July 2010 - Uber is launched as a cab-hailing app in San Francisco
  • July 2012 - Uber is granted a five-year private hire licence to operate in the capital
  • May 2017 - Uber is issued a four-month operator licence to continue working in London to allow TfL to "consider" a new five-year deal
  • 18 September 2017 - TfL announces it is overhauling the current fee structure for private hire companies operating in the capital
  • 22 September 2017 - TfL announces it will not be issuing Uber London Limited with a private hire operator licence



Taking a full course of antibiotics doesn’t help prevent spread of resistance


This story originally appeared on the Conversation and is republished here with permission.

A recent article in the British Medical Journal set off a bit of a firestorm with its claim that “the antibiotic course has had its day.” The authors challenged the very widespread belief that you should keep taking every last dose of antibiotic prescribed by your physician even after you feel better. This advice has long been claimed to be key to preventing antibiotic resistance.


The challenge to that claim has sparked a backlash, with physicians and public health officials expressing concern about an overly complex message that could result in encouraging patients to go rogue and ignore their doctors’ prescriptions.

All of this is a bit amusing to experts like me, who have been actively challenging this silly dogma for a long time but without the recent media attention. So let me try to bring some clarity to the debate, which really isn’t that complex, so you understand what to do when your doctor gives you a prescription for antibiotics.

I’ll start with the bottom line first. It is absolutely false that continuing to take antibiotics after you feel completely better will reduce the emergence of antibiotic resistance. Quite to the contrary, it will very likely promote the emergence of antibiotic resistance!

How did we get to this point, with such a huge gulf between reality and dogma? And how could so much of the medical community have been so wrong for so long?

Let’s make one thing explicitly clear. Historically, doctors and patients have both been pretty dreadful about using antibiotics appropriately. We’ve long prescribed antibiotics out of fear and habit, not science.

Indeed, as far back as 1945, Alexander Fleming, the man who discovered penicillin in 1928, warned the public that people were taking penicillin to treat diseases not caused by bacteria and that this unnecessary use of penicillin was going to breed out antibiotic resistance. He warned that those who so abused penicillin would be “morally responsible” for the deaths that ensued from penicillin-resistant infections.

The sad thing is, society never did listen. Recent data confirm that our use of antibiotics has gotten no better in the modern era.

It has been estimated that 30 to 50 percent of antibiotic use in the U.S. is unnecessary. In my opinion, those numbers are rather hilariously low. I would estimate that 75 percent or more of antibiotic use is unnecessary. You see, I, and experts like me, have a more stringent definition of when antibiotic use is appropriate or not, which includes giving too long a course of antibiotics to patients who need them.

The truth is, we do not know how long a course of antibiotics is necessary to treat most types of infections. Even in the modern scientific era, the primary basis for the duration of most courses of antibiotics is a decree Constantine the Great issued in A.D. 321 that the week would consist of seven days. That’s why your doctor gives you seven or 14 days’ worth of antibiotics!

If good old Constantine had decreed four days in a week, doctors would be prescribing antibiotics in four- or eight-day courses, rather than seven- or 14-day courses. I refer to seven- or 14-day antibiotic courses as “one or two Constantine units” to underscore the absurdity of the basis for these durations.

But in the last two decades, we actually have had dozens of clinical trials published demonstrating that shorter courses of antibiotics are just as effective as longer courses.

Furthermore, the shorter-course regimens led to lower rates of antibiotic resistance.

Enter Dr. Louis Rice, chairman of medicine at Brown University. Rice has long been an international leader in antibiotic use and resistance. Ten years ago, he gave a lecture at a national infection meeting in which he challenged physicians to move to evidenced-based, short-course antibiotic regimens.

He then went a step further and became the first expert to publicly challenge the nonsense that continuing to take antibiotics after you feel better would somehow prevent the emergence of resistance.

Rice is the man who first called out that the emperor had no clothes. He traced the origin of this incorrect myth back through time. He discovered that the myth appears to originate due to a general misunderstanding of the findings in an article published in 1945, one of the first descriptions of penicillin therapy for pneumonia (lung infection).

This misunderstanding blossomed into the extremely pervasive, indeed erroneous, silly urban legend that continuing antibiotics beyond resolution of symptoms somehow helps reduce antibiotic resistance.

Not only is there absolutely no evidence that taking antibiotics past when you feel better will reduce antibiotic resistance, it doesn’t even make sense. The fact is, the longer you take an antibiotic for, the greater the chance that antibiotic-resistant bacteria will emerge in your body and in the environment around you.

There are some chronic infections, such as tuberculosis, where you do indeed have to take long courses of antibiotics, not to prevent resistance but rather to cure the infection. But for most acute bacterial infections, short courses of antibiotics result in equivalent cure rates and with less chance of causing the emergence of antibiotic resistance among the bacteria in and on your body.

The World Health Organization agrees that there is scant, if any, evidence for long courses of antibiotics, saying that “shorter treatments make more sense—they are more likely to be completed properly, have fewer side effects and also are likely to be cheaper.”

So what should we do about antibiotic courses?

Medicine in the 21st century is a team sport. You and your physician need to be partners in decision-making. If you are sick and your doctor mentions antibiotics to you, the first thing you should say is, “Hey, doc, do I really need the antibiotic?”

Doctors may otherwise prescribe an antibiotic even when you don’t need one, out of fear that you will be unhappy without the prescription. Flip the script on them. Help them to know that you’d prefer not to take the antibiotic unless it is really necessary.

If your doctor says, “Yes, I believe you have a bacterial infection and you need the antibiotic,” the next question is, “OK, can we treat for a short course?”

Third, after you begin taking the antibiotics, if you feel much better before you complete the course, give your doctor a call and ask if you can safely stop therapy.

So the bottom line is, doctors should prescribe as short a course of antibiotics as possible to treat your bacterial infection. If you feel completely well before you finish that course, you should be encouraged to call your physician to discuss if it is safe to stop early.

See, it’s not so complicated after all.

PulseAudio under the hood




Preface

I’m working on the Roc open-source project, a development kit for realtime streaming over an unreliable network. (The project is not ready for use yet, but in case you’re interested, you can follow updates on my twitter).

We decided to implement a set of PulseAudio modules that will allow PulseAudio to use Roc as a network transport. Many Linux distros employ PulseAudio, and their users will be able to improve network service quality without changing the workflow. This led me to dig into PulseAudio internals and eventually to this post.

Why this post?

PulseAudio has a Documentation page covering many specific problems that users and developers may encounter. The Modules page contains a complete list of existing modules and their parameters. The D-Bus API and C API are also documented well.

Unfortunately, the available documentation doesn’t give a bird’s-eye view of PulseAudio features and design, and doesn’t cover many implementation details.

As a result, the overall picture remains unclear. Advanced configuration looks mysterious because one needs to understand what happens under the hood first. The learning curve for module writers is steep too.

This post tries to fill the gap and has three goals:

  • give an overview of the available features
  • explain the underlying design and important implementation details
  • provide a starting point for writing clients and server modules

This post does not provide a detailed reference or tutorial for PulseAudio configuration and APIs. Further details can be obtained from the official documentation (for configuration and client APIs) and from the source code (for internal interfaces).

Disclaimer

I’m not a PulseAudio developer. This post reflects my personal understanding of PulseAudio, obtained from the source code, experiments, official wiki, mailing lists, and blog articles. It may be inaccurate. Please let me know about any issues.

PulseAudio tends to trigger flame wars, which I believe are non-constructive. This post tries to be neutral and provide an unbiased overview of the implemented features and design.

Thanks

I’d like to thank my friends and colleagues Mikhail Baranov and Dmitriy Shilin who read drafts of this post and provided valuable feedback. They definitely made it better!


About PulseAudio

PulseAudio is a sound server for POSIX OSes (mostly aimed at Linux) acting as a proxy and router between hardware device drivers and applications on single or multiple hosts.

See details on the About page on wiki.

Design goals

PulseAudio is designed to meet a number of goals.

  • Abstraction layer for desktop audio

    PulseAudio manages all audio applications, local and network streams, devices, filters, and audio I/O. It provides an abstraction layer that combines all this stuff together in one place.

  • Programmable behavior

    A rich API provides methods for inspecting and controlling all available objects and their run-time and persistent properties. This makes it possible to replace configuration files with GUI tools. Many desktop environments provide such tools.

  • Automatic setup

    PulseAudio is designed to work out of the box. It automatically detects and configures local devices and sound servers available in the local network. It also implements numerous policies for automatic audio management and routing.

  • Flexibility

    PulseAudio gives the user a high degree of flexibility. It’s possible to connect any stream of any application to any local or remote device, configure per-stream and per-device volumes, construct sound processing chains, and more.

  • Extensibility

    PulseAudio provides a framework for server extensions. Many built-in features are implemented as modules. Third-party modules exist as well, including advanced policy-based routing and AirPlay2 support.

Feature overview

The following list gives an idea of the features implemented in PulseAudio.

  • Protocols and networking

    PulseAudio supports a variety of network protocols to communicate with clients, remote servers, and third-party software.

  • Device drivers

    PulseAudio supports several backends to interact with hardware devices and controls. It supports hotplug and automatically configures new devices.

  • Sound processing

    PulseAudio implements various sound processing tools, like mixing, sample rate conversion, and acoustic echo cancellation, which may be employed manually or automatically.

  • Sample cache

    PulseAudio implements an in-memory storage for short named batches of samples that may be uploaded to the server once and then played multiple times.

  • Stream management

    PulseAudio manages all input and output streams of all desktop applications, providing them such features as clocking, buffering, and rewinding.

  • Time management

    PulseAudio implements a per-device timer-based scheduler that provides clocking in the sound card domain, maintains optimal latency, and reduces the probability of playback glitches.

  • Power saving

    PulseAudio employs several techniques to reduce CPU and battery usage.

  • Automatic setup and routing

    PulseAudio automatically sets parameters of cards, devices, and streams, routes streams to devices, and performs other housekeeping actions.

  • Desktop integrations

    PulseAudio implements several features that integrate it into the desktop environment.

  • Compatibility layers

    There are several compatibility layers with other sound systems, so that existing applications may automatically run on top of PulseAudio without modification.

Use cases

Here are some practical examples of how PulseAudio features may be used on the desktop:

  • Smart hotplug handling. For example, automatically setting up a Bluetooth or USB headset when it is connected, or automatically switching to headphones when they are plugged into the jack.

  • A GUI for easily switching an audio card between various modes like stereo, surround, or S/PDIF.

  • A GUI for easily switching an audio stream to any available audio device, like internal speakers, wired headphones, a Bluetooth headset, or HDMI output.

  • A GUI for making a single application louder than others, or muting it, and remembering this decision the next time the application appears.

  • A GUI for routing audio to a remote device available in LAN. For example, connecting a browser playing music on a laptop to speakers attached to a Raspberry Pi.

  • Automatically routing music or voice from a Bluetooth player or mobile phone to a sound card or Bluetooth speakers or headset.

  • Transparently adding various sound processing tools to a running application, for example adding acoustic echo cancellation to a VoIP client.

  • Reducing CPU and battery usage by automatically adjusting latency on the fly to a maximum value acceptable for currently running applications, and by disabling currently unnecessary sound processing like resampling.

  • Smart I/O scheduling, which may combine a high latency for playback (to avoid glitches and reduce CPU usage) and a low latency for user actions like volume changes (to provide smoother user experience).

  • Automatically integrating existing desktop applications into PulseAudio workflow, even if they are not aware of PulseAudio.

Problems and drawbacks

There are several known disadvantages of using PulseAudio, including both fundamental issues and implementation issues that may be resolved in the future:

  • additional complexity, overhead, and bugs (more code always means more bugs)
  • lack of comprehensive documentation
  • non-intuitive command line tools and configuration
  • weird features like autospawn and built-in watchdog
  • higher minimum possible latency
  • poor quality of service over an unreliable network like 802.11 (WiFi)
  • no hardware mixing and resampling
  • no hardware volumes when using ALSA UCM

High-level components

The diagram below demonstrates a simplified view of an example PulseAudio setup.

It shows three clients (employing three different APIs), one local PulseAudio server, two remote PulseAudio servers (connected via “native” and RTP protocols), one remote RTP receiver, ALSA backend, and a set of modules required to serve this setup.

The diagram shows the most important PulseAudio components:

  • libpulse-simple

    Client library.

    Provides “Simple API” for applications. Implemented as a wrapper around libpulse.

  • libpulse

    Client and server library.

    Provides “Asynchronous API” for applications. Communicates with the server via the “native” protocol over a Unix domain or TCP stream socket.

    Contains only definitions and code that are part of public API. The server also reuses definitions and some code from this library internally.

  • libpulsecommon

    Client and server library.

    Contains parts from libpulsecore which are needed on both client and server but can’t be included into libpulse because they are not part of public API. For technical reasons, it also contains parts of libpulse.

  • libpulsecore

    Server library.

    Provides internal API for modules. Contains common environment and generic building blocks for modules.

  • modules

    Server extensions.

    Many server features are implemented in modules, including network protocols, device drivers, desktop integrations, etc.


Key abstractions

This section discusses the key server-side object types.

Devices and streams

PulseAudio is built around devices (sources and sinks) connected to streams (source outputs and sink inputs). The diagram below illustrates these connections.

  • Source

    A source is an input device. It is an active unit that produces samples.

    A source runs a thread with its own event loop, generates sample chunks, and posts them to all connected source outputs. It also implements clocking and maintains latency. The rest of the world usually communicates with a source using messages.

    The typical source represents an input sound device, e.g. a microphone connected to a sound card line input or on a Bluetooth headset. PulseAudio automatically creates a source for every detected input device.

  • Source output

    A source output is a recording stream. It is a passive unit that is connected to a source and consumes samples from it.

    The source thread invokes the source output when the next sample chunk is available or parameters are updated. If the source and source output use different audio formats, the source output automatically converts the sample format, sample rate, and channel map.

    The typical source output represents a recording stream opened by an application. PulseAudio automatically creates a source output for every opened recording stream.

  • Sink

    A sink is an output device. It is an active unit that consumes samples.

    A sink runs a thread with its own event loop, peeks sample chunks from connected sink inputs, and mixes them. It also implements clocking and maintains latency. The rest of the world usually communicates with a sink using messages.

    The typical sink represents an output sound device, e.g. headphones connected to a sound card line output or on a Bluetooth headset. PulseAudio automatically creates a sink for every detected output device.

  • Sink input

    A sink input is a playback stream. It is a passive unit that is connected to a sink and produces samples for it.

    The sink thread invokes the sink input when the next sample chunk is needed or parameters are updated. If the sink and sink input use different audio formats, the sink input automatically converts the sample format, sample rate, and channel map.

    The typical sink input represents a playback stream opened by an application. PulseAudio automatically creates a sink input for every opened playback stream.

Object hierarchy

The diagram below shows the hierarchy of the server-side objects.

  • Core

    The core provides a shared environment for modules. Modules use it to find and register objects, install hooks for various events, register event loop handlers, etc.

    There is only one core instance which is created at startup.

  • Module

    A module represents a loadable server extension.

    Modules usually implement and register other objects. A module can be loaded multiple times with different parameters and so can have multiple instances.

    The typical module implements network or device discovery, a network or hardware device or stream, or a sound processing device. For example, PulseAudio loads a new instance of module-alsa-card for every detected ALSA card.

  • Client

    A client represents an application connected to the PulseAudio server.

    It contains lists of playback and recording streams.

    The typical client represents a local application, e.g. a media player, or a remote PulseAudio server. PulseAudio automatically creates a client for every incoming connection.

  • Card

    A card represents a physical audio device, like a sound card or Bluetooth device.

    It contains card profiles, device ports, and devices (sources and sinks) connected to device ports. It also has a single active card profile.

    The typical card represents an ALSA card or a Bluetooth device. PulseAudio automatically creates a card for every detected physical device.

  • Card profile

    A card profile represents an opaque configuration set for a card.

    It defines the backend-specific configuration of the card, and the list of currently available devices (sources and sinks) and device ports. Only one card profile of a card may be active at the same time. The user can switch the active card profile at any time.

    The typical card profile represents the sound card mode, e.g. analog and digital output, and mono, stereo, and surround mode. PulseAudio automatically creates a card profile for every available operation mode of a card.

  • Device port

    A device port represents a single input or output port on the card.

    A single card may have multiple device ports. Different device ports of a single card may be used simultaneously via different devices (sources and sinks).

    The typical device port represents a physical port on a card, or a combination of a physical port plus its logical parameters, e.g. one output port for internal laptop speakers, and another output port for headphones connected via a line out. PulseAudio automatically creates device ports for every detected card, depending on the currently active card profile.

  • Device

    A device represents an active sample producer (input device) or consumer (output device).

    A device can have an arbitrary number of streams connected to it. A recording stream (source output) can be connected to an input device (source). A playback stream (sink input) can be connected to an output device (sink).

    There are three kinds of devices:

    • Hardware device

      Hardware source or sink is associated with a card. It contains a subset of device ports provided by the card and has a single active device port, from which it will read or write samples. The user can switch the active device port of a source or sink at any time.

      PulseAudio automatically creates one or several pairs of a hardware source and sink for every detected card, depending on the currently active card profile.

    • Virtual device

      Virtual source or sink is not associated with a card. It may represent a remote network device or anything else, depending on the implementation.

      PulseAudio may automatically create a pair of virtual source and sink for every remote sound card exported by every PulseAudio server in the local network.

    • Monitor device

      Sink monitor is a special kind of virtual source associated with a sink.

      Every sink automatically gets a sink monitor, named “<sink_name>.monitor”. Every time the sink reads a chunk from its sink inputs, it also writes this chunk to the sink monitor.

      Typical usage of the sink monitor is capturing all sound that was sent to speakers and duplicating it somewhere else. PulseAudio automatically creates a sink monitor for every sink.

  • Stream

    A stream represents a passive sample consumer (recording stream) or producer (playback stream).

    Every stream should be connected to some device. A recording stream (source output) should be connected to an input device (source). A playback stream (sink input) should be connected to an output device (sink).

    There are two kinds of streams:

    • Application stream

      An application stream is associated with a client. It is created when an application connected to PulseAudio server starts playback or recording.

    • Virtual stream

      A virtual stream is not associated with a client. It may represent a remote network server or anything else, depending on the implementation.

  • Sample cache

    The sample cache is an in-memory storage for short named batches of samples that may be uploaded to the server once and then played multiple times.

    The sample cache is useful for event sounds. It’s also a simple way to overcome the network latency issues because the samples are played after they are uploaded to the server.


D-Bus API

Components: libpulsecore, module-dbus-protocol

The PulseAudio server may be inspected and controlled via the D-Bus API. Note that it can’t be used for playback and recording; these features are available only through the C API.

Buses and services

D-Bus has several modes of communication:

  • via a system bus (system-wide)
  • via a session bus (one bus per login session)
  • peer-to-peer (direct communication between applications)

PulseAudio implements several D-Bus services:

  • Device reservation API (on session bus)
  • Server lookup API (on session bus)
  • Server API (peer-to-peer)
  • Server API extensions (peer-to-peer)

Device reservation API

Device reservation API provides methods for coordinating access to audio devices, typically ALSA or OSS devices. It is used to ensure that nobody else is using the device at the same time.

If an application needs to use a device directly (bypassing PulseAudio), it should first acquire exclusive access to the device. When access is acquired, the application may use the device until it receives a signal indicating that exclusive access has been revoked.

This API is designed to be generic. It is a small standalone D-Bus interface with no dependencies on PulseAudio abstractions, so it may be easily implemented by other software.

Server lookup API

PulseAudio server API uses peer-to-peer D-Bus mode. In this mode, clients communicate directly with the server instead of using a session bus, which acts as a proxy. In contrast to the session bus mode, this mode permits remote access and has lower latency. However, clients need a way to determine the server address before connecting to it.

To solve this problem, the PulseAudio server registers a server lookup interface on the session bus. A client should first connect to the session bus to discover the PulseAudio server address and then connect to the PulseAudio server directly for peer-to-peer communication.

Server API

Server API is available through a peer-to-peer connection to PulseAudio server.

Every object in the hierarchy is identified by a unique path. The hierarchy starts with the core object, which has a well-known path. It may be used to discover all other objects.

The diagram below shows the most important D-Bus interfaces.

  • Core - A top-level interface that provides access to all other interfaces.

  • Module - A loadable server extension.

  • Client - An application connected to the PulseAudio server.

  • Card - A physical audio device, like a sound card or Bluetooth device.

  • CardProfile - An opaque configuration set for a card.

  • DevicePort - A single input or output port on the card.

  • Device - The parent interface for Source and Sink.

  • Source - An input device. Hardware source is associated with a card and device port.

  • Sink - An output device. Hardware sink is associated with a card and device port.

  • Stream - May be either a recording stream (source output) or a playback stream (sink input). An application stream is associated with a client.

  • Sample - A named batch of samples in the sample cache.

In addition to the core interface, PulseAudio modules can register custom server API extensions that are also discoverable through the core. Several extensions are available out of the box:

  • StreamRestore

    Component: module-stream-restore

    Query and modify the database used to store device and stream parameters.

  • Equalizer

    Component: module-equalizer-sink

    Query and modify equalizer sink levels.

  • Ladspa

    Component: module-ladspa-sink

    Query and modify LADSPA sink control ports.


C API

PulseAudio provides C API for client applications.

The API is implemented in the libpulse and libpulse-simple libraries, which communicate with the server via the “native” protocol. There are also official bindings for Vala and third-party bindings for other languages.

C API is a superset of the D-Bus API. It’s mainly asynchronous, so it’s more complex and harder to use. In addition to inspecting and controlling the server, it supports recording and playback.

The API is divided into two alternative parts:

  • Asynchronous API (libpulse), complicated but complete
  • Simple API (libpulse-simple), a simplified synchronous wrapper for the recording and playback subset of the asynchronous API

Asynchronous API

The Asynchronous API is based on an event loop and callbacks.

The diagram below demonstrates the workflow; a minimal code sketch follows the list.

  • Main Loop

    The first step is creating an instance of one of the available Main Loop implementations. They differ in how the user runs the main loop: directly, in a separate thread, or using GLib.

    All communications with the server happen inside the main loop. The user should run main loop iterations from time to time. The user can either register callbacks that are invoked from the main loop or use polling.

    For a regular main loop, polling may be performed between iterations. For a threaded main loop, polling may be performed after obtaining a lock from another thread.

  • Context

    The second step is creating a Context object that represents a connection to the server. The user can set callbacks for context state updates.

  • Stream

    When context state becomes ready, the user can create one or multiple Stream objects for playback or recording.

    The user can set callbacks for stream state updates and I/O events, invoked when the server wants to send or receive more samples. Clocking is controlled by the server. Stream also provides several management functions, like pause, resume, and volume control.

  • Sample Cache

    In addition to regular streams, the user can also use Sample Cache to upload named batches of samples to the server without playing them and start playback later. This is a simple way to overcome problems with latency.

  • Introspection

    Once the context is in the ready state, the Server Query and Control API may be used to query and modify various objects on the server. All operations are asynchronous. The object hierarchy is similar to the D-Bus API described above.

  • Property Lists

    Every server-side object has a Property List, a map with textual keys and arbitrary textual or binary values. Applications and modules may get and set these properties. Various modules implement automatic actions based on some properties, like routing, volume setup, and autoloading filters.

    The typical usage in applications is to provide a property list when creating a context (for client properties) and when creating a stream (for stream properties). Higher-level frameworks that use PulseAudio (like GStreamer) usually do it automatically.

  • Operations

    All operations with server-side objects are asynchronous. Many API calls return an Operation object which represents an asynchronous request. It may be used to poll the request status, set completion callback, or cancel the request.

  • Events

    The client can receive two types of events from server:

    • subscription events

      The Events API provides methods for subscribing to events triggered for the server-side objects available through the introspection API. Every event has an integer type and an arbitrary binary payload.

    • stream events

      Stream events are generated to notify the client of a stream state change or to ask it to do something, e.g. pause the stream. Such an event has a textual name and an arbitrary binary payload.
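
To make this workflow concrete, here is a minimal sketch (in plain C, using only the public libpulse API) that connects a context and lists the available sinks; error handling is trimmed to the essentials, and the client name is an arbitrary example:

    /* Minimal Asynchronous API sketch: run a main loop, connect a context,
     * and list sinks once the context becomes ready. */
    #include <stdio.h>
    #include <pulse/pulseaudio.h>

    static void sink_cb(pa_context *c, const pa_sink_info *info, int eol, void *userdata) {
        if (eol) {                           /* end of list: stop the main loop */
            pa_mainloop_quit(userdata, 0);
            return;
        }
        printf("sink #%u: %s\n", info->index, info->name);
    }

    static void state_cb(pa_context *c, void *userdata) {
        switch (pa_context_get_state(c)) {
        case PA_CONTEXT_READY:               /* connected: issue an introspection request */
            pa_operation_unref(pa_context_get_sink_info_list(c, sink_cb, userdata));
            break;
        case PA_CONTEXT_FAILED:
        case PA_CONTEXT_TERMINATED:
            pa_mainloop_quit(userdata, 1);
            break;
        default:
            break;
        }
    }

    int main(void) {
        pa_mainloop *ml = pa_mainloop_new();
        pa_context *ctx = pa_context_new(pa_mainloop_get_api(ml), "list-sinks-example");

        pa_context_set_state_callback(ctx, state_cb, ml);
        pa_context_connect(ctx, NULL, PA_CONTEXT_NOFLAGS, NULL);   /* NULL = default server */

        int ret = 0;
        pa_mainloop_run(ml, &ret);           /* blocks until pa_mainloop_quit() */

        pa_context_unref(ctx);
        pa_mainloop_free(ml);
        return ret;
    }

Such a client is typically built with the flags reported by pkg-config --cflags --libs libpulse.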

Simple API

Component: libpulse-simple

The Simple API is a convenient wrapper around the threaded main loop and stream. The user just chooses parameters, connects to the server, and writes or reads samples. All operations are blocking. A minimal playback sketch follows the limitations list below.

Limitations:

  • only a single stream per connection is supported
  • no support for volume control, channel mappings, and events
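
Here is a minimal playback sketch using the Simple API; it writes one second of silence to the default sink. The client and stream names and the sample spec are arbitrary examples:

    /* Minimal Simple API sketch: open a blocking playback stream and
     * write one second of silence to the default sink. */
    #include <stdint.h>
    #include <string.h>
    #include <pulse/simple.h>
    #include <pulse/error.h>

    int main(void) {
        pa_sample_spec ss = {
            .format = PA_SAMPLE_S16LE,       /* 16-bit little-endian samples */
            .rate = 44100,
            .channels = 2,
        };

        int err = 0;
        pa_simple *s = pa_simple_new(NULL,               /* default server */
                                     "simple-example",   /* client name */
                                     PA_STREAM_PLAYBACK,
                                     NULL,               /* default sink */
                                     "playback",         /* stream name */
                                     &ss, NULL, NULL, &err);
        if (!s)
            return 1;

        static int16_t buf[44100 * 2];                   /* one second, stereo */
        memset(buf, 0, sizeof(buf));

        pa_simple_write(s, buf, sizeof(buf), &err);      /* blocks until accepted */
        pa_simple_drain(s, &err);                        /* blocks until played */
        pa_simple_free(s);
        return 0;
    }

The corresponding pkg-config package is libpulse-simple.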

Protocols and networking

PulseAudio server supports a variety of network protocols to communicate with clients, remote servers, and third-party software. See Network page on wiki.

PulseAudio implements two custom protocols:

  • “native”, a full-featured protocol for most client-server and server-server communications
  • “simple”, which is rarely useful

It also supports several foreign transport and discovery protocols:

  • mDNS (Zeroconf)
  • RTP/SDP/SAP
  • RAOP
  • HTTP
  • DLNA and Chromecast
  • ESound

And two control protocols:

  • D-Bus
  • CLI

Native protocol

Components: libpulsecore, module-native-protocol-{fd,unix,tcp}

PulseAudio uses a so-called “native” protocol for client-server and server-server connections, which works over a Unix domain or TCP stream socket. It is a rich, binary, message-oriented, bidirectional, asynchronous protocol.

The Asynchronous API described above mostly mirrors the features provided by this protocol.

They are:

  • authentication - provide authentication data for the server
  • streams - manage server-side stream state and exchange samples
  • sample cache - manage server-side sample storage
  • introspection - query and modify server-side objects
  • events - subscribe server-side object events
  • extensions - send custom commands to modules

There are four message types, each with a header and an optional payload:

  • packet

    Control message. May contain a command (from client to server), a reply (from server to client), or an event (from server to client). Each message has its own payload type.

  • memblock

    Data message. Contains a chunk of samples. In the zero-copy mode payload with samples is omitted and the message contains only a header.

  • shmrelease, shmrevoke

    Shared pool management message. These messages are used to manage the shared memory pool employed in the zero-copy mode.

With the “native” protocol, the client is clocked by the server: the server requests the client to send some amount of samples from time to time.

Since the protocol uses stream sockets, it’s not real time. The delays introduced by the sender or network cause playback delays on the receiver.

Zero-copy mode

When the “native” protocol is used for client and server on the same host, the zero-copy mode may be employed.

It requires a Unix domain socket to be used, and POSIX shared memory or Linux-specific memfd to be supported and enabled in PulseAudio. It also requires the server and client to run as the same user, for security reasons.

In this mode, chunks are allocated in a shared memory pool and communication is done through a shared ring buffer channel that uses a shared memory block for data and two file descriptor-based semaphores for notifications, on top of POSIX pipe or Linux-specific eventfd.

To establish communication, the server should send the client the file descriptors of the shared memory region and semaphores. The algorithm is the following:

  • the server creates an anonymous in-memory file using shm_open (POSIX) or memfd_create (Linux-specific)
  • the server maps the file to memory using mmap and initializes a memory pool there
  • the server allocates one block from the pool, initializes the shared ring buffer, and creates semaphores using pipe (POSIX) or eventfd (Linux-specific)
  • the server transfers file descriptors to the client via a Unix domain socket using sendmsg, which provides special API for this feature
  • the client receives file descriptors and also maps the file to memory using mmap
  • the server and client now use a single shared memory pool and ring buffer
  • to avoid races, the server and client use mutexes that are placed inside shared memory as well

After this, all messages (packet, memblock, shmrelease, shmrevoke) are sent via the shared ring buffer.
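
The file descriptor transfer above relies on standard Unix ancillary data. The sketch below illustrates the general mechanism (an anonymous in-memory file plus sendmsg with SCM_RIGHTS); it is an illustration of the technique, not PulseAudio’s actual code:

    /* Sketch of fd passing over a Unix domain socket: the descriptor of the
     * in-memory file backing the pool is sent as SCM_RIGHTS ancillary data. */
    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/socket.h>

    static int send_fd(int sock, int fd) {
        char byte = 0;                       /* at least one byte of real data is required */
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };

        union {
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;            /* ensures proper alignment of the buffer */
        } ctrl;
        memset(&ctrl, 0, sizeof(ctrl));

        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
        };

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;        /* ancillary data carries file descriptors */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0);       /* the receiver gets its own copy of fd */
    }

On the receiving side, recvmsg returns the same ancillary data, and the client can mmap the received descriptor to get a view of the same memory pool.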

When the memblock message is sent in the zero-copy mode, its payload is omitted. Since the server and client use the same shared memory pool, the chunk payload can be located in the pool using the chunk identifier from the chunk header and does not need to be transmitted.

Two additional messages are used in this mode:

  • the peer that received a chunk sends shmrelease when it finishes reading the chunk and wants to return it to the shared memory pool
  • the peer that has sent a chunk may send shmrevoke when it wants to cancel reading from the chunk

To achieve true zero-copy when playing samples, an application should use the Asynchronous API and delegate memory allocation to the library. When the zero-copy mode is enabled, memory is automatically allocated from the shared pool.

Authentication

When a client connects to the server via the “native” protocol, the server performs several authentication checks in the following order:

  1. auth-anonymous

    If the auth-anonymous option is set, then the client is accepted.

  2. uid

    If a Unix domain socket is used, and the client has the same UID as the server, then the client is accepted. This check uses a feature of Unix domain sockets that provides a way to securely determine the credentials of the other side.

  3. auth-group

    If a Unix domain socket is used, and the auth-group option is set, and the client belongs to the group specified by this option, then the client is accepted.

  4. auth-cookie

    If the auth-cookie option is set, and the client provided a correct authentication cookie, then the client is accepted.

    On start, the server checks the "~/.pulse-cookie" file. If the file exists, the server reads a cookie from it. Otherwise, it generates a new random cookie and writes it to the file. Optionally, the server also stores the cookie in the X11 root window properties.

    The client searches for a cookie in an environment variable, in the X11 root window properties, in parameters provided by the application, and in the "~/.pulse-cookie" file (in this order).

  5. auth-ip-acl

    If a TCP socket is used, and the auth-ip-acl option is set, and the client’s IP address belongs to the address whitelist specified in this option, then the client is accepted.

  6. reject

    If all checks have failed, then the client is rejected.

Tunnels

Components: module-tunnel-{source,sink}, module-tunnel-{source,sink}-new

Local applications may be connected to remote audio devices using tunnel sources and sinks. The diagram below illustrates an example of such connections.

Each tunnel connects a single pair of a local device and remote stream:

  • a local tunnel sink is connected to a remote sink input
  • a local tunnel source is connected to a remote source output

Each tunnel acts as a regular PulseAudio client and connects to a remote PulseAudio server via the “native” protocol over TCP. A tunnel sink creates a playback stream, and a tunnel source creates a recording stream.

Tunnel devices may be created either manually by the user (as in the example below) or automatically if Zeroconf support is enabled.
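
For example, a tunnel may be loaded manually with commands like the following; the address and device names are placeholders, and the parameter names are those documented for the tunnel modules:

    load-module module-tunnel-sink server=192.168.1.10 sink_name=remote_speakers
    load-module module-tunnel-source server=192.168.1.10 source_name=remote_mic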

mDNS (Zeroconf)

Components: module-zeroconf-{publish,discover}, module-bonjour-publish

mDNS (multicast DNS) protocol, a part of the Zeroconf protocol stack, resolves names in the local network without using a name server.

PulseAudio may use Avahi (a free Zeroconf implementation) or Bonjour (Apple’s Zeroconf implementation). If the Avahi or Bonjour daemon is running and Zeroconf support is enabled in PulseAudio, every sink and source on every PulseAudio server in the local network automatically becomes available on all other PulseAudio servers.

To achieve this, PulseAudio uses automatically configured tunnels:

  • publishing

    PulseAudio server publishes every sink and source as an mDNS service. Each published entry contains the server address, device name and type, and audio parameters, like sample rate and channel map.

    Publishing is implemented for both Avahi (module-zeroconf-publish) and Bonjour (module-bonjour-publish).

  • discovery

    PulseAudio server monitors services published on the local network. For every detected service, PulseAudio server creates a tunnel sink or source connected to the remote device and configured with the parameters of that device.

    Discovery is implemented only for Avahi (module-zeroconf-discover).

RTP/SDP/SAP

Component: module-rtp-{send,recv}

PulseAudio also has RTP support. Unlike the “native” PulseAudio tunnels, this technology supports multicasting of a single local source to any number of remote sinks.

To achieve this, three protocols are used:

  • RTP (Real-time Transport Protocol)

    A transport protocol for delivering audio and video over IP networks.

  • SDP (Session Description Protocol)

    A format for describing multimedia session parameters. Usually used to describe RTP sessions.

  • SAP (Session Announcement Protocol)

    A protocol for broadcasting multicast session information. Usually used to send SDP messages.

PulseAudio implements both an RTP sender and a receiver. They may be used together, or with other software that supports RTP, for example VLC, GStreamer, FFmpeg, MPlayer, or SoX. See the RTP page on the wiki.

The diagram below shows an example workflow.

  • RTP sender

    RTP sender creates an RTP source output.

    Every RTP source output is connected to a single source and configured to send RTP packets to a single network address, usually a multicast one.

    When the RTP source output is created, it broadcasts the RTP session parameters to the local network using SDP/SAP. When the source writes samples to the RTP source output, the source output sends them to the preconfigured address via RTP. When the RTP source output is destroyed, it broadcasts a goodbye message using SDP/SAP.

  • RTP receiver

    RTP receiver listens to SDP/SAP announcements in the local network.

    When it receives an announcement for a new RTP session, it creates an RTP sink input for it. When it receives a goodbye message, it destroys the corresponding RTP sink input.

    Every RTP sink input is connected to a single sink and is configured to receive RTP packets from a single RTP sender.

    When an RTP sink input receives an RTP packet, it stores it in a queue. When the sink requests samples from the RTP sink input, the sink input reads them from that queue.

The RTP sender can’t be clocked by the RTP receiver because the sender has no feedback from the receiver, and there may be multiple receivers for a single multicast sender. Since sender and receiver clocks always differ slightly, the receiver queue size slowly drifts. To compensate, the RTP receiver adjusts the resampler rate on the fly so that samples are played a bit slower or faster depending on the queue size.

RTP is a real time protocol. The delays introduced by the sender or network cause playback holes on the receiver. Playback is never delayed and packets delivered too late are just dropped.

RAOP (AirPlay)

Component: module-raop-{discover,sink}

RAOP (Remote Audio Output Protocol) is a proprietary streaming protocol based on RTP and RTSP, used in Apple AirPlay devices. RTP is a transport protocol, and RTSP is a control protocol.

AirPlay devices use mDNS and are discoverable via Zeroconf. AirPlay uses AES encryption, but the RSA keys were extracted from Apple devices, and open-source RAOP implementations appeared.

Since version 11.0, PulseAudio has built-in support for RAOP2. PulseAudio uses Avahi to receive mDNS RAOP announcements. Every AirPlay device in the local network automatically becomes available in PulseAudio.

RAOP support consists of two parts:

  • discovery

    PulseAudio server monitors services published on the local network. For every detected service, PulseAudio server creates an RAOP sink.

  • sink

    Every RAOP sink is connected to a single AirPlay device. It uses RTSP to negotiate session parameters and RTP to transmit samples.

HTTP support

Components: libpulsecore, module-http-protocol-{unix,tcp}

The HTTP support provides two features:

  • web interface

    A simple web interface provides a few bits of information about the server and server-side objects.

  • streaming

    It’s possible to receive samples from sources and sink monitors via HTTP. This feature is used for DLNA support.

    Every source or sink monitor has a dedicated HTTP endpoint.

    When a new HTTP client connects to the endpoint, PulseAudio first sends the standard HTTP headers, including the “Content-Type” header with the MIME type corresponding to the sample format in use.

    After sending headers, PulseAudio creates a new source output connected to the source or sink monitor which writes all new samples to the HTTP connection. Samples are sent as-is, without any additional encoding.

DLNA and Chromecast

Components: module-rygel-media-server, pulseaudio-dlna

DLNA (Digital Living Network Alliance) is a set of interoperability guidelines for sharing digital media among multimedia devices. It employs numerous control and transport protocols, including UPnP, RTP, and custom HTTP APIs.

Chromecast is a line of digital media players developed by Google. It uses Google Cast, a proprietary protocol stack, based on Google Protocol Buffers and mDNS.

There are two implementations of DLNA and/or Chromecast support:

  • module-rygel-media-server

    PulseAudio can become a DLNA media server so that other DLNA devices can discover and read PulseAudio sources. This feature is implemented using Rygel, a DLNA media server. See details here.

    PulseAudio registers a Rygel plugin, which exports PulseAudio sources and sink monitors via D-Bus. Every exported source or sink monitor includes an HTTP URL that should be used to read samples from PulseAudio.

    For its part, Rygel publishes exported sources and sink monitors via UPnP, and ultimately DLNA clients may see them and read PulseAudio streams via HTTP.

  • pulseaudio-dlna

    A third-party pulseaudio-dlna project allows PulseAudio to discover and send audio to DLNA media renderers and Chromecast devices. Devices published in the local network automatically appear as new PulseAudio sinks.

    This project is implemented as a standalone daemon written in Python. The daemon creates a null sink for every discovered remote device, opens the sink monitor associated with it, reads samples from the monitor, performs necessary encoding, and sends samples to the remote device.

    The communication with the PulseAudio server is done via the D-Bus API (to query and configure server objects) and parec tool (to receive samples from a sink monitor).

ESound

Components: libpulsecore, module-esound-protocol-{unix,tcp}

PulseAudio server may be accessed via the protocol used in Enlightened Sound Daemon.

The documentation says that it supports playback, recording, and control commands, so switching to PulseAudio should be transparent for applications that are using ESound.

Simple

Components: libpulsecore, module-simple-protocol-{unix,tcp}

The “simple” protocol is used to send or receive raw PCM samples from the PulseAudio server without any headers or meta-information.

The user should configure the sample format and the source and sink to use. Then the user may use tools like netcat to send PCM samples over a Unix domain or TCP socket.
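
As a hedged illustration, the module can be loaded with an explicit format and port, and raw PCM can then be piped to it with netcat; the port number, address, and file name are just examples and should match your configuration:

    load-module module-simple-protocol-tcp port=4711 format=s16le rate=44100 channels=2 playback=true

    # on another host: stream a raw s16le file to the server
    cat music.raw | nc <server-address> 4711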

CLI

Components: libpulsecore, module-cli-protocol-{unix,tcp}, module-cli

PulseAudio server implements its own CLI protocol.

It is a simple text protocol that provides various commands to inspect and control the server:

  • status commands - list and inspect server-side objects
  • module management - load, unload, and inspect modules
  • moving streams - move streams to devices
  • killing clients and streams - remove clients and streams
  • volume commands - setup volumes of devices and streams
  • configuration commands - setup parameters of devices and device ports
  • property lists - setup property lists of devices and streams
  • sample cache - add, remove, or play samples in the server-side sample cache
  • log and debug commands - configure logging, dump server configuration, etc
  • meta commands - include and conditional directives

The same syntax may be used in several places:

  • in PulseAudio configuration files
  • over a Unix domain or TCP socket (module-cli-protocol-{unix,tcp})
  • over the controlling TTY of the server (module-cli)

The TTY version requires the server to be started in foreground mode in a terminal. The pacmd tool uses the CLI protocol over a Unix domain socket.
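
To give a flavour of the syntax, here are a few commands as they could be typed into pacmd or sent over a CLI socket; the object names and the stream index are placeholders:

    load-module module-null-sink sink_name=my_null
    list-sinks
    set-default-sink my_null
    # 0x10000 corresponds to 100% volume
    set-sink-volume my_null 0x8000
    # move sink input #5 to the new sink
    move-sink-input 5 my_null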


Device drivers

PulseAudio has several backends that implement audio I/O and device management. The diagram below illustrates backend-specific components.

  • card

    A card represents a physical audio device, like a sound card or Bluetooth device. It contains card profiles, device ports, and devices. It has a single active card profile.

  • card profile

    A card profile represents an opaque configuration set of a card, like an analog or digital mode. It defines the backend-specific configuration of the card, and the list of currently available device ports and devices.

  • device port

    A device port represents a single input or output port on the card, like internal speakers or external line-out. Multiple device ports may belong to a card.

  • device

    A device (source or sink) represents an active sample producer or consumer. A hardware device is associated with a card and a set of device ports. It has a single active device port. Multiple devices may belong to a card.

Sources and sinks

Every backend should implement the following features of a source or sink:

  • reading or writing samples to the device
  • maintaining latency
  • providing clocking in the device time domain

There are two types of sources and sinks:

  • Hardware source or sink has an associated card object and is connected to one of its device ports. The user can dynamically switch active device port of hardware source or sink.

  • Virtual source or sink doesn’t have an associated card object and device ports. However, backend still may internally associate it with a platform-specific device or port or whatever else.

Currently, the only full-featured backends are ALSA and Bluetooth, which implement all five object types and provide hardware sources and sinks. Other backends don’t implement all object types and provide only virtual sources and sinks.

ALSA backend

component

module-udev-detect

module-alsa-{card,source,sink}

ALSA (Advanced Linux Sound Architecture) is a Linux kernel component providing device drivers for sound cards, and a user space library interacting with the kernel drivers. It provides a rich high-level API and hides hardware-specific stuff.

ALSA uses a device hierarchy that is different from the PulseAudio hierarchy. See this post for an overview. Here it is:

  • card

    ALSA card represents a hardware or virtual sound card. Hardware cards are backed by kernel drivers, while virtual cards are implemented completely in user space plugins.

  • device

    ALSA card contains at least one playback or capture device. A device is something that is capable of processing a single playback or recording stream. All devices of a card may be used independently and in parallel. Typically, every card has one playback device and one capture device.

  • subdevice

    ALSA device contains at least one subdevice. All subdevices of the same device share the same playback or recording stream. For playback devices, subdevices are used to represent available slots for hardware mixing. Typically, there is no hardware mixing, and every device has a single subdevice.

ALSA device is identified by a card number and device number. PulseAudio by default interacts only with hardware ALSA devices. PulseAudio currently doesn’t use hardware mixing and so doesn’t employ multiple subdevices even if they’re available.

For every ALSA device, there are several user space interfaces. The most important are:

  • PCM

    PCM interface provides methods for playback and recording.

    Applications can set up the per-device kernel-side ring buffer parameters, write or read samples to the buffer, and issue flow control operations (a minimal sketch of using this interface directly is shown after this list).

  • CTL

    CTL interface provides methods for configuring device volume and options.

    Applications usually don’t use it directly. Instead, they use the mixer interface, which is implemented on top of the HCTL interface, which in turn is implemented on top of the CTL interface.

    The mixer interface provides access to a per-device mixer, containing multiple mixer elements. Each mixer element (a.k.a. jack control) is associated with a kernel-side kcontrol, which represents a volume control, a toggleable switch, or an enumeration for some device option.

  • UCM

    UCM (Use Case Manager) provides high-level configuration presets that may be used instead of configuring mixer elements manually.

    Applications can describe their use cases by selecting one of the available presets. The UCM then performs all necessary configuration automatically, hiding machine-specific details and the complexity of the mixer interface.
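
For illustration, here is a minimal sketch of the PCM interface mentioned above, the interface PulseAudio itself uses for playback and recording. It plays one second of silence on the “default” ALSA device; note that on a typical desktop this device is often the ALSA “pulse” plugin routing back to PulseAudio, while PulseAudio itself opens hardware devices. Error handling is omitted.

    #include <stdint.h>
    #include <alsa/asoundlib.h>

    int main(void) {
        snd_pcm_t *pcm;
        snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0);

        /* configure the kernel-side ring buffer: sample format, access type,
         * channels, rate, allow software resampling, 500 ms target latency */
        snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED, 2, 44100, 1, 500000);

        static int16_t buf[44100 * 2];              /* one second of silence */
        snd_pcm_writei(pcm, buf, 44100);            /* blocks until written */

        snd_pcm_drain(pcm);
        snd_pcm_close(pcm);
        return 0;
    }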

ALSA backend in PulseAudio automatically creates PulseAudio cards, card profiles, device ports, sources, and sinks:

  • PulseAudio card is associated with an ALSA card.

  • PulseAudio card profile is associated with a configuration set for an ALSA card. It defines a subset of ALSA devices belonging to a card, and so the list of available device ports, sources, and sinks.

  • PulseAudio device port is associated with a configuration set for an ALSA device. It defines a list of enabled ALSA jacks and other device options.

  • PulseAudio source and sink are associated with an ALSA device. When a source or sink is connected to a specific device port, they together define an ALSA device and its configuration.

The concrete meaning of PulseAudio card profile and device ports depends on whether the UCM is available for an ALSA card or not (see below).

PulseAudio sources and sinks for ALSA devices implement a timer-based scheduler that manages latency and clocking. It is discussed later in this post.

ALSA jacks

ALSA device often has multiple ports, or jacks: for example, one for a line-out and another for an internal speaker. All such jacks are represented by a single ALSA device, and the active jack is selected via a toggleable mixer element.

UCM abstracts jacks into logical mutually exclusive UCM devices, like “Speaker” or “Headset”. Note that UCM device doesn’t represent ALSA device or subdevice. It represents a value of a mixer element of an ALSA device.

Roughly speaking, PulseAudio probes jacks and creates at least one device port for every available one, regardless of whether UCM is available or not. However, details differ in these two cases.

ALSA cards with UCM

The ALSA Use Case Manager has two goals:

  • Abstract applications that configure ALSA devices away from the complexity of the mixer interface.

  • Make these applications portable across numerous embedded and mobile devices, by moving the machine-specific part to configuration files.

UCM lets applications operate with high-level requests such as “set up this device to play HiFi music via an external line-out” or “set up that device to capture voice for a phone call via an internal microphone”.

UCM then looks into the local configuration files and maps such use-case description to the concrete values of mixer elements. These files may be part of the UCM package or may be provided by a device vendor.

An application provides the UCM with three strings:

  • ucm verb

    Defines the main operation mode of an ALSA device, e.g. “HiFi” or “Voice”. Only one UCM verb may be active at the same time.

  • ucm modifier

    Defines a supplementary operation mode of an ALSA device, e.g. “PlayMusic” or “PlayTone”. Available UCM modifiers are defined by currently active UCM verb. Zero or multiple UCM modifiers may be active at the same time.

  • ucm device

    Defines active jacks of an ALSA device, e.g. “Speaker” or “Headset”. Available UCM devices are defined by currently active UCM verb. Zero or multiple UCM devices may be active at the same time.

A combination of one UCM verb, zero or multiple UCM modifiers, and one or multiple UCM devices defines what ALSA device to use and how to configure its mixer elements.

When UCM is available for a card, PulseAudio automatically employs it.

The diagram below illustrates relations between PulseAudio and ALSA objects when the UCM is active. Some diagrams and details are also available on Linaro Wiki: 1, 2, 3.

The mapping of the PulseAudio object hierarchy to the ALSA object hierarchy is the following:

  • PulseAudio card is associated with the UCM interface of an ALSA card. One PulseAudio card is created for every ALSA card.

  • PulseAudio card profile is associated with a UCM verb. For every card, one PulseAudio profile is created for every UCM verb available for the card.

  • PulseAudio source is associated with the PCM interface of a capture ALSA device. For every card, one PulseAudio source is created for every available capture ALSA device that is associated with a UCM verb, UCM modifier, or UCM device available in the currently active card profile.

  • PulseAudio sink is associated with the PCM interface of a playback ALSA device. For every card, one PulseAudio sink is created for every playback ALSA device that is associated with a UCM verb, UCM modifier, or UCM device available in the currently active card profile.

  • PulseAudio device port is associated with a combination of a UCM modifier and UCM devices. For every source or sink, one PulseAudio device port is created for every possible valid combination of zero or one UCM modifier and one or multiple UCM devices.

    A valid combination includes:

    • only UCM modifiers and devices that are enabled by currently active card profile
    • only UCM modifiers and devices that are associated with the ALSA device of the source or sink
    • only non-mutually exclusive UCM devices

    Every UCM modifier is mapped to a PulseAudio role. The UCM modifier of a device port is actually enabled only when there is at least one source output or sink input connected to the source or sink of the device port, which has a “media.role” property equal to the UCM modifier’s role.

This is how the mapping is used:

  • The card defines what ALSA card is used, and so what profiles are available.

  • The currently active card profile of the card defines what UCM verb is used, and so what sources and sinks are available.

  • The source or sink defines what ALSA device is used, and so what device ports are available.

  • The currently active device port of the source or sink defines what UCM modifier and UCM devices are used. Whether the UCM modifier is enabled depends on the roles of currently connected source outputs or sink inputs.

  • The currently active UCM verb, UCM modifier, and UCM devices define what ALSA device jacks, options, and volume controls are used.

ALSA cards w/o UCM

Besides the UCM support, PulseAudio has its own configuration system on top of the ALSA mixer. It was developed before UCM appeared. It is used when the UCM is not available for a card.

Mixer configuration is described in custom PulseAudio-specific configuration files. See Profiles page on wiki.

Configuration files define the following objects:

  • profile set

    Provides available profiles for an ALSA card. Contains a list of profiles.

    Physically it is a .conf file under the "/usr/share/pulseaudio/alsa-mixer/profile-sets" directory.

  • profile

    Represents a configuration set for an ALSA card. Contains a list of mappings.

    Every mapping defines a pair of playback and capture ALSA devices that are available when this profile is active, and the configuration sets available for each device.

    Physically it is a [Profile] section in the profile file.

  • mapping

    Represents a pair of playback and capture ALSA devices. Contains:

    • device mask(s)

      Device masks are used to match a concrete pair of playback and capture ALSA devices belonging to the ALSA card. First matched devices are used.

    • channel map

      Channel mapping defines what channels are used for the ALSA device.

    • input/output paths

      Mapping contains multiple input and output paths that represent alternative configuration sets for the ALSA device.

      Every input or output path defines a single configuration set, which provides an ALSA mixer path and settings for ALSA mixer elements accessible through that path.

    Physically it is a [Mapping] section in the profile file.

  • path

    Represents a configuration set for a single capture or playback ALSA device. Contains a list of elements, and a list of jacks.

    Every element or jack defines an ALSA mixer element and how it should be used when the configuration set defined by this path is active.

    Physically it is a .conf file under the "/usr/share/pulseaudio/alsa-mixer/paths" directory.

  • jack

    Represents an ALSA mixer element for a jack that should be used for probing. Contains identifier of the ALSA mixer element and its expected state (plugged or unplugged).

    Every jack is probed and its state is compared with the expected one. This probing is used to activate only those paths that are actually available.

    Physically it is a [Jack] section in the path file.

  • element

    Represents an ALSA mixer element and defines how it should be handled. Contains:

    • element id

      Defines the name of the ALSA mixer element.

    • volume policy

      Defines how to handle the volume of the ALSA mixer element. It may be either ignored, unconditionally disabled, unconditionally set to a constant, or merged into the value of PulseAudio volume slider.

    • switch value

      Defines how to handle the value of a switch ALSA mixer element. It may be either ignored, unconditionally set to a constant, used for muting and unmuting, or made selectable by the user via an option.

    • enumeration value

      Defines how to handle the value of enumeration ALSA mixer element. It may be either ignored, or made selectable by the user via an option.

    • options

      Every option defines one alternative value of a switch or enumeration ALSA mixer element. This value is made selectable by the user.

    Physically it is an [Element] section in the path file.

  • option

    Represents one alternative value of a switch or enumeration ALSA mixer element. Contains identifier of the ALSA mixer element and its value.

    Physically it is an [Option] section in the path file.

When UCM is not available for a card, PulseAudio uses Udev rules to select an appropriate profile set for the card:

  • PulseAudio installs Udev rules that match known audio card devices by vendor and product identifiers and set PULSE_PROFILE_SET property for them. The property contains a name of a profile set .conf file.

  • When PulseAudio configures a new ALSA card that has no UCM support, it reads the PULSE_PROFILE_SET property set by Udev rules and loads the appropriate profile set file. The file defines how to create and configure card profiles, device ports, sources, and sinks.

  • If an ALSA card was not matched by Udev rules and the PULSE_PROFILE_SET property was not set, PulseAudio uses the default profile set, which contains a reasonable configuration for most cards.

The diagram below illustrates relations between PulseAudio and ALSA objects when UCM is not used.

The mapping of the PulseAudio object hierarchy to the ALSA object hierarchy is the following:

  • PulseAudio card is associated with an ALSA card and a profile set defined in configuration files. One PulseAudio card is created for every ALSA card, and one profile set is selected for every card.

  • PulseAudio card profile is associated with a profile defined in configuration files. For every card, one PulseAudio profile is created for every profile in the profile set of the card.

  • PulseAudio source is associated with a mapping defined in configuration files, and with the PCM interface of the capture device matched by the device mask of the mapping. For every card, one PulseAudio source is created for every mapping in the currently active profile.

  • PulseAudio sink is associated with a mapping defined in configuration files, and with the PCM interface of the playback device matched by the device mask of the mapping. For every card, one PulseAudio sink is created for every mapping in the currently active profile.

  • PulseAudio device port is associated with a combination of a path and options defined in configuration files. For every source or sink, one PulseAudio device port is created for every possible combination of one path and a subset of all options of all elements of this path.

This is how the mapping is used:

  • The card defines what ALSA card is used and what profile set is used, and so what profiles are available.

  • The currently active card profile of the card defines what mappings are available, and so what sources and sinks are available.

  • The source or sink defines what ALSA device is used, and what mapping is used, and so what device ports are available.

  • The currently active device port of the source or sink defines what path is used, what jacks are probed, what elements are used for getting and setting volume and how, and what combination of options of elements of the path is used.

  • The currently active elements, their volume policies, and their options define how to configure ALSA mixer elements of the ALSA device.

Bluetooth backend

component

module-bluetooth-discover

module-bluez5-{discover,device}

module-bluez4-{discover,device}

PulseAudio supports Bluetooth, a wireless protocol stack for exchanging data over short distances. See Bluetooth page for details. PulseAudio relies on two backends for Bluetooth support:

  • BlueZ (PulseAudio supports versions 4 and 5, but we discuss only version 5)
  • oFono (for HFP support)

Bluetooth specification defines numerous Bluetooth profiles which may be supported by a device. Each profile describes the protocols, codecs, and device roles to be used. A Bluetooth device may support a subset of defined profiles and roles. Note that Bluetooth profiles and roles are different from the PulseAudio card profiles and stream roles.

PulseAudio supports three Bluetooth profiles:

  • A2DP (Advanced Audio Distribution Profile)

    Profile for high-quality audio streaming. Usually used to stream music.

    The roles of the two connected A2DP devices are:

    • Source role (SRC) - the device that sends audio
    • Sink role (SNK) - the device that receives audio

    PulseAudio supports both roles. For every discovered A2DP device, two options are available:

    • for SRC device, the server may create a single PulseAudio source which acts as an SNK device
    • for SNK device, the server may create a single PulseAudio sink which acts as an SRC device

  • HSP (Headset Profile)

    Profile for phone-quality audio playback and recording. Usually used for phone calls.

    The roles of the two connected HSP devices are:

    • Headset role (HS) - the device with the speakers and microphone, e.g. a headset
    • Audio Gateway role (AG) - the device that serves as a gateway to an external service, e.g. a mobile phone connected to a cellular network

    PulseAudio supports both roles. It can communicate with a headset or be a headset itself for another device. For every discovered HS or AG device, the server may create a pair of PulseAudio source and sink which together act as an AG or HS device.

  • HFP (Hands-Free Profile)

    Provides all features of HSP plus some additional features for managing phone calls.

    The roles of the two connected HFP devices are:

    • Hands-Free Unit role (HF) - the device with the speakers and microphone, e.g. a portable navigation device
    • Audio Gateway role (AG) - the device that serves as a gateway to an external service, e.g. a mobile phone connected to a cellular network

    PulseAudio supports both roles. It can communicate with an HF unit or be an HF unit itself for another device. For every discovered HF or AG device, the server may create a pair of PulseAudio source and sink which together act as an AG or HF device.

PulseAudio card profile is associated with a Bluetooth profile and role. The following card profiles are available:

  • High Fidelity Playback (A2DP Sink) - the PulseAudio card will provide a single PulseAudio sink which acts as an A2DP SRC device

  • High Fidelity Capture (A2DP Source) - the PulseAudio card will provide a single PulseAudio source which acts as an A2DP SNK device

  • Headset Head Unit (HSP/HFP) - the PulseAudio card will provide a pair of PulseAudio source and sink which together act as an AG device

  • Headset Audio Gateway (HSP/HFP) - the PulseAudio card will provide a pair of PulseAudio source and sink which together act as an HS or HF device

Bluetooth backend listens to BlueZ and oFono events on D-Bus and automatically creates PulseAudio cards and hardware sources and sinks for all discovered Bluetooth devices.

The mapping of the PulseAudio object hierarchy to the Bluetooth object hierarchy is the following:

  • PulseAudio card is associated with a Bluetooth device. One PulseAudio card is created for every discovered Bluetooth device.

  • PulseAudio card profile is associated with a Bluetooth profile and role. One of the predefined PulseAudio card profiles is created for every available operation mode supported by the Bluetooth device.

  • One PulseAudio source and/or one PulseAudio sink is created for every PulseAudio card depending on the currently active card profile.

  • One PulseAudio device port is created for every PulseAudio source or sink.

This is how the mapping is used:

  • The card defines what Bluetooth device is used and what profiles are available.

  • The currently active card profile defines what Bluetooth profile and role are used, and so what transport protocols and codecs are used.

JACK backend

component

module-jackdbus-detect

module-jack-{source,sink}

JACK (JACK Audio Connection Kit) is a professional sound server that provides realtime low-latency connections between applications and hardware. Like PulseAudio, JACK may work on top of several backends, including ALSA.

However, their design goals are different. See comments from PulseAudio authors and JACK authors:

  • PulseAudio design is focused on consumer audio for desktop and mobile. It offers seamless device switching, automatic setup of hardware and networking, and power saving. It can’t guarantee extremely low latency. Instead, it usually adjusts latency dynamically to provide lower battery usage and better user experience even on cheap hardware.

  • JACK design is focused on professional audio hardware and software. It offers the lowest possible latency and may connect applications directly to devices or each other. It doesn’t try to provide the smooth desktop experience to the detriment of performance or configurability and is targeted to advanced users.

There are three alternative options to use PulseAudio and JACK on the same system:

  • Use them for different sound cards. JACK can ask PulseAudio to release an ALSA card via device reservation API.
  • Suspend PulseAudio when JACK is running using pasuspender tool.
  • Configure PulseAudio to use JACK backend instead of ALSA.

JACK backend for PulseAudio monitors JACK ports published on D-Bus. For every detected port, PulseAudio automatically creates a virtual source or sink, associated with this port.

PulseAudio uses two threads for every JACK source and sink: one realtime thread for the JACK event loop, and another for the PulseAudio one. The reason for an extra thread is that it’s not possible to add custom event sources to the JACK event loop, hence PulseAudio event loop can’t be embedded into it. The extra thread costs extra latency, especially if PulseAudio is not configured to make its threads realtime using rtkit.

Other backends

The following backends are available but have limited functionality:

  • OSS

    component

    module-detect

    module-oss

    OSS (Open Sound System) is an older interface for making and capturing sound in Unix and Unix-like operating systems. Nowadays it is superseded by ALSA on Linux but is used on some other Unix systems. Many systems, including Linux and various *BSD variants, provide a compatibility layer for OSS applications.

    OSS backend implements virtual source and sink for OSS devices. Each one is connected to a single device, usually /dev/dspN. At startup, PulseAudio can automatically create a sink and source for every available OSS device.

  • Solaris

    component

    module-detect

    module-solaris

    Solaris backend implements virtual source and sink for /dev/audio device available in Solaris and some *BSD variants, also known as “Sun audio” or “Sunau” and originally appeared in SunOS. This device supports the Au file format.

    At startup, PulseAudio can automatically create one sink and one source for /dev/audio device if it is present.

  • CoreAudio

    component

    module-coreaudio-{detect,device}

    CoreAudio is a low-level API for dealing with sound in Apple’s MacOS and iOS operating systems.

    CoreAudio backend monitors available devices and automatically creates card, sink, and source for every detected device. No card profiles and device ports are implemented.

  • WaveOut

    component

    module-detect

    module-waveout

    WaveOut backend implements virtual source and sink for the legacy Win32 WaveIn/WaveOut interfaces. They are part of MultiMedia Extensions introduced in Windows 95 and still supported in recent Windows versions (with some issues).

    Each source or sink is connected to a single device. At startup, PulseAudio can automatically create one sink and one source for the first available device if it is running on Windows.

  • ESound

    component

    module-esound-sink

    ESound backend implements virtual sink acting as a client for Enlightened Sound Daemon. It doesn’t implement source. The documentation recommends avoiding using this sink because of latency issues.

    Note that PulseAudio server is also able to emulate ESound server.

Hotplug support

component

module-{udev,jackdbus,coreaudio}-detect

module-{bluetooth,bluez5,bluez4}-discover

Hotplug is currently implemented for the following backends:

  • ALSA (using libudev)
  • Bluetooth (using BlueZ)
  • JACK (using D-Bus JACK API)
  • CoreAudio

In particular, PulseAudio uses libudev to detect ALSA cards (both with and without UCM support). The server creates Udev monitor and filters events for sound card devices:

  • when a new device is inserted, the server creates a card, card profiles, device ports, sources, and sinks, as described above
  • when the device is removed, all these objects are removed as well

Hardware controls

PulseAudio server has support for hardware controls. The user should manually specify a sink, and the server will forward volume up/down and mute requests to it.

Two types of controls are supported:

  • IR remote control

    component

    module-lirc

    Infrared remote controls are handled using LIRC (Linux Infrared Remote Control).

  • Multimedia buttons

    component

    module-mmkbd-evdev

    Multimedia buttons available on some keyboards are handled using evdev, a generic input event interface in the Linux kernel, usually used in programs like X server and Wayland.


Sound processing

PulseAudio implements various sound processing tools. Some of them are enabled automatically when necessary (like sample rate conversion), and others should be explicitly configured by the user (like echo cancellation).

Resampler

Every source, sink, source output, and sink input may use its own audio parameters:

  • sample format (e.g. 32-bit floats in native endian)
  • sample rate (e.g. 44100Hz)
  • channel map (e.g. two channels for stereo)

Source output and sink input are responsible for performing all necessary conversions when they are invoked by source or sink. To achieve this, they configure resampler with appropriate input and output parameters and then run it frame-by-frame.

When resampler is configured, it tries to select an optimal conversion algorithm for requested input and output parameters:

  • chooses sample rate conversion method
  • chooses the working sample format for sample rate conversion (some methods benefit from using a higher precision)
  • calculates channel mapping (taking into account channel names and meaning)

For every frame, resampler performs the following steps:

  • converts frame from input sample format to working sample format
  • maps input channels to output channels
  • converts sample rate from input rate to output rate
  • if LFE channel (subwoofer) is used, applies the LR4 filter
  • converts frame from working sample format to output sample format

Each step is performed only if it’s needed. For example, if input sample format and working sample format are the same, no conversion is necessary.

Sample rate conversion usually operates at fixed input and output rates. When a client creates a stream, it may enable variable rate mode. In this case, input or output rate may be updated on the fly by explicit client request.

The user can specify what method to use in the server configuration files. You can find a comparison of hardware and some software resampler methods in this post.

The following methods are supported:

  • speex

    Fast resampler from Speex library. If PulseAudio was built with speex support, used by default.

  • ffmpeg

    Fast resampler from FFmpeg library. If PulseAudio was built without speex support, and variable rate mode is not requested, used by default.

  • src

    Slower but high-quality resampler from Secret Rabbit Code (libsamplerate) library. Used in some PulseAudio modules.

  • sox

    Slower but high-quality resampler from SoX library. Not used by default.

  • trivial

    Built-in low-quality implementation used as a fallback when PulseAudio was built without speex support, and ffmpeg can’t be used because variable rate mode was requested.

    Instead of interpolation, it uses decimation (when downsampling) or duplication (when upsampling).

  • copy

    No-op implementation used when input and output sample rates are the same.

  • peaks

    Pseudo resampler that finds peaks. It is enabled when a client requests peak detection mode. Instead of interpolation, it calculates every output sample as a maximum value in the corresponding window of input samples.

    This mode is usually used in GUI applications like pavucontrol that want to display volume level.

Mixing and volumes

component

libpulse

libpulsecore

When multiple sink inputs are connected to one sink, the sink automatically mixes them, taking into account per-channel volume settings. See the Volumes and Writing Volume Control UIs pages.

Every source, sink, and sink input (but not source output) has its own per-channel volume level that may be controlled via both C API and D-Bus API.

The way the sink and sink input volumes are combined is determined by the flat volumes mode (inspired by Windows Vista):

  • When flat volumes are enabled, the volume of the sink is always the maximum volume of all sink inputs connected to it. When the sink input volume is updated, the sink volume is recalculated too. When the sink volume is updated, all sink input volumes are scaled equally.

  • When flat volumes are disabled, each sink input has its own volume, considered to be relative to the volume of the sink to which it is connected.

This mode may be enabled per-sink or globally (default in many distros). An application can also disable flat volumes per stream using the relative volume flag.

There are two kinds of volumes:

  • hardware volumes are used for hardware sources and sinks that support it
  • software volumes are used everywhere else, in particular for hardware sources and sinks that don’t support hardware volume, and for sink input mixing

Volumes span from 0% to 100%, which are respectively the silence and the maximum volume that the sound hardware is capable of.

Software volumes use the logarithmic scale. Hardware volumes generally use an unspecified scale. However, volumes of hardware sources and sinks that have the decibel volume flag and volumes of all sink inputs may be converted to and from the decibel scale using dedicated API functions.

Finally, virtual sources and sinks that are attached to a master source or sink usually use volume sharing mode. When it is enabled, the source or sink always uses the same volume as its master.
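
For illustration, here is a minimal sketch that sets the volume of a sink input to -6 dB using the introspection API. The sink input index (42) is illustrative; a real client would discover it via pa_context_get_sink_input_info_list(). Error handling is omitted, and the program never quits the main loop.

    #include <pulse/pulseaudio.h>

    static void context_state_cb(pa_context *c, void *userdata) {
        if (pa_context_get_state(c) == PA_CONTEXT_READY) {
            pa_cvolume vol;
            /* the same volume for both channels of a stereo stream */
            pa_cvolume_set(&vol, 2, pa_sw_volume_from_dB(-6.0));
            pa_operation_unref(
                pa_context_set_sink_input_volume(c, 42, &vol, NULL, NULL));
        }
    }

    int main(void) {
        pa_mainloop *ml = pa_mainloop_new();
        pa_context *ctx = pa_context_new(pa_mainloop_get_api(ml), "volume-example");

        pa_context_set_state_callback(ctx, context_state_cb, NULL);
        pa_context_connect(ctx, NULL, PA_CONTEXT_NOFLAGS, NULL);

        int ret = 0;
        pa_mainloop_run(ml, &ret);   /* runs until pa_mainloop_quit() is called */
        return ret;
    }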

Volume range

component

libpulse

libpulsecore

module-alsa-{source,sink}

The device volume range is virtually divided into three subranges:

  • [muted, base]
  • [base, norm]
  • [norm, norm * n]

The points on the boundaries are the following:

  • “muted”

    Constant. Complete silence (0%, -inf dB).

  • “base”

    Determined dynamically for every device. Defines a backend-specific default volume that may not be as loud as the “norm” volume. It may be equal to the “norm” volume.

    Mapped to the volume where the analog output is at some kind of normalized, pre-defined voltage level. For S/PDIF cards, it is set to the volume where the output PCM samples are unscaled.

  • “norm”

    Constant. Maximum hardware volume of a card (100%, 0 dB). If a card includes a proper amplifier, this volume may be very loud.

    For cards with an amplifier, volumes below this point employ hardware amplification, and volumes above this point employ digital (software) amplification.

    For cards without an amplifier, digital amplification is always used, and volumes above the “norm” make no sense and are equal to the “norm” volume.

  • “norm * n”

    The maximum volume that a GUI allows the user to set, e.g. “norm * 2”.

    A GUI uses a maximum volume above the “norm” to let the user employ additional digital amplification, which may be useful on some cards that have an amplifier.

Passthrough

By default, PulseAudio uses uncompressed PCM everywhere. However, some input and output devices support various compressed audio encodings.

PulseAudio provides passthrough mode that may be enabled for a source, sink, source output, and sink input. With this mode, samples may be sent in the device-specific compressed encoding.

When a client creates a stream, it may enable passthrough mode. In this case, passthrough mode is enabled for corresponding source output or sink input.

Note that a passthrough source output may be connected only to a passthrough source, and a passthrough sink input may be connected only to a passthrough sink.

Currently, only several S/PDIF (IEC61937) encodings are supported:

  • IEC61937 DTS
  • IEC61937 AC3
  • IEC61937 EAC3
  • IEC61937 MPEG
  • IEC61937 MPEG2 AAC

PulseAudio server doesn’t automatically detect actual encodings supported by hardware. However, PulseAudio client can manually enable or disable encodings for every source and sink using introspection API. The server stores a list of enabled encodings for every source and sink in a database, so these settings are persistent.

The user can enable encodings via pavucontrol GUI. Other applications may check which encodings are enabled for a source or sink and use passthrough mode if they support one of the enabled encodings.
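
For illustration, here is a sketch that creates a passthrough playback stream for one of these encodings. It assumes ctx is an already connected pa_context in the READY state (see the volume example earlier for the main loop boilerplate); error handling is omitted.

    #include <pulse/pulseaudio.h>

    static pa_stream *create_ac3_stream(pa_context *ctx) {
        pa_format_info *f = pa_format_info_new();
        f->encoding = PA_ENCODING_AC3_IEC61937;     /* compressed S/PDIF encoding */
        pa_format_info_set_rate(f, 48000);
        pa_format_info_set_channels(f, 2);

        pa_format_info *formats[1] = { f };
        pa_stream *s = pa_stream_new_extended(ctx, "ac3 passthrough", formats, 1, NULL);

        /* the stream may only be connected to a sink with this encoding enabled */
        pa_stream_connect_playback(s, NULL, NULL, PA_STREAM_PASSTHROUGH, NULL, NULL);

        pa_format_info_free(f);
        return s;
    }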

Virtual devices and streams

PulseAudio provides several sound processing tools implemented as virtual devices (sources and sinks) and virtual streams (source outputs and sink inputs).

  • sink monitor

    Sink monitor reads samples written to the sink.

    Every sink automatically gets an associated sink monitor. Every time when the sink reads a chunk from its sink inputs, it writes this chunk to the sink monitor.

    The sink monitor is a sink-to-source-output adapter.

  • loopback

    component

    module-loopback

    Loopback forwards audio from a source to a sink.

    Loopback is implemented as a pair of a source output and a sink input with a queue in between. The source and sink may have different clocks. To deal with this, loopback adjusts the resampler rate on the fly to maintain a fixed latency calculated from the queue size.

    Loopback is a source-to-sink adapter.

  • null source and sink

    component

    module-null-{source,sink}

    The null sink silently drops all data from the sink inputs. The null source writes silence to the connected source outputs.

    As any other sink, the null sink has an associated sink monitor, which can be used to read all data written to the sink. Hence, the null sink together with its sink monitor is a sink-input-to-source-output adapter.

  • combine sink

    component

    module-combine-sink

    This sink combines multiple sinks into one.

    All data written to this sink is forwarded to all connected sinks. Combine sink creates a sink input for every connected sink and duplicates incoming data to every sink input.

    Different sinks may have different clocks. To deal with this, the combine sink adjusts the sample rate of every sink input, which performs resampling and may be quite CPU-intensive. By default, the “trivial” rate conversion method is used.

  • sine source and sink input

    component

    module-sine-source

    module-sine

    Sine source and sine sink input generate a sine wave with the preconfigured frequency.

    Sine source may be connected to source outputs (e.g. application recording stream or an RTP sender). Sine sink input may be connected to a sink (e.g. ALSA sink or tunnel sink).

  • pipe source and sink

    component

    module-pipe-{source,sink}

    Pipe source or sink reads or writes samples to a preconfigured file on disk. This file may be a named pipe (FIFO).

Filter devices

Filter sources and sinks are a special category of virtual devices. Such source or sink has a special virtual source output or sink input connected to another, master source or sink.

There are two kinds of filters:

  • regular filters

    Such filter creates one virtual source or sink and one virtual source output or sink input connected to the master source or sink.

  • group filters

    Such filter creates a pair of connected virtual source and sink and a pair of virtual source output and sink input, connected to a pair of master source and sink.

PulseAudio treats filter devices specially in several cases:

  • volume sharing

    The filter source or sink always uses the same volume as its master source or sink.

  • autoloading

    The filter source or sink may be automatically loaded and connected to a stream based on stream properties set by an application.

  • routing rules

    Automatic routing rules have special cases for autoloaded filter sources and sinks.

Regular filters

Several regular filters are available:

  • remap source and sink

    component

    module-remap-{source,sink}

    Remap source and sink act as a proxy for a master source or sink, performing statically configured channel remapping on top of it.

  • equalizer sink

    component

    module-equalizer-sink

    This sink acts as a proxy for a master sink, implementing STFT OLA-based digital equalizer on top of it. The equalizer may be configured on the fly via a D-Bus interface.

  • virtual surround sink

    component

    module-virtual-surround-sink

    This sink acts as a proxy for a master sink, performing a convolution with a prerecorded HRIR WAV file to emulate surround sound when using headphones.

  • virtual source and sink

    component

    module-virtual-{source,sink}

    The source code of the virtual source and sink may be used as a reference when writing a new filter source or sink.

    Virtual source reads the data written to the master sink. It may be used as a sink-input-to-source-output adapter, just like the null sink with its sink monitor.

    Virtual sink simply forwards all data to the master sink. Its primary purpose is to demonstrate the implementation of a filter sink.

Echo cancellation filter

component

module-echo-cancel

Acoustic echo cancellation (AEC) is implemented as a group filter. It creates a connected pair of virtual source and sink, each acting as a proxy for a master source and sink.

Here is how it works:

  • When data is sent to the virtual sink, it is forwarded to the master sink. Additionally, the virtual sink stores a frame of recently written samples.

  • When data is read from the virtual source, it is forwarded from the master source. Additionally, the echo is canceled using the sample frame stored in the virtual sink.

Several AEC engines are implemented; the engine to use is selected when the module is loaded (among them are implementations from the Speex and WebRTC libraries).

LADSPA plugin sink

component

module-ladspa-sink

LADSPA (Linux Audio Developer’s Simple Plugin API) is a standard API for plugins implementing audio filters and audio signal processing effects.

PulseAudio implements LADSPA support as a filter sink. Each LADSPA sink loads a single LADSPA plugin from a shared library. Plugin parameters may be configured when the sink is created or on the fly via the D-Bus API.

A concrete example may be found in this post, which demonstrates how to configure PulseAudio to use an LADSPA plugin for Dynamic Range Compression.

Note that LADSPA Version 2 (LV2) standard exists, but it’s not supported in PulseAudio.

Constructing chains

Modules add new functionality to the server by implementing sources, source outputs, sinks, and sink inputs. The user then may combine them into a chain. However, only two types of direct connections are allowed:

  • source output to a source
  • sink input to a sink

When this is not enough, the elements of a chain have to be connected indirectly using one of the available adapters.

The table below summarizes all possible direct and indirect connections. A table row defines from where to read the samples. A table column defines to where to write the samples.

Combinations not listed in the table aren’t possible: it’s not possible to read samples from a source output, or to write samples to a sink input or a source.

                    to source output             to sink
  from source       directly                     loopback
  from sink         sink monitor                 sink monitor + loopback
  from sink input   null sink + sink monitor     directly
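
For example, the “from sink to sink” case from the table (sink monitor + loopback) may be set up by loading module-loopback with the first sink’s monitor source; the sink names here are illustrative:

    load-module module-loopback source=sink_a.monitor sink=sink_b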

Sample cache

The sample cache is an in-memory storage for short named batches of samples that may be uploaded to the server once and then played multiple times.

The sample cache is useful for event sounds. It’s also a simple way to overcome the network latency issues because the samples are played after they are uploaded to the server.

Clients may create, remove, and play the sample cache entries using several protocols:

  • “native”
  • ESound
  • D-Bus
  • CLI

There are several different methods of uploading samples to the sample cache:

  • from stream

    The client connects an existing playback stream to a sample cache entry. All samples written to the stream will be temporarily sent to the sample cache entry instead of the sink input associated with the stream.

    This method is used in the “native” protocol.

  • from payload

    The client directly sends the samples to a sample cache entry.

    This method is used in the D-Bus and ESound protocols.

  • from file

    The client asks the server to load the samples from an audio file on disk.

    The file can be either loaded immediately or lazily. In the latter case, the server loads the file in memory only when it should be played, and automatically unloads it if it wasn’t used for some period of time.

    This method is used in the CLI protocol.

When the client asks the server to play a sample cache entry, the server creates a new sink input that reads samples from the entry.

Depending on the protocol, the client may also provide additional properties for the new sink input, which, among other things, may be used by the routing algorithm to choose a sink to connect it to.
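
For illustration, here is a sketch of the “from stream” upload method and of playing the uploaded entry, using the “native” protocol C API. It assumes ctx is a connected pa_context in the READY state; a real client would also wait for the upload stream to become ready and for write requests before writing. The entry name “beep” is illustrative.

    #include <pulse/pulseaudio.h>

    static void upload_and_play(pa_context *ctx) {
        static const pa_sample_spec spec = {
            .format = PA_SAMPLE_S16LE, .rate = 44100, .channels = 2
        };
        static int16_t data[44100 * 2];             /* one second of silence */

        /* the stream name becomes the name of the sample cache entry */
        pa_stream *s = pa_stream_new(ctx, "beep", &spec, NULL);
        pa_stream_connect_upload(s, sizeof(data));  /* upload stream, not playback */
        pa_stream_write(s, data, sizeof(data), NULL, 0, PA_SEEK_RELATIVE);
        pa_stream_finish_upload(s);

        /* later: the server creates a sink input that plays the cached entry */
        pa_operation_unref(
            pa_context_play_sample(ctx, "beep", NULL, PA_VOLUME_NORM, NULL, NULL));
    }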


Stream management

component

libpulsecore

module-native-protocol-{fd,unix,tcp}

Clients which use the “native” protocol should create playback or recording streams in order to exchange samples with the server. Every stream is associated with a sink input or source output on the server, which may be connected to a sink or source. The client and server then exchange asynchronous commands and chunks of samples through the stream.

The diagram below illustrates the logical data flow from an application to a sound card. Note that it doesn’t reflect that the client application, the native protocol module, the sink module, and the sound card are actually separate execution threads connected via queues.

Stream types

There are three types of the “native” protocol streams:

  • recording stream

    A recording stream has a corresponding source output that is connected to a source. Samples are sent from the server to client. Control commands are sent in both directions.

  • playback stream

    A playback stream has a corresponding sink input that is connected to a sink. Samples are sent from the client to server. Control commands are sent in both directions.

  • upload stream

    An upload stream has a corresponding sink input that is connected to a sample cache entry. Samples are sent from the client to server. Control commands are sent in both directions.

Client to server

The client may send the following stream commands to the server:

  • write or read samples

    The client sends (for playback streams) or receives (for recording streams) a chunk of samples. When sending samples, the client may either append samples to the stream or specify a seek offset and overwrite previously sent samples.

  • stream properties

    The client gets or sets various stream properties, including timing info (current position and latency), sample spec (sample size, sample rate, and number of channels), channel map (bitmask of enabled channels), format info (stream encoding, PCM or hardware-specific like S/PDIF), buffer attributes (maximum and target buffer size, prebuffering size, request size), and stream proplist (list of arbitrary named properties).

  • stream state

    The client gets current stream state. A stream may be playing, suspended (to save power when the sink is idle), or corked (paused).

  • pause and resume

    The client corks (pauses) or uncorks (resumes) the stream. The stream is paused as soon as possible without waiting for the full latency period.

  • prebuffering

    The client sends prebuf command (start prebuffering) or trigger command (stop prebuffering). When prebuffering is started for a playback stream, the stream is paused until the server-side buffer accumulates the required amount of samples. When the server receives enough samples from the client, it automatically starts the stream and disables prebuffering.

  • flush

    The client drops all samples from the server-side stream buffer.

  • drain

    The client asks the server to inform it when the server has read all samples from the server-side buffer of a playback stream, so that the buffer becomes empty and an underflow occurs. When this happens, all samples sent by the client have already been passed to the sink, though they probably haven’t reached the sound card yet.
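
For illustration, here is a sketch of the C API calls behind these commands, assuming s is an already connected playback pa_stream and chunk points to samples in the stream’s sample spec. All calls are asynchronous; the returned operations are unreferenced immediately for brevity.

    #include <pulse/pulseaudio.h>

    static void flow_control(pa_stream *s, const void *chunk, size_t size) {
        /* append samples, then overwrite them by seeking back
         * relative to the write index */
        pa_stream_write(s, chunk, size, NULL, 0, PA_SEEK_RELATIVE);
        pa_stream_write(s, chunk, size, NULL, -(int64_t)size, PA_SEEK_RELATIVE);

        pa_operation_unref(pa_stream_cork(s, 1, NULL, NULL));   /* pause */
        pa_operation_unref(pa_stream_cork(s, 0, NULL, NULL));   /* resume */

        pa_operation_unref(pa_stream_prebuf(s, NULL, NULL));    /* restart prebuffering */
        pa_operation_unref(pa_stream_trigger(s, NULL, NULL));   /* stop prebuffering, play now */

        pa_operation_unref(pa_stream_flush(s, NULL, NULL));     /* drop buffered samples */
        pa_operation_unref(pa_stream_drain(s, NULL, NULL));     /* notify when buffer is empty */
    }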

Server to client

The client may register callbacks for the following stream commands from server:

  • request to write or read

    The server requests the client to send (for playback streams) or receive (for recording streams) more samples. The client should send or receive samples only when requested by the server, so that it is clocked by the sound card timer.

  • underflow and overflow

    The server notifies the client that an underflow (underrun) or overflow (overrun) occurred. An underflow occurs when trying to read from an empty stream buffer. An overflow occurs when trying to write to a full stream buffer.

  • stream started

    The server notifies the client that the stream was automatically started after prebuffering or an underrun.

  • stream suspended

    The server notifies the client that the stream was automatically suspended. When all streams connected to a source or sink remain paused for some period of time, the source or sink and all streams connected to it are suspended to save power.

  • stream moved

    The server notifies the client that the stream was moved to another source or sink.

  • stream event

    The server may send custom events to the client with a textual name and an arbitrary binary payload. Currently, three event types exist: request-cork (the client should pause the stream), request-uncork (the client should unpause the stream), and format-lost (the stream was moved to another source or sink that doesn’t support the encoding currently in use).
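
For illustration, here is a sketch registering callbacks for these commands via the C API, assuming s is a pa_stream that has been created but not yet connected:

    #include <stdio.h>
    #include <pulse/pulseaudio.h>

    static void on_underflow(pa_stream *s, void *userdata) { puts("underflow"); }
    static void on_overflow(pa_stream *s, void *userdata)  { puts("overflow"); }
    static void on_started(pa_stream *s, void *userdata)   { puts("started"); }

    static void on_suspended(pa_stream *s, void *userdata) {
        puts(pa_stream_is_suspended(s) ? "suspended" : "resumed");
    }

    static void on_moved(pa_stream *s, void *userdata) {
        printf("moved to %s\n", pa_stream_get_device_name(s));
    }

    static void on_event(pa_stream *s, const char *name, pa_proplist *pl, void *userdata) {
        printf("event: %s\n", name);   /* e.g. request-cork, request-uncork, format-lost */
    }

    static void register_callbacks(pa_stream *s) {
        pa_stream_set_underflow_callback(s, on_underflow, NULL);
        pa_stream_set_overflow_callback(s, on_overflow, NULL);
        pa_stream_set_started_callback(s, on_started, NULL);
        pa_stream_set_suspended_callback(s, on_suspended, NULL);
        pa_stream_set_moved_callback(s, on_moved, NULL);
        pa_stream_set_event_callback(s, on_event, NULL);
    }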

Buffering

The diagram below shows what buffers are used to transfer sample chunks from the client to the sound card. See also this post for an overview of buffering in PulseAudio.

The following buffers are employed:

  • device buffer

    The sink reads samples from connected sink inputs and stores them into the device buffer. In the case of an ALSA sink, this is the kernel-side ALSA ring buffer. The sound card reads samples from it.

    The size of this buffer is equal to the minimum target stream buffer size among all the sink inputs connected to the sink. This gives the largest possible latency that still meets the requirements of all clients.

  • render queue

    The sink input returns samples from its render queue. When the render queue doesn’t contain enough samples, the sink input pops a chunk from the stream buffer, converts it to the format requested by the sink using the resampler, and pushes it to the render queue.

    The size of this buffer is equal to the size of the device buffer (for samples that were already passed to the sink, needed for rewinding) plus zero or a small amount of samples (for samples that were not yet passed to the sink, appearing only when the stream provided a larger chunk than requested and only a part of the chunk was read).

  • resampler buffer

    Depending on the resampling method, resampler may introduce its own buffering between the stream buffer and render queue.

    The size of this buffer is zero or one chunk.

  • stream buffer

    The server pushes chunks received from client to the stream buffer.

    The server maintains the stream buffer size near the target stream buffer size requested by the client via the buffer attributes. The server achieves this by adjusting the number of samples requested from the client each time. Therefore, a properly written client should try to respond with the requested number of samples, on average.

    The size of this buffer is equal to the size of render queue (for samples that were already passed to the render queue, needed for rewinding) plus some amount of samples (for samples that were not yet passed to the render queue, needed to achieve the target latency).

  • socket buffer(s)

    When the client sends a chunk to the server, the chunk may pass through zero, one, or two socket buffers:

    • If a TCP socket is used, both client and server have their own kernel-side socket buffers. Two socket buffers in total.

    • If a Unix domain socket is used, the client and server share the same kernel-side socket buffer. One socket buffer in total.

    • If a Unix domain socket is used, and the zero-copy mode is enabled, the client and server use the same user space shared memory pool, so that no kernel-side buffer is used. In this case, the client allocates a chunk and sends its ID to the server, and the server pushes it to the stream buffer. Zero socket buffers in total.

    The size of these buffers is determined only by network or scheduling delay. Nagle’s algorithm is disabled by PulseAudio for TCP sockets and is never used for Unix domain sockets.

  • client buffer

    The client performs no buffering except a single chunk that it’s going to send to the server.

    So the size of this buffer is no more than one chunk.
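
For illustration, here is a sketch of how a client requests a target stream buffer size via the buffer attributes when connecting a playback stream. It assumes s is a playback pa_stream with an S16LE, 44100 Hz, stereo sample spec, so that 100 ms is 17640 bytes; fields set to (uint32_t)-1 are left for the server to choose.

    #include <pulse/pulseaudio.h>

    static void connect_with_latency(pa_stream *s) {
        pa_buffer_attr attr;
        attr.maxlength = (uint32_t)-1;    /* overall buffer limit: server default */
        attr.tlength   = 17640;           /* target stream buffer size: ~100 ms */
        attr.prebuf    = (uint32_t)-1;    /* prebuffering threshold: server default */
        attr.minreq    = (uint32_t)-1;    /* minimum request size: server default */
        attr.fragsize  = (uint32_t)-1;    /* used only for recording streams */

        /* ADJUST_LATENCY asks the server to resize the device buffer so that
         * the overall latency approaches the requested target */
        pa_stream_connect_playback(s, NULL, &attr,
                                   PA_STREAM_ADJUST_LATENCY, NULL, NULL);
    }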

Rewinding

Rewinding is a process of overwriting existing samples in buffers instead of appending to them. Implementation details are described in the Rewinding page.

It is an important feature that is necessary to combine a higher latency for playback (to reduce glitches) with a lower latency for out-of-band requests like pause and volume changes (to improve user experience).

For example, the volume change should be applied immediately even when the playback latency is 2s. To achieve this, all buffers that contain samples with an older volume value are rewound and refilled.

Rewinding and buffers

Rewinding works by moving read and write pointers of the ring buffers.

The three buffers are employed:

  • device buffer

    Normally, the sound card moves the read pointer forward, and the sink moves the write pointer forward. On rewind, the sink moves the write pointer backward. The read pointer can’t be moved backward because the sound card has already played the samples.

  • render queue

    Normally, the sink moves the read pointer forward, and the sink input moves the write pointer forward. On rewind, the sink input moves the read pointer backward, so that the sink can re-read required amount of samples.

    To support this, the render queue always keeps some amount of samples before the read pointer, equal to the size of the device buffer.

  • stream buffer

    Normally, the sink input moves the read and write pointers forward. On rewind, the read pointer is moved backward.

    To support this, the stream buffer always keeps some amount of samples before the read pointer, equal to the size of the render queue, which includes the size of the device buffer.

Rewind processing

A rewind is separated into two parts:

  • rewind request
  • rewind processing

Rewind request may be issued on a sink input or sink. Sink input always propagates rewind requests to the sink. Rewind processing always starts from the sink and then goes down to all connected sink inputs.

The diagram below illustrates the overall algorithm.

The steps are:

  • If a rewind was requested for a sink input, it is propagated to the sink. If a rewind was requested for a sink, it is processed.

  • The sink processes the rewind. It moves back the write pointer of the device buffer as much as possible. The rewind request may be truncated if some samples were already played and can’t be rewound.

  • The sink asks all connected sink inputs to process the rewind and passes them the number of samples actually rewound.

  • The sink input processes the rewind. First, it moves back the read pointer of the render queue. If it’s not enough, it also moves back the read pointer of the stream buffer.

Rewind requests

Here is the list of cases when a rewind request is issued on a sink input or sink:

  • stream write with a non-zero seek offset

    The client explicitly overwrites previously written samples by specifying a seek offset. Buffers are rewound to overwrite unplayed samples.

  • stream flushed

    The client explicitly drops buffered samples. Buffers are rewound to drop unplayed samples.

  • sink input corked or uncorked

    The sink input is paused or unpaused via the stream or introspection API or by an automatic rule. When the stream is paused, buffers are rewound to drop unplayed samples. When the stream is unpaused, buffers are rewound to start playing newly available samples immediately.

  • sink input volume changed or muted

    The sink input volume is changed or muted via the stream or introspection API, by the sink, or by an automatic rule. Buffers are rewound to apply the new volume immediately.

  • sink input removed or moved to another sink

    The sink input is removed or moved due to client disconnect, via the introspection API, or by an automatic rule. Buffers are rewound to drop unplayed samples of the stream.

  • sink input underrun ended

    The sink input was in underrun and has finally provided samples. Buffers are rewound to overwrite silence with the newly available samples and play them immediately.

  • sink volume changed or muted

    The sink volume is changed or muted via the introspection API, by hardware, or by an automatic rule. Buffers are rewound to apply the new volume immediately.

  • sink latency decreased

    The sink decreases its latency due to a new stream connection with a lower latency requirement or on a watermark decrease. Buffers are rewound to shrink the device buffer.

  • sink parameters changed

    The parameters of a virtual sink (like equalizer sink) are changed. Buffers are rewound to apply new parameters immediately.

Moving streams

At any time, a sink input or source output may be moved to another sink or source.

The move may be initiated explicitly by any application (typically via the mixer GUI like pavucontrol) or automatically by the routing policy (typically when a device is inserted or removed).

When the stream is moved, a rewind is requested to drop its samples from the sink or source it was previously connected to.

Synchronized streams

When a client creates a stream, it may configure it to be synchronized with another stream.

PulseAudio guarantees that all streams synchronized together always go sample-by-sample. To achieve this, it automatically propagates control commands issued on a stream (like pause and resume) to all synchronized streams.

It’s currently not possible to move a synchronized stream to another device.
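
For illustration, here is a sketch of creating a pair of synchronized playback streams via the C API; the second stream passes the first one as the sync_stream argument of pa_stream_connect_playback(). Both streams are assumed to be created on the same context with the same sample spec and to end up on the same sink.

    #include <pulse/pulseaudio.h>

    static void connect_synchronized(pa_stream *master, pa_stream *slave) {
        /* the first stream is connected as usual */
        pa_stream_connect_playback(master, NULL, NULL, PA_STREAM_NOFLAGS, NULL, NULL);

        /* the second stream is synchronized with the first one: corking,
         * flushing, and playback position are kept sample-accurate */
        pa_stream_connect_playback(slave, NULL, NULL, PA_STREAM_NOFLAGS, NULL, master);
    }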

Monitoring

The client can monitor existing devices and streams:

  • To monitor a source, the client just connects a recording stream to the source.

  • To monitor a source output, the client connects a recording stream to the source to which the source output is connected. This is enough because all source outputs connected to the same source get the same data.

  • To monitor a sink, the client connects a recording stream to the corresponding sink monitor.

  • To monitor a sink input, the client connects a recording stream to the corresponding sink monitor and sets the monitor stream of the recording stream to the identifier of the specific sink input to be monitored.
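
For illustration, here is a sketch of the last case: monitoring a single sink input via the C API. The monitor source name and the sink input index are illustrative; a real client would look them up via the introspection API. It assumes ctx is a connected pa_context.

    #include <pulse/pulseaudio.h>

    static pa_stream *monitor_sink_input(pa_context *ctx, uint32_t sink_input_idx) {
        static const pa_sample_spec spec = {
            .format = PA_SAMPLE_FLOAT32NE, .rate = 44100, .channels = 2
        };

        pa_stream *s = pa_stream_new(ctx, "monitor", &spec, NULL);

        /* restrict the recording stream to one sink input;
         * must be called before connecting the stream */
        pa_stream_set_monitor_stream(s, sink_input_idx);

        /* connect to the monitor source of the sink the sink input is connected to */
        pa_stream_connect_record(s, "alsa_output.0.monitor", NULL, PA_STREAM_NOFLAGS);
        return s;
    }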


Time management

Playback and recording are driven by a per-device timer-based scheduler that provides clocking and maintains optimal latency.

The diagram below illustrates the process. It shows the path of samples from an application (on the left) to a sound card (on the right) when using the “native” protocol.

Clocking

No two devices have exactly equal clocks. One of them is always slightly faster and the other slightly slower. This applies both to a pair of computers and to a pair of separately clocked devices on the same computer.

Since sound cards have their own clocks, an application can’t use a CPU timer to send samples to the sound card. Instead, the application should be clocked by the sound card, i.e. use a timer that runs in the sound card time domain.

In PulseAudio, clocking is provided by sources and sinks. A hardware source or sink runs a thread that writes samples to source outputs or reads samples from sink inputs, using a timer synchronized with the sound card.

Clocking and native protocol

component

libpulsecore

module-native-protocol-{fd,unix,tcp}

Every application stream running over the “native” protocol is clocked by the source or sink to which it is connected. Every client stream has an associated source output (for recording streams) or sink input (for playback streams) on the server.

When the source writes samples to the source output, the source output forwards them to the client stream. When the sink reads samples from the sink input, the sink input requests the desired amount of samples from the client stream.

When the asynchronous API is used, a callback is invoked when the server requests more samples. The callback should respond with the requested amount of samples. When the simple API is used, the client blocks until the server requests more samples.
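
For illustration, a minimal sketch of such a callback for the asynchronous API (it just writes silence; a real client would fill the buffer with audio):

#include <string.h>
#include <pulse/pulseaudio.h>

/* Minimal sketch of a playback write callback, registered before connecting
 * the stream with pa_stream_set_write_callback(stream, write_cb, NULL).
 * It is invoked whenever the server requests more samples. */
static void write_cb(pa_stream *s, size_t nbytes, void *userdata) {
    void *buf;
    (void)userdata;

    /* Ask the library for a buffer of up to nbytes (zero-copy when possible). */
    if (pa_stream_begin_write(s, &buf, &nbytes) < 0 || !buf)
        return;

    memset(buf, 0, nbytes);  /* a real client would fill the buffer with audio */
    pa_stream_write(s, buf, nbytes, NULL, 0, PA_SEEK_RELATIVE);
}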

Clocking and RTP

component

module-rtp-recv

An RTP sender can’t be clocked by an RTP receiver because the sender has no feedback from the receiver and there may be multiple receivers for a single multicast sender. As a result, the receiver queue size slowly but constantly increases or decreases.

Sooner or later, it will cause an underrun (the next sample to play is not received yet) or an overrun (the received sample is dropped because it came too early and the queue is still full). The user will hear glitches.

When I was running a demo sender and receiver on two computers, the clock difference was about 0.0055%. This means that every hour the first timer outruns the second by approximately 200 milliseconds. In other words, if the latency is about 200 ms, the playback has to be restarted every hour.

To prevent this, the PulseAudio RTP receiver adjusts the resampler rate on the fly to maintain a constant queue size:

  • when the queue size becomes too high, the rate is slightly increased, the samples are played a bit faster, and after a while, the queue size decreases

  • when the queue size becomes too low, the rate is slightly decreased, the samples are played a bit slower, and after a while, the queue size increases

To prevent oscillations, an exponentially weighted average of the estimated rate is used. To prevent latency jumps, the rate is updated gradually with small steps. Algorithm details are described in the source code.
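
The following sketch illustrates the idea only; it is not the actual PulseAudio algorithm, and the constants are made up:

/* Illustration only (not the actual PulseAudio algorithm): smooth the measured
 * queue size with an exponentially weighted average, then nudge the resampler
 * rate by a small clamped step towards the target queue size. */
static double smoothed_queue(double prev_avg, double measured) {
    const double alpha = 0.1;                 /* smoothing factor */
    return alpha * measured + (1.0 - alpha) * prev_avg;
}

static double adjusted_rate(double base_rate, double avg_queue, double target_queue) {
    /* Positive when the queue is too full, negative when it is too empty. */
    double error = (avg_queue - target_queue) / target_queue;

    /* Apply only a small fraction of the error and clamp the step, so that
     * the pitch change stays inaudible and the latency doesn't jump. */
    double step = 0.01 * error;
    if (step > 0.002)  step = 0.002;
    if (step < -0.002) step = -0.002;

    /* Queue too full: play slightly faster (higher rate) so the queue drains. */
    return base_rate * (1.0 + step);
}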

Latency

component

libpulse

libpulsecore

module-native-protocol-{fd,unix,tcp}

module-alsa-{source,sink}

Simply speaking, latency is the delay between sound being played and heard, or between being emitted and recorded. More accurately, we are interested in the delay between a client application and an analog input or output.

When a playback stream is connected to a sink, the following happens:

  • on server request, the client sends samples to the server via a socket or a shared ring buffer
  • the sink input gets an I/O event, reads samples and writes them to the stream buffer
  • on timer tick, PulseAudio sink reads samples from the sink input stream buffer, performs resampling and mixing, and writes result to the ALSA ring buffer
  • on timer tick, the DMA reads samples from the ALSA ring buffer and writes them to the sound card
  • the sound card passes samples to the codec, which writes the result to the analog output

The same is happening with a recording stream, in reverse order:

  • the codec reads samples from the analog input and passes them to the DMA
  • the DMA writes the samples to the ALSA ring buffer
  • on timer tick, PulseAudio source reads samples from the ALSA ring buffer and writes them to source outputs
  • source output reads samples from the stream buffer and sends them to the client
  • the client gets an I/O event and reads samples

Accordingly, the latency can’t be less than the sum of:

  • the time to transmit a chunk of samples (or a chunk header in the zero-copy mode) between the client and the server
  • the time to process samples, e.g. the time to resample and mix chunks from all sink inputs connected to the sink in case of a playback stream
  • the time to play the samples queued in the ALSA ring buffer
  • the time to transmit a chunk of samples via DMA
  • the time to encode or decode samples in the sound card codec

PulseAudio can control the stream buffer size and the ALSA ring buffer size. The remaining components can’t be controlled and are determined by the hardware capabilities. To achieve the target overall latency, PulseAudio measures the current overall latency and then adjusts the buffer sizes accordingly.

The driver and hardware latency, which includes the size of the ALSA ring buffer, the DMA delay, and the sound card codec delay, is measured using the ALSA PCM timestamping.

On the diagram above, the minimum possible latency is shown as a vertical bar between the two red lines. Since the client application, the PulseAudio network thread, the PulseAudio device thread, and the sound card run in parallel and may be clocked independently, there are additional small delays caused by scheduler jitter and non-synchronous timers. So in practice, the minimum possible latency will be a bit higher than shown in the diagram.

Note that compared to bare ALSA, PulseAudio increases the minimum possible latency:

  • the client, the network thread, and the device thread run in separate threads and processes, so there are additional context switches
  • the client communicates with the server via IPC, so there are non-zero round trip times (even in the zero-copy mode)

On the other hand, PulseAudio may operate at a lower latency than a naive implementation of an ALSA client based on select/poll, due to its advanced timer-based scheduler (see below).

Controlling latency

Every source, sink, source output, and sink input has its own latency. PulseAudio server controls all of them and can adjust latency on the fly to reach the minimum acceptable value that causes no glitches. This value may depend on things like hardware capacity and current system load. See LatencyControl page.

An application can set its own latency requirements for a stream, using these four parameters of the stream buffer:

  • maxlength - the maximum number of bytes in buffer
  • tlength - the desired number of bytes in buffer, i.e. the target latency
  • prebuf - the minimum number of bytes to be accumulated in buffer before starting the stream, i.e. the start threshold
  • minreq - the minimum number of bytes to be requested from client each time
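
For illustration, a minimal sketch of requesting a particular target latency via these parameters, assuming the stream was created with the 44.1 kHz stereo sample spec used below:

#include <pulse/pulseaudio.h>

/* Minimal sketch: connect a playback stream and request roughly 50 ms of
 * latency using the adjust latency mode. */
static int connect_with_latency(pa_stream *stream) {
    pa_sample_spec ss = { .format = PA_SAMPLE_S16LE, .rate = 44100, .channels = 2 };

    pa_buffer_attr attr;
    attr.maxlength = (uint32_t)-1;                                /* server default */
    attr.tlength   = (uint32_t)pa_usec_to_bytes(50 * 1000, &ss);  /* target latency ~50 ms */
    attr.prebuf    = (uint32_t)-1;                                /* default start threshold */
    attr.minreq    = (uint32_t)-1;                                /* default request size */
    attr.fragsize  = (uint32_t)-1;                                /* recording streams only */

    return pa_stream_connect_playback(stream, NULL /* default sink */, &attr,
                                      PA_STREAM_ADJUST_LATENCY, NULL, NULL);
}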

For every stream, the PulseAudio server maintains a constant latency, depending on the adjust latency mode, which may be enabled per-stream by an application:

  • If adjust latency mode is disabled, tlength specifies the target size of the stream buffer. The PulseAudio server requests or sends samples to the client in such a way that there are always about tlength bytes in the stream buffer.

  • If adjust latency mode is enabled, tlength specifies the desired size of the stream buffer plus the device buffer. The device buffer size is controlled by the source or sink implementation; let’s call it dlength.

    In the case of ALSA source or sink, dlength corresponds to the driver and hardware latency, which includes the size of the ALSA ring buffer, the DMA delay, and the sound card codec delay.

    In other words, in this mode tlength specifies the desired latency between the client and the sound card. To reach it, PulseAudio server does two things:

    • adjusts dlength to be the minimum tlength value among all sink inputs connected to the sink

    • requests or sends samples to the client in such a way that there are always about tlength - dlength bytes in the stream buffer

Note that the actual stream latency may be higher than requested by an application. PulseAudio automatically increases the latency depending on hardware and OS scheduler constraints. In particular, the latency is increased in case of frequent ALSA underruns to avoid glitches.

Measuring latency

PulseAudio provides clients with the stream timing info, containing the stream latency divided into the three components:

  • transport latency

    The time required to send a sample from the client to the server-side stream buffer. Set to half of the round-trip time, measured on the client between sending the timing info request and receiving the response.

  • stream buffer latency

    The size of the server-side stream buffer. Equal to the difference between the write index and read index.

    For playback stream, the client advances the write index and the sink advances the read index. For recording stream, the source advances the write index and the client advances the read index.

  • sink or source latency

    The latency reported by sink or source. In general, it represents delay between the stream buffer and the sound card.

    For ALSA devices, it is approximately equal to the size of the chunk queue between the sink and sink input, plus the size of the ALSA ring buffer, plus the DMA delay, plus the sound card codec delay.

Timing info may be retrieved manually by application or automatically when the auto timing update flag is set for a stream.

The client library uses timing info to calculate two values:

  • stream time

    Stream time is the timestamp of the sample being currently played or recorded on the sound card:

    • playback streams: stream_time = streambuf_read_index - sink_latency + transport_latency
    • recording streams: stream_time = streambuf_write_index + source_latency + transport_latency

    The transport latency is added to account for the samples played or recorded on the server during the time elapsed between the moment the timing info was sent by the server and the moment it was received by the client.

  • stream latency

    Stream latency is the difference between the timestamps of the last sample sent or received by the client and the sample being currently played or recorded on the sound card:

    • playback streams: stream_latency = streambuf_write_index - stream_time
    • recording streams: stream_latency = stream_time - streambuf_read_index

    Or, equally:

    • playback streams: stream_latency = streambuf_latency + sink_latency - transport_latency
    • recording streams: stream_latency = streambuf_latency + source_latency - transport_latency

Usually, the raw stream time value is not used directly in the calculations above. Instead, it is postprocessed before being reported to the application or used to calculate the stream latency:

  • unless the not monotonic flag is set for the stream, the client ensures that the stream time never steps back;

  • if the interpolate timing flag is set for the stream, the client interpolates and smooths the stream time between timing info updates.

Actual calculations in the source code differ from the formulas above in two details:

  • on signed overflows, negative values may be truncated to zero or reported separately;

  • when a recording stream is connected to a monitor source, the latency of the monitored sink is taken into account in addition to the latency of the monitor source.
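
For illustration, a minimal sketch of reading the stream time and stream latency via the C API, assuming `s` is a connected playback stream created with the PA_STREAM_AUTO_TIMING_UPDATE (and optionally PA_STREAM_INTERPOLATE_TIMING) flags, so that timing info is available:

#include <stdio.h>
#include <pulse/pulseaudio.h>

/* Minimal sketch: print the current stream time and stream latency. */
static void report_timing(pa_stream *s) {
    pa_usec_t t = 0, l = 0;
    int negative = 0;

    if (pa_stream_get_time(s, &t) == 0)
        printf("stream time: %llu us\n", (unsigned long long)t);

    if (pa_stream_get_latency(s, &l, &negative) == 0)
        printf("stream latency: %s%llu us\n",
               negative ? "-" : "", (unsigned long long)l);
}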

Latency and backends

Non-ALSA backends generally don’t support adjusting device buffer size. An application can determine if a source or sink supports it by checking the dynamic latency flag of the device.

Some backends, including Bluetooth devices, don’t provide accurate information about the actual latency. This information is important for some applications, notably for the lip sync in video players.

To work around problems with such backends, the user can manually set the latency offset for a device port, which is zero by default. When a hardware source or sink is connected to a device port, the latency offset of the device port is added to the latency (device buffer size) reported by the source or sink.

ALSA challenges

When a program uses ALSA, the program writes or reads samples from the ALSA ring buffer, and the sound card reads or writes samples from that buffer at a timer tick. The user can configure the buffer size (the number of samples in the whole ring buffer) and the period size (a.k.a. fragment size, the number of samples written or read per timer tick).

As noted above, the program should be clocked by the sound card. Traditionally, this is achieved by either blocking on ALSA write or read operation until the sound card updates the ring buffer, or waiting for it using poll or select.

This way, the whole process is driven by the sound card timer, which is good. However, two problems may be encountered when developing ALSA client:

  • If poll or select is used, there is always a short delay between the moment when the sound card reads from the ring buffer and the moment when the program writes the next chunk of samples to it. The time is spent on a context switch from the kernel space to the user space, returning from poll or select, issuing the next write call, and finally doing one more context switch from the user space to the kernel space.

    On low period sizes, this delay can cause glitches. The delay may be avoided if the program uses blocking writes instead of poll or select, but this doesn’t allow I/O multiplexing, which may be necessary.

  • It’s not easy to guess the optimal buffer size and period size because they depend on the latency, hardware, CPU, and average system load. When the default parameters chosen by ALSA don’t work well enough, client programming becomes trickier.

Timer-based scheduler (tsched)

component

module-alsa-{source,sink}

PulseAudio addresses these challenges by doing its own timer-based scheduling, also known as the glitch-free playback model. It was introduced in version 0.9.11 and was inspired by the audio systems in Windows Vista and MacOS.

With the timer-based scheduling, PulseAudio is able to fulfill the two requirements at the same time:

  • the server usually doesn’t introduce glitches by itself even with low latency values

  • the client doesn’t have to bother with the ring buffer parameters and advanced timing techniques and can just request the latency it needs

Without it, glitches were more common when using PulseAudio with low latency values, and applications had to either switch to a higher latency or use ALSA directly.

Here are its basic ideas:

  • Timer

    Instead of using poll or select on ALSA device, PulseAudio configures its own timer and uses poll or select on the timer. To avoid glitches, PulseAudio timer is configured to fire some time before the sound card timer, so that PulseAudio has enough time to write next chunk of samples to the ring buffer.

  • Synchronization

    PulseAudio monitors the size of ALSA ring buffer and adjusts the timer to be synchronous with the sound card timer. To avoid oscillations, the sleep period for the PulseAudio timer is updated smoothly.

  • Watermark

    PulseAudio maintains a watermark for the number of unread bytes in the ring buffer. When this number becomes lower than the watermark, or (in the worst case) an underrun occurs, the watermark is increased. When this number becomes higher than the watermark, the watermark is decreased.

    The key point of the watermark is that PulseAudio can detect an approaching underrun and prevent it before it actually happens. The watermark value affects two things:

    • The delta between the PulseAudio timer and the ALSA timer. The higher the watermark, the greater the delta, so that PulseAudio has more time to fill the ring buffer before the ALSA timer tick happens.

    • The latency. When the watermark becomes too high, the latency is increased. When the watermark becomes low again, the latency is decreased back. This means that the actual latency may be higher than requested.

  • Interrupts

    On every tick of the sound card timer, an interrupt (a.k.a. period wakeup) is generated, and the process blocked on ALSA write or poll is woken up. PulseAudio doesn’t need this, so it tries to reduce the number of interrupts to lower CPU usage.

    To reduce the number of interrupts, the buffer size and period size (a.k.a. fragment size) are set as large as supported by hardware. Typical values are 2s buffer size and 1s or 0.5s period size.

    If supported by the sound card driver, interrupts for the sound card timer are disabled entirely. This only works with recent ALSA versions and some drivers. It should work at least with recent Intel drivers.

  • Rewinding

    To provide an instant reaction to user input, PulseAudio uses the ALSA feature of buffer rewriting. Whenever an application performs a seek or pause, or writes more samples, PulseAudio rewrites the ring buffer with the current data.

  • Clocking

    When a sink reads samples from a sink input, the sink input requests more samples from the application. The process is driven by the sink timer, which is kept synchronous with the sound card timer. The samples are requested some time before they should be written to the ring buffer, so the application has time to receive the request and respond with a chunk of samples.

The timer-based scheduler may be enabled or disabled globally or per-sink. By default, it is automatically enabled when:

  • the card is a real hardware device
  • the card driver supports mmap
  • the card driver doesn’t use double buffering (batch mode), so that the real device buffer may be updated at any time
  • PulseAudio is not running under a virtual machine (yes, it has some code to detect this)

Power saving

Several techniques are used to save power. Some benchmarks may be found in this post.

Device states

component

libpulsecore

module-suspend-on-idle

A source or sink may be in one of the following states:

  • RUNNING

    Actively performing I/O. There are non-paused connected streams.

  • IDLE

    Actively performing I/O, but there are no non-paused connected streams. In the case of a sink, zero samples are written to the device. In the case of a source, recorded samples are dropped.

  • SUSPENDED

    Not performing any I/O.

The source or sink is marked idle when there are no non-paused connected streams. If it remains in this state for some time, it may be suspended. When a non-paused stream appears again, the source or sink is resumed.

Reducing interrupts

component

module-alsa-{source,sink}

The less frequently sound card interrupts occur, the less frequently the driver wakes up, the less power is used.

When the timer-based scheduler is used, PulseAudio reduces the number of sound card interrupts or completely disables them if it’s supported by the driver.

Default latency

component

libpulsecore

module-native-protocol-{fd,unix,tcp}

The higher the latency, the less frequently the server wakes up, and the less power is used.

High latency may be set for a stream automatically:

  • If an application uses PulseAudio and doesn’t specify the latency, PulseAudio automatically selects the default value, which is high, typically 2 s.

  • If an application uses GStreamer, the user can configure GStreamer to select a high latency automatically for applications whose media role is “music”, as described in this post. However, GStreamer doesn’t do it by default.

    GStreamer uses PulseAudio as a backend and is a backend itself for many applications and higher-level media frameworks like Xine and Phonon. See details here.


Automatic setup and routing

PulseAudio automatically restores parameters for cards, devices, and streams, routes streams to devices, and performs other housekeeping actions.

Some examples:

  • When a new card, device, or stream appears, the server should restore previously configured parameters.

  • When a card or device appears, the server may move existing streams to it. When a card or device disappears, the server may move existing streams to another device.

  • When a client creates a new stream, the server should route it to some device.

  • When a client creates a new stream, the server may perform some automatic setup depending on stream properties, like autoloading sound processing tools, or silencing less important streams.

Databases

PulseAudio uses an embedded database to store parameters and routing rules persistently. It supports three database backends:

  • gdbm (GNU dbm)
  • tdb
  • “simple” (built-in hashtable-based implementation)

Two separate databases are involved in routing:

  • restoration database
  • device manager database

Stream roles

The routing and automatic setup are heavily based on roles. A role is a short string describing media type, like “video”, “music”, or “phone”.

Roles are used in several places:

  • every stream may have the “media.role” property provided by application
  • every device may have the “device.intended_roles” property provided by the device backend
  • the restoration database may contain per-role routing rules
  • the device manager database may contain per-role priority lists of routing rules
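
For illustration, a minimal sketch of how an application can declare its role: the “media.role” property is set on the client itself via a property list, so that all streams created by this application inherit it (the application name below is just an example):

#include <pulse/pulseaudio.h>

/* Minimal sketch: set "media.role" on the client, assuming `api` is a
 * pa_mainloop_api obtained from a main loop. */
static pa_context *new_music_context(pa_mainloop_api *api) {
    pa_proplist *p = pa_proplist_new();
    pa_proplist_sets(p, PA_PROP_MEDIA_ROLE, "music");

    pa_context *ctx = pa_context_new_with_proplist(api, "My Music Player", p);
    pa_proplist_free(p);
    return ctx;
}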

Stream groups

Two streams belong to the same group if they have the same group identifier. PulseAudio checks the following stream properties and uses the first available one as the group identifier of the stream:

  • “media.role” (e.g. “video”, “music”, or “phone”)
  • “application.id” (e.g. “org.gnome.Totem”)
  • “application.name” (e.g. “Totem Music Player”)
  • “media.name” (e.g. “Pink Floyd - Astronomy Domine”)

If none of these properties are present in the stream property list, PulseAudio uses the default group identifier, which is the same for all streams.

Note that more generic properties are preferred over more specific ones.
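
The selection order can be illustrated with the following sketch. It is illustrative only: the real identifiers used by PulseAudio also carry a prefix that encodes which property was matched.

#include <pulse/proplist.h>

/* Illustration of the fallback order described above. */
static const char *stream_group_id(pa_proplist *p) {
    const char *v;
    if ((v = pa_proplist_gets(p, PA_PROP_MEDIA_ROLE)))       return v;  /* e.g. "music" */
    if ((v = pa_proplist_gets(p, PA_PROP_APPLICATION_ID)))   return v;  /* e.g. "org.gnome.Totem" */
    if ((v = pa_proplist_gets(p, PA_PROP_APPLICATION_NAME))) return v;  /* e.g. "Totem Music Player" */
    if ((v = pa_proplist_gets(p, PA_PROP_MEDIA_NAME)))       return v;  /* e.g. a track title */
    return "default";  /* shared fallback group */
}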

Stream groups are used in two places:

  • in the restoration database
  • when autoloading group filters

Stream routing

component

libpulsecore

module-stream-restore

module-device-manager

module-intended-roles

Routing is the process of choosing which device (source or sink) a new stream (source output or sink input) should be connected to. A good routing overview can be found in this post.

Routing consists of several steps:

  • application device

    If the client has specified a device name, the stream is routed to that device.

    This usually means that the user has configured the application to use a specific device. Some applications may provide a command line option or GUI for that.

  • module-stream-restore

    Otherwise, module-stream-restore (enabled by default) checks the restoration database. If there is a stored device for the stream group, and the device is currently available, the stream is routed to that device.

    This means that the user had moved a stream from the same stream group to that device earlier, and this decision was remembered. The restoration database is updated only when the user manually moves a stream via GUI. It’s not affected by automatic routing.

  • module-device-manager

    Otherwise, module-device-manager (enabled in KDE) checks its per-role and global priority lists of devices. If there is a non-empty priority list for the stream role or non-empty global priority list, and there are currently available devices in the list, the stream is routed to the first such device.

    This means that the user has configured some priority lists to be used for a role or globally. KDE provides a GUI for that.

  • module-intended-roles

    Otherwise, module-intended-roles (enabled by default) searches for a device whose intended role list contains the stream role. If such a device exists, the stream is routed to that device.

    In other words, a per-role default device is used. The intended role list of a device is provided by the device backend. It can also be set manually when creating the device.

  • fallback device

    Otherwise, the stream is routed to the fallback device.

    Fallback source and fallback sink may be changed by the user.

Some side notes:

  • Once the user has manually moved a stream, there is an entry for it in the restoration database, and the other routing steps are never executed for this stream again. Some GUI tools provide a function for removing routing rules from the restoration database.

  • When the user moves a stream to another device, all streams of the same stream group are immediately moved as well, and a routing rule for the whole group is stored in the restoration database.

  • When the user updates device manager routing rules, existing streams are immediately re-routed according to the new routing rules.

  • When the user changes the fallback source or sink, nothing happens. The new fallback device will be used only when a new stream is routed.

Restoration database

component

module-{card,device,stream}-restore

module-default-device-restore

Three parameter categories are stored in the restoration database:

  • active profiles of cards
  • volume/mute settings and active ports of devices
  • volume/mute settings and routing rules of streams (stream group name plus device name)

There is a separate module responsible for managing every category:

  • module-card-restore
  • module-device-restore
  • module-stream-restore

The modules implement two functions:

  • all modules monitor server objects and read or write appropriate category of parameters to the restoration database

  • some modules also provide a protocol and API extension that enables client applications to read, write, and monitor the corresponding category of parameters in the database
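
For example, module-stream-restore exposes its database through the ext-stream-restore extension of the C API. A minimal sketch of dumping the stored entries, assuming `ctx` is a connected context running in a main loop:

#include <stdio.h>
#include <pulse/pulseaudio.h>
#include <pulse/ext-stream-restore.h>

/* Called once per database entry; eol marks the end of the list. */
static void dump_cb(pa_context *c, const pa_ext_stream_restore_info *i,
                    int eol, void *userdata) {
    (void)c; (void)userdata;
    if (eol || !i)
        return;
    printf("group: %s  device: %s  muted: %d\n",
           i->name, i->device ? i->device : "(none)", i->mute);
}

static void dump_restore_db(pa_context *ctx) {
    pa_operation *o = pa_ext_stream_restore_read(ctx, dump_cb, NULL);
    if (o)
        pa_operation_unref(o);
}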

Each module monitors both existing and new objects:

  • When an appropriate parameter of an existing object is changed by the user, the module stores the object ID, parameter ID, and parameter value into the database.

  • When a new object (card, device, or stream) appears, the module checks if there are stored parameters for this object, and restores them, if any.

What is used as the object ID depends on the object type:

  • for cards and devices (sources and sinks), the card or device name is used
  • for streams, the stream group identifier is used, computed from the stream properties

The outcome of using stream group as the object ID is the following:

  • All streams with the same role share the same volume and routing settings. For example, all music streams or all notification streams.

  • All streams without a role, but belonging to the same application, share the same volume and routing settings. For example, all instances of a music player.

  • All streams without a role and an application identifier, but with the same media name, share the same volume and routing settings. For example, all music players which are playing the same file and have specified media name in the same format.

  • All streams that didn’t specify any of the above properties share the same volume and routing settings.

When the user changes the stream volume or moves it to another device, all other streams with the shared volume and routing are automatically updated or moved as well.

Besides the three modules described above, the module-default-device-restore saves (on a timer event) and restores (on start) the fallback source and sink. These two device names are stored in two text files instead of the restoration database.

Device manager database

component

module-device-manager

The module-device-manager was developed for KDE, which uses it by default and provides a GUI tool for manipulating priority lists. The module performs three functions:

  • Maintains a database with multiple priority lists of devices: one priority list per role, and one default priority list. A priority list may contain both devices that are currently available and devices that were available in the past.

  • Implements routing. When a new stream can’t be routed using the restoration database, device manager checks if there is a non-empty priority list for the stream role or non-empty default priority list. If a non-empty priority list is found, the first currently available device from the priority list is used.

  • Provides a protocol and API extension with methods for inspecting and manipulating priority lists.

Device intended roles

component

module-intended-roles

The module-intended-roles implements a kind of automatic per-role fallback device.

This is how it works:

  • When server backend creates a device, it may specify a list of roles intended for the device via the “device.intended_roles” property. This property may be also autodetected from the device form factor.

  • When a client creates a context or a stream, it may specify its role via the “media.role” property. This property may be also autodetected from the application desktop file.

  • When the server routes a new stream, it searches for a device that has the stream role in its intended role list. The fallback device is always checked first, so if its intended roles match, it takes priority over other devices.

The intended role list is set for several types of sources and sinks:

  • alsa

    ALSA backend sets intended roles of sources and sinks if the ALSA card supports UCM.

    For every source or sink, PulseAudio computes intended roles from the UCM modifiers associated with the device ports of the source or sink. Every UCM modifier name is converted to a PulseAudio role name.

  • bluetooth

    Bluetooth backend sets intended roles of sources and sinks depending on the device form factor.

  • raop

    RAOP module unconditionally sets “music” role for RAOP sinks.

  • aec

    Acoustic echo cancellation module unconditionally sets “phone” role for its sources and sinks.

Priority routing proposal

As discussed in the blog post mentioned above, existing routing scheme is non-intuitive, and its steps are quite isolated and inconsistent in several aspects:

  • If all applications correctly provided stream roles, it would not be possible to move a single stream to another device without affecting all streams with the same role. Per-application stream moving works so far only because most applications don’t specify stream roles, and PulseAudio uses the application ID instead.

  • When a stream is moved manually or device manager rules are updated, existing streams are re-routed. When the fallback device is updated, existing streams are not affected, however.

  • When configuring device manager, it’s clear that there are per-role device lists. When moving a stream, it’s not clear what rules will be overwritten (per-stream, per-role, per-application, or something else).

  • Device manager routing rules contain device lists with both available and unavailable devices, and the first available device is used. Restoration database routing rules contain only a single device, and the rule is used only when the device is available.

  • Restoration database rules override the device manager rules, but this is not obvious. The overrides may suddenly appear or disappear depending on whether the corresponding device is currently available and what meta-information is provided by an application.

As a result, it’s hard for the user to figure out how and why the routing works. A non-implemented PriorityRouting proposal exists, aimed at making routing more consistent and transparent:

  • priority lists are moved to the PulseAudio core and become first-class objects
  • all existing routing steps are reworked to operate on top of these priority lists
  • modules may implement routing policies by registering or manipulating priority lists
  • the user can inspect and configure priority lists using GUI tools

Third-party routing modules

Some projects implement their own PulseAudio modules that replace or modify default routing scheme:

  • Tizen IVI

    component

    module-murphy-ivi

    Tizen IVI (Linux for In-Vehicle Infotainment) project uses Murphy, a resource policy daemon that manages things like audio, video and network connections. The pulseaudio-module-murphy-ivi PulseAudio module implements routing using Murphy as a backend.

    See these slides for some details.

  • IoT.bzh

    component

    agl-audio-plugin

    IoT.bzh (Linux for Internet Of Things) project uses the agl-audio-plugin PulseAudio module. It was forked from the Tizen IVI PulseAudio module and is a simplified version of it that doesn’t need Murphy and uses either a JSON configuration file or its own embedded configuration.

    There are also some slides with details.

  • Nemo

    component

    module-meego-*

    module-policy-enforcement

    module-stream-restore-nemo

    Nemo is a Linux distribution for mobile devices. It uses Mer (a MeeGo fork). It implements several PulseAudio modules with custom routing, volume, and permission policies.

    See details on their wiki.

  • Sailfish OS

    Sailfish OS is a mobile OS based on Nemo and Mer. It uses pulseaudio-modules-droid, which implements several PulseAudio modules allowing PulseAudio to work on top of the Android Audio HAL.

    See details on these slides.

Autodetecting properties

component

libpulsecore

module-augment-properties

When an application connects to the server, a new client object is created on the server. The C API automatically sets the “application.process.binary” property of the client, which contains the name of the application executable.

When a new client is created, PulseAudio may automatically find the desktop file for the application and use it to compute some client properties. The desktop file is usually searched for in the "/usr/share/applications" directory and should have the same name as the application executable.

The two desktop file entries are used:

  • X-PulseAudio-Properties

    This entry may define arbitrary properties for the client.

  • Categories

    This may be used to compute the “media.role” property for the client. Currently, the “Game” category is mapped to the “game” role, and the “Telephony” category is mapped to the “phone” role.

When a new stream is created, it inherits all properties from the client, including the ones that were detected from the desktop file. If the stream has its own properties, they override the client properties.

Sources and sinks may have “device.form_factor” property. Form factor is a short string describing device type, e.g. “handset”, “headset”, “speaker”, or “microphone”.

If a new device has an empty intended role list property, but non-empty form factor property, in some cases PulseAudio may automatically compute intended roles from the form factor. Currently, “headset”, “handset”, and “hands-free” form factors are converted to the “phone” role.

Currently, the device form factor is set in two places:

  • Bluetooth backend computes device form factor from the device class.

  • If Udev rules can match the sound card model, the SOUND_FORM_FACTOR property is attached to the device. The server reads this property during source or sink initialization.

Autoloading filters

component

module-filter-{heuristics,apply}

PulseAudio may automatically load and setup filter sources and sinks based on the stream properties. An application specifies what sound processing tools it wants for a stream, and the server performs all necessary configuration.

This mechanism is based on a convention used for all filter sources and sinks:

  • every filter is implemented in a separate module
  • the module name is the filter name
  • every module creates one source, or one sink, or one paired source and sink
  • every module accepts the “master” or the “source_master” and “sink_master” arguments

Three stream properties are employed:

  • “filter.want”

    The name of the filter to load. The server is allowed to ignore the filter when it thinks it’s unreasonable.

  • “filter.apply”

    The name of the filter to load. The server unconditionally loads the filter. Overrides “filter.want”.

  • “filter.suppress”

    The name of the filter not to load. The server doesn’t load the filter, even if it’s specified in “filter.want” or “filter.apply”. Useful when “filter.want” or “filter.apply” property is set automatically, but application wants to disable it.
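
For illustration, a minimal sketch of a client requesting the echo cancellation filter for its stream via these properties, assuming `ctx` is a connected context (the stream name and sample spec are arbitrary):

#include <pulse/pulseaudio.h>

/* Minimal sketch: ask the server to insert the echo cancellation filter
 * for this stream. */
static pa_stream *new_phone_stream(pa_context *ctx) {
    pa_sample_spec ss = { .format = PA_SAMPLE_S16LE, .rate = 16000, .channels = 1 };

    pa_proplist *p = pa_proplist_new();
    pa_proplist_sets(p, PA_PROP_MEDIA_ROLE, "phone");
    pa_proplist_sets(p, "filter.want", "echo-cancel");  /* the server may ignore it */

    pa_stream *s = pa_stream_new_with_proplist(ctx, "voice call", &ss, NULL, p);
    pa_proplist_free(p);
    return s;
}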

The autoloading support is divided into two modules:

  • module-filter-heuristics

    Tracks when a stream is created or moved.

    If “filter.apply” is unset, the “filter.want” property is set, and the specified filter should not be ignored, the module sets the “filter.apply” property to the value of the “filter.want” property.

    Currently, the only filter that may be ignored is “echo-cancel”. It is ignored if the “device.intended_roles” property of the stream device contains the “phone” role.

  • module-filter-apply

    Tracks when a stream is created, moved, or stream properties are updated.

    If the “filter.apply” is set and not disabled by the “filter.suppress” property, and the filter is not loaded yet, the module does the following:

    • Checks if this is a group filter, which uses a paired source and sink and therefore requires a paired source output and sink input. In this case, the filter is loaded only if the paired stream exists as well.

      Two streams are considered paired if they have the same “filter.apply” property and the same stream group identifier. Currently, the only group filter is “echo-cancel”.

    • Checks if the user specified additional module parameters via the “filter.apply.<filter_name>.parameters” property.

    • Loads the “module-<filter_name>” module. The “master” or the “source_master” and “sink_master” arguments are set to the name of the source or sink to which the stream or paired streams are currently connected. If the user did specify additional parameters, they are also passed to the module.

    • Finds the newly loaded filter source or sink.

    • Moves the stream or paired streams to the filter source or sink.

Automatic actions

Several modules implement various housekeeping actions that are performed automatically.

  • switch on port available

    component

    module-switch-on-port-available

    When a new device port appears, automatically set it as the active port of the appropriate source or sink.

    When the active device port is switched, the active card profile may be switched as well. To avoid unwanted switches, the module tracks manual port and profile changes made by the user and uses some heuristics to determine what profile is preferred for every port, and what port is preferred for every profile.

    When multiple ports or profiles are available, the one with the highest priority is selected. The priority of a port or profile is defined by its backend.

    Note that ALSA and Bluetooth backends never add new device ports to a card after its initialization. For these cards, the functionality of this module is limited to selecting the highest priority device port when a card is initialized.

  • switch on connect

    component

    module-switch-on-connect

    When a new source or sink appears, automatically set it as the fallback device and move all active streams to it.

  • rescue streams

    component

    module-rescue-streams

    When a source or sink disappears, automatically move streams connected to it to another working source or sink.

  • always sink

    component

    module-always-sink

    Ensure that there is always at least one non-filter sink. When all sinks disappear, automatically load the null sink.

  • role ducking and corking

    component

    module-role-{ducking,cork}

    When an “important” stream is started, automatically duck (lower the volume) or cork (mute and request a pause) active streams. When the important stream finishes, unduck or uncork other active streams.

    Whether a stream is important or not is determined by its “media.role” property. By default, streams with the “phone” role are considered important.

  • allow passthrough

    component

    module-allow-passthrough

    When a new passthrough stream is moved to an existing sink, automatically create a null sink and move all other running streams to that null sink. When the passthrough finishes, move the streams back and remove the null sink.

    The reason for such behavior is that passthrough streams are incompatible with regular PCM streams and can’t be connected to the same sink at the same time. Therefore, if the user moves a passthrough stream to a sink, all other streams should be temporarily disconnected.

  • position event sounds

    component

    module-position-event-sounds

    When a stream is created or its properties are changed, adjust the volume balance of the stream depending on its on-screen 2-D coordinates provided by the application.

    First, PulseAudio checks that:

    • the “media.role” property of the stream is set to “event”, which is usually true for GUI event sounds

    • the “event.id” property of the stream is not set to one of the known identifiers of test sounds used in some volume control dialogs

    • the “event.mouse.hpos” and “event.mouse.vpos”, or “window.hpos” and “window.vpos” properties are set for the stream

    If all of the above is true, PulseAudio adjusts the volume balance of the stream:

    • the horizontal position defines the balance between the “left” and “right” channels
    • the vertical position defines the balance between the “front” and “rear” channels

    Thus, the GUI events on the screen are virtually mapped to a horizontal plane around the user.

  • match volumes

    When a new stream appears, automatically update its volume based on its name and a preconfigured match table, by default loaded from "~/.pulse/match". Each line contains a regular expression to match the stream name and the volume to set.

Role-based configuration of ALSA devices

component

module-alsa-card

When a stream is moved to or from an ALSA source or sink which employs UCM, the stream role may affect ALSA mixer configuration:

  • Every ALSA device port is associated with zero or one UCM modifier.

  • Every UCM modifier is mapped to a PulseAudio role.

  • The UCM modifier of a device port is enabled when there is at least one source output or sink input connected to the source or sink of the device port, which has the “media.role” property equal to the UCM modifier role.

Automated setup of Bluetooth devices

component

module-bluetooth-policy

PulseAudio provides two features that automate the setup of Bluetooth devices that support A2DP (usually used for music) and HSP (usually used for phone calls):

  • automatic profile switch

    Automatically switch between A2DP and HSP profiles (if both are available for a device), depending on the roles of the currently active streams.

    When a new stream is connected to a source of a Bluetooth card and its “media.role” property is set to “phone” or unset (depending on the “auto_switch” module parameter), automatically switch the card profile to HSP. When all such source outputs are disconnected from the source, switch the card profile to A2DP.

  • automatic playback

    Automatically route and play audio from A2DP sources.

    When the currently active card profile of a Bluetooth card is “A2DP source”, and a new PulseAudio source is created for the card, do the following:

    • create a loopback, i.e. a pair of a source output and sink input connected with a queue
    • connect the loopback source output to the source
    • set the “media.role” property of the loopback sink input to “music”, which may be used for routing
    • let PulseAudio route the loopback sink input to some sink

Desktop integrations

This section describes features that integrate PulseAudio into the desktop environment. Some details are also available on the Desktops page on wiki.

Autospawn and autoexit

PulseAudio server usually starts and exits automatically:

  • autospawn

    When a client tries to communicate to the server via libpulse, the server is started automatically if it’s not started yet. Due to this feature, when the user kills the server but an active client exists, the server will be automatically started again. This feature can be disabled in the client configuration file.

  • autoexit

    When there are no connected clients during some period of time, the server automatically exits. However, the automatic exit may be prevented by one of the session management modules.

Session management

There are several session management modules that prevent the server from exiting during the lifetime of a desktop session:

  • systemd-logind

    component

    module-systemd-login

    This module monitors logind events via D-Bus API. It creates a fake PulseAudio client for every new login session of the current user (determined by UID), and removes it when the session ends. These fake clients keep the server opened until the last user session ends.

  • ConsoleKit

    component

    module-console-kit

    This module does the same but monitors ConsoleKit events.

  • XSMP

    component

    module-x11-xsmp

    This module connects to the X session manager via XSMP protocol. It creates a fake client and removes it when the current X11 session ends.

X11 publishing

component

module-x11-publish

PulseAudio server may publish its address and credentials via the X11 root window properties. These properties may be then read by clients running on the same X display, including remote clients that use SSH X forwarding.

The properties can be read using this command:

$ xprop -root | grep PULSE

Here is the list:

  • PULSE_COOKIE

    An authentication cookie that may be used by clients to connect to the server. All clients that have access to the current X11 display will be able to connect to PulseAudio server too.

  • PULSE_ID

    Server ID in form of "server_uid@machine_id/server_pid".

  • PULSE_SERVER

    Space-separated list of server sockets, e.g. "{machine_id}unix:/socket/path tcp:hostname:port". The server automatically updates this property when sockets are opened or closed.

  • PULSE_SESSION_ID

    The value of $XDG_SESSION_ID environment variable when the server was started. Omitted if the variable wasn’t set.

  • PULSE_SOURCE, PULSE_SINK

    These properties are optional. They may be set to a source or sink name provided by the user via the module arguments when starting the server. Clients will use these source and sink as defaults.

X11 events

PulseAudio can interact with X11 events in two ways:

  • Bell

    component

    module-x11-bell

    The core X11 protocol allows clients to ring a bell, imitating analogous feature of the TTY. XKB extends this by supporting multiple named bells and providing an API for controlling bells and handling bell events.

    PulseAudio is able to intercept XKB bell event and play a preconfigured sample from the sample cache.

  • Cork

    component

    module-x11-cork-request

    PulseAudio can automatically cork (mute and request a pause) all active streams when a more important stream appears, and uncork them when it disappears.

    This is implemented as special cork-request and uncork-request events sent from the server to clients. However, many PulseAudio clients don’t subscribe and handle server events, and their streams become muted but not paused.

    As a workaround, PulseAudio can artificially synthesize X11 media key events along with the cork or uncork request, as if the pause or play multimedia keyboard button was pressed. Some applications will handle these events and pause/resume playback. This scheme is known to be buggy, however.

RealtimeKit (rtkit)

RealtimeKit provides a D-Bus API that allows user processes to enable the realtime scheduling policy without the root privileges. Basically, it’s a D-Bus, rootless, policy-based replacement for the sched_setscheduler POSIX call, plus a watchdog. Some details are available in this post.

PulseAudio may use it to enable SCHED_RR policy for some of its threads. Threads under this policy preempt any other threads on the system except the other realtime threads.

The realtime policy is enabled for the sink and source threads, including the ALSA sink thread that runs the timer-based scheduler. This helps to handle low latency values because when it’s time to provide samples for the ALSA driver, PulseAudio will not be delayed even if there are other starved processes.

GNOME registry (GConf)

component

module-gconf

gconf-helper

GConf is a system used by the GNOME desktop environment for storing configuration settings for the desktop and applications. GConf is currently deprecated and is replaced with GSettings and dconf, but is still available in distros.

PulseAudio monitors the "/system/pulseaudio/modules" GConf directory that should have the following layout:

system
└── pulseaudio
    └── modules             # root directory monitored by PulseAudio
        └── <foobar>        # subdirectory with an arbitrary name
            ├── enabled     # contains a boolean, true if this subdirectory is enabled
            ├── name0       # contains a string with module name
            ├── args0       # contains a string with module arguments
            ├── name1       # contains a string with module name
            ├── args1       # contains a string with module arguments
            └── ...

For every subdirectory, PulseAudio automatically loads modules when new entries appear (up to ten entries currently) and unloads them when they disappear or the “enabled” entry is set to false. The monitoring is implemented in the gconf-helper tool, which is part of the PulseAudio package.

This feature is used by the paprefs GUI, which may be used to configure and enable some non-default modules like the RTP sender and receiver. The advantage of this approach is that these settings are stored persistently in the GConf database.


Compatibility layers

There are several compatibility layers with other sound systems, so that existing applications may automatically run on PulseAudio without modification.

Emulate ALSA

ALSA implements both kernel drivers and a user space library (libasound) with a high-level API for applications. This library supports plugins, which implement virtual ALSA devices.

Plugins are usually used for two things:

  • to add software sound processing on top of a real device, when it’s missing in hardware (e.g. resampling or mixing)
  • to redirect sound to other sound systems (e.g. PulseAudio and JACK)

The user manually enables plugins and sets the default device in ALSA configuration files, typically "~/.asoundrc" or "/etc/asound.conf".

Note that plugins are the feature of libasound. They work entirely in user space inside a process that uses the library and has opened a virtual device. ALSA is not a sound server and doesn’t run a daemon process, unlike PulseAudio and JACK.

PulseAudio ships with an ALSA plugin that implements the “pulse” virtual device. In Linux distros that use PulseAudio, this device is usually configured as the default device for ALSA applications.
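
For illustration, a typical configuration that makes the “pulse” plugin the default ALSA PCM and control device looks roughly like this (the exact snippet shipped by distros may differ):

# ~/.asoundrc (or /etc/asound.conf)
pcm.!default {
    type pulse
}
ctl.!default {
    type pulse
}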

This is how it works:

  • When an ALSA application opens the “pulse” device, a new PulseAudio stream is created, and playback or recording is redirected to PulseAudio server.

  • PulseAudio server does its usual business: routing, mixing, adjusting volume, etc.

  • If the stream was routed to a local sound card, and PulseAudio uses ALSA backend, the stream goes to libasound again. This time, however, a hardware device for the appropriate sound card is used instead of the virtual “pulse” device.

  • Finally, libasound asks the kernel space ALSA driver to write or read samples from the sound card ring buffer.

Emulate OSS

OSS is an aged interface for playing and capturing sound in Unix. OSS creates a character device for every sound card, and applications may open a device, write or read samples, and perform control commands with ioctl.

ossp (OSS Proxy) provides a full-featured OSS emulation on top of ALSA or PulseAudio. It has a modular architecture consisting of the main daemon and backends:

  • osspd

    osspd daemon creates and serves OSS devices using CUSE (Character Device in User Space). It forwards sound to a backend.

  • ossp-aslap

    ossp-aslap backend forwards sound to ALSA using libasound.

  • ossp-padsp

    ossp-padsp backend forwards sound to PulseAudio using libpulse.

Well, that’s quite enough forwarding.

In addition, PulseAudio provides the padsp wrapper. It intercepts standard library functions using the LD_PRELOAD trick:

  • when the wrapped application tries to open an OSS device, it gets a fake file descriptor

  • when the application issues an operation on that fake descriptor (e.g. write, stat, or ioctl), the wrapper handles it and forwards sound or control commands to PulseAudio

The same approach is used in aoss (OSS to ALSA), esddsp (OSS to ESound), and artsdsp (OSS to arts sound server). It works for some applications but is known to be incomplete.

Note that the ALSA driver also provides in-kernel OSS emulation. However, it’s not aware of the user space stuff, including libasound virtual devices, and therefore can’t be used to forward sound to PulseAudio. See OSS emulation on ALSA wiki.

Emulate ESound

component

module-esound-protocol-{unix,tcp}

module-esound-compat-{spawnpid,spawnfd}

Like PulseAudio, Enlightened Sound Daemon (ESD) is a sound server accessed by applications via a socket. This daemon was used in GNOME before it switched to PulseAudio.

For seamless migration, PulseAudio server provides two features:

  • A module that implements the ESound protocol and emulates ESD server. Existing applications may communicate with PulseAudio server as if it were an ESound server.

  • Two modules that implement ESound autospawn conventions. An application may start PulseAudio server as if it were an ESound server, and PulseAudio will notify the application that it was successfully started with a signal or via a file descriptor.

Emulate PulseAudio

There are several methods of running applications that need PulseAudio on systems where PulseAudio is not the primary sound system or even is not installed.

  • Configuring PulseAudio to use JACK backend

    It’s possible to configure PulseAudio to use JACK backend (that has limited functionality and larger latency) instead of ALSA backend.

    The typical use case for this method is to run PulseAudio applications that don’t support JACK (like Skype) on a system that uses JACK, without switching the entire system to PulseAudio.

  • Configuring PulseAudio as a “dumb pipe” for ALSA

    dmix and dsnoop are ALSA virtual devices that implement software mixing and support sharing the same device between multiple applications.

    It’s possible to configure PulseAudio to provide a single sink and source attached to the dmix and dsnoop devices without creating sinks and sources for hardware ALSA devices. See instructions here and here.

    The typical use case for this method is to run PulseAudio applications that don’t support ALSA (like Skype) on a system that uses ALSA, without switching the entire system to PulseAudio.

  • Emulating PulseAudio on top of ALSA

    The apulse wrapper tool uses the LD_PRELOAD trick to implement libpulse and libpulse-simple API directly on top of ALSA (libasound).

    The typical use case for this method is to run PulseAudio applications that don’t support ALSA on a system that uses ALSA, without even installing PulseAudio.

  • Bluetooth without PulseAudio

    The BlueALSA (bluez-alsa) project implements a virtual ALSA device that uses Bluez5 as a backend. This allows playing and recording audio from Bluetooth devices with any software that supports ALSA.

Suspend PulseAudio

pasuspender is a wrapper tool for applications that need exclusive access to ALSA devices.

It uses the device reservation API to ask PulseAudio server to release ownership of devices, runs the wrapped application, and returns the ownership when the application exits.

The typical use case for this method is to run JACK applications on a system that uses PulseAudio.


Server internals

This section provides a brief overview of PulseAudio server internals.

Components

PulseAudio server consists of several logical components:

  • daemon

    Daemon is a top-level component that configures core and modules and starts the core main loop.

  • core

    Core provides building blocks and shared environment for modules. It is implemented in the libpulsecore library, which also uses libpulse and libpulsecommon.

  • modules

    Modules are dynamically loaded libraries that extend server and implement many actual features, including network protocols, device drivers, sound processing tools, audio routing, etc. Modules use the libpulsecore library.

Core

The core provides building blocks and shared environment for modules:

  • environment for modules

    Shared environment for modules:

    • module management
    • name registry
    • main loop
    • hooks
  • fundamental objects

    Skeleton implementation of the fundamental objects:

    • module
    • client
    • card
    • device port
    • source
    • source output
    • sink
    • sink input
  • common functionality

    Reusable parts common for multiple modules:

    • audio processing
    • stream management
    • parsers and formatters
    • codecs
    • protocols
  • utility functions

    Numerous general-purpose utility functions:

    • memory management
    • collections
    • message queues
    • event loops
    • threading
    • I/O
    • platform wrappers
    • OOP helpers

Modules

A module is a dynamically loadable server extension. Usually, it is a shared library loaded at run-time, but it’s also possible to build it as a static library and link it into the server at compile time.

There are several ways in which a module may extend the server:

  • register new objects

    A module can get a skeleton implementation from the core (e.g. source or sink), extend it, create an instance, and register the instance in the name registry. The core and other modules may then use registered objects.

    Such a module is usually loaded multiple times. The module arguments define the name and parameters of the object to create. For every new object, a new instance of the module is loaded.

    This approach is used to implement hardware and network devices and sound processing tools.

  • register event loop handlers

    A module can monitor various external events, like a Udev event, a D-Bus signal, or socket I/O. The module registers an I/O or timer event handler in the core event loop, and the core invokes the handler when the event fires.

    Such a module may be loaded either once or multiple times depending on the implementation, e.g. one module instance for every socket address to be monitored.

    This approach is used to implement device hotplug, network publishing and discovery, and servers for various protocols, including the D-Bus API, the “native” protocol used in the C API, and the CLI protocol used in command line tools.

  • register hooks and subscriptions

    A module can register hooks or subscribe to events. The module monitors created, removed, or modified objects or other events, and implements some behavior.

    Such a module is usually loaded only once.

    This approach is used to implement various automatic actions like routing new streams, automatically saving and restoring object parameters to the database, autoloading filters, etc.

  • register protocol and API extensions

    A module can register extensions for the “native” protocol and the C API, as well as for the D-Bus API. Clients may then use these extensions to communicate with the module.

    This approach is used to provide an API to manage the restoration database and to set up custom parameters for some sound processing tools.

Objects

There are two base types that are used to implement objects:

  • object

    Object (pa_object) is the base type for reference-countable objects. It implements reference counting, virtual destructor, and dynamic type checks and casts.

    Usage:

    • message object
    • device port
  • message object

    Message object (pa_msgobject) is the base type for objects that can receive messages. It extends pa_object, so provides all its features, and adds message handling.

    Usage:

    • core
    • source
    • source output
    • sink
    • sink input
    • network connections
    • streams

Registries

The core provides two global registries accessible in modules:

  • name registry

    The name registry (pa_namereg) is a global hashmap that contains objects of the following types:

    • card
    • source
    • sink
    • sample cache entry

    These objects are always added to the name registry when they’re created, and modules can access them by name. The name registry also manages default source and sink.

  • shared properties

    The shared property subsystem (pa_shared) is a global hashmap that contains arbitrary data. Modules use it to register objects that may be accessed by name in related modules or other instances of the same module.

Hooks and subscriptions

Hooks are an internal notification mechanism. Hooks may be provided both by core and modules:

  • the core provides hooks for registered objects (e.g. sources and sinks)
  • the core provides hooks for the “native” and D-Bus protocol events
  • Bluetooth backend provides hooks for driver events

Object hooks can be roughly divided into four categories:

  • object is created or removed
  • object is fully initialized
  • object state or properties are changed, e.g. the property list or volume is updated
  • object connections are changed, e.g. a stream is moved to another device

Subscription events are an alternative notification mechanism for registered objects (e.g. sources and sinks). They may be used both internally and externally via the C API:

  • a module or client subscribes to events by a mask
  • core triggers event when an object is created, removed, or modified
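The external side of this mechanism is easy to observe with pactl, which prints a line for every subscription event; the exact output format may vary between versions, but it is along these lines:

    $ pactl subscribe
    Event 'new' on sink-input #42
    Event 'change' on sink #0
    Event 'remove' on sink-input #42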

Properties

PulseAudio defines numerous properties that may be set for objects registered in the core. See the ApplicationProperties wiki page and the Doxygen documentation for the full list of properties.

A property has a textual name and textual or binary value. The value format depends on the property name. Properties are organized into property lists. The following objects have a property list:

  • module
  • client
  • card
  • device port
  • source
  • source output
  • sink
  • sink input
  • sample cache entry
  • format info

The properties are accessible in the core, in modules, and in clients through the C API and D-Bus API. They are set both by the server and applications. When an application that uses libpulse connects to the server or creates a stream, libpulse automatically sets some client and stream properties from environment variables and process attributes.

Applications typically specify properties when creating a context or stream object. Context properties are used for the corresponding server-side client object, and stream properties are used for the corresponding server-side source output or sink input object.
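For example, the standard clients can attach stream properties from the command line, and the resulting sink input properties can then be inspected on the server (the property value and file name are examples only):

    # Mark the stream with a media role that routing modules may act upon.
    $ paplay --property=media.role=music input.wav

    # Inspect the properties of the resulting sink input.
    $ pactl list sink-inputs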

There are also two cases when properties are inherited:

  • when a sink input or source output is created for a “native” protocol stream, it inherits properties of the client that owns the stream

  • when a sink input is created to play a sample cache entry, it inherits the properties of the entry

Most properties are used to provide various auxiliary meta-information to applications, like description and icons that may be displayed in GUI. Some properties are used for automatic actions like filter autoloading and routing.

Properties are grouped into property classes, summarized below. Each entry gives the class description, the objects it is set for, who sets it, and who uses it:

  • "module.*" (module meta-information): set for modules; set by the module loader; used by GUIs.
  • "application.*" (application and process attributes): set for clients; set by applications and libpulse; used by GUIs, property autodetection, automatic actions, and routing.
  • "window.*" (desktop window attributes): set for clients; set by applications and libpulse; used by the position event sounds module.
  • "device.*" (card, device, and device port attributes): set for cards, device ports, sources, and sinks; set by device backends and sound processing modules; used by GUIs, automatic actions, and routing.
  • "media.*" (multimedia attributes of a stream): set for source outputs and sink inputs; set by applications and sound processing modules; used by GUIs, automatic actions, routing, and device backends.
  • "event.*" (event sound stream attributes): set for source outputs and sink inputs; set by applications and the sample cache; used by the sample cache and the position event sounds module.
  • "filter.*" (sound processing filters for a stream): set for source outputs and sink inputs; set by applications; used by filter autoloading.
  • "format.*" (sample format attributes): set for the format info of a source, sink, source output, or sink input; set by applications and device backends; used by applications and device backends.

Threads

There are two types of threads used in PulseAudio server:

  • event loop threads

    These threads run an event loop (pa_mainloop) which handles I/O and timer events, including asynchronous messages from other threads.

    The mainloop API hides the execution flow control from the thread implementer, which just registers event handlers and starts the loop.

  • IO threads

    These threads run a select/poll-like loop (pa_rtpoll) which handles I/O and timer events, including asynchronous messages from other threads.

    The rtpoll API leaves the execution flow control to the thread implementer, who may decide how much and when to sleep, what file descriptors to handle and when, when to handle pending messages, etc.

PulseAudio server creates the following threads:

  • core event loop

    The main thread runs the core main loop. Most modules register handlers in the core main loop. For example, the core main loop is used in hotplug and network modules to listen to Udev events (to detect ALSA cards), to listen to D-Bus events (to detect Bluetooth devices and JACK ports), to listen to broadcast announcements (to detect RTP receivers), and to handle client connections.

  • avahi event loop

    PulseAudio creates a separate threaded event loop which runs an Avahi client and handles asynchronous messages from other threads.

  • client event loops

    When PulseAudio server acts as a client to another remote PulseAudio server, it runs a separate thread with a client main loop. This is used in the tunnel source and sink implementation.

  • device IO threads

    Every source and sink has its own IO thread, which reads or writes samples, implements clocking, and maintains latency.

The communication between threads is done via messages. Each thread installs a thread message queue (pa_thread_mq, which uses pa_asyncmsgq) to handle asynchronous messages for message objects (pa_msgobject).

There are two types of message objects:

  • Sources and sinks are message objects with a dedicated IO thread, which handles both IO and asynchronous messages.

  • Other message objects, including source outputs and sink inputs, don’t have a dedicated thread. They all share the core event loop thread which handles asynchronous messages sent to them.

Memory

The memory management in PulseAudio is based on the following concepts:

  • reference counting

    PulseAudio extensively uses reference counting. In particular, all objects (pa_object), message objects (pa_msgobject), memory blocks (pa_memblock), and many other types are reference-counted.

  • blocks

    A block (pa_memblock) is a fixed-size reference-counted array in memory, usually, but not necessarily, allocated from a memory pool. Usually, blocks are used indirectly via chunks.

  • chunks

    A chunk (pa_memchunk) is a variable-size slice of a block. It consists of a pointer to a block, a starting offset in the block, and a length. Chunks are not reference-counted. They are usually allocated on the stack or inside other objects. Most of the sample processing and exchange is done using chunks.

  • block queues

    A block queue (pa_memblockq) is a FIFO of chunks. It allows pushing and popping chunks and implements various flow control operations. Sample streams are based on block queues.

  • pools

    A pool (pa_mempool) is the most common way to allocate blocks.

    Pools may use either private memory, POSIX shared memory, or memfd shared memory. The shared memory pools combined with the memory exports and imports are used to implement the zero-copy mode.

    Pools may be either global or per-client. The per-client pools guarantee that only the owner client is able to access its memory.

  • exports and imports

    Memory exports (pa_memexport) make blocks accessible from other processes. Memory imports (pa_memimport) provide access to the exported blocks.

    The exporting process allocates a block in a shared memory pool, fills it, and communicates the shared memory id and the block id to the importing process. The importing process opens the shared memory, finds the block, and uses it.

    When either the exporting process revokes the block or the importing process releases the block, it is returned to the memory pool.

I/O

PulseAudio server uses the following I/O APIs:

  • mainloop

    The event loop API (pa_mainloop_api) is used to handle I/O and timer events using callbacks. The user registers callbacks and runs the loop.

    There are three implementations of the mainloop API: a regular event loop (pa_mainloop), a threaded event loop (pa_threaded_mainloop), which runs an event loop in a separate thread, and a Glib event loop (pa_glib_mainloop), which runs an event loop on top of the Glib event loop.

    Most of the I/O, with a notable exception of the device IO threads, is based on the mainloop API.

  • rtpoll

    The rtpoll API (pa_rtpoll) is used to handle I/O and timer events using a select/poll-like loop.

    The user registers descriptors and manually runs the loop iterations. The user fully controls the flow and decides when to perform I/O and when and how much to sleep.

    The device threads are based on the rtpoll API.

  • srbchannel

    A shared ring buffer channel (pa_srbchannel) is a bidirectional byte stream on top of two ring buffers in the shared memory, two file descriptor-based semaphores (using POSIX pipe or Linux-specific eventfd), and an event loop.

    The user registers a callback that is called when the descriptor is ready, and performs non-blocking read or write operations.

    The “native” protocol streams employ this channel to exchange control commands and chunk identifiers in the zero-copy mode, when the client and server are running on the same host and use a shared memory pool.

  • iochannel

    An I/O channel (pa_iochannel) is a bidirectional byte stream on top of a socket file descriptor and an event loop.

    The user registers a callback that is called when the descriptor is ready, and performs non-blocking read or write operations. In addition to the regular data, the user may also send and receive file descriptors and user credentials, if a Unix domain socket is used.

    Binary client streams, like the “native”, “simple”, and ESound streams, are based on this channel, except when using the zero-copy mode of the “native” protocol.

  • ioline

    An I/O line (pa_ioline) is a bidirectional line-oriented text stream on top of the I/O channel (pa_iochannel). The user registers a callback that is called for every received line, and sends and receives data line-by-line.

    Text client streams, like the CLI and HTTP streams, are based on this channel.

Packets

The “native” protocol is implemented on top of packets, packet stream, and packet dispatcher:

  • packet

    Packet (pa_packet) is a fixed-size reference-countable blob with data. Packets are usually allocated from a global per-process pool.

  • packet stream

    Packet stream (pa_pstream) is a bidirectional message stream on top of an event loop (pa_mainloop), an I/O channel (pa_iochannel), and a memory pool (pa_mempool).

    The packet stream may optionally employ memory exports (pa_memexport), memory imports (pa_memimport), and a shared ring buffer channel (pa_srbchannel) if the zero-copy mode is enabled.

    The user may send and receive packets (pa_packet), chunks (pa_memchunk), and control messages (shmrelease, shmrevoke). All these message types are used by the “native” protocol.

  • packet dispatcher

    Packet dispatcher (pa_pdispatch) looks up and invokes a callback for a packet. The user first registers callbacks for commands in the dispatcher, and then passes incoming packets to it.

Audio files

Normally, it’s up to the client application to read or write audio files, and both PulseAudio server and client libraries deal only with sample streams. However, there are two cases when the server can read an audio file directly:

  • the client may ask the server to load an audio file to the sample cache
  • the client may ask the server to play an audio file

In the latter case, the server creates a new sink input that reads samples from the file. Both features are available through the CLI protocol. The server uses libsndfile to read audio files.
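Both operations can be tried via pacmd; the file path and the sink index below are illustrative:

    # Load a file into the sample cache under the name "bell", then play it on sink 0.
    $ pacmd load-sample bell /usr/share/sounds/alsa/Front_Center.wav
    $ pacmd play-sample bell 0

    # Ask the server to play a file directly on sink 0.
    $ pacmd play-file /usr/share/sounds/alsa/Front_Center.wav 0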

Optimizations

Depending on the target platform, PulseAudio may employ various optimized versions of the sample conversion and software volume functions.

They include several functions written in the GCC inline assembly and employing MMX (x86), SSE (x86), or NEON (arm) instructions, and several functions written in the Orc assembly. The latter may be compiled at run-time for the current CPU.

Watchdog

PulseAudio server has a built-in watchdog based on the POSIX rlimit feature. It terminates the server process if the process consumes too much CPU and doesn’t respond in time. The server may then be automatically restarted by a client if the autospawn feature is enabled.

The server configures the RLIMIT_CPU timer which has the soft and hard limits:

  • When the CPU time of the server process reaches the soft limit, the kernel sends to the process the SIGXCPU signal.

    The signal handler checks whether the server has consumed too high a percentage of CPU time since the previous signal handler invocation. If so, the signal handler terminates the server. Otherwise, it restarts the RLIMIT_CPU timer.

  • When the CPU time of the server process reaches the hard limit, the kernel sends to the process the SIGKILL signal.

    This signal can’t be handled and unconditionally terminates the process. This happens only if the server failed to handle the previous SIGXCPU signal and didn’t restart the RLIMIT_CPU timer before it reached the hard limit.
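When the server is run under a debugger or profiler, the watchdog may kill it spuriously; in that case it can be disabled at startup (assuming the build supports the --no-cpu-limit flag):

    # Run in the foreground, verbose, with the CPU watchdog disabled.
    $ pulseaudio --daemonize=no --no-cpu-limit -vvv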


Module list

The lists below provide a brief summary of the modules available out of the box, grouped by category. Further details may be found on the Modules wiki page.

Protocols and networking

  • module-cli-protocol-{unix,tcp}, module-cli (enabled by default): Starts the CLI protocol server over a Unix domain socket, TCP socket, or the controlling TTY of the daemon.
  • module-dbus-protocol (enabled by default): Starts the D-Bus protocol server.
  • module-native-protocol-{fd,unix,tcp} (enabled by default): Starts the "native" protocol server over a preopened file descriptor, Unix domain socket, or TCP socket.
  • module-simple-protocol-{unix,tcp} (rarely used): Starts the "simple" protocol server over a Unix domain socket or TCP socket.
  • module-esound-protocol-{unix,tcp}: Starts the ESound protocol server over a Unix domain socket or TCP socket.
  • module-tunnel-{source,sink} (loaded by another module): Creates a virtual source or sink connected to a remote source or sink via the "native" protocol (implements the client from scratch).
  • module-tunnel-{source,sink}-new (work in progress): Creates a virtual source or sink connected to a remote source or sink via the "native" protocol (reuses the client from libpulse).
  • module-zeroconf-discover (enabled in paprefs): Listens to mDNS announcements and automatically loads module-tunnel-{source,sink} for every remote source or sink (uses Avahi).
  • module-zeroconf-publish (enabled in paprefs): Sends mDNS announcements for all local sources and sinks (uses Avahi).
  • module-bonjour-publish (for MacOS): Sends mDNS announcements for all local sources and sinks (uses Apple Bonjour).
  • module-raop-sink (loaded by another module): Creates a virtual sink that forwards audio to a remote AirPlay1 device.
  • module-raop-discover (enabled in paprefs): Listens to mDNS announcements and automatically loads module-raop-sink for every remote AirPlay1 device (uses Avahi).
  • module-rtp-recv (enabled in paprefs): Listens to SDP/SAP announcements and automatically creates an RTP sink input for every detected RTP sender.
  • module-rtp-send (enabled in paprefs): Creates an RTP source output that sends samples to a preconfigured address, and broadcasts SDP/SAP announcements for it.
  • module-http-protocol-{unix,tcp} (used by other modules): Starts an HTTP server over a Unix domain socket or TCP socket. Implements a web interface and HTTP streaming for sources and sink monitors.
  • module-rygel-media-server (enabled in paprefs): Registers a plugin for the Rygel DLNA / UPnP server. Publishes HTTP streams of sources and sink monitors.

Device drivers

  • module-udev-detect (enabled if available): Listens to Udev events and automatically loads module-alsa-card for every ALSA card.
  • module-hal-detect (deprecated): Loads module-udev-detect.
  • module-detect (for systems without Udev): On start, detects ALSA, OSS, Solaris, and WaveOut devices and loads the appropriate source and sink modules.
  • module-alsa-card (loaded by another module): Creates a card for an ALSA card, and automatically loads module-alsa-{source,sink} for inner ALSA devices.
  • module-alsa-{source,sink} (loaded by another module): Creates a hardware source or sink for an ALSA device.
  • module-bluetooth-discover (enabled by default): Loads either module-bluez5-discover or module-bluez4-discover.
  • module-bluetooth-policy (enabled by default): Automatically switches card profiles of Bluetooth cards and loads module-loopback to route and play music from new Bluetooth sources.
  • module-{bluez5,bluez4}-discover (loaded by another module): Listens to Bluez and oFono events on D-Bus and automatically loads module-{bluez5,bluez4}-device for every device.
  • module-{bluez5,bluez4}-device (loaded by another module): Creates a card, source, and sink for a Bluetooth device.
  • module-jackdbus-detect: Listens to JACK events on D-Bus and automatically loads module-jack-{source,sink} for every JACK port.
  • module-jack-{source,sink} (loaded by another module): Creates a virtual source or sink for a JACK port.
  • module-oss (for systems with OSS): Creates a virtual source and sink for an OSS device (/dev/dspN).
  • module-solaris (for Solaris and some *BSD): Creates a virtual source and sink for a Sun audio device (/dev/audio).
  • module-coreaudio-detect (for MacOS): Listens to CoreAudio events and automatically loads module-coreaudio-device for every CoreAudio device.
  • module-coreaudio-device (for MacOS): Creates a virtual source and sink for a CoreAudio device.
  • module-waveout (for Windows): Creates a virtual source and sink for Win32 WaveIn/WaveOut devices.
  • module-esound-sink (has latency issues): Creates a virtual sink connected to the ESound daemon.
  • module-lirc: Listens to LIRC events from an IR remote control and forwards volume up/down and mute requests to a preconfigured sink.
  • module-mmkbd-evdev: Listens to evdev events from a multimedia keyboard and forwards volume up/down and mute requests to a preconfigured sink.

Sound processing

  • module-loopback (used by other modules): Creates a pair of virtual source output and sink input connected with a queue; may be used as a source-to-sink adapter.
  • module-null-{source,sink}: Creates a virtual sink that silently drops all data. Together with its monitor, it may be used as a source-output-to-sink-input adapter.
  • module-combine-sink (enabled in paprefs): Creates a virtual sink that duplicates data to several other sinks.
  • module-combine (deprecated): Loads module-combine-sink.
  • module-sine-source, module-sine: Creates a virtual source or sink input that generates a sine wave with the preconfigured frequency.
  • module-pipe-{source,sink}: Creates a virtual source or sink that reads or writes data to a preconfigured file or named pipe.
  • module-remap-{source,sink} (autoloadable filter): Creates a filter source or sink that performs channel remapping on top of the master source or sink.
  • module-equalizer-sink (autoloadable filter): Creates a filter sink that implements a digital equalizer on top of the master sink. The equalizer may be controlled via D-Bus.
  • module-virtual-surround-sink (autoloadable filter): Creates a filter sink that performs a convolution with an HRIR WAV file on top of the master sink.
  • module-virtual-{source,sink} (autoloadable filter): Creates a filter source or sink that just reads or writes data to the master source or sink.
  • module-echo-cancel (autoloadable filter): Creates a paired filter source and sink that perform acoustic echo cancellation on top of the master source and sink.
  • module-ladspa-sink (autoloadable filter): Creates a filter sink that applies an audio filter from an external LADSPA plugin on top of the master sink. The plugin may be controlled via D-Bus.

Power saving

  • module-suspend-on-idle (enabled by default): Monitors sources and sinks and automatically suspends them when there are no connected streams for some period of time.

Automatic setup and routing

  • module-default-device-restore (enabled by default): Saves and restores the fallback source and sink.
  • module-card-restore (enabled by default): Saves and restores the active profiles of cards.
  • module-device-restore (enabled by default): Saves and restores volume/mute settings and active ports of devices.
  • module-stream-restore (enabled by default): Saves and restores volume/mute settings and routing rules of stream groups. Routes new streams according to the stored device name.
  • module-volume-restore (deprecated): Loads module-stream-restore.
  • module-device-manager (enabled in KDE): Stores per-role and global priority lists of devices. Routes new streams according to the stored priority lists.
  • module-intended-roles (enabled by default): Routes new streams according to the stream role and the device intended role list.
  • module-augment-properties (enabled by default): Computes client properties from the desktop file of the application.
  • module-filter-heuristics (enabled by default): Adjusts the filters requested via the stream properties.
  • module-filter-apply (enabled by default): Loads the filters requested via the stream properties.
  • module-switch-on-port-available (enabled by default): When a new device port appears, automatically sets it as the active port of the appropriate device.
  • module-switch-on-connect: When a new device appears, automatically sets it as the fallback device and moves all active streams to it.
  • module-rescue-streams (enabled by default): When a device disappears, automatically moves streams connected to it to another working device.
  • module-always-sink (enabled by default): When all sinks disappear, automatically loads the null sink.
  • module-role-{ducking,cork} (enabled by default): When an important stream is started, automatically ducks (lowers the volume of) or corks (mutes and requests a pause of) other active streams.
  • module-allow-passthrough: When a new passthrough stream is moved to an existing sink, automatically creates a null sink and moves all other running streams to that null sink.
  • module-position-event-sounds (enabled by default): When a stream is created or its properties are changed, adjusts the volume balance of the stream depending on its on-screen 2-D coordinates provided by the application.
  • module-match: When a new stream appears, automatically updates its volume based on its name and a preconfigured match table.

Desktop integrations

  • module-systemd-login (enabled by default): Listens to logind events on D-Bus and creates a fake PulseAudio client for every new login session of the current user, to prevent the server from exiting until the user logs out.
  • module-console-kit (for systems without systemd): Listens to ConsoleKit events on D-Bus and creates a fake PulseAudio client for every new login session of the current user, to prevent the server from exiting until the user logs out.
  • module-x11-xsmp (enabled by default): Listens to X session manager events and creates a fake PulseAudio client for the current login session, to prevent the server from exiting until the user logs out.
  • module-x11-publish (enabled by default): Publishes the server address and credentials via the X11 root window properties, which may be read by clients running on the same X display, including remote clients that use SSH X forwarding.
  • module-x11-bell (enabled by default): Intercepts XKB bell events and plays a preconfigured sample from the sample cache instead of the default X11 bell.
  • module-x11-cork-request (enabled by default): When a cork or uncork is requested for a stream, synthesizes an X11 media key event and sends it to the application. A workaround for applications that don't handle cork requests, but do handle multimedia keys.
  • module-gconf (enabled by default): Monitors a GConf directory and automatically loads the PulseAudio modules listed there.

Compatibility layers

  • module-esound-compat-spawnpid, module-esound-compat-spawnfd (loaded by esdcompat): Notifies the process that started the server that the server was successfully started, with a signal or via a file descriptor.


GUI tools

This section provides a brief summary of the three standard GUI tools for PulseAudio. Besides these tools, most desktop environments also provide their own tools or applets.

pavucontrol

pavucontrol (PulseAudio Volume Control) provides the following features:

  • set the volume and mute state of sources, source outputs, sinks, and sink inputs
  • connect sink inputs to sinks
  • connect source outputs to sources
  • set fallback source and sink
  • set active device port of a source or sink
  • set active card profile of a card
  • configure latency offset and passthrough mode of a source or sink

This tool uses the C API to communicate with the server. PulseAudio automatically saves most of these settings to the restoration database, so they are persistent.

paprefs

paprefs (PulseAudio Preferences) provides methods to enable modules or module options that are disabled by default.

This tool just writes module names and arguments to the GNOME registry (GConf) and PulseAudio automatically loads the modules. These settings are persistent.

qpaeq

qpaeq (Qt PulseAudio Equalizer) is a frontend for the equalizer sink.

This tool communicates with the server through the D-Bus API. It first creates an equalizer sink connected to a selected master sink, and then uses the D-Bus API extension registered by the equalizer sink.

The number of sliders depends on the window size, so a larger window gives a higher precision.



Command line tools

The PulseAudio package comes with several command line tools.

Server

  • pulseaudio

    pulseaudio tool starts or kills the server.

    The user may specify what configuration files to use, what modules to load and from where, configure log levels, and some other options.
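Typical invocations look like this:

    # Check whether a server is already running for this user.
    $ pulseaudio --check && echo running

    # Start the server if it is not already running.
    $ pulseaudio --start

    # Kill the running server.
    $ pulseaudio --kill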

Clients

  • pacmd

    pacmd tool starts an interactive session for server configuration.

    It connects to the server via the CLI protocol over a Unix domain socket. This text protocol provides a variety of commands to inspect and configure the server. The tool redirects its stdin and stdout to the socket so that the user directly communicates with the server.

  • pactl

    pactl tool implements non-interactive commands for server configuration.

    It communicates with the server via the C API, which uses the “native” protocol internally. The tool understands a number of commands which should be specified via the command line arguments. It supports only a subset of the features available in the CLI protocol.

  • pacat

    pacat tool implements a playback and recording client.

    The paplay, parecord, parec, and pamon tools are symlinks to the pacat tool. The tool has four operation modes. The mode is determined by the symlink used to invoke the tool:

    • paplay

      Create a playback stream, read and decode samples from an audio file using libsndfile, and send samples to the stream.

    • parecord

      Create a recording stream, receive samples from the stream, and encode and write them to a file using libsndfile.

    • pacat

      Create a playback stream, read raw samples from stdin, and send samples to the stream.

    • parec or pamon

      Create a recording stream, receive samples from the stream, and write raw samples to stdout.
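A few illustrative invocations of these clients (device names, formats, and file names are examples only):

    # Interactive inspection over the CLI protocol.
    $ pacmd list-sinks

    # Non-interactive commands over the "native" protocol.
    $ pactl list short sink-inputs
    $ pactl set-sink-volume @DEFAULT_SINK@ 50%

    # Play a file, and capture raw samples from the default source.
    $ paplay input.wav
    $ parec --format=s16le --rate=44100 --channels=2 > capture.pcm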

Desktop

  • start-pulseaudio-x11

    start-pulseaudio-x11 tool starts the server for the current X11 session.

    It relies on the autospawn feature to start the server. The tool loads several desktop-related modules, passing them the $DISPLAY and $SESSION_MANAGER environment variables.

  • pax11publish

    pax11publish tool publishes the server address and credentials via the X11 root window properties.

    The tool sets X11 properties that may be used by PulseAudio clients to connect to the server. The server address and credentials should be provided manually by the user via command line arguments. The tool is now superseded by the module-x11-publish module and the start-pulseaudio-x11 tool.

Compatibility

  • pasuspender

    pasuspender tool is a wrapper for applications that require an exclusive access to devices.

    The tool connects to the server via the C API and asks the server to suspend all sinks and sources. Then it runs a child process, which may freely use any devices, typically ALSA devices. When the child exits, the tool asks the server to resume the sinks and sources.

  • padsp

    padsp tool is a wrapper for OSS applications.

    The tool intercepts standard library functions of an OSS application using the LD_PRELOAD trick and redirects sound to PulseAudio. This approach works for some applications but is known to be incomplete. The ossp daemon with the ossp-padsp backend may be a better alternative for this tool.

  • esdcompat

    esdcompat emulates ESound autospawn feature.

    The tool uses PulseAudio autospawn to start PulseAudio server and loads ESound compatibility modules that emulate ESound autospawn. The tool takes the same arguments as the ESound daemon so it can be used as a drop-in replacement for the esd tool.


Configuration

Components: libpulse, libpulsecore

PulseAudio configuration is documented on the User and FAQ wiki pages. The Arch Linux wiki may also be useful.

System and user modes

There are two ways to run the server:

  • one instance per-user
  • one system-wide instance

The system-wide mode is not recommended but is useful in some cases like an embedded system. See the SystemWide and WhatIsWrongWithSystemWide pages on wiki.

The major differences between the two modes are the following:

  • In the system-wide mode, most of the desktop integration features, like session management and X11 publishing, are not necessary and can’t be used, since the server is not bound to a login session or X11 display.

  • In the per-user mode, server instances use the device reservation API to acquire exclusive access to ALSA devices. In the system-wide mode, this API is not used.

  • In the per-user mode, PulseAudio system directories are shared between server instances, and PulseAudio user directories belong to the user. In the system-wide mode, PulseAudio user directories belong to the “pulse” user.

From the usability and security points of view, the server is not designed to be shared by multiple users. There is no user separation inside the server. In particular:

  • Any connected client can read and control streams or devices used by another client. Any connected client can affect global configuration, like device hotplug.

  • All connected clients share the same persistent state, including the default source and sink, the restoration database, etc.
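For reference, a system-wide instance is typically started along the following lines (the extra flags are commonly recommended hardening, not a requirement):

    # One instance for all users; state belongs to the "pulse" system user.
    $ pulseaudio --system --disallow-exit --disallow-module-loading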

System directories

PulseAudio uses the following system-wide directories:

  • config directory

    Holds system-wide configuration files.

    Usually set to "/etc/pulse".

    Contains .conf and .pa configuration files.

  • module directory

    Holds dynamically loadable libraries for server modules.

    Usually set to "/usr/lib/pulse-{VERSION}/modules".

    Contains "module-{NAME}.so" files.

  • data directory

    Holds platform-independent server data.

    Usually set to "/usr/share/pulseaudio".

    Contains "alsa-mixer" directory with ALSA profiles for non-UCM ALSA cards. See Profiles page on wiki.

User directories

PulseAudio uses the following per-user directories (belonging to the “pulse” user in the system-wide mode):

  • home directory

    Default or parent directory for the configuration files, persistent state, and runtime state of the user’s server instance.

    Usually set to "$XDG_CONFIG_HOME/pulse" (typically "~/.config/pulse"). However, if the "$HOME/.pulse" directory already exists, it’s used otherwise.

    Contains:

    • a cookie file (for the “native” protocol authentication)
    • usually, contains configuration files, if the config directory is the same as the home directory
    • usually, contains persistent state, if the state directory is the same as the home directory
    • usually, contains a symlink to the runtime directory
  • config directory

    Holds per-user configuration files that override global configuration.

    Usually, this is the same directory as the home directory. However, another path may be specified via the $PULSE_CONFIG_PATH environment variable.

    Contains .conf and .pa configuration files.

  • state directory

    Holds per-user persistent state that is changed dynamically and should be kept across reboots.

    Usually, this is the same directory as the home directory. However, another path may be specified via the $PULSE_STATE_PATH environment variable.

    Contains:

    • default source and sink names
    • restoration database
    • device manager database
    • equalizer sink database
  • runtime directory

    Holds per-user runtime state that should be cleared when the server restarts.

    Usually, the home directory contains a symlink to the runtime directory, which is located in /tmp. The symlink name includes the machine id.

    However, another path may be specified via the $PULSE_RUNTIME_PATH environment variable. If it’s not set, but the $XDG_RUNTIME_DIR environment variable is set, the runtime directory path is set to "$XDG_RUNTIME_DIR/pulse" instead (typically somewhere in "/run").

    Contains:

    • sockets files
    • pid file
    • lock files

Configuration files

PulseAudio uses two types of configuration files:

  • .conf - client or server options in a simple key-value format
  • .pa - server initialization commands in the CLI protocol format

Four files are used by default:

  • daemon.conf - server options
  • client.conf - client options
  • default.pa - server initialization for the per-user mode
  • system.pa - server initialization for the system-wide mode
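As a rough illustration, a per-user daemon.conf and default.pa might contain fragments like the following; all option values and the sink name are examples, not recommendations:

    # ~/.config/pulse/daemon.conf: simple "key = value" options
    default-sample-rate = 48000
    resample-method = speex-float-1
    flat-volumes = no

    # ~/.config/pulse/default.pa: CLI protocol commands run at startup
    .ifexists module-udev-detect.so
    load-module module-udev-detect
    .endif
    load-module module-native-protocol-unix
    set-default-sink alsa_output.pci-0000_00_1b.0.analog-stereo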

Sockets

PulseAudio uses two types of sockets:

  • Unix domain sockets (for local clients)
  • TCP sockets (for remote clients)

The list below gives the non-standard TCP ports used by PulseAudio:

  • module-native-protocol-tcp: 4713
  • module-simple-protocol-tcp: 4711
  • module-http-protocol-tcp: 4714
  • module-cli-protocol-tcp: 4712
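For example, the "native" protocol TCP socket can be enabled on the server and used from a remote client (the host name and the ACL are illustrative):

    # On the server: accept connections from the local network.
    $ pactl load-module module-native-protocol-tcp auth-ip-acl=192.168.1.0/24

    # On the client: point libpulse at the remote server and port.
    $ PULSE_SERVER=tcp:myserver:4713 paplay input.wav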

Client startup

Every client that uses libpulse performs the following steps at startup:

  • initialize

    First, the client reads the server address, the authentication cookie, and other options from:

    • environment variables
    • X11 root window properties
    • per-user and system-wide client configuration files
  • connect

    When the initialization is done, the client tries to connect to the following addresses:

    • a Unix socket for the “native” protocol in the per-user and system-wide runtime directories
    • a TCP socket on localhost
    • a TCP socket on the host defined by the $DISPLAY environment variable
  • autospawn

    If the client can’t connect to the server, it automatically starts the server if all of the following are true:

    • the autospawn is not disabled in the client configuration file
    • the client is not running under the root user
    • the server address is not set, or it’s set and belongs to the same login session as the client
  • authenticate

    When the client has connected to the server, it tries to read the authentication cookie from:

    • environment variables
    • X11 root window properties
    • a cookie file explicitly provided by the application
    • per-user and system-wide client configuration files
    • per-user home directory
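These lookups can be bypassed by setting the relevant environment variables explicitly; the paths below are typical but installation-dependent:

    # Connect to an explicit local socket.
    $ PULSE_SERVER=unix:/run/user/1000/pulse/native paplay input.wav

    # Connect to a remote server with an explicit cookie file.
    $ PULSE_SERVER=tcp:192.168.1.10 PULSE_COOKIE=$HOME/.config/pulse/cookie paplay input.wav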

Portability

PulseAudio may work on several POSIX-compatible platforms. See the About and Ports wiki pages.

Currently supported operating systems are:

  • Linux
  • Android
  • *BSD
  • Solaris
  • Mac OSX
  • Windows

PulseAudio core implements platform wrappers for low-level functionality like threading, networking, I/O, and shared memory. The rest of the code is mostly cross-platform.

However, some important features rely on platform-specific external components:

  • The only full-featured hardware backends are ALSA and Bluetooth, which are both Linux-specific.

  • The timer-based scheduler is implemented only for ALSA cards, hence it’s also Linux-specific.

  • Most desktop integration features depend on freedesktop and Unix or Linux-specific components which are not used on Android, Mac OSX, and Windows.

  • Hardware controls handling also depends on Unix or Linux-specific components, which are not used on Mac OSX and Windows.

As a result, Linux is the only platform on which all of the important features are supported. Other Unix desktops are supported but have limited functionality. Some features rely on the Linux and Unix desktop stack, so support for non-Unix desktops is even more limited.


Example setups

This section demonstrates example PulseAudio configurations for several common and advanced use cases.

Playback and recording

  • Connect a recording application to a source

    An application connects to the server via the “native” protocol and creates a recording stream. The server creates a source output for the stream. The source output is connected to a source.

    Configuration:

    1. Start the recording application:

      $ parecord output.wav
    2. Connect the parecord source output to the source using the pavucontrol tool.

  • Connect a playback application to a sink

    An application connects to the server via the “native” protocol and creates a playback stream. The server creates a sink input for the stream. The sink input is connected to a sink.

    Configuration:

    1. Start the playback application:

      $ paplay input.wav
    2. Connect the paplay sink input to the sink using the pavucontrol tool.

  • Connect a playback application to a filter sink

    An application connects to the server via the “native” protocol and creates a playback stream. The server creates a sink input for the stream. The sink input is connected to a filter sink, which in turn is connected to the master sink.

    Configuration:

    1. Create an equalizer sink and sink input:

      $ pactl load-module module-dbus-protocol
      $ qpaeq
    2. Connect the equalizer sink input to the master sink using the pavucontrol tool.

    3. Start the playback application:

      $ paplay input.wav
    4. Connect the paplay sink input to the equalizer sink using the pavucontrol tool.

Capturing sound

  • Connect a sink to a file

    The monitor source of a sink is connected to a loopback, which in turn is connected to a pipe sink (because it’s not possible to connect a source to a sink directly). The pipe sink writes samples to a file.

    Configuration:

    1. Create the pipe sink:

      $ pactl load-module module-pipe-sink file="$(pwd)/output.pcm"
    2. Create the loopback sink input and source output:

      $ pactl load-module module-loopback
    3. Connect the loopback sink input to the pipe sink using the pavucontrol tool.

    4. Connect the loopback source output to the sink monitor of a sink using the pavucontrol tool.

    5. When the capture is done, play the recorded file:

      $ pacat output.pcm
  • Connect a playback application to a file

    An application connects to the server via the “native” protocol and creates a playback stream. The server creates a sink input for the stream. The sink input is connected to a pipe sink. The pipe sink writes samples to a file.

    Configuration:

    1. Create the pipe sink:

      $ pactl load-module module-pipe-sink file="$(pwd)/output.pcm"
    2. Start the playback application:

      $ paplay input.wav
    3. Connect the paplay sink input to the pipe sink using the pavucontrol tool.

    4. When the capture is done, play the recorded file:

      $ pacat output.pcm
  • Connect a playback application to a recording application

    Two applications connect to the server via the “native” protocol and create playback and recording streams. The server creates sink input and source output for the streams. The sink input is connected to the source output via a null sink (because it’s not possible to connect a sink input to a source output directly).

    Configuration:

    1. Create a null sink:

      $ pactl load-module module-null-sink
    2. Start the playback application:

      $ paplay input.wav
    3. Start the recording application:

      $ parecord output.wav
    4. Connect the paplay sink input to the null sink using the pavucontrol tool.

    5. Connect the parecord source output to the monitor of the null sink using the pavucontrol tool.

Native protocol

  • Connect a local playback application to a remote sink

    An application connects to the local server via the “native” protocol and creates a playback stream. The local server creates a sink input for the stream. The local sink input is connected to a tunnel sink. The tunnel sink connects to the remote server via the “native” protocol and creates a playback stream. The remote server creates a sink input for the stream. The remote sink input is connected to a sink.

    Configuration:

    1. Enable Zeroconf publishing on the remote server using the paprefs tool or the following commands:

      $ (start avahi daemon)
      $ pactl load-module module-native-protocol-tcp auth-anonymous=1
      $ pactl load-module module-zeroconf-publish
    2. Enable Zeroconf discovery on the local server using the paprefs tool or the following commands:

      $ (start avahi daemon)
      $ pactl load-module module-zeroconf-discover
    3. Wait until the tunnel sinks and sources appear on the local server.

    4. Start the playback application on the local server:

      $ paplay input.wav
    5. Connect the local sink input to the tunnel sink using the pavucontrol tool on the local server.

    6. Connect the remote sink input to a sink using the pavucontrol tool on the remote server.

  • Connect a local playback application to a remote recording application

    The local application connects to the local server via the “native” protocol and creates a playback stream. The local server creates a sink input for the stream. The remote application connects to the remote server via the “native” protocol and creates a recording stream. The remote server creates a source output for the stream.

    The local sink input is connected to a tunnel sink. The tunnel sink connects to the remote server via the “native” protocol and creates a playback stream. The remote server creates a sink input for the stream. The remote sink input is connected to the remote source output via a null sink (because it’s not possible to connect a sink input to a source output directly).

    Configuration:

    1. Enable Zeroconf publishing on the remote server using the paprefs tool or the following commands:

      $ (start avahi daemon)
      $ pactl load-module module-native-protocol-tcp auth-anonymous=1
      $ pactl load-module module-zeroconf-publish
    2. Enable Zeroconf discovery on the local server using the paprefs tool or the following commands:

      $ (start avahi daemon)
      $ pactl load-module module-zeroconf-discover
    3. Wait until the tunnel sinks and sources appear on the local server.

    4. Start the playback application on the local server:

      $ paplay input.wav
    5. Start the recording application on the remote server:

      $ parecord output.wav
    6. Create the null sink on the remote server:

      $ pactl load-module module-null-sink
    7. Connect the local sink input to the tunnel sink using the pavucontrol tool on the local server.

    8. Connect the remote sink input to the null sink using the pavucontrol tool on the remote server.

    9. Connect the remote source output to the sink monitor of the null sink using the pavucontrol tool on the remote server.

  • Connect a local sink to a remote sink

    The monitor source of a sink is connected to a loopback, which in turn is connected to a tunnel sink (because it’s not possible to connect a source to a sink directly). The tunnel sink connects to the remote server via the “native” protocol and creates a playback stream. The remote server creates a sink input for the stream. The remote sink input is connected to a sink.

    Configuration:

    1. Enable Zeroconf publishing on the remote server using the paprefs tool or the following commands:

      $ (start avahi daemon)
      $ pactl load-module module-native-protocol-tcp auth-anonymous=1
      $ pactl load-module module-zeroconf-publish
    2. Enable Zeroconf discovery on the local server using the paprefs tool or the following commands:

      $ (start avahi daemon)
      $ pactl load-module module-zeroconf-discover
    3. Wait until the tunnel sinks and sources appear on the local server.

    4. Create the loopback on the local server:

      $ pactl load-module module-loopback
    5. Connect the loopback sink input to the tunnel sink using the pavucontrol tool.

    6. Connect the loopback source output to the sink monitor of a sink using the pavucontrol tool.

    7. Connect the remote sink input to the sink using the pavucontrol tool on the remote server.

RTP

  • Connect a local playback application to a remote sink

    An application connects to the local server via the “native” protocol and creates a playback stream. The local server creates a sink input for the stream. The local sink input is connected to the RTP source output via a null sink (because it’s not possible to connect a sink input to a source output directly). The local RTP source output is connected to the remote RTP sink input via RTP. The remote RTP sink input is connected to a sink.

    Configuration:

    1. Enable RTP receiver on the remote server using the paprefs tool or the following command:

      $ pactl load-module module-rtp-recv sap_address=<IP>
    2. Enable RTP sender on the local server using the paprefs tool or the following command:

      $ pactl load-module module-rtp-send destination_ip=<IP>
    3. Create the null sink on the local server:

      $ pactl load-module module-null-sink
    4. Connect the local sink input to the null sink using the pavucontrol tool on the local server.

    5. Connect the sink monitor of the null sink to the local RTP source output using the pavucontrol tool on the local server.

    6. Connect the remote RTP sink input to a sink using the pavucontrol tool on the remote server.

  • Connect a local playback application to a remote recording application

    The local application connects to the local server via the “native” protocol and creates a playback stream. The local server creates a sink input for the stream. The remote application connects to the remote server via the “native” protocol and creates a recording stream. The remote server creates a source output for the stream.

    The local sink input is connected to the RTP source output via a null sink (because it’s not possible to connect a sink input to a source output directly). The local RTP source output is connected to the remote RTP sink input via RTP. The remote RTP sink input is connected to the remote source output via a null sink (because it’s not possible to connect a sink input to a source output directly).

    Configuration:

    1. Enable RTP receiver on the remote server using the paprefs tool or the following command:

      $ pactl load-module module-rtp-recv sap_address=<IP>
    2. Enable RTP sender on the local server using the paprefs tool or the following command:

      $ pactl load-module module-rtp-send destination_ip=<IP>
    3. Create the null sink on the local server:

      $ pactl load-module module-null-sink
    4. Create the null sink on the remote server:

      $ pactl load-module module-null-sink
    5. Connect the local sink input to the null sink using the pavucontrol tool on the local server.

    6. Connect the sink monitor of the null sink to the local RTP source output using the pavucontrol tool on the local server.

    7. Connect the remote RTP sink input to the null sink using the pavucontrol tool on the remote server.

    8. Connect the remote source output to the sink monitor of the null sink using the pavucontrol tool on the remote server.

  • Connect a local sink to a remote sink

    The sink monitor of a local sink is connected to the local RTP source output. The local RTP source output is connected to the remote RTP sink input via RTP. The remote RTP sink input is connected to a sink.

    Configuration:

    1. Enable RTP receiver on the remote server using the paprefs tool or the following command:

      $ pactl load-module module-rtp-recv sap_address=<IP>
    2. Enable RTP sender on the local server using the paprefs tool or the following command:

      $ pactl load-module module-rtp-send destination_ip=<IP>
    3. Connect the sink monitor of a sink to the local RTP source output using the pavucontrol tool on the local server.

    4. Connect the remote RTP sink input to a sink using the pavucontrol tool on the remote server.


Example clients and modules

This section provides several examples of client applications and server modules. The source code and usage instructions are available on GitHub.

Also, some analysis of the client examples is available in this post.

Documentation

The following official documentation is available:

  • Overview:

    • Clients - application developer documentation
    • Developer - module developer documentation
  • API documentation:

  • Client examples:

  • Module examples:

D-Bus API

This example is quite straightforward, so just look at the code.

C API

These examples are described in detail in another blog post.

Modules

These examples have comments in the source code.

  • pa_module_source

    This module implements a source. The module runs a thread that takes care of the timing and periodically generates samples and writes them to the connected source outputs.

  • pa_module_source_output

    This module implements a source output. The module provides callbacks invoked when a source generates more samples.

  • pa_module_sink

    This module implements a sink. The module runs a thread that takes care of the timing and periodically requests samples from the connected sink inputs.

  • pa_module_sink_input

    This module implements a sink input. The module provides callbacks invoked when a sink needs more samples.



Critique

Finally, I’d like to discuss some problems in the PulseAudio design and implementation that I’ve gathered while writing this post.

We’ll discuss only problems that are essential but yet solvable and could be avoided while still providing the same functionality to the user.

We won’t discuss several kinds of issues:

  • The rationale for implementing or not implementing some features in a sound server. An ideal feature set of a sound server deserves a deeper analysis than I can provide here.

  • Fundamental costs of the provided features that are unavoidable. It’s up to the user to decide whether to pay them or not, depending on the requirements and available alternatives.

  • Bugs and limitations of the existing code that can be just fixed at some point. There is a bug tracker for such things.

Documentation

Comprehensive documentation is a starting point for detecting problems and improving things. PulseAudio has good documentation for its public interfaces and for troubleshooting. However, the following official documentation is partially or completely missing:

  • a detailed feature overview
  • a design overview
  • a detailed description of the key abstractions and their relations
  • a high-level description of the key algorithms
  • a high-level description of the protocols
  • detailed documentation for internal APIs
  • a rationale

Abstraction level

PulseAudio is built around the four fundamental object types: sources, sinks, source outputs, and sink inputs.

These object types are low-level enough. They operate with streams of samples. It’s up to the implementation how to handle the samples: write them to a sound card, send them over the network, perform sound processing or whatever else.

The good thing with this approach is the flexibility it gives. All of the numerous PulseAudio features are implemented in terms of the four objects. The problem, however, is that the features implemented on top of them are much higher level.

Because of this distance, modules often need to implement some intermediate layers. Similar modules need similar intermediate layers. However, the lack of the appropriate core abstractions causes problems with code reuse and consistency.

Here are some candidates of the high-level abstractions that are currently missing:

  • publishing and discovery

    PulseAudio implements Zeroconf publishing and discovery for PulseAudio servers, SAP/SDP publishing and discovery for RTP, Zeroconf discovery for AirPlay, Udev hotplugging for ALSA cards, and BlueZ hotplugging for Bluetooth devices.

    All of these mechanisms are implemented independently and are bound to a concrete type of transport or device. It’s not possible to reuse existing publishing and discovery alone. For example, it’s not possible to write a module that improves the RTP support without reimplementing SAP/SDP support or reuse the Zeroconf discovery for an alternative transport.

  • transport

    PulseAudio implements various transport protocols, including the “native” protocol, RTP, RAOP, HTTP, and several Bluetooth transports.

    Conceptually, a transport could implement just encoding and I/O. Bluetooth transports are close to this. RTP transport is implemented as a source output and sink input, which is a bit more complicated. The “native” protocol and RAOP transports are implemented as a source and sink, which is the most complicated.

    Every network transport is implemented from scratch. A network source or sink should start an IO thread, run the rtpoll loop, implement clocking, handle asynchronous events, etc. It’s not possible to reuse this code for a new transport.

  • protocol extension

    PulseAudio is an extensible server, and it supports several extensible protocols: the D-Bus API, the “native” protocol, and the RTP.

    The D-Bus API is a good example. It provides an abstraction of the protocol extension which may be registered by a module.

    Modules can also implement custom commands for the “native” protocol. However, the core does not provide an explicit abstraction of the “native” protocol extension. All modules that implement custom commands are hardcoded in the protocol implementation. It’s not possible to add a new extension without modifying the core.

    The RTP is designed to be an extremely extensible protocol. However, PulseAudio doesn’t support RTP extensions. It’s not possible to add support for a new payload type or add forward error correction support to the RTP transport.

  • filter

    PulseAudio supports several filters, like channel remapping and echo cancellation.

    All filter modules follow the same convention. They accept the same arguments and create a pair of virtual device and stream.

    Every filter module is implemented from scratch. It should parse the module arguments, start an IO thread, run the rtpoll loop, implement clocking, handle asynchronous events, etc. It’s not possible to reuse this code for a new filter.

    Although it’s possible to implement filters as LADSPA plugins, all filters available out of the box don’t use this possibility. And even though this approach reduces the overhead in the source code, it doesn’t reduce the overhead at runtime. Every plugin still gets its own source or sink, so we have an extra thread and latency.

  • priority list

    PulseAudio implements an extensible routing algorithm, which is spread across several modules.

    Every routing step is implemented in a separate module. These modules are quite isolated because the core doesn’t provide a generic routing abstraction. Every module just installs a hook that tries to route a stream in its own way if it wasn’t routed by another module yet. The routing rules are implemented independently in every module, which may lead to inconsistency.

    This problem is addressed by the PriorityRouting proposal which is currently not implemented.

  • scheduler

    PulseAudio implements an advanced timer-based scheduler for ALSA devices. The scheduler is implemented inside the ALSA source and sink.

    There are two problems. First, the implementation is duplicated in the source and in the sink. Second, the implementation is pretty complicated, and mixing it with the source or sink housekeeping makes things even more complicated. It would be much simpler to understand and improve it if it was a standalone component.

Mechanism and policy

Lack of the appropriate high-level abstractions leads to violation of the separation of mechanism and policy principle.

If a mechanism is not encapsulated by an abstraction that is generic enough, its implementation tends to be merged with the concrete policy or a set of policies. It makes it hard to modify, replace, or reuse mechanism and policy independently.

When this happens, and several modules need the same mechanism, two scenarios are possible:

  • The implementation of the mechanism may be reimplemented in every module, but a bit differently, fitted for the concrete policy that the module needs.

    In this case, we get increased code duplication. We also get an increased coupling between modules, because cooperating modules rely on the concrete policies implemented in other modules, instead of a generic mechanism implemented in core.

    This happened with network transports, sound processing filters, and routing modules.

  • The implementation of the mechanism together with the implementation of all necessary policies may be moved to the core.

    In this case, we get defective modularity. We also get an increased coupling between modules and core, because the modules rely on the concrete policies implemented in core, instead of a generic mechanism.

    This happened with the network protocols and protocol extensions. Actually, the modularity is only an illusion in this particular case, because the “native”, the “simple”, the CLI, the HTTP, and the ESound protocol modules are just thin wrappers that use the functionality implemented completely in the core.

Code quality

There are some common code quality problems that could be resolved by introducing stricter code style guidelines:

  • Complicated sharing of responsibilities between the “base” part of an object from the core and the “derived” part from a module, which recursively calls each other.

  • Complicated sharing of an object state between threads and non-obvious jumps from one thread to another. Different methods of the same object are called on different threads and use different subsets of the object fields.

  • Wide and coupled internal interfaces. Maybe it’s just me, but it took me about a week to figure out what’s going on in the ALSA Mixer and UCM related code.

  • Mixing low-level code like memory management and string manipulation with the high-level logic, which becomes hard to extract and understand.

  • Mixing generic utilities and domain-specific components in core. Utilities are boring and vast, and it would be helpful to separate them from the really important code.

  • Handmade serialization for the network protocols and restoration database, handmade formatters and parsers.

  • Custom implementation of the collections, event loops, threading primitives, and platform wrappers, instead of using a general purpose library like Glib, which is an acceptable dependency on the desktop.

  • Long function bodies, short variable names, reusing the same variable for several purposes, #ifdef madness in some modules.

Service quality

Two PulseAudio servers can be connected either using the “native” protocol or using RTP. Neither implementation is suited to unreliable networks like WiFi:

  • the “native” protocol is based on TCP, and packet losses cause playback delays

  • the implementation of RTP sender and receiver in PulseAudio doesn’t employ any RTP extensions for error correction or retransmission, and packet losses cause playback holes

This problem is addressed by the PulseAudio integration for the Roc project that I’m currently working on.

Usability

There are several sources of confusion for users:

  • Building transport chains is non-intuitive. The user just wants to select the source, destination, and transport. However, things are not that simple.

    On the one hand, different transports are implemented using different object types. For example, the “native” protocol uses tunnel sources and sinks, but the RTP uses RTP source outputs and sink inputs. On the other hand, only certain object types may be connected directly.

    Hence, in some cases, the user can just connect a device or a stream to a transport, while in other cases the user has to configure tricky adapters like the loopback and null sink (see the sketch after this list).

  • Building sound processing chains is non-intuitive. The user just wants to apply one or several filters to a stream or device. But things are not that simple.

    Every filter module creates a pair of a virtual device and stream. The user has to find both in the list of available devices and streams and connect the filter device to some stream and the filter stream to some device. When multiple filter chains are employed, the configuration becomes totally confusing.

  • Routing is non-intuitive. The user wants to select a device either for a stream or for a category of streams. Again, things are not that simple.

    There are two independent routing rule databases. The first one (module-stream-restore) is used when the user moves a single stream, but the server may move either a single stream or a category of streams, depending on the stream properties provided by the application.

    The second one (module-device-manager) is used when the user configures preferred devices for a category of streams. However, these routing rules may be overridden by conflicting rules from the first database. Only KDE provides a GUI for this second database.

  • The autospawn feature is just weird. The user wants to kill the server, but the server is magically restarted by any background client. By the way, if the user disables autospawn, the server will not be automatically restarted after a crash, which still happens from time to time.
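As a rough sketch of the transport-chain point above (the module names are real, but the exact arguments are written from memory and may need adjusting for your setup), sending local playback to the network over RTP currently means assembling the chain by hand:

pactl load-module module-null-sink sink_name=rtp_out
pactl load-module module-rtp-send source=rtp_out.monitor
# ...and then move the desired playback stream onto the rtp_out sink

By contrast, a “native” protocol tunnel is a single module (pactl load-module module-tunnel-sink server=remote-host), which shows how much the required steps differ from one transport to another.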

Final thoughts

This section lists some downsides, but there are upsides too:

  • With a few exceptions mentioned above, the module system is done well.

  • At the high level, the inter-thread communication is done well. Mutexes are rare, threads use event loops and exchange asynchronous messages.

  • The memory management is done well. It is based on pools and chunks and is flexible enough to support the zero-copy mode.

  • The function contracts are carefully covered with assertions.

  • There are plenty of advanced and well-engineered features, including the timer-based scheduler, buffer rewinding, clocking, and latency management. There is a lot to learn from the implementation.

  • Finally, PulseAudio just works in many cases.

Most of the problems listed in this section are not trivial but may be resolved with a thoughtful refactoring. In this regard, it’s worth mentioning some reasons to contribute to PulseAudio:

  • The project solves real problems that are complex and interesting, so it offers a challenge.

  • The project employs or implements many different technologies, from various device backends to sound processing tools and scheduling, so there is something to learn from.

  • The project is already used on many Linux desktops, so it’s practical.

Thanks for reading, and happy hacking!

Coding the history of Deep Learning


There are six snippets of code that made deep learning what it is today. This article covers the inventors and the background to their breakthroughs. Each story includes simple code samples on FloydHub and GitHub to play around with.

Source: Google press image

To run the code examples on FloydHub, make sure you have installed the floyd command line tool and cloned the code examples I've provided to your local machine. If you are new to FloydHub, you might want to first read the getting started with FloydHub section in my earlier post. Once you have initiated the CLI in the example project folder on your local machine, you can spin up the project on FloydHub with the following command:

floyd run --data emilwallner/datasets/mnist/1:mnist --tensorboard --mode jupyter

The Method of Least Squares

Deep learning all started with this snippet of math (which I've translated here into Python):

def compute_error_for_line_given_points(b, m, coordinates):
    totalError = 0
    for i in range(0, len(coordinates)):
        x = coordinates[i][0]
        y = coordinates[i][1]
        totalError += (y - (m * x + b)) ** 2
    return totalError / float(len(coordinates))

compute_error_for_line_given_points(1, 2, [[3, 6], [6, 9], [12, 18]])

This was first published by Adrien-Marie Legendre (1805, Legendre), a Parisian mathematician who was also known for measuring the meter. He had a particular obsession with predicting the future location of comets. Given a couple of past locations of comets, he searched relentlessly for a method to calculate their trajectory.

It really was one of those spaghetti-on-the-wall moments. He tried several methods, then one version finally stuck with him. Legendre’s process started by guessing the future location of a comet, then he squared the errors he made, and finally remade his guess to reduce the sum of the squared errors. This was the seed for linear regression.

Play with the above code in the Jupyter notebook I've provided to get a feel for it. m is the coefficient and b is the constant for your prediction, and the coordinates are the locations of the comet. The goal is to find a combination of m and b where the error is as small as possible.
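For example, here is a quick comparison of my own (not part of the original notebook): evaluating two candidate lines on the same points shows that the second line fits much better, because the returned error is smaller:

compute_error_for_line_given_points(1, 2, [[3, 6], [6, 9], [12, 18]])    # 22.0
compute_error_for_line_given_points(0, 1.5, [[3, 6], [6, 9], [12, 18]])  # 0.75, a much better fit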


This is the core of deep learning: taking an input and a desired output, and then searching for the correlation between the two.


Gradient Descent

Legendre’s method of manually trying to reduce the error rate was time-consuming. Peter Debye, a Nobel prize winner from The Netherlands, formalized a solution for this process a century later (1909, Debye).

Let’s imagine that Legendre had one parameter to worry about - we'll call it X. The Y axis represents the error value for each value of X. Legendre was searching for where X results in the lowest error. In this graphical representation, we can see that the value of X that minimizes the error Y is when X = 1.1.

[Image: cost function plot of error Y against parameter X]

Peter Debye noticed that the slope to the left of the minimum is negative, while it’s positive on the other side. Thus, if you know the value of the slope at any given X value, you can guide Y towards its minimum.

This led to the method of gradient descent. The principle is used in almost every deep learning model.

To play with this, let’s assume that the error function is Error = x^5 - 2x^3 - 2. To know the slope at any given X value we take its derivative, which is 5x^4 - 6x^2:


Watch Khan Academy’s video if you need to brush up your knowledge on derivatives.

Debye’s math translated into Python:

current_x = 0.5
learning_rate = 0.01
num_iterations = 60

def slope_at_given_x_value(x):
    return 5 * x**4 - 6 * x**2

for i in range(num_iterations):
    previous_x = current_x
    current_x += -learning_rate * slope_at_given_x_value(previous_x)
    print(previous_x)

print("The local minimum occurs at %f" % current_x)

The trick here is the learning_rate. By going in the opposite direction of the slope it approaches the minimum. Additionally, the closer it gets to the minimum, the smaller the slope gets. This reduces each step as the slope approaches zero.

num_iterations is your estimate of how many iterations are needed to reach the minimum. Play with the parameters to get an intuition for gradient descent.


Linear Regression

By combining the method of least squares and gradient descent you get linear regression. In the 1950s and 1960s, a group of experimental economists implemented versions of these ideas on early computers. The logic was implemented on physical punch cards - truly handmade software programs. It took several days to prepare these punch cards and up to 24 hours to run one regression analysis through the computer.

Here's a linear regression example translated into Python (so that you don't have to do it in punch cards):


wheat_and_bread = [[0.5, 5], [0.6, 5.5], [0.8, 6], [1.1, 6.8], [1.4, 7]]

def step_gradient(b_current, m_current, points, learningRate):
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i][0]
        y = points[i][1]
        b_gradient += -(2 / N) * (y - ((m_current * x) + b_current))
        m_gradient += -(2 / N) * x * (y - ((m_current * x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_m = m_current - (learningRate * m_gradient)
    return [new_b, new_m]

def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, points, learning_rate)
    return [b, m]

gradient_descent_runner(wheat_and_bread, 1, 1, 0.01, 100)

This should not introduce anything new, per se. However, it can be a bit mind-boggling to merge the error function with gradient descent. Run the code and play around with this linear regression simulator.


The Perceptron

Enter Frank Rosenblatt - the guy who dissected rat brains during the day and searched for signs of extraterrestrial life at night. In 1958, he hit the front page of the New York Times: “New Navy Device Learns By Doing” with a machine that mimics a neuron (1958, Rosenblatt).

If you showed Rosenblatt's machine 50 sets of two images, one with a mark to the left and the other on the right, it could make the distinction without being pre-programmed. The public got carried away with the possibilities of a true learning machine.

[Image: diagram of the perceptron]

For every training cycle, you start with the input data to the left. Each input is multiplied by an initial random weight, and the weighted inputs are then summed up. If the sum is negative, it’s translated into 0; otherwise, it’s mapped into a 1.

If the prediction is correct, then nothing happens to the weights in that cycle. If it’s wrong, you multiply the error with a learning rate. This adjusts the weights accordingly.

Let’s run the perceptron with the classic OR logic.

Input    Output
0 0   =  0
0 1   =  1
1 0   =  1
1 1   =  1

The perceptron machine translated into Python:

from random import choice
from numpy import array, dot, random

# Maps negative sums to 0 and non-negative sums to 1 (a step function)
one_or_zero = lambda x: 0 if x < 0 else 1

training_data = [(array([0, 0, 1]), 0), (array([0, 1, 1]), 1), (array([1, 0, 1]), 1), (array([1, 1, 1]), 1)]
weights = random.rand(3)
errors = []
learning_rate = 0.2
num_iterations = 100

for i in range(num_iterations):
    input, truth = choice(training_data)
    result = dot(weights, input)
    error = truth - one_or_zero(result)
    errors.append(error)
    weights += learning_rate * error * input

for x, _ in training_data:
    result = dot(x, weights)
    print("{}: {} -> {}".format(x[:2], result, one_or_zero(result)))

One year after the initial hype, Marvin Minsky and Seymour Papert destroyed the idea (1969, Minsky & Papert). At the time, Minsky and Papert ran the AI lab at MIT. They wrote a book proving that the perceptron could only solve linear problems. They also debunked claims about the multi-layer perceptron. Sadly, Frank Rosenblatt died in a boat accident two years later.

Just a year after the Minsky and Papert book was released, a Finnish master's student published the theory for solving non-linear problems with multi-layered perceptrons (Linnainmaa, 1970). Because of the mainstream criticism of the perceptron, AI funding dried up for more than a decade. This became known as the first AI winter.

The power of Minsky and Papert’s critique was the XOR problem. The logic is the same as the OR logic with one exception - when you have two true statements (1 & 1), you return False (0).

The XOR problem

In the OR logic, it’s possible to divide the true combinations from the false ones. But as you can see, you can’t divide the XOR logic with one linear function.
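To see this concretely, here is a small sketch of my own (not from Minsky and Papert, and not in the original notebooks) that reuses the perceptron code above but swaps in the XOR truth table. However long it trains, a single layer of weights always gets at least one of the four cases wrong:

from random import choice
from numpy import array, dot, random

one_or_zero = lambda x: 0 if x < 0 else 1

# Same inputs as the OR example, but 1 & 1 now maps to 0
xor_data = [(array([0, 0, 1]), 0), (array([0, 1, 1]), 1), (array([1, 0, 1]), 1), (array([1, 1, 1]), 0)]
weights = random.rand(3)

for i in range(10000):
    x, truth = choice(xor_data)
    error = truth - one_or_zero(dot(weights, x))
    weights += 0.2 * error * x  # same update rule as the OR perceptron

# At least one of the four cases will always come out wrong
for x, truth in xor_data:
    print("{} -> {} (expected {})".format(x[:2], one_or_zero(dot(weights, x)), truth))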


Artificial Neural Networks

By 1986, several experiments proved that neural networks could solve complex nonlinear problems (Rumelhart et al., 1986). At the time, computers were 10,000 times faster compared to when the theory was developed. This is how Rumelhart et al. introduced their legendary paper:

We describe a new learning procedure, back-propagation, for networks of neuron-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal ‘hidden’ units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure. (Nature 323, 533 - 536, 09 October 1986)

To understand the core of this paper, we’ll code the implementation by DeepMind’s Andrew Trask. This is not a random snippet of code. It’s been used in Andrej Karpathy’s deep learning course at Stanford, and Siraj Raval’s Udacity course. On top of that, it solves the XOR problem, thawing the first AI winter.

1 - 8 are the neurons and 1.1 - 7.1 are the synapses/weights.

Before we dig into the code, play with this simulator for one to two hours to grasp the core logic. Then read Trask’s blog post, and then read it four more times. Note that the added parameter [1] in the X_XOR data is a bias neuron; it has the same behavior as a constant in a linear function.

import numpy as np

X_XOR = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y_truth = np.array([[0], [1], [1], [0]])

np.random.seed(1)
syn_0 = 2 * np.random.random((3, 4)) - 1
syn_1 = 2 * np.random.random((4, 1)) - 1

def sigmoid(x):
    output = 1 / (1 + np.exp(-x))
    return output

def sigmoid_output_to_derivative(output):
    return output * (1 - output)

for j in range(60000):
    layer_1 = sigmoid(np.dot(X_XOR, syn_0))
    layer_2 = sigmoid(np.dot(layer_1, syn_1))
    error = layer_2 - y_truth
    layer_2_delta = error * sigmoid_output_to_derivative(layer_2)
    layer_1_error = layer_2_delta.dot(syn_1.T)
    layer_1_delta = layer_1_error * sigmoid_output_to_derivative(layer_1)
    syn_1 -= layer_1.T.dot(layer_2_delta)
    syn_0 -= X_XOR.T.dot(layer_1_delta)

print("Output After Training: \n", layer_2)

Backpropagation, matrix multiplication, and gradient descent combined can be hard to wrap your mind around. The visualizations of this process are often simplifications of what’s going on under the hood. Focus on understanding the logic behind it, but don’t worry too much about having a mental picture of it.

Also, look at Andrej Karpathy’s lecture on backpropagation, play with these visualizations, and read Michael Nielsen’s chapter on it.


Deep Neural Networks

Deep neural networks are neural networks with more than one layer between the input and output layers. The notion was introduced by Rina Dechter (Dechter, 1986), but gained mainstream attention in 2012, soon after IBM Watson's Jeopardy victory and Google’s cat recognizer.

The core structure of deep neural networks has stayed the same, but they are now applied to several different problems. There have also been a lot of improvements in regularization. Originally, regularization was a set of math functions to simplify noisy earth data (Tikhonov, A. N., 1963). These functions are now used in neural networks to improve their ability to generalize.

A large share of the innovation is due to computing power. This improved researchers’ innovation cycles - what took a supercomputer one year to calculate in the mid-eighties takes half a second with today’s GPU technology.

The reduced cost in computing and the development of deep learning libraries have now made it accessible to the general public. Let’s look at an example of a common deep learning stack, starting from the bottom layer:

  • GPU > Nvidia Tesla K80. The hardware commonly used for graphics processing. Compared to CPUs, they are on average 50-200 times faster for deep learning.
  • CUDA > Low-level programming language for the GPUs
  • CuDNN > Nvidia’s library to optimize CUDA
  • Tensorflow > Google’s deep learning framework on top of CuDNN
  • TFlearn > A front-end framework for Tensorflow

Let’s have a look at the MNIST image classification of digits, the “Hello World” of deep learning.

[Image: sample MNIST digits]

Implemented in TFlearn:

from __future__ import division, print_function, absolute_import
import tflearn
from tflearn.layers.core import dropout, fully_connected
from tensorflow.examples.tutorials.mnist import input_data
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression


mnist = input_data.read_data_sets("/data/", one_hot=True)
X, Y, testX, testY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels
X = X.reshape([-1,28,28,1])
testX = testX.reshape([-1,28,28,1])


network = tflearn.input_data(shape=[None,28,28,1], name='input')
network = conv_2d(network,32,3, activation='relu', regularizer="L2")
network = max_pool_2d(network,2)
network = local_response_normalization(network)
network = conv_2d(network,64,3, activation='relu', regularizer="L2")
network = max_pool_2d(network,2)
network = local_response_normalization(network)
network = fully_connected(network,128, activation='tanh')
network = dropout(network,0.8)
network = fully_connected(network,256, activation='tanh')
network = dropout(network,0.8)
network = fully_connected(network,10, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.01,
                        loss='categorical_crossentropy', name='target')


model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit({'input': X},{'target': Y}, n_epoch=20,
            validation_set=({'input': testX},{'target': testY}),
            snapshot_step=100, show_metric=True, run_id='convnet_mnist')

There are plenty of great articles explaining the MNIST problem elsewhere. If you want a high-level understanding of TFlearn, look at my previous post.

Let’s sum it up

As you see in the TFlearn example, the main logic of deep learning is still similar to Rosenblatt’s perceptron. Instead of using a binary Heaviside step function, today’s networks mostly use ReLU activations. In the last layer of the convolutional neural network, the loss equals categorical_crossentropy. This is an evolution of Legendre’s least squares, a logistic regression for multiple categories. The optimizer adam originates from the work of Debye’s gradient descent. Tikhonov’s regularization notion is widely implemented in the form of dropout layers and regularization functions, L1/L2.
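As a rough illustration of that first point (my own sketch, not from the post's notebooks), here is the binary step function Rosenblatt used next to the ReLU that most modern networks prefer:

import numpy as np

def heaviside(x):
    # Rosenblatt-style step: negative sums become 0, everything else becomes 1
    return np.where(x < 0, 0, 1)

def relu(x):
    # Modern default: negative values are clipped to 0, positive values pass through unchanged
    return np.maximum(0, x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(heaviside(z))  # [0 0 1 1 1]
print(relu(z))       # [0.  0.  0.  0.5 2. ]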

If you want a better intuition for neural networks and how to implement them, read my previous post on the FloydHub blog: My First Weekend of Deep Learning.

Thanks to Ignacio Tonoli, Brian Young, Paal Ringstad, Tomas Moska, and Charlie Harrington for reading drafts of this. Code sources are included in the Jupyter notebooks.


About Emil Wallner

This is the second part in a multi-part blog series from Emil as he learns deep learning. Emil has spent a decade exploring human learning. He's worked for Oxford's business school, invested in education startups, and built an education technology business. Last year, he enrolled at Ecole 42 to apply his knowledge of human learning to machine learning.

You can follow along with Emil on Twitter and Medium.

We're always looking for more guests to write interesting blog posts about deep learning. Let us know on Twitter if you're interested.

Ask HN: Sell my startup for $14M because I can't raise $2M?

I founded a vertical-focused enterprise startup. We signed a few huge customers in the space on a build-and-beta basis, and that led to industry awards and a full pipeline of leads and signed contracts for installation (SaaS pricing with a small implementation fee).

We've done it with a ridiculously small team, however. And it's become impossible to handle the dev and deployment, support and training, security reviews, feature requests, etc. So we are looking at options.

We've been approached by a few large companies in the industry regarding acquisition. It looks like we could sell for $12-14M. Kind of exciting, but also well under the value that's possible (which would probably be something north of $400M in our industry alone, with the possibility to move laterally).

The alternative is to raise money. We've put together a solid deck and model (I think), but the local venture capital scene is not great (think $1M+ ARR).

We think this is a product ripe for SV firms, but those critical "warm introductions" are elusive. We just don't have the network. Only 4% of my cold emails have even been opened (yes, we track it).

It's crazy to me that our industry is jumping up and down for our product, we've got offers to buy the company, but we can't raise a seed. I've failed hard here. I'm open for advice.

Is Rotten Tomatoes Killing the Movie Industry? No, Bad Movies Are


Take this simple quiz to see if you are qualified to run a movie studio.

Question: The terrible movie The Mummy — starring Tom Cruise — gets a terrible 16 percent rating on movie-review-aggregation site Rotten Tomatoes and wilts at the box office.

This failure is the result of:

  • A: A bad idea, cynically conceived and poorly executed.
  • B: Rotten Tomatoes.

If you answered B, you could be a studio executive. Judging, at least, by a recent New York Times article in which studio head honchos lay the blame for several months of horrific box office numbers — the worst in 20 years — on the website, which aggregates reviews from thousands of critics, excerpts a few hundred, and assigns a “rotten” rating to movies that fail to earn at least 60 percent positive notices on its popular Tomatometer.

The site is growing rapidly, reportedly getting nearly 14 million unique visitors in May, an increase of 35 percent over the previous year. Its parent, Fandango, attaches the Tomatometer to its ticket-buying interface. All of this has helped make Rotten Tomatoes a go-to consumer service for folks who want a quick read on whether a movie might be worth their time.

Studio executives, according to the Times, fear its growing popularity gives it an inordinate commercial influence. But they apparently don’t believe this strongly enough to say so on record.

The article cites a stale, oft-repeated quote from director Brett Ratner, saying the website is hastening the “destruction” of the movie business. (Coincidentally, he directed Rush Hour 3, which has an 18 percent rating on Rotten Tomatoes, yet it made $140 million, seemingly disproving his theory.) Despite the article’s assertion that Ratner’s opinion is “echoed daily” by studio execs, none would echo it publicly, reportedly for fear of giving the site even more legitimacy.

I suspect there is another reason.

If you are the person who gave the green light to The Mummy (I gave it two stars), or King Arthur (two stars, 28 percent on Rotten Tomatoes), or CHIPS (two and a half stars, 15 percent on RT), or Baywatch (two and a half stars, 18 percent on RT), or Transformers: The Last Knight (two stars, 15 percent on RT), you’re going to look like a fraud blaming the failure of your awful movie on a website that simply aggregates reviews.

And, let’s be clear: It was an awful summer for movies.

Photo: Frank Masi / Paramount Pictures

“BAYWATCH”: (L-R) Dwayne Johnson as Mitch Buchannon and Zac Efron as Matt Brody.

And the site actually does more than that. Rotten Tomatoes “curators” read each review and assign to each movie a thumbs-up or thumbs-down — translated Fresh or Rotten in Rotten Tomatoes parlance — something a lot of critics don’t do in such black-and-white terms. These curators are actually quite fastidious. They contact on-the-fence critics to affirm or clarify opinions — I know this is true because they’ve twice contacted me to affirm a Fresh or Rotten rating on Ingrid Goes West (I said Fresh) and Rough Night (I went Rotten).

Is there something Orwellian and reductive about the way a (potentially) complex entity like a movie can be reduced to a green splat or red tomato? Yes. But there is often something appropriate about it, too. If movies are to be made and marketed like mass merchandise — The Mummy was literally conceived to be the beginning of a franchise/mythological universe — they can be Yelped like it. (A more pertinent question — why are “rotten” tomatoes icons green? Isn’t that an unripe tomato?).

And if you wanted to, you could use Rotten Tomatoes in a fairly sophisticated way. You start with the Tomatometer, but you can click through to individual reviews to get a broader consideration of a movie. You can customize your feed to consult critics you prefer.

Or you can go to a different site. Consumers have their choice of aggregators, just as they have their choice of movies. One is Metacritic, which aims to be more refined and discriminating than Rotten Tomatoes. As part of its methodology, Metacritic actually reviews reviewers — starting with a much smaller, more seasoned, and (it claims) more prestigious base of critics. Within this group, it assigns increased weight to critics whose opinions are deemed most worthwhile. (This is Metacritic’s secret sauce, and it does not disclose ingredients.)

Though most fans prefer the unruly democracy of Rotten Tomatoes, its handy mobile interface, and its incredible search engine optimization (Rotten Tomatoes comes up in most searches about movies), most cineasts and stats experts who’ve studied the Rotten Tomatoes and Metacritic algorithms recommend Metacritic (it has a better “Gaussian distribution,” says one stats guy) for serious film fans.

And analysts prefer both sites to the user polls you can find on IMDB and Fandango. A mathematician who examined the Fandango user poll concludes that Fandango “likes too many movies.” I’ve noticed that the user polls are also probably gamed — the scores are high until the movie actually opens, then they drop, often precipitously. A suspicious person would conclude that studios are ballot-stuffing the user base to give their movies a head start.

By the way, that never works.

Because there remains a constant in the movie business, and in the culture of moviegoing: What sells a movie over the long haul is old-fashioned word of mouth. Marketing can get an audience in theaters on opening weekend. Critics can help or hurt in the margin. And Rotten Tomatoes can aggregate those critics.

But even a cursory look at recent Rotten Tomatoes ratings shows a tenuous relationship between the site’s ratings and actual box office. It has assigned a rotten rating to movies that have done well (The Hitman’s Bodyguard, The Emoji Movie), and it has affixed an overwhelmingly positive rating to a film that fared poorly (Logan Lucky).

Once a movie opens, it’s people talking to other people (now accelerated by social media) that puts/keeps butts in seats.

That’s great news for great movies.

And bad news for  The Mummy.


Bitcoin is fiat money, too


FINANCIERS with PhDs like to remind each other to “read your Kindleberger”. The rare academic who could speak fluently to bureaucrats and normal people, Charles Kindleberger designed the Marshall Plan and wrote vast economic histories worthy of Tolstoy. “Read your Kindleberger” is just a coded way of saying “don’t forget this has all happened before”. So to anyone invested in, mining or building applications for distributed ledger money such as bitcoin or ethereum: read your Kindleberger.

Start with A Financial History of Western Europe, in which Kindleberger documents how many times merchants in different centuries figured out clever ways of doing the exact same thing. They made transactions easier, and in the process created new deposits and bills that increased the supply of money. In most cases, the Bürgermeister or the king left these innovations in place, but decided to control the supply of money and credit themselves. It is good for the king to be in charge of his own creditors. But also, it has always been tempting for private finance to create too much money. There is no evidence that money born on a distributed ledger will be clean of this sin.

Distributed ledgers, which borrow private computers from around the world to update the same list of accounts, address one ancient challenge of finance: how to make sure a transaction between two people located far apart is credible to both. Other innovations have done the same. In 1773 banks in England went in on a clearinghouse in London, for example, an improvement on the system of managing separate ledgers with each bank. The banks themselves took in gold coin—cumbersome to carry and verify—then created new money by offering more in loans than the gold they had on deposit. In 1776 Adam Smith described coins as an earth-bound highway, where bank money offered a “waggon-way through the air”. (Quote from Kindleberger. Read your Kindleberger.) Your blogger challenges any bitcoin operator to praise his own disruptions more loftily.

Over the next century the “currency school”, which wanted to lock down growth in money, argued with the “banking school”, which wanted ever more waggons in the air. We can’t look back now and say either side definitively won, because this argument cannot ever possibly be won. Creditors and business owners want the supply of money to grow slowly. Debtors and employees want it to grow faster. We might sooner tease out the true nature of the Holy Spirit. 

Bankers talk about “governance”, ways to ensure private banks and central bankers make sound decisions—so they create just enough money to make commerce easier, but not so much that the system collapses through inflation or panics. The developers behind distributed ledgers, however, often talk as if governance is something they are beyond. They are not. Computer code is just a set of rules. Code is governance. And it can change. Take bitcoin: if a supermajority of the computers running the bitcoin distributed ledger run an upgrade, the upgrade becomes the new code. But behind each computer is a human, making decisions. Distributed-ledger developers talk about a consensus-driven model, where you improve the system by bringing everyone on board. So do central bankers. 

And different humans have different interests. In bitcoin, the people who own the computers verifying transactions—the “miners”—want code that increases fees for miners. People who use bitcoin want code that keeps those fees low. These two sides could not agree, and so in August the bitcoin distributed ledger “forked”—a smaller group of developers created a copy with slightly different rules, called “bitcoin cash”. Everyone who owned one unit of bitcoin also suddenly owned one unit of bitcoin cash. Out of a governance dispute, new money. In mid-September bitcoin traded at about $3900, while bitcoin cash fetched only $500. 

Cointelegraph, a site for people who trade distributed-ledger currencies, also offers quotes and analysis on currencies called ethereum, ripple and litecoin. You could argue that markets are already deciding which new currencies provide sound money. And in doing so you would join the banking school of 19th-century England, or the people who loosened financial regulation in the late 1990s in America. Your blogger does not believe this argument is wrong, necessarily. But it is not new, and it has failed spectacularly in the past. Distributed ledgers are useful technology, just like banks. As they become a larger part of finance, the temptation to abuse them will be just as great. History instructs that no governance is perfect, and humans are reliably awful.

This week Quartz, an online magazine, published an interview with Vitalik Buterin, the 23-year-old founder of ethereum. The currency survived a crisis of credibility after a bug was discovered in 2016, leaving him up all night communicating with ethereum’s users and gathering consensus for action—much like the world’s central bankers a decade ago. He seems thoughtful on the trade-offs of governance, but unaware that anyone has considered them before:

In the case of ethereum, if somehow 80% of ethereum’s users just ended up being cryptocurrency speculators, would we then have a social responsibility to start optimizing for that constituency, because that would end up being our constituency? That’s an interesting philosophical question.

Yes. It is. Read your Kindleberger. 

Introduction to WebRender


WebRender is a 2D renderer for the web. It started as Servo's graphics engine, and we are in the process of integrating it in Firefox.

I have been meaning for a while to write about what WebRender is, how it works and how its architecture is different from what Firefox and other browsers currently do. To do that, I first wanted to provide some context and look at what Firefox (and other browser engines) look like today, and it grew into a blog post of its own.

In this post we’ll go through a very high level and simplified overview of what Gecko’s graphics pipeline looks like. We’ll see that there are striking similarities with the architecture of other browsers, which is not to say that all browsers work the same way (the devil is always in the details), but some of these similarities are – in my humble opinion – interesting so I wanted to mention them.

Current web renderers

Broadly speaking, all major browser engines are designed around similar architectural concepts. In short:

  • The result of the layout computation is a tree of positioned elements in the page that we call the frame tree (some other engines call it the flow tree).
  • From the frame tree we generate a mostly flat list of drawing commands that we call the display list.
  • Portions of the display list are then painted into layers (which you can think of as the layers in a lot of image editing software). Painting is when the browser computes the color of actual pixels: going from information about what’s on the page to images that represent things on the page. This is traditionally done with immediate mode drawing libraries such as Cairo, Skia or Direct2D.
  • Layers are then combined together into one final image during the compositing phase.
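As a toy mental model of those four steps (a sketch of my own in Python, nothing like real browser code), the pipeline can be pictured like this:

# frame tree -> display list -> layers -> composited frame (toy model only)
frame_tree = [
    {"name": "background", "paint_order": 0, "layer": "page"},
    {"name": "article text", "paint_order": 1, "layer": "page"},
    {"name": "fixed header", "paint_order": 2, "layer": "chrome"},
]

# The display list is a flat list of drawing commands in paint order.
display_list = sorted(frame_tree, key=lambda item: item["paint_order"])

# Painting turns each layer's portion of the display list into pixels.
layers = {}
for item in display_list:
    layers.setdefault(item["layer"], []).append("paint " + item["name"])

# Compositing combines the painted layers into the final frame; scrolling
# only repositions a layer instead of repainting its contents.
final_frame = [cmd for name in ("page", "chrome") for cmd in layers[name]]
print(final_frame)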

In Firefox, layers are painted on the content process while compositing is performed on the compositor process.

Here is a rough (and over-simplified) sketch of Gecko’s graphics pipeline, from the DOM to the screen:

Gecko graphics pipeline

Notable differences and similarities between browser engines

This is the general idea but there are of course variations between browsers:

Display list

Some browsers skip the display list and instead paint layers directly off of the frame tree. That used to be the case for Chromium, for example, although I think that they have been moving towards a display list-like approach as part of their “slimming paint” project.

I find display lists handy because:

  • Respecting the painting order of elements while traversing the frame tree is hard and being able to sort the display items helps. Generating the layer tree from the sorted display list is a lot easier than doing it from the frame tree because of complicated interactions between the rules of stacking contexts and z ordering in the CSS specification (If you want to know more about this, look up the deliciously dreadful name the Chromium folks have given to this problem: the “fundamental compositing bug”).
  • A display list is a convenient data structure to compute invalidation (figuring out the smallest region of pixels that need to be painted when something changes). Or at least it has worked very well in Firefox and Chromium’s invalidation design (also part of the slimming paint project) looks very similar.

Compositing

Some browsers have their own compositor (Firefox, Chromium), while others (Edge, Safari) more closely integrate with the OS’s compositing window manager and delegate compositing layers to the latter. Nonetheless, all of these browsers have a notion of retained layers.
(Chromium appears to be moving towards using DirectComposition on Windows, which indicates they will delegate at least some of the compositing on Windows at some point.)

One thing that all major browsers have in common is the separation between painting and compositing. Rather than paint everything directly into the window, browsers paint into these intermediate surfaces that we call layers.

compositor

We have this separation because while painting can be expensive and sometimes hard to do at 60 frames per second, compositing is a relatively simple operation and fairly easy to run on the GPU. Browsers can hope to composite at a solid 60 frames per second and perform painting at a lower frequency if it can’t be done at the full frame rate.
With a compositor, scrolling is only a matter of moving a layer, and various other effects and animations can also be performed by the compositor.
Painting and compositing at different frequencies makes it possible for some of the most important user interactions to stay responsive and smooth even if a page is too complex to fully paint at a high enough frequency. It also prevents long paint times or javascript execution from affecting video playback.

Closing note

We just had a very high-level overview of how Gecko renders web content today and we saw that most browsers have some similarities in their overall architecture. Time to have a look at this from a more historical point of view.

Browser rendering engines were initially designed quite a while back, when computers did not necessarily have a lot of cores or a GPU and websites were pretty simple. Computer hardware evolved, the web which was initially a platform to present mostly static documents turned into a real interactive application platform, and as a result browser engines evolved as well.

An example of such evolution is the separation of painting and compositing (which hasn’t always been there). We already mentioned how compositing is an appealing approach when there is a lot of scrolling involved.

With a combination of web content becoming more demanding, computers getting more cores, the decline of Moore’s law and people getting used to browsing the web at 60 frames per second, it later became necessary to move compositing to a separate thread to ensure long paint times or JS execution would not cause the compositor to miss frames. These days we are also moving painting itself off of the main thread where JS and layout are performed.

These are very welcome incremental evolutions that are making browsing the web a lot nicer. Overall these evolutions have been mostly about keeping the same drawing model, taking apart and moving pieces in several threads.

What if we designed and built a browser engine from scratch today? Some elements of the rendering pipeline would remain similar while some would certainly be done differently. One of the elements common to most browsers that in my opinion is most showing its age is the way we have been doing painting, and this will be the topic of the next post in this series.

Post scriptum: Want to know more about how web browsers work? Have a look at the Gecko overview wiki page.


Gitlab 10.0 released


From the formulation of an idea to executing and monitoring it in production, DevOps establishes a culture and environment where developing, testing, and releasing software can happen quickly, frequently, and more reliably.

GitLab 10.0 delivers a hands-free DevOps environment with the introduction of Auto DevOps, allowing your team to easily configure and adopt modern development practices in your workflow. Not only that, there's new navigation and a new way of collaborating across groups.

With every monthly release of GitLab, we introduce new capabilities and improve our existing features. GitLab 10.0 is no exception and includes numerous new additions, such as the ability to automatically resolve outdated merge request discussions, improvements to subgroups, and an API for Wiki thanks to a contribution from our open source community.

GitLab's powerful issue management capabilities keep getting better with every release. Filtering and searching issues across groups has been vastly improved, our updated UX makes moving issues easier to discover and can be automated through quick action commands. GitLab Enterprise Edition Premium customers using JIRA can now see commits and branches in JIRA's development panel.

Security and performance continue to improve. Administrators can now restrict SSH keys by technology and key length. LDAP Group Sync can be automated through our API and can now lock down External Users at point of login as well. Performance continues to get faster, improving page loading speeds, speeding up project creation and commits, and reducing memory usage.

Auto DevOps brings DevOps best practices to your project by automatically configuring your build, test, code quality assurance, review apps, deployment, and monitoring in a single environment. In GitLab 10.0, we have introduced out-of-the-box templates to quickly set up an end-to-end DevOps lifecycle, built on top of GitLab CI/CD.

As it stands, GitLab offers a single environment where a code change can not only initiate a build, but deploy a Review App to preview your changes from within each merge request. During the review process GitLab’s recently introduced ability to measure Code Quality ensures changes improve the overall quality of your software.

After code review, GitLab’s deployment capabilities easily allow you to deploy to canary or production environments, as well as using GitLab Auto Deploy to deploy straight to Google Cloud. Post-deployment metrics with GitLab Auto Monitoring provide response and system metrics to make sure newly deployed code is performant.

Now, GitLab 10.0 brings this entire lifecycle together in an automated way, allowing you to go from idea to production in the blink of an eye with GitLab Auto DevOps.

Auto DevOps automatically detects, builds, tests, deploys, and monitors applications. Leveraging Herokuish, it supports all languages and frameworks available through Heroku buildpacks, such as Ruby, Rails, Node, PHP, Python, and Java, as well as the ability to customize your own buildpacks. Read the quick start guide to begin right now.

Auto DevOps is currently in Beta and is not recommended for production use just yet.

Note: GitLab is not affiliated with Heroku or Glider Labs.

Read through the documentation on Auto DevOps

Deprecations

PostgreSQL 9.2 Support

With the release of GitLab 10.0 on September 22nd, support for Postgres 9.2 will end and it will be removed from the Omnibus GitLab package. For deployments using packaged Postgres, upgrading to GitLab 10.0 requires the database to already be running version 9.6.

If you are upgrading from at least GitLab 9.0, your database is already running version 9.6. If you are running a version older than 9.0, please upgrade your database to prepare for GitLab 10.

PostgreSQL 9.2 is end of life in September, five years after it was first released.

Due: September 22nd, 2017.

API V3

In GitLab 8.17 we announced the deprecation of API v3.

We are still seeing a high volume of traffic on GitLab.com using API v3 requests.

API v3 will be removed in GitLab 11 and we just wanted to ensure that developers were migrating to API v4. Please refer to our documentation that shows changes between the two API versions.

Due: GitLab 11.0

Koding Integration

The GitLab Koding integration is being removed as we continue to enhance GitLab’s in-line editing capabilities.

From GitLab 10.0 it will no longer be possible to activate Koding integration. Existing installations that are currently using Koding may continue to do so until GitLab 11.0 when this functionality will be completely removed.

Due: GitLab 11.0

TLSv1 no Longer Accepted by Default

GitLab 10 will no longer accept TLSv1 by default. If you would like to continue to accept TLSv1 connections, it can be added back to the list of supported protocols by editing the nginx['ssl_protocols'] field in gitlab.rb.
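For example, a minimal gitlab.rb change along these lines (the exact protocol list is up to you) keeps TLSv1 enabled alongside the newer versions; run gitlab-ctl reconfigure afterwards:

# /etc/gitlab/gitlab.rb
nginx['ssl_protocols'] = "TLSv1 TLSv1.1 TLSv1.2"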

Due: September 22nd, 2017.

GitLab Git HTTP Server Configuration Support Removed

Since GitLab 8.2, we have used gitlab-workhorse to process Git HTTP traffic. Earlier versions of GitLab used gitlab-git-http-server, and configuration entries for it have been ignored. With GitLab 10, we will be removing the code to recognize the long-deprecated configuration parameters for gitlab-git-http-server. In the event your gitlab.rb configuration file contains these entries, they should be removed or GitLab configuration will fail.

Due: September 22nd, 2017.

Private Tokens

Private tokens provide a mechanism to access the API with a token unique to your user account.

Personal Access Tokens provide more granular access to GitLab via the API and are recommended over using Private Tokens.

Private Token support will be removed in GitLab 10.2 as we feel they are less secure than Personal Access Tokens.

Due: September 22nd, 2017.

Custom SSH Client Configuration for the Git User

Currently, the Git user on a GitLab server can have custom SSH client configuration placed into ~git/.ssh/config. id_rsa and other configuration files are also picked up automatically.

This custom manual configuration is automatically picked up and used in a number of places. However, it’s insecure as there are no per-gitlab-user access controls on use of the key.

Due: September 22nd, 2018.

Drop Support of Legacy Git Storage Configuration

With the release of GitLab 9.0, we changed how to configure an alternate Git storage directory in order to support multiple directories. Backwards compatibility was maintained for the older formats to ease the upgrade process. In a future release of GitLab, we will no longer support the older configuration parameter, and users should modify their gitlab.rb to support the current git_data_dirs format.

For example if your gitlab.rb contains git_data_dirs({ "default" => "/var/opt/gitlab/git-data" }) it should be changed to git_data_dirs({ "default" => { "path" => "/var/opt/gitlab/git-data" } }).

Due: March 22nd, 2018.

Keyword 'types' in '.gitlab-ci.yml'

The types keyword, which was replaced by stages a long time ago, is deprecated and will be removed in a future release of GitLab.

Due: March 22nd, 2018.

Build Badges

Old badges paths for builds are now deprecated in favor of pipeline badges, and will be removed in a future release of GitLab.

Due: March 22nd, 2018.

Auto Deploy

Auto Deploy is now part of Auto DevOps, and no longer needs to have standalone templates. With the incorporation in Auto DevOps, it has been improved with Helm Charts support and persistent database instances, to make deployments even more usable for production.

Due: November 22nd, 2017.

Legacy Triggers

Triggers with the legacy label do not have an associated user and only have access to the current project. You are advised to take ownership of any legacy triggers.

Due: January 22nd, 2018.

Code Quality 'codeclimate' Job Name

In the first iteration of GitLab Code Quality, we hardcoded detection of the job by the name codeclimate. We now officially look for codequality jobs, even though the old codeclimate name still works.

Due: March 22nd, 2018.

Runner's 'docker-ssh' and 'docker-ssh+machine' executors

Two GitLab Runner executors – docker-ssh and docker-ssh+machine – are now deprecated. They will be removed in one of the upcoming releases.

You can read more about the decision in the issue.

Due: March 22nd, 2018.

We’re also deprecating all service-management-related commands (stop, start, restart, status, install, uninstall). They will be removed in one of the upcoming releases.

You can read more about the decision in the issue.

Due: March 22nd, 2018.

Upgrade barometer

To upgrade to GitLab 10.0 from the latest 9.5 version, no downtime is required. To upgrade without downtime, please consult the documentation on downtimeless upgrades.

If you are upgrading from a version prior to 9.5, migrations will take a significant amount of time based on the size of your database. For example on GitLab.com, the set of migrations would take over 24 hours. For larger deployments, we recommend upgrading to 9.5 first and allowing background migrations to run. Once completed, then continue and upgrade to GitLab 10.0.

You can check the status of background migrations by running this command from the Rails console: Sidekiq::Queue.new('background_migration').size

For this release we have migrations and post-deploy migrations.

GitLab.com migrations took approximately five minutes and post-deploy migrations accounted for a total of around 15 minutes.

GitLab Geo users, please consult the documentation on upgrading Geo.

Red Hat is pretty good at being Red Hat


Today I attended Red Hat’s European analyst day at the Tate Modern art gallery. Here are some thoughts based on what I heard.

So it turns out Red Hat is pretty good at being Red Hat. By that I mean Red Hat sticks to the knitting, carries water and chops wood, and generally just does a good job of packaging open source technology for enterprise adoption. It’s fashionable these days to decry open source – “it’s not a business”. Maybe not for you, but for Red Hat it sure is.

Enterprises trust Red Hat precisely because it makes open source boring. Exciting and cool, on the other hand, often means getting paged in the middle of the night. Enterprise people generally don’t like that kind of thing.

It’s been interesting to see the rise of OpenShift. What looked like a PaaS also-ran in versions one and two has emerged as a go-to Kubernetes distribution for enterprises. Red Hat now has more than 400 paying customers for OpenShift Container Platform. For an enterprise today looking to establish a container strategy, with a view towards multicloud and/or hybrid portability, Red Hat is becoming a default choice. At the very least it’s going to be one of the first names on a vendor selection list. There are tons of VC-funded startups in the Kubernetes ecosystem – the new hotness, Heptio, for example, recently took $25m – but there are very few that an enterprise is going to see as a relatively risk-free investment. For a container platform discussion you’ll talk to Docker, Red Hat and a couple of other vendors. Recently, Pivotal bowed to the inevitability of Kubernetes by delivering support for it in a BOSH world, in the shape of Kubo, co-developed with Google. Google is still very much a public cloud, rather than hybrid, player, though it is partnering with Nutanix for hybrid deployments. Mesosphere blinked, and is now supporting Kubernetes. Oracle is in, working with CoreOS, and has joined the Cloud Native Computing Foundation. IBM is all about Kubernetes. Amazon Web Services recently joined the CNCF, likely a harbinger of direct Kubernetes support. Microsoft is also building Kubernetes support into Azure. Red Hat’s platform story is pretty clear – unlike IBM and SAP it never made a bet on Cloud Foundry, so it could move directly to container native. That said, Red Hat is all too aware that it needs to avoid the fate of OpenStack, which grew explosively on the vendor side before interest subsided, leaving Red Hat with customer contracts to support.

So what about customers and prospects? Saying “we want containers” is one thing. Being ready to take advantage of them however is quite another. Always start with continuous integration and deployment, with automated testing and distributed version control, the basis of everything good in modern software development.

Pivotal Labs has carved out an excellent position for itself by helping companies learn how to develop software the cloud native way – with its prescriptive diet of exposed concrete, pair programming and automated testing.

Red Hat is now emulating Pivotal with its own “Dojo” style Open Innovation Labs, initially in Boston, London and Singapore. Red Hat doesn’t have Pivotal’s pedigree in organisational transformation for application development, but it plans to build capacity there. One of Red Hat’s customers at the event, Motability, was positive about the Open Innovation Lab experience in London. It said that it had also planned to carry out a training engagement with Thoughtworks, but after the Red Hat engagement Thoughtworks was surprised at how much progress the company had already made. Which is to say, Thoughtworks may have lost money to Red Hat there.

So OpenShift is in good shape. But Red Hat also has another solid enterprise play in the configuration-as-code/DevOps space, through its Ansible acquisition, competing with Chef and Puppet. Red Hat cleaned up its story by finally open sourcing Ansible Tower a couple of weeks back. Red Hat said today it has nearly 500 paying customers, but it is in the quiet period before its next financial quarter, so that number will shortly get a boost.

Red Hat remains an anomaly – it makes money in open source. It has new revenue streams opening up. It is well positioned to keep doing the basics, but also to have a conversation with the C-suite about transformation.

related –

So about this Dell digital transformation story. Pivots

What Kubernetes and Cloud9 tell us about the new industry – anyone but Amazon.

Amazon Web Services joins the Cloud Native Computing Foundation – what does it mean?

Innovate disrupt: Lab, Space, Cluster

disclosure: AWS, Docker, Google Cloud Platform, IBM, Pivotal, Red Hat and SAP are clients, but all opinions expressed in this analysis are RedMonk’s own.

How Logos are Remembered


10 iconic logos. 156 Americans. 80 hours of drawing from memory.

The logos of global corporations like Apple, Starbucks, and Foot Locker are designed to create instant brand associations in the minds of billions who see them every day. But how accurately can we remember the features and colors of these famous symbols?

To find out, we asked over 150 Americans to draw 10 famous logos from memory as accurately as they could. Based on more than 1,500 drawings created over a period of 80 hours, the results reveal that, far from being stamped perfectly in our collective memory, these ubiquitous emblems largely exist as fuzzy visions in our mind's eye. One in 5 people thinks the Foot Locker referee wears a hat (he doesn't), and nearly half of people believe the Starbucks mermaid does not wear a crown (she does). That only scratches the surface of what our study found out.

Choose from the logos below to jump to individual results, or read a summary of all findings here.



Apple

Apple logos

There are Apple devices in the pockets, on the wrists, and otherwise in the possession of around 600 million people across the world.1 With so many opportunities to see it each day, recalling Apple's elegantly simple logo should be a no-brainer, right? After all (and unlike Starbucks or Foot Locker), the clue to its design is in the name!

In fact, only 20 percent of people were able to draw the Apple logo almost perfectly. To qualify as nearly perfect, all key features (e.g., the bite, the floating leaf, and the overall shape and proportions) had to be combined accurately.

The most common mistake, made by nearly 1 in 3 people, was including a stalk, when in reality there isn't one. There is, however, a leaf, and although 15 percent drew it facing the wrong direction, three-quarters of people remembered to include it in one form or another.

The Apple logo as we know it today was designed by ad agency Regis McKenna in 1977.2 Art director Rob Janoff's challenge was to make the emblem more businesslike (it had previously shown Isaac Newton sitting under an apple tree) and, as Steve Jobs put it, "don't make it cute." The bite, which is the logo's most iconic feature, was included for scale, so the apple wouldn't be mistaken for a cherry.

Evolution of the Apple logo

Eighty-four percent of people remembered the bite, but over 1 in 5 mistakenly drew it on the left side instead of the right.

Interestingly, a smaller proportion of people in our experiment put the bite on the wrong side of the apple (22 percent) than the proportion who, in a now famous experiment in visual memory conducted in 1979, thought Abraham Lincoln faced left on a U.S. penny (50 percent)3, when in fact he faces right. So when it comes to saying which way Abe faces on a penny, our guesses are no better than the toss of a coin, but when asked which side of the apple has the bite, we're right almost 80 percent of the time.

Of 156 people, five (3 percent) drew the logo as rainbow-striped, which reflects how it looked between 1977 and 1998.4 Their average age was 42, compared to an average age of 34 across all participants. So it's possible they remembered the logo as it was when they were in their 20s.

Six percent colored the logo red (perhaps thinking of a regular red apple), while 72 percent correctly identified it as black or gray.

If Apple's logo was rebranded the way it was remembered


Adidas

Adidas logos

Adidas, the second largest sportswear company in the world, acquired its three-stripes logo in 1952 from footwear brand Karhu Sports for two bottles of whiskey and the equivalent of $2,000.5 Sixty-five years later, the iconic stripes are featured on the clothing of hundreds of professional athletes and hundreds of millions of consumers.

Evolution of the Adidas logo

Despite its ubiquity, only 12 percent of people in our study created near perfect renditions of the Adidas logo from memory. That meant recalling it has three stripes (not four or more, as 11 percent of participants believed), that the word "adidas" is written entirely in lowercase (21 percent used a capital "A"), and that the logo does not, as one participant imagined, consist of a brown boot facing left.

Roughly 1 in 10 people drew the Adidas trefoil logo, which was introduced in 1971.5 Its three leaves are intended to represent the landmasses of the Americas, Europe, Africa, and Asia, and the three lines connecting them signify diversity. The trefoil logo is still used on some of the company’s products, particularly the Adidas Originals line.

The Adidas logo is usually black, but 8 percent of people included blue in their drawings, which, while not the logo's primary color, is used extensively in Adidas' packaging, especially on their blue footwear boxes.

If Adidas' logo was rebranded the way it was remembered


Burger King

Burger King logos

In contrast to the minimalistic and monochromatic Apple and Adidas icons, Burger King's logo is slightly more complex, consisting of three distinct features (text, bun halves, and a crescent shape) and three colors (red, yellow, and blue). As a result, we expected the accuracy of our participants' recollections to drop. However, the additional colors didn't seem to put any extra strain on their memories.

Eighteen percent of people were able to recall Burger King's logo almost perfectly, compared to 20 percent for Apple and 12 percent for Adidas.

One of the most interesting results of our experiment is that people often mistakenly remember elements of a brand's wider advertising as being part of its logo. Twenty-one percent of people included a crown, despite Burger King’s logo not featuring one since its “sitting king” logo, which was in use from 1957 to 1969.6

Evolution of the Burger King logo

It seems unlikely that our participants, with an average age of 34, recalled a defunct logo from 48 years ago instead of the current version, which has been around since 1999. It's more likely that they couldn't fully remember what the logo looks like, so they included a crown in response to the word "King" in the brand's name, the King character from its ads, and the paper crowns that are given out in restaurants.

It's less clear why over 1 in 5 people drew Burger King logos that look almost identical to the version used between 1969 and 1999, which consisted of red text and orange bun halves, but no blue crescent or "axis tilt." Our theory is that some of these participants probably did remember the old logo, while others tried to draw it in its current form but forgot a couple of key features.

If Burger King's logo was rebranded the way it was remembered


Domino's

Domino's logos

The three dots featured on the original Domino's Pizza logo represented the first three stores owned by founders Tom and James Monaghan in the 1960s. The plan was to add a dot for each additional store that opened, but rapid growth quickly made the idea impractical, so the three dots were left untouched.7

Evolution of the Domino's logo

Twenty-eight percent of people recalled that the Domino's logo has three dots and positioned them correctly (two in the bottom square, one in the top). Thirty-seven percent included more than three dots, while 14 percent forgot them altogether.

Overall, 16 percent drew near perfect Domino's logos, and 28 percent made good attempts, meaning their versions closely matched the actual logo except for a few minor mistakes.

We saw above that a large proportion of people appeared to draw an older version of the Burger King logo, but it wasn't clear whether they actually meant to. In Domino's case, it was more clear-cut.

Fifteen percent of people drew what was unmistakably the square, tilted Domino's logo used between 1996 and 2012. Sampling the colors used in the 156 drawings also shows that the average remembered shade of blue is closer to the royal blue used in the older Domino's logo than the sky blue utilized today.

Two-thirds of our participants included the brand name in their drawings, although not always with perfect accuracy: 55 percent forgot to include the apostrophe, and 11 percent included an "e": Dominoe's.

If Domino's logo was rebranded the way it was remembered


7-Eleven

7-Eleven logos

The 7-Eleven logo hasn't changed much since it first appeared in 1946. It is characterized by a numeral seven that is intersected by the word "Eleven" and set inside a white trapezoid against a green background. The only major change came in 1969 when the top part of the "7" was made orange while the bottom remained red.8

Evolution of the 7-Eleven logo

Nineteen percent of people drew near perfect 7-Eleven logos, and 46 percent closely matched the actual logo except for some minor flaws. Two of the most common mistakes were writing "Eleven" as a numeral (31 percent), and positioning the word "Eleven" underneath the "7" instead of across its middle (56 percent).

One of the most subtle features of the 7-Eleven logo is that "eleven" is written "ELEVEn" – in uppercase except for the letter "n." Only two in 156 people (around 1 percent) correctly remembered this minor detail. We wondered if they were 7-Eleven employees, but further investigation showed that, while one does work in wholesale and retail, both said they only engage with the 7-Eleven brand "very rarely" or "occasionally."

It seems that, despite remaining almost completely unchanged for nearly 50 years, the 7-Eleven logo has too many elements that can be mixed up or omitted in people's minds, as it was only the fourth most accurately remembered logo out of the 10.

If 7-Eleven's logo was rebranded the way it was remembered


Starbucks

Starbucks logos

Since its formation in 1971, Starbucks has used three logos, each showing a different rendition of a twin-tailed mermaid (or siren, as she's known in Greek mythology). In the first version of the logo, the mermaid was topless, but in 1987 the logo was simplified by covering her breasts with flowing hair and switching the main color from brown to green.

Evolution of the Starbucks logos

Starbucks' current logo, introduced in 2011, is a streamlined version of the two-tailed siren. It no longer features the "Starbucks Coffee" text and is pure green, as opposed to green and black. Despite this simplification, only 6 percent of people drew a near perfect Starbucks logo from memory.

There's no doubt that the Starbucks mermaid is extremely memorable (90 percent of people included her), but she's complex enough for very few people to recall the details of her form accurately.

Of people who drew the Starbucks mermaid, 45 percent forgot that she wears a crown, only 16 percent of those who drew her crown remembered the star in its center, and 55 percent omitted her twin tails.

Thirty-one percent of people remembered the pre-2011 Starbucks logo, which featured the brand name and a black circle.

Overall, despite the fact that Starbucks sells around 18 million cups of coffee each day,10 the finer details of the mermaid made its logo the least accurately remembered of all the brands we investigated.

If Starbucks's logo was rebranded the way it was remembered


Walmart

Walmart logos

Walmart is the world’s largest company by revenue, with more than 11,500 stores in 28 countries.11 Its first logo, used between 1962 and 1964, was as basic as logos come, consisting of plain sans-serif lettering and no symbols. The typeface then changed to the "Frontier Font," resembling something from the Wild West, before reverting to plain lettering in 1981, although still with a hyphen between “Wal” and “Mart.” One in 10 people used a hyphen in their drawings.

Evolution of the Walmart logos

In 1992, a star replaced the hyphen (7 percent of our participants included this defunct star in their drawings), until 2008, when Walmart made the logo friendlier and more welcoming by removing the hyphen, making the text title case and including a "sunburst" symbol. Sixty-eight percent of people remembered the sunburst symbol, although 42 percent of the time they didn’t draw it with the correct number of points (six), arranged in the proper fashion.

Twelve percent of people drew a near perfect Walmart logo from memory, indicating that it is less clearly recalled than 7-Eleven (19 percent) but equal to Adidas.

In terms of color and text, the Walmart logo is very simple. But the sunburst symbol is exactly the sort of thing people fail to recall accurately. Remembering the presence of the sunburst is easy, but there is enough complexity to it that many people get confused by the number and arrangement of the beams. This was the most common reason drawings failed to be near perfect, followed by confusing old and new versions of the logo. This was more the case for Walmart than any other brand. People confused the star and the sunburst – from the brand’s two most recent logos – either including both at the same time, placing the sunburst where the star used to be, or vice versa.


If Walmart's logo was rebranded the way it was remembered


Target

Target logos

Alongside Walmart, Target is one of the country’s biggest mass-merchandise retailers. Its logo is one of the most recognizable, too. According to a 2003 Target study, 96 percent of North Americans associated the bold, red bull's-eye symbol with the retailer.12

Evolution of the Target logos

In our experiment, one-quarter of people drew near perfect Target logos from memory, making it the second best remembered. Fifty-two percent were able to draw it well (a close match except for a few minor flaws).

The main aspects of the logo to get right are its color (100 percent of people knew the bull's-eye was red) and the number of rings that comprise the target. Fifty-nine percent of people knew the actual logo has a solid circle in its center and one ring surrounding it.

Most people (59 percent) chose to include the word "Target," which is how it’s commonly seen on storefronts. However, those who missed out the brand name weren’t completely wrong, as Target has decoupled the name from its logo in ads and flyers since 2006, believing that most people are familiar enough with the brand to make the association easily.

If Target's logo was rebranded the way it was remembered


IKEA

IKEA logos

The most accurately drawn logo from memory by 156 Americans belongs to a Swedish company: IKEA.

Evolution of the IKEA logos

Thirty percent of the people who drew the IKEA logo from memory were able to recreate its combination of text, shapes, and color almost perfectly, compared to 25 percent for Target, and 20 percent for Apple.

One of the most important elements of the IKEA logo, given that it is so simple, is the lettering. Eighty-eight percent of people remembered that the letters are written in uppercase. Whether you remember the IKEA logo with or without the yellow oval behind the blue letters depends on whether you think of an IKEA storefront, which is likely to feature yellow letters on a blue background, or the logo from printed materials and TV commercials, which shows the oval. Forty-one percent of drawings included the oval behind the brand name.

If IKEA's logo was rebranded the way it was remembered

Summary

The question at the heart of this experiment is "How accurately can we recall logos we see on a daily basis?" The results show that most people are very good at recalling brand colors – around 80 percent selected the correct palettes for their drawings, while shapes and elements in logos are harder to recall.

When a brand’s logo changes over time, a subset of people mistakenly conflates old and new versions. Similarly, we sometimes slip up when advertising utilizes strong symbols not used in the logo (e.g., the Burger King crown).

Overall, 16 percent of people drew near perfect logos, and 37 percent were good but not perfect. As we would expect, the more complex the logo, the less likely people are to remember it in full.

Accuracy Table

We wanted to go deeper than just measuring the accuracy of each remembered logo, so we asked our 156 participants a host of questions about themselves, including their age, gender, occupation, and how much they engage with the 10 brands (e.g., owning an Apple device or regularly shopping at Target). Men and women performed equally well (or poorly), regardless of the logo in question, and their level of brand engagement made no difference to their ability to accurately recall the logos.

There was, however, a difference by age. On average, younger people drew more accurate logos than older people. This was true across almost all brands, but was most noticeable for 7-Eleven, Burger King, and Adidas. Walmart was least affected by age, showing no difference between younger and older groups.

Our participants’ drawings were scored twice: first by the participants themselves, then by an independent panel of five design and advertising professionals, who assessed the presence of key features and dimensions, as well as the colors used. The judges’ scores were averaged to establish ratings for over 1,500 drawings. This allowed us to compare the drawers' self-perceived accuracy to the actual accuracy as judged by other people. In other words, we could measure under- and overconfidence.

Rating Comparison

Seventy percent of our participants overrated the accuracy of their attempts. The average accuracy of the drawings as judged by the participants was 5.3 out of 10, while the average rating of the same drawings by our independent judges was 3.8. Confidence scaled with accuracy, so while people in general overestimated how well they did, those who did best had the best awareness of their ability.
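
For readers curious how that comparison works mechanically, here is a small Python sketch of the scoring arithmetic for a single drawing. The numbers are made up for the example; only the method (average the five judges' ratings, then compare against the drawer's self-rating) follows the description above.

    # Sketch of the over/underconfidence calculation described above.
    # Scores are invented for illustration, not taken from the study data.
    judge_scores = [4, 3, 5, 4, 3]   # one drawing, rated out of 10 by five judges
    self_rating = 6                  # the participant's own rating out of 10

    judged_accuracy = sum(judge_scores) / len(judge_scores)   # 3.8 in this example
    overconfidence = self_rating - judged_accuracy            # positive = overrated

    print(f"judged accuracy: {judged_accuracy:.1f}/10, "
          f"overconfidence: {overconfidence:+.1f} points")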


How can logos from companies as ubiquitous as Apple, Starbucks, and Walmart be so hard to pin down in our mind’s eye?

One explanation is that, as Sherlock Holmes said, we "see, but do not observe." Logos of giant corporations are so widespread that we don't need to have a photographic memory of them to recognize and engage with these brands. Instead, we remember just enough to get by. This process has been dubbed "inattentional amnesia" – despite seeing something many times, we fail to create a lasting memory of it. We found this to be the case even when participants engaged with the brands more than average.

We’ve seen how – despite our self-confidence – our ability to remember logos is surprisingly fallible. Perhaps you think you would have fared a little better. If so, we'll finish by asking you a very simple question: What color is the second letter in Google’s logo?


Try It Yourself

Think you can do better? Test your logo recall skills with our interactive quiz below:


Methodology

  • We recruited 156 Americans ranging in age from 20 to 70 to take part in our drawing experiment, with an average age of 34. Fifty percent were female, 47 percent were male, and 3 percent preferred not to say.

  • Our participants collectively spent 80 hours (31 minutes each) drawing the 10 logos using only their memories. Each person was taught how to use the drawing software to maximize their ability to draw accurately. They were asked to rate each of their attempts for accuracy out of 10, as well as indicate their level of engagement with the respective brand.

  • Logos were also independently rated for accuracy by a panel of five design and marketing professionals. Each gave the over 1,500 drawings a score out of 10 based on the included features, proportions, and color palettes. The scores were then averaged to establish accuracy ratings for each drawing and participant.


About Signs.com

Fascinated by the impact of logos on culture and human psychology? Us too! Signs.com is a team of design and tech nerds providing customers with the signage, practices, and education they need to get noticed. We offer more than just custom signs, though. Our superior customer service, easy-to-use design tools, and world-class materials will help you make your logo, brand, and company as memorable as possible.


Sources

1. http://uk.businessinsider.com/credit-suisse-estimates-588-million-apple-users-2016-4

2. https://www.creativereview.co.uk/apple-logo-1977/

3. http://www.sciencedirect.com/science/article/pii/0010028579900136

4. https://9to5mac.com/2014/05/19/fancy-a-piece-of-apple-history-apples-original-rainbow-logo-signs-being-auctioned/

5. http://www.creativebloq.com/logo-design/how-adidas-logo-earned-its-stripes-11135390

6. http://logos.wikia.com/wiki/Burger_King

7. http://logos.wikia.com/wiki/Domino%27s

8. http://logos.wikia.com/wiki/7-Eleven

9. U.S. Securities and Exchange Commission filings, January 2017

10. https://www.facebook.com/starbuckspartners/videos/10155429344291002/

11. http://money.cnn.com/2016/06/02/news/companies/walmart-smiley/index.html

12. https://corporate.target.com/article/2014/04/bullseye-love-history-of-target-logo


Fair Use Statement

Want to share our research? Go for it! You’re welcome to share as long as it’s for noncommercial purposes, and you link back to us here!

Taking a cigarette break on the smoking internet


The online people-who-love-to-smoke community is one of the most supportive and kindly corners of the internet I have ever encountered, especially for a group entirely preoccupied with the abetting of a habit that is the leading cause of preventable death worldwide. I’m not sure if this is a particularly illuminating observation, whether it says something profoundly terrible about humanity and where it is headed, or whether this is something we should find solace in. I change my mind about it a lot.

There is a website I love called airportsmokers.com. It advertises itself as “the original airport smoking website,” and it does exactly what it says it does: it tells its visitors what airports they can smoke in, and under what sort of conditions. It offers a ranking system, visually represented by a little row of cigarettes: 1/10 for an airport where you can’t smoke anywhere, even if you’re on a 12-hour layover and you are about to chew your own sleeves off; 10/10 for an airport where you can smoke two cigarettes at once in the queue for passport control. My personal favorite ranking is in the 8-to-9 cigarette range: “Excessive amount of smoking areas before and after security and sometimes at baggage reclaim.” “Excessive.” How many smoking areas would there have to be in an airport before a hardcore smoker looked around and thought, “This is too many, even for me”?

Airportsmokers.com is unbelievably thorough: among the many things it has taught me is how many airports there are in this world. Given the vast scope of its informative ambition, and the fact that I only came across it six months ago when trying to figure out if I could smoke in Doha airport, I have not been able to look at the ranking for every single airport listed, but I am fairly certain that no airport has achieved a ranking of 10/10 cigarettes. How could it? Even if an airport does have a place to smoke, or at least one that you don’t have to pay to get into, that place is invariably a disgusting and sad reminder of what sort of shabby treatment you will put up with if you are addicted to cigarettes. At very, very best, what you will get is a glass box near a boarding gate, televisions mounted to the wall broadcasting the latest in international bad news, tired people in track pants leaning against the wall because the swivel chairs are broken, and ashtrays that are emptied once every six months. No attempt is made to make these places comfortable, because no attempt is needed. Smokers are not in a position to make any demands.

At the top of the home page of airportsmokers.com, in bold, is the insistence that “we are not second class travellers!!!” This is an obvious untruth, and airportsmokers.com knows it. The whole reason that the site exists is that smokers are absolutely second class travellers, and the only way they will survive is if they stick together.

A man is seen smoking a cigarette on a hammock at the 2017 Woodstock Festival Poland on August 4, 2017 in Kostrzyn, Poland.

According to Mark Walmsley, the very nice man who runs it and who answered my questions with promptness and patience, the site is updated almost daily with feedback provided by its many users. Stuff like this (on the conditions on the ground in Sydney airport): “T1 terminal does indeed have a smoking area, its [sic] an outside area in the far left hand corner as you enter from the driveway in departures near the Coffee Club. It is before security. Feedback provided by Ray W. on the 11th April 2017.” Thank you, Ray. And thank you to Chris G, who has sweetly uploaded some photos of the smoking facilities at Charles de Gaulle, and to Chris M reporting from Manila International Airport (“troubling news: there is now no indoor smoking at all in Terminal 2.”) Thank you to the anonymous person who sent a photo of a chilled-out bald man smoking on a metal chair at Nadi International in Fiji (which still, by the way, gets a ranking of only 1/10 cigarettes). We are grateful to G.M., who reports that a sign has been erected at Zagreb International, which promises that “smoking cabins” are coming soon.

There is so much feedback, and while some of it is delivered in an offputtingly aggrieved tone, the vast majority of it is kind, encouraging, and helpful without expectation of reward. When I asked him what prompted the creation of airportsmokers.com, Walmsley told me that he started the site after being stuck for 12 hours without a cigarette in Dublin Airport, “only to find out later that it had an outdoor smoking area past security,” where he could have smoked like a train, if there had been a website to inform him that it existed. Most airport websites, he said, do not publish this kind of information, and so airportsmokers.com was born. It receives 5,000 unique visitors every day. Walmsley also told me that he does not make money off his website, and I believe him. He seems genuinely to be motivated by a desire to help.

A smoker stands in front of a construction hoarding featuring cloud and sky patterns, on July 31, 2017, in Covent Garden, London, England.

The people of airportsmokers.com look out for each other because they need to, but also because they want to. I have spent a long, long time on airportsmokers.com, and I haven’t seen any information that is not directly related to where and how a person can smoke their head off at an airport. The internet in general feels like a place for people to hate each other and make everything worse, but airportsmokers.com is a cheering exception to this. The meanest thing I could find was this: “Helping the non-smoker: The Website received its first anti-smoking rants on the 25th August 2017. The sender was unable to use paragraphs and to be honest, it was too much of a hassle to read. So for those who wish to send rants, send them somewhere else as they will go unread.”


This air of solidarity, of warm-hearted acceptance of the fact that we are all in this together, extends beyond airportsmokers.com to all corners of the smoking internet, including the designated smoking area on reddit. The first rule on r/Cigarettes is “Be nice,” and the banner declares, “Come take a smoke break with us.” It is clearly a place for kickin’ back. The first rule on r/Horses, just for contrast, is “Do not post questions about your horses that could significantly affect their health. Call a vet.” Like airportsmokers.com, the overall tone on r/Cigarettes can very occasionally veer into the aggrieved or harassed-seeming: “Don’t berate others for what they’re smoking, what they bought, or for smoking in general. This is not the place to tell people smoking is bad for them. We know.” In general, however, people are there to have a good time, and to write posts titled stuff like “Nothing like a red after an exam.” “Best part of the day.” “After Harvey, it's gotten kinda chilly in Houston (for September) and I switched to menthols while it's cool out.” “What do you guys think about not inhaling?” “Cigs aren’t for everyone.”

A man smokes as Sauvignon Blanc grapes are harvested for Bar-Maor Winery at daybreak on July 30, 2017 in Givat Nili in northern Israel.

Cigs, it is true, are not for everyone. They are hardly for anyone, anymore, and this is perhaps the reason why everyone on cigarette internet is so nice to each other – they are bound by a shared sense of shame and persecution. Even the guys doing their cigarette reviews on YouTube attract almost no negative attention at all, even as they seem to open themselves wide open to this possibility. A man wearing wraparound sunglasses in his car, giving his not inconsiderable viewership what it has asked for, which in this case is the most rigorous and lengthy description of what it is like to smoke a Newport Menthol, beginning with what the box feels like in his hand, to how it packs (he has heard that Newports pack harder than other cigarettes, and they do — when he opens the box it is clear that he has packed the shit out of them), to the quality of the plastic seal which closes the box, to how it burns, to how it looks when it burns, to the actual thoughtful and vivid description of what it tastes like (strong, full-bodied, definitely heavy menthol resulting in a mouth that is kind of numb and gummy, slight sulphur taste which is not as gross as it could be).

There are many cigarette reviews on YouTube. A lot of them (all of them, if I am honest) are mesmerizingly odd. Some of the people doing them make me feel sad, and worried about their quality of life or their undoubtedly troubled past. They are a bit twitchy and strange, making wild claims about the government’s suppression of the free circulation of cigarettes throughout Europe. There is a guy who moves his mouth around in a gasping way for about five seconds before he speaks, so that he looks as if he has been badly dubbed. He sits in a way that makes him look like he doesn’t have a skeleton, and his upper arms have a pallor that makes me feel like he is dead. Nearly all of them are unusual in a way you would need to text your friends about: “Met a guy who loves smoking too much. A strange man.” You wonder how these people will be robust enough to face the barrage of abuse that will surely be coming their way in the comments section. Surely just person after person going “Fuck, you are so weird.” Just: “There is truly no reason for you to be doing this.” “What kind of life is this.” But no. Nothing. Just stuff like “Great video, dude!” or “Disgusting, ignorant health government. Valuing health more than right and wrong and enforcing it blindly... Other than that great review and better screen resolution!” “Keep it up!!!”

There is, of course, some shadowy and sinister undergrowth in even this sunny garden. For instance, a site called Smoking From All Sides. It is as baldly plain in design as airportsmokers.com, and claims simply to provide links to “all perspectives of smoking.” There are links to anti-smoking groups, to “commentary” about smoking, and to a “television-style” documentary called “Why Do People Smoke?” So far, so relaxing, but then obviously it turns out to be about things of a sexual nature in a way that jars with the wholesome energy of the wider community. You click on a link called “smoking glamour,” and suddenly there are all these ads for “smoking erotica,” pictures of hard-faced ladies making out with their cigarettes, and another link called “Female Celebrity Smoking List,” which is somehow even more pervert-y than the smoking porn.

Still though, the site could be so much worse. And again, it does exactly what it says it does: I now have more perspectives on smoking than I did before. It was probably only trying to help.

Rosa Lyster is a writer in Cape Town.

Freight startup Flexport soars from ‘unsexy’ to $800M valuation


Flexport handles the boring logistics of a trillion-dollar business: the transport of shipping containers around the world. Because the work of freight forwarding seemed so bland, it was long ignored by the tech world. But digitizing the paperwork let Flexport speed up shipping so clients keep less inventory on hand while never running out.

When you apply that optimization to how every container full of electronics, clothes or food gets from factory to store, Flexport keeps getting smarter as the value piles up. That’s why just a year after raising $65 million at a valuation of $365 million, TechCrunch has learned Flexport has just closed a huge new round of funding, according to five sources.

Initially, Flexport was receiving offers valuing it at over $1 billion, but turned those down in favor of a more manageable valuation. Multiple sources now confirm that the startup has completed a $110 million Series C at an $800 million pre-money valuation. The round is mostly filled with existing investors, including DST. Flexport declined to comment for this story.

What we’ve kept hearing is that Flexport co-founder Ryan Petersen is a favorite amongst investors. “He’s a machine,” said one of TechCrunch’s sources. After growing up buying scooters from China and fencing them online, he co-founded ImportGenius to scan and sell shipping manifest data about imports. That led him to realize how antiquated freight forwarding was, paving the way for Flexport’s start in 2013.

“They’re hiring like crazy,” one source said. Flexport appears to be trying to scale up fast enough to compete with entrenched giants in the shipping space, like FedEx, DHL and Expeditors, that can’t adapt to new technology as quickly. Meanwhile, it’s looking to box out upstart competitors picking away at parts of the freight forwarding equation, including Freightos, Haven and Fleet.

Soon after TechCrunch wrote a story about the company being “The unsexiest trillion-dollar startup,” we started receiving aggressive pitches from these competitors. Fusion’s Alexis Madrigal shined a light on how interesting the business could be with his podcast series Containers, which was sponsored by Flexport. And Flexport raised its $65 million Series B last September that brought it to $94 million in total funding. A year later, its value has more than doubled.

Flexport founder and CEO Ryan Petersen

Now the company has over 400 employees in seven offices. It earns roughly 15 percent of the average $2,000 it costs to move a shipping container around the world, compared to 25 percent that its competitors charge. Petersen told Forbes he expects revenue of $500 million this year, yet that still makes Flexport an underdog. “There are 25 freight forwarders that each do more than $1 billion in revenue a year,” he said. “None of them was founded after Netscape.”

That’s Flexport’s advantage. Tracking everything with paper leads its older competitors to see clients individually. Flexport holistically analyzes all its data to optimize shipping routes and simplify relationships with ports, truck drivers and anyone else that touches a container. That’s allowed it to shave off five days of travel time for moving less than a container full of goods.

Now it’s opening its own “cross docks” — warehouses where it can temporarily store clients’ goods until it can batch their transport with other shipments going to the same place. That way it’s always moving full containers with maximum efficiency. Flexport already has cross docks in Hong Kong and LA, but Petersen foresees having a global network.

The new capital could help Flexport pay for transitioning from a pure-play software company merely handling routing logistics to being an actual freight company. This shift from bits to atoms doesn’t come cheap, but with plenty of revenue waiting to be stolen from sluggish competitors, it’s having no problem finding the capital.

TechCrunch has heard that many of Flexport’s existing investors, which number at least 57, were shut out of the new round despite being interested. And while investors were happily offering it a valuation of $1 billion or more, Petersen didn’t want to get in over his head and risk a down-round later if the market stumbled.

The obvious, glamorous verticals of tech have been overrun with startups. Everyone seems to have a photo-sharing app or some “revolutionary” artificial intelligence play. There’s an old adage that the best startup ideas are often at the intersection of “seems like a bad idea” and “is a good idea,” as Peter Thiel, the leader of Flexport investor Founders Fund, has said.

But as the mainstream embraces startup culture and aspiring founders flood Silicon Valley, there seems to be no shortage of people willing to chase things that seem like a bad idea. Perhaps an addendum to the startup seeker’s mantra should be “Seems bad and boring.” That’s where there are still gold mines untapped.

Additional reporting by Ingrid Lunden

Featured Image: Bryce Durbin/TechCrunch

Pyka is hiring an EE to design aircraft subsystems

Pyka is looking for an Electrical Engineer to join our experienced and enthusiastic team of engineers. In this role, you will be responsible for everything from system level design decisions to detailed design, manufacturing, and testing. As our first electrical engineering hire, you will be a key contributor to the entire vehicle system – an essential asset to our fast moving team. The ways in which you will contribute will be extremely varied, from continuing the development of our custom sensorless motor controller, to integrating the aircraft’s flight controller. You will have ownership over much of the propulsion system and avionics hardware that will launch on our first production-spec aircraft this winter.

Our engineering team believes in rapid iteration and testing in the physical world. You will get the opportunity to build and fly commercial unmanned electric aircraft at an unprecedented scale.

Pyka is an autonomous electric aircraft company that was founded nine months ago. We are a full-stack team with strong expertise in software/firmware development for autonomous vehicles and electric aircraft design. Our company is founded on the goal of one day providing everyday people with an inexpensive and compelling alternative to existing public transit systems using our extremely short takeoff and landing autonomous electric aircraft. For a host of reasons, our first product is an autonomous electric crop duster which we will be deploying in New Zealand. We have built a full scale 400lb payload proof of concept vehicle and custom autopilot. Our current focus is towards designing and building two of our production spec aircraft which we will be sending to New Zealand this winter for certification testing and our first commercial flights.

Fostering a welcoming company culture that promotes experimentation, asking questions, and the ability to express different views and opinions is extremely important to us. We are a transparent company both internally and externally and do not believe in operating in “stealth mode.” We offer competitive salary and equity.

Responsibilities:

  • Develop our custom motor controller (15kW, 120V pack voltage, utilizing TI's instaspin for sensorless commutation).
  • Select, integrate, and test various propulsion system subsystems, including the li-ion BMS.
  • Help develop the aircraft's wiring harness with a particular focus on robustness and repeatability of manufacturing.

Preferred qualifications:

  • Hands-on experience laying out and assembling PCBs.
  • Experience designing power electronics.
  • Experience integrating and debugging electric vehicle systems.
  • Working knowledge of EMI and EMC mitigation techniques and best practices.
  • An inquisitive attitude with an ability to quickly and deftly research new topics and develop mastery of new subjects.

If you're excited about electric aviation and power electronics, please contact me at nathan@flypyka.com.

The Last Analog Motion Graphics Machine [video]


Published on Sep 21, 2017

Engineer Dave Sieg has spent the last 20 years preserving the only working Scanimate, an analog motion graphics machine that was the staple of film/tv animation in the 70's and 80's. Dave discusses the technical and cultural impact of the Scanimate and what the future holds for this iconic machine.



Laser Socks – a sweaty game pointing toward the future of computing


In the video above two people are playing Laser Socks, a game I invented in an afternoon using a research programming system, common household items, and a couple lines of code.

Players try to point a laser pointer at their opponent's socks while dodging their opponent's laser. Whenever they score a hit, the health meter closest to their opponent's play area fills up with blue light. Whoever gets their opponent's meter to fill up first wins.

This page describes the process of making Laser Socks and some of the insights gleaned from the game.

Context

In August 2015, my research group (The Communications Design Group or CDG) had a game jam — an event where participants create games together over the course of a few days. The theme was to make hybrid physical/digital games using a prototype research system Bret Victor and Robert Ochshorn had made called Hypercard in the World. This system was like an operating system for an entire room — it connected cameras, projectors, computers, databases, and laser pointers throughout the lab to let people write programs that would magically add projected graphics and interactivity to physical objects. The point of the jam was to see what playful things you could make with this kind of system. We ended up making more than a dozen new and diverse games.

I made Laser Socks, a game about jumping around and shooting a laser pointer at an opponent's feet. It was fun, ridiculous, and simple to make. In some ways, Laser Socks became one of the highlight demonstrations of what could be done if there was a medium of expression that integrated dynamic computational elements into the physical world.

There were several interesting things that my lab members and I noticed about the process of making and playing Laser Socks that are worth explaining — I've tried to capture them below.

Insights

I could throw together a new type of game the world had never seen before in an afternoon.
This isn't to say that I'm some kind of genius, but rather that once the conceptual possibility of making a physical/digital game became available and a technical system lowered the barriers to making it, I could make something brand new and pretty fun! In our research group we viewed this as a positive sign that if many other people were given the same opportunities, a huge wave of creativity in this new expressive medium could be unleashed, and not just for games but across all kinds of human activities.
Laser Socks was made of posterboard, trash bags, and tape.
Since the cameras, projectors, and computers were already built into our lab space, all I had to do was use some stuff lying around in our lab to make the game. There is something magical and very humane about being able to use physical materials to make a "UI". Compare this to making a computer game: I would have had to laboriously code the appearance of each game element. When working in the dynamic medium the physical world is its own renderer.
Laser Socks was basically two lines of code.

Conceptually the game is extremely simple: "when a laser is detected in a play area, increase the nearest health meter". That's it. The Hypercard in the World system took care of detecting lasers and projection mapping, so really all I had to do was draw a blue rectangle for the health meter. It still amazes me that there was so much fun and richness in such simple programming.
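
For a rough sense of what those "two lines" boil down to, here is a minimal Python sketch of the same idea. It is not the actual Hypercard in the World code – that system handled camera capture, laser detection, and projection mapping itself – and the frame array, play-area rectangles, and health dictionary below are illustrative assumptions.

    import numpy as np

    LASER_BRIGHTNESS = 240  # a laser dot on a white sock nearly saturates the camera

    def update_health(frame, play_areas, health, step=0.01):
        """One tick of Laser Socks: if a bright laser dot appears inside a
        play area, fill that side's health meter a little.

        frame      -- grayscale camera image as a 2D numpy array
        play_areas -- {"left": (y0, y1, x0, x1), "right": (...)} pixel rectangles
        health     -- {"left": 0.0, "right": 0.0}, each value in [0, 1]
        """
        for side, (y0, y1, x0, x1) in play_areas.items():
            region = frame[y0:y1, x0:x1]
            if np.any(region > LASER_BRIGHTNESS):                # "a laser is detected in a play area"
                health[side] = min(1.0, health[side] + step)     # "...increase the nearest health meter"
        return health

    # Each tick, the surrounding system would grab a camera frame, call
    # update_health(), and project a blue rectangle sized by health[side]
    # onto each meter. Whoever fills the opponent's meter first wins.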

Laser Socks was multiplayer by default.

Though it wasn't given as an explicit instruction at the beginning of the game jam, all the games we made including Laser Socks were multiplayer. It seems that when making games in the physical world the natural impulse is to make them social. Compare this to current computer interfaces, which almost all assume a single user in their own private world.

In fact, I designed Laser Socks for two players, but much to my surprise we discovered that the game supported 2 vs. 2 or even 3 vs. 3 games with no modifications. The game's code was just looking for lasers — it didn't care how many people there were. Here is a 3D video of the first 2 vs. 2 game:

For comparison, imagine trying to modify a video game to support more players. Even if you had access to the source code, it would probably be quite difficult. Since the game took place in the physical world with pretty "dumb" code, it could easily support more social experiences.

Additionally, in January 2016 we had a lab anniversary party. We invited around 200 people into our lab space and had several demonstrations around the room all running on the Hypercard in the World system, including Laser Socks, which turned out to be a great party game. In contrast to virtual reality or augmented reality systems, these demos were naturally integrated into the party atmosphere.

Two people mid-jump during a round of Laser Socks

Improvisation is encouraged in the physical world.

Not only were we able to add more players to Laser Socks with no problem, we were also able to add game modes or rules on a whim. Since many parts of the game are socially-mediated instead of mediated by a computer, they can be modified just by having the players agree to them. There is little or no change in code required to experiment with these new ideas.

For example, when testing the first game of Laser Socks we only had one pair of white socks so Michael Nagle and I each wore a single sock and hopped around on one foot (it was pretty hard!). This was just an idea we had and could try immediately. There have also been games where each person holds two laser pointers, one in each hand, or games where players kneel down to try to block a person's lasers.

In the video below, we can see this kind of improvisation in action — Bret "cheats" after he's technically lost by throwing a sock on Michael's game pad. Using that idea as inspiration, I can imagine a game mode where spectators are allowed to throw socks on either game area as an added challenge. This idea could be tried immediately, no code needed.

It has been a dream of many computer visionaries that people without an extensive technical background can create, adapt, and modify computer programs for their own particular uses. Somehow this dream hasn't come to fruition yet, but maybe with a type of computing integrated into the social world there is more possibility for this kind of authorship.

We can exploit the properties of the physical world.
The code for Hypercard in the World, the system that powered Laser Socks, was not fancy. It didn't use machine learning or advanced computer vision — it just looked for laser dots in certain areas. The reason Laser Socks worked is because a laser appears very dim when it bounces off black material and much brighter when it bounces off white material. This is why players needed to wear white socks while playing Laser Socks — the system would not detect the laser unless it bounced off bright material. It's wonderful that we could use this basic property of color and physics to our advantage instead of trying to make some advanced algorithm for sock detection, which is out of reach for most people.
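
As a toy illustration of that point, the grayscale values below are invented (not measured); they just show how a single fixed brightness threshold, like the one in the sketch earlier, is enough to separate a laser dot on a white sock from one landing on dark material.

    # Invented 8-bit grayscale readings for the same laser dot on different materials.
    reflected_brightness = {
        "black trash-bag backdrop": 60,    # most of the light is absorbed
        "dark pants":               80,
        "white sock":              250,    # close to sensor saturation
    }

    THRESHOLD = 240  # one fixed cutoff, no machine learning needed

    for material, value in reflected_brightness.items():
        status = "laser detected" if value > THRESHOLD else "ignored"
        print(f"{material:26s} -> {status}")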

The Dream

In the 1960s, when computers took up entire rooms and academic researchers had to sign up for time slots to use them, some MIT students invented arguably the first real-time computer game, dubbed Spacewar. As Stewart Brand wrote later in Spacewar: Fanatic Life and Symbolic Death Among the Computer Bums, the game functioned as a harbinger of some of the new possibilities the computing medium might bring:

Yet Spacewar, if anyone cared to notice, was a flawless crystal ball of things to come in computer science and computer use:
  1. It was intensely interactive in real time with the computer.
  2. It encouraged new programming by the user.
  3. It bonded human and machine through a responsive broadband interface of live graphics display.
  4. It served primarily as a communication device between humans.
  5. It was a game.
  6. It functioned best on, not off, stand-alone equipment (and disrupted multiple-user equipment).
  7. It served human interest, not machine. (Spacewar is trivial to a computer.)
  8. It was delightful.

Similarly, Laser Socks is a fun demo by itself, but what we as researchers hope it points toward is a new type of computing that instead of isolating humans in artificial digital worlds, provides a medium of expression that is continuous and integrated with our physical and social worlds. It's a type of computing that's hard to imagine for most of us now, but may one day be as much a part of our lives as laptops and mobile phones are now. Laser Socks is just one small glimmer in this vast ocean of possibility.

The researchers at Dynamicland have been trying to simulate such a future to see what it would be like. Using dozens of prototypes made in earlier systems as examples, the group has developed a more advanced system dubbed "Realtalk" to support their continuing experiments. They've installed this system into their entire lab space and are beginning to open the space to the wider community. They're hoping to make the space into a new type of institution that is somewhere between a library, makerspace, lecture hall, and research lab, all centered around developing a new type of media the world has never seen before. Best of luck to the team.

✨   🌙🚀✨

A Whale's Guide to Crypto Trading [pdf]

Portugal treats addiction as a disease, not a crime


This issue is personal to me, because my hometown in rural Oregon has been devastated by methamphetamines and, more recently, by opioids. Classmates have died or had their lives destroyed; my seventh-grade crush is now homeless because of her addictions.

Many people are also coming to Portugal to explore what a smarter, health-driven approach might look like. Delegations from around the world are flying to Lisbon to study what is now referred to as the “Portuguese model.”

“This is the best thing to happen to this country,” Mario Oliveira, 53, a former typesetter who became hooked on heroin 30 years ago, told me as he sipped from a paper cup of methadone supplied by a mobile van. The vans, a crucial link in Portugal’s public health efforts, cruise Lisbon’s streets every day of the year and supply users with free methadone, an opioid substitute, to stabilize their lives and enable them to hold jobs.

Methadone and other drug treatment programs also exist in the U.S., but are often expensive or difficult to access. The result is that only 10 percent of Americans struggling with addiction get treatment; in Portugal, treatment is standard.

Photo: A van distributing methadone, an opioid substitute, in the Lumiar neighborhood. Credit: Daniel Rodrigues for The New York Times

“If I couldn’t come here, I don’t know if I’d still be alive,” Oliveira told me. He said that he used to steal to support his habit but is now getting his life under control. Two weeks ago, he began reducing his dose of methadone, and he hopes to wean himself off opioids completely.

Yet Portugal’s approach is no magic wand.

“I’m homeless and jobless and addicted again,” Miguel Fonseca, a 39-year-old electrical mechanic, said as he held a lighter under a sheet of tin foil to turn a pinch of heroin powder into fumes that he smoked to get high. He spends about $100 a day on his habit, and in the past sometimes has turned to theft to support it.

Less than 100 feet away, Mario, the fisherman I began this story with, was injecting himself with heroin and cocaine, and showing little interest in Lopes’s health outreach. He assured her that he wasn’t overdosing, and he scoffed at the idea of methadone as an alternative to heroin.

Mario told me that he had started with heroin at age 14 (another man I met had started at age 11), and used it during the two years he worked as a fisherman in Massachusetts. “Portuguese heroin isn’t as high quality as American heroin,” he complained. He then reached for a pipe and began to smoke cocaine.

Public health workers like Lopes may never be able to get Mario to give up drugs, but she can help keep him alive. Seeing Mario, his blood spattered on the steps from his constant injections, tottering off to get more drugs, it was clear that the Portuguese model isn’t as effective as we might hope — but it occurred to me that in America, Mario might well be dead.

Portugal switched to its health focus under the leadership of a socialist prime minister named António Guterres — and if the name sounds familiar, it's because he's now the United Nations secretary general. The new approach was a gamble. "We were facing a devastating situation, so we had nothing to lose," recalled João Castel-Branco Goulão, a public health expert and the architect of the policy ("our national hero," as one Portuguese cabinet minister told me).

So let’s be clear on what Portugal did and didn’t do. First, it didn’t change laws on drug trafficking: Dealers still go to prison. And it didn’t quite legalize drug use, but rather made the purchase or possession of small quantities (up to a 10-day supply) not a crime but an administrative offense, like a traffic ticket.

Offenders are summoned to a “Dissuasion Commission” hearing — an informal meeting at a conference table with social workers who try to prevent a casual user from becoming addicted.

Photo: Hypodermic kits are distributed by outreach workers, and used syringes are collected for safe disposal and to prevent sharing. Credit: Daniel Rodrigues for The New York Times

“How long have you been using?” Nuno Capaz, a sociologist and member of the Dissuasion Commission in Lisbon, asked a 26-year-old factory worker caught with hashish. They chatted, with Capaz trying to figure out if the young man was in danger of taking up harder drugs. The dissuasion board can fine offenders, but that’s rare. Mostly the strategy is to intervene with counseling or other assistance before an offender becomes addicted.

“My main concern is the health of the person,” Capaz explained afterward. “Our approach is much closer to that of a medical doctor than to a court of law.”

The public health approach arises from an increasingly common view worldwide that addiction is a chronic disease, perhaps comparable to diabetes, and thus requires medical care rather than punishment. After all, we don't just tell diabetics, "Get over it."

My sense from observing the hearings and talking to users is that the Dissuasion Commission isn't terribly effective at dissuading. How successful could a 15-minute session be? Then again, criminal sanctions also seem ineffective at discouraging drug use: when scholars look at the impact of crackdowns, they typically find little effect.

In the first year or so of decriminalization in Portugal, there did seem to be the increase in drug use that critics had predicted. But although the Portuguese model is often described simply as decriminalization, perhaps the more important part is a public health initiative to treat addiction and discourage narcotics use. My take is that decriminalization on its own might have led to a modest increase in the use of hard drugs, but that this was swamped by public health efforts that led to an overall decline.

Portugal introduced targeted messaging to particular groups — prostitutes, Ukrainians, high school dropouts, and so on. The Health Ministry dispatched workers into the most drug-infested neighborhoods to pass out needles and urge users to try methadone. At big concerts or similar gatherings, the Health Ministry sometimes authorizes the testing of users’ drugs to advise them if they are safe, and then the return of the stash. Decriminalization makes all this easier, because people no longer fear arrest.

So how effective are the methadone vans and prevention campaigns? I thought I’d ask some real experts: drug dealers.

“There are fewer customers now,” complained one heroin dealer in the gritty Lumiar neighborhood. Another, Joaquim Farinha, 55, was skeptical that methadone was costing him much business. “Business is still pretty good,” he said, interrupting the interview to make a sale to a middle-aged woman.

(Portugal’s drug market is relatively nonviolent and relaxed partly because of another factor: Handguns are tightly controlled.)

On balance, the evidence is that drug use stabilized or declined since Portugal changed approaches, particularly for heroin. In polls, the proportion of 15- to 24-year-olds who say that they have used illicit drugs in the last month dropped by almost half since decriminalization.

Decriminalization also made it easier to fight infectious diseases and treat overdoses. In the U.S., people are sometimes reluctant to call 911 after a friend overdoses for fear of an arrest; that’s not a risk in Portugal. In 1999, Portugal had the highest rate of drug-related AIDS in the European Union; since then, H.I.V. diagnoses attributed to injections have fallen by more than 90 percent and Portugal is no longer at the high end in Europe.

One crucial mistake that Portugal did not make was to follow the United States in adopting prescription opioid painkillers for routine use. Adalberto Campos Fernandes, the health minister, said that Portuguese doctors resisted overprescribing and that regulators also stood in the way.

Photo: Recovering drug users receive daily doses of methadone from vans that cruise Lisbon streets. Credit: Daniel Rodrigues for The New York Times

Another factor that has benefited Portugal: The economy has grown and there is a robust social fabric and safety net, so fewer people self-medicate with drugs. Anne Case and Angus Deaton of Princeton University have chronicled the rise of “deaths of despair” and argue that opioid use in America in part reflects a long-term decline in well-paying jobs for those with a high school education or less.

Portugal initially was scolded around the world for its experiment, as a weak link in the war on drugs, but today it’s hailed as a model. The World Health Organization and American Public Health Association have both praised decriminalization and a public health focus, as has the Global Commission on Drug Policy.

One attraction of the Portuguese approach is that it’s incomparably cheaper to treat people than to jail them. The Health Ministry spends less than $10 per citizen per year on its successful drug policy. Meanwhile, the U.S. has spent some $10,000 per household (more than $1 trillion) over the decades on a failed drug policy that results in more than 1,000 deaths each week.
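As a rough sanity check on those figures (the population counts here are my assumptions: roughly 125 million U.S. households and about 10 million Portuguese residents):

\[
\$10{,}000\ \text{per household} \times 1.25\times 10^{8}\ \text{households} \approx \$1.25\ \text{trillion (cumulative, U.S.)}
\]
\[
\$10\ \text{per citizen per year} \times 1.0\times 10^{7}\ \text{citizens} \approx \$100\ \text{million per year (Portugal)}
\]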

I’ve been apprehensive of decriminalizing hard drugs for fear of increasing addiction. Portugal changed my mind, and its policy seems fundamentally humane and lifesaving. Yet let’s also be realistic about what is possible: Portugal’s approach works better than America’s, but nothing succeeds as well as we might hope.

The hilly Casal Ventoso neighborhood of Lisbon was ground zero for heroin 15 years ago, "a wall of death," remembered Paulo Brito, 55, who has been using heroin since he was 15.

Brito weaned himself off drugs with the help of health workers and remained “clean” for 10 years — but relapsed a year ago, and I met him in today’s Casal Ventoso. There are fewer overdoses now, but it is still littered with hypodermic packages and other detritus of narcotics, as well as a pall of sadness.

“I’ve hit rock bottom,” Brito told me despairingly. “I’m losing the person I most love in the world.”

His girlfriend, Teresa, is begging him to give up heroin. He wants to choose her; he fervently wants to quit. But he doesn’t know if he can, and he teared up as he said, “It’s like entering a boxing ring and facing Mike Tyson.”

Yet for all his suffering, Brito lives, because he’s Portuguese. The lesson that Portugal offers the world is that while we can’t eradicate heroin, it’s possible to save the lives of drug users — if we’re willing to treat them not as criminals but as sick, suffering human beings who need helping hands, not handcuffs.


New antibody attacks 99% of HIV strains

Image copyright: SPL

Scientists have engineered an antibody that attacks 99% of HIV strains and can prevent infection in primates.

It is built to attack three critical parts of the virus - making it harder for HIV to resist its effects.

The work is a collaboration between the US National Institutes of Health and the pharmaceutical company Sanofi.

The International Aids Society said it was an "exciting breakthrough". Human trials will start in 2018 to see if it can prevent or treat infection.

Our bodies struggle to fight HIV because of the virus' incredible ability to mutate and change its appearance.

These varieties of HIV - or strains - in a single patient are comparable to those of influenza during a worldwide flu season.

So the immune system finds itself in a fight against an insurmountable number of strains of HIV.

Super-antibodies

But after years of infection, a small number of patients develop powerful weapons called "broadly neutralising antibodies" that attack something fundamental to HIV and can kill large swathes of HIV strains.

Researchers have been trying to use broadly neutralising antibodies as a way to treat HIV, or prevent infection in the first place.

The study, published in the journal Science, combines three such antibodies into an even more powerful "tri-specific antibody".

Dr Gary Nabel, the chief scientific officer at Sanofi and one of the report authors, told the BBC News website: "They are more potent and have greater breadth than any single naturally occurring antibody that's been discovered."

The best naturally occurring antibodies will target 90% of HIV strains.

"We're getting 99% coverage, and getting coverage at very low concentrations of the antibody," said Dr Nabel.

Experiments on 24 monkeys showed none of those given the tri-specific antibody developed an infection when they were later injected with the virus.

Dr Nabel said: "It was quite an impressive degree of protection."

The work included scientists at Harvard Medical School, The Scripps Research Institute, and the Massachusetts Institute of Technology.

'Exciting'

Clinical trials to test the antibody in people will start next year.

Prof Linda-Gail Bekker, the president of the International Aids Society, told the BBC: "This paper reports an exciting breakthrough.

"These super-engineered antibodies seem to go beyond the natural and could have more applications than we have imagined to date.

"It's early days yet, and as a scientist I look forward to seeing the first trials get off the ground in 2018.

"As a doctor in Africa, I feel the urgency to confirm these findings in humans as soon as possible."

Dr Anthony Fauci, the director of the US National Institute of Allergy and Infectious Diseases, said it was an intriguing approach.

He added: "Combinations of antibodies that each bind to a distinct site on HIV may best overcome the defences of the virus in the effort to achieve effective antibody-based treatment and prevention."


Joining a Frat lowers GPA by 0.25 (6%) but raises future salary by 36%


31 Pages. Posted: 14 Apr 2016. Last revised: 19 May 2017

Jack Mara, Lewis Davis, and Stephen Schmidt
Union College - Department of Economics

Date Written: April 4, 2017

Abstract

We exploit changes in the residential and social environment on campus to identify the economic and academic consequences of fraternity membership at a small Northeastern college. Our estimates suggest that these consequences are large, with fraternity membership lowering student GPA by approximately 0.25 points on the traditional four-point scale, but raising future income by approximately 36%, for those students whose decision about membership is affected by changes in the environment. These results suggest that fraternity membership causally produces large gains in social capital, which more than outweigh its negative effects on human capital for potential members. Alcohol-related behavior explains little of fraternity membership's effects on either human capital or social capital. These findings suggest that college administrators face significant trade-offs when crafting policies related to Greek life on campus.

Keywords: Fraternities, Social Capital, Human Capital, Grades, Alcohol, Drinking

JEL Classification: I23, J24, I12

Suggested Citation

Mara, Jack and Davis, Lewis and Schmidt, Stephen, Social Animal House: The Economic and Academic Consequences of Fraternity Membership (April 4, 2017). Available at SSRN: https://ssrn.com/abstract=2763720 or http://dx.doi.org/10.2139/ssrn.2763720
