
PostgreSQL Features You May Not Have Tried but Should


PostgreSQL contains a slew of great features. Many of them are very well known. Others can be incredibly useful but are not as widely appreciated. Here are our top picks for great PostgreSQL features you may not have taken a look at but really should, since they can help you get code into production faster, make ops easier and generally get things done with less code and sweat.

Pub/Sub Notifications

PostgreSQL comes with a simple non-durable topic-based publish-subscribe notification system. It’s no Kafka, but the features do support common use cases.

Messages on a specific topic can be broadcast to all connected subscribers who are listening for that topic. The messages are pushed by the Postgres server to the listening clients. Polling is not required, but your database driver should support delivery of notifications to the application asynchronously.

The notification consists of a topic name and a payload (up to about 8000 characters). The payload would typically be a JSON string, but of course it can be anything. You can send a notification using the NOTIFY command:

NOTIFY foo_events, '{"userid":42,"action":"grok"}';

or the pg_notify() function:

SELECT pg_notify('foo_events', '{"userid":42,"action":"grok"}');

The subscription happens with the LISTEN command, but in practice you’ll use driver-specific APIs. Here’s the Go version, for example.
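To get a feel for the flow, here’s a minimal sketch of a two-session psql exchange (the channel name and payload are the ones used above; the PID in the notice will of course differ):

-- session 1: subscribe to the topic
LISTEN foo_events;

-- session 2: publish a message
NOTIFY foo_events, '{"userid":42,"action":"grok"}';

-- back in session 1, psql prints something like:
-- Asynchronous notification "foo_events" with payload "{"userid":42,"action":"grok"}"
-- received from server process with PID 1234.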

Table Inheritance

Assume you have a table called “invoices”. You now have to support “government invoices”, which are invoices but have a few additional fields. How would you model this? Rather than adding nullable columns to invoices, or a single nullable JSON column to invoices, try the inheritance feature:

CREATE TABLE invoices (
    invoice_number int  NOT NULL PRIMARY KEY,
    issued_on      date NOT NULL DEFAULT now()
);

CREATE TABLE government_invoices (
    department_id text NOT NULL
) INHERITS (invoices);

This reflects the situation that all government invoices are invoices, but have an extra attribute. The “government_invoices” table above has a total of 3 columns:

test=# \d invoices
                  Table "public.invoices"
     Column     |  Type   | Collation | Nullable | Default
----------------+---------+-----------+----------+---------
 invoice_number | integer |           | not null |
 issued_on      | date    |           | not null | now()
Indexes:
    "invoices_pkey" PRIMARY KEY, btree (invoice_number)
Number of child tables: 1 (Use \d+ to list them.)

test=# \d government_invoices
            Table "public.government_invoices"
     Column     |  Type   | Collation | Nullable | Default
----------------+---------+-----------+----------+---------
 invoice_number | integer |           | not null |
 issued_on      | date    |           | not null | now()
 department_id  | text    |           | not null |
Inherits: invoices

Adding rows works as though the tables are independent:

INSERT INTO invoices (invoice_number) VALUES (100);
INSERT INTO government_invoices (invoice_number, department_id) VALUES (101, 'DOD');

But watch what happens when you SELECT:

test=# SELECT * FROM government_invoices;
 invoice_number | issued_on  | department_id
----------------+------------+---------------
            101 | 2018-06-19 | DOD
(1 row)

test=# SELECT * FROM invoices;
 invoice_number | issued_on
----------------+------------
            100 | 2018-06-19
            101 | 2018-06-19
(2 rows)

Invoice number 101 from the child table is also listed in the parent table. This lets algorithms that know only about invoices work against the parent invoices table while ignoring any child tables.
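Conversely, if you want only the rows stored in the parent table itself, the ONLY keyword skips all child tables (the output here follows the two inserts above):

test=# SELECT * FROM ONLY invoices;
 invoice_number | issued_on
----------------+------------
            100 | 2018-06-19
(1 row)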

You can learn more about table inheritance in PostgreSQL in the docs.

Foreign Data Wrappers

Did you know you can have virtual tables that actually serve data from another PostgreSQL instance? Or even from SQLite files, MongoDB, Redis, and more? This feature is called Foreign Data Wrappers, and it provides a standardized way to access and manipulate data stored externally to the Postgres server you’re connecting to. There are FDW implementations for many different data sources, typically packaged as extensions.

The standard Postgres distribution comes with the postgres_fdw extension, which lets you connect to other Postgres servers. For example, you can move a big table to another server, and set up a virtual table (the proper term is a “foreign table”) locally:

-- install the extension (required only once)
CREATE EXTENSION postgres_fdw;

-- big_server is our big, remote server with the migrated table
CREATE SERVER big_server FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host '10.1.2.3', dbname 'big_db');

-- create a user mapping for the app user
CREATE USER MAPPING FOR app_user SERVER big_server
  OPTIONS (user 'remote_user', password 'remote_pass');

-- and finally create the virtual table invoices -> big_db.public.invoices
CREATE FOREIGN TABLE invoices (
  invoice_num int NOT NULL PRIMARY KEY
  -- other columns omitted for brevity
) SERVER big_server;

This Wiki page has a good list of the many FDW implementations available.

Apart from accessing data from other servers, FDW has also been used to implement alternate storage layouts, like cstore_fdw.

There is also dblink, which is yet another method of accessing remote PostgreSQL (only) data.
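As a minimal sketch (reusing the remote host and database from the FDW example above; the connection options are illustrative), a dblink query looks like this:

-- install the extension (required only once)
CREATE EXTENSION dblink;

-- run a query on the remote server and treat the result as a local rowset
SELECT *
  FROM dblink('host=10.1.2.3 dbname=big_db',
              'SELECT invoice_number FROM invoices')
       AS t(invoice_number int);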

Partitioned Tables

Starting with version 10, PostgreSQL natively supports sharding of a table into multiple child tables, based on a partition key value computed from one or more column values. This allows a single, large table to be physically stored as separate tables, for better DML performance and storage management.

Here’s how to create a partitioned table, with the intention of adding a new table each month:

-- the parent table
CREATE TABLE invoices (
    invoice_number   int  NOT NULL,
    issued_on        date NOT NULL DEFAULT now()
) PARTITION BY RANGE (issued_on);
-- table for the month of May 2018
CREATE TABLE invoices_2018_05 PARTITION OF invoices
  FOR VALUES FROM ('2018-05-01') TO ('2018-06-01');

-- table for the month of June 2018
CREATE TABLE invoices_2018_06 PARTITION OF invoices
  FOR VALUES FROM ('2018-06-01') TO ('2018-07-01');

The child tables have to be created manually or programmatically; it does not happen automatically.

You can query and insert at the parent level, and PostgreSQL will route the operation to the appropriate child table. Have a look:

First, we insert two rows:

test=# INSERT INTO invoices VALUES (10042, '2018-05-15');
INSERT 0 1
test=# INSERT INTO invoices VALUES (43029, '2018-06-15');
INSERT 0 1

We can see that the rows are actually inserted into the child tables:

test=# SELECT * FROM invoices_2018_05;
 invoice_number | issued_on
----------------+------------
          10042 | 2018-05-15
(1 row)

test=# SELECT * FROM invoices_2018_06;
 invoice_number | issued_on
----------------+------------
          43029 | 2018-06-15
(1 row)

But queries can be done on the parent, returning a combined result:

test=# SELECT * FROM invoices;
 invoice_number | issued_on
----------------+------------
          10042 | 2018-05-15
          43029 | 2018-06-15
(2 rows)

Partitioning is in some ways similar to inheritance (query at parent level), but there are differences too (partition parent contains no data, for example). You can read more about partitioning in the docs.
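A quick way to see the benefit is to EXPLAIN a query that filters on the partition key; the plan should mention only the matching partition (a sketch, since the exact plan text varies by version):

-- only invoices_2018_05 should show up in the plan;
-- invoices_2018_06 is skipped entirely
EXPLAIN SELECT * FROM invoices WHERE issued_on = DATE '2018-05-15';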

PostgreSQL 11, currently in beta, improves upon this feature. This article has an overview of the improvements.

Range Types

Ever worked with temperature ranges, calendar scheduling, price ranges and the like? Working with intervals is one of those deceptively simple things that gently leads you into hair-pulling and late-night debugging.

Here’s a table with a range column, and some values:

CREATE TABLE prices (
    item  text,
    price int4range -- int4range is a range of regular integers
);

INSERT INTO prices VALUES ('mouse',    '[10,16)');
INSERT INTO prices VALUES ('keyboard', '[20,31)');
INSERT INTO prices VALUES ('joystick', '[35,56)');

The mismatched brackets are intentional: [10,16) is a half-open interval that includes 10 but excludes 16. Here is the query to find all items that are in the price range $15 to $30, using the && operator (range overlap):

test=# SELECT * FROM prices WHERE price && int4range(15,30);
   item   |  price
----------+---------
 mouse    | [10,16)
 keyboard | [20,31)
(2 rows)

If you are not impressed, try writing the query without ranges. (No really, just try it.)

Range types are quite powerful – there are more operators and functions, you can define your own range types, and you can even index them.
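Here is a minimal sketch of both (the floatrange type and the index name are illustrative):

-- a GiST index speeds up range operators like && and @>
CREATE INDEX prices_price_idx ON prices USING gist (price);

-- a custom range type over double precision values
CREATE TYPE floatrange AS RANGE (subtype = float8);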

To learn more about ranges, try this presentation, and the docs.

Array Types

PostgreSQL has supported arrays for a long time. Array types can reduce the amount of boilerplate application code and simplify queries. Here is a table that uses an array column:

CREATE TABLE posts (
    title text NOT NULL PRIMARY KEY,
    tags  text[]
);

Assuming each row represents a blog post, each having a set of tags, here is how we can list all the posts that have both “postgres” and “go” tags:

test=# SELECT title, tags FROM posts WHERE '{"postgres", "go"}' <@ tags;
               title               |          tags
-----------------------------------+------------------------
 Writing PostgreSQL Triggers in Go | {postgres,triggers,go}
(1 row)

The usage of the array type here makes for concise data modelling and simpler queries. Postgres arrays come with operators and functions, including aggregate functions. You can also create indexes on array expressions. Here is an article on using arrays with Go.
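For example, unnest() and array_agg() convert between arrays and rows (a small sketch against the posts table above):

-- one row per (title, tag) pair
SELECT title, unnest(tags) AS tag FROM posts;

-- collapse all titles into a single array
SELECT array_agg(title) FROM posts;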

Triggers

You can ask PostgreSQL to execute a specific function when rows are inserted, updated or deleted from a table. The function can even alter the values during an insert. You can read more about triggers here. Just to whet your appetite, here is a trigger that sends out a notification and writes to an audit log when a user is created:

-- a table of users
CREATE TABLE users (
  username text NOT NULL PRIMARY KEY
);

-- an audit log
CREATE TABLE audit_log (
  at          timestamptz NOT NULL DEFAULT now(),
  description text NOT NULL
);

-- the actual function that is executed per insert
CREATE FUNCTION on_user_added() RETURNS TRIGGER AS $$
BEGIN
  IF (TG_OP = 'INSERT') THEN
    -- add an entry into the audit log
    INSERT INTO audit_log (description)
        VALUES ('new user created, username is ' || NEW.username);
    -- send a notification
    PERFORM pg_notify('usercreated', NEW.username);
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- set the function as an insert trigger
CREATE TRIGGER on_user_added AFTER INSERT ON users
  FOR EACH ROW EXECUTE PROCEDURE on_user_added();

Now if we try to add a new user, an audit log entry is automatically added:

test=# INSERT INTO users VALUES ('alice');
INSERT 0 1
test=# SELECT * FROM audit_log;
                at             |             description
-------------------------------+-------------------------------------
 2018-06-19 04:00:30.672947+00 | new user created, username is alice
(1 row)

pg_stat_statements

pg_stat_statements is an extension that is present by default in your PostgreSQL distribution, but is not enabled. This extension records a wealth of information about each statement executed, including the time taken, the memory used and the disk I/O initiated. It’s indispensable for understanding and debugging query performance issues.

The overhead of installing and enabling this extension is small, it’s very simple to use, and there is no reason NOT to run it on your production servers! Read the docs for more info.
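Enabling it looks roughly like this (a sketch; the column names are from the PostgreSQL 10/11 era this article covers):

-- postgresql.conf (requires a server restart):
--   shared_preload_libraries = 'pg_stat_statements'

-- then, in the database you want to inspect:
CREATE EXTENSION pg_stat_statements;

-- e.g. the five statements that consumed the most total time
SELECT query, calls, total_time, rows
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 5;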

Hash, GIN and BRIN Indexes

The default index type in PostgreSQL is the B-Tree, but there are also other types, documented here. These other types can be very helpful in cases that aren’t actually uncommon. In particular, setting up indexes of the hash, GIN or BRIN type might just be the solution to your performance issues:

  • Hash: Unlike B-Tree indexes, which have inherent ordering, hash indexes are unordered and can only do equality matches (lookups). However, hash indexes occupy much less space and are faster than B-Trees for equality matches. (Also, note that prior to PostgreSQL 10 it was not possible to replicate hash indexes; they were unlogged.)
  • GIN: GIN is an inverted index, which essentially allows for multiple values for a single key. GIN indexes are useful for indexing arrays, JSON, ranges, full text search etc.
  • BRIN: If your data has a specific, natural order – for example, time series data – and your queries typically work only with a small range of it, then BRIN indexes can speed up your queries with very little overhead. BRIN indexes maintain ranges per block of data, allowing the optimizer to skip over blocks that contain rows that won’t be selected by the query.
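Creating them is just a matter of naming the index method. A minimal sketch against tables from earlier sections (the index names are illustrative):

-- equality-only lookups, compact and fast
CREATE INDEX users_username_hash ON users USING hash (username);

-- inverted index over array elements (also works for JSONB, tsvector, ...)
CREATE INDEX posts_tags_gin ON posts USING gin (tags);

-- tiny index summarizing block ranges of naturally ordered data
CREATE INDEX invoices_issued_on_brin ON invoices USING brin (issued_on);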

Start reading about PostgreSQL index types here.

Full Text Search

PostgreSQL can do full text search very well, including in languages other than English. Here’s an article (https://www.opsdash.com/blog/postgres-full-text-search-golang.html) that walks you through the steps of creating a full text search app over Postgres in Go.
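As a taste, here is a minimal sketch against the posts table from the arrays example, matching titles that contain both “triggers” and “go” in any form:

SELECT title
  FROM posts
 WHERE to_tsvector('english', title) @@ to_tsquery('english', 'triggers & go');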

About pgDash

pgDash is an in-depth monitoring solution designed specifically for PostgreSQL deployments. pgDash shows you information and metrics about every aspect of your PostgreSQL database server, collected using the open-source tool pgmetrics.

Monitoring with pgDash

pgDash is currently in Beta and provides core reporting and visualization functionality, including collecting and displaying PostgreSQL information and providing time-series graphs, detailed reports, alerting and more.


Attacks against GPG signed APT repositories


This blog post takes a closer look at some attacks against APT repositories outlined in the academic paper A Look In the Mirror: Attacks on Package Managers and how they apply to recent versions of APT software.

It is a common misconception that simply signing your packages and repository metadata with GPG is enough to create a secure APT repository. This is false. Many of the attacks outlined in the paper and this blog post are effective against GPG-signed APT repositories. GPG signing Debian packages themselves does nothing, as explained below.

The easiest way to prevent the attacks covered below is to always serve your APT repository over TLS; no exceptions.

In addition to the attacks mentioned in the paper, we’ll also take a look at two other, more recent security vulnerabilities and discuss some mitigation techniques.

Building and maintaining secure APT repositories is difficult, time consuming, and must be handled with care. In some instances, these bugs or other issues cannot be easily dealt with, because a repository server cannot necessarily enforce that clients use a particular version of APT when connecting to it.

The easiest way to prevent as many security issues as possible with hosting APT repositories is to simply use TLS.

Create a secure APT repository in less than 10 seconds, free.

Understanding the APT security model is important before diving into the various attacks. To understand the security model, you must first consider that there are two types of GPG signatures that can be used. This point is a source of confusion for both APT repository hosts and users of APT.

The two types of GPG signatures that can be used are:

  1. GPG signatures on the individual Debian packages themselves (which is not a useful activity, as explained next), and
  2. GPG signatures on repository metadata.

Unless explicitly stated otherwise, the term GPG signature in APT software and documentation almost always refers to the latter case.

GPG signing a Debian package does nothing because package signatures are not verified by default on any major distribution when packages are installed with apt-get install. See your /etc/dpkg/dpkg.cfg file for an explicit comment to this effect.

Here’s mine (Ubuntu 16.04):

# Do not enable debsig-verify by default; since the distribution is not using
# embedded signatures, debsig-verify would reject all packages.

That said…

If you want users to verify GPG signatures of your packages manually (the only way they could verify them), you should provide explicit instructions on how to do this along with an XML policy file that debsig-verify can consume to verify the package’s GPG signature.

It is possible there are Debian package providers doing this; we have yet to see anyone provide an XML policy document for verifying Debian package GPG signatures out in the wild.

When APT software (such as apt-get, or reprepro) or folks offering APT repositories mention GPG signatures in their documentation they are typically referring to GPG signatures on repository metadata, not on packages themselves. Likewise, when you configure the SignWith option of reprepro (documented here), you are telling reprepro to sign your repository metadata with the specified GPG key; this does not sign any of the packages, though.

GPG signing repository metadata is intended to ensure that the repository metadata has not been modified after it has been generated. This is the typical method employed by APT repositories and is explained in the SecureAPT wiki page.

Please keep this in mind when thinking about the APT security model and the attacks presented below.

The paper outlines a series of attacks against package managers, which we’ll examine in more detail as they relate to APT.

The attacks outlined against various package managers throughout the paper are as follows:

  1. Arbitrary package: an attacker can provide a malicious package in place of the package requested by the user.
  2. Replay Attack: an attacker can replay older versions of correctly signed repository metadata, causing a client to install a version of a package with a known vulnerability.
  3. Freeze Attack: an attacker can freeze the repository metadata a client sees at a particular moment in time preventing a client system from updating to a newer version of a package.
  4. Extraneous Dependencies: an attacker rewrites the package metadata to insert dependencies on additional packages that have known vulnerabilities.
  5. Endless Data: an attacker provides an endless stream of repository metadata or package data causing the client to fill its disk or memory and crash.
  6. The paper also outlines a series of attacks done against the mirroring system for a variety of package managers. These attacks are very interesting, as the paper explains several ways in which a malicious mirror could be constructed. We won’t be examining the mirror attacks in this blog post, but readers are encouraged to read the paper for more details and consider their usage of publicly available package repository mirrors.

We’ll examine how each of these relate to APT repositories and how these attacks can be avoided or prevented. Additionally, we’ll examine two other bugs that have surfaced more recently with very serious consequences:

  1. Improper error handling in APT during GPG signature verification leads to arbitrary code execution.
  2. apt-key silently fails to remove GPG keys requested by the user.


It is possible for an attacker to cause a client of an APT repository to install an arbitrary package in three circumstances:

  1. If the repository is served over plain-text HTTP with no GPG signatures, the repository metadata can be trivially replaced by an attacker who can perform a Man-in-the-middle (MitM) attack to cause the client to install an arbitrary package.
  2. If the repository is served over plain-text HTTP with GPG signed repository metadata to a client running any APT version prior to 1.1 (e.g. Ubuntu 14.04 or earlier, Debian Jessie or earlier), or a more recent version of APT with Acquire::AllowDowngradeToInsecureRepositories set to true in apt.conf (the default is false, thankfully), an attacker who can perform a MitM can simply block requests for the GPG signed metadata and respond with hand-crafted unsigned metadata pointing at arbitrary packages.
  3. If the repository is served over plain-text HTTP with GPG signed repository metadata to a client running any APT version with Acquire::AllowInsecureRepositories set to true (which is the default in Ubuntu 16.04), and the repository has not previously been successfully authenticated, an attacker who can perform a MitM can respond with hand-crafted unsigned metadata pointing at arbitrary packages.

In order to prevent this type of attack you should do the following:

  1. Always use TLS for serving APT repositories. This will prevent an attacker from being able to perform a MitM.
  2. Set the Acquire::AllowInsecureRepositories to false explicitly. This option will prevent APT from using repositories that do not present GPG signed metadata, thereby preventing a downgrade attack. This option is present in APT 1.1 or newer (available on Ubuntu 16.04 or newer, and Debian Stretch or newer), but is defaulted to true for “backward compatibility” on several versions of APT. We strongly recommend setting this value explicitly to false.
  3. Set Acquire::AllowDowngradeToInsecureRepositories to false. This option is available in APT 1.1 or newer. Note that this option will only help if you’ve already been able to connect to a repository and download the GPG signed metadata at least once. This is better than nothing, but options 1 and 2 should be used first (see the apt.conf sketch after this list).
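Taken together, a minimal apt.conf sketch looks like this (the file path is illustrative; any file under /etc/apt/apt.conf.d/ works):

// /etc/apt/apt.conf.d/99-secure-repos
// refuse unsigned repository metadata
Acquire::AllowInsecureRepositories "false";
// never fall back from signed to unsigned metadata
Acquire::AllowDowngradeToInsecureRepositories "false";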

If you are using a version of APT prior to 1.1, you should upgrade immediately. If your system does not provide a suitable version of APT, we provide a backported APT 1.2.10 for both Ubuntu Precise and Ubuntu Trusty, which includes these flags and additional recently added fixes for other APT bugs.

If you are using any version of APT, you should set Acquire::AllowInsecureRepositories to false in your apt.conf, because some versions of APT have this defaulted to true.

If you are hosting an APT repository for your users, customers, or clients: serve that repository over TLS, only.

An attacker who can perform a MitM attack on a client can replay valid APT metadata (along with the valid GPG signature for the metadata) to perform two types of attacks:

  1. A replay attack where a client system is presented with APT repository metadata pointing to older versions of packages with known security vulnerabilities. APT will ask the user to confirm a downgrade, but automation tools (bash scripts, Chef, Puppet, etc) may swallow these warnings. It is typical to find scripts with the flag --force-yes being passed to APT.
  2. A freeze attack where a client system is presented with APT repository metadata pointing to their current package version. This attack is sustained for a period of time so that clients don’t “see” new versions of packages when they are published in the future. This gives the attacker the opportunity to use exploits as they are discovered later.

In order to prevent these types of attacks there are several things you can do:

  1. Always use TLS for serving APT repositories. This will prevent an attacker from being able to perform a MitM.
  2. Repository hosts should ensure that the Valid-Until field is present in their generated APT metadata. If you use reprepro to generate APT metadata, you can enable this by setting the ValidFor option in your conf/distributions file. Of course, this will mean you need to regenerate your APT metadata before it expires so that you don’t break any clients (possibly with a cron job, or some other mechanism). You’ll need to choose a “reasonable” value for this, because replay attacks will still be possible until the expiration of the metadata.
  3. APT clients should set the Acquire::Max-ValidTime option in their apt.conf to a reasonable value to guard against remote APT hosts with Valid-Until set to unreasonably large values. Additionally, APT clients should ensure the system clock is accurate (i.e., run NTP).

The Valid-Until field will only prevent replay attacks after the metadata has expired. The metadata can still be replayed until it expires, so this field does not protect against freeze attacks. Nevertheless, APT repository hosts who use this field should choose a short Valid-Until time. If you are using a tool like reprepro, this can be set by configuring the ValidFor option in conf/distributions. This also means APT repository hosts must regenerate their repository metadata regularly, even if no new packages are published, to prevent breaking clients.

Also note that the Valid-Until field relies on an accurate system clock on the client. APT clients should ensure their system clock is accurate by using a service like NTP.
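As a sketch (the codename and durations are illustrative), a reprepro conf/distributions stanza that emits a Valid-Until field one week out:

Codename: stretch
Components: main
Architectures: amd64
SignWith: yes
ValidFor: 7d

and the matching client-side cap in apt.conf, expressed in seconds:

Acquire::Max-ValidTime "604800";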

Using TLS is the best option to solve this because:

  1. You do not need to worry about regenerating your APT repository repeatedly to keep it up to date, even if no new packages are added.
  2. You do not run the risk of a freeze attack during the metadata’s validity window.
  3. If your clients are independent entities, you may not be able to guarantee that their system clocks are accurate.


An attacker can provide hand crafted repository metadata that adds malicious dependencies to packages listed in the APT repository metadata in the following circumstances:

  1. Plain text HTTP APT repositories with no GPG signatures.
  2. Plain text HTTP APT repositories with GPG signatures that are served to clients running a version of APT that allows downgrade attacks or has Acquire::AllowDowngradeToInsecureRepositories set to true.
  3. Plain text HTTP APT repositories with GPG signatures that are served to clients that do not allow downgrade attacks, but for which the client has not yet cached a copy of the signed metadata.

This attack is particularly painful because the original package the user requested is installed, giving the user the impression that they have correctly installed their desired software.

The best way to fix this issue is to host your APT repository via TLS.

Alternatively, APT may be modified to verify that dependencies specified in the repository metadata are checked against the dependencies listed in the package itself when it is downloaded and reject any repositories with mismatches.

The APT client on Ubuntu Xenial (16.04.3) was vulnerable to this attack.

A malicious APT repository can provide repository metadata that is extremely large. APT clients will download this metadata and fill the disk.

Reproducing this is relatively straightforward:

  1. Create an APT repository on the server.
  2. Remove the Release files and replace the Packages files with extremely large files. We used zero-filled files generated with dd. We reproduced this bug by replacing the Packages file with a 10GB file that was zero filled.
  3. Run apt-get update on the client.
  4. Watch the disk fill up and the errors begin scrolling from APT when there is no space left on the disk.

Whether or not this type of attack would result in your system failing depends on the operational parameters of the software being run on that system; however, it is safe to assume that this could lead to a denial of service.

Fixing this might be a bit tricky as different folks may have different preferences on how much disk space used is too much, but either way patching APT to check the Content-Length HTTP header (if it exists) to determine if the metadata being requested will actually fit on the disk prior to attempting to download it is probably a good first step.

Without a custom patched APT client, the easiest way to guard against an attack like this would be to:

  1. Ensure that Acquire::AllowInsecureRepositories is explicitly set to false. It is defaulted to true on Ubuntu 16.04.3. This would prevent downloading repository metadata if a Release file is missing.
  2. Ensure you are using adequate system monitoring software to alert you when your disk reaches low disk space.
  3. Perhaps consider writing a script or monitoring check that watches the size of the APT cache files in /var/lib/apt and /var/cache/apt, alerting based on their size and the free disk space available.


On December 13, 2016 the debian-security-announce list announced a security vulnerability which leads to arbitrary code execution due to a bug in the GPG signature verification code in APT:

An attacker able to man-in-the-middle HTTP requests to an apt repository that uses InRelease files (clearsigned Release files), can take advantage of this flaw to circumvent the signature of the InRelease file, leading to arbitrary code execution.

It is recommended that all APT users ensure they are running the latest version of APT which has patched this issue.

It is recommended that all APT repository hosts serve their APT repositories over TLS to prevent MitM attacks which can expose users to this and many other attacks, as described previously in this blog post.

The apt-key utility is typically used to manage GPG keys installed on a Debian or Ubuntu system which are in turn used to verify repository metadata.

It is critical to be able to add and remove GPG keys to ensure that your system will be able to verify the integrity of the GPG signed APT repositories that APT connects and downloads metadata from, especially if no other security mechanisms (like TLS) are in place.

Unfortunately, some versions of apt-key silently fail to remove GPG keys. The second comment on the bug report explains the impact of this bug:

What’s not obvious from the original post, though, is that apt-key reports back that it deleted the key “OK” - but actually did not […] This can result in users and applications alike meaning to revoke trust on an APT archive keyring, being told they succeeded in doing so, but actually failing.

As such, from my perspective, this is a security bug.

This bug is currently marked as triaged, with one user reporting that they are still affected in 17.10. It remains to be seen if the next Ubuntu LTS, 18.04, will also be affected by this issue.

In the meantime, users are strongly encouraged to manually check their APT key rings in /etc/apt/trusted.gpg and /etc/apt/trusted.gpg.d/ and verify that GPG keys they revoke are actually revoked.
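A quick sketch of how to audit this (both commands are standard; adjust the keyring path for files under /etc/apt/trusted.gpg.d/):

# list all keys APT currently trusts
apt-key list

# or inspect a keyring directly with gpg
gpg --no-default-keyring --keyring /etc/apt/trusted.gpg --list-keys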

Having a stale or untrustworthy GPG key in your APT keyring could be extremely detrimental, so it is worth the extra effort to manually verify GPG keys have been correctly removed until this bug is fixed.

Always serve and access APT repositories over TLS. No exceptions; there are simply too many potential attack vectors and bugs in the GPG verification system provided by APT to use only GPG with plain text HTTP.

Using GPG signatures on APT repositories in addition to serving the repository over TLS is, of course, fine and encouraged.

Our overall recommendations for APT clients are:

  • Always use TLS for serving your APT repositories. Refuse to connect to repositories that are served over plain text HTTP.
  • Audit the GPG keys installed on your system to ensure the keys you’ve attempted to remove have actually been removed.
  • Audit your apt.conf settings to ensure that you have Acquire::AllowDowngradeToInsecureRepositories explicitly set to false.
  • Audit your apt.conf settings to ensure that you have Acquire::AllowInsecureRepositories explicitly set to false.
  • Ensure you have proper monitoring of your disk space to guard against “endless data” attacks.

Our overall recommendations for APT repository hosts are:

  • Use our service and avoid dealing with this yourself, or
  • Always serve your APT repositories over TLS


Oxford English Dictionary extends hunt for regional words around the world


The Oxford English Dictionary is asking the public to help it mine the regional differences of English around the world to expand its record of the language, with early submissions ranging from New Zealand’s “munted” to Hawaii’s “hammajang”.

Last year, a collaboration between the OED, the BBC and the Forward Arts Foundation to find and define local English words resulted in more than 100 new regional words and phrases being added to the dictionary, from Yorkshire’s “ee bah gum” to the north east’s “cuddy wifter”, a left-handed person. Now, the OED is widening its search to English speakers around the world, with associate editor Eleanor Maier calling the early response “phenomenal”, as editors begin to draft a range of suggestions for inclusion in the dictionary.

These range from Hawaii’s “hammajang”, meaning “in a disorderly or shambolic state”, to the Scottish word for a swimming costume, “dookers” or “duckers”, and New Zealand’s “munted”, meaning “broken or wrecked”. The OED is also looking to include the word “chopsy”, a Welsh term for an overly talkative person; “frog-drowner”, which Americans might use to describe a torrential downpour of rain; “brick”, which means “very cold” to residents of New Jersey and New York City; and “round the Wrekin”, meaning “in a lengthy or roundabout manner” in the Midlands.

The dictionary has already found that, depending on location, a picture hanging askew might be described as “agley”, “catawampous”, “antigodlin” or “ahoo” by an English speaker, while a loved one could be called a “doy”, “pet”, “dou-dou”, “bubele”, “alanna” or “babber”.

“The OED aims to cover all types of English, including standard English, scientific and technical vocabulary, literary words, slang, and regionalisms. So it’s important to include these words to enable us to present a picture of the English language in all its forms,” said Maier.

The Words Where You Are appeal is looking for more suggestions. These words will go alongside the regional words suggested by members of the UK public last year, when BBC Radio listeners were asked to send in their local turns of phrase, which were later included in poems by authors including Liz Berry and Hollie McNish for a National Poetry Day project.

“We were surprised and pleased by the number of regional words we were able to include as a result,” said Maier. “With the public’s suggestions as a starting point we were able to unearth a rich seam of regional vocabulary.”

Some of the words suggested in the UK date back centuries, such as “zamzawed” – Devon’s term for food or a meal that has been spoiled by overcooking – to more recent coinages such as “jarg”, used in Liverpool to refer to something false or misleading. Other additions now in the OED include “antwacky”, meaning old-fashioned, and “barry”, meaning great, with more to come.

Maier said that it can be difficult for the OED’s lexicographers to identify regional words, as they are more often spoken than written down, and the editors require citable evidence to include a new definition.

“In recent years, resources such as Twitter have been a great way for us to monitor the words that people are using informally in particular parts of the world and this, combined with targeted appeals, allows a lot more of these words to be identified and researched,” she said.

“Tarzy”, for example, is a Middlesbrough word meaning a makeshift rope swing used to swing over a river or stream. Lexicographers have so far dated it back to 2003, but Maier said that it is likely to have been used before that, with a colleague’s mother remembering using it as a child in the 1970s. The word can only have arisen in the 20th century, however, as it derives from Edgar Rice Burroughs’s character Tarzan.

“Regional words indicate that their users come from a particular place and often contribute to one’s sense of identity,” said Maier. “You know you are home when words such as tarzy … can be used in the knowledge that they will be understood.”

Regional vocabulary has been included in the OED since its first edition, with many of the public’s suggestions, such as “ginnel” (an alleyway), “grockle” (a tourist), “far-welted” (describing a sheep on its back), “nesh” (cold, susceptible to cold), “clarty” (very muddy), “sneck” (a latch), “kaylied” (drunk), and “throng” (busy), already listed.

Tesla sues ex-employee for hacking, theft, and leaking to the press


Tesla filed a lawsuit on Wednesday alleging that a former employee hacked the company’s system and transferred “gigabytes” of data to unnamed third parties. The lawsuit may align with Tesla CEO Elon Musk’s recent claim of “sabotage” by an ex-worker.

Tesla’s lawsuit names Martin Tripp, a former process technician at the company’s Gigafactory in Nevada, as the defendant. Tesla claims that Tripp “unlawfully hacked the company’s confidential and trade secret information and transferred that information to third parties.” To do so, Tripp placed “hacking software” in the computers of three individual employees to routinely export confidential data, the company says.

Tesla says that Tripp has already admitted to hacking its manufacturing operating system, and the company accuses him of making false statements to the media about the stolen information. The suit, which was filed in federal court in Nevada, was first reported by CNBC.

Tesla claims that Tripp was the source of a news report about punctured battery cells at Tesla’s factory.

For example, Tripp claimed that punctured battery cells had been used in certain Model 3 vehicles even though no punctured cells were ever used in vehicles, batteries or otherwise. Tripp also vastly exaggerated the true amount and value of “scrap” material that Tesla generated during the manufacturing process, and falsely claimed that Tesla was delayed in bringing new manufacturing equipment online.

The lawsuit doesn’t specify which media outlet Tripp leaked to, but earlier this month, Business Insider ran a story about scrap metal waste and punctured battery cells at Tesla’s factory, citing documents and former employees as sources.

Musk first alluded to the incidents described in the lawsuit last week in an email to Tesla employees in which he claimed Tesla was being sabotaged by an ex-employee. The employee was allegedly found to have made “direct code changes” to Tesla’s manufacturing system and sent “large amounts of highly sensitive” data to third parties.

Musk also questioned whether “Wall Street short-sellers,” “oil & gas companies,” or “the multitude of big gas/diesel car company competitors” may have been involved. “If they’re willing to cheat so much about emissions, maybe they’re willing to cheat in other ways?” he wrote.

Tesla has struggled to ramp up production of its Model 3 vehicle, encountering bottlenecks in its factory and supply chain. The company is scrambling to produce 5,000 Model 3s a week ahead of its third-quarter earnings, as it attempts to evolve from a niche luxury electric car company into a mass-market player.

A lawyer for Tripp could not be immediately reached, and a Tesla spokesperson declined to comment.

Synergy-core: Open source core of Synergy, the keyboard and mouse sharing tool

YAML: probably not so great after all


I previously wrote why using JSON for human-editable configuration files is a bad idea. Today we’re going to look at some of the problems with YAML.

Insecure by default

YAML is insecure by default. Loading a user-provided (untrusted) YAML string needs careful consideration. Take this a.yaml:

!!python/object/apply:os.system
args: ['ls /']

Running it with print(yaml.load(open('a.yaml'))) should give you something like:

bin   etc   lib    lost+found  opt   root  sbin  tmp  var sys
boot  dev   efi    home        lib64 mnt   proc  run  srv usr
0

Many other languages (including Ruby and PHP) are also unsafe by default. Searching for yaml.load on GitHub gives a whopping 2.8 million results. yaml.safe_load only gives 26,000 results.

Mind you, many of those yaml.load()s are fine – loading in a config file with yaml.load() is often okay since it’s usually (though not always!) from a ‘trusted source’, and many are from test files with static YAML. But still, one can’t help but wonder how many exploits are hidden in those 2.8 million results.

This is not a theoretical problem. In 2013 every Ruby on Rails application ever written was found to be vulnerable to remote code execution due to exactly this problem.

One might argue this is not really the fault of the YAML format as such, but rather the fault of the libraries implementing it wrong; but it seems to be the case that the majority of libraries are unsafe by default (especially in the dynamic languages), so de facto it is a problem with YAML.

One might also argue that fixing it is as easy as replacing load() with safe_load(), but many people are unaware of the problem, and even if you’re aware of it, it’s one of those things that can be easy to forget. It’s pretty bad API design.
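The safe variant is a one-line change (a minimal sketch with PyYAML; safe_load constructs only plain data types, so the os.system document above raises yaml.constructor.ConstructorError instead of executing anything):

import yaml

# safe_load builds only dicts, lists, strings, numbers, booleans, dates, ...
with open('a.yaml') as fh:
    data = yaml.safe_load(fh)

print(data)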

Can be hard to edit, especially for large files

YAML files can be hard to edit, and this difficulty grows fast as the file gets larger.

A good example of this are Ruby on Rails’ translation files; for example:

en:
  formtastic:
    labels:
      title: "Title"  # Default global value
      article:
        body: "Article content"
      post:
        new:
          title: "Choose a title..."
          body: "Write something..."
        edit:
          title: "Edit title"
          body: "Edit body"

This still looks okay, right? But what if this file has 100 lines? Or 1,000 lines? It is difficult to see “where” in the file you are because it may be off the screen. You’ll need to scroll up, but then you need to keep track of the indentation, which can be pretty hard even with indentation guides, especially since 2-space indentation is the norm and tab indentation is forbidden.

And accidentally getting the indentation wrong often isn’t an error; it will often just deserialize to something you didn’t intend. Happy debugging!

I’ve been happily programming Python for over a decade, so I’m used to significant whitespace, but I still sometimes struggle with YAML. In Python the drawbacks and loss of clarity are contained by not having functions that are several pages long, but data or configuration files have no such natural limit on their length.

For small files this is not a problem, but it really doesn’t scale well to larger files, especially not if you want to edit them later on.

It’s pretty complex

YAML may seem ‘simple’ and ‘obvious’ when glancing at a basic example, but it turns out it’s not. The YAML spec is 23,449 words; for comparison, TOML is 838 words, JSON is 1,969 words, and XML is 20,603 words.

Who among us has read all that? Who among us has read and understood all of that? Who among us has read, understood, and remembered all of that?

For example, did you know there are nine ways to write a multi-line string in YAML, with subtly different behaviour?

Yeah :-/

That post gets even more interesting if you look at its revision history, as the author of the post discovers more and more ways to do this and more of the subtleties involved.
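As a taste, here are just two of them (the keys are illustrative): a literal block keeps line breaks, a folded block turns them into spaces.

# literal block: line breaks are preserved
description: |
  first line
  second line

# folded block: line breaks become spaces
summary: >
  first line
  second line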

It’s telling that the YAML spec starts with a preview, which states (emphases mine):

This section provides a quick glimpse into the expressive power of YAML. It is not expected that the first-time reader grok all of the examples. Rather, these selections are used as motivation for the remainder of the specification.

Surprising behaviour

What does this parse to (examples courtesy of Colm O’Connor):

- Don Corleone: Do you have faith in my judgment?
- Clemenza: Yes
- Don Corleone: Do I have your loyalty?

Yup!

[
	{'Don Corleone': 'Do you have faith in my judgment?'},
	{'Clemenza': True},
	{'Don Corleone': 'Do I have your loyalty?'}
]

Or what about:

python: 3.5.3
postgres: 9.3

9.3 gets recognized as a number, but 3.5.3 doesn’t:

{'python': '3.5.3', 'postgres': 9.3}

Or what about:

013: Tilburg
Effenaar: Eindhoven

013 is a popular music venue in Tilburg, but YAML will send you the wrong way:

{11: 'Tilburg', 'Effenaar': 'Eindhoven'}

All of this – and more – is why many experienced YAMLers will often quote all strings, even when it’s not strictly required.
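Quoting sidesteps the guessing in the snippets above; for example:

python: "3.5.3"
postgres: "9.3"      # a string now, not the float 9.3
"013": Tilburg       # the string "013", not the octal integer 11
Effenaar: Eindhoven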

It’s not portable

Because it’s so complex, its claims of portability have been greatly exaggerated. For example, consider this snippet taken from the YAML spec:

? - Detroit Tigers
  - Chicago cubs
:
  - 2001-07-23

? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]

Aside from the fact that most readers of this probably won’t even know what this does, try parsing it in Python with PyYAML:

yaml.constructor.ConstructorError: while constructing a mapping
  in "a.yaml", line 1, column 1
found unhashable key
  in "a.yaml", line 1, column 3

In Ruby it works:

{
	["Detroit Tigers", "Chicago cubs"] => [
		#<Date: 2001-07-23 ((2452114j,0s,0n),+0s,2299161j)>
	],
	["New York Yankees", "Atlanta Braves"] => [
		#<Date: 2001-07-02 ((2452093j,0s,0n),+0s,2299161j)>,
		#<Date: 2001-08-12 ((2452134j,0s,0n),+0s,2299161j)>,
		#<Date: 2001-08-14 ((2452136j,0s,0n),+0s,2299161j)>
	]
}

The reason for this is because you can’t use a list as a dict key in Python:

>>> {['a']: 'zxc'}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  TypeError: unhashable type: 'list'

And this restriction is not unique to Python; common languages such as PHP, JavaScript, and Go all share this restriction.

So use this in a YAML file, and you won’t be able to read it in most languages.

Here’s another example, again taken from the examples section of the YAML spec:

# Ranking of 1998 home runs
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey

# Team ranking
---
- Chicago Cubs
- St Louis Cardinals

Python says:

yaml.composer.ComposerError: expected a single document in the stream
  in "a.yaml", line 3, column 1
but found another document
  in "a.yaml", line 8, column 1

While Ruby outputs:

["Mark McGwire", "Sammy Sosa", "Ken Griffey"]

The reason for this is that there are multiple YAML documents in a single file (--- starts a new document). In Python there is the load_all() function to parse all documents. Ruby’s load() just loads the first document, and as near as I can tell, doesn’t have a way to load multiple documents.

I’m fairly sure that many more incompatibilities more subtle than this can be found. Are you sure that every YAML parser will treat foo:bar as a string, or 0x42 as the integer 42, etc.?

Goals achieved?

The spec states:

The design goals for YAML are, in decreasing priority:

  1. YAML is easily readable by humans.
  2. YAML data is portable between programming languages.
  3. YAML matches the native data structures of agile languages.
  4. YAML has a consistent model to support generic tools.
  5. YAML supports one-pass processing.
  6. YAML is expressive and extensible.
  7. YAML is easy to implement and use.

So how well does it do?

YAML is easily readable by humans.

True only if you stick to a small subset. The full set is complex – much more so than XML or JSON.

YAML data is portable between programming languages.

Not really true, as it’s too easy to create constructs that are not supported by common languages.

YAML matches the native data structures of agile languages.

See above. Plus, why only support agile (or dynamic) languages? What about other languages?

YAML has a consistent model to support generic tools.

I am not even sure what this means and I can’t find any elaboration.

YAML supports one-pass processing.

I’ll take their word for it.

YAML is expressive and extensible.

Well, it is, but it’s too expressive (i.e. too complex).

YAML is easy to implement and use.

$ cat `ls -1 ~/gocode/src/github.com/go-yaml/yaml/*.go | grep -v _test` | wc -l
9247

$ cat /usr/lib/python3.5/site-packages/yaml/*.py | wc -l
5713

Conclusion

Don’t get me wrong, it’s not like YAML is absolutely terrible – it’s certainly not as problematic as using JSON – but it’s not exactly great either. There are some drawbacks and surprises that are not at all obvious at first, and there are a number of better alternatives, such as TOML and other more specialized formats.

Personally, I’m not likely to use it again when I’ve got a choice.

If you must use YAML then I recommend you use StrictYAML, which removes some (though not all) of the more hairy parts.

Feedback

You can mail me at martin@arp242.net or create a GitHub issue for feedback, questions, etc.

A Life-Saving Checklist


The damage that the human body can survive these days is as awesome as it is horrible: crushing, burning, bombing, a burst blood vessel in the brain, a ruptured colon, a massive heart attack, rampaging infection. These conditions had once been uniformly fatal. Now survival is commonplace, and a large part of the credit goes to the irreplaceable component of medicine known as intensive care.

It’s an opaque term. Specialists in the field prefer to call what they do “critical care,” but that doesn’t exactly clarify matters. The non-medical term “life support” gets us closer. Intensive-care units take artificial control of failing bodies. Typically, this involves a panoply of technology—a mechanical ventilator and perhaps a tracheostomy tube if the lungs have failed, an aortic balloon pump if the heart has given out, a dialysis machine if the kidneys don’t work. When you are unconscious and can’t eat, silicone tubing can be surgically inserted into the stomach or intestines for formula feeding. If the intestines are too damaged, solutions of amino acids, fatty acids, and glucose can be infused directly into the bloodstream.

The difficulties of life support are considerable. Reviving a drowning victim, for example, is rarely as easy as it looks on television, where a few chest compressions and some mouth-to-mouth resuscitation always seem to bring someone with waterlogged lungs and a stilled heart coughing and sputtering back to life. Consider a case report in The Annals of Thoracic Surgery of a three-year-old girl who fell into an icy fishpond in a small Austrian town in the Alps. She was lost beneath the surface for thirty minutes before her parents found her on the pond bottom and pulled her up. Following instructions from an emergency physician on the phone, they began cardiopulmonary resuscitation. A rescue team arrived eight minutes later. The girl had a body temperature of sixty-six degrees, and no pulse. Her pupils were dilated and did not react to light, indicating that her brain was no longer working.

But the emergency technicians continued CPR anyway. A helicopter took her to a nearby hospital, where she was wheeled directly to an operating room. A surgical team put her on a heart-lung bypass machine. Between the transport time and the time it took to plug the inflow and outflow lines into the femoral vessels of her right leg, she had been lifeless for an hour and a half. By the two-hour mark, however, her body temperature had risen almost ten degrees, and her heart began to beat. It was her first organ to come back.

After six hours, her core temperature reached 98.6 degrees. The team tried to put her on a breathing machine, but the pond water had damaged her lungs too severely for oxygen to reach her blood. So they switched her to an artificial-lung system known as ECMO—extracorporeal membrane oxygenation. The surgeons opened her chest down the middle with a power saw and sewed lines to and from the ECMO unit into her aorta and her beating heart. The team moved the girl into intensive care, with her chest still open and covered with plastic foil. A day later, her lungs had recovered sufficiently for the team to switch her from ECMO to a mechanical ventilator and close her chest. Over the next two days, all her organs recovered except her brain. A CT scan showed global brain swelling, which is a sign of diffuse damage, but no actual dead zones. So the team drilled a hole into the girl’s skull, threaded in a probe to monitor her cerebral pressure, and kept that pressure tightly controlled by constantly adjusting her fluids and medications. For more than a week, she lay comatose. Then, slowly, she came back to life.

First, her pupils started to react to light. Next, she began to breathe on her own. And, one day, she simply awoke. Two weeks after her accident, she went home. Her right leg and left arm were partially paralyzed. Her speech was thick and slurry. But by age five, after extensive outpatient therapy, she had recovered her faculties completely. She was like any little girl again.

What makes her recovery astounding isn’t just the idea that someone could come back from two hours in a state that would once have been considered death. It’s also the idea that a group of people in an ordinary hospital could do something so enormously complex. To save this one child, scores of people had to carry out thousands of steps correctly: placing the heart-pump tubing into her without letting in air bubbles; maintaining the sterility of her lines, her open chest, the burr hole in her skull; keeping a temperamental battery of machines up and running. The degree of difficulty in any one of these steps is substantial. Then you must add the difficulties of orchestrating them in the right sequence, with nothing dropped, leaving some room for improvisation, but not too much.

For every drowned and pulseless child rescued by intensive care, there are many more who don’t make it—and not just because their bodies are too far gone. Machines break down; a team can’t get moving fast enough; a simple step is forgotten. Such cases don’t get written up in The Annals of Thoracic Surgery, but they are the norm. Intensive-care medicine has become the art of managing extreme complexity—and a test of whether such complexity can, in fact, be humanly mastered.

On any given day in the United States, some ninety thousand people are in intensive care. Over a year, an estimated five million Americans will be, and over a normal lifetime nearly all of us will come to know the glassed bay of an I.C.U. from the inside. Wide swaths of medicine now depend on the life-support systems that I.C.U.s provide: care for premature infants; victims of trauma, strokes, and heart attacks; patients who have had surgery on their brain, heart, lungs, or major blood vessels. Critical care has become an increasingly large portion of what hospitals do. Fifty years ago, I.C.U.s barely existed. Today, in my hospital, a hundred and fifty-five of our almost seven hundred patients are, as I write this, in intensive care. The average stay of an I.C.U. patient is four days, and the survival rate is eighty-six per cent. Going into an I.C.U., being put on a mechanical ventilator, having tubes and wires run into and out of you, is not a sentence of death. But the days will be the most precarious of your life.

A decade ago, Israeli scientists published a study in which engineers observed patient care in I.C.U.s for twenty-four-hour stretches. They found that the average patient required a hundred and seventy-eight individual actions per day, ranging from administering a drug to suctioning the lungs, and every one of them posed risks. Remarkably, the nurses and doctors were observed to make an error in just one per cent of these actions—but that still amounted to an average of two errors a day with every patient. Intensive care succeeds only when we hold the odds of doing harm low enough for the odds of doing good to prevail. This is hard. There are dangers simply in lying unconscious in bed for a few days. Muscles atrophy. Bones lose mass. Pressure ulcers form. Veins begin to clot off. You have to stretch and exercise patients’ flaccid limbs daily to avoid contractures, give subcutaneous injections of blood thinners at least twice a day, turn patients in bed every few hours, bathe them and change their sheets without knocking out a tube or a line, brush their teeth twice a day to avoid pneumonia from bacterial buildup in their mouths. Add a ventilator, dialysis, and open wounds to care for, and the difficulties only accumulate.

The story of one of my patients makes the point. Anthony DeFilippo was a forty-eight-year-old limousine driver from Everett, Massachusetts, who started to hemorrhage at a community hospital during surgery for a hernia and gallstones. The bleeding was finally stopped but his liver was severely damaged, and over the next few days he became too sick for the hospital’s facilities. When he arrived in our I.C.U., at 1:30 A.M. on a Sunday, his ragged black hair was plastered to his sweaty forehead, his body was shaking, and his heart was racing at a hundred and fourteen beats a minute. He was delirious from fever, shock, and low oxygen levels.

“I need to get out!” he cried. “I need to get out!” He clawed at his gown, his oxygen mask, the dressings covering his abdominal wound.

“Tony, it’s all right,” a nurse said to him. “We’re going to help you. You’re in a hospital.”

He shoved her—he was a big man—and tried to swing his legs out of the bed. We turned up his oxygen flow, put his wrists in cloth restraints, and tried to reason with him. He eventually let us draw blood from him and give him antibiotics.

The laboratory results came back showing liver failure, and a wildly elevated white-blood-cell count indicating infection. It soon became evident from his empty urine bag that his kidneys had failed, too. In the next few hours, his blood pressure fell, his breathing worsened, and he drifted from agitation to near-unconsciousness. Each of his organ systems, including his brain, was shutting down.

I called his sister, who was his next of kin, and told her of the situation. “Do everything you can,” she said.

So we did. We gave him a syringeful of anesthetic, and a resident slid a breathing tube into his throat. Another resident “lined him up.” She inserted a thin, two-inch-long needle and catheter through his upturned right wrist and into his radial artery, and then sewed the line to his skin with a silk suture. Next, she put in a central line—a twelve-inch catheter pushed into the jugular vein in his left neck. After she sewed that in place, and an X-ray showed its tip floating just where it was supposed to—inside his vena cava at the entrance to his heart—she put a third, slightly thicker line, for dialysis, through his right upper chest and into the subclavian vein, deep under the collarbone.

We hooked a breathing tube up to a hose from a ventilator and set it to give him fourteen forced breaths of a hundred-per-cent oxygen every minute. We dialled the ventilator pressures and gas flow up and down, like engineers at a control panel, until we got the blood levels of oxygen and carbon dioxide where we wanted them. The arterial line gave us continuous arterial blood-pressure measurements, and we tweaked his medications to get the pressures we liked. We regulated his intravenous fluids according to venous-pressure measurements from his jugular line. We plugged his subclavian line into tubing from a dialysis machine, and every few minutes his entire blood volume washed through this artificial kidney and back into his body; a little adjustment here and there, and we could alter the levels of potassium and bicarbonate and salt in his body as well. He was, we liked to imagine, a simple machine in our hands.

But he wasn’t, of course. It was as if we had gained a steering wheel and a few gauges and controls, but on a runaway eighteen-wheeler hurtling down a mountain. Keeping his blood pressure normal was requiring gallons of intravenous fluid and a pharmacy shelf of drugs. He was on near-maximal ventilator support. His temperature climbed to a hundred and four degrees. Less than five per cent of patients with his degree of organ failure make it home. And a single misstep could easily erase those slender chances.

For ten days, though, all went well. His chief problem had been liver damage from the operation he’d had. The main duct from his liver was severed and was leaking bile, which is caustic—it digests the fat in one’s diet and was essentially eating him alive from the inside. He had become too sick to survive an operation to repair the leak. So we tried a temporary solution—we had radiologists place a plastic drain, using X-ray guidance, through his abdominal wall and into the severed duct in order to draw the leaking bile out of him. They found so much that they had to place three drains—one inside the duct and two around it. But, as the bile drained out, his fevers subsided. His requirements for oxygen and fluids diminished. His blood pressure returned to normal. He was on the mend. Then, on the eleventh day, just as we were getting ready to take him off the mechanical ventilator, he developed high, spiking fevers, his blood pressure sank, and his blood-oxygen levels plummeted again. His skin became clammy. He got shaking chills.

We didn’t understand what had happened. He seemed to have developed an infection, but our X-rays and CT scans failed to turn up a source. Even after we put him on four antibiotics, he continued to spike fevers. During one fever, his heart went into fibrillation. A Code Blue was called. A dozen nurses and doctors raced to his bedside, slapped electric paddles onto his chest, and shocked him. His heart responded, fortunately, and went back into rhythm. It took two more days for us to figure out what had gone wrong. We considered the possibility that one of his lines had become infected, so we put in new lines and sent the old ones to the lab for culturing. Forty-eight hours later, the results returned: all of them were infected. The infection had probably started in one line, perhaps contaminated during insertion, and spread through his bloodstream to the others. Then they all began spilling bacteria into him, producing his fevers and steep decline.

This is the reality of intensive care: at any point, we are as apt to harm as we are to heal. Line infections are so common that they are considered a routine complication. I.C.U.s put five million lines into patients each year, and national statistics show that, after ten days, four per cent of those lines become infected. Line infections occur in eighty thousand people a year in the United States, and are fatal between five and twenty-eight per cent of the time, depending on how sick one is at the start. Those who survive line infections spend on average a week longer in intensive care. And this is just one of many risks. After ten days with a urinary catheter, four per cent of American I.C.U. patients develop a bladder infection. After ten days on a ventilator, six per cent develop bacterial pneumonia, resulting in death forty to fifty-five per cent of the time. All in all, about half of I.C.U. patients end up experiencing a serious complication, and, once a complication occurs, the chances of survival drop sharply.

It was a week before DeFilippo recovered sufficiently from his infections to come off the ventilator, and it was two months before he left the hospital. Weak and debilitated, he lost his limousine business and his home, and he had to move in with his sister. The tube draining bile still dangled from his abdomen; when he was stronger, I was going to have to do surgery to reconstruct the main bile duct from his liver. But he survived. Most people in his situation do not.

Here, then, is the puzzle of I.C.U. care: you have a desperately sick patient, and in order to have a chance of saving him you have to make sure that a hundred and seventy-eight daily tasks are done right—despite some monitor’s alarm going off for God knows what reason, despite the patient in the next bed crashing, despite a nurse poking his head around the curtain to ask whether someone could help “get this lady’s chest open.” So how do you actually manage all this complexity? The solution that the medical profession has favored is specialization.

I tell DeFilippo’s story, for instance, as if I were the one tending to him hour by hour. But that was actually Max Weinmann, an intensivist (as intensive-care specialists like to be called). I want to think that, as a general surgeon, I can handle most clinical situations. But, as the intricacies involved in intensive care have mounted, responsibility has increasingly shifted to super-specialists like him. In the past decade, training programs focussed on critical care have opened in every major American city, and half of I.C.U.s now rely on super-specialists.

Expertise is the mantra of modern medicine. In the early twentieth century, you needed only a high-school diploma and a one-year medical degree to practice medicine. By the century’s end, all doctors had to have a college degree, a four-year medical degree, and an additional three to seven years of residency training in an individual field of practice—pediatrics, surgery, neurology, or the like. Already, though, this level of preparation has seemed inadequate to the new complexity of medicine. After their residencies, most young doctors today are going on to do fellowships, adding one to three further years of training in, say, laparoscopic surgery, or pediatric metabolic disorders, or breast radiology—or critical care. A young doctor is not so young nowadays; you typically don’t start in independent practice until your mid-thirties.

We now live in the era of the super-specialist—of clinicians who have taken the time to practice at one narrow thing until they can do it better than anyone who hasn’t. Super-specialists have two advantages over ordinary specialists: greater knowledge of the details that matter and an ability to handle the complexities of the job. There are degrees of complexity, though, and intensive-care medicine has grown so far beyond ordinary complexity that avoiding daily mistakes is proving impossible even for our super-specialists. The I.C.U., with its spectacular successes and frequent failures, therefore poses a distinctive challenge: what do you do when expertise is not enough?

On October 30, 1935, at Wright Air Field in Dayton, Ohio, the U.S. Army Air Corps held a flight competition for airplane manufacturers vying to build its next-generation long-range bomber. It wasn’t supposed to be much of a competition. In early evaluations, the Boeing Corporation’s gleaming aluminum-alloy Model 299 had trounced the designs of Martin and Douglas. Boeing’s plane could carry five times as many bombs as the Army had requested; it could fly faster than previous bombers, and almost twice as far. A Seattle newspaperman who had glimpsed the plane called it the “flying fortress,” and the name stuck. The flight “competition,” according to the military historian Phillip Meilinger, was regarded as a mere formality. The Army planned to order at least sixty-five of the aircraft.

A small crowd of Army brass and manufacturing executives watched as the Model 299 test plane taxied onto the runway. It was sleek and impressive, with a hundred-and-three-foot wingspan and four engines jutting out from the wings, rather than the usual two. The plane roared down the tarmac, lifted off smoothly, and climbed sharply to three hundred feet. Then it stalled, turned on one wing, and crashed in a fiery explosion. Two of the five crew members died, including the pilot, Major Ployer P. Hill.

An investigation revealed that nothing mechanical had gone wrong. The crash had been due to “pilot error,” the report said. Substantially more complex than previous aircraft, the new plane required the pilot to attend to the four engines, a retractable landing gear, new wing flaps, electric trim tabs that needed adjustment to maintain control at different airspeeds, and constant-speed propellers whose pitch had to be regulated with hydraulic controls, among other features. While doing all this, Hill had forgotten to release a new locking mechanism on the elevator and rudder controls. The Boeing model was deemed, as a newspaper put it, “too much airplane for one man to fly.” The Army Air Corps declared Douglas’s smaller design the winner. Boeing nearly went bankrupt.

Still, the Army purchased a few aircraft from Boeing as test planes, and some insiders remained convinced that the aircraft was flyable. So a group of test pilots got together and considered what to do.

They could have required Model 299 pilots to undergo more training. But it was hard to imagine having more experience and expertise than Major Hill, who had been the U.S. Army Air Corps’ chief of flight testing. Instead, they came up with an ingeniously simple approach: they created a pilot’s checklist, with step-by-step checks for takeoff, flight, landing, and taxiing. Its mere existence indicated how far aeronautics had advanced. In the early years of flight, getting an aircraft into the air might have been nerve-racking, but it was hardly complex. Using a checklist for takeoff would no more have occurred to a pilot than to a driver backing a car out of the garage. But this new plane was too complicated to be left to the memory of any pilot, however expert.

With the checklist in hand, the pilots went on to fly the Model 299 a total of 1.8 million miles without one accident. The Army ultimately ordered almost thirteen thousand of the aircraft, which it dubbed the B-17. And, because flying the behemoth was now possible, the Army gained a decisive air advantage in the Second World War which enabled its devastating bombing campaign across Nazi Germany.

Medicine today has entered its B-17 phase. Substantial parts of what hospitals do—most notably, intensive care—are now too complex for clinicians to carry them out reliably from memory alone. I.C.U. life support has become too much medicine for one person to fly.

Yet it’s far from obvious that something as simple as a checklist could be of much help in medical care. Sick people are phenomenally more various than airplanes. A study of forty-one thousand trauma patients—just trauma patients—found that they had 1,224 different injury-related diagnoses in 32,261 unique combinations for teams to attend to. That’s like having 32,261 kinds of airplane to land. Mapping out the proper steps for each is not possible, and physicians have been skeptical that a piece of paper with a bunch of little boxes would improve matters much.

In 2001, though, a critical-care specialist at Johns Hopkins Hospital named Peter Pronovost decided to give it a try. He didn’t attempt to make the checklist cover everything; he designed it to tackle just one problem, the one that nearly killed Anthony DeFilippo: line infections. On a sheet of plain paper, he plotted out the steps to take in order to avoid infections when putting a line in. Doctors are supposed to (1) wash their hands with soap, (2) clean the patient’s skin with chlorhexidine antiseptic, (3) put sterile drapes over the entire patient, (4) wear a sterile mask, hat, gown, and gloves, and (5) put a sterile dressing over the catheter site once the line is in. Check, check, check, check, check. These steps are no-brainers; they have been known and taught for years. So it seemed silly to make a checklist just for them. Still, Pronovost asked the nurses in his I.C.U. to observe the doctors for a month as they put lines into patients, and record how often they completed each step. In more than a third of patients, they skipped at least one.

The next month, he and his team persuaded the hospital administration to authorize nurses to stop doctors if they saw them skipping a step on the checklist; nurses were also to ask them each day whether any lines ought to be removed, so as not to leave them in longer than necessary. This was revolutionary. Nurses have always had their ways of nudging a doctor into doing the right thing, ranging from the gentle reminder (“Um, did you forget to put on your mask, doctor?”) to more forceful methods (I’ve had a nurse bodycheck me when she thought I hadn’t put enough drapes on a patient). But many nurses aren’t sure whether this is their place, or whether a given step is worth a confrontation. (Does it really matter whether a patient’s legs are draped for a line going into the chest?) The new rule made it clear: if doctors didn’t follow every step on the checklist, the nurses would have backup from the administration to intervene.

Pronovost and his colleagues monitored what happened for a year afterward. The results were so dramatic that they weren’t sure whether to believe them: the ten-day line-infection rate went from eleven per cent to zero. So they followed patients for fifteen more months. Only two line infections occurred during the entire period. They calculated that, in this one hospital, the checklist had prevented forty-three infections and eight deaths, and saved two million dollars in costs.

Pronovost recruited some more colleagues, and they made some more checklists. One aimed to insure that nurses observe patients for pain at least once every four hours and provide timely pain medication. This reduced the likelihood of a patient’s experiencing untreated pain from forty-one per cent to three per cent. They tested a checklist for patients on mechanical ventilation, making sure that, for instance, the head of each patient’s bed was propped up at least thirty degrees so that oral secretions couldn’t go into the windpipe, and antacid medication was given to prevent stomach ulcers. The proportion of patients who didn’t receive the recommended care dropped from seventy per cent to four per cent; the occurrence of pneumonias fell by a quarter; and twenty-one fewer patients died than in the previous year. The researchers found that simply having the doctors and nurses in the I.C.U. make their own checklists for what they thought should be done each day improved the consistency of care to the point that, within a few weeks, the average length of patient stay in intensive care dropped by half.

The checklists provided two main benefits, Pronovost observed. First, they helped with memory recall, especially with mundane matters that are easily overlooked in patients undergoing more drastic events. (When you’re worrying about what treatment to give a woman who won’t stop seizing, it’s hard to remember to make sure that the head of her bed is in the right position.) A second effect was to make explicit the minimum, expected steps in complex processes. Pronovost was surprised to discover how often even experienced personnel failed to grasp the importance of certain precautions. In a survey of I.C.U. staff taken before introducing the ventilator checklists, he found that half hadn’t realized that there was evidence strongly supporting giving ventilated patients antacid medication. Checklists established a higher standard of baseline performance.

These are, of course, ridiculously primitive insights. Pronovost is routinely described by colleagues as “brilliant,” “inspiring,” a “genius.” He has an M.D. and a Ph.D. in public health from Johns Hopkins, and is trained in emergency medicine, anesthesiology, and critical-care medicine. But, really, does it take all that to figure out what house movers, wedding planners, and tax accountants figured out ages ago?

Pronovost is hardly the first person in medicine to use a checklist. But he is among the first to recognize its power to save lives and take advantage of the breadth of its possibilities. Forty-two years old, with cropped light-brown hair, tenth-grader looks, and a fluttering, finchlike energy, he is an odd mixture of the nerdy and the messianic. He grew up in Waterbury, Connecticut, the son of an elementary-school teacher and a math professor, went to nearby Fairfield University, and, like many good students, decided that he would go into medicine. Unlike many students, though, he found that he actually liked caring for sick people. He hated the laboratory—with all those micropipettes and cell cultures, and no patients around—but he had that scientific “How can I solve this unsolved problem?” turn of mind. So after his residency in anesthesiology and his fellowship in critical care, he studied clinical-research methods.

For his doctoral thesis, he examined intensive-care units in Maryland, and he discovered that putting an intensivist on staff reduced death rates by a third. It was the first time that someone had demonstrated the public-health value of using intensivists. He wasn’t satisfied with having proved his case, though; he wanted hospitals to change accordingly. After his study was published, in 1999, he met with a coalition of large employers known as the Leapfrog Group. It included companies like General Motors and Verizon, which were seeking to improve the standards of hospitals where their employees obtain care. Within weeks, the coalition announced that its members expected the hospitals they contracted with to staff their I.C.U.s with intensivists. These employers pay for health care for thirty-seven million employees, retirees, and dependents nationwide. So although hospitals protested that there weren’t enough intensivists to go around, and that the cost could be prohibitive, Pronovost’s idea effectively became an instant national standard.

The scientist in him has always made room for the campaigner. People say he is the kind of guy who, even as a trainee, could make you feel you’d saved the world every time you washed your hands properly. “I’ve never seen anybody inspire as he does,” Marty Makary, a Johns Hopkins surgeon, told me. “Partly, he has this contagious, excitable nature. He has a smile that’s tough to match. But he also has a way of making people feel heard. People will come to him with the dumbest ideas, and he’ll endorse them anyway. ‘Oh, I like that, I like that, I like that!’ he’ll say. I’ve watched him, and I still have no idea how deliberate this is. Maybe he really does like every idea. But wait, and you realize: he only acts on the ones he truly believes in.”

After the checklist results, the idea Pronovost truly believed in was that checklists could save enormous numbers of lives. He took his findings on the road, showing his checklists to doctors, nurses, insurers, employers—anyone who would listen. He spoke in an average of seven cities a month while continuing to work full time in Johns Hopkins’s I.C.U.s. But this time he found few takers.

There were various reasons. Some physicians were offended by the suggestion that they needed checklists. Others had legitimate doubts about Pronovost’s evidence. So far, he’d shown only that checklists worked in one hospital, Johns Hopkins, where the I.C.U.s have money, plenty of staff, and Peter Pronovost walking the hallways to make sure that the checklists are being used properly. How about in the real world—where I.C.U. nurses and doctors are in short supply, pressed for time, overwhelmed with patients, and hardly receptive to the idea of filling out yet another piece of paper?

In 2003, however, the Michigan Health and Hospital Association asked Pronovost to try out three of his checklists in Michigan’s I.C.U.s. It would be a huge undertaking. Not only would he have to get the state’s hospitals to use the checklists; he would also have to measure whether doing so made a genuine difference. But at last Pronovost had a chance to establish whether his checklist idea really worked.

This past summer, I visited Sinai-Grace Hospital, in inner-city Detroit, and saw what Pronovost was up against. Occupying a campus of red brick buildings amid abandoned houses, check-cashing stores, and wig shops on the city’s West Side, just south of 8 Mile Road, Sinai-Grace is a classic urban hospital. It has eight hundred physicians, seven hundred nurses, and two thousand other medical personnel to care for a population with the lowest median income of any city in the country. More than a quarter of a million residents are uninsured; three hundred thousand are on state assistance. That has meant chronic financial problems. Sinai-Grace is not the most cash-strapped hospital in the city—that would be Detroit Receiving Hospital, where a fifth of the patients have no means of payment. But between 2000 and 2003 Sinai-Grace and eight other Detroit hospitals were forced to cut a third of their staff, and the state had to come forward with a fifty-million-dollar bailout to avert their bankruptcy.

Sinai-Grace has five I.C.U.s for adult patients and one for infants. Hassan Makki, the director of intensive care, told me what it was like there in 2004, when Pronovost and the hospital association started a series of mailings and conference calls with hospitals to introduce checklists for central lines and ventilator patients. “Morale was low,” he said. “We had lost lots of staff, and the nurses who remained weren’t sure if they were staying.” Many doctors were thinking about leaving, too. Meanwhile, the teams faced an even heavier workload because of new rules limiting how long the residents could work at a stretch. Now Pronovost was telling them to find the time to fill out some daily checklists?

Tom Piskorowski, one of the I.C.U. physicians, told me his reaction: “Forget the paperwork. Take care of the patient.”

I accompanied a team on 7 A.M. rounds through one of the surgical I.C.U.s. It had eleven patients. Four had gunshot wounds (one had been shot in the chest; one had been shot through the bowel, kidney, and liver; two had been shot through the neck, and left quadriplegic). Five patients had cerebral hemorrhaging (three were seventy-nine years and older and had been injured falling down stairs; one was a middle-aged man whose skull and left temporal lobe had been damaged by an assault with a blunt weapon; and one was a worker who had become paralyzed from the neck down after falling twenty-five feet off a ladder onto his head). There was a cancer patient recovering from surgery to remove part of his lung, and a patient who had had surgery to repair a cerebral aneurysm.

The doctors and nurses on rounds tried to proceed methodically from one room to the next but were constantly interrupted: a patient they thought they’d stabilized began hemorrhaging again; another who had been taken off the ventilator developed trouble breathing and had to be put back on the machine. It was hard to imagine that they could get their heads far enough above the daily tide of disasters to worry about the minutiae on some checklist.

Yet there they were, I discovered, filling out those pages. Mostly, it was the nurses who kept things in order. Each morning, a senior nurse walked through the unit, clipboard in hand, making sure that every patient on a ventilator had the bed propped at the right angle, and had been given the right medicines and the right tests. Whenever doctors put in a central line, a nurse made sure that the central-line checklist had been filled out and placed in the patient’s chart. Looking back through their files, I found that they had been doing this faithfully for more than three years.

Pronovost had been canny when he started. In his first conversations with hospital administrators, he didn’t order them to use the checklists. Instead, he asked them simply to gather data on their own infection rates. In early 2004, they found, the infection rates for I.C.U. patients in Michigan hospitals were higher than the national average, and in some hospitals dramatically so. Sinai-Grace experienced more line infections than seventy-five per cent of American hospitals. Meanwhile, Blue Cross Blue Shield of Michigan agreed to give hospitals small bonus payments for participating in Pronovost’s program. A checklist suddenly seemed an easy and logical thing to try.

In what became known as the Keystone Initiative, each hospital assigned a project manager to roll out the checklists and participate in a twice-monthly conference call with Pronovost for trouble-shooting. Pronovost also insisted that each participating hospital assign to each unit a senior hospital executive, who would visit the unit at least once a month, hear people’s complaints, and help them solve problems.

The executives were reluctant. They normally lived in meetings worrying about strategy and budgets. They weren’t used to venturing into patient territory and didn’t feel that they belonged there. In some places, they encountered hostility. But their involvement proved crucial. In the first month, according to Christine Goeschel, at the time the Keystone Initiative’s director, the executives discovered that the chlorhexidine soap, shown to reduce line infections, was available in fewer than a third of the I.C.U.s. This was a problem only an executive could solve. Within weeks, every I.C.U. in Michigan had a supply of the soap. Teams also complained to the hospital officials that the checklist required that patients be fully covered with a sterile drape when lines were being put in, but full-size barrier drapes were often unavailable. So the officials made sure that the drapes were stocked. Then they persuaded Arrow International, one of the largest manufacturers of central lines, to produce a new central-line kit that had both the drape and chlorhexidine in it.

In December, 2006, the Keystone Initiative published its findings in a landmark article in The New England Journal of Medicine. Within the first three months of the project, the infection rate in Michigan’s I.C.U.s decreased by sixty-six per cent. The typical I.C.U.—including the ones at Sinai-Grace Hospital—cut its quarterly infection rate to zero. Michigan’s infection rates fell so low that its average I.C.U. outperformed ninety per cent of I.C.U.s nationwide. In the Keystone Initiative’s first eighteen months, the hospitals saved an estimated hundred and seventy-five million dollars in costs and more than fifteen hundred lives. The successes have been sustained for almost four years—all because of a stupid little checklist.

Pronovost’s results have not been ignored. He has since had requests to help Rhode Island, New Jersey, and the country of Spain do what Michigan did. Back in the Wolverine State, he and the Keystone Initiative have begun testing half a dozen additional checklists to improve care for I.C.U. patients. He has also been asked to develop a program for surgery patients. It has all become more than he and his small group of researchers can keep up with.

But consider: there are hundreds, perhaps thousands, of things doctors do that are at least as dangerous and prone to human failure as putting central lines into I.C.U. patients. It’s true of cardiac care, stroke treatment, H.I.V. treatment, and surgery of all kinds. It’s also true of diagnosis, whether one is trying to identify cancer or infection or a heart attack. All have steps that are worth putting on a checklist and testing in routine care. The question—still unanswered—is whether medical culture will embrace the opportunity.

Tom Wolfe’s “The Right Stuff” tells the story of our first astronauts, and charts the demise of the maverick, Chuck Yeager test-pilot culture of the nineteen-fifties. It was a culture defined by how unbelievably dangerous the job was. Test pilots strapped themselves into machines of barely controlled power and complexity, and a quarter of them were killed on the job. The pilots had to have focus, daring, wits, and an ability to improvise—the right stuff. But as knowledge of how to control the risks of flying accumulated—as checklists and flight simulators became more prevalent and sophisticated—the danger diminished, values of safety and conscientiousness prevailed, and the rock-star status of the test pilots was gone.

Something like this is going on in medicine. We have the means to make some of the most complex and dangerous work we do—in surgery, emergency care, and I.C.U. medicine—more effective than we ever thought possible. But the prospect pushes against the traditional culture of medicine, with its central belief that in situations of high risk and complexity what you want is a kind of expert audacity—the right stuff, again. Checklists and standard operating procedures feel like exactly the opposite, and that’s what rankles many people.

It’s ludicrous, though, to suppose that checklists are going to do away with the need for courage, wits, and improvisation. The body is too intricate and individual for that: good medicine will not be able to dispense with expert audacity. Yet it should also be ready to accept the virtues of regimentation.

The still limited response to Pronovost’s work may be easy to explain, but it is hard to justify. If someone found a new drug that could wipe out infections with anything remotely like the effectiveness of Pronovost’s lists, there would be television ads with Robert Jarvik extolling its virtues, detail men offering free lunches to get doctors to make it part of their practice, government programs to research it, and competitors jumping in to make a newer, better version. That’s what happened when manufacturers marketed central-line catheters coated with silver or other antimicrobials; they cost a third more, and reduced infections only slightly—and hospitals have spent tens of millions of dollars on them. But, with the checklist, what we have is Peter Pronovost trying to see if maybe, in the next year or two, hospitals in Rhode Island and New Jersey will give his idea a try.

Pronovost remains, in a way, an odd bird in medical research. He does not have the multimillion-dollar grants that his colleagues in bench science have. He has no swarm of doctoral students and lab animals. He’s focussed on work that is not normally considered a significant contribution in academic medicine. As a result, few other researchers are venturing to extend his achievements. Yet his work has already saved more lives than that of any laboratory scientist in the past decade.

I called Pronovost recently at Johns Hopkins, where he was on duty in an I.C.U. I asked him how long it would be before the average doctor or nurse is as apt to have a checklist in hand as a stethoscope (which, unlike checklists, has never been proved to make a difference to patient care).

“At the current rate, it will never happen,” he said, as monitors beeped in the background. “The fundamental problem with the quality of American medicine is that we’ve failed to view delivery of health care as a science. The tasks of medical science fall into three buckets. One is understanding disease biology. One is finding effective therapies. And one is insuring those therapies are delivered effectively. That third bucket has been almost totally ignored by research funders, government, and academia. It’s viewed as the art of medicine. That’s a mistake, a huge mistake. And from a taxpayer’s perspective it’s outrageous.” We have a thirty-billion-dollar-a-year National Institutes of Health, he pointed out, which has been a remarkable powerhouse of discovery. But we have no billion-dollar National Institute of Health Care Delivery studying how best to incorporate those discoveries into daily practice.

I asked him how much it would cost for him to do for the whole country what he did for Michigan. About two million dollars, he said, maybe three, mostly for the technical work of signing up hospitals to participate state by state and coördinating a database to track the results. He’s already devised a plan to do it in all of Spain for less.

“We could get I.C.U. checklists in use throughout the United States within two years, if the country wanted it,” he said.

So far, it seems, we don’t. The United States could have been the first to adopt medical checklists nationwide, but, instead, Spain will beat us. “I at least hope we’re not the last,” Pronovost said.

Recently, I spoke to Markus Thalmann, the cardiac surgeon on the team that saved the little Austrian girl who had drowned, and learned that a checklist had been crucial to her survival. Thalmann had worked for six years at the city hospital in Klagenfurt, the small provincial capital in south Austria where the girl was resuscitated. She was not the first person whom he and his colleagues had tried to revive from cardiac arrest after hypothermia and suffocation. They received between three and five such patients a year, he estimated, mostly avalanche victims (Klagenfurt is surrounded by the Alps), some of them drowning victims, and a few of them people attempting suicide by taking a drug overdose and then wandering out into the snowy forests to fall unconscious.

For a long time, he said, no matter how hard the medical team tried, it had no survivors. Most of the victims had gone without a pulse and oxygen for too long by the time they were found. But some, he felt, still had a flicker of viability in them, and each time the team failed to sustain it.

Speed was the chief difficulty. Success required having an array of equipment and people at the ready—helicopter-rescue personnel, trauma surgeons, an experienced cardiac anesthesiologist and surgeon, bioengineering support staff, operating and critical-care nurses, intensivists. Too often, someone or something was missing. So he and a couple of colleagues made and distributed a checklist. In cases like these, the checklist said, rescue teams were to tell the hospital to prepare for possible cardiac bypass and rewarming. They were to call, when possible, even before they arrived on the scene, as the preparation time could be significant. The hospital would then work down a list of people to be notified. They would have an operating room set up and standing by.

The team had its first success with the checklist in place—the rescue of the three-year-old girl. Not long afterward, Thalmann left to take a job at a hospital in Vienna. The team, however, was able to make at least two other such rescues, he said. In one case, a man was found frozen and pulseless after a suicide attempt. In another, a mother and her sixteen-year-old daughter were in an accident that sent them and their car through a guardrail, over a cliff, and into a mountain river. The mother died on impact; the daughter was trapped as the car rapidly filled with icy water. She had been in cardiac and respiratory arrest for a prolonged period of time when the rescue team arrived.

From that point onward, though, the system went like clockwork. By the time the rescue team got to her and began CPR, the hospital had been notified. The transport team got her there in minutes. The surgical team took her straight to the operating room and crashed her onto heart-lung bypass. One step went right after another. And, because of the speed with which they did, she had a chance.

As the girl’s body slowly rewarmed, her heart came back. In the I.C.U., a mechanical ventilator, fluids, and intravenous drugs kept her going while the rest of her body recovered. The next day, the doctors were able to remove her lines and tubes. The day after that, she was sitting up in bed, ready to go home. ♦

Keybase Exploding Messages

NEWS FLASH: Keybase Exploding Messages

June 20, 2018

Hi everyone. Starting today, you can put a short fuse on your Keybase chat messages. This works for 1-on-1 chats and also inside team, group, and community channels.

[Animated demo: "demonstration from my brother"]

We recommend your most intimate and most corporate messages get the exploding treatment. Each end of the professionalism spectrum. Don't put that dank meme or Income Statement in Slack, people. Put it in Keybase, so it's encrypted.

And if you'd like it to be temporary, make it explode.

Some details:

They blow up

When the time comes, KA-BOOM

When the countdown hits zero, everyone throws away the plaintext and ciphertext.

Exploding messages have forward secrecy

Forward secrecy means your apps use temporary "ephemeral" keys that are thrown away.

To be clear, all Keybase messages - not just exploding ones - are end-to-end encrypted. No one can read them without compromising one of your devices.

Forward secrecy is something extra. It protects you if someone in the future steals your device's key and also has access to old recorded messages, say by scooping up ciphertexts as a man-in-the-middle.

Forward secrecy works extra well with timed messages because your device destroys both the decryption keys and plaintext.
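
As a rough sketch of the idea (ours, not Keybase's actual protocol — it uses the PyNaCl library purely for illustration): both devices encrypt under throwaway keys and then delete them, so a recorded ciphertext stays unreadable even if the long-term device keys are stolen later.

from nacl.public import Box, PrivateKey

# Toy illustration of forward secrecy, NOT Keybase's real protocol.
# Each device mints a throwaway ("ephemeral") key pair for the conversation.
# In a real protocol the ephemeral public keys would be authenticated with
# the devices' long-term keys before use.
sender_eph = PrivateKey.generate()
receiver_eph = PrivateKey.generate()

# The message is encrypted under the two ephemeral keys only.
ciphertext = Box(sender_eph, receiver_eph.public_key).encrypt(
    b"this message will self-destruct"
)

# When the countdown hits zero, both sides throw away the ephemeral secrets
# (and the plaintext). Someone who later steals the long-term device keys and
# has recorded `ciphertext` has nothing left to decrypt it with.
del sender_eph, receiver_eph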

Splosions from the command line

You can also send exploding messages from your terminal of choice.*

keybase chat send --exploding-lifetime "30s" friend1234 \"Yo - meet @ 10pm under the powerlines. Bring the stuff"


keybase chat send --exploding-lifetime "6h" acmecorp.finance \"That is NO laughing matter, James."

* assuming sane choices

If you're writing a bot, you can pipe JSON to keybase chat api. Example:

{
  "method": "send",
  "params": {
    "options": {
      "channel": { "name": "vegetarians.bots" },
      "exploding_lifetime": "1d",
      "message": { "body": "smellbot has detected meat in the office foyer." }
    }
  }
}

You can get more API examples with keybase chat api --help.
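
If your bot happens to be written in Python, a minimal sketch along these lines should do the piping for you (this example is ours, not from the Keybase docs; it assumes the keybase binary is on your PATH and simply feeds the JSON payload above to keybase chat api on stdin):

import json
import subprocess

# Hypothetical bot sketch: send the exploding message by piping JSON into
# `keybase chat api`, equivalent to `echo '<json>' | keybase chat api`.
payload = {
    "method": "send",
    "params": {
        "options": {
            "channel": {"name": "vegetarians.bots"},
            "exploding_lifetime": "1d",
            "message": {"body": "smellbot has detected meat in the office foyer."},
        }
    },
}

result = subprocess.run(
    ["keybase", "chat", "api"],
    input=json.dumps(payload),
    capture_output=True,
    text=True,
    check=True,  # raise if the keybase CLI reports a failure
)
print(result.stdout)  # keybase chat api replies with a JSON result on stdout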

That's it!

Another step forward for Keybase. Cryptography for everyone.

More stuff soon,
💖 Keybase

INSTALL KEYBASE 🌈

FAQ

Are there docs on how this works?

Yes, here are the crypto docs. And here's the Keybase source code.

My team uses Telegram and I'm scared shitless.

Hugs

Why don't all messages have forward secrecy?

We've blogged about this before. It's not the right answer for a mainstream chat replacement, because it forces dangerous behavior and/or makes you lose all your group's messages when you update devices.

Still, it's up to you! Now you can have all the forward secrecy you want with Keybase.

What about repudiation?

Repudiation is launching in ~3 weeks.

So yes, in chats up to 100 people, messages will be pairwise MAC'ed. Pairwise MAC'ing doesn't scale for mega-large community chats, so those messages are still signed & encrypted in the traditional sense. This seems like the best balance.

Some have argued repudiation is a silly feature. After all, who, in the history of humanity, has ever not gotten in trouble for something that leaked, just because it wasn't cryptographically signed?

Anyway, this is minimal overhead with some benefit, so we're doing it.

Does the timer begin when the message is sent or received?

Sent.

This seems like the only sensible answer for group chats. And we can't have a different answer for 1-on-1 chats and group chats. That would confuse people. Not the kind of person who reads an FAQ such as yourself, of course.

So our answer is simple: you set a timer and the message is gone after that time.

I have nothing to hide

Because no one is trying to hurt you

INSTALL KEYBASE 🌈

How a Few People Took Equifax to Small Claims Court Over Its Data Breach and Won

The company failed to act on a March 2017 warning from federal authorities about a vulnerability that hackers would later exploit, resulting in the exposure of millions of consumers’ Social Security numbers, driver’s license numbers and other sensitive personal information.

Even after the breach, Equifax fumbled its response, briefly directing worried consumers to a fake, unaffiliated website. By its own admission, the security lapse had cost the company almost $243 million by the end of the first quarter of this year.

For many, those missteps were a motivation to act. No matter how futile it seemed to take on a multibillion-dollar corporation, local courts at least offered an opportunity to hold Equifax to account.

“They have a responsibility to either get back to you or lose,” Ms. West said of the small claims process. “That’s the most fascinating thing.”

For Ms. Bernstein, who runs a medical device consulting business, the Equifax breach was particularly frustrating because it threatened to interfere with her ability to apply for government grants. Because the grants often require credit checks, a credit freeze, which experts recommended after the breach, was out of the question.

“I really don’t need this aggravation with my credit and my phone number and, God knows, maybe my bank account numbers being leaked all over the internet,” she recalled thinking.

A judge in San Francisco Superior Court agreed, ordering Equifax to pay not only for the cost of credit monitoring, but also for the emotional distress and time Ms. Bernstein spent trying to get through to the company. On appeal, another judge reduced the award by about $2,000, setting it at $7,440.

In the end, Ms. Bernstein received restitution and relief: Last week, just days before receiving the Equifax check, Ms. Bernstein opened an envelope containing another check, this one for $15,000 for a grant she had just been awarded.

SEC issues warning against John McAfee’s crypto shilling

June 20, 2018 by Ben Riley

John McAfee is a 72-year-old businessman, best known for his work in the cybersecurity industry as the founder of McAfee Internet Security. He is the CEO of MGT Capital Investments Inc and, since 2017, has been a strong advocate in the Cryptocurrency space. With almost 830,000 followers on Twitter, John has a huge influence on the community and has made some bold price predictions for various Cryptocurrencies. His best-known prediction is still outstanding: he predicted a Bitcoin price of $500,000 USD by the end of 2020, a target he has since raised to $1,000,000 USD. He is a strong believer in the technology and continues to promote it to his audience. You can track this prediction at http://bircoin.top.

As John became more renowned for his influence in the Cryptocurrency space, sharing his knowledge and ideas, he began to promote Initial Coin Offerings (ICOs). ICOs are lucrative but high-risk investments and have been heavily scrutinised by the U.S. Securities and Exchange Commission (SEC). John began promoting coins and projects frequently on his Twitter account with a ‘coin of the day’ and a ‘coin of the week’. He later made it public that he was accepting $105,000 USD for every promotional tweet. After such tweets and recommendations, the prices of these coins often moved, and some referred to John as a ‘shill’.

On June 19th, 2018, John put out a tweet alerting the community that the SEC had warned him to stop promoting ICOs, as they are unregulated and often fraudulent. He stated that he is looking at documenting an alternative to ICOs which the SEC won’t be able to touch.

Despite this setback, John has been at work expanding on his plans to bring Cryptocurrency to the world and support the transition away from the regular financial system as we know it. He announced that he aims to run for president in 2020 to try to raise mass awareness of Cryptocurrencies. He’s also been looking into a ‘fiat’ currency backed by Cryptocurrency.

Although John McAfee is often perceived as an erratic and outgoing individual, I believe he only wants the best for Cryptocurrency, and an influence as large as his is highly beneficial for spreading awareness of this great technology, which should help boost its long-term success. Make sure to follow John and watch some of his interviews; this man has some great knowledge and will do great things for the industry.

Check out his Twitter here: https://twitter.com/officialmcafee

Follow my work on YouTube: https://youtube.com/c/thecryptogod

Is Serverless Just a New Word for Cloud Based?

June 20, 2018 | Posted In: Amazon RDS, AWS, Cloud and MySQL, Cloud and NoSQL, MariaDB, MongoDB, MySQL, PostgreSQL

Serverless is a new buzzword in the database industry. Even though it gets tossed around often, there is some confusion about what it really means and how it really works. Serverless architectures rely on third-party Backend as a Service (BaaS) services. They can also include custom code that is run in managed, ephemeral containers on a Functions as a Service (FaaS) platform. In comparison to traditional Platform as a Service (PaaS) server architecture, where you pay a predetermined sum for your instances, serverless applications benefit from reduced costs of operations and lower complexity. They are also considered to be more agile, allowing for reduced engineering efforts.

In reality, there are still servers in a serverless architecture: they are just being used, managed, and maintained outside of the application. But isn’t that a lot like what cloud providers, such as Amazon RDS, Google Cloud, and Microsoft Azure, are already offering? Well, yes, but with several caveats.

When you use any of the aforementioned platforms, you still need to provision the types of instances that you plan to use and define how those platforms will act. For example, will it run MySQL, MongoDB, PostgreSQL, or some other tool? With serverless, these decisions are no longer needed. Instead, you simply consume resources from a shared resource pool, using whatever application suits your needs at that time. In addition, in a serverless world, you are only charged for the time that you use the server instead of being charged whether you use it a lot or a little (or not at all).

Remember When You Joined That Gym?

How many of us have purchased a gym membership at some point in our life? Oftentimes, you walk in with the best of intentions and happily enroll in a monthly plan. “For only $29.95 per month, you can use all of the resources of the gym as much as you want.” But, many of us have purchased such a membership and found that our visits to the gym dwindle over time, leaving us paying the same monthly fee for less usage.

Traditional Database as a Service (DBaaS) offerings are similar to your gym membership: you sign up, select your service options, and start using them right away. There are certainly cases of companies using those services consistently, just like there are gym members who show up faithfully month after month. But there are also companies who spin up database instances for a specific purpose, use the database instance for some amount of time, and then slowly find that they are accessing that instance less and less. However, the fees for the instance, much like the fees for your gym membership, keep getting charged.

What if we had a “pay as you go” gym plan? Well, some of those certainly exist. Serverless architecture is somewhat like this plan: you only pay for the resources when you use them, and you only pay for your specific usage. This would be like charging $5 for access to the weight room and $3 for access to the swimming pool, each time you use one or the other. The one big difference with serverless architecture for databases is that you still need to have your data stored somewhere in the environment and made available to you as needed. This would be like renting a gym locker to store your workout gear so that you didn’t have to bring it back and forth each time you visited.

Obviously, you will pay for that storage, whether it is your data or your workout gear, but the storage fees are going to be less than your standard membership. The big advantage is that you have what you need when you need it, and you can access the necessary resources to use whatever you are storing.

With a serverless architecture, you store your data securely on low-cost storage devices and access it as needed. The resources required to process that data are available on an on-demand basis. So, your charges are likely to be lower since you are paying a low fee for data storage and a usage fee on resources. This can work great for companies that do not need 24x7x365 access to their data since they are only paying for the services when they are using them. It’s also ideal for developers, who may find that they spend far more time working on their application code than testing it against the database. Instead of paying for the database resources while the data is just sitting there doing nothing, you now pay to store the data and incur the database-associated fees at use time.

Benefits and Risks of Going Serverless

One of the biggest possible benefits of going with a serverless architecture is that you save money and hassle. Money can be saved since you only pay for the resources when you use them. Hassle is reduced since you don’t need to worry about the hardware on which your application runs. These can be big wins for a company, but you need to be aware of some pitfalls.

First, serverless can save you money, but there is no guarantee that it will save you money.

Consider 2 different people who have the exact same cell phone – maybe it’s your dad and your teenage daughter. These 2 users probably have very different patterns of usage: your dad uses the phone sporadically (if at all!) and your teenage daughter seems to have her phone physically attached to her. These 2 people would benefit from different service plans with their provider. For your dad, a basic plan that allows some usage (similar to the base cost of storage in our serverless database) with charges for usage above that cap would probably suffice. However, such a plan for your teenage daughter would probably spiral out of control and incur very high usage fees. For her, an unlimited plan makes sense. What is a great fit for one user is a poor fit for another, and the same is true when comparing serverless and DBaaS options.

The good news is that serverless architectures and DBaaS options, like Amazon RDS, Microsoft Azure, and Google Cloud, reduce a lot of the hassle of owning and managing servers. You no longer need to be concerned about Mean Time Between Failures, power and cooling issues, or many of the other headaches that come with maintaining your hardware. However, this can also have a negative consequence.

The challenge of enforced updates

About the only thing that is consistent about software in today’s world is that it is constantly changing. New versions are released with new features that may or may not be important to you. When a serverless provider decides to implement a new version or patch of their backend, there may be some downstream issues for you to manage. It is always important to test any new updates, but now some of the decisions about how and when to upgrade may be out of your control. Proper notification from the provider gives you a window of time for testing, but they are probably going to flip the switch regardless of whether or not you have completed all of your test cycles. This is true of both serverless and DBaaS options.

A risk of vendor lock-in

A common mantra in the software world is that we want to avoid vendor lock-in. Of course, from the provider’s side, they want to avoid customer churn, so we often find ourselves on opposite sides of the same issue. Moving to a new platform or provider becomes more complex as you cede more aspects of server management to the host. This means that serverless can cause deep lock-in since your application is designed to work with the environment as your provider has configured it. If you choose to move to a different provider, you need to extract your application and your data from the current provider and probably need to rework it to fit the requirements of the new provider.

The challenge of client-side optimization

Another consideration is that optimizations of server-side configurations must necessarily be more generic compared to those you might make to self-hosted servers. Optimization can no longer be done at the server level for your specific application and use; instead, you now rely on a smarter client to perform your necessary optimizations. This requires a skill set that may not exist with some developers: the ability to tune applications client-side.

Conclusion

Serverless is not going away. In fact, it is likely to grow as people come to a better understanding and comfort level with it. You need to be able to make an informed decision regarding whether serverless is right for you. Careful consideration of the pros and cons is imperative for making a solid determination. Understanding your usage patterns, user expectations, development capabilities, and a lot more will help to guide that decision.

In a future post, I’ll review the architectural differences between on-premises, PaaS, DBaaS and serverless database environments.

Rick Golba

Rick Golba is a Solutions Engineer at Percona. Rick has over 20 years of experience working with databases. Prior to Percona, he worked as a Technical Trainer for HP/Vertica.

Erlang OTP 21.0 is Released

Erlang/OTP 21 is a new major release with new features, improvements as well as incompatibilities.

Potential Incompatibilities

  • All Corba applications have been moved out of the OTP repository
  • A new Corba repository will be created under https://github.com/erlang
  • New applications ftp and tftp, moved from inets
  • ssl no longer supports 3_DES cipher suites or RSA-key exchange cipher suites by default
  • erlang:monitor on a primitive node (erl_interface, jinterface, etc) will no longer fail with badarg exception. Instead a monitor will be created, but it will only supervise the connection to the node.

 Highlights

 Erts:

  • Enhanced IO scalability
  • Support for usage of distribution controller processes for alternative transports, routing etc
  • Compact instructions on 64-bit systems for code below 4 GB, giving 20% less memory for loaded code
  • Rewrite of the efile-driver with NIFs and "Dirty schedulers" resulting in faster file operations
  • non-smp VM removed
  • link and monitor optimized for scalability
  • os:getenv/putenv now work on a thread-safe emulation. No longer in sync with libc getenv(3); manual synchronization will be needed.

Compiler:

  • Misc compiler optimizations including contributions from the Elixir team resulting in 10% improvements in benchmarks
  • "Tuple calls" have been removed from the run-time system.
  • Code such as f({ok, Val}) -> {ok, Val} is now automatically rewritten to f({ok, Val} = Tuple) -> Tuple. This reduces code size and execution time, and removes GC pressure.
  • More information in stacktrace from a number of operators
  • erlang:get_stacktrace/0 deprecated to be replaced with try ... catch C:R:Stacktrace -> ...
  • Creation of small maps with literal keys optimized.
  • A new predefined macro `OTP_RELEASE` and preprocessor directives `-if` and `-elif`

Security:

  • DTLS is now supported in the SSL application
  • Enhanced support for distribution over TLS
  • "unsecure" ciphers removed from defaults in SSL and SSH.
  • A new option value defined to facilitate implementing exec servers. Old option kept for compatibility, but now gives errors on stderr.

Standard libraries:

  • New API for logging, logger
  • New uri_string module for parsing URIs according to "The standard"
  • New function lists:search(list,fun/1) -> {ok, Value} | false
  • Changed default behaviour of .erlang loading. escript, erlc, dialyzer and typer no longer load an .erlang at all.

For more details see
http://erlang.org/download/otp_src_21.0.readme

Pre built versions for Windows can be fetched here:
http://erlang.org/download/otp_win32_21.0.exe
http://erlang.org/download/otp_win64_21.0.exe

Online documentation can be browsed here:
http://erlang.org/documentation/doc-10.0/doc

The Erlang/OTP source can also be found on GitHub in the official Erlang repository, here: OTP-21.0

Thank you for all your contributions!

Kazuo Kashio, a Founder of Casio Computer, Has Died


Kazuo graduated in 1949 from an English course affiliated with Nihon University in Tokyo. In 1950 he joined his brothers Tadao and Toshio at the fledgling consumer goods factory that was churning out cigarette rings and a corn biscuit baker. Tadao had worked in a munitions plant and made airplane components during the war.

In the early 1950s their fourth brother, Yukio, an engineer, joined the firm. By 1957 they had perfected an innovative electronic (rather than electromechanical) calculator that could not only add and subtract but also multiply. It weighed 308 pounds and sold for the equivalent of $11,000 in today’s dollars.

By 1965, Casio was producing desktop calculators. In 1972 the company introduced the Casio Mini, billed as the world’s first personal calculator. It became ubiquitous after a prodigious marketing campaign.

While European craftsmen scoffed at electronic watches, Casio perfected quartz and digital models sold under the motto “Time is a continuous process of addition.”

In 1974, the company unveiled the Casiotron, a digital wristwatch that displayed a calendar in addition to the time.

The G-Shock, of which there are now hundreds of models, boasted not only durability but also longer battery life than its competitors. It was heavily promoted through, among other marketing tools, product placement in films like “Men in Black” and “Mission: Impossible.”

The Casiotone-201, an electronic musical keyboard, was introduced in 1980, followed by the QV-10 digital camera in 1995 and pocket televisions, word processors, car navigation systems, blood pressure sensors and more.

Mr. Kashio is survived by his wife, Soko, as well as two daughters, a son and his brother Yukio. His brother Tadao died in 1993. His brother Toshio, who invented some of Casio’s products, died in 2012.

Bedford and the Normalization of Deviance


Like many pilots, I read accident reports all the time. This may seem morbid to people outside “the biz”, but those of us on the inside know that learning what went wrong is an important step in avoiding the fate suffered by those aviators. And after fifteen years in the flying business, the NTSB’s recently-released report on the 2014 Gulfstream IV crash in Bedford, Massachusetts is one of the most disturbing I’ve ever laid eyes on.

If you’re not familiar with the accident, it’s quite simple to explain: the highly experienced crew of a Gulfstream IV-SP attempted to take off with the gust lock (often referred to as a “control lock”) engaged. The aircraft exited the end of the runway and broke apart when it encountered a steep culvert. The ensuing fire killed all aboard.

Sounds pretty open-and-shut, doesn’t it? There have been dozens of accidents caused by the flight crew’s failure to remove the gust/control lock prior to flight. Professional test pilots have done it on multiple occasions, ranging from the prototype B-17 bomber in 1935 to the DHC-4 Caribou in 1992. But in this case, the NTSB report details a long series of actions and habitual behaviors which are so far beyond the pale that they defy the standard description of “pilot error”.

Just the Facts

Let me summarize the ten most pertinent errors and omissions of this incident for you:

  1. There are five checklists which must be run prior to flying. The pilots ran none of them. CVR data and pilot interviews revealed that checklists simply were not used. This was not an anomaly, it was standard operating procedure for them.
  2. Obviously the gust lock was not removed prior to flying. This is a very big, very visible, bright red handle which sticks up vertically right between the throttles and the flap handle. As the Simon & Chabris selective attention test demonstrates, it’s entirely possible to miss even something as conspicuous as the gust lock handle protruding six inches above the rest of the center pedestal. But it’s also the precise reason we have checklists and procedures in the first place.
  3. Flight control checks were not performed on this flight, nor were they ever performed. Hundreds of flights worth of data from the FDR and pilot interviews confirm it.
  4. The crew received a Rudder Limit message indicating that the rudder’s load limiter had activated. This is abnormal. The crew saw the alert. We know this because it was verbalized. Action taken? None.
  5. The Pilot Flying (PF) was unable to push the power levers far enough forward to achieve takeoff thrust. Worse, he actually verbalized that he wasn’t able to get full power, yet continued the takeoff anyway.
  6. The Pilot Not Flying (PNF) was supposed to monitor the engines and verbally call out when takeoff power was set. He failed to perform this task.
  7. Aerodynamics naturally move the elevator up (and therefore the control column aft) as the airplane accelerates. Gulfstream pilots are trained to look for this. It didn’t happen, and it wasn’t caught by either pilot.
  8. The Pilot Flying realized the gust lock was engaged, and said so verbally several times. At this point, the aircraft was traveling at 128 knots and had used 3,100 feet of runway; about 5,000 feet remained. In other words, they had plenty of time to abort the takeoff. They chose to continue anyway.
  9. One of the pilots pulled the flight power shutoff handle to remove hydraulic pressure from the flight controls in an attempt to release the gust lock while accelerating down the runway. The FPSOV was not designed for this purpose, and you won’t find any G-IV manual advocating this procedure, because it doesn’t work.
  10. By the time they realized it wouldn’t work and began the abort attempt, it was too late. The aircraft was traveling at 162 knots (186 mph!) and only about 2,700 feet of pavement remained. The hydraulically-actuated ground spoilers — which greatly aid in stopping the aircraft by placing most of its weight back on the wheels to increase rolling resistance and braking efficiency — were no longer available because the crew had removed hydraulic power to the flight controls.

Industry Responses

Gulfstream IV gust lock (the red handle, shown here in the engaged position)

Gulfstream has been sued by the victims’ families. Attorneys claim that the gust lock was defective, and that this is the primary reason for the crash. False. The gust lock is designed to prevent damage to the flight controls from wind gusts. It does that job admirably. It also prevents application of full takeoff power, but the fact that the pilot was able to physically push the power levers so far forward simply illustrates that anything can be broken if you put enough muscle into it.

The throttle portion of the gust lock may have failed to meet a technical certification requirement, but it was not the cause of the accident.  The responsibility for ensuring the gust lock is disengaged prior to takeoff lies with the pilots, not the manufacturer of the airplane.

Gulfstream pilot and Code7700 author James Albright calls the crash involuntary manslaughter. I agree. This wasn’t a normal accident chain. The pilots knew what was wrong while there was still plenty of time to stop it. They had all the facts you and I have today. They chose to continue anyway. It’s the most inexplicable thing I’ve yet seen a professional pilot do, and I’ve seen a lot of crazy things. If locked flight controls don’t prompt a takeoff abort, nothing will.

Albright’s analysis is outstanding: direct and factual. I predict there will be no shortage of articles and opinions on this accident. It will be pointed to and discussed for years as a bright, shining example of how not to operate an aircraft.

In response to the crash, former NTSB member John Goglia has called for video cameras in the cockpit, with footage to be regularly reviewed to ensure pilots are completing checklists. Despite the good intentions, this proposal would not achieve the desired end. Pilots already work in the presence of cockpit voice recorders, flight data recorders, ATC communication recording, radar data recording, and more. If a pilot needs to be videotaped too, I’d respectfully suggest that this person should be relieved of duty. No, the problem here is not going to be solved by hauling Big Brother further into the cockpit.

A better model would be that of the FOQA program, where information from flight data recorders is downloaded and analyzed periodically in a no-hazard environment.  The pilots, the company, and the FAA each get something valuable.  It’s less stick, more carrot.  I would also add that this sort of program is in keeping with the Fed’s recent emphasis on compliance over enforcement action.

The Normalization of Deviance

What I, and probably you, are most interested in is determining how well-respected, experienced, and accomplished pilots who’ve been through the best training the industry has to offer reached the point where their performance is so bad that a CFI wouldn’t accept it from a primary student on their very first flight.

After reading through the litany of errors and malfeasance present in this accident report, it’s tempting to brush the whole thing off and say “this could never happen to me”.  I sincerely believe doing so would be a grave mistake. It absolutely can happen to any of us, just as it has to plenty of well-trained, experienced, intelligent pilots. Test pilots. People who are much better than you or I will ever be.

But how? Clearly the Bedford pilots were capable of following proper procedures, and did so at carefully selected times: at recurrent training events, during IS-BAO audits, on checkrides, and various other occasions.

Goglia, Albright, the NTSB, and others are focusing on “complacency” as a root cause, but I believe there might be a more detailed explanation.  The true accident chain on this crash formed over a long, long period of time — decades, most likely — through a process known as the normalization of deviance.

Social normalization of deviance means that people within the organization become so accustomed to a deviant behavior that they don’t consider it deviant, despite the fact that they far exceed their own rules for elementary safety. People grow more accustomed to the deviant behavior the more it occurs. To people outside of the organization, the activities seem deviant; however, people within the organization do not recognize the deviance because it is seen as a normal occurrence. In hindsight, people within the organization realize that their seemingly normal behavior was deviant.

This concept was developed by sociologist and Columbia University professor Diane Vaughan after the Challenger explosion. NASA fell victim to it in 1986, and then got hit again when the Columbia disaster occurred in 2003. If they couldn’t escape its clutches, you might wonder what hope we have. Well, for one thing, spaceflight in general and the shuttle program in particular are  specialized, experimental types of flying.  They demand acceptance of a far higher risk profile than corporate, charter, and private aviation.

I believe the first step in avoiding “normalization of deviance” is awareness, just as admitting you have a problem is the first step in recovery from substance addiction.  After all, if you can’t detect the presence of a problem, how can you possibly fix it?

There are several factors which tend to sprout normalization of deviance:

  • First and foremost is the attitude that rules are stupid and/or inefficient. Pilots, who tend to be independent Type A personalities anyway, often develop shortcuts or workarounds when the checklist, regulation, training, or professional standard seems inefficient. Example: the boss is on board and we can’t sit here for several minutes running checklists; I did a cockpit flow, so let’s just get going!
  • Sometimes pilots learn a deviation without realizing it. Formalized training only covers part of what an aviator needs to know to fly in the real world. The rest comes from senior pilots, training captains, and tribal knowledge. What’s taught is not always correct.
  • Often, the internal justification for cognizant rule breaking includes the “good” of the company or customer, often where the rule or standard is perceived as counterproductive. In the case of corporate or charter flying, it’s the argument that the passenger shouldn’t have to (or doesn’t want to) wait. I’ve seen examples of pilots starting engines while the passengers are still boarding, or while the copilot is still loading luggage. Are we at war? Under threat of physical attack? Is there some reason a 2 minute delay is going to cause the world to stop turning?
  • The last step in the process is silence. Co-workers are afraid to speak up, and understandably so. The cockpit is already a small place. It gets a lot smaller when disagreements start to brew between crew members. In the case of contract pilots, it may result in the loss of a regular customer.  Unfortunately, the likelihood that rule violations will become normalized increases if those who see them refuse to intervene.

The normalization of deviance can be stopped, but doing so is neither easy nor comfortable. It requires a willingness to confront such deviance when it is seen, lest it metastasize to the point we read about in the Bedford NTSB report. It also requires buy-in from pilots on the procedures and training they receive. When those things are viewed as “checking a box” rather than bona fide safety elements, it becomes natural to downplay their importance.

Many of you know I am not exactly a fan of the Part 121 airline scene, but it’s hard to argue with the success airlines have had in this area.  When I flew for Dynamic Aviation’s California Medfly operation here in Southern California, procedures and checklists were followed with that level of precision and dedication.  As a result, the CMF program has logged several decades of safe operation despite the high-risk nature of the job.

Whether you’re flying friends & family, pallets of cargo, or the general public, we all have the same basic goal: to aviate without ending up in an embarrassing NTSB report whose facts leave no doubt about how badly we screwed up. The normalization of deviance is like corrosion: an insidious, ever-present, naturally-occurring enemy which will weaken and eventually destroy us. If we let it.

The Death of Postmodernism and Beyond (2006)



Alan Kirby says postmodernism is dead and buried. In its place comes a new paradigm of authority and knowledge formed under the pressure of new technologies and contemporary social forces.

I have in front of me a module description downloaded from a British university English department’s website. It includes details of assignments and a week-by-week reading list for the optional module ‘Postmodern Fictions’, and if the university is to remain nameless here it’s not because the module is in any way shameful but that it handily represents modules or module parts which will be taught in virtually every English department in the land this coming academic year. It assumes that postmodernism is alive, thriving and kicking: it says it will introduce “the general topics of ‘postmodernism’ and ‘postmodernity’ by examining their relationship to the contemporary writing of fiction”. This might suggest that postmodernism is contemporary, but the comparison actually shows that it is dead and buried.

Postmodern philosophy emphasises the elusiveness of meaning and knowledge. This is often expressed in postmodern art as a concern with representation and an ironic self-awareness. And the argument that postmodernism is over has already been made philosophically. There are people who have essentially asserted that for a while we believed in postmodern ideas, but not any more, and from now on we’re going to believe in critical realism. The weakness in this analysis is that it centres on the academy, on the practices and suppositions of philosophers who may or may not be shifting ground or about to shift – and many academics will simply decide that, finally, they prefer to stay with Foucault [arch postmodernist] than go over to anything else. However, a far more compelling case can be made that postmodernism is dead by looking outside the academy at current cultural production.

Most of the undergraduates who will take ‘Postmodern Fictions’ this year will have been born in 1985 or after, and all but one of the module’s primary texts were written before their lifetime. Far from being ‘contemporary’, these texts were published in another world, before the students were born: The French Lieutenant’s Woman, Nights at the Circus, If on a Winter’s Night a Traveller, Do Androids Dream of Electric Sheep? (and Blade Runner), White Noise: this is Mum and Dad’s culture. Some of the texts (‘The Library of Babel’) were written even before their parents were born. Replace this cache with other postmodern stalwarts – Beloved, Flaubert’s Parrot, Waterland, The Crying of Lot 49, Pale Fire, Slaughterhouse 5, Lanark, Neuromancer, anything by B.S. Johnson – and the same applies. It’s all about as contemporary as The Smiths, as hip as shoulder pads, as happening as Betamax video recorders. These are texts which are just coming to grips with the existence of rock music and television; they mostly do not dream even of the possibility of the technology and communications media – mobile phones, email, the internet, computers in every house powerful enough to put a man on the moon – which today’s undergraduates take for granted.

The reason why the primary reading on British postmodernism fictions modules is so old, in relative terms, is that it has not been rejuvenated. Just look out into the cultural market-place: buy novels published in the last five years, watch a twenty-first century film, listen to the latest music – above all just sit and watch television for a week – and you will hardly catch a glimpse of postmodernism. Similarly, one can go to literary conferences (as I did in July) and sit through a dozen papers which make no mention of Theory, of Derrida, Foucault, Baudrillard. The sense of superannuation, of the impotence and the irrelevance of so much Theory among academics, also bears testimony to the passing of postmodernism. The people who produce the cultural material which academics and non-academics read, watch and listen to, have simply given up on postmodernism. The occasional metafictional or self-conscious text will appear, to widespread indifference – like Bret Easton Ellis’ Lunar Park – but then modernist novels, now long forgotten, were still being written into the 1950s and 60s. The only place where the postmodern is extant is in children’s cartoons like Shrek and The Incredibles, as a sop to parents obliged to sit through them with their toddlers. This is the level to which postmodernism has sunk; a source of marginal gags in pop culture aimed at the under-eights.

What’s Post Postmodernism?

I believe there is more to this shift than a simple change in cultural fashion. The terms by which authority, knowledge, selfhood, reality and time are conceived have been altered, suddenly and forever. There is now a gulf between most lecturers and their students akin to the one which appeared in the late 1960s, but not for the same kind of reason. The shift from modernism to postmodernism did not stem from any profound reformulation in the conditions of cultural production and reception; all that happened, to rhetorically exaggerate, was that the kind of people who had once written Ulysses and To the Lighthouse wrote Pale Fire and The Bloody Chamber instead. But somewhere in the late 1990s or early 2000s, the emergence of new technologies re-structured, violently and forever, the nature of the author, the reader and the text, and the relationships between them.

Postmodernism, like modernism and romanticism before it, fetishised [ie placed supreme importance on] the author, even when the author chose to indict or pretended to abolish him or herself. But the culture we have now fetishises the recipient of the text to the degree that they become a partial or whole author of it. Optimists may see this as the democratisation of culture; pessimists will point to the excruciating banality and vacuity of the cultural products thereby generated (at least so far).

Let me explain. Postmodernism conceived of contemporary culture as a spectacle before which the individual sat powerless, and within which questions of the real were problematised. It therefore emphasised the television or the cinema screen. Its successor, which I will call pseudo-modernism, makes the individual’s action the necessary condition of the cultural product. Pseudo-modernism includes all television or radio programmes or parts of programmes, all ‘texts’, whose content and dynamics are invented or directed by the participating viewer or listener (although these latter terms, with their passivity and emphasis on reception, are obsolete: whatever a telephoning Big Brother voter or a telephoning 6-0-6 football fan are doing, they are not simply viewing or listening).

By definition, pseudo-modern cultural products cannot and do not exist unless the individual intervenes physically in them. Great Expectations will exist materially whether anyone reads it or not. Once Dickens had finished writing it and the publisher released it into the world, its ‘material textuality’ – its selection of words – was made and finished, even though its meanings, how people interpret it, would remain largely up for grabs. Its material production and its constitution were decided by its suppliers, that is, its author, publisher, serialiser etc alone – only the meaning was the domain of the reader. Big Brother on the other hand, to take a typical pseudo-modern cultural text, would not exist materially if nobody phoned up to vote its contestants off. Voting is thus part of the material textuality of the programme – the telephoning viewers write the programme themselves. If it were not possible for viewers to write sections of Big Brother, it would then uncannily resemble an Andy Warhol film: neurotic, youthful exhibitionists inertly bitching and talking aimlessly in rooms for hour after hour. This is to say, what makes Big Brother what it is, is the viewer’s act of phoning in.

Pseudo-modernism also encompasses contemporary news programmes, whose content increasingly consists of emails or text messages sent in commenting on the news items. The terminology of ‘interactivity’ is equally inappropriate here, since there is no exchange: instead, the viewer or listener enters – writes a segment of the programme – then departs, returning to a passive role. Pseudo-modernism also includes computer games, which similarly place the individual in a context where they invent the cultural content, within pre-delineated limits. The content of each individual act of playing the game varies according to the particular player.

The pseudo-modern cultural phenomenon par excellence is the internet. Its central act is that of the individual clicking on his/her mouse to move through pages in a way which cannot be duplicated, inventing a pathway through cultural products which has never existed before and never will again. This is a far more intense engagement with the cultural process than anything literature can offer, and gives the undeniable sense (or illusion) of the individual controlling, managing, running, making up his/her involvement with the cultural product. Internet pages are not ‘authored’ in the sense that anyone knows who wrote them, or cares. The majority either require the individual to make them work, like Streetmap or Route Planner, or permit him/her to add to them, like Wikipedia, or through feedback on, for instance, media websites. In all cases, it is intrinsic to the internet that you can easily make up pages yourself (eg blogs).

If the internet and its use define and dominate pseudo-modernism, the new era has also seen the revamping of older forms along its lines. Cinema in the pseudo-modern age looks more and more like a computer game. Its images, which once came from the ‘real’ world – framed, lit, soundtracked and edited together by ingenious directors to guide the viewer’s thoughts or emotions – are now increasingly created through a computer. And they look it. Where once special effects were supposed to make the impossible appear credible, CGI frequently [inadvertently] works to make the possible look artificial, as in much of Lord of the Rings or Gladiator. Battles involving thousands of individuals have really happened; pseudo-modern cinema makes them look as if they have only ever happened in cyberspace. And so cinema has given cultural ground not merely to the computer as a generator of its images, but to the computer game as the model of its relationship with the viewer.

Similarly, television in the pseudo-modern age favours not only reality TV (yet another unapt term), but also shopping channels, and quizzes in which the viewer calls to guess the answer to riddles in the hope of winning money. It also favours phenomena like Ceefax and Teletext. But rather than bemoan the new situation, it is more useful to find ways of making these new conditions conduits for cultural achievements instead of the vacuity currently evident. It is important here to see that whereas the form may change (Big Brother may wither on the vine), the terms by which individuals relate to their television screen and consequently what broadcasters show have incontrovertibly changed. The purely ‘spectacular’ function of television, as with all the arts, has become a marginal one: what is central now is the busy, active, forging work of the individual who would once have been called its recipient. In all of this, the ‘viewer’ feels powerful and is indeed necessary; the ‘author’ as traditionally understood is either relegated to the status of the one who sets the parameters within which others operate, or becomes simply irrelevant, unknown, sidelined; and the ‘text’ is characterised both by its hyper-ephemerality and by its instability. It is made up by the ‘viewer’, if not in its content then in its sequence – you wouldn’t read Middlemarch by going from page 118 to 316 to 401 to 501, but you might well, and justifiably, read Ceefax that way.

A pseudo-modern text lasts an exceptionally brief time. Unlike, say, Fawlty Towers, reality TV programmes cannot be repeated in their original form, since the phone-ins cannot be reproduced, and without the possibility of phoning-in they become a different and far less attractive entity. Ceefax text dies after a few hours. If scholars give the date they referenced an internet page, it is because the pages disappear or get radically re-cast so quickly. Text messages and emails are extremely difficult to keep in their original form; printing out emails does convert them into something more stable, like a letter, but only by destroying their essential, electronic state. Radio phone-ins, computer games – their shelf-life is short, they are very soon obsolete. A culture based on these things can have no memory – certainly not the burdensome sense of a preceding cultural inheritance which informed modernism and postmodernism. Non-reproducible and evanescent, pseudo-modernism is thus also amnesiac: these are cultural actions in the present moment with no sense of either past or future.

The cultural products of pseudo-modernism are also exceptionally banal, as I’ve hinted. The content of pseudo-modern films tends to be solely the acts which beget and which end life. This puerile primitivism of the script stands in stark contrast to the sophistication of contemporary cinema’s technical effects. Much text messaging and emailing is vapid in comparison with what people of all educational levels used to put into letters. A triteness, a shallowness dominates all. The pseudo-modern era, at least so far, is a cultural desert. Although we may grow so used to the new terms that we can adapt them for meaningful artistic expression (and then the pejorative label I have given pseudo-modernism may no longer be appropriate), for now we are confronted by a storm of human activity producing almost nothing of any lasting or even reproducible cultural value – anything which human beings might look at again and appreciate in fifty or two hundred years time.

The roots of pseudo-modernism can be traced back through the years dominated by postmodernism. Dance music and industrial pornography, for instance, products of the late 70s and 80s, tend to the ephemeral, to the vacuous on the level of signification, and to the unauthored (dance much more so than pop or rock). They also foreground the activity of their ‘reception’: dance music is to be danced to, porn is not to be read or watched but used, in a way which generates the pseudo-modern illusion of participation. In music, the pseudo-modern superseding of the artist-dominated album as monolithic text by the downloading and mix-and-matching of individual tracks on to an iPod, selected by the listener, was certainly prefigured by the music fan’s creation of compilation tapes a generation ago. But a shift has occurred, in that what was a marginal pastime of the fan has become the dominant and definitive way of consuming music, rendering the idea of the album as a coherent work of art, a body of integrated meaning, obsolete.

To a degree, pseudo-modernism is no more than a technologically motivated shift to the cultural centre of something which has always existed (similarly, metafiction has always existed, but was never so fetishised as it was by postmodernism). Television has always used audience participation, just as theatre and other performing arts did before it; but as an option, not as a necessity: pseudo-modern TV programmes have participation built into them. There have long been very ‘active’ cultural forms, too, from carnival to pantomime. But none of these implied a written or otherwise material text, and so they dwelt in the margins of a culture which fetishised such texts – whereas the pseudo-modern text, with all its peculiarities, stands as the central, dominant, paradigmatic form of cultural product today, although culture, in its margins, still knows other kinds. Nor should these other kinds be stigmatised as ‘passive’ against pseudo-modernity’s ‘activity’. Reading, listening, watching always had their kinds of activity; but there is a physicality to the actions of the pseudo-modern text-maker, and a necessity to his or her actions as regards the composition of the text, as well as a domination which has changed the cultural balance of power (note how cinema and TV, yesterday’s giants, have bowed before it). It forms the twenty-first century’s social-historical-cultural hegemony. Moreover, the activity of pseudo-modernism has its own specificity: it is electronic, and textual, but ephemeral.

Clicking In The Changes

In postmodernism, one read, watched, listened, as before. In pseudo-modernism one phones, clicks, presses, surfs, chooses, moves, downloads. There is a generation gap here, roughly separating people born before and after 1980. Those born later might see their peers as free, autonomous, inventive, expressive, dynamic, empowered, independent, their voices unique, raised and heard: postmodernism and everything before it will by contrast seem elitist, dull, a distant and droning monologue which oppresses and occludes them. Those born before 1980 may see, not the people, but contemporary texts which are alternately violent, pornographic, unreal, trite, vapid, conformist, consumerist, meaningless and brainless (see the drivel found, say, on some Wikipedia pages, or the lack of context on Ceefax). To them what came before pseudo-modernism will increasingly seem a golden age of intelligence, creativity, rebellion and authenticity. Hence the name ‘pseudo-modernism’ also connotes the tension between the sophistication of the technological means, and the vapidity or ignorance of the content conveyed by it – a cultural moment summed up by the fatuity of the mobile phone user’s “I’m on the bus”.

Whereas postmodernism called ‘reality’ into question, pseudo-modernism defines the real implicitly as myself, now, ‘interacting’ with its texts. Thus, pseudo-modernism suggests that whatever it does or makes is what is reality, and a pseudo-modern text may flourish the apparently real in an uncomplicated form: the docu-soap with its hand-held cameras (which, by displaying individuals aware of being regarded, give the viewer the illusion of participation); The Office and The Blair Witch Project, interactive pornography and reality TV; the essayistic cinema of Michael Moore or Morgan Spurlock.

Along with this new view of reality, it is clear that the dominant intellectual framework has changed. While postmodernism’s cultural products have been consigned to the same historicised status as modernism and romanticism, its intellectual tendencies (feminism, postcolonialism etc) find themselves isolated in the new philosophical environment. The academy, perhaps especially in Britain, is today so swamped by the assumptions and practices of market economics that it is deeply implausible for academics to tell their students they inhabit a postmodern world where a multiplicity of ideologies, world-views and voices can be heard. Their every step hounded by market economics, academics cannot preach multiplicity when their lives are dominated by what amounts in practice to consumer fanaticism. The world has narrowed intellectually, not broadened, in the last ten years. Where Lyotard saw the eclipse of Grand Narratives, pseudo-modernism sees the ideology of globalised market economics raised to the level of the sole and over-powering regulator of all social activity – monopolistic, all-engulfing, all-explaining, all-structuring, as every academic must disagreeably recognise. Pseudo-modernism is of course consumerist and conformist, a matter of moving around the world as it is given or sold.

Secondly, whereas postmodernism favoured the ironic, the knowing and the playful, with their allusions to knowledge, history and ambivalence, pseudo-modernism’s typical intellectual states are ignorance, fanaticism and anxiety: Bush, Blair, Bin Laden, Le Pen and their like on one side, and the more numerous but less powerful masses on the other. Pseudo-modernism belongs to a world pervaded by the encounter between a religiously fanatical segment of the United States, a largely secular but definitionally hyper-religious Israel, and a fanatical sub-section of Muslims scattered across the planet: pseudo-modernism was not born on 11 September 2001, but postmodernism was interred in its rubble. In this context pseudo-modernism lashes fantastically sophisticated technology to the pursuit of medieval barbarism – as in the uploading of videos of beheadings onto the internet, or the use of mobile phones to film torture in prisons. Beyond this, the destiny of everyone else is to suffer the anxiety of getting hit in the cross-fire. But this fatalistic anxiety extends far beyond geopolitics, into every aspect of contemporary life; from a general fear of social breakdown and identity loss, to a deep unease about diet and health; from anguish about the destructiveness of climate change, to the effects of a new personal ineptitude and helplessness, which yield TV programmes about how to clean your house, bring up your children or remain solvent. This technologised cluelessness is utterly contemporary: the pseudo-modernist communicates constantly with the other side of the planet, yet needs to be told to eat vegetables to be healthy, a fact self-evident in the Bronze Age. He or she can direct the course of national television programmes, but does not know how to make him or herself something to eat – a characteristic fusion of the childish and the advanced, the powerful and the helpless. For varying reasons, these are people incapable of the “disbelief of Grand Narratives” which Lyotard argued typified postmodernists.

This pseudo-modern world, so frightening and seemingly uncontrollable, inevitably feeds a desire to return to the infantile playing with toys which also characterises the pseudo-modern cultural world. Here, the typical emotional state, radically superseding the hyper-consciousness of irony, is the trance – the state of being swallowed up by your activity. In place of the neurosis of modernism and the narcissism of postmodernism, pseudo-modernism takes the world away, by creating a new weightless nowhere of silent autism. You click, you punch the keys, you are ‘involved’, engulfed, deciding. You are the text, there is no-one else, no ‘author’; there is nowhere else, no other time or place. You are free: you are the text: the text is superseded.

© Dr Alan Kirby 2006

Alan Kirby holds a PhD in English Literature from the University of Exeter. He currently lives in Oxford.


Using Unix as an IDE (2012)


This series has been independently translated into Chinese, Russian, Turkish, and Korean, and formatted as an ebook.

Are you re-reading this series? Did you see posts 6 (Debugging) and 7 (Revision control) last time? Due to a misconfiguration, they weren’t showing at the end of the series page for some time; this has since been fixed.

Newbies and experienced professional programmers alike appreciate the concept of the IDE, or integrated development environment. Having the primary tools necessary for organising, writing, maintaining, testing, and debugging code in an integrated application with common interfaces for all the different tools is certainly a very valuable asset. Additionally, an environment expressly designed for programming in various languages affords advantages such as autocompletion, and syntax checking and highlighting.

With such tools available to developers on all major desktop operating systems including GNU/Linux and BSD, and with many of the best free of charge, there’s not really a good reason to write your code in Windows Notepad, or with nano or cat.

However, there’s a minor meme among devotees of Unix and its modern-day derivatives that “Unix is an IDE”, meaning that the tools available to developers on the terminal cover the major features in cutting-edge desktop IDEs with some ease. Opinion is quite divided on this, but whether or not you feel it’s fair to call Unix an IDE in the same sense as Eclipse or Microsoft Visual Studio, it may surprise you just how comprehensive a development environment the humble Bash shell can be.

How is UNIX an IDE?

The primary rationale for using an IDE is that it gathers all your tools in the same place, and you can use them in concert with roughly the same user interface paradigm, and without having to exert too much effort to make separate applications cooperate. The reason this becomes especially desirable with GUI applications is because it’s very difficult to make windowed applications speak a common language or work well with each other; aside from cutting and pasting text, they don’t share a common interface.

The interesting thing about this problem for shell users is that well-designed and enduring Unix tools already share a common user interface in streams of text and files as persistent objects, otherwise expressed in the axiom “everything’s a file”. Pretty much everything in Unix is built around these two concepts, and it’s this common user interface, coupled with a forty-year history of high-powered tools whose users and developers have especially prized interoperability, that goes a long way to making Unix as powerful as a full-blown IDE.

The right idea

This attitude isn’t the preserve of battle-hardened Unix greybeards; you can see it in another form in the way the modern incarnations of the two grand old text editors Emacs and Vi (GNU Emacs and Vim) have such active communities developing plugins to make them support pretty much any kind of editing task. There are plugins to do pretty much anything you could really want to do in programming in both editors, and any Vim junkie could spout off at least three or four that they feel are “essential”.

However, it often becomes apparent to me when reading about these efforts that the developers concerned are trying to make these text editors into IDEs in their own right. There are posts about never needing to leave Vim, or never needing to leave Emacs. But I think that trying to shoehorn Vim or Emacs into becoming something that it’s not isn’t quite thinking about the problem in the right way. Bram Moolenaar, the author of Vim, appears to agree to some extent, as you can see by reading :help design-not. The shell is only ever a Ctrl+Z away, and its mature, highly composable toolset will afford you more power than either editor ever could.

EDIT October 2017: New versions of Vim 8.x now include an embedded terminal accessible with the :terminal command. It still has some problems, but it works better than previous plugin-based attempts to do this. Even with this new feature, I still strongly recommend the approach discussed in these posts instead.

About this series

In this series of posts, I will be going through six major features of an IDE, and giving examples showing how common tools available in GNU/Linux allow you to use them together with ease. This will by no means be a comprehensive survey, nor are the tools I will demonstrate the only options.

What I’m not trying to say

I don’t think IDEs are bad; I think they’re brilliant, which is why I’m trying to convince you that Unix can be used as one, or at least thought of as one. I’m also not going to say that Unix is always the best tool for any programming task; it is arguably much better suited for C, C++, Python, Perl, or Shell development than it is for more “industry” languages like Java or C#, especially if writing GUI-heavy applications. In particular, I’m not going to try to convince you to scrap your hard-won Eclipse or Microsoft Visual Studio knowledge for the sometimes esoteric world of the command line. All I want to do is show you what we’re doing on the other side of the fence.

MySQL High Availability at GitHub


GitHub uses MySQL as its main datastore for all things non-git, and its availability is critical to GitHub’s operation. The site itself, GitHub’s API, authentication and more, all require database access. We run multiple MySQL clusters serving our different services and tasks. Our clusters use a classic master-replicas setup, where a single node in a cluster (the master) is able to accept writes. The rest of the cluster nodes (the replicas) asynchronously replay changes from the master and serve our read traffic.

The availability of master nodes is particularly critical. With no master, a cluster cannot accept writes: any writes that need to be persisted cannot be persisted. Any incoming changes such as commits, issues, user creation, reviews, new repositories, etc., would fail.

To support writes we clearly need to have an available writer node, a master of a cluster. But just as important, we need to be able to identify, or discover, that node.

On a failure, say a master box crash scenario, we must ensure the existence of a new master, as well as be able to quickly advertise its identity. The time it takes to detect a failure, run the failover and advertise the new master’s identity makes up the total outage time.

This post illustrates GitHub’s MySQL high availability and master service discovery solution, which allows us to reliably run a cross-data-center operation, be tolerant of data center isolation, and achieve short outage times on a failure.

High availability objectives

The solution described in this post iterates on, and improves, previous high availability (HA) solutions implemented at GitHub. As we scale, our MySQL HA strategy must adapt to changes. We wish to have similar HA strategies for our MySQL and for other services within GitHub.

When considering high availability and service discovery, some questions can guide your path into an appropriate solution. An incomplete list may include:

  • How much outage time can you tolerate?
  • How reliable is crash detection? Can you tolerate false positives (premature failovers)?
  • How reliable is failover? Where can it fail?
  • How well does the solution work cross-data-center? On low and high latency networks?
  • Will the solution tolerate a complete data center (DC) failure or network isolation?
  • What mechanism, if any, prevents or mitigates split-brain scenarios (two servers claiming to be the master of a given cluster, both independently and unknowingly to each other accepting writes)?
  • Can you afford data loss? To what extent?

To illustrate some of the above, let’s first consider our previous HA iteration, and why we changed it.

Moving away from VIP and DNS based discovery

In our previous iteration, we used:

  • orchestrator for detection and failover, and
  • VIP and DNS for master discovery.

In that iteration, clients discovered the writer node by using a name, e.g. mysql-writer-1.github.net. The name resolved to a Virtual IP address (VIP) which the master host would acquire.

Thus, on a normal day, clients would just resolve the name, connect to the resolved IP, and find the master listening on the other side.
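
For illustration, here is a minimal Go sketch of that old client-side flow under the VIP/DNS scheme: resolve the well-known writer name, then connect to whatever address it currently points at. The writer name matches the example above; the driver import, credentials, and schema in the DSN are placeholders, not GitHub's actual client code.

package main

import (
	"database/sql"
	"fmt"
	"log"
	"net"

	_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver
)

func main() {
	// Resolve the well-known writer name; under the VIP scheme this returns
	// whichever address the current master has claimed.
	addrs, err := net.LookupHost("mysql-writer-1.github.net")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("writer resolves to:", addrs)

	// Connect by name; user, password, and schema are placeholders.
	db, err := sql.Open("mysql", "app:secret@tcp(mysql-writer-1.github.net:3306)/github")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected to the current master")
}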

Consider this replication topology, spanning three different data centers:

In the event of a master failure, a new server, one of the replicas, must be promoted in its place.

orchestrator will detect a failure, promote a new master, and then act to reassign the name/VIP. Clients don’t actually know the identity of the master: all they have is a name, and that name must now resolve to the new master. However, consider:

VIPs are cooperative: they are claimed and owned by the database servers themselves. To acquire or release a VIP, a server must send an ARP request. The server owning the VIP must first release it before the newly promoted master acquires it. This has some undesired effects:

  • An orderly failover operation will first contact the dead master and request that it release the VIP, and then contact the newly promoted master and request that it grab the VIP. What if the old master cannot be reached or refuses to release the VIP? Given that there’s a failure scenario on that server in the first place, it is not unlikely that it would fail to respond in a timely manner, or indeed respond at all.
    • We can end up with a split-brain: two hosts claiming to have the same VIP. Different clients may connect to either of those servers, depending on the shortest network path.
    • The source of truth here depends on the cooperation of two independent servers, and this setup is unreliable.
  • Even if the old master does cooperate, the workflow wastes precious time: the switch to the new master waits while we contact the old master.
  • And even as the VIP changes, existing client connections are not guaranteed to disconnect from the old server, and we may still experience a split-brain.

In parts of our setup VIPs are bound by physical location. They are owned by a switch or a router. Thus, we can only reassign the VIPs onto co-located servers. In particular, in some cases we cannot assign the VIP to a server promoted in a different data center, and must make a DNS change.

  • DNS changes take longer to propagate. Clients cache DNS names for a preconfigured time. A cross-DC failover implies more outage time: it will take more time to make all clients aware of the identity of the new master.

These limitations alone were enough to push us in search of a new solution, but there were further considerations:

  • Masters self-injected heartbeats via the pt-heartbeat service, for the purpose of lag measurement and throttling control. The service had to be kicked off on the newly promoted master. If possible, the service would be shut down on the old master.
  • Likewise, Pseudo-GTID injection was self-managed by the masters. It would need to kick off on the new master, and preferably stop on the old master.
  • The new master was set as writable. The old master was to be set as read_only, if possible.

These extra steps were a contributing factor to the total outage time and introduced their own failures and friction.

The solution worked, and GitHub has had successful MySQL failovers that went well under the radar, but we wanted our HA to improve on the following:

  • Be data center agnostic.
  • Be tolerant of data center failure.
  • Remove unreliable cooperative workflows.
  • Reduce total outage time.
  • As much as possible, have lossless failovers.

GitHub’s HA solution: orchestrator, Consul, GLB

Our new strategy, along with collateral improvements, solves or mitigates much of the concerns above. In today’s HA setup, we have:

  • orchestrator to run detection and failovers. We use a cross-DC orchestrator/raft setup as depicted below.
  • Hashicorp’s Consul for service discovery.
  • GLB/HAProxy as a proxy layer between clients and writer nodes.
  • anycast for network routing.

The new setup removes VIP and DNS changes altogether. And while we introduce more components, we are able to decouple the components and simplify the task, as well as be able to utilize solid and stable solutions. A breakdown follows.

A normal flow

On a normal day the apps connect to the write nodes through GLB/HAProxy.

The apps are never aware of the master’s identity. As before, they use a name. For example, the master for cluster1 would be mysql-writer-1.github.net. In our current setup, however, this name gets resolved to an anycast IP.

With anycast, the name resolves to the same IP everywhere, but traffic is routed differently based on a client’s location. In particular, in each of our data centers we have GLB, our highly available load balancer, deployed on multiple boxes. Traffic to mysql-writer-1.github.net always routes to the local data center’s GLB cluster. Thus, all clients are served by local proxies.

We run GLB on top of HAProxy. Our HAProxy has writer pools: one pool per MySQL cluster, where each pool has exactly one backend server: the cluster’s master. All GLB/HAProxy boxes in all DCs have the exact same pools, and they all indicate the exact same backend servers in these pools. Thus, if an app wishes to write to mysql-writer-1.github.net, it matters not which GLB server it connects to. It will always get routed to the actual cluster1 master node.

As far as the apps are concerned, discovery ends at GLB, and there is never a need for re-discovery. It’s all on GLB to route the traffic to the correct destination.

How does GLB know which servers to list as backends, and how do we propagate changes to GLB?

Discovery via Consul

Consul is well known as a service discovery solution, and also offers DNS services. In our solution, however, we utilize it as a highly available key-value (KV) store.

Within Consul’s KV store we write the identities of cluster masters. For each cluster, there’s a set of KV entries indicating the cluster’s master fqdn, port, ipv4, ipv6.
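
As an illustration of such a KV write (the kind orchestrator performs on a master change, described further below), here is a minimal Go sketch using HashiCorp's Consul API client. The key names and values are hypothetical; the post does not specify the exact key layout GitHub uses.

package main

import (
	"fmt"
	"log"

	consulapi "github.com/hashicorp/consul/api"
)

func main() {
	// Talk to the local data center's Consul setup via the local agent.
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	kv := client.KV()

	// Hypothetical key layout: one entry per attribute of the cluster's master.
	entries := map[string]string{
		"mysql/master/cluster1/fqdn": "mysql-cluster1-node42.github.net",
		"mysql/master/cluster1/port": "3306",
		"mysql/master/cluster1/ipv4": "10.0.42.7",
	}
	for key, value := range entries {
		if _, err := kv.Put(&consulapi.KVPair{Key: key, Value: []byte(value)}, nil); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("wrote %s = %s\n", key, value)
	}
}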

Each GLB/HAProxy node runs consul-template: a service that listens on changes to Consul data (in our case: changes to clusters masters data). consul-template produces a valid config file and is able to reload HAProxy upon changes to the config.

Thus, a change in Consul to a master’s identity is observed by each GLB/HAProxy box, which then reconfigures itself, sets the new master as the single entity in a cluster’s backend pool, and reloads to reflect those changes.

At GitHub we have a Consul setup in each data center, and each setup is highly available. However, these setups are independent of each other. They do not replicate between each other and do not share any data.

How does Consul get told of changes, and how is the information distributed cross-DC?

orchestrator/raft

We run an orchestrator/raft setup: orchestrator nodes communicate with each other via raft consensus. We have one or two orchestrator nodes per data center.

orchestrator is charged with failure detection, with MySQL failover, and with communicating the change of master to Consul. Failover is operated by the single orchestrator/raft leader node, but the change, the news that a cluster now has a new master, is propagated to all orchestrator nodes through the raft mechanism.

As orchestrator nodes receive the news of a master change, they each communicate to their local Consul setups: they each invoke a KV write. DCs with more than one orchestrator representative will have multiple (identical) writes to Consul.

Putting the flow together

In a master crash scenario:

  • The orchestrator nodes detect failures.
  • The orchestrator/raft leader kicks off a recovery. A new master gets promoted.
  • orchestrator/raft advertises the master change to all raft cluster nodes.
  • Each orchestrator/raft member receives notification of the master change. They each update the local Consul’s KV store with the identity of the new master.
  • Each GLB/HAProxy has consul-template running, which observes the change in Consul’s KV store, and reconfigures and reloads HAProxy.
  • Client traffic gets redirected to the new master.

There is a clear ownership of responsibilities for each component, and the entire design is both decoupled as well as simplified. orchestrator doesn’t know about the load balancers. Consul doesn’t need to know where the information came from. Proxies only care about Consul. Clients only care about the proxy.

Furthermore:

  • There are no DNS changes to propagate.
  • There is no TTL.
  • The flow does not need the dead master’s cooperation. It is largely ignored.

Additional details

To further secure the flow, we also have the following:

  • HAProxy is configured with a very short hard-stop-after. When it reloads with a new backend server in a writer-pool, it automatically terminates any existing connections to the old master.
    • With hard-stop-after we don’t even require cooperation from the clients, and this mitigates a split-brain scenario. It’s noteworthy that this isn’t hermetic, and some time passes before we kill old connections. But there’s then a point in time after which we’re comfortable and expect no nasty surprises.
  • We do not strictly require Consul to be available at all times. In fact, we only need it to be available at failover time. If Consul happens to be down, GLB continues to operate with the last known values and makes no drastic moves.
  • GLB is set to validate the identity of the newly promoted master. Similarly to our context-aware MySQL pools, a check is made on the backend server to confirm it is indeed a writer node (a minimal sketch of such a check follows this list). If we happen to delete the master’s identity in Consul, no problem; the empty entry is ignored. If we mistakenly write the name of a non-master server in Consul, no problem; GLB will refuse to update it and keep running with the last known state.
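
A minimal sketch of the writer check referenced above, assuming the usual convention that replicas run with read_only enabled; the real GLB/HAProxy check may inspect more than this, and the DSN is a placeholder.

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver
)

// isWriter reports whether the given MySQL backend currently accepts writes,
// using the common convention that replicas are started with read_only = 1.
func isWriter(dsn string) (bool, error) {
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		return false, err
	}
	defer db.Close()

	var readOnly int
	if err := db.QueryRow("SELECT @@global.read_only").Scan(&readOnly); err != nil {
		return false, err
	}
	return readOnly == 0, nil
}

func main() {
	// Placeholder DSN for a backend listed in a writer pool.
	ok, err := isWriter("check:secret@tcp(10.0.42.7:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("backend accepts writes:", ok)
}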

We further tackle concerns and pursue HA objectives in the following sections.

orchestrator/raft failure detection

orchestrator uses a holistic approach to detecting failure, and as such it is very reliable. We do not observe false positives: we do not have premature failovers, and thus do not suffer unnecessary outage time.

orchestrator/raft further tackles the case for a complete DC network isolation (aka DC fencing). A DC network isolation can cause confusion: servers within that DC can talk to each other. Is it they that are network isolated from other DCs, or is it other DCs that are being network isolated?

In an orchestrator/raft setup, the raft leader node is the one to run the failovers. A leader is a node that gets the support of the majority of the group (quorum). Our orchestrator node deployment is such that no single data center makes a majority, and any n-1 DCs do.

In the event of a complete DC network isolation, the orchestrator nodes in that DC get disconnected from their peers in other DCs. As a result, the orchestrator nodes in the isolated DC cannot be the leaders of the raft cluster. If any such node did happen to be the leader, it steps down. A new leader will be assigned from any of the other DCs. That leader will have the support of all the other DCs, which are capable of communicating between themselves.

Thus, the orchestrator node that calls the shots will be one that is outside the network isolated data center. Should there be a master in an isolated DC, orchestrator will initiate the failover to replace it with a server in one of the available DCs. We mitigate DC isolation by delegating the decision making to the quorum in the non-isolated DCs.

Quicker advertisement

Total outage time can further be reduced by advertising the master change sooner. How can that be achieved?

When orchestrator begins a failover, it observes the fleet of servers available to be promoted. Understanding replication rules and abiding by hints and limitations, it is able to make an educated decision on the best course of action.

It may recognize that a server available for promotion is also an ideal candidate, such that:

  • There is nothing to prevent the promotion of the server (and potentially the user has hinted that such server is preferred for promotion), and
  • The server is expected to be able to take all of its siblings as replicas.

In such a case orchestrator proceeds to first set the server as writable, and immediately advertises the promotion of the server (writes to Consul KV in our case), even while asynchronously beginning to fix the replication tree, an operation that will typically take a few more seconds.

It is likely that by the time our GLB servers have been fully reloaded, the replication tree is already intact, but it is not strictly required. The server is good to receive writes!

Semi-synchronous replication

In MySQL’s semi-synchronous replication a master does not acknowledge a transaction commit until the change is known to have shipped to one or more replicas. It provides a way to achieve lossless failovers: any change applied on the master is either applied or waiting to be applied on one of the replicas.

Consistency comes with a cost: a risk to availability. Should no replica acknowledge receipt of changes, the master will block and writes will stall. Fortunately, there is a timeout configuration, after which the master can revert back to asynchronous replication mode, making writes available again.

We have set our timeout at a reasonably low value: 500ms. It is more than enough to ship changes from the master to local DC replicas, and typically also to remote DCs. With this timeout we observe perfect semi-sync behavior (no fallback to asynchronous replication), as well as feel comfortable with a very short blocking period in case of acknowledgement failure.
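
As a configuration sketch, and assuming the stock MySQL semisync plugins rather than GitHub's exact build, the behavior described above maps to something like this in my.cnf:

# my.cnf fragment -- illustrative; the timeout is in milliseconds
plugin-load-add = "rpl_semi_sync_master=semisync_master.so"
plugin-load-add = "rpl_semi_sync_slave=semisync_slave.so"
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_master_timeout = 500   # fall back to async after 500ms without an ack
rpl_semi_sync_slave_enabled  = 1     # enabled on the designated local-DC replicas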

We enable semi-sync on local DC replicas, and in the event of a master’s death, we expect (though do not strictly enforce) a lossless failover. Lossless failover on a complete DC failure is costly and we do not expect it.

While experimenting with semi-sync timeout, we also observed a behavior that plays to our advantage: we are able to influence the identity of the ideal candidate in the event of a master failure. By enabling semi-sync on designated servers, and by marking them as candidates, we are able to reduce total outage time by affecting the outcome of a failure. In our experiments we observe that we typically end up with the ideal candidates, and hence run quick advertisements.

Heartbeat injection

Instead of managing the startup/shutdown of the pt-heartbeat service on promoted/demoted masters, we opted to run it everywhere at all times. This required some patching so as to make pt-heartbeat comfortable with servers either changing their read_only state back and forth or completely crashing.

In our current setup pt-heartbeat services run on masters and on replicas. On masters, they generate the heartbeat events. On replicas, they identify that the servers are read-only and routinely recheck their status. As soon as a server is promoted as master, pt-heartbeat on that server identifies the server as writable and begins injecting heartbeat events.
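
A sketch of that invocation, assuming a pt-heartbeat version that has the --check-read-only option (GitHub's patched build may differ), would be roughly:

# Run identically on masters and replicas; database/table names are examples.
# --check-read-only makes the tool skip heartbeat updates while the server is read_only.
pt-heartbeat --update --daemonize \
  --database percona --table heartbeat \
  --check-read-only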

orchestrator ownership delegation

We further delegated to orchestrator:

  • Pseudo-GTID injection,
  • Setting the promoted master as writable, clearing its replication state, and
  • Setting the old master as read_only, if possible.

On all things new-master, this reduces friction. A master that is just being promoted is clearly expected to be alive and accessible, or else we would not promote it. It makes sense, then, to let orchestrator apply changes directly to the promoted master.

Limitations and drawbacks

The proxy layer makes the apps unaware of the master’s identity, but it also masks the apps’ identities from the master. All the master sees are connections coming from the proxy layer, and we lose information about the actual source of the connection.

As distributed systems go, we are still left with unhandled scenarios.

Notably, on a data center isolation scenario, and assuming a master is in the isolated DC, apps in that DC are still able to write to the master. This may result in state inconsistency once network is brought back up. We are working to mitigate this split-brain by implementing a reliable STONITH from within the very isolated DC. As before, some time will pass before bringing down the master, and there could be a short period of split-brain. The operational cost of avoiding split-brains altogether is very high.

More scenarios exist: the outage of Consul at the time of the failover; partial DC isolation; others. We understand that with distributed systems of this nature it is impossible to close all of the loopholes, so we focus on the most important cases.

The results

Our orchestrator/GLB/Consul setup provides us with:

  • Reliable failure detection,
  • Data center agnostic failovers,
  • Typically lossless failovers,
  • Data center network isolation support,
  • Split-brain mitigation (more in the works),
  • No cooperation dependency,
  • Between 10 and 13 seconds of total outage time in most cases.
    • We see up to 20 seconds of total outage time in less frequent cases, and up to 25 seconds in extreme cases.

Conclusion

The orchestration/proxy/service-discovery paradigm uses well known and trusted components in a decoupled architecture, which makes it easier to deploy, operate and observe, and where each component can independently scale up or down. We continue to seek improvements as we continuously test our setup.

A Git protocol parser written in Go

A Git protocol parser written in Go.

This is more like an experimental project to better understand the protocol.

This is not an official Google product (i.e. a 20% project).

Theory Suggests That All Genes Affect Every Complex Trait

The question most of genetics tries to answer is how genes connect to the traits we see. One person has red hair, another blonde hair; one dies at age 30 of Huntington’s disease, another lives to celebrate a 102nd birthday. Knowing what in the vast expanse of the genetic code is behind traits can fuel better treatments and information about future risks and illuminate how biology and evolution work. For some traits, the connection to certain genes is clear: Mutations of a single gene are behind sickle cell anemia, for instance, and mutations in another are behind cystic fibrosis.

But unfortunately for those who like things simple, these conditions are the exceptions. The roots of many traits, from how tall you are to your susceptibility to schizophrenia, are far more tangled. In fact, they may be so complex that almost the entire genome may be involved in some way, an idea formalized in a theory put forward last year.

Starting about 15 years ago, geneticists began to collect DNA from thousands of people who shared traits, to look for clues to each trait’s cause in commonalities between their genomes, a kind of analysis called a genome-wide association study (GWAS). What they found, first, was that you need an enormous number of people to get statistically significant results — one recent GWAS seeking correlations between genetics and insomnia, for instance, included more than a million people. Second, in study after study, even the most significant genetic connections turned out to have surprisingly small effects. The conclusion, sometimes called the polygenic hypothesis, was that multiple loci, or positions in the genome, were likely to be involved in every trait, with each contributing just a small part. (A single large gene can contain several loci, each representing a distinct part of the DNA where mutations make a detectable difference.)

How many loci that “multiple” description might mean was not defined precisely. One very early genetic mapping study in 1999 suggested that “a large number of loci (perhaps > than 15)” might contribute to autism risk, recalled Jonathan Pritchard, now a geneticist at Stanford University. “That’s a lot!” he remembered thinking when the paper came out.

Over the years, however, what scientists might consider “a lot” in this context has quietly inflated. Last June, Pritchard and his Stanford colleagues Evan Boyle and Yang Li (now at the University of Chicago) published a paper about this in Cell that immediately sparked controversy, although it also had many people nodding in cautious agreement. The authors described what they called the “omnigenic” model of complex traits. Drawing on GWAS analyses of three diseases, they concluded that in the cell types that are relevant to a disease, it appears that not 15, not 100, but essentially all genes contribute to the condition. The authors suggested that for some traits, “multiple” loci could mean more than 100,000.

The reaction was swift. “It caused a lot of discussion,” said Barbara Franke, a geneticist at Radboud University in the Netherlands who studies attention deficit hyperactivity disorder (ADHD). “Everywhere you went the omnigenic paper would be discussed.” The Journal of Psychiatry and Brain Science did a special issue just of response papers, some of them taking exception to the name, some saying that after all it was just an expansion of earlier ideas. A year on, however, the study has been cited more than 200 times, by papers whose subjects range from GWAS data to individual receptors. It seems to have encapsulated something many people in the genomics community had been turning over in their minds. But exactly what scientists should do with its insights depends on whom you talk to.

An Infinity of Small Effects

The origin of the idea lies in a very simple observation: When you look at the portions of the genome that GWAS findings have flagged as significant to individual traits, they are eerily well-distributed. Pritchard and his colleagues had been studying loci that contribute to height in humans. “What we realized was that the signal for height was coming from almost the whole genome,” he said. If the genome were a long string of ornamental lights, and every DNA snippet linked to height were illuminated, more than 100,000 lights would be shining all the way down the string. That result contrasted starkly with the general expectation that GWAS findings would be clustered around the most important genes for a trait.

Then, while looking at GWAS analyses of schizophrenia, rheumatoid arthritis and Crohn’s disease, the researchers found something else unexpected. In our current understanding, disease often arises because of malfunctions in key biological pathways. Depending on the disease, this might lead to the overactivation of immune cells, for example, or the underproduction of a hormone. You might expect that the genetic loci incriminated by GWAS would be in genes in that key pathway. And you’d expect those genes would be ones used specifically in the types of cells associated with that disease: immune cells for autoimmune diseases, brain cells for psychiatric disorders, or pancreatic cells for diabetes, for instance.

But when the researchers looked at disease-specific cell types, an enormous number of the regions flagged by GWAS were not in those genes. They were in genes expressed in nearly every cell in the body — genes doing basic maintenance tasks that all cells need. Pritchard and his colleagues suggest that this manifests a truth that is perhaps not always taken literally: Everything in a cell is connected. If incremental disruptions in basic processes can add up to greatly derange a trait, then perhaps nearly every gene expressed in a cell, no matter how seemingly unrelated to the metabolic process of interest, matters.

In its broadest strokes, this idea has been around since 1918, when R. A. Fisher, one of the founders of population genetics, proposed that complex traits could be produced by an infinite number of genes, each with infinitely small effects. But his was a statistical model that didn’t refer to any actual, specific biological conditions. It seems we are now in the era of being able to provide those specifics.

“This was the right paper at the right time,” according to Aravinda Chakravarti, a professor of neuroscience and physiology and director of the Center for Human Genetics and Genomics at New York University, who was a prepublication reviewer of the omnigenics paper in Cell. He and others had noticed many examples of how widely distributed genetic influences could be, he said, but they had not put them together into a coherent thesis. He disagrees with critics who say the paper simply stated the obvious. “The paper clarified many points of view. It didn’t matter if I had thought about it — I had not thought about it hard enough. And I had never heard anybody thinking about it hard enough, with any clarity, [such] that it formed any new hypothesis.”

In the paper, Pritchard and his colleagues proposed that, when geneticists seek what’s responsible for a disease or trait, it may be fruitful to think of the genes in a cell as a network. There may be some very highly connected genes at the center of a disease process, which they dub core genes. Peripheral genes, meanwhile, in aggregate help tip the scales one way or the other. The Cell paper authors suggest that understanding of the core genes will offer the best insights into the mechanism of a disease. Piecing together how peripheral genes contribute, on the other hand, will broaden understanding of why some people develop a disorder and others don’t.

Do Core Genes Exist?

Since the Cell paper’s publication a year ago, scientists’ discussion has circled around whether such a distinction is useful. David Goldstein, a geneticist at Columbia University, is not sure that disease processes must truly be routed through core genes, but he also says that the idea that not everything picked up by GWAS is central and specific to a given disease is important. In the early days of GWAS, he said, when a connection between a genetic locus and a disease was detected, people would take that as a sign that it should be the target of investigation for new treatments, even if the connection was weak.

“Those arguments are all fine — and were — unless something like what Jonathan is describing is going on,” he continued. “That’s a really big deal in terms of our interpretation of GWAS,” because weakly connected loci might then be less useful for getting at the pathology of a disease than people thought.

Yet that may well depend on the disease, according to Naomi Wray, a quantitative geneticist at the University of Queensland who pointed out when scientists first started doing GWAS analyses that they should expect to see many weak associations. A few conditions, she says, are primarily attributable to a small number of identifiable genes, or even just one — yet other genes may still flip the switch between one manifestation of illness and another. She cites the example of Huntington’s disease, a progressive neurological disorder caused by a specific defect in one gene. The age at which it strikes depends on how many repeats of a particular DNA sequence someone has in that gene. But even among patients with the same number of repeats, the age at which symptoms first appear varies, as does the severity with which the disability progresses. Scientists in the field are looking at other loci linked to Huntington’s disease to see how they might be causing the differences.

“These [loci] are by definition in peripheral genes. But they’re actually how the body is responding to this major insult of the core gene,” Wray said.

For most complex conditions and diseases, however, she thinks that the idea of a tiny coterie of identifiable core genes is a red herring because the effects might truly stem from disturbances at innumerable loci — and from the environment — working in concert. In a new paper out in Cell this week, Wray and her colleagues argue that the core gene idea amounts to an unwarranted assumption, and that researchers should simply let the experimental data about particular traits or conditions lead their thinking. (In their paper proposing omnigenics, Pritchard and his co-authors also asked whether the distinction between core and peripheral genes was useful and acknowledged that some diseases might not have them.)

Teasing out the detailed genetics of diseases will therefore continue to require studies on very large numbers of people. Unfortunately, in the past year, Pritchard has been told that some groups applying for funding to do GWAS have been turned down by reviewers citing the omnigenics paper. He feels this reflects a misinterpretation: Omnigenics “explains why GWAS is hard,” he said. “It doesn’t mean we shouldn’t do GWAS.”

Franke, who sees the paper as a provocatively phrased extension of earlier ideas, says that it has nevertheless shaped her thinking in the past year. “It made me rethink what I know about signal transduction — about how messages are relayed in cells — and how functions are fulfilled,” she said. The deeper you look at the workings of a cell, the more you realize that a single common protein may have quite different effects depending on what type of cell it is in: It may bear different messages, or block different processes, so much so that traits that might seem to be quite disconnected begin to change.

“It gave a lot of food for thought,” she said of the paper, “and I think that was the goal.”

A Beginner's Guide to Firewalling with pf

This guide is written for the person very new to firewalling. Please realize that the sample firewall we build should not be considered appropriate for actual use. I just try to cover a few basics that took me a while to grasp from the better known (and more detailed) documentation referenced below.

It's my hope that this guide will not only get you started, but give you enough of a grasp of using pf so that you will then be able to go to those more advanced guides and perfect your firewalling skills.

The pf packet filter was developed for OpenBSD but is now included in FreeBSD, which is where I've used it. Having it run at boot and the like is covered in the various documents, however I'll quickly run through the steps for FreeBSD.

First one adds to /etc/rc.conf

pf_enable="YES"
pf_rules="/etc/pf.conf"
pflog_enable="YES"
pflog_logfile="/var/log/pflog"

This will start the pf filter at boot. We'll cover starting it while running after we put in some rules.

A few precautions

While experimenting with rules on a remote machine, please be careful. Many people have shut themselves out of a remote machine. One simple solution is to open a session in screen or tmux on the machine you're working on. In that session, as root, run something like

sleep 120 && pfctl -d

which will disable pf after 2 minutes. Another, slightly more involved solution is to add a quick cronjob to turn off pf, so that if you do make this mistake, you can get back into the box shortly. Don't do this now, wait till we've made some rules, but let's make a few preparations. Make a quick change to /usr/local/etc/sudoers (that's the location in FreeBSD; it may be elsewhere on other operating systems).

visudo -f /usr/local/etc/sudoers

If your user name is john, then add a line (below any other lines giving you fewer privileges--sudo processes the sudoers file from top to bottom)

john    ALL= NOPASSWD: /sbin/pfctl

Now we edit the crontab.

crontab -e

Then, in your crontab add

*/2 * * * * /usr/local/bin/sudo /sbin/pfctl -d
This has the same effect, to stop pf after 2 minutes. (Change the time as desired.)

I dislike the NOPASSWD option, as it makes it easier to make serious mistakes--however, when I'm testing a bunch of pf rules, I usually put it in there as I'm constantly using pfctl. When done, and the ruleset works, I remove the NOPASSWD part.

Ok, now that we've ensured we'll be able to get back in the box, let's create an /etc/pf.conf file. In FreeBSD, the file already exists. What I usually do is start in my home directory, create the rules, and then test them, loading them with sudo. Then, when it's working the way I want it to work, I back up the default pf.conf and copy mine over.

mv /etc/pf.conf /etc/pf.conf.bak
cp pf.conf /etc

(As we haven't yet created the file, don't do it yet) :).

A simple ruleset

In your home directory, open up your favorite text editor and create a file called pf.conf to start setting your rules. We start with some macros. Macros are similar to variables in programming. A brief example

tcp_pass = "{ 80 22 }"

We've created a macro. Later, when we make rules, rather than having to make one rule for port 80 and a second for port 22 we can simply do something like

pass out proto tcp to port $tcp_pass

(Note that you need the $ in front of the macro's name.)

Then, if we decide we want to add port 123, the Network Time Protocol, rather than write a new rule, we can simply add 123 to our macro. We can also realize we forgot to allow it to get email so we'll add all three, pop3 to get email, smtp to send it and ntp to connect to time servers.

tcp_pass= "{ 80 22 25 110 123 }"

The pf filter will read the rules we create from top to bottom. So, we start with a rule to block everything

block all

This keeps the box pretty secure, but we won't be able to do anything. So, in order to access web pages, allow the Network Time Protocol to connect to a time server and to ssh to an outside machine, we can use the macro we defined. Our pf.conf looks like this.

tcp_pass = "{ 80 22 25 110 123 }"
block all
pass out on fxp0 proto tcp to any port $tcp_pass keep state

Let's look at what we've done so far. First, we defined a macro. We used port numbers. While one can mix and match if they want, for clarity's sake, we try to use the same pattern for all ports. We can also put in commas if we want or the protocol name if it's defined in /etc/services. So, that macro could actually read

tcp_pass = "{ 80 ssh, ntp smtp 110}"

and still work. However, one should strive for consistency.

My personal habit is to have one white space between the left bracket and the first port, and another space between the last port listed and the right bracket. (However, that white space, as shown in the example above, is optional--that is, one can do { 80 or {80 and they will both work.) Also, keeping the ports in numerical order helps you remember what you're doing. So, let's clean up the above example a little bit.
tcp_pass= "{ 22 25 80 110 123 }"

You can only use a name if the port is defined in /etc/services. For instance, if you decide to set ssh to only accept connections on port 1222, you would have to put 1222 in that macro, NOT ssh. Port 1222 is defined in /etc/services as nerv, the SNI R&D network, so if you check your rules with pfctl, it'll show that you have a rule to pass out to nerv. To avoid confusion, if you're going to use a non-standard port, use something that isn't listed in /etc/services.

You can also specify a range of ports with a colon. For instance, if I wanted to add samba, which uses ports 137, 138 and 139, I could have added 137:139.
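
For example (just to illustrate the range syntax, not as part of our working ruleset), the tcp macro with that samba range added would read

tcp_pass = "{ 22 25 80 110 123 137:139 }"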

Next, we block everything as our default. Now, we have to add rules to let things through. Remember, pf processes rules from top to bottom.

So, we are allowing out web traffic, ssh connections, sending mail, getting mail with pop3, and we're able to contact time servers.

So, we start with the action, pass. We're passing, not blocking. The order of syntax is important. I'm just giving the basic options here, again, this article can be considered a prep for more advanced tutorials.

pass out means we're allowing things out, not necessarily in. The next part, on fxp0 refers to the interface--in this case, a network card that in FreeBSD parlance is called fxp0. (An Intel card).

The to any part means that we're allowing it to go anywhere. Next, we specify the ports we're talking about--here we use our macro, tcp_pass, meaning we're allowing to the ports mentioned above.

The keep state part can be important. It means that once we've established the connection, pf is going to keep track of it, so the answer from, for example, a web page, doesn't have to go through checking each rule, but can just be opened. The same with pop3. One uses pop3 to contact a pop3 server (hrrm, obviously), and the server answers. As we have the keep state keywords, the server's answer can go right through once the connection is established.

Many of these are optional. For example, if we write pass on fxp0 rather than pass out on fxp0, then traffic will be allowed in both directions, in and out.

If you look through /etc/services, you'll see that some things, such as ipp, port 631 used with CUPS, use both tcp and udp. To deal with such things, we could insert another macro

udp_pass = "{ 631 }"

We might find that sending email isn't working properly. Checking /etc/services we find that smtp can use udp.

grep smtp /etc/services
smtp             25/tcp    mail         #Simple Mail Transfer
smtp             25/udp    mail         #Simple Mail Transfer
smtps           465/tcp    #smtp protocol over TLS/SSL (was ssmtp)
smtps           465/udp    #smtp protocol over TLS/SSL (was ssmtp)

However, with our new macro, this is simple--we simply add it to udp_pass

Now, let's add the udp_pass macro to our ruleset

tcp_pass = "{ 22 25 80 110 123 }"
udp_pass = "{ 110 631 }"
block all
pass out on fxp0 proto tcp to any port $tcp_pass keep state
pass out on fxp0 proto udp to any port $udp_pass keep state

However, we find CUPS isn't working. A quick grep of ipp in /etc/services shows that it also uses tcp. So, we add 631 to our list of ports in the tcp_pass macro. Now we have

tcp_pass = "{ 22 25 80 110 123 631 }"
udp_pass = "{ 110 631 }"
block all
pass out on fxp0 proto tcp to any port $tcp_pass keep state
pass out on fxp0 proto udp to any port $udp_pass keep state

The same holds true for any other ports that you've forgotten. For instance, these rules don't allow DNS, port 53. We grep domain in /etc/services, and see that it uses both tcp and udp so we add 53 to both macros

tcp_pass = "{ 22 25 53 80 110 123 631 }"
udp_pass = "{ 53 110 631 }"

Using the quick keyword

Packets going in and out are matched against the entire ruleset before being passed or blocked. Sometimes, you want to speed things up and either quickly block or quickly pass a packet.

In such cases, you can use the quick keyword. If a packet matches a line with quick, pf stops going through the rest of the rules and processes the packet immediately. For example, let's say that you have a web server on your LAN behind a firewall, and you are sure that all requests for port 80 are coming from your internal network, so you want to quickly pass them through. (I can't see that being true in real life, but this is for an example of using quick).

pass in quick on fxp0 proto tcp to any port 80 keep state

Put that above the other rules, right after the macro definitions. Now, requests coming in for the LAN webserver will be passed right though (note that in this case it was pass in) without being matched against the rest of the ruleset.

Tables

Tables are useful and fast. They are used to hold a group of addresses. For instance, suppose you want to allow everything in from your local networks, 192.168.8.0 and 192.168.9.0
table <local> { 192.168.8.0/24, 192.168.9.0/24 }

Insert that line above your rules. Now, to allow everything from those two networks (and we'll make use of the quick keyword as well) add a rule. Since we're using quick we want this rule towards the top. If it was at the end, there's no point in using the quick keyword, for the packets would have already been matched against every rule above this one

pass in quick from <local> to any keep state
Sometimes, you might have mistyped something, so it's always good to check your tables.

pfctl -t local -T show

will show you the contents of your table. Note that if you edit your pf.conf, adding a table, and then simply pfctl -d and -e, to disable and re-enable pf, the table rules may not be applied. The way to do it is

pfctl -t local -Tl -f /etc/pf.conf

The -t is for the table name, in this case, local. The -T is used to give various commands. In this case, we are using l for load. That is a lower case letter L, not the numeral one. The -f is for file, in this case, /etc/pf.conf where we have added the table.

We can use pfctl for various useful table commands. We've already mentioned the show command.

pfctl -t mytable -T show

will show you the current working contents of a table called mytable. If you keep mytable in a static file, for example, /etc/pf.mytable, and edit the file, the way to get pf to use it is

pfctl -t mytable -T replace -f /etc/pf.mytable

When using a text file, create a line in pf.conf like

table <mytable> persist file "/etc/pf.mytable"

In this case, it is replacing whatever the former contents of mytable were with the latest contents of mytable.

Tables can be added and deleted on the fly, see the links at the end of this article for more details. To add or delete an address to a table on the fly, one can use, to add the address 192.168.1.115 to a table called mytable

pfctl -t mytable -T add 192.168.1.115

To remove the address

pfctl -t mytable -T delete 192.168.1.115

This all takes effect immediately. As mentioned above, you can confirm that the desired address has been added or deleted with

pfctl -t mytable -T show

Another popular use of tables is to block bruteforce attacks. This can go directly into pf.conf

table <bruteforce>
block in quick from <bruteforce>
pass in inet proto tcp from any to any port ssh \
flags S/SA keep state \
(max-src-conn 10, max-src-conn-rate 10/30, \
overload <bruteforce> flush global)

This will block any IP trying to connect more than 10 times in 30 seconds. You may want much higher numbers, depending upon circumstances. In the Peter Hansteen article that showed me this, he has a max-src-conn at 100 and max-src-conn-rate at 15/5.

Anchors

Like tables, anchors can be added on the fly. However, tables are more for addresses and anchors are for rules. They can be handy while testing rulesets.

For example, I have an anchor to allow me to use ftp. I create a file that reads

pass out proto tcp from any to port 21 keep state
pass out proto tcp from any to port > 1023 keep state
I save it as /etc/ftp-anchor and at the end of my /etc/pf.conf file put

anchor ftpanchor

Now, I want to grab something from ftp.FreeBSD.org. I load the anchor ruleset.

pfctl -a ftpanchor -f /etc/ftp-anchor

I can check that the rules are loaded.

pfctl -a ftpanchor -s rules
The -a is for anchor. In the first command the -f was for the file that we are using. The second command uses -s, as in show, to show the rules. I should get a response to that command showing the anchor ruleset of passing out proto tcp on port 21 and everything above port 1023.

Now I do whatever I had to do. I want to remove those rules.

pfctl -a ftpanchor -F rules
Doing the pfctl -a ftpanchor -s rules should now give me no response. My pf ruleset is back to normal.

One can edit the anchor ruleset and reload and unload it at will without having to restart pf or reload the entire pf.conf ruleset. It's handy for testing various rules. For example, perhaps I also need to allow out udp to do what I want. Rather than adding a line to pf.conf and reloading the entire ruleset, I can edit /etc/ftp-anchor, add a line to allow out udp, then reload the anchor with pfctl -a ftpanchor -f /etc/ftp-anchor again. Once again, when finished, I can flush the anchor rules and my pf ruleset is back to normal. (One doesn't need udp for ftp, but this is just an example.)

If you think you will usually want to have the anchor in place, and only unload it on rare occasions, you would add, below the anchor line

load anchor ftpanchor from "/etc/ftp-anchor"
Now, the anchor will load whenever pf starts.

Anchors are quite flexible. The above gives very simple examples. The more detailed guides listed at the end of this article go into greater detail.

The pfctl command

We've already mentioned the pfctl command. It does a variety of things--as you can guess from its name, it controls pf. It's used to enable, disable, reload and flush rulesets, as well as give status. I'll just cover a few of the main uses here; check the man page for the rest. You might remember that the cronjob we made simply disables the packet filter. That's a simple -d flag:

pfctl -d

To test the syntax of rules without loading them we can use -n
pfctl -n -v -f /etc/pf.conf

The -v is for verbose and the -f for the file.

To put the packet filter we've created into effect, assuming you are user john and you've created it in your home directory, run as john

sudo pfctl -f /home/john/pf.conf

The -f stands for file. Say we've checked it out and we don't like it, so we want to go back to our default rules in /etc/

sudo pfctl -f /etc/pf.conf
It used to be pfctl -Rf /etc/pf.conf, but (last I looked, the man page doesn't mention this) if you do that you'll get a message that you must enable table loading for optimization. So, just use pfctl -f /etc/pf.conf

There are a variety of uses for pfctl. Doing pfctl -s info gives you a quick look at what's going on. Doing pfctl -vs rules shows you your rules and what's happening, for example

pass out proto tcp from any to any port = http keep state
  [ Evaluations: 96        Packets: 906       Bytes: 496407      States: 0     ]
pass out proto tcp from any to any port = pop3 keep state
  [ Evaluations: 96        Packets: 514       Bytes: 71260       States: 0     ]

The man page gives a complete list.

Now it's time to test this. It's early morning, and I'm not thinking that clearly. Hopefully, however, I've remembered to add myself to sudoers as being able to run pfctl without a password, and to quickly set up the cronjob so that if I make a mistake, pf will be disabled shortly. (VERY necessary if you're testing this on a machine to which you don't have physical access.) So, I put these rules, which I've saved into a file pf.conf in my home directory, into operation.

sudo pfctl -f pf.conf

I find that I've locked myself out of the box. Oops. I forgot to allow ssh connections in, and the connection that I was using has just been blocked.

So, I wait a few minutes for the crontab to disable pf and log back into the remote machine. This time, I remember to add a rule

pass in proto tcp to port 22 keep state

Now, these rules seem to be working. So, I copy them over to /etc/pf.conf.

sudo cp pf.conf /etc/
sudo pfctl -f /etc/pf.conf
sudo pfctl -s rules

Listing the rules will show me that it's doing what I want to do.

Now, of course, I remove the cronjob. If you're going to be experimenting with your rules frequently, and will be needing it again, just do crontab -e and comment the line out with a #.

A few odds and ends

Using ftp with pf can be problematic. For detailed explanations of workarounds, see the more detailed guides referenced at the end of this article. In a nutshell, you have to redirect ftp connections to a proxy. The only trick that I don't believe is mentioned in the more detailed guides is that the redirect has to go above your filtering rules. Otherwise, you'll get an error message that rules must be in order: options, normalization, etc.

Most of what is covered in the more advanced guides seems to be aimed at having an ftp server. To simply use an ftp client from your workstation, rather than use the rdr (for redirect) rules, you might simply use an anchor to allow out proto tcp while using ftp. I gave an example in the section about anchors.

There is also the scrub keyword. Again, while this is more fully explained in the references below, in a nutshell it normalizes fragmented packets. The usual line is

scrub in all

This rule must also go above the filtering rules (and above the rdr to redirect ftp if you use it). It is the "normalization" referred to in that error message. At the top of the ruleset you define your macros, tables and the like. Then would come the scrub rule, then the rdr rule, then your filtering rules (the group that in our example begins with block all). Using the scrub keyword can help protect against certain kinds of attacks, and it's a good line to have.

In older versions of FreeBSD (5.something, or perhaps 6.0), you had to add pf to your kernel. In later versions it's not necessary; the module will be loaded if pf is enabled in /etc/rc.conf. Check /usr/src/sys/conf/NOTES to see the various options, but I just had

device          pf
device          pflog
device          pfsync

I also have

options         ALTQ

in my kernel, although I don't make use of queuing; otherwise, one gets a warning every time the pf rules are reloaded that ALTQ isn't enabled in the kernel.

(I'm not sure if this is still needed or not, all the boxes I have still have it in their custom kernels and I've never removed it.)

If you use the module, it assumes the presence of device bpf, INET and INET6 in the kernel. (See the handbook page on pf.)

Logging

Logging is done by adding the log keyword to a rule. First, in /etc/rc.conf, you should have the lines
pflog_enable="YES"
pflog_logfile="/var/log/pf.log"

The log keyword comes after pass or block in or out. For example, we have a rule

pass out proto tcp from any to port 21 keep state

To log it, we would change that line to read

pass out log proto tcp from any to port 21 keep state

You may have to run the command touch /var/log/pf.log, or it may be automatically created. After this is done, restart pf.

The log isn't in readable form; you have to use tcpdump to read it.

tcpdump -n -e -ttt -r /var/log/pf.log

To read it in realtime

tcpdump -n -e -ttt -i pflog0

Nat and redirection

It's pretty easy to use pf for NAT and redirection. One common use I've seen is with a FreeBSD jail server, cloning its lo interface to give said interface a private range of addresses, then using pf to redirect queries on, say, port 80 (and/or 443) to that address. First we clone the interface. In the system's /etc/rc.conf we put
cloned_interfaces="lo1"
ipv4_addrs_lo1="192.168.1.1-9/29"

(That is a lower case L, lowercase letter O, and the numeral one, as in loopback one.) That gives a range from 192.168.1.1 to 192.168.1.9. To do this on the fly, the commands are

ifconfig lo1 create
ifconfig lo1 inet 192.168.1.1/29 

I haven't needed this often enough to figure out the syntax of creating multiple lo1 addresses with ifconfig. If I were doing it on the fly, I would set up rc.conf as mentioned, but use an alias on the command line, e.g.

ifconfig lo1 alias 192.168.1.2/29

(You could also use aliases in /etc/rc.conf instead of the syntax I have--as the saying goes, in Unix there's always more than one way to do something with a corollary that someone thinks your way is stupid, but I digress.)

We'll assume you have a public address of 5.6.7.8 and you will redirect inquiries to your webserver running on cloned lo1 interface 192.168.1.1. Our public network interface will be bce0. We'll assume the reader has built a jail to host their web server. I have a page on jails for readers looking to get started with them.

Now we edit /etc/pf.conf. At the top, we define macros, then the redirect and pass rules, something along these lines (the macro names are just examples):

IP_PUB="5.6.7.8"
WEBJAIL="192.168.1.1"
WEBPORT="{ 80 443 }"

rdr pass on bce0 proto tcp from any to $IP_PUB port $WEBPORT -> $WEBJAIL

pass out on bce0 proto tcp from $WEBJAIL to any keep state
pass out on bce0 proto udp from $WEBJAIL to any keep state

The last two lines allow things from the jail to go out through the public interface, in case that wasn't clear. They may not even be necessary. In some cases, you may need tcp but not udp. The user can adapt it to their needs.

These should all be added above the filtering rules; otherwise, one gets an error that rules must be in order: options, normalization, queueing, translation, filtering.

We can add other jails. For example, if we did give lo1 a range of addresses, we might have 192.168.1.2 be a jail running MySQL. Create a couple of new macros.

SQLJAIL="192.168.1.2"
SQLPORT="{ 3306 }"

Then add

rdr pass on bce0 proto tcp from any to $IP_PUB port $SQLPORT -> $SQLJAIL

(Or, as we expect 3306 to be the only port used, don't bother making a macro for it; just have the rule read from any to $IP_PUB port 3306. As long as the reader understands the syntax, they can adapt it to their needs.)

This page is only meant to be an introduction. However, if you've understood these simple rulesets, you're probably ready to look at the more sophisticated tutorials. The two that I use most frequently are the OpenBSD PF User's Guide and Peter N.M. Hansteen's tutorial. Feel free to email me if you detect any glaring errors. I am no expert on pf, but hope that this brief introduction will make it easier for the novice to grasp its basic concepts.

There have been various, sometimes significant, divergences between OpenBSD's and FreeBSD's pf syntax, so the more recent OpenBSD guides may not work as expected in FreeBSD. To get the older OpenBSD FAQs, you can use cvs. For the 4.1 version use
cvs -d anoncvs@anoncvs1.ca.openbsd.org:/cvs get -D "May 2, 2007" www/faq/pf

To get the 4.5 version

cvs -d anoncvs@anoncvs1.ca.openbsd.org:/cvs get -D "May 2, 2009" www/faq/pf

(Thanks to daemonforums user jggimi for that tip.)

