Channel: Hacker News

Ask HN: Apple maps vs. Google maps today?


I use Google Maps, Apple maps, Bing maps and OpenStreetMap

Some comments:

- Apple maps were pretty awful when they first came out but now seem good enough (and I generally use them for walking directions)

- Google StreetView is great - satellite view where we are in the UK is about 12 years old so useful for a historical view

- Apple and Bing satellite views are good, much more up to date than Google's and often better quality

- Bing in the UK has Ordnance Survey maps down to 1:25,000 which is awesome

- OpenStreetMap often has details (particularly footpaths) that no other map has

Edit: Was impressed with the speed with which Google had the new Queensferry Crossing bridge on their maps - neither Bing nor Apple have this on their maps yet. OpenStreetMap does, of course!

Edit2: I use my car's built-in sat-nav, mainly because of the ergonomics (big buttons) and multiple displays.


I find that in richer countries with more iPhones, Apple Maps is on par with Google Maps. But in places with fewer iPhones, it can go a bit haywire.

Here are a couple of Apple/Google maps comparisons I've noticed and taken screenshots of. The first one is in San Andres (Colombia), the second in East Jerusalem. http://imgur.com/a/QgnS3

I also noticed that Apple Maps would translate street names. In lots of Colombian cities, streets are laid out in a grid, with the streets that go east to west called "Calles" and those that go north to south called "Carreras" (e.g. Calle 69 con Carrera 4).

Apple Maps would translate these to Street 69 and Road 4.


Apple Maps is fine. The rendering is still slightly worse than Google Maps when it comes to putting relevant information in the viewport (street names, primarily.)

But the search ... good grief, the search still sucks donkey balls through a molecular straw. It seems to lack any sense of your local context and location - which is kinda relevant for maps!

e.g. frequently when I used to search for "charing cross" whilst in London, it would offer "Charing Cross, Glasgow" as the default. (That's been fixed now)

There was another time I was searching for something in, IIRC, Dover, UK whilst in Dover, UK and it offered me something in Maryland, USA. Absolute dogshit.

And it just plain doesn't know about real places - searched for Leigh Library the other week (whilst about 400yds away!) which lives on Civic Square (part of its official quoted address) and the only "Civic Square" it offered was in Motherwell (Scotland!)


I've noticed the search is crap too. The other day I searched for a restaurant and it found nothing. Asked for directions anyway and it directed me to the right place so it knew where it was!

I use Apple Maps most of the time, although I keep GMaps installed just in case.


Switzerland here. Apple Maps is still pretty bad compared to Google Maps.

While Apple Maps is indeed usable, Google Maps is just so much better. Recently I searched for a train station like "Bahnhof Flamatt" and Apple Maps gave me something in the US. Another day I searched for something in Bern and Apple Maps gave me results for New Bern in the US. On another day I suddenly got black and white (!) maps in Apple Maps, then I got blurry maps. All in areas that were just fine before.

Edit: typos


Show HN: A simple upload library inspired by flow.js and resumable.js



A JavaScript library providing multiple simultaneous, stable, fault-tolerant and resumable/restartable file uploads via the HTML5 File API.

Forked from flow.js and refactored.

The library is designed to introduce fault-tolerance into the upload of large files through HTTP. This is done by splitting each file into small chunks. Then, whenever the upload of a chunk fails, uploading is retried until the procedure completes. This allows uploads to automatically resume uploading after a network connection is lost either locally or to the server. Additionally, it allows for users to pause, resume and even recover uploads without losing state because only the currently uploading chunks will be aborted, not the entire upload.

Uploader (simple-uploader.js) does not have any external dependencies other than the HTML5 File API. This is relied on for the ability to chunk files into smaller pieces. Currently, this means that support is limited to Firefox 4+, Chrome 11+, Safari 6+ and Internet Explorer 10+.

Samples and examples are available in the samples/ folder. Please push your own as Markdown to help document the project.

New Features

  • Treat Folder and File as Uploader.File

  • Treat Uploader as a root Folder

  • New fileList property which contains files and folders

How can I install it?

Download the latest build from https://github.com/simple-uploader/Uploader/releases/. It contains development and minified production files in the dist/ folder.

or use npm:

npm install simple-uploader.js

or use git clone:

git clone https://github.com/simple-uploader/Uploader

How can I use it?

A new Uploader object is created with information about what to post and where:

var uploader = new Uploader({
  target: '/api/photo/redeem-upload-token',
  query: { upload_token: 'my_token' }
})
// Uploader isn't supported, fall back on a different method
if (!uploader.support) location.href = '/some-old-crappy-uploader'

To allow files to be either selected or drag-dropped, assign a drop target and a DOM item to be clicked for browsing:

uploader.assignBrowse(document.getElementById('browseButton'))
uploader.assignDrop(document.getElementById('dropTarget'))

After this, interaction with Uploader.js is done by listening to events:

uploader.on('fileAdded', function (file, event) {
  console.log(file, event)
})
uploader.on('fileSuccess', function (rootFile, file, message) {
  console.log(rootFile, file, message)
})
uploader.on('fileComplete', function (rootFile) {
  console.log(rootFile)
})
uploader.on('fileError', function (rootFile, file, message) {
  console.log(rootFile, file, message)
})

How do I set it up with my server?

Most of the magic for Uploader.js happens in the user's browser, but files still need to be reassembled from chunks on the server side. This should be a fairly simple task and can be achieved in any web framework or language that is able to receive file uploads.

To handle the state of upload chunks, a number of extra parameters are sent along with all requests:

  • chunkNumber: The index of the chunk in the current upload. First chunk is 1 (no base-0 counting here).
  • totalChunks: The total number of chunks.
  • chunkSize: The general chunk size. Using this value and totalSize you can calculate the total number of chunks. Please note that the size of the data received in the HTTP request might differ from chunkSize for the last chunk of a file (see the chunkSize and forceChunkSize options).
  • totalSize: The total file size.
  • identifier: A unique identifier for the file contained in the request.
  • filename: The original file name (since a bug in Firefox results in the file name not being transmitted in chunk multipart posts).
  • relativePath: The file's relative path when selecting a directory (defaults to file name in all browsers except Chrome).

You should allow for the same chunk to be uploaded more than once; this isn't standard behaviour, but on an unstable network environment it could happen, and this case is exactly what Uploader.js is designed for.
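As a rough illustration of the server-side bookkeeping described above, the sketch below tracks received chunks per identifier in memory and tolerates the same chunk arriving twice. The ChunkStore name and its methods are hypothetical and not part of Uploader.js; chunk data is modelled as plain strings for brevity, whereas a real server would parse the multipart request with its framework and write chunks to disk.

```javascript
// Hypothetical in-memory store, for illustration only (not part of Uploader.js).
function ChunkStore() {
  this.uploads = new Map() // identifier -> Map(chunkNumber -> data)
}

// Record one chunk. `params` mirrors the request parameters listed above.
// Returns true once every chunk from 1..totalChunks has been received.
ChunkStore.prototype.receive = function (params, data) {
  var chunkNumber = Number(params.chunkNumber)
  var totalChunks = Number(params.totalChunks)
  if (!(chunkNumber >= 1 && chunkNumber <= totalChunks)) {
    throw new RangeError('chunkNumber out of range: ' + params.chunkNumber)
  }
  if (!this.uploads.has(params.identifier)) {
    this.uploads.set(params.identifier, new Map())
  }
  var chunks = this.uploads.get(params.identifier)
  // A repeated chunk simply overwrites the earlier copy.
  chunks.set(chunkNumber, data)
  return chunks.size === totalChunks
}

// Concatenate the chunks in order (chunk numbers start at 1).
ChunkStore.prototype.assemble = function (identifier, totalChunks) {
  var chunks = this.uploads.get(identifier)
  var parts = []
  for (var n = 1; n <= totalChunks; n++) {
    if (!chunks || !chunks.has(n)) throw new Error('missing chunk ' + n)
    parts.push(chunks.get(n))
  }
  return parts.join('')
}
```

Keying the inner map by chunkNumber is what makes duplicate uploads harmless: chunks can arrive out of order or more than once without changing the assembled result.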

For every request, you can confirm reception in HTTP status codes (can be changed through the permanentErrors option):

  • 200, 201, 202: The chunk was accepted and correct. No need to re-upload.
  • 404, 415, 500, 501: The file for which the chunk was uploaded is not supported, cancel the entire upload.
  • Anything else: Something went wrong, but try reuploading the file.
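The three outcomes above amount to a small lookup. The sketch below shows one way to express it; classifyStatus and its return labels are hypothetical names chosen for illustration, and the default lists are the documented defaults of the successStatuses and permanentErrors options.

```javascript
// Documented defaults for the successStatuses and permanentErrors options.
var DEFAULT_SUCCESS_STATUSES = [200, 201, 202]
var DEFAULT_PERMANENT_ERRORS = [404, 415, 500, 501]

// Hypothetical helper: map an HTTP status to one of the three outcomes.
function classifyStatus(status, successStatuses, permanentErrors) {
  successStatuses = successStatuses || DEFAULT_SUCCESS_STATUSES
  permanentErrors = permanentErrors || DEFAULT_PERMANENT_ERRORS
  if (successStatuses.indexOf(status) !== -1) return 'success' // chunk accepted
  if (permanentErrors.indexOf(status) !== -1) return 'error'   // cancel the upload
  return 'retry'                                               // try the chunk again
}
```

For example, classifyStatus(503) falls through to 'retry', which is why a temporarily overloaded server should avoid answering with 500 unless that status is removed from permanentErrors.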

Handling GET (or test()) requests

Enabling the testChunks option will allow uploads to be resumed after browser restarts and even across browsers (in theory you could even run the same file upload across multiple tabs or different browsers). The POST data requests listed are required to use Uploader.js to receive data, but you can extend support by implementing a corresponding GET request with the same parameters:

  • If this request returns a 200, 201 or 202 HTTP code, the chunk is assumed to have been completed.
  • If the request returns a permanent error status, upload is stopped.
  • If request returns anything else, the chunk will be uploaded in the standard fashion.

After this is done and testChunks is enabled, an upload can quickly catch up even after a browser restart by simply verifying which chunks have already been uploaded and do not need to be uploaded again.
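On the server, answering a test request can be as small as a set lookup. The sketch below assumes the server keeps a Set of chunk numbers already stored for each identifier; testChunkStatus is a hypothetical helper, and 204 is just one convenient choice for "not received yet", since any status outside the success and permanent-error lists makes the client upload the chunk normally.

```javascript
// Hypothetical helper for the GET test request.
// receivedChunks: a Set of chunk numbers already stored for this identifier.
function testChunkStatus(receivedChunks, chunkNumber) {
  // Query parameters arrive as strings, so normalise before the lookup.
  return receivedChunks.has(Number(chunkNumber)) ? 200 : 204
}
```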

Full documentation

Uploader

Configuration

The object is loaded with configuration options:

var r = new Uploader({ opt1: 'val', ... })

Available configuration options are:

  • target The target URL for the multipart POST request. This can be a string or a function. If a function, it will be passed an Uploader.File, an Uploader.Chunk and an isTest boolean (Default: /)
  • singleFile Enable single file upload. Once one file is uploaded, a second file will replace the existing one, and the first one will be canceled. (Default: false)
  • chunkSize The size in bytes of each uploaded chunk of data. The last uploaded chunk will be at least this size and up to twice this size, see Issue #51 for details and reasons. (Default: 1*1024*1024)
  • forceChunkSize Force all chunks to be less than or equal to chunkSize. Otherwise, the last chunk will be greater than or equal to chunkSize. (Default: false)
  • simultaneousUploads Number of simultaneous uploads (Default: 3)
  • fileParameterName The name of the multipart POST parameter to use for the file chunk (Default: file)
  • query Extra parameters to include in the multipart POST with data. This can be an object or a function. If a function, it will be passed an Uploader.File, an Uploader.Chunk object and an isTest boolean (Default: {})
  • headers Extra headers to include in the multipart POST with data. If a function, it will be passed an Uploader.File, an Uploader.Chunk object and an isTest boolean (Default: {})
  • withCredentials Standard CORS requests do not send or set any cookies by default. In order to include cookies as part of the request, you need to set the withCredentials property to true. (Default: false)
  • method Method to use when POSTing chunks to the server (multipart or octet) (Default: multipart)
  • testMethod HTTP method to use when chunks are being tested. If set to a function, it will be passed an Uploader.File and an Uploader.Chunk as arguments. (Default: GET)
  • uploadMethod HTTP method to use when chunks are being uploaded. If set to a function, it will be passed an Uploader.File as its argument. (Default: POST)
  • allowDuplicateUploads Once a file is uploaded, allow reupload of the same file. By default, if a file is already uploaded, it will be skipped unless the file is removed from the existing Uploader object. (Default: false)
  • prioritizeFirstAndLastChunk Prioritize first and last chunks of all files. This can be handy if you can determine if a file is valid for your service from only the first or last chunk. For example, photo or video meta data is usually located in the first part of a file, making it easy to test support from only the first chunk. (Default: false)
  • testChunks Make a GET request to the server for each chunk to see if it already exists. If implemented on the server-side, this will allow for upload resumes even after a browser crash or even a computer restart. (Default: true)
  • preprocess Optional function to process each chunk before testing & sending. The function is passed the chunk as a parameter and should call the preprocessFinished method on the chunk when finished. (Default: null)
  • initFileFn Optional function to initialize the fileObject. The function is passed an Uploader.File as its argument.
  • readFileFn Optional function wrapping the read operation on the original file. The function is passed the Uploader.File, the startByte and endByte, the fileType and the Uploader.Chunk.
  • generateUniqueIdentifier Override the function that generates unique identifiers for each file. (Default: null)
  • maxChunkRetries The maximum number of retries for a chunk before the upload is failed. Valid values are any positive integer and undefined for no limit. (Default: 0)
  • chunkRetryInterval The number of milliseconds to wait before retrying a chunk on a non-permanent error. Valid values are any positive integer and undefined for immediate retry. (Default: undefined)
  • progressCallbacksInterval The time interval in milliseconds between progress reports. Set it to 0 to handle each progress callback. (Default: 500)
  • speedSmoothingFactor Used for calculating average upload speed. A number between 0 and 1. Set it to 1 and the average upload speed will be equal to the current upload speed. For longer file uploads it is better to set this number to 0.02, because the time remaining estimation will be more accurate. This parameter must be adjusted together with the progressCallbacksInterval parameter. (Default: 0.1)
  • successStatuses Response is success if response status is in this list (Default: [200,201, 202])
  • permanentErrors Response fails if response status is in this list (Default: [404, 415, 500, 501])
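To make the interaction between chunkSize and forceChunkSize concrete, here is a minimal sketch of how byte ranges could be derived from those two options. The chunkRanges function is hypothetical and only mirrors the behaviour described above (rounding the chunk count down by default so the last chunk absorbs the remainder, rounding up when forceChunkSize is set); the library's internals may differ.

```javascript
// Hypothetical helper: compute [start, end) byte ranges for each chunk.
function chunkRanges(fileSize, chunkSize, forceChunkSize) {
  // Rounding down lets the last chunk grow to between 1x and 2x chunkSize;
  // rounding up keeps every chunk at or below chunkSize.
  var round = forceChunkSize ? Math.ceil : Math.floor
  var count = Math.max(round(fileSize / chunkSize), 1) // always at least one chunk
  var ranges = []
  for (var i = 0; i < count; i++) {
    var start = i * chunkSize
    // The final chunk always runs to the end of the file.
    var end = (i === count - 1) ? fileSize : start + chunkSize
    ranges.push([start, end])
  }
  return ranges
}
```

With a 25-byte file and a 10-byte chunkSize, the default yields two chunks (10 and 15 bytes), while forceChunkSize yields three (10, 10 and 5 bytes). A file smaller than chunkSize is sent as a single short chunk in either mode.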

Properties

  • .support A boolean value indicating whether or not Uploader.js is supported by the current browser.
  • .supportDirectory A boolean value, which indicates if browser supports directory uploads.
  • .opts A hash object of the configuration of the Uploader.js instance.
  • .files An array of Uploader.File file objects added by the user (see full docs for this object type below).
  • .fileList An array of Uploader.File file (or folder) objects added by the user (see full docs for this object type below). Unlike .files, each folder is represented as an Uploader.File object.

Methods

  • .assignBrowse(domNodes, isDirectory, singleFile, attributes) Assign a browse action to one or more DOM nodes.

    • domNodes array of dom nodes or a single node.
    • isDirectory Pass in true to allow directories to be selected (Chrome only, support can be checked with supportDirectory property).
    • singleFile To prevent multiple file uploads set this to true. Also look at config parameter singleFile.
    • attributes Pass object of keys and values to set custom attributes on input fields. For example, you can set accept attribute to image/*. This means that user will be able to select only images. Full list of attributes: https://www.w3.org/wiki/HTML/Elements/input/file

    Note: avoid using a and button tags as file upload buttons, use span instead.

  • .assignDrop(domNodes) Assign one or more DOM nodes as a drop target.

  • .unAssignDrop(domNodes) Unassign one or more DOM nodes as a drop target.

  • .on(event, callback) Listen for event from Uploader.js (see below)

  • .off([event, [callback]]):

    • .off(event) Remove all callbacks of specific event.
    • .off(event, callback) Remove specific callback of event. callback should be a Function.
  • .upload() Start or resume uploading.

  • .pause() Pause uploading.

  • .resume() Resume uploading.

  • .cancel() Cancel upload of all Uploader.File objects and remove them from the list.

  • .progress() Returns a float between 0 and 1 indicating the current upload progress of all files.

  • .isUploading() Returns a boolean indicating whether or not the instance is currently uploading anything.

  • .addFile(file) Add a HTML5 File object to the list of files.

  • .removeFile(file) Cancel upload of a specific Uploader.File object and remove it from the list.

  • .getFromUniqueIdentifier(uniqueIdentifier) Look up a Uploader.File object by its unique identifier.

  • .getSize() Returns the total size of the upload in bytes.

  • .sizeUploaded() Returns the total size uploaded of all files in bytes.

  • .timeRemaining() Returns remaining time to upload all files in seconds. Accuracy is based on average speed. If speed is zero, time remaining will be equal to positive infinity (Number.POSITIVE_INFINITY).
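The description of speedSmoothingFactor (a factor of 1 makes the average equal the current speed) is consistent with an exponential moving average, and timeRemaining() is described as depending on that average. The sketch below shows that reading; smoothSpeed and estimateTimeRemaining are hypothetical names, and the exact formula the library uses is an assumption here.

```javascript
// Hypothetical helper: exponential moving average of upload speed (bytes/second).
// A smoothingFactor of 1 returns currentSpeed; smaller values react more slowly.
function smoothSpeed(averageSpeed, currentSpeed, smoothingFactor) {
  return smoothingFactor * currentSpeed + (1 - smoothingFactor) * averageSpeed
}

// Hypothetical helper: whole seconds left at the given average speed.
function estimateTimeRemaining(totalSize, sizeUploaded, averageSpeed) {
  // With no measured speed the estimate is unbounded, as documented above.
  if (!averageSpeed) return Number.POSITIVE_INFINITY
  return Math.floor((totalSize - sizeUploaded) / averageSpeed)
}
```

A small factor such as 0.02 means each progress report nudges the average only slightly, which is why the estimate is steadier on long uploads and why the factor needs tuning together with progressCallbacksInterval.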

Events

  • .fileSuccess(rootFile, file, message, chunk) A specific file was completed. The first argument, rootFile, is the root Uploader.File instance that contains or is the completed file. The second argument, file, is also an Uploader.File instance: the file that just completed. The third argument, message, contains the server response, which is always a string. The fourth argument, chunk, is an Uploader.Chunk instance; you can get the response status from its xhr object via chunk.xhr.status.
  • .fileComplete(rootFile) A root file(Folder) was completed.
  • .fileProgress(rootFile, file, chunk) Uploading progressed for a specific file.
  • .fileAdded(file, event) This event is used for file validation. To reject the file, return false. The event fires before the file is added to the upload queue, so calling uploader.upload() at this point will not start the upload of the current file. Optionally, you can use the browser event object from when the file was added.
  • .filesAdded(files, fileList, event) Same as fileAdded, but used for multiple file validation.
  • .filesSubmitted(files, fileList, event) Same as filesAdded, but happens after the file is added to upload queue. Can be used to start upload of currently added files.
  • .fileRemoved(file) The specific file was removed from the upload queue. Combined with filesSubmitted, can be used to notify UI to update its state to match the upload queue.
  • .fileRetry(rootFile, file, chunk) Something went wrong during upload of a specific file, uploading is being retried.
  • .fileError(rootFile, file, message, chunk) An error occurred during upload of a specific file.
  • .uploadStart() Upload has been started.
  • .complete() Uploading completed.
  • .catchAll(event, ...) Listen to all the events listed above with the same callback function.
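Since fileAdded rejects a file when the handler returns false, validation can live in a plain function. The sketch below is one possible validator; validateFile, ALLOWED_EXTENSIONS and MAX_SIZE are hypothetical names and values, and it relies only on the .isFolder and .size properties and the .getExtension() method documented for Uploader.File.

```javascript
// Hypothetical limits, chosen for illustration only.
var ALLOWED_EXTENSIONS = ['jpg', 'png', 'gif']
var MAX_SIZE = 10 * 1024 * 1024 // 10 MiB

// Returns false to reject a file, true to accept it.
function validateFile(file) {
  // Folders do not carry .size, so this sketch lets them through unchecked.
  if (file.isFolder) return true
  if (file.size > MAX_SIZE) return false
  // getExtension() is documented to return the extension in lowercase.
  return ALLOWED_EXTENSIONS.indexOf(file.getExtension()) !== -1
}

// Usage, assuming an `uploader` instance created as shown earlier:
// uploader.on('fileAdded', validateFile)
```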

Uploader.File

Properties

  • .uploader A back-reference to the parent Uploader object.
  • .name The name of the file(folder).
  • .averageSpeed Average upload speed, bytes per second.
  • .currentSpeed Current upload speed, bytes per second.
  • .paused Indicates whether the file (or folder) is paused.
  • .error Indicates whether the file (or folder) has encountered an error.
  • .isFolder Indicates whether the file (or folder) is a directory.

If .isFolder is false then these properties will be added:

  • .file The correlating HTML5 File object.
  • .relativePath The relative path to the file (defaults to file name if relative path doesn't exist).
  • .size Size in bytes of the file.
  • .uniqueIdentifier A unique identifier assigned to this file object. This value is included in uploads to the server for reference, but can also be used in CSS classes etc when building your upload UI.
  • .chunks An array of Uploader.Chunk items. You shouldn't need to dig into these.

Methods

  • .getRoot() Returns the file's root Uploader.File instance in uploader.fileList.
  • .progress() Returns a float between 0 and 1 indicating the current upload progress of the file.
  • .pause() Pause uploading the file.
  • .resume() Resume uploading the file.
  • .cancel() Abort uploading the file and delete it from the list of files to upload.
  • .retry() Retry uploading the file.
  • .bootstrap() Rebuild the state of a Uploader.File object, including reassigning chunks and XMLHttpRequest instances.
  • .isUploading() Returns a boolean indicating whether any of the file's chunks are uploading.
  • .isComplete() Returns a boolean indicating whether the file has completed uploading and received a server response.
  • .sizeUploaded() Returns size uploaded in bytes.
  • .timeRemaining() Returns remaining time to finish uploading the file in seconds. Accuracy is based on average speed. If speed is zero, time remaining will be equal to positive infinity (Number.POSITIVE_INFINITY).
  • .getExtension() Returns file extension in lowercase.
  • .getType() Returns file type.

Origin

Uploader.js was inspired by and evolved from https://github.com/flowjs/flow.js and https://github.com/23/resumable.js.

Why Women Had Better Sex Under Socialism


Ms. Durcheva was a single mother for many years, but she insisted that her life before 1989 was more gratifying than the stressful existence of her daughter, who was born in the late 1970s.

“All she does is work and work,” Ms. Durcheva told me in 2013, “and when she comes home at night she is too tired to be with her husband. But it doesn’t matter, because he is tired, too. They sit together in front of the television like zombies. When I was her age, we had much more fun.”

Last year in Jena, a university town in the former East Germany, I spoke with a recently married 30-something named Daniela Gruber. Her own mother — born and raised under the Communist system — was putting pressure on Ms. Gruber to have a baby.

“She doesn’t understand how much harder it is now — it was so easy for women before the Wall fell,” she told me, referring to the dismantling of the Berlin Wall in 1989. “They had kindergartens and crèches, and they could take maternity leave and have their jobs held for them. I work contract to contract, and don’t have time to get pregnant.”

This generational divide between daughters and mothers who reached adulthood on either side of 1989 supports the idea that women had more fulfilling lives during the Communist era. And they owed this quality of life, in part, to the fact that these regimes saw women’s emancipation as central to advanced “scientific socialist” societies, as they saw themselves.

Although East European Communist states needed women’s labor to realize their programs for rapid industrialization after World War II, the ideological foundation for women’s equality with men was laid by August Bebel and Friedrich Engels in the 19th century. After the Bolshevik takeover, Vladimir Lenin and Aleksandra Kollontai enabled a sexual revolution in the early years of the Soviet Union, with Kollontai arguing that love should be freed from economic considerations.

Russia extended full suffrage to women in 1917, three years before the United States did. The Bolsheviks also liberalized divorce laws, guaranteed reproductive rights and attempted to socialize domestic labor by investing in public laundries and people’s canteens. Women were mobilized into the labor force and became financially untethered from men.

In Central Asia in the 1920s, Russian women crusaded for the liberation of Muslim women. This top-down campaign met a violent backlash from local patriarchs not keen to see their sisters, wives and daughters freed from the shackles of tradition.

In the 1930s, Joseph Stalin reversed much of the Soviet Union’s early progress in women’s rights — outlawing abortion and promoting the nuclear family. However, the acute male labor shortages that followed World War II spurred other Communist governments to push forward with various programs for women’s emancipation, including state-sponsored research on the mysteries of female sexuality. Most Eastern European women could not travel to the West or read a free press, but scientific socialism did come with some benefits.

“As early as 1952, Czechoslovak sexologists started doing research on the female orgasm, and in 1961 they held a conference solely devoted to the topic,” Katerina Liskova, a professor at Masaryk University in the Czech Republic, told me. “They focused on the importance of the equality between men and women as a core component of female pleasure. Some even argued that men need to share housework and child rearing, otherwise there would be no good sex.”

Agnieszka Koscianska, an associate professor of anthropology at the University of Warsaw, told me that pre-1989 Polish sexologists “didn’t limit sex to bodily experiences and stressed the importance of social and cultural contexts for sexual pleasure.” It was state socialism’s answer to work-life balance: “Even the best stimulation, they argued, will not help to achieve pleasure if a woman is stressed or overworked, worried about her future and financial stability.”

In all the Warsaw Pact countries, the imposition of one-party rule precipitated a sweeping overhaul of laws regarding the family. Communists invested major resources in the education and training of women and in guaranteeing their employment. State-run women’s committees sought to re-educate boys to accept girls as full comrades, and they attempted to convince their compatriots that male chauvinism was a remnant of the pre-socialist past.

Although gender wage disparities and labor segregation persisted, and although the Communists never fully reformed domestic patriarchy, Communist women enjoyed a degree of self-sufficiency that few Western women could have imagined. Eastern bloc women did not need to marry, or have sex, for money. The socialist state met their basic needs and countries such as Bulgaria, Poland, Hungary, Czechoslovakia and East Germany committed extra resources to support single mothers, divorcées and widows. With the noted exceptions of Romania, Albania and Stalin’s Soviet Union, most Eastern European countries guaranteed access to sex education and abortion. This reduced the social costs of accidental pregnancy and lowered the opportunity costs of becoming a mother.

Some liberal feminists in the West grudgingly acknowledged those accomplishments but were critical of the achievements of state socialism because they did not emerge from independent women’s movements, but represented a type of emancipation from above. Many academic feminists today celebrate choice but also embrace a cultural relativism dictated by the imperatives of intersectionality. Any top-down political program that seeks to impose a universalist set of values like equal rights for women is seriously out of fashion.

The result, unfortunately, has been that many of the advances of women’s liberation in the former Warsaw Pact countries have been lost or reversed. Ms. Durcheva’s adult daughter and the younger Ms. Gruber now struggle to resolve the work-life problems that Communist governments had once solved for their mothers.

“The Republic gave me my freedom,” Ms. Durcheva once told me, referring to the People’s Republic of Bulgaria. “Democracy took some of that freedom away.”

As for Ms. Gruber, she has no illusions about the brutalities of East German Communism; she just wishes “things weren’t so much harder now.”

Because they championed sexual equality — at work, at home and in the bedroom — and were willing to enforce it, Communist women who occupied positions in the state apparatus could be called cultural imperialists. But the liberation they imposed radically transformed millions of lives across the globe, including those of many women who still walk among us as the mothers and grandmothers of adults in the now democratic member states of the European Union. Those comrades’ insistence on government intervention may seem heavy-handed to our postmodern sensibilities, but sometimes necessary social change — which soon comes to be seen as the natural order of things — needs an emancipation proclamation from above.

Correction: August 20, 2017

An opinion essay last week about Eastern European women’s lives under Communism misattributed responsibility for enacting women’s suffrage in Russia in 1917. It was achieved under the provisional government in July, not by the Bolsheviks, who did not seize power until November.


Hot New ‘Anonymous’ Chat App Sarahah Hijacks Millions of Contact Data


This blog post was authored by Senior Security Analyst Zach Julian; you can connect with him on Twitter here.

By now, you may have heard about Sarahah, the new anonymous chat application that’s gone viral around the world.

Sarahah, available for Android, iOS and via the web, allows users to send and receive anonymous messages.  The app has received widespread media attention online, and now boasts a user base of between 10 and 50 million users on Android alone, according to the Google Play Store.

The app has raised concerns around cyber-bullying, but that’s only a small part of the dangers of downloading and installing Sarahah.

Both the Android and iOS applications contain functionality to send every phone number, email address, and associated names on a device to Sarahah’s servers. Exactly how this happens depends on your phone.

Upon logging into the app, Sarahah will attempt to send all phone and email contacts outbound. On iOS and Android 6+, the operating system will prompt the user before allowing access to the phone’s contacts. Phones running Android 5 and below, of which there is still a significant market share, will have no further prompt about accessing contacts beyond the Play Store permissions during installation. It’s likely that most users permit access to their contacts without considering how this data may be used.

While it’s not uncommon for mobile applications to upload your contacts as part of a ‘find your friends’ feature, Sarahah has no such functionality. The creator of Sarahah has replied that this was planned for future implementation, that no contact data is stored, and that the application will not upload contacts in the next update.

Sarahah on Android

Immediately upon logging into the Android application (or after a period of inactivity on the app), the Sarahah client makes two POST requests to www.sarahah.com, which contain the Android device’s phone and email contact details.

On Android 5 and below, these requests will be issued silently and without user interaction. With an estimated 54% of users running Android 5 and below, this is probably a substantial share of Sarahah’s 10 to 50 million Android users.

Android 6 introduced permissions changes, so Android 6+ will prompt the user for access to the contacts, as shown below:

Sarahah’s permission prompt on its Android version.

Upon pressing “Allow”, all phone and email contacts will be uploaded to Sarahah. The address book on my phone consists of 164 contacts. Extrapolating this by 10 to 50 million users on Android alone means it’s possible Sarahah has harvested hundreds of millions of names, phone numbers, and email addresses from their users. Overall, Sarahah does not provide enough information for users to make an informed decision whether using the application is worth sharing this sensitive data.

Sarahah on iOS

iOS offered more protection against this data leak, explicitly prompting whether to allow the application access to the phone’s contacts. In the prompt, the application states:

Sarahah’s permission prompt on its iOS version.

If the user presses “OK”, all phone and email contacts will be transmitted to Sarahah in the same manner as on Android. After reviewing the application on iOS and Android, I was unable to find any functionality that would require access to “contacts to show you who has an account in Sarahah.” Unfortunately, it’s probably safe to assume that the majority of users on both Android and iOS simply approve access to their contacts.

How This Works

The contact-harvesting functionality can be seen in the video below. The video begins by authenticating to Sarahah on Android. After logging in and retrieving some relevant account details, the two POST requests are made, transmitting my device’s phone and email contacts respectively.

You’re Not as Anonymous As You Think.

Sarahah, on both Android and iOS, does not provide users enough information on how their phone’s contact details will be used. While the company claims that this functionality is part of a future release and that “the Sarahah database doesn’t currently hold a single contact”, unfortunately all we have is the company’s word.

With at least tens of millions of installs, consider how many phone numbers, names, and email addresses Sarahah has potentially harvested. Even names, numbers, and email addresses alone may be sensitive data for some users.

 

The Art of Philosophy: Visualising Aristotle in Early 17th-Century Paris


With their elaborate interplay of image and text, the several large-scale prints designed by the French friar Martin Meurisse to communicate Aristotelian thought are wonderfully impressive creations. Susanna Berger explores the function of these complex works, and how such visual commentaries not only served to express philosophical ideas in a novel way but also engendered their own unique mode of thinking.

Artificiosa totius logices descriptio
Meurisse and Gaultier’s Artificiosa totius logices descriptio, 1614 — Source.

In 1619, Martin Meurisse (1584–1644), a Franciscan professor of philosophy at the Grand Couvent des Cordeliers in Paris, became embroiled in a debate with the Protestant pastor François Oyseau (1545–1625) about the significance of the rituals of the mass. In the heat of the exchange, and riled by an accusation that he was a poor logician, Oyseau decried Meurisse to be an incompetent judge as he was “a logician only in picturing and copper-plate engraving.”1 Oyseau’s barb was alluding to a series of extravagantly engraved thesis prints, broadsides incorporating both text and image, that Meurisse had designed for his philosophy students to use during their oral examinations. Although for Oyseau they were mere “frivolous allegories”, these pedagogical prints are now considered among the most important early modern images of philosophy, and their inventive iconography inspired new visualizations of thought in a range of drawn and printed sources: from the lecture notebooks of students in Leuven to eighteenth-century German textbooks.

Produced in collaboration with the engraver Léonard Gaultier (1560/61–1635) and published by Jean Messager (1572–1649), Meurisse’s prints are wonderfully dense and complex creations — elaborate depictions of landscapes and architectural structures adorned with a dizzying array of figures, animals, and objects, and annotated with quotations from the writings of classical and scholastic philosophers. The first, a summary of logic entitled Artificiosa totius logices descriptio (Artificial Description of Logic in Its Entirety), appeared in 1614 (see image above). The following year, Meurisse and Gaultier produced the Clara totius physiologiae synopsis (Clear Synopsis of Physics in Its Entirety), which visualizes Aristotelian natural philosophy. Their third, the Laurus metaphysica (Laurel of Metaphysics) of 1616, represents metaphysics, and their fourth, Tableau industrieux de toute la philosophie morale (Artificial Table of Moral Philosophy in Its Entirety), of 1618, depicts moral philosophy. In addition, a fifth thesis print, entitled Typus necessitatis logicae ad alias scientias capessendas (Scheme of the Necessity of Logic for Grasping the Other Branches of Knowledge), inspired by the Descriptio, appeared some years later in 1622. Also engraved by Gaultier and published by Messager, this broadside was designed by the Carmelite philosophy professor Jean Chéron (1596–1673).2

Clara totius physiologiae synopsis
Meurisse and Gaultier’s Clara totius physiologiae synopsis, 1615 — Source.
Laurus metaphysica
Meurisse and Gaultier’s Laurus metaphysica, 1616 — Source.

All of these broadsides are extremely impressive in scale: not only physically as regards size, but also in relation to the tremendous amount of work and close collaboration among the wide network of scholars, artisans, engravers, patrons, and printers involved. The effort and significant cost expended to create these and other philosophical images attest to how highly such prints were valued in the study and transmission of philosophy. Although the broadsides of Meurisse, Chéron, and Gaultier were relatively little known through the seventeenth and eighteenth centuries, for a handful of academics across Europe they had a deep impact on the philosophy classroom. Descriptio, Laurus metaphysica, and Tableau were reproduced and translated into English by Richard Dey, a graduate of the University of Cambridge, in mid-seventeenth-century London, while a copy of Meurisse’s Synopsis was displayed at the anatomy theater of the University of Leiden by Ottho van Heurne (1577–1652), professor of medicine.3 Meurisse’s acclaim as a designer of illustrated broadsides was also reported in the Hungarian travelogue Europica varietas (1620) by Márton Szepsi Csombor (1594–1623) who, while visiting Paris in 1618, was “anxious above all else to become acquainted with the celebrated, renowned, and highly intelligent friar, who with great mastery put the entire philosophy course on a[n engraver’s] plate.”4

These thesis prints’ kaleidoscopic mobile of images and texts visualizes something more than just harmonious principles of philosophy — in a way, it mirrors the early modern European mind itself as it produced and transmitted knowledge. For the knowledge generators of this time, the viewing and creation of imagery functioned as important instruments of philosophical thought and teaching. There were two particularly important mechanisms at play here. First, artists, students, and philosophers used the space of the page to map theoretical relationships, so that in creating and looking at these visual representations, they could think through the mechanism of spatial constructs. Second, through producing or simply examining figurative images, they could think through the mechanism of visual commentary. Both spatial constructs and visual commentaries worked together as part of a common project of philosophical thinking through visual representation.

Logic instruction at this time was based primarily on a collection of texts by Aristotle known as the Organon and on Porphyry’s third-century text, Isagoge, which served as a preface to the Organon.5 The teaching of these treatises reflected the view that logic should be organized into the three mental operations: apprehension, judgment, and ratiocination (or reasoning by using syllogisms). It is through apprehension, the first operation, that the conception of an object or term is brought to mind: for example, the apprehension of such concepts as “dog” and “mammal”. Through judgment, the second operation, simple concepts are then combined or divided to create propositions (“Dogs are mammals”). By way of ratiocination, the third operation, the mind organizes these propositions to form syllogisms (“Dogs are mammals / All mammals are animals / Thus dogs are animals”).

descriptio segments
Meurisse and Gaultier’s Artificiosa totius logices descriptio, 1614, shown divided into its main segments — Source.

To make sense of how the images within these broadsides use the space of the page to organize such conceptual affiliations, let us take a look at how the Descriptio print of Meurisse and Gaultier is structured.6 By slicing it up into horizontal segments, we can better see how the layers build upon one another successively, from the bottom of the page to the top (see image above). Segment one offers information about the engraving’s production and shows Meurisse and his students, surrounded by various personifications of logical concepts — crowds of children, men, and ladies, a one-legged laborer bearing a basket (of human limbs) — all approaching a walled garden.

one legged man
Detail of the man with the basket of limbs, from Meurisse and Gaultier’s Artificiosa totius logices descriptio, 1614. The figure visualises a passage in Boethius’s De divisione (ca. 515) concerning a form of partition. The body parts in the man’s basket, which seems also to hold his left foot, represent an explication of how a whole human body may be divided into parts — Source.

The inside of this garden, which appears in segment two, represents the realm of the first operation of the mind (apprehension), dominated by a central fountain, around which water spurts into dishes. To either side of these basins are six groups of real, logically graspable entities which contrast with the respective entities positioned just outside the garden, directly across the wall. So, for example, we see a group of “finite” angels inside, and over the wall outside, wearing a tiara and holding a globe, encircled and sustained by an incandescent cloud, an image of “infinite” God. Likewise elsewhere, a group of “complete” humans inside, and outside an array of disembodied hands and feet; three cows (identified as “real entities”) inside, and outside a chimera (marked “rational being”).

god angels
Detail from Meurisse and Gaultier’s Artificiosa totius logices descriptio, 1614 — Source.
chimera
Detail from Meurisse and Gaultier’s Artificiosa totius logices descriptio, 1614 — Source.

In the third segment above we meet another garden, this one, rather less busy, explicating the activity of the second operation (judgment). The fourth segment pertains to propositions that are organized into syllogisms (a result of the third operation, ratiocination). Crowning the whole piece, in segment five, is a dedication to the French statesman and bibliophile Jacques-Auguste de Thou (1553–1617), flanked on the left by de Thou’s coat of arms and on the right by the arms of the Franciscans.

Altogether the broadside allows viewers to grasp at a glance how a philosophical discipline can be divided into its parts and how its parts relate to one another and the greater whole. The individual sections of the print cannot be fully appreciated when seen in isolation; they gain their significance and meaning from their location within the broadside’s spatial system.

In addition to relying on the space of the page to show how the field of logic is organized, the Descriptio also features visual commentaries that offer original interpretations of Aristotle’s age-old system. Rather than merely reiterating already extant philosophical concepts, the visual form of this engraving enriches theoretical knowledge, functioning as a visual exegesis which produces new meaning. In a detail from segment three, for instance, we find two palm trees forming an archway, inscribed with a description of the creation of propositions. Meurisse has carved “noun” and “verb”, central elements in any proposition, into the trunks: each is “an utterance signified by convention”.

palm trees
Detail of the intertwining palms, from Meurisse and Gaultier’s Artificiosa totius logices descriptio, 1614 — Source.

Why palm trees in particular? Palms are dioecious (their male and female flowers grow on separate plants) and were thought, at the time, to reproduce by intertwining with the branches of palm trees of the opposite sex. In this way mating palm trees became popular emblems for conjugal love and fertility in French iconography between the mid-sixteenth and mid-eighteenth centuries.7 Meurisse and Gaultier are here, therefore, drawing an analogy between the mating of trees and the production of propositions: adapting the emblem in a manner that they expected viewers of the broadside to understand and appreciate.8 This detail highlights the varied sources of inspiration for the designers of philosophical broadsides, but it also indicates the richness of their visual commentaries. Whereas a textbook from the period might simply define a proposition as the sum of its parts (a noun and a verb), this image presents a much more intricate and complex interpretation of the proposition, likening it to a new organic entity, a proposition generated by mating plants.

Even though the philosophical visualizations of Meurisse, Chéron, and Gaultier had an international reputation in the early modern period, they have since been largely forgotten, and although there is great interest among intellectual historians today in challenges to Aristotelian orthodoxies during the “scientific revolution”, no major study has focused on the visual documents integral to this epistemic shift. It is true that in recent years, a newly emerging and rich body of scholarship has started to explore the frescoes, oil paintings, prints, and drawings relating to the works of anti-Aristotelian philosophers such as Galileo Galilei (1564–1642) and Thomas Hobbes (1588–1679).9 But these studies consider only a piece of a larger picture, as it were. Image and image production were, in fact, vital in the early modern intellectual movements that embraced and developed Aristotelian thought, as evidenced by the many visual representations found among pedagogical and scholarly materials from the period, with perhaps no finer example than these thesis prints of Meurisse, Chéron, and Gaultier.



Susanna Berger is Assistant Professor of Art History at the University of Southern California. Her research and teaching explore diverse facets of art and visual culture from printed and drawn illustrations of philosophical knowledge to central works in the history of European early modern painting. Her first book, The Art of Philosophy: Visual Thinking in Europe from the Late Renaissance to the Early Enlightenment, appeared with Princeton University Press in March 2017.

The essay above is adapted from THE ART OF PHILOSOPHY: Visual Thinking in Europe from the Late Renaissance to the Early Enlightenment by Susanna Berger. Copyright © 2017 by Princeton University Press. Reprinted by permission.


1. Oyseau, Les faussetez insignes du sieur Meurisse Cordelier, qu’il a nagueres publiées contre le sieur Oyseau Ministre de L’Eglise de Reformée de Gyen Sur Loire. Avec la response et refutation des faussetez par ledit sieur Oyseau (Charenton: Samuel Petit, 1619), 20: “. . . n’est Logicien qu’en peinture & en taille douce . . .” The word “peinture” did not imply “painting” in the strict sense.

2. For the purposes of discussion, these broadsides will hereafter be referred to as Descriptio, Synopsis, Tableau, and Typus.

3. See Malcolm Jones, The Print in Early Modern England: An Historical Oversight (New Haven: Yale University Press, 2010), 47. The print was displayed either at the entrance to the anatomy theater or in one of the rooms above or below the stage.

4. See Márton Szepsi Csombor, Szepsi Csombor Márton összes müvei (Budapest: Akadémiai Kiadó, 1968), 233: “és mindeneknek elötte igyekeztem azon hogy amaz hires neves nagy elmejü baráttal ki az egész Cursus Philosophicust nagy mesterségel tablákra hozta, megismerkedhetnem.” (On meeting Meurisse, Csombor told him that his name was greatly respected everywhere his writings were known.) I would like to thank Péter Tóth for translating this text for me.

5. Laurence Brockliss, French Higher Education in the Seventeenth and Eighteenth Centuries: A Cultural History (Oxford: Oxford University Press, 1987), 194–95.

6. For in-depth discussions of this print, see Susanna Berger, “Martin Meurisse’s Garden of Logic”, The Journal of the Warburg and Courtauld Institutes 76 (2013): 203–49; and Susanna Berger, The Art of Philosophy: Visual Thinking in Europe from the Late Renaissance to the Early Enlightenment (Princeton: Princeton University Press, 2017), chapter 2.

7. David Watkin, “Iungit Amor: Royal Marriage Imagery in France, 1550–1750”, Journal of the Warburg and Courtauld Institutes 54 (1991): 256–61, especially 256–57. This theory of palm tree reproduction became well known in sixteenth- and seventeenth-century France through vernacular editions of Philostratus the Elder’s Imagines, in which it is presented as fact. Gaultier, who engraved the Descriptio, also created illustrations for Blaise de Vigenère’s Les images ou tableaux de platte peinture des deux Philostrates, which was published in the same year (1614).

8. For another adaptation of the palm tree motif, see the title page of Tommaso Campanella’s Realis philosophiae epilogisticae partes quatuor (1623).

9. Here I am thinking of such path-breaking projects as Eileen Reeves, Painting the Heavens: Art and Science in the Age of Galileo (Princeton: Princeton University Press, 1997); Horst Bredekamp, Thomas Hobbes Visuelle Strategien. Der Leviathan: Urbild Des Modernen Staates (Berlin: Akademie, 1999); David Freedberg, The Eye of the Lynx: Galileo, His Friends, and the Beginnings of Modern Natural History (Chicago: University of Chicago Press, 2003); Pamela H. Smith, The Body of the Artisan: Art and Experience in the Scientific Revolution (Chicago: University of Chicago Press, 2004); and Alexander Marr, Between Raphael and Galileo: Mutio Oddi and the Mathematical Culture of Late Renaissance Italy (Chicago: University of Chicago Press, 2011).


Public Domain Works

  • Thesis prints at the Bibliothèque nationale de France (CC BY-NC-SA)
  • Thesis prints at Princeton University Library

Further Reading

The Art of Philosophy: Visual Thinking in Europe from the Late Renaissance to the Early Enlightenment (Princeton University Press (2017))

by Susanna Berger

Delving into the intersections between artistic images and philosophical knowledge in Europe from the late sixteenth to the early eighteenth centuries, The Art of Philosophy shows that the making and study of visual art functioned as important methods of philosophical thinking and instruction.



Rayton Solar: Legitimate Investment or Scam?

Bill Nye Pitching Rayton Solar

Rayton Solar, a company endorsed by Bill Nye the Science Guy, is asking regular Americans to invest millions of dollars. The Facebook and Instagram feeds of people who are interested in solar and clean energy are full of its ads.

We took a close look at this investment with two questions in mind:

  1. Is this a good investment?
  2. Will this help the solar industry and make renewable energy cheaper and easier to get?

So first, is this a good investment?

Our Answer: No — definitely not.

Investors are being asked by Rayton and by Bill Nye to make an investment on terms that are not fair and that almost no experienced investor would accept in ANY company. We think this is unethical. Rayton has purposely offered “unfriendly terms” to investors that make it likely that the CEO would profit even if the company fails and investors lose all their money.

If you have already invested on StartEngine or another site, we recommend you attempt to withdraw your investment. We’d argue it’s also unethical for StartEngine to allow terms like these to be offered anywhere on its site.

What’s the problem? The strange investment terms Rayton purposefully offers allow the founders to take money from investors and spend it on themselves rather than invest it in Rayton. This is a big red flag. All money raised should go to build technology required to make Rayton a valuable company. It’s hard to overstate how offensive this is as an investor. In basic terms, this means the owners could take some of the money from you, the investor reading this, and buy a bunch of beach houses in Florida or more than 30 Teslas while Rayton fails.

The ability to do this is stated on the first page of their official investment filing:

“After each closing, funds tendered by investors will be available to the company and, after the company has sold $7,000,000 worth of Common Stock, selling securityholders will be permitted to sell up to $3,000,000 worth of Common Stock.”

So if Rayton sells $10 Million in stock, $3 Million of investor money can go straight to the CEO, Andrew Yakub, and to two holding companies, rather than going to build technology so Rayton can make money and repay investors. Those owners will still walk away with $3 Million in cash from investors even if the company fails and investors lose everything.
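The split described in the filing can be sketched in a few lines. This is only an illustration of one reading of the quoted sentence: the function name is hypothetical, and the assumption that the company’s $7,000,000 is filled first, with nothing modelled beyond $10,000,000, is ours rather than the filing’s.

```python
# Illustrative split of offering proceeds, based on one reading of the
# quoted filing language. Assumption: the company's first $7,000,000 is
# filled before selling securityholders may sell up to $3,000,000.
COMPANY_FIRST = 7_000_000
SELLER_CAP = 3_000_000


def split_proceeds(total_raised: int) -> tuple[int, int]:
    """Return (dollars to the company, dollars to selling securityholders)."""
    to_company = min(total_raised, COMPANY_FIRST)
    to_sellers = min(max(total_raised - COMPANY_FIRST, 0), SELLER_CAP)
    return to_company, to_sellers


for raised in (5_000_000, 8_500_000, 10_000_000):
    company, sellers = split_proceeds(raised)
    print(f"raised ${raised:,}: company ${company:,}, sellers ${sellers:,} "
          f"({sellers / raised:.0%} of investor money)")
```

Under these assumptions, a full $10 Million raise sends 30 cents of every investor dollar to the selling securityholders instead of to the company.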

While founders are sometimes allowed to sell equity in late investment rounds, it’s unheard of for founders to do that at Rayton’s stage. This is also a clue that Andrew Yakub and his team aren’t confident in Rayton’s technology.

Worse, it appears that Rayton is using investor money to buy more Facebook ads, to get more investors, so it can raise more money. If this is true, it’s like a Ponzi scheme, with money from early investors being spent to get money from new investors rather than on building Rayton technology.

The Second Major Red Flag: The price of each Rayton share is higher than that of almost any other startup ever. It’s a bad deal for investors.

A typical company at this stage with an unproven technology might be worth $2.5 Million in an early “Series A” investment round. If expert investors thought the technology was very promising and the company could earn large profits, professional investors might value it at $25 Million or perhaps a bit more.

Rayton is selling shares at a value “post money” between $60 Million and $267 Million (see page 14 of their investment circular). That means they claim that right now it could be worth over $200 Million.

Almost all companies at this early stage are worth less than $30 Million. Here is a graph of typical valuations at the same “series A” stage of ALL other companies that successfully raise funds. The graph doesn’t even go past $100 Million. How could Rayton, in good conscience, ask its investors to ever invest at twice that? Investors could pay 4 times as much as the early investors in Uber paid.

*As an aside, our country and world as a whole should be investing in the right solar, battery storage, electric vehicle, and wind companies, and asking our lawmakers to provide the same subsidies to those industries that are provided to other industries.

2. Will this help the solar industry? Our Answer: Almost certainly not.

The Rayton video misleadingly implies that using less silicon will change the solar industry. It’s important to consider the capital cost of the machines, higher operating costs, and the fact that moving to thinner silicon would require expensive changes elsewhere in the process of making a solar panel. It’s not clear that in the end Rayton would be cheaper. More importantly, using less silicon wouldn’t solve the key cost issues of solar. Solar panels are less than half the cost of a solar system and silicon, which Rayton says it saves, is only a small part of solar panel cost.

As an expert stated below in the comments, “there are proven technologies that are already more advanced …that will render this idea obsolete…. (such as)… a Gallium Arsenide (GaAs) thin film cell … with similar savings in material costs to what Rayton is proposing and which is already into its fourth generation of manufacturing processes. Additionally, the real issue with solar is that they are non-commodity products sold at commodity prices.” See full comment below.

Another statement, aggregated from several experts including user YMK1234:

If cutting anything bigger than a tiny sample via the proposed method actually worked it would be extremely fragile. A carrier material and probably a new process of panel creation would be required. The technical description doesn’t address technical accuracy issues… but if they really could do what they claim, every microchip company could also use that technology which would ensure they wouldn’t need crowdfunding.

More concerning still is the fact that the challenges to wide scale solar adoption are now more around policy, grid infrastructure, and the cost of everything else from land to power inverters (“balance of system” in industry jargon). Silicon cost is not a major issue.

Finally, the fact that no reputable solar investors are listed as supporting Rayton should be a red flag. Rayton will not change the solar industry.

Unethical enough to take action:

Our research showed the extent to which investors are being deliberately misled by Rayton while Rayton raises millions of dollars from casual (and thus probably not wealthy) investors. This is so unethical that action needs to be taken to stop it.

  1. Bill Nye should speak up so more well-meaning people who want to support solar do not lose money. If Bill Nye doesn’t do this, his behavior is as unethical as Rayton’s.
  2. Other professionals who are endorsing Rayton explicitly or implicitly are part of the effort to deliberately mislead crowdfunders (possibly while profiting from the crowdfunding). These people should explain why Rayton’s unfriendly terms are fair, close down Rayton crowdfunding and return investor money, or improve the terms for crowdfunders so Rayton Directors can’t profit while crowdfunders lose everything. Anyone reading this should contact them to tell them to do so, and their colleagues at UCLA should bring this up with them.

Rayton Director and UCLA Professor James Rosenzweig: rosenzweig@physics.ucla.edu. Phone: 310–206–4541. Website http://www.pa.ucla.edu/directory/james-rosenzweig.

Rayton Director and UCLA Professor Mark Goorsky: Tel. (310) 206–0267, FAX (310) 206–7353, Email: goorsky@seas.ucla.edu.

The CEO Andrew Yakub — Contact info not publicly available, but email @raytonsolar.com probably guessable.

3. StartEngine bears ethical responsibility for allowing investor-unfriendly terms to be sold. The JOBS Act made equity crowdfunding legal, but it was not implemented well if Rayton’s unethical terms are legal. StartEngine should prohibit “cash-out” offerings and add onerous language around early-stage exec pay to make schemes like Rayton’s more difficult to pull off. It’s a competitive market, so all crowdfunding sites should do this together to preserve their collective reputations. StartEngine should also end the Rayton campaign and others with similarly bad terms.

*The JOBS Act was probably a good thing and was addressed in a really interesting episode of the Startup Podcast.

Iowa's handout to Apple illustrates the folly of corporate welfare deals


State and local officials in Iowa have been working hard to rationalize their handout of more than $208 million in tax benefits to Apple, one of the world’s richest companies, for a data facility that will host 50 permanent jobs.

The deal will help make Iowa an “innovation and technology” hub, Gov. Kim Reynolds gushed. It will ensure development of a big parcel of open land that otherwise would have remained fallow, local development officials said. It was a bargain, according to civic leaders in Waukee, the Iowa community that will host the data center — after all, Apple will be contributing up to $100 million to a local infrastructure fund, starting with money to build a youth sports center.

We were highly skeptical of this deal when it was announced Aug. 24. In the fullness of time, we’ve subjected it to closer scrutiny. And now it looks even worse.

In a broader sense, the Apple deal shows the shortcomings of all such corporate handouts, nationwide. State and local governments seldom perform cost-benefit studies to determine their value, except in retrospect, when the money already has been paid out. They seldom explain why some industries should be favored over others; think about the film production incentives offered by Michigan, Louisiana, Georgia and, yes, Iowa, which never panned out as profit-makers for the states. They’re often negotiated in secret, as was the Iowa deal, then presented to taxpayers as faits accomplis, and often misleadingly.

These incentives often are an unnecessary bonus to companies that already have made a site location decision based on more important factors. Yet states and localities have persuaded themselves that the incentive packages are an indispensable lure to employers and that without them their economies will collapse.

“Firms know where they want to be,” says economist David Swenson of Iowa State University. “The question of how much in rents they can extract from state and local governments is phase two. But taxes are a secondary consideration.”

Worst of all, the handouts allow big companies to pit state against state and city against city in a competition that benefits corporate shareholders almost exclusively. Bizarrely, this process has been explicitly endorsed by Donald Trump. Companies “can leave from state to state and they can negotiate good deals with the different states and all of that,” Trump said in December, as long as they’re not taking the jobs across the border. This is a formula, of course, for what some might compare to unrestrained corporate extortion.

These corporate handouts might make sense if they spurred economic growth. They don’t.

“There is virtually no association between economic development incentives and any measure of economic performance,” urban economist Richard Florida concluded in 2012. A study of his found “no statistically significant association between economic development incentives per capita and average wages or incomes; none between incentives and college grads or knowledge workers; and none between incentives and the state unemployment rate.”

Another study found that, if anything, government incentives led to slower growth among the companies that received them, possibly because their managers spent more time pursuing incentives than focusing on the business, and felt less pressure to seek out nonincentive-related growth opportunities.

Apple’s deal with Iowa, which includes about $20 million in a state investment tax credit and a 20-year tax abatement from the city of Waukee worth nearly $190 million, underscores all these elements. First, it was put together behind closed doors and presented to state legislators and economic officials only after the fact. Some elements of the package are still shrouded in mystery.

But at least one major element of the original announcement has proven to be extremely misleading. That’s Apple’s contribution of “up to $100 million” to an infrastructure fund for Waukee. As it turns out, “up to” was an important but overlooked phrase. That’s because the total contribution would be dependent on expansion of the data center well beyond the 400,000 square feet originally announced. Unless that expansion occurs, Apple’s contribution will be only $20 million, a source close to the deal told me. In other words, Apple is guaranteed to contribute only one-fifth as much as people were led to believe.

Apple has sound reasons for locating a $1.3-billion data center — which will provide a technical background to services such as its Siri voice-activated assistance service, iTunes and its App Store — on the Iowa prairie. Iowa has abundant wind generation, which enables the company to say the center will operate on electricity from 100% renewable sources. It has plenty of good, flat land if the company decides to expand its data warehouse. And it’s relatively well-insulated from weather extremes, not to mention coastal weather disasters.

As it happens, Apple was already eligible for a major tax break even before it entered negotiations for this deal. In 2007, the legislature enacted a sales tax exemption specifically for data centers, covering their servers, industrial coolers, backup generators, and other computer equipment. Since Apple is planning to spend about $645 million on equipment for the data center, according to Reuters, that implies a break of nearly $39 million from the state’s 6% sales and use tax. Because the exemption was preexisting, it wasn’t even mentioned during the gala announcement of how much Apple would be getting to move to Iowa.
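The arithmetic behind these figures is simple enough to sketch. The per-job numbers below are an illustration, not a figure from state officials: they treat the “more than $208 million” package as exactly $208 million, and the last line assumes the preexisting sales tax exemption can simply be added on top.

```python
# Rough arithmetic behind the figures in this column.
# Integer math (whole dollars) avoids floating-point rounding noise.
EQUIPMENT_SPEND = 645_000_000     # planned equipment spending, per Reuters
SALES_TAX_PERCENT = 6             # Iowa sales and use tax rate
ANNOUNCED_PACKAGE = 208_000_000   # state credit plus Waukee abatement (lower bound)
PERMANENT_JOBS = 50

sales_tax_break = EQUIPMENT_SPEND * SALES_TAX_PERCENT // 100
per_job_announced = ANNOUNCED_PACKAGE // PERMANENT_JOBS
# Assumption: the preexisting exemption is added to the announced package.
per_job_with_exemption = (ANNOUNCED_PACKAGE + sales_tax_break) // PERMANENT_JOBS

print(f"sales tax break: ${sales_tax_break:,}")
print(f"announced package per permanent job: ${per_job_announced:,}")
print(f"including the exemption, per permanent job: ${per_job_with_exemption:,}")
```

The first figure comes to $38.7 million, matching the “nearly $39 million” above, and the announced package alone works out to roughly $4.2 million per permanent job.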

To justify public outlays of this magnitude, one must have a handle on their possible contribution to employment. In the case of data centers, Swenson argues, it’s meager. Not only is the permanent direct employment forecast for the Apple facility a mere 50 people, the potential for indirect employment is also small. Microsoft and Google data centers preceded Apple to Iowa, Swenson observes. “These centers have absolutely no linkages upstream or downstream with the rest of the economy, except for the upstream grab of electricity,” he says. “They’re just big, sterile, hot boxes that don’t feed into Iowa’s economy.”

Indeed, even as Iowa’s data-center complex has expanded over the years, its employment in the sector has shrunk, possibly because the centers are increasingly automated. According to data from the Bureau of Labor Statistics, Iowa employment in the “data processing, hosting, and related services” sector has been falling sharply, to about 3,400 last year from more than 7,400 in 2007. The state’s share of all such employment nationally also has fallen to about 1% now from nearly 3% in 2007. The lesson is that if Iowa officials think their handouts will place them at the hub of a high-tech revolution, they’re chasing an imaginary grail.

Yet politicians continue to shovel out the benefits, hoping to steer their economies in new directions and perhaps acquire a reputation for vision. Nevada was so eager to land a big battery factory from Tesla Motors’ Elon Musk that it offered him twice what Musk was seeking from the five states competing for the project. (In Las Vegas, this is known as “leaving money on the table.”) Wisconsin Gov. Scott Walker gave a big incentive deal to a furniture factory even though it was laying off half its workforce. He followed up last month with an astronomical $3-billion handout to electronics manufacturer Foxconn for a factory likely to employ a fraction of the workforce it forecasts.

The biggest scam of all may be film incentives, which peaked a few years ago when states across the country began to see themselves as rivals to California. The pioneer in this effort, Louisiana, was later shown to be spending $7.29 in incentives for every dollar in revenue brought in. ("People are getting rich on this deal, and it's not Louisiana taxpayers," concluded the study's sponsor, the Louisiana Budget Project.) California eventually was forced to counter the proliferating production raids with a program of its own, but even its relatively modest package costs $1 in outlay for every 65 cents returned to the treasury. Iowa’s film incentive program, by the way, was such a mess that it collapsed in 2009 amid scandal, leading to several felony convictions.

This is how states and localities end up on a merry-go-round of infinite spending. Whenever another deal gets proposed by starry-eyed politicians, the taxpayers should just say no.

Keep up to date with Michael Hiltzik. Follow @hiltzikm on Twitter, see his Facebook page, or email michael.hiltzik@latimes.com.

Roll Your Own Bitcoin Exchange in Haskell

A stock exchange is a complex beast, but much of it can be reduced to a single data structure: the Order Book. A Bitcoin exchange uses the same data structure when trading currency pairs of USD, BTC, or ETH. This article will show you how to:

  • Design an order book that can handle limit orders and market orders
  • Install automated sanity checks that run on every write to the order book, preventing hacks and implementation bugs
  • Build an HTTP API that people can use to interact with the order book

We won’t work with actual bitcoins or wallets, since they add a lot of complexity and risk without making the article any more interesting. Instead, we’ll assume the “billing code” has already been written and focus only on the order book portion of the exchange.

Types

So what is an order book, really?

First we’ll define our orders:

    import Data.Tagged
    import qualified Data.Map as M
    import qualified Data.Sequence as Q

    type UserId = Integer
    type Currency = Text
    type CurrencyPair = (Currency, Currency)
    type Amount = Integer
    type Price = Double

    data LimitOrder = LimitOrder
      { _lorder_user       :: UserId
      , _lorder_fromAmount :: Amount
      , _lorder_toAmount   :: Amount
      } deriving (Eq, Show, Generic)

    data TBid
    data TAsk
    type BidT a = Tagged TBid a
    type AskT a = Tagged TAsk a
    type Bid = BidT LimitOrder
    type Ask = AskT LimitOrder

    data MarketOrder = MarketOrder
      { _morder_user   :: UserId
      , _morder_amount :: Amount
      } deriving (Eq, Show, Generic)

    type MBid = BidT MarketOrder
    type MAsk = AskT MarketOrder

    data OrderBookF a = OrderBook
      { _book_fromCurrency :: Currency
      , _book_toCurrency   :: Currency
      , _book_bids         :: M.Map Price (Q.Seq (BidT a))
      , _book_asks         :: M.Map Price (Q.Seq (AskT a))
      } deriving (Eq, Show, Functor, Traversable, Foldable)

    type OrderBook = OrderBookF LimitOrder

A couple points to note:

  • We use Seq rather than List or Vector because we need relatively efficient insert and search.
  • The higher order OrderBookF lets us get Traversable, Functor, and Foldable instances to manipulate the order book without writing any code.
  • Both Bids and Asks are LimitOrders, but I want to track them separately in the type system. So I use Tagged to attach a TBid or TAsk tag.

The OrderBook by itself isn’t enough, because we want to track the final amounts actually transferred between buyers and sellers. Let’s add a few types for that:

    data SingleEntry = SingleEntry
      { _se_account  :: UserId
      , _se_currency :: Currency
      , _se_amount   :: Amount
      } deriving (Eq, Show, Generic)

    data DoubleEntry = DoubleEntry
      { _de_fromAccount :: UserId
      , _de_toAccount   :: UserId
      , _de_currency    :: Currency
      , _de_amount      :: Amount
      } deriving (Eq, Show, Generic)

    data TradeF a = Trade
      { _trade_from :: a
      , _trade_to   :: a
      } deriving (Eq, Show, Functor, Traversable, Foldable, Generic)

    type Trade = TradeF DoubleEntry

    data External
    type ExternalTransfer = Tagged External SingleEntry

To “execute” an order causes it to turn into at least one Trade. There are 4 changes to users’ balances whenever a Trade occurs for the USD/BTC pair:

  • The buyer loses his USD
  • The seller gains USD
  • The buyer gains BTC
  • The seller loses BTC

It’s important to use “Double entry accounting” when handling money of any sort. Adding an entry to both sides whenever money changes hands makes the books much more resistant to bookkeeping errors. The total amount of USD and BTC should be zero-sum when transferred internally.
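
As a minimal sketch of the zero-sum property, the snippet below folds a list of internal transfers into per-account balance changes and checks that every currency sums to zero. The DoubleEntry record mirrors the one defined earlier, but Currency is a plain String here to avoid extra imports, and Ledger, applyEntry, ledgerOf, currencyTotals, and zeroSum are hypothetical names that are not taken from the article's repository.

```haskell
import qualified Data.Map.Strict as M

type UserId = Integer
type Currency = String  -- assumption: the article itself uses Text
type Amount = Integer

data DoubleEntry = DoubleEntry
  { _de_fromAccount :: UserId
  , _de_toAccount   :: UserId
  , _de_currency    :: Currency
  , _de_amount      :: Amount
  } deriving (Eq, Show)

-- Balance change per (account, currency).
type Ledger = M.Map (UserId, Currency) Amount

-- Hypothetical helper: debit the sender and credit the receiver
-- by the same amount, so each entry touches both sides.
applyEntry :: Ledger -> DoubleEntry -> Ledger
applyEntry acc (DoubleEntry from to cur amt) =
  M.insertWith (+) (to, cur) amt
    (M.insertWith (+) (from, cur) (negate amt) acc)

-- Fold a list of internal transfers into a ledger.
ledgerOf :: [DoubleEntry] -> Ledger
ledgerOf = foldl applyEntry M.empty

-- Sum of all balance changes per currency.
currencyTotals :: Ledger -> M.Map Currency Amount
currencyTotals =
  M.foldrWithKey (\(_, cur) amt -> M.insertWith (+) cur amt) M.empty

-- True when internal transfers neither create nor destroy money.
zeroSum :: [DoubleEntry] -> Bool
zeroSum = all (== 0) . M.elems . currencyTotals . ledgerOf
```

Because every DoubleEntry produces one debit and one matching credit, zeroSum holds for any list of entries; a single-entry record would break it, which is why external transfers are tracked separately.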

Some accounts are external to our exchange, such as USD held in bank accounts, or BTC held in wallets. In this example, I’ve chosen to eliminate the external account from the exchange’s view, giving us only single-entry bookkeeping for transfers outside the exchange. For accounting purposes, there should be a separate database where full double-entry financial data is held, and it should be routinely reconciled with the exchange, any bank accounts, and any wallets.

Administrators or billing code can create SingleEntry records when creating or destroying money supply in the exchange, while all users will create DoubleEntry records amongst each other.

Finally, here’s the global state that the exchange is going to keep:

    data Exchange = Exchange
      { _exchange_external :: TVar (Q.Seq ExternalTransfer)
      , _exchange_book     :: TVar OrderBook
      , _exchange_trades   :: TVar (Q.Seq Trade)
      }

In words: An Exchange is a collection of external transfers that move money in and out of the system, an order book for a single currency pair full of orders waiting to be filled, and the history of previous trades.

This exchange has no trading fees. Those would go on either the Trade or External types, depending on whether we charge people for internal or external transfers.

Order book

What are the fundamental operations we need to use our order book? We want to:

  • Add an order to the book, automatically executing trades if able.
  • Cancel an order from the book
  • List all outstanding orders on the order book

Once we have these, we’ll wrap it in an HTTP API that can be used by scripts or a website to interact with the order book.

I’ll only look at the Bid limit and market orders for now - the Ask versions are very similar. Here are the types for the functions we need:

    cancelBid :: Bid -> OrderBook -> OrderBook

    fillBid :: Bid -> OrderBook -> ([Trade], OrderBook)

    tryFillMBid :: MBid -> Balances -> OrderBook -> (Maybe MBid, [Trade], OrderBook)

    listOrders :: OrderBook -> [LimitOrder]
    listOrders = toList

Since the generated Foldable instance does listOrders for us, I went ahead and implemented it. I’ll leave the implementation of cancelBid on Github.

For fillBid and tryFillMBid, we’ll depend on a generic matchBid function that attempts to fill a bid on an OrderBook and returns a new Bid for whatever couldn’t be matched, any Trades that were executed, and the new OrderBook:

    matchBid :: Bid -> OrderBook -> (Maybe Bid, [Trade], OrderBook)
    matchBid bid book =
      let pair = _book_pair book
          loop :: (Bid, [Trade], OrderBook) -> (Maybe Bid, [Trade], OrderBook)
          loop x@(bid, trades, book) =
            case lowestAsk book of
              -- Case 1: The order book has no asks; keep any trades made so far
              (Nothing, _) -> (Just bid, trades, book)
              (Just lowestAsk, deletedBook) ->
                case mergeBid pair bid lowestAsk of
                  -- Case 2: The bid was unable to be matched
                  (Just bid, Just _, Nothing) -> (Just bid, trades, book)
                  -- Case 3: The bid was partially matched; repeat the loop
                  (Just bidRemainder, Nothing, Just trade) ->
                    loop (bidRemainder, trade:trades, deletedBook)
                  -- Case 4: The ask was partially matched; terminate the loop.
                  (Nothing, Just askRemainder, Just trade) ->
                    (Nothing, trade:trades, unsafe_addAsk askRemainder deletedBook)
                  -- Case 5: The bid and ask exactly canceled each other out
                  (Nothing, Nothing, Just trade) ->
                    (Nothing, trade:trades, deletedBook)
                  -- Case 6: Impossible cases
                  x -> panic $ "fillBid: Unexpected case: " <> show x
      in loop (bid, [], book)

    mergeBid :: CurrencyPair -> Bid -> Ask -> (Maybe Bid, Maybe Ask, Maybe Trade)
    lowestAsk :: OrderBook -> (Maybe Ask, OrderBook)
    unsafe_addAsk :: Ask -> OrderBook -> OrderBook

Here, we repeatedly find the best price Ask on the order book, use it to fill our Bid, and stop when we run out of qualifying Asks.

lowestAsk is pretty easy since our Map is sorted by price. I’ll define it and unsafe_addAsk in the Github repository.
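
To illustrate why a sorted Map makes this easy, here is a minimal sketch of popping the oldest order at the lowest price level. It is not the repository's lowestAsk: it works on a simplified, hypothetical Side type (one side of the book, without the Tagged wrapper or the rest of OrderBook), and popLowest is an invented name.

```haskell
import qualified Data.Map as M
import qualified Data.Sequence as Q
import Data.Sequence (Seq, ViewL(..), viewl)

type Price = Double

-- Hypothetical, simplified book side: price level -> FIFO queue of orders.
type Side a = M.Map Price (Seq a)

-- Pop the oldest order at the lowest price, returning it together with
-- the remaining side. Empty price levels are dropped along the way.
popLowest :: Side a -> (Maybe a, Side a)
popLowest side =
  case M.minViewWithKey side of
    Nothing -> (Nothing, side)
    Just ((price, queue), rest) ->
      case viewl queue of
        -- Skip a price level that has no orders left.
        EmptyL -> popLowest rest
        order :< remaining
          | Q.null remaining -> (Just order, rest)
          | otherwise        -> (Just order, M.insert price remaining rest)
```

Finding the minimum key is logarithmic in the number of price levels, and taking the front of a Seq is constant time, so orders at the same price are served first-in, first-out.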

mergeBid is probably the most complex, since it handles 3 distinct things:

  • It generates new orders if either the bid or ask were partially filled
  • It generates a trade if the bid crosses the ask
  • It handles the calculations determining these

    mergeBid :: CurrencyPair -> Bid -> Ask -> (Maybe Bid, Maybe Ask, Maybe Trade)
    mergeBid (fromCurrency, toCurrency) bid ask =
      let bidOrder = unTagged bid
          askOrder = unTagged ask
          n1 = _lorder_fromAmount bidOrder
          d1 = _lorder_toAmount bidOrder
          n2 = negate $ _lorder_fromAmount askOrder
          d2 = _lorder_toAmount askOrder
          buyer = _lorder_user bidOrder
          seller = _lorder_user askOrder
          fi = fromIntegral
          -- If seller rounds down, price would be below his limit.
          sellerPrice = ceiling (fi n2 / fi d2)
          -- If buyer rounds up, price would be above his limit.
          buyerPrice = floor (fi n1 / fi d1)
          unitPrice = buyerPrice
          numUnits = min n1 n2
          toAmount = ceiling (fi numUnits / fi unitPrice)
          fromTransfer = DoubleEntry
            { _de_fromAccount = seller
            , _de_toAccount   = buyer
            , _de_amount      = numUnits
            , _de_currency    = fromCurrency
            }
          toTransfer = DoubleEntry
            { _de_fromAccount = buyer
            , _de_toAccount   = seller
            , _de_amount      = toAmount
            , _de_currency    = toCurrency
            }
          trade = Trade fromTransfer toTransfer
          (mNewBid, mNewAsk) = case d1 `compare` d2 of
            -- Case 1: Buyer is done; seller still has inventory
            LT ->
              let newAsk = Tagged $ LimitOrder
                    { _lorder_user       = seller
                    , _lorder_fromAmount = n2 - numUnits
                    , _lorder_toAmount   = sellerPrice
                    }
              in (Nothing, Just newAsk)
            -- Case 2: Seller is out; buyer needs more
            GT ->
              let newBid = Tagged $ LimitOrder
                    { _lorder_user       = buyer
                    , _lorder_fromAmount = n1 - numUnits
                    , _lorder_toAmount   = buyerPrice
                    }
              in (Just newBid, Nothing)
            -- Case 3: Buyer and seller exactly traded
            EQ -> (Nothing, Nothing)
      in if buyerPrice >= sellerPrice
           -- Bid has crossed the ask, so we can generate a trade.
           then (mNewBid, mNewAsk, Just trade)
           -- Bid is less than ask, so they can't be merged.
           else (Just bid, Just ask, Nothing)
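
To see the effect of the two rounding directions in isolation, here is a minimal sketch. It assumes a price is simply fromAmount divided by toAmount, as in mergeBid, and it takes the ask's amount as already positive. The names buyerLimit, sellerLimit, and crosses are hypothetical and do not come from the article's repository.

```haskell
-- Highest whole-unit price the buyer accepts: round down, so the
-- buyer never pays more than the stated limit.
buyerLimit :: Integer -> Integer -> Integer
buyerLimit fromAmt toAmt =
  floor (fromIntegral fromAmt / fromIntegral toAmt :: Double)

-- Lowest whole-unit price the seller accepts: round up, so the
-- seller never receives less than the stated limit.
sellerLimit :: Integer -> Integer -> Integer
sellerLimit fromAmt toAmt =
  ceiling (fromIntegral fromAmt / fromIntegral toAmt :: Double)

-- A bid crosses an ask when the buyer's limit reaches the seller's limit.
-- Each order is given as (fromAmount, toAmount).
crosses :: (Integer, Integer) -> (Integer, Integer) -> Bool
crosses (bidFrom, bidTo) (askFrom, askTo) =
  buyerLimit bidFrom bidTo >= sellerLimit askFrom askTo
```

With these definitions, a bid and an ask quoted at the same ratio of 10 to 3 do not cross, because the buyer's limit rounds down to 3 while the seller's rounds up to 4. Rounding each side in its own favour means a trade is only generated when both limits are honoured.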

mergeBid is subtle enough that you probably don’t feel comfortable trusting its implementation based on visual inspection alone. In the next section, we’ll install automated sanity checks on every write so that any implementation bugs will be blocked.

Security

How do we stop our exchange from losing money when we inevitably get hacked?

The most important things to secure are the actual Bitcoin or Ethereum wallets our exchange has. Luckily, we avoided that issue by not having any wallets.

The second most important thing is to have append-only backups that we can roll back to if we detect a hack. That’s not ideal, because we still have to tell all of our customers we got hacked.

The third most important thing is to avoid losing the money in the first place.

In my last article, I showed how you can use theorem provers to formally prove that certain properties hold regarding your data structures. We can’t quite do that in Haskell, but let me define a few invariants for you anyways and I’ll show you a trick.

Invariants

Here are our invariants in words:

  • Users can’t trade with themselves
  • Users can’t have negative balances
  • Users can’t have more money in pending trades than they have in their account.
  • Users and OrderBooks can’t have currencies that don’t exist

And here they are in code:

  • Users can’t trade with themselves

    type ConsistencyCheck = Exchange -> STM Bool

    consistency_noSelfTrades :: ConsistencyCheck
    consistency_noSelfTrades = \exchange -> do
      trades <- readTVar $ _exchange_trades exchange
      return $ all checkTrade trades
      where
        checkDe DoubleEntry{..} = _de_fromAccount /= _de_toAccount
        checkTrade Trade{..} = checkDe _trade_from && checkDe _trade_to

  • Users can’t have negative balances

    consistency_noNegativeBalances :: ConsistencyCheck
    consistency_noNegativeBalances = \exchange -> do
      bals <- userBalances exchange
      let checkUser (userId, balances) =
            flip all (M.toList balances) $ \(currency, balance) ->
              if balance >= 0
                then True
                else error $ "Negative balance for " <> show (userId, currency, balance)
      return $ all checkUser $ M.toList bals

    userBalances :: Exchange -> STM (M.Map UserId Balances)
    userBalances = undefined

  • Users can’t have more money in pending trades than they have in their account.

    consistency_ordersBackedByAccount :: ConsistencyCheck
    consistency_ordersBackedByAccount = \exchange -> do
      usersBals <- userBalances exchange
      let checkUserBalance :: Balances -> (Currency, Amount) -> Bool
          checkUserBalance userBals (currency, bookAmount) =
            case M.lookup currency userBals of
              Nothing -> False
              Just userAmount -> userAmount >= bookAmount
      let checkUser :: (UserId, Balances) -> STM Bool
          checkUser (user, userBals) = do
            bookBals <- userBookBalances exchange user
            let currenciesPending = M.toList bookBals
            return $ all (checkUserBalance userBals) currenciesPending
      allM checkUser $ M.toList usersBals

    userBookBalances :: Exchange -> UserId -> STM Balances
    userBookBalances = undefined

  • Users and OrderBooks can’t have nonexistent currencies

    consistency_allCurrenciesExist :: ConsistencyCheck
    consistency_allCurrenciesExist = \exchange -> do
      usersBals <- userBalances exchange
      bookBals <- bookBalances exchange
      let valid currency = currency `elem` allCurrencies
          checkBals bals = all valid $ M.keys bals
          usersCheck = all checkBals usersBals
          booksCheck = all valid $ M.keys bookBals
      return $ usersCheck && booksCheck

    bookBalances :: Exchange -> STM Balances
    bookBalances = undefined

As long as those functions always return true, we can have some confidence in the rest of our code.

userBalances, bookBalances, and userBookBalances roll up the trades and external transfers to get a final balance. I’ll leave their implementation in the Github repository.

The Trick

People often use triggers or constraints in relational databases to automatically enforce invariants. Using Haskell’s Software Transactional Memory library, we can do something similar:

    installSanityChecks :: Exchange -> IO ()
    installSanityChecks exchange =
      atomically $ mapM_ installCheck
        [ (consistency_noNegativeBalances, "No negative balances")
        , (consistency_ordersBackedByAccount, "Orders must be backed by account")
        , (consistency_allCurrenciesExist, "Non-existent currency")
        , (consistency_noSelfTrades, "Users cannot trade with themselves")
        ]
      where
        installCheck (check, message) = alwaysSucceeds $ do
          b <- check exchange
          if b then return () else error message

atomically enters a “transaction” in the STM sense. It’s designed as an efficient way to allow multiple threads to concurrently update a shared data structure. Transactions can abort and retry if there are concurrent updates to the same variable. We can also abort if one of our sanity checks fails.

The alwaysSucceeds function will run the sanity check once, and if it passes, run it again after every subsequent transaction. It will roll back the transaction with an exception if the sanity check fails with an exception.

We’ll call installSanityChecks near the start of our program, after we initialize or load our exchange’s state. Then every write will be automatically sanity-checked, and rolled back with an exception if a check fails. Our HTTP library warp will catch the exception and abort the request.
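
The rollback behaviour can also be sketched without alwaysSucceeds, for setups where that function is not available. The version below is a hand-rolled variant, not the article's mechanism: each write is wrapped explicitly, and the invariant is checked at the end of the same transaction. checkedWrite, nonNegative, and withdraw are hypothetical names, and a single TVar Integer stands in for the exchange state.

```haskell
import Control.Concurrent.STM
import Control.Exception (ErrorCall(..), try)
import Control.Monad (unless)

-- Hypothetical wrapper: perform a write, then check an invariant inside
-- the same transaction. Throwing from STM discards the transaction's
-- writes, so a failed check leaves the shared state untouched.
checkedWrite :: STM Bool -> String -> STM a -> STM a
checkedWrite invariant message write = do
  result <- write
  ok <- invariant
  unless ok $ throwSTM (ErrorCall message)
  return result

-- Example invariant: a balance may never go negative.
nonNegative :: TVar Integer -> STM Bool
nonNegative balance = (>= 0) <$> readTVar balance

-- Withdraw under the invariant. Returns Left with the check's message
-- if the withdrawal would violate it.
withdraw :: TVar Integer -> Integer -> IO (Either ErrorCall ())
withdraw balance amount =
  try . atomically $
    checkedWrite (nonNegative balance) "No negative balances" $
      modifyTVar' balance (subtract amount)
```

Withdrawing 100 from a balance of 10 returns a Left carrying the message and leaves the balance at 10. The trade-off against alwaysSucceeds is that every write site has to remember to use the wrapper, whereas an installed invariant is checked on all transactions automatically.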

Networking

We want 5 API endpoints:

  • List orders
  • Cancel an order
  • Add an order
  • Create money on the exchange
  • List balances

Here are the request types:

    data Request_ListOrders = Request_ListOrders
      { _reqListOrders_user :: Maybe UserId
      } deriving (Eq, Show, Generic)

    data Request_CancelOrder
      = Request_CancelBid { _reqCancelOrder_bid :: Bid }
      | Request_CancelAsk { _reqCancelOrder_ask :: Ask }
      deriving (Eq, Show, Generic)

    data Request_AddOrder
      = Request_AddBid  { _reqAddOrder_bid  :: Bid }
      | Request_AddAsk  { _reqAddOrder_ask  :: Ask }
      | Request_AddMBid { _reqAddOrder_mbid :: MBid }
      | Request_AddMAsk { _reqAddOrder_mask :: MAsk }
      deriving (Eq, Show, Generic)

    data Request_CreateMoney = Request_CreateMoney
      { _reqCreateMoney_singleEntry :: SingleEntry
      } deriving (Eq, Show, Generic)

    data Request_ListBalances = Request_ListBalances
      deriving (Eq, Show, Generic)

And the response types:

    data Response_ListOrders = Response_ListOrders
      { _resListOrders_orders :: [LimitOrder]
      } deriving (Eq, Show, Generic)

    data Response_CancelOrder = Response_CancelOrder
      deriving (Eq, Show, Generic)

    data Response_AddOrder
      = Response_AddBid
          { _resAddOrder_trades :: [Trade] }
      | Response_AddAsk
          { _resAddOrder_trades :: [Trade] }
      | Response_AddMBid
          { _resAddOrder_mbidRemainder :: Maybe MBid
          , _resAddOrder_trades        :: [Trade]
          }
      | Response_AddMAsk
          { _resAddOrder_maskRemainder :: Maybe MAsk
          , _resAddOrder_trades        :: [Trade]
          }
      deriving (Eq, Show, Generic)

    data Response_CreateMoney = Response_CreateMoney
      deriving (Eq, Show, Generic)

    data Response_ListBalances = Response_ListBalances
      { _resListBalances_externals :: [(UserId, Currency, Amount)]
      , _resListBalances_internals :: [(UserId, Currency, Amount)]
      , _resListBalances_helds     :: [(UserId, Currency, Amount)]
      , _resListBalances_totalBals :: [(UserId, Currency, Amount)]
      , _resListBalances_bookBals  :: [(Currency, Amount)]
      } deriving (Eq, Show, Generic)

Anyone who wants to interact with our exchange will end up creating a Request and receiving an appropriate Response. Here is the actual entrypoint to the server:

    import Network.Wai as W
    import Network.Wai.Handler.Warp
    import Network.HTTP.Types.Method
    import Network.HTTP.Types.Status

    le_port = 2345

    serverMain :: IO ()
    serverMain = do
      state <- initialize
      installSanityChecks state
      putStrLn $ "Listening on " ++ show le_port
      run le_port $ \req respond ->
        let ?req = req
            ?respond = respond
            ?state = state
        in do
          print req
          body <- strictRequestBody req
          case (pathInfo req, requestMethod req) of
            ("createMoney" : _, "POST") -> withParsedRequest body api_createMoney
            ("listOrders" : _, "POST") -> withParsedRequest body api_listOrders
            ("addOrder" : _, "POST") -> withParsedRequest body api_addOrder
            ("cancelOrder" : _, "POST") -> withParsedRequest body api_cancelOrder
            ("listBalances" : _, "POST") -> withParsedRequest body api_listBalances
            _ -> respond (W.responseLBS status404 [] "Unknown path")

    initialize :: IO Exchange
    initialize =
      Exchange
        <$> newTVarIO Q.empty
        <*> newTVarIO (newBook ("USD", "BTC"))
        <*> newTVarIO Q.empty

    type HandlerT a =
      ( ?req :: Request
      , ?respond :: (Response -> IO ResponseReceived)
      , ?state :: Exchange
      ) => a -> IO ResponseReceived

    withParsedRequest :: FromJSON a => BSL.ByteString -> HandlerT (a -> IO ResponseReceived)
    withParsedRequest bs handler =
      case decode bs of
        Nothing -> ?respond (W.responseLBS status400 [] "Unable to parse")
        Just x -> handler x

And here’s how to implement one of our handlers:

    api_listOrders :: HandlerT Request_ListOrders
    api_listOrders _ = do
      let Exchange{..} = ?state
      book <- readTVarIO _exchange_book
      ?respond $ W.responseLBS status200 [] $ encode $ Response_ListOrders $ toList book

I’ve also included JSON serializers and deserializers in the Github repository.

Testing

How do we actually end-to-end test our Bitcoin exchange?

The first step is to print one of your request values to get its corresponding JSON:

    print $ encode $ Request_AddBid $ Tagged $ LimitOrder
      { _lorder_user       = 1
      , _lorder_fromAmount = 4600
      , _lorder_toAmount   = 1
      }

The second step is to use a shell script to send that JSON to the server:

cat <<'EOF' | curl localhost:2345/addOrder -XPOST -d @-
{
  "bid":{
    "toAmount":1,
    "user":1,
    "fromAmount":4600
  },
  "tag":"Request_AddBid"
}
EOF

At this point, you can use any combination of the 5 endpoints to change or inspect the exchange’s state.

Conclusion

We showed how you can implement a simple order book in Haskell, which can be the basis of a full-blown Bitcoin exchange. Future articles may cover:

  • Writing the order book in C for efficiency, and using it in the larger Haskell program.
  • Writing a program that watches for actual Bitcoin to be sent, so money can enter our exchange.
  • Using unit tests to validate the order book implementation.
  • Adding multiple currency pairs to the exchange.
  • Adding authentication, so users won’t have unrestricted access to the exchange.

Principles of Automated Testing

Automated testing is a core part of writing reliable software; there's only so much you can test manually, and there's no way you can test things as thoroughly or as conscientiously as a machine. As someone who has spent an inordinate amount of time working on automated testing systems, for both work and open-source projects, I use this post to cover how I think of them: which distinctions are meaningful and which aren't, which practices make a difference and which don't, building up to a coherent set of principles for thinking about automated testing in any software project.


About the Author: Haoyi is a software engineer, an early contributor to Scala.js, and the author of many open-source Scala tools such as the Ammonite REPL and FastParse.


I probably care more about automated testing than most software engineers. At a previous job, I agitated-for and rolled-out Selenium integration tests as part of the development process across engineering, developed static-analysis "tests" to block silly mistakes and code quality issues, and led projects to fight the flaky test scourge to help Make The CI Build Green Again. In my open source work, e.g. in projects like Ammonite or FastParse, my ratio of test code to "main" application code often is about 1 to 1.

A lot has been written about the practice of automated testing: about Unit Testing, Property-based Testing, Integration Testing, and other topics. Unsurprisingly, much of the information you can find on the internet is incomplete, at odds with other sources, or only applies narrowly to certain kinds of projects or scenarios.

Rather than talking about individual tools or techniques, this post attempts to define a way of thinking about automated testing that should apply broadly regardless of what software project you are working on. Hopefully this should form a foundation that will come in useful when you end up having to lift your gaze from the daily grind of software development and start thinking about the broader strategy of automated testing in your project or organization.

The Purpose of Automated Tests

The purpose of automated tests is to try and verify your software does what you expect it to do, now and into the future.

This is a very broad definition, and reflects how there are very many different ways to try and verify your software does what you expect:

  • Calling a function on known inputs and assert-ing on the expected result

  • Setting up a staging website and checking that the web pages, together with all the systems behind it, can perform simple operations correctly

  • Fuzzing your system with large amounts of random input and just seeing if it crashes

  • Comparing the behavior of your program against a known-good reference implementation and ensuring it behaves identically

Note how the stated goal doesn't say a word about "unit" or "integration" testing. That is because those are almost never the end goal: you want tests that automatically check that your software does what you want, by whatever means necessary. "unit" or "integration" tests are only one distinction out of many different ways of approaching automated testing, several of which we will cover in this post.
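
As a rough illustration of how different these approaches can look in code, the sketch below checks one function in two of the ways listed: asserting on known inputs, and comparing against a known-good reference on many generated inputs. Everything here is hypothetical: insertionSort is an invented function under test, Data.List.sort plays the role of the reference implementation, and the input generator is a small deterministic linear congruential generator chosen only so that the example needs nothing beyond base.

```haskell
import Data.List (sort)

-- Hypothetical function under test: a hand-written insertion sort.
insertionSort :: [Int] -> [Int]
insertionSort = foldr insert []
  where
    insert x [] = [x]
    insert x (y:ys)
      | x <= y    = x : y : ys
      | otherwise = y : insert x ys

-- Approach 1: call the function on known inputs and compare with the
-- expected results.
knownInputsOk :: Bool
knownInputsOk =
  insertionSort [] == []
    && insertionSort [3, 1, 2] == [1, 2, 3]
    && insertionSort [2, 2, 1] == [1, 2, 2]

-- Deterministic pseudo-random values in the range 0..999, produced by a
-- small linear congruential generator.
pseudoRandoms :: Int -> [Int]
pseudoRandoms seed = map (`mod` 1000) (drop 1 (iterate step seed))
  where
    step s = (s * 1103515245 + 12345) `mod` 2147483648

-- Approach 2: compare against a known-good reference (Data.List.sort)
-- on generated inputs of lengths 0 through 50.
matchesReference :: Bool
matchesReference = all agrees [0 .. 50]
  where
    agrees n =
      let input = take n (pseudoRandoms (n + 1))
      in insertionSort input == sort input
```

Neither check is labelled "unit" or "integration"; both simply try to verify that the code does what is expected, which is the only goal stated so far.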

Now that we've defined the high-level goal, the rest of this post will go into much more detail about the intricacies and trade-offs inherent to the different ways we can try and achieve it.

Unit vs Integration tests

When you are working on automated tests, some arguments always come up:

  • Are we writing unit tests or integration tests?

  • Should we be writing unit tests or integration tests?

  • How do we define "unit tests" or "integration tests"?

There is an endless number of "one true way"s of distinguishing between unit and integration tests, all of them different. Rules like:

  • Unit tests must run entirely in a single process

  • Unit tests are not allowed to execute code in more than one file: all imports must be mocked

  • Unit tests are any test that don't cross the client-server boundary

However, I think such discussion often lacks perspective. In reality, the exact point where you draw the line is arbitrary. Every piece of code or system is a unit integrating smaller units:

  • A cluster that integrates multiple physical or virtual machines

  • A machine (virtual or physical) that integrates multiple processes

  • A process that integrates multiple other subprocesses (e.g. databases, workers, etc.)

  • A subprocess that integrates multiple modules

  • A module or package integrating multiple smaller modules

  • A module integrating individual functions

  • A function integrating primitives like Ints using basic arithmetic

Every piece of code or system could be thought of as a "unit" to be tested, and every piece of code or system could be thought of as an "integration" of other smaller units. Basically all software ever written is broken down hierarchically in this way.

                            _________ 
                           |         |
                           | Machine |
                           |_________|
                           /         \
                _________ /           \ _________
               |         |             |         |
               | Process |             | Process |
               |_________|             |_________|
               /                       /         \
              /              ________ /           \ ________              
            ...             |        |             |        |             
                            | Module |             | Module |             
                            |________|             |________|
                            /        \                      \
                __________ /          \ __________           \ __________ 
               |          |            |          |           |          |
               | Function |            | Function |           | Function |
               |__________|            |__________|           |__________|
               /          \            /          \           /          \
              /            \          /            \         /            \
            ...            ...      ...            ...     ...            ...

Earlier we defined that the purpose of automated testing is "to try and verify your software does what you expect", and in any piece of software you'll have code at every level of this hierarchy. All of that code is your responsibility to test and verify.

For consistency with existing terminology, I will call tests for code low in the hierarchy (e.g. functions integrating primitives) "unit" tests, and tests for code high in the hierarchy (e.g. a cluster integrating virtual machines) "integration" tests. But those labels are simply directions on a spectrum, and there isn't a bright line you can draw between the "unit" and "integration" labels that applies to every project.

                      Most tests somewhere in between
  Unit <------------------------------------------------------------> Integration
              |           |           |            |          |
          Functions    Modules    Processes    Machines    Clusters

What really matters is that you are conscious of the way your software is broken down hierarchically, and that automated tests can live at any level in the code hierarchy and any point in the unit-integration spectrum.

Being on a spectrum doesn't mean that the distinction between "unit" or "integration" tests is meaningless. While there is no bright line between the two extremes, tests towards each end of the spectrum do have different properties:

  Unit                         Integration
  ---------------------------  ---------------------------
  Low in the hierarchy         High in the hierarchy
  Fast                         Slow
  More reliable                More flaky
  Little setup required        Lots of setup required
  Few dependencies             Many dependencies
  Few failure modes            Many failure modes
  Specific failure messages    Generic failure messages

  • Unit tests tend to be faster, since they exercise less code that needs to run.

  • Unit tests tend to be more reliable, since they exercise less code that may have non-deterministic failures

  • Unit tests need less set up beforehand, since they have fewer dependencies

  • Unit tests tend to fail in relatively specific ways ("function was meant to return 1, returned 2") with few possible causes, whereas integration tests tend to fail with broad, meaningless errors ("could not reach website") with many different possible causes

What does this mean to you, as a test writer?

The distinction between "unit" and "integration" tests is up to you to define

A library of algorithms is likely to have a different definition of "unit" and "integration" tests than a website, which may have a different definition of "unit" and "integration" tests than a cluster deployment system.

  • The library-of-algorithms may define "unit" tests as tests which run one function on tiny inputs (e.g. sorting a list of zero, one, two or three numbers) while "integration" tests use multiple functions to construct common algorithms

  • The website may define "unit" tests as anything that doesn't touch the HTTP API, and "integration" tests as anything that does.

  • Alternately, a website may define "unit" tests as anything that doesn't spin up a browser (up-to-and-including API interactions) and "integration" tests as those that spin up a browser using Selenium to interact with the server through the UI/Javascript

  • The cluster-deployment-system may define "unit" tests as anything that doesn't physically create virtual machines, up to and including tests using HTTP APIs or database-access, while "integration" tests are those that spin up a real cluster in a staging environment

While there are differences (e.g. an algorithm-library's integration tests may run faster than a cluster-deployment-system's unit tests) in the end all these systems have tests that range from the "more unit" to "more integration" ends of the spectrum.

Thus it is up to the project owner to draw the line between them, and then build up practices around that line. The bullets above should give you some idea of where the line could be drawn in various projects, and practices around that line could be things like:

  • Unit tests must be run before-commit while integration tests are only run once a day during a nightly-build

  • Integration tests run on a separate CI machine/cluster from Unit tests due to their different setup requirements

There could be value in splitting up the spectrum of tests into more fine-grained partitions. Again, it is up to the project owner to decide how many lines to draw, where to draw them, what each group of tests is called (e.g. "unit", "integration", "end to end", "functional"?) and how they are treated within your project.

There is no universal classification of "unit" and "integration" tests that is meaningful across the huge range of software projects that people work on, but that does not mean the distinction is meaningless. It simply means that it is up to each individual project to draw the line in a way that is meaningful and useful.

Tests at every level of the hierarchy

Every piece of software is written hierarchically, as units integrating smaller units. And at every level, it is possible for the programmer to make a mistake: ideally a mistake that our automated tests would catch.

Hence, rules like "only write unit tests, not integration tests" or "only write integration tests, not unit tests" are overly restrictive.

  • You cannot just write unit tests. It doesn't matter if every individual function is heavily tested but your module combines your functions in an incorrect way, or if every individual module is heavily tested but the application-process is using the modules incorrectly. While it is great to have a test suite which runs really fast, it's useless if it can't catch bugs introduced at upper layers of your program hierarchy.

  • You shouldn't just write integration tests. In theory it works: code at an upper layer in the hierarchy exercises code at layers beneath it. But you would need a large number of integration tests to sufficiently exercise various cases in the low-level code. e.g. if you want to check a single function's behavior with 10 different sets of primitive arguments, using integration tests to test it may mean you end up setting-up and tearing-down an application process 10 times: a slow and wasteful use of your compute resources.

Instead, the structure of your tests should roughly mirror the structure of your software. You want tests at all levels, proportionate to the amount of code at that level and how likely/serious it is to be wrong. This guards against the possibility for errors to be introduced at any level in the hierarchy of your piece of software.
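A minimal sketch of this idea, using two made-up functions (parse_amount and total_cents are invented for this example): the low-level function gets many cheap checks covering its input variety, while the level above gets only a few checks covering how the pieces are combined.

```python
def parse_amount(text):
    """Lowest level: turn a string like " 12.50 " into integer cents."""
    dollars, _, cents = text.strip().partition(".")
    # Pad or truncate the fractional part to exactly two digits.
    return int(dollars) * 100 + int((cents + "00")[:2])

def total_cents(lines):
    """One level up: combine parse_amount over many lines, skipping blanks."""
    return sum(parse_amount(line) for line in lines if line.strip())

# Many cheap checks at the lowest level, where the input variety lives.
LOW_LEVEL_CASES = [
    ("1", 100),
    ("1.5", 150),
    ("1.05", 105),
    (" 12.50 ", 1250),
    ("0.99", 99),
]
for text, expected in LOW_LEVEL_CASES:
    assert parse_amount(text) == expected, text

# A few checks one level up, covering only how the pieces are combined.
assert total_cents(["1.50", "", "2"]) == 350
assert total_cents([]) == 0
```

The same pattern repeats further up: a module built on total_cents would get its own handful of tests, rather than re-testing every string format through the top-level entry point.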

How to Prioritize Tests

Automated tests serve two main purposes: making sure your code isn't already broken (perhaps in some way that's hard to catch via manual testing) and making sure that working code doesn't become broken at some point in the future (regressions). The former may be caused by an incomplete implementation, and the latter due to mistakes as the codebase evolves over time.

Thus, it doesn't make sense to have automated tests for code that isn't likely to be broken, code whose breakage isn't important, or code which is likely to disappear entirely before someone causes it to break.

It's more an art than science to decide how much testing a system or piece-of-code needs, but some guidelines may be:

  • Important things need more testing! Your password/authentication system definitely should be heavily tested to ensure bad passwords don't let people log in anyway, more so than other random logic in your application.

  • Less important things may need less testing or no testing at all. Maybe it doesn't matter if your web-upsell-modal-thingy doesn't appear for a few days until the next deploy, and maybe the only way to properly test it is via expensive/slow/flaky Selenium integration tests. If so, the Cost of Tests may dictate that you simply should not have automated tests for it.

  • Code under active development needs more tests, while code not under development needs fewer. If a scary piece of code has been untouched for years, it is unlikely to become broken if it wasn't before. Now, you may want tests to make sure it is not already broken, but you won't need tests to guard against regressions and "new" breakage.

  • APIs that aren't going to disappear should be tested more than APIs that might. You should focus more effort in testing the stable interfaces within your application, rather than testing unstable code that may be gone entirely in a week. Combined with the above guideline, the code that deserves the most testing has a stable API but has internals undergoing heavy development.

  • If the complexity in your code is in awkward places (inter-process, browser-server, with database interop, ...) you should make sure you test that logic, no matter how awkward. Do not just test the "easy" things: it doesn't matter how well individual functions are tested if the gnarly/fragile code tying them together ends up broken.

Many of these points are subjective, and cannot be determined purely from the code itself. Nevertheless, these are judgements you have to make when prioritizing where to focus your efforts writing automated tests for your codebase.

Tests are code

Tests are code like any other: your test suite is a piece of software that checks that your "main" software behaves in certain ways. Thus, your test code should be treated like any other proper piece of software:

  • Common test logic should be refactored out to helpers. If there's a bug in some common test logic, it's great to be able to fix it in one place rather than digging through copy-paste boilerplate over the test suite to apply fixes.

  • Tests should have the same code-quality standards applied as normal code: proper naming, formatting, comments, inline-docs, code organization, coding style and conventions.

  • Tests need to be refactored to maintain code quality. Any code gets messy and hard to handle as it grows, requiring refactoring to keep things neat and DRY and well-organized. Test code is no different, and as it grows and changes to support testing the growing/changing feature-set of the main application, it needs periodic refactoring to keep things DRY and maintain code quality.

  • Your test suite should be agile and flexible. If the API being tested changes, it should be quick and easy to change your tests. If a chunk of code is deleted, you should feel free to delete the corresponding tests, and if the code is re-written it should not be hard to re-write the tests to match. Proper abstractions/helpers/fixtures help make sure that modifying and re-writing parts of your test suite isn’t burdensome.

  • If your test abstractions/helpers/fixtures grow complex, they themselves should be tested, at least minimally.

Not everyone agrees with these guidelines. I have seen people who argue that tests are different from normal code. That copy-paste test code is not just acceptable, but preferable to setting up test abstractions and helpers to keep things DRY. The argument being it's simpler to see if there's a mistake in the tests when there's no abstraction. I do not agree with that point of view.

My view is that tests are code like any other, and should be treated as such.

DRY data-driven tests

Tests are code, and code should be DRY and factored such that only the necessary logic is visible and you don't have repeated boilerplate. One good example of this is defining test-helpers to let you easily shove lots of test cases through your test suite, and at-a-glance be able to see exactly what inputs your test suite is testing. For example, given the following test code:

// Sanity check the logic that runs when you press ENTER in the REPL and
// detects whether a set of input lines is...
//
// - Complete, and can be submitted without needing additional input
// - Incomplete, and thus needs additional lines of input from the user
def test1 = {
  val res = ammonite.interp.Parsers.split("{}")
  assert(res.isDefined)
}
def test2 = {
  val res = ammonite.interp.Parsers.split("foo.bar")
  assert(res.isDefined)
}
def test3 = {
  val res = ammonite.interp.Parsers.split("foo.bar // line comment")
  assert(res.isDefined)
}
def test4 = {
  val res = ammonite.interp.Parsers.split("foo.bar /* block comment */")
  assert(res.isDefined)
}
def test5 = {
  val res = ammonite.interp.Parsers.split(
    "val r = (1 until 1000).view.filter(n => n % 3 == 0 || n % 5 == 0).sum"
  )
  assert(res.isDefined)
}
def test6 = {
  val res = ammonite.interp.Parsers.split("{")
  assert(res.isEmpty)
}
def test7 = {
  val res = ammonite.interp.Parsers.split("foo.bar /* incomplete block comment")
  assert(res.isEmpty)
}
def test8 = {
  val res = ammonite.interp.Parsers.split(
    "val r = (1 until 1000.view.filter(n => n % 3 == 0 || n % 5 == 0)"
  )
  assert(res.isEmpty)
}
def test9 = {
  val res = ammonite.interp.Parsers.split(
    "val r = (1 until 1000).view.filter(n => n % 3 == 0 || n % 5 == 0"
  )
  assert(res.isEmpty)
}

You can see that it's doing the same thing over and over. It really should be written as:

// Sanity check the logic that runs when you press ENTER in the REPL and
// detects whether a set of input lines is...
//
// - Complete, and can be submitted without needing additional input
// - Incomplete, and thus needs additional lines of input from the user

def checkDefined(s: String) = {
  val res = ammonite.interp.Parsers.split(s)
  assert(res.isDefined)
}
def checkEmpty(s: String) = {
  val res = ammonite.interp.Parsers.split(s)
  assert(res.isEmpty)
}
def testDefined = {
  checkDefined("{}")
  checkDefined("foo.bar")
  checkDefined("foo.bar // line comment")
  checkDefined("foo.bar /* block comment */")
  checkDefined("val r = (1 until 1000).view.filter(n => n % 3 == 0 || n % 5 == 0).sum")
}
def testEmpty = {
  checkEmpty("{")
  checkEmpty("foo.bar /* incomplete block comment")
  checkEmpty("val r = (1 until 1000.view.filter(n => n % 3 == 0 || n % 5 == 0)")
  checkEmpty("val r = (1 until 1000).view.filter(n => n % 3 == 0 || n % 5 == 0")
}

This is just a normal refactoring that you would perform on any code in any programming language. Nevertheless, it immediately turns the boilerplate-heavy copy-paste test methods into elegant, DRY code which makes it obvious-at-a-glance exactly what inputs you are testing and what their expected output is. There are other ways you could do this: you could, e.g., define all the Defined cases in one Seq, all the Empty cases in another, and loop over them with asserts:

def definedCases = Seq(
  "{}",
  "foo.bar",
  "foo.bar // line comment",
  "foo.bar /* block comment */",
  "val r = (1 until 1000).view.filter(n => n % 3 == 0 || n % 5 == 0).sum"
)

for(s <- definedCases){
  val res = ammonite.interp.Parsers.split(s)
  assert(res.isDefined)
}

def emptyCases = Seq(
  "{",
  "foo.bar /* incomplete block comment",
  "val r = (1 until 1000.view.filter(n => n % 3 == 0 || n % 5 == 0)",
  "val r = (1 until 1000).view.filter(n => n % 3 == 0 || n % 5 == 0"
)

for(s <- emptyCases){
  val res = ammonite.interp.Parsers.split(s)
  assert(res.isEmpty)
}

Both refactorings achieve the same goal, and there are countless other ways of DRYing up this code. Which style you prefer is up to you.

There are a lot of fancy tools/terminology around this idea: "table-driven tests", "data-driven tests", etc. But fundamentally, all you want is for your test cases to be concise and the expected/asserted behavior obvious-at-a-glance. This is something that normal code-refactoring techniques are capable of helping you achieve without any fancy tooling. Only after you've tried to do this manually, and found it lacking in some way, is it worth starting to look at more specialized tools and techniques.
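The same idea carries over to other languages. Below is a sketch in Python of the "table" variant, where each row pairs an input with its expected result. Note that is_complete is a hypothetical stand-in written for this example: it only checks that brackets are balanced and is far simpler than the real REPL logic shown above.

```python
def is_complete(source):
    """Hypothetical stand-in for the REPL check: are all brackets closed?"""
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in source:
        if ch in "([{":
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack

# One table: each row is (input, expected result).
CASES = [
    ("{}", True),
    ("foo.bar", True),
    ("f(a, b[0])", True),
    ("{", False),
    ("f(a, b[0]", False),
    ("(]", False),
]

def run_cases():
    for source, expected in CASES:
        # The input is used as the assert message, so a failure names its row.
        assert is_complete(source) == expected, source

run_cases()
```

Adding a new case is a one-line change to the table, and the inputs and expectations can be read at a glance without any special tooling.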

Testing DSLs

There are a variety of testing DSLs that let you write tests in a very different way from normal code. I find general-purpose testing DSLs generally unhelpful, though there are use cases for DSLs specialized to a particular narrow use case.

General-Purpose Testing DSLs

These include external DSLs like the Cucumber family, which provide a whole new syntax to write your tests in:

Scenario: Eric wants to withdraw money from his bank account at an ATM
    Given Eric has a valid Credit or Debit card
    And his account balance is $100
    When he inserts his card
    And withdraws $45
    Then the ATM should return $45
    And his account balance is $55

Scenario Outline: A user withdraws money from an ATM
    Given <Name> has a valid Credit or Debit card
    And their account balance is <OriginalBalance>
    When they insert their card
    And withdraw <WithdrawalAmount>
    Then the ATM should return <WithdrawalAmount>
    And their account balance is <NewBalance>

    Examples:
      | Name   | OriginalBalance | WithdrawalAmount | NewBalance |
      | Eric   | 100             | 45               | 55         |
      | Pranav | 100             | 40               | 60         |
      | Ed     | 1000            | 200              | 800        |

They also include internal/embedded DSLs like Scalatest, which twist the host language's syntax into something English-like to let you write tests:

"An empty Set" should "have size 0" in {
  assert(Set.empty.size == 0)
}

"A Set" can {
  "empty" should { 
    "have size 0" in {
      assert(Set.empty.size == 0)
    }
    "produce NoSuchElementException when head is invoked" in { 
      intercept[NoSuchElementException] {
        Set.empty.head
      }
    }
    "should be empty" ignore { 
      assert(Set.empty.isEmpty)
    }
  }
}

val result = 8
result should equal (3) // By default, calls left == right, except for arrays
result should be (3)    // Calls left == right, except for arrays
result should === (3)   // By default, calls left == right, except for arrays

val one = 1
one should be < 7       // works for any T when an implicit Ordered[T] exists
one should be <= 7
one should be >= 0

result shouldEqual 3    // Alternate forms for equal and be
result shouldBe 3       // that don't require parentheses

My view of such DSLs is that they are generally not worth the effort. They provide an added level of indirection & complexity, whether through a special syntax/parser/interpreter in the case of Cucumber, or through special extension methods/syntax in the case of Scalatest. Both of these make it harder for me to figure out what a test is testing.

I see such syntaxes as generally inferior to just using asserts and normal helper-methods/for-loops/etc. to write tests. While they often provide additional features like nice error messages, these days test frameworks like PyTest or uTest are also able to provide such "nice" errors using plain-old-asserts:

$ cat test_foo.py
def test_simple():
    result = 8
    assert result == 3

$ py.test test_foo.py
=================================== FAILURES ===================================
_________________________________ test_simple __________________________________

    def test_simple():
        result = 8
>       assert result == 3
E       assert 8 == 3

test_foo.py:3: AssertionError
=========================== 1 failed in 0.03 seconds ===========================

As mentioned earlier, I think that Tests are code, and thus the normal code-writing-tools like functions, objects and abstractions you use when writing normal code work just fine for writing tests. If you aren't using Cucumber-like external DSLs or Scalatest-like embedded-english-like DSLs to write your main project, you should not be using such things to write your test suite.

Specialized Testing DSLs

While I think general purpose testing DSLs like Scalatest or Cucumber are not a good idea, specialized testing DSLs (e.g. narrowly defining the inputs/outputs of a test case) do have a purpose.

For example, the MyPy project uses a special syntax to define the input/output of test cases for its Python type checker:

[case testNewSyntaxBasics]
# flags: --python-version 3.6
x: int
x = 5
y: int = 5

a: str
a = 5  # E: Incompatible types in assignment (expression has type "int", variable has type "str")
b: str = 5  # E: Incompatible types in assignment (expression has type "int", variable has type "str")

zzz: int
zzz: str  # E: Name 'zzz' already defined

Where the # E: comments are asserts that the typechecker will raise specific errors at specific locations when checking this file.

My own Ammonite project has its own special syntax to assert the behavior of REPL sessions:

@ val x = 1
x: Int = 1

@ /* trigger compiler crash */ trait Bar { super[Object].hashCode }
error: java.lang.AssertionError: assertion failed

@ 1 + x
res1: Int = 2

In both of these cases, the DSL is narrowly scoped, to the extent where it is "obvious" what it is testing. Furthermore, these DSLs are only necessary when the "noise" of normal code becomes too great. For example, defining the above Ammonite test case in "normal code" looks something like

checker.run("val x = 1")
checker.assertSuccess("x: Int = 1")

checker.run("/* trigger compiler crash */ trait Bar { super[Object].hashCode }")
checker.assertError("java.lang.AssertionError: assertion failed")

checker.run("1 + x")
checker.assertSuccess("res1: Int = 2")

Here, you can see that the Ammonite REPL-test-case DSL is a clear improvement in readability compared to writing the tests in "normal" code! It is in these cases, where a DSL actually reduces the amount of noise/ceremony beyond what normal code can do, where you should reach towards a specialized DSL. In all other cases, and certainly as a default, your tests should be written in the same style of code as the main codebase it is testing.
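Such a narrow DSL does not need much machinery. As a rough sketch (this is a guess at one possible approach, not how Ammonite actually implements it), a transcript in the format above can be split into (command, expected output) pairs with a few lines of ordinary code. The function name parse_session is invented for this example.

```python
def parse_session(text):
    """Split a session transcript into (command, expected_output) pairs.

    Lines starting with "@ " are commands; the non-blank lines that follow,
    up to the next command, are the output expected from that command.
    """
    steps = []
    for line in text.strip().splitlines():
        if line.startswith("@ "):
            steps.append((line[2:], []))
        elif line.strip() and steps:
            steps[-1][1].append(line)
    return [(command, "\n".join(output)) for command, output in steps]

SESSION = """
@ val x = 1
x: Int = 1

@ 1 + x
res1: Int = 2
"""

steps = parse_session(SESSION)
for command, expected in steps:
    print(repr(command), "->", repr(expected))
```

Each pair could then be handed to something like the checker.run and checker.assertSuccess calls shown above, so the DSL stays a thin layer over normal test code.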

Example vs Bulk tests

Example tests are those which walk your code through a single (or small number of) example, with careful asserts along the way to make sure it's doing exactly the right thing. Bulk tests, on the other hand, are those which shove large amounts of examples through your code, with a less thorough examination of how each case behaves: just making sure it's not crashing, with perhaps a rough check to make sure it's not completely misbehaving. Fuzz Testing or Property-based Testing are two common approaches within this category.

Like the distinction between Unit vs Integration tests, Example vs Bulk tests are a spectrum, with most tests falling somewhere in the middle. The DRY data-driven tests above, for example lie somewhere in the middle: covering more than one set of input data with the same set of checks, but not the hundreds or thousands of different inputs normally associated with fuzz tests or property-based tests.

The Example vs Bulk spectrum is orthogonal to the Unit vs Integration spectrum, and you can easily find examples towards every extreme in the two spectrums:

|         | Unit | Integration |
| Example | Feeding [1, 0] into a sorting algorithm and ensuring it becomes [0, 1] | Clicking through a single flow on a website and making sure a particular flow works |
| Bulk    | Feeding large amounts of random numbers into a sorting algorithm and ensuring it ends up sorted | Clicking around a website randomly overnight to make sure no 500 errors appear |

Example Tests

Example tests are often what people first think of when they hear "automated testing": tests that use an API in a certain way and check the results. Here's one such test from my FastParse library, which tests a trivial parser that parses a single character a:

import fastparse.all._
val parseA = P( "a" )

val Parsed.Success(value, successIndex) = parseA.parse("a")
assert(
  value == (), 
  successIndex == 1
)

val failure = parseA.parse("b").asInstanceOf[Parsed.Failure]
assert(
  failure.lastParser == ("a": P0),
  failure.index == 0,
  failure.extra.traced.trace == """parseA:1:1 / "a":1:1 ..."b""""
)

As you can see, this takes multiple steps:

  • Defining a parser
  • Using it to parse different strings
  • Checking that it succeeds and fails when it should succeed or fail
  • Checking that the contents of each success and failure are what we expect

This is not unlike what you would do poking around in the REPL, except in a REPL we would simply eyeball the values returned by the library while here we use asserts.

Often, example tests are paired with manual testing: you poke around in a REPL or run the main method during development to make sure the feature works. Then you add a test that does basically the same thing to the test suite to ensure the feature keeps working and avoids regressions. If you do TDD, you may write the test first, but everything else remains the same.

Example tests are good documentation: often, just from reading a few examples, it's relatively clear what a module does and how it is expected to be used. Example tests are great for covering the "expected" success and failure cases, those that you probably already tested manually. However, they are not enough to cover "unexpected" cases. You can make it easier to cover a bunch of input/output test cases via DRY data-driven tests, but in the end you are still limited by what examples you can imagine, which are only a subset of all the possible inputs. That is where Bulk tests come in.

Bulk tests

Bulk tests are those that check many more cases than you can cover via manual testing: rather than running a piece of code once and checking what it does, bulk tests run the code with 100s of 1000s of different inputs. This lets you cover unexpected cases you never thought to test manually, or add to your Example tests.

There are well-known approaches like Fuzz Testing or Property-based Testing that are ways of performing bulk tests, and frameworks like QuickCheck or ScalaCheck that help with this and provide a lot of bells and whistles, but in the end bulk testing boils down to something like this:

for i in range(0, 9999):
    for j in range(0, 9999):
        result = func(i, j)
        assert(sanity_check(result))

Here, we're calling func with a hundred million different inputs, with a simple sanity_check function that doesn't know all the expected outputs for each input, but can check basic things like "output is not negative". At the same time, we are checking that func doesn't throw an exception or loop forever on some input.

What to Bulk Test

Bulk tests are slower than single example tests, due to the number of inputs they test. Thus their Cost of tests is much higher, and they should be used with care. Nevertheless, for functionality where the range of possible inputs is large and it's hard to manually pick example tests to cover all edge cases, they can be worth the cost. Examples include:

  • Mathy algorithms with lots of while-loops, if implemented incorrectly, tend to end up in infinite loops when certain combinations of numbers are input

  • Programming language analyzers, where the range of possible input programs is large and often includes code patterns you didn't expect

  • Log file parsers, where often due to the messy-and-unstructured nature of logs it's hard to know exactly what kind of patterns you need to accept or reject

In such cases, feeding in large amounts of varied test-data helps suss out edge cases you may not have thought of yourself. The test data could be a wide range of random numbers, sample programs sourced from the internet, or a day's worth of logs pulled from your production environment.

How to Bulk Test

When dealing with such large sets of inputs, "correct" isn't defined by an equally big set of expected outputs. Rather, "correct" is usually defined by a relationship between the input and output that you expect to be true regardless of what the input is:

  • "any input does not cause program to throw an exception or loop forever".

  • "the output list of a sorting function must contain all values in the input list exactly once, and the output list must be sorted"

  • "all lines parsed from my sample log file must contain a date between X and Y, and no other lines in that log file must contain the flag USER_CREATED"

Usually, the checks you do in these bulk tests are simpler and much less precise than the checks you would do in example tests. It's not practical to run your log-file parser against a log dump and try and assert the thousands of values returned precisely match a thousands-long list of expected values: you are just as likely to make an error in entering your expected-output as you are in the logic of the parser itself! Nevertheless, we know that some properties should always hold true, regardless of exactly what values come out of your program. Those properties are what bulk tests are meant to test for.
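The sorting property from the list above can be sketched as follows. The insertion_sort function is just a stand-in for whatever sorting routine is under test, and the helper names are invented for this example. The random generator is seeded so that a failure can be reproduced.

```python
import random
from collections import Counter

def insertion_sort(values):
    """Function under test: a simple stand-in for any sorting routine."""
    result = []
    for value in values:
        index = len(result)
        while index > 0 and result[index - 1] > value:
            index -= 1
        result.insert(index, value)
    return result

def check_sort_properties(sort, inputs):
    """Properties that must hold for any input, without knowing the exact output."""
    output = sort(list(inputs))
    # Same values, each appearing the same number of times as in the input.
    assert Counter(output) == Counter(inputs), inputs
    # Every element is <= the one after it.
    assert all(a <= b for a, b in zip(output, output[1:])), inputs

def bulk_test_sort(sort, runs=1000, seed=0):
    """Feed many random lists through `sort` and check the properties on each."""
    rng = random.Random(seed)  # fixed seed so failures can be reproduced
    for _ in range(runs):
        size = rng.randint(0, 20)
        check_sort_properties(sort, [rng.randint(-50, 50) for _ in range(size)])
    return runs

bulk_test_sort(insertion_sort)
```

Neither check knows what the "right" output is for any particular input, yet together they rule out dropped values, duplicated values and out-of-order results.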

Apart from generating a huge pile of input data using for-loops, you can often find lots of real-world input-data to feed into your code. If we are testing a program meant to process Python source code, for example, such a bulk-test may look like

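import os

repos = [
    "dropbox/changes",
    "django/django",
    "mitsuhiko/flask",
    "zulip/zulip",
    "ansible/ansible",
    "kennethreitz/requests"
]
for repo in repos:
    # clone_repo is assumed to check the repository out into a local
    # directory with the same relative path as `repo`
    clone_repo("https://github.com/" + repo)
    # os.walk yields (directory, subdirectories, filenames) for each directory
    for dirpath, dirnames, filenames in os.walk(repo):
        for filename in filenames:
            if filename.endswith(".py"):
                result = process_python_source(os.path.join(dirpath, filename))
                assert sanity_check(result)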

Bulk tests are often much slower than example tests: perhaps taking seconds or minutes to run, instead of milliseconds. Furthermore, bulk tests tend to be opaque and unreadable: when you're generating thousands of test values or loading thousands of test inputs from the internet, it's not clear which inputs are the actual edge cases and which inputs are common and uninteresting.

Minimizing Bulk Tests to Example Tests

Thus it is often worth minimizing the bulk test cases that cause bugs and adding them to your example test suite. This means your example tests end up containing a good selection of the edge cases that occur in the bulk test data. This serves as good documentation for edge cases that someone modifying the program code should pay attention to, and lets them quickly run tests for the "most important" edge cases to check for basic correctness in milliseconds rather than waiting seconds or minutes for bulk tests to run.

My own FastParse library has a test suite in this fashion: with an expansive (and expensive!) bulk test suite that spends several minutes pulling in thousands of source files from the internet, parsing them, and performing basic checks ("any file that the existing parser can successfully parse, we can parse too"). This is paired with a large collection of DRY data-driven example tests. These contain minimized examples of all the issues the bulk tests have found, and run in less than a second.

Again, there are Property-based Testing tools like QuickCheck or ScalaCheck that help with writing this kind of bulk test. They make it easy to generate large quantities of "representative" data to feed into your functions, to automatically find "small" failing inputs, that are easier to debug, and have many other nice things. However, they aren't strictly necessary: sometimes, a few for-loops, a dump of production data, or a few large inputs found "in the wild" are enough to serve this purpose. If you are finding the quick-n-dirty methods of performing bulk tests lacking, only then should you start looking for more sophisticated tools.
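To give a feel for what "finding a small failing input" means, here is a deliberately naive sketch: it keeps dropping elements from a failing list for as long as the failure still reproduces. This is not how QuickCheck or ScalaCheck shrink inputs; it is only a quick-n-dirty illustration, and buggy_max is a made-up function with a planted bug.

```python
def minimize(failing_input, fails):
    """Greedily drop elements while the input still triggers the failure.

    `fails` is a function returning True when an input reproduces the bug.
    """
    current = list(failing_input)
    shrunk = True
    while shrunk:
        shrunk = False
        for index in range(len(current)):
            candidate = current[:index] + current[index + 1:]
            if fails(candidate):
                current = candidate
                shrunk = True
                break
    return current

def buggy_max(values):
    """Hypothetical function under test: wrong when every value is negative."""
    best = 0
    for value in values:
        if value > best:
            best = value
    return best

def reproduces_bug(values):
    return len(values) > 0 and buggy_max(values) != max(values)

big_failure = [-7, -3, -12, -5, -9, -1]
small_failure = minimize(big_failure, reproduces_bug)
print(small_failure)
```

The minimized input is the kind of thing worth copying into the example test suite, since it documents the edge case far more clearly than the original bulk input did.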

Cost of tests

Tests are not free: after all, someone has to write them! Even after they're already written, tests are still not free! Every test imposes an ongoing cost on your test suite. Each test:

  • Makes your test suite slower
  • Makes your test suite flakier
  • Will need to be maintained: updated when the main code changes, grepped through during refactorings, etc.

These aren't theoretical concerns:

  • I've worked with codebases with tens of thousands of tests we ran hundreds of times a day: even a 1 in a million chance of flakiness per-test was enough to create several false-positives each day, confusing and frustrating our engineers. Test runs were costing 6-figure sums every month, and the millions of lines of code we maintained actively slowed us down.

  • In my open source work, some projects take a half-hour to run tests in CI (across 5 parallel workers!), and I've been frustrated with flakiness causing false red-builds.
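To put rough numbers on the first bullet above, here is a back-of-the-envelope calculation. The suite size and run count are assumed values picked to match "tens of thousands" and "hundreds"; they are not the real figures from that codebase.

```python
tests_in_suite = 20_000        # assumed: "tens of thousands of tests"
runs_per_day = 200             # assumed: "hundreds of times a day"
flake_chance_per_test = 1e-6   # "1 in a million"

# Expected number of spurious test failures per day, across all runs.
expected_flakes_per_day = tests_in_suite * runs_per_day * flake_chance_per_test

# Chance that a single run of the whole suite hits at least one flaky failure.
chance_run_is_flaky = 1 - (1 - flake_chance_per_test) ** tests_in_suite

print(f"{expected_flakes_per_day:.1f} flaky failures per day")
print(f"{chance_run_is_flaky:.2%} of runs hit at least one flake")
```

With these assumed numbers that works out to about four spurious failures a day, with roughly 2% of suite runs affected, even though each individual test is extremely reliable.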

Parallelizing your tests over multiple machines can speed up slow tests, but costs $$$, more than just running the tests on a single machine due to the per-machine setup overhead.

Automated tests can be "not worth it" if they take forever to run, are not reliable, are difficult to maintain and/or cover things which are of low priority to test. Such tests are actively harmful: they should not be written, and if already written should be deleted. I have personally deleted many such tests, e.g. selenium tests for a web upsell experiment that:

  • Added 15 minutes to our test suite: selenium tests easily take a minute each, and this tested many cases through selenium

  • Flaked several times every day

  • Tested a feature that would have been noticed immediately if it wasn't behaving correctly: the experiment already has all the logging for AB testing and tracking users' engagement with it

  • Wouldn't really have mattered if it broke for a day: no data lost, no functionality broken, no user would have even noticed

  • Even if they did matter, they weren't likely to catch any bugs or regressions in the 2-3 weeks before the experiment was going to be discarded anyway

In such cases, you should thank the authors for trying their best to be good engineers and testing their code, but nevertheless delete those tests if they are not pulling their weight.

In my open source work on Ammonite, I similarly ended up deleting many entries from my {Scala-Version x JVM-Version} test matrix that were adding tens of minutes to the test suite but were unlikely to catch any bugs that weren't already caught by other entries in the matrix. While it would be "nice" to run tests on the product of every Scala version and every JVM version, in practice it was costing enough time and catching sufficiently few bugs that it was not worth it.

Refactoring to reduce the cost of tests

Apart from not-writing or deleting tests whose cost is too high, you can also put in effort to try and reduce the cost of the tests you already have. For example, refactoring/modularizing code often lets you push tests away from big "integration" tests towards small "unit" tests, which are faster and more reliable:

  • Global variables often force you to spawn new subprocesses to test application logic. If you refactor the logic to remove the dependence on globals, you can often test the logic within the same process, which can be much faster. This is especially important on slow-booting platforms like the JVM, but even snappy interpreters like Python easily take 10s to 100s of milliseconds to load the necessary modules before running.

  • Database access is slower than in-memory logic; for example, rather than loading from the database in bits and pieces throughout your code, perhaps load the necessary data up-front then feed it into your "core business logic". That way you can run tests of your database-free core business logic in-memory thousands of times faster than if they had to keep interacting with the database.

Essentially, this involves taking a monolithic application which looks like:

                     ____________________
                    |                    |
                    |                    |
                    |    Application     | <-- Lots of Integration Tests
                    |                    |
                    |____________________|

And breaking it down to look something like this:

                         __________
                        |          |
                        |   Main   |  <------- Few Integration Tests
                        |__________|
                        /   |  |   \ 
             __________/    |  |    \__________
            /               /  \               \
 __________/     __________/    \__________     \__________                        
|          |    |          |    |          |    |          |                        
|  Module  |    |  Module  |    |  Module  |    |  Module  | <-- Lots of Unit Tests
|__________|    |__________|    |__________|    |__________|                        

Now that your monolith has been broken down into smaller units, you can then start shifting from the "integration" towards the "unit" ends of the spectrum: many integration tests previously testing logic within the monolithic Application can now be shifted to unit tests for individual Modules, following the guideline of having Tests at every level of the hierarchy.

In these cases, you usually want to leave a few "integration" tests running the Main module to exercise the full flow and making sure the various Modules work together. Even so, the exercise of breaking apart your monolith into modules, and updating your tests to match, should make your test suite run much faster and more reliably, without much of a loss in bug-catching-capability.
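As a small illustration of the database guideline above, the sketch below separates a thin loading layer from pure business logic. The invoices schema and all function names are made up for this example, and an in-memory SQLite database stands in for whatever real database an application would use.

```python
import sqlite3

def overdue_total(invoices, today):
    """Core business logic: pure and in-memory, so no database is needed to test it."""
    return sum(amount for due, amount in invoices if due < today)

def load_invoices(connection):
    """Thin database layer: load everything the core logic needs up front."""
    return connection.execute("SELECT due, amount FROM invoices").fetchall()

def main(connection, today):
    """Top level: wires the database layer to the core logic."""
    return overdue_total(load_invoices(connection), today)

# Lots of fast unit tests can target the pure function directly.
# Dates are ISO strings, which compare correctly as plain text.
assert overdue_total([], "2017-09-01") == 0
assert overdue_total([("2017-08-01", 50), ("2017-10-01", 70)], "2017-09-01") == 50

def integration_test():
    """A few integration tests exercise the full flow against a real connection."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE invoices (due TEXT, amount INTEGER)")
    connection.executemany(
        "INSERT INTO invoices VALUES (?, ?)",
        [("2017-08-01", 50), ("2017-08-15", 25), ("2017-10-01", 70)],
    )
    try:
        return main(connection, "2017-09-01")
    finally:
        connection.close()

assert integration_test() == 75
```

The unit tests never touch a connection, so they stay fast however many cases are added, while the single integration test still checks that the query and the business logic agree on the shape of the data.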

Again, this strategy applies at every level of your code hierarchy, whether you are breaking apart a monolithic cluster, monolithic application process, or a monolithic module.

If your test suite is growing big/slow/unreliable, and you are reluctant to delete tests or pay money to parallelize them over different machines, trying to refactor code to convert integration tests to unit tests is one possible way forward.


It is surprisingly easy to write tests with negative value. Tests have an ongoing cost: in runtime, flakiness, and maintenance. This is something that engineers should definitely keep in mind, and actively manage, to maximize their return-on-investment for writing and maintaining their suite of automated tests.

Conclusion

This post has gone over a number of considerations I keep in mind when writing automated tests:

The goal of this post is to paint a different picture of automated tests than is normally discussed: a picture where automated tests lie on continuous spectrums, rather than discrete buckets, and it's up to each project owner to categorize them. Where tests are "just code", subject to the same constraints and allowing for the same techniques, rather than being something special and different. Where tests are ruthlessly prioritized, and those that provide less value than their ongoing costs are culled.

This post is intentionally silent about a whole host of test-writing topics: Test-Driven Development, code coverage, UI testing, and many other things. Rather than specific tools you should use or techniques you can apply, this post is meant to paint a coherent set of principles for how to think about automated testing in any software project.

Even without such specific guidance, this post should hopefully provide you a solid foundation that should help you frame, discuss and evaluate any tools, techniques or practices related to automated testing regardless of what project you find yourself working in.

SharknAT&To (vulnerabilities in AT&T U-verse modems)


 Introduction

When evidence of the problems described in this report was first noticed, it seemed almost too hard to believe. However, for those familiar with the technical history of Arris and its habit of carelessly leaving hardcoded accounts on its products, this report will sadly come as no surprise. For everyone else, prepare to be horrified.

In all fairness, it is uncertain whether these gaping security holes were introduced by Arris (the OEM) or if these problems were added after delivery to the ISP (AT&T U-verse). From examining the firmware, it seems apparent that AT&T engineers have the authority and ability to add and customize code running on these devices, which they then provide to the consumer (as they should).

Some of the problems discussed here affect most AT&T U-verse modems regardless of the OEM, while others seem to be OEM specific. So it is not easy to tell who is responsible for this situation. It could be either, or more likely, it could be both. The hope behind writing this is that the problems will be swiftly patched and that going forward, peer reviews and/or vulnerability testing on new releases of production firmware will be implemented prior to pushing it to the gateways. Security through obscurity is not acceptable in today’s high threat landscape and this is especially true regarding devices which a) route traffic, sensitive communications and trade secrets for millions of customers in the US, b) are directly reachable from the Internet at large, and c) have wireless capability and therefore have an additional method of spreading infection and releasing data.

Regardless of why, when, or even who introduced these vulnerabilities, it is the responsibility of the ISP to ensure that their network and equipment are providing a safe environment for their end users. This, sadly, is not currently the case. The first vulnerability found was caused by pure carelessness, if it was not intentional altogether. Furthermore, it is hard to believe that no one is already exploiting this vulnerability to the detriment of innocents. Which is why this report is not passing Go, not collecting $200, and is going straight to the public domain. The vulnerabilities found here will be ordered roughly from least to most prevalent.

1. SSH exposed to The Internet; superuser account with hardcoded username/password.

It was found that the latest firmware update (9.2.2h0d83) for the NVG589 and NVG599 modems enabled SSH and contained hardcoded credentials which can be used to gain access to the modem’s “cshell” client over SSH. The cshell is a limited menu driven shell which is capable of viewing/changing the WiFi SSID/password, modifying the network setup, re-flashing the firmware from a file served by any tftp server on the Internet, and even controlling what appears to be a kernel module whose sole purpose seems to be to inject advertisements into the user’s unencrypted web traffic. Although no clear evidence was found suggesting that this module is actually being used currently, it is present, and vulnerable. Aside from the most dangerous items listed above, the cshell application is also capable of many other privileged actions. The username for this access is remotessh and the password is 5SaP9I26.

Figure 1: Attacker view of cshell after login to an affected U-verse modem.

To reiterate the carelessness of this firmware’s release, the cshell binary is running as root, so any exploitable command, injection vulnerability or buffer overflow will result in a root shell. Yes, it is running as root, and it is trivially susceptible to command injection. Because the menu’s ping functionality does not sanitize its parameters, one can execute arbitrary commands through the menu, or escape the menu altogether. An example payload is shown below.

>> ping -c 1 192.168.1.254;echo /bin/nsh >>/etc/shells

>> ping -c 1 192.168.1.254;echo /bin/sh >>/etc/shells

>> ping -c 1 192.168.1.254;sed -i ‘s/remotessh:\/:\/bin\/cshell/remotessh:\/:\/bin\/nsh/g’ /etc/passwd

Now type exit and then reconnect via SSH. The prompt will change from NOS/xxxxxxxxxxxxx to Axis/xxxxxxxxxxxxxxx. At this point the attacker can type “!” and will be given a busybox root shell.

Please note that the cshell binary was only examined briefly and only until the easiest exploit was found. Judging by the binary’s repetitive use of unsafe C functions, one can guess that hundreds of additional vulnerabilities exist. However, we find it highly amusing that the first vulnerability found was so trivial that it looks like it came out of one of those “hacking tutorials” that were popular in the 90’s (Google “how to hack filetype:txt”).
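For contrast, here is a minimal sketch of how a ping menu item could validate its argument before anything reaches a shell. This is not the firmware's code; the function name `build_ping_argv` is hypothetical. It assumes the menu only ever needs to ping a literal IP address, and it passes the arguments as a list so that no shell is involved.

```python
import ipaddress
from typing import List


def build_ping_argv(target: str, count: int = 1) -> List[str]:
    """Return an argument vector for ping, or raise ValueError.

    ipaddress.ip_address() accepts only a literal IPv4/IPv6 address, so
    input such as '192.168.1.254;echo /bin/sh >>/etc/shells' is rejected.
    """
    addr = ipaddress.ip_address(target.strip())
    if not 1 <= count <= 10:
        raise ValueError("count out of range")
    # A list of arguments (no shell string) means ';' is never interpreted.
    return ["ping", "-c", str(count), str(addr)]


# The resulting list could then be handed to subprocess.run(argv) without
# shell=True; that call is left out so the sketch has no side effects.
```

With this check in place, the payloads shown above would fail at the validation step instead of being appended to a command line.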

This is the first and least common vulnerability that was discovered. The number of exposed devices is not as large as for the rest, but it is still quite unacceptable when you realize that these devices directly correlate to people being put at unnecessary risk of theft and fraud.

Censys reports 14,894 hosts which are likely vulnerable. There is no guarantee expressed or implied in terms of this number being all-inclusive however.

2. Default credentials “caserver” https server NVG599

An HTTPS server of unknown purpose was found running on port 49955 with default credentials. The username tech with an empty password field conveyed access to this highly vulnerable web server, which used only a Basic Authorization scheme. The server seems slightly unstable in how it handles authorization, denying access on the first attempt even with valid credentials and eventually locking up completely with an “unauthorized” message. It remains unclear whether this is just poor coding or more security through obscurity, but either is unacceptable.

3. Command Injection “caserver” https server NVG599

How many vulnerabilities did you find in the screenshot above?

The next vulnerability is the caserver command injection vulnerability. The exact intended purpose of caserver is unclear but its implications are not. Caserver is an https server that runs on port 49955 of affected devices (which seems to only be the NVG599 modem). The caserver script takes several commands, including:

  • Upload of a firmware image
  • Requests to a get_data handler which enumerates any object available in its internal “SDB” databases with a lot of fruitful information
  • Requests to a set_data command which allows changes to the SDB configuration

The screenshot below shows the request which causes command injection, again … as the root user. Note that for the first request the server will probably reply “You are not authorized to access this page”. This can simply be ignored; resubmitting the request shown will yield command execution. The service can be a little quirky: it locks you out after about 5 requests. A reboot will fix the issue if you are testing and running into this problem. The User-Agent field seems to be required, but any string will suffice.

There are countless ways to exploit this, but a few quick and dirty stacked commands that use wget to download busybox with netcat (mips-BE) from an http server (no SSL support) and then spawn a reverse shell work well.

Estimating the number of hosts affected was trickier due to the service being on an uncommon port. Host search engines such as Censys and Shodan don’t commonly scan for these services or ports. Based on self-collected data, our ballpark figure is around 220,000 devices.

4. Information disclosure/hardcoded credentials

The next vulnerability involves a service on port 61001 which will give an attacker a plethora of useful data about the device. The attacker however, will need to know the serial number of the device ahead of time. Once this information is acquired, the request can be made.

Figure 3: Request to BDC server.

Just before the serial number notice the characters “001E46”. This number correlates with the model number and is a valid Arris Organizationally unique identifier (OUI). This particular OUI was brute-forced from a list of Arris OUIs obtained at https://www.wireshark.org/tools/oui-lookup.html.

When the correct serial number, OUI, and username/password are submitted as above the server will hang for several seconds before returning a response. Afterwards, several pieces of invaluable information are returned about the modem’s configuration, as well as its logs. The most sensitive pieces of information are probably the WiFi credentials and the MAC addresses of the internal hosts, as they can be used for the next vulnerability.

The hardcoded username/password credentials are bdctest/bdctest. This is the second most prevalent vulnerability, but at the moment it is not the biggest threat since the modem’s serial number is needed to exploit it. This may change if an attacker were to find a reliable way of obtaining the serial number. If the aforementioned “caserver” is present, an attacker could also use it to retrieve the serial number by requesting a valid file present in the webroot other than “/caserver”. One such example would be “/functions.lua”. Sending a GET request to this file will return the serial number amongst the headers.

This normally would not be advantageous for an attacker since the presence of the caserver service equates to root shell access. However, if the caserver is locked, then this is a method to overcome the lockout since only the path “/caserver” is locked out.

5. Firewall bypass no authentication

The most prevalent vulnerability based solely on the high number of affected devices is the firewall bypass that is made possible by the service listening on port 49152. This program takes a three byte magic value “\x2a\xce\x01” followed by the six byte mac address and two byte port of whichever internal host one would like to connect to from anywhere on The Internet! What this basically means is that the only thing protecting an AT&T U-verse internal network device from The Internet is whether or not an attacker knows or is able to brute-force the MAC address of any of its devices! Note however, that the first three bytes (six characters) of a MAC address are very predictable since they correspond to the manufacturer. Given this an attacker could very well start out with this scheme with the unknowns marked as:

“\x2a\xce\x01\xab\x23\xed\x??\x??\x??\x??\x??”

To make matters worse, this tcp proxy service will alert the attacker when they have found a correct MAC address by returning a different error code to signify that either the host didn’t respond on the specified port or that an RST was returned. Therefore, the attacker is able to attack the MAC address brute-force and the port brute-force problems separately, greatly decreasing the amount of keyspace which must be covered. The scheme now looks something like this (guessing last three bytes of MAC):

“\x2a\xce\x01\xab\x23\xed\x??\x??\x??\xaa\xaa”

Followed by (Guessing port, same as a TCP port scan):

“\x2a\xce\x01\xab\x23\xed\x38\x41\xa0\x??\x??”

At this point it is feasible for a determined hacker to use a brute force attack. Aside from the brute force approach, there are other methods of obtaining the MAC addresses, such as the previously mentioned vulnerability, or using a wireless device in monitor mode in order to sniff the wireless clients’ MAC addresses. Basically, if your neighbor knows your public IP address, you are in immediate danger of intrusion.

Going off of the example above, suppose the device with MAC address ab:23:ed:38:41:a0 has an http server running on port 80 (with the firewall configured to not allow incoming traffic) and an attacker wants to connect and issue a GET request on the webroot. The command will be:

python -c 'print "\x2a\xce\x01\xab\x23\xed\x38\x41\xa0\x00\x50GET / HTTP/1.0\n\n"' | nc publicip 49152

This will open an unauthorized TCP connection between the attacker and the “protected” web server despite the user never authorizing it.
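To make the byte layout concrete, the sketch below decodes such a header into its parts: three magic bytes, six MAC bytes, and a two-byte port. The function name `parse_proxy_header` is hypothetical, and the big-endian port order is an assumption inferred from the example above, where `\x00\x50` stands for port 80.

```python
import struct

MAGIC = b"\x2a\xce\x01"  # three-byte magic value quoted in the text


def parse_proxy_header(data: bytes):
    """Split a request into (mac, port, payload).

    Layout assumed from the description above: 3 magic bytes,
    6 MAC bytes, then a 2-byte port in network (big-endian) order.
    """
    if len(data) < 11 or data[:3] != MAGIC:
        raise ValueError("not a proxy request header")
    mac = ":".join(f"{b:02x}" for b in data[3:9])
    (port,) = struct.unpack("!H", data[9:11])
    return mac, port, data[11:]


example = b"\x2a\xce\x01\xab\x23\xed\x38\x41\xa0\x00\x50GET / HTTP/1.0\n\n"
print(parse_proxy_header(example))
# ('ab:23:ed:38:41:a0', 80, b'GET / HTTP/1.0\n\n')
```

A decoder like this could also be useful on the defensive side, for example to recognize such requests in captured traffic.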

It is believed that the original purpose of this service was to allow AT&T to connect to the AT&T issued DVR devices which reside on the internal LAN. However, it should be painfully obvious by now that there is something terribly wrong with this implementation. Added to the severity is the fact that every single AT&T device observed has had this port (49152) open and has responded to probes in the same way. It is also important to note that the gateway itself cannot be connected to in this manner. For example, an attacker cannot set the MAC address to that of the modem’s LAN interface and the port to correspond to the web configuration console. This attempt will fail. This TCP proxy service will only connect attackers to client devices.

In Conclusion

In 2017, when artificial intelligence runs the largest advertising firm on the Internet, when only last year the largest leaks in American history occurred, and where vehicles are self driving, autonomous, Internet connected, and hacked … why do we still find CGI injections, blank default passwords with root privileged services exposed, and what most will likely term “backdoored” credentials?

Developing software is no trivial ask, and it is part of this company’s core services, but carelessness of this magnitude should come with some accountability. Below are some workarounds for the vulnerabilities described in this write-up. The time of full disclosure is gone (mostly), but let the time of accountability begin.

Accountability, or is it OK to continuously accept free credit monitoring for vendors, governments, and corporations “accidentally” exposing your privacy and, in this case, maybe that of your family too?

Vulnerability 1: SSH exposed to The Internet; superuser account with hardcoded username/password.

To disable the SSH backdoor, perform the following commands. Substitute “ipaddress” with your gateway’s IP address (internal or external).

ssh remotessh@ipaddress

(Enter password 5SaP9I26)

NOS/255291283229493> configure

Config Mode v1.3

NOS/255291283229493 (top)>> set management remote-access ssh-permanent-enable off

NOS/255291283229493 (top)>> save

NOS/255291283229493 (top)>> exit

NOS/255291283229493> restart

Vulnerabilities 2 & 3; Disable CASERVER for the NVG599.

If also suffering from vulnerability 4, please refer to vulnerability 4’s mitigation steps before proceeding with these steps. Using Burpsuite or some other application which lets you customize web requests, submit the following request to the gateway’s external IP address from outside of the LAN.

POST /caserver HTTP/1.1
Host: FIXMYMODEM
Authorization: Basic dGVjaDo=
User-Agent: Fixmymodem
Connection: Keep-Alive
Content-Length: 77

appid=001&set_data=fixit;chmod 000 /var/caserver/caserver;fixit
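The Authorization value in the request above is ordinary HTTP Basic authentication, that is, base64 of `username:password`. As a small sketch (the helper name `decode_basic_auth` is made up for illustration), decoding it shows the username tech with an empty password, matching the default credentials described earlier.

```python
import base64


def decode_basic_auth(header_value):
    """Decode 'Basic <base64>' into a (username, password) tuple."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization value")
    decoded = base64.b64decode(token).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


print(decode_basic_auth("Basic dGVjaDo="))  # ('tech', '')
```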

Vulnerability 4: Information disclosure/hardcoded credentials

At the present time we only have a fix for vulnerability 4 for those who have root access on their gateway. Root access may be obtained via vulnerabilities 1, 2 or 3, via a serial TTY line, or by some other method unknown to us. We will, however, continue searching for a workaround to help those without root access.

For those suffering from the CASERVER vulnerability (port 49955) but not the SSH backdoor, submit the following command before disabling caserver.

POST /caserver HTTP/1.1
Host: FIXMYMODEM
Authorization: Basic dGVjaDo=
User-Agent: Fixmymodem
Connection: Keep-Alive
Content-Length: 77

appid=001&set_data=fixit;chmod 000 /www/sbdc/cgi-bin/sbdc.ha;fixit

Those with access to the SSH backdoor may submit the following command from cshell.

NOS/123456789>> ping -c 1 192.168.1.254;chmod 000 /www/sbdc/cgi-bin/sbdc.ha

Vulnerability 5: Firewall bypass no authentication

The most widespread vulnerability found is luckily the easiest to fix. This mitigation technique only requires access to the modem’s configuration portal and admin password (printed on label). While connected to the LAN, go to 192.168.1.254 in a web browser. Click on Firewall->NAT/Gaming.

Click on Custom Services. Fill in the fields as shown below. In the “Base Host Port” field, type a port number that is not in use by an internal host (this traffic will be directed to an actual internal host). Port 1 is usually a good choice.

Click Add.

Select a device in “Needed by Device” to redirect traffic to. Make sure the Service that was created in the previous step is selected. Click Add.

Port 49152 should now either not respond or send an RST. Otherwise, check and make sure a service is not running on the chosen internal port (port 1).

Disclaimer: No guarantee is expressed or implied that performing the actions described above will not cause damage to and/or render inoperable any or all electronic devices on and orbiting Earth, including your modem!  If you choose to proceed, you are doing so at your own risk and liability. 

Criticizing Google got me fired


Barry Lynn formerly directed New America's Open Markets program and is the author of "Cornered: The New Monopoly Capitalism and the Economics of Destruction."

I’ve studied monopolies for about 20 years. I got into this line of work back in 1999, when an earthquake in Taiwan resulted in the shutdown of computer factories all over the United States.

What happened was that an earthquake disrupted the flow of electricity to foundries in Taipei, where most of the world’s capacity for a key type of semiconductor was located. The loss of this capacity led to a cascading crash of industrial activity, similar to a financial crash.

For me, this realization opened a window into a world that our economic textbooks tell us shouldn’t exist: A world in which a few giant corporations control all of various types of production and supply. Worse, a world in which those corporations sometimes put almost all the capacity to build some vital industrial input in a single physical place in the world.

Since then, I’ve written two books on monopolies and written many articles and op-eds. Much of my work has continued to focus on the ways that concentration of capacity can make complex systems like banking and communications — in addition to industrial production — subject to potentially catastrophic disruption.

What I came to understand is that the changes in the enforcement of antitrust laws that had allowed a few corporations to use their power in ways that put all society at risk, had also resulted in huge threats to our economic and political well-being.

Antimonopoly law, I learned, dates to the founding of our nation. It is, in essence, an extension of the concept of checks and balances into the political economy. One goal of antimonopoly law is to ensure that every American has liberty, to change jobs when they want, to create a small business or small farm if they want, to get access to the information they want. Another goal of antimonopoly is to ensure that our democratic institutions are not overwhelmed by wealth and power concentrated in the hands of the few.

What I also learned is that since the early days of the Reagan Administration, power over almost all forms of economic activity in America has been steadily concentrated in fewer and fewer hands. This includes retail and transportation. It includes pharmaceuticals and farming. It includes almost every corner of the Internet.

This concentration affects our economic well-being. It’s what explains why, for example, the percentage of Americans who own their own businesses has been falling for the last generation. As more and more of the economy becomes sewn up by monopolistic corporations, there are fewer and fewer opportunities for entrepreneurship.

It also explains why we must pay more for many services. As hospitals continue to merge into giant chains, for example, they are able to pass along ever higher prices without having to worry about losing business to competitors. And anyone who flies these days can attest to what happens when just four airlines control 80 percent of the market.

But even more important is the way increasing monopoly affects us politically.

It means that we all enjoy less freedom to do what we want in our jobs and our lives. It means that fewer and fewer companies are competing for our labor, allowing employers to gain more and more power not only over how we do business, but also over how we speak, think and act.

If you want a good example of how giant corporations sometimes misuse the power that concentration gives them, just look at what happened to me.

For the last fifteen years, I’ve done my antimonopoly writing and research at a think tank in Washington named New America. This last June 27, my group published a statement praising the European Union for fining Google for violating antitrust law. Later that day I was told that Google — which provides substantial support to other programs at New America — said they wanted to sever all ties with the organization. Two days later I was told that the entire team of my Open Markets Program had to leave New America by September 1.

No think tank wants to appear beholden to the demands of its corporate donors. But in this instance, that’s exactly the case. I — and my entire team of journalists and researchers  at Open Markets — were let go because the leaders of my think tank chose not to stand up to Google’s threats. (In a statement, New America has denied that this was the case.)

We should all be worried about big business interfering with our speech, our thinking and our expression. By design, the private  business corporation is geared to pursue its own interests. It’s our job as citizens to structure a political economy that keeps corporations small enough to ensure that their actions never threaten the people’s sovereignty over our nation. The first and most vital step to this end is to protect the media we use to communicate with one another from being captured by a few giants.

But today we are failing. Not only are we not preventing concentration of power over our economy and our media. We are not protecting the groups that are working to prevent and reverse that concentration of power.

Wherever you work, whatever you do, your livelihood and your liberties are every day more at risk as long as we allow a few giant corporations — especially in online commerce — to continue to extend their reach into and over the world of ideas.

India's workhorse rocket fails for the first time in decades


India’s premier rocket failed to put a navigation satellite into orbit during a launch this morning, after some unknown malfunction prevented the satellite from leaving the vehicle.

The rocket, known as the Polar Satellite Launch Vehicle, or PSLV, successfully took off from the Satish Dhawan Space Centre in southeastern India at 9:30AM ET. A little over 10 minutes into the flight, however, the rocket seemed to be at a lower altitude than it needed to be. A host during the live broadcast of the launch noted that there was a “variation” in the rocket’s performance. Later, an official with the Indian Space Research Organization (ISRO) confirmed that the payload fairing (the cone-like structure that surrounds the satellite on the top of the rocket) failed to separate and expose the satellite to space. So the satellite was effectively trapped inside the fairing and could not be deployed into orbit.

It seems possible that the rocket’s low trajectory had to do with the fact that the fairing didn’t separate, making the vehicle heavier than it was supposed to be. “If the fairing doesn’t separate you’re lugging along all this extra weight, so you lose velocity and height,” Jonathan McDowell, an astrophysicist at Harvard and spaceflight expert, tells The Verge. For now, it looks as if the top of the rocket with the trapped satellite will remain in orbit around Earth, says McDowell. Eventually, though, drag from Earth’s atmosphere will pull it down, and the vehicle will burn up during the descent.

It’s an unexpected failure for a fairly reliable rocket. Over the last 24 years, the PSLV has flown 41 times and has only suffered two failures in its launch history — the most recent mishap occurring during a mission in 1997. However, that mission was not a total loss as the satellite it carried was still able to make it to orbit. This was the first total failure of the rocket to happen since the PSLV’s very first failure in 1993.

The PSLV has become the backbone of India’s space program, used to launch probes to both Mars and the Moon. It’s also recently become a great ride-share option for satellite operators, allowing multiple probes to be sent into space during one launch. In February, the PSLV set a record by launching 104 satellites at once, which is the most that has ever gone up on a single rocket. Today’s launch was only supposed to send up one satellite, though — the eighth satellite of the Indian Regional Navigation Satellite System.

The PSLV also has some important launches coming. The vehicle is supposed to carry a private lander to the Moon for TeamIndus, a competitor in the Google Lunar X Prize to send a privately funded vehicle to the lunar surface. However, with today’s failure, it’s unclear how the rocket’s schedule will be affected in the months ahead.

Choosing the right IoT button for your use


Selecting the right IoT button to match your needs may be a trickier task than you might think at first. And why wouldn’t it be? There are so many alternatives to choose from. What’s more, all of them come with different feature sets.

So where do I start the selection?, you might ask. That’s a genuine concern. And a reason this guide was created.

The first thing to ask is:

How will you be using your IoT button(s)?

At home or in business?

How many people use it/them?

What type of actions do they need to support?

Do they have to be integrated into existing platforms or my company’s systems?

What if I give buttons to my customers to place re-orders from my store? Are the payments secure?

Is it easy to deploy thousands of buttons to my customers?

Can I support my buttons remotely? Or, what’s more, can I manage the buttons remotely?

Solid questions and issues you need to think through.

Following this guide will pretty much steer you in the right direction.

(P.S. A few additional tips for those of you who are considering using IoT buttons in business)

In business matters, business tools matter

To effectively operate your IoT button fleet you need to be able to manage them, collect statistics and analytics – and naturally provide support if such an occasion arises.

bttn was initially created to serve as a business tool. Thus it comes with all necessary management tools businesses need.

We’ve prepared an introduction to bttn management tools for businesses. You can download it here.

Consider IoT button deployment already when selecting your IoT button

Usually one of the most overlooked aspects in development projects is deployment. Depending on the vendor, IoT button deployment may turn out to be trickier than expected.

With bttn we have put a lot of effort into making everything run as smoothly as possible on the cloud and hardware side so deployment is a breeze for you.

How do you distribute the bttn devices to your customers? Do they know how to set up their bttns? Worry no more. Guess why we’re such big believers in stand-alone connectivity (and at the same time not-so-big fans of local infrastructure like Wi-Fi or Bluetooth)? Exactly. To be easy to use. Just power on and everything works. Like magic.

We’ve prepared a bttn deployment guide that covers everything from planning to actually executing the deployment. You can download it here.

Guide to choosing the right IoT button. For PDF download, scroll down to get link.

To test how bttn cloud service and management tools for business work, get a free trial to a virtual bttn. 

Guides

Management tools for businesses

Deployment guide

Integrations

Bttn for commerce API

Mastering Bayes with R


The mind of Homo sapiens is an inference machine. The visual input of a darkening sky and the distant sound of thunder, combined with the internal state of your brain (memory), lead to the inevitable output of you taking your umbrella with you.

However, sometimes our minds let us down.

Objective

This blog post considers some common malfunctions in our reasoning by using some toy examples where statistics can help and then moves on to the modern problem of deciding which web page is better at increasing the outcome of interest (e.g. advert clicks); page A or page B?

Consider the following:

  • I plan to toss a fair coin a total of 100 times.
  • After 10 tosses we have 8 Heads (H) and 2 Tails (T).

What is the total number of Tails we expect to get by the end of the experiment? (For N = 100, where N is the number of trials in the experiment).

Take a moment and think.

Gambler’s fallacy

For 100 coin tosses you might expect 50 Tails. This expectation that things will even themselves out is such a common misconception in humans it has a name; the Gambler’s fallacy. Because we got more Heads in the first 10 tosses we will get fewer Heads in the next 90 tosses… In situations where what is being observed is truly random (i.e., independent trials of a random process), this belief, though appealing to the human mind, is false.

Additional information

For you to be reading this esoteric blog, you are probably aware of the Gambler’s fallacy and provided a different answer. Knowing that the first 10 tosses gave 8 H and 2 T, combined with the assumption that every toss is independent, you simply add half of the remaining tosses to H and half to T.

  • 8 + 45 = 53 H
  • 2 + 45 = 47 T

This isn’t that different from 50:50. How can we test whether we need to reject the null hypothesis of 50 H and 50 T for the above experiment? Do we have sufficient statistical power to answer this question? What happens if we scale the problem: are we better or worse at using our gut to solve these problems? This is where inferential statistics comes in handy, letting us know when a difference between data is larger than might be expected by chance alone.
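As a rough illustration of what such a test looks like, here is a sketch (in Python rather than R, and not taken from the original post) of an exact two-sided binomial test applied to the 10 tosses observed so far. The function name `binom_two_sided_p` is made up for this example, and the "sum every outcome no more likely than the observed one" rule is one common convention for two-sided exact tests.

```python
from math import comb


def binom_two_sided_p(k, n):
    """Exact two-sided p-value for k heads in n tosses of a fair coin.

    Sums the probability of every outcome that is no more likely
    than the observed one (one common two-sided convention).
    """
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    observed = probs[k]
    return sum(p for p in probs if p <= observed * (1 + 1e-7))


print(binom_two_sided_p(8, 10))  # 0.109375
```

With a p-value of about 0.11, 8 Heads in 10 tosses is not enough, at the conventional 0.05 level, to reject the hypothesis that the coin is fair.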

A Victorian version of the Pepsi Challenge

An interesting tangent that uses a Lady’s gut to solve a problem: consider Fisher’s Tea Drinker. Versions of the story vary; see Wikipedia or the R help, which references Agresti (1990).

?fisher.test()

A British woman claimed to be able to distinguish whether milk or tea was added to the cup first. To test, she was given eight randomly ordered cups of tea, in four of which milk was added first. She was to select the 4 cups prepared by one method, giving her an advantage of comparison.

The null hypothesis is that there is no association between the true order of pouring and the woman’s guess, the alternative that there is a positive association (that the odds ratio is greater than 1). Whatever the outcome (I prefer the story where she gets them all correct), it’s a memorable story that frames the statistical process of hypothesis testing as a permutation test; it’s readily interpretable.

Note that, as this predates the Neyman-Pearson method, Fisher himself gave no alternative hypothesis.

Agresti gives

TeaTasting <- matrix(c(3, 1, 1, 3), nrow = 2, dimnames = list(Guess = c("Milk", "Tea"), Truth = c("Milk", "Tea")))
TeaTasting
##       Truth
## Guess  Milk Tea
##   Milk    3   1
##   Tea     1   3
fisher.test(TeaTasting,alternative="greater")
## 
## 	Fisher's Exact Test for Count Data
## 
## data:  TeaTasting
## p-value = 0.2429
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  0.3135693       Inf
## sample estimates:
## odds ratio 
##   6.408309

In this case as p > 0.05, an association could not be established. We fail to reject the null hypothesis given the data.

For an accessible discussion of the p-value and why “The Earth is Round (p < .05)” see Cohen (1994).

Wikipedia reference says all correct!

However, if we repeated the experiment in a parallel universe and the Lady got them all correct, as cited from the Wikipedia reference (Salsburg, 2002)…

Again, the test statistic was a simple count of the number of successes in selecting the 4 cups. The null hypothesis distribution was computed by the number of permutations. The number of selected permutations equaled the number of unselected permutations. Using a combination formula, with n=8 total cups and k=4 cups chosen, we show there are 70 combinations.

\[ \frac{8!}{4!\,(8 - 4)!} = 70 \]

Thus, if the woman guesses all correct, then that’s a 1 in 70 chance. Or a p-value of…

(1/70)
## [1] 0.01428571
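The counting argument can be sketched in base R. Under the null hypothesis, the number of correctly chosen milk-first cups follows a hypergeometric distribution, so we can read both the all-correct case and the three-correct case off the same distribution. The variable names here are ours, for illustration only.

```r
# Number of ways to choose 4 cups out of 8
n_ways <- choose(8, 4)

# Null distribution of the number of correct "milk first" picks:
# 4 milk-first cups, 4 tea-first cups, 4 cups selected
correct <- 0:4
null_probs <- dhyper(correct, m = 4, n = 4, k = 4)

# One-sided p-values: probability of doing at least this well by guessing
p_all_correct <- sum(null_probs[correct >= 4])
p_three_or_more <- sum(null_probs[correct >= 3])

c(all_correct = p_all_correct, three_or_more = p_three_or_more)
```

The three-or-more case is 17/70, which is where the p-value of 0.2429 for the Agresti table comes from.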

Or using Fisher’s test in R:

TeaTasting2 <- matrix(c(4, 0, 0, 4), nrow = 2, dimnames = list(Guess = c("Milk", "Tea"), Truth = c("Milk", "Tea")))
TeaTasting2
##       Truth
## Guess  Milk Tea
##   Milk    4   0
##   Tea     0   4
suppressWarnings(fisher.test(TeaTasting2,alternative="greater"))
## 
## 	Fisher's Exact Test for Count Data
## 
## data:  TeaTasting2
## p-value = 0.01429
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  2.003768      Inf
## sample estimates:
## odds ratio 
##        Inf

We can also use a chi-squared test if we feel like it (different test, same data). You may remember manually calculating the chi-squared test statistic in your A-level Biology classes. Essentially you are comparing the difference between the observed and expected counts while accounting for the degrees of freedom and the number of trials.

suppressWarnings(chisq.test(TeaTasting2))
## 
## 	Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  TeaTasting2
## X-squared = 4.5, df = 1, p-value = 0.03389

However, Pearson’s Chi-squared test provides a different p-value. That’s because it’s a different test and method: it is two-sided here, and its approximation performs poorly with small samples where the expected cell counts are low, as we have here. Performing the appropriate statistical test can be Tea-testing! It’s better to plan your experiment and analysis beforehand.

Traditional statistics

These examples above are probably familiar to most readers and represent the frequentist school of thinking which contributed to massive progress for statistics and science in the twentieth century.

For Frequentist statistics, the parameters of a data generating distribution are set and we attempt to infer the value of these parameters by inspecting the data which are randomly generated by those distributions / parameters.

Let’s look at how we can use Frequentist statistics to solve the modern problem of deciding which web page is better, A or B?

Frequentist A / B Testing

Suppose our lead content designer has identified problems with our landing page (page A). They create a new, possibly better page based on this hunch (page B). We, as data scientists, want to measure which page is better, using data and statistics.

Our users are randomly assigned the page they land on, either A or B. They then either click through to where we want them to go, or they do not. The average click-through-rate (CTR) is calculated for each page by adding up the number of successes and dividing by the number of unique users visiting the page (number of trials). This is usually converted to a percentage by multiplying by 100.

Just pick the higher rate?

CAVEAT: this is for illustration purposes, the analysis you intend to use should be declared before the experiment to protect yourself from fudging and p-value hacking shenanigans. Furthermore, your effect size is likely to be much smaller (the relative difference in CTR between pages if real).

Imagine we have 10 trials per page, where the CTR:

Surely B is better than A? Hmm, we recall the concept of variation and confidence intervals, which makes us concerned about accepting that page B is better. What if this were a fluke due to the vagaries of chance?

We would feel more confident if we had more trials, right?

Or

pageTest <- matrix(c(10, 20, 90, 80), nrow = 2, dimnames = list(Page = c("A", "B"), Clicked = c("Click", "No click")))
pageTest
##     Clicked
## Page Click No click
##    A    10       90
##    B    20       80
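To make the worry about variation concrete, here is a rough sketch that turns the table into a CTR per page with an exact 95% confidence interval for each. The use of binom.test for the intervals is our choice for illustration; other interval methods exist.

```r
# Same counts as the table above, rebuilt so this block runs on its own
pageTest <- matrix(c(10, 20, 90, 80), nrow = 2,
                   dimnames = list(Page = c("A", "B"),
                                   Clicked = c("Click", "No click")))

# CTR = clicks / unique visitors, also expressed as a percentage
visitors <- rowSums(pageTest)
ctr <- pageTest[, "Click"] / visitors
ctr_percent <- 100 * ctr

# Exact (Clopper-Pearson) 95% confidence interval for each page's CTR
ci <- t(sapply(rownames(pageTest), function(page) {
  binom.test(pageTest[page, "Click"], visitors[page])$conf.int
}))
colnames(ci) <- c("lower", "upper")

cbind(ctr_percent, 100 * ci)
```

The two intervals overlap, which is a hint (not a formal test) that 100 visitors per page may not be enough to separate the pages.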

This data is interesting as for each trial there is either a success or a failure (click or no click). This suggests a Bernoulli distribution may be better than a Gaussian as our generative distribution for the data (it’s zeroes and ones).

NOTE: see Wikipedia for help in picking an assumed generative distribution for your experimental data and the associated test.

How do we quantify this?

By using statistical significance testing; is the difference in CTR between pages statistically significant?

At this point we should clarify our question by deciding upon our significance level, $\alpha = 0.05$. Interestingly, $\alpha$ is also the probability of our rejecting the null hypothesis when it is true (we falsely believe that there is a difference in CTR for the pages).

Thus, we ask, is the difference in CTR between pages statistically significant at significance level $\alpha$?

Our alternative hypothesis is two-sided as we acknowledge that our page may have been made worse. It is ethical to consider this and be able to detect inferior performance (like comparing a new medicine to the current gold standard).

chisq.test(pageTest)
## 
## 	Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  pageTest
## X-squared = 3.1765, df = 1, p-value = 0.07471
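For those who remember doing this by hand, the statistic can be reproduced step by step. This is a sketch of the textbook calculation with Yates' continuity correction for a 2 by 2 table, not a copy of what chisq.test does internally.

```r
# Observed counts, as in pageTest above
observed <- matrix(c(10, 20, 90, 80), nrow = 2,
                   dimnames = list(Page = c("A", "B"),
                                   Clicked = c("Click", "No click")))

# Expected counts if page and clicking were independent:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Yates' correction subtracts 0.5 from each absolute difference
x_squared <- sum((abs(observed - expected) - 0.5)^2 / expected)

# One degree of freedom for a 2 by 2 table
p_manual <- pchisq(x_squared, df = 1, lower.tail = FALSE)

c(X_squared = x_squared, p_value = p_manual)
```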

Or in Python?

Is p < $\alpha$?

We fail to reject the null hypothesis given the data. Damn, no significance! Can’t we just keep running the experiment a little bit longer to see if we can reach significance?

This is dodgy ground and known as p-value hacking (Head et al., 2015). That’s why it’s important to write your experimental and analytical design down and declare it to your colleagues beforehand, thus mitigating this kind of temptation.

p value hacking

Why is it inappropriate to peek at our p-value during the experiment? Surely page B will either be better or it won’t, aren’t we being a bit over the top?

Let’s repeat a similar experiment to above but peek at the p-value frequently so we can ensure to stop when we get the answer we want; the one that matches our beliefs and assumptions (i.e. that my new page design is awesome!) (this is p-value hacking and is not cool).

We write a function that demonstrates p-hacking for us (see the code comments for help). This is why R is awesome and why simulation is very useful for helping you to understand statistics. You could use this code for rough-and-ready power analysis if you feel confident.

The logic of the function is sort-of like this:

  • Simulate CTR for pages A and B as series of n Bernoulli trials by random generation for the binomial distribution with the parameter probability of success (CTR when presented with the pages A or B).
  • For each page we expose it to n users (A gets n visitors so does B).
  • Do a chi-squared test and plot the p-value as we accumulate data, one by one (thus we are peeking rather than using an agreed stopping rule).
  • We use set.seed in the for loop to make it reproducible.
  • Plot the p-values against number of users (we peek at the p-value after every unique user is tested).

NOTE: this is not the optimal way to set up the code but I wanted it to be clear and thinking hurts. We also suppress warnings for chisq.test as it complains when n is small.

# a user clicking is called a click, a user not clicking a nick
# (based on spam and ham for email classification problems)
p_h4ckz0r <- function(ap, bp, n, alpha = 0.05, seed = 1337) {
  # ap, CTR for page A
  # bp, CTR for page B
  # n, number of trials
  # alpha is our significance level, probability of rejecting H0 when it's true
  # seed, run in a different universe or the same one

  # run individual experiment given
  # the mean CTR of page A (ap) and B (bp)
  # p is the probability of success
  # q is the probability of failure

  # create empty data frame to hold data
  p_values <- data.frame("p" = rep(0, n),
                         "number_of_users_per_page" = rep(0, n),
                         "reject_h0" = as.logical(rep(0, n)))

  for (i in 1:n) {
    # make it reproducible, as we loop through
    set.seed(seed = seed)
    a_click <- sum(rbinom(i, 1, ap))
    b_click <- sum(rbinom(i, 1, bp))
    a_nick <- i - a_click
    b_nick <- i - b_click

    # create a 2 by 2 contingency table to pass
    # to chisq.test
    pageTest <- matrix(c(a_click, b_click, a_nick, b_nick), nrow = 2,
                       dimnames = list(Page = c("A", "B"),
                                       Clicked = c("Click", "No click")))
    # print(pageTest)

    # store p-values
    p_values[i, "p"] <- suppressWarnings(chisq.test(pageTest)$p.value)
    # store n
    p_values[i, "number_of_users_per_page"] <- i
    # accept or reject
    p_values[i, "reject_h0"] <- as.integer(p_values[i, "p"] < alpha)
  }

  # print(tail(p_values, 100))
  # use the full column name rather than relying on partial matching of $n
  plot(y = p_values$p, x = p_values$number_of_users_per_page, type = "l",
       xlab = "Number of users visiting each page (n)",
       ylab = "p-value", col = "blue")
  abline(h = alpha, col = "red") # significance level
}

We then run our function using the same inputs as from the experiment above where unbeknownst to us the CTR for page A is 0.1 and for page B is 0.2. Let’s imagine we ran it over the course of two weeks and kept peeking at the p-values as we go (we magically end up with the exact number of visitors randomly assigned to page A or B). We specified beforehand that we are working with $\alpha$ set to 0.05.

As we collect more data we see that the p-value begins to creep towards $\alpha$. However, this is a somewhat wiggly line. This scenario is quite straightforward as peeking early would not be too detrimental, except around 450 where p creeps above alpha briefly.

p_h4ckz0r(ap=0.1,bp=0.2,n=500,alpha=0.05,seed=1337)

plot of chunk 2017-06-18_hack1

If we ran this experiment again, or in a different universe, a different scenario would likely arise. We do that here by specifying a different random seed, which affects the Bernoulli trials we generate using rbinom.

p_h4ckz0r(ap=0.1,bp=0.2,n=500,alpha=0.05,seed=255)

plot of chunk 2017-06-18_hack2

Here we see a situation where the p-value weaves in and out of significance. What would happen if your stopping rule was at n = 100 (that’s why it’s important to do power analysis beforehand as it tells you what n needs to be to limit this risk)?
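A rough-and-ready power analysis for this scenario can be sketched with power.prop.test from the stats package. The design values are assumptions for illustration: a baseline CTR of 0.1, a hoped-for CTR of 0.2, $\alpha$ of 0.05 and a target power of 80% (the power target does not come from the experiment above).

```r
# Assumed design: detect a CTR change from 0.1 to 0.2,
# two-sided test, alpha = 0.05, 80% power
design <- power.prop.test(p1 = 0.1, p2 = 0.2, sig.level = 0.05, power = 0.8)

# Users needed per page, rounded up to a whole person
n_per_page <- ceiling(design$n)
n_per_page
```

If the answer is well above 100 users per page, then stopping at n = 100 leaves the test underpowered for an effect of this size.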

What about if our CTR for each page are identical?

p_h4ckz0r(ap=0.1,bp=0.1,n=200,alpha=0.05,seed=37)

plot of chunk 2017-06-18_hack3

Here we see the importance of not peeking. Prior to the experiment you should have done a power analysis to determine a suitable sample size to detect an effect size of your specification given your $\alpha$. Otherwise, by peeking around 80 trials for each you might get the false impression that page B is better than A, thus falsely rejecting the null hypothesis. You’ll then have this page artifact that doesn’t add any value, yet you have a p-value to support the inclusion of it - like the vestigial hip bone of a whale (although at least that was useful once). Thus, you begin to accrue a bunch of junk.
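To see how much peeking can cost, here is a small sketch that repeats an A/A experiment (both pages share the same true CTR) many times. It compares how often we would wrongly declare significance if we stop at the first significant peek, versus only looking once at the end. The number of simulations, the peeking schedule and the seed are all arbitrary choices for illustration.

```r
# A/A simulation: identical true CTR, so any "significant" result is a false positive
set.seed(2017)                      # arbitrary seed
n_sims <- 200                       # number of repeated experiments (assumed)
n_users <- 200                      # users per page in each experiment
true_ctr <- 0.1
looks <- seq(20, n_users, by = 10)  # we peek after every 10 users, from 20 onwards

# TRUE if a chi-squared test on the clicks so far gives p < alpha
is_significant <- function(a, b, alpha = 0.05) {
  tab <- rbind(A = c(sum(a), length(a) - sum(a)),
               B = c(sum(b), length(b) - sum(b)))
  p <- suppressWarnings(chisq.test(tab)$p.value)
  !is.na(p) && p < alpha            # p is NaN when nobody has clicked yet
}

peeked <- logical(n_sims)
final_only <- logical(n_sims)
for (s in seq_len(n_sims)) {
  a <- rbinom(n_users, 1, true_ctr)
  b <- rbinom(n_users, 1, true_ctr)
  sig_at_look <- vapply(looks, function(k) is_significant(a[1:k], b[1:k]),
                        logical(1))
  peeked[s] <- any(sig_at_look)                  # stop at first significant peek
  final_only[s] <- sig_at_look[length(looks)]    # single look at n_users
}

peek_rate <- mean(peeked)
final_rate <- mean(final_only)
c(peeking = peek_rate, single_look = final_rate)
```

The peeking rate can never be lower than the single-look rate, because the final look is one of the peeks; how much higher it gets depends on how often you look.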

The same principle holds for clinical trials.

The p-value interpretation

Wikipedia defines a p-value as follows:

The p-value or probability value is the probability for a given statistical model that, when the null hypothesis is true, the statistical summary (such as the sample mean difference between two compared groups) would be the same as or more extreme than the actual observed results.

If we rephrase that in the context of our example, if the difference for CTR for page A and B is very large, then the Chi-square statistic will be large (controlling for n), thus p will be small. If p is below $\alpha$, we can say that the difference is statistically significant and we can reject the null hypothesis of no difference. If p is greater than $\alpha$, it means we can’t reject the null hypothesis with the data we collected (it does NOT mean the null hypothesis is true).

Statistical significance is not the same as real world significance

Typical numbers you see for CTR might be single digits or tens of percent (e.g. 1% CTR or 10% CTR, depending on the context). Differences between groups or the effect size between page A and B can be small (1%, 0.1% or 0.01%), so why bother?

The analysis doesn’t end with the p-value. Why not do a break-even analysis to work out whether it’s worth the effort by adding a currency value estimate, as below.

# lots of visitors means small improvements can mean many more happy customers, or higher CTR!
number_visitors_per_day <- 1e6
# quantify the value of a successful click-through
pounds_per_CTR <- 15.45 # pounds
# 1% CTR
pageA <- 0.01
daily_earnings_pageA <- number_visitors_per_day * pageA * pounds_per_CTR
# effect size improvement, for page B over page A
effect_size <- 0.001
daily_earnings_pageB <- number_visitors_per_day * (pageA + effect_size) * pounds_per_CTR
cat("Using page B will earn you an extra £", daily_earnings_pageB - daily_earnings_pageA, "per day.")
## Using page B will earn you an extra £ 15450 per day.

Is it worth the effort? Asking questions like these quantifies real world value of any page changes you may want to make given the data.

The numbers are where the scientific discussion should start, not end. (Nuzzo, 2014)

Traditional A / B Testing Summary

  • It’s confusing, even for professional data scientists, making it hard to communicate with non-data scientists.
  • First: define null hypothesis and alternative hypothesis.
  • Define experiment and analysis beforehand, using power analysis to help decide on stopping conditions.
  • Result of test: reject null hypothesis or do not reject it.
  • Failing to reject the null hypothesis is not the same as accepting it (perhaps you need more data).
  • If the variance is large, or the effect size you want to detect (i.e. the smallest difference in CTR: 10%, 5% or 1%?) is small, you’ll need more data.
  • More data gives greater statistical power.

Here’s another problem, again from the twentieth century; a problem which our stubborn mind can struggle to believe. We use this to introduce an orthogonal approach to Frequentist statistics.

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice? - (Whitaker, 1990, as quoted by vos Savant 1990a)

  1. You pick a door (you don’t get to see what’s behind it) (door #1)
  2. Monty Hall opens a door you didn’t pick, reveals a goat (door #2)
  3. You are given a choice: stay with door #1, or switch to door #3

Which do you choose?

On average, you should switch - really, you should.

According to the comprehensive Wikipedia page on this problem, even pigeons learn to cope with this problem better than humans, with people writing in to complain and rejecting logical and mathematical solutions to the problem. Sometimes our inference machine can let us down.

The solution

We provide a brief solution here given some Bayesian thinking and based on the Lazy programmer course.

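  • Assume we choose door #1 (each probability is conditioned on this) and let C tell us which door the car really is behind.

Initially, the probability of the car being behind each door is the same:

\[ p(C = 1) = p(C = 2) = p(C = 3) = \frac{1}{3} \]

  • Let H be the door that Monty Hall opens.
  • Assume he opens door #2 (it doesn’t really matter as the problem is symmetric).

Then, thinking about the probabilities (Monty Hall can’t open your door, nor can he reveal where the car really is, he has to show a goat):

\[ p(H = 2 \mid C = 1) = \frac{1}{2}, \quad p(H = 2 \mid C = 2) = 0, \quad p(H = 2 \mid C = 3) = 1 \]

What probability do we actually want? Stick or twist… (where H is what door Monty Hall reveals has a goat).

\[ p(C = 3 \mid H = 2) \]

Now for Bayes rule… (read the pipe as “given”)

\[ p(C = 3 \mid H = 2) = \frac{p(H = 2 \mid C = 3)\, p(C = 3)}{\sum_{c=1}^{3} p(H = 2 \mid C = c)\, p(C = c)} = \frac{1 \times \frac{1}{3}}{\frac{1}{2} \times \frac{1}{3} + 0 \times \frac{1}{3} + 1 \times \frac{1}{3}} = \frac{2}{3} \]

This is the probability of the Car being behind door number 3 given Monty Hall shows there’s a goat behind door number 2 (and we’ve already picked door number one).

Therefore we should always switch! (The probability of winning by not switching is 1 - 2/3 = 1/3.)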
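If the algebra doesn’t convince your stubborn mind, the 2/3 answer can be checked by brute force. This is a minimal simulation sketch; the number of games and the seed are arbitrary, and we always start with door #1 as in the text.

```r
set.seed(42)      # arbitrary seed so the sketch is repeatable
n_games <- 10000  # arbitrary number of simulated games
doors <- 1:3

stick_wins <- logical(n_games)
switch_wins <- logical(n_games)

for (g in seq_len(n_games)) {
  car <- sample(doors, 1)
  pick <- 1  # we always start with door #1

  # Monty can open neither our door nor the door hiding the car
  goat_doors <- setdiff(doors, c(pick, car))
  # index into goat_doors: sample(k, 1) on a single number k would draw from 1:k
  opened <- goat_doors[sample.int(length(goat_doors), 1)]

  # switching means taking the one door that is neither ours nor opened
  switched_to <- setdiff(doors, c(pick, opened))

  stick_wins[g] <- (pick == car)
  switch_wins[g] <- (switched_to == car)
}

c(stick = mean(stick_wins), switch = mean(switch_wins))
```

Sticking should win about a third of the time and switching about two thirds.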

Bayes

Reverend Thomas Bayes first provided an equation that allows new evidence to update beliefs in the 1700s. His early work paved the way for the development of a very different and orthogonal approach to frequentist statistics. These methods further contrast to the hypothesis testing of the Frequentist statistical approach developed by Fisher, Pearson and Neyman.

Bayesian methods are better aligned with how we think about the world; being readily interpretable. This contrasts with the language used in frequentist statistics where we are likely to be misunderstood by non-scientists due to our reliance on p-values and their counter-intuitive definition. There are other reasons to prefer Bayes method which we do not explore here.

At its heart, Bayesian inference is effectively an application of Bayes theorem. In many ways it is about what you do with a test’s result, rather than something you use instead of a test. Bayesian inference enables you to choose between a set of mutually-exclusive explanations, or to quantify belief. Assuming you are not interested in an algebraic ‘proof’ of this theorem, we proceed.

For another situation (predicting rates of rare events) where the Bayesian approach was preferred to the Frequentist one, see my open paper (Gregory et al., 2016); Supplementary 5.1.3 of the paper describes Bayesian methods to produce a posterior probability distribution for the transformation efficiency of a species.

Bayes A/B testing in R

library(bayesAB)

We quote the bayesAB package vignette to kick things off.

Bayesian methods provide several benefits over frequentist methods in the context of A/B tests - namely in interpretability. Instead of p-values you get direct probabilities on whether A is better than B (and by how much). Instead of point estimates your posterior distributions are parametrized random variables which can be summarized any number of ways. Bayesian tests are also immune to ‘peeking’ and are thus valid whenever a test is stopped. - Frank Portman

The Bayesian approach provides a context of prior belief, which attempts to capture what we know about the world or the problem we are working on. For example in my publication (Gregory et al., 2016), I used the Bayesian approach to estimate the probability of success of an experiment that had hitherto not been attempted. The Frequentist philosophy struggles with these kinds of problems.

Your prior beliefs are encapsulated by the distribution of a random variable over which we believe our parameter to lie. In the context of A / B testing, as you expose groups to different tests, you collect the data and combine it with the prior to get the posterior distribution over the parameter(s) in question. Your data updates your prior belief.

(This is hard to solve! We can use MCMC sampling methods instead.)

The use of an informative prior overcomes Frequentist method issues of repeated testing, early stopping and the low base-rate problem. The frequentists must specify a significance level and stopping rule, the Bayesians must specify a prior.

This ability to exploit your accrued knowledge might be particularly desirable given an extreme example. Imagine you’re doing a clinical trial and drug B is working well, can you stop the test and improve the well-being of all your patients? Frequentist statistics says we can’t due to the issues of p-hacking which affects our overall statistical error rate. Bayes lets us explore and exploit our data.

bayesAB examples

We design an experiment to assess the click-through-rate (CTR) onto a page. We randomly show users one of two possible pages (page A or page B). We want to determine which page, if any, has a higher CTR.

We use rbinom to randomly generate two sets of Bernoulli trials, providing each page with a different probability of success or CTR. To keep things interesting we hide the generation of these data from you (we generate the data in a similar way to our custom p_h4ckz0r function).

Always look at the data first. We plot the CTR for pages A and B to visually compare.

par(mfrow = c(1, 2))
barplot(table(A_binom), main = "Clicks on A", ylim = c(0, 500))
barplot(table(B_binom), main = "Clicks on B", ylim = c(0, 500))

plot of chunk 2017-06-18_bar

Page A appears to have a higher CTR. Perhaps our new page made things worse!? We calculate the summary statistics for both pages’ data.

summary(A_binom)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   0.236   0.000   1.000
summary(B_binom)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00    0.18    0.00    1.00

The bayesAB workflow

  • Decide on a generative distribution (Bernoulli for CTR).
  • Decide on suitable prior distribution parameters for your data.
  • Fit a bayesTest object.

Decide how you want to parametrize your data

Based on our prior knowledge we know that our page A had a CTR (therefore Bernoulli) of about 25%; something we were trying to improve by using the font Comic Sans. We’ve got a lot of data to support this, so we’re pretty sure that the Bernoulli parameter (the CTR) lies in the 0.2-0.3 range, and we want a prior that covers this range and is centred near 0.25.

Given the binary nature of click or not, we know that the CTR must be between zero and one. We browse for our probability distribution reference list on Wikipedia and discover a suitable candidate; the Beta distribution!

Decide on prior

The vignette tells us:

The conjugate prior for the Bernoulli distribution is the Beta distribution. (?bayesTest for more info).

# beta distribution has two parameters, use trial and error to get the distribution to look like your imagined prior distribution.
# the peak should be centred over your expected mean based on previous experiments
plotBeta(alpha = 100, beta = 200)

plot of chunk 2017-06-18_beta1

This is a bit off, and doesn’t match our desire to have a conjugate prior encapsulating the 0.2-0.3 range (determining the correct prior comes with trial and error and practice). Let’s try again…

plotBeta(65,200)#perfect

plot of chunk 2017-06-18_beta2
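Trial and error with plots can be backed up with a few numbers. As a sketch in base R, the mean and spread of a Beta distribution follow directly from its two parameters; the helper name beta_summary is hypothetical, made up for this illustration.

```r
# Hypothetical helper: mean, standard deviation and central 95% range of Beta(a, b)
beta_summary <- function(a, b) {
  c(mean = a / (a + b),
    sd = sqrt(a * b / ((a + b)^2 * (a + b + 1))),
    lower = qbeta(0.025, a, b),
    upper = qbeta(0.975, a, b))
}

first_try <- beta_summary(100, 200)  # centred too high for a 0.25 CTR
second_try <- beta_summary(65, 200)  # centred close to 0.25

round(rbind(first_try, second_try), 3)
```

The second attempt has a mean of 65 / 265, roughly 0.245, with most of its mass inside the 0.2-0.3 range we were after.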

Now that we’ve settled on a suitable prior that matches our pre-held belief, let’s fit our bayesTest object. We are happy with this as our expectation is that the mean CTR of a typical page is around 0.25 with some variation between experiments captured by the distribution which is narrow and falls off to zero sharply. If we were more uncertain the spread of the distribution would reflect this by having a wider base. We can adjust the Beta distribution variance by increasing the size of our parameters (try it).

plotBeta(650,2000)#narrow

Fit it

ab1<-bayesTest(A_binom,B_binom,priors=c('alpha'=65,'beta'=200),n_samples=1e5,distribution='bernoulli')

This function fits a Bayesian model to our A/B testing sample data. bayesTest also comes with a bunch of generic methods; print, summary and plot.
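To demystify the fit a little, here is a sketch of the standard Beta-Bernoulli conjugate update done by hand. We assume 500 visitors per page and click counts of 118 and 90, which correspond to the mean CTRs of 0.236 and 0.18 in the summaries above. This illustrates the textbook method and is an assumption about the general approach, not a description of the bayesAB internals.

```r
set.seed(1)  # arbitrary seed for the Monte Carlo draws

# Prior: Beta(65, 200), as chosen above
prior_alpha <- 65
prior_beta <- 200

# Assumed data: 500 visitors per page; clicks implied by the mean CTRs above
n_users <- 500
clicks_A <- 118  # 0.236 * 500
clicks_B <- 90   # 0.18 * 500

# Conjugate update: add clicks to alpha and non-clicks to beta,
# then draw samples from each posterior
n_samples <- 1e5
post_A <- rbeta(n_samples, prior_alpha + clicks_A, prior_beta + n_users - clicks_A)
post_B <- rbeta(n_samples, prior_alpha + clicks_B, prior_beta + n_users - clicks_B)

# Probability that A has the higher CTR, and a 90% credible interval on the lift
prob_A_better <- mean(post_A > post_B)
lift_interval <- quantile((post_A - post_B) / post_B, c(0.05, 0.95))

prob_A_better
lift_interval
```

Because the posteriors are just Beta distributions, questions like "how likely is A to be better than B?" reduce to counting samples.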

print(ab1)
## --------------------------------------------
## Distribution used: bernoulli 
## --------------------------------------------
## Using data with the following properties: 
##          [,1] [,2]
## Min.    0.000 0.00
## 1st Qu. 0.000 0.00
## Median  0.000 0.00
## Mean    0.236 0.18
## 3rd Qu. 0.000 0.00
## Max.    1.000 1.00
## --------------------------------------------
## Priors used for the calculation: 
## alpha  beta 
##    65   200 
## --------------------------------------------
## Calculated posteriors for the following parameters: 
## Probability 
## --------------------------------------------
## Monte Carlo samples generated per posterior: 
## [1] 1e+05

print describes the inputs to the test; the summary statistics of the input data for CTR of pages A and B, the prior (our belief) and the number of posterior samples to draw (1e5 is a good rule of thumb and should be large enough for the distribution to converge).

summary(ab1)
## Quantiles of posteriors for A and B:
## 
## $Probability
## $Probability$A_probs
##        0%       25%       50%       75%      100% 
## 0.1746725 0.2286782 0.2388918 0.2494973 0.3095461 
## 
## $Probability$B_probs
##        0%       25%       50%       75%      100% 
## 0.1425819 0.1926600 0.2023210 0.2122469 0.2665380 
## 
## 
## --------------------------------------------
## 
## P(A > B) by (0)%: 
## 
## $Probability
## [1] 0.95843
## 
## --------------------------------------------
## 
## Credible Interval on (A - B) / B for interval length(s) (0.9) : 
## 
## $Probability
##          5%         95% 
## 0.008605099 0.385517471 
## 
## --------------------------------------------
## 
## Posterior Expected Loss for choosing B over A:
## 
## $Probability
## [1] 0.009377985

summary gives us some really interesting values. We’ll pick out the credible interval as our main talking point. Bayesian intervals treat their bounds as fixed and the estimated parameter as a random variable, whereas frequentist confidence intervals treat their bounds as random variables and the parameter as a fixed value.

The credible interval is more intuitive to both the scientist and the non-scientist. For example, in the experiment above we can be fairly certain that the use of the Comic Sans font in Page B has had a negative effect on CTR.

  • We can quantify this and say that we are 95.8% certain that page A is better than page B.
  • We can go further, and say that the 90% Credible Interval on (A - B) / B runs from 0.009 to 0.386, i.e. Page A is between roughly 1% and 39% better relative to Page B.

To elucidate that last bullet point, we show that the observed mean CTR difference between pages divided by the page B mean is 0.31; on this point estimate page A is 31% better. Rather than relying on just a point estimate, we have access to the whole posterior distribution, summarised by default with a credible interval of length 0.9 (0.95 - 0.05).

((sum(A_binom)/length(A_binom))-(sum(B_binom)/length(B_binom)))/(sum(B_binom)/length(B_binom))
## [1] 0.3111111

plot plots the priors, posteriors, and the Monte Carlo ‘integrated’ samples. Have a go at interpreting these plots yourself.

p2 <- plot(ab1)
p2

plot of chunk 2017-06-18_bayesAB
plot of chunk 2017-06-18_bayesAB
plot of chunk 2017-06-18_bayesAB

What did the data generation method look like?

Remember we simulated the data ourselves? This is what it looked like.

5% CTR difference and 500 trials / user visits per Page.

It appears the experiment was a failure and the switch to the Comic Sans font negatively affected the CTR of users.

set.seed(14641)
A_binom <- rbinom(500, 1, .25)
B_binom <- rbinom(500, 1, .2)

Take home message

We quote Frank Portman to finish (other AB testing packages exist):

Most A/B test approaches are centered around frequentist hypothesis tests used to come up with a point estimate (probability of rejecting the null) of a hard-to-interpret value. Oftentimes, the statistician or data scientist laying down the groundwork for the A/B test will have to do a power test to determine sample size and then interface with a Product Manager or Marketing Exec in order to relay the results. This quickly gets messy in terms of interpretability. More importantly it is simply not as robust as A/B testing given informative priors and the ability to inspect an entire distribution over a parameter, not just a point estimate.

Although it is very seductive, using Bayesian inference to combine subjective and objective likelihoods has clear risks, and makes some statisticians understandably nervous. There is no universal best strategy to A/B testing but being aware of both Frequentist and Bayesian inference paradigms is a useful starting point.

References

  • Agresti, A. (1990) Categorical data analysis. New York: Wiley. Pages 59–66.
  • Cohen, J. (1994) The Earth is Round. American Psychologist, Vol 49, no 12, 997-1003.
  • Gregory, M., Alphey, L., Morrison, N. I. and Shimeld, S. M. (2016), Insect transformation with piggyBac: getting the number of injections just right. Insect Mol Biol, 25: 259–271. doi:10.1111/imb.12220
  • Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD (2015) The Extent and Consequences of P-Hacking in Science. PLoS Biol 13(3):
  • Nuzzo, (2014). Nature 506, 150–152, doi:10.1038/506150a
  • Portman, F. (2016). https://cran.r-project.org/
  • Salsburg, D. (2002) The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, W.H. Freeman / Owl Book. ISBN 0-8050-7134-2
  • Whitaker, Craig F. (9 September 1990). “[Formulation by Marilyn vos Savant of question posed in a letter from Craig Whitaker]. Ask Marilyn”. Parade Magazine: 16. (See Wikipedia link for links)
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X El Capitan 10.11.6
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bayesAB_0.7.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.10      knitr_1.16        magrittr_1.5     
##  [4] devtools_1.13.0   munsell_0.4.3     colorspace_1.3-2 
##  [7] R6_2.2.0          highr_0.6         httr_1.2.1       
## [10] stringr_1.2.0     plyr_1.8.4        tools_3.3.2      
## [13] rmd2md_0.1.4      grid_3.3.2        gtable_0.2.0     
## [16] git2r_0.18.0      withr_1.0.2       htmltools_0.3.5  
## [19] yaml_2.1.14       lazyeval_0.2.0    checkpoint_0.3.18
## [22] rprojroot_1.2     digest_0.6.12     tibble_1.3.0     
## [25] reshape2_1.4.2    ggplot2_2.2.1     codetools_0.2-15 
## [28] curl_2.3          memoise_1.1.0     evaluate_0.10    
## [31] rmarkdown_1.6     labeling_0.3      stringi_1.1.5    
## [34] scales_0.4.1      backports_1.0.5

Gnu Privacy Guard relies on one person who is going broke (2015)


Update, Feb. 5, 2015, 8:10 p.m.: After this article appeared, Werner Koch informed us that last week he was awarded a one-time grant of $60,000 from Linux Foundation's Core Infrastructure Initiative. Werner told us he only received permission to disclose it after our article published. Meanwhile, since our story was posted, donations flooded Werner's website donation page and he reached his funding goal of $137,000. In addition, Facebook and the online payment processor Stripe each pledged to donate $50,000 a year to Koch’s project.

The man who built the free email encryption software used by whistleblower Edward Snowden, as well as hundreds of thousands of journalists, dissidents and security-minded people around the world, is running out of money to keep his project alive.

Werner Koch wrote the software, known as Gnu Privacy Guard, in 1997, and since then has been almost single-handedly keeping it alive with patches and updates from his home in Erkrath, Germany. Now 53, he is running out of money and patience with being underfunded.

"I'm too idealistic," he told me in an interview at a hacker convention in Germany in December. "In early 2013 I was really about to give it all up and take a straight job." But then the Snowden news broke, and "I realized this was not the time to cancel."

Like many people who build security software, Koch believes that offering the underlying software code for free is the best way to demonstrate that there are no hidden backdoors in it giving access to spy agencies or others. However, this means that many important computer security tools are built and maintained by volunteers.

Now, more than a year after Snowden's revelations, Koch is still struggling to raise enough money to pay himself and to fulfill his dream of hiring a full-time programmer. He says he's made about $25,000 per year since 2001 — a fraction of what he could earn in private industry. In December, he launched a fundraising campaign that has garnered about $43,000 to date — far short of his goal of $137,000 — which would allow him to pay himself a decent salary and hire a full-time developer.

The fact that so much of the Internet's security software is underfunded is becoming increasingly problematic. Last year, in the wake of the Heartbleed bug, I wrote that while the U.S. spends more than $50 billion per year on spying and intelligence, pennies go to Internet security. The bug revealed that an encryption program used by everybody from Amazon to Twitter was maintained by just four programmers, only one of whom called it his full-time job. A group of tech companies stepped in to fund it.

Koch's code powers most of the popular email encryption programs: GPGTools, Enigmail, and GPG4Win. "If there is one nightmare that we fear, then it's the fact that Werner Koch is no longer available," said Enigmail developer Nicolai Josuttis. "It's a shame that he is alone and that he has such a bad financial situation."

The programs are also underfunded. Enigmail is maintained by two developers in their spare time. Both have other full-time jobs. Enigmail's lead developer, Patrick Brunschwig, told me that Enigmail receives about $1,000 a year in donations — just enough to keep the website online.

GPGTools, which allows users to encrypt email from Apple Mail, announced in October that it would start charging users a small fee. The other popular program, GPG4Win, is run by Koch himself.

Email encryption first became available to the public in 1991, when Phil Zimmermann released a free program called Pretty Good Privacy, or PGP, on the Internet. Prior to that, powerful computer-enabled encryption was only available to the government and large companies that could pay licensing fees. The U.S. government subsequently investigated Zimmermann for violating arms trafficking laws because high-powered encryption was subject to export restrictions.

In 1997, Koch attended a talk by free software evangelist Richard Stallman, who was visiting Germany. Stallman urged the crowd to write their own version of PGP. "We can't export it, but if you write it, we can import it," he said.

Inspired, Koch decided to try. "I figured I can do it," he recalled. He had some time between consulting projects. Within a few months, he released an initial version of the software he called Gnu Privacy Guard, a play on PGP and an homage to Stallman's free Gnu operating system.

Koch's software was a hit even though it only ran on the Unix operating system. It was free, the underlying software code was open for developers to inspect and improve, and it wasn't subject to U.S. export restrictions.

Koch continued to work on GPG in between consulting projects until 1999, when the German government gave him a grant to make GPG compatible with the Microsoft Windows operating system. The money allowed him to hire a programmer to maintain the software while also building the Windows version, which became GPG4Win. This remains the primary free encryption program for Windows machines.

In 2005, Koch won another contract from the German government to support the development of another email encryption method. But in 2010, the funding ran out.

For almost two years, Koch continued to pay his programmer in the hope that he could find more funding. "But nothing came," Koch recalled. So, in August 2012, he had to let the programmer go. By summer 2013, Koch was himself ready to quit.

But after the Snowden news broke, Koch decided to launch a fundraising campaign. He set up an appeal at a crowdsourcing website, made t-shirts and stickers to give to donors, and advertised it on his website. In the end, he earned just $21,000.

The campaign gave Koch, who has an 8-year-old daughter and a wife who isn't working, some breathing room. But when I asked him what he will do when the current batch of money runs out, he shrugged and said he prefers not to think about it. "I'm very glad that there is money for the next three months," Koch said. "Really I am better at programming than this business stuff."


Neural networks meet space


Researchers from the Department of Energy’s SLAC National Accelerator Laboratory and Stanford University have for the first time shown that neural networks—a form of artificial intelligence—can accurately analyze the complex distortions in spacetime known as gravitational lenses 10 million times faster than traditional methods.

“Analyses that typically take weeks to months to complete, that require the input of experts and that are computationally demanding, can be done by neural nets within a fraction of a second, in a fully automated way and, in principle, on a cell phone’s computer chip,” says postdoctoral fellow Laurence Perreault Levasseur, a co-author of a study published today in Nature.

Lightning-fast complex analysis

The team at the Kavli Institute for Particle Astrophysics and Cosmology (KIPAC), a joint institute of SLAC and Stanford, used neural networks to analyze images of strong gravitational lensing, where the image of a faraway galaxy is multiplied and distorted into rings and arcs by the gravity of a massive object, such as a galaxy cluster, that’s closer to us. The distortions provide important clues about how mass is distributed in space and how that distribution changes over time – properties linked to invisible dark matter that makes up 85 percent of all matter in the universe and to dark energy that’s accelerating the expansion of the universe.

Until now this type of analysis has been a tedious process that involves comparing actual images of lenses with a large number of computer simulations of mathematical lensing models. This can take weeks to months for a single lens.

But with the neural networks, the researchers were able to do the same analysis in a few seconds, which they demonstrated using real images from NASA’s Hubble Space Telescope and simulated ones.

To train the neural networks in what to look for, the researchers showed them about half a million simulated images of gravitational lenses for about a day. Once trained, the networks were able to analyze new lenses almost instantaneously with a precision that was comparable to traditional analysis methods. In a separate paper, submitted to The Astrophysical Journal Letters, the team reports how these networks can also determine the uncertainties of their analyses.

Grid of nine boxes showing various gravitational lenses

KIPAC researchers used images of strongly lensed galaxies taken with the Hubble Space Telescope to test the performance of neural networks, which promise to speed up complex astrophysical analyses tremendously.

Yashar Hezaveh/Laurence Perreault Levasseur/Phil Marshall/Stanford/SLAC National Accelerator Laboratory; NASA/ESA

Prepared for the data floods of the future

“The neural networks we tested—three publicly available neural nets and one that we developed ourselves—were able to determine the properties of each lens, including how its mass was distributed and how much it magnified the image of the background galaxy,” says the study’s lead author Yashar Hezaveh, a NASA Hubble postdoctoral fellow at KIPAC.

This goes far beyond recent applications of neural networks in astrophysics, which were limited to solving classification problems, such as determining whether an image shows a gravitational lens or not.

The ability to sift through large amounts of data and perform complex analyses very quickly and in a fully automated fashion could transform astrophysics in a way that is much needed for future sky surveys that will look deeper into the universe—and produce more data—than ever before.

The Large Synoptic Survey Telescope (LSST), for example, whose 3.2-gigapixel camera is currently under construction at SLAC, will provide unparalleled views of the universe and is expected to increase the number of known strong gravitational lenses from a few hundred today to tens of thousands.

“We won’t have enough people to analyze all these data in a timely manner with the traditional methods,” Perreault Levasseur says. “Neural networks will help us identify interesting objects and analyze them quickly. This will give us more time to ask the right questions about the universe.”

Convolutional neural network example with pictures of dogs and features

Scheme of an artificial neural network, with individual computational units organized into hundreds of layers. Each layer searches for certain features in the input image (at left). The last layer provides the result of the analysis. The researchers used particular kinds of neural networks, called convolutional neural networks, in which individual computational units (neurons, gray spheres) of each layer are also organized into 2-D slabs that bundle information about the original image into larger computational units.

Greg Stewart, SLAC National Accelerator Laboratory

A revolutionary approach

Neural networks are inspired by the architecture of the human brain, in which a dense network of neurons quickly processes and analyzes information.

In the artificial version, the “neurons” are single computational units that are associated with the pixels of the image being analyzed. The neurons are organized into layers, up to hundreds of layers deep. Each layer searches for features in the image. Once the first layer has found a certain feature, it transmits the information to the next layer, which then searches for another feature within that feature, and so on.
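As a rough illustration of what "a layer searching for a feature" means, the sketch below slides one small filter over a tiny image in plain Python. The function names, the 4x4 image, and the hand-written edge filter are assumptions for illustration only; a real convolutional network learns its filter weights from training data and stacks many such layers.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (valid positions only) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


# A 4x4 "image" with a dark left half and a bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written vertical-edge filter (hypothetical); a trained network
# would learn its own weights instead.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(convolve2d(image, edge_kernel))
```

The resulting feature map is non-zero only in the middle column, where the dark-to-bright edge sits. A following layer would take this map as its input and look for patterns among such responses.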

“The amazing thing is that neural networks learn by themselves what features to look for,” says KIPAC staff scientist Phil Marshall, a co-author of the paper. “This is comparable to the way small children learn to recognize objects. You don’t tell them exactly what a dog is; you just show them pictures of dogs.”

But in this case, Hezaveh says, “It’s as if they not only picked photos of dogs from a pile of photos, but also returned information about the dogs’ weight, height and age.”

Although the KIPAC scientists ran their tests on the Sherlock high-performance computing cluster at the Stanford Research Computing Center, they could have done their computations on a laptop or even on a cell phone, they said. In fact, one of the neural networks they tested was designed to work on iPhones.

“Neural nets have been applied to astrophysical problems in the past with mixed outcomes,” says KIPAC faculty member Roger Blandford, who was not a co-author on the paper. “But new algorithms combined with modern graphics processing units, or GPUs, can produce extremely fast and reliable results, as the gravitational lens problem tackled in this paper dramatically demonstrates. There is considerable optimism that this will become the approach of choice for many more data processing and analysis problems in astrophysics and other fields.”    

Editor's note: This article originally appeared as a SLAC press release.


Databases and Distributed Deadlocks: A FAQ


Since Citus is a distributed database, we often hear questions about distributed transactions. Specifically, people ask us about transactions that modify data living on different machines.

So we started to work on distributed transactions. We then identified distributed deadlock detection as the building block to enable distributed transactions in Citus.

First some background: At Citus we focus on scaling out Postgres. We want to make Postgres performance & Postgres scale something you never have to worry about. We even have a cloud offering, a fully-managed database as a service, to make Citus even more worry-free. We carry the pager so you don’t have to and all that. And because we’ve built Citus using the PostgreSQL extension APIs, Citus stays in sync with all the latest Postgres innovations as they are released (aka we are not a fork.) Yes, we’re excited for Postgres 10 like all the rest of you :)

Back to distributed deadlocks: As we began working on distributed deadlock detection, we realized that we needed to clarify certain concepts. So we created a simple FAQ for the Citus development team. And we found ourselves referring back to the FAQ over and over again. So we decided to share it here on our blog, in the hopes you find it useful.

What are locks and deadlocks?

Locks are a way of preventing multiple processes (read: client sessions) from accessing or modifying the same data at the same time. If a process tries to obtain a lock when that lock is already held by another process, it needs to wait until the first process releases the lock.

How locks work

Waiting becomes problematic when processes obtain locks in a different order. Process 1 may be waiting for a lock held by process 2, while process 2 may be waiting for a lock held by process 1, or for a chain of processes that ultimately wait on process 1. This is a deadlock.

When does PostgreSQL acquire locks?

Backing up further, PostgreSQL has various locks. In the context of deadlocks, the ones we’re most concerned with are row-level locks that are acquired by statements prior to modifying a row and held until the end of the transaction. DELETE, UPDATE, INSERT..ON CONFLICT take locks on the rows that they modify and also on rows in other tables referenced by a foreign key. INSERT and COPY also acquire a lock when there is a unique constraint, to prevent concurrent writes with the same value.

So if a session does:

BEGIN;
UPDATE table SET value = 5 WHERE key = 'hello';

This session now holds row-level locks on all rows where key = 'hello'. If another session attempts to update rows where key = 'hello' at the same time, that command will block until the first session commits or aborts.

When do deadlocks occur in PostgreSQL?

In the following scenario, sessions 1 and 2 obtain locks in opposite order after sending BEGIN.

1: UPDATE table SET value = 1 WHERE key = 'hello';  -- 1 takes 'hello' lock

2: UPDATE table SET value = 2 WHERE key = 'world';  -- 2 takes 'world' lock

1: UPDATE table SET value = 1 WHERE key = 'world';  -- waits for 'world' lock held by 2

2: UPDATE table SET value = 2 WHERE key = 'hello';  -- waits for 'hello' lock held by 1

This situation on its own can’t be resolved. However, if sessions are waiting on a lock for a while, Postgres will check whether processes are actually waiting for each other. If that is the case, Postgres will forcibly abort transactions until the deadlock is gone.

Note that if both sessions followed the same order (first hello, then world), the deadlock would not have occurred, since whichever session gets the 'hello' lock goes first. Modifications occurring in a different order are a key characteristic of deadlocks.
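The same-order rule can be tried out with ordinary in-process locks. The sketch below is an analogy only: it uses Python's `threading.Lock` rather than PostgreSQL row locks, and the `update_rows` helper and the `locks`/`values` dictionaries are made up for illustration.

```python
import threading

# One lock per "row", standing in for row-level locks.
locks = {"hello": threading.Lock(), "world": threading.Lock()}
values = {"hello": 0, "world": 0}


def update_rows(keys, new_value):
    """Lock every key in sorted order, write, then release in reverse order."""
    ordered = sorted(keys)
    for key in ordered:
        locks[key].acquire()
    try:
        for key in ordered:
            values[key] = new_value
    finally:
        for key in reversed(ordered):
            locks[key].release()


# Session 1 asks for hello then world; session 2 asks for world then hello.
# Because update_rows sorts the keys, both sessions take 'hello' first.
t1 = threading.Thread(target=update_rows, args=(["hello", "world"], 1))
t2 = threading.Thread(target=update_rows, args=(["world", "hello"], 2))
t1.start()
t2.start()
t1.join(timeout=5)
t2.join(timeout=5)

finished = not t1.is_alive() and not t2.is_alive()
```

Whichever thread wins the 'hello' lock finishes both writes before the other can start, so both threads complete and the two values always agree.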

What is a distributed deadlock (in Citus)?

In Citus, the scenario above becomes a bit more complicated if the rows are in different shards on different workers.

In that case, Citus worker A sees:

1: UPDATE table_123 SET value = 5 WHERE key = 'hello';

2: UPDATE table_123 SET value = 6 WHERE key = 'hello';  -- waits for 'hello' lock held by 1

Citus worker B sees:

2: UPDATE table_234 SET value = 6 WHERE key = 'world';

1: UPDATE table_234 SET value = 5 WHERE key = 'world';  -- waits for 'world' lock held by 2

Distributed deadlock

Neither the PostgreSQL database on worker A nor the one on worker B sees a problem here; each sees just one session waiting for the other to finish. From the outside, we can see that neither session can finish. In fact, this situation will last until the client disconnects or the server restarts. This situation, where two sessions on different worker nodes are both waiting for each other, is called a distributed deadlock.

Why are (distributed) deadlocks really bad?

The rows locked by the two sessions that are in a deadlock can no longer be modified while the sessions last, but that’s far from the worst part. Other sessions may take locks and then get blocked on session 1 or 2; those locks will prevent yet more sessions from completing and might also make them more likely to form other deadlocks. This can escalate to a full system outage.

How can a distributed database detect and stop distributed deadlocks?

To detect a distributed deadlock, Citus needs to continuously monitor all nodes for processes that are waiting for locks for a non-negligible amount of time (e.g. 1 second). When this occurs, we collect the lock tables from all nodes and construct a directed graph of the processes that are waiting for each other across all the nodes. If there is a cycle in this graph, then there is a distributed deadlock. To end the deadlock, we need to proactively kill processes or cancel transactions until the cycle is gone.
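The detection step described above can be sketched in Python. The `(waiter, holder)` edge format, the transaction names, and the `find_deadlock_cycle` function are assumptions made for illustration, not Citus internals; the cycle search itself is a standard depth-first search.

```python
def find_deadlock_cycle(wait_edges):
    """Return one cycle in the wait-for graph as a list of transactions, or None.

    wait_edges: iterable of (waiter, holder) pairs merged from all nodes.
    """
    graph = {}
    for waiter, holder in wait_edges:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    colour = {node: WHITE for node in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in sorted(graph[node]):
            if colour[nxt] == GREY:  # back edge: we walked into our own path
                return path[path.index(nxt):]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(graph):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Edges as each worker would report them in the example above.
worker_a = [("txn2", "txn1")]  # on worker A, 2 waits for 'hello' held by 1
worker_b = [("txn1", "txn2")]  # on worker B, 1 waits for 'world' held by 2

cycle = find_deadlock_cycle(worker_a + worker_b)
```

Each worker's edges on their own contain no cycle, which is why neither local PostgreSQL instance reports a deadlock; the cycle only appears once the edges are merged. Choosing which transaction in the cycle to cancel is a separate policy decision not covered by this sketch.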

How can a distributed database prevent distributed deadlocks?

Deadlock detection and prevention are related but different topics. Deadlock prevention is an optimisation problem.

The simplest solution is to only allow one multi-shard transaction at a time. At Citus, we find we can do better by using whatever information we have available, most notably the query and the current time.

Predicate Locks

A common technique for deadlock prevention is predicate locks. When we see two concurrent transactions performing an UPDATE .. WHERE key = 'hello' then we know that they might modify the same rows, while a concurrent UPDATE .. WHERE key = 'world' won’t. We could therefore take a lock based on filter conditions (a predicate lock) on the coordinator. This would allow parallel, multi-shard UPDATEs to run concurrently without risk of deadlock, provided they filter by the same column with a different value.

The predicate locking technique can also detect deadlocks caused by multi-statement transactions across multiple shards if there is one coordinator node. Before a distributed deadlock can form across workers, the predicate locks would have already formed a deadlock on the coordinator, which is detected by PostgreSQL.

Spanner/F1 uses predicate locks to prevent deadlocks within a shard. Spanner could do this because it disallows interactive transaction blocks, meaning it knows all the commands upfront and can take the necessary predicate locks in advance. This is a useful model, but it doesn’t fit well into the PostgreSQL protocol that allows interactive transaction blocks.

Wait-Die or Wound-Wait

Another prevention technique is to assign priorities (transaction IDs) to distributed transactions in the form of timestamps. We then try to ensure that a transaction A with a low transaction ID does not get blocked by a transaction B with a higher transaction ID (priority inversion). Whenever this happens, we should either cancel/restart A (wait-die) and try again later, or cancel/restart B (wound-wait) in order to let A through. The latter is generally more efficient.

In PostgreSQL, savepoints might allow us to restart part of a transaction. What’s nice about the wound-wait technique is that it works even when there are multiple coordinators. As long as clocks are reasonably well-synchronised, priority inversion is not that common. In practice, since a transaction that starts earlier typically acquires locks earlier, most transactions don’t experience any cancellation due to priority inversion. The ones that do are the ones that are likely to form a deadlock. Spanner/F1 also uses wound-wait for preventing multi-shard deadlocks.
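The wound-wait rule reduces to a single comparison of timestamps. Here is a minimal sketch, assuming that a lower timestamp means an earlier start and therefore a higher priority; the function name and the string return values are hypothetical, and a real system would also need a tie-breaker for equal timestamps.

```python
def wound_wait(requester_ts, holder_ts):
    """Decide what happens when the requester wants a lock the holder owns.

    Lower timestamp = started earlier = higher priority.
    Returns "wound" (cancel/restart the holder) or "wait".
    """
    if requester_ts < holder_ts:
        # An older transaction must not be blocked by a younger one.
        return "wound"
    # A younger transaction may safely wait for an older one.
    return "wait"


# The distributed example from earlier, with session 1 started before session 2.
on_worker_a = wound_wait(requester_ts=2, holder_ts=1)  # 2 wants 'hello' held by 1
on_worker_b = wound_wait(requester_ts=1, holder_ts=2)  # 1 wants 'world' held by 2
```

On worker A the younger session simply waits, while on worker B the older session cancels the younger one, so the wait cycle never closes. Because waiting is only ever allowed in one direction (younger waits for older), no cycle can form.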

You can read more on concurrency control in distributed databases here:

https://people.eecs.berkeley.edu/~kubitron/cs262a/handouts/papers/concurrency-distributed-databases.pdf

At Citus we love tackling complex distributed systems problems

Distributed transactions are a complex topic. Most articles on this topic focus on the more visible problem around data consistency. These articles then discuss protocols such as 2PC, Paxos, or Raft.

In practice, data consistency is only one side of the coin. If you’re using a relational database, your application benefits from another key feature: deadlock detection. Hence our work in distributed deadlock detection—and this FAQ.

We love tackling these types of complex distributed systems problems. Not only are the engineering challenges fun, but it is satisfying when our customers tell us that they no longer have to worry about scale and performance because of the work we’ve done scaling out Postgres. And if you’ve been reading our blog for a while, you know we like sharing our learnings, too. Hopefully you’ve found this useful! Let us know your thoughts and feedback on Twitter.

Tesla Model S battery degradation data



Updated: August 23, 2017. Tesla Motors provides an 8-year, unlimited-mile battery failure warranty, but it doesn’t cover degradation. It is therefore highly relevant for every Tesla driver to know how much capacity degradation to expect over time, because capacity determines the range of your car. In the Netherlands, Merijn Coumans regularly updates, via the Dutch-Belgium Tesla Forum, a file of owners’ data created by Matteo. The most recent (Aug 23, 2017) version of the results is displayed below. Detailed explanations are in the Google Doc file as well as at the end of this blog post. The new files also allow data input from drivers in the USA and other countries that use miles rather than km.

In the figure, the percentage of range loss is shown on the vertical axis. The horizontal axis displays the distance (in km) driven with the vehicles.

The red fitted line has a slope above 60.000 km (say 40,000 miles) of 1% per 50.000 km (30,000 miles). On average the batteries have 92% remaining at 240.000 km (150,000 miles). If the linear behavior continues, the ‘lifetime’ (the point at which 80% capacity is left) can be calculated as follows: 92-80 = 12%, times 50.000 km per percent = 600.000 km, plus 240.000 km, gives 840.000 km (500,000 miles)! Note that an ICE car has an average lifetime of 220.000 km (140,000 miles)… And remember: if an ICE fails after, say, 300.000 km, you have a problem. The battery in a Tesla EV after the suggested 840.000 km (OK, let’s take 500.000 km, still great!) still has 80% capacity left!
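The extrapolation can be written out as a few lines of Python. The constant and function names are made up for illustration; the numbers are the ones quoted above, and the linear continuation beyond the measured data is the text's own assumption.

```python
KM_PER_PERCENT = 50_000      # fitted slope: 1% range loss per 50,000 km
REFERENCE_KM = 240_000       # distance at which the fit reads...
REFERENCE_CAPACITY = 92      # ...92% remaining capacity
END_OF_LIFE_CAPACITY = 80    # "lifetime" threshold used in the text
KM_PER_MILE = 1.609344


def km_until_capacity(target_pct):
    """Distance at which the linear fit reaches `target_pct` remaining capacity."""
    remaining_loss = REFERENCE_CAPACITY - target_pct
    return REFERENCE_KM + remaining_loss * KM_PER_PERCENT


lifetime_km = km_until_capacity(END_OF_LIFE_CAPACITY)
lifetime_miles = lifetime_km / KM_PER_MILE
```

This reproduces the 840.000 km figure; converted exactly, that is a little over 520,000 miles, which the text rounds down to 500,000.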

To put it into perspective, on a 0-100% scale, it looks like this:

Here is a nice video introducing an interactive graph from Teslanomics.

The way to measure this is to do a full charge (100%) and then check the EPA rated range (in North America) or Typical range (in Europe and Asia/Pacific). In the plot, these numbers are then compared to the range numbers the car displayed when it was new. For example, for the 85 kWh Model S85 variant, this is about 400 km typical range or 265 mi EPA rated range. Even though this is mostly a reliable method, sometimes the computer in the car can’t accurately estimate how much energy the battery holds and might display an inaccurate range number. To improve accuracy, it is a good idea to run down the battery to almost empty and then charge to 100%, once a month. This is known as rebalancing the battery. However, the battery shouldn’t be left at 0% or 100% for more than 2 hours.

The data collected by Merijn also include how many Supercharger visits were made, among other details. See the forum for more information or to upload your own data.

Besides Mileage vs Remaining Range, the file includes two other charts: Charge Cycles vs Remaining Range and Battery Age vs Remaining Range. From literature and research we know typically that 80% of battery capacity remains after 1000-2000 full cycles, strongly dependent on the temperature of the batteries. The data below support these numbers.

chart-3

chart-4

Here is a recent update from the USA, with a maximum of almost 130,000 miles driven.

From the USA+ drivers I could find the following earlier data from the Plug In America Survey, which I used to generate the following picture:

PlugInAmericaData

I compared all provided data with the EPA number of 265 miles for the 85 kWh Model S and the EPA number of 210 miles for the 60 kWh models. It is of course not clear how trustworthy these data are, or how people measured.

If you like the km version, here it is:

PlugInAmericaDatakm

The plot below is from UK-manufactured Nissan Leafs (2013 24 kWh models), thanks to forum users and via Simon Canfer.

Links on battery degradation:

Here are some plots from 2013

From a Tesla Blogger

A nice video on battery degradation can be found here.

How to prolong Lithium-based batteries?

Some data from 2013 from the Tesla Forum

Here is a very nice report on the Tesla Roadster by Plug-in America, one result is the following one, based on 126 vehicles:

roadster

—–

Here are some of the notes supporting the data at the top of this blog post:

Notes
◘ These charts are updated every time there is a new entry. All charts on this page show entries from all locations.
◘ The trendlines are generated by the chart. Hover over the trendline to see the current formula.
◘ When range mode is on, the displayed range in the car increases by a few miles or km. For all calculations, this chart uses range numbers with range mode on because of this statement by Jerome Guillen: “EPA testing is [done] with range mode on, given that it is assumed customers will use that function when they want to drive the farthest.” In other words, because the range scores are achieved with range mode on, it is more accurate to do calculations based on range numbers when range mode is on. But users don’t have to turn on range mode just to read their 100% charged range. The chart will calculate that accurately.
A video showing how range mode affects displayed range (YouTube)
◘ A cycle is when multiple percentage charges add up to 100%. For example charging 5 times from 70% to 90% is 1 cycle. Professor Jeff Dahn who is an expert on Tesla batteries said the following when responding to a question by a Tesla owner:
Quote: “If you charge from 30% to 70% 150 times or from 10% to 70% 100 times, the ageing of the battery will be approximately the same”. (Source)
A video of Prof Dahn comparing Model S battery to other EVs (YouTube)
◘ If you don’t see your name on cycles chart, go back to data entry and enter lifetime average energy consumption (Wh/km or Wh/mi).
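The cycle definition in the notes above amounts to summing the percentage added per charge and dividing by 100. The helper below is a hypothetical illustration of that definition only; it is not the chart's actual calculation, which appears to work from lifetime energy consumption.

```python
def equivalent_full_cycles(charges):
    """Count full cycles from (start_pct, end_pct) pairs, one pair per charge."""
    return sum(end - start for start, end in charges) / 100


# Five top-ups from 70% to 90% add up to one full cycle.
five_top_ups = equivalent_full_cycles([(70, 90)] * 5)

# The two charging patterns from Prof Dahn's quote.
pattern_a = equivalent_full_cycles([(30, 70)] * 150)
pattern_b = equivalent_full_cycles([(10, 70)] * 100)
```

Both patterns in the quote come out at 60 equivalent full cycles, which matches the statement that they age the battery by approximately the same amount.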
Replacement Batteries
If you have a replacement battery, don’t be surprised if you don’t recognise the mileage and battery age numbers you see above. These numbers are calculated for your replacement battery. They are different than your car mileage or vehicle age. If you want to know more about how this calculation is done, read on. Otherwise this might be too much detail.

Let’s assume somebody entered this data:
◘ Mileage: 30,000 km
◘ At what km did you replace battery? 24,000 km
◘ What happened to typical range after replacement? Improved 5 km
◘ Typical Range at 100% charge: 390 km
◘ Ownership duration: 350 days

Mileage calculation for replacement battery: If the battery hadn’t been replaced, typical range would be 390-5= 385 km on the old battery at 30,000 km.
If 30,000 km corresponds to 400-385 km range loss
then X km corresponds to 400-390 km range loss
Cross multiply. X= 30,000 * (400-390) / (400-385) = 30,000 * 10 / 15 = 20,000 km. The chart will display 20,000 km
◘ Why does the chart display 20,000 km even though actual mileage on the replacement battery is only 30,000-24,000= 6,000 km?
Because this replacement battery was refurbished and had some mileage on it.
◘ Why does the chart display 20,000 km even though mileage on the car is 30,000 km?
Because the chart displays mileage on the current battery not on the car and it has calculated that the replacement battery has 20,000 km mileage on it.
◘ How would the chart calculate age on the replacement battery?
Age = 350 * (400-390) / (400-385) = 350 * 10 / 15 = 233 days
◘ What mileage would the chart show if the replacement battery still had 400 km range?
Even though the calculation would result in zero miles (X= 30,000 * (400-400) / (400-395) = 30,000 * 0 / 5 = 0 km) the chart recognises that the calculation shouldn’t be less than what the user reported. In this case odometer shows 30,000, replacement happened at 24,000. Therefore mileage on the replacement battery is at least 6,000 km. Therefore the chart would display 6,000 km. The age calculation for the replacement battery would result in zero days too because the typical range is still 400 km (same as a new battery). Of course zero days would be incorrect too because the battery has 6000 km on it. Again the chart wouldn’t use zero and would calculate the time that corresponds to 6000 km mileage as follows:
If 30,000 km corresponds to 350 days
then 6,000 km corresponds to X days
X= 6,000*350/30,000= 70 days
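The replacement-battery arithmetic above can be collected into one function. This is a sketch under stated assumptions: the function and parameter names are invented, the 400 km new-pack range is the figure used in the example, and applying the lower bound with `max()` in every case is a guess at how the chart generalises the zero-loss case it describes. It does not handle an old pack that showed no range loss (division by zero).

```python
NEW_RANGE_KM = 400  # typical range of a new pack, as used in the example


def replacement_battery_estimate(odometer_km, replaced_at_km, range_gain_km,
                                 current_range_km, ownership_days):
    """Return (effective_km, effective_days) for the replacement battery."""
    # Range the car would show today had the pack not been swapped.
    old_pack_range = current_range_km - range_gain_km
    old_pack_loss = NEW_RANGE_KM - old_pack_range
    new_pack_loss = NEW_RANGE_KM - current_range_km

    # Distance actually driven on the replacement pack: a reported lower bound.
    km_on_pack = odometer_km - replaced_at_km
    # Time that corresponds to that distance at the owner's average usage.
    min_days = km_on_pack * ownership_days / odometer_km

    # Cross-multiplication from the text, floored at the reported minimums.
    effective_km = max(odometer_km * new_pack_loss / old_pack_loss, km_on_pack)
    effective_days = max(ownership_days * new_pack_loss / old_pack_loss, min_days)
    return effective_km, effective_days


# First worked example: the range reads 390 km after a 5 km improvement.
km_1, days_1 = replacement_battery_estimate(30_000, 24_000, 5, 390, 350)

# Second worked example: the replacement pack still shows the full 400 km.
km_2, days_2 = replacement_battery_estimate(30_000, 24_000, 5, 400, 350)
```

The first call gives 20,000 km and about 233 days; the second falls back to the lower bounds of 6,000 km and 70 days, as in the text.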
Updates
◘ 30 July 2015: The Mileage Chart now switches between km and miles depending on the location of the selected username. We now have more entries from the USA than before, so I thought this would be useful. Matteo
◘ 11 Feb 2016: The trendline for the first chart (Mileage vs Remaining range chart) has been updated. The old trendline was a third order polynomial trendline. It worked fine for 0-120,000 km where there are lots of entries but after there were no more entries it showed a sharp drop. The new trendline is polynomial until the data ends but linear afterwards. Like the old trendline this new trendline is also calculated automatically by the chart and updates when there are new entries. Matteo
◘ 9 Aug 2016: I added a script that clears the username selection in E1 every 2 hours. Matteo

Redesigning Python's named tuples


By Jake Edge
August 23, 2017

Deficiencies in the startup time for Python, along with the collections.namedtuple() data structure being identified as part of the problem, led Guido van Rossum to decree that named tuples should be optimized. That immediately set off a mini-storm of thoughts about the data structure and how it might be redesigned in the original python-dev thread, but Van Rossum directed participants over to python-ideas, where a number of alternatives were discussed. They ranged from straightforward tweaks to address the most pressing performance problems to elevating named tuples to be a new top-level data structure—joining regular tuples, lists, sets, dictionaries, and so on.

A named tuple simply adds field names for the entries in a tuple so that they can be accessed by index or name. For example:

    >>> from collections import namedtuple
    >>> Point = namedtuple('Point', ['x', 'y'])
    >>> p = Point(1,2)
    >>> p.y
    2
    >>> p[1]
    2
The existing implementation builds a Python class implementing the named tuple; it is the building process that is the worst offender in terms of startup performance. A bug was filed in November 2016; more recently the bug was revived and various benchmarks of the performance of named tuples were added to it. By the looks, there is room for a good bit of optimization, but the fastest implementation may not be the winner—at least for now.

To some extent, the current named tuple implementation has been a victim of its own success. It is now routinely used in the standard library and in other popular modules such that its performance has substantially contributed to Python's slow startup time. The existing implementation creates a _source attribute with pure Python code to create a class, which is then passed to exec() to build it. That attribute is then available for use in programs or to directly create the named tuple class by incorporating the source code. The pull request currently under consideration effectively routes around most of the use of exec(), though it is still used to add the __new__() function to create new instances of the named tuple class.
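To make the exec()-based approach concrete, here is a much-reduced sketch of the idea: format a source template, run it through exec(), and keep the generated text on the class. It mirrors the approach described above, not the actual standard library code; `make_namedtuple` and `_TEMPLATE` are hypothetical names, and the sketch skips the field-name validation a real implementation needs.

```python
from operator import itemgetter

# Source text for the class; doubled braces survive str.format() as single ones.
_TEMPLATE = """\
class {name}(tuple):
    __slots__ = ()
    _fields = {fields!r}

    def __new__(cls, {args}):
        return tuple.__new__(cls, ({args},))

    def __repr__(self):
        pairs = ', '.join(f'{{n}}={{v!r}}' for n, v in zip(self._fields, self))
        return f'{name}({{pairs}})'
"""


def make_namedtuple(name, field_names):
    """Build a tuple subclass by formatting source text and exec()-ing it."""
    fields = tuple(field_names)
    source = _TEMPLATE.format(name=name, fields=fields, args=', '.join(fields))
    namespace = {}
    exec(source, namespace)  # compiling and running this text is the slow step
    cls = namespace[name]
    for index, field in enumerate(fields):
        # Access by name is a property that indexes into the tuple.
        setattr(cls, field, property(itemgetter(index)))
    cls._source = source  # keep the generated text, as the article describes
    return cls


Point = make_namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
```

Every call to the factory compiles a fresh piece of source text, which is why programs that define many named tuples at import time pay for it at startup.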

After Van Rossum's decree, Raymond Hettinger reopened the bug with an explicit set of goals. His plan was to extend the patch set from the pull request so that it was fully compatible with the existing implementation and to measure the impact of it, including for alternative Python implementations (e.g. PyPy, Jython). But patch author Jelle Zijlstra wondered if it made sense to investigate a C-based implementation of named tuples created by Joe Jevnik.

Benchmarks were posted. Jevnik summarized his findings about the C version as follows: "type creation is much faster; instance creation and named attribute access are a bit faster". Zijlstra's benchmarks of his own version showed a 4x speedup for creating the class (versus the existing CPython implementation) and roughly the same performance as CPython for instantiation and attribute access. Those numbers caused Zijlstra to suggest using the C version:

Joe's cnamedtuple is about 40x faster for class creation than the current implementation, and my PR only speeds class creation up by 4x. That difference is big enough that I think we should seriously consider using the C implementation.

There are some downsides to a C implementation, however. As the original bug reporter, Naoki Inada, pointed out, maintenance is more difficult for C-based code. In addition, only CPython can directly benefit from it; alternative language implementations will either need to reimplement it or forgo it.

Class creation performance is only one area that could use improvement, however. Victor Stinner noted that accessing tuple values by name was nearly twice as slow when compared to the somewhat similar, internal PyStructSequence that is used for things like sys.version_info. It would be desirable for any named tuple upgrade to find a way to reduce the access-by-name overhead, several said. In fact, Giampaolo Rodolà pointed out that the asyncio module could serve nearly twice as many requests per second if the performance of PyStructSequence could be attained.
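Readers who want to look at the access-by-name overhead on their own interpreter could start from a measurement along these lines. It is only a rough sketch using timeit; the iteration count is arbitrary, and the numbers will vary with the Python version and machine, so they should not be expected to match the figures quoted in the discussion.

```python
import timeit
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)

# Time the same lookup done by position and by field name.
by_index = timeit.timeit("p[1]", globals={"p": p}, number=200_000)
by_name = timeit.timeit("p.y", globals={"p": p}, number=200_000)

print(f"by index: {by_index:.4f}s  by name: {by_name:.4f}s")
```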

But Rodolà would like to go even further than that. He proposed new syntax that would allow the creation of named tuples on the fly. He gave two possibilities for how that might look:

    >>> ntuple(x=1, y=0)
    (x=1, y=0)
    >>> (x=1, y=0)
    (x=1, y=0)

Either way (or both) would be implemented in C for speed. It would allow named tuples to be created without having to describe them up front, as is done now. But it would also remove one of the principles that guided the design of named tuples, as Tim Peters said:

How do you propose that the resulting object T know that T.x is 1. T.y is 0, and T.z doesn't make sense? Declaring a namedtuple up front allows the _class_ to know that all of its instances map attribute "x" to index 0 and attribute "y" to index 1. The instances know nothing about that on their own, and consume no more memory than a plain tuple. If your `ntuple()` returns an object implementing its own mapping, it loses a primary advantage (0 memory overhead) of namedtuples.
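The point about the class holding the mapping can be checked with a few lines. This is a small sketch that assumes CPython, where sys.getsizeof() reports the object's own footprint; other implementations may report sizes differently.

```python
import sys
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 0)
plain = (1, 0)

# The name-to-index mapping lives on the class, once for all instances.
print(Point._fields)

# Instances have no per-instance attribute dictionary ...
print(hasattr(p, "__dict__"))

# ... so, on CPython, an instance should be no bigger than a plain tuple.
print(sys.getsizeof(p), sys.getsizeof(plain))
```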

Post-decree, Ethan Furman moved the discussion to python-ideas and suggested looking at his aenum module as a possible source for a new named tuple. But that implementation uses metaclasses, which could lead to problems when subclassing, as Van Rossum pointed out.

Jim Jewett's suggestion to make named tuples simply be a view into a dictionary ran aground on too many incompatibilities with the existing implementation. Python dictionaries are now ordered by default and are optimized for speed, so they might be a reasonable choice, Jewett said. As Greg Ewing and others noted, though, that would lose many of the attributes that are valued for named tuples, including low memory overhead, access by index, and being a subclass of tuple.

Rodolà revived his proposal for named tuples without a declaration, but there are a number of problems with that approach. One of the main stumbling blocks is the type of these on-the-fly named tuples—effectively each one created would have its own type even if it had the same names in the same order. That is wasteful of memory, as is having each instance know about the mapping from indexes to names; the current implementation puts that in the class, which can be reused. There might be ways to cache these on-the-fly named tuple types to avoid some of the wasted memory, however. Those problems and concern that it would be abused led Van Rossum to declare the "bare" syntax (e.g. (x=1, y=0)) proposal as dead.
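One way the type caching mentioned above might look is sketched here. The ntuple() function is hypothetical (it is the name used in the proposal, not an existing API), and the sketch relies on keyword arguments keeping their call order, which Python guarantees from version 3.6.

```python
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def _type_for(field_names):
    # One class per distinct, ordered tuple of field names.
    return namedtuple("ntuple", field_names)


def ntuple(**fields):
    # Hypothetical on-the-fly constructor: look up (or create) the type
    # for this sequence of names, then instantiate it.
    cls = _type_for(tuple(fields))
    return cls(**fields)


a = ntuple(x=1, y=0)
b = ntuple(x=5, y=7)
c = ntuple(y=0, x=1)

print(type(a) is type(b))   # same names, same order: the type is shared
print(type(a) is type(c))   # same names, different order: a different type
```

The cache avoids creating a new class per instance, but it does nothing about the ordering question raised later in the thread.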

But the discussion of ntuple(x=1, y=0) continued for a while before seemingly running aground as well. Part of the problem is that it combines two things in an unexpected way: declaring the order of the fields in the named tuple and using keyword arguments where order should not matter. For the x and y case, it is fairly clear, but named tuples could be used for types where the order is not so clear. As Steven D'Aprano put it:

I don't see any way that this proposal can be anything by a subtle source of bugs. We have two *incompatible* requirements:
  • we want to define the order of the fields according to the order we give keyword arguments;
  • we want to give keyword arguments in any order without caring about the field order.
We can't have both, and we can't give up either without being a surprising source of annoyance and bugs.

As far as I am concerned, this kills the proposal for me. If you care about field order, then use namedtuple and explicitly define a class with the field order you want. If you don't care about field order, use SimpleNamespace.

He elaborated on the ordering problem by giving an example of a named tuple that stored the attributes of elementary particles (e.g. flavor, spin, charge) which do not have an automatic ordering. That argument seemed to resonate with several thread participants.
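The conflict D'Aprano describes can be made concrete with a short sketch. The ntuple() function below is a hypothetical stand-in for the proposal, and the particle values are illustrative only.

```python
from collections import namedtuple
from types import SimpleNamespace


def ntuple(**fields):
    # Hypothetical stand-in for the proposal: field order is whatever
    # order the keyword arguments were written in.
    return namedtuple("ntuple", tuple(fields))(**fields)


first = ntuple(flavor="up", spin=0.5, charge=2 / 3)
second = ntuple(charge=2 / 3, flavor="up", spin=0.5)

# Same data by name, but the two objects differ as tuples.
print(first.flavor == second.flavor)
print(first == second)
print(first[0], second[0])

# SimpleNamespace has no field order, so keyword order is irrelevant.
ns1 = SimpleNamespace(flavor="up", spin=0.5, charge=2 / 3)
ns2 = SimpleNamespace(charge=2 / 3, flavor="up", spin=0.5)
print(ns1 == ns2)
```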

So it would seem that a major overhaul of the interface for building named tuples is not likely anytime soon—if ever. The C reimplementation has some major performance benefits (and could presumably pick up the PyStructSequence performance for access by name), but it would seem that the first step will be to merge Zijlstra's Python-based implementation. That will allow for a fallback with better performance for alternative implementations, while still leaving open the possibility of replacing it with an even faster C version later.



Observations about the attack on WikiLeaks


On 30 August this year, a technical attack was performed against WikiLeaks, leading some visitors of WikiLeaks' Web site to see instead a claim by "OurMine" that they had seized control of WikiLeaks' servers. A lot of stupid things have been said on the networks about this attack, by ignorant people (both WikiLeaks fans and enemies). Most of the time, they did not bother to check any facts, and they did not demonstrate any knowledge of the technical infrastructure. Here, I want to describe the bare facts, as seen from technical observations. Spoiler: I have no sensational revelations to make.

First, the basic fact: some people saw something which was obviously not WikiLeaks' Web site: screenshots of the page are here or here. Some people deduced from that that WikiLeaks' Web server was cracked and the crackers modified its content (you can find this in The Verge for instance). That was a bold deduction: the complete process from the user's browser to the display of the Web page is quite complicated, with a lot of actors, and many things can go wrong.

In the case of WikiLeaks, it rapidly appeared that the Web server was not cracked but that the attack successfully targeted the wikileaks.org domain name. Observations (more on that later) show that the name wikileaks.org was not resolved into the usual IP address but into another one, located at a different hoster. How is this possible? What are the consequences?

You should remember that investigation of digital incidents on the Internet is difficult. The external analyst does not have all the data. Sometimes, when the analysis starts, it is too late: the data has already changed. And the internal analyst almost never publishes everything, and sometimes even lies. There are some organisations that are more open in their communication (see this Cloudflare report or this Gandi report) but they are the exceptions rather than the rule. Here, WikiLeaks reacted like the typical corporation, denying the problem, then trying to downplay it, and not publishing anything of value for the users. So, most of the claims that you can read about network incidents are not backed by facts, especially not publicly verifiable facts. The problem is obviously worse in this case, because WikiLeaks is at the center of many hot controversies. For instance, some WikiLeaks fans claimed from the beginning "WikiLeaks' servers have not been compromised" while they had zero actual information, and, anyway, not enough time to analyze it.

So, the issue was with the domain name wikileaks.org. To explain what happened, we need to go back to the DNS, both a critical infrastructure of the Internet, and a widely unknown (or underknown) technology. The DNS is a database indexed by domain names (like wikileaks.org or ssi.gouv.fr). When you query the DNS for a given domain name, you get various pieces of technical information such as IP addresses of servers, cryptographic keys, names of servers, etc. When the typical Web browser goes to http://www.okstate.com/, the software on the user's machine performs a DNS query for the name www.okstate.com, and gets back the IP address of the HTTP server. It then connects to the server.
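The lookup step described above is what any program does before connecting. A minimal sketch in Python, using the system resolver through socket.getaddrinfo(), could look like this; the function name resolve_addresses() and the default port are choices made for the illustration only.

```python
import socket


def resolve_addresses(name, port=80):
    """Return the distinct IP addresses the system resolver gives for name."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        # sockaddr[0] is the IP address, for both IPv4 and IPv6 results.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

Calling resolve_addresses("www.okstate.com") would return whatever the resolver answers at that moment. The program has no way to tell a legitimate answer from one planted by whoever controls the domain's DNS data, which is the whole problem discussed here.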

From this very short description, you can see that s·he who controls the DNS controls where the user will eventually go and what s·he will see. And the entire DNS resolution process (from a name to the data) is itself quite complicated, offering many opportunities for an attacker. Summary so far: DNS is critical, and most organisations underestimate it (or, worse, claim it is not their responsibility).

And where do the data in the DNS come from? That's the biggest source of vulnerabilities: unlike what many people said, most so-called "DNS attacks" are not DNS attacks at all, meaning they don't exploit a weakness in the DNS protocol. Most of the time, they are attacks against the provisioning infrastructure, the set of actors and servers that domain name holders (such as WikiLeaks for wikileaks.org) use to provision the data. Let's say you are Pat Smith, responsible for the online activity of an organisation named the Foobar Society. You have the domain name foobar.example. The Web site is hosted at Wonderful Hosting, Inc. After you've chosen a TLD (and I recommend you read the excellent EFF survey before you do so), you'll typically need to choose a registrar which will act as a proxy between you and the actual registry of the TLD (here, the fictitious .example). Most of the time, you, Pat Smith, will connect to the Web site of the registrar, create an account, and configure the data which will ultimately appear in the DNS. For instance, when the Web site is created at Wonderful Hosting, Pat will enter its IP address in the control panel provided by the registrar. You can see immediately that this requires Pat to log in to the said control panel. If Pat used a weak password, or wrote it down under h·is·er desk, or if Pat is gullible and believes a phone call asking h·im·er to give the password, the account may be compromised, and the attacker may log in instead of Pat and put the IP address of h·er·is choosing. This kind of attack is very common, and illustrates the fact that not all attacks are technically complicated.

So, what happened in the WikiLeaks case? (Warning, it will now become more technical.) We'll first use a "passive DNS" base, DNSDB. This sort of database observes the DNS traffic (which is most of the time in clear, see RFC 7626) and records it, allowing its users to time-travel. DNSDB is not public, I'm sorry, so for this one, you'll have to trust me. (That's why real-time reaction is important: when you arrive too late, the only tools to observe an attack are specialized tools like this one.) What's in DNSDB?

;;  bailiwick: org.
;;      count: 9
;; first seen: 2017-08-30 04:28:40 -0000
;;  last seen: 2017-08-30 04:30:28 -0000
wikileaks.org. IN NS ns1.rivalhost.com.
wikileaks.org. IN NS ns2.rivalhost.com.
wikileaks.org. IN NS ns3.rivalhost.com.

;;  bailiwick: org.
;;      count: 474
;; first seen: 2017-08-30 04:20:15 -0000
;;  last seen: 2017-08-30 04:28:41 -0000
wikileaks.org. IN NS ns1.rival-dns.com.
wikileaks.org. IN NS ns2.rival-dns.com.
wikileaks.org. IN NS ns3.rival-dns.com.
    

What does it mean? That during the attack (around 04:30 UTC), the .org registry was replying with the illegitimate set of servers. The usual servers are (we use the dig tool, the best tool to debug DNS issues):


% dig @a0.org.afilias-nst.info. NS wikileaks.org
...
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21194
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 3
...
;; AUTHORITY SECTION:
wikileaks.org.		86400 IN NS ns1.wikileaks.org.
wikileaks.org.		86400 IN NS ns2.wikileaks.org.

;; ADDITIONAL SECTION:
ns1.wikileaks.org.	86400 IN A 46.28.206.81
ns2.wikileaks.org.	86400 IN A 46.28.206.82
...
;; SERVER: 2001:500:e::1#53(2001:500:e::1)
;; WHEN: Fri Sep 01 09:18:14 CEST 2017
...

    

(And, yes, there is a discrepancy between what is served by the registry and what's inside nsX.wikileaks.org name servers: whoever manages WikiLeaks DNS does a sloppy job. That's why it is often useful to query the parent's name servers, like I did here.)

So, the name servers were changed, for rogue ones. Note there was also a discrepancy during the attack. These rogue servers gave a different set of NS (Name Servers), according to DNSDB:

;;  bailiwick: wikileaks.org.
;;      count: 1
;; first seen: 2017-08-31 02:02:38 -0000
;;  last seen: 2017-08-31 02:02:38 -0000
wikileaks.org. IN NS ns1.rivalhost-global-dns.com.
wikileaks.org. IN NS ns2.rivalhost-global-dns.com.     

Note that it does not mean that the DNS hoster of the attacker, Rival, is an accomplice. They may simply have a rogue customer. Any big service provider will have some rotten apples among its clients.
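The comparison done by eye above (observed NS sets versus the usual delegation) can also be sketched in code. The parser below only handles the text layout of the DNSDB excerpts shown in this article, which is an assumption and not a documented format; the records are copied from above, and the expected set comes from the dig output.

```python
# Usual delegation, as returned by the .org servers in the dig output above.
EXPECTED_NS = {"ns1.wikileaks.org.", "ns2.wikileaks.org."}

# Records copied from the DNSDB excerpts shown earlier.
DNSDB_EXCERPT = """\
;;  bailiwick: org.
;;      count: 9
;; first seen: 2017-08-30 04:28:40 -0000
;;  last seen: 2017-08-30 04:30:28 -0000
wikileaks.org. IN NS ns1.rivalhost.com.
wikileaks.org. IN NS ns2.rivalhost.com.
wikileaks.org. IN NS ns3.rivalhost.com.

;;  bailiwick: org.
;;      count: 474
;; first seen: 2017-08-30 04:20:15 -0000
;;  last seen: 2017-08-30 04:28:41 -0000
wikileaks.org. IN NS ns1.rival-dns.com.
wikileaks.org. IN NS ns2.rival-dns.com.
wikileaks.org. IN NS ns3.rival-dns.com.
"""


def parse_rrsets(text):
    """Split the excerpt on blank lines; return (metadata, NS targets) pairs."""
    rrsets = []
    for block in text.strip().split("\n\n"):
        meta, targets = {}, set()
        for line in block.splitlines():
            line = line.strip()
            if line.startswith(";;"):
                key, _, value = line[2:].partition(":")
                meta[key.strip()] = value.strip()
            elif line:
                owner, rrclass, rrtype, rdata = line.split()
                if rrtype == "NS":
                    targets.add(rdata)
        rrsets.append((meta, targets))
    return rrsets


for meta, targets in parse_rrsets(DNSDB_EXCERPT):
    status = "expected" if targets == EXPECTED_NS else "UNEXPECTED"
    print(meta["first seen"], "->", meta["last seen"], status, sorted(targets))
```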

You can see the date of the last change in whois output, when everything was put back in place:

% whois wikileaks.org
...
Updated Date: 2017-08-31T15:01:04Z

Surely enough, the rogue name servers were serving IP addresses pointing to the "false" Web site. Again, in DNSDB:

;;  bailiwick: wikileaks.org.
;;      count: 44
;; first seen: 2017-08-30 04:29:07 -0000
;;  last seen: 2017-08-31 07:22:05 -0000
wikileaks.org. IN A 181.215.237.148

The normal IP addresses of WikiLeaks are in the prefixes 95.211.113.XXX, 141.105.XXX and 195.35.109.XXX (dig A wikileaks.org if you want to see them, or use a DNS Looking Glass). 181.215.237.148 is the address of the rogue Web site, hosted by Rival again, as can be seen with the whois tool:

% whois 181.215.237.148
inetnum:     181.215.236/23
status:      reallocated
owner:       Digital Energy Technologies Chile SpA
ownerid:     US-DETC5-LACNIC
responsible: RivalHost, LLC.
address:     Waterwood Parkway, 1015, Suite G, C-1
address:     73034 - Edmond - OK
country:     US
owner-c:     VIG28
tech-c:      VIG28
abuse-c:     VIG28
...
nic-hdl:     VIG28
person:      AS61440 Network Operating Center
e-mail:      noc@AS61440.NET
address:     Moneda, 970, Piso 5
address:     8320313 - Santiago - RM
country:     CL

(It also shows that this prefix was allocated in Chile: the world is a complicated place, and the Internet even more so.)

So, this was the modus operandi of the cracker. S·he managed to change the set of name servers serving wikileaks.org, and that gave h·im·er the ability to send visitors to a server s·he controlled. (Note that this HTTP server, 181.215.237.148, no longer serves the cracker's page: it was probably removed by the provider.)

Many people on the social networks claimed that the attack was done by "DNS poisoning". First, a word of warning by a DNS professional: when someone types "DNS poisoning", you can be pretty sure s·he knows next to nothing about DNS. DNS poisoning is a very specific attack, for which we have solutions (DNSSEC, mentioned later), but it does not seem to be very common (read again my warning at the beginning: most attacks are never properly analyzed and documented, so it is hard to be more precise). What is very common are attacks against the domain name provisioning system. This is, for instance, what happened to the New York Times in 2013, from an attack by the infamous SEA (see NYT paper and a technical analysis). More recently, there was the attack against St. Louis Federal Reserve and many others. These attacks don't use the DNS protocol and it is quite a stretch to label them as "DNS attacks" or, even worse, "DNS poisoning".

What are the consequences of such an attack? As explained earlier, once you control the DNS, you control everything. You can redirect users to a Web site (not only the external visitors, but also the employees of the targeted organisation, when they connect to internal services, potentially stealing passwords and other information), hijack the emails, etc. So, claiming that "the servers were not compromised" (technically true) is almost useless. With an attack on domain names, the cracker does not need to compromise the servers.

Who was cracked in the WikiLeaks case? From the outside, we can say with confidence that the name servers were changed. The weakness could have been at the holder (WikiLeaks), at the registrar (Dynadot, a piece of information you also get with whois), or at the registry (.org, administratively managed by PIR and technically by Afilias). From the information available, one cannot say where the problem was (so, people who publicly shouted that "WikiLeaks is not responsible" were showing their blind faith, not their analytic abilities). Of course, most of the time, the weakest link is the user (weak password to the registrar portal, and not activating 2FA), but some registrars or registries have displayed serious security weaknesses in the past. The only thing we can say is that no other domain name appeared to have been hijacked. (When someone takes control of a registrar or registry, s·he can change many domain names.)

I said before that, when you control a domain name, you can send both external and internal visitors to the server you want. That was not entirely true, since good security relies on defence in depth and some measures can be taken to limit the risk, even if your domain name is compromised. One of them is of course having HTTPS (it is the case of WikiLeaks), with redirection from the plain HTTP site, and HSTS (standardized in RFC 6797), so that regular visitors do not go through the insecure HTTP. Again, WikiLeaks uses it:

%  wget --server-response --output-document /dev/null https://wikileaks.org/
...
Strict-Transport-Security: max-age=25920000; includeSubDomains; preload

These techniques will at least raise an alarm, telling the visitor that something is fishy.
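To read the HSTS header shown above, one can split it into its directives. The sketch below is a minimal parser written for this illustration (the parse_hsts() name and the returned dictionary layout are not from any library); it follows RFC 6797 only as far as treating directive names case-insensitively and accepting a quoted max-age value.

```python
def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header value into a small dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.lower()
        if name == "max-age":
            policy["max_age"] = int(value.strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


# Header value taken from the wget output above.
policy = parse_hsts("max-age=25920000; includeSubDomains; preload")
print(policy)
print(policy["max_age"] // 86400, "days")
```

With this value, a browser that has already visited the site over HTTPS will refuse plain HTTP for 300 days, subdomains included.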

In the same way, using Tor to go to a .onion URL would also help. But I have not been able to find a .onion for WikiLeaks (the http://suw74isz7wqzpmgu.onion/ indicated on the wiki does not work, the http://wlupld3ptjvsgwqw.onion seems to be just for uploading).

One can also limit the risk coming from an account compromise by enabling registry lock, a technique offered by most TLDs (including .org) to prevent unauthorized changes. When activated, it requires extra steps and checking for any change. I cannot say, from the outside, if WikiLeaks enabled it but sensitive domain names must do it.

Funny enough, with so many people claiming it was "DNS poisoning", the best protection against this specific attack, DNSSEC, is not enabled by WikiLeaks (there is a DNSSEC key in wikileaks.org but no signatures and no DS record in the parent). If wikileaks.org were signed, and if you used a validating DNS resolver (everybody should), you could not fall for a DNS poisoning attack against WikiLeaks. Of course, if the attack is, instead, a compromise of holder account, registrar or registry, DNSSEC would not help a lot.

A bit of technical fun at the end. WikiLeaks uses glue records for its name servers. Its name servers have names which are under the domain they serve, thus creating a chicken-and-egg problem. To allow the DNS client to query them, the parent has to know the IP address of this name server. This is what is called a glue record. DNSDB shows us that the glue for ns1.wikileaks.org was apparently modified (note that it was several hours after the main attack):

;;  bailiwick: org.
;;      count: 546
;; first seen: 2017-08-31 00:23:13 -0000
;;  last seen: 2017-08-31 06:22:42 -0000
ns1.wikileaks.org. IN A 191.101.26.67

This machine is still up and serves a funny value for wikileaks.org (again, you can use a DNS Looking Glass):


% dig @191.101.26.67 A wikileaks.org
...
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53887
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
...
;; ANSWER SECTION:
wikileaks.org.		400 IN A 127.0.0.1

This IP address, meaning localhost, was indeed seen by some DNSDB sensors:

;;  bailiwick: wikileaks.org.
;;      count: 1
;; first seen: 2017-08-31 09:17:29 -0000
;;  last seen: 2017-08-31 09:17:29 -0000
wikileaks.org. IN A 127.0.0.1	

Since the DNS heavily relies on caching, the information was still seen even after the configuration was fixed. Here, we use the RIPE Atlas probes with the atlas-resolve tool to see how many probes still saw the wrong value (pay attention to the date and time, all in UTC, which is the rule when analyzing Internet problems):

% atlas-resolve -r 1000 -t A wikileaks.org
[141.105.65.113 141.105.69.239 195.35.109.44 195.35.109.53 95.211.113.131 95.211.113.154] : 850 occurrences 
[195.175.254.2] : 2 occurrences 
[127.0.0.1] : 126 occurrences 
Test #9261634 done at 2017-08-31T10:03:24Z
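To put these counts in proportion, the shares can be computed directly from the output above. The grouping label for the first line is mine; it stands for the set of six addresses that fall in the normal WikiLeaks prefixes.

```python
# Occurrence counts copied from the atlas-resolve output above.
occurrences = {
    "usual WikiLeaks addresses": 850,
    "195.175.254.2": 2,
    "127.0.0.1": 126,
}

total = sum(occurrences.values())
for answer, count in occurrences.items():
    print(f"{answer}: {count}/{total} = {count / total:.1%}")
```

So, at the time of this measurement, about 13 % of the probes that answered still received the 127.0.0.1 value.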

