There’s a war on for top performing tech talent. Growth companies realize this coveted group of employees is critical in providing a competitive edge and delighting customers. However, top performing employees have an edge in today’s employment market fueled by a perfect storm of factors:
Tech unemployment is at historic lows, and some tech sectors have an unemployment rate of zero.
A popular approach to filling talent gaps with H-1B visas now faces new levels of political scrutiny.
Unprecedented demand for IT skills in traditional tech hubs, specifically Silicon Valley, has created an employment market where most employers can’t afford the wages being demanded and a cost-of-living environment where most employees can’t afford to live.
What’s the answer to attracting the best and brightest in this climate? Flexibility.
A recent Deloitte study found that 64 percent of Millennials already have flexible locations for work, up from 21 percent just last year. Many in the succeeding Gen Z — the first generation to always have had an Internet connection — will never consider an employer that doesn’t offer flexible work environments.
The definition of flexibility will vary from company to company. At my company, we started by targeting the best available talent regardless of location, and as a virtual organization we have experienced tremendous growth because of it.
We still want our teams to do their jobs from wherever they are. We never would have attracted the talent required to support our growth by chaining employees to a desk.
Our experience is that top performing employees beget top performing employees. As a result of this organic growth in our number of employees, we’ve had clusters of talent forming in various cities, including St. Louis, Austin, New Orleans, New York, Washington, and Victoria.
The skeptics of completely virtual organizations argue that in-person employee-to-employee communication is crucial for collaboration and productivity. These skeptics are right; they are also wrong.
Communication in any organization is critical to success. Water cooler conversations can result in increased collaboration, and losing them is a real cost. That cost can be offset, though, by the increased productivity and cost savings that come from employees working in a comfortable, quiet, familiar environment. The tools available for remote collaboration — Zoom, Hangouts, Slack — have dramatically increased in functionality. These tools facilitate our daily standups and company-wide meetings.
Still, there is value in actual face time. Our organic hub-centric growth has allowed us to create employee centers in various cities. Employees are not expected to report to these centers daily. While most companies offer a “work from home” day to inspire or reward employees, we have established “work from work” days. Employees come into the centers refreshed and inspired to collaborate on tech jams and code sprints.
We also use these hubs as places to mentor the next generation of top performing employees. All new team members are assigned a mentor located in their same geographic area. I also use work-from-work days to meet personally with all employees.
Regardless of the type of organization — traditional office, virtual environment, or hybrid model — communication with employees will always be a key to success. At the same time, communication in any environment will never be perfect, and that’s OK as long as you recognize that and adapt to it.
There’s no end in sight in the war for top performing tech talent. While there is still great value in direct communication with employees, long gone are the days of shackling them to a desk in an office park.
The winners in this war will be companies that offer flexible workplace arrangements that meet the needs of both companies and their employees.
Andy Dearing is CEO of GIS platform company Boundless. A commercial pilot and self-taught geographer, he has been working with GIS for nearly 15 years. He lives in Missouri with his wife and four kids.
The bad news: from Firefox 55 onwards, Selenium IDE will no longer work.
The reasons for this are complex, but boil down to two main causes:
Browsers are complicated pieces of software that are constantly evolving. Mozilla has been working hard to make Firefox faster and more stable, while still retaining the flexibility and ease of extension that we’ve come to know and love. As part of that process, Firefox is switching extensions from the original “XPI” format, to a newer, more widely adopted “Web Extension” mechanism.
The Selenium project lacks someone with the time and energy to move the IDE forwards to take advantage of the new technologies.
Selenium is one of the most widely used pieces of testing software there is. Despite this, the team of people regularly contributing is small: since the start of the year, there are only 11 people who have made more than 10 commits, with two people accounting for more than half of those. Since 2016, only one person has been maintaining the IDE.
Selenium is an Open Source project. None of the core contributors — not the IDE maintainer, not the language binding owners — are paid to work on it. They do it because they love working on the code, and they typically do it in their “copious free time”. The IDE maintainer has had almost none of that to spare. We should all be thanking that committer for his time and effort. Thank you, Samit!
So what can we do to move forward? The first thing is that there are now a wealth of tools that are stepping up to fill the gap. You should go and have a look at them. The second thing is that there is an effort to rebuild IDE using modern APIs, to be usable across more than just Firefox. The fine people at Applitools are helping with this effort.
The third thing? That’s you. You could help us.
If you believe that a friendly UI for quickly recording and playing back tests is a useful Open Source tool, then please come and join us! The main technical discussions are happening on the #selenium IRC channel. If you’d prefer Slack, you can join us on that too. Or there’s the ever useful selenium-developers mailing list. Come onboard. We’d love your help, and IDE is a wonderful thing to contribute to!
This blog post will show you how to create your own Serverless Raspberry Pi cluster with Docker and the OpenFaaS framework. People often ask me what they should do with their cluster and this application is perfect for the credit-card sized device - want more compute power? Scale by adding more RPis.
"Serverless" is a design pattern for event-driven architectures just like "bridge", "facade" and "factory" are also abstract concepts - so is "serverless".
Here's my cluster for the blog post - with brass stand-offs used to separate each device.
What is Serverless and why does it matter to you?
As an industry we have some explaining to do regarding what the term "serverless" means. For the sake of this blog post let us assume that it is a new architectural pattern for event-driven architectures and that it lets you write tiny, reusable functions in whatever language you like. Read more on Serverless here.
Serverless is an architectural pattern resulting in: Functions as a Service, or FaaS
Serverless functions can do anything, but usually work on a given input - such as an event from GitHub, Twitter, PayPal, Slack, your Jenkins CI pipeline - or in the case of a Raspberry Pi - maybe a real-world sensor input such as a PIR motion sensor, laser tripwire or even a temperature gauge.
Let's also assume that serverless functions tend to make use of third-party back-end services to become greater than the sum of their parts.
We'll be using OpenFaaS which lets you turn any single host or cluster into a back-end to run serverless functions. Any binary, script or programming language that can be deployed with Docker will work on OpenFaaS, and you can choose where to sit on the scale between speed and flexibility. The good news is that a UI and metrics are also built-in.
Here's what we'll do:
Set up Docker on one or more hosts (Raspberry Pi 2/3)
Docker is a technology for packaging and deploying applications. It also has clustering built-in, which is secure by default and only takes one line to set up. OpenFaaS uses Docker and Swarm to spread your serverless functions across all your available RPis.
Pictured: 3x Raspberry Pi Zero
I recommend using Raspberry Pi 2 or 3 for this project along with an Ethernet switch and a powerful USB multi-adapter.
The community is helping the Docker team to ready support for Raspbian Stretch, but it's not yet seamless. Please download Jessie from the RPi foundation's archive here
Before booting the RPi you'll need to create a file in the boot partition called "ssh". Just keep the file blank. This enables remote logins.
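For example, with the SD card mounted on your workstation (the mount point below is a typical macOS path - adjust it for your system):

$ touch /Volumes/boot/ssh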
Power up and change the hostname
Now power up the RPi and connect with ssh
$ ssh pi@raspberrypi.local
The password is raspberry.
Use the raspi-config utility to change the hostname to swarm-1 or similar and then reboot.
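The utility ships with Raspbian and is launched with:

$ sudo raspi-config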
While you're here you can also change the memory split between the GPU (graphics) and the system to 16MB.
We can use a utility script for this:
$ curl -sSL https://get.docker.com | sh
This installation method may change in the future. As noted above you need to be running Jessie so we have a known configuration.
You may see a warning like this, but you can ignore it and you should end up with Docker CE 17.05:
WARNING: raspbian is no longer updated @ https://get.docker.com/
Installing the legacy docker-engine package...
Afterwards, make sure your user account can access the Docker client with this command:
$ sudo usermod pi -aG docker
If your username isn't pi, then replace pi with your own username - alex, for instance.
Change the default password
Type in sudo passwd pi and enter a new password, please don't skip this step!
Now repeat the above for each of the RPis.
Create your Swarm cluster
Log into the first RPi and type in the following:
$ docker swarm init
Swarm initialized: current node (3ra7i5ldijsffjnmubmsfh767) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-496mv9itb7584pzcddzj4zvzzfltgud8k75rvujopw15n3ehzu-af445b08359golnzhncbdj9o3 \
192.168.0.79:2377
You'll see the output with your join token and the command to type into the other RPis. So log into each one with ssh and paste in the command.
Give this a few seconds to connect then on the first RPi check all your nodes are listed:
$ docker node ls
ID                          HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
3ra7i5ldijsffjnmubmsfh767 * swarm1    Ready   Active        Leader
k9mom28s2kqxocfq1fo6ywu63   swarm3    Ready   Active
y2p089bs174vmrlx30gc77h4o   swarm4    Ready   Active
Congratulations! You have a Raspberry Pi cluster!
More on clusters
You can see my three hosts up and running. Only one is a manager at this point. If our manager were to go down then we'd be in an unrecoverable situation. The way around this is to add redundancy by promoting more of the nodes to managers - they will still run workloads, unless you specifically set up your services to only be placed on workers.
To upgrade a worker to a manager, just type in docker node promote <node name> from one of your managers.
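For example, to promote the node listed as swarm3 in the output above:

$ docker node promote swarm3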
Note: Swarm commands such as docker service ls or docker node ls can only be done on the manager.
For a deeper dive into how managers and workers keep "quorum" head over to the Docker Swarm admin guide.
OpenFaaS
Now let's move on to deploying a real application to enable Serverless functions to run on our cluster. OpenFaaS is a framework for Docker that lets any process or container become a serverless function - at scale and on any hardware or cloud. Thanks to Docker and Golang's portability it also runs very well on a Raspberry Pi.
Image may be NSFW. Clik here to view.
Please show your support and star the OpenFaaS repository on GitHub.
Log into the first RPi (where we ran docker swarm init) and clone/deploy the project:
$ git clone https://github.com/alexellis/faas/
$ cd faas
$ ./deploy_stack.armhf.sh
Creating network func_functions
Creating service func_gateway
Creating service func_prometheus
Creating service func_alertmanager
Creating service func_nodeinfo
Creating service func_markdown
Creating service func_wordcount
Creating service func_echoit
Your other RPis will now be instructed by Docker Swarm to start pulling the Docker images from the internet and extracting them to the SD card. The work will be spread across all the RPis so that none of them are overworked.
This could take a couple of minutes, so you can check when it's done by typing in:
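$ docker service ls

Each service should eventually show 1/1 replicas once its image has been pulled and started.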
Now you can follow the same tutorial written for PC, Laptop and Cloud available below, but we are going to run a couple of commands first for the Raspberry Pi.
Pick it up at step 3:
Instead of placing your functions in ~/functions/hello-python - place them inside the faas-cli folder we just cloned from GitHub.
Also replace "localhost" with the IP address of your first RPi in the stack.yml file.
Note that the Raspberry Pi may take a few minutes to download your serverless function to the relevant RPi. You can check on your services to make sure you have 1/1 replicas showing up with this command:
$ watch 'docker service ls'
pv27thj5lftz hello-python replicated 1/1 alexellis2/faas-hello-python-armhf:latest
For more information on working with Node.js or other languages head over to the main FaaS repo
Check your function metrics
With a Serverless experience, you don't want to spend all your time managing your functions. Fortunately, Prometheus metrics are built into OpenFaaS, meaning you can keep track of how long each function takes to run and how often it's being called.
Metrics drive auto-scaling
If you generate enough load on any of the functions then OpenFaaS will auto-scale your function, and when the demand eases off you'll get back to a single replica again.
Here are some sample queries you can paste into the Prometheus UI in Safari, Chrome etc. The queries are written in PromQL - the Prometheus query language. The first one shows us how often the function is being called:
rate(gateway_function_invocation_total[20s])
The second query shows us how many replicas we have of each function, there should be only one of each at the start:
gateway_service_count
If you want to trigger auto-scaling you could try the following on the RPi:
$ while [ true ]; do curl -4 localhost:8080/function/func_echoit --data "hello world" ; done
Check the Prometheus "alerts" page to see whether you are generating enough load for the auto-scaling to trigger. If you're not, run the command in a few additional Terminal windows too.
After you reduce the load, the replica count shown in your second graph and the gateway_service_count metric will go back to 1 again.
Wrapping up
We've now set up Docker and Swarm, and run OpenFaaS - which lets us treat our Raspberry Pi like one giant computer, ready to crunch through code.
How did you find setting up your first Docker Swarm cluster and running OpenFaaS? Please share a picture or a Tweet on Twitter @alexellisuk
Watch my Dockercon video of OpenFaaS
I presented OpenFaaS (then called FaaS) at Dockercon in Austin - watch this video for a high-level introduction and some really interactive demos with Alexa and GitHub.
Got questions? Ask in the comments below - or send your email over to me for an invite to my Raspberry Pi, Docker and Serverless Slack channel where you can chat with like-minded people about what you're working on.
Want to learn more about Docker on the Raspberry Pi?
I'd suggest starting with 5 Things you need to know, which covers things like security and the subtle differences between the RPi and a regular PC.
How do Bitcoin markets behave? What are the causes of the sudden spikes and dips in cryptocurrency values? Are the markets for different altcoins inseparably linked or largely independent? How can we predict what will happen next?
Articles on cryptocurrencies, such as Bitcoin and Ethereum, are rife with speculation these days, with hundreds of self-proclaimed experts advocating for the trends that they expect to emerge. What is lacking from many of these analyses is a strong foundation of data and statistics to back up the claims.
The goal of this article is to provide an easy introduction to cryptocurrency analysis using Python. We will walk through a simple Python script to retrieve, analyze, and visualize data on different cryptocurrencies. In the process, we will uncover an interesting trend in how these volatile markets behave, and how they are evolving.
This is not a post explaining what cryptocurrencies are (if you want one, I would recommend this great overview), nor is it an opinion piece on which specific currencies will rise and which will fall. Instead, all that we are concerned about in this tutorial is procuring the raw data and uncovering the stories hidden in the numbers.
Step 1 - Setup Your Data Laboratory
The tutorial is intended to be accessible for enthusiasts, engineers, and data scientists at all skill levels. The only skills that you will need are a basic understanding of Python and enough knowledge of the command line to set up a project.
A completed version of the notebook with all of the results is available here.
Step 1.1 - Install Anaconda
The easiest way to install the dependencies for this project from scratch is to use Anaconda, a prepackaged Python data science ecosystem and dependency manager.
If you're an advanced user, and you don't want to use Anaconda, that's totally fine; I'll assume you don't need help installing the required dependencies. Feel free to skip to section 2.
Step 1.2 - Setup an Anaconda Project Environment
Once Anaconda is installed, we'll want to create a new environment to keep our dependencies organized.
Run conda create --name cryptocurrency-analysis python=3 to create a new Anaconda environment for our project.
Next, run source activate cryptocurrency-analysis (on Linux/macOS) or activate cryptocurrency-analysis (on Windows) to activate this environment.
Finally, run conda install numpy pandas nb_conda jupyter plotly quandl to install the required dependencies in the environment. This could take a few minutes to complete.
Why use environments? If you plan on developing multiple Python projects on your computer, it is helpful to keep the dependencies (software libraries and packages) separate in order to avoid conflicts. Anaconda will create a special environment directory for the dependencies for each project to keep everything organized and separated.
Step 1.3 - Start An Interactive Jupyter Notebook
Once the environment and dependencies are all set up, run jupyter notebook to start the iPython kernel, and open your browser to http://localhost:8888/. Create a new Python notebook, making sure to use the Python [conda env:cryptocurrency-analysis] kernel.
Step 1.4 - Import the Dependencies At The Top of The Notebook
Once you've got a blank Jupyter notebook open, the first thing we'll do is import the required dependencies.
import os
import numpy as np
import pandas as pd
import pickle
import quandl
from datetime import datetime
We'll also import Plotly and enable the offline mode.
import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
py.init_notebook_mode(connected=True)
Step 2 - Retrieve Bitcoin Pricing Data
Now that everything is set up, we're ready to start retrieving data for analysis. First, we need to get Bitcoin pricing data using Quandl's free Bitcoin API.
Step 2.1 - Define Quandl Helper Function
To assist with this data retrieval we'll define a function to download and cache datasets from Quandl.
def get_quandl_data(quandl_id):
    '''Download and cache Quandl dataseries'''
    cache_path = '{}.pkl'.format(quandl_id).replace('/','-')
    try:
        f = open(cache_path, 'rb')
        df = pickle.load(f)
        print('Loaded {} from cache'.format(quandl_id))
    except (OSError, IOError) as e:
        print('Downloading {} from Quandl'.format(quandl_id))
        df = quandl.get(quandl_id, returns="pandas")
        df.to_pickle(cache_path)
        print('Cached {} at {}'.format(quandl_id, cache_path))
    return df
We're using pickle to serialize and save the downloaded data as a file, which will prevent our script from re-downloading the same data each time we run the script. The function will return the data as a Pandas dataframe. If you're not familiar with dataframes, you can think of them as super-powered spreadsheets.
Step 2.2 - Pull Kraken Exchange Pricing Data
Let's first pull the historical Bitcoin exchange rate for the Kraken Bitcoin exchange.
# Pull Kraken BTC price exchange data
btc_usd_price_kraken = get_quandl_data('BCHARTS/KRAKENUSD')
We can inspect the first 5 rows of the dataframe using the head() method.
btc_usd_price_kraken.head()
Date        Open       High       Low        Close      Volume (BTC)  Volume (Currency)  Weighted Price
2014-01-07  874.67040  892.06753  810.00000  810.00000  15.622378     13151.472844       841.835522
2014-01-08  810.00000  899.84281  788.00000  824.98287  19.182756     16097.329584       839.156269
2014-01-09  825.56345  870.00000  807.42084  841.86934  8.158335      6784.249982        831.572913
2014-01-10  839.99000  857.34056  817.00000  857.33056  8.024510      6780.220188        844.938794
2014-01-11  858.20000  918.05471  857.16554  899.84105  18.748285     16698.566929       890.671709
Next, we'll generate a simple chart as a quick visual verification that the data looks correct.
# Chart the BTC pricing data
btc_trace = go.Scatter(x=btc_usd_price_kraken.index, y=btc_usd_price_kraken['Weighted Price'])
py.iplot([btc_trace])
Here, we're using Plotly for generating our visualizations. This is a less traditional choice than some of the more established Python data visualization libraries such as Matplotlib, but I think Plotly is a great choice since it produces fully-interactive charts using D3.js. These charts have attractive visual defaults, are easy to explore, and are very simple to embed in web pages.
As a quick sanity check, you should compare the generated chart with publicly available graphs on Bitcoin prices (such as those on Coinbase), to verify that the downloaded data is legit.
Step 2.3 - Pull Pricing Data From More BTC Exchanges
You might have noticed a hitch in this dataset - there are a few notable down-spikes, particularly in late 2014 and early 2016. These spikes are specific to the Kraken dataset, and we obviously don't want them to be reflected in our overall pricing analysis.
The nature of Bitcoin exchanges is that the pricing is determined by supply and demand, hence no single exchange contains a true "master price" of Bitcoin. To solve this issue, along with that of down-spikes (which are likely the result of technical outages and data set glitches) we will pull data from three more major Bitcoin exchanges to calculate an aggregate Bitcoin price index.
First, we will download the data from each exchange into a dictionary of dataframes.
# Pull pricing data for 3 more BTC exchanges
exchanges = ['COINBASE','BITSTAMP','ITBIT']
exchange_data = {}
exchange_data['KRAKEN'] = btc_usd_price_kraken
for exchange in exchanges:
    exchange_code = 'BCHARTS/{}USD'.format(exchange)
    btc_exchange_df = get_quandl_data(exchange_code)
    exchange_data[exchange] = btc_exchange_df
Step 2.4 - Merge All Of The Pricing Data Into A Single Dataframe
Next, we will define a simple function to merge a common column of each dataframe into a new combined dataframe.
def merge_dfs_on_column(dataframes, labels, col):
    '''Merge a single column of each dataframe into a new combined dataframe'''
    series_dict = {}
    for index in range(len(dataframes)):
        series_dict[labels[index]] = dataframes[index][col]
    return pd.DataFrame(series_dict)
Now we will merge all of the dataframes together on their "Weighted Price" column.
# Merge the BTC price dataseries' into a single dataframe
btc_usd_datasets = merge_dfs_on_column(list(exchange_data.values()), list(exchange_data.keys()), 'Weighted Price')
Finally, we can preview the last five rows of the result using the tail() method, to make sure it looks ok.
btc_usd_datasets.tail()
Date        BITSTAMP     COINBASE     ITBIT        KRAKEN
2017-08-14  4210.154943  4213.332106  4207.366696  4213.257519
2017-08-15  4101.447155  4131.606897  4127.036871  4149.146996
2017-08-16  4193.426713  4193.469553  4190.104520  4187.399662
2017-08-17  4338.694675  4334.115210  4334.449440  4346.508031
2017-08-18  4182.166174  4169.555948  4175.440768  4198.277722
The prices look to be as expected: they are in similar ranges, but with slight variations based on the supply and demand of each individual Bitcoin exchange.
Step 2.5 - Visualize The Pricing Datasets
The next logical step is to visualize how these pricing datasets compare. For this, we'll define a helper function to provide a single-line command to generate a graph from the dataframe.
def df_scatter(df, title, seperate_y_axis=False, y_axis_label='', scale='linear', initial_hide=False):
    '''Generate a scatter plot of the entire dataframe'''
    label_arr = list(df)
    series_arr = list(map(lambda col: df[col], label_arr))
    layout = go.Layout(
        title=title,
        legend=dict(orientation="h"),
        xaxis=dict(type='date'),
        yaxis=dict(
            title=y_axis_label,
            showticklabels=not seperate_y_axis,
            type=scale
        )
    )
    y_axis_config = dict(
        overlaying='y',
        showticklabels=False,
        type=scale
    )
    visibility = 'visible'
    if initial_hide:
        visibility = 'legendonly'
    # Form Trace For Each Series
    trace_arr = []
    for index, series in enumerate(series_arr):
        trace = go.Scatter(
            x=series.index,
            y=series,
            name=label_arr[index],
            visible=visibility
        )
        # Add seperate axis for the series
        if seperate_y_axis:
            trace['yaxis'] = 'y{}'.format(index + 1)
            layout['yaxis{}'.format(index + 1)] = y_axis_config
        trace_arr.append(trace)
    fig = go.Figure(data=trace_arr, layout=layout)
    py.iplot(fig)
In the interest of brevity, I won't go too far into how this helper function works. Check out the documentation for Pandas and Plotly if you would like to learn more.
We can now easily generate a graph for the Bitcoin pricing data.
# Plot all of the BTC exchange prices
df_scatter(btc_usd_datasets, 'Bitcoin Price (USD) By Exchange')
Step 2.6 - Clean and Aggregate the Pricing Data
We can see that, although the four series follow roughly the same path, there are various irregularities in each that we'll want to get rid of.
Let's remove all of the zero values from the dataframe, since we know that the price of Bitcoin has never been equal to zero in the timeframe that we are examining.
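A minimal way to do this is to treat the zeros as missing data, which Pandas will then skip when charting and averaging (np was imported at the top of the notebook):

# Remove "0" values by treating them as missing data
btc_usd_datasets.replace(0, np.nan, inplace=True)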
When we re-chart the dataframe, we'll see a much cleaner looking chart without the down-spikes.
# Plot the revised dataframe
df_scatter(btc_usd_datasets, 'Bitcoin Price (USD) By Exchange')
We can now calculate a new column, containing the average daily Bitcoin price across all of the exchanges.
# Calculate the average BTC price as a new column
btc_usd_datasets['avg_btc_price_usd'] = btc_usd_datasets.mean(axis=1)
This new column is our Bitcoin pricing index! Let's chart that column to make sure it looks ok.
# Plot the average BTC price
btc_trace = go.Scatter(x=btc_usd_datasets.index, y=btc_usd_datasets['avg_btc_price_usd'])
py.iplot([btc_trace])
Yup, looks good. We'll use this aggregate pricing series later on, in order to convert the exchange rates of other cryptocurrencies to USD.
Step 3 - Retrieve Altcoin Pricing Data
Now that we have a solid time series dataset for the price of Bitcoin, let's pull in some data for non-Bitcoin cryptocurrencies, commonly referred to as altcoins.
Step 3.1 - Define Poloniex API Helper Functions
For retrieving data on cryptocurrencies we'll be using the Poloniex API. To assist in the altcoin data retrieval, we'll define two helper functions to download and cache JSON data from this API.
First, we'll define get_json_data, which will download and cache JSON data from a provided URL.
def get_json_data(json_url, cache_path):
    '''Download and cache JSON data, return as a dataframe.'''
    try:
        f = open(cache_path, 'rb')
        df = pickle.load(f)
        print('Loaded {} from cache'.format(json_url))
    except (OSError, IOError) as e:
        print('Downloading {}'.format(json_url))
        df = pd.read_json(json_url)
        df.to_pickle(cache_path)
        print('Cached {} at {}'.format(json_url, cache_path))
    return df
Next, we'll define a function that will generate Poloniex API HTTP requests, and will subsequently call our new get_json_data function to save the resulting data.
base_polo_url = 'https://poloniex.com/public?command=returnChartData&currencyPair={}&start={}&end={}&period={}'
start_date = datetime.strptime('2015-01-01', '%Y-%m-%d') # get data from the start of 2015
end_date = datetime.now() # up until today
period = 86400 # pull daily data (86,400 seconds per day)
def get_crypto_data(poloniex_pair):
    '''Retrieve cryptocurrency data from poloniex'''
    json_url = base_polo_url.format(poloniex_pair, start_date.timestamp(), end_date.timestamp(), period)
    data_df = get_json_data(json_url, poloniex_pair)
    data_df = data_df.set_index('date')
    return data_df
This function will take a cryptocurrency pair string (such as 'BTC_ETH') and return a dataframe containing the historical exchange rate of the two currencies.
Step 3.2 - Download Trading Data From Poloniex
Most altcoins cannot be bought directly with USD; to acquire these coins individuals often buy Bitcoins and then trade the Bitcoins for altcoins on cryptocurrency exchanges. For this reason, we'll be downloading the exchange rate to BTC for each coin, and then we'll use our existing BTC pricing data to convert this value to USD.
altcoins = ['ETH','LTC','XRP','ETC','STR','DASH','SC','XMR','XEM']
altcoin_data = {}
for altcoin in altcoins:
    coinpair = 'BTC_{}'.format(altcoin)
    crypto_price_df = get_crypto_data(coinpair)
    altcoin_data[altcoin] = crypto_price_df
Now we have a dictionary with 9 dataframes, each containing the historical daily average exchange prices between the altcoin and Bitcoin.
We can preview the last few rows of the Ethereum price table to make sure it looks ok.
altcoin_data['ETH'].tail()
date                 close     high      low       open      quoteVolume   volume       weightedAverage
2017-08-18 12:00:00  0.070510  0.071000  0.070170  0.070887  17364.271529  1224.762684  0.070533
2017-08-18 16:00:00  0.071595  0.072096  0.070004  0.070510  26644.018123  1893.136154  0.071053
2017-08-18 20:00:00  0.071321  0.072906  0.070482  0.071600  39655.127825  2841.549065  0.071657
2017-08-19 00:00:00  0.071447  0.071855  0.070868  0.071321  16116.922869  1150.361419  0.071376
2017-08-19 04:00:00  0.072323  0.072550  0.071292  0.071447  14425.571894  1039.596030  0.072066
Step 3.3 - Convert Prices to USD
Now we can combine this BTC-altcoin exchange rate data with our Bitcoin pricing index to directly calculate the historical USD values for each altcoin.
# Calculate USD Price as a new column in each altcoin dataframe
for altcoin in altcoin_data.keys():
    altcoin_data[altcoin]['price_usd'] = altcoin_data[altcoin]['weightedAverage'] * btc_usd_datasets['avg_btc_price_usd']
Here, we've created a new column in each altcoin dataframe with the USD prices for that coin.
Next, we can re-use our merge_dfs_on_column function from earlier to create a combined dataframe of the USD price for each cryptocurrency.
# Merge USD price of each altcoin into single dataframe
combined_df = merge_dfs_on_column(list(altcoin_data.values()), list(altcoin_data.keys()), 'price_usd')
Easy. Now let's also add the Bitcoin prices as a final column to the combined dataframe.
# Add BTC price to the dataframe
combined_df['BTC'] = btc_usd_datasets['avg_btc_price_usd']
Now we should have a single dataframe containing daily USD prices for the ten cryptocurrencies that we're examining.
Let's reuse our df_scatter function from earlier to chart all of the cryptocurrency prices against each other.
# Chart all of the altcoin prices
df_scatter(combined_df, 'Cryptocurrency Prices (USD)', seperate_y_axis=False, y_axis_label='Coin Value (USD)', scale='log')
Nice! This graph provides a pretty solid "big picture" view of how the exchange rates for each currency have varied over the past few years.
Note that we're using a logarithmic y-axis scale in order to compare all of the currencies on the same plot. You are welcome to try out different parameter values here (such as scale='linear') to get different perspectives on the data.
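For example, here is a variant of the same call using a linear scale and one y-axis per coin, with the parameters exactly as defined in our df_scatter helper above:

# Same chart with a linear scale and separate y-axes
df_scatter(combined_df, 'Cryptocurrency Prices (USD)', seperate_y_axis=True, y_axis_label='Coin Value (USD)', scale='linear')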
Step 3.4 - Perform Correlation Analysis
You might notice that the cryptocurrency exchange rates, despite their wildly different values and volatility, look slightly correlated. Especially since the spike in April 2017, even many of the smaller fluctuations appear to be occurring in sync across the entire market.
A visually-derived hunch is not much better than a guess until we have the stats to back it up.
We can test our correlation hypothesis using the Pandas corr() method, which computes a Pearson correlation coefficient for each column in the dataframe against each other column.
First we'll calculate correlations for 2016.
# Calculate the pearson correlation coefficients for cryptocurrencies in 2016
combined_df_2016 = combined_df[combined_df.index.year == 2016]
combined_df_2016.corr(method='pearson')
       DASH       ETC        ETH        LTC        SC         STR        XEM        XMR        XRP        BTC
DASH   1.000000   0.420150   0.627993   0.407754   0.590599   0.342005   0.638123   0.702146  -0.011410   0.642670
ETC    0.420150   1.000000   0.284055  -0.186238   0.643888  -0.138814   0.656708  -0.454081  -0.508740  -0.517899
ETH    0.627993   0.284055   1.000000   0.592820   0.688943   0.103136   0.518135   0.231452   0.140013   0.367577
LTC    0.407754  -0.186238   0.592820   1.000000   0.672518  -0.157133   0.447101   0.196907  -0.236687   0.616770
SC     0.590599   0.643888   0.688943   0.672518   1.000000   0.074235   0.860056   0.236015  -0.100052   0.559292
STR    0.342005  -0.138814   0.103136  -0.157133   0.074235   1.000000   0.200894   0.474046   0.493774   0.317864
XEM    0.638123   0.656708   0.518135   0.447101   0.860056   0.200894   1.000000   0.371288  -0.095796   0.610839
XMR    0.702146  -0.454081   0.231452   0.196907   0.236015   0.474046   0.371288   1.000000   0.154389   0.693027
XRP   -0.011410  -0.508740   0.140013  -0.236687  -0.100052   0.493774  -0.095796   0.154389   1.000000   0.006302
BTC    0.642670  -0.517899   0.367577   0.616770   0.559292   0.317864   0.610839   0.693027   0.006302   1.000000
These correlation coefficients are all over the place. Coefficients close to 1 or -1 mean that the series are strongly correlated or inversely correlated respectively, and coefficients close to zero mean that the values are not correlated, and fluctuate independently of each other.
To help visualize these results, we'll create one more helper visualization function.
def correlation_heatmap(df, title, absolute_bounds=True):
    '''Plot a correlation heatmap for the entire dataframe'''
    heatmap = go.Heatmap(
        z=df.corr(method='pearson').as_matrix(),
        x=df.columns,
        y=df.columns,
        colorbar=dict(title='Pearson Coefficient'),
    )
    layout = go.Layout(title=title)
    if absolute_bounds:
        heatmap['zmax'] = 1.0
        heatmap['zmin'] = -1.0
    fig = go.Figure(data=[heatmap], layout=layout)
    py.iplot(fig)
correlation_heatmap(combined_df_2016, "Cryptocurrency Correlations in 2016")
Here, the dark red values represent strong correlations (note that each currency is, obviously, strongly correlated with itself), and the dark blue values represent strong inverse correlations. All of the light blue/orange/gray/tan colors in-between represent varying degrees of weak/non-existent correlations.
What does this chart tell us? Essentially, it shows that there was little statistically significant linkage between how the prices of different cryptocurrencies fluctuated during 2016.
Now, to test our hypothesis that the cryptocurrencies have become more correlated in recent months, let's repeat the same test using only the data from 2017.
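Mirroring the 2016 calculation above:

# Calculate the pearson correlation coefficients for cryptocurrencies in 2017
combined_df_2017 = combined_df[combined_df.index.year == 2017]
combined_df_2017.corr(method='pearson')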
These are much more significant correlation coefficients.
correlation_heatmap(combined_df_2017, "Cryptocurrency Correlations in 2017")
Huh. That's rather interesting.
Note - I have been made aware that computing correlations directly on a non-stationary time series (such as raw pricing data) can give biased correlation values. I'll modify the tutorial code to reflect that in the next day or so, but until then, you could try using the pct_change method before computing the correlations (ex. combined_df_2017.pct_change().corr(method='pearson')), to use the daily return percentages instead of absolute price values. The result still shows a similar, but less extreme, trend.
Why is this happening?
Good question. I'm really not sure.
The most immediate explanation that comes to mind is that hedge funds have recently begun publicly trading in crypto-currency markets. These funds have vastly more capital to play with than the average trader, so if a fund is hedging their bets across multiple cryptocurrencies, and using similar trading strategies for each based on independent variables (say, the stock market), it could make sense that this trend would emerge.
In-Depth - STR and XRP
One noticeable trait of the above chart is that STR (the token for Stellar, officially known as "Lumens"), is the least correlated cryptocurrency. The notable exception here is with XRP (the token for Ripple), which has a very strong (0.955) correlation with STR.
What is significant here is that Stellar and Ripple are both somewhat similar fintech platforms aimed at reducing the friction of international money transfers between banks.
It is conceivable that some big-money players and hedge funds might be using similar trading strategies for their investments in Stellar and Ripple, due to the similarity of the blockchain services that use each token. This could explain why STR is so much more heavily correlated with XRP than with the other cryptocurrencies.
Quick Plug - I'm a contributor to Chipper, a (very) early-stage startup using Stellar with the aim of disrupting micro-remittances in Africa.
Your Turn
This explanation is, however, largely speculative. Maybe you can do better. With the foundation we've made here, there are hundreds of different paths to take to continue searching for stories within the data.
Here are some ideas:
Add data from more cryptocurrencies to the analysis.
Adjust the time frame and granularity of the correlation analysis, for a more fine or coarse grained view of the trends.
Search for trends in trading volume and/or blockchain mining data sets. The buy/sell volume ratios are likely more relevant than the raw price data if you want to predict future price fluctuations.
Add pricing data on stocks, commodities, and fiat currencies to determine which of them correlate with cryptocurrencies (but please remember the old adage that "Correlation does not imply causation").
Quantify the amount of "buzz" surrounding specific cryptocurrencies using Event Registry, GDELT, and Google Trends.
Train a predictive machine learning model on the data to predict tomorrow's prices. If you're more ambitious, you could even try doing this with a recurrent neural network (RNN).
Use your analysis to create an automated "Trading Bot" on a trading site such as Poloniex or Coinbase, using their respective trading APIs. Be careful: a poorly optimized trading bot is an easy way to lose your money quickly.
Share your findings! The best part of Bitcoin, and of cryptocurrencies in general, is that their decentralized nature makes them more free and democratic than virtually any other asset. Open source your analysis, participate in the community, maybe write a blog post about it.
An HTML version of the Python notebook is available here.
Hopefully, now you have the skills to do your own analysis and to think critically about any speculative cryptocurrency articles you might read in the future, especially those written without any data to back up the provided predictions.
Thanks for reading, and please comment below if you have any ideas, suggestions, or criticisms regarding this tutorial. If you find problems with the code, you can also feel free to open an issue in the Github repository here.
I've got a second (and potentially third) part in the works, which will likely follow through on some of the ideas listed above, so stay tuned for more in the coming weeks.
Go has been gaining significant popularity over the last few months. Language-related articles and blog posts are written every day. New Go projects are started on GitHub. Go conferences and meetups attract more and more people. This language certainly has its time now. It became the language of the year 2016 according to TIOBE and recently even made its way into their elite club of the 10 most popular languages in the world.
I came across Go a year ago and decided to give it a try. After spending some time with it I can say that it’s definitely a language worth learning. Even if you’re not planning to use it in the long run, playing with it for a while may help you to improve your programming skills in general. In this post I’d like to tell you about five things that I’ve learned with Go and found useful in other languages.
Gopher - Go’s mascot
1. It is possible to have both dynamic-like syntax and static safety
On a daily basis I work in Ruby and I really like its dynamic typing system. It makes the language easy to learn, easy to use, and allows programmers to write code very quickly. In my opinion, however, it works well mostly in smaller codebases. When my project starts to grow and becomes more and more complicated, I tend to miss the safety and reliability that statically typed languages provide. Even if I test my code carefully, it can always happen that I forget to cover some edge case and suddenly my object will appear in a context that I didn’t expect. Is it possible, then, to have a dynamic-like programming language without giving up static safety at the same time? I think so. Let me speak in Go code!
Go is not an object-oriented language. But it does have interfaces. And they are pretty much the same as these you can find in Java or C++. They have names and define a set of function signatures:
type Animal interface {
    Speak() string
}
Then we have Go’s equivalent of classes - structs. Structs are simple things that bundle together attributes:
type Dog struct {
    name string
}
Now we can add a function to the struct:
func (d Dog) Speak() string {
    return "Woof!"
}
It means that, from now on, you can invoke that function on any instance of the Dog struct.
This piece of code may seem strange at first. Why did we write it outside the struct? And what is this weird (d Dog) part before the function name? Let me explain. The authors of Go wanted to give users more flexibility by allowing them to add their logic to any structs they like. Even to the ones they’re not authors of (like some external libraries). Therefore they decided to keep functions outside the structs. And because the compiler needs to know which struct you’re extending, you have to specify its name explicitly and put it into this strange part called the receiver.
To use the above code we can write a function that simply takes Animal as an argument and calls its method.
func SaySomething(a Animal) {
    fmt.Println(a.Speak())
}
And as you can imagine we’re gonna put the Dog as an argument to the SaySomething function:
dog := Dog{name: "Charlie"}
SaySomething(dog)
“Very well”, you think, “but what do we need to do for the Dog to implement Animal interface?” Absolutely nothing, it’s done already! Go uses a concept called “automatic interface implementation”. A struct containing all methods defined in the interface automatically fulfills it. There is no implements keyword. Isn’t that cool? A friend of mine even likes to call it “a statically typed duck typing”, referring to the famous principle:
“If it quacks like a duck, then it probably is a duck”.
Thanks to that feature, and to type inference that allows us to omit the type of a variable when defining it, we can feel like we’re working in a dynamically typed language. But here we get the safety of a typed system too.
Why is this important? If your project is written in a dynamic, highly abstractive language, one day you may find out that some parts of it need to be rewritten in a lower-level, compiled language. I noticed however that it’s quite hard to convince a Ruby or Python programmer to start writing in a static language and ask them to give up the flexibility they had. But it may be easier to do with “statically-duck-typed” Go.
2. It’s better to compose than inherit
In my previous blog post I described a problem that we can run into if we use object-oriented features too much. I told a story of a client who initially asks for software that can be modeled with a single class and then gradually extends his concept, in a way that makes inheritance seem like a perfect answer to his increasing demands. Unfortunately, going that way led us to a huge tree of closely related classes where adding new logic, maintaining simplicity and avoiding code duplication was very hard.
My conclusion to that story was that if we want to mitigate the risk of getting lost inside the dark forest of code complexity, we need to avoid inheritance and prefer composition instead. I know however that it can be hard to change your mind from one paradigm to another. In my case the thing that helped me the most was writing code in a language that doesn’t support inheritance at all. You guessed it - that language was Go.
Go doesn’t have the concept of inheriting structs by design. The authors wanted to keep the language simple and clear. They didn’t find inheritance necessary, but they included a feature that is particularly useful when you want to use composition. In order to describe it, I’ll use an example taken from that other blog post.
Let’s say that we’re modeling a vehicle that can have different types of engines and bodies:
Let’s create two interfaces representing these features:
Now we need to create a Vehicle struct that will compose above interfaces:
type Vehicle struct {
    Engine
    Body
}
Can you see anything strange here? I deliberately omitted the names of these fields, using a feature called embedding. From now on, every single method existing in the embedded interface will also be visible directly on the Vehicle struct itself. That means that we can invoke, let’s say, a refill() function on any instance of Vehicle and Go will pass that through to the Engine implementation. We get a proper composition for free and we don’t need to add any explicit delegation boilerplate. That’s how it works in practice:
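// Illustrative types - the post does not name concrete implementations,
// so DieselEngine and SedanBody here are assumptions.
type DieselEngine struct{}

func (e DieselEngine) refill() {
    fmt.Println("Refilling diesel...")
}

type SedanBody struct{}

func (b SedanBody) openDoors() {
    fmt.Println("Doors open")
}

vehicle := Vehicle{Engine: DieselEngine{}, Body: SedanBody{}}
vehicle.refill() // forwarded to the embedded Engine implementation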
If you can’t switch your mind to prefer composition over inheritance in your object-oriented language - try Go and write something more complex than “hello world”. Because it doesn’t support inheritance at all, you’re gonna need to learn how to compose. Quickly.
3. Channels and goroutines are a powerful way to solve problems involving concurrency
Go has some really simple and cool tools that help you work with concurrency: channels and goroutines. What are they?
Goroutines are Go’s “green threads”. As you can imagine, they are not handled by an operating system, but by the Go scheduler that is included into each binary. And fortunately this scheduler is smart enough to automatically utilize all CPU cores. Goroutines are small and lightweight, therefore you can easily create many of them and get advanced parallelism for free.
A channel is a simple “pipe” you can use to connect goroutines together. You can take one, write something to one end and read it from the other end. Channels simply allow goroutines to communicate with each other in an asynchronous way.
Here is a quick example of how they can work together. Let’s imagine that we’ve got a function that runs a long computation and we don’t want it to block the whole program. This is what can be done:
func HeavyComputation(ch chan int32) {
    // long, serious math stuff
    ch <- result
}
As you can see, this function takes a channel in its list of arguments. Once it obtains a result it pushes the computed value directly to that channel.
Now let’s see how we can use it. First we need to create a new channel of type int32:
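ch := make(chan int32)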
Then we can call our heavy function:
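go HeavyComputation(ch)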
Here comes a bit of magic - the go keyword. You can put it in front of any function call. Go will then create a new goroutine with the same address space and use it to run the function. All of these happen in the background, so the execution will return immediately to allow you to do other things.
And that’s exactly what’s gonna happen in this case. The just created goroutine will live asynchronously doing its job and then it’ll send the result to the channel once ready. We can try to obtain the result in the following way:
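result := <-ch // blocks until HeavyComputation sends a value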
If the result is ready, we’ll get it immediately. Otherwise we’d block here until HeavyComputation finishes and writes back to the channel.
Goroutines and channels are simple, yet very powerful mechanisms for working with concurrency and parallelism. Once you learn them, you’ll get a fresh look at how to solve these kinds of problems. They offer an approach that is similar to the actor model known from languages and frameworks like Erlang and Akka, but I think they give more flexibility.
Programmers of other languages seem to start noticing their advantages. For instance, the authors of concurrent-ruby library, an unopinionated concurrency tools framework, ported Go’s channels directly to their project.
With that knowledge we can jump directly to the next paragraph.
4. Don’t communicate by sharing memory, share memory by communicating.
Traditional programming languages with their standard libraries (like C++, Java, Ruby or Python) encourage users to tackle concurrency problems in a way where many threads have access to the same shared memory. In order to synchronize them and avoid simultaneous access, programmers use locks. Locks prevent two threads from accessing a shared resource at the same time.
An example of this concept in Ruby may look like this:
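# A minimal sketch (not the post's original example): a counter shared
# between threads, guarded by a Mutex.
lock = Mutex.new
counter = 0

threads = 4.times.map do
  Thread.new do
    1000.times do
      # Only one thread may enter this critical section at a time
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter # => 4000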
Thanks to goroutines and channels Go programmers can take a different approach. Instead of using locks to control access to a shared resource, they can simply use channels to pass around its pointer. Then only a goroutine that holds the pointer can use it and make modifications to the shared structure.
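A rough sketch of that ownership style in Go (the Data type and its field are illustrative):

type Data struct {
    hits int
}

// The owner goroutine is the only one that ever mutates a Data value;
// everyone else just sends pointers down the channel.
func owner(ch chan *Data) {
    for d := range ch {
        d.hits++
    }
}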
There is a great explanation in Go’s documentation that helped me to understand this mechanism:
One way to think about this model is to consider a typical single-threaded program running on one CPU. It has no need for synchronization primitives. Now run another such instance; it too needs no synchronization. Now let those two communicate; if the communication is the synchronizer, there’s still no need for other synchronization.
This is definitely not a new idea, but somehow to many people a lock is still the default solution for any concurrency problem. Of course it doesn’t mean that locking is useless. It can be used to implement simple things, like an atomic counter. But for higher level abstractions it’s good to consider different techniques, like the one that authors of Go suggest.
5. There is nothing exceptional in exceptions
Programming languages that handle errors in the form of exceptions encourage users to think about them in a certain way. They are called “exceptions”, so something exceptional, extraordinary and uncommon must happen for the “exception” to be triggered, right? Maybe I shouldn’t care too much about it? Maybe I can just pretend it won’t happen?
Go is different, because it doesn’t have the concept of exceptions by design. It might look like a missing feature being sold as a feature, but it actually makes sense if you think about it for a while. In fact, there is nothing exceptional about exceptions. They are usually just one of the possible return values of a function. IO error during socket communication? It’s a network, so we need to be prepared. No space left on device? It happens, nobody has an unlimited hard drive. Database record not found? Well, that doesn’t sound like something impossible.
If errors are merely return values why should we treat them differently? We shouldn’t. Here is how they are handled in Go. Let’s try to open a file:
f, err := os.Open("filename.ext")
As you can see, this (and many other) Go functions return two values - the handler and the error. The whole safety check is as simple as comparing the error to nil. When the file is successfully opened we receive the handler, and the error is set to nil. Otherwise we can find the error struct there.
if err != nil {
    fmt.Println(err)
    return err
}
// do something with the file
To be honest I’m not sure if this is the most beautiful way of handling errors I’ve ever seen, but it definitely does a good job in encouraging programmer not to ignore them. You can’t simply omit assigning the second return value. In case you do, Go will complain:
multiple-value os.Open() in single-value context
Go will also force you to read it later at least once. Otherwise you’ll get another error:
err declared and not used
Regardless of the language that you use on a daily basis, it’s good to think about exceptions as if they were regular return values. Don’t pretend that they just won’t occur. Bad things usually happen at the least expected moment. Don’t leave your catch blocks empty.
Conclusion
Go is an interesting language that presents a different approach to writing code. It deliberately misses some features that we know from other languages, like inheritance or exceptions. Instead it encourages users to tackle problems with its own toolset. Therefore, if you want to write maintainable, clean and robust code, you have to start thinking in a different, Go-like way. This is however a good thing, since the skills that you learn here can be successfully used in other languages. Your mileage may vary, but I think that once you start playing with Go you’ll quickly find out that it actually helps you become a better programmer in general.
Disclaimer: these are my early thoughts. None of this is battle ready. You’ve been warned.
Hello, Coroutines!
At the recent C++ Committee meeting in Toronto, the Coroutines TS was forwarded to ISO for publication. That roughly means that the coroutine “feature branch” is finished, and is ready to be merged into trunk (standard C++) after a suitable vetting period (no less than a year). That puts it on target for C++20. What does that mean for idiomatic modern C++?
Lots, actually. With the resumable functions (aka, stackless coroutines) from the Coroutines TS, we can do away with callbacks, event loops, and future chaining (future.then()) in our asynchronous APIs. Instead, our APIs can return “awaitable” types. Programmers can then just use these APIs in a synchronous-looking style, spamming co_await in front of any async API call and returning an awaitable type.
This is a bit abstract, so this blog post makes it more concrete. It describes how the author wrapped the interface of libuv — a C library that provides the asynchronous I/O in Node.js — in awaitables. In libuv, all async APIs take a callback and loop on an internal event loop, invoking the callback when the operation completes. Wrapping the interfaces in awaitables makes for a much better experience, without the callbacks and the inversion of control they bring.
Below, for instance, is a function that (asynchronously) opens a file, reads from it, writes it to stdout, and closes it:
auto start_dump_file( const std::string& str )
    -> future_t<void>
{
    // We can use the same request object for
    // all file operations as they don't overlap.
    static_buf_t<1024> buffer;
    fs_t openreq;
    uv_file file = co_await fs_open(uv_default_loop(),
                                    &openreq,
                                    str.c_str(),
                                    O_RDONLY,
                                    0);
    if (file > 0)
    {
        while (1)
        {
            fs_t readreq;
            int result = co_await fs_read(uv_default_loop(),
                                          &readreq,
                                          file,
                                          &buffer,
                                          1,
                                          -1);
            if (result <= 0)
                break;
            buffer.len = result;
            fs_t req;
            (void) co_await fs_write(uv_default_loop(),
                                     &req,
                                     1 /*stdout*/,
                                     &buffer,
                                     1,
                                     -1);
        }
        fs_t closereq;
        (void) co_await fs_close(uv_default_loop(),
                                 &closereq,
                                 file);
    }
}
You can see that this looks almost exactly like ordinary synchronous code, with two exceptions:
Calls to asynchronous operations are preceded with co_await, and
The function returns an awaitable type (future_t<void>).
Very nice. But this code snippet does too much in my opinion. Wouldn’t it be nice to have a reusable component for asynchronously reading a file, separate from the bit about writing it to stdout? What would that even look like?
Hello, Ranges!
Also at the recent C++ Committee meeting in Toronto, the Ranges TS was forwarded to ISO for publication. This is the first baby step toward a complete reimagining and reimplementation of the C++ standard library in which interfaces are specified in terms of ranges in addition to iterators.
Once we have “range” as an abstraction, we can build range adaptors and build pipelines that transform ranges of values in interesting ways. More than just a curiosity, this is a very functional style that lets you program without a lot of state manipulation. The fewer states your program can be in, the easier it is for you to reason about your code, and the fewer bugs you’ll have. (For more on that, you can see my CppCon 2015 talk about ranges; or just look at the source for a simple app that prints a formatted calendar to stdout, and note the lack of loops, conditionals, and overt state manipulation.)
For instance, if we have a range of characters, we might want to lazily convert each character to lowercase. Using the range-v3 library, you can do the following:
std::string hello("Hello, World!");
using namespace ranges;
auto lower = hello
           | view::transform([](char c) {
                 return (char)std::tolower(c);
             });
Now lower presents a view of hello where each character is run through the tolower transform on the fly.
Although the range adaptors haven’t been standardized yet, the Committee has already put its stamp of approval on the overall direction, including adaptors and pipelines. (See N4128 for the ranges position paper.) Someday, these components will all be standard, and the C++ community can encourage their use in idiomatic modern C++.
Ranges + Coroutines == ?
With coroutines, ranges become even more powerful. For one thing, the co_yield keyword makes it trivial to define your own (synchronous) ranges. Already with range-v3 you can use the following code to define a range of all the integers and apply a filter to them:
#include <iostream>
#include <range/v3/all.hpp>
#include <range/v3/experimental/utility/generator.hpp>
using namespace ranges;

// Define a range of all the unsigned shorts:
experimental::generator<unsigned short> ushorts()
{
    unsigned short u = 0;
    do { co_yield u; } while (++u);
}

int main()
{
    // Filter all the even unsigned shorts:
    auto evens = ushorts()
               | view::filter([](auto i) {
                     return (i % 2) == 0;
                 });
    // Write the evens to cout:
    copy( evens, ostream_iterator<>(std::cout, "\n") );
}
Put the above code in a .cpp file, compile with a recent clang and -fcoroutines-ts -std=gnu++1z, and away you go. Congrats, you’re using coroutines and ranges together. This is a trivial example, but you get the idea.
Asynchronous Ranges
That’s great and all, but it’s not asynchronous, so who cares? If it were asynchronous, what would that look like? Moving to the first element of the range would be an awaitable operation, and then moving to every subsequent element would also be awaitable.
In the ranges world, moving to the first element of a range R is spelled “auto it = begin(R)”, and moving to subsequent elements is spelled “++it”. So for an asynchronous range, those two operations should be awaitable. In other words, given an asynchronous range R, we should be able to do:
// Consume a range asynchronously
for( auto it = co_await begin(R);
     it != end(R);
     co_await ++it )
{
    auto && e = *it;
    do_something( e );
}
In fact, the Coroutines TS anticipates this and has an asynchronous range-based for loop for just this abstraction. The above code can be rewritten:
// Same as above:
for co_await ( auto&& e : R )
{
    do_something( e );
}
Now we have two different but closely related abstractions: Range and AsynchronousRange. In the first, begin returns something that models an Iterator. In the second, begin returns an Awaitable of an AsynchronousIterator. What does that buy us?
Asynchronous Range Adaptors
Once we have an abstraction, we can program against that abstraction. Today we have a view::transform that knows how to operate on synchronous ranges. It can be extended to also work with asynchronous ranges. So can all the other range adaptors: filter, join, chunk, group_by, interleave, transpose, etc, etc. So it will be possible to build a pipeline of operations, and apply the pipeline to a synchronous range to get a (lazy) synchronous transformation, and apply the same exact pipeline to an asynchronous range to get a non-blocking asynchronous transformation. The benefits are:
The same functional style can be used for synchronous and asynchronous code, reusing the same components and the same idioms.
Asynchronous code, when expressed with ranges and transformations, can be made largely stateless, as can be done today with synchronous range-based code. This leads to programs with fewer states and hence fewer state-related bugs.
Range-based code composes very well and encourages a decomposition of problems into orthogonal pieces which are easily testable in isolation. (E.g., a view::filter component can be used with any input range, synchronous or asynchronous, and can be easily tested in isolation of any particular range.)
Another way to look at this is that synchronous ranges are an example of a pull-based interface: the user extracts elements from the range and processes them one at a time. Asynchronous ranges, on the other hand, represent more of a push-based model: things happen when data shows up, whenever that may be. This is akin to the reactive style of programming.
By using ranges and coroutines together, we unify push and pull based idioms into a consistent, functional style of programming. And that’s going to be important, I think.
Back to LibUV
Earlier, we wondered about a reusable libuv component that used its asynchronous operations to read a file. Now we know what such a component could look like: an asynchronous range. Let’s start with an asynchronous range of characters. (Here I’m glossing over the fact that libuv deals with UTF-8, not ASCII. I’m also ignoring errors, which is another can of worms.)
auto async_file( const std::string& str )
  -> async_generator<char>
{
    // We can use the same request object for
    // all file operations as they don't overlap.
    static_buf_t<1024> buffer;
    fs_t openreq;
    uv_file file = co_await fs_open(uv_default_loop(),
                                    &openreq,
                                    str.c_str(),
                                    O_RDONLY,
                                    0);
    if (file > 0)
    {
        while (1)
        {
            fs_t readreq;
            int result = co_await fs_read(uv_default_loop(), &readreq,
                                          file, &buffer,
                                          1,
                                          -1);
            if (result <= 0)
                break;
            // Yield the characters one at a time.
            for ( int i = 0; i < result; ++i )
            {
                co_yield buffer.buffer[i];
            }
        }
        fs_t closereq;
        (void) co_await fs_close(uv_default_loop(), &closereq,
                                 file);
    }
}
The async_file function above asynchronously reads a block of text from the file and then co_yields the individual characters one at a time. The result is an asynchronous range of characters: async_generator<char>. (For an implementation of async_generator, look in Lewis Baker’s cppcoro library.)
Now that we have an asynchronous range of characters representing the file, we can apply transformations to it. For instance, we could convert all the characters to lowercase:
// Create an asynchronous range of characters read
// from a file and lower-cased:
auto async_lower = async_file("some_input.txt")
                 | view::transform([](char c) {
                       return (char)std::tolower(c);
                   });
That’s the same transformation we applied above to a std::string synchronously, but here it’s used asynchronously. Such an asynchronous range can then be passed through further transforms, asynchronously written out, or passed to an asynchronous std:: algorithm (because we’ll need those, too!).
One More Thing
I hear you saying, “Processing a file one character at a time like this would be too slow! I want to operate on chunks.” The above async_file function is still doing too much. It should be an asynchronous range of chunks. Let’s try again:
auto async_file_chunk( const std::string& str )
  -> async_generator<static_buf_t<1024>&>
{
    // We can use the same request object for
    // all file operations as they don't overlap.
    static_buf_t<1024> buffer;
    fs_t openreq;
    uv_file file = co_await fs_open(uv_default_loop(),
                                    &openreq,
                                    str.c_str(),
                                    O_RDONLY,
                                    0);
    if (file > 0)
    {
        while (1)
        {
            fs_t readreq;
            int result = co_await fs_read(uv_default_loop(), &readreq,
                                          file, &buffer,
                                          1,
                                          -1);
            if (result <= 0)
                break;
            // Just yield the buffer.
            buffer.len = result;
            co_yield buffer;
        }
        fs_t closereq;
        (void) co_await fs_close(uv_default_loop(), &closereq,
                                 file);
    }
}
Now if I want to, I can asynchronously read a block and asynchronously write the block, as the original code was doing, but while keeping those components separate, as they should be.
For some uses, a flattened view would be more convenient. No problem. That’s what the adaptors are for. If static_buf_t is a (synchronous) range of characters, we already have the tools we need:
// Create an asynchronous range of characters read from a
// chunked file and lower-cased:
auto async_lower = async_file_chunk("some_input.txt")
                 | view::join
                 | view::transform([](char c) {
                       return (char)std::tolower(c);
                   });
Notice the addition of view::join. Its job is to take a range of ranges and flatten it. Let’s see what joining an asynchronous range might look like:
template <class AsyncRange>
auto async_join( AsyncRange&& rng )
  -> async_generator<range_value_t<
         async_range_value_t<AsyncRange>>>
{
    for co_await ( auto&& chunk : rng )
    {
        for ( auto&& e : chunk )
            co_yield e;
    }
}
We (asynchronously) loop over the outer range, then (synchronously) loop over the inner ranges, and co_yield each value. Pretty easy. From there, it’s just a matter of rigging up operator| to async_join to make joining work in pipelines. (A fully generic view::join will be more complicated than that since both the inner and outer ranges can be either synchronous or asynchronous, but this suffices for now.)
Summary
With ranges and coroutines together, we can unify the push and pull programming idioms, bringing ordinary C++ and reactive C++ closer together. The C++ Standard Library is already evolving in this direction, and I’m working to make that happen both on the Committee and internally at Facebook.
There are LOTS of open questions. How well does this perform at runtime? Does this scale? Is it flexible enough to handle lots of interesting use cases? How do we handle errors in the middle of an asynchronous pipeline? What about splits and joins in the async call graph? Can this handle streaming interfaces? And so on. I’ll be looking into all this, but at least for now I have a promising direction, and that’s fun.
The first video in a series of instructional tapes about the PDP-11. The videos are captures of VHS tapes that are over 25 years old; the tapes were made by filming the projection slides from the special projection device used by individual students following the course.
With 124 companies presenting over the next two days, this is the largest Demo Day in YC’s 12.5 year history.
We wanted to share some stats about YC and about the batch. For the more visually inclined, see the graphic below created by YC alum SketchDeck (YC W14).
YC has funded 1,430 companies since 2005, and almost 3,500 founders.
The total market cap of all YC companies is over $85B.
10 YC alums are valued at over $1B and 64 YC alums are valued at over $100M.
YC alums have raised over $13B in total.
Total Companies Presenting at YC S17 Demo Day: 124
COUNTRIES REPRESENTED IN YC S17 – 28% of the companies in the batch come from outside the US: Austria, Brazil, Canada, Colombia, Denmark, France, India, Indonesia, Israel, Malaysia, Mexico, Netherlands, New Zealand, Nigeria, Russia, South Africa, UK, USA
UNIVERSITIES REPRESENTED
Founders from 72 universities are represented in YC S17. Founders from 190 universities applied to YC S17.
AVERAGE AGE
Avg age of S17 founders: 29.5
Avg age of S17 applicants: 30.81
UNDERREPRESENTED FOUNDERS
S17 Female founders: 35 (12% of all S17 founders)
S17 Applicants who were female: 14%
S17 Companies with a female founder: 27 (21% of the batch)
S17 Applications with a female founder: 22%
S17 Black founders: 14 (4.76% of S17 founders)
S17 Companies with a Black/Af-Am founder: 8 (6.25% of the batch)
S17 Hispanic/Latinx founders: 13 (4.42% of S17 founders)
S17 Companies with a Hispanic/Latinx founder: 7 (5.47% of the batch)
(We haven’t asked for race/ethnicity on our applications, which is why you don’t see application percentages for Black/Latinx founders.)
Inspired by the always-incredible work on Dolphin,
I decided to write myself an NES emulator
called Corrosion a couple years ago. I managed to get it working well enough to
play basic games, and then put the project aside. This post is not about the
emulator itself, but rather the JIT compiler I added to it last year and the
upgrades to said JIT compiler I’ve made over the past few weeks.
Having read that, you might be wondering “Why would anybody write a JIT compiler
for the NES?” Indeed, it’s a reasonable question. Unlike newer consoles, it’s
quite feasible to emulate the NES’s modified 6502 CPU at full speed with a
simple interpreter. As with most of the projects I write about here, I wanted
to know how they work, so I built one. Having done so, I can say that I would
not recommend JIT compilation for production-quality NES emulators except in
severely resource-constrained environments. However, I would strongly recommend
this project for anyone who wants to learn more about JIT compilers, as it’s
complex enough to be challenging but simple enough to be manageable.
This is more of a post-mortem article covering the design of my JIT compiler,
the pitfalls I ran into and the mistakes I made in construction and what I’ve
learned from the process. It is not a tutorial on how to write your own JIT
compiler, though there are some links that cover that in more detail at the end.
The emulator is written in Rust, but you don’t need to know Rust to follow along.
Most of the concepts will map to other low-level languages like C or C++. An
understanding of x64 assembly would be helpful, but again, not required - I
didn’t know much assembly starting this project, and even now my assembly is
pretty weak.
Basics of JIT Compilation
Just to make sure everyone’s on the same page, a quick interlude on how JIT
compilers work at a high level. If you’re familiar with this already, feel free
to skip ahead.
Broadly speaking, a JIT (or just-in-time) compiler is a piece of code that
translates some kind of program code into machine instructions for the host CPU.
The difference between a JIT compiler and a regular compiler is that a JIT
compiler performs this translation at runtime (hence just-in-time) rather than
compiling the code and saving a binary for later execution.
For emulation, the original program code is typically the binary machine code
that was intended for the emulated CPU (in this case the NES’ 6502 CPU). However,
JIT compilers are used for many other kinds of programs. Examples include the
JIT compilers used by modern browsers to run Javascript, the Hotspot compiler
in the JVM and dynamic language runtimes like PyPy and LuaJIT.
JIT compilers are used primarily to speed up execution. A standard interpreter
must fetch, decode and execute instructions one at a time. Even in a relatively
fast language like Rust or C, this incurs some overhead. A JIT compiler, on the
other hand, can be run once and emit a blob of machine code which executes an
entire emulated function (or more) in one sequence of instructions. Eliminating
that overhead often greatly improves execution speed. However, since the
compilation is done at runtime, care must be taken that the JIT compiler itself
doesn’t run slowly enough to cause performance problems, where an ahead-of-time
(AOT) compiler can spend much more time optimizing the code it generates.
A JIT compiler typically parses some chunk of code, performs any analysis
it needs to, and then generates binary machine code for the host CPU into a
code buffer. Modern OS’s require these code buffers to be marked as read-only
and executable before they can be executed, but once this is done the generated
code can be executed by jumping the host processor to the beginning of the buffer
just like any normal function. Some more sophisticated JIT compilers will
translate the source language into some intermediate in-memory representation
for further processing before emitting the final machine code.
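To make that concrete, here is a minimal, Unix-only sketch of the code-buffer
dance using the libc crate. It illustrates the general technique described
above and is not code from Corrosion: allocate writable memory, copy in the
generated bytes, mark the page executable, and jump to it.

use std::{mem, ptr};

fn run_generated(code: &[u8]) -> i64 {
    unsafe {
        // Reserve a writable region for the generated machine code.
        let buf = libc::mmap(
            ptr::null_mut(),
            code.len(),
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANON,
            -1,
            0,
        );
        assert_ne!(buf, libc::MAP_FAILED);
        ptr::copy_nonoverlapping(code.as_ptr(), buf as *mut u8, code.len());
        // Flip the region to read + execute, then call it like a function.
        libc::mprotect(buf, code.len(), libc::PROT_READ | libc::PROT_EXEC);
        let f: extern "C" fn() -> i64 = mem::transmute(buf);
        let result = f();
        libc::munmap(buf, code.len());
        result
    }
}

fn main() {
    // x64 for: mov rax, 42 ; ret
    let code = [0x48, 0xC7, 0xC0, 0x2A, 0x00, 0x00, 0x00, 0xC3];
    assert_eq!(run_generated(&code), 42);
}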
As a simple example, consider the following 6502 code:
LDA $1A // Load byte from RAM at 0x001A into A register
ADC #$20 // Add 0x20 to the A register
STA $1A // Store A register into RAM at 0x001A
This might be translated into the following (simplified) x64 code:
MOV r9b, [rdx + 1Ah] // Load byte from RAM array pointed to by rdx into r9b
ADC r9b, 20h // Add 0x20 to r9b, which represents the A register
MOV [rdx + 1Ah], r9b // Store the result back into the RAM array at 0x001A
Note that I’ve omitted things like processor flags and interrupts from this
example.
Design of Corrosion’s JIT
Corrosion has a relatively simplistic JIT compiler. It has no intermediate
representation or register allocator, which might be found in more sophisticated
JIT compilers - Dolphin’s PPC JIT has a register allocator, while David Sharp’s
Tarmac ARM emulator features an IR called Armlets (see links at the end).
Since machine code is typically a binary format too complex for humans to write
directly, most JIT compilers also devote much code to translating some
assembly-like syntax or DSL used by the developers into the bytes that are given
to the host CPU. Fortunately for me, there is an extremely useful compiler plugin
by CensoredUsername called dynasm-rs
which can parse an Intel-assembly-like syntax and perform most of the assembly
at compile time. I would recommend that any Rust-based JIT compiler author
check out this plugin; I’ve found it to work well, with no bugs to speak of,
and CensoredUsername was very helpful about answering my silly questions when I asked.
The only limitation is that it currently only supports the x64 instruction set,
though x86 support is planned. For those who prefer C/C++, there is a similar
tool called DynASM, though I can’t
comment on that as I’ve never used it myself.
The entry point to the JIT compiler in Corrosion is the dispatcher module.
When the CPU interpreter detects that it’s executing an address from the ROM,
it makes a call to the dispatcher to compile (if necessary) and execute the
relevant block of code. The dispatcher is responsible for managing the cache
of generated code blocks and calling to the JIT compiler to generate more code
when necessary.
If the dispatcher doesn’t have an existing generated code block for a particular
location in ROM, the nes_analyst module
is used to collect information about the code to be compiled. The primary
responsibility of nes_analyst is to determine where the end of the current
function is and collect information about the instructions it contains.
This is done using a very simplistic algorithm that I copied from Dolphin. It
decodes instructions until it finds the first unconditional exit point (eg.
returns, jumps or calls to other functions). To ignore the conditional exit
points, it tracks the target address of the farthest forward-facing branch it’s
seen; any exit point before that is conditional. This approach does occasionally
overestimate the length of the actual function, but it’s simple and fast.
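Here is a sketch of that scan. The Insn type and the decode callback are
illustrative stand-ins for Corrosion’s real decoder, and error handling is
ignored.

enum Insn {
    Return,                 // RTS/RTI: unconditional exit
    Jump,                   // JMP/JSR: unconditional exit
    Branch { target: u16 }, // conditional branch
    Other,
}

// Returns the address just past the end of the block starting at `start`.
fn find_block_end(decode: impl Fn(u16) -> (Insn, u16), start: u16) -> u16 {
    let mut pc = start;
    let mut farthest_branch_target = start;
    loop {
        let (insn, next_pc) = decode(pc);
        match insn {
            // A branch may skip past this point, so any exit before its
            // target is only conditional.
            Insn::Branch { target } if target > farthest_branch_target => {
                farthest_branch_target = target;
            }
            // The first unconditional exit at or past every known branch
            // target ends the block.
            Insn::Return | Insn::Jump if pc >= farthest_branch_target => {
                return next_pc;
            }
            _ => {}
        }
        pc = next_pc;
    }
}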
The nes_analyst module is also responsible for identifying which instructions
are the targets of branches and which instructions change or use which processor
flags, which is used later in the compilation process. Decoding opcodes is done
using the decode_opcode! macro which expands to a giant match structure that
calls the appropriate functions. decode_opcode! has handling for the various
addressing modes which we don’t really need here, so there is some clutter,
but it works well enough.
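In spirit, the expansion looks like the sketch below; the function names are
illustrative (the real macro also threads the JIT state through each arm),
and the CPX opcodes shown are discussed later in this post.

fn decode_opcode(op: u8) {
    match op {
        // One arm per opcode: each addressing-mode variant of an
        // instruction gets its own handler.
        0xE0 => cpx_immediate(),
        0xE4 => cpx_zero_page(),
        0xEC => cpx_absolute(),
        // ...and so on for the rest of the opcode table...
        _ => unimplemented!("unhandled opcode {:02X}", op),
    }
}

fn cpx_immediate() { /* ... */ }
fn cpx_zero_page() { /* ... */ }
fn cpx_absolute() { /* ... */ }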
As mentioned earlier, Corrosion doesn’t have a register allocator. It’s quite
common for emulated CPUs to have more registers than the host CPU, especially
since many JIT compilers run on the relatively register-light x86
and x64 instruction sets. As a result, they need to do the extra step of
determining which emulated registers should be represented by host registers
and which should be stored in memory at any given point in the code. Conveniently,
the NES’s 6502 CPU has even fewer registers than x64 does, which means we can
statically assign one x64 register to represent each 6502 register and have a
few left over to store things like the pointers to the Rust CPU structure and
the array which stores the emulated RAM, as well as a few more for general-purpose
scratch memory.
Most 6502 instructions come in various different flavors called addressing modes,
which control where they take some of their data from. Take the CPX (ComPare X)
instruction as an example. This instruction compares the value in the X register
to a one-byte operand, setting the N (sign), Z (zero), and C (carry) flags.
If the opcode is 0xE0, the operand is a one-byte immediate value stored
right after the opcode. If the opcode is 0xE4, the next byte is instead
zero-extended to 16 bits and used as an address into RAM. This mode is called
the zero-page mode, and it can only access the first 256 bytes of RAM, which are
called the Zero Page. The byte at the selected location is used for the
comparison. Finally, if the opcode is 0xEC, the next two bytes (little-endian)
are used as an absolute address into memory and whichever byte they select is used.
If you’re wondering, zero page instructions are one byte smaller and slightly
faster than absolute instructions, which matters when you have a 64k address
space and a 1.79MHz CPU.
There are a number of other addressing modes, but this should suffice to explain
the concept. I could have written hand-tuned machine code for all 256 possible
opcodes, but I’m a lazy programmer, so instead I wrote a collection of routines
that generate code to move the appropriate byte into one of my scratch registers
(r8). That way, I can call the routine appropriate for the addressing mode to load
the operand into r8, then define the instruction code to take it from there.
Likewise, when writing to memory, I can move the value to be written into r8
and call a routine to generate the instructions to transfer that value into the
appropriate location in memory. It’s slightly less efficient at runtime because
I have to move data through an intermediate register instead of using it
directly, but it saved a lot of my time.
Slight aside - I was a bit surprised by how small the difference is between
writing code to implement something and writing code that generates a program
to implement something. I’ll use CPX as an example again - this is some code
from an earlier version of the JIT:
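(The snippet itself didn’t survive in this copy of the post. Below is a
hedged reconstruction based on the description that follows; the Jit type and
the emit_* helpers are illustrative stand-ins for the real code generators,
and the ordering in the original may have differed.)

struct Jit { /* assembler state elided */ }

enum AddrMode { Immediate, ZeroPage, Absolute }

impl Jit {
    // Addressing-mode routine: emit code that loads the operand into r8.
    fn emit_load_operand(&mut self, _mode: AddrMode) { /* ... */ }
    // Emit a compare of the operand in r8 against the host register that
    // statically holds the 6502 X register.
    fn emit_compare_with_x(&mut self) { /* ... */ }
    // Emit a branch on the host flags that sets or clears the emulated carry.
    fn emit_set_or_clear_carry(&mut self) { /* ... */ }
    // Emit the calls that update the emulated sign and zero flags.
    fn emit_update_sign_zero(&mut self) { /* ... */ }

    fn cpx(&mut self, mode: AddrMode) {
        self.emit_load_operand(mode);
        self.emit_compare_with_x();
        self.emit_set_or_clear_carry();
        self.emit_update_sign_zero();
    }
}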
If I were actually writing this in assembly, this reads like pretty much how
I’d do it - call the function for the appropriate addressing mode to load the
operand, do some branching to set or clear the carry flag, compare the operand
against the X register and call some functions to update the sign and zero
flags. In fact, that’s exactly how the interpreter handles this instruction.
Instead, I’m calling a function to generate the code to load the operand,
generating code to do the comparison and update the flags, etc. Despite that
extra layer of indirection, though, it reads pretty much the same. Because
of this, implementing all of the instructions was as straightforward as
translating my Rust code into assembly. I’m not actually that good with
assembly, so my code will probably make experienced assembly programmers cry.
Still, it does the job. With that said, I would be interested in ideas for
making it better if anyone would care to share links or suggestions.
Enhancements
That brings me up to the present, more or less. Over the past few weeks I’ve
been working on some ‘optimizations’ to the JIT compiler. I write that in quotes
because for the most part I can’t actually detect any measurable change in
execution speed for these, but they were somewhat interesting to implement.
The first such enhancement that I added was redundant flag elimination. This was
actually really easy and I probably should have done it from the start. The idea
here is that a good chunk of the code emitted by a JIT compiler (at least for
emulators) does nothing but implement the various flag behaviors of the emulated
CPU (eg. setting the overflow flag when an addition overflows). To some extent,
a clever JIT compiler author can exploit similar flags in the host CPU to
accomplish this with fewer instructions, but it’s still there. If you look at
documents detailing the 6502’s instruction set,
you’ll quickly see that many instructions change the flags in some way,
but very few instructions use them. What this means is that a typical program
will overwrite processor flags far more often than they’re actually used.
Interpreters sometimes take advantage of this by not storing the flags at all,
and instead storing enough data to calculate the flags and then evaluating them
lazily when needed. A JIT compiler, however, can go one step further and analyze
every instruction to see if that flag value will be used before it’s overwritten
by another instruction. If not, it doesn’t emit the machine code to update the
flag.
The way I implemented this is to have nes_analyst keep track of the last
instruction to change each flag while it’s stepping through a function. Then
when it hits an instruction that uses a flag, it looks up the InstructionAnalysis
structure for the last instruction to set the flag, which contains a set of
booleans indicating whether each flag will be used. Since we now know that that
instruction’s flag will be used and not overwritten, we set the appropriate
boolean to true, signaling the JIT compiler to emit code to update that flag
later on.
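Here is a sketch of that one-pass analysis, with illustrative type and field
names rather than Corrosion’s actual ones. Only the zero flag is shown; each
flag is tracked the same way.

#[derive(Default)]
struct InstructionAnalysis {
    writes_zero: bool, // does this instruction overwrite the Z flag?
    reads_zero: bool,  // does this instruction observe the Z flag?
    zero_needed: bool, // must the JIT emit code to compute Z here?
}

fn mark_needed_flags(instrs: &mut [InstructionAnalysis]) {
    // Index of the most recent instruction to overwrite the zero flag.
    let mut last_zero_writer: Option<usize> = None;
    for i in 0..instrs.len() {
        if instrs[i].reads_zero {
            // The last writer's Z value is observed before being
            // overwritten, so code must be emitted to compute it.
            if let Some(w) = last_zero_writer {
                instrs[w].zero_needed = true;
            }
        }
        if instrs[i].writes_zero {
            last_zero_writer = Some(i);
        }
    }
}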
There are a few pitfalls with this approach. For instance, if a branch is taken
or if execution hits a jump instruction, we can’t know if the code it jumps to
will rely on this flag. If so, this optimization could break. A more
sophisticated analysis could probably detect that, for at least some cases.
This one-pass algorithm can’t, so to be on the safe side it assumes that jump
and branch instructions use all of the flags. Likewise, when an interrupt
occurs, the NES pushes the flags and the return address on the stack. Since an
interrupt can occur at any time, there’s no way to be sure that the flags byte
it pushes on the stack will be correct. I don’t have a solution to this except
to assume that no game will break because of the exact value of the flags byte
on the stack. This seems like a safe assumption: since interrupts can happen at
any time, a game would have a hard time predicting what the flags should look
like when one happened anyway. Something to be aware of, though.
The initial version of my JIT compiler emitted a fixed series of instructions
(a function prologue) at the beginning of every compiled block which rearranged
the arguments from the win64 calling convention and loaded all of the NES
register values out of memory into the designated x64 registers. Then, at every
possible exit point from the block, it would emit some code (the epilogue) to
do the reverse; store the register values back in memory and return control
back to the interpreter. This means we can’t just jump to the middle of a
compiled function - we’ll skip over the prologue and crash. Therefore, if some
other code tries to jump into the middle of a function, we need to compile that
function suffix as a complete function of its own, with its own prologue and
epilogues. Also, these duplicate prologues and epilogues take up space in the
instruction cache, which could reduce performance.
Instead, I’ve changed it to use a trampoline; this is an ordinary Rust function
taking the pointer to the compiled code to jump to as well as the pointers to
the CPU structure and the RAM array. It contains an asm! macro which defines
the assembly instructions to load the registers from memory, call the compiled
block and then store the updated registers back into memory. Since we now only
have one global ‘prologue/epilogue’ shared between all compiled code blocks, we
can then call directly into the middle of an existing block with no trouble.
Another problem with the prologue/epilogue design was that compiled blocks
couldn’t easily call each other; the JIT would have to store everything back in
memory to prepare for the prologue to be run again, or know how to jump past
the prologue or something else complicated. With a trampoline-based design,
it’s easy to jump to another block - everything’s already loaded into the
appropriate registers, so you can just jump the host processor to the beginning
of the target block. One wrinkle is that you need to be careful not to link
together blocks from different banks of ROM, since one bank could be switched
out and now your code is jumping to the wrong place.
Challenges
Speaking of that trampoline function, I did run into some difficulty implementing
it. The trampoline function needs to transfer values from a struct in memory
to and from registers. It takes a pointer to a CPU struct as an argument, but
that alone isn’t enough; Rust can rearrange and pad the fields however it likes,
so I needed a way to get the offset of each field from the pointer to the CPU.
C/C++ programmers can use the offsetof macro, but Rust has no official way to
calculate the offset of a field within a structure. The layout of Rust structures
isn’t even guaranteed to be the same from release to release - in fact, it was
changed just a few months ago in version 1.18. I could have marked the CPU
struct with repr(C) to force it to use the C layout and used hard-coded
offsets, but that
felt inelegant. I would have needed to update the offsets every time I modified
the CPU struct, for one thing. Instead, I found a macro online that can calculate
the offset of any field in a structure.
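The macro itself isn’t reproduced in this copy of the post; it presumably
followed the classic C-style pattern, which matches the description below.

// Compute the byte offset of $field within $ty. Use at your own risk.
macro_rules! offset_of {
    ($ty:ty, $field:ident) => {
        unsafe { &(*(0 as *const $ty)).$field as *const _ as usize }
    };
}

struct Cpu { a: u8, x: u8, y: u8, pc: u16 }

fn main() {
    // Prints the offset of the pc field within Cpu.
    println!("{}", offset_of!(Cpu, pc));
}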
This works by casting 0 (NULL) to a raw pointer to a $ty structure,
dereferencing it, taking a reference to the field and casting that pointer back
to a usize. As far as I can tell, this is actually safe and should be entirely
evaluated at compile time, but it still needs to be wrapped in an unsafe block
anyway. Use at your own risk, etc. etc. It’s pretty easy to add more macros to
calculate offsets with multiple levels of nesting - see offset_of_2 in
x86_64_compiler/mod.rs for an example. One drawback of this is that it can’t
be used for static values - it’s forbidden to dereference null pointers when
initializing static values, even with unsafe. Because of that, I didn’t think
it would work with the asm! macro’s n value constraint (meaning constant
integer) but it totally does. Still, it’d be really nice if this was something
Rust supported out of the box.
Another challenge I ran into while implementing this is dealing with some
quirks of the win64 calling convention. Rust, you see, does not have a defined
calling convention, so there’s no reliable way to call directly into Rust code
from assembly. Instead, you expose a function marked extern "win64" or
similar which then calls the function you actually want. This way, you set up
your code to be compatible with the chosen calling convention - pushing
caller-saved registers on the stack, placing arguments in the right registers -
and leave Rust to handle the translation to its own internal calling
convention. The win64 convention is one of two 64-bit calling conventions
supported - the other one, sysv64, is still experimental and requires a special
feature flag even on nightly. The JIT compiler needs to call back into Rust
code to handle things like reading and writing memory-mapped devices like the
PPU or the controllers. Unfortunately, win64 is slightly difficult to work
with. It requires that the stack pointer be 16-byte aligned at the entry to
every function, and that the caller provide a 32-byte empty space on the stack
before the return address for the callee to use as scratch space. Failure to do
this correctly causes hard-to-debug segfaults. In my code, I don’t have many
places where I call back to Rust code, and the generated code doesn’t use the
stack very much, so I deal with this by just hard-coding the number of bytes of
space to leave on the stack. It’s not ideal (if I had more complex requirements
I might add a trampoline_to_win64 function to match trampoline_to_nes), but
other JIT compiler authors should be aware of it.
Next up, debugging. Debugging a JIT compiler sucks even more than debugging an
interpreter. Debugging tools largely just don’t handle runtime-generated
machine code. Visual Studio, despite having a quite competent disassembly view,
just will not step into a generated code block. GDB’s disassembly view will at
least display the generated code and let you scroll downwards through it, but
not back upwards (I guess because it doesn’t know which byte to start
disassembling from, but it could at least allow you to scroll back up to the
program counter). GDB also fails to insert breakpoints into generated code
blocks even when you give it the address of the instruction to break at. GDB
has some sort of interface for exposing debugging info for JIT-compiled code,
but I wasn’t able to make much sense of it. Apparently it relies on the JIT
compiler generating and emitting a complete ELF table in memory for the
generated code, which sounds like a lot of hassle. Anyway, in the absence of a
debugger, good old println-debugging is your best friend. This is complicated
by the fact that you have to insert your debug output into the generated code
at runtime, but I’d strongly suggest you find a way. I wish I had done this
earlier; it would have saved me a ton of debugging time.
Handling interrupts also proved to be something of a challenge. The NES has
very tight synchronization between the CPU and the other hardware, which
includes interrupts. I had hoped there would be some clever way to implement
interrupts without just checking if there had been an interrupt before every
emulated instruction, but I couldn’t find one. This is part of why the
duplicate epilogues were a problem, in fact; every emulated instruction was
preceded by an implicit exit point, so there were a lot of redundant epilogues.
The best I could come up with was to store a cycle count representing when the
next interrupt would occur and then compare that against the actual cycle count
before every instruction. This sort of works, because the hardware interrupts
of the NES are entirely predictable, but it probably wouldn’t work for other
systems. On the other hand, other systems probably don’t require such tight
timing for the interrupts, so if you’re writing a JIT you might be able to get
away with only checking for interrupts once every 10 emulated instructions or
something.
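As a sketch, with illustrative field names, the check performed before each
emulated instruction amounts to a single compare:

struct CpuState {
    cycle_count: u64,          // cycles executed so far
    next_interrupt_cycle: u64, // precomputed time of the next interrupt
}

// Conceptually emitted before every instruction: bail out of the compiled
// block once the next (predictable) interrupt is due.
fn interrupt_due(cpu: &CpuState) -> bool {
    cpu.cycle_count >= cpu.next_interrupt_cycle
}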
As I mentioned in my post on parsing iNES ROM headers,
the NES only has 32k of address space mapped to the ROM. Some games take up
more than a megabyte of ROM space, so NES cartridges incorporate circuitry so
that the game can map banks of the ROM in and out of the address space.
Implementing the bankswitching logic is one thing, but this allows for the
possibility of self-modifying code even if you only use the JIT compiler when
executing from ROM. There are all sorts of wacky corner cases this enables -
what if the bank you’re executing is switched out between instructions? What if
half of a block is on one bank and the other is on the next bank, then the
second half gets switched out? If you then execute a generated code block that
compiled in the instructions from the original bank, the game will probably
break. You could even have a multi-byte instruction on a bank boundary, such
that the last byte of the instruction depends on which bank is mapped in. I’ll
be honest, I didn’t solve this problem. Corrosion just assumes that no game
will do strange stuff like this. Initially, I took a much more conservative
approach and deleted all of the compiled code for a bank whenever it was
switched out. This was a mistake; games like Legend of Zelda bankswitch
frequently enough that the emulator was constantly recompiling sections of code
that it had already compiled before. Major respect for the developers of other
JIT-based emulators - dealing with arbitrary self-modifying code, especially in
situations where you have an instruction cache and/or pipelining, must be a
nightmare.
Other resources & conclusion
Well, that’s about it from me. This was a bit more stream-of-consciousness than
my posts usually are, since I was writing about something I made a while ago.
I normally write my posts concurrent with working on the projects they cover.
I hope you found it interesting and/or educational. I’ll leave you with some
links to other resources that I used or wish that I’d known about when I was
building this thing.
First off, Eli Bendersky’s Adventures In JIT Compilation series
(Part 1, Part 2, Part 3, Part 4)
is an excellent introduction to the low-level details of implementing an
interpreter and a series of JITs for Brainfuck, including different ways of
generating machine code, intermediate representations and so on.
Second, David Sharp’s report on Tarmac,
an optimizing JIT compiler for ARM emulation. It’s over a hundred pages long,
but this is an excellent overview of JIT compilation techniques as well as a
detailed explanation of how Tarmac works. Sharp gives a good explanation (often
including diagrams and/or examples) of common approaches to various problems in
emulation, even if Tarmac itself doesn’t use them. If nothing else, read it to
learn about terminology you can plug into a search engine to find out more.
If you’re interested in NES emulation in particular, the NESdev wiki is the
premier source of information for aspiring emulator developers and
homebrew ROM authors. This wiki and the resources it links to (including the
forums, test ROMs, and lots
of documentation about the CPU/PPU/APU) provided all of the documentation I used
to build this emulator in the first place.
Finally, Dolphin’s JIT doesn’t seem to have much documentation, so if you want
to find out more about it there are only two sources that I’ve found useful:
the source code, and this Reddit comment by one of the developers giving a
relatively high-level overview of how it all works.
This has been a fun project. I have some other stuff in the pipeline at the
moment, but I’d like to come back to this emulator at some point. Until next
time…
Urban design and mental health in Tokyo: a city case study
Layla McCay (1,2), Emily Suzuki (2) and Anna Chang (3) (1) Centre for Urban Design and Mental Health, UK and Japan (2) Tokyo Medical and Dental University, Japan (3) Southern California Institute of Architecture, USA
Six urban planning/design lessons from Tokyo for better public mental health
Streetparks: Empower and incentivise residents to install nature everywhere (and easily access out of town nature)
Superblocks: Nudge vehicles into larger streets to prioritise active transport, social streetlife, green space etc.
Active transport: The affordable, efficient, reliable and convenient choice
Social exercise: Easy access and changing/storage facilities
Interior placemaking: Popular indoor spaces like shopping malls can be green, active, and pro-social like a high street
Suicide-prevention design: Innovative design interventions include blue light and nature images in train stations
Five urban planning/design steps to help improve Tokyo’s public mental health
Increase awareness: Urban planners and designers need to appreciate urban design and mental health opportunities
Seize the cycling opportunity: Remove workplace barriers to bike commuting; infrastructure will follow demand
Harness neglected waterways: for walking, sports, socialising and relaxation
Social interaction: This should be at the heart of more public space design
Optimise the workplace: Long working hours mean that commute and office design are especially important for promoting better mental health
Introduction
The World Health Organisation defines mental health as: “a state of well-being in which every individual realizes his or her own potential, can cope with the normal stresses of life, can work productively and fruitfully, and is able to make a contribution to her or his community.” This definition is relevant for urban designers because it also reflects key components of a thriving, resilient urban population. Key urban design factors that can affect mental health include: access to natural (green and blue) spaces (Roe 2016, Gascon et al 2015), facilitating physical activity (Morgan et al 2013), pro-social activity (Francis et al 2012), safety (including wayfinding, crime, and traffic), and sleep quality (Clark et al 2006, Litman 2016).
Overview of mental health in Tokyo
Tokyo has a complex relationship with mental health. There is a lack of a universally understood term: 'mental health' is often taken to mean mental illness (in Tokyo, a person with a mental illness is sometimes referred to by the abbreviation 'men-hel'), which can make it complicated to discuss mental health promotion. It is hard to identify Tokyo-specific mental health statistics. The World Mental Health Japan survey conducted mental health assessments with 4,000 people in Japan, and found the lifetime prevalence of common mental illnesses to be around 1 in 5 people, which is lower than in many Western countries (Ishikawa et al, 2016). However, mental disorders are also the second greatest cause of severe role impairment of all chronic health problems in Japan (approximately 2.6 million people), and mental illness is considered to account for nearly a quarter of all-cause disease burden in Japan (Health Japan, 2016). The World Mental Health survey measured Japanese people’s lifetime risk of anxiety disorders to be 8.1%, substance use disorders (largely alcohol) 7.4%, and mood disorders (largely depression) 6.5%. Men were more likely to have a mental disorder in their lifetime, but women and younger people were more likely to have persistent disorders.
In addition, Tokyo has some country-specific mental health challenges. Mental health problems are subject to profound stigma. Near-universal long working hours have led to a more restricted leisure culture than in some other cities. In a country with high levels of overtime, karoshi (death by overwork) is associated with very high stress levels and low food intake, leading to heart attack, stroke or suicide. The evidence on causality and prevalence is still incomplete, but karoshi is often in the news. Japan is known for high suicide rates; though figures vary and the trend is improving, Japan still has the 6th highest suicide rate in the world (18.7 per 100,000 people in 2013, OECD). Suicide has a complex mix of drivers in Japan, from depression to a tradition of ‘honourable suicide’ following unemployment, financial problems, or causing a burden for family members. Nearly one million people each year experience hikikomori, acute social withdrawal typified by a young person not leaving their home for more than six months, with a lifetime prevalence of 1.2% (Koyama et al, 2010). Finally, some people who were affected by the Tohoku earthquake and tsunami in 2011 have developed post-disaster mental health problems, and Tokyo’s population lives in recognition that a large earthquake could occur at any time.
The proportion of people with mental disorders who receive diagnosis and treatment in Japan is lower than in many Western countries, likely associated with ongoing stigma, a reluctance to divulge mental health problems, and low uptake of psychiatry careers. Just 1 in 5 people with mental health problems seek formal treatment (Ishikawa et al, 2016).
The evolution of urban planning and design in Tokyo
Tokyo is often considered a city, but it is in fact a metropolitan prefecture (region) comprising 23 special wards, each governed as separate cities, plus 26 more cities, 5 towns, and 8 villages, all governed separately (with national and Tokyo Metropolitan Government influence), creating a complex picture in terms of urban planning. The city has a population of over 13 million, and the metropolitan area extends to a population of 36 million, the most populous in the world (World Population Review, 2017). The centre has a density of 15,187 people per square kilometer, much less than Manhattan (27,000) and Paris (21,000).
Tokyo’s urban planning history can be characterised by a cycle of city destruction and rebuilding. Fires, earthquakes and war have driven Tokyo’s planning efforts and ever-changing cityscapes, particularly since the Great Ginza Fire of 1872, just after Tokyo became the capital of Japan. Indeed, the Great Ginza Fire precipitated the Meiji Government’s first modern urban planning project, rebuilding to create a ‘fireproof’ city that incorporated a brick quarter and several parks. This led to the establishment of the Tokyo Town Planning Ordinance in 1888, then the 1919 Planning Law, and the birth of toshi keikaku (modern urban planning) in Tokyo, establishing top-down planning systems. The next big reconstruction project followed the 1923 Great Kanto Earthquake, and resulted in the development of the city’s extensive rail system. Then post-WWII, further rebuilding was necessary following war damage. The 1968 City Planning Law delegated more planning powers to the local level (its revision in 1980 was the first to explicitly include consideration of quality of life). Over the next few decades came a surge in population and industry, and along with that, a succession of laws around parks and green spaces, roads and railways, crowding and pollution. Many of Tokyo’s rivers were sent underground or diverted, and canals and aqueducts were installed. With soaring house prices, residents increasingly moved to the suburbs, leading to ‘doughnutisation’ or reduced populations in the centre of the Tokyo prefecture, with mainly home owners (often older people) remaining in the centre. Since the ‘bubble economy’ burst in the 1990s, central housing has become more affordable again, and some younger people have returned to live in the city centre.
Figure 1: Timeline of Japanese modern urban planning
Methods
Literature review
A search was conducted on the Tokyo Metropolitan Government website to identify relevant policy documents. These were retrieved and assessed, and relevant sections were identified and extracted. Further policies mentioned by interviewees were also examined.
Interviews
Eleven Tokyo-based academics, public health specialists, mental health specialists, urban planners, urban designers, developers and architects were identified by snowball sampling. Semi-structured interviews were conducted with each person in either English or in Japanese with English translation. Each subject was asked what they considered to be urban design factors that support good mental health, the priority given to mental health in urban design policies and plans, and barriers to prioritisation, using the Centre for Urban Design and Mental Health protocol.
Tokyo street scene. Picture by Mai Kobuchi.
Results
‘Compared with other Asian countries whose population density is quite high, Tokyo is a less stressful city. I don’t think this is realized by a particular architect but by well-organized city planning. This city planning is unique to Japan.’ – Urban planner
Tokyo’s urban policies with the potential to impact mental health
‘Health is always the most important topic in urban planning. When we had the Meiji Restoration, a priority topic was health. We needed more sunshine to improve people’s health, which can be realized by making more windows. Once we regulated it, we could get enough windows.’ – Urban Planner
At a national level, Japan’s Ministry of Land, Infrastructure, Transport and Tourism has set five goals, which may be relevant to urban design for mental health:
Changing the social environment to develop people’s self-awareness of health;
Promoting elderly people’s community participation to help them find purpose in life;
Preparing urban functions within walking distance from elderly people’s homes so that they can live by themselves;
Making a city where people can safely walk, e.g. barrier-free sidewalks; and
Improving public transportation services.
In Tokyo, a city governed by complex laws and policies, prioritisation of health in general has been important (though explicit focus on the mental aspect of health has been more limited). Japan’s National Spatial Planning Act, Capital Regional Plan (2009) expects the Tokyo area to fulfill various roles in the 21st century including: being a beautiful area where 42 million people can live comfortably; conservation and creation of a good environment; and disaster-proofing to provide safety. While Tokyo Metropolitan Government manages certain Tokyo-wide policies, which will be discussed in this paper, in general, urban policies vary substantially within the wider Tokyo prefecture, and there is scope for individual case studies of cities within Tokyo metropolis.
Tokyo has six stated goals for urban planning. Relevant for mental health, these include: restoration of green and blue spaces; and creating a city where people can live comfortably, safely, and with peace of mind. Resident consultation and active participation in the planning process are important to the Tokyo Metropolitan Government, and the trend, particularly in the past decade, has been decentralisation of city planning.
Despite the proliferation of urban policies that recognise health, many of the architects, urban planners and developers interviewed reported that they were not familiar with specific laws, policies or guidelines that they were obliged or encouraged to follow that explicitly included health promotion for the population, though an exception is for older people.
‘The government does not issue policies about health for urban planners or architects, except for particular cases, such as the size of rooms in group homes for people with dementia, or the percentage of green space for developments. Architects are trained to build good environments; there is no effort to regulate them once they have qualified.’ – Developer
‘Most of the guidelines that specifically mention health are aiming for making comfortable environment for elderly people. For example, in Sugamo, which is called “Harajuku for the elderly people,” the government spent money to make the area walkable and barrier-free.’ – Urban designer
Opinions about the environmental determinants of mental health
Interviewees identified factors within the built environment that they personally considered to be associated with mental health in Japan: beauty, nature, opportunities for creativity, social connections, opportunities to contribute to the community, access to healthcare, safety, and confidence that the city is well-managed, including efficient, reliable transportation.
Access to nature: green and blue places
One of the main ways in which Tokyo’s urban design seeks to promote better mental health is by increasing citizens’ access to green space. In the Edo period, Tokyo was filled with green spaces and waterfront parks. However, with industrialisation, much of this was lost. Tokyo policies explicitly recognise the health benefits of green space:
‘Greenery in urban areas brings pleasant and comfortable features to the lifestyles of residents’ and ‘greenery brings comfort to the human spirit’. (Basic Policies for the 10-year Project for Green Tokyo, TMG 2007).
The concept of access to green space as a factor that promotes good mental health is generally recognised, particularly in the context of appreciating nature and beauty:
‘We try to make more natural things in public space by planting trees’ – Urban Planner
‘Plants are a way to improve happiness’ – Think Tank academic
‘Japanese people have this ability to screen out any ugliness and chaos, just ignore it and focus on what is beautiful’ – Health company employee
‘Japanese people care about beauty and appreciate nature and how it can be seen in the changing seasons’ – Think Tank academic
In addition to having several public parks, Tokyo has a range of large, landscaped parks that can be visited for a small fee. However, while considered congruous with the city’s character, integrating further greenery into the Tokyo cityscape may be challenging in practice, given the existing development of the city.
‘Part of Japan’s culture is having small pieces of nature around you: plants and water. This is part of the concept behind bonsai trees: the ability to feel like you are in a forest when you are in a tiny apartment’ – Policy specialist
‘Tokyo is already densely developed and designed, so it is hard to design park areas. One example is Roppongi Hills, where the developers included a park with green space and water for people’s comfort.’ – Academic
However, green spaces are not always ‘usable’. While cherry blossom (hanami) season encourages picnics in the park, a good opportunity to socialise, relax, and meet people beyond one’s colleagues and family, at other times of the year this use of green space is often considered inappropriate.
‘Parks have too many fixed benches, too many plants, and not enough open space. They have early closing times and signs to stay off the grass. And I think guards panic when they see people using a green space that is not designated for that purpose.’ – Urban designer
‘Some of the existing green places in Tokyo are the grounds of temples and shrines, but we cannot use these spaces for games or picnics, for example’ – Architect
In the Edo period, Tokyo had many rivers and aqueducts, but many now flow beneath the city; some of these are now visible in the form of green walking and cycling routes above ground. These take the form of green median strips in roadways (eg the River Green Road 九品仏川緑道 in Jiyugaoka) or run parallel to roadways; there are also green pedestrian pathways and narrow pathways that run alongside houses. Some of these paths do still retain a reduced form of running water, which is occasionally landscaped into long, narrow parks.
Tokyo still has overground rivers and canals, but since WWII, their waterfronts have been lined by warehouses and other industrial facilities, the water polluted, and the areas avoided by residents. However, more recently Tokyo has improved water quality. For example, the Sumida River has seen stronger plant wastewater regulations, dredging, purification water from the Tone and Arakawa Rivers, and sewage system development. This has enabled exploration of the opportunities of waterside development (Tokyo Construction Bureau). A number of pleasure boats now ply the river, and an occasional kayak is seen. The Sumida River Terrace now features a riverside trail with plants. It is intended for walking and relaxing, yet it is usually busy only during seasonal festivals; learning from riverside developments in other cities, there are further opportunities for development.
The Development Policy for City Planning Park and Green Space (2011) seeks to ‘revive beautiful city Tokyo surrounded by water and green corridors’ and explicitly recognises urban greenery ‘as a place of relief for Tokyo citizens’. This recognition of the benefits of nature is reiterated in the Metropolitan Area Readjustment Act (1958, frequent revisions), spawning various Green Action Plans, which prioritise the need to ‘conserve green spaces that embrace the healthy natural environment’. There is an added incentive in Tokyo to invest in open green spaces as evacuation locations in the event of natural disasters. A range of laws, including the Urban Park Act, the Landscapes Act and the Urban Green Space Conservation Act, supports these efforts. The current Tokyo City Planning Vision (Episode 4, P 117-123) by the Bureau of Urban Development envisions green roofs, ground level planting, additional lawns and riverside greenery, and using plants to create more shadow in order to reduce heat.
Referring to nature is not always helpful in urban design discussions: nature can be perceived as a threatening concept in Japan, so it may be more helpful to refer to ‘green space’.
‘European nature seems to be calm and well-controlled by humans. But [in Japan] we are in awe of nature, sometimes nature bears down on us as natural disaster, like typhoons, earthquakes, tsunamis.’ - Academic
Specific actions to improve access to nature in Tokyo
Designating green zones: Tokyo’s Comprehensive Policy for Preserving Greenery (2010) has led to designation of special districts and zones in Tokyo, including those focused on promoting greenery:
Special green space conservation districts: These districts, designated under the Urban Green Space Conservation Act, aim to ensure ‘favourable urban environments’ by conserving and promoting urban green spaces.
Suburban green space conservation zones: These large-scale zones were designated under the Act for the Conservation of Suburban Green Zones in the National Capital Region to prevent unregulated urbanization, and explicitly aim to maintain and improve healthy minds and bodies of urban and suburban residents.
Scenic districts: These districts aim to conserve urban scenic beauty and environment, and are subject to building restrictions under the Tokyo Scenic Area Ordinance.
Incentivising park development by private companies: To enhance the value of open spaces that are created in the process of large-scale urban development, the TMG established the Guideline for Greenery Development in Privately-Owned Public Spaces to facilitate the creation of spaces such as greenery networks and pro-social open spaces that create ‘the building of lively communities’.
In central Tokyo, open space is expensive. The Comprehensive Policy for Green Conservation seeks to systematically encourage the retention of existing green space and the development of new green space, including by private developers. TMG created a park development scheme whereby private business projects are given preferential treatment and relaxed building regulations if they include a park of a certain size (suitable for evacuation) that the business develops, manages, and makes free for public access. The first private park opened in 2009. While pseudo-public space has its problems, this approach is delivering results in Tokyo: privately run parks are now increasing green space across the city, and the scheme has encouraged improved management of city-owned parks. TMG also leverages its own resources to promote greening. For example, when TMG leases land to a private company (for example as a parking lot), the leasing contract includes greening requirements, and public-private partnerships are emerging, such as the riverside Sumida River Terrace project near Kiyosubashi Bridge.
Sumida River Terrace project is a public-private partnership that seeks to deliver a waterside venue for walking, cycling, and socialising in Tokyo. Artist: Ryoji Noritake
Machizukuri (Empowering citizens for community design):
‘Machizukuri (街づくり) became popular to stimulate energy of cities by developing good residential environment.’ – Architect
A key way in which machizukuri is exhibited is through empowering citizens to deliver a green city. Tokyo is remarkable for having relatively few parks, yet a profusion of greenery. Much of this lives in plant pots and tree bases that line the streets outside people’s homes.
‘Everyone engages in design of public space, but the Japanese style of placemaking is to make things beautiful.’ – Architect
TMG has stated that ‘the leading players in restoring greenery to Tokyo are its citizens’. They elaborate that to restore the greenery of Tokyo ‘it is necessary for each citizen of Tokyo to take an interest in greenery. The driving force for creating a verdant Tokyo will be people’s wish to nurture green areas in their lives in which greenery is scarce and to cherish abundant greenery’. To this end, the TMG has encouraged ‘greenery tended carefully by residents’, including encouraging the planting of trees (including a practice of ‘memorial trees’ to celebrate special events like graduation or marriage) and the turfing of school playgrounds. TMG recommends approaches that help and empower residents to green their neighbourhoods, delivers workshops, and shares methods for greening rooftops, wall surfaces, railroad areas and parking lots. TMG encourages fundraising to develop new parks, offers tax incentives to encourage residents’ efforts, and provides tax incentives on contributions to a ‘green fund’.
TMG seeks to ‘match’ community needs with local business to achieve green aspirations, and a range of initiatives enables local citizens to work with professional urban designers. The TMG registers groups that engage proactively in activities to ‘enhance community charm’ by ‘incorporating local colour and characteristics’. And the Ordinance on the Promotion of Stylish Townscape Creation in Tokyo (2003) led to a program that empowers Tokyo citizens, via Townscape Preparatory Committees, to work with the private sector to develop attractive townscapes. This has resulted in Tokyo citizens planting and maintaining greenery in tiny public spaces next to their property all over the city, effectively greening the city at the individual level.
Investing in Kankyojiku (Environmental Green Axes): Kankyojiku are networks of urban spaces around roads, rivers, parks, and infrastructure that are ‘lush with greenery’ and create a network of ‘environmental green axes’ throughout the city. The principle of kankyojiku spaces is that whenever urban facilities are being developed, deep, wide greenery is integrated to deliver ‘pleasant landscapes’ and ‘scenic beauty’ for local communities. The TMG published kankyojiku guidelines in 2007 with the aim of forming more kankyojiku; the Kankyojiku Council was established the following year to promote the development of these areas and to share lessons from places where kankyojiku has already been implemented.
Access to Shinrin yoku (forest bathing): Japan has long enjoyed a health tradition of onsen bathing (hot mineral waters). Shinrin yoku is a term developed by the Japanese Ministry of Agriculture, Forestry, and Fisheries in 1982 to describe a therapeutic health practice that aims to boost immunity, reduce stress, and promote wellbeing. Shinrin yoku is often called ‘forest bathing’ but more literally means ‘taking in the forest atmosphere’ – the opportunity for city dwellers to spend leisurely time in the forest without any distractions. Japanese research has found associations between this nature immersion and improvements in physiological and psychological indicators of stress, mood, hostility, fatigue, confusion and vitality (Park et al, 2010). In particular, the research suggests that forests located at high elevations with low atmospheric pressure can help reduce depression. Throughout Japan, there are 48 official forest routes for shinrin yoku, designated where research has demonstrated health benefits. Five official shinrin yoku trails lie within Tokyo metropolis, at Okutama forest on the western edge of Tokyo, accessible by affordable train; at weekends, extra direct trains from Tokyo city centre are available. There are also some limited opportunities for immersion in forest surroundings even in the very centre of Tokyo. Around a quarter of the Tokyo population are said to participate in shinrin yoku, and some companies even include visits as part of their company health plan.
Designing Active Spaces
Tokyo is a city of commuters, but not of car ownership. The metropolis is huge, housing in central areas is expensive, the suburbs are large and sprawling, and the public transit system is crowded yet affordable, efficient and effective. As such, many people spend substantial time commuting each day by public transportation (the average one-way commute to work is around 45 minutes). Car ownership is low in Tokyo (0.46 cars per household, compared to the Japan average of 1.07 and the US average of 1.95) (Hongo, 2014).
Tokyo’s citizens naturally integrate light exercise into their daily routines, to a greater extent than residents of other cities, through active transport (walking, cycling, and public transit).
Walking: The Bureau of Social Welfare and Public Health at Tokyo Metropolitan Government seeks to increase walking under the tagline ‘small efforts, lasting health’, with an explicit note that walking improves mental health: ‘walking can be a great change of mood and stress-release’. This effort includes a website publishing Tokyo walking maps, searchable by distance and time and by accessibility from various train lines. It encourages walking in "urban oases" (‘The sound of chirping birds and running water, the sight of beautiful foliage, the fragrance of seasonal flowers, and more—nature has a way of truly refreshing the body and mind’) and participation in pro-social walking events. The city also provides some outdoor exercise equipment, though this is not always easy to find. The Guideline to Promote Town Planning of Health, Medical Care and Welfare - Technical Advice from Japan’s Ministry of Land, Infrastructure, Transport and Tourism considers a pedestrian-friendly environment a component of healthy ageing, and walkability is explicitly part of living space planning for older people. This Guideline notes the importance of views along the pedestrian pathway, resting places, and high-quality pathway maintenance for safety, and emphasises that people are more willing to walk and exercise if the surrounding environment is planned with social space and facilities, such as resting spots and parks. The Japan Health Promotion Fitness Foundation, for its part, promotes a recommendation of 8,000 steps per day.
‘From a viewpoint of urban planning, we don’t have many parks and green spaces. But still, we can keep long life expectancy by walking. People have to walk even after they become old. For example, people have to use train and stairways. In contrast, once people start living in the US, people use cars.’ – Urban planner
‘Increasing walking and pedestrianisation is primarily to ease congestion, improve safety, reduce pollution, and benefit physical health – but of course there is a mental health benefit’– Think Tank academic
Public transport: Tokyo has the highest usage of public transit in the world. Reasons for the success of Tokyo’s public transit system include its reach, efficiency, reliability, and affordability. This facilitates access to the full breadth of the city, regardless of where in the outer suburbs people may live.
‘At any station in Tokyo, we have so many passengers on the station platform. However, Japanese people don’t feel stress because trains are punctual and come one after another. People know they can get another train soon. In this sense, stress can be reduced by good railway management.’ – Urban planner
However, there are persistent challenges with overcrowding. In addition to discomfort and stress, such close proximity brings safety challenges, particularly for women, who may experience sexual harassment including unwanted sexual contact from chikan (gropers) or voyeuristic "up-skirt" photography. These problems have led to two main design changes in the last two decades:
A designated women-only train carriage on many train lines during morning rush hours when the trains are particularly crowded. Elementary school-age boys, male passengers with disabilities, or male carers travelling with disabled passengers may also use these carriages. There is no law against other males using them, but this is discouraged and evokes social stigma.
A compulsory shutter sound that cannot be disabled for all mobile phones with camera functions that are sold in Japan (this feature is mandated by phone companies, not by law, but all phone manufacturers and carriers comply).
Long working hours, a safe environment, and some of the shortest sleeping times in the world mean that many people commuting to Tokyo spend the journey having a restorative nap, though this type of sleep can be easily disrupted and of poorer quality than sleep at home. Furthermore, very crowded train carriages preclude sleep and many other opportunities for relaxation in transit.
‘You can design the city to shorten commuting time. People in Tokyo sleep, on average, five and a half hours at home, and they add to that their sleep during the commuting time in a train.’ – Architect
Train stations are accessible and often barrier-free, which means that regardless of physical ability, people tend to walk from stations to their destinations rather than use cars, leading to natural integration of physical activity into people’s daily routines. This has also channelled much car traffic onto larger roads, while smaller networks of roads have few cars (and those that access these roads do so at low speed that prioritises pedestrians). Often these roads do not have pavements (sidewalks) due to such low car traffic, informally creating a pedestrian- and bike-friendly thoroughfare reminiscent of Barcelona’s ‘superblocks’.
However, the general population’s appreciation of the natural exercise opportunities afforded by active transport may be more limited; exercise is often viewed as a hobby, and people are more likely to consciously incorporate physical activity into their lives as part of an organised exercise group, choosing designated exercise areas with formal facilities, such as the few running routes with adjacent showers and lockers, or well-managed playing fields in suburban parts of the city. Affordable, accessible and efficient public transport ensures access to such areas.
‘People do use public transport, but you’ll see them standing in line for the escalator rather than using the stairs. They don’t commute like this for the exercise benefits’– Health company employee
‘Sports are popular only really within clubs, as an extension of school clubs. Fewer people exercise alone. But where they do exercise, there is usually good infrastructure, such as showers near the Imperial moat jogging route.’ – Health company employee
'You see playing fields all along the river, and at the weekend they are full of people playing baseball or soccer. The playing fields are all booked for a specific period, it’s all managed meticulously, and everybody has their role, like cleaning the pitch.' – Health company employee
Cycling: Bicycles are another key form of transportation in Tokyo, but they are used differently than in many other cities. Around 14% of journeys in Tokyo each day are made by bike; however, these are largely shorter journeys (under 2km), often undertaken by women in the course of a domestic routine that may involve such tasks as shopping, getting to the train station, and picking children up from school. These women ride bikes referred to informally as ‘mamachari’, and these affordable, light ‘mother’s bikes’ have become a cultural icon, affording women independence, physical activity, access to social and natural settings, and the other benefits of cycling. They emerged in the 1950s, and with their space for baskets, luggage racks and multiple child seats they have made cycling accessible and convenient for women and their families. The government sought to impose a limit of one child carried per mamachari, provided the cyclist is over 16, a child seat is attached, and the child is under the age of six. Mothers complained, and the permissible number of child passengers was increased to two (though in reality, three child passengers per mamachari are sometimes observed). Mamacharis often share pavements (sidewalks) with pedestrians, or use smaller roads.
“Riding a mamachari is safe because everyone rode one as a child, and of course our mother rode one, and now our children ride them, and our wife rides them, so of course we are careful when driving near a mamachari.” - Architect
It is thought that up to 9% of employees across Japan cycle to work. Bicycles are used less often by workers on their commutes than in other large cities, largely due to bureaucratic barriers. Corporate law requires companies across Japan to insure their staff for accidents, including during commuting, and insurance policies often fail to cover cycling. There can also be disputes over what specific part of a journey counts as commuting (for instance, when cyclists stop somewhere en route). Companies may also be liable for the costs of commuting by bicycle (such as bicycle repairs). As a result, companies generally impose bans on cycling to work (though some operate a ‘don’t ask, don’t tell’ policy, and a few others promote cycling for its environmental and health benefits).
'In the building next to us, staff members are forbidden from cycling to work, because of the risk of accidents and insurance claims. It’s a shame as there is great bike parking around here. Some of the staff do it sneakily anyway.'– Health company employee
An important impact of these restrictions has been that formal bicycle infrastructure is limited in Japan:
Protected bike lanes are rare. From observation, cyclists wearing cycling clothing and helmets tend to cycle on the roads, but commonly, cyclists in work or casual clothes will informally share pavements (sidewalks) with pedestrians. This is technically only permitted where specific signs indicate that cyclists can use the pavements (sidewalks), or for children; a third permission is more open to interpretation: ‘when it is found to be unavoidable, in light of roadway or traffic conditions, for said standard bicycle to travel on a sidewalk in order to ensure the safety of said standard bicycle’ (Road Traffic Act 2015, Article 63-4).
Bike parking is challenging, and often requires payment or accessing special bike park locations, with bikes removed by police when parked in informal spaces. It is unusual for workplaces to provide bike parking.
There is a perception that bikeshare schemes have proliferated and are now ubiquitous in Tokyo. Bikeshare use has indeed increased, from just 20,000 journeys in 2012 to 1.8 million in fiscal year 2016. However, in comparison with other global cities, adoption is relatively low, as the city explores integration with the rest of the public transport network. There are small and increasingly popular bikeshare schemes, but these are not joined up (in other cities there is often a single city-wide service); furthermore, docking stations can be far apart and do not extend throughout the city, rendering bikeshare practical only for limited routes. As of March 2017, the largest scheme in Tokyo (sponsored by Docomo, in partnership with Bunkyo, Chiyoda, Chuo, Koto, Minato, Ota and Shinjuku wards) reportedly had 4,200 bikes and 281 docking stations. This is small compared to the largest bikeshare scheme in the world, in Hangzhou, China, which reportedly has 84,100 bikes, over 3,600 docking stations, and around 115 million journeys per year. By comparison, London’s scheme has 11,500 bikes at over 750 docking stations and recorded 10.3 million journeys in 2016; New York City’s has 10,000 bikes at 603 stations and recorded 14 million journeys in 2016.
Urban design for physical activity in Tokyo's plans and the upcoming Olympic Games legacy
Tokyo will host the Summer Olympic Games in 2020. The building, policymaking and legacy-building requirements of the International Olympic Committee may act as a further catalyst, in particular by inspiring Tokyo to adopt successes from other cities:
‘By 2020, Tokyo has to compete with other cities such as New York and Beijing, for example, it will have to build more bicycle lanes on the road. External pressure urges Japan to take more designs from other cities and improve. If London built 100 km of bike lanes, Tokyo will want to build 110 km. That’s how it works.’ – Architect
While a physical health legacy has not yet been made very explicit in Tokyo Metropolitan Government’s Olympic ambitions, a series of policies and plans is emerging that contains physical activity promotion and facilitation aspirations. A three-year (2015-2017) progress plan has been set as part of The Long-Term Vision for Tokyo and Citizen First – Building a New Tokyo – Action Plan to 2020 (P.259-261).
Urban walkability is set to improve as part of The Long-Term Vision for Tokyo and the plan for the Bureau of Urban Development (page 137). Currently, the Bureau of Social Welfare and Public Health provides a Tokyo Walking Map to encourage and empower walkers, and further encourages potential walkers by hosting events such as Tokyo Walk 2017. Over the next 13 years, a pedestrian-friendly environment and a new underground network of pedestrian walkways are envisioned in the Toranomon area, along with 43km of riverside walking paths by 2024. The Bureau of Urban Development’s road space working group is also developing a Rambling Tokyo Strategy, aiming to create a network for walkers in the city. Ideas include improving infrastructure (signs, walkway quality, resting places) and local public transport, alongside encouraging travel by foot by creating and promoting ‘charm points’ and places of historical, cultural and entertainment interest on walking routes.
Urban bikeability is also set to improve, framed as part of energy-saving policies: The Long-Term Vision for Tokyo (Episodes 3-2, P.102-103) and Citizen First – Building a New Tokyo – Action Plan to 2020 include a 264km bike path (and related facilities) (Episodes 3-7, P.277), with improved bikeshare facilities (Episodes 1 and 5).
A general citizen fitness program is also underway. Citizen First – Building a New Tokyo – Action Plan to 2020, Episode 2-2-8 (Divercity) provides details of plans to make sports more enjoyable for the citizens of Tokyo; this includes creating an atmosphere of sport, post-Olympics use of the Olympic facilities, and creating a ‘sport-friendly environment’ including ensuring good access to sports facilities.
Pro-Social Space
Public spaces are often the epicentre of positive, natural social interaction in a city, and this is widely recognised as an important factor in maintaining good mental health and building resilience. Tokyo tends not to have town squares, which form natural public open spaces in many Western countries. Tokyo values public spaces as settings for people to socialise, but this is not always reflected in pro-social design; the priority for open space design is more likely to be evacuation space in case of earthquakes.
‘We need gathering spaces’– Public Health Specialist
‘If people feel that they are a part of the society and community through social activities in open public spaces, it is good for their mental health.’ – Architect
‘Buildings made of wood with fewer edges and squares creates permeable architecture that encourages public social life.’ – Urban designer
‘Good social connections among the neighborhood will help prepare for a big disaster.’ – Architect
Outdoor public places
‘There are three types of public places in Tokyo: public parks, train plazas and temples/shrines.’ – Urban planner
Public parks: ‘If you go to pachinko (a pinball gambling game) or shopping centre, you will notice the lack of diversity. However, parks are open for anyone. You can see all the generations, including rich, poor, elderly and young people.’ – Architect
‘If parks are used for community activities, it can lead to improved mental health.’ – Urban designer
Train station plazas ‘Evacuation space for earthquakes takes priority over placemaking. Station plazas are mostly empty other than smoking areas. There is opportunity for development, such as removing smoking areas, and having benches – benches with a more social layout to encourage social interaction.’ – Academic
Temples and shrines ‘Public life once happened at the shrines and temples – these were our version of town squares. Now public life happens in streets and alleys.’ – Architect
'Temples and shrines have many community events – matsuri (festivals), markets...’ – Urban planner
‘They have open space, but many people think temples and shrines are for praying, not sitting. Maybe there is opportunity to develop benches just outside the temple gates.’ – Urban planner
Indoor public places Value is increasingly being recognised in the potential of indoor public spaces to reap the benefits of pro-social interaction in Tokyo. In particular, there is discussion of how to better harness shopping centre design and facilities to improve health and wellbeing (envisioning the centre of a shopping mall as a new version of a city’s town square), how to design bars to facilitate positive social discussions, and how to better design urban workplaces to help counteract the potentially negative health effects of Tokyo’s trend for very long working hours.
Shopping places ‘Shopping areas next to train stations belong to the train company and generate income – but they are often ‘placeless spaces’ that do not reflect the neighbourhood at all.’ - Academic
‘Shotengai (indoor/covered shopping arcades) feel very connected with their local communities… shopping malls can be more placeless’. – Academic
‘Shopping malls have evolved to have more open spaces and high ceilings to reduce stress and let people sit, relax and talk. Sometimes there are classes or markets or other activities’– Policy Specialist
‘Temples and shrines often have an associated retail corridor. This integrates them with the community and tourism.’ – Urban planner
Bars ‘In Akabane, there is a famous drinking place with a ko-shaped counter. The “Ko” shape (‘ko’ is the pronunciation of the Japanese katakana symbol コ) was invented to increase people’s happiness, to share their smiles with other visitors while they drink.’ – Urban designer
Offices ‘Designers are trying to address the stress caused by long working hours – they do that by increasing social interaction, through more open spaces in the office, and by making routes more inconvenient, so that people moving around the office or university have more movement and more social interactions’– Policy Specialist.
Places for older people The opportunities for older people to move around their neighbourhoods and socialise are valued, and a new aspect of Tokyo’s long-term care policies seeks to bring urban planning into play, with ‘designated activity areas’, often around a station, that bring together shops and services, home care, health facilities and social facilities in convenient, barrier-free ‘daily activity areas’. There have also been experiments with ‘healthy roads’ in Tokyo, widening roads to facilitate pedestrian traffic of different speeds and again clustering this range of services in an accessible way.
Post-disaster places Finally, experiences of disasters in Tokyo and the surrounding regions have led to post-traumatic stress and contributed to the city’s focus on opportunities to support people’s mental health through architecture and planning; facilitating pro-social opportunity has emerged as a key factor.
‘People were traumatised after 3/11 (tsunami and earthquake), so we had to think about city design. People evacuated due to the tsunami were living in temporary housing, which was stressful for most people. We made shared space at the entrance of their temporary housing to increase social interaction and community spaces with gardens.’ – Architect
‘Entrances were designed to face each other rather than side by side to promote sociability.’ – Architect
Safe Space
Tokyo prioritises the safety of its citizens in two main ways: it wants to be the safest city in the world in terms of crime, and it must safeguard its population in case of natural disasters. Tokyo’s crime rates are low, and crime did not factor into discussions of urban design opportunity: much of the focus on safety in Tokyo was in the context of disaster preparedness. Even the ever-present risk of earthquakes was not felt to be a concerning factor, as people feel the city prepares them adequately with clear information and training opportunities. Design interventions to enhance safety in Tokyo include earthquake-resistant buildings and firebreak belts, along with barrier-free design to enable the safe movement of older people and those with disabilities around the city.
‘Earthquakes are the biggest safety threat in Tokyo.’– Health company employee
‘People feel little stress because Tokyo is safe and clean. If there is an earthquake, we all know what to do, we are all ready, so we do not have to fear; nuclear risks are a big fear.’ – Urban planner
Another way in which Tokyo promotes safety is through the city’s barrier-free aspirations, covering residential housing, public amenities and transportation. This plan focuses on improving the quality of life and urban participation of older people and those with disabilities (Tokyo City Planning Vision Episode 4, P.140, Bureau of Urban Development).
Perceived prioritisation of mental health in urban planning and design in Tokyo
There was a consensus amongst those interviewed that mental health is not currently considered to be a priority within urban policymaking, architecture or urban planning.
‘I think some cities are definitely interested in physical health intervention in city planning, though people’s interest in mental health is still limited.’ – Architect.
Partly, the lack of explicit focus on mental health may be attributed to the conceptualisation of the term ‘mental health’, which was not familiar to interviewees; ‘mental health’ was considered synonymous with mental illness, and interviewees used words such as ‘stress’, ‘peace’ and ‘comfort’ to describe the concept.
‘People think about peaceful comfort, and about reducing the stress caused by long hours at the office, but not specifically about mental health.’– Academic
‘Some neighbourhood developments focus on comfort, but do not have a specific health focus’. – Developer
The first explicit policy linking urban living and health by Tokyo Metropolitan Government seems to have been in 1972. However, it was not until the 1990s that the city started to measure people’s ‘life evaluation’. More recently, there has been a trend of promoting the concept of urban ‘happiness’, particularly in the context of delivering sustainable cities, in line with global declarations such as the Millennium Development Goals and Sustainable Development Goals:
‘Since the 1980s, ‘smile’ and ‘happiness’ have been included in the name of declarations.’– Architect
‘Happiness is being recognised as a tool to drive sustainability.’ – Academic
In 2011, a prefectural ranking of wellbeing was published, leading to a flurry of wellbeing measurements across the country. For example, in Shiga prefecture in 2012, residents proposed a ‘smile index of wellbeing’; the Tosa Association of Corporate Executives released the Gross Kochi Happiness (GKH) index; 52 local governments across the country launched a ‘Happiness League’; and in 2015, Arakawa City in Tokyo published its first Gross Arakawa Happiness (GAH) Report, inspired by Bhutan’s measurement of happiness (Figure 1).
Projects like the Gross Arakawa Happiness report tend to use indicators that do not necessarily address urban design factors, and often include few explicit mental health indicators. However, the March 2011 Great East Japan Earthquake, tsunami and nuclear accident have catalysed some people's interest in the explicit links between urban design and mental health in Japan. Tokyo's growing expertise in post-disaster response may include urban design elements, from the emerging benefits of group walking on reducing depression in older people (Tsuji et al, 2017) to design of temporary housing.
‘People started thinking about health and urban planning after the 3/11 earthquake – but it is health in general; not specifically mental health’– Architect
Barriers to implementing better urban design for mental health
The interviews with a range of designers and planners revealed that pro-mental health design is sometimes implemented, but not in a systemic or even conscious way. A key barrier to explicitly leveraging urban design and planning to improve mental health seems to be lack of awareness, both about mental health in general, and the ways in which urban design could help improve population mental health. For example, people do not always appreciate that access to green space is linked to mental health:
‘Japanese people don’t make the direct connection between spending time in nature and health.’ – Public health specialist
Another barrier was considered to be a change in priorities driven by the shift from public to private development in Tokyo over time, leading to a reduced focus on design impacts for the wider population:
‘In the Edo period, there was public urban planning, but since the Meiji period, the focus has been more on private development.’ – Public Health Specialist.
Innovating to increase the positive impact of public spaces on mental health may also face cultural challenges regarding the perception of what is appropriate and dignified:
‘Japan has a culture of celebrating cherry blossom in April, sitting down with food and drink and friends to have a picnic. People don’t use outside public spaces like this during other seasons. It would be considered childish.’ – Architect
Though Tokyo is known around the world for being safe, 21st-century safety trends have started to drive anxiety, leading to reduced pro-social interactions within communities:
‘I read last week on Twitter that parents of elementary school children suggested a condominium building put up a notice advising residents “Don’t say hello to children in this building: they are instructed not to respond to strangers to protect themselves from crimes in the building’s shared space”. Elderly people responded that they feel bad when they say hello to children in the building and nobody responds.’ – Architect
How to overcome barriers
The consensus was that in order to leverage urban design to improve mental health, the most effective intervention is education and awareness-raising for the public, for designers, and for policymakers:
‘In order to promote mental health-oriented city design, we need to have a workshop to raise public awareness. That is a way to create power.’ – Architect
Opportunities suggested by respondents to promote mental health in Tokyo
With few, if any, guidelines articulating any particular factors within urban design that could improve public mental health, the interviewees offered a range of suggestions:
New spaces Options for developing Tokyo in ways that promote mental health may be limited, given the high levels of existing development of the city. This means seeking untapped opportunities:
‘Tokyo does not make much use of its waterfronts, and can learn from other cities on that, especially around socialising and physical activity.’ – Architect
‘There is no more space for parks on the ground, so now we make parks in the sky, on top of tall buildings.’ – Urban Planner
‘Many people go to shopping malls: they are becoming the new town centres. So we can have greenery, open spaces, and places to exercise, just as we might design in a town’. - Architect
Attractiveness of surroundings
‘Cutely-designed spaces can help people to feel happier – this is really important in Japan.’ - Architect
Older people
‘For elderly people, it is better to live in a city centre. You can easily access public transportation as well as entertainment such as movie theatres. In addition, there are so many hospitals at the centre of a city.’ – Urban planner
Cycling in Tokyo greenery. Picture by Mai Kobuchi
Case studies
Case Study 1: Dementia-friendly communities A dementia-friendly community is one that supports older people with dementia by ensuring they feel included and appreciated in the places they live, helping them remain independent with a good quality of life, reducing stigma, anxiety and frustration, and reducing the need for higher levels of care. Japan pioneered dementia-friendly communities. These communities combine awareness raising with adaptations to help people with dementia navigate, access, and use local facilities, from shops and banks to pharmacies and public transport. These adaptations can include signage, creating barrier-free access, places to rest and socialise, and toilet availability. The communities also encourage pro-social interaction between all ages. The Dementia-Friendly Japan Initiative helps to deliver these changes: a private-sector led platform of business, local government, academia, non-governmental organizations, and people with dementia and their families. A further policy intervention that supports dementia-friendly communities is Japan’s Compact Cities initiative. Developed to cope with depopulation, the initiative clusters government services and other facilities and ensures they are accessible to all by public transport. See examples of dementia-friendly communities in Japan.
Case Study 2: Suicide prevention For many years, Japan has had some of the highest rates of suicide in the world, and until recently the subject was taboo. In the last decade, Japan has focused intensively on suicide prevention, and rates have decreased. Urban design has played a role. Part of this has been physical prevention: for example, the installation of platform barriers can deter suicide attempts, or draw attention to them, enabling others to press public panic buttons to alert staff before the train approaches. Other design interventions are being introduced to reduce suicidal intent in certain locations. A Japanese study of installing blue LED lights on platforms suggests an association with a reduction in suicides, though the size of the impact is not clear. It is speculated that this works because blue is associated with calmness or nature; or because it creates associations with the police; or even because it simply disrupts ordinary perceptions and distracts from suicidal intention. Blue LED lighting has now been employed by several train companies as a suicide prevention measure (Matsubayashi et al, 2010). Shin-Koiwa station, a particularly common Tokyo location for suicide by jumping in front of trains, has taken this further by covering the roof in blue translucent covering (though covering white light with blue has been criticised for degrading lighting quality and reducing safety on the platform). Efforts at Shin-Koiwa station are complemented by measures including the installation of television screens showing nature scenes and posting information about a free suicide prevention helpline on the walls of station platforms.
SWOT Analysis: urban design for mental health in Tokyo
Strengths
Weaknesses
Tokyo places a high priority on systematically integrating greenery into the urban environment, a key factor in promoting good mental health.
There is good public recognition of population ‘stress’ and an interest in reducing this while increasing 'comfort'.
Citizens are empowered to influence the design of their communities.
Tokyo has an excellent public transport network and first mile/last mile connections.
Tokyo is one of the safest cities in the world.
Tokyo places high value on older people’s quality of life, and the city’s investment in barrier-free access can improve quality of life for older people and those with mobility challenges.
Many of Tokyo's neighbourhoods informally embody 'superblock' rationale, prioritising pedestrians and cyclists in smaller streets, and nudging motor vehicles to use larger streets.
Mental health is often considered synonymous with mental illness, which risks missing opportunities to talk about promoting good population mental health.
Knowledge and awareness of the links between mental health and the urban environment are low.
Tokyo does not issue many explicit mental health expectations, recommendations or guidelines for urban architecture, design, and planning projects, so mental health is not systematically integrated into planning and design projects.
Tokyo’s cycling infrastructure is weak; bureaucratic barriers deter explicit effort to improve design for integrating exercise into daily routines.
Opportunities
Threats
A marked policy interest in happiness could be expanded to better include mental health.
Infrastructure investment and health legacy planning for the Tokyo 2020 Olympic Games are an opportunity to integrate new ideas for urban design, including a potential review of employers’ cycling policies and/or insurance companies’ coverage.
TMG already incentivises private developers to integrate green space; it could extend these incentives to drive other design factors that improve mental health.
Waterfront areas of Tokyo could be developed to promote better mental health.
Train station plazas provide public open spaces that could be better developed for pro-social interaction (while retaining their earthquake evacuation space function).
Temples and shrines provide public open spaces that could be further developed for pro-social interaction, including spaces immediately outside the entrances.
Mental health and mental illness are not commonly used terms in the Tokyo urban community, so policymakers may be less likely to access the research and implement recommendations focused on these terms.
Mental illness stigma and reluctance to discuss the topic may deter prioritisation of mental health in urban design.
The preponderance of private development over public development may limit population-focused health promotion opportunities.
There is little space available to invest in delivering open, green, pro-social spaces.
Companies may continue to inhibit commuting by cycling, deterring design investment.
Conclusions
7 Lessons from Tokyo that could be applied to promote good mental health through urban planning and design in other cities
Empower and incentivise city users to install nature everywhere: To green a city where large park spaces may not be available, it is possible to empower the general public to take personal responsibility in contributing to street greenery. A combination of education and incentive programmes can also help to encourage businesses to invest in innovative greening of every available space, including roofs, walls, and public parks.
Nudge vehicles into main streets to achieve natural pedestrian-friendly superblocks: Encouraging motor vehicles to use large, efficient roads and avoid smaller roads other than for access enables prioritisation of pedestrians and cyclists, which delivers opportunities for public street events and activities, and development of green space. Meanwhile, public transport can be nearby and accessible.
Make active transport the most convenient way to get around: An affordable, efficient, reliable and extensive public transport system can nudge a natural reduction in cars and prioritise pedestrians and bicycles around station-centred residential, shopping, social and service hubs. Combined with a culture of biking as a family transport method, this:
Promotes walking and biking as the safer, more convenient option
Drives demand for pedestrian infrastructure (such as overpasses and underpasses to access stations and services)
Drives demand for fine-grained, human-scale streetfronts that provide welcoming, interesting, engaging aspects to pedestrians, including shops and cafes. Such streetscapes help reduce negative thoughts, improve walkability and pro-social engagement with neighbours, and help increase feelings of safety in backstreets.
Reduces traffic on residential streets, reducing light and noise and promoting better sleep.
Make social exercise easy: Public transit access to exercise locations (from sports facilities to hiking trails), plus publicly accessible water, locker and shower facilities for joggers, can help facilitate social exercise.
Integrate spiritual centres with the wider community: Temples, shrines, and other types of spiritual centres often contain potentially welcoming public spaces in cities otherwise lacking in available public space; the community could be further drawn in where appropriate, for example through local festivals and retail corridors connecting these buildings’ open spaces to the rest of the community.
Harness indoor public spaces for better mental health: Where wide-open urban public spaces are not available outdoors, innovative investment in interior placemaking can seek to achieve mental health benefits by designing green, active, pro-social spaces into indoor, densely-frequented places such as shopping malls.
Use innovative design to help prevent suicide: Suicide reduction is not simply about physical barriers; psychological deterrents may be explored, such as blue lights and images of nature at high-risk train stations.
5 Recommendations for Tokyo to improve public mental health through urban planning and design
Increase awareness of the links between urban design and mental health: This study revealed limited recognition and understanding of mental health by urban planners and designers in Tokyo. As such, the opportunity to promote good public mental health is not being systematically considered in their projects. Further awareness-raising and education for policymakers could articulate the opportunity and help create demand. Policymakers and professional organisations could develop policies, guidelines and incentives for architects, planners and developers to systematically integrate population mental health considerations into their projects. And architects, planners and other city designers could develop knowledge and skills that would enable them to leverage public mental health to increase the value of their projects. This could result in better mental health being designed into Tokyo.
Realise the cycling opportunity: Bicycles are currently conceived primarily as a family transport utility, and most rides are short; this is valuable, but to fully reap the productivity and physical and mental health benefits of cycling (including counteracting the effects of long working hours), companies’ insurance policies could evolve to cover commuting by bike or reduce companies’ commuting liability, and investments could be made in cycling infrastructure in the city (such as more protected bike lanes, cycle routes that pass through natural settings, and bike parking) and in the office (such as showers, lockers and bike parking). This would help promote longer and more vigorous bike rides, delivering the associated health benefits of physical exercise and nature exposure, in addition to environmental benefits.
Harness waterways for better mental health and wellbeing: Tokyo’s waterways remain largely untapped natural spaces that could provide more green and blue spaces for walking, watersports, relaxing and socialising.
Design public spaces for social interaction: Tokyo has fewer obvious pro-social public spaces than some other cities (though it may have more regular organised community events than many). Currently, many green spaces are carefully tended and cordoned off, and do not encourage casual use such as picnics and ball games; train station plazas are often empty; areas adjacent to temples and shrines may have further development opportunities; and 'placeless' shopping areas associated with train stations could be improved. Innovative design can help increase opportunities for positive, natural social interactions. This may include street seating, street games, outdoor gyms, nature installations, and public gathering spaces for festivals, markets and other local events.
Optimise the workplace for better mental health: Tokyo’s long working hours compared to other cities, along with often long commutes, mean that Tokyo citizens miss out on quality time for leisure, nature access, exercise, and socialising on work days. Urban designers can help integrate these protective factors into the ‘work pathway’ to help promote good mental health. This includes the commute to and from work (opportunities for physical activity, nature exposure, relaxing settings, and efficiency, including management of overcrowding on public transport) and the work setting (access to nature, including views of nature, pictures of nature, office gardens and office greenery; circadian lighting; opportunities for social interaction; privacy; choices about types of workspaces and settings; physical activity within the office; and support of physical activity in the office commute).
Acknowledgements
Thank you to Professor Keiko Nakamura, Head of the Department of Global Health Entrepreneurship at Tokyo Medical and Dental University, for hosting this research and providing support, advice and introductions.
Gascon, M., Triguero-Mas, M., Martínez, D., Dadvand, P., Forns, J., Plasència, A., & Nieuwenhuijsen, M. J. (2015). Mental Health Benefits of Long-Term Exposure to Residential Green and Blue Spaces: A Systematic Review. International Journal of Environmental Research and Public Health, 12(4), 4354–4379. http://doi.org/10.3390/ijerph120404354
Koyama A, Miyake Y, Kawakami N, Tsuchiya M, Tachimori H, Takeshima T. Lifetime prevalence, psychiatric comorbidity and demographic correlates of “hikikomori” in a community population in Japan. Psychiatry Research 2010;176(1):69-74.
Matsubayashi, Tetsuya et al. Does the installation of blue lights on train platforms prevent suicide? A before-and-after observational study from Japan. Journal of Affective Disorders 147(1):385–388.
Park BJ, Tsunetsugu Y, Kasetani T, Kagawa T, Miyazaki Y. The physiological effects of Shinrin-yoku (taking in the forest atmosphere or forest bathing): evidence from field experiments in 24 forests across Japan. Environmental Health and Preventive Medicine. 2010;15(1):18-26.
Tsuji T, Sasaki Y, Matsuyama Y, et al. Reducing depressive symptoms after the Great East Japan Earthquake in older survivors through group exercise participation and regular walking: a prospective observational study. BMJ Open 2017;7:e013706. doi: 10.1136/bmjopen-2016-013706
Anna Chang was born in Taipei, Taiwan, and currently lives in Los Angeles, USA. She has a Bachelor of Architecture degree from the Southern California Institute of Architecture (SCI-Arc). With an interest in residential and educational architecture and in graphic design, she plans to develop a career in the field of architectural design and tectonics from different cultures.
Harpy is a "Fire and Forget" autonomous weapon, launched from a ground vehicle behind the battle zone.
The Harpy weapon detects, attacks and destroys enemy radar emitters, hitting them with high accuracy. Harpy effectively suppresses hostile SAM and radar sites for long durations, loitering above enemy territory for hours.
Harpy is an all-weather day/night system. Harpy is in production, and is operational with several Air Forces.
GOV.UK Pay makes it easy for departments and agencies across government to take payments. It provides a standard set of GOV.UK branded pages which can be incorporated into a service to take payments, and an admin console which enables services to administer and process payments taken through GOV.UK Pay. Much of GOV.UK Pay’s functionality can be incorporated into existing case management tools via our API.
During beta, GOV.UK Pay will support credit and debit card payments only. Over time, further payment methods, such as direct debit or eWallet, will be added.
Your service only needs to integrate with GOV.UK Pay once to let your users make credit and debit card payments. When GOV.UK Pay is expanded to accept other payment methods, your service will not have to undertake any new integration. Using GOV.UK Pay will save your team time compared to implementing support for multiple PSPs and payment methods from scratch.
The platform currently supports one-off payments (like fees, fines or licence payments). In the future, it will also support recurring payments, for example, monthly tax payments.
The GOV.UK Pay platform does not currently support payments to cardholders, for example, payments of benefits, or grants. The platform only supports taking payments, or providing refunds.
There is no charge to use GOV.UK Pay. You will still need to pay your PSP’s transaction fees.
There are five departments and agencies currently partnering with the platform:
Ministry of Justice
Home Office
Department for Business, Energy & Industrial Strategy
To start using the GOV.UK Pay platform, you’ll need:
a new or existing digital service that needs to take payments
people with the development skills to build the technical integration with GOV.UK Pay
your own PSP contracts for gateway and merchant acquirer services (there are existing cross-government contracts offering competitive transaction costs that may be available to you - please email us at govuk-pay-support@digital.cabinet-office.gov.uk if you’d like to discuss contractual questions)
points of contact to communicate with the GOV.UK Pay team - this is likely to be your service’s product owner or service manager, as well as a technical lead
If you are new to the GOV.UK Pay API, we recommend you take a look at the quick start guide, which explains how to access the API and start exploring what it can do.
The GOV.UK Pay platform is based on REST principles with endpoints returning data in JSON format, and standard HTTP error response codes. The platform API allows you to:
initiate and complete payments
view the event history for individual payments
view transactions within a specified time period
provide full or partial refunds
If your department or agency is participating in the beta, it will have a GOV.UK Pay service account (in some cases, it may have several service accounts, one for each service that is going to integrate with Pay).
To use the GOV.UK Pay API, you will need your own individual staff account. A staff account is linked to a service account and can be used to create API keys for that service, as well as access the transaction history and service settings.
Have your service manager email govuk-pay-support@digital.cabinet-office.gov.uk to get your staff account credentials. You’ll receive access to a sandbox account which allows you to familiarise yourself and your developers with the platform before processing real payments. The GOV.UK Pay team will set up your service and provide you with login details for your account.
You must store your API keys securely. Make sure you never share these keys in publicly accessible documents or repositories, or with people who shouldn’t be using the GOV.UK Pay API directly. Read our security section for more information on how to keep your API keys safe.
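As an illustration, one common way to keep an API key out of your codebase is to load it from an environment variable at runtime. This is a minimal sketch in Python; the variable name GOVUK_PAY_API_KEY is illustrative, not something GOV.UK Pay specifies.

import os

# Read the key from the environment so it never appears in source control.
# The variable name below is an example only.
API_KEY = os.environ["GOVUK_PAY_API_KEY"]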
API Explorer setup
The quickest way to learn about the API is to use the API Explorer with the API key that you just created.
In the resulting pop-up, enter the following values:
For API Key, enter “[YOUR-API-KEY]” (do not include the quotation marks), replacing [YOUR-API-KEY] with the actual value of your sandbox API key, as shown in the screenshot below. You do not need to put the “Bearer: ” prefix which is required when calling the API from code; the API Explorer adds that automatically.
For Label, enter “Authorization” (do not include the quotation marks).
Make sure you are using an API key from your sandbox account on the self-service site, not the production account.
Making a test API call
To test the API Explorer, select Create new payment from the API Explorer Action dropdown menu. Click on the Body tab lower down to see an example JSON body that you would send when creating a payment.
As well as details of the payment, you’ll notice that you need to send a return_url when creating a payment. The reason for this is that users go to GOV.UK Pay hosted pages to actually make their payment.
The return_url is the URL of a page on your service that the user will be redirected to after they have completed their payment (or payment has failed).
If the API Explorer is set up correctly, you will receive a 201 Created response with a JSON body, confirming that the payment was created. Note that the JSON includes a next_url link. This URL is where your service should redirect the user for them to make their payment.
Go to the next_url with your browser. You’ll see the payment screen. Refer to the Testing GOV.UK Pay section to find a mock credit card number that you can enter to simulate a payment in the sandbox environment. For the rest of the details, enter some test information, bearing in mind that:
the expiry date must be in the future
the postcode must be valid
Submit the payment.
Go to the service admin site. Select Transactions at left. You’ll see the payment you just made.
This section outlines how your service will interact with GOV.UK Pay after integration.
Overview of payment flow
When an end user needs to make a payment, your service makes an API call to create a new payment, then redirects the user to the payment pages hosted on GOV.UK Pay.
The end user enters their payment details (for example, credit/debit card details and billing address) on the Pay pages. Pay handles all the details of verifying the payment with the underlying Payment Service Provider.
After the transaction reaches a final state, the end user is then redirected back to your service.
A final state means that the transaction:
succeeds
fails (for example because the payment details are wrong)
cannot be completed because of a technical error
was cancelled by your service
When the user arrives back at your service, you can use the API to check the status of the transaction and show them an appropriate message.
amount: in pence; in this example, the payment is for £145
reference: This is the reference number you wish to associate with this payment. The format is up to you, so if you have an existing format, you can keep using it with Pay (maximum 255 characters; it must not contain URLs)
description: A human-readable description of the payment; this will be shown to the end user on the payment pages and to your staff on the GOV.UK Pay self-service site (maximum 255 characters; it must not contain URLs)
return_url: This is an HTTPS URL on your site that the user will be sent back to once they have completed their payment attempt on GOV.UK Pay (this should not be JSON-encoded as backslashes will not be accepted)
This is the header and the first part of the JSON body of the response to the Create new payment API call that your service will receive:
HTTP/1.1 201 Created
Content-Type: application/json
Location: https://publicapi.pymnt.uk/v1/payments/icus7b4umg4b4g5fat4831es5f
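And a representative first part of the JSON body (illustrative values; the payment_id matches the Location header above):

{
  "amount": 14500,
  "state": {
    "status": "created",
    "finished": false
  },
  "description": "Pay your council tax",
  "reference": "12345",
  "payment_id": "icus7b4umg4b4g5fat4831es5f",
  "_links": {
    "self": {
      "href": "https://publicapi.pymnt.uk/v1/payments/icus7b4umg4b4g5fat4831es5f",
      "method": "GET"
    },
    "next_url": {
      "href": "https://www.pymnt.uk/secure/...",
      "method": "GET"
    }
  }
}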
The beginning of the response confirms the properties of the attempted payment.
The self URL (also provided in the Location header of the response) is a unique identifier for the payment. It can be used to retrieve its status details in future.
The next_url is the URL where you should now direct the end user. It points to a payment page hosted by GOV.UK Pay where the user can enter their payment details. Note that this is a one-time URL; after it’s been visited once, it will give an error message.
When your service redirects the user to the next_url, they see a GOV.UK Pay payment page. If the next_url has already been used, an error page is shown instead; its link to try the payment again sends the user to the return_url you provided in the initial request.
Payment flow: after payment
After the user attempts payment, GOV.UK Pay returns them to the return_url you provided in the initial request, whatever the status of the payment.
The return_url should specify a page on your service. When the user visits the return_url, your service should:
- match the returning user with their payment (with a secure cookie, or a secure random ID string included in the return_url)
- check the status of the payment using an API call
See the Integration details section for more details about how to match the user to the payment.
To check the status of the payment, you must make a Find payment by ID API call, using the payment_id of the payment as the parameter.
The URL to do this is the same as the self URL provided in the response when the payment was first created.
The response body contains information about the payment encoded in JSON format. Here is a representative beginning of a typical response (illustrative values):
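{
  "amount": 14500,
  "state": {
    "status": "success",
    "finished": true
  },
  ...
}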
The state object within the JSON lets you know the outcome of the payment:
The status value describes a stage of the payment journey.
The finished value indicates whether the payment journey is complete; if it is true, the status of this payment will not change again.
In this example, the payment was successful, and the payment journey is finished.
It is up to your page at the return_url to show an appropriate message based on the state of the payment. For example, for a completed payment, you would likely want to confirm that the payment has been received and explain what will happen next. For a failed payment, you should make clear that payment failed and offer the user a chance to try again.
Now that you understand the payment process, see the Integration details section for more about how you can integrate your service with GOV.UK Pay.
The GOV.UK Pay platform is based on REST principles, with endpoints returning data in JSON format and standard HTTP response codes. The platform API allows you to:
initiate and complete payments
view the event history for individual payments
view transactions within a specified time period
provide full or partial refunds
API overview
The base URL for the API is https://publicapi.payments.service.gov.uk/
The same base URL is now used for testing and production. The API key you use determines if the actions are treated as sandbox test payments or processed as real payments.
For full details of each API action, see the API Browser:
You can also use our interactive API Explorer to try out API calls and view responses.
See the Quick Start Guide section for how to set up the API Explorer. Make sure you enter your sandbox API key to avoid generating real payments!
API authentication
GOV.UK Pay authenticates API calls with OAuth2 HTTP bearer tokens. These are easy to use and consist of one component: your API key. Bearer tokens are specified in RFC 6750.
When making an API call, you’ll need to add your API key to an “Authorization” HTTP header and prefix it with “Bearer ”. This is an example of how a header would look.
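For example, with a placeholder instead of a real key:

Authorization: Bearer YOUR-API-KEY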
You can check the status of a given payment using the Find payment by ID API call.
The response will include a status value as described in the table below, and a true/false finished value which indicates whether the status can still change.
| status value | Meaning | finished value |
| --- | --- | --- |
| created | Payment created; user has not yet visited next_url | false |
| started | User has visited next_url and is entering payment details | false |
| submitted | User has submitted payment details but has not yet clicked Confirm | false |
| success | User successfully completed the payment | true |
| failed | User didn’t complete the payment, due to invalid or expired payment details, a fraud block, etc. | true |
| cancelled | Your service cancelled the payment using an API call or the self-service site | true |
| error | Something went wrong with GOV.UK Pay or the underlying Payment Service Provider | true |
HTTP status codes
You will encounter typical HTTP success and error response codes when using the Pay API. Generally any:
100 code is informational
200 code indicates you’ve been successful
300 code indicates a redirection
400 code indicates a client error (your error)
500 code indicates a server error (something went wrong on the GOV.UK Pay end)
These are the response codes you are likely to receive:

| Code | Meaning |
| --- | --- |
| 200 | Payment information request succeeded |
| 201 | Payment has been created |
| 204 | The server successfully processed the request, but is not returning any content |
| 400 | The server cannot process the request due to a client error, e.g. missing details in the request or a failed payment cancellation |
| 401 | Required authentication has failed or not been provided |
| 404 | The resource you want cannot be found |
| 412 | Precondition failed, e.g. a mismatch in the expected refund amount available |
| 422 | Unprocessable entity: the request failed validation |
| Any 500 error | Something is wrong with GOV.UK Pay; please contact us |
API error codes
When an error occurs, you will receive these API codes in the body of the response.
This is the format of the general JSON error response body:
{"code":"PXXXX","description":"Message explaining the error"}
Note that the description provided is written to be informative to you, the developer, and is not intended for the end user.
Also note that extra keys, e.g. field, may be provided on a per-error basis.
These error codes provide more information about why a request failed.
| Request type | Error code | Meaning | Cause |
| --- | --- | --- | --- |
| Create payment | P0101 | Missing mandatory attribute | The request you sent is missing a required attribute |
| Create payment | P0102 | Invalid attribute value | The value of an attribute you sent is invalid |
| Create payment | P0197 | Unable to parse JSON | The JSON you sent in the request body is invalid |
| Create payment | P0198 | Downstream system error | Internal error with GOV.UK Pay; contact us, quoting the error code |
| Create payment | P0199 | Account error | There is a problem with your service account; contact us, quoting the error code |
| Find payment by ID | P0200 | paymentId not found | No payment matched the paymentId you provided |
| Find payment by ID | P0298 | Downstream system error | Internal error with GOV.UK Pay; contact us, quoting the error code |
| Return payment events by ID | P0300 | paymentId not found | No payment matched the paymentId you provided |
| Return payment events by ID | P0398 | Downstream system error | Internal error with GOV.UK Pay; contact us, quoting the error code |
| Search payments | P0401 | Invalid parameters | The parameters you sent are invalid |
| Search payments | P0402 | Page not found | The requested page of search results does not exist |
| Search payments | P0498 | Downstream system error | Internal error with GOV.UK Pay; contact us, quoting the error code |
| Cancel payment | P0500 | paymentId not found | No payment matched the paymentId you provided |
| Cancel payment | P0501 | Cancellation failed | Cancelling the payment failed; contact us, quoting the error code |
| Cancel payment | P0598 | Downstream system error | Internal error with GOV.UK Pay; contact us, quoting the error code |
| Create refund | P0600 | paymentId not found | No payment matched the paymentId you provided |
| Create refund | P0601 | Missing mandatory attribute | The request you sent is missing a required attribute |
| Create refund | P0602 | Invalid attribute value | The value of an attribute you sent is invalid |
| Create refund | P0603 | Refund not available | The payment is not available for refund |
| Create refund | P0604 | Refund amount available mismatch | The refund_amount_available value you provided does not match the true amount available to refund |
| Create refund | P0697 | Unable to parse JSON | The JSON you sent in the request body is invalid |
| Create refund | P0698 | Downstream system error | Internal error with GOV.UK Pay; contact us, quoting the error code |
| Get refund | P0700 | refundId not found | No refund matched the refundId you provided |
| Get refund | P0798 | Downstream system error | Internal error with GOV.UK Pay; contact us, quoting the error code |
| Get refunds | P0800 | refundId not found | No refund matched the refundId you provided |
| Get refunds | P0898 | Downstream system error | Internal error with GOV.UK Pay; contact us, quoting the error code |
| General | P0900 | Too many requests | Your service account is sending requests above the allowed rate; try the request again in a few seconds |
| General | P0920 | Request blocked by security rules | Our firewall blocked your request. See the Troubleshooting section for details |
| General | P0999 | GOV.UK Pay is unavailable | The GOV.UK Pay service is temporarily down |
Card types
These are the possible values of the card_brand parameter.
| card_brand | type | label |
| --- | --- | --- |
| visa | DEBIT | Visa |
| visa | CREDIT | Visa |
| master-card | DEBIT | Mastercard |
| master-card | CREDIT | Mastercard |
| american-express | CREDIT | American Express |
| diners-club | CREDIT | Diners Club |
| discover | CREDIT | Discover |
| jcb | CREDIT | Jcb |
| unionpay | CREDIT | Union Pay |
API rate limits
There is a maximum rate limit for requests to the API from your service account. The limit is high and most services are unlikely ever to exceed it.
If you do exceed the limit (that is, send a large number of requests in a short amount of time), you will receive P0900 errors. If this happens, you can attempt any rate-limited requests again after a second has passed.
Please contact us if you want to discuss the rate limiting applied to your service account.
This section gives more technical detail about how to integrate your service with GOV.UK Pay.
Creating a payment
When you make a payment, you will need to supply a reference. This is a unique identifier for the payment. If your organisation already has an existing identifier for payments, you can use it here.
You will also need to supply a return_url, a URL hosted by your service for the user to return to after they have completed payment on GOV.UK Pay. See the section below on Choosing the return URL for more information.
You will need to store the URL from the Location header (the same URL also appears in the self section of links in the JSON body). This URL contains the GOV.UK Pay payment_id, which uniquely identifies the payment. An authenticated GET request to the URL will return information about the payment and its status.
It is important that you do not expose the URL with the payment_id publicly, for example as a URL parameter or in an insecure cookie. You should store it as a secure cookie or in a database.
You will receive the next_url to which you should direct the user to complete their payment. During the GOV.UK Pay beta, it is only returned in response to the initial POST call that creates the payment, not on subsequent calls. It will only work once.
Tracking the progress of a payment
You can track the progress of a payment while the user is on GOV.UK Pay using the Find payment by ID call.
NOTE: The status of the payment will go through several phases until it either succeeds or fails. See the API reference section for more details.
Choosing the return URL and matching user to payment
For security reasons, GOV.UK Pay does not add the payment ID or outcome to your return_url as parameters.
To match up a returning user with their payment, there are two recommended methods:
use a secure cookie containing the Payment ID from GOV.UK Pay, issued by your service when the payment is created (before sending the user to next_url). Users won’t be able to decrypt a secure cookie, so a fraudster could not alter the payment ID and intercept other users’ payments.
create a secure random ID (such as a UUID) and include this as part of the return_url, using a different return_url for each payment. Since a securely generated UUID is not guessable, fraudsters will not be able to intercept users’ payments.
Note: If you create an ID yourself, you’ll likely need to store this in a datastore mapped to the payment ID just after you create a payment.
Accepting returning users
A user directed to the return URL could have:
- paid successfully
- not paid because their card was rejected or they clicked cancel
- not paid because your service (via the API) or service manager (via the self-service dashboard) cancelled the payment in progress
Your service should use the API to check the payment status when the user reaches the return URL, and provide an appropriate response based on the final status of the payment attempt.
When a user doesn’t complete their payment journey
The user may close their browser or lose internet connection in the middle of the payment flow on GOV.UK Pay. These users will not be redirected back to your service.
You can still check on the status of these payments by making a GET request using the Location header or self link, the same way you would if they had been redirected, just after a set time (e.g. an hour).
Note: GOV.UK Pay will eventually expire incomplete payments, but you should expect an occasional success or failure if the user experienced problems right at the moment of the redirect.
If a user does not have enough funds in their account to make a payment, the current GOV.UK Pay frontend will not let them try again with separate card details. This will soon be fixed as part of the beta.
Cancelling a payment
You can cancel a payment that is not yet in a final state by using the Cancel payment API call.
Financial reporting integration
If you’re a beta partner, the GOV.UK Pay team will hold technical workshops with you to discuss how to integrate the reporting from GOV.UK Pay with your own financial systems.
With our API, you’ll be able to:
GET the status of an individual payment
cancel a payment that’s not yet captured
get the status of multiple payments based on certain criteria, e.g. date range
issue full and partial refunds
GOV.UK Pay now supports refunding payments. You can choose to refund part of a payment, or the full amount.
After issuing a partial refund of a payment, you can issue further partial refunds, until the full amount of the original payment has been refunded.
Each payment has a refund status which can take one of the following values:
| Payment refund status | Meaning |
| --- | --- |
| pending | The payment is potentially refundable, but is not ready to be refunded yet |
| unavailable | It is not possible to refund the payment: for example, the payment failed |
| available | It is possible to initiate a refund. Note that this does not mean that the full original value of the payment is available to refund: there may have been previous partial refunds |
| full | The original value of the payment has been fully refunded, so it is not possible to refund any more |
In the sandbox, you will not see the pending status as there is no delay in processing a payment. In a live environment, successful payments will spend some time in pending state before a refund becomes possible.
This refund status is a property of a payment; each refund will also have its own status of submitted/success/error.
Payment refund status and partial refunds
You can find out the refund status of a payment with the API using the Find payment by ID or Search payments functions.
The response will contain a refund_summary section. Here is a representative example of that section of the response for a completed £50 payment with no previous refunds (surrounding fields omitted):
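"refund_summary": {
  "status": "available",
  "amount_available": 5000,
  "amount_submitted": 0
}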
In this example, the refund status of the payment is available, indicating that a refund can be initiated. The amount_available value is 5000: that is, £50 is available to be refunded.
The amount_submitted is 0, showing that there have been no previous refunds.
Partial refunds are possible, and you can make multiple partial refunds against the same payment. As you’d expect, the total of refunds against a payment can’t be greater than the original payment.
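For example, a payment might have this refund_summary (illustrative values; surrounding fields omitted):

"refund_summary": {
  "status": "available",
  "amount_available": 6000,
  "amount_submitted": 3000
}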
In this case, the original payment was for £90. The amount_available value shows that only £60 is available to be refunded, because £30 has already been refunded in one or more partial refunds (as shown by amount_submitted).
If you needed to know the details of the partial refunds (for example, whether there had been a single refund of £30 or multiple smaller refunds), you could use the API function to Get all refunds for a payment.
Specifying the expected refund available
When you submit a refund request for a payment via the API, you can optionally specify a refund_amount_available value in the body of the request.
This is so you can provide the total amount of the original payment you expect to be available for refund at the time your refund request is made.
The purpose of this is to prevent accidentally processing a partial refund more than once, by rejecting requests where your refund_amount_available doesn’t match the real amount that’s available to be refunded.
For example, suppose a payment was made for £5, but later it turns out the user is due a £2 refund. Your system for processing refunds submits a request for a £2 refund to our API, but it accidentally gets sent twice. Without a refund_amount_available specified, GOV.UK Pay would have no way to tell that the second request was a mistake, so it would process both requests, generating two refunds of £2 each.
Now imagine the same scenario, but with the refund_amount_available specified as £5 in each request. The first request still succeeds, leaving £3 available to be refunded. When the accidental duplicate request comes in, it has a refund_amount_available of £5, even though only £3 is available, so GOV.UK Pay can tell that it’s a stale request, and it is rejected.
We recommend that your service tracks the expected refund amount available and submits a refund_amount_available value whenever you request a refund via the API.
When a refund request is rejected due to a refund amount available mismatch, the error code returned is P0604, with an HTTP status of 412.
You need to specify the paymentId of the original payment, and provide the amount to refund (in pence).
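A sketch of such a request, assuming the refund endpoint follows the documented pattern (values illustrative; refund_amount_available is optional, as described above):

POST /v1/payments/{paymentId}/refunds
{
  "amount": 200,
  "refund_amount_available": 500
}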
You should check that the amount you attempt to refund does not exceed the amount_available value; otherwise you will receive a P0603 error like this:
{
"code": "P0603",
"description": "The payment is not available for refund. Payment refund status: amount_not_available"
}
Each refund has a unique refund_id.
You can use the Get all refunds for a payment function to get information about all the refunds for a payment (including their refund_ids).
You can retrieve information about an individual refund using the Find payment refund by ID function.
Handling refund errors
When you try to create a refund with the API, it may fail immediately - for example if you try to refund more than the amount available. In that case, the original Submit a refund for a payment request will return an error code and a description of what it means. (A refund attempt that fails like this with an error code is not assigned a refundId and is not available using Find payment refund by ID).
If accepted by GOV.UK Pay, a refund may still go on to fail at the PSP. This may happen if the card involved is cancelled or has expired, or if your account with the PSP does not have enough funds to cover the refund.
Each refund has a processing status indicated by a “status” value that is returned in response to a successful request to a /refunds/ endpoint.
This is a representative partial example of a response including the refund’s processing status (the refund_id is a placeholder):
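{
  "refund_id": "<REFUND-ID>",
  "amount": 200,
  "status": "submitted",
  ...
}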
Initially, in a live environment, the status returned will be submitted. After the PSP has processed the refund, the status returned will be success or error. (In the sandbox environment, the status will always go straight to success).
| Refund processing status | Meaning |
| --- | --- |
| submitted | The refund request is valid as far as GOV.UK Pay can tell and has been passed to the underlying payment processor |
| success | The refund has been successfully processed |
| error | It was not possible for the payment processor to make the refund |
To handle this, you must use Find payment refund by ID to check the processing status of the refund until it changes to either success or error.
It will typically take 30 minutes for the status to change. We suggest you check the status after 30 minutes, and do not repeat more than once every 5 minutes.
In the event of an error, GOV.UK Pay will not currently provide any more information. Please contact us if more information is required about why a refund failed.
Refunds from the admin site
As an alternative to refunding via the API, you can use the service admin site at https://selfservice.payments.service.gov.uk to view transactions and issue refunds.
Go to the Transactions section of the site to see a list of transactions.
In this list, click on the reference for an individual payment (in the Reference number column) to see details of that payment (including any previous refunds).
In the details view, you can use the red Refund payment button at the upper right to carry out a full or partial refund.
End users are automatically notified by email about payments (according to the settings you have entered in the self-service admin site), but not when a payment is refunded (either manually or via the API). You should arrange to notify end users about refunds as appropriate for your service.
You will receive a sandbox account for testing in addition to your production credentials.
When testing, you’ll need to ensure:
you’ve tested with mock card numbers (see below) to simulate both successful and unsuccessful transactions - never use real card numbers for testing purposes, as this breaks PCI rules
you test with your sandbox account, not your production account
REST calls succeed with 200-series (success) response codes
you’ve tested the user journey from your service to the payments platform using end-to-end/smoke tests
Integration testing
To check your integration with GOV.UK Pay is working as expected, you’ll need to run a series of tests.
We recommend that you build tests to include both the GOV.UK Pay API and its front end user journey. We are constantly iterating our interface, so you shouldn’t rely on any specific page layout, but you can build tests that address form elements (such as buttons) using their IDs. Alternatively, you can build stubs that will emulate GOV.UK Pay functionality.
Our APIs will evolve over time. We will always let you know in advance if we intend to make any breaking, or backwards-incompatible API changes so you can ensure your service works with the new version. Please see our section on versioning for more information.
There is guidance in the GOV.UK Service Manual on smoke testing. At the Government Digital Service, we tend to use Cucumber for testing (regardless of the core code language), as you can easily describe the behaviour you expect at the appropriate level.
When you’re testing your integration, you must not use real card numbers. Use the below test numbers.
When you’re using these card numbers, you can enter any valid value for the other details (name, expiry date, card security code etc). For example, it doesn’t matter what expiry date you enter, but it must be in the future.
Still in Worldpay, go to Profile > Merchant Channel and set the endpoint for HTTP notifications from Worldpay to GOV.UK Pay to https://notifications.payments.service.gov.uk/v1/api/notifications/worldpay
Use the same URL for both the Test and Production channels within Worldpay.
If you want to use 3D Secure authentication for your payments, ask your Worldpay account manager to configure your merchant code (you cannot do this yourself by logging in to Worldpay). Ask them to enable 3D Secure for all payments. Once this is done, you can sign in to the GOV.UK Pay self-service admin site and turn on 3D Secure.
Emergency contact details
Your service manager should have been given details of emergency contact methods to reach our support team in case of an urgent problem (for example, if you suspect that fraudulent transactions are being made on your account).
Before you enter production, you should make sure that the right people on your team know how to report an emergency.
Integrating with existing reporting systems
If you’re a beta partner, the GOV.UK Pay team will hold technical workshops with you to discuss how to integrate the reporting from GOV.UK Pay with your own financial systems.
This section explains how to troubleshoot common problems.
Code P0920 errors
Problem: Some calls you make to the API receive a 400 Bad Request response, with this error in the response body:
{"code":"P0920","description":"Request blocked by security rules. Please consult API documentation for more information."}
Fix: This response means that your API call was blocked by our firewalls. Check that the API call you sent was correctly formed.
Possible reasons why your call may be rejected include:
sending a POST call with an empty body where content in the body is expected
the use of invalid characters such as <, >, |, " or backslashes
the return_url you provide is not https://
there are URLs inside the reference or description you provide
The code examples in the documentation don’t work
Problem: The “Example Request” code snippets in the API documentation always cause the request to fail with a “401 unauthorized” error.
Fix: Remember that you must send the API key with “Bearer ” in front of it (not including the quotation marks). This is not made clear in the code examples due to a technical limitation.
If the code example includes a line like this:
request["authorization"]='YOUR API KEY HERE'
remember that the value you use must actually start with “Bearer ”, like this (placeholder key):
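request["authorization"]='Bearer YOUR API KEY HERE'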
When we add new properties to the JSON responses, the Pay API version number will not change. You should develop your service to ignore properties it does not understand.
New, dated versions of the public API will be released if JSON values are removed in a backwards incompatible manner. All these versions of the Pay API will be documented in our Revision History table below.
Our version number will be updated in the URL when there is a release. All releases will be marked with full version numbers.
Updating to the latest Pay API
We’ll send you an email to let you know about any new versions of our API.
If you want to update to our latest API, make sure you test your code before committing to the change.
We’ll support each version of the early beta Pay API for at least 1 month after we issue an upgrade notice. As the API matures, we may increase this support period to 3 or 6 months.
Soon, we hope to let you check which version of our API you’re currently running by checking your dashboard.
Revision history
| Version | Date | Author | Comments |
| --- | --- | --- | --- |
| 1.0 | | | |
Reporting vulnerabilities
If you believe GOV.UK Pay security has been breached, contact us immediately at govuk-pay-support@digital.cabinet-office.gov.uk. If you are a production user and the suspected breach is severe, consider using the urgent contact details provided to your service manager.
Please don’t disclose the suspected breach publicly until it has been fixed.
Securing your developer keys
The GOV.UK Pay platform will let you create as many API keys as you want.
We suggest letting all your developers experiment with their own test keys in the Sandbox environment, but keys for real integrations should only be shared with the minimum number of people necessary. This is because these keys can be used to create and manipulate payments. Do not commit these keys to public source code repositories.
Revoke your key immediately using the self-service site if you believe it has been accidentally shared or compromised.
If you believe your key has been used to make fraudulent payments, contact the GOV.UK support team using the urgent contact methods provided to your service manager.
To further secure your live developer keys:
don’t embed your developer keys in any of your code - this only increases the risk that they will be discovered (instead store your keys inside your configuration files)
don’t store your API keys in your application source tree (even when you’re not making your source code publicly available)
revoke your developer keys when they’re no longer required (this limits the number of entry points into your account)
have a leavers’ process, so that a developer’s API key is revoked when they leave
Securing your integration with GOV.UK Pay
Make sure you’ve fully tested your integration with GOV.UK Pay. When doing so, take care not to use any real card numbers. Read our testing section for more details.
Securing user information
GOV.UK Pay doesn’t store full card numbers or CVV data for security reasons. This means you won’t be able to search for transactions using card numbers. You’ll only be able to look up certain transactions using the:
payment ID
reference metadata put into the system when the payment was created
Payment Card Industry (PCI) compliance
A major benefit of integrating with GOV.UK Pay is that you’ll have immediate access to a fully secure and PCI accredited platform.
The Payment Card Industry Security Standards Council is an open global forum, launched in 2006, that develops, maintains and manages the PCI Security Standards. These cover everything from the point of entry of card data into a system, to how the data is processed, through to secure payment applications.
Compliance with PCI Security Standards is governed by the payment brands and their partners. Although the GOV.UK Pay team has completed the majority of work to ensure compliance with the PCI standard, your service may be asked by your payment provider to supply extra evidence on your internal security protocols. You will have to complete a self-assessment questionnaire called the PCI DSS SAQ that will include a series of yes or no questions about your security practices. Your service manager may also be asked to undertake security awareness training to ensure they are qualified to handle credit card data.
HTTPS
GOV.UK Pay follows government HTTPS security guidelines. The platform uses Hypertext Transfer Protocol Secure (HTTPS), which incorporates the Transport Layer Security (TLS) protocol, to authenticate servers and clients and to secure connections.
Your government service will only be able to integrate with GOV.UK Pay if it also uses HTTPS.
We are experimenting with accepting open source contributions from our live partner services, and we hope to widen this further after learning more about integrating outside contributions into our workflow.
If you’d like to speak to us about writing a developer library for GOV.UK Pay, we’d love to hear from you: please email us
In emergencies, Service teams can raise a P1 incident by emailing us. This will raise a high priority ticket and could alert the team in the middle of the night, so please only use it for P1 incidents. Please do not share this address with people outside your team.
| Classification | Description | First response | Updates |
| --- | --- | --- | --- |
| | Users experiencing intermittent or degraded service due to platform issue | 1 business day or next working day | 2 business days |
| P4 (Minor) | Component failure that is not immediately service impacting | 2 business days | weekly |
Please always raise support tickets using the email addresses supplied above rather than relying on a personal email address. This ensures your issue will be dealt with as soon as possible.
Support requests are raised both automatically and manually. Our automated monitoring and system alerts feed into our helpdesk ticketing system.
Communication
The Pay support team will contact you via two methods:
If you have raised a support request, we will communicate with you individually using our Deskpro ticketing system via govuk-pay-support@digital.cabinet-office.gov.uk. This will ensure that you are not relying on a single individual for comms.
We will provide general incident updates to all GOV.UK Pay users through an email mailing list. All users of GOV.UK Pay will be subscribed to this list. You can choose to unsubscribe, but should ensure that anyone who needs to know about incident or maintenance alerts is subscribed. If you would like someone subscribed to this list please email govuk-pay-support@digital.cabinet-office.gov.uk.
Who deals with support issues
At all times, we have a dedicated Support Lead and Secondary (rotated weekly) who draw on additional support as required from other members of the team.
Additionally, a manager will act as the Support escalation route.
For an incident, an Incident Lead and an Incident Comms person will be assigned.
How we respond
In general, our priority is to ensure high availability of the service.
However, in the event of a critical P1 incident we may need to isolate the issue and take GOV.UK Pay offline in order to minimise any security breach until we can investigate fully.
For general or non-urgent support queries our service levels are:

| First reply service level | First resolution service level |
| --- | --- |
| 80% within 2 working days | 80% within 5 working days |
Resolution and Closure
For all incidents:
Resolution occurs when the Incident Lead, in consultation with the initiator, confirms that the matter reported is no longer impacting the service.
Any follow up work will be tracked as part of business as usual.
An incident post mortem will be held for all P1 incidents, and for any others where one is needed. The outputs will be made available to our beta partners.
Feedback on support matters
If we are not responding to your requests for support in the way that you would expect or your problem is not addressed in this documentation, please email us, including any error messages you’re getting.
Newsletter and blogs
GOV.UK Pay publishes a newsletter every few weeks. This will help you stay updated with any new features we release.
More coverage of the platform can be found in press reports.
We’re also looking forward to producing case studies. Please bear with us.
If you are a beta partner and think your service could be featured in our case study section, please contact us.
This section defines commonly used terms in GOV.UK Pay.
| Term | Definition |
| --- | --- |
| Service admin site | the site where your staff users can sign in to view transaction history and settings for your integration with GOV.UK Pay |
| Payment pages | the pages on GOV.UK Pay which users follow to make a payment |
| End user | a user who is paying through GOV.UK Pay |
| Staff user | a user from a team delivering a government service which uses GOV.UK Pay to process payments; if you’re reading this, someone like you |
| Platform administrator | a member of the GOV.UK Pay team responsible for controlling access to the platform, ensuring effective support is provided to partner services, and improving the platform on the basis of user feedback |
| Platform developer | a member of the GOV.UK dev team |

Platform accounts

| Account type | Account definition |
| --- | --- |
| Staff account | an account for a staff user that grants appropriate access rights and permissions to administer payments for their service through GOV.UK Pay. Initially, all staff accounts will have the same permissions, but in future they will be distinguished by different permissions relating to the different user groups outlined: Service Admin, Accountant, Refund Manager, Developer |
| Service account | an account for a partner service with GOV.UK Pay. It is this account that staff users will be accessing. The service account associates all relevant information and preferences with a particular service (e.g. transaction history, selected providers, accepted payment methods, etc.) |
| Provider account | an account that a partner service has with a payment provider for payment processing. Each service may have separate provider accounts with multiple providers |
(Edit 5 Jan 2011: New Compression results section and small crinkler x86 decompressor analysis)
If you are not familiar with 4k intros, you may wonder how things are organized at the executable level to achieve this kind of packing performance. Probably the most essential component of 4k-64k intros is the compressor, and in that respect 4k intros have been well equipped for the past five years, as Crinkler is the best compressor developed so far for this category. It was created by Blueberry (Loonies) and Mentor (tbc), two of the greatest demomakers around.
Last year, I started to learn a bit more about the compression technique used in Crinkler. It started from some comments on pouet that intrigued me, like "crinkler needs several hundred megabytes to compress/decompress a 4k intro" (wow) or "when you want to compress an executable, it can take hours, depending on the compressor parameters"... I also observed poor compression results while trying to convert some parts of C++ code to asm code under crinkler... With this silly experiment, I realized that in order to achieve a better compression ratio, you need code that is compression-friendly but not necessarily smaller. In other terms, the smaller asm code is not always the best candidate for better compression under crinkler... so right, I needed to understand how crinkler works in order to write crinkler-friendly code...
I only had a basic knowledge of compression; the last book I bought about it was more than 15 years ago, for a presentation about JPEG compression in a physics course (that was a way to talk about computer-related things in a non-computer course!)... I remember that I didn't get far in the book, and stopped just before arithmetic coding. Too bad: that's exactly one part of crinkler's compression technique, and it has been widely used for the past few years (and studied for the past 40!), notably in codecs like H.264.
So wow, it took me a substantial amount of time to jump back on the compression train and read all those complicated statistical articles to understand how things work... but it was worth it! At the same time, I spent a bit of my time dissecting crinkler's decompressor, extracting the decompressor code in order to comment it and compare its implementation with my own little tests in this field... I had a great time doing this, although, in the end, I found that whatever I did, under 4k, Crinkler is probably the best compressor ever.
You will find here an attempt to explain a little bit more of what's behind Crinkler. I'm far from being a compression expert, so if you are familiar with context modeling, this post may sound a bit light, but I'm sure it could be of some interest for people like me who are discovering such things and want to understand how they make 4k intros possible!
Crinkler main principles
If you want a bit more information, you should have a look at the "manual.txt" file in the crinkler archive. You will find there lots of valuable information, ranging from why this project was created to what kind of options you can set for crinkler. There is also an old but still accurate PowerPoint presentation from the authors themselves, well worth a look, available here.
First of all, you will find that crinkler is not, strictly speaking, an executable compressor, but rather an integrated linker-compressor. In the intro dev tool chain, it is used as part of the build process in place of your traditional linker, with the difference that crinkler can compress its output. Why is crinkler better suited at this level? Most notably because, at the linker level, crinkler has access to the separate portions of your code and your data, and is able to move them around in order to achieve better compression. That said, I'm not completely sure about this choice: the same thing could probably be implemented as a standard exe compressor, relying on the relocation tables in the PE sections of the executable and a good disassembler like beaengine in order to move the code around and update references... So, crinkler, cr-linker, compressor-linker, is a linker with an integrated compressor.
Secondly, crinkler uses a compression method that is far more aggressive and efficient than the old dictionary-coder LZ methods: context modeling coupled with an arithmetic coder. As mentioned in the crinkler manual, the best place I found to learn about this is Matt Mahoney's resource website. This is definitely the place to start when you want to play with context modeling, as there is lots of source code, including previous versions of the PAQ program, from which you can gradually learn how to build such a compressor (most easily from the earlier versions, when the design was still simple to follow). Building a context-modeling based compressor/decompressor is accessible to almost any developer, but one of the strengths of crinkler is its decompressor size: around 210-220 bytes, which makes it probably the most efficient and smallest context-modeling decompressor in the world. We will also see that crinkler made one of the simplest choices for a context-modeling compressor, using a semi-static model in order to achieve better compression on 4k of data, which results in a less complex decompressor as well.
Lastly, crinkler optimizes its usage of the PE file (the Windows Portable Executable format, the binary format of a Windows executable file; the official description is available here), mostly by removing the standard import table and DLL loading in favor of a custom loader that exploits internal Windows structures, and by storing function hashes in the header of the PE file to recover the DLL functions.
Compression method
Arithmetic coding
The whole compression problem in crinkler can be summarized like this: what is the probability that the next bit to compress/decompress is a 1? The better the probability estimate (i.e. the more often it matches the actual bit), the better the compression ratio. Hence, crinkler needs to be a little bit psychic?!
First of all, you probably wonder why probability is important here. This is mainly due to a compression technique called arithmetic coding. I won't go into the details here and encourage the reader to look at the wikipedia article and related links. The main principle of arithmetic coding is its ability to encode into a single number a set of symbols for which you know the probability of occurrence. The higher the probability of a known symbol, the fewer bits are required to encode its compressed counterpart.
At the bit level, things get even simpler, since the symbols are only 1 or 0. So if you can provide a probability for the next bit (even if this probability is completely wrong), you are able to encode it through an arithmetic coder.
A simple binary arithmetic coder interface could look like this:
/// Simple ArithmeticCoder interface
class ArithmeticCoder {
/// Decode a bit for a given probability.
/// Decode returns the decoded bit 1 or 0
int Decode(Bitstream inputStream, double probabilityForNextBit);
/// Encode a bit (nextBit) with a given probability
void Encode(Bitstream outputStream, int nextBit, double probabilityForNextBit);
}
And a simple usage of this ArithmeticCoder could look like this:
// Initialize variables
Bitstream inputCompressedStream = ...;
Bitstream outputStream = ...;
ArithmeticCoder coder;
Context context = ...;
// Simple decoder implem using an arithmetic coder
for(int i = 0; i < numberOfBitsToDecode; i++) {
// Made usage of our psychic alias Context class
double nextProbability = context.ComputeProbability();
// Decode the next bit from the compressed stream, based on this
// probability
int nextBit = coder.Decode( inputCompressedStream, nextProbability);
// Update the psychic and tell him, how much wrong or right he was!
context.UpdateModel( nextBit, nextProbability);
// Output the decoded bit
outputStream.Write(nextBit);
}
So a binary arithmetic coder is able to compress a stream of bits, provided you can tell it the probability of the next bit in the stream. Its usage is fairly simple, although implementations are often really tricky and sometimes quite obscure (a real arithmetic coder implementation has to face lots of small problems: renormalization, underflow, overflow... etc.).
Working at the bit level here wouldn't have been possible 20 years ago, as it requires a tremendous amount of CPU (and memory for the psychic-context) to calculate/encode a single bit, but with today's computing power, it's less of a problem... Lots of implementations work at the byte level for better performance; some of them can work at the bit level while still batching the decoding/encoding results at the byte level. Crinkler doesn't care about this and works at the bit level, implementing the arithmetic decoder in fewer than 20 x86 asm instructions.
The C++ pseudo-code for an arithmetic decoder is like this:
int ArithmeticCoder::Decode(Bitstream inputStream, double nextProbability) {
    // 'range' and 'value' are members of the coder, persisted across calls.
    int output = 0; // the decoded symbol
    // Renormalization: shift in new input bits while the range is too small.
    while (range < 0x80000000) {
        range <<= 1;
        value <<= 1;
        value += inputStream.GetNextBit();
    }
    // Split the range in proportion to the probability of the next bit being 1.
    unsigned int subRange = (unsigned int)(range * nextProbability);
    range = range - subRange;
    if (value >= range) { // we have the symbol 1
        value = value - range;
        range = subRange;
        output++; // output = 1
    }
    return output;
}
This is almost exactly what is used in crinkler, but done in only 18 asm instructions! The crinkler arithmetic coder uses 33-bit precision. The decoder only needs to handle renormalization up to the 0x80000000 limit, while the encoder has to work on 64 bits to handle the 33-bit precision. This precision is much more convenient for the decoder, as it can easily detect renormalization (0x80000000 is in fact a negative number when interpreted as signed; the loop could have been formulated as while (range >= 0), and this is how it is done in asm).
So the arithmetic coder is the basic component used in crinkler. You will find plenty of arithmetic coder examples on the Internet. Even if you don't fully understand the theory behind them, you can use them quite easily. I found, for example, an interesting project called flavor, which provides a tool to generate arithmetic coder code from a formal description (for example, a 32-bit precision arithmetic coder description in flavor); pretty handy for understanding how different coder behaviors are translated.
But, ok, the real brain here is not the arithmetic coder... but the psychic-context (the Context class above), which is responsible for providing a probability and updating its model based on how its previous predictions worked out. This is where a compressor makes the difference.
Context modeling - Context mixing
This is one great point about using an arithmetic coder: it can be decoupled from the component responsible for providing the probability of the next symbol. This component is called a context modeler.
What is the context? It is whatever data can help your context modeler evaluate the probability of the next symbol. Thus, the most obvious data for a compressor-decompressor is previously decoded data, used to update its internal probability table.
Suppose you have already decoded the following sequence of 8 bytes: 0x7FFFFFFF, 0xFFFFFFFF. What will the next bit be? It is almost certainly a 1, and you could bet on it with a probability as high as 98%.
So it is no surprise that using the history of the data is the key for the context modeler to predict the next bit (and, well, we have to admit that our computer-psychic is not as good as it claims, as it needs to know the past to predict the future!).
Now that we know that producing a probability for the next bit requires historic data, how is crinkler using it? Crinkler in fact maintains a table of probabilities indexed by up to 8 previous bytes plus the bits already read from the current byte. In context-modeling jargon, this is often called the order (before context modeling, techniques were developed such as PPM, Prediction by Partial Matching, and DMC, Dynamic Markov Compression). But crinkler is not only using the last x bytes (up to 8); it also uses a sparse mode (as it is called in the PAQ compressors): a combination of some of the last 8 bytes plus the bits already read. Crinkler calls such a combination a model, and it is stored in a single byte:
The 0x00 model says that It doesn't use any previous bytes other than the current bits being read.
The 0x80 model says that it is using the previous byte + the current bits being read.
The 0x81 model says that it is using the previous byte and the -8th byte + the current bits being read.
The 0xFF model says that all 8 previous bytes are used.
You probably don't see yet how this is used. Let's take a simple case: using the previous byte to predict the next bit (the 0x80 model). Suppose we are decoding a sequence in which the byte 0xFF keeps being followed by the byte 0x80 (10000000b), so that 0xFF is always followed by a bit 1:
At position 0, we know that 0xFF is followed by bit 1 (0x80 <=> 10000000b). So n0 = 0, n1 = 1 (n0 denotes the number of 0s that follow 0xFF, n1 the number of 1s).
At position 1, we know that 0xFF is still followed by bit 1: n0 = 0, n1 = 2.
At position 2, n0 = 0, n1 = 3.
At position 3, we have n0 = 0, n1 = 3, making the probability of a one p(1) = (n1 + eps) / (n0 + eps + n1 + eps), with eps for epsilon; let's take 0.01. We have p(1) = (3+0.01)/(0+0.01 + 3+0.01) = 99.67%.
So we have a probability of 99.67% at position 3 that the next bit is a 1.
The principle here is simple: for each model and historic value, we associate n0 and n1, the number of 0 bits (n0) and 1 bits (n1) seen after that context. Updating those n0/n1 counters needs to be done carefully: a naive approach would be to just increment the matching counter whenever a training bit is found... but recent values are more likely to be relevant than older ones. Matt Mahoney explained this in The PAQ1 Data Compression Program, 2002 (describes PAQ1), where he shows how to efficiently update those counters for a non-stationary source of data:
If the training bit is y (0 or 1) then increment ny (n0 or n1).
If n(1-y) > 2, then set n(1-y) = n(1-y) / 2 + 1 (rounding down if odd).
Suppose for example that n0 = 3 and n1 = 4 and we see a new bit 1. Then n0 becomes n0/2 + 1 = 3/2 + 1 = 2 and n1 becomes n1 + 1 = 5 (see the sketch below).
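Here is a minimal C++ sketch of this update rule (a toy illustration, not crinkler's actual x86 implementation; like crinkler, it keeps one byte per counter and does not clamp at 255):

struct Counter {
    unsigned char n0 = 0; // number of 0 bits seen after this context
    unsigned char n1 = 0; // number of 1 bits seen after this context
};

// Update the counters for a newly decoded bit, fading out old history.
void updateCounter(Counter& c, int bit) {
    unsigned char& same  = bit ? c.n1 : c.n0;
    unsigned char& other = bit ? c.n0 : c.n1;
    same++;                                // increment ny
    if (other > 2) other = other / 2 + 1;  // halve the opposite counter (rounding down)
}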
Now we know how to produce a single probability for a single model... but working with a single model (for example, only the previous byte) wouldn't be enough to evaluate the next bit correctly. Instead, we need a way to combine different models (different selections of historic data). This is called context mixing, and this is the real power of context modeling: whatever your method to collect and calculate a probability, you can, at some point, mix several estimators to produce a single probability.
There are several ways to mix those probabilities. In pure context-modeling jargon, the model is the way you mix probabilities, and each model has a weight:
static: you determine the weights regardless of the data.
semi-static: you perform a 1st pass over the data to compress to determine the weights for each model, and then a 2nd pass with the best weights.
adaptive: weights are updated dynamically as new bits are discovered.
Crinkler uses semi-static context mixing, but it is also somewhat "semi-adaptive", because it uses different weights for the code of your exe and for the data of your exe, as they have different binary layouts.
So how is this mixed up? Crinkler needs to determine the best context models (combinations of historic data) to use, and assign each of those contexts a weight. The weight is then used to calculate the final probability.
For each selected historic model i with an associated model weight wi and bit counters ni0/ni1, the final probability p(1) is calculated like this:
p(1) = Sum(wi * ni1 / (ni0 + ni1)) / Sum(wi)
This is exactly what is done in the code above by context.ComputeProbability(), and this is exactly what crinkler does.
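For illustration, here is what a ComputeProbability implementing this formula could look like in C++ (a sketch under the same naming as the formula above, not crinkler's code):

#include <vector>

struct ModelCounters {
    double weight;   // wi, the weight of this model
    unsigned n0, n1; // bit counters for this model's current context
};

// p(1) = Sum(wi * ni1 / (ni0 + ni1)) / Sum(wi)
double computeProbability(const std::vector<ModelCounters>& models) {
    double num = 0.0, den = 0.0;
    for (const auto& m : models) {
        unsigned total = m.n0 + m.n1;
        if (total == 0) continue; // this model has no statistics yet
        num += m.weight * (double)m.n1 / total;
        den += m.weight;
    }
    return den > 0.0 ? num / den : 0.5; // no information: predict 50/50
}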
In the end, crinkler selects a list of models for each type of section in your exe: a set of models for the code section and a set of models for the data section.
How many models does crinkler select? It depends on your data. For the ergon intro, for example, crinkler selects 14 models for the code section and 20 for the data section, each a (model mask, weight) pair, shown as a table in the original post.
(Note that in crinkler, the final weight used to multiply n1/(n0+n1) is 2^w, and not wi itself.)
Wow, does this mean that crinkler needs to store those data in your exe, (14 bytes + 20 bytes) * 2 = 68 bytes? Well, the crinkler authors are smarter than this! The models are indeed stored, but the weights are stored in a single int (32 bits) for each section. Yep, a single int to store those weights? Indeed: if you look at those weights, they are increasing, and sometimes equal... so they found a clever way to store a compact representation of them in 32 bits. Starting with a weight of 1, the compact form is consumed one bit at a time: if the bit is 0, the current weight doesn't change; if the bit is 1, the current weight is incremented by 1 (in the pseudo-code below, the shift is done to the right):
int currentWeight = 1;
int compactWeight = ...; // the packed 32-bit weights for this section
for (model in models) {
    if (compactWeight & 1)
        currentWeight++;        // a 1 bit means the weight goes up by one
    compactWeight = compactWeight >> 1;
    // ... use currentWeight for the current model
}
This way, crinkler is able to store a compact form of the (model, weight) pairs for each type of data in your executable (code or pure data).
Model selection
Model selection is one of the key processes of crinkler. For a particular set of data, what is the best selection of models? You start with 256 candidate models (all the combinations of the 8 previous bytes) and you need to determine the best subset, taking into account that each model you use costs 1 byte in your final executable. Model selection is part of the crinkler compressor but not of the decompressor: the decompressor just needs to know the final list of models used to compress the data, and doesn't care about intermediate results. The compressor, on the other hand, needs to test combinations of models and find an appropriate weight for each model.
I tested several methods in my test code to try to recover the method used in crinkler, without achieving a comparable compression ratio... I tried some brute-force algorithms without any success... The selection algorithm is probably a bit more clever than the ones I tested, and would probably require laying out some mathematics/statistics to derive an accurate method.
Finally, Blueberry has given their method (thanks!):
"To answer your question about the model selection process, it is actually not very clever. We step through the models in bit-mirrored numerical order (i.e. 00, 80, 40, C0, 20 etc.) and for each step do the following:
- Check if compression improves by adding the model to the current set of models (taking into account the one extra byte to store the model).
- If so, add the model, and then step through every model in the current set and remove it if compression improves by doing so.
The difference between FAST and SLOW compression is that SLOW optimizes the model weights for every comparison between model sets, whereas FAST uses a heuristic for the model weights (number of bits set in the model mask)."
On the other hand, I tried a fully adaptive context-modeling approach, using the dynamic weight calculation explained by Matt Mahoney with neural networks and stretch/squash functions (look at PAQ on wikipedia). It was really promising, as I was sometimes able to achieve a better compression ratio than crinkler... but at the cost of a decompressor 100 bytes heavier... and even though I was able to save 30 to 60 bytes on the compressed data, I was still off by 40-70 bytes... so under 4k, this approach was definitely not as efficient as the semi-static approach chosen by crinkler.
Storing probabilities
If you have correctly followed the previous model selection, crinkler is now working with a set of models (selections of historic data), and for each bit that is decoded, each model's probabilities must be updated...
But think about it: if, for example, we are using the probabilities of the 8 previous bytes to predict the following bit, it means that for every combination of 8 bytes already found in the decoded data, we would need a pair of n0/n1 counters?
That would mean we could have the following probabilities to update for the 0xFF context (8 previous bytes):
- "00 00 00 00 c0 00 00 50 00" => some n0/n1
- "00 00 70 00 00 00 00 F2 01" => another n0/n1
- "00 00 00 40 00 00 00 30 02" => another n0/n1
...etc.
and if we have other models like 0x80 (the previous byte) or 0xC0 (the 2 previous bytes), we would also have different counters for them:
// For model 0x80
- "00" => some n0/n1
- "01" => another n0/n1
- "02" => yet another n0/n1
...
// For model 0xC0
- "50 00" => some bis n0/n1
- "F2 01" => another bis n0/n1
- "30 02" => yet another bis n0/n1
...
In the previous description of model contexts, I slightly oversimplified: not only the previous bytes are used, but also the bits already read from the current byte. When we are using, for example, the model 0x80 (the previous byte), the context of the historic data is composed not only of the previous byte, but also of the bits already read from the current byte. This obviously implies that every bit read produces a different context. Suppose we have the sequence 0x75, 0x86 (0x86 is 10000110b in binary), that the bits being encoded come just after the 0x75 value, and that we are using the previous byte + the bits currently read:
First, we start on a byte boundary
- 0x75 with 0 bits read (we start with an empty bit context) is followed by bit 1 (the leading bit of 0x86). The context is 0x75 + no bits.
- We read one more bit; we have a new context: 0x75 + bit 1. This context is followed by a 0.
- We read one more bit; we have a new context: 0x75 + bits 10. This context is followed by a 0.
...
- We read one more bit; we have a new context: 0x75 + bits 1000011, which is followed by a 0 (and we end on a byte boundary).
Reading 0x75 followed by 0x86, with a model using only the previous byte, we end up with 8 contexts, each with its own n0/n1 to store in the probability table.
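The enumeration of those eight contexts is mechanical; a few lines of Python make the example concrete:

prev, cur = 0x75, 0x86                     # 0x86 == 10000110b
bits = f"{cur:08b}"
for i in range(8):
    partial = bits[:i] or "(none)"         # bits of the current byte read so far
    print(f"context: 0x{prev:02X} + bits {partial:>9} -> next bit {bits[i]}")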
As you can see, it is obviously difficult to store every context found (i.e. for each single bit decoded, there is a different context of historic bytes) together with its exact probability counters without exploding the RAM. Even more so if you think about the number of models used by crinkler: 14 different selections of historic bytes for ergon's code alone!
This kind of problem is often handled with a hashtable that handles collisions, which is what some of the PAQ compressors do. Crinkler also uses a hashtable to store the counter probabilities, with the association context_history_of_bytes => (n0, n1), but it does not handle collisions at all, in order to keep the decompressor as small as possible. As usual, the hash function used by crinkler is tiny while still giving really good results.
So instead of a direct association context_history_of_bytes => (n0, n1), we use a hash: hash(context_history_of_bytes) => (n0, n1). The dictionary storing all these associations then needs to be dimensioned large enough to hold as many of the associations found while decoding/encoding as possible.
Like the PAQ compressors, crinkler uses one byte per counter, so n0 and n1 together take 16 bits, 2 bytes. If you instruct crinkler to use a 100 MB hashtable, it can store 50 million different keys, i.e. different historic byte contexts and their respective probability counters. One remark about crinkler and the byte counters: PAQ compressors handle the limits, so a counter that would go above 255 is clamped at 255... but crinkler chose not to test the limits in order to keep the code smaller (even though the test would take less than 6 bytes). What is the impact of this choice? Well, if you know crinkler, you are aware that it doesn't handle large sections of zeros, or any other large run of identically initialized data, very well. This is simply because the counters wrap from 255 to 0, meaning that you jump from a ~100% probability (probably accurate) to an almost 0% probability (probably wrong) every 256 bytes. Does this really hurt compression? It would hurt a lot if crinkler were used for larger executables, but in a 4k it doesn't hurt much (although it could if you really have large portions of initialized data). Also, not all contexts are reset at the same time (an 8-byte context will not wrap as often as a 1-byte context), so the final probability calculation stays reasonably accurate: while one counter has just wrapped, the other models' counters are still counting... so this is not a huge issue.
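A rough sketch of such a counter store (the hash function here is a stand-in, not crinkler's): each context hashes to a pair of one-byte n0/n1 counters in a flat table, collisions are silently ignored, and the update deliberately omits the limit test, so 255 wraps to 0 exactly as described above.

TABLE_BYTES = 1 << 20              # real crinkler tables are 100s of MB
counters = bytearray(TABLE_BYTES)  # n0/n1 pairs, one byte per counter

def slot(context: bytes) -> int:
    h = 0
    for b in context:              # FNV-style mix, purely illustrative
        h = (h * 0x01000193 + b) & 0xFFFFFFFF
    return (h % (TABLE_BYTES // 2)) * 2    # index of the n0 byte

def update(context: bytes, bit: int) -> None:
    i = slot(context) + bit
    counters[i] = (counters[i] + 1) & 0xFF  # no clamp: 255 + 1 wraps to 0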
What happens if the hashes of two different contexts collide? The model then updates the wrong probability counters. If the hashtable is too small, the counters may be disturbed badly enough to produce a less accurate final probability; if it is large enough, collisions are simply less likely to happen.
Thus it is quite common to use a hashtable as large as 256 to 512 MB, although 256 MB is often enough: the larger the hashtable, the fewer the collisions and the more accurate the probabilities. Recall the claim from the beginning of this post, and you should now understand why "crinkler can take several hundreds of megabytes to decompress"... simply because of this hashtable, which stores the next-bit probabilities for every combination of models used.
If you are familiar with crinkler, you already know the option to search for the best possible hash size given an initial hashtable size and a number of tries (the hashtries option). This part is responsible for testing different hashtable sizes (for example, starting from 100 MB and reducing the size by 2 bytes, 30 times) and measuring the final compression for each. It is a way to empirically reduce collision effects by selecting the hash size that gives the best compression ratio (meaning fewer collisions in the hash). This option will only save you a couple of bytes, though, no more.
Data reordering and type of data
Reordering or reorganizing data to compress it better is a common technique in compression methods. Sometimes, for example, it's better to store deltas of values than the values themselves, etc.
Crinkler uses this principle to perform data reordering. At the linker level, crinkler has access to the individual portions of data and code, and can move those portions around to achieve a better compression ratio. This is really easy to understand: suppose you have a run of zero-initialized values in your data section. If those values are interleaved with non-zero values, the counter probabilities will keep switching from "there are plenty of zeros here" to "oops, there is some other data"... and the resulting probability will oscillate between 90% and 20%. Grouping similar data is a way to improve the overall probability accuracy.
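The effect is easy to reproduce with any compressor, not just a context modeller. In this little experiment (plain Python, nothing crinkler-specific), the same bytes are compressed twice, once interleaved and once grouped:

import random
import zlib

random.seed(0)
noise = bytes(random.randrange(1, 256) for _ in range(4096))
zeros = bytes(4096)

interleaved = bytes(b for pair in zip(zeros, noise) for b in pair)
grouped = zeros + noise            # exactly the same bytes, reordered

print(len(zlib.compress(interleaved, 9)))  # noticeably larger
print(len(zlib.compress(grouped, 9)))      # the zero run collapses to almost nothing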
This part is the most time consuming, as crinkler needs to move all the portions of your executable around and test which arrangement gives the best compression. But it pays to use this option, as it alone may save you 100 bytes in the end.
Another thing related to data reordering is the way crinkler handles the binary code and the data of your executable separately. Why? Because their binary representations are different, leading to completely different sets of probabilities. If you look at the selected models for ergon, you will find that the code and data models are quite different. Crinkler exploits this to achieve better compression: it compresses the code and the data completely separately. Code has its own models and weights; data has another set of models and weights. What does this mean internally? Crinkler uses one set of models and weights to decode the code section of your executable. Once finished, it erases the probability counters stored in the hashtable-dictionary and moves on to the data section with new models and weights. Resetting all counters to 0 in the middle of decompression improves compression by 2-4%, which is quite impressive and valuable for a 4k (around 100 to 150 bytes).
I found that even with an adaptive model (with a neural network dynamically updating the weights), it is still worth resetting the probabilities between code and data decompression. In effect, resetting the probabilities is an empirical way of telling the context modelling that the data is so different that it's better to start from scratch with fresh probability counters. If you think about it, an improved demo compressor (for larger executables, for example under 64k) could cleverly detect portions of data that are different enough that resetting the dictionary beats keeping it as-is.
There is one last thing about weight handling in crinkler. When decoding/encoding, crinkler seems to artificially increase the weights for the first bits discovered. This little trick improves the compression ratio by about 1 to 2%, which is not bad. Higher weights at the beginning give the compressor/decompressor a better response while it doesn't yet have enough data to compute accurate probabilities. Increasing the weights helps the compression ratio at cold start.
Crinkler can also transform the x86 code of the executable part to improve the compression ratio. This technique is widely used and consists of replacing the relative offsets of jumps (conditional jumps, function calls, etc.) with absolute addresses, leading to a better compression ratio.
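A sketch of that transform (the classic "E8 trick"; the details are illustrative, not crinkler's exact code): every CALL rel32 has its displacement rewritten as an absolute target, so repeated calls to the same function become byte-identical and therefore highly predictable:

def transform_calls(code: bytes, base: int = 0) -> bytearray:
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:         # CALL rel32 opcode
            rel = int.from_bytes(out[i+1:i+5], "little", signed=True)
            target = (base + i + 5 + rel) & 0xFFFFFFFF
            out[i+1:i+5] = target.to_bytes(4, "little")
            i += 5
        else:
            i += 1
    return out

A real implementation has to live with 0xE8 bytes that are not actually CALL instructions; since the inverse transform is applied symmetrically at decompression time, an occasional false positive only costs a few bytes of compression, never correctness.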
Custom DLL LoadLibrary and PE file optimization
In order to strip down the size of an executable, it's necessary to exploit the organization of a PE file as much as possible.
The first thing crinkler exploits is that many parts of a PE file are not used at all. If you want to know how far a Windows PE executable can be shrunk, I suggest reading the Tiny PE article, which is a good way to understand what a PE loader actually uses. Unlike the Tiny PE sample, where the author moves the PE header into the DOS header, crinkler chose to use this unused space to store the hash values that reference the DLL functions used.
This trick is called import by hashing and is quite common in intro compressors. What probably makes crinkler a bit more advanced is how it performs the equivalent of "GetProcAddress" (which is responsible for getting the pointer to a function from a function name): crinkler navigates internal Windows process structures to get the function addresses directly from the in-memory tables. Indeed, you won't find any import section in a crinklerized executable; everything is re-discovered through internal Windows structures. Those structures are not officially documented, but you can find valuable information around the web, most notably here.
If you look at crinkler's code stored in its import section (the code injected to load all DLL functions just before the intro starts), you will find cryptic calls like the ones below. This is done by walking internal structures:
(0) First, crinkler gets a pointer to the PROCESS ENVIRONMENT BLOCK (PEB) with the instruction MOV EAX, FS:[BX+0x30]. EAX now points to the PEB:
Public Type PEB
    InheritedAddressSpace As Byte
    ReadImageFileExecOptions As Byte
    BeingDebugged As Byte
    Spare As Byte
    Mutant As Long
    SectionBaseAddress As Long
    ProcessModuleInfo As Long        ' // <---- PEB_LDR_DATA
    ProcessParameters As Long        ' // RTL_USER_PROCESS_PARAMETERS
    SubSystemData As Long
    ProcessHeap As Long
    ' ... structure continues
(1) Then it gets a pointer to ProcessModuleInfo/PEB_LDR_DATA with MOV EAX, [EAX+0xC]:
Public Type PEB_LDR_DATA
    Length As Integer
    Initialized As Long
    SsHandle As Long
    InLoadOrderModuleList As LIST_ENTRY    ' // <---- LIST_ENTRY InLoadOrderModuleList
    InMemoryOrderModuleList As LIST_ENTRY
    InInitOrderModuleList As LIST_ENTRY
    EntryInProgress As Long
End Type
(2) Then it gets a pointer to the first entry of InLoadOrderModuleList with MOV EAX, [EAX+0xC]:
Public Type LIST_ENTRY
    Flink As LIST_ENTRY
    Blink As LIST_ENTRY
End Type
(3) and (4) Then it navigates through the LIST_ENTRY linked list with MOV EAX, [EAX]. This is done twice: the first time we get a pointer to NTDLL.DLL's entry, the second time to KERNEL32.DLL's. Each LIST_ENTRY is in fact followed by the LDR_MODULE structure:
Public Type LDR_MODULE
    InLoadOrderModuleList As LIST_ENTRY
    InMemoryOrderModuleList As LIST_ENTRY
    InInitOrderModuleList As LIST_ENTRY
    BaseAddress As Long
    EntryPoint As Long
    SizeOfImage As Long
    FullDllName As UNICODE_STRING
    BaseDllName As UNICODE_STRING
    Flags As Long
    LoadCount As Integer
    TlsIndex As Integer
    HashTableEntry As LIST_ENTRY
    TimeDateStamp As Long
    LoadedImports As Long
    EntryActivationContext As Long    ' // ACTIVATION_CONTEXT
    PatchInformation As Long
End Type
Then, from the BaseAddress of the kernel32.dll module, crinkler walks to the in-memory tables where the exported functions are already loaded. The first hashed function crinkler resolves this way is LoadLibrary. After this, crinkler can load all the dependent DLLs and navigate their import tables, recomputing the hash of every function name and trying to match it against the hashes stored in the PE header. When a match is found, the function's entry point is stored.
This way, crinkler can call OS functions from kernel32.dll without explicitly linking to that DLL, as it is automatically loaded into every process. The result is a custom import loader able to resolve every function used by an intro.
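The matching itself is conceptually tiny. Here is a Python illustration of import-by-hash (the hash function and function names below are placeholders, not crinkler's):

def name_hash(name: str) -> int:
    h = 0
    for c in name:
        h = (h * 131 + ord(c)) & 0xFFFFFFFF
    return h

# hashes stored in the otherwise-unused PE header space
stored = {name_hash("LoadLibraryA"), name_hash("ExitProcess")}

def resolve(exports):
    """exports: {name: address}, walked from the in-memory export table."""
    return {name: addr for name, addr in exports.items()
            if name_hash(name) in stored}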
Compression results
So finally, you may ask: how good is crinkler at compressing? How does it compare to other compression methods? What does the entropy of a crinklerized exe look like?
I'll take the example of the ergon exe. You can already find a detailed analysis of this particular executable.
Comparison with other compression methods
In order to make a fair comparison between crinkler and other compressors, I used the data that crinkler actually compresses after reordering the code and data (obtained by unpacking a crinklerized ergon.exe and extracting only the compressed data). The comparison is accurate in that all compressors work on exactly the same input.
Also, to be fair to crinkler, the size of 3652 bytes does not include the PE header + the crinkler decompressor code (432 bytes in total for crinkler).
For this comparison I used 7z, which offers at least 3 interesting methods to test against:
Standard Deflate Zip
PPMd with a 256 MB dictionary
LZMA with a 256 MB dictionary
I also included a comparison with a more advanced packer from Matt Mahoney's resources, Paq8l, one of the versions of the PAQ family, which uses neural networks and several context modelling methods.
Program    Compression method      Size in bytes    Ratio vs crinkler
none       uncompressed            9796             -
crinkler   ctx-model 256 MB        3652             +0.00%
7z         deflate 32 KB           4526             +23.93%
7z         PPMd 256 MB             4334             +18.67%
7z         LZMA 256 MB             4380             +19.93%
Paq8l      dyn-ctx-model 256 MB    3521             -3.59%
As you can see, crinkler is far more efficient than any of the "standard" compression methods (Zip, PPMd, LZMA). And that's without mentioning that a true comparison would include the decompressor size, so the ratio would certainly be even worse for all the standard methods!
Paq8l is of course slightly better... but take into account that the Paq8l decompressor is itself a 37 KB exe, compared to the 220 bytes of crinkler's... and you should understand just how efficient crinkler is in its own domain (remember? 4k!).
Entropy
In order to measure the entropy of a crinklerized exe, I developed a very small C# program that displays the entropy of an executable, from green (low entropy, fewer bits necessary to encode the information) to red (high entropy, more bits necessary).
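The measurement itself is simple; here is a minimal sketch in Python of the same idea (the input file name is an example), computing Shannon entropy over fixed windows:

import math
from collections import Counter

def window_entropy(data: bytes, size: int = 256):
    for off in range(0, len(data), size):
        window = data[off:off + size]
        counts = Counter(window)
        yield off, -sum(c / len(window) * math.log2(c / len(window))
                        for c in counts.values())   # 0..8 bits per byte

data = open("ergon.exe", "rb").read()               # example input
for off, e in window_entropy(data):
    print(f"{off:06x} {'#' * round(e * 4)} {e:.2f} bits/byte")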
I did this for 3 different ergon executables:
The uncompressed ergon.exe (28 KB), the standard output of a binary exe built with MSVC++ 2008.
The raw crinklerized ergon.exe: reordered code and data sections, but not compressed (9796 bytes).
The final crinklerized ergon.exe file (4070 bytes).
As expected, the entropy is massive in a crinklerized exe; compare it with the waste of information in a standard Windows executable. You can also appreciate how important the reordering and packing of data (before any compression) performed by crinkler is.
Some notes about the x86 crinkler decompressor asm code
I have often talked about how much the crinkler decompressor is truly a piece of x86 art. It is hard to describe all the techniques used here; there are lots of standard x86 size optimizations and some really nice tricks. Most notably:
using all the registers
making intensive use of the stack to save/restore all the registers with the pushad/popad x86 instructions. This is done (1 + number_of_models) times per bit: if you have 15 models, there will be a total of 16 pushad/popad pairs for a single bit to be decoded! You may wonder why so many pushes: it's the only way to efficiently use all the registers (rule #1) without having to spill particular registers to a buffer. Of course, the push/pop instructions are used in several other places in the code as well.
As a result of 1) and 2), apart from the hash dictionary, no intermediate structures are used to perform the context modelling calculation.
Deferred conditional jumps: usually, a conditional test in x86 is immediately followed by a conditional jump (like cmp eax, 0; jne go_for_bla). In crinkler, a conditional test is sometimes performed and then used several instructions later (for example cmp eax, 0; push eax; mov eax, 5; jne go_for_bla <---- this uses the result of the cmp eax, 0 comparison). It makes the code a LOT harder to read. Sometimes the flags are even used after a direct jump! This is probably the part of crinkler's decompressor that impressed me the most. It is of course quite common when writing heavily size-optimized x86 asm code... but you need to know exactly which instructions do not modify the CPU flags to pull off this kind of optimization!
Final words
I would like to apologize for the lack of charts and pictures to explain how things work. This article is probably still obscure for a casual reader and should be considered a draft; it was a quick and dirty post I had wanted to write for a long time, so here it is, not as polished as it should be, but it may improve in future versions!
As you can see, crinkler is really worth looking at. The effort put into making it so efficient is impressive, and there is little doubt that it will be left without a serious competitor for a long time, at least for 4k executables. Above 4k, I'm quite confident there are still lots of areas that could be improved, and kkrunchy is probably far from being the ultimate packer under 64k... Still, if you want a packer, you need to code it, and that's not so trivial!
Clone the repository: git clone https://github.com/extr0py/oni.git
Install dependencies by running npm install from the root
Build using npm run build from the root
Run npm link to register the ONI command
Run oni at the command line
Goals
The goal of this project is to provide the full-fledged VIM experience, with no compromises, while pushing forward to enable new scenarios.
Modern UX - The VIM experience should not be compromised with poor user experiences that stem from terminal limitations.
Rich plugin development - using JavaScript, instead of VimL, allowing deep-language integration.
Cross-platform support - across Windows, OS X, and Linux.
Batteries included - rich features are available out of the box - minimal setup needed to be productive. TypeScript development is the canonical example, but the hope is that other language providers will be included. Later, an included package manager will make it simple to find and install plugins.
Performance - no compromises, VIM is fast, and ONI should be fast too.
Ease the learning curve - without sacrificing the VIM experience
VIM is an incredible tool for manipulating text at the speed of thought. With a composable, modal command language, it is no wonder that VIM usage is still prevalent today even in the realm of modern editors.
However, going from thought to code has some different challenges than going from thought to text. IDEs today provide several benefits that help to reduce cognitive load when writing code, and that benefit is tremendously important - not only in terms of pure coding efficiency and productivity, but also in making the process of writing code enjoyable and fun.
In my journey of learning VIM and increasing proficiency in other editors, I've found there is always a trade-off - either enjoy the autocompletion and IDE features, and compromise on the experience and muscle memory I've built with VIM, or work in VIM and compromise on the rich language functionality we have in an IDE.
The goal of this project is to provide an editor that gives the best of both worlds - the power, speed, and flexibility of using VIM for manipulating text, as well as the rich tooling that comes with an IDE.
Documentation
Usage
Code Completion
Code completion is a commonly requested add-on to Vim, and the most common solutions are to use a plugin like YouCompleteMe, deoplete, or AutoComplPop.
These are all great plugins - but they all have the same fundamental issue that they are bounded by the limitations of the Vim terminal UI, and as such, can never be quite up-to-par with new editors that do not have such limitations. In addition, some require an involved installation process. The goal of code completion in ONI is to be able to break free of these restrictions, and provide the same richness that modern editors like Atom or VSCode provide for completion.
Entry point
If a language extension is available for a language, then that language service will be queried as you type, and if there are completions available, those will be presented automatically.
Out of the box, the supported languages for rich completion are JavaScript & TypeScript. No special setup is required, but you will get the best results by having a jsconfig.json or tsconfig.json at the root of your project. An empty JSON file containing {} is enough to get rich completion.
Commands
<C-n> - navigate to next entry in the completion menu
<C-p> - navigate to previous entry in the completion menu
<Enter> - select completion item
<Esc> - close the completion menu
Options
oni.useExternalPopupMenu - if set to true, the Vim popupmenu is rendered in the same UI as the language extension menu, so that it has a consistent look and feel. If set to false, Oni falls back to letting Neovim render the menu.
Fuzzy Finder
Fuzzy Finder is a quick and easy way to switch between files. It's similar in goal to the Ctrl-P plugin and the built-in functionality editors like VSCode and Atom provide.
Entry point
<C-p> - show the Fuzzy Finder menu
Commands
<C-n> - navigate to next entry in the Fuzzy Finder menu
<C-p> - navigate to previous entry in the Fuzzy Finder menu
<Enter> - select a Fuzzy Finder item
<Esc> - close the Fuzzy Finder menu
By default, Fuzzy Finder uses git ls-files to get the available files in the directory; if git is not present, it falls back to a non-git strategy.
The Fuzzy Finder strategy can be configured via editor.quickOpen.execCommand, which must be a shell command that returns a list of files separated by newlines.
Command Palette
The Command Palette offers another command-line based interface to Oni.
Entry point
Commands
<C-n> - navigate to next entry in the Command Palette menu
<C-p> - navigate to previous entry in the Command Palette menu
<Enter> - select a Command Palette item
<Esc> - close the Command Palette menu
Currently, the Command Palette includes items from:
a few commonly used menu items
NPM package.json scripts
Plugin commands
Launch parameters from the .oni folder
Quick Info
Quick Info gives a quick summary of an identifier when the cursor is held over it. JavaScript and TypeScript are supported out of the box.
Entry point
Leave the cursor hovering over an identifier.
Options
oni.quickInfo.enabled - If set to true, the Quick Info feature is enabled. (Default: true)
oni.quickInfo.delay - Delay in milliseconds for the Quick Info window to show. (Default: 500)
Status Bar
Oni features a rich status bar, designed as a replacement for vim-powerline and vim-airline.
API
Oni provides a StatusBar API for adding new items to the status bar.
Options
oni.statusbar.enabled - If set to true, the status bar feature is enabled. (Default: true)
Users that are coming from Neovim and have highly customized status bars may want to set oni.statusbar.enabled to false, along with setting the oni.loadInitVim to true and oni.useDefaultConfig to false.
Tabs
Oni features a buffer tab bar, like many common IDEs. VIM has its own definition of a "tab", which is really a set of windows and buffers. By default, the tabs in Oni correspond to open files (buffers). You can override this and show vim-defined tabs by setting the tabs.showVimTabs setting to true.
Commands
[b and ]b will cycle through buffers, which has the effect of moving through the tabs.
Options
tabs.enabled - If set to true, the tabs are visible. (Default: true)
tabs.showVimTabs - If set to true, shows vim tabs. Otherwise, shows open buffers. (Default: false. Requires Neovim 0.2.1+)
Languages
JavaScript and TypeScript
Configuration
JavaScript and TypeScript support is enabled out-of-the-box using the TypeScript Standalone Server. No setup or configuration is necessary; however, you will get better results if you use a tsconfig.json or a jsconfig.json to structure your project.
Supported Language features
Completion                      Y
Goto Definition                 Y
Formatting                      Y
Enhanced Syntax Highlighting    Y
Quick Info                      Y
Signature Help                  Y
Live Evaluation                 Y
Debugging                       N
C#
Configuration
C# language support requires the oni-language-csharp plugin, which provides language capabilities for both .NET and Mono.
Go
Configuration
Go language support depends on the go-langserver by SourceGraph, which provides language support for Go. Follow their installation instructions, as this language server is not bundled out-of-the-box with Oni.
go-langserver must be available in your PATH. You can override this by setting the golang.langServerCommand configuration value.
Supported Language features
Completion                      N
Goto Definition                 Y
Formatting                      N
Enhanced Syntax Highlighting    N
Quick Info                      Y
Signature Help                  N
Live Evaluation                 N
Debugging                       N
Known Issues
Python
Configuration
Python language support depends on pyls by Palantir, which provides language support for Python. Follow their installation instructions as this language server is not bundled out-of-the-box with Oni.
pyls must be available in your PATH. You can override this by setting the python.langServerCommand configuration value.
oni.audio.bellUrl - Set a custom sound effect for the bell (:help bell). The value should be an absolute path to a supported audio file, such as a WAV file.
oni.useDefaultConfig - ONI comes with an opinionated default set of plugins for a predictable out-of-box experience. This is great for newcomers to ONI or Vim, but will likely conflict with a Vim/Neovim veteran's setup. Set this to false to avoid loading the default config and to load settings from init.vim instead (if this is false, it implies oni.loadInitVim is true).
oni.loadInitVim - This determines whether the user's init.vim is loaded. Use caution when setting this to true and setting oni.useDefaultConfig to true, as there could be conflicts with the default configuration.
oni.exclude - Glob pattern of files to exclude from Fuzzy Finder (Ctrl-P). Defaults to ["**/node_modules/**"]
oni.hideMenu - (default: false) If true, hide menu bar. When hidden, menu bar can still be displayed with Alt.
editor.clipboard.enabled - (default: true) Enables / disables system clipboard integration.
editor.fontSize - Font size
editor.fontFamily - Font family
editor.fontLigatures - (default: true). If true, ligatures are enabled.
editor.backgroundImageUrl - specify a custom background image
editor.backgroundImageSize - specify a custom background size (cover, contain)
editor.scrollBar.visible - (default: true) sets whether the buffer scrollbar is visible
environment.additionalPaths - (default: [] on Windows, ['/usr/bin', '/usr/local/bin'] on OSX and Linux). Sets additional paths for binaries. You may need to configure this if you use plugins or a Language Server that is not in the default set of runtime paths. Note that depending on where you launch Oni, a different set of runtime paths may be passed to it - you can always check by opening the developer tools and running process.env.PATH in the console.
See the Config.ts file for other interesting values to set.
In VimL, the g:gui_oni variable will be set to 1, and can be checked with if exists("g:gui_oni").
Clipboard Integration
Oni, by default, integrates with the system clipboard. This is controlled by the editor.clipboard.enabled option.
The behavior is as follows:
All yanks or deletes will be pushed to the system clipboard.
Pressing <C-c> on Windows/Linux (<M-c> on OSX) in visual mode will copy the selected text to the system clipboard.
Pressing <C-v> on Windows/Linux (<M-v> on OSX) in insert mode will paste the text from the system clipboard.
If you have custom behavior or functionality bound to <C-c>, <C-v> (or <M-c>, <M-v> on OSX), you may wish to disable this behavior by setting editor.clipboard.enabled to false.
Plugins
Installation
Oni does not require the use of a plugin-manager such as pathogen or vundle (although you may use one if you wish, and this will be necessary if you are sharing a configuration between Neovim and Oni).
Oni will, by default, load all plugins in the $HOME/.oni/plugins directory.
Installing a Vim Plugin
To install a Vim plugin, you just need to create a directory inside $HOME/.oni/plugins.
git clone will usually do this for you, so, for example, to install this Solarized theme by lifepillar, you'd run:
git clone https://github.com/lifepillar/vim-solarized8 ~/.oni/plugins/vim-solarized8
NOTE: On Windows, use your home directory (ie, C:/users/<your username>) instead of ~
This will clone the vim-solarized8 plugin into the ~/.oni/plugins/vim-solarized8 folder.
Restart Oni, and execute :colorscheme solarized8_light, and enjoy your new theme!
Installing an Oni Plugin
Installing an Oni plugin is much the same as installing a Vim plugin. However, because they potentially have JavaScript extension code in addition to VimL, you often need to install NPM dependencies.
Prerequisite: Make sure the npm command is available. If not, install the latest node
As above, you just need to create a folder hosting the plugin, and install the dependencies. As an example, here's how you'd install the oni-plugin-tslint extension.
Restart Oni, and linting should now be enabled when you open up a TypeScript (.ts) file.
API
Oni offers several rich extensibility points, with the focus being on various UI integrations as well as IDE-like capabilities.
NOTE: The API will be in-flux until v1.0.
Language extenders give ONI rich integration with languages, offering services like:
Code Completion
Quick Info
Goto Definition
Formatting
Live code evaluation
Unit test integration
Enhanced syntax highlighting
To see the in-progress API, check out the Oni.d.ts definition file as well as the typescript language plugin, which demonstrates several of these features.
You can explore the Oni API during runtime by doing the following:
Press <C-P> (open Command Palette)
Select "Open Dev Tools"
You can then access the Oni object directly in the console, e.g. by typing Oni.
Background
ONI currently supports the setting of a background image as well as background opacity.
Debuggers
Project Templates
Snippets
FAQ
Why isn't my init.vim loaded?
TL;DR - Set the oni.useDefaultConfig configuration value to false
By default, Oni has an opinionated, prescribed set of plugins, in order to facilitate a predictable out-of-box experience that highlights the additional UI integration points. However, this will likely have conflicts with a Vim/Neovim veteran's finely-honed configuration.
To avoid loading the Oni defaults, and instead use your init.vim, set this configuration value to false in $HOME/.oni/config.json.
Included VIM Plugins
This distribution contains several VIM plugins that enhance the VIM experience.
There are a few image and audio assets bundled with Oni - see ASSETS.md for attribution.
Windows and OSX have a bundled version of Neovim, which is covered under Neovim's license
Bundled Plugins
Bundled plugins have their own license terms. These include:
Contributing
Contributions are very much welcome :)
If you're interested in helping out, check out CONTRIBUTING.md for tips and tricks for working with ONI.
Thanks
Big thanks to the NeoVim team - without their work, this project would not be possible. The deep integration with VIM would not be possible without the incredible work that was done to enable the msgpack-RPC interface. Thanks!
Sponsors
A big THANK YOU to our current monthly sponsors. Your contributions help keep this project alive!
Other Contributions
In addition, there are several other great NeoVim front-end UIs here that served as great reference points and learning opportunities.
When Richard Stephenson drives to work, there's a chance that later that day he'll become the first human to see new details of Mars, a moon of Saturn, or the far reaches of the solar system.
Again.
Stephenson's seen plenty of such firsts because his job as an Operations Supervisor at the Canberra Deep Space Tracking Complex means it's his responsibility to track humanity's most distant spacecraft. Voyager 2 is in his care, along with Cassini, New Horizons, MAVEN and several lower-profile missions. The ESA's Rosetta comet exploration mission also crossed his desk until its timely demise, while Mars Rovers are often on the menu.
Stephenson has a particular fondness for Voyager 2, as he was hired and moved to Australia to help track its 1989 encounter with Neptune. That visit came 12 years after the probe's launch, 40 years and a day ago. That it remains humanity's only close encounter with the planet is a testament to the durability of the Voyagers and the outstanding and astounding achievements the missions represent.
“With Neptune we were getting pictures back in real time,” he recalls. “Every day you would come in on shift and see both getting just a tiny bit closer.” Sometimes Stephenson would find himself in the company of just one or two others as they witnessed views of the ice giant that no human eye had previously beheld.
“I was sitting in front of a chart looking at atmospherics,” he says. “I saw it a few seconds after the images downloaded.”
Little wonder, then, that Stephenson told The Register he sees his career as “one memorable event after another. I have been involved in so many big missions for 30 years.”
Helping to make his career extra special is that Australia's the only place from which Voyager 2 can be observed. The Canberra Complex at which Stephenson works is the Australian node of NASA's Deep Space Network. The other two are at Goldstone, California, and near Madrid, Spain. As the Canberra complex is the only one in the Southern hemisphere, it gets plenty of work. It also gets all the connections to Voyager 2, which took a sharp southern turn years ago and can only be reached from Canberra.
Communicating with far-flung spacecraft needs big antennae and the Canberra facility has several: a 70m behemoth is its most sensitive instrument, but a pair of 34m dishes can be coupled to similar effect. The 64m Parkes radio telescope, about 300km away, can also be co-opted to receive data.
On the job
Stephenson's job sees him ensure the station's antennae are ready for action when far-flung spacecraft are in position to chat with Earth.
He knows what to do because the Jet Propulsion Laboratory sends him a schedule.
“A project will come along that says I will need so many hours, this many on the 70m and this many on the 34m,” he explains. As the complex has finite resources he says “There's a bit of conflict resolution to be done, but that is sorted out months in advance.”
The result of that wrangling is his daily to-do list of “connections” he and his team of four need to make. Before each connection, the complex is re-configured for the job, with transmitters, antennae and all required resources brought online and/or brought into desired configurations and tested.
One critical job is getting the antennae into position: Canberra's 70m dish weighs 8,000 tonnes, half of which can move. A hydrostatic bearing helps it all to get moving: the antenna floats on a thin film of oil, beneath which hydraulics move to get the antenna pointing wherever it is required to go.
There's a deadline for each connection and it can't be missed, as doing so might mean missing data. That's bad news if a spacecraft has planned a data dump, which happens when an orbit occasionally brings it close to Earth. When spacecraft make those close approaches, they empty their data stores at quite high bitrates, then sail back into the void and communicate more slowly, while storing data for their next close approach.
[Image: The Canberra Deep Space Tracking Complex]
Communication speeds vary. Stephenson says the Mars Atmosphere and Volatile Evolution (MAVEN) mission can trickle in data at ten bits per second. Even Voyager 1 can beat that, regularly hitting 160 bits per second and reaching 1,200 bits per second when emptying its memory.
Antennae make the difference: the Voyagers' are high gain, MAVEN's isn't. So even though Voyager 1's signal comes in at -168 dBm, it's easier to reach than other probes. Kepler, which is not far from Earth, talks to Earth on its low gain antenna in the mid -160s dBm when it is busy peering at exoplanets, and just sends housekeeping data on its secondary antenna.
Stephenson says each of the spacecraft he tends has its own idiosyncrasies. Voyager 2 experienced a capacitor failure early in its mission that left its main radio inoperable and has meant that Stephenson and his team need to figure out what frequency to use when communicating with the craft.
Understanding how to tickle each craft just right is one of the skills of his job, but also a skill he worries will soon become less relevant, because the Deep Space Network will soon move to follow-the-Sun operation. Today, Canberra and the other two sites are staffed 24x7, with crews of four or five on duty.
Rise of the machines
When the new regime kicks in, crew sizes will increase to nine, but that team will operate all three nodes of the network thanks to increased automation.
“We are now heavily dependent on automation,” Stephenson says, although nothing happens without a human initiating the action. “You will always need a controller who can identify the issues, but I can see that in five to ten years that might change, too.”
Other changes he sees coming include upgrades to the Network's telescopes. They needn't necessarily be made any larger, as improvements to low-noise amplifiers and signal-processing software offer the chance to improve accuracy. Canberra's amplifiers are already cryogenically cooled to 4.5 kelvin.
Desktop upgrades are also on the agenda. Stephenson said he uses a Solaris workstation running version 8 of the OS. Linux is making inroads as Sun kit becomes harder to rely on.
The Voyagers, too, are becoming less reliable, but Stephenson hopes to stick around to help them keep communicating, if only because it's a pleasure to come to work in the bucolic atmosphere of the Canberra complex, which nestles in a small valley that feels a thousand miles from care but is just a 20-minute commute from the city's suburbs.
“Of the three stations it is the most picturesque,” Stephenson says. “You feel like you are working in the country. And when I go for a walk at 3AM and look up at the stars, it is quite special.” ®
Piping curl to bash is dangerous. Hopefully this is well known at this point, and many projects no longer advertise this installation method. Several popular posts by security researchers have demonstrated all kinds of covert ways such instructions could be subverted by an attacker who managed to gain access to the server hosting the install script. Passing HTTP responses uninspected to other sensitive system commands is no safer, however.
This is how Docker documentation currently advertises adding their GPG key to the trusted key ring for Debian's APT: piping curl's output straight into sudo apt-key add -.
The APT key ring is used to authenticate packages that are installed by Debian's package manager. If an attacker can put their public key into this key ring they can for example provide compromised package updates to your system. Obviously you want to be extra sure no keys from questionable origins get added.
Docker documentation correctly prompts you to verify that the added key matches the correct fingerprint (in contrast to some other projects). However, the suggested method is flawed, since the fingerprint is only checked after an arbitrary number of keys has already been imported. What's more, the command shown specifically checks only the fingerprint of the Docker signing key, not of the key (or keys) that were actually imported.
Consider the case where download.docker.com starts serving an evil key file that contains both the official Docker signing key and the attacker's own key. In this case, the instructions above will not reveal anything suspicious: the displayed fingerprint matches the one shown in the documentation.
However, importing our key file also silently added another key to the key ring. Considering that apt-key list outputs several screen-fulls of text on a typical system, this is not easy to spot after the fact without specifically looking for it:
$ sudo apt-key list|grep -C1 evil
pub 1024R/6537017F 2017-08-21
uid Dr. Evil <evil@example.com>
sub 1024R/F2F8E843 2017-08-21
The solution is obviously to not pipe the key file directly into apt-key without inspecting it first. Interestingly, this is not straightforward. The pgpdump tool is recommended by the first answer that comes up when asking Google about inspecting PGP key files, but the version in Debian Jessie fails to find anything suspicious about our evil file.
Debian developers suggest importing the key into a personal GnuPG key ring, inspecting its integrity and then exporting it into apt-key. That is more of a hassle, and not that useful for Docker's key, which doesn't use the web of trust. In our case, inspecting the file directly with GnuPG (for example with gpg --with-fingerprint) is enough to show that it in fact contains two keys with different fingerprints.
The key file used in the example above was created by simply concatenating the two ASCII-armored public key blocks. It looks pretty suspicious in a text editor because it contains two ASCII headers (which is probably why pgpdump stops processing it before the end). However, two public key blocks could easily have also been exported into a single ASCII-armor block.
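For completeness, here is a sketch of reproducing the demo file and the inspection step (file names are examples; gpg --with-fingerprint prints one fingerprint per key found in the file):

import subprocess

# concatenate the two ASCII-armored public key blocks, as described above
docker_key = open("docker.asc", "rb").read()
evil_key = open("evil.asc", "rb").read()
with open("combined.asc", "wb") as f:
    f.write(docker_key + evil_key)

# unlike the apt-key check, this lists the fingerprint of *every* key
subprocess.run(["gpg", "--with-fingerprint", "combined.asc"], check=True)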
Metamarkets handles a lot of data. The torrent of data that clients send to us surpasses a petabyte a week. At this scale, the ability to failover gracefully, to detect and eliminate brownouts, and to efficiently operate huge quantities of byte-banging machines is necessary.
We started and grew Metamarkets in AWS’s us-east region. And the majority of our footprint was in a single availability zone (AZ). As we grew, we started to see the side effects of being restricted to one AZ, then the side effects of being restricted to one region. It’s kind of like inflating a balloon in a porcupine farm, where you know it is a bad idea, but while you’re trying to figure out where to start inflating a new balloon, the prior one keeps on filling up with more air!
As we investigated growth strategies outside of a single AZ, we realized a lot of the infrastructure changes we needed to make to accommodate multiple availability zones were the same changes we would need to make to accommodate multiple clouds. After some looking around, we decided the Google Cloud Platform was potentially a very good fit to the way Metamarkets’ business and teams operate and the way some forces in the infrastructure industry are trending.
This post will cover some of the pragmatic differences we have experienced between AWS and GCP as cloud providers as of 2017. Some of the comparisons will be listed as unfair comparisons; in these instances we believe the level of service Metamarkets has subscribed to differs between the two cloud providers. In the interest of transparency, our primary operations in AWS are in us-east, which is the oldest AWS region and subject to a lot of cloud legacy, both in its users and (I suspect) in its internal hardware and design.
While the rest of this post centers on some of the higher-level considerations of GCP and AWS, it is worth calling out the key use cases Metamarkets has for each cloud. During our initial investigations, node spin-up time on GCP was so fast that we found race conditions in our cluster management software. The distributed load-balancer intake methodology GCP employs also means clients often hop onto the GCP network very close to their point of presence. Combined with per-minute pricing, this makes GCP a natural choice for things which scale up and down regularly relating to real-time data, so Metamarkets runs all of our real-time components (for which we have our own throughput-based autoscaling) on GCP. In AWS, we have a large pool of compute resources running at high CPU utilization which also dips into spot market resources as needed. These compute resources use a combination of local disk and various EBS-attached volume types depending on the SLO of the services using them. The workloads commonly run on our AWS instances receive instructions for a chunk of local computation that needs to be performed; the aggregated results are then sent back or shuffled to other nodes in the cluster.
Input and output is one of the core bread-and-butter aspects of the cloud. Being able to push data to and from disk and network is core to how data flows in a cloud environment. It is worth noting that the general trend in the industry seems to be to push users onto network-attached storage instead of local storage. The upside is that the claimed failure rates for network-attached storage are lower than for local disk (we're big, but not big enough to have our own raw-disk failure statistics). The downside is that when network-attached disk has problems, you may see them on multiple instances at the same time, potentially taking the instances down completely (easy to manage) or inducing a brownout (hard to manage). When a local disk fails, the solution is to kill that instance and let the HA built into your application recover on a new VM. When network disk fails, or has a multi-instance brownout, you're just stuck and have to fail over to another failure domain, which is usually in another availability zone or, in some cases, another region! We know this because this kind of failure has caused production outages for us before in AWS. This trend toward network-attached storage is one of the scariest industry trends for big data in the cloud, and there will probably be more growing pains before it is resolved.
Of the two cloud vendors, AWS has the better offering for local disk. While the newest C4 / R4 / M4 instance classes are EBS-only, the I3 / D2 / X1 series have excellent options for local disk.
At the time of this writing, the rate card for local SSD in GCP is much higher than for similar storage in AWS, making it an option for people who absolutely require local SSD, but an economically punishing one. (Late note: a price change was just announced.)
For performance, AWS offers a lot of options as far as your expected throughput, and the ability to burst capacity for short times. This allows for a lot of configuration options, and adds a lot of extra monitoring concerns that you need to account for. The end result is you can likely get highly specialized disk settings for your exact application needs, as long as you are willing to spend the initial effort to get tuning and monitoring correct. We do not do extreme disk tuning, and instead go for macro adjustments, which generally means using a particular disk class and changing either zero settings or poking simple optimizations. As such, we tend to use GP2 or SC1. Our experience with these two particular disk types is that the performance expectations can be inconsistent and can suffer from noisy neighbors and multi-VM brownouts or blackouts. We do not currently use provisioned IOPS. As such, as long as your disk throughput needs are within the bounds of skew caused by these effects, and you can fail over to an isolated failure domain, they make great options.
For GCP the options are more limited, but in our experience the network-attached disk performs EXACTLY as advertised, so shockingly to spec that we haven't needed any deeper configuration options. We haven't seen any hiccups in the disk layer in GCP yet, and we only use network-attached disk (persistent disk).
The networking comprises node-to-node traffic, node-to-external traffic and node-to-network-disk traffic. The two clouds tackle networking very differently, and the differences need to be taken into account depending on the needs of your applications.
For AWS, networking expectations are one of the hardest things to figure out. The instances give general specifications such as "low," "medium" and "high" network, or general limits like "up to 10Gbs." If you read the fine print on the 10Gbs or 20Gbs instances, you'll notice that you will only see those throughputs if you are using placement groups, which can be subject to freeze-outs where you cannot get capacity. This means your network throughput and consistency will be highly varied and hard to predict. To make full use of these networks, you also need special network drivers that are not packaged with some older distributions, as well as special flags enabled on your instances to turn on the enhanced networking. This leaves you with an extra piece of critical software versioning to test, deploy, and keep track of. Luckily, Linux distributions have started carrying more up-to-date AWS cloud network drivers by default.
Astute readers will note that disk reading over a network that has such loose guarantees of throughput is a recipe for unpredictability. AWS has mitigated this by having “EBS optimized” as an option which puts the network traffic for network attached storage in a different bandwidth pool. In our experience this eliminates contention for resources with yourself, but does not prevent upstream-EBS issues.
For GCP, networking per VM is both significantly higher than what is achieved in AWS, and more consistent. The achievable network capacity is based on the quantity of CPUs your VMs have. GCP shows a strong ability to define a spec and deliver on the throughput expectations. The balance here is that GCP has no dedicated bandwidth for network attached storage, but the total network available is higher.
Don’t let the reviews above scare you too much. With a little tuning, a disk-and-network-heavy operation like Kafka broker replacement, where we tend to max out the network bandwidth, can be made to show very strong top-hat characteristics on both AWS and GCP.
From a network logistics standpoint, GCP has an advantage in how it labels its zones. For GCP (at the time of this writing) the zone names are the same for everyone. In AWS the zones are shuffled around per account, so it is difficult to determine which of your zones corresponds to another account's zones. Additionally, the zones reported by spot prices on some of the billing and invoicing documents do not directly correspond to a particular zone's alphabetical notation.
The fundamental technology behind AWS EC2 VMs is Xen, while GCP GCE VMs are built on KVM. The way each handles compute demand is significantly different from the guest OS perspective. As a simple example, AWS claims full exposure of the underlying NUMA topology on its latest series of i3 instances, whereas GCP publishes little to no information regarding NUMA considerations on its platform.
In practical usage in our clusters, Samza shows significantly more CPU skew for the same amount of work done on GCP compared to AWS where the CPU is more consistent from VM to VM. For VMs that run with relatively light CPU utilization (<60% or so), this will probably not be enough to affect the guest system. But for VMs that are intended to run near 100% CPU usage (like heavily bin-packed containers), and require approximately equal work done by different nodes over time, this can make capacity planning more challenging.
This difference is also apparent in some of the cloud service offerings. Kinesis from AWS is structured around shards, which are similar to Kafka topic partitions. If you are only looking to parallelize your work, such a setup works best if your workers have a nearly even work distribution, to prevent any particular shard from falling behind. This also requires making sure the data going into the shards is nearly equal in the "work" required to process it. SQS from AWS is a different message-passing option which claims to have "nearly unlimited" TPS capabilities, but we have never tested its scaling limits for our use cases. On the GCP side, PubSub is the all-in-one solution for both a distributed log and messaging, billed on a bytes-through basis. We have not evaluated PubSub at our scale.
From a flexibility standpoint, the GCP offering of custom machine types is something we use extensively. The maximum memory per core available in GCP is not as high as in AWS (unless you want to pay a premium), but that has not affected our services running in GCP.
Our interaction with AWS support has traditionally been on only the most rudimentary level. This is largely because most support modes that operate on a percent of cloud spend become worse and worse deals for you as your cloud spend increases. You do not get economies of scale on your support. This leaves us on the lowest level of support.
To kick-start our utilization of GCP we engaged their Professional Services offering. Our interaction with the Google PSO team has been very positive, and I highly encourage anyone considering a serious undertaking on GCP to ask their representative about such opportunities. The PSO team we worked with was very good at addressing our needs, from the best ways to get rid of toil to bringing in experts to talk to our engineers about the current and future plans for some key GCP products. Outside of the PSO team, we have generally found GCP support better at either directly addressing concerns or identifying the specific parts they cannot help with. GCP support has been a much more favorable interaction compared to our experience (at the support level we pay for) with AWS support.
For AWS, multiple “Availability Zones” are within a single “Region”. For GCP, multiple “Zones” are in a single “Region”. With how the machines are laid out, on AWS you have the option of getting dedicated machines which you can use to guarantee no two machines of yours run on the same underlying motherboard, or you can just use the largest instance type of its class (ex: r3.8xlarge) to probably have a whole motherboard to yourself. In GCP there is no comparable offering. What this means is that for most use cases, you have to assume in any particular zone that all of your instances are running on the same machine. We have had failures of underlying hardware or some part of networking simultaneously take out multiple instances before, meaning they shared the same hardware at some finer level than just availability zone.
In theory, failures should be isolated per zone; in rare scenarios a failure hits an entire region at once, and in incredibly rare scenarios it hits multiple regions. In practice, what we have seen is that using multiple zones protects primarily against resource freeze-out (being unable to launch new VMs) and only secondarily, and weakly, limits the scope of hardware failures. What it does not protect against is a particular cloud service having sudden issues across multiple zones. Unfortunately, we do not have stats on multi-region failures in either AWS or GCP, so we cannot tell the difference between a local failure and a global one.
GCP has been a lot more forthcoming about what issues their services are experiencing, whereas in AWS we will often see issues that go unreported or, worse, be told that there is no issue (see Support above). The transparency provided by GCP is more helpful for our team because it allows us to make better decisions about investing in cloud failover (problem on GCP’s side) versus monitoring, detection, and service self-healing (problems on our side). If you take the self-reporting of incidents at face value, AWS often states the blast radius of an issue as confined to a specific region, whereas GCP has more claims of global issues.
Both providers offer SLAs for different services and both want you to use their vendor-specific offerings for various items as core parts of your technology. But both the GCP SLA and the AWS SLA only cover the services affected. This means if a failure in blob storage causes your expensive compute time to be wasted, then the SLA might only cover the blob storage cost. You should discuss SLAs with your account representative if you have any questions or need further explanations.
A unique feature of GCP is the ability to migrate your VMs to new hardware transparently. This live migration is something we were very hesitant about at first, but in practice, when a chunk of Kafka brokers was live-migrated in the middle of a broker replacement, none of our metrics could even detect that a migration had occurred!
Wow… billing. The fundamental ways in which AWS and GCP bill are very different, and getting a handle on your cloud spend is a huge hassle in both. AWS provides a pre-canned billing dashboard offering basic macro insights into your bill. GCP provides estimates exported into BigQuery, upon which you can build your own Data Studio reports. We do not find either of these sufficient and instead opt to use our own expertise to build beautiful, interactive views of our cloud billing data. Since the Metamarkets user interface is designed to help make sense of highly dimensional time series data, putting the cloud invoices into this system was a natural choice and has worked out very well so far.
For AWS, cloud spend accrues as line items, where each line item has a rate identifier whose rate is multiplied by the consumed resource quantity. This follows a very predictable, denormalized data scheme that is compatible with multiple analysis tools.
For GCP billing exports into BigQuery, each line item is an aggregate of accrued usage over a time period (many GCP-internal calculations are rolled up to day boundaries at the time of this writing), and has sub-components of credits. For example, if you run an n1-highmem-16 at the standard rate card of 0.9472 dollars/hr for one day, you will see a line item for 22.7328 dollars for “Highmem Intel N1 16 VCPU running in Americas” with a usage of 86400 seconds. If you run the same instance throughout the month, you will eventually start to see a credit for “Sustained Usage Discount” as an item in the nested list of credits for that resource’s usage, beginning on whichever time-slice it starts to get applied. Subtract the sum of all the credits from the sum of the usage costs and you have your final bill. This has two major disadvantages: 1) auditing is very hard; you usually have to just take the numbers as presented (in our experience the numbers are correct, just hard to calculate independently); 2) calculating an “estimated spend this month” projection is very challenging, which makes your finance team cranky.
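To make that arithmetic concrete, here is a small sketch of reconstructing a bill from usage rows and their nested credits, in the shape of the BigQuery export (the field names are simplified and the credit amount is invented for illustration):

```python
# Simplified, illustrative rows in the shape of GCP's BigQuery billing export.
line_items = [
    {
        "description": "Highmem Intel N1 16 VCPU running in Americas",
        "usage_seconds": 86400,
        "cost": 0.9472 * 24,  # $0.9472/hr * 24 hr = $22.7328 for one day
        "credits": [],
    },
    {
        "description": "Highmem Intel N1 16 VCPU running in Americas",
        "usage_seconds": 86400,
        "cost": 0.9472 * 24,
        # Later in the month, sustained-usage credits show up nested here.
        "credits": [{"name": "Sustained Usage Discount", "amount": -4.55}],
    },
]

usage = sum(item["cost"] for item in line_items)
credits = sum(c["amount"] for item in line_items for c in item["credits"])
# Credits carry negative amounts, so the final bill is usage plus credits.
print(f"usage ${usage:.4f} + credits ${credits:.2f} = bill ${usage + credits:.4f}")
```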
This section focuses more on the higher-level aspects of cost, not the specific rates Metamarkets pays for different services.
The strategy for AWS is largely built around instance reservations. With the recent addition of convertible reservations and instance size flexibility, experimenting with more efficient instance configurations is much easier. The only downsides we’ve encountered with this program are occasional instance-type freeze-out in a particular zone due to lack of availability, and the complexity of handling convertible reservations that do not all share the same start date. We find the flexibility provided by these features well worth the wait for capacity to become available. Upgrades to instance types or pricing tend to come on roughly 12-to-16-month cycles. Ask your AWS representative if you have any questions or concerns along these lines.
For GCP, the strategy seems to be headed toward committed use discounts and sustained usage discounts with a premium for specific extended compute or extended memory needs. This allows for quite a bit of flexibility in how your clusters are configured, and provides a natural way to transition from independent VMs to a containerized environment.
For transient instances, each provider has a slightly different solution. For GCP, preemptible instances are the alternative to running things with guaranteed tenancy. An interesting feature of GCP preemptible instances is that they are terminated if left up for 24 hours, which makes budgeting their spend straightforward and helps make sure you are not doing crazy things on preemptible instances that you shouldn’t be doing. For AWS, the offering is the spot market. We love the spot market, but it does make your monthly bill very hard to predict. The nature of the transient VM offerings means that the capacity available for any particular task is a function of how long the task needs to run and how many resources it needs. If you are going to go down the route of using transient instances extensively, make sure you have the ability to migrate your workload among different resource pools.
Both cloud providers have excellent security features for data and we have never been concerned about security of the cloud providers themselves. This section is dedicated more to the ease of management and a few feature differences between the cloud providers.
IAM
AWS has very detailed IAM rules that, in general, are focused on functions performed against resources. For example, there are many distinct operations you can perform on a resource, such as get, list, describe, and edit. At the time of this writing, S3 supports 20 different operations you can perform against an object. This means you are probably going to end up granting large swaths of rights to some IAM roles, and just one or two to others.
In GCP, IAM is centered more around pairing logins or IDs with intentions against a resource. Groups are more a logical construct to make managing the IDs easier, and instead of detailed operations that can be performed against resources, intentions against the resource are expressed as “Roles.” While there is mixed support for fine-grained access controls, the general use cases are going to rely on pre-canned roles and intentions such as “Viewer,” “Subscriber,” “Owner,” or “User.”
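As a rough illustration of the difference in grain (the bucket, project, and account names below are made up), an AWS policy enumerates specific API actions against a resource, while a GCP binding pairs an identity with a pre-canned role:

```python
# Illustrative only; bucket, project, and account names are invented.

# AWS: grant two of S3's many fine-grained actions on one bucket.
aws_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-bucket",
            "arn:aws:s3:::example-bucket/*",
        ],
    }],
}

# GCP: express the intention ("viewer") as a role bound to an identity.
gcp_binding = {
    "role": "roles/storage.objectViewer",
    "members": [
        "serviceAccount:reader@example-project.iam.gserviceaccount.com",
    ],
}
```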
Threats
As far as being the target of attacks, we noticed a significant difference between AWS and GCP. If you search your sshd logs for the phrase, “POSSIBLE BREAK-IN ATTEMPT!”, the quantity of attempts in GCP is dramatically higher than in AWS. In GCP we typically see somewhere around 130,000 break-in attempts every day. In AWS it is on the order of a few hundred.
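If you want to reproduce this comparison on your own fleet, a minimal sketch follows; note that the auth log path varies by distribution (/var/log/auth.log on Debian-family systems, /var/log/secure on RHEL-family).

```python
# Count sshd break-in warnings in an auth log.
def count_break_in_attempts(log_path="/var/log/auth.log"):
    count = 0
    with open(log_path, errors="replace") as log:
        for line in log:
            if "POSSIBLE BREAK-IN ATTEMPT!" in line:
                count += 1
    return count

if __name__ == "__main__":
    print(count_break_in_attempts())
```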
KMS
Both clouds offer a form of Key Management Service. We use this to store secrets in blob storage (S3 / GS) in an encrypted form. The secrets are usually DB passwords or the secret component of cloud keys for the other cloud (AWS secrets encrypted and stored on GS, and GCP secrets encrypted and stored in S3). Read access to the blob storage and decryption rights against the key are limited to specific machines (machine role in AWS, service account in GCP) so that specific instances can read and decrypt the secret to authenticate against the other cloud. Both clouds have very workable interfaces. The transparent decryption AWS offers for S3 is very easy to use and gives AWS’s KMS solution an advantage for our use case.
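The AWS half of that pattern looks roughly like the following with boto3 (the key alias, bucket, and object name are hypothetical); the GCP half is symmetric with Cloud KMS and GCS.

```python
import boto3

kms = boto3.client("kms")
s3 = boto3.client("s3")

# Write path: encrypt the other cloud's secret, park the ciphertext in S3.
ciphertext = kms.encrypt(
    KeyId="alias/cross-cloud-secrets",  # assumed key alias
    Plaintext=b"gcp-service-account-secret",
)["CiphertextBlob"]
s3.put_object(Bucket="example-secrets", Key="gcp/secret.enc", Body=ciphertext)

# Read path: only instances whose role carries both s3:GetObject on the
# bucket and kms:Decrypt on the key can recover the plaintext.
blob = s3.get_object(Bucket="example-secrets", Key="gcp/secret.enc")["Body"].read()
secret = kms.decrypt(CiphertextBlob=blob)["Plaintext"]
```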
Our largest compute footprint runs on home-grown modifications to CoreOS (a close cousin of the GKE COS) that add Mesos support. For some of our other service clusters we are investigating cloud container systems. In early investigations, GKE has been much easier to adopt and has better high-level features than ECS, but the networking connectivity restrictions in GKE are very limiting for migrating services from a non-containerized environment to a containerized one (something being actively addressed). The problem related to CPU skew also makes tightly packed GKE nodes more worrisome. I’m bullish that the cloud providers will come up with increasingly better solutions in this area.
The monitoring features for AWS are exposed through CloudWatch, and for GCP through Stackdriver. Both can provide basic dashboards but lack the real-time slice-and-dice capabilities our team needs, so we use our own Metamarkets products to monitor the metrics coming off our machines. For logging, we found that Stackdriver can provide some interesting information through its access to details at the load-balancer level, but for the vast majority of our logging needs we export data to SumoLogic. Neither CloudWatch nor Stackdriver treats containerized services as a first-class assumption. As container orchestrators such as Mesos and Kubernetes gain more popularity, this is an area where I’m hoping to see more innovation down the line.
One of the odd aspects about GCP was that many of the features of interest for our use were pre-GA. This left us with a strange choice, where we had to determine if going with a pre-GA offering was more risky or less risky than developing an alternative in-house. It is worth noting that Gmail was in beta from 2004 to 2009, a hefty testing timeline. So a common question we would ask our account representative was “Is this real-beta or gmail-beta?” Most of the time, pre-GA items were determined to be more stable and reliable than what we could cook up as an alternative in a short time.
In general, AWS has a greater quantity of more mature features, but the features GCP is publishing tend to come with less vendor lock-in. This also means you can try out the public versions of many of the GCP offerings without any spend on the GCP platform itself, which is very valuable for feeding the natural curiosity of developers.
The AWS auto scaling groups function close to how we traditionally operate scaling. We don’t really use any auto-scaling capabilities, but use ASGs as a way to do instance accounting. Being able to modify the instance count in the UI is very handy. In GCP, instance groups have a nasty side effect: you cannot leave the instance quantity unspecified in Deployment Manager, so it is very easy for one operator to scale an instance group and another to push a different count through Deployment Manager.
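For what it’s worth, using an ASG purely as an accounting knob comes down to declaring the count you want and letting AWS converge on it; a minimal sketch (the group name is assumed):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Treat the ASG as an instance-accounting knob: set the desired count
# and let AWS launch or terminate instances to match it.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="example-workers",  # assumed group name
    DesiredCapacity=12,
    HonorCooldown=False,  # apply now instead of waiting out the cooldown
)
```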
The GCP web UI is a little more modern and feels snappier, though the recent updates to the AWS console are a huge improvement over the prior version. The in-browser SSH sessions in GCP are also very nice. For building instance templates, the ability to just plop a file into GCS and use it as your instance root image is a very handy GCP feature.
At Metamarkets, we believe in the cloud. More specifically, we believe that one day soon people will think of servers the same way they think of circuits. Our technology investments are aimed at making the connectivity of data to insight completely seamless. By exercising the advantages of various cloud providers, Metamarkets is better positioned to adjust to a changing world and adapt our compute needs as the ravenous desire for data insight continues to grow.
Editor’s note: this article was updated on August 6th 2017, to better reflect current terminology relating to neurodiversity.
Imagine a young Isaac Newton time-travelling from 1670s England to teach Harvard undergrads in 2017. After the time-jump, Newton still has an obsessive, paranoid personality, with Asperger’s syndrome, a bad stutter, unstable moods, and episodes of psychotic mania and depression. But now he’s subject to Harvard’s speech codes that prohibit any “disrespect for the dignity of others”; any violations will get him in trouble with Harvard’s Inquisition (the ‘Office for Equity, Diversity, and Inclusion’). Newton also wants to publish Philosophiæ Naturalis Principia Mathematica, to explain the laws of motion governing the universe. But his literary agent explains that he can’t get a decent book deal until Newton builds his ‘author platform’ to include at least 20k Twitter followers – without provoking any backlash for airing his eccentric views on ancient Greek alchemy, Biblical cryptography, fiat currency, Jewish mysticism, or how to predict the exact date of the Apocalypse.
Newton wouldn’t last long as a ‘public intellectual’ in modern American culture. Sooner or later, he would say ‘offensive’ things that get reported to Harvard and that get picked up by mainstream media as moral-outrage clickbait. His eccentric, ornery awkwardness would lead to swift expulsion from academia, social media, and publishing. Result? On the upside, he’d drive some traffic through Huffpost, Buzzfeed, and Jezebel, and people would have a fresh controversy to virtue-signal about on Facebook. On the downside, we wouldn’t have Newton’s Laws of Motion.
Let’s take a step back from this alt-history nightmare and consider the general problem of ‘neurodiversity’ and free speech. In this article, I’ll explore the science of neurodiversity, and how campus speech codes and restrictive speech norms impose impossible expectations on the social sensitivity, cultural awareness, verbal precision, and self-control of many neurodivergent people.
I’ll focus on how campus speech codes impose discriminatory chilling effects on academic neurodiversity, partly because I’m a nerdy academic who loathes speech codes. But it’s not just personal. Ever since the Middle Ages, universities have nurtured people with unusual brains and minds. Historically, academia was a haven for neurodiversity of all sorts. Eccentrics have been hanging out in Cambridge since 1209 and in Harvard since 1636. For centuries, these eccentricity-havens have been our time-traveling bridges from the ancient history of Western civilization to the far future of science, technology, and moral progress. Now thousands of our havens are under threat, and that’s sad and wrong, and we need to fix it.
This article is a bit long, because the argument is new (as far as I know), and it requires a bit of background. But I hope you’ll stick with me, because I think the issue is neglected and important. (A note on terminology: universities are commonly assumed to be ‘neurohomogenous’, where everyone is ‘neurotypical’, but in fact they are ‘neurodiverse’ and include many ‘neurodivergent’ people, who cluster into ‘neurominorities’ sharing certain conditions, and who may become ‘Neurodiversity Movement’ activists to advocate for their rights. People with Asperger’s syndrome sometimes call themselves ‘aspies’. The ‘neurodiversity’ term came originally from the Autism Rights Movement, but now includes many variations in brain function apart from the autism spectrum.)
From eccentricity to neurodiversity
Censorship kills creativity, truth, and progress in obvious ways. Without the free exchange of ideas, people can’t share risky new ideas (creativity), test them against other people’s logic and facts (truth), or compile them into civilizational advances (progress). But censorship also kills rational culture in a less obvious way: it silences the eccentric. It discriminates against neurominorities. It imposes a chilling effect on unusual brains that house unusual minds. It marginalizes people who may have great ideas, but who also happen to have mental disorders, personality quirks, eccentric beliefs, or unusual communication styles that make it hard for them to understand and follow the current speech norms that govern what is ‘acceptable’. Harvard’s speech codes and Twitter’s trolls may not prohibit anything in Principia itself, but they drive away the kinds of eccentric people who write such books because of all the other ‘offensive’ things they sometimes do and say.
Eccentricity is a precious resource, easily wasted. In his book On Liberty (1859), John Stuart Mill warned that ‘the tyranny of the majority’ tends to marginalize the insights of the eccentric:
The amount of eccentricity in a society has generally been proportional to the amount of genius, mental vigour, and moral courage which it contained. That so few now dare to be eccentric, marks the chief danger of the time. (Chapter 3, paragraph 13).
Nowadays, the tyranny of the neurotypical oppressing the neurodivergent may be the chief danger of our time.
The neurotypicality assumption behind speech codes
Campus speech codes may have been well-intentioned at first. They tried to make universities more welcoming to racial and sexual minorities by forcing everyone to speak as inoffensively as possible. But a side-effect of trying to increase demographic diversity was to reduce neurodiversity, by stigmatizing anyone whose brain can’t color inside the lines of ‘appropriate speech’. The more ‘respectful’ campuses became to the neurotypical, the more alienating they became to the neurodivergent.
Here’s the problem. America’s informal ‘speech norms’, which govern what we’re allowed to say and what we’re not, were created and imposed by ‘normal’ brains, for ‘normal’ brains to obey and enforce. Formal speech codes at American universities were also written by and for the ‘neurotypical’. They assume that everyone on campus is equally capable, 100% of the time, of:
Using their verbal intelligence and cultural background to understand speech codes that are intentionally vague, over-broad, and euphemistic, to discern who’s actually allowed to say what, in which contexts, using which words;
Understanding what’s inside the current Overton window of ‘acceptable ideas’, including the current social norms about what is ‘respectful’ versus what is ‘offensive’, ‘inappropriate’, ‘sexist’, ‘racist’, ‘Islamophobic’, or ‘transphobic’;
Using ‘Theory of Mind’ to predict with 100% accuracy which speech acts might be offensive to someone of a different sex, age, race, ethnicity, national origin, sexual orientation, religion, or political outlook;
Inhibiting ‘inappropriate’ speech with 100% reliability in all social contexts that might be reported or recorded by others;
Predicting with 100% accuracy what’s likely to trigger outrage from peers, student activists, social media, or mainstream media – any of which might create ‘adverse publicity’ for the university and a speech code inquisition, without due process or right of appeal, for the speaker.
Speech codes assume a false model of human nature – that everyone has the same kind of brain that yields a narrow, ‘normal’ set of personality traits, cognitive and verbal abilities, moral temperaments, communication styles, and capacities for self-inhibition. This neurotypicality assumption is scientifically wrong, because different people inherit different sets of genes that influence how their brains grow and function, and every mental trait shows substantial heritability. These heritable mental traits run deep: they are stable across adolescence and adulthood, and they span everything from social intelligence to political attitudes. They also predict many aspects of human communication – probably including the ability to understand and follow formal speech codes and informal speech norms. The neurodivergent are often just ‘born that way’.
Why speech codes stigmatize the most creative thinkers
When universities impose speech codes, they impose impossible behavioral standards on people who aren’t neurotypical, such as those with Asperger’s, bipolar, Tourette’s, or dozens of other personality quirks or mental ‘disorders’. Historically, neurodiversity was stigmatized with extreme prejudice, but recently the Autism Rights Movement, the National Alliance for Mental Illness, and other advocacy groups have fought for more acceptance. Neurodiversity is even celebrated in recent books such as Thinking in Pictures by Temple Grandin (on Asperger’s syndrome), A Beautiful Mind by Sylvia Nasar (on schizophrenia), The Wisdom of Psychopaths by Kevin Dutton (on Dark Triad traits), and Quiet by Susan Cain (on introversion).
Most of the real geniuses I’ve known are not neurotypical. Especially in evolutionary game theory. They would have a lot of trouble comprehending or following typical university speech codes. I suspect this would have been true for most of the brilliant thinkers who built civilization over the last several millennia. Consider just a few geniuses who seem, given biographical records, to have been on the autism/Asperger’s spectrum: Béla Bartók, Jeremy Bentham, Lewis Carroll, Marie Curie, Charles Darwin, Emily Dickinson, Albert Einstein, Sir Ronald Fisher, Sir Francis Galton, Glenn Gould, Patricia Highsmith, Alfred Hitchcock, Alfred Kinsey, Stanley Kubrick, Barbara McClintock, Gregor Mendel, Bertrand Russell, Nikola Tesla, Mark Twain, Alan Turing, H. G. Wells, and Ludwig Wittgenstein. (Aspies like me enjoy making lists; also see this resource.) Moreover, the world’s richest tech billionaires often show some Asperger-like traits: think Paul Allen, Bill Gates, Elon Musk, Larry Page, Peter Thiel, and Mark Zuckerberg. And in movies and TV, outspoken, insensitive aspies are no longer just ‘mad scientist’ side-kicks, but heroic protagonists such as Tony Stark, Sherlock Holmes, Gregory House, Lisbeth Salander, and Dr. Strange.
On the upside, the civilizational contributions from the neurodivergent have been formidable – and often decisive in science and technology. On the downside, Asperger’s traits seem common among the academics who have suffered the worst public outrages over things they’ve said and done that weren’t intended to be offensive at all.
The varieties of neurodiversity
Restrictive speech norms are a problem for people on the autism spectrum, which includes about 1% of the general public, but which is a much higher proportion of academics in science, technology, engineering, and mathematics (STEM fields) – like Sheldon Cooper, a Caltech physicist on the TV show The Big Bang Theory. Apart from the autism spectrum, a much larger proportion of students, staff, and faculty at any university have other neurological disorders, mental illnesses, or personality quirks that make it hard to avoid ‘offensive’ speech all of the time – even if they’re ‘high functioning’ and have no trouble doing their academic work. For example, speech codes make no allowance for these conditions:
Tourette syndrome (1%) can include irresistible compulsions to say obscene or derogatory things;
Social (pragmatic) communication disorder (a newly recognized disorder, prevalence unknown) impairs abilities to use language ‘appropriately’, to match communication styles to different contexts and listeners, and to read between the lines given subtle or ambiguous language;
PTSD (8% prevalence) increases sensitivity to reminders of past trauma (‘triggers’), which can provoke reactive anger, verbal aggression, and offensive speech;
Bipolar disorder (4%) can trigger manic phases in which beliefs become more eccentric, and speech and sexual behavior become less inhibited;
Schizophrenia spectrum disorders (5% prevalence) often lead to unusual communication styles, social awkwardness, and eccentric views that fall outside the Overton window;
Paranoid, schizoid, and schizotypal (‘Cluster A’) personality disorders (4% prevalence) involve social awkwardness, eccentric behaviors, and odd speech patterns, which can come across as insensitive or offensive;
Histrionic, narcissistic, borderline, and antisocial (‘Cluster B’) personality disorders (2% prevalence) involve impulsivity, attention-seeking, emotional instability and/or lack of empathy, which result in speech and behavior that often violates social norms.
Some of the prevalence estimates are imprecise, and many people have more than one of these disorders. But together, mental disorders like these affect at least 20% of students, staff, and faculty. That’s higher than the percentage of American college students who are Hispanic (17%), Black (14%), LGBTQ+ (7%), or undocumented immigrants (5%). And for many of these mental disorders, symptom severity peaks at the ages of typical college students: universities are demanding that the neurodivergent inhibit their speech most carefully when they are least able to do so.
Apart from diagnosable mental disorders such as Asperger’s, a substantial minority of people on any campus are on the extremes of the Big Five personality traits, which all have implications for speech code behavior. Low Conscientiousness predicts impulsive, reckless, or short-sighted speech and behavior – i.e. being more likely to violate speech codes. Low Agreeableness predicts being ornery, offensive, and disagreeable – i.e. violating speech codes. High Openness predicts adopting unusual beliefs and eccentric behaviors – i.e. violating speech codes. High Extraversion predicts being hyper-social, hyper-sexual, and hyper-verbal – i.e. especially violating codes about sexual behavior and speech. Since the Big Five traits all show substantial heritability, any speech code that can’t realistically be followed by people who score at an extreme on these Big Five traits is basically punishing them for the genes they happened to inherit.
Beyond mental disorders and personality quirks, many people on campuses at any given time are in states of ‘transient neurodiversity’ – altered psychological states due to low blood sugar, life stressors, medication side-effects, or ‘smart drugs’ such as caffeine, Ritalin, Adderall, or Modafinil. Also, sleep disorders affect over 20% of people, and the resulting sleep deprivation reduces inhibition. These kinds of transient neurodiversity can also interfere with social sensitivity, Theory of Mind, and verbal inhibition, and so can reduce the ability to comply with speech codes. Unless universities want to outlaw fatigue, hunger, heartbreak, meds, and coffee, it’s hard to maintain the delusion that everyone’s speech will be 100% inoffensive 100% of the time.
How neurodiversity makes it hard to understand speech codes
Since speech codes are written by the neurotypical for the neurotypical, the neurodivergent often find them literally incomprehensible, and it’s impossible to follow a rule that doesn’t make sense. For example, a typical set of ‘respectful campus’, ‘sexual misconduct’, and ‘anti-harassment’ policies prohibits:
‘unwelcome verbal behavior’
‘unwelcome jokes about a protected characteristic’
‘hate or bias acts that violate our sense of community’
‘sexist comments’
‘degrading pictorial material’
‘displaying objectionable objects’
‘negative posters about a protected characteristic’
These quotes are from my university’s recent policies, but they’re pretty standard. I don’t understand what any of these phrases actually allow or prohibit, and I worked on free speech issues in our Faculty Senate for two years, and in our Sexual Misconduct Policy Committee for one year, so I’ve puzzled over them for some time.
Lacking good Theory of Mind, how could a person with Asperger’s anticipate which speech acts would be ‘unwelcome’ to a stranger, or might be considered ‘sexist’ or ‘sexually suggestive’? Lacking a good understanding of social norms, how could they anticipate what counts as a ‘hate act that violates our sense of community’, or what counts as an ‘objectionable object’? Lacking a good understanding of current civil rights legalese, how could any 18-year-old Freshman – neurotypical or not – understand what a ‘protected characteristic’ is?
The language of campus speech codes is designed to give the illusion of precision, while remaining so vague that they can be enforced however administrators want to enforce them, whenever personal complaints, student protests, lawsuits, or adverse publicity make it expedient to punish someone for being ‘offensive’. So, students, staff, and faculty are expected to be able to ‘read between the lines’ of speech codes to understand what is actually forbidden versus what is actually permitted.
But people differ in their ability to understand spoken and written language, including the dry intricacies of administrative policies, the ever-changing euphemisms of PC culture, and the double standards of Leftist identity politics. Deciphering speech codes requires high levels of verbal, social, and emotional intelligence to discern the real meaning behind vague euphemisms and social justice shibboleths, and the neurodivergent may not have the kinds of brains that can make those kinds of inferences.
Speech codes are also intentionally vague so that anyone who’s upset by someone else’s speech can make a complaint, with the subjective feelings of the listener as the arbiter of whether an offense has occurred. In most campus speech codes, there is no ‘reasonable person’ standard for what speech counts as offensive. This means that even if an aspie or schizotypal person develops an accurate mental model of how an average person would respond to a possible speech act, they can’t rely on that. They’re expected to make their speech inoffensive to the most sensitive person they might ever encounter on campus. The result is the ‘coddling culture’ in which administrators prioritize the alleged vulnerabilities of listeners over the communication rights of speakers. In fact, the only lip service given to neurodiversity in campus speech codes is in the (false) assumption that ‘trigger warnings’ and prohibitions against ‘microaggressions’ will be useful in protecting listeners with PTSD or high neuroticism. Administrators assume that the most vulnerable ‘snowflakes’ are always listeners, and never speakers. They even fail to understand that when someone with PTSD is ‘triggered’ by a situation, they might say something in response that someone else finds ‘offensive’.
Systematizing versus empathizing
Autism spectrum disorders are central to the tension between campus censorship and neurodiversity. This is because there’s a trade-off between ‘systematizing’ and ‘empathizing’. Systematizing is the drive to construct and analyze abstract systems of rules, evidence, and procedures; it’s stronger in males, in people with autism/Asperger’s, and in STEM fields. Empathizing is the ability to understand other people’s thoughts and feelings, and to respond with ‘appropriate’ emotions and speech acts; it’s stronger in females, in people with schizophrenia spectrum disorders, and in the arts and humanities. Conservative satirists often mock ‘social justice warriors’ for their ‘autistic screeching’, but Leftist student protesters are more likely to be high empathizers from the arts, humanities, and social sciences, than high systematizers from the hard sciences or engineering.
Consider the Empathy Quotient (EQ) scale, developed by autism researcher Simon Baron-Cohen to measure empathizing versus systematizing. Positively-scored items that predict higher empathy include:
‘I am good at predicting how someone will feel.’
‘I find it easy to put myself in somebody else’s shoes.’
‘I can tune into how someone else feels rapidly and intuitively.’
‘I can usually appreciate the other person’s viewpoint, even if I don’t agree with it.’
Negatively-scored items that predict lower empathy include:
‘I often find it difficult to judge if something is rude or polite.’
‘It is hard for me to see why some things upset people so much.’
‘I can’t always see why someone should have felt offended by a remark.’
‘Other people often say that I am insensitive, though I don’t always see why.’
Reading these items, it seems like a higher EQ score would strongly predict ability to follow campus speech codes that prohibit causing offense to others. People on the autism spectrum, such as those with Asperger’s, score much lower on the EQ scale. (Full disclosure: I score 14 out of 80.) Thus, aspies simply don’t have brains that can anticipate what might be considered offensive, disrespectful, unwanted, or outrageous by others – regardless of what campus speech codes expect of us. From a high systematizer’s perspective, most ‘respectful campus’ speech codes are basically demands that they should turn into a high empathizer through sheer force of will. Men also score lower on the EQ scale than women, and Asperger’s is 11 times more common in men, so speech codes also impose ‘disparate impact’ on males, a form of sex discrimination that is illegal under federal law.
The ways that speech codes discriminate against systematizers are exacerbated by their vagueness, overbreadth, unsystematic structure, double standards, and logical inconsistencies – which drive systematizers nuts. For example, most speech codes prohibit any insults based on a person’s sex, race, religion, or political attitudes. But aspie students often notice that these codes are applied very selectively: it’s OK to insult ‘toxic masculinity’ and ‘patriarchy’, but not to question the ‘wage gap’ or ‘rape culture’; it’s OK to insult ‘white privilege’ and the ‘Alt-Right’ but not affirmative action or ‘Black Lives Matter’; it’s OK to insult pro-life Catholics but not pro-sharia Muslims. The concept of ‘unwelcome’ jokes or ‘unwelcome’ sexual comments seems like a time-travel paradox to aspies – how can you judge whether a speech act is ‘unwelcome’ until after you get the feedback about whether it was welcome?
Even worse, most campus speech codes are associated with social justice theories of gender feminism, critical race theory, and social constructivism, which reject the best-established scientific findings about sex differences, race differences, and behavior genetics. Requiring aspies to buy into speech codes based on blatant falsehoods violates our deepest systematizer values of logic, rationality, and realism. For an example of a systematizer’s exasperation about unprincipled speech codes, see this letter by a Cornell student with high-functioning autism.
To test my intuitions about these issues, I ran an informal poll of my Twitter followers, asking ‘Which condition would make it hardest to follow a college speech code that prohibits all ‘offensive’ or ‘disrespectful’ statements?’. There were 655 votes across four response options: 54% for ‘Asperger’s’, 19% for ‘Schizophrenia’, 14% for ‘Bipolar’, and 13% for ‘ADHD’. The results of this one-item survey, from a small sample of my eccentric followers, should not be taken seriously as any kind of scientific research. They simply show I’m not the only person who thinks that Asperger’s would make it hard to follow campus speech codes.
In fact, to many STEM students and faculty, empathizers seem to have forged campus speech codes into weapons for aspie-shaming. In a world where nerds like Mark Zuckerberg and Elon Musk are the most powerful innovators, speech codes seem like the revenge of the anti-nerds.
How speech codes impose disparate impact on neurominorities
When a policy is formally neutral, but it adversely affects one legally protected group of people more than other people, that’s called ‘disparate impact’, and it’s illegal. People with diagnosed mental disorders qualify as ‘disabled’ people under the 1990 Americans with Disabilities Act (ADA) and other federal laws, so any speech code at a public university that imposes disparate impact on neurominorities is illegal.
What is the disparate impact here? Given restrictive speech codes and speech norms, neurodivergent people know that at any time, they might say something ‘offensive’ that could lead to expulsion, firing, or denial of tenure. They live in fear. They feel a chilling effect on their speech and behavior. They learn to self-censor.
Consider how speech codes can feel wretchedly discriminatory to neurominorities:
Imagine you’re a grad student in the social sciences and you hear about peers getting into trouble making off-the-cuff remarks when teaching controversial classes, such as Human Sexuality, American History, or Social Psychology. You are deterred from teaching, and drift away into private industry.
Imagine you are a man with Asperger’s syndrome doing a science Ph.D. and you see social justice activists destroying nerdy male scientists for their non-PC views, trivial mistakes, or fictional offenses, as in the cases of Matt Taylor or Tim Hunt. You realize you’ll probably make some similar misjudgment sooner or later if you stay in academia, so you leave for a Bay Area tech start-up that’s more forgiving of social gaffes.
Imagine you’re an anthropology professor with Asperger’s, so you can’t anticipate whether people will find your jokes hilarious or offensive until you tell them. But you get better student course evaluations when you try to be funny. Now your university imposes a new speech code that says, basically, ‘Don’t say anything that people might find offensive’. You need good course evaluations for promotion and tenure, but your brain can’t anticipate your students’ reactions to your quirky sense of humor.
Imagine you’re an undergrad, but you have bipolar disorder, so sometimes you get into manic states, when you become more outspoken in classes about your non-PC views on sexual politics.
Imagine you’re a university system administrator with Tourette syndrome, so that sometimes in meetings with other IT staff, you can’t help but blurt out words that some consider racially or sexually offensive.
In response to these chilling effects, neurodivergent academics may withdraw from the social and intellectual life of the university. They may avoid lab group meetings, post-colloquium dinners, faculty parties, and conferences, where any tipsy comment, if overheard by anyone with a propensity for moralistic outrage, could threaten their reputation and career. I’ve seen this social withdrawal happen more and more over the last couple of decades. Nerdy, eccentric, and awkward academics who would have been outspoken, hilarious, and joyful in the 1980s are now cautious, somber, and frightened.
This withdrawal from the university’s ‘life of the mind’ is especially heart-breaking to the neurodivergent, who often can’t stand small talk, and whose only real social connections come through vigorous debate about dangerous ideas with their intellectual equals. Speech codes don’t just censor their words; they also decimate their relationships, collaborations, and social networks. Chilling effects on speech can turn an aspie’s social life into a frozen wasteland. The resulting alienation can exacerbate many mental disorders, leading to a downward spiral of self-censorship, loneliness, despair, and failure. Consider political science professor Will Moore: he had high-functioning autism, and was so tired of accidentally offending colleagues that he killed himself this April; his suicide note is here. If being driven to suicide isn’t disparate impact, what is?
There’s an analogy here between neurodiversity and ideological diversity. Campus speech codes have marginalized both over the last couple of decades. American universities are now dominated by progressive Leftists, registered Democrats, and social justice activists. They are hostile and discriminatory against students, staff, and faculty who are centrist, libertarian, conservative and/or religious. There are real career costs to holding certain political views in academia – even if those views are shared by most Americans. This problem of ideological diversity is already being addressed by great organizations such as the Heterodox Academy and the Foundation for Individual Rights in Education, by online magazines such as Quillette, and by free speech advocates such as Alice Dreger, Jonathan Haidt, Sam Harris, Laura Kipnis, Scott Lilienfeld, Greg Lukianoff, Camille Paglia, Jordan Peterson, Steven Pinker, and Bret Weinstein. By contrast, the neurodiversity problem has not been discussed much, although it might be easier to solve through anti-discrimination lawsuits. In principle, speech codes discriminating against certain ideologies is a form of disparate impact, but at the moment, being a Republican or a Neoreactionary is not a ‘protected class’ under federal anti-discrimination law, whereas having a disability such as a mental disorder is.
Conclusion: What to do about neurodiversity and free speech
Campus speech codes discriminate against neurominorities. They impose unrealistic demands, fears, and stigma on the large proportion of students, staff, and faculty who have common mental disorders, or extremes on the Big Five personality traits, or transient disinhibition due to sleep deprivation or smart drugs. As a practical matter, it is virtually impossible for someone with Asperger’s, bipolar, ADHD, low Agreeableness, low Conscientiousness, extreme fatigue, or Modafinil mania to understand what kinds of speech acts are considered acceptable, and to inhibit the production of such speech 100% of the time, in 100% of educational and social situations.
In a future article, I’ll outline a legal strategy to use the ADA to eliminate campus speech codes that discriminate against neurominorities.
For the moment, just consider this: every campus speech code and restrictive speech norm is a Sword of Damocles dangling above the head of every academic whose brain works a little differently. We feel the sharpness and the weight every day. After every class, meeting, blog, and tweet, we brace for the moral outrage, public shaming, witch hunts, and inquisitions that seem to hit our colleagues so unpredictably and unfairly. Like visitors from a past century or a foreign culture, we don’t understand which concepts are admissible in your Overton window, or which words are acceptable to your ears. We don’t understand your verbal and moral taboos. We can’t make sense of your double standards and logical inconsistencies. We don’t respect your assumption that empathizing should always take precedence over systematizing. Yet we know you have the power to hurt us for things we can’t help. So, we suffer relentless anxiety about our words, our thoughts, our social relationships, our reputations, and our careers.
That era is over. Neurodiversity is finding its voice and its confidence. People with mental disorders and eccentric personalities have rights too, and we will not be intimidated by your stigma and shaming. We will demand our rights under the ADA through the Department of Education, the Department of Justice, and in federal district courts. We will educate administrators about the discriminatory side-effects of their bad policies. We will shatter your Swords of Damocles and raise our freak flags to fly over campuses around the world.
For centuries, academia has been a haven for neurodiversity – a true ‘safe space’ for eccentric thought and language, for thinking the unthinkable and saying the unsayable. We will make it that haven again, and there is nothing that university administrators can do to stop us. Everything is on our side: behavioral science, intellectual history, federal law, public opinion, and liberal academia’s own most sacred values of diversity and inclusivity. Neurodiversity is here to stay, and we will not be silenced any longer.
If the neurodivergent stand up for our free speech rights, campus speech codes will go extinct very quickly. In the future, they will be considered a weird historical curiosity of runaway virtue-signaling in early 21st-century American academia. The freedom to think eccentric thoughts and say eccentric things must be protected again. The freedom to be eccentric must be restored. Newton must be welcomed back to academia.
Acknowledgements: For helpful feedback on earlier drafts, thanks to Jean Carroll, Diana Fleischman, Jonathan Haidt, Claire Lehmann, Greg Lukianoff, and many fine people on Twitter.
Geoffrey Miller is a tenured associate professor of psychology at the University of New Mexico, where he’s fought for several years to eliminate speech codes. He is the author of The Mating Mind, Mating Intelligence, Spent, and What Women Want. His research has focused on sexual selection, mate choice, human sexuality, intelligence, humor, creativity, personality traits, evolutionary psychopathology, behavior genetics, consumer behavior, evolutionary aesthetics, research ethics, virtue signaling, and Effective Altruism. He did a podcast called The Mating Grounds; follow him on Twitter @primalpoly.
In July last year, long-haul truck driver Stephanie Klang got a rare speeding ticket because she was too engrossed in listening to public radio.
“It’s okay, I only get a speeding ticket about once every 10 years,” she said. “… It was worth it for the story.”
She told the state patrolman that yes, she knows listening to the radio is not a valid excuse, then proceeded to tell him all about the radio show that took her mind off her speed — an episode of BackStory about the history of taxes in the U.S. after the country had just broken away from England.
Klang has been a truck driver for 37 years, going through all 48 states, and she listens to public radio all the time. She said she used to have a small booklet listing all the public radio stations in the country, which she got as a gift for pledging support.
“I used that book until it absolutely fell apart, and I wish I’d ordered two of them now,” she said.
After years of listening, she has memorized some local stations — for example, 90.1 in Dallas and 90.7 in St. Louis.
[Photo: Murphy]
She’s not the only truck driver who listens to NPR — far from it, according to Finn Murphy, who has been a long-haul trucker for more than 30 years.
“Every single driver I’ve ever talked to listens to NPR,” said Murphy.
He recently published The Long Haul, a book about his experiences. “If I can, I’ll schedule my driving to catch Fresh Air with Terry Gross,” Murphy wrote. “ … I’ve got a little crush on Terry, actually. It’s probably because I’ve spent more time with her than anyone else in my life.”
I read the book and interviewed Murphy for a story about the future of the trucking industry, and that part of the book got me curious. Why do truckers like NPR? They probably don’t fit the mold of the “business leader,” “educated lifelong learner” or any of the profiles described by National Public Media. And most people probably don’t think “trucker” when they envision the typical public radio listener. So what can truckers tell us about what public radio knows about its listeners and how we could serve them better?
Murphy writes that even if truckers “may not like the slant, if there is one,” they still listen to public radio. A few years ago, he was sitting at a truck stop coffee counter with a driver who was a Ku Klux Klan member. Murphy asked the other driver if he listened to NPR.
“He said, ‘Oh god, yeah, ‘US Jews and Girls Report.’ I said, ‘Well, what do you mean?’ He said, ‘Well, all the commentators are Jews … and they’re always talking about women’s issues. It drives me crazy.’”
“And I said, ‘Well, so why don’t you stop listening?’” Murphy continued. “And he says, ‘I can’t, because it’s the only station that will go on mile after mile and I can pick it up again.’”
Aside from the content, according to Murphy, drivers like NPR for the continuity. They can keep listening to the same programs from state to state.
They also like NPR because they’re bored, he said, even if a Klan member is listening to a show about Black Lives Matter or transgender people.
“Even if that’s not important to you, that discussion … you’re hearing on the radio is still a whole lot more interesting than anything else you’re going to find on the radio,” Murphy said. “If you want to get excited and exercised and activated about different points of view from yours, that also makes the miles go by.”
Fred Manale, a 55-year-old trucker from Louisiana, said he listens to public radio, though he finds it “disturbing.” For example, he said NPR should not be blaming President Trump for having a connection to Russia.
“It’s almost like the last thing you can listen to, and then when that runs out, then you find something else to listen to,” Manale said.
Murphy also said that public radio stations don’t know how many truckers listen to them. He realized this after talking about his book on public radio stations.
“They’re like, ‘Really? Truckers listen to NPR?’” he said. “I’m like, ‘Yeah, every frigging trucker listens to NPR.’”
Not all stations are oblivious to the audience. WNYC’s Brian Lehrer Show has definitely taken calls from truckers over the years, said executive producer Megan Ryan. And NPR spokesperson Isabel Lara wrote in an email that “it’s high praise when truck drivers — who spend long hours on the road and do more listening than just about anyone else — say they listen to NPR.” She added that NPR’s listener services department does hear from listeners who identify as truckers, some of whom scan for NPR stations as they travel across state lines.
[Photo: Murphy and his truck.]
Murphy said that though he likes NPR, stations could do more to appeal to drivers. For example, he said stations doing pledge drives could consider giving a shoutout to drivers and seeing whether they can contribute.
In his book, he wrote that “it would be incorrect to think that truckers constitute some harmonized bloc of redneck atavism.”
Ray Hollister discovered public radio when he was a long-haul truck driver for a year in 2002. Now the general manager of an IT company, Hollister said the network needs to “speak more to the flyover states. … Do stories that affect more people than just the coast.”
“Truck drivers come from across America,” he said. “They’re a pretty decent cross section of America. I knew a ton of white, black, Asian, Hispanic truck drivers, and the only thing we had in common was that we were all truck drivers.”
If people in public radio are surprised that truck drivers listen to their stations, the system has a blind spot, said John Sutton, general manager of WESA in Pittsburgh and an industry consultant in programming, marketing and research. When people in public radio look at research and talk to potential underwriters and foundations, Sutton said, they focus on how public radio listeners are different — how they’re well educated and more likely to volunteer and engage with the arts.
“Those things are important, but … we often blind ourselves to how similar our listeners are to the average American,” Sutton said. “There are a lot of people who listen to us who don’t have college degrees, and we just don’t focus on those people in a lot of our discussions.”
According to Sutton, an NPR audience survey this year found that the percentage of NPR news listeners who gambled in the past year is almost the same as the percentage who went to the opera or performances of classical music, and that listeners are just as likely as the average American to donate to religious organizations.
People in public radio making content and programming decisions should think “as much about what public radio listeners have in common with the average American, instead of what the differences are,” Sutton said. “… [T]here are lots of truck drivers that are curious about the world, and they’ve got a lot of time to listen.”