Ms. Slaughter told Mr. Lynn that “the time has come for Open Markets and New America to part ways,” according to an email from Ms. Slaughter to Mr. Lynn. The email suggested that the entire Open Markets team — nearly 10 full-time employees and unpaid fellows — would be exiled from New America.
While she asserted in the email, which was reviewed by The New York Times, that the decision was “in no way based on the content of your work,” Ms. Slaughter accused Mr. Lynn of “imperiling the institution as a whole.”
Mr. Lynn, in an interview, charged that Ms. Slaughter caved to pressure from Mr. Schmidt and Google, and, in so doing, set the desires of a donor over the think tank’s intellectual integrity.
“Google is very aggressive in throwing its money around Washington and Brussels, and then pulling the strings,” Mr. Lynn said. “People are so afraid of Google now.”
Google rejected any suggestion that it played a role in New America’s split with Open Markets. Riva Sciuto, a Google spokeswoman, pointed out that the company supports a wide range of think tanks and other nonprofits focused on information access and internet regulation. “We don’t agree with every group 100 percent of the time, and while we sometimes respectfully disagree, we respect each group’s independence, personnel decisions and policy perspectives.”
New America’s executive vice president, Tyra Mariani, said it was “a mutual decision for Barry to spin out his Open Markets program,” and that the move was not in any way influenced by Google or Mr. Schmidt.
“New America financial supporters have no influence or control over the research design, methodology, analysis or findings of New America research projects, nor do they have influence or control over the content of educational programs and communications efforts,” Ms. Mariani said. She added that Mr. Lynn’s statement praising the European Union’s sanctions against Google had been temporarily removed from New America’s website because of “an unintentional internal issue” unrelated to Google or Mr. Schmidt.
Ms. Mariani and Ms. Sciuto said Google is continuing to fund New America.
Hours after this article was published online Wednesday morning, Ms. Slaughter announced that the think tank had fired Mr. Lynn for “his repeated refusal to adhere to New America’s standards of openness and institutional collegiality.”
Ms. Slaughter also wrote on Twitter that the article was “false,” but was unable to cite any errors. New America would not make Ms. Slaughter available for an interview.
It is difficult to overstate Mr. Lynn’s influence in raising concerns about the market dominance of Google, as well as of other tech companies such as Amazon and Facebook. His Open Markets initiative organized a 2016 conference at which a range of influential figures — including Senator Elizabeth Warren of Massachusetts — warned of damaging effects from market consolidation in tech.
In the run-up to that conference, Ms. Slaughter and New America’s lead fund-raiser indicated in emails to Mr. Lynn that Google was concerned that its positions were not going to be represented, and that it had not been given advance notice of the event.
“We are in the process of trying to expand our relationship with Google on some absolutely key points,” Ms. Slaughter wrote in an email to Mr. Lynn, urging him to “just THINK about how you are imperiling funding for others.”
Mr. Lynn is now starting a stand-alone nonprofit with the same team to continue Open Markets’ work. The new group, which does not yet have a name, has funding commitments, though clearly is not expecting money from Google. It has launched a website called Citizens Against Monopoly that accuses Google of “trying to censor journalists and researchers who fight dangerous monopolies.” The site vows, “We are going to make sure Google doesn’t get away with this.”
After initially eschewing Washington public policy debates, which were seen in Silicon Valley as pay-to-play politics, Google has developed an influence operation that is arguably more muscular and sophisticated than that of any other American company. It spent $9.5 million on lobbying through the first half of this year — more than almost any other company. It helped organize conferences at which key regulators overseeing investigations into the company were presented with pro-Google arguments, sometimes without disclosure of Google’s role.
Among the most effective — if little examined — tools in Google’s public policy toolbox has been its funding of nonprofit groups from across the political spectrum. This year, it has donated to 170 such groups, according to Google’s voluntary disclosures on its website. While Google does not indicate how much cash was donated, the number of beneficiaries has nearly quadrupled since it started disclosing its donations in 2010, when it gave to 45 groups.
Some tech lobbyists, think tank officials and scholars argue that the efforts help explain why Google has mostly avoided damaging regulatory and enforcement decisions in the United States of the sort levied by the European Union in late June.
But Google’s Washington alliances could be tested in the coming months. Google emerged as a flash point in the latest skirmish of the culture wars this month after one of its male engineers posted a critique of the company’s efforts to diversify. And its data collection continues fueling questions about its commitment to privacy.
Then there are the mounting concerns about the market dominance of Google, which handles an overwhelming majority of all internet searches globally and dominates internet advertising. Its alleged tilting of search results to favor its services over those offered by competitors led to the European Union’s $2.7 billion antitrust penalty in June.
The Open Markets’ statement that drew Mr. Schmidt’s ire praised the fine, and called on United States regulators to more aggressively enforce antitrust rules against Google, Amazon and “other dominant platform monopolists.”
Last month, Democratic congressional leaders rolled out a policy platform that included a pledge to dismantle monopolies, including in cable and internet service, which some read as a challenge to Google in particular. That sentiment — which appears to have some support from populist elements of President Trump’s base — diverges sharply from the approach that had been taken by most Democrats until recently.
Google’s willingness to spread cash around the think tanks and advocacy groups focused on internet and telecommunications policy has effectively muted, if not silenced, criticism of the company over the past several years, said Marc Rotenberg, the president of the Electronic Privacy Information Center. His group, which does not accept corporate funding, has played a leading role in calling out Google and other tech companies for alleged privacy violations. But Mr. Rotenberg said it has become increasingly difficult to find partners in that effort as more groups accept Google funding.
“There are simply fewer groups that are available to speak up about Google’s activities that threaten online privacy,” Mr. Rotenberg said. “The groups that should be speaking up aren’t.”
Google’s Accelerated Mobile Pages (AMP) are a set of performance-focused restrictions on HTML, CSS, and JavaScript that allow Google to distribute optimized content using their CDN. In a nutshell, Google AMP enforces the use of AMP web components in place of certain HTML tags, restricts CSS selectors that can negatively impact performance, and forbids the use of non-AMP JavaScript outside sandboxed iframes. To learn more about the AMP format, how it works, and how it fits into Google search, take a look at the official overview, technical description, and the AMP by Example portal.
Why do we publish on Google AMP?
Condé Nast is one of the world’s most prestigious publishers. You may recognize some of our brands, which include Ars Technica, Bon Appétit, Golf Digest, GQ, Pitchfork, The New Yorker, Vanity Fair, Vogue, Wired, and others. As a publisher, implementing AMP was a no-brainer. AMP delivers many benefits in terms of performance, consistency, and experience for our mobile users. Users arriving from Google undergo an uninterrupted flow from search into our content. AMP content loads quickly thanks to Google’s CDN and the performance guidelines enforced by AMP HTML. The AMP Layout System ensures a great reading experience by preventing pages from jumping around as third-party content loads. Our regular sites are not by any means slow; we constantly beat our competitors’ load and render times. But there are certain performance guarantees that Google can deliver to their users (e.g. prefetching) when content is on their CDN.
Let’s take a step back and think about what incentivizes us as a publisher. To generate revenue, we need our content to be impactful and discoverable by the largest possible audience. Keeping this audience engaged with quality content across our brands helps us maximize our revenue. Google AMP helps us satisfy these needs. AMP increases the visibility and discoverability of our content by allowing it to be included in Google’s Top News Carousel, as well as improving the experience of regular Google search results. AMP ensures our content consistently loads quickly anywhere in the world, with a seamless experience from Google search through their integrated AMP viewer. This seamless experience leads to increased engagement and decreased bounce rates. Monetization benefits are less clear-cut and are pending further analysis of our current implementation. Our initial AMP launch only served ads with contextual targeting, which have a lower CPM (cost-per-mille, the amount a publisher makes for a thousand impressions of an ad) than the ads we serve on our regular mobile sites, which support both contextual and audience targeting. As we’ve improved targeting and added support for other revenue partners, we see that CPM for AMP has either stayed the same or increased, depending on the brand.
We went live with Google AMP on Vanity Fair a little over a year ago. Post-launch, the traffic and search rank results were very positive: click through rate from Google search went from 5.9% (Regular) to 10.3% (AMP), and average search position went from 5.9 (Regular) to 1.7 (AMP). Since then, we have deployed AMP across fifteen of our brands and we have been very pleased with the results. Today, AMP accounts for 79% of our mobile search traffic and 36% of our total mobile visits. We were able to scale our Google AMP implementation at Condé Nast from one to fifteen brands with minimal disruption and engineering effort from our brands.
How have we implemented Google AMP?
Our brands’ AMP content gets discovered by having each regular page include a link rel="amphtml" tag in its head section, pointing to the URL of the AMP version of that page. In our architecture, we proxy traffic for AMP URLs to our internal AMP service, which is in charge of generating the AMP version of the requested content. The following diagram is an overview of our content creation and AMP distribution architecture:
All of our brands use Copilot, our internal Content Management System (CMS). Our CMS stores content in a customized version of markdown called Copilot Markdown. Copilot Markdown is simply a set of extensions on the CommonMark specification which adds special syntax for embeds, callouts, sections, and other features used by our editors.
When a Google user opens an AMP result for one of our brands, the Google AMP CDN quickly delivers its latest cached copy. Behind the scenes, the Google AMP CDN triggers a request to the URL of the AMP document on our brand’s domain. All of our brands are fronted by the Fastly CDN which provides another layer of caching. When the request hits Fastly, logic in the brand’s varnish control language (VCL) configuration determines if the request is for AMP content. If so, Fastly sets the backend for the request to be the AMP Service which then takes over and generates AMP HTML for the requested content.
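As a rough sketch of that routing decision (the actual logic lives in each brand's Fastly VCL configuration, not in Python; the URL prefix and backend names below are made up purely for illustration), the decision boils down to something like this:

def pick_backend(path: str) -> str:
    # AMP URLs are proxied to the shared AMP service; everything else is
    # handled by the brand's regular web application.
    if path.startswith("/amp/"):
        return "amp-service"
    return "brand-web-app"

assert pick_backend("/amp/story/some-article") == "amp-service"
assert pick_backend("/story/some-article") == "brand-web-app"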
In order for our AMP service to render a piece of content, we have to fetch the content from our CMS, then parse and transform it into React components which get rendered into valid AMP HTML. The following diagram shows this rendering pipeline which is internal to the AMP service:
At Condé Nast, our tech stack uses Node.js and React for most of our brands’ regular websites. Using React with server-side rendering to generate AMP content makes perfect sense for us. This choice allows our brand engineers to easily contribute to our AMP codebase. The use of Node.js and React also allows us to reuse components and helper code across our regular sites and the AMP service.
Ensuring that brands maintain their look and feel when served over AMP was an important priority when creating the AMP service. The AMP service enables this by allowing brands to provide their own configuration and their own Sass files that override our sane defaults. If provided, the configuration and SCSS files control the output AMP HTML markup and CSS respectively. Through this approach, brands are able to toggle features and customize their design. In the future, we will take further advantage of the extensibility of React components to allow much deeper customization of brands’ AMP HTML markup output. Having a service that’s able to do this for all of our brands has proven very valuable and drastically reduced duplication of effort. To start serving AMP traffic, a brand just needs to add a single configuration file and modify their Fastly VCL configuration. As we’ve grown the feature set of our AMP implementation and onboarded more brands, this architecture has proven to be a very scalable solution.
What did we learn?
Our centralized content management system and service-oriented architecture make it nearly effortless to onboard new brands onto AMP. This wasn’t always the case. When we onboarded our first brand, we were not using a service architecture; instead, we had a plugin architecture in which each brand had to depend on the AMP plugin. This plugin architecture resulted in duplicated code as we onboarded other brands, and it made it challenging to roll out new plugin versions across all brands. These problems went away when we switched to the shared AMP service. The power and scalability of a service-oriented architecture have been clear. The AMP service validates the use of cross-brand shared services throughout Condé Nast. Furthermore, the benefits of having our data stored in a common format (Copilot Markdown) for all our brands, with shared tooling, APIs, and effort, have never been more evident.
Overall, AMP has had a positive impact on the business in terms of traffic and on our Google users in terms of experience. Today, AMP accounts for 36% of our total mobile traffic, and we are continuously adding features that create monetization opportunities for the business and improve the experience for our AMP users. We are very excited to see what the future holds as other platforms outside of Google decide to deliver AMP content and more organizations start contributing to the AMP Open Source Project.
State and local officials in Iowa have been working hard to rationalize their handout of more than $208 million in tax benefits to Apple, one of the world’s richest companies, for a data facility that will host 50 permanent jobs.
The deal will help make Iowa an “innovation and technology” hub, Gov. Kim Reynolds gushed. It will ensure development of a big parcel of open land that otherwise would have remained fallow, local development officials said. It was a bargain, according to civic leaders in Waukee, the Iowa community that will host the data center — after all, Apple will be contributing up to $100 million to a local infrastructure fund, starting with money to build a youth sports center.
We were highly skeptical of this deal when it was announced Aug. 24. In the fullness of time, we’ve subjected it to closer scrutiny. And now it looks even worse.
In a broader sense, the Apple deal shows the shortcomings of all such corporate handouts, nationwide. State and local governments seldom perform cost-benefit studies to determine their value — except in retrospect, when the money already has been paid out. They seldom explain why some industries should be favored over others — think about the film production incentives offered by Michigan, Louisiana, Georgia and, yes, Iowa, which never panned out as profit-makers for the states. They’re often negotiated in secret, as was the Iowa deal, then presented to taxpayers as faits accomplis — and often misleadingly.
These incentives often are an unnecessary bonus to companies that already have made a site location decision based on more important factors. Yet states and localities have persuaded themselves that the incentive packages are an indispensable lure to employers and that without them their economies will collapse.
“Firms know where they want to be,” says economist David Swenson of Iowa State University. “The question of how much in rents they can extract from state and local governments is phase two. But taxes are a secondary consideration.”
Worst of all, the handouts allow big companies to pit state against state and city against city in a competition that benefits corporate shareholders almost exclusively. Bizarrely, this process has been explicitly endorsed by Donald Trump. Companies “can leave from state to state and they can negotiate good deals with the different states and all of that,” Trump said in December, as long as they’re not taking the jobs across the border. This is, of course, a formula for what amounts to unrestrained corporate extortion.
These corporate handouts might make sense if they spurred economic growth. They don’t.
“There is virtually no association between economic development incentives and any measure of economic performance,” urban economist Richard Florida concluded in 2012. A study of his found “no statistically significant association between economic development incentives per capita and average wages or incomes; none between incentives and college grads or knowledge workers; and none between incentives and the state unemployment rate.”
Another study found that, if anything, government incentives led to slower growth among the companies that received them, possibly because their managers spent more time pursuing incentives than focusing on the business, and felt less pressure to seek out nonincentive-related growth opportunities.
Apple’s deal with Iowa, which includes about $20 million in a state investment tax credit and a 20-year tax abatement from the city of Waukee worth nearly $190 million, underscores all these elements. First, it was put together behind closed doors and presented to state legislators and economic officials only after the fact. Some elements of the package are still shrouded in mystery.
But at least one major element of the original announcement has proven to be extremely misleading. That’s Apple’s contribution of “up to $100 million” to an infrastructure fund for Waukee. As it turns out, “up to” was an important but overlooked phrase. That’s because the total contribution would be dependent on expansion of the data center well beyond the 400,000 square feet originally announced. Unless that expansion occurs, Apple’s contribution will be only $20 million, a source close to the deal told me. In other words, Apple is guaranteed to contribute only one-fifth as much as people were led to believe.
Apple has sound reasons for locating a $1.3-billion data center — which will provide back-end support for services such as its Siri voice-activated assistant, iTunes and its App Store — on the Iowa prairie. Iowa has abundant wind generation, which enables the company to say the center will operate on electricity from 100% renewable sources. It has plenty of good, flat land if the company decides to expand its data warehouse. And it’s relatively well-insulated from weather extremes, not to mention coastal weather disasters.
As it happens, Apple was already eligible for a major tax break even before it entered negotiations for this deal. In 2007, the legislature enacted a sales tax exemption specifically for data centers, covering their servers, industrial coolers, backup generators, and other computer equipment. Since Apple is planning to spend about $645 million on equipment for the data center, according to Reuters, that implies a break of nearly $39 million from the state’s 6% sales and use tax. Because the exemption was preexisting, it wasn’t even mentioned during the gala announcement of how much Apple would be getting to move to Iowa.
To justify public outlays of this magnitude, one must have a handle on their possible contribution to employment. In the case of data centers, Swenson argues, it’s meager. Not only is the permanent direct employment forecast for the Apple facility a mere 50 people, the potential for indirect employment is also small. Microsoft and Google data centers preceded Apple to Iowa, Swenson observes. “These centers have absolutely no linkages upstream or downstream with the rest of the economy, except for the upstream grab of electricity,” he says. “They’re just big, sterile, hot boxes that don’t feed into Iowa’s economy.”
Indeed, even as Iowa’s data-center complex has expanded over the years, its employment in the sector has shrunk, possibly because the centers are increasingly automated. According to data from the Bureau of Labor Statistics, Iowa employment in the “data processing, hosting, and related services” sector has been falling sharply, to about 3,400 last year from more than 7,400 in 2007. The state’s share of all such employment nationally also has fallen to about 1% now from nearly 3% in 2007. The lesson is that if Iowa officials think their handouts will place them at the hub of a high-tech revolution, they’re chasing an imaginary grail.
Yet politicians continue to shovel out the benefits, hoping to steer their economies in new directions and perhaps acquire a reputation for vision. Nevada was so eager to land a big battery factory from Tesla Motors’ Elon Musk that it offered him twice what Musk was seeking from the five states competing for the project. (In Las Vegas, this is known as “leaving money on the table.”) Wisconsin Gov. Scott Walker gave a big incentive deal to a furniture factory even though it was laying off half its workforce. He followed up last month with an astronomical $3-billion handout to electronics manufacturer Foxconn for a factory likely to employ a fraction of the workforce it forecasts.
The biggest scam of all may be film incentives, which peaked a few years ago when states across the country began to see themselves as rivals to California. The pioneer in this effort, Louisiana, was later shown to be spending $7.29 in incentives for every dollar in revenue brought in. ("People are getting rich on this deal, and it's not Louisiana taxpayers," concluded the study's sponsor, the Louisiana Budget Project.) California eventually was forced to counter the proliferating production raids with a program of its own, but even its relatively modest package costs $1 in outlay for every 65 cents returned to the treasury. Iowa’s film incentive program, by the way, was such a mess that it collapsed in 2009 amid scandal, leading to several felony convictions.
This is how states and localities end up on a merry-go-round of infinite spending. Whenever another deal gets proposed by starry-eyed politicians, the taxpayers should just say no.
A teenage entrepreneur who became a millionaire by 20 before sharing a billion-dollar fortune at 36, Bhavin Turakhia isn’t afraid to think big. Now he’s putting $45 million of his own money into building a rival to Slack and other office messaging platforms.
Flock, a cloud-based team collaboration service, has attracted 25,000 enterprise users and customers including Tim Hortons, Whirlpool and Princeton University. It’s a market that has already drawn interest from global technology giants Facebook Inc., Amazon.com Inc. and Microsoft Corp.
This time last year, few had heard of Bhavin and his younger brother Divyank. That changed when they sold their advertising technology company Media.net, with customers including Yahoo, CNN and the New York Times, to a Chinese consortium for $900 million. The all-cash deal catapulted the duo from mere millionaires into the ranks of the super-rich.
“I want to make Flock bigger and better than anything I’ve built before,” Bhavin Turakhia, wearing his signature dark Levi’s T-shirt and Puma sweatpants, said at his Bangalore offices.
While the brothers built up Media.net together, Bhavin is now driving Flock personally while Divyank focuses on growing the advertising business under its new owners.
Flock is pitched as a way to share ideas in real time while doing away with communication hierarchies, at a cheaper price than rival services. Its free version integrates tools like a to-do list and a polling app that teams can share with each other while also allowing users to summon an Uber or take part in audio and video conferencing. Popular apps like Trello, Twitter, Mailchimp and Google Drive can be combined with the platform, he said.
Flock has set its sights on disrupting a segment where Slack and Microsoft have staked out claims, said Neha Dharia, a senior analyst at Ovum Ltd and the firm’s global specialist tracking business collaboration and communication products. Flock’s monthly charges are $3 per user for its premium version, while Slack’s per-user charges start from $6.67, according to its website. Both offer a free plan.
“As the collaborative messaging space gets hotter, Flock is pricing itself aggressively and fighting for market share,” said Dharia, who said it’s too early to know if it can become a giant killer. “Turakhia will have to keep innovating and integrating technologies like artificial intelligence to keep up with bigger competitors.”
The entrepreneur’s casual dress is no sartorial quirk. His wardrobe consists of 140 identical outfits to do away with frivolous decisions, like choosing what to wear or eat each day, to make him more productive elsewhere, he said. A set of formal clothes is kept in the office for government meetings and client dinners. The Turakhias were born and raised in a middle-class Mumbai family and their father was an accountant. Bhavin, who is two years older than Divyank, started programming at the age of 10 and would devour computer textbooks and even tutor his teachers in advanced programming languages. To understand his idol Bill Gates’ strategy, he read a dozen books about the Microsoft founder and would eventually get an engineering degree.
The brothers bootstrapped the Directi Group with a $500 loan from their father when Bhavin was still 17, buying server space and domain names in the U.S. In 1994, it became India’s first accredited domain registrar and a decade later sold four product companies to the U.S-based web hoster Endurance Group for $160 million.
Bhavin Turakhia has taken his mantra about productivity to other parts of his life by replicating his workspaces at home and in offices in Mumbai, Dubai, London and Los Angeles, right down to the same standing desk and custom-built chairs. To wring more out of his commutes, his cars are fitted with bespoke desks.
His portfolio now consists of five companies besides Flock, including domain name registrar Radix and Codechef, a competitive programming platform that helps students and professionals test and improve their coding skills. Each is an independent entity under the Directi Group and is incorporated in the U.S., Dubai or the U.K.
Like the Turakhias’ previous startups, Flock doesn’t have a single dollar of debt or venture capital. That contrasts with four-year-old Slack, which is backed by SoftBank Group Corp. and Accel Partners and was last month said to be raising $250 million at a valuation of more than $5 billion.
While the first generation of Indian technology companies are often copycats of successful rivals, that isn’t a bad thing, said Vivek Wadhwa, former technology entrepreneur and Distinguished Fellow at Carnegie Mellon University’s College of Engineering at Silicon Valley.
“It is the best way for them to get started and learn, and Chinese entrepreneurs too got their start by copying and now they are innovating,” said Wadhwa, who expects Indian entrepreneurs like Turakhia to challenge established rivals.
Criticism about being a copycat doesn’t faze Bhavin, and neither do Slack’s high-profile investors and lofty valuation.
“I’ve always arrived late to segments: I was eight years behind the rest of the market in web hosting and 10 years behind in contextual advertising but my startups worked their way into the top five,” he said.
There are many descriptions floating around the internet, trying to explain functional programming in simple terms. Unfortunately, most discuss details only loosely related to functional programming, while others focus on topics that are completely irrelevant. So of course, I had to write my own!
This post is my own understanding of what the "core" of "functional programming" is, how it differs from "imperative" programming, and what the main benefits of the approach are. As a worked example, we will use a kitchen recipe as a proxy for the more-abstract kind of logic you find in program source code, to try and make concrete what is normally a very abstract topic. That recipe is one of my favorite recipes available online, Michael Chu's Classic Tiramisu.
About the Author: Haoyi is a software engineer, an early contributor to Scala.js, and the author of many open-source Scala tools such as the Ammonite REPL and FastParse.
If you've enjoyed this blog, or enjoyed using Haoyi's other open source libraries, please chip in (or get your Company to chip in!) via Patreon so he can continue his open-source work
A topic as broad as "Functional Programming", or "FP" has too many different interpretations and facets to be summarized in one blog post. Nevertheless, this post will discuss what I think is the most core, basic level of functional programming. This will hopefully be something that everyone, from FP newbies to FP "experts", should be able to empathise with and agree is a useful part of functional programming.
It's not surprising that many people have tried to explain functional programming using kitchen/recipe/cookbook examples: learning things "by analogy" with things you already know is one of the easiest ways of learning. However, all the explanations I have seen fall short. I will begin by examining some typical, incorrect explanations of what functional programming is about, before discussing how Michael Chu's Classic Tiramisu recipe can provide insight into what I think are the core techniques and benefits of functional programming.
What Functional Programming is Not
There are many poor explanations people have given for "what is functional programming". Here is a selection:
Helper Methods
One of the most common misconceptions about FP is illustrated by the following example:
FP => I'll have a Sazerac
Imperative => Excuse me sir, could you take some ice, add rye whiskey, add bitters, add absinthe, shake, strain into a glass, and add a lemon garnish before bringing it to me
While this example was taken from the y-combinator message board, I've seen this attitude in many places: the idea that functional programming is just taking imperative instructions, and wrapping them in a helper. In this case, the messy imperative code will all sit inside a single helper:
def sazerac():
... 10000 lines of messy imperative code ...
But even in imperative programming you always end up factoring things into helper methods. Java has helper methods. Write assembly, and it ends up being organized with sub-procedures to encapsulate messes of imperative code.
Thus, while this is a useful technique, writing helper methods to wrap your messy code in a single method/function/subprocedure/subroutine call does not count as functional programming.
Furthermore, picking an easier/simpler problem, despite making your code look neater, does not count as "Functional Programming" either. Calling a single method that executes a huge blob of code that someone else has written is convenient, but is not functional programming. The point of FP is to face the complexity, own it, and control it, not shove it inside some unmaintained helper function or say it's a problem for some "other department" to deal with.
It's often said that you can write COBOL in any language, or that you can write Java in any language. Well, you can write any language in Haskell too: a makeCocktail program written as a sequence of shell-like commands is basically writing Bash in Haskell.
Just because something is implemented in Haskell with Monads doesn't mean it's functional programming. If it looks like imperative code written in Bash, and its semantics are like imperative code written in Bash, it's imperative code. Such a program looks exactly like imperative code written in Bash, except it's run using serve $ makeCocktail instead of bash cocktail.sh.
Compile-time AST Macros
Lisp was probably one of the first FP languages to be implemented, and Lisps (including Scheme) tend to have compile-time AST macros that allow you to transform sections of the program at compile time.
But compile-time code-transformations are not unique to Lisp; apart from other FP languages that have them, like Template Haskell or Scala Macros, many languages have some sort of compile-time code transformation. From Java Annotation Processors, to my own MacroPy project in Python, it turns out that compile-time AST macros are just as feasible in imperative languages, doing imperative programming. You can manipulate mutable ASTs using imperative Python code just as easily as you can elegantly transform immutable ASTs using Scala.
Furthermore, there are a large set of "obviously" functional programming languages that don't have AST-transforming macros at all. Purescript, non-Template Haskell, Scala 2.9, and many other "obviously" functional languages do not include support for compile-time AST transformations. So whatever is the core of functional programming, it's not AST macros.
Static Types
There are a large number of people who use FP together with static types, e.g. in languages like Haskell, Scala, or Ocaml. Thus, if you spend all your time within this world, it might be tempting to think that FP is all about static types. Higher-kinded, Rank-N, Dependent, the fancier the types, the more functional the programming.
However, there are probably just as many people using FP without static types: in some parts of the Javascript community, Clojure, Scheme or one of the many other Lisps. It turns out, that all those using FP without types still get many of the benefits. And then there are all those people in static-typed languages like Java that use minimal FP in their code.
So static types, while present in many FP languages, are not the core of FP.
Step by Step Imperative Recipes
Now that we've looked at a few common misconceptions of what FP is, let's look at what the core of FP actually is (according to me), in contrast to "imperative" programming, using Michael Chu's Classic Tiramisu as an example. To begin with, we'll explore an "imperative" version of the recipe, which probably resembles the recipes you already know.
Michael Chu's Classic Tiramisu, like all the other recipes on his excellent recipe site, has roughly four sections on the page:
1. The backstory of the recipe
2. A step-by-step guide, with photos, of how to make the Tiramisu
3. A diagram of the overall process, showing which ingredients are combined with which others, to create the resultant Tiramisu
4. A lively and entertaining comments section
For the purpose of this programming blog we will only be looking at parts 2. and 3.: the step by step guide, and the process diagram. The step by step guide details, in order, a series of steps that you can take to make a Tiramisu. At a high level, hiding many of the details, it looks like this:
1. Begin by assembling four large egg yolks, 1/2 cup sweet marsala wine, 16 ounces mascarpone cheese, 12 ounces espresso, 2 tablespoons cocoa powder, 1 cup heavy cream, 1/2 cup granulated sugar, and enough ladyfingers to layer a 12x8 inch pan twice (40).
2. Stir two tablespoons of granulated sugar into the espresso and put it in the refrigerator to chill.
3. Whisk the egg yolks.
4. Pour in the sugar and wine and whisk briefly until well blended.
5. Pour some water into a saucepan and set it over high heat until it begins to boil.
6. Lower the heat to medium, place the heatproof bowl over the water, and stir as the mixture begins to thicken and smooth out.
7. Whip the heavy cream until soft peaks form.
8. Beat the mascarpone cheese until smooth and creamy.
9. Pour the egg mixture onto the cheese and beat.
10. Fold in the whipped cream.
11. Assemble the tiramisu.
12. Give each ladyfinger cookie a one-second soak on each side and then arrange it in the pan.
13. After the first layer of ladyfingers is done, use a spatula to spread half the cream mixture over it.
14. Cover the cream layer with another layer of soaked ladyfingers.
15. Spread the rest of the cream onto the top and sift cocoa powder over the surface to cover the tiramisu.
16. The tiramisu is now complete; it needs a four-hour chill in the refrigerator.
This is, I think, something like what most people would think of when told "imperative recipe". You start with a set of inputs (step 1) and then perform a series of steps until you have a result at the end. (For now, I'm ignoring the pictures in the recipe, though you could think of them as a sort of assert function for a would-be chef to check some invariants after each step to make sure his tiramisu hasn't gone terribly wrong!)
A simplified Python version of this recipe (ignoring the fact that I'm overloading the same functions to work on different types/number of arguments) may look something like this:
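The original post includes a full code listing at this point; what follows is a minimal sketch along those lines. Every helper (whisk, beat, whip, and so on) is a hypothetical stand-in, stubbed out so the sketch actually runs, and in this imperative version each call should be imagined as mutating the thing passed to it rather than returning a new value.

def kitchen_op(*args): pass   # placeholder: imagine each call mutating its first argument
whisk = beat = whip = fold = kitchen_op
dissolve = soak2seconds = assemble = sift = refrigerate = kitchen_op

def make_tiramisu(eggs, sugar1, wine, cheese, cream,
                  fingers, espresso, sugar2, cocoa):
    dissolve(sugar2, espresso)        # 2.  sweeten the espresso and chill it
    mixture = eggs                    # the egg yolks become the working "mixture"
    whisk(mixture)                    # 3.  whisk the egg yolks
    beat(mixture, sugar1, wine)       # 4.  blend in the sugar and wine
    whisk(mixture)                    # 5./6. thicken over the double boiler
    whip(cream)                       # 7.  whip the heavy cream to soft peaks
    beat(cheese)                      # 8.  beat the mascarpone until smooth
    beat(mixture, cheese)             # 9.  beat the cheese into the egg mixture
    fold(mixture, cream)              # 10. fold in the whipped cream
    soak2seconds(fingers, espresso)   # 12. soak the ladyfingers
    assemble(mixture, fingers)        # 11./13./14. layer fingers and cream in the pan
    sift(mixture, cocoa)              # 15. dust with cocoa
    refrigerate(mixture)              # 16. chill for four hours
    return mixture                    # the finished tiramisu

Note how, past step 2, almost every call quietly changes mixture, cream, cheese or fingers in place; that hidden, ever-changing state is exactly what makes the questions below hard to answer.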
Like most imperative code, it works, but may be hard to understand deeply or difficult to refactor. For example, in cooking terms, you may ask the following questions:
If I have two people to make this tiramisu, which parts can be done in parallel?
My espresso hasn't arrived yet; can I shift that step down and do other things first and include the espresso later when it arrives?
At step 9. I screwed up and spilled the bowl onto the floor. Which steps do I need to re-do (and which ingredients I may have to re-purchase) to recover and continue the recipe?
Just before step 10, you realize you forgot to do step 7. How much of your ingredients have been ruined?
All four of these are things that happen regularly in a kitchen, and also happen to correspond to things you do with program code all the time: parallelizing things over the available cores to speed things up, shuffling the order of a computation around, dealing with failures and exceptions, or plain old bugs and mistakes.
The answers to these questions are left as an exercise for the reader; in this case, with 16 steps, it's not terribly hard. A few minutes spent carefully studying the recipe and you could probably work them out, so you should definitely give it a try.
In a large software project, with a codebase containing thousands or millions of lines of imperative code, that time could easily stretch to days, weeks, or months trying to figure out how to properly recover when one of those imperative steps fails, or how to make your legacy PHP monolith do something faster by using more than 1 of the 32 cores you have available on your beefy server.
The problem in these cases often isn't that you don't know how to run stuff in a separate process in PHP - the problem is that you don't know enough about your own code to decide what to run in that other process. To move things onto a separate process, you need to know exactly what each bit of code depends on, and who depends on it, so you can pick a set with minimal dependencies to run somewhere else (since inter-process communication is expensive). That's difficult when you have a pile of imperative code and don't even understand it enough to easily move things around within a single process.
The reason that these kinds of analyses are hard on this imperative recipe is the same reason that the analyses are hard when programming in an imperative style:
There is an ordering of the steps, but the ordering between some steps is required (e.g. the series 9, 10, 11), while the ordering between other steps is entirely arbitrary: step 2 could be done anywhere before step 11, and steps 7 and 8 could be swapped or done much earlier and nobody would care.
The instructions are based on changing the state of things, e.g. pouring stuff into mixture, a term that we use repeatedly throughout the recipe but means a different thing in each step. Even the meaning of cheese and cream changes as the recipe progresses (e.g. after calling whip(cream)), but it is entirely hidden from you and not obvious from the code.
Overall, these factors make it hard to decide, given a single step S, what steps S depends on, and what other steps depend on S. Again, it is possible to figure it out, but what is somewhat-tedious to figure out in a 16-line tiramisu recipe becomes painful and difficult in a 1,000,000 line enterprise codebase.
So that's what an imperative Tiramisu recipe looks like. What does a "functional programming" Tiramisu recipe look like?
"Functional Programming" Recipes
It turns out, there's an FP version of this recipe right underneath the imperative one! The "process diagram" mentioned above is an excellent illustration of what such a recipe looks like using "Functional Programming":
To read this, the raw ingredients are on the left, and each of the boxes represents a single operation that transforms and combines the ingredients. After all the combinations have taken place, you end up on the right with a single, complete Tiramisu. While this "2D" format is not how people write program source code, the underlying structure is not too different from how people structure "FP" programs, which I will demonstrate below.
This diagram leaves out some of the detail that the full imperative recipe provides, even compared to the abridged version I transcribed above. For example, chilling the espresso or explicitly boiling the water are left out, and the details of assemble are not included. Nevertheless, it contains the same high-level steps of how to build the tiramisu I abridged above. We're not leaving out large numbers of operations or hiding things behind high-level instructions: all the same steps are still there, just organized slightly differently.
But even if this diagram has the same "content" as the imperative instruction-list I discussed earlier, what about it makes this presentation of the recipe more "functional"?
Tiramisu Diagram to Functional Programming
While nobody actually writes their code as a 2D table-flowchart-thing like this tiramisu diagram, it turns out that underneath the 2D format, the "core" of this diagram is the dependency graph between elements:
Where each box takes in some "inputs" from the left, and results in an "output" that can be used by more-rightward boxes. This can be straightforwardly represented in code by treating the boxes as functions, e.g. in the following Python code:
(Again, forgive the fact that I'm overloading the same functions to work on different types and numbers of arguments)
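Here is a sketch of what that Python might look like, written as one big expression so that the nesting mirrors the boxes in the diagram. As before, every helper is a hypothetical stand-in, stubbed out so the code runs; unlike the imperative version, each call returns a new value instead of mutating anything.

def kitchen_op(*args): return ("combined", args)   # placeholder: returns a new value
whisk = beat = whip = fold = kitchen_op
dissolve = soak2seconds = assemble = sift = refrigerate = kitchen_op

def make_tiramisu(eggs, sugar1, wine, cheese, cream,
                  fingers, espresso, sugar2, cocoa):
    return refrigerate(                                           # 16. chill
        sift(                                                     # 15. dust with cocoa
            assemble(                                             # 11./13./14. layer in the pan
                fold(                                             # 10. fold in the cream
                    beat(                                         # 9.  cheese into egg mixture
                        whisk(beat(whisk(eggs), sugar1, wine)),   # 3.-6. egg/sugar/wine base
                        beat(cheese)),                            # 8.  beaten mascarpone
                    whip(cream)),                                 # 7.  whipped cream
                soak2seconds(fingers,                             # 12. soak the ladyfingers
                             dissolve(sugar2, espresso))),        # 2.  sweetened espresso
            cocoa))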
If it's not immediately clear how this code relates to the "functional programming dependency diagram" I discussed above, we can draw the dependency graph of this code: showing where the input variables go, where the return value of each function goes, all the way into the "final" result that gets returned:
It might look like a bit of a mess, but if you look carefully, you will see that although the graphs are laid out differently, the fundamental structure of the two graphs is identical! That is what I mean when I say the 2D box-diagram is an "FP Recipe": although people don't tend to write code in 2D box-diagrams, the underlying structure that the diagram represents is totally equivalent to some "FP"-ish Python code, not too dissimilar to what people do write.
This code looks very unlike code you are likely to see in a Python project, "in the wild", but we can fix that! If you prefer to have intermediate named values instead of one big expression, it's straightforward to pull out each function call into its own statement:
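For example, a sketch of the same function with each call pulled out into a named statement (same hypothetical, stubbed-out helpers as above):

def kitchen_op(*args): return ("combined", args)   # stand-in helpers, as before
whisk = beat = whip = fold = kitchen_op
dissolve = soak2seconds = assemble = sift = refrigerate = kitchen_op

def make_tiramisu(eggs, sugar1, wine, cheese, cream,
                  fingers, espresso, sugar2, cocoa):
    whisked = whisk(beat(whisk(eggs), sugar1, wine))       # 3.-6. the egg/sugar/wine base
    beat_cheese = beat(cheese)                             # 8.  beaten mascarpone
    cheese_mixture = beat(whisked, beat_cheese)            # 9.  cheese into the egg mixture
    whipped_cream = whip(cream)                            # 7.  whipped cream
    folded_mixture = fold(cheese_mixture, whipped_cream)   # 10. fold the cream in
    sweet_espresso = dissolve(sugar2, espresso)            # 2.  sweetened espresso
    wet_fingers = soak2seconds(fingers, sweet_espresso)    # 12. soaked ladyfingers
    assembled = assemble(folded_mixture, wet_fingers)      # 11./13./14. layer in the pan
    complete = sift(assembled, cocoa)                      # 15. dust with cocoa
    return refrigerate(complete)                           # 16. chill for four hours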
That makes it look entirely "pythonic", indistinguishable from the code you might find in any random project on Github.
Moving every expression into a separate statement is a straightforward transformation, at least for FP programs, and is the kind of thing that compilers regularly do automatically. Thus, although that block-flow-chart diagram may have looked a bit foreign at first, it really isn't that different from the code people write day to day, all year round.
In fact, it looks not too unlike the "Imperative" version we came up with earlier!
These two snippets of code look very similar, but the top one is "Functional Programming" while the bottom one is "Imperative Programming". The difference between them?
In the first, you can see that beat(cheese) must come before beat(whisked, beat_cheese), because beat_cheese is defined by beat(cheese) and used by beat(whisked, beat_cheese). Even if you know nothing about beat, cheese or whisked, it is clear from the code that if you tried to reverse the order and run beat(whisked, beat_cheese) before beat(cheese), it wouldn't work.
In the second, it's not so clear: does beat(cheese) need to come before beat(mixture, cheese)? Or does beat(mixture, cheese) need to come before beat(cheese)? In this case, we have a link to the "docs" (the original recipe) so we can look it up, but which one depends on the other - and whether they are currently in the right order - is not clear from the code.
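Concretely, the two fragments being compared look something like this (same hypothetical helpers; the first two lines just make the fragment runnable):

beat = lambda *args: ("beaten", args)                        # stand-in, as before
cheese, whisked, mixture = "mascarpone", "egg mixture", "egg mixture"

# FP: the data flow spells out the required order
beat_cheese = beat(cheese)
cheese_mixture = beat(whisked, beat_cheese)

# Imperative: both calls mutate hidden state, so the required order is invisible
beat(cheese)
beat(mixture, cheese)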
But how does this seemingly-trivial difference affect the way you build software?
Preventing Errors with Functional Programming
The difference between the two Python snippets, the # FP and # Imperative snippets, will become clear with the following thought experiment: what if we try to make changes to the code?
Changing code is something we do all day, and sometimes we do it incorrectly. It would be a nice property of a codebase if changes tended to be easier to make correctly, and incorrect changes were easier to spot. We'll discuss the latter first.
If I try to tidy things up and accidentally move the statement
beat_cheese = beat(cheese)
below
cheese_mixture = beat(whisked, beat_cheese)
It should be clear to me that something is wrong, because there will be no beat_cheese in scope to create the cheese_mixture. Even if it's not clear to me, it's probably clear to my linter and editor:
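In code form, the deliberately mis-ordered fragment looks like this, with the problems a typical linter or editor would flag written as comments (running it as-is would raise a NameError):

cheese_mixture = beat(whisked, beat_cheese)   # error: name 'beat_cheese' is not defined yet
beat_cheese = beat(cheese)                    # warning: 'beat_cheese' is assigned but never used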
As you can see, not only does the usage of beat_cheese raise an error because no such variable is defined, the definition of beat_cheese also raises a visual warning: it is greyed out since it is dead code! This makes it very hard to miss when you make such a trivial error, and saves you time: rather than waiting 10s for your test suite to run, within less than 1s your linter would have lit up and flagged the lines as invalid. Over the days, months and years, this adds up to a significant productivity boost.
However, in the Imperative case, it's not clear how
beat(mixture, cheese)
Relates to the things before or after it. If I remove the beat(cheese) earlier, I still have a cheese to pass in. If I remove the beat(mixture, cheese) entirely, I still have a mixture I can use in later steps of the recipe. So how do I know, from looking at the code, that removing a step or re-ordering them so that beat(cheese) comes after beat(mixture, cheese) is a problem?
The answer is, you often don't, and neither does your computer, or your editor and linter, who aren't going to help you spot the fact that you accidentally swapped two of the imperative statements:
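Here is the corresponding accidental swap in the imperative fragment; nothing about it looks wrong to an editor or linter, because nothing in the code says which call must come first:

beat(mixture, cheese)   # the mascarpone hasn't been beaten smooth yet, but no tool can tell
beat(cheese)            # no undefined-name or unused-variable warning appears anywhere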
Fundamentally, in the "FP" example, the code is laid out in a way that the "correct" usage is obvious: each function, e.g. beat, only depends on the things that are passed into it, and it's output is only depended upon by whoever uses it's return value. In the "Imperative" example, it's not clear who depends on who: you have to memorize the fact that beat(cheese) must come before beat(mixture, cheese), and not the other way around.
While this is not difficult assuming we are looking at already-correct code (the current order is the correct order!), when mistakes are made, and code happens to be incorrect, "FP" code makes the mistakes much easier for you (or your linter) to spot so you can correct them.
While this example may seem contrived, the basic problem exists in all large codebases I've worked with. For example, maybe you've bumped into code similar to the following three functions:
def initialize():
... 1000 lines of messy code, no return value...
def make_app():
... 2000 lines of messy code, no return value...
def start_server():
... 4000 lines of messy code, no return value...
Which transitively depend on a 1 million line codebase ("The App"). How could I know that start_server() needs to be called before make_app(), which itself needs to be called before initialize(), when all of them are global functions which don't take arguments or return anything? I have certainly spent countless days of my career puzzling over such mysteries in large codebases, and I am sure others have too. If start_server returned something I needed to pass to make_app, which returned something I needed to pass to initialize, that would make it clear from the outset which one needs to come before the other.
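A small sketch of that contrast, with hypothetical Server and App placeholder types: in the first version nothing in the signatures reveals the required order, while in the second the data flow makes any other order an obvious error.

# Opaque version: three global procedures, no arguments, no return values.
# Nothing tells you (or your linter) that start_server must run first.
def initialize(): ...
def make_app(): ...
def start_server(): ...

# Data-flow version: the required order falls out of the signatures.
class Server: ...
class App: ...

def start_server() -> Server: return Server()
def make_app(server: Server) -> App: return App()
def initialize(app: App) -> None: pass

initialize(make_app(start_server()))   # any other nesting simply doesn't make sense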
Re-ordering or shuffling around statements is not uncommon. When you are refactoring a piece of code to let you re-use it in a different place, a lot of time is spent shifting bits of code up and down small amounts, just like the example I showed above, so that the code you want to re-use is all in one place and you can extract it into a helper.
Perhaps you just want to tidy up what was previously a messy function to organize the code a bit better than it already is, grouping related lines so they can be read together easily, without changing any behavior at all.
Or perhaps, as mentioned earlier, someone made a mistake and the code that already exists is incorrect, and your job is to figure out which of the statements is out of order so you can fix it.
All of these are things that software engineers do day in, day out. And often, we make mistakes when doing so. With functional programming, whether in a typed language or not, it tends to be much more clear when you've made a trivial, dumb error. That means you get feedback quicker: you get corrected quietly by your linter in the privacy of your own laptop, and can quickly fix it and make progress, rather than waiting a long time only to be loudly yelled at by Jenkins CI in front of your entire team.
Refactoring a Functional Tiramisu Recipe
Even if you haven't already made a mistake, and are just thinking of making a change to a codebase, the # FP version of the code is a lot easier to think about than the # Imperative version. The same often applies whether you're dealing with Python, Javascript, Scala, or a Tiramisu recipe!
I have already shown above how the 2D-block-diagram version of this recipe is exactly equivalent in semantics to an "FP" Python function. For this section I will use the 2D-block-diagrams to illustrate my points, as it is much clearer visually, but the same kind of reasoning applies to "FP" code in Python or any other programming language. When working in an FP style, you quickly get used to performing the same analyses in your head, just as easily, but on lines of source code rather than 2D-block-diagrams.
What is interesting is that this structure lets us very easily answer some of the questions we asked above:
If I have two people to make this tiramisu, which parts can be done in parallel?
This one is easy:
Anything vertically separated can be done in parallel. For example, preparing the ladyfingers and preparing the eggs/sugar/wine are separate and can be done independently, as can whipping the cream and mascarpone cheese. Thus, if you have three people, you might assign:
one person to be the egg/wine/sugar mixture czar,
one to be the mascarpone/cream czar, and
one to be the espresso/ladyfingers czar.
On the other hand, anything horizontally separated has to be done sequentially:
Thus even if you parallelize the early bits, the later beat-fold-assembly-refrigerate steps all have to be done sequentially, and how much time you can save on your Tiramisu is limited by the length of the Critical Path.
Working with the "FP" representation of the recipe doesn't shorten the critical path, and thus doesn't affect how much you can "theoretically" speed up your recipe with parallelism. What it does do is make clear exactly which parts of the recipe can be parallelized and which can't, so you can more quickly organize your work to get maximum parallelism given the constraints of the recipe, and then move on to other things.
Again, while we're looking at a 2D-block-diagram, the same applies to FP-style code in Python, Javascript, Scala, or any other programming language.
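To make that concrete in code rather than in a diagram, here is one way the FP-shaped recipe could map onto a thread pool (a sketch only; the kitchen helpers are the same hypothetical stand-ins used earlier): the vertically separated branches are submitted concurrently, and the joins happen exactly where the diagram's boxes merge.

from concurrent.futures import ThreadPoolExecutor

def kitchen_op(*args): return ("combined", args)   # stand-in helpers, as before
whisk = beat = whip = fold = kitchen_op
dissolve = soak2seconds = assemble = sift = refrigerate = kitchen_op

def make_tiramisu_parallel(eggs, sugar1, wine, cheese, cream,
                           fingers, espresso, sugar2, cocoa):
    with ThreadPoolExecutor() as pool:
        # vertically separated branches: disjoint ingredients, one "czar" (thread) each
        egg_branch = pool.submit(lambda: whisk(beat(whisk(eggs), sugar1, wine)))
        cheese_branch = pool.submit(beat, cheese)
        cream_branch = pool.submit(whip, cream)
        finger_branch = pool.submit(
            lambda: soak2seconds(fingers, dissolve(sugar2, espresso)))

        # horizontally separated steps: the critical path, inherently sequential
        cheese_mixture = beat(egg_branch.result(), cheese_branch.result())
        folded_mixture = fold(cheese_mixture, cream_branch.result())
        assembled = assemble(folded_mixture, finger_branch.result())
    return refrigerate(sift(assembled, cocoa))

Nothing about the recipe changed; the dependency graph alone determined what could be handed to another cook.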
My espresso hasn't arrived yet; can I shift that step down and do other things first and include the espresso later when it arrives?
If your espresso hasn't arrived, anything depending on it can't be done, but anything else involving eggs/sugar/wine/cheese/cream can be prepared: the sections marked in red make it clear which parts of the recipe depend on espresso; the rest can be done while waiting for the espresso to arrive.
What if my eggs haven't arrived? Which steps can I do first before the eggs turn up?
In this case, the top block can't be done but you can prepare the bottom and middle blocks: preparing the expresso, beating the cream and mascarpone cheese. Again, this is obvious from looking at the diagram
At step 9. I screwed up and spilled the bowl onto the floor. Which steps do I need to re-do (and which ingredients I may have to re-purchase) to recover and continue the recipe?
Step 9 is when you beat the Mascarpone cheese into the egg mixture. Once we find it on the diagram, it's clear what we need to do:
You will need to get some new eggs/sugar/wine/cheese and beat/beat/whisk/beat them all over again
What if I spilled the bowl at step 10? Or step 8?
Spilling the bowl at step 10 (folding the whipped cream into the main mixture) is the same as spilling the bowl at step 9, except you need to get new cream too.
If you spill the bowl at step 8 (beating the mascarpone cheese), you just need to get new mascarpone cheese and beat it. The rest of your ingredients are fine.
Just before step 10, you realize you forgot to do step 7. How much of your ingredients have been ruined?
In the diagram above, the red boxes represent the steps we've already done, up to step 10 (folding in the whipped cream). As you can see, not having done step 7 (whipping the heavy cream) is no big deal; we haven't needed it up to now, so we can do it now and continue with step 10.
What if the forgotten step was step 4? Or step 2?
If you forgot step 4 (whisking in wine and sugar to the beaten eggs) you've ruined your eggs/sugar/wine/cheese:
As you can see, the stuff we've been whisking and beating was not prepared properly before being whisked and beaten, since we forgot to mix in the wine and sugar. Assuming we don't know enough kitchen chemistry to incorporate the wine/sugar in at this stage (Our eggs may well have turned into omelettes by now without the additional liquid from the wine...) we will need to re-do all the steps in the upper red box.
If you forgot step 2 (dissolving sugar into espresso) you're fine. The espresso hasn't been needed yet:
According to the imperative recipe above, we should have done the espresso mixing before starting on the egg/wine/cheese. But even though we didn't do it, it is trivial to see from the FP-style recipe that there really isn't any loss: no other steps so far depended on it, and no other ingredients were ruined.
As you can see, many of the questions that were non-trivial to answer when dealing with the imperative code earlier are now trivial to answer when working with the FP-style 2D-block-diagrams.
Again, while nobody actually codes in 2D-block-diagrams (except skilled engineers running recipe blogs), the 2D-block-diagrams are equivalent to a relatively straightforward snippet as shown above. With some experience dealing with FP code, you can often perform the same analyses just as easily when working directly with the equivalent Python code we showed earlier. And it's not just about programmers: automated tools like linters or IDEs often perform the same analysis on the fly, as shown earlier, quickly alerting you if you make a mistake that means the recipe can no longer be completed successfully.
The Core of Functional Programming
The core of Functional Programming is thinking about data-flow rather than control-flow. Although, by virtue of editing plain text, you are forced to order your code in a linear sequence of statements, those statements are a thin skin over what you really care about: the shape and structure of the data-flow graph within your program.
Similarly, when executing a "functional program" in a single thread, you are forced to pick a linear order in which you execute each individual instruction, which e.g. might be the same as the order in which it is written down in the code. But since we know what really matters is the shape of the data-flow graph, we can freely re-arrange the statements in the code, and the order of execution, as long as the graph shape is preserved. Since the data-flow graph matches the graph of definitions and usages, even your editors and linters understand it enough to warn you if you re-arrange things in an invalid order. In fact, if you have multiple cores (or multiple cooks!) you can execute parts of it in parallel, not in any linear order at all! Exactly in what order the program-counter proceeds from instruction to instruction is irrelevant.
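To make this concrete, here is a minimal sketch in Python (with stub functions standing in for the real steps of the earlier recipe; their bodies are invented purely for illustration) showing that two different statement orders give the same result, because they describe the same data-flow graph:

def dissolve(sugar, espresso): return ("sweet espresso", sugar, espresso)
def soak2seconds(fingers, liquid): return ("wet fingers", fingers, liquid)
def whip(cream): return ("whipped cream", cream)

sugar2, espresso, fingers, cream = "sugar", "espresso", "ladyfingers", "heavy cream"

# Order A: espresso branch first, cream branch second.
sweet_espresso = dissolve(sugar2, espresso)
wet_fingers = soak2seconds(fingers, sweet_espresso)
whipped_cream = whip(cream)

# Order B: cream branch first. Different control-flow, same data-flow graph,
# so the results are identical; the two independent branches could even run in parallel.
whipped_cream_b = whip(cream)
sweet_espresso_b = dissolve(sugar2, espresso)
wet_fingers_b = soak2seconds(fingers, sweet_espresso_b)

assert (wet_fingers, whipped_cream) == (wet_fingers_b, whipped_cream_b)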
This is in contrast to an imperative program, where the exact order in which the program-counter executes each statement, going in and out of loops, in and out of sub-routines, is the key to understanding the program. In an imperative program, you tend to think in terms of steps that must happen "before" and "after", and make sure that the control-flow of the program executes the commands in the right order for the program to work.
Note that none of the FP examples here are "less complex" than the "imperative" recipe we discussed above. It's about the same number of lines:
whether as multiple statements, one big expression, or as a 2D block diagram. All the same operations are present: beating, whipping, folding, etc. Functional Programming is not about hiding ugly code in helper methods and hoping nobody notices: it's about managing the same complexity in a way that makes the dependencies between each piece of code obvious, by following the graph of where function arguments come from and where return values end up.
When you have a working program, the dependency graph of function return values being passed into other functions as arguments makes it really easy to analyze code. For example, if we were curious about what exactly is required to get our wet_fingers mixture, we can see:
wet_fingers comes from soak2seconds(fingers, sweet_espresso)
sweet_espresso comes from dissolve(sugar2, espresso)
sugar2, fingers, espresso are the initial ingredients of the recipe
And there you have it: just a few steps, entirely mechanical, and we can see exactly what wet_fingers needs. We need no understanding of what dissolve does, or what sugar2 is: just from the structure of the code we can already see what wet_fingers requires. Just as importantly, we can also see that it does not depend on folded_mixture, whipped_cream, or any of the other steps that are above it in the code: while those steps "come before" the operations that give us wet_fingers, it's clear from this analysis that their ordering is entirely accidental, and that we could, for example, prepare the wet_fingers before the other steps if we so desired.
It's not hard to do this yourself, but any IDE with jump-to-definition should be able to do this for you, and so can automated linters and code analysis tools. And understanding the code is the first step in changing it, without bugs.
When you have a broken program, having the dependencies be easy to analyze means it's easier to spot when you make a mistake or do something out of order: even in a dynamic language like python, a subtly bad copy-paste job can get called out by your editor so you can fix it before needing to run any code:
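For instance, here is a small sketch (stub definitions invented for illustration; the exact warning wording will vary by editor or linter) of the kind of mistake that gets flagged before anything runs:

def dissolve(sugar, espresso): return ("sweet espresso", sugar, espresso)
def soak2seconds(fingers, liquid): return ("wet fingers", fingers, liquid)

sugar2, espresso, fingers = "sugar", "espresso", "ladyfingers"

# Correct data-flow order: every name is defined before it is used.
sweet_espresso = dissolve(sugar2, espresso)
wet_fingers = soak2seconds(fingers, sweet_espresso)

# A bad copy-paste that swapped the two lines would read:
#
#     wet_fingers = soak2seconds(fingers, sweet_espresso)
#     sweet_espresso = dissolve(sugar2, espresso)
#
# and any editor or linter that follows definitions and usages will flag
# sweet_espresso as used before it is defined, with no need to run the code.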
Whether you're working in a dynamic language like Python or a static language like Scala, and whether your code is currently working or broken, Functional Programming's data-flow-centric approach helps you understand your code faster and more easily, with more tooling help, than an imperative, mutation-heavy approach.
Conclusion
The core of Functional Programming is thinking about data-flow rather than control-flow
While this may seem a trivial definition of "Functional Programming", I think it is really the core of the idea. While there are many further steps, from the simple (immutability, referential transparency, ...) to the more advanced (monads, lenses, ...), this core should be something that everyone, from newbies to old hands, whether using Scala or Clojure or Haskell or React.js, should be able to empathise with. Even in a language like Python, as I have used for the examples, it is possible to program in a more "Functional" style, and reap some of the benefits of functional programming.
Those more advanced topics don't really fit this worked example anyway: kitchen ingredients tend to be very, very mutable (and perishable!).
Though it's growing, this baseline-level of FP is not yet widespread in industry.
Whole languages, such as Bash, make it a pain in the neck to take non-trivial function arguments or return non-trivial results, so people's code ends up writing things to the filesystem and hoping they land "before" someone else needs to read them.
Languages like Java encourage patterns where you instantiate a half-baked object and then set its fields later, praying that nobody accidentally tries to use it "too early", in its half-baked state, while its internal variables are garbage.
In all of these cases, the order in which things run - exactly how the program-counter progresses from statement to statement, in and out of for-loops, in and out of sub-routines - is critical.
Even in the kitchen, having an "FP-style" recipe like the block diagram I showed above is helpful, because when the person bringing your Marsala Wine is stuck in traffic, it makes it easier to re-organize your recipe so you can get as much work done as possible in the meantime. When that person arrives, it helps you figure out how to parallelize the work over the people you have available. When someone screws up, it helps you figure out exactly which ingredients you need to re-purchase and which steps you need to re-do.
This widespread applicability, even to fields outside the software world, and to every "FP" language within the software world, is why I think this is truly what functional programming is all about.
I have always been interested in calligraphy, especially the eccentricities
of the swash and black letter capitals.
My father was deeply concerned with Renaissance printing, being a textual
bibliographer of Shakespeare. His major focus was on the errors printers
were likely to make; nevertheless, I grew up in an environment where the long-s
and other archaic devices appeared. My father used to brag that a great aunt
of his still used the long-s in her hand-written notes.
MALVOLIO By my life, this is my lady's hand, these be her very C's, her U's and her T's and thus she makes her great P's. It is, in contempt of question, her hand.
Sir ANDREW Her C's, her U's and her T's: why that?
-- Shakespeare Twelfth Night, II, v
CLEAR="ALL">
Why that indeed. Some questions may have no answers.
In the early '80s I was working at JPL and I met my first bitmap display.
The primary use of this display was to make movies of a simulated fly-by
of Olympus Mons (on Mars). I used it to design my first bitmap font.
In the late '80s I bought my first computer, a Mac II. And with it I bought
a little program called Fontastic which allowed people to design their own
'FONT' resources (mac bitmap fonts).
Then a friend, Godfrey DiGiorgi, recommended that I buy a copy of Fontographer
and design PostScript fonts. I was leery initially. How could a rasterizer
match the quality of a hand crafted bitmap? But eventually I succumbed to
the attractions of the cubic spline.
In the meantime I had studied calligraphy, and once I had Fontographer I began
designing fonts based on various (latin) calligraphic hands.
In the early '90s I was working at a little web start-up company, called
NaviSoft, which was almost immediately bought by AOL. My product was an
html-editor (best known as AOLpress). As I was working to convert it to handle
Unicode I became concerned about the lack of Unicode fonts. I began working
on my own Unicode font (just the alphabetics and symbols, I knew there was
no way I'd be able to deal with all the CJK characters). I designed a font
based on Caslon's work with Bold and Italic variants. And then I started
working on monospaced and sans-serif families (I called the sans-serif design
"Caliban" as a play on Arial).
Aldus (the makers of Fontographer) had been bought by MacroMedia, and MacroMedia
seemed to have no interest in continuing Fontographer. So development on
Fontographer ceased. It did not support OpenType, and its unicode support
was minimal. I began to write little programs to decode Type1 fonts and fix
them up in various ways.
AOL did not really know what to do with AOLpress. AOLpress had been designed
with web designers in mind, not with Steve Case's mother (which was AOL's
target audience). So development on AOLpress ceased and the Unicode/CSS version
never was completed. I continued to work on my fonts however and continued
to be dissatisfied with Fontographer. In 1998 my AOL options matured and
I was able to retire.
I wanted to try to become a primatologist and had made arrangements to spend
4 months in Madagascar as a field assistant to Chia Tan studying the Greater
Bamboo lemur (Hapalemur simus). Sadly I found that I was not really
cut out for that life. I had a hard time recognizing individual animals,
and found that after a few months the leeches were more annoying than I had
expected.
So I gave up on that.
Instead I set about working on an improved version of my html editor, and
started by writing my own Unicode based widget set for it (this was before
pango was out). When the widget set was usable I decided to write a small
application to test it, and something to display the splines of a postscript
font seemed just the thing. Having done that I figured I might as well allow
people to edit those splines and save it back. And so was born the first
version of PfaEdit.
Somehow the html-editor never got written.
I quickly discovered I was better at designing a font editor than I was at
designing fonts, so I gave up on them too.
After a couple months of work I had something which worked, or so I thought,
and I posted it to the web (my friend, Dan Kenan, very kindly gave me some
space on his server, aptly named bibliofile) on 7 November of 2000. Within
a month I had received my first bug report, and presumably had my first user.
I continued working on PfaEdit, adding support for pfb fonts and then in
December for truetype and bdf fonts. I learned about sourceforge and moved
PfaEdit there in April of 2001.
In April of 2001 I added support for type2 fonts embedded in an sfnt wrapper
(opentype fonts, but not the advanced typographic tables). In July of 2001
MinGyoon (from Korea) asked me if PfaEdit could support CID keyed fonts so
I learned about those and added support for them in August.
Valek Filippov suggested that I make PfaEdit be internationalizable, so I
provided a mechanism and he provided a Russian translation of the user interface
in June of 2001. In December of 2005 I gave up on my own system and switched
to GNU gettext.
Language             Translator             Initial version
English
Russian              Valek Filippov         June 2001
Japanese             Kanou Hiroki           August 2002
French               Pierre Hanser          September 2002
Italian              Claudio Beccari        February 2003
Spanish              Walter Echarri         October 2004
Vietnamese           Clytie Siddall         July 2006
Greek                Apostolos Syropoulos   August 2006
Simplified Chinese   Lee Chenhwa            October 2006
German               Philipp Poll           October 2006
Polish               Michal Nowakowski      October 2006
Traditional Chinese  Wei-Lun Chao           August 2007
I started working on support for simple aspects of the OpenType GSUB and
GPOS tables in April of 2001 and finished the process (ignoring bugs of course)
with the contextual chaining lookups in August of 2003. Similarly I started
adding support for the various equivalent Apple Advanced Typography tables
(primarily 'morx') at about the same time.
In an early attempt to get PfaEdit to generate instructions to grid-fit truetype
fonts, I set about to write a truetype instruction simulator so that I could
debug the generated code. It didn't work very well on real fonts. Then, in
early 2001, I discovered freetype and
found that freetype already did this (and did it right). At first I examined
their code to try and figure out what was wrong with mine, but eventually
I gave that up and simply used freetype as an instruction simulator. As things
got more complicated (with David Turner's permission, and many suggestions
from Werner LEMBERG), I eventually wrote a visual front end for freetype's
built-in debugger. For a while this lived in a separate program called mensis,
but in March of 2003 I integrated it into PfaEdit.
Many people urged me to provide a scripting interface to PfaEdit. At first
I could not understand the point -- font design needs a graphical interface
after all. But I was only looking at a small fraction of the tasks that could
potentially be done with such an interface, and in January of 2002 PfaEdit
gained the ability to run scripts.
In 2003 Yannis Haralambous invited me to talk at EuroTex. I fear I rather
disappointed him in my choice of subject matter -- I tried to do better the
next year when Apostolos Syropoulos invited me to EuroTex 2004 (but I overreached
myself then and made some incorrect assumptions). These conferences were
the first time I had actually met any of my users and were quite stimulating,
leading to many suggestions and requests. I learned about SVG fonts at EuroTex
2003 and implemented them soon thereafter.
Yannis was also working on a book, Fontes & codages in which FontForge
figures. He spent a lot of time making suggestions and finding bugs. He
encouraged me to support multi-master fonts and by February of 2004 I had
done so. Then I started working on Apple's distortable font technology (which
has many similarities to Adobe's multi-master, but is rather badly documented)
and, with help from Apple, had them working in April of 2004. I then extended
freetype's support for multi-master fonts to support Apple's distortable
fonts.
In early 2004 people complained that the name "PfaEdit" no longer reflected
the abilities of the program and requested that I change it. Various people
suggested names (including me), but the one I liked the best, FontForge,
came from David Turner of freetype. And in March of 2004 PfaEdit changed
its name to FontForge.
At about the same time I wanted to provide a somewhat more complete ability
to handle PostScript Type3 fonts (or SVG fonts). So I implemented a multi-layered
editing mode which provided a rather clumsy interface to some of the facilities
of vector graphics programs.
In 2005 a Korean company asked me to do something. We had some difficulty
communicating (I don't speak Korean), but eventually I figured out that they
wanted to be able to group glyphs together. Prior to this FontForge handled
encodings as an integral part of the font, which didn't seem right, and it
made implementing groups impossible. So I had to rewrite much of the internals
of FontForge to redo encodings before I could even start on groups. This
took longer than I had thought it would, and by the time I finished (in July
of 2005) the Koreans seemed to have lost interest. Ah well.
I got interested in pdf files in October of 2005, and gave FontForge's Print
command the ability to print to a pdf file. Then I thought it would be kind
of fun to be able to read a font out of a pdf file. I was a little worried
about implementing this because I know that most fonts stored in pdf files
are sub-sets, and only contain the glyphs actually used in the pdf file itself.
I was convinced that I'd get lots of bug reports from people complaining
that FontForge didn't read the entire font. Nevertheless my sense of fun
overcame my fear of silly bug reports and I implemented it.
And I did get bug reports complaining that FontForge did not read the fonts
correctly. And I don't think I was able to convince some of the complainers that the
fonts were incomplete in the pdf file. Ah well.
The X11 folk wanted to move away from the bdf format, so they came up with
their own format (called opentype bitmap, with extension "otb") which was
essentially an sfnt wrapper around a series of bitmap strikes with no outline
font. I implemented that back in July of 2003. But then in July of 2005 they
wanted to preserve the BDF properties as well. So we worked out a new table
(called 'BDF ') to contain the properties from all the strikes in the font.
Now it should be possible to make a round trip conversion of
bdf->otb->bdf and not lose any information.
Many people complained about FontForge's ability to edit quadratic splines.
I had no experience editing quadratic splines before I wrote my original
version, I just made it behave like the cubic spline editor (which seemed
obvious). But doing the obvious makes it hard to create a font that uses
some of the optimizations in the ttf file, and made instructing the font
confusing. So between January and February of 2006 FontForge's quadratic
editing capabilities underwent an evolutionary change as people complained
and I tried to fix things.
I have a testsuite for fontforge. Obviously. Originally it was very simple:
a set of script files which did various actions. If FontForge didn't crash,
then I presumed it worked. That was about all I could test, and although
that's important, there are a few other things which might be examined. So
I wrote a command to compare two fonts and see if they were equivalent.
Originally this had been a separate command (called sfddiff), but if I integrated
it into FontForge I could increase the abilities of the tests I wrote.
FontForge produced some rather naive type1 and type2 fonts which did not
make good use of the PostScript concept of subroutines. In June of 2006 I
did a substantial rewrite of the type2 output code and decreased the size
of my output fonts considerably. My new comparison command was helpful in
debugging. Nonetheless I introduced a number of bugs. Which got fixed, of
course. But it made me leery of doing the same thing for type1 output. After
all, Adobe doesn't even produce type1 fonts any more, so surely I don't need
to optimize them. Michael Zedler said otherwise, and after great effort on
his part induced me (in October 2006) to make better use of subroutines in
Type1 output also. No bugs yet... (but it's still October of 2006).
All of FontForge's dialogs had a fixed layout. Which works fine if you've
only got one language to support, but which looks really ugly (and worse
can be totally illegible) when the dialog is translated into a different
language and labels suddenly become longer (or shorter) and spill over into
the textfield they identify. There has been a sudden burst of people willing
to do translations recently. This mattered. So I stole the concept of boxes
from gtk and implemented them in my widget set (in August of 2006), allowing
a dialog to do its own layout to match the size of the things in it.
The pace of change seems to have slowed recently (Oct 2006) as all of the
large tasks have either been done or proved insurmountable. As more people
use the program they find more bugs and I have less time to do development.
In the last few years there have also been large internal changes which (I
hope) are practically invisible to users and cosmetic changes which make
the dialogs look nicer and more comprehensible but which aren't functional.
My interface to GSUB/GPOS was not well thought out. I stored things in FontForge
at the feature level, while OpenType wants things done at the lookup level.
I thought lookups added an unnecessary level of complexity and ignored them.
But people complained (they always do) that once a font had been read in
to FontForge and saved out again it wouldn't work any more. And that was
because I had lost the ordering imposed by the lookups. So in early 2007
I had to redo much of the internals of fontforge as it related to OpenType.
I also changed the Metrics View so it would handle all OpenType lookup types
(rather than just kerning).
And people didn't like my scripting language. Why hadn't I used python? (Well
because I didn't know python and was lazy about learning more stuff that
I didn't think would be useful to me). Various people told me that they just
couldn't use FontForge because it didn't support python. So I added python
support. Then I discovered that my build machine has such an old version
of python that it doesn't provide libpython -- and I can't upgrade my machine
any more because all the distros require booting from CD now (and my machine
can't).
In May of 2007 I went to the Libre Graphics Meeting in Montreal, and as I
listened to the Inkscape talk on how they handled plugins, I realized that
I could do that too. So I extended the python interface to support python
plugins and menu items. Dave Crossland, as is his wont, had many requests,
and had me update the old Display dialog to support all the OpenType lookups
(just as I'd done for the Metrics View) and then merge that into the Print
dialog too. Dave also felt that FontForge should be able to store a font
directly on the Open Font Library website. Well, they had no API for this,
so I had to sit down and figure out http all over again and see what bits
of the user API I needed to walk through.
In June I started working on Adobe's feature files (I could support them
now that I was handling lookups properly), and found to my shock that
The syntax as presented by Adobe wasn't complete (could not represent all
of opentype)
Some of the syntax that was presented hadn't been implemented by Adobe yet
and was marked "Subject to change"
There was no easy way to represent the "Everything else" class (class 0)
of a class set without enumerating every glyph by hand (which could not be
translated into a class 0).
There was no way to distinguish a contextual class based lookup from a contextual
coverage-table based lookup.
... on and on ...
I had assumed that feature files were a stable useful format and found to
my distaste that they were not. I implemented the bits that Adobe hadn't
implemented, and extended them a bit so I could represent more of OpenType
(and told Adobe what my extensions were, but was told they didn't like them).
Grump. Well I wanted something to store as much of OpenType as I could, and
I wasn't going to wait for Adobe to come up with something (which they still
haven't).
Apostolos gave me the spec for the new 'MATH' table. But that spec had MicroSoft
Confidential printed all over it and I wasn't about to touch it. Apostolos
got annoyed at my ignoring it, so in July he had Sergey from MS send me a
copy of the spec that no longer said "Confidential" on every page. Then I
implemented the new 'MATH' table.
I'd never had a good Embolden function. I'd tried various approaches and
none worked well. This year I decided to try a very simple idea: Use expand
stroke and then squash the glyph together so it was the same height it had
been before. That basically worked. Still a few oddities, but basically
functional.
In July Michal Nowakowski gave me a patch which vastly improved the truetype
auto instructor. I told him I'd only accept it if he would support it. After
some initial grumbling he did so -- and then proceeded to make it even better!
Then about a week later Alexej Kryukov said he wanted to make the autoinstructor
support diagonal stems, and the two of them started working together on this.
At the Libre Graphics Meeting Dave demoed Raph Levien's spiro splines and
encouraged me to integrate them into fontforge. But Raph released under GPL
and wasn't willing to change, and I released under BSD and wasn't willing
to change. I got permission from Raph to repackage his spiro routines into
a small shared library (libspiro) which could be released separately from
FontForge but to which FontForge could link. And we had Raph's spiros in
FontForge.
I realized that not only could I stick python into fontforge, but if I did
a little more work, I could stick fontforge into python. So I wrapped up
most of fontforge into a shared library that python could load. Dave Crossland
had been complaining (again) about the FontForge widget set. When was I going
to move to gtk? (well, I'd tried gtk back in 2004, and found it hard to use,
and bits of it ugly -- and less functional than my own widget set in the
ways that mattered to me, so I had given up on it). Dave offered to fund
development of a gtk fontforge UI, but only if I'd switch to GPL. I dislike
GPL, it seems so restrictive to me, so I said I wouldn't. Then I realized
that I could rework my library until it was independent of widget set, and
allow Dave to write a UI to sit on top of it, not bound by the fontforge
license. So I reworked the internals of fontforge to make them extensible,
stripped the UI out of libfontforge. And started to work on a gtk based fontforge
of my own.
Dave Crossland was complaining on the Open Font Library mailing list about
how much information was lost when a font was released. Guidelines. Names
of lookups. Cubic splines used for generating the quadratics of TrueType.
And about the need for providing sources. Well, providing sources of fonts
can be difficult, and not always useful if the tools to generate the fonts
aren't also available. However there is no reason why much of that information
can't be stored in the font itself. I already had a table that FontForge
would create (called 'PfEd', left over from PfaEdit days) which stored per-glyph
comments, and other things. I could simply extend that table to store guidelines
and other things. And document it so that others could use it, of course
-- but I'd already done that.
And that brings us up to Jan 2008. I guess the pace of change sped up a bit
this year as opposed to last.
Alexey and others complained that they wanted multiple layers of splines.
More than just the Foreground, Background, Guidelines layers that FontForge
came with. One common request was to have both a cubic (PostScript) and a
quadratic (TrueType) layer and be able to generate fonts from both. So in
March of 2008 FontForge grew multiple layer support.
Later in March I added support for the OpenType 'BASE' and Apple 'bsln' tables.
And to amuse myself I added the ability to have gradient fills in Type3 (and
svg) fonts.
In June I was thinking of the embolden command I did the year before, and
realized that that was essentially the same idea as was needed for generating
Small Caps glyphs from Capital letters. And then some of those algorithms
could be used to create condensed and extended glyphs. And then I sat down
and wrote a generic "glyph change" dialog -- years ago I had had a "MetaFont"
command which was supposed to allow the user to embolden fonts or condense
them or ... Unfortunately my MetaFont never worked very well (and some users
complained that it didn't read Knuth's mf files. Sigh. No, it metamorphosed
fonts in its own way, not Knuth's), so it got removed. Now it was basically
working in a new form -- but I know better than to call it MetaFont now.
Alexey then stepped in and rewrote much of the code. I did not handle diagonal
stems well when creating small caps, and that was just what he was doing
with the autohinter. So he greatly improved the output.
I was also intrigued by italics. Converting a font to italics involves so
many different things -- the font is slanted, compressed, serifs change,
letter forms change, ... I studied existing fonts to see what I could learn
and asked various real typographers. The consensus I heard from them was
that I could never make a good italic font from a roman one mechanically
and should not bother to try -- it would just lead to people making bad italic
fonts. Good advice, but I didn't follow it. I thought it was a neat challenge.
And it was something Ikarus had done, so I wanted to do it too.
In July a friend of mine, who is a mac user, said she wouldn't even consider
looking at fontforge on her mac unless it behaved more like a mac application.
So I figured out how to build a mac Application, and how to respond to apple
events (like having someone double click on a font file, or drop a font file
on fontforge's icon). I figured out how to start up X so that the user didn't
have to. I made pretty (well, I think they are pretty) icons for font files.
I even changed the menus to use the command key on the mac and to show the
mac cloverleaf icon.
My friend still (November) has not looked at fontforge. Ah well.
Dave Crossland had hired someone to integrate cairo into fontforge. But the
result never got back to me. In a moment of foolish boredom I decided I could
do that too. So I studied cairo, and it really didn't seem that hard. But
it was slow -- at least on my 10 year old x86 machine which doesn't support
XRender. Cairo gave two things I cared about, anti-aliased splines in the
glyph view, and anti-aliased text everywhere. Well I needed cairo in the
glyph view, but pango would also provide fuzzy text and was lighter weight
and would also support complex scripts (which fontforge's own widget set did
not do). So I could turn off cairo everywhere but the glyph view but still
get fuzzy text from pango. And speed things up. Then Khaled Hosny suggested
that I implement pango. Hurumph. And I had wanted to surprise people. Oh
well. Implement Pango I did.
A group in Japan created the "Unofficial mingw fontforge page". A very nice
piece of work. It included a set of X resources which provided another, nice
look to the UI. A theme. And then other people started writing themes -- and
started complaining about and finding old bugs in fontforge's resource reading
code -- it had never been exercised before I guess.
I have received many suggestions from many people, too many to enumerate
here, and FontForge is the better for their requests. Often I have reacted
badly to these suggestions (because they always mean more work for me), and
I apologize for that, but mostly I wish to thank those who have helped make
FontForge what it is today.
Currently, probably the biggest complaint about FontForge is the choice of
widget set. No one likes my widgets (except me). Unfortunately for the rest
of the world I don't like the two choices of
widget set available to me (gtk and qt). I will get started working on
converting to one and then run into some problem I can't work around easily
and give up and go back to my own. Well in 2008 I still don't like gtk, but
I have the fontview working in it. A start but probably not something I will
continue.
Headless mode is a useful way to run Firefox, which is just as it sounds — Firefox is run as normal, except without any of the UI components visible. This may not be very useful for surfing the web, but it is very useful for automated testing. This article provides all you need to know about running headless Firefox.
Using headless mode
You can run Firefox in headless mode from the command line, by including the -headless flag. For example:
/path/to/firefox -headless
For the moment, we've kept it simple, but more options will be added later on.
As an example, we are working on a --screenshot option, which will allow you to easily take screenshots using headless Firefox. See bug 1378010 for progress on this.
Browser support
Headless Firefox works on Fx55+ on Linux, and 56+ on Windows/Mac.
Automated testing with headless mode
The most useful way to use headless Firefox is to run automated tests with it, meaning that you can make your testing process much more efficient.
Selenium
To provide an example of using headless mode with automated testing, we'll create a Selenium test using Node.js and the selenium-webdriver package. For this guide we'll assume that you already have basic familiarity with Selenium, Webdriver, and Node, and that you already have a testing environment set up. If you don't, work through our Setting up Selenium in Node guide first, then come back.
First of all, make sure you've got Node installed on your system, and the selenium-webdriver package installed, then create a new file called selenium-test.js.
Let's add some code. Inside this file, start by importing the main selenium-webdriver module, and the firefox submodule:
var webdriver = require('selenium-webdriver'),
    By = webdriver.By,
    until = webdriver.until;
var firefox = require('selenium-webdriver/firefox');
Next, set the MOZ_HEADLESS environment variable to 1, so that the test will be run in headless mode:
process.env.MOZ_HEADLESS = "1";
Now create a new driver instance for Firefox, and use setFirefoxOptions() to include an options object that specifies that we want to run the test on the Nightly channel (this step will be unnecessary on Linux, and after headless mode lands in the release channel on Windows/Mac, but it is still useful if you want to test a Nightly-specific feature):
var driver = new webdriver.Builder()
    .forBrowser('firefox')
    .setFirefoxOptions(new firefox.Options().setBinary(firefox.Channel.NIGHTLY))
    .build();
Finally, add the following code, which performs a simple test on the Google search homepage:
SlimerJS has Firefox support built in on Linux, with Mac and Windows support coming soon. See Headless SlimerJS with Firefox by Brendan Dahl for more details.
In addition, you will be able to use headless Firefox to run automated tests written in most other popular testing apps, as long as they allow you to set environment variables.
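For example, because the switch is just an environment variable, the same approach works from Selenium's Python bindings. The following is a rough sketch, not part of this walk-through; it assumes Firefox, geckodriver, and the selenium package are installed:

import os
from selenium import webdriver

os.environ["MOZ_HEADLESS"] = "1"   # same environment variable as in the Node example above

driver = webdriver.Firefox()       # starts Firefox with no visible UI
try:
    driver.get("https://www.mozilla.org/")
    print(driver.title)            # a real test would assert on this value
finally:
    driver.quit()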
Troubleshooting and further help
If you are having trouble getting headless mode to work, then do not worry — we are here to help. This section is designed to be added to as more questions arise and answers are found.
On Linux, certain libraries are currently required on your system even though headless mode doesn't use them — because Firefox links against them. See bug 1372998 for more details and progress towards a fix.
If you want to ask the engineers a question, the best place to go is the #headless channel on Mozilla IRC. If you are pretty sure you've found a bug, file it on Mozilla Bugzilla.
WiMonitor makes Wi-Fi packet sniffing and network monitoring dead simple!
Once configured, the device automatically scans channels 1-13 in the 2.4GHz band, captures packets and sends them to a remote IP address over Ethernet for further processing. The encapsulation is compatible with Wireshark, so you can use it to analyze Wi-Fi traffic.
The concept of cryptocurrencies is built from forgotten ideas in research literature.
Arvind Narayanan and Jeremy Clark
If you've read about bitcoin in the press and have some familiarity with academic research in the field of cryptography, you might reasonably come away with the following impression: Several decades' worth of research on digital cash, beginning with David Chaum,10,12 did not lead to commercial success because it required a centralized, banklike server controlling the system, and no banks wanted to sign on. Along came bitcoin, a radically different proposal for a decentralized cryptocurrency that didn't need the banks, and digital cash finally succeeded. Its inventor, the mysterious Satoshi Nakamoto, was an academic outsider, and bitcoin bears no resemblance to earlier academic proposals.
This article challenges that view by showing that nearly all of the technical components of bitcoin originated in the academic literature of the 1980s and '90s (see figure 1). This is not to diminish Nakamoto's achievement but to point out that he stood on the shoulders of giants. Indeed, by tracing the origins of the ideas in bitcoin, we can zero in on Nakamoto's true leap of insight—the specific, complex way in which the underlying components are put together. This helps explain why bitcoin took so long to be invented. Readers already familiar with how bitcoin works may gain a deeper understanding from this historical presentation. (For an introduction, see Bitcoin and Cryptocurrency Technologies by Arvind Narayanan et al.36) Bitcoin's intellectual history also serves as a case study demonstrating the relationships among academia, outside researchers, and practitioners, and offers lessons on how these groups can benefit from one another.
The Ledger
If you have a secure ledger, the process to leverage it into a digital payment system is straightforward. For example, if Alice sends Bob $100 by PayPal, then PayPal debits $100 from Alice's account and credits $100 to Bob's account. This is also roughly what happens in traditional banking, although the absence of a single ledger shared between banks complicates things.
This idea of a ledger is the starting point for understanding bitcoin. It is a place to record all transactions that happen in the system, and it is open to and trusted by all system participants. Bitcoin converts this system for recording payments into a currency. Whereas in banking, an account balance represents cash that can be demanded from the bank, what does a unit of bitcoin represent? For now, assume that what is being transacted holds value inherently.
How can you build a ledger for use in an environment like the Internet where participants may not trust each other? Let's start with the easy part: the choice of data structure. There are a few desirable properties. The ledger should be immutable or, more precisely, append only: you should be able to add new transactions but not remove, modify, or reorder existing ones. There should also be a way to obtain a succinct cryptographic digest of the state of the ledger at any time. A digest is a short string that makes it possible to avoid storing the entire ledger, knowing that if the ledger were tampered with in any way, the resulting digest would change, and thus the tampering would be detected. The reason for these properties is that unlike a regular data structure that's stored on a single machine, the ledger is a global data structure collectively maintained by a mutually untrusting set of participants. This contrasts with another approach to decentralizing digital ledgers,7,13,21 in which many participants maintain local ledgers and it is up to the user querying this set of ledgers to resolve any conflicts.
Linked timestamping
Bitcoin's ledger data structure is borrowed, with minimal modifications, from a series of papers by Stuart Haber and Scott Stornetta written between 1990 and 1997 (their 1991 paper had another co-author, Dave Bayer).5,22,23 We know this because Nakamoto says so in his bitcoin white paper.34 Haber and Stornetta's work addressed the problem of document timestamping—they aimed to build a "digital notary" service. For patents, business contracts, and other documents, one may want to establish that the document was created at a certain point in time, and no later. The notion of document is quite general and could be any type of data. They do mention, in passing, financial transactions as a potential application, but it wasn't their focus.
In a simplified version of Haber and Stornetta's proposal, documents are constantly being created and broadcast. The creator of each document asserts a time of creation and signs the document, its timestamp, and the previously broadcast document. This previous document has signed its own predecessor, so the documents form a long chain with pointers backwards in time. An outside user cannot alter a timestamped message since it is signed by the creator, and the creator cannot alter the message without also altering the entire chain of messages that follows. Thus, if you are given a single item in the chain by a trusted source (e.g., another user or a specialized timestamping service), the entire chain up to that point is locked in, immutable, and temporally ordered. Further, if you assume that the system rejects documents with incorrect creation times, you can be reasonably assured that documents are at least as old as they claim to be. At any rate, bitcoin borrows only the data structure from Haber and Stornetta's work and reengineers its security properties with the addition of the proof-of-work scheme described later in this article.
In their follow-up papers, Haber and Stornetta introduced other ideas that make this data structure more effective and efficient (some of which were hinted at in their first paper). First, links between documents can be created using hashes rather than signatures; hashes are simpler and faster to compute. Such links are called hash pointers. Second, instead of threading documents individually—which might be inefficient if many documents are created at approximately the same time—they can be grouped into batches or blocks, with documents in each block having essentially the same timestamp. Third, within each block, documents can be linked together with a binary tree of hash pointers, called a Merkle tree, rather than a linear chain. Incidentally, Josh Benaloh and Michael de Mare independently introduced all three of these ideas in 1991,6 soon after Haber and Stornetta's first paper.
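As a loose illustration of the hash-pointer idea, here is a minimal Python sketch (invented for illustration and heavily simplified: whole blocks of documents are hashed directly, with no signatures, timestamps, or Merkle tree):

import hashlib

def block_hash(documents, prev_hash):
    # Each block commits to its contents and to the hash of the previous block.
    h = hashlib.sha256(prev_hash)
    for doc in documents:
        h.update(hashlib.sha256(doc).digest())
    return h.digest()

genesis = block_hash([b"doc-1", b"doc-2"], prev_hash=b"\x00" * 32)
latest = block_hash([b"doc-3"], prev_hash=genesis)

# Altering any earlier document changes every later hash, so holding only `latest`
# (the digest) is enough to detect tampering anywhere in the chain.
tampered = block_hash([b"doc-1 (altered)", b"doc-2"], prev_hash=b"\x00" * 32)
assert block_hash([b"doc-3"], prev_hash=tampered) != latest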
Merkle trees
Bitcoin uses essentially the data structure in Haber and Stornetta's 1997 paper, shown in simplified form in figure 2 (Nakamoto was presumably unaware of Benaloh and de Mare's work). Of course, in bitcoin, transactions take the place of documents. In each block's Merkle tree, the leaf nodes are transactions, and each internal node essentially consists of two pointers. This data structure has two important properties. First, the hash of the latest block acts as a digest. A change to any of the transactions (leaf nodes) will necessitate changes propagating all the way to the root of the block, and the roots of all following blocks. Thus, if you know the latest hash, you can download the rest of the ledger from an untrusted source and verify that it hasn't changed. A similar argument establishes another important property of the data structure—that is, someone can efficiently prove to you that a particular transaction is included in the ledger. This user would have to send you only a small number of nodes in that transaction's block (this is the point of the Merkle tree), as well as a small amount of information for every following block. The ability to efficiently prove inclusion of transactions is highly desirable for performance and scalability.
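The following Python sketch shows both properties in miniature (illustration only; among other differences, bitcoin hashes everything twice with SHA-256): computing a Merkle root over a block's transactions, and verifying a short inclusion proof against that root.

import hashlib

def H(data): return hashlib.sha256(data).digest()

def merkle_levels(leaves):
    # Build the tree bottom-up; duplicate the last node when a level has odd length.
    levels = [[H(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:
            level = level + [level[-1]]
        levels.append([H(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels

def merkle_root(leaves):
    return merkle_levels(leaves)[-1][0]

def inclusion_proof(leaves, index):
    # Collect the sibling hash at every level, plus which side it sits on.
    proof = []
    for level in merkle_levels(leaves)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify_inclusion(leaf, proof, root):
    node = H(leaf)
    for sibling, sibling_is_left in proof:
        node = H(sibling + node) if sibling_is_left else H(node + sibling)
    return node == root

transactions = [b"tx-a", b"tx-b", b"tx-c", b"tx-d", b"tx-e"]
root = merkle_root(transactions)
assert verify_inclusion(b"tx-c", inclusion_proof(transactions, 2), root)

The proof grows with the logarithm of the number of transactions, which is why inclusion can be shown with only "a small number of nodes" from the block.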
Merkle trees, by the way, are named for Ralph Merkle, a pioneer of asymmetric cryptography who proposed the idea in his 1980 paper.33 His intended application was to produce a digest for a public directory of digital certificates. When a website, for example, presents you with a certificate, it could also present a short proof that the certificate appears in the global directory. You could efficiently verify the proof as long as you know the root hash of the Merkle tree of the certificates in the directory. This idea is ancient by cryptographic standards, but its power has been appreciated only of late. It is at the core of the recently implemented Certificate Transparency system.30 A 2015 paper proposes CONIKS, which applies the idea to directories of public keys for end-to-end encrypted emails.32 Efficient verification of parts of the global state is one of the key functionalities provided by the ledger in Ethereum, a new cryptocurrency.
Bitcoin may be the most well-known real-world instantiation of Haber and Stornetta's data structures, but it is not the first. At least two companies—Surety starting in the mid-'90s and Guardtime starting in 2007—offer document timestamping services. An interesting twist present in both of these services is an idea mentioned by Bayer, Haber, and Stornetta,5 which is to publish Merkle roots periodically in a newspaper by taking out an ad. Figure 3 shows a Merkle root published by Guardtime.
Byzantine fault tolerance
Of course, the requirements for an Internet currency without a central authority are more stringent. A distributed ledger will inevitably have forks, which means that some nodes will think block A is the latest block, while other nodes will think it is block B. This could be because of an adversary trying to disrupt the ledger's operation or simply because of network latency, resulting in blocks occasionally being generated near-simultaneously by different nodes unaware of each other's blocks. Linked timestamping alone is not enough to resolve forks, as was shown by Mike Just in 1998.26
A different research field, fault-tolerant distributed computing, has studied this problem, where it goes by different names, including state replication. A solution to this problem is one that enables a set of nodes to apply the same state transitions in the same order—typically, the precise order does not matter, only that all nodes are consistent. For a digital currency, the state to be replicated is the set of balances, and transactions are state transitions. Early solutions, including Paxos, proposed by Turing Award winner Leslie Lamport in 1989,28,29 consider state replication when communication channels are unreliable and when a minority of nodes may exhibit certain "realistic" faults, such as going offline forever or rebooting and sending outdated messages from when it first went offline. A prolific literature followed with more adverse settings and efficiency tradeoffs.
A related line of work studied the situation where the network is mostly reliable (messages are delivered with bounded delay), but where the definition of "fault" was expanded to handle any deviation from the protocol. Such Byzantine faults include both naturally occurring faults as well as maliciously crafted behaviors. They were first studied in a paper also by Lamport, cowritten with Robert Shostak and Marshall Pease, as early as 1982.27 Much later, in 1999, a landmark paper by Miguel Castro and Barbara Liskov introduced PBFT (practical Byzantine fault tolerance), which accommodated both Byzantine faults and an unreliable network.8 Compared with linked timestamping, the fault-tolerance literature is enormous and includes hundreds of variants and optimizations of Paxos, PBFT, and other seminal protocols.
In his original white paper, Nakamoto does not cite this literature or use its language. He uses some concepts, referring to his protocol as a consensus mechanism and considering faults both in the form of attackers, as well as nodes joining and leaving the network. This is in contrast to his explicit reliance on the literature in linked timestamping (and proof of work, discussed next). When asked in a mailing-list discussion about bitcoin's relation to the Byzantine Generals' Problem (a thought experiment requiring BFT to solve), Nakamoto asserts that the proof-of-work chain solves this problem.35
In the following years, other academics have studied Nakamoto consensus from the perspective of distributed systems. This is still a work in progress. Some show that bitcoin's properties are quite weak,43 while others argue that the BFT perspective doesn't do justice to bitcoin's consistency properties.40 Another approach is to define variants of well-studied properties and prove that bitcoin satisfies them.19 Recently these definitions were substantially sharpened to provide a more standard consistency definition that holds under more realistic assumptions about message delivery.37 All of this work, however, makes assumptions about "honest," i.e., protocol-compliant, behavior among a subset of participants, whereas Nakamoto suggests that honest behavior need not be blindly assumed, because it is incentivized. A richer analysis of Nakamoto consensus accounting for the role of incentives doesn't fit cleanly into past models of fault-tolerant systems.
Proof of Work
Virtually all fault-tolerant systems assume that a strict majority or supermajority (e.g., more than half or two-thirds) of nodes in the system are both honest and reliable. In an open peer-to-peer network, there is no registration of nodes, and they freely join and leave. Thus an adversary can create enough Sybils, or sockpuppet nodes, to overcome the consensus guarantees of the system. The Sybil attack was formalized in 2002 by John Douceur,14 who turned to a cryptographic construction called proof of work to mitigate it.
The origins
To understand proof of work, let's turn to its origins. The first proposal that would be called proof of work today was created in 1992 by Cynthia Dwork and Moni Naor.15 Their goal was to deter spam. Note that spam, Sybil attacks, and denial of service are all roughly similar problems in which the adversary amplifies its influence in the network compared to regular users; proof of work is applicable as a defense against all three. In Dwork and Naor's design, email recipients would process only those emails that were accompanied by proof that the sender had performed a moderate amount of computational work—hence, "proof of work." Computing the proof would take perhaps a few seconds on a regular computer. Thus, it would pose no difficulty for regular users, but a spammer wishing to send a million emails would require several weeks, using equivalent hardware.
Note that the proof-of-work instance (also called a puzzle) has to be specific to the email, as well as to the recipient. Otherwise, a spammer would be able to send multiple messages to the same recipient (or the same message to multiple recipients) for the cost of one message to one recipient. The second crucial property is that it should pose minimal computational burden on the recipient; puzzle solutions should be trivial to verify, regardless of how hard they are to compute. Additionally, Dwork and Naor considered functions with a trapdoor, a secret known to a central authority that would allow the authority to solve the puzzles without doing the work. One possible application of a trapdoor would be for the authority to approve posting to mailing lists without incurring a cost. Dwork and Naor's proposal consisted of three candidate puzzles meeting their properties, and it kicked off a whole research field, to which we'll return.
Hashcash
A very similar idea called hashcash was independently invented in 1997 by Adam Back, a postdoctoral researcher at the time who was part of the cypherpunk community. Cypherpunks were activists who opposed the power of governments and centralized institutions, and sought to create social and political change through cryptography. Back was practically oriented: he released hashcash first as software,2 and five years later in 2002 released an Internet draft (a standardization document) and a paper.4
Hashcash is much simpler than Dwork and Naor's idea: it has no trapdoor and no central authority, and it uses only hash functions instead of digital signatures. It is based on a simple principle: a hash function behaves as a random function for some practical purposes, which means that the only way to find an input that hashes to a particular output is to try various inputs until one produces the desired output. Further, the only way to find an input that hashes into an arbitrary set of outputs is again to try hashing different inputs one by one. So, if I challenged you to find an input whose (binary) hash value begins with 10 zeros, you would have to try numerous inputs, and you would find that each output had a 1/2^10 chance of beginning with 10 zeros, which means that you would have to try on the order of 2^10 inputs, or approximately 1,000 hash computations.
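A hashcash-style puzzle can be sketched in a few lines of Python (illustration only; Back's real stamp format and parameters differ). Minting a token takes many hash attempts, while checking one takes a single hash:

import hashlib
from itertools import count

def leading_zero_bits(digest):
    bits = bin(int.from_bytes(digest, "big"))[2:].zfill(len(digest) * 8)
    return len(bits) - len(bits.lstrip("0"))

def mint(challenge, difficulty=10):
    # Try counters until the hash of challenge+counter starts with `difficulty` zero bits.
    for counter in count():
        token = challenge + b":" + str(counter).encode()
        if leading_zero_bits(hashlib.sha256(token).digest()) >= difficulty:
            return token

def check(token, difficulty=10):
    # Verification is a single hash, no matter how much work minting took.
    return leading_zero_bits(hashlib.sha256(token).digest()) >= difficulty

stamp = mint(b"to=bob@example.com")   # takes on the order of 2**10 hash attempts
assert check(stamp)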
As the name suggests, in hashcash Back viewed proof of work as a form of cash. On his web page he positioned it as an alternative to David Chaum's DigiCash, which was a system that issued untraceable digital cash from a bank to a user.3 He even made compromises to the technical design to make it appear more cashlike. Later, Back made comments suggesting that bitcoin was a straightforward extension of hashcash. Hashcash is simply not cash, however, because it has no protection against double spending. Hashcash tokens cannot be exchanged among peers.
Meanwhile, in the academic scene, researchers found many applications for proof of work besides spam, such as preventing denial-of-service attacks,25 ensuring the integrity of web analytics,17 and rate-limiting password guessing online.38 Incidentally, the term proof of work was coined only in 1999 in a paper by Markus Jakobsson and Ari Juels, which also includes a nice survey of the work up until that point.24 It is worth noting that these researchers seem to have been unaware of hashcash but independently started to converge on hash-based proof of work, which was introduced in papers by Eran Gabber et al.18 and by Juels and Brainard.25 (Many of the terms used throughout this paragraph didn't become standard terminology until long after the papers in question were published.)
Proof of work and digital cash: A catch-22
You may know that proof of work did not succeed in its original application as an anti-spam measure. One possible reason is the dramatic difference in the puzzle-solving speed of different devices. That means spammers will be able to make a small investment in custom hardware to increase their spam rate by orders of magnitude. In economics, the natural response to an asymmetry in the cost of production is trade—that is, a market for proof-of-work solutions. But this presents a catch-22, because that would require a working digital currency. Indeed, the lack of such a currency is a major part of the motivation for proof of work in the first place. One crude solution to this problem is to declare puzzle solutions to be cash, as hashcash tries to do.
More coherent approaches to treating puzzle solutions as cash are found in two essays that preceded bitcoin, describing ideas called b-money13 and bit gold42 respectively. These proposals offer timestamping services that sign off on the creation (through proof of work) of money, and once money is created, they sign off on transfers. If disagreement about the ledger occurs among the servers or nodes, however, there isn't a clear way to resolve it. Letting the majority decide seems to be implicit in both authors' writings, but because of the Sybil problem, these mechanisms aren't very secure, unless there is a gatekeeper who controls entry into the network or Sybil resistance is itself achieved with proof of work.
Putting it all together
Understanding all these predecessors that contain pieces of bitcoin's design leads to an appreciation of the true genius of Nakamoto's innovation. In bitcoin, for the first time, puzzle solutions don't constitute cash by themselves. Instead, they are merely used to secure the ledger. Solving proof of work is performed by specialized entities called miners (although Nakamoto underestimated just how specialized mining would become).
Miners are constantly in a race with each other to find the next puzzle solution; each miner solves a slightly different variant of the puzzle so that the chance of success is proportional to the fraction of global mining power that the miner controls. A miner who solves a puzzle gets to contribute the next batch, or block, of transactions to the ledger, which is based on linked timestamping. In exchange for the service of maintaining the ledger, a miner who contributes a block is rewarded with newly minted units of the currency. With high likelihood, if a miner contributes an invalid transaction or block, it will be rejected by the majority of other miners who contribute the following blocks, and this will also invalidate the block reward for the bad block. In this way, because of the monetary incentives, miners ensure each other's compliance with the protocol.
Bitcoin neatly avoids the double-spending problem plaguing proof-of-work-as-cash schemes because it eschews puzzle solutions themselves having value. In fact, puzzle solutions are twice decoupled from economic value: the amount of work required to produce a block is a floating parameter (proportional to the global mining power), and further, the number of bitcoins issued per block is not fixed either. The block reward (which is how new bitcoins are minted) is set to halve every four years (in 2017, the reward is 12.5 bitcoins/block, down from 50 bitcoins/block). Bitcoin incorporates an additional reward scheme—namely, senders of transactions paying miners for the service of including the transaction in their blocks. It is expected that the market will determine transaction fees and miners' rewards.
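To make the decoupling concrete, here is a small sketch of the halving schedule mentioned above. The 210,000-block interval (roughly four years at one block every ten minutes) is bitcoin's actual parameter, though it is not stated in the text, and floating point is used purely for illustration:

def block_reward(height, initial=50.0, halving_interval=210_000):
    # The reward halves once per interval; it does not depend on how hard the puzzles were.
    return initial / (2 ** (height // halving_interval))

assert block_reward(0) == 50.0         # the original reward
assert block_reward(210_000) == 25.0   # after the first halving (2012)
assert block_reward(420_000) == 12.5   # after the second halving (2016), the figure cited above for 2017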
Nakamoto's genius, then, wasn't any of the individual components of bitcoin, but rather the intricate way in which they fit together to breathe life into the system. The timestamping and Byzantine agreement researchers didn't hit upon the idea of incentivizing nodes to be honest, nor, until 2005, of using proof of work to do away with identities. Conversely, the authors of hashcash, b-money, and bit gold didn't incorporate the idea of a consensus algorithm to prevent double spending. In bitcoin, a secure ledger is necessary to prevent double spending and thus ensure that the currency has value. A valuable currency is necessary to reward miners. In turn, strength of mining power is necessary to secure the ledger. Without it, an adversary could amass more than 50 percent of the global mining power and thereby be able to generate blocks faster than the rest of the network, double-spend transactions, and effectively rewrite history, overrunning the system. Thus, bitcoin is bootstrapped, with a circular dependence among these three components. Nakamoto's challenge was not just the design, but also convincing the initial community of users and miners to take a leap together into the unknown—back when a pizza cost 10,000 bitcoins and the network's mining power was less than a trillionth of what it is today.
Public keys as identities
This article began with the understanding that a secure ledger makes creating digital currency straightforward. Let's revisit this claim. When Alice wishes to pay Bob, she broadcasts the transaction to all bitcoin nodes. A transaction is simply a string: a statement encoding Alice's wish to pay Bob some value, signed by her. The eventual inclusion of this signed statement into the ledger by miners is what makes the transaction real. Note that this doesn't require Bob's participation in any way. But let's focus on what's not in the transaction: conspicuously absent are Alice and Bob's identities; instead, the transaction contains only their respective public keys. This is an important concept in bitcoin: public keys are the only kinds of identities in the system. Transactions transfer value from and to public keys, which are called addresses.
In order to "speak for" an identity, you must know the corresponding secret key. You can create a new identity at any time by generating a new key pair, with no central authority or registry. You don't need to obtain a user name or inform others that you have picked a particular name. This is the notion of decentralized identity management. Bitcoin doesn't specify how Alice tells Bob what her pseudonym is—that is external to the system.
Although radically different from most other payment systems today, these ideas are quite old, dating back to David Chaum, the father of digital cash. In fact, Chaum also made seminal contributions to anonymity networks, and it is in this context that he invented this idea. In his 1981 paper, "Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms,"9 he states: "A digital pseudonym is a public key used to verify signatures made by the anonymous holder of the corresponding private key."
Now, having message recipients be known only by a public key presents an obvious problem: there is no way to route the message to the right computer. This leads to a massive inefficiency in Chaum's proposal, which can be traded off against the level of anonymity but not eliminated. Bitcoin is similarly exceedingly inefficient compared with centralized payment systems: the ledger containing every transaction is maintained by every node in the system. Bitcoin incurs this inefficiency for security reasons anyway, and thus achieves pseudonymity (i.e., public keys as identities) "for free." Chaum took these ideas much further in a 1985 paper,11 where he presents a vision of privacy-preserving e-commerce based on pervasive pseudonyms, as well as "blind signatures," the key technical idea behind his digital cash.
The public-keys-as-identities idea is also seen in b-money and bit gold, the two precursor essays to bitcoin discussed earlier. However, much of the work that built on Chaum's foundation, as well as Chaum's own later work on ecash, moved away from this idea. The cypherpunks were keenly interested in privacy-preserving communication and commerce, and they embraced pseudonyms, which they called nyms. But to them, nyms weren't mere cryptographic identities (i.e., public keys), but rather, usually email addresses that were linked to public keys. Similarly, Ian Goldberg's dissertation, which became the basis of much future work on anonymous communication, recognizes Chaum's idea but suggests that nyms should be human-memorable nicknames with certificates to bind them.20 Thus Bitcoin proved to be the most successful instantiation of Chaum's idea.
The Blockchain
So far, this article has not addressed the blockchain, which, if you believe the hype, is bitcoin's main invention. It might come as a surprise to you that Nakamoto doesn't mention that term at all. In fact, the term blockchain has no standard technical definition but is a loose umbrella term used by various parties to refer to systems that bear varying levels of resemblance to bitcoin and its ledger.
Discussing example applications that benefit from a blockchain will help clarify the different uses of the term. First, consider a database backend for transactions among a consortium of banks, where transactions are netted at the end of each day and accounts are settled by the central bank. Such a system has a small number of well-identified parties, so Nakamoto consensus would be overkill. An on-blockchain currency is not needed either, as the accounts are denominated in traditional currency. Linked timestamping, on the other hand, would clearly be useful, at least to ensure a consistent global ordering of transactions in the face of network latency. State replication would also be useful: a bank would know that its local copy of the data is identical to what the central bank will use to settle its account. This frees banks from the expensive reconciliation process they must currently perform.
Second, consider an asset-management application such as a registry of documents that tracks ownership of financial securities, or real estate, or any other asset. Using a blockchain would increase interoperability and decrease barriers to entry. We want a secure, global registry of documents, and ideally one that allows public participation. This is essentially what the timestamping services of the 1990s and 2000s sought to provide. Public blockchains offer a particularly effective way to achieve this today (the data itself may be stored off-chain, with only the metadata stored on-chain). Other applications also benefit from a timestamping or "public bulletin board" abstraction, most notably electronic voting.
Let's build on the asset-management example. Suppose you want to execute trades of assets via the blockchain, and not merely record them there. This is possible if the asset is issued digitally on the blockchain itself, and if the blockchain supports smart contracts. In this instance, smart contracts solve the "fair exchange" problem of ensuring that payment is made if and only if the asset is transferred. More generally, smart contracts can encode complex business logic, provided that all necessary input data (assets, their prices, and so on) are represented on the blockchain.
This mapping of blockchain properties to applications allows us not only to appreciate their potential, but also to inject a much-needed dose of skepticism. First, many proposed applications of blockchains, especially in banking, don't use Nakamoto consensus. Rather, they use the ledger data structure and Byzantine agreement, which, as shown, date to the '90s. This belies the claim that blockchains are a new and revolutionary technology. Instead, the buzz around blockchains has helped banks initiate collective action to deploy shared-ledger technology, like the parable of "stone soup." Bitcoin has also served as a highly visible proof of concept that the decentralized ledger works, and the Bitcoin Core project has provided a convenient code base that can be adapted as necessary.
Second, blockchains are frequently presented as more secure than traditional registries—a misleading claim. To see why, the overall stability of the system or platform must be separated from endpoint security—that is, the security of users and devices. True, the systemic risk of blockchains may be lower than that of many centralized institutions, but the endpoint-security risk of blockchains is far worse than the corresponding risk of traditional institutions. Blockchain transactions are near-instant, irreversible, and, in public blockchains, anonymous by design. With a blockchain-based stock registry, if a user (or broker or agent) loses control of his or her private keys—which takes nothing more than losing a phone or getting malware on a computer—the user loses his or her assets. The extraordinary history of bitcoin hacks, thefts, and scams doesn't inspire much confidence—according to one estimate, at least six percent of bitcoins in circulation have been stolen at least once.39
Concluding Lessons
The history described here offers rich (and complementary) lessons for practitioners and academics. Practitioners should be skeptical of claims of revolutionary technology. As shown here, most of the ideas in bitcoin that have generated excitement in the enterprise, such as distributed ledgers and Byzantine agreement, actually date back 20 years or more. Recognize that your problem may not require any breakthroughs—there may be long-forgotten solutions in research papers.
Academia seems to have the opposite problem, at least in this instance: a resistance to radical, extrinsic ideas. The bitcoin white paper, despite the pedigree of many of its ideas, was more novel than most academic research. Moreover, Nakamoto didn't care for academic peer review and didn't fully connect the white paper to its history. As a result, academics essentially ignored bitcoin for several years. Many academic communities informally argued that bitcoin couldn't work, based on theoretical models or experiences with past systems, despite the fact that it was working in practice.
We've seen repeatedly that ideas in the research literature can be gradually forgotten or lie unappreciated, especially if they are ahead of their time, even in popular areas of research. Both practitioners and academics would do well to revisit old ideas to glean insights for present systems. Bitcoin was unusual and successful not because it was on the cutting edge of research on any of its components, but because it combined old ideas from many previously unrelated fields. This is not easy to do, as it requires bridging disparate terminology, assumptions, etc., but it is a valuable blueprint for innovation.
Practitioners would benefit from being able to identify overhyped technology. Some indicators of hype: difficulty identifying the technical innovation; difficulty pinning down the meaning of supposedly technical terms, because of companies eager to attach their own products to the bandwagon; difficulty identifying the problem that is being solved; and finally, claims of technology solving social problems or creating economic/political upheaval.
In contrast, academia has difficulty selling its inventions. For example, it's unfortunate that the original proof-of-work researchers get no credit for bitcoin, possibly because the work wasn't well known outside academic circles. Activities such as releasing code and working with practitioners are not adequately rewarded in academia. In fact, the original branch of the academic proof-of-work literature continues today without acknowledging the existence of bitcoin! Engaging with the real world not only helps get credit, but will also reduce reinvention and is a source of fresh ideas.
Sidebars
Sybil-resistant networks
In his paper on Sybil attacks, John Douceur proposed that all nodes participating in a BFT protocol be required to solve hashcash puzzles. If a node were masquerading as N nodes, it would be unable to solve N puzzles in time, and the fake identities would be purged. A malicious node, however, could still obtain a moderate advantage over an honest node that claimed only a single identity. A follow-up paper in 20051 suggested that honest nodes should instead mimic the behavior of malicious nodes and claim as many virtual identities as they computationally can afford to claim. With these virtual identities executing a BFT protocol, the assumption, "At most a fraction f of nodes are faulty" can be replaced with the assumption "The fraction of total computational power controlled by faulty nodes is at most f." Thus, it is no longer necessary to validate identities, and open peer-to-peer networks can run a BFT protocol. Bitcoin uses exactly this idea. But Nakamoto asks a further question: What motivates nodes to perform computationally expensive proof of work? The answer requires a further leap: digital currency.
Smart contracts
A smart contract takes the idea of putting data in a secure ledger and extends it to computation. In other words, it is a consensus protocol for the correct execution of a publicly specified program. Users can invoke functions in these smart-contract programs, subject to any restrictions specified by the program, and the function code is executed in tandem by the miners. Users can trust the output without having to redo the computation and can write their own programs to act on the output of other programs. Smart contracts are especially powerful when combined with a cryptocurrency platform, because the programs in question can handle money—own it, transfer it, destroy it, and, in some cases, even print it.
Bitcoin implements a restrictive programming language for smart contracts. A "standard" transaction (i.e., one that moves currency from one address to another) is specified as a short script in this language. Ethereum offers a more permissive and powerful language.
The idea of smart contracts was proposed by Nick Szabo in 199441 and so named because he saw them as analogs of legal contracts, but with automated enforcement. (This view has been critiqued by Karen Levy31 and Ed Felten.16) Presciently, Szabo presented smart contracts as extensions of digital-cash protocols and recognized that Byzantine agreement and digital signatures (among others) could be used as building blocks. The success of cryptocurrencies has made smart contracts practical, and research on the topic has bloomed as well. For example, programming languages researchers have adapted their methods and tools to automatically discover bugs in smart contracts and to write verifiably correct ones.
Permissioned blockchains
While this article has emphasized that private or permissioned blockchains omit most of bitcoin's innovations, this isn't meant to diminish the interesting work happening in this space. A permissioned blockchain places restrictions on who can join the network, write transactions, or mine blocks. In particular, if miners are restricted to a list of trustworthy participants, the proof of work can be dropped in favor of a more traditional BFT approach. Thus, much of the research is a rebirth of BFT that asks questions such as: Can we use hash trees to simplify consensus algorithms? What if the network can fail only in certain ways?
Further, there are important considerations around identity and public-key infrastructure, access control, and confidentiality of the data stored on the blockchain. These issues largely don't arise in public blockchain settings, nor are they studied in the traditional BFT literature.
Finally, there is also the engineering work of scaling blockchains for high throughput and adapting them to various applications such as supply-chain management and financial technology.
Acknowledgements
The authors are grateful to Adam Back, Andrew Miller, Edward Felten, Harry Kalodner, Ian Goldberg, Ian Grigg, Joseph Bonneau, Malte Möser, Mike Just, Neha Narula, Steven Goldfeder, and Stuart Haber for valuable feedback on a draft.
8. Castro, M., Liskov, B. 1999. Practical Byzantine fault tolerance. Proceedings of the Third Symposium on Operating Systems Design and Implementation; http://pmg.csail.mit.edu/papers/osdi99.pdf.
9. Chaum, D. 1981. Untraceable electronic mail, return addresses, and digital pseudonyms. Communications of the ACM 24(2): 84-90; https://dl.acm.org/citation.cfm?id=358563.
10. Chaum, D. 1983. Blind signatures for untraceable payments. Advances in Cryptology: 199-203.
11. Chaum, D. 1985. Security without identification: transaction systems to make Big Brother obsolete. Communications of the ACM 28(10): 1030-1044; https://dl.acm.org/citation.cfm?id=4373.
19. Garay, J. A., et al. 2015. The bitcoin backbone protocol: analysis and applications. Advances in Cryptology: 281-310; https://eprint.iacr.org/2014/765.pdf.
23. Haber, S., Stornetta, W. S. 1997. Secure names for bit-strings. In Proceedings of the 4th ACM Conference on Computer and Communications Security: 28-35; http://dl.acm.org/citation.cfm?id=266430.
27. Lamport, L., et al. 1982. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems 4(3): 382-401; https://dl.acm.org/citation.cfm?id=357176.
31. Levy, K. E. C. 2017. Book-smart, not street-smart: blockchain-based smart contracts and the social workings of law. Engaging Science, Technology, and Society 3: 1-15; http://estsjournal.org/article/view/107.
36. Narayanan, A., et al. 2016. Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction. Princeton University Press; http://bitcoinbook.cs.princeton.edu/.
38. Pinkas, B., Sander, T. 2002. Securing passwords against dictionary attacks. Proceedings of the Ninth ACM Conference on Computer and Communications Security: 161-170; https://dl.acm.org/citation.cfm?id=586133.
43. Wattenhofer, R. 2016. The Science of the Blockchain. Inverted Forest Publishing.
44. Rivest, R. L., Shamir, A. 1996. PayWord and MicroMint: Two simple micropayment schemes. International Workshop on Security Protocols.
Arvind Narayanan is an assistant professor of computer science at Princeton. He leads the Princeton Web Transparency and Accountability Project to uncover how companies collect and use our personal information. Narayanan also leads a research team investigating the security, anonymity, and stability of cryptocurrencies, as well as novel applications of blockchains. He co-created a massive open online course, and a textbook on bitcoin and cryptocurrency technologies. His doctoral research showed the fundamental limits of de-identification, for which he received the Privacy Enhancing Technologies Award. Narayanan is an affiliated faculty member at the Center for Information Technology Policy at Princeton and an affiliate scholar at Stanford Law School's Center for Internet and Society. You can follow him on Twitter at @random_walker.
Jeremy Clark is an assistant professor at the Concordia Institute for Information Systems Engineering. He obtained his Ph.D. from the University of Waterloo, where his gold medal dissertation was on designing and deploying secure voting systems, including Scantegrity—the first cryptographically verifiable system used in a public-sector election. He wrote one of the earliest academic papers on bitcoin, completed several research projects in the area, and contributed to the first textbook. Beyond research, he has worked with several municipalities on voting technology and testified to the Canadian Senate on bitcoin. You can follow him on Twitter at @PulpSpy.
Introduction
This tutorial attempts to explain a few of the possible methods that
exist on Linux to access the graphics hardware from a low level. I am not
talking about using Xlib instead of GTK+ or QT5, nor am I talking about using
DirectFB, I want to go even lower than that; I'm talking about drawing graphics
to the screen without needing any external dependencies; I'm talking about
communicating directly with the Linux kernel. I will also provide information
about programming for newer graphical systems (Wayland/Mir) even though those
do not involve direct communication with the kernel drivers. The reason I want
to provide this information in this tutorial is that even though their APIs are
higher level, the programming techniques used in low-level graphics programming
can easily be adapted to work with Wayland and Mir. Also, similar to fbdev and
KMS/DRM APIs, good programming resources are hard to come by.
Most Linux systems actually provide a few different methods for drawing
graphics to the screen; there are options. However, the problem is that
documentation is basically non-existent. So, I would like to explain here what
you need to know to get started.
Please note that this tutorial assumes you have a basic knowledge of C,
this is not a beginner tutorial, this is for people who are interested in
something like learning more about how Linux works, or about programming for
embedded systems, or just doing weird experimental stuff for fun.
Methods for Rendering in Linux
The most common graphics architecture in Linux by far is X11. However,
this is not the only way that Linux has to display graphics. In addition, X11
has some disadvantages which are going to be different for everybody. For now,
we are going to say that using xlib or xcb for rendering is too high level. We
want to learn about lower level options, preferably communicating only with the
Linux kernel. There are a few options available, the two main options are the
Linux framebuffer device and DRM. Most other options are not feasible for a
single person to do by themselves. In addition to these kernel options, there
are a few upcoming graphical systems which are interesting in the way that they
work, being programmed similarly to the kernel interfaces and providing a
hopefully easy path to port your low level application into a fully windowed
environment, in particular I am talking about freedesktop.org's Wayland, and
Ubuntu's Mir. Although these systems are not in use yet, they can be compiled
and installed, and you can write applications for them.
Linux Framebuffer Device (fbdev)
The Linux Framebuffer is often talked about, but rarely actually used.
One of the main reasons for this is that documentation is fairly hard to come
by. Like many things, the people that know how to program for the framebuffer
are few and far between, and for some reason, they aren't prone to share their
knowledge. However, it is possible and not too difficult to actually make it
work. So here's a little explanation of how to render to it.
First of all, the Linux kernel must be built with support for the correct
Framebuffer driver. If none is available for your graphics card, you can use
the generic VESA driver, but this requires an additional parameter to be passed
to the kernel at boot time. Other drivers shouldn't require this. If your kernel provides the framebuffer device, it will be called fb0 (or
fb1, fb2, etc.. if there is more than one active driver) and will be located in
/dev. This appears as a file, but in fact it is not a file, it is a file-like
interface to the device driver, part of the Unix Everything-is-a-file concept.
Although it is not a real file, we can still open it (and read to it and write
to it) like a file. This is the very first thing we want to do, open the device
file.
int fb_fd = open("/dev/fb0",O_RDWR);
Once the file is open, we can actually start writing to it. However, this
would not be useful since we don't know the dimensions or color depth of the
screen yet, so we would not be able to accurately draw anything meaningful
inside the buffer. So the next thing that we should do is get some basic
information about the screen. We can do this with the ioctl function (or
syscall). There are two structures defined in linux/fb.h for storing info about
the screen. They are called fb_var_screeninfo and fb_fix_screeninfo. We should
create an instance of each of these structs. (also, remember to include
linux/fb.h)
These structures are defined in linux/fb.h as follows:
struct fb_fix_screeninfo {
char id[16]; /* identification string eg "TT Builtin" */
unsigned long smem_start; /* Start of frame buffer mem */
/* (physical address) */
__u32 smem_len; /* Length of frame buffer mem */
__u32 type; /* see FB_TYPE_* */
__u32 type_aux; /* Interleave for interleaved Planes */
__u32 visual; /* see FB_VISUAL_* */
__u16 xpanstep; /* zero if no hardware panning */
__u16 ypanstep; /* zero if no hardware panning */
__u16 ywrapstep; /* zero if no hardware ywrap */
__u32 line_length; /* length of a line in bytes */
unsigned long mmio_start; /* Start of Memory Mapped I/O */
/* (physical address) */
__u32 mmio_len; /* Length of Memory Mapped I/O */
__u32 accel; /* Indicate to driver which */
/* specific chip/card we have */
__u16 capabilities; /* see FB_CAP_* */
__u16 reserved[2]; /* Reserved for future compatibility */
};
...
struct fb_var_screeninfo {
__u32 xres; /* visible resolution */
__u32 yres;
__u32 xres_virtual; /* virtual resolution */
__u32 yres_virtual;
__u32 xoffset; /* offset from virtual to visible */
__u32 yoffset; /* resolution */
__u32 bits_per_pixel; /* guess what */
__u32 grayscale; /* 0 = color, 1 = grayscale, */
/* >1 = FOURCC */
struct fb_bitfield red; /* bitfield in fb mem if true color, */
struct fb_bitfield green; /* else only length is significant */
struct fb_bitfield blue;
struct fb_bitfield transp; /* transparency */
__u32 nonstd; /* != 0 Non standard pixel format */
__u32 activate; /* see FB_ACTIVATE_* */
__u32 height; /* height of picture in mm */
__u32 width; /* width of picture in mm */
__u32 accel_flags; /* (OBSOLETE) see fb_info.flags */
/* Timing: All values in pixclocks, except pixclock (of course) */
__u32 pixclock; /* pixel clock in ps (pico seconds) */
__u32 left_margin; /* time from sync to picture */
__u32 right_margin; /* time from picture to sync */
__u32 upper_margin; /* time from sync to picture */
__u32 lower_margin;
__u32 hsync_len; /* length of horizontal sync */
__u32 vsync_len; /* length of vertical sync */
__u32 sync; /* see FB_SYNC_* */
__u32 vmode; /* see FB_VMODE_* */
__u32 rotate; /* angle we rotate counter clockwise */
__u32 colorspace; /* colorspace for FOURCC-based modes */
__u32 reserved[4]; /* Reserved for future compatibility */
};
Now that we know all about these structures, we can use ioctl on our open
file descriptor to fill these structures.
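The two structures can simply be declared on the stack; vinfo and finfo are the names used throughout the rest of this tutorial:
struct fb_var_screeninfo vinfo;
struct fb_fix_screeninfo finfo;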
//Get variable screen information
ioctl(fb_fd, FBIOGET_VSCREENINFO, &vinfo);
//Get fixed screen information
ioctl(fb_fd, FBIOGET_FSCREENINFO, &finfo);
Note that the fb_var_screeninfo structure is variable information.
This means, in addition to the FBIOGET_VSCREENINFO, we can also call ioctl with
FBIOPUT_VSCREENINFO to change the settings of the framebuffer. Most
importantly, we probably will want to set the bits_per_pixel field to something
reasonable, since by default it seems to be set to something like 8, and is not
enough to render in color. You might also need to set grayscale to 0, but in
practice, it seems to work even if you don't do that. After that, you should
get it again to make sure that your changes were successful.
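A minimal sketch of that sequence might look like the following (32 bits per pixel is just a reasonable choice here, and it matches the 32-bit pixel values used later in this tutorial):
vinfo.grayscale = 0;
vinfo.bits_per_pixel = 32;
ioctl(fb_fd, FBIOPUT_VSCREENINFO, &vinfo);
//Read the settings back to see what the driver actually accepted
ioctl(fb_fd, FBIOGET_VSCREENINFO, &vinfo);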
Once that is done, we can calculate the total size of the screen (in
bytes). This is important because we will need to map exactly the right amount
of memory, and only draw into that memory, otherwise bad things will happen. To
calculate the size of the screen (the size of the buffer), we can use
vinfo.yres_virtual, which is the number of horizontal lines on the screen,
multiplied by finfo.line_length, the length of each line in bytes.
long screensize = vinfo.yres_virtual * finfo.line_length;
Once we have the size of the screen, we can use mmap to map the buffer to
memory. mmap will return a pointer to the beginning of the memory.
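A sketch of that call, using the fb_fd and screensize variables from above (fbp is the name used for the mapped pointer in the drawing code below; this needs sys/mman.h and stdint.h):
uint8_t *fbp = mmap(0, screensize, PROT_READ | PROT_WRITE, MAP_SHARED, fb_fd, (off_t)0);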
Now, you have your framebuffer mapped to memory. All that is left to do
is to draw on it. This we can do just by setting the memory at the right
location to the correct value of the pixel in the color you want. So next what
we need to do is calculate the correct location in the mapped memory of the
pixel that we want to set. For this, we can use the following algorithm:
long x,y; //location we want to draw the pixel
uint32_t pixel; //The pixel we want to draw at that location
//Make sure you set x,y and pixel correctly (details later)
long location = (x+vinfo.xoffset) * (vinfo.bits_per_pixel/8) + (y+vinfo.yoffset) * finfo.line_length;
*((uint32_t*)(fbp + location)) = pixel;
(y+vinfo.yoffset) * finfo.line_length gets the beginning of
line y in memory, and (x+vinfo.xoffset) * (vinfo.bits_per_pixel/8)
gets the offset of x on that line. All we have to do is add those together and
we have the correct location in memory of the pixel we want to draw. Then we
just set that memory to the pixel we want to draw. So we need to decide what
color we want the pixel to be and then calculate what the value for a pixel of
that color would be. We can use the vinfo structure to figure out the pixel
format required for computing the correct pixel value, specifically the red,
green, and blue fields, which are also structures. I like to write a little
function that takes 8 bit values for each color and returns the pixel value, it
looks like this:
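//A minimal reconstruction of such a helper; the name pixel_color is just
//illustrative, and vinfo is the fb_var_screeninfo filled in earlier
uint32_t pixel_color(uint8_t r, uint8_t g, uint8_t b, struct fb_var_screeninfo *vinfo)
{
	return (r<<vinfo->red.offset) | (g<<vinfo->green.offset) | (b<<vinfo->blue.offset);
}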
This function takes each 8 bit value and shifts it to the left by the correct
offset for that color, then combines it with the other colors using the OR
operator. So if we want to draw a pixel of color 0xFF,0x00,0xFF (purple), it
would take the red value (0xFF), shift it over by the correct offset for red
(probably 16) so the result would be 0x00FF0000, then it would take the green
value (0x00) and shift that to the left (probably 8 bits) and OR those together,
resulting in the same value since green was set to 0, and then take blue (0xFF)
and shift that to the left (probably 0 bits) resulting in 0x000000FF, then OR
that value with the red and green to get the final pixel color of 0x00FF00FF.
So you can see how easy it is to determine the correct pixel value.
Now one important thing you need to remember and be aware of at all times is that
you must never try to draw outside of the screen. This is because your program
has gotten permission to write into that buffer, but if you try to write
outside that buffer, you are essentially trying to modify somebody else's
memory. Linux probably won't allow this and your program will end with a
segfault. This is bad for your program, but if you have set the tty to graphics
mode (I will explain later), then you can cause the whole machine to lock up,
which is a very bad thing to do. To make sure we don't draw outside the screen,
we can use vinfo.xres (the width of the screen in pixels) and vinfo.yres (the height of the screen in pixels). If you never draw
any pixels above vinfo.xres,vinfo.yres or below 0,0, then you should be fine.
Also, you should note that it is actually safe to draw X values past vinfo.xres
since the buffer is just one big block of memory, if you exceed vinfo.xres, you
will actually be drawing on the line below Y. So (1+vinfo.xres),14 is the same
as 1,15. Of course, this is probably not something that you would ever want to
do, so it's probably best to just never draw X greater than vinfo.xres.
Now, as a final example, let's take a look at some code to set up the
framebuffer and then make the entire screen bright purple. The code looks like
this:
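//A minimal example reconstructed from the snippets above; it clears the whole
//visible screen to bright purple
#include <stdint.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/fb.h>

static uint32_t pixel_color(uint8_t r, uint8_t g, uint8_t b, struct fb_var_screeninfo *vinfo)
{
	return (r<<vinfo->red.offset) | (g<<vinfo->green.offset) | (b<<vinfo->blue.offset);
}

int main()
{
	struct fb_var_screeninfo vinfo;
	struct fb_fix_screeninfo finfo;

	//Open the framebuffer device
	int fb_fd = open("/dev/fb0",O_RDWR);

	//Get variable screen information and ask for a 32 bpp true-color mode
	ioctl(fb_fd, FBIOGET_VSCREENINFO, &vinfo);
	vinfo.grayscale = 0;
	vinfo.bits_per_pixel = 32;
	ioctl(fb_fd, FBIOPUT_VSCREENINFO, &vinfo);
	ioctl(fb_fd, FBIOGET_VSCREENINFO, &vinfo);

	//Get fixed screen information
	ioctl(fb_fd, FBIOGET_FSCREENINFO, &finfo);

	long screensize = vinfo.yres_virtual * finfo.line_length;

	//Map the framebuffer into our address space
	uint8_t *fbp = mmap(0, screensize, PROT_READ | PROT_WRITE, MAP_SHARED, fb_fd, (off_t)0);

	//Fill the visible screen with bright purple
	long x, y;
	for (y=0; y<vinfo.yres; y++)
		for (x=0; x<vinfo.xres; x++)
		{
			long location = (x+vinfo.xoffset) * (vinfo.bits_per_pixel/8) + (y+vinfo.yoffset) * finfo.line_length;
			*((uint32_t*)(fbp + location)) = pixel_color(0xFF, 0x00, 0xFF, &vinfo);
		}

	return 0;
}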
Now if you run this code, you will probably see nothing, that doesn't
mean that it's not working, it just means your eyes aren't fast enough. If you
want to see it work, you should add a delay (nanosleep) after rendering.
I have one final thing to explain, as I promised earlier. However, you
need to be aware that this is dangerous and not strictly required. If you use
this and something goes wrong and your program doesn't clean up properly, then
you will lock up your computer (not really, but the screen will stop
responding). You have been warned. What I am talking about is claiming the tty for graphics only. This will
prevent the framebuffer console from trying to draw overtop of your graphics.
You will need to use ioctl again, but this time not on the framebuffer device,
instead you will have to use it on the tty device, probably /dev/tty0. You will
need to call KDSETMODE with KD_GRAPHICS to set up the tty for graphics only,
and then with KD_TEXT to undo it. You MUST set it back to KD_TEXT at
exit, or else. I recommend never using this until you are 100% sure that your
code will not cause a segfault, otherwise you are going to be rebooting your
computer a lot. You could probably also set something up so that you can press
some key combination or type some simple command that runs a program that just
sets KDSETMODE back to KD_TEXT, that's probably easier, but somehow I doubt you
will actually bother. Anyways, the code looks like this:
int tty_fd = open("/dev/tty0", O_RDWR);
ioctl(tty_fd,KDSETMODE,KD_GRAPHICS);
...
//At exit:
ioctl(tty_fd,KDSETMODE,KD_TEXT);
That's all there is to it.
Now that you are all set up to render stuff, you probably want to start
actually drawing stuff besides single pixels or clearing the whole screen, and
you probably want to to do something about the flickering that will occur when
rendering directly to the framebuffer. If that's the case, have a look at the
sections on double buffering and drawing primitives.
Direct Rendering Manager (DRM) Dumb Buffers
Supposedly, fbdev is the "old" way of doing things, and KMS/DRM is the
"new" way. I don't really get what all the fuss is about, I like fbdev for what
it is. KMS/DRM has certain obvious advantages, but that's no reason to throw
away fbdev (fortunately, fbdev will still be around for a long time though). Anyways, KMS/DRM is a much more featureful interface, and gives you a lot
more options (which means it's also a lot more complicated). KMS/DRM offers
much more control over the graphics hardware, which is great if you want to do
really fancy stuff like hardware acceleration. It also has its own mechanism
for double buffering, which is nice. Anyways, let's get to it. DRM has a feature called "dumb buffers" which
is essentially a framebuffer. It's supposedly the easiest to set up, but really
still quite a pain. KMS/DRM is a kernel interface, however, most applications
using KMS/DRM use libdrm, which makes some parts of the process a lot easier.
However, this is a low-level tutorial, and in order to keep this as low-level
as possible, we are going to avoid using any user land library. Fortunately,
it isn't too difficult to bypass libdrm (thankfully everything is open source).
First of all, we need to discover and configure the hardware. For this,
we will use Kernel Mode Setting (KMS). This step is quite tedious, but bear
with me. A lot of this is typically handled by libdrm, but we will not be using
libdrm at all. I learned a lot of this by reading the libdrm source code, I
have simplified a lot of it. The very first thing that needs to be done is to open the DRI device (DRI
stands for Direct Rendering Infrastructure). The DRI device is provided by the
kernel in /dev/dri/card0. This assumes you only have one graphics card of
course, if you have more than one graphics card, they will be called card1,
card2, card3, etc.... However, it is usually safe to assume that you can just
use card0. We can open it with a call to open, just like we did for fbdev.
int dri_fd = open("/dev/dri/card0",O_RDWR);
We will need this file descriptor to do all our communication with the
DRM driver through ioctl calls. We will also need to have some structures and
preprocessor constants, all of which are defined in two header files provided
by the kernel. Include the headers drm/drm.h and drm/drm_mode.h. Do not include
xf86drm.h or xf86drmMode.h, these are part of libdrm, not the kernel interface.
#include <drm/drm.h>
#include <drm/drm_mode.h>
The first thing we need to do is become the "master" of the dri device.
This we can do with the ioctl, DRM_IOCTL_SET_MASTER.
ioctl(dri_fd, DRM_IOCTL_SET_MASTER, 0);
We only need to be "master" to do the actual KMS mode setting, so we can
drop it as soon as we are done with that.
The next thing we need to do is find all the "connectors" that the card
provides. "Connectors" seem to usually correspond to actual hardware connectors
on the graphics card. For instance, I have 3 on my laptop: The internal
connector to the LCD, the external VGA connector and the HDMI connector. To get this we need to create an instance of the structure
drm_mode_card_res. We can ask the kernel to fill this structure with the ioctl,
DRM_IOCTL_MODE_GETRESOURCES. Let's take a quick look at the structure's
definition:
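/* as defined in drm/drm_mode.h; check your kernel's copy for the authoritative version */
struct drm_mode_card_res {
	__u64 fb_id_ptr;
	__u64 crtc_id_ptr;
	__u64 connector_id_ptr;
	__u64 encoder_id_ptr;
	__u32 count_fbs;
	__u32 count_crtcs;
	__u32 count_connectors;
	__u32 count_encoders;
	__u32 min_width, max_width;
	__u32 min_height, max_height;
};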
However, the way that we have to do this is a little funky because we
don't actually know how many connectors there are yet. You may notice that the
structure contains a field called count_connectors, which will be used to
notify our program of how many connectors there are on the card. Then, once we
have this count, we will call ioctl again and get the actual connector ID,
which we can then use to get the actual connector information. First things
first, we need to make sure that the structure is zeroed out, otherwise the
kernel will interpret the ioctl as a request for IDs instead of a request for
the resource counts. Then we use DRM_IOCTL_MODE_GETRESOURCES to fill in the
count fields of the structure. This is where there might be trouble, depending on your application. You
are supposed to allocate enough memory to store the IDs of the resources. I
would rather not allocate memory dynamically if I don't have to, although you
may feel free to do so if it suits your purposes. Instead, I am just going to
allocate more than enough space and assume that it will always be enough. This
might be considered bad form to some people, as it makes assumptions about the
user's hardware and wastes a tiny bit of memory, but I am happy with it. We
should allocate this memory as an array of 64 bit integers, and we will need
four of them since we have four types of resources to deal with (connectors,
encoders, framebuffers, and crtcs). The code will look something like this.
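//This mirrors the corresponding part of the full example at the end of this section
uint64_t res_fb_buf[10]={0},
	res_crtc_buf[10]={0},
	res_conn_buf[10]={0},
	res_enc_buf[10]={0};

struct drm_mode_card_res res={0};

//Get resource counts
ioctl(dri_fd, DRM_IOCTL_MODE_GETRESOURCES, &res);

res.fb_id_ptr=(uint64_t)res_fb_buf;
res.crtc_id_ptr=(uint64_t)res_crtc_buf;
res.connector_id_ptr=(uint64_t)res_conn_buf;
res.encoder_id_ptr=(uint64_t)res_enc_buf;

//Get resource IDs
ioctl(dri_fd, DRM_IOCTL_MODE_GETRESOURCES, &res);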
Note how we filled in the fields fb_id_ptr, crtc_id_ptr, connector_id_ptr, and
encoder_id_ptr with the respective memory addresses of the arrays we created.
Then, the next thing we need to do is iterate through all the connector
IDs we discovered and get the information about them. Once we get enough
information, we can actually do the mode setting to set the card up.
Fortunately, the kernel driver has already provided us with the number of
connectors on the graphics card, so we can set up our loop very easily.
int i;
for (i=0;i<res.count_connectors;i++)
{
Then, we can start to get information about the connector. For this we
will need a structure called drm_mode_get_connector, which is defined like this.
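/* as defined in drm/drm_mode.h; check your kernel's copy for the authoritative version */
struct drm_mode_get_connector {
	__u64 encoders_ptr;
	__u64 modes_ptr;
	__u64 props_ptr;
	__u64 prop_values_ptr;
	__u32 count_modes;
	__u32 count_props;
	__u32 count_encoders;
	__u32 encoder_id; /* Current encoder */
	__u32 connector_id; /* Id */
	__u32 connector_type;
	__u32 connector_type_id;
	__u32 connection;
	__u32 mm_width, mm_height; /* size in millimeters */
	__u32 subpixel;
};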
We will need to use ioctl DRM_IOCTL_MODE_GETCONNECTOR to fill in this
structure. Like DRM_IOCTL_MODE_GETRESOURCES, we will need to use this ioctl
twice, first to get the resource counts, then next to get the resources. It's important to know about the connector's resources, especially what
it calls "modes". "Modes", in this case, are the available resolutions for the
display you are using, starting with the current one. This will tell us how big
we can make our framebuffer. Notice here that the names of the resource fields
do not end in _id_ptr, but just _ptr. This time, the kernel will not fill in the
allocated memory with resource IDs, but with actual resource information, so we
need to actually create an array of structures for the modes_ptr field. This
structure should be of type drm_mode_modeinfo which is defined as follows.
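/* as defined in drm/drm_mode.h; check your kernel's copy for the authoritative version */
struct drm_mode_modeinfo {
	__u32 clock;
	__u16 hdisplay, hsync_start, hsync_end, htotal, hskew;
	__u16 vdisplay, vsync_start, vsync_end, vtotal, vscan;
	__u32 vrefresh;
	__u32 flags;
	__u32 type;
	char name[32]; /* DRM_DISPLAY_MODE_LEN */
};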
The other resource fields can just be unsigned 64 bit integers, we
probably won't even need to look at them, but they do need to exist so that we
don't overwrite some memory somewhere else by accident. We can perform that
ioctl now.
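//This mirrors the corresponding part of the full example at the end of this section
struct drm_mode_modeinfo conn_mode_buf[20]={0};
uint64_t conn_prop_buf[20]={0},
	conn_propval_buf[20]={0},
	conn_enc_buf[20]={0};

struct drm_mode_get_connector conn={0};
conn.connector_id=res_conn_buf[i];

//First call fills in the resource counts
ioctl(dri_fd, DRM_IOCTL_MODE_GETCONNECTOR, &conn);

conn.modes_ptr=(uint64_t)conn_mode_buf;
conn.props_ptr=(uint64_t)conn_prop_buf;
conn.prop_values_ptr=(uint64_t)conn_propval_buf;
conn.encoders_ptr=(uint64_t)conn_enc_buf;

//Second call fills in the resources themselves
ioctl(dri_fd, DRM_IOCTL_MODE_GETCONNECTOR, &conn);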
Now that we have our "modes", we can create our "dumb" framebuffer. We
don't want to try to create any framebuffers for non-connected connectors
though, so we can do a few checks to see if it's valid. First, we can check
if there is at least 1 valid encoder, and one valid mode. We will need both a
valid mode and a valid encoder in order to make use of the connector, so if
there aren't any, then there is no need to mess with that connector further. In
addition, we can check if the connector is "connected", with the connection
field. We can also check if the connector has an active encoder. We could still
use a connector that has no active encoder, but that is a lot of work, and
usually at least one connector will already have an active encoder. So we can
check if there is an
encoder_id.
if (conn.count_encoders<1 || conn.count_modes<1 || !conn.encoder_id || !conn.connection)
continue;
This part is nice and easy compared with KMS, except for one small complication,
which is what to do in the case that more than one connector is active.
One thing we can do is create a framebuffer for each connector, but this could
lead to complications later when we want to draw to them. We might also just
assume that there is only one valid connector, but this is pretty ugly. It is
also possible to use KMS to set up the system with one framebuffer for all the
valid connectors. What you do here is really dependent on what your program
wants to do. For this example, we will create a framebuffer for each connector. So what we have to do is maintain a list of all the created framebuffers
and the mode that you want to set it to. I am going to do this with just three
arrays, all the same size as our maximum assumed number of connectors. The
first will store the base pointer for the framebuffer, and the other two will
hold the width and height for the buffers. It's an ugly solution, but this will
work.
void *fb_base[10];
long fb_w[10];
long fb_h[10];
SIDE NOTE: You may notice that our code is becoming uglier and uglier.
The reason for this is that we are avoiding dealing with some fairly serious
issues. Namely how we should actually choose a mode, how to choose a connector,
what to do with more than one connector, and also avoiding dynamic memory
allocation whenever possible. There are elegant solutions to these problems,
but they are outside the scope of this document. However, the basic outline of the
elegant solution is as follows, check each connector for validity, build a list
of all valid connectors, find a mode valid for all valid connectors, create a
single framebuffer at this resolution, set up all connectors to use this
framebuffer. END SIDE NOTE
Next, let's create our "dumb" framebuffer. We need three structures,
drm_mode_create_dumb, drm_mode_map_dumb, and drm_mode_fb_cmd.
The first two are used to create and map the dumb buffer. The last one is
used to "add" the buffer, basically letting the DRM driver know that we created
it. The structures are defined as follows.
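/* as defined in drm/drm_mode.h; check your kernel's copy for the authoritative version */
struct drm_mode_create_dumb {
	__u32 height;
	__u32 width;
	__u32 bpp;
	__u32 flags;
	/* handle, pitch and size will be returned by the kernel */
	__u32 handle;
	__u32 pitch;
	__u64 size;
};

struct drm_mode_map_dumb {
	__u32 handle; /* Handle for the object being mapped */
	__u32 pad;
	/* Fake offset to use for subsequent mmap call */
	__u64 offset;
};

struct drm_mode_fb_cmd {
	__u32 fb_id;
	__u32 width, height;
	__u32 pitch;
	__u32 bpp;
	__u32 depth;
	/* driver specific handle */
	__u32 handle;
};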
The first thing we need to do is fill out the create structure with a
mode. Then we can call the ioctl, DRM_IOCTL_MODE_CREATE_DUMB. This will fill in
the handle field. This handle is a Graphics Execution Manager (GEM) handle, and
we will need to remember it. Second, we put the handle into the drm_mode_fb_cmd
structure (as well as other fields) and call the ioctl, DRM_IOCTL_MODE_ADDFB.
Then we prepare the framebuffer for mapping by filling in the drm_mode_map_dumb
structure's handle and using the ioctl, DRM_IOCTL_MODE_MAP_DUMB. Then we can
finally map the framebuffer using mmap.
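//This mirrors the corresponding part of the full example at the end of this section
struct drm_mode_create_dumb create_dumb={0};
struct drm_mode_map_dumb map_dumb={0};
struct drm_mode_fb_cmd cmd_dumb={0};

//Create the dumb buffer using the connector's current (first) mode
create_dumb.width = conn_mode_buf[0].hdisplay;
create_dumb.height = conn_mode_buf[0].vdisplay;
create_dumb.bpp = 32;
ioctl(dri_fd, DRM_IOCTL_MODE_CREATE_DUMB, &create_dumb);

//Tell the DRM driver about the new framebuffer
cmd_dumb.width=create_dumb.width;
cmd_dumb.height=create_dumb.height;
cmd_dumb.bpp=create_dumb.bpp;
cmd_dumb.pitch=create_dumb.pitch;
cmd_dumb.depth=24;
cmd_dumb.handle=create_dumb.handle;
ioctl(dri_fd,DRM_IOCTL_MODE_ADDFB,&cmd_dumb);

//Prepare the buffer for mapping, then map it
map_dumb.handle=create_dumb.handle;
ioctl(dri_fd,DRM_IOCTL_MODE_MAP_DUMB,&map_dumb);

fb_base[i] = mmap(0, create_dumb.size, PROT_READ | PROT_WRITE, MAP_SHARED, dri_fd, map_dumb.offset);
fb_w[i]=create_dumb.width;
fb_h[i]=create_dumb.height;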
Note how I also filled out the arrays for the various connectors with the
base pointer to the mapped framebuffer and the corresponding dimensions.
Then we can be done with the dumb buffer creation. We now have our
framebuffer(s) mapped and once we finish our mode setting we can begin drawing
into them.
Back to mode setting. Next we need to get an encoder. In this context, an
encoder is what takes our framebuffer and gets it into the proper format for
"scanout" (that is, displaying it on the screen). Fortunately, our connector
structure has a great field called encoder_id, which is the ID of the active
encoder for that connector. We could search for other encoders, but we could
also just use this one, let's do that. We only have the ID of the encoder, but
we will need to get more information about it. For this, use the
drm_mode_get_encoder structure along with the ioctl, DRM_IOCTL_MODE_GETENCODER.
The structure looks like this.
struct drm_mode_get_encoder {
__u32 encoder_id;
__u32 encoder_type;
__u32 crtc_id; /** Id of crtc */
__u32 possible_crtcs;
__u32 possible_clones;
};
This gives us the ID of the current CRTC for this encoder. A CRTC is a
CRT (Cathode Ray Tube??) Controller. This is what we really need to set up. The
CRTC's structure looks like this.
struct drm_mode_crtc {
__u64 set_connectors_ptr;
__u32 count_connectors;
__u32 crtc_id; /** Id */
__u32 fb_id; /** Id of framebuffer */
__u32 x, y; /** Position on the framebuffer */
__u32 gamma_size;
__u32 mode_valid;
struct drm_mode_modeinfo mode;
};
We can get information about it using the ioctl, DRM_IOCTL_MODE_GETCRTC.
Our code might look like this.
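//This mirrors the corresponding part of the full example at the end of this section
struct drm_mode_get_encoder enc={0};
enc.encoder_id=conn.encoder_id;
ioctl(dri_fd, DRM_IOCTL_MODE_GETENCODER, &enc); //get the connector's current encoder

struct drm_mode_crtc crtc={0};
crtc.crtc_id=enc.crtc_id;
ioctl(dri_fd, DRM_IOCTL_MODE_GETCRTC, &crtc);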
Then, finally, we can set up the CRTC and connect it to our newly created
dumb framebuffer using the ioctl, DRM_IOCTL_MODE_SETCRTC. Take note that
mode_valid must be set to 1, otherwise the DRM driver won't do anything.
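//This mirrors the corresponding part of the full example at the end of this section
crtc.fb_id=cmd_dumb.fb_id;
crtc.set_connectors_ptr=(uint64_t)&res_conn_buf[i];
crtc.count_connectors=1;
crtc.mode=conn_mode_buf[0];
crtc.mode_valid=1;
ioctl(dri_fd, DRM_IOCTL_MODE_SETCRTC, &crtc);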
I set up the connectors_ptr field with sort of a hack here. I just used
the original list, but instead of starting at the beginning, I start at the
current position in the list (the current connector) and hard coded the
connector count to always be one. At this point you can stop being the "master"
of the DRM device as this is no longer required. Use the ioctl,
DRM_IOCTL_DROP_MASTER.
ioctl(dri_fd, DRM_IOCTL_DROP_MASTER, 0);
Then we are finally done with our KMS. We can now draw on our framebuffer
and the DRM driver will deliver that data to the screen. For instance, to clear
the screen to the color purple, we might use the following code.
int i;
for (i=0;i<res.count_connectors;i++)
for (y=0;y<fb_h[i];y++)
for (x=0;x<fb_w[i];x++)
{
int location=y*(fb_w[i]) + x;
*(((uint32_t*)fb_base[i])+location)=0x00ff00ff;
}
That's all there is to it. A full working example follows.
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <drm/drm.h>
#include <drm/drm_mode.h>
int main()
{
//------------------------------------------------------------------------------
//Opening the DRI device
//------------------------------------------------------------------------------
int dri_fd = open("/dev/dri/card0",O_RDWR | O_CLOEXEC);
//------------------------------------------------------------------------------
//Kernel Mode Setting (KMS)
//------------------------------------------------------------------------------
uint64_t res_fb_buf[10]={0},
res_crtc_buf[10]={0},
res_conn_buf[10]={0},
res_enc_buf[10]={0};
struct drm_mode_card_res res={0};
//Become the "master" of the DRI device
ioctl(dri_fd, DRM_IOCTL_SET_MASTER, 0);
//Get resource counts
ioctl(dri_fd, DRM_IOCTL_MODE_GETRESOURCES, &res);
res.fb_id_ptr=(uint64_t)res_fb_buf;
res.crtc_id_ptr=(uint64_t)res_crtc_buf;
res.connector_id_ptr=(uint64_t)res_conn_buf;
res.encoder_id_ptr=(uint64_t)res_enc_buf;
//Get resource IDs
ioctl(dri_fd, DRM_IOCTL_MODE_GETRESOURCES, &res);
printf("fb: %d, crtc: %d, conn: %d, enc: %d\n",res.count_fbs,res.count_crtcs,res.count_connectors,res.count_encoders);
void *fb_base[10];
long fb_w[10];
long fb_h[10];
//Loop though all available connectors
int i;
for (i=0;i<res.count_connectors;i++)
{
struct drm_mode_modeinfo conn_mode_buf[20]={0};
uint64_t conn_prop_buf[20]={0},
conn_propval_buf[20]={0},
conn_enc_buf[20]={0};
struct drm_mode_get_connector conn={0};
conn.connector_id=res_conn_buf[i];
ioctl(dri_fd, DRM_IOCTL_MODE_GETCONNECTOR, &conn); //get connector resource counts
conn.modes_ptr=(uint64_t)conn_mode_buf;
conn.props_ptr=(uint64_t)conn_prop_buf;
conn.prop_values_ptr=(uint64_t)conn_propval_buf;
conn.encoders_ptr=(uint64_t)conn_enc_buf;
ioctl(dri_fd, DRM_IOCTL_MODE_GETCONNECTOR, &conn); //get connector resources
//Check if the connector is OK to use (connected to something)
if (conn.count_encoders<1 || conn.count_modes<1 || !conn.encoder_id || !conn.connection)
{
printf("Not connected\n");
continue;
}
//------------------------------------------------------------------------------
//Creating a dumb buffer
//------------------------------------------------------------------------------
struct drm_mode_create_dumb create_dumb={0};
struct drm_mode_map_dumb map_dumb={0};
struct drm_mode_fb_cmd cmd_dumb={0};
//If we create the buffer later, we can get the size of the screen first.
//This must be a valid mode, so it's probably best to do this after we find
//a valid crtc with modes.
create_dumb.width = conn_mode_buf[0].hdisplay;
create_dumb.height = conn_mode_buf[0].vdisplay;
create_dumb.bpp = 32;
create_dumb.flags = 0;
create_dumb.pitch = 0;
create_dumb.size = 0;
create_dumb.handle = 0;
ioctl(dri_fd, DRM_IOCTL_MODE_CREATE_DUMB, &create_dumb);
cmd_dumb.width=create_dumb.width;
cmd_dumb.height=create_dumb.height;
cmd_dumb.bpp=create_dumb.bpp;
cmd_dumb.pitch=create_dumb.pitch;
cmd_dumb.depth=24;
cmd_dumb.handle=create_dumb.handle;
ioctl(dri_fd,DRM_IOCTL_MODE_ADDFB,&cmd_dumb);
map_dumb.handle=create_dumb.handle;
ioctl(dri_fd,DRM_IOCTL_MODE_MAP_DUMB,&map_dumb);
fb_base[i] = mmap(0, create_dumb.size, PROT_READ | PROT_WRITE, MAP_SHARED, dri_fd, map_dumb.offset);
fb_w[i]=create_dumb.width;
fb_h[i]=create_dumb.height;
//------------------------------------------------------------------------------
//Kernel Mode Setting (KMS)
//------------------------------------------------------------------------------
printf("%d : mode: %d, prop: %d, enc: %d\n",conn.connection,conn.count_modes,conn.count_props,conn.count_encoders);
printf("modes: %dx%d FB: %d\n",conn_mode_buf[0].hdisplay,conn_mode_buf[0].vdisplay,fb_base[i]);
struct drm_mode_get_encoder enc={0};
enc.encoder_id=conn.encoder_id;
ioctl(dri_fd, DRM_IOCTL_MODE_GETENCODER, &enc); //get encoder
struct drm_mode_crtc crtc={0};
crtc.crtc_id=enc.crtc_id;
ioctl(dri_fd, DRM_IOCTL_MODE_GETCRTC, &crtc);
crtc.fb_id=cmd_dumb.fb_id;
crtc.set_connectors_ptr=(uint64_t)&res_conn_buf[i];
crtc.count_connectors=1;
crtc.mode=conn_mode_buf[0];
crtc.mode_valid=1;
ioctl(dri_fd, DRM_IOCTL_MODE_SETCRTC, &crtc);
}
//Stop being the "master" of the DRI device
ioctl(dri_fd, DRM_IOCTL_DROP_MASTER, 0);
int x,y;
for (i=0;i<100;i++)
{
int j;
for (j=0;j<res.count_connectors;j++)
{
int col=(rand()%0x00ffffff)&0x00ff00ff;
for (y=0;y<fb_h[j];y++)
for (x=0;x<fb_w[j];x++)
{
int location=y*(fb_w[j]) + x;
*(((uint32_t*)fb_base[j])+location)=col;
}
}
usleep(100000);
}
return 0;
}
Now you should be able to get graphics drawn on the screen using KMS/DRM
dumb buffers. For further information, you should check out the sections on
double buffering and drawing
primitives.
PLEASE NOTE: This section provides a working example, but it still needs
a significant amount of work to improve it. It shows many less than beautiful
solutions for the problems faced. If you have a better solution, please email
me at prushik@betteros.org and I will update the tutorial (and include your
name). However, solutions must follow the following constraints: no libdrm, no
standard libc functions that do not map 1-1 to Linux system calls, no 3rd party
libraries.
X Server (X11) Direct Connection
Connecting to the X11 server is by far the most common method of
displaying graphics on Linux systems. It has been around a long time and is in
use on virtually every Linux system, it will most likely still be around for
the foreseeable future, even after Wayland and Mir start to see common use.
However, X11 is far from a simple and easy to use protocol. Typically, an
application using X11 for graphics will use a very high level widget library,
such as GTK+ or QT, which in turn use Xlib or XCB to establish connection and
handle communication with the X server. A simpler application might only use
Xlib or XCB if the programmer has enough skill. XCB is currently accepted as
the lowest level method possible of communicating with the X server. However,
I refuse to accept this.
The X11 protocol uses the client-server model for communication. This
means that, if we can open sockets, we can connect to the X server on our own,
without relying on Xlib or XCB to facilitate communication. We will just have
to handle the X11 protocol stuff ourselves. If you were writing an X server,
this would be a very daunting and nearly impossible task given the scope of all
the X server is expected to handle; however, writing a client is a much simpler
task as you will only need to implement the parts of the protocol you need to
work with, and you can ignore all the extensions you aren't interested in. However, it's important to note that Xorg actually allows for multiple
transport methods. The most common of which is actually not network
communication, but it is socket based communication, so the connection
gets established in almost the same way. The first thing you will need to do in order to establish a connection is
to check out the DISPLAY environment variable. The DISPLAY environment variable
will give us the information about which transport method to use and where the
server is located. This variable will only be set on the client, which is what
we are focusing on here. The DISPLAY variable has three parts, which you will
have to be able to parse out. The DISPLAY variable comes in the following form:
host:display.screen
host is the hostname of the server. This field is actually the
one that determines the transport method. If the host is specified and is a
valid, resolvable hostname or an IP address, then TCP is selected as the
transport method. If this field is left blank, or is "unix", then Unix Domain
Sockets are used. On modern desktop systems, DISPLAY is most likely going to be
set to ":0.0", so supporting Unix Domain Sockets is probably going to be very
important. display and screen are the display number and screen number.
Screen is an optional field, if it is left out, the application should assume
0. Multiple screens are hardly ever used, even on setups which actually do have
more than one monitor attached, so ignoring screens might be good enough, it's
up to you. display is mandatory, but it is almost always zero; in fact, it's so
common that some video card drivers actually don't work right if the display
number is anything else. However, it's always best to handle every scenario
as long as performance isn't impeded. If you are unsure of what any of these
terms mean, then you should do some research about X11 and how it works. So now you should be able to see how we can find out the server to
connect to and the transport method using only the DISPLAY variable. Now that
we know this, we can establish our connection. In the case that our server is using TCP as a transport method, then the
address is already in the DISPLAY variable, the TCP port is a little trickier
to figure out though. The typical X11 port is TCP port 6000, however, if the
display set in the DISPLAY variable is greater than zero, the port number the
server will be listening on will be 6000 + display number. So if our DISPLAY
variable is set to "127.0.0.1:14.0", then we would want to connect to the
localhost (127.0.0.1 is localhost) on port 6014. In the more likely case that the server is using Unix Domain Sockets,
then we don't have to worry about the address or hostname to connect to, since
Unix Domain Sockets are only for inter process communication on one machine. If
you aren't familiar with Unix Domain Sockets, I'll explain a little bit about
them. It is a very similar concept to a Linux kernel device file. The server
adds an entry to the file hierarchy, which appears as a file. The client can
then connect to that node using its path and then communicate with the server
by writing into the resulting file descriptor. In the case of X11, the server
creates a node in /tmp/.X11-unix/ named after the display number prefixed by a
capital X. It seems scary to me to have to hard-code paths into my application,
but I checked the XCB sources, and it seems to be exactly what XCB is doing
anyway. Therefore, if DISPLAY is ":14.0", then our client application would
need to connect to /tmp/.X11-unix/X14. We don't have to worry about port
numbers when using Unix Domain Sockets, since there aren't any in this
transport method. Here is an example of how to connect to a Unix Domain Socket for display ":0"
//Create the socket
sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
if (sockfd < 0)
error("Error opening socket\n",21);
serv_addr.sun_family = AF_UNIX;
strcpy(serv_addr.sun_path, "/tmp/.X11-unix/X0");
srv_len = sizeof(struct sockaddr_un);
//Connect to socket
connect(sockfd,(struct sockaddr *)&serv_addr,srv_len);
Once the connection is established, we can start communicating directly
with the server using normal read/write calls, all we have to do is get the
protocol right. If you have ever programmed anything with XCB, then you should
already have a good idea of what needs to be done, since XCB functions are
almost a 1 to 1 mapping of packets sent to the server. One good way to get an
idea of what your packets should look like is to do a network capture of a
simple X11 or XCB program. To do this, make sure your X server supports TCP
communication (wasn't run with -nolisten flag), then set your DISPLAY variable
to "127.0.0.1:0", and then run your packet capture software and the test
program. Basically, we need to do the same things that you would normally do in
XCB. However, before we get into that, we have to finish setting up the
connection. The X11 protocol starts with a little handshake: the client sends
an initial connection request, to which the server responds with an initial
connection reply. The request is a short and simple packet, which basically
only contains 2 pieces of information, the byte order the connection will use,
and the protocol version that will be used. The byte order field is first and it is just a one byte wide field. This
should probably be set to little endian unless you are working with embedded
systems. For little endian, the field should be set to 0x6c which is a lower
case 'l'. For big endian, this should be 0x42, which is an upper case 'B'.
Immediately following the endianness field, is one byte of padding. This can
probably be anything, but set it to zero just in case. Next are two 16 bit wide integer fields specifying the protocol version,
the major version number followed by the minor version number. Since we are
using X11, the major version should just be 11 (0x0b). The minor number should
be 0. This is what all clients I have seen do, and I doubt there is any reason
to set the minor version to anything different. Following that are two more 16 bit wide fields. These are used for an
authorization protocol, which I don't want to get into or deal with, so just
set them both to 0. I have never seen anything else anyway. Finally, following
that, there are another two bytes of unused padding. The request packet's fields can be easily stored in a C structure like the
following:
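//Field names here are just the ones I use; the sizes and order follow the
//protocol fields described above
struct x11_conn_req {
    uint8_t order;          //Byte order: 'l' or 'B'
    uint8_t pad1;           //Unused padding
    uint16_t major, minor;  //Protocol version, 11.0
    uint16_t auth_len;      //Length of the authorization protocol name (0 here)
    uint16_t auth_data_len; //Length of the authorization data (0 here)
    uint16_t pad2;          //Unused padding
};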
That concludes the connection request part. Assuming that the server
accepts your connection, it will finish the handshake with a connection reply.
This reply is generally a massive packet containing a ton of information, which
will not be fun to decode, but it has to be done. The first thing you need to do is read the first 8 bytes. The packet will
be bigger than that, but we don't want to read the rest of it yet because it
may vary in size, and we don't know yet how big it will be. The first 8 bytes
will contain the information that we need to know to get the rest of the
packet. The reply will contain a one byte field called a success code, then one
byte of padding, then two 16 bit wide fields specifying the major and minor
version numbers of the server, followed by a 16 bit field specifying the size
of the remaining data in the packet. The success code will either be 1 for success, or 0 for failure. The
major and minor fields will be 11 and 0 respectively. The length field will be
the size of the remaining data to be read, however, the value given is divided
by 4, so to get the number of bytes left in the packet, you need to multiply
the length field by 4. This data can be read easily into a structure like the following:
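//Again, field names are just what I use; the reply header is 8 bytes in total
struct x11_conn_reply {
    uint8_t success;        //1 on success, 0 on failure
    uint8_t pad;            //Unused
    uint16_t major, minor;  //Server protocol version
    uint16_t length;        //Length of the remaining data, in units of 4 bytes
};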
Then things start to get tricky. There is probably going to be almost
2000 bytes left to read. There are a few more sections left in the packet, but
now that we know the size of the packet, we can read the rest of the packet
into a buffer. The next section we read will be the connection setup section. This data
is a fixed size of 32 bytes, and it fits nicely into a structure like the
following:
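//Field names are mine; the meanings are described below
struct x11_conn_setup {
    uint32_t release;                  //X server release number
    uint32_t id_base, id_mask;         //Resource ID base and mask
    uint32_t motion_buffer_size;
    uint16_t vendor_length;            //Length of the vendor string that follows
    uint16_t request_max;              //Maximum request length
    uint8_t roots;                     //Number of root windows (screens)
    uint8_t formats;                   //Number of pixmap formats
    uint8_t image_order, bitmap_order; //Endianness of images and bitmaps
    uint8_t scanline_unit, scanline_pad;
    uint8_t keycode_min, keycode_max;  //Minimum and maximum key codes
    uint32_t pad;                      //Unused padding
};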
There are some very important fields in this section. The release is the
actual version of the X server running, we can ignore this. The id_base and
id_mask are very important. These determine what numbers we will be using for
resource IDs in our application. The motion_buffer_size is of little importance
for now. The vendor_length is important to remember, even though we probably do
not care who the X server vendor is, nor should any application take any
action based on the vendor, but we will need the length so we know how many
bytes we need to burn after the section and before we get to important stuff.
request_max is the maximum size a request can be. roots is the number of root windows (screens).
formats is the number of available pixmap formats. image_order and bitmap_order
specify endianness. scanline_unit and scanline_pad will each be one of 8, 16, or 32.
keycode_min and keycode_max are the minimum and maximum values possible for key
codes. The padding at the end is just unused padding. Immediately following this section is the vendor name. This is a variable
length string, but we already know its length because the connection setup
section includes the vendor_length field, which will contain the length
of the vendor name string in bytes. I have a feeling that this is never
actually useful, so chances are you can ignore it; you might not even bother
allocating memory for it. After that, we will get a list of all the available pixmap formats. These
are fixed length structures, but there is a variable number of them in the
packet. However, we know how many there will be due to the formats field
in the setup section. The structure for storing these pixmap formats looks like
this:
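//Each pixmap format entry is 8 bytes; field names are mine
struct x11_pixmap_format {
    uint8_t depth;          //Color depth
    uint8_t bpp;            //Bits per pixel
    uint8_t scanline_pad;   //Scanline padding
    uint8_t pad[5];         //40 bits of unused padding
};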
Inside this structure we get the color depth, bits per pixel, scanline
padding, and 40 bits of unused padding. Remember that there will be more than
just one available pixmap format. You will need to remember these if you want
to draw pixmaps on the screen. After the pixmap formats, we move on to the next section: root windows.
If you have ever worked with X in the past, then you probably know what this
means. The root window is the bottom most window, sometimes the one that
displays the desktop. All other windows are a child of the root window. In
order for you to draw your own window on the screen, you will need to know the
ID of the root window. This is where we get it from. It is possible for there to be multiple root windows, since the
protocol allows one per screen, though you will rarely run into more than one
in practice. Anyway, the number of root windows was defined
in the setup section by the roots field. The structure of a root window
looks like this:
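//Based on the protocol's SCREEN description; field names are mine
struct x11_root_window {
    uint32_t id;                  //Root window ID
    uint32_t colormap;            //Default colormap
    uint32_t white, black;        //White and black pixel values
    uint32_t input_mask;          //Currently selected input event mask
    uint16_t width, height;       //Size in pixels
    uint16_t width_mm, height_mm; //Size in millimeters
    uint16_t maps_min, maps_max;  //Minimum and maximum installed colormaps
    uint32_t root_visual;         //Visual ID of the root visual
    uint8_t backing_store;
    uint8_t save_unders;
    uint8_t depth;                //Depth of the root window
    uint8_t depths;               //Number of depth entries that follow
};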
Now, this is another place where things can get complicated
programmatically. Clearly, this is an important part of the handshake to
interpret, however, there is a lot of stuff here. Also, let's not forget that
it is technically possible (according to the protocol) for there to be more
than one root window, which makes things even more complicated when we get into
the depths. There will be multiple depths, and each depth will have a list of
visuals to go with it. The structure for storing depths looks something like
this:
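//Each depth entry is 8 bytes; field names are mine
struct x11_depth {
    uint8_t depth;      //The depth value itself
    uint8_t pad1;       //Unused
    uint16_t visuals;   //Number of visuals that follow for this depth
    uint32_t pad2;      //Unused padding
};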
As you can see, the depth structure is very simple. However, what makes
it complicated is that there will be a list of all valid visuals for that depth
following each entry. The number of visuals for each depth is specified in the
visuals field of the depth structure. A visuals structure looks like this:
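//Based on the protocol's VISUALTYPE description; field names are mine
struct x11_visual {
    uint32_t id;             //Visual ID
    uint8_t type;            //Class (TrueColor, DirectColor, and so on)
    uint8_t bits;            //Bits per RGB value
    uint16_t colormap_entries;
    uint32_t mask_red, mask_green, mask_blue; //Channel masks
    uint32_t pad;            //Unused padding
};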
It is probably helpful to keep all the information gathered from the
handshake in a structure of its own. I like to call it x11_connection, because it
makes sense to associate this information with the X connection.
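One way to lay it out, matching the pointers used in the handshake function below (the struct and field names are mine):
struct x11_connection {
    struct x11_conn_reply header;     //Fixed 8 byte reply header
    struct x11_conn_setup *setup;     //Connection setup section
    struct x11_pixmap_format *format; //First pixmap format
    struct x11_root_window *root;     //First root window
    struct x11_depth *depth;          //First depth for the first root window
    struct x11_visual *visual;        //First visual for the first depth
};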
int x11_handshake(int sock, struct x11_connection *conn)
{
struct x11_conn_req req = {0};
req.order='l'; //Little endian
req.major=11; req.minor=0; //Version 11.0
write(sock,&req,sizeof(struct x11_conn_req)); //Send request
read(sock,&conn->header,sizeof(struct x11_conn_reply)); //Read reply header
if (conn->header.success==0)
return conn->header.success;
conn->setup = sbrk(conn->header.length*4); //Allocate memory for remainder of data
read(sock,conn->setup,conn->header.length*4); //Read remainder of data
void* p = ((void*)conn->setup)+sizeof(struct x11_conn_setup)+((conn->setup->vendor_length+3)&~3); //Skip the vendor string (it is padded to a 4 byte boundary)
conn->format = p; //Align struct with format sections
p += sizeof(struct x11_pixmap_format)*conn->setup->formats; //move pointer to end of section
conn->root = p; //Align struct with root section(s)
p += sizeof(struct x11_root_window); //move pointer to end of section
conn->depth = p; //Align depth struct with first depth section
p += sizeof(struct x11_depth); //move pointer to end of section
conn->visual = p; //Align visual with first visual for first depth
return conn->header.success;
}
Note that this function places the pointers for root, depth, and visual
only on the first instance of each section. That means that root[1] will not
give you correct information about the second root window (if there ever is
one), and depth[1] will only give you the second depth structure for the first
root window, assuming it has more than one. Be aware of this.
As with XCB, the next thing that needs to be done after the connection is
established, is to generate a resource ID so we can set up the graphics
context. This doesn't actually require any communication with the server, all
the information we need was part of the handshake. The connection setup section
included the fields id_base and id_mask. To generate a resource
ID, all we need to do is find an unused value above id_base that fits in the
mask of id_mask. The easiest thing to do is have a function that keeps the
last ID used in an internal state variable, and just increment the value each
time.
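A minimal version might look like this (the function name is mine, and the setup fields come from the handshake above):
uint32_t x11_generate_id(struct x11_connection *conn)
{
    static uint32_t id = 0; //Last ID handed out, kept as internal state
    return (id++ & conn->setup->id_mask) | conn->setup->id_base;
}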
Once we have an ID, things are going to start getting a lot more familiar
for XCB programmers. One thing that we will need to deal with is op codes. In
XCB, to create a graphics context, all you need to do is call the function:
xcb_create_gc. However, we can't do that, since we aren't using XCB and we
don't have any predefined functions. So to get the server to do what we want,
we need to send it the correct op code for the operation we want to perform.
The best way to handle this is to set them up as preprocessor constants, at
least for the core operations (extensions are more complicated). Op codes 0-127
are used for the core protocol and are constant. Op codes above 127 are used
for extensions, and are dynamically assigned. For now, we are only focusing on
the core protocol. I don't really want to list out all the op codes in the core protocol, so
I will just give you the ones that will be used in this tutorial as an example.
A full listing of op codes can be found in the xproto tarball which you can
get from x.org.
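For example, the requests used later in this tutorial could be defined like this (the names are mine; the values are fixed by the core protocol):
#define X11_OP_REQ_CREATE_WINDOW 0x01
#define X11_OP_REQ_MAP_WINDOW    0x08
#define X11_OP_REQ_CREATE_GC     0x37
#define X11_OP_REQ_PUT_IMAGE     0x48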
Then, all we have to do is send a packet with the op code and the correct
arguments for that op code. It is also possible to send multiple requests in
one single packet, simply by concatenating the requests, which is exactly what
XCB does with all requests until xcb_flush is called. There are a few ways to implement this in your application. One might
be similar to how the XCB API works, with a function for each request. Or you
might be able to get away with a variadic function to which you pass your op
code as the first argument. However, not all arguments are of the same size in
the X protocol, so you would need some way to determine this, either based on the
op code passed to it, or to pass a size along with each argument.
Unfortunately, both of those methods of implementing requests via variadic
functions have significant limitations, so I will show each request type as its
own function. For implementing functions other than the ones I will be showing
in this tutorial, a good place to look is in the XCB API, XCB sources, XCB
documentation, and packet captures. One thing to note is that in X11, many requests have a flag field. This
flag field tells the server how many additional fields will follow the flags
field and what will be stored inside them. This means that many X11 requests
are actually variable length. This can be annoying when trying to program
efficiently. Fortunately, each field is either 32 bits wide, or padded out to
32 bits, and there are many algorithms available for counting bits, so we can
determine the size of the request fairly easily. I will include an algorithm
for counting bits in my example; the algorithm is adapted from an algorithm
found on this
page. Ideally, if you are only targeting new enough processors, then you
could write a small ASM function to utilize the POPCNT instruction. However,
my processor (Core 2 Duo) does not support this (SSE4.2). You might also use
__builtin_popcount, assuming your compiler is GCC or Clang and a standard C
library is being used in your application.
//MIT HACKMEM bit counting algorithm
int count_bits(uint32_t n)
{
unsigned int c;
c = n - ((n >> 1) & 0x55555555);
c = ((c >> 2) & 0x33333333) + (c & 0x33333333);
c = ((c >> 4) + c) & 0x0F0F0F0F;
c = ((c >> 8) + c) & 0x00FF00FF;
c = ((c >> 16) + c) & 0x0000FFFF;
return c;
}
Once you have a way to count set bits easily, then we are one step closer
to being able to draw a window. However, we will also need to be able to set
the flags that we want to have set. We could memorize all the numbers for each
flag, or we could use the preprocessor. Love the preprocessor.
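For example, a few of the flags you are likely to need could be defined like this (the names are mine; the values are the bit positions defined by the core protocol):
//Window attribute flags (value mask bits for the create window request)
#define X11_FLAG_WIN_BG_PIXEL   0x00000002
#define X11_FLAG_WIN_EVENT_MASK 0x00000800
//Graphics context flags (value mask bits for the create GC request)
#define X11_FLAG_GC_FOREGROUND  0x00000004
#define X11_FLAG_GC_BACKGROUND  0x00000008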
Once we know the right flags, and we know the right op codes, we just
need to know the right sizes and orders of the arguments before we can start
sending requests. To find the correct sizes and orders of arguments, we can
look either at packet captures or the X11 documentation:
Requests section. Though packet captures are easier to read and understand,
the documentation is more thorough, once you figure out its little nuances. Once you have figured out the right order, you can write a nice function
that will take care of sending out the right data. For creating a graphics
context, your code might look like the following:
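//A sketch of a create GC request; argument and constant names are mine
void x11_create_gc(int sock, struct x11_connection *conn, uint32_t gid,
                   uint32_t target, uint32_t flags, uint32_t *list)
{
    int i;
    uint16_t flag_count = count_bits(flags);  //How many optional values follow
    uint16_t length = 4 + flag_count;         //Request length in 32 bit words
    uint32_t packet[4+32];
    //Op code in the first byte, length in the upper 16 bits (little endian,
    //matching the byte order we chose in the handshake)
    packet[0] = X11_OP_REQ_CREATE_GC | (length<<16);
    packet[1] = gid;     //ID for the new graphics context
    packet[2] = target;  //Drawable this GC will be used with
    packet[3] = flags;   //Value mask
    for (i = 0; i < flag_count; i++) //Append the optional values from the list
        packet[4+i] = list[i];
    write(sock, packet, length*4);
}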
In this function, I pass in both the socket to write to, and the
connection structure. This is not really necessary because the structure is not
actually used in the function. It would also make sense to keep the socket file
descriptor in a field of the connection structure, however, I did not do this
in my code. As you can probably see, all this does is put the arguments in the
correct order and write them to the socket. The one thing to note in this
function is the flags field. There are other ways that this could be set up to
work, but what I have done is passed in a pointer to the start of the option
data. The count_bits function determines how many 32 bit numbers should be read
from the list and adds them to the end of the packet. This is very similar to
what XCB does in the same cases.
The function for creating a window looks very similar:
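//A sketch of a create window request; argument and constant names are mine
void x11_create_window(int sock, struct x11_connection *conn, uint32_t wid,
                       uint32_t parent, uint16_t x, uint16_t y, uint16_t w,
                       uint16_t h, uint32_t visual, uint32_t flags, uint32_t *list)
{
    int i;
    uint16_t flag_count = count_bits(flags);
    uint16_t length = 8 + flag_count;
    uint32_t packet[8+32];
    packet[0] = X11_OP_REQ_CREATE_WINDOW | (length<<16); //Depth byte left as 0 (copy from parent)
    packet[1] = wid;                       //ID for the new window
    packet[2] = parent;                    //Parent window, e.g. the root window
    packet[3] = x | ((uint32_t)y<<16);     //Position, two 16 bit fields in one word
    packet[4] = w | ((uint32_t)h<<16);     //Size, two 16 bit fields in one word
    packet[5] = 1 | (1<<16);               //Border width 1, class 1 (InputOutput)
    packet[6] = visual;                    //Visual, e.g. the root visual from the handshake
    packet[7] = flags;                     //Value mask
    for (i = 0; i < flag_count; i++)
        packet[8+i] = list[i];
    write(sock, packet, length*4);
}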
As you can see, creating a window is just about as simple as creating a
graphics context. It just has a few more parameters. You also have to take note
of the fact that the x,y and w,h arguments are 16 bits wide and not 32 bits.
Mapping a window to the screen is quite a bit simpler than the previous
requests:
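//A sketch of a map window request; the only argument is the window ID
void x11_map_window(int sock, struct x11_connection *conn, uint32_t wid)
{
    uint32_t packet[2];
    packet[0] = X11_OP_REQ_MAP_WINDOW | (2<<16); //Op code and a length of 2 words
    packet[1] = wid;                             //The window to map
    write(sock, packet, 8);
}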
Mapping a window doesn't require a flags argument or a list of optional
arguments. There is only one argument: the window ID. Once the window is mapped, you should be able to see it show up
on the screen. If you create a window with the background flag set and a
background color set up, you can see that color in the background of the
window.
Now, once you have a nice window finally displaying on your screen after
all that work, let's talk about drawing something in the window. Unlike the other methods and protocols described in this tutorial, using
X11 works by communicating with a server (as I REALLY hope you realized
already), as opposed to just writing your graphics into a framebuffer. This
means that any time you want to draw something on the screen, you will have to
actually send that something to the server. X11 gives us two ways of doing
this: we can send an image, or we can instruct the server to draw primitives.
Although this tutorial is more concerned with drawing primitives, I want
to actually talk about drawing images instead. The reason for this is that X11
already includes requests for drawing primitives, which are easy to figure out
and use, and because all of the other graphical display methods in this
tutorial center around drawing onto a framebuffer, and a framebuffer is
essentially just an image already. So therefore, in order to apply the
techniques learned from other graphical display methods in this tutorial, you
will probably want to be drawing images. Also, using only primitives will
impose limitations later when you do decide that you want to start drawing
actual pictures loaded from files. Displaying images is a little bit more difficult than the other requests
we covered already. To display a window, we told the server to create a window
and draw it on the screen, and optionally to fill the window with a color.
However, to display an image, we have to have the image client side and then
transfer that entire image up to the server. This is accomplished with the "put
image" request. The put image request (or x11_put_img as I call it) will send
the image data up to the server and tell the server where to put it. The
request requires a "drawable" argument, which is the location where the server
will put the image. This could be pretty much any drawable on the server. We
could, for example, simply tell the server to jam that image directly onto the
window we created, and it will work fine. However, that would mean that every
time we wanted to do something with that image, we would have to re-upload it
to the server, which is an unacceptable overhead cost in most cases. To
compensate for this, we can create a "pixmap" on the server, then put our image
into that pixmap (which is a drawable), and then map that pixmap onto our
window where and when we want it. That method has the advantage of keeping that
image in server memory so that it can be retrieved at any time without
uploading the data again. Using a pixmap will probably be the preferred method
most of the time. However, sending the image data directly to your window is
simpler, and you should be able to figure out how to create and draw to a
pixmap on your own, since it is very similar to other things that we have
already done in this tutorial. So I will only show how to draw directly on the
created window.
Basically, sending an image is the same as any other request, it just
includes one really big field at the end containing the image data. To send an
image to the server, you should use the "put image" request. This request needs
a drawable (where to draw it) and a GC. In addition, it also requires a height,
width, and x and y. The height and width are the size of the image; these are
important because, in conjunction with the depth, they let the server calculate
how big the image data will be. In order for the server to accept and draw the
image, the depth must be the same as the GC used to draw, otherwise the X
server will send an error message and draw nothing. The size must also be
correct. To calculate the size, simply multiply the height and width of the
image, then multiply by the depth in bits, rounded up to its unit boundary
(depth 24 rounds to 32, depth 16 to 16, depth 8 to 8, and depths below 8 round up to 8). Following the
normal fields in the request is one large data field, which contains the image
data. Implementing this function might look something like this:
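//A sketch of a put image request; argument and constant names are mine, and a
//32 bit per pixel image (ZPixmap format) small enough for one request is assumed
void x11_put_img(int sock, struct x11_connection *conn, uint32_t gc, uint32_t target,
                 uint16_t x, uint16_t y, uint16_t w, uint16_t h, uint8_t depth,
                 uint32_t *data)
{
    uint32_t data_words = (uint32_t)w * h;    //One 32 bit word per pixel
    uint16_t length = 6 + data_words;         //Only 16 bits wide, so this assumes a small image
    uint32_t packet[6];
    packet[0] = X11_OP_REQ_PUT_IMAGE | (2<<8) | ((uint32_t)length<<16); //Format 2 = ZPixmap
    packet[1] = target;                       //Drawable to draw into (our window)
    packet[2] = gc;                           //Graphics context
    packet[3] = w | ((uint32_t)h<<16);        //Image size
    packet[4] = x | ((uint32_t)y<<16);        //Destination position
    packet[5] = (uint32_t)depth<<8;           //Left pad of 0, then the depth
    write(sock, packet, 24);                  //Send the request fields
    write(sock, data, data_words*4);          //Send the image data with a separate write
}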
Notice that I write the data with a separate system call. Since the
connection uses a stream socket, this still works with no issues; there is no
distinction between packets in TCP or Unix Domain stream sockets. This also
allows us to avoid copying image data into a new buffer before writing it to
the socket. Note that we calculated the size of the image by multiplying height
and width; this will not work all the time because it assumes a depth of either
24 or 32 bits (which is what I expect on modern systems).
And now, without further ado, a full, working example:
Wayland (Coming Soon)
I have been working on this, but it will take time; be patient.
Mir (Coming Soon)
Once I get Mir actually compiled and running on my computer, I will
start work on developing native Mir applications. This will probably
take the longest.
Useful Drawing Concepts
Now that you have learned how to actually draw on the screen, you will
probably want to learn how to do more interesting and useful things with your
new knowledge. Because we are working with such low level graphics systems, we
do not automatically get things like double buffering and primitives; you have
to implement that stuff on your own. This section may help.
Double Buffering
Double buffering is very simple, and can have a profound impact on your
graphics. With only a single video buffer, the contents of the buffer might be
displayed at a bad time, such as right after you have cleared the screen, but
before anything has been redrawn. This makes your graphics look pretty ugly. It
will look like your screen is flickering and probably isn't great for your eyes.
Double buffering solves this problem by never presenting anything to the
screen until it is fully drawn. The basic idea is that instead of having only
one buffer, there are two buffers of equal size: a front buffer, and a back
buffer. The front buffer is the one being displayed on the screen; nothing
should be drawn into it directly. The back buffer is not displayed, it is just
memory. Everything that should be displayed is first drawn into the back
buffer. Then, once the entire frame is ready, the buffers are swapped, and the
back buffer becomes the new front buffer. This way, the front buffer always
contains a complete frame.
Implementing double buffering is easy. The first thing you need to do is
allocate a second buffer, the exact same size as your front buffer. For
instance, if we are working with the Linux framebuffer device, we might use the
following code.
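//One way to do it, using malloc (screensize is the framebuffer size in bytes
//computed during fbdev setup, and error() is the same helper used in the
//socket example above)
uint8_t *back_buffer = malloc(screensize);
if (!back_buffer)
    error("Error allocating back buffer\n",29);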
Then, instead of drawing into the front buffer, all your drawing
functions must draw into the back buffer instead. So, if we wanted to make the
whole screen purple, we might use the following code (compare to example in
fbdev section).
for (x=0;x<vinfo.xres;x++)
for (y=0;y<vinfo.yres;y++)
{
long location = (x+vinfo.xoffset) * (vinfo.bits_per_pixel/8) + (y+vinfo.yoffset) * finfo.line_length;
*((uint32_t*)(back_buffer + location)) = pixel_color(0xFF,0x00,0xFF, &vinfo);
}
We can draw as much as we want into the back buffer, but none of it will be
displayed on the screen until we swap it with the front buffer. The way we swap buffers is going to depend on the method we are
using to render to the screen. For instance, DRM/KMS provides its own method
of swapping buffers. You might be able to just tell your video driver to use
your back buffer as the front buffer. If all else fails, you can just copy the
contents of the entire back buffer onto the front buffer. In this case, we do
not need to actually "swap" the buffers since we do not really care about
moving the contents of the front buffer into the back buffer, so we only need to
copy the back buffer to the front buffer and be done with it. We can use this
method of swapping buffers using the fbdev rendering method. The code to swap
buffers might look like this.
inline void swap_buffers()
{
int i;
for (i=0;i<(vinfo.yres_virtual * finfo.line_length)/4;i++)
{
((uint32_t*)fbp)[i] = ((uint32_t*)back_buffer)[i]; //Copy one 32 bit word at a time
}
}
Please note that there is a lot of room for optimization in this code.
On systems with SIMD instructions, such as x86_64 (SSE, SSE2, SSE3, SSE4.1, etc.),
multiple bytes of the buffer can be copied at once. Writing this routine in
assembly language might be helpful.
However, there is an even better way to do double buffering using fbdev.
Although the interface does not provide a way to switch buffers, we can use a
little trick to simulate this behavior. The fbdev driver allows for what it calls "panning", where the
framebuffer can be bigger than the screen and then the driver can be instructed
to draw the correct portion of the buffer on the screen. To simulate buffer
switching behavior, we can allocate both buffers as one big buffer and then
draw in the one that is off the screen (our back buffer), and then tell the
device to "pan" to the offscreen portion of the buffer. We tell the device to
pan with the FBIOPAN_DISPLAY ioctl.
uint8_t *fbp, //Front buffer base pointer
*bbp; //back buffer base pointer
void init_fbdev()
{
...
fbp = mmap(0, screensize*2, PROT_READ | PROT_WRITE, MAP_SHARED, fb_fd, (off_t)0); //Needs vinfo.yres_virtual to be at least twice vinfo.yres
bbp = fbp + screensize;
}
void clear()
{
int x, y;
for (x=0;x<vinfo.xres;x++)
for (y=0;y<vinfo.yres;y++)
{
long location = (x+vinfo.xoffset) * (vinfo.bits_per_pixel/8) + y * finfo.line_length; //Don't add vinfo.yoffset here; bbp already points at the offscreen half
*((uint32_t*)(bbp + location)) = pixel_color(0xFF,0x00,0xFF, &vinfo);
}
}
inline void swap_buffers()
{
if (vinfo.yoffset==0)
vinfo.yoffset = vinfo.yres; //yoffset is measured in lines, not bytes
else
vinfo.yoffset=0;
//"Pan" to the back buffer
ioctl(fb_fd, FBIOPAN_DISPLAY, &vinfo);
//Update the pointer to the back buffer so we don't draw on the front buffer
uint8_t *tmp;
tmp=fbp;
fbp=bbp;
bbp=tmp;
}
That's all you need to know to do double buffering. Make sure you
remember to swap your buffers when you want your next frame to be displayed.
Drawing Primitives
You will probably need to draw shapes on the screen if you are going to
do graphics. Drawing single pixels isn't all that useful for most applications.
Here I describe a few common shape drawing algorithms that execute very quickly
and do what you need. I have used most of these in my own code. I can't cover everything, but there are lots of other resources online
that can help. Here is one very good one: Primitive Shapes & Lines.
Lines
Drawing lines is very important. Probably every single graphical
application draws lines for one thing or another. Many programs might only need
to draw horizontal and vertical lines, while other applications, such as games,
will need to draw lines of an arbitrary angle. Drawing horizontal lines and vertical lines is by far the simplest. You
can draw these types of lines just by iterating over the x or y values and
plotting each pixel. The following code shows these two functions. Please note
that these functions make the assumption that x1 is less than x2 and y1 is less
than y2.
void draw_horizontal_line(int x1, int x2, int y, uint32_t pixel)
{
int i;
for (i=x1;i<x2;i++)
draw(i,y,pixel);
}
void draw_vertical_line(int x, int y1, int y2, uint32_t pixel)
{
int i;
for (i=y1;i<y2;i++)
draw(x,i,pixel);
}
These functions are very simple, about as simple as possible. However, it
should be noted that there is still room for optimization, especially in the
horizontal line version. If we are using a framebuffer stored as a contiguous
block of main memory, then we know that our horizontal line is also stored as
a contiguous block of memory, and we can determine its size by subtracting x1
from x2. We can then fill that block of memory directly instead of changing each pixel
one at a time. Machines with SIMD instructions (e.g. SSE, SSE2, etc.) will be
particularly adept at this type of operation; it might be useful to write this
portion of your code in assembly. However, your compiler's optimizer should
also be able to do a pretty good job of optimizing this code on its own.
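For example, a faster horizontal line for a 32 bits per pixel back buffer (using the back buffer and fbdev variables from the double buffering section) might look like this:
void draw_horizontal_line_fast(int x1, int x2, int y, uint32_t pixel)
{
    //Point at the start of the row and fill it directly, instead of calling
    //draw() and recomputing the pixel location every time
    uint32_t *row = (uint32_t*)(back_buffer + y*finfo.line_length);
    int i;
    for (i = x1; i < x2; i++)
        row[i] = pixel;
}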
For more complicated lines, we can use an algorithm known as Bresenham's
line algorithm. It's basically the fastest one available, and it works great.
You aren't going to get any super fancy features like anti-aliasing, but in
most cases, this is exactly what you want. This code has been adapted from
code found on this page:
Primitive Shapes & Lines, written in 1996.
void draw_line(int x1, int y1, int x2, int y2, uint32_t pixel)
{
int i,dx,dy,sdx,sdy,dxabs,dyabs,x,y,px,py;
dx=x2-x1; //Delta x
dy=y2-y1; //Delta y
dxabs=abs(dx); //Absolute delta
dyabs=abs(dy); //Absolute delta
sdx=(dx>0)?1:-1; //signum function
sdy=(dy>0)?1:-1; //signum function
x=dyabs>>1;
y=dxabs>>1;
px=x1;
py=y1;
if (dxabs>=dyabs)
{
for(i=0;i<dxabs;i++)
{
y+=dyabs;
if (y>=dxabs)
{
y-=dxabs;
py+=sdy;
}
px+=sdx;
draw(px,py,pixel);
}
}
else
{
for(i=0;i<dyabs;i++)
{
x+=dxabs;
if (x>=dyabs)
{
x-=dyabs;
px+=sdx;
}
py+=sdy;
draw(px,py,pixel);
}
}
}
Circles
Drawing a circle is pretty easy. Circles are fun because they are very
uniform. If you divide a circle in half down the middle, each side is just a
mirror of the other side. So for any point on the circle (x,y), the other side
is just (-x,y). The same can be said if you divide in half across the center.
Top and bottom are also just mirrors of each other. So for any one point on a
circle, we actually know 4 points: (x,y) (-x,y) (x,-y) and (-x,-y). Then if we
divide the circle again diagonally, we can calculate 4 more points by just
exchanging x and y. So for every point we calculate, we can draw 8 points on
the circle, which means we only need to find 1/8th of the circle in order to
draw the whole thing. The following code is an implementation of what is called the "midpoint
circle algorithm".
//Draw a circle at (cx,cy)
void draw_circle(double cx, double cy, int radius, uint32_t pixel)
{
inline void plot4points(double cx, double cy, double x, double y, uint32_t pixel)
{
draw(cx + x, cy + y,pixel);
draw(cx - x, cy + y,pixel);
draw(cx + x, cy - y,pixel);
draw(cx - x, cy - y,pixel);
}
inline void plot8points(double cx, double cy, double x, double y, uint32_t pixel)
{
plot4points(cx, cy, x, y,pixel);
plot4points(cx, cy, y, x,pixel);
}
int error = -radius;
double x = radius;
double y = 0;
while (x >= y)
{
plot8points(cx, cy, x, y, pixel);
error += y;
y++;
error += y;
if (error >= 0)
{
error += -x;
x--;
error += -x;
}
}
}
This algorithm is wonderful and simple and also very fast. I recommend
using this algorithm any time you want to draw a circle. However, this
algorithm only draws the outline of the circle, it does not fill the circle in.
Fortunately, if you want to draw a filled circle instead, the modification is
very easy. All you need to do is, instead of plotting the points (x,y) and (-x,y),
draw a line between them. This line will always be horizontal,
so the line drawing algorithm can be optimized accordingly (described in
previous section). Here is the code (this code replaces plot4points function in
the circle drawing algorithm above).
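//A sketch of the replacement; it draws a horizontal line between each pair of
//mirrored points instead of plotting them individually
inline void plot4points(double cx, double cy, double x, double y, uint32_t pixel)
{
    draw_horizontal_line(cx - x, cx + x, cy + y, pixel);
    if (y != 0) //Don't draw the middle line twice
        draw_horizontal_line(cx - x, cx + x, cy - y, pixel);
}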
Since the launch of Lisk there was a lot of confusion about what it really is, and what it enables people to do in the future. This is partly our fault because during the ICO we advertised Lisk within other cryptocurrency communities, and we still haven’t released our vision paper. With this blog post I want to clean up some misunderstandings, and explain briefly what Lisk is and what it isn’t.
Misunderstandings
There are various misunderstandings, myths, and other wrong facts about Lisk being circulated around. So let’s clean up the most prominent ones.
Lisk is a fork of Ethereum!
No! The Lisk code base was forked from the Crypti project Oliver and I were a part of for 18 months before we started Lisk. About 2 years ago Crypti collected 750 BTC in order to develop a blockchain application platform. In the end the Crypti foundation developed a code base which was a good starting point for a new project. However, for the ultimate goal the available financial reserves were simply not enough to develop the idea to its fullest extent, even though it was managed extremely carefully. Oliver and I saw the potential and decided to take over, giving Crypti holders an exit opportunity, and giving the world the decentralized blockchain application platform it deserves. To get a more detailed understanding why Lisk is different than Ethereum, continue reading.
Lisk is written in JavaScript, which is not secure!
No! Even though JavaScript is a weakly typed language, it doesn't mean that it is inherently insecure. Yes, the programming language used plays an important role, but at the end of the day it is the developer's obligation to write secure code in the first place. There are other programming languages we could have used which, owing to their strongly typed nature, make it less probable for the developer to write bad code. However, that doesn't mean these languages do not come with other important trade-offs. We chose JavaScript because it runs literally everywhere, is extremely popular & widespread, and has huge companies like Google or Microsoft working on its speed and security across a wide range of devices. Therefore, its ever increasing developer base and ease of access played a huge role in that decision.
With regard to JavaScript’s known limitation on number precision beyond 32 bits: we have worked around this limitation by conducting integer based arithmetic throughout the code base using BigNumber.js (a library for arbitrary-precision decimal and non-decimal arithmetic). Blockchain application developers are largely shielded by our API, or in cases which require it, they can utilize the same library used by the Lisk core. Therefore, this limitation is not a problem, nor has it been since Lisk’s inception.
To further strengthen the code base, we are planning an eventual transition to TypeScript — a strongly typed JavaScript super-set — which once implemented should mitigate the issues normally associated with weakly typed code. We are also contemplating pushing back certain critical core features to C++ for maximum speed and security. In summary, we are leveraging all options available to solve some of the known problems associated with using JavaScript on such a project as Lisk.
Lisk is a competitor to every blockchain project!
No! While it’s true that you can implement practically everything into a Lisk sidechain, we believe that the blockchain space is big enough to have several projects solving the same issues.
Decentralized storage can be done on a stand-alone blockchain or on Lisk.
Anonymous transactions can be done on a stand-alone blockchain or on Lisk.
A social blogging system can be done on a stand-alone blockchain or on Lisk.
And the list goes on..
The best part is, if you need one of these services in your Lisk blockchain app you don’t need to re-implement everything. You can just plug into third party technologies and leverage their functionalities from within your Lisk blockchain app. Continue reading to learn more about that.
What is Lisk
A decentralized network
Lisk is a decentralized network such as Bitcoin, Nxt, or BitShares. It doesn’t utilize Proof of Work like Bitcoin, or Proof of Stake like Nxt. Lisk uses a simplified implementation of BitShares’ original consensus algorithm called Delegated Proof of Stake. That means every LSK holder can vote for mainchain delegates which are securing the network. There is a maximum of 101 active mainchain delegates, those that have received the most votes across the whole network, and only they can earn block generation rewards; that means there is a financial incentive to become an active delegate. Every other delegate is on standby, awaiting election, or alternatively, securing a Lisk sidechain.
A cryptocurrency
Lisk is a cryptocurrency and its token is called LSK. To send LSK from one account to another takes 10 seconds, after about 1–2 minutes the transaction can be deemed immutable.
A framework
The Lisk App SDK is a framework to deploy your own sidechains and develop blockchain applications on top of them. Everything is written in JavaScript. That means you can develop platform independent social networks, messengers, games, financial applications, prediction markets, exchanges, storage platforms, contract execution platforms, sharing economies, online shops, IoT applications, and much more on your own blockchain, fully decentralized, and without the hassle of complicated consensus protocols or P2P networking algorithms.
A sidechain platform
In the previous section, you read that the Lisk App SDK enables you to deploy a sidechain to Lisk. This sidechain is a fully customisable blockchain, free of pollution or spam on the mainchain or other sidechains. All specifications, parameters, and transaction types can be changed to fit perfectly to your blockchain application.
In order to prevent spam on a blockchain there is always a token of value which is required to save any data on the blockchain, e.g. a fee to send transactions. In your Lisk sidechain you can either use the LSK tokens from the mainchain or an entirely new custom token.
In the case you want to use mainchain LSK tokens within your sidechain you have one limitation. You can’t simply transfer tokens between different blockchains. Remember, a sidechain is a completely new and independent blockchain! You also can’t send Litecoin to the Bitcoin network.
That’s why there is a special transaction type to transfer LSK tokens from the mainchain to a sidechain. Practically, the LSK tokens never leave the mainchain. They simply get transferred to the sidechain owner’s Lisk account (on the mainchain!). At the same time, an equal number of LSK tokens will be replicated on the sidechain and can be used there as normal.
That means in the case of a badly written sidechain or blockchain application, all LSK tokens are safe and can be retrieved easily by the sidechain owner. However, this also means that you have to trust the sidechain owner. This doesn’t make sense for all types of blockchain applications, only for some owned by startups or reputable individuals.
In the case you want to develop a truly trustless application, and you don’t want to hold LSK tokens belonging to your users, then you can simply issue your own custom token (or multiple ones) on the sidechain and use them as a completely new cryptocurrency. That means you can distribute them in an ICO or sell them on exchanges. This way your users have full and exclusive control over their custom tokens in a completely trustless way, and can use them inside your blockchain application. To easily exchange custom token against LSK you might implement a mini-exchange system inside your blockchain application.
If a hardfork is necessary because there was a bug discovered in the Lisk sidechain or blockchain application, then only the sidechain needs to be hardforked. The mainchain stays completely unharmed and immutable.
There are ongoing discussions about public, private, and consortium blockchains. With Lisk you can build whatever you want and need. However, in order to attract startups and developers we are putting our main focus on public consortium blockchains. That means federated blockchains which are secured by delegates chosen from a public marketplace which are being paid for generating blocks, either by the blockchain application owner or its users. Continue reading in order to learn more about the Lisk delegate marketplace.
A blockchain application platform
At the end of the day, sidechains are simply the decentralized databases of blockchain applications; they allow the issuance of tokens of value, are immutable, and run on a decentralized network. For our users the most interesting part is the blockchain application platform.
The Lisk App SDK not only allows the deployment of sidechains, it also offers all necessary tools to develop a blockchain application. A blockchain application is not only a script running in a decentralized and trustless manner. It’s much more powerful. It basically consists of four parts.
Consensus Algorithm (e.g. DPoS)
Sidechain (as the immutable database)
Back end (the application logic)
Front end (the application user interface)
A blockchain application is a fully fledged application, with its own consensus algorithm (by default DPoS), blockchain, optional custom token, back end, and front end. However, it can also connect into the existing ecosystem of Lisk by utilizing the LSK token, leveraging other blockchain applications (i.e. blockchain services), attracting sidechain delegates, and starting with an existing user base.
It is important to note that, thanks to the Lisk App SDK, you are not only able to develop the back end of your application (i.e. the functionalities), you are also able to develop a front end (i.e. the user interface). This way you can easily create complete blockchain applications ready to download and use in one convenient package. In this case the front end can also be accessed in a decentralized way, which is currently a unique feature within the blockchain industry.
We often hear the argument that if the code is not stored on the blockchain then it’s not a decentralized application. We have to make a distinction here, the application itself is running on your computer just as the Bitcoin client or a Torrent client. That means you have to download it first. Now there are multiple users running the application; some of them have the special permission to generate new blocks inside the sidechain and therefore secure the network (=delegates), others are simply using the application (=users). Your blockchain application and sidechain is secured by multiple devices now and therefore is decentralized, thus it’s a decentralized application.
A blockchain service platform
Blockchain services are a sub-category of blockchain applications. They allow you to do something very specific, similar to a single purpose application.
Some examples to name are:
Identity as a service (e.g. single sign-in for various websites with encrypted personal information directly accessible on demand)
Decentralized hosting as a service (e.g. to store images or documents in a decentralized, encrypted cloud)
Smart contract execution as a service (e.g. to calculate a random number which is safe from manipulation)
Oracles as a service (e.g. to get opinions about your product or to get the result of a real life event)
The great thing about Lisk blockchain services is that they are all plugged into the Lisk mainchain and ecosystem, because they are all running on their own Lisk sidechains. That means that once such a service exists, every other blockchain application developer can leverage it. For example, if you want to develop a social network and want to store user images, you can plug into a blockchain service which enables you to do exactly that (the other option is to create a bridge to a third party decentralized hosting service, e.g. StorJ).
That means over time Lisk blockchain applications can become more complex and feature-rich as the whole Lisk app ecosystem grows. It’s like a brain making more and more neural connections to become more capable.
All those blockchain services can fall back on the existing Lisk ecosystem with its users and the LSK token, hopefully stimulating demand for it. Additionally, all those blockchain services are independent and if they fail, only they fail. What remains is a small sidechain entry on the Lisk mainchain, that’s it.
Another powerful component is the open source character. While there are ways to hide JavaScript code, we estimate the majority won’t do it in order to gain trust. Now, every time someone integrates a cool feature into his blockchain application or service everyone else can copy it, enhance it, and integrate it into his own blockchain application or service.
Some examples where being open source has real advantages:
In the beginning we won’t have the necessary libraries to execute Solidity smart contracts on Lisk, however it’s possible to integrate the EVM into any Lisk sidechain. If someone does that at one point of time, everyone could utilize the source code and would be able to either execute smart contracts locally inside their sidechain, or offer smart contract execution as a service. At that point developers won’t need to plug into the Ethereum network anymore.
It will probably also take some time until there is a Proof of Work consensus algorithm for Lisk sidechains. However, once someone develops one, everyone else can utilize it.
If a project fails because of political or financial reasons someone else can simply use the code and re-deploy the blockchain application again. He might even write a sidechain migration so he can take the old users with him to his new sidechain.
A decentralized directory for applications
Nowadays, we have Apple’s App Store and Google’s Play Store. Both centralized solutions owned by corporations for centralized applications.
With Lisk we not only have the network, ecosystem, and SDK for decentralized blockchain applications, we also have a decentralized application directory directly within the client which everyone can access. All blockchain applications and services built on the Lisk platform have to be registered on the Lisk mainchain, this way the connection between mainchain and sidechain is created (for the LSK replication within the sidechain) and the application directory can find all blockchain applications and services. During the registration the developer is able to set a unique blockchain application name, description, category, icon, tags, and download link.
With that we offer an alternative app store for decentralized, platform independent, blockchain applications.
A decentralized and public delegate marketplace
The Lisk delegate marketplace is a way to connect blockchain application developers and sidechain delegates. This is necessary because every sidechain is an independent blockchain which needs its own set of nodes to secure it. If blockchain application developers don’t use our default consensus algorithm (using e.g. PoS or PoW instead) they will have to take care of the sidechain miners/forgers themselves, which might be a better option for some use-cases. However, if the blockchain application developer is using our default consensus algorithm (DPoS) he will be able to rent delegates from the delegate marketplace for a recurring fee.
This kind of delegate marketplace can only work with DPoS and it enables us to do a lot more with Lisk than other platforms can do.
Supporting third party software like Bitcoin (or IPFS, Ethereum, etc.)
Let’s say you have a blockchain application which saves purchase receipts on the sidechain for every Bitcoin transaction your users are doing. How do you get the Bitcoin transactions of your users? An oracle is far too slow and a blockchain explorer is centralized. That means you have to run the Bitcoin client locally next to your blockchain application in order to access its API and get all necessary data immediately. That means these kinds of delegates will have to promote themselves as sidechain delegates with a local Bitcoin instance running. This will be one of the attributes a delegate will have inside the delegate marketplace. The same applies to any other third party software, centralized and decentralized.
Staying in certain jurisdictions for legal reasons
Some blockchain applications might offer features which are illegal in one country and legal in another. The startup behind the blockchain application might be very serious about it and wants to stay 100% legal, that means the sidechain needs to be secured by nodes which are only running in the jurisdictions where the blockchain application features are legal. The node country will be another attribute a delegate will have inside the delegate marketplace.
Generating income for everyone in the world
Every sidechain delegate can be rented for a fee they define (e.g. 10 LSK for every 100 blocks). Imagine in two years you can secure 50 niche blockchain applications with your Raspberry Pi 5 which you bought for $15, generating a monthly income of $50. For many, this might not sound like much, but in large parts of the world this is enough money to last for a month. The world is becoming more connected every day and even in those parts of the world the internet accessibility is improving. Making this scenario not fiction, but a very possible reality.
Lite clients for every blockchain application and service
Thanks to federated sidechains it’s extremely easy to update the code and introduce changes. Even more important, it is extremely easy to develop a lite client for your blockchain application or service. We estimate that mobile devices will be (and already are) the computers of the future and nobody wants to stay in sync with a blockchain on mobile devices. This means we need to let users access any blockchain application with a simple click inside the application directory, without any installation process. This can be done with a lite client which is receiving the whole blockchain application (back end and front end) from a sidechain delegate in a decentralized way.
What Lisk isn’t
A smart contract platform
I hope that above explanations showed that we are not a smart contract platform which simply executes Solidity contracts inside an EVM fork. Lisk blockchain applications are integrating their features directly into their back end.
A few examples how you can implement specific features in a Lisk blockchain application:
Authorisation within a blockchain application
You can completely rely on the Lisk APIs here. Your users can simply use their Lisk passphrase, which always converts to the same blockchain application account. For security reasons we suggest users to setup a second passphrase on the mainchain and never input this second passphrase on a sidechain.
→ You can now build applications which require users to authorise themselves. This is important for all kinds of applications.
Sending a message, writing a blog post
You have to add a new transaction type which allows you to append text with the required length. Additionally, you have to add two API calls to initiate this new transaction and to retrieve the content of a transaction from the sidechain.
→ You can now build a decentralized messaging service, social network, or blogging system. You can even develop voting systems by relying on the stake of individual users.
Turning a gadget on or off
Again, you simply have to add a new transaction type which allows you to set a boolean state of true or false.
→ You can now build a decentralized Internet of Things application which allows you to securely (with authorisation!) turn on gadgets with a simple transaction, which can be just a push on a button.
Generating random numbers
Every calculation within a Lisk blockchain application needs to be deterministic. That means every node executing the blockchain application code has to come to the same end results. So you can’t use a random() JavaScript function, you have to rely on other methods. One option is to calculate a random number of the next 10 block IDs. Another option for a system with multiple users is to force everyone to announce a number and calculate the random number with the help of every announced number.
→ You can now build games which require a random number, e.g. a slot machine.
A Bitcoin startup
We are a blockchain startup, and more importantly a Bitcoin independent network; running on our own blockchain, with its own cryptocurrency, and full control over how the platform evolves over the forthcoming years. This should become abundantly clear if you check out which conferences, meetups, and events we are attending.
A pump and dump
Many users are urging us to spend our funds on unnecessary stuff and give out exorbitant bounties. We won’t! We are here to build and promote the platform consistently over a long time. The development speed will accelerate over time as the team grows and internal processes are established. The promotion is growing in steps and certain marketing campaigns only make sense once the project is at a specific stage in terms of development. We estimate our funds are enough for a minimum of 5 years.
Ready…
Lisk isn’t ready, else there wouldn’t have been an ICO! Some parts of our great vision still take months, other parts maybe years. It doesn’t take a week to revolutionize the app market.
Although observational findings linking breast milk to higher scores on cognitive tests may be confounded by factors associated with mothers’ choice to breastfeed, it has been suggested that one or more constituents of breast milk facilitate cognitive development, particularly in preterms. Because cognitive scores are related to head size, we hypothesised that breast milk mediates cognitive effects by affecting brain growth. We used detailed data from a randomized feeding trial to calculate percentage of breast milk (%EBM) in the infant diet of 50 adolescents. MRI scans were obtained (mean age=15y9m), allowing volumes of total brain (TBV), white and grey matter (WMV, GMV) to be calculated. In the total group %EBM correlated significantly with Verbal IQ (VIQ); in boys, with all IQ scores, TBV and WMV. VIQ was, in turn, correlated with WMV and, in boys only, additionally with TBV. No significant relationships were seen in girls or with grey matter. These data support the hypothesis that breast milk promotes brain development, particularly white matter growth. The selective effect in males accords with animal and human evidence regarding gender effects of early diet. Our data have important neurobiological and public health implications and identify areas for future mechanistic study.
Numerous studies report that breastfeeding is associated with higher scores on tests of neurodevelopment and cognition in later life (1,2), suggesting that breast milk may impact early brain development, with potentially important biological, medical and social implications. These findings are often questioned because other factors associated with mothers’ choice to breast-feed (e.g. higher socioeconomic status and level of education, different child-rearing attitudes) might promote cognitive development. In a study using data from a national database, adjustment for maternal IQ eliminated the breastfeeding effect (3). All these studies, however, are potentially flawed by lack of experimental design.
More recently, a large cluster randomised trial of breastfeeding promotion using an experimental design demonstrated a large effect of breastfeeding on cognition, adding credence to the evidence for beneficial effects seen in past observational studies (4). Anderson and colleagues (5) showed in a meta-analysis that, after appropriate adjustments, breastfeeding was associated with an advantage of around 3 points on tests of cognition in children born at term and around 5 points in those born preterm, both large effects in population terms. The implication is that, over and above social factors, one or more constituents of breast milk benefit neurodevelopment, particularly so in those born preterm, at a more sensitive stage of brain development.
Our current study was prompted by two factors, the now likely causal relationship between breast feeding and cognition and the previous observation that cognitive scores in preterms are related to head circumference and brain size measured by quantitative volumetric analysis of MRI scans (6). With this background, we examined the relationships between breast milk feeding, cognition and brain volume determined from MRI scans. Our specific purpose was to provide evidence for the hypothesis that cognitive benefits of breastfeeding are mediated though an effect on brain growth, ultimately influencing mature brain volume.
To explore this hypothesis, we studied members of a cohort of adolescents who had participated in a large randomised trial examining the health and developmental effects of early infant nutrition, conducted between 1982-5 (7). The subjects were born preterm, at a sensitive period for nutritional effects on brain development and cognition (8). In exploratory analyses, those receiving breast milk, after allowing for confounding factors, had an 8.3 point IQ advantage at 7-8 years (9). A subset of this cohort has been followed to 13-19 years of age for MRI scanning and cognitive testing. Data collected on these subjects whilst in the neonatal unit provided precise information on the volumes of breast milk consumed (by nasogastric tube), giving a rare opportunity to explore the potential dose-response effect of breast milk feeding on brain volumes and cognition at adolescence. It is important to note that all neonatal data were recorded for this study at the time and were not obtained retrospectively from charts. Although some IQ data for the larger group have been reported previously (10,11), neither IQ data for this particular group nor information regarding breast milk in the diet have been published.
METHODS
Subjects
The original randomised trial of early nutrition consisted of two studies, each with two parallel trials (8). In the trial of interest here, mothers elected to breastfeed but varied in their success in expressing milk and, therefore, in the need for supplementary feeding. According to randomisation before breastfeeding commenced, these infants were given one of three supplements: a nutrient enriched preterm formula (PTF), a standard term formula (TF; used in the 1980s for feeding preterm infants) or banked breast milk (BBM; provided by unrelated donors and unfortified). The volumes of all enteral intakes were recorded daily (parenteral nutrition was not included) and the expressed maternal breast milk component converted to a percentage (%EBM) of the infant’s total intake. The median number of days in the study was 39.5 (Quartiles: 21.5; 39.5; 61.3).
Our intention was not to conduct another follow-up since the neuroimaging component constrained the number of adolescents who could be studied due to factors such as difficulty of travel to the hospital for MRI imaging. In order to make interpretation of the diet-brain relationships as straightforward as possible, we chose to include those who had been born at 30wk GA or less and who had been considered neurologically normal, by history and examination, when seen at 7.5-8 years, making them an atypical group. All attended mainstream schools. Cognitive and imaging data were available for 50 (26M; 24F) adolescents who had been assigned to the EBM + supplement limb of the original study, allocated to diet as follows: PTF=28, TF=13, and BBM=9. Percentage EBM in the diet ranged from 0 (2 infants) to 100 (3 infants) with a mean of 60.1. Members of this group had been assessed cognitively in previous follow-up studies but no MRI scans were collected at time of birth (in the early 1980s) or subsequently. Mean age at cognitive testing/MRI acquisition was 15y9m (Range: 13y5m -19y9m; SD=13.6m).
Cognitive assessment
The age-appropriate Wechsler IQ test was given: 44 completed the Wechsler Intelligence Scale for Children – Third Edition (WISC-III) and 6 the Wechsler Adult Intelligence Scale – Third Edition (WAIS-III). Mazes, an optional subtest, was omitted from the WISC-III. Verbal IQ (VIQ), Performance IQ (PIQ) and Full Scale IQ (FSIQ) scores were calculated. IQ scores have a population mean of 100 and a standard deviation of 15.
MRI Scan acquisition
MRI studies were performed using a 1.5T Siemens Vision system. Investigations included magnetization prepared rapid acquisition gradient echo (MPRAGE 3-D) (12) volume acquisition with repetition time of 10 ms; echo time, 4 ms; inversion time, 200 ms; flip angle, 12 degrees; matrix size, 256×256; field of view, 250 mm; partition thickness, 1.25 mm; 128 sagittal partitions in the third dimension, and acquisition time, 8.3 min.
MRI volumetric analysis
We used an established technique developed by Fischl and colleagues (13) to obtain volumes for total brain (excluding cerebrospinal fluid) and for white and cortical grey matter in the left and right hemispheres separately; central grey matter and the cerebellum were not included. This method involves whole brain segmentation and the assignment of a neuroanatomical label to every point in the cortex; labelling of the structures, using T1-weighted images, is based on probabilistic information estimated from a training set manually labelled by neuroanatomists. It has been validated by correlation with histology, manual measures and with known properties of the cortex (14,15). We correlated total brain volumes obtained on two separate occasions and, as expected from such an automated method, the correlation between them was 1.0.
Procedure
Cognitive data were collected during the hospital visit for MRI acquisition. Test administration was carried out by one individual blind to %EBM. Written consent was obtained from all participants and parents/guardians, and approval was obtained from the relevant ethics committees (The Great Ormond Street NHS Trust/Institute of Child Health, Norwich District, South Sheffield Research, East Suffolk Local Research and Cambridge Local Research Ethics Committees).
Statistics
T-tests were used to compare male and female groups. The relationships among %EBM, IQ scores and neural volumes were calculated using partial correlation coefficients. In analyses involving %EBM and IQ scores, we covaried level of maternal education and social class. We did not correct for age at test since IQ scores are already age-corrected, nor for GA since there was no significant relationship between GA and %EBM in this group (p=0.2). In all analyses involving neural measurements from MRI scans, we added age at test/scan and GA as additional covariates since brain development takes place over the age range during which our adolescents were scanned (16) and because ‘baseline’ brain structure at birth is related to GA (17). In fact, the relationships between GA and neural volumes were not significant in this group (the range in GA values was only four weeks) but we included it in case of residual confounding. We then selected three quantitative variables reflecting aspects of neonatal condition: days to achieve enteral feeds of 150ml/kg/day (only available for 41 infants), days of ventilation and number of infections based on positive blood culture. We conducted principal analyses as described above for all children first, reporting these results. We then carried out secondary analyses with the additional covariates but only report these if the pattern of significant results was altered. We also conducted regression analyses in which all covariates were added as predictors.
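For readers less familiar with partial correlation, the computation amounts to regressing both variables of interest on the covariates and correlating the residuals. The sketch below (Python, with made-up placeholder data and variable names; an illustration of the method, not the study's actual analysis code) shows the idea for %EBM and IQ with maternal education and social class partialled out.

# Illustrative partial correlation: correlate the residuals of x and y after
# regressing each on the covariates (here, maternal education and social class).
import numpy as np

def partial_corr(x, y, covariates):
    X = np.column_stack([np.ones(len(x)), covariates])   # design matrix with intercept
    beta_x, *_ = np.linalg.lstsq(X, x, rcond=None)
    beta_y, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_x = x - X @ beta_x
    resid_y = y - X @ beta_y
    return np.corrcoef(resid_x, resid_y)[0, 1]

# Placeholder data for 50 subjects (not the study's data):
rng = np.random.default_rng(0)
ebm = rng.uniform(0, 100, 50)                  # %EBM
iq = 100 + 0.1 * ebm + rng.normal(0, 10, 50)   # simulated IQ scores
covs = rng.integers(1, 6, size=(50, 2))        # education and social class codes
print(partial_corr(ebm, iq, covs))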
Results
Group characteristics
presents perinatal and milk intake data for the total group and genders separately; there were no significant differences between boys and girls. In addition, we looked at socioeconomic status (coded into 6 categories based on the U.K. Registrar General's Classification of Occupations) and level of maternal education (coded into 5 categories based on academic achievement). The distribution of education level did not differ between girls and boys; significantly more girls were classified as higher social class (chi square: p=0.03). All mothers in the study had chosen to breastfeed and there was no relationship between either social class or maternal education and %EBM within the group. Nevertheless, to reduce any residual impact, both maternal education and social class were covaried in all subsequent analyses. presents information specifically about markers of illness in the neonatal period that might have had an impact on feeding and subsequent outcomes. No child in this study received corticosteroids.
Means (SD) for perinatal and social factors for the whole group and for males and females separately.
Indications of severity of illness during the neonatal course
The children had been classified as neurologically normal on the basis of history and examination, but not MRI scan, at 7-8 years. Thirty-three of the 50 adolescent scans were considered normal on visual inspection. Of the remaining 17, ten had a small corpus callosum, two had reduced bilateral hippocampi, two showed a mild periventricular leukomalacia pattern, and one each had prominent ventricles, left peritrigonal focal atrophy, and a possible MR signal abnormality in deep white matter. We could not discern any relationships between these findings and outcome measures.
As mentioned above, the group was chosen in such a way that it was not representative of the whole cohort. However, we compared the study group with the total group in the cohort who met the same selection criteria (GA≤30 weeks, normal neurological status at 7-8 years and participated in Trial 2; n = 84) to see if they were comparable: GA – Total mean = 28.8wks, Study mean = 28.8wks; birth weight – Total mean = 1299g, Study mean = 1230g; Apgar at 5 minutes – Total mean = 7.7, Study mean = 7.7; days of ventilation – Total median = 3.0, Study median = 2.0. The proportions of small for gestational age infants were 13% in the total group and 16% in the study group.
%EBM and IQ
shows that all mean IQ scores were close to the putative population mean of 100. IQ scores for boys and girls did not differ significantly. Correlation coefficients between %EBM and IQ measures are shown in ; p-levels ≤ 0.05 were considered significant. The whole group showed a significant relationship between %EBM and VIQ (FSIQ at trend level), largely attributable to the boys, who showed relationships between %EBM and all three IQ scores, while the girls showed none. Repeating these analyses with additional covariates did not change this pattern of results, except that the relationship between PIQ and %EBM in the boys was at trend level only (p=.08). We then carried out regression analyses for the three IQ scores with the following predictors: maternal education, social class, %EBM, number of infections, number of days to achieve enteral feeds of 150ml/kg/day, and days of ventilation. Results are presented in . For each outcome measure, we show the overall significance level of the best model with the contributions made by its significant predictors. The ratio of number of predictor variables to number of subjects is rather high so caution in interpretation is advised.
Mean (SD) VIQ, PIQ and FSIQ scores for the whole group and for boys and girls separately.
Correlation coefficients (p-levels) between %EBM and IQ measures for the whole group and boys and girls separately with level of maternal education and social class partialled out.
The overall significance level of the models predicting IQ and neural volumes are given in the top row. The predictor variables are listed and significance levels for those retained in each model are presented. Since results for left and right grey and...
%EBM and neural volumes
We calculated partial correlations between %EBM and a series of neural volumes: TBV, left and right WMV and left and right GMV, adding test age/age at scan and GA as covariates in addition to maternal education and social class, as above. shows that %EBM was related significantly both to TBV and to left and right WMV in the whole group. The same was true in the boys, in whom the relationship between %EBM and WMV accounted for almost 50% of the variance. In girls, the relationship with WMV existed at a trend level only and that with TBV was not significant. No significant relationships existed between %EBM and any GMV. Repeating the analyses using the additional covariates did not change these patterns of results except to reduce the size of the correlations between %EBM and WMV in girls so that they were no longer significant even at a trend level. We repeated regression analyses as reported above, adding GA and age at test/scan as predictors. Results are shown in ; the patterns for left and right GMV and WMV were the same so only the left results are reported.
Correlations (p-values) between %EBM and neural volumes for the whole group and boys and girls separately. Maternal education, social class, test age and gestational age were used as covariates in all analyses.
The above analyses used absolute brain volumes. We repeated the WMV and GMV analyses covarying for TBV, i.e. using relative volumes (shown in ). The results reflect the fact that these analyses deal with two component volumes contributing to TBV: once TBV is held constant, a relatively larger white matter volume implies a relatively smaller grey matter volume. Thus while WMVs remained significantly related to %EBM, GMVs were now also significantly related but in the negative direction; the trend level relationships between %EBM and WMV in girls became significant. Finally, to further clarify the relative contributions of grey and white matter to TBV, we examined the correlations between %EBM and TBV. In all cases, covarying grey matter increased the size of the correlation coefficients while covarying white matter decreased them.
Neural volumes and IQ
Finally, we examined the relationships between neural volumes and IQ scores (). For the whole group, left and right WMV correlated significantly with both VIQ and FSIQ. Boys showed these same relationships (with trend level relationships between PIQ and white matter) and, also, a significant correlation between TBV and VIQ. The non-significant relationships between IQ and neural volumes in girls and between IQ and grey matter in any group are omitted from .
Significant correlations (p-values) between neural volume measures and the three IQ scores for the whole group and for boys. Maternal education, social class, test age and gestational age are used as covariates in all analyses. Results were the same for...
Impact of donor banked breast milk (BBM)
The diet group given banked donated breast milk (BBM) as a supplement to EBM (n=9) received 100% human milk from two sources. To check that the inclusion of this group had not unduly influenced the results, we eliminated the BBM group and analysed data only from the 41 infants who had not received any donated human milk. The relationships between %EBM and VIQ and FSIQ in this smaller group were both significant at the p=0.006 level. Regression analyses in these 41 subjects who received one of the other two supplements (i.e. TF or PTF) showed that neither supplement significantly predicted IQ outcome but %EBM did so for both VIQ and FSIQ. The effects of sole diets were not examined in this study. Similar results were found for neural volumes but are not reported here.
DISCUSSION
This study showed a dose response relationship between early breast milk intake and later IQ and, uniquely, with whole brain volume at adolescence. In all subjects, but most clearly in boys, the effects of breast milk were seen more strongly on white than grey matter in the brain. These data support the hypothesis that one or more constituents of mothers’ breast milk promote brain development at a structural level.
Our observation that IQ itself was strongly related to white matter and hence TBV is consistent with the observed links between early breast milk feeding and both later IQ and later WMV. Relationships between white matter and measures of cognition have been reported previously in children (18). These data support the hypothesis that the beneficial impact of breastfeeding on IQ may be mediated, at least partly, by promoting the development of white matter.
Generalisation of our findings to healthy full term infants is untested but may be supported by the observations that breast feeding in healthy infants is also related to higher subsequent IQ scores and that IQ is related to white matter in children and adults generally. In any case, preterm infants represent a sizeable proportion of the population (e.g. greater than the prevalence of Type I diabetes) and are over-represented amongst those with neurodevelopmental problems, so favourable influences on long term brain development are important to identify even within this large subgroup.
While correlations were seen in the whole population, they were always more highly significant in males. Thus, IQ was most clearly related to white matter and hence TBV in males, and expressed breast milk was linked to IQ and to neural volumes only in males. These findings concur with a large body of animal evidence; Smart (19) for example, examining the effect of early malnutrition on later learning and behavior in 165 animal studies, reported dominant effects in males. A growing literature in humans describes gender differences in brain development and structure (20,21) and in relationships between brain structure and cognition (22,23). We emphasise that males and females did not differ in mean IQ values but rather in the relationships between IQ and both diet and neural volumes. Percentage EBM accounted for around 25% of the variance in IQ in boys but only 5% in girls. It is interesting to note the recent report by Kesler and colleagues (24) that measures of WMV differ between preterm males and females.
Correlations were often found for FSIQ, but more detailed analysis showed that VIQ, rather than PIQ, was related to breast feeding and neural volumes (except in one instance). There is growing evidence that early nutrition may have a selective, lasting effect on verbal subscales (25,8); in later development, however, diet may selectively affect PIQ (26). The implication is that the cognitive effects of nutritional intervention may depend on the timetable of brain development.
Possible mechanisms
The mechanism for the observed effects is uncertain. In recent years, long-chain polyunsaturated fatty acids (LCPUFAs), notably docosahexaenoic acid (DHA), have been considered a key explanation for neurodevelopmental benefits of breast milk. A recent meta-analysis of outcome in preterms after early LCPUFA supplementation of infant formula (27), however, found no conclusive evidence for long-term cognitive effects, suggesting that differences in early LCPUFA intake may not explain our observations. While LCPUFAs remain of interest, it should be remembered that human milk contains a wide variety of other factors including hormones and growth factors, some of which (e.g. thyroxin, nerve growth factor) could plausibly influence cerebral development, although this is not established. During the window of brain development we studied, glial cell production and myelination are key factors related to volume increase (28). Our study was not designed to examine detailed neuronal development, but a plausible hypothesis for further testing is that breast feeding promotes glial cell proliferation and, hence, myelination. Interestingly, a recent study showed that dietary manipulation in neonatal rats affected the expression of myelin basic protein more in males than females (29), consistent with our findings of a greater effect of diet on white matter development in males.
Breast milk contains significant quantities of cholesterol while infant formulas contain little (30); full term breast fed infants have higher total serum cholesterol at 6 months (31). Saher and colleagues (32) reported recently that cholesterol is an indispensable component of myelin membranes in mice and that cholesterol availability in oligodendrocytes is a rate-limiting factor in brain maturation. The massive increase in synapse number post-natally may require large amounts of cholesterol, obtained by neurons from astrocytes (33). Serum cholesterol has been related to intellectual performance in adults (34). We suggest, therefore, that dietary cholesterol intake could help explain an impact of breast milk on white matter development and cognition, through enhanced glial production/myelination.
Limitations
We recognise limitations to the conclusions that can be drawn from this study since it did not have an experimental design, making it impossible to test for causation. We hope, however, to provide an impetus to further research and suggest to cognitive scientists that nutrition as an environmental factor may affect their outcomes. Testing for causation more formally in future intervention studies may be possible but, meanwhile, the strong observed dose-response relationships and the consistency of our findings with previous animal and human research make causation plausible.
The study sample was restricted to preterm infants who had normal neurological status at 7 years. We are unable to say anything about the relationships we have observed here in those preterms who show evidence of injury or adverse outcome. We do not know whether this group would show the same effects, smaller or larger. It is possible that %EBM might have a greater influence in this vulnerable group but this remains to be determined.
There are also some more specific limitations. Der and colleagues (4) have recently suggested that the relationship between breastfeeding and IQ is explained by the confounding influence of higher IQ in breast-feeding mothers. While all mothers in the present study had chosen to provide breast milk, we accept that residual confounding might remain, even within this subpopulation, if, for example, higher maternal IQ was associated with greater breast milk production. Lacking maternal IQ scores, we have used level of maternal education as a proxy measure and covaried this in all correlations. Although there was no significant relationship between %EBM and level of maternal education in the study population here, we accept that maternal IQ cannot be ruled out as a factor without directly measuring it. We have noted above that while the main outcome data, cognitive scores and neural volumes, were available for all subjects, data were missing for a control variable we used as a covariate, a weakness of the study.
During the age range studied here, grey matter and white matter develop along different trajectories (16). Neural volumes were obtained at one time point only which is why we have covaried age at scan in an attempt to crudely control for these differences. Repeated scans would have allowed observations about the effects of breast milk on the course of development. The results show that the %EBM ingested as a child is related to the volume of white matter seen in the brain at the age range studied here but the absolute volumes of white (and grey) matter might differ at different ages without affecting these correlation coefficients.
In summary, the present evidence supports the hypothesis that the effects of constituents of mothers’ milk underlie the repeated observations that breastfeeding is associated with a higher cognitive level, at least in males. This has both scientific and clinical implications, pointing to nutritional practices that could optimise the attainment of cognitive potential. Our study also helps provide a framework for future mechanistic studies on the impact of early nutrition on brain development.
Repeating radio signals from a mysterious source in a dwarf galaxy 3 billion light-years away have been detected by astronomers.
Using the Green Bank Telescope in West Virginia, scientists with the Breakthrough Listen initiative—a massive project dedicated to finding signs of intelligent alien life—recorded 15 repeating fast radio bursts (FRBs) on August 26. The discovery was announced as an Astronomer's Telegram and will be described in further detail in a forthcoming scientific article, according to a statement from Breakthrough Listen.
FRBs last just a few milliseconds and appear to be coming from deep space. Because FRBs have an extremely short duration, and because scientists usually find them in data only after the event has taken place, pinpointing their origin has not been possible.
Since their discovery over 15 years ago, almost two dozen FRBs have been recorded. Most often, they are one-off events, but in 2016 scientists announced in the journal Nature that they had found a repeating radio signal—FRB 121102. By monitoring and tracking this repeating burst, they were able to trace it back to a dwarf galaxy 3 billion light-years away. Still, the source remained elusive. Nothing that we know of in that region of space could be producing these signals.
Now the Breakthrough Listen team has detected 15 more bursts coming from FRB 121102. Vishal Gajjar, a postdoctoral researcher at the University of California, Berkeley, where Breakthrough Listen is based, observed the new bursts during a monitoring run. Over five hours of observation, he and his colleagues collected 400 terabytes of data over the 4 to 8 GHz frequency band, or C-band, which is mostly used for satellite communications transmissions.
Analysis of the data revealed the new repeating pulses and showed that the source is in a “heightened activity state,” the Astronomer's Telegram said. “Follow-on observations are encouraged, particularly at higher radio frequencies,” the team added.
Initial results indicate that FRBs emit at higher frequencies than previously observed, a discovery that should help scientists determine the source producing the bursts.
Several explanations for FRBs have been suggested. One is a cataclysmic event, such as a neutron star collapsing into a black hole. But such an event would produce only one burst and therefore does not explain the repeating pattern of FRB 121102. Another possible explanation is that they are coming from a young, highly magnetized neutron star, but so far nothing like this has been detected in this region of space.
Despite widespread speculation, the possibility of the signals coming from an advanced alien civilization has been largely ruled out.
The clear reading of this code (as the screenshot says) is that the inventory limit in Zork 1 is random, not a fixed number of items. Each item you pick up makes it more likely that you'll hit a "holding too many things" error. But since it's a random chance, you can just try again -- it might work next time.
This was passed around in a commentary cloud of "This game was unfair," "games in the 80s were terrible," and so on. (See this NeoGAF thread, for example.)
This is fascinating! I played Zork, as I played all the Infocom games, and I didn't remember this inventory detail. It felt dimly familiar when I was reminded of it, though.
Research time!
Is it true?
That's the first question, of course. Let's try it.
>i
You are carrying:
A rope
A nasty knife
A brass lantern (providing light)
A clove of garlic
A lunch
A brown sack
A glass bottle
The glass bottle contains:
A quantity of water
A jewel-encrusted egg
A leaflet

>get sword
You're holding too many things already!

>get sword
Taken.
It's true! It's true!
Here I'm playing Zork 1, revision 88, serial number 840726. This is by far the most common version you'll find today, because it was the version included on the Lost Treasures of Infocom CD. You can play it today on iOS (at least until iOS11 hits) or on GOG. Or I'm pretty sure you can find it with a web search.
But is it evil?
Well, that's a more complicated question.
If you look at the Twitter thread, you'll see that the code snippet is taken from this source code listing (wayback link from 2004). Let's look at a larger chunk of this file:
if( parent(noun) in player ) ?label4;
weight = QueryWeight(noun);
if( (weight + QueryWeight(player)) <= Load_max ) ?label4;
if( ~~vb ) ?label5;
print "Your load is too heavy";
if( Load_max >= Load_allowed ) ?label6;
print ", especially in light of your condition.";
jump label7;
.label6;
print ".";
.label7;
new_line;
.label5;
return 2;
.label4;
if( Verb ~= ##Take ) ?label8;
num_items = CCount(player);
if( num_items <= Maximum_held ) ?label8;
weight = num_items * Max_held_mult;
if( weight <= random(100) ) ?label8;
print "You're holding too many things already!";
new_line;
rfalse;
.label8;
move noun to player;
Here it becomes apparent that there are two independent limit tests when you pick up an object. First it checks the sum of your weight and the weight of what you're picking up. This is a constant test: the sum must be less than Load_max (100 pounds), or it displays the error "Your load is too heavy." (If you're wounded, the weight limit decreases.)
Then it checks the number of items you're holding. This is the randomized test, but there's a safety zone: anything up to Maximum_held (7 items) is safe. In fact, if you're carrying 7 items, the next TAKE command is safe. Once you're holding eight or more items, each further TAKE fails with a chance of num_items * Max_held_mult (N*8) as a percentage.
So this is already less evil than it looked at first. You can carry up to 100 pounds and eight items, but the item limit is soft -- you get some wiggle room on that.
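To put numbers on that soft limit, here is a quick sketch (Python, not Z-code) of the failure chance implied by the decompiled check above. It assumes random(100) is a uniform roll from 1 to 100 and uses the Maximum_held = 7 and Max_held_mult = 8 constants.

# Failure chance implied by the check above: safe while CCount(player) <= 7,
# otherwise the TAKE fails when num_items * 8 exceeds a roll of random(100).
MAXIMUM_HELD = 7
MAX_HELD_MULT = 8

def take_failure_chance(num_items_held):
    if num_items_held <= MAXIMUM_HELD:
        return 0.0
    threshold = num_items_held * MAX_HELD_MULT
    # random(100) is uniform on 1..100; the take fails when threshold > roll
    return min(threshold - 1, 100) / 100.0

for n in range(7, 13):
    print(f"holding {n} items: {take_failure_chance(n):.0%} chance the next TAKE fails")

That works out to roughly N*8 percent, as described above (off by one because the roll runs from 1 to 100): about 63% when you're holding eight items, climbing to a guaranteed failure once you're holding thirteen or more.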
More trivia:
Object weights range from 2 (the leaflet, matchbook, etc) to 55 (the gold coffin).
Weight is figured recursively -- objects in a container still count towards the weight limit.
Object count is not recursive, so you can work around the "too many things" error with careful sack-management (see the sketch after this list).
Worn clothing is counted as weight 1. But there is no clothing in Zork 1, so this doesn't help you much!
The original MIT Zork/Dungeon game, the predecessor to Zork 1/2/3, did not have the item count check (randomized or otherwise). It only checked total weight.
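To make the recursive-weight versus flat-count distinction concrete, here is a toy model (Python, not Z-code; every weight except the leaflet's is invented for illustration):

# Toy model of the two limits: weight is summed recursively through containers,
# but the item count only looks at what the player holds directly.
from dataclasses import dataclass, field

@dataclass
class Thing:
    name: str
    weight: int
    contents: list = field(default_factory=list)

def total_weight(thing):
    # Mirrors the recursive weight check: a container adds its contents' weight.
    return thing.weight + sum(total_weight(t) for t in thing.contents)

sack = Thing("brown sack", 3, [Thing("lunch", 4), Thing("clove of garlic", 2)])
held = [sack, Thing("leaflet", 2), Thing("brass lantern", 15)]

print(sum(total_weight(t) for t in held))  # 26 -- the sack's contents still count toward weight
print(len(held))                           # 3  -- but a CCount-style tally sees only three items

So stuffing things into the sack keeps the item count down, but buys you nothing against the weight limit.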
So is Zork 1 evil?
Of course it's evil! There's a thief who can walk into the room and kill you! Or steal vital equipment from you and hide it in an inaccessible room! Your lamp dies after 385 turns! Evil, sheesh.
But that's not a complete answer.
Remember that we are looking at the dawn of computer gaming history. The very idea that a videogame should be fair, or even winnable, was hazy. It was perfectly normal for a game to just get harder and harder until it killed you. (Think Pac-Man, Asteroids, etc.)
The adventure genre has always presented itself as "solvable", but of course that is itself a subjective standard. Through the 1980s, we took for granted that solving a game took repeated attempts -- death after death, retry after retry, mistake after mistake -- learning (hopefully) a little each time. Today we say "masocore"; back then it was just the way games were. The thief was an annoyance. "Unfair" sequencing, like being able to accidentally destroy or lose a crucial object, wasn't even worth a blink.
(Jason Dyer has been going through the adventure games of the 1970s; there's also the Digital Antiquarian. Those blog series give an excellent introduction to just how arbitrary, buggy, and poorly tested a lot of those early games were. Remember, Infocom's canon stood out for being much better than the rest. Honest.)
The soft item count limit, taken as a game mechanic, wasn't unreasonable. In particular, you can't say it was a trivial annoyance whose only purpose was to make the player retype a command. Wasting a turn is a meaningful penalty in Zork. Your lamp is slowly dying; the thief is out there wandering. If you run into that "too many items" error, you've failed at inventory management. You should have paid more attention and put something in your sack first.
Was inventory management a good game design idea? Well, no. I thought it was annoying and tedious then; I still think so today. But it was part of the Zork ethos, and the item limit was part of the inventory system.
Are we really looking at the Zork 1 source code?
Excellent question! Always question hot takes you see on Twitter. The answer is "yes, sort of."
The Zork file we've been discussing is not the source code that Lebling and Blank wrote at Infocom circa 1980. Rather, it is a disassembly of the Z-machine game file that Infocom sold. (And which can be found on the Lost Treasures CD, etc.)
The Z-machine format was never published by Infocom, but it was reverse-engineered around 1990 and is now well understood. You can find tools which disassemble the Infocom game files (and Inform game files, for that matter). I use txd, found in the ztools package.
Run txd on the Zork 1 file, and you'll see... raw Z-machine assembly code. The txd output for the function we've been discussing looks like:
L0003: GET_PARENT G76 -> -(SP)
JIN (SP)+,G6f [TRUE] L0007
CALL R0241 (G76) -> L03
CALL R0241 (G6f) -> -(SP)
ADD L03,(SP)+ -> -(SP)
JG (SP)+,G85 [FALSE] L0007
JZ L00 [TRUE] L0006
PRINT "Your load is too heavy"
JL G85,G86 [FALSE] L0004
PRINT ", especially in light of your condition."
JUMP L0005
L0004: PRINT "."
L0005: NEW_LINE
L0006: RET #02
L0007: JE G78,#5d [FALSE] L0008
CALL R0240 (G6f) -> L01
JG L01,G3b [FALSE] L0008
MUL L01,G3a -> L03
RANDOM #64 -> -(SP)
JG L03,(SP)+ [FALSE] L0008
PRINT "You're holding too many things already!"
NEW_LINE
RFALSE
L0008: INSERT_OBJ G76,G6f
If you compare this to the code above, you can see they behave the same. But the game file is compiled code. All the symbols -- the variable and function names -- have been stripped out.
So where did that original file, with its nice labels, come from? If you go back to the twitter thread, you'll see:
Holy crap, you pulled that literal code from my horrible, horrible decompiler output from the early 2000s!
Allen Garvin started with that raw disassembly. Then he laboriously figured out what each line did, and gave every function and variable an appropriate label. (Not, of course, the same labels that the Infocom authors used!)
As Allen's tweet implies, we've been looking at a crude, early attempt. He has a much cleaner Zork 1 source file posted today:
if (parent(noun) notin player) {
weight = QueryWeight(noun);
if (weight + QueryWeight(player) > Load_max) {
if (vb) {
print "Your load is too heavy";
if (Load_max < Load_allowed) {
print ", especially in light of your condition.";
} else {
print ".";
}
new_line;
}
return A_FAILURE;
}
}
if (action == ##Take) {
num_items = CCount(player);
if (num_items > Maximum_held) {
weight = num_items * Max_held_mult;
if (weight > random(100)) {
print "You're holding too many things already!";
new_line;
rfalse;
}
}
}
move noun to player;
It does exactly the same thing, but it's much more readable, right? It's also recompilable Inform 6 source code.
What about the Zork 2 stuff?
In the swirling tweet-gyre, Jason Scott (noted Infocom historian!) wrote:
The main thing is, I am coming to the conclusion that what this really is Zork II's FUMBLE spell effects on you.
(FUMBLE is one of the curses that the Wizard of Frobozz casts on you. But the Wizard doesn't show up until Zork 2.)
Jason points out that the Infocom games re-used parser code all the time. It was a big chunk, essentially a library, and the Infocom authors copied and pasted it from one game to the next. (Remember that detail about clothing?) In fact, if you look a little farther down the original Zork 1 file, you'll see:
.label8;
move noun to player;
give noun visited;
Zork2_deletion();
ScoreObj(noun);
rtrue;
In fact the whole Zork 1 parser is studded with Zork2_deletion() calls. But, on the other hand, we've seen that the randomizer really does take effect in Zork 1. So Jason is right about re-used code, but wrong about the FUMBLE curse theory. (Sorry!)
So what's going on here?
First of all, remember that Zork2_deletion is a label that Allen Garvin added. The compiled game file just has a do-nothing call to an empty function.
Why did Allen use that label? If you look at a disassembly of Zork 2, there's a very similar routine, but it has some extra code that can print the message "When you touch the [noun] it immediately disappears!" This has to do with the FANTASIZE curse, which makes you hallucinate fake objects.
It's pretty clear that both games (and probably Zork 3) were compiled with a common parser library. But the library was rigged with a lot of special curse conditions which were only compiled in when building Zork 2. (#ifdef code, we'd say today.)
The randomized item limit was not one of these curses. It really did apply in both games.
Most of Infocom's games had inventory limits, but the form varied.
Enchanter: Weight and item limit, just as in Zork 1.
Zork 3: Weight and item limit, with a twist: "Oh, no. The [obj1] slips from your arms while taking the [obj2] and both tumble to the ground."
Lurking Horror: Weight and item limit, but not randomized.
And so on.
(Yes, I keep all the Infocom game files in disassembled form, to answer questions just like this. But no, I'm not going to go through and catalog the inventory limits in every single one.)
Conclusions
This post has gotten really long and I haven't figured out a conclusion for it. (Yes, you say, like so many other blog posts...) Well, try this:
Detail matters. And comparing different versions of the same file can be surprisingly interesting.
I built a camera that snaps a GIF and ejects a little cartridge so you can hold a moving photo in your hand! I'm calling it the "Instagif NextStep". Don't ask me why I built it, it sounded like a fun challenge and I always wanted to hold a moving photo. If it wasn't obvious, I was inspired by the Polaroid OneStep.
What I love about these kinds of projects is that they involve a bunch of different skill sets and disciplines - hardware, software, 3D modeling, 3D printing, circuit design, mechanical/electrical engineering, design, fabrication etc that need to be integrated for it to work seamlessly. Ironically this is also what I hate about these kinds of projects :P
I have detailed all the steps here and have released all the code, eagle files and design files so you can create your own.
As always I underestimated the project which ended up being far more complex and taking up far more time than I had accounted for. Some would think I should have learned by now. If you want to check out some of these projects I'm alluding to, head over to my website http://shek.it
Also here's a short video of the camera in action: https://youtu.be/T71S8kZhzWw
Piecing it together
I thought I'd start off with the entire assembly and then break down the different elements. As you can see, everything is assembled from the base up in layers, which makes for easy assembly and quick disassembly for troubleshooting.
Designing in 3D
I enjoy this part of the design process as you have to visualize and imagine how things will fit, move, be assembled, be dismantled, in your head before even touching a CAD program. Paper sketches help a lot along the way.
There aren't many screws needed in this build; most parts snap together using small but powerful neodymium magnets.
The entire thing was modeled in Autodesk Fusion 360.
Ejection Mechanism
I spent quite a bit of time trying to figure out the ejection mechanism so the cartridge could slide out of the camera. As always it was important to me that the final camera wasn't too large so I needed something that saved space and still did the job. I experimented with linear actuators but eventually settled on a rack and pinion mechanism which I placed in such a way that it conserves space.
Another thing that I had trouble figuring out was how to prevent the cartridge from sliding out if the camera was tilted forward. It needed some kind of stopper in the front. I considered a mechanical arm but I didn't want to add an additional servo just for this purpose. I also considered adding a magnet at the back that would keep it in place. Eventually I found a simple, elegant solution which you can see prototyped in the GIF. Didn't need any additional mechanism :)
Underestimating Physics
This was the final sliding platform. The wheels were actually added later (post printing) when I realized I had completely underestimated and ignored friction!
The platform was getting badly stuck while sliding on the bottom rails it was placed on, causing the servo to burn out. After some design iterations, I came up with a solution that involved printing additional wheel attachments and gluing them to the existing platform with epoxy (seen above). The friction issue was solved!
Sketching up Layouts
I also spent a lot of time figuring out how to make the cartridge as small as possible. Ultimately the size of the cartridge would also define the size of the entire camera.
I did a bunch of sketches of the cartridge, laying out all the elements in different permutations to squeeze out every millimeter. This was also the time I spent pulling my hair out experimenting with different screens for the raspberry pi. I tried a bunch of screens in different sizes: 2.8", 3.5", 4".
Word of caution: A raspberry pi's SPI interface can't handle fast refresh rates on screens with resolutions higher than 320x240. Viewing videos/gifs will be dismal with huge lag and visible frame refreshes. This meant that 2.8" was the largest usable screen. I found this out the hard way.
With more sketches, I finally found a layout that would work while keeping the cartridge size small.
The Component Massacre
To achieve the desired cartridge layout, some amount of abuse was required of every component that needed to fit inside the cartridge:
1) PiTFT Screen - I dismantled the screen, removing all the header pins (which were adding a ton of height) and cutting off the excess PCB. The Pi was then soldered to the screen.
2) PowerBoost 1000C charger - Again, components that added unneeded height were removed: the micro USB and JST connectors. I soldered the wires straight to the breakout pins.
3) Raspberry Pi Zero W - The micro USB ports were sanded down to remove the ~2mm overhang they have over the PCB, reducing overall width. I figured as long as I only touch the casing and stay away from the contacts it should be fine. The HDMI port was massacred, with the top part of its casing removed completely. This reduced the overall height by another ~1.5mm. Also, the sides of the PCB were sanded down just a bit.
4) LiPo Battery - I spent a lot of time trying to find the perfect battery. The one I found had the perfect combination of size and amperage to work in the design.
5) Momentary switch - The 6mm momentary switch that powers the cartridge up was sanded down and its excess casing was clipped off. It is placed at the top edge of the cartridge right between the two copper charging plates.
Packed like Sardines
Here you can see the mutilated components being laid out inside the cartridge. There wasn't any wiggle room left in there! During final assembly, all these components were soldered together as one circuit and placed in the cartridge.
Charging the Cartridge
The cartridge has its own 400mAh LiPo battery. I used pogo pins to connect this battery via two tinned copper plates to the main power bank in the camera which is a portable 10000mAh supply. I used pogo pins as they provide some give while ensuring contact.
- When the cartridge is inside the camera, it is powered by the large power bank and its battery gets charged.
- When the cartridge is ejected, its own LiPo takes over to power it.
I followed the steps in this repo (https://github.com/NeonHorizon/lipopi) to get the system working in the cartridge. The repo explains it well.
Extending the Headers
The Raspberry Pi 3 also went through its fair share of alterations:
1) I removed the ethernet port and one of the dual USB ports. This was because the mounting holes of the screen were located exactly there and I needed these holes to screw the screen to the camera chassis.
2) The other problem that the screen caused was that it blocked all the pi's header pins even though it actually uses only about 8 pins. Since I needed access to the GPIOs, I extended the header pins out the other side, making plugging and unplugging wires a breeze.
Honeybees. Honeybees are acceptable only via surface transportation, and must bear special handling fees, in addition to regular postage. Mailpieces must be plainly marked on the address side with “Live Bees” and “Surface Only” or “Surface Mail Only.”
Via air transportation. Only queen honey bees may be shipped via air transportation. Each queen honeybee shipped via air transportation may be accompanied by up to eight attendant honeybees.
Via surface transportation. Queen honeybees shipped via surface transportation must bear special handling fees, in addition to regular postage.
Note: Boxes of approximately identical size, securely fastened to prevent separation in transit, may be accepted for mailing as a single parcel if the mailpiece does not exceed DMM limits for combined length and girth or exceed 70 pounds in weight.
On the night of March 11, 1437 A.D., in what is now modern-day Seoul, a new star appeared in the sky, seemingly out of nowhere. The newcomer shone for 14 days before fading into the darkness. Korean astronomers noted the mysterious star and its brief stint in the sky in their records. Centuries later, modern astronomers studying these records determined that what the Koreans had seen was a cosmic explosion called a nova. Novae occur in two-star systems, when a dead star, known as a white dwarf, starts eating away at its companion, a star like our sun. The white dwarf slowly builds a layer of hydrogen stolen from the other star over tens of thousands of years, and then ejects it all at once, producing an eruption of light 300,000 times brighter than the sun that can last for weeks.
Michael Shara and his colleagues have spent nearly 30 years looking for the star responsible for this nova. In a new paper published Wednesday in Nature, they say they've finally found it.
“It’s been like searching for a needle in a billion haystacks,” Shara said.
For most of their search, Shara, a curator in the American Museum of Natural History’s department of astrophysics; Richard Stephenson, a historian of ancient astronomical records at Durham University; and Mike Bode, an astrophysicist at Liverpool John Moores University, focused on a part of the sky where they suspected the mystery star must lurk. The investigation was an on-again, off-again effort of “failure after failure after failure,” one that they returned to when they had the time or a lead.
Last year, Shara found some relevant files in his office that he hadn’t looked at in nearly a decade, and decided to expand the search area in the sky. He started combing through digital databases of stars, looking for any interesting targets. In one astronomical catalog, he saw a well-known planetary nebula, a glowing shell of gas and dust. In a different catalog, he found an image of a binary star taken in 2016 in the same area. Then it hit him: That wasn’t a planetary nebula. It was the leftover shell of a nova explosion, floating near the star system that produced it.
“On the one hand, I was pretty ecstatic,” Shara said of this moment. “On the other hand, I was sort of pounding my forehead a little bit with the back of my hand like, oh my god, I could have found this decades ago, if I had just expanded the search a little bit.”
Shara knew if he could find other images of this star from decades ago and compare it to this newfound photo, he could track its movement over time, rewind the clock, and determine its position nearly 600 years ago, when the Korean astronomers made their observations. Photographs from the 1980s and 1990s showed the star had indeed moved between then and now, but not enough for the researchers to accurately calculate its position in the 15th century. So they looked further back in time, at a photographic plate from 1923 they got through a Harvard project to digitize a century’s worth of sky observations. They measured how much the star moved in the last century, and extrapolated that to five more centuries. As they rewound the clock, the star moved closer to the nova shell. According to their calculations, the star was right in the center of the nova shell on the day the Koreans saw a mystery star, which means it caused the cosmic explosion.
Shara kept up his search for decades because he wanted to confirm a theory on nova explosions he proposed in 1986, when he was working at the Space Telescope Science Institute at Johns Hopkins. After a classical nova, the process starts all over again. The white dwarf begins again to chip away at its companion. Smaller explosions, known as dwarf novae, may occur as it gathers its fellow star's material. Eventually, there's enough stolen material to produce another full-sized nova. Shara posited that dwarf novae and full-sized novae are the same systems, seen at different times, rather than entirely separate objects, as some astronomers have suggested. Now, the search has paid off. Shara found that the star system responsible for the nova in 1437 A.D. shows dwarf novae in photos from the 1930s and 1940s, which supports his claim that both phenomena originate from a single source.
Shara said he and his colleagues couldn’t have investigated this mystery without online astronomical catalogs, which weren’t around in the 1980s. The catalogs are like Google search engines for hundreds of millions of stars.
“If huge databases had existed then, I would have unraveled it right then and there,” he said. “It was only because of this ability to check something like a million astronomical papers and books and many petabytes of images, all within a few minutes, and correlate things with each other that I was able to find this.”
I am in search of the simplest React Stack. It has to be future-proof and feature-packed, but most importantly it has to be a breeze to work with.
In this series I will look for easy-to-use solutions to implement a buffed version of the classic React TodoMVC. When I think I can achieve something simpler, I will create helper libraries myself. The following topics will be likely covered before the finale.
Bootstrapping
State Management
Routing
Styling
Networking
Testing
Hosting
The episodes will be short and … simple.
Bootstrapping and Bundling
Create React App is hard to beat here. It sets up a lot of hidden goodies with a few commands only.
npm install create-react-app -g
create-react-app todos
cd todos
npm start
This creates an offline-first Progressive Web App with ES6+ support, auto-prefixed CSS and a watching dev server. Sooo much good tech in one sentence! Awesome.
You can test and build the code later with the npm test / npm run build commands. If your project outgrows Create React App, just execute npm run eject and take over control.
That's it! Let's jump straight into development by editing the src folder created by Create React App.