Channel: Hacker News

Chris Lattner Joins Google Brain


Chris Lattner, one of the key creators behind the Apple programming language Swift, is on the move again. After a short six-month stay at Tesla, which he joined last year from Apple to act as VP of Autopilot Software, Lattner announced on Twitter today that his next stop is Google Brain.

Lattner, who worked for more than a decade on low-level software and systems at Apple, revealed in June that he wasn’t going to be staying on at Tesla after finding that it wasn’t “a good fit.” Lattner then joked that his resume was “easy to find online,” and noted his top qualification: Seven years of Swift experience, which is the longest anyone not on his immediate team at Apple can reasonably claim without outright lying.

Swift isn’t Lattner’s only major contribution to the world of programming: prior to his helping hand with Apple’s latest coding language, he created the Clang compiler and LLVM. In other words, you’d be hard-pressed to find a modern developer whose work hasn’t been touched at a fundamental level by something Lattner has created in the past.

Google Brain is the search giant’s team focused on deep learning and artificial intelligence. It focuses on applying AI across a range of products, tackling both research and product integration, and works with teams across Alphabet, including DeepMind. Its ultimate stated motivation is to advance the field through open source projects, academic collaboration, and publication.


64-bit Firefox is the new default on 64-bit Windows


Users on 64-bit Windows who download Firefox will now get our 64-bit version by default. That means they’ll install a more secure version of Firefox, one that also crashes a whole lot less. How much less? In our tests so far, 64-bit Firefox reduced crashes by 39% on machines with 4GB of RAM or more.

64-bit Firefox has more security and fewer crashes

What’s the difference between 32-bit and 64-bit?

Here’s the key thing to know: 64-bit applications can access more memory and are less likely to crash than 32-bit applications. Also, with the jump from 32 to 64 bits, a security feature called Address Space Layout Randomization (ASLR) works better to protect you from attackers.
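The underlying principle is easy to see from any language: a pointer is 4 bytes wide in a 32-bit process and 8 bytes wide in a 64-bit one, which is what determines how much memory the process can address. A minimal Python sketch (it checks the interpreter's own bitness, not Firefox's):

```python
import platform
import struct

# A pointer ("P") occupies 4 bytes in a 32-bit process and 8 bytes in a
# 64-bit one, so its size reveals the process's address-space width.
bits = struct.calcsize("P") * 8
print(f"This interpreter runs as a {bits}-bit process")

# platform.architecture() reports the same fact for the binary itself.
print(platform.architecture()[0])  # '32bit' or '64bit'
```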

macOS and Linux users, fret not. You already enjoy a Firefox that’s optimized for 64-bit.

How do you get 64-bit Firefox?

If you’re running 64-bit Windows (here’s how to check), you have two options to get your computer hooked up with this improved Firefox experience:

  1. You can download Firefox now and reinstall, which will automatically install Firefox 64-bit; or
  2. You can wait. We intend to migrate the remaining 64-bit Windows users to a 64-bit version of Firefox with a future release. If you prefer to stay with 32-bit Firefox after the 64-bit migration, you can simply download and re-run the Firefox 32-bit installer from the Firefox platforms and languages download page.

A Solution of the P versus NP Problem


(Submitted on 11 Aug 2017)

Abstract: Berg and Ulfberg and Amano and Maruoka have used CNF-DNF-approximators to prove exponential lower bounds for the monotone network complexity of the clique function and of Andreev's function. We show that these approximators can be used to prove the same lower bound for their non-monotone network complexity. This implies P not equal NP.
From: Norbert Blum [view email]
[v1] Fri, 11 Aug 2017 09:38:56 GMT (23kb)

Macie: Automatically Discover, Classify, and Secure Content at Scale


When Jeff and I heard about this service, we both were curious about the meaning of the name Macie. Jeff, being a great researcher, looked up the name and found that Macie has two meanings. It has both French and English (UK) origins and is typically a girl’s name. The first meaning of Macie that was found said that the name meant “weapon”. The second meaning noted the name was representative of a person that is bold, sporty, and sweet. In a way, these definitions are appropriate, as today I am happy to announce that we are launching Amazon Macie, a new security service that uses machine learning to help identify and protect sensitive data stored in AWS from breaches, data leaks, and unauthorized access, with Amazon Simple Storage Service (S3) being the initial data store. Therefore, I can imagine that Amazon Macie could be described as a bold weapon for AWS customers, providing a sweet service with a sporty user interface that helps to protect against malicious access of your data at rest. Whew, that was a mouthful, but I unbelievably got all the Macie descriptions out in a single sentence! Nevertheless, I am thrilled to share with you the power of the new Amazon Macie service.

Amazon Macie is a service powered by machine learning that can automatically discover and classify your data stored in Amazon S3. But Macie doesn’t stop there: once your data has been classified, Macie assigns each data item a business value and then continuously monitors the data in order to detect any suspicious activity based upon access patterns. Key features of the Macie service include:

  • Data Security Automation: analyzes, classifies, and processes data to understand historical patterns, user authentication to data, data access locations, and times of access.
  • Data Security & Monitoring: actively monitors usage log data for anomaly detection, along with automatic resolution of reported issues through CloudWatch Events and Lambda.
  • Data Visibility for Proactive Loss Prevention: provides management visibility into details of stored data while providing immediate protection without the need for manual customer input.
  • Data Research and Reporting: allows administrative configuration for reporting and alert management requirements.

How does Amazon Macie accomplish this, you ask?

Using machine learning algorithms for natural language processing (NLP), Macie can automate the classification of data in your S3 buckets. In addition, Amazon Macie takes advantage of predictive analytics algorithms so that data access patterns can be dynamically analyzed. These learnings are then used to inform you of, and alert you to, possible suspicious behavior. Macie also runs an engine specifically built to detect common sources of personally identifiable information (PII) or sensitive personal information (SP). Macie takes advantage of AWS CloudTrail, continuously checking CloudTrail events for PUT requests in S3 buckets and automatically classifying new objects in near real time.
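To make the idea of PII detection concrete, here is a deliberately simplified sketch in Python. Macie's real engine uses machine learning rather than fixed patterns; the regexes and category names below are illustrative assumptions only:

```python
import re

# Toy illustration only: Macie's real detection engine is ML- and
# NLP-based, while this sketch scans content with fixed regexes just
# to convey the idea of flagging sensitive patterns in object data.
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text):
    """Return the sorted PII categories detected in a blob of text."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))

print(find_pii("Contact jane@example.com, SSN 123-45-6789"))  # ['email', 'ssn']
```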

While Macie is a powerful tool for security and data protection in the AWS cloud, it can also aid you with governance, compliance requirements, and/or audit standards. Many of you may already be aware of the EU’s most stringent privacy regulation to date, the General Data Protection Regulation (GDPR), which becomes enforceable on May 25, 2018. As Amazon Macie recognizes personally identifiable information (PII) and provides customers with dashboards and alerts, it will enable customers to comply with GDPR regulations around encryption and pseudonymization of data. When combined with Lambda queries, Macie becomes a powerful tool to help remediate GDPR concerns.
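As a sketch of how such Lambda-based remediation could be wired up, the handler below reacts to a hypothetical Macie alert delivered via CloudWatch Events. The event field names (`risk-score`, `resource`) are invented for illustration and are not the documented Macie alert schema:

```python
# Sketch of the "automatic resolution" path: a Lambda handler invoked by
# a CloudWatch Events rule matching Macie alerts. The field names in the
# event payload are illustrative assumptions, not Macie's real schema.

def handler(event, context=None):
    detail = event.get("detail", {})
    risk = detail.get("risk-score", 0)
    bucket = detail.get("resource", "unknown")
    if risk >= 8:
        # A real handler might tighten the bucket policy or page an
        # operator here; this sketch just reports the decision.
        return {"action": "quarantine", "bucket": bucket, "risk": risk}
    return {"action": "log-only", "bucket": bucket, "risk": risk}

# Example payload, structure assumed for illustration:
sample_event = {
    "source": "aws.macie",
    "detail": {"risk-score": 9, "resource": "my-sensitive-bucket"},
}
print(handler(sample_event))
```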

Tour of the Amazon Macie Service

Let’s take a tour of the service and look at Amazon Macie up close.

First, I will log onto the Macie console and start the process of setting up Macie, so that I can start my data classification and protection, by clicking the Get Started button.


As you can see, to enable the Amazon Macie service, I must have the appropriate IAM roles created for the service, and additionally I will need to have AWS CloudTrail enabled in my account.

I will create these roles and turn on the AWS CloudTrail service in my account. To make setting up Macie easier, you can take advantage of the sample CloudFormation template provided in the Macie User Guide, which will set up the required IAM roles and policies for you; you would then only need to set up a trail as noted in the CloudTrail documentation.

If you have multiple AWS accounts, you should note that the account you use to enable the Macie service will be designated as the master account; you can integrate other accounts with the Macie service, but they will have the member account designation. Users from member accounts will need to use an IAM role to federate access to the master account in order to access the Macie console.

Now that my IAM roles are created and CloudTrail is enabled, I will click the Enable Macie button to start Macie’s data monitoring and protection.


Once Macie has finished starting in your account, you will be brought to the service’s main screen, and any existing alerts in your account will be presented to you. Since I have just started the service, I currently have no existing alerts.


Considering we are doing a tour of the Macie service, I will now integrate some of my S3 buckets with Macie. However, you do not have to specify any S3 buckets for Macie to start monitoring, since the service already uses the AWS CloudTrail Management API to analyze and process information. For this tour of Macie, I have decided to monitor some object-level API events from certain buckets in CloudTrail.

In order to integrate with S3, I will go to the Integrations tab of the Macie console. Once on the Integrations tab, I will see two options: Accounts and Services. The Accounts option is used to integrate member accounts with Macie and to set your data retention policy. Since I want to integrate specific S3 buckets with Macie, I’ll click the Services option to go to the Services tab.


When I integrate Macie with the S3 service, a trail and an S3 bucket will be created to store logs about S3 data events. To get started, I will use the Select an account drop-down to choose an account. Once my account is selected, the services available for integration are presented. I’ll select the Amazon S3 service by clicking the Add button.

Now I can select the buckets that I want Macie to analyze. Selecting the Review and Save button takes me to a screen where I confirm that I want object-level logging by clicking the Save button.

Next, on our Macie tour, let’s look at how we can customize data classification with Macie.

As we discussed, Macie will automatically monitor and classify your data. Once Macie identifies your data, it will classify your data objects by file and content type. Macie will also use a support vector machine (SVM) classifier to classify the content within S3 objects, in addition to the metadata of the file. In machine learning, support vector machines are supervised learning models with learning algorithms used for classification and regression analysis of data. Macie trained its SVM classifier on a corpus of varying content types, optimized to support accurate detection of data content, even including source code you may write.

Macie will assign only one content type per data object or file; however, you have the ability to enable or disable content types and file extensions in order to include or exclude them from Macie’s classification. Once Macie classifies the data, it will assign the object a risk level between 1 and 10, with 10 being the highest risk and 1 the lowest.
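A toy model of this per-object classification and scoring, with an invented extension-to-risk mapping (Macie's actual classifications come from the SVM content classifier plus file metadata, and its risk assignments are its own):

```python
import os

# Illustrative sketch of one-content-type-per-object classification and
# 1-10 risk scoring. The extension-to-risk table is invented for
# illustration; it is not Macie's real mapping.
RISK_BY_EXTENSION = {
    ".pem": 10,  # private key material: highest risk
    ".csv": 7,   # may contain exported PII
    ".py": 4,    # source code
    ".png": 1,   # images: lowest risk
}

def classify(key, disabled_extensions=frozenset()):
    """Assign one content type (by extension) and a risk level from 1 to 10."""
    ext = os.path.splitext(key)[1].lower()
    if ext in disabled_extensions:  # mirrors disabling an extension in Settings
        return None                 # object excluded from classification
    return {"object": key, "content_type": ext or "unknown",
            "risk": RISK_BY_EXTENSION.get(ext, 5)}

print(classify("exports/customers.csv"))
print(classify("app.apk", disabled_extensions={".apk"}))  # None
```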

To customize the classification of our data with Macie, I’ll go to the Settings Tab. I am now presented with the choices available to enable or disable the Macie classifications settings.


As an example during our tour of Macie, I will choose File extension and am presented with the list of file extensions that Macie tracks and uses for classification.

As a test, I’ll edit the apk file extension for Android application install files, and disable monitoring of this file type by selecting No – disabled from the drop-down and clicking the Save button. Of course, I will turn this back on later, since I want to keep my entire collection of data files safe, including my Android development binaries.


One last thing I want to note about data classification using Macie: the service provides visibility into how your data objects are being classified, and highlights how critical or important your stored data assets are for compliance, for personal data, and for your business.

Now that we have explored the data that Macie classifies and monitors, the last stop on our service tour is the Macie dashboard.

The Macie Dashboard provides us with a complete picture of all of the data and activity that has been gathered as Macie monitors and classifies our data. The dashboard displays Metrics and Views grouped by categories to provide different visual perspectives of your data. Within these dashboard screens, you can also go from a metric perspective directly to the Research tab to build and run queries based on the metric. These queries can be used to set up customized alerts for notification of any possible security issues or problems. We won’t have an opportunity to tour the Research or Alerts tabs, but you can find out more information about these features in the Macie user guide.

Turning back to the Dashboard, there are so many great resources in the Macie Dashboard that we will not be able to stop at each view, metric, and feature during our tour, so let me give you an overview of all the features of the dashboard that you can take advantage of using.

Dashboard Metrics – monitored data grouped by the following categories:

  • High-risk S3 objects: data objects with risk levels of 8 through 10.
  • Total event occurrences: total count of all event occurrences since Macie was enabled.
  • Total user sessions: 5-minute snapshots of CloudTrail data.

Dashboard Views – views to display various points of the monitored data and activity:

  • S3 objects for a selected time range
  • S3 objects
  • S3 objects by personally identifiable information (PII)
  • S3 objects by ACL
  • CloudTrail events and associated users
  • CloudTrail errors and associated users
  • Activity location
  • AWS CloudTrail events
  • Activity ISPs
  • AWS CloudTrail user identity types

Summary

Well, that concludes our tour of the new and exciting Amazon Macie service. Amazon Macie is a sensational new service that uses the power of machine learning and deep learning to aid you in securing, identifying, and protecting your data stored in Amazon S3. Using natural language processing (NLP) to automate data classification, Amazon Macie enables you to easily get started with high-accuracy classification and immediate protection of your data by simply enabling the service. The interactive dashboards give visibility to the where, what, who, and when of your information, allowing you to proactively analyze massive streams of data, data accesses, and API calls in your environment. Learn more about Amazon Macie by visiting the product page or the documentation in the Amazon Macie user guide.

Tara

Everything You Always Wanted to Know About Optical Networking [pdf]

'Instantly rechargeable' battery could change the future of electric automobiles


WEST LAFAYETTE, Ind. – A technology developed by Purdue researchers could provide an “instantly rechargeable” method that is safe, affordable and environmentally friendly for recharging electric and hybrid vehicle batteries through a quick and easy process similar to refueling a car at a gas station.

The innovation could expedite the adoption of electric and hybrid vehicles by eliminating the time needed to stop and re-charge a conventional electric car’s battery and dramatically reducing the need for new infrastructure to support re-charging stations.

John Cushman, Purdue University distinguished professor of earth, atmospheric and planetary science and a professor of mathematics, presented the research findings “Redox reactions in immiscible-fluids in porous media – membraneless battery applications” at the recent International Society for Porous Media 9th International Conference in Rotterdam, Netherlands.

Cushman co-founded Ifbattery LLC (IF-battery) to further develop and commercialize the technology.

“Electric and hybrid vehicle sales are growing worldwide and the popularity of companies like Tesla is incredible, but there continues to be strong challenges for industry and consumers of electric or hybrid cars,” said Cushman, who led the research team that developed the technology. “The biggest challenge for industry is to extend the life of a battery’s charge and the infrastructure needed to actually charge the vehicle. The greatest hurdle for drivers is the time commitment to keeping their cars fully charged.”

Current electric cars need convenient locations built for charging ports.

“Designing and building enough of these recharging stations requires massive infrastructure development, which means the energy distribution and storage system is being rebuilt at tremendous cost to accommodate the need for continual local battery recharge,” said Eric Nauman, co-founder of Ifbattery and a Purdue professor of mechanical engineering, basic medical sciences and biomedical engineering. “Ifbattery is developing an energy storage system that would enable drivers to fill up their electric or hybrid vehicles with fluid electrolytes to re-energize spent battery fluids much like refueling their gas tanks.”

The spent battery fluids or electrolyte could be collected and taken to a solar farm, wind turbine installation or hydroelectric plant for re-charging.

“Instead of refining petroleum, the refiners would reprocess spent electrolytes and instead of dispensing gas, the fueling stations would dispense a water and ethanol or methanol solution as fluid electrolytes to power vehicles,” Cushman said. “Users would be able to drop off the spent electrolytes at gas stations, which would then be sent in bulk to solar farms, wind turbine installations or hydroelectric plants for reconstitution or re-charging into the viable electrolyte and reused many times. It is believed that our technology could be nearly ‘drop-in’ ready for most of the underground piping system, rail and truck delivery system, gas stations and refineries.”

Mike Mueterthies, Purdue doctoral teaching and research assistant in physics and the third co-founder of Ifbattery, said the flow battery system makes the Ifbattery system unique.

“Other flow batteries exist, but we are the first to remove membranes which reduces costs and extends battery life,” Mueterthies said. 

Ifbattery’s membrane-free battery demonstrates other benefits as well.

“Membrane fouling can limit the number of recharge cycles and is a known contributor to many battery fires,” Cushman said. “Ifbattery’s components are safe enough to be stored in a family home, are stable enough to meet major production and distribution requirements and are cost effective.”

Ifbattery licensed part of the technology through the Purdue Research Foundation Office of Technology Commercialization and has developed patents of its own. The company is a member of the Purdue Startup Class of 2017. Click https://youtu.be/LskSvhrjSjE to view a video about the company.

“We are at a stage in the company’s growth that we are looking for additional financing to build large-scale prototypes and subsequently manufacturing partners,” Cushman said.

Writer: Cynthia Sequin, 765-588-3340, casequin@prf.org  

Sources:  John Cushman, 765-494-8040, drbigjohn@gmail.com

Eric Nauman, enauman@purdue.edu

Michael Mueterthies, mmuetert@purdue.edu

Launch HN: Lambda School (YC S17) – CS education that's free until you get a job

Hey HN,

We're Lambda School (https://lambdaschool.com/computer-science). We train people to become software engineers, and we charge nothing until a student gets a software job that pays more than $50k/yr. At that point we take 17% of income for two years (capped at a maximum of $30k total).

There are so many people held back from a high quality education simply because they can't afford the cost and/or risk. Even if you can get student loans, four years and a potential six figures of student loans is a daunting proposition, especially if you come from a lower-class background. New alternatives, such as code bootcamps, either require expensive loans or tens of thousands of dollars in cash up front, which most people don't have, and they vary widely in quality. This leaves a lot of very smart people working for not much money.

We're different. We're an educational institution that owns the risk: if you don't get a good job, we don't get paid. We do everything in small, interactive, online classes with world-class instructors (currently from Stanford, Berkeley, Hack Reactor, etc.). Our curriculum goes a lot deeper than code bootcamps as well; we use C++ and spend a lot of time with lower-level algorithms, data structures, architecture, scaling, etc.

The full curriculum is here: https://github.com/LambdaSchool/LambdaCSA-Syllabus. Happy to answer any questions and looking forward to hearing feedback!

“Packages should be reproducible” added to Debian Policy

#844431 - debian-policy: Packages should be reproducible - Debian Bug report logs



Report forwarded to debian-bugs-dist@lists.debian.org, reproducible-builds@lists.alioth.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Tue, 15 Nov 2016 17:30:04 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
New Bug report received and forwarded. Copy sent to reproducible-builds@lists.alioth.debian.org, Debian Policy List <debian-policy@lists.debian.org>. (Tue, 15 Nov 2016 17:30:04 GMT) (full text, mbox, link).


Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

Package: debian-policy
Version: 3.9.8.0
X-Debbugs-Cc: reproducible-builds@lists.alioth.debian.org

Dear Policy maintainers,

Whilst anyone can inspect the source code in Debian for malicious
flaws, we distribute pre-compiled binaries to end users. The motivation behind
the Reproducible Builds effort is to permit verification that no flaws
have been introduced — either maliciously or accidentally — during this
compilation process by promising identical results are always generated
from a given source, thus allowing multiple third-parties to come to a
consensus on whether a build was compromised.

Debian has been making great strides to make itself reproducible,
contributing hundreds of patches, not only within Debian itself but also to
upstream projects. We have also been running a comprehensive and non-
trivial CI framework to test for reproducibility of packages for quite
some time.

However, the recent arrival of the final pieces of the toolchain into
unstable encourages me to propose that we add a recommendation that
packages in Debian should be reproducible.

This would act both as documentation of a modern best practice and
as a "placeholder" so that we can increase its severity at some
future date.

[As a mild suggestion to streamline this; we should probably come to some
consensus on principle of this addition to Policy first and only then
move to the more difficult topic of defining exactly what reproducibility
means in a technical sense.]


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-
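The verification promise described in this report (identical results always generated from a given source) boils down to comparing digests of independently built artifacts. A minimal Python sketch, with placeholder bytes standing in for two builders' outputs:

```python
import hashlib

# Two independent rebuilds from the same source should yield bit-identical
# artifacts; multiple third parties can then come to a consensus that a
# build was not tampered with simply by comparing digests. The byte
# strings below are placeholders for real package binaries.
def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

build_a = b"\x7fELF deterministic output"  # artifact from builder A (placeholder)
build_b = b"\x7fELF deterministic output"  # artifact from builder B (placeholder)

print("reproducible" if digest(build_a) == digest(build_b) else "mismatch")
```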



Changed Bug title to 'debian-policy: Packages should be reproducible' from 'Packages should be reproducible'. Request was from Holger Levsen <holger@layer-acht.org> to control@bugs.debian.org. (Tue, 15 Nov 2016 17:57:05 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Tue, 15 Nov 2016 19:45:02 GMT) (full text, mbox, link).


Acknowledgement sent to Henrique de Moraes Holschuh <hmh@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Tue, 15 Nov 2016 19:45:02 GMT) (full text, mbox, link).


Message #12 received at 844431@bugs.debian.org (full text, mbox, reply):

On Tue, 15 Nov 2016, Chris Lamb wrote:
> [As a mild suggestion to streamline this; we should probably come to some
> consensus on principle of this addition to Policy first and only then
> move to the more difficult topic of defining exactly what reproducibility
> means in a technical sense.]

I don't think there will be much of a contention about this.

Please propose wording (i.e. the diff to the policy text), but I
recommend that you do *not* use "should" or "must" to make such
reproducibility mandatory right now, only to define stuff like "*if* it
is built for reproducibility, it must do so in such a way that...", etc.

Enforcing package reproducibility (should/must in policy) has to wait
until a majority of the package is effectively being reproducibly built
for a small while (to shake out any issues), and the tooling ecosystem
is complete so that it is actually usable to verify things.  IMHO, this
would be best done only after stretch is released, even if we reach >85%
reproducibility levels *and* a full, working toolset before that.

As a suggestion, since a "may build reproducibly" policy is not going to
give the readers the desired idea, the policy text proposal could use
words to the effect that "it is recommended that", and "in the future,
this will become a requirement".

Any packages that absolutely cannot be built in a reproducible way[1],
can become officially allowed exceptions -- and we could likely teach the
verification tools that specific regions of a package/file are to be
random, and ignore those when comparing for reproducibility, too.  But
this would be tackled on in the future, between an already implemented
policy of SHOULD is out, and >95% of the packages are being built
reproducibly and policy is about to be changed to MUST.  Therefore, the
initial proposal just needs to acknowledge that this fact could happen
and will be dealt with in time.

[1] Such as random noise added to kernel and firmware data structures
during local builds, to be used as a last defense to avoid the *herd
using same keys* effects, etc.

-- 
  Henrique Holschuh



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Thu, 17 Nov 2016 14:30:10 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Thu, 17 Nov 2016 14:30:10 GMT) (full text, mbox, link).


Message #17 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
Henrique de Moraes Holschuh wrote:
> I don't think there will be much of a contention about this.

Great :)

> Please propose wording (i.e. the diff to the policy text), but
> I recommend that you do *not* use "should" or "must" to make such
> reproducibility mandatory right now.

Completely agreed. Any requirement would be counter-productive and
ultimately premature at this stage.

I've attached an initial wording to get us going. I'm not 100% convinced
with it myself but it should help start any discussion in this area.


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-
[debian-policy.diff.txt (text/plain, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 07 May 2017 15:39:03 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 07 May 2017 15:39:03 GMT) (full text, mbox, link).


Message #22 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
hi,

unsurprisingly I'm also in favor of making this policy change, now.

I also believe there is quite a consensus (definitely a rough one…) in Debian
for making this change, judging by the feedback we got at 3 DebConfs since 2013,
several mini Debconfs and other events, plus the general feedback in the form
of code merges and uploads.

At the Reproducible Builds Hackathon in Hamburg we were reminded of the former
DPL asking DDs to be "more bold" doing sensible changes forward, and as such
we plan that starting with the development phase of "buster" we'll consider
bugs about reproducible builds issues to be of severity "normal", not "wishlist".

This shall be announced on d-d-a soon, given there is no disagreement on this
procedure in this bug.

Last and least for now: the wording of
https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=844431;filename=debian-policy.diff.txt;msg=17
IMO is almost good as it is, though I'll try to amend it to include the
definition of reproducible builds from reproducible-builds.org. 


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 07 May 2017 17:18:03 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 07 May 2017 17:18:03 GMT) (full text, mbox, link).


Message #27 received at 844431@bugs.debian.org (full text, mbox, reply):

Hi Holger,

> unsurprisingly I'm also in favor of making this policy change, now.

Actually, yes, why were we waiting for stretch to be released? :)

> Last and least for now: the wording of
> https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=844431;filename=debian-policy.diff.txt;msg=17
> IMO is almost good as it is, though I'll try to amend it to include the
> definition of reproducible builds from reproducible-builds.org.

That seems the next concrete step.


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 07 May 2017 20:57:04 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Shahaf <danielsh@apache.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 07 May 2017 20:57:04 GMT) (full text, mbox, link).


Message #32 received at 844431@bugs.debian.org (full text, mbox, reply):

Chris Lamb wrote on Thu, Nov 17, 2016 at 12:30:44 +0100:
> +++ b/policy.sgml
> @@ -2503,6 +2503,20 @@ endif
> +      <sect id="readmesource">

Note that the id should be changed before applying, since there already
is a sect with this id value.



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Thu, 11 May 2017 12:57:02 GMT) (full text, mbox, link).


Acknowledgement sent to Bill Allombert <ballombe@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Thu, 11 May 2017 12:57:02 GMT) (full text, mbox, link).


Message #37 received at 844431@bugs.debian.org (full text, mbox, reply):

On Sun, May 07, 2017 at 03:35:00PM +0000, Holger Levsen wrote:
> hi,
> 
> unsurprisingly I'm also in favor of making this policy change, now.
> 
> I also believe there is quite a consensus (definitly a rough one…) in Debian
> for making this change, judging by the feedback we got at 3 DebConfs since 2013,
> several mini Debconfs and other events, plus the general feedback in the form
> of code merges and uploads.
> 
> At the Reproducible Builds Hackathon in Hamburg we were reminded of the former
> DPL asking DDs to be "more bold" doing sensible changes forward, and as such
> we plan that starting with the development phase of "buster" we'll consider
> bugs about reproducible builds issues to be of severity "normal", not "wishlist".

I really think there should be an official tool to build packages
reproducibly with an interface like cowbuilder.

Currently, there is too much uncertainty about the process for bug
reports to be of severity normal.

Cheers,
-- 
Bill. <ballombe@debian.org>




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 14 May 2017 14:39:06 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 May 2017 14:39:06 GMT) (full text, mbox, link).


Message #42 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Thu, May 11, 2017 at 02:42:43PM +0200, Bill Allombert wrote:
> I really think there should be an official tool to do build packages
> reproducibly with an interface like cowbuilder.
the official tool to build packages reproducibly in sid is called
"dpkg-buildpackage" (since dpkg 1.18.16, in sid since 2016-12-17).


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 14 May 2017 14:54:05 GMT) (full text, mbox, link).


Acknowledgement sent to Bill Allombert <ballombe@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 May 2017 14:54:05 GMT) (full text, mbox, link).


Message #47 received at 844431@bugs.debian.org (full text, mbox, reply):

On Sun, May 14, 2017 at 02:36:46PM +0000, Holger Levsen wrote:
> On Thu, May 11, 2017 at 02:42:43PM +0200, Bill Allombert wrote:
> > I really think there should be an official tool to do build packages
> > reproducibly with an interface like cowbuilder.
> 
> the official tool to build packages reproducible in sid is called
> "dpkg-buildpackage" (since dpkg 1.18.16 in sid since 2016-12-17).

So if your package builds with "dpkg-buildpackage" then the build is
reproducible and any bug report to the contrary is in error ? 

Cheers,
-- 
Bill. <ballombe@debian.org>




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 14 May 2017 15:03:05 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 May 2017 15:03:05 GMT) (full text, mbox, link).


Message #52 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Sun, May 14, 2017 at 04:51:47PM +0200, Bill Allombert wrote:
> > the official tool to build packages reproducible in sid is called
> > "dpkg-buildpackage" (since dpkg 1.18.16 in sid since 2016-12-17).
> So if your package builds with "dpkg-buildpackage" then the build is
> reproducible and any bug report to the contrary is in error ?

almost. 93% of the packages in stretch today can be rebuilt bit by bit
identically.

that's why we're now aiming at "packages should be reproducible" and not
for "must be reproducible"… (but plan to later aim for "must be").


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 14 May 2017 15:06:08 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 May 2017 15:06:08 GMT) (full text, mbox, link).


Message #57 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Sun, May 07, 2017 at 06:15:38PM +0100, Chris Lamb wrote:
> > unsurprisingly I'm also in favor of making this policy change, now.
> Actually, yes, why were we waiting for stretch to be released? :)

good question. I guess because of a mental barrier against doing changes
targeted post-stretch now :)
 
> > Last and least for now: the wording of
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=844431;filename=debian-policy.diff.txt;msg=17
> > IMO is almost good as it is, though I'll try to amend it to include the
> > definition of reproducible builds from reproducible-builds.org.
> That seems the next concrete step.

indeed! I will try to work on this in the next few days…


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 14 May 2017 15:06:13 GMT) (full text, mbox, link).


Acknowledgement sent to Bill Allombert <ballombe@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 May 2017 15:06:13 GMT) (full text, mbox, link).


Message #62 received at 844431@bugs.debian.org (full text, mbox, reply):

On Sun, May 14, 2017 at 02:58:27PM +0000, Holger Levsen wrote:
> On Sun, May 14, 2017 at 04:51:47PM +0200, Bill Allombert wrote:
> > > the official tool to build packages reproducible in sid is called
> > > "dpkg-buildpackage" (since dpkg 1.18.16 in sid since 2016-12-17).
> > So if your package builds with "dpkg-buildpackage" then the build is
> > reproducible and any bug report to the contrary is in error ?
> 
> almost. 93% of the packages in stretch today can be re-build bit by bit
> identically.

OK, but how can I check that my package build is reproducible before uploading
it ?

Cheers,
-- 
Bill. <ballombe@debian.org>




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 14 May 2017 15:24:02 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 May 2017 15:24:02 GMT) (full text, mbox, link).


Message #67 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Sun, May 14, 2017 at 05:05:36PM +0200, Bill Allombert wrote:
> OK, but how can I check that my package build is reproducible before uploading
> it ?

in general you cannot find out with 100% certainty whether a given source package
will be reproducible. You can only find out with certainty that a package is *not*
reproducible…

that said

a.) go to http://reproducible.debian.net/$srcpkg and see if it's reproducible today.

	Bill, did you do this for your packages?
	And then there is also https://tests.reproducible-builds.org/debian/unstable/index_dd-list.html#ballombe@debian.org
		which shows that half of your 26 packages in sid (main) are unreproducible
		with build path variation, though most of those unreproducible ones
		are reproducible without build path variation…
	-> https://tests.reproducible-builds.org/debian/testing/index_dd-list.html#ballombe@debian.org
		only shows 4 unreproducible packages…

b.) build it twice and compare using diffoscope
c.) use reprotest
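[Editorial sketch: option (b) above, "build it twice and compare", can be illustrated in miniature. The "build" here is a stand-in that embeds a timestamp, exactly the class of difference diffoscope would pinpoint in a real .deb; all file names are hypothetical.]

```shell
# "Build" the same input twice; the embedded build timestamp makes the
# two outputs differ, so this "package" is not reproducible.
printf 'payload' > input.txt
{ date +%s%N; cat input.txt; } > build1.out
sleep 0.01
{ date +%s%N; cat input.txt; } > build2.out
# In a real check, diffoscope would show *where* two .deb files differ;
# cmp only tells us *that* they differ.
cmp -s build1.out build2.out && echo "reproducible" || echo "NOT reproducible"
```

[This prints "NOT reproducible"; removing the `date` line from both builds makes the outputs identical, which is the whole game.]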


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 14 May 2017 20:00:10 GMT) (full text, mbox, link).


Acknowledgement sent to Guillem Jover <guillem@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 May 2017 20:00:10 GMT) (full text, mbox, link).


Message #72 received at 844431@bugs.debian.org (full text, mbox, reply):

On Sun, 2017-05-14 at 15:20:54 +0000, Holger Levsen wrote:
> On Sun, May 14, 2017 at 05:05:36PM +0200, Bill Allombert wrote:
> > OK, but how can I check that my package build is reproducible before uploading
> > it ?
> 
> in general you cannot find out with 100% certainity whether a given source package
> will be reproducible. You can only find out with certainity if a package is *not*
> reproducible…
> 
> that said
> 
> a.) go to http://reproducible.debian.net/$srcpkg and see if its reproducible today.
> 
> 	Bill, did you do this for your packages?
> 	And then there is also https://tests.reproducible-builds.org/debian/unstable/index_dd-list.html#ballombe@debian.org
> 		which shows that half of your 26 packages in sid (main) are unreproducible
> 		with build path variation, though most of those unreproducible ones
> 		are reproducible without build path variation…
> 	-> https://tests.reproducible-builds.org/debian/testing/index_dd-list.html#ballombe@debian.org
> 		only shows 4 unreproducible packages…

b.0.) use debrepro (from devscripts)

> b.) build it twice and compare using diffoscope
> c.) use reprotest

Thanks,
Guillem



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 14 May 2017 22:00:04 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 May 2017 22:00:04 GMT) (full text, mbox, link).


Message #77 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Sun, May 14, 2017 at 09:58:12PM +0200, Guillem Jover wrote:
> On Sun, 2017-05-14 at 15:20:54 +0000, Holger Levsen wrote:
> > 	Bill, did you do this for your packages?

on re-reading what I wrote here, it occurred to me that this could be
read as *hostile*, despite me having *zero* intention to be hostile… I
just wanted to be friendly and give helpful URLs to you, Bill. I'm
sorry if this came across differently!

> b.0.) use debrepro (from devscripts)
Thanks for this additional hint, Guillem!


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 14 May 2017 22:09:03 GMT) (full text, mbox, link).


Acknowledgement sent to Bill Allombert <ballombe@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 May 2017 22:09:03 GMT) (full text, mbox, link).


Message #82 received at 844431@bugs.debian.org (full text, mbox, reply):

On Sun, May 14, 2017 at 03:20:54PM +0000, Holger Levsen wrote:
> On Sun, May 14, 2017 at 05:05:36PM +0200, Bill Allombert wrote:
> a.) go to http://reproducible.debian.net/$srcpkg and see if its reproducible today.

As I said, I would like to check that my package build is reproducible before
I upload it, not after, so I can be sure that any bug is fixed in the
upload.

Some of my package were listed as reproducible for several months and
then became unreproducible without any new upload. I do not mind that.
However, from a policy point of view, "reproducible" needs to be defined
precisely. Generally speaking, reproducible means that the build will
not change if some (but not all) parameters are changed. Which parameters
are allowed to change needs to be defined.

One way to specify that would be to provide an authoritative tool to
validate packages.

Cheers,
PS: I thank you for your advice; I will reply to you privately if I
need to.
-- 
Bill. <ballombe@debian.org>





Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 14 May 2017 23:18:04 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 May 2017 23:18:04 GMT) (full text, mbox, link).


Message #87 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Mon, May 15, 2017 at 12:05:17AM +0200, Bill Allombert wrote:
> On Sun, May 14, 2017 at 03:20:54PM +0000, Holger Levsen wrote:
> > On Sun, May 14, 2017 at 05:05:36PM +0200, Bill Allombert wrote:
> > a.) go to http://reproducible.debian.net/$srcpkg and see if its reproducible today.
> As I said, I would like to check that my package build is reproducible before
> I upload it, not after, so I can be sure that any bug is fixed in the
> upload.
b.), b.0), c.) and d.) were given as possible "tools" *to build twice with 
(some) variation(s) and compare the results*.

"Reproducible Builds" (in the sense of bit-by-bit identical builds) is
really a rather new field in the era of software (well, not really, but
that's history, and it bit-rotted until it was rediscovered in the early 2010s…)

What is trivial, if given, is to show that a package is *un*reproducible.

It's much harder to show that a package is reproducible.

And given that this is a new field I think it's ok, while somewhat unsatisfying,
that maybe some unreproducibility will only be detected by a more advanced
tool, like reproducible.debian.net (which isn't a, b, c nor d, but e.)
after an upload has taken place.

This is one of the reasons we are aiming for "packages *should* be reproducible"
now, and not "*must* be".

> Some of my package were listed as reproducible for several months and
> then became unreproducible without any new upload. I do not mind that.

I guess this is because we introduced many more variations during 2014 and 2015.
During 2016 I don't recall us introducing many variations, or at least not
many that caused new unreproducibility issues…

For 2017 there weren't any.

> However from a policy point of view, reproducible need to be defined
> precisely.

Yes!

> Generally speaking, reproducible means that the build will
> not change if some (but not all) parameters are changed.

Yes.

> What parameters
> are allowed to change need to be defined.

I sadly think this is impossible.

> One way is specify that would be to provide an authoritative tool to
> validate packages.

the tool to validate builds should be diff/sha256sum. a tool to simulate all possible
variations in the wild would probably need endless time to operate… 

> PS: I thanks you for your advices, I will reply to you privately if I
> need to.

While you surely can do so (and I will happily reply) I would even more happily
prefer it if you could ask me on a public list (and ping me in private if you haven't
gotten a reply in whatever time you think is appropriate)… a.) then more people can
learn, b.) you'll probably get faster *and better* replies (esp. on
language-specific details) and c.) this helps me get my inbox under control :-)


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Mon, 15 May 2017 06:51:03 GMT) (full text, mbox, link).


Acknowledgement sent to Wouter Verhelst <wouter@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Mon, 15 May 2017 06:51:03 GMT) (full text, mbox, link).


Message #92 received at 844431@bugs.debian.org (full text, mbox, reply):

On Sun, May 14, 2017 at 11:15:26PM +0000, Holger Levsen wrote:
> On Mon, May 15, 2017 at 12:05:17AM +0200, Bill Allombert wrote:
> > On Sun, May 14, 2017 at 03:20:54PM +0000, Holger Levsen wrote:
> > > On Sun, May 14, 2017 at 05:05:36PM +0200, Bill Allombert wrote:
> > > a.) go to http://reproducible.debian.net/$srcpkg and see if its reproducible today.
> > As I said, I would like to check that my package build is reproducible before
> > I upload it, not after, so I can be sure that any bug is fixed in the
> > upload.
> 
> b.), b.0), c.) and d.) were given as possible "tools" *to build twice with
> (some) variation(s) and compare the results*.
> 
> "Reproducible Builds" (in the sense of bit by bit identicall builds) is
> really a rather new field in the era of software (well, not really, but
> thats history and bit rotted until it was rediscovered in the early 2010s…)
> 
> What is trivial, if given, is to show that a package is *un*reproducible.
> 
> It's much harder to show that a package is reproducible.
> 
> And given that this is a new field I think it's ok, while somewhat unsatisfying,
> that maybe some unreproducibility will only be detected by a more advanced
> tool, like reproducible.debian.net (which aint a,b,c nor d, but e.)
> after an upload has taken place.

I think it's probably not a good idea to (when we've moved to mandate
"packages must be reproducible") allow packages to become insta-buggy by
things that are out of their control and not clearly specified in
policy. That's not how we do things in Debian.

As such, I would favour the following approach:
- You guys (= the reproducible builds guys) come up with a list of
  things that commonly make a package nonreproducible today, and policy
  adds those as "should not"s. If I'm not mistaken, such a list already
  exists, you may simply need to generalize it a bit?
- Actually, I'm sure there may be things that packages failed to
  comply with in the past, but that are not a problem anymore today;
  we can make those "must not" rules already today.
- If you find new and interesting ways to make packages nonreproducible
  at some point in the future, those can be added (as "should" first,
  and as "must" later).

This would result in a section in policy of this form:

---
# Reproducible builds

Packages should generally be reproducible. That is, a package build
should result in a bit-by-bit identical package from one build to the
next.

Specifically, packages must not do any of the following things:
- non-reproducible thing A
- non-reproducible thing B
- ...

Moreover, while the following are not must rules yet, packages should
also not do any of the following things:
- still-in-the-wild non-reproducible thing A
- still-in-the-wild non-reproducible thing B
- ...
---

(wording may need some tweaking)

The above wording makes "bit-by-bit identical" a should (so packagers
are encouraged to reach that goal), but already allows you to file RC
bugs on some subset of "is not reproducible" package issues, and a
subset that will improve over time.

With that wording, I don't think we should ever make "bit-by-bit
identical" a must; I also don't think we would need to. As you say,
building packages nonreproducibly is difficult to define, and it
certainly is difficult to test for in a definite manner.

> > What parameters
> > are allowed to change need to be defined.
> 
> I sadly think this is impossible.

I agree that it will probably be a neverending effort, but I also think
it's the only way that it can reasonably be done.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Wed, 17 May 2017 22:54:03 GMT) (full text, mbox, link).


Acknowledgement sent to Bill Allombert <ballombe@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Wed, 17 May 2017 22:54:03 GMT) (full text, mbox, link).


Message #97 received at 844431@bugs.debian.org (full text, mbox, reply):

On Sun, May 14, 2017 at 11:15:26PM +0000, Holger Levsen wrote:
> On Mon, May 15, 2017 at 12:05:17AM +0200, Bill Allombert wrote:
> > On Sun, May 14, 2017 at 03:20:54PM +0000, Holger Levsen wrote:
> > > On Sun, May 14, 2017 at 05:05:36PM +0200, Bill Allombert wrote:
> > > a.) go to http://reproducible.debian.net/$srcpkg and see if its reproducible today.
> > As I said, I would like to check that my package build is reproducible before
> > I upload it, not after, so I can be sure that any bug is fixed in the
> > upload.
> 
> b.), b.0), c.) and d.) were given as possible "tools" *to build twice with
> (some) variation(s) and compare the results*.
> 
> "Reproducible Builds" (in the sense of bit by bit identicall builds) is
> really a rather new field in the era of software (well, not really, but
> thats history and bit rotted until it was rediscovered in the early 2010s…)
> 
> What is trivial, if given, is to show that a package is *un*reproducible.
> 
> It's much harder to show that a package is reproducible.

We should avoid terminological confusion here...

Unreproducible means that "it will never be reproduced", which is quite
different from "it will always be reproduced".
Reproducible means that "it is possible to reproduce".

So in fact it is much easier to show that something is reproducible
than unreproducible.

There are situations where policy mandates that the build will be
different (for example setting DEB_BUILD_OPTIONS).

And actually, we do not need packages to always build identically.
Instead we need a reliable way to rebuild them identically, which is a
lower bar.

If (as is planned) all packages are built by the autobuilders, then
we could provide a tool that rebuilds a package (maybe by taking a
.buildinfo as input and downloading the same versions of the build
dependencies from snapshot.d.o) using the same settings as the autobuilders.

Then policy would cover the issues that could still lead to a different
build (for example using timestamps, hardcoding hardware characteristics
of the build machine, etc.).
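[Editorial sketch: the timestamp case mentioned above has a standard remedy in the Reproducible Builds world, the SOURCE_DATE_EPOCH environment variable, which a build should honour instead of the wall clock when it wants to embed a date. The epoch value below is arbitrary; the variable name is the real convention.]

```shell
# Unreproducible: BUILD_DATE=$(date -u +%Y-%m-%dT%H:%M:%SZ) changes every run.
# Reproducible: derive the embedded timestamp from SOURCE_DATE_EPOCH instead,
# so every rebuild from the same source embeds the same date.
SOURCE_DATE_EPOCH=1500000000
BUILD_DATE=$(date -u -d "@${SOURCE_DATE_EPOCH}" +%Y-%m-%dT%H:%M:%SZ)
echo "$BUILD_DATE"   # same output on every rebuild: 2017-07-14T02:40:00Z
```

[Note: `date -d @N` is GNU date syntax, which is what Debian ships.]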

Cheers,
-- 
Bill. <ballombe@debian.org>




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Mon, 24 Jul 2017 21:12:02 GMT) (full text, mbox, link).


Acknowledgement sent to Adrian Bunk <bunk@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Mon, 24 Jul 2017 21:12:02 GMT) (full text, mbox, link).


Message #102 received at 844431@bugs.debian.org (full text, mbox, reply):

>...
> Debian Policy
> =============
> 
> We are in the process of making reproducibility of packages something
> properly documented in policy.  Writing patches for policy is not easy,
> so we welcome input from everyone to be able to better consider all the
> needed facets.  See bug #844431 [16] for it.
> Also, we wish to remind everyone that Debian Policy aims at documenting
> current practices, it's not a "stick" to impose new rules.  That said,
> we believe reproducible builds to be among the best practices today.
>...

If it could be interpreted in the future to include things that are
not current practice today, it would be a stick to impose new rules.

The main problem is the lack of an exact definition of what
"packages build in a reproducible manner" includes, and what it does not.

Bill already explained that "it is possible to reproduce" is a much 
easier problem to solve than "it will always be reproduced".

I would suggest a top-down approach to that:

What are the high-level guarantees the reproducible builds effort plans
to make for all packages in buster?

What exactly is required from every single package for that,
and also realistic to achieve for buster?

Once you have these plus a list of all remaining bugs, you can
go to the release team asking whether these can be considered
as release critical for buster.

At that point documenting this status quo for policy should
be straightforward.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Fri, 11 Aug 2017 23:21:06 GMT) (full text, mbox, link).


Acknowledgement sent to Sean Whitton <spwhitton@spwhitton.name>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Fri, 11 Aug 2017 23:21:06 GMT) (full text, mbox, link).


Message #107 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
control: user debian-policy@packages.debian.org
control: usertag = normative proposal

Hello,

==== Proposal: ====

This is what Holger and I think we should add to Policy, after
readability tweaks:

    Packages should build reproducibly, which for purposes of this
    document means that given

    - a version of a source package unpacked at a given path;
    - a set of versions of installed build-dependencies; and
    - a build architecture,

    repeatedly building the source package on the architecture with those
    versions of the build dependencies installed will produce bit-for-bit
    identical binary packages.

==== Explanation: ====

The definition from the reproducible builds group[1] says:

    A build is reproducible if given the same source code, build
    environment and build instructions, any party can recreate
    bit-by-bit identical copies of all specified artifacts.

    The relevant attributes of the build environment, the build
    instructions and the source code as well as the expected
    reproducible artifacts are defined by ... distributors.

i.e. Debian has to define the build environment, source code and build
instructions.  I think that my wording defines these as Debian currently
understands them.

Later, we could narrow the definition of build environment by adding
more constraints, but we're not there yet.

[1]  https://reproducible-builds.org/docs/definition/

-- 
Sean Whitton
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 00:33:07 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 12 Aug 2017 00:33:07 GMT) (full text, mbox, link).


Message #112 received at 844431@bugs.debian.org (full text, mbox, reply):

Dear Sean & Holger,

Thank you so much for working on this at the end of a tiring DebConf!

> […]
> Later, we could narrow the definition of build environment by adding
> more constraints, but we're not there yet.

That makes sense. Indeed, that even feels like the optimal approach,
as it allows flexibility and experimentation, which will probably become
more important the closer we get to 100% reproducibility.

Thanks again :)


Best wishes,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 01:09:03 GMT) (full text, mbox, link).


Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 12 Aug 2017 01:09:03 GMT) (full text, mbox, link).


Message #117 received at 844431@bugs.debian.org (full text, mbox, reply):

Sean Whitton <spwhitton@spwhitton.name> writes:
> ==== Proposal: ====
> This is what Holger and I think we should add to Policy, after
> readability tweaks:
>     Packages should build reproducibly, which for purposes of this
>     document means that given
>     - a version of a source package unpacked at a given path;
>     - a set of versions of installed build-dependencies; and
>     - a build architecture,
>     repeatedly building the source package on the architecture with those
>     versions of the build dependencies installed will produce bit-for-bit
>     identical binary packages.

I think we need to add all environment variables starting with DEB_* to
the prerequisites.  If you set DEB_BUILD_OPTIONS=nostrip or
DEB_BUILD_MAINT_OPTIONS=hardening=all, you'll definitely get a different
package, for instance.

I feel like there are a bunch of other environment variables that have to
be consistent, although I'm not sure how to specify that since other
environment variables shouldn't matter.  But, say, setting GNUTARGET is
very likely to cause weirdness by changing how ld works.  There are
probably more interesting examples.
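[Editorial sketch of the point above: the same source tree takes a different code path depending on an environment variable, so any definition of "reproducible" has to pin such variables. The parsing idiom is the one debian/rules files commonly use for DEB_BUILD_OPTIONS; the option values here are just examples.]

```shell
# Whether binaries get stripped depends on an environment variable,
# not on the source tree -- so the variable is part of the build
# environment that must be recorded for the build to be reproducible.
DEB_BUILD_OPTIONS="nostrip parallel=4"
case " $DEB_BUILD_OPTIONS " in
  *" nostrip "*) echo "binaries left unstripped" ;;
  *)             echo "binaries stripped" ;;
esac
```

[With `nostrip` present this prints "binaries left unstripped"; dropping it flips the output, i.e. a different package from identical sources.]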

How does the current reproducible build testing work with the environment?
Maybe we should just document that for right now and relax it later if
needed?

> ==== Explanation: ====
> The definition from the reproducible builds group[1] says:
>     A build is reproducible if given the same source code, build
>     environment and build instructions, any party can recreate
>     bit-by-bit identical copies of all specified artifacts.
>     The relevant attributes of the build environment, the build
>     instructions and the source code as well as the expected
>     reproducible artifacts are defined by ... distributors.
> i.e. Debian has to define the build environment, source code and build
> instructions.  I think that my wording defines these as Debian currently
> understands them.
> Later, we could narrow the definition of build environment by adding
> more constraints, but we're not there yet.
> [1]  https://reproducible-builds.org/docs/definition/

We should add a link to that page (maybe in a footnote).

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 01:39:03 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Kahn Gillmor <dkg@fifthhorseman.net>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 12 Aug 2017 01:39:03 GMT) (full text, mbox, link).


Message #122 received at 844431@bugs.debian.org (full text, mbox, reply):

Thanks for the proposal.  I like it!  A few nit-picks below:

On Fri 2017-08-11 16:08:47 -0700, Sean Whitton wrote:

>     - a version of a source package unpacked at a given path;

I don't like the idea of hard-coding a fixed build path requirement into
debian policy.  We're over 80% with variable build paths in unstable
already, and i want to keep the pressure up on this.  The build location
should not influence the binary output.


>     repeatedly building the source package on the architecture with

maybe s/on the architecture/on any machine of the same architecture/ ?

all the best,

    --dkg



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 03:39:05 GMT) (full text, mbox, link).


Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 12 Aug 2017 03:39:05 GMT) (full text, mbox, link).


Message #127 received at 844431@bugs.debian.org (full text, mbox, reply):

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
> On Fri 2017-08-11 16:08:47 -0700, Sean Whitton wrote:
>>     - a version of a source package unpacked at a given path;
> I don't like the idea of hard-coding a fixed build path requirement into
> debian policy.  We're over 80% with variable build paths in unstable
> already, and i want to keep the pressure up on this.  The build location
> should not influence the binary output.

It shouldn't, but my understanding is that it currently does.  If you can
fix that, that's great, but until that's been fixed, I don't see the harm
in documenting this as a prerequisite for a reproducible build.  If we can
relax that prerequisite later, great, but nothing about listing it here
should reduce the pressure on making variable build paths work.  It just
documents the current state of the world.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 08:35:37 GMT) (full text, mbox, link).


Message #130 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
Hi,

Quoting Russ Allbery (2017-08-12 09:57:44)
> I think we need to add all environment variables starting with DEB_* to
> the prerequisites.  If you set DEB_BUILD_OPTIONS=nostrip or
> DEB_BUILD_MAINT_OPTIONS=hardening=all, you'll definitely get a different
> package, for instance.
>
> I feel like there are a bunch of other environment variables that have to
> be consistent, although I'm not sure how to specify that since other
> environment variables shouldn't matter.  But, say, setting GNUTARGET is
> very likely to cause weirdness by changing how ld works.  There are
> probably more interesting examples.
>
> How does the current reproducible build testing work with the environment?
> Maybe we should just document that for right now and relax it later if
> needed?

currently, dpkg-genbuildinfo records all environment variables in a .buildinfo
file which pass a whitelist check. The current whitelist is stored here:

https://anonscm.debian.org/cgit/dpkg/dpkg.git/tree/scripts/Dpkg/Build/Info.pm#n50

I'm not proposing that this whole list should be added to policy. But the list
that ends up in policy must be a subset of the list of environment variables
that dpkg-genbuildinfo stores in the .buildinfo file. Thus:

 - this list from dpkg should give a number of good suggestions of which
   environment variables should be added to policy

 - if any additional variables are added, then they must be added to
   dpkg-genbuildinfo as well.
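
For illustration, the recorded environment can be pulled straight out of a .buildinfo file. The snippet below fabricates a minimal sample (the values are invented; the variable names follow the dpkg whitelist referenced above):

```shell
# Create a hypothetical minimal .buildinfo sample; the Environment field
# is the part dpkg-genbuildinfo fills from its whitelist.
cat > sample.buildinfo <<'EOF'
Format: 1.0
Source: hello
Environment:
 DEB_BUILD_OPTIONS="parallel=4"
 LC_ALL="C.UTF-8"
 SOURCE_DATE_EPOCH="1502496000"
EOF

# Extract only the recorded environment variables (continuation lines
# of the Environment field start with a single space).
env_lines=$(sed -n '/^Environment:/,${/^ /p}' sample.buildinfo)
echo "$env_lines"
rm sample.buildinfo
```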

Thanks!

cheers, josch
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 10:03:05 GMT) (full text, mbox, link).


Acknowledgement sent to Bill Allombert <ballombe@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 12 Aug 2017 10:03:05 GMT) (full text, mbox, link).


Message #135 received at 844431@bugs.debian.org (full text, mbox, reply):

On Fri, Aug 11, 2017 at 04:08:47PM -0700, Sean Whitton wrote:
> control: user debian-policy@packages.debian.org
> control: usertag = normative proposal
>
> Hello,
>
> ==== Proposal: ====
>
> This is what Holger and I think we should add to Policy, after
> readability tweaks:
>
>     Packages should build reproducibly, which for purposes of this
>     document means that given
>
>     - a version of a source package unpacked at a given path;
>     - a set of versions of installed build-dependencies; and
>     - a build architecture,
>
>     repeatedly building the source package on the architecture with those
>     versions of the build dependencies installed will produce bit-for-bit
>     identical binary packages.
>
> ==== Explanation: ====
>
> The definition from the reproducible builds group[1] says:
>
>     A build is reproducible if given the same source code, build
>     environment and build instructions, any party can recreate
>     bit-by-bit identical copies of all specified artifacts.
>
>     The relevant attributes of the build environment, the build
>     instructions and the source code as well as the expected
>     reproducible artifacts are defined by ... distributors.
>
> i.e. Debian has to define the build environment, source code and build
> instructions.  I think that my wording defines these as Debian currently
> understands them.

This requires policy to define the build environment and build
instructions much more precisely than it does now, which does not
seem to be practical, unless perhaps a reference implementation
is provided.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Owner recorded as Sean Whitton <spwhitton@spwhitton.name>. Request was from Sean Whitton <spwhitton@spwhitton.name> to control@bugs.debian.org. (Sat, 12 Aug 2017 16:33:09 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 18:27:05 GMT) (full text, mbox, link).


Acknowledgement sent to Sean Whitton <spwhitton@spwhitton.name>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 12 Aug 2017 18:27:05 GMT) (full text, mbox, link).


Message #142 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
control: tag -1 +patch

This patch incorporates the feedback given on the proposal I sent
yesterday, both in this bug and in person from Russ and Holger (thank
you to all).

I am seeking formal seconds for this patch, from any DD.

In particular:

- for now, we only require reproducibility when the set of environment
  variable values set is exactly the same

  This is because

  - the reproducible builds team aren't yet totally clear on the
    variables that they think may be allowed to vary

  - we should wait until .buildinfo is properly documented in policy,
    and then we can refer to that file

- we don't require reproducibility when build paths vary

  This is because

  - since there is not a consensus on whether we should require this,
    and there is strong consensus on the requirement of reproducibility
    if the path does /not/ vary, this issue should not block this change.
    We should open a separate bug against debian-policy

diff --git a/policy/ch-source.rst b/policy/ch-source.rst
index 127b125..cc4b020 100644
--- a/policy/ch-source.rst
+++ b/policy/ch-source.rst
@@ -661,6 +661,22 @@ particularly complex or unintuitive source layout or build system (for
 example, a package that builds the same source multiple times to
 generate different binary packages).
 
+Reproducibility
+---------------
+
+Packages should build reproducibly, which for the purposes of this
+document [#]_ means that given
+
+- a version of a source package unpacked at a given path;
+- a set of versions of installed build dependencies;
+- a set of environment variable values; and
+- a build architecture,
+
+repeatedly building the source package on any machine of the same
+architecture with those versions of the build dependencies installed
+and exactly those environment variable values set will produce
+bit-for-bit identical binary packages.
+
 .. [#]
    See the file ``upgrading-checklist`` for information about policy
    which has changed between different versions of this document.
@@ -790,3 +806,7 @@ generate different binary packages).
    often creates either static linking or shared library conflicts, and,
    most importantly, increases the difficulty of handling security
    vulnerabilities in the duplicated code.
+
+.. [#]
+   This is Debian's precisification of the `reproducible-builds.org
+   definition <https://reproducible-builds.org/docs/definition/>`_.

-- 
Sean Whitton
[signature.asc (application/pgp-signature, inline)]
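
The definition in the patch above can be checked mechanically: build twice with the same inputs and compare checksums. A minimal sketch follows; the build() function here is a deterministic stand-in, not dpkg-buildpackage or any real packaging tool:

```shell
workdir=$(mktemp -d)

# Stand-in for a real package build: same source and same dependency
# versions must yield the same output bytes.
build() {
    printf 'hello_1.0-1: built from source-v1 with deps-set-A\n' > "$1"
}

build "$workdir/first.deb"
build "$workdir/second.deb"

# Bit-for-bit comparison via checksums of the two build results.
h1=$(sha256sum "$workdir/first.deb" | cut -d' ' -f1)
h2=$(sha256sum "$workdir/second.deb" | cut -d' ' -f1)

if [ "$h1" = "$h2" ]; then
    echo "reproducible: $h1"
else
    echo "NOT reproducible: $h1 vs $h2"
fi
rm -r "$workdir"
```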

Added tag(s) patch. Request was from Sean Whitton <spwhitton@spwhitton.name> to 844431-submit@bugs.debian.org. (Sat, 12 Aug 2017 18:27:05 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 18:51:03 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sat, 12 Aug 2017 18:51:04 GMT) (full text, mbox, link).


Message #149 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Sat, Aug 12, 2017 at 11:23:14AM -0700, Sean Whitton wrote:
> I am seeking formal seconds for this patch, from any DD.
>
> In particular:
>
> - for now, we only require reproducibility when the set of environment
>   variable values set is exactly the same
>
>   This is because
>
>   - the reproducible builds team aren't yet totally clear on the
>     variables that they think may be allowed to vary
>
>   - we should wait until .buildinfo is properly documented in policy,
>     and then we can refer to that file
>
> - we don't require reproducibility when build paths vary
>
>   This is because
>
>   - since there is not a consensus on whether we should require this,
>     and there is strong consensus on the requirement of reproducibility
>     if the path does /not/ vary, this issue should not block this change.
>     We should open a separate bug against debian-policy
>
> diff --git a/policy/ch-source.rst b/policy/ch-source.rst
> index 127b125..cc4b020 100644
> --- a/policy/ch-source.rst
> +++ b/policy/ch-source.rst
> @@ -661,6 +661,22 @@ particularly complex or unintuitive source layout or build system (for
>  example, a package that builds the same source multiple times to
>  generate different binary packages).
>  
> +Reproducibility
> +---------------
> +
> +Packages should build reproducibly, which for the purposes of this
> +document [#]_ means that given
> +
> +- a version of a source package unpacked at a given path;
> +- a set of versions of installed build dependencies;
> +- a set of environment variable values; and
> +- a build architecture,
> +
> +repeatedly building the source package on any machine of the same
> +architecture with those versions of the build dependencies installed
> +and exactly those environment variable values set will produce
> +bit-for-bit identical binary packages.
> +
>  .. [#]
>     See the file ``upgrading-checklist`` for information about policy
>     which has changed between different versions of this document.
> @@ -790,3 +806,7 @@ generate different binary packages).
>     often creates either static linking or shared library conflicts, and,
>     most importantly, increases the difficulty of handling security
>     vulnerabilities in the duplicated code.
> +
> +.. [#]
> +   This is Debian's precisification of the `reproducible-builds.org
> +   definition <https://reproducible-builds.org/docs/definition/>`_.

very happily seconded, many thanks to everyone who has contributed to this bug
directly or "indirectly" (I'm thinking specifically about Lunar here).


-- 
cheers,
	Holger (who watched http://meetings-archive.debian.net/pub/debian-meetings/2017/debconf17/reproducible-builds-status-update.vp8.webm today and was equally happy when seeing the whole audience agreeing this should be in policy - and the applause after Russ's closing statement was also very very nice…!)
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 18:54:02 GMT) (full text, mbox, link).


Acknowledgement sent to Ondrej Novy <novy@ondrej.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sat, 12 Aug 2017 18:54:03 GMT) (full text, mbox, link).


Message #154 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
Hi,

2017-08-12 14:23 GMT-04:00 Sean Whitton <spwhitton@spwhitton.name>:
> control: tag -1 +patch
>
> This patch incorporates the feedback given on the proposal I sent
> yesterday, both in this bug and in person from Russ and Holger (thank
> you to all).

seconded, thanks for working on this.

-- 
Best regards
 Ondřej Nový

Email: novy@ondrej.org
PGP: 3D98 3C52 EB85 980C 46A5  6090 3573 1255 9D1E 064B
[Message part 2 (text/html, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 19:27:02 GMT) (full text, mbox, link).


Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sat, 12 Aug 2017 19:27:02 GMT) (full text, mbox, link).


Message #159 received at 844431@bugs.debian.org (full text, mbox, reply):

Sean Whitton <spwhitton@spwhitton.name> writes:

> diff --git a/policy/ch-source.rst b/policy/ch-source.rst
> index 127b125..cc4b020 100644
> --- a/policy/ch-source.rst
> +++ b/policy/ch-source.rst
> @@ -661,6 +661,22 @@ particularly complex or unintuitive source layout or build system (for
>  example, a package that builds the same source multiple times to
>  generate different binary packages).
>  
> +Reproducibility
> +---------------
> +
> +Packages should build reproducibly, which for the purposes of this
> +document [#]_ means that given
> +
> +- a version of a source package unpacked at a given path;
> +- a set of versions of installed build dependencies;
> +- a set of environment variable values; and
> +- a build architecture,
> +
> +repeatedly building the source package on any machine of the same
> +architecture with those versions of the build dependencies installed
> +and exactly those environment variable values set will produce
> +bit-for-bit identical binary packages.
> +
>  .. [#]
>     See the file ``upgrading-checklist`` for information about policy
>     which has changed between different versions of this document.
> @@ -790,3 +806,7 @@ generate different binary packages).
>     often creates either static linking or shared library conflicts, and,
>     most importantly, increases the difficulty of handling security
>     vulnerabilities in the duplicated code.
> +
> +.. [#]
> +   This is Debian's precisification of the `reproducible-builds.org
> +   definition <https://reproducible-builds.org/docs/definition/>`_.

Seconded.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 20:03:02 GMT) (full text, mbox, link).


Acknowledgement sent to Ximin Luo <infinity0@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sat, 12 Aug 2017 20:03:03 GMT) (full text, mbox, link).


Message #164 received at 844431@bugs.debian.org (full text, mbox, reply):

Sean Whitton:
> diff --git a/policy/ch-source.rst b/policy/ch-source.rst
> index 127b125..cc4b020 100644
> --- a/policy/ch-source.rst
> +++ b/policy/ch-source.rst
> @@ -661,6 +661,22 @@ particularly complex or unintuitive source layout or build system (for
>  example, a package that builds the same source multiple times to
>  generate different binary packages).
>  
> +Reproducibility
> +---------------
> +
> +Packages should build reproducibly, which for the purposes of this
> +document [#]_ means that given
> +
> +- a version of a source package unpacked at a given path;
> +- a set of versions of installed build dependencies;
> +- a set of environment variable values; and
> +- a build architecture,
> +
> +repeatedly building the source package on any machine of the same
> +architecture with those versions of the build dependencies installed
> +and exactly those environment variable values set will produce
> +bit-for-bit identical binary packages.
> +

To echo dkg and others' comments, it would be nice if we could add here:

+Packages are encouraged to produce bit-for-bit identical binary packages even
+if most environment variables and build paths are varied. This is technically
+more difficult at the time of writing, but it is intended that this stricter
+definition would replace the above one, when appropriate in the future.

If this type of "intent" wording is not appropriate for Policy then disregard what I'm saying, I don't wish to block this patch for this reason.

>  .. [#]
>     See the file ``upgrading-checklist`` for information about policy
>     which has changed between different versions of this document.
> @@ -790,3 +806,7 @@ generate different binary packages).
>     often creates either static linking or shared library conflicts, and,
>     most importantly, increases the difficulty of handling security
>     vulnerabilities in the duplicated code.
> +
> +.. [#]
> +   This is Debian's precisification of the `reproducible-builds.org
> +   definition <https://reproducible-builds.org/docs/definition/>`_.

"precisification" -> "more precise version"

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 20:21:02 GMT) (full text, mbox, link).


Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sat, 12 Aug 2017 20:21:02 GMT) (full text, mbox, link).


Message #169 received at 844431@bugs.debian.org (full text, mbox, reply):

Ximin Luo <infinity0@debian.org> writes:

> To echo dkg and others' comments, it would be nice if we could add here:
>
> +Packages are encouraged to produce bit-for-bit identical binary packages even
> +if most environment variables and build paths are varied. This is technically
> +more difficult at the time of writing, but it is intended that this stricter
> +definition would replace the above one, when appropriate in the future.
>
> If this type of "intent" wording is not appropriate for Policy then
> disregard what I'm saying, I don't wish to block this patch for this
> reason.

Oh, that's a good way to capture that.  This seems fine to me, and I have
no objections to adding this advice.  Seconded the original with or
without this addition.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 20:33:02 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sat, 12 Aug 2017 20:33:02 GMT) (full text, mbox, link).


Message #174 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Sat, Aug 12, 2017 at 01:18:23PM -0700, Russ Allbery wrote:
> > +Packages are encouraged to produce bit-for-bit identical binary packages even
> > +if most environment variables and build paths are varied. This is technically
> > +more difficult at the time of writing, but it is intended that this stricter
> > +definition would replace the above one, when appropriate in the future.
>
> > If this type of "intent" wording is not appropriate for Policy then
> > disregard what I'm saying, I don't wish to block this patch for this
> > reason.
>
> Oh, that's a good way to capture that.  This seems fine to me, and I have
> no objections to adding this advice.  Seconded the original with or
> without this addition.
I'm also seconding the original with or without this addition.


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 20:48:04 GMT) (full text, mbox, link).


Message #177 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
Hi,

Quoting Sean Whitton (2017-08-13 03:23:14)
> +Reproducibility
> +---------------
> +
> +Packages should build reproducibly, which for the purposes of this
> +document [#]_ means that given
> +
> +- a version of a source package unpacked at a given path;
> +- a set of versions of installed build dependencies;
> +- a set of environment variable values; and
> +- a build architecture,

Policy §4.9 defines "build architecture" in the context of dpkg-architecture
already and I think what you mean here is either "host architecture" or at
least "build and host architecture" or you need to mention that you are only
talking about native builds where build and host architecture are equal.

Thanks!

cheers, josch
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 21:00:06 GMT) (full text, mbox, link).


Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sat, 12 Aug 2017 21:00:06 GMT) (full text, mbox, link).


Message #182 received at 844431@bugs.debian.org (full text, mbox, reply):

Johannes Schauer <josch@debian.org> writes:

> Policy §4.9 defines "build architecture" in the context of
> dpkg-architecture already and I think what you mean here is either "host
> architecture" or at least "build and host architecture" or you need to
> mention that you are only talking about native builds where build and
> host architecture are equal.

I suspect we want to say build and host architecture for right now.
(Maybe we can later aspire to making the build architecture not matter.)

Thanks, good catch!

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 21:03:03 GMT) (full text, mbox, link).


Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sat, 12 Aug 2017 21:03:03 GMT) (full text, mbox, link).


Message #187 received at 844431@bugs.debian.org (full text, mbox, reply):

Bill Allombert <ballombe@debian.org> writes:

> This require policy to define the build environment and build
> instruction much more precisely than it does now, which does not seems
> to be practical. Unless maybe if a reference implementation is provided.

I don't see anything in this proposal that would require a more precise
definition than we have in Sean's current proposal.  This is the standard
that we're already using for filing reproducible build bugs in the
archive, and it's been basically fine.

The tools aren't in place yet to make it super-easy for people to test for
themselves, but that's in the works, and that's also why it's a should
(not must) and there's infrastructure in place for Debian to check it for
you.

We can always aspire to get more formal and specific in the future, but
that's true of many other parts of Policy as well.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 21:45:03 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sat, 12 Aug 2017 21:45:03 GMT) (full text, mbox, link).


Message #192 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Fri, Aug 11, 2017 at 08:35:47PM -0700, Russ Allbery wrote:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
> > I don't like the idea of hard-coding a fixed build path requirement into
> > debian policy.

I don't *like* it either, but I think it's the sensible thing to do now.

> > We're over 80% with variable build paths in unstable
> > already, and i want to keep the pressure up on this.  The build location
> > should not influence the binary output.

I'd like to keep the pressure on this too, and I think we can do that
while also trying to get closer to 100% first.

With build path variation, reaching the worthwhile goal of having >98%
reproducible builds will be delayed by 1-2 years at least, so this is a
classic "perfect is the enemy of good". I don't do reproducible builds
for purely academic reasons; I foremost want them to increase the
security of user systems.

> It shouldn't, but my understanding is that it currently does.  If you can
> fix that, that's great, but until that's been fixed, I don't see the harm
> in documenting this as a prerequisite for a reproducible build.  If we can
> relax that prerequisite later, great, but nothing about listing it here
> should reduce the pressure on making variable build paths work.  It just
> documents the current state of the world.

exactly.


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 22:39:02 GMT) (full text, mbox, link).


Acknowledgement sent to Sean Whitton <spwhitton@spwhitton.name>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 12 Aug 2017 22:39:02 GMT) (full text, mbox, link).


Message #197 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
Hello,

On Sat, Aug 12 2017, Russ Allbery wrote:

> I suspect we want to say build and host architecture for right now.
> (Maybe we can later aspire to making the build architecture not
> matter.)

On Sat, Aug 12 2017, Ximin Luo wrote:

> To echo dkg and others' comments, it would be nice if we could add here:
>
> +Packages are encouraged to produce bit-for-bit identical binary packages even
> +if most environment variables and build paths are varied. This is technically
> +more difficult at the time of writing, but it is intended that this stricter
> +definition would replace the above one, when appropriate in the future.

Here is an updated patch addressing these.  I reworded it to use
'recommended' and changed the tone to better suit policy.

Thank you Ximin, Russ and Johannes!

> "precisification" -> "more precise version"

Our definition is not actually a /version/ of the
reproducible-builds.org definition -- that would imply that our
definition could replace the reproducible-builds.org definition, like
upgrading a package.

'precisification' means roughly "filling out the missing specification
when it is appropriate to fill it out", which is what the r-b.org
definition instructs distributors to do.

diff --git a/policy/ch-source.rst b/policy/ch-source.rst
index 127b125..6e32870 100644
--- a/policy/ch-source.rst
+++ b/policy/ch-source.rst
@@ -661,6 +661,28 @@ particularly complex or unintuitive source layout or build system (for
 example, a package that builds the same source multiple times to
 generate different binary packages).
 
+Reproducibility
+---------------
+
+Packages should build reproducibly, which for the purposes of this
+document [#]_ means that given
+
+- a version of a source package unpacked at a given path;
+- a set of versions of installed build dependencies;
+- a set of environment variable values;
+- a build architecture; and
+- a host architecture,
+
+repeatedly building the source package for the build architecture on
+any machine of the host architecture with those versions of the build
+dependencies installed and exactly those environment variable values
+set will produce bit-for-bit identical binary packages.
+
+It is recommended that packages produce bit-for-bit identical binaries
+even if most environment variables and build paths are varied.  It is
+intended for this stricter standard to replace the above when it is
+easier for packages to meet it.
+
 .. [#]
    See the file ``upgrading-checklist`` for information about policy
    which has changed between different versions of this document.
@@ -790,3 +812,7 @@ generate different binary packages).
    often creates either static linking or shared library conflicts, and,
    most importantly, increases the difficulty of handling security
    vulnerabilities in the duplicated code.
+
+.. [#]
+   This is Debian's precisification of the `reproducible-builds.org
+   definition <https://reproducible-builds.org/docs/definition/>`_.

-- 
Sean Whitton
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sat, 12 Aug 2017 23:12:03 GMT) (full text, mbox, link).


Acknowledgement sent to Ximin Luo <infinity0@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sat, 12 Aug 2017 23:12:03 GMT) (full text, mbox, link).


Message #202 received at 844431@bugs.debian.org (full text, mbox, reply):

Sean Whitton:
> [..]
>
> Here is an updated patch addressing these.  I reworded it to use
> 'recommended' and changed the tone to better suit policy.
>
> Thank you Ximin, Russ and Johannes!
>
>> "precisification" -> "more precise version"
>
> Our definition is not actually a /version/ of the
> reproducible-builds.org definition -- that would imply that our
> definition could replace the reproducible-builds.org definition, like
> upgrading a package.
>
> 'precisification' means roughly "filling out the missing specification
> when it is appropriate to fill it out", which is what the r-p.org
> definition instructs distributors to do.
>
> diff --git a/policy/ch-source.rst b/policy/ch-source.rst
> index 127b125..6e32870 100644
> --- a/policy/ch-source.rst
> +++ b/policy/ch-source.rst
> @@ -661,6 +661,28 @@ particularly complex or unintuitive source layout or build system (for
>  example, a package that builds the same source multiple times to
>  generate different binary packages).
>  
> +Reproducibility
> +---------------
> +
> +Packages should build reproducibly, which for the purposes of this
> +document [#]_ means that given
> +
> +- a version of a source package unpacked at a given path;
> +- a set of versions of installed build dependencies;
> +- a set of environment variable values;
> +- a build architecture; and
> +- a host architecture,
> +
> +repeatedly building the source package for the build architecture on
> +any machine of the host architecture with those versions of the build
> +dependencies installed and exactly those environment variable values
> +set will produce bit-for-bit identical binary packages.
> +
> +It is recommended that packages produce bit-for-bit identical binaries
> +even if most environment variables and build paths are varied.  It is
> +intended for this stricter standard to replace the above when it is
> +easier for packages to meet it.
> +
>  .. [#]
>     See the file ``upgrading-checklist`` for information about policy
>     which has changed between different versions of this document.
> @@ -790,3 +812,7 @@ generate different binary packages).
>     often creates either static linking or shared library conflicts, and,
>     most importantly, increases the difficulty of handling security
>     vulnerabilities in the duplicated code.
> +
> +.. [#]
> +   This is Debian's precisification of the `reproducible-builds.org
> +   definition <https://reproducible-builds.org/docs/definition/>`_.

Thanks! Seconded.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
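The policy text quoted in this bug log reduces reproducibility to a concrete check: rebuild a package with the same inputs and confirm the resulting artifacts are bit-for-bit identical. A toy Python sketch of that final comparison (purely illustrative; the function names are invented here, and Debian's actual tooling for this is far more sophisticated):

```python
import hashlib
from pathlib import Path


def artifact_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def bit_for_bit_identical(build_a, build_b):
    """True iff the two build trees contain exactly the same
    byte-identical files -- the standard the patch above sets."""
    return artifact_digests(build_a) == artifact_digests(build_b)
```

Any difference at all, even a single embedded timestamp, makes the check fail, which is why the definition pins down the path, dependency versions, environment variables, and architectures.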

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#844431; Package debian-policy. (Sun, 13 Aug 2017 12:27:03 GMT) (full text, mbox, link).


Acknowledgement sent to Sean Whitton <spwhitton@spwhitton.name>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 13 Aug 2017 12:27:03 GMT) (full text, mbox, link).


Message #207 received at 844431@bugs.debian.org (full text, mbox, reply):

On Sat, Aug 12 2017, Ximin Luo wrote:
> Thanks! Seconded.

Just to be clear, we are waiting on one more second for the version
that refers to build and target architecture.

-- 
Sean Whitton



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sun, 13 Aug 2017 13:30:03 GMT) (full text, mbox, link).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sun, 13 Aug 2017 13:30:03 GMT) (full text, mbox, link).


Message #212 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Sat, Aug 12, 2017 at 03:34:35PM -0700, Sean Whitton wrote:
> Here is an updated patch addressing these.  I reworded it to use
> 'recommended' and changed the tone to better suit policy.
>
> Thank you Ximin, Russ and Johannes!
>
> > "precisification" -> "more precise version"
>
> Our definition is not actually a /version/ of the
> reproducible-builds.org definition -- that would imply that our
> definition could replace the reproducible-builds.org definition, like
> upgrading a package.
>
> 'precisification' means roughly "filling out the missing specification
> when it is appropriate to fill it out", which is what the r-p.org
> definition instructs distributors to do.
>
> diff --git a/policy/ch-source.rst b/policy/ch-source.rst
> index 127b125..6e32870 100644
> --- a/policy/ch-source.rst
> +++ b/policy/ch-source.rst
> @@ -661,6 +661,28 @@ particularly complex or unintuitive source layout or build system (for
>  example, a package that builds the same source multiple times to
>  generate different binary packages).
>
> +Reproducibility
> +---------------
> +
> +Packages should build reproducibly, which for the purposes of this
> +document [#]_ means that given
> +
> +- a version of a source package unpacked at a given path;
> +- a set of versions of installed build dependencies;
> +- a set of environment variable values;
> +- a build architecture; and
> +- a host architecture,
> +
> +repeatedly building the source package for the build architecture on
> +any machine of the host architecture with those versions of the build
> +dependencies installed and exactly those environment variable values
> +set will produce bit-for-bit identical binary packages.
> +
> +It is recommended that packages produce bit-for-bit identical binaries
> +even if most environment variables and build paths are varied.  It is
> +intended for this stricter standard to replace the above when it is
> +easier for packages to meet it.
> +
>  .. [#]
>     See the file ``upgrading-checklist`` for information about policy
>     which has changed between different versions of this document.
> @@ -790,3 +812,7 @@ generate different binary packages).
>     often creates either static linking or shared library conflicts, and,
>     most importantly, increases the difficulty of handling security
>     vulnerabilities in the duplicated code.
> +
> +.. [#]
> +   This is Debian's precisification of the `reproducible-builds.org
> +   definition <https://reproducible-builds.org/docs/definition/>`_.

seconded & thanks for these improvements!


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>:
Bug#844431; Package debian-policy. (Sun, 13 Aug 2017 14:51:03 GMT) (full text, mbox, link).


Acknowledgement sent to gregor herrmann <gregoa@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>, Sean Whitton <spwhitton@spwhitton.name>. (Sun, 13 Aug 2017 14:51:03 GMT) (full text, mbox, link).


Message #217 received at 844431@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]
On Sat, 12 Aug 2017 15:34:35 -0700, Sean Whitton wrote:
> diff --git a/policy/ch-source.rst b/policy/ch-source.rst
> index 127b125..6e32870 100644
> --- a/policy/ch-source.rst
> +++ b/policy/ch-source.rst
> @@ -661,6 +661,28 @@ particularly complex or unintuitive source layout or build system (for
>  example, a package that builds the same source multiple times to
>  generate different binary packages).
>
> +Reproducibility
> +---------------
> +
> +Packages should build reproducibly, which for the purposes of this
> +document [#]_ means that given
> +
> +- a version of a source package unpacked at a given path;
> +- a set of versions of installed build dependencies;
> +- a set of environment variable values;
> +- a build architecture; and
> +- a host architecture,
> +
> +repeatedly building the source package for the build architecture on
> +any machine of the host architecture with those versions of the build
> +dependencies installed and exactly those environment variable values
> +set will produce bit-for-bit identical binary packages.
> +
> +It is recommended that packages produce bit-for-bit identical binaries
> +even if most environment variables and build paths are varied.  It is
> +intended for this stricter standard to replace the above when it is
> +easier for packages to meet it.
> +
>  .. [#]
>     See the file ``upgrading-checklist`` for information about policy
>     which has changed between different versions of this document.
> @@ -790,3 +812,7 @@ generate different binary packages).
>     often creates either static linking or shared library conflicts, and,
>     most importantly, increases the difficulty of handling security
>     vulnerabilities in the duplicated code.
> +
> +.. [#]
> +   This is Debian's precisification of the `reproducible-builds.org
> +   definition <https://reproducible-builds.org/docs/definition/>`_.


Seconded.

Thanks to everyone for their work on this.


Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at/ - Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member of VIBE!AT & SPI, fellow of the Free Software Foundation Europe
   `-   
[signature.asc (application/pgp-signature, inline)]

Added tag(s) pending. Request was from Sean Whitton <spwhitton@spwhitton.name> to control@bugs.debian.org. (Mon, 14 Aug 2017 16:24:04 GMT) (full text, mbox, link).




Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified:Mon Aug 14 22:15:05 2017; Machine Name:beach

Debian Bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.


U.S. judge says LinkedIn cannot block startup from public profile data


SAN FRANCISCO (Reuters) - A U.S. federal judge on Monday ruled that Microsoft Corp's (MSFT.O) LinkedIn unit cannot prevent a startup from accessing public profile data, in a test of how much control a social media site can wield over information its users have deemed to be public.

U.S. District Judge Edward Chen in San Francisco granted a preliminary injunction request brought by hiQ Labs, and ordered LinkedIn to remove within 24 hours any technology preventing hiQ from accessing public profiles.

"To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers," Chen's order reads.

LinkedIn plans to challenge the decision, a company spokeswoman said.

“We’re disappointed in the court’s ruling,” the spokeswoman said. “This case is not over. We will continue to fight to protect our members’ ability to control the information they make available on LinkedIn.”

The dispute between the two tech companies has been going on since May, when LinkedIn issued a letter to hiQ Labs instructing the startup to stop scraping data from its service.

HiQ Labs responded by filing a suit against LinkedIn in June, alleging that the Microsoft-owned social network was in violation of antitrust laws. HiQ Labs uses the LinkedIn data to build algorithms capable of predicting employee behaviors, such as when they might quit.

The case is considered to have implications beyond LinkedIn and hiQ Labs and could dictate just how much control companies have over publicly available data that is hosted on their services.

Representatives of hiQ Labs did not reply to requests for comment.

Reporting by Salvador Rodriguez and Dan Levine; Editing by Tom Brown

The dwarfs of our vocabulary


I receive all kinds of questions about etymology. Unless they are responses to my posts, they usually concern slang and exotic words. No one seems to care about and, as, at, for, and their likes. Conjunctions and prepositions are taken for granted, even though their origin is sometimes obscure and their history full of meaning. In my work, I have dealt only with if and yet in detail and found both etymologies highly complicated.

The sorely neglected members of etymology.

One thing can be said with certainty about conjunctions and other so-called form words. Their change into connectives was a gradual process. Today they are usually very short. For instance, the Russian for “and” is i (the vowel as in Engl. it), while one of the prepositions for “at” is u (as in Engl. put). In the past, all such words were longer. The process of abridgment can sometimes be observed without any knowledge of historical linguistics. For example, the Scandinavian languages lost final n, so that the cognate of Engl. in there is just i. The English indefinite article goes back to the numeral one. It still has n before vowels (an apple), but before consonants only a remains (a pear).

A modern conjunction could once be an adverb. Such is the history of Engl. but, which not too long ago meant “outside” (it still does so in some British dialects). The Old English forms of but and about were būtan and abūtan respectively, so that their similarity needs no proof. Both words have lost their second syllable (-an). However, the root vowel of about preserved its length and became a diphthong by the Great Vowel Shift, whereas in būtan, which stood in an unstressed position, ū was shortened; hence the modern form but, as in shut, cut, and so forth. Būtan was a sum of a preposition (be or bi) and ūtan “out” (such sums are common: compare Engl. within and without). Therefore, it comes as no surprise that but, with its ancient reference to things “outside,” has turned into an adversative conjunction.

Even more instructive is the history of the words for “and.” The Latin for “and” was et, recognized from French, where it has stayed unchanged in spelling, and from et cetera. In Gothic (a Germanic language, recorded in the fourth century), the exact correspondence of et is iþ (þ has the value of th in Engl. thin). If Slavic ot– “from; against” has the same root, we witness another case of a symbiosis between a conjunction and a preposition. This iþ ~ id was all over the place in Old English, German, Saxon, and Icelandic, though in all those languages it had lost part of its independence, turned into a prefix, and functioned as the first element of compound words.

English has its reflex (continuation) in at least two words: in eddish “aftermath; stubble,” which was called edgrow in Middle English, and in eddy “a small whirlpool.” If eddy had existed in Old English, it would have sounded as edwǣg (wǣg “wave”: cf. German Woge), that is, approximately “wave and another wave.” Since eddy was recorded only in the fifteenth century, it might have been a borrowing from Scandinavian (Old Icelandic had iða, a close counterpart of eddy).

This is a whirlpool, or eddy, signifying the importance of the conjunction and.

We can now return to and. It looks like a cognate of Latin ante “before.” The word from which it was derived meant “across; separated; in front of.” It can be seen in Engl. end, whose Gothic cognate is andeis, and Old Icelandic enni “forehead.” Both the end and the forehead are indeed “at the front.” German und had many variants, anti, enti, and inti among them. It would be easy to refer this plethora of vowels to ablaut and in one case to umlaut (in enti, final i would have turned a into e, but Icelandic had en ~ enn, a safe cognate, and there was no i in it).

Why did Old High German need so many variants of such an outwardly simple word? Full-length essays and even a book have been written about this conjunction. Characteristic of the mess one encounters in dealing with form words is also the multitude of senses Middle High German unde ~ und ~ unt had: “and; likewise; but; meanwhile; namely; as; as long as; which.” In translating the great Middle High German poems, one often wonders what this short word means.

In one’s own language, the speaker rarely notices such a lack of precision. Consider Engl. as “to such a degree; according to; when; because” (as soon as possible; as I said; as the night was drawing on…; as there is no quorum….). Something along the same lines can be said about since (it has been years since we met; since so many people are absent…). Usually the context disambiguates the message. Yet even in a living language trouble is not always excluded. German wenn means “when” and “if.” Therefore, wenn ich komme means either “when I come” or “if I come.” This is rather inconvenient, but unschooled people are not even aware of the problem. Engl. as is derived from alswā, that is, “also.” We see the same scenario occurring again and again. It appears that every conjunction, a short connective, reduced to one or two sounds, was once a full-fledged word. Repeated hundreds of times in the role of a link and occurring in an unstressed position, it would lose its weight and become a syntactic ligament.

Depressed by the vagaries of English syntax.

However simple the word and may seem, old languages often distinguished between the connective link between words and a synonymous link between clauses. A special particle meaning “and” might be added to words and words only. Such are Latin –que and Gothic –uh. The particle appended to the end of a word is called enclitic. Old Germanic (especially Old Icelandic) is full of enclitics. In the earliest stages of the Indo-European languages, subordinating conjunctions were rare. It is easier to say something, add and, and go on (the way children recount an episode they have just seen; this system is called parataxis). Etymology shows how notional words gradually turned into connectives, so that as, since, etc. emerged; how prepositions acquired the role of conjunctions (consider Engl. for indicating purpose—for you, side by side with for “because”) and how our modern system of hypotaxis, with its plethora of subordinating conjunctions, came into being. Characteristically, in Old Icelandic, the conjunction er meant “as, when, which, etc.”: its sole function was to show that a subordinate clause is setting in. Modern readers often wonder how to interpret such sentences.

Parataxis.

We can sometimes guess the origin of a conjunction by looking at it. Thus, because is obviously be and cause. But short words may need reinforcement, and this is how languages produce monsters like whatsoever, notwithstanding, insofar as, and inasmuch as. The last of them troubled Winnie the Pooh’s friend Eeyore, who spent some time ruminating on its meaning: inasmuch as what? We may leave the melancholy creature to its own devices but note that the history of conjunctions shows how human thought produced abstract concepts (from “the front part” to the additive and and again from “in addition to” to and), how it encouraged prepositions and conjunctions to exchange hostages, and how, to remedy the confusion, it made people coin long and unwieldy phrases, some of which can depress even a toy donkey.

Image credits: (1) “Snow white 1937 trailer screenshot (2)” by Petrusbarbygere, Public Domain via Wikimedia Commons. (2) “The Corryvreckan Whirlpool” by Walter Baxter, CC BY-SA 2.0 via Wikimedia Commons. (3) “Eeyore” by Christene S., CC BY 2.0 via Flickr. (4) “Caught Reading” by John Morgan, CC BY 2.0 via Flickr. Featured image: “Imp, Santa Claus, Dwarf” by brisch27, Public Domain via Pixabay.

Bezos should put his billions in public libraries


Susan Crawford is a columnist for Backchannel and a professor at Harvard Law School. She is also the author of The Responsive City and Captive Audience.

———

Sign up to get Backchannel's weekly newsletter.

Imagine that you are Jeff Bezos. For four hours two weeks ago, you were the richest person in the world. And though Wall Street knocked you down a notch, pretty much everyone thinks it’s inevitable that you’re going to be number one again. You’re starting to be aware of the smell of the tar pits and you’re casting about for a way to put all that loot to some good. You're eyeing the Gates-Buffett Giving Pledge and thinking that if you donate half your fortune it should make a difference. You're comfortable with making older but meaningful institutions great again.

So far, you’ve concentrated on things that might benefit our distant successors—space travel, cancer treatments, AI, and a clock that will keep running for 10,000 years. But you want to do something more immediate. You say you want your philanthropic activity “to be helping people in the here and now—short term—at the intersection of urgent need and lasting impact.” You are open to suggestions–so much so that you even recently tweeted a “request for ideas.”

Though you don’t mention it, I suspect you’re thinking of stepping into an area that government traditionally might have addressed—but now, in an era in which the wealthy are doing better and better, benefits seem to go toward the top while the “urgent needs” of just plain people are left to the grace of a harsh marketplace. Like it or not, citizens are increasingly dependent on the kindness of strangers with billions of dollars.

I have a suggestion for you, Jeff Bezos. How would you like to become the Andrew Carnegie of our time?

Yes, I am talking about libraries. Those places where books sit on shelves, not delivered by FedEx. And so much more. Carnegie made them the center of his philanthropy, and almost became synonymous with them. More importantly, he changed countless lives with his investments in libraries. I have heard that you’re looking for big ideas, and this is one.

Today, local libraries are thought of as slightly retro public institutions. For some reason, major donors don't get excited about them. OK, there are some notable exceptions to this rule—in my adopted city of New York, for instance, Stephen Schwarzman has his name engraved on the main branch building of the public library; in Kansas City, the Kemper family has donated millions to the downtown branch and a Kemper scion, R. Crosby Kemper III, has been the executive director of the library for more than 12 years.

But the real impact—the one that changes lives and transforms communities—has yet to be made. It turns out that libraries are the very model of the more-than-shovel-ready, here-and-now, urgent-need-and-lasting-impact places that you as a tech philanthropist claim to be interested in supporting in a big way. And libraries’ needs are dire.

You, Mr. Bezos, may not have been inside a library in a while. Things have changed. Today, libraries are serving as essential civic places. Trusted by every part of American society, they're the only noncommercial places other than city squares where people meet across genders and ages. They provide all kinds of services and programming—just visit the glorious Madison, WI Central Library, where a first-rate makerspace is under the same LEED-certified roof as local service agencies helping people sign up for health care and food assistance.

Librarians are not shushing people, and libraries are no longer only silent cathedrals for solo reading. (They still have reading rooms—don't worry.) Instead, these great pieces of civic architecture are being repurposed: They're places that offer classes in computer skills and thousands of other subjects, provide internet access to millions of Americans who can't afford it, and host innumerable neighborhood meetings.

Libraries these days are providing meals to kids and adults through local food banks, working with local immigrant agencies, offering homework help, and loaning out an amazing array of things, from musical instruments to microscopes. (Yes: the Library of Things.) What they're up to is dazzling. And in 2013, 94 percent of Americans said that having a public library improves the quality of life in a community. As America gets older and more unequal, its people need new forms of education to thrive—and libraries are ground zero for every public value the country cares about.

The American Library Association says that America’s more than 120,000 public, school, academic, and special libraries are visited more than 1.4 billion times a year by hundreds of millions of Americans in every corner of the nation and from every walk of life. They complement but do not compete with your mighty commercial bookselling venture, Mr. Bezos. At the same time, libraries are chronically under-resourced. Limited hours. Limited staff. Low pay. Constant need for renovation. Overcrowding.

Libraries are attempting to serve people in an era of thin government support, increasing need, and staggering inequality—much like the era that gave us Andrew Carnegie. His response to the problems of his time was to build thousands of public libraries across the country, starting in 1886. Most of those beloved community libraries are still functioning. Carnegie aimed high, wanting to make the world better than he found it. And he succeeded.

Here's the twist in the story that you, Mr. Bezos, may not know: Carnegie's money was given on the condition that local public authorities step up with pledges to support and maintain the institutions that he launched. For Carnegie, this structure fit with the idea that communities were being helped to help themselves—a pillar for him. Many cities turned down Carnegie's offer, and later regretted it.

Mark Wilson/Getty Images (L) and Bettmann/Getty Images (R)

If you are looking to have your name be kept alive in the memories of generations—or if you simply want a legacy worthy of the fortune you have reaped—you don't need to start something new or even have it named after you. (You didn't rename the Washington Post, either, and yet it's becoming one of the handful of great news sources in the world.) Hidden in plain sight, the local libraries of America are patiently waiting for your attention. (They're also often really beautiful spaces, and I can tell that you like design. Just down the street from your headquarters is Rem Koolhaas’s terrific Seattle main library, with areas named after donors and relatives of Paul Allen, Microsoft, Charles Simonyi, and Boeing.)

Whether or not the local library a random American uses today was actually built by Carnegie, he or she knows what that philanthropist did. More important, any philanthropist who wants a glimpse of what his money can do need only look at what Carnegie's accomplished.

Tragically, the federal government and the states are constantly cutting back on library funding. You would almost think that politicians don’t want members of the public to have access to the very knowledge that would lead them to make informed decisions! But those politicians are ignoring the fact that libraries are citadels of civilization and economic ladders for those otherwise stuck on the bottom rungs. Why not use the lever of your money, Mr. Bezos, to spur public authorities to do their part? Just like Carnegie did. It is hard to imagine a better use of billions.

On Melissa O’Neill’s PCG random number generator


Computers often need random numbers. Most times, random numbers are not actually random… in the sense that they are the output of a mathematical function that is purely deterministic. And it is not even entirely clear what “really random” would mean. It is not clear that we live in a randomized universe… it seems more likely that our universe is deterministic but that our limited access to information makes randomness a useful concept. Still, very smart people have spent a lot of time defining what random means, and it turns out that mathematical functions can be said to produce “random” outputs in a reasonable sense.

In any case, many programmers have now adopted a new random number generator called PCG, designed by Professor Melissa O’Neill of Harvey Mudd College.

What O’Neill did is quite reasonable. She asked herself whether we could produce better random number generators, wrote a paper, and published code. The result was quickly adopted by engineers worldwide.
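To give a sense of how small the generator is, here is a Python sketch of the popular pcg32 variant: a 64-bit linear congruential state transition followed by an "XSH-RR" output permutation (an xorshift of the high bits, then a data-dependent rotation). The constants follow the public reference implementation; treat this as an illustration, not O’Neill’s official code:

```python
MULT = 6364136223846793005   # LCG multiplier from the pcg32 reference code
MASK64 = (1 << 64) - 1


class PCG32:
    """Minimal sketch of pcg32 (64-bit LCG state, XSH-RR output)."""

    def __init__(self, seed, seq=0):
        # The stream selector must be odd; this mirrors the reference seeding.
        self.inc = ((seq << 1) | 1) & MASK64
        self.state = 0
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self):
        old = self.state
        # LCG state transition
        self.state = (old * MULT + self.inc) & MASK64
        # XSH-RR: xorshift the high bits, then rotate by the top 5 bits
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF
```

The design point is that the cheap output permutation scrambles the weak low bits of the underlying LCG, which is where much of the statistical quality comes from.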

She also submitted her paper for consideration in what I expect to be a good, well-managed journal.

Her manuscript grew lengthy over time and perhaps exceeded some readers’ stylistic sensibilities; she justifies herself in this manner:

I prefer to write papers that are broadly accessible. I’d rather write a paper that can be enjoyed by people who are interested in the topic than one that can only be understood by a tiny number of experts. I don’t agree with the philosophy that the more impenetrable the paper, the better the work must be! Describing desirable qualities in detail seemed to be necessary for the paper to make sense to anyone not deeply entrenched in the field. Doing so also seemed necessary for anyone in the field who only cared about a subset of the qualities I considered desirable—I would need to convince them that the qualities they usually didn’t care about were actually valuable too.

As I pointed out, she had a real-world impact:

While attending PLDI and TRANSACT in June of 2015, I got one of the first clues that my work had had real impact. I can’t remember the talk or the paper, but someone was saying how their results had been much improved from prior work by switching to a new, better, random number generator. At the end I asked which one. It was PCG.

Meanwhile, at least one influential researcher (whose work I respect) had harsh words publicly for her result:

I’d be extremely careful before taking from granted any claim made about PCG generators. Wait at least until the paper is published, if it ever happens. (…) Several claims on the PCG site are false, or weasel words (…) You should also be precise about which generator you have in mind—the PCG general definition covers basically any generator ever conceived. (…) Note that (smartly enough) the PCG author avoids carefully to compare with xorshift128+ or xorshift1024*.

Her paper was not accepted. She put it in those terms:

What was more interesting were the ways in which the journal reviewing differed from the paper’s Internet reception. Some reviewers found my style of exposition enjoyable, but others found it too leisurely and inappropriately relaxed. (…) An additional difference from the Internet reaction was that some of the TOMS reviewers felt that what I’d done just wasn’t very mathematically sophisticated and was thus trivial/uninteresting. (…) Finally, few Internet readers had complained that the paper was too long but, as I mentioned earlier, the length of the paper was a theme throughout all the reviewing. (…) Regarding that latter point, I am, on reflection, unrepentant. I wanted to write something that was broadly accessible, and based on other feedback I succeeded.

I emailed O’Neill questions a couple of times, but she never got back to me.

So we end up with this reasonably popular random number generator, based on a paper that you can find online. As far as I can tell, the work has not been described and reviewed in a standard peer-reviewed manner. Note that though she is the inventor, nothing precludes us from studying her work and writing papers about it.

John D. Cook has been doing some work in this direction on his blog, but I think that if we believe in the importance of formal scientific publications, then we ought to cover PCG in such publications, if only to say why it is not worth consideration.

What is at stake here is whether we care for formal scientific publications. I suspect that Cook and O’Neill openly do not care. The reason you would care, fifty years ago, is that without the formal publication, you would have a hard time distributing your work. That incentive is gone. As O’Neill points out, her work is receiving citations, and she has significant real-world impact.

At least in software, there has long been a relatively close relationship between engineering and academic publications. These do not live in entirely separate worlds. I do not have a good sense as to whether they are moving apart. I think that they might be. Aside from hot topics like deep learning, I wonder whether the academic publications are growing ever less relevant to practice.

Launch HN: Thematic (YC S17) Customer Feedback Analysis via NLP

Hi! I’m the CEO of Thematic, http://www.getthematic.com. We analyse customer feedback to tell companies how to increase customer satisfaction and reduce churn.

We are one of the handful of companies that got into YC through the Startup School, and (I have to say) the only company that signed YC itself as a customer!

I have a PhD in NLP and ML and was consulting when two large media companies came to me with a problem: They collect tons of customer feedback in free text as part of their NPS surveys, but don’t have the time to sift through the responses.

This turned out to be common. Most companies collect feedback but, especially in large companies, nobody reads this data, and definitely not people who are in charge of strategy. Customers are screaming what’s wrong and what they want, but nobody is listening.

I tried a few open-source packages but found that none worked well. Developed on canonical text like news articles or Wikipedia, they either failed to understand the variety of expressions or were too hard to explain. I wrote a new approach capitalising on my PhD and new Deep Learning approaches. It's completely unsupervised: it just needs raw data but, unlike topic modelling, produces clear and specific themes. My husband Nathan joined as a co-founder and for the next year we learned how to solve this problem in a way customer insights professionals find valuable.
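Thematic's actual algorithm is proprietary, but the contrast with topic modelling can be made concrete with a toy example: where a topic model returns opaque word distributions, even a crude unsupervised bigram count over raw responses yields readable, specific labels. The code below is purely illustrative and is not Thematic's method; the stopword list and function names are made up for the sketch:

```python
import re
from collections import Counter

# Tiny illustrative stopword list (a real system would use a proper one).
STOPWORDS = {"the", "a", "an", "is", "to", "and", "was", "i", "it", "of"}


def candidate_themes(responses, top_n=3):
    """Count two-word phrases containing no stopwords.

    Crude, but the output is a human-readable phrase like
    'customer support' rather than a topic's word soup.
    """
    counts = Counter()
    for text in responses:
        words = re.findall(r"[a-z']+", text.lower())
        for a, b in zip(words, words[1:]):
            if a not in STOPWORDS and b not in STOPWORDS:
                counts[f"{a} {b}"] += 1
    return [phrase for phrase, _ in counts.most_common(top_n)]
```

On a handful of survey responses complaining about support, the top candidate comes out as the phrase people actually wrote, which is the property insights teams care about.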

Those media companies became customers and we quickly bootstrapped into a profitable startup. This is when Nathan signed up for YC’s Startup School. We grew 20% in those 10 weeks, loved the accountability and the focus. Our mentor suggested we apply for YC, which seemed like a crazy idea, but we gave it a go.

Fast-forward another 2 months, and we are just before Demo Day! Thematic grew 3x in that time, and we are working with brands like Vodafone, Air New Zealand, Stripe, Ableton, and Manpower Group.

Hope you found our story interesting, and happy to answer any questions.

Learnings from One Year of Building an Open Source Elixir Application


Additionally, Elyxel will be open sourced in the hope that it might be beneficial to anyone trying to learn from what I did. This piece is geared towards someone who is familiar with the fundamentals of programming and general web development.

Avoiding Design Limbo

I wanted to start with a fairly unoriginal, simple concept that was familiar to me. At the time, solving the well-defined problem of community software was particularly appealing. I was getting fatigued by the constant negative rhetoric that seemed to dominate the conversation on forums I frequent. That, coupled with a desire to learn new technology, was all the kindling I needed to get started.

The actual design of the site took a minimal amount of time. I opted for a simple, relatively sparse interface. On past projects I would spend a lot of time up front on the design phase. It's easy (and honestly quite fun) to get lost in the design process.

Final 1.0 design live

Elixir & Phoenix

To gain better leverage, I decided to use this spare time as an opportunity to learn a new programming language and framework. The sheer number of ways to build a modern web application was staggering. But Elixir and Phoenix came out on top after some research: both were fast, actively developed, and, most importantly, easy to get started with.

There are so many different ways to accomplish the same thing, and to be honest it was quite difficult cutting through the noise. I opted to remove as many elements as possible, one of the major ones being a front-end JavaScript framework. Static pages were plenty fast for my use case. A vital part of the process is whittling down the problem to its core.

After settling on my choice, I knew I needed a server to host the finished application. Instead of using an automated service like Heroku, I set about learning how to provision and set up a small virtual private server (VPS). Having gone through the process, I've gained a greater appreciation and understanding of the underlying infrastructure. I highly recommend doing it at least once.

If you're interested in doing the same, I've listed the major steps of the process below. The entire setup took about a month of learning and understanding best practices. I had to cut some of these explorations short, as each thread I pulled on unearthed a hundred more. There's a staggering amount of low-level tech built by brilliant folks that we rely on every day.

  1. Installed Ubuntu
  2. Learned about Ubuntu's file system layout
  3. Set up passwordless login via SSH
  4. Installed NGINX
  5. Set up a reverse proxy pointing to the Phoenix application
  6. Enabled gzip support to improve performance
  7. Installed htop for monitoring
  8. Pointed the elyxel.com domain to the server
  9. Set up SSL via Let's Encrypt
  10. Secured NGINX on Ubuntu
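As a rough illustration of steps 4-6, a reverse-proxy server block might look something like the sketch below. This is an assumption-laden example, not the actual Elyxel config: the domain is taken from the post, and port 4000 is simply the Phoenix default.

```nginx
# Illustrative NGINX server block (not the actual Elyxel config).
# Phoenix serves HTTP on port 4000 by default.
server {
    listen 80;
    server_name elyxel.com;

    # Step 6: compress text responses to improve transfer times
    gzip on;
    gzip_types text/plain text/css application/json application/javascript image/svg+xml;

    # Step 5: hand requests off to the Phoenix application
    location / {
        proxy_pass http://127.0.0.1:4000;
        proxy_http_version 1.1;
        # Forward WebSocket upgrade headers so Phoenix channels keep working
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```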

Diving In To The Deep End

There are quite a few ways to approach learning within a new domain. What works best for me is starting out by reading through some of the foundational principles. Once I have a rudimentary understanding, I try to quickly apply what I just learned to solve a related problem. If I skip that step, I usually don't make enough meaningful connections in my brain to remember any of it.

Building the application consisted of two major learning steps: first, learn enough Elixir, which meant wrapping my head around functional programming and the syntax; second, learn Phoenix, which was similar enough to Rails that it didn't require as much effort.

There was an upfront cost of having to learn Elixir before I could build anything. While the syntax was familiar there were a few new concepts to grapple with due to Elixir being a functional programming language.

Once I had the foundation, learning the Phoenix framework was fairly smooth. The documentation on the Phoenix project website served as the backbone of the steps I took to write the application. While it was a good overview, some of the challenges I faced required looking up supplementary material. Luckily, any gaps in knowledge were filled by shorter guides and pieces written by intrepid early adopters. I've included links to some of the most useful ones below.

After gaining some familiarity with the chosen tool set, I knew I needed to break the project down into smaller milestones, teasing out the critical features needed for version 1.0. This is the whittled-down list I came up with:

  • Written draft of the problem
  • Sketch & wireframes
  • Server provisioning
  • Landing page
  • Signup flow
  • Invite Flow
  • Application functionality
    • Top page
    • Recent page
    • Submit page
    • Comments
    • Profiles
    • Voting mechanism
    • Sentiment analysis
    • Curated stories
    • Content fire hose

Once I had an outline, it was a matter of building things piece by piece. The main challenge that kept coming up was the lack of robust best practices for features I was trying to build. In hindsight this was actually favorable because it ended up being a rewarding challenge to figure out things on my own when I did get stuck. I wouldn't recommend this process if you're under a deadline, but if the goal is learning then it definitely is valuable.

Another learning constraint was picking simpler libraries to utilize. The goal was to be able to understand the tools I was using and avoid cruft. We've all heard the horror stories of including some mega complex library to achieve a relatively simple goal.

Code Highlights

To avoid getting overly prescriptive, I'm only going to describe a few of the interesting challenges of the project.

The login system was particularly tricky even though I used a simple library called openmaize. It worked well, but throughout the build I found myself getting increasingly paranoid about missing some big security feature and leaving myself exposed to an unforeseen vulnerability.

For the less tricky parts I ported over some code from Lobsters, another Rails-based community site. Their code base seemed well built and accessible. One example is the particularly clever bit of code below that creates human-readable timestamps.

defmodule Elyxel.Time do
  @moduledoc """
  Time Helper
  """

  epoch = {{1970, 1, 1}, {0, 0, 0}}
  @epoch :calendar.datetime_to_gregorian_seconds(epoch)

  def elapsed(time) do
    now = :os.system_time(:seconds)
    past_time =
      time
      |> Ecto.DateTime.to_erl
      |> :calendar.datetime_to_gregorian_seconds
      |> -(@epoch)
    now - past_time
  end

  def simple_time(time) do
    seconds = elapsed(time)

    cond do
      seconds <= 60 -> "#{seconds}s"
      seconds < (60 * 60) -> "#{round(seconds / 60.0)}m"
      seconds < (60 * 60 * 48) -> "#{round(seconds / 60.0 / 60.0)}h"
      seconds < (60 * 60 * 24 * 30) -> "#{round(seconds / 60.0 / 60.0 / 24.0)}d"
      seconds < (60 * 60 * 24 * 365) -> "#{round(seconds / 60.0 / 60.0 / 24.0 / 30.0)}mo"
      true -> "#{round(seconds / 60.0 / 60.0 / 24.0 / 365.0)}y"
    end
  end
end

Most implementations of rating I found followed a similar pattern. I opted for the version below. It will be interesting to watch how this evolves as the community grows.

defmodule Elyxel.Rating do
  @moduledoc """
  Rating Helper
  """

  import Elyxel.Time

  def calculate_rating(pluses, comments, time) do
    comment_weight = 0.2 # Comments carry a little weight
    gravity = 1.5 # Rating decreases much faster for older items if gravity is increased
    amplifier = 10000 # Surfaces buried significant digits

    round(((pluses + (comments * comment_weight) - 1) / (:math.pow(((elapsed(time) / 60 / 60) + 2), gravity))) * amplifier)
  end
end

I resisted the temptation to use a library for pagination. Here is the simple solution I cobbled together.

defmodule Elyxel.Pagination do
  @moduledoc """
  Generic ecto pagination helper
  """

  import Ecto.Query
  alias Elyxel.Repo

  def page(query, page: page, per_page: per_page) do
    scrub_page = page |> scrub
    count = per_page + 1
    result =
      query
      |> limit(^count)
      |> offset(^(scrub_page * per_page))
      |> Repo.all

    %{
      has_next?: (length(result) == count),
      has_prev?: scrub_page > 0,
      current_page: scrub_page,
      list: Enum.slice(result, 0, count - 1)
    }
  end

  defp scrub(page) do
    cond do
      is_integer(page) && (page >= 0) -> page
      is_binary(page) ->
        case Integer.parse(page) do
          {page, _} -> if (page < 0) do 0 else page end
          :error -> 0
        end
      true -> 0
    end
  end
end

Performance

Elyxel was designed and built with performance in mind. Styles and any additional flourishes were kept to a minimum. My choice of Elixir & Phoenix was driven by this consideration as well. Most of the pages are well under 100 kilobytes and load in less than 100 milliseconds. I find it's always helpful to keep performance in the back of your mind when building something.

I achieved this by keeping within certain design constraints from the start. The first was using a system font stack. In recent years, most operating systems have shipped with a robust set of default fonts. Not including custom typography saves the cost of sending it over the wire and reduces browser render time.

/* This typographic stack should work well across most platforms */
--system-fonts: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen,
  Ubuntu, Cantarell, "Open Sans", "Helvetica Neue", sans-serif;
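A custom property only takes effect once it is referenced somewhere; assuming the declaration above sits inside a `:root` block, applying it looks like this:

```css
/* Assumes --system-fonts is declared on :root so it cascades everywhere */
body {
  /* The browser walks the stack until it finds a font the platform ships */
  font-family: var(--system-fonts);
}
```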

Furthermore, all assets were vector (SVG) and all animations were created with CSS. I didn't use a JavaScript framework because it didn't feel necessary for the task. Coupled with the speed of Phoenix and Elixir, these small compromises improved speed in aggregate.

Fatigue and Completion

One thing that stuck with me throughout this project was how challenging it was to work a full day and then spend a few extra hours at night chipping away at Elyxel, particularly because my day job involved really interesting, rewarding work as well. There were so many times when I was too exhausted to work on Elyxel after coming home, even though I was excited to. I always admire folks who are able to do both consistently well.

In his writings, a wise Italian says that the best is the enemy of the good.

To keep moving along I had to adopt the mantra, popularized by Voltaire, of not letting perfect be the enemy of good. There are endless ways to iterate on and improve something, but for the sake of progress I had to call whatever I was working on done, even though I knew that with more time I could improve it significantly. I've gotten better at it, but it's still an uncomfortable decision most of the time.

The good news is I ended up finishing version 1.0 in January but have since stopped working on it. This is partially due to a massive project at work taking up head space and generally wanting to take a break.

What's next?

Truthfully, I'm not sure. Now that it's out there, I don't quite know what to do with it. I learned a ton from building out Elyxel, and it helped me grow skills that directly apply to my career. Ideally, with a little more work, it becomes a small, growing community. If you've read through all of this, you're the kind of person I made Elyxel for. If you'd like to join, please don't hesitate to send me an e-mail.

At the end of the day, I find the real delight comes from the moment you experience the vending machine whirring and clicking to life as it dispenses your reward with a satisfying thud.

Pointers

Here is a supplemental collection of links that I have read through and have helped throughout the process of building this app:

Dev Ops

Elixir

Phoenix

Thanks to Rich, Jamie, and Jake for reading and providing invaluable advice.

If you liked this story, you might enjoy my article on prototyping an ambient notification cube. And if you think I've missed anything, please let me know.


Ev Williams helped create the open web, now he’s betting against it (2016)


To a certain kind of nerd, Ev Williams is the Forrest Gump of internet media. Williams helped write the software that made us call blogs blogs. He founded a podcast company years before most people listened to them. He sent Twitter’s 75th tweet, then ran the company. And now he’s the founder and CEO of Medium, the platform for online writing embraced by sportswriters, Silicon Valley executives, and the President of the United States.

Of all the American internet industry’s critical events (other than that fateful night in Mark Zuckerberg’s dorm room), odds are good that Williams was there or knew someone present. So at 9 a.m. on a Tuesday in March, as he sweeps down the stairs into the BART station at 16th and Mission—into the fast-transit artery of the country’s most technologized region—you might expect someone to recognize him. But as he gets on a downtown train, no one turns a head.

Despite serving as a board member at one of the five largest social networks, and a mainstay of the Bay Area tech industry for almost two decades, the kind of fame attached to the names of Mark Zuckerberg, Peter Thiel, or the “Google guys” has eluded Williams. He’s maybe even sub-Travis Kalanick now.

Yet his run near the top has been remarkably consistent. While other CEOs in his early-web cohort have left the industry, or have become writers or consultants, Williams has stuck around, leading companies. His startups have nearly all specialized in the same abstract medium: text boxes. He has dotted the web with these text boxes, and people have poured their souls into them, have argued and wept and whispered into them. Millions of people have had their worldview shaped by these text boxes, and the boxes themselves have, in turn, changed the Internet. They have also made Williams rich. Though few of his businesses have turned a profit, he is a billionaire.

I met him in a cafe on Valencia Street, an old punk and immigrant district in San Francisco now lined with spartan boutiques, ethical taxidermy shops, and other beacons of mass-appeal hipsterism.

Williams looks the tech-CEO part. He is tall, soft-spoken, with a constant air of chilled-out concern. His gray hoodie and black t-shirt are woven from some athleisure Star Fleet-issue textile, and he wears broad, squarish white glasses that I internally dub the Warby Mugatu. Within minutes of arriving, he has launched back into his endless theme, which he expands on across multiple meetings, on two different coasts, across three months: “The open web,” he says, “is pretty broken.” But don’t worry—he has a plan to save it, or, at least, sort of save it. And it involves text boxes.

The open web is the nickname for the internet as it should be—free, uncensorable, and independently owned and operated. According to the blog posts that hashed out most of its theory (and which themselves were published on the open web), the open web describes an internet where people mostly publish their writing (or music, or photos, or films) to servers that they own or rent, accessible via their own personal domain names, in formats that are themselves free or unrestricted. It is the web because the pages are written in HTML and CSS; it is open because anyone can access almost all of it, without special privileges, expenditures, or a user account. Above all, the open web is free—free like language is free, like consciousness is free. Freedom not so much as a right, but as a technical and inalienable fact.

This liberty has an end goal: to turn the web into the finest, coolest piece of media ever created, a library of libraries authored by all of humanity. This web encompasses novels and newspapers and scientific journals, all at once. Anyone can write for it, and anyone can read it. It is a to-do list, a logbook, a work of literature, and a communication tool so powerful that it could abort war.

This is a vision for the web that sounds both very similar to and very distant from the web that you and I use everyday. Our web, after all, contains unhappy news, garish advertising, unsympathetic grandstanding, and a lot of photos of other people’s kids. All this clutter reaches us after being shunted through social networks, which (the idealists lament) are effectively shut off from the rest of the network. The follow-on effect from these networks is even worse: Cookies tied to those same user accounts surveil us as we read across the open web, then a mysterious algorithm uses this collected browsing history to decide how to distract us with ads. The open web is pretty broken indeed, and this isn’t even getting into spam, mass harassment, identity theft, and digital espionage.

“There’s still a bunch of stuff on the web. The stuff we read everyday, the stuff you write, is on the web. And that’s great,” says Williams. (In fact, you are reading this very story on the open web—unless you found it on the Facebook app on your phone, in which case you are reading a copy nearly identical to the open-web version of the story, except that yours loaded much faster and lives on Facebook’s servers.) “There’s still the fact that anyone, at any time, can create their own website and start publishing, and they have a voice—I mean that’s the idea that I got really excited about almost 20 years ago.”

“I think that will continue. I think the openness of voices is not going to consolidate back to the old days of media,” he told me. “I think the distribution points are going to consolidate.”

The distribution points are the search engines and the social networks: Facebook, Google, Twitter, Snapchat, and the messaging apps. Also on that list are YouTube (owned by Google), Instagram (owned by Facebook), Whatsapp (also owned by Facebook), and Facebook Messenger (ditto). By linking the web together, or hosting normally data-heavy content for free, these distribution nodes seize more and more users. And because each of the nodes is more interesting than any one individual’s personal site, people who used to go to personal sites wind up at the nodes instead.

As Williams puts it: “Primarily what we’ve seen is that the social networks have gotten really, really big, and they drive more and more of our attention.” With this size, they also collect more revenue: 85 cents of every new dollar in online advertising went to Google or Facebook in early 2016, according to a Morgan Stanley analyst quoted by The New York Times.

“That could be bad,” says Williams, in his low-key way.

The open web’s terminal illness is not a story that he alone is telling. It is the common wisdom of the moment, espoused by Times columnists and longtime tech bloggers. The developers who wrote Drupal and WordPress, two important pieces of blogging software, both recently expressed anxiety over the open web’s future. Since so many of these social networks are operated by algorithms, whose machinations are proprietary knowledge, they worry that people are losing any control over what they see when they log on. The once-polyphonic blogosphere, they say, will turn into the web of mass-manufactured schlock.

Something like this has happened before. Tim Wu, a law professor at Columbia University, argues in his book The Master Switch that every major telecommunications technology has followed the same pattern: a brief, thrilling period of openness, followed by a monopolistic and increasingly atrophied closedness. Without government intervention, the same fate will befall the internet, he says. Williams cites Wu frequently. “Railroad, electricity, cable, telephone—all followed this similar pattern toward closedness and monopoly, and government regulated or not, it tends to happen because of the power of network effects and the economies of scale,” he told me.

Williams and his team at Medium say they are working to resist this consolidation, though they are not doing quite what anyone else would recognize as resistance. The truth is that they themselves want to consolidate some of the web, too; and then—with that task done—govern as just, beloved, and benevolent despots. Josh Benton, a media critic at Harvard, once described Medium as “YouTube for prose,” and that’s an apt summary of what it feels like to use. But as I spend more time with Ev, I catch him thinking of Medium as a project philosophically akin to the “Foundation” novels by Isaac Asimov. The heroes of those books sought to centralize all the learning across the galaxy before a dark age set in, knowing that though they could not stop the shadowed era, they might be able to preserve scholarship and thereby shorten it. Ev’s ambitions, though not as grandiose, follow similar lines. Medium seeks to replicate the web’s old, chaotic hubbub on a single, ordered site—because, ultimately, Ev values the chaos.

Alex Fine

In the spring of 2000, a developer and designer in San Francisco named Meg Hourihan was surveying the city’s swelling ranks. New coders had come from all around the world to her city, to work on internet projects, and they were crowding her favorite haunts. She loved the web, and she was excited to see it catch on with a broader public—but she could not get excited about the hordes descending on the city.

“I realized there are dot-com people and there are web people,” she wrote on her blog at the time. “Dot-com people work for start-ups injected with large Silicon Valley coin, they have options, they talk options, they dream options. They have IPOs. They’re richer after four months of ‘web’ work than many web people who’ve been doing it since the beginning. They don’t have personal sites. … They don’t get personal.”

She continued. “Web people can tell you the first site they ever saw, they can tell you the moment they knew: This, This Is It, I Will Do This. And they pour themselves into the web, with stories, with designs, with pictures. They create things worth looking at, worth reading, worth coveting, worth envying, worth loving.”

At the time, Hourihan was co-founder of a small company named Pyra Labs. Her co-founder was Williams. They were both web people.

Born in 1972, Williams grew up on a farm about 90 minutes from Lincoln, Nebraska. For a long time, he didn’t stray far. He stayed in-state for school, going to the University of Nebraska. But sensing the internet’s enormous potential, he dropped out, preferring to try his luck with tech ventures funded by his parents’ money. One of his companies sold a CD-ROM with information about that year’s Cornhuskers team. Another distributed a video about how to connect to the internet.

But by the time he was 24, Williams realized he would have to leave the plains to work on the net. He moved to Sebastopol, California, to work at O’Reilly Media. O’Reilly publishes dead-tree books—programming manuals and standards guides—that held biblical importance to ’90s coders. “When viewed from Nebraska, Sebastopol looks like it’s in exactly the same spot as San Francisco,” he would write later. “In actuality, it’s about an hour away and feels like a very different place.”

He stayed there for several years nonetheless. But he was right about San Francisco. It was there that he met Hourihan. They discovered their mutual admiration for the web, briefly dated, and ultimately founded Pyra Labs together in 1999. Pyra never actually shipped its namesake software, a suite of office collaboration tools, but in the offing it managed to build Blogger, the first simple web-journaling software to find a massive user base. Blogger also helped popularize the word blog.

Williams and Hourihan had tremendously unlucky timing. Blogger got big just as the first dot-com bubble popped. The company wasn’t expensive to run, but with VCs going bankrupt right and left, no one could find the money to fund it. It missed payrolls. Its leaders fought about the right path. It laid off employees. In January 2001, Hourihan resigned, and everyone else at the company walked out. (Hourihan later founded Kinja, Gawker Media’s blogging software.)

Yet Pyra didn’t die. Williams kept it alive by knocking out small contracts to keep the corporate name afloat, while finishing long-planned product updates. Two years after the bubble’s collapse, he shipped a premium version of Blogger that cost money to use. He hired a few more staff. In February 2003, Google bought Pyra. “We had a million registered users,” Williams says now. “And that felt big.”

It’s worth dwelling on this moment. Blogger’s story contains all the contradictions that would eventually dissolve the open web. For all the talk of their radical openness, blogs had mostly been the domain of those with hosting space, programming experience, and the time to write them. The blogosphere was dense and complicated, with many writers posting dozens of times per day; those who had the power to blog (like Andrew Sullivan and Stereogum’s Scott Lapatine) could shape conversations in politics, culture, and music.

Blogger’s great innovation was to supply writers with an easy interface and a free domain name, blogspot.com, where they could host their journals. This latter feature fueled the site’s growth. It allowed blogging to graduate from the dominion of a tech-savvy elite to something that anyone with a computer and web connection could do—and more people than ever, motivated by the national-security anxiety and intense politics of the 2000s, were eager to take part.

But even in the blogosphere’s early days, growth was synonymous with consolidation. Expanding the web’s power to more people also centralized it—there was no difference between the two. It foreshadowed what was to come.

Williams stayed at Google for six months before moving on. In the fall of 2004, he co-founded Odeo, an early podcasting company. Odeo wanted to be to podcasts what Blogger was to blogs, but internet audio was still too disorganized for a business to succeed.

“The entire idea of podcasts came from realizing you could do a hack to pull stuff down from the Internet on your computer and put it on your iPod,” he told me when we met in New York. “And that was cool, but it was a pain in the butt.”

By early 2006, some of Odeo’s employees began playing around with a software doodad they had developed. It was a digital megaphone, basically: If you sent it a short SMS text, then it would broadcast that message to all of your friends. That product, separate from the core podcast offering, debuted in March and formally launched in July. By December, it had more than 60,000 users. By February 2007, Odeo had rebranded itself Twitter.

The next month, a subset of technology and media elite glommed onto Twitter at the South by Southwest technology conference in Austin. They loved it, they blogged about it, they started dropping daily witticisms and one-liners there—tweeting when they had written a new blog post, for instance—and the service exploded. By April 2007, Twitter had 8 million users. It had grown more than 130-fold in five months.

This period—the fall of 2006 to the spring of 2007—was the most heated the aughts ever got in Silicon Valley. In this period, Google acquired YouTube, an 18-month-old company, for $1.6 billion. Facebook opened to all users, not just college students. TIME declared “You” the Person of the Year, a silly gimmick that nonetheless initiated the era of social-media hype. And Apple debuted the first iPhone.

In this environment, Twitter was growing explosively, though under the aimless leadership of Jack Dorsey. In 2008, Williams was named Twitter CEO.

Even the internet of 2008 can seem distant. That year’s presidential election was famously waged via web blogs. By 2012, much of the conversation had moved to Twitter. Speaking now, Williams sounds contrite about this centralization, from many news sites and blogs to a single platform. “In general, structurally, it’s probably bad if all our media and communications are going through services that are controlled by profit-driven corporations,” he says. (A similar sentiment sparked the creation of public broadcast media in the 1970s.)

The dangers of corporate consolidation dominate his metaphors. A favorite idea is that the web’s current state resembles the factory-farmed food system. “If your job was to feed people, but you were only measured by the efficiency of calories delivered, you may learn over time that high-calorie, high-processed foods were the most efficient ways to deliver calories,” he says. They would be the most margin-friendly way to deliver calories. But the food still wouldn’t be good—because the original metric didn’t take into account “sustainability, or health, or nourishment, or happiness of the people.”

I proposed that Medium is trying to be the Whole Foods of content. He laughed.

“Maybe we are,” he said. “Not that Whole Foods is perfect, and we’re not perfect either, but we are trying to figure out how to optimize for satisfaction and nourishment, not just activity or calories.”

Williams and his team have devised alternate metrics to account for those more holistic virtues—namely, “time spent reading,” which measures how long Medium users collectively spent reading a story. And instead of garish display ads, much of its revenue (right now) takes the form of native advertising, or brand sponsorship of certain series.

And Medium’s marketing position isn’t far from Whole Foods either—it wants to be the big corporation that upscale customers trust. For even though Williams may express suspicion of the big profit-driven networks, Medium vies to join them. Weeks after we met, the company debuted a tool to suck up WordPress blogs and drop them into Medium. Publications that would have previously lived at their own domain name—like The Awl, Pacific Standard, and Bill Simmons’ new site, The Ringer—now live exclusively on Medium. (The Toast also considered moving to Medium, but chose to shut down for other reasons.) Each of these sites still lives on its own domain name, but in terms of design and function, each is essentially a Medium page. Their stories also live on Medium’s servers.


While he was CEO of Twitter, Williams spoke to a small product team about what the social network needed to become. The Internet was transitioning from the web of archipelagos to the web of continents, he said. The archipelagos—think of email and the blogosphere—constituted many small, independently owned atolls that could communicate. But their disjointedness also made them nearly impossible to update.

A new form of organization was supplanting the archipelagos, he said: the web of continents. Facebook was a great continent, of course, but so was any other site that absorbed its users into a great centralized morass. If Twitter hoped to survive, it needed to do more than serve and connect the archipelagos—it had to become continental.

Williams did not quite take it there. His tenure at Twitter was marked by fast growth, but the company never found its business footing. In 2010, he stepped down as CEO, though he remained on its board. Two years later, he founded Medium, describing it as a place for content that was too short for Blogger and too long for Twitter. The next year, in the autumn of 2013, Twitter had its initial public offering. Williams’s 12-percent stake in the company made him a multi-billionaire.

Which is funny, because talking to Williams, you get the sense that—to paraphrase the joke about Obama—if things had really worked out for him, he could have been a journalist. What seems to excite him most about Medium, or any of his other ventures, is that he helped give voice to the afflicted. He remembered “I, Racist,” a Medium post adapted from a sermon by John Metta. It found tens of thousands of readers on his site.

“He, as a person, just had the right way to say something,” he said of Metta. “And he wasn’t necessarily someone who said I’m gonna be a publisher, I’m gonna start a blog, I’m gonna have a voice. We gave him a canvas, what was in his brain came out, and it found people who needed to see it. That’s a better world, when that’s happening all the time.”

Williams still comes off like a cheerleader for this better world. He told me that a Medium user wrote an open letter to him, saying that though they had posted to the site every day for a month, they had not gotten more than 100 “recommends” on their post yet. (Every social network has its atomic unit of dopamine-like recognition: Facebook has likes, Twitter has hearts, Medium has the recommend.) He said he wanted to reply and tell the guy to step back.

“Think about what you’re doing,” he says. “You’re playing this game for attention that half of humanity is playing. And you’re competing for not only the thousands of people who publish on Medium the same day, the millions of people who publish on websites that have ever published, the billion videos on YouTube, every book in the world, not to mention what’s on Instagram, Facebook, Twitter, Vine, everything else, right now—it’s amazing any people are reading your stuff!”

That this can still happen—that any subset of readers can still find and read an amateur writer’s work—is what excites him most about Medium. Talking about the centralization of the web, he continually returns to the “bad world.”

“The worst world, the scary version, is if the tricks to get attention are a skill developed and owned primarily by profit-driven companies,” he told me. “I’d go back to the food analogy. What are people going to be consuming most of the time? They’re optimizing for clicks and dollars. Can a person who has a unique perspective play that game? Are they just going to get trounced?”

This is Medium’s reason for existing: to protect individual writers in the fierce and nasty content jungles. Resistance to the centralization generally is futile, he believes, citing Wu. “That’s the way the Internet works, and that’s the way humans work,” he says. “Efficiency and ROI and economies of scale and user experience—they’re all going to drive more things to consolidate. I kind of look at that as a force of nature. But if things consolidate, does that mean that everything is shit?”

That is the Medium appeal, in a nutshell. Keeping everything from being shit. It wants to do so by adopting many of the tics and habits of the original blogosphere—the intertextuality, the back-and-forth, the sense of amateurism—without being the open web. It will use its own custom metrics, like time-spent-reading, to decide who sees what stories; and it will tend to show your friends something if you “recommend” it. Medium, yes, will just be another platform, but it will run the open web in an emulator.

“I understand the skepticism, that we’re a venture-backed corporation that is saying those things,” Williams says. “I think you can still be optimistic that something good can be created, and you can at least get behind the fact that there shouldn’t be one platform that everything centralizes to.”

And by one platform, he means Facebook.

Facebook. Of course it would end with Facebook. The web people have always been suspicious of it. As early as 2007, early bloggers like Jason Kottke called Facebook “a step sideways or even backwards” for the web. They compared it to AOL, another platform that intended to centralize the net before it flamed out. (A year earlier, Kottke had married Hourihan.)

Except Facebook has succeeded where AOL failed. An April report from the web-analytics company Parse.ly found that Google and Facebook, just two companies, send more than 80 percent of all traffic to news sites. (No wonder they make 85 cents of every digital-ad dollar.) And since few people today use RSS readers like Feedly or visit homepages directly, publications like The Atlantic essentially depend on Facebook and Google to send them their regular readers. Forget that thriving blogosphere: If their authors didn’t move to social media years ago, then their readers did. The web of 2008—the web that helped elect President Obama—has already withered.

All of this can make Williams’s memories of the web sound elegiac. I once met Williams in a hotel lobby in midtown Manhattan, early in the morning. We looked out across Columbus Circle and the late autumn ruddiness of Central Park. In a freak of city planning, Trump Tower was the only thing obstructing our view.

Williams’s flight the day before had been exhilarating, in a comfortably ordinary way, he told me. “I was having one of those rare moments where you appreciate that you live in the future,” he said. “From having called my Uber to get there, to having my boarding pass in my iPhone wallet and scanning it.”

“And something about that, like—everything worked! And that was amazing. And that’s an everyday occurrence, and there’s wifi in the airport, and I can use my phone and laptop the entire time, and there’s wifi on the plane. That was our dreamed-of future.”

And it was. But the thing about dreaming up a future, and making it real, is then you have to live in it. Back in San Francisco, coming out of the BART station on Market Street, he admits that the web game has changed since he came up. His glasses have been switched out for ruby sunglasses, polarized, reflective, and movie-star dashing.

“There were always ecommerce startups,” he says. “I was never part of that world, and we kind of looked down on them when the whole boom was happening. We were creating businesses, but ours had more creativity, ours weren’t just for the money. Or maybe ours were even for utility but not just money, whereas clearly there are ways for both.”

He laughs. “Even the Google guys—they were trying to create something really useful and good for the world, and they made all the money.”

Now the internet works differently, he says: “It’s in general no longer about the creativity, it’s about the business.”


Rustgo: Calling Rust from Go with near-zero overhead


Go has good support for calling into assembly, and a lot of the fast cryptographic code in the stdlib is carefully optimized assembly, bringing speedups of over 20 times.

However, writing assembly code is hard, reviewing it is possibly harder, and cryptography is unforgiving. Wouldn't it be nice if we could write these hot functions in a higher level language?

This post is the story of a slightly-less-than-sane experiment to call Rust code from Go fast enough to replace assembly. No need to know Rust, or compiler internals, but knowing what a linker is would help.

Why Rust

I'll be upfront: I don't know Rust, and don't feel compelled to do my day-to-day programming in it. However, I know Rust is a very tweakable and optimizable language, while still more readable than assembly. (After all, everything is more readable than assembly!)

Go strives to find defaults that are good for its core use cases, and only accepts features that are fast enough to be enabled by default, in a constant and successful fight against knobs. I love it for that. But for what we are doing today we need a language that won't flinch when asked to generate stack-only functions with manually hinted away safety checks.

So if there's a language that we might be able to constrain enough to behave like assembly, and to optimize enough to be as useful as assembly, it might be Rust.

Finally, Rust is safe, actively developed, and not least, there's already a good ecosystem of high-performance Rust cryptography code to tap into.

Why not cgo

Go has a Foreign Function Interface, cgo. cgo allows Go programs to call C functions in the most natural way possible—which is unfortunately not very natural at all. (I know more than I'd like to about cgo, and I can tell you it's not fun.)

By using the C ABI as lingua franca of FFIs, we can call anything from anything: Rust can compile into a library exposing the C ABI, and cgo can use that. It's awkward, but it works.

We can even use reverse-cgo to build Go into a C library and call it from random languages, like I did with Python as a stunt. (It was a stunt folks, stop taking me seriously.)

But cgo does a lot of things to enable that bit of Go naturalness it provides: it will set up a whole stack for C to live in, it makes defer calls to prepare for a panic in a Go callback... this could be a whole post of its own.

As a result, the performance cost of each cgo call is way too high for the use case we are thinking about—small hot functions.

Linking it together

So here's the idea: if we have Rust code that is as constrained as assembly, we should be able to use it just like assembly, and call straight into it. Maybe with a thin layer of glue.

We don't have to work at the IR level: the Go compiler converts both code and high-level assembly into machine code before linking since Go 1.3.

This is confirmed by the existence of "external linking", where the system linker is used to put together a Go program. It's how cgo works, too: it compiles C with the C compiler, Go with the Go compiler, and links it all together with clang or gcc. We can even pass flags to the linker with CGO_LDFLAGS.

Underneath all the safety features of cgo, we surely find a cross-language function call, after all.

It would be nice if we could figure out how to do this without patching the compiler, though. First, let's figure out how to link a Go program with a Rust archive.

I could not find a decent way to link against a foreign blob with go build (why should there be one?) except using #cgo directives. However, invoking cgo makes .s files go to the C compiler instead of the Go one, and my friends, we will need Go assembly.

Thankfully go/build is nothing but a frontend! Go offers a set of low level tools to compile and link programs, go build just collects files and invokes those tools. We can follow what it does by using the -x flag.

I built this small Makefile by following a -x -ldflags "-v -linkmode=external '-extldflags=-v'" invocation of a cgo build.

rustgo: rustgo.a  
        go tool link -o rustgo -extld clang -buildmode exe -buildid b01dca11ab1e -linkmode external -v rustgo.a

rustgo.a: hello.go hello.o  
        go tool compile -o rustgo.a -p main -buildid b01dca11ab1e -pack hello.go
        go tool pack r rustgo.a hello.o

hello.o: hello.s  
        go tool asm -I "$(shell go env GOROOT)/pkg/include" -D GOOS_darwin -D GOARCH_amd64 -o hello.o hello.s

This compiles a simple main package composed of a Go file (hello.go) and a Go assembly file (hello.s).

Now, if we want to link in a Rust object we first build it as a static library...

libhello.a: hello.rs  
        rustc -g -O --crate-type staticlib hello.rs

... and then just tell the external linker to link it together.

rustgo: rustgo.a libhello.a  
        go tool link -o rustgo -extld clang -buildmode exe -buildid b01dca11ab1e -linkmode external -v -extldflags='-lhello -L"$(CURDIR)"' rustgo.a
$ make
go tool asm -I "/usr/local/Cellar/go/1.8.1_1/libexec/pkg/include" -D GOOS_darwin -D GOARCH_amd64 -o hello.o hello.s  
go tool compile -o rustgo.a -p main -buildid b01dca11ab1e -pack hello.go  
go tool pack r rustgo.a hello.o  
rustc --crate-type staticlib hello.rs  
note: link against the following native artifacts when linking against this static library

note: the order and any duplication can be significant on some platforms, and so may need to be preserved

note: library: System

note: library: c

note: library: m

go tool link -o rustgo -extld clang -buildmode exe -buildid b01dca11ab1e -linkmode external -v -extldflags="-lhello -L/Users/filippo/code/misc/rustgo" rustgo.a  
HEADER = -H1 -T0x1001000 -D0x0 -R0x1000  
searching for runtime.a in /usr/local/Cellar/go/1.8.1_1/libexec/pkg/darwin_amd64/runtime.a  
searching for runtime/cgo.a in /usr/local/Cellar/go/1.8.1_1/libexec/pkg/darwin_amd64/runtime/cgo.a  
 0.00 deadcode
 0.00 pclntab=166785 bytes, funcdata total 17079 bytes
 0.01 dodata
 0.01 symsize = 0
 0.01 symsize = 0
 0.01 reloc
 0.01 dwarf
 0.02 symsize = 0
 0.02 reloc
 0.02 asmb
 0.02 codeblk
 0.03 datblk
 0.03 sym
 0.03 headr
 0.06 host link: "clang" "-m64" "-gdwarf-2" "-Wl,-headerpad,1144" "-Wl,-no_pie" "-Wl,-pagezero_size,4000000" "-o" "rustgo" "-Qunused-arguments" "/var/folders/ry/v14gg02d0y9cb2w9809hf6ch0000gn/T/go-link-412633279/go.o" "/var/folders/ry/v14gg02d0y9cb2w9809hf6ch0000gn/T/go-link-412633279/000000.o" "-g" "-O2" "-lpthread" "-lhello" "-L/Users/filippo/code/misc/rustgo"
 0.34 cpu time
12641 symbols  
5764 liveness data  

Jumping into Rust

Alright, so we linked it, but the symbols are not going to do anything just by sitting next to each other. We need to somehow call the Rust function from our Go code.

We know how to call a Go function from Go. In assembly the same call looks like CALL hello(SB), where SB is a virtual register all global symbols are relative to.

If we want to call an assembly function from Go we make the compiler aware of its existence like a C header, by writing func hello() without a function body.

I tried all combinations of the above to call an external (Rust) function, but they all complained that they couldn't find either the symbol name, or the function body.

But cgo, which at the end of the day is just a giant code generator, somehow manages to eventually invoke that foreign function! How?

I stumbled upon the answer a couple days later.

//go:cgo_import_static _cgoPREFIX_Cfunc__Cmalloc
//go:linkname __cgofn__cgoPREFIX_Cfunc__Cmalloc _cgoPREFIX_Cfunc__Cmalloc
var __cgofn__cgoPREFIX_Cfunc__Cmalloc byte  
var _cgoPREFIX_Cfunc__Cmalloc = unsafe.Pointer(&__cgofn__cgoPREFIX_Cfunc__Cmalloc)  

That looks like an interesting pragma! //go:linkname just creates a symbol alias in the local scope (which can be used to call private functions!), and I'm pretty sure the byte trick is only cleverness to have something to take the address of, but //go:cgo_import_static... this imports an external symbol!
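
To see //go:linkname on its own, separate from cgo_import_static, here is a hypothetical standalone use: aliasing the unexported runtime.nanotime into our own package. The pragma requires importing unsafe, and the function body lives elsewhere, so we only declare the signature:

```go
package main

import (
	"fmt"
	_ "unsafe" // required for //go:linkname to be honored
)

// Alias the runtime's private monotonic clock into this package.
//
//go:linkname nanotime runtime.nanotime
func nanotime() int64

func main() {
	fmt.Println(nanotime() > 0) // a positive monotonic reading
}
```

This is exactly the "call private functions" trick: the compiler resolves our local name to a symbol it would otherwise refuse to export.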

Armed with this new tool and the Makefile above, we have a chance to invoke this Rust function (hello.rs)

#[no_mangle]
pub extern fn hello() {  
    println!("Hello, Rust!");
}

(The no-mangle-pub-extern incantation is from this tutorial.)

from this Go program (hello.go)

package main

//go:cgo_import_static hello

func trampoline()

func main() {  
    println("Hello, Go!")
    trampoline()
}

with the help of this assembly snippet. (hello.s)

TEXT ·trampoline(SB), 0, $2048  
    JMP hello(SB)
    RET

CALL was a bit too smart to work, but using a simple JMP...

Hello, Go!  
Hello, Rust!  
panic: runtime error: invalid memory address or nil pointer dereference  
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]

💥

Well, it crashes when it tries to return. Also that $2048 value is the whole stack size Rust is allowed (if it's even putting the stack in the right place), and don't ask me what happens if Rust tries to touch a heap... but hell, I'm surprised it works at all!

Calling conventions

Now, to make it return cleanly, and take some arguments, we need to look more closely at the Go and Rust calling conventions. A calling convention defines where arguments and return values sit across function calls.

The Go calling convention is described here and here. For Rust we'll look at the default for FFI, which is the standard C calling convention.

To keep going we're going to need a debugger. (LLDB supports Go, but breakpoints are somehow broken on macOS, so I had to play inside a privileged Docker container.)


The Go calling convention


The Go calling convention is mostly undocumented, but we'll need to understand it to proceed, so here is what we can learn from a disassembly (amd64 specific). Let's look at a very simple function.

// func foo(x, y uint64) uint64
TEXT ·foo(SB), 0, $256-24  
    MOVQ x+0(FP), DX
    MOVQ DX, ret+16(FP)
    RET

foo has 256 (0x100) bytes of local frame, 16 bytes of arguments, 8 bytes of return value, and it returns its first argument.

func main() {  
    foo(0xf0f0f0f0f0f0f0f0, 0x5555555555555555)
rustgo[0x49d785]:  movabsq $-0xf0f0f0f0f0f0f10, %rax  
rustgo[0x49d78f]:  movq   %rax, (%rsp)  
rustgo[0x49d793]:  movabsq $0x5555555555555555, %rax  
rustgo[0x49d79d]:  movq   %rax, 0x8(%rsp)  
rustgo[0x49d7a2]:  callq  0x49d8a0                  ; main.foo at hello.s:14  

The caller, seen above, does very little: it places the arguments on the stack in reverse order, at the bottom of its own frame (rsp to 16(rsp), remember that the stack grows down) and executes CALL. The CALL will push the return pointer to the stack and jump. There's no caller cleanup, just a plain RET.

Notice that rsp is fixed, and we have movqs, not pushs.

rustgo`main.foo at hello.s:14:  
rustgo[0x49d8a0]:  movq   %fs:-0x8, %rcx  
rustgo[0x49d8a9]:  leaq   -0x88(%rsp), %rax  
rustgo[0x49d8b1]:  cmpq   0x10(%rcx), %rax  
rustgo[0x49d8b5]:  jbe    0x49d8ee                  ; main.foo + 78 at hello.s:14  
                   [...]
rustgo[0x49d8ee]:  callq  0x495d10                  ; runtime.morestack_noctxt at asm_amd64.s:405  
rustgo[0x49d8f3]:  jmp    0x49d8a0                  ; main.foo at hello.s:14  

The first 4 and last 2 instructions of the function are checking if there is enough space for the stack, and if not calling runtime.morestack. They are probably skipped for NOSPLIT functions.

rustgo[0x49d8b7]:  subq   $0x108, %rsp  
                   [...]
rustgo[0x49d8e6]:  addq   $0x108, %rsp  
rustgo[0x49d8ed]:  retq  

Then there's the rsp management, which subtracts 0x108, making space for the entire 0x100 bytes of frame in one go, and the 8 bytes of frame pointer. So rsp points to the bottom (the end) of the function frame, and is callee managed. Before returning, rsp is returned to where it was (just past the return pointer).

rustgo[0x49d8be]:  movq   %rbp, 0x100(%rsp)  
rustgo[0x49d8c6]:  leaq   0x100(%rsp), %rbp  
                   [...]
rustgo[0x49d8de]:  movq   0x100(%rsp), %rbp  

Next comes the frame pointer, which is effectively pushed to the stack just after the return pointer and tracked in rbp. So rbp is also callee-saved, and should be updated to point at where the caller's rbp is stored, to enable stack trace unwinding.

rustgo[0x49d8ce]:  movq   0x110(%rsp), %rdx  
rustgo[0x49d8d6]:  movq   %rdx, 0x120(%rsp)  

Finally, from the body itself we learn that return values go just above the arguments.

Virtual registers

The Go docs say that SP and FP are virtual registers, not just aliases of rsp and rbp.

Indeed, when accessing SP from Go assembly, the offsets are adjusted relative to the real rsp so that SP points to the top, not the bottom, of the frame. That's convenient because it means not having to change all offsets when changing the frame size, but it's just syntactic sugar. Naked access to the register (like MOVQ SP, DX) accesses rsp directly.

The FP virtual register is simply an adjusted offset over rsp, too. It points to the bottom of the caller frame, where arguments are, and there's no direct access.

Note: Go maintains rbp and frame pointers to help debugging, but then uses a fixed rsp and omit-frame-pointer-style rsp offsets for the virtual FP. You can learn more about frame pointers and not using them from this Adam Langley blog post.

The C calling convention

"sysv64", the default C calling convention on x86-64, is quite different:

  • The arguments are passed via registers: RDI, RSI, RDX, RCX, R8, and R9.
  • The return value goes to RAX.
  • Some registers are callee-saved: RBP, RBX, and R12–R15.
    • We care little about this, since in Go all registers are caller-saved.
  • The stack must be aligned to 16-bytes.
    • (I think this is why JMP worked and CALL didn't, we failed to align the stack!)

Frame pointers work the same way (and are generated by rustc with -g).
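
That 16-byte alignment rule is what the ANDQ $~15, SP instruction in the trampoline enforces. The mask arithmetic, sketched in plain Go (the addresses are made up for illustration):

```go
package main

import "fmt"

func main() {
	// Clearing the low 4 bits rounds an address down to a 16-byte
	// boundary, which is what ANDQ $~15, SP does to the stack pointer.
	for _, sp := range []uint64{0x7ffeefbff5d8, 0xc000051f07, 0xc000052000} {
		aligned := sp &^ 15 // Go's AND NOT, i.e. sp & ^uint64(15)
		fmt.Printf("%#x -> %#x (16-byte aligned: %v)\n",
			sp, aligned, aligned%16 == 0)
	}
}
```

Rounding down is safe here because the stack grows down: the aligned pointer stays within the space the caller already reserved.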

Gluing them together

Building a simple trampoline between the two conventions won't be hard. We can also look at asmcgocall for inspiration, since it does approximately the same job, but for cgo.

We need to remember that we want the Rust function to use the stack space of our assembly function, since Go ensured for us that it's present. To do that, we have to roll back rsp from the end of the stack.

package main

//go:cgo_import_static increment
func trampoline(arg uint64) uint64

func main() {  
    println(trampoline(41))
}
TEXT ·trampoline(SB), 0, $2048-16  
    MOVQ arg+0(FP), DI // Load the argument before messing with SP
    MOVQ SP, BX        // Save SP in a callee-saved registry
    ADDQ $2048, SP     // Rollback SP to reuse this function's frame
    ANDQ $~15, SP      // Align the stack to 16-bytes
    CALL increment(SB)
    MOVQ BX, SP        // Restore SP
    MOVQ AX, ret+8(FP) // Place the return value on the stack
    RET
#[no_mangle]
pub extern fn increment(a: u64) -> u64 {  
    return a + 1;
}

CALL on macOS

CALL didn't quite work on macOS. For some reason, there the function call was replaced with an intermediate call to _cgo_thread_start, which is not that incredible considering we are using something called cgo_import_static and that CALL is virtual in Go assembly.

callq  0x40a27cd                 ; x_cgo_thread_start + 29  

We can bypass that "helper" by using the full //go:linkname incantation we found in the standard library to take a pointer to the function, and then calling the function pointer, like this.

import _ "unsafe"

//go:cgo_import_static increment
//go:linkname increment increment
var increment uintptr  
var _increment = &increment  
    MOVQ ·_increment(SB), AX
    CALL AX

Is it fast?

The point of this whole exercise is to be able to call Rust instead of assembly for cryptographic operations (and to have fun). So a rustgo call will have to be almost as fast as an assembly call to be useful.

Benchmark time!

We'll compare incrementing a uint64 inline, with a //go:noinline function, with the rustgo call above, and with a cgo call to the exact same Rust function.
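
The two pure-Go baselines in that comparison can be reproduced with a small harness like this (a sketch covering only the inline and //go:noinline cases, since the rustgo and cgo variants need the build setup above):

```go
package main

import (
	"fmt"
	"testing"
)

var sink uint64 // global, so the compiler can't discard the work

//go:noinline
func increment(a uint64) uint64 { return a + 1 }

// baselines times the inline increment and the forced function call.
func baselines() (inline, call testing.BenchmarkResult) {
	inline = testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sink + 1 // stays inline
		}
	})
	call = testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = increment(sink) // real CALL, thanks to //go:noinline
		}
	})
	return
}

func main() {
	inline, call := baselines()
	fmt.Println("CallOverhead/Inline", inline)
	fmt.Println("CallOverhead/Go    ", call)
}
```

The difference between the two lines is the cost of a plain Go function call, which is the bar rustgo has to clear.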

Rust was compiled with -g -O, and the benchmarks were run on macOS on a 2.9GHz Intel Core i5.

name                 time/op  
CallOverhead/Inline  1.72ns ± 3%  
CallOverhead/Go      4.60ns ± 2%  
CallOverhead/rustgo  5.11ns ± 4%  
CallOverhead/cgo     73.6ns ± 0%  

rustgo is 11% slower than a Go function call, and almost 15 times faster than cgo!

The performance is even better when run on Linux without the function pointer workaround, with only a 2% overhead.

name                 time/op  
CallOverhead/Inline  1.67ns ± 2%  
CallOverhead/Go      4.49ns ± 3%  
CallOverhead/rustgo  4.58ns ± 3%  
CallOverhead/cgo     69.4ns ± 0%  

A real example

For a real-world demo, I picked the excellent curve25519-dalek library, and specifically the task of multiplying the curve basepoint by a scalar and returning its Edwards representation.

The Cargo benchmarks swing widely between executions because of CPU frequency scaling, but they suggest the operation will take 22.9µs ± 17%.

test curve::bench::basepoint_mult    ... bench:      17,276 ns/iter (+/- 3,057)  
test curve::bench::edwards_compress  ... bench:       5,633 ns/iter (+/- 858)  

On the Go side, we'll expose a simple API.

func ScalarBaseMult(dst, in *[32]byte)  

On the Rust side, it's not different from building an interface for normal FFI.

I'll be honest, it took me forever to figure out enough Rust to make this work.

#![no_std]

extern crate curve25519_dalek;  
use curve25519_dalek::scalar::Scalar;  
use curve25519_dalek::constants;

#[no_mangle]
pub extern fn scalar_base_mult(dst: &mut [u8; 32], k: &[u8; 32]) {  
    let res = &constants::ED25519_BASEPOINT_TABLE * &Scalar(*k);
    dst.clone_from(res.compress_edwards().as_bytes());
}

To build the .a we use cargo build --release with a Cargo.toml that defines the dependencies, enables frame pointers, and configures curve25519-dalek to use its most efficient math and no standard library.

[package]
name = "ed25519-dalek-rustgo"  
version = "0.0.0"

[lib]
crate-type = ["staticlib"]

[dependencies.curve25519-dalek]
version = "^0.9"  
default-features = false  
features = ["nightly"]

[profile.release]
debug = true  

Finally, we need to adjust the trampoline to take two arguments and return no value.

TEXT ·ScalarBaseMult(SB), 0, $16384-16  
    MOVQ dst+0(FP), DI
    MOVQ in+8(FP), SI

    MOVQ SP, BX
    ADDQ $16384, SP
    ANDQ $~15, SP

    MOVQ ·_scalar_base_mult(SB), AX
    CALL AX

    MOVQ BX, SP
    RET

The result is a transparent Go call with performance that closely resembles the pure Rust benchmark, and is almost 6% faster than cgo!

name            old time/op  new time/op  delta  
RustScalarBaseMult  23.7µs ± 1%  22.3µs ± 4%  -5.88%  (p=0.003 n=5+7)  

For comparison, similar functionality is provided by github.com/agl/ed25519/edwards25519, and that pure-Go library takes almost 3 times as long.

h := &edwards25519.ExtendedGroupElement{}  
edwards25519.GeScalarMultBase(h, &k)  
h.ToBytes(&dst)  
name            time/op  
GoScalarBaseMult  66.1µs ± 2%  

Packaging up

Now we know it actually works, that's exciting! But to be usable it will have to be an importable package, not forced into package main by a weird build process.

This is where //go:binary-only-package comes in! That annotation allows us to tell the compiler to ignore the source of the package, and to only use the pre-built .a library file in $GOPATH/pkg.

If we can manage to build a .a file that works with Go's native linker (cmd/link, referred to also as the internal linker), we can redistribute that and it will let our users import the package as if it was a native one, including cross-compiling (provided we included a .a for that platform)!

The Go side is easy, and pairs with the assembly and Rust we already have. We can even include docs for go doc's benefit.

//go:binary-only-package

// Package edwards25519 implements operations on an Edwards curve that is
// isomorphic to curve25519.
//
// Crypto operations are implemented by calling directly into the Rust
// library curve25519-dalek, without cgo.
//
// You should not actually be using this.
package edwards25519

import _ "unsafe"

//go:cgo_import_static scalar_base_mult
//go:linkname scalar_base_mult scalar_base_mult
var scalar_base_mult uintptr  
var _scalar_base_mult = &scalar_base_mult

// ScalarBaseMult multiplies the scalar in by the curve basepoint, and writes
// the compressed Edwards representation of the resulting point to dst.
func ScalarBaseMult(dst, in *[32]byte)  

The Makefile will have to change quite a bit—since we aren't building a binary anymore we don't get to keep using go tool link.

A .a archive is just a pack of .o object files in an ancient format with a symbol table. If we could get the symbols from the Rust libed25519_dalek_rustgo.a library into the edwards25519.a archive that go tool compile made, we should be golden.

.a archives are managed by the ar UNIX tool, or by its Go internal counterpart, cmd/pack (as in go tool pack). The two formats are ever-so-subtly different, of course. We'll need to use the platform ar for libed25519_dalek_rustgo.a and the Go cmd/pack for edwards25519.a.

(For example, the platform ar on my macOS uses the BSD convention of calling files #1/LEN and then embedding the filename of length LEN at the beginning of the file, to exceed the 16 bytes max file length. That was confusing.)

To bundle the two libraries I tried doing the simplest (read: hackish) thing: extract libed25519_dalek_rustgo.a into a temporary folder, and then pack the objects back into edwards25519.a.

edwards25519/edwards25519.a: edwards25519/rustgo.go edwards25519/rustgo.o target/release/libed25519_dalek_rustgo.a  
               go tool compile -N -l -o $@ -p main -pack edwards25519/rustgo.go
               go tool pack r $@ edwards25519/rustgo.o # from edwards25519/rustgo.s
               mkdir -p target/release/libed25519_dalek_rustgo && cd target/release/libed25519_dalek_rustgo && \
                       rm -f *.o && ar xv "$(CURDIR)/target/release/libed25519_dalek_rustgo.a"
               go tool pack r $@ target/release/libed25519_dalek_rustgo/*.o

.PHONY: install
install: edwards25519/edwards25519.a  
               mkdir -p "$(shell go env GOPATH)/pkg/darwin_amd64/$(IMPORT_PATH)/"
               cp edwards25519/edwards25519.a "$(shell go env GOPATH)/pkg/darwin_amd64/$(IMPORT_PATH)/"

Imagine my surprise when it worked!

With the .a in place it's just a matter of making a simple program using the package.

package main

import (  
    "bytes"
    "encoding/hex"
    "fmt"
    "testing"

    "github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519"
)

func main() {  
    input, _ := hex.DecodeString("39129b3f7bbd7e17a39679b940018a737fc3bf430fcbc827029e67360aab3707")
    expected, _ := hex.DecodeString("1cc4789ed5ea69f84ad460941ba0491ff532c1af1fa126733d6c7b62f7ebcbcf")

    var dst, k [32]byte
    copy(k[:], input)

    edwards25519.ScalarBaseMult(&dst, &k)
    if !bytes.Equal(dst[:], expected) {
        fmt.Println("rustgo produces a wrong result!")
    }

    fmt.Printf("BenchmarkScalarBaseMult\t%v\n", testing.Benchmark(func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            edwards25519.ScalarBaseMult(&dst, &k)
        }
    }))
}

And running go build!

$ go build -ldflags '-linkmode external -extldflags -lresolv'
$ ./ed25519-dalek-rustgo
BenchmarkScalarBaseMult      100000         19914 ns/op  

Well, it almost worked. We cheated. The binary would not compile unless we linked it to libresolv. To be fair, the Rust compiler tried to tell us. (But who listens to everything the Rust compiler tells you anyway?)

note: link against the following native artifacts when linking against this static library

note: the order and any duplication can be significant on some platforms, and so may need to be preserved

note: library: System

note: library: resolv

note: library: c

note: library: m  

Now, linking against system libraries would be a problem, because it will never happen with internal linking and cross-compilation...

But hold on a minute, libresolv?! Why does our no_std, "should be like assembly", stack only Rust library want to resolve DNS names?

I really meant no_std

The problem is that the library is not actually no_std. Look at all that stuff in there! We want nothing to do with allocators!

$ ar t target/release/libed25519_dalek_rustgo.a
__.SYMDEF  
ed25519_dalek_rustgo-742a1d9f1c101d86.0.o  
ed25519_dalek_rustgo-742a1d9f1c101d86.crate.allocator.o  
curve25519_dalek-03e3ca0f6d904d88.0.o  
subtle-cd04b61500f6e56a.0.o  
std-72653eb2361f5909.0.o  
panic_unwind-d0b88496572d35a9.0.o  
unwind-da13b913698118f9.0.o  
arrayref-2be0c0ff08ae2c7d.0.o  
digest-f1373d68da35ca45.0.o  
generic_array-95ca86a62dc11ddc.0.o  
nodrop-7df18ca19bb4fc21.0.o  
odds-3bc0ea0bdf8209aa.0.o  
typenum-a61a9024d805e64e.0.o  
rand-e0d585156faee9eb.0.o  
alloc_system-c942637a1f049140.0.o  
libc-e038d130d15e5dae.0.o  
alloc-0e789b712308019f.0.o  
std_unicode-9735142be30abc63.0.o  
compiler_builtins-8a5da980a34153c7.0.o  
absvdi2.o  
absvsi2.o  
absvti2.o  
[... snip ...]
truncsfhf2.o  
ucmpdi2.o  
ucmpti2.o  
core-9077840c2cc91cbf.0.o  

So how do we actually make it no_std? This turned out to be an entire side-quest, but I'll give you a recap.

  • If any dependency is not no_std, your no_std flag is nullified. One of the curve25519-dalek dependencies had this problem; cargo update fixed that.
  • Actually making a no_std staticlib (that is, a library for external use, as opposed to one for inclusion in a Rust program) is more like making a no_std executable, which is much harder as it must be self-contained.
  • The docs on how to make a no_std executable are sparse. I mostly used an old version of the Rust book and eventually found this section in the lang_items chapter. This blog post was useful.
  • For starters, you need to define "lang_items" functions to handle functionality that is normally in the stdlib, like panic_fmt.
  • Then you are without the Rust equivalents of compiler-rt, so you have to import the crate compiler_builtins. (rust-lang/rust#43264)
  • Then there's a problem with rust_begin_unwind being unexported, which don't ask me why but is solved by marking panic_fmt as no_mangle, which the linter is not happy about. (rust-lang/rust#38281)
  • Then you are without memcpy, but thankfully there's a native Rust reimplementation in the rlibc crate. Super useful learning that nm -u will tell you what symbols are missing from an object.

This all boils down to a bunch of arcane lines at the top of our lib.rs.

#![no_std]
#![feature(lang_items, compiler_builtins_lib, core_intrinsics)]
use core::intrinsics;  
#[allow(private_no_mangle_fns)] #[no_mangle] // rust-lang/rust#38281
#[lang = "panic_fmt"] fn panic_fmt() -> ! { unsafe { intrinsics::abort() } }
#[lang = "eh_personality"] extern fn eh_personality() {}
extern crate compiler_builtins; // rust-lang/rust#43264  
extern crate rlibc;  

And with that, go build works (!!!) on macOS.

Linux

On Linux nothing works.

External linking complains about fmax and other symbols missing, and it seems to be right.

$ ld -r -o linux.o target/release/libed25519_dalek_rustgo/*.o
$ nm -u linux.o
                 U _GLOBAL_OFFSET_TABLE_
                 U abort
                 U fmax
                 U fmaxf
                 U fmaxl
                 U logb
                 U logbf
                 U logbl
                 U scalbn
                 U scalbnf
                 U scalbnl

A friend thankfully suggested making sure that I was using --gc-sections to strip dead code, which might reference things I don't actually need. And sure enough, this worked. (That's three layers of flag-passing right there.)

$ go build -ldflags '-extld clang -linkmode external -extldflags -Wl,--gc-sections'

But, uh, in the Makefile we aren't using a linker at all, so where do we put --gc-sections? The answer is to stop hacking .as together and actually read the linker man page.

We can build a .o containing a given symbol and all the symbols it references with ld -r --gc-sections -u $SYMBOL. -r makes the object reusable for a later link, and -u marks a symbol as needed, or everything would end up garbage collected. $SYMBOL is scalar_base_mult in our case.
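As a standalone toy (assuming a GNU toolchain with cc, ld, and nm on PATH; the file and symbol names here are made up, not from the post), this shows -r --gc-sections -u keeping only the requested symbol:

```shell
# Toy demo of `ld -r --gc-sections -u`: compile two functions into their own
# sections, then partial-link keeping only the one we mark as needed.
cat > demo.c <<'EOF'
int keep_me(void)  { return 1; }
int strip_me(void) { return 2; }
EOF
cc -c -ffunction-sections demo.c -o demo.o
ld -r --gc-sections -u keep_me demo.o -o kept.o
nm kept.o    # keep_me survives; strip_me is garbage-collected
```

Dropping -u keep_me makes GNU ld complain, since in a partial link -u (or --entry) is what marks the garbage-collection roots.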

Why wasn't this a problem on macOS? It would have been if we linked manually, but the macOS compiler apparently does dead symbol stripping by default.

$ ld -e _scalar_base_mult target/release/libed25519_dalek_rustgo/*.o
Undefined symbols for architecture x86_64:  
  "___assert_rtn", referenced from:
      _compilerrt_abort_impl in int_util.o
  "_copysign", referenced from:
      ___divdc3 in divdc3.o
      ___muldc3 in muldc3.o
  "_copysignf", referenced from:
      ___divsc3 in divsc3.o
      ___mulsc3 in mulsc3.o
  "_copysignl", referenced from:
      ___divxc3 in divxc3.o
      ___mulxc3 in mulxc3.o
  "_fmax", referenced from:
      ___divdc3 in divdc3.o
  "_fmaxf", referenced from:
      ___divsc3 in divsc3.o
  "_fmaxl", referenced from:
      ___divxc3 in divxc3.o
  "_logb", referenced from:
      ___divdc3 in divdc3.o
  "_logbf", referenced from:
      ___divsc3 in divsc3.o
  "_logbl", referenced from:
      ___divxc3 in divxc3.o
  "_scalbn", referenced from:
      ___divdc3 in divdc3.o
  "_scalbnf", referenced from:
      ___divsc3 in divsc3.o
  "_scalbnl", referenced from:
      ___divxc3 in divxc3.o
ld: symbol(s) not found for inferred architecture x86_64  
$ ld -e _scalar_base_mult -dead_strip target/release/libed25519_dalek_rustgo/*.o

This is also the part where we learn, painfully, that macOS prepends a _ to all symbol names, because reasons.

So here's the Makefile portion that will work with external linking out of the box.

edwards25519/edwards25519.a: edwards25519/rustgo.go edwards25519/rustgo.o edwards25519/libed25519_dalek_rustgo.o  
        go tool compile -N -l -o $@ -p main -pack edwards25519/rustgo.go
        go tool pack r $@ edwards25519/rustgo.o edwards25519/libed25519_dalek_rustgo.o

edwards25519/libed25519_dalek_rustgo.o: target/$(TARGET)/release/libed25519_dalek_rustgo.a  
ifeq ($(shell go env GOOS),darwin)  
        $(LD) -r -o $@ -arch x86_64 -u "_$(SYMBOL)" $^
else  
        $(LD) -r -o $@ --gc-sections -u "$(SYMBOL)" $^
endif  

The last missing piece is internal linking on Linux. In short, it was not linking the Rust code, even though compilation seemed to succeed. The relocations were not happening, and the CALL instructions in our Rust function were left pointing at meaningless addresses.

At that point I felt like it had to be a silent linker bug, the final boss in implementing rustgo, and I reached out to people much smarter than me. One of them was guiding me in debugging cmd/link (which was fascinating!) when Ian Lance Taylor, the author of cgo, helpfully pointed out that //go:cgo_import_static is not enough for internal linking, and that I also wanted //go:cgo_import_dynamic.

//go:cgo_import_static scalar_base_mult
//go:cgo_import_dynamic scalar_base_mult

I still have no idea why leaving it out would result in that issue, but adding it finally made our rustgo package compile both with external and internal linking, on Linux and macOS, out of the box.

Redistributable

Now that we can build a .a, we can take the suggestion in the //go:binary-only-package spec, and build a tarball with .as for linux_amd64/darwin_amd64 and the package source, to untar into a GOPATH to install.

$ tar tf ed25519-dalek-rustgo_go1.8.3.tar.gz
src/github.com/FiloSottile/ed25519-dalek-rustgo/  
src/github.com/FiloSottile/ed25519-dalek-rustgo/.gitignore  
src/github.com/FiloSottile/ed25519-dalek-rustgo/Cargo.lock  
src/github.com/FiloSottile/ed25519-dalek-rustgo/Cargo.toml  
src/github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519/  
src/github.com/FiloSottile/ed25519-dalek-rustgo/main.go  
src/github.com/FiloSottile/ed25519-dalek-rustgo/Makefile  
src/github.com/FiloSottile/ed25519-dalek-rustgo/release.sh  
src/github.com/FiloSottile/ed25519-dalek-rustgo/src/  
src/github.com/FiloSottile/ed25519-dalek-rustgo/target.go  
src/github.com/FiloSottile/ed25519-dalek-rustgo/src/lib.rs  
src/github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519/rustgo.go  
src/github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519/rustgo.s  
pkg/linux_amd64/github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519.a  
pkg/darwin_amd64/github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519.a  

Once installed like that, the package will be usable just like a native one, cross-compilation included (as long as we packaged a .a for the target)!

The only thing we have to worry about is that if we build Rust with -Ctarget-cpu=native it might not run on older CPUs. Thankfully benchmarks (and the curve25519-dalek authors) tell us that the only real difference is between post and pre-Haswell processors, so we only have to make a universal build and a Haswell one.

$ benchstat bench-none.txt bench-haswell.txt
name                   old time/op  new time/op  delta  
ScalarBaseMult/rustgo  22.0µs ± 3%  20.2µs ± 2%  -8.41%  (p=0.001 n=7+6)  
$ benchstat bench-haswell.txt bench-native.txt
name                   old time/op  new time/op  delta  
ScalarBaseMult/rustgo  20.2µs ± 2%  20.1µs ± 2%   ~     (p=0.945 n=6+7)  

As the cherry on top, I made the Makefile obey GOOS/GOARCH, converting them as needed into Rust target triples, so if you have Rust set up for cross-compilation you can even cross-compile the .a itself.
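That conversion is simple enough to sketch in a few lines of shell (a hedged illustration; the variable names and the unsupported-platform error are mine, and only the two triples the post ships are mapped):

```shell
# Map Go's GOOS/GOARCH pair to the corresponding Rust target triple.
GOOS=${GOOS:-linux}; GOARCH=${GOARCH:-amd64}
case "$GOOS/$GOARCH" in
  linux/amd64)  TARGET=x86_64-unknown-linux-gnu ;;
  darwin/amd64) TARGET=x86_64-apple-darwin ;;
  *) echo "unsupported platform: $GOOS/$GOARCH" >&2; exit 1 ;;
esac
echo "$TARGET"
```

The resulting $TARGET is what cargo's --target flag (and the Makefile's target/$(TARGET)/release path) expects.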

Here's the result: github.com/FiloSottile/ed25519-dalek-rustgo/edwards25519. It's even on godoc.

Turning it into a real thing

Well, this was fun.

But to be clear, rustgo is not a real thing that you should use in production. For example, I suspect I should be saving g before the jump, the stack size is completely arbitrary, and shrinking the trampoline frame like that will probably confuse the hell out of debuggers. Also, a panic in Rust might get weird.

To make it a real thing I'd start by calling morestack manually from a NOSPLIT assembly function to ensure we have enough goroutine stack space (instead of rolling back rsp) with a size obtained maybe from static analysis of the Rust function (instead of, well, made up).

It could all be analyzed, generated and built by some "rustgo" tool, instead of hardcoded in Makefiles and assembly files. cgo itself is little more than a code-generation tool after all. It might make sense as a go:generate thing, but I know someone who wants to make it a cargo command. (Finally some Rust-vs-Go fighting!) Also, a Rust-side collection of FFI types like, say, GoSlice would be nice.

#[repr(C)]
struct GoSlice {  
    array: *mut u8,
    len: i32,
    cap: i32,
}

Or maybe a Go or Rust adult will come and tell us to stop before we get hurt.

In the meantime, you might want to follow me on Twitter.

Thanks (in no particular order) to David, Ian, Henry, Isis, Manish, Zaki, Anna, George, Kaylyn, Bill, David, Jess, Tony and Daniel for making this possible. Don't blame them for the mistakes and horrors, those are mine.

P.S. Before anyone tries to compare this to cgo (which has many more safety features) or pure Go, it's not meant to replace either. It's meant to replace manually written assembly with something much safer and more readable, with comparable performance. Or better yet, it was meant to be a fun experiment.

Wefunder is hiring engineers to save the American Dream


Benefits

Wefunder covers 100% of health care, dental, and vision. We also offer a 401k plan.

All-Expense Paid Vacation

We're all big on travel and adventure. There are unlimited vacation days. Also, once a year, the team goes somewhere exotic, all expenses paid. (Previously: Greece, Thailand, Hawaii, & Italy)

Transport & Parking

There's free parking at our office, and we reimburse up to $200 a month in Uber or Lyft.

Gym or Rock Climbing Membership

Free memberships at 24 Hour Fitness or Mission Cliffs; both a short walk from the office.

Classes & Conferences

Take professional development classes and conferences on us. Always be learning, they say.

Food & Drinks

Our kitchen is always well stocked with tasty foods and drinks. We often cook each other breakfast and lunch! Would it be inappropriate to mention the martini bar?

Work Toys

Set up your work station with all the fancy equipment you need. That 5k iMac is going to make you more productive, right?

Tired? Take a nap

Roof deck hammocks and bunk beds!

High-process-count support added to master

Matthew Dillon, dillon at backplane.com
Sat Aug 12 13:44:15 PDT 2017
We've fixed a number of bottlenecks that can develop when the number of
user processes runs into the tens of thousands or higher.  One thing led to
another and I said to myself, "gee, we have a 6-digit PID, might as well
make it work to a million!".  With the commits made today, master can
support at least 900,000 processes with just a kern.maxproc setting in
/boot/loader.conf, assuming the machine has the memory to handle it.

And, in fact, as today's machines start to ratchet up there in both memory
capacity and core count, with fast storage (NVMe) and fast networking
(10GigE and higher), even in consumer boxes, this is actually something
that one might want to do.  With AMD's threadripper and EPYC chips now out,
the Intel<->AMD cpu wars are back on!   Boasting up to 32 cores (64
threads) per socket and two sockets on EPYC, terabytes of RAM, and
motherboards with dual 10GigE built-in, the reality is that these numbers
are already achievable in a useful manner.

In any case, I've tested these changes on a dual-socket Xeon.  I can in fact
start 900,000 processes.  They don't get a whole lot of cpu and running
'ps' would be painful, but it works and the system is still responsive from
the shell with all of that going on.

xeon126# uptime
 1:42PM  up 9 mins, 3 users, load averages: 890407.00, 549381.40, 254199.55

In fact, judging from the memory use, these minimal test processes only eat
around 60KB each.  900,000 of them ate only 55GB on a 128GB machine.  So
even a million processes is not out of the question, depending on the cpu
requirements for those processes.  Today's modern machines can be stuffed
with enormous amounts of memory.

Of course, our PIDs are currently limited to 6 digits, so a million is
kinda the upper limit in terms of discrete user processes (versus pthreads,
which are less restricted).  I'd rather not go to 7 digits (yet).

-Matt

NOTE: master users, a full world + kernel compile is needed.



Text Messages Between Travis Kalanick and Anthony Levandowski


The epic court case between Waymo and Uber over self-driving car secrets took a tabloid turn last week, as Waymo’s lawyers filed a document containing approximately 400 text messages between Uber founder Travis Kalanick, and Anthony Levandowski, the engineer accused of taking thousands of files from Waymo to help build Uber’s lidar sensors.

Waymo expected the SMS messages, sent between February and December 2016, to reveal what Uber knew, and when the company knew it. Some messages do touch on technical matters—for example, one from Levandowski on 5 May saying that he was “driving to SF to meet with [Uber’s] laser guy and guide the team.”

However, Waymo’s lawyers say that there are “significant and inexplicable gaps” in the text messages, including none at all before February 13, even though the two had certainly met before. Despite this, the texts provide a rich insight into the men’s relationship, and into Uber’s plans for (and worries about) its self-driving car technology.


The Otto Acquisition

Just two weeks after Levandowski quit Google, Kalanick was already visiting the engineer’s new self-driving truck start-up, Otto. Kalanick was planning to buy Otto almost immediately but that fast pace came with issues:

2/13/2016 Kalanick: Good hangin

2/13/2016 Levandowski: It was awesome. Lots more to come. We ended up wrapping truck testing at 2.30

2/13/2016 Levandowski: We had a close call but no contact with anyone or anything

This appears to be a reference to a failure of the self-driving technology that nearly resulted in an accident. Presumably, the testing was happening at a test track, as the California Department of Motor Vehicles (DMV) still does not allow the testing of autonomous commercial vehicles.

4/6/2016 Levandowski: Basically I’d like the freedom to move as needed on the acquisition (and take advice/guidance) but if I can close them within the range we agreed you guys are happy.

4/6/2016 Kalanick: I am super down to make sure this [is] quick lightweight and straight forward for you guys

At this point, Otto was being run from Levandowski’s home in Palo Alto. In early April, the DMV launched an investigation that had Levandowski and Kalanick worried.

4/22/2016 Kalanick: How did they find out?

4/22/2016 Levandowski: Trying to dig in, likely city of Palo Alto.

4/22/2016 Levandowski: Just wrapped with the DMV, it was the city of Palo Alto freaking out about AV trucks testing and were asked to investigate. The guys were happy with our answers and were [sic] in the clear.


Levandowski and Kalanick’s Relationship

The two men quickly formed a strong bond, but there are challenges—and advantages—when your new best friend is the CEO of the world’s largest start-up.

3/29/2016 Levandowski: I am at the secret side door, no rush

7/23/2016 Kalanick: You hungry? .. Can get some Uber Eats steak and eggs.

At these meetings, often late at night, Levandowski would explain the mysteries of self-driving technology to the Uber founder.

4/8/2016 Kalanick: Where you teach me in depth about an autonomy topic

4/8/2016 Levandowski: Yes, we should of done it. We did a bit on lasers before but need to go deep on all the topics.

In return, Kalanick dispensed management advice, such as this just before the Otto acquisition was announced:

8/12/2016 Kalanick: Three principles

8/12/2016 Kalanick: 1) don’t tell anyone about the deal before it happens, ESPECIALLY someone you're about to fire 2) firing fast is a cultural imperative you don't want to break except in the most extreme situations 3) get creative

Both men shared the same ambition:

9/19/2016 Levandowski: We’re going to take over the world

9/19/2016 Levandowski: One robot at a time

10/7/2016 Kalanick: Down to hang this eve and mastermind some shit


Uber Really Wanted to Partner With Google

Uber’s rivalry with Alphabet’s self-driving subsidiary Waymo is a recent thing. An earlier court filing contained an email from 2015 that showed Kalanick and Google founder Larry Page were exploring a partnership on self-driving technology. The new text messages suggest that this was still a hope over a year later.

6/13/2016 Kalanick: Just got word from Drummond that g-CO is out

6/13/2016 Levandowski: Wow, at least now we know it's a zero sum game

David Drummond is Alphabet’s chief legal officer and was a board member of Uber until August 2016. He was the main channel of communication between the companies. “G-co” could refer to cooperation or forming a company with Google, the lack of which cemented the conflict between the two, and ultimately pushed Drummond off the board.


Uber Saw Tesla as a Huge Competitor

While Uber followed Google’s cars closely, it was Tesla and Elon Musk that the duo discussed most frequently.

9/14/2016 Levandowski: Tesla crash in January … implies Elon is lying about millions of miles without incident. We should have LDP on Tesla just to catch all the crashes that are going on.

9/22/2016: We’ve got to start calling Elon on his shit. I'm not on social media but let's start "faketesla" and start give physics lessons about stupid shit Elon says like [saying his cars don’t need lidar]

In late October, the two exchanged a flurry of texts about Musk’s announcement that all Teslas would come with all the hardware necessary for full self-driving, sometimes called Level 5.

10/20/2016 Levandowski: Elon is going to make going to [self driving] not as big of a scary thing for the public... which should be good

10/20/2016 Kalanick: Got to get software runnin

10/20/2016 Levandowski: Amen

10/20/2016 Kalanick: What do you think chances are he has Level 5 in 20% of a given city?

10/20/2016 Levandowski: For easy city

10/20/2016 Levandowski: He's trippin' but might/will blame regulatory as to why it's not available


Did Uber’s Cars Have Real Problems in San Francisco?

In December 2016, Uber launched a self-driving taxi service in San Francisco, without obtaining permission from the DMV. The program lasted only a week, and was dogged by reports of Uber’s 16 cars running red lights. A single text from Levandowski to Kalanick, two days before Uber’s cars had their registrations revoked by the DMV, refers to the issue:

12/19/2016 Levandowski: Quick update on that special intersection in SF, we taped 6 red car violations within 2 hours

A source close to Uber’s operations says its engineers watched the intersection where Uber’s cars were said to have run the red light, and that this text refers to them recording a number of normal, human-operated vehicles also breaking the law. Uber has never officially admitted that its software was to blame.

This post was corrected on 15 August 2017 to fix the context of a 12/19/2016 message.

Dung Beetles Navigate via the Milky Way


Talk about star power—a new study shows that dung beetles navigate via the Milky Way, the first known species to do so in the animal kingdom.

The tiny insects can orient themselves to the bright stripe of light generated by our galaxy, and move in a line relative to it, according to recent experiments in South Africa.

“This is a complicated navigational feat—it’s quite impressive for an animal that size,” said study co-author Eric Warrant, a biologist at the University of Lund in Sweden.

picture of a dung beetle
A dung beetle rolling its ball in South Africa. Photograph courtesy Eric Warrant.

Moving in a straight line is crucial to dung beetles, which live in a rough-and-tumble world where competition for excrement is fierce. (Play “Dung Beetle Derby” on the National Geographic Kids website.)

Once the beetles sniff out a steaming pile, males painstakingly craft the dung into balls and roll them as far away from the chaotic mound as possible, often toting a female that they have also picked up. The pair bury the dung, which later becomes food for their babies.

But it’s not always that easy. Lurking about the dung pile are lots of dung beetles just waiting to snatch a freshly made ball. (Related: “Dung Beetles’ Favorite Poop Revealed.”)

That’s why ball-bearing beetles have to make a fast beeline away from the pile.

“If they roll back into the dung pile, it’s curtains,” Warrant said. If thieves near the pile steal their ball, the beetle has to start all over again, which is a big investment of energy.

Seeing Stars 

Scientists already knew that dung beetles can move in straight lines away from dung piles by detecting a symmetrical pattern of polarized light that appears around the sun. We can’t see this pattern, but insects can thanks to special photoreceptors in their eyes.

Milky Way picture
The Milky Way glimmers over Indonesia. Photograph by Justin Ng, Your Shot.

But less well-known was how beetles use visual cues at night, such as the moon and its much weaker polarized light pattern. So Warrant and colleagues went to a game farm in South Africa to observe the nocturnal African dung beetle Scarabaeus satyrus. (Read another Weird & Wild post on why dung beetles dance.)

Attracting the beetles proved straightforward: The scientists collected buckets of dung, put them out, and waited for the beetles to fly in.

But their initial observations were puzzling. S. satyrus could still roll a ball in a straight line even on moonless nights, “which caused us a great deal of grief—we didn’t know how to explain this at all,” Warrant said.

Then, “it occurred to us that maybe they were using the stars—and it turned out they were.”

Dapper Beetles

To test the star theory, the team set up a small, enclosed table on the game reserve, placed beetles in it, and observed how the insects reacted to different sky conditions. The team confirmed that even on clear, moonless nights, the beetles could still navigate their balls in a straight line.

To show that the beetles were focusing on the Milky Way, the team moved the table into the Johannesburg Planetarium, and found that the beetles could orient equally well under a full starlit sky as when only the Milky Way was present. (See Milky Way pictures.)

Lastly, to confirm the Milky Way results, the team put little cardboard hats on the study beetles’ heads, blocking their view of the sky. Those beetles just rolled around and around aimlessly, according to the study, published recently in the journal Current Biology.

Picture of a dung beetle
The scientists put hats on the dung beetles to block their ability to see stars. This beetle, which is wearing a clear hat, acted as a control in one experiment. Photograph courtesy Eric Warrant.

Dung beetle researcher Sean D. Whipple, of the Entomology Department at the University of Nebraska-Lincoln, said by email that the “awesome results … provide strong evidence for orientation by starlight in dung beetles.”

He added that this discovery reveals another potential negative impact of light pollution, a global phenomenon that blocks out stars.

“If artificial light—from cities, houses, roadways, etc.—drowns out the visibility of the night sky, it could have the potential to impact effective orientation and navigation of dung beetles in the same way as an overcast sky,” Whipple said.

Keep On Rollin’

Study co-author Warrant added that other dung beetles likely navigate via the Milky Way, although the galaxy is most prominent in the night sky in the Southern Hemisphere.

What’s more, it’s “probably a widespread skill that insects have—migrating moths might also be able to do it.”

As for the beetles themselves, they were “very easy to work with,” he added.

“You can do anything you want to them, and they just keep on rolling.”
