Channel: Hacker News

Show HN: I heard you like RSS Readers, so I built one you might like

Telescope is a news reader app where you can follow news sources, video channels and RSS you love.

Get stories **only** from the subscriptions you follow.

You can subscribe to trusted news sources such as the New York Times, NPR, Bloomberg, The Verge and thousands more. Simply search for your favorite news sources and subscribe to them, or add their RSS feeds. Telescope also lets you subscribe to your favorite YouTube channels.

Once you subscribe to sources you like, Telescope shows you an aggregated 'Home Feed' with the latest articles and videos only from your subscribed channels. By default they are sorted by 'hotness', calculated from the upvotes received across the Telescope community. You can also choose to sort your feed by time, 'top of the day' or 'top of the week'.

An RSS reader with a focus on readability: subscribe to as many RSS feeds as you want and rely on the home feed to surface the right articles for you!

- At Telescope we believe in the power of journalism and the need to eliminate fake news. We believe trusted news sources and journalists will play an important part in that. That is why we want to empower you with trusted information by allowing you to follow news you can trust and rely on.

- Also, with the help of the Telescope community you get access to the top news of the day or week. The Telescope home feed surfaces articles trusted and upvoted by millions in our community. Break out of the "personalization bubbles" other apps create around you!

Features:
- Subscribe to news sources such as New York Times, The Washington Times, Forbes, Bloomberg, The Verge, NPR, Wired, and 1000+ more. Search for news sources under the 'Explore' tab
- Add any RSS URL of your choice by clicking the '+' button in the 'Explore' tab
- Subscribe to any YouTube channel. Search channels under the 'Explore' tab
- Save any story to read later. Find saved stories under the 'Reading List' tab
- Block posts from certain categories, such as Politics or Entertainment, from appearing in your 'Home Feed'. Set it under 'Profile' tab -> 'Home Feed Configuration'
- Switch to any particular subscription via the sidebar (top-left button on Home)
- View posts from your subscriptions by category (Politics, Entertainment, Sports, etc.) under the sidebar
- Sort your feed by 'Hot', 'New', 'Top of Today', 'Top of the week'.

Feel free to reach us at contact@telescope.surf for any feedback. You can access Telescope on web at https://telescope.surf


The latest trend for tech interviews: Days of unpaid homework


Last year, a company that was interested in hiring me as a software developer asked me to build a food delivery application for a fictional restaurant, as a way to test my coding abilities. I was a bit shocked. The time commitment for building an entire application from scratch can be substantial, and the homework assignment didn’t pay.

After a long weekend of work, I was so exhausted and miffed that I gave up. I told the interviewer I wasn’t interested in the job, but the reality is that I was dismayed at the interview process.

I queried my network, asking if anyone else had been given homework assignments. It turns out I wasn’t alone. Developers all over the U.S. had encountered the practice, spending anywhere from a few hours to over three days working on their unpaid interview assignments. In a talk given at PyCon last year, software developer Susan Tan said she spent as long as 32 hours on homework, only to be rejected because it was missing a feature not even stated in the original requirements. She says she usually didn’t receive any feedback at all, a common gripe among the developers I queried. The most extreme example I heard about directly came from Erik Umenhofer, who lives in Silicon Valley and told me that he’s encountered tests that required a solid three days of work.

“In the Bay Area people just get used to it,” he said. “When one company sees another company does a hiring practice, they all start to adopt it.”

How homework became part of the tech interview

For my first full-time development job back in 2010, I endured a 6-hour-long unstructured interview. As was typical for job interviews in the tech industry at the time, I chatted with every member of the team about my background and skills. Some of them had prepared questions, others hadn’t.

Numerous studies now show interviews like these are a potential source of bias and do not accurately evaluate candidates.

Later in my career, in part to avoid this potential bias, the tech industry introduced other forms of skills testing, such as "whiteboarding," which involves asking job candidates to jot out solutions to brainteasers such as finding the first 25 prime numbers of a sequence.

There was backlash against that, too, as critics pointed out it hardly resembled how most programmers work on the job. Then Google, the most famous proponent of these tests, delivered “whiteboarding” a final blow when it admitted its own studies showed the puzzles weren’t even very useful in evaluating skill.

“Take home tests” emerged in this vacuum as an opportunity for developers to use their own computers to work on solutions to relevant problems in the comfortable environment of their home.

The problem with homework for job interviews

Completing days of free work as a requirement for applying to a job is a burden for anyone, but it may also deepen biases against already underrepresented groups in tech, such as women. Women still perform most child care in this country, leaving them with much less free time to do these tests. As software developer Gabriela Voicu told me, many women don’t have time for “code challenges that may or may not go anywhere.” She was especially irked that some companies claim these challenges remove bias and create a fairer process. “If that is their goal, that is an imperfect solution,” she said.

Because homework is so new, there are no studies comparing it with other methods of hiring. But some hiring managers already contend that there are ways to evaluate programmers that are less time consuming.

Pete Holiday, an Engineering Manager at CallRail in Atlanta, used to use homework as part of job interviews before realizing that he was ruling out good candidates. Some told him they didn’t have time for homework. Others may have never gotten to that point. “It’s way more inclusive to just have someone come to the office and talk to them,” Holiday said. “You’re not counting on them having time, or a computer at home. We have candidates with sick family members, single parents. Without the homework we can cast a wider net.”

Holiday also noticed that some candidates weren’t familiar with the type of framework the test expected them to use, but they were more than capable of learning it.

Holiday’s company now uses a structured “live-coding interview” format that involves reviewing code together and talking about it. The process typically takes between 30 minutes and an hour.

Alistair Davidson, a Technical Architect in London, says his company made the same switch after learning that a “small” homework assignment they’d developed took a candidate 16 hours to finish. “[A live-coding interview] gives you a better feel for what it’s like to work directly with someone and try to solve a problem with them,” he says.

In the meantime, many developers, especially at more senior levels, are starting to simply say “no thanks” when handed homework assignments. And that’s the approach I will also take when I’m on the job market again.

MySQL 8.0 is now generally available


We proudly announce General Availability of MySQL 8.0. Download now! MySQL 8.0 is an extremely exciting new version of the world’s most popular open source database with improvements across the board. Some key enhancements include:

  1. SQL Window functions, Common Table Expressions, NOWAIT and SKIP LOCKED, Descending Indexes, Grouping, Regular Expressions, Character Sets, Cost Model, and Histograms.
  2. JSON Extended syntax, new functions, improved sorting, and partial updates. With JSON table functions you can use the SQL machinery for JSON data.
  3. GIS Geography support. Spatial Reference Systems (SRS), as well as SRS aware spatial datatypes,  spatial indexes,  and spatial functions.
  4. Reliability DDL statements have become atomic and crash safe, meta-data is stored in a single, transactional data dictionary. Powered by InnoDB!
  5. Observability Significant enhancements to Performance Schema, Information Schema, Configuration Variables, and Error Logging.
  6. Manageability Remote management, Undo tablespace management, and new instant DDL.
  7. Security OpenSSL improvements, new default authentication, SQL Roles, breaking up the super privilege, password strength, and more.
  8. Performance InnoDB is significantly better at Read/Write workloads, IO bound workloads, and high contention “hot spot” workloads. Added a Resource Group feature to give users an option to optimize for specific workloads on specific hardware by mapping user threads to CPUs.

The above represents some of the highlights and I encourage you to further drill into the complete series of Milestone blog posts—8.0.0, 8.0.1, 8.0.2, 8.0.3, and 8.0.4—and even further down in to the individual worklogs with their specifications and implementation details. Or perhaps you prefer to just look at the source code at github.com/mysql.

Developer features

MySQL Developers want new features and MySQL 8.0 delivers many new and much requested features in areas such as SQL, JSON, Regular Expressions, and GIS. Developers also want to be able to store Emojis, thus UTF8MB4 is now the default character set in 8.0. Finally there are improvements in Datatypes, with bit-wise operations on BINARY datatypes and improved IPv6 and UUID functions.

SQL

Window Functions

MySQL 8.0 delivers SQL window functions.   Similar to grouped aggregate functions, window functions perform some calculation on a set of rows, e.g. COUNT or SUM. But where a grouped aggregate collapses this set of rows into a single row, a window function will perform the aggregation for each row in the result set.

Window functions come in two flavors: SQL aggregate functions used as window functions and specialized window functions. The set of aggregate functions in MySQL that support windowing is: COUNT, SUM, AVG, MIN, MAX, BIT_OR, BIT_AND, BIT_XOR, STDDEV_POP (and its synonyms STD, STDDEV), STDDEV_SAMP, VAR_POP (and its synonym VARIANCE) and VAR_SAMP. The set of specialized window functions is: RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE, ROW_NUMBER, FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD and LAG.

Support for window functions (a.k.a. analytic functions) is a frequent user request. Window functions have long been part of standard SQL (SQL 2003). See blog post by Dag Wanvik here as well as blog post by Guilhem Bichot here.
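A minimal sketch (the sales table and its columns are hypothetical) contrasting a grouped aggregate with the same aggregate used as a window function:

```sql
-- Hypothetical table: sales(staff_id INT, amount DECIMAL(10,2))

-- Grouped aggregate: collapses to one row per staff_id
SELECT staff_id, SUM(amount) AS total
FROM sales
GROUP BY staff_id;

-- Window function: every row is kept, each annotated with its group's total
-- and its rank within the group
SELECT staff_id, amount,
       SUM(amount)  OVER (PARTITION BY staff_id) AS group_total,
       ROW_NUMBER() OVER (PARTITION BY staff_id ORDER BY amount DESC) AS rnk
FROM sales;
```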

Common Table Expression

MySQL 8.0 delivers [Recursive] Common Table Expressions (CTEs). Non-recursive CTEs can be explained as “improved derived tables”, as they allow the derived table to be referenced more than once. A recursive CTE is a set of rows which is built iteratively: from an initial set of rows, a process derives new rows, which grow the set, and those new rows are fed into the process again, producing more rows, and so on, until the process produces no more rows. CTEs are a commonly requested SQL feature, see for example feature requests 16244 and 32174. See blog posts by Guilhem Bichot here, here, here, and here.
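As an illustrative sketch of the iterative process described above, a recursive CTE that generates the integers 1 through 10:

```sql
-- seq starts with the anchor row (1); each iteration adds n + 1
-- until the WHERE clause stops producing new rows
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 10
)
SELECT n FROM seq;
-- Returns the numbers 1 through 10, one per row
```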

NOWAIT and SKIP LOCKED

MySQL 8.0 delivers NOWAIT and SKIP LOCKED alternatives in the SQL locking clause. Normally, when a row is locked due to an UPDATE or a SELECT ... FOR UPDATE, any other transaction will have to wait to access that locked row. In some use cases there is a need to either return immediately if a row is locked or ignore locked rows. A locking clause using NOWAIT will never wait to acquire a row lock. Instead, the query will fail with an error. A locking clause using SKIP LOCKED will never wait to acquire a row lock on the listed tables. Instead, the locked rows are skipped and not read at all. NOWAIT and SKIP LOCKED are frequently requested SQL features. See for example feature request 49763. We also want to say thank you to Kyle Oppenheim for his code contribution! See blog post by Martin Hansson here.
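A sketch with a hypothetical seats table, showing how the two clauses behave while another transaction holds a row lock:

```sql
-- Session 1: lock one row and keep the transaction open
START TRANSACTION;
SELECT * FROM seats WHERE seat_no = 42 FOR UPDATE;

-- Session 2, while session 1's lock is held:
SELECT * FROM seats WHERE seat_no = 42
FOR UPDATE NOWAIT;        -- fails immediately with an error instead of waiting

SELECT * FROM seats WHERE seat_no BETWEEN 40 AND 45
FOR UPDATE SKIP LOCKED;   -- returns only the rows that are not locked
```

SKIP LOCKED is a natural fit for job-queue patterns: several workers can each grab a different unlocked row without blocking one another.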

Descending Indexes

MySQL 8.0 delivers support for indexes in descending order. Values in such an index are arranged in descending order, and we scan it forward. Before 8.0, when a user created a descending index, we created an ascending index and scanned it backwards. One benefit is that forward index scans are faster than backward index scans. Another benefit of a real descending index is that it enables us to use indexes instead of filesort for an ORDER BY clause with mixed ASC/DESC sort key parts. Descending Indexes is a frequently requested SQL feature. See for example feature request 13375. See blog post by Chaithra Gopalareddy here.
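A sketch of the mixed ASC/DESC case (table and index names are illustrative):

```sql
-- The composite index mixes ascending and descending key parts
CREATE TABLE events (
  id         INT PRIMARY KEY,
  category   VARCHAR(20),
  created_at DATETIME,
  KEY idx_cat_time (category ASC, created_at DESC)
);

-- Before 8.0 this ORDER BY required a filesort; with a real descending
-- index it can be satisfied by a forward scan of idx_cat_time
SELECT * FROM events
ORDER BY category ASC, created_at DESC;
```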

GROUPING

MySQL 8.0 delivers GROUPING(), SQL feature T433. The GROUPING() function distinguishes super-aggregate rows from regular grouped rows. GROUP BY extensions such as ROLLUP produce super-aggregate rows where the set of all values is represented by null. Using the GROUPING() function, you can distinguish a null representing the set of all values in a super-aggregate row from a NULL in a regular row. GROUPING is a frequently requested SQL feature. See feature requests 3156 and 46053. Thank you to Zoe Dong and Shane Adams for code contributions in feature request 46053! See blog post by Chaithra Gopalareddy here.
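A small sketch (hypothetical sales table) showing GROUPING() telling a ROLLUP super-aggregate NULL apart from a real NULL in the data:

```sql
-- GROUPING(region) is 1 only on the super-aggregate row produced by ROLLUP,
-- so we can relabel that NULL without mislabeling rows whose region is NULL
SELECT IF(GROUPING(region), 'All regions', region) AS region,
       SUM(amount) AS total
FROM sales
GROUP BY region WITH ROLLUP;
```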

Optimizer Hints

In 5.7 we introduced a new hint syntax for optimizer hints. With the new syntax, hints can be specified directly after the SELECT | INSERT | REPLACE | UPDATE | DELETE keywords in an SQL statement, enclosed in /*+ */ style comments. (See 5.7 blog post by Sergey Glukhov here). In MySQL 8.0 we complete the picture by fully utilizing this new style:

  • MySQL 8.0 adds hints for INDEX_MERGE and NO_INDEX_MERGE. This allows the user to control index merge behavior for an individual query without changing the optimizer switch.
  • MySQL 8.0 adds hints for JOIN_FIXED_ORDER, JOIN_ORDER, JOIN_PREFIX, and JOIN_SUFFIX. This allows the user to control table order for the join execution.
  • MySQL 8.0 adds a hint called SET_VAR.  The SET_VAR hint will set the value for a given system variable for the next statement only. Thus the value will be reset to the previous value after the statement is over. See blog post by Sergey Glukhov here.

We prefer the new-style optimizer hints over the old-style hints and the setting of optimizer_switch values. By not being intermingled with SQL, the new hints can be injected in many places in a query string. They also have clearer semantics, being a hint (vs. a directive).
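A sketch combining two of the hints above in one comment (table and column names are illustrative; the variable reverts after the statement):

```sql
-- SET_VAR changes sort_buffer_size for this statement only;
-- NO_INDEX_MERGE disables index merge for table t1 in this query only
SELECT /*+ SET_VAR(sort_buffer_size = 16M) NO_INDEX_MERGE(t1) */ *
FROM t1
ORDER BY col1;
```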

JSON

MySQL 8.0 adds new JSON functions and improves performance for sorting and grouping JSON values.

Extended Syntax for Ranges in JSON path expressions

MySQL 8.0 extends the syntax for ranges in JSON path expressions. For example SELECT JSON_EXTRACT('[1, 2, 3, 4, 5]', '$[1 to 3]'); results in [2, 3, 4]. The new syntax introduced is a subset of the SQL standard syntax, described in SQL:2016, 9.39 SQL/JSON path language: syntax and semantics. See also Bug#79052 reported by Roland Bouman.

JSON Table Functions

MySQL 8.0 adds JSON table functions which enables the use of the SQL machinery for JSON data. JSON_TABLE() creates a relational view of JSON  data. It maps the result of a JSON data evaluation into relational rows and columns. The user can query the result returned by the function as a regular relational table using SQL, e.g. join, project, and aggregate.
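An illustrative JSON_TABLE() call turning a JSON array literal into rows and columns that can be joined, filtered and aggregated like any table:

```sql
-- '$[*]' iterates over the array; each COLUMNS entry maps a path to a column
SELECT jt.name, jt.price
FROM JSON_TABLE(
  '[{"name":"apple","price":1.20},{"name":"pear","price":2.50}]',
  '$[*]' COLUMNS (
    name  VARCHAR(32)   PATH '$.name',
    price DECIMAL(10,2) PATH '$.price'
  )
) AS jt
WHERE jt.price > 2;
```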

JSON Aggregation Functions

MySQL 8.0 adds the aggregation functions JSON_ARRAYAGG() to generate JSON arrays and JSON_OBJECTAGG() to generate JSON objects . This makes it possible to combine JSON documents in multiple rows into a JSON array or a JSON object. See blog post by Catalin Besleaga here.
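A sketch of both aggregation functions against a hypothetical key/value table properties(id, attr, val):

```sql
-- Collapse one row per attribute into a single JSON object per id
SELECT id, JSON_OBJECTAGG(attr, val) AS attrs
FROM properties
GROUP BY id;

-- Gather a column's values into one JSON array per group
SELECT id, JSON_ARRAYAGG(val) AS vals
FROM properties
GROUP BY id;
```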

JSON Merge Functions

The JSON_MERGE_PATCH() function implements the semantics of JavaScript (and other scripting languages) specified by RFC 7396, i.e. it removes duplicates by giving precedence to the second document. For example, JSON_MERGE_PATCH('{"a":1,"b":2}','{"a":3,"c":4}') returns {"a":3,"b":2,"c":4}.

The JSON_MERGE_PRESERVE() function has the semantics of the JSON_MERGE() implemented in MySQL 5.7, which preserves all values, for example JSON_MERGE_PRESERVE('{"a":1,"b":2}','{"a":3,"c":4}') returns {"a":[1,3],"b":2,"c":4}.

The existing JSON_MERGE() function is deprecated in MySQL 8.0 to remove ambiguity for the merge operation. See also proposal in Bug#81283 and blog post by Morgan Tocker here.

JSON Pretty Function

MySQL 8.0 adds a JSON_PRETTY() function. The function accepts either a native JSON datatype or a string representation of JSON, and returns a human-readable JSON-formatted string with newlines and indentation.

JSON Size Functions

MySQL 8.0 adds JSON functions related to space usage for a given JSON object. JSON_STORAGE_SIZE() returns the actual size in bytes for a JSON datatype. JSON_STORAGE_FREE() returns the free space of a JSON binary type in bytes, including fragmentation and padding saved for in-place update.

JSON Improved Sorting

MySQL 8.0 gives better performance for sorting/grouping JSON values by using variable length sort keys. Preliminary benchmarks show a 1.2 to 18 times improvement in sorting, depending on use case.

JSON Partial Update

MySQL 8.0 adds support for partial update for the JSON_REMOVE(), JSON_SET() and JSON_REPLACE() functions.  If only some parts of a JSON document are updated, we want to give information to the handler about what was changed, so that the storage engine and replication don’t need to write the full document. In a replicated environment, it cannot be guaranteed that the layout of a JSON document is exactly the same on the slave and the master, so the physical diffs cannot be used to reduce the network I/O for row-based replication. Thus, MySQL 8.0 provides logical diffs that row-based replication can send over the wire and reapply on the slave. See blog post by Knut Anders Hatlen here.

GIS

MySQL 8.0 delivers geography support. This includes meta-data support for Spatial Reference Systems (SRS), as well as SRS aware spatial datatypes, spatial indexes, and spatial functions. In short, MySQL 8.0 understands latitude and longitude coordinates on the earth’s surface and can, for example, correctly calculate the distance between two points on the earth’s surface in any of the roughly 5000 supported spatial reference systems.

Spatial Reference System (SRS)

The ST_SPATIAL_REFERENCE_SYSTEMS information schema view provides information about available spatial reference systems for spatial data. This view is based on the SQL/MM (ISO/IEC 13249-3) standard. Each spatial reference system is identified by an SRID number. MySQL 8.0 ships with about 5000 SRIDs from the EPSG Geodetic Parameter Dataset, covering georeferenced ellipsoids and 2d projections (i.e. all 2D spatial reference systems).

SRID aware spatial datatypes

Spatial datatypes can be attributed with the spatial reference system definition, for example with SRID 4326 like this: CREATE TABLE t1 (g GEOMETRY SRID 4326); The SRID here is an SQL type modifier for the GEOMETRY datatype. Values inserted into a column with an SRID property must be in that SRID. Attempts to insert values with other SRIDs result in an exception condition being raised. Unmodified types, i.e. types with no SRID specification, continue to accept all SRIDs, as before.

MySQL 8.0 adds the INFORMATION_SCHEMA.ST_GEOMETRY_COLUMNS view as specified in SQL/MM Part 3, Sect. 19.2. This view will list all GEOMETRY columns in the MySQL instance and for each column it will list the standard SRS_NAME , SRS_ID , and GEOMETRY_TYPE_NAME.

SRID aware spatial indexes

Spatial indexes can be created on spatial datatypes. Columns in spatial indexes must be declared NOT NULL. For example like this: CREATE TABLE t1 (g GEOMETRY SRID 4326 NOT NULL, SPATIAL INDEX(g));

Columns with a spatial index should have an SRID type modifier to allow the optimizer to use the index. If a spatial index is created on a column that doesn’t have an SRID type modifier, a warning is issued.

SRID aware spatial functions

MySQL 8.0 extends spatial functions such as  ST_Distance() and ST_Length() to detect that its parameters are in a geographic (ellipsoidal) SRS and to compute the distance on the ellipsoid. So far, ST_Distance and spatial relations such as ST_Within, ST_Intersects, ST_Contains, ST_Crosses, etc.  support geographic computations. The behavior of each ST function is as defined in SQL/MM Part 3 Spatial.

Character Sets

MySQL 8.0 makes UTF8MB4 the default character set. SQL performance – such as sorting UTF8MB4 strings – has been improved by a factor of 20 in 8.0 as compared to 5.7. UTF8MB4 is the dominant character encoding for the web, and this move will make life easier for the vast majority of MySQL users.

  • The default character set has changed from latin1 to utf8mb4 and the default collation has changed from latin1_swedish_ci to utf8mb4_0900_ai_ci.
  • The changes in defaults applies to libmysql and server command tools as well as the server itself.
  • The changes are also reflected in MTR tests, running with new default charset.
  • The collation weight and case mapping are based on Unicode 9.0.0 , announced by the Unicode committee on Jun 21, 2016.
  • The 21 language specific case insensitive collations available for latin1 (MySQL legacy) have been implemented as utf8mb4 collations, for example the Czech collation becomes utf8mb4_cs_0900_ai_ci. See the complete list in WL#9108. See blog post by Xing Zhang here.
  • Added support for case and accent sensitive collations. MySQL 8.0 supports all 3 levels of collation weight defined by DUCET (Default Unicode Collation Entry Table). See blog post by Xing Zhang here.
  • Japanese utf8mb4_ja_0900_as_cs collation for utf8mb4 which sorts characters by using three levels’ weight. This gives the correct sorting order for Japanese. See blog post by Xing Zhang here.
  • Japanese with additional kana sensitive feature, utf8mb4_ja_0900_as_cs_ks,  where ‘ks’ stands for ‘kana sensitive’. See blog post by Xing Zhang here.
  • Changed all new collations, from Unicode 9.0.0 forward, to be NO PAD instead of PAD SPACE, i.e., treat spaces at the end of a string like any other character. This is done to improve consistency and performance. Older collations are left in place.

See also blog posts by Bernt Marius Johnsen here, here and here.

Datatypes

Bit-wise operations on binary data types

MySQL 8.0 extends the bit-wise operations (‘bit-wise AND’, etc) to also work with [VAR]BINARY/[TINY|MEDIUM|LONG]BLOB. Prior to 8.0 bit-wise operations were only supported for integers. If you used bit-wise operations on binaries the arguments were implicitly cast to BIGINT (64 bit) before the operation, thus possibly losing bits. From 8.0 and onward bit-wise operations work for all BINARY and BLOB data types, casting arguments such that bits are not lost.

IPV6 manipulation

MySQL 8.0 improves the usability of IPv6 manipulation by supporting bit-wise operations on BINARY data types. In MySQL 5.6 we introduced the INET6_ATON() and INET6_NTOA() functions which convert IPv6 addresses between text form like 'fe80::226:b9ff:fe77:eb17' and VARBINARY(16). However, until now we could not combine these IPv6 functions with bit-wise operations since such operations would – wrongly – convert output to BIGINT. For example, if we have an IPv6 address and want to test it against a network mask, we can now use INET6_ATON(address) & INET6_ATON(network), because INET6_ATON() correctly returns the VARBINARY(16) datatype (128 bits). See blog post by Catalin Besleaga here.
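A sketch of the network-mask test described above, checking whether an address falls inside fe80::/64 (address and mask chosen for illustration):

```sql
-- The & now operates on the full 128-bit VARBINARY(16) values,
-- instead of truncating them to a 64-bit BIGINT
SELECT INET6_ATON('fe80::226:b9ff:fe77:eb17')
     & INET6_ATON('ffff:ffff:ffff:ffff::')
     = INET6_ATON('fe80::') AS in_network;   -- 1 for this address
```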

UUID manipulations

MySQL 8.0 improves the usability of UUID manipulations by implementing three new SQL functions: UUID_TO_BIN(), BIN_TO_UUID(), and IS_UUID(). The first one converts from UUID formatted text to VARBINARY(16), the second one from VARBINARY(16) to UUID formatted text, and the last one checks the validity of a UUID formatted text. The UUID stored as a VARBINARY(16) can be indexed using functional indexes. The functions UUID_TO_BIN() and BIN_TO_UUID() can also shuffle the time-related bits and move them to the beginning, making the value index friendly and avoiding random inserts in the B-tree, thereby reducing insert time. The lack of such functionality has been mentioned as one of the drawbacks of using UUIDs. See blog post by Catalin Besleaga here.
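A round-trip sketch of the three functions; the second argument of UUID_TO_BIN()/BIN_TO_UUID() enables the time-bit swap described above:

```sql
SET @u = UUID();
SELECT IS_UUID(@u);                           -- 1: valid UUID text

-- Store compactly; the '1' flag moves the time-related bits to the front
-- so roughly sequential UUIDs land near each other in the index
SET @b = UUID_TO_BIN(@u, 1);                  -- VARBINARY(16)

-- Convert back with the same flag to recover the original text
SELECT BIN_TO_UUID(@b, 1) = @u AS round_trip; -- 1
```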

Cost Model

Query Optimizer Takes Data Buffering into Account

MySQL 8.0 chooses query plans based on knowledge about whether data resides in-memory or on-disk. This happens automatically, as seen from the end user there is no configuration involved. Historically, the MySQL cost model has assumed data to reside on spinning disks. The cost constants associated with looking up data in-memory and on-disk are now different, thus, the optimizer will choose more optimal access methods for the two cases, based on knowledge of the location of data. See blog post by Øystein Grøvlen here.

Optimizer Histograms

MySQL 8.0 implements histogram statistics. With Histograms, the user can create statistics on the data distribution for a column in a table, typically done for non-indexed columns, which then will be used by the query optimizer in finding the optimal query plan. The primary use case for histogram statistics is for calculating the selectivity (filter effect) of predicates of the form “COLUMN operator CONSTANT”.

The user creates a histogram by means of the ANALYZE TABLE syntax, which has been extended to accept two new clauses: UPDATE HISTOGRAM ON column [, column] [WITH n BUCKETS] and DROP HISTOGRAM ON column [, column]. The number of buckets is optional; the default is 100. The histogram statistics are stored in the dictionary table “column_statistics” and accessible through the view information_schema.COLUMN_STATISTICS. The histogram is stored as a JSON object due to the flexibility of the JSON datatype. ANALYZE TABLE will automatically decide whether to sample the base table or not, based on table size. It will also decide whether to build a singleton or an equi-height histogram based on the data distribution and the number of buckets specified. See blog post by Erik Frøseth here.
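A sketch of the lifecycle described above, using a hypothetical orders table with a non-indexed amount column:

```sql
-- Build a 32-bucket histogram on the column
ANALYZE TABLE orders UPDATE HISTOGRAM ON amount WITH 32 BUCKETS;

-- Inspect the stored JSON object, e.g. whether it is singleton or equi-height
SELECT HISTOGRAM->>'$."histogram-type"'
FROM information_schema.COLUMN_STATISTICS
WHERE TABLE_NAME = 'orders' AND COLUMN_NAME = 'amount';

-- Drop it again
ANALYZE TABLE orders DROP HISTOGRAM ON amount;
```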

Regular Expressions

MySQL 8.0 supports regular expressions for UTF8MB4 as well as new functions like REGEXP_INSTR(), REGEXP_LIKE(), REGEXP_REPLACE(), and REGEXP_SUBSTR(). The system variables regexp_time_limit (default 32 steps) and regexp_stack_limit (default 8000000 bytes) have been added to control the execution. The REGEXP_REPLACE() function is one of the most requested features by the MySQL community, for example see feature request reported as BUG #27389 by Hans Ginzel. See also blog posts by Martin Hansson here and Bernt Marius Johnsen here.
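A few one-line sketches of the new functions:

```sql
SELECT REGEXP_LIKE('Michael', '^Mi.*l$');          -- 1 (pattern matches)
SELECT REGEXP_INSTR('abc abc', 'b', 1, 2);         -- 6 (second match position)
SELECT REGEXP_REPLACE('a b  c', '[ ]+', ' ');      -- 'a b c'
SELECT REGEXP_SUBSTR('user@example.com', '[^@]+'); -- 'user'
```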

Dev Ops features

Dev Ops care about operational aspects of the database, typically about reliability, availability, performance, security, observability, and manageability. High Availability comes with MySQL InnoDB Cluster and MySQL Group Replication which will be covered by a separate blog post. Here follows what 8.0 brings to the table in the other categories.

Reliability

MySQL 8.0 increases the overall reliability of MySQL because:

  1. MySQL 8.0 stores its meta-data into InnoDB, a proven transactional storage engine.  System tables such as Users and Privileges  as well as Data Dictionary tables now reside in InnoDB.
  2. MySQL 8.0 eliminates one source of potential inconsistency.  In 5.7 and earlier versions there are essentially two data dictionaries, one for the Server layer and one for the InnoDB layer, and these can get out of sync in some crashing scenarios.   In 8.0 there is only one data dictionary.
  3. MySQL 8.0 ensures atomic, crash safe DDL. With this the user is guaranteed that any DDL statement will either be executed fully or not at all. This is particularly important in a replicated environment, otherwise there can be scenarios where masters and slaves (nodes) get out of sync, causing data-drift.

This work is done in the context of the new, transactional data dictionary. See blog posts by Staale Deraas here and here.

Observability

Information Schema (speed up)

MySQL 8.0 reimplements Information Schema. In the new implementation the Information Schema tables are simple views on data dictionary tables stored in InnoDB. This is far more efficient than the old implementation, with up to a 100 times speedup. This makes Information Schema practically usable by external tooling. See blog posts by Gopal Shankar here and here, and the blog post by Ståle Deraas here.

Performance Schema (speed up)

MySQL 8.0 speeds up performance schema queries by adding more than 100 indexes on performance schema tables. The indexes on performance schema tables are predefined. They cannot be deleted, added or altered. A performance schema index is implemented as a filtered scan across the existing table data, rather than a traversal through a separate data structure. There are no B-trees or hash tables to be constructed, updated or otherwise managed. Performance Schema table indexes behave like hash indexes in that a) they quickly retrieve the desired rows, and b) they do not provide row ordering, leaving the server to sort the result set if necessary. However, depending on the query, indexes obviate the need for a full table scan and will return a considerably smaller result set. Performance schema indexes are visible with SHOW INDEXES and are represented in the EXPLAIN output for queries that reference indexed columns. See comment from Simon Mudd. See blog post by Marc Alff here.

Configuration Variables

MySQL 8.0 adds useful information about configuration variables, such as the variable name, min/max values, where the current value came from,  who made the change and when it was made. This information is found in a new performance schema table called  variables_info. See blog post by  Satish Bharathy here.

Client Error Reporting – Message Counts

MySQL 8.0 makes it possible to look at aggregated counts of client  error messages reported by the server. The user can look at statistics from 5 different tables: Global count, summary per thread, summary per user, summary per host, or summary per account. For each error message the user can see the number of errors raised, the number of errors handled by the SQL exception handler, “first seen” timestamp, and “last seen” timestamp. Given the right privileges the user can either SELECT from these tables or TRUNCATE to reset statistics. See blog post by Mayank Prasad here.

Statement Latency  Histograms

MySQL 8.0 provides performance schema histograms of statements latency, for the purpose of better visibility of query response times. This work also computes “P95”, “P99” and “P999” percentiles from collected histograms. These percentiles can be used as indicators of quality of service. See blog post by Frédéric Descamps here.

Data Locking Dependencies Graph

MySQL 8.0 instruments data locks in the performance schema. When transaction A is locking row R, and transaction B is waiting on this very same row, B is effectively blocked by A. The added instrumentation exposes which data is locked (R), who owns the lock (A), and who is waiting for the data (B). See blog post by Frédéric Descamps here.

Digest Query Sample

MySQL 8.0 makes some changes to the events_statements_summary_by_digest performance schema table to capture a full example query and some key information about this query example. The column QUERY_SAMPLE_TEXT is added to capture a query sample so that users can run EXPLAIN on a real query and to get a query plan. The column QUERY_SAMPLE_SEEN is added  to capture the query sample timestamp. The column QUERY_SAMPLE_TIMER_WAIT is added to capture the query sample execution time. The columns FIRST_SEEN and LAST_SEEN  have been modified to use fractional seconds. See blog post by Frédéric Descamps here.

Meta-data about Instruments

MySQL 8.0 adds meta-data such as properties, volatility, and documentation to the performance schema table setup_instruments. This read-only meta-data acts as online documentation for instruments, to be looked at by users or tools. See blog post by Frédéric Descamps here.
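A sketch of browsing that meta-data (the filter on InnoDB memory instruments is just an illustrative choice):

```sql
SELECT NAME, PROPERTIES, VOLATILITY, DOCUMENTATION
FROM performance_schema.setup_instruments
WHERE NAME LIKE 'memory/innodb/%';
```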

Error Logging

MySQL 8.0 delivers a major overhaul of the MySQL error log. From a software architecture perspective the error log is made a component in the new  service infrastructure. This means that advanced users can write their own error log implementation if desired. Most users will not want to write their own error log implementation but still want some flexibility in what to write and where to write it.  Hence, 8.0 offers users facilities to add sinks (where) and filters (what).  MySQL 8.0 implements a filtering service (API) and a default filtering service implementation (component). Filtering here means to suppress certain log messages (selection) and/or fields within a given log message (projection). MySQL 8.0 implements a log writer service (API) and a default log writer service implementation (component).  Log writers accept a log event and write it to a log. This log can be a classic file, syslog, EventLog and a new JSON log writer.

By default, without any configuration, MySQL 8.0 delivers many out-of-the-box error log improvements such as:

  • Error numbering: The format is a number in the 10000 series preceded by “MY-“, for example “MY-10001”. Error numbers will be stable in a GA release, but the corresponding error texts are allowed to change (i.e. improve) in maintenance releases.
  • System messages: System messages are written to the error log as [System] instead of [Error], [Warning], [Note]. [System] and [Error] messages are printed regardless of verbosity and cannot be suppressed.  [System] messages are only used in a few places, mainly associated with major state transitions such as starting or stopping the server.
  • Reduced verbosity: The default of log_error_verbosity changes from 3 (Notes) to 2 (Warning). This makes the MySQL 8.0 error log less verbose by default.
  • Source Component: Each message is annotated with one of three values, [Server], [InnoDB], or [Repl], showing which sub-system the message is coming from.

This is what is written to the error log in 8.0 GA after startup:

The introduction of error numbering in the error log allows MySQL to improve an error text in upcoming maintenance releases (if needed) while keeping the error number (ID) unchanged. Error numbers also act as the basis for filtering/suppression and internationalization/localization.
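As an illustrative sketch of the sink/filter configuration described above, assuming the loadable JSON sink component documented for 8.0:

```sql
-- Add the JSON log writer alongside the built-in filter and sink
INSTALL COMPONENT 'file://component_log_sink_json';
SET PERSIST log_error_services = 'log_filter_internal; log_sink_internal; log_sink_json';

-- Verbosity: 1 = errors only, 2 = + warnings (the new default), 3 = + notes
SET PERSIST log_error_verbosity = 2;
```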

Manageability

INVISIBLE Indexes

MySQL 8.0 adds the capability of toggling the visibility of an index (visible/invisible). An invisible index is not considered by the optimizer when it makes the query execution plan. However, the index is still maintained in the background, so it is cheap to make it visible again. The purpose of this is to let a DBA/DevOps determine whether an index can be dropped or not. If you suspect an index of not being used, you first make it invisible, then monitor query performance, and finally remove the index if no query slowdown is experienced. This feature has been asked for by many users, for example through Bug#70299. See blog post by Martin Hansson here.
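A sketch of the suggested workflow (table and index names are hypothetical):

```sql
-- Suspected-unused index: hide it from the optimizer, but keep maintaining it
ALTER TABLE orders ALTER INDEX idx_customer INVISIBLE;

-- ... monitor query performance for a while ...

ALTER TABLE orders ALTER INDEX idx_customer VISIBLE;  -- cheap to back out
-- DROP INDEX idx_customer ON orders;                 -- or drop it for good
```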

Flexible Undo Tablespace Management

MySQL 8.0 gives the user full control over Undo tablespaces: how many tablespaces there are, where they are placed, and how many rollback segments each one has.

  1. No more Undo log in the System tablespace. Undo log is migrated out of the System tablespace and into Undo tablespaces during upgrade. This gives an upgrade path for existing 5.7 installations using the system tablespace for undo logs.
  2. Undo tablespaces can be managed separately from the System tablespace. For example, Undo tablespaces can be put on fast storage.
  3. Reclaim space taken by unusually large transactions (online). A minimum of two Undo tablespaces are created to allow for tablespace truncation. This allows InnoDB to shrink the undo tablespace because one Undo tablespace can be active while the other is truncated.
  4. More rollback segments result in less contention. The user might choose to have up to 127 Undo tablespaces, each one having up to 128 rollback segments. More rollback segments mean that concurrent transactions are more likely to use separate rollback segments for their undo logs, which results in less contention for the same resources.

See blog post by Kevin Lewis here.
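A sketch of the related knobs (variable names as documented for 8.0; values are illustrative):

```sql
-- Up to 128 rollback segments per Undo tablespace
SET GLOBAL innodb_rollback_segments = 128;

-- Let InnoDB truncate Undo tablespaces that grow beyond a threshold
SET GLOBAL innodb_undo_log_truncate = ON;
SET GLOBAL innodb_max_undo_log_size = 1073741824;  -- 1 GiB
```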

SET PERSIST for global variables

MySQL 8.0 makes it possible to persist global, dynamic server variables. Many server variables are both GLOBAL and DYNAMIC and can be reconfigured while the server is running. For example: SET GLOBAL sql_mode='STRICT_TRANS_TABLES'; However, such settings are lost upon a server restart.

This work makes it possible to write SET PERSIST sql_mode='STRICT_TRANS_TABLES'; The effect is that the setting will survive a server restart. There are many usage scenarios for this functionality, but most importantly it gives a way to manage server settings when editing the configuration files is inconvenient or not an option. For example, in some hosted environments you don’t have file system access; all you have is the ability to connect to one or more servers. As with SET GLOBAL, you need the super privilege for SET PERSIST.

There is also a RESET PERSIST command, which removes the configuration variable from the persisted configuration, so that it once again behaves as if it had only been set with SET GLOBAL.

MySQL 8.0 allows SET PERSIST to set most read-only variables as well; the new values then take effect at the next server restart. Note that a small subset of read-only variables are intentionally left not settable. See blog post by Satish Bharathy here.
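A sketch of the persistence commands (the variable choices are illustrative):

```sql
SET PERSIST max_connections = 500;  -- dynamic: applies now and survives restarts
SET PERSIST_ONLY innodb_log_file_size = 2147483648;  -- read-only: applied at next restart

-- Inspect what has been persisted (kept in mysqld-auto.cnf in the data directory)
SELECT * FROM performance_schema.persisted_variables;

RESET PERSIST max_connections;  -- stop persisting; the running value is untouched
```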

Remote Management

MySQL 8.0 implements an SQL RESTART command. The purpose is to enable remote management of a MySQL server over an SQL connection, for example to set a non-dynamic configuration variable by SET PERSIST followed by a RESTART. See blog post MySQL 8.0: changing configuration easily and cloud friendly! by Frédéric Descamps.
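A sketch of the remote-reconfiguration pattern this enables (the variable is an illustrative choice; RESTART relies on a monitoring process, such as systemd or a daemonized mysqld, to relaunch the server):

```sql
-- Persist a non-dynamic setting, then bounce the server over the same connection
SET PERSIST_ONLY innodb_log_file_size = 2147483648;
RESTART;
```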

Rename Tablespace (SQL DDL)

MySQL 8.0 implements ALTER TABLESPACE s1 RENAME TO s2; A shared/general tablespace is a user-visible entity which users can CREATE, ALTER, and DROP. See also Bug#26949, Bug#32497, and Bug#58006.

Rename Column (SQL DDL)

MySQL 8.0 implements ALTER TABLE ... RENAME COLUMN old_name TO new_name; This is an improvement over the existing syntax ALTER TABLE <table_name> CHANGE …, which requires re-specification of all the attributes of the column. The old syntax has the disadvantage that all the column information might not be available to the application trying to do the rename. There is also a risk of accidental data type change, which might result in data loss.
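A sketch of the difference (table and column names are hypothetical):

```sql
-- 8.0: rename without restating the definition
ALTER TABLE customers RENAME COLUMN fullname TO full_name;

-- Pre-8.0: every attribute must be repeated, risking an accidental type change
ALTER TABLE customers CHANGE fullname full_name VARCHAR(255) NOT NULL;
```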

Security features

New Default Authentication Plugin

MySQL 8.0 changes the default authentication plugin from mysql_native_password to caching_sha2_password. Correspondingly, libmysqlclient will use caching_sha2_password as the default authentication mechanism, too. The new caching_sha2_password combines better security (SHA2 algorithm) with high performance (caching). The general direction is that we recommend all users to use TLS/SSL for all their network communication. See blog post by Harin Vadodaria here.
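A sketch of opting in or out per account (the server-wide default can also be reverted via the default_authentication_plugin system variable; account names and passwords are placeholders):

```sql
-- New accounts use caching_sha2_password by default in 8.0
CREATE USER 'carol'@'%' IDENTIFIED BY 'change_me';

-- Keep the old plugin for a client that cannot speak the new one yet
CREATE USER 'legacy'@'%' IDENTIFIED WITH mysql_native_password BY 'change_me';
```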

OpenSSL by Default in Community Edition

MySQL 8.0 is unifying on OpenSSL as the default TLS/SSL library for both MySQL Enterprise Edition and MySQL Community Edition. Previously, MySQL Community Edition used YaSSL. Supporting OpenSSL in the MySQL Community Edition has been one of the most frequently requested features. See blog post by Frédéric Descamps here.

OpenSSL is Dynamically Linked

MySQL 8.0 is linked dynamically with OpenSSL. Seen from the perspective of MySQL Repository users, the MySQL packages depend on the OpenSSL files provided by the Linux system at hand. By dynamically linking, OpenSSL updates can be applied upon availability without requiring a MySQL upgrade or patch. See blog post by Frédéric Descamps here.

Encryption of Undo and Redo log

MySQL 8.0 implements data-at-rest encryption of UNDO and REDO logs. In 5.7 we introduced Tablespace Encryption for InnoDB tables stored in file-per-table tablespaces. This feature provides at-rest encryption for physical tablespace data files. In 8.0 we extend this to include UNDO and REDO logs.  See documentation here.
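A sketch of the corresponding switches (variable names as documented for 8.0; a keyring plugin must be configured first):

```sql
SET GLOBAL innodb_redo_log_encrypt = ON;
SET GLOBAL innodb_undo_log_encrypt = ON;
```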

SQL roles

MySQL 8.0 implements SQL Roles. A role is a named collection of privileges. The purpose is to simplify the user access right management. One can grant roles to users, grant privileges to roles, create roles, drop roles, and decide what roles are applicable during a session. See blog post by Frédéric Descamps here.
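A sketch of the role lifecycle (all names are hypothetical):

```sql
CREATE ROLE 'app_read', 'app_write';
GRANT SELECT ON app_db.* TO 'app_read';
GRANT INSERT, UPDATE, DELETE ON app_db.* TO 'app_write';

CREATE USER 'alice'@'%' IDENTIFIED BY 'change_me';
GRANT 'app_read', 'app_write' TO 'alice'@'%';
SET DEFAULT ROLE 'app_read' TO 'alice'@'%';  -- active automatically at login
```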

Allow grants and revokes for PUBLIC

MySQL 8.0 introduces the configuration variable mandatory-roles which can be used for automatic assignment and granting of default roles when new users are created. Example: role1@%,role2,role3,role4@localhost.  All the specified roles are always considered granted to every user and they can’t be revoked. These roles still require activation unless they are made into default roles. When the new server configuration variable activate-all-roles-on-login is set to “ON”, all granted roles are always activated after the user has authenticated.
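A sketch using the example roles from above (the settings can also go in the configuration file):

```sql
-- Make these roles implicitly granted to every user (they cannot be revoked)
SET PERSIST mandatory_roles = 'role1@%,role2,role3,role4@localhost';

-- Activate all granted roles automatically at login
SET PERSIST activate_all_roles_on_login = ON;
```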

Breaking up the super privileges

MySQL 8.0 defines a set of new granular privileges for various aspects of what SUPER is used for in previous releases. The purpose is to limit user access rights to what is needed for the job at hand and nothing more. For example BINLOG_ADMIN, CONNECTION_ADMIN, and ROLE_ADMIN.
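A sketch of granting the new dynamic privileges (account names are hypothetical; dynamic privileges are granted at the global level):

```sql
GRANT BINLOG_ADMIN ON *.* TO 'repl_operator'@'%';
GRANT CONNECTION_ADMIN ON *.* TO 'failover_agent'@'%';

-- Withdraw a privilege just as granularly
REVOKE BINLOG_ADMIN ON *.* FROM 'repl_operator'@'%';
```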

Authorization model to manage XA-transactions

MySQL 8.0 introduces a new system privilege XA_RECOVER_ADMIN which controls the capability to execute the statement XA RECOVER. An attempt to do XA RECOVER by a user who wasn’t granted the new system privilege XA_RECOVER_ADMIN will cause an error.
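A sketch (the account name is hypothetical):

```sql
-- Only accounts holding this privilege may run XA RECOVER
GRANT XA_RECOVER_ADMIN ON *.* TO 'tx_admin'@'localhost';
```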

Password rotation policy

MySQL 8.0 introduces restrictions on password reuse. Restrictions can be configured at global level as well as individual user level. Password history is kept secure because it may give clues about habits or patterns used by individual users when they change their password. The password rotation policy comes in addition to other, existing mechanisms such as the password expiration policy and allowed password policy. See Password Management.
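A sketch of both configuration levels (values are illustrative):

```sql
-- Global defaults: remember 5 previous passwords, block reuse for 365 days
SET PERSIST password_history = 5;
SET PERSIST password_reuse_interval = 365;

-- Per-account override
ALTER USER 'bob'@'localhost' PASSWORD HISTORY 3 PASSWORD REUSE INTERVAL 90 DAY;
```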

Slow down brute force attacks on user passwords

MySQL 8.0 introduces a delay in the authentication process based on consecutive unsuccessful login attempts. The purpose is to slow down brute force attacks on user passwords. It is possible to configure the number of consecutive unsuccessful attempts before the delay is introduced and the maximum amount of delay introduced.
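Assuming the CONNECTION_CONTROL plugin is the mechanism in use (plugin and variable names as documented; thresholds are illustrative):

```sql
INSTALL PLUGIN CONNECTION_CONTROL SONAME 'connection_control.so';
INSTALL PLUGIN CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS SONAME 'connection_control.so';

SET PERSIST connection_control_failed_connections_threshold = 3;  -- start delaying after 3 consecutive failures
SET PERSIST connection_control_max_connection_delay = 10000;      -- cap the delay at 10 s (milliseconds)
```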

Retire skip-grant-tables

MySQL 8.0 disallows remote connections when the server is started with --skip-grant-tables. See also Bug#79027 reported by Omar Bourja.

Add mysqld_safe-functionality to server

MySQL 8.0 implements parts of the logic currently found in the mysqld_safe script inside the server. The work improves server usability in some scenarios, for example when using the --daemonize startup option. It also makes users less dependent upon the mysqld_safe script, which we hope to remove in the future, and fixes Bug#75343 reported by Peter Laursen.

Performance

MySQL 8.0 comes with better performance for Read/Write workloads, IO bound workloads, and high contention “hot spot” workloads. In addition, the new Resource Group feature gives users an option to optimize for specific workloads on specific hardware by mapping user threads to CPUs.

Scaling Read/Write Workloads
MySQL 8.0 scales well on RW and heavy write workloads. On intensive RW workloads we observe better performance already from 4 concurrent users, and more than 2 times better performance on high loads compared to MySQL 5.7. We can say that while 5.7 significantly improved scalability for Read Only workloads, 8.0 significantly improves scalability for Read/Write workloads. The effect is that MySQL improves hardware utilization (efficiency) for standard server-side hardware (like systems with 2 CPU sockets). This improvement is due to re-designing how InnoDB writes to the REDO log. In contrast to the historical implementation, where user threads were constantly fighting to log their data changes, in the new REDO log solution user threads are lock-free, REDO writing and flushing is managed by dedicated background threads, and the whole REDO processing becomes event-driven. See blog post by Dimitri Kravtchuk here.
Utilizing IO Capacity (Fast Storage)
MySQL 8.0 allows users to use every storage device to its full power. For example, testing with Intel Optane flash devices we were able to surpass 1M point-select QPS in a fully IO-bound workload. (IO-bound means that data are not cached in the buffer pool but must be retrieved from secondary storage.) This improvement is due to getting rid of the fil_system_mutex global lock.
Better Performance upon High Contention Loads (“hot rows”)

MySQL 8.0 significantly improves the performance for high contention workloads. A high contention workload occurs when multiple transactions are waiting for a lock on the same row in a table, causing queues of waiting transactions. Many real-world workloads are not smooth over a day, for example, but might have bursts at certain hours (Pareto distributed). MySQL 8.0 deals much better with such bursts in terms of transactions per second, mean latency, and 95th percentile latency. The benefit to the end user is better hardware utilization (efficiency), because the system needs less spare capacity and can thus run with a higher average load. The original patch was contributed by Jiamin Huang (Bug#84266). Please study the Contention-Aware Transaction Scheduling (CATS) algorithm and read the MySQL blog post by Jiamin Huang and Sunny Bains here.

MySQL 8.0 introduces global Resource Groups to MySQL. With Resource Groups, DevOps/DBAs can manage the mapping between user/system threads and CPUs. This can be used to split workloads across CPUs to obtain better efficiency and/or performance in some use cases. Thus, Resource Groups adds a tool to the DBA toolbox, one which can help the DBA to increase hardware utilization or to increase query stability. As an example, with a Sysbench RW workload running on an Intel(R) Xeon(R) CPU E7-4860 2.27 GHz 40 cores-HT box, we doubled the overall throughput by limiting the Write load to 10 cores. Resource Groups is a fairly advanced tool which requires a skilled DevOps/DBA to be used effectively, as effects will vary with the type of load and with the hardware at hand.
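A sketch of the CPU-affinity split described above (the group and table names are hypothetical; valid VCPU ranges depend on the host):

```sql
-- Dedicate CPUs 0-9 to a group for the write-heavy threads
CREATE RESOURCE GROUP batch_writes
  TYPE = USER
  VCPU = 0-9
  THREAD_PRIORITY = 10;

SET RESOURCE GROUP batch_writes;  -- bind the current session's thread

-- or scope it to one statement via an optimizer hint
INSERT /*+ RESOURCE_GROUP(batch_writes) */ INTO t1 VALUES (1);
```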

Other Features

Launch HN: SharpestMinds (YC W18) – Online Community for AI Devs

Hi HN! We're Ed and Jeremie, the founders of SharpestMinds in YC's W18 batch. We're building a free online community for ML/AI developers through which they can access job opportunities. (You can apply to join it at https://www.sharpestminds.com/members)

We're ML developers from non-traditional backgrounds. Ed did a PhD in biological physics, and Jeremie studied quantum optics before dropping out of grad school to work on SharpestMinds. We started looking for ML jobs after school, thinking it shouldn't be too hard to get one. We found to our naive surprise that we fell short on a number of skills that are needed to do good work in industry. You just don't learn much devops in grad school.

As a result we decided to build something that would make it easier for ML devs to develop (and discover!) skills they might be missing, and then get their first jobs or internships. From the outset we also wanted to build a community around the process, since looking for your first job is usually a pretty lonely experience. Because we monetize directly through hiring, we can afford to create a space for discussion without ads or algorithmic distractions :)

Our typical users so far have been grad students who know ML material well, but don't yet have much, or any, practical experience. However, you don't need a degree at all (a few of our users are self-taught high school dropouts), and anyone who knows the material is welcome. In fact, that's one of the advantages of our system: we test directly for knowledge, so it doesn't matter how you got that knowledge or how long it took you to get it. One of our goals is that by the time we present you as a candidate, things that would otherwise be holes in your resumé don't matter so much, and we can make that case to companies that are hiring.

To qualify for joining, you do an online deep learning quiz (here: https://www.sharpestminds.com/members/apply), followed by a technical interview. If you pass both, we invite you aboard. It's possible to retake the quiz a month later if you don't pass it, and we'll send you tips on what to study in the meantime.

Once you join you get access to a job board with exclusive (i.e., not scraped) internship and full-time opportunities on it. We've created an application system where your profile gets customized to the job you're applying for, to maximize the odds that you'll get an interview. We also have lists of common interview questions, mentors that you can practice interviewing with, and periodic AMAs with ML hiring managers from companies like Skydio and Airbnb.

The hardest part about building this has been figuring out the best way to present our users to employers. Early on we found that hiring managers were passing on qualified people, because their eyes would glaze over from reading too many CVs. We ended up building application profiles that let our users display their most relevant personal projects prominently in their application. The interview rate has increased significantly as a result.

If our approach works for the ML/AI field, we'd like to build communities like this for other fields too.

We're looking forward to getting feedback and hearing ideas from HN! We know there are lots of ML devs / enthusiasts on here, and we'd also be very interested in hearing about your own experiences making the transition, or similar programs you might know about. We'd also be interested in hearing about what, in your experience, are the most important programming skills needed by someone with a good knowledge base but little practical experience to be a strong contributor at their first job or internship.

Robot Conquers One of the Hardest Human Tasks: Assembling Ikea Furniture


Robots have taken our jobs, learned our chores and beaten us at our own games.

Now researchers in Singapore say they have trained one to perform another task known to confound humans: figuring out how to assemble furniture from Ikea.

A team from Nanyang Technological University programmed a robot to create and execute a plan to piece together most of Ikea’s $25 solid-pine Stefan chair on its own, calling on a medley of human skills to do so. The researchers explained their work in a study published on Wednesday in the journal Science Robotics.

“If you think about it, it requires perception, it requires you to plan a motion, it requires control between the robot and the environment, it requires transporting an object with two arms simultaneously,” said Dr. Quang-Cuong Pham, an assistant professor of engineering at the university and one of the paper’s authors. “Because this task requires so many interesting skills for robots, we felt that it could be a good project to push our capabilities to the limit.”

He and his Nanyang colleagues who worked on the study, Francisco Suárez-Ruiz and Xian Zhou, aren’t alone.

In recent years, a handful of others have set out to teach robots to assemble Ikea furniture, a task that can mimic the manipulations robots can or may someday perform on factory floors and that involves a brand many know all too well.

“It’s something that almost everybody is familiar with and almost everybody hates doing,” said Ross A. Knepper, an assistant professor of computer science at Cornell University, whose research focuses on human-robot interaction.

In 2013, Mr. Knepper was part of a team at the Massachusetts Institute of Technology that presented a paper on its work in the area, describing the “IkeaBot” the team created, which could assemble the company’s Lack table on its own.

But chairs, with backs, stretchers and other parts, pose a more complex challenge; hence the interest of the Nanyang researchers.

Their robot was made of custom software, a three-dimensional camera, two robotic arms, grippers and force detectors. The team chose only off-the-shelf tools, in order to mirror human biology.

“Humans have the same hardware to do many different things,” Dr. Pham said. “So this is kind of the genericity that we wanted to mimic.”

Also like humans, the robot had a little help to start: It was fed a kind of manual, a set of ordered instructions on how the pieces fit together. After that, though, it was on its own.

The robot proceeded in three broad phases, spread out over 20 minutes 19 seconds.

First, like humans, it took some time to stare at the pieces scattered before it.

The robot spent a few seconds photographing the scene and matching each part to the one modeled in its “manual.”

Then, over more than 11 minutes, the robot devised a plan that would allow it to quickly assemble the chair without its arms knocking into each other or into the various parts.

Finally, it put the plan in motion over the course of nearly nine minutes. The robot used grippers to pick up the wooden pins from a tray and force sensors at its “wrists” to detect when the pins, searching in a spiral pattern, finally slid into their holes. Working in unison, the arms then pressed the sides of the chair frame together.

Of course, the robot didn’t succeed right away. There were several failed attempts along the way and researchers tweaked the system before the robot was finally able to assemble the chair on its own.

The accomplishment was the culmination of three years of work, but the team is eager to see what else it can automate, Dr. Pham said.

With the help of experts in artificial intelligence, the researchers may be able to create a robot that can build a chair by following spoken directions or by watching someone else do it first, he said. Or maybe, he said, they’ll eventually develop one that assembles furniture in a way that is truly human: by ignoring the manual altogether.

Niraj Chokshi is a general assignment reporter based in New York. Before joining The Times in 2016, he covered state governments for The Washington Post. He has also worked at The Atlantic, National Journal and The Recorder, in San Francisco. @nirajc

A version of this article appears in print on , on Page B8 of the New York edition with the headline: Robot Cures Human Headache: Putting Together Ikea Furniture.

The Opening Bell (no one told me I had to make a speech)


This week will make it 20 years since I was one of the lucky ones to ring the opening bell for the American Stock Exchange. It was spring of 1998 and our company Sonic Foundry would officially begin trading that day.

Sonic Foundry Prospectus from 1998.

The IPO market had been hot in 1997 but it was getting tougher and we knew our window was probably closing. It was a Saturday in New York City and Rimas, Sonic Foundry’s CEO, and I were on our way to meet with Ray Dirks, head of a small firm we hoped would help take us public. When we walked in no one was there but Ray in a large empty office with row upon row of desks. It was like something out of a Wall Street movie. I clearly remember Rimas, my best friend and recent MBA graduate, turning to me and saying “I smell money”. I was thinking something along the lines of “are you sure you aren’t confusing emptiness and despair with money?” It turned out that Rimas had a pretty accurate nose and in a matter of months we would be doing our IPO.

No one told me I had to give a speech.

I was required to give some words before the opening bell. I remember trying to be inspiring but realizing half way through that not a single person on the trading floor gave a crap about what I had to say. They just wanted to get to work.

It was great fun. I rang the bell, the market opened, and then we toured the trading floor and watched our symbol SFO appear on the electronic ticker for the first time (we would become SOFO when we later switched to the NASDAQ). We were whisked off to a celebratory lunch and then spent the afternoon in Manhattan continually checking the stock price. There was great joy (and relief) when we actually closed above the opening price.

My co-founder Curtis J Palmer and I with some sweet phones on the trading floor. (My skin tone and slight puffiness would never suggest that I was a coder from Wisconsin)

It had taken us 7 years to get to this day, but it was only a little over 3 years since we had taken our first friends and family investment changing the direction of our company forever. The first 4 years we scraped by through savings, VISA cards, and sales from Sound Forge. Once we decided to take outside investment priorities changed. It suddenly wasn’t just about making the best product we could. It was now about revenue and growth and doing things we thought would make investors happy. It sometimes meant sacrificing long term results for short term gains and making decisions you may have done differently without that added requirement.

I don’t regret going public or our crazy ride during the internet bubble. After all we were able to create many products still in use today, Sound Forge, CD Architect, Acid, Vegas, Mediasite just to name a few. The list of companies and products created by XSOFO’ers is an impressive one. I spent 20 years working with extremely intelligent people building incredible products and making life-long friendships.

But I can tell you that the time I look back on most fondly is 1994. It was the year that Curt left Microsoft to drive across the country to join me in Madison, Wisconsin. The two of us hired a young engineer right out of university, John Feith, and the 3 of us spent all of 1994 writing Sound Forge 3.0. It would be the break out version for us and Macromedia would show up at the end of the year looking to buy us out. We spent that year working night and day, surviving on Glass Nickel pizza, in a mostly empty concrete walled building, with Curt blasting us with Tool and Pearl Jam. We didn’t worry about recurring revenue, or quarterly results. We just wrote code and that made us happy because we knew we were creating something people would love.

So next time you’re at a meetup and someone dumps on you for your “lifestyle business”, I want you to think of this Kurt Vonnegut quote:

“I urge you to please notice when you are happy, and exclaim or murmur or think at some point, ‘If this isn’t nice, I don’t know what is.’”

Because for me both 1998 and 1994 certainly were nice.

A manufacturing process that produces long strips of high-quality graphene


MIT engineers have developed a continuous manufacturing process that produces long strips of high-quality graphene.

The team’s results are the first demonstration of an industrial, scalable method for manufacturing high-quality graphene that is tailored for use in membranes that filter a variety of molecules, including salts, larger ions, proteins, or nanoparticles. Such membranes should be useful for desalination, biological separation, and other applications.

“For several years, researchers have thought of graphene as a potential route to ultrathin membranes,” says John Hart, associate professor of mechanical engineering and director of the Laboratory for Manufacturing and Productivity at MIT. “We believe this is the first study that has tailored the manufacturing of graphene toward membrane applications, which require the graphene to be seamless, cover the substrate fully, and be of high quality.”

Hart is the senior author on the paper, which appears online in the journal Applied Materials and Interfaces. The study includes first author Piran Kidambi, a former MIT postdoc who is now an assistant professor at Vanderbilt University; MIT graduate students Dhanushkodi Mariappan and Nicholas Dee; Sui Zhang of the National University of Singapore; Andrey Vyatskikh, a former student at the Skolkovo Institute of Science and Technology who is now at Caltech; and Rohit Karnik, an associate professor of mechanical engineering at MIT.

Growing graphene

For many researchers, graphene is ideal for use in filtration membranes. A single sheet of graphene resembles atomically thin chicken wire and is composed of carbon atoms joined in a pattern that makes the material extremely tough and impervious to even the smallest atom, helium.

Researchers, including Karnik’s group, have developed techniques to fabricate graphene membranes and precisely riddle them with tiny holes, or nanopores, the size of which can be tailored to filter out specific molecules. For the most part, scientists synthesize graphene through a process called chemical vapor deposition, in which they first heat a sample of copper foil and then deposit onto it a combination of carbon and other gases.

Graphene-based membranes have mostly been made in small batches in the laboratory, where researchers can carefully control the material’s growth conditions. However, Hart and his colleagues believe that if graphene membranes are ever to be used commercially they will have to be produced in large quantities, at high rates, and with reliable performance.

“We know that for industrialization, it would need to be a continuous process,” Hart says. “You would never be able to make enough by making just pieces. And membranes that are used commercially need to be fairly big — some so big that you would have to send a poster-wide sheet of foil into a furnace to make a membrane.”

A factory roll-out

The researchers set out to build an end-to-end, start-to-finish manufacturing process to make membrane-quality graphene.

The team’s setup combines a roll-to-roll approach — a common industrial approach for continuous processing of thin foils — with the common graphene-fabrication technique of chemical vapor deposition, to manufacture high-quality graphene in large quantities and at a high rate. The system consists of two spools, connected by a conveyor belt that runs through a small furnace. The first spool unfurls a long strip of copper foil, less than 1 centimeter wide. When it enters the furnace, the foil is fed through first one tube and then another, in a “split-zone” design.

While the foil rolls through the first tube, it heats up to a certain ideal temperature, at which point it is ready to roll through the second tube, where the scientists pump in a specified ratio of methane and hydrogen gas, which are deposited onto the heated foil to produce graphene. 

“Graphene starts forming in little islands, and then those islands grow together to form a continuous sheet,” Hart says. “By the time it’s out of the oven, the graphene should be fully covering the foil in one layer, kind of like a continuous bed of pizza.”

As the graphene exits the furnace, it’s rolled onto the second spool. The researchers found that they were able to feed the foil continuously through the system, producing high-quality graphene at a rate of 5 centimeters per minute. Their longest run lasted almost four hours, during which they produced about 10 meters of continuous graphene.

“If this were in a factory, it would be running 24-7,” Hart says. “You would have big spools of foil feeding through, like a printing press.”

Flexible design

Once the researchers produced graphene using their roll-to-roll method, they unwound the foil from the second spool and cut small samples out. They cast the samples with a polymer mesh, or support, using a method developed by scientists at Harvard University, and subsequently etched away the underlying copper.

“If you don’t support graphene adequately, it will just curl up on itself,” Kidambi says. “So you etch copper out from underneath and have graphene directly supported by a porous polymer — which is basically a membrane.”

The polymer covering contains holes that are larger than graphene’s pores, which Hart says act as microscopic “drumheads,” keeping the graphene sturdy and its tiny pores open. 

The researchers performed diffusion tests with the graphene membranes, flowing a solution of water, salts, and other molecules across each membrane. They found that overall, the membranes were able to withstand the flow while filtering out molecules. Their performance was comparable to graphene membranes made using conventional, small-batch approaches.

The team also ran the process at different speeds, with different ratios of methane and hydrogen gas, and characterized the quality of the resulting graphene after each run. They drew up plots to show the relationship between graphene’s quality and the speed and gas ratios of the manufacturing process. Kidambi says that if other designers can build similar setups, they can use the team’s plots to identify the settings they would need to produce a certain quality of graphene.

“The system gives you a great degree of flexibility in terms of what you’d like to tune graphene for, all the way from electronic to membrane applications,” Kidambi says.

Looking forward, Hart says he would like to find ways to include polymer casting and other steps that currently are performed by hand, in the roll-to-roll system.

“In the end-to-end process, we would need to integrate more operations into the manufacturing line,” Hart says. “For now, we’ve demonstrated that this process can be scaled up, and we hope this increases confidence and interest in graphene-based membrane technologies, and provides a pathway to commercialization.”

‘I Fundamentally Believe That My Time at Reddit Made the World a Worse Place’

Over the last few months, Select All has interviewed more than a dozen prominent technology figures about what has gone wrong with the contemporary internet for a project called “The Internet Apologizes.” We’re now publishing lengthier transcripts of each individual interview. This interview features Dan McComas, the former senior vice-president for product of Reddit and the founder and CEO of Imzy, a community-focused platform.

You can find other interviews from this series here.

Reddit started a couple of years after Facebook, and it’s super giant, and the kind of thing that you were present for was the challenge of building a platform that can accommodate a really large and sprawling set of communities, but at the same time make sure that it’s able to maintain community standards. You worked at it, I know, from the product end. I’m interested in hearing a bit about how you came to work at Reddit, and the questions you were thinking about.
I came to work at Reddit through Reddit Gifts. I started Reddit Gifts, and the intention there was just really to see if we could get people to do nice things for other people. That was it. It was just kind of a concept that we came up with and then ran with it. It ended up being pretty impactful, I think, to the overall community. Ultimately, it was too much time for us to manage, so we were going to shut it down, and then Reddit acquired us.

I came in to work at Reddit officially in 2011, and kept doing Reddit Gifts and also being involved with the Reddit side. For a few years there, it was just interesting. I was watching them kind of from the outside, but from the inside as well because I was privy to all the conversations going on. That was kind of during the time of, I think, the r/jailbait debacle. We were acquired just before Yishan became CEO. I worked really closely with Yishan throughout the years.

For me, it was an interesting aspect that I got on it because I got to see the inner workings of the decisions they were making and why they were making the decisions they were making, but I wasn’t in the position, nor would I want to be in the position, of having any kind of impact on the decisions being made.

What were some of those critical decisions that you’re thinking of?
First, I’ll say there were very few decisions made. I think that the biggest problem that Reddit had and continues to have, and that all of the platforms (Facebook, Twitter, and now Discord) continue to have, is that they’re not making decisions: there is absolutely no active thought going into their problems — problems that are going to exist in coming months or years — and what they can do to combat them. I know firsthand that between 2011 and 2015 or 2016, there was just really no thought going into it until I took over product, Ellen [Pao] took over the CEO role, and Jessica [Moreno] took over the head of community role, and we started trying to think about what was going on and what was going to be happening in the future.

We can talk about those decisions if you want, but I think the more interesting aspect is just why people aren’t thinking about this stuff, and how can we get people to think about this stuff. That’s really half of the premise of why Imzy was started. I think there’s just a complete breakdown in the kind of thought process behind how your technology is going to affect the users that use it and the world at large, and the incentive structure that is behind Silicon Valley start-ups and how they’re formed.

What’s that incentive structure?
The incentive structure is simply growth at all costs. There was never, in any board meeting that I have ever attended, a conversation about the users, about things that were going on that were bad, about potential dangers, about decisions that might affect potential dangers. There was never a conversation about that stuff.

The only time we would ever hear anything from the board on that stuff is when there were huge press debacles like the Anderson Cooper thing. In that case, we would get a call from the people who were being negatively affected by the press basically wanting to know how they should answer and what we were going to do about it.

The kind of classic comment that would come up in every board meeting was, “Why aren’t you growing faster?”

We’d say, “Well, we’ve grown by 40 million visitors since the last board meeting.”

And the response was, “That’s slower than the internet is growing; that’s not enough. You have to grow more.” Ultimately, that is why Ellen and I were let go.

Because you pushed back against that?
Because there was so much shit going on: on the site; in the press because of Ellen’s case; internally in the company, we had just moved everybody to San Francisco and pretty much the entire employee base was totally pissed off, and there was so much cleanup to be done just from an organization side; and the technology was in such bad shape. There was no way that we could focus on the type of growth efforts that they wanted. And if you have a small staff, you have to focus on the problems that are going to give you the biggest impact.

If you look from a product angle, if you look at that just from a funnel basis, it’s like 99 percent of everybody who visits Reddit don’t know what Reddit is. They find it by organic search or from a person sharing it. They land on a page and they leave and they never come back. The biggest opportunity to grow Reddit is to focus on that part of the funnel. By doing that, and putting 90 percent of your resources toward focusing on that part of the funnel, you pretty much completely ignore everything that’s actually going on on the site. You ignore the moderators; you ignore the users who are contributing content; you ignore the communities that are being created and the activities going on within them. You basically risk the health of your platform.

It’s a really mismatched incentive structure because if Reddit, specifically, focused all their efforts on the health of their platform, on the people that are really the contributors and not the consumers, they would see growth beyond what they’re getting. It’s kind of a backward way of looking at the problem from a traditional product perspective because you’re not directly affecting growth.

Why, then, do they care so much about growth? Revenue?
From the inside, I can tell you that the board is never asking about revenue. They honestly don’t care, and they said as much. They’re only asking about growth. They believe that if they have a billion unique visitors a month, that they have a property that is going to be worth a ton of money in some way eventually. They really do look at it in that abstract way.

I know they’re making a lot of strides on the advertising side. But I guarantee, that is not their focus. Their focus is purely growth.

This dynamic at Reddit is hardly unique. It seems like it applies to all the major digital platforms.
Absolutely.

In Reddit’s case, that presented a lot of challenges, and it means that they prioritized growth and sacrificed instituting measures or investing in the kinds of changes that would have made the site less toxic. Looking at it from the outside and based on your experience at Reddit and your knowledge of that platform, how do you see this problem mapping onto Facebook or YouTube or Twitter?
Yeah, those sites are wholly different in their incentive structure at this point. It’s a bit different and nuanced in that they’re all public. That brings on a number of other expectations. There is an ultimate expectation of revenue and profit. There’s also another expectation of — I wouldn’t say growth, but I would say a predictable pattern of growth, I guess. I don’t think it’s Facebook’s ultimate objective to get to 5 billion users or something. They have a humongous user base and they don’t want to lose anybody, and they want to have the right kind of activity going on.

I absolutely disagree with a lot of people and think that Facebook has done a better job at this than any other company. I think they have tried to prioritize user safety and they have tried to put processes in place for managing content. I think Twitter is much worse. I think, ultimately, the problem that Reddit has is the same as Twitter and Discord. By focusing on growth and growth only and ignoring the problems, they amassed a large set of cultural norms on their platforms. Their cultural norms are different for every community, but they tend to stem from harassment or abuse or bad behavior, and they have worked themselves into a position where they’re completely defensive and they can just never catch up on the problem. I really don’t believe it’s possible for either of them to catch up on the problem. I think the best that they can do is figure out how to hide this behavior from an average user. I don’t see any way that it’s going to improve. I have no hope for either of those platforms.

Why?
I just think that the problems are too ingrained, in not only the site and the site’s communities and users but in the general understanding and expectations of the public. I think that if you ask pretty much anybody about Reddit, they’re either not going to know what Reddit is, which is the large majority of people, or they’re going to be like, “Oh, it’s that place where there’s jailbait or something like that.” I don’t think that they’re going to be able to turn these things around.

Were there moments in which Reddit chose to double down on something and made it that much harder to work toward a solution?
I don’t know. I’m trying to think about your question. The typical pattern that we always went through was, there would be a bunch of bad behavior on the site, and the community team would have to deal with it and would be really annoyed. Sometimes they would take the free-speech side and decide that we don’t want to make a call on this. Other times they would say, “Hey, we need to take care of this,” and somebody above them would raise either the free-speech side or the “I don’t want to deal with this because it would cause too many problems on the site” side. That was more often the response.

There are a couple of subreddits, some of which have been banned and some haven’t, but the FatPeopleHate one was a really bad one. There are a bunch of animal-cruelty subreddits, specifically with a sexual nature, that they would always refuse to ban. The arguments were usually, “We don’t want to touch this because these are our most volatile users and they’ll just make things a nightmare,” and then, ultimately, these things will bubble up, make it into the press, and then we would make a decision to change things. We would deal with the immediate impact, which was painful, would last a week or two, and then it would go away. For the most part, unfortunately, I see them still following this pattern.

Is there something recent that you’re thinking of?
I can’t remember the specific instances right now, but there was a bunch of press about things that were going on on Reddit and Discord, and they both reacted and banned the subreddit. They made an announcement, “We’re taking a bigger stance against these things.” Discord made the same announcement.

It’s just more of the same. I don’t see them getting in front of the problem and it’s a total bummer, to be honest. It’s a super bummer. I hate it. I still grapple with the fact that I worked at Reddit, and so does Jessica. She’s decided to leave the industry completely. She’s completely changing her career and has left the tech industry altogether. It’s a bummer.

There’s now this rising chorus of tech executives who, whether it’s because of the Russia election stuff or user privacy concerns or broader user safety issues, are speaking out. Do you think this could amount to any substantial change?
I don’t think the existing platforms are going to change. I do believe that new platforms could be started up, could operate better, could be more mindful, and could create better infrastructure and platforms for the large public. But in order to do that, I think that one of two things needs to happen. I think that the venture capitalists need to kind of reframe their thinking on how these companies look as they start up and grow. I know firsthand that at least the investors that I worked with at Imzy are not ready to undertake that path. Imzy shut down, we still had $8 million in the bank, and we had raised $11 million. I know firsthand the palate of these investors, and from my experience, the majority of Silicon Valley investors are all the same archetype. I think that somebody needs to come along and change their thinking on that. I don’t think that that’s going to happen.

The other way is for a group of people to get together and create a modern platform using in some way their own resources, or finding the resources in interesting ways to do so. Unfortunately, it’s a really expensive process to build a platform like this. It takes a lot of engineering, it takes a lot of human power, it takes a lot of marketing and PR power, and it’s just an expensive process and it takes a long time. It’s really hard to get a network effect going. It would take years. It’s just a really hard process that somebody needs to be in for that ride. I just don’t believe that right now we have found the right mix of the right founders and team to build the infrastructure, and the right funding mechanism to make that happen. I tried, and it just totally didn’t work. It failed. I don’t know. I would love to take a crack at it, but it’s fucking hard to put these resources together.

Let’s say you were able to change the thinking and you were able to get a group of folks who were interested in putting up the capital necessary to create a new platform. Could you get a seat at the table with Facebook and Twitter?
I think it’s absolutely possible, but it takes a couple of major factors. I think a start-up needs to think about the monetization and how it can work with the users instead of against the users. I think they need to figure out the right funding mechanisms and incentive structures that also work toward the users. I think they need to have the right product team in place to focus on users. You’ll start to see a pattern emerge here. I think that they need to have a community or a service team from day one that focuses on users’ well-being. I think they need to have the right intentions. I think you need to get all those kinds of things in place; you need to understand the investment that you’re in for, as far as time. Most start-ups these days have a 12-to-18-month horizon that they look at, and that’s just not enough. That’s not enough to build one of these platforms.

Reddit got lucky. I always thought being acquired and then ignored by Condé Nast was a blessing and a curse. It allowed the communities time to organically grow. Developers let it sit and evolve. And that’s exactly the opportunity the platforms need — they need that time to find their footing and to find a number of different cohorts to grow in.

I think that the acquisition that happened was a weird one. I think Condé Nast wanted some street cred, that’s why they bought it. I don’t think they knew what they were buying. In fact, I know they didn’t know what to do with it other than just to let it sit and gain some momentum. Now [co-founder and CEO] Steve [Huffman] is able to grow it into something, and I think he’s gonna do a great job. I think he’s gonna grow it into something huge.

But you don’t think that growth solves the problems?
No, absolutely not. It’s just gonna keep getting worse. I fundamentally believe that my time at Reddit made the world a worse place. And that sucks, and it sucks to have to say that about myself.

If you were talking to people making platforms now, what would you urge them to pay attention to?
I don’t have very many opinions or thoughts about what Reddit or Twitter should do at this time. I just don’t. But I’ve got a lot of advice for start-ups, and it’s not very fucking complicated. It’s just: Think about the impact that you want to have on your users and on the people consuming your content and do the right thing. They know what the right thing is. Discord knows what the right thing is. I had conversations with Jason [Citron] a year ago about the problem of white supremacy on his site, and he said, “I don’t want to invade their privacy by going into their channels and reading what they’re doing.” And I said, “They’re gonna cause deaths because you’re not doing that.” And he said, “You really think so?” And I said, “Yeah.” And sure enough they didn’t do anything, and sure enough deaths were caused because of the shit going on in their channels.

These things can be foreseen. Don’t be idiots about it. You’re people, you see what’s going on, you see trends that are forming, just fucking do something. It’s not that hard. That’s my advice to founders of start-ups, just be mindful of it. Or put somebody in charge of being mindful of it.

All the big companies do have people who are paid to mind this, but it doesn’t seem to be enough.
My guess is that Reddit has six to ten community managers. And even if they had double that, that’s not enough. And my guess is that they have five engineers working on it; that’s just not enough. When I was there and we scaled up the community team, there were three people on the community team. There was a community of 250 million people. It’s not enough. Facebook and Twitter have teams in other countries taking care of the worst of the internet, and Reddit hasn’t even considered doing something like that. And it takes a big investment. I think you’ve got to get out in front of the problem when you start up, and you gotta be able and willing to invest in what it takes to keep up with it. But I think that ignoring either of those puts you in a place where I don’t think you can ever really catch up from it.

I really appreciate you taking the time to speak with me. I’ve been thinking about this for a few years, and it’s felt at once kind of affirming to see it blow up this way, and at the same time totally terrifying.
It’s awful and it’s gonna get worse, so you’re in the right business.


Apple open sources FoundationDB

Published April 19, 2018

The next chapter

Starting today, FoundationDB starts its next chapter as an open source project!

FoundationDB is a distributed datastore, designed from the ground up to be deployed on clusters of commodity hardware. These clusters scale well as you add machines, automatically heal from hardware failures, and have a simple API. The key-value store supports fully global, cross-row ACID transactions. That's the highest level of data consistency possible. What does this mean for you? Strong consistency makes your application code simpler, your data models more efficient, and your failure modes less surprising.
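As a rough illustration of why transactional atomicity simplifies application code, here is a minimal in-memory sketch — not FoundationDB's actual API — in which a group of writes across multiple "rows" either all commit or have no effect:

```python
# Minimal sketch of all-or-nothing writes over a key-value store.
# This is NOT the FoundationDB API; it only illustrates the guarantee
# that a cross-row transaction either fully commits or leaves no trace.

class KVStore:
    def __init__(self):
        self._data = {}

    def transact(self, fn):
        """Run fn against a buffered copy; commit every write or none."""
        buffer = dict(self._data)
        fn(buffer)              # if fn raises, _data is left untouched
        self._data = buffer     # single commit point

    def get(self, key, default=None):
        return self._data.get(key, default)

store = KVStore()
store.transact(lambda view: view.update(alice=100, bob=0))

def transfer(view):
    # Two "rows" change together, atomically.
    view["alice"] -= 10
    view["bob"] += 10

store.transact(transfer)
print(store.get("alice"), store.get("bob"))  # 90 10
```

Because no reader can ever observe the state between the two writes, the application never has to code around a half-applied transfer — which is the "simpler application code" the post refers to.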

The great thing is that FoundationDB is already well-established — it's actively developed and has years of production use. We intend to drive FoundationDB forward as a community project and we welcome your participation.

A powerful abstraction

We believe FoundationDB can become the foundation of the next generation of distributed databases. Since FoundationDB’s beginnings as a startup in 2010, the world of databases has increasingly aligned with its emphasis on data consistency.

The vision of FoundationDB is to start with a simple, powerful core and extend it through the addition of “layers”. The key-value store, which is open sourced today, is the core, focused on incorporating only features that aren’t possible to write in layers. Layers extend that core by adding features to model specific types of data and handle their access patterns.
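To make the layer idea concrete, here is a hypothetical sketch of one: a queue modeled entirely out of get/set operations on a plain key-value store, the way FoundationDB layers build richer data models on the core. The names and structure here are illustrative, not FoundationDB's actual layer code:

```python
# Illustrative "layer" sketch: a queue built purely from key-value
# operations, analogous to how FoundationDB layers model richer data
# types on the core store. This is a toy, not FoundationDB's API.

class Queue:
    def __init__(self, kv, name):
        self.kv = kv          # any dict-like key-value store
        self.name = name

    def push(self, value):
        tail = self.kv.get((self.name, "tail"), 0)
        self.kv[(self.name, "item", tail)] = value
        self.kv[(self.name, "tail")] = tail + 1

    def pop(self):
        head = self.kv.get((self.name, "head"), 0)
        tail = self.kv.get((self.name, "tail"), 0)
        if head == tail:
            return None       # queue is empty
        value = self.kv.pop((self.name, "item", head))
        self.kv[(self.name, "head")] = head + 1
        return value

kv = {}                       # stand-in for the distributed key-value core
q = Queue(kv, "jobs")
q.push("encode")
q.push("upload")
print(q.pop(), q.pop(), q.pop())  # encode upload None
```

Note that the layer holds no state of its own: everything lives in the underlying store, so several such layers (a queue, a document model, a graph) can share one cluster.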

The fundamental architecture of FoundationDB, including its use of layers, promotes the best practices of scalable and manageable systems. By running multiple layers on a single cluster (for example a document store layer and a graph layer), you can match your specific applications to the best data model. Running less infrastructure reduces your organization's operational and technical overhead.

By open sourcing the FoundationDB core, we expect the quantity and variety of layers to develop rapidly. When we think about the FoundationDB community, we approach it both in terms of the core itself and the ecosystem of layers that it enables.

By open sourcing FoundationDB, our goal is to build an open community. All major development will be done in the open. We’ve outlined a design document process to ensure that this work is done transparently and with community input. We’ve taken early steps to outline project governance to provide a basic structure that will enable members of the community who actively contribute to have a greater voice in the project decision-making.

We also want FoundationDB to be a healthy and responsive community. To that end, we’ve adopted a code of conduct based on the Contributor Covenant to outline the behaviors we encourage and those we disallow.

We’d love your participation. Here are several ways you can get involved:

  • Ask questions on the FoundationDB community forums: forums.foundationdb.org. We have categories for user-related questions (how do I use X) as well as development questions (I am digging into the FoundationDB core and want to change Y). Say hello!
  • Help improve the software by reporting bugs through GitHub issues.
  • Make contributions to the core software and documentation (please see the project’s contribution guide).

Get Started

The source for FoundationDB is available at github.com/apple/foundationdb.

Please see the Getting Started guide for the basics of how to install, use, and develop against FoundationDB. Binary installers are available for macOS, Windows, and Linux at www.foundationdb.org/download/.

Kay Boyle Knew Everyone and Saw It All

"It was in the late twenties that I went to live and work in Parisand I was then still a French citizen (through my marriage). These two facts would seem to disqualify me as a member of the lost generation or as an expatriate. But I was there, in whatever guise, and even if a bit late.” So begins Kay Boyle’s memoirs of that decade, Being Geniuses Together.

Writer Kay Boyle (1902–1992) had little patience with the legend of the “Lost Generation” of American expatriate artists and writers who gathered in Paris in the 1920s. “I think all this glorification of that wonderful Camelot period is absurd,” Boyle declared in a 1984 New York Times Book Review interview headlined “Paris Wasn’t Like That.”

“I never understood what the Lost Generation meant,” Boyle told NBC television correspondent Pat Mitchell in a 1988 interview for the Today Show. “It was not a community thing at all. It’s been misrepresented. . . . It was characterized by real desperation, there were suicides. And if you sat down at a table at night with some people you knew were writers, if you would mention anything you were doing, everyone would get up and leave the table. You didn’t talk about your work.”

Yet even if she debunked the myth, Boyle most certainly was there. Her early poems and short stories appeared in the avant-garde “little magazines” published in Paris, alongside the works of Ezra Pound, James Joyce, William Carlos Williams, Gertrude Stein, and Ernest Hemingway. Nor did she deny that something momentous was taking place in Paris: “Our daily revolt was against literary pretentiousness, against weary, dreary rhetoric, against out-worn literary conventions. We called our protest ‘the revolution of the word,’ and there is no doubt that it was high time such a revolution took place. . . . There was then, before the Twenties, no lively, wholly American, grandly experimental, and furiously disrespectful school of writing so we had to invent that school,” she wrote in a 1971 essay called “Writers in Metaphysical Revolt.” In 1929, Boyle had been among a group of expatriate writers and artists who signed a manifesto published in transition magazine calling for the “Revolution of the Word” and declaring, “THE WRITER EXPRESSES. HE DOES NOT COMMUNICATE” and “THE PLAIN READER BE DAMNED.”

In a 1931 New Republic review of two of Boyle’s earliest books, Wedding Day and Other Stories (1930) and her first published novel, Plagued by the Nightingale (1931), Katherine Anne Porter declared, “Gertrude Stein and James Joyce were and are the glories of their time and some very portentous talents have emerged from their shadows. Miss Boyle, one of the newest, I believe to be among the strongest.” Identifying Boyle as “part of the most important literary movement of her time,” Porter wrote, “She sums up the salient qualities of that movement: a fighting spirit, freshness of feeling, curiosity, the courage of her own attitude and idiom, a violently dedicated search for the meanings and methods of art.”

William Carlos Williams saw Boyle as Emily Dickinson’s successor. Poet and publisher Harry Crosby, whose Black Sun Press brought out Boyle’s first book, Short Stories, in 1929, called her “the best girl writer since Jane Austen.”

Boyle, who had a knack for always being where the major events of the twentieth century were taking place, had a long and distinguished career. She wrote more than 40 books, among them 14 novels, 11 collections of short fiction, eight volumes of poetry, essay collections, children’s literature, two ghost-written books, and translations of French novels. Her impeccable literary credentials include two O. Henry Awards for Best Short Story of the Year, two Guggenheim fellowships, and membership in the American Academy of Arts and Letters, where she (literally) occupied the Henry James chair.

I came to know Boyle in the course of writing a doctoral dissertation about her. My adviser at Penn State, Philip Young, one of the earliest and most influential Hemingway scholars, had refused to let me write about Hemingway, saying he had been done to death and that I should find a topic on which I could say something new. Until I went “shopping” on the library shelves looking for inspiration and happened upon a long line of her books, I had never heard of Boyle.

Flipping through Boyle’s books and reading the author bios, I was immediately intrigued and set to reading everything she had written. Three years later, with a degree in hand and aiming to turn the dissertation into a book, I sent a copy to Boyle and asked her permission to quote from the unpublished letters I had read among her papers at the Southern Illinois University library. I expected a yes or no answer. Instead, in a series of letters over the next several months, she responded to my work page by page, paragraph by paragraph, saying she thought my thesis “deeply right,” but not hesitating to let me know when she thought I was off the mark. She was responding in such “excruciating detail,” she said, because she did not intend to go to such trouble again, and she considered the book her authorized biography. Five years later, in 1986, my book Kay Boyle: Artist and Activist became the first full-length study of her life and work to be published.

In the years that followed, we continued to correspond, and I made periodic visits to the Bay Area to see her. I edited a volume of previously uncollected short stories, Life Being the Best and Other Stories, having had to persuade her that her early experimental fiction was, indeed, still of interest. To her, Paris in the twenties was ancient history. She was far more interested in current politics, social justice, and the problems of her neighbors.

Although Boyle was known in her lifetime as a first-rate novelist and short-story writer, her letters to her contemporaries are equally significant for what they reveal about her eventful times. A roster of her correspondents reads like a twentieth-century Who’s Who. They include Williams, Pound, Porter, and Archibald MacLeish, poet and publisher Robert McAlmon, Black Sun Press publishers Harry and Caresse Crosby, Poetry editor Harriet Monroe, New Directions publisher James Laughlin, New Yorker editor Harold Ross, Richard Wright, William Shirer, Samuel Beckett, Jessica Mitford, and many other famous figures—not to mention scores of family members, friends, politicians, students, and admirers.

Throughout her life, Boyle decried the prurient public interest in writers’ personal lives, but as she approached the age of 90, she seemed to possess a growing sense of her place in literary history and decided that she wanted a collection of her letters to stand as a record of her life in her own words. I was honored when she asked me to take on this project the year before she died. Little did I know that Boyle had written at least 25,000 letters, nor that the project would take more than two decades to complete. I gathered copies of some 7,000 letters from 80 different sources — from 54 libraries and institutional repositories as well as from the private collections of more than two dozen individuals, many of whom had known or corresponded with Boyle themselves. From these I selected 378 for a single volume of selected letters, Kay Boyle: A Twentieth-Century Life in Letters.

Of the letters Boyle composed, one she wrote as a brash, young artistic revolutionary was prescient. On August 15, 1930, she wrote to fellow poet Walter Lowenfels: “The present day reader is not worth a ha’penny bit and I don’t care whether his judgments are prejudiced or not. Hell, I’m writing for posterity—I mean, I’m going to. I’m going to write a record of our age and posterity is going to know all about our age and they won’t care if my name is Boyle or Murphy.” In both her published work and her private correspondence, Boyle did, indeed, write “a record of our age.”

Like many of her contemporaries of the so-called “Lost Generation,” Boyle came from the American Midwest. Born in 1902 in St. Paul, Minnesota, she hailed from the same hometown as F. Scott Fitzgerald, who was born there in 1896. But from there her story diverges from the common narrative. When she was two years old, her family left St. Paul, and thanks to the affluence of her grandfather, Jesse Peyton Boyle, an attorney and a cofounder of the West Publishing Company, they “traveled expensively, and dined expansively, in a great many different countries” during her childhood, as she recalled in Being Geniuses Together. They lived in and around Philadelphia and Atlantic City, and summered at a resort in the Poconos.

After the family suffered financial reversals in 1916, Boyle’s father went into the automotive repair business with a cousin in Cincinnati. As that business slowly failed, the family lived in progressively more modest dwellings until they finally moved into quarters above the garage in the city’s industrial section. Her mother, Katherine Evans Boyle, was keenly attuned to modern developments in the arts. A friend and correspondent of Alfred Stieglitz, she kept him abreast of her two daughters’ artistic achievements. In 1913, she took Kay and her sister to see the Armory Show in New York City, where Marcel Duchamp’s Nude Descending a Staircase was creating a sensation.

Boyle, who had no formal schooling, later wrote that her mother alone had been her education: “Mother accepted me and my word as she accepted James Joyce, Gertrude Stein, or Brancusi, or any serious artist. Because of her, I knew that anyone who wrote, or anyone who painted, or anyone who composed music, had a special place in life. And so, when I got to Paris, and really met these people who were accomplishing things, I felt I belonged with them, because my mother brought me up in that quite simple feeling.”

In 1923, Boyle went to France not to pursue her artistic calling among the Left Bank literati, but because she had married a French exchange student she met in Cincinnati, where he had earned a degree in engineering. At the time, by U.S. law, an American woman had to take her husband’s nationality. When she and her husband, Richard Brault, left to spend the summer with his family in Brittany, Boyle had no way of knowing that she would not return to the United States for 18 years.

She lived her first two years abroad as a French housewife in hardscrabble circumstances, in the port city of Le Havre, then in Harfleur. While her husband put in 12-hour days working for an electric company, Boyle immersed herself in her writing, turning out poems, stories, and “endless letters.” The letters record not only her artistic aims and efforts, what she is reading, and her assessments of other writers, but the mundane struggles of daily life. Writing to her mother (“dearest muddie”) on October 25, 1923, Boyle describes moving into a flat lacking heat, electricity, and running water: “We have had two back-breaking days. Moved in Tuesday in the pouring rain. Cheerfully opened the kitchen shelves and found every dish piled with rotting food—the stove stuffed with all kinds of filth—every sauce pan reeking. For solace I turned to the bedroom—the chamberpot encrusted an inch thick with urine—I had to get it out with a knife.”

In late 1925 Boyle developed a lung condition in the dank northern winter. Gratefully, she accepted an invitation from Ernest Walsh, coeditor with Ethel Moorhead of the little magazine This Quarter, to see his lung specialist in Paris at his expense and then join him and Moorhead for a few months in their rented villa in the South of France to recuperate in the warmth and sun. Boyle soon regained her health, and it was not long before she and Walsh fell in love.

Walsh died of tuberculosis in October 1926 at the age of 31, leaving Boyle pregnant with his child, born in March 1927. Her husband, Richard, invited her and her daughter, Sharon, to return to live with him in Stoke-on-Trent, England, where he had found a job with the Michelin Tire Company. Out of options, Kay accepted. She lasted there a year. “I cannot inflict a platonic wife upon Richard for the rest of his existence, and I want to take care of my daughter myself,” she wrote to poet and editor Lola Ridge on November 29, 1927.

So it was in 1928 that Boyle finally made it to Paris as a single working mother. Largely because she needed housing and child care, she joined the commune of Raymond Duncan (Isadora’s brother), who preached the virtues of the simple life and whose followers wore togas and sandals and subsisted on goat cheese and yogurt. She worked in Duncan’s two Paris gift shops, where goods supposedly handcrafted at the commune (but imported from Greece) were sold to wealthy tourists. Duncan’s hypocrisy revealed itself when he used the proceeds of a large sale to an American museum to buy himself an American luxury automobile instead of a printing press for the colony. With the help of her friends Harry and Caresse Crosby and Robert McAlmon, Boyle was able to “kidnap” her daughter and escape the commune in late 1928. She continued writing, and her circle of friends widened to include leading writers, publishers, and artists of the day, among them Joyce, transition publisher Eugene Jolas, Hart Crane, Emma Goldman, Constantin Brancusi, and Marcel Duchamp (who later would become the godfather of her sixth child, born in 1943).

While the mass of American exiles returned home when the Jazz Age collapsed along with the economy in late 1929, Boyle remained in Europe. That year, on the terrace of La Coupole, she had met Laurence Vail, known as “the King of Bohemia,” and with their melded family they took up a peripatetic life together, following favorable exchange rates across Europe. Their daughter Apple-Joan was born in December 1929, they married in Nice in 1932, their daughter Kathe was born in Kitzbühel, Austria, in 1934, and they frequently cared for Vail’s two children, Sindbad and Pegeen, from his previous marriage to Peggy Guggenheim. Living in Austria in the 1930s, Boyle witnessed firsthand the rise of fascism. She won the 1935 O. Henry Award for Best Short Story of the Year with “The White Horses of Vienna,” which featured a Nazi protagonist. In 1937, they bought a chalet in Megève, a village in the French Alps, dubbing their home “Les Cinq Enfants”—changing the name to “Les Six Enfants” when their third daughter, Clover, was born in 1939.

In letters as well as in her fiction writing, Boyle bore witness to France’s mobilization and the continuities and dissonances of everyday life. “Here half the world is skiing while the other half dies, and the night-clubs are open until three in the morning, and God knows how people can dance as madly as that and as late and be as happy as they are,” she wrote to Caresse Crosby on February 16, 1940, four months before France fell to Nazi Germany.

In the summer of 1941, war finally forced Boyle and her family from Europe. Through months of herculean efforts with various authorities, Boyle managed to arrange passage to America by way of Lisbon aboard the Pan Am Clipper. The returning entourage included her second husband, Laurence Vail; his ex-wife, Peggy Guggenheim; Guggenheim’s husband-to-be, Max Ernst; and the combined brood of six children. A photograph published in the New York World-Telegram shows the family looking dazed upon disembarking in New York City on July 14, 1941.

What the papers did not report was that Boyle had arranged separate passage on a refugee ship for her husband-to-be, Joseph von Franckenstein, an Austrian baron who had fled Nazism in 1938, whom she had met in Megève, where he was a ski instructor and the children’s tutor. Boyle and Franckenstein would marry in 1943 and have two children: Faith, her fifth daughter, and Ian, her first son.

Back in America, angered by the talk she heard accusing the French of having “lain down on the job,” Boyle began churning out stories for mass-market magazines—both to earn a living and to communicate to the widest possible audience what was happening in Europe. She went on the lecture circuit to speak about France under the German occupation. Her talk consisted of “actual stories of the defeat and of the people’s reaction to occupation, scenes in which I participated myself, an indictment of the State Department, and of the Fascist element in every country,” she wrote Robert McAlmon on December 6, 1941. “The whole thing is an attempt to demonstrate further that nothing is of any importance except the individual resistance and the individual protest.”

Boyle’s 1944 novel, Avalanche, originally serialized in the Saturday Evening Post, was the first novel about the French Resistance—and her only best-seller. In the meantime, Joseph Franckenstein became an American citizen, joined the U.S. Mountain Infantry and later the OSS, the predecessor of the CIA, and engaged in intelligence work behind enemy lines. He infiltrated Austria in the guise of a German sergeant, worked with resistance groups, and, after being captured and tortured by the Gestapo, narrowly escaped with his life.

After the war, Franckenstein and Boyle returned to Europe, he as a U.S. Foreign Service officer, and she as a foreign correspondent for the New Yorker, assigned by editor Harold Ross to “bring fiction out of Occupied Germany.” In 1952, in the heyday of the McCarthy witch hunt, they were subjected to a loyalty-security hearing in Marburg, Germany. Janet Flanner, the New Yorker’s longtime Paris correspondent, testified eloquently for the defense, and after Boyle put her friend back on the train to Paris, she wrote to her New York agent, Ann Watkins, on October 23, 1952:

Everyone feels confident that the decision of the panel will be the right one, but we will not know what this decision is for several weeks. I shall write you at length about everything very soon, but at the present moment I feel so depressed, so crushed even, by the humiliation, the degradation, the shocking injustice of all this, that I can only feel numbly for some sort of action to take to help keep this kind of thing from happening to other honest people. Joseph has aged ten years—he is haggard—his heart, let us hope, only temporarily, broken—but broken without any doubt.

Boyle and Franckenstein were unanimously cleared of all charges, but within a few months, he and several other government employees who had been cleared by the loyalty-security panel were dismissed from the Foreign Service “in the interest of national security.” Then the New Yorker withdrew her accreditation. The family was forced to return to the United States, where Franckenstein took a job teaching at a girls’ school in Rowayton, Connecticut, and Boyle found herself blacklisted, unable to place her work for most of the rest of the decade.

For nine years they fought to clear their names, and Franckenstein was finally reinstated, with apologies from the State Department, in 1962. Soon after taking up a post as cultural attaché at the American embassy in Tehran, he was diagnosed with terminal cancer. A few months before his death, Boyle put out word that she was seeking a job. In 1963, she accepted a position on the creative writing faculty at San Francisco State University, where she taught until she retired in 1979.

Living two blocks from the intersection of Haight and Ashbury streets, Boyle was at the epicenter of another cultural revolution. Throughout the sixties and seventies she became a prominent figure in Bay Area protests and picket lines. She got herself publicly fired by university president S. I. Hayakawa during the 1968 student strike and was arrested for blocking access to the Oakland Induction Center at the height of the war in Vietnam, going to jail with Joan Baez and the singer’s mother. She turned her annual birthday parties into fund-raising events for Amnesty International. Well into her eighties, always strapped for money, she took a number of short-term positions teaching creative writing at various universities (customarily opening her courses by admonishing her students never again to take another creative writing course). After moving to a retirement community in Marin County in 1989, she delighted in discomfiting some fellow residents when she determined to integrate the dining room by inviting friends of color to lunch. Until her death in 1992, Boyle continued to speak out in print and in person in support of human rights and social justice.

Boyle’s letters are a portrait of the artist in the continuous present tense. Through them we can chart Boyle’s personal life on an almost daily basis. (She wrote more than 4,000 letters to her husband Joseph von Franckenstein alone. Along with 1,548 of his letters to her, they survive among her papers at the Morris Library at Southern Illinois University, where she had stipulated that their correspondence be sealed for 10 years after her death.) But beyond their biographical interest and their value for illuminating her creative process, Boyle’s letters narrate a running eyewitness history of her times.

My work on Kay Boyle yielded an unexpected reward: the discovery of a carbon typescript of her long-lost first novel, Process, which was written in 1924 and 1925 in France. Boyle had presumed it lost, after she gave her only copy to a potential publisher. The manuscript had been hiding in plain sight—in the Berg Collection of English and American Literature at the New York Public Library. How it got there is a mystery, but the manuscript was among the papers of Boyle’s friend of the 1920s, Louise Theis.

Published for the first time in 2001 by the University of Illinois Press, Process: A Novel combines aesthetic experimentation and social activism. Underpinning it is the view that progressivism in politics and progressivism in the arts are “deeply, deeply united.” We can only wonder how the novel would have been received and how Boyle’s literary reputation might have been affected had it been published at the beginning of her career.

In 1986, when Boyle was 84 years old, her friend Studs Terkel told an interviewer, “When I think of Kay Boyle, I think of someone who has borne witness to the most traumatic and shattering events of our century: not simply this particular era, but of the whole twentieth century. Starting early. Both as a creative artist as well as being there. . . . All those events that one way or another, for better or for worse have altered all of our lives, Kay Boyle, writer, participant, was there.”

“Why is Kay Boyle not better known?” he asked. “Things are out of joint when someone like Kay Boyle is not as celebrated as she should be.”

The AI Revolution Hasn’t Happened Yet

Photo credit: Peg Skorpinski

Artificial Intelligence (AI) is the mantra of the current era. The phrase is intoned by technologists, academicians, journalists and venture capitalists alike. As with many phrases that cross over from technical academic fields into general circulation, there is significant misunderstanding accompanying the use of the phrase. But this is not the classical case of the public not understanding the scientists — here the scientists are often as befuddled as the public. The idea that our era is somehow seeing the emergence of an intelligence in silicon that rivals our own entertains all of us — enthralling us and frightening us in equal measure. And, unfortunately, it distracts us.

There is a different narrative that one can tell about the current era. Consider the following story, which involves humans, computers, data and life-or-death decisions, but where the focus is something other than intelligence-in-silicon fantasies. When my spouse was pregnant 14 years ago, we had an ultrasound. There was a geneticist in the room, and she pointed out some white spots around the heart of the fetus. “Those are markers for Down syndrome,” she noted, “and your risk has now gone up to 1 in 20.” She further let us know that we could learn whether the fetus in fact had the genetic modification underlying Down syndrome via an amniocentesis. But amniocentesis was risky — the risk of killing the fetus during the procedure was roughly 1 in 300. Being a statistician, I determined to find out where these numbers were coming from. To cut a long story short, I discovered that a statistical analysis had been done a decade previously in the UK, where these white spots, which reflect calcium buildup, were indeed established as a predictor of Down syndrome. But I also noticed that the imaging machine used in our test had a few hundred more pixels per square inch than the machine used in the UK study. I went back to tell the geneticist that I believed that the white spots were likely false positives — that they were literally “white noise.” She said “Ah, that explains why we started seeing an uptick in Down syndrome diagnoses a few years ago; it’s when the new machine arrived.”
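
The reasoning behind the false-positive suspicion can be made concrete with Bayes’ rule. The rates below are invented purely for illustration (the essay reports only the 1-in-20 figure, not the study’s underlying numbers), but they show how a sharper machine that flags white spots in more healthy fetuses can quietly gut the marker’s predictive value:

```python
# A sketch of Bayes' rule for the ultrasound anecdote. All rates here are
# illustrative assumptions; they are not from the essay or the UK study.
def posterior(prior, sensitivity, false_positive_rate):
    # P(condition | marker) = P(marker | condition) * P(condition) / P(marker)
    p_marker = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_marker

prior = 1 / 700   # assumed baseline prevalence of Down syndrome
sens = 0.30       # assumed P(white spots | Down syndrome)

old_machine = posterior(prior, sens, false_positive_rate=0.008)
new_machine = posterior(prior, sens, false_positive_rate=0.08)
```

Under these assumptions the older machine’s posterior comes out near 1 in 20, matching the geneticist’s figure, while a tenfold-higher false-positive rate on a sharper machine drops it to roughly 1 in 200: the extra spots are mostly noise, which is precisely the provenance problem the essay goes on to describe.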

We didn’t do the amniocentesis, and a healthy girl was born a few months later. But the episode troubled me, particularly after a back-of-the-envelope calculation convinced me that many thousands of people had gotten that diagnosis that same day worldwide, that many of them had opted for amniocentesis, and that a number of babies had died needlessly. And this happened day after day until it somehow got fixed. The problem that this episode revealed wasn’t about my individual medical care; it was about a medical system that measured variables and outcomes in various places and times, conducted statistical analyses, and made use of the results in other places and times. The problem had to do not just with data analysis per se, but with what database researchers call “provenance” — broadly, where did data arise, what inferences were drawn from the data, and how relevant are those inferences to the present situation? While a trained human might be able to work all of this out on a case-by-case basis, the issue was that of designing a planetary-scale medical system that could do this without the need for such detailed human oversight.

I’m also a computer scientist, and it occurred to me that the principles needed to build planetary-scale inference-and-decision-making systems of this kind, blending computer science with statistics, and taking into account human utilities, were nowhere to be found in my education. And it occurred to me that the development of such principles — which will be needed not only in the medical domain but also in domains such as commerce, transportation and education — was at least as important as that of building AI systems that can dazzle us with their game-playing or sensorimotor skills.

Whether or not we come to understand “intelligence” any time soon, we do have a major challenge on our hands in bringing together computers and humans in ways that enhance human life. While this challenge is viewed by some as subservient to the creation of “artificial intelligence,” it can also be viewed more prosaically — but with no less reverence — as the creation of a new branch of engineering. Much like civil engineering and chemical engineering in decades past, this new discipline aims to corral the power of a few key ideas, bringing new resources and capabilities to people, and doing so safely. Whereas civil engineering and chemical engineering were built on physics and chemistry, this new engineering discipline will be built on ideas that the preceding century gave substance to — ideas such as “information,” “algorithm,” “data,” “uncertainty,” “computing,” “inference,” and “optimization.” Moreover, since much of the focus of the new discipline will be on data from and about humans, its development will require perspectives from the social sciences and humanities.

While the building blocks have begun to emerge, the principles for putting these blocks together have not yet emerged, and so the blocks are currently being put together in ad-hoc ways.

Thus, just as humans built buildings and bridges before there was civil engineering, humans are proceeding with the building of societal-scale, inference-and-decision-making systems that involve machines, humans and the environment. Just as early buildings and bridges sometimes fell to the ground — in unforeseen ways and with tragic consequences — many of our early societal-scale inference-and-decision-making systems are already exposing serious conceptual flaws.

And, unfortunately, we are not very good at anticipating what the next emerging serious flaw will be. What we’re missing is an engineering discipline with its principles of analysis and design.

The current public dialog about these issues too often uses “AI” as an intellectual wildcard, one that makes it difficult to reason about the scope and consequences of emerging technology. Let us begin by considering more carefully what “AI” has been used to refer to, both recently and historically.

Most of what is being called “AI” today, particularly in the public sphere, is what has been called “Machine Learning” (ML) for the past several decades. ML is an algorithmic field that blends ideas from statistics, computer science and many other disciplines (see below) to design algorithms that process data, make predictions and help make decisions. In terms of impact on the real world, ML is the real thing, and not just recently. Indeed, that ML would grow into massive industrial relevance was already clear in the early 1990s, and by the turn of the century forward-looking companies such as Amazon were already using ML throughout their business, solving mission-critical back-end problems in fraud detection and logistics-chain prediction, and building innovative consumer-facing services such as recommendation systems. As datasets and computing resources grew rapidly over the ensuing two decades, it became clear that ML would soon power not only Amazon but essentially any company in which decisions could be tied to large-scale data. New business models would emerge. The phrase “Data Science” began to be used to refer to this phenomenon, reflecting the need of ML algorithms experts to partner with database and distributed-systems experts to build scalable, robust ML systems, and reflecting the larger social and environmental scope of the resulting systems.

This confluence of ideas and technology trends has been rebranded as “AI” over the past few years. This rebranding is worthy of some scrutiny.

Historically, the phrase “AI” was coined in the late 1950s to refer to the heady aspiration of realizing in software and hardware an entity possessing human-level intelligence. We will use the phrase “human-imitative AI” to refer to this aspiration, emphasizing the notion that the artificially intelligent entity should seem to be one of us, if not physically at least mentally (whatever that might mean). This was largely an academic enterprise. While related academic fields such as operations research, statistics, pattern recognition, information theory and control theory already existed, and were often inspired by human intelligence (and animal intelligence), these fields were arguably focused on “low-level” signals and decisions. The ability of, say, a squirrel to perceive the three-dimensional structure of the forest it lives in, and to leap among its branches, was inspirational to these fields. “AI” was meant to focus on something different — the “high-level” or “cognitive” capability of humans to “reason” and to “think.” Sixty years later, however, high-level reasoning and thought remain elusive. The developments which are now being called “AI” arose mostly in the engineering fields associated with low-level pattern recognition and movement control, and in the field of statistics — the discipline focused on finding patterns in data and on making well-founded predictions, tests of hypotheses and decisions.

Indeed, the famous “backpropagation” algorithm that was rediscovered by David Rumelhart in the early 1980s, and which is now viewed as being at the core of the so-called “AI revolution,” first arose in the field of control theory in the 1950s and 1960s. One of its early applications was to optimize the thrusts of the Apollo spaceships as they headed towards the moon.
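
For readers encountering the term, backpropagation is simply the chain rule of calculus run backwards through a computation, the same machinery a control theorist would use to differentiate a trajectory. A minimal sketch on a one-hidden-unit network follows; the network, weights, target and learning rate are all invented for illustration:

```python
import math

def train_step(w1, w2, x, y, lr=0.1):
    # forward pass through y_hat = w2 * tanh(w1 * x)
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    loss = 0.5 * (y_hat - y) ** 2
    # backward pass: apply the chain rule from the loss back to each weight
    d_yhat = y_hat - y              # d(loss)/d(y_hat)
    d_w2 = d_yhat * h               # d(loss)/d(w2)
    d_h = d_yhat * w2               # d(loss)/d(h)
    d_w1 = d_h * (1 - h ** 2) * x   # tanh'(z) = 1 - tanh(z)^2
    # gradient-descent update
    return w1 - lr * d_w1, w2 - lr * d_w2, loss

w1, w2 = 0.5, 0.5                   # arbitrary starting weights
for _ in range(200):
    w1, w2, loss = train_step(w1, w2, x=1.0, y=0.8)
```

Each step differentiates the loss with respect to every intermediate quantity in reverse order, which is all “backpropagation” names; modern deep-learning frameworks automate exactly this bookkeeping at scale.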

Since the 1960s much progress has been made, but it has arguably not come about from the pursuit of human-imitative AI. Rather, as in the case of the Apollo spaceships, these ideas have often been hidden behind the scenes, and have been the handiwork of researchers focused on specific engineering challenges. Although not visible to the general public, research and systems-building in areas such as document retrieval, text classification, fraud detection, recommendation systems, personalized search, social network analysis, planning, diagnostics and A/B testing have been a major success — these are the advances that have powered companies such as Google, Netflix, Facebook and Amazon.

One could simply agree to refer to all of this as “AI,” and indeed that is what appears to have happened. Such labeling may come as a surprise to optimization or statistics researchers, who wake up to find themselves suddenly referred to as “AI researchers.” But labeling of researchers aside, the bigger problem is that the use of this single, ill-defined acronym prevents a clear understanding of the range of intellectual and commercial issues at play.

The past two decades have seen major progress — in industry and academia — in a complementary aspiration to human-imitative AI that is often referred to as “Intelligence Augmentation” (IA). Here computation and data are used to create services that augment human intelligence and creativity. A search engine can be viewed as an example of IA (it augments human memory and factual knowledge), as can natural language translation (it augments the ability of a human to communicate). Computing-based generation of sounds and images serves as a palette and creativity enhancer for artists. While services of this kind could conceivably involve high-level reasoning and thought, currently they don’t — they mostly perform various kinds of string-matching and numerical operations that capture patterns that humans can make use of.

Hoping that the reader will tolerate one last acronym, let us conceive broadly of a discipline of “Intelligent Infrastructure” (II), whereby a web of computation, data and physical entities exists that makes human environments more supportive, interesting and safe. Such infrastructure is beginning to make its appearance in domains such as transportation, medicine, commerce and finance, with vast implications for individual humans and societies. This emergence sometimes arises in conversations about an “Internet of Things,” but that effort generally refers to the mere problem of getting “things” onto the Internet — not to the far grander set of challenges associated with making these “things” capable of analyzing those data streams to discover facts about the world, and of interacting with humans and other “things” at a far higher level of abstraction than mere bits.

For example, returning to my personal anecdote, we might imagine living our lives in a “societal-scale medical system” that sets up data flows, and data-analysis flows, between doctors and devices positioned in and around human bodies, thereby able to aid human intelligence in making diagnoses and providing care. The system would incorporate information from cells in the body, DNA, blood tests, environment, population genetics and the vast scientific literature on drugs and treatments. It would not just focus on a single patient and a doctor, but on relationships among all humans — just as current medical testing allows experiments done on one set of humans (or animals) to be brought to bear in the care of other humans. It would help maintain notions of relevance, provenance and reliability, in the way that the current banking system focuses on such challenges in the domain of finance and payment. And, while one can foresee many problems arising in such a system — involving privacy issues, liability issues, security issues, etc. — these problems should properly be viewed as challenges, not show-stoppers.

We now come to a critical issue: Is working on classical human-imitative AI the best or only way to focus on these larger challenges? Some of the most heralded recent success stories of ML have in fact been in areas associated with human-imitative AI — areas such as computer vision, speech recognition, game-playing and robotics. So perhaps we should simply await further progress in domains such as these. There are two points to make here. First, although one would not know it from reading the newspapers, success in human-imitative AI has in fact been limited — we are very far from realizing human-imitative AI aspirations. Unfortunately the thrill (and fear) of making even limited progress on human-imitative AI gives rise to levels of over-exuberance and media attention that are not present in other areas of engineering.

Second, and more importantly, success in these domains is neither sufficient nor necessary to solve important IA and II problems. On the sufficiency side, consider self-driving cars. For such technology to be realized, a range of engineering problems will need to be solved that may have little relationship to human competencies (or human lack-of-competencies). The overall transportation system (an II system) will likely more closely resemble the current air-traffic control system than the current collection of loosely-coupled, forward-facing, inattentive human drivers. It will be vastly more complex than the current air-traffic control system, specifically in its use of massive amounts of data and adaptive statistical modeling to inform fine-grained decisions. It is those challenges that need to be in the forefront, and in such an effort a focus on human-imitative AI may be a distraction.

As for the necessity argument, it is sometimes argued that the human-imitative AI aspiration subsumes IA and II aspirations, because a human-imitative AI system would not only be able to solve the classical problems of AI (as embodied, e.g., in the Turing test), but it would also be our best bet for solving IA and II problems. Such an argument has little historical precedent. Did civil engineering develop by envisaging the creation of an artificial carpenter or bricklayer? Should chemical engineering have been framed in terms of creating an artificial chemist? Even more polemically: if our goal was to build chemical factories, should we have first created an artificial chemist who would have then worked out how to build a chemical factory?

A related argument is that human intelligence is the only kind of intelligence that we know, and that we should aim to mimic it as a first step. But humans are in fact not very good at some kinds of reasoning — we have our lapses, biases and limitations. Moreover, critically, we did not evolve to perform the kinds of large-scale decision-making that modern II systems must face, nor to cope with the kinds of uncertainty that arise in II contexts. One could argue that an AI system would not only imitate human intelligence, but also “correct” it, and would also scale to arbitrarily large problems. But we are now in the realm of science fiction — such speculative arguments, while entertaining in the setting of fiction, should not be our principal strategy going forward in the face of the critical IA and II problems that are beginning to emerge. We need to solve IA and II problems on their own merits, not as a mere corollary to a human-imitative AI agenda.

It is not hard to pinpoint algorithmic and infrastructure challenges in II systems that are not central themes in human-imitative AI research. II systems require the ability to manage distributed repositories of knowledge that are rapidly changing and are likely to be globally incoherent. Such systems must cope with cloud-edge interactions in making timely, distributed decisions and they must deal with long-tail phenomena whereby there is lots of data on some individuals and little data on most individuals. They must address the difficulties of sharing data across administrative and competitive boundaries. Finally, and of particular importance, II systems must bring economic ideas such as incentives and pricing into the realm of the statistical and computational infrastructures that link humans to each other and to valued goods. Such II systems can be viewed as not merely providing a service, but as creating markets. There are domains such as music, literature and journalism that are crying out for the emergence of such markets, where data analysis links producers and consumers. And this must all be done within the context of evolving societal, ethical and legal norms.

Of course, classical human-imitative AI problems remain of great interest as well. However, the current focus on doing AI research via the gathering of data, the deployment of “deep learning” infrastructure, and the demonstration of systems that mimic certain narrowly-defined human skills — with little in the way of emerging explanatory principles — tends to deflect attention from major open problems in classical AI. These problems include the need to bring meaning and reasoning into systems that perform natural language processing, the need to infer and represent causality, the need to develop computationally-tractable representations of uncertainty and the need to develop systems that formulate and pursue long-term goals. These are classical goals in human-imitative AI, but in the current hubbub over the “AI revolution,” it is easy to forget that they are not yet solved.

IA will also remain quite essential, because for the foreseeable future, computers will not be able to match humans in their ability to reason abstractly about real-world situations. We will need well-thought-out interactions of humans and computers to solve our most pressing problems. And we will want computers to trigger new levels of human creativity, not replace human creativity (whatever that might mean).

It was John McCarthy (while a professor at Dartmouth, and soon to take a 
position at MIT) who coined the term “AI,” apparently to distinguish his 
budding research agenda from that of Norbert Wiener (then an older professor at MIT). Wiener had coined “cybernetics” to refer to his own vision of intelligent systems — a vision that was closely tied to operations research, statistics, pattern recognition, information theory and control theory. McCarthy, on the other hand, emphasized the ties to logic. In an interesting reversal, it is Wiener’s intellectual agenda that has come to dominate in the current era, under the banner of McCarthy’s terminology. (This state of affairs is surely, however, only temporary; the pendulum swings more in AI than 
in most fields.)

But we need to move beyond the particular historical perspectives of McCarthy and Wiener.

We need to realize that the current public dialog on AI — which focuses on a narrow subset of industry and a narrow subset of academia — risks blinding us to the challenges and opportunities that are presented by the full scope of AI, IA and II.

This scope is less about the realization of science-fiction dreams or nightmares of super-human machines, and more about the need for humans to understand and shape technology as it becomes ever more present and influential in their daily lives. Moreover, in this understanding and shaping there is a need for a diverse set of voices from all walks of life, not merely a dialog among the technologically attuned. Focusing narrowly on human-imitative AI prevents an appropriately wide range of voices from being heard.

While industry will continue to drive many developments, academia will also continue to play an essential role, not only in providing some of the most innovative technical ideas, but also in bringing researchers from the computational and statistical disciplines together with researchers from other 
disciplines whose contributions and perspectives are sorely needed — notably 
the social sciences, the cognitive sciences and the humanities.

On the other hand, while the humanities and the sciences are essential as we go forward, we should also not pretend that we are talking about something other than an engineering effort of unprecedented scale and scope — society is aiming to build new kinds of artifacts. These artifacts should be built to work as claimed. We do not want to build systems that help us with medical treatments, transportation options and commercial opportunities to find out after the fact that these systems don’t really work — that they make errors that take their toll in terms of human lives and happiness. In this regard, as I have emphasized, there is an engineering discipline yet to emerge for the data-focused and learning-focused fields. As exciting as these latter fields appear to be, they cannot yet be viewed as constituting an engineering discipline.

Moreover, we should embrace the fact that what we are witnessing is the creation of a new branch of engineering. The term “engineering” is often 
invoked in a narrow sense — in academia and beyond — with overtones of cold, affectless machinery, and negative connotations of loss of control by humans. But an engineering discipline can be what we want it to be.

In the current era, we have a real opportunity to conceive of something historically new — a human-centric engineering discipline.

I will resist giving this emerging discipline a name, but if the acronym “AI” continues to be used as placeholder nomenclature going forward, let’s be aware of the very real limitations of this placeholder. Let’s broaden our scope, tone down the hype and recognize the serious challenges ahead.

Michael I. Jordan

PowerBuilder History, Powersoft History (2004)

Written by Patrick Lannigan - Winter 2004

Powersoft grew out of a company called Computer Solutions Inc (CSI), which was founded in 1974. In the beginning, CSI, like many software companies, provided consulting services. In CSI's case, they focused on small to medium-sized manufacturers. As a result of their experience in this field, they decided to build their own software package, called GrowthPower, which was released in 1981. It was an MRP II system with a suite of integrated financial products which ran (exclusively) on the HP3000 platform. There were, at one time, over 1,000 customers for GrowthPower.

Mitchell Kertzman, the CEO of CSI, started to solicit feedback from his customers on their future needs. The answer came back loud and clear. They wanted a graphical interface (remember that by this time, in the early 90s, Windows was catching on like wildfire and made the old "character" interfaces look inferior). So, CSI started looking around for tools and technology to build their next-generation application. They didn't like what they saw. The only tools at the time that could provide a graphical interface required programmers to use the C language.

As luck would have it, Dave Litwack, former VP of R&D for Cullinet, had just left Cullinet after it was bought by Computer Associates and was circulating a business plan within the Boston-area venture capital community, seeking funding to build an easy-to-use client/server graphical tool that would communicate with the most popular relational databases, like Oracle and Sybase. Dave Litwack had difficulty finding somebody to fund him, but then ran into Mitchell Kertzman. PowerBuilder was born a year or so after their first date.

Powersoft's PowerBuilder 1.0

David Litwack headed up the R&D effort for PowerBuilder and Version 1.0 went into beta (with a codename of "Headstart") in August of 1990. Some of the firms who participated in the beta program were American Airlines, Microsoft, 3M, Fidelity Investments, Coca-Cola, and many others.

PowerBuilder Version 1.0 went into official release in July of 1991. In just six months, Powersoft sold $5.2 million worth of product. Version 2.0 was released less than one year later and sales (in 1992) climbed to $22.1 Million. Profitability was also achieved in the first quarter of 1992.

Powersoft Goes Public

Powersoft went public on February 3, 1993. Shares surged from $20 to $38 a share on the first day. Shares were volatile for the next weeks and months but then enjoyed a steady climb as Powersoft continued to pump out record results (1993 revenue was $57M, and 1994 revenue was $133M). Then, when investors and executives alike were basking in the sunlight of infallibility, some gentlemen callers came knocking. There was an offer made. It was an offer like no other. The groom asking for Powersoft's hand in marriage was Sybase, and the billion-dollar dowry offer was very seductive. So a wedding/merger was arranged on February 13, 1995. I hope they took pictures during the wedding ceremony and honeymoon because the "paper valuation" (the deal was done with Sybase stock—worth $904M) didn't last long. The bad news arrived in the form of fabricated (Sybase) sales results. Sybase stock took a tumble, along with the fortunes of many Powersoft executives like Mitchell Kertzman and David Litwack.

Despite the troubles at Sybase, Powersoft's PowerBuilder technology still enjoyed a dominant role in new client/server development, until 1996. That's when the Web went wild. That's also when Visual Basic grew up and Borland's Delphi product was launched. On top of these troubles, users were experiencing problems building enterprise applications with PowerBuilder (it wasn't a fault of PowerBuilder per se, but rather a problem with client/server applications overall). Budgets were suddenly diverted to Web projects. Visual Basic and Delphi users started to outnumber Powersoft programmers. Talk of PowerBuilder faded slowly, and the product never quite regained its former glory.

The PowerBuilder groupies still hung on, but the glory days were gone forever.

Get the Wikipedia details about Powersoft and PowerBuilder here.

Powersoft PowerBuilder Advertisements

Link to LARGE Powersoft Advertisement (Everything You Asked For.) (Opens in a New Window.)

(Below) Two images scanned from a two-page advertisement on PowerBuilder Desktop.

This advertisement was a defensive play against the growing strength of Visual Basic and MS-Access. If I remember correctly, PowerBuilder desktop sold for US$249. PowerBuilder sales were growing so rapidly that if a developer had PowerBuilder experience, they could demand higher rates. Powersoft was keenly aware of this fact—and there were other advertisements (not included here) that featured "PowerBuilder Developers Wanted." (Unfortunately I couldn't put my hands on one of those advertisements.)

API and Other Platform Product Changes


Ursa Labs: an innovation lab for open source data science


Funding open source software development is a complicated subject. I’m excited to announce that I’ve founded Ursa Labs (https://ursalabs.org), an independent development lab with the mission of innovation in data science tooling.

I am initially partnering with RStudio and Two Sigma to assist me in growing and maintaining the lab’s operations, and to align engineering efforts on creating interoperable, cross-language computational systems for data science, all powered by Apache Arrow.

In this post, I explain the rationale for forming Ursa Labs and what to expect in the future.

Funding Open Source Software: Maintenance and Innovation

In recent years, the world’s businesses have become more dependent than ever on open source software (“OSS”, henceforth). How and why this happened will surely be the subject of future books and research, but at present we are faced with existential challenges as we endeavor to keep making open source “work” for everyone.

In my experience, open source projects feature two dominant archetypal modes: innovation and maintenance. The innovation stage often occurs at the beginning of projects: there are few users and the software changes or evolves rapidly. When a project becomes successful, it can become more conservative. Development shifts to stability, bug fixes, and gradual change and growth. There are many more users, and changes, especially “breaking” ones, can have a high cost to the project’s reputation and future. OSS maintainers, who are often volunteers, routinely “burn out” under the strain of supporting burgeoning user bases who sometimes take a project’s existence and maintenance for granted.

Supporting Maintenance

Some OSS projects become so important that their users consider them to be mission-critical infrastructure software, like Linux or security libraries like OpenSSL. The consequences of under-maintained infrastructure have been studied extensively in recent years in the wake of shocking security vulnerabilities exposed in widely used projects.

Funding OSS maintenance, while challenging, has a clear value-proposition to the world’s organizations, who increasingly view their dependence on OSS as a liability. Companies like RedHat have built their businesses on providing peace-of-mind around mission-critical OSS like Linux.

We are starting to see new business models emerge for funding OSS maintenance, such as Tidelift, which has begun selling a type of “insurance policy” for the package dependency graph of mission-critical OSS frameworks like React and AngularJS. The understanding is that funds from these insurance policies will be paid to the maintainers of projects in the dependency graph to provide timely bug fixes and support the healthy operation of the top-level projects.

Supporting Innovation

Funding the innovation stage of OSS can be more difficult because of the heightened risk profile. A new project may or may not become successful or widely used.

Most people know that I struggled for many years to obtain support for developing pandas; in the end I convinced Adam Klein and Chang She to take time away from their well-paying New York finance day jobs (at Goldman Sachs and Citigroup, respectively) to work on the project with me in 2012. I estimate between the three of us pandas cost at least $500,000 in opportunity cost as we did not earn wages during the thousands of hours we invested in the project in 2011 and 2012. If we had refused to build pandas unless we raised enough money to pay for our rent and families’ cost of living, the project likely would not be what it is today.

Open source data science software has become incredibly important to how the world analyzes data and builds production machine learning and AI models. In Google, Facebook, and other industry research labs, Python has become the primary machine learning user interface. If you had told me this in 2008 when I started building pandas, I might not have believed you.

The risks to not funding innovation in OSS for data science are many. The ones I think most about are:

  • Data scientists’ productivity will suffer, especially as data sizes continue to grow.
  • Computing costs will remain high as less efficient computing tools are applied as well as possible to process the world’s data.
  • Organizations continue to rely on less flexible, more expensive proprietary software because they perceive OSS as inadequate.

Traps, and avoiding them

OSS developers have employed various strategies to support their work in lieu of direct funding. Sometimes they work, and sometimes they can be “traps”. I have directly experienced some variant of all of these problems.

  • The Consulting Trap: project creators hustle for services contracts with users of their software. The contract dealmaking hustle distracts from development, and the services work itself fragments attention from the core development of projects.

  • The Startup Trap: startups build businesses that monetize the growing use of one or more open source projects. While some of these businesses have succeeded, the creators of open source projects generally must divide their attention between building a business and building a software project. This is obviously a tradeoff: with venture capital and revenue, one can build a larger engineering team. But, startups can have governance conflicts with their user and developer communities. Businesses with hybrid open-source models sometimes must short-change OSS work in favor of work that will grow revenue; the company founders’ desire to invest in OSS may come into conflict with the expectations of the board of directors, who are usually venture capital investors. In some cases, unfortunately, OSS developers are laid off to cut costs.

  • The Corporate User Trap: a large company that depends on OSS hires or grows developers of those projects to innovate and maintain them going forward. In some cases a company may start a closed-source project, then open source it later. There are many possible problems that arise with this model. Developers may leave a company and fail to find another that will support their work on a project. A company may lose interest in a project and assign the developers to a different project. In some cases, a company will build the new features they need and then “disappear” as they have gotten what they need out of the project. A developer’s ability to grow a larger development team may be limited by budgeting concerns that are out of their control.

2013 to now: DataPad, Cloudera, Two Sigma, and Apache Arrow

Hot on the heels of getting pandas off the ground and publishing my book Python for Data Analysis in 2012, Chang She and I founded DataPad, a venture-funded startup, with the objective of building a data product and later investing R&D budget back into the Python ecosystem. We handed off day-to-day maintenance of pandas to Jeff Reback, Phillip Cloud, and others, who’ve done an amazing job growing the project over the last 5 years.

By mid-2014, at DataPad we found ourselves working on complex systems engineering problems in enterprise analytics that would be more effectively solved in a larger enterprise software company. After the experience of building out pandas and developing the DataPad product, I had accumulated a list of complaints and grievances against pandas’s computational foundations that I summarized infamously in my talk 10 Things I Hate about pandas. In September 2014, the DataPad team and I joined Cloudera to work on these problems and more.

When I arrived at Cloudera, one of my objectives was to form alliances with the big data and analytic database communities to collaborate to solve shared data systems problems for the benefit of the data science world. The two major artifacts of my time at Cloudera were Ibis, a lazy computational expression framework geared toward SQL-style execution engines, and Apache Arrow, a cross-language in-memory data frame format and analytics development platform.

By mid-2016, facing a competitive big data infrastructure market and an arduous path to profitability, Cloudera was not well-positioned to build a team to join me in developing Apache Arrow and improve computational systems for data science. While there was some obvious low-hanging fruit to accelerate Python-on-Spark, overall ROI from investing in Arrow was likely to be several years away and thus was deemed too risky to justify a large budget allocation.

Around this time, I was lucky to connect with Two Sigma, a financial technology and investment management company with a growing OSS development practice and a petascale data warehouse being actively used with Apache Spark and the Python data science stack. I joined Two Sigma in 2016 as a software architect in the analysis tools group, with a plan to make a forward-looking long-term investment in performance and scalability for the Python data stack via the Apache Arrow project. Working with the Two Sigma engineering team, we have helped reach some major Arrow-related milestones. The project has made 11 releases, grown over 130 contributors, and established exciting collaborations with Apache Spark (accelerated data access and Python function execution in Apache Spark), Berkeley RISELab (Fast Python Serialization with Ray and Apache Arrow), and the GPGPU community.

As Apache Arrow has gotten off the ground over the last few years, it has become apparent that the problems we are tackling are much larger in scope than the interests of a single organization or even programming language. As I have argued extensively in talks over the last few years (Data Science Without Borders at JupyterCon, Memory Interoperability for Analytics and Machine Learning at Stanford’s ScaledML, Raising the Tides: Open Source Analytics for Data Science at the Newsweek AI and Data Science Conference), we are solving the same kinds of problems across Python, R, and other languages, and Arrow provides a unifying technology for creating shared computational infrastructure for data science.

After many years collaborating with and learning from the Python, R, JVM, Julia, and other data science communities, I have become convinced that the data science world would benefit from shared computational libraries. I envision a portable, community-standard “data science runtime” that can be utilized for processing native Arrow-based data frames in just about any programming language. This is a huge project. Some of the major areas of work for this include:

  • Portable C++ shared libraries with bindings for each host language (Python, R, Ruby, etc.)
  • Portable, multithreaded Apache Arrow-based execution engine for efficient evaluation of lazy data frame expressions created in the host language.
  • Reusable operator “kernel” library containing functions that use the Arrow format for input and output. This includes pandas-style array functions as well as SQL-style relational operations (joins, aggregations, etc.)
  • Compilation of operator “subgraphs” using LLVM; optimization of common operator patterns.
  • Support for user-defined operators and function kernels.
  • Comprehensive interoperability with existing data representations (e.g. data frames in R, pandas / NumPy in Python).
  • New front-end interfaces for host languages (e.g. dplyr and other "tidy" front ends for R, evolution of pandas for Python).

Enter the Dragon Bear

In light of my experiences building data science software over the last 10 years, I believe the way that I can best serve the open source data science world is by creating an independent organization, Ursa Labs, dedicated to advancing cross-language computational systems for data science. The immediate purpose of this organization is to hire and support developers of data science systems that are part of the burgeoning Apache Arrow ecosystem. The lab will partner with larger organizations to be supported through direct funding and engineering collaborations.

While I am primarily looking for direct funding relationships with companies to grow my development team, I will also be accepting smaller direct donations to the lab which can hopefully support additional developer headcount in time.

Partnering with RStudio

RStudio will be helping me with the administrative side of operating Ursa Labs (HR, benefits, finances, etc.), which amounts to a lot of hard work. I will manage the money raised by the lab, which will primarily be used to pay salary and benefits for full-time engineers on the Ursa Labs team. The Ursa team and I will operate as a functionally independent engineering group within the RStudio organization and collaborate with other members of RStudio on R-related development work. While it might seem strange to some that I, a long-time Python developer, would be partnering with a company that builds software for R programmers, it actually makes perfect sense.

In 2016, Hadley Wickham and I had a brief collaboration to create the Feather file format, an Arrow-based interoperable binary file format for data frames that can be used from Python and R. The goal of Feather was to socialize the idea of interoperable data technology using Apache Arrow. Many people were surprised to see Hadley and me working together when Python and R are “supposed” to be enemies. The reality is that Hadley and I think the “language wars” are stupid when the real problem we are solving is human user interface design for data analysis. The programming languages are our medium for crafting accessible and productive tools. It has long been a frustration of mine that it isn’t easier to share code and systems between R and Python. This is part of why working on Arrow has been so important for me; it provides a path to sharing systems code outside of Python by enabling free interoperability at the data level.

R, like Python, faces systems-level problems around fast and scalable in-memory data processing. Since the problems we are solving are so structurally similar, we have long believed that a more extensive collaboration between the communities should happen. It is my goal for the software that I am building to work equally well for R programmers as for Python programmers. As part of the collaboration with RStudio, Hadley Wickham will act as a technical adviser to the work to ensure that we are looking after the needs of R users. We’re all very excited about this.

In the last several years, I have been extremely impressed with the RStudio organization and its founder and CEO, J.J. Allaire. As he, Hadley, and I have gotten to know each other at data science events, I found that we share a passion for the long-term vision of empowering data scientists and building a positive relationship with the open source user community. Critically, RStudio has avoided the “startup trap” and managed to build a sustainable business while still investing the vast majority of its engineering resources in open source development. Nearly 9 years have passed since J.J. started building the RStudio IDE, but in many ways he and Hadley and others feel like they are just getting started.

Partnering with Two Sigma

During my time at Two Sigma, I worked towards a shared vision of data science tools with Matt Greenwood, who heads the company’s Modeling Engineering organization, and David Palaitis, who manages Two Sigma’s open source efforts. After almost two years, we realized the problems I’m trying to solve with Apache Arrow are bigger than any one company can support. Eventually a project’s scope and needs expand beyond the interests and capabilities of any single organization.

Having access to real problems in data science at massive scale at Two Sigma has informed and validated my vision for Ursa, and my departure to start Ursa Labs does not mean a break in my relationship with the company. By partnering with them as I begin my next venture, I can keep the feedback loop open as they serve as early adopters of Arrow software. Two Sigma’s interest in Arrow is part of a larger commitment to creating a productive future for data science, including commitments to the communities built around pandas, Ibis, Jupyter, Spark, Mesos and TensorFlow, among others.

Two Sigma will contribute to Ursa Labs through employee contributions to Ursa Labs projects like Arrow and by funding external open source developers as needed. They will also collaborate on technical advising and rallying support in the community. Matt can help champion the new initiative through his seats on the boards of NumFOCUS and TS Ventures. I’ll continue my work with Two Sigma core OSS engineers like Jeff Reback (pandas) and Phillip Cloud (Ibis), and you can look forward to joint talks, such as an upcoming presentation with Jeff Reback at PyData.

Getting involved

We are only now at the beginning of a long journey ahead of us to advance the state of the art in data science tools.

We will soon be posting some full-time engineering positions, so if you are a software engineer and are interested in joining the lab's mission, please stay tuned. In the meantime, we'd love to have you involved with Apache Arrow.

If you are with an organization in a position to sponsor our work or partner with us in some other way, please reach out to info@ursalabs.org.

How engineers can stand out from the applicant pool


April 19th, 2018

Everyone wants to stand out when they’re applying for new opportunities, including me. In fact, it’s one of the most common questions engineers ask me these days.

To get more ideas, I reached out to people who have been on both sides of the table: engineers who have been employed in the past, but are now founders of their own companies and hiring engineers.

I asked each technical founder the same question:

What have you done in the past, or seen other engineers do, to stand out from the applicant pool?

The responses were so good, I had to share.

Daria Rose Evdokimova

Founder of VoiceOps, previously engineering at Coinbase, Gusto, and Google.

I didn’t wait for recruiters to reach out to me, but did my research and found companies I was most excited about and reached out directly to technical recruiters with a very specific explanation of why I wanted to work for them. E.g. I repeatedly heard that Gusto had the best culture and treated their employees extremely well, and I wanted to learn more about that (it turned out to be completely true).

Now being on the other side of the table, as a person who looks at all the applications that come in for VoiceOps, I can say that it’s extremely easy to figure out if a person has done their research and is applying specifically to us, or if they have no idea which company they’re submitting their application for.

Amy Hoy

Founder of Stacking the Bricks, 30x500, Freckle Time Tracking, and more.

The best thing any engineer can do to stand out is to help others. Lots of folks say, “Contribute to open source!” but that’s only one way to do it.

Back when I was more active in programming, everybody in my language communities knew who I was — I became a go-to person — and yet I never contributed to OSS. People trusted my programming ability because of the things I wrote about programming. My cheat sheets built my credibility. Only a person who understands can teach others… but, in a glorious flip side, teaching others builds understanding. Speaking about programming or relevant professional topics, recording screencasts, sharing code snippets, designing cheat sheets, writing blog posts, all of these things will build reputation and communication skills and demonstrate that you are a well-rounded individual and not just a keyboard jockey.

Another thing is to have some kind of cross-over skill, whether it’s ethics, philosophy, business, accessibility, design, sales, public speaking, security, the front-end guy who knows databases, the back-end girl who knows CSS, etc.

Lastly, even if you never do any of the above, learn how to listen, elicit requirements, and understand business goals. If you can go to an interviewer and explain how you and your skills will help the business as a whole and not just “Solve Interesting Problems,” you’ll be ahead of the pack. Most employees never look up.

Christine Spang

Founder of Nylas, previously engineering at Ksplice (bought by Oracle).

Significant open source contributions are one, but they require a lot of free time which not everybody has. I got lucky having gotten into this when I was a teenager when I had a lot of free time.

In onsite interviews, one of the things you can do to stand out is express genuine excitement and curiosity. These traits are infectious and show that you’re likely to put in effort to be successful.

Leah Culver

Founder of Breaker, previously developer advocate and engineer at Dropbox.

When an engineer is a fan of the product, they stand out. If they’ve been using the product, they can guess what some of the technical challenges will be and may already have suggestions for improvements. I also personally look for engineers with a collaborative attitude rather than a competitive one. Humility is important too. Everyone makes mistakes and all software has bugs, but being willing and able to quickly fix issues as they arise is the key to keeping everything moving at a startup.

Michelle Glauser

Founder of Techtonica, previously freelancing as a software engineer.

People really stand out when they make websites specifically targeted at companies they want to work at to showcase their skills, like http://www.nina4airbnb.com.

Jen Dewalt

Founder of Zube, previously Head of Growth at Wit.ai (acquired by Facebook).

For me, it’s all about “what have you built.” I’m totally uninterested in whether someone can whiteboard some ridiculous academic problem. I love to see engineers who have made real things that they can talk about passionately. I think that’s a much better indicator of success than whether they can write bubble sort on a whiteboard.


Postgres as the Substructure for IoT and the Next Wave of Computing


Note: This is based on our February PGConf India keynote (full video). We’re giving a similar talk in May at IoT World. (And right now we’re at PostgresConf in Jersey City — come say hi!)

Computing, like fashion and music trends, advances in waves. (Or, if you prefer a software development metaphor: Computing evolves via major and minor releases.)

From mainframes (1950s-1970s), to Personal Computers (1980s-1990s), to smartphones (2000s-now), each wave brought us smaller yet more powerful machines that were increasingly plentiful and pervasive throughout business and society.

We are now sitting on the cusp of another inflection point, or major release if you will, with computing so small and so common that it is becoming nearly as pervasive as the air we breathe. Some call this the “Internet of Things”, “Connected Devices”, or “Ubiquitous Computing.”

With each wave, software developers and businesses initially struggle to identify the appropriate software infrastructure on which to develop their applications. But soon common platforms emerge: Unix; Windows; the LAMP stack; iOS/Android.

Today developers of IoT-based applications are asking themselves an important question: what is the right foundation for my new service?

It is too early to declare a winning platform for IoT. But whatever it is, we believe that its substructure, its foundational data layer, will be Postgres. Here’s why.

A closer look at IoT data

Popular science fiction often depicts a future filled with machines, some benevolent (and some less so).

But it turns out that the machines already surround us. In fact, last year was the first year that connected devices (not including computers and smartphones) outnumbered the human population on this planet.

As a result, large amounts of machine (or IoT) data are now showing up in more and more places:

  • Industrial machines: How most of our things are made.
  • Transportation and logistics: How we move people and things across our world.
  • Building management and the Smart Home: How we live in and secure our homes and businesses.
  • Agriculture: How we feed the planet.
  • Energy & utilities: How we power our world.
  • (And so many more)

But what is this machine data? Let’s look at a simple example:

Here we have data generated from three sources: a building, a farm, and a factory. Data arrives periodically, ordered by time. When a new data point comes in, we add it to the existing dataset.

As you can see, as we collect machine data, we build a time-series dataset.

But let’s dig a little deeper with another example:

Here we see that the dataset is a set of measurements collected over time. Again, the dataset is time-series in nature.

But there’s also additional metadata describing the sources (whether sensors, devices, or other “things”) of those measurements. And if we look closely, it appears that the metadata, currently recorded on each reading, looks relational. In fact, one could easily normalize the dataset, with each row containing foreign keys to separate metadata tables. (In our example, we could create additional “devices,” “locations” or even “maintenance” tables.)
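To make this concrete, here is a minimal Python sketch of that normalization step. The readings, column names, and device identifiers are all made up for illustration; the point is only that the repeated per-reading metadata factors out into a separate, relational "devices" table, leaving a lean time-series table with a foreign key:

```python
# Raw IoT readings: each row repeats the metadata of its source device.
raw_readings = [
    {"time": "2018-04-20T10:00:00", "temp_c": 21.5,
     "device_id": "sensor-1", "device_type": "thermostat", "location": "building-a"},
    {"time": "2018-04-20T10:01:00", "temp_c": 21.7,
     "device_id": "sensor-1", "device_type": "thermostat", "location": "building-a"},
    {"time": "2018-04-20T10:00:00", "temp_c": 4.2,
     "device_id": "sensor-2", "device_type": "soil-probe", "location": "farm-7"},
]

# Normalize: factor the repeated metadata into a "devices" table,
# keeping only a foreign key (device_id) on each time-series row.
devices = {}
readings = []
for row in raw_readings:
    devices[row["device_id"]] = {
        "device_type": row["device_type"],
        "location": row["location"],
    }
    readings.append({"time": row["time"],
                     "device_id": row["device_id"],
                     "temp_c": row["temp_c"]})

print(len(devices))   # 2 distinct devices
print(len(readings))  # 3 time-series rows
```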

In other words, IoT data is a combination of time-series and relational data. The relational data just describes the things that generate the time-series.

Which suggests that for IoT we might want a relational database. Like Postgres.

Why use Postgres for IoT?

There are a lot of reasons why one would want to choose Postgres for IoT:

  • Relational model + JOINs: As we just saw, the relational model lends itself well to IoT.
  • Reliability: Decades of software development and production deployments have led to a rock-solid database.
  • Ease of use: A query language (SQL) that developers and business analysts already know how to use, and a database the DBAs already know how to operate.
  • Broad ecosystem: The largest ecosystem of compatible visualization tools, backend infra, operational utilities (e.g., backups, replication), and more.
  • Flexible datatypes (including JSON): A broad set of datatypes, including (but not limited to) numerics, strings, arrays, JSON/JSONB.
  • Geospatial support via PostGIS, which adds support for geographic objects and location-specific queries.
  • Momentum: Postgres has perhaps the most momentum of any open source database at the moment (which is why DB-Engines named Postgres as the top DBMS of 2017).

But why isn’t Postgres already used for IoT?

Some people already use Postgres for IoT. If you have low insert rates and are only storing a few million rows of time-series data, Postgres may meet your needs out of the box.

But there is a reason why Postgres isn’t already the default database for these kinds of workloads: Postgres does not naturally scale well.

Insert performance decreases dramatically as the dataset grows. (Insert throughput as a function of table size for PostgreSQL 9.6.2, running with 10 workers on an Azure standard DS4 v2 (8-core) machine with SSD-based (premium LRS) storage.)

IoT workloads are typically characterized by high insert rates that create large datasets. And as one can see in the graph above, as the dataset grows, insert performance on Postgres drops precipitously — a clear incompatibility.

This drop-off represents the performance trade-off between memory and disk. Beyond a certain size, data and indexes no longer fit in memory, which requires Postgres to start swapping to disk. (A longer explanation here.)

(And it turns out that Postgres 10 does not solve this problem.)

So how should one scale Postgres for IoT data while retaining all of its benefits?

Our former life as an IoT platform

To us, scaling Postgres for IoT workloads is more than an academic question. It’s a real problem that our company faced in a former life.

TimescaleDB in an earlier incarnation.

Our company first started as iobeam, an IoT data analysis platform. And we were moderately successful, collecting large amounts of machine time-series and relational data for our customers. And we needed to store that data somewhere.

Yet we found that the world of databases at that time effectively only offered two choices:

  1. Relational databases (e.g., PostgreSQL, MySQL) that were reliable, easy to use, and performant, but scaled poorly.
  2. Non-relational (aka “NoSQL”) databases (e.g., Cassandra, InfluxDB) that scaled better, but were less reliable, less performant, harder to use, and did not support relational data.

But given that our IoT workloads comprised both time-series and relational data, our only choice was to run two databases. This led to other problems: it fragmented our dataset into silos, forced complex joins at the application layer, and required us to maintain and operate two different systems.

What we really wanted was the best of both worlds: something that worked like Postgres, yet scaled for IoT workloads. And given that our Engineering team is led by a Princeton Professor of Computer Science, we decided to build it ourselves.

And then, after hearing from a multitude of other developers who were facing the same problem, we pivoted from an IoT platform to an open source time-series database company, launched the product (April 2017), and then raised $16M (Jan 2018) to grow the business.

Scaling Postgres for IoT workloads

Here’s how we scaled Postgres for IoT:

  1. We identified the main bottleneck: the time spent swapping parts of a dataset that could no longer entirely fit in memory to/from disk.
  2. We then recognized that time series workloads had very different characteristics versus traditional database (or OLTP) workloads:

As one can see, traditional database workloads tend to be update-heavy to random locations, with updates often requiring complex transactions.

For example, here’s the canonical bank account example: If Alice sends Bob $10, then the database needs to atomically debit and credit two otherwise unrelated accounts/records.

Time-series workloads, on the other hand, tend to be insert-heavy, largely in order, with simple transactions. The time-series version of our same bank account example would look like this: insert a row that represents a $10 transfer from Alice to Bob, timestamped to now.
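The contrast can be sketched in a few lines of Python, using the standard-library sqlite3 module as a stand-in for Postgres (the schema and values are hypothetical): the OLTP version mutates two unrelated records inside one atomic transaction, while the time-series version simply appends an immutable event row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE transfers (ts TEXT, sender TEXT, recipient TEXT, amount INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 100)])

# OLTP style: atomically update two otherwise unrelated records, in place.
with conn:  # the context manager wraps both updates in one transaction
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE name = 'bob'")

# Time-series style: just append a row describing the event, ordered by time.
conn.execute("INSERT INTO transfers VALUES (datetime('now'), 'alice', 'bob', 10)")

print(conn.execute("SELECT balance FROM accounts WHERE name='alice'").fetchone())  # (90,)
```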

Insight #1: Right-sized chunking
Out of this came insight number one: we could partition our data by time such that each individual partition, or chunk, is right-sized so that all data and indexes fit in memory. If sized appropriately, swapping would be minimized (or even eliminated for the most recent or “hot” chunks).
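As a back-of-envelope illustration of "right-sizing" (the function name, the 2x index-overhead factor, and the workload numbers below are all assumptions for the sketch, not TimescaleDB's actual sizing logic), one can derive a chunk duration from the ingest rate and the available memory:

```python
def right_sized_chunk_interval(rows_per_second, bytes_per_row, memory_budget_bytes):
    """Pick a chunk duration (in seconds) such that one chunk's data plus
    its indexes (assumed here to roughly double the footprint) fits in memory."""
    INDEX_OVERHEAD = 2.0  # assumption: indexes ~double the on-disk size
    bytes_per_second = rows_per_second * bytes_per_row * INDEX_OVERHEAD
    return memory_budget_bytes / bytes_per_second

# Example: 100k rows/sec at 100 bytes/row against a 64 GB memory budget.
interval = right_sized_chunk_interval(100_000, 100, 64 * 2**30)
print(round(interval))  # 3436 seconds, i.e. chunks of roughly an hour
```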

But when your data is heavily partitioned, that leads to other challenges: managing those partitions, querying across partition boundaries (often requiring complex JOINs), inserting to the right partition, creating new partitions as necessary, etc. This can be a major headache.

Insight #2: The Hypertable
Then came insight number two: the Hypertable, a single virtual table across all partitions that operates like a regular Postgres table and hides all complexity from the user.

Query data from the Hypertable, and it will efficiently identify the right partitions that contain your data. Write data to the Hypertable, and it will route tuples to the appropriate partition (creating new ones as necessary). Create indexes/constraints/triggers and manage your schema at the Hypertable level, and all changes are propagated to the appropriate chunks.
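A toy Python sketch of the idea (deliberately simplified; the real implementation lives inside Postgres in C and PL/pgSQL, and the class and method names here are just illustrative): inserts are routed to time chunks that are created on demand, and queries only touch the chunks that can overlap the requested time range.

```python
class Hypertable:
    """Toy single-node hypertable: routes rows to time chunks on insert
    and prunes non-overlapping chunks at query time."""

    def __init__(self, chunk_seconds):
        self.chunk_seconds = chunk_seconds
        self.chunks = {}  # chunk start time -> list of (ts, row)

    def insert(self, ts, row):
        key = ts - ts % self.chunk_seconds  # which chunk does this row belong to?
        self.chunks.setdefault(key, []).append((ts, row))  # create chunk on demand

    def query(self, start, end):
        out = []
        for key in sorted(self.chunks):
            # Skip chunks that cannot overlap [start, end).
            if key + self.chunk_seconds <= start or key >= end:
                continue
            out.extend(r for ts, r in self.chunks[key] if start <= ts < end)
        return out

ht = Hypertable(chunk_seconds=3600)
ht.insert(100, "a")
ht.insert(5000, "b")
ht.insert(7300, "c")
print(ht.query(0, 3600))  # ['a']
print(len(ht.chunks))     # 3 chunks, created automatically
```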

Now what should the interface to the Hypertable look like? At first we were tempted to create our own query language. But then we were struck with the sheer impracticality of that approach: we’d have to create whole new connectors to every visualization tool, backend component, etc., let alone having to educate entire populations of developers.

Insight #3: Embrace SQL
But then we realized that there was a much simpler path: just embrace SQL. Everyone knows SQL, there is plenty of literature on how to express queries in SQL, and there is a plethora of tools that speak SQL. And for any time-series analytics that are currently suboptimal in SQL, SQL (and PostgreSQL) can be easily improved, via UDFs and query-planning/query-execution level optimizations. (Some examples here.)

And then we took this philosophy one step further and fully embraced Postgres by packaging our work into a Postgres extension. So now anything that works with Postgres (e.g., visualization tools, admin tools, backup/restore utilities, data infra components, etc.) would work with our new time-series database.

At a high-level, here’s the representational model we built:

Of course, the devil is in the details: chunk management, efficient queries, fast tuple routing, and enforcing the Hypertable-as-a-vanilla-table guarantee all required a lot of work at the C and PL/pgSQL levels to get right. (More here.)

Our results: 20x higher inserts, 2000x faster deletes, 1.2x-14,000x faster queries

The benefits of this architecture can be seen in the results:

Timescale sustaining 100k+ row inserts/sec, or 1M+ metrics/sec at scale.
Our approach yields up to 10,000x faster queries and 2,000x faster deletes than vanilla Postgres.

In addition, thanks to this architecture we are able to scale a Hypertable up to tens of terabytes, while achieving hundreds of thousands of inserts per second, all on a single node. (More on our benchmarks vs Postgres and vs Postgres 10.)

This design also allowed us to retain all of the benefits of Postgres that we listed earlier:

  • Relational model + JOINs: Hypertables live alongside “vanilla” relational tables
  • Reliability, ease-of-use, ecosystem: Our design doesn’t muck with the underlying storage layer, and maintains the same SQL syntax, so it operates and feels just like Postgres.
  • Flexible datatypes (including JSON), geospatial support: Similarly, this approach maintains compatibility with all of the Postgres native datatypes and extensions like PostGIS.
  • Momentum: As an extension (i.e. not a fork), the design is compatible with Postgres mainline, and will continue to benefit from the continued underlying improvement in Postgres (and also allows us to contribute back to the community).

The Accidental IoT platform?

Now that we can scale Postgres for IoT, we can also choose from a variety of applications and tools to use on top: e.g., Kafka, RabbitMQ, MQTT, Apache Spark, Grafana, Tableau, Rails, Django… the list goes on and on.

In other words, even though Postgres is a multi-decade-old open source project, it has now accidentally become the ideal platform for IoT and the next wave of computing.

If learning about our journey has been helpful, you’re welcome to follow the same path and scale Postgres yourself. But if you’d rather save time, you’re also welcome to use TimescaleDB (open source, Apache 2). The choice is yours. But we are here to help.

Show HN: Lisp Shell


README.md

Lisp Shell

This is a cross-platform shell developed in Racket. To use, either compile to binary using Racket 6.11 or newer, or run:

Windows: "C:\Program Files\Racket\racket.exe" -f "lsh.rkt" -e "(require 'lsh)" -i

Unix: racket -f "lsh.rkt" -e "(require 'lsh)" -i

Available commands:

     help    ; displays this message
     cd      ; displays the current working directory or change it
     cd/     ; same as (cd "/") - goes back to filesystem root
     pwd     ; print the current directory's path
     dir     ; list the current directory's file list or the specified path
     ls      ; prints the current folder's file list
     mkdir   ; makes a folder
     run     ; run a program from the current directory, optionally takes parameters
     run#    ; run a program directly using its path
     racket  ; edit a file using DrRacket
     edit    ; edit a file using notepad
     edit-me ; edit lsh source file using DrRacket
     url     ; browse to a URL
     google  ; google a URL
     cp      ; copy a file or folder
     touch   ; create an empty file
     find    ; walk the current path
     show    ; pretty-prints a command result
     rm      ; delete a file
     rmdir   ; delete a folder
     echo    ; display something on the screen
     search  ; equivalent to Google's 'I'm feeling lucky'

Oh, and it evaluates Racket forms from the command line. Remember to wrap forms in (display ...) if you need to output results to the screen. This is still very alpha, but I use it all the time, so you might as well have it too. I use it on Windows, but it should work out of the box on Linux, BSD, and macOS; if not, only minor changes should be needed.

Cheers,

Dexter

Interview with Simon Peyton-Jones


An interview project in conjunction with POPL 2018 

Interview with Simon Peyton-Jones

Simon Peyton-Jones (Microsoft Research Cambridge) researches the implementations and applications of functional programming languages. He was heavily involved in the design of the Haskell programming language and the development of the Glasgow Haskell Compiler (GHC). We talk about seeing functional programming go from intellectual revolution to practical reality and the importance of investing in programming education.

JY: Tell us about yourself. How did you get here?

SPJ: I first came across computers when I was about fourteen. Our school had one computer, it was a so-called IBM School's Computer, it had 100 memory locations each of which could contain a 10-digit decimal number. A lot of programming was about trying to fit the program into that space. There was one computer for the whole school and there were only two people in the school who cared about this machine. So, my friend Thomas Clarke (he's now at Imperial College) and I spent a lot of time hacking on this. Then we started to build our own computers.

It was also about the time the Intel 4004 came out, the first microprocessor, so that was very exciting. We traveled on a bus to Swindon to a neighbouring technical college that had an Elliot 803, which was the size of several large washing machines. But we had it all to ourselves. It was in the days of punch-tape when, if you wanted to edit your program, you would put your current program on a paper-tape in teletype, run it through (punching a new tape), stop at the bit you wanted to change, type new stuff, carry on copying...

I already knew I would probably dabble in computing for the rest of my life some way or another. Then I went to Cambridge. At that time Cambridge University hadn't yet decided that computing science was a valid subject: you couldn't do a three-year Computer Science degree. So I did mathematics to begin with, and then I discovered mathematics was too difficult. Cambridge mathematicians are kind of a breed apart. Incredibly intelligent people. So, eventually I changed to Electrical Sciences after two years, which was like an electrical engineering degree, and then did a one year postgraduate Diploma in Computer Science. That's my sole formal education in Computer Science. That was in 1979 to 80.

I didn't consider staying for a PhD. I just went and got a job at a small electronics company. But two years later I discovered that actually having to deliver things that customers want is quite hard, and the company was always bankrupt, and I was always stressed. By accident I ended up getting a job as a lecturer at University College London in Computer Science. I had one paper to my name and I did not have a PhD, but I got a permanent tenured job as a faculty member at UCL in 1982 or thereabouts.

JY: Was it common to teach without a PhD?

SPJ: There was still the idea that people would work to get their PhD before they got a faculty position, but this was a time in which the Computer Science department was expanding very rapidly and there wasn't enough supply. I think I was incredibly lucky to be looking for a job at that exact time.

JY: What was the one paper?

SPJ: Oh, maybe I had two. One was about the project I did for my Diploma, a comparison of the relative efficiencies of SK combinators and lambda expressions. I built an SK interpreter and a lambda interpreter and ran various programs and saw which went faster. I think it appeared in 1982. This was my first real conference, and John Hughes was giving his first paper then as well, so it was an awesome experience. John McCarthy was giving a talk, and Jon L White, and all of these seriously famous people—and then I was giving this talk about a paper that I now regard as completely misplaced. (Misplaced because you can’t draw many conclusions about efficiency from comparing naive interpreters.)

And the other paper that I had was about a little operating system that my boss and I wrote while I worked in my first job. There were only five people at this company, so it was really small, but we wrote the paper and managed to get it published.

JY: How did you find your way back into functional programming research?

SPJ: Now I had a job as a lecturer and my boss, my Head of Department said to me, "Well Simon, I'll give you a light teaching load so you can get your research started." But of course I had not been a PhD student so I had no idea how to do research. I would sit there in my office with a blank sheet of paper and a sharp pencil and wait for great ideas to come. Of course, nothing did. Then an undergraduate would knock at the door and say, "Simon, do you have a moment?" I'd welcome them in as a distraction from this difficult business of doing research.

Eventually one of my colleagues, a really good guy, called John Washbrook, said to me, "Simon, you should just get on and do something no matter how humble and simple." In the end the first thing that I did was I wrote a parser generator for a functional language SASL, so it was a bit like Yacc. I called it “Yacc in SASL.” That got published in 1985 in Software Practice and Experience.

At the time, I was very inspired by David Turner's papers about SK combinators. I was in London, he was in Canterbury, so I asked him to be an informal mentor. Since I didn't have an adviser he was my sort of remote advisor. I would go to see him every few months and we would have a chat over coffee. That was incredibly helpful to me because I did feel a bit uncertain about what to do.

JY: How did you decide that functional programming was what you wanted to do research on?

SPJ: That was at Cambridge while I was getting my Diploma. There was a very eccentric professor there called Arthur Norman and he was big on computer algebra at the time. He gave a short series of lectures about functional programming, which I had never heard of, in which he showed some functional programs. He even built things like circular lists, which didn't even seem possible given that you don't have any side effects. The second thing was David Turner's papers about SK combinators and the amazing idea that you could take lambda expressions and translate them into this big mess of S's and K's and it would evaluate to the same thing.

And all of that occurred at the same time that John Backus was winning the Turing Award and giving his talk called “Can Programming Be Liberated From the von Neumann Style?" In his talk he introduced FP, his functional programming language, and cast it in a big picture. He said, "This is the way to write programs, and moreover not only will it revolutionize programming but we should even build new computers to execute these programs." This was a call to action. We already thought functional programming was cool, but here was this extremely famous guy saying "It's not only cool it's the Right Thing to do." There were a bunch of people at Cambridge, John Hughes, Thomas Clarke, Jon Fairbairn, myself, and a few others who all got excited about functional programming at the same time. It was one of those coincidental things. We all just caught fire.

JY: What were the big open problems in functional programming when you were getting into it?

SPJ: Functional programming is a radical and elegant attack on the whole enterprise of writing programs. It's very different from the "do this and then do that” programming mentality. You have to rewire your brain in quite a different way. For a long time it was well understood theoretically—there was lots of stuff about semantics and it had these very deep foundations in logic. But in terms of a practical programming medium it seemed like a completely virgin field. Then with David Turner’s work, and with the whole ML effort at Edinburgh, people suddenly started to say, "Actually, these languages could be not just elegant, and beautiful, and mathematically cool—but also useful. You might actually be able to write interesting programs using them." That was the movement that I got involved in.

JY: I wanted to talk about how Haskell came about.

SPJ: In the late 80's there were a number of separate researchers who were doing stuff with lazy functional programming. I was one, John Hughes was another, Paul Hudak was another, Thomas Johnsson and Lennart Augustsson at Gothenburg, Arvind and his dataflow colleagues at MIT, Joe Fasel Los Alamos was another, Rinus Plasmeijer at Nijmegen, and so on. There were maybe a dozen all together.

We would meet each other at conferences and we came to realize that we were all building little programming languages and they all basically looked the same. We thought, "Oh, we should do something very modest, very humble. We should just agree a common syntax so that we can run each other's programs." We had SASL and Miranda, David Turner's languages for guidance, so we thought we'll just cohere around some syntactic least common denominator. We wanted a basis for teaching and research just to avoid unnecessary diversity. We weren't thinking of Haskell as a way to solve research problems at all, more as a substrate for research.

We met and decided, "We should form a committee and design a language." So we did, and we then physically met in person. This wasn't before email, but it was certainly before the web and collaborative working and so-forth. We physically met on several occasions to design the language. The surprising thing is that it turned into a research project.

JY: How did that come about?

SPJ: Several things happened that were quite serendipitous and unexpected. We knew it was going to be lazy, we knew it was going to have parametric polymorphism like ML does, and we knew it would have algebraic data types and pattern matching. That was all part of the consensus of what we were starting from. Type classes, on the other hand, were entirely new.

We had spent some time debating what we were going to do about functions like read, show, serialization, and equality. They're not parametrically polymorphic, but they are a bit polymorphic, because they should work on a lot of types. And then, out of the blue, Phil Wadler and his student Steve Blott produced, fully formed, the idea of type classes. I still have the email which he sent: it was almost like a little paper to the then committee. We were bowled over: "Oh, this is how we could deal with all of those awkward problems." At that stage, we had a choice to make. We could keep thinking of Haskell embodying a current consensus, as we had been. But we didn’t do that. Instead we said, "Type classes may be new, but they solve a really nasty, awkward problem that's a wart on the face of our beautiful language. Let’s embrace them." So we incorporated type classes wholeheartedly, and they turned out to be one of Haskell's big contributions to the world.

For reasons like this—monadic I/O is another example—Haskell ended up being significantly more innovative and ambitious than we had originally intended. But that was largely accidental.

JY: How did Haskell become a platform for answering research questions?

SPJ: I got involved in building a compiler for Haskell, the Glasgow Haskell Compiler, GHC. A lot of the research that I subsequently did revolved around the question "What does a good compiler for this look like?"

So, did Haskell itself answer any questions? I'm not sure I'd put it like that. I'd more say that the spur of having an actual concrete programming language with an actual concrete implementation that aspires to be something that people could use for practical purposes, that spur drove innovation both in the language design, because we would say, "Oh, but look you can't do this and we ought to be able to." And also in the language implementation because we'd think, "This is way too slow." It wasn't the language that answered any questions; it was more that its existence and its implementation drove a lot of research innovation.

JY: What was the biggest surprise about putting Haskell out there?

SPJ: I had always assumed that the more bleeding edge changes to the type system, things like type-level functions, generalized algebraic data types (GADTs), higher rank polymorphism, and existential data types, would be picked up and used enthusiastically by PhD students in search of a topic, but not really used much in industry. But in fact it turns out that people in companies are using some of these still-not-terribly-stable extensions. I think it's because people in companies are writing software that they want to still be able to maintain and modify in five years' time. As you scale up, and as your ambition about timescale increases, so maybe you'll invest more in the static guarantees you get from a type system, and then you push the type system harder. You see people out in industry writing blog posts about catamorphisms and categorical connections, and plenty of stuff that I don't understand. Somehow, the level of abstraction offered by a sophisticated type system lets you get much more ambitious in terms of the intellectual complexity of what you can deal with.

JY: As you were doing all of this work, what was the relationship of the functional programming work to what other people were interested in at POPL at the time?

SPJ: Initially I always thought of POPL as being a conference that was for people cleverer than me, so it was quite a while before I even submitted a paper to POPL. But when I did I found a community that was completely aligned with the kind of things that I cared about. It's right there in the title isn't it? Principles of Programming Languages, so it cares about being principled and it cares about elegance and economy of effort. Try to get the job done with as little machinery as possible. Indeed, I feel that most of my research life is about saying, "It has to be simpler."

I always felt I was more of a theory user, not a theory developer, whereas I'm a compiler developer, not just a compiler user. So I always felt slightly out of my class at POPL. I still do.

JY: What is an important problem our community can work on solving in the next five or ten years?

SPJ: Education. If we're to get the principles, and elegance, and modularity, and economy of effort, and abstractions that POPL contributors value so highly, if we are to get them actually part of the fabric of the software that holds our digital lives together, the way to do that is by instilling those values into our undergraduates, so that they will become the developers of the future, and CTOs of startups. So, there's a big inertia to overcome, but over time it'll happen. As I often say, when the limestone of imperative programming has worn away, the granite of functional programming will be revealed underneath.

JY: Could you talk about your K12 education efforts too?

SPJ: I've been working on this for about 10 years. There didn’t seem to be any connection between the subject that my kids were learning at school and the subject that I think is so fascinating that I've devoted my professional life to it. So we started a guerrilla movement, Computing at School, to try and reform our computing curriculum. Rather to our astonishment, we were very successful. The English National Curriculum now explicitly says that all children should learn computer science from primary school onwards as a foundational discipline in the way that they do maths or physics. And for the same reasons: that is, not because they're going to become mathematicians and physicists, but because an elementary understanding of these foundational concepts enables you to be an empowered citizen in a complicated world.

Simon and teachers at Westminster City School.

Now the challenge is actually turning that aspiration into a vibrant reality in every classroom. Even with well-trained teachers and plenty of time it would be difficult enough, but with teachers who are willing but nevertheless underqualified for this particular task it's a real challenge. We're messing it up in numerous ways, partly because we don't really know what we're doing. We've got a reasonable amount of experience of teaching CS in university (but not nearly as much as we have in, say, maths; remember, you could barely do it at Cambridge when I was an undergraduate), but much less experience at school level. We need to study pedagogy and then support and equip those teachers. And that's a big, big challenge, one to which I am devoting a fair amount of effort. I hope that the POPL community would, as individuals and maybe collectively, lend their cycles to that aspiration because I think the POPL community has a handle on the abstractions that we'd like our children to learn.

Airplanes that fly on electricity debut at Fresno’s Chandler Airport


An airplane made local aviation history when it debuted at Fresno’s Chandler Executive Airport Tuesday.

The first production all-electric aircraft to fly in Fresno County took off under sunny skies as part of the Sustainable Aviation Project, an effort of the cities of Reedley and Mendota and CALSTART San Joaquin Valley.

The airplane is a zero-emission craft, and will provide low-cost flight training opportunities for area youth, organizers said.

The airplane, small enough to fit in a home garage, also demonstrates how a network of airports equipped with charging stations could make electric flight feasible. Such charging stations are being set up at Chandler and Mendota and Reedley municipal airports.

In all, four planes were ordered by the cities of Reedley and Mendota.

Deutsche Bank Inadvertently Made a $35B Payment in a Single Transaction


A routine payment went awry at Deutsche Bank AG last month when Germany’s biggest lender inadvertently sent 28 billion euros ($35 billion) to an exchange as part of its daily dealings in derivatives, according to a person familiar with the matter.

The errant transfer occurred about a week before Easter as Deutsche Bank was conducting a daily collateral adjustment, the person said. The sum, which far exceeded the amount it was due to post, landed in an account at Deutsche Boerse AG’s Eurex clearinghouse.

The error, which took place in the final weeks of former Chief Executive Officer John Cryan's tenure, was quickly spotted and no financial harm was suffered. But the episode raises fresh questions about the bank's risk and control processes, which Cryan had boasted of improving before his ouster.

“This was an operational error in the movement of collateral between Deutsche Bank’s principal accounts and Deutsche Bank’s Eurex account,” Charlie Olivier, a spokesman for Deutsche Bank, wrote in an emailed statement. “The error was identified within a matter of minutes, and then rectified. We have rigorously reviewed the reasons why this error occurred and taken steps to prevent its recurrence.”

Deutsche Bank’s Hammonds Leaves as Exits Continue After CEO

It’s another misstep for Deutsche Bank at a time when it is undergoing a change of leadership in the wake of its third straight annual loss and, like other lenders, faces increased scrutiny from regulators. Cryan, who was CEO for three years, said in a speech earlier this year that the bank was approaching the end of “phase 1” of his restructuring, which bolstered internal controls and shrank the number of operating systems at the bank to 32 from 45.

“A bank mistakenly making such a large transfer shows its controls aren’t working adequately, and it’s embarrassing,” said Dieter Hein, an analyst at Fairesearch who has the equivalent of a sell recommendation on the bank’s stock. “This kind of incident shows that the bank’s problems are so big that you can’t fix them immediately. Cryan failed.”

Hein also said Chief Operating Officer Kim Hammonds, who is herself being ousted, bears some of the blame given her involvement in Cryan’s information-technology revamp. Hammonds reportedly called Deutsche Bank “the most dysfunctional company” she’d ever worked for, and hasn’t denied making the remarks.

Bear Trap

The error should have been caught by an internal fail-safe system known as a “bear trap,” the person said. The mechanism was set up after an internal audit at the bank, triggered by an earlier collateral payments error in March 2014, the person said.

While such errors do occur, the amount involved -- more than the bank’s market capitalization of around 24 billion euros -- is highly unusual, according to the person.

Eurex held back 4 billion euros of Deutsche Bank’s funds over the weekend of March 23, the person said. A spokesman for Deutsche Boerse said the company doesn’t comment on single transactions or client relationships.

Deutsche Bank’s new CEO, Christian Sewing, is seeking to turn around the worst-performing member of the Stoxx 600 banks index this year, with the company’s shares having fallen 26 percent to date. Analysts have said Sewing’s appointment raises questions about the lender’s future direction, especially the under-performing investment bank business.

In a separate development, Germany’s biggest bank has been asked by the European Central Bank to simulate an orderly wind-down of its trading book, Chief Financial Officer James von Moltke told Bloomberg Monday. Deutsche Bank is the first to receive such a request from the ECB, according to a person familiar with the matter, who said the ECB is using Europe’s largest investment bank as a "guinea pig" before it sends similar requests to other banks.

— With assistance by Will Hadfield, and Nicholas Comfort
