
The Dex File Format


Have you ever wondered what happens to your Android application code when it’s compiled and packaged into an APK? This post takes a deep dive into the Dalvik Executable Format, with a practical example of the structure of a minimal Dex file.

What is a Dex file?

A Dex file contains code which is ultimately executed by the Android Runtime. Every APK has a single classes.dex file, which references any classes or methods used within an app. Essentially, any Activity, Object, or Fragment used within your codebase, will be transformed into bytes within a Dex file that can be run as an Android app.

It can be useful to understand the structure of a Dex file because all these references can take up a lot of space in your application. Using many 3rd party libraries can increase your APK size by megabytes, or worse, lead to the infamous 64k method size limit. And of course, there may come a day where knowledge of Dex files helps you track down unexpected behaviour in your app.

Dexing Process

All the Java source files in an Android project are first compiled to .class files, which consist of bytecode instructions. In a traditional Java application, these instructions would be executed on the JVM. However, Android apps are executed on the Android Runtime, which uses incompatible opcodes, and therefore an additional Dexing step is required, where .class files are converted into a single .dex file.

Most mobile devices are severely constrained in memory, processing power, and battery life, so the Android Runtime is designed to squeeze better performance out of them than a standard JVM would. One key feature that helps achieve this is that ART performs both Ahead-of-Time and Just-in-Time compilation. This avoids some of the runtime overhead of JIT, while still allowing performance improvements over time as an app is profiled.

How to create a Dex file

A practical example of a Dex file makes this a lot easier to understand. Let’s create a minimal APK that only contains one Application class, as this allows us to understand the file format without being overwhelmed by the thousands of methods that are present in a typical app.

We’ll use Hex Fiend to view our Dex file in hexadecimal, as Dex uses some unusual data types to save space. Null bytes are hidden in the screenshots, so empty spaces there actually represent 00.

A Dex file’s Structure

The full structure of our 480-byte Dex file is shown in hexadecimal and UTF-8 below. Some sections are instantly recognisable when interpreted as UTF-8, such as the single BugsnagApp class that we have defined within our source code, and others not so much:

6465780A 30333800 7A44CBBB FB4AE841 0286C06A 8DF19000
3C5DE024 D07326A2 E0010000 70000000 78563412 00000000
00000000 64010000 05000000 70000000 03000000 84000000
01000000 90000000 00000000 00000000 02000000 9C000000
01000000 AC000000 14010000 CC000000 E4000000 EC000000
07010000 2C010000 2F010000 01000000 02000000 03000000
03000000 02000000 00000000 00000000 00000000 01000000
00000000 01000000 01000000 00000000 00000000 FFFFFFFF
00000000 57010000 00000000 01000100 01000000 00000000
04000000 70100000 00000E00 063C696E 69743E00 194C616E
64726F69 642F6170 702F4170 706C6963 6174696F 6E3B0023
4C636F6D 2F627567 736E6167 2F646578 6578616D 706C652F
42756773 6E616741 70703B00 01560026 7E7E4438 7B226D69
6E2D6170 69223A32 362C2276 65727369 6F6E223A 2276302E
312E3134 227D0000 00010001 818004CC 01000000 0A000000
00000000 01000000 00000000 01000000 05000000 70000000
02000000 03000000 84000000 03000000 01000000 90000000
05000000 02000000 9C000000 06000000 01000000 AC000000
01200000 01000000 CC000000 02200000 05000000 E4000000
00200000 01000000 57010000 00100000 01000000 64010000

dex
038zDÀª˚JËAÜ¿jçÒê<]‡$–s&¢‡pxV4dpÑêú¨Ã‰Ï,/ˇˇˇˇWp<init>Landroid/app/Application;
#Lcom/bugsnag/dexexample/BugsnagApp;
V&~~D8{"min-api":26,"version":"v0.1.14"}ÅÄÃ
pÑêú¨ Ã ‰ Wd

At a very high level, a Dex file can be thought of as two separate parts: a file header which contains metadata, and a body which contains the majority of the data. A diagram of the file header structure is shown below.

Let’s step through each item in the header sequentially.

Dex File Magic

Many file formats begin with a fixed sequence of bytes that uniquely identifies the file format, and Dex is no exception.

6465780A 30333800
dex
038

We can see that the first 8 bytes must contain ‘dex’, and the version number - currently 38 when our targetSdkVersion is API 26.

You may have also noticed that the 4th byte encodes a newline character, and the 8th byte is null. These are validated by the Android Framework to check for file corruption - installation of the APK should fail if this exact sequence isn’t present.

Checksum

The next value is a checksum, which is calculated by applying the Adler-32 algorithm to the contents of the entire file, excluding the magic and the checksum field itself. If a byte within the file was corrupted during download or storage on disk, the calculated checksum won’t match and the Android Framework will refuse to install the APK.

SHA1 signature

FB4AE841 0286C06A 8DF19000 3C5DE024 D07326A2

The header also includes a SHA-1 hash of the file (excluding any preceding bytes). This is used to uniquely identify Dex files, which may be useful in scenarios such as Multidex.
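
If you want to verify these header fields yourself, here's a minimal Python sketch (assuming a classes.dex extracted from an APK); the checksum is an Adler-32 over everything after the magic and checksum fields, and the signature is a SHA-1 over everything after itself:

    import hashlib
    import struct
    import zlib

    with open("classes.dex", "rb") as f:
        data = f.read()

    # Adler-32 checksum covers everything after the magic and checksum fields (offset 12 onwards)
    stored_checksum = struct.unpack_from("<I", data, 8)[0]
    assert zlib.adler32(data[12:]) & 0xFFFFFFFF == stored_checksum

    # SHA-1 signature covers everything after itself (offset 32 onwards)
    assert hashlib.sha1(data[32:]).digest() == data[12:32]

    # the file_size field at offset 32 should match the size on disk
    assert struct.unpack_from("<I", data, 32)[0] == len(data)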

File size

This matches the total size of the file in bytes (480 in our example), and can also be used for validation when reading the Dex file.

The header size is always 112 (0x70) bytes.

Therefore we can now highlight all the remaining fields within the header_item.

Endian constant

Dex files support both big endian and little endian encoding. This value equals ENDIAN_CONSTANT, indicating that this particular Dex file is encoded in little endian, which is the default behaviour.

IDs and Offsets

The remaining values in the file header define the size and location of other data structures which hold identifiers for methods, strings, and other items.

00000000 00000000 64010000 05000000
70000000 03000000 84000000 01000000
90000000 00000000 00000000 02000000
9C000000 01000000 AC000000 14010000
CC000000

These values are summarised in the table below, where size equals the array length, and the offset is the number of bytes from the start of file where this information can be found.

Type       | Size | Offset
link_size  | 0    | 0
map_off    | N/A  | 356
string_ids | 5    | 112
type_ids   | 3    | 132
proto_ids  | 1    | 144
field_ids  | 0    | 0
method_ids | 2    | 156
class_defs | 1    | 172
data       | 276  | 204

It’s worth noting that link_size and field_ids are both 0, because our app doesn’t statically link any libraries or contain any fields. The map_off structure in the data section largely duplicates this information, in an easier format for Dex file parsing.

As an example, we can see that there are 5 string IDs in our Dex file, encoded between bytes 112 and 132. Each ID at this position also points to an offset within the data section that encodes the actual value of the string.
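
As a rough illustration of how a parser follows these offsets, here's a small Python sketch (reusing the data bytes read in the earlier snippet) that walks the string_ids table; each entry points at a string_data_item, which starts with a ULEB128-encoded length followed by the MUTF-8 bytes and a trailing NUL:

    import struct

    def read_uleb128(buf, pos):
        # Decode one unsigned LEB128 value, returning (value, next position)
        result, shift = 0, 0
        while True:
            byte = buf[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, pos
            shift += 7

    string_ids_size = struct.unpack_from("<I", data, 56)[0]   # 5
    string_ids_off = struct.unpack_from("<I", data, 60)[0]    # 112

    for i in range(string_ids_size):
        data_off = struct.unpack_from("<I", data, string_ids_off + i * 4)[0]
        length, pos = read_uleb128(data, data_off)            # length in UTF-16 code units
        raw = data[pos:data.index(b"\x00", pos)]              # MUTF-8 bytes (plain ASCII here)
        print(data_off, raw.decode("utf-8"))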

Map List

The map_list is a section within the data body which contains similar information to the file header.

With this knowledge, we can use the offsets to resolve the actual information, and determine what our Dex file encodes.

Strings

Enough talk - let’s see something concrete. Let’s find out what the string_ids structure points at.

E4000000 EC000000 07010000 2C010000 2F010000
228,     236,     263,     300,     303

The array encodes 5 integer offsets, which point at the data section.

<init>
Landroid/app/Application;
Lcom/bugsnag/dexexample/BugsnagApp;
V
~~D8{"min-api":26,"version":"v0.1.14"}

If we retrieve these values as UTF-8, we are greeted by a few Java symbols which will look familiar to anyone who has used the JNI before, and also some JSON which indicates D8 created the Dex file. All this business of IDs, offsets, and multiple headers, may seem a bit useless at this point. Why not just encode the string value directly in the header?

Some of the reasoning behind this is that these strings are referenced from multiple points within the Dex file. Providing an ID for each one prevents duplication of information and reduces the overall file size, simplifies parsing as an ID will always be a fixed length, and means values are only accessed when required.

Types

01000000 02000000 03000000
1, 2, 3

Our Dex file defines 3 Java types. Each value here is an index into the previous string_id array - therefore we can determine that the types in our file are as follows:

Landroid/app/Application;
Lcom/bugsnag/dexexample/BugsnagApp;
V

The TypeDescriptor syntax may look somewhat unfamiliar, but the L simply refers to a full class name, and V is the type void. Our types include our custom BugsnagApp class, and the Application class from the Android framework.

Prototypes

03000000 02000000 00000000
3,       2,       0
"V",     V

A method prototype consists of information on the return type of a method and the parameters it takes. The proto_id section uses indices to retrieve the type information, plus an offset to the parameter type list, which is zero in this case as our method doesn’t take any parameters.

Methods

The Method section also uses indices. Each method looks up the class ID where it was defined, the method prototype, and the name of the method from the strings table.

00000000 00000000 01000000 00000000
0,  0,   0,       1,  0,   0

Landroid/app/Application; "V" <init>
Lcom/bugsnag/dexexample/BugsnagApp; "V" <init>

The only methods in our Dex file relate to the constructor for BugsnagApp - which is exactly what we’d expect.

Class Defs

This section contains the type, inheritance hierarchy, access metadata, and other class metadata such as annotations and source file indices.

01000000 01000000 00000000 00000000 FFFFFFFF 00000000 57010000 00000000
1,       1,       0,       0,       NO_INDEX,0,       343,     0

This evaluates as a public class Lcom/bugsnag/dexexample/BugsnagApp which inherits from Landroid/app/Application, whose class data is stored from byte 343. The public access modifier is determined from a bit field. Let’s view the class data.

Class Data

The first 4 bytes of our BugsnagApp class data define the number of static and instance fields, along with any direct or virtual methods.

00 00 01 00
0, 0, 1, 0,

01 81 80 04 CC 01 00 00 00
1,       0x10001,    204

There is only one direct method defined in this class. It has a method index of 1, which corresponds to Lcom/bugsnag/dexexample/BugsnagApp; "V" <init>, access flags of 0x10001 (a public constructor), and a code data offset of 204. If our method was abstract or native, there wouldn’t be a code data offset.

If our class defined fields and other information, more data would be encoded in this section. Incidentally, if the method ID was a value larger than 65,536, we would have encountered the infamous 64k method limit.
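
To make the variable-length ULEB128 encoding concrete, here's a quick self-contained Python check showing how the bytes above decode to the access flags and code offset for our constructor:

    def uleb128(data):
        # Decode a single unsigned LEB128 value from a byte sequence
        value, shift = 0, 0
        for byte in data:
            value |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        return value

    assert uleb128([0x81, 0x80, 0x04]) == 0x10001   # ACC_PUBLIC | ACC_CONSTRUCTOR
    assert uleb128([0xCC, 0x01]) == 204             # code_item offset for <init>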

Code structure

We’re now analysing the constructor method defined in our class, which has the following code_item structure at the offset of 204:

0100 0100 0100 0000 00000000 04000000 70100000 00000E00
1,   1,   1,   0,   0,       4,       (instructions)

This corresponds to a register size of 1, 1 incoming argument (the implicit this reference), 1 outgoing argument word for the superclass call, no try/catch items, and a debug info offset of 0.

The most important part however, is the last few bytes. We have an instruction list size of 4 16-bit code units, which encode two instructions: 7010 0000 0000 and 0E00.

The Dalvik Bytecode table shows that 70 corresponds to the invoke-direct operation, which here invokes method 0 - Landroid/app/Application; "V" <init> - on register v0, and 0E corresponds to return-void. In other words, our generated constructor simply calls through to its superclass constructor and returns - but diving deep into Dalvik is a story for another day!

A new Android Compiler - D8

We haven’t touched too much on the compilation process, but our minimal Dex file was created using D8, a new compiler that will be rolled out by default in Android Studio 3.1. It promises smaller Dex files and faster build speeds, so let’s test those claims.

Benchmarking D8 performance

Let’s create a greenfield app with Android Studio 3.0.1. We’ll add Kotlin Support and a Navigation Drawer, but will otherwise leave all the options as default, generate a signed APK, and view it with the APK Analyzer.

We can retrieve classes.dex from within the APK by unzipping the APK with unzip app-release.apk -d app, then measuring the file size in bytes: stat -f%z app/classes.dex.

Better faster smaller stronger

Metric                      | DX    | D8
Uncompressed file size (MB) | 4.23  | 3.73
Class count                 | 2790  | 2790
Method count                | 22038 | 22038
Total method references     | 28653 | 28651

Our Dex file is approximately 88% of its previous size when compiling with D8. Your mileage may vary, as this is a very simple example project. One other interesting thing to note is that using D8, we appear to have lost the following two method references:

android.view.View#isInEditMode
java.lang.Class#desiredAssertionStatus

These don’t appear to be used at runtime, so could be an optimisation. Please get in touch if you know why these are missing!

Why minification leads to a better app

Enabling minification and obfuscation is the single greatest thing you can do for your app, and now that you’re an expert in the Dex format, you can probably think of some reasons why.

Firstly, stripping out unused Java classes with Proguard will reduce the size of an APK, as the generated Dex file won’t contain unused class definitions and all their associated data which take up space.

Obfuscation will also reduce the Dex file size: unless you’re the type of developer who names their classes a.a.A and z.z.Z, fewer characters will be required for each symbol, which will save space overall. Solutions exist for mapping obfuscated stacktraces, which allow you to easily diagnose crashes within your app.

Finally, a smaller Dex file leads to a smaller APK, which means users spend less on mobile data, and are less likely to give up on a download. If you offer an Instant App, then the hard limit of 4Mb means keeping APK size low is a big consideration.

Would you like to know more?

Hopefully this has helped you understand Dex files, which are about to get a lot smaller with the advent of D8. If you have any questions or feedback, please feel free to get in touch.


Bugsnag automatically monitors your applications for harmful errors and alerts you to them, giving you visibility into the stability of your software. You can think of us as mission control for software quality.

Try Bugsnag’s Kotlin exception reporting.


XPS 13 developer edition 7th generation available


Today I am excited to announce the availability of the 7th generation of the Ubuntu-based XPS 13 developer edition.  Project Sputnik’s latest and greatest system is now simultaneously available in Europe, Canada and the United States.

The new XPS 13 developer edition (9370) features the 8th Generation Intel Quad Core, a brand new chassis, an improved display and smaller borders.

The 9370 is even thinner, lighter and smaller than its already svelte predecessor, the 9360. (Note, the 9370 does not replace the 9360, as the two will coexist.)  If you want more detailed specs, please scroll down dear reader. Before we get to the product details however, here’s a quick Project Sputnik backgrounder.

Project Sputnik, a recap

It all started back in 2012 with the wacky idea of creating a high-end Linux laptop targeted at developers.   An internal innovation fund gave the scrappy project team a little bit of money and six months to see if this idea would fly.

From day one, project Sputnik publicly solicited input from the developer community.  It was this input and the tremendous community support that pushed the effort from project to product.

The initial XPS 13 developer edition, available only as one configuration, debuted on November 29, 2012.  Fast forward five years and not only are we announcing the 7th generation of that initial product but the project itself has now expanded to become a full line of developer-targeted systems.

US and Canadian configurations

In North America, as mentioned above, the new XPS 13 developer edition is available in both the US and Canada.  The following are the available configurations in the two countries (Note: the links below point to the US configurations).

CPU | Memory | Storage | Display

European configurations

As of today, the new XPS 13 developer edition is available online in Europe in the following countries:

UK, Ireland, Germany, Austria, France, Italy, Spain, Switzerland (French and German), Belgium, Netherlands, Sweden, Norway, Denmark.

Beyond the thirteen countries above, there is a much longer list of countries where the 9370 is available offline.  (I’ll be posting this list in the next few days.)

The following configurations are available both online and offline.

CPU | Memory | Storage | Display

  • i7 | 16GB |      1TB | UHD touch (3840 x 2160)
  • i7 | 16GB | 512GB | UHD touch (3840 x 2160)
  • i7 | 16GB | 512GB | FHD non-touch (1920 x 1080)
  • i7 |  8GB | 256GB | FHD non-touch (1920 x 1080)

9370 Specifications

Here’s a bit more detail behind the system specs.  These options are available as noted above.

  • 8th Generation Intel® Quad Core™, i5 (US and Canada only) and i7 versions
  • Memory options: 4GB, 8GB or 16GB Dual Channel SDRAM
  • Storage options: 128GB, 256GB, 512GB or 1TB
  • Ports
    • 2x Thunderbolt™ 3
    • Noble lock slot
    • Headset jack
    • 1x USB-C 3.1 with DC-In & DisplayPort
    • MicroSD card reader
  • Display options
    • UltraSharp 4K Ultra HD (3840×2160) InfinityEdge touch display
    • FHD (1920 x 1080) InfinityEdge display
  • Ubuntu 16.04 LTS preloaded
  • 1 year ProSupport

Who’d a thunk it

I gotta admit that five years ago the team and I would never have thought that we would be posting a blog announcing the 7th generation of the XPS 13 developer edition.  The thought of even a third generation would have boggled our minds.

Five years ago, thanks to the support of the community, Project Sputnik reached escape velocity.  This support, input and direction has not only continued but has grown over the last five years, enabling the Sputnik team to stop focusing on survival and instead focus on supporting a broader range of developer needs.

As we go forward, please keep driving us by sharing your input and experiences, be they good, bad or ugly. 🙂

Thanks!

Barton

Extra-credit reading

  • Line up — New XPS 13 developer edition —  (9370)
  • Line up — XPS 13 developer edition — (9360)
  • Project Sputnik Turns Five! – November 29, 2017
  • 2012 — year one

Pau for now…

How and why we teach non-engineers to use GitHub at Thread


At Thread one of our core beliefs is that technology allows for great change. This is important to our product, but it’s also important to how we work internally.

Because of this way of working, we try to represent everything in data—products, measurements, styles, suppliers, locations in our warehouse, support ticket resolutions, and many more things that you’d never even think about.

All of these data models come with a cost of needing a way for those in the company who use them to maintain the data. This means building editing interfaces, with validation, database design, and front-end work. Often we just don’t have time to do this—new features are higher priority, and besides, an engineer can just update a few data files when needed, right?

While this is a much quicker solution in the short term, an engineer will have to context switch out of their work, watch the release go out and make sure nothing goes wrong—that all hurts productivity. Perhaps more importantly though, the person who needs the data updated now no longer has ownership of the whole process and is reliant on someone else’s schedule.

Ultimately this process can be useful to get a feature out of the door quickly, but causes far too much friction to work long term.

A better solution

I remember when GitHub first launched their web editor — I wasn’t impressed. Why would anyone edit code in a web browser? Why would I use an editor that could only change one file per commit? Well years later I’ve realised that I am not the target market for the editor.

At Thread we now regularly teach those outside of the engineering team how to contribute to our codebase via the GitHub web interface, so that they are in control of updating data they need to work effectively.

We have now had more contributors to our main codebase who are in non-technical roles than all the engineers and contractors who have contributed over the years.

Has it worked?

As an engineer on the product team, I’m able to focus my efforts on building features that will benefit our customers and move metrics, rather than on building more CRUD interfaces. I’m also able to ship A/B tests faster as we can often skip the internal tooling for the test version in favour of editing data through data files to begin with. When we get to the delivery phase of a project we can then put the time into the editing interfaces as we’ll not only have an idea of the value of the feature, but also have a better idea of how our internal users would like the interfaces to work.

It’s also not limited to data files; many pages on thread.com are essentially static HTML, pages like our delivery FAQ, returns policy, or terms and conditions. By learning how to use GitHub, our operations team can keep these up-to-date without asking for help. Our talent team are also able to edit our jobs site, reacting on a daily basis to common questions that come up when talking to candidates.

All of this means that our team members outside of the engineering team are able to have much more ownership over their work, and have less friction to make the changes their experience tells them is necessary.

How do we do it?

The first thing we do is run GitHub tutorials every now and again when we have a few new starters to teach. We cover the basics of what a repository is, comparing it to document revision histories on Google Docs, what it means to commit a file, and what a branch is. We only talk about these in high level ways as we don’t cover the command line interface at all in our current tutorial format.

Next up we go through how to edit a file on the GitHub web interface, how to write a commit message, what a pull request is, and what the build status reporting from Jenkins means.

Lastly we ask non-technical contributors to pick an engineer who is available on Slack to hit the merge button once the build is green.

Issues we’ve encountered

On balance we feel this is a huge win for the team as a whole, and we’re planning to continue the training and encourage more contributors as we grow, but we have changed our process slightly as this has evolved.

Firstly, we’ve used GitHub roles and locked branches to prevent accidental commits to master. For someone who isn’t as familiar with version control and branches in particular, the GitHub web interface isn’t particularly clear about when a commit is going on to the master branch or a new branch. At Thread our master branch is continuously deployed with no manual intervention required, which resulted in several commits going out that broke the site and caused downtime.

As for all downtime issues, we ran a blameless 5 Whys and realised that while in hindsight we could have caught these issues with unit tests run before deployment, we likely wouldn’t catch everything and so introducing protected branches to encourage code review was a lightweight way to solve the problem.

Secondly, somewhat in response to this issue, we have started to write some unit tests that just sanity-check the structure of the data in data files, or to check that all of our Django template files successfully parse as valid templates. Particularly in the case of the data files, these wouldn’t normally be something we’d expect to test, but as we now want the files to be editable by people without a knowledge of the code, they can be handy in catching simple mistakes.
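
As a hedged illustration (this isn't Thread's actual code, and the module and field names are made up), such a sanity-check test over a Python data file might look something like this:

    # test_supplier_data.py - illustrative only; "warehouse.suppliers" is a hypothetical data file
    import unittest

    from warehouse import suppliers

    class SupplierDataTests(unittest.TestCase):
        def test_every_entry_has_required_fields(self):
            for entry in suppliers.SUPPLIERS:
                self.assertIsInstance(entry.get("name"), str)
                self.assertIsInstance(entry.get("country_code"), str)
                self.assertEqual(len(entry["country_code"]), 2)

        def test_no_duplicate_names(self):
            names = [entry["name"] for entry in suppliers.SUPPLIERS]
            self.assertEqual(len(names), len(set(names)))

    if __name__ == "__main__":
        unittest.main()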

Lastly, as we’re typically using Python for our data files, we’ve found that the syntax isn’t particularly intuitive and can take some getting used to. To address this, we’ve written documentation with a little more detail than if it were written for an engineer. This documentation is also in the repo and editable by everyone, so we encourage non-engineers to update and clarify the instructions as they learn, and to teach each other how to edit certain parts of the site.

Moving forward

We consider this experiment to be a success and will be continuing it for the foreseeable future. Where we’re designing data files to be editable, we’re going to try including detailed instructions in the files themselves, possibly including copy/pasteable examples.

We already try to make our test failures have informative error messages with details on how to fix where we can, but due to the complexity of interpreting test output we don’t currently expose Jenkins to non-technical team members, even though they can technically log in with single-sign-on. This is perhaps the next opportunity we have to improve the contribution experience and something we might trial in the next batch of new starters who go through the tutorial.

Show HN: An illustration of Web Developer tools in 2018


Web Developer Roadmap

Roadmap to becoming a web developer in 2018

Below you will find a set of charts demonstrating the paths that you can take and the technologies that you would want to adopt in order to become a frontend developer, backend developer, or DevOps engineer. I made these charts for an old professor of mine who wanted something to share with his college students to give them perspective.

If you think that these can be improved in any way, please do suggest.


Did you like this guide and want more similar content?
Subscribe for the launch of Hugobots or follow me on twitter!


🚀 Introduction

🎨 Front-end Roadmap

👽 Back-end Roadmap

For the backend, personally I would prefer Node.js and PHP 7 for the full time. Plus, I have been experimenting lately with Go and I quite like it. Apart from these, if I have to choose another one, I would go for Ruby. However, this is just my personal preference; you can choose any of the shown languages and you will be good.

👷 DevOps Roadmap


🚦 Wrap Up

If you think any of the roadmaps can be improved, please do open a PR with any updates and submit any issues. Also, I will continue to improve this, so you might want to watch/star this repository to revisit.

☑ TODO

👬 Contribution

The roadmaps are built using Balsamiq. The project file can be found in the /project-files directory. To modify any of the roadmaps, open Balsamiq, click Project > Import > Mockup JSON, and it will open the roadmap for you; update it, upload and update the images in the readme, and create a PR.

Sponsored By

License

License: CC BY 4.0

Algorithm efficiency comes from problem information


This is a high level post about algorithms (especially mathematical, scientific, and data analysis algorithms) which I hope can help people who are not researchers or numerical software developers better understand how to choose and evaluate algorithms. When scientists and programmers think about efficiency of algorithms, they tend to think about high level ideas like temporary arrays, choice of language, and parallelism. Or they tend to think about low level ideas like pointer indirection, cache efficiency, and SIMD vectorization. However, while these features matter, most decent implementations of the same algorithm tend to get quite close in efficiency to each other (probably within a small constant factor).

What does my algorithm know about my problem?

Let me highlight one of the biggest success stories in numerical analysis: solving linear systems. The problem is simple: for what values of x does Ax=b? Most people learn early on in their math career that you solve this via Gaussian elimination which is a very nice \mathcal{O}(n^3) algorithm. In fact, as far as we know right now, solving linear systems has the same asymptotic complexity as matrix multiplication, which is currently \mathcal{O}(n^{2.373}). This grows pretty fast, but is it all a game of who can write the most efficient general linear solver?

Not even close. The algorithm complexity is the worst case scenario, but in most applications you actually aren't in "the worst case". MATLAB's backslash (A\b) operator solves the linear system Ax=b, but what algorithm does it use? Well, Mathworks describes it with a picture:

(picture credit to Mathworks, this is linked directly to their documentation)

Do you think that's it? If so, then you're wrong. That's only when the matrix is full/dense. If it's a sparse matrix, then there's another algorithm for it:

(picture credit to Mathworks, this is linked directly to their documentation)

Julia's backslash operator is similar, but what's cool is you can actually inspect Julia's source code and see exactly what it's doing. But it's the same idea. This kind of algorithm is known as a polyalgorithm since it's many different algorithms together. The algorithm starts by checking what kind of type the matrix is and then performing a specialized algorithm on that type, and if no specialized algorithm exists, falls back to Gaussian elimination through a pivoted QR-factorization.

Why would you write an algorithm like this? The trivial answer is because it's faster. If you know the matrix is diagonal then the solution is an \mathcal{O}(n) element-wise division by the diagonal. If you know the matrix is positive-definite then you can use a Cholesky decomposition to solve the equation which takes about half of the operations of Gaussian elimination. However, I think it's better to realize the common thread here. The diagonal method is faster because it knows about properties of diagonal matrices and uses them. The Cholesky decomposition is faster because it utilizes details about positive-definite matrices in order to get rid of unnecessary operations.

These specialized algorithms are fast because they use more "information" about the problem
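
To make the idea concrete, here's a toy Python/NumPy sketch of a polyalgorithm (an illustration of the concept, not MATLAB's or Julia's actual implementation): check for structure you know about, and only fall back to the general solver when nothing cheaper applies.

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve, solve_triangular

    def solve(A, b):
        # Diagonal: O(n) element-wise division
        if np.count_nonzero(A - np.diag(np.diag(A))) == 0:
            return b / np.diag(A)
        # Upper triangular: O(n^2) back-substitution
        if np.allclose(A, np.triu(A)):
            return solve_triangular(A, b)
        # Symmetric: try Cholesky, roughly half the work of Gaussian elimination
        if np.allclose(A, A.T):
            try:
                return cho_solve(cho_factor(A), b)
            except np.linalg.LinAlgError:
                pass  # not positive-definite, fall through to the general solver
        # General case: pivoted LU (Gaussian elimination)
        return np.linalg.solve(A, b)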

And that's only the start of this entire field of mathematics known as numerical linear algebra. There are iterative methods which can also be used. (Dense) Matrix factorizations are dense and thus require the same amount of memory as the full matrix itself (though some sparse matrix factorizations similarly need the memory for non-zero elements and maybe some more). Iterative solvers require only the ability to do A*x and thus are much more memory efficient and can scale well on sparse matrices. This also makes them easily to parallelize across large clusters and thus are the method of choice for large solving large sparse systems that arise from things like PDEs. Even then, you can still make it more efficient by choosing a preconditioner, but good choices of preconditioners are dependent on, you guessed it, properties of your A or the equation it is derived from.

As you can see, every significant step of this tool chain is about baking more information about the problem into the solution methods. This is why I would suggest that one thinks about the amount of information an algorithm contains about a problem since that's where the true gains are. Let's take another example.

Are neural networks "efficient"?

If there's one area of computational data science that everyone is excited about right now, it's neural networks. The quickest way to explain a neural network is that it is a computationally efficient way to approximate any mapping from inputs to outputs, f(x)=y. The f that it creates uses a lot of matrix multiplies which are easy to GPU-parallelize, and the deep in "deep neural networks" simply is the application of more matrix multiplies to get a better approximation of f. Movie recommendations are where you take in x = (data about movies the person previously watched) and then y = (the score that you think they'd give to other movies), and you use enough data to get a good enough approximation for the movies they would like and spit out the top few. Classification problems like "does this picture contain a bird?" is just about creating a mapping between x = (a picture represented by its pixel brightness), to y = 1 or 0 (yes or no: it contains a bird). You train this on enough data and it's correct enough of the time. Amazing, right?

The next question people like to ask is, what neural network frameworks are good for these problems? TensorFlow, PyTorch, KNet.jl will all auto-GPU things for you and in the end all give around the same performance. Of course, package developers will fight over these 2x-5x differences over the next decade, showing that their internal setup is good for these problems, while others will show it's not good for these other problems. But what this doesn't tell you is whether you should be using a neural network in the first place.

(Yes, you can use these computational tools for things other than neural networks, but let's ignore that for now)

If you think about the point of this blog post, it should trigger something in you. I mentioned that neural networks can approximate any function f(x)=y... so what does the neural network "know" about the problem? Pretty much nothing. The reason why neural networks are popular is because they are a nice hammer that you can use on pretty much any problem, but that doesn't mean it's always good. In fact, because you can use a neural network on any problem it pretty much implies that it won't be great on any problem. That's the fundamental trade off between specificity and generality! My friend Lyndon White displayed this very nicely when he showed 7 different ways to solve a classification problem in Julia. Notice that the neural network method is dead last in efficiency. Neural networks aren't bad, but they just don't know anything about a problem. If the problem is linear, then a linear regression or linear SVM will DDDDOMINATE on the problem! If it's a classification problem, then algorithms which incorporate knowledge about what a classification problem means, more so than "spit out y in [0,1] via sigmoid and say yes if y is greater than some threshold", will do better. For this domain, it includes things like decision trees. For movie recommendation problems, factorization machines more succinctly capture our internal model of how a user's movie ratings should be related, and thus it's no surprise that they tend to win most of the Netflix prize type of competitions.

Note that neural networks can be made problem-specific as well. Convolutional neural networks are successful in image processing because they add an extra constraint about the relations of the data, specifically that small "stencils" of the larger matrix (i.e. groups of nearby pixels) should be interrelated. By adding that kind of information into the structure of the neural network and its optimization algorithms, this then becomes a very efficient tool for tasks where the data has this structure.

So neural networks are not useless even though they are not always efficient. They are extremely useful since they can be applied to pretty much any problem. But the general neural network architectures (like deep feed-forward neural networks) lack knowledge about the specific problems they are approximating, and so they cannot hope to be as efficient as domain-optimized algorithms. How do you make them better? You introduce ideas like "memory" to get recurrent neural networks, and it's no surprise that the research shows that these methods are more efficient on problems which have some memory relation. But the generality of neural networks leads me to my favorite example.

Algorithm specificity for differential equations (okay, this goes into more depth!)

I spend my time developing new methods for (stochastic) differential equations and developing differential equation solving software. There are hundreds and hundreds of different algorithms for finding the function u(t) whose derivative satisfies u'(t) = f(t,u(t)) where f is the data given by the user, starting at t_0 and ending at t_f (this is the definition of an ODE BTW. If this is new to you, think: I'm given a model f which describes how things change, and I want to compute where I will be in the future). And again, solving this problem efficiently is all about incorporating more information about the problem into the solver algorithm.

Just as an interesting note, you can solve a differential equation with a neural network, but it's not good. I was hoping that the efficiency gained by using optimized GPU-libraries would be enough to overcome the efficiency lost by not specializing on the problem, but it wasn't even close and the end result was that it was difficult for a deep neural network to solve non-stiff nonlinear problems like the Lotka-Volterra equations which standard differential equations software solve in microseconds. But what's more interesting is why it failed. The neural network simply failed to understand the temporal dependencies of the problem. If you look at the failed solutions on the blog post, you see what happens is that the neural network doesn't know to prioritize the early time points because, well, it's obvious in a temporal model that if you are wrong now then you will be wrong in the future. But the neural network sees each time point as an unconnected matrix and from there the problem becomes much harder. We could try convolutional nets or recurrent nets, or bias the cost function itself, but this is still trying to get around "the information problem" which differential equations don't have.

With a differential equation, we are modeling reality. In many cases, we know things about that reality. We might know that the trajectory of our rocket is smooth (it has all derivatives), and so we can develop high order algorithms which require 9 derivatives of the user's f, and it's no surprise that these Vern algorithms benchmark really well. But is it the best integrator? No, there's no such thing. First of all, these methods are only for "non-stiff" ODEs, that is, ODEs which have a single timescale. If there are multiple timescales, then one can show that these are "unstable" and require a very small time step. So what's the best algorithm?

What I try to highlight in the post on differential equation libraries is that a large choice of methods is essential to actually being efficient. For example, many only have a few integrators, and in many cases this is simply multistep methods (because integrators like LSODE and VODE, provided in ancient Fortran code, have an option to change between stiff and non-stiff problems). But are these actually good methods? That's not a good way to look at it. Instead, look at what these multistep methods mean. The form for the BDF2 method is:

 y_{n+2} - \frac{4}{3}y_{n+1} + \frac{1}{3}y_n = \frac{2}{3}hf(t_{n+2},y_{n+2})

What are the advantages of this algorithm? Well, because it uses two past data points (assume you already know y_n and y_{n+1} and want y_{n+2}), you only have to solve an implicit function for the variable y_{n+2}. If the cost of evaluating the function is high (as is the case for example with large PDE discretizations), then you want to evaluate it less so this does well! Since this only has a single f call per update, that's doing quite well.

However, it made a trade off. In order to decrease the number of function evaluations, it requires more than one previous data point. It also requires that "nothing happened" between those two points. If you have some discontinuity for example, like modeling a ball when it bounces or modeling the amount of chemicals in a patient and you give a dose, then the previous data is invalid. Adaptive order algorithms like LSODE or CVODE go back to the more error prone backwards Euler method

y_{n+1} - y_n = hf(t_{n+1},y_{n+1})

to compute one small step (it has to be small to make the error low since the error is \mathcal{O}(\Delta t)), then uses that step as the previous step in a BDF2, then goes up to BDF3 etc., where each higher method requires more previous data and has a higher order (and thus lower error). But if you're hitting events quite often (Pk/Pd simulations where a drug dose happens every few hours), notice that this is really really bad. Like, awful: you might as well just have had an implicit Euler scheme!
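
As a rough sketch of what that backwards Euler step looks like in code (a generic illustration, not any particular library's implementation), each step just solves the implicit equation above for y_{n+1}, using the previous value as the initial guess:

    import numpy as np
    from scipy.optimize import fsolve

    def backward_euler_step(f, t, y, h):
        # Solve y_new = y + h*f(t+h, y_new) for y_new
        residual = lambda y_new: y_new - y - h * f(t + h, y_new)
        return fsolve(residual, y)

    # Example: restart the integration right after a discontinuity (e.g. a dose at t = 1.0)
    f = lambda t, y: -0.5 * y
    y = np.array([2.0])                       # state just after the event
    y = backward_euler_step(f, 1.0, y, 1e-3)  # one small implicit Euler step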

And not only that, the assumption that this method is efficient requires that f is costly. Other methods, like Rosenbrock or (E)SDIRK methods, make the trade off to take more evaluations of f to get less error, and in turn use this decreased error to take larger time steps (and thus fewer steps overall). And so it's no surprise that these methods benchmark better on small systems and even "medium sized systems", while the BDF method benchmarks as more efficient on larger systems with a more expensive function evaluation (the Rosenbrock methods are things like Rosenbrock23 and Rodas4, and the BDF method is CVODE_BDF). Again, this is nothing more than incorporating more details about the problem into the choice of solution method.

This is why having a wide variety of methods which incorporate different types of problem information is really what matters for efficiency. IMEX methods are a great example of this because instead of the user specifying a single ODE u' = f(t,u), the user specifies two functions u' = f_1(t,u) + f_2(t,u) where one of the functions is "stiff" (and thus should be solved implicitly by things like Rosenbrock or BDF methods) while the other is "non-stiff" (and thus should be solved by explicit methods like the Verner method). By specifying this split form, libraries like ARKODE or DifferentialEquations.jl can utilize algorithms which incorporate this information and thus will be more efficient than any of the previously mentioned methods on appropriately split problems. Another example is symplectic integrators, which incorporate the mathematics of the underlying symplectic manifold into the solver itself, for problems which have a symplectic manifold. What kinds of problems lie on a symplectic manifold? Well, those arising from second order differential equations and physical Hamiltonians like N-body problems of astrodynamics and molecular dynamics. These methods, by utilizing a fundamental property of the problem specification, can noticeably reduce the amount of drift in the energy and angular momentum of the approximated solution and make the resulting simulations closer to reality. The changes an algorithm choice can make can be orders of magnitude. In the paper I showed earlier on methods for stochastic differential equations, incorporating the idea of time correlation and interpolation of Brownian motion (the Brownian bridge) to build an adaptive method resulted in an average of 100,000x fewer time steps to compute solutions to the SDE, so much that I couldn't even benchmark how long it would take for algorithms which didn't make use of this because they could not finish! (In my next paper, I am able to estimate that it would take almost 6 years to solve this problem vs the 22 seconds we're at now). Choice of language, cache efficiency, etc. pale in comparison to proper algorithm choice.

The Julia Programming Language

I hope I am getting across the point that the way to evaluate algorithms is by the information they use about the problem. This way of thinking about algorithms extends very far. I show in another blog post that the Julia programming language achieves C/Fortran levels of efficiency because it allows the LLVM compiler to have total type information about your program, as opposed to more dynamic languages like R/MATLAB/Python. Julia is specifically designed so that compilers can at compile-time compute as much type information as possible to essentially build the C code you would have written. When you understand the information that is being optimized on, these performance differences aren't magical anymore: it's just using more information, more specificity, and getting more performance out because of that.

In fact, I have to put a little blurb out here for the Julia programming language as well. Its multiple dispatch and abstract typing system is a very nice way to organize algorithms by their information. Let's take how Julia's linear algebra works. For a standard dense matrix, the backslash operator internally calls factorize(::AbstractMatrix) (and A_ldiv_B!, which is an inplace way of writing A\b), which I showed above is a polyalgorithm. But if you call factorize on some special matrix type, like a Diagonal or Tridiagonal matrix, it uses a specialization of factorize or A_ldiv_B! to perform the most efficient algorithm for that type of matrix. Then in user codes and packages, other people can define a matrix type MyMatrix <: AbstractMatrix and overload factorize or A_ldiv_B! to be efficient for their matrix type. Then in my algorithm, like a differential equation solver which uses linear algebra under the hood, I can simply call \ and Julia will pick the most efficient form based on the matrix type it currently has, even if that method was defined in a package my solver does not depend on. This means generic code does not have to be type-insensitive code: I don't need to know your matrix type, you just pass it in without caring what kind of solver it is, and the solve is made more efficient by you specifying problem information through dispatches. The result is that a system with multiple dispatch doesn't just let compilers optimize, it also makes it easy for developers to pass along specializations which in turn customize other solvers. So your Rosenbrock23 solve may use a GMRES iterative method that never builds the actual Jacobian, because J*x can be computed directly from a function call (this is quite common in PDEs), and you don't need to re-write your own. Instead you can make my generic Rosenbrock23 algorithm compile a code which does that... even though it was never written to know what kind of method factorize should be. This kind of "opening up of the black box" and being able to optimize pieces of it via information injection for your specific problem is a powerful tool for optimizing numerical algorithms to large scale problems. Given this post I hope it's easy to understand why I think this will (and already has started to) lead to a new generation of efficient implementations. (Okay okay, you can do generic programming with C++ templates, but you can't make me do it! Look at this "gentle introduction" and you'll be in my boat asking for a high-level language designed to have that information in mind, and I'll point you to Julia)

Moral of the story

Getting your perfect caches lined up is small potatoes. BLAS and LAPACK are well-known super fast Fortran linear algebra libraries. They are extremely fast not just because they are bit twiddling with assembly (though they are), but they are super fast because they incorporate information about floating point precision and use that to their advantage. Neural networks, ODE solvers, etc. are all just tools for problems. The more specific they are to the problem you're trying to solve, the more efficient they can be. So next time you're looking to optimize some numerical codes, don't ask "what can I do to make the computation more clean/efficient", ask "what information about the problem is the solution method not using?" That's where the big potatoes come from.

In pursuit of Otama's tone (2017)


It would be fun to use the Otamatone in a musical piece. But for someone used to keyboard instruments it's not so easy to play cleanly. It has a touch-sensitive (resistive) slider that spans roughly two octaves in just 14 centimeters, which makes it very sensitive to finger placement. And in any case, I'd just like to have a programmable virtual instrument that sounds like the Otamatone.

What options do we have, as hackers? Of course the slider could be replaced with a MIDI interface, so that we could use a piano keyboard to hit the correct frequencies. But what if we could synthesize a similar sound all in software?

Sampling via microphone

We'll have to take a look at the waveform first. The Otamatone has a piercing electronic-sounding tone to it. One is inclined to think the waveform is something quite simple, perhaps a sawtooth wave with some harmonic coloring. Such a primitive signal would be easy to synthesize.

[Image: A pink Otamatone in front of a microphone. Next to it a screenshot of Audacity with a periodic but complex waveform in it.]

A friend lent me her Otamatone for recording purposes. Turns out the wave is nothing that simple. It's not a sawtooth wave, nor a square wave, no matter how the microphone is placed. But it sounds like one! Why could that be?

I suspect this is because the combination of speaker and air interface filters out the lowest harmonics (and parts of the others as well) of square waves. But the human ear still recognizes the residual features of a more primitive kind of waveform.

We have to get to the source!

Sampling the input voltage to the Otamatone's speaker could reveal the original signal. Also, by recording both the speaker input and the audio recorded via microphone, we could perhaps devise a software filter to simulate the speaker and head resonance. Then our synthesizer would simplify into a simple generator and filter. But this would require opening up the instrument and soldering a couple of leads in, to make a Line Out connector. I'm not doing this to my friend's Otamatone, so I bought one of my own. I named it TÄMÄ.

[Image: A Black Otamatone with a cable coming out of its mouth into a USB sound card. A waveform with more binary nature is displayed on a screen.]

I soldered the left channel and ground to the same pads the speaker is connected to. I had no idea about the voltage range in advance, but fortunately it just happens to fit line level and not destroy my sound card. As you can see in the background, we've recorded a signal that seems to be a square wave with a low duty cycle.

This square wave seems to be superimposed with a much quieter sinusoidal "ring" at 584 Hz that gradually fades out in 30 milliseconds.

Next we need to map out the effect the finger position on the slider has on this signal. It seems to not only change the frequency but the duty cycle as well. This happens a bit differently depending on which one of the three octave settings (LO, MID, or HI) is selected.

The Otamatone has a huge musical range of over 6 octaves:

[Image: Musical notation showing a range of 6 octaves.]

In frequency terms this means roughly 55 to 3800 Hz.

The duty cycle changes according to where we are on the slider: from 33 % in the lowest notes to 5 % in the highest ones, on every octave setting. The frequency of the ring doesn't change, it's always at around 580 Hz, but it doesn't seem to appear at all on the HI setting.

So I had my Perl-based software synth generate a square wave whose duty cycle and frequency change according to given MIDI notes.
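
My synth is written in Perl, but the core idea is simple enough to sketch in a few lines of Python; this is an illustration only, and the MIDI range and duty-cycle interpolation are rough guesses at the measured 33 % to 5 % behaviour:

    import numpy as np

    SAMPLE_RATE = 44100

    def square_wave(freq, duration, duty):
        # Naive square wave with the given frequency and duty cycle
        t = np.arange(int(duration * SAMPLE_RATE)) / SAMPLE_RATE
        phase = (t * freq) % 1.0
        return np.where(phase < duty, 1.0, -1.0).astype(np.float32)

    def duty_for_note(note, low=33, high=106):
        # Interpolate from 33 % duty on the lowest notes down to 5 % on the highest
        frac = np.clip((note - low) / (high - low), 0.0, 1.0)
        return 0.33 + (0.05 - 0.33) * frac

    note = 69                                   # MIDI note 69 = A4
    freq = 440.0 * 2 ** ((note - 69) / 12)
    samples = square_wave(freq, 0.5, duty_for_note(note))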

FIR filter 1: not so good

Raw audio generated this way doesn't sound right; it needs to be filtered to simulate the effects of the little speaker and other parts.

Ideally, I'd like to simulate the speaker and head resonances as an impulse response, by feeding well-known impulses into the speaker. The generated square wave could then be convolved with this response. But I thought a simpler way would be to create a custom FIR frequency response in REAPER, by visually comparing the speaker input and microphone capture spectra. When their spectra are laid on top of each other, we can read the required frequency response as the difference between harmonic powers, using the cursor in baudline. No problem, it's just 70 harmonics until we're outside hearing range!

[Image: Screenshot of Baudline showing lots of frequency spikes, and next to it a CSV list of dozens of frequencies and power readings in the Vim editor.]

I then subtracted one spectrum from another and manually created a ReaFir filter based on the extrema of the resulting graph.

[Image: Screenshot of REAPER's FIR filter editor, showing a frequency response made out of nodes and lines interpolated between them.]

Because the Otamatone's mouth can be twisted to make slightly different vowels, I recorded two spectra, one with the mouth fully closed and the other one as open as possible.

But this method didn't quite give the sound the piercing nasalness I was hoping for.

FIR filter 2: better

After all that work I realized the line connection works in both directions! I can just feed any signal and the Otamatone will sound it via the speaker. So I generated a square wave in Audacity, set its frequency to 35 Hz to accommodate 30 milliseconds of response, played it via one sound card and recorded via another one:

[Image: Two waveforms, the top one of which is a square wave and the bottom one has a slowly decaying signal starting at every square transition.]

The waveform below is called the step response. The simplest way to get a FIR convolution kernel is to just copy-paste one of the repetitions. Strictly, to get an impulse response would require us to sound a unit impulse, i.e. just a single sample at maximum amplitude, not a square wave. But I'm not redoing that since recording this was hard enough already. For instance, I had to turn off the fridge to minimize background noise. I forgot to turn it back on, and now I have a box of melted ice cream and a freezer that smells like salmon. The step response gives pretty good results.

One of my favorite audio tools, sox, can do FFT convolution with an impulse response. You'll have to save the impulse response as a whitespace-separated list of plaintext sample values, and then run sox original.wav convolved.wav fir response.csv.
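
The same convolution is also only a few lines of Python with SciPy - a sketch, assuming a mono recording and that the single repetition of the response has been trimmed and saved as a WAV file (response.wav here is a made-up name):

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import fftconvolve

    rate, dry = wavfile.read("original.wav")        # assumed mono
    _, response = wavfile.read("response.wav")      # one ~30 ms repetition of the step response

    wet = fftconvolve(dry.astype(np.float64), response.astype(np.float64))
    wet /= np.max(np.abs(wet))                      # normalise to avoid clipping
    wavfile.write("convolved.wav", rate, (wet * 32767).astype(np.int16))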

Or one could use a VST plugin like FogConvolver:

[Image: A screenshot of Fog Convolver.]

A little organic touch

There's more to an instrument's sound than its frequency spectrum. The way the note begins and ends, the so-called attack and release, are very important cues for the listener.

The width of a player's finger on the Otamatone causes the pressure to be distributed unevenly at first, resulting in a slight glide in frequency. This also happens at note-off. The exact amount of Hertz to glide depends on the octave, and by experimentation I stuck with a slide-up of 5 % of the target frequency in 0.1 seconds.

It is also very difficult to hit the correct note, so we could add some kind of random tuning error. But it turns out this would be too much; I want the music to at least be in tune.

Glides (glissando) are possible with the virtual instrument by playing a note before releasing the previous one. This glissando also happens in 100 milliseconds. I think it sounds pretty good when used in moderation.

I read somewhere (Wikipedia?) that vibrato is also possible with the Otamatone. I didn't write a vibrato feature in the code itself, but it can be added using a VST plugin in REAPER (I use MVibrato from MAudioPlugins). I also added a slight flanger with inter-channel phase difference in the sample below, to make the sound just a little bit easier on the ears (but not too much).

Sometimes the Otamatone makes a short popping sound, perhaps when finger pressure is not firm enough. I added a few of these randomly after note-off.

Working with MIDI

We're getting on a side track, but anyway. Working with MIDI used to be straightforward on the Mac. But GarageBand, the tool I currently use to write music, amazingly doesn't have a MIDI export function. However, you can "File -> Add Region To Loop Library", then find the AIFF file in the loop library folder, and use a tool called GB2MIDI to extract MIDI data from it.

I used mididump from python-midi to read MIDI files.

Tyna Wind - lucid future vector

Here's TÄMÄ's beautiful synthesized voice singing us a song.

Introducing Preemptible GPUs

For users looking to create dynamic pools of affordable GPU power, Compute Engine’s managed instance groups can be used to automatically re-create your preemptible instances when they're preempted (if capacity is available). Preemptible VMs are also integrated into cloud products built on top of Compute Engine, such as Kubernetes Engine (GKE’s GPU support is currently in preview. The sign-up form can be found here).

Over the years we’ve seen customers do some very exciting things with preemptible resources: everything from solving problems in satellite image analysis, financial services, questions in quantum physics, computational mathematics and drug screening.

"Preemptible GPU instances from GCP give us the best combination of affordable pricing, easy access and sufficient scalability. In our drug discovery programs, cheaper computing means we can look at more molecules, thereby increasing our chances of finding promising drug candidates. Preemptible GPU instances have advantages over the other discounted cloud offerings we have explored, such as consistent pricing and transparent terms. This greatly improves our ability to plan large simulations, control costs and ensure we get the throughput needed to make decisions that impact our projects in a timely fashion." 
Woody Sherman, CSO, Silicon Therapeutics 

We’re excited to see what you build with GPUs attached to Preemptible VMs. If you want to share stories and demos of the cool things you've built with Preemptible VMs, reach out on Twitter, Facebook or G+.

For more details on Preemptible GPU resources, please check out the preemptible documentation, GPU documentation and best practices. For more pricing information, take a look at our Compute Engine pricing page or try out our pricing calculator. If you have questions or feedback, please visit our Getting Help page.

To get started using Preemptible GPUs today, sign up for Google Cloud Platform and get $300 in credits to try out Preemptible GPUs.

Wi-Fi startup Eero lays off 30 employees


Eero, the mesh Wi-Fi router startup, has laid off 20 percent of its workforce (about 30 employees), TechCrunch has learned. Eero confirmed about 30 employees were let go but declined to comment on its total workforce size.

“Our goal is to provide perfect WiFi in every home,” an Eero spokesperson said in a statement to TechCrunch. “Over the past year we explored several related projects, and we’ve now made the tough decision to eliminate one new project in favor of greater focus on our core business. We do not take this lightly, and unfortunately this shift means about 30 colleagues will no longer be working at eero. We will continue our work to make eero the most reliable, secure, and easiest home WiFi solution.”

Eero, which first launched in 2015, aims to change the way we think about wireless routers. Last year, Eero unveiled two new hardware products, a next-generation Eero with triband WiFi and the Eero Beacon, which plugs directly into wall sockets in places where it’s inconvenient to have a corded product.

In July, Eero acqui-hired startup Thington, a home management app founded by Dopplr founder Matt Biddulph and former Yahoo Brickhouse Head of Product Tom Coates.

Eero has raised $90 million from First Round Capital, Menlo Ventures, AME Cloud Ventures, Initialized Capital, Homebrew Ventures and others. Its most recent round came in May 2016 with $50 million in funding.


Show HN: PAST, a secure alternative to JWT



PAST is everything you love about JOSE (JWT, JWE, JWS) without any of the many design deficits that plague the JOSE standards.

What follows is a reference implementation. Requires PHP 7 or newer.

PAST (Platform-Agnostic Security Tokens) is a specification and reference implementation for secure stateless tokens.

Unlike JSON Web Tokens (JWT), which gives developers more than enough rope with which to hang themselves, PAST only allows secure operations. JWT gives you "algorithm agility", PAST gives you "versioned protocols". It's incredibly unlikely that you'll be able to use PAST in an insecure way.

Key Differences between PAST and JWT

PAST

v2.auth.eyJkYXRhIjoidGhpcyBpcyBhIHNpZ25lZCBtZXNzYWdlIiwiZXhwIjoiMjAzOS0wMS0wMVQwMDowMDowMCJ9VpWy4KU60YnKUzTkixFi9foXhXKTHbcDBtpg7oWllm8

This decodes to:

  • Version: v2
  • Purpose: auth (shared-key authentication)
  • Payload:
    {"data": "this is a signed message","exp": "2039-01-01T00:00:00"
    }
  • Authentication tag:
    VpWy4KU60YnKUzTkixFi9foXhXKTHbcDBtpg7oWllm8=
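A quick sketch of pulling those pieces out of the example token; the 32-byte tag length is inferred from the example above, and the authoritative parsing rules are in the PAST documentation:

import base64
import json

token = ("v2.auth.eyJkYXRhIjoidGhpcyBpcyBhIHNpZ25lZCBtZXNzYWdlIiwiZXhwIjoi"
         "MjAzOS0wMS0wMVQwMDowMDowMCJ9VpWy4KU60YnKUzTkixFi9foXhXKTHbcDBtpg7oWllm8")

version, purpose, body = token.split(".", 2)

def b64url_decode(data):
    # Re-add the padding that the token omits.
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

raw = b64url_decode(body)
message, tag = raw[:-32], raw[-32:]   # assumed: 32-byte authentication tag at the end

print(version, purpose)               # v2 auth
print(json.loads(message))            # {'data': 'this is a signed message', 'exp': '2039-01-01T00:00:00'}
print(base64.b64encode(tag).decode()) # VpWy4KU60YnKUzTkixFi9foXhXKTHbcDBtpg7oWllm8=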

To learn what each version means, please see this page in the documentation.

JWT

An example JWT (taken from JWT.io) might look like this:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWV9.TJVA95OrM7E2cBab30RMHrHDcEfxjoYZgeFONFh7HgQ 

This decodes to:

Header:

{"alg": "HS256","typ": "JWT"
}

Body:

{"sub": "1234567890","name": "John Doe","admin": true
}

Signature:

TJVA95OrM7E2cBab30RMHrHDcEfxjoYZgeFONFh7HgQ

Motivation

As you can see, with JWT, you get to specify an alg header. There are a lot of options to choose from (including none).

There have been ways to exploit JWT libraries by replacing RS256 with HS256 and using the known public key as the HMAC-SHA256 key, thereby allowing arbitrary token forgery.
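As a purely conceptual sketch of that confusion attack, using only the standard library (a patched, modern JWT verifier rejects this; the key below is a placeholder for a public key the attacker already knows):

import base64
import hashlib
import hmac
import json

def b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# The RSA public key is, by definition, known to the attacker (placeholder contents).
public_key_pem = b"-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----\n"

header  = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "1234567890", "admin": True}).encode())
signing_input = (header + "." + payload).encode()

# A naive verifier that trusts the alg header computes exactly this HMAC,
# keyed with the public key bytes, and therefore accepts the forged token.
tag = hmac.new(public_key_pem, signing_input, hashlib.sha256).digest()
forged_token = header + "." + payload + "." + b64url(tag)
print(forged_token)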

With PAST, your options are a version and a purpose. There are three possible values for purpose:

  • auth -- shared-key authentication
  • enc -- shared-key encryption
  • sign -- public-key authentication (a.k.a. digital signatures)

All encryption modes use authenticated encryption.

Regardless of the purpose selected, the header (and an optional footer, which is always cleartext but base64url-encoded) is included in the signature or authentication tag.

How to Use this Library

See the documentation.

The section dedicated to this PHP implementation may be more relevant.

“My ten hour white noise video now has five copyright claims”



Ink/stitch: an Inkscape extension for machine embroidery design


Introduction

Want to design embroidery pattern files (PES, DST, etc) using free, open source software? Hate all the other options? Try this one.

I received a really wonderful Christmas gift for a geeky programmer hacker: an embroidery machine. It’s pretty much a CNC thread-bot… I just had to figure out how to design programs for it. The problem is, all free embroidery design software seemed to be terrible, especially when you add in the requirement of being able to run in Linux, my OS of choice.

So I wrote one.

Okay, not really. I’m pretty terrible at GUIs, but I found this nifty inkscape extension that was created and hacked on by a couple of other folks. It was pretty rudimentary, but it got the job done, and more importantly, it was super hackable. I hacked the hell out of it, and at this point ink/stitch is a viable entry-level machine embroidery design tool.

“Quick” Setup On Ubuntu 16.04 (and derivative Linux distributions)

Download the extension’s archive file and unpack the zip archive. Then, on a terminal, run the installation (and update) script from the archive’s bin directory by first changing into that directory with cd <path_to_the_archive>/inkstitch-master/bin and then entering the command sh install_ink_embroidery_Ubuntu.sh. The script will ask for your password to be able to install new software, remove any Inkscape snap packages that may be installed, and to update Inkscape to the current version. Intermittently, it will require you to confirm a step by hitting Enter, or typing in ‘y’ or ‘yes’. Restart Inkscape after the script has executed to see the changes take effect. You can also run the script again when you later want to update the extension (and Inkscape).

To also install the optional conversion program, run sh install_libembroidery-convert.sh in the same directory. This script will also ask your password in order to be able to install the necessary dependencies.

Not tested with Ubuntu versions other than 16.04. If you try it out, please let us know how it went.

Continue reading in the “Usage” section.

Manual Setup

To use this tool, you’re going to need to set it up. It’s an inkscape extension written as a Python file. Once you get it working, you’ll need to learn how to design vectors in the way that ink/stitch expects, and then you can generate your design files.

Inkscape

First, install Inkscape if you don’t have it. I highly recommend version 0.92 or greater, which has a really key feature: the Objects panel. This gives you a hierarchical list of objects in your SVG file, listed in their stacking order. This is really important because the stacking order dictates the order that the shapes will be sewn in.

Versions 0.92.2 and higher let you bind a key to new commands, “stack up” and “stack down”, which I assign to pageup and pagedown. These let you arbitrarily reorder objects in the SVG file, which lets you directly manipulate which order they stitch in. It works way better than the default “raise” and “lower” commands.

Python Dependencies

A few python modules are needed. In some cases this extension uses features that aren’t available in the versions of the modules pre-packaged in distributions, so I recommend installing them directly with pip:

pip install -r requirements.txt

Extension installation

  1. Clone the extension source: git clone https://github.com/lexelby/inkstitch
  2. Install it as directed here

I prefer to symbolically link into my git clone, which allows me to hack on the code. Changes to the Python code take effect the next time the extension is run. Changes to the extension description files (*.inx) take effect the next time Inkscape is restarted.

Optional: conversion program

The extension can output machine embroidery design files directly in Melco format. I don’t even know what that is, so I don’t use it. I prefer to use the CSV output format which can be consumed by another awesome open source project: Embroidermodder2. In theory, this project was going to be exactly what I wanted. In practice, it never got funded on Kickstarter and it’s largely incomplete.

However, it contains a really awesome core library that knows pretty much every machine embroidery format and how to convert between them. I use it to convert the CSV files that ink/stitch outputs into the PES files that my SE400 uses.

Grab the source: git clone https://github.com/Embroidermodder/Embroidermodder. Build just libembroidery-convert using the instructions in “3)” in the Embroidermodder build docs. You can then use it like this: ./libembroidery-convert your-file.csv your-file.pes.

Since the CSV + libembroidery-convert method is the only method I use, it’s the one I’ll assume from here on. I’m not even sure if the other output formats from ink/stitch still work (or ever worked).

Usage

Basic Usage

First things first: I’m going to assume you know a few embroidery terms like “fill stitch” and “satin”. Look those up if you’re mentally 404ing, then come back here. I’m not going to assume you know some of the more advanced terms, because I had to learn all that when I started this project, so I might as well teach you too.

  1. Open up Inkscape and create a rectangle.
  2. Make sure it has both a stroke and a fill.
  3. Convert it to a path using Path -> Object to Path (because ink/stitch doesn’t understand rectangles, circles, and the like, and ignores them).
  4. Run Extensions -> Embroidery -> Embroider. Use the default settings.

The rectangle you made will disappear and be replaced with some stripes and zig-zags. ink/stitch has hidden all of your layers and created a new one called Embroidery, in which it has placed a visual representation of the stitch plan it created. It has interpreted your shape as two instructions: Fill and Stroke. Fill is implemented using fill stitching, and Stroke is implemented by running satin stitching along the outline.

Select the horizontal lines using the “Edit Paths by Nodes” tool. Zoom in a bit and you’ll see that the lines are actually made up of lots of points. Each point represents one stitch – one needle penetration and interlocking of the top thread with the bobbin thread. Notice how the points all line up nicely in diagonals. This will give the fill stitching a nice, orderly visual appearance.

Now look at the zig-zags. These are the satin stitches. Note that the corners look pretty ugly. This is because satin stitches generated from a shape’s stroke are pretty rudimentary and aren’t implemented intelligently. You can exert much greater control over satin stitching using a Satin Column, described later.

The stitching preview you’re looking at just now isn’t intended to be permanent. I usually immediately undo it (ctrl-Z) after I’ve looked at the stitches. The actual work that ink/stitch does is to output a design file.

Stitching Out the Design

Where’d the design file go? One of the parameters you were able to specify in the filter settings dialog was the output directory. By default, the directory used is the place where you installed the extension’s Python files. I output mine to ~/Documents/embroidery/output.

ink/stitch will create a file named something.csv, where something is the name of your svg file (e.g. something.svg). If something.csv already existed, it will be renamed to something.csv.1, and something.csv.1 will be renamed to something.csv.2, etc, up to 5 backup copies. When you’ve got the design the way you like it, save off a copy of something.csv.
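The rotation looks something like this (a sketch of the behaviour described above, not ink/stitch's actual code):

import os

def rotate_backups(path, keep=5):
    oldest = path + "." + str(keep)
    if os.path.exists(oldest):
        os.remove(oldest)                      # the oldest backup falls off the end
    for n in range(keep - 1, 0, -1):           # shift .4 -> .5, ..., .1 -> .2
        src = path + "." + str(n)
        if os.path.exists(src):
            os.rename(src, path + "." + str(n + 1))
    if os.path.exists(path):
        os.rename(path, path + ".1")           # the current file becomes .1

rotate_backups("something.csv")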

Next, convert it to your machine’s format using libembroidery-convert (as described above). Send it to your machine in whatever way one does that for your machine, and try stitching it out!

Ordering

Copy your rectangle and paste it elsewhere on your canvas. Deselect any shapes (Edit -> Deselect), re-run the extension, and look at the output. You’ll see that both regions have been stitched, and there will be a line connecting them. That’s a jump-stitch, where the machine will move a long distance between stitching the sections.

If you’re like me, your machine can’t automatically cut the thread between stitching sections, so you’ll need to minimize jump stitches as much as possible through careful planning of your stitch path. If your machine can do thread cuts, congratulations! But you’ll need to modify ink/stitch to allow you to specify a thread cut, because there’s no way to do that right now.

However, note that ink/stitch pays attention to the colors you use for objects. If you change colors from one object to the next, ink/stitch will include a color-change instruction using the color you’ve set for the object. My machine cuts the thread and waits for me to switch to the new thread color.

Reordering

Use the Objects panel to view the stacking order of the objects in your SVG file. ink/stitch will stitch them in their stacking order, from lowest to highest. You can reorder them in the normal way in inkscape to affect the stitching order.

You can also use the Reorder extension. Hold shift and select the objects you’d like to reorder, one at a time, in the order you’d like them to end up in (lowest to highest). Run Embroidery -> Reorder. This extension will pull all of the selected objects out of wherever they were in the stacking order and insert them in order at the same place as the first object you selected. This can save you a ton of time. NOTE: this stopped working in more recent versions of inkscape, which no longer tell the extension what order you selected objects in.

Seeing the stitch plan for selected objects

If you have one or more objects selected when you run the Embroider extension, only those objects will be embroidered. This can be useful to help you fine-tune just one small section of your design.

Embroidery Parameters

When you run Embroider, you’ll have the option to specify a few parameters like stitch length, fill stitch row spacing, etc. These are used as defaults for all objects in your design. You can override these parameters and set many more using the Embroidery -> Params extension.

This extension gives you an interface to control many aspects of the stitching of each object individually. To use it, first select one or more objects. Parameters will be applied to them all as a group. If the selected objects already have parameters set, these settings will be pre-loaded into the interface.

Parameters are stored in your SVG file as additional attributes on the XML objects. You can view these attributes using Inkscape’s built-in XML editor panel, but you shouldn’t actually need to do this during normal usage. Inkscape ignores attributes that it doesn’t know, so these attributes will be saved right along with your SVG file. Note that other SVG programs may not retain these attributes, so be careful!

I recommend avoiding dependence on the default settings specified in the Embroider extension’s settings window. In fact, I bypass it entirely by binding a keystroke (ctrl+e) to “Embroider (no preferences)” in Inkscape’s settings. This way, I can quickly see the stitch plan just by pressing the keystroke. I also bind a keystroke to Params so that I can quickly view and change settings for each object.

Sidenote on extensions

Params is a bit weird, in that the dialog is produced by an entirely separate program (the extension) rather than Inkscape itself. This is due to the way Inkscape structures extensions. I wish ink/stitch could have deeper integration into Inkscape’s user interface, but it’s currently not possible. This is the price we pay for not having to write an entire vector graphics editor program :)

Another issue is that Inkscape has a memory leak related to extensions. The more times you run an extension, the more memory Inkscape uses and the slower it gets. I periodically save my SVG file, close Inkscape, and restart it to work around this issue. See above re: putting up with this kind of hassle so as not to have a to implement an entire vector graphics editor. Hopefully they’ll fix this bug soon.

AutoFill

AutoFill is the default method for generating fill stitching. To use it, create a closed path in Inkscape and add a fill color. This algorithm works for complex shapes with or without holes.

ink/stitch will break the shape up into sections that it can embroider at once using back-and-forth rows of stitches. It then adds straight-stitching between sections until it’s filled in the entire design. The staggered pattern of stitches is continued seamlessly between sections, so the end result doesn’t appear to have any breaks. When moving from one section to the next, it generates running stitching along the border of the shape.

AutoFill parameters

Using the Params extension, you can set these parameters:

  • angle: The angle of the rows of stitches, in degrees. 0 is horizontal, and the angle increases in a counter-clockwise direction. Negative angles are allowed.
  • row spacing: distance between rows of stitches
  • maximum stitch length: the length of each stitch in a row. “Max” is because a shorter stitch may be used at the start or end of a row.
  • running stitch length: length of stitches around the outline of the fill region used when moving from section to section
  • staggers: stitches are staggered so that neighboring rows of stitches don’t all fall in the same column (which would create a distracting valley effect). Setting this dictates how many rows apart the stitches will be before they fall in the same column position.
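To make the interaction of those parameters concrete, here is an illustrative sketch for the simplest possible case, a rectangle filled at angle 0 (this is not ink/stitch's implementation, just the idea):

def fill_rows(width, height, row_spacing, max_stitch_length, staggers):
    rows = []
    row_index = 0
    y = 0.0
    while y <= height:
        # Shift this row's stitch grid so columns only line up every `staggers` rows.
        offset = (row_index % staggers) / staggers * max_stitch_length
        xs = [0.0]
        x = offset or max_stitch_length
        while x < width:
            xs.append(x)
            x += max_stitch_length
        if xs[-1] != width:
            xs.append(width)                   # a shorter stitch finishes the row
        if row_index % 2:
            xs.reverse()                       # rows run back and forth
        rows.append([(round(x, 3), round(y, 3)) for x in xs])
        y += row_spacing
        row_index += 1
    return rows

for row in fill_rows(width=20, height=5, row_spacing=1, max_stitch_length=3, staggers=4):
    print(row)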

AutoFill Underlay

By default, AutoFill will cover the shape with one layer of stitches. In almost all cases, this won’t look any good. The individual stitches will sink into the fabric (even if it’s thin) and the fill will appear sparse. The fabric may even stick up between rows.

To solve this, you need underlay: an initial layer of stitches that holds up the final stitches. Underlay for fill stitch usually consists of fill stitching 90 degrees offset from the final fill (which is called the “top stitching”). The row spacing should be much wider than in the top stitching. The goal is to flatten out the fabric and give the top stitches “rails” to sit on.

In Params, you’ll see an underlay tab next to the AutoFill tab. Enable it by checking the box. The default settings should be good enough for most cases: 90 degrees offset and row spacing 3x the spacing of the top stitching.

Manual Fill

Manual Fill is the old mode from before I figured out how to implement automatic fill routing. In some cases, AutoFill may not be an option, such as when the running stitches between sections are not acceptable for your design. Usually, fill region edges are covered over by satin, but not always.

In manual fill, the extension will still break up the shape into sections, each of which can be embroidered in one go. Then these sections will be fill-stitched one at a time, jumping directly between sections. You’ll almost certainly want to break your shape up into smaller shapes and connect them using running stitches (described below). It’s a painstaking process, made more so because you’ll need to do it twice: once for the underlay and again for the top stitching.

The flip option can help you with routing your stitch path. When you enable flip, stitching goes from right-to-left instead of left-to-right. Using flip and rotating 180 additional degrees (by adding or subtracting 180 from angle), you can cause fill stitching for a given shape to start from any of the four possible corners.

Running Stitch

Running stitch can be created by setting a dashed stroke on a path. Any kind of dashes will do the job, and the stroke width is irrelevant. ink/stitch will create stitches along the path using the stroke width you specify.

In order to avoid rounding corners, an extra stitch will be added at the point of any sharp corner.

The repeats parameter says how many times to run down and back along the path. An odd number of repeats means that the stitches will end at the end of the path, while an even number means that stitching will return to the start of the path. The default is one repeat; that is, just traveling once from the start to the end of the path.

If an object consists of multiple paths, they will be stitched in order with a jump between each.
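Here is a sketch of that behaviour for a single path given as a list of points (a simple linear resampling; ink/stitch's real code is more involved):

import math

def running_stitch(points, stitch_length, repeats=1):
    def resample(pts):
        out = [pts[0]]
        carry = 0.0                            # distance travelled since the last stitch
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            seg = math.hypot(x1 - x0, y1 - y0)
            d = stitch_length - carry          # distance along this segment to the next stitch
            while d < seg:
                t = d / seg
                out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
                d += stitch_length
            carry = seg - (d - stitch_length)
        if out[-1] != pts[-1]:
            out.append(pts[-1])                # always stitch the end point
        return out

    stitches = []
    path = list(points)
    for i in range(repeats):                   # odd repeat counts end at the far end, even return to the start
        leg = resample(path if i % 2 == 0 else list(reversed(path)))
        stitches.extend(leg if not stitches else leg[1:])
    return stitches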

Simple Satin

A line without dashes will result in satin stitching. The width of the satin will be dictated by the stroke width. (For historical reasons, a stroke width less than 0.5 pixels will result in running stitch instead).

This is “simple satin”: Embroider will plot zig-zags to the left and right of the line from start to end, but it won’t do anything special around curves and corners. Sharper curves and corners will result in sparse stitching around the outside of the curve and dense stitching around the inside.

This won’t look good and may even poke holes in the insides of corners. I avoid using plain satin entirely; it’s just kept in for backward compatibility. It’ll probably work fine for straight lines.

Satin Column

Satin Column mode gives you much greater control over how the satin is generated. You define a satin column using a shape made of two mostly-parallel lines. Embroider will draw zig-zags back and forth between the two lines. You can vary the thickness of the column as you like.

The two paths must have the same number of points. This means that each path will be made up of an equal number of Bezier curves. Each pair of points acts as a “checkpoint”: Embroider will ensure that a “zag” ends up going from one point to the other.

Embroider considers each pair of Bezier curves, one at a time. It picks the longer of the two and determines how many zig-zags will be necessary to satisfy the zig-zag spacing setting. This makes it so that the outside of a curve will never have sparse stitching like with simple satin.

However, this does mean that the inside of a curve will have a higher stitch density than you specified. Be careful how you design sharp curves, because stitching at too high a density may poke a hole in the fabric!
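The counting rule reads roughly like this (a sketch; the two rails are given as point lists approximating the Bezier curves):

import math

def path_length(points):
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def zigzag_count(rail_a, rail_b, zigzag_spacing):
    # The longer rail decides, so the outside of a curve never gets sparse.
    longer = max(path_length(rail_a), path_length(rail_b))
    return max(1, math.ceil(longer / zigzag_spacing))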

To avoid this issue, transition your stitching to go around the corner at an angle, like this:

Some embroidery design programs solve this problem differently. They modify the satin such that some stitches on the inside corner don’t go all the way to the edge, to avoid having the needle penetrate the fabric too many times in the same spot. I haven’t gotten around to implementing that yet. Pull requests welcome!

Satin Column supports these settings:

  • zig-zag spacing: the peak-to-peak distance between zig-zags.
  • pull compensation: Satin stitches pull the fabric together, resulting in a column narrower than you draw in Inkscape. This setting expands each pair of needle penetrations outward from the center of the satin column. You’ll have to determine experimentally how much compensation you need for your combination of fabric, thread, and stabilizer.

Satin Column also supports three kinds of underlay, of which you can use any or all simultaneously. I use the terms defined in this excellent article on satin column design.

Center Walk Underlay

This is a row of running stitch down the center of the column and back. This may be all you need for thin satin columns. You can also use it as a base for more elaborate underlay.

Contour Underlay

This is a row of running stitch up one side of the column and back down the other. The rows are set in from the edge of the column by an amount you specify. For small or medium width satin, this may serve well enough by itself.

Zig-Zag Underlay

This is essentially a lower-density satin stitch sewn to the end of the column and back to the start. Added with contour underlay, you get the “German Underlay” mentioned in the article linked above. For wide columns or challenging fabrics, you can use all three underlay types together.

Workflow

Here’s how I use ink/stitch to design embroidery patterns.

Pixels Per Millimeter

My embroidery machine (a Brother SE400) can handle patterns up to 10cm x 10cm (about 4in x 4in). Most machine embroidery design advice articles I’ve read talk in terms of millimeters, so that’s what I work in.

My machine can (theoretically) position the needle with an accuracy of a tenth of a millimeter. The Brother PES format cannot encode a position any more precisely than this. In practice, even if a machine had finer accuracy than this, the realities of sewing on real fabric, even with the best stabilizer, mean that you can’t get any more accurate than this (and you shouldn’t bother trying).

I set the Inkscape’s default document size to 1000 x 1000 pixels and set the “Pixels Per Millimeter” setting in Embroider to ten. This means that every “pixel” in Inkscape is worth a tenth of a millimeter. Practically speaking, there’s no reason I couldn’t choose to have one “pixel” equal one millimeter, because pixels don’t really have much meaning in vector graphics.

Step 1: Sketch design or use an image

First, I get an idea for what I want my finished product to look like. If I’m basing my design off an existing picture or graphic, I load it into Inkscape in its own layer. Some graphics are amenable to Inkscape’s auto-tracing feature, especially if I simplify the image in GIMP first.

After auto-tracing, I clean up the vector shapes, using “Simplify” and deleting nodes by hand when possible. My goal is to use as few Bezier curves as reasonably possible to represent the image.

If I need to trace an image by hand, I usually use the freehand drawing tool. This tool creates paths with a lot of Bezier nodes, so again, I’ll simplify the curves as much as possible.

Working with an existing SVG image can save a ton of time, so consider using Google image search with the filter set to SVG.

For text, choose a font carefully. It’s quite hard to make satin look good when it’s 1mm wide or narrower. Sans-serif fonts tend to be the easiest. For text smaller than 4mm tall, you’ll have a very difficult time making lowercase letters look good, so consider block-caps. Cursive/script fonts can work well, but it’s not going to be as easy as you think. I find that I spend the most time on text by far.

Step 2: Plan stitch path and color changes

At this point, you’ll have a vector graphic representation of your image. The next thing to do is to convert your vectors into the kind that Embroider understands and put them in the right order.

When you’re designing for embroidery machines that can’t cut the thread mid-sew or switch colors automatically, you’re going to want to optimize your stitch path to reduce or hide jump stitches and make minimal color changes. I also try to avoid stitching over jump stitches when possible, because it’s a total pain to trim them by hand when you do.

The order of stitching also affects how the fabric pulls and pushes. Each stitch will distort the fabric, and you’ll need to take this into account and compensate accordingly. Look for articles on machine embroidery distortion for more info on this.

Step 3: create the embroidery vectors

I make heavy use of layers and groups at this point. If I’ve traced an image, I’ll leave it as the lowest layer and set it invisible in the Layers or Objects palette. Any layer, group, or vector shape that is set invisible will be ignored by Embroider.

I keep my initial traced vectors in their own layer and use them as a reference when designing embroidery vectors. I copy and paste them as necessary into a higher layer and work with the copies.

I use only AutoFill and Satin Columns in my designs. I begin by converting filled areas to AutoFill regions. Each time I create an AutoFill shape, I set its parameters using Params. Then I select it and run Embroider, which will cause it to show a stitch plan for just the selected object(s).

I examine the resulting stitch plan using the node editor tool. Each vertex is a single stitch; the needle will penetrate the fabric and interlock with the bobbin thread at this point. Once I’m done examining the stitch plan, I Undo the Embroider operation to remove the stitch plan and make my vectors visible again. Then I make any changes necessary, re-run Embroider, and repeat until it looks right.

At this point, I save my SVG file. If Inkscape is starting to become sluggish (due to the memory leak described above), I’ll restart it before continuing.

Next, I work on Satins. Remember that a Satin Column is defined by two lines that run along the edges of the column. It’s usually a good idea to run satin along the outside border of a fill region. Inkscape makes this easy. I copy and paste the shape from the traced vectors, then disable Fill and enable Stroke. I set the stroke width to my desired satin width. Finally, I use the “Stroke to Path” option to convert just the stroke into its own path.

At this point, it’s necessary to cut the paths so that they aren’t a continuous loop. The cut will also tell Embroider where to start stitching from. Add a point at the desired cut location by double-clicking on the path with the Node Editor tool active. Cut at this point by selecting it and pressing shift+b. Repeat for the second path.

Now you’ve got an object made of two paths. They need to be going in the same direction for Satin Column to work. You can tell what direction the path goes in by enabling direction indicators in Inkscape’s preferences. To reverse one of the paths, select one of its points and choose “Reverse Path”. I bind this to ctrl+r in Inkscape’s preferences.

By now, it’s likely that the two paths in the object have an unequal number of nodes. As described above, a Satin Column is made of consecutive pairs of points on the two paths. You have a couple of techniques available to make your nodes line up in pairs:

  • simplify the shape (control+L)
  • delete extra nodes
  • add more nodes
  • joining nodes

I usually Simplify first if necessary to reduce the path to a manageable number of nodes. Remember that machine embroidery is fairly imprecise and your final product will not have the incredibly fine details that you see on your screen, so simplifying can often be acceptable even if it changes the path.

Next, I try to delete extra nodes or join them. Inkscape will attempt to manipulate the neighboring nodes as necessary to keep the path the same, but I find that it’s not particularly good at this. Instead of deleting a troublesome point, it can often work better to simply add a matching point on the other path.

Finally, I run Embroider with the path selected and examine the output. I add, remove, and adjust points as necessary to make my satin look nice.

You may find that you get the dreaded error, “object has two paths with an unequal number of points”. This can be confusing because it may _look_ like your paths have the same number of points. Usually this is because of duplicate points: multiple points with the exact same coordinates or very similar coordinates. To find them, I drag-select each point in turn, examining the bottom status bar. It will tell me how many points I’ve selected. If it looks like I’ve selected just one but Inkscape says I’ve selected 3, then it’s likely that I’ve found my culprit.

Often just pressing shift+J (join nodes) will eliminate the extra points without modifying the shape. Sometimes it won’t, and in cases like that, I’ve had luck with adding extra points on either side of the “cluster”, then delete the “cluster”. The extra points anchor the shape and disallow Inkscape from messing it up. Then it may be possible to delete the added points, or I just add points to the other path to match with them.
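If you would rather hunt for them programmatically, a tiny sketch along these lines will flag suspicious clusters (coordinates are in SVG user units; the tolerance is a guess, not an ink/stitch setting):

import math

def find_duplicate_nodes(points, tolerance=0.01):
    clusters = []
    for i, (x0, y0) in enumerate(points):
        near = [j for j, (x1, y1) in enumerate(points)
                if i < j and math.hypot(x1 - x0, y1 - y0) <= tolerance]
        if near:
            clusters.append((i, near))
    return clusters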

Step 4: Ordering

Once I’ve created all of my vectors and test-embroidered them individually, it’s time to put everything in the right order. This is where the Objects tool from the latest development version of Inkscape (described above) comes in useful. Because my embroidery machine can neither trim threads nor switch colors mid-sew, I optimize my order to minimize color changes and reduce or hide jump-stitches.

Embroider will stitch objects in exactly the order they appear in your SVG document, from lowest to highest in stacking order. If the distance between two objects is long, Embroider will add a jump-stitch between them automatically. It uses the color of the object to determine thread color, so changes in color from one object to the next will result in a thread-change instruction being added to the embroidery output file.

Inkscape gives you the ability to raise and lower objects in the stacking order using the PageUp and PageDown keys. However, it seems to be unwilling to give the user complete control over the stacking order. I think this has to do with whether objects overlap: if A and B don’t overlap, Inkscape doesn’t care how they’re stacked in the SVG file and it sometimes won’t let you reorder them.

To solve this, I created the Reorder extension. To use it, hold down the shift key and select objects one at a time in the order you’d like them to appear in the SVG file and run Reorder. Reorder will remove the selected objects from the SVG XML tree and then re-add them at the location of the first-selected object, in the order you selected them.

You can also manually manipulate the underlying SVG XML structure by using Inkscape’s XML Editor pane. Its “Raise” and “Lower” buttons directly manipulate the order of XML tags in the SVG file and are not subject to the same limitations as PageUp and PageDown. Note that the ordering of XML tags in the XML Editor tool is the reverse of the order of objects in the Objects tool.

Step 5: Render to CSV

Once I’ve got everything in the right order, I deselect all objects and run Embroider again. This will embroider all visible objects in the document. As described in the Setup section above, I render my embroidery file in CSV format and convert it with EmbroiderModder’s libembroidery-convert utility.

Embroider will create a file in the specified output directory named after your SVG file, but with the extension changed to .csv. It will back up any existing file there, storing up to 5 old copies of each file.

Step 6: Convert to PES and upload

My sewing machine uses the PES format, so I convert the CSV file into a .PES and send it over to my sewing machine. My Brother SE400 acts like a (very small!) USB flash drive. I use a script to do the CSV and upload steps all at once.

Step 7: Test-sew

I’ve never gotten an embroidery file correct on the first try. There’s always room for improvement! To test out my design, I prepare a test piece of fabric that matches my final fabric as closely as possible. I use the same stabilizer and the exact same fabric if possible. For t-shirts, I try to find a similar fabric (usually knit). Knits need a lot of stabilization.

I sew out the design, watching the machine to make sure that there aren’t any surprises. I’m watching for gaps that indicate that the fabric has been distorted. I’m also watching for areas where stitches are piling up too closely and the machine is having trouble sewing, which indicates that the stitch density is too high. Finally, of course, I’m watching in giddy anticipation to see my design for the first time! :)

Step 8+: iterate

Then I go back and tweak my design. Hopefully it only takes a few tries to get it how I want it. Once I’m done, I copy the CSV file from my output directory, just to avoid accidentally overwriting it in the future.

Xerox Alto zero-day: cracking disk password protection on a 45 year old system


We've been archiving a bunch of old Xerox Alto disk packs from the 1970s. A few of them turned out to be password-protected, so I needed to figure out how to get around the password protection. I've developed a way to disable password protection, as well as a program to find the password instantly. (This attack is called XeroDay, based on a suggestion by msla.)

The Xerox Alto. The disk drive is the black unit below the keyboard. The processor is behind the grill. The Mandelbrot set has nothing to do with this article.

The Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. In the photo above, the Alto computer itself is in the lower cabinet. The Diablo disk drive (in 1970s orange, below the keyboard) takes a removable 14 inch disk pack that stores 2.5 megabytes of data. (A bunch of disk packs are visible behind the Alto and in the photo below.) I've been restoring a Xerox Alto, along with Marc Verdiell, Luca Severini and Carl Claunch. (The full set of Alto posts is here and Marc's videos are here.)

Some of the disks from Xerox PARC that we're archiving.

Now that we have the Alto running, one project is to archive a bunch of disks that have been sitting at Xerox PARC for decades, and find out if there are any interesting treasures among the disks. We can archive disks by running the Copydisk program on the Alto, and copying them over Ethernet to a PC server. (Ethernet was invented by Xerox for the Alto.) However, it's considerably faster to read the data directly off the disk drive, bypassing the computer. Carl created an FPGA-based disk controller (below) that connects to the Diablo disk drive, speeding up the archiving process.1

Diablo disk tool, built by Carl Claunch using an FPGA.

Before reading each disk, we open up the pack and carefully clean the surface. After storage for decades, these disks have some grime, dust, and the occasional bug (of the dead insect variety), so we need to clean them to reduce the chance of a head crash.

A Xerox Alto disk pack, opened for cleaning. The "flaws" on the surface are just reflections.

Most of the archived disks can be booted on the Alto or the ContrAlto simulator. But a few disks only booted to a password prompt (see below), and we couldn't use the disk without the password. So I decided to hack my way into the password-protected disks.

Booting a password-protected disk results in a request for the password.

The Alto documentation discusses password protection, explaining how a password can be associated with a disk. It only promises "a modest level of security", which turns out to be true. It also says if you forget the password, "you will need an expert to get access to anything on your disk." But could I break in without finding an expert?

Password protection is described in the Alto User's Handbook page 5.

A bit about passwords

Storing passwords in plain text is a very bad idea, since anyone who can access the file can see the password.9 Most systems use a solution invented by Roger Needham in 1967. Instead of storing the password, you hash the password through a cryptographic one-way function and store the hash. When the user inputs a password, you hash it through the same function and compare the hashes. If they match, the passwords match. And if anyone sees the hash, there's no easy way to get the password back.

One problem with hashed passwords is if two users have the same hash, then you know they have the same password. A solution (invented in Unix) is to hash some random bytes (called salt) along with the password to yield the stored hash. Since different users will have different salt, the hashes will be different even if the passwords are the same. (Of course you need to store the salt along with the hash in order to check passwords.)2 Like Unix, the Alto used salted and hashed passwords.

The Alto's hash algorithm

The source code for the Alto's password algorithm reveals how the password hashing is implemented.3 The Alto uses four words of salt with the password (two words based on the password creation time and two words based on the user name). The password hash is 4 words (64 bits) long. The Alto's password hash algorithm is pretty simple:4

Hash c = -a*x*x + b*y

where a is the time salt and b is the user name salt. x is a one-word value generated from the password string and y is a two-word value from the password string, both generated by concatenating characters from the password.5
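In Python, the structure of the scheme looks roughly like this (deliberately simplified: it ignores the Alto's 16-bit word arithmetic, the truncation that drops older character bits, and the negation bug mentioned in note 4; Password.bcpl is the real thing):

def alto_style_hash(password, a, b):
    # a: time salt, b: user-name salt, both treated as plain integers here.
    password = password.upper()
    x = 0   # built from characters 1, 4, 7, ... as 7-bit values
    y = 0   # built from the remaining characters
    for i, ch in enumerate(password):
        if i % 3 == 0:
            x = (x << 7) | (ord(ch) & 0x7F)
        else:
            y = (y << 7) | (ord(ch) & 0x7F)
    return -a * x * x + b * y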

Disabling the password

There's a way to disable the password on disk, gaining access to the file system.6 The Alto boots from disk by running the file sys.boot; this file decides if a password is required for boot, based on the 9-word password vector stored inside the file. The first word is a flag indicating if the disk is password protected. If the password flag is set, the boot program requires a password before proceeding. The next four words are the salt, and the final four words are the password hash itself.

The password protection can be disabled by clearing the flag word inside sys.boot, which is the 128th word in the second block of sys.boot. The tricky part is finding where this word is on disk. The file system stores a directory as the name of each file along with the disk address of the file's first block. By reading the directory as raw data and interpreting it, we can find the location of sys.boot. In the Alto file system, each disk block has pointers to the previous and next block. So once we've found the first block, we can follow the pointer to find the second block. (Basically I re-implemented a subset of the Alto file system using the raw disk.) Erasing the password flag in this block makes the disk bootable without the password.

After implementing this, I realized there's a short cut. The writers of the disk bootstrap code didn't want to re-implement the file system either, so the Alto simply copies the first block of sys.boot to the first disk sector. This makes it trivial to find the file, without needing to scan the directory. Once I have the first block, I can find the second block from the pointer, access the block, and clear the password flag, disabling the security.7

I made a simple Python program to update the disk image file and clear the password flag (link). After running this program, I was able to access the protected disks. The password-protected disks didn't have any super-secret treasures. However, one disk contained an implementation of APL in Mesa, which was interesting. (APL is a cool programming language known for its extremely terse notation and strange character set. Mesa is a high-level language developed at Xerox PARC; it influenced Java.) We haven't gotten Mesa running on the Alto yet, but this looks like an interesting thing to try out.

After defeating the password, I can view the disk contents. These files implement APL in Mesa.

Brute-forcing the password

While I could access the disk simply by clearing the password flag, I wondered what the original passwords were. I implemented the password algorithm in C (link), so I could rapidly test passwords. My hope was that testing against a list of top 100,000 passwords would find the passwords, since I didn't expect much in the way of 1970s password security practices. Surprisingly, there were no hits so the passwords weren't common words. My next step was brute-forcing the password by generating all strings of length 1, 2, 3 and so forth.8 Finally at length 8, I cracked the first password with "AATFDAFD". The second disk had password "HGFIHD" and the third had "AAJMAKAY". Apparently random strings were popular passwords back then.

I was a bit suspicious when I saw that both 8-character passwords started with "AA" so I investigated a bit more. It turned out that using "AB" in place of "AA" also worked, as did starting with anything in "{A-Z}{A-G}". Studying the algorithm more closely, I realized that when x and y get too long, the old character bits were just dropped. Thus, when you use a password longer than 6 characters, most of the bits from the first characters are lost. This is a pretty big flaw in the algorithm.

Finding the password with math

It takes half an hour or so to brute force the password; can we do better? Yes, by doing some algebra on the password formula yielding:

y = (c + a*x*x) / b

where x and y are generated from the password string, a and b are salt, and c is the stored password hash. Since x is only 16 bits, we can easily try all the values, finding ones for which the division works. When we find a solution for y, we can recover the original password by chopping x and y into 7-bit characters. Using this technique, the password can be recovered almost instantly. I implemented this in Python here.
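Schematically (again glossing over the Alto's word sizes and sign quirks; the linked Python code does it properly):

def seven_bit_chars(n):
    out = []
    while n:
        out.append(chr(n & 0x7F))
        n >>= 7
    return "".join(reversed(out))

def recover_candidates(a, b, c):
    # x is only 16 bits, so try them all and keep the ones where the division is exact.
    for x in range(1 << 16):
        value = c + a * x * x
        if b and value % b == 0:
            y = value // b
            yield x, y, seven_bit_chars(x), seven_bit_chars(y)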

Conclusion

The Xerox Alto's disk passwords can be trivially bypassed, and the password can be easily recovered. The password algorithm has multiple flaws that make it weaker than you'd expect (dropping password characters) and easily reversed. Since the system is almost 45 years old, they had to keep the code small and they weren't facing modern threats. After all, Xerox only promised "modest" security with the passwords, and that's what it provided.

Notes and references

  1. For more details on the archiving process, see Carl's post on archiving. He writes about the FPGA disk tool here

  2. Salting passwords also protects against password attacks using precomputed rainbow tables, but that wasn't a concern back then. 

  3. The password code can be viewed at Password.bcpl. The code is in BCPL, which is a predecessor of C. The syntax has some trivial but confusing differences; I'll explain the most important. p!1 is just an array access, equivalent to p[1] in C. vec 4 allocates 4 words, essentially malloc(4). ps>>String.char↑i is equivalent to ps->String.char[i], accessing a character from the password structure. $a is 'a' and rem is %. Square brackets in BCPL are blocks, like curly braces in C. Also note that the code has inline assembly for the vector addition and multiplication functions. 

  4. The negation in the hash function is broken; only the top two words are negated. The Password.bcpl code points out this bug, but notes that they couldn't fix it without invalidating all the existing passwords. 

  5. The x value is generated by concatenating the 1st, 4th, ... characters of the password, while y consists of the other characters. The concatenation is done using 7-bit characters, for some reason. 

  6. I thought I could boot off an unprotected disk and then use the Neptune graphical file browser to view protected files in the second drive. However, they thought of this and Neptune also checks for a password. In the screenshot below, Neptune shows the contents of the left disk, but requires a password to show the contents of the right disk.

    The Neptune file browser checks for a password.
  7. A few more details about the Alto file system for reference. Each disk sector consists of a 2-word header, an 8-word label, and a 256-word data record, each with a checksum. (The Alto's file system (like the Alto) uses 16-bit little-endian words.) The file system structures are defined in include file AltoFileSys.D. The header contains the physical disk address of the sector (sector number, track number, head number), which is used to validate that the sector returned from the drive is the correct sector. The label contains file system information: pointers to the next and previous sectors in the file, the number of characters used and the file id. This information can also be used to recover from corruption using the scavenger program, which is similar to Unix's fsck. See the Operating System Reference Manual; section 4.2 describes disk files. 

  8. The Alto's password algorithm converted all lower case letters to upper case, which reduced the search space. I'm not sure if special characters are allowed or not. 

  9. The code that calls the password routine has comments about making sure the cleartext password never gets stored on disk, and the in-memory password buffers are zeroed out so they don't get swapped to disk. So it's clear that the writers of the password code were thinking things through. 

More details about mitigations for the CPU Speculative Execution issue

Variant 1: bounds check bypass (CVE-2017-5753)
This attack variant allows malicious code to circumvent bounds checking features built into most binaries. Even though the bounds checks will still fail, the CPU will speculatively execute instructions after the bounds checks, which can access memory that the code could not normally access. When the CPU determines the bounds check has failed, it discards any work that was done speculatively; however, some changes to the system can be still observed (in particular, changes to the state of the CPU caches). The malicious code can detect these changes and read the data that was speculatively accessed.

The primary ramification of Variant 1 is that it is difficult for a system to run untrusted code within a process and restrict what memory within the process the untrusted code can access.

In the kernel, this has implications for systems such as the extended Berkeley Packet Filter (eBPF), which takes packet filters from user space code, just-in-time (JIT) compiles the packet filter code, and runs the packet filter within the context of the kernel. The JIT compiler uses bounds checking to limit the memory the packet filter can access; however, Variant 1 allows an attacker to use speculation to circumvent these limitations.

Mitigation requires analysis and recompilation so that vulnerable binary code is not emitted. Examples of targets which may require patching include the operating system and applications which execute untrusted code.
Variant 2: branch target injection (CVE-2017-5715)
This attack variant uses the ability of one process to influence the speculative execution behavior of code in another security context (i.e., guest/host mode, CPU ring, or process) running on the same physical CPU core.

Modern processors predict the destination for indirect jumps and calls that a program may take and start speculatively executing code at the predicted location. The tables used to drive prediction are shared between processes running on a physical CPU core, and it is possible for one process to pollute the branch prediction tables to influence the branch prediction of another process or kernel code.

In this way, an attacker can cause speculative execution of any mapped code in another process, in the hypervisor, or in the kernel, and potentially read data from the other protection domain using techniques like Variant 1. This variant is difficult to use, but has great potential power as it crosses arbitrary protection domains.
Mitigating this attack variant requires either installing and enabling a CPU microcode update from the CPU vendor (e.g., Intel's IBRS microcode), or applying a software mitigation (e.g., Google's Retpoline) to the hypervisor, operating system kernel, system programs and libraries, and user applications.
Variant 3: rogue data cache load (CVE-2017-5754)
This attack variant allows a user mode process to access virtual memory as if the process was in kernel mode. On some processors, the speculative execution of code can access memory that is not typically visible to the current execution mode of the processor; i.e., a user mode program may speculatively access memory as if it were running in kernel mode.

Using the techniques of Variant 1, a process can observe the memory that was accessed speculatively. On most operating systems today, the page table that a process uses includes access to most physical memory on the system, however access to such memory is limited to when the process is running in kernel mode. Variant 3 enables access to such memory even in user mode, violating the protections of the hardware.
Mitigating this attack variant requires patching the operating system. For Linux, the patchset that mitigates Variant 3 is called Kernel Page Table Isolation (KPTI). Other operating systems/providers should implement similar mitigations.

Mitigations for Google products

You can learn more about mitigations that have been applied to Google’s infrastructure, products, and services here.

About speculative execution vulnerabilities in ARM-based and Intel CPUs


Security researchers have recently uncovered security issues known by two names, Meltdown and Spectre. These issues apply to all modern processors and affect nearly all computing devices and operating systems. All Mac systems and iOS devices are affected, but there are no known exploits impacting customers at this time. Since exploiting many of these issues requires a malicious app to be loaded on your Mac or iOS device, we recommend downloading software only from trusted sources such as the App Store. Apple has already released mitigations in iOS 11.2, macOS 10.13.2, and tvOS 11.2 to help defend against Meltdown. Apple Watch is not affected by Meltdown. In the coming days we plan to release mitigations in Safari to help defend against Spectre. We continue to develop and test further mitigations for these issues and will release them in upcoming updates of iOS, macOS, tvOS, and watchOS.

Background

The Meltdown and Spectre issues take advantage of a modern CPU performance feature called speculative execution. Speculative execution improves speed by operating on multiple instructions at once—possibly in a different order than when they entered the CPU. To increase performance, the CPU predicts which path of a branch is most likely to be taken, and will speculatively continue execution down that path even before the branch is completed. If the prediction was wrong, this speculative execution is rolled back in a way that is intended to be invisible to software.

The Meltdown and Spectre exploitation techniques abuse speculative execution to access privileged memory—including that of the kernel—from a less-privileged user process such as a malicious app running on a device.

Meltdown

Meltdown is a name given to an exploitation technique known as CVE-2017-5754 or "rogue data cache load." The Meltdown technique can enable a user process to read kernel memory. Our analysis suggests that it has the most potential to be exploited. Apple released mitigations for Meltdown in iOS 11.2, macOS 10.13.2, and tvOS 11.2. watchOS did not require mitigation. Our testing with public benchmarks has shown that the changes in the December 2017 updates resulted in no measurable reduction in the performance of macOS and iOS as measured by the GeekBench 4 benchmark, or in common Web browsing benchmarks such as Speedometer, JetStream, and ARES-6.

Spectre

Spectre is a name covering two different exploitation techniques known as CVE-2017-5753 or "bounds check bypass," and CVE-2017-5715 or "branch target injection." These techniques potentially make items in kernel memory available to user processes by taking advantage of a delay in the time it may take the CPU to check the validity of a memory access call.

Analysis of these techniques revealed that while they are extremely difficult to exploit, even by an app running locally on a Mac or iOS device, they can be potentially exploited in JavaScript running in a web browser. Apple will release an update for Safari on macOS and iOS in the coming days to mitigate these exploit techniques. Our current testing indicates that the upcoming Safari mitigations will have no measurable impact on the Speedometer and ARES-6 tests and an impact of less than 2.5% on the JetStream benchmark. We continue to develop and test further mitigations within the operating system for the Spectre techniques, and will release them in upcoming updates of iOS, macOS, tvOS, and watchOS. 

iMac Pro's T2 chip

I’ve spent the last week with Apple’s new iMac Pro, and in most ways it’s just a faster Mac. It's the first pro Mac desktop in over three years and the fastest Mac yet made, granted, but still entirely familiar. And yet in many ways—some noticeable, some entirely invisible—this new Mac is completely different from all past Mac models.

The iMac Pro may be an outlier today, but in the future we’ll probably look back on it as the start of a new era for the Mac, all because of the Apple-built T2 chip it carries inside. Here’s how the T2 makes this iMac Pro unlike all other Macs.

The power behind the throne

The T2 processor isn’t doing the heavy lifting in the iMac Pro—that’s the Intel Xeon processor with between 8 and 14 processor cores. The T2 is the brain behind that brain, running the subsystems of the iMac Pro from a single piece of Apple-built silicon. The result is a simplified internal design that doesn’t require multiple components from multiple manufacturers.

Among the T2's tasks is control over the new iMac cooling system. (Image: Apple)

On most Macs, there are discrete controllers for audio, system management and disk drives. But the T2 handles all these tasks. The T2 is responsible for controlling the iMac Pro’s stereo speakers, internal microphones, and dual cooling fans, all by itself.

A FaceTime camera like no other

The iMac Pro’s FaceTime camera can capture 1080p video, an upgrade from the 5K iMac’s 720p resolution. But this new FaceTime camera is driven by the T2 processor, which means it’s got intelligence that previous FaceTime cameras lacked. Like its cousin processors that drive the iPhone, the T2 has an Apple-designed image signal processor that detects faces in order to properly set exposure and white balance, dynamically adjusts exposure, and does a whole lot more—all in the service of producing a better image, just like what happens when you shoot photos or video with your iPhone.

What flash storage?

The iMac Pro offers between 1TB and 4TB of flash storage, also commonly referred to as an SSD, or “solid-state drive.” This is hardly the first Mac to include SSDs, but it’s a very different approach to storage than previous models.

Most solid-state drives, whether they fit into a drive bay like a spinning hard drive or are reduced to a chip that slides into a slot somewhere, are self-contained—they’re a bank of memory combined with a controller. On the iMac Pro, though, that’s not the case—the SSD that comes with the iMac Pro is actually two banks of NAND memory. (Every iMac Pro has two banks that are “striped” together into a single drive—if you get the 1TB model, your iMac Pro has two 512GB NAND banks; the 4TB model has two 2TB NAND banks.)

In addition to all this external I/O, the iMac Pro has crazy-fast striped SSDs, encrypted and controlled by the T2. (Image: Roman Loyola)

As for the disk controller? There isn’t one—or more accurately, the disk controller is built into the T2 itself. This gives the T2 complete control over internal storage on the iMac Pro. This has some major benefits in terms of speed and security. Every bit of data stored on an iMac Pro’s SSD is encrypted on the fly by the T2, so that if a nefarious person tried to pull out the storage chips and read them later, they’d be out of luck.

(For additional security, Apple strongly suggests you turn on FileVault, which ties SSD encryption to your password. This provides an additional level of security, because your disk can’t be decrypted without the proper hardware and your password.)

All this encryption happens invisibly, so the SSDs in the iMac Pro still operate at full speed—approximately 3GB per second.

Boot twice for safety

You know you’re a Mac nerd when you have opinions about the keys you need to hold down when rebooting while troubleshooting a problem. But on the iMac Pro, booting and rebooting is different—very different. In essence, it’s a two-stage process, first driven by the T2, then driven by the more traditional system boot process.

When you start up the iMac Pro, the familiar Apple logo appears almost immediately. This is a sign that the T2 is taking control. For security reasons, the T2 is the iMac Pro hardware’s “root of trust,” and it validates the entire boot process when the power comes on. The T2 starts up, checks things out, loads its bootloader, verifies that it’s legitimate and cryptographically signed by Apple, and then moves on to the next part of the boot process.

This new boot process means there’s also a new utility for Mac users to get to know: Startup Security Utility, which you can only access by booting into Recovery mode by holding down Command-R while starting up. Startup Security Utility gives the T2 guidance about just how strict it should be when judging whether it should boot your computer.

By default, security is set to Full, which means that only the current operating system or another OS version signed and trusted by Apple—meaning it hasn’t been tampered with in any way—can be booted by the computer. This setting requires a network connection when you attempt to install any OS software updates, because it needs to verify with Apple that the updates are legitimate. You can also set the security level lower, to Medium (which allows older versions of macOS to run regardless of Apple’s level of trust), or turn the feature off entirely, emulating the way all other Macs currently start up.

(This goes for Boot Camp, too—the T2 respects Microsoft’s signing authority for Windows 10 beginning with 2017’s Fall Creators Update, so Boot Camp users can reboot into Windows 10 while remaining fully secure.)

A hybrid Mac? Not quite.

Before the iMac Pro was released, there was a lot of speculation that it was part of a trend toward creating a “hybrid Mac” that is driven by both an Intel processor and an Apple-designed ARM chip like those found in other Apple devices. The iMac Pro is definitely a hybrid of a sort, but probably not the one people were expecting. With the T2, Apple is using its chip-design prowess to take more control over parts of the Mac hardware that were previously outsourced to other controllers, and reaping the benefits of integrating them all together.

The iMac Pro isn’t running iOS apps, but it does get to take advantage of most of the work Apple has done to bolster the security of iOS devices and enhance the quality of photos and video taken by iPhone cameras. Apple will almost certainly continue to push this technology into more future Mac models, because it allows Apple to use the work it’s already done on iOS to improve the features and security of the Mac.


Intel Issues Updates to Protect Systems from Security Exploits

SANTA CLARA, Calif., Jan. 4, 2018 — Intel has developed and is rapidly issuing updates for all types of Intel-based computer systems — including personal computers and servers — that render those systems immune from both exploits (referred to as “Spectre” and “Meltdown”) reported by Google Project Zero. Intel and its partners have made significant progress in deploying updates as both software patches and firmware updates.

Intel has already issued updates for the majority of processor products introduced within the past five years. By the end of next week, Intel expects to have issued updates for more than 90 percent of processor products introduced within the past five years. In addition, many operating system vendors, public cloud service providers, device manufacturers and others have indicated that they have already updated their products and services.

Intel continues to believe that the performance impact of these updates is highly workload-dependent and, for the average computer user, should not be significant and will be mitigated over time. While on some discrete workloads the performance impact from the software updates may initially be higher, additional post-deployment identification, testing and improvement of the software updates should mitigate that impact.

System updates are made available by system manufacturers, operating system providers and others.

Intel will continue to work with its partners and others to address these issues, and Intel appreciates their support and assistance. Intel encourages computer users worldwide to utilize the automatic update functions of their operating systems and other computer software to ensure their systems are up-to-date.

For information and links to useful resources, visit the security research findings page on Intel.com.

Where Pot Entrepreneurs Go When the Banks Say No

Applying for a Safe Harbor checking account is an invasive procedure. The credit union first tries to learn as much as it can about the company and its owners, beginning with an hourlong interview, followed by the document collection that Behzadzadeh found so aggravating: lists of owners, investors, vendors and customers; financial statements and tax forms; the business’s organizing documents; state licenses; leases, handbooks and more. The documents can multiply quickly, because marijuana businesses are often a web of affiliated companies with overlapping ownership. Each often gets its own bank account.

Once the accounts are opened, Safe Harbor’s bankers inspect the business and its premises as frequently as every three months, to confirm that it hews to all of Colorado’s rules. Treasury’s guidance implies that a bank can’t properly make sure that a client stays on the right side of the Cole memo without a rigorous state licensing and regulatory plan in place, according to McVay, the Seattle lawyer. This, he says, explains why California’s large medical-marijuana sector mostly lacks access to banking services: Local governments oversee the business, and with a very light touch. The state won’t implement comprehensive regulations until later this year at the earliest.

Safe Harbor bankers spend most of their time monitoring client transactions, tying every dollar the bank takes in to a legitimate sale and making sure that no dollar withdrawn disappears into the illicit economy. Each month, they reconcile the account activity to a company’s ledger and the sales it reports, as well as to all the other information they’ve gleaned about a client, such as which companies it normally trades with. The credit union has compiled reams of data about how money moves around the marijuana industry, and bankers regularly compare notes about the transactions they see. When an unusual deposit arrives — larger than typical, say, or from a new source — the banker holds the money until the client can account for it.

Seefried recalls how a banker noticed that one dispensary seemed to have customers with unusually deep pockets: its sales averaged $300 to $400 each, while medical-marijuana patients generally spend less than $100 at dispensaries. The banker interrogated the client and, from the parking lot, even staked out the dispensary and studied its customers. In the end, still mystified by how the dispensary made its money, Safe Harbor closed the account. “You cannot bank what you don’t understand,” Seefried says.

Safe Harbor has closed three other accounts after members broke various credit-union rules, indicating that they weren’t willing to be fully transparent with their bankers. Late last summer, Seefried ejected three more members after bankers, working with money-laundering experts, detected suspicious activity. And another three closed their accounts rather than comply with her increasingly strict requirements, including one that customers deposit 90 percent of their receipts. The rule, she acknowledges, was mainly intended to weed out less-committed clients: Only a business trying to hide its transactions would be put off by it.

Last June, the National Credit Union Administration, Partner Colorado’s federal regulator, conducted its annual examination of the credit union’s books and practices, and for the third year in a row, had no complaints about Safe Harbor. Today the program consists of 13 people, including five responsible solely for compliance. Most are women. (Women manage more than half of all credit unions.) When Seefried hires bankers for Partner Colorado, she gravitates toward sociable people. For Safe Harbor, she says she prefers introverts or, at the very least, people who evince precision and curiosity.

This sort of meticulous banking isn’t cheap. For each $100,000 deposited at Safe Harbor, a client pays $450 in fees in the first year and $300 thereafter. (Client companies that serve the marijuana industry but don’t actually sell the drugs, like laboratories, require much less vetting and so pay much lower fees.) Seefried says the Safe Harbor program made a modest profit in its first year — less than $200,000. It became much more lucrative for the credit union in 2016, but Seefried won’t reveal specifics. According to federal data, most of the institution’s sources of income have stayed relatively steady since 2014, but fee income has grown to a projected $5 million in 2017 from nearly $3 million.

Seefried says that about three-quarters of Safe Harbor’s marijuana-selling clients pay less than $1,000 a month per account, considerably less than they would pay at banks, where monthly account fees are said to start at $1,500. Many observers assume that the opportunity to assess lucrative fees is what entices small banks to take on these risky accounts. Federal data show that at Champion Bank, in suburban Denver, which began explicitly working with marijuana businesses in 2014, annual fee revenue on deposit accounts increased by a factor of 68 from 2013 to 2016, to $752,000 from $11,000, even as its main line of business, providing loans, shrank. At Colorado Bank and Trust in La Junta, fees grew elevenfold to $2.9 million.

For marijuana businesses, engaging with the financial mainstream has kept them mostly free from legal trouble. Safe Harbor has received subpoenas for the bank records of only four of the just over 200 clients it has had. None has yet been indicted. John Walsh can recall prosecuting one case against regulated dispensaries and none against a business serving the industry. But working without a checking account makes even an otherwise law-abiding business look evasive, says Lewis Koski, who served as the first director of the state’s Marijuana Enforcement Division, which collects marijuana taxes and regulates the industry, before he joined Andrew Freedman in a consulting venture. A cash-management technique like personal deposits structured to evade notice can legally amount to money laundering. “Banking,” Koski says, “just provides a really clear picture of where money is coming in and where money is going out.”

An employee refills a container at Green Sativa. (Photo: Angie Smith for The New York Times)

On the last Wednesday of last March, Kim Oliver, Partner Colorado’s executive vice president, pulled her old, mud-streaked silver Jetta into Avicenna’s parking lot. She had come from her ranch, east of Denver. She was dressed formally and seemed intently focused on the business at hand. Seefried had reviewed all of Behzadzadeh’s paperwork; now only an inspection stood between him and a bank account. Behzadzadeh gave Oliver a warm welcome, and Elsberg took her ID, signed her in and gave her a visitor badge. Oliver made a mental note: the first small compliance test, passed.

Oliver, who is 59, has worked in credit unions all her adult life. Earlier she had confided to me a lifelong hostility to marijuana and rage when she discovered that her son-in-law, a veteran suffering from PTSD and living in her house, was cultivating a small crop in her basement. But working on Safe Harbor had forced her to reconsider her views on medical marijuana. “I found out that he wasn’t blowing smoke,” she told me. (I couldn’t tell if she intended the pun.) “That was my whole thing: You’re just trying to have an excuse to smoke dope.” She now supplies her arthritic mother with topical creams that contain cannabidiol, a chemical compound in the marijuana flower that doesn’t produce a high but that medical-marijuana advocates believe reduces inflammation.

Oliver sat down with Behzadzadeh and Elsberg at a gray marble table in the kitchenette just inside the employee entrance. She began the formal inspection by going through a checklist: six pages long, scores of questions. The first series delved into Behzadzadeh’s financial practices and the second into business procedures. These were meant to assess Behzadzadeh’s compliance with state regulations. Then she read five or six questions about each of the eight Cole memo priorities. After a response, Oliver made a check — almost always “yes” — and sometimes wrote a little note beside it. “If you had an employee, and they cut a bud off,” she asked at one point, “how would you catch who did that?”

“Cut a bud off” — the phrase in Oliver’s Great Plains twang was disorienting. One of the incongruities in Colorado’s marijuana business is how professionals new to the trade adopt the Mendocino idiom without either irony or any particular reverence, the way their clothes absorb the plant’s scent after a few hours on site. The industry’s brand-builders prefer more sanitized language when talking to the public: “adult use” for “recreational” and “cannabis” for “marijuana.”

Elsberg replied, “That’s the beauty of it — we have cameras on everything.” The questions, and especially the answers, occasionally delved into minutiae, as when Behzadzadeh noted that he had hired a man to drive the perimeter of his two buildings hourly every night. Oliver raised an eyebrow. “He’s legal,” Behzadzadeh said. “I checked his sosh” — Social Security number. “I checked his driver’s license.” He deadpanned, “I sent a copy to Jeff Sessions, too.”

Next, Oliver went about documenting compliance, snapping photos of the cameras, the safes, the locks on cabinets and other equipment, the bars blocking the windows and even the eyewash station. She walked over to a bakery-bun rack and photographed the RFID tag lying next to a brittle sheet of an extract known as shatter. Then she had an employee type the tag number into the state record system to call up that batch’s history. She visited the grow next door (“I’m surprised they have as many cameras that they do,” she remarked approvingly afterward) and then followed Behzadzadeh and Elsberg up to the dispensary in Federal Heights.

The dispensary inspection got off to a rocky start when Oliver noticed the A.T.M. near the entrance. Cash-machine companies often rent them to dispensaries and stores, because so few can accept credit cards, but Behzadzadeh owned his and each week filled it with $2,000 in cash taken from the register. “That’s got to change,” Oliver said matter-of-factly. “The biggest way of laundering money is through the A.T.M.”

Behzadzadeh’s machine was tied to a personal bank account; every time a customer withdrew cash, that account received a credit. But the register money was unaccounted for; it could come from anywhere. Now, Oliver told him, he would have to withdraw money from Safe Harbor and hire an armored car to deliver the cash and fill the A.T.M.; he couldn’t even touch it himself.

A look of incredulity crossed Behzadzadeh’s face. “Wait — I’m not going to give my business to somebody else!” he said. As a start-up entrepreneur, Behzadzadeh nursed a swelling grudge against the contractors and vendors who nickeled and dimed him, he thought, at every turn — maybe even Safe Harbor. The cost of banking was one thing, but the prospect of trading the satchel for an armored car from a company like Blue Line, another niche player that charged a premium in a niche industry, especially galled him. “I have no problem doing business with you guys, and I appreciate the banking, but I’m not going to pay Blue Line to take my money,” he said, his voice rising. “There’s no way.”

Oliver kept pressing. “All they’re doing is loading the machine for you,” she said. “If I wanted to be on the black market, I would come in here and fill this machine all the time to clean the money.”

“Ah,” Behzadzadeh said. Now he understood. He thought out loud: He would bring cash from Avicenna to the dispensary and give it to a courier, who would put different cash, which he had brought with him, into the A.T.M. “While you’re doing that” — Behzadzadeh addressed the imaginary courier — “I’ll make you coffee.”

Behzadzadeh’s dogs were barking ferociously when he and his wife returned home from dinner late one night in September. They didn’t think anything of it, but several hours later, the couple heard noises in their house. Behzadzadeh gathered his wife and children in a bedroom and raced downstairs with a laser-sighted pistol. But the intruders were gone, along with his cellphone — and the satchel.

The robbery could have been a financial disaster. A month earlier, the Denver Fire Department had determined that the extraction room at Avicenna was not sufficiently blastproof, shutting down the factory for two months. “If they had taken $30,000 or $40,000,” Behzadzadeh said, “it would have literally broken my back.” But the satchel held no money that night. Behzadzadeh had opened his bank accounts the morning after the inspection. It had taken him a few months to fully embrace them — Seefried had predicted as much — but now around two-thirds of his trading partners also accept or write checks. What cash remains in Behzadzadeh’s business waits in safes, secured to the floor, for the armored truck.

Yet despite Safe Harbor’s efforts and those of its competitors, bank accounts remain out of reach for many Colorado marijuana companies. The high fees put off smaller businesses at the same time as banks seem to be pulling back. Seefried didn’t want Safe Harbor deposits to swamp the credit union, so last January she reduced the number of new monthly clients from five to three and then closed the door altogether in August after deposits ballooned over the summer. As the state Marijuana Enforcement Division granted licenses to about 220 additional companies last year through November, Safe Harbor’s waiting list swelled to 96 businesses, or two-and-a-half-years’ worth of new clients, before the credit union stopped adding names to it. The five banks have become very stingy about granting new accounts, according to three C.P.A.s who have marijuana-industry clients. The parent company of one of those institutions, Colorado National Bank, declared bankruptcy in November, putting its future as a marijuana banker in doubt.

Marijuana banking has always depended entirely on forbearance from Washington, and the Trump administration seems decidedly less tolerant than its predecessor. In March, Attorney General Jeff Sessions declared before a gathering of law-enforcement officials that marijuana is “only slightly less awful” than heroin. “I reject the idea that America will be a better place if marijuana is sold in every corner store,” he said. A task force assembled in February by the Justice Department to combat violent crime didn’t recommend any changes to the Cole memo when it delivered its initial findings last summer, but the agency is very likely revising it. (The Justice Department wouldn’t comment on its deliberations.) [Update: After this article went to press, the Justice Department rescinded the Cole memo, giving U.S. attorneys more latitude to enforce federal marijuana laws in their districts.]

Seefried had hoped to begin making loans to and processing credit-card transactions for Safe Harbor’s existing clients, but for the foreseeable future, she has put those plans on hold. Until federal law catches up to public sentiment, marijuana banking is unlikely to keep pace with the industry — the vetting is too expensive. Just last month, after a client was caught up in a police investigation, Safe Harbor had to review the account to make sure bankers hadn’t missed any suspicious activity. “It cost me $30,000: $20,000 to bring in investigators for three days and $10,000 for legal fees,” Seefried told me. “That’s one client, with one problem.” She plans to raise fees in the next couple months.

Yet she still hopes to position Safe Harbor as a model for banking the marijuana industry elsewhere. In November, Safe Harbor began licensing its name and protocols to financial institutions nationwide. Six banks and credit unions in six states will begin taking on customers this month, including a credit union in Colorado to serve the customers that Safe Harbor no longer can. Three more will join each quarter. Safe Harbor is also testing a mobile-phone app for buying marijuana in Hawaii’s handful of state-licensed dispensaries.

In June, the National Credit Union Administration, Partner Colorado’s regulator, advised Seefried that it would begin examining the institution quarterly rather than yearly, which she attributes to the national expansion. She has hired two full-time employees just to manage the increased compliance. “We are bringing more credit unions to the table, and they don’t want us teaching them incorrectly,” she says.

One industry investment facilitator, the Arcview Group, estimates that legal marijuana sales in the United States will more than double to roughly $21 billion by 2021 from nearly $9 billion last year. Walsh, the former prosecutor, has concluded that the tension between the federal government and liberalizing states is unsustainable. “I came in as U.S. attorney in 2010 assuming that it was my job to enforce the federal marijuana laws regardless of what state legalization efforts would look like,” he says. “The longer I worked on the issue, and we struggled with it, the more I realized that a simple shut-it-down approach was not practical. The notion that you can put this genie back in the bottle today is not realistic.”


Announcing the OpenWrt/LEDE merge

Both the OpenWrt and LEDE projects are happy to announce their unification under the OpenWrt name.

After long and sometimes slowly moving discussions about the specifics of the re-merge, with multiple similar proposals but little subsequent action, we’re happy to announce that both projects are about to execute the final steps of the merger.

The new, unified OpenWrt project will be governed under the rules established by the LEDE project. Active members of both the former LEDE and OpenWrt projects will continue working on the unified OpenWrt.

LEDE’s fork and subsequent re-merge into OpenWrt will not alter the overall technical direction taken by the unified project. We will continue to work on improving stability and release maintenance while aiming for frequent minor releases to address critical bugs and security issues, as we have done with LEDE 17.01 and its four point releases so far.

Old OpenWrt releases predating CC 15.05 will no longer be supported by the merged project, leaving those releases without any future security or bug fixes. The OpenWrt CC 15.05 release series will receive a limited number of security and bug fixes, but it is not yet fully integrated into our release automation, so binary releases are lagging behind for now.

The LEDE 17.01 release will continue to get full security and bug fix support for both source code and binary releases.
We are planning a new major release under the new name in the next few months.

The merged project will use the code base of the former LEDE project. OpenWrt-specific patches not present in the LEDE repository that met LEDE’s code-quality requirements have been integrated into the new tree. The source code will be hosted at git.openwrt.org, with a continuously synchronized mirror hosted on GitHub. The original OpenWrt codebase has been archived on GitHub for future reference.

Pull requests still open against the old OpenWrt and LEDE repositories will be closed after a 30-day grace period. We encourage people to open new pull requests against the new repository or to re-send any still-unmerged patches.

The re-merged OpenWrt project is legally represented by Software in the Public Interest (SPI), a US 501(c)(3) non-profit organization that manages the OpenWrt trademark, handles our donations, and helps us with legal matters.

Infrastructure currently available under the lede-project.org domain will be moved to corresponding openwrt.org subdomains and redirects will be put in place where appropriate.

How to merge the OpenWrt and LEDE forums and wikis has not yet been fully decided, so all portals will remain available under their respective domains until a definite consensus is reached.

The dawn of online piracy (2015)

One Saturday in 1994, Bennie Lydell Glover, a temporary employee at the PolyGram compact-disk manufacturing plant in Kings Mountain, North Carolina, went to a party at the house of a co-worker. He was angling for a permanent position, and the party was a chance to network with his managers. Late in the evening, the host put on music to get people dancing. Glover, a fixture at clubs in Charlotte, an hour away, had never heard any of the songs before, even though many of them were by artists whose work he enjoyed.

Later, Glover realized that the host had been d.j.’ing with music that had been smuggled out of the plant. He was surprised. Plant policy required all permanent employees to sign a “No Theft Tolerated” agreement. He knew that the plant managers were concerned about leaking, and he’d heard of employees being arrested for embezzling inventory. But at the party, even in front of the supervisors, it seemed clear that the disks had been getting out. In time, Glover became aware of a far-reaching underground trade in pre-release disks. “We’d run them in the plant in the week, and they’d have them in the flea markets on the weekend,” he said. “It was a real leaky plant.”

The factory sat on a hundred acres of woodland and had more than three hundred thousand square feet of floor space. It ran shifts around the clock, every day of the year. New albums were released in record stores on Tuesdays, but they needed to be pressed, packaged, and shrink-wrapped weeks in advance. On a busy day, the plant produced a quarter of a million CDs. Its lineage was distinguished: PolyGram was a division of the Dutch consumer-electronics giant Philips, the co-inventor of the CD.

One of Glover’s co-workers was Tony Dockery, another temporary hire. The two worked opposite ends of the shrink-wrapping machine, twelve feet apart. Glover was a “dropper”: he fed the packaged disks into the machine. Dockery was a “boxer”: he took the shrink-wrapped jewel cases and stacked them in a cardboard box for shipping. The jobs paid about ten dollars an hour.

Glover and Dockery soon became friends. They lived in the same town, Shelby, and Glover started giving Dockery a ride to work. They liked the same music. They made the same money. Most important, they were both fascinated by computers, an unusual interest for two working-class Carolinians in the early nineties—the average Shelbyite was more likely to own a hunting rifle than a PC. Glover’s father had been a mechanic, and his grandfather, a farmer, had moonlighted as a television repairman. In 1989, when Glover was fifteen, he went to Sears and bought his first computer: a twenty-three-hundred-dollar PC clone with a one-color monitor. His mother co-signed as the guarantor on the layaway plan. Tinkering with the machine, Glover developed an expertise in hardware assembly, and began to earn money fixing the computers of his friends and neighbors.

By the time of the party, he’d begun to experiment with the nascent culture of the Internet, exploring bulletin-board systems and America Online. Soon, Glover also purchased a CD burner, one of the first produced for home consumers. It cost around six hundred dollars. He began to make mixtapes of the music he already owned, and sold them to friends. “There was a lot of people down my way selling shoes, pocketbooks, CDs, movies, and fencing stolen stuff,” he told me. “I didn’t think they’d ever look at me for what I was doing.” But the burner took forty minutes to make a single copy, and business was slow.

Glover began to consider selling leaked CDs from the plant. He knew a couple of employees who were smuggling them out, and a pre-release album from a hot artist, copied to a blank disk, would be valuable. (Indeed, recording executives at the time saw this as a key business risk.) But PolyGram’s offerings just weren’t that good. The company had a dominant position in adult contemporary, but the kind of people who bought knockoff CDs from the trunk of a car didn’t want Bryan Adams and Sheryl Crow. They wanted Jay Z, and the plant didn’t have it.

By 1996, Glover, who went by Dell, had a permanent job at the plant, with higher pay, benefits, and the possibility of more overtime. He began working double shifts, volunteering for every available slot. “We wouldn’t allow him to work more than six consecutive days,” Robert Buchanan, one of his former managers, said. “But he would try.”

The overtime earnings funded new purchases. In the fall of 1996, Hughes Network Systems introduced the country’s first consumer-grade broadband satellite Internet access. Glover and Dockery signed up immediately. The service offered download speeds of up to four hundred kilobits per second, seven times that of even the best dial-up modem.

Glover left AOL behind. He soon found that the real action was in the chat rooms. Internet Relay Chat networks tended to be noncommercial, hosted by universities and private individuals and not answerable to corporate standards of online conduct. You created a username and joined a channel, indicated by a pound sign: #politics, #sex, #computers. Glover and Dockery became chat addicts; sometimes, even after spending the entire day together, they hung out in the same chat channel after work. On IRC, Dockery was St. James, or, sometimes, Jah Jah. And Glover was ADEG, or, less frequently, Darkman. Glover did not have a passport and hardly ever left the South, but IRC gave him the opportunity to interact with strangers from all over the world.

Also, he could share files. Online, pirated media files were known as “warez,” from “software,” and were distributed through a subculture dating back to at least 1980, which called itself the Warez Scene. The Scene was organized in loosely affiliated digital crews, which raced one another to be the first to put new material on the IRC channel. Software was often available on the same day that it was officially released. Sometimes it was even possible, by hacking company servers, or through an employee, to pirate a piece of software before it was available in stores. The ability to regularly source pre-release leaks earned one the ultimate accolade in digital piracy: to be among the “elite.”

By the mid-nineties, the Scene had moved beyond software piracy into magazines, pornography, pictures, and even fonts. In 1996, a Scene member with the screen name NetFraCk started a new crew, the world’s first MP3 piracy group: Compress ’Da Audio, or CDA, which used the newly available MP3 standard, a format that could shrink music files by more than ninety per cent. On August 10, 1996, CDA released to IRC the Scene’s first “officially” pirated MP3: “Until It Sleeps,” by Metallica. Within weeks, there were numerous rival crews and thousands of pirated songs.

Glover’s first visit to an MP3-trading chat channel came shortly afterward. He wasn’t sure what an MP3 was or who was making the files. He simply downloaded software for an MP3 player, and put in requests for the bots of the channel to serve him files. A few minutes later, he had a small library of songs on his hard drive.

One of the songs was Tupac Shakur’s “California Love,” the hit single that had become inescapable after Tupac’s death, several weeks earlier, in September, 1996. Glover loved Tupac, and when his album “All Eyez on Me” came through the PolyGram plant, in a special distribution deal with Interscope Records, he had even shrink-wrapped some of the disks. Now he played the MP3 of “California Love.” Roger Troutman’s talk-box intro came rattling through his computer speakers, followed by Dr. Dre’s looped reworking of the piano hook from Joe Cocker’s “Woman to Woman.” Then came Tupac’s voice, compressed and digitized from beyond the grave, sounding exactly as it did on the CD.

At work, Glover manufactured CDs for mass consumption. At home, he had spent more than two thousand dollars on burners and other hardware to produce them individually. His livelihood depended on continued demand for the product. But Glover had to wonder: if the MP3 could reproduce Tupac at one-eleventh the bandwidth, and if Tupac could then be distributed, free, on the Internet, what the hell was the point of a compact disk?
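
The "one-eleventh" figure follows from standard audio arithmetic: uncompressed CD audio streams at about 1,411 kbps (44,100 samples per second, 16 bits per sample, two channels), while a typical MP3 of the era was encoded at 128 kbps. A minimal sketch of that calculation (the constants are generic CD and MP3 parameters, not figures taken from the article):

#include <stdio.h>

int main(void) {
    /* Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels */
    double cd_kbps  = 44100.0 * 16 * 2 / 1000.0;    /* about 1,411 kbps */
    double mp3_kbps = 128.0;                         /* common MP3 bitrate of the era */
    printf("CD: %.0f kbps, MP3: %.0f kbps, ratio %.1f:1\n",
           cd_kbps, mp3_kbps, cd_kbps / mp3_kbps);   /* prints a ratio of roughly 11:1 */
    return 0;
}

Even at 192 kbps, the rate used in the RNS rips described later, the ratio is still better than 7:1.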

In 1998, Seagram Company announced that it was purchasing PolyGram from Philips and merging it with the Universal Music Group. The deal comprised the global pressing and distribution network, including the Kings Mountain plant. The employees were nervous, but management told them not to worry; the plant wasn’t shutting down—it was expanding. The music industry was enjoying a period of unmatched profitability, charging more than fourteen dollars for a CD that cost less than two dollars to manufacture. The executives at Universal thought that this state of affairs was likely to continue. In the prospectus that they filed for the PolyGram acquisition, they did not mention the MP3 among the anticipated threats to the business.

The production lines were upgraded to manufacture half a million CDs a day. There were more shifts, more overtime hours, and more music. Universal, it seemed, had cornered the market on rap. Jay Z, Eminem, Dr. Dre, Cash Money—Glover packaged the albums himself.

Six months after the merger, Shawn Fanning, an eighteen-year-old college dropout from Northeastern University, débuted a public file-sharing platform he had invented called Napster. Fanning had spent his adolescence in the same IRC underground as Glover and Dockery, and was struck by the inefficiency of its distribution methods. Napster replaced IRC bots with a centralized “peer-to-peer” server that allowed people to swap files directly. Within a year, the service had ten million users.

Before Napster, a leaked album had caused only localized damage. Now it was a catastrophe. Universal rolled out its albums with heavy promotion and expensive marketing blitzes: videos, radio spots, television campaigns, and appearances on late-night TV. The availability of pre-release music on the Internet interfered with this schedule, upsetting months of work by publicity teams and leaving the artists feeling betrayed.

Even before Napster’s launch, the plant had begun to implement a new anti-theft regimen. Steve Van Buren, who managed security at the plant, had been pushing for better safeguards since before the Universal merger, and he now instituted a system of randomized searches. Each employee was required to swipe a magnetized identification card upon leaving the plant. Most of the time, a green light appeared and the employee could leave. Occasionally, though, the card triggered a red light, and the employee was made to stand in place as a security guard ran a wand over his body, searching for the thin aluminum coating of a compact disk.

Van Buren succeeded in getting some of the flea-market bootleggers shut down. Plant management had heard of the technician who had been d.j.’ing parties with pre-release music, and Van Buren requested that he take a lie-detector test. The technician failed, and was fired. Even so, Glover’s contacts at the plant could still reliably get leaked albums. One had even sneaked out an entire manufacturing spindle of three hundred disks, and was selling them for five dollars each. But this was an exclusive trade, and only select employees knew who was engaged in it.

By this time, Glover had built a tower of seven CD burners, which stood next to his computer. He could produce about thirty copies an hour, which made bootlegging more profitable, so he scoured the other underground warez networks for material to sell: PlayStation games, PC applications, MP3 files—anything that could be burned to a disk and sold for a few dollars.

He focussed especially on movies, which fetched five dollars each. New compression technology could shrink a feature film to fit on a single CD. The video quality was poor, but business was brisk, and soon he was buying blank CDs in bulk. He bought a label printer to catalogue his product, and a color printer to make mockups of movie posters. He filled a black nylon binder with images of the posters, and used it as a sales catalogue. He kept his inventory in the trunk of his Jeep and sold the movies out of his car.

Glover still considered it too risky to sell leaked CDs from the plant. Nevertheless, he enjoyed keeping up with current music, and the smugglers welcomed him as a customer. He was a permanent employee with no rap sheet and an interest in technology, but outside the plant he had a reputation as a roughrider. He owned a Japanese street-racing motorcycle, which he took to Black Bike Week, in Myrtle Beach. He had owned several handguns, and on his forearm was a tattoo of the Grim Reaper, walking a pit bull on a chain.

His co-worker Dockery, by contrast, was a clean-cut churchgoer, and too square for the smugglers. But he had started bootlegging, too, and he pestered Glover to supply him with leaked CDs. In addition, Dockery kept finding files online that Glover couldn’t: movies that were still in theatres, PlayStation games that weren’t scheduled to be released for months.

For a while, Glover traded leaked disks for Dockery’s software and movies. But eventually he grew tired of acting as Dockery’s courier, and asked why the disks were so valuable. Dockery invited him to his house one night, where he outlined the basics of the warez underworld. For the past year or so, he’d been uploading the pre-release leaks Glover gave him to a shadowy network of online enthusiasts. This was the Scene, and Dockery, on IRC, had joined one of its most élite groups: Rabid Neurosis, or RNS. (Dockery declined to comment for this story.)

Instead of pirating individual songs, RNS was pirating entire albums, bringing the pre-release mentality from software to music. The goal was to beat the official release date whenever possible, and that meant a campaign of infiltration against the major labels.

The leader of RNS went by the handle Kali. He was a master of surveillance and infiltration, the Karla of music piracy. It seemed that he spent hours each week researching the confusing web of corporate acquisitions and pressing agreements that determined where and when CDs would be manufactured. With this information, he built a network of moles who, in the next eight years, managed to burrow into the supply chains of every major music label. “This stuff had to be his life, because he knew about all the release dates,” Glover said.

Dockery—known to Kali as St. James—was his first big break. According to court documents, Dockery encountered several members of RNS in a chat room, including Kali. Here he learned of the group’s desire for pre-release tracks. He soon joined RNS and became one of its best sources. But, when his family life began to interfere, he proposed that Glover take his place.

Glover hesitated: what was in it for him?

He learned that Kali was a gatekeeper to the secret “topsite” servers that formed the backbone of the Scene. The ultra-fast servers contained the best pirated media of every form. The Scene’s servers were well hidden, and log-ons were permitted only from pre-approved Internet addresses. The Scene controlled its inventory as tightly as Universal did—maybe tighter.

If Glover was willing to upload smuggled CDs from the plant to Kali, he’d be given access to these topsites, and he’d never have to pay for media again. He could hear the new Outkast album weeks before anyone else did. He could play Madden NFL on his PlayStation a month before it became available in stores. And he could get the same movies that had allowed Dockery to beat him as a bootlegger.

Dockery arranged a chat-room session for Glover and Kali, and the two exchanged cell-phone numbers. In their first call, Glover mostly just listened. Kali spoke animatedly, in a patois of geekspeak, California mellow, and slang borrowed from West Coast rap. He loved computers, but he also loved hip-hop, and he knew all the beefs, all the disses, and all the details of the feuds among artists on different labels. He also knew that, in the aftermath of the murders of Tupac and the Notorious B.I.G., those feuds were dying down. Def Jam, Cash Money, and Interscope had all signed distribution deals with Universal. Kali’s research kept taking him back to the Kings Mountain plant.

He and Glover hashed out the details of their partnership. Kali would track the release dates of upcoming albums and tell Glover which material he was interested in. Glover would acquire smuggled CDs from the plant. He would then rip the leaked CDs to the MP3 format and, using encrypted channels, send them to Kali’s home computer. Kali packaged the MP3s according to the Scene’s exacting technical standards and released them to its topsites.

The deal sounded good to Glover, but to fulfill Kali’s requests he’d have to get new albums from the plant much more frequently, three or four times a week. This would be difficult. In addition to the randomized search gantlet, a fence had been erected around the parking lot. Emergency exits set off alarms. Laptop computers were forbidden in the plant, as were stereos, portable players, boom boxes, and anything else that might accept and read a CD.

Every once in a while, a marquee release would come through—“The Eminem Show,” say, or Nelly’s “Country Grammar.” It arrived in a limousine with tinted windows, carried from the production studio in a briefcase by a courier who never let the master tape out of his sight. When one of these albums was pressed, Van Buren ordered wandings for every employee in the plant.

The CD-pressing machines were digitally controlled, and they generated error-proof records of their output. The shrink-wrapped disks were logged with an automated bar-code scanner. The plant’s management generated a report, tracking which CDs had been printed and which had actually shipped, and any discrepancy had to be accounted for. The plant might now press more than half a million copies of a popular album in a day, but the inventory could be tracked at the level of the individual disk.

Employees like Glover, who worked on the packaging line, had the upper hand when it came to smuggling CDs. Farther down the line and the disks would be bar-coded and logged in inventory; farther up and they wouldn’t have access to the final product. By this time, the packaging line was becoming increasingly complex. The chief advantage of the compact disk over the MP3 was the satisfaction of owning a physical object. Universal was really selling packaging. Album art had become ornate. The disks were gold or fluorescent, the jewel cases were opaque blue or purple, and the album sleeves were thick booklets printed on high-quality paper. Dozens, sometimes hundreds, of extra disks were now being printed for every run, to be used as replacements in case any were damaged during packaging.

At the end of each shift, employees put the overstock disks into scrap bins. These scrap bins were later taken to a plastics grinder, where the disks were destroyed. Over the years, Glover had dumped hundreds of perfectly good disks into the bins, and he knew that the grinder had no memory and generated no records. If there were twenty-four disks and only twenty-three made it into the grinder’s feed slot, no one in accounting would know.

So, on the way from the conveyor belt to the grinder, an employee could take off his surgical glove while holding a disk. He could wrap the glove around the disk and tie it off. He could then hide the disk, leaving everything else to be destroyed. At the end of his shift, he could return and grab the disk.

That still left the security guards. But here, too, there were options. One involved belt buckles. They were the signature fashion accessories of small-town North Carolina. Many people at the plant wore them—big oval medallions with the Stars and Bars on them. Gilt-leaf plates embroidered with fake diamonds that spelled out the word “boss.” Western-themed cowboy buckles with longhorn skulls and gold trim. The buckles always set off the wand, but the guards wouldn’t ask anyone to take them off.

Hide the disk inside the glove; hide the glove inside a machine; retrieve the glove and tuck it into your waistband; cinch your belt so tight it hurts your bladder; position your oversized belt buckle in front of the disk; cross your fingers as you shuffle toward the turnstile; and, if you get flagged, play it very cool when you set off the wand.

From 2001 on, Glover was the world’s leading leaker of pre-release music. He claims that he never smuggled the CDs himself. Instead, he tapped a network of low-paid temporary employees, offering cash or movies for leaked disks. The handoffs took place at gas stations and convenience stores far from the plant. Before long, Glover earned a promotion, which enabled him to schedule the shifts on the packaging line. If a prized release came through the plant, he had the power to ensure that his man was there.

The pattern of label consolidation had led to a stream of hits at Universal’s factory. Weeks before anyone else, Glover had the hottest albums of the year. He ripped the albums on his PC with software that Kali had sent, and then uploaded the files to him. The two made weekly phone calls to schedule the timing of the leaks.

Glover left the distribution to Kali. Unlike many Scene members, he didn’t participate in technical discussions about the relative merits of constant and variable bit rates. He listened to the CDs, but he often grew bored after only one or two plays. When he was done with a disk, he stashed it in a black duffelbag in his bedroom closet.

By 2002, the duffelbag held more than five hundred disks, including nearly every major release to have come through the Kings Mountain plant. Glover leaked Lil Wayne’s “500 Degreez” and Jay Z’s “The Blueprint.” He leaked Queens of the Stone Age’s “Rated R” and 3 Doors Down’s “Away from the Sun.” He leaked Björk. He leaked Ashanti. He leaked Ja Rule. He leaked Nelly. He leaked Blink-182’s “Take Off Your Pants and Jacket.”

Glover didn’t have access to big-tent mom-rock artists like Celine Dion and Cher. But his albums tended to be the most sought after in the demographic that mattered: generation Eminem. The typical Scene participant was a computer-obsessed male, between the ages of fifteen and thirty. Kali—whose favorite artists included Ludacris, Jay Z, and Dr. Dre—was the perfect example. For Glover, the high point of 2002 came in May, when he leaked “The Eminem Show” twenty-five days before its official release. The leak made its way from the Scene’s topsites to public peer-to-peer networks within hours, and, even though the album became the year’s best-seller, Eminem was forced to bump up its release date.

Every Scene release was accompanied by an NFO (from “info”), an ASCII-art text file that served as the releasing group’s signature tag. NFO files were a way for Scene crews to brag about their scores, shout out important associates, and advertise to potential recruits. Rabid Neurosis NFOs were framed by psychedelic smoke trails emanating from a marijuana leaf at the bottom:

Team Rns Presents

Artist: Eminem

Title: The Eminem Show

Label: Aftermath

Ripper: Team RNS

192 kbps-Rap

1hr 17min total-111.6 mb

Release Date: 2002-06-04

Rip Date: 2002-05-10

The most important line was the rip date, which emphasized the timeliness of the leak. Kali drafted many of the release notes himself, in a sarcastic tone, often taunting rival releasing groups. “The Eminem Show” NFO ended with a question: “Who else did you think would get this?”

Who was Kali? Glover wasn’t sure, but as their relationship evolved he picked up some clues. Kali’s 818 area code was from the Los Angeles region. The voice in the background that Glover sometimes heard on the calls sounded as if it might be Kali’s mother. There was also the marijuana leaf that served as RNS’s official emblem: Glover thought he could tell when Kali was high. Most striking was the exaggerated hip-hop swagger that Kali affected. He only ever referred to Glover as “D.” No one else called him that.

“He would try to talk, like, with a slang,” Glover told me. “Kinda cool, kinda hard.” Glover suspected that Kali wasn’t black, though he sensed that he probably wasn’t white, either.

Glover was not permitted to interact with the other members of the group, not even the one who served as the “ripping coördinator.” His online handle was RST, and his name was Simon Tai. A second-generation Chinese immigrant, Tai was brought up in Southern California before arriving at the University of Pennsylvania, in 1997. As a freshman with a T1 Internet connection, he’d been in awe of RNS. After hanging around in the chat channel for nearly a year, he was asked to join.

He also applied for a slot as a d.j. at the school’s radio station. For two years, Kali cultivated Tai’s interest in rap music and told him to make connections with the promotional people at various labels. In 2000, Tai, now a senior at Penn, was promoted to music director at the station and given a key to the office, where he had access to the station’s promo disks. Every day, he checked the station’s mail; when something good came in, he raced back to his dorm room to upload it. Beating rival Scene crews was sometimes a matter of seconds.

Tai scored two major leaks that year, Ludacris’s “Back for the First Time” and Outkast’s “Stankonia.” With his Scene credentials established, for the next two years Tai managed RNS’s roster of leakers. Along with Kali, he tracked the major labels’ distribution schedules and directed his sources to keep an eye out for certain albums.

To find the albums, RNS had international contacts at every level, who went by anonymous online handles. According to court testimony and interviews with Scene members, there were the radio d.j.s: BiDi, in the South; DJ Rhino, in the Midwest. There was the British music journalist who went by KSD, whose greatest coup was 50 Cent’s “lost” début, “Power of the Dollar,” scheduled for release in 2000 by Columbia, but cancelled after the rapper was shot. There was DaLive1, a house-music aficionado who lived in New York City, and used his connections inside Viacom to source leaks from Black Entertainment Television and MTV. There were two Italian brothers sharing the handle Incuboy, who claimed to run a music-promotion business and had reliable access to releases from Sony and Bertelsmann. In Japan, albums sometimes launched a week or two ahead of the U.S. release date, often with bonus tracks, and Tai relied on kewl21 and x23 to source them. Finally, there were the Tuesday rippers, like Aflex and Ziggy, who spent their own money to buy music legally the day that it appeared in stores.

The only leaker Tai didn’t manage was Glover—Kali kept his existence a secret, even from the other members of the group. Glover resented the isolation, but being Kali’s private source was worth the trouble. At any given time, global Scene membership amounted to no more than a couple of thousand people. Kali was close to the top. A typical Scene pirate, bribing record-store employees and cracking software, might be granted access to three or four topsites. By 2002, Glover had access to two dozen.

His contacts made him an incomparable movie bootlegger. He built another tower to replace the first, with burners for DVDs instead of CDs. He upgraded his Internet connection from satellite to cable. He downloaded the past few years’ most popular movies from the topsites, then burned a couple of dozen copies of each. Expanding his customer base beyond his co-workers, he started meeting people in the parking lot of a nearby convenience store. Around Cleveland County, Glover became known as “the movie man.” For five dollars, he would sell you a DVD of “Spider-Man” weeks before it was available at Blockbuster, sometimes even while it was still in theatres.

Glover started selling between two hundred and three hundred DVDs a week, frequently making more than a thousand dollars in cash. He built a second PC and another burn tower to keep up with demand. He knew that this was illegal, but he felt certain that he had insulated himself from suspicion. All transactions were hand to hand, no records were kept, and he never deposited his earnings in the bank. He didn’t sell music, DVDs weren’t made at the Universal plant, and he was sure that his customers had never heard of the Scene.

Scene culture drew a distinction between online file-sharing and for-profit bootlegging. The topsites were seen as a morally permissible system of trade. Using them for the physical bootlegging of media, by contrast, was viewed as a serious breach of ethical principles. Worse, it was known to attract the attention of the law. Kali put the word out that anyone suspected of selling material from the topsites would be kicked out of the group. Thus, for most participants membership in RNS was a money-losing proposition. They spent hundreds of dollars a year on compact disks, and thousands on servers and broadband, and got only thrills in return.

Glover was an exception: he knew that he wouldn’t be kicked out of anything. With Universal’s rap acts ascending, Kali needed Glover.

Napster lasted barely two years, in its original incarnation, but at its peak the service claimed more than seventy million registered accounts, with users sharing more than two billion MP3 files a month. Music piracy became to the early two-thousands what drug experimentation had been to the late nineteen-sixties: a generation-wide flouting of both social norms and the existing body of law, with little thought for consequences. In late 1999, the Recording Industry Association of America, the music business’s trade and lobbying group, sued Napster, claiming that the company was facilitating copyright infringement on an unprecedented scale. Napster lost the lawsuit, appealed, and lost again. In July, 2001, facing a court order to stop enabling the trade of copyrighted files, Napster shut down its service.

That legal victory achieved little. Former users of Napster saw Internet file-sharing as an undeniable prerogative, and instead of returning to the record stores they embraced gray-market copycats of Napster, like Kazaa and Limewire. By 2003, global recording-industry revenues had fallen from their millennial peak by more than fifteen per cent. The losing streak continued for the next decade.

The R.I.A.A. tried to reassert the primacy of the industry’s copyrights. But civil suits against the peer-to-peer services took years to move through the appeals courts, and the R.I.A.A.’s policy of suing individual file-sharers was a public-relations disaster. To some at the music labels, Congress seemed disinclined to help. Harvey Geller, Universal’s chief litigator, spent years futilely petitioning legislators for better enforcement of copyright law. “Politicians pander to their constituents,” Geller said. “And there were more constituents stealing music than constituents selling it.”

Leaking was viewed differently. No one was advocating for the smuggler. So album leakers adhered to a rigid code of silence. Scene groups were the source for almost all of the new releases available on the peer-to-peer networks, but most file-sharers didn’t even suspect their existence. Civil litigation against such actors was impossible: unlike Kazaa, RNS did not have a business address to which a subpoena could be sent. Only criminal prosecutions would work.

In January, 2003, Glover leaked 50 Cent’s official début, “Get Rich or Die Tryin’,” to Kali. It became the bestselling U.S. album of the year. He followed that up with albums from Jay Z, G Unit, Mary J. Blige, Big Tymers, and Ludacris, and then began the following year with Kanye West’s début, “The College Dropout.” After a scare, in which Glover worried that a release might be traced to him, the timing of leaks became more and more a point of focus. Glover’s leaks began to hit the Internet about two weeks before the CDs were due in stores, neither so early that the leak could be traced to the plant nor so late that RNS risked being bested by other pirates.

The group’s ascendancy came during a period of heightened scrutiny by law enforcement. In April, 2004, the F.B.I. and foreign law-enforcement agencies conducted coördinated raids in eleven countries, identifying more than a hundred pirates. The R.I.A.A.’s anti-piracy unit was staffed with investigators, who hung around the chat rooms of the Scene and learned its language. They tried to infiltrate the Scene, and tracked the leaked material and its dissemination throughout the Internet. Their research began to point them to one increasingly powerful crew, RNS, and they shared their findings with the F.B.I.

Journalists poked around the fringes of the Scene, too. A December, 2004, article in Rolling Stone, by Bill Werde, introduced RNS to the general public. A photo caption in the piece read, “In a four-day period, one group leaked CDs by U2, Eminem and Destiny’s Child.” The article quoted a source close to Eminem: “The rapper’s camp believes Encore was leaked when it went to the distributors, who deliver albums from the pressing plants to chain stores such as Wal-Mart.”

The information was wrong. The CD hadn’t come from the distributor; it had come from Glover. Three days later, he leaked the U2 album “How to Dismantle an Atomic Bomb.” (Destiny’s Child’s “Destiny Fulfilled” had come from elsewhere.) Facing increased attention, Kali decided to strip the group’s NFO files of potentially identifying information; from now on, they would consist only of the date that the album was ripped and the date that it was due in stores.

Kali ordered the RNS chat channel moved from the public IRC servers to a private computer in Hawaii. He instructed members to communicate only through this channel, which was encrypted, banning methods like AOL Instant Messenger. And he reasserted the prohibition against physical bootlegging. But Glover refused to follow the Scene’s rules. He used I.M. whenever he felt like it, and kept his duffelbag of leaked CDs in his closet. He wasn’t as interested in music anymore, or in earning Brownie points from some Internet group. All he cared about was topsites. The more he could join, the more leaked movies he could get, and the more DVDs he could sell.

In a good week, Glover on his own might sell three hundred disks, and make fifteen hundred dollars in cash. Now he began to branch out. At the beginning of each week, he dropped off four hundred disks at each of three trusted barbershops in Shelby. At the end of the week, he returned to collect his share of the profits—roughly six hundred dollars a week per shop. His best salesman made more selling bootleg movies than he did cutting hair. Seeing the profits Glover was earning, other bootleggers began moving into his territory. But Glover retained a pronounced edge. “I had access to so much stuff,” he said. “No one on the street could beat me.”

Many of Glover’s best customers worked at the plant, and for those he trusted most he devised an even better deal. Rather than paying five dollars per movie, for twenty dollars a month you could buy an unlimited subscription—and you didn’t even need the disks. Glover had set up his own topsite, and once you’d bought an account you could download anything you wanted. There were current DVDs, plus the latest copies of games, music, software, and more. At the time, video on demand was the technology of the future, but, if you knew Glover, it had already arrived. He was running a private Netflix out of his house.

Glover began to make extravagant purchases. He bought game consoles and presents for his friends and his family. He bought a new off-road quad bike, then a second. He bought a used Lincoln Navigator, and upgraded it with xenon headlights, a hood scoop, and an expensive stereo. For years, rappers had favored rims called “spinners”—metal hubcaps on independent bearings, which continued rotating even when the car had stopped. Looking to switch up the game, Glover bought “floaters”: the weighted rims stood still even when the wheels were moving.

In 2005, RNS leaked four of the five best-selling albums in the U.S. The No. 1 and No. 2 slots were occupied by Mariah Carey’s “The Emancipation of Mimi” and 50 Cent’s “The Massacre,” and Glover had leaked them both. RNS leaks quickly made their way onto public file-sharing networks, and, within forty-eight hours of appearing on the topsites, copies of the smuggled CDs could be found on iPods across the globe.

By the end of 2006, Glover had leaked nearly two thousand CDs. He was no longer afraid of getting caught. Universal had sold its compact-disk-manufacturing holdings, which allowed the company to watch the deterioration of physical media from a comfortable distance. Although still on contract to print music for Universal, the new ownership treated the plant like a wasting asset, and stopped investing in maintenance. The musicians signed to Universal complained constantly of album leaks, but the label’s supply chain was as insecure as ever.

Although RNS was still wildly successful, many of its members were tiring of its activities. When the group started, in 1996, most of the participants were teen-agers. Now they were approaching thirty, and the glamour was fading. They outgrew their jobs at college radio stations or found more lucrative fields than music journalism, and lost their access to advance albums.

Listening to hundreds of new releases a year could lead to a kind of cynicism. The musicians all used Auto-Tune to pitch-correct their voices; the songwriters all copied the last big hit; the same producers worked on every track. Glover didn’t connect with rap in the way that he used to. Tony Dockery had been born again, and listened primarily to gospel. Simon Tai still hung around the chat channel, but he hadn’t leaked an album in years. Even Kali seemed a little bored.

Glover had been thinking about retiring from the Scene. He started leaking when he was in his mid-twenties. He was now thirty-two. He had worn the same haircut for ten years, and dressed in the same screen-print T-shirts and bluejeans, but his perception of himself was changing. He didn’t remember why he had been so attracted to street bikes, or why he’d felt it necessary to own a handgun. He found his Grim Reaper tattoo impossibly stupid.

Glover’s DVD profits began to decline. Leaks from the Scene were now publicly available within seconds of being posted to the topsites, and even those who were technologically challenged could figure out how to download them. Within a couple of years, Glover’s income from bootlegging dropped to a few hundred dollars a week.

Glover began to make his feelings known to Kali. “We’ve been doing this shit for a long time,” he said in a phone call. “We never got caught. Maybe it’s time to stop.” Surprisingly, Kali agreed. Though the plant’s security was increasingly loose, the risks for leakers were greater. Between foreign law enforcement, the F.B.I., and the R.I.A.A.’s internal anti-piracy squads, there were multiple teams of investigators working to catch them. Kali understood the lengths to which law enforcement was willing to go. Some of the targets of the 2004 raids were his friends, and he had visited them in federal prison.

Then, in January of 2007, one of RNS’s topsites mysteriously vanished. The server, which was hosted in Hungary, began refusing all connections, and the company that owned it didn’t respond. Kali ordered the group shut down. RNS’s final leak, released on January 19, 2007, was Fall Out Boy’s “Infinity on High,” sourced from inside the plant by Glover.

Dozens of former members flooded into the chat channel to pay their respects. Dockery, logging in as St. James, started changing his handle, over and over, in tribute to former members. “Even if we quit now, I’ll think about it always,” Kali wrote. “I don’t know about you guys, but why keep taking a chance.” Soon afterward, the RNS channel was closed forever.

Within months, Glover was once again leaking CDs from the plant, to a guy he knew as RickOne, a leader in a Scene releasing group called OSC. Though this was no longer as profitable for Glover, his desire for free media was undiminished. “To know that I could be playing Madden two months before the stores even had it—to me, that was heaven,” Glover told me.

Kali wasn’t able to give up, either. After RNS was shut down, he had continued sourcing and leaking albums, attributing the leaks to nonsense three-letter acronyms that bewildered even Scene veterans. In the summer of 2007, he contacted Glover and told him that there were two more leaks they had to have: new albums by 50 Cent and Kanye West, both with the same release date. The rappers were competing over whose album would sell more copies, and the feud had made the cover of Rolling Stone. 50 Cent said that if he didn’t win he would retire.

But, as Kali probably knew better than anyone, both artists were distributed and promoted by Universal. What looked like an old-school hip-hop beef was actually a publicity stunt designed to boost sales, and Kali was determined to get involved. RNS had leaked every release the artists had ever put out, and going after 50’s “Curtis” and Kanye’s “Graduation” was a matter of tradition.

The official release date was September 11, 2007, but the albums were first pressed at the plant in mid-August. Glover obtained them through his smuggling network and listened to both. “Graduation” was an ambitious marriage of pop rap and high art, sampling widely from sources as diverse as krautrock and French house music, with cover art by Takashi Murakami. “Curtis” played it safer, favoring hard-thumping club music anchored by hits like “I Get Money” and “Ayo Technology.”

Glover enjoyed both albums, but he was in an unusual position: he had the power to influence the outcome of this feud. If he leaked “Graduation” and held on to “Curtis,” Kanye might sell fewer records. But if he leaked “Curtis” and held on to “Graduation”—well, he might make 50 Cent retire.

Glover decided that he would release one album through Kali and the other through RickOne. He offered RickOne the Kanye West album. On August 30, 2007, “Graduation” hit the topsites of the Scene, with OSC taking credit for the leak. Within hours, an anguished Kali called Glover, who told him that he wasn’t sure how it had happened. He said that he hadn’t seen the album at the plant yet. But, he said, “Curtis” had just arrived. On September 4, 2007, Kali released “Curtis” to the Scene.

Universal officially released the albums on Tuesday, September 11th. Despite the leaks, both sold well. “Curtis” sold almost seven hundred thousand copies in its first week, “Graduation” nearly a million. Kanye won the sales contest, even though Glover had leaked his album first. He’d just run a controlled experiment on the effects of leaking on music sales, an experiment that suggested that, at least in this case, the album that was leaked first actually did better. But Glover was happy with the outcome. “Graduation” had grown on him. He liked Kanye’s album, and felt that he deserved his victory. And 50 didn’t retire after all.

On Wednesday, September 12th, Glover went to work at 7 P.M. He had a double shift lined up, lasting through the night. He finished at 7 A.M. As he was preparing to leave, a co-worker pulled him aside. “There’s someone out there hanging around your truck,” he said.

In the dawn light, Glover saw three men in the parking lot. As he approached his truck, he pulled the key fob out of his pocket. The men stared at him but didn’t move. Then he pressed the remote, the truck chirped, and the men drew their guns and told him to put his hands in the air.

The men were from the Cleveland County sheriff’s office. They informed Glover that the F.B.I. was currently searching his house; they had been sent to retrieve him.

In his front yard, half a dozen F.B.I. agents in bulletproof vests were milling around. Glover’s door had been forced open, and agents were carting away the thousands of dollars’ worth of technology purchases he’d made over the years. He found an F.B.I. special agent named Peter Vu waiting for him inside.

Vu, a veteran of the bureau’s computer-crimes division, had spent years searching for the source of the leaks that were crippling the music industry. His efforts had finally led him to this unremarkable ranch house in small-town North Carolina. He introduced himself, then began pressing Glover for information. Vu was particularly interested in Kali, and Glover gave him the scattered details he had picked up over the years. But Vu wanted Kali’s real name, and, although Glover had talked on the phone with Kali hundreds of times, he didn’t know it.

The next day, Kali called Glover. His voice was agitated and nervous.

“It’s me,” Kali said. “Listen, I think the Feds might be onto us.”

Vu had anticipated the possibility of such a call and had instructed Glover to act as if nothing had happened. Glover now had a choice to make. He could play dumb, and further the investigation of Kali. Or he could warn him off.

“You’re too late,” Glover said. “They hit me yesterday. Shut it down.”

“O.K., I got you,” Kali said. Then he said, “I appreciate it,” and hung up.

In the next few months, the F.B.I. made numerous raids, picking up RickOne, of OSC, and several members of RNS. They also found the man they believed to be Kali, the man who had cost the music industry tens of millions of dollars and transformed RNS into the most sophisticated piracy operation in history: Adil R. Cassim, a twenty-nine-year-old Indian-American I.T. worker who smoked weed, listened to rap music, and lived at home in the suburbs of Los Angeles with his mother.

On September 9, 2009, Glover arrived at the federal courthouse in Alexandria, Virginia, and was indicted on one count of felony conspiracy to commit copyright infringement. At his indictment, Glover saw Adil Cassim for the first time. Cassim was clean-shaven and wore his hair cropped short. He was stocky, with a noticeable paunch, and was dressed in a black suit.

A month later, Glover pleaded guilty to the charge. The decision to plead was a difficult one, but Glover thought that his chances of acquittal were poor. In exchange for sentencing leniency, he agreed to testify against Cassim. The F.B.I. needed the help; the agency had thoroughly searched Cassim’s residence, and a forensic team had inspected his laptop, but they had found no pre-release music. Cassim did not admit to being a member of RNS, though two pieces of physical evidence suggested a connection to the group. One was a burned compact disk taken from his bedroom, containing a copy of Cassim’s résumé, on which, in the “Properties” tab, Microsoft Word had automatically included the name of the document’s author: Kali. The second was Cassim’s mobile phone, which contained Glover’s cell number. The contact’s name was listed only as “D.”

Cassim’s trial began in March, 2010, and lasted for five days. Glover testified, as did several other confessed members of RNS, along with a number of F.B.I. agents and technical experts. In the previous ten years, the federal government had prosecuted hundreds of Scene participants, and had won nearly every case it had brought. But on March 19, 2010, after a short period of deliberation, a jury found Cassim not guilty.

After the trial, Glover began to regret his decision to testify and to plead guilty. He wondered if, with a better legal defense, he, too, might have been acquitted. He’d never been sure exactly what damage leaking music actually caused the musicians, and at times he seemed to regard it as something less than a crime.

“Look at 50 Cent,” he said. “He’s still living in Mike Tyson’s house. Ain’t nobody in the world that can hurt them.” He continued, “It’s a loss, but it’s also a form of advertising.” He paused. “But they probably lost more than they gained.” In the end, Glover served three months in prison. (Tony Dockery also pleaded guilty to conspiracy to commit copyright infringement, and spent three months in prison. Simon Tai was never charged with any wrongdoing.)

In their sentencing guidelines, the attorneys for the Department of Justice wrote, “RNS was the most pervasive and infamous Internet piracy group in history.” In eleven years, RNS leaked more than twenty thousand albums. For much of this time, the group’s best asset was Glover—there was scarcely a person younger than thirty who couldn’t trace music in his or her collection to him.

On the day that Glover’s home was raided, F.B.I. agents confiscated his computers, his duplicating towers, his hard drives, and his PlayStation. They took a few pictures of the albums he’d collected over the years, but they left the duffelbag full of compact disks behind—even as evidence, they were worthless. ♦

Battle of the Clipper Chip (1994)

On a sunny spring day in Mountain View, Calif., 50 angry activists are plotting against the United States Government. They may not look subversive sitting around a conference table dressed in T-shirts and jeans and eating burritos, but they are self-proclaimed saboteurs. They are the Cypherpunks, a loose confederation of computer hackers, hardware engineers and high-tech rabble-rousers.

The precise object of their rage is the Clipper chip, officially known as the MYK-78 and not much bigger than a tooth. Just another tiny square of plastic covering a silicon thicket. A computer chip, from the outside indistinguishable from thousands of others. It seems improbable that this black Chiclet is the focal point of a battle that may determine the degree to which our civil liberties survive in the next century. But that is the shared belief in this room.

The Clipper chip has prompted what might be considered the first holy war of the information highway. Two weeks ago, the war got bloodier, as a researcher circulated a report that the chip might have a serious technical flaw. But at its heart, the issue is political, not technical. The Cypherpunks consider the Clipper the lever that Big Brother is using to pry into the conversations, messages and transactions of the computer age. These high-tech Paul Reveres are trying to mobilize America against the evil portent of a "cyberspace police state," as one of their Internet jeremiads put it. Joining them in the battle is a formidable force, including almost all of the communications and computer industries, many members of Congress and political columnists of all stripes. The anti-Clipper aggregation is an equal-opportunity club, uniting the American Civil Liberties Union and Rush Limbaugh.

The Clipper's defenders, who are largely in the Government, believe it represents the last chance to protect personal safety and national security against a developing information anarchy that fosters criminals, terrorists and foreign foes. Its adherents pose it as the answer, or at least part of the answer, to a problem created by an increasingly sophisticated application of an age-old technology: cryptography, the use of secret codes.

For centuries, cryptography was the domain of armies and diplomatic corps. Now it has a second purpose: protecting personal and corporate privacy. Computer technology and advanced telecommunications equipment have drawn precious business information and intimate personal communications out into the open. This phenomenon is well known to the current Prince of Wales, whose intimate cellular phone conversations were intercepted, recorded and broadcast worldwide. And corporations realize that competitors can easily intercept their telephone conversations, electronic messages and faxes. High tech has created a huge privacy gap. But miraculously, a fix has emerged: cheap, easy-to-use, virtually unbreakable encryption. Cryptography is the silver bullet by which we can hope to reclaim our privacy.

The solution, however, has one drawback: cryptography shields the law-abiding and the lawless equally. Law-enforcement and intelligence agencies contend that if strong codes are widely available, their efforts to protect the public would be paralyzed. So they have come up with a compromise, a way to neutralize such encryption. That's the Clipper chip and that compromise is what the war is about.

The idea is to give the Government the means to override other people's codes, according to a concept called "key escrow." Employing normal cryptography, two parties can communicate in total privacy, with both of them using a digital "key" to encrypt and decipher the conversation or message. A potential eavesdropper has no key and therefore cannot understand the conversation or read the data transmission. But with Clipper, an additional key -- created at the time the equipment is manufactured -- is held by the Government in escrow. With a court-approved wiretap, an agency like the F.B.I. could listen in. By adding Clipper chips to telephones, we could have a system that assures communications will be private -- from everybody but the Government.
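
A minimal toy sketch may help make the escrow idea concrete. The Python below is purely illustrative -- a one-byte XOR stands in for a real cipher, and all of the names and structure are hypothetical, not the actual Clipper/Skipjack design:

    # Toy key-escrow sketch -- illustrative only, not the real Clipper protocol.
    def toy_encrypt(key, data):
        return bytes(b ^ key for b in data)   # XOR "cipher"; it is its own inverse

    toy_decrypt = toy_encrypt

    escrow_database = {}                      # unit keys deposited at manufacture time

    def manufacture_phone(serial, unit_key):
        escrow_database[serial] = unit_key    # the Government's escrowed copy
        return {"serial": serial, "unit_key": unit_key}

    def place_call(phone, session_key, message):
        # The call is scrambled with a fresh session key; a copy of that key,
        # wrapped under the phone's unit key, travels along with the call.
        wrapped_key = toy_encrypt(phone["unit_key"], bytes([session_key]))
        return toy_encrypt(session_key, message), phone["serial"], wrapped_key

    def wiretap(ciphertext, serial, wrapped_key):
        # With a court order, investigators fetch the escrowed unit key,
        # recover the session key, and then unscramble the conversation.
        unit_key = escrow_database[serial]
        session_key = toy_decrypt(unit_key, wrapped_key)[0]
        return toy_decrypt(session_key, ciphertext)

    phone = manufacture_phone("MYK-0001", unit_key=0x5A)
    ct, serial, wrapped = place_call(phone, session_key=0x3C, message=b"meet at noon")
    assert wiretap(ct, serial, wrapped) == b"meet at noon"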

And that's what rankles Clipper's many critics. Why, they ask, should people accused of no crime have to give Government the keys to their private communications? Why shouldn't the market, rather than the Government, determine what sort of cryptosystem wins favor? And isn't it true that the use of key escrow will make our technology so unattractive to the international marketplace that the United States will lose its edge in the lucrative telecommunications and computer fields? Clipper might clip the entire economy.

Nonetheless, on Feb. 4 the White House announced its approval of the Clipper chip, which had been under study as a Government standard since last April, and the Crypto War broke out in full force. Within a month, one civil liberties group, Computer Professionals for Social Responsibility, received 47,000 electronic missives urging a stop to Clipper. "The war is upon us," wrote Tim May, co-founder of the Cypherpunks, in an urgent electronic dispatch soon after the announcement. "Clinton and Gore folks have shown themselves to be enthusiastic supporters of Big Brother."

And though the Clinton Administration's endorsement of Clipper as a Government standard required no Congressional approval, rumblings of discontent came from both sides of the Capitol. Senator Patrick J. Leahy, the Vermont Democrat whose subcommittee has held contentious hearings on the matter, has called the plan a "misstep," charging that "the Government should not be in the business of mandating particular technologies."

Two weeks ago, an AT&T Bell Laboratories researcher revealed that he had found a serious flaw in the Clipper technology itself, enabling techno-savvy lawbreakers to bypass the security function of the chip in some applications. Besides being a bad idea, Clipper's foes now say, it doesn't even work properly.

Yet the defenders of Clipper have refused to back down, claiming that the scheme -- which is, they often note, voluntary -- is an essential means of stemming an increasing threat to public safety and security by strong encryption in everyday use. Even if Clipper itself has to go back to the drawing board, its Government designers will come up with something quite similar. The underlying issue remains unchanged: If something like Clipper is not implemented, writes Dorothy E. Denning, a Georgetown University computer scientist, "All communications on the information highway would be immune from lawful interception. In a world threatened by international organized crime, terrorism and rogue governments, this would be folly."

The claims from both sides sound wild, almost apocalyptic. The passion blurs the problem: Can we protect our privacy in an age of computers -- without also protecting the dark forces in society?

The crypto war is the inevitable consequence of a remarkable discovery made almost 20 years ago, a breakthrough that combined with the microelectronics revolution to thrust the once-obscure field of cryptography into the mainstream of communications policy.

It began with Whitfield Diffie, a young computer scientist and cryptographer. He did not work for the Government, which was strange because in the 1960's almost all serious crypto in this country was done under Federal auspices, specifically at the Fort Meade, Md., headquarters of the supersecret National Security Agency. Though it became bigger than the C.I.A., the N.S.A. was for years unknown to Americans; the Washington Beltway joke was that the initials stood for "No Such Agency." Its working premise has always been that no information about its activities should ever be revealed. Its main mission involved cryptography, and the security agency so dominated the field that it had the power to rein in even those few experts in the field who were not on its payroll.

But Whitfield Diffie never got that message. He had been bitten by the cryptography bug at age 10 when his father, a professor, brought home the entire crypto shelf of the City College library in New York. Then he lost interest, until he arrived at M.I.T.'s Artificial Intelligence Laboratory in 1966. Two things rekindled his passion. Now trained as a mathematician, he had an affinity for the particular challenges of sophisticated crypto. Just as important, he says, "I was always concerned about individuals, an individual's privacy as opposed to Government secrecy."

Diffie, now 50, is still committed to those beliefs. When asked about his politics, he says, "I like to describe myself as an iconoclast." He is a computer security specialist for Sun Microsystems, a celebrated cryptographer and an experienced hand at Congressional testimony. But he looks like he stumbled out of a Tom Robbins novel -- with blond hair that falls to his shoulders and a longish beard that seems a virtual trademark among code makers. At a Palo Alto, Calif., coffeehouse one morning, he describes, in clipped, precise cadence, how he and Martin E. Hellman, an electrical engineering professor at Stanford University, created a crypto revolution.

Diffie was dissatisfied with the security on a new time-sharing computer system being developed by M.I.T. in the 1960's. Files would be protected by passwords, but he felt that was insufficient. The system had a generic flaw. A system manager had access to all passwords. "If a subpoena was served against the system managers, they would sell you out, because they had no interest in going to jail," Diffie says. A perfect system would eliminate the need for a trusted third party.

This led Diffie to think about a more general problem in cryptography: key management. Even before Julius Caesar devised a simple cipher to encode his military messages, cryptography worked by means of keys. That is, an original message (what is now called "plaintext") was encrypted by the sender into seeming gibberish (known as "ciphertext"). The receiver, using the same key, decrypted the message back into the original plaintext. For instance, the Caesar key was the simple replacement of each letter by the letter three places down in the alphabet. If you knew the key, you could encrypt the word help into the nonsense word khos; the recipient of the message would decrypt the message back to help.
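
The shift-by-three scheme can be written out in a few lines of Python; this is just a sketch of the classical cipher described above, nothing more:

    # Caesar cipher: the "key" is simply how far each letter is shifted.
    def caesar(text, shift):
        out = []
        for ch in text.lower():
            if ch.isalpha():
                out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
            else:
                out.append(ch)
        return ''.join(out)

    print(caesar("help", 3))    # khos
    print(caesar("khos", -3))   # help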

The problem came with protecting the key. Since anyone who knew the Caesar key would be able to understand the encoded message, it behooved the Romans to change that key as often as possible. But if you change the key, how do you inform your spies behind enemy lines? (If you tell them using the old code, which may have already been cracked, your enemies will then learn the new code.) For centuries, generals and diplomats have faced that predicament. But a few years ago, it took on added urgency.

With computers and advanced telecommunications, customers outside Government were discovering a need for information security. Cryptography was the answer, but how could it be applied widely, considering the problem of keys? The best answer to date was something called a key-management repository, where two parties who wanted secrecy would go to a trusted third party who would generate a new key for the private session. But that required just what Diffie deplored -- an unwanted third wheel.

"The virtue of cryptography should be that you don't have to trust anybody not directly involved with your communication," Diffie says. "Without conventional key distribution centers, which involved trusting third parties, I couldn't figure how you could build a system to secure, for instance, all the phones in the country."

When Diffie moved to Stanford University in 1969, he foresaw the rise of home computer terminals and began pondering the problem of how to use them to make transactions. "I got to thinking how you could possibly have electronic business, because signed letters of intent, contracts and all seemed so critical," he says. He devoured what literature he could find outside the National Security Agency. And in the mid-1970's, Diffie and Hellman achieved a stunning breakthrough that changed cryptography forever. They split the cryptographic key.

In their system, every user has two keys, a public one and a private one, that are unique to their owner. Whatever is scrambled by one key can be unscrambled by the other. It works like this: If I want to send a message to Whit Diffie, I first obtain his public key. (For complicated mathematical reasons, it is possible to distribute one's public key freely without compromising security; a potential enemy will have no advantage in code-cracking if he holds your public key alone.) Then I use that key to encode the message. Now it's gobbledygook and only one person in the world can decode it -- Whit Diffie, who holds the other, private, key. If he wants to respond to me with a secret message, he uses my public key to encode his answer. And I decode it, using my private key.

It was an amazing solution, but even more remarkable was that this split-key system solved both of Diffie's problems, the desire to shield communications from eavesdroppers and also to provide a secure electronic identification for contracts and financial transactions done by computer. It provided the identification by the use of "digital signatures" that verify the sender much the same way that a real signature validates a check or contract.
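
The arithmetic behind such a split-key system can be sketched with a toy example in Python -- textbook RSA with deliberately tiny numbers, meant only to show that what one key scrambles, the other unscrambles (real keys are hundreds of digits long):

    # Toy RSA: public key (e, n), private key (d, n). Illustration only.
    p, q = 61, 53
    n = p * q                  # 3233, part of both keys
    phi = (p - 1) * (q - 1)    # 3120
    e = 17                     # public exponent
    d = 2753                   # private exponent: (e * d) % phi == 1

    m = 65                             # the "plaintext", encoded as a number
    ciphertext = pow(m, e, n)          # encrypt with the public key
    assert pow(ciphertext, d, n) == m  # only the private key recovers it

    signature = pow(m, d, n)           # "sign" with the private key
    assert pow(signature, e, n) == m   # anyone can verify with the public key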

Suddenly, the ancient limitations on cryptography had vanished. Now, perhaps before the millennium, strong cryptography could find its way to every telephone, computer and fax machine -- if users wanted it. Subsequent variations on the Diffie-Hellman scheme focused on using crypto algorithms to insure the anonymity of transactions. Using these advances, it is now possible to think of replacing money with digital cash -- while maintaining the comforting untraceability of bills and coins. The dark art of cryptography has become a tool of liberation.

From the moment Diffie and Hellman published their findings in 1976, the National Security Agency's crypto monopoly was effectively terminated. In short order, three M.I.T. mathematicians -- Ronald L. Rivest, Adi Shamir and Leonard M. Adleman -- developed a system with which to put the Diffie and Hellman findings into practice. It was known by their initials, RSA. It seemed capable of creating codes that even the N.S.A. could not break. They formed a company to sell their new system; it was only a matter of time before thousands and then millions of people began using strong encryption.

That was the National Security Agency's greatest nightmare. Every company, every citizen now had routine access to the sorts of cryptographic technology that not many years ago ranked alongside the atom bomb as a source of power. Every call, every computer message, every fax in the world could be harder to decipher than the famous German "Enigma" machine of World War II. Maybe even impossible to decipher!

The genie was out of the bottle. Next question: Could the genie be made to wear a leash and collar? Enter the Clipper chip.

When illustrating the Government's need to control crypto, Jim Kallstrom, the agent in charge of the special operations division of the New York office of the F.B.I., quickly shifts the discussion to the personal: "Are you married? Do you have a child? O.K., someone kidnaps one of your kids and they are holding your kid in this fortress up in the Bronx. Now, we have probable cause that your child is inside this fortress. We have a search warrant. But for some reason, we cannot get in there. They made it out of some new metal, or something, right? Nothing'll cut it, right? And there are guys in there, laughing at us. That's what the basis of this issue really is -- we've got a situation now where a technology has become so sophisticated that the whole notion of a legal process is at stake here!"

Kallstrom is a former head of the Bureau Tech Squad, involved in the bugging operation that brought John Gotti to justice. Some have described him as the F.B.I.'s answer to "Q," the gadget wizard of the James Bond tales.

"From the standpoint of law enforcement, there's a superbig threat out there -- this guy is gonna build this domain in the Bronx now, because he's got a new steel door and none of the welding torches, none of the boomerangs, nothing we have is gonna blast our way in there. Sure, we want those new steel doors ourselves, to protect our banks, to protect the American corporation trade secrets, patent rights, technology. But people operating in legitimate business are not violating the laws -- it becomes a different ball of wax when we have probable cause and we have to get into that domain. Do we want a digital superhighway where not only the commerce of the nation can take place but where major criminals can operate impervious to the legal process? If we don't want that, then we have to look at Clipper."

Wiretapping is among law enforcement's most cherished weapons. Only 919 Federal, state and local taps were authorized last year, but police agencies consider them essential to fighting crime. Obviously if criminals communicate using military-grade cryptosystems, wiretapping them becomes impossible.

For two years, the F.B.I. has been urging Congress to pass the proposed Digital Telephony and Communications Privacy Act, which would in essence require that new communications technologies be designed to facilitate wiretapping. Even if the bill should somehow pass, overcoming the opposition of the communications industry and civil libertarians, the extra effort and expense will be wasted if the only thing the wiretappers can hear is the hissy white noise of encrypted phone conversations and faxes. If cryptography is not controlled, wiretapping could be rendered obsolete. Louis J. Freeh, the Director of the F.B.I., surely fears that prospect. He has told Congress that preserving the ability to intercept communications legally, in the face of these technological advances, is "the No. 1 law enforcement, public safety and national security issue facing us today."

Some people criticize Clipper on the basis that truly sophisticated criminals would never use it, preferring other easily obtained systems that use high-grade cryptography. Despite Clipper, kidnappers and drug kingpins may construct Kallstrom's virtual fort in the Bronx with impunity, laughing at potential wiretappers.

The Government understands the impossibility of eradicating strong crypto. Its objective is instead to prevent unbreakable encryption from becoming routine. If that happens, even the stupidest criminal would be liberated from the threat of surveillance. But by making Clipper the standard, the Government is betting that only a tiny percentage of users would use other encryption or try to defeat the Clipper.

At a rare public appearance in March at a conference on computers and privacy, Stewart A. Baker, then general counsel of the National Security Agency, tried to explain. "The concern is not so much what happens today when people go in and buy voice scramblers," said Baker, a dapper, mustached lawyer who worked as an Education Department lawyer in the Carter Administration. "It is the prospect that in 5 years or 10 years every phone you buy that costs $75 or more will have an encrypt button on it that will interoperate with every other phone in the country and suddenly we will discover that our entire communications network is being used in ways that are profoundly antisocial. That's the real concern, I think, that Clipper addresses. If we are going to have a standardized form of encryption that is going to change the world, we should think seriously about what we are going to do when it is misused."

Not all law-enforcement experts believe that cryptography will unleash a riot of lawlessness. William R. Spernow, a Sacramento, Calif., computer crime specialist who works on a grant from the Federal Bureau of Justice Assistance, has encountered a few cases in which criminals have encrypted information unbreakably, including one involving a pedophile who encrypted the identities of his young victims. Yet Spernow sees no reason to panic. "In cases where there's encryption, the officers have been able to make the case through other investigative means," he says. "If we hustle, we can still make our cases through other kinds of police work."

But crime is only part of the problem. What happens to national security if cryptography runs free? Those who know best, officials of the National Security Agency, won't say. When the agency's director, Vice Adm. John M. McConnell, testified before a Senate subcommittee on May 3, he withheld comment on this question until the public hearing was terminated and a second, classified session convened in a secure room.

Still, the effect of strong crypto on N.S.A. operations is not difficult to imagine. The agency is charged with signals intelligence, and it is widely assumed that it monitors all the communications between borders and probably much of the traffic within foreign countries. (It is barred from intercepting domestic communications.) If the crypto revolution crippled N.S.A.'s ability to listen in on the world, the agency might miss out on something vital -- for instance, portents of a major terrorist attack.

No compelling case has been made, however, that the key-escrow system would make it easier for authorities to learn of such an attack. The National Security Agency would take the legal steps to seek the telltale keys after it had first identified those potential terrorists and wiretapped their calls, then discovered the impenetrable hiss of encryption. Even then, the keys would be useful only if the terrorists were encoding conversations with Clipper technology, the one kind the Government had the capability to decode instantly. What sort of nuclear terrorist would choose Clipper?

The Government response has been to say that potential terrorists might indeed use alternative crypto methods to converse among themselves. But if Clipper were the accepted standard, the terrorists would have to use it to communicate with outsiders -- banks, suppliers and other contacts. The Government could listen in on those calls. However, the work of the Bell Labs researcher, Matthew Blaze, casts serious doubt on that contention. Blaze has uncovered a flaw in Clipper that would allow a user to bypass the security function of the chip. Anyone who tinkered with Clipper in this way could communicate in privacy with anyone else with a Clipper phone, and Government wiretappers would be unable to locate the key to unscramble the conversations.

Nonetheless, it was the terrorist threat, along with national security concerns, that moved the Clinton Administration to support the key-escrow initiative. White House high-tech policy makers share a recurrent fear: one day they might be sitting before an emergency Congressional investigation after the destruction of half of Manhattan by a stolen nuclear weapon planted in the World Trade towers and trying to explain that the Government had intercepted the communications of the terrorists but could not understand them because they used strong encryption. If Clipper were enacted, they could at least say, "We tried."

Obviously the Government views the crypto revolution with alarm and wants to contain it. For years, much of its effort has focused on the use of stringent export controls. While cryptography within the United States is unrestricted, the country's export laws treat any sort of encryption as munitions, like howitzers or nuclear triggers. The National Security Agency is the final arbiter and it will approve exports of cryptosystems in computer software and electronic hardware only if the protective codes are significantly weakened.

The N.S.A. stance is under attack from American businesses losing sales to foreign competitors. Listen to D. James Bidzos, the 39-year-old president of RSA Data Security, the Redwood City, Calif., company that controls the patents for public-key cryptography: "For almost 10 years, I've been going toe to toe with these people at Fort Meade. The success of this company is the worst thing that can happen to them. To them, we're the real enemy, we're the real target."

RSA is making a pitch to become the standard in encryption; its technology has been adopted by Apple, AT&T, Lotus, Microsoft, Novell and other major manufacturers. So imagine its unhappiness that its main rival is not another private company, but the National Security Agency, designer of the key-escrow cryptosystems. The agency is a powerful and dedicated competitor.

"We have the system that they're most afraid of," Bidzos says. "If the U.S. adopted RSA as a standard, you would have a truly international, interoperable, unbreakable, easy-to-use encryption technology. And all those things together are so synergistically theatening to the N.S.A.'s interests that it's driving them into a frenzy."

The export laws put shackles on Bidzos's company while his overseas competitors have no such restraints. Cryptographic algorithms that the N.S.A. bans for export are widely published and are literally being sold on the streets of Moscow. "We did a study on the problem and located 340 foreign cryptographic products sold by foreign countries," says Douglas R. Miller, government affairs manager of the Software Publishers Association. "The only effect of export controls is to cripple our ability to compete."

The real potential losses, though, come not in the stand-alone encryption category, but in broader applications. Companies like Microsoft, Apple and Lotus want to put strong encryption into their products but cannot get licenses to export them. Often, software companies wind up installing a weaker brand of crypto in all their products so that they can sell a single version worldwide. This seems to be the Government's intent -- to encourage "crypto lite," strong enough to protect communications from casual intruders but not from Government itself.

In the long run, however, export regulation will not solve the National Security Agency's problem. The crypto business is exploding. People are becoming more aware of the vulnerability of phone conversations, particularly wireless ones. Even the National Football League is adopting crypto technology; it will try out encrypted radio communication between coaches and quarterbacks, so rivals can't intercept last-minute audibles.

Anticipating such a boom, the N.S.A. devised a strategy for the 90's. It would concede the need for strong encryption but encourage a system with a key-escrow "back door" that provides access to communications for itself and law enforcement. The security agency had already developed a strong cryptosystem based on an algorithm called Skipjack, supposedly 16 million times stronger than the previous standard, D.E.S. (Data Encryption Standard). Now the agency's designers integrated Skipjack into a new system that uses a Law Enforcement Access Field (LEAF), a signal added to the message that directs a potential wiretapper to the appropriate key to decipher the message. These features were included in a chip called Capstone, which could handle not only telephone communications but computer data transfers and digital signatures.

Supposedly, this technology was designed for Government use, but in 1993 the National Security Agency had a sudden opportunity to thrust it into the marketplace. AT&T had come to the agency with a new, relatively low-cost secure-phone device called the Surity 3600 that was designed to use the nonexportable DES encryption algorithm. The N.S.A. suggested that perhaps AT&T could try something else: a stripped-down version of Capstone for telephone communications. This was the Clipper chip. As a result, AT&T got two things: an agreement that Uncle Sam would buy thousands of phones for its own use (the initial commitment was 9,000, from the F.B.I.) and the prospect that the phone would not suffer the unhappy fate of some other secure devices when considered for export. There was also the expectation that AT&T would sell a lot more phones, since private companies would need to buy Clipper-equipped devices to communicate with the Government's Clipper phones.

It was an ingenious plan for several reasons. By agreeing to buy thousands of phones, and holding out the promise that thousands, or even millions more might be sold, AT&T phones gained a price advantage that comes with volume. (The original price of the Surity 3600 was $1,195, considerably less than the previous generation of secure phones; Mykotronx, the company making the Clipper chip, says that each chip now costs $30, but in large orders could quickly go as low as $10.) That would give the phones a big push in the marketplace. But by saturating the market, Clipper had a chance to become the standard for encryption, depending on whether businesses and individuals would be willing to accept a device that had the compromise of a government-controlled back door.

This compromise, of course, is the essence of Clipper. The Government recognizes the importance of keeping business secrets, intimate information and personal data hidden from most eyes and ears. But it also preserves a means of getting hold of that information after obtaining "legal authorization, normally a court order," according to a White House description.

The N.S.A. presented the idea to the Bush Administration, which took no action before the election. Then it had to convince a Democratic Administration to adopt the scheme, and started briefing the Clinton people during the transition. Many in the computer industry figured that with Vice President Al Gore's enthusiastic endorsement of the high-frontier virtues of the information highway, the Administration would never adopt any proposal so tilted in favor of law enforcement and away from his allies in the information industries. They figured wrong. A little more than two months after taking office, the Clinton Administration announced the existence of the Clipper chip and directed the National Institute of Standards and Technology to consider it as a Government standard.

Clipper was something the Administration -- starting with the Vice President -- felt compelled to adopt, and key escrow was considered an honorable attempt to balance two painfully contradictory interests, privacy and safety.

The reaction was instant, bitter and ceaseless. The most pervasive criticisms challenged the idea that Clipper would be, as the standard said, "voluntary." The Government's stated intent is to manipulate the marketplace so that it will adopt an otherwise unpalatable scheme and make it the standard. Existing systems have to cope with export regulations and, now, incompatibility with the new Government Clipper standard. Is it fair to call a system voluntary if the Government puts all sorts of obstacles in the way of its competitors?

Others felt that it was only a matter of time before the National Security Agency pressured the Government to require key escrow of all cryptographic devices -- that Clipper was only the first step in a master plan to give Uncle Sam a key to everyone's cyberspace back door.

"That's a real fear," says Stephen T. Walker, a former N.S.A. employee who is now president of Trusted Information Systems, a company specializing in computer security products. "I don't think the Government could pull it off -- it would be like prohibition, only worse. But I think they might try it."

But mostly, people were unhappy with the essence of Clipper, that the Government would escrow their keys. As Diffie notes, key escrow reintroduces the vulnerability that led him to invent public key cryptography -- any system that relies on trusted third parties is, by definition, weaker than one that does not. Almost no one outside the Government likes the key-escrow idea. "We published the standard for 60 days of public comments," says F. Lynn McNulty, associate director for computer security at the National Institute of Standards and Technology. "We received 320 comments, only 2 of which were supportive."

Many people thought that in the face of such opposition, the Administration would quietly drop the Clipper proposal. They were dismayed by the Feb. 4 announcement of the adoption of Clipper as a Government standard. Administration officials knew they were alienating their natural allies in the construction of the information superhighway but felt they had no alternative. "This," said Michael R. Nelson, a White House technology official, "is the Bosnia of telecommunications."

If Clipper is the Administration's Techno-Bosnia, the crypto equivalent of snipers is everywhere -- in industry, among privacy lobbyists and even among Christian Fundamentalists. But the most passionate foes are the Cypherpunks. They have been meeting on the second Saturday of every month at the offices of Cygnus, a Silicon Valley company, assessing new ways they might sabotage Clipper. The group was co-founded in September 1992 by Eric Hughes, a 29-year-old freelance cryptographer, and Tim May, a 42-year-old physicist who retired early and rich from the Intel company. Other Cypherpunk cells often meet simultaneously in six or seven locations around the world, but the main gathering place for Cypherpunks is the Internet, by means of an active mailing list in which members post as many as 100 electronic messages a day.

Cypherpunks share a few common premises. They assume that cryptography is a liberating tool, one that empowers individuals. They think that one of the most important uses of cryptography is to protect communications from the Government. Many of them believe that the Clipper is part of an overall initiative against cryptography that will culminate in Draconian control of the technology. And they consider it worth their time to fight, educating the general public and distributing cryptographic tools to obstruct such control.

Both Hughes and May have composed manifestos. Hughes's call to arms proclaims: "Cypherpunks write code. We know that someone has to write software to defend privacy, and since we can't get privacy unless we all do, we're going to write it."

May's document envisions a golden age in which strong cryptography belongs to all -- an era of "crypto anarchism" that governments cannot contain. To May, cryptography is a tool that will not only bestow privacy on people but help rearrange the economic underpinnings of society.

"Combined with emerging information markets, cryptography will create a liquid market for any and all material that can be put into words and pictures," May's document says. "And just as a seemingly minor invention like barbed wire made possible the fencing-off of vast ranches and farms, thus altering forever the concepts of land and property rights in the frontier West, so too will the seemingly minor discovery out of an arcane branch of mathematics come to be the wire clippers which dismantle the barbed wire around intellectual property."

At a recent meeting, about 50 Cypherpunks packed into the Cygnus conference room, with dozens of others participating electronically from sites as distant as Cambridge, Mass., and San Diego. The meeting stretched for six hours, with discussions of hardware encryption schemes, methods to fight an electronic technique of identity forgery called "spoofing," the operation of "remailing" services, which allow people to post electronic messages anonymously -- and various ways to fight Clipper.

While the Cypherpunks came up with possible anti-Clipper slogans for posters and buttons, a bearded crypto activist in wire-rim glasses named John Gilmore was outside the conference room, showing the latest sheaf of cryptography-related Freedom of Information documents he'd dragged out of Government files. Unearthing and circulating the hidden crypto treasures of the National Security Agency is a passion of Gilmore, an early employee of Sun Microsystems who left the company a multimillionaire. The Government once threatened to charge him with a felony for copying some unclassified-and-later-reclassified N.S.A. documents from a university library. After the story hit the newspapers, the Government once again declassified the documents.

"This country was founded as an open society, and we still have the remnants of that society," Gilmore says. "Will crypto tend to open it or close it? Our Government is building some of these tools for its own use, but they are unavailable -- we have paid for cryptographic breakthroughs but they're classified. I wish I could hire 10 guys -- cryptographers, librarians -- to try to pry cryptography out of the dark ages."

Perhaps the most admired Cypherpunk is someone who says he is ineligible because he often wears a suit. He is Philip R. Zimmermann, a 40-year-old software engineer and cryptographic consultant from Boulder, Colo., who in 1991 cobbled together a cryptography program for computer data and electronic mail. "PGP," he called it, meaning Pretty Good Privacy, and he decided to give it away. Anticipating the Cypherpunk credo, Zimmermann hoped that the appearance of free cryptography would guarantee its continued use after a possible Government ban. One of the first people receiving the program placed it on a computer attached to the Internet and within days thousands of people had PGP. Now the program has been through several updates and is becoming sort of a people's standard for public key cryptography. So far, it appears that no one has been able to crack information encoded with PGP.

Like Diffie, Zimmermann developed a boyhood interest in crypto. "When I was a kid growing up in Miami, it was just kind of cool -- secret messages and all," he says. Later, "computers made it possible to do ciphers in a practical manner." He was fascinated to hear of public key cryptography and during the mid-1980's he began experimenting with a system that would work on personal computers. With the help of some colleagues, he finally devised a strong system, albeit one that used some patented material from RSA Data Security. And then he heard about the Senate bill that proposed to limit a citizen's right to use strong encryption by requiring manufacturers to include back doors in their products. Zimmermann, formerly a nuclear freeze activist, felt that one of the most valuable potential uses of cryptography was to keep messages secret from the Government.

Zimmermann has put some political content into the documentation for his program: "If privacy is outlawed, only outlaws will have privacy. Intelligence agencies have access to good cryptographic technology. So do the big arms and drug traffickers. So do defense contractors, oil companies, and other corporate giants. But ordinary people and grassroots political organizations mostly have not had access to affordable 'military grade' public-key cryptographic technology. Until now."

He has been told that Burmese freedom fighters learn PGP in jungle training camps on portable computers, using it to keep documents hidden from their oppressive Government. But his favorite letter comes from a person in Latvia, who informed him that his program was a favorite among one-time refuseniks in that former Soviet republic. "Let it never be," wrote his correspondent, "but if dictatorship takes over Russia, your PGP is widespread from Baltic to Far East now and will help democratic people if necessary."

Early last year, Zimmermann received a visit from two United States Customs Service agents. They wanted to know how it was that the strong encryption program PGP had found its way overseas with no export license. In the fall, he learned from his lawyer that he was a target of a grand jury investigation in San Jose, Calif. But even if the Feds should try to prosecute, they are likely to face a tough legal issue: Can it be a crime, in the process of legally distributing information in this country, to place it on an Internet computer site that is incidentally accessible to network users in other countries? There may well be a First Amendment issue here: Americans prize the right to circulate ideas, including those on software disks.

John Gilmore has discovered that Government lawyers have their own doubts about these issues. In some documents he sued to get, there are mid-1980's warnings by the Justice Department that the export controls on cryptography presented "sensitive constitutional issues." In one letter, an assistant attorney general warns that "the regulatory scheme extends too broadly into an area of protected First Amendment speech."

Perhaps taking Phil Zimmermann to court would not be the Government's best method for keeping the genie in the bottle.

The Clipper program has already begun. About once a month, four couriers with security clearances travel from Washington to the Torrance, Calif., headquarters of Mykotronx, which holds the contract to make Clipper chips. They travel in pairs, two from each escrow agency: the National Institute of Standards and Technology and the Treasury Department. The redundancy is a requirement of a protocol known as Two-Person Integrity, used in situations like nuclear missile launches, where the stakes are too high to rely on one person.

The couriers wait while a Sun work station performs the calculations to generate the digital cryptographic keys that will be imprinted in the Clipper chips. Then it splits the keys into two pieces, separate number chains, and writes them on two floppy disks, each holding lists of "key splits." To reconstruct the keys imprinted on the chip, and thereby decode private conversations, you would need both sets of disks.

After being backed up, the sets of disks are separated, each one going with a pair of couriers. When the couriers return to their respective agencies, each set of disks is placed in a double-walled safe. The backup copies are placed in similar safes. There they wait, two stacks of floppy disks that grow each month, now holding about 20,000 key splits, the so-called back doors.
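
The mechanics of that split are easy to illustrate. The sketch below (Python, not the Government's actual escrow software) shows one simple way a key can be divided into two "key splits" so that either share alone reveals nothing, while combining both recovers the original key -- here by XORing a random pad against the key. The 80-bit key length, the XOR construction as applied here, and all function names are assumptions made for illustration, not details taken from the Clipper program itself.

```python
import secrets

KEY_BYTES = 10  # assumed 80-bit unit key, for illustration only


def split_key(unit_key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two escrow shares.

    A random pad becomes the first share; XORing it with the key yields
    the second. Either share alone is statistically independent of the
    key, but XORing both together recovers it exactly.
    """
    share_a = secrets.token_bytes(len(unit_key))
    share_b = bytes(k ^ a for k, a in zip(unit_key, share_a))
    return share_a, share_b


def reconstruct_key(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the two escrowed shares into the original key."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))


if __name__ == "__main__":
    unit_key = secrets.token_bytes(KEY_BYTES)  # stand-in for a chip's key
    first_agency_share, second_agency_share = split_key(unit_key)
    assert reconstruct_key(first_agency_share, second_agency_share) == unit_key
    print("key recovered only when both shares are combined")
```

The point of such a scheme is exactly the one the couriers embody: neither escrow agency, acting alone, can reconstruct a chip's key and decode a conversation.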

Will this number grow into the millions as the Government hopes? Ultimately the answer lies with the American public. Administration officials are confident that when the public contemplates scenarios like the Fortress in the Bronx or the Mushroom Cloud in Lower Manhattan, it will realize that allowing the Government to hold the keys is a relatively painless price to pay for safety and national security. They believe the public will eventually accept it in the same way it now views limited legal wiretapping. But so far the Administration hasn't recruited many prominent supporters. The main one is Dorothy Denning, a crypto expert who heads the computer science department at Georgetown University.

Since endorsing Clipper (and advocating passage of the Digital Telephony initiative) Denning has been savagely attacked on the computer nets. Some of the language would wither a professional wrestler. "I've seen horrible things written about me," Denning says with a nervous smile. "I try to actually now avoid looking at them, because that's not what's important to me. What's important is that we end up doing the right thing with this. It was an accumulation of factors that led me to agree with Clipper, and the two most important areas, to me, are organized crime and terrorism. I was exposed to cases where wiretaps had actually stopped crimes in the making, and I started thinking, 'If they didn't have this tool, some of these things might have happened.' You know, I hate to use the word responsibility, but I actually feel some sense of responsibility to at least state my position to the extent so that people will understand it."

The opponents of Clipper are confident that the marketplace will vote against it. "The idea that the Government holds the keys to all our locks, before anyone has even been accused of committing a crime, doesn't parse with the public," says Jerry Berman, executive director of the Electronic Frontier Foundation. "It's not America."

Senator Leahy hints that Congress might not stand for the Clinton Administration's attempt to construct the key-escrow system, at an estimated cost of $14 million initially and $16 million annually. "If the Administration wants the money to set up and run the key-escrow facilities," he says, "it will need Congressional approval." Despite claims by the National Institute of Standards and Technology deputy director, Raymond G. Kammer, that some foreign governments have shown interest in the scheme, Leahy seems to agree with most American telecommunications and computer manufacturers that Clipper and subsequent escrow schemes will find no favor in the vast international marketplace, turning the United States into a cryptographic island and crippling important industries.

Leahy is also concerned about the Administration's haste. "The Administration is rushing to implement the Clipper chip program without thinking through crucial details," he says. Indeed, although the Government has been buying and using Clipper encryption devices, the process of actually getting the keys out of escrow and using them to decipher scrambled conversations has never been field tested. And there exists only a single uncompleted prototype of the device intended to do the deciphering.

Leahy is also among those who worry that, all policy issues aside, the Government's key escrow scheme might fail solely on technical issues. The Clipper and Capstone chips, while powerful enough to use on today's equipment, have not been engineered for the high speeds of the coming information highway; updates will be required. Even more serious are the potential design flaws in the unproved key-escrow scheme. Matthew Blaze's discovery that wrongdoers could foil wiretappers may be only the first indication that Clipper is unable to do the job for which it was designed. In his paper revealing the glitch, he writes, "It is not clear that it is possible to construct EES (Escrowed Encryption Standard) that is both completely invulnerable to all kinds of exploitation as well as generally useful."

At bottom, many opponents of Clipper do not trust the Government. They are unimpressed by the elaborate key-escrow security arrangements outlined for Clipper. Instead, they ask questions about the process by which the Clipper was devised -- how is it that the N.S.A., an intelligence agency whose mission does not ordinarily include consumer electronics design, has suddenly seized a central role in creating a national information matrix? They also complain that the Skipjack cryptographic algorithm is a classified secret, one that cryptographic professionals cannot subject to the rigorous, extended testing that has previously been used to gain universal trust for such a standard.

"You don't want to buy a set of car keys from a guy who specializes in stealing cars," says Marc Rotenberg, director of the Electronic Privacy Information Center. "The N.S.A.'s specialty is the ability to break codes, and they are saying, 'Here, take our keys, we promise you they'll work.' "

At the March conference on computers and privacy, Stewart Baker responded to this sort of criticism. "This is the revenge of people who couldn't go to Woodstock because they had too much trig homework," he said, evoking some catcalls. "It's a kind of romanticism about privacy. The problem with it is that the beneficiaries of that sort of romanticism are going to be predators. PGP, they say, is out there to protect freedom fighters in Latvia. But the fact is, the only use that has come to the attention of law enforcement agencies is a guy who was using PGP so the police could not tell what little boys he had seduced over the net. Now that's what people will use this for -- it's not the only thing people will use it for, but they will use it for that -- and by insisting on having a claim to privacy that is beyond social regulation, we are creating a world in which people like that will flourish and be able to do more than they can do today."

Even if Clipper flops, the Crypto War will continue. The Administration remains committed to limiting the spread of strong cryptography unless there's a back door. Recently, it has taken to asking opponents for alternatives to Clipper. One suggestion it will not embrace is inaction. "Deciding that the genie is out of the bottle and throwing our arms up is not where we're at," says a White House official.

The National Security Agency will certainly not go away. "The agency is really worried about its screens going blank" due to unbreakable encryption, says Lance J. Hoffman, a professor of computer science at George Washington University. "When that happens, the N.S.A. -- said to be the largest employer in Maryland -- goes belly-up. A way to prevent this is to expand its mission and to become, effectively, the one-stop shop for encryption for Government and those that do business with the Government."

Sure enough, the security agency is cooking up an entire product line of new key-escrow chips. At Fort Meade, it has already created a high-speed version of the Skipjack algorithm that outperforms both Clipper and Capstone. There is also another, more powerful, encryption device in the works named Baton. As far as the agency is concerned, these developments are no more than common sense. "To say that N.S.A. shouldn't be involved in this issue is to say that Government should try to solve this difficult technical and social problem with both hands tied behind its back," Stewart Baker says.

But Phil Zimmermann and the Cypherpunks aren't going away, either. Zimmermann is, among other things, soliciting funds for a PGP phone that will allow users the same sort of voice encryption provided by the Clipper chip. The difference, of course, is that in his phone there is no key escrow, no back door. If the F.B.I. initiated a wiretap on someone using Zimmermann's proposed phone, all the investigators would hear is static that they could never restore to orderly language.

What if that static shielded the murderous plans of a terrorist or kidnapper? Phil Zimmermann would feel terrible. Ultimately he has no answer. "I am worried about what might happen if unlimited security communications come about," he admits. "But I also think there are tremendous benefits. Some bad things would happen, but the trade-off would be worth it. You have to look at the big picture."
