
Examining overlooked clues reveals how ultrasound could have caused harm in Cuba


Throughout last year, mysterious ailments struck dozens of U.S. and Canadian diplomats and their families living in Cuba. Symptoms included dizziness, sleeplessness, headache, and hearing loss; many of the afflicted were in their homes or in hotel rooms when they heard intense, high-⁠pitched sounds shortly before falling ill. In February, neurologists who examined the diplomats concluded that the symptoms were consistent with concussion, but without any blunt trauma to the head. Suggested culprits included toxins, viruses, and a sonic weapon, but to date, no cause has been confirmed.

We found the last suggestion—a sonic weapon—intriguing, because around the same time that stories about health problems in Cuba began appearing, our labs, at the University of Michigan–Ann Arbor, and at Zhejiang University in China, were busy writing up our latest research on ultrasonic cybersecurity. We wondered, Could ultrasound be the culprit in Cuba?

On the face of it, it seems impossible. For one thing, ultrasonic frequencies—20 kilohertz or higher—are inaudible to humans, and yet the sounds heard by the diplomats were obviously audible. What’s more, those frequencies don’t propagate well through air and aren’t known to cause direct harm to people except under rarefied conditions. Acoustic experts dismissed the idea that ultrasound could be at fault.

Then, about six months ago, an editor from The Conversation sent us a link to a video from the Associated Press, reportedly recorded in Cuba during one of the attacks.

The editor asked us for our reaction. In the video, you can hear a piercing, metallic sound—it’s not pleasant. Watching the AP video frame by frame, we immediately noticed a few oddities. In one sequence, someone plays a sound file from one smartphone while a second smartphone records and plots the acoustic spectrum. So already the data are somewhat suspect because every microphone and every speaker introduces some distortion. Moreover, what humans hear isn’t necessarily the same as what a microphone picks up. Cleverly crafted sounds can lead to auditory illusions akin to optical illusions.

The AP video also includes a spectral plot of the recording—that’s basically a visual representation of the intensities of the various acoustic tones present, arranged by frequency. Looking closely, we noticed a spectral peak near 7 kilohertz and a dozen other less-intense tones that formed a regular pattern with peaks separated by approximately 180 hertz. What could have caused these ripples every 180 Hz? And what kind of mechanism could make an ultrasonic source produce audible sound?

As the questions began to mount, it still didn’t make sense to us, and that seemed like an excellent reason to dig deeper.

We also felt an obligation to investigate. Our own research had taught us that ultrasound can compromise the security of many types of sensors found widely in medical devices, autonomous vehicles, and the Internet of Things. For the last decade, two of us (Fu and Xu) have been collaborating on embedded security research, with the goal of discovering physics-based engineering principles and practices that will make automated computer systems secure by design. For example, Xu’s 2017 paper “DolphinAttack: Inaudible Voice Commands” describes how we used ultrasonic signals to inject inaudible voice commands into speech recognition systems such as Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana, Alexa, and the navigation system of an Audi automobile.

The Cuban ultrasonic mystery was too close to our research to ignore.

One thing we knew going into this investigation is that acoustic interference can occur where you least expect it. Several years ago, Fu became annoyed by an ear-piercing sound coming from a lightbulb in his apartment. He took spectral measurements and noticed that the lightbulb tended to shriek when the air conditioner turned on. He eventually concluded that the compressor was pumping coolant through its pipes at the resonant frequency of the bulb's filament. Normally, this wouldn't be a problem. But in this case, the coolant pipes ran through the ceiling and mechanically coupled to the ceiling joist supporting the lightbulb. The superintendent opened up the ceiling and separated the joist from the pipe with a piece of duct tape, to dampen the unwanted coupling. The sound stopped.

We also knew that ultrasound isn’t considered harmful to humans—for the most part. Misused, an ultrasonic emitter that’s in direct contact with a person’s body can heat tissues and damage organs. And the U.S. Occupational Safety and Health Administration (OSHA) warns that audible subharmonics caused by intense airborne ultrasonic tones can be harmful. Thus, U.S. standards on ultrasonic emissions build in safety margins to account for those subharmonics. The Canadian government, meanwhile, has ruled that humans can be directly harmed by airborne ultrasound at sound pressures of 155 decibels or higher—which is louder than a jet taking off at 25 meters. That ruling also notes that “a number of ‘subjective’ effects have been reportedly caused by airborne ultrasound, including fatigue, headache, nausea, tinnitus and disturbance of neuromuscular coordination.”

Of course, even at 155 dB, ultrasonic tones remain inaudible. Unless they’re not—more on this in a bit.

To make the problem tractable, we began by assuming that the source of the audible sounds in Cuba was indeed ultrasonic. Reviewing the OSHA guidance, Fu theorized that the sound came from the audible subharmonics of inaudible ultrasound. In contrast to harmonics, which are produced at integer multiples of a sound’s fundamental frequency, subharmonics are produced at integer divisors (or submultiples) of the fundamental frequency, such as 1/2 or 1/3. For instance, the second subharmonic of an ultrasonic 20-kHz tone is a clearly audible 10 kHz. Subharmonics didn’t quite explain the AP video, though: In the video, the spectral plot indicates tones evenly spaced every 180 Hz, whereas subharmonics would have appeared at progressively smaller fractions of the original frequency. Such a plot would not have the constant 180-Hz spacing.
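
The arithmetic makes the mismatch plain. Here is a minimal sketch in Python (our own illustration of the numbers stated above):

# Subharmonics of a hypothetical 20-kHz fundamental fall at f0/n, so the gaps
# between successive subharmonics shrink rather than staying a constant 180 Hz.
f0 = 20_000  # Hz
subharmonics = [f0 / n for n in range(2, 7)]
gaps = [a - b for a, b in zip(subharmonics, subharmonics[1:])]
print(subharmonics)  # roughly [10000, 6667, 5000, 4000, 3333]
print(gaps)          # roughly [3333, 1667, 1000, 667]: not evenly spaced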

Fu explained his theory to Chen Yan, a Ph.D. student in Xu’s lab. Yan wrote back: It’s not subharmonics—it’s intermodulation distortion.

Intermodulation distortion (IMD) is a bizarre effect. When multiple tones of different frequencies travel through air, IMD can produce several by-products at other frequencies. In particular, second-order IMD by-products will appear at the difference or the sum of the two tones’ frequencies. So if you start with a 25-kHz signal and a 32-kHz signal, the result could be a 7-kHz tone or a 57-⁠kHz tone. These by-products can be significantly lower in frequency while maintaining much of the intensity of the original tones.

IMD is well known to radio engineers, who consider it undesirable for radio communication. The sounds don’t have to travel through air; any “nonlinear medium” will do. A medium is considered nonlinear if a change in the output signal is not proportional to the change in the input. Acoustic devices such as microphones and amplifiers can also exhibit nonlinearity. One way to test for it is to send two pure tones into an amplifier or microphone and then measure the output. If additional tones appear in the output, then you know the device is nonlinear.
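
Here is a minimal sketch of that two-tone test in Python, with a toy quadratic nonlinearity standing in for a real device (the coefficient, sample rate, and detection threshold are arbitrary illustrative choices, not measurements of any actual hardware):

import numpy as np

fs = 192_000                          # sample rate high enough to represent ultrasound
t = np.arange(0, 0.1, 1 / fs)
x = np.sin(2 * np.pi * 25_000 * t) + np.sin(2 * np.pi * 32_000 * t)

# A toy nonlinear device: the linear response plus a small quadratic term.
y = x + 0.1 * x**2

spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1 / fs)

# Besides the original 25- and 32-kHz tones, expect new energy at 7 kHz
# (difference), 57 kHz (sum), and DC: the second-order IMD by-products.
print(freqs[spectrum > 0.08 * spectrum.max()])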

Computer science researchers have explored the physics of IMD. In the DolphinAttack paper, we used ultrasonic signals to trick a smartphone’s voice-recognition assistant. Because of nonlinearity in the smartphone’s microphone, the ultrasound produced by-products at audible frequencies inside the circuitry of the microphone. Thus, the IMD signal remains inaudible to humans, but the smartphone hears voices. In an early 2017 paper, Nirupam Roy, Haitham Hassanieh, and Romit Roy Choudhury at the University of Illinois at Urbana-Champaign described their BackDoor system [PDF] for using ultrasound and IMD to jam spy microphones, watermark music played at live concerts, and otherwise create “shadow” sounds.

Some composers and musicians have also used IMD to create synthetic sounds, combining audible tones to create other subliminal, audible tones. For example, in their 1987 book The Musician’s Guide to Acoustics, Murray Campbell and Clive Greated note that the last movement of Jean Sibelius’s Symphony No. 1 in E minor contains tones that lead to a rumbling IMD. The human ear processes sound in a nonlinear fashion, and so it can be “tricked” into hearing tones that weren’t produced by the instruments and that aren’t in the sheet music; those subliminal tones are produced when the played tones combine nonlinearly in the inner ear.

Back to our quest: Knowing that intermodulation distortion between multiple ultrasonic signals can cause lower-frequency by-products, we next set about simulating the effect in the lab, aiming to replicate what we observed in the AP News video. We used two signals: a pure 25-kHz tone and a 32-kHz carrier tone that had its amplitude modulated by a 180-Hz tone. (Our technical report, “On Cuba, Diplomats, Ultrasound, and Intermodulation Distortion” [PDF], goes into more detail on the math of how we did this.) The result was clear: Strong tones appeared at 7 kHz with repeating ripples separated by 180 Hz.
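
The gist of that setup can be reproduced in a few lines of NumPy. This is only a rough sketch, not the model from the technical report: it uses an idealized quadratic nonlinearity and a pure 180-Hz sine modulator, so it shows just the first pair of 180-Hz sidebands around 7 kHz rather than the dozen-plus ripples in the real recording:

import numpy as np

fs = 192_000
t = np.arange(0, 1.0, 1 / fs)

tone_25k = np.sin(2 * np.pi * 25_000 * t)
am_32k = (1 + 0.5 * np.sin(2 * np.pi * 180 * t)) * np.sin(2 * np.pi * 32_000 * t)

# A toy second-order nonlinearity standing in for the air and the microphone.
mix = tone_25k + am_32k
y = mix + 0.1 * mix**2

spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1 / fs)

# Inspect the audible band around 7 kHz: a 7-kHz peak flanked by sidebands
# spaced 180 Hz apart (6820 Hz and 7180 Hz in this simplified model).
band = (freqs > 6_500) & (freqs < 7_500)
print(freqs[band][spectrum[band] > 0.01 * spectrum.max()])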

We then followed up with live experiments. As in the simulation, we used two ultrasonic speakers to emit the signals, one as a 180-Hz sine wave amplitude modulated over a 32-kHz carrier, and the second as a single-tone 25-kHz sine wave. We used a smartphone to record the result. IMD caused by the air and the smartphone microphone created the telltale 7-kHz signal. This video shows the experimental setup:

If you look closely at the spectral plot displayed on the smartphone, you’ll notice some higher-order IMD by-products, at 4 kHz and beyond, as well as several other frequencies. Interestingly, although we could hear the 7-kHz tones during the experiment, we couldn’t hear the 4-kHz tones recorded by the smartphone. We suspect that the 4-kHz tones partly resulted from secondary IMD within the microphone itself. In other words, the microphone was hearing an acoustic illusion that we couldn’t hear.

For fun, we also experimented with using an ultrasonic carrier to eavesdrop on a room. In this kind of setup, a spy places a microphone to pick up speech and then uses the relatively low-frequency audio signal to modulate the amplitude of the carrier wave. The carrier wave then gets picked up by an ultrasonic-capable sensor located some distance away and demodulated to recover the original audio. In our experiments, we selected a song to stand in for the audio signal recorded by an eavesdropping microphone: Rick Astley’s 1980s hit “Never Gonna Give You Up.” We amplitude modulated the song on a 32-kHz ultrasonic carrier. When we introduced a 25-kHz sine wave to interfere with this covert ultrasonic channel, IMD in the air produced a 7-kHz audible tone with ripples associated with the tones of the song, which was then picked up by the recording device. The computer played the song after software demodulation.
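
Here is a minimal sketch of that modulate-then-demodulate chain in Python, with a 440-Hz test tone standing in for the song and a simple coherent demodulator (the modulation depth, filter order, and cutoff are illustrative choices, not the exact parameters of the experiment):

import numpy as np
from scipy.signal import butter, filtfilt

fs = 192_000
t = np.arange(0, 2.0, 1 / fs)

# Stand-in "speech": a 440-Hz tone in place of the song.
audio = 0.5 * np.sin(2 * np.pi * 440 * t)

# Amplitude modulate the audio onto a 32-kHz ultrasonic carrier.
carrier = np.sin(2 * np.pi * 32_000 * t)
transmitted = (1 + audio) * carrier

# Receiver: multiply by the carrier again, then low-pass back to baseband.
mixed = transmitted * carrier
b, a = butter(4, 5_000 / (fs / 2))      # 4th-order Butterworth, 5-kHz cutoff
recovered = filtfilt(b, a, mixed)
recovered -= recovered.mean()           # drop the DC term left over from (1 + audio)/2

# The recovered signal is proportional to the original audio.
print(np.corrcoef(recovered, audio)[0, 1])   # close to 1.0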

This video shows the results of our “rickroll” covert ops:

One thing to note in the video is that the metallic sounds near 7 kHz are audible only at the point where the two signals cross. When the signals do not intersect, you can’t hear the 7-kHz tone, but the demodulator can still play the covert song. That finding is consistent with what some diplomats reported in Cuba: The sounds they heard tended to be confined to a part of the room. When they moved just a few steps away, the sound stopped.

So if the sources of the sound in Cuba were ultrasonic, what could they have been? There are many sources of ultrasound in the modern world. At Michigan, our offices are bathed in 25-kHz signals coming from ceiling-mounted ultrasonic room-occupancy sensors. We’ve removed the devices closest to our lab equipment, but just last month we discovered a new one. [To learn more about our travails with these sensors, see “How an Ultrasonic Sensor Nearly Derailed a Ph.D. Thesis.”] Another source is ultrasonic pest repellents against rodents and insects. (This blog post describes a family’s encounter with such a device in the Havana airport.) And some automobiles contain ultrasonic emitters.

While the equipment we used in our Cuban re-creation is relatively bulky, ultrasonic emitters can be quite tiny, no larger than a piece of Rolo candy. Online, we found a manufacturer in Russia that sells a fashionable leather clutch that conceals an ultrasonic emitter, presumably to jam recording devices at cocktail parties. We also found electronics stores that carry high-⁠power ultrasonic jammers that cause microphones to malfunction. One advertised jammer emits 120-dB ultrasonic interference at a distance of 1 meter. That’s like standing next to a chainsaw. If a signal from that caliber jammer were to combine with a second ultrasonic source, audible by-products could result.

While the math leads us to believe that intermodulation distortion is a likely culprit in the Cuban case, we haven't ruled out other hypotheses that may account for the discomfort the diplomats felt. For example, maybe the tones people heard didn't cause their symptoms but were just another symptom, a clue to the real cause. Or maybe the sounds had some sort of nonauditory effect on people's hearing and physiology, through bone conduction or some other known phenomenon. Microwave radiation is another theory. One positive outcome from all this would be if more computer scientists were to master embedded security, signal processing, and systems engineering.

Even if our hypothesis is correct, we may never learn the definitive story. The parties responsible for the ultrasonic emitters would have already figured out by now that their devices are to blame and would have removed or deactivated them. But whether our hypothesis is correct or not, one thing is clear: Ultrasonic emitters can produce audible by-products that could have unintentionally harmed diplomats. That is, bad engineering may be a more likely culprit than a sonic weapon.

About the Authors

Kevin Fu is a Fellow of the IEEE and an associate professor of computer science and engineering at the University of Michigan–Ann Arbor, where he leads the Security and Privacy Research Group. He’s also chief scientist of the health-care cybersecurity startup Virta Labs. Wenyuan Xu is professor and chair of the department of systems science and engineering at Zhejiang University. Xu’s Ubiquitous System Security Lab (USSLab) has twice been recognized by the Tesla Security Researcher Hall of Fame. Chen Yan is a Ph.D. student at Zhejiang University.

To Probe Further

The authors’ technical report “On Cuba, Diplomats, Ultrasound, and Intermodulation Distortion” [PDF] (Technical Report CSE-TR-001-18, University of Michigan, Computer Science & Engineering, 1 March 2018) provides additional details on their simulation and experiments to reverse engineer the Cuban embassy “sonic weapon.”

AP News’s Josh Lederman and Michael Weissenstein were the first to report the Cuban sound recording, in “Dangerous sound? What Americans heard in Cuba attacks,” 13 October 2017.

For more on how sounds can be synthesized using intermodulation distortion, see “Sound Synthesis and Auditory Distortion Products,” by Gary S. Kendall, Christopher Haworth, and Rodrigo F. Cádiz, in Computer Music Journal, 38(4), MIT Press, Winter 2014.

A number of people have suggested that microwaves, rather than ultrasound, may have been at work in Cuba. See, for example, James C. Lin’s article “Strange Reports of Weaponized Sound in Cuba,” in IEEE Microwave Magazine, January/February 2018, pp. 18-19. A remaining question is whether microwaves could have produced the high-pitched sounds recorded by the smartphone in the AP News video.


Anandtech: Our Interesting Call with CTS-Labs


In light of the recent announcement of potential vulnerabilities in Ryzen processors, two stories have emerged. The first is that AMD processors could have secondary vulnerabilities in the secure processor and the ASMedia chipsets. The second concerns the company that released the report, CTS-Labs: its approach to this disclosure, the background of this previously unknown security-focused outfit, its intentions, and its corporate structure. Depending on the angle you take in the technology industry, whether as a security expert, a company, the press, or a consumer, one of these stories should interest you.

To make it clear, the two stories boil down to these questions:

1. What are the vulnerabilities, how bad are they, and what can be done?
2. Who are CTS-Labs, why does their approach to responsible disclosure differ from that of other security firms, why are a number of elements of the disclosure atypical of a security firm, what is their financial model, and who are their clients?

In our analysis of the initial announcement, we took time to look at what information we had on the flaws, as well as identifying a number of key features about CTS-Labs that did not fit our standard view of a responsible disclosure, along with a few points on Twitter that did not seem to add up. Since then, we have approached a number of experts in the field and a number of the companies involved, and attempted to drill down into the parts of the story that are not so obvious. I must thank the readers who reached out to me over email and through Twitter and who have helped immensely in getting to the bottom of what we are dealing with.

On the back of this, CTS-Labs has been giving a number of press interviews, leading to articles such as this one at our sister site, Tom's Hardware. CTS reached out to us as well; however, a number of factors led to delaying the call. Eventually we found a time to suit everyone, and it was confirmed in advance that everyone was happy for the call to be recorded for transcription purposes.

Joining me on the call was David Kanter, a long-time friend of AnandTech, semiconductor industry consultant, and owner of Real World Technologies. From CTS-Labs, we were speaking with Ido Li On, CEO, and Yaron Luk-Zilberman, CFO.


Ian Cutress, AnandTech
David Kanter, RealWorldTech
Ido Li On, CEO, CTS-Labs
Yaron Luk-Zilberman, CFO, CTS-Labs

The text here was transcribed from the recorded call. Some superfluous/irrelevant commentary has been omitted, with the wording tidied a little to be readable.

This text is being provided as-is, with minor commentary at the end. There is a substantial amount of interesting detail to pick through. We try to tackle both sides of the story in our questioning.

IC: Who are CTS-Labs, and how did the company start? What are the backgrounds of the employees?

YLZ: We are three co-founders, graduates of a unit called 8200 in Israel, a technological unit of intelligence. We have a background in security, and two of the co-founders have spent most of their careers in cyber-security and working as consultants for the industry performing security audits for financial institutions and defense organizations and so on. My background is in the financial industry, but I also have a technological background as well.

We came together at the beginning of 2017 to start this company, whose focus was to be hardware cyber security. As you guys probably know, this is a frontier/niche area now that most of the low-hanging fruit in software has been picked up. So this is where the game is moving, we think at least. The goal of the company is to provide security audits, and to deliver reports to our clients on the security of those products.

This is our first major publication. Mostly we do not go public with our results, we just deliver our results to our customers. I should say very importantly that we never deliver the vulnerabilities themselves that we find, or the flaws, to a customer to whom the product does not belong. In other words, if you come to us with a request for an audit of your own, we will give you the code and the proof-of-concepts, but if you want us to audit someone else’s product, even as a consumer of a product or a competitor’s product, or a financial institution, we will not give you the actual code – we will only describe to you the flaw that we find.

This is our business model. This time around, in this project, we started with ASMedia, and as you probably know the story moved to AMD, as they imported the ASMedia technology into their chipset. Having studied one, we started studying the other. This became a very large and important project, so we decided we were going to go public with the report. That is what has brought us here.

IC: You said that you do not provide flaws to companies that are not the manufacturer of what you are testing. Does that mean that your initial ASMedia research was done with ASMedia as a customer?

ILO: No. We can audit a product that the manufacturer of the product orders from us, or that somebody else, such as a consumer or an interested third party, orders from us, and then we will provide a description of the vulnerabilities, much like our whitepaper, but without the technical details needed to actually implement the exploit.

Actually ASMedia was a test project, as we’re engaged in many projects, and we were looking into their equipment and that’s how it started.

IC: Have you, either professionally or as a hobby, published exploits before?

ILO: No, we have not. That being said, we have been working in this industry for a very long time; we have done security audits for companies, found vulnerabilities, and given that information to the companies as part of consultancy agreements, but we have never actually gone public with any of those vulnerabilities.

IC: What response have you had from AMD?

ILO: We got the email today to say they were looking into it.

DK: If you are not providing Proof of Concept (PoC) to a customer, or technical details of an exploit, with a way to reproduce it, how are you validating your findings?

YLZ: After we do our validation internally, we take a third party validator to look into our findings. In this case it was Trail of Bits, if you are familiar with them. We gave them full code, full proof of concept with instructions to execute, and they have verified every single claim that we have provided to them. They have gone public with this as well.

In addition to that, in this case we also sent our code to AMD, and then Microsoft, HP, and Dell, the integrators, and also domestic and some other security partners. So they have all the findings. We decided to not make them public. The reason here is that we believe it will take many, many months for the company, even under ideal circumstances, to come out with a patch. So if we wanted to inform consumers about the risks that they have on the product, we just couldn't afford, in our minds, to not make the details public.

DK: Even when the security team has a good relationship with a company whose product has a potential vulnerability, simply verifying a security hole can take a couple of days at least. For example, with the code provided with Spectre, a security-focused outsider could look at the code and make educated guesses within a few minutes as to the validity of the claim.

ILO: What we've done is this. We have found thirteen vulnerabilities, and we wrote a technical write-up on each one of those vulnerabilities, with code snippets showing exactly how they work. We have also produced working PoC exploits for each one of the vulnerabilities, so you can actually exploit each one of them. And we have also produced very detailed tutorials on how to run the exploits on test hardware, step by step, to get all the results that we have been able to produce here in the lab. We documented it so well that when we gave it to Trail of Bits, they took it and ran the procedures by themselves, without talking to us, and reproduced every one of the results.

We took this package of documents, procedures, and exploits, and we sent it to AMD and the other security companies. That process took Trail of Bits about 4-5 days to complete, so I am very certain that they will be able to reproduce this. Also, we gave them a list of exactly what hardware to buy, and instructions, with all the latest BIOS updates and everything.

YLZ: We faced the problem of how to make a third-party validator not just sit there and say 'this thing works' but actually do it themselves without contacting us. We had to write a detailed manual, a step-by-step kind of thing. So we gave it to them, and Trail of Bits came back to us in five days. I think that the guys we sent it to are definitely able to do it within that time frame.

IC: Can you confirm that money changed hands with Trail of Bits?

(This was publicly confirmed by Dan Guido earlier, stating that they were expecting to look at one test out of curiosity, but 13 came through so they invoiced CTS for the work. Reuters reports that a $16000 payment was made as ToB’s verification fee for third-party vulnerability checking)

YLZ: I would rather not make any comments about money transactions and things of that nature. You are free to ask Trail of Bits.

IC: The standard procedure for vulnerability disclosure is to have a CVE filing and a MITRE number. We have seen public disclosures, even 0-day and 1-day public disclosures, with relevant CVE IDs. Can you describe why you haven't done so in this case?

ILO: We have submitted everything we have to US-CERT and we are still waiting to hear back from them.

IC: Can you elaborate as to why you did not wait for those numbers to come through before going live?

ILO: It’s our first time around. We haven’t – I guess we should have – this really is our first rodeo.

IC: Have you been in contact with ARM or Trustonic about some of these details?

ILO: We have not, and to be honest with you I don’t really think it is their problem. So AMD uses Trustonic t-Base as the base for their firmware on Ryzen processors. But they have built quite a bit of code on top of it and in that code are security vulnerabilities that don’t have much to do with Trustonic t-Base. So we really don’t have anything to say about T-Base.

IC: As some of these attacks go through TrustZone, an Arm Cortex-A5, and the ASMedia chipsets, can you speak about whether other products with these features could also be affected?

ILO: I think that the vulnerabilities found are very much … Actually let us split this up between the processor and the chipset as these are very different. 

For the secure processor, AMD built quite a thick layer on Trustonic t-Base. They added many features, and they also added a lot of features that break the isolation between processes running on top of t-Base. So there are a bunch of vulnerabilities there that are not from Trustonic. In that respect, we have no reason to believe that we would find these issues on any other product that is not AMD's.

Regarding the chipset, there you actually have vulnerabilities that affect a range of products. As we explained earlier, we got to AMD by looking at ASMedia chips. Specifically, we were looking into several lines of chips; one of them is the USB host controller line from ASMedia. We're talking about the ASM1042, ASM1142, and the recently released ASM1143. These are USB host controllers that you put on the motherboard; they connect on one side to PCIe, and on the other side they give you some USB ports.

What we found are these backdoors that we have been describing that come built into the chips – there are two sets of backdoors, hardware backdoors and software backdoors, and we implemented clients for those backdoors. The client works on AMD Ryzen machines but it also works on any machine that has these ASMedia chipsets and so quite a few motherboards and other PCs are affected by these vulnerabilities as well. If you search online for motherboard drivers, such as the ASUS website, and download ASMedia drivers for your motherboard, then those motherboards are likely vulnerable to the same issues as you would find on the AMD chipset. We have verified this on at least six vendor motherboards, mostly the Taiwanese manufacturers. So yeah, those products are affected.

IC: On the website, CTS-Labs states that the 0-day/1-day way of public disclosure is better than the 90-day responsible disclosure period commonly practiced in the security industry. Do you have any evidence to say that the paradigm you are pursuing with this disclosure is any better?

YLZ: I think there are pros and cons to both methods. I don’t think that it is a simple question. I think that the advantage of the 30 to 90 days of course is that it provides an opportunity for the vendor to consider the problem, comment on the problem, and provide potential mitigations against it. This is not lost on us.

On the other hand, I think that it also gives the vendor a lot of control over how it wants to address these vulnerabilities: it can first deal with the problem and then come out with its own PR about the problem (I'm speaking generally and not about AMD in particular here), and in general vendors attempt to minimize the significance. If the problem is indicative of a widespread issue, as is the case with the AMD processors, then the company probably would want to minimize it and play it down.

The second problem is that if mitigations are not available in the relevant timespan, this paradigm does not make much sense. We were talking to experts about the potential threat of these issues, and some of them are in the logic segment, in ASICs, so there is no obvious direct patch that can be developed as a workaround. That may or may not be available. The other kind requires issuing a patch in the firmware and then going through the QA process, and typically when it comes to processors, QA is a multi-month process.

I estimate it will be many many months before AMD is able to patch these things. If we had said to them, let’s say, ‘you guys have 30 days/90 days to do this’ I don’t think it would matter very much and it would still be irresponsible on our part to come out after the period and release the vulnerabilities into the open.

So basically the choice we were facing in this case was either to not tell the public, let the company fix it, and only then disclose, in which case we would have to wait, by our estimate, as much as a year, while meanwhile everyone is using the flawed product. Or alternatively, we never release the vulnerability details publicly, we give them to the company, and we disclose the existence of the flaws at the same time we are giving them to the company, so that customers are aware of the risks of those products and can decide whether to buy and use them, and so on.

In this case we decided that the second option is the more responsible one, but I would not say that in every case this is the better method. But that is my opinion. Maybe Ilia (our CTO) has a slightly different take on that. But these are my concerns.

IC: Would it be fair to say that you felt that AMD would not be able to mitigate these issues within a reasonable time frame, therefore you went ahead and made them public?

YLZ: I think that is a very fair statement. I would add that we saw that it was a big enough issue that the consumer had the right to know about it.

IC: Say, for example, CTS-Labs had been the ones to find Meltdown and Spectre. Would you have followed the same path of logic?

YLZ: I think that it would have depended on the circumstances of how we found it, how exploitable it was, how reproducible it was. I am not sure it would be the case. Every situation I think is specific.

DK: How are you absolutely sure that these issues cannot already be rectified in hardware? There are plenty of external chip designers who will not be able to tell you with any degree of certainty what can or cannot be rectified through microcode or through patches, through undocumented register flips, etc. Who were the chip design experts that you spoke to, and how confident are you that their assessment was correct?

ILO: Let us start by saying that everything we have said is our own estimate, based on talks that we have had with people in the semiconductor industry, with chip designers and so forth. But these are estimates. We cannot say for certain how long it will take them to patch it or to find a workaround, or if a workaround is possible. I know exactly what you are talking about: those chips could have undocumented features, and if you flip a bit in a register to enable some hidden feature or something that the engineers left behind in the design, that can help you to turn certain parts of the chip on and off. Maybe you can do that, and those registers are, more often than not, not very well documented even within the company itself. I know about this stuff, and everything that I am saying is our own estimate, it may be incorrect, we don't really know or understand [trails off and doesn't finish sentence]. Yeah.

In any case, the first thing I can say about this is that ASMedia produces ASICs. So I am fairly certain, based on everything we have read and the research we have done, that this is not an FPGA chip so they can’t just patch it with FPGA updates. I do not know if they have hidden features that would enable them to disable those features and I guess it is up to them to tell us.

YLZ: I think that it is up to them to announce what kind of workaround is available and how costly it will be in terms of disabling features and in terms of performance or whatnot. I think that as you have a hardware level flaw then it is a serious issue.

Regarding how long it would take, we have spoken to experts involved in the QA process in the semiconductor industry and we received a virtually unanimous response that the QA process is the longest part of the patching process. The patching itself may be simple or difficult but the QA process takes a long time.

In fact, the one AMD vulnerability that came out about three months ago, a lower-level vulnerability, I believe they still have not come out with a patch for it. And now we are talking about 13 of them.

ILO: I want to correct that, they did come out with a patch. It took them over three months. It was announced at the end of September, and the new version of AGESA containing the patch came out mid-January, and you have to go through the process of rolling out the new version of AGESA to different motherboard manufacturers and they have to add to their own QA process for the updates and that process might still be rolling two months after the patch came out. It’s a very long process.

But we can’t say with precision how long it will take AMD to come out with patches but we feel confident that it will take months and not weeks.

IC: How many security researchers did you disclose to before going public?

YLZ: You mean the technical details in full? Trail of Bits was the only external party and then afterwards together with the company we disclosed to Microsoft, HP, Dell, Symantec, FireEye, and CrowdStrike. They have the whole she-bang.

IC: Gadi Evron, the CEO of Cymmetria, has started talking on social media with knowledge of the vulnerabilities. Can you confirm you briefed them, or did they get the details some other way?

ILO: We are in touch with them, but they have not gone through the materials yet. They might decide to do that; we are going to see.

YLZ: They are collaborating with us, so they have seen quite a bit of the findings, but unlike Trail of Bits they have not got the full information, the step-by-step.

IC: Would there be any circumstance in which you would be willing to share the details of these vulnerabilities and exploits under NDA with us?

YLZ: We would love to, but there is one quirk. According to Israel export laws, we cannot share the vulnerabilities with people outside of Israel, unless they are a company that provides mitigations to such vulnerabilities. That is why we chose the list. But look, we are interested in the validation of this – we want people to come out and give their opinion, but we are only limited to that circle of the vendors and the security companies, so that is the limitation there.

IC: Would that also prevent you from publishing them publicly?

YLZ: That is an interesting question, I haven’t even thought about that.

ILO: We spoke to our lawyers, and generally, as far as I know, because I am not a lawyer, I don't think it stops us. That being said, we have no intention of publishing these vulnerabilities publicly to anyone outside the large security companies that can handle them.

IC: Sure, but on the website you have a table at the bottom that says that if anyone finds mitigations to these vulnerabilities they should get in contact, yet you have not supplied any details. How do you marry the fact that you are requesting mitigations with the fact that you are not providing any detail for anyone to replicate the issues?

ILO: We are in touch with two large security vendors right now who have the materials and are looking into the materials and producing mitigations. As soon as they do produce them we will definitely update the website.

YLZ: I would add that we can’t assume that we are the only people who have been looking into those processors and found problems there. So what we are saying is that in addition to ourselves, if anyone has mitigations against them, we are happy to share them with the company and to receive it from individuals.

IC: Even though not producing the details actively limits who can research the vulnerabilities?

YLZ: Yes.

There is nothing that I would love more than to have validation from the world, from you guys, and from everybody else, if I didn't think that I would be (a) jeopardizing users, because it is a long patching process, and (b) violating a couple of laws. But yes, that's the only thing. Now we are sitting here with our fingers crossed that the companies we gave this to, including AMD, come out with their response and accept or reject it. We are confident that all of this works, but we would love to hear from them.

Claimed Vulnerabilities

Vulnerability | Attacks | PoC Claimed
MasterKey | Secure Processor | Ryzen, EPYC, Ryzen Pro, Ryzen Mobile
Chimera | Promontory + ASMedia Controllers | Ryzen, Ryzen Pro
Ryzenfall | Secure OS | Ryzen, Ryzen Pro
Fallout | Secure Boot Loader | EPYC

IC: It was stated, and I quote, that 'this is probably as bad as it gets in the world of security'. These vulnerabilities are secondary attack vectors, they require admin-level access, and they do not work in virtualized environments (because you can't update a BIOS or chip firmware from a virtual machine without bare-metal access, which is typically impossible in a VM environment). What makes these worse than primary-level exploits that give admin access?

ILO: I think that this is an important question. I will give you my opinion. I think that the fact that this requires local admin privileges doesn't matter, in a sense, because the attack vector already has access to the files. What I think is particularly bad about this secondary attack is that it lets you put malware in hardware, such as the secure processor, which has the highest privileges in the system. You are sitting there and you can get to all memory sectors, and so from there you can stay undetected by antivirus software, and if the user reinstalls the operating system or formats the hard drive, you still stay there.

So if you think about attaching that to a routine attack, a primary attack, this thing can let an attacker stay there, conduct espionage, and sit there indefinitely. Now put yourself in the shoes of a person who discovers that an attack was using this tool and needs to decide what to do now; they are basically guessing which machines to throw out. That's one degree of severity, I think.

The other is the lateral movement issue, as you probably read in our whitepaper: the idea that you can break the virtualization of where the credentials are stored, where the Windows Credentials are in Windows 10. From this an attacker can move laterally in the network. I think it is obvious that one of the major barriers to lateral movement is the distinction between software and hardware. If you think about this not as a private user but as an organization that is facing an attack, this is very scary stuff, to think that hackers can have tools of this kind. This is why I think the language is not hyperbole.

IC: Most enterprise level networks are built upon systems that rely on virtual machines (VMs), or use thin clients to access VMs. In this circumstance no OS has bare metal access due to the hypervisor unless the system is already compromised…

ILO: Can I stop you there? That is not correct. That is entirely incorrect. We are talking about companies. You know we have a company here – imagine you had a company with four floors with workstations for employees that run Windows and sometimes you have a domain environment on the network….

IC: Those are desktop systems, I specified enterprise.

ILO: Yeah, this is enterprise, this is a company. As I said, it has four floors with computers inside. They may be running Ryzen Pro workstations. They may have a Microsoft Windows Domain server, maybe a file server, and what we are talking about here is lateral movement inside corporate networks like this one. This is ABC, this is what happens on TSX all over the world, with reports about how Chinese hackers behave when they hack US companies, and this is what it looks like.

IC: What do you suppose the market penetration is of Ryzen based corporate work deployments?

ILO: Well you know they are trying to push hard into this market right now but my own estimate - I don’t know I haven’t done the market research – but the market penetration is not very high. Hopefully it will stay this way until these issues have been resolved as it puts the network at risk.

YLZ: I think that if you look at the market penetration – I have done more market research on this – and I think that analysts are now estimating that by 2020 that AMD will have 10% worldwide server market share. That is in two years. That is quite a few computers out there.

IC: But that is server market share – based mostly on VM oriented systems.

DK: Bare metal access to servers is a different animal. If you take the deployments of Azure, even if you have root privileges, you are still running virtualized. Servers are different to desktops. 

ILO: Regarding servers, the main impact: let us say you are a customer of Microsoft Azure, which is integrating EPYC servers right now. You have a virtual machine on the server and that is all you have. To be honest with you, in that particular situation the vulnerabilities do not help you very much. However, if a server gets compromised, and the cloud provider is relying on secure virtualization to segregate customer data by encrypting memory, and someone runs an exploit on that server and breaks into the secure processor, they could tamper with this mechanism. I think this is one of the main reasons to integrate EPYC servers into the data center; it is the feature that EPYC offers to cloud providers, and that feature can be broken if someone gets access to the secure processor.

YLZ: If we're talking about the cloud specifically, rather than servers in your own data center, then the secure processor can be taken over with very high privileges. So I think it is a huge detail.

YLZ: As much as it is a pleasure talking to you we have time for only a few more questions.

IC: Can you describe how you came up with the names for these exploits?

YLZ: It was our creativity and fervent imagination.

IC: Did you pre-brief the press before you spoke to AMD?

ILO: What do you mean by pre-brief the press?

IC: We noticed that when the information went live, some press were ready to go with relevant stories and must have had the information in advance.

ILO: Before our announcement you mean?

IC: Correct.

ILO: I would have to check the timing on that and get back to you, I do not know off the top of my head.

DK: I think the biggest question that I still have is ultimately who originated this request for analysis – who was the customer that kicked this all off?

ILO: I definitely am not going to comment on our customers.

DK: What about the flavor of customer: is it a semiconductor company, is it someone in the industry, or is it someone outside the industry? I don’t expect you to disclose the name but the genre seems quite reasonable.

ILO: Guys I’m sorry we’re really going to need to jump off this call but feel free to follow up with any more questions.

[End of Call]

This call took place at 1:30pm ET on 3/14. After the call, we sent a series of 15 questions to CTS-Labs at 6:52pm ET on the same day. As of 7:10pm ET on 3/15, we have not had a response. These questions included elements related to

  • The use of a PR firm, which is non-standard practice for this kind of disclosure (and the PR firm was not involved in any way in our call, which is also odd),
  • Viceroy Research, a company known for shorting stock, and their 25-page blowout report published only three hours after the initial announcement,
  • And the 2018 SEC listing of the CFO as the President of NineWells Capital, a hedge fund based in New York with interests in equity and corporate debt investments and an emphasis on special situations.

If we get answers, we will share these with you.

Commentary and Clarification

All processors and chips have flaws – some are critical, others are simple, some can be fixed through hardware and software. There are security issues and errata in some processors that are several years old.

CTS-Labs' reasoning for believing that AMD cannot patch within a reasonable period, coming from 'industry experts', seems off – almost as if they believe that once they enter a responsible-disclosure time frame, they cannot reveal the vulnerabilities until they are fixed. Their response to the question about Meltdown and Spectre (whether they would have taken the same attitude, given that the industry had several months before publication for a coordinated effort) shows that despite offering a unilateral justification for 0-day disclosure, they would not apply it unilaterally but on a case-by-case basis. The language used is clearly not indicative of their actual feelings and policy.

Blaming the lack of CVE numbers on being 'new' to the process, while repeatedly citing many years of experience in security and cyber-security, is also contradictory. It may be their first public disclosure, even as individuals, but I find it hard to believe they were not prepared on the CVE front. According to our own experts, it can take hours for a well-known company to be issued a CVE number, or weeks for an unknown entity. Had CTS-Labs approached a big security firm, or AMD, these could have been issued relatively easily. CTS-Labs stated that they are waiting for CVE numbers to be issued; whether those numbers appear, and when the submission turns out to have been made, is going to be an interesting outcome.

It seems a bit odd for a company looking into ASMedia-related flaws to then turn its focus onto AMD's secure processor, using the chipset vulnerabilities as a pivot point. ASMedia chips, especially the USB host controllers cited by CTS-Labs, are used on literally tens of millions of Intel-based motherboards around the world, from all the major OEMs. For a long period of time, it was hard to find a system without one. The decision to pivot onto the newer AMD platforms rests on a weak argument, and that, combined with the wishy-washy language when discussing projects at the start of the company's existence and the abrupt ending of the call when asked to discuss the original customer, could be construed (this is conjecture here) as suggesting that the funding for the project was purposefully directed towards AMD.

The discussion about the understanding of how vulnerabilities can be mitigated certainly piqued David's interest. The number of things that can be done through microcode, or that are undocumented to third-party chip analysts, meant it came across as highly strange that CTS-Labs were so fervent in their belief that AMD could not patch the issue in a reasonable time, which was one of their many arguments against a longer responsible-disclosure period. As seen in recent vulnerabilities and responsible disclosures, chip designers have implemented a number of significant methods to enable/disable/adjust features that were not previously known to be adjustable, such as resetting a branch predictor. All it takes is for the chip company to say 'we can do this' and for it to be implemented, so the use of high-impact language was certainly noted. The confusion between microcode and FPGA in the discussion also raised an eyebrow or three.

When approaching the subject of virtualized environments, the short, sharp acceptance that these vulnerabilities were less of an issue in VMs and with cloud providers was quickly overshadowed by the doom-and-gloom message about what happens if a system is already compromised, even when it was stated that analysts expect 10% market share for EPYC by 2020. It was clear that the definitions of enterprise deployments differed between AnandTech and CTS-Labs, again partnered with liberal amounts of doom and gloom.

Lastly, the legal argument of not being able to share the details outside of Israel, or only to registered security companies outside of Israel, was an interesting one we were not expecting. This being coupled with the lack of knowledge on the effect of an open disclosure led us to reach out to our legal contacts that are familiar with the situation. This led to the line:

“It’s BS, no restrictions.”

Some of our contacts, and readers with security backgrounds, have privately confirmed that most of this is quite fishy. The combination of the methodology and the presentation, coming from a new company that claims years of experience yet cannot manage CVE numbers, is waving red flags.

Opinion

Going back to the two original questions, here is where I personally stand on the issue:

1. What are the vulnerabilities, how bad are they, and what can be done?

On the first question, if the vulnerabilities exist: it is very likely that these vulnerabilities are real. A secondary attack vector that could install monitoring software might be part of a multi-layer attack, but offering a place for indiscriminate monitoring of compromised systems can be seen as an important hole to fix. At this point, the nearest trusted source we have that these vulnerabilities are real is Alex Ionescu, a Windows internals expert who works for CrowdStrike, one of the companies that CTS-Labs says has the full disclosure documents. That is still a step too far removed from us to warrant a full confirmation. Given that Trail of Bits required 4-5 days to examine CTS-Labs' work, I suspect it will take AMD a similar amount of time to do so. If that is the case, AMD might have additional statements either on Friday or Monday, either confirming or rebutting the issues, and discussing future action.

2. Who are CTS-Labs, why does their approach to responsible disclosure differ from that of other security firms, why are a number of elements of the disclosure atypical of a security firm, what is their financial model, and who are their clients?

On the second question, the status of CTS-Labs: I'm more than willing to accept that, in a first public high-level disclosure, a security company can be out of step with a few of the usual expected methods of responsible disclosure and presentation. A number of new security companies that want to make a name for themselves have to be bold and brash to get new customers; however, we have never quite seen it to this extent – normally the work speaks for itself, or the security company will develop a relationship with the company that has the vulnerability and earn its kudos that way. The fact that CTS-Labs went with a polished website (with nine links to download the whitepaper, compared to the single link on the Meltdown/Spectre website) and a PR firm is definitely a different take. The unilateral reasoning for a 0-day/1-day disclosure, followed by a self-rebuttal when presented with a more significant example, shows elements of inconsistency in their immediate judgement. The lack of CVEs ready to go, despite the employees having many years of experience, including in Unit 8200, the Israeli equivalent of the NSA, also seems at odds; an experienced security team would have been ready. The swift acceptance that cloud-based systems are less affected, followed immediately by more doom and gloom despite the limited attack surface in that market, shows that they are focusing on the doom and gloom. The reluctance of CTS-Labs to talk about clients and funding, or previous projects, was perhaps to be expected.

The initial downside of this story coming into the news was the foreboding question of 'is this how we are going to do security now?'. Despite the actions of CTS and their decision to go with a 24-hour period, after speaking to long-term industry experts at high-profile technology companies, a standard 90-180 day pre-disclosure period is still the primary standard that manufacturers expect security companies to adhere to in order to actively engage in responsible disclosure and verification. We were told that to go beyond (or behind) this structure ultimately creates a level of distrust between the company, the security agency, and potentially the clients, regardless of the capabilities of the security researchers or the severity of the issues found; more so if the issues are blown out of proportion relative to their nature and attack surface.


Regular Expressions – Mastering Lookahead and Lookbehind


Lookarounds often cause confusion to the regex apprentice. I believe this confusion promptly disappears if one simple point is firmly grasped. It is that at the end of a lookahead or a lookbehind, the regex engine hasn't moved on the string. You can chain three more lookaheads after the first, and the regex engine still won't move. In fact, that's a useful technique.

A quick syntax reminder
This page digs deep into the details of lookahead and lookbehind and assumes you've already become familiar with the basic syntax, perhaps by reading the lookaround section of the reference on (? … ) syntax. As a quick reminder before we dive in, here are the four lookarounds.

Lookaround | Name | What it Does
(?=foo) | Lookahead | Asserts that what immediately follows the current position in the string is foo
(?<=foo) | Lookbehind | Asserts that what immediately precedes the current position in the string is foo
(?!foo) | Negative Lookahead | Asserts that what immediately follows the current position in the string is not foo
(?<!foo) | Negative Lookbehind | Asserts that what immediately precedes the current position in the string is not foo

Jumping Points
For easy navigation, here are some jumping points to various sections of the page:

Lookahead Example: Simple Password Validation
The Order of Lookaheads Doesn't Matter… Almost
Lookarounds Stand their Ground
Various Uses for Lookarounds
Zero-Width Matches
Positioning the Lookaround Before or After the Characters to be Matched
Lookarounds that Look on Both Sides: Back to the Future
Compound Lookahead and Compound Lookbehind
The Engine Doesn't Backtrack into Lookarounds (They're Atomic)
Fixed-Width, Constrained-Width and Infinite-Width Lookbehind
Lookarounds (Usually) Want to be Anchored


Lookahead Example: Simple Password Validation

Let's get our feet wet right away with an expression that validates a password. The technique shown here will be useful for all kinds of other data you might want to validate (such as email addresses or phone numbers).
Our password must meet four conditions:

1. The password must have between six and ten word characters \w
2. It must include at least one lowercase character [a-z]
3. It must include at least three uppercase characters [A-Z]
4. It must include at least one digit \d

We'll assume we're working in a regex flavor where \d only matches ASCII digits 0 through 9, unlike .NET and Python where that token can match any Unicode digit.
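
A quick Python illustration of that difference:

import re

# By default, Python's \d matches any Unicode decimal digit...
print(bool(re.match(r'\d', '٣')))        # True: Arabic-Indic digit three
# ...unless the pattern is restricted to ASCII with the (?a) flag.
print(bool(re.match(r'(?a)\d', '٣')))    # False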

With lookarounds, your feet stay planted on the string. You're just looking, not moving!

Our initial strategy (which we'll later tweak) will be to stand at the beginning of the string and look ahead four times—once for each condition. We'll look to check we have the right number of characters, then we'll look for a lowercase letter, and so on. If all the lookaheads are successful, we'll know the string is a valid password… And we'll simply gobble it all up with a plain .*

Let's start with condition 1
A string that is made of six-to-ten word characters can be written like this: \A\w{6,10}\z
The \A anchor asserts that the current position is the beginning of the string. After matching the six to ten word characters, the \z anchor asserts that the current position is the end of the string.

Within a lookahead, this pattern becomes (?=\A\w{6,10}\z). This lookahead asserts: at the current position in the string, what follows is the beginning of the string, six to ten word characters, and the very end of the string.

We want to make this assertion at the very beginning of the string. Therefore, to continue building our pattern, we want to anchor the lookahead with an \A. There is no need to duplicate the \A, so we can take it out of the lookahead. Our pattern becomes:
\A(?=\w{6,10}\z)
So far, we have an expression that validates that a string is entirely composed of six to ten word characters. Note that we haven't matched any of these characters yet: we have only looked ahead. The current position after the lookahead is still the beginning of the string. To check the other conditions, we just add lookaheads.

Condition 2
For our second condition, we need to check that the password contains one lowercase letter. To find one lowercase letter, the simplest idea is to use .*[a-z]. That works, but the dot-star first shoots down to the end of the string, so we will always need to backtrack. Just for the sport, can we think of something more efficient? You might think of making the star quantifier reluctant by adding a ?, giving us .*?[a-z], but that too requires backtracking as a lazy quantifier requires backtracking at each step.

For this type of situation, I recommend you use something like [^a-z]*[a-z] (or even better, depending on your engine, the atomic (?>[^a-z]*)[a-z] or possessive version [^a-z]*+[a-z]—but we'll discuss that in the footnotes). The negated character class [^a-z] is the counterclass of the lowercase letter [a-z] we are looking for: it matches one character that is not a lowercase letter, and the * quantifier makes us match zero or more such characters. The pattern [^a-z]*[a-z] is a good example of the principle of contrast recommended by the regex style guide.

Let's use this pattern inside a lookahead: (?=[^a-z]*[a-z])
The lookahead asserts: at this position in the string (i.e., the beginning of the string), we can match zero or more characters that are not lowercase letters, then we can match one lowercase letter: [a-z]
Our pattern becomes:
\A(?=\w{6,10}\z)(?=[^a-z]*[a-z])
At this stage, we have asserted that we are at the beginning of the string, and we have looked ahead twice. We still haven't matched any characters. Note that on a logical level it doesn't matter which condition we check first. If we swapped the order of the lookaheads, the result would be the same.

We have two more conditions to satisfy: two more lookaheads.

Condition 3
For our third condition, we need to check that the password contains at least three uppercase letters. The logic is similar to condition 2: we look for an optional number of non-uppercase letters, then one uppercase letter… But we need to repeat that three times, for which we'll use the quantifier {3}.
We'll use this lookahead: (?=(?:[^A-Z]*[A-Z]){3})

The lookahead asserts: at this position in the string (i.e., the beginning of the string), we can do the following three times: match zero or more characters that are not uppercase letters (the job of the negated character class [^A-Z] with the quantifier *), then match one uppercase letter: [A-Z]
Our pattern becomes:
\A(?=\w{6,10}\z)(?=[^a-z]*[a-z])(?=(?:[^A-Z]*[A-Z]){3})
At this stage, we have asserted that we are at the beginning of the string, and we have looked ahead three times. We still haven't matched any characters.

Condition 4
To check that the string contains at least one digit, we use this lookahead: (?=\D*\d). Opposing \d to its counterclass \D makes good use of the regex principle of contrast.

The lookahead asserts: at this position in the string (i.e., the beginning of the string), we can match zero or more characters that are not digits (the job of the "not-a-digit" character class \D and the * quantifier), then we can match one digit: \d
Our pattern becomes:
\A(?=\w{6,10}\z)(?=[^a-z]*[a-z])(?=(?:[^A-Z]*[A-Z]){3})(?=\D*\d)
At this stage, we have asserted that we are at the beginning of the string, and we have looked ahead four times to check our four conditions. We still haven't matched any characters, but we have validated our string: we know that it is a valid password.

If all we wanted was to validate the password, we could stop right there. But if for any reason we also need to match and return the entire string—perhaps because we ran the regex on the output of a function and the password's characters haven't yet been assigned to a variable—we can easily do so now.

Matching the Validated String
After checking that the string conforms to all four conditions, we are still standing at the beginning of the string. The five assertions we have made (the anchor \A and the four lookaheads) have not changed our position. At this stage, we can use a simple .* to gobble up the string: we know that whatever characters are matched by the dot-star, the string is a valid password. The pattern becomes:
\A(?=\w{6,10}\z)(?=[^a-z]*[a-z])(?=(?:[^A-Z]*[A-Z]){3})(?=\D*\d).*
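
If you want to try the finished pattern, here is a quick sanity check in Python. Note that Python's re module spells the end-of-string anchor \Z rather than \z; the sample passwords are just illustrations.

import re

# \Z is Python's spelling of the end-of-string anchor \z
password_re = re.compile(
    r'\A(?=\w{6,10}\Z)(?=[^a-z]*[a-z])(?=(?:[^A-Z]*[A-Z]){3})(?=\D*\d).*')

for candidate in ['ABCdef12', 'abcdef12', 'ABC12', 'ABCdefgh']:
    print(candidate, bool(password_re.match(candidate)))
# ABCdef12 True   (6-10 word chars, a lowercase, three uppercase, a digit)
# abcdef12 False  (no uppercase letters)
# ABC12    False  (only five characters)
# ABCdefgh False  (no digit)
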
Fine-Tuning: Removing One Condition

For n conditions,
use n-1 lookaheads

If you examine our lookaheads, you may notice that the pattern \w{6,10}\z inside the first one examines all the characters in the string. Therefore, we could have used this pattern to match the whole string instead of the dot-star .*

This allows us to remove one lookahead and to simplify the pattern to this:

\A(?=[^a-z]*[a-z])(?=(?:[^A-Z]*[A-Z]){3})(?=\D*\d)\w{6,10}\z
The pattern \w{6,10}\z now serves the double purpose of matching the whole string and of ensuring that the string is entirely composed of six to ten word characters.

Generalizing this result, if you must check for n conditions, your pattern only needs to include n-1 lookaheads at the most. Often, you are even able to combine several conditions into a single lookahead.

You may object that we were able to use \w{6,10}\z because it happened to match the whole string. Indeed that was the case. But we could also have converted any of the other three lookaheads to match the entire string. For instance, taking the lookahead (?=\D*\d) which checks for the presence of one digit, we can add a simple .*\z to get us to the end of the string.

The pattern would have become:
\A(?=\w{6,10}\z)(?=[^a-z]*[a-z])(?=(?:[^A-Z]*[A-Z]){3})\D*\d.*\z
By the way, you may wonder why I bother using the \z after the .*: shouldn't it get me to the end of the string? In general, not so: unless we're in DOTALL mode, the dot doesn't match line breaks. Therefore, the .* only gets you to the end of the first line. After this, the string may have line breaks and many more lines. A \z anchor ensures that after the .* we have reached not only the end of the line, but also the end of the string.

In this particular pattern, the first lookaround (?=\w{6,10}\z) already ensures that there cannot be any line breaks in the string, so the final \z is not strictly necessary.


The Order of Lookaheads Doesn't Matter… Almost

In our password validation pattern, since the three lookaheads don't change our position in the string, we can rearrange them in any order without affecting the overall logic.

While the order of lookaheads doesn't matter on a logical level, keep in mind that it may matter for matching speed. If one lookahead is more likely to fail than the other two, it makes little sense to place it in third position and expend a lot of energy checking the first two conditions. Make it first, so that if we're going to fail, we fail early—an application of the design to fail principle from the regex style guide.

In fact, this is what we do by placing the anchor \A in first position. Since it is an assertion that doesn't consume characters, it too could swap positions with any of the lookaheads. We'll see why this is a bad idea, but first…

In passing, consider that \A can be written with lookarounds: in DOTALL mode, where the dot matches any character including line breaks, the negative lookbehind (?<!.) asserts that what precedes the current position is not any character—therefore the position must be the beginning of the string. Without DOTALL mode, the negative lookbehind (?<![\D\d]) asserts the same, since [\D\d] matches one character that is either a digit or a non-digit—in other words, any character.

Now imagine we set \A in fourth position, after the three lookaheads. The resulting match would be the same, but it could take a lot more time. For instance, suppose the third lookahead (whose job it is to assert that the string contains at least one digit) fails. After failing to find a match at the first position in the string, the engine advances to the second position and tries the lookaheads again, one after the other. Once more, the third lookahead is bound to fail to find a digit. After each failure, the engine will start a new match attempt starting at the next position in the string. Even when the two first lookaheads succeed (and they may fail, as the uppercase or lowercase letter they check for may have been the lone one in the string, and at a position already passed), the third lookahead will always fail to find a digit. Therefore the anchor \A is never even attempted: the pattern fails before the engine reaches that token.

In contrast, when \A is first, it can only match at the first position in the string. The third lookahead still fails, but when the engine tries to match at further positions, the \A immediately fails, so the engine doesn't need to waste any more time with the lookaheads.


Lookarounds Stand their Ground

If I seem to be flogging a dead horse here, it's only because this point is the most common source of confusion with lookarounds. As the password validation example made clear, lookarounds stand their ground. They look immediately to the left or right of the engine's current position on the string—but do not alter that position.

Therefore, do not expect the pattern A(?=5) to match the A in the string AB25. Many beginners assume that the lookahead says that "there is a 5 somewhere to the right", but that is not so. After the engine matches the A, the lookahead (?=5) asserts that at the current position in the string, what immediately follows is a 5. If you want to check if there is a 5 somewhere (anywhere) to the right, you can use (?=[^5]*5).

Moreover, don't expect the pattern A(?=5)(?=[A-Z]) to match the A in the string A5B. Many beginners assume that the second lookahead looks to the right of the first lookahead. It is not so. At the end of the first lookahead, the engine is still planted at the very same spot in the string, after the A. When the lookahead (?=[A-Z]) tries to assert that what immediately follows the current position is an uppercase letter, it fails because the next character is still the 5. If you want to check that the 5 is followed by an uppercase letter, just state it in the first lookahead: (?=5[A-Z])

So lookahead and lookbehind don't mean "look way ahead into the distance". They mean "look at the text immediately to the left or to the right". If you want to inspect a piece of string further down, you will need to insert "binoculars" inside the lookahead to get you to the part of the string you want to inspect—for instance a .*, or, ideally, more specific tokens.
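
A quick check of these points in Python:

import re

print(re.search(r'A(?=5)', 'AB25'))          # None: a 5 does not immediately follow the A
print(re.search(r'A(?=[^5]*5)', 'AB25'))     # matches 'A': a 5 appears somewhere to the right
print(re.search(r'A(?=5)(?=[A-Z])', 'A5B'))  # None: both lookaheads start right after the A
print(re.search(r'A(?=5[A-Z])', 'A5B'))      # matches 'A': the 5 is followed by an uppercase letter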


Various Uses for Lookarounds

Before we dive into interesting but sometimes terse details, let's get excited about lookarounds by surveying some of their terrific uses.

Validation
The password validation section showed how the combination of several lookaheads can impose a number of conditions on the string to be matched, allowing us to validate it with a single pattern.

Restricting a Character Range (Subtraction, Intersection)
Suppose you want to match one word character \w as long as it is not the letter Q. There are several ways to do it without lookarounds:
✽ In engines that support character class subtraction, you can use [\w-[Q]] (.NET), [\w&&[^Q]] (Java and Ruby 1.9+) or [\w--Q] (Python with the alternate regex module)
✽ You can build a character class such as [_0-9a-zA-PR-Z]
✽ You can use [^\WQ]—an example of an obnoxious double-negative character range.

If your engine doesn't support character class subtraction, the simplest may be to use the workaround shown on the page about class operations. This uses a lookahead to restrict the character class \w:
(?!Q)\w After the negative lookahead asserts that what follows the current position is not a Q, the \w matches a word character.

Not only is this solution easy to read, it is also easy to maintain if we ever decide to exclude the letter K instead of Q, or to exclude both: (?![QK])\w

Note that we can also perform the same exclusion task with a negative lookbehind:
\w(?<!Q) After the \w matches a word character, the negative lookbehind asserts that what precedes the current position is not a Q.

Using the same idea, if we wanted to match one character in the Arabic script as long as it is not a number, we could use this pattern:
(?!\p{N})\p{Arabic} This would work in Perl, PCRE (C, PHP, R…) and Ruby 2+. In .NET and Java, you would use (?!\p{N})\p{IsArabic}

Likewise, we can use this technique to perform a DIY character class intersection. For instance, to match one character in the Arabic script as long as it is a number, we transform the negative lookahead above to a positive lookahead. In the Perl / PCRE / Ruby version, this gives us:
(?=\p{N})\p{Arabic}
This is basically the password validation technique with two conditions applied to a single character.

Needless to say, you can interchange the content of the lookahead with the token to be matched: (?=\p{Arabic})\p{N}
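
A minimal check of the exclusion technique in Python, which supports both the lookahead and the lookbehind versions:

import re

print(re.findall(r'(?!Q)\w', 'QUIZ'))         # ['U', 'I', 'Z']
print(re.findall(r'\w(?<!Q)', 'QUIZ'))        # ['U', 'I', 'Z']
print(re.findall(r'(?![QK])\w', 'QUIZ Kit'))  # ['U', 'I', 'Z', 'i', 't']
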
Tempering the scope of a token
This use is similar to the last. Instead of removing characters from a class, it restricts the scope within which a token is allowed to match.

For instance, suppose we want to match any character as long as it is not followed by {END}. Using a negative lookahead, we can use:
(?:(?!{END}).)* Each . token is tempered by (?!{END}), which specifies that the dot cannot be the beginning of {END}. This technique is called tempered greedy token on the Quantifiers page.

Another technique is:
(?:[^{]++|{(?!END}))*+ On the left side of the alternation, [^{]++ matches characters that are not an opening brace. On the right side, {(?!END}) matches an opening brace that is not followed by END}. This technique appears in the Explicit Greedy Alternation section of the Quantifiers page.
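
A minimal Python check of the tempered-dot version (the atomic and possessive variants above are not supported by Python's re module):

import re

text = 'keep all of this{END}but none of that'
print(re.match(r'(?:(?!\{END\}).)*', text).group())  # 'keep all of this'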

Delimiter
Do you have a string where you want to start matching all characters once the first instance of #START# is passed? No problem, just use a lookbehind to make a delimiter:
(?<=#START#).* After the lookbehind asserts that what immediately precedes the current position is #START#, the dot-star .* matches all the characters to the right.

Or would you like to match all characters in a string up to, but not including the characters #END#? Make a delimiter using a lookahead:
.*?(?=#END#)
You can, of course, combine the two:
(?<=#START#).*?(?=#END#)
See the page on boundaries for advice on building fancy DIY delimiters.
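
A minimal Python check of the combined delimiters:

import re

s = 'header #START#the good stuff#END# footer'
print(re.search(r'(?<=#START#).*?(?=#END#)', s).group())  # 'the good stuff'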

Inserting Text at a Position
Someone gave you a file full of film titles in CamelCase, such as HaroldAndKumarGoToWhiteCastle. To make it easier to read, you want to insert a space at each position between a lowercase letter and an uppercase letter. This regex matches these exact positions:
(?<=[a-z])(?=[A-Z])
In your text editor's regex replacement function, all you have to do is replace the matches with space characters, and spaces will be inserted in the right spots.

This regex is what's known as a "zero-width match" because it matches a position without matching any actual characters. How does it work? The lookbehind asserts that what immediately precedes the current position is a lowercase letter. And the lookahead asserts that what immediately follows the current position is an uppercase letter.
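
In Python, for instance, re.sub can perform the replacement directly:

import re

print(re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', 'HaroldAndKumarGoToWhiteCastle'))
# Harold And Kumar Go To White Castle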

Splitting a String at a Position
We can use the exact same regex from the previous example to split the string AppleOrangeBananaStrawberryPeach into a list of fruits. Once again, the regex
(?<=[a-z])(?=[A-Z]) matches the positions between a lowercase letter and an uppercase letter.

In most languages, when you feed this regex to the function that uses a regex pattern to split strings, it returns an array of words.

Note that Python's re module does not split on zero-width matches—but the far superior regex module does.
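
A minimal check in Python. The third-party regex module splits on this zero-width pattern, and, if memory serves, the standard re module handles it as well from Python 3.7 onward:

import re

print(re.split(r'(?<=[a-z])(?=[A-Z])', 'AppleOrangeBananaStrawberryPeach'))
# ['Apple', 'Orange', 'Banana', 'Strawberry', 'Peach']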

Finding Overlapping Matches
Sometimes, you need several matches within the same word. For instance, suppose that from a string such as ABCD you want to extract ABCD, BCD, CD and D. You can do it with this single regex:
(?=(\w+)) When you allow the engine to find all matches, all the substrings will be captured to Group 1

How does this work?

At the first position in the string (before the A), the engine starts the first match attempt. The lookahead asserts that what immediately follows the current position is one or more word characters, and captures these characters to Group 1. The lookahead succeeds, and so does the match attempt. Since the pattern didn't match any actual characters (the lookahead only looks), the engine returns a zero-width match (the empty string). It also returns what was captured by Group 1: ABCD

The engine then moves to the next position in the string and starts the next match attempt. Again, the lookahead asserts that what immediately follows that position is word characters, and captures these characters to Group 1. The match succeeds, and Group 1 contains BCD.

The engine moves to the next position in the string, and the process repeats itself for CD then D.

In .NET, which has infinite lookbehind, you can find overlapping matches from the other side of the string. For instance, on the same string ABCD, consider this pattern:
(?<=(\w+))
It will capture A, AB, ABC and ABCD. To achieve the same in an engine that doesn't support infinite lookbehind, you would have to reverse the string, use the lookahead version (?=(\w+)) then reverse the captures.
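
A quick check of the lookahead version in Python:

import re

print(re.findall(r'(?=(\w+))', 'ABCD'))  # ['ABCD', 'BCD', 'CD', 'D']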


Zero-Width Matches

As we've seen, a lookaround looks left or right but it doesn't add any characters to the match to be returned by the regex engine. Likewise, an anchor such as ^ and a boundary such as \b can match at a given position in the string, but they do not add any characters to the match.

Usually, lookaheads, lookbehinds, anchors and boundaries appear in patterns that contain tokens that do match characters, allowing the engine to return a matched string. For instance, in (?<=start_)\d+, the engine matches and returns some digits, but not the prefix start_

However, if a pattern only contains lookarounds, anchors and boundaries, the engine may be able to match the pattern without matching any characters. The resulting match is called a zero-width match because it contains no characters.

This can be a useful technique, and we have already seen some applications of zero-width matches in the section on uses for lookarounds. To bring them together under one heading, here are some of their main uses.

Validation
If you string several lookarounds in a row, you can validate that a string conforms to a set of rules, as in the password validation technique.

We saw that when you have n conditions, if you also want to match the string, you usually need n-1 lookarounds at the most as one condition can be removed and used in the matching section of the pattern. But if all you want to do is validate, all the conditions can stay inside lookarounds, giving you a zero-width match.

Inserting
You can use a zero-width match regex to match a position in a string and insert text at that position. For instance, by matching (?m)^ (the beginning of a line in multiline mode) and replacing the match with // , you can add a prefix to every line of a file.
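
For instance, a minimal Python sketch of the line-prefix trick (the '// ' prefix is just an illustration):

import re

code = 'first line\nsecond line'
print(re.sub(r'(?m)^', '// ', code))
# // first line
# // second line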

Likewise, we saw how the zero-width pattern (?<=[a-z])(?=[A-Z]) allows you to insert characters in a CamelCase word.

Splitting
We saw how the same zero-width pattern (?<=[a-z])(?=[A-Z]) allows you to split a CamelCase word into its components.

Overlapping Matches
We saw how an unanchored lookaround that contains capture groups—such as (?=(\w+))—allows you to match overlapping string segments.


Positioning the Lookaround

Often, you have two options for positioning a lookaround: before the text to be matched, or after. Usually, one of the options is more efficient because it requires less work of the engine.

To illustrate this, here are examples for each kind of lookaround. I borrowed them from the lookarounds section of the main syntax page, where they are discussed in greater detail.

Lookahead
\d+(?= dollars) and (?=\d+ dollars)\d+ both match 100 in 100 dollars, but the first is more efficient because the engine needs to match \d+ only once.

Negative Lookahead
\d+(?! dollars) and (?!\d+ dollars)\d+ both match 100 in 100 pesos, but the first is more efficient because the engine needs to match \d+ only once.

Lookbehind
(?<=USD)\d{3} and \d{3}(?<=USD\d{3}) both match 100 in USD100, but the first is more efficient because the engine needs to match \d{3} only once.

Negative Lookbehind
(?<!USD)\d{3} and \d{3}(?<!USD\d{3}) both match 100 in JPY100, but the first is more efficient because the engine needs to match \d{3} only once.

As each pair shows, the two placements have a slightly different feel. Please don't obsess over the differences; rather, just cruise through these simple examples to become familiar with the types of effects you can achieve.

The point of the examples is not to make you memorize "the right position", but to expose you to those two basic feels. Once you're familiar with them, you will naturally think of rewriting a lookaround that feels too heavy, and with a bit of practice the efficient way of positioning your lookarounds will come to you on its own.


Lookarounds that Look on Both Sides: Back to the Future

Suppose you want to match a two-digit number surrounded by underscores as in _12_ but not the underscores.

We have already seen three ways to do this:
✽ You can match everything and capture the digits to Group 1: _(\d{2})_
✽ You can use a lookbehind and a lookahead: (?<=_)\d{2}(?=_)
✽ You can use \K to drop the first underscore from the match: _\K\d{2}(?=_)

There is a fourth technique I'd like to introduce you to. I call it the "back to the future lookbehind." There shouldn't be any reason to use it on its own, but sometimes within an intricate pattern it may be just what you need, so it's nice to be familiar with it and add it to your repertoire.

We can position our back-to-the-future lookbehind before or after the digits. Let's start with the before version:
(?<=_(?=\d{2}_))\d+
Wowzy, what does this do? The lookbehind asserts that what immediately precedes the current position in the string is an underscore, then a position where the lookahead (?=\d{2}_) can assert that what immediately follows is two digits and an underscore.

This is interesting for several reasons. First, we have a lookahead within a lookbehind, and even though we were supposed to look backwards, this lookahead jumps over the current position by matching the two digits and the trailing underscore. That's acrobatic.

Second, note that even though it looks complex, this is a fixed-width lookbehind (the width is one character, the underscore), so it should work in all flavors of lookbehind. (However, it does not work in Ruby as Ruby does not allow lookaheads and negative lookbehinds inside lookbehind.)

Another interesting feature is how the notion of "current position in the string" is not the same for the lookbehind and for the lookahead. You'll remember that lookarounds stand their ground, so that after checking the assertion made by a lookaround, the engine hasn't moved in the string. Are we breaking that rule?

We're not. In the string 10 _16_ 20, let's say the engine has reached the position between the underscore and the 1 in 16. The lookbehind makes an assertion about what can be matched at that position. When the engine exits the lookbehind, it is still standing in that same spot, and the token \d{2} can proceed to match the characters 16.

But within the lookbehind itself, we enter a different little world. You can imagine that outside that world the engine is red, and inside the little world of the lookbehind, there is another little engine which is yellow. That yellow engine keeps track of its own position in the string. In most engines (.NET proceeds differently), the yellow engine is initially dropped at a position in the string that is found by taking the red engine's position and subtracting the width of the lookbehind, which is 1. The yellow engine therefore starts its work before the leading underscore. Within the lookbehind's little world, after matching the underscore token, the yellow engine's position in the string is between the underscore and the 1. It is that position that the lookahead refers to when it asserts that at the current position in the string (according to the little world of the lookbehind and its yellow engine), what immediately follows is two digits and an underscore.

After the digits
Here is a second version where the "back-to-the-future lookbehind" comes after the digits:
\d+(?<=_\d{2}(?=_))
The lookbehind states: what immediately precedes this position in the string is an underscore and two digits, then a position where the lookahead (?=_) can assert that what immediately follows the current position in the string (according to the yellow engine and the lookbehind's little world) is an underscore.

This too is a fixed-width lookbehind (the width is three characters, i.e. the leading underscore and the two digits), so it should work in all flavors of lookbehind except Ruby.


Compound Lookahead and Compound Lookbehind

The back-to-the-future lookbehind introduced us to what I call compound lookarounds, i.e., lookarounds that contain other lookarounds. You could also call them nested lookarounds, but for me the idea of compounding captures something more about the feel of working with these constructs.

Let's look at some examples.

Token followed by one character, but not more
How can you match a number that is followed by one underscore, but not more?

You can use this:
\d+(?=_(?!_)) The lookahead asserts: what follows the current position in the string is one underscore, then a position where the negative lookahead (?!_) can assert that what follows is not an underscore. A less elegant variation would be \d+(?=(?!__)_)
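
A minimal check in Python:

import re

print(re.findall(r'\d+(?=_(?!_))', '12_ 34__ 56'))   # ['12']
print(re.findall(r'\d+(?=(?!__)_)', '12_ 34__ 56'))  # ['12'] (the less elegant variation)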

Token preceded by one character, but not more
How can you match a number that is preceded by one underscore, but not more?

You can use this:
(?<=(?<!_)_)\d+ The lookbehind asserts: what precedes the current position in the string is a position where the negative lookbehind (?<!_) can assert that what immediately precedes is not an underscore, then an underscore. A variation would be (?<=_(?<!__))\d+

Multiple Compounding
Needless to say, it won't be long until you find occasions to add levels of compounding beyond the two we've just seen. But that quickly becomes obnoxious, and it becomes simpler to rearrange the regex. For instance, building on the previous pattern,
(?<=(?<!(?<!X)_)_)\d+ matches a number that is preceded by an underscore that is not preceded by an underscore unless that underscore is preceded by an X.

In .NET, PCRE, Java and Ruby, this could be simplified to (?<=(?<!_)_|X__)\d+
In Perl and Python, you could use (?:(?<=(?<!_)_)|(?<=X__))\d+


The Engine Doesn't Backtrack into Lookarounds…

…because they're atomic
Here's a fun regex task. You have a string like this:
_rabbit _dog _mouse DIC:cat:dog:mouse
The DIC section at the end contains a list of allowed animals. Our job is to match all the _tokens named after an allowed animal. Therefore, we expect to match _dog and _mouse. A lookaround helps us do this:

_(\w+)\b(?=.*:\1\b)
After matching the underscore, we capture a word to Group 1. Then the lookahead (?=.*:\1\b) asserts what follows the current position in the string is zero or more characters, then a colon, then the word captured to Group 1. As hoped, this matches both _dog and _mouse.

Now suppose we try a "reversed" approach:

_(?=.*:(\w+)\b)\1\b
This only matches _mouse. Why?

First let's try to understand what this regex hopes to accomplish. It may not be that obvious, but it illustrates an important feature of lookarounds.

After the engine matches the underscore, the lookahead (?=.*:(\w+)\b) asserts that what follows the current position in the string is any number of characters, then a colon, then a word (captured to Group 1). After passing that assertion, the back-reference \1 matches what was captured into Group 1.

Let's see how this works out. Remember that our string is
_rabbit _dog _mouse DIC:cat:dog:mouse
After the underscore that precedes rabbit, we expect the lookahead to fail because there is no rabbit in the DIC section—and it does. The next time we match an underscore is before dog. At that stage, inside the lookahead (?=.*:(\w+)\b), the dot-star shoots down to the end of the string, then backtracks just far enough to allow the colon to match, after which the word mouse is matched and captured to Group 1. The lookahead succeeds. The next token \1 tries to match mouse, but the next character in the string is the d from dog, so the token fails. At this stage, having learned everything about backtracking, we might assume that the regex engine allows the dot-star to backtrack even more inside the lookahead, up to the previous colon, which would then allow (\w+) to match and capture mouse. Then the back-reference \1 would match mouse, and the engine would return a successful match.

However, it does not work that way. Once the regex engine has left a lookaround, it will not backtrack into it if something fails somewhere down the pattern. On a logical level, that is because the official point of a lookaround is to return one of two values: true or false. Once a lookahead evaluates to true at a given position in the string, it is always true. From the engine's standpoint, there is nothing to backtrack. What would be the point—since the only other available value is false, and that would fail the pattern?

The fact that the engine will not backtrack into a lookaround means that it is an atomic block. This property of lookarounds will rarely matter, but if someday, in the middle of building an intricate pattern, a lookahead refuses to cooperate… This may be the reason.
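
A quick way to convince yourself of this behavior is to run both patterns from the example above; here in Python, whose lookarounds behave the same way:

import re

s = '_rabbit _dog _mouse DIC:cat:dog:mouse'
print([m.group() for m in re.finditer(r'_(\w+)\b(?=.*:\1\b)', s)])  # ['_dog', '_mouse']
print([m.group() for m in re.finditer(r'_(?=.*:(\w+)\b)\1\b', s)])  # ['_mouse']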


Fixed-Width, Constrained-Width and Infinite-Width Lookbehind

In strings such as 123456_ORANGE abc12_APPLE, suppose you are interested in matching uppercase words, provided they are preceded by a prefix composed of digits and an underscore character. Therefore, in this string, you want to match ORANGE but not APPLE.

It's worth remembering that in most regex flavors (.NET is one of the few exceptions), the following pattern is invalid:

(?<=\b\d+_)[A-Z]+
That is because the width of the text matched by the token \d+ can be anything. Most engines require the width of the subexpression within a lookbehind to be known in advance, as in (?<=\d{3})

Some engines allow the width of the subexpression within a lookbehind to take various pre-determined values found on the various sides of an alternation, as in (?<=0|128|\d{6}). Yet others allow the width to vary within a pre-determined range, as in (?<=\d{2,6})

For details of what kinds of widths various engines allow in a lookbehind, see the Lookbehind: Fixed-Width / Constrained Width / Infinite Width section of the main syntax page. To honor the winners, I'll just repeat here that the only two programming-language flavors that support infinite-width lookbehind are .NET (C#, VB.NET, …) and Matthew Barnett's regex module for Python. I've also implemented an infinite lookbehind demo for PCRE.

Capture Group Inside Variable Lookbehind: Difference between Java and .NET
Both Java and .NET allow this pattern:
(?<=(\d{1,5}))Z
.NET allows it because it supports infinite-width lookbehind. Java allows it because it supports lookbehind whose width falls within a defined range. However, they operate differently. As a result, against the string 123Z, this pattern will return different Group 1 captures in the two engines.

✽ Java captures 3 to Group 1. The engine sees that the width of the string to be matched inside the lookbehind must fall between one and five characters. Java tries all the possible fixed-width patterns in the range, from the shortest to the longest, until one succeeds. The shortest possible fixed-width pattern is (?<=(\d{1})). The engine temporarily skips back one character in the string, tries to match \d{1} and succeeds. The lookaround succeeds, and Group 1 contains 3.

✽ .NET captures 123 to Group 1. The .NET engine has a far more efficient way of processing variable-width lookbehinds. Instead of trying multiple fixed-width patterns starting at points further and further back in the string, .NET reverses the string as well as the pattern inside the lookbehind, then attempts to match that single pattern on the reversed string. Therefore, in 123Z, to try the lookbehind at the point before Z, it reverses the portion of string to be tested from 123 to 321. Likewise, the lookbehind (?<=(\d{1,5})) is flipped into the lookahead (?=(\d{1,5})). \d{1,5} matches 321. Reversing that string, Group 1 contains 123. To only capture 3 as in Java, you would have to make the quantifier lazy: (?<=(\d{1,5}?))Z

✽ Like .NET, the alternate regex module for Python captures 123 to Group 1.

Workarounds
There are two main workarounds to the lack of support for variable-width (or infinite-width) lookbehind:

✽ Capture groups.
Instead of (?<=\b\d+_)[A-Z]+, you can use \b\d+_([A-Z]+), which matches the digits and underscore you don't want to see, then matches and captures to Group 1 the uppercase text you want to inspect. This will work in all major regex flavors (see the sketch after this list).

✽ The \K "keep out" verb, which is available in Perl, PCRE (C, PHP, R…), Ruby 2+ and Python's alternate regex engine.
\K tells the engine to drop whatever it has matched so far from the match to be returned. Instead of (?<=\b\d+_)[A-Z]+, you can therefore use \b\d+_\K[A-Z]+

Compared with lookbehinds, both the \K and capture group workarounds have limitations:

✽ When you look for multiple matches in a string, at the starting position of each match attempt, a lookbehind can inspect the characters behind the current position in the string. Therefore, against 123, the pattern (?<=\d)\d (match a digit preceded by a digit) will match both 2 and 3. In contrast, \d\K\d can only match 2, as the starting position after the first match is immediately before the 3, and there are not enough digits left for a second match. Likewise, \d(\d) can only capture 2.

✽ With lookbehinds, you can impose multiple conditions (similar to our password validation technique) by using multiple lookbehinds. For instance, to match a digit that is preceded by a lower-case Greek letter, you can use (?<=\p{Ll})(?<=\p{Greek})\d. The first lookbehind (?<=\p{Ll}) ensures that the character immediately to the left is a lower-case letter, and the second lookbehind (?<=\p{Greek}) ensures that the character immediately to the left belongs to the Greek script. With the workarounds, you could use \p{Greek}\K\d to match a digit preceded by a character in the Greek script (or \p{Greek}(\d) to capture it), but you cannot impose a second condition. To get over this limitation, you could capture the Greek character and use a second regex to check that it is a lower-case letter.
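
As a minimal illustration of the capture-group workaround and its limitation, in Python (whose re module only supports fixed-width lookbehind):

import re

print(re.findall(r'\b\d+_([A-Z]+)', '123456_ORANGE abc12_APPLE'))  # ['ORANGE']

# The limitation with multiple matches: a lookbehind can re-inspect characters
# that an earlier match already consumed, while a capture group cannot.
print(re.findall(r'(?<=\d)\d', '123'))  # ['2', '3']
print(re.findall(r'\d(\d)', '123'))     # ['2']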


Lookarounds (Usually) Want to be Anchored

Let's imagine we want to match a string consisting of one word, provided it contains at least one digit. This pattern offers a reasonable solution—one of several:
\A(?=\D*\d)\w+\z
The \A anchor asserts that the current position is the beginning of the string. The lookahead (?=\D*\d) asserts that at the current position (which is still the beginning of the string), we can match zero or more non-digits, then one digit. Next, \w+ matches our word. Finally, the \z anchor asserts that the current position is the end of the string.

Now consider what happens when we forget the anchor \A and use (?=\D*\d)\w+\z. To make our oversight seem less severe, let's assume we know that our string always contains an uninterrupted string of word characters. This guarantees that if we find a match, it will have to be the right one—at the beginning of the string, as we wanted. So what's the problem?

Suppose we use our regex on a string composed of one hundred characters V. Since the string doesn't contain a single digit, you and I can immediately see that the regex must fail. Let's see how fast the engine comes to the same conclusion.

As always, the engine begins by trying to match the pattern at the first position in the string. Starting with the first token (?=\D*\d), it tries to assert that at the current position, i.e. the beginning of the string, it can match zero or more non-digits, then one digit. Within the subexpression, the \D* matches all the V characters. The engine then tries to match a digit, but since we have reached the end of the string, that fails.

If we're using a smart engine such as PCRE, at this stage the engine fails the lookaround for this first match attempt. That's because before starting the match attempt, the engine has studied the pattern and noticed that the \D and \d tokens are mutually exclusive, and it has turned the * quantifier into a possessive quantifier *+, a process known to PCRE as auto-possessification (see footnote).

A less clever engine will backtrack, giving up all the \D characters it has matched one by one, each time attempting to match a \d after giving up a \D. Eventually, the engine runs out of characters to backtrack, and the lookahead fails.

Once the engine understands that the lookahead must fail (whether it comes to this conclusion cleverly or clumsily), it gives up on the entire first match attempt. Next, as always in such cases, the engine moves to the next position in the string (past the first V) and starts a new match attempt. Again, the \D* eats up all the V characters—although this time, there are only 99 of them. Again, the lookahead fails, either fast if the engine is smart, or, more likely, after backtracking all the way back to the starting position.

After failing a second time, the engine moves past the second V, starts a new match attempt, and fails… And so on, all the way to the end of the string.

Because the pattern is not anchored at the beginning of the string, at each match attempt, the engine checks whether the lookahead matches at the current position. In doing so, in the best case, it matches 100 V characters, then 99 on the second attempt, and so on—so it needs about 5000 steps before it can see that the pattern will never match. In the more usual case, the engine needs to backtrack and try the \d at each position, adding two steps at each V position. Altogether, it needs about 15,000 steps before it can see that the pattern will never match.

In contrast, with the original anchored pattern \A(?=\D*\d)\w+\z, after the engine fails the first match attempt, each of the following match attempts at further positions in the string fail instantly, because the \A fails before the engine gets to the lookahead. In the best case, the engine takes about 200 steps to fail (100 steps to match all the V characters, then one step at each of the further match attempts.) In the more usual case, the engine takes about 400 steps to fail (300 steps on the first match attempt, then one step at each of the further match attempts.)

Needless to say, the ratio of (15,000 / 400) steps is the kind of performance hit we try to avoid in computing. This makes a solid case for helping the engine along by minimizing the number of times lookaheads must be attempted, either by using anchors such as ^ and \A, or by matching literal characters immediately before the lookahead.

One Exception: Overlapping Matches
There are times when we do want the engine to attempt the lookahead at every single position in the string. Usually, the purpose of such a maneuver is to match a number of overlapping substrings. For instance, against the string word, if the regex (?=(\w+)) is allowed to match repeatedly, it will match four times, and each match will capture a different string to Group 1: word, ord, rd, then d. The section on overlapping matches explains how this works.

Footnotes

Atomic tweak
The atomic variation (?>[^a-z]*)[a-z] and the possessive version [^a-z]*+[a-z] are tweaks that ensure that if the engine fails to find the lowercase letter, it won't "stupidly" backtrack, giving up the non-lowercase letters one by one to see if a lowercase letter might fit at that stage.

Note that before they start matching, some engines notice the mutually exclusive character of [a-z] and its counterclass and automatically make the * quantifier possessive for you. This optimization is what PCRE calls auto-possessification. PCRE lets you turn it off with the special start-of-pattern modifier (*NO_AUTO_POSSESS), but why would you ever want to?


Interactive map of Linux kernel

Let's build an MP3-decoder (2008)

Even though MP3 is probably the single most well known file format and codec on Earth, it’s not very well understood by most programmers – for many of us, encoders and decoders are in the class of software “other people” write, like standard libraries or operating system kernels. This article will attempt to demystify the decoder, with short top-down primers on signal processing and information theory when necessary. Additionally, a small but not full-featured decoder will be written (in Haskell), suited to play around with.

The focus of this article is on concepts and the design choices the MPEG team made when they designed the codec – not on uninteresting implementation details or heavy theory. Some parts of a decoder are quite arcane and are better understood by reading the specification, a good book on signal processing, or the many papers on MP3 (see references at the end).

A note on the code: The decoder accompanying this article is written for readability, not speed. Additionally, some unusual features have been left out. The end result is a decoder that is inefficient and not standards compliant, but with hopefully readable code. You can grab the source here: mp3decoder-0.0.1.tar.gz. Scroll down to the bottom of the article or see README for build instructions.

A fair warning: The author is a hobby programmer, not an authority in signal processing. If you find an error, please drop me an e-mail. be@bjrn.se

With that out of the way, we begin our journey with the ear.

Human hearing and psychoacoustics

The main idea of MP3 encoding, and lossy audio coding in general, is removing acoustically irrelevant information from an audio signal to reduce its size. The job of the encoder is to remove some or all information from a signal component, while at the same time not changing the signal in such a way that audible artifacts are introduced.

Several properties (or “deficiencies”) of human hearing are used by lossy audio codecs. One basic property is that we can’t hear above 20 kHz or below 20 Hz, approximately. Additionally, there’s a threshold of hearing – once a signal is below a certain threshold it can’t be heard; it’s simply too quiet. This threshold varies with frequency; a 20 Hz tone can only be heard if it’s stronger than around 60 decibels, while frequencies in the region 1-5 kHz can easily be perceived at low volume.

A very important property affecting the auditory system is known as masking. A loud signal will “mask” other signals sufficiently close in frequency or time; meaning the loud signal modifies the threshold of hearing for spectral and temporal neighbors. This property is very useful: not only can the nearby masked signals be removed; the audible signal can also be compressed further as the noise introduced by heavy compression will be masked too.

This masking phenomenon happens within frequency regions known as critical bands – a strong signal within a critical band will mask frequencies within the band. We can think of the ear as a set of band pass filters, where different parts of the ear pick up different frequency regions. An audiologist or acoustics professor has plenty to say about critical bands and the subtleties of masking effects; however, in this article we are taking a simplified engineering approach: for our purpose it’s enough to think of these critical bands as fixed frequency regions where masking effects occur.

Using the properties of the human auditory system, lossy codecs and encoders remove inaudible signals to reduce the information content, thus compressing the signal. The MP3 standard does not dictate how an encoder should be written (though it assumes the existence of critical bands), and implementers have plenty of freedom to remove content they deem imperceptible. One encoder may decide a particular frequency is inaudible and should be removed, while another encoder keeps the same signal. Different encoders use different psychoacoustic models, models describing how humans perceive sounds and thus what information may be removed.

About MP3

Before we begin decoding MP3, it is necessary to understand exactly what MP3 is. MP3 is a codec formally known as MPEG-1 Audio Layer 3, and it is defined in the MPEG-1 standard. This standard defines three different audio codecs, where layer 1 is the simplest and has the worst compression ratio, while layer 3 is the most complex but has the highest compression ratio and the best audio quality per bit rate. Layer 3 is based on layer 2, in turn based on layer 1. All three codecs share similarities and have many encoding/decoding parts in common.

The rationale for this design choice made sense back when the MPEG-1 standard was first written, as the similarities between the three codecs would ease the job for implementers. In hindsight, building layer 3 on top of the other two layers was perhaps not the best idea. Many of the advanced features of MP3 are shoehorned into place, and are more complex than they would have been if the codec was designed from scratch. In fact, many of the features of AAC were designed to be “simpler” than the counterpart in MP3.

At a very high level, an MP3 encoder works like this: An input source, say a WAV file, is fed to the encoder. There the signal is split into parts (in the time domain), to be processed individually. The encoder then takes one of the short signals and transforms it to the frequency domain. The psychoacoustic model removes as much information as possible, based on the content and phenomena such as masking. The frequency samples, now with less information, are compressed in a generic lossless compression step. The samples, as well as parameters describing how the samples were compressed, are then written to disk in a binary file format.

The decoder works in reverse. It reads the binary file format, decompresses the frequency samples, reconstructs the samples based on information about how content was removed by the model, and then transforms them to the time domain. Let’s start with the binary file format.

Decoding, step 1: Making sense of the data

Many computer users know that an MP3 is made up of several “frames”, consecutive blocks of data. While important for unpacking the bit stream, frames are not fundamental and cannot be decoded individually. In this article, what is usually called a frame we call a physical frame, while we call a block of data that can actually be decoded a logical frame, or simply just a frame.

A logical frame has many parts: it has a 4 byte header easily distinguishable from other data in the bit stream, it has 17 or 32 bytes known as side information, and a few hundred bytes of main data.

A physical frame has a header, an optional 2 byte checksum and side information, but, except in very rare circumstances, only some of the main data. The screenshot below shows a physical frame as a thick black border, the frame header as 4 red bytes, and the side information as blue bytes (this MP3 does not have the optional checksum). The grayed out bytes are the main data that corresponds to the highlighted header and side information. The header for the following physical frame is also highlighted, to show that the header always begins at offset 0.

The very first thing we do when we decode the MP3 is to unpack the physical frames to logical frames – this is a means of abstraction; once we have a logical frame we can forget about everything else in the bit stream. We do this by reading an offset value in the side information that points to the beginning of the main data.

Why isn’t the main data for a logical frame kept within the physical frame? At first this seems unnecessarily clumsy, but it has some advantages. The length of a physical frame is constant (within a byte) and solely based on the bit rate and other values stored in the easily found header. This makes seeking to arbitrary frames in the MP3 efficient for media players. Additionally, as frames are not limited to a fixed size in bits, parts of the audio signal with complex sounds can use bytes from preceding frames, in essence giving all MP3s a variable bit rate.

There are some limitations though: a frame can save its main data in several preceding frames, but not in following frames – this would make streaming difficult. Also, the main data for a frame cannot be arbitrarily large, and is limited to about 500 bytes. This limit is fairly short, and is often criticized.

The perceptive reader may notice the gray main data bytes in the image above begin with an interesting pattern (3E 50 00 00…) that resembles the first bytes of the main data in the next logical frame (38 40 00 00…). There is some structure in the main data, but usually this won’t be noticeable in a hex editor.

To work with the bit stream, we are going to use a very simple type:

data MP3Bitstream = MP3Bitstream {
    bitstreamStream :: B.ByteString,
    bitstreamBuffer :: [Word8]
}

Where the ByteString is the unparsed bit stream, and the [Word8] is an internal buffer used to reconstruct logical frames from physical frames. Not familiar with Haskell? Don’t worry; all the code in this article is only complementary.

As the bit stream may contain data we consider garbage, such as ID3 tags, we are using a simple helper function, mp3Seek, which takes the MP3Bitstream and discards bytes until it finds a valid header. The new MP3Bitstream can then be passed to a function that does the actual physical to logical unpacking.

mp3Seek :: MP3Bitstream -> Maybe MP3Bitstream
mp3UnpackFrame :: MP3Bitstream -> (MP3Bitstream, Maybe MP3LogicalFrame)

The anatomy of a logical frame

When we’re done decoding proper, a logical frame will have yielded us exactly 1152 time domain samples per channel. In a typical PCM WAV file, storing these samples would require 2304 bytes per channel – more than 4½ KB in total for a typical audio track. While large parts of the compression from 4½ KB audio to 0.4 KB frame stems from the removal of frequency content, a not insignificant contribution is thanks to a very efficient binary representation.

Before that, we have to make sense of the logical frame, especially the side information and the main data. When we’re done parsing the logical frame, we will have compressed audio and a bunch of parameters describing how to decompress it.

Unpacking the logical frame requires some information about the different parts. The 4-byte header stores some properties about the audio signal, most importantly the sample rate and the channel mode (mono, stereo etc). The information in the header is useful both for media player software, and for decoding the audio. Note that the header does not store many parameters used by the decoder, e.g. how audio samples should be reconstructed, those parameters are stored elsewhere.

The side information is 17 bytes for mono, 32 bytes otherwise. There’s lots of information in the side info. Most of the bits describe how the main data should be parsed, but there are also some parameters saved here used by other parts of the decoder.

The main data contains two “chunks” per channel, which are blocks of compressed audio (and corresponding parameters) decoded individually. A mono frame has two chunks, while a stereo frame has four. This partitioning is cruft left over from layer 1 and 2. Most new audio codecs designed from scratch don’t bother with this partitioning.

The first few bits of a chunk are the so-called scale factors – basically 21 numbers, which are used for decoding the chunk later. The reason the scale factors are stored in the main data and not in the side information, unlike many other parameters, is that the scale factors take up quite a lot of space. How the scale factors should be parsed, for example how long a scale factor is in bits, is described in the side information.

Following the scale factors is the actual compressed audio data for this chunk. These are a few hundred numbers, and take up most of the space in a chunk. These audio samples are actually compressed in a sense many programmers may be familiar with: Huffman coding, as used by zip, zlib and other common lossless data compression methods.

The Huffman coding is actually one of the biggest reasons an MP3 file is so small compared to the raw audio, and it’s worth investigating further. For now let’s pretend we have decoded the main data completely, including the Huffman coded data. Once we have done this for all four chunks (or two chunks for mono), we have successfully unpacked the frame. The function that does this is:

mp3ParseMainData :: MP3LogicalFrame -> Maybe MP3Data

Where MP3Data store some information, and the two/four parsed chunks.

Huffman coding

The basic idea of Huffman coding is simple. We take some data we want to compress, say a list of 8 bit characters. We then create a value table where we order the characters by frequency. If we don’t know beforehand how our list of characters will look, we can order the characters by probability of occurring in the string. We then assign code words to the value table, where we assign the short code words to the most probable values. A code word is simply an n-bit integer designed in such a way there are no ambiguities or clashes with shorter code words.

For example, let’s say we have a very long string made up of the letters A, C, G and T. Being good programmers, we notice it’s wasteful to save this string as 8 bit characters, so we store them with 2 bits each. Huffman coding can compress the string further, if some of the letters are more frequent than others. In our example, we know beforehand ‘A’ occurs in the string with about 40% probability. We create a frequency table:

A    40%
C    35%
G    20%
T     5%

We then assign code words to the table. This is done in a specific way – if we pick code words at random we are not Huffman coding anymore but using a generic variable-length code.

A    0
C    10
G    110
T    111

Say we have a string of one thousand characters. If we save this string in ASCII, it will take up 8000 bits. If we instead use our 2-bit representation, it will only take 2000 bits. With Huffman coding however, we can save it in only 1850 bits.

Decoding is the reverse of coding. If we have a bit string, say 00011111010, we read bits until there’s a match in the table. Our example string decodes to AAATGC. Note that the code word table is designed so there are no conflicts. If the table read

A    0
C    01

… and we encounter the bit 0 in a bit string, there’s no way we can ever get a C as the A will match all the time.

The standard method of decoding a Huffman coded string is by walking a binary tree, created from the code word table. When we encounter a 0 bit, we move – say – left in the tree, and right when we see a 1. This is the simplest method used in our decoder.

There’s a more efficient method to decode the string, a basic time-space tradeoff that can be used when the same code word table is used to code/decode several different bit strings, as is the case with MP3. Instead of walking a tree, we use a lookup table in a clever way. This is best illustrated with an example:

lookup[0xx] = (A, 1)
lookup[10x] = (C, 2)
lookup[110] = (G, 3)
lookup[111] = (T, 3)

In the table above, xx means all permutations of 2 bits; all bit patterns from 00 to 11. Our table thus contains all indices from 000 to 111. To decode a string using this table we peek 3 bits in the coded bit string. Our example bit string is 00011111010, so our index is 000. This matches the pair (A, 1), which means we have found the value A and we should discard 1 bit from the input. We peek another 3 bits in the string, and repeat the process.
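
Here is a sketch of the same padded lookup table in code, with the table stored in a Data.Map for simplicity; the names and the zero-padding of the tail are my own choices, not necessarily how the decoder implements it.

import qualified Data.Map as M

-- The padded 3-bit lookup table for the A/C/G/T code: every 3-bit index maps
-- to (symbol, number of bits to actually consume). Illustrative sketch only.
lookupTable :: M.Map [Int] (Char, Int)
lookupTable = M.fromList $
       [ ([0, a, b], ('A', 1)) | a <- [0,1], b <- [0,1] ]
    ++ [ ([1, 0, a], ('C', 2)) | a <- [0,1] ]
    ++ [ ([1, 1, 0], ('G', 3))
       , ([1, 1, 1], ('T', 3)) ]

decodeLUT :: [Int] -> [Char]
decodeLUT [] = []
decodeLUT bits =
    case M.lookup key lookupTable of
        Just (sym, used) | used <= length bits -> sym : decodeLUT (drop used bits)
        _                                      -> []
  where
    key = take 3 (bits ++ repeat 0)   -- peek 3 bits, zero-padding the tail

-- decodeLUT [0,0,0,1,1,1,1,1,0,1,0]  ==  "AAATGC"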

For very large Huffman tables, where the longest code word is dozens of bits, it is not feasible to create a lookup table using this method of padding, as it would require a table of approximately 2^n elements, where n is the length of the longest code word. By carefully looking at a code word table, however, it’s often possible to craft a very efficient lookup table by hand that uses “pointers” to separate smaller tables handling the longest code words.

How Huffman coding is used in MP3

To understand how Huffman coding is used by MP3, it is necessary to understand exactly what is being coded or decoded. The compressed data that we are about to decompress consists of frequency domain samples. Each logical frame has up to four chunks – two per channel – each containing up to 576 frequency samples. For a 44100 Hz audio signal, the first frequency sample (index 0) represents frequencies around 0 Hz, while the last sample (index 575) represents a frequency around 22050 Hz.
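
As a rough rule of thumb (my own simplification, which ignores the details of the hybrid filter bank described later), frequency sample k of a granule corresponds to

f_k \approx \frac{k}{576} \cdot \frac{f_s}{2}

so with f_s = 44100 Hz, sample 288 lands near 11 kHz.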

These samples are divided into five different regions of variable length. The first three regions are known as the big values regions, the fourth region is known as the count1 region (or quad region), and the fifth is known as the zero region. The samples in the zero region are all zero, so these are not actually Huffman coded. If the big values regions and the quad region decode to 400 samples, the remaining 176 are simply padded with 0.

The three big values regions represent the important lower frequencies in the audio. The name big values refers to the information content: when we are done decoding, the regions will contain integers in the range –8206 to 8206.

These three big values regions are coded with three different Huffman tables, defined in the MP3 standard. The standard defines 15 large tables for these regions, where each table outputs two frequency samples for a given code word. The tables are designed to compress the “typical” content of the frequency regions as much as possible.

To further increase compression, the 15 tables are paired with another parameter for a total of 29 different ways each of the three regions can be compressed. The side information tells us which of the 29 possibilities to use. Somewhat confusingly, the standard calls these possibilities “tables”. We will call them table pairs instead.

As an example, here is Huffman code table 1 (table1), as defined in the standard:

Code word    Value
1            (0, 0)
001          (0, 1)
01           (1, 0)
000          (1, 1)

And here is table pair 1: (table1, 0).

To decode a big values region using table pair 1, we proceed as follows: Say the chunk contains the following bits: 000101010... First we decode the bits as we usually decode Huffman coded strings: the three bits 000 correspond to the two output samples 1 and 1; we call them x and y.

Here’s where it gets interesting: The largest code table defined in the standard has samples no larger than 15. This is enough to represent most signals satisfactorily, but sometimes a larger value is required. The second value in the table pair is known as the linbits (for some reason), and whenever we have found an output sample that is the maximum value (15) we read linbits number of bits and add them to the sample. For table pair 1, linbits is 0, and the maximum sample value is never 15, so we ignore it in this case. For some table pairs linbits may be as large as 13, so the maximum value is 15 + 8191.

When we have read linbits for sample x, we get the sign. If x is not 0, we read one bit. This determines whether the sample is positive or negative.

All in all, the two samples are decoded in these steps (sketched in code after the list):

  1. Decode the first bits using the Huffman table. Call the samples x and y.
  2. If x = 15 and linbits is not 0, get linbits bits and add to x. x is now at most 8206.
  3. If x is not 0, get one bit. If 1, then x is –x.
  4. Do step 2 and 3 for y.
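
As a sketch (using a plain list of bits and a stand-in huffmanDecode function instead of the decoder’s real bit reader, so the names here are assumptions), decoding one (x, y) pair could look like this:

-- Decode one (x, y) pair from a big values region: Huffman decode, then
-- optional linbits escape and sign bit per sample. Sketch only; the real
-- decoder reads from a proper bit stream rather than a list of Ints.
decodeBigValuePair :: ([Int] -> ((Int, Int), [Int]))  -- stand-in table decoder
                   -> Int                             -- linbits of the table pair
                   -> [Int]                           -- input bits
                   -> ((Int, Int), [Int])             -- ((x, y), remaining bits)
decodeBigValuePair huffmanDecode linbits bits0 =
    let ((x0, y0), bits1) = huffmanDecode bits0    -- step 1
        (x1, bits2)       = escape x0 bits1        -- step 2 for x
        (x2, bits3)       = sign   x1 bits2        -- step 3 for x
        (y1, bits4)       = escape y0 bits3        -- step 2 for y
        (y2, bits5)       = sign   y1 bits4        -- step 3 for y
    in ((x2, y2), bits5)
  where
    -- A sample at the table maximum (15) is followed by 'linbits' extra bits.
    escape s bs
        | s == 15 && linbits > 0 = let (extra, rest) = splitAt linbits bs
                                   in (s + toNum extra, rest)
        | otherwise              = (s, bs)
    -- A non-zero sample is followed by one sign bit; 1 means negative.
    sign 0 bs       = (0, bs)
    sign s (b : bs) = (if b == 1 then negate s else s, bs)
    sign s []       = (s, [])
    toNum           = foldl (\acc b -> acc * 2 + b) 0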

The count1 region codes the frequencies that are so high they have been compressed tightly, and when decoded we have samples in the range –1 to 1. There are only two possible tables for this region; these are known as the quad tables, as each code word corresponds to 4 output samples. There are no linbits for the count1 region, so decoding is only a matter of using the appropriate table and getting the sign bits, as sketched in code below.

  1. Decode the first bits using the Huffman table. Call the samples v, w, x and y.
  2. If v is not 0, get one bit. If 1, then v is –v.
  3. Do step 2 for w, x and y.
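
A corresponding sketch for one count1 quadruple, again with a stand-in quadDecode function and the bits as a plain list (both assumptions, not the decoder’s real interface):

-- Decode one (v, w, x, y) quadruple from the count1 region: Huffman decode
-- with one of the two quad tables, then a sign bit for each non-zero sample.
decodeCount1 :: ([Int] -> ((Int, Int, Int, Int), [Int]))  -- stand-in quad table
             -> [Int]
             -> ((Int, Int, Int, Int), [Int])
decodeCount1 quadDecode bits0 =
    let ((v, w, x, y), bits1) = quadDecode bits0
        (v', bits2) = sign v bits1
        (w', bits3) = sign w bits2
        (x', bits4) = sign x bits3
        (y', bits5) = sign y bits4
    in ((v', w', x', y'), bits5)
  where
    sign 0 bs       = (0, bs)
    sign s (b : bs) = (if b == 1 then negate s else s, bs)
    sign s []       = (s, [])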

Step 1, summary

Unpacking an MP3 bit stream is very tedious, and is without doubt the decoding step that requires the most lines of code. The Huffman tables alone are a good 70 kilobytes, and all the parsing and unpacking requires a few hundred lines of code too.

The Huffman coding is undoubtedly one of the most important features of MP3 though. For a 500-byte logical frame with two channels, the output is 4x576 samples (1152 per channel) with a range of almost 15 bits, and that is even before we’ve done any transformations on the output samples. Without the Huffman coding, a logical frame would require up to 4-4½ kilobytes of storage, about an eight-fold increase in size.

All the unpacking is done by Unpack.hs, which exports two functions, mp3Seek and mp3Unpack. The latter is a simple helper function that combines mp3UnpackFrame and mp3ParseMainData. It looks like this:

mp3Unpack :: MP3Bitstream -> (MP3Bitstream, Maybe MP3Data)

Decoding, step 2: Re-quantization

Having successfully unpacked a frame, we now have a data structure containing audio to be processed further, and parameters describing how this should be done. Here are our types, as returned by mp3Unpack:

data MP3Data = MP3Data1Channels SampleRate ChannelMode (Bool, Bool) 
                                MP3DataChunk MP3DataChunk
             | MP3Data2Channels SampleRate ChannelMode (Bool, Bool) 
                                MP3DataChunk MP3DataChunk 
                                MP3DataChunk MP3DataChunk

data MP3DataChunk = MP3DataChunk {
    chunkBlockType    :: Int,
    chunkBlockFlag    :: BlockFlag,
    chunkScaleGain    :: Double,
    chunkScaleSubGain :: (Double, Double, Double),
    chunkScaleLong    :: [Double],
    chunkScaleShort   :: [[Double]],
    chunkISParam      :: ([Int], [[Int]]),
    chunkData         :: [Int]
}  

MP3Data is simply an unpacked and parsed logical frame. It contains some useful information: first the sample rate, second the channel mode, third the stereo modes (more about them later). Then come the two or four data chunks, decoded separately. What the values stored in an MP3DataChunk represent will be described soon. For now it’s enough to know that chunkData stores the (at most) 576 frequency domain samples. An MP3DataChunk is also known as a granule; however, to avoid confusion we are not going to use this term until later in the article.

Re-quantization

We have already done one of the key steps of decoding an MP3: decoding the Huffman data. We will now do the second key step – re-quantization.

As hinted in the chapter on human hearing, the heart of MP3 compression is quantization. Quantization is simply the approximation of a large range of values with a smaller set of values, i.e. using fewer bits. For example, if you take an analog audio signal and sample it at discrete intervals of time you get a discrete signal – a list of samples. As the analog signal is continuous, these samples will be real values. If we quantize the samples, say approximate each real-valued sample with an integer between –32767 and +32767, we end up with a digital signal – discrete in both dimensions.

Quantization can be used as a form of lossy compression. For 16-bit PCM each sample in the signal can take on one of 2^16 values. If we instead approximate each sample in the range –16383 to +16383, we lose information but save 1 bit per sample. The difference between the original value and the quantized value is known as the quantization error, and this results in noise. The difference between a real-valued sample and a 16-bit sample is so small it’s inaudible for most purposes, but if we remove too much information from the sample, the difference from the original will soon be audible.

Let’s stop for a moment and think about where this noise comes from. This requires a mathematical insight, due to Fourier: all continuous signals can be created by adding sinusoids together – even the square wave! This means that if we take a pure sine wave, say at 440 Hz, and quantize it, the quantization error will manifest itself as new frequency components in the signal. This makes sense – the quantized sine is not really a pure sine, so there must be something else in the signal. These new frequencies will be spread all over the spectrum, and that is noise. If the quantization error is small, the magnitude of the noise will be small.
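
As a toy illustration (not part of the decoder), we can quantize a 440 Hz sine to a handful of levels and look at the error signal; it is exactly this error that spreads out over the spectrum as noise:

-- Quantize a 440 Hz sine coarsely and compute the error signal. The error is
-- the "extra" content that shows up as noise across the spectrum. Toy sketch.
quantizeTo :: Double -> Double -> Double
quantizeTo levels s = fromIntegral (round (s * levels)) / levels

sine440 :: Double -> Int -> Double
sine440 samplerate n = sin (2 * pi * 440 * fromIntegral n / samplerate)

quantNoise :: Double -> Double -> [Double]
quantNoise samplerate levels =
    [ quantizeTo levels s - s | n <- [0 .. 1023], let s = sine440 samplerate n ]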

And this is where we can thank evolution that our ear is not perfect: if there’s a strong signal within a critical band, the noise due to quantization errors will be masked, up to the threshold. The encoder can thus throw away as much information as possible from the samples within the critical band, up to the point where discarding more information would result in noise passing the audible threshold. This is the key insight of lossy audio encoding.

Quantization methods can be written as mathematical expressions. Say we have a real-valued sample in the range –1 to 1. To quantize this value to a form suitable for a 16-bit WAV file, we multiply the sample by 32767 and throw away the fractional part: q = floor(s * 32767), or equivalently in a form many programmers are familiar with: (short)(s * 32767.0). Re-quantization in this simple case is a division, where the difference between the re-quantized sample and the original is the quantization error.
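
In code, the 16-bit round trip described above is just the following (a minimal sketch):

-- Quantize a real-valued sample in [-1, 1] to 16-bit PCM and back. The part
-- lost in the round trip is the quantization error. Minimal sketch.
quantize16 :: Double -> Int
quantize16 s = floor (s * 32767)

requantize16 :: Int -> Double
requantize16 q = fromIntegral q / 32767

quantError16 :: Double -> Double
quantError16 s = s - requantize16 (quantize16 s)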

Re-quantization in MP3

After we unpacked the MP3 bit stream and Huffman decoded the frequency samples in a chunk, we ended up with quantized frequency samples between –8206 and 8206. Now it’s time to re-quantize these samples to real values (floats), just as when we take a 16-bit PCM sample and turn it into a float. When we’re done we have a sample in the range –1 to 1, much smaller than 8206. However, our new sample has a much higher resolution, thanks to the information the encoder left in the frame about how the sample should be reconstructed.

The MP3 encoder uses a non-linear quantizer, meaning the difference between consecutive re-quantized values is not constant. This is because low-amplitude signals are more sensitive to noise, and thus require more bits, than stronger signals – think of it as using more bits for small values and fewer bits for large values. To achieve this non-linearity, the different scaling quantities are themselves non-linear.

The encoder will first raise all samples to the power 3/4, that is newsample = oldsample^(3/4). The purpose is, according to the literature, to make the signal-to-noise ratio more consistent. We will gloss over the whys and hows here, and just raise all samples to the power 4/3 to restore the samples to their original value.

All 576 samples are then scaled by a quantity simply known as the gain, or the global gain because all samples are affected. This is chunkScaleGain, and it’s also a non-linear value.

So far, we haven’t done anything really unusual. We have taken a value, at most 8206, and scaled it with a variable quantity. This is not that much different from 16-bit PCM WAV, where we take a value, at most 32767, and scale it with the fixed quantity 1/32767. Now things will get more interesting.

Some frequency regions, partitioned into several scale factor bands, are further scaled individually. This is what the scale factors are for: the frequencies in the first scale factor band are all multiplied by the first scale factor, etc. The bands are designed to approximate the critical bands. Here’s an illustration of the scale factor bandwidths for a 44100 Hz MP3. The astute reader may notice there are 22 bands, but only 21 scale factors. This is a design limitation that affects the very high frequencies.

The reason these bands are scaled individually is to better control quantization noise. If there’s a strong signal in one band, it will mask the noise in this band but not others. The values within a scale factor band are thus quantized independently from other bands by the encoder, depending on the masking effects.

For reasons that will hopefully become clearer shortly, a chunk can be scaled in three different ways.

For one type of chunk – called “long” – we scale the 576 frequencies by the global gain and the 21 scale factors (chunkScaleLong), and leave it at that.

For another type of chunk – called “short” – the 576 samples are really three interleaved sets of 192 frequency samples. Don’t worry if this doesn’t make any sense now, we will talk about it soon. In this case, the scale factor bands look slightly different than in the illustration above, to accommodate the reduced bandwidths of the scale factor bands. Also, the scale factors are not 21 numbers, but sets of three numbers (chunkScaleShort). An additional parameter, chunkScaleSubGain, further scales the individual three sets of samples.

The third type of chunk is a mix of the above two.

When we have multiplied each sample with the corresponding scale factor and other gains, we are left with a high precision floating point representation of the frequency domain, where each sample is in the range –1 to 1.

Here’s some code that uses almost all the values in an MP3DataChunk. The three different scaling methods are controlled by the BlockFlag. There will be plenty more information about the block flag later in this article.

mp3Requantize :: SampleRate -> MP3DataChunk -> [Frequency]
mp3Requantize samplerate (MP3DataChunk bt bf gain (sg0, sg1, sg2) 
                         longsf shortsf _ compressed)
    | bf == LongBlocks  = long
    | bf == ShortBlocks = short
    | bf == MixedBlocks = take 36 long ++ drop 36 short
    where 
        long  = zipWith procLong  compressed longbands
        short = zipWith procShort compressed shortbands

        procLong sample sfb = 
            let localgain   = longsf !! sfb
                dsample     = fromIntegral sample
            in gain * localgain * dsample **^ (4/3)

        procShort sample (sfb, win) =
            let localgain = (shortsf !! sfb) !! win
                blockgain = case win of 0 -> sg0
                                        1 -> sg1
                                        2 -> sg2
                dsample   = fromIntegral sample
            in gain * localgain * blockgain * dsample **^ (4/3)
        -- Frequency index (0-575) to scale factor band index (0-21).
        longbands = tableScaleBandIndexLong samplerate
        -- Frequency index to scale factor band index and window index (0-2).
        shortbands = tableScaleBandIndexShort samplerate

A fair warning: This presentation of the MP3 re-quantization step differs somewhat from the official specification. The specification presents the quantization as a long formula based on integer quantities. This decoder instead treats these integer quantities as floating point representations of non-linear quantities, so the re-quantization can be expressed as an intuitive series of multiplications. The end result is the same, but the intention is hopefully clearer.

Minor step: Reordering

Before quantizing the frequency samples, the encoder will in certain cases reorder the samples in a predefined way. We have already encountered this above: after the reordering by the encoder, the “short” chunks with three small chunks of 192 samples each are combined to 576 samples ordered (sort of) by frequency. This is to improve the efficiency of the Huffman coding, as the method with big values and different tables assumes the lower frequencies come first in the list.

When we’re done re-quantizing in our decoder, we will reorder the “short” samples back to their original position. After this reordering, the samples in these chunks are no longer ordered by frequency. This is slightly confusing, so unless you are really interested in MP3 you can ignore this and concentrate on the “long” chunks, which have very few surprises.

Decoding, step 3: Joint Stereo

MP3 supports four different channel modes. Mono means the audio has a single channel. Stereo means the audio has two channels. Dual channel is identical to stereo for decoding purposes – it’s intended as information for the media player in case the two channels contain different audio, such as an audio book in two languages.

Then there’s joint stereo. This is like the regular stereo mode, but with some extra compression steps taking similarities between the two channels into account. This makes sense, especially for stereo music where there’s usually a very high correlation between the two channels. By removing some redundancy, the audio quality can be much higher for a given bit rate.

MP3 supports two joint stereo modes known as middle/side stereo (MS) and intensity stereo (IS). Whether these modes are in use is given by the (Bool, Bool) tuple in the MP3Data type. Additionally, chunkISParam stores parameters used by IS mode.

MS stereo is very simple: instead of encoding two similar channels verbatim, the encoder computes the sum and the difference of the two channels before encoding. The information content in the “side” channel (difference) will be less than in the “middle” channel (sum), and the encoder can use more bits for the middle channel for a better result. MS stereo is lossless, and is a very common mode that’s often used in joint stereo MP3s. Decoding MS stereo is very cute:

mp3StereoMS :: [Frequency] -> [Frequency] -> ([Frequency], [Frequency])
mp3StereoMS middle side =
    let sqrtinv = 1 / (sqrt 2)
        left  = zipWith0 (\x y -> (x+y)*sqrtinv) 0.0 middle side
        right = zipWith0 (\x y -> (x-y)*sqrtinv) 0.0 middle side
    in (left, right)

The only oddity here is the division by the square root of 2 instead of simply 2. This is to scale down the channels for more efficient quantization by the encoder.
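
For contrast, the encoder’s half of MS stereo is the mirror image. A sketch (using plain Doubles; the decoder’s code above uses the Frequency type):

-- The encoder's half of MS stereo: middle is the scaled sum and side the
-- scaled difference of the left and right channels. Not part of the decoder;
-- shown only to mirror mp3StereoMS above.
encodeMS :: [Double] -> [Double] -> ([Double], [Double])
encodeMS left right =
    let sqrtinv = 1 / sqrt 2
        middle  = zipWith (\l r -> (l + r) * sqrtinv) left right
        side    = zipWith (\l r -> (l - r) * sqrtinv) left right
    in (middle, side)

Running encodeMS and then mp3StereoMS gives back the original channels, since ((l + r) + (l - r)) / 2 = l and ((l + r) - (l - r)) / 2 = r.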

A more unusual stereo mode is known as intensity stereo, or IS for short. We will ignore IS stereo in this article.

Having done the stereo decoding, the only thing remaining is taking the frequency samples back to the time domain. This is the part heavy on theory.

Decoding, step 4: Frequency to time

At this point the only remaining MP3DataChunk values we will use are chunkBlockFlag and chunkBlockType. These are the sole two parameters that dictate how we’re going to transform our frequency domain samples to the time domain. To understand the block flag and block type we have to familiarize ourselves with some transforms, as well as one part of the encoder.

The encoder: filter banks and transforms

The input to an encoder is probably a time domain PCM WAV file, as one usually gets when ripping an audio CD. The encoder takes 576 time samples, from here on called a granule, and encodes two of these granules to a frame. For an input source with two channels, two granules per channel are stored in the frame. The encoder also saves information about how the audio was compressed in the frame. This is the MP3Data type in our decoder.

The time domain samples are transformed to the frequency domain in several steps, one granule at a time.

Analysis filter bank

First the 576 samples are fed to a set of 32 band pass filters, where each band pass filter outputs 18 time domain samples representing 1/32nd of the frequency spectrum of the input signal. If the sample rate is 44100 Hz, each band will be approximately 689 Hz wide (22050/32 Hz). Note that there’s downsampling going on here: a common band pass filter will output 576 samples for 576 input samples, but the MP3 filters also reduce the number of samples by a factor of 32, so the combined output of all 32 filters is the same size as the input.

This part of the encoder is known as the analysis filter bank (throw in the word polyphase for good measure), and it’s a part of the encoder common to all the MPEG-1 layers. Our decoder will do the reverse at the very end of the decoding process, combining the subbands to the original signal. The reverse is known as the synthesis filter bank. These two filter banks are simple conceptually, but real mammoths mathematically – at least the synthesis filter bank. We will treat them as black boxes.

MDCT

The output of each band pass filter is further transformed by the MDCT, the modified discrete cosine transform. This transform is just another method of taking time domain samples to the frequency domain. Layers 1 and 2 do not use the MDCT; it was added on top of the filter bank for layer 3, as a finer frequency resolution than 689 Hz (given a 44.1 kHz sample rate) proved to give better compression. This makes sense: simply dividing the whole frequency spectrum into fixed-size blocks means the encoder has to take several critical bands into account when quantizing the signal, which results in a worse compression ratio.

The MDCT takes a signal and represents it as a sum of cosine waves, turning it to the frequency domain. Compared to the DFT/FFT and other well-known transforms, the MDCT has a few properties that make it very suited for audio compression.

First of all, the MDCT has the energy compaction property common to several of the other discrete cosine transforms. This means most of the information in the signal is concentrated in a few output samples with high energy. If you take an input sequence, do an (M)DCT transform on it, set the “small” output values to 0, and then do the inverse transform, the result is only a fairly small change to the original input. This property is of course very useful for compression, which is why different cosine transforms are used not only by MP3 and audio compression in general, but also by JPEG and video coding techniques.

Secondly, the MDCT is designed to be performed on consecutive blocks of data, so it has smaller discrepancies at block boundaries compared to other transforms. This also makes it very suited for audio, as we’re almost always working with really long signals.

Technically, the MDCT is a so-called lapped transform, which means we use input samples from the previous input data when we work with the current input data. The input is 2N time samples and the output is N frequency samples. Instead of transforming 2N-length blocks separately, consecutive blocks are overlapped. This overlapping helps reduce artifacts at block boundaries. First we perform the MDCT on, say, samples 0-35 (inclusive), then 18-53, then 36-71… To smooth the boundaries between consecutive blocks, the MDCT is usually combined with a windowing function that is applied prior to the transform. A windowing function is simply a sequence of values, zero outside some region and often between 0 and 1 within it, that is multiplied element-wise with another sequence. For the MDCT, smooth, arc-like window functions are usually used, which make the boundaries of the input block go smoothly to zero at the edges.
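
For reference, the generic textbook definition of the MDCT, mapping 2N time samples x_n to N frequency samples X_k, is (this is standard MDCT material, not a formula lifted from the MP3 specification):

X_k = \sum_{n=0}^{2N-1} x_n \cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad k = 0, \ldots, N-1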

In the case of MP3, the MDCT is done on the subbands from the analysis filter bank. In order to get all the nice properties of the MDCT, the transform is not done on the 18 samples directly, but on a windowed signal formed by the concatenation of the 18 previous and the 18 current samples. This is illustrated in the picture below, showing two consecutive granules (MP3DataChunk) in an audio channel. Remember: we are looking at the encoder here; the decoder works in reverse. This illustration shows the MDCT of the 0-689 Hz band.

The MDCT can either be applied to the 36 samples as described above, or three MDCTs can be done on 12 samples each – in either case the output is 18 frequency samples. The first choice, known as the long method, gives us greater frequency resolution. The second choice, known as the short method, gives us greater time resolution. The encoder selects the long MDCT to get better audio quality when the signal changes very little, and it selects the short MDCT when there’s a lot going on, that is, for transients.

For the whole granule of 576 samples, the encoder can either do the long MDCT on all 32 subbands – this is the long block mode – or it can do the short MDCT in all subbands – this is the short block mode. There’s a third choice, known as the mixed block mode. In this case the encoder uses the long MDCT on the first two subbands, and the short MDCT on the remaining ones. The mixed block mode is a compromise: it’s used when time resolution is necessary, but using the short block mode would result in artifacts. The lowest frequencies are thus treated as long blocks, where the ear is most sensitive to frequency inaccuracies. Notice that the boundaries of the mixed block mode are fixed: the first two, and only two, subbands use the long MDCT. This is considered a design limitation of MP3: sometimes it’d be useful to have high frequency resolution in more than two subbands. In practice, many encoders do not support mixed blocks.

We discussed the block modes briefly in the chapter on re-quantization and reordering, and hopefully that part will make a little more sense knowing what’s going on inside the encoder. The 576 samples in a short granule are really 3×192 small granules, but stored in such a way that the facilities for compressing a long granule can be used.

The combination of the analysis filter bank and the MDCT is known as the hybrid filter bank, and it’s a very confusing part of the decoder. The analysis filter bank is used by all MPEG-1 layers, but as its frequency bands do not reflect the critical bands, layer 3 added the MDCT on top of it. One of the features of AAC is a simpler method of transforming the time domain samples to the frequency domain, which uses only the MDCT, not bothering with the band pass filters.

The decoder

Digesting this information about the encoder leads to a startling realization: we can’t actually decode granules, or frames, independently! Due to the overlapping nature of the MDCT we need the inverse-MDCT output of the previous granule to decode the current granule.

This is where chunkBlockType and chunkBlockFlag are used. If chunkBlockFlag is set to the value LongBlocks, the encoder used a single 36-point MDCT for all 32 subbands (from the filter bank), with overlapping from the previous granule. If the value is ShortBlocks instead, three shorter 12-point MDCTs were used. chunkBlockFlag can also be MixedBlocks. In this case the two lower-frequency subbands from the filter bank are treated as LongBlocks, and the rest as ShortBlocks.

The value chunkBlockType is an integer, either 0, 1, 2 or 3. This decides which window function is used. These window functions are pretty straightforward and similar: one is for long blocks, one is for the three short blocks, and the two others are used just before and just after a transition between a long and a short block.
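
For orientation, the long-block window (block type 0) is an arc over the 36 IMDCT outputs; as I recall the specification, it is a half-sine, which as a sketch in code would be the following (the decoder itself takes its windows from tableImdctWindow rather than computing them like this):

-- The arc-like long-block window (block type 0): a half-sine over 36 samples.
-- Sketch for orientation only; the decoder uses tableImdctWindow.
longWindow :: [Double]
longWindow = [ sin (pi / 36 * (fromIntegral i + 0.5)) | i <- [0 .. 35 :: Int] ]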

Before we do the inverse MDCT, we have to take some deficiencies of the encoder’s analysis filter bank into account. The downsampling in the filter bank introduces some aliasing (where signals are indistinguishable from other signals), but in such a way that the synthesis filter bank cancels the aliasing. After the MDCT, the encoder will remove some of this aliasing. This, of course, means we have to undo this alias reduction in our decoder, prior to the IMDCT. Otherwise the alias cancellation property of the synthesis filter bank will not work.

When we’ve dealt with the aliasing, we can IMDCT and then window, remembering to overlap with the output from the previous granule. For short blocks, the three small individual IMDCT inputs are overlapped directly, and this result is then treated as a long block.

The word “overlap” requires some clarification in the context of the inverse transform. When we speak of the MDCT, a function from 2N inputs to N outputs, this just means we use half of the previous input samples as inputs to the function. If we’ve just MDCT’ed 36 input samples from offset 0 in a long sequence, we then MDCT 36 new samples from offset 18.

When we speak of the IMDCT, a function from N inputs to 2N outputs, an extra addition step is needed to reconstruct the original sequence. We do the IMDCT on the first 18 samples of the frequency sequence above. This gives us 36 samples. Outputs 18..35 are added, element-wise, to outputs 0..17 of the IMDCT of the next 18 samples. Here’s an illustration.
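
As a standalone illustration of this overlap-add (separate from the decoder code that follows), here is a minimal sketch for blocks of 2N IMDCT outputs:

-- Overlap-add consecutive 2N-sample IMDCT outputs: the second half of each
-- block is added element-wise to the first half of the next. Minimal sketch;
-- the trailing overlap of the last block is simply dropped here.
overlapAdd :: Num a => Int -> [[a]] -> [a]
overlapAdd n = go (replicate n 0)
  where
    go _    []       = []
    go prev (b : bs) = let (cur, next) = splitAt n b
                       in zipWith (+) prev cur ++ go next bs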

With that out of the way, here’s some code:

mp3IMDCT :: BlockFlag -> Int -> [Frequency] -> [Sample] -> ([Sample], [Sample])
mp3IMDCT blockflag blocktype freq overlap =
    let (samples, overlap') = 
            case blockflag of
                 LongBlocks  -> transf (doImdctLong blocktype) freq
                 ShortBlocks -> transf (doImdctShort) freq
                 -- <++> joins the two (samples, overlap) pairs by
                 -- concatenating their components pointwise (a helper
                 -- assumed to be defined alongside mapBlock and friends).
                 MixedBlocks -> transf (doImdctLong 0)  (take 36 freq) <++>
                                transf (doImdctShort) (drop 36 freq)
        samples' = zipWith (+) samples overlap
    in (samples', overlap')
    where
        transf imdctfunc input = unzipConcat $ mapBlock 18 toSO input
            where
                -- toSO takes 18 input samples b and computes 36 time samples
                -- by the IMDCT. These are further divided into two equal
                -- parts (S, O) where S are time samples for this frame
                -- and O are values to be overlapped in the next frame.
                toSO b = splitAt 18 (imdctfunc b)
                unzipConcat xs = let (a, b) = unzip xs
                                 in (concat a, concat b)

doImdctLong :: Int -> [Frequency] -> [Sample]
doImdctLong blocktype f = imdct 18 f `windowWith` tableImdctWindow blocktype

doImdctShort :: [Frequency] -> [Sample]
doImdctShort f = overlap3 shorta shortb shortc
  where
    (f1, f2, f3) = splitAt2 6 f
    shorta       = imdct 6 f1 `windowWith` tableImdctWindow 2
    shortb       = imdct 6 f2 `windowWith` tableImdctWindow 2
    shortc       = imdct 6 f3 `windowWith` tableImdctWindow 2
    overlap3 a b c = 
      p1 ++ (zipWith3 add3 (a ++ p2) (p1 ++ b ++ p1) (p2 ++ c)) ++ p1
      where
        add3 x y z = x+y+z
        p1         = [0,0,0, 0,0,0]
        p2         = [0,0,0, 0,0,0, 0,0,0, 0,0,0]

Before we pass the time domain signal to the synthesis filter bank, there’s one final step. Some subbands from the analysis filter bank have inverted frequency spectra, which the encoder corrects. We have to undo this, as with the alias reduction.

Here are the steps required for taking our frequency samples back to time:

  1. [Frequency] Undo the alias reduction, taking the block flag into account.
  2. [Frequency] Perform the IMDCT, taking the block flag into account.
  3. [Time] Invert the frequency spectra for some bands.
  4. [Time] Synthesis filter bank.

A typical MP3 decoder will spend most of its time in the synthesis filter bank – it is by far the most computationally heavy part of the decoder. In our decoder, we will use the (slow) implementation from the specification. Typical real world decoders, such as the one in your favorite media player, use a highly optimized version of the filter bank using a transform in a clever way. We will not delve in this optimization technique further.

Step 4, summary

It’s easy to miss the forest for the trees, but we have to remember this decoding step is conceptually simple; it’s just messy in MP3 because the designers reused parts from layer 1, which makes the boundaries between time domain, frequency domain and granule less clear.

Using the decoder

Using the decoder is a matter of creating a bit stream, initializing it (mp3Seek), unpacking it to an MP3Data (mp3Unpack) and then decoding the MP3Data with mp3Decode. The decoder does not use any advanced Haskell concepts externally, such as state monads, so hopefully the language will not get in the way of the audio.

module Codec.Audio.MP3.Decoder (
    mp3Seek
   ,mp3Unpack
   ,MP3Bitstream(..)
   ,mp3Decode
   ,MP3DecodeState(..)
   ,emptyMP3DecodeState
) where
...

mp3Decode :: MP3DecodeState -> MP3Data -> (MP3DecodeState, [Sample], [Sample])

data MP3DecodeState = ...

emptyMP3DecodeState :: MP3DecodeState
emptyMP3DecodeState = ...
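
A decode loop built on these exports might look like the following sketch; how the MP3Bitstream is created from a file, and what is done with the PCM samples, is left out, and the loop simply stops on the first frame that fails to unpack (a real player would resync with mp3Seek):

-- A sketch of a decode loop on top of the exported API. Error handling and
-- resyncing are omitted; a real player would call mp3Seek when a frame fails.
decodeAll :: MP3Bitstream -> MP3DecodeState -> [([Sample], [Sample])]
decodeAll bitstream state =
    case mp3Unpack bitstream of
        (bitstream', Just mp3data) ->
            let (state', left, right) = mp3Decode state mp3data
            in (left, right) : decodeAll bitstream' state'
        (_, Nothing) -> []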

The code is tested with a recent version of GHC. The decoder requires binary-strict, which can be found on Hackage. See the README in the code for build instructions. Please note that the software is currently version 0.0.1 – it’s very, very slow, and has some missing features.

Code: mp3decoder-0.0.1.tar.gz.

Conclusion

MP3 has its peculiarities, especially the hybrid filter bank, but it’s still a nice codec with a firm grounding in psychoacoustic principles. Not standardizing the encoder was a good choice by the MPEG-1 team, and the available encoders show it’s possible to compress audio satisfactorily within the constraints set by the decoder.

If you decide to play around with the source code, be sure to set your sound card to a low volume if you use headphones! Removing parts of the decoder may result in noise. Have fun.

References

CD 11172-3 Part 3 (the specification)

David Salomon, Data Compression: The Complete Reference, 3rd ed.

Davis Pan, A Tutorial on MPEG/Audio Compression

Rassol Raissi, The Theory Behind Mp3

The source code to libmad, LAME and 8Hz-mp3.

Stock trade app Robinhood raising at $5B+, up 4X in a year


By adding a cryptocurrency exchange, a web version and stock option trading, Robinhood has managed to quadruple its valuation in a year, according to a source familiar with a new round the startup is raising. Robinhood is closing in on around $350 million in Series D funding led by Russian firm DST Global, the source says. That’s just 11 months after Robinhood confirmed TechCrunch’s scoop that the zero-fee stock trading app had raised a $110 million Series C at a $1.3 billion valuation. The new raise would bring Robinhood to $526 million in funding.

Details of the Series D were first reported by The Wall Street Journal.

The astronomical value growth shows that investors see Robinhood as a core part of the mobile finance tools upon which the next generation will rely. The startup also just proved its ability to nimbly adapt to trends by building its cryptocurrency trading feature in less than two months to make sure it wouldn’t miss the next big economic shift. One million users waitlisted for access in just the five days after Robinhood Crypto was announced.

The launch completed a trio of product debuts. The mobile app finally launched a website version for tracking and trading stocks without a commission in November. In December it opened options trading, making it a more robust alternative to brokers like E*Trade and Scottrade. They often charge $7 or more per stock trade compared to zero with Robinhood, but also give away features that are reserved for Robinhood’s premium Gold subscription tier.

Robinhood won’t say how many people have signed up for its $6 to $200 per month Gold service that lets people trade on margin, with higher prices netting them more borrowing power. That and earning interest on money stored in Robinhood accounts are the startup’s primary revenue sources.

Rapid product iteration and skyrocketing value surely helped recruit Josh Elman, who Robinhood announced yesterday has joined as VP of product as he transitions to a part-time role at Greylock Partners. He could help the company build a platform business as a backbone for other fintech apps, the way he helped Facebook build its identity platform.

In effect, Robinhood has figured out how to make stock trading freemium. Rather than charge per trade with bonus features included, Robinhood gives away the bare-bones trades and charges for everything else. That could give it a steady, scalable business model akin to Dropbox, which grew by offering small amounts of free storage and then charging for extras and enterprise accounts. From a start with free trades, Robinhood could blossom into a hub for your mobile finance life.

Sierra Leone just ran the first blockchain-based election


The citizens of Sierra Leone went to the polls on March 7, but this time something was different: the country recorded votes at 70% of the polling stations using a technology that is the first of its kind in actual practice.

The tech, created by Leonardo Gammar of Agora, anonymously stored votes in an immutable ledger, thereby offering instant access to the election results.

“Anonymized votes/ballots are being recorded on Agora’s blockchain, which will be publicly available for any interested party to review, count and validate,” said Gammar. “This is the first time a government election is using blockchain technology.”

“Sierra Leone wishes to create an environment of trust with the voters in a contentious election, especially looking at how the election will be publicly viewed post-election. By using blockchain as a means to immutably record ballots and results, the country hopes to create legitimacy around the election and reduce fall-out from opposition parties,” he said.

Why is this interesting? While this is little more than a proof of concept – it is not a complete voting record but instead captured a seemingly acceptable plurality of votes – it’s fascinating to see the technology be implemented in Sierra Leone, a country of about 7.4 million people. The goal ultimately is to reduce voting costs by cutting out paper ballots as well as reducing corruption in the voting process.

Gammar, for his part, sees the value of a decentralized system.

“We’re the only company in the world that has built a fully-functional blockchain voting platform. Other electronic voting machines are ‘black boxes’ that have been increasingly shown to be vulnerable to security attacks. For that reason, many US states and foreign nations have been moving back to paper,” he said. “If you believe that most countries will use some form of digital voting 50 years from now, then blockchain is the only technology that has been created which can provide an end-to-end verifiable and fully-transparent voting solution for this future.”

One election in one country isn’t a movement – yet. However, Gammar and his team plan on expanding their product to other African countries and, eventually, to the rest of the world.

As for the election it is still unclear who won and there will be a run-off election on March 27. The winner will succeed President Ernest Bai Koroma who has run the country for a full decade.

Trump Readies Sweeping Tariffs and Investment Restrictions on China


Mr. Trump — surrounded by his commerce secretary, Wilbur Ross, his trade adviser Peter Navarro and others — asked for a figure beyond $30 billion and for the plan to be officially announced in the coming weeks, according to two people familiar with the exchange.

The administration is devising the measure to broadly counter a Chinese strategy known as the Made in China 2025 plan. China introduced a comprehensive initiative in 2015 to upgrade Chinese industry over the next decade and dominate sectors of the future, including advanced information technology, new energy vehicles and aerospace equipment.

Unlike the steel and aluminum measure, which divided the president’s advisers and his own party, the idea of targeting China has broad support among a number of officials who believe China is cheating in global trade.

Gary D. Cohn, a top economic adviser who resigned over the steel and aluminum tariffs, had approved of action against China, the people familiar with the discussions said. Orrin G. Hatch, the chairman of the powerful Senate Finance Committee, and Senator Marco Rubio of Florida, Republicans who criticized those tariffs, have also endorsed a tough approach toward China.

Congress is also weighing legislation that would strengthen national security checks on Chinese investment. In a House hearing on Thursday, Heath P. Tarbert, an assistant secretary of the Treasury Department, said the current system for assessing investment is riddled with loopholes that allowed Chinese companies to evade such checks.

Concern over China’s practices picked up speed at the end of the Obama administration and has only increased since. Last year, a technology-focused unit in the Defense Department issued a report arguing that rising Chinese investment in Silicon Valley was giving China unprecedented access to the military technologies of the future, and increasing Chinese ownership of supply chains that service the United States military.

In recent months, China’s political apparatus has exerted even greater control over the nation’s economy. Business leaders and politicians of both parties now widely say that Washington’s past strategy of offering Beijing economic incentives to liberalize its market has failed. On Sunday, China officially ended term limits on the presidency, clearing the way for President Xi Jinping to stay in power indefinitely.

Administration officials say that past failure to rein in China warrants a much tougher approach. Mr. Trump took one step toward this in his national security strategy, which identified China as an economic aggressor. When a top Chinese economic envoy visited in late February, the administration asked China to shave $100 billion off its $375.2 billion trade surplus with the United States, two people close to the talks said. And while the steel and aluminum tariffs will hit many countries, they are primarily aimed at combating overcapacity in Chinese metals, including those that are routed through other nations.

The next step, advisers say, is to more aggressively focus on trade with China.

The United States is expected to impose tariffs on Chinese imports of high-technology goods specified in the Made in China 2025 plan, including semiconductors and new energy vehicles. But they could go beyond that to target more mundane products, including consumer electronics, apparel and even shoes. The breadth of the tariffs remains a contentious topic in the business sector and the White House, with some industries fretting about retaliation and increased costs to American companies and consumers.

Thomas J. Donohue, the president of the U.S. Chamber of Commerce, said on Wednesday that while the administration was right to focus on China’s unfair trade practices, his group strongly disagreed with sweeping tariffs.

“Simply put, tariffs are damaging taxes on American consumers,” he said. “Tariffs of $30 billion a year would wipe out over a third of the savings American families received from the doubling of the standard deduction in tax reform.”

Hun Quach, the vice president for international trade at the Retail Industry Leaders Association, said that tariffs on apparel, shoes and electronics would hit American families most. “Is the best response to make American consumers pay for China’s violations? We don’t think so,” she said.

Although there is wide support for taking action against unfair trade practices by China, business groups and economists still say the measure could be risky. The United States and China maintain the world’s largest trading relationship, and the tariffs could easily provoke a backlash.

“They know our system inside out,” said Jim McGregor, the chairman of the greater China region for APCO Worldwide. He added, referring to the House speaker and the Senate majority leader: “They know what companies are important to Paul Ryan. They know what companies are important to Mitch McConnell. They know which trade associations and political groups have a big voice in Washington.”

Scott Kennedy, a China expert at the Center for Strategic and International Studies, said that while China deserved a tough response, he feared the consequences of the administration’s actions had not been well considered. “You really have to be smart,” he said. “The Chinese aren’t just going to fold over on this.”

Mr. Kennedy compared China to a bully that had stolen America’s lunch money. “You want to teach them a lesson,” he said. “But it’s not as simple as going up in the playground and punching them on the nose.”


Samuel Johnson, would-be attorney-at-law


Toward the end of his life, Samuel Johnson drew up a list of subjects that he would like to research. He projected forty-nine works in all; none was on any aspect of the law. According to James Boswell, Johnson’s celebrated biographer, almost the only subjects sure to distress Johnson when raised were mortality, particularly his own, and what might have transpired had he become a lawyer. Even when nearing seventy, he rounded on his friend William Scott, who had innocently commented, “What a pity it is, sir, that you did not follow the profession of the law. You might have been lord chancellor of Great Britain and attained to the dignity of the peerage.” According to Boswell, “Johnson, upon this, seemed much agitated; and in an angry tone exclaimed, ‘Why will you vex me by suggesting this, when it is too late?’ ” For here lay a curious paradox: of all great writers, in any language, Johnson was the one most consumed by the law, yet he never practiced it, and being relegated to the position of an outside observer brought him profound misery—even as he acknowledged his career could not have been otherwise.

For the young Johnson, actually studying the law in any formal way had proved impossible. After leaving Oxford without a degree (he was too poor to continue beyond a bare thirteen months there), he cast around for suitable employment, taking up a succession of menial teaching posts and then, in 1738, regular hack work for the London-based Gentleman’s Magazine. He would sign letters “Impransus”—the supperless one. About that time, he inquired of a legal friend “whether a person might be permitted to practice as an advocate” in the House of Commons without a degree in civil law. The authority he consulted, an Oxford contemporary, was confident that Johnson “would have attained to great eminence.” Johnson himself believed that he would have been a successful lawyer, one reason being his lightning ability to understand both (or many) sides of an argument. Boswell affectionately reports that there were times when the “Why, sir…” of later Johnsonian replies was designed to give him a crucial extra second to decide which side of an argument to take that day. Hardly an unprejudiced observer, Boswell adds:

I cannot conceive a man better qualified to make a distinguished figure as a lawyer; for he would have brought to his profession a rich store of various knowledge, an uncommon acuteness, and a command of language in which few could have equaled, and none have surpassed him.

Throughout his adult life, Johnson was surrounded by lawyers. Boswell himself was a practicing advocate and constantly asked for advice on his own cases and those of his father (Lord Auchinleck, a Scottish judge), as well as about key legal disputes of the day. Johnson would reply by letter, frequently penning lengthy speeches that Boswell might use in court—and sometimes did, near verbatim. Many of Johnson’s friends were lawyers, including his first biographer and literary executor, John Hawkins. Others, like the Irish barrister and wit Arthur Murphy, whose biographical study of Johnson appeared in 1792, or the political philosopher and member of Parliament Edmund Burke, whose father was an attorney and who briefly attended one of the Inns of Court, made lifelong use of their legal training. Not being of their number hurt acutely.

The Great Ebussuud Teaching Law, miniature from a divan by Mahmud Abd-al Baqi, mid-sixteenth century. The jurist and theologian Ebussuud Efendi greatly contributed to the shaping of classical Ottoman law. The Metropolitan Museum of Art, Gift of George D. Pratt, 1925.

Despite being a man of principles, Johnson deeply loved argument both for its own intellectual delight and for the aggressive joy of smashing down an opponent. But the want of a degree proved an insuperable barrier to his using these gifts in a formal setting. The best that he managed was to be commissioned by The Gentleman’s Magazine to report debates in Parliament. Since publishing accounts of Commons proceedings was illegal, these reports appeared under the mock title “Debates in the Senate of Magna Lilliputia.”

He took to the task in his customary eccentric way, attending the House just once, thereafter using a messenger to supply him with illicit notes from sympathetic sources, from which he would fashion and attribute often brilliant speeches. No politician ever complained, although the phrasing and frequently even the arguments were Johnson’s; he made the politicians look good. And, as he composed, he educated himself in many of the legal and political questions of the day.

The mid-eighteenth century was a time when a man of literary talent could be successful across a variety of genres. Johnson arrived in London in 1737 and it took some time for him to establish himself, but by 1750 he was writing essays, generally two a week, at first for the periodical The Rambler (which set his reputation), thereafter for The Adventurer and The Idler. By then he had already completed his poems London and The Vanity of Human Wishes; written and rewritten a play, Irene; and published his groundbreaking biography of the poet and convicted murderer Richard Savage. In 1755, after nine years’ concentrated endeavor, he also produced A Dictionary of the English Language, “proud in its vast bulk,” covering some 43,000 words and citing 114,000 quotations drawn from authors across four centuries. Meticulous, he would give the single verb/noun take 134 definitions, running to 8,000 words, over five pages. He also wrote a novel, The History of Rasselas, Prince of Abissinia, composed in a single week in 1759 to defray the cost of his mother’s funeral. Then there were forewords and introductions for books on subjects in which he might have minimal acquaintance: in 1756 he contributed a preface to Richard Rolt’s Dictionary of Trade and Commerce, later admitting to Boswell that he never read the book—“I knew very well what such a dictionary should be, and I wrote a preface accordingly.”

In the two years that he wrote essays for The Rambler—some 208 of them—not once does he tackle a legal subject head-on. The same is true of his articles in The Idler, The Adventurer, and The Gentleman’s Magazine; in more than 700 articles, none is directly on a legal subject. Two Idler essays are devoted to the shortsightedness of sending debtors to prison, but that was more a social sally:

The wisdom and justice of the English laws are, by Englishmen at least, loudly celebrated; but scarcely the most zealous admirers of our institutions can think that law wise which, when men are capable of work, obliges them to beg; or just which exposes the liberty of one to the passions of another.

(Shortly after the Dictionary was published, Johnson had found himself thus incarcerated in a “sponging house,” and his father too had fallen into debt.) During his years penning pieces for The Rambler, he would make frequent references to legal matters—at least twenty have some mention, if only in a phrase—but only No. 114 concerns a legal question at any length, an impassioned plea against vindictive laws, especially capital punishment for theft. In No. 125 he mentions in passing “one of the maxims of the civil law, that definitions are hazardous.” It is a telling quotation.

He had great respect for the place of law and a reverence for good lawyers, although he could be merciless about bad ones. In the dictionary, he created 178 new legal definitions and took from other sources a further 154, but his definitions are often with a curve to them. Choosing an authority for the word attorney, he quotes Alexander Pope: “vile attorneys, now an useless race”; he defines lawgiver as a “legislator, one who makes laws,” and this time the authority is Jonathan Swift: “A law may be very reasonable in itself, although one does not know the reason of the lawgivers.” He had a sense of humor and a sense of outrage.

In 1756 he contributed a long essay to The Literary Magazine, much of it concerned with King Frederick’s plan to reform the Prussian courts. At one point he acknowledges:

It is perhaps impossible to review the laws of any country without discovering many defects and many superfluities. Laws often continue when their reasons have ceased. Laws made for the first state of the society continue unabolished, when the general form of life is changed. Parts of the judicial procedure, which were at first only accidental, become in time essential; and formalities are accumulated on each other, till the art of litigation requires more study than the discovery of right.

The law of the land was vital to civilize society; but it could be a cumbersome and unfair thing.

By the mid 1750s, Johnson’s reputation as one of the outstanding literary figures of the day was assured, while within a small elite group his talents as a self-taught legal expert were being recognized. Among those impressed by his parliamentary reporting was one William Gerard Hamilton, popularly known as “Single Speech Hamilton,” due to a brilliant Commons debut, never repeated. Johnson was introduced to Hamilton by Robert Chambers, who in 1758, not yet twenty-one, had written to Johnson asking for help in winning a scholarship to Oxford. Now, eight years on, Johnson became Hamilton’s fact gatherer, researcher, and amanuensis, with a welcome salary, while that same year Oxford appointed Chambers, still only twenty-nine, Vinerian Professor of English Law, the most senior such position in the university. The professorship required Chambers to give a minimum of sixty lectures a year; he would be fined for each lecture undelivered. But while he may have been a brilliant constitutional lawyer, he was a timid soul, and when he put quill to paper was infected with a labored, pedestrian writing style. In some desperation, he asked Johnson to dictate the lectures for him.


Johnson happily accepted the work, regularly removing his hulk-like, lumbering frame off to Oxford for joint working sessions. Together they shared, wrote Boswell, “a great intimacy.” The partnership lasted until 1773, when Chambers was offered an important judgeship in Bengal. The result of the collaboration, in its most recent published edition, comprises two volumes totaling more than nine hundred pages. This Course of Lectures on the English Law boasts Chambers as principal author on the title page but also has the line “Composed in association with Samuel Johnson”—a well-kept secret throughout the time the lectures were delivered, and still not widely known.

The Vinerian lectures were Johnson’s profound contribution to legal literature and education. The spread of his knowledge is daunting, from the general character of feudal law and common law to royal power and medieval trade, the origins of the Commons and the Privy Council, courts of equity, the taxation of colonies, the general nature of punishment, forgery, divorce, even the use of books by a law student, and much else besides. Where the authorship is obviously Johnson’s, the sentences are enlivened by wit and fine phrasing. On page 452 of volume one, we learn that “gaming in persons of low degree is punished as idleness and dissoluteness…and punishable by fine and imprisonment.” But “for persons of a higher rank, if he loses more at a time than ten pounds may sue and recover it.” In George III’s Britain the aristocracy needed a certain protection; not incidentally, the king wanted a copy of the lectures for his private library, which is the sole reason that a complete manuscript of the lectures survives. On the penultimate page, readers are warned that “there are in every particular profession many things necessary to be known which books do not teach”—a Johnsonian gloss, one suspects.

Although Boswell mentions Robert Chambers a score of times in his biography, he seems utterly ignorant of the collaboration the young academic engineered with his unlikely mentor. The first book to examine their unique partnership in any detail was Dr. Johnson and the English Law, by E.L. McAdam Jr., published in 1951. It was McAdam who uncovered the secret of Johnson’s coauthorship, as late as 1939. He rated Johnson “the great lawyer-layman of his century.” One wonders if anyone has been a serious rival to him in the several centuries since.

In 1965 I went up to Cambridge to read law. My commitment did not last long. One of the first lectures I attended was by Professor Glanville Williams, then the country's preeminent expert on criminal law. "How does one define a crime?" he asked a packed lecture theater. I sat forward expectantly: this was interesting. He went on: "A crime…is an action that is followed by criminal proceedings." This was not interesting (as it seemed then to me; to be fair, his point was that more substantive proposed definitions all fail). Within the year I had changed courses to read English literature, my girlfriend's subject and my longtime love.

The Lawyer’s Last Circuit, by Thomas Rowlandson, 1802. Courtesy National Gallery of Art, Washington, Rosenwald Collection.

After college I drifted into book publishing, and by the early 1990s was the editor for Richard Holmes' new book, Dr. Johnson and Mr. Savage, an extraordinary work that examines the strange friendship between the young Johnson and the belligerent, wayward, spendthrift poet twelve years his senior, who claimed to be the illegitimate and persecuted son of a wealthy aristocrat. Savage's ill fame was well earned. Some ten years before the two men met, late one night in November 1727, he, together with two well-soused companions, burst into a coffeehouse in Charing Cross, threw over a table, and threatened the customers there. During the ensuing scuffle, Savage drew his sword and thrust it into the belly of one of the men at the table. He was arrested on a capital charge of wounding and murder, found guilty by a grand jury court at the Old Bailey, and sentenced to be hanged at Tyburn. His reputation as provocative poet and rabble-rouser made the case the talking point of literary London, and within eight weeks, through the intercession of two aristocratic sympathizers, he had received a royal pardon.

After Savage had been freed from the hangman's noose, he gloried in his newfound infamy, but he was soon down on his luck again, which was when Johnson befriended him. Johnson at the time was unknown and new to London, his nighttime companion notorious, but for two years the two men, without the means to rent decent lodgings, wearing paper cravats because their linen was in pawn, would wander the streets of London through the small hours, talking incessantly about life and love, literature and politics, encountering beggars, cutpurses, prostitutes, drunkards, and all the lowlifes of the city's many disreputable haunts and dark alleyways.

A functioning police state needs no police.

—William S. Burroughs, 1959

Johnson published his re-creation of Savage's life in 1744, the year after the poet's death, and records the court case in detail. Although there was little doubt of his friend's guilt, Johnson tilts the events as if he were counsel for the defense. The testimony of the three prosecuting witnesses—a coffeehouse maid and a prostitute and her pimp—he dismisses witheringly: "The witnesses which appeared against him were proved to be persons of characters which did not entitle them to much credit; a common strumpet, a woman by whom strumpets were entertained, and a man by whom they were supported." Savage, by contrast, is "a modest, inoffensive man." The whole account was published anonymously.

Richard Holmes (himself the son of a lawyer) expertly examines each piece of testimony, and Johnson's distortions of it, and while he marvels at his subject's skill in massaging the evidence, he makes the special pleading plain. In short, Johnson, in his thirty-fifth year when he wrote Savage's story, so well past youthful flights of rebellion, puts friendship before the obligation to truth under the law: if Savage broke society's rules, well, society had unfairly ranged itself against him.

Years later, after Savage's death, Johnson was to explain to Boswell, "Nobody attempts to dispute that two and two make four; but with contests concerning moral truth, human passions are generally mixed, and therefore it must ever be liable to assault and misrepresentation." Even though he advocates certain kinds of conduct, absent from his moral writings is any predetermined and authorized pattern of "good behavior." Boswell goes on to say, as a way of explaining his subject's moral code, that Johnson "delighted in discrimination of character, and having a masterly knowledge of human nature, was willing to take men as they are, imperfect and with a mixture of good and bad qualities." He might show "the reverence due to a judicial determination," but in the end he preferred his own system of ethics to that of any legal formulation.

In summarizing Savage's case, Johnson concluded with a famous appeal to common humanity: "Those are no proper judges of his conduct who have slumbered away their time on the down of plenty, nor will a wise man easily presume to say, 'Had I been in Savage's condition, I should have lived, or written, better than Savage.' " Legal and moral judgment were not the same. Forever a committed Christian, Johnson would bend the knee to his maker but not to the overweening wisdom of the courts; they knew much, but he knew better. Thus he might coauthor a magisterial history of the law, but he remained at heart the Great Outlaw.

Petty laws breed great crimes.

—Ouida, 1880

In effect, the financial obstacles preventing Johnson from studying law were perhaps a blessing, saving him the trouble of discovering later on that his worldview would prevent him from enjoying being a lawyer. For all his regrets at the path not taken, there was good reason that he preferred to come at the law from the outside and not from the ranks of the formally admitted. The ways of the human heart might be effectively managed by the courts (“the law is the last result of human wisdom acting upon human experience for the benefit of the public,” he would proclaim in its defense), but they were better understood by what he called “the nose of the mind.” His nose, his mind.

Tiny, Perfect Staircases Made by French Woodworkers

Since the Middle Ages, France’s “compagnons” have lived idiosyncratic existences, steeped in mystery, ritual, and a devotion to their trades. Even today, these master craftsmen have certain quirks: As young people, they live in boarding houses together in towns across France, where they spend their days learning and training to become the country’s greatest tradespeople. After six months in one place, each tradesman will pack up and move on to another French town, and a new hostel, to learn more skills under a new master.

The name “compagnon” translates to “companion,” relating to the brotherhood between members and the shared identity of a movement that, today, encompasses around 12,000 permanent, active members. Professions usually fall into one of five “groups,” depending on their principal material: stone; wood; metal; leather and textiles; and food. Within these groups are bakers, clog-makers, carpenters, masons, glaziers, and many more. In the past century, new trades have been added and old ones have fallen away. But whatever the craft, the journey from apprentice to “compagnon” is long and highly specific, and culminates in the completion of a “masterwork”: an item that showcases the skills acquired over at least five years of sustained study.

Curved Staircase Model In The French Style, ca. 1850; carved, planed, turned, and veneered walnut; H x W x D: 30 x 28 x 43.5 cm (11 13/16 in. x 11 in. x 17 1/8 in.); Gift of Eugene V. and Clare E. Thaw; 2014-11-2. Courtesy Cooper Hewitt Museum

Historically, woodworkers have often chosen to produce a tiny, intricate staircase as their “masterwork.” Over 30 years, the art dealer and collector Eugene V. Thaw, who died at 90 in January 2018, amassed an incredible collection of these staircase models, dating from between the 18th and 20th centuries. Measuring only a few inches in height, they are self-supporting, graceful, and impossibly delicate. Since 2007, they have been part of the permanent collection of New York’s Cooper Hewitt, Smithsonian Design Museum, and are currently on display alongside craftsmen’s working drawings.

To make these models, craftsmen draw on a variety of different kinds of wood, including pear, ebony, walnut, and mahogany, with extra twiddly bits, like banisters and infinitesimal hand-railings, made of anything from brass to bone. Every minute piece of wood—and there are hundreds in each model—has been painstakingly hand-cut, carved, planed, joined, and inlaid to produce a staggeringly detailed staircase, in miniature. Sarah D. Coffin, author of Made to Scale: Staircase Masterpieces, The Eugene & Clare Thaw Gift, writes that these were sometimes produced for competitions in which apprentices vied to be named the master carpenter of a city. "Other times, they might be group works for parade." In these instances, slightly larger models would be carried through the city by their makers for all to admire.

Staircase Model, mid–late 19th century; mahogany, oak; H x W x D: 91.8 x 59.8 x 47 cm (36 1/8 x 23 1/2 x 18 1/2 in.); Gift of Eugene V. and Clare E. Thaw; 2007-45-7. Courtesy Cooper Hewitt Museum

After going from apprentice to "compagnon," craftsmen undergo an initiation rite, which, according to UNESCO documents, remains "shrouded in secrecy to preserve its magic and effectiveness." Depending on the trade, this ritual may include additional elements, like a two-day "symbolic journey." A constant, however, is the adoption of a symbolic name that indicates where they have come from and something about their character: Prudence of Draguignan, Flower of Bagnolet, Liberty of Chateauneuf. The organization's other particularities beyond the "secret" nickname include the wearing of a colored sash and carrying of a tall, ornamental wooden cane, given to them after initiation. For the rest of their lives, compagnons are part of a close-knit brotherhood, with its own patron saint, feasts, and even funerary traditions. But in the past, these secretive ways caused outsiders—or "lay people," as the compagnons call them—to regard them with suspicion and sometimes misgivings, Coffin writes.

Perhaps to dispel these ill-feelings, in 1839, compagnon Agricole Perdiguier wrote The Book of Compagnonnage. This multi-volume series revealed some of the movement’s customs, secrets, and obligations. “I do not pretend to map out its history here,” he wrote, in French, “but I will give a few details which should give enough of an understanding of it. … It should be remembered that I am writing here for the public, and most of all, for Compagnons, who largely possess very few books.” In recording many of their customs, Coffin writes, Perdiguier inspired a novel by George Sand, which “drew attention to some of the great works being produced by compagnonnage members and resulted in a revival of interest in their work.”

Staircase Model (France), late 19th century; planed, joined, and veneered cherry, walnut; H x W x D: 31 x 13 x 17.2 cm (12 3/16 x 5 1/8 x 6 3/4 in.); Gift of Eugene V. and Clare E. Thaw; 2007-45-8. Courtesy Cooper Hewitt Museum

Today, the compagnons continue much as they have for centuries—though these days, the specialist knowledge once put to work on France’s medieval cathedrals has led them to travel all over the world. In 1990, in the wake of Hurricane Hugo, they were flown in to Charleston, South Carolina, to help repair antique furniture and the damaged roofs of stately homes. Nine years later, a team of compagnon metalworkers were brought over from Reims to help to refurbish the Statue of Liberty’s flame. In France, they’ve recently received an uptick in public attention due to French president Emmanuel Macron’s interest in them and their work.

There are some small signs of change in this little-understood group of craftspeople, now comprising thousands of young French people. Once made up exclusively of men, the compagnons have accepted women since 2005, and are now an international organization, with the option to train overseas at sister organizations in countries including Germany and Poland. In the past century, craftspeople from newer trades, like bakers and electricians, have joined traditional woodworkers and masons. What has not changed, however, is the spirit of commitment to their work. In the 19th century, graduating compagnons' diplomas read: "Glory be to Work and Scorn upon Idleness—Work and Honour, this is our wealth." Two hundred years on, the same sentiment applies.

Staircase Model (France), late 18th century; joined, planed, bent, and carved pear, wrought brass wire, turned bone; H x W x D: 75 x 67.3 x 67 cm (29 1/2 x 26 1/2 x 26 3/8 in.); Gift of Eugene V. and Clare E. Thaw; 2007-45-11. Courtesy Cooper Hewitt Museum
Double-Revolution Stairway Model, 1850–1900; made by R. B.; cherry; H x W x D: 48.6 x 33.5 cm (19 1/8 x 13 3/16 in.); Gift of Eugene V. and Clare E. Thaw; 2007-45-9. Courtesy Cooper Hewitt Museum
Staircase Model, 19th century; pearwood; H x W x D: 44 x 16.5 x 16.2 cm (17 5/16 x 6 1/2 x 6 3/8 in.); Gift of Eugene V. and Clare E. Thaw; 2007-45-4. Courtesy Cooper Hewitt Museum

Scott Kelly’s medical monitoring has spawned some horrific press coverage

Scott Kelly, here shown giving Ars readers a visual tour of the ISS.

Something very strange happened in the world of science news this week. A month-and-a-half-old press release, which reiterated news that was released in 2017, suddenly spawned a flurry of coverage. To make matters worse, a lot of that coverage repeated claims that range from biologically nonsensical to impossible. So if you've seen any mention of astronaut Scott Kelly's DNA this week, it's probably best if you immediately forget anything you read about it.

How did Scott Kelly's genes end up one of the hottest news stories? I really have no idea. The "news" apparently traces back to a NASA press release that came out on the last day of January. That release uses a lot of words to say that attendees of a recent workshop had agreed that preliminary findings NASA had announced a year earlier were legit. So really, the "news" here is well over a year old. Yet somehow, this release has triggered a geyser of news coverage at major outlets including CNN, USA Today, and many others.

While this would clearly be an odd situation, it wouldn't be much of a problem if most of the coverage didn't involve a horrific butchering of biology. To understand the story, we have to understand the biology—and why Scott Kelly's journey through space could tell us something about it.

DNA and genes 101

Why are people excited about Kelly's DNA? The simple answer would seem to be that he has an identical twin, who must have identical DNA, and so we have a chance to see what space does to DNA. After all, space is a high-radiation environment, and we know that radiation damages DNA.

But there's quite a bit more to it than that. First and foremost, the Kelly twins' DNA is not identical. Every time a cell divides, it typically picks up a mutation or two. Further mutations happen simply because of the stresses of life, which expose us all to some radiation and DNA-damaging chemicals, no matter how careful we are about diet and sunscreen. Over the years, the Kelly twins' cells have undoubtedly picked up collections of distinctive mutations.

As a result, the more relevant comparison (and one NASA did) is Scott's DNA before and after his time in space. That can tell us how many changes were picked up while in space. But as noted above, he would have probably picked up some mutations even if he sat here on Earth. And that's where his twin Mark, who did sit here on Earth, comes in. Mark's before and after gives us a sense of the normal background rate of mutation on Earth. Comparing that rate to Scott's tells us the important number: the degree to which this rate is elevated in the environment of low-Earth orbit.

But mutations alone don't tell the full story. Less than three percent of a person's DNA is translated into the proteins that carry out functions in our cells. So chances are good that any mutations Scott picked up would have missed his genes entirely.

But the DNA sequence of a gene isn't the only way to influence its behavior. Our environments influence gene activity all the time—our bodies change gene activity to respond to everything from hostile pathogens to the time of day. All of this happens without changes in our DNA sequences; instead, the activity is largely the product of changes in the proteins that stick to DNA and regulate nearby genes, along with the biochemical consequences of those changes. Many of these changes are transient, but some can get locked into place for the long term through feedback loops.

One of these feedback loops does involve subtle changes in DNA. Rather than changing the sequence of bases, some enzymes can generate small chemical tweaks to a base, adding an extra carbon atom or two. These changes—collectively called epigenetic changes—can then alter which proteins stick to the DNA, which will change the activity of nearby genes.

So NASA naturally also tracked gene activity and epigenetic modification of the twins' DNA. In addition, it also looked at the ends of every chromosome, where there's a structure called a telomere. Telomeres are composed of the same short DNA sequence (TTAGGG) repeated multiple times; the number of repeats responds to things like stress and diet and can influence how often a cell can divide.

Testing 1, 2, 3...

NASA found that while Scott's telomeres got longer in space, they quickly returned to their pre-space state once he returned to Earth. And while lots of genes changed activity in space, most of those returned to normal, too. But for seven percent of his genes, the changed activity levels have persisted. There were changes in the epigenetic DNA modifications, too, but these are difficult to correlate with differences in specific genes. Finally, Scott picked up a few mutations as well.

Unfortunately, lots of people who have been assigned to cover the NASA announcement 45 days after it was made haven't familiarized themselves with the underlying biology. For example, one of Denver's local TV stations announced that "93 percent of his DNA returned to normal" after his return to Earth, before going on to claim that "NASA confirmed that seven percent of his genes have remained changed and may stay that way." If 93 percent of Kelly's DNA hadn't been normal, he'd be dead. Instead, a lot of his genes (though less than the 100 percent implied here) showed changes in gene activity, all of which occurred without any changes in their underlying DNA.

LiveScience gets it wrong in its article's title, where it's announced that Scott "Has Different DNA Than His Identical Twin Brother." In reality, most of the twins' DNA remained identical, and most of the differences between them occurred prior to the trip to space. An awkward title would be forgivable if the article weren't bad as well. This one, however, says Scott's "genetic code had changed significantly." The genetic code is the system that translates DNA sequences into the amino acids of proteins; changes in this code would alter every protein in a person's body, killing them. The article goes on to claim there are hundreds of "space genes" that were altered, when really these genes are involved in things like stress and the immune system.

The coverage at Business Insider was equally sloppy, making a similar claim that space "permanently changed 7% of his DNA," which is simply false. It also perpetuates the confusion between a person's genes, which remain largely unchanged, and alterations in gene activity, which take place all the time. CNN also fell victim to the gene activity/DNA difference, referring to "the transformation of 7% of Scott's DNA." Yet another report claimed that "telomeres are involved in the repair of damaged DNA," which is simply wrong.

I'm sure it would be easy to find further confusion, but at some point I had to give up reading to keep myself from having a stress response that rivaled Kelly's.

There are plenty of lessons for journalists here. One is that it's a bad idea to rush to hit stories just because you see coverage of them elsewhere—especially in cases where the story is more than a year old. Another is that you probably shouldn't be covering stories if you don't have anyone on staff who specializes in that subject area.

But it's especially disheartening to see this level of carelessness at a time when reporting on basic facts is under attack as "fake news." If we can't get the facts right when they're as definitive as they are in science and when a 10-minute phone call to a biologist could clarify them, we are demonstrating that we shouldn't be trusted with the facts on more complicated subjects.

The Multiworse Is Coming

You haven’t seen headlines recently about the Large Hadron Collider, have you? That’s because even the most skilled science writers can’t find much to write about.

There are loads of data for sure, and nuclear physicists are giddy with joy because the LHC has delivered a wealth of new information about the structure of protons and heavy ions. But the good old proton has never been the media’s darling. And the fancy new things that many particle physicists expected – the supersymmetric particles, dark matter, extra dimensions, black holes, and so on – have shunned CERN.

It’s a PR disaster that particle physics won’t be able to shake off easily. Before the LHC’s launch in 2008, many theorists expressed themselves confident the collider would produce new particles besides the Higgs boson. That hasn’t happened. And the public isn’t remotely as dumb as many academics wish. They’ll remember next time we come ask for money.

The big proclamations came almost exclusively from theoretical physicists; CERN didn’t promise anything they didn’t deliver. That is an important distinction, but I am afraid in the public perception the subtler differences won’t matter. It’s “physicists said.” And what physicists said was wrong. Like hair, trust is hard to split. And like hair, trust is easier to lose than to grow.

What the particle physicists got wrong was an argument based on a mathematical criterion called “naturalness”. If the laws of nature were “natural” according to this definition, then the LHC should have seen something besides the Higgs. The data analysis isn’t yet completed, but at this point it seems unlikely something more than statistical anomalies will show up.

I must have sat through hundreds of seminars in which naturalness arguments were repeated. Let me just flash you a representative slide from a 2007 talk by Michelangelo L. Mangano (full pdf here), so you get the idea. The punchline is at the very top: “new particles must appear” in an energy range of about a TeV (ie accessible at the LHC) “to avoid finetuning.”


I don’t mean to pick on Mangano in particular; his slides are just the first example that Google brought up. This was the argument why the LHC should see something new: To avoid finetuning and to preserve naturalness.

I explained many times previously why the conclusions based on naturalness were not predictions, but merely pleas for the laws of nature to be pretty. Luckily I no longer have to repeat these warnings, because the data agree that naturalness isn't a good argument.

The LHC hasn’t seen anything new besides the Higgs. This means the laws of nature aren’t “natural” in the way that particle physicists would have wanted them to be. The consequence is not only that there are no new particles at the LHC. The consequence is also that we have no reason to think there will be new particles at the next higher energies – not until you go up a full 15 orders of magnitude, far beyond what even futuristic technologies may reach.

So what now? What if there are no more new particles? What if we’ve caught them all and that’s it, game over? What will happen to particle physics or, more to the point, to particle physicists?

In an essay some months ago, Adam Falkowski expressed it this way:

“[P]article physics is currently experiencing the most serious crisis in its storied history. The feeling in the field is at best one of confusion and at worst depression.”

At present, the best reason to build another particle collider, one with energies above the LHC’s, is to measure the properties of the Higgs boson, specifically its self-interaction. But it’s difficult to spin a sexy story around such a technical detail. My guess is that particle physicists will try to make it sound important by arguing the measurement would probe whether our vacuum is stable. Because, depending on the exact value of a constant, the vacuum may or may not eventually decay in a catastrophic event that rips apart everything in the universe.*

Such a vacuum decay, however, wouldn’t take place until long after all stars have burned out and the universe has become inhospitable to life anyway. And seeing that most people don’t care what might happen to our planet in a hundred years, they probably won’t care much what might happen to our universe in 10^100 billion years.

Personally I don’t think we need a specific reason to build a larger particle collider. A particle collider is essentially a large microscope. It doesn’t use light, it uses fast particles, and it doesn’t probe a target plate, it probes other particles, but the idea is the same: It lets us look at matter very closely. A larger collider would let us look closer than we have so far, and that’s the most obvious way to learn more about the structure of matter.

Compared to astrophysical processes which might reach similar energies, particle colliders have the advantage that they operate in a reasonably clean and well-controlled environment. Not to mention nearby, as opposed to some billion light-years away.

That we have no particular reason to expect the next larger collider will produce so-far unknown particles is in my opinion entirely tangential. If we stop here, the history of particle physics will be that of a protagonist who left town and, after the last street sign, sat down and died, the end. Some protagonist.

But I have been told by several people who speak to politicians more frequently than I that the “just do it” argument doesn’t fly. To justify substantial investments, I am told, an experiment needs a clear goal and at least a promise of breakthrough discoveries.

Knowing this, it’s not hard to extrapolate what particle physicists will do next. We merely have to look at what they’ve done in the past.

The first step is to backpedal from their earlier claims. This has already happened. Originally we were told that if supersymmetric particles are there, we would see them right away.

“Discovering gluinos and squarks in the expected mass range […] seems straightforward, since the rates are large and the signals are easy to separate from Standard Model backgrounds.” – Frank Paige (1998).

“The Large Hadron Collider will either make a spectacular discovery or rule out supersymmetry entirely.” – Michael Dine (2007)

Now they claim no one ever said it would be easy. By 2012, it was “Natural SUSY is difficult to see at LHC” and “‘Natural supersymmetry’ may be hard to find.”

Step two is arguing that the presently largest collider will just barely fail to see the new particles but that the next larger collider will be up to the task.

One of the presently most popular proposals for the next collider is the International Linear Collider (ILC), which would be a lepton collider. Lepton colliders have the benefit of doing away with structure functions and fragmentation functions that you need when you collide composite particles like the proton.

In a 2016 essay for Scientific American Howard Baer, Vernon D. Barger, and Jenny List kicked off the lobbying campaign:

“Recent theoretical research suggests that Higgsinos might actually be showing up at the LHC—scientists just cannot find them in the mess of particles generated by the LHC's proton-antiproton collisions […] Theory predicts that the ILC should create abundant Higgsinos, sleptons (partners of leptons) and other superpartners. If it does, the ILC would confirm supersymmetry.”
The “recent theoretical research” they are referring to happens to be that of the authors themselves, vividly demonstrating that the quality standard of this field is currently so miserable that particle physicists can come up with predictions for anything they want. The phrase “theory predicts” has become entirely meaningless.

The website of the ILC itself is also charming. There we can read:

“A linear collider would be best suited for producing the lighter superpartners… Designed with great accuracy and precision, the ILC becomes the perfect machine to conduct the search for dark matter particles with unprecedented precision; we have good reasons to anticipate other exciting discoveries along the way.”
They don’t tell you what those “good reasons” are because there are none. At least not so far. This brings us to step three.

Step three is the fabrication of reasons why the next larger collider should see something. The leading proposal is presently that of Michael Douglas, who is advocating a different version of naturalness, that is naturalness in theory space. And the theory space he is referring to is, drums please, the string theory landscape.

Naturalness, of course, has always been a criterion in theory-space, which is exactly why I keep saying it’s nonsense: You need a probability distribution to define it and since we only ever observe one point in this theory space, we have no way to ever get empirical evidence about this distribution. So far, however, the theory space was that of quantum field theory.

When it comes to the landscape at least the problem of finding a probability distribution is known (called “the measure problem”), but it’s still unsolvable because we never observe laws of nature other than our own. “Solving” the problem comes down to guessing a probability distribution and then drowning your guess in lots of math. Let us see what predictions Douglas arrives at:

Slide from Michael Douglas. PDF here. Emphasis mine.

Supersymmetry might be just barely out of reach of the LHC, but a somewhat larger collider would find it. Who’d have thought.

You see what is happening here. Conjecturing a multiverse of any type (string landscape or eternal inflation or what have you) is useless. It doesn’t explain anything and you can’t calculate anything with it. But once you add a probability distribution on that multiverse, you can make calculations. Those calculations are math you can publish. And those publications you can later refer to in proposals read by people who can’t decipher the math. Mission accomplished.

The reason this cycle of empty predictions continues is that everyone involved only stands to benefit. From the particle physicists who write the papers to those who review the papers to those who cite the papers, everyone wants more funding for particle physics, so everyone plays along.

I too would like to see a next larger particle collider, but not if it takes lies to trick taxpayers into giving us money. More is at stake here than the employment of some thousand particle physicists. If we tolerate fabricated arguments in the scientific literature just because the conclusions suit us, we demonstrate how easy it is for scientists to cheat.

Fact is, we presently have no evidence – neither experimental nor theoretical evidence – that a next larger collider would find new particles. The absolutely last thing particle physicists need right now is to weaken their standards even more and appeal to multiversal math magic that can explain everything and anything. But that seems to be exactly where we are headed.



* I know that’s not correct. I merely said that’s likely how the story will be spun.

What America looked like before the EPA, in photos

Popular Science has a series of photos, since digitized, taken by EPA staff in the early years after the agency was formed in 1970.

It’s pretty grim stuff: abandoned cars in Jamaica Bay, broken candy-glass unreturnable bottles everywhere, and one mill after another belching out smoke and dumping refuse in the rivers.

The Atlas Chemical Company belches smoke across pasture land, June 1972. Photo by Marc St. Gil.

Outflow pipe 6 of the Oxford Paper Company mill at Rumford, June 1973. Photo by Charles Steinhacker.

Burning barge on the Ohio River, May 1972. Photo by William Strode.

Mary Workman holds a jar of undrinkable water that comes from her well; she has filed a damage suit against the Hanna Coal Company, October 1973. Photo by Erik Calonius.

Given that there’s been a renewed, serious push this year to dismantle or undermine the EPA, it’s worth revisiting just why we needed an agency to protect the environment to begin with.

De-anonymizing programmers from executable binaries

When coding style survives compilation: de-anonymizing programmers from executable binaries – Caliskan et al., NDSS’18

As a programmer you have a unique style, and stylometry techniques can be used to fingerprint your style and determine with high probability whether or not a piece of code was written by you. That makes a degree of intuitive sense when considering source code. But suppose we don’t have source code? Suppose all we have is an executable binary? Caliskan et al., show us that it’s possible to de-anonymise programmers even under these conditions. Amazingly, their technique still works even when debugging symbols are removed, aggressive compiler optimisations are enabled, and traditional binary obfuscation techniques are applied! Anonymous authorship of binaries is consequently hard to achieve.

One of the findings along the way that I found particularly interesting is that more skilled/experienced programmers are more fingerprintable. It makes sense that over time programmers acquire their own unique way of doing things, yet at the same time these results seem to suggest that experienced programmers do not converge on a strong set of stylistic conventions. That suggests to me a strong creative element in program authorship, just as experienced authors of written works develop their own unique writing styles.

If we encounter an executable binary sample in the wild, what can we learn from it? In this work, we show that the programmer’s stylistic fingerprint, or coding style, is preserved in the compilation process and can be extracted from the executable binary. This means that it may be possible to infer the programmer’s identity if we have a set of known potential candidate programmers, along with executable binary samples (or source code) known to be authored by these candidates.

Out of a pool of 100 candidate programmers, Caliskan et al. are able to attribute authorship with accuracy of up to 96%, and with a pool of 600 candidate programmers, they reach accuracy of 83%. These results assume that the compiler and optimisation level used for compilation of the binary are known. Fortunately, previous work has shown that toolchain provenance, including the compiler family, version, optimisation level, and source language, can be identified using a linear Conditional Random Field (CRF) with accuracy of up to 99% for language, compiler family, and optimisation level, and 92% for compiler version.

One of the potential uses for the technology is identifying authors of malware.

Finding fingerprint features in executable binaries

So how is this seemingly impossible feat pulled off? The process for training the classifier given a corpus of works by authors in a candidate pool has four main steps, as illustrated below:

  1. Disassembly: first the program is disassembled to obtain features based on machine code instructions, referenced strings, symbol information, and control flow graphs.
  2. Decompilation: the program is translated into C-like pseudo-code via decompilation, and this pseudo-code is passed to a fuzzy C parser to generate an AST. Syntactical features and n-grams are extracted from the AST.
  3. Dimensionality reduction: standard feature selection techniques are used to select the candidate features from amongst those produced in steps 1 and 2.
  4. Classification: a random forest classifier is trained on the corresponding feature vectors to yield a program that can be used for automatic executable binary authorship attribution.

You can download the code at https://github.com/calaylin/bda.

Disassembly

The disassembly step runs the binary through two different disassemblers: the netwide disassembler (ndisasm), which does simple instruction decoding, and the radare2 state-of-the-art open source disassembler, which also understands the executable binary format. Using radare2 it is possible to extract symbols, strings, functions, and control flow graphs.

Information provided by the two disassemblers is combined to obtain our disassembly feature set as follows: we tokenize the instruction traces of both disassemblers and extract token uni-grams, bi-grams, and tri-grams within a single line of assembly, and 6-grams, which span two consecutive lines of assembly… In addition, we extract single basic blocks of radare2’s control flow graphs, as well as pairs of basic blocks connected by control flow.
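
As a rough illustration of the kind of features involved (this is a sketch, not the authors' code from the repository linked below), the n-gram extraction over an instruction trace could look something like the following Python, where lines stands in for the disassembler output:

from collections import Counter

def token_ngrams(lines, ns=(1, 2, 3)):
    # Count token uni-, bi-, and tri-grams within a single assembly line.
    counts = Counter()
    for line in lines:
        tokens = line.split()
        for n in ns:
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def line_pair_ngrams(lines, n=6):
    # Count 6-grams spanning two consecutive lines of assembly.
    counts = Counter()
    for a, b in zip(lines, lines[1:]):
        tokens = a.split() + b.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy instruction trace, for illustration only.
trace = ["mov eax, ebx", "add eax, 4", "jmp 0x400b2d"]
features = token_ngrams(trace)
features.update(line_pair_ngrams(trace))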

Decompilation

Decompilation is done using the Hex-Rays commercial state-of-the-art decompiler, which produces human readable C-like pseudo-code. This code may be much longer than the original source code (e.g. decompiling a program that was originally 70 lines long may produce on average 900 lines of decompiled code).

From the decompiled result, both lexical and syntactical features are extracted. Lexical features are word unigrams capturing integer types, library function names, and internal function names (when symbol table information is available). Syntactical features are obtained by passing the code to the joern fuzzy parser and deriving features from the resulting AST.

Dimensionality reduction

Following steps one and two, a large number of features can be generated (e.g., 705,000 features from 900 executable binary samples taken across 100 different programmers). A first level of dimensionality reduction is applied using WEKA's information gain attribute selection criterion, and then a second level of reduction is applied using correlation-based feature selection. The end result for the 900 binary samples is a set of 53 predictive features.
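
A rough analogue of this two-stage selection, assuming a feature matrix X and author labels y, might look like the Python below. Mutual information and a simple correlation-pruning step stand in for WEKA's information gain and correlation-based feature selection; this is not the authors' pipeline.

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def select_features(X, y, k_first=1000, corr_threshold=0.95):
    # Stage 1: keep the features with the highest mutual information
    # with the author label (an information-gain-style criterion).
    selector = SelectKBest(mutual_info_classif, k=min(k_first, X.shape[1]))
    X_reduced = selector.fit_transform(X, y)

    # Stage 2: drop features highly correlated with an already-kept
    # feature (a crude stand-in for correlation-based feature selection).
    corr = np.corrcoef(X_reduced, rowvar=False)
    keep = []
    for j in range(X_reduced.shape[1]):
        if all(abs(corr[j, i]) < corr_threshold for i in keep):
            keep.append(j)
    return X_reduced[:, keep]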

Classification

Classification is done using random forests with 500 trees. Data is stratified by author and analysed using k-fold cross-validation, where k is equal to the number of available code samples per author.
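
For concreteness, a minimal sketch of this step in scikit-learn (again a stand-in under the assumptions above, not the authors' implementation), with feature vectors X, author labels y, and nine samples per author as in the Google Code Jam dataset described below:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def evaluate(X, y, samples_per_author=9):
    # 500 trees, as in the paper; k-fold CV with k = samples per author.
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    cv = StratifiedKFold(n_splits=samples_per_author, shuffle=True, random_state=0)
    return cross_val_score(clf, X, y, cv=cv).mean()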

Evaluation results

The main evaluation is performed using submission to the annual Google Code Jam competition, in which thousands of programmers take part each year. “We focus our analysis on compiled C++ code, the most popular programming language used in the competition. We collect the solutions from the years 2008 to 2014 along with author names and problem identifiers.”

Datasets are created using gcc and g++, using each of O1, O2, and O3 optimisation flags (so six datasets in all). The resulting datasets contain 900 executable binary samples from 100 different authors. As we saw before, the authors are able to reduce the feature set down to 53 predictive features.

To examine the potential for overfitting, we consider the ability of this feature set to generalize to a different set of programmers, and show that it does so, further supporting our belief that these features effectively capture programming style. Features that are highly predictive of authorial fingerprints include file and stream operations along with the formats and initializations of variables from the domain of ASTs, whereas arithmetic, logic, and stack operations are the most distinguishing ones among the assembly instructions.

Without optimisation enabled, the random forest is able to correctly classify 900 test instances with 95% accuracy. Furthermore, given just a single sample of code (for training) from a given author, the author can be identified out of a pool of 100 candidates with 65% accuracy.

The classifier also reaches a point of dramatically diminishing returns with as few as three training samples, and obtains a stable accuracy by training on 6 samples. Given the complexity of the task, this combination of high accuracy with extremely low requirement on training data is remarkable, and suggests the robustness of our features and method.

The technique continues to work well as the candidate pool size grows (as noted above, 83% accuracy with a pool of 600 candidate programmers).

Turning up the difficulty level

Programming style is preserved to a great extent even under the most aggressive level 3 optimisations:

…programmers of optimized executable binaries can be de-anonymized, and optimization is not a highly effective code anonymization method.

Fully stripping symbol information reduces classification accuracy by 24%, so even removing symbols is not an effective form of anonymisation.

For the pièce de résistance, the authors use Obfuscator-LLVM and apply all three of its obfuscation techniques (instruction substitution, introducing bogus control flow, flattening control flow graphs). And the result? "Using the same features as before, we obtain an accuracy of 88% in correctly classifying authors."

… while we show that our method is capable of dealing with simple binary obfuscation techniques, we do not consider binaries that are heavily obfuscated to hinder reverse engineering.

So you want to stay anonymous?

If you really do want to remain anonymous, you’d better plan for that from the very beginning of your programming career, and even then it doesn’t look easy! Here are the conditions recommended by the authors:

  • Do not have any public repositories
  • Don’t release multiple programs using the same online identity
  • Try to have a different coding style (!) in each piece of software you write, and try to code in different programming languages.
  • Use different optimisations and obfuscations to avoid deterministic patterns

Another suggestion that comes to mind is to use an obfuscator deliberately designed to prevent reverse engineering, although since such tools weren't tested, we don't actually know how effective that would be.

A programmer who accomplishes randomness across all potential identifying factors would be very difficult to deanonymize. Nevertheless, even the most privacy savvy developer might be willing to contribute to open source software or build a reputation for her identity based on her set of products, which would be a challenge for maintaining anonymity.

A Smooth Curve as a Fractal Under the Third Definition

(Submitted on 11 Feb 2018)

Abstract: It is commonly believed in the literature that smooth curves, such as circles, are not fractal, and only non-smooth curves, such as coastlines, are fractal. However, this paper demonstrates that a smooth curve can be fractal, under the new, relaxed, third definition of fractal - a set or pattern is fractal if the scaling of far more small things than large ones recurs at least twice. The scaling can be rephrased as a hierarchy, consisting of numerous smallest, a very few largest, and some in between the smallest and the largest. The logarithmic spiral, as a smooth curve, is apparently fractal because it bears the self-similar property, or the scaling of far more small squares than large ones recurs multiple times, or the scaling of far more small bends than large ones recurs multiple times. A half-circle or half-ellipse and the UK coastline (before or after smooth processing) are fractal, if the scaling of far more small bends than large ones recurs at least twice.
Keywords: Third definition of fractal, head/tail breaks, bends, ht-index, scaling hierarchy
Comments: 8 pages, 5 figures, and 1 table
Subjects: General Mathematics (math.GM)
Cite as: arXiv:1802.03698 [math.GM]
 (or arXiv:1802.03698v1 [math.GM] for this version)
From: Bin Jiang [view email]
[v1] Sun, 11 Feb 2018 06:21:24 GMT (399kb)

CSS Grid: It's Time to Take a Serious Look

CSS Grid has been around in the web front end development space for a while now.  However, up until fairly recently it wasn't supported across the majority of today's popular web browsers.  Well, this has changed, my friends, and it's time to invest some hours in learning CSS Grid.

What is it?

The elements of a website are typically defined in the HTML of the page.  CSS is used to style and lay out those elements beautifully.  CSS Grid provides an intuitive syntax which can be used to tackle some of the most infuriating challenges that come with this styling/layout responsibility.

Why do I care?

Before CSS Grid blessed us with its presence, we were used to laying out website elements using Flexbox.  Before that… well, let's forget about the past and focus on the here and now!  This is not to say that Grid should/will replace the use of Flexbox in our applications.  Actually, they work quite well together.

CSS Grid establishes a logical representation of a grid, consisting of rows and columns, in the DOM.  If you've ever had the pleasure of working with HTML table elements and developing the logical representation of a column given a dynamic table structure, then you'd understand one of the fundamental challenges that CSS Grid solves.  This new tech also alleviates headaches caused by using margins to space out your elements, dynamic element layout, elements with relative width/height, responsive design, and much more.

What’s the best way to learn this?

There are a few really great resources out there for quickly learning CSS Grid.  For starters, you should download the Mozilla Firefox browser because the native CSS Grid development tools are the best out there right now.

I would then suggest checking out Wes Bos’ free course on the subject.  Wes isn’t charging anything for the course because Mozilla paid him for the time it took him to create it.  In the course Wes does a great job of breaking down the fundamentals and shows you some of the awesome Firefox dev tooling for CSS Grid.  Also, make sure to bookmark the CSS Tricks cheat sheet which can prove helpful for specific use case analysis.

Show HN: Golang service to read records from kafka and write to elasticsearch

Application responsible for loading Kafka topics into Elasticsearch. Some use cases:

  • Using elasticsearch as a debugging tool to monitor data activity in kafka topics
  • Using elasticsearch + kibana as an analytics tool
  • Easily integrating applications with elasticsearch

Usage

To create new injectors for your topics, you should create a new deployment with your configurations. You can use existing deployments in deploy/ as a template. Just remember to change the app name and the metadata related to it.

Configuration variables

  • KAFKA_ADDRESS Kafka url. REQUIRED
  • SCHEMA_REGISTRY_URL Schema registry url port and protocol. REQUIRED
  • KAFKA_TOPICS Comma-separated list of Kafka topics to subscribe to. REQUIRED
  • KAFKA_CONSUMER_GROUP Consumer group id, should be unique across the cluster. Please be careful with this variable. REQUIRED
  • ELASTICSEARCH_HOST Elasticsearch url with port and protocol. REQUIRED
  • ES_INDEX Elasticsearch index prefix to write records to (the actual index is suffixed with the record's timestamp to avoid very large indexes). Defaults to topic name. OPTIONAL
  • PROBES_PORT Kubernetes probes port. Set to any available port. REQUIRED
  • K8S_LIVENESS_ROUTE Kubernetes route for liveness check. REQUIRED
  • K8S_READINESS_ROUTE Kubernetes route for readiness check. REQUIRED
  • KAFKA_CONSUMER_CONCURRENCY Number of parallel goroutines working as a consumer. Default value is 1. OPTIONAL
  • KAFKA_CONSUMER_BATCH_SIZE Number of records to accumulate before sending them to Elasticsearch (for each goroutine). Default value is 100. OPTIONAL
  • ES_INDEX_COLUMN Record field to append to index name. Ex: to create one ES index per campaign, use "campaign_id" here OPTIONAL
  • ES_BLACKLISTED_COLUMNS Comma separated list of record fields to filter before sending to elasticsearch. Defaults to empty string. OPTIONAL
  • LOG_LEVEL Determines the log level for the app. Should be set to DEBUG, WARN, NONE or INFO. Defaults to INFO. OPTIONAL
  • METRICS_PORT Port to export app metrics REQUIRED
  • ES_BULK_TIMEOUT Timeout for elasticsearch bulk writes in the format of golang's time.ParseDuration. Default value is 1s OPTIONAL
  • KAFKA_CONSUMER_RECORD_TYPE Kafka record type. Should be set to "avro" or "json". Defaults to avro. OPTIONAL
  • KAFKA_CONSUMER_METRICS_UPDATE_INTERVAL The interval which the app updates the exported metrics in the format of golang's time.ParseDuration. Defaults to 30s. OPTIONAL

Important note about Elasticsearch mappings and types

As you may know, Elasticsearch is capable of mapping inference. In other words, it'll try to guess your mappings based on the kind of data you are sending. This is fine for some use cases, but we strongly recommend that you create your own mappings (especially if you care about your date types). If you are using multiple indexes, an index template is something that you should look into.

If you are planning on using Kibana as an analytics tool, it is recommended to use a template for your data like the one below.

Setting up a template in Elastic Search

Note: This step only works with elastic search 5.5.0 and above.

Index templates allow you to define templates that will automatically be applied when new indices are created. In this example, a wildcard (*) is used and every new index following this pattern will use the template configuration.

To set a template for some index, send a PUT request to http://elasticsearch:9200/_template/sample-logs with the JSON body below.

{"template": "sample-logs-*","mappings": {"_default_": {"_all": {"enabled": "false"
      },"_source": {"enabled": "true"
      },"properties": {"timestamp": {"type": "date","format": "date_optional_time","ignore_malformed": true
        }
      },"dynamic_templates": [
        {"strings": {"match_mapping_type": "string","mapping": {"type": "text","index": false
            }
          }
        }
      ]
    }
  }
}

This template does the following

  • Makes all strings not analyzed by default. If you want/need analyzed fields, you can either remove the dynamic template for "strings" or add your field as a property
  • Adds a date property to be used as a time field. Replace according to your field name and format.
  • Sets the _default_ type. Matches all types and acts as a base. You can override settings or set new properties by adding other types.
  • If you're using a timestamp in milliseconds, use epoch_millis as the format for the date type.

You can find more information here: Indices Templates, Mapping Changes, Date datatype
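
If you prefer to script this step, here is a minimal example (illustrative only, not part of this project) that applies the template above using Python and the requests library; adjust the host to match your ELASTICSEARCH_HOST.

import requests

# The same template document shown above, as a Python dict.
template = {
    "template": "sample-logs-*",
    "mappings": {
        "_default_": {
            "_all": {"enabled": "false"},
            "_source": {"enabled": "true"},
            "properties": {
                "timestamp": {
                    "type": "date",
                    "format": "date_optional_time",
                    "ignore_malformed": True,
                }
            },
            "dynamic_templates": [
                {
                    "strings": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "text", "index": False},
                    }
                }
            ],
        }
    },
}

resp = requests.put("http://elasticsearch:9200/_template/sample-logs", json=template)
resp.raise_for_status()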

Monitoring

This project exports metrics such as consumer lag by default. We don't have a grafana template ready for consumers but you can use existing dashboards as inspiration for your application.

The exported metrics are:

  • kafka_consumer_partition_delay: number of records between the last record consumed successfully and the last record on Kafka, by partition and topic.
  • kafka_consumer_records_consumed_successfully: number of records consumed successfully by this instance.

Development

Clone the repo, install dep and retrieve dependencies:

go get -u github.com/golang/dep/...
dep ensure -v

To run tests, run docker-compose up -d and run make test.

Versioning

The project's version is kept on the VERSION file on the project's root dir. Please update this file before merging any PRs to the master branch.

When a git push is triggered, CircleCI will run the project's tests and push the generated docker image to dockerhub. If the current branch is master, the docker tag will be <version>. If not, it will be <version>-<commit-sha>

Machine Learning for Predictive Maintenance – Automation World

As a manufacturing executive, plant floor manager, or operations engineer, your first thought on hearing the term system-on-chip (SoC) is probably that this is a technology purely of interest to device developers. Historically, your need to understand this type of embedded technology would be very limited. But just as everything else about automation technology is changing in light of the Industrial Internet of Things (IIoT), so too is the need for you to better understand certain aspects of embedded, chip-level technology.

At the Embedded World 2018 event in Nuremberg, Germany, Christoph Fritsch, director of Industrial, Scientific & Medical at Xilinx, told me that he sees the future of embedded technology in automation evolving very quickly. “For device manufacturers, these changes allow them to leverage the technology to build their devices more efficiently, but it’s also opening up the embedded sandboxes for system integrators and end users to drive applications,” he said.

Explaining this recent change, Fritsch said that, in the past, Xilinx only worked with control device builders, such as Rockwell Automation, Siemens, Schneider Electric, etc. But since the advent of automation supplier IIoT platforms, like Siemens’ Mindsphere or GE’s Predix, “we’re project managing the development of systems with OEMs and end users to help them connect from the edge to the cloud,” he said.

In response to these developments, Fritsch pointed out that Xilinx has been working in depth with system integrators for about two years to show how SoC technology can be applied. One example he mentioned in our meeting is Xilinx’s work with system integrator Aingura IIoT for a project at CNC machine builder Etxe-Tar in Elgoibar, Spain. The aim of this partnership between Xilinx and Aingura IIoT is to support Etxe-Tar’s plan to implement machine learning on its CNC machines to enable predictive maintenance.

This project at Etxe-Tar, which sells its CNC machines to the automotive industry for use in building powertrain parts, has been detailed in the Industrial Internet Consortium’s white paper, “Making Factories Smarter with Machine Learning.” In the automotive industry that Etxe-Tar principally serves, operational failure of a CNC machine’s spindles can result in hundreds of thousands of dollars of damages. When spindle internal bearings fail, they effectively create a chain reaction that can destroy any linked device in close proximity. Such occurrences can shut down a production line for weeks, depending on the severity of the failure and spare parts availability. Accounting for all related aspects, including an idled workforce, the total cost impact of such a failure can easily reach millions of dollars per week.

The predictive maintenance system Aingura IIoT built to leverage machine learning and address the CNC failure issue is called Oberon. The Oberon system gathers data from the machines to which it is connected to deliver information about machine behavior.

A key component of the Oberon system is its intelligent gateway, designed and manufactured by System-on-Chip Engineering. The gateway uses the Xilinx Zynq SoC, which combines ARM processing and programmable logic fabric in a reconfigurable SoC device to perform realtime acquisition, sensor fusion (aggregation and connection of data from multiple sensors), data filtering and analysis, and pattern detection.

In terms of Oberon’s data acquisition using SoC technology, its main objective is to gather data coming from the sensors. However, it also pre-processes that data to reduce the overall volume being transmitted. For example, vibration is sampled at a rate of at least twice the highest vibration frequency of interest, per the Nyquist criterion. In this predictive maintenance application, a fast Fourier transform is then performed and only the frequency band of interest is stored.
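To make that preprocessing step concrete, here is a minimal Python/NumPy sketch of sampling at a Nyquist-compliant rate, running an FFT, and keeping only the energy in the frequency band of interest. It illustrates the general technique, not the Oberon gateway's actual firmware; the sample rate and band limits are assumptions.

    import numpy as np

    def band_energy(samples, sample_rate_hz, band_hz):
        """Return spectral energy in the band of interest (band_hz = (low, high))."""
        # Window the signal to reduce spectral leakage, then take the FFT.
        spectrum = np.fft.rfft(samples * np.hanning(len(samples)))
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate_hz)
        mask = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
        # Only this scalar is stored and transmitted, not the raw vibration samples.
        return float(np.sum(np.abs(spectrum[mask]) ** 2))

    # Example: one second of vibration sampled at 10 kHz (well above twice the
    # 1 kHz component being monitored), keeping only the 950-1050 Hz band.
    fs = 10_000.0
    t = np.arange(0, 1.0, 1.0 / fs)
    signal = np.sin(2 * np.pi * 1_000 * t) + 0.1 * np.random.randn(t.size)
    print(band_energy(signal, fs, (950.0, 1_050.0)))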

For machine learning to be effective, it is critical to identify the relevant data variables so that noise and bandwidth use is reduced. This technique is called “feature subset selection.”

One example of how this can be applied in a CNC machine involves the machine’s servomotors, where variables like torque, power, temperature, vibration and angular speed can be measured. The number of trackable variables in a servomotor can be as high as 15,000. However, feature subset selection might show that only 50 of those variables need to be tracked and stored.
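The white paper does not spell out which selection algorithm is used, but a common way to whittle thousands of candidate signals down to a few dozen is a variance filter followed by a relevance ranking. The scikit-learn sketch below, with made-up array shapes, is one such illustration rather than Aingura IIoT's actual method.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_regression

    def select_subset(X, y, k=50):
        """Keep the k candidate variables most relevant to the target signal."""
        # Drop near-constant channels (unused sensors, dead inputs).
        filtered = VarianceThreshold(threshold=1e-6).fit_transform(X)
        # Rank the remainder by a simple univariate relevance score and keep the top k.
        return SelectKBest(f_regression, k=min(k, filtered.shape[1])).fit_transform(filtered, y)

    # Illustrative shapes only: 2,000 observations of 1,500 candidate variables
    # (the real servomotor case could have up to 15,000 candidates).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 1500))
    y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=2000)
    print(select_subset(X, y, k=50).shape)   # (2000, 50)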

To illustrate how this works, the “Making Factories Smarter with Machine Learning” paper shows how the workflow with data reduction (Figure 1) would be used to build the CNC machine learning system. In this example, data is taken from the manufacturing system and sent to a machine learning algorithm that uses the new data and other information, such as mathematical models, to produce the predictive system. As data travels through this process, a summarization step is performed so that only the data that is actually needed moves forward, which reduces bandwidth utilization and increases response speed.

Machine learning algorithms use historical data acquired during typical operation, together with real-time operational data, to identify and learn system behavior patterns during the machining process. The data is analyzed in real time on the intelligent gateway and compared to typical operation data to identify anomalous operation and predict degradation—down to the component level—prior to any system failure.
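One very simple way to picture that comparison against typical operation is a baseline of per-variable statistics learned from historical data, with streaming readings flagged when they drift too far from it. The z-score check in the Python sketch below is a stand-in for illustration; the actual gateway performs a richer, component-level analysis.

    import numpy as np

    class BaselineMonitor:
        """Flag streaming readings that fall outside the 'typical operation' fingerprint."""

        def __init__(self, historical, threshold=4.0):
            # Per-variable mean and spread learned from normal machining cycles.
            self.mean = historical.mean(axis=0)
            self.std = historical.std(axis=0) + 1e-9
            self.threshold = threshold

        def is_anomalous(self, reading):
            z = np.abs((reading - self.mean) / self.std)
            return bool(np.any(z > self.threshold))

    # Illustrative data: 5,000 historical samples of the 50 selected variables.
    rng = np.random.default_rng(1)
    history = rng.normal(size=(5000, 50))
    monitor = BaselineMonitor(history)
    print(monitor.is_anomalous(rng.normal(size=50)))   # typically False
    print(monitor.is_anomalous(np.full(50, 10.0)))     # True: far outside the baseline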

In a CNC application, a machine learning-based monitoring system could detect the first signs of failure, providing enough time to stop the line in a controlled manner. Production and workforce teams could then be reassigned to reduce the failure’s impact on line productivity.

As an example, the paper highlights how the acceleration level related to the shaft is plotted against servomotor power (Figure 2). Here, the clustering technique distinguishes between idle, acceleration/deceleration, and maximum-power states in terms of acceleration levels. The acceleration level is independent of the power consumption (the power levels can be distinguished more clearly here than in the data shown in Figure 3).

Based on this data analysis for the predictive maintenance application, it is expected that a servomotor should maintain this fingerprint acceleration level at all power consumption levels. Since the acceleration is related to the shaft angular speed, a malfunction of the servomotor would be detected whenever anomalies appeared outside the clusters; for example, anomalous vibration levels at a given acceleration state.
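As a rough sketch of that clustering idea, the code below fits three clusters to synthetic (power, acceleration) pairs standing in for the idle, acceleration/deceleration and maximum-power states, then flags any new reading that sits too far from every cluster centre. The synthetic data, the three-cluster choice and the distance threshold are all assumptions made for illustration, not values from the white paper.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    # Synthetic (power, acceleration) pairs for three operating states; the
    # acceleration "fingerprint" stays roughly constant across power levels.
    power = np.concatenate([rng.normal(0.1, 0.02, 300),   # idle
                            rng.normal(0.5, 0.05, 300),   # acceleration/deceleration
                            rng.normal(0.9, 0.05, 300)])  # maximum power
    accel = rng.normal(0.3, 0.03, 900)
    history = np.column_stack([power, accel])

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(history)

    # Distance to the nearest cluster centre during normal operation sets the threshold.
    normal_dist = kmeans.transform(history).min(axis=1)
    threshold = np.percentile(normal_dist, 99.5)

    def is_anomalous(p, a):
        """True if the (power, acceleration) reading falls outside every cluster."""
        return bool(kmeans.transform([[p, a]]).min() > threshold)

    print(is_anomalous(0.5, 0.31))   # consistent with the fingerprint
    print(is_anomalous(0.5, 0.90))   # anomalous acceleration at that power level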

Sheriff Legally Profits $750,000 from Starving Inmates


Photo: Etowah County Sheriff Todd Entrekin took home as personal profit more than $750,000 that was budgeted to feed jail inmates, which is legal in Alabama, according to state law and local officials. (Brynn Anderson/AP)

A sheriff in Alabama took home as personal profit more than $750,000 that was budgeted to feed jail inmates — and then purchased a $740,000 beach house, a reporter at The Birmingham News found.

And it's perfectly legal in Alabama, according to state law and local officials.

Alabama has a Depression-era law that allows sheriffs to "keep and retain" unspent money from jail food-provision accounts. Sheriffs across the state take excess money as personal income — and, in the event of a shortfall, are personally liable for covering the gap.

Etowah County Sheriff Todd Entrekin told the News that he follows that practice of taking extra money from the fund, saying, "The law says it's a personal account and that's the way I've always done it."

Sheriffs across the state do the same thing and have for decades. But the scale of the practice is not clear: "It is presently unknown how much money sheriffs across the state have taken because most do not report it as income on state financial disclosure forms," the Southern Center for Human Rights wrote in January.

But in Etowah County, the News found the paper trail.

'Following the letter of the law'

The News discovered the eye-popping figures on ethics disclosures that Entrekin sent to the state: Over the course of three years, he received more than $750,000 in extra compensation from "Food Provisions." The exact amount over $750,000 is unclear, because Entrekin was not required to specify amounts above a $250,000-a-year threshold, the paper writes.

The paper also found that Entrekin and his wife own several properties worth a combined $1.7 million, including a $740,000 four-bedroom house in Orange Beach, Ala., purchased in September.

Without the provision funds, Entrekin earns a little more than $93,000 a year, the paper says.

In a statement emailed to NPR, Entrekin said the "liberal media has began attacking me for following the letter of the law."

"The Food Bill is a controversial issue that's used every election cycle to attack the Sheriff's Office," Entrekin said. "Alabama Law is clear regarding my personal financial responsibilities of feeding inmates. Until the legislature acts otherwise, the Sheriff must follow the current law."

Before he made headlines for profiting off the law, Entrekin was better known for being indebted by it.

When Entrekin's predecessor died while still in office, all the money in the food provision account went to his estate — as state law dictated, a county official told NPR. Entrekin had to borrow $150,000 to keep the inmates fed. He was paying down that debt for years, The Gadsden Times reported.

In 2009, while he was still in debt from paying for inmates' food, Entrekin told the Times that he personally thought the law needed to be changed. But he noted that it might cost more money for taxpayers if the county commission had to manage jail kitchens through an open bid process.

David Akins, the chief administrative officer of the Etowah County Commission, agrees with that assessment. He says the commission is not eager to take on that duty, as some other local governments have done.

"The sheriff can feed inmates cheaper than the county can," he said.

Inmates' diets, sheriff's responsibility

Alabama's controversial system hearkens back to a different era, when county jails were more of a mom and pop operation and feeding inmates was often the responsibility of a sheriff's wife.

Today in Alabama, sheriffs are personally responsible for feeding inmates in their jails and receive funds to cover the cost. For state inmates, it's less than $2 per inmate per day; for county, city or federal inmates, the amount can be higher.

If sheriffs feed inmates on less than that, they can "keep and retain" whatever is left over.

Lawyer Aaron Littman, at the Southern Center for Human Rights, said in a January statement that the practice of pocketing leftover funds was a "dubious interpretation" of the law that "raises grave ethical concerns, invites public corruption, and creates a perverse incentive to spend as little as possible on feeding people who are in jail." He argues the sheriffs are supposed to manage the funds, not personally profit from them.

But local governments across the state say the law is clear that the money can be kept for personal use.

"That's the way it was set up years ago," Akins from the Etowah County Commission tells NPR. "That's just the way it's been in the state. ... Of course, state legislators could always change that if they wanted to."

He doesn't see a problem with the practice.

"I think if the inmates were not being fed properly, it might be a concern," he said. "But I'll guarantee you that if they're not fed properly, the federal government would let us know about it."

'Sheriff Corn Dog' and bankrupt car lots

In some cases, the federal government has objected.

In 2009, then-Sheriff Greg Bartlett of Morgan County was briefly tossed in jail after acknowledging that he had personally profited, to the tune of $212,000, from a surplus in the jail-food account. Prisoners testified about receiving meager meals.

To cut corners, Bartlett used charitable donations and "special deals," as CBS put it — including once splitting a $1,000 truck full of corn dogs with the sheriff of a nearby county and then feeding the inmates corn dogs twice a day for weeks.

He defended himself by noting that his profit was legal under state law, but an exasperated federal judge said the sheriff had an obligation to feed his inmates adequate food.

The story made national headlines, and Bartlett agreed to no longer dip into the jail food fund.

In 2015, a sheriff in Morgan County loaned $150,000 from the inmate food fund to a corrupt car lot. The loan was revealed when the business, facing theft and scam charges, went bankrupt.

Again, that sheriff's use of the food money was legal under state law; it was only prohibited in Morgan County because of the county's particular history.

Aside from individual lawsuits like those, it's hard to tell exactly how much money earmarked for inmate food is going to sheriffs.

This January, two advocacy groups sued for access to records that could reveal how much jail food money was being turned into personal profit. The groups said 49 sheriffs had refused to provide records of where funds were spent.

Then in February, reporter Connor Sheets of the News began revealing Entrekin's spending history and his ethics disclosures.

'I put two and two together'

Sheets' investigation has also made headlines because of the arrest of a key source.

Sheets spoke with a landscaper named Matt Qualls who mowed Entrekin's lawn in 2015 and noticed the name of the account on his checks — the "Sheriff Todd Entrekin Food Provision Account." He shared pictures with Sheets.

"A couple people I knew came through the jail, and they say they got meat maybe once a month, and every other day, it was just beans and vegetables," Qualls told Sheets. "I put two and two together and realized that that money could have gone toward some meat or something."

Sheets' initial story was published on Feb. 18. On Feb. 22, Qualls was arrested and charged with drug trafficking after an anonymous call complained of the smell of marijuana from an apartment.

Qualls, who had never been arrested before, faces six charges and is being held on a $55,000 bond, Sheets reports. He is detained in a jail that Entrekin oversees.

Qualls was arrested by Rainbow City Police, not by the sheriff's department.

The Etowah County Drug Enforcement Unit added extra charges to his case, including a charge of drug trafficking, which the Rainbow City Police chief said was based on inaccurate weight calculations. (The unit counted 14 grams of pot, infused in five cups of butter, as more than 1,000 grams worth of marijuana.)

"Penalties for drug trafficking are extremely steep in Alabama, where people have been imprisoned for life for the crime," Sheets notes.

The sheriff's office denies involvement in Qualls' case, noting that the landscaper was not arrested or charged by the sheriff's office. The extra charges were added by the Drug Enforcement Unit, which consists of agents drawn from the sheriff's department, the FBI and other law enforcement agencies.

Memoirs of an Ass


1.

Just to give you the essentials: Probably around 180 A.D. (which is to say probably during the reign of the emperor Marcus Aurelius), a novel was written in Latin. It really is a novel. Trot out any definition of novel: it’s that. Also, it’s the only one, complete, that we have from ancient Rome. Other similar books in Latin have been reduced, over the centuries, to rubble. The one I’m talking about is still whole. Corrupt, but whole.

The author’s name was Apuleius. He was famous during his lifetime as a Platonic philosopher. There were statues of him in North Africa, where he was from. They’re all gone now. And we don’t know how many copies of the novel existed during his lifetime. We do know that every one of ’em had to be copied out by hand. The text requires about two hundred pages of modern type, I don’t know how many pages of Latin holograph.

What’s in it. Well, a guy tells, in the first person, about traveling to the region of Greece that was most notorious for witchcraft in those days: Thessaly. He hears various ominous stories about how dangerous it is to meddle with magic. He does it anyway and, by mistake, transforms himself into a jackass. He gets stolen; he gets beaten; he overhears stories. He’s sold; he’s stolen again; he’s beaten again; he hears more stories. In the end, he turns back into a man, but his mentality is permanently altered. He has become a devotee of the goddess. He becomes a priest. The end.

The end, except I’ve left everything out. The stories he hears, both before the transformation and after, are—some of ’em at least—gold. And the author has set things up so there is an implicit running commentary every step of the way, for the benefit of anybody who’s really paying attention to what’s going on. Once you grasp that Apuleius is right up there with Jonathan Swift in terms of playing dumb while installing deep ironies and perversities, you are suddenly faced with a book that will stand up to a half dozen rereadings.

Apuleius called this book Metamorphoses. History knows it as The Golden Ass.

*

2.

I’m not speaking loosely when I say the novel will stand up to a half dozen readings. I have read it six times in the last three months, and I am in the middle of a seventh read through as I write this sentence. What I did was I studied and annotated the first chapter in the most recent translation I own (2011). And then I just went right along reading all the other chapter-1 translations to which I had ready access. These are ten in number, counting one in Spanish and one in Russian. The oldest English one is from 1566. The next oldest English one is from 1822. Those first three chapters I read eight times each, but then I settled down into the six translations I found most interesting. I finished the book that way, and now am taking it from the top.

If you’re saying to yourself, He must be one of these ones who are medically incapable of getting bored, you couldn’t be wronger. I never do anything like what I’m describing here. I mean, I might do it with a single poem, but never with a whole book. In fact, I thought I was going to heroically read the first chapter eight times and then settle down to read whichever translation seemed best. The thing that went wrong with that plan is I kept learning important things, every single read through. Stuff that ran right by me on the first, second, and even third and fourth readings would suddenly unfold itself in its full significance on the fifth. And so on. I thought, If ever there was a book that should be read in this crazy way, I have found it.

*

3.

Just look at chapter 1. The main character—his name is Lucius—is on his way to a town called Hypata. He runs into two travelers, one of whom is saying to the other, “Oh do stop telling such outrageous lies!” Lucius steps in and begs the storyteller to continue and scolds the other one for being closed-minded. Lucius goes off about how life is full of surprises and how he himself saw something he could hardly have believed if he had merely been told about it. He proceeds to describe a sword-swallower. Reader thinks, Oh this is good; his example is indeed amazing, but we know it is possible. But then Lucius goes on to describe an effeminate boy climbing the sword and doing a kind of boneless pole dance on it. Now the reader’s like, Hmm. The thing one doesn’t realize on first and maybe not even on second reading is that all through chapter 1, swallowing is made into an issue. It comes up a couple times at key points. If the book had been written in English, I would say with absolute confidence that the whole thing is playing on the idea of “swallowing” other people’s stories—but here the interrupted “liar” resumes his: It seems he, too, was on business in a foreign land (i.e., his situation “rhymes” with Lucius’s), and by chance he ran into his old acquaintance, whose name happens to be Socrates. Why does his name have to be that? Recall that Apuleius was famous as a Platonic philosopher. That name cannot be without special relevance. But this Socrates is sitting like a beggar, dressed in rags. The storyteller guy approaches him and gets the story. Socrates, it seems, was on business in a foreign land and was robbed by highwaymen. A kindhearted inn-keeping woman feeds him, shelters him, and takes him to bed. Thereafter, his life is hell. She, needless to say, is a witch, and holds Socrates in utter thrall. He dares not escape. Socrates tells his friend how this woman can turn the laws of physics inside out, invert nature at will, turn people into frogs, terrorize towns, et cetera. Our storyteller, frightened, says, “OK then we’d better get a good night’s sleep and then run far away, soon as the sun’s up.” They take a room at an inn.

Uh-oh. Late, late that night, the witches show up. The bolted door explodes open. The storyteller guy’s bed is thrown on top of him, so he becomes like a turtle in a shell. But Socrates doesn’t wake up. Two hags stand over him. The one says, “Here he is, my Ganymede! my Ulysses, fleeing his Calypso. He trifled with my youth, dear sister, toyed with my virgin charms, and now he wants to abandon me. And there, under that bed, is his friend who thinks we don’t know he’s there. What shall we do with these two, my sister?” The other witch suggests they castrate and kill the storyteller guy, but the main witch says, “No, leave him, so there’s someone to bury the body of this other wretch,” whereupon, she takes out a sword and plunges it into Socrates’s neck. The other has a sponge ready and catches the blood, every drop. Then the main one reaches into the wound up to her elbow and draws out Socrates’s heart, and they plug the hole with the sponge, saying a spell to the effect of, O sponge, born in the sea, beware of crossing a river. Then they squat over the other guy, who’s half dead with fright, and piss on him, thoroughly drenching him. Then they leave. The door springs back into place. The hinges reassemble. Then there’s some fuss where the storyteller guy wants to flee the scene, fearing he will be blamed for the murder. His flight is thwarted, so he tries to hang himself. He takes the plunge, the rope breaks, and the authorities charge in at that very moment. You would think his goose is cooked, but no. Socrates wakes up. He goes, “What is this intolerable disturbance, when I was just having the best sleep I’ve had in forever!” The storyteller guy is overjoyed. He tries to embrace Socrates, who immediately throws him off, complaining about the rank odor of the urine. The authorities leave and now the storyteller doesn’t know what to think. There’s no scar on Socrates’s neck.

Of course they decide to be off, but Socrates soon complains of hunger. They share a meal of cheese (the very stuff that Lucius couldn’t swallow before). And now Socrates is thirsty. And here it comes. When he bends down to drink from a nearby river, the sponge jumps out of his neck and a tiny trickle of blood comes out, and Socrates pitches over, dead, on the spot. The storyteller hastily buries him, flees for his life, has never gone home, and is now remarried. And guess what, the storyteller/survivor, the unbelieving listener, and Lucius have just reached the gates of Hypata. Lucius thanks the storyteller for his “charming and delightful” narrative, which made the walk seem so much shorter, and enters the town.

This is actually only the first half of chapter 1.

*

4.

Two very important points. The witches operate by indirection. They could’ve just killed Socrates outright; instead, they prefer to let the storyteller guy enjoy (if that’s the word for it) a brief period where things seem to be improving. They do this the better to snatch away his hopes, later. They want to toy with their victims, and in this way, they very much resemble a natural phenomenon from real life, namely fatal illnesses, which often allow their victims all sorts of false hopes and time to think ’em over. As a friend of mine once said, Cancer doesn’t want to just kill you, it wants you to have a sinking feeling first. So there has to be all this toying, these feints, these fake outs. And so we must note that the witches also resemble a different real-life phenomenon: the Nabokovian novelist.

Important point number two: there is something very obviously wrong with the end of the story there. Lucius, who has promised to believe anything the storyteller has to say, does not react to the narrative like a person who believes it. If somebody told you that stuff, and you believed it, you wouldn’t call the thing charming and delightful. But this is one of these “reveals” that run right by you, the first time you read ’em. Apuleius seems in many places to regard belief in general—and religious belief in particular—as a kind of “charming and delightful” game. I think this is what he thought of Plato himself, and definitely what he thought of the goddess to whom he devotes the most notoriously baffling final chapter from Classical Antiquity: Book 11, sometimes called “The Isis Book.”

I shall have a great deal more to say about this in my next post, but since you’ve read eighteen hundred words, I think you are due for a rest. (← This move is the kind of thing Apuleius would do.)

Anthony Madrid lives in Victoria, Texas. His second book is Try Never. He is a correspondent for the Daily.
