With time, Seitzer began to outplay the master. Zwicker was an anatomist, and his insights were products of the analog era. Seitzer, by contrast, was a computer scientist, and he anticipated the coming era of digitization. In particular, he suspected that, by exploiting Zwicker’s research into the ear’s inherent flaws, it might be possible to record high-fidelity music with very small amounts of data. This unique education gave him an unusual perspective. When the compact disc debuted in 1982, the engineering community celebrated it as one of the most important achievements in the history of the field. Seitzer, practically alone, saw it as a ridiculous exercise in overkill. Where the sales literature promised “Perfect Sound Forever,” Seitzer saw a maximalist repository of irrelevant information, most of which was ignored by the human ear. He knew that most of the data from a compact disc could be discarded—the human auditory system was already doing it.

That same year, Seitzer applied for a patent for a digital jukebox. Under this more elegant model of distribution, consumers could dial into a centralized computer server, then use the keypad to request music over the new digital telephone lines that Germany was just beginning to install. Rather than pressing millions of discs into jewel cases and distributing them through stores, everything would be saved in a single electronic database and accessed as needed. A subscription-based service of this kind could skip the manifold inefficiencies of physical distribution by hooking the stereo directly to the phone.

The patent was rejected. The earliest digital phone lines were primitive affairs, and the enormous amount of audio data on the compact disc could never fit down such a narrow pipe. For Seitzer’s scheme to work, the files on the disc would have to be shrunk to one-twelfth their original size, and no known approach to data compression would get you anywhere near this level. Seitzer battled with the patent examiner for a few years, citing the importance of Zwicker’s findings, but without a working implementation it was hopeless. Eventually, he withdrew his application.

Still, the idea stayed with him. If the limitations of the human ear had been mapped by Zwicker, then the remaining task was to quantify these limitations with math. Seitzer himself had never been able to solve this problem, nor had any of the many other researchers who had tried. But he directed his own protégé toward the problem with enthusiasm: Karlheinz Brandenburg, a young electrical engineering student, was one of the smartest people he’d ever met.

Privately, Brandenburg wondered if a decade of table tennis with an eccentric otological experimenter had driven Seitzer insane. Information in the digital age was stored in binary units of zero or one, termed “bits,” and the goal of compression was to use as few of these bits as possible. CD audio used more than 1.4 million bits to store a single second of stereo sound. Seitzer wanted to do it with 128,000.
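The arithmetic behind those two figures is easy to check. Here is a minimal sketch in Python; the 44.1 kilohertz sampling rate and 16-bit samples are the compact disc's published specification, and the variable names are mine.

```python
# Where the "1.4 million bits" figure comes from, and the size of the gap to 128,000.
SAMPLE_RATE = 44_100        # CD audio: samples per second, per channel
BITS_PER_SAMPLE = 16        # CD audio: resolution of each sample
CHANNELS = 2                # stereo

cd_rate = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS   # bits per second on disc
target_rate = 128_000                                # Seitzer's goal

print(f"CD audio: {cd_rate:,} bits per second")      # 1,411,200
print(f"Target:   {target_rate:,} bits per second")
print(f"Roughly {cd_rate / target_rate:.0f} to 1")   # about 11 to 1
```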

Brandenburg thought this goal was preposterous—it was like trying to build a car on a budget of two hundred dollars. But he also thought it was a worthy target for his own ambitions. He worked on the problem for the next three years, until in early 1986 he spotted an avenue of inquiry that had never been explored. Dubbing this insight “analysis by synthesis,” he spent the next few sleepless weeks writing a set of mathematical instructions for how those precious bits could be assigned.

He began by chopping the audio up. With a “sampler,” he divided the incoming sound into fractional slivers of a second. With a “filter bank,” he then further sorted the audio into different frequency partitions. (The filter bank worked on sound the way a prism worked on light.) The result was a grid of time and frequency, consisting of microscopic snippets of sound, sorted into narrow bands of pitch—the audio version of pixels.
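A crude way to picture that grid is to frame the signal and split each frame into coarse frequency bands. The sketch below is only an illustration of the idea: it uses a plain Fourier transform as a stand-in for Brandenburg's actual filter bank, and the frame length and band count are arbitrary choices of mine.

```python
import numpy as np

def time_frequency_grid(signal, frame_len=1152, n_bands=32):
    """Chop a mono signal into slivers of time, then split each sliver into
    frequency bands: a toy stand-in for a sampler plus a filter bank."""
    n_frames = len(signal) // frame_len
    grid = np.zeros((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame))       # frequency content of this sliver
        bands = np.array_split(spectrum, n_bands)   # group the bins into coarse bands
        grid[i] = [band.mean() for band in bands]   # one energy value per "pixel"
    return grid  # rows are moments in time, columns are bands of pitch

# One second of a 440 Hz tone at the CD sampling rate.
t = np.arange(44_100) / 44_100
grid = time_frequency_grid(np.sin(2 * np.pi * 440 * t))
print(grid.shape)   # (38, 32): 38 slivers of time, 32 bands each
```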

Brandenburg then told the computer how to simplify these audio “pixels” using four of Zwicker’s psychoacoustic tricks:

First, Zwicker had shown that human hearing was best at a certain range of pitch frequencies, roughly corresponding to the tonal range of the human voice. At registers beyond that, hearing degraded, particularly as you went higher on the scale. That meant you could assign fewer bits to the extreme ends of the spectrum.

Second, Zwicker had shown that tones that were close in pitch tended to cancel each other out. In particular, lower tones overrode higher ones, so if you were digitizing music with overlapping instrumentation—say a violin and a cello at the same time—you could assign fewer bits to the violin.

Third, Zwicker had shown that the auditory system canceled out noise following a loud click. So if you were digitizing music with, say, a cymbal crash every few measures, you could assign fewer bits to the first few milliseconds following the beat.

Fourth—and this is where it gets weird—Zwicker had shown that the auditory system also canceled out noise prior to a loud click. This was because it took a few milliseconds for the ear to actually process what it was sensing, and this processing could be disrupted by a sudden onrush of louder noise. So, going back to the cymbal crash, you could also assign fewer bits to the first few milliseconds before the beat.
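Taken together, the four rules are deductions from a default budget of bits. The toy function below paraphrases them in code; it is not Brandenburg's allocation logic, and every threshold and penalty in it is an invented placeholder, there only to show the shape of the reasoning.

```python
def bits_for_pixel(band_hz, louder_neighbor_below, ms_since_click, ms_until_click,
                   base_bits=8):
    """Toy bit allocation for one time/frequency 'pixel', applying the four
    psychoacoustic rules in caricature. All numbers are made up."""
    bits = base_bits

    # Rule 1: hearing is sharpest near the range of the human voice;
    # spend fewer bits at the extremes, especially the high end.
    if band_hz > 10_000:
        bits -= 3
    elif band_hz > 5_000 or band_hz < 100:
        bits -= 1

    # Rule 2: a louder tone just below this one in pitch masks it.
    if louder_neighbor_below:
        bits -= 2

    # Rule 3: the ear is effectively deaf for a few milliseconds after a loud click.
    if ms_since_click is not None and ms_since_click < 5:
        bits -= 2

    # Rule 4: stranger still, it is also deaf just before the click.
    if ms_until_click is not None and ms_until_click < 2:
        bits -= 2

    return max(bits, 0)

# A high band right after a cymbal crash, masked by a lower tone, gets almost nothing.
print(bits_for_pixel(12_000, louder_neighbor_below=True,
                     ms_since_click=3, ms_until_click=None))   # 1
```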

Relying on decades of empirical auditory research, Brandenburg told the bits where to go. But this was just the first step. Brandenburg’s real achievement was figuring out that you could run this process iteratively. In other words, you could take the output of his bit-assignment algorithm, feed it back into the algorithm, and run it again. And you could do this as many times as you wished, each time reducing the number of bits you were spending, making the audio file as small as you liked. There was degradation, of course: like a copy of a copy or a fourth-generation cassette dub, the audio got a little worse with each successive pass of the algorithm. In fact, if you ran the process a million times, you’d end up with nothing more than a single bit. But if you struck the right balance, it would be possible to both compress the audio and preserve fidelity, using only those bits you knew the human ear could actually hear.
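The effect of that feedback loop can be mimicked with a much cruder stand-in: repeatedly requantizing a signal at ever lower resolution. The sketch below substitutes plain requantization for Brandenburg's psychoacoustic pass, a deliberate simplification, but it shows the same trade of bits against fidelity.

```python
import numpy as np

def requantize(signal, bits):
    """Round a signal in [-1, 1] onto a grid with 2**bits levels:
    a blunt stand-in for one lossy encoding pass."""
    half_levels = 2 ** bits / 2
    return np.round(signal * half_levels) / half_levels

t = np.arange(44_100) / 44_100
original = np.sin(2 * np.pi * 440 * t)

audio = original
for bits in (12, 8, 5, 3, 1):             # each pass spends fewer bits than the last
    audio = requantize(audio, bits)
    error = np.mean((audio - original) ** 2)
    print(f"{bits:2d} bits per sample -> mean squared error {error:.6f}")
```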

Of course, not all musical works employed such complex instrumentation. A violin concerto might have all sorts of psychoacoustic redundancies; a violin solo would not. Without cymbal crashes, or an overlapping cello, or high-register information to be simplified, there was just a pure tone and nowhere to hide. What Brandenburg could do here, though, was dump the output bits from his compression method into a second, completely different one.

Termed “Huffman coding,” this approach had been developed by the pioneering computer scientist David Huffman at MIT in the 1950s. Working at the dawn of the Information Age, Huffman had observed that if you wanted to save on bits, you had to look for patterns, because patterns, by definition, repeated. Which meant that rather than assigning bits to the pattern every time it occurred, you just had to do it once, then refer back to those bits as needed. And from the perspective of information theory, that was all a violin solo was: a vibrating string, cutting predictable, repetitive patterns of sound in the air.
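Huffman's idea can be demonstrated directly: count how often each symbol occurs, then hand the common ones the shortest codes. The sketch below builds a textbook Huffman code over a toy, highly repetitive stream of samples; it says nothing about the particular variant Brandenburg eventually used.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code: the more often a symbol occurs, the shorter its bit string."""
    heap = [[count, i, {sym: ""}]
            for i, (sym, count) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        low = heapq.heappop(heap)    # the two rarest groups so far
        high = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in low[2].items()}
        merged.update({sym: "1" + code for sym, code in high[2].items()})
        heapq.heappush(heap, [low[0] + high[0], next_id, merged])
        next_id += 1
    return heap[0][2]

# A toy "violin solo": one sample value dominates, so it earns the shortest code.
samples = [0] * 80 + [1] * 12 + [2] * 5 + [3] * 3
codes = huffman_codes(samples)
total = sum(len(codes[s]) for s in samples)
print(codes)                                   # the common value 0 gets a one-bit code
print(total, "bits, versus", 2 * len(samples), "with a fixed two-bit code")
```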

The two methods complemented each other perfectly: Brandenburg’s algorithm for complicated, overlapping noise; Huffman’s for pure, simple tones. The combined result united decades of research into acoustic physics and human anatomy with basic principles of information theory and complex higher math. By the middle of 1986, Brandenburg had even written a rudimentary computer program that provided a working demonstration of this approach. It was the signature achievement of his career: a proven method for capturing audio data that could stick to even the stingiest budget for bits. He was 31 years old.

