Like a shrinking ray, the compression algorithm could target different output sizes. At half size, the files sounded decent. At quarter size, they sounded OK. In March 1988, Brandenburg isolated a recording of a piano solo, then dialed the encoding ratio as low as he dared—all the way down to Seitzer’s crazy stretch goal of one-twelfth CD size. The resulting encoding was lousy with errors. Brandenburg would later say the pianist sounded “drunk.” But even so, this experiment in uneasy listening gave him confidence, and he began to see for the first time how Seitzer’s vision might be achieved.
Increases in processing power spurred progress. Within a year Brandenburg’s algorithm was handling a wide variety of recorded music. The team hit a milestone with the 1812 Overture, then another with Tracy Chapman, then another with a track by Gloria Estefan (Grill was on a Latin kick). In late 1988, the team made its first sale and shipped a hand-built decoder to the first-ever end user of mp3 technology: a tiny radio station run by missionaries on the remote Micronesian island of Saipan.
But one audio source was proving intractable: what Grill, with his imperfect command of English, called “the lonely voice.” (He meant “lone.”) Human speech could not, in isolation, be psychoacoustically masked. Nor could you use Huffman’s pattern recognition approach—the essence of speech was its dynamic nature, its plosives and sibilants and glottal stops. Brandenburg’s shrinking algorithm could handle symphonies, guitar solos, cannons, even “Oye Mi Canto,” but it still couldn’t handle a newscast.
Stuck, Brandenburg isolated samples of “lonely” voices. The first was a recording of a difficult German dialect that had plagued audio engineers for years. The second was a snippet of Suzanne Vega singing the opening bars of “Tom’s Diner,” her 1987 radio hit. Perhaps you remember the a cappella intro to “Tom’s Diner.” It goes like this:
Dut dut duh dut
Dut dut duh dut
Dut dut duh dut
Dut dut duh dut
Vega had a beautiful voice, but on the early stereo encodings it sounded as if there were rats scratching at the tape.
In 1989, Brandenburg defended his thesis and was awarded his PhD. He then took the voice samples with him on a fellowship to AT&T’s Bell Labs in Murray Hill, New Jersey. There, he worked with James Johnston, a specialist in voice encoding. Johnston was the Newton to Brandenburg’s Leibniz—independently, he had hit upon an identical mathematical approach to psychoacoustic modeling, at almost exactly the same time. After an initial period spent marking territory, the two decided to cooperate. Throughout 1989, listening tests continued in parallel in Erlangen and Murray Hill, but the American test subjects proved less patient than the Germans. After listening to the same rat-eaten, four-second sample of “Tom’s Diner” several hundred times, the volunteers at Bell Labs revolted, and Brandenburg was forced to finish the experiment on his own. He was there in New Jersey, listening to Suzanne Vega, when the Berlin Wall came down.
Johnston was impressed by Brandenburg. He’d spent his life around academic researchers and was accustomed to brilliance, but he’d never seen anybody work so hard. Their collaboration spurred several breakthroughs, and soon the scratching rats were banished. In early 1990, Brandenburg returned to Germany with a nearly finished product in hand. Many compressed samples now revealed a state of perfect “transparency”: even to a discriminating listener like Grill, using the best equipment, they were indistinguishable from the original compact discs.
Impressed, AT&T officially graced the technology with its imprimatur and a modicum of corporate funding. Thomson, a French consumer electronics concern, also began to provide money and technical support. Both firms were seeking an edge in psychoacoustics, as this long-ignored academic discipline was suddenly white hot. Research teams from Europe, Japan, and the United States had been working on the same problem, and other large corporations were jockeying for position. Many had thrown their weight behind Fraunhofer’s better-established competitors. Seeking to mediate, the Moving Picture Experts Group (MPEG)—the standards committee that even today decides which technology makes it to the consumer marketplace—convened a contest in Stockholm in June 1990 to conduct formalized listening tests for the competing methods.
As the ’90s opened, MPEG was preparing for a decade of disruption, shaping technological standards for near-future technologies like high-definition television and the digital video disc. Being moving picture experts, the committee had first focused exclusively on video quality. Audio encoding problems were an afterthought, one they’d tackled only after Brandenburg pointed out that there was no longer much of a market for silent movies. (This was the sort of joke that Brandenburg liked to make.)
An MPEG endorsement might mean a fortune in licensing fees, but Brandenburg knew it would be tough to get. The Stockholm contest was to be graded against ten audio benchmarks: an Ornette Coleman solo, the Tracy Chapman song “Fast Car,” a trumpet solo, a glockenspiel, a recording of fireworks, two separate bass solos, a ten-second castanet sample, a snippet of a newscast, and a recording of Suzanne Vega performing “Tom’s Diner.” (The last was suggested by Fraunhofer.) The judges were neutral participants, selected from a group of Swedish graduate students. And, as MPEG needed undamaged ears that could still hear high-pitched frequencies, the evaluators skewed young.
Fourteen different groups submitted entries to the MPEG trials—the high-stakes version of a middle school science fair. On the eve of the contest, the competing groups conducted informal demonstrations. Brandenburg was confident his group would win. He felt that access to Zwicker’s seminal research, still untranslated from German, gave him an insurmountable edge.
The next day a room full of fair-haired, clear-eared Scandinavian virgins spent the morning listening to “Fast Car” ripped 14 different ways. The listeners scored the results for sound quality on a five-point scale. After tabulating the answers, MPEG announced the results—it was a tie! At the top was Fraunhofer, locked in a statistical dead heat with a rival group called MUSICAM. No one else was close.
Fraunhofer’s strong showing in the contest was unexpected. They were a dark horse candidate from a research institution, a bunch of graduate students competing against established corporate players. MUSICAM was more representative of the typical MPEG contest winner—a well-funded consortium of inventors from four different European universities, with deep ties to the Dutch corporation Philips, which held the patents on the compact disc. MUSICAM also had several German researchers on staff, and Brandenburg suspected this was not a coincidence. They’d had access to Zwicker’s untranslated research, too.
MPEG had not anticipated a tie, and had not made provisions to break one. Fraunhofer’s approach provided better audio quality with less data, but MUSICAM’s required less processing power. Brandenburg felt this disparity worked in his favor, as computer processing speed improved with each new chip cycle, and doubled every 24 months or so. Improving bandwidth was more difficult, as it required digging up city streets and replacing thousands of miles of cable. Thus, Brandenburg felt, MPEG should look to conserve bandwidth rather than processing cycles, and he repeatedly made this argument to the audio committee. But he felt he was being ignored.
After Stockholm the team waited for months for a ruling from MPEG. In October 1990, Germany was reunified, and Grill kept himself busy by applying Brandenburg’s algorithm to his new favorite song: the Scorpions’ “Wind of Change.” In November, Eberhard Zwicker, hearing researcher and table tennis enthusiast, passed away at the age of 66. In January 1991, the Fraunhofer team rolled out its first commercial product, a 25-pound hardware rack for broadcast transmission. It made an early sale to the bus shelters of a reunified Berlin.