HDCD - reading the patent

MartinLogan Audio Owners Forum


spectral
It has been 15 years since the invention, and I had long been looking for a concise and detailed description of it. Unable to find anything but bits and pieces, I sat down to read this very lengthy patent, partly because I don't see the proliferation of higher resolution digital I would have expected by now, and partly because it makes for fascinating reading. For what it's worth, the following are some of the more important claims made in the patent, with most comments copied verbatim to preserve fidelity. Hopefully I have not misrepresented anything, but the more techno-savvy should feel free to make necessary corrections...

Basic claim: compatible system which provides an adaptive interplay of gain, slew rate, filter action and wave synthesis processes to substantially reduce signal distortions and improve apparent resolution. Resolution is selectively and adaptively traded off for slew accuracy, and slew rate or maximum level is borrowed for higher resolution. Information rate is conserved by toggling back and forth or fading from process to process when needed.

1) Most digital distortions can be predicted, as they are strongly related to signal conditions which are easy to identify. For a given signal, one can choose a best encode-decode process having the least audible or sonically damaging distortion. If one must operate with the Nyquist limit, then a transient response versus alias compromise exists. HDCD is a series of techniques to address various kinds of distortion. Such techniques include: fast peak expansion, averaged low level gain reductions, selecting complementary interpolation filters, waveform synthesis, record-compress/play-expand system (with some features similar in ways to those used in noise reduction systems, but unlike those, it corrects distortion), and others. See Note 6.
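One of the techniques listed above is the record-compress/play-expand system. A minimal sketch of the complementary idea, assuming an illustrative threshold, gain, and curve (the patent's actual gain law is not given here):

```python
# Sketch of a complementary low-level compress/expand ("compand") stage,
# loosely inspired by the patent's record-compress / play-expand idea.
# THRESHOLD, GAIN, and the piecewise-linear curve are illustrative
# assumptions, not the actual HDCD gain structure.

THRESHOLD = 0.1   # level below which low-level gain is applied (assumed)
GAIN = 2.0        # encode-side boost for quiet samples (assumed)

def encode(x):
    """Boost quiet samples so they use more of the 16-bit range."""
    if abs(x) < THRESHOLD:
        return x * GAIN
    # keep the curve continuous above the threshold
    sign = 1.0 if x >= 0 else -1.0
    return sign * (THRESHOLD * GAIN + (abs(x) - THRESHOLD))

def decode(y):
    """Complementary expansion restores the original level."""
    knee = THRESHOLD * GAIN
    if abs(y) < knee:
        return y / GAIN
    sign = 1.0 if y >= 0 else -1.0
    return sign * (abs(y) - knee + THRESHOLD)

# Round trip is (numerically) transparent:
samples = [0.001, -0.05, 0.2, -0.9]
restored = [decode(encode(s)) for s in samples]
```

Unlike a plain noise-reduction compander, the patent pairs this with control codes so the decoder always applies the exactly complementary expansion.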

2) Uses the least significant bit of a word for encoding the command codes and other auxiliary data, thus adding only the slightest amount of noise (other techniques are also discussed but not utilized, presumably because they would break compatibility with non-HDCD processors). However, to make it inaudible, the control data is scrambled with a pseudo-random number sequence and inserted on an as-needed basis in a serial fashion, one bit per word (external literature claims that the HDCD encoding accounts for 3-4% of the entire dataset). See also Notes 1 and 2 for more details. In typical classical music programming, the control signal would be inserted for intervals of about a millisecond each, occurring several times per second at most. The loss of full program resolution for these brief intervals is not noticeable.

3) Thus, lack of HDCD decoding yields a signal with slightly less dynamic range and only slightly higher background noise. However, the signal will have lower quantization and slew induced distortions and, hence, the processed encoded product, when reproduced on non-decoding standard equipment, will sound equal to or better than an unencoded product.

4) The signal is delayed during encoding and, by using look-ahead and look-behind techniques, an optimal encoding technique is identified; only one is active at any given point in time. This is done to address digital distortions which occur at the extremes of high level, slew rate, and high frequency on one hand, and with quiet signals and short small transients on the other; the best encode/decode strategy is chosen for each extreme without the process compromise hurting the opposite aspects of the program.
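The delayed, windowed decision process can be sketched as follows; the block size, peak thresholds, and process names are all illustrative assumptions, not the patent's actual decision logic:

```python
# Sketch of the look-ahead idea in (4): the encoder delays the signal,
# inspects a window around each block, and activates exactly one encoding
# process for that block. BLOCK, the peak thresholds, and the process
# names are assumptions for illustration only.

BLOCK = 4  # samples per decision (assumed)

def choose_process(window):
    """One process active at a time, chosen from the surrounding context."""
    peak = max(abs(s) for s in window)
    if peak > 0.9:
        return "peak_extend"       # loud passage: manage peaks/slew
    if peak < 0.1:
        return "low_level_boost"   # quiet passage: compress low levels
    return "passthrough"

def plan(signal):
    """Look one block behind and ahead when deciding each block."""
    decisions = []
    for i in range(0, len(signal), BLOCK):
        window = signal[max(0, i - BLOCK): i + 2 * BLOCK]
        decisions.append(choose_process(window))
    return decisions
```

Because the window extends past the current block, a quiet block just before a loud transient is already handled with the transient in view, which is the point of the encode-side delay.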

5) The decoder has a much easier job, because the complementary decoding technique is already defined during the encoding. On the other hand, the encoder would be best served if implemented with higher sample rate and wider word length. Computing power matters here. There is nothing in the patent that prevents the decoder from doing further analysis and adjustments on playback; this may partly explain why recent HDCD playback systems sound better than previous generations with the same old HDCD recordings (PMD-100 vs PMD-200 vs custom processor-based solutions).

6) Reproduction of higher-than-Nyquist-limit frequencies is possible without creating sub-harmonic or foldover distortions[!!!]. Does this mean that current DACs/players that implement brick wall filters in the digital domain using SHARC processors (like the Spectral SDR-4000S Pro and the Berkeley Alpha DAC) take advantage of this by shifting the cutoff frequency higher, while the digital basis of implementation avoids all phase issues, even better than so-called 'apodizing' filters? See Notes 3 and 4.

7) Minute details, like hall reverberation mixed in otherwise loud bass sounds, are usually attenuated or lost in conventional red book encoding, producing a collapse of the sense of space in a recording, but preserved by the invention. Such distortions occur when very small signal amplitude changes of about 5 to 20 millivolts are mixed with a larger low frequency dominated signal and are, thus, averaged at many different voltage levels of a larger slow waveform.

8) The decoded signal may have an apparent increase of bandwidth and resolution (external literature claims 18-20 bits; the patent claims 'The system modifies signals to achieve 18 bit performance from a standard 16 bit converter system', and 'the dynamic range for fully decoded reproduction in accordance with the invention has increased almost 20 dB. Average resolution is substantially more than 18 bits', and 'The result of decimation [during D/A] is a signal having approximately 20 bit resolution at one-times sampling rate... this signal is then packed into 16 bit words and adds control information for use by the reproducer').
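As a sanity check on these figures, the standard ideal dynamic range of an n-bit quantizer is about 6.02n + 1.76 dB (this formula is textbook quantization theory, not from the patent), so "almost 20 dB" above 16-bit corresponds to roughly three extra bits:

```python
# Ideal dynamic range of an n-bit quantizer (standard result, not from
# the patent): DR ≈ 6.02*n + 1.76 dB. An extra ~20 dB over 16 bits is
# a bit more than 3 additional bits of resolution.

def dynamic_range_db(bits):
    return 6.02 * bits + 1.76

dr16 = dynamic_range_db(16)   # ≈ 98.1 dB
dr20 = dynamic_range_db(20)   # ≈ 122.2 dB
extra = dr20 - dr16           # ≈ 24 dB for 4 extra bits, ~6 dB per bit
```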

9) Following the invention, one process can borrow information rate from a less needed performance capability when potentially severe distortion conditions in the signal call for it. In this manner, a decision to provide more points per fast voltage change yields an equivalent higher sampling rate at the expense of less important low level resolution. Conversely, a smaller voltage change per sample automatically reduces the momentarily unneeded speed capability. Such interplay and compromise can be managed and/or computed to maintain a substantially constant digital information rate.

10) A correction strategy is applied to reduce filter trade-off compromise errors between transient response, phase accuracy, settling time, group delay, and other distortions inherent with filtering methods. See Note 5.

11) On sample-time jitter and internal crosstalk: The system of the present invention stops all operations long enough before sampling to allow the energy stored on cables and other energy storage parts to dissipate (noise from cables, ICs and other parts becomes ten to one hundred times less, and a signal sample accurate to millionths of a volt occurs). One pulse initiates sampling which then occurs during electrical silence. Once the analog signal is sampled and safely held, the conversion process resumes and the digital code is sent to the digital signal processors. In order to keep sample-time jitter to a minimum, we place the system clock in the A to D converter module.

The following are copied mostly verbatim:

NOTE 1: Sharing the LSB with the main program data means that the reproducer has to be able to identify the commands embedded within the stream of arbitrary main program data. This is accomplished by preceding a command code with a synchronizing sequence of bits which the decoder looks for in the data stream. False triggering of the reproducer on program data can be completely eliminated by having the encoder monitor the program data stream during recording and alter the least significant bit in one word if the synchronizing sequence is about to occur, thereby preventing it.
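The sync-avoidance trick in Note 1 can be sketched as a scan over the LSB stream; the pattern below is an arbitrary placeholder, not the real HDCD sync code:

```python
# Sketch of the sync-avoidance idea in Note 1: the encoder watches the
# LSB stream during recording and flips one LSB whenever the reserved
# synchronizing pattern is about to appear in ordinary program data.
# SYNC is an arbitrary placeholder, not the actual HDCD sync code.

SYNC = [1, 0, 1, 1, 0, 0, 1, 0]  # assumed 8-bit sync pattern

def scrub_lsb_stream(lsbs):
    """Return a copy of the LSB stream with accidental SYNC matches broken."""
    out = list(lsbs)
    i = 0
    while i + len(SYNC) <= len(out):
        if out[i:i + len(SYNC)] == SYNC:
            out[i + len(SYNC) - 1] ^= 1  # flip the last bit of the match
        i += 1
    return out

original = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
clean = scrub_lsb_stream(original)
# no window of `clean` equals SYNC, so the decoder cannot false-trigger
```

Altering a single LSB costs at most one half-step of error in one sample, which is why the patent describes false triggering as completely eliminable at negligible cost.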

NOTE 2: The process control signal is hidden in the least significant bit of the digital audio channel by modulating it with a noise signal. The circuit consists of a pseudo-random noise generator based on a [31-bit] shift register with feedback which implements a maximal length sequence. This type of generator produces a deterministic sequence of bits which sounds very random, and yet is a reproducible sequence. The output of the noise generator is added to the control signal modulo-two (exclusive-or'ed), modulating the signal with noise, or scrambling it. The result is then inserted into the least significant bit of the record serial data stream. On the play side, the least significant bit is extracted from the serial digital stream and the output of a matching shift register is subtracted from it, modulo-two (exclusive-or again). The result is the process control signal, unscrambled again.
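The scrambler in Note 2 is straightforward to sketch; the patent specifies a 31-bit maximal-length shift register but not its tap positions, so the taps below (the primitive polynomial x^31 + x^28 + 1) are an assumption for illustration:

```python
# Minimal sketch of the Note 2 scrambler: a maximal-length LFSR produces
# a reproducible pseudo-random bit sequence, XORed with the control bits
# on the record side and XORed again by an identical generator on the
# play side. Tap positions (31, 28) are an assumption; the patent only
# says a 31-bit maximal-length register is used.

def lfsr_bits(seed, n):
    """31-bit Fibonacci LFSR, taps at bits 31 and 28 (maximal length)."""
    state = seed & 0x7FFFFFFF
    out = []
    for _ in range(n):
        bit = ((state >> 30) ^ (state >> 27)) & 1
        state = ((state << 1) | bit) & 0x7FFFFFFF
        out.append(bit)
    return out

def xor_scramble(bits, seed=0x1234567):
    """Same function scrambles and descrambles (XOR is its own inverse)."""
    return [b ^ k for b, k in zip(bits, lfsr_bits(seed, len(bits)))]

control = [1, 0, 0, 1, 1, 0, 1, 0]
hidden = xor_scramble(control)      # inserted into the LSB channel
recovered = xor_scramble(hidden)    # play side, matching generator
```

Because the sequence is deterministic yet noise-like, an undecoded player hears only dither-level noise in the LSB, while a synchronized decoder recovers the control bits exactly.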

NOTE 3: An important advantage is that the "brick wall" low-pass filter required to prevent Nyquist--Alias errors can be implemented as a digital filter, which has highly reproducible characteristics free from phase distortions. The characteristics of this filter can be chosen dynamically based upon an analysis of the high resolution signal to minimize distortion. Hence major filter and analog to digital encode system problems such as pre-echo, transient ringing, group delay anomalies, missing code errors, alias distortion and beats are greatly reduced or eliminated.

NOTE 4: Recent research has shown, however, that humans use transient information in sounds with frequencies much higher than [Nyquist] to determine the direction from which the sound has come, and that eliminating those very high frequency components impairs one's ability to locate the source of the sound. The inner ear actually has nerve receptors for frequencies up to about 80 kiloHertz. Therefore, if the "brick wall" low-pass filter, which is a necessary part of all digital recording, removes frequencies above about 20 kiloHertz in transients, it reduces the level of realism in the sonic image.

NOTE 5: A transient to be reconstructed is identified at the encoder, and its wave shape is matched to one of a number of predetermined "standard" transient shapes, which are known to both the encoder and decoder. A command code identifying the shape is sent through the control channel to the decoder, which regenerates the shape, either by reading it out of a lookup table or algorithmically generating it [this is another place where a lot of computing horsepower would come into play], and scales it to the amplitude of the bandlimited transient arriving in the main signal.
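The shape-matching step in Note 5 can be sketched as a correlation against a shared dictionary; the two shapes and the normalized-correlation matcher below are illustrative assumptions, not the patent's actual shape library:

```python
# Sketch of the Note 5 idea: match an encoder-side transient against a
# small dictionary of "standard" shapes shared with the decoder, send
# only a shape index plus an amplitude, and regenerate on playback.
# SHAPES and the matcher are assumptions for illustration.

SHAPES = {
    0: [0.0, 1.0, 0.5, 0.0],    # sharp attack, fast decay (assumed)
    1: [0.5, 1.0, 1.0, 0.5],    # broader pulse (assumed)
}

def match_transient(x):
    """Pick the stored shape with the best normalized correlation to x."""
    def score(shape):
        num = sum(a * b for a, b in zip(x, shape))
        den = sum(b * b for b in shape) ** 0.5
        return num / den
    best = max(SHAPES, key=lambda k: score(SHAPES[k]))
    peak = max(abs(v) for v in x)
    return best, peak            # what the control channel would carry

def regenerate(index, amplitude):
    """Decoder side: look the shape up and scale it."""
    return [amplitude * v for v in SHAPES[index]]

idx, amp = match_transient([0.0, 0.8, 0.4, 0.0])
rebuilt = regenerate(idx, amp)
```

Sending an index and a scale factor costs a handful of control bits instead of the full bandwidth the transient itself would need, which is how the scheme "borrows" information rate.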

NOTE 6:
  1. It uses ratios of high frequency content to total amplitude along with detected isolated transients to select filter programs for the decimation filter.
  2. It measures the average signal level of the broad middle frequency spectrum and uses the results to control the gain of the low level compressor. It also generates reproducer control codes to correctly complement the encode gain structure.
  3. It measures the average level of low level high frequency signals, and invokes dynamic dither insertion of extra high frequencies when appropriate.
  4. It analyzes the distribution of peak amplitudes to determine if the incoming signal has been limited prior to the encoder. If so, it can raise the threshold of the encoder's soft limit function, or turn it off altogether.
  5. It can compare the decimated signal to the oversampled one delayed to match the decimation to look for isolated bursts of high frequency information which represent transients which would not fit within the normal 22 kiloHertz bandwidth. These difference signals can be sent to the reproducer in the control channel, spread in time, so that the reproducer can correct the transient on playback.
  6. It can also use the transient analysis to control slew rate limiting of the main signal as an alternate approach to increasing the apparent bandwidth of the system.
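Item 1 of Note 6, the ratio of high-frequency content to total amplitude, can be approximated very simply; the first-difference HF estimate and the 0.5 threshold below are assumptions, not the patent's measurement:

```python
# Rough sketch of Note 6, item 1: estimate the ratio of high-frequency
# content to total amplitude and use it to select a decimation-filter
# program. The first-difference HF estimate and the 0.5 threshold are
# illustrative assumptions.

def hf_ratio(samples):
    """First differences approximate HF energy; ratio is HF / total."""
    total = sum(abs(s) for s in samples)
    hf = sum(abs(b - a) for a, b in zip(samples, samples[1:]))
    return hf / total if total else 0.0

def pick_filter(samples, threshold=0.5):
    """Choose a transient-optimized filter when HF content dominates."""
    return "transient" if hf_ratio(samples) > threshold else "steady_state"

smooth = [0.0, 0.1, 0.2, 0.3, 0.4]    # slowly varying -> steady_state
spiky = [0.0, 1.0, -1.0, 1.0, -1.0]   # alternating -> transient
```

This kind of cheap signal statistic is what lets the encoder toggle between complementary interpolation filters without interrupting the audio.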

PS: The detailed descriptions of the actual circuit designs are beyond my understanding, but appear phenomenally complex and equally fascinating.

Patent

Peter
Peter,

Do you have any HDCDs?

I have several and it's clearly a jump in sonic performance.

Keith Johnson, the man behind the technology, is a very smart guy.

I'm sure you know of him given your equipment.

Gordon

Gordon, I very much agree, and I recently said so in a couple of posts last weekend pertaining to some changes I'd made in my system, as well as the fact that all of my listening last Sunday was via HDCD encoded discs.

I believe that Keith Johnson does indeed utilize the technology in all of his recordings, but I didn't think he had anything to do with its development. It was originally developed by Pacific Microsonics and (believe it or not) is now owned by Microsoft...:eek:
 
Co-founded with Michael "Pflash" Pflaumer and Michael Ritter, IIRC. These days, Johnson is the guy behind HRx high-res recordings, allegedly mixed with the aid of a PM Model 2. Pflaumer and Ritter are now principals at Berkeley Audio.
 
Yes I have tons of HDCDs from RR and they offer truly spectacular sound through the current equipment (also partly due to the highly optimized recording and transfer process), and Messrs Johnson and Pflaumer are the authors of the patent (as updated in the 1999 version I referenced). My understanding is that HDCD was developed out of KOJ's research into digital distortions.

A good bio can be found here, if you haven't seen it before. It should also be noted that HRx recordings are also HDCD encoded, albeit already being 24-bit/176.4kHz by nature.
 
Peter, have you given any of the HRx recordings a spin via your BADA?

/Ken
 
Hi Ken, no I have not yet. I only have the demo HRx disc that comes with the Berkeley and I've been meaning to borrow a music server, but too much hardware to carry around - computer, monitor, cables...
 