Tonal components are sine-shaped signals that get added after IMDCT on decoding.
Tonal components have their own stereo processing, which is independent of the stereo processing of the residual spectrum. For this reason, the serialized tone data begins with a header that is not replicated per channel. The first bit in the tone info header selects the dynamic range. In high-dynamic-range (HDR) mode, the level of each tone is chosen on a 64-step exponential amplitude scale. In low-dynamic-range (LDR) mode, only an overall level is chosen on the exponential scale, and each individual tone has its level chosen on a 16-level linear scale.
The number of bands with encoded tones is independent of the number of bands with residual spectral data and is given after the dynamic-range bit. For stereo streams, per-band flags follow: whether the tones in this band are shared between channels, whether the master is the left or the right channel, and whether the right channel should receive a 180° phase shift.
Finally, for each channel the tone info is stored. For the master channel, tone info for all bands is stored while for the slave channel, tone info for the shared bands is missing, as it was already encoded for the master channel. Each band with at least one tone has one common optional start and optional end time for all tones in that band, while each individual tone has pitch, level and phase info.
For the master channel, start/end info is always directly encoded: the start time and the end time are each encoded as one bit indicating whether there is a start or end time, followed by 5 bits for the start/end time. On the slave channel, one mode bit indicates whether the start/end info is stored for the slave just as it is for the master channel, or whether the master channel info should be copied to the slave channel.
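As an illustration, the master-channel start/end encoding can be sketched as below. This is a minimal sketch, not a reference decoder: the `BitReader` helper, the MSB-first bit order, and the function names are assumptions made for the example, and the 5-bit time is assumed to be present only when its flag bit is set.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_optional_time(reader: BitReader):
    """One presence bit; if set, 5 bits of time follow (assumed layout)."""
    if reader.read(1):
        return reader.read(5)
    return None  # no start/end time stored


def read_master_envelope(reader: BitReader):
    """Return (start, end) for one band of the master channel."""
    start = read_optional_time(reader)
    end = read_optional_time(reader)
    return start, end
```

For example, `read_master_envelope(BitReader(bytes([0x8C])))` reads the bits `1 00011 0`, giving a start time of 3 and no end time.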
For each band, the count of tones in this band is encoded in one out of two modes for the master channel, or one out of four modes for the slave channel. The sum of all tone counts (shared master/slave bands have their tones counted only once) should not be bigger than 48.
For each band with tones, a four-bit number tells the number of tones in that band.
For each band with tones, a VLC symbol is stored giving the number of tones in that band (maximum is 7 in this case).
For each band with tones, a VLC symbol is stored giving the difference in tone count between the master and the slave channel. The difference is a signed 3-bit number, while the tone count is 4 bits; wraparound occurs after 15.
No data is stored in this case.
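The difference mode for tone counts can be sketched as follows. The VLC table itself is not described here, so the sketch starts from an already-decoded 3-bit symbol; the two's-complement interpretation of that symbol and the function names are assumptions.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def slave_tone_count(master_count: int, raw_diff: int) -> int:
    """Apply a signed 3-bit difference to a 4-bit master tone count."""
    diff = sign_extend(raw_diff, 3)       # range -4..3
    return (master_count + diff) & 0xF    # wraps around after 15
```

With these assumptions, a master count of 15 and a difference of +1 wraps to 0, and a master count of 0 with a difference of -1 wraps to 15.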
For each tone, the pitch is stored using either near-plain encoding or difference-to-master encoding (the latter is available only on the slave channel).
If there is more than one tone in the band, a bit flag tells whether the tones are stored in ascending or descending pitch order (for a single tone, the order does not matter). Tone pitches are numbers between 0 and 1023; duplicates are encodable (i.e. the same pitch twice in succession). The first tone is always encoded using ten bits. In ascending mode, each further tone is encoded with a bit count that depends on the pitch of the previous tone. If the previous tone was below 512, 10 bits are used. If it was at or above 512 but below 768, 9 bits are used and 512 is added to the number to obtain the pitch, and so on, down to 2 bits if the previous tone was between 1020 (inclusive) and 1022 (exclusive), where 1020 is added to obtain the pitch, and finally 1 bit with a base of 1022 if the previous tone was 1022 or 1023. Following the pattern, a zero-bit encoding would be expected if the previous tone was 1023; this is not the case.
In descending mode, a tone is encoded with 10 bits if the previous tone has a pitch at or above 512, with 9 bits if the previous tone has a pitch at or above 256, and so on, down to a 1-bit encoding if the previous tone had a pitch of one or zero. The pitches are then reversed (to re-obtain an increasing order) before they are stored into the tones. This means that even in descending storage order, the tone with the lowest index is the one with the lowest pitch.
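The near-plain pitch encoding described above can be sketched as follows. Several details are assumptions for the sake of a runnable example: the order flag is taken to be 1 for descending, the base in ascending mode is computed as 1024 minus 2 to the power of the bit count (which reproduces the bases 512 and 1020 given in the text), and `make_reader` is a hypothetical helper that reads from a string of `'0'`/`'1'` characters.

```python
def bits_after_ascending(prev: int) -> int:
    """10 bits below 512, 9 for 512..767, ..., 2 for 1020..1021, 1 for 1022..1023."""
    return max(1, (1023 - prev).bit_length())


def bits_after_descending(prev: int) -> int:
    """10 bits for prev >= 512, 9 for prev >= 256, ..., 1 for prev of 1 or 0."""
    return max(1, prev.bit_length())


def decode_pitches(read_bits, count: int) -> list:
    """Decode `count` pitches of one band; the result is in ascending order."""
    if count == 0:
        return []
    # The order flag is only present for more than one tone (1 = descending, assumed).
    descending = read_bits(1) if count > 1 else 0
    pitches = [read_bits(10)]  # the first tone always uses ten bits
    for _ in range(count - 1):
        prev = pitches[-1]
        if descending:
            pitches.append(read_bits(bits_after_descending(prev)))
        else:
            nbits = bits_after_ascending(prev)
            base = 1024 - (1 << nbits)  # 0 for 10 bits, 512 for 9 bits, ...
            pitches.append(base + read_bits(nbits))
    if descending:
        pitches.reverse()  # lowest index ends up with the lowest pitch
    return pitches


def make_reader(bitstring: str):
    """Return read_bits(n) over an MSB-first string of '0'/'1' (hypothetical helper)."""
    pos = 0

    def read_bits(n: int) -> int:
        nonlocal pos
        chunk = bitstring[pos:pos + n]
        pos += n
        return int(chunk, 2)

    return read_bits
```

For instance, in ascending mode a first pitch of 600 means the next tone takes 9 bits with a base of 512, so the stored value 188 decodes to a pitch of 700.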
In each band, the slave pitches of all tones are encoded as difference to the master pitch with the same tone index in that band (if present) or the master tone with the highest index that is present, or, if there are no master tones at all, as difference to 0. Difference application wraps at 1024.
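The choice of reference tone in difference-to-master mode can be sketched as below. How the differences themselves are stored in the bitstream is not covered here, so the sketch takes them as already-decoded signed integers; the function name is hypothetical.

```python
def apply_pitch_differences(master_pitches, differences):
    """Reconstruct slave pitches of one band from differences to master pitches."""
    slave = []
    for i, diff in enumerate(differences):
        if i < len(master_pitches):
            ref = master_pitches[i]    # master tone with the same index
        elif master_pitches:
            ref = master_pitches[-1]   # highest-index master tone that exists
        else:
            ref = 0                    # no master tones in this band
        slave.append((ref + diff) % 1024)  # difference application wraps at 1024
    return slave
```

With master pitches `[100, 200]` and differences `[5, -10, 3]`, the third slave tone has no counterpart and is decoded relative to the last master tone, giving `[105, 190, 203]`.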
For the compression of level information in the slave channel, tones in the slave channel are linked to tones in the master channel based on their pitch. The linking algorithm is like this:
In high-dynamic-range mode, the level of each tone is stored on a logarithmic scale. The value, between 0 and 63, can be encoded in one of four modes; the last two are available only on the slave channel (as they refer to linked tones).
The level of each tone in each band is encoded as a plain 6-bit number.
The level of each tone in each band is stored using a variable-length code. The possible level values range between 20 and 51 in this case.
The level of each tone in each band is stored as variable-length-encoded difference to the level of the linked tone in the master channel, which is assumed as 34 if there is no linked tone. The difference application does not wrap.
The level of each tone that is linked to a master tone is copied from the master tone level; unlinked tones get a default level of 32.
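The two slave-only HDR modes can be sketched as follows. The representation of links as a list holding a master tone index or `None` per slave tone is an assumption made for this example, as are the function names; the differences are taken as already-decoded signed integers.

```python
def hdr_slave_levels_from_differences(master_levels, links, diffs):
    """links[i] is the index of the linked master tone, or None if unlinked."""
    levels = []
    for link, diff in zip(links, diffs):
        ref = master_levels[link] if link is not None else 34  # 34 if unlinked
        levels.append(ref + diff)  # difference application does not wrap
    return levels


def hdr_slave_levels_copied(master_levels, links):
    """Copy the linked master level; unlinked tones default to 32."""
    return [master_levels[link] if link is not None else 32 for link in links]
```

Note that the default for unlinked tones differs between the two modes (34 as the reference for differences, 32 when copying).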
In low-dynamic-range mode, the base level of all tones in a band is described by one common value on a logarithmic scale, with each individual tone having an additional, linearly scaled level. This allows more fine-grained level control if all tones in a band have approximately the same level. The encoding modes are similar, but not equivalent, to those of the level info in HDR mode:
The base level of each band is encoded as a plain 6-bit number.
The base level of each band is stored using a variable-length code. The possible level values range between 24 and 55 in this case.
The base level of each band is stored as a variable-length-encoded difference to the base level of the corresponding band in the master channel, which is assumed to be 44 if the master channel has no tones. Difference application does not wrap.
The base level of each band is copied from the corresponding master base level; bands without tones in the master channel get a base level of 49.
The individual linearly scaled tone levels (on a scale between 0 and 15) in LDR mode are stored similarly to the logarithmic tone levels in HDR mode:
The level of each tone is encoded as a plain 4-bit number.
The level of each tone is encoded using a variable-length code. Bands with just one tone use a different code than other bands, probably because the coarse scaling using the logarithmically coded base level can be adjusted more precisely to match the tone level.
The level of each tone in each band is stored as a variable-length-encoded difference to the level of the linked tone in the master channel, which is assumed to be 12 if there is no linked tone. Difference application wraps around.
The level of each tone that is linked to a master tone is copied from the master tone level; unlinked tones get a default level of 14.
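In contrast to the HDR levels, the LDR per-tone difference wraps around. A minimal sketch for a single tone, assuming the wrap happens at 16 (the size of the 0 to 15 scale) and that the difference is an already-decoded signed integer; the function names are hypothetical:

```python
def ldr_tone_level_from_difference(diff, linked_master_level=None):
    """Apply a difference to the linked master level (12 if there is no link)."""
    ref = 12 if linked_master_level is None else linked_master_level
    return (ref + diff) % 16  # wraps around on the 0..15 scale (assumed modulus)


def ldr_tone_level_copied(linked_master_level=None):
    """Copy the linked master level; unlinked tones default to 14."""
    return 14 if linked_master_level is None else linked_master_level
```

Under these assumptions, a linked master level of 15 with a difference of +3 yields 2.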
The phase is encoded as a plain 5-bit number for each tone.