The bytes of an ATRAC3+ frame make up a bitstream. The most significant bit of each input byte is consumed first. Multi-bit fields follow the same rule: the first bit read is the most significant bit of the field.
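The MSB-first bit order described above can be illustrated with a minimal sketch. The class name `BitReader` and its `read` method are hypothetical helpers chosen for this example, not part of any ATRAC3+ reference code, and the loop reads one bit at a time for clarity rather than speed.

```python
class BitReader:
    """Hypothetical helper: reads bits MSB-first from a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # absolute bit position from the start of the data

    def read(self, nbits: int) -> int:
        """Return the next nbits bits as an unsigned integer."""
        if self.pos + nbits > len(self.data) * 8:
            raise EOFError("not enough bits left in frame")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            # Bit 7 of each byte is consumed first, bit 0 last.
            bit = (byte >> (7 - (self.pos & 7))) & 1
            # The first bit read ends up as the field's most significant bit.
            value = (value << 1) | bit
            self.pos += 1
        return value
```

A field may straddle a byte boundary; the reader simply continues with the most significant bit of the next byte.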
The substream data starts with the number of quantization units that are present in this block. For some reason, the total number of quantization units should not be 29, 30 or 31. The maximum number of quantization units, 32, is fine, though. Next, the quantization precision of each quantization unit is encoded. The precision may be 0, meaning that no data for that quantization unit is present. A level is stored for each quantization unit, regardless of whether its precision is zero. For units with non-zero precision, tree-choice info and encoded spectral coefficients are stored. For all dither groups touched by the present quantization units, dither info might be stored.
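The restriction on the number of quantization units can be expressed as a small check. This is only a sketch: the function name is hypothetical, and how the count is actually encoded in the bitstream (field width, any offset) and whether a count of 0 can occur are not stated here, so the check only covers the rule given above.

```python
def is_valid_num_quant_units(n: int) -> bool:
    """Hypothetical check: the count may be at most 32, but not 29, 30 or 31."""
    # Assumption: n is the already-decoded total number of quantization units.
    return 0 <= n <= 32 and n not in (29, 30, 31)
```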
For each band, joint stereo info, MDCT window shape and enveloping info are stored next, followed by the non-MDCT-encoded tonal component data and finally info about extra dithering noise to be added after synthesis.