So a big source of artifacts in audio codecs is pre-echo: quantization noise gets spread across the whole transform block in the time domain, so it leaks out ahead of a sharp attack where you can hear it.
Codecs usually deal with this by detecting sharp transients in the signal and switching to a smaller transform size, which increases time resolution at the cost of frequency resolution.
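
To make that concrete, here's a rough sketch of pre-echo and the block-switching fix. Everything in it is an assumption for illustration: it uses a plain non-overlapping DCT instead of the windowed MDCT real codecs use, a uniform quantizer with a made-up step size, and a crude energy-ratio transient detector. The function names (`code_frame`, `has_transient`, `pre_echo_energy`) are hypothetical, not from any codec.

```python
import math

def dct(x):
    """Unnormalised DCT-II of a list of samples."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(c)
    return [(c[0] / 2 + sum(c[k] * math.cos(math.pi / n * (t + 0.5) * k)
                            for k in range(1, n))) * 2 / n
            for t in range(n)]

def code_frame(frame, block_size, step=0.5):
    """Transform, quantise and reconstruct a frame in blocks of block_size."""
    out = []
    for start in range(0, len(frame), block_size):
        coeffs = dct(frame[start:start + block_size])
        quantised = [round(c / step) * step for c in coeffs]
        out.extend(idct(quantised))
    return out

def has_transient(frame, sub_blocks=4, ratio=8.0):
    """Crude detector: flag a big energy jump between adjacent sub-blocks."""
    size = len(frame) // sub_blocks
    energy = [sum(s * s for s in frame[i * size:(i + 1) * size]) + 1e-9
              for i in range(sub_blocks)]
    return any(cur / prev > ratio for prev, cur in zip(energy, energy[1:]))

def pre_echo_energy(decoded, onset):
    """Energy the codec produced before the onset, where the input was silent."""
    return sum(s * s for s in decoded[:onset])

# 64-sample frame: silence, then a click at sample 48.
ONSET = 48
frame = [0.0] * 64
frame[ONSET] = 1.0

# One long transform smears quantisation noise back over the silence.
long_energy = pre_echo_energy(code_frame(frame, 64), ONSET)
# Four short transforms keep the noise inside the block holding the click.
short_energy = pre_echo_energy(code_frame(frame, 16), ONSET)
# An encoder would pick the block size from the detector's decision.
block_size = 16 if has_transient(frame) else 64
```

With the long transform the decoded signal has noise in the silent part before the click. With the short transforms the first three blocks are all zeros, so they decode to exact silence and the noise stays in the block that contains the click.
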
Then there's Opus, where you can alter the time and frequency resolution per frequency band, on top of having traditional transient frames. That's hard to search in an encoder because of coefficient leakage, so you have to use heuristics.
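
As a toy illustration of what such a heuristic could look like (this is not Opus's actual encoder code, and the names `haar_step` and `choose_tf` are made up): for each band, try a Haar-style butterfly that trades one kind of resolution for the other, and keep whichever version has the smaller L1 norm, on the assumption that a sparser band is cheaper to code. The choice here is greedy and made per band in isolation, so it ignores any interaction between bands.

```python
import math

def haar_step(band):
    """One orthonormal Haar butterfly over adjacent coefficient pairs."""
    out = []
    for a, b in zip(band[0::2], band[1::2]):
        out.append((a + b) / math.sqrt(2))
        out.append((a - b) / math.sqrt(2))
    return out

def l1(band):
    """L1 norm, used here as a rough proxy for coding cost (an assumption)."""
    return sum(abs(c) for c in band)

def choose_tf(bands):
    """Per band, True when the Haar-transformed version has the lower L1 norm."""
    return [l1(haar_step(band)) < l1(band) for band in bands]

bands = [
    [1.0, 1.0, 1.0, 1.0],  # energy spread evenly across the band
    [1.0, 0.0, 0.0, 0.0],  # energy already concentrated in one coefficient
]
decisions = choose_tf(bands)
```

The evenly spread band gets sparser after the butterfly, so it switches resolution. The already concentrated band gets smeared by it, so it stays as it is.
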

