
First, he refers to specific hardware blocks, then segues into discussing queue depths. AMD has always said that it could add more Asynchronous Compute Engine (ACE) blocks to this structure to enable a greater degree of parallelization, but I think Cerny mixed his apples and oranges, possibly on purpose: the number of hardware blocks and the depth of their queues are two different things. Here, you can see the Asynchronous Compute Engines and the GPU Command Processor.
