STCO is now an option in large part because advanced packaging, such as 3D integration, allows high-bandwidth connections between chiplets (small, functional chips) inside a single package. This means that functions that once shared a single chip can be disaggregated onto dedicated chiplets, each made using the semiconductor process technology best suited to it. For example, Kelleher points out in her plenary that high-performance computing demands a large amount of cache memory per processor core, but chipmakers' ability to shrink SRAM is not keeping pace with the scaling down of logic. So it makes sense to build SRAM caches and compute cores as separate chiplets using different process technologies and then stitch them together using 3D integration. A key example of STCO in action, says Kelleher, is the Ponte Vecchio processor at the heart of the Aurora supercomputer. It's composed of 47 active chiplets (as well as 8 blanks for thermal conduction). These are stitched together using both advanced horizontal connections (2.5D packaging tech) and 3D stacking. “It brings together silicon from different fabs and enables them to come together so that the system is able to perform against the workload that it’s designed for,” she says.