At the Rambus 2005 conference, Sony gave a presentation on the design decisions behind Cell. RAMBUS Conference SCE article

What’s most interesting to me is the analysis of program data access patterns. Their research indicated that programs read and write about 128K of local data and read virtually unlimited global data. The more optimised the code, the more bandwidth is used (which is fairly obvious when you think about it…) and apps use between 8MB and 4GB of RAM but rarely hit the same memory twice (so caches don’t help much). The competition between Pollack’s and Amdahl’s Laws suggests that 8 or 9 threads fits between general and specialist processors.
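To get a feel for how those two laws pull against each other, here is a toy model of my own (not Sony's analysis). It assumes a fixed silicon budget split evenly across identical cores, takes Pollack's rule as "single-core performance scales with the square root of core area", and applies Amdahl's Law on top. The function names, the symmetric-core layout and the parallel fractions tried are all assumptions for illustration; real Cell is asymmetric (one PPU plus eight SPUs).

```python
import math

def relative_throughput(n_cores, parallel_fraction, area_budget=1.0):
    """Toy model: throughput of n identical cores sharing a fixed area budget."""
    # Pollack's rule (assumed form): per-core performance ~ sqrt(core area).
    per_core = math.sqrt(area_budget / n_cores)
    # Amdahl's Law: speedup of n cores over one such core.
    speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)
    return per_core * speedup

def best_core_count(parallel_fraction, max_cores=64):
    """Brute-force the core count that maximises the toy throughput."""
    return max(range(1, max_cores + 1),
               key=lambda n: relative_throughput(n, parallel_fraction))

if __name__ == "__main__":
    # The parallel fractions here are hypothetical workloads, not measured data.
    for p in (0.5, 0.8, 0.9, 0.95):
        print(f"parallel fraction {p}: best core count {best_core_count(p)}")
```

In this simplified model the optimum works out to p / (1 - p) cores, so a workload that is 90% parallel lands on 9. That is only suggestive, since the real trade-off depends on the workloads actually measured.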

This provides the basis for the 256K LS (local store) on each of the 8 SPUs, and for the reasoning that since you could never have enough L2 cache, 512K is enough. But the primary question to ask of that research is: what data was tested to come to these conclusions?

I’m of the personal opinion that they did choose a good set of data and programs to model and that Cell will perform well in lots of situations.

It’s certainly not going to be a walk in the park to get maximum performance. With 9 hardware threads and an extremely complex memory architecture, it’s going to stress the talents of programmers the world over. I’ll be surprised if first-generation titles get anywhere near the full potential of Cell; it’s going to take a major re-education among the many programmers who have forgotten or never learnt the more old-school way of programming with the hardware in mind.

Many programmers these days only think about the higher-level coding, while ignoring the actual hardware it’s running on. With an architecture like Cell (or for that matter Xe360) that’s just not possible; it’s not only the in-order processing or lack of branch prediction but also memory locality. Even ignoring the very hard 256KB limit of each SPU, the distance between main memory and L1 (and registers) on the PPU is enough that any algorithm that doesn’t optimise to remove unnecessary memory accesses will perform badly.
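As a rough illustration of why locality matters so much, here is a small sketch that counts misses in a simulated direct-mapped cache while walking the same 2D array in two different orders. The cache size, line size, array shape and the `count_misses` helper are all assumptions picked for the example; it is a simplified model, not a description of the actual PPU cache.

```python
def count_misses(addresses, cache_bytes=32 * 1024, line_bytes=128):
    """Count misses for a stream of byte addresses in a direct-mapped cache model."""
    n_lines = cache_bytes // line_bytes
    tags = [None] * n_lines          # which memory line each cache slot holds
    misses = 0
    for addr in addresses:
        line = addr // line_bytes    # memory line this address belongs to
        slot = line % n_lines        # the only slot that line may occupy
        if tags[slot] != line:
            tags[slot] = line        # evict whatever was there and fill
            misses += 1
    return misses

# Hypothetical 256 x 256 array of 4-byte elements (256 KiB), stored row by row.
ROWS, COLS, ELEM = 256, 256, 4

# Same elements, two traversal orders: along rows vs. down columns.
row_major = [(r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS)]
col_major = [(r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS)]

if __name__ == "__main__":
    print("row-major misses:   ", count_misses(row_major))
    print("column-major misses:", count_misses(col_major))
```

With these assumed parameters, the row-order walk misses once per 128-byte line, while the column-order walk misses on every single access, even though both touch exactly the same data. The algorithm is identical; only the order of the memory accesses changed.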

Any algorithm that ignores the fact that memory accesses are bad (mm-kay) is toast on next-gen consoles. But then that’s not exactly news; Knuth came to the same conclusion back in the 1960s (though not about consoles, obviously…).