joonho hwangbo

Research on Domain-Specific Architectures: are they really worth it in academia?

If you are working on something even tangentially related to VLSI or computer architecture, you have probably seen an article starting with “Due to the end of Moore’s Law…”. And yes, Moore’s Law has ended, and we as computer scientists need to come up with innovative ways to squeeze out performance. For the past 10 years, a lot of this was achieved through specialization. For computer architects, this boiled down to building accelerators. I think it is a good time to reflect on what we have achieved, and what we should work on going forward.

The pros of building accelerators for research.

As you have already noticed in recent ISCA, MICRO, and ASPLOS proceedings, nearly one third of the papers are “oh, we built another accelerator” (a few years ago, I’m pretty sure the ratio was closer to half of the entire proceedings). Aside from the fact that we need specialized hardware to gain performance, I personally think this is because the process of building accelerators is trivial (someone at ISCA 2023 asked Kunle this question: “Do you recommend students pursue low-hanging fruit like building accelerators, or do you think they have to come up with something more substantial?”, and his answer was “sometimes, but not really”). The process looks somewhat like this: you find an application and come up with some reason why it deserves an accelerator, then you do some profiling to find common patterns. Finally, you build your thing and you are done. Hence, as long as you have a good story and sufficient evaluation results, you will likely get a paper out at one of the top conference venues. It is very tempting, to be honest.
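The “profile to find common patterns” step above can be sketched in a few lines. This is a minimal illustration using Python’s built-in cProfile; the `workload` function is a hypothetical stand-in for whatever application you are targeting, not anything from a real accelerator project.

```python
# Profile a stand-in workload to find its hotspot -- the kind of
# evidence an accelerator paper uses to justify the design.
import cProfile
import io
import pstats

def workload():
    # Hypothetical kernel: repeated dot products, the sort of hotspot
    # that would motivate a matrix-multiply accelerator.
    a = [[i * j for j in range(64)] for i in range(64)]
    b = [[i + j for j in range(64)] for i in range(64)]
    total = 0
    for row in a:
        for col in zip(*b):
            total += sum(x * y for x, y in zip(row, col))
    return total

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Dump the top functions by cumulative time; in a real study you would
# look for the few functions that dominate the runtime.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue().splitlines()[0])
```

If one or two functions dominate the cumulative time, that is the “common pattern” the accelerator gets built around.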

The cons of building accelerators for research.

The biggest downside of building accelerators is that, most of the time, people just don’t care what you did. This is partly because the knowledge you obtain about a domain rarely transfers to another domain: expertise in ML, for instance, won’t help you build accelerators for bioinformatics. This naturally leads to indifference from people working on other domains.

Another downside is that it is very hard to answer a simple but critical question for researchers: what’s next? I think the paper “The Accelerator Wall: Limits of Chip Specialization” shows this very well. It breaks down performance gains between technology scaling and algorithmic optimizations, and claims that as long as we are tied to CMOS technology, there is a limit to what we can get by specializing hardware. The same pattern shows up across series of accelerator papers. The first accelerator paper claims a 10x speedup over the baseline, the second claims something like 2x or 3x over the first iteration, and the third claims 1.2x to 1.5x over the second. Then the fourth paper is never published because the domain has been done to death (you can try searching for the bunch of GCN accelerator papers published around 2021 if you don’t believe me).
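The diminishing-returns pattern above is easy to see with a quick back-of-the-envelope calculation. The per-generation speedups here are just the illustrative numbers from the paragraph (using midpoints of the stated ranges), not measurements from any real paper series:

```python
# Compound speedup across hypothetical accelerator-paper generations.
gen_speedups = [10.0, 2.5, 1.35]  # paper 1, 2, 3

cumulative = 1.0
for i, s in enumerate(gen_speedups, start=1):
    cumulative *= s
    # Each generation's marginal gain shrinks fast, even though the
    # cumulative number over the original baseline still looks large.
    print(f"paper {i}: {s:.2f}x over previous, {cumulative:.2f}x over baseline")
```

Each follow-up paper multiplies the cumulative total by less and less, which is exactly why the fourth paper in the series has so little headroom left to claim.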

Lastly, if a domain is highly impactful, it is likely that industry has already built the accelerator, which leaves little room for people in academia. ML is a great example. It is very unlikely that the ML accelerator you built will beat Google’s TPU. Even if your accelerator is well suited to certain models, it is probably way too specialized to be deployed anywhere.

Well, some accelerator papers are impactful, aren’t they?

You may argue that some accelerator papers have had very high impact, like “Eyeriss”, the “TPU”, or “Protobuf”. I strongly agree that these papers are impactful and interesting as well. However, I still stand by my point that these are only a handful out of hundreds of accelerator papers. So what differentiates an interesting accelerator paper from a non-interesting one? First and foremost is the domain. Both “Eyeriss” and the “TPU” received a ton of attention because ML was the hot topic at the time (and still is today). “Protobuf” also received a ton of attention because it is tightly coupled to datacenter efficiency, and I probably don’t need to emphasize how popular datacenter computing is nowadays. Second, a high-fidelity evaluation is crucial as well. Unlike traditional microarchitecture research, where we build our evaluations on top of a simulator that has been correlated against real hardware, accelerator simulations are hardly ever validated. Hence, nobody really believes your 20x speedup results if they come from some blob of C++ code that spits out numbers. All of the papers above actually wrote RTL to perform their evaluations.

Well then, what should we do?

I think there are three possible directions here. First, find a domain that is impactful but not yet well explored. In my opinion, datacenter accelerators are a good area: people have only recently started investigating this direction, and it has high impact. I won’t name the domains that I think are less appealing, but they are, in my opinion, either less impactful or already well understood (I’d be happy to discuss what those domains are if you contact me). Second, accelerator integration is another possible area. Integration techniques can be agnostic to the specialized hardware itself, which makes them transferable to a wide range of different accelerators. Programming APIs for accelerators, accelerator virtualization, decoupling accelerator execution from a general-purpose core, and accelerator chaining are all possible areas of research. Last but not least, research on the process of building accelerators itself is a viable direction. As I mentioned above, there are a bunch of common steps involved in building accelerators, and making each step agile and efficient is applicable to every possible domain we can think of.
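To make the integration direction concrete, here is a sketch of what a hardware-agnostic accelerator programming API with kernel chaining might look like. Every name here is invented for illustration; a real implementation would dispatch to a driver, MMIO region, or RPC endpoint rather than to Python lambdas, and a decoupled design would return handles instead of blocking.

```python
# Hypothetical hardware-agnostic accelerator front-end: the same
# launch/chain interface could sit in front of many different back-ends.
from typing import Callable

class Accelerator:
    def __init__(self, kernels: dict[str, Callable]):
        # Back-end "kernels"; here they are software stand-ins.
        self.kernels = kernels

    def launch(self, kernel: str, *args):
        # A decoupled design would enqueue the task and return a handle;
        # this sketch executes synchronously for simplicity.
        return self.kernels[kernel](*args)

    def chain(self, stages: list[tuple[str, tuple]], data):
        # Accelerator chaining: feed one kernel's output into the next
        # without bouncing back to the host between stages.
        for kernel, extra in stages:
            data = self.launch(kernel, data, *extra)
        return data

acc = Accelerator({
    "scale": lambda xs, k: [x * k for x in xs],
    "clip": lambda xs, hi: [min(x, hi) for x in xs],
})
out = acc.chain([("scale", (10,)), ("clip", (25,))], [1, 2, 3])
print(out)  # [10, 20, 25]
```

The point of a design like this is that the front-end interface stays fixed while the back-ends vary, which is what makes integration research transferable across accelerator domains.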