Abstract:
Even though deep neural networks (DNNs) were first proposed around the 1960s, rapid progress in related research only began around 2012. This was due to the availability of large public datasets, cheap compute that could efficiently run these data-driven algorithms, the rise of open-source ML platforms, and the resulting spread of open-source code and models. In addition, DNN research has attracted a lot of funding and is of high commercial interest. All of these factors have contributed to a high volume of research papers; in the area of DNN sparsity/pruning, for example, roughly one paper is published on arXiv every couple of days, and the rate is growing exponentially.
Pruning refers to training a network that is larger than necessary and then removing the parts that are not needed during inference, so that fewer resources are required to store the trained network and less compute to execute it. Even in the early days, researchers observed that large neural networks converge more easily during training and used this as an experimental heuristic. The published literature on ‘pruning’ shows many ways to identify these unneeded parts and to remove them before, during, or after training. It even turns out that not all kinds of pruning actually allow neural networks to be accelerated, which is supposed to be the whole point of pruning.
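As an illustration of the basic idea only (the report does not prescribe any particular method), a minimal sketch of one common criterion, magnitude pruning, is given below; the function name magnitude_prune and the NumPy-based setting are assumptions made purely for this example.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that roughly `sparsity`
    (a fraction in [0, 1)) of the weights are removed."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value; everything at or
    # below it is treated as "not needed" for inference.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune 75% of a random 4x4 weight matrix.
w = np.random.default_rng(0).normal(size=(4, 4))
w_pruned = magnitude_prune(w, sparsity=0.75)
print(float(np.mean(w_pruned == 0)))  # approximately 0.75
```

In practice such a mask would be applied to the layers of a trained model and usually followed by fine-tuning, but the basic mechanism of removing low-magnitude weights is the same.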
Moreover, because these research areas are quite new, rapidly developing, and based mostly on experimental methods, there is some concern in the research community about the quality of published research. The purpose of this report is to consider research conducted in deep learning in general, and in sparsity/pruning of neural networks in particular, from the viewpoint of diverse stakeholders in the research community, with regard to the status of published research, empirical rigor and the reporting of results, and some technical issues related to efficient deployment.