Deep Compression for Improved Inference Efficiency, Mobile Applications, and Regularization
As technology cozies up to the physical limits of Moore’s law, computation is increasingly limited by heat dissipation rather than by the number of transistors that can be packed onto a given area of silicon. Modern chips already routinely idle whole sections of their area, so-called “dark silicon”: design constraints limit the proportion of the chip that can be powered on for extended periods before thermal design limits are dangerously exceeded. An important metric for any machine learning accelerator, including tried-and-true general-purpose graphics processing units (GPUs), is therefore the training or inference performance the device can deliver within a fixed power budget of roughly 250 W.
It’s this drive for efficiency that prompted Google to develop their own tensor processing unit (TPU) in-house to make model inference, and from the v2 unit onward training as well, more efficient in their data centers. The last few years have also seen a plethora of deep learning hardware startups seeking to build energy-efficient chips specialized for the limited mix of operations required by currently fashionable deep learning architectures. If deep learning mostly consists of matrix multiplies and convolutions, why not build a chip that does just that, trading general computing capability for specialized efficiency? But there’s more to this tradeoff than meets the eye.
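To make the “mostly matrix multiplies and convolutions” claim concrete, here is a minimal NumPy sketch (not from the original article; the helper names im2col and conv2d_as_matmul are illustrative) showing the standard im2col trick: a 2D convolution lowered to a single matrix multiply, which is essentially the observation that matmul-centric accelerators exploit.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a matrix whose columns are the C*kh*kw patches."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

def conv2d_as_matmul(x, weights):
    """Valid (no padding, stride 1) convolution expressed as one matrix multiply."""
    out_c, in_c, kh, kw = weights.shape
    cols = im2col(x, kh, kw)            # (in_c*kh*kw, out_h*out_w)
    w_mat = weights.reshape(out_c, -1)  # (out_c, in_c*kh*kw)
    out = w_mat @ cols                  # the heavy lifting is a single GEMM
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return out.reshape(out_c, out_h, out_w)

# Quick check against a direct (naive) convolution.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
w = rng.standard_normal((4, 3, 3, 3))
ref = np.zeros((4, 6, 6))
for o in range(4):
    for i in range(6):
        for j in range(6):
            ref[o, i, j] = np.sum(w[o] * x[:, i:i + 3, j:j + 3])
assert np.allclose(conv2d_as_matmul(x, w), ref)
```

Because both the convolutional and fully connected layers of a network end up as matrix products like the one above, hardware that only accelerates dense matrix multiplication can still cover the bulk of the workload, which is the bet these specialized chips make.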