The NPU is one of the hottest technologies
in the handset industry this year. The Apple A11 Bionic, the Huawei Kirin 970, and
the Image Processing Unit (IPU) inside the Google Pixel 2 are typical examples.
What these phones have in common is that, once equipped with an NPU, they gain
a learning ability. In slightly more technical terms, the NPU rapidly carries
out the complex mathematical calculations that machine learning depends on,
such as dot products and matrix multiplication. What the user sees, of course,
is a faster, smarter mobile experience.
Of course, integrating a separate NPU makes
the phone's circuit design more complicated, which is why only high-end
phones include one. Although machine learning is the only road to intelligent
hardware, some media outlets argue that it does not necessarily require an NPU
built into the phone. ARM announced the Cortex-A75 and Cortex-A55 CPU designs
and the Mali-G72 GPU design earlier this year, all with improved support for
machine learning algorithms, and none of them relies on an NPU.
The NPU (Neural Processing Unit) is not yet
widely known, but it is a hot technology in the chip field. In contrast to
CPUs built on the von Neumann architecture, it uses a disruptive new
architecture called "Data-Driven Parallel Computing." If the von Neumann
architecture is regarded as handling data on a single lane, then Data-Driven
Parallel Computing is a 128-lane highway that can process up to 128 data items
at a time, which makes it easy to handle large volumes of multimedia data such
as video and images.
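The lane analogy above can be made concrete with a small Python sketch. It only counts processing steps and does not model how any real NPU works: the function name `process_in_lanes`, the 1000-item input, and the `brighten` operation are all hypothetical, chosen purely for illustration.

```python
def process_in_lanes(data, op, lanes=128):
    """Apply `op` to every item, counting one step per batch of up to `lanes` items."""
    results = []
    steps = 0
    for start in range(0, len(data), lanes):
        batch = data[start:start + lanes]
        # On parallel hardware the whole batch would be handled in one step;
        # here the batch is only simulated with an ordinary loop.
        results.extend(op(x) for x in batch)
        steps += 1
    return results, steps


pixels = list(range(1000))      # hypothetical input: 1000 data items
brighten = lambda p: p + 10     # hypothetical per-item operation

_, single_lane_steps = process_in_lanes(pixels, brighten, lanes=1)
_, wide_steps = process_in_lanes(pixels, brighten, lanes=128)
print(single_lane_steps, wide_steps)  # prints: 1000 8
```

With one lane, 1000 items take 1000 steps; with 128 lanes they take 8, because the last batch is only partly full. The same per-item operation is applied in both cases, and only the number of items handled per step changes.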
The real performance of the Huawei Mate 10 NPU
The Kirin 970 is Huawei's first artificial
intelligence mobile computing platform. It is also the world's first chip with
an independent artificial intelligence NPU (neural network processing unit),
and it uses the innovative HiAI mobile computing architecture.
Huawei focuses this AI capability mainly on
the camera, which automatically recognizes different subjects such as food,
flowers, text, and animals and optimizes the shot for each. Compared with the
CPU, the Kirin 970's AI computing delivers up to 50 times the energy efficiency
and 25 times the performance. With the help of the NPU, the Kirin 970 also
shoots faster than an ordinary CPU in the same scene. A device equipped with
the Kirin 970 can identify more than 2,000 pictures in one minute, compared
with 487 for the iPhone 7 Plus.
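To put the recognition figures in perspective, the arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope calculation on the two numbers quoted above. It treats "over 2000" as exactly 2000, so the result is a lower bound, and the variable names are illustrative.

```python
def per_second(images_per_minute):
    """Convert an images-per-minute figure to images per second."""
    return images_per_minute / 60


kirin_970_per_minute = 2000      # quoted as "over 2000", so a lower bound
iphone_7_plus_per_minute = 487   # figure quoted in the article

speedup = kirin_970_per_minute / iphone_7_plus_per_minute
print(round(per_second(kirin_970_per_minute), 1))      # prints: 33.3
print(round(per_second(iphone_7_plus_per_minute), 1))  # prints: 8.1
print(round(speedup, 2))                               # prints: 4.11
```

On these figures the Kirin 970 recognizes images at least about four times as fast as the iPhone 7 Plus.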
The Kirin 970 features the innovatively
designed HiAI mobile computing architecture, an energy-efficient
heterogeneous computing architecture that dramatically increases AI computing
power. Its AI performance density significantly outperforms that of CPUs and
GPUs, so AI computing tasks finish faster and with less power consumption, a
design completely different from server-side AI.
Machine learning does not necessarily depend on the NPU
In the hardware of current smartphones, the
NPU can only do math on 8-bit and 16-bit data, not on 32-bit or 64-bit data.
This reduces memory and cache requirements but increases bandwidth
requirements, so the practical result is not ideal. ARM therefore believes
that integrating a more advanced single instruction, multiple data (SIMD)
architecture saves more resources. Its new INT8 instructions combine several
operations into a single instruction, which reduces latency.
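To illustrate what doing the math on 8-bit data means, here is a minimal Python sketch of a dot product computed on quantized integers. It assumes simple symmetric linear quantization, which the article does not describe and which is not claimed to be ARM's or Huawei's actual scheme. The helper names `quantize` and `int8_dot` and the sample values are hypothetical.

```python
def quantize(values, scale):
    """Map floats to signed 8-bit integers: round(v / scale), clamped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(qa, qb):
    """Dot product on the quantized integers, accumulated in a wider integer."""
    return sum(a * b for a, b in zip(qa, qb))


weights = [0.5, -1.0, 0.25]   # hypothetical model weights
inputs = [1.0, 0.5, -0.5]     # hypothetical input activations

# Assumption: one scale per vector, chosen so the largest magnitude maps to 127.
w_scale = max(abs(w) for w in weights) / 127
x_scale = max(abs(x) for x in inputs) / 127

qw = quantize(weights, w_scale)
qx = quantize(inputs, x_scale)

# Rescale the integer result back to the floating-point range.
approx = int8_dot(qw, qx) * w_scale * x_scale
exact = sum(w * x for w, x in zip(weights, inputs))
print(exact, approx)
```

Each quantized value fits in one byte instead of the four bytes of a 32-bit float, at the cost of a small rounding error in the result. A single instruction that multiplies and accumulates several such 8-bit pairs at once is the kind of combination the INT8 improvement refers to.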
In addition, ARM's approach can help reduce
costs. Although no custom SoC based on the A75/A55 has been released yet,
Qualcomm, MediaTek, Samsung, and HiSilicon are expected to take advantage of
these instruction set improvements to develop better SoCs, including mid-range
products.
The key to ARM's ability to drive machine
learning on SoCs is the Compute Library, which includes a full suite of
functions for imaging and vision projects and supports machine learning
frameworks such as Google's TensorFlow. Developers can use it to compile the
version they need.
Besides ARM, Qualcomm has its own Hexagon
SDK, which includes general matrix algorithms for machine learning that make
DSP operation more efficient. The Qualcomm Symphony System Manager SDK
additionally provides APIs specialized for optimizing computer vision, image
and data processing, and low-level algorithms, as well as the common
processing needs of smartphones.
Why does the NPU exist?
Since the NPU is not strictly necessary, and
architecture vendors such as ARM offer other approaches, why do Apple, Huawei,
and Google insist on it? The answer is not simply to add a selling point or
raise hardware prices; in some respects the NPU still has real advantages.
For example, the Kirin 970's FP16
throughput is 1.92 TFLOPS, about three times that of the Mali-G72. With
dedicated hardware acceleration and optimization, a standalone NPU delivers
stronger performance. Of course, this depends on manufacturers knowing their
own hardware thoroughly and having complete control over it, which means they
must spend more time and money on research and development, so phone prices
will not come down.
It is estimated that in 2018 high-end
handsets will still adopt solutions with a standalone NPU, but not everyone
will buy a high-end smartphone. Low-end smartphones like the Xiaomi Redmi
Note 4X will be more inclined to choose ARM's low-cost solutions to achieve
better machine learning capabilities.