Photo by Tim Mossholder

Welcome to my articles on Deep Learning, Reinforcement Learning, and Machine Learning. The purpose of this article is to index the series and articles I have written so far. I also include an F.A.Q. for the most common questions and for anyone who wants to contact me.

Series

Not part of a series yet

NLP

GPU Computing



A CPU provides a generic set of instructions for general-purpose computing. To modify or optimize an application, we change the code; the hardware stays fixed. This generality, however, comes at the cost of hardware complexity. Without complex hardware optimizations, like speculative execution, performance suffers, yet these optimizations increase die size and power consumption.

Generality provides flexibility at the cost of complexity

To increase concurrency in Deep Learning (DL), some chip designers limit the chip's functionality to a vertical set of instructions and implement it as an ASIC (application-specific integrated circuit). That is the approach taken by the Google TPU. …



In part 2 of our Meta-Learning article, we will discuss Bayesian Meta-Learning, Unsupervised Learning, and Weak Supervision. Specifically, we will discuss ways to apply Bayesian inference to Meta-Learning. Most of the approaches discussed in part 1 are point estimations. A point estimate gives a single answer that hardly captures the uncertainty of the real world. Integrating Bayesian inference will bridge that gap.
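To make the point-estimate versus Bayesian distinction concrete, here is a minimal, hypothetical sketch (a toy coin-flip example, not tied to any specific meta-learning method): the MLE returns one number, while the Bayesian posterior keeps a full distribution over the parameter.

```python
import numpy as np
from scipy import stats

# Observed coin flips: 3 heads out of 4 trials.
flips = np.array([1, 1, 0, 1])

# Point estimate (MLE): a single number, no notion of uncertainty.
p_mle = flips.mean()  # 0.75

# Bayesian estimate: a Beta posterior over p, starting from a uniform Beta(1, 1) prior.
posterior = stats.beta(1 + flips.sum(), 1 + len(flips) - flips.sum())

print(f"MLE: {p_mle:.2f}")
print(f"Posterior mean: {posterior.mean():.2f}, "
      f"95% credible interval: {posterior.interval(0.95)}")
```

With only four observations, the credible interval is wide, which is exactly the uncertainty a point estimate throws away.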

For the rest of the article, we address one critical and expensive problem in machine learning (ML): labeling data. We will examine how unsupervised learning and weak supervision may lower the cost of data acquisition and how they may apply to Meta-Learning. …



Last but not least, we will cover the second half of the AI chip companies. This includes a wide spectrum of product offerings from Intel and the mobile SoCs (System on a Chip) from Apple, Qualcomm, and Samsung that come with multiple Neural Processing Unit (NPU) cores. Then, we will go over chips built specifically for AI acceleration. In addition, Asian countries have made AI development a strategic priority, so we will look into their offerings. Finally, we will close our discussion with one important and competitive market segment: low-power AI edge devices.

Habana Labs

Source: Habana Labs. Eight cards, each containing an HL-2000 chip with 8 TPC cores.

Intel acquired Habana Labs in late 2019 for $2 billion. Let's see what Intel got for the money. Habana Labs offers Gaudi for AI training and Goya for inference. A Gaudi HL-2000 chip contains a cluster of eight Tensor Processing Cores (TPC) and an accelerated GEMM (matrix multiplication) engine. Each core supports VLIW SIMD to exploit instruction-level and data parallelism. It also supports mixed-precision calculations. …
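As a loose illustration of what a mixed-precision GEMM computes (a numpy sketch, not Habana's actual hardware pipeline), the idea is to store operands in fp16 while accumulating the products at higher precision:

```python
import numpy as np

# Hypothetical layer shapes; fp16 storage keeps memory and bandwidth low.
A = np.random.randn(256, 512).astype(np.float16)   # activations
W = np.random.randn(512, 128).astype(np.float16)   # weights

# Mixed precision: multiply fp16 operands but accumulate in fp32
# to avoid the rounding error of a pure fp16 accumulation.
C = np.matmul(A.astype(np.float32), W.astype(np.float32))

print(C.dtype, C.shape)   # float32 (256, 128)
```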



SambaNova is very secretive about its work, yet it has attracted about half a billion dollars in VC funding since its founding in 2017. In Part 2, we will put on our investigator hat and see what it may be working on. We dedicate part 2 of this series entirely to a single company because its approach is quite novel. It has many of the design characteristics of AI chip startups and could lead to another major direction for AI chip development.

SambaNova


No outsider knows exactly what SambaNova is working on, but there are plenty of hints in the research done by Stanford professors and SambaNova cofounders Kunle Olukotun and Christopher Ré, in particular their vision of designing computer systems for Software 2.0. …



People learn continuously. We recall relevant skills and adjust them accordingly when handling new tasks. Currently, supervised learning has a limited perspective and scope, much like the "Blind Men and the Elephant" story: each person's experience is limited to the part he or she touches. For instance, supervised models are often trained to specialize in one specific task and dataset only. To form a better perspective, we should learn how to learn (meta-learning). Reproducing the learning efficiency of humans is one of the holy grails of AI.


Specifically, one big challenge in deep learning (DL) is how we can learn from previously trained tasks to form transferable knowledge. One obvious solution is to train a model with a meta-training dataset. This dataset contains multiple datasets that correspond to independent tasks from similar problem domains. …
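As a rough illustration of how such a meta-training dataset is typically consumed (an assumed N-way K-shot episodic setup with made-up names, not tied to any specific paper), here is a small Python sketch:

```python
import random

def sample_episode(meta_dataset, n_way=5, k_shot=1, q_queries=5):
    """Sample one few-shot task (episode) from a meta-training dataset.

    meta_dataset: dict mapping class name -> list of examples,
                  pooled from several task datasets of the same domain.
    """
    classes = random.sample(list(meta_dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(meta_dataset[cls], k_shot + q_queries)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query  # the learner adapts on `support`, is evaluated on `query`

# Toy meta-dataset: 20 classes with 30 dummy examples each.
meta_dataset = {f"class_{i}": list(range(30)) for i in range(20)}
support, query = sample_episode(meta_dataset)
print(len(support), len(query))  # 5 and 25
```

Training over many such episodes, rather than one fixed dataset, is what lets the model accumulate knowledge that transfers to new tasks.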



Major tech companies invest billions in AI chip development. Even Microsoft and Facebook are on board with Intel FPGAs to accelerate their hardware infrastructure. There are a handful of startups that are already unicorns, but there are also sad stories like Wave Computing, which filed for bankruptcy after raising $187 million in 3 years. In this series, we will cover about 30 companies. We will focus on the technology landscape with an emphasis on identifying future advancements and trends.

This series will be split into 3 parts. The first article looks at the development trends for GPU, TPU, FPGA, and startups. The first three categories represent the largest market share in AI acceleration. We will focus on what vendors have been improving. Hopefully, that tells us where they may go next and where the technology bottlenecks are. In the second half of this article, we look at novel approaches popular among startups. In particular, many of them move away from instruction-flow designs to dataflow designs. This is a major paradigm shift that could change the direction of AI chips completely. …


Deep Learning (DL) inference is often done in the cloud to take advantage of a widely available and flexible infrastructure. Nevertheless, as AI is gradually embraced by embedded systems, more AI edge chips will be adopted where power consumption, latency, or connectivity become the dominant design factors. While general AI chips emphasize instruction and data throughput, edge chips need to be small, low power (many in the range of a few watts up to 20-30 W), and low latency.
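To make the size and power constraints concrete, a common way to shrink a model for edge deployment is post-training quantization. Here is a minimal sketch using TensorFlow Lite; the tiny Keras model is a hypothetical placeholder, not a model from this article.

```python
import tensorflow as tf

# Hypothetical trained model standing in for a real edge workload.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10),
])

# Post-training quantization: store weights as 8-bit integers,
# typically shrinking the model ~4x and cutting compute and energy per inference.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```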

Source (Google Edge TPU)

Edge Device Applications

The early adopters of AI edge chips in the consumer market will likely be computer vision applications, in areas such as object detection and classification, pose estimation, gaze detection, and image segmentation. …


Source: Google

Google's chip designers argue that if Moore's Law is no longer sustainable, domain-specific architectures are the way of the future. Instead of developing generic hardware for ML (machine learning), Google specializes the Tensor Processing Unit (TPU) as an ASIC (application-specific integrated circuit) accelerator for AI. The TPU's objective is to optimize the operations that matter most to its problem domain; at the top of the list is the deep neural network (DNN). Here are some other DL domains targeted by the TPU:

Source: Google

The TPU introduces a 128×128 16-bit matrix multiply unit (MXU) to accelerate the matrix multiplications at the heart of ML. PageRank, used in Google search ranking, also involves huge matrix multiplications. Therefore, Google has been utilizing the TPU for many operations that involve matrix multiplication heavily, including the inference in AlphaGo and PageRank. …
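As a rough mental model (a numpy sketch, not Google's actual hardware scheduling), a large matrix multiplication can be decomposed into 128×128 tiles, which is the granularity the MXU works at:

```python
import numpy as np

TILE = 128  # MXU tile size

def tiled_matmul(A, B, tile=TILE):
    """Multiply A (M x K) by B (K x N) by accumulating 128x128 tile products."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=np.float32)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.randn(256, 384).astype(np.float32)
B = np.random.randn(384, 512).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-3)
```

Each tile product maps naturally onto a fixed-size matrix unit, which is why DNN layers (and PageRank-style computations) are such a good fit for this hardware.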

About

Jonathan Hui

