钛媒体T-EDGE|谷歌董事会主席John Hennessy: John( 三 )


我想强调的是,人工智能训练带来的问题是密集的算力需求,程序推理变得简单得多。这里展示的是训练人工智能模型的性能需求增长率。以训练 AlphaZero 为例,它需要 1000 pfs-day,也就是说用世界上最大规模的计算机来训练要用上一周。
This speed has been growing actually faster than Moore's law. So the demand is going up faster than what semiconductors ever produced even in the very best era. We've seen 300,000 times increase in compute from training simple models like AlexNet up to AlphaGo Zero and new models like GPT-3 had billions of parameters that need to be set. So the training in the amount of data they have to look at is truly massive. And that's where the real challenge comes.
这个增长率实际上比摩尔定律还要快。因此,即使在半导体行业最鼎盛的时代,需求的增长速度也比半导体生产的要快。从训练 AlexNet 这样的简单模型到 AlphaGo Zero,以及 GPT-3 等新模型,有数十亿个参数需要进行设定,算力已经增加了 300,000 倍。这里涉及到的数据量是真的非常庞大,也是我们需要克服的挑战。
Moore's law, the version that Gordon Moore gave in 1975, predicted that semiconductor density would continue to grow quickly and basically double every two years but we began to diverge from that. Really quickly diverge began in around 2000 and then the spread is growing even wider. As Gordon said in the 50th anniversary of the first prediction: no exponential is forever. Moore's law is not a theorem or something that's definitely must hold true. It's an ambition which the industry was able to focus on and keeping tag. If you look at this curve, you'll notice that for roughly 50 years we drop only a factor of 15 while gaining a factor of more than almost 10,000.
【 钛媒体T-EDGE|谷歌董事会主席John Hennessy: John】摩尔定律,即戈登摩尔在 1975 年给出的版本,预测半导体密度将继续快速增长,基本上每两年翻一番,但我们开始偏离这一增长速度。偏离在2000 年左右出现,并逐步扩大。戈登在预测后的五十年后曾说道:没有任何的物理事物可以持续成倍改变。当然,摩尔定律不是定理或必须成立的真理,它是半导体行业的一个目标。仔细观察这条曲线,你会注意到在大约 50 年中,我们仅偏离了约 15 倍,但总共增长了近 10,000 倍。
So we've largely been able to keep on this curve but we began diverging and when you factor in increasing cost of new fab and new technologies and you see this curve when it's converted to price per transistor not dropping nearly as fast as it once fell.
所以我们基本上能够维持在这条曲线上,但我们确实开始跟不上了。如果你考虑到新晶圆厂和新技术的成本增加,当它转换为每个晶体管的价格时,你会看到这条曲线的下降速度不像曾经下降的那么快。
We also have faced another problem, which is the end of so-called dennard scaling. Dennard scaling is an observation led by Robert Dennard, the inventor of DRAM that is ubiquitous in computing technology. He observes that as dimensions shrunk so would the voltage and other assonance for example. And that would result in nearly constant power per millimeter of silicon. That meant because of the amount of transistors that were in each millimeter we're going up dramatically from one generation to the next, that power per computation was actually dropping quite quickly. That really came to a halt around 2007 and you see this red curb which was going up slowly at the beginning between 2000 and 2007 really began to take off. That meant that power was really the key issue and figuring out how to get energy efficiency would become more and more important as these technologies went forward.
我们还面临另一个问题,即所谓的登纳德缩放定律。登纳德缩放定律是由罗伯特·登纳德 领导的一项观察实验,他是DRAM的发明人。据他的观察,随着尺寸缩小,电压和其他共振也会缩小,这将导致每毫米硅的功率几乎恒定。这意味着由于每一毫米中的晶体管数量从一代到下一代急剧增加,每个计算的功率实际上下降得非常快。这在 2007 年左右最为明显,在 2000 年到 2007 年间开始缓慢上升的功耗开始激增。这意味着功耗确实是关键问题,随着这些技术的发展,弄清楚如何获得更高的能源效率将变得越来越重要。