钛媒体T-EDGE|谷歌董事会主席John Hennessy: John( 二 )


That was the beginning of a world change.
这是巨变的开端。
Today we've seen many other deep learning breakthroughs where deep learning is being used for complex problems, obviously crucial for image recognition which enables self-driving cars, becoming more and more useful in medical diagnosis, for example, looking at images of skin to tell whether or not a lesion is cancerous or not, and applications in natural language particularly around machine translation.
今天,深度学习也在其他领域取得重大突破,被应用于解决复杂的问题。其中最明显的自然是图像识别技术,它让自动驾驶技术成为可能。图像识别技术在医学诊断中也变得越来越有用,可通过查看皮肤图像判断是否存在癌变。除此之外,还有在自然语言处理中的应用,尤其是在机器翻译方面颇具成果。
Now for Latin-based language basically being as good as professional translators and improving constantly for Chinese to English, a much more challenging translation problem but we are seeing even a significant progress.
目前,拉丁语系的机器翻译基本上能做到和专业翻译人员相似的质量。在更具挑战的汉英翻译方面上,机器翻译也有不断改进,我们已经能看到显着的进步。
Most recently we've seen AlphaFold 2, a deep minds approach to using deep learning for protein folding, which advanced the field by at least a decade in terms of what is doable in terms of applying this technology to biology and going to dramatically change the way we make new drug discovery in the future.
近期我们也有 AlphaFold 2,一种使用深度学习进行蛋白质结构预测的应用,它将深度学习与生物学进行结合,让该类型的应用进步了至少十年,将极大程度地改变药物研发的方式。
What drove this incredible breakthrough in deep learning? Clearly the technology concepts have been around for a while and in fact many cases have been discarded earlier.
是什么让深度学习取得了以上突破?显然,这些技术概念已经存在一段时间了,在某种程度上也曾被抛弃过。
So why was it able to make this breakthrough now?
那么为什么现在我们能够取得突破呢?
First of all, we had massive amounts of data for training. The Internet is a treasure trove of data that can be used for training. ImageNet was a critical tool for training image recognition. Today, close to 100,000 objects are on ImageNet and more than 1000 images per object, enough to train image recognition systems really well. So that was the key.
首先是我们有了大量的数据用于训练AI。互联网是数据的宝库。例如 ImageNet ,就是训练图像识别的重要工具。现在ImageNet 上有近 100,000 种物体的图像,每种物体有超过 1000 张图像,这足以让我们很好地训练图像识别系统。这是重要变化之一。
Obviously we have lots of other data were using here for whether it's protein folding or medical diagnosis or natural language we're relying on the data that's available on the Internet that's been accurately labeled to be used for training.
我们当然也使用了其他大量的数据,无论是蛋白质结构、医学诊断还是自然语言处理方面,我们都依赖互联网上的数据。当然,这些数据需要被准确标记才能用于训练。
Second, we were able to marshal mass of computational resources primarily through large data centers and cloud-based computing. Training takes hours and hours using thousands of specialized processors. We simply didn't have this capability earlier. So that was crucial to solving the training problem.
第二,大型数据中心和云计算给我们带来了大量的运算资源。使用数千个专用处理器进行人工智能训练只需要数小时就能完成,我们之前根本没有这种能力。因此,算力也是一个重要因素。
I want to emphasize that training is the computational intensive problem here. Inferences are much simpler by comparison and here you see the rate of growth of performance demand in petaflops days needed to train a series of models here. If you look at training AlphaZero for example requires 1000 petaflops days, roughly a week on the largest computers available in the world.