钛媒体T-EDGE|谷歌董事会主席John Hennessy: John( 五 )


And then of course some combinations of these. Can we come up with languages which match to these new domain specific architecture? Domain specific languages which improve the efficiency and let us code a range of applications very effectively.
最后是以上两类的一些结合。我们是否能开发出与这些特定架构相匹配的语言?特定领域语言可以提高效率,让我们非常有效地开发应用程序。
This is a fascinating slide from a paper that was done by Charles Leiserson and his colleagues at MIT and publish on Science called There's plenty of room at the Top.
这是查理·雷瑟森和他在麻省理工学院的同事完成发表在《科学》杂志上的一篇论文内容。论文名为“顶端有足够的空间”。
What they want to do observe is that software efficiency and the inefficiency of matching software to hardware means that we have lots of opportunity to improve performance. They took admittedly a very simple program, matrix multiply, written initially in python and ran it on an 18 core Intel processor. And simply by rewriting the code from python to C they got a factor of 47 in improvement. Then introducing parallel loops gave them another factor of approximately eight.
他们想要观察的是软件效率,以及软件与硬件匹配过程中带来的低效率,这也意味着我们有很多提高效率的地方。他们在 18 核英特尔处理器上运行了一个用 Python 编写的简单程序。把代码从 Python 重写为 C语言之后,他们就得到了 47 倍的效率改进。引入并行循环后,又有了大约 8 倍的改进。
Then introducing memory optimizations if you're familiar with large scale metrics multiplied by doing it in blocked fashion you can dramatically improve the ability to use the cashe as effectively and thereby they got another factor a little under 20 from that about 15. And then finally using the vector instructions inside the Intel processor they were able to gain another factor of 10. Overall this final program runs more than 62,000 times faster than the initial python program.
引入内存优化后可以显着提高缓存的使用效率,然后就又能获得15~20倍的效率提高。然后最后使用英特尔处理器内部的向量指令,又能够获得10 倍的改进。总体而言,这个最终程序的运行速度比最初的 Python 程序快62,000 多倍。
Now this is not to say that you would get this for the larger scale programs or all kinds of environments but it's an example of how much inefficiency is in at least for one simple application. Of course not many performance sensitive things are written in Python but even the improvement from C to the fully parallel version of C that uses SIMD instructions is similar to what you would get if you use the domain specific processor. It is significant just in its onw right. That's nearly a factor of 100, more than 100, its almost 150.
当然,这并不是说在更大规模的程序或所有环境下我们都可以取得这样的提升,但它是一个很好的例子,至少能说明一个简单的应用程序也有效率改进空间。当然,没有多少性能敏感的程序是用 Python 写的。但从完全并行、使用SIMD 指令的C语言版本程序,它能获得的效率提升类似于特定领域处理器。这已经是很大的性能提升了,这几乎是 100 的因数,超过 100,几乎是 150。
So there's lots of opportunities here and that's the key point behind us slide of an observation.
所以提升空间是很多的,这个研究的发现就是如此。
So what are these domain specific architecture? Their architecture is to achieve higher efficiency by telling the architecture the characteristics of the domain.
那么特定领域架构是什么呢?这些架构能让架构掌握特定领域的特征来实现更高的效率。
We're not trying to do just one application but we're trying to do a domain of applications like deep learning for example like computer graphics like virtual reality applications. So it's different from a strict ASIC that is designed to only one function like a modem for example.