FlashTVM: Optimizing Deep Learning Computation on OpenCL-compatible Hardware Accelerators

Abstract:

TVM is an end-to-end deep learning compiler stack that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. The Versatile Tensor Accelerator (VTA) is an extension of the Apache TVM (incubating) framework that exposes a RISC-like programming abstraction for describing compute and memory operations at the tensor level. The original VTA core works only on selected Xilinx edge SoC FPGAs, and, limited by the hardware resources available on those devices, it cannot deliver the performance that demanding applications require. At 4Paradigm, we designed and implemented an interface framework that enables TVM-VTA to utilize OpenCL-compatible hardware accelerators, including high-performance datacenter FPGAs from Intel and Xilinx.
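For context, the sketch below shows TVM's standard flow for compiling a model to an OpenCL device; it is a minimal illustration of the kind of back-end targeting the talk builds on, not the FlashTVM interface framework itself, and it assumes a recent TVM release with the graph_executor runtime.

```python
import tvm
from tvm import relay
from tvm.relay import testing
from tvm.contrib import graph_executor

# Grab a small example workload (ResNet-18) from TVM's testing utilities.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

# Select an OpenCL target and compile the graph with standard optimizations.
target = tvm.target.Target("opencl")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Deploy the compiled module on the first OpenCL-compatible device.
dev = tvm.device("opencl", 0)
module = graph_executor.GraphModule(lib["default"](dev))
```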

Bio:

Li Jiashu is an R&D engineer in the High Performance Computing Division at 4Paradigm, in charge of architecting and developing FPGA accelerators for AI applications. With more than eight years of industry experience in RTL design and embedded systems, he is actively exploring innovative ways to bring FPGAs to both data centers and edge devices. Li Jiashu received his M.Comp degree in Computer Science and his B.Eng degree in Computer Engineering from the National University of Singapore, and he is a recipient of the Lee Kuan Yew Gold Medal.