• Formal Name: Jiaao He
  • English Name: Rick Ho
  • Common ID: laekov
  • Personal E-mail: laekov.h [at] gmail [dot] com
  • Work E-mail: hja20 [at] mails.tsinghua.edu.cn
  • ORCID: 0000-0001-8578-5158
  • Google Scholar Profile


Jiaao He is a PhD candidate at the Institute of High-Performance Computing, Department of Computer Science and Technology, Tsinghua University. His current research interests include distributed systems for sparse tensor computation. He developed FastMoE, the world's first open-source distributed training framework for Mixture-of-Experts models based on PyTorch. He has published papers at ASPLOS, PPoPP, and other venues. He formerly led the champion-winning Tsinghua Student Cluster Competition team.

Academic Experiences

PACMAN Lab, Dept. CS, Tsinghua Univ., Oct. 2017 - present

Advisor: Prof. Jidong Zhai.

  • Research Intern (2017-2020)
  • Member / Team Leader of the Student Cluster Competition Team (2018-2019)
  • PhD Student (2020-)

ALCHEM Lab, Dept. ECE, USC, July 2019 - Sep 2019

Summer intern advised by Prof. Xuehai Qian.

Publications


  • Jiaao He, Shengqi Chen, Jidong Zhai, POSTER: Pattern-Aware Sparse Communication for Scalable Recommendation Model Training (PPoPP'24)

  • Mingshu Zhai, Jiaao He, Zixuan Ma, Zan Zong, Runqing Zhang, Jidong Zhai, SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization (ATC'23)

  • Zixuan Ma, Haojie Wang, Guanyu Feng, Chen Zhang, Lei Xie, Jiaao He, Shengqi Chen, Jidong Zhai, Efficiently emulating high-bitwidth computation with low-bitwidth hardware (ICS'22)

  • Jiaao He, Jidong Zhai, Tiago Antunes, Haojie Wang, et al., FasterMoE: Modeling and Optimizing Training of Large-Scale Dynamic Pre-Trained Models (PPoPP'22)

  • Zixuan Ma, Jiaao He, Jiezhong Qiu, Huanqi Cao, et al., BAGUALU: Targeting Brain Scale Pretrained Models with over 37 Million Cores (PPoPP'22)

  • Chen Zhang, Chenggang Zhao, Jiaao He, et al., Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From Tsinghua University (TPDS'21)

  • Qinyi Luo, Jiaao He (co-first), Youwei Zhuo, Xuehai Qian, Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training (ASPLOS'20)

  • Jiaao He, Chenggang Zhao, et al., Student Cluster Competition 2018, Team Tsinghua University: Reproducing the SeisSol Optimization on Intel Skylake Architecture (PARCO'19)

Preprints


  • Jiaao He, Jidong Zhai, FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines (2403.11421)
  • Sha Yuan, …, Jiaao He, et al., A Roadmap for Big Model (2203.14101) (Shame on the plagiarism of certain co-authors in this paper)
  • Jiaao He, Jiezhong Qiu, et al., FastMoE: A Fast Mixture-of-Expert Training System (2103.13262)

Other Experiences

Teaching


  • Data Structures and Algorithms, Prof. Yuchun Ma, TA, Spring 2023
  • Advanced Programming, Prof. Yuchun Ma, TA, Fall 2022
  • Data Structures and Algorithms, Prof. Yuchun Ma, TA, Spring 2022
  • Data Structures and Algorithms, Prof. Yuchun Ma, TA, Spring 2021
  • Data Structures, Prof. Hong Wang and Dr. Wentao Han, TA, Spring 2021
  • Introduction to HPC, Prof. Jidong Zhai, TA, Spring 2021


PAI, Alibaba Group, July-Aug 2020

Research Intern.

DPS, Sensetime Research, April 2018 - April 2019

Department Head: Prof. Dahua Lin, Mentor: Xingcheng Zhang

Intern Researcher

Cloud Beaver Corporation, Oct. 2016 - Dec. 2016

Intern full-stack web developer.


QBXT Education Corporation, Aug. 2016 - Oct. 2019

As a lecturer for Olympiad in Informatics, gave more than 20 lectures (over 100 hours) to high school students on programming languages, algorithms, and data structures for OI contests.

Student Association of Algorithm and Contest, Dept. CS, Tsinghua, Sep. 2016 - Sep. 2017

Vice President, Head of the Platform and System Group. Designer and developer of the TUOJ online judge system.

Projects


  • FasterMoE (2021) Faster than the system below.
  • FastMoE (2020-) A fast distributed training system for Mixture-of-Experts models.
  • SpEst (2020) Performance estimation and optimization of sparse tensor computation kernels.
  • P-Reduce (2019) Partial all-reduce on any subset of a communication group.
  • Garlic (2018-2019) A deep learning training framework.
  • Parrots2 (2018-2019)
  • TUOJ (2016-2017) A distributed online judge system for programming contests.
  • Shiruku (2016) A blogging website that used to host laekov’s blog.

Course and homework projects are not listed here. See my GitHub for more.

Awards and Honors

  • Champion in Student Cluster Competition at SC'19 as the team leader, Denver, CO. 2019.

  • CCF Elite Collegiate Award, China Computer Federation, 2019.

  • Scholarship for techniques, CS Dept. Tsinghua. 2019.

  • Second place in Student Cluster Competition at ISC'19 as the team leader, Frankfurt, Germany. 2019.

  • Second place in ASC'19 as the team leader, Dalian, China. 2019.

  • Outstanding Intern Reward in Sensetime Research, Beijing, China, 2018.

  • Champion in Student Cluster Competition at SC'18, Dallas, TX. 2018.

  • Scholarship for techniques, CS Dept. Tsinghua. 2018.

  • Champion in Student Cluster Competition at ISC'18, Frankfurt, Germany. 2018.

  • Champion in ASC'18, Nanchang, China, 2018.

  • Star of 9#, Dept. CS, Tsinghua, Beijing, China, 2017.

  • Scholarship for techniques, CS Dept. Tsinghua. 2017.

  • 2nd place, Lanqiao International Programming Competition, Princeton, NJ, 2017.

  • Gold Medal (2nd place) in ACM-ICPC Asia Regional Contest, Qingdao, China, 2016.

  • Silver Medal in National Olympiad in Informatics (NOI), Hangzhou, China, 2015.

  • Gold Medal in Asia-Pacific Informatics Olympiad (APIO), Beijing, China, 2015.

Social Services

  • President of Cycling Association, Tsinghua University. Aug. 2018 - June 2019

  • International volunteer for turtle protection, Sri Lanka, Aug. 2017

Hobbies

Outdoor Activities (MTB and Road Cycling, Triathlon and Trail Running, etc.), Auto Racing (Karting and Simulators)