Awesome-Machine-Learning-System-Papers is a curated list of machine learning system papers published in recent years (since 2016). Star this repository to keep abreast of the latest developments in this booming research field.
Thanks to all the people who have contributed to this project. We strongly encourage researchers to submit pull requests (e.g., to add missing papers or fix errors) and help others in this community!
Currently, the listed papers are collected from the following conferences:
| Conferences | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OSDI | 3 | - | 4 | - | 11 | 8 | | | | |
| SOSP | - | 1 | - | 10 | - | 3 | | | | |
| SIGMOD | 2 | 4 | 3 | 4 | 8 | 16 | | | | |
| ASPLOS | - | 3 | 3 | 9 | 4 | 5 | | | | |
| ATC | - | 2 | 3 | 5 | 8 | 8 | | | | |
| PPoPP | - | 2 | 4 | 2 | 1 | 15 | | | | |
| HPCA | - | 3 | 4 | 4 | 1 | 11 | | | | |
| MICRO | - | 4 | 6 | 9 | 13 | 4 | | | | |
| SC | 5 | - | 6 | 6 | 14 | 15 | | | | |
| NSDI | - | 3 | - | 3 | - | 4 | | | | |
| ISCA | - | 6 | 11 | 7 | 13 | 18 | | | | |
| VLDB | 3 | 5 | 6 | - | 4 | 3 | | | | |
| SIGCOMM | - | - | - | 3 | 2 | 2 | | | | |
| ICDE | 2 | 5 | 1 | 6 | 7 | 4 | | | | |
| SIGKDD | - | 5 | 4 | 7 | 8 | 8 | | | | |
| EuroSys | 2 | 1 | 4 | 5 | 3 | 5 | | | | |
| SoCC | 2 | 3 | 3 | 2 | 6 | - | | | | |
| SysML | - | - | 6 | 32 | 34 | 52 | | | | |
| WWW | - | - | 1 | | | | | | | |
| INFOCOM | - | 2 | 4 | | | | | | | |
| SIGIR | - | - | - | | | | | | | |
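The counts above are maintained by hand. For anyone curating the list, a quick way to sanity-check them is to tally the trailing "VENUE YEAR" suffix on each title line. Below is a minimal sketch in Python; the `README.md` filename and the venue list are assumptions, and entries without a venue/year suffix are simply skipped.

```python
# count_papers.py: a rough tally of entries per conference per year.
# Assumptions: this list lives in a file named README.md, and every
# entry's title line ends with "<VENUE> <YEAR>", as the entries below do.
import re
from collections import Counter

VENUES = [
    "OSDI", "SOSP", "SIGMOD", "ASPLOS", "ATC", "PPoPP", "HPCA", "MICRO",
    "SC", "NSDI", "ISCA", "VLDB", "SIGCOMM", "ICDE", "SIGKDD", "EuroSys",
    "SoCC", "SysML", "WWW", "INFOCOM", "SIGIR",
]
# Match a venue name followed by a four-digit year at the end of a line.
PATTERN = re.compile(r"\b(" + "|".join(VENUES) + r")\s+(\d{4})\s*$")

counts = Counter()
with open("README.md", encoding="utf-8") as f:
    for line in f:
        match = PATTERN.search(line.rstrip())
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1

for (venue, year), n in sorted(counts.items()):
    print(f"{venue}\t{year}\t{n}")
```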
Some conferences will be added in the future. The listed papers cover the following topics:
- Model training
- Model inference and serving
- Hardware-efficient ML methods
- Privacy and security for ML applications
- Testing, debugging, and monitoring of ML applications
- Data management/preparation and feature selection/extraction
- Distributed and parallel learning algorithms
- ML compilers, programming languages/models
- Resource scheduling for ML applications
- Graph learning systems
- AutoML, e.g., HPO, NAS
- ML platform/pipeline/lifecycle
- ...
-
TensorFlow: A System for Large-Scale Machine Learning OSDI 2016
Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zheng
-
Exploring the Hidden Dimension in Graph Processing OSDI 2016
Mingxing Zhang, Yongwei Wu, Kang Chen, Xuehai Qian, Xue Li, Weimin Zheng
-
Gemini: A Computation-Centric Distributed Graph Processing System OSDI 2016
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, Xiaosong Ma
-
Ray: A Distributed Framework for Emerging AI Applications OSDI 2018
Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, Ion Stoica
-
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning OSDI 2018
Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Q. Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
-
Gandiva: Introspective Cluster Scheduling for Deep Learning OSDI 2018
Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, Lidong Zhou
-
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems OSDI 2018
Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco Domenico Santambrogio, Markus Weimer, Matteo Interlandi
-
Serving DNNs like Clockwork: Performance Predictability from the Bottom Up OSDI 2020
Arpan Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace
-
A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters OSDI 2020
Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, Chuanxiong Guo
-
Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads OSDI 2020
Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, Matei Zaharia
-
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications OSDI 2020
Zhihao Bai, Zhen Zhang, Yibo Zhu, Xin Jin
-
HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees OSDI 2020
Hanyu Zhao, Zhenhua Han, Zhi Yang, Quanlu Zhang, Fan Yang, Lidong Zhou, Mao Yang, Francis C. M. Lau, Yuqi Wang, Yifan Xiong, Bin Wang
-
AntMan: Dynamic Scaling on GPU Clusters for Deep Learning OSDI 2020
Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, Yangqing Jia
Code: https://github.com/alibaba/GPU-scheduler-for-deep-learning
-
Ansor: Generating High-Performance Tensor Programs for Deep Learning OSDI 2020
Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica
-
Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks OSDI 2020
Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, Lidong Zhou
-
A Tensor Compiler for Unified Machine Learning Prediction Serving OSDI 2020
Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi
-
Retiarii: A Deep Learning Exploratory-Training Framework OSDI 2020
Quanlu Zhang, Zhenhua Han, Fan Yang, Yuge Zhang, Zhe Liu, Mao Yang, Lidong Zhou
-
KungFu: Making Training in Distributed Machine Learning Adaptive OSDI 2020
Luo Mai, Guo Li, Marcel Wagenländer, Konstantinos Fertakis, Andrei-Octavian Brabete, Peter R. Pietzuch
-
Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning OSDI 2021
Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, Eric P. Xing
-
Oort: Efficient Federated Learning via Guided Participant Selection OSDI 2021
Fan Lai, Xiangfeng Zhu, Harsha V. Madhyastha, Mosharaf Chowdhury
-
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections OSDI 2021
Haojie Wang, Jidong Zhai, Mingyu Gao, Zixuan Ma, Shizhi Tang, Liyan Zheng, Yuanzhi Li, Kaiyuan Rong, Yuanyong Chen, Zhihao Jia
-
Privacy Budget Scheduling OSDI 2021
Tao Luo, Mingen Pan, Pierre Tholoniat, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer
-
Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads OSDI 2021
John Thorpe, Yifan Qiao, Jonathan Eyolfson, Shen Teng, Guanzhou Hu, Zhihao Jia, Jinliang Wei, Keval Vora, Ravi Netravali, Miryung Kim, Guoqing Harry Xu
-
GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs OSDI 2021
Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, Yufei Ding
-
Marius: Learning Massive Graph Embeddings on a Single Machine OSDI 2021
Jason Mohoney, Roger Waleffe, Henry Xu, Theodoros Rekatsinas, Shivaram Venkataraman
-
P3: Distributed Deep Graph Learning at Scale OSDI 2021
Swapnil Gandhi, Anand Padmanabha Iyer
-
DeepXplore: Automated Whitebox Testing of Deep Learning Systems SOSP 2017
Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana
Code: https://github.com/peikexin9/deepxplore
-
PipeDream: Generalized Pipeline Parallelism for DNN Training SOSP 2019
Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons, Matei Zaharia
-
A Generic Communication Scheduler for Distributed DNN Training Acceleration SOSP 2019
Yanghua Peng, Yibo Zhu, Yangrui Chen, Yixin Bao, Bairen Yi, Chang Lan, Chuan Wu, Chuanxiong Guo
-
Parity Models: Erasure-Coded Resilience for Prediction Serving Systems SOSP 2019
Jack Kosaian, K. V. Rashmi, Shivaram Venkataraman
-
TASO: Optimizing Deep Learning Computation with Automated Generation of Graph Substitutions SOSP 2019
Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, Alex Aiken
-
Optimizing Data-Intensive Computations in Existing Libraries with Split Annotations SOSP 2019
Shoumik Palkar, Matei Zaharia
-
Niijima: Sound and Automated Computation Consolidation for Efficient Multilingual Data-Parallel Pipelines SOSP 2019
Guoqing Harry Xu, Margus Veanes, Michael Barnett, Madan Musuvathi, Todd Mytkowicz, Ben Zorn, Huan He, Haibo Lin
-
Nexus: A GPU Cluster Engine for Accelerating DNN-Based Video Analysis SOSP 2019
Haichen Shen, Lequn Chen, Yuchen Jin, Liangyu Zhao, Bingyu Kong, Matthai Philipose, Arvind Krishnamurthy, Ravi Sundaram
-
AutoMine: Harmonizing High-Level Abstraction and High Performance for Graph Mining SOSP 2019
Daniel Mawhirter, Bo Wu
-
KnightKing: A Fast Distributed Graph Random Walk Engine SOSP 2019
Ke Yang, MingXing Zhang, Kang Chen, Xiaosong Ma, Yang Bai, Yong Jiang
-
Gerenuk: Thin Computation over Big Native Data Using Speculative Program Transformation SOSP 2019
Christian Navasca, Cheng Cai, Khanh Nguyen, Brian Demsky, Shan Lu, Miryung Kim, Guoqing Harry Xu
-
Gradient Compression Supercharged High-Performance Data Parallel DNN Training SOSP 2021
Youhui Bai, Cheng Li, Quan Zhou, Jun Yi, Ping Gong, Feng Yan, Ruichuan Chen, Yinlong Xu
-
Generating Complex, Realistic Cloud Workloads using Recurrent Neural Networks SOSP 2021
Shane Bergsma, Timothy Zeyl, Arik Senderovich, J. Christopher Beck
-
Random Walks on Huge Graphs at Cache Efficiency SOSP 2021
Ke Yang, Xiaosong Ma, Saravanan Thirumuruganathan, Kang Chen, Yongwei Wu
-
Building Machine Learning Systems that Understand SIGMOD 2016
Jeff Dean
-
M3: Scaling Up Machine Learning via Memory Mapping SIGMOD 2016
Dezhi Fang, Duen Horng Chau
-
Heterogeneity-aware Distributed Parameter Servers SIGMOD 2017
Jiawei Jiang, Bin Cui, Ce Zhang, Lele Yu
-
Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics SIGMOD 2017
Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, Jeffrey F. Naughton
-
Data Management in Machine Learning: Challenges, Techniques, and Systems SIGMOD 2017
Arun Kumar, Matthias Boehm, Jun Yang
-
Data Management Challenges in Production Machine Learning SIGMOD 2017
Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang, Martin Zinkevich
-
DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions SIGMOD 2018
Jiawei Jiang, Bin Cui, Ce Zhang, Fangcheng Fu
-
SketchML: Accelerating Distributed Machine Learning with Data Sketches SIGMOD 2018
Jiawei Jiang, Fangcheng Fu, Tong Yang, Bin Cui
-
Accelerating Machine Learning Inference with Probabilistic Predicates SIGMOD 2018
Yao Lu, Aakanksha Chowdhery, Srikanth Kandula, Surajit Chaudhuri
-
PS2: Parameter Server on Spark SIGMOD 2019
Zhipeng Zhang, Bin Cui, Yingxia Shao, Lele Yu, Jiawei Jiang, Xupeng Miao
-
Large Scale Graph Mining with G-Miner SIGMOD 2019
Hongzhi Chen, Xiaoxi Wang, Chenghuan Huang, Juncheng Fang, Yifan Hou, Changji Li, James Cheng
-
BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees SIGMOD 2019
Yongjoo Park, Jingyi Qing, Xiaoyang Shen, Barzan Mozafari
-
Democratizing Data Science through Interactive Curation of ML Pipelines SIGMOD 2019
Zeyuan Shang, Emanuel Zgraggen, Benedetto Buratti, Ferdinand Kossmann, Philipp Eichmann, Yeounoh Chung, Carsten Binnig, Eli Upfal, Tim Kraska
-
Memory-Aware Framework for Efficient Second-Order Random Walk on Large Graphs SIGMOD 2020
Yingxia Shao, Shiyue Huang, Xupeng Miao, Bin Cui, Lei Chen
-
TensorFlow Data Validation: Data Analysis and Validation in Continuous ML Pipelines SIGMOD 2020
Emily Caveness, Paul Suganthan G. C., Zhuo Peng, Neoklis Polyzotis, Sudip Roy, Martin Zinkevich
-
Optimizing Machine Learning Workloads in Collaborative Environments SIGMOD 2020
Behrouz Derakhshan, Alireza Rezaei Mahdiraji, Ziawasch Abedjan, Tilmann Rabl, Volker Markl
-
Vertica-ML: Distributed Machine Learning in Vertica Database SIGMOD 2020
Arash Fard, Anh Le, George Larionov, Waqas Dhillon, Chuck Bear
-
DB4ML - An In-Memory Database Kernel with Machine Learning Support SIGMOD 2020
Matthias Jasny, Tobias Ziegler, Tim Kraska, Uwe Röhm, Carsten Binnig
-
Active Learning for ML Enhanced Database Systems SIGMOD 2020
Lin Ma, Bailu Ding, Sudipto Das, Adith Swaminathan
-
MemFlow: Memory-Aware Distributed Deep Learning SIGMOD 2020
Neil Band
-
Systems and ML: When the Sum is Greater than Its Parts SIGMOD 2020
Ion Stoica
-
Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce SIGMOD 2021
Xupeng Miao, Xiaonan Nie, Yingxia Shao, Zhi Yang, Jiawei Jiang, Lingxiao Ma, Bin Cui
-
VF^2Boost: Very Fast Vertical Federated Gradient Boosting for Cross-Enterprise Learning SIGMOD 2021
Fangcheng Fu, Yingxia Shao, Lele Yu, Jiawei Jiang, Huanran Xue, Yangyu Tao, Bin Cui
-
ALG: Fast and Accurate Active Learning Framework for Graph Convolutional Networks SIGMOD 2021
Wentao Zhang, Yu Shen, Yang Li, Lei Chen, Zhi Yang, Bin Cui
-
Agile and Accurate CTR Prediction Model Training for Massive-Scale Online Advertising Systems SIGMOD 2021
Zhiqiang Xu, Dong Li, Weijie Zhao, Xing Shen, Tianbo Huang, Xiaoyun Li, Ping Li
-
Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities SIGMOD 2021
Doris Xin, Hui Miao, Aditya G. Parameswaran, Neoklis Polyzotis
-
Vertex-Centric Visual Programming for Graph Neural Networks SIGMOD 2021
Yidi Wu, Yuntao Gui, Tatiana Jin, James Cheng, Xiao Yan, Peiqi Yin, Yufei Cai, Bo Tang, Fan Yu
-
Deep Learning: Systems and Responsibility SIGMOD 2021
Abdul Wasay, Subarna Chatterjee, Stratos Idreos
-
Expand your Training Limits! Generating Training Data for ML-based Data Management SIGMOD 2021
Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Volker Markl
-
Towards Benchmarking Feature Type Inference for AutoML Platforms SIGMOD 2021
Vraj Shah, Jonathan Lacanlale, Premanand Kumar, Kevin Yang, Arun Kumar
Code: https://github.com/pvn25/ML-Data-Prep-Zoo/tree/master/MLFeatureTypeInference
-
LightNE: A Lightweight Graph Processing System for Network Embedding SIGMOD 2021
Jiezhong Qiu, Laxman Dhulipala, Jie Tang, Richard Peng, Chi Wang
-
Scalable and Usable Relational Learning With Automatic Language Bias SIGMOD 2021
Jose Picado, Arash Termehchy, Alan Fern, Sudhanshu Pathak, Praveen Ilango, John Davis
-
Model-Parallel Model Selection for Deep Learning Systems SIGMOD 2021
Kabir Nagrecha
-
Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra SIGMOD 2021
Shangyu Luo, Dimitrije Jankov, Binhang Yuan, Chris Jermaine
-
Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil? SIGMOD 2021
Arun Kumar
-
Towards Demystifying Serverless Machine Learning Training SIGMOD 2021
Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, Ce Zhang
-
AI Meets Database: AI4DB and DB4AI SIGMOD 2021
Guoliang Li, Xuanhe Zhou, Lei Cao
-
SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs ASPLOS 2017
Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu
-
SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing ASPLOS 2017
Ao Ren, Zhe Li, Caiwen Ding, Qinru Qiu, Yanzhi Wang, Ji Li, Xuehai Qian, Bo Yuan
-
Optimizing CNNs on Multicores for Scalability, Performance and Goodput ASPLOS 2017
Samyam Rajbhandari, Yuxiong He, Olatunji Ruwase, Michael Carbin, Trishul M. Chilimbi
-
Bridge the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler ASPLOS 2018
Yu Ji, Youhui Zhang, Wenguang Chen, Yuan Xie
-
MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects ASPLOS 2018
Hyoukjun Kwon, Ananda Samajdar, Tushar Krishna
-
VIBNN: Hardware Acceleration of Bayesian Neural Networks ASPLOS 2018
Ruizhe Cai, Ao Ren, Ning Liu, Caiwen Ding, Luhao Wang, Xuehai Qian, Massoud Pedram, Yanzhi Wang
-
PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference ASPLOS 2019
Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, Wen-mei W. Hwu, John Paul Strachan, Kaushik Roy, Dejan S. Milojicic
-
FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture ASPLOS 2019
Yu Ji, Youyang Zhang, Xinfeng Xie, Shuangchen Li, Peiqi Wang, Xing Hu, Youhui Zhang, Yuan Xie
-
Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks ASPLOS 2019
Alberto Delmas Lascorz, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic, Kevin Siu, Andreas Moshovos
-
TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators ASPLOS 2019
Mingyu Gao, Xuan Yang, Jing Pu, Mark Horowitz, Christos Kozyrakis
-
Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization ASPLOS 2019
H. T. Kung, Bradley McDanel, Sai Qian Zhang
-
Split-CNN: Splitting Window-based Operations in Convolutional Neural Networks for Memory System Optimization ASPLOS 2019
Tian Jin, Seokin Hong
-
HOP: Heterogeneity-Aware Decentralized Training ASPLOS 2019
Qinyi Luo, Jinkun Lin, Youwei Zhuo, Xuehai Qian
-
Astra: Exploiting Predictability to Optimize Deep Learning ASPLOS 2019
Muthian Sivathanu, Tapan Chugh, Sanjay S. Singapuram, Lidong Zhou
-
ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers ASPLOS 2019
Ao Ren, Tianyun Zhang, Shaokai Ye, Jiayu Li, Wenyao Xu, Xuehai Qian, Xue Lin, Yanzhi Wang
Code: https://github.com/bowenl0218/bpgan-signal-compression
-
Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training ASPLOS 2020
Qinyi Luo, Jiaao He, Youwei Zhuo, Xuehai Qian
-
Capuchin: Tensor-based GPU Memory Management for Deep Learning ASPLOS 2020
Xuan Peng, Xuanhua Shi, Hulin Dai, Hai Jin, Weiliang Ma, Qian Xiong, Fan Yang, Xuehai Qian
-
DNNGuard: An Elastic Heterogeneous DNN Accelerator Architecture against Adversarial Attacks ASPLOS 2020
Xingbin Wang, Rui Hou, Boyan Zhao, Fengkai Yuan, Jun Zhang, Dan Meng, Xuehai Qian
-
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning ASPLOS 2020
Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren
-
Neural architecture search as program transformation exploration ASPLOS 2021
Jack Turner, Elliot J. Crowley, Michael F. P. O'Boyle
Code: https://github.com/jack-willturner/nas-as-program-transformation-exploration
-
Analytical characterization and design space exploration for optimization of CNNs ASPLOS 2021
Rui Li, Yufan Xu, Aravind Sukumaran-Rajam, Atanas Rountev, P. Sadayappan
-
Mind mappings: enabling efficient algorithm-accelerator mapping space search ASPLOS 2021
Kartik Hegde, Po-An Tsai, Sitao Huang, Vikas Chandra, Angshuman Parashar, Christopher W. Fletcher
-
NeuroEngine: a hardware-based event-driven simulation system for advanced brain-inspired computing ASPLOS 2021
Hunjun Lee, Chanmyeong Kim, Yujin Chung, Jangwoo Kim
-
Defensive approximation: securing CNNs using approximate computing ASPLOS 2021
Amira Guesmi, Ihsen Alouani, Khaled N. Khasawneh, Mouna Baklouti, Tarek Frikha, Mohamed Abid, Nael B. Abu-Ghazaleh
-
Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters ATC 2017
Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing
-
Garaph: Efficient GPU-accelerated Graph Processing on a Single Machine with Balanced Replication ATC 2017
Lingxiao Ma, Zhi Yang, Han Chen, Jilong Xue, Yafei Dai
-
Litz: Elastic Framework for High-Performance Distributed Machine Learning ATC 2018
Aurick Qiao, Abutalib Aghayev, Weiren Yu, Haoyang Chen, Qirong Ho, Garth A. Gibson, Eric P. Xing
-
Cavs: An Efficient Runtime System for Dynamic Neural Networks ATC 2018
Shizhen Xu, Hao Zhang, Graham Neubig, Wei Dai, Jin Kyu Kim, Zhijie Deng, Qirong Ho, Guangwen Yang, Eric P. Xing
-
DeepCPU: Serving RNN-based Deep Learning Models 10x Faster ATC 2018
Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, Yuxiong He
-
STRADS-AP: Simplifying Distributed Machine Learning Programming without Introducing a New Programming Model ATC 2019
Jin Kyu Kim, Abutalib Aghayev, Garth A. Gibson, Eric P. Xing
-
NeuGraph: Parallel Deep Neural Network Computation on Large Graphs ATC 2019
Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, Yafei Dai
-
Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads ATC 2019
Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, Fan Yang
-
Optimizing CNN Model Inference on CPUs ATC 2019
Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, Yida Wang
-
MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving ATC 2019
Chengliang Zhang, Minchen Yu, Wei Wang, Feng Yan
-
HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism ATC 2020
Jay H. Park, Gyeongchan Yun, Chang M. Yi, Nguyen T. Nguyen, Seungmin Lee, Jaesik Choi, Sam H. Noh, Young-ri Choi
-
AutoSys: The Design and Operation of Learning-Augmented Systems ATC 2020
Chieh-Jan Mike Liang, Hui Xue, Mao Yang, Lidong Zhou, Lifei Zhu, Zhao Lucis Li, Zibo Wang, Qi Chen, Quanlu Zhang, Chuanjie Liu, Wenjun Dai
-
Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training ATC 2020
Hongyu Zhu, Amar Phanishayee, Gennady Pekhimenko
-
ALERT: Accurate Learning for Energy and Timeliness ATC 2020
Chengcheng Wan, Muhammad Husni Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire, Shan Lu
-
NeuOS: A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems ATC 2020
Soroush Bateni, Cong Liu
-
PERCIVAL: Making In-Browser Perceptual Ad Blocking Practical with Deep Learning ATC 2020
Zain ul Abi Din, Panagiotis Tigas, Samuel T. King, Benjamin Livshits
-
GraphWalker: An I/O-Efficient and Resource-Friendly Graph Analytic System for Fast and Scalable Random Walks ATC 2020
Rui Wang, Yongkun Li, Hong Xie, Yinlong Xu, John C. S. Lui
-
Scaph: Scalable GPU-Accelerated Graph Processing with Value-Driven Differential Scheduling ATC 2020
Long Zheng, Xianliang Li, Yaohui Zheng, Yu Huang, Xiaofei Liao, Hai Jin, Jingling Xue, Zhiyuan Shao, Qiang-Sheng Hua
-
Zico: Efficient GPU Memory Sharing for Concurrent DNN Training ATC 2021
Gangmuk Lim, Jeongseob Ahn, Wencong Xiao, Youngjin Kwon, Myeongjae Jeon
-
Octo: INT8 Training with Loss-aware Compensation and Backward Quantization for Tiny On-device Learning ATC 2021
Qihua Zhou, Song Guo, Zhihao Qu, Jingcai Guo, Zhenda Xu, Jiewei Zhang, Tao Guo, Boyuan Luo, Jingren Zhou
-
GLIST: Towards In-Storage Graph Learning ATC 2021
Cangyuan Li, Ying Wang, Cheng Liu, Shengwen Liang, Huawei Li, Xiaowei Li
-
Fine-tuning giant neural networks on commodity hardware with automatic pipeline model parallelism ATC 2021
Saar Eliad, Ido Hakimi, Alon De Jagger, Mark Silberstein, Assaf Schuster
-
INFaaS: Automated Model-less Inference Serving ATC 2021
Francisco Romero, Qian Li, Neeraja J. Yadwadkar, Christos Kozyrakis
-
Habitat: A Runtime-Based Computational Performance Predictor for Deep Neural Network Training ATC 2021
Geoffrey X. Yu, Yubo Gao, Pavel Golikov, Gennady Pekhimenko
-
Refurbish Your Training Data: Reusing Partially Augmented Samples for Faster Deep Neural Network Training ATC 2021
Gyewon Lee, Irene Lee, Hyeonmin Ha, Kyung-Geun Lee, Hwarim Hyun, Ahnjae Shin, Byung-Gon Chun
-
ZeRO-Offload: Democratizing Billion-Scale Model Training ATC 2021
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He
-
Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations PPoPP 2017
Tal Ben-Nun, Michael Sutton, Sreepathi Pai, Keshav Pingali
-
S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters PPoPP 2017
Ammar Ahmad Awan, Khaled Hamidouche, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda
-
Bridging the Gap between Deep Learning and Sparse Matrix Format Selection PPoPP 2018
Yue Zhao, Jiajia Li, Chunhua Liao, Xipeng Shen
-
LazyGraph: Lazy Data Coherency for Replicas in Distributed Graph-Parallel Computation PPoPP 2018
Lei Wang, Liangji Zhuang, Junhang Chen, Huimin Cui, Fang Lv, Ying Liu, Xiaobing Feng
-
Making Pull-Based Graph Processing Performant PPoPP 2018
Samuel Grossman, Heiner Litz, Christos Kozyrakis
-
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks PPoPP 2018
Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska
-
Beyond Human-Level Accuracy: Computational Challenges in Deep Learning PPoPP 2019
Joel Hestness, Newsha Ardalani, Gregory F. Diamos
-
A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs PPoPP 2019
Ke Meng, Jiajia Li, Guangming Tan, Ninghui Sun
-
Taming unbalanced training workloads in deep learning with partial collective operations PPoPP 2020
Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler
-
Understanding and bridging the gaps in current GNN performance optimizations PPoPP 2021
Kezhao Huang, Jidong Zhai, Zhen Zheng, Youngmin Yi, Xipeng Shen
-
Scaling implicit parallelism via dynamic control replication PPoPP 2021
Michael Bauer, Wonchan Lee, Elliott Slaughter, Zhihao Jia, Mario Di Renzo, Manolis Papadakis, Galen M. Shipman, Patrick S. McCormick, Michael Garland, Alex Aiken
-
Compiler support for near data computing PPoPP 2021
Mahmut Taylan Kandemir, Jihyun Ryoo, Xulong Tang, Mustafa Karaköy
-
BiPart: a parallel and deterministic hypergraph partitioner PPoPP 2021
Sepideh Maleki, Udit Agarwal, Martin Burtscher, Keshav Pingali
-
GPTune: multitask learning for autotuning exascale applications PPoPP 2021
Yang Liu, Wissam M. Sid-Lakhdar, Osni Marques, Xinran Zhu, Chang Meng, James Weldon Demmel, Xiaoye S. Li
-
I/O lower bounds for auto-tuning of convolutions in CNNs PPoPP 2021
Xiaoyang Zhang, Junmin Xiao, Guangming Tan
-
ApproxTuner: a compiler and runtime system for adaptive approximations PPoPP 2021
Hashim Sharif, Yifan Zhao, Maria Kotsifakou, Akash Kothari, Ben Schreiber, Elizabeth Wang, Yasmin Sarita, Nathan Zhao, Keyur Joshi, Vikram S. Adve, Sasa Misailovic, Sarita V. Adve
-
TurboTransformers: an efficient GPU serving system for transformer models PPoPP 2021
Jiarui Fang, Yang Yu, Chengduo Zhao, Jie Zhou
-
DAPPLE: a pipelined data parallel approach for training large models PPoPP 2021
Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin
-
Corder: cache-aware reordering for optimizing graph analytics PPoPP 2021
YuAng Chen, Yeh-Ching Chung
-
DFOGraph: an I/O- and communication-efficient system for distributed fully-out-of-core graph processing PPoPP 2021
Jiping Yu, Wei Qin, Xiaowei Zhu, Zhenbo Sun, Jianqiang Huang, Xiaohan Li, Wenguang Chen
-
An efficient uncertain graph processing framework for heterogeneous architectures PPoPP 2021
Heng Zhang, Lingda Li, Donglin Zhuang, Rui Liu, Shuang Song, Dingwen Tao, Yanjun Wu, Shuaiwen Leon Song
-
Dynamic scaling for low-precision learning PPoPP 2021
Ruobing Han, Min Si, James Demmel, Yang You
-
Exploring deep reuse in winograd CNN inference PPoPP 2021
Ruofan Wu, Feng Zhang, Zhen Zheng, Xiaoyong Du, Xipeng Shen
-
A novel memory-efficient deep learning training framework via error-bounded lossy compression PPoPP 2021
Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao
-
Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures HPCA 2017
Mingcong Song, Yang Hu, Huixiang Chen, Tao Li
-
PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning HPCA 2017
Linghao Song, Xuehai Qian, Hai Li, Yiran Chen
-
FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks HPCA 2017
Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, Xiaowei Li
-
Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-based Deep Learning HPCA 2018
Mingcong Song, Jiaqi Zhang, Huixiang Chen, Tao Li
-
Making Memristive Neural Network Accelerators Reliable HPCA 2018
Ben Feinberg, Shibo Wang, Engin Ipek
-
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks HPCA 2018
Minsoo Rhu, Mike O'Connor, Niladrish Chatterjee, Jeff Pool, Youngeun Kwon, Stephen W. Keckler
-
In-situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems HPCA 2018
Mingcong Song, Kan Zhong, Jiaqi Zhang, Yang Hu, Duo Liu, Weigong Zhang, Jing Wang, Tao Li
-
HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array HPCA 2019
Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen
-
E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs HPCA 2019
Zhe Li, Caiwen Ding, Siyue Wang, Wujie Wen, Youwei Zhuo, Chang Liu, Qinru Qiu, Wenyao Xu, Xue Lin, Xuehai Qian, Yanzhi Wang
-
Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks HPCA 2019
Xiaowei Wang, Jiecao Yu, Charles Augustine, Ravi Iyer, Reetuparna Das
-
Shortcut Mining: Exploiting Cross-Layer Shortcut Reuse in DCNN Accelerators HPCA 2019
Arash Azizimazreah, Lizhong Chen
-
AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerator Arrays HPCA 2020
Linghao Song, Fan Chen, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen
-
Heterogeneous Dataflow Accelerators for Multi-DNN Workloads HPCA 2021
Hyoukjun Kwon, Liangzhen Lai, Michael Pellauer, Tushar Krishna, Yu-Hsin Chen, Vikas Chandra
-
SPAGHETTI: Streaming Accelerators for Highly Sparse GEMM on FPGAs HPCA 2021
Reza Hojabr, Ali Sedaghati, Amirali Sharifian, Ahmad Khonsari, Arrvindh Shriraman
-
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning HPCA 2021
Jiajun Li, Ahmed Louri, Avinash Karanth, Razvan Bunescu
-
Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework HPCA 2021
Sung-En Chang, Yanyu Li, Mengshu Sun, Runbin Shi, Hayden K.-H. So, Xuehai Qian, Yanzhi Wang, Xue Lin
-
Revisiting HyperDimensional Learning for FPGA and Low-Power Architectures HPCA 2021
Mohsen Imani, Zhuowen Zou, Samuel Bosch, Sanjay Anantha Rao, Sahand Salamat, Venkatesh Kumar, Yeseong Kim, Tajana Rosing
-
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training HPCA 2021
Youngeun Kwon, Yunjae Lee, Minsoo Rhu
-
GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent HPCA 2021
Heesu Kim, Hanmin Park, Taehyun Kim, Kwanheum Cho, Eojin Lee, Soojung Ryu, Hyuk-Jae Lee, Kiyoung Choi, Jinho Lee
-
SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator HPCA 2021
Xinfeng Xie, Zheng Liang, Peng Gu, Abanti Basak, Lei Deng, Ling Liang, Xing Hu, Yuan Xie
-
Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling HPCA 2021
Young H. Oh, Seonghak Kim, Yunho Jin, Sam Son, Jonghyun Bae, Jongsung Lee, Yeonhong Park, Dong Uk Kim, Tae Jun Ham, Jae W. Lee
-
Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning HPCA 2021
Jie Ren, Jiaolin Luo, Kai Wu, Minjia Zhang, Hyeran Jeon, Dong Li
-
CSCNN: Algorithm-hardware Co-design for CNN Accelerators using Centrosymmetric Filters HPCA 2021
Jiajun Li, Ahmed Louri, Avinash Karanth, Razvan Bunescu
-
Scale-Out Acceleration for Machine Learning MICRO 2017
Jongse Park, Hardik Sharma, Divya Mahajan, Joon Kyung Kim, Preston Olds, Hadi Esmaeilzadeh
-
Bit-Pragmatic Deep Neural Network Computing MICRO 2017
Jorge Albericio, Alberto Delmás, Patrick Judd, Sayeh Sharify, Gerard O'Leary, Roman Genov, Andreas Moshovos
-
CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices MICRO 2017
Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, Ning Liu, Youwei Zhuo, Chao Wang, Xuehai Qian, Yu Bai, Geng Yuan, Xiaolong Ma, Yipeng Zhang, Jian Tang, Qinru Qiu, Xue Lin, Bo Yuan
-
DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission MICRO 2017
Parker Hill, Animesh Jain, Mason Hill, Babak Zamirai, Chang-Hong Hsu, Michael A. Laurenzano, Scott Mahlke, Lingjia Tang, Jason Mars
-
Addressing Irregularity in Sparse Neural Networks: A Cooperative Software/Hardware Approach MICRO 2018
Xi Zeng, Tian Zhi, Xuda Zhou, Zidong Du, Qi Guo, Shaoli Liu, Bingrui Wang, Yuanbo Wen, Chao Wang, Xuehai Zhou, Ling Li, Tianshi Chen, Ninghui Sun, Yunji Chen
-
Diffy: a Deja vu-Free Differential Deep Neural Network Accelerator MICRO 2018
Mostafa Mahmoud, Kevin Siu, Andreas Moshovos
-
Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs MICRO 2018
Xingyao Zhang, Chenhao Xie, Jing Wang, Weidong Zhang, Xin Fu
-
A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks MICRO 2018
Youjie Li, Jongse Park, Mohammad Alian, Yifan Yuan, Zheng Qu, Peitian Pan, Ren Wang, Alexander Gerhard Schwing, Hadi Esmaeilzadeh, Nam Sung Kim
-
PermDNN: Efficient Compressed Deep Neural Network Architecture with Permuted Diagonal Matrices MICRO 2018
Chunhua Deng, Siyu Liao, Yi Xie, Keshab K. Parhi, Xuehai Qian, Bo Yuan
-
Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach MICRO 2018
Jiawen Liu, Hengyu Zhao, Matheus Almeida Ogleari, Dong Li, Jishen Zhao
-
Wire-Aware Architecture and Dataflow for CNN Accelerators MICRO 2019
Sumanth Gudaparthi, Surya Narayanan, Rajeev Balasubramonian, Edouard Giacomin, Hari Kambalasubramanyam, Pierre-Emmanuel Gaillardon
-
Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture MICRO 2019
Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel Emer, C. Thomas Gray, Brucek Khailany, Stephen W. Keckler
-
ShapeShifter: Enabling Fine-Grain Data Width Adaptation in Deep Learning MICRO 2019
Alberto Delmás Lascorz, Sayeh Sharify, Isak Edo, Dylan Malone Stuart, Omar Mohamed Awad, Patrick Judd, Mostafa Mahmoud, Milos Nikolic, Kevin Siu, Zissis Poulos, Andreas Moshovos
-
ZCOMP: Reducing DNN Cross-Layer Memory Footprint Using Vector Extensions MICRO 2019
Berkin Akin, Zeshan A. Chishti, Alaa R. Alameldeen
-
Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating MICRO 2019
Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, G. Edward Suh
-
SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks MICRO 2019
Ashish Gondimalla, Noah Chesnut, Mithuna Thottethodi, T. N. Vijaykumar
-
EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM MICRO 2019
Skanda Koppula, Lois Orosa, A. Giray Yağlıkçı, Roknoddin Azizi, Taha Shahroodi, Konstantinos Kanellopoulos, Onur Mutlu
-
eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference MICRO 2019
Chao-Tsung Huang, Yu-Chun Ding, Huan-Ching Wang, Chi-Wen Weng, Kai-Ping Lin, Li-Wei Wang, Li-De Chen
-
Efficient SpMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization MICRO 2019
Fazle Sadi, Joe Sweeney, Tze Meng Low, James C. Hoe, Larry Pileggi, Franz Franchetti
-
SuperNPU: An Extremely Fast Neural Processing Unit Using Superconducting Logic Devices MICRO 2020
Koki Ishida, Ilkwon Byun, Ikki Nagaoka, Kosuke Fukumitsu, Masamitsu Tanaka, Satoshi Kawakami, Teruo Tanimoto, Takatsugu Ono, Jangwoo Kim, Koji Inoue
-
Printed Machine Learning Classifiers MICRO 2020
Muhammad Husnain Mubarik, Dennis D. Weller, Nathaniel Bleier, Matthew Tomei, Jasmin Aghassi-Hagmann, Mehdi B. Tahoori, Rakesh Kumar
-
Look-Up Table based Energy Efficient Processing in Cache Support for Neural Network Acceleration MICRO 2020
Akshay Krishna Ramanathan, Gurpreet S Kalsi, Srivatsa Srinivasa, Tarun Makesh Chandran, Kamlesh R Pillai, Om J Omer, Vijaykrishnan Narayanan, Sreenivas Subramoney
-
ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning MICRO 2020
Sheng-Chun Kao, Geonhwa Jeong, Tushar Krishna
-
VR-DANN: Real-Time Video Recognition via Decoder-Assisted Neural Network Acceleration MICRO 2020
Zhuoran Song, Feiyang Wu, Xueyuan Liu, Jing Ke, Naifeng Jing, Xiaoyao Liang
-
Procrustes: A Dataflow and Accelerator for Sparse Deep Neural Network Training MICRO 2020
Dingqing Yang, Amin Ghasemazar, Xiaowei Ren, Maximilian Golub, Guy Lemieux, Mieszko Lis
-
Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores MICRO 2020
Hyeonjin Kim, Sungwoo Ahn, Yunho Oh, Bogil Kim, Won Woo Ro, William J. Song
-
DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture MICRO 2020
Liu Liu, Zheng Qu, Lei Deng, Fengbin Tu, Shuangchen Li, Xing Hu, Zhenyu Gu, Yufei Ding, Yuan Xie
-
TFE: Energy-Efficient Transferred Filter-Based Engine to Compress and Accelerate Convolutional Neural Networks MICRO 2020
Huiyu Mo, Leibo Liu, Wenjing Hu, Wenping Zhu, Qiang Li, Ang Li, Shouyi Yin, Jian Chen, Xiaowei Jiang, Shaojun Wei
-
MatRaptor: A Sparse-Sparse Matrix Multiplication Accelerator Based on Row-Wise Product MICRO 2020
Nitish Srivastava, Hanchen Jin, Jie Liu, David Albonesi, Zhiru Zhang
-
TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training MICRO 2020
Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, Andreas Moshovos
-
SAVE: Sparsity-Aware Vector Engine for Accelerating DNN Training and Inference on CPUs MICRO 2020
Zhangxiaowen Gong, Houxiang Ji, Christopher W. Fletcher, Christopher J. Hughes, Sara Baghsorkhi, Josep Torrellas
-
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference MICRO 2020
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos
-
EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference MICRO 2021
Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei
-
FPRaker: A Processing Element for Accelerating Neural Network Training MICRO 2021
Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos
-
RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance MICRO 2021
Udit Gupta, Samuel Hsia, Jeff Zhang, Mark Wilkening, Javin Pombra, Hsien-Hsin Sean Lee, Gu-Yeon Wei, Carole-Jean Wu, David Brooks
-
Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving MICRO 2021
Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song, Xin Fu
-
SERF: Efficient Scheduling for Fast Deep Neural Network Serving via Judicious Parallelism SC 2016
Feng Yan, Olatunji Ruwase, Yuxiong He, Evgenia Smirni
-
Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs SC 2016
Chao Li, Yi Yang, Min Feng, Srimat Chakradhar, Huiyang Zhou
-
dCUDA: Hardware Supported Overlap of Computation and Communication SC 2016
Tobias Gysi, Jeremia Bär, Torsten Hoefler
-
GreenLA: Green Linear Algebra Software for GPU-Accelerated Heterogeneous Computing SC 2016
Jieyang Chen, Li Tan, Panruo Wu, Dingwen Tao, Hongbo Li, Xin Liang, Sihuan Li, Rong Ge, Laxmi Bhuyan, Zizhong Chen
-
Merge-Based Parallel Sparse Matrix-Vector Multiplication (SpMV) SC 2016
Duane Merrill, Michael Garland
-
Large-Scale Hierarchical K-Means for Heterogeneous Many-Core Supercomputers SC 2018
Liandeng Li, Teng Yu, Wenlai Zhao, Haohuan Fu, Chenyu Wang, Li Tan, Guangwen Yang, John Thomson
-
TriCore: Parallel Triangle Counting on GPUs SC 2018
Yang Hu, Hang Liu, H. Howie Huang
-
PruneJuice: Pruning Trillion-Edge Graphs to a Precise Pattern-Matching Solution SC 2018
Tahsin Reza, Matei Ripeanu, Nicolas Tripoul, Geoffrey Sanders, Roger Pearce
-
Exploring Flexible Communications for Streamlining DNN Ensemble Training Pipelines SC 2018
Randall Pittman, Hui Guan, Xipeng Shen, Seung-Hwan Lim, Robert M. Patton
-
Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures SC 2018
Evangelos Georganas, Sasikanth Avancha, Kunal Banerjee, Dhiraj Kalamkar, Greg Henry, Hans Pabst, Alexander Heinecke
-
Fault Tolerant One-Sided Matrix Decompositions on Heterogeneous Systems with GPUs SC 2018
Jieyang Chen, Hongbo Li, Sihuan Li, Xin Liang, Panruo Wu, Dingwen Tao, Kaiming Ouyang, Yuanlai Liu, Kai Zhao, Qiang Guan, Zizhong Chen
-
Large-Batch Training for LSTM and Beyond SC 2019
Yang You, Jonathan Hseu, Chris Ying, James Demmel, Kurt Keutzer, Cho-Jui Hsieh
-
Channel and Filter Parallelism for Large-Scale CNN Training SC 2019
Nikoli Dryden, Naoya Maruyama, Tim Moon, Tom Benson, Marc Snir, Brian Van Essen
-
SparCML: High-Performance Sparse Communication for Machine Learning SC 2019
Cedric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler
-
PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration SC 2019
Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez
-
Scalable Reinforcement-Learning-Based Neural Architecture Search for Cancer Deep Learning Research SC 2019
Prasanna Balaprakash, Romain Egele, Misha Salim, Stefan Wild, Venkatram Vishwanath, Fangfang Xia, Tom Brettin, Rick Stevens
-
BSTC: A Novel Binarized-Soft-Tensor-Core Design for Accelerating Bit-Based Approximated Neural Nets SC 2019
Ang Li, Tong Geng, Tianqi Wang, Martin Herbordt, Shuaiwen Leon Song, Kevin Barker
-
A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery SC 2020
Ankit Srivastava, Sriram Chockalingam, Srinivas Aluru
-
Recurrent Neural Network Architecture Search for Geophysical Emulation SC 2020
Romit Maulik, Romain Egele, Bethany Lusch, Prasanna Balaprakash
-
Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity SC 2020
Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu
-
Sparse GPU Kernels for Deep Learning SC 2020
Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen
-
Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA SC 2020
Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka
Code: https://github.com/wahibium/SC20-KARMA-AD-Appendix-Description-
-
ZeRO: Memory optimizations Toward Training Trillion Parameter Models SC 2020
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
-
Kraken: Memory-Efficient Continual Learning for Large-Scale Real-Time Recommendations SC 2020
Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, Hongbo Ao, Wanhong Xu, Jiwu Shu
-
Optimizing Deep Learning Recommender Systems Training on CPU Cluster Architectures SC 2020
Dhiraj Kalamkar, Evangelos Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, Alexander Heinecke
-
Herring: Rethinking the Parameter Server at Scale for the Cloud SC 2020
Indu Thangakrishnan, Derya Cavdar, Can Karakus, Piyush Ghai, Yauheni Selivonchyk, Cory Pruce
-
GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training SC 2020
Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, Anil Parwani
-
Newton-ADMM: A Distributed GPU-Accelerated Optimizer for Multiclass Classification Problems SC 2020
Chih-Hao Fang, Sudhir B. Kylasa, Fred Roosta, Michael W. Mahoney, Ananth Grama
-
Reducing Communication in Graph Neural Network Training SC 2020
Alok Tripathy, Katherine Yelick, Aydın Buluç
-
FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems SC 2020
Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang
-
GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks SC 2020
Guyue Huang, Guohao Dai, Yu Wang, Huazhong Yang
-
KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks SC 2021
J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang
-
Tensor Processing Primitives: A Programming Abstraction for Efficiency And Portability in Deep Learning Workloads SC 2021
Evangelos Georganas, Dhiraj Kalamkar, Sasikanth Avancha, Menachem Adelman, Cristina Anderson, Alexander Breuer, Jeremy Bruestle, Narendra Chaudhary, Abhisek Kundu, Denise Kutnick, Frank Laub, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty, Hans Pabst, Barukh Ziv, Alexander Heinecke
-
ET: Re-Thinking Self-Attention for Transformer Models on GPUs SC 2021
Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, Zhuo Song, Tao Ma, Yong Yang, Chao Li, Minyi Guo
Code: https://github.com/cctry/SCpaper-2021/tree/aedab163f44bff8dfad3745d4f57972cb7640cda
-
Parallel Construction of Module Networks SC 2021
Ankit Srivastava, Sriram P. Chockalingam, Maneesha Aluru, Srinivas Aluru
-
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines SC 2021
Shigang Li, Torsten Hoefler
-
APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores SC 2021
Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, Yufei Ding
-
Scalable Edge-based Hyperdimensional Learning System with Brain-like Neural Adaptation SC 2021
Zhuowen Zou, Yeseong Kim, Farhad Imani, Haleh Alimohamadi, Rosario Cammarota, Mohsen Imani
-
Dr. Top-k: Delegate-Centric Top-k Computation on GPUs SC 2021
Anil Gaihre, Da Zheng, Scott Weitze, Lingda Li, Shuaiwen Leon Song, Caiwen Ding, Xiaoye S. Li, Hang Liu
-
Distributed Multigrid Neural Solver on Megavoxel Domains SC 2021
Aditya Balu, Sergio Botelho, Biswajit Khara, Vinay Rao, Soumik Sarkar, Chinmay Hegde, Adarsh Krishnamurthy, Santi Adavani, Baskar Ganapathysubramanian
-
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM SC 2021
Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia
-
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning SC 2021
Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He
-
FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers SC 2021
Zheng Chai, Yujing Chen, Ali Anwar, Liang Zhao, Yue Cheng, Huzefa Rangwala
-
DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks SC 2021
Vasimuddin Md, Sanchit Misra, Guixiang Ma, Ramanarayan Mohanty, Evangelos Georganas, Alexander Heinecke, Dhiraj Kalamkar, Nesreen K. Ahmed, Sasikanth Avancha
Code: dmlc/dgl#2914 (commit: cfb73e2)
-
Efficient Scaling of Dynamic Graph Neural Networks SC 2021
Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, Yogish Sabharwal, Toyotaro Suzumura, Shashanka Ubaru
-
Efficient Tensor Core-Based GPU Kernels for Structured Sparsity Under Reduced Precision SC 2021
Zhaodong Chen, Zheng Qu, Liu Liu, Yufei Ding, Yuan Xie
-
Clipper: A Low-Latency Online Prediction Serving System NSDI 2017
Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica
-
Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds NSDI 2017
Kevin Hsieh, Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R. Ganger, Phillip B. Gibbons, Onur Mutlu
-
TUX2: Distributed Graph Computation for Machine Learning NSDI 2017
Wencong Xiao, Jilong Xue, Youshan Miao, Zhen Li, Cheng Chen, Ming Wu, Wei Li, Lidong Zhou
-
Janus: Fast and Flexible Deep Learning via Symbolic Graph Execution of Imperative Programs NSDI 2019
Eunji Jeong, Sungwoo Cho, Gyeong-In Yu, Joo Seong Jeong, Dong-Jin Shin, Byung-Gon Chun
-
BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference? NSDI 2019
Suhas Jayaram Subramanya, Harsha Vardhan Simhadri, Srajan Garg, Anil Kag, Venkatesh Balasubramanian
-
Tiresias: A GPU Cluster Manager for Distributed Deep Learning NSDI 2019
Juncheng Gu, Mosharaf Chowdhury, Kang G. Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, Chuanxiong Guo
-
Elastic Resource Sharing for Distributed Deep Learning NSDI 2021
Changho Hwang, Taehyun Kim, Sunghyun Kim, Jinwoo Shin, KyoungSoo Park
-
ATP: In-network Aggregation for Multi-tenant Learning NSDI 2021
ChonLam Lao, Yanfang Le, Kshiteej Mahajan, Yixi Chen, Wenfei Wu, Aditya Akella, Michael Swift
-
On the Use of ML for Blackbox System Performance Prediction NSDI 2021
Silvery Fu, Saurabh Gupta, Radhika Mittal, Sylvia Ratnasamy
-
Scaling Distributed Machine Learning with In-Network Aggregation NSDI 2021
Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan Ports, Peter Richtarik
-
In-Datacenter Performance Analysis of a Tensor Processing Unit ISCA 2017
Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Hyun Yoon
-
SCALEDEEP: A Scalable Compute Architecture for Learning and Evaluating Deep Networks ISCA 2017
Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, Sasikanth Avancha, Ashok Jagannathan, Ajaya Durg, Dheemanth Nagaraj, Bharat Kaul, Pradeep Dubey, Anand Raghunathan
-
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks ISCA 2017
Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally
-
Maximizing CNN Accelerator Efficiency Through Resource Partitioning ISCA 2017
Yongming Shen, Michael Ferdman, Peter Milder
-
Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism ISCA 2017
Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, Scott Mahlke
-
Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent ISCA 2017
Christopher De Sa, Matthew Feldman, Christopher Ré, Kunle Olukotun
-
PROMISE: An End-to-End Design of a Programmable Mixed-Signal Accelerator for Machine-Learning Algorithms ISCA 2018
Prakalp Srivastava, Mingu Kang, Sujan K. Gonugondla, Sungmin Lim, Jungwook Choi, Vikram Adve, Nam Sung Kim, Naresh Shanbhag
-
Computation Reuse in DNNs by Exploiting Input Similarity ISCA 2018
Marc Riera, Jose-Maria Arnau, Antonio González
-
GenAx: A Genome Sequencing Accelerator ISCA 2018
Daichi Fujiki, Aran Subramaniyan, Tianjun Zhang, Yu Zeng, Reetuparna Das, David Blaauw, Satish Narayanasamy
-
GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks ISCA 2018
Amir Yazdanbakhsh, Kambiz Samadi, Nam Sung Kim, Hadi Esmaeilzadeh
-
SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks ISCA 2018
Vahideh Akhlaghi, Amir Yazdanbakhsh, Kambiz Samadi, Rajesh K. Gupta, Hadi Esmaeilzadeh
-
UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition ISCA 2018
Kartik Hegde, Jiyong Yu, Rohit Agrawal, Mengjia Yan, Michael Pellauer, Christopher W. Fletcher
-
Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation ISCA 2018
Eunhyeok Park, Dongyoung Kim, Sungjoo Yoo
-
Prediction based Execution on Deep Neural Networks ISCA 2018
Mingcong Song, Jiechen Zhao, Yang Hu, Jiaqi Zhang, Tao Li
-
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network ISCA 2018
Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Vikas Chandra, Hadi Esmaeilzadeh
-
Gist: Efficient Data Encoding for Deep Neural Network Training ISCA 2018
Animesh Jain, Amar Phanishayee, Jason Mars, Lingjia Tang, Gennady Pekhimenko
-
The Dark Side of DNN Pruning ISCA 2018
Reza Yazdani, Marc Riera, Jose-Maria Arnau, Antonio González
-
Sparse ReRAM Engine: Joint exploration of activation and weight sparsity on compressed neural network ISCA 2019
Tzu-Hsien Yang, Hsiang-Yun Cheng, Chia-Lin Yang, I-Ching Tseng, Han-Wen Hu, Hung-Sheng Chang, Hsiang-Pang Li
-
MnnFast: A Fast and Scalable System Architecture for Memory-Augmented Neural Networks ISCA 2019
Hanhwi Jang, Joonsung Kim, Jae-Eon Jo, Jaewon Lee, Jangwoo Kim
-
TIE: Energy-efficient tensor train-based inference engine for deep neural network ISCA 2019
Chunhua Deng, Fangxuan Sun, Xuehai Qian, Jun Lin, Zhongfeng Wang, Bo Yuan
-
Accelerating Distributed Reinforcement Learning with In-Switch Computing ISCA 2019
Youjie Li, Iou-Jen Liu, Yifan Yuan, Deming Chen, Alexander Schwing, Jian Huang
-
Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks ISCA 2019
Jiaqi Zhang, Xiangru Chen, Mingcong Song, Tao Li
-
Laconic Deep Learning Inference Acceleration ISCA 2019
Sayeh Sharify, Alberto Delmas Lascorz, Mostafa Mahmoud, Milos Nikolic, Kevin Siu, Dylan Malone Stuart, Zissis Poulos, Andreas Moshovos
-
DeepAttest: An End-to-End Attestation Framework for Deep Neural Networks ISCA 2019
Huili Chen, Cheng Fu, Bita Darvish Rouhani, Jishen Zhao, Farinaz Koushanfar
-
High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs ISCA 2020
Glenn Henry, Parviz Palangpour, Michael Thomson, J Scott Gardner, Bryce Arden, Kimble Houck, Jonathan Johnson, Kyle O'Brien, Scott Petersen, Benjamin Seroussi, Tyler Walker
-
Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads ISCA 2020
Dennis Abts, Jonathan Ross, Jon Sparling, Mark Wong-VanHaren, Max Baker, Tom Hawkins, Andrew Bell, John Thompson, Teme Kahsai, Garrin Kimmell, Jennifer Hwang, Rebekah Leslie-Hurd, Michael Bye, Rogan Creswick, Matthew Boyd, Mahitha Venigalla, Evan Laforge, Jon Purdy, Utham Kamath, Dinesh Maheshwari, Michael Beidler, Geert Rosseel, Omar Ahmad, Gleb Gagarin, Rick Czekalski, Ashay Rane, Sahil Parmar
-
Gorgon: Accelerating Machine Learning from Relational Data ISCA 2020
Matthew Vilim, Alexander Rucker, Yaqi Zhang, Sophia Liu, Kunle Olukotun
-
SpinalFlow: An Architecture and Dataflow Tailored for Spiking Neural Networks ISCA 2020
Surya Narayanan, Karl Taht, Rajeev Balasubramonian, Edouard Giacomin, Pierre-Emmanuel Gaillardon
-
NEBULA: A Neuromorphic Spin-Based Ultra-Low Power Architecture for SNNs and ANNs ISCA 2020
Sonali Singh, Anup Sarma, Nicholas Jao, Ashutosh Pattnaik, Sen Lu, Kezhou Yang, Abhronil Sengupta, Vijaykrishnan Narayanan, Chita R. Das
-
uGEMM: Unary Computing Architecture for GEMM Applications ISCA 2020
Di Wu, Jingjie Li, Ruokai Yin, Hsuan Hsiao, Younghyun Kim, Joshua San Miguel
-
Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs ISCA 2020
Esha Choukse, Michael B. Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David Nellans, Stephen W. Keckler
-
A Multi-Neural Network Acceleration Architecture ISCA 2020
Eunjin Baek, Dongup Kwon, Jangwoo Kim
-
SmartExchange: Trading Higher-Cost Memory Storage/Access for Lower-Cost Computation ISCA 2020
Yang Zhao, Xiaohan Chen, Yue Wang, Chaojian Li, Haoran You, Yonggan Fu, Yuan Xie, Zhangyang Wang, Yingyan Lin
-
Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations ISCA 2020
Ranggi Hwang, Taehun Kim, Youngeun Kwon, Minsoo Rhu
-
DeepRecSys: A System for Optimizing End-to-End At-Scale Neural Recommendation Inference ISCA 2020
Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu
-
An In-Network Architecture for Accelerating Shared-Memory Multiprocessor Collectives ISCA 2020
Benjamin Klenk, Nan Jiang, Greg Thorson, Larry Dennison
-
DRQ: Dynamic Region-Based Quantization for Deep Neural Network Acceleration ISCA 2020
Zhuoran Song, Bangqi Fu, Feiyang Wu, Zhaoming Jiang, Li Jiang, Naifeng Jing, Xiaoyao Liang
-
RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference ISCA 2021
Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Wang, Sanchari Sen, Jintao Zhang, Ankur Agrawal, Monodeep Kar, Shubham Jain, Alberto Mannari, Hoang Tran, Yulong Li, Eri Ogawa, Kazuaki Ishizaki, Hiroshi Inoue, Marcel Schaal, Mauricio Serrano, Jungwook Choi, Xiao Sun, Naigang Wang, Chia-Yu Chen, Allison Allain, James Bonano, Nianzheng Cao, Robert Casatuta, Matthew Cohen, Bruce Fleischer, Michael Guillorn, Howard Haynie, Jinwook Jung, Mingu Kang, Kyu-hyoun Kim, Siyu Koswatta, Saekyu Lee, Martin Lutz, Silvia Mueller, Jinwook Oh, Ashish Ranjan, Zhibin Ren, Scot Rider, Kerstin Schelm, Michael Scheuermann, Joel Silberman, Jie Yang, Vidhi Zalani, Xin Zhang, Ching Zhou, Matt Ziegler, Vinay Shah, Moriyoshi Ohara, Pong-Fei Lu, Brian Curran, Sunil Shukla, Leland Chang, Kailash Gopalakrishnan
-
REDUCT: Keep It Close, Keep It Cool! - Scaling DNN Inference on Multi-Core CPUs with Near-Cache Compute ISCA 2021
Anant V. Nori, Rahul Bera, Shankar Balachandran, Joydeep Rakshit, Om J. Omer, Avishaii Abuhatzera, Belliappa Kuttanna, Sreenivas Subramoney
-
Communication Algorithm-Architecture Co-Design for Distributed Deep Learning ISCA 2021
Jiayi Huang, Pritam Majumder, Sungkeun Kim, Abdullah Muzahid, Ki Hwan Yum, Eun Jung Kim
-
Hetero-ViTAL: A Virtualization Stack for Heterogeneous FPGA Clusters ISCA 2021
Yue Zha, Jing Li
-
Enabling Compute-Communication Overlap in Distributed Deep Learning Training Platforms ISCA 2021
Saeed Rashidi, Matthew Denton, Srinivas Sridharan, Sudarshan Srinivasan, Amoghavarsha Suresh, Jade Nie, Tushar Krishna
-
CoSA: Scheduling by Constrained Optimization for Spatial Accelerators ISCA 2021
Qijing Huang, Aravind Kalaiah, Minwoo Kang, James Demmel, Grace Dinh, John Wawrzynek, Thomas Norell, Yakun Sophia Shao
-
η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities ISCA 2021
Xingyao Zhang, Haojun Xia, Donglin Zhuang, Hao Sun, Xin Fu, Michael B. Taylor, Shuaiwen Leon Song
-
SPACE: Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations ISCA 2021
Hongju Kal, Seokmin Lee, Gun Ko, Won Woo Ro
-
ELSA: Hardware-Software Co-Design for Efficient, Lightweight Self-Attention Mechanism in Neural Networks ISCA 2021
Tae Jun Ham, Yejin Lee, Seong Hoon Seo, Soosung Kim, Hyunji Choi, Sung Jun Jung, Jae W. Lee
-
Cambricon-Q: A Hybrid Architecture for Efficient Training ISCA 2021
Yongwei Zhao, Chang Liu, Zidong Du, Qi Guo, Xing Hu, Yimin Zhuang, Zhenxing Zhang, Xinkai Song, Wei Li, Xishan Zhang, Ling Li, Zhiwei Xu, Tianshi Chen
-
TENET: A Framework for Modeling Tensor Dataflow Based on Relation-Centric Notation ISCA 2021
Liqiang Lu, Naiqing Guan, Yuyue Wang, Liancheng Jia, Zizhang Luo, Jieming Yin, Jason Cong, Yun Liang
-
NASGuard: A Novel Accelerator Architecture for Robust Neural Architecture Search (NAS) Networks ISCA 2021
Xingbin Wang, Boyan Zhao, Rui Hou, Amro Awad, Zhihong Tian, Dan Meng
-
NASA: Accelerating Neural Network Design with a NAS Processor ISCA 2021
Xiaohan Ma, Chang Si, Ying Wang, Cheng Liu, Lei Zhang
-
NN-Baton: DNN Workload Orchestration and Chiplet Granularity Exploration for Multichip Accelerators ISCA 2021
Zhanhong Tan, Hongyu Cai, Runpei Dong, Kaisheng Ma
-
HASCO: Towards Agile Hardware and Software CO-design for Tensor Computation ISCA 2021
Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang
-
Dual-Side Sparse Tensor Core ISCA 2021
Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng
-
RingCNN: Exploiting Algebraically-Sparse Ring Tensors for Energy-Efficient CNN-Based Computational Imaging ISCA 2021
Chao-Tsung Huang
-
GoSPA: An Energy-Efficient High-Performance Globally Optimized SParse Convolutional Neural Network Accelerator ISCA 2021
Chunhua Deng, Yang Sui, Siyu Liao, Xuehai Qian, Bo Yuan
-
TVM: End-to-End Compilation Stack for Deep Learning SysML 2018
Tianqi Chen, Thierry Moreau, Ziheng Jiang, Haichen Shen, Eddie Yan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
-
Robust Gradient Descent via Moment Encoding with LDPC Codes SysML 2018
Raj Kumar Maity, Ankit Singh Rawat, Arya Mazumdar
-
Analog electronic deep networks for fast and efficient inference SysML 2018
Jonathan Binas, Daniel Neil, Giacomo Indiveri, Shih-Chii Liu, Michael Pfeiffer
-
YellowFin: Adaptive optimization for (A)synchronous systems SysML 2018
Jian Zhang, Ioannis Mitliagkas
-
Understanding the Limitations of Current Energy-Efficient Design Approaches for Deep Neural Networks SysML 2018
Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, Vivienne Sze
-
"I Like the Way You Think!" - Inspecting the Internal Logic of Recurrent Neural Networks SysML 2018
Thibault Sellam, Kevin Lin, Ian Yiran Huang, Carl Vondrick, Eugene Wu
-
TicTac: Accelerating Distributed Deep Learning with Communication Scheduling SysML 2019
Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, Roy Campbell
-
Priority-based Parameter Propagation for Distributed DNN Training SysML 2019
Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, Gennady Pekhimenko
-
BlueConnect: Decomposing All-Reduce for Deep Learning on Heterogeneous Network Hierarchy SysML 2019
Minsik Cho, Ulrich Finkler, David Kung
-
Beyond Data and Model Parallelism for Deep Neural Networks SysML 2019
Zhihao Jia, Matei Zaharia, Alex Aiken
-
ParMAC: Distributed Optimisation of Nested Functions, with Application to Learning Binary Autoencoders SysML 2019
Miguel A Carreira-Perpinan, Mehdi Alizadeh
-
3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning SysML 2019
Hyeontaek Lim, David G Andersen, Michael Kaminsky
-
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD SysML 2019
Jianyu Wang, Gauri Joshi
-
YellowFin and the Art of Momentum Tuning SysML 2019
Jian Zhang, Ioannis Mitliagkas
-
AGGREGATHOR: Byzantine Machine Learning via Robust Gradient Aggregation SysML 2019
Georgios Damaskinos, El Mahdi El Mhamdi, Rachid Guerraoui, Arsany Guirguis, Sébastien Rouault
-
Towards Federated Learning at Scale: System Design SysML 2019
Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloé Kiddon, Jakub Konečný, Stefano Mazzocchi, Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander
-
Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification SysML 2019
Qi Lei, Lingfei Wu, Pin-Yu Chen, Alex Dimakis, Inderjit Dhillon, Michael J Witbrock
-
To compress or not to compress: Understanding the Interactions between Adversarial Attacks and Neural Network Compression SysML 2019
Ilia Shumailov, Yiren Zhao, Robert Mullins, Ross Anderson
-
Scaling Video Analytics on Constrained Edge Nodes SysML 2019
Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David G Andersen, Michael Kaminsky, Subramanya R. Dulloor
-
CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video SysML 2019
Huizi Mao, Taeyoung Kong, William J. Dally
-
AdaScale: Towards Real-time Video Object Detection using Adaptive Scaling SysML 2019
Ting-Wu Chin, Ruizhou Ding, Diana Marculescu
-
FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning SysML 2019
Paul Whatmough, Chuteng Zhou, Patrick Hansen, Shreyas Venkataramanaiah, Jae-sun Seo, Matthew Mattina
-
Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator SysML 2019
Tian Zhao, Yaqi Zhang, Kunle Olukotun
-
Restructuring Batch Normalization to Accelerate CNN Training SysML 2019
Wonkyung Jung, Daejin Jung, Byeongho Kim, Sunjung Lee, Wonjong Rhee, Jung Ho Ahn
Code: https://github.com/scale-snu/caffe-bn-restructuring, https://github.com/scale-snu/mkldnn-bn-restructuring
-
Bandana: Using Non-Volatile Memory for Storing Deep Learning Models SysML 2019
Assaf Eisenman, Maxim Naumov, Darryl Gardner, Misha Smelyanskiy, Sergey Pupyrev, Kim Hazelwood, Asaf Cidon, Sachin Katti
-
Mini-batch Serialization: CNN Training with Inter-layer Data Reuse SysML 2019
Sangkug Lym, Armand Behroozi, Wei Wen, Ge Li, Yongkee Kwon, Mattan Erez
-
Accurate and Efficient 2-bit Quantized Neural Networks SysML 2019
Jungwook Choi, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Kailash Gopalakrishnan, Zhuo Wang, Pierce Chuang
-
Full Deep Neural Network Training on a Pruned Weight Budget SysML 2019
Mieszko Lis, Maximilian Golub, Guy Lemieux
-
Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment SysML 2019
Cedric Renggli, Bojan Karlaš, Bolin Ding, Feng Liu, Kevin Schawinski, Wentao Wu, Ce Zhang
-
Data Validation for Machine Learning SysML 2019
Neoklis Polyzotis, Martin Zinkevich, Sudip Roy, Eric Breck, Steven Whang
-
Kernel machines that adapt to GPUs for effective large batch training SysML 2019
Siyuan Ma, Mikhail Belkin
-
Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications SysML 2019
Dibakar Gope, Ganesh Dasika, Matthew Mattina
-
Optimizing DNN Computation with Relaxed Graph Substitutions SysML 2019
Zhihao Jia, James Thomas, Todd Warszawski, Mingyu Gao, Matei Zaharia, Alex Aiken
-
Pytorch-BigGraph: A Large Scale Graph Embedding System SysML 2019
Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhijit Bose, Alex Peysakhovich
-
RLgraph: Modular Computation Graphs for Deep Reinforcement Learning SysML 2019
Michael Schaarschmidt, Sven Mika, Kai Fricke, Eiko Yoneki
-
TensorFlow Eager: A multi-stage, Python-embedded DSL for machine learning SysML 2019
Akshay Agrawal, Akshay Naresh Modi, Alexandre Passos, Allen Lavoie, Ashish Agarwal, Asim Shankar, Igor Ganichev, Josh Levenberg, Mingsheng Hong, Rajat Monga, Shanqing Cai
-
AutoGraph: Imperative-style Coding with Graph-based Performance SysML 2019
Dan Moldovan, James Decker, Fei Wang, Andrew Johnson, Brian Lee, Zack Nado, D Sculley, Tiark Rompf, Alexander B Wiltschko
Code: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/autograph
-
TensorFlow.js: Machine Learning for the Web and Beyond SysML 2019
Daniel Smilkov, Nikhil Thorat, Yannick Assogba, Charles Nicholson, Nick Kreeger, Ping Yu, Shanqing Cai, Eric Nielsen, David Soegel, Stan Bileschi, Michael Terry, Ann Yuan, Kangyi Zhang, Sandeep Gupta, Sarah Sirajuddin, D Sculley, Rajat Monga, Greg Corrado, Fernanda Viegas, Martin M Wattenberg
-
A System for Massively Parallel Hyperparameter Tuning SysML 2020
Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Jonathan Ben-tzur, Moritz Hardt, Benjamin Recht, Ameet Talwalkar
-
PLink: Discovering and Exploiting Locality for Accelerated Distributed Training on the public Cloud SysML 2020
Liang Luo, Peter West, Jacob Nelson, Arvind Krishnamurthy, Luis Ceze
-
Federated Optimization in Heterogeneous Networks SysML 2020
Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith
-
BPPSA: Scaling Back-propagation by Parallel Scan Algorithm SysML 2020
Shang Wang, Yifan Bai, Gennady Pekhimenko
-
Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems SysML 2020
Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Ruiquan Ding, Mingming Sun, Ping Li
-
Resource Elasticity in Distributed Deep Learning SysML 2020
Andrew Or, Haoyu Zhang, Michael Freedman
-
SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems SysML 2020
Beidi Chen, Tharun Medini, James Farwell, Sameh Gobriel, Charlie Tai, Anshumali Shrivastava
-
FLEET: Flexible Efficient Ensemble Training for Heterogeneous Deep Neural Networks SysML 2020
Hui Guan, Laxmikant Kishor Mokadam, Xipeng Shen, Seung-Hwan Lim, Robert Patton
-
Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization SysML 2020
Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Joseph Gonzalez, Kurt Keutzer, Ion Stoica
-
What is the State of Neural Network Pruning? SysML 2020
Davis Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, John Guttag
-
SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems SysML 2020
Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, Wen-Mei Hwu, Deming Chen
-
MNN: A Universal and Efficient Inference Engine SysML 2020
Xiaotang Jiang, Huan Wang, Yiliu Chen, Ziqi Wu, Lichuan Wang, Bin Zou, Yafeng Yang, Zongyang Cui, Yu Cai, Tianhang Yu, Chengfei Lyu, Zhihua Wu
-
Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference SysML 2020
Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar, Peter Bailis, Matei Zaharia
-
Attention-based Learning for Missing Data Imputation in HoloClean SysML 2020
Richard Wu, Aoqian Zhang, Ihab Ilyas, Theodoros Rekatsinas
-
Privacy-Preserving Bandits SysML 2020
Mohammad Malekzadeh, Dimitrios Athanasakis, Hamed Haddadi, Ben Livshits
Code: https://github.com/mmalekzadeh/privacy-preserving-bandits
-
Understanding the Downstream Instability of Word Embeddings SysML 2020
Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher Aberger, Christopher Re
-
Model Assertions for Monitoring and Improving ML Models SysML 2020
Daniel Kang, Deepti Raghavan, Peter Bailis, Matei Zaharia
-
AutoPhase: Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning SysML 2020
Ameer Haj-Ali, Qijing (Jenny) Huang, John Xiang, William Moses, Krste Asanovic, John Wawrzynek, Ion Stoica
-
Automatically batching control-intensive programs for modern accelerators SysML 2020
Alexey Radul, Brian Patton, Dougal Maclaurin, Matthew Hoffman, Rif A. Saurous
-
Predictive Precompute with Recurrent Neural Networks SysML 2020
Hanson Wang, Zehui Wang, Yuanyuan Ma
-
Sense & Sensitivities: The Path to General-Purpose Algorithmic Differentiation SysML 2020
Mike Innes
-
Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices SysML 2020
Byung Hoon Ahn, Jinwon Lee, Jamie Menjay Lin, Hsin-Pai Cheng, Jilei Hou, Hadi Esmaeilzadeh
-
Fine-Grained GPU Sharing Primitives for Deep Learning Applications SysML 2020
Peifeng Yu, Mosharaf Chowdhury
-
Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc SysML 2020
Zhihao Jia, Sina Lin, Mingyu Gao, Matei Zaharia, Alex Aiken
-
OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator SysML 2020
Junki Park, Hyunsung Yoon, Daehyun Ahn, Jungwook Choi, Jae-Joon Kim
-
PoET-BiN: Power Efficient Tiny Binary Neurons SysML 2020
Sivakumar Chidambaram, Pierre Langlois, Jean-Pierre David
-
Memory-Driven Mixed Low Precision Quantization for Enabling Deep Network Inference on Microcontrollers SysML 2020
Manuele Rusci, Alessandro Capotondi, Luca Benini
Code: https://github.com/mrusci/training-mixed-precision-quantized-networks
-
Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks SysML 2020
Sambhav Jain, Albert Gural, Michael Wu, Chris Dick
-
Riptide: Fast End-to-End Binarized Neural Networks SysML 2020
Joshua Fromm, Meghan Cowan, Matthai Philipose, Luis Ceze, Shwetak Patel
-
Searching for Winograd-aware Quantized Networks SysML 2020
Javier Fernandez-Marques, Paul Whatmough, Andrew Mundy, Matthew Mattina
-
Blink: Fast and Generic Collectives for Distributed ML SysML 2020
Guanhua Wang, Shivaram Venkataraman, Amar Phanishayee, Nikhil Devanur, Jorgen Thelin, Ion Stoica
-
A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms SysML 2020
Yu Wang, Gu-Yeon Wei, David Brooks
-
MotherNets: Rapid Deep Ensemble Learning SysML 2020
Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, Stratos Idreos
-
MLPerf Training Benchmark SysML 2020
Peter Mattson, Christine Cheng, Gregory Diamos, Cody Coleman, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debo Dutta, Udit Gupta, Kim Hazelwood, Andy Hock, Xinyuan Huang, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Deepak Narayanan, Tayo Oguntebi, Gennady Pekhimenko, Lillian Pentecost, Vijay Janapa Reddi, Taylor Robie, Tom St John, Carole-Jean Wu, Lingjie Xu, Cliff Young, Matei Zaharia
-
Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick SysML 2021
Isak Edo Vivancos, Sayeh Sharify, Daniel Ly-Ma, Ameer Abdelhadi, Ciaran Bannon, Milos Nikolic, Mostafa Mahmoud, Alberto Delmas Lascorz
-
To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks SysML 2021
Xiaohu Tang, Shihao Han, Li Lyna Zhang, Ting Cao, Yunxin Liu
-
Cortex: A Compiler for Recursive Deep Learning Models SysML 2021
Pratik Fegade, Tianqi Chen, Phillip Gibbons, Todd Mowry
-
Adaptive Gradient Communication via Critical Learning Regime Identification SysML 2021
Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos
-
Exploring the Limits of Concurrency in ML Training on Google TPUs SysML 2021
Sameer Kumar, Yu Wang, Cliff Young, James Bradbury, Naveen Kumar, Dehao Chen, Andy Swing
-
Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy SysML 2021
Lucas Liebenwein, Cenk Baykal, Brandon Carter, David Gifford, Daniela Rus
-
Learning Fitness Functions for Machine Programming SysML 2021
Shantanu Mandal, Todd Anderson, Javier Turek, Justin Gottschlich, Shengtian Zhou, Abdullah Muzahid
-
Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More SysML 2021
Shabnam Daghaghi, Nicholas Meisburger, Mengnan Zhao, Anshumali Shrivastava
-
IOS: Inter-Operator Scheduler for CNN Acceleration SysML 2021
Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han
Code: https://github.com/mit-han-lab/inter-operator-scheduler
-
A Deep Learning Based Cost Model for Automatic Code Optimization SysML 2021
Riyadh Baghdadi, Massinissa Merouani, Mohamed-Hicham Leghettas, Kamel Abdous, Taha Arbaoui, Karima Benatchba, Saman Amarasinghe
-
Don't Forget to Sign the Gradients! SysML 2021
Omid Aramoon, Pin-Yu Chen, Gang Qu
-
Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference SysML 2021
Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang
-
Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators SysML 2021
Hamzah Abdelaziz, Ali Shafiee, Jong Hoon Shin, Ardavan Pedram, Joseph Hassoun
-
Swift for TensorFlow: A portable, flexible platform for deep learning SysML 2021
Brennan Saeta, Denys Shabalin
-
Equality Saturation for Tensor Graph Superoptimization SysML 2021
Yichen Yang, Phitchaya Phothilimthana, Yisu Wang, Max Willsey, Sudip Roy, Jacques Pienaar
-
PipeMare: Asynchronous Pipeline Parallel DNN Training SysML 2021
Bowen Yang, Jian Zhang, Jonathan Li, Christopher Re, Christopher Aberger, Christopher De Sa
-
An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems SysML 2021
Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini
-
Value Learning for Throughput Optimization of Deep Learning Workloads SysML 2021
Benoit Steiner, Chris Cummins, Horace He, Hugh Leather
-
Scaling Distributed Training with Adaptive Summation SysML 2021
Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum
-
Learning on Distributed Traces for Data Center Storage Systems SysML 2021
Giulio Zhou, Martin Maas
-
Pufferfish: Communication-efficient Models At No Extra Cost SysML 2021
Hongyi Wang, Saurabh Agarwal, Dimitris Papailiopoulos
-
A Learned Performance Model for Tensor Processing Units SysML 2021
Sam Kaufman, Phitchaya Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows
-
Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters SysML 2021
Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu
-
ModularNAS: Towards Modularized and Reusable Neural Architecture Search SysML 2021
Yunfeng Lin, Guilin Li, Xing Zhang, Weinan Zhang, Bo Chen, Ruiming Tang, Zhenguo Li, Jiashi Feng, Yong Yu
Code: https://github.com/huawei-noah/vega/tree/master/vega/algorithms/nas/modnas
-
FLAML: A Fast and Lightweight AutoML Library SysML 2021
Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu
-
TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models SysML 2021
Chunxing Yin, Bilge Acun, Carole-Jean Wu, Xing Liu
-
SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection SysML 2021
Yue Zhao, Xiyang Hu, Cheng Cheng, Cong Wang, Changlin Wan, Wen Wang, Jianing Yang, Haoping Bai, Zheng Li, Cao Xiao, Yunlong Wang, Zhi Qiao, Jimeng Sun, Leman Akoglu
-
Pipelined Backpropagation at Scale: Training Large Models without Batches SysML 2021
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster
-
Fluid: Resource-aware Hyperparameter Tuning Engine SysML 2021
Peifeng Yu, Jiachen Liu, Mosharaf Chowdhury
-
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers SysML 2021
Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul Whatmough
-
Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices SysML 2021
Urmish Thakker, Paul Whatmough, Zhigang Liu, Matthew Mattina, Jesse Beu
-
A Distributed Graph-Theoretic Framework for Automatic Parallelization in Multi-core Systems SysML 2021
Guixiang Ma, Yao Xiao, Theodore Willke, Nesreen Ahmed, Shahin Nazarian, Paul Bogdan
-
Bit Error Robustness for Energy-Efficient DNN Accelerators SysML 2021
David Stutz, Nandhini Chandramoorthy, Matthias Hein, Bernt Schiele
-
Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models SysML 2021
Shang Wang, Peiming Yang, Yuxuan Zheng, Xin Li, Gennady Pekhimenko
-
Characterizing and Taming Model Instability Across Edge Devices SysML 2021
Eyal Cidon, Evgenya Pergament, Zain Asgar, Asaf Cidon, Sachin Katti
-
Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery SysML 2021
Kiwan Maeng, Shivam Bharuka, Isabel Gao, Mark Jeffrey, Vikram Saraph, Bor-Yiing Su, Caroline Trippel, Jiyan Yang, Mike Rabbat, Brandon Lucia, Carole-Jean Wu
-
FirePlace: Placing Firecracker Virtual Machines with Hindsight Imitation SysML 2021
Bharathan Balaji, Christopher Kakovitch, Balakrishnan Narayanaswamy
-
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data SysML 2021
Guanhua Wang, Zhuang Liu, Brandon Hsieh, Siyuan Zhuang, Joseph Gonzalez, Trevor Darrell, Ion Stoica
-
Larq Compute Engine: Design, Benchmark and Deploy State-of-the-Art Binarized Neural Networks SysML 2021
Tom Bannink, Adam Hillier, Lukas Geiger, Tim de Bruin, Leon Overweel, Jelmer Neeven, Koen Helwegen
-
Wavelet: Efficient DNN Training with Tick-Tock Scheduling SysML 2021
Guanhua Wang, Kehan Wang, Kenan Jiang, Xiangjun Li, Ion Stoica
-
Data Movement Is All You Need SysML 2021
Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
-
Scaling Polyhedral Neural Network Verification on GPUs SysML 2021
Christoph Müller, François Serre, Gagandeep Singh, Markus Püschel, Martin Vechev
-
Accounting for Variance in Machine Learning Benchmarks SysML 2021
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Nazanin Mohammadi Sepahvand, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Tal Arbel, Chris Pal, Gael Varoquaux, Pascal Vincent
-
Amazon SageMaker Debugger: A System for Real-Time Insights into Machine Learning Model Training SysML 2021
Nathalie Rauschmayr, Vikas Kumar, Rahul Huilgol, Andrea Olgiati, Satadal Bhattacharjee, Nihal Harish, Vandana Kannan, Amol Lele, Anirudh Acharya, Jared Nielsen, Lakshmi Ramakrishnan, Ishan Bhatt, Kohen Chia, Neelesh Dodda, Zhihan Li, Jiacheng Gu, Miyoung Choi, Balajee Nagarajan, Jeffrey Geevarghese, Denis Davydenko, Sifei Li, Lu Huang, Edward Kim, Tyler Hill, Krishnaram Kenthapadi
-
RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads SysML 2021
James Gleeson, Moshe Gabel, Gennady Pekhimenko, Eyal de Lara, Srivatsan Krishnan, Vijay Janapa Reddi
-
TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems SysML 2021
Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Tiezhen Wang, Pete Warden, Rocky Rhodes
-
ByzShield: An Efficient and Robust System for Distributed Training SysML 2021
Konstantinos Konstantinidis, Aditya Ramamoorthy
-
In-network Aggregation for Shared Machine Learning Clusters SysML 2021
Nadeen Gebara, Manya Ghobadi, Paolo Costa
-
MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions SysML 2021
Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso
-
Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity SysML 2021
Toshiaki Wakatsuki, Sekitoshi Kanai, Yasuhiro Fujiwara
-
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference SysML 2021
Steve Dai, Rangharajan Venkatesan, Mark Ren, Brian Zimmer, William Dally, Brucek Khailany
-
CODE: Compiler-based Neuron-aware Ensemble training SysML 2021
Ettore M. G. Trainiti, Thanapon Noraset, David Demeter, Doug Downey, Simone Campanoni
-
SystemML: Declarative Machine Learning on Spark VLDB 2016
Matthias Boehm, Michael W. Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Arvind C. Surve, Shirish Tatikonda
-
Compressed linear algebra for large-scale machine learning VLDB 2016
Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald
-
dmapply: A functional primitive to express distributed machine learning algorithms in R VLDB 2016
Edward Ma, Vishrut Gupta, Meichun Hsu, Indrajit Roy
-
MLog: Towards Declarative In-Database Machine Learning VLDB 2017
Xupeng Li, Bin Cui, Yiru Chen, Wentao Wu, Ce Zhang
-
Towards Linear Algebra over Normalized Data VLDB 2017
Lingjiao Chen, Arun Kumar, Jeffrey Naughton, Jignesh M. Patel
-
Distributed Join Algorithms on Thousands of Cores VLDB 2017
Claude Barthels, Ingo Müller, Timo Schneider, Gustavo Alonso, Torsten Hoefler
-
Probabilistic demand forecasting at scale VLDB 2017
Joos-Hendrik Böse, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Dustin Lange, David Salinas, Sebastian Schelter, Matthias Seeger, Yuyang Wang
-
Optimizing Deep CNN-Based Queries over Video Streams at Scale VLDB 2017
Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, Matei Zaharia
-
On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML VLDB 2018
Matthias Boehm, Berthold Reinwald, Dylan Hutchison, Prithviraj Sen, Alexandre V. Evfimievski, Niketan Pansare
-
Helix: Accelerating Human-in-the-loop Machine Learning VLDB 2018
Doris Xin, Litian Ma, Jialin Liu, Stephen Macke, Shuchen Song, Aditya Parameswaran
-
Ease.ml in action: towards multi-tenant declarative learning services VLDB 2018
Bojan Karlaš, Ji Liu, Wentao Wu, Ce Zhang
-
Learning Efficiently Over Heterogeneous Databases VLDB 2018
Jose Picado, Arash Termehchy, Sudhanshu Pathak
-
MLBench: Benchmarking Machine Learning Services Against Human Experts VLDB 2018
Yu Liu, Hantian Zhang, Luyuan Zeng, Wentao Wu, Ce Zhang
-
Distributed Representations of Tuples for Entity Resolution VLDB 2018
Muhammad Ebraheem, Saravanan Thirumuruganathan, Shafiq Joty, Mourad Ouzzani, Nan Tang
-
SKCompress: compressing sparse and nonuniform gradient in distributed machine learning
Jiawei Jiang, Fangcheng Fu, Tong Yang, Yingxia Shao, Bin Cui
-
PyTorch Distributed: Experiences on Accelerating Data Parallel Training VLDB 2020
Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala
-
Tunable Streaming Graph Embeddings at Scale VLDB 2020
Serafeim Papadias
-
Snorkel: rapid training data creation with weak supervision
Alexander Ratner, Stephen H. Bach, Henry R. Ehrenberg, Jason A. Fries, Sen Wu, Christopher Ré
-
Model averaging in distributed machine learning: a case study with Apache Spark
Yunyan Guo, Zhipeng Zhang, Jiawei Jiang, Wentao Wu, Ce Zhang, Bin Cui, Jianzhong Li
-
Towards a Unified Knowledge Graph Data Management System VLDB 2021
Baozhu Liu, Xin Wang, Pengkai Liu, Sizhuo Li
-
Exploiting Data Distribution in Distributed Learning of Deep Classification Models under the Parameter Server Architecture VLDB 2021
Nikodimos Provatas
-
Learning scheduling algorithms for data processing clusters SIGCOMM 2019
Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, Mohammad Alizadeh
-
An Edge Computing Marketplace for Distributed Machine Learning SIGCOMM 2019
Susham Yerabolu, Samuel Gomena, Ehsan Aryafar, Carlee Joe-Wong
-
Challenging the generalization capabilities of Graph Neural Networks for network modeling SIGCOMM 2019
José Suárez-Varela, Sergi Carol-Bosch, Krzysztof Rusek, Paul Almasan, Marta Arias, Pere Barlet-Ros, Albert Cabellos-Aparicio
Code: https://github.com/knowledgedefinednetworking/demo-routenet
-
Is Network the Bottleneck of Distributed Training? SIGCOMM 2020
Zhen Zhang, Chaokun Chang, Haibin Lin, Yida Wang, Raman Arora, Xin Jin
-
Challenges in Using ML for Networking Research: How to Label If You Must SIGCOMM 2020
Yukhe Lavinia, Ramakrishnan Durairajan, Reza Rejaie, Walter Willinger
-
Hoplite: efficient and fault-tolerant collective communication for task-based distributed systems SIGCOMM 2021
Siyuan Zhuang, Zhuohan Li, Danyang Zhuo, Stephanie Wang, Eric Liang, Robert Nishihara, Philipp Moritz, Ion Stoica
-
SiP-ML: high-bandwidth optical network interconnects for machine learning training SIGCOMM 2021
Mehrdad Khani Shirkoohi, Manya Ghobadi, Mohammad Alizadeh, Ziyi Zhu, Madeleine Glick, Keren Bergman, Amin Vahdat, Benjamin Klenk, Eiman Ebrahimi
-
Being prepared in a sparse world: The case of KNN graph construction ICDE 2016
Antoine Boutet, Anne-Marie Kermarrec, Nupur Mittal, François Taïani
-
pSCAN: Fast and exact structural graph clustering ICDE 2016
Lijun Chang, Wei Li, Xuemin Lin, Lu Qin, Wenjie Zhang
-
Scalable and Interactive Graph Clustering Algorithm on Multicore CPUs ICDE 2017
Son T. Mai, Martin Storgaard Dieu, Ira Assent, Jon Jacobsen, Jesper Kristensen, Mathias Birk
-
Towards Unified Data and Lifecycle Management for Deep Learning ICDE 2017
Hui Miao, Ang Li, Larry S. Davis, Amol Deshpande
-
In-Memory Distributed Matrix Computation Processing and Optimization ICDE 2017
Yongyang Yu, Mingjie Tang, Walid G. Aref, Qutaibah M. Malluhi, Mostafa M. Abbas, Mourad Ouzzani
Code: https://github.com/yuyongyang800/SparkDistributedMatrix
-
Modeling Scalability of Distributed Machine Learning ICDE 2017
Alexander Ulanov, Andrey Simanovsky, Manish Marwah
-
ModelHub: Deep Learning Lifecycle Management ICDE 2017
Hui Miao, Ang Li, Larry S. Davis, Amol Deshpande
-
Aurum: A Data Discovery System ICDE 2018
Raul Castro Fernandez, Ziawasch Abedjan, Famien Koko, Gina Yuan, Samuel Madden, Michael Stonebraker
-
MLlib*: Fast Training of GLMs Using Spark MLlib ICDE 2019
Zhipeng Zhang, Jiawei Jiang, Wentao Wu, Ce Zhang, Lele Yu, Bin Cui
-
Towards Explaining the Effects of Data Preprocessing on Machine Learning ICDE 2019
Carlos Vladimiro Gonzalez Zelaya
-
Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study ICDE 2019
Guowei Xu, Wenbiao Ding, Jiliang Tang, Songfan Yang, Gale Yan Huang, Zitao Liu
Code: https://github.com/tal-ai/Representation-Learning-with-crowdsourced-Labels
-
CogLearn: A Cognitive Graph-Oriented Online Learning System ICDE 2019
Yang Pian, Yu Lu, Penghe Chen, Qinglong Duan
-
Don't Fear the REAPER: A Framework for Materializing and Reusing Deep-Learning Models ICDE 2019
Melanie B. Sigl
-
Adaptive Deep Reuse: Accelerating CNN Training on the Fly ICDE 2019
Lin Ning, Hui Guan, Xipeng Shen
-
ColumnSGD: A Column-oriented Framework for Distributed Stochastic Gradient Descent ICDE 2020
Zhipeng Zhang, Wentao Wu, Jiawei Jiang, Lele Yu, Bin Cui, Ce Zhang
-
PSGraph: How Tencent trains extremely large-scale graphs with Spark ICDE 2020
Jiawei Jiang, Pin Xiao, Lele Yu, Xiaosen Li, Jiefeng Cheng, Xupeng Miao, Zhipeng Zhang, Bin Cui
-
Efficient Diversity-Driven Ensemble for Deep Neural Networks ICDE 2020
Wentao Zhang, Jiawei Jiang, Yingxia Shao, Bin Cui
-
HomoPAI: A Secure Collaborative Machine Learning Platform based on Homomorphic Encryption ICDE 2020
Qifei Li, Zhicong Huang, Wen-jie Lu, Cheng Hong, Hunter Qu, Hui He, Weizhe Zhang
-
Machine Learning Meets Big Spatial Data ICDE 2020
Ibrahim Sabek, Mohamed F. Mokbel
-
On the Integration of Machine Learning and Array Databases ICDE 2020
Sebastián Villarroya, Peter Baumann
-
Fela: Incorporating Flexible Parallelism and Elastic Tuning to Accelerate Large-Scale DML ICDE 2020
Jinkun Geng, Dan Li, Shuai Wang
-
CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs (Extended Abstract) ICDE 2021
Xupeng Miao, Lingxiao Ma, Zhi Yang, Yingxia Shao, Bin Cui, Lele Yu, Jiawei Jiang
-
CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks ICDE 2021
Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang
-
MLCask: Efficient Management of Component Evolution in Collaborative Data Analytics Pipelines ICDE 2021
Zhaojing Luo, Sai Ho Yeung, Meihui Zhang, Kaiping Zheng, Lei Zhu, Gang Chen, Feiyi Fan, Qian Lin, Kee Yuan Ngiam, Beng Chin Ooi
-
Ranking Data Slices for ML Model Validation: A Shapley Value Approach ICDE 2021
Eitan Farchi, Ramasuri Narayanam, Lokesh Nagalapatti
-
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform SIGKDD 2017
Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, Mustafa Ispir, Vihan Jain, Levent Koc, Chiu Yuen Koo, Lukasz Lew, Clemens Mewald, Akshay Naresh Modi, Neoklis Polyzotis, Sukriti Ramesh, Sudip Roy, Steven Euijong Whang, Martin Wicke, Jarek Wilkiewicz, Xin Zhang, Martin Zinkevich
-
PPDsparse: A Parallel Primal-Dual Sparse Method for Extreme Classification SIGKDD 2017
Ian E.H. Yen, Xiangru Huang, Wei Dai, Pradeep Ravikumar, Inderjit Dhillon, Eric Xing
-
A Data Mining Framework for Valuing Large Portfolios of Variable Annuities SIGKDD 2017
Guojun Gan, Jimmy Xiangji Huang
-
Google Vizier: A Service for Black-Box Optimization SIGKDD 2017
Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, D. Sculley
-
KunPeng: Parameter Server based Distributed Learning Systems and Its Applications in Alibaba and Ant Financial SIGKDD 2017
Jun Zhou, Xiaolong Li, Peilin Zhao, Chaochao Chen, Longfei Li, Xinxing Yang, Qing Cui, Jin Yu, Xu Chen, Yi Ding, Yuan Alan Qi
-
Deploying Machine Learning Models for Public Policy: A Framework SIGKDD 2018
Klaus Ackermann, Joe Walsh, Adolfo De Unánue, Hareem Naveed, Andrea Navarrete Rivera, Sun-Joo Lee, Jason Bennett, Michael Defoe, Crystal Cody, Lauren Haynes, Rayid Ghani
-
Scalable Optimization for Embedding Highly-Dynamic and Recency-Sensitive Data SIGKDD 2018
Xumin Chen, Peng Cui, Lingling Yi, Shiqiang Yang
-
Autotune: A Derivative-free Optimization Framework for Hyperparameter Tuning SIGKDD 2018
Patrick Koch, Oleg Golovidov, Steven Gardner, Brett Wujek, Joshua Griffin, Yan Xu
-
Corpus Conversion Service: A Machine Learning Platform to Ingest Documents at Scale SIGKDD 2018
Peter W J Staar, Michele Dolfi, Christoph Auer, Costas Bekas
-
FDML: A Collaborative Machine Learning Framework for Distributed Features SIGKDD 2019
Yaochen Hu, Di Niu, Jianming Yang, Shengping Zhou
-
Machine Learning at Microsoft with ML.NET SIGKDD 2019
Zeeshan Ahmed, Saeed Amizadeh, Mikhail Bilenko, Rogan Carr, Wei-Sheng Chin, Yael Dekel, Xavier Dupré, Vadim Eksarevskiy, Senja Filipi, Tom Finley, Abhishek Goswami, Monte Hoover, Scott Inglis, Matteo Interlandi, Najeeb Kazmi, Gleb Krivosheev, Pete Luferenko, Ivan Matantsev, Sergiy Matusevych, Shahab Moradi, Gani Nazirov, Justin Ormont, Gal Oshri, Artidoro Pagnoni, Jignesh Parmar, Prabhat Roy, Mohammad Zeeshan Siddiqui, Markus Weimer, Shauheen Zahirazami, Yiwen Zhu
-
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks SIGKDD 2019
Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh
-
Training and Meta-Training Binary Neural Networks with Quantum Computing SIGKDD 2019
Abdulah Fawaz, Paul Klein, Sebastien Piat, Simone Severini, Peter Mountney
-
A Generalized Framework for Population Based Training SIGKDD 2019
Ang Li, Ola Spyra, Sagi Perel, Valentin Dalibard, Max Jaderberg, Chenjie Gu, David Budden, Tim Harley, Pramod Gupta
-
Large-Scale Training Framework for Video Annotation SIGKDD 2019
Seong Jae Hwang, Joonseok Lee, Balakrishnan Varadarajan, Ariel Gordon, Zheng Xu, Apostol Natsev
-
OBOE: Collaborative Filtering for AutoML Model Selection SIGKDD 2019
Chengrun Yang, Yuji Akimoto, Dae Won Kim, Madeleine Udell
-
Building Continuous Integration Services for Machine Learning SIGKDD 2020
Bojan Karlas, Matteo Interlandi, Cédric Renggli, Wentao Wu, Ce Zhang, Deepak Mukunthu Iyappan Babu, Jordan Edwards, Chris Lauren, Andy Xu, Markus Weimer
-
An Empirical Analysis of Backward Compatibility in Machine Learning Systems SIGKDD 2020
Megha Srivastava, Besmira Nushi, Ece Kamar, Shital Shah, Eric Horvitz
-
FedFast: Going Beyond Average for Faster Training of Federated Recommender Systems SIGKDD 2020
Khalil Muhammad, Qinqin Wang, Diarmuid O'Reilly-Morgan, Elias Z. Tragos, Barry Smyth, Neil Hurley, James Geraci, Aonghus Lawlor
-
Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks SIGKDD 2020
Weilin Cong, Rana Forsati, Mahmut T. Kandemir, Mehrdad Mahdavi
-
GPT-GNN: Generative Pre-Training of Graph Neural Networks SIGKDD 2020
Ziniu Hu, Yuxiao Dong, Kuansan Wang, Kai-Wei Chang, Yizhou Sun
-
Large-Scale Training System for 100-Million Classification at Alibaba SIGKDD 2020
Liuyihan Song, Pan Pan, Kang Zhao, Hao Yang, Yiming Chen, Yingya Zhang, Yinghui Xu, Rong Jin
-
DeepLine: AutoML Tool for Pipelines Generation using Deep Reinforcement Learning and Hierarchical Actions Filtering SIGKDD 2020
Yuval Heffetz, Roman Vainshtein, Gilad Katz, Lior Rokach
-
AutoML Pipeline Selection: Efficiently Navigating the Combinatorial Space SIGKDD 2020
Chengrun Yang, Jicong Fan, Ziyang Wu, Madeleine Udell
-
DeGNN: Improving Graph Neural Networks with Graph Decomposition SIGKDD 2021
Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang, Zhichao Han, Bo Li, Wei Min, Susie Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yujing Wang, Bin Cui, Ce Zhang
-
ROD: Reception-aware Online Distillation for Sparse Graphs SIGKDD 2021
Wentao Zhang, Yuezihan Jiang, Yang Li, Zeang Sheng, Yu Shen, Xupeng Miao, Liang Wang, Zhi Yang, Bin Cui
-
OpenBox: A Generalized Black-box Optimization Service SIGKDD 2021
Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huaijun Jiang, Mingchao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang, Bin Cui
-
Amazon SageMaker Clarify: Machine Learning Bias Detection and Explainability in the Cloud SIGKDD 2021
Michaela Hardt, Xiaoguang Chen, Xiaoyi Cheng, Michele Donini, Jason Gelman, Satish Gollaprolu, John He, Pedro Larroy, Xinyu Liu, Nick McCarthy, Ashish Rathi, Scott Rees, Ankit A. Siva, ErhYuan Tsai, Keerthan Vasist, Pinar Yilmaz, Muhammad Bilal Zafar, Sanjiv Das, Kevin Haas, Tyler Hill, Krishnaram Kenthapadi
-
AutoSmart: An Efficient and Automatic Machine Learning Framework for Temporal Relational Data SIGKDD 2021
Zhipeng Luo, Zhixing He, Jin Wang, Manqing Dong, Jianqiang Huang, Mingjian Chen, Bohang Zheng
-
Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs SIGKDD 2021
Jialin Dong, Da Zheng, Lin F. Yang, George Karypis
-
Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism SIGKDD 2021
Vipul Gupta, Dhruv Choudhary, Ping Tak Peter Tang, Xiaohan Wei, Xing Wang, Yuzhen Huang, Arun Kejariwal, Kannan Ramchandran, Michael W. Mahoney
-
Hierarchical Training: Scaling Deep Recommendation Models on Large CPU Clusters SIGKDD 2021
Yuzhen Huang, Xiaohan Wei, Xing Wang, Jiyan Yang, Bor-Yiing Su, Shivam Bharuka, Dhruv Choudhary, Zewei Jiang, Hai Zheng, Jack Langman
-
STRADS: A Distributed Framework for Scheduled Model Parallel Machine Learning EuroSys 2016
Jin Kyu Kim, Qirong Ho, Seunghak Lee, Xun Zheng, Wei Dai, Garth A. Gibson, Eric P. Xing
-
GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-specialized Parameter Server EuroSys 2016
Henggang Cui, Hao Zhang, Gregory R. Ganger, Phillip B. Gibbons, Eric P. Xing
-
Proteus: Agile ML Elasticity Through Tiered Reliability in Dynamic Resource Markets EuroSys 2017
Aaron Harlap, Alexey Tumanov, Andrew Chung, Gregory R. Ganger, Phillip B. Gibbons
-
Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters EuroSys 2018
Yangrui Chen, Yanghua Peng, Yixin Bao, Chuan Wu, Yibo Zhu, Chuanxiong Guo
-
Dynamic Control Flow in Large-scale Machine Learning EuroSys 2018
Yuan Yu, Martín Abadi, Paul Barham, Eugene Brevdo, Mike Burrows, Andy Davis, Jeff Dean, Sanjay Ghemawat, Tim Harley, Peter Hawkins, Michael Isard, Manjunath Kudlur, Rajat Monga, Derek Murray, Xiaoqiang Zheng
-
Improving the Expressiveness of Deep Learning Frameworks with Recursion EuroSys 2018
Eunji Jeong, Joo Seong Jeong, Soojeong Kim, Gyeong-In Yu, Byung-Gon Chun
-
Low Latency RNN Inference with Cellular Batching EuroSys 2018
Pin Gao, Lingfan Yu, Yongwei Wu, Jinyang Li
-
GRNN: Low-Latency and Scalable RNN Inference on GPUs EuroSys 2019
Connor Holmes, Daniel Mawhirter, Yuxiong He, Feng Yan, Bo Wu
-
Automating Dependence-Aware Parallelization of Machine Learning Training on Distributed Shared Memory EuroSys 2019
Jinliang Wei, Garth A. Gibson, Phillip B. Gibbons, Eric P. Xing
-
Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks EuroSys 2019
Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, Byung-Gon Chun
-
Fast Distributed Deep Learning over RDMA EuroSys 2019
Jilong Xue, Youshan Miao, Cheng Chen, Ming Wu, Lintao Zhang, Lidong Zhou
-
μLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization EuroSys 2019
Youngsok Kim, Joonsung Kim, Dongju Chae, Daehyun Kim, Jangwoo Kim
-
Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning EuroSys 2020
Shubham Chaudhary, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Srinidhi Viswanatha
-
Subway: minimizing data transfer during out-of-GPU-memory graph processing EuroSys 2020
Amir Hossein Nodehi Sabet, Zhijia Zhao, Rajiv Gupta
-
Peregrine: a pattern-aware graph mining system EuroSys 2020
Kasra Jamshidi, Rakesh Mahadasa, Keval Vora
-
FlexGraph: a flexible and efficient distributed framework for GNN training EuroSys 2021
Lei Wang, Qiang Yin, Chao Tian, Jianbang Yang, Rong Chen, Wenyuan Yu, Zihang Yao, Jingren Zhou
-
DGCL: an efficient communication library for distributed GNN training EuroSys 2021
Zhenkun Cai, Xiao Yan, Yidi Wu, Kaihao Ma, James Cheng, Fan Yu
-
Accelerating graph sampling for graph machine learning using GPUs EuroSys 2021
Abhinav Jangda, Sandeep Polisetty, Arjun Guha, Marco Serafini
-
Tahoe: tree structure-aware high performance inference engine for decision tree ensemble on GPU EuroSys 2021
Zhen Xie, Wenqian Dong, Jiawen Liu, Hang Liu, Dong Li
-
Seastar: Vertex-Centric Programming for Graph Neural Networks EuroSys 2021
Yidi Wu, Kaihao Ma, Zhenkun Cai, Tatiana Jin, Boyang Li, Chengguang Zheng, James Cheng, Fan Yu
-
Ako: Decentralised Deep Learning with Partial Gradient Exchange SoCC 2016
Pijika Watcharapichat, Victoria Lopez Morales, Raul Castro Fernandez, Peter Pietzuch
-
Addressing the straggler problem for iterative convergent parallel ML SoCC 2016
Aaron Harlap, Henggang Cui, Wei Dai, Jinliang Wei, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson, Eric P. Xing
-
SLAQ: quality-driven scheduling for distributed machine learning SoCC 2017
Haoyu Zhang, Logan Stafman, Andrew Or, Michael J. Freedman
-
How good are machine learning clouds for binary classification with good features?: extended abstract SoCC 2017
Hantian Zhang, Luyuan Zeng, Wentao Wu, Ce Zhang
-
SQML: large-scale in-database machine learning with pure SQL SoCC 2017
Umar Syed, Sergei Vassilvitskii
-
Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training SoCC 2018
Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy
-
Scheduling CPU for GPU-based Deep Learning Jobs SoCC 2018
Wencong Xiao, Zhenhua Han, Hanyu Zhao, Xuan Peng, Quanlu Zhang, Fan Yang, Lidong Zhou
-
SSD QoS Improvements through Machine Learning SoCC 2018
Chandranil Chakraborttii, Vikas Sinha, Heiner Litz
-
Training Deep Neural Networks Using Posit Number System SoCC 2019
Jinming Lu, Siyuan Lu, Zhisheng Wang, Chao Fang, Jun Lin, Zhongfeng Wang, Li Du
-
An Operation-Minimized FPGA Accelerator Design by Dynamically Exploiting Sparsity in CNN Winograd Transform SoCC 2019
Xinkai Di, Haigang Yang, Zhihong Huang, Ning Mao
-
A DNN Compression Framework for SOT-MRAM-based Processing-In-Memory Engine SoCC 2020
Geng Yuan, Xiaolong Ma, Sheng Lin, Zhengang Li, Jieren Deng, Caiwen Ding
-
Cycle-to-cycle Variation Enabled Energy Efficient Privacy Preserving Technology in ANN SoCC 2020
Jingyan Fu, Zhiheng Liao, Jinhui Wang
-
Improving the Performance of a NoC-based CNN Accelerator with Gather Support SoCC 2020
Binayak Tiwari, Mei Yang, Xiaohang Wang, Yingtao Jiang, Venkatesan Muthukumar
-
Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer SoCC 2020
Siyuan Lu, Meiqi Wang, Shuang Liang, Jun Lin, Zhongfeng Wang
-
Optimizing CNN Accelerator With Improved Roofline Model SoCC 2020
Shaoxia Fang, Shulin Zeng, Yu Wang
-
AutoML for Multilayer Perceptron and FPGA Co-design SoCC 2020
Philip Colangelo, Oren Segal, Alexander Speicher, Martin Margala
-
Sharing Deep Neural Network Models with Interpretation WWW 2018
Huijun Wu, Chen Wang, Jie Yin, Kai Lu, Liming Zhu
-
AccurateML: Information-aggregation-based approximate processing for fast and accurate machine learning on MapReduce INFOCOM 2017
Rui Han, Fan Zhang, Zhentao Wang
-
A Secure and Verifiable Outsourcing Scheme for Matrix Inverse Computation INFOCOM 2017
Chunqiang Hu, Abdulrahman Alhothaily, Arwa Alrawais, Xiuzhen Cheng, Carl Sturtivant, Hang Liu
-
When Edge Meets Learning: Adaptive Control for Resource-Constrained Distributed Machine Learning INFOCOM 2018
Shiqiang Wang, Tiffany Tuor, Theodoros Salonidis, Kin K. Leung, Christian Makaya, Ting He, Kevin Chan
-
Online Job Scheduling in Distributed Machine Learning Clusters INFOCOM 2018
Yixin Bao, Yanghua Peng, Chuan Wu, Zongpeng Li
-
DeepDecision: A Mobile Deep Learning Framework for Edge Video Analytics INFOCOM 2018
Xukan Ran, Haoliang Chen, Xiaodan Zhu, Zhenming Liu, Jiasi Chen
-
FML: Fast Machine Learning for 5G mmWave Vehicular Communications INFOCOM 2018
Arash Asadi, Sabrina Müller, Gek Hong Sim, Anja Klein, Matthias Hollick
Contributed by Xupeng Miao, Zihao Yu, Chunan Shi, and the Hetu team members.