A unified schedule policy of distributed machine learning framework for CPU-GPU cluster | Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University

Open Access

Issue: JNWPU Volume 39, Number 3, June 2021
Page(s): 529 - 538
DOI: https://doi.org/10.1051/jnwpu/20213930529
Published online: 09 August 2021

Chen T, Li M, Li Y, et al. Mxnet: a flexible and efficient machine learning library for heterogeneous distributed systems[J/OL]. (2015-12-03)[2015-12-07]. https://arxiv.org/abs/1512.01274
Jia Y, Shelhamer E, Donahue J, et al. Caffe: convolutional architecture for fast feature embedding[C]//Proceedings of the 22nd ACM International Conference on Multimedia, 2014: 675–678
Xing E P, Ho Q, Dai W, et al. Petuum: a new platform for distributed machine learning on big data[J]. IEEE Trans on Big Data, 2015, 1: 49–67. DOI: 10.1109/TBDATA.2015.2472014
Chen L, Huo X, Agrawal G. Accelerating mapreduce on a coupled CPU-GPU architecture[C]//Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 2012: 1–11
Ravi V T, Becchi M, Jiang W, et al. Scheduling concurrent applications on a cluster of CPU-GPU nodes[C]//2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2012: 140–147
Gu J, Liu H, Zhou Y, et al. Deepprof: performance analysis for deep learning applications via mining GPU execution patterns[J/OL]. (2017-07-12)[2017-07-13]. https://arxiv.org/abs/1707.03750
Rhu M, Gimelshein N, Clemons J, et al. vDNN: virtualized deep neural networks for scalable, memory-efficient neural network design[C]//2016 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016: 1–13
Goyal P, Dollár P, Girshick R, et al. Accurate, large minibatch SGD: training imagenet in 1 hour[J/OL]. (2018-04-30)[2018-05-02]. https://arxiv.org/abs/1706.02677
Vavilapalli V K, Murthy A C, Douglas C, et al. Apache hadoop yarn: yet another resource negotiator[C]//Proceedings of the 4th Annual Symposium on Cloud Computing, 2013: 1–16
Zhang H, Stafman L, Or A, et al. Slaq: quality-driven scheduling for distributed machine learning[C]//Proceedings of the 2017 Symposium on Cloud Computing, 2017: 390–404
Tang Xiaochun, Fu Ying, Fan Xuefeng. Research on fine-grained allocation algorithm of heterogeneous resources in data center[J]. Journal of Northwestern Polytechnical University, 2020, 38: 589–595. DOI: 10.3969/j.issn.1000-2758.2020.03.017 (in Chinese)
Wang Yanhua, Qiao Jianzhong, Lin Shukuan, et al. SVM-based task allocation model of CPU-GPU heterogeneous system[J]. Journal of Northeastern University, 2016, 37: 1089–1094 (in Chinese)
Xiao W, Bhardwaj R, Ramjee R, et al. Gandiva: introspective cluster scheduling for deep learning[C]//13th Symposium on Operating Systems Design and Implementation, 2018: 595–610
Gu J, Chowdhury M, Shin K G, et al. Tiresias: a GPU cluster manager for distributed deep learning[C]//16th Symposium on Networked Systems Design and Implementation, 2019: 485–500
Peng Y, Bao Y, Chen Y, et al. Optimus: an efficient dynamic resource scheduler for deep learning clusters[C]//Proceedings of the Thirteenth EuroSys Conference, 2018: 1–14
Jeon M, Venkataraman S, Qian J, et al. Multi-tenant GPU clusters for deep learning workloads: analysis and implications[J/OL]. (2018-05-13)[2018-06-16]. https://www.microsoft.com/en-us/research/publication/multi-tenant-gpu-clusters-deep-learning-workloads-analysis-implications-tr
Shirahata K, Sato H, Matsuoka S. Hybrid map task scheduling for GPU-based heterogeneous clusters[C]//2010 IEEE Second International Conference on Cloud Computing Technology and Science, 2010: 733–740
Zhou H, Liu C. Task mapping in heterogeneous embedded systems for fast completion time[C]//2014 International Conference on Embedded Software, 2014: 1–10
Che S, Boyer M, Meng J, et al. Rodinia: a benchmark suite for heterogeneous computing[C]//2009 IEEE International Symposium on Workload Characterization, 2009: 44–54
Rousseeuw P J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis[J]. Journal of Computational and Applied Mathematics, 1987, 20: 53–65. DOI: 10.1016/0377-0427(87)90125-7
