Data Compression in Distributed Learning
Monday, October 14, 2019 - 4:45pm - 5:30pm
Modern large-scale machine learning models are often trained with the parallel stochastic gradient descent (SGD) algorithm or its variants on distributed systems built around parameter servers or master/worker nodes. However, the communication required for gradient aggregation and model synchronization between the master and worker nodes becomes a major bottleneck as the number of workers and the dimension of the model grow. In this talk, I will introduce several ways to compress the transferred data and reduce the overall communication so that this bottleneck can be greatly mitigated. In particular, I will introduce the double residual compression stochastic gradient descent (DRC-SGD) algorithm, which compresses both the worker-to-server and server-to-worker transfers.
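To give a flavor of residual-based compression, the sketch below shows one common building block: a worker compresses its gradient with a top-k sparsifier and carries the compression error (the residual) forward into the next step, so nothing is permanently lost. This is a minimal illustration of the residual/error-feedback idea, not the exact DRC-SGD algorithm from the talk; `topk_compress` and `worker_step` are hypothetical names chosen for this example.

```python
import numpy as np

def topk_compress(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def worker_step(grad, residual, k):
    """One worker-side step of residual (error-feedback) compression.

    The residual from previous steps is added back before compressing,
    and whatever the compressor drops becomes the new residual.
    """
    corrected = grad + residual
    compressed = topk_compress(corrected, k)   # sent to the server
    new_residual = corrected - compressed      # kept locally for next step
    return compressed, new_residual

# Example: only 2 of 4 coordinates are transmitted each step;
# the dropped mass is retained in the residual.
grad = np.array([1.0, -3.0, 0.5, 2.0])
residual = np.zeros_like(grad)
compressed, residual = worker_step(grad, residual, k=2)
```

In DRC-SGD as described in the abstract, a compression of this kind is applied in both directions, so the server's broadcast of the aggregated update is compressed as well, further reducing total communication.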