Seminars on Numerical Algebra, Optimization and Data Sciences: Finding Low-Rank Matrix Weights in DNNs via Riemannian Optimization: RAdaGrad and RAdamW
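Speaker: Jian-Feng Cai (蔡剑锋), The Hong Kong University of Science and Technology
Time: May 29, 2026, 10:00
Venue: Room S102, Laboratory Building, Haiyun Campus (海韵园实验楼S102)
Abstract: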
Finding low-rank matrix weights is a key technique for addressing the high memory usage and computational demands of large models. Most existing algorithms rely on a factorization of the low-rank weight matrices, which is non-unique and redundant. Their convergence is slow, especially when the target low-rank matrices are ill-conditioned, because the convergence rate depends on the condition numbers of both the Jacobian operator of the factorization and the Hessian of the loss function with respect to the weight matrix. To address this challenge, we adopt the Riemannian gradient descent (RGD) algorithm on the Riemannian manifold of fixed-rank matrices to update the entire low-rank weight matrix. This algorithm avoids the factorization completely, thereby eliminating the negative impact of the Jacobian condition number. Furthermore, by leveraging the geometric structure of the Riemannian manifold and selecting an appropriate metric, it mitigates the negative impact of the Hessian condition number. This results in two plug-and-play optimizers, RAdaGrad and RAdamW, which are RGD with metrics adapted from AdaGrad and AdamW and restricted to the manifold. Our algorithms can be integrated with various deep neural network architectures without any modifications. We evaluate their effectiveness through fine-tuning experiments on large language models and diffusion models. Experimental results consistently demonstrate that our algorithms outperform state-of-the-art methods. Additionally, our algorithms are not only effective for fine-tuning large models but also applicable to deep neural network (DNN) compression.
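As a rough illustration of the RGD step mentioned in the abstract, the following is a minimal sketch of plain Riemannian gradient descent on the fixed-rank manifold with the standard Euclidean metric: the Euclidean gradient is projected onto the tangent space at the current iterate, and the update is retracted to rank r by a truncated SVD. It is not the speaker's RAdaGrad or RAdamW, whose adaptive metrics are not specified in the announcement. The toy objective f(X) = 0.5 * ||X - M||_F^2, the step size, and all function names are assumptions made for illustration only.

```python
import numpy as np

def truncated_svd(Z, r):
    """Best rank-r approximation of Z, returned in factored form."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]

def project_tangent(U, Vt, G):
    """Project G onto the tangent space of the fixed-rank manifold at
    X = U diag(s) Vt:  P(G) = U U^T G + G V V^T - U U^T G V V^T."""
    UtG = U.T @ G
    GV = G @ Vt.T
    return U @ UtG + GV @ Vt - U @ (UtG @ Vt.T) @ Vt

def rgd_fixed_rank(grad, X0, r, step, iters):
    """Plain RGD (Euclidean metric) with truncated-SVD retraction."""
    U, s, Vt = truncated_svd(X0, r)
    X = (U * s) @ Vt
    for _ in range(iters):
        xi = project_tangent(U, Vt, grad(X))        # Riemannian gradient
        U, s, Vt = truncated_svd(X - step * xi, r)  # retract to rank r
        X = (U * s) @ Vt
    return X

# Toy problem (an assumption, not from the talk): recover a rank-3 matrix M
# by minimizing f(X) = 0.5 * ||X - M||_F^2, starting from a noisy copy of M.
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
X0 = M + 0.1 * rng.standard_normal(M.shape)

X = rgd_fixed_rank(lambda X: X - M, X0, r=3, step=0.5, iters=100)
rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
print("relative error:", rel_err)
```

The iterate is always a full rank-r matrix rather than a pair of factors, which is the sense in which the factorization is avoided; replacing the Euclidean metric with an adaptive one is where the optimizers described in the abstract would differ from this sketch.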
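Biography:
Jian-Feng Cai is a professor and doctoral supervisor in the Department of Mathematics at The Hong Kong University of Science and Technology. His main research areas include signal processing, matrix recovery, and image reconstruction. A leading expert in computational harmonic analysis, signal and image processing, and sparse and low-rank reconstruction, Professor Cai has produced a number of breakthrough results, published in well-known international mathematics and engineering journals including JAMS, the SIAM journals, the IEEE journals, ACHA, PRL, Ann. Stat., and JMLR. His SVT algorithm for matrix recovery has had a significant influence on both academic research and practical applications, and the paper has been cited more than 6,000 times on Google Scholar. His work on image restoration was published in the Journal of the AMS, which is regarded as one of the four top journals in mathematics. Professor Cai was named a Highly Cited Researcher in 2017 and 2018, and his papers have received more than 15,000 citations in total.
Contact: Kui Du (杜魁)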
