LLM Inference Optimization and Applications on Hygon DCU
Talk, CCF China Service 2025, Yantai International Expo Center, Yantai, China
This talk presents our work on adapting and optimizing large language model inference on Hygon DCU hardware. In the absence of comprehensive framework support, we investigated multi-card parallelism and lightweight scheduling strategies to improve the inference performance of models such as Qwen3-14B and Qwen3-4B. The work demonstrates that even in constrained compute environments, careful system-level optimization can unlock practical value from earlier-generation domestic accelerators.