LLM Inference Optimization and Applications on Hygon DCU

Date:

This talk presents our exploration of adapting and optimizing large language model (LLM) inference on Hygon DCU hardware. In the absence of comprehensive framework support, we investigated multi-card parallelism and lightweight scheduling strategies to improve inference performance for models such as Qwen3-14B and Qwen3-4B. The work demonstrates that, even in constrained compute environments, careful system-level optimization can unlock practical value from earlier-generation domestic accelerators.
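The multi-card parallelism mentioned above typically shards each large weight matrix across devices, with every card computing a partial result. As a minimal illustrative sketch (toy matrices in pure Python, not the actual DCU implementation), column-wise tensor parallelism for a linear layer looks like this:

```python
# Toy sketch of column-parallel matrix multiplication: the weight matrix W
# is split column-wise across "cards"; each card computes its partial
# output, and the shards are concatenated to reproduce the full result.
# Real deployments distribute these shards across accelerator devices.

def matmul(a, b):
    """Naive matrix multiply: a is m x k, b is k x n."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def split_columns(w, parts):
    """Split w column-wise into `parts` equal shards (n must divide evenly)."""
    n = len(w[0])
    step = n // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

def parallel_linear(x, w, parts):
    """Compute x @ w by sharding w's columns across `parts` workers."""
    shards = split_columns(w, parts)
    partials = [matmul(x, shard) for shard in shards]  # one per "card"
    # Concatenate each row's partial outputs back into the full output row.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]
```

Under this scheme each card holds only a fraction of the weights, which is what makes models like Qwen3-14B fit across several memory-limited devices; the single all-gather of partial outputs is the main communication cost.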

Beyond performance, we examined the application potential of DCU-based LLM inference in real-world educational scenarios. As a case study, we deployed Qwen3-14B as part of the “ChengyuanTong” freshman assistant, enabling intelligent query and Q&A services within the campus network. This experience illustrates how domestic compute platforms can support scenario-driven innovation, while also providing transferable methodologies for cross-platform LLM adaptation and deployment.