LLM Inference Optimization and Applications on Hygon DCU

Date:

This talk presents our exploration of adapting and optimizing large language model (LLM) inference on Hygon DCU hardware. In the absence of comprehensive framework support, we investigated multi-card parallelism and lightweight scheduling strategies to improve inference performance for models such as Qwen3-14B and Qwen3-4B. The work demonstrates that, even in constrained compute environments, careful system-level optimization can unlock practical value from earlier-generation domestic accelerators.
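The multi-card parallelism mentioned above typically shards each large weight matrix across devices, with every card computing a partial result. As a minimal illustrative sketch (toy matrices in pure Python, not the actual DCU implementation), column-wise tensor parallelism for a linear layer looks like this:

```python
# Toy sketch of column-parallel matrix multiplication: the weight matrix W
# is split column-wise across "cards"; each card computes its partial
# output, and the shards are concatenated to reproduce the full result.
# Real deployments distribute these shards across accelerator devices.

def matmul(a, b):
    """Naive matrix multiply: a is m x k, b is k x n."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def split_columns(w, parts):
    """Split w column-wise into `parts` equal shards (n must divide evenly)."""
    n = len(w[0])
    step = n // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

def parallel_linear(x, w, parts):
    """Compute x @ w by sharding w's columns across `parts` workers."""
    shards = split_columns(w, parts)
    partials = [matmul(x, shard) for shard in shards]  # one per "card"
    # Concatenate each row's partial outputs back into the full output row.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]
```

Under this scheme each card holds only a fraction of the weights, which is what makes models like Qwen3-14B fit across several memory-limited devices; the single all-gather of partial outputs is the main communication cost.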

Beyond performance, we examined the application potential of DCU-based LLM inference in real-world educational scenarios. As a case study, we deployed Qwen3-14B as part of the “ChengyuanTong” freshman assistant, enabling intelligent query and Q&A services within the campus network. This experience illustrates how domestic compute platforms can support scenario-driven innovation, while also providing transferable methodologies for cross-platform LLM adaptation and deployment.