Yuan 3.0 Ultra Multimodal Foundation Model Released as Open Source
On March 5th, the YuanLab.ai team officially released Yuan 3.0 Ultra, a multimodal foundation model. It is the flagship of the Yuan 3.0 series and is designed for the trillion-parameter scale, making it one of only three open-source multimodal models of that size currently available in the industry.

Yuan 3.0 Ultra adopts a unified multimodal architecture consisting of a visual encoder, a language backbone, and a multimodal alignment module, which together model visual and language information jointly. The language backbone is built on a Mixture-of-Experts (MoE) architecture and contains 103 Transformer layers. At the start of training the model had 1515B parameters. Using its LAEP method, the team reduced this to 1010B parameters during pre-training, improving pre-training compute efficiency by 49%. Yuan 3.0 Ultra has 68.8B activated parameters. The model also introduces the Localized Filtering Attention (LFA) mechanism, which strengthens the modeling of semantic relationships and achieves higher accuracy than the classic attention structure.
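To put the quoted parameter counts in proportion, the short calculation below derives two ratios from them: the share of parameters removed during pre-training and the share of the final model that is activated. It is only arithmetic on the figures stated above. The variable names are illustrative, and reading "activated parameters" as the parameters used per token is an assumption based on how MoE models are usually described, not something the announcement spells out.

```python
# Figures quoted in the announcement, in billions of parameters.
INITIAL_PARAMS_B = 1515.0    # parameter count at the start of training
FINAL_PARAMS_B = 1010.0      # parameter count after the LAEP reduction
ACTIVATED_PARAMS_B = 68.8    # activated parameters of the released model

# Share of the initial parameters removed during pre-training.
pruned_fraction = (INITIAL_PARAMS_B - FINAL_PARAMS_B) / INITIAL_PARAMS_B

# Share of the final model that is activated
# (assumption: "activated" means used per token, as is typical for MoE).
activated_fraction = ACTIVATED_PARAMS_B / FINAL_PARAMS_B

print(f"Parameters removed during pre-training: {pruned_fraction:.1%}")
print(f"Activated share of final parameters: {activated_fraction:.1%}")
```

On these figures, about one third of the initial parameters are removed, and roughly 7% of the final model is activated.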

