Recently, under the guidance of the Group, China Unicom Research Institute, China Unicom Zhejiang Branch and Unicom Clothing Manufacturing Corps jointly proposed an innovative service model for scenarios in which sensitive AI data must be stored locally while training takes place off-site. The team successfully carried out the industry's first storage-compute separation test on 30TB of sample data across 200 kilometers between Hangzhou and Jinhua, with a measured training efficiency as high as 97%. The test verified the security, feasibility and efficiency of storage-compute separation technology and offers new ideas and directions for the future development of AI technology.
Storage-compute separation means that the “warehouse” that stores data and the “processing plant” that computes on it are independent of each other. During training, data is pulled directly from the remote storage device for computation instead of first being written to local disk and then processed, which helps ensure the security and consistency of user data.
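As a minimal sketch of this idea (not China Unicom's implementation), the Python below streams samples from a remote source straight into a training step without ever writing them to local disk. The `io.BytesIO` object stands in for the remote storage device, and `stream_samples`, `train_on_stream` and the SHA-256 "training step" are hypothetical names introduced only for illustration.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, Tuple


def stream_samples(remote: BinaryIO, sample_size: int) -> Iterator[bytes]:
    """Yield fixed-size samples read directly from a remote file-like object."""
    while True:
        sample = remote.read(sample_size)
        if not sample:  # empty read means the remote stream is exhausted
            return
        yield sample


def train_on_stream(remote: BinaryIO, sample_size: int) -> Tuple[int, str]:
    """Consume samples in memory only; returns (sample count, running digest)."""
    digest = hashlib.sha256()  # stand-in for a real training step (assumption)
    count = 0
    for sample in stream_samples(remote, sample_size):
        digest.update(sample)  # the sample is used, then dropped; nothing is saved
        count += 1
    return count, digest.hexdigest()


# io.BytesIO stands in for the remote storage endpoint in this sketch.
remote_store = io.BytesIO(b"x" * 4096)
count, _ = train_on_stream(remote_store, sample_size=1024)
```

The point of the design is that the compute side only ever holds the samples it is currently processing, so the full data set stays on the storage side.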
Demand for AI computing is currently strong, and processing massive sample data poses two major challenges. First, most data is stored on the enterprise side, and some data with high security requirements cannot easily be relocated. Second, the volume of sample data is growing rapidly, so an AI computing center must be equipped with additional storage resources on top of powerful computing capacity, which significantly increases construction costs. Against this background, the industry urgently needs “long-distance storage-compute separation, with samples pulled as they are trained”.
As a “national team of digital information operation services and a leader in digital technology integration and innovation”, China Unicom has responded to this demand by actively researching the architecture, key technologies and application scenarios of the computing power intelligent network, and has put forward the innovative service model of storage-compute separation. China Unicom has promoted the layout of relevant standards for smart computing in the ITU and research on the requirements and technical framework for wide-area lossless networking in the IETF. It has also formulated a series of CCSA standards on enhancing wide-area network capabilities for carrying smart computing traffic, and has worked with industry partners to advance research on the core technologies of smart computing interconnection and wide-area lossless networking. In long-distance RDMA wide-area lossless technology, this work achieves long-distance, high-throughput lossless RDMA transmission over distances of 100 km to 1,000 km. In precise flow control, it achieves tenant-level flow control, ensuring service isolation between tenants without loss of computing efficiency. In storage read/write performance optimization, measures such as multithreaded processing, enhanced concurrency and system-level optimization improve the file read/write performance of the distributed storage file system in remote scenarios by more than 5 times, meeting the needs of remote training for NLP and CV models.
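To illustrate why multithreading and higher concurrency help remote reads in general, the sketch below compares sequential and thread-pooled reads against a simulated high-latency store. It is an assumption-laden toy, not the optimization described above: `read_remote_block` is a hypothetical function, and the 20 ms delay is an exaggerated stand-in for wide-area latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

SIMULATED_LATENCY_S = 0.02  # assumed per-request latency, exaggerated for the demo


def read_remote_block(block_id: int) -> bytes:
    """Hypothetical remote read: sleeps to mimic wide-area latency."""
    time.sleep(SIMULATED_LATENCY_S)
    return block_id.to_bytes(4, "big")


def read_sequential(block_ids: List[int]) -> List[bytes]:
    """Issue one read at a time; total time grows with the number of blocks."""
    return [read_remote_block(b) for b in block_ids]


def read_concurrent(block_ids: List[int], workers: int = 8) -> List[bytes]:
    """Issue reads from a thread pool; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_remote_block, block_ids))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


blocks = list(range(16))
seq_data, seq_time = timed(read_sequential, blocks)
con_data, con_time = timed(read_concurrent, blocks)
```

Both calls return the same blocks in the same order, but the concurrent version overlaps the waiting time of many requests, so its elapsed time is dominated by latency per batch rather than latency per block.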
In the training scenario for the “Clothes Pupil Industry Model” of Unicom Clothing Manufacturing Corps, the raw data consists mainly of multimodal clothing data. Clothing manufacturers can use the “Clothes Pupil Industry Model” for real-time detection, but many manufacturers' clothing data requires property rights protection, and they are unwilling to disclose it. The innovative training mode based on storage-compute separation meets exactly this need. The main features of this long-distance storage-compute separation test include:
First, the smart computing training mode has been innovatively restructured to support cross-location training of large AI models. The traditional centralized training mode requires users to upload samples to the smart computing center, where they are written to disk for training, but some users have security concerns about writing private samples to disk off-site. Using an IP wide-area lossless solution, Zhejiang Unicom has achieved remote training in which data is stored in Hangzhou and trained in Jinhua without ever being written to disk at the training site, opening a new way for enterprise users to train on private samples through computing-network synergy.
Second, the total amount of sample data reaches 30TB, the transmission distance exceeds 200 kilometers, and the long-distance computing efficiency is greater than 97%. The storage-compute separation test for AI training of the “Clothes Pupil Industry Model” of Unicom Clothing Manufacturing Corps fully verified the technical feasibility of long-distance storage-compute separation for enterprise AI training services. In the future, users with data-sensitive workloads can use the operator's computing power services to complete remote training without private samples leaving their campus, achieving the best balance between cost and security.
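For a rough sense of scale, the figures above can be turned into back-of-the-envelope arithmetic. The sketch below rests on stated assumptions rather than on details from the test: light travels through optical fiber at roughly 200,000 km/s, the 100 Gbit/s link rate is a hypothetical value, and efficiency is assumed to mean the ratio of training time with local storage to training time with remote storage, which the source does not define.

```python
FIBER_KM_PER_S = 200_000.0  # approx. speed of light in optical fiber (about 2/3 of c)


def round_trip_ms(distance_km: float) -> float:
    """Propagation-only round-trip time over a fiber path, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


def bandwidth_delay_product_mb(link_gbps: float, rtt_ms: float) -> float:
    """Data that must be in flight to keep the link full, in megabytes."""
    bits_in_flight = link_gbps * 1e9 * (rtt_ms / 1000)
    return bits_in_flight / 8 / 1e6


def remote_training_efficiency(local_s: float, remote_s: float) -> float:
    """Assumed definition: local-storage training time / remote-storage time."""
    return local_s / remote_s


rtt = round_trip_ms(200)                    # 200 km is the lower bound from the text
bdp = bandwidth_delay_product_mb(100, rtt)  # 100 Gbit/s is a hypothetical link rate
```

Under these assumptions, a 200 km path adds about 2 ms of round-trip propagation delay, and a 100 Gbit/s link would need about 25 MB in flight to stay full. A single dropped packet at that scale stalls a large window of data, which is one reason lossless transport matters for long-distance RDMA.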
Tang Xiongyan, Vice President and Chief Scientist of China Unicom Research Institute, said that storage-compute separation can meet the efficiency and specialization needs of future intelligent computing, enhance security and data protection, and provide strong support for technological innovation and industrial upgrading. It not only optimizes resource utilization and improves system performance but also promotes technological innovation and sustainable development, making it an important trend in the future development of the smart computing industry.
Yingqi Tang, general manager of Zhejiang Unicom's Network Department (Science and Technology Innovation Department), said that Zhejiang Unicom will accelerate digital integration and continue to help traditional industries transform and upgrade. In particular, it will build a “high-throughput, high-performance, high-intelligence” computing power intelligent network, AINet, and actively explore innovative service models such as elastic bandwidth, task-based services, data courier and lossless transmission.
Looking ahead, China Unicom will continue to deepen technological innovation in AINet. Through the R&D and construction of AINet, it will advance new networks, technologies and services, continue to provide leading networked communications and AINet digital intelligence products, accelerate the development of new quality productive forces centered on computing power and data, and empower the digital transformation and upgrading of thousands of industries.