2025, 14(1):1-8. DOI: 10.12146/j.issn.2095-3135.20241215001
Abstract:From one perspective, mainstream artificial intelligence technology can be viewed as “intelligent computing technology”. This article presents some views on the historic breakthroughs, development trends, and challenges of intelligent computing technology. It also briefly addresses questions of common concern, such as whether the scaling law has reached its ceiling, how the shortage of computing power can be resolved, and what the essence of large models is.
2025, 14(1):9-24. DOI: 10.12146/j.issn.2095-3135.20241201002
Abstract:In recent years, the rapid development of generative AI has made large models for text-driven video prediction a hot topic in academia and industry. Video prediction and generation must address temporal dynamics and consistency, which requires precise control of scene structures, subject behaviors, camera movements, and semantic expressions. One major challenge is accurately controlling scene dynamics in video prediction to achieve high-quality, semantically consistent outputs. Researchers have proposed key control methods, including camera control enhancement, reference video control, semantic consistency enhancement, and subject feature control improvement. These methods aim to improve generation quality, ensuring that outputs align with the historical context while meeting user needs. This paper systematically explores the core concepts, advantages, limitations, and future directions of these four control approaches.
XU Qinglin, QIAO Yu, WANG Yali
2025, 14(1):25-38. DOI: 10.12146/j.issn.2095-3135.20231225001
Abstract:The domain gap between dark scenes and the data used by traditional pretrained models leads to suboptimal performance with the conventional pretrain-finetune approach, and pretraining from scratch is costly. To address this issue, a domain-adaptive pretraining method is proposed to improve action recognition performance in dark environments. The method integrates an external vision enhancement model for de-darkening to introduce critical knowledge for dark scene processing. It also employs a cross-domain self-distillation framework to reduce the domain gap of visual representations between illuminated and dark scenes. In extensive experiments across various dark-environment action recognition settings, the proposed approach achieves a Top-1 accuracy of 97.19% on the dark dataset for fully supervised action recognition. In source-free domain adaptation on the Daily-DA dataset, the accuracy is improved to 49.11%. In the multi-source domain adaptation scenario on the Daily-DA dataset, the Top-1 accuracy reaches 54.63%.
XIE Zhijun, ZHAO Canming, KE Xin, XIAO Yang, WU Jing, SONG Jialei
2025, 14(1):39-49. DOI: 10.12146/j.issn.2095-3135.20240312002
Abstract:This paper presents a compact, low-cost, and efficiently swimming small single-jointed bionic robotic fish, designed after the propulsion mode and fin modularity of the trevally, with pectoral, ventral, and caudal fins that are easy to detach. Underwater experiments were conducted on the straight-line propulsion, static turning, and head stability of the robotic fish, and the effects of the pectoral and ventral fins on swimming performance were investigated. In the swimming tests of the prototype, a high-speed camera and a plane mirror were used to construct a “binocular vision system” that records the movement of the fish, tracking the most anterior part of the fish head and two marking points above the prime point and recording their three-dimensional positions. This provides a reference for the quantitative analysis of swimming performance, attitude change, and head stability. The results showed that the robotic fish had good straight-line propulsion and turning performance. In the stability experiment, the robotic fish equipped with pectoral and ventral fins had better head stability in low-frequency swimming but showed no advantage in high-frequency swimming, which is consistent with the observation that fish in nature hold all fins except the caudal fin against the body during high-frequency swimming.
ZHANG Mingkai, GU Feifei, XIAO Zhenzhong, SHI Shaoguang
2025, 14(1):50-64. DOI: 10.12146/j.issn.2095-3135.20240820001
Abstract:Accurate reconstruction of industrial component edges is essential for visual positioning and quality inspection. To address the difficulty of accurately reconstructing point clouds at the edges of industrial components, a three-dimensional edge reconstruction algorithm based on point cloud projection is proposed. First, the three-dimensional point cloud of the component is obtained by scanning with a binocular structured light method, and edge points are extracted from the scanned point cloud. Then, image edge points are extracted from the binocular images. Subsequently, the point cloud edge points are projected onto the binocular images, and the nearest image edge points are searched around each projected point to obtain corresponding binocular edge points. Finally, accurate three-dimensional edge point clouds are reconstructed using stereo vision methods. Experimental results demonstrate that, compared with other current methods, this approach effectively addresses the issue of false edges caused by interference such as reflections and surface scratches. The reconstructed edge point cloud has high accuracy, with a reconstruction error of less than 0.15 mm, and the method can be applied in industrial scenarios such as bin picking and online quality inspection.
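The projection and nearest-edge search steps summarized in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera with known intrinsics `K` and pose `R`, `t`, and the names `project_points`, `match_nearest_edges`, and `max_dist` are hypothetical.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 world points to pixel coordinates (pinhole model, assumed)."""
    cam = points_3d @ R.T + t        # world frame -> camera frame
    uvw = cam @ K.T                  # apply camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide -> (u, v)

def match_nearest_edges(projected, image_edges, max_dist):
    """For each projected point, return the index of the nearest image
    edge pixel, or -1 if none lies within max_dist pixels."""
    diff = projected[:, None, :] - image_edges[None, :, :]
    dist = np.linalg.norm(diff, axis=2)          # pairwise pixel distances
    idx = dist.argmin(axis=1)                    # nearest edge pixel per point
    too_far = dist[np.arange(len(projected)), idx] > max_dist
    idx[too_far] = -1                            # reject distant matches
    return idx
```

Running the same matching on the left and right images would give pairs of corresponding edge pixels, which could then be triangulated with a standard stereo method to obtain the three-dimensional edge points.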
XU Tao, WANG Shuncheng, ZHONG Jianwen, LIU Dabo, ZHOU Yilong, LIU Chang
2025, 14(1):65-77. DOI: 10.12146/j.issn.2095-3135.20240307001
Abstract:Adenoid hypertrophy (AH) is a key contributor to pediatric obstructive sleep apnea syndrome (OSAS). Physicians rely on nasopharyngeal endoscopy to identify AH and the extent to which the adenoid obstructs the airway. However, due to the limitations of 2D endoscope images, physicians have to subjectively infer the 3D structure of the adenoid region, which depends heavily on their expertise and on the angle at which the adenoids are observed. The adenoid surface is composed of mucosal tissue covered by nasal secretions, and is thus strongly reflective, smooth, and lacking in features. Furthermore, endoscope images of the adenoid are relatively blurred. Based on these characteristics of the adenoids, this paper introduces a multi-view stereo algorithm based on endoscopic image sequences of the adenoid nasopharyngeal cavity. The algorithm first employs multi-view stereo matching to estimate the depth maps corresponding to the images. Subsequently, it fits mesh surfaces to the rough depth information in the depth space, thereby generating smooth and refined depth maps. Finally, the obtained depth maps are fused into a dense and precise reconstruction of the adenoid region. Experimental results on both synthetic and real data demonstrate that the algorithm achieves accurate, dense, and smooth reconstruction of the adenoid area, significantly surpassing existing reconstruction algorithms.
2025, 14(1):78-90. DOI: 10.12146/j.issn.2095-3135.20240422001
Abstract:In this work, a vision-language model is introduced into disease recognition in ophthalmic images, and a multi-disease recognition algorithm based on a pre-trained contrastive language-image model is proposed. First, a multi-label fundus image dataset, MDFCD8, containing 8 categories is constructed from several publicly available fundus image datasets. Then, the generative artificial intelligence model GPT-4 (Generative Pre-trained Transformer 4) is used to generate expert knowledge describing the fine-grained pathological features of fundus images, which addresses the lack of text labels in fundus image datasets. The paper calculates the average precision (AP), F1 score, and area under the receiver operating characteristic curve (AUC), and takes the mean of the three as the final performance evaluation index. The experimental results show that the proposed method outperforms the traditional convolutional neural network and the Transformer network by 4.8% and 3.2%, respectively. Ablation experiments on each module validate the effectiveness of the method, demonstrating the potential of vision-language modeling for the auxiliary diagnosis of ophthalmic diseases.
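The evaluation index described in this abstract, the mean of AP, F1, and AUC, can be illustrated with a short sketch. The function names `f1_score` and `composite_score` are hypothetical, and the abstract does not state how the per-class values are aggregated, so the sketch simply takes the three summary values as inputs.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def composite_score(ap, f1, auc):
    """Unweighted mean of the three metrics, as described in the abstract."""
    return (ap + f1 + auc) / 3.0
```

An unweighted mean treats the three metrics as equally important; a model that is strong on AUC but weak on F1 is penalized accordingly.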
KONG Weikun, ZHONG Cheng, CHEN Wenbo, YU Shuhui, SUN Rong
2025, 14(1):91-104. DOI: 10.12146/j.issn.2095-3135.20240119001
Abstract:Against the backdrop of Moore’s Law approaching its limits and the increasing difficulty and cost of next-generation integrated circuit technologies, advanced substrate technology has emerged as a crucial carrier for supporting I/O enhancement and system integration in advanced packaging. It is also one of the core components of the post-Moore era. Currently, the semi-additive process utilizing build-up film (BF) is one of the primary methods for achieving fine-pitch multilayer packaging substrates. Given the increasingly prominent issue of signal integrity in electronic equipment operating in high-frequency and high-speed environments, this paper discusses in detail the impact of the physical properties and structural characteristics of BF material on signal transmission loss. Based on typical substrate structures, such as microstrip lines and vias, the relationship between BF material parameters and signal transmission performance is studied using an electrical simulation analysis system. It is found that in a microstrip structure, the signal transmission loss increases with frequency, and this loss is closely related to the dielectric loss factor of the BF material. In the via structure, by contrast, the dielectric constant of the BF material significantly influences the equivalent capacitance and the impedance value, which in turn affects impedance mismatch. Although the characteristics of the BF material do have some impact on impedance mismatch, the primary factor affecting impedance matching remains the design of the via structure itself. In addition, the conductor loss resulting from the skin effect increases with copper foil roughness at high frequencies, which offers a crucial reference for quality control of copper foil during the manufacturing of packaging substrates. This study elucidates the mechanism by which the physical properties and structural characteristics of BF material influence signal transmission loss, thereby providing a theoretical foundation for the design and optimization of BF materials with enhanced physical properties for packaging substrates.
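The trends reported in this abstract (loss rising with frequency and dielectric loss factor, and conductor loss rising with copper roughness) can be illustrated with textbook approximations. This sketch is not the simulation used in the paper. It assumes a TEM wave in a homogeneous dielectric for the dielectric attenuation, the classical skin-depth formula, and the empirical Hammerstad roughness correction; the function names and the copper resistivity value are assumptions made for illustration.

```python
import math

MU0 = 4e-7 * math.pi      # vacuum permeability, H/m
C0 = 299_792_458.0        # speed of light, m/s
RHO_CU = 1.68e-8          # assumed copper resistivity, ohm*m

def skin_depth(freq_hz, rho=RHO_CU):
    """Classical skin depth in metres: sqrt(rho / (pi * f * mu0))."""
    return math.sqrt(rho / (math.pi * freq_hz * MU0))

def roughness_factor(rq_m, freq_hz):
    """Hammerstad correction: multiplier (1 to 2) on smooth-conductor loss
    for an RMS surface roughness rq_m in metres."""
    ratio = rq_m / skin_depth(freq_hz)
    return 1.0 + (2.0 / math.pi) * math.atan(1.4 * ratio ** 2)

def dielectric_loss_db_per_m(freq_hz, eps_r, tan_delta):
    """Dielectric attenuation of a TEM wave in a homogeneous dielectric.
    A real microstrip needs an effective permittivity and filling factor."""
    alpha_np = math.pi * freq_hz * math.sqrt(eps_r) * tan_delta / C0
    return 8.686 * alpha_np   # nepers -> decibels
```

Under these approximations the dielectric attenuation scales linearly with both frequency and the loss factor, while the roughness multiplier grows as the skin depth shrinks toward the roughness scale, which matches the qualitative conclusions stated above.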