UNIST Researchers Recognized for Data Distillation Excellence at ICLR 2026

Professor Jae-Young Sim’s team has had two papers accepted for presentation at ICLR 2026.

JooHyeon Heo
2026.06.11

Professor Jae-Young Sim and his team from the Graduate School of Artificial Intelligence at UNIST have had two papers accepted for presentation at the International Conference on Learning Representations (ICLR) 2026, one of the most prestigious venues in machine learning research.

Their work addresses one of the fundamental challenges in deploying artificial intelligence (AI) at scale: managing vast datasets—often comprising millions of images or extensive 3D scans—that demand enormous computational power and energy. To overcome this, the team developed advanced methods for dataset distillation, a process that synthesizes compact, highly representative datasets capable of training models effectively while dramatically reducing training time, computational costs, and energy consumption.

■ Advancing 3D Point Cloud Compression via Dataset Distillation

A key challenge tackled by the team involves the compression of 3D point cloud data—crucial for applications such as autonomous vehicles and robotic perception. Unlike traditional images, point clouds are irregular and unordered, making them difficult to compress efficiently with conventional methods. Existing approaches often rely on storing a single high-resolution synthetic sample, which limits diversity and robustness within constrained memory budgets.

To address this, Professor Sim and his team—comprising Dongwook Kim and Jae-Young Yim—developed a novel framework tailored specifically for 3D data. Instead of a single representative sample, the method employs multiple low-resolution anchor point clouds. These anchors are then smoothly transformed into a variety of synthetic shapes through a learnable shape-morphing mechanism, enabling the generation of diverse, high-quality 3D samples within a limited memory footprint. To preserve structural fidelity, the researchers introduced the Uniformity-Aware Matching Loss function, which maintains the geometric relationships inherent in the original data.
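The idea of expanding a few stored anchors into many synthetic shapes can be illustrated with a minimal sketch. It assumes the simplest possible morphing, a convex blend of anchors that share a point ordering, with blend weights obtained from a softmax over per-shape logits. This is an illustrative stand-in, not the paper's morphing mechanism, and the names `morph_anchors`, `anchors`, and `logits` are hypothetical. In a real pipeline the logits and anchors would be learned by gradient descent; here they are random.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def morph_anchors(anchors, logits):
    """Blend K anchor point clouds into M synthetic shapes.

    anchors: (K, N, 3) low-resolution anchor point clouds (assumed to share point order).
    logits:  (M, K) blend parameters, one row per synthetic shape (hypothetical).
    Returns an (M, N, 3) array of synthetic point clouds.
    """
    weights = softmax(logits, axis=1)  # each row sums to 1
    return np.einsum("mk,knd->mnd", weights, anchors)

rng = np.random.default_rng(0)
anchors = rng.normal(size=(3, 64, 3))  # 3 anchors with 64 points each
logits = rng.normal(size=(8, 3))       # parameters for 8 synthetic shapes
shapes = morph_anchors(anchors, logits)

stored_values = anchors.size + logits.size  # what must be kept in memory
dense_values = shapes.size                  # cost of storing every shape directly
```

Under these assumptions, the stored parameters (anchors plus blend logits) take fewer values than the eight generated shapes would if stored directly, which is the kind of memory saving the anchor-based design aims for.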

Evaluations on multiple benchmark datasets, including ModelNet10, ModelNet40, and ShapeNet, showed that this approach surpasses existing methods and remains accurate even under severe compression. Notably, in the most constrained scenario, using only a single synthetic sample per class, recognition accuracy increased from 35.9% to 87.7%, a gain of 51.8 percentage points.

Dongwook Kim, Jae-Young Yim, and Jae-Young Sim, “Parameterization-Based Dataset Distillation of 3D Point Clouds through Learnable Shape Morphing,” ICLR 2026.

■ Enabling Continual Learning through Dynamic Dataset Updating

Beyond static data compression, the team addressed continual learning in dynamic environments, where new data keeps arriving and models must adapt without losing prior knowledge. Traditional dataset distillation approaches often require creating a separate synthetic dataset for each new batch, which increases storage and computational costs. More critically, repeatedly updating a fixed synthetic dataset risks catastrophic forgetting, where previously learned information is overwritten.

To tackle this, the researchers developed an Asymmetric Synthetic Data Update strategy. This innovative approach dynamically adjusts the influence of each synthetic sample during updates, allowing the dataset to incorporate new information while retaining past knowledge. A bi-level meta-learning framework automatically determines optimal per-sample update rates, balancing stability and plasticity. This enables the synthetic dataset to evolve continuously, effectively mitigating forgetting and maintaining high performance over time.
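The effect of per-sample update rates can be shown with a small sketch. It assumes each synthetic sample has a rate in (0, 1), obtained from a sigmoid over a per-sample parameter, that scales how far the sample moves along its gradient on new-domain data. The function `asymmetric_update` and its arguments are hypothetical, and the rates are set by hand here; in the described method they would be determined by the bi-level meta-learning framework, which this sketch does not implement.

```python
import numpy as np

def asymmetric_update(synthetic, grads, rate_logits, base_lr=0.1):
    """Apply one gradient step with a separate update rate per synthetic sample.

    synthetic:   (M, D) synthetic samples, flattened to D features each.
    grads:       (M, D) gradients of a loss on new-domain data (assumed given).
    rate_logits: (M,) per-sample parameters; sigmoid maps them to rates in (0, 1).
    """
    rates = 1.0 / (1.0 + np.exp(-rate_logits))
    return synthetic - base_lr * rates[:, None] * grads

synthetic = np.zeros((4, 2))
grads = np.ones((4, 2))
# Hand-set for illustration: the first two samples are nearly frozen (stability),
# the last two are updated at close to the full base rate (plasticity).
rate_logits = np.array([-20.0, -20.0, 20.0, 20.0])
updated = asymmetric_update(synthetic, grads, rate_logits)
```

In this toy setting the first two samples stay essentially unchanged and so keep what they encoded from earlier domains, while the last two absorb the new gradient, all within the same fixed-size synthetic set.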

Experimental results demonstrate that this method facilitates efficient lifelong learning within fixed storage capacities, making it highly suitable for real-world applications, such as autonomous vehicles, robotics, and edge devices.

Professor Sim remarked, “Our work exemplifies how intelligent data synthesis and adaptive updating strategies can make AI systems more resource-efficient, scalable, and capable of lifelong learning. We believe these advancements will accelerate the deployment of smarter, more sustainable AI in diverse real-world environments.” The study was led by first author Minyoung Oh.

Minyoung Oh and Jae-Young Sim, “Asymmetric Synthetic Data Update for Domain Incremental Dataset Distillation,” ICLR 2026.