talks
Mar 27, 2025 | Diffusion-Based Neural Samplers: A Systematic Review and Open Questions Computational Statistics and Machine Learning seminar at Imperial College London Abstract: Sampling from unnormalized densities is a fundamental task in machine learning. Recently, motivated by the success of diffusion models, diffusion-based neural samplers have started to gain attention. This talk provides a systematic review of these samplers according to their choices of sampling process and training objective. By combining different sampling processes with different objectives, we can recover almost all diffusion/control-based neural samplers in the recent literature. We then consider a potential approach to achieve simulation-free training. Although promising in theory, this method ultimately encounters severe mode collapse. In fact, on closer inspection, we find that nearly all successful neural samplers rely on Langevin preconditioning to avoid mode collapse, raising important questions about their efficiency and shedding light on future explorations of these neural samplers. |
---|---|
Feb 27, 2025 | Pursuits and Challenges Towards Simulation-free Training of Neural Samplers Sampling Reading Group at Mila Abstract: We consider the sampling problem, where the aim is to draw samples from a distribution whose density is known only up to a normalization constant. Recent breakthroughs in generative modeling to approximate a high-dimensional data distribution have sparked significant interest in developing neural network-based methods for this challenging problem. However, neural samplers typically incur heavy computational overhead due to simulating trajectories during training. This motivates the pursuit of simulation-free training procedures for neural samplers. In this work, we propose an elegant modification to previous methods that allows simulation-free training with the help of a time-dependent normalizing flow. However, it ultimately suffers from severe mode collapse. On closer inspection, we find that nearly all successful neural samplers rely on Langevin preconditioning to avoid mode collapse. We systematically analyze several popular methods with various objective functions and demonstrate that, in the absence of Langevin preconditioning, most of them fail to adequately cover even a simple target. Finally, we draw attention to a strong baseline obtained by combining the state-of-the-art MCMC method, Parallel Tempering (PT), with an additional generative model, to shed light on future explorations of neural samplers. This talk is based on the paper No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers, and was delivered jointly with Yuanqi Du. |
Feb 27, 2025 | Diffusion Neural Sampler MLG Reading Group at University of Cambridge Abstract: Sampling from unnormalized densities is a fundamental task in machine learning. Recently, motivated by the success of diffusion models, diffusion-based neural samplers have started to gain attention. In this talk, we will look at several recently developed diffusion-based neural samplers and discuss their design choices. We classify diffusion/control-based neural samplers along two axes: their choice of sampling process and their training objective. By combining different sampling processes with different objectives, we can recover almost all diffusion/control-based neural samplers in the recent literature. |
Dec 15, 2024 | Getting free Bits Back from Rotational Symmetries in LLMs Workshop on Machine Learning and Compression at NeurIPS 2024 Abstract: Current methods for compressing neural network weights, such as decomposition, pruning, quantization, and channel simulation, often overlook the inherent symmetries within these networks and thus waste bits on encoding redundant information. We propose a format based on bits-back coding for storing rotationally symmetric Transformer weights more efficiently than the usual array layout at the same floating-point precision. We evaluate our method on Large Language Models (LLMs) pruned by SliceGPT (Ashkboos et al., 2024) and achieve a 3-5% reduction in total bit usage for free across different model sizes and architectures without impacting model performance within a certain numerical precision. |
Nov 28, 2024 | Diffusion-Inspired Sampling for Multimodal Distributions Computational Statistics and Machine Learning seminar at Imperial College London Abstract: Sampling from unnormalized densities has long been a challenge in statistics, machine learning, and molecular simulations. Traditional MCMC algorithms often struggle when the target distribution contains multiple distinct modes. Recent advances in diffusion models highlight the effectiveness of Gaussian convolutions in bridging and merging modes. We describe two approaches: a training-free MCMC sampler and a neural sampler trained with a novel diffusive divergence. Inspired by the diffusion process, both approaches enable efficient sampling from multi-modal distributions (a minimal illustrative sketch follows the table). |
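
Several of the talks above point to the same two ingredients for sampling multi-modal targets: Gaussian smoothing of the target density and Langevin-style (score-based) updates, i.e. Langevin preconditioning. The sketch below is a minimal, self-contained illustration of that idea, not code from any of the talks or papers: it runs annealed Langevin dynamics on a toy 1-D bimodal density, and the target, noise schedule, step size, and all function names are illustrative assumptions.

```python
# Minimal sketch: annealed (Gaussian-smoothed) Langevin dynamics on a toy
# 1-D bimodal target. Illustrative only; all choices below are assumptions.
import numpy as np

def log_density(x, sigma):
    # Unnormalized log-density of an equal mixture of N(-3, 1) and N(+3, 1),
    # convolved with a Gaussian of scale sigma (each mode has variance 1 + sigma^2).
    s2 = 1.0 + sigma**2
    return np.logaddexp(-(x - 3.0) ** 2 / (2 * s2), -(x + 3.0) ** 2 / (2 * s2))

def score(x, sigma, eps=1e-4):
    # Finite-difference approximation of the score; autodiff would be used in practice.
    return (log_density(x + eps, sigma) - log_density(x - eps, sigma)) / (2 * eps)

def annealed_langevin(n_chains=2000, n_levels=10, n_steps=50, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=4.0, size=n_chains)       # broad Gaussian initialization
    for sigma in np.linspace(3.0, 0.0, n_levels):  # anneal the smoothing towards the target
        for _ in range(n_steps):
            noise = rng.normal(size=n_chains)
            x = x + step * score(x, sigma) + np.sqrt(2.0 * step) * noise
    return x

if __name__ == "__main__":
    samples = annealed_langevin()
    # With the smoothing schedule, both modes should end up roughly equally populated.
    print("fraction of samples near +3:", np.mean(samples > 0.0))
```

Without the annealing, plain Langevin chains initialized near one mode rarely cross to the other; it is the Gaussian smoothing that bridges the modes, which is the effect the Nov 28 abstract alludes to.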