Diffusing Gaussian Mixtures for Categorical Data

Learning a categorical distribution comes with its own set of challenges. A successful approach taken by state-of-the-art works is to cast the problem in a continuous domain to take advantage of the impressive performance of generative models for continuous data. Amongst them are the recently emerging diffusion probabilistic models, which have the observed advantage of generating high-quality samples. Recent advances in categorical generative models have focused on log-likelihood improvements. In this work, we propose a generative model for categorical data based on diffusion models with a focus on high-quality sample generation, and we propose sample-based evaluation methods.

The efficacy of our method stems from performing diffusion in the continuous domain while having its parameterization informed by the categorical structure of the target distribution. Our evaluation method highlights the capabilities and limitations of different generative models for generating categorical data.

Sphere packing

To map the data to the continuous space, we use a sphere packing algorithm to set a fixed yet flexible encoding.

Idea: solve a sphere packing problem, i.e., fit $K$ well-separated balls in a $d$-dimensional real space.
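
A common way to formalize this is to choose $K$ centers that maximize their minimal pairwise distance; the unit-ball domain below is an illustrative assumption (the paper may constrain the centers differently):

$\mathbf{u}^*_1, \dots, \mathbf{u}^*_K = \arg\max_{\mathbf{u}_1, \dots, \mathbf{u}_K \in \mathcal{B}(\mathbf{0}, 1)} \; \min_{i \neq j} \|\mathbf{u}_i - \mathbf{u}_j\|_2$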

We can then use the solution to this problem to set the mean and variance of an encoding distribution conditioned on the category: $q(\cdot \mid x) = \mathcal{N}(\mathbf{u}^*_x, \sigma)$, where $\sigma$ is derived from the minimum distance between $\mathbf{u}^*_1, \dots, \mathbf{u}^*_K$.
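
To make this concrete, here is a minimal sketch of how such an encoding could be constructed; it is not the authors' code, and the optimizer, the soft-min surrogate, the unit-ball projection, and the rule for $\sigma$ are all assumptions:

```python
# A minimal sketch of the sphere-packing encoding, not the authors' code:
# the Adam optimizer, the soft-min surrogate, the unit-ball constraint,
# and the sigma rule below are all illustrative assumptions.
import torch

def sphere_packing_centers(K: int, d: int, steps: int = 2000, lr: float = 0.05) -> torch.Tensor:
    """Place K centers in the unit ball of R^d by maximizing their minimal pairwise distance."""
    u = torch.randn(K, d, requires_grad=True)
    opt = torch.optim.Adam([u], lr=lr)
    beta = 50.0  # sharpness of the soft-min surrogate
    for _ in range(steps):
        pair_d = torch.pdist(u)  # distances for all pairs i < j
        # Smooth surrogate of -min(pair_d): minimizing it pushes the closest pair apart.
        loss = torch.logsumexp(-beta * pair_d, dim=0) / beta
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():  # project centers back into the unit ball
            u.div_(u.norm(dim=1, keepdim=True).clamp(min=1.0))
    return u.detach()

centers = sphere_packing_centers(K=3, d=2)
sigma = torch.pdist(centers).min().item() / 6.0  # sigma from the min distance (illustrative rule)

def encode(x: int) -> torch.Tensor:
    """Draw a continuous code from q(. | x) = N(u*_x, sigma^2 I)."""
    return centers[x] + sigma * torch.randn_like(centers[x])
```

Because the centers are fixed once the packing problem is solved, the encoding does not have to be learned jointly with the generative model, unlike a learned embedding.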

Figure: comparison of encodings for 3 categories.
- Argmax encoding: 3 dimensions are needed to model 3 categories.
- Learned encoding: 2 dimensions can model 3 categories.
- Sphere packing encoding (ours): 2 dimensions can model 3 categories.

Parameterizing the denoising step

This parameterization induces a structure on the target distribution, which enables us to design a task-cognizant Gaussian mixture denoising function:

$p_{\theta}(\mathbf{Z}^{t-1}_{(s)} \mid \mathbf{Z}^{t}) = \sum_{x_{(s)}} p_{\theta}(x_{(s)} \mid \mathbf{Z}^{t}, t) \, p(\mathbf{Z}^{t-1}_{(s)} \mid \mathbf{Z}^{t}_{(s)}, x_{(s)})$

where $\mathbf{Z}^t_{(s)} \in \mathbb{R}^d$ is the $t$-representation of $x_{(s)}$ (with $\mathbf{Z}^t \in \mathbb{R}^{S \times d}$). We can obtain a closed-form expression for the conditional $p(\mathbf{Z}^{t-1}_{(s)} \mid \mathbf{Z}^{t}_{(s)}, x_{(s)})$: it is a multivariate Gaussian with fixed parameters. Hence we only have to learn the mixture weights $p_{\theta}(x_{(s)} \mid \mathbf{Z}^t, t)$ to learn the denoising step.
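
The sketch below illustrates one reverse step under this structure. It is an assumption-laden illustration, not the paper's exact parameterization: `classifier` is a hypothetical network mapping $(\mathbf{Z}^t, t)$ to logits over the $K$ categories, and the fixed component parameters follow a standard DDPM-style posterior with the encoding mean $\mathbf{u}^*_x$ standing in for the clean signal.

```python
# A minimal sketch of one reverse (denoising) step. Assumptions, not the
# paper's exact parameterization: `classifier` is a hypothetical network
# mapping (Z^t, t) to logits over the K categories, and the fixed component
# parameters follow a standard DDPM-style posterior with the encoding mean
# u*_x standing in for the clean signal.
import math
import torch

def denoise_step(z_t: torch.Tensor, t: int, classifier, centers: torch.Tensor,
                 alpha_bar_t: float, alpha_bar_prev: float, beta_t: float) -> torch.Tensor:
    """Sample Z^{t-1} from the Gaussian mixture p_theta(Z^{t-1} | Z^t)."""
    # Learned mixture weights p_theta(x | Z^t, t), one distribution per position s.
    weights = torch.softmax(classifier(z_t, t), dim=-1)        # (S, K)
    x = torch.multinomial(weights, num_samples=1).squeeze(-1)  # one category per position
    u = centers[x]                                             # (S, d) encoding means u*_x
    # Fixed-parameter component p(Z^{t-1} | Z^t, x): closed-form Gaussian.
    coef_u = math.sqrt(alpha_bar_prev) * beta_t / (1.0 - alpha_bar_t)
    coef_z = math.sqrt(alpha_bar_t / alpha_bar_prev) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    var = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * beta_t
    mean = coef_u * u + coef_z * z_t
    return mean + math.sqrt(var) * torch.randn_like(z_t)
```

Sampling a category from the learned weights and then drawing from the corresponding fixed Gaussian component is exactly a draw from the mixture, so all of the learning is confined to the mixture weights.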

Team

Florence Regol · Mark Coates

Citation

This project was published in AAAI 2023.

@inproceedings{regol2023,
  title = {Diffusing Gaussian Mixtures for Generating Categorical Data},
  author = {Regol, Florence and Coates, Mark},
  booktitle = {Proc. AAAI Conf. on Artificial Intelligence},
  year = {2023}
}
ArXiv link