This paper presents a new analysis of the sample complexity of Gaussian processes (GPs) when the Gaussian covariance is approximated by sparse random Fourier features. The authors show that if the data are assumed to be clustered, then the approximate covariance matrix is close to the true covariance matrix, and in turn that the approximate GP estimator is close to the exact GP estimator, with a number of sparse features smaller than what is needed without the clustering assumption. In a second part, the paper proposes to combine GPs with a prior embedding of the data into a latent space where they are clustered, using a VAE, and reports preliminary experimental results. All reviewers found the theoretical work interesting, novel, and worth presenting at NeurIPS. Exploiting a cluster structure in the data to decrease the number of random features needed to approximate the kernel is a nice idea, and this paper provides a rigorous analysis quantifying how this happens. Reviewers were more critical of the second part, where it is not very clear why the objective function of the VAE is a good one, and they had questions about the computational cost of the overall procedure (given that the goal of using a sparse spectrum approximation is to reduce computational cost). They also did not find the experiments very impressive, and suggested in particular considering larger datasets to assess the scalability of the approach.