The paper gives insights on DSM (Denoising Score Matching) and MCMC methods and links them to probabilistic diffusion models. This connection is novel, and the reviewers agree that the paper makes a good contribution.

Concerns:
• Algorithmically, it is the same algorithm as NCSN, with 1) different hyperparameters motivated by diffusion models (such as the scaling of inputs between stages) and 2) different architectural choices.
• The FID is very low; could this indicate some memorization? Qualitative experiments such as nearest-neighbor retrieval and interpolation are provided, but can you also report FID on a test set, not only on the training set, to measure memorization?

Please include in the final version of the paper all the details given in the rebuttal answers to R2: the main comparison with NCSN, the architecture choices, training time, sampling time, the need for cross-validation, and how long the full training and cross-validation take. While probabilistic diffusion models are elegant, they are computationally intensive; please discuss this in the paper, along with how you think it can be addressed.