The paper describes new offline and online techniques for optimizing the policy of continuous-time POMDPs with discrete states and actions. It makes an important contribution to the RL and control literature: very little work in the ML community has focused on continuous-time control problems. While the techniques assume a known model, do not scale to high-dimensional problems, and were tested only on toy problems, they introduce new formalisms that will help the community become familiar with the mathematics of continuous-time control. Hence this paper will be of high interest to the RL community.