This paper considers an interesting problem setting in which reward is biased by context. It provides a strong and thorough experimental analysis on real-world problems, including a comparison to several competing techniques, and the results are compelling. On the negative side, the approach relies mainly on existing techniques, and the level of conceptual advance and innovation is rather low; the presentation should be improved to provide a clearer problem statement; and the relationship to prior related work in the field of Operations Research is missing. On balance, the consensus opinion of the reviewers is that the problem and experimental analysis make a sufficient contribution, and that the authors are committed to addressing the improvements mentioned above in their final version. The recommendation is to accept on this presumption.