UR3 Deep Q-learning