Seems the link is broken, so I’ll add it here:
Also the discussion on LMR you mentioned is here: Can anyone interpret this Q-learning code?
Could you maybe add some quick info about what happens when your robot learns, and how Q-learning works?
Edit:
This video roughly explains the process (see also the video description): A Self-Learning Crawling Robot (Q-Learning)
And here is a simpler problem about finding the shortest path between rooms (symbolized by LEDs): https://alidemir1.github.io/qlearning-post/
I think this example is worth understanding, because it maps well to the key part of Q-learning: the reward matrix.
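To make the reward-matrix idea concrete, here is a minimal sketch in Python. The five-room layout, the reward of 100 for entering the goal room, and the discount factor of 0.8 are all made up for illustration and are not taken from the linked post. It uses the simplified update Q(s, a) = R(s, a) + gamma * max Q(s', all actions), i.e. a learning rate of 1, which only suits a deterministic toy problem like this one.

```python
import random

GAMMA = 0.8  # discount factor (assumed value)
GOAL = 4     # hypothetical goal room

# Hypothetical reward matrix: R[s][a] is the reward for moving from
# room s to room a; None means there is no door between them.
R = [
    [None, 0,    None, None, None],  # room 0 -> 1
    [0,    None, 0,    0,    None],  # room 1 -> 0, 2, 3
    [None, 0,    None, None, 100],   # room 2 -> 1, 4 (goal)
    [None, 0,    None, None, None],  # room 3 -> 1 (dead end)
    [None, None, None, None, None],  # room 4: goal, treated as terminal
]


def doors(state):
    """Rooms that can be reached directly from `state`."""
    return [a for a, r in enumerate(R[state]) if r is not None]


def train(episodes=500, seed=0):
    """Fill the Q-table by wandering randomly until the goal is reached."""
    rng = random.Random(seed)
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]
    starts = [s for s in range(n) if s != GOAL]
    for _ in range(episodes):
        state = rng.choice(starts)
        while state != GOAL:
            action = rng.choice(doors(state))  # pure exploration
            # Simplified update (learning rate 1): immediate reward plus
            # the discounted best value available from the next room.
            Q[state][action] = R[state][action] + GAMMA * max(Q[action])
            state = action  # moving through a door lands in that room
    return Q


def greedy_path(Q, start, max_steps=20):
    """Follow the highest Q-value from `start` until the goal."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        state = path[-1]
        path.append(max(doors(state), key=lambda a: Q[state][a]))
    return path


if __name__ == "__main__":
    Q = train()
    print(greedy_path(Q, 0))  # [0, 1, 2, 4]
```

The reward matrix only says that stepping into the goal room is worth 100. The Q-table learned from it spreads that value backwards through the doors (100, 80, 64, ...), so following the largest Q-value from any room gives a shortest route.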
Finally, here is a simulator for Q-learning: https://www.mladdict.com/q-learning-simulator
Another tutorial: https://people.revoledu.com/kardi/tutorial/ReinforcementLearning/
Unfortunately, they all contain small mistakes or imprecisions. But if you follow the examples, you can work out what is actually intended. That beats pure theory, where you don’t have enough information to spot the errors or fill in what is missing, and everything is stated in abstract terms that allow several interpretations.