Q-learning (machine learning) crawling robot

This robot uses the Q-learning algorithm to learn to crawl. It uses the Arduino code from this project: 

https://planetachatbot.com/q-learning-con-arduino-crawling-robot-espanol-5eb0acf5aaaf

This is a companion discussion topic for the original entry at https://community.robotshop.com/robots/show/q-learning-machine-learning-crawling-robot

Seems the link is broken, so I’ll add it here:

Also the discussion on LMR you mentioned is here: Can anyone interpret this Q-learning code?

Could you maybe add some quick info about what happens when your robot learns, and how Q-learning works?

Edit:
This video roughly explains the process; see also the video description: A Self-Learning Crawling Robot (Q-Learning)

And here is a simpler problem about finding the shortest path between rooms (symbolized by LEDs): https://alidemir1.github.io/qlearning-post/
I think it’s useful for understanding, because it maps directly onto the key part of Q-learning: the reward matrix.
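
To make the reward matrix concrete, here is a minimal C++ sketch of that room example. The 6-room layout and reward values follow the tutorials linked here (100 for a door into the goal room, 0 for any other door, -1 for a wall); the episode count and the printout at the end are my own choices:

```cpp
#include <cstdio>
#include <cstdlib>
#include <algorithm>

// Six rooms, room 5 is the goal. R[s][a] is the reward for going from
// room s to room a: -1 = no door, 0 = door, 100 = door into the goal.
const int N = 6;
const int GOAL = 5;
int R[N][N] = {
  {-1, -1, -1, -1,   0,  -1},
  {-1, -1, -1,  0,  -1, 100},
  {-1, -1, -1,  0,  -1,  -1},
  {-1,  0,  0, -1,   0,  -1},
  { 0, -1, -1,  0,  -1, 100},
  {-1,  0, -1, -1,   0, 100},
};
double Q[N][N] = {{0}};
const double GAMMA = 0.8;  // discount factor, as in the tutorials

double maxQ(int s) {       // best Q value reachable from state s
  double m = 0;
  for (int a = 0; a < N; a++) m = std::max(m, Q[s][a]);
  return m;
}

int main() {
  for (int episode = 0; episode < 1000; episode++) {
    int s = rand() % N;                // start in a random room
    while (s != GOAL) {
      int a = rand() % N;              // try a random action
      if (R[s][a] < 0) continue;       // no door there, pick again
      // The core Q-learning update (no learning rate, as in the tutorials):
      Q[s][a] = R[s][a] + GAMMA * maxQ(a);
      s = a;                           // "go to room a" lands us in room a
    }
  }
  for (int s = 0; s < N; s++) {        // print the learned Q matrix
    for (int a = 0; a < N; a++) printf("%6.1f ", Q[s][a]);
    printf("\n");
  }
}
```

After training, greedily following the largest Q value in each row walks the shortest path to the goal room, which is exactly what the LED example demonstrates.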

Finally, here is a Q-learning simulator: https://www.mladdict.com/q-learning-simulator

Another tutorial: https://people.revoledu.com/kardi/tutorial/ReinforcementLearning/

Unfortunately, they all have small mistakes and imprecisions. But when you follow the examples, you can work out what is really intended. That beats pure theory, where you don’t have enough information to infer the errors or missing pieces, and everything is codified in abstract terms that could have various interpretations.

Thanks maelh, I will post the links you supplied and a description of how the robot works.
Jim.

Excuse me, what shield did you use with the Arduino?

Hi.
I used this one: https://www.ebay.com/itm/V5-Sensor-Shield-Expansion-Board-Shield-For-Arduino-UNO-R3-V5-0-Electric-Module/201741001260
But you have to be careful if you use an external voltage, as it must be 5 volts. I like a higher servo voltage, so I bent the 5-volt pin away so that it does not connect to the Arduino Uno, and then used a 2S LiPo. But that also causes sensor voltage issues: you cannot use the analog and digital pins for sensors, and have to use the 5-volt (EX/A7) pin for sensor VCC, though you can still use the shield’s ground and signal pins. It is probably easiest to just use 5 volts for everything; then there are no problems.
Or use a newer version of the shield. I bought mine in China (as I am located in China).
Jim.

Thanks for your updated post. I’ll check out the reinforcement learning version some time later. It’s interesting to see the various approaches.

I like the new video, looks interesting to watch it learn.

What is the robot base / chassis you use?

Thank you maelh, I always appreciate your comments and input. The chassis is just an acrylic one from one of those two-wheel robot kits. The wheels were scavenged from an RC car. The trailing wheel I was pretty proud of: I made it a one-way wheel by screwing on a sliver of vinyl that catches in the cogs of the gear, which keeps the robot from going backward. I had made it for some other project. I saw some crawling robot projects that work in both directions; might try that one day.

Seems I made the reinforcement learning algorithm more complex than it needed to be. The simplest solution is:

  1. Randomly choose 4 servo positions (2 arm positions).
  2. Move the servos.
  3. Check the distance; if it is greater than the previous best distance, keep those positions.
  4. Loop back to step 1, n times.
  5. In a loop, cycle between the two most successful arm positions.

No need for any arrays.
I got a neural net working with this robot and then realized: what do I need the NN for? Just keep the most successful positions and repeat those (a rough sketch of the loop follows below).
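
In case it helps anyone, here is a minimal Arduino sketch of that loop. The pins, trial count, stroke count, and readDistanceMM() are all placeholders for the actual build:

```cpp
#include <Servo.h>
#include <string.h>

Servo shoulder, elbow;

long bestGain = 0;
int bestA[2] = {90, 90};   // arm position A: shoulder, elbow angles
int bestB[2] = {90, 90};   // arm position B

long readDistanceMM() {
  // Hypothetical: replace with however the robot measures distance
  // traveled (wheel encoder, ultrasonic sensor aimed at a wall, etc.).
  return 0;
}

void moveArm(const int pos[2]) {
  shoulder.write(pos[0]);
  elbow.write(pos[1]);
  delay(300);              // let the servos settle
}

void setup() {
  shoulder.attach(9);      // assumed pins
  elbow.attach(10);
  randomSeed(analogRead(0));

  for (int trial = 0; trial < 50; trial++) {                  // step 4
    int a[2] = { (int)random(0, 180), (int)random(0, 180) };  // step 1
    int b[2] = { (int)random(0, 180), (int)random(0, 180) };
    long start = readDistanceMM();
    for (int i = 0; i < 3; i++) {                             // step 2
      moveArm(a);
      moveArm(b);
    }
    long gain = readDistanceMM() - start;                     // step 3
    if (gain > bestGain) {            // keep the best pair seen so far
      bestGain = gain;
      memcpy(bestA, a, sizeof(a));
      memcpy(bestB, b, sizeof(b));
    }
  }
}

void loop() {                         // step 5: repeat the best gait
  moveArm(bestA);
  moveArm(bestB);
}
```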

Guess I will move on to some other project now.

The point would probably be to generalize in some way from what was learned, or to interpolate between the known cases. For example, learning to crawl in different contexts/environments, such as on a carpet, on uneven terrain, or on more slippery ground.

Ideally, you then let it work in another environment (or at least one different enough), and it will still work.
A simple form of that could be to change the maximum angles that can be used, so it has to bend its arm differently depending on the constraints valid right now (see the sketch below).

In your current setting, it's probably enough to find the ideal values once, instead of generalizing.
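
To illustrate what I mean, a small sketch, where trainEpisode() and the limit ranges are hypothetical placeholders: each episode draws different joint limits, so whatever is learned cannot rely on one fixed range of motion.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical: trainEpisode() stands in for whatever learning
// algorithm is used, and the limit ranges are made up.
struct Limits { int lo, hi; };

void trainEpisode(Limits shoulder, Limits elbow) {
  // ...run one episode, clamping every servo command for each joint
  // into [lo, hi], so the learner has to cope with the constraint...
  printf("episode: shoulder [%d,%d], elbow [%d,%d]\n",
         shoulder.lo, shoulder.hi, elbow.lo, elbow.hi);
}

int main() {
  for (int ep = 0; ep < 20; ep++) {
    // Draw new joint limits each episode.
    Limits s = { rand() % 40, 140 + rand() % 40 };
    Limits e = { rand() % 40, 140 + rand() % 40 };
    trainEpisode(s, e);
  }
}
```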

A more uneven ground could prove more challenging, because it would need vision or some distance sensors to determine the actual distance the arm needs to travel so that the end effector (i.e. “finger”) touches the ground. It would also be more complex, since it’s not a linear motion but two rotations around pivot points. You could figure that out using vector math, but here it could be fun to see whether the training can figure out this relationship by itself.
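
For reference, the vector math is just the forward kinematics of a two-joint arm. A small sketch, with made-up link lengths and angles:

```cpp
#include <cstdio>
#include <cmath>

// Forward kinematics of a two-joint arm: each link rotates about its
// pivot, so the fingertip position is the sum of two rotated vectors:
//   x = L1*cos(t1) + L2*cos(t1 + t2)
//   y = L1*sin(t1) + L2*sin(t1 + t2)
// Link lengths and angles below are made-up example values.

int main() {
  const double PI = 3.14159265358979;
  const double L1 = 60.0, L2 = 40.0;   // link lengths in mm (assumed)
  double t1 = 30.0 * PI / 180.0;       // shoulder angle
  double t2 = -45.0 * PI / 180.0;      // elbow angle, relative to link 1
  double x = L1 * cos(t1) + L2 * cos(t1 + t2);
  double y = L1 * sin(t1) + L2 * sin(t1 + t2);
  // y tells you how far above/below the shoulder pivot the fingertip
  // sits, i.e. whether it reaches the ground for a given terrain height.
  printf("fingertip at (%.1f, %.1f) mm\n", x, y);
}
```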

Maybe this is easier to do in a robot simulator.

Edit: This makes me think back to a project with a robot arm that tried to grab something and drop it again. Since there is quite some mechanical play, it is difficult to just replay a recorded sequence. Maybe this would work better using a neural net and training it several times. Though it would still not be able to compensate for unmeasured/unsensed differences in angular positions, since the potentiometers just won’t be rotated immediately, due to the mechanical play.

I have another project I am working on, using neural nets: a modified RC car driving along a track made out of paper sheets. Training is proving challenging.

Yeah, I get my robot working and think, “that’s cute” and then wander off on a tangent without really investigating further.
Focus is not my forte.
Will be interesting to see your code when you get yours working.
Have been working on one of my goofy machine diversions and will post a video soon.
End of term approaching so will have to get back to my real job now.