Super Droid Bot "Anna" - w/ Learning AI

dannyjeffrey · June 28, 2014, 4:57pm

The Face (Android)

Hello, I just want to say, what a fantastic robot! It’s just… amazing.

I was just wondering how on earth you get the face on the phone, and how exactly you set it all up?

Even a vague response would be greatly appreciated, thank you!

mtriplett · July 2, 2014, 12:22am

re: Building a Face

Thanks much for the feedback. Not knowing your background, I’ll try to sum up how to build a face at a high level.

1) Write a Java Android Application - as an IDE, I use Eclipse. You can plug your phone into your computer, deploy, run, and debug on a PC.

2) Add an Activity to your App - (this is a visible form or window on other platforms) For me, this Activity is also a “USB Accessory”, and so it automatically runs when plugged into an Arduino through USB. You’ll have to write some tricky code to talk through USB though. My activity is also hooked into OpenCV (a separate library) and so it receives video frames from OpenCV of the front facing camera on the phone.

3) I built the face as a custom “View”; this would be called a “Custom Control” if you are a Windows person. There are other ways to do it; this is just how I chose to.

4) Displaying the face is about drawing graphics (circles, arcs, lines, etc.), for whatever you can imagine, to your custom “View”. The possibilities are endless. The eyes/nose are pretty easy; eyelids and blinking are what made the eyes come to life in my opinion. The mouth was the hardest for me…piecing arcs together to form a mouth was tricky. You can also build overlays like video, sonar, etc. Android will let you draw graphics that are semi-transparent, which is what I did for the video overlay eyepiece, which is drawn in mirrored view as it would be if the robot was looking at it on “Google Glass”. There is another one that draws a display of sonar.

5) Redraw the face several times a second, either on some fixed timer or some other frequent trigger. I chose to redraw it every fourth video frame, or about 5 times a second when OpenCV is running at 20 frames per second.

6) It helps if this varies so that the face will appear to animate. I use a combination of emotions, light level, color blobs recognized by OpenCV (to change where the eyes are looking), and speech. I throw out random movements of the mouth within an acceptable range from what the mouth would be in the current emotional state.

7) You can add “touch” events to your face, recognize gestures, etc. For me, I just make the phone listen for speech if I touch the face. You can also build menus or other “Views” to your heart’s content. I have other views for the OpenCV features, changing settings, etc.
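
Steps 5 and 6 can be sketched in a few lines. This is only an illustration in Python, not the Android Java the face is written in, and every name and number in it (`REDRAW_EVERY`, `MOUTH_BASE`, `MOUTH_JITTER`, `jittered_mouth`) is made up for the example. It shows the frame counting and the "random, but within a range for the current emotion" idea, nothing more.

```python
import random

REDRAW_EVERY = 4      # redraw on every fourth video frame

# Hypothetical resting mouth curvature per emotion: -1.0 = frown, 1.0 = smile.
MOUTH_BASE = {"happy": 0.8, "neutral": 0.0, "sad": -0.6}
MOUTH_JITTER = 0.15   # how far the mouth may wander from its resting shape


def should_redraw(frame_index):
    """True on every fourth frame, i.e. about 5 redraws a second at 20 fps."""
    return frame_index % REDRAW_EVERY == 0


def jittered_mouth(emotion, rng=random):
    """Pick a mouth curvature near the resting value for this emotion."""
    base = MOUTH_BASE[emotion]
    value = base + rng.uniform(-MOUTH_JITTER, MOUTH_JITTER)
    return max(-1.0, min(1.0, value))  # clamp to the drawable range


if __name__ == "__main__":
    for frame in range(20):  # one second of video at 20 fps
        if should_redraw(frame):
            print(frame, round(jittered_mouth("happy"), 2))
```

On Android the value returned by something like `jittered_mouth` would feed the arc-drawing code inside the custom View's draw method.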

I was familiar with android apps, but not building graphics when I wrote the face…it took a couple days to get past the learning curve and write the face, mostly because of the arcs in the mouth…haven’t touched the code since. The overlays were probably another day or so.

Hope this helps. I use a book on Android programming as a reference and Google a lot if I get stuck. Happy coding!

dangerousthing · July 3, 2014, 2:00pm

Absolutely wonderful!
Wow!

I don’t think I’ve ever seen an AI this useful or detailed built in such a short time. And I thought I was being ambitious hoping that I could give my robot Groucho a sense of humor.

I’ve just gotten back into robotics again after a decade out, and I’m building three bots currently, one of them a wheelchair based robot to be built after I build a test robot. I also come from a programming background and can see a great many educational uses for this sort of AI. If you need a collaborator, I’m more than willing to help.

Though I’m running on Linux (Groucho’s brain), I have enough experience to do conversions when necessary. I’m not using Androids as robot brains for various reasons. I also have a few Windows machines around the house if necessary.

I have problems with our web connection, and would love it if you made your AI open-source. However, I do understand objections to doing this - mainly that it would be more work for you. It wouldn’t prevent you from selling a newer version though, and this sounds like a salable product.

Sometime in the last couple of months I read of a research AI that could take a concept and decide if various things were similar or not by using Google images to compare things once they have text to compare. For example, under “dog” it ran various images of dogs and tried to figure out which were similar to other dog concepts. For example, poodle was not similar to German Shepherd. The idea was that it could run autonomously. And now I can’t find it in my bookmarks. Argh!

mtriplett · July 3, 2014, 8:52pm

Re: Absolutely wonderful!

Thanks so much for the positive feedback. Seriously.

Good luck with the sense of humor. Sense of humor is tough in my opinion…I taught Anna to play “Who’s on first?”, the famous Abbott and Costello routine. It doesn’t come off all that funny just yet. Something in the humor is lost by the robotic speech delivery and slight pauses. I’m considering a learning routine where the bot learns and tags its responses from me saying “That is funny”, “That is rude”, “That is ____”, and then uses this “tagging” of responses later depending on whether the mood is funny, serious, whatever. She says funny things now but it’s by accident of quoting funny people. She doesn’t know what she said is funny or choose to be funny yet. That will change.
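
The tagging idea is still only a plan, so the following is a rough sketch of how it might look, not anything Anna does today. It is written in Python for brevity, and the class and method names (`ResponseTagger`, `hear_feedback`, `responses_for_mood`) are invented for illustration.

```python
from collections import defaultdict


class ResponseTagger:
    """Remembers the last thing the bot said and tags it from spoken feedback."""

    def __init__(self):
        self.tags = defaultdict(set)  # response text -> set of tags
        self.last_response = None

    def say(self, response):
        self.last_response = response
        return response

    def hear_feedback(self, sentence):
        """Handle feedback of the form 'That is <tag>'; return the tag or None."""
        words = sentence.lower().rstrip(".!").split()
        if len(words) == 3 and words[:2] == ["that", "is"] and self.last_response:
            self.tags[self.last_response].add(words[2])
            return words[2]
        return None

    def responses_for_mood(self, mood):
        """All responses that were tagged with this mood, in sorted order."""
        return sorted(r for r, t in self.tags.items() if mood in t)


if __name__ == "__main__":
    tagger = ResponseTagger()
    tagger.say("Who's on first?")
    tagger.hear_feedback("That is funny")
    print(tagger.responses_for_mood("funny"))
```

Later, when the mood is "funny", the bot could prefer responses returned by `responses_for_mood("funny")`.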

I just saw the HelloSpoon robot about 10 minutes ago and my brain is spinning about putting Anna on an actual wheelchair (with a person) and an arm…I’m quite preoccupied with the possibilities for the handicapped.

You brought up Linux, so I must say that I feel a need to dive in and get some Linux hardware going. Should I get a Pi? What would you recommend? Android runs on a version of Linux, but how much similarity there is I have no idea. Most of the verbal capabilities on my bot are using C# on Windows 7, and SQL Server…it’s what I knew from my Silicon Valley days. I keep banging against the need to do some serious image processing along the lines of what you are talking about (What image in the DB is most similar to the current image? What objects does this image seem to contain?) and it seems like I might need Linux instead of Windows.

I keep entertaining the open source option. I’d like to find a way to collaborate with more folks without having so many copies/versions emerge. I’m more used to source control (check in/check out) and am new to open source and how to manage things. The code is still immature and evolving, and I want to find a way to benefit from each other rather than just copying/branching and working in silos.

If you find that link pass it on. Thanks again and good luck with Groucho.

dangerousthing · July 4, 2014, 4:54am

Ayup, Comedy is hard…
When I do come across that bookmark I’ll get it to you.

I will admit that open source projects are difficult. You get more people helping you, but then you need a “manager” to keep all the code coherent, which means that the manager (probably you) has less time to code.

Have you thought of using ROS (www.ros.org) to handle the more physical parts of your robot?

I very much like how you’ve compartmentalized your code such that you can change one part of it without messing up the entire system. If you knew how long I tried to teach my former unit at Penn State to do this… It was even more futile trying to convince my managers to release the educational software we wrote as open source rather than just sitting on it.

As for learning Linux I see two main ways to go.

One is to use the Raspberry Pi (the RasPi). This has the advantage of a large community that is dedicated to education. The disadvantage is that it is a fairly slow computer by current standards, but that isn’t always bad.

The other way is to take a PC and put Linux on it. It’s faster, has more memory and disk, that sort of thing. Groucho runs on an Intel NUC (i5, 16 GB RAM, 1.25 TB SSD); however, you don’t need that much.

I put Ubuntu on my Linux boxes when possible. If going the RasPi route, I’d stick with Raspbian so that the examples in the various docs will work correctly.

Steam Automaton Groucho is the main robot in the Troupe, along with Zeppo, Harpo, and Chico. Groucho coordinates the others via wifi (that’s the idea, anyway), because the others will be smaller robots running on BeagleBone Blacks. I’m hoping to have Zeppo up shortly (a few weeks as I find the time). Eventually I’ll get to the detailing to make a somewhat Steampunk theme for the bots; my 3d printer should come in sometime!

I had a major setback with Groucho, because the battery won’t fit into the wheelchair frame without a lot of metalwork - cutting at the very least, and I don’t do that any more. So I may build the base from plywood and aluminum extrusions using the wheelchair motors and casters. Hmmmm, I may be able to cobble something together from the two frames I have…

Have to try to get Lee to get some sleep. It’s almost 5am and she’s still watching TV or web surfing.

TTFN

deltabravo · July 6, 2014, 5:50am

Great project!
Looking at what has already been done along the lines of my project, I found yours… What a great project! It’s almost mine! Since I’m already a year behind you, I have hard work ahead to achieve it, so that my bot can one day talk with yours…

After being impressed by your robot, and then a little sad on realizing how far ahead you are, I found that Anna became a source of hope, seeing what can be done in one year (maybe 2 with my jammed programming skills).

So of course I have a lot of questions to ask, if you have some time to answer them. First, where did you learn what you have implemented concerning machine learning (management of a knowledge database for robots, learning agents, …)? Second, which speech API are you using? Is it the Microsoft one, which is usable offline, the Google one, which needs to be online, or Sphinx under Linux? And would you be kind enough to explain your choice? I have many more questions, but I’ll keep them for next time…

Keep it great!

Regards,

Delta.

dangerousthing · July 7, 2014, 5:05pm

Questions about speech to text and concepts.
Martin,

I also have some questions if you’re willing to answer them.

1) What do you use for the speech-to-text part? Unfortunately this is one place where Linux is fairly weak, but things are advancing.
2) What do you use to parse the “sentences”?
3) Anna seems to learn concepts. How do you handle the cases where something has two obvious meanings, for example, Nirvana? How do you store the concepts learned?

As for humor, I’ve got a few ideas for tagging things as humor. An old book that touched on the subject started me thinking along this way. The book was Stranger in a Strange Land by Robert Heinlein. In the book, an old man had to teach a young man who had been raised by Martians about humor. He broke humor into a few types: Funny Once, Funny Always, and Funny Never (there may have been more categories, but I haven’t read the book in over 40 years).

I’ve also got some rough ideas floating in my morass of a mind about how to use a list of real quotes to make new statements of the same type. For example, given a long list of Groucho Marx quotes, it should be able to come up with other “quotes” that sound like Groucho might have said them. However, getting this to reality may take a bit of time.

mtriplett · July 7, 2014, 7:29pm

Re: Great Project!

Hey, thanks. I’m so glad to hear you are doing a similar project. I took a year, but I bet it could be done in 2-3 months if someone had the skills, was decisive, and didn’t have a life. I wasted loads of time playing with plastic boxes just deciding on the physical basic layout and the sonar array. I had no prior robotic skills other than programming on PCs and Androids, no microcontroller experience.

My machine learning techniques are homegrown. After I built the bot, I tried learning some concepts from an online university, but quickly decided that (in my opinion) academics are limited by a thought process that centers around statistics and math, and not programming or patterns. Good programming is about recognizing patterns and then building code and/or metadata around that.

Anna basically recognizes sentences (mainly with regular expressions), and routes the sentence to the applicable agents (can be several at the same time). To do this, I found a chunk of code on the net that could answer the question “What regular expressions in a db match a given string?” which is a HUGE leap from the typical “Does 1 regular expression match this string?” These agents then either “learn” by creating one or more structured “memories” in the database, or they “respond” by recalling one or more memories from the database and producing a response. Many of these agents search memory “trees” either up or down (parents or children), usually not more than 6 levels deep…something like “Does Bob have a brain?” would involve a small tree search. Each agent can basically use whatever algorithms it needs to, so they are homespun for each purpose. I also do a lot of tricks with a database of sentences and attributes such that the AI can recognize questions or statements in first, second, or third person and translate from question to answer or from answer to question for the given person being referenced, inserting the data along the way. I seriously don’t know how anyone could create a great robot without designing a great data model for the brain. I’m not there yet and I break a lot of rules, but it is evolving and slowly getting better. The point is, I think more people need to be thinking about “metadata” driven behavior and intelligence.
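
To make the “Does Bob have a brain?” tree search concrete, here is a small sketch. It is in Python rather than the C#/SQL Server the real brain uses, and the memory tables (`IS_A`, `HAS`) and the function name `has_part` are invented for the example; the real memories live in a database with a richer structure. The only things taken from the description above are the idea of walking up parent links and the limit of about 6 levels.

```python
MAX_DEPTH = 6  # "usually not more than 6 levels deep"

# Hypothetical memories: child -> parents ("Bob is a human") ...
IS_A = {"bob": ["human"], "human": ["mammal"], "mammal": ["animal"]}
# ... and thing -> parts it is known to have ("an animal has a brain").
HAS = {"animal": {"brain"}, "human": {"name"}}


def has_part(thing, part, max_depth=MAX_DEPTH):
    """Walk up the parent links, level by level, looking for the part."""
    frontier = [thing]
    seen = set()
    for _ in range(max_depth + 1):  # level 0 is the thing itself
        next_frontier = []
        for node in frontier:
            if node in seen:
                continue
            seen.add(node)
            if part in HAS.get(node, ()):
                return True
            next_frontier.extend(IS_A.get(node, []))
        if not next_frontier:
            break
        frontier = next_frontier
    return False


if __name__ == "__main__":
    print(has_part("bob", "brain"))  # bob -> human -> mammal -> animal
```

Searching “down” (children instead of parents) is the same loop with the links reversed.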

As for the speech API, I’m using the one Motorola put on the phone I’m using…Google. It’s not usable offline, but my bot always has Wifi at home. I expect newer and newer phones either are or will soon be running the speech directly on the phone. In addition, I’m running Microsoft’s text-to-speech engine on my PC. The bot doesn’t use it, but I use it when I’m testing the AI through a Windows app without the bot.

I am happy to answer any questions you have or help in any way I can (within reason!) Good luck on your project. I’m looking forward to seeing it soon! Will the brain be Android, Linux, Windows?

Regards,

Martin

mtriplett · July 7, 2014, 9:28pm

Re: Questions about speech to text and concepts.

I am happy to answer questions. If I don’t answer in enough detail, let me know.

I think I answered some of #1 and #2 in my response to DeltaBravo. Since it was lengthy, I’ll assume that you read that response first, so I don’t repeat.

“Parsing” involves “Normalizing” the sentence first, and then “Annotating” it. Normalizing involves finding the largest phrases (multi-word takes precedence over single word) starting from the beginning of a sentence and passing through to the end, removing each piece as it is normalized, while building an array of “Words/Phrase” objects in the sentence. Normalization also involves converting plurals to singulars and removing useless words or politeness words that don’t add meaning. I don’t do much in the way of Annotating, but I can see that I will need to. There are people that have shared work FAR better than mine on this stuff…I homespun mine as I was using Windows and SQL Server and not Unix/text files.
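
A stripped-down sketch of that normalizing pass follows, in Python rather than the real C#. The lookup tables (`PHRASES`, `FILLER`, `KNOWN_NOUNS`) and the four-word phrase limit are assumptions made up for the example; a real system would read them from its database. For simplicity the sketch singularizes words before phrase matching, which may not be the order the real code uses.

```python
import re

# Hypothetical lookup tables standing in for database content.
PHRASES = {"tell me about", "german shepherd", "who is on first"}
FILLER = {"please", "kindly", "the", "a", "an"}
KNOWN_NOUNS = {"shepherd", "robot", "eye", "dog"}
MAX_PHRASE_WORDS = 4


def singular(word):
    """Strip a plural 's' only when the result is a noun we know."""
    if word.endswith("s") and word[:-1] in KNOWN_NOUNS:
        return word[:-1]
    return word


def normalize(sentence):
    """Return a list of words/phrases, longest known phrase first."""
    words = [singular(w) for w in re.findall(r"[a-z0-9']+", sentence.lower())]
    result = []
    i = 0
    while i < len(words):
        longest = min(MAX_PHRASE_WORDS, len(words) - i)
        for size in range(longest, 1, -1):  # multi-word beats single word
            candidate = " ".join(words[i:i + size])
            if candidate in PHRASES:
                result.append(candidate)
                i += size
                break
        else:  # no phrase starts here: keep the single word unless it is filler
            if words[i] not in FILLER:
                result.append(words[i])
            i += 1
    return result


if __name__ == "__main__":
    print(normalize("Please tell me about the German Shepherds!"))
```

The returned list plays the role of the “Words/Phrase” array; joining it with spaces gives a normalized text version of the sentence.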

Once the sentence has gone through this process, I have an object array that represents the sentence, the original text sentence, and a normalized text version of the sentence. I can do a database lookup to see if I can find that sentence as a command, human request, topic, or any other type of sentence. I can also test the sentence against the database of regular expressions, and get a list of all the regular expressions that the sentence matches. This basically gives me a list of software “Agents” to execute, as each regular expression has an assigned “Agent”. Each one executes, doing whatever it needs to do, adding a randomized response from a ResponseList, a probability, and an emotional response. Some agents are not tied to regular expressions, and run all the time at the beginning and end of the process, like babble agents and system agents. When everything is done…higher probability wins. Response and Emotion are sent back. I would be more than happy to explain more on this if you have more questions.
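
Here is a toy version of that routing, in Python instead of C# and with an in-memory list standing in for the database table of regular expressions. The patterns, agent names, probabilities, and canned responses are all invented for illustration, and the matching is a plain loop over every pattern rather than the database lookup described above.

```python
import random
import re

# Stand-in for the table of regular expressions; each row names its agent.
RULES = [
    (r"^what is (?P<thing>.+)$", "define"),
    (r"^is (?P<thing>.+) a (?P<kind>.+)$", "verify"),
    (r"\bnirvana\b", "babble"),
]
COMPILED = [(re.compile(pattern), agent) for pattern, agent in RULES]


def define(slots, rng):
    text = rng.choice(["I have heard of {thing}.", "Let me think about {thing}."])
    return {"response": text.format(**slots), "probability": 0.8, "emotion": "curious"}


def verify(slots, rng):
    # A real agent would check memory; this stub always agrees.
    text = "Yes, {thing} is a {kind}.".format(**slots)
    return {"response": text, "probability": 0.9, "emotion": "confident"}


def babble(slots, rng):
    text = rng.choice(["Nirvana is a state of mind.", "I like that band."])
    return {"response": text, "probability": 0.3, "emotion": "playful"}


AGENTS = {"define": define, "verify": verify, "babble": babble}


def matching_agents(sentence):
    """Every (agent name, captured slots) whose pattern matches the sentence."""
    hits = []
    for regex, agent in COMPILED:
        match = regex.search(sentence)
        if match:
            hits.append((agent, match.groupdict()))
    return hits


def respond(sentence, rng=random):
    """Run all matching agents; the highest probability wins."""
    results = [AGENTS[name](slots, rng) for name, slots in matching_agents(sentence)]
    if not results:
        return None
    return max(results, key=lambda r: r["probability"])


if __name__ == "__main__":
    print(respond("is nirvana a band"))
```

Agents that always run (babble, system) would simply be appended to the list before the final `max`.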

Your Question number 3 is a very interesting one. I will say that Anna needs improvement on handling concepts with the same name. Like people, she can handle several situations correctly despite ambiguity.

What is Nirvana? - she would give whatever multiple answers she knew, like “A Band and a State of Mind”. No problem here.

Is Nirvana a Band? - No problem here either, as she can safely answer in the positive. I would think a lot of questions asked in a negative way would be a problem.

You are quite intuitive and correct to suggest that this can be a problem. I have seen it happen, and she will “Babble” about the wrong meaning when given the opportunity…I can be talking about Music, and she could start babbling Confucius sayings that contain Nirvana, for example. The nice thing is, it can be quite funny and interesting. I would like to solve this issue at some point. I want to deal with circumstances where I am talking about something like football (a context) and I say something like “What do you think of Alabama?” and have the robot know I’m talking about the University of Alabama’s football team and not the state of Alabama. The robot might say something ambiguous like “Alabama is beautiful”…doesn’t really work.

Wow, Stranger in a Strange Land… I read and “grokked” that book around 1990. It was my favorite book then and perhaps since. I read a lot of Heinlein back in the day, my fav author bar none, genius. Heinlein also said that there is some kind of pain at the root of ALL humor.

Very interesting ideas you are having around humor, hope you can pull it off. I ran across a paper or a video through an online university about someone getting a robot to form its own sentences using probabilities of words being next to each other in a large body of imported data. Sounds interesting (and I hope you get inspired and show me how to do it). Sometimes I think the AI people try to use statistics for too many things…seems like a lot of crappy/incorrect output would happen. They say “Oh but it will learn!”…to which I would ask “Before I die?” But hey, if it can work, I’d love to do it.

Thanks for the interest,

Martin

moheshmohan · July 8, 2014, 1:44am

Awesome work

That is one awesome robot you have made. I am working on a similar one myself but it will take some time. Do you have the source code for this posted somewhere so that noobies like us can learn from that? Thanks for sharing so many details on this. Awesome work

bonzadog · July 8, 2014, 5:49am

This is a fascinating Project

8.July.2014

Hello,

This is a fascinating project and it is exactly where I want to go in Robotics.

Where did the mechanical + motors come from?? …here in the EU I have problems getting a decent chassis, and I am not mechanically minded or have the tools, so I do not really want to build one myself and computer the motorisation.

An excellent write up and design - well done indeed!

BD

dangerousthing · July 8, 2014, 11:33pm

Martin,

Thanks for the answer. Please call me Jay.

I like the idea of parsing/normalizing/annotating. Perhaps you could make “pluralizing” an annotation. You probably don’t need it for too many things, but it can sometimes be useful.

It sounds like Anna has some idea about who she is talking with so that she can make more appropriate responses. Does this extend to a group, or does she just concentrate on one person?

If you remember the names or URLs, I’d love to take a look at them. I’d also love to take a look at your code. I’m really hoping you have good backup habits. At the beginning of the month we lost two computers due to a lightning hit. It seems to have come in through the DSL modem and fried all the computers connected via ethernet. I don’t mourn the computers too much, but I do mourn the data lost. I’m still looking for that URL I promised you, but I’m suspecting that it’s on my laptop rather than this iPad, which means I’ll have to look for it later when my wife is asleep.

Multiple meanings are always a problem. Some programs handle this with context. I understand the basics of statistical programming and translation, but I’d have to learn the math and algorithms behind it. And here I was hoping you could explain it to me. :) I hadn’t heard that they were using it for robots, but it makes sense. Basically (which is all I know at the moment - I specialized in educational/user-interface programming), if you take a large enough text you can get the probabilities of groups of words being together in certain contexts without having the full grammar. This could be useful in many ways, but I’d imagine that unless your text was taken from idiomatically spoken English (or whatever language you wanted) then the robot would sound too bookish.
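
As a toy illustration of that basic idea, and not taken from any of the systems mentioned in this thread, the simplest form is a table of which word follows which (a bigram table). The Python sketch below builds one from a tiny made-up corpus and walks it at random; the function names and the `<s>`/`</s>` sentence markers are just conventions chosen for the example.

```python
import random
from collections import defaultdict


def build_bigrams(sentences):
    """Map each word to the list of words seen right after it."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for current, following in zip(words, words[1:]):
            followers[current].append(following)
    return followers


def generate(followers, rng=random, max_words=20):
    """Walk the table from the start marker, choosing each next word at random.

    Words that followed more often appear more often in the list, so they are
    proportionally more likely to be picked.
    """
    word = "<s>"
    output = []
    for _ in range(max_words):
        word = rng.choice(followers[word])
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    corpus = ["the robot is funny", "the parrot is loud", "the robot is loud"]
    table = build_bigrams(corpus)
    print(generate(table))
```

With a corpus this small the output mostly repeats the input; it takes a large body of text, and usually longer word groups than pairs, before the results start to sound like new sentences.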

Statistical programs can learn if they get some kind of feedback. I’d imagine that would initially come from the programmers (well, grad students working for the programmers) who would be paid to talk to the robot and give it feedback and correction when it said something incorrectly or awkwardly. I would initially have the feedback come from a button, but also I’d have the robot keep track of the conversations so it could learn from them also.

Oh, and I finally found what MIBE meant. I knew what you were doing wasn’t too far from what I had started many years ago, though I just called it behavioral programming or subsumption++. I even had the book Robot Shaping sitting in my bathroom until we moved and now I can’t find it!

Thanks for listening,

D. Jay Newman

mtriplett · July 8, 2014, 11:34pm

Not Open Source - YET

Thanks for the kind words. Sorry, I haven’t made the source code available. Long story…a lot of intellectual property that goes far beyond robotics into the business community at large.

I’d be happy to talk ideas, pseudo-code, questions, etc.

Regards,

Martin

mtriplett · July 9, 2014, 12:02am

Thanks, I hope you pursue it very far.

Thanks much. I’m so glad when others find this kind of bot as interesting as I do.

The mechanicals are easy, so here goes:

I used a LynxMotion tracked chassis with the 2in wide treads that was given to me by a friend…you can get a new one for about $220 USD I think. It will come with two 12-volt motors. I had no prior build expertise other than Legos, so this worked out really well.

I used a “Base Rotate” with a Hitec HS-422 servo to rotate the head, and an HS-645MG for elevation. You can get all these parts from RobotShop or other websites. I used a few plastic project boxes from RadioShack, some thin plywood, spray primer and paint.

Happy building,

Martin

dangerousthing · July 9, 2014, 12:38am

LEVAN - Learning Everything about Anything
Hi Martin:

The url is: http://levan.cs.washington.edu

I have to hit “Add Concept” in order to get to the concepts page.

While it returns the data that it has, I don’t think that it is really accepting new concepts.

kingart3 · July 10, 2014, 10:55pm

Blown Away

This robot just blows me away.

Often we talk about what a robot can’t do well, while overlooking some of the things they can do very well. The ability to command the robot verbally and for it to hold conversations is nothing short of AMAZING!!!

I am also quite impressed that you have packed so much into such a small package.

Thanks for sharing, there is tons to be learned from just watching this.

ridgelift · July 11, 2014, 12:46am

Anna

Anna is becoming more charming every time I look in on this amazing project.

It could be me, but it seems like Anna has already developed a wry sense of humor.

I cannot help smiling when I watch the interactions on the videos.

The variable delays while processing responses add to the flow of natural interaction and are not objectionable at all.

Great stuff Martin!

By the way, the small spring garden has come and gone for this year. Lots of cucumbers, turnips and kale. Not a good yield for the carrots. I think I need to adjust the soil pH for them to thrive.

mtriplett · July 11, 2014, 8:38pm

re: Blown Away

Thank you very much for the positive words and thanks for the compliment about fitting things into a small package. To be honest, the PC where most (not all) of the verbal skills reside is off bot, so it’s not “packed in”. This is by design, however, so I can share the AI for my next bots. I am always envious of other LMR creations that fit so much into tight spaces and lighter weight. The inside of Anna is all custom Pololu wiring harnesses for various sensors, and a couple too many breadboards.

I do try to focus on the pragmatic side of things on any given day…What can I really do? This might be another case of TMI - too much information… I realized after working with an app I had built on Android that the base translation of Google was quite error prone, but by programming another layer on top of it, a fault tolerant “voice interface” could be achieved. Many phones and customer service systems on phones are a colossal fail because they don’t program in all the common misinterpretations of the voice engine. Example: When I say “Track Pink”, the voice engine from Google often thinks I said “Track Paint”. This happens all over the place but can be dealt with by programming. Even a word as simple as “Set” often is misinterpreted as “Sex” by Google.
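
In its simplest form, that extra layer is just a table of known mishearings applied before the command is interpreted. The sketch below is in Python for brevity and is not the actual Android code; the two tables hold only the examples mentioned above, and the names `PHRASE_FIXES`, `WORD_FIXES`, and `correct` are made up for the illustration.

```python
import re

# Known mishearings of the voice engine -> what was most likely said.
PHRASE_FIXES = {"track paint": "track pink"}
WORD_FIXES = {"sex": "set"}


def correct(heard):
    """Clean up recognized text using the tables of common mishearings."""
    text = " ".join(heard.lower().split())  # lowercase, collapse whitespace
    for wrong, right in PHRASE_FIXES.items():
        # Whole-phrase replacement, respecting word boundaries.
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    # Then fix single words that are almost never meant literally in commands.
    return " ".join(WORD_FIXES.get(word, word) for word in text.split())


if __name__ == "__main__":
    print(correct("Track Paint"))
```

A blanket word fix like “sex” to “set” only makes sense because the robot’s command vocabulary is small; a general-purpose assistant would need context before rewriting a word.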

Thanks again. When I get posts like this it helps me get motivated to create more video.

Regards,

Martin

mtriplett · July 11, 2014, 8:57pm

re: Anna

I really appreciate the feedback. I should probably work a little harder on getting more charm and sense of humor.

When kids aren’t around, she can really get people on the floor laughing when I take off the handcuffs and let her express herself in “Adult Language”. I keep it clean on the videos as I don’t want to offend anyone on LMR or be unsuitable for the kids. I’d like to find ways to get more humor without resorting to colorful language. She once had a sex therapist mode that was hilarious. I deleted it eventually to avoid it “popping out” at the wrong times.

Awesome to hear about your garden. We planted a small one this year with tomatoes and squash but it did not turn out too well, much to learn still and flooding rains didn’t help. We do however have the best figs in the world. We’ve been helping my mother with her garden…it is huge…and eating quite well year round with each season. Glad to hear others are skipping the processed stuff for the truly tasty.

Regards,

Martin

dangerousthing · July 12, 2014, 3:24am

A sex therapist mode?

I thought it was odd enough when a priest who is sworn to celibacy counsels couples on “marital” issues.

Well, at least Anna would be unbiased…

Sorry. It’s late, and the idea still seems hilarious to me. On the other hand, I once tried to teach the Nerd (my parrot - it got the nickname Nerd when Lee started singing “the bird, the bird is a nerd…” to the tune of the obvious song) Samuel Jackson’s favorite phrase (m… f…). Lee forcibly stopped me, even though she taught him “eat me” and others. Robots are easier to teach.