AVA OS - "A Very Advanced Operating System"

Background & Motivation

My goal was to develop a reusable OS that could support my bots or others...anywhere from small or medium bots like my Anna/Ava, to bots reaching the complexity of an InMoov that can also move around.  In working on my prior bots, I learned many lessons and found many things, from an OS and backbone-comms perspective, that I wanted to improve.

My next bot will be around 30 inches tall and have many servos and 4 linear actuators.  It also needs to support a depth camera, an indoor localization system, and a FLIR camera.  This OS is going to be the brain for this bot which I am still designing for 3D printing later.

Goals of the OS

  1. I wanted a simple way to write code for Arduinos, SBCs, and PCs, where they could all work together and coordinate.
  2. I wanted a "spinal cord" backbone with much higher throughput and reliability.
  3. I wanted it ALL to be able to run on the bot, with no external web server.  For the on-bot setup, I chose a Latte Panda.
  4. I wanted to have multiple Windows Forms displays for the face, sonar, emotions, etc. and be able to open/close them as needed.
  5. I wanted the behaviors to be able to grow into sophisticated results while the system keeps its basic simplicity.
  6. I wanted the basic memory/DNA of the robot to be generic and something that could be iterated over.
  7. I wanted to be able to set goals and have the robot converge upon those goals without having to write code for timing.
  8. I wanted a core design that I could add learning algos, pattern recog, genetic algos, and other features later.  I wanted the brain to be able to mutate all its underlying memory values if necessary and try out combos.
  9. I wanted simple ways to control the bot or orchestrate complex actions.  One of these is voice or written text.  I wanted to communicate on my terms, and "talk" to the bot, its services, its very DNA...without having to write code.

Baseline Hardware Setup

The prototype "brain" is running on a 4G Latte Panda SBC running Windows 10 and SQL Server 2014 Express.  The Panda also has an Arduino Leonardo on board.  The two sides of the board communicate through serial, running a library from Latte Panda on the Windows side and the Standard Firmata library on the Arduino side.  The I2C pins on the Panda's Leonardo are connected to an Arduino Mega.  The brain communicates with the Mega and any additional Arduinos through this I2C connection.  A Bluetooth dongle is connected to the Panda for keyboard/mouse.  A Logitech cam/mic is connected to the Panda for video as well as audio listening.  Right now I am using an HDMI TV for the screen, but will add a Panda screen and touch overlay soon that does not use the HDMI.  The final bot could support two screens.

Hardware

The intention is to retrofit this brain into my Anna and Ava robots, as well as put this brain into my upcoming and as yet unnamed larger bot.  The larger bot will use an 8G Latte Panda Alpha.  I'll also be adding a servo controller, motor controller, and sensor set to this reference hardware setup.  Once again...12 sonars, and a whole lot more!

Progress to Date

  1. Developed a set of standards for communicating commands and data to/from a Panda and multiple Arduinos, with multiple services running on each platform.
  2. Developed the baseline Windows-based brain.
  3. Developed an Animated Face - Eyes, Mouth, etc.  This uses a Windows Forms app that hosts a WPF user control.
  4. Got the bot to listen (through the cam's mic) and do speech-to-text.  The Microsoft speech-to-text needs more training.
  5. Got the bot to speak using the speech engine.
  6. Got the animated face to coordinate lips with phonemes using visemes and events from the speech engine.
  7. Developed a core set of standards and services for managing communication, memories, database access, and sensors/actuators.
  8. Developed a lightweight set of verbal services to interpret speech, delegate actions, and generate responses.
  9. Developed standards for verbally enabling services so they can talk/listen to people or each other.  Each service registers its input and output "patterns" on startup and is then called when any of its input patterns is matched.  The verbal service takes care of all the language details and calls each service with a set of name/value pairs that correspond to the arguments the service needs to function.  The verbal service also handles generating varied verbal output responses from these services...so that the bot can say the same thing in multiple ways.  (See the sketch after this list.)
  10. Developed and tested a scripting language for orchestrating actions...called "Avascript".  
  11. Developed a database to support the brain's persistent data needs.  The baseline version has the following tables: Service, MemoryItem, Word, Script, WordAssoc, and Quantifier.
  12. Developed a baseline "difference engine" into the memory engine...more on that later.
  13. Added services for handling basic human pleasantries like greetings, goodbyes, addition, etc.  I plan to port Anna/Ava's skills over to this brain...eventually.
  14. At this point, she can listen, is polite, and can follow a lot of basic instructions and give responses.
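
To make item 9 concrete, here is a minimal sketch of how a service might register verbal patterns and receive name/value pairs.  The class and member names (GreetingService, InputPatterns, OnVerbalCommand) are my own illustrations, not the actual AVA OS API.

```csharp
using System.Collections.Generic;

// Hypothetical example of a verbally enabled service.
public class GreetingService
{
    // Input patterns the verbal service matches against recognized speech.
    public IEnumerable<string> InputPatterns => new[]
    {
        "hello", "good {TimeOfDay}", "how are you"
    };

    // Output templates, so the bot can say the same thing in multiple ways.
    public IEnumerable<string> OutputPatterns => new[]
    {
        "Hello there!", "Hi!", "Good {TimeOfDay} to you too."
    };

    // Called by the verbal service with name/value pairs extracted from the matched pattern.
    public string OnVerbalCommand(IDictionary<string, string> args)
    {
        args.TryGetValue("TimeOfDay", out var timeOfDay);
        return timeOfDay == null ? "Hello there!" : $"Good {timeOfDay} to you too.";
    }
}
```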

Backbone System

It is critical for any bot with multiple processors to have a reliable way to communicate from one platform to another.  This is why I started on this part first, as my prior bot had problems here.  Right now I am using serial/I2C to deliver frequent messages between the Panda, Leonardo, and Mega.  This is where I see the highest throughput being required.  I will also likely add Bluetooth and HTTP delivery, just at lower frequency levels.
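
As a rough illustration of the kind of compact message that travels over this backbone, here is a hedged sketch of a fixed-size command frame.  The field names, sizes, and checksum scheme are my assumptions, not the actual AVA OS wire format.

```csharp
// One possible layout for a small command frame sent over serial/I2C.
public struct CommandFrame
{
    public byte TargetDevice;   // which Arduino the command is addressed to
    public byte TargetService;  // which service on that device
    public byte MemoryId;       // which MemoryItem or command the payload applies to
    public short Value;         // payload value

    // Serialize to 6 bytes: 5 data bytes plus a simple additive checksum for the noisy bus.
    public byte[] ToBytes()
    {
        var bytes = new byte[]
        {
            TargetDevice, TargetService, MemoryId,
            (byte)(Value >> 8), (byte)(Value & 0xFF),
            0 // checksum placeholder
        };
        byte checksum = 0;
        for (int i = 0; i < 5; i++) checksum += bytes[i];
        bytes[5] = checksum;
        return bytes;
    }
}
```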

Services System

The brain is made up of services running on various hardware platforms.  Services communicate by sending commands or asking a script to execute...which gets translated into commands.  Every command is like mail with a zip code that allows it to get to where it needs to go.  Each service on the Panda implements an interface.  Each Arduino must simply conform to some basic coding standards to receive/send messages.  Each service can be called at a specified frequency.  Services receive notifications of various events, like when a memory item has changed.  Services can do synchronous and asynchronous actions, depending on the need.  Services can have their own verbal patterns that they respond to in verbal or non-verbal ways.
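
For illustration, here is a minimal sketch of what the per-service contract on the Panda side might look like.  The interface and member names are assumptions based on the description above; the real interface almost certainly differs.

```csharp
using System.Collections.Generic;

public interface IBrainService
{
    string Name { get; }

    // How often (in ms) the brain should call Update(); 0 could mean event-driven only.
    int UpdateIntervalMs { get; }

    // Verbal patterns this service responds to, if any.
    IEnumerable<string> VerbalPatterns { get; }

    // Called on the service's schedule.
    void Update();

    // Called when a command addressed to this service (its "zip code") arrives.
    void OnCommand(string command, IDictionary<string, string> args);

    // Called when a MemoryItem this service subscribes to changes.
    void OnMemoryChanged(string memoryName, object newValue);
}
```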

Panda Memory Engine - It's like DNA that you can talk to

The bot's primary memory is made up of many "MemoryItems".  Each memory item represents some state information that could change at runtime.  The system has metadata about each memory that tells it everything it needs to know to handle its behavior, goal setting, verbal functions, and how to route notifications on the Panda and on one or more Arduinos.  Each memory's metadata is loaded with defaults on startup from a db table.

Each memory item can have various English "names" that it corresponds to, as well as valid values, and each of those values can have one or more names of its own.  In the end, the instruction "Set heading to south" can then be translated into bytes over I2C to the right Arduino and the right service, with the proper parameters.
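
Here is a hedged sketch of the kind of metadata a MemoryItem might carry, based only on the description above.  The property names (Aliases, NamedValues, TargetDevice, etc.) are my guesses, not the actual schema.

```csharp
using System.Collections.Generic;

public class MemoryItem
{
    public int Id { get; set; }                    // compact ID shared with the Arduinos
    public string Name { get; set; }               // canonical name, e.g. "Heading"
    public List<string> Aliases { get; } = new List<string>();   // other English names

    // Named values, e.g. "south" -> 180, so spoken words can become raw numbers.
    public Dictionary<string, int> NamedValues { get; } = new Dictionary<string, int>();

    public byte TargetDevice { get; set; }         // which Arduino handles this item
    public byte TargetService { get; set; }        // which service on that Arduino

    public double CurrentValue { get; set; }

    // "Set heading to south" -> resolve "south" to a number before routing it over I2C.
    public int ResolveValue(string word) =>
        NamedValues.TryGetValue(word.ToLowerInvariant(), out var value)
            ? value
            : int.Parse(word);
}
```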

Arduino Memory System - Micro-Controller Friendly

While it is easy to have lots of memories (with multiple names) on a Windows machine like the Panda, it's not practical on an Arduino.  What is needed is a way to shuffle memories back and forth between Windows and the Arduinos without bogging the Arduinos down with strings or bloated data.  What I did was allow each MemoryItem on the Windows side to have a set of identifiers that give it meaning on the Arduino side, notably an ID.  The memory system on the Arduino side holds only the memories it cares about, in an array, while the Windows side can hold memories from all platforms in a hashtable indexed by ID, name, alias, etc., and use whatever lookup method is desired.  The separate ID and array-based storage on the Arduino side means that items are looked up by integer index into arrays...so a constants file in the Arduino project is desirable to make everything readable.  Example: LEFT_MOTOR_SPEED = 1, RIGHT_MOTOR_SPEED = 2, etc.
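
Here is a sketch of that shared-ID convention from the Windows side, reusing the hypothetical MemoryItem class above.  The same integer IDs would appear in a constants file in each Arduino project so its array-based lookup stays readable.  (The specific IDs and class names are illustrative only.)

```csharp
using System.Collections.Generic;

// Constants mirroring the Arduino-side constants file, e.g. LEFT_MOTOR_SPEED = 1.
public static class MemoryIds
{
    public const int LEFT_MOTOR_SPEED = 1;
    public const int RIGHT_MOTOR_SPEED = 2;
}

// Windows-side store: rich lookup by ID, name, or alias.
// The Arduino side would simply index a small local array by the same IDs.
public class MemoryStore
{
    private readonly Dictionary<int, MemoryItem> byId = new Dictionary<int, MemoryItem>();
    private readonly Dictionary<string, MemoryItem> byName = new Dictionary<string, MemoryItem>();

    public void Add(MemoryItem item)
    {
        byId[item.Id] = item;                 // compact lookup shared with the Arduinos
        byName[item.Name] = item;             // English-name lookup on the Windows side
        foreach (var alias in item.Aliases) byName[alias] = item;
    }

    public MemoryItem Get(int id) => byId[id];
    public MemoryItem Get(string nameOrAlias) => byName[nameOrAlias];
}
```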

"Avascript" Scripting Engine - also known as English.

Avascript is natural-language English, but it could be any language in the future by adding translation and alternate grammars.  With Avascript, commands are simply written or spoken as words, sentences, or paragraphs, and given names.  You can simply write a script named "Wake Up" which might say "Set alertness to maximum. 2nd instruction. 3rd instruction, etc." or "Blink" with the script "Set Pin 13 to on for 1000".  It's that simple.  Every service, memory item, and script can have multiple aliases so you can say things in multiple ways.  "Sleep" and "Close your eyes" could resolve to the same thing.  Scripts can call other scripts.
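
A tiny sketch of the alias idea: multiple spoken names resolving to one stored script.  The ScriptStore class and the "Sleep" script body are hypothetical; in AVA OS the scripts themselves live in the database (the Script table).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of script aliases; not the actual Avascript engine.
public class ScriptStore
{
    private readonly Dictionary<string, string> scriptsByAlias =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public void Register(string scriptBody, params string[] aliases)
    {
        foreach (var alias in aliases)
            scriptsByAlias[alias] = scriptBody;
    }

    public string Resolve(string spokenName) =>
        scriptsByAlias.TryGetValue(spokenName, out var body) ? body : null;
}

// Usage: "Sleep" and "Close your eyes" resolve to the same (made-up) script text.
// var scripts = new ScriptStore();
// scripts.Register("Set Pin 13 to on for 1000", "Blink");
// scripts.Register("Set alertness to minimum.", "Sleep", "Close your eyes");
// string body = scripts.Resolve("close your eyes");   // same body as "Sleep"
```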

The ability to talk to any service or any piece of memory, without coding, and save it with a name/multiple aliases, is one of the key and fundamental ways I plan to enable the system to grow while remaining as simple as possible.

Difference Engine

I am not a mathematician, so I will explain what I mean in my use of this term.  At any given point in time, the bot needs to recognize differences between its current state and its goals, and try to achieve its goals by reducing/eliminating the differences in a graceful manner.

To illustrate the basic need, let's take an example.  The bot might have a "Speed" of 255 and want to stop (a speed of 0).  If it stopped suddenly, it might fall over or break its neck.  In this example, we can set the GoalValue to 0, with a GoalSpeed of 5, and a GoalInterval of 50.  This means the bot will ramp its speed up or down in increments of 5 every 50 milliseconds, thereby not breaking its neck or falling over...I have done both of these.

To support this, every memory item has a "CurrentValue" and CAN have a "GoalValue", "GoalType", "GoalSpeed", "GoalInterval", and "LockingInterval".

The difference engine will converge all memory items upon their goals using different speeds/models.  Basic example...if you say "look left" but the bot is looking down and to the right, it will converge the X and Y to the desired angles at a particular speed...and publish incremental changes along the way to the associated services.  Right now I am simply converging using a speed and time interval (in ms).  I am building it to support other goal-seeking models though...like one that halves the difference on each interval, or one that accelerates and decelerates to the goal.  The locking interval can be used to prevent one action from stepping on another for a duration of time.  If I say "look left", I don't want the bot's autonomous eye movements to contradict my order in the next instant.
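
Here is a minimal sketch of the linear goal-seeking model described above: step CurrentValue toward GoalValue by GoalSpeed every GoalInterval milliseconds, with the locking interval keeping competing actions from hijacking the goal.  The property names mirror the text, but the class itself, the timer plumbing, and the lock handling are my simplifications.

```csharp
using System;

public class GoalSeekingMemoryItem
{
    public double CurrentValue { get; set; }
    public double? GoalValue { get; set; }
    public double GoalSpeed { get; set; }        // maximum change per step
    public int GoalInterval { get; set; }        // milliseconds between steps
    public DateTime LockedUntil { get; private set; }

    public event Action<double> ValueChanged;    // publish incremental changes to services

    // LockingInterval: once a goal is set, reject competing goals until the lock expires.
    public bool TrySetGoal(double goal, TimeSpan lockingInterval)
    {
        if (DateTime.UtcNow < LockedUntil) return false;   // another action holds the lock
        GoalValue = goal;
        LockedUntil = DateTime.UtcNow + lockingInterval;
        return true;
    }

    // Called by the difference engine every GoalInterval milliseconds.
    public void Step()
    {
        if (GoalValue == null) return;

        double diff = GoalValue.Value - CurrentValue;
        if (Math.Abs(diff) <= GoalSpeed)
            CurrentValue = GoalValue.Value;                  // close enough: snap to the goal
        else
            CurrentValue += Math.Sign(diff) * GoalSpeed;     // move one increment toward it

        ValueChanged?.Invoke(CurrentValue);
    }
}

// Worked example from the text: Speed = 255, GoalValue = 0, GoalSpeed = 5, GoalInterval = 50
// => 255 / 5 = 51 steps, 51 * 50 ms ≈ 2.55 seconds to come to a graceful stop.
```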

That's all for now...back to the bots.


This is a companion discussion topic for the original entry at https://community.robotshop.com/robots/show/ava-os-a-very-advanced-operating-system

Great concept you have going

Great concept you have going there Martin.

I find the goals interesting, great way to keep the bot steady and stable.

What will you be using for the speech recognition?  How will you handle noisy environments when trying to control via speech?

I did a project using speech recognition and ended up buying a wireless lapel mic and receiver so that the bot could hear my commands in a noisy environment.

re: Jeff

There are a few basic models I am likely to use in different situations:

A.  Bot listens continuously but doesn’t answer verbally…avoiding false positives.

B.  Bot listens continuously but doesn’t answer verbally to what it hears unless it hears certain keywords first…like its name.

C.  Bot receives verbal commands via BT or Http from a phone/tablet app that someone can use in a crowded room as a voice-enabled remote control (using Google speech recognition on the phone)

D.  Bot receives IR commands from nearby emitters and decodes them

E.  Bot receives Email/Text/Twitter

F.  All of the Above

At this point, unless I get much better results from the MS speech-to-text, I will probably use Option C while in crowded public spaces.  I’d like to get continuous listening and responding (with no prior keywords) working in 1-on-1 situations.  The MS lib needs a lot more training with my voice for me to have any confidence in that.

 Regards,

Martin

AI architecture

It is cool that you build reusable tools; this is what separates a professional from a beginner.  Keep up the good work! :slight_smile:

With AVA OS you are basically building a Cognitive Architecture + hardware control system, i.e. a Robot OS.

It looks like your “Difference Engine” is for robot kinematics/dynamics only, so I think it could be called a Motion Planner.

I have a few questions:

Does AVA OS have some AI planning ability?  By planning I mean searching for action sequences to achieve an abstract goal from a current state.

Is there a chance that your project will be open source?

(I’m not a native english speaker, so sorry for grammar errors.)

re: Notaii

When I said “Advanced”…I was partially joking; I hope for it to be advanced one day.

re:  Difference Engine and Motion…More Uses I Think

While the application of the difference engine to motion is clear, I think the difference engine is useful for more than just motion.  I use it for emotional states.  I plan to use it for motivation states as well, which will probably drive more autonomous behavior at some point.  The “autonomous behavior” will fall into categories of things you can see as well as things you can’t (like thoughts).  For example, curiosity can be a motive, which could trigger particular thoughts/algos to happen.  I think there will be a lot more uses yet to be thought of.

re:  Planning…I wish I had more to offer here

For Path Planning…I do have some working A*-like code for this that will be integrated at some point.  This will be a lot more applicable once I integrate the GPS and Posxyz sensor.  I used the code on Anna to drive a series of GPS waypoints outdoors…where the path planning preferred concrete over grass due to the terrain costs stored in its map.  I would like to have a SLAM capability, but don’t have code/sensors for that yet.

Another goal I have:  I would like to integrate a non-linear optimization engine (like the solver from Excel, which I have worked with a lot).  You can give it a model with a goal and a set of constraints, and the engine will iterate until it finds an optimal solution.  I think this could become a PART of some planning abilities, but I imagine planning to be a LOT more complicated…sequences of dependent steps and the like.  It could be a start though.

re:  Open Source

There is a fairly good chance (not definite though, I would guess 80%) that I’ll open source this once I think there is a decent stable foundation laid down.  At this early stage, I still think I might make radical changes to the internal organization and public interfaces for adding additional “Services” to the code.  I would want to have some confidence that these interfaces are stable so other people could write add-on services without me breaking their code with each future version.  I am not there yet.

Thanks for the questions and interest.  By the way, I thought your grammar was excellent.  A previous version of this was multi-language, so this might become so too.

Regards,

Martin

Timeline like in animation software for the difference engine

You have many interesting ideas; maybe it could become a platform for other projects to use as well.

When ROS 2.0 finally comes out for Windows it would be exciting if it could be integrated.

Regarding your difference engine, if I understood you correctly, what you want to do is interpolate between two positions.  A similar idea is used in animation or, since you mentioned WPF, Expression Blend.  It allows you to define key frames, one start and one goal frame, and then specify the type of transition/interpolation between them.

This method is a very powerful tool and is used for games, animated movies, and music writing.  There are a multitude of related concepts, and while the UIs all differ a bit, the general idea is the same.

I have actually been working on software that implements such a timeline concept for a couple of months now, which will hopefully make it easy to describe the movements of robot arms, for example, and allow actions to be chained intelligently.  One major difference from traditional animation software is that it will have to react to real-world events (such as an obstacle that prevents reaching a goal point, or slower movements than planned in the animation).

I2C

I was cruising the web last night thinking about this project and came across this guy:

Pi to Arduino

https://oscarliang.com/raspberry-pi-arduino-connected-i2c/

Voice recognition

https://oscarliang.com/raspberry-pi-voice-recognition-works-like-siri/

Face recognition

https://oscarliang.com/raspberry-pi-face-recognition-opencv/

Might help someone build off what he knows.