MDIBs - Metadata Driven Internet Brains

re: Diff Sensors

I heard about the Kinect data from another member.  Interesting.  I wouldn’t know how to do the math on the pattern matching for that.  It could be very useful for human gesture / stickfigure recognition, which I wrote a little about elsewhere in this forum.

I hear there is some new thermopile camera attachment that hooks up to a phone for about $300-$400.  A db of images and pattern matching based on heat signatures could be useful.  Once again, the math is beyond me right now.  Anna’s thermopile is only 16x4 pixels, probably not enough to do anything all that great.

I have long wanted some sort of visual landmark memory, one that might be simple enough for me to make work.  I speculated that if you pointed a robot due west and recorded the rarest color every, say, 5 degrees, swinging around through north to due east, you could add each "rarest color" into a color "barcode" of sorts.  (Instead of the rarest color, the change in color (up or down) might work better, so as to remove the significance of the actual color number in HSV color space.)  You could do it to the south as well, or in a full 360.  Then you could pattern match this "barcode" against a memory of barcode scans from around the house, which might be able to tell you where you are.  If there were a couple of rarer patterns in the barcode, the bearing to those could be found, and a position could be estimated within the room.  This is just a theory.  It's really localization and not a general purpose visual memory.  There are some OpenCV techniques that are probably more sound; I was just trying to come up with something new.
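A minimal sketch of what I have in mind, assuming a hypothetical rarest_hue_at(heading) function stands in for whatever pan/camera code actually produces the rarest hue for a given heading:

```python
# Sketch only: rarest_hue_at is a stand-in supplied by the caller; it should
# return the "rarest" HSV hue (0-179) seen when the camera faces that heading.

def scan_barcode(rarest_hue_at, step_deg=5):
    # Sweep from due west (270) through north (360) to due east (450 -> 90),
    # sampling every step_deg degrees.
    hues = [rarest_hue_at(h % 360) for h in range(270, 451, step_deg)]
    # Store the change in hue rather than the hue itself, so the absolute
    # color number in HSV space matters less.
    return [b - a for a, b in zip(hues, hues[1:])]

def match_barcode(scan, memory):
    # memory: location name -> stored barcode (same length as scan).
    # Smallest summed absolute difference wins.
    def distance(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(memory, key=lambda loc: distance(scan, memory[loc]))
```

A real version would probably also slide/rotate the scan against each stored barcode so a slightly different starting heading could still match.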

I am doubtful about using sonars for any kind of pattern matching, although I do follow some of the localization techniques based on them.  At the least, my thought would be to allow the developer to "Add Sensors and Services" to their bot using the website.  This would include sonars and algorithms, specifying the orientation of each sensor.  Maybe a set of algorithms (like Force Field) could be present that could be turned on and configured for the specifics of the given bot.
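Just to make the "Add Sensors and Services" idea concrete, I picture the website storing something along these lines per bot (all the field names and numbers here are made up):

```python
# Hypothetical per-bot configuration; names and parameters are invented for illustration.
bot_config = {
    "sensors": [
        {"type": "sonar", "name": "front_left",  "bearing_deg": -30},
        {"type": "sonar", "name": "front_right", "bearing_deg": 30},
    ],
    "algorithms": [
        # e.g. a Force Field obstacle-avoidance routine that can be switched
        # on and tuned for this particular bot
        {"name": "force_field", "enabled": True,
         "params": {"max_range_cm": 200, "repulse_gain": 1.5}},
    ],
}
```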

On a related "pattern matching" front, I am still trying to wrap my head around what a hypergraph really is and how I might pattern match hypergraphs.  Is that even a valid concept?  I'd hate to build something that is conceptually similar to hypergraphs without realizing it, and not be able to benefit from the mathematical work that has been done on them.  Also, how do you pattern match natural language parses?  So much of this is new to me.  You implied somewhere that you understood the math behind some of these things.  I don't yet; Calc and Stats were as far as I got.

Hypergraphs
Hypergraphs are, as I understand it from the OpenCog blog, merely a form of graph that relaxes two of the usual rules:

  1. A node can be a graph or hypergraph.

  2. A link can link many nodes to many other nodes, instead of just one node to one node.

Both of these things have been done in relational databases for a long time, so I think you’ve been using hypergraphs without realizing it.

You can convert any graph into a hypergraph; in fact, every graph already is a hypergraph. I believe you can also convert any hypergraph into a graph, but I can't prove it.
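For what it's worth, the standard trick for turning a hypergraph into an ordinary graph is to make each hyperedge a node of its own and connect it to every node it contains (the "incidence graph").  A tiny sketch, with each hyperedge just stored as a set:

```python
# A tiny hypergraph: each hyperedge (link) is a set of any number of nodes.
hypergraph = {
    "e1": {"dog", "cat", "mammal"},   # one link tying three nodes together
    "e2": {"dog", "barks"},
}

def to_incidence_graph(hg):
    # Each hyperedge becomes a node itself, connected by a plain edge
    # to every node it contains.
    edges = []
    for edge_name, members in hg.items():
        for node in members:
            edges.append((edge_name, node))
    return edges

print(to_incidence_graph(hypergraph))
# e.g. [('e1', 'dog'), ('e1', 'cat'), ('e1', 'mammal'), ('e2', 'dog'), ('e2', 'barks')]
# (order within each hyperedge may vary, since sets are unordered)
```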

The advantage of a hypergraph is that it seems a better model for storing information for cognition. In other words, our data storage is more than a bit chaotic. Another advantage is that it seems to be easier to do pattern matching on hypergraphs.

And I may have had a couple of math classes more (probability, proofs, and differential equations – I remember very little of the last two), but I do read a lot.

I’d try OpenNLP FIRST
I don't know how fast OpenNLP is compared to your algorithm. I would try getting a parse tree to see if that can get you to the core of the sentence. It might be more accurate than your routines, especially if you can find a way to keep the modifiers (numbers, etc.) as annotations or whatnot.

Perhaps each word could be an atom, or maybe just the core of the word. A sentence would be a graph that connects the words in their proper order (and would itself also be an atom). Perhaps the proper order is as a parsed tree; I don't know.
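A rough sketch of that atom idea, assuming nothing about how it would really be stored (the structure and field names here are made up):

```python
# Words and sentences are both atoms with integer ids; a sentence atom links
# to its word atoms in order. Purely illustrative.
atoms = {}   # id -> atom
index = {}   # (kind, text) -> id, so atoms can be looked up by their string key

def get_or_add_atom(kind, text):
    key = (kind, text)
    if key not in index:
        new_id = len(atoms) + 1
        atoms[new_id] = {"id": new_id, "kind": kind, "text": text, "out": []}
        index[key] = new_id
    return index[key]

def add_sentence(text):
    word_ids = [get_or_add_atom("word", w) for w in text.lower().split()]
    sent_id = get_or_add_atom("sentence", text)
    atoms[sent_id]["out"] = word_ids   # ordered link from the sentence to its words
    return sent_id

add_sentence("The quick brown fox jumped over the lazy dog")
```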

Mixing of OpenNLP and Anna’s Brain

Thanks for thinking about this, Jay.  I believe we are on the same page generally.

This is my latest over-simplified pseudo-code of how to fit my prior code and OpenNLP together:

Sentence Comes In as Input:

1)  Calculate “Normal Sentence” with “Annotations hanging off of it”.  There are a few versions of the sentence held in different structures for various agents to use as they best see fit.

1.1)  Determine Tokens - OpenNLP

1.2)  Do OpenNLP Parse - The first sentence takes a second to warm up OpenNLP and WordNet, but after that I barely notice a speed difference, if any.

1.3)  Group Words Into Phrases - My stuff

OpenNLP tokenizes into individual words, but my own routines recognize multi-word phrases (private people, learned phrases, countries, states like "New Jersey") as single concepts, which is good.  Maybe the OpenNLP tokenizing could be rectified by modifying the WordNet db that OpenNLP uses when tokenizing.

1.4)  Annotate Sentence with singulars, plurals, tense, adjectives, and adverbs from the NLP parse, and REMOVE all these words from the "Normalized Version" of the sentence that will be used for pattern matching.  This leaves me with the "Core of the Sentence", as you said.  This removal will have the HUGE initial benefit of making all my existing processes "match" when the input sentence contains adjectives, adverbs, etc. that the regular expression didn't expect.  I can make the agents where this "additional color" is relevant look for it in the annotations when they do their individual processing.  i.e. the difference between "Go Forward" and "Go Forward Slowly" is an adverb that would be an annotation.  (There is a rough sketch of this step right after this item.)
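A minimal sketch of step 1.4, assuming the parse step already produced (word, part-of-speech tag) pairs in Penn Treebank style (JJ = adjective, RB = adverb):

```python
# Sketch only: strip adjectives/adverbs out of the "Normalized Version" and
# keep them off to the side as annotations.
STRIP_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}   # adjectives and adverbs

def normalize(tagged_words):
    core, annotations = [], []
    for word, tag in tagged_words:
        if tag in STRIP_TAGS:
            annotations.append((word, tag))   # the "additional color"
        else:
            core.append(word)
    return " ".join(core), annotations

print(normalize([("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
                 ("jumped", "VBD"), ("over", "IN"), ("the", "DT"),
                 ("lazy", "JJ"), ("dog", "NN")]))
# -> ('The fox jumped over the dog', [('quick', 'JJ'), ('brown', 'JJ'), ('lazy', 'JJ')])
```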

2)  Find Agents that Match the "Normal Sentence" in Some Way.  The following is basically a priority order: "Exact Matches" take priority over less exact "Pattern Matches".  (See the sketch after this list.)

2.1)  Look for Sentence in List of Commands (AtomList) and return commands to caller.  "Drive Forward"

2.2)  Look for Sentence in List of Chat Requests (AtomList) and return one of the specified chat responses to caller - this is a random routine that checks responses against recent history so as not to repeat.  "What's Up?"

2.3)  Look for 2nd Person Sentence in List of Questions (AtomList) and process accordingly if found (this is a topic unto itself).  "How old is Fred?" translates to "How old be you?"; the question is found and Fred's response to it is returned.

2.4)  Find Regular Expressions that Match Sentence and execute corresponding agents.  “Where is London?”

2.5)  Find "NLP Patterns" that match the Sentence - I will have to invent this if it doesn't already exist.  I need a way to define a bunch of NLP parse expressions with wildcards, and then a way to determine which of those "NLP patterns" match the input sentence's NLP Parse Tree.  "How would you describe the fox?" - from the prior sentence "The quick brown fox jumped over the lazy dog" ... the answer: "The fox was brown and quick."
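The cascade in step 2 might look something like this in code; the little lookup tables here are hypothetical stand-ins for the real AtomLists and agents:

```python
# Sketch of the "exact match first, fuzzier match later" priority order.
import re

commands     = {"drive forward": "CMD_FORWARD"}
chats        = {"whats up": ["Not much.", "Just charging my batteries."]}
questions    = {"how old be you": "ASK_AGE"}
regex_agents = [(re.compile(r"where is (\w+)", re.I), "GEO_AGENT")]

def match_sentence(normal):
    key = normal.lower()
    if key in commands:                      # 2.1 exact command match
        return ("command", commands[key])
    if key in chats:                         # 2.2 chat request
        return ("chat", chats[key][0])       # real code picks one not used recently
    if key in questions:                     # 2.3 known question form
        return ("question", questions[key])
    for pattern, agent in regex_agents:      # 2.4 regular-expression agents
        m = pattern.search(normal)
        if m:
            return ("regex", agent, m.groups())
    return ("nlp_pattern", None)             # 2.5 fall through to NLP-pattern matching

print(match_sentence("Drive Forward"))       # -> ('command', 'CMD_FORWARD')
```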

A bit on atom storage.  I am proceeding with data conversion.  Words, Phrases, and Sentences are atoms.  The Keep It Simple Stupid part of me stores sentences as strings rather than as some ordered list of IDs to other Word atoms.  Maybe when I finally get the Graph/Hypergraph concept in my gut I will change my mind.  Is a string/word/phrase/sentence just a graph of ASCII characters anyway?  I don't see storing sentences as strings as a problem right now, and there are many advantages, as I can always look up a word atom based on its string key or its int key.  It's a whole lot easier to comprehend what's going on in the system if I can see words, phrases, sentences, etc. as strings that I can personally relate to.  I have been thinking about storing a full NLP parse of each sentence in another column, thinking that maybe I could then use SQL and LIKE expressions to do the "NLP Pattern Matching" process described in 2.5 in some way.
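A very rough sketch of that "parse in another column plus SQL LIKE" idea, using sqlite3 just to make it concrete (the table and column names are made up):

```python
# Store the sentence text and its NLP parse side by side, then pattern match
# the parse column with LIKE. Illustrative only.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, text TEXT, parse TEXT)")
db.execute("INSERT INTO sentences (text, parse) VALUES (?, ?)",
           ("The quick brown fox jumped over the lazy dog",
            "(S (NP (DT The) (JJ quick) (JJ brown) (NN fox)) (VP (VBD jumped) ...))"))

# An "NLP pattern" could be a LIKE expression over the stored parse,
# e.g. find sentences that mention "fox" as a noun.
rows = db.execute("SELECT text FROM sentences WHERE parse LIKE '%(NN fox)%'").fetchall()
print(rows)
```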

Try as I might, I can never write short posts on this topic.  I hope this train of thought helps some future hobbyists out there.