**re: Wikipedia**
Jay, I have tried a few variations of your query about Wikipedia.
I map words in my dictionary to terms in Wikipedia. This allows some questions to be answered by going out to Wikipedia and grabbing particular bits of the XML.
Question Answering: An example in the videos is when I ask questions like “What is the population of Germany?” or some such. There are a great many XML tags in the Wikipedia data. I have also tried tags about people, so I can do things like ask how much a given athlete weighs or what team they play for.
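For flavor, the lookup could be roughly like the sketch below. I’m sketching it in Python against the live MediaWiki API rather than a local dump, and `population_estimate` is just one of the infobox parameter names Wikipedia happens to use for countries; none of this is Anna’s actual code.

```python
import json
import re
import urllib.parse
import urllib.request

def infobox_field(title, field):
    # Fetch the article's wikitext and pull one infobox parameter out of it.
    params = urllib.parse.urlencode({
        "action": "query", "prop": "revisions", "rvprop": "content",
        "rvslots": "main", "format": "json", "titles": title,
    })
    req = urllib.request.Request(
        "https://en.wikipedia.org/w/api.php?" + params,
        headers={"User-Agent": "anna-bot-sketch/0.1"},  # the API wants a UA
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    page = next(iter(data["query"]["pages"].values()))
    wikitext = page["revisions"][0]["slots"]["main"]["*"]
    # Infobox parameters look like: | population_estimate = 83,200,000
    match = re.search(r"\|\s*%s\s*=\s*([^\n]+)" % re.escape(field), wikitext)
    return match.group(1).strip() if match else None

# e.g. infobox_field("Germany", "population_estimate")
```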
Reading and Learning: I tried writing a routine that grabs given articles, removes HTML/XML and other markup, and then attempts to find sentences in the remaining data that contain a given word. Of the results, this routine currently generates about 30% garbage. The rest are decent knowledge about the world that a bot could use in conversation. Right now I don’t set it loose, because I don’t want the garbage, and the routines could run until my hard drive filled up with excess trivia.
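Roughly, the idea is something like this sketch; the regexes and the crude garbage filters at the end are one guess at how to cut the 30%, not my actual routine.

```python
import re

def sentences_about(raw_article, word):
    # Strip markup, then keep sentences that mention the given word.
    text = re.sub(r"<[^>]+>", " ", raw_article)            # HTML/XML tags
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)            # {{templates}}
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # [[links]]
    text = re.sub(r"\s+", " ", text)
    sentences = re.split(r"(?<=[.!?])\s+", text)           # naive splitter
    word_re = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    keep = []
    for s in sentences:
        if not word_re.search(s):
            continue
        # crude garbage filters: too short, or leftover markup survived
        if len(s.split()) < 5 or any(c in s for c in "{}|=["):
            continue
        keep.append(s.strip())
    return keep
```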
Blogosphere
Sounds interesting. I’m not really a blogger, so I have trouble imagining the possibilities. I suppose “inferring/learning” from many different sources (Twitter?) could lead to a lot of robot knowledge or a bunch of fluff, depending on the source. I would need a good way to filter out the garbage, like the wiki issues. Or is some probabilistic inference approach possible?
NLP
I believe words, phrases, etc. should be atoms. Last night I converted all of Anna’s known words, phrases, and associations into atoms. This is paramount for word/phrase associations to occur. Some people call these “triples”: “Birds can fly”, “Humans are primates”, etc. The next level would be triple+, something like “Birds have 2 wings”.

This might sound like a weird exception, but the one thing I am leaving out of atoms currently is the Princeton WordNet data, as I mainly just use it to look up the definition or part of speech of a word. It has 210,000 rows, and I don’t think all of that is particularly needed in the atomspace. I could change my mind and import them if I decide to decipher their synonym/antonym data, but I can get that sort of stuff from a web service, and I have been teaching those to Anna manually in speech. Besides, I automatically create a new “word” atom the first time the bot encounters it, so I think I could do without around 190,000 atoms that I will likely never use and can look up in the other table when needed.

My next goal is to convert sentences. This is probably my most complex conversion, as I have a lot of substructure and subtypes of sentences.
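In code terms, I’m picturing something like this Python sketch; the class and field names are mine for illustration, not Anna’s actual internals:

```python
from dataclasses import dataclass

@dataclass
class Atom:
    # Hypothetical minimal atom: words, phrases, and associations alike.
    atom_type: str   # "word", "phrase", "association", ...
    name: str
    usage: int = 0   # tally of uses; see "Usage" below

@dataclass
class Triple:
    # "Birds can fly" -> (birds, can, fly); a triple+ carries a qualifier.
    subject: Atom
    predicate: Atom
    obj: Atom
    qualifier: str = ""   # triple+: "Birds have wings" qualified by "2"

fly = Triple(Atom("word", "birds"), Atom("word", "can"), Atom("word", "fly"))
wings = Triple(Atom("word", "birds"), Atom("word", "have"),
               Atom("word", "wings"), qualifier="2")
```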
It looks like a good bit of OpenCog’s triples and other brain data is available. I could load those as atoms. That would be quite a knowledge set.
I include a “Usage” on every atom that tallies the number of times that word has been used. That alone is probably not enough; I will probably also need a “probability” or “strength” for an atom. I’m not worried there, as it’s easy stuff to create. The question is how to use it.
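One possible answer to “how to use it”, again just a sketch with made-up numbers: treat strength as a 0–1 belief that drifts toward 1 on confirmations and toward 0 on contradictions, while usage keeps counting.

```python
from dataclasses import dataclass

@dataclass
class Atom:                  # trimmed to the two counters in question
    name: str
    usage: int = 0
    strength: float = 0.5    # 0..1, starts neutral

def reinforce(atom, confirmed=True, rate=0.1):
    # Every mention bumps the tally; confirmations nudge strength toward 1,
    # contradictions toward 0.
    atom.usage += 1
    target = 1.0 if confirmed else 0.0
    atom.strength += rate * (target - atom.strength)

birds_fly = Atom("birds can fly")
reinforce(birds_fly)                    # heard it again -> strength rises
reinforce(birds_fly, confirmed=False)   # contradicted -> strength dips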
Attachment Theory and Trust Level
I am considering some type of robot/human “attachment theory” and “trust level” where the robot is inherently trusting of its creator and close friends (which must be earned), and inherently skeptical of others without multiple sources. This means I will need to save sources for everything. That means a lot of “source” atoms, which are simple associative atoms pointing to people, websites, etc., which are themselves simply more atoms. But it is useful to be able to interact socially and say things like “I heard from Jenn that your mother was sick” or “Wikipedia says birds are modern day dinosaurs.” I think this is necessary. Also inherent in this model would be “trust level” atoms to represent the level of trust that the robot has in a given person/source/website/etc.
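A minimal sketch of the bookkeeping, assuming a flat trust table and a statement-to-sources map (all names and numbers here are hypothetical):

```python
from collections import defaultdict

trust = defaultdict(lambda: 0.3)   # strangers start out skeptical...
trust["creator"] = 1.0             # ...the creator starts out trusted
trust["Jenn"] = 0.9                # close friends earn theirs over time

sources = defaultdict(set)         # statement -> the source atoms behind it

def learn(statement, source):
    sources[statement].add(source)

def believed(statement, threshold=0.8):
    # One trusted source is enough; otherwise corroboration has to add up.
    return sum(trust[s] for s in sources[statement]) >= threshold

learn("your mother was sick", "Jenn")
learn("birds are modern day dinosaurs", "Wikipedia")
# The saved sources double as citations:
# "I heard from Jenn that your mother was sick."
```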
B.S. Detector
Ideally I’d like the robot to evaluate a new statement like “Penguins can fly” against prior statements from other sources like “Penguins can not fly” and determine which statements are credible, which should be forgotten, and which sources might not know what they are talking about because they seem to give me a lot of B.S., lowering their trust level accordingly.
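That resolution step might look something like this, reusing the same hypothetical trust/sources tables as above:

```python
from collections import defaultdict

trust = defaultdict(lambda: 0.3, {"creator": 1.0})   # same shape as above
sources = defaultdict(set)                           # statement -> who said it

def resolve(statement, contradiction, penalty=0.1):
    # Keep whichever claim has more total source trust behind it, forget
    # the other, and dock the losing sources' trust a notch.
    score = lambda s: sum(trust[src] for src in sources[s])
    winner, loser = ((statement, contradiction)
                     if score(statement) >= score(contradiction)
                     else (contradiction, statement))
    for src in sources[loser]:
        trust[src] = max(0.0, trust[src] - penalty)  # "gives me a lot of B.S."
    sources.pop(loser, None)                         # the loser gets forgotten
    return winner

sources["penguins can fly"].add("random website")
sources["penguins can not fly"].add("creator")
resolve("penguins can fly", "penguins can not fly")  # -> "penguins can not fly"
```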
Brain Viewing
I would like to make the brain “viewable” as soon as I can. The sooner I can make it transparent, the sooner I can get help. I have to convert all her existing memories to atoms and write a webapp to view and edit the atomspace. At that point I could make the memory “viewable” and editable. The brain would not be usable, though, until I rewrite the various services and agents to use the new memories and re-test everything. That’s before I make any improvements.
It’s a lot of work, but I know it can be done, I know it will be powerful, and I know I could do it if I could sit down for even 2 weeks of straight time and crank it out. Unfortunately, my days are filled right now with other summer labors. It will happen though. I would like to pull you and a few others in at that point to try it out and keep contributing ideas. I very much value your ongoing thought contribution/leadership on this.
I’ve mellowed in my enthusiasm for OpenCog. I’m no longer frothing at the mouth; I’m merely a big fan and student of their work. I still lack the computer science and AI background to grok a lot of the techniques they are using. Perhaps others can help on that. I like Minsky’s ideas too, and they seem similar (although I haven’t read his stuff, just the wiki). Minsky makes me think that my memory ideas and having a bunch of fairly simple agents for different purposes are a good foundation to build on. We’ll see.
A final thought…“Gesture” atoms. A gesture would represent a pose or a set of synchronized movements. If robot responses can be associated with gesture nodes, then this facilitates more expressive behavior beyond just facial emotion expression, if a robot has arms, head movement, or other means of movement/expression. Figuring out how to map that to servo movements will be on my wish list for next year. More atoms.
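For what it’s worth, a gesture atom might be as little as a name plus timed keyframes over servo channels; the channel names and angles below are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Gesture:
    # A pose or synchronized movement: (seconds, {servo_channel: angle}).
    name: str
    keyframes: list

shrug = Gesture("shrug", [
    (0.0, {"left_shoulder": 0,  "right_shoulder": 0}),
    (0.4, {"left_shoulder": 45, "right_shoulder": 45}),  # shoulders up
    (1.0, {"left_shoulder": 0,  "right_shoulder": 0}),   # and back down
])
# Playback would just step the servos through the frames in time order.
```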