re: Memories and Indexing
Hi Peter,
I’ve been admiring your work for a long time. I started printing my own InMoov but got distracted and built Ava instead. I’ll do my best to answer your question in as simple a way as possible without losing the meaning.
I used to be a data modeller for a lot of Fortune 500 companies, among other things, and everything we did was in fifth normal form. What I am about to share breaks a lot of conventions and has some disadvantages, but there are a great number of advantages. When you are doing something this big solo, as I am, finding ways to save big chunks of dev time is a huge factor.
The primary table is called “Atom”. An atom represents a memory. Every atom has an AtomType which tells you what kind of memory it is. I’m guessing I have around a hundred different memory types. All the atom types are also defined and stored as atoms. Each type has a different meaning for its data columns. One of the columns for an AtomType stores the name of the table where that type of memory will be stored. This means I can split memories across multiple tables that all have the same structure, though I only use a few.
Every atom has a name, around 5 foreign keys to itself (keys to other atoms), a few integer columns, a few varchar columns, and a few housekeeping columns (Source, Truth, ModifiedDate). You don’t know what is stored where until you know what the atom type of a memory is. All atoms of the same type use the columns in a consistent way.
I have a type called AtomTypeField. It is a memory that stores the meaning of a field for a given atom type. It can define a label for the field, whether the field is required, and another AtomType to be used as a list of values for the field (if applicable).
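To make the idea concrete, here is a minimal Python sketch of an atom and the field metadata that gives its generic columns a meaning. It is an illustration only, not Ava's actual code: the column names (`ref1`, `int1`, `text1`), the type id `COLOR_FACT`, and the `describe` helper are all hypothetical, and AtomTypeField is modelled as a separate class for readability even though in the real system it is itself stored as an atom.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Atom:
    """One memory. The generic column names are hypothetical stand-ins."""
    atom_id: int
    atom_type_id: int            # key to the atom that defines this memory's type
    name: str
    ref1: Optional[int] = None   # generic foreign keys to other atoms
    ref2: Optional[int] = None
    int1: Optional[int] = None   # generic integer column
    text1: Optional[str] = None  # generic varchar column
    source: Optional[str] = None # housekeeping


@dataclass
class AtomTypeField:
    """Says what one column means for one atom type."""
    atom_type_id: int
    column: str                           # which generic column is described
    label: str                            # label a form would show
    required: bool = False
    values_type_id: Optional[int] = None  # atom type supplying a list of values


def describe(atom, fields):
    """Return {label: value} for an atom, using the definitions for its type."""
    labelled = {}
    for field in fields:
        if field.atom_type_id != atom.atom_type_id:
            continue
        value = getattr(atom, field.column)
        if field.required and value is None:
            raise ValueError(f"{field.label} is required")
        labelled[field.label] = value
    return labelled


COLOR_FACT = 7  # hypothetical atom type id
fields = [
    AtomTypeField(COLOR_FACT, "ref1", "Subject", required=True),
    AtomTypeField(COLOR_FACT, "text1", "Color", required=True),
]
fact = Atom(atom_id=101, atom_type_id=COLOR_FACT, name="potato color",
            ref1=55, text1="brown")
labelled = describe(fact, fields)
```

A generic edit form can be driven the same way: loop over the field definitions for the atom's type and render one labelled input per definition.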
While this may sound a bit weird, the end result is I could build a single form (in a day or two) that allowed me to edit any memory, regardless of its structure, and provide labels and dropdown lists where appropriate, and show me all the other memories that use this memory. This means one form can maintain any of 100 different memory types and growing, as well as all the foreign keys. This saves months of work and a myriad of bugs over more conventional methods.
This also means I can dream up new types of memories and create them and get them working in the system in minutes. I simply go to the UI and create a new AtomType in a form, and create a few AtomTypeFields in a form to define the columns for that new memory type. So quick, so easy, no coding and no bugs.
Likewise, a single class can handle loading and persistence for all memories. I called it…Atom. Quick, simple, no bugs. All code everywhere only has to deal with a single class…Atom, however it does need to know what kind of atom it is and the meaning of its fields. This is in practice not a big deal, as a given Agent (algo) in the system usually is only dealing with one or a few memory types.
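As a rough illustration of one loader serving every memory type, here is a Python sketch using an in-memory SQLite database. Everything specific in it is an assumption made for the example: the column names, the convention that a type atom keeps its table name in `text1`, the `ROOT_TYPE_ID` of 1 for "the type of types", and the `load_atoms` helper. The real schema and database are not described in this post.

```python
import sqlite3

COLUMNS = ("id", "atom_type_id", "name", "ref1", "int1", "text1")
ROOT_TYPE_ID = 1  # hypothetical: the atom type whose atoms are atom types


def load_atoms(conn, type_name):
    """Load every atom of the named type as a list of dicts."""
    # Atom types are atoms too, so look the type up in the Atom table.
    row = conn.execute(
        "SELECT id, text1 FROM Atom WHERE atom_type_id = ? AND name = ?",
        (ROOT_TYPE_ID, type_name),
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown atom type: {type_name}")
    type_id, table = row
    # The table name comes from data, so check it before building SQL with it.
    if not table.isidentifier():
        raise ValueError(f"bad table name: {table!r}")
    cursor = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM {table} "
        "WHERE atom_type_id = ? ORDER BY id",
        (type_id,),
    )
    return [dict(zip(COLUMNS, record)) for record in cursor]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Atom (id INTEGER PRIMARY KEY, atom_type_id INTEGER, "
    "name TEXT, ref1 INTEGER, int1 INTEGER, text1 TEXT)"
)
conn.executemany(
    "INSERT INTO Atom VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "AtomType", None, None, "Atom"),  # the type of types
        (2, 1, "Word", None, None, "Atom"),      # a memory type stored in Atom
        (3, 2, "potato", None, None, None),      # two memories of type Word
        (4, 2, "carrot", None, None, None),
    ],
)
words = load_atoms(conn, "Word")
```

The same function loads the type definitions themselves with `load_atoms(conn, "AtomType")`, which is the payoff of storing types as atoms.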
How are things indexed?
This is going to get weird again. For most of my memory types, my code doesn’t use the database directly. I use a class I have written called an “AtomList”, that serves as a cache (of all Atoms of a given type) and an index so they can be looked up in the best way for that type of memory, usually by name but sometimes not. An AtomList basically holds two hybriddictionaries…one for the Atoms (memories) of a given type, and one for an index so the data can be looked up by id, name, or a combo of columns if needed. The hybriddictionaries are the part of the system being pounded, not the db.
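A minimal Python sketch of the AtomList idea follows, with plain `dict` objects standing in for the hybriddictionaries. The method names (`add`, `by_id`, `find`), the dict-shaped atoms, and the default key of lower-cased name are assumptions for illustration; the post does not show the real class.

```python
class AtomList:
    """In-memory cache of all atoms of one type, plus a lookup index."""

    def __init__(self, atoms=(), key=lambda atom: atom["name"].lower()):
        self._atoms = {}   # atom id -> atom (the cache)
        self._index = {}   # lookup key -> atom id (the index)
        self._key = key    # how this memory type is best looked up
        for atom in atoms:
            self.add(atom)

    def add(self, atom):
        self._atoms[atom["id"]] = atom
        self._index[self._key(atom)] = atom["id"]

    def by_id(self, atom_id):
        return self._atoms.get(atom_id)

    def find(self, key):
        """Look an atom up by its index key; None if there is no match."""
        atom_id = self._index.get(key)
        return None if atom_id is None else self._atoms[atom_id]

    def __iter__(self):
        return iter(self._atoms.values())


# Most types are looked up by name.
words = AtomList([
    {"id": 1, "name": "Potato"},
    {"id": 2, "name": "Carrot"},
])

# Some types need a combination of columns as the key instead.
color_facts = AtomList(
    [{"id": 10, "name": "potato color", "subject": 1, "relation": "color"}],
    key=lambda atom: (atom["subject"], atom["relation"]),
)
```

With this shape, `words.find("potato")` and `color_facts.find((1, "color"))` are both dictionary lookups that never touch the database.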
In general, I am caching this much because I do so many memory lookups and iterations for the verbal processing and the logic operations. Something like “What color is a potato?” results in a lot of memory lookups. My DB won’t use multiple threads, and eventually I would like to make my brain execute a lot of its major features simultaneously.
I have a “MemoryAgent” that loads up all these AtomLists on startup for all the frequently used memory types. Each type that I want to load ends up taking one line of code. Loading up most of the memories ends up taking a few seconds on my PC, but after that and after OpenNLP warms up, everything is zippy.
I still have features that use the database and SQL, they are just not frequently used algos. I have a “MemorySearch” object that can do just about any search I need. There is also a facility for caching the results of those searches with a unique key so that repetitive searches can be avoided, which I use in a few cases.
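The search-result cache might look something like the Python sketch below. The `SearchCache` class, its `search` signature, and the way the unique key is built from the atom type plus sorted criteria are assumptions for illustration, and `fake_search` merely stands in for a real database query.

```python
class SearchCache:
    """Remembers search results under a unique key so repeats skip the DB."""

    def __init__(self, run_search):
        self._run_search = run_search  # function that really runs the search
        self._results = {}

    def search(self, atom_type, **criteria):
        # Sorting makes the key independent of argument order.
        # Criteria values must be hashable for this to work.
        key = (atom_type, tuple(sorted(criteria.items())))
        if key not in self._results:
            self._results[key] = self._run_search(atom_type, criteria)
        return self._results[key]

    def invalidate(self):
        """Drop cached results, for example after memories change."""
        self._results.clear()


calls = []


def fake_search(atom_type, criteria):
    # Stand-in for a SQL query; records each call so repeats are visible.
    calls.append((atom_type, dict(criteria)))
    return ["brown"]


cache = SearchCache(fake_search)
first = cache.search("ColorFact", subject="potato")
second = cache.search("ColorFact", subject="potato")  # served from the cache
```

Only the first call reaches `fake_search`; calling `cache.invalidate()` forces the next search to run again.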
I have a similar mechanism for web requests and web service requests, since each one can be thought of as a unique URL plus parameters that results in a string of output content. You can cache this in a hybriddictionary with the URL as a key, stick it in between your code and the web, and avoid a lot of repetitive web requests. If your bot is going to spend 5 minutes talking about today’s news or weather, it helps not to have to keep retrieving it from a 3rd party, license permitting.
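Here is a hedged Python sketch of such a web cache, keyed by the full URL including its parameters. The `WebCache` class, the `max_age` expiry (not mentioned above, added so stale news or weather eventually refreshes), and the example URL are all assumptions for illustration. The fetch function is injectable, so the demonstration uses a fake one and makes no network calls.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get(url):
    """Default fetcher: return the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class WebCache:
    """Sits between your code and the web, keyed by URL plus parameters."""

    def __init__(self, fetch=http_get, max_age=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age  # seconds before an entry is refetched
        self._clock = clock
        self._store = {}         # full URL -> (time fetched, body)

    def get(self, url, params=None):
        # Sort the parameters so the same request always gives the same key.
        query = urlencode(sorted(params.items())) if params else ""
        full_url = f"{url}?{query}" if query else url
        now = self._clock()
        hit = self._store.get(full_url)
        if hit is not None and now - hit[0] < self._max_age:
            return hit[1]
        body = self._fetch(full_url)
        self._store[full_url] = (now, body)
        return body


fetched = []


def fake_fetch(url):
    # Stand-in for a third-party service; records every real request.
    fetched.append(url)
    return "sunny"


web = WebCache(fetch=fake_fetch)
a = web.get("https://example.com/weather", {"city": "Austin", "units": "f"})
b = web.get("https://example.com/weather", {"units": "f", "city": "Austin"})
```

Both calls return the same content, but only the first one reaches the fetcher, because the sorted parameters produce an identical key.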
Cheers,
Martin