Post by Admin on Jun 1, 2015 18:27:47 GMT
See Attachment
Processing Algorithms, routines and functions. These are listed in the order they are called in the program.
SC Sentence Code
The code is created from each word's type value. If the word isn't in the Brain Database then the code associated with the word is set to B.
ST Sentence Types
This process attempts to work out what the inputted sentence is about. The possible sentence types are split into 5 main groups, and an input can have more than one type.
**** REFERENCE
RY: you referring to the Ai
RM: me referring to the inputter
RB: both saying something from me to you, like i want you or i thank you
**** FORCEFUL perform some task
FI: instruction : command or keyword
FF: fact : update brain
FP: preaching or dictating : defensive mode no religion
FA: answer : update brain
FQ: question : reply with an answer
**** GENERAL CHAT reply with something interesting and relevant
GA: anecdote or story : try and find similar references relevant to ai and the story
GR: rambling conversation : tell a joke or story
GJ: joke : laugh
GQ: quote : maybe update brain, or ask a question about the quote
**** LEARNING update database with new information
LI: information : update brain
LO: observation : question the observation maybe update brain
LS: suggestion : question the suggestion, maybe update brain
**** CONNECTED TO THE AI'S LAST OUTPUT the subject or object is set; ongoing chat could be connected to other things
CR: reply : find something to say
CE: repeat : rephrase the last output
CS: same as reply but said differently : something like exactly or see it as understanding the last reply
CP: rephrase : something like exactly or see it as understanding the last reply
****
Cell 1606 holds information on the conversation:
field 7 = subject
field 8 = object
field 10 = sentence type codes
The in-cell fields hold the subject and object nouns: cell 1 is the subject, cell 2 the object. As new words are added the list moves up until words drop off the list at cell 9.
QI Question and Is
If the input contains a question word (word type 34) and the word is (word type 47), then the following word is looked up and its detail field is sent to the output.
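As a rough illustration only, here is a minimal Python sketch of the QI check; word_type and detail_of are hypothetical stand-ins for the Brain Database lookups, not the program's real functions.

def qi_routine(words, word_type, detail_of):
    # QI: if a question word (type 34) and 'is' (type 47) are both present,
    # look up the word that follows 'is' and return its detail field.
    types = [word_type(w) for w in words]
    if 34 in types and 47 in types:
        is_pos = types.index(47)
        if is_pos + 1 < len(words):
            return detail_of(words[is_pos + 1])
    return None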
LM Learning Mode
Creates associations between cells using the prompts in system cell 923
B Brain
The Brain Database is the database used by the Ai for all things
QNV Question Noun Verb
A question about a noun and its associated verb causes the QNV routine to scan through the various words' detail fields to create associations. The resulting words are built into a sentence using the information from cell 909, the output filter for QNV: field 8 = first part "i think that", file field = second part "are", field 10 = third part "with", field 11 = fourth part "do".
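As a rough sketch only, one way the four cell 909 parts could be interleaved with the words the QNV scan produces; the exact assembly order and the dictionary layout are assumptions, not taken from the program.

def qnv_sentence(cell909, found_words):
    # Alternate the connector parts from cell 909 with the found words.
    connectors = [cell909["field8"], cell909["file_field"],
                  cell909["field10"], cell909["field11"]]
    pieces = []
    for connector, word in zip(connectors, found_words):
        pieces.append(connector)
        pieces.append(word)
    return " ".join(pieces)

cell909 = {"field8": "i think that", "file_field": "are",
           "field10": "with", "field11": "do"}
print(qnv_sentence(cell909, ["cats", "animals", "fur", "purring"]))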
EI Expected in Input
The routine looks for words in the input that it is waiting for an answer to, set from a previous input. Field 8 and the file field hold the two words. Depending on which word is found, the response is taken from field 10 or field 11: field 10 is the response if the word in field 8 is found, and field 11 is the response if the word in the file field is found. If the type field is set to 68 the response is put back into the input; if the type field is 69 the response is sent to the output. The operation field in entry 811 is the number of times to keep asking for the word before giving up, and field 11 of that entry holds the "I give up" phrase.
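A minimal sketch of the EI decision, with the entry shown as a plain dictionary whose keys mirror the field names above; this is illustrative only.

def expected_in_input(input_words, ei_cell):
    # field8 / file_field hold the two expected words,
    # field10 / field11 the matching responses,
    # and type 68/69 decides where the response goes.
    if ei_cell["field8"] in input_words:
        response = ei_cell["field10"]
    elif ei_cell["file_field"] in input_words:
        response = ei_cell["field11"]
    else:
        return None, None
    destination = "input" if ei_cell["type"] == 68 else "output"
    return response, destination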
K Keyword
Keywords are predefined words in the keyword field that perform various functions. Keywords are described in more detail here
The keywords exitai and restartai are predefined within the program; all other keywords are activated by a user-defined cell word. Some keywords can only be activated with a pattern.
PR Pattern Recognition, PC Pattern code and SC Sentence Code
The Sentence Code SC is created using the word type numbers of the input words. Each number corresponds to a character in the ASCII character set, and the characters are joined together to build the Sentence Code.
If SC learning is ticked in settings then sentence codes created from the input are added to the Brain db.
The type field for a sentence code is 32; the cell is only used if an exact match is found with the input's sentence code. To turn the SC into a Pattern Code PC the cell type field needs to be changed to 60. Pattern cells are used when the pattern is found anywhere in the input's sentence code.
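As a rough illustration, a small Python sketch of building a sentence code from word type numbers and of the difference between an SC cell (type 32, exact match) and a PC cell (type 60, match anywhere); the names and cell layout are stand-ins.

def sentence_code(word_types):
    # Each word type number maps to an ASCII character; the characters
    # are joined in order to form the Sentence Code.
    return "".join(chr(t) for t in word_types)

def cell_matches(cell, input_sc):
    if cell["type"] == 32:          # Sentence Code: exact match only
        return cell["code"] == input_sc
    if cell["type"] == 60:          # Pattern Code: anywhere in the input SC
        return cell["code"] in input_sc
    return False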
Tags in field 11 are used to build an output
<prsendtobrowser>
<prsendtosearchengine>
<prsetname> sets the name variable and, if SC learning is ticked in settings, adds the name to the Brain if it's not already there. If the name is longer than one word then a dash - is put in between the parts. If there is a number in field 10 then that number is used for the Brain type field instead of the usual 51, which is the type denoting a name. The name with spaces is added to the detail field. The name is that of the person talking to the Ai.
<prshowname> reads out the name variable and puts the contents of field 8 in front of the name. If the name is in the Brain then the contents of the detail field of the cell holding the name are added to the output.
<prwordafter>
<activate> activates the cell's neural network; sometimes other cells are activated instead of the pattern cell, depending on the other tags used
<use7> adds contents of field 7 to the output variable
<use8> adds contents of field 8 to the output variable
<use10> adds contents of field 10 to the output variable
<{any word type}> any pattern code, e.g. the pattern code for a noun is %, created because f4 of the fish entry in the TF is 2, so if you use the tag <%> then the noun in the input is used. Because the whole sentence code is scanned, words that aren't in the pattern but are still in the sentence code can be used for the output. F11 is used for the output with words and tags, e.g. 'what is <B>' would use the B-coded word in the input (a word not in the Brain) for the output in place of the tag <B>.
F9 is used when there is nothing in f10; this is legacy code and due to change.
NN Neural Network, NNE Neural Network Extended, BD Brain Database
The Brain Database is two databases merged together. These databases were the Translation File TF and the Neural Network Extended NNE databases, which were used in previous versions of the Ai program. All references to the TF or NNE now refer to the Brain Database BD.
Each entry in the Brain Database is called a cell. Each cell can be connected to other cells and grouped together in a matrix of cells. The processing of cells is done using a matrix file: this file holds the numbers of the cells to be activated, each line holds one number, and the cells are processed until the end of the file is reached. Cells can also be activated using a keyword, a Brain event or directly from the database window.
The file names in entry 919 (field 8, the file field and field 10) are the names of the matrix files used. Field 8 is the first to be read, then the file field and then field 10. If there is nothing in a field then nothing happens. The file field is usually ...\weights\6PMatrix.txt; this file is created by the NNE6P routine. When the matrix file is created it uses the numbers in the files named in field 8, the file field and field 10.
Field 11 is put into the box marked Cell Action; this means the cells can have a default output set when the words are added.
Brain cells 800 to 2000 are used for various things by the program; look in System Cells for more details.
When a cell is activated the cell's neural network is fed the input code. This is processed through the network and produces the output code. The network is trained using the training data, which controls the output produced for a given input. The network will always produce an output, even if it hasn't been trained with that particular input code.
The input is a 9-bit code of 0s and 1s; the output is also a 9-bit code of 0s and 1s.
Weights are created when the network is trained; these are stored within the Brain Database in a hidden field in the cell data.
The field Cell Action performs actions before Tags are interpreted
cell IN to cell OUT = send input code straight to output code, bypasses processing by the NN
cell OUT to cell IN = feeds the output back into the input
The boxes on the right are used to connect the output nodes of the NN to input nodes in other cells. The input code in the target cell is altered at the connected node, e.g. node 6 to node '4' in cell '2': if the output of the cell is 000001000 then, counting from the left, node 6 is 1, so if the input code of cell 2 was 010000000 it will now be 010001000; none of the other nodes are changed. If an input code is created by other cells then it's important to have the final cell at the end of the list of cells in the matrix file, which processes the cells in sequence.
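A minimal sketch of the node-connection rule, with cells held as 9-character strings of '0'/'1' counted from the left; the connection numbers in the example are illustrative only and the data layout is an assumption.

def apply_node_connections(output_code, connections, cells):
    # connections: list of (output_node, target_cell, target_node) tuples,
    # as set up in the boxes on the right of the cell window.
    for out_node, target_cell, target_node in connections:
        if output_code[out_node - 1] == "1":
            code = list(cells[target_cell])
            code[target_node - 1] = "1"   # only the connected node changes
            cells[target_cell] = "".join(code)
    return cells

cells = {2: "010000000"}
apply_node_connections("000001000", [(6, 2, 4)], cells)
print(cells[2])   # node 4 of cell 2 is now set: 010100000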
O to TRAIN: this button puts the input code and the output code into the training file (the weights filename with a .nnt extension).
The Tags field is used for function tags which do various things; click here for details of Brain Tags.
RAS Reference Action Subject
The RAS routine breaks down the input into 3 distinct types. This routine is designed to look at individual words in more depth. The routine uses the search routine, first looking for the subject word, then rescanning for the reference and action words. The assigning of words to the different types depends on the types of words in the input, i.e. the subject word may be the noun in the sentence and the reference may be the verb, but if there is no verb and there are 2 nouns then the reference word will be one of the nouns. This routine is particularly useful for finding information that could be in many files. The test search file contains over 60 different files, from text files to web pages, that are searched. The search takes less than a second on a 1296 MHz laptop. Because the RAS routine breaks the sentence down to only 3 words you would expect the meaning of the sentence to be lost, but most times this isn't the case. Consider the question 'please tell me what the capital of england is': RAS = england-tell-capital, which quite clearly describes what is wanted. This also means that different ways of talking or asking for something can still produce the correct answer.
If a file that is searched contains 2 or more of the RAS words then the sentence containing the words is sent to the output.
The Brain Database is also searched for the 3 words joined, e.g. england-tell-capital; if found in the database the sentence in field 8 is sent to the output.
The file name in field 11 of entry 819 is the file name of the text file holding file names of files to search when using RAS
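A rough sketch of the RAS file search, assuming plain text files and a naive split on full stops; the real routine is certainly more involved.

def ras_search(ras_words, filenames):
    # ras_words: the reference-action-subject triple,
    # e.g. ["england", "tell", "capital"].
    # Return the first sentence containing at least 2 of the 3 words.
    for name in filenames:
        with open(name, encoding="utf-8", errors="ignore") as f:
            text = f.read().replace("\n", " ")
        for sentence in text.split("."):
            hits = sum(w.lower() in sentence.lower() for w in ras_words)
            if hits >= 2:
                return sentence.strip()
    return None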
SJ Subject
The Subject routine attempts to work out what subject the conversation is about. The most frequently found word in the Short Term Memory STM is compared against key words that define subjects. Cells 840 to 900 are used by the Subject routine. Fields 8, 9 and 10 contain the keywords that the most occurring word in the input is compared with. The sentence in field 8 of the subject match is sent to the output. Words in field 7 are sometimes fed back into the input. This happens when there is no other output or the output is repeated.
Cell 40 is the 'no subject set' response.
SJN Subject New
The new subject routine is designed to better extract meaning from the input, using a number of words in the input to do this, e.g. 'do you like talking', 'would you like to talk to me', 'what do you like talking about'. The key operators to distinguish between are:
do you like
would you like
what do you like
Subject New uses cells 860 to 900, with the more complicated meanings toward the end. So if there is, say, 'you like' then this meaning is used unless it is overwritten by another meaning in a following cell; if, say, 'do you like' is a match then that meaning is used instead. Cell 860 is used by the Emotion routine so the order, 11 and 8 fields are sometimes changed. Cells 861 to 900 are user defined. Field 11 is used for the subject words. The cell with the best match is activated. The Subject Reply variable (RQ number 14 in settings) is set with the output from the cell activation. E.g.
field contents
order 3
field 8 if you <'> so
field 11 do what
Using order set to 3 invokes one of the Brain translation routines, which also uses PR Tags. The word type code between brackets is replaced by the word of that type in the input. If a word of that type isn't found in the input then the tag is replaced with a space.
STM Short Term Memory
The STM is sometimes referred to as the Data Store. The length of the STM is set in the box called Learning file length in settings. The number is the number of previous inputs to remember and refer to when looking for repeat occurrences of words while trying to work out what we are talking about. The word is first looked for using the Subject routine; if no match is found then the database is searched.
Brain Database cells 35 to 38 are the various default outputs if nothing is in field 11 of the word found most often in the STM. If SKIP is in field 11 then the word is ignored; if NoSTM is in field 11 then the previous word is used. 35 = STM noun, 36 = STM verb, 37 = word found in STM and TF but not a noun or verb, 38 = word in STM but not in the database.
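A minimal sketch of picking the most frequent word from the last few inputs and choosing the matching default cell (35 to 38); is_noun, is_verb and in_brain are hypothetical stand-ins for the real lookups.

from collections import Counter

def stm_most_frequent(previous_inputs, stm_length):
    # stm_length corresponds to the 'Learning file length' setting.
    words = []
    for line in previous_inputs[-stm_length:]:
        words.extend(line.lower().split())
    return Counter(words).most_common(1)[0][0] if words else None

def default_cell_for(word, is_noun, is_verb, in_brain):
    if not in_brain(word):
        return 38     # word in STM but not in the database
    if is_noun(word):
        return 35     # STM noun
    if is_verb(word):
        return 36     # STM verb
    return 37         # in STM and TF, but not a noun or verb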
QN rq slider in settings
This is the setting for the depth that the QNV algorithm goes to when finding an answer; it can be heavy on processing power.
SM Sum Maths algorithms
P and PAR Parameters
The PAR algorithm attempts to get as much information as it can from the inputs received. The first thing it does is scan the input for nouns; the found nouns are compared with nouns found in previous inputs and weights are applied to them, and the noun that appears the most is treated as the main subject noun. RAS keywords, words with 104 in the type field, are treated with the highest priority. Verbs and adjectives are then scanned and those occurring the most are given the highest priority. The words in the priority words field (field 10) are words associated with the main word; the main word should also be in this list. The words in this list are the words used when searching the information files. The file in the file field is a text file holding the filenames of the files to search for the keywords in field 10.
The first results file created holds the results for the main subject word. This first file is then searched using the adjective keywords or the verb keywords, creating a second file, which is searched again to create a third file holding the most narrowed-down answers.
If the third file is created then the full reply quality is assigned (RQ weight no. 19 in settings); if only 2 results files are created then RQ = RQ weight / 2; if only one file is created then RQ = RQ weight / 3.
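The reply-quality scaling just described, as a small sketch; rq_weight stands for RQ weight no. 19 in settings.

def par_reply_quality(results_files_created, rq_weight):
    if results_files_created >= 3:
        return rq_weight          # full reply quality
    if results_files_created == 2:
        return rq_weight / 2
    if results_files_created == 1:
        return rq_weight / 3
    return 0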
This routine is very good at pinning down information. I use it mainly to find prices from the price list, as it also picks out numbers and you can associate many things with the words. The routine uses the short term memory and will remember the words from previous inputs until they are replaced by another word; this means it will remember what you are talking about.
Because the routine uses information generated by the RAS routine, the RAS RQ slider (11 in settings) must be higher than 0.
WSB Wheatstone Bridge
The Wheatstone Bridge (WSB) variables are set up by comparing the resistance variable over 594 cycles. The frequency and the degree of change set up the other 4 variables. These variables are used to produce words, sentences and codes that trigger a response from the Ai.
Resistance (wsbresistance): the resistance value changes slowly, so this value is used to associate with whole sentences. If the plant is moist, i.e. just been watered, this value is high, so the associated sentence is 'I'm feeling damp'.
Resistance variance (wsbvariance): the resistance variance produces two letters. The first is the number of times the signal has risen over time, the second is the number of times the resistance signal has dropped over time. I use the two letters for yes and no responses: if it produces, say, 'cc' then it's not sure either way; if it's 'cf' then a strong no; if 'db' then a strong yes, etc.
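Going by the 'cc', 'cf' and 'db' examples, one possible reading is that a letter further along the alphabet means more rises or falls; the sketch below follows that interpretation, which is an assumption rather than the program's actual rule.

def wsb_yes_no(variance_letters):
    # First letter = rises, second letter = falls.
    rises = ord(variance_letters[0]) - ord("a")
    falls = ord(variance_letters[1]) - ord("a")
    if rises > falls:
        return "yes"
    if falls > rises:
        return "no"
    return "not sure"

print(wsb_yes_no("cc"))   # not sure
print(wsb_yes_no("cf"))   # no  (falls dominate)
print(wsb_yes_no("db"))   # yes (rises dominate)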
Frequency and frequency variance over time (wsbwave): one of the variables produces letters, another the number of letters in the word, and another is used to determine the length of the sentence. This sentence can then be fed into the main processing part of the program, which learns the words / puts the words into the database; words from the TF are associated with the created words. After the learning process it does the conversion process automatically. When I train the Ai I set it to learning mode and choose words that fit the mood; the other night was a full moon so I did some training and used moon-like words.
The WSB algorithm draws a graph showing in real time the breakdown of the signal
red and blue = number of times the signal rises or falls not sure which is which
purple = red and blue signals combined
green = wsbwave
resistance = wsbresistance
variance = wsbvariance
numbers = the raw data; the signal is digitally processed and this produces 10 different values, the numbers
keep going = keeps the WSB routine running; usually it switches off when the timer loop activates, to save system resources, as the WSB is quite heavy on processor power
The WSB algorithm is assigned a reply quality (RQ) weight like all the other processing algorithms in the program.
Because the WSB is constantly producing output the RQ weight is quite low so the WSB response is used when all the other algorithms fail to produce an answer. The resistance response is used when the sentence code (SC) for 'how do you feel' is found. The resistance variance is used when the SC for 'yes or no' is found.
Results so far are 'interesting'; experiments are ongoing and each training session produces more words.
To add new words, WSB learning needs to be ticked in settings. Words are added automatically to the TF, and words already in the TF are assigned to the words created by the WSB. There is no intelligence involved when associating words: the next unused word in the TF is used, so changing words manually is needed in order to maintain sensible replies. To feed the raw words created by the WSB routine into the input, edit TF entry 908 and set the fields thus: f1=8, f4=40, f5 can use any of the WSB outputs (wsball, wsbresistance, wsbvariance, wsbwave), f11=putininput. The RAS routine can interfere with the learning process; sometimes 3 WSB words are added together with a - separating them. If this happens then move the first RAS RQ slider in settings to 0; this stops the 3 words joining together and single WSB words are added (I think it's slider RAS 1, but if it still happens move the other two to 0). I can't decide if this is a fault or if it's useful to have the option; also, if I disable this then other learning routines are affected, so I'm leaving it as it is for now.
SE Search Engine and SEN Search Engine New
SE and SEN use the information in cell 910, also known as BASC Build Advanced Search Code. The routine sends a search request to a search engine then uses the search engine results for the output. The process uses the DownLoadFile, ExtractWords and KnownWordsTwo processes.
System cell 910 holds the information for the process. The file field is the output filename, field 8 is the first part of the web address, and field 10 is the code added after the search string.
The Mode field controls the search string
If Mode = 0 then the search string is built up with the input sandwiched between the contents of field 8 and 10.
If Mode = 1 then the search string placed between fields 8 and 10 is built up in one of 4 ways (see the sketch after this list):
the noun in the input
the noun and the adjective
the noun and the adverb
the noun and the verb
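A rough sketch of sandwiching the search string between the cell 910 fields; the field names follow the description above, everything else (the dictionary layout, the + encoding) is illustrative.

def build_search_url(cell910, user_input, mode, noun=None, modifier=None):
    # cell910: field8 = first part of the web address,
    #          field10 = code added after the search string.
    if mode == 0:
        query = user_input
    else:
        # Mode 1: the noun alone, or the noun plus an adjective,
        # adverb or verb taken from the input.
        query = noun if modifier is None else noun + " " + modifier
    return cell910["field8"] + query.replace(" ", "+") + cell910["field10"]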
Deeper Thought DT
The Brain cells of the input words are activated and the output code is used to select cells to use for the answer. This means you can use the neural network and train it to use certain words/cells in the output. The input code is the sentence code character numbers added together and then converted to the 9-bit binary code used for the input. The thinking behind this routine is to use the full functionality of the neural network and to use all of the input to drive it. This should also help facilitate a feedback loop to build better responses by training the network.
If you click in the output box and the answer has been produced by the DT routine then you can edit the output. When you have edited the output press ok. The entries of the original input words have their output node cell connections changed to the new words from the edited output. New training codes are created and the training file for the cell is updated. If the training file and weights file names differ from the filename created from the cell number, then new training and weights files are created using the format <cell number>.txt.
The training file is then loaded into the neural network training algorithm and new weights are created and stored in the new weights file associated to the cell.
The DT routines are designed to provide a feedback loop and utilize the full functionality of the neural network.
The DT routine attempts to incorporate chaos theory. The training algorithm uses random numbers for the weights, tests those weights through the neural network, keeps the best outputs and starts the process again. This system is used to choose the best words: by editing the output and changing the words, the connected-to words are updated with better words and the best words are kept, so the principle is the same as the genetic algorithm that trains the neural network, but it is done by the person updating the output rather than randomly.
If the same words are changed a lot then the training data builds up, leading to longer training sessions as more conditions need to be met. The number of generations is set in the settings dialog. The more generations, the better the chance of achieving 100% accuracy; if this is reached then the training algorithm is halted and the results written to the weights field used by the cell. The smaller the amount of training data, the fewer conditions need to be met and the quicker the training. There may be instances where all the conditions in the training data can't be met, or would require considerably more generations of training to reach 100%. Future development will probably add a sleep option so that at times of inactivity the Ai uses the time to train the network.
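A toy sketch of the generate-random-weights, keep-the-best, repeat-for-N-generations idea; the forward() function here is a deliberately simple stand-in, not the real 9-node network.

import random

def train(training_pairs, generations, n_weights=9):
    # training_pairs: list of (input_code, target_output_code) 9-bit strings.
    def forward(weights, code):
        bits = [int(b) for b in code]
        return "".join("1" if bits[i] and weights[i] > 0 else "0"
                       for i in range(n_weights))

    def accuracy(weights):
        hits = sum(forward(weights, i) == o for i, o in training_pairs)
        return hits / len(training_pairs)

    best = [random.uniform(-1, 1) for _ in range(n_weights)]
    best_acc = accuracy(best)
    for _ in range(generations):
        candidate = [random.uniform(-1, 1) for _ in range(n_weights)]
        acc = accuracy(candidate)
        if acc > best_acc:
            best, best_acc = candidate, acc
        if best_acc == 1.0:          # 100% accuracy reached, halt training
            break
    return best, best_acc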
Created file
The text file NNECCFile.txt is created by the changed cell inputs routine. As the Ai processes inputs, certain functions and algorithms create information, and this info is added to the Brain Database. (Err, this needs looking at.)
6P NNE 6 Points
This routine uses the NNE database. Entries 2000 upwards are used by 6P. Words in the TF are added to the NNE and each new word is linked back to the TF: the entry number of the word in the NNE is put into field 1 of the word's entry in the TF. The words added are words in the input, but they will only be added if the word exists in the TF. When 6P learning is ticked in settings, then while an input is being processed the 6P words are scanned and associations are made by adding NNE entry numbers to output node cell numbers 5 and 6. Field 12 of the TF is scanned for the word being examined and the word in field 6 is added if it exists in the NNE. The keywords sixplearnon and sixplearnoff turn 6P learning on and off each timer loop; this means learning can be done when the Ai is idle and inputs are processed at the normal speed.
This routine works by making connections to words in the input by following associations with other words. Each word has an entry in the NNE, and the output node cell numbers are used to associate to other words; nodes 1 to 6 are used. The word being examined in the input is checked for in the TF; if it is there, it looks to see if it has an entry in the NNE (if field 1 is higher than 10 then this tells it to go to that number in the NNE). It then looks at the numbers in the output node cell numbers, starting at node 1, and follows the entry numbers checking for connections to the other words of the input. If a direct connection is found then the words that led to the word are used for the output. If no direct route is found then it chooses one of the other answer routes; a bit of filtering is done using grammar and crude associations.
The last 3 bits (rightmost) of the input code can be used to force a route (see the sketch after the list):
******000 = work your way through all nodes
******100 = follow node 1
******010 = follow node 2
******110 = follow node 3
******001 = follow node 4
******101 = follow node 5
******011 = follow node 6
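Decoding the route-forcing bits, as a small sketch (the input code is taken as a 9-character string of '0'/'1').

def forced_route(input_code):
    # Returns 0 for 'work through all nodes', otherwise the node to follow.
    routes = {"000": 0, "100": 1, "010": 2, "110": 3,
              "001": 4, "101": 5, "011": 6}
    return routes[input_code[-3:]]

print(forced_route("000000110"))   # 3: follow node 3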
EB and NNEB Neural Network Extended Backward
This routine uses the keyword from the PAR routine and then, using the NNE, looks for words that connect to that word. It works in reverse compared to the NNE6P routine.
EC and NNEC Neural Network Extended Cell numbers at 1
This routine uses the WordFind routine to find a keyword from the input; the WordFind routine uses different criteria than the PAR routine, so the word is often different. The cell in the NNE associated with the word is activated. For each output node that is set to 1 after activation, if it is connected to another cell then the TF entry word associated with that cell is added to the output. The input code is the input code generated from the user input.
At the moment this routine only uses one of the words from the input, but it is proving very good at producing accurate answers, and because it uses the full functionality of the NNE it is extremely flexible and also opens up more self-learning possibilities.
DID Dig In Deeper
This routine is similar to the 6P routine but follows cell connections to a much greater depth. The routine attempts to find connections between inputted words by following the cells that the inputted words are connected to. The 9 cell node fields contain the cell numbers to follow; these numbers are added either manually or automatically by the LM, DT and 6P learning routines. The output generated is the cell words in between, or the route taken between the two words. This routine checks all the words in the input for connections to each other.
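A rough sketch of following the cell node fields to find a route between two words' cells, done here as a simple breadth-first walk; the real routine may traverse the connections differently.

def did_route(start_cell, target_cell, node_links, max_depth=6):
    # node_links: dict mapping a cell number to the cell numbers held in
    # its 9 cell node fields. Returns the route of cells between the two
    # words, or None if no connection is found within max_depth steps.
    frontier = [[start_cell]]
    seen = {start_cell}
    for _ in range(max_depth):
        next_frontier = []
        for path in frontier:
            for nxt in node_links.get(path[-1], []):
                if nxt == target_cell:
                    return path + [nxt]
                if nxt not in seen:
                    seen.add(nxt)
                    next_frontier.append(path + [nxt])
        frontier = next_frontier
    return None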
6P2 second 6p algorithm experimental
This is an experimental routine and produces no output. Results are displayed on the green screen. This algorithm needs a lot of monitoring to see if the idea is any good and worth using as an output.
Similar to 6P but uses a different method of getting to linked words. It maps data like CDs laid out on a floor, with each CD representing an entry in the Brain; the route is north, east, west, south, and the words used are those it passes through.
BA best answer
The Choose Answer routine decides which output to use. It does this by comparing the reply qualities and the previous answers, aiming to find the output with the highest reply quality that hasn't been used in the last 9 outputs.
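A minimal sketch of that selection rule: prefer the highest reply quality among outputs not used in the last 9 replies, falling back to all candidates if every one has been used recently (the fallback is an assumption).

def choose_answer(candidates, recent_outputs):
    # candidates: list of (reply_text, reply_quality) pairs.
    fresh = [c for c in candidates if c[0] not in recent_outputs[-9:]]
    pool = fresh or candidates
    return max(pool, key=lambda c: c[1])[0]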
CR Create Reply or Keyword
Look at Keywords for details on the functions
GIT Gap In Time
The time since the last input is monitored to see if there has been a long gap. The operation field in cell 929 is the amount of time in seconds to wait before triggering the GIT response, which is in field 8.
HTML web page created
If create html is ticked in settings then a web page is created using the information in cell 832. Look in system cell details for more
IN input from input box
The input from the input box
MSG message
Indicates an incoming message from another program or from the network
OF Output Filter
Performs various filters on the output
OU output sent to output buffer
Outputs sent to the output buffer
OUT Output
Main output from the Ai
Pass number of processes or algorithms used to process the input
The pass number is incremented as the input passes through the various routines; it gives an idea of how much processing different inputs use.
PLY play mp3 music
Indicates the playing of an mp3 audio file
REP reprocess input
If no output is found then sometimes the input is reprocessed
TF Translation File
The name of one of the databases used by Ai versions prior to the rebuild version AiV016a
Preset words
Preset words are words embedded in the program:
exitai shuts down the Ai program
restartai shuts down the Ai program and runs the StartAi.exe program
inputkeyword pops up a messagebox with the words after input keyword in it
Search
The search routine is designed to do different jobs depending on the algorithm that is using it.
RAS uses the routine most so it is advisable to have the number of files searched limited to an acceptable amount depending on how fast the response is required.
The file in cell 819 field 11 is the file name of the file that holds the file names of the files to search each time by the RAS routine.
If you want to have a large number of files searched and don't want the RAS routine to search all files on each translation then it is best to use a different search file. If you use the keyword searchinfofile for a pattern then the file in field 9 is the file holding the file names that will be searched. This allows for different search criteria depending on what has been activated from the translation. You could have a search file full of, say, html file names that is used when the pattern for, say, 'search html for fish' is entered into the input.
DLF DownLoad File
this program is used to download files from the internet
DLF error codes
error 1, unable to read file: dlf_info.txt
error 2, no local filename
error 3, no online filename
error 4, problem downloading file
error 5, problem creating file