Natural languages are expensive to interpret and are full of special cases, etc...
What about going the other way around and making a simpler, English-based, regular language for NPC interaction?
We can even use this language for people to people interaction. Some kind of Esperanto for Planeshift, based on English words...
I can write a C++ class for understanding Esperanto grammar or any other regular language. This solves the "special cases and constructions" problem, and the constructions can be understood by the computer using a neural network approach or any other pattern matching algorithm.
Process:
STEP 1: Transform the input string into a stream of tokens.
STEP 2: Expand macros (like "OMG !" translated to "By the gods !").
STEP 3: Apply spell checking.
STEP 4: Translate each token into an integer index using a word database (the same database can be used to correct spelling). If a new token is found, add it to the database. The token list is translated into an array of unsigned integers, and a start-of-phrase and an end-of-phrase marker are added at the start and end of the array. Special characters like "?" are represented in this database as well.
STEP 5: Take each word pair (including the phrase start/end markers) and, using their indexes as a 2D array entry, build a word-to-word transition index. (You will understand this if you search Wikipedia for Markov chains.) This is the "phrase" database. See Table 1 for an example. If a new transition is found (which happens when a new word is found), add it to the database. (The tabular format is only a way to represent this database here. It should be easier to use a sparse matrix, omitting the entries that never occur.)
STEP 6: Feed the transition indexes into a pattern matching system.
STEP 7: If a pattern matches sufficiently well, give the appropriate answer.
STEP 8: Else, give an "I do not understand" answer.
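To make steps 1, 2 and 4 a bit more concrete, here is a minimal C++ sketch. Every name in it (Vocabulary, tokenize, expandMacros, encode) is made up for illustration and is not taken from any existing Planeshift code. Spell checking (step 3) is left out, tokens are simply upper-cased, and I assume that once the database is locked an unknown word maps to index 0.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical word database: maps each token to an unsigned integer index.
class Vocabulary {
public:
    Vocabulary() {
        idOf("<<");  // index 1: start-of-phrase marker
        idOf(">>");  // index 2: end-of-phrase marker
    }
    // Returns the index of a token. New tokens are added while the database
    // is unlocked; after lock() unknown tokens map to 0 (an assumption).
    std::uint32_t idOf(const std::string& token) {
        assert(!token.empty());
        auto it = ids_.find(token);
        if (it != ids_.end()) return it->second;
        if (locked_) return 0;
        const std::uint32_t id = static_cast<std::uint32_t>(ids_.size()) + 1;
        ids_[token] = id;
        return id;
    }
    void lock() { locked_ = true; }

private:
    std::map<std::string, std::uint32_t> ids_;
    bool locked_ = false;
};

// STEP 1: split on whitespace, upper-case the words, and keep every
// punctuation character (like "?") as a token of its own.
std::vector<std::string> tokenize(const std::string& input) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : input) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::toupper(c));
            continue;
        }
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
        if (!std::isspace(c)) tokens.push_back(std::string(1, static_cast<char>(c)));
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// STEP 2: replace each macro token by its expansion.
std::vector<std::string> expandMacros(
    const std::vector<std::string>& tokens,
    const std::map<std::string, std::vector<std::string>>& macros) {
    std::vector<std::string> out;
    for (const std::string& t : tokens) {
        auto it = macros.find(t);
        if (it == macros.end()) out.push_back(t);
        else out.insert(out.end(), it->second.begin(), it->second.end());
    }
    return out;
}

// STEP 4: translate tokens to indexes and wrap them in the phrase markers.
std::vector<std::uint32_t> encode(const std::vector<std::string>& tokens,
                                  Vocabulary& vocab) {
    std::vector<std::uint32_t> ids;
    ids.push_back(vocab.idOf("<<"));
    for (const std::string& t : tokens) ids.push_back(vocab.idOf(t));
    ids.push_back(vocab.idOf(">>"));
    return ids;
}
```

With a fresh Vocabulary, encode(tokenize("I am nice"), vocab) assigns the indexes in order of first appearance, so the markers get 1 and 2 and the words get 3, 4 and 5.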
The database should be locked (Not allowed to write to it) when the language system reaches some consistent state.
Each NPC should have its own pattern matching system, and the NPC should rule the context of the conversation. This works around the context problems that are IMPOSSIBLE for a computer to solve in the near future. [I can guarantee you, the processing power involved in context-free grammars is out of the scope of current technology, and the number of concurrent queries for each NPC x player interaction only worsens this.] You can even force the context by having the NPCs start the conversation...
Example Table 1
| State X State | Start-Phrase | I | AM | NICE | End-Phrase |
| Start-Phrase | Transition 0 | Transition 1 | Transition 2 | Transition 3 | Transition 4 |
| I | Transition 5 | Transition 6 | Transition 7 | Transition 8 | Transition 9 |
| AM | Transition 10 | Transition 11 | Transition 12 | Transition 13 | Transition 14 |
| NICE | Transition 15 | Transition 16 | Transition 17 | Transition 18 | Transition 19 |
| End-Phrase | Transition 20 | Transition 21 | Transition 22 | Transition 23 | Transition 24 |
*Note that some of these transitions can never happen (like start-phrase to start-phrase) and that some transitions are not meaningful (like start-phrase to end-phrase), so they can be discarded...
**Note too that this table is one of the elements of the grammar, saying which word-to-word transitions are allowed in our "Planeshift esperanto-like grammar".
***Note that the larger the grammar, the larger the database, growing as N^2 for N words, which is not good but is better than most statistical approaches.
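As a rough illustration of the sparse representation, here is a small C++ sketch that stores only the word pairs actually seen, so the N^2 table never has to be allocated. The class name TransitionTable and its methods are hypothetical. Transition indexes are handed out in order of discovery here, and I assume index 0 means "unknown transition" once the table is locked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sparse "phrase" database: only observed (from, to) word
// pairs get an entry, so absent transitions cost no memory.
class TransitionTable {
public:
    // Converts a phrase (word indexes, markers included) into transition
    // indexes. New pairs are added while unlocked; once locked they map to 0.
    std::vector<std::uint32_t> transitionsFor(
        const std::vector<std::uint32_t>& wordIds) {
        assert(wordIds.size() >= 2);  // at least the start and end markers
        std::vector<std::uint32_t> out;
        for (std::size_t i = 0; i + 1 < wordIds.size(); ++i) {
            out.push_back(idOf(std::make_pair(wordIds[i], wordIds[i + 1])));
        }
        return out;
    }
    void lock() { locked_ = true; }
    std::size_t size() const { return ids_.size(); }

private:
    typedef std::pair<std::uint32_t, std::uint32_t> Pair;

    std::uint32_t idOf(const Pair& p) {
        auto it = ids_.find(p);
        if (it != ids_.end()) return it->second;
        if (locked_) return 0;
        const std::uint32_t id = static_cast<std::uint32_t>(ids_.size()) + 1;
        ids_[p] = id;
        return id;
    }

    std::map<Pair, std::uint32_t> ids_;
    bool locked_ = false;
};
```

A std::map keyed on the word pair keeps lookups at O(log n) in the number of stored transitions. A hash map or a real database table would do the same job; the map is only the shortest thing to write down.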
The rest of the grammar is implemented in each NPC's pattern matching system...
There's a free C/C++ neural network library available under the LGPL (http://leenissen.dk/fann/) which can be used to build each NPC's pattern matching system. Each NPC will have to be trained in pattern matching (generating a .dat file with the desired state of the neural network) but will be very good at it after being trained... Each word-to-word transition pattern should trigger an NPC state for this player (or, if two players try to address the same NPC, we can have it say "Sorry, wait until I end my conversation with <1st player name here>.", which is more natural).
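The "one player at a time" rule could look roughly like this in C++. This is only a sketch: NpcConversation and its methods are invented names, and how the conversation actually ends (timeout, goodbye phrase, player walking away) is left open.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-NPC conversation state: the NPC talks to one player at
// a time and asks everybody else to wait.
class NpcConversation {
public:
    // Returns true if the NPC is free or is already talking to this player.
    bool begin(const std::string& player) {
        assert(!player.empty());
        if (partner_.empty()) partner_ = player;
        return partner_ == player;
    }
    // Reply for a player who was refused by begin().
    std::string busyReply() const {
        return "Sorry, wait until I end my conversation with " + partner_ + ".";
    }
    // Frees the NPC again; only the current partner can end the conversation.
    void end(const std::string& player) {
        if (partner_ == player) partner_.clear();
    }

private:
    std::string partner_;  // empty while the NPC is free
};
```

When begin() returns true, the phrase goes on to the NPC's pattern matcher; otherwise the player just gets busyReply().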
I think that with this system and a restricted language (not a truly natural one) we can have a very good NPC system.

*I don't really know the current state of the game's development, so I don't know if this is already being addressed or if it really suits the developers' taste, etc. I don't want to offend anyone here either...
*edit*
Example :
Phrase "I am nice ?"
/* Affirmative and interrogative forms are the same, only "?" denotes interrogation. The only language forms allowed are affirmative and interrogative. Affirmatives could be taken as interrogatives if there's no current context set by the NPC itself (a context is an array of pattern matching instances). In this case the pronoun "I" selects the context "Player talking about himself", which can be the 1st context of the grammar. The second context can be "Player talking about the NPC itself", the third "Player talking about another player", the fourth "Player talking about another NPC", the fifth "Player talking about a city", etc. */
Word database :
1 "<<" /* "<<" here denotes start marker */
2 ">>" /* ">>" here denotes end marker */
3 "I" /* 1st real word in the database */
4 "AM"
5 "NICE"
Transition database :
1, 1 to 2 /* can be omitted as it means nothing */
2, 1 to 3 /* a real phrase starting... */
3, 1 to 4 /* omitted, as this is a regular grammar */
4, 1 to 5 /* can be meaningful only if the next transition is 5 to 2 */
Transitions from 2 to anything can't really happen, so this whole row is not present in the transition table.
All transitions into 1 are omitted as they don't make sense.
Transitions to and from the same word are not meaningful, so they are omitted too.
5, 3 to 2 /* can be omitted or, if it happens just after transition 2, be a trigger for the NPC to make some jokes about the player, etc */
6, 3 to 4 /* player talking about himself... */
7, 3 to 5 /* can represent something in the grand scheme of the grammar */
8, 4 to 2 /* can be omitted */
9, 4 to 3 /* can be omitted */
10, 4 to 5 /* good construction */
11, 5 to 2 /* good construction */
12, 5 to 3 /* can represent something */
13, 5 to 4 /* inverse order should not be allowed in order to avoid special cases in the grammar, but... */
So we have 5 words and 13 phrase fragments. Now we do the black magic:
"I am nice" ->tokenize-> 1 3 4 5 2 ->transitions-> 2 6 10 11 ->using the current context "Player talking about himself" (an array of pattern matchers)-> triggers the answer for "iamnice" catchphrase...
Well, this is just a SMALL example and is not really shocking to see, but with really big databases we can see some improvement of NPC communication skills...
By the way, if applied in the reverse order, we can have the NPCs talking FROM the database...
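A minimal C++ sketch of that reverse direction, assuming the allowed transitions are available as a "word index -> possible next word indexes" map (the Successors type and generatePhrase are made-up names). It does a plain random walk from the start marker until it hits the end marker. Picking every successor with equal probability is my assumption, since the database described here stores no frequencies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <vector>

// Hypothetical view of the transition database: for each word index, the
// word indexes that are allowed to follow it.
typedef std::map<std::uint32_t, std::vector<std::uint32_t>> Successors;

// Random walk from the start marker (1) to the end marker (2). Returns the
// word indexes of the generated phrase, without the markers.
std::vector<std::uint32_t> generatePhrase(const Successors& allowed,
                                          std::mt19937& rng,
                                          std::size_t maxWords) {
    assert(maxWords > 0);
    const std::uint32_t START = 1, END = 2;
    std::vector<std::uint32_t> phrase;
    std::uint32_t current = START;
    while (phrase.size() < maxWords) {
        auto it = allowed.find(current);
        if (it == allowed.end() || it->second.empty()) break;  // dead end
        std::uniform_int_distribution<std::size_t> pick(0, it->second.size() - 1);
        current = it->second[pick(rng)];
        if (current == END) break;
        phrase.push_back(current);
    }
    return phrase;
}
```

The maxWords cap is there because nothing in the table prevents loops such as 3 to 5 followed by 5 to 3, so an uncapped walk might never reach the end marker.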
*edit*
I actually have working code for Markov chains (I used it to make an IRC bot talk and learn from the users in the channel), but it is written in Object Pascal and will take some time to translate to C++... The current code uses an array to hold everything, but it would be better to use some kind of database (like PostgreSQL or MySQL) and leave all the management/search work to the database... Even the neural network .dat files (the contexts) can be stored as BLOBs... and we can have a context table too, holding indexes to .dat patterns and the corresponding event triggers in the game's internal interpreted language that coordinates everything...