I am trying to find information (and hopefully c# source code) about trying to create a basic AI tool that can understand english words, grammar and context.
The Idea is to train the AI by using as many written documents as possible and then based on these documents, for the AI to create its own creative writitng in proper english that makes sense to a human.
While the idea is simple, I do realise that the hurdles are huge, any starting points or good resoueces will be appriacted.
A basic AI tool that you can use to do something like this is a Markov Chain. It's actually not too tricky to write!
See: http://pscode.com/vb/scripts/ShowCode.asp?txtCodeId=2031&lngWId=10
If that's not enough, you might be able to store WordNet synsets in your Markov chain instead of just words. This gives you some sense of the meaning of the words.
To be able to recompose a document you are going to have to have away to filter through the bad results.
Which means:
You are going to have to write a program that can evaluate if the output is valid (grammatically and syntactically is the best you can do reliablily) (This would would NLP)
You would need lots of training data and test data
You would need to watch out for overtraining (take a look at ROC curves)
Instead of writing a tool you could:
Manually score the output (will take a long time to properly train the algorigthm)
With this using the Amazon Mechanical Turk might be a good idea
The irony of this: The computer would have a difficult time "Creatively" composing something new. All of its worth will be based on its previous experiences [training data]
Some good references and reading at this Natural Language article.
As others said, Markov chain seems to be most suitable for such a task. Nice description of implementing Markov chain can be found in Kernighan & Pike, The Practice of Programming, section 3.1. Nice description of text-generating is also present in Programming Pearls.
One thing, though not quite what you need, would be a Markov chain of words. Here's a link I found by a quick search: http://blog.figmentengine.com/2008/10/markov-chain-code.html, but you can find much more information by searching for it.
Take a look at http://www.nltk.org/ (Natural Language Toolkit), lots of powerful tools there. They use Python (not C#) but Python is easy enough to pick up. Much easier to pick up than the breadth and depth of natural language processing, at least.
I agree, that you will have troubles in creating something creative. You could possibly also use a keyword spinner on certain words. You might also want to implement a stop word filter to remove anything colloquial.
Related
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
One of the first things that got me into wanting to program was to create a multiplayer text game. I was scared away from the notion though when I realized, at least at the time for me, how complicated writing a smart parser would be.
So now I'm back to thinking about it, and I've tried to do lots of research on the subject. It turns out that it seems to be a lot more involved than I think, and I've stumbled across terms such as lexing and tokenizing and parsing, only the latter of which I knew of before. I figured the field of lexical analysis is what I wanted to look for.
So instead of trying to create my own lexer and parser which I've read to be very difficult and error prone, and most people instruct to steer away from, I thought I would find a good lexer and parser generator to use which will supposedly do all the heavy lifting for me and I can just focus on the grammar that I want. I've also heard lots of people say that individuals looking to do such a thing should simply use Inform.
Sure, I guess Inform is cool, but C# is my language of choice and I like the freedom that it allows me over what I perceive Inform to offer. I'm more interested in creating all the components and framework of the multiplayer text game than I am in any one particular final result, and so I like the idea of using a standard programming language best.
I've been trying to find a good lexer/parser generator for C# for a while now, not really satisfied with all that seems to be offered in terms of the comments people give.
antlr for C# seems to be underdeveloped and mostly an afterthought.
I've tried understanding GPLEX and GPPG but as of right now they are far too confusing for me, despite reading a lot of the documentation and trying to read up a lot on lexing in general.
I have a lot of the concepts in my head about the whole process of lexing, but when faced with a lexer and a parser, I guess I don't really know how these are supposed to be meshed into my actual code.
I want to build a simple English like grammar with noun phrases and verb phrases, and be able to have lists of nouns and verbs that can be dynamically added and ready from a database as the game is developed and expanded upon.
I guess I'm feeling like I'm turning up short on the results of the research I've been trying to do with this subject.
To be honest, the idea of creating my own bastardized notion of a lexer and parser based on what I've researched seems far more appealing at this point than using any lexer/parser generator that I've read about.
First, some meta-advice. This is maybe not a great question for StackOverflow since it is not about a specific technical problem. You might consider asking questions like this on Programmers.StackExchange.com instead.
To address your question: I agree with tallseth; it depends on what skill you want to improve here. People occasionally ask me why I don't build a boat, since I am fairly handy and like sailing. Because you should only build a boat if you like having an unfinished boat in your garage for two years. If you want to sail, sail. If you want to build a boat, build a boat. But don't build a boat because you want to sail. If you want to write a text adventure, learn Inform7; it is the most incredibly awesome tool for making text adventures the world has ever seen. If you want to learn how to make a lexer and parser, then don't mess around with parser generators. Make a lexer and parser.
Parser generators are great if you are prototyping up a new programming language and you want to get going quickly. But programming language grammars can have really nice properties that make them amenable to automatic parser generation in a way that natural languages do not. You're likely to spend as much time fighting with the parser generator as you spend building the grammar if you go with a parser generator approach.
Since you're a beginner at this, I suggest that you recapitulate the history of text adventures. Start by writing a lexer and parser that can parse two-word commands: "go north", "get sword", "drop brick", and so on. That will give you enough flexibility that you can then start solving other problems in text adventure design: how to represent objects and locations. Start small; if you can make a two-room game where the player can pick up and put down objects, you are well on your way.
In short, I strongly encourage you to follow your instinct; writing your own lexer and parser is super fun and actually not that hard if you start small and work your way up.
It depends on what you want to get out of the project.
If this is a learning project to do on your spare time, rolling your own tokenizer and interpreter is a good solution. While it is hard to do this perfectly, it is fairly easy to make a sufficient parsing layer. I'd recommend making the most minimal parsing layer you can to start with, and incrementally improving it as you add richness to the game. This is a good type of problem to practice TDD and design patterns with, if that interests you. Using those practices will make it easier to incrementally make improvements.
On the other hand, if you will not get a lot of value from that exercise (you're on a tight timeline, this is for real work, you just don't care about the parsing layer, etc), then I suggest making do with an existing parser. If you encapsulate a third party parser behind an abstraction, you can change your mind later about writing your own, or about what parser to use.
You probably already saw it, but this SO question looks to have a nice list of parsers for c#.
I'm sorry to ask such a localized question but until I get confirmation I don't feel confident moving on with my project.
I have read lots about the template pattern, Wikipedia has a good example.
It shows that you create the basic virtual methods and then inherit the base class and override where you want. The example on the site is for Monopoly and Chess which both inherit the base class.
So, my question is, if you had an application which was only going to be Chess and never anything else, would there be any benefit in using the template pattern (other than as an education exercise)?
No, I think that falls under the category of "You Ain't Gonna Need It."
To be more specific, design patterns exist to solve a particular problem, and if your code doesn't need to solve that problem, all they do is add lines of code without having any benefit.
No. Expressed in a very simplified and superficial way, the template pattern is just worthwhile starting at a certain relationship between total code size and templated code size. In your example, the chess game is going to be the entire program, so there'll be no need to use the template pattern here.
The template pattern is used in specific situations. It is used when you want to sketch out an algorithm but let the specific steps differ.
This could be useful in a Chess application. However, you should not start developing an application with the idea 'I'm going to use this pattern and that one and..'. Instead, you develop the code and you discover that you need certain patterns.
This is where a Test Driven Development approach is really handy. It allows you to refactor your code each step of the way.
A nice book that explains this is Refactoring To Patterns.
I would suggest writing your chess game and then if in the future coming back and changing things to fit monopoly too. But its something totally different if you want to use the pattern to learn the pattern, in that case its good to start simple so the complex is easier to understand.
It really depends on the parts of the program. The whole idea of Template is to have an algorithm that never changes and to be able to add or edit certain steps of that algorithm.
It may well be that you never change, however, this is the issue with design principles, it IS good practice and you may later wish you'd implemented them. I would say though that if you are 100% sure then you can leave it out as it usually saves time and lines of code. Depends if you want to learn Template usage or not.
Also the GOF principles website is quite good:
I'm writing a bot that will analyse posts and reply with a vaguely related strings from a database. I'm not aiming for coherence, just for vague similarity that could pass as someone ignorant to the topic (but knowledgeable enough to try to reply). What are some methods that would help me to choose the right reply?
One thing I've come up with is to create a vocabulary list, check which elements of the list are in the post, and get a reply from the database based on these results. This crude method has been successful about 10% of the time (based on 100 replies to random posts). I might expand the list by more words, but this method has its limit. Any better ones?
(P. S. The database is sizeable -- about 500 000 replies)
First of all, I think the best you can hope for will be about a 50% answer rate, unless you're prepared to write a lot of code.
If you're willing to get your hands dirty with some statistics, check out term frequency–inverse document frequency. Basically, you will use the frequency of uncommon words to determine what keywords are critical to the document, and use this as the input into the tf-idf algorithm to pull out other replies with those same keywords.
You can then combine this further with whitelisting and blacklisting techniques to ignore common words and prioritize certain keywords. You can then keep tuning those lists to enhance the algorithm as you see it work.
There are also simpler string metrics you can use to test basic similarity. Take a look at this list of string metrics.
You might want to look into vector-space mapping and resemblance. The "vaguely related" problem could be handled by resemblance statistical analysis most likely.
Check out this novel use of resemblance:
http://www.cromwell-intl.com/security/attack-study/
There is a PHP function called "similar_text()", (e.g.:
$percent_similar = similar_text($str1, $str2);) This works fairly well but I didn't come up with anything similar in C#. If you could get hold of the source for the PHP function you might try to translate it. I think there may be a Java version also.
i feel like i'm going back to the stone age.
how do i relearn to develop without intellisense (pydev intellisense doesn't count).. in general, how does one successfully leave the comfort of visual studio ?
I recently learned python with a strong C# background.
My advise: Just do it. Sorry, couldn't resist but I'm also serious. Install python and read: Python.org documentation (v2.6). A book might help too -- I like the Python PhraseBook. From there, I started using python to implement solutions for various things. Most notably, ProjectEuler.net questions.
It forced me to consider the languages and built in data structures.
Python is truly easy to use and intuitive. To learn the basics, took me about an hour. To get pretty good with it, took around 5 hours. Of course, there is always more to learn.
Also, I want to note that I would discourage using IronPython or Jython because I feel learning core, regular python is the first step.
Python has rich "introspection" features. In particular, you can find out a lot about built-in features using a command called help() from the Python command line.
Suppose you want to use regular expressions, and want to find out how to use them.
>>> import re
>>> help(re)
You get a nice display of information, automatically shown to you a page at a time (hit the space bar to see the next page).
If you already know you want to use the sub() function from the re module, you can get help on just that:
>>> help(re.sub)
And this help() feature will even work on your own code, as long as you define Python docstrings for your modules, classes, and functions.
You can enable features in the vim editor (or gvim, or vim for Windows) that enable an "IntelliSense"-like autocompletion feature, and you can use Exuberant Ctags to generate hyperlink "tags" to let you navigate quickly through your code. These turn vim into something that is roughly as powerful as an IDE, with the full power of vim for editing. (There isn't an explicit refactoring tool built in to vim, but there are options available.
And as others have noted, you can get IDEs for Python too. I've used the Wingware IDE, and I recommend it. I try to do most of my work with free, open-source software, but that is one piece of proprietary software I would be willing to buy. I have also used Eclipse with the Pydev plugin (I used its refactoring tool and it worked fine).
P.S. Python has a richer feature set than C#, albeit at the cost that your code won't run as fast. Once you get used to Python, you won't feel like you are in the stone age anymore.
One step at a time?
Start off with simple programs (things you can write with your eyes closed in C#), and keep going... You will end up knowing the API by heart.
<rant>
This is sort of the reason that I think being a good visual studio user makes you a bad developer. Instead of learning an API, you look in the object browser until you find something that sounds more or less like what you are looking for, instantiate it, then hit . and start looking for what sounds like the right thing to use. While this is very quick, it also means it takes forever to learn an API in any depth, which often means you end up either re-inventing the wheel (because the wheel is buried under a mountain worth of household appliances and you had no idea it was there), or just doing things in a sub-optimal way. Just because you can find A solution quickly, doesn't mean it is THE BEST solution.
</rant>
In the case of .NET, which ships with about a billion APIs for everything under the sun, this is actually preferred. You need to sift through a lot of noise to find what you are aiming for.
Pythonic code favors one, obvious way to do any given thing. This tends to make the APIs very straight forward and readable. The trick is to learn the language and learn the APIs. It is not terribly hard to do, especially in the case of python, and while not being able to rely on intellisense may be jarring at first, it shouldn't take more then a week or two before you get used to it not being there.
Learn the language and the basic standard libraries with a good book. When it comes to a new library API, I find the REPL and some exploratory coding can get me to a pretty good understanding of what is going on fairly quickly.
You could always start with IronPython and continue to develop in Visual Studio.
The python ide from wingware is pretty nice. Thats what I ended up using going from visual studio C++/.net to doing python.
Don't worry about intellisense. Python is really simple and there really isn't that much to know, so after a few projects you'll be conceiving of python code while driving, eating, etc., without really even trying. Plus, python's built in text editor (IDLE) has a wimpy version of intellisense that has always been more than sufficient for my purposes. If you ever go blank and want to know what methods/properties and object has, just type this in an interactive session:
dir(myobject)
Same way you do anything else that doesn't have IntelliStuff.
I'm betting you wrote that question without the aid of an IntelliEnglish computer program that showed you a list of verbs you could use and automatically added punctuation at the end of your sentences, for example. :-)
I use Eclipse and PyDev, most of the time, and the limited auto-completion help that it provides is pretty useful.
It's not ever going to come up to the level of VS's IntelliSense, and it can't, because of the dynamic nature of Python. But there are compensations, big ones.
The biggest is the breaking of the code-compile-test cycle. It's so easy to write and test prototype code in IDLE that very often it's where I go first: I'll sketch out and test a couple of methods that have to interoperate, figure out that there's something I don't know, learn it, fix my test, and then port the whole thing over to PyDev and watch it work the first time.
Another is that it's a lot simpler. It's really important to know what the standard modules are, and what they do, but for the most part that can be picked up with a little reading. I only use a small handful of modules in my everyday programming - itertools, os, csv (yeah, well), datetime, StringIO - and everything else is there if I need it, but usually I don't.
The stuff that it's really important to know is stuff that IntelliSense couldn't help you with anyway. Auto-completion isn't going to make
self.__dict__.update(kwargs)
make a damn bit of sense; you have to learn what an amazing line of code that is, and why you'd write it, by yourself.
Then you'll think, "how would I even begin to implement something like that in C#?" and realize that the tools these stone-age people are using are a little more sophisticated than you think.
A mistake people coming from C# or Java make a lot is recreate a large class hierarchy like is common in those languages. With Python this is almost never useful. Instead, learn about duck typing. And don't worry too much about the lack of static typing: in practice, it's almost never an issue.
I'm not too familiar with Python so I'll give some general advice. Make sure to print out some reference pages of common functions. Keep them by you(or pinned to a wall or desk) at all times when your programming.. I'm traditionally a "stone age" developer and after developing in VS for a few months I'm finding my hobby programming is difficult without intellisense.. but keep at it to memorize the most common functions and eventually it'll just be natural
You'll also have to get use to a bit more "difficult" of debugging(not sure how that works with python)
While your learning though, you should enjoy the goodness(at least compared to C#) of Python even if you don't have intellisense.. hopefully this will motivate you!
I've only ever used IDLE for python development and it is not fun in the least. I would recommend getting your favorite text editor, mine is notepad++, and praying for a decent plugin for it. I only ever had to go from Eclipse to Visual Studio so my opinions are generally invalid in these contexts.
Pyscripter does a reasonable job at auto-completion/intellisense. I've been using it recently as I started to learn Python/Django, where I've been mainly a C# developer for the last few years.
I suggest going cold turkey - languages like Python shine with great text editors. Choose one you want to become amazing at (vim, emacs, etc.) and never look back.
I use Komodo Edit and it does reasonably good guessing at the autocompletion.
Others have suggested several editors that have intellisense-like capabilities. Try them out.
Also install ipython and use that to learn the language interactively. It is like a souped up version of the regular python interactive shell with lots and lots of added capabilities, and one of the most useful it its extensive context sensitive tab-completion and help.
For example if you type
import r<tab>
it will show all the modules you can import starting with r
import re
re.<tab>
will show all the objects in the re module
re.compile?
will show the docstring and other information about the re.compile function, automatically piping it through a pager if it is longer than a screenful.
re.compile??
will show the source code as well, if available.
Using this I find it is much faster to switch to ipython and query objects directly than it is to look up anything in the docs. You have access to the usual python help() system as well.
ipython has lots and lots of other features - far too many to cover in a short post.
Today I found out about an interface I'd never heard of before: IGrouping
IEnumerable<IGrouping<YourCategory, YourDataItem>>
I am fortunate to have access to some of the best programming books available, but seldom do I come across these kinds of gems in those books. Blogs and podcasts work, but that approach is somewhat scattershot. Is there a better way to learn these things, or do I need to sift through the entire MSDN library to discover them?
Eric Lippert's blog. The real guts of C# - why there are some limitations which might seem arbitrary at first sight, how design decisions are made, etc.
Alternatively, for more variety, look at the Visual C# Developer Center - there's a whole range of blogs and articles there.
Oh, and read the C# spec. No, I mean it - some bits can be hard to wade through (I'm looking at you, generic type inference!) but there's some very interesting stuff in there.
The best place to start is Jon Skeet's C# Coding blog: http://msmvps.com/blogs/jon_skeet/
He regularly covers stuff you won't see anywhere else.
How about the Hidden Features series of questions?
Hidden Features of C#
Hidden Features of ASP.NET
And many more...
I personally like the way of discovering hidden features on my own while solving a specific problem. In the end, a hidden feature that you never needed to get something done is of questionable value. It just adds clutter to the brain.
The way to do it is to use the MSDN library to look things up. Then take a little time to look around what you found.
That's especially important with the pure API documentation. For instance, I just browsed to http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx (note how that URL is formed). When I look in the Contents pane on the left, I see everything from XmlDocument (and XmlDocumentFragment) all the way down to XmlReader. In the middle are some things I rarely or never use, like XmlNamespaceScope and XmlNodeOrder.
From time to time, spend a little time on "abstract knowledge". Sometimes, it's good to look up from the trees to learn your way around the forest. You never know when you'll need something you've learned to get you out of the woods.
For the people who don't know IGrouping:
http://msdn.microsoft.com/en-us/library/bb344977.aspx
I often read useful stuff on the Viual Studio Startup page and start clicking around to other keywords/areas. Not too promote StackOverflow too much, but you'll find some hidden gems here as well, simply by looking at how other people write code.
For example:
Hidden Features of C#?