I am looking for some kind of intelligent library (I was thinking AI or a neural network) that I can feed a list of historical data, and that will predict the next sequence of outputs.
As an example, I would like to feed the library the following figures: 1,2,3,4,5
and based on this, it should predict that the next sequence is 6,7,8,9,10, etc.
The inputs will be a lot more complex and contain much more information.
This will be used in a C# application.
If you have any recommendations or warnings, that would be great.
Thanks
EDIT
What I am trying to do is use historical sales data to predict what amount a specific client is most likely to spend in the next period.
I do understand that there are dozens of external factors that can influence a client's purchases, but for now I need to base it merely on the sales history, and then plot a graph showing past sales and predicted sales.
If you're looking for a .NET API, then I would recommend you try AForge.NET: http://code.google.com/p/aforge/
If you just want to try various machine learning algorithms on a data set you have at your disposal, then I would recommend you play around with Weka; it's (relatively) easy to use, and it implements a lot of ML/AI algorithms. Do multiple runs with different settings for each algorithm, and try as many algorithms as you can. Most of them will have some predictive power, and if you combine the right ones, you might really get something useful.
If I understand your question correctly, you want to approximate and extrapolate an unknown function. In your example, you know the function values
f(0) = 1
f(1) = 2
f(2) = 3
f(3) = 4
f(4) = 5
A good approximation for these points would be f(x) = x+1, and that would yield f(5) = 6... as expected. The problem is, you can't solve this without knowledge about the function you want to extrapolate: Is it linear? Is it a polynomial? Is it smooth? Is it (approximately or exactly) cyclic? What is the range and domain of the function? The more you know about the function you want to extrapolate, the better your predictions will be.
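To make that concrete, here is a minimal C# sketch (my illustration, not from the original post) that fits a straight line f(x) = a*x + b to the sample points by ordinary least squares and extrapolates; for the data above it recovers a = 1, b = 1, i.e. f(x) = x + 1:

    using System;
    using System.Linq;

    class LinearFit
    {
        static void Main()
        {
            double[] xs = { 0, 1, 2, 3, 4 };
            double[] ys = { 1, 2, 3, 4, 5 };

            int n = xs.Length;
            double sumX = xs.Sum(), sumY = ys.Sum();
            double sumXY = xs.Zip(ys, (x, y) => x * y).Sum();
            double sumXX = xs.Sum(x => x * x);

            // Ordinary least squares for f(x) = a*x + b
            double a = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
            double b = (sumY - a * sumX) / n;

            for (int x = 5; x <= 9; x++)
                Console.WriteLine($"f({x}) = {a * x + b}");  // 6, 7, 8, 9, 10
        }
    }

Of course, this only works because we assumed the function is linear; with a different assumption you get a different extrapolation, which is exactly the point above.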
I just have a warning, sorry. =)
Mathematically, there is no reason for your sequence above to be followed by a "6". I can easily give you a simple function whose next value is any value you like. It's just that humans like simple rules, and therefore tend to see a connection in these sequences that in reality is not there. Therefore, this is an impossible task for a computer if you do not feed it additional information.
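To illustrate that claim (a hedged sketch of my own): Lagrange interpolation through the five given points plus an arbitrary sixth point (5, 42) yields a perfectly valid polynomial that "continues" 1,2,3,4,5 with 42:

    using System;

    class AnyNextValue
    {
        // Standard Lagrange interpolation through the points (xs[i], ys[i]).
        static double Lagrange(double[] xs, double[] ys, double x)
        {
            double sum = 0;
            for (int i = 0; i < xs.Length; i++)
            {
                double term = ys[i];
                for (int j = 0; j < xs.Length; j++)
                    if (j != i) term *= (x - xs[j]) / (xs[i] - xs[j]);
                sum += term;
            }
            return sum;
        }

        static void Main()
        {
            double[] xs = { 0, 1, 2, 3, 4, 5 };
            double[] ys = { 1, 2, 3, 4, 5, 42 };  // pick any value you like for f(5)
            for (int x = 0; x <= 5; x++)
                Console.WriteLine($"f({x}) = {Lagrange(xs, ys, x)}");  // 1, 2, 3, 4, 5, 42
        }
    }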
Edit:
In case you suspect your data to have a known functional dependence, with uncontrollable outside factors on top, regression analysis may give good results. To start easy, look at linear regression first.
If you cannot assume linear dependence, there is a nice application that looks for functions fitting your historical data... I'll update this post with its name as soon as I remember. =)
Related
I am implementing full-text search on a single entity, a document, which contains a name and content. The content can be quite big (20+ pages of text). I am wondering how to do it.
Currently I am looking at using Redis and RediSearch, but I am not sure if it can handle search in big chunks of text. We are talking about a multitenant application, with each customer having more than 1000 documents that are quite big.
TL;DR: What should I use to search in big chunks of text content?
This space is a bit unclear to me, sorry for the confusion. Will update the question when I have more clarity.
I can't tell you what the right answer is, but I can give you some ideas about how to decide.
Normally, if I had documents/content in a DB, I'd be inclined to search there - assuming that the search functionality I could implement was (a) functionally effective enough, (b) didn't require code that was super ugly, and (c) wasn't going to kill the database. There's usually a lot of messing around when trying to implement the search features and filters you want to provide to the user - UI components, logic components, and then translating all that into what the database & query language can actually do.
So, based on what you've said, the key trade-offs are probably:
Functionality / functional fit (creating the features you need, to work in a way that's useful).
Ease of development & maintenance.
Performance - purely on the basis that gathering search results across "documents" is not necessarily the fastest thing you can do with an IT system.
Have you tried doing a simple whiteboard "options analysis" exercise? If not, try this:
Get a small number of interested and smart people around a whiteboard. You can do this exercise alone, but bouncing ideas around with others is almost always better.
Agree what the high level options are. In your case you could start with two: one based on MSSQL, the other based on Redis.
Draw up a big table - each option has its own column (starting at column 2).
In column 1, list out all the important things that will drive your decision, e.g. functional fit, ease of development & maintenance, performance, cost, etc.
For each driver in column 1, do a score for each option.
How you do it is up to you: you could use a 1-5 point system (optionally, you could use a planning-poker-type approach to avoid anchoring), or you could write down a few key notes.
Be ready to note down any questions that come up, important assumptions, etc so they don't get lost.
Sometimes as you work through the exercise the answer becomes obvious. If it's really close, you can rely on the scores - but that's not ideal. More likely, some of the drivers listed will be more important than others, so don't ignore their significance.
Hi, I need some help getting started with creating my first algorithm; I want to create a NN/genetic algorithm for use as an intrusion detection system.
But I'm struggling with some points (I've never written an algorithm before):
I want to develop in C#; would it be possible as a console app? If so, as a precursor, how big would the programme roughly be in its most simplistic form? Is it even possible in C#?
How do I connect the program to read in data from the network? Also, how can packets be converted into readable data for the algorithm?
How do I get the programme to write rules for Snort or some other form of firewall and block what the programme deems a potential threat? (i.e. it spots a threat from point 2, then writes a rule into the Snort rules file blocking that specific traffic)
How do I track the data? (what it has blocked, what it is observing, and how it came to those conclusions)
Where should I place it on the network? (Can the programme connect to other algorithms and share data on the same network, and would that be beneficial?)
If anyone can start me off in the right direction, or explain what other alternatives there are (fuzzy logic, etc.) and why this approach is deemed a black box, that would be great.
Yes, a console app and C# can be used to create a neural network. Of course, if you want more visual aspects to the UI, you'll want to use WinForms/WPF/Silverlight, etc. It's impossible to tell how big the program will be, as there's not enough information on what you want to do. Also, the size shouldn't really be a problem as long as it's efficient.
I assume this is some sort of final-year project? What type of neural network are you using? You should read some academic papers/whitepapers on using NNs for intrusion detection to get an idea. For example, this PDF has some information that might help.
You should take this one step at a time. Creating a Neural Network is separate from creating a new rule in Snort. Work on one topic at a time otherwise you'll just get overwhelmed. Considering the hard part will most likely be the NN, you should focus on that first.
It's unlikely anyone's going to go through each step with you as it's quite a large project. Show what you've done and explain where you need help.
My core realization when I started learning about neural networks is that they are just function approximators. I think that's a crucial thing to keep in mind. Whether you're using genetic algorithms or neural nets (or combining them, as mentioned by @Ben Voigt, even though neural networks are typically associated with other training techniques) - what you get in the end is a function where you put in a number of real values and get out a single value.
Keeping this in mind, you can design your program and just think of the network as a black box providing predictions during testing. During training, think of another black box where you put in pairs of inputs and outputs and assume it's going to get better the more pairs you show it.
Maybe you find this trivial, but with all the theory and mystique associated with this type of algorithm, I found it reassuring (though a bit disappointing ;) ) to reduce them to those kinds of boxes.
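As a minimal C# sketch of that black-box view (the interface name and members are my own illustration, not from any particular library):

    // The concrete class behind this interface (neural net, GA-evolved
    // function, ...) is interchangeable as far as the host program cares.
    public interface IFunctionApproximator
    {
        // Show the box an input/output pair; it should improve over time.
        void Train(double[] inputs, double expectedOutput);

        // Ask the box for its current best guess.
        double Predict(double[] inputs);
    }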
I'm writing a bot that will analyse posts and reply with vaguely related strings from a database. I'm not aiming for coherence, just for vague similarity that could pass as someone ignorant of the topic (but knowledgeable enough to try to reply). What are some methods that would help me choose the right reply?
One thing I've come up with is to create a vocabulary list, check which elements of the list are in the post, and pick a reply from the database based on these results. This crude method has been successful about 10% of the time (based on 100 replies to random posts). I might expand the list with more words, but this method has its limits. Any better ones?
(P.S. The database is sizeable -- about 500,000 replies.)
First of all, I think the best you can hope for will be about a 50% answer rate, unless you're prepared to write a lot of code.
If you're willing to get your hands dirty with some statistics, check out term frequency–inverse document frequency. Basically, you will use the frequency of uncommon words to determine what keywords are critical to the document, and use this as the input into the tf-idf algorithm to pull out other replies with those same keywords.
You can then combine this further with whitelisting and blacklisting techniques to ignore common words and prioritize certain keywords. You can then keep tuning those lists to enhance the algorithm as you see it work.
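A hedged C# sketch of the idea (my own illustration; tokenization and the document-frequency table are assumed to exist already):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class TfIdfScoring
    {
        // tf-idf weight for every word of one tokenized post, given how
        // many replies (df) in a database of totalDocs contain each word.
        public static Dictionary<string, double> TfIdf(
            string[] doc, Dictionary<string, int> df, int totalDocs)
        {
            return doc.GroupBy(w => w).ToDictionary(
                g => g.Key,
                g => (double)g.Count() / doc.Length                 // term frequency
                     * Math.Log((double)totalDocs / (1 + df.GetValueOrDefault(g.Key))));
        }

        // Rank a candidate reply by the summed weight of the words it
        // shares with the incoming post; pick the highest-scoring reply.
        public static double Overlap(Dictionary<string, double> postWeights, string[] reply)
            => reply.Distinct().Sum(w => postWeights.GetValueOrDefault(w));
    }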
There are also simpler string metrics you can use to test basic similarity. Take a look at this list of string metrics.
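For example, Levenshtein edit distance is one of the simpler metrics on such lists; a straightforward dynamic-programming sketch:

    using System;

    static class StringMetrics
    {
        // Minimum number of single-character insertions, deletions and
        // substitutions needed to turn a into b; lower = more similar.
        public static int Levenshtein(string a, string b)
        {
            var d = new int[a.Length + 1, b.Length + 1];
            for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
            for (int j = 0; j <= b.Length; j++) d[0, j] = j;

            for (int i = 1; i <= a.Length; i++)
                for (int j = 1; j <= b.Length; j++)
                {
                    int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                    d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1,   // deletion
                                                d[i, j - 1] + 1),  // insertion
                                       d[i - 1, j - 1] + cost);    // substitution
                }
            return d[a.Length, b.Length];
        }
    }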
You might want to look into vector-space mapping and resemblance. The "vaguely related" problem could most likely be handled by statistical resemblance analysis.
Check out this novel use of resemblance:
http://www.cromwell-intl.com/security/attack-study/
There is a PHP function called similar_text() (e.g. similar_text($str1, $str2, $percent);, where $percent receives the similarity as a percentage). This works fairly well, but I didn't come up with anything similar in C#. If you could get hold of the source for the PHP function, you might try to translate it. I think there may be a Java version also.
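For what it's worth, here is a hedged C# translation of the algorithm similar_text() is documented to use (recursively counting the longest common substring); treat it as a sketch rather than a byte-for-byte port:

    using System;

    static class SimilarText
    {
        // Counts matching characters the way PHP's similar_text() does:
        // find the longest common substring, then recurse on the pieces
        // to its left and to its right.
        public static int Similar(string s1, string s2)
        {
            int pos1 = 0, pos2 = 0, max = 0;
            for (int i = 0; i < s1.Length; i++)
                for (int j = 0; j < s2.Length; j++)
                {
                    int k = 0;
                    while (i + k < s1.Length && j + k < s2.Length && s1[i + k] == s2[j + k])
                        k++;
                    if (k > max) { max = k; pos1 = i; pos2 = j; }
                }
            if (max == 0) return 0;
            return max
                + Similar(s1.Substring(0, pos1), s2.Substring(0, pos2))
                + Similar(s1.Substring(pos1 + max), s2.Substring(pos2 + max));
        }

        // Similarity as a percentage, like PHP's $percent out-parameter.
        public static double Percent(string s1, string s2)
            => (s1.Length + s2.Length) == 0
                ? 0
                : 200.0 * Similar(s1, s2) / (s1.Length + s2.Length);
    }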
I have a question. I want to write a chess-like program applying the following rules:
It should have just a king and a queen on one side, and the other side should have just a king.
The first side should mate the second side in the fewest moves possible.
I want to know your thoughts on how to approach this project. For example, which way of writing the code is easier (object-oriented or structured, ...)? (I have a little knowledge of object orientation.) And can you tell me about writing its algorithm - for example, where should I begin writing the code?
The good news here is that your problem is quite restricted in scope, as you only have three pieces to contend with. You're not really implementing a game so much here, as solving a logical puzzle. I'd approach it like this:
Figure out how to represent the three pieces in a simple way. You really don't need a UI here (other than for testing), since you're just trying to solve a puzzle. The simplest way is probably a simple Row,Column position for each of the three pieces.
If you haven't written an object-oriented program before, you'll probably want to stick with a procedural model and simply define variables for the data you'll need to represent. The problem scope is small, so you can get away with this. If you have some OOP experience, you can split up the problem appropriately, though you probably won't need any inheritance relationships.
Write the code for generating possible moves and for determining whether a given move makes any sense at all. A legal King move is any move that does not put the King in check. Most Queen moves should be permissible, but you probably also want to exclude moves that would allow the enemy King to take the Queen.
Now you need to determine a strategy for how to put together a sequence of moves that will solve the puzzle. If you need to find the true optimal solution (not merely a good solution), you may need to do a brute-force search. This may be feasible for this problem. You'll probably want to perform a depth-first search (if you don't know what this means, that's your first topic to research), as once you find a possible solution, that limits the depth at which all other solutions must be considered.
If you can get brute force functional and need to make things faster, consider if there are moves you can prove will have no benefit. If so, you can exclude these moves immediately from your search, saving on the number of branches you need to consider. You can also work to optimize your evaluation functions, as a faster evaluation is very beneficial when you are doing billions of them. Finally, you might come up with some heuristics to evaluate which of the branches to try first. The faster you can converge to a 'good' solution, the less cases you need to consider to find the optimal solution.
One side note I realized: the problem is very different if you assume that the enemy King is trying to avoid checkmate. Simple depth-first pruning only works if you are allowed to move the enemy King in whatever way best lets it be checkmated. If the enemy King is attempting to avoid checkmate, that complicates the problem, as you have conflicting optimization goals (you want mate to occur in as few moves as possible, yet the enemy King wants to postpone it as long as possible). You might be limited to characterizing a range of possibilities (say, 3 moves best case if the King is perfectly cooperative, 8 moves in the best worst case if the King is perfectly evasive). A sketch of the search skeleton follows below.
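A hedged sketch of that search skeleton, handling the evasive-King case by letting Black maximize while White minimizes (for the cooperative-King variant you would let Black minimize as well). LegalMoves and InCheck are assumed helpers, stubbed here for shape only:

    using System;
    using System.Collections.Generic;

    // Square = 0..63; the whole board is implicit in three piece positions.
    record Position(int WhiteKing, int WhiteQueen, int BlackKing, bool WhiteToMove);

    static class MateSearch
    {
        // Plies (half-moves) to forced mate, or null if White cannot
        // force mate within depthLimit plies from this position.
        public static int? PliesToMate(Position p, int depthLimit)
        {
            var moves = LegalMoves(p);
            if (moves.Count == 0)
                return (!p.WhiteToMove && InCheck(p)) ? (int?)0  // checkmate
                                                      : null;    // stalemate
            if (depthLimit == 0) return null;

            int? best = null;
            foreach (var next in moves)
            {
                var sub = PliesToMate(next, depthLimit - 1);
                if (p.WhiteToMove)
                {
                    // White minimizes the distance to mate.
                    if (sub is int d && (best is null || d + 1 < best))
                        best = d + 1;
                }
                else
                {
                    // Black maximizes; one escaping reply refutes the line.
                    if (sub is null) return null;
                    if (best is null || sub + 1 > best) best = sub + 1;
                }
            }
            return best;
        }

        // Assumed helpers -- a real implementation must generate legal
        // moves for the side to move and detect check.
        static List<Position> LegalMoves(Position p) => throw new NotImplementedException();
        static bool InCheck(Position p) => throw new NotImplementedException();
    }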
Take a look at this SO question (Programming a chess AI).
From the answers to that question, I think this C# Chess Game Starter Kit would be a good start, but I would also look at the other articles referenced as well for some interesting history/information.
This is the simplest possible example of an endgame database. There are fewer than 64^3 = 262144 positions, so you can easily store the score of each position. In this case, we can define the score as the number of moves to checkmate for a winning position, or 255 for a drawn position. Here is an outline:
Set all scores to 255.
Look for all checkmate positions, and score them as 0.
Set depth = 1.
For each drawn position (score=255), see if a move exists into a won position (more precisely, see if a move exists into a position from which all of the opponent's moves are losing.) If so, set its score to depth.
If no new position was found in step 4, you're done.
Increment depth, and go to Step 4.
Now you have a 250k table that you can save to disk (not that it should take many seconds to generate it from scratch). If space is important, you can reduce this significantly with various tricks. Wikipedia has a nice article on all this - search for "Endgame tablebase".
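A hedged C# sketch of that outline (my own illustration: Encode, AllPositions, LegalMoves, IsCheckmate and AllRepliesLose are assumed helpers, and the bare 64^3 index glosses over details like side to move):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    record Position(int WhiteKing, int WhiteQueen, int BlackKing);

    static class Tablebase
    {
        const byte Draw = 255;

        public static byte[] Build()
        {
            var score = new byte[64 * 64 * 64];
            for (int i = 0; i < score.Length; i++) score[i] = Draw;  // step 1

            foreach (var pos in AllPositions())                      // step 2
                if (IsCheckmate(pos)) score[Encode(pos)] = 0;

            for (byte depth = 1; ; depth++)                          // steps 3 and 6
            {
                bool found = false;
                foreach (var pos in AllPositions())                  // step 4
                {
                    if (score[Encode(pos)] != Draw) continue;
                    // Won at this depth if some move reaches a position from
                    // which every opponent reply is already scored as lost.
                    if (LegalMoves(pos).Any(next => AllRepliesLose(next, score)))
                    {
                        score[Encode(pos)] = depth;
                        found = true;
                    }
                }
                if (!found) return score;                            // step 5
            }
        }

        // Assumed helpers; stubs shown for shape only.
        static IEnumerable<Position> AllPositions() => throw new NotImplementedException();
        static IEnumerable<Position> LegalMoves(Position p) => throw new NotImplementedException();
        static bool IsCheckmate(Position p) => throw new NotImplementedException();
        static bool AllRepliesLose(Position p, byte[] score) => throw new NotImplementedException();
        static int Encode(Position p) => throw new NotImplementedException();
    }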
A poster here suggested that Stockfish would be a good start, but it is a C++ project, whereas you are asking for C#.
The solution depends on your requirements. If you are interested in just making it work, you could complete the project without writing more than 200 lines of code: embed an open-source C# engine and ask it to report the number of moves to mate. If the engine supports UCI, the following command will do the job:
go mate x
where x is the number of moves to mate.
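A hedged sketch of driving a UCI engine from C# over stdin/stdout (the engine path is a placeholder, and the FEN is an example position with a mate in 1, Qa8#):

    using System;
    using System.Diagnostics;

    class UciMate
    {
        static void Main()
        {
            var engine = new Process
            {
                StartInfo = new ProcessStartInfo
                {
                    FileName = "engine.exe",   // placeholder: any UCI engine
                    RedirectStandardInput = true,
                    RedirectStandardOutput = true,
                    UseShellExecute = false,
                }
            };
            engine.Start();

            engine.StandardInput.WriteLine("uci");
            engine.StandardInput.WriteLine("position fen 7k/8/6K1/8/8/8/8/Q7 w - - 0 1");
            engine.StandardInput.WriteLine("go mate 1");   // ask for a mate in 1

            string line;
            while ((line = engine.StandardOutput.ReadLine()) != null)
            {
                Console.WriteLine(line);                   // "info ... score mate N ..."
                if (line.StartsWith("bestmove")) break;    // engine finished searching
            }
        }
    }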
However, if you need to do the thinking yourself, you will need to choose between an efficient bitboard representation and an object-oriented one. A bitboard is a much better representation - it is very fast, but harder to program - and most serious chess engines use bitboards. In your project, representation efficiency is not much of a concern, so you could choose the OO representation.
I found this very cool C++ sample, literally the "Hello World!" of genetic algorithms.
So I decided to re-code the whole thing in C#, and this is the result.
Now I am asking myself: is there any practical application along the lines of generating a target string starting from a population of random strings?
EDIT: my buddy on twitter just tweeted that "is useful for transcription type things such as translation. Does not have to be Monkey's". I wish I had a clue.
Is there any practical application along the lines of generating a target string starting from a population of random strings?
Sure. Imagine any scenario in which you know how to evaluate the fitness of a particular string, and in which the choices are discrete and constrained in some way:
Picking pronounceable names ("Xhjkxc" has low fitness; "Artekzo" has high fitness)
Trying out a series of chess moves
Guessing the combination to a safe, assuming you can tell how close you are to unlocking each tumbler
Picking phone numbers that evaluate to words (e.g. "843-2378" has high fitness because it spells "THE-BEST")
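For instance, a hedged sketch of a fitness function for the first example (the consonant/vowel alternation rule is an invented proxy for pronounceability, just to show the shape of the function a GA would maximize):

    using System;

    static class Fitness
    {
        // Crude pronounceability score in [0, 1]: the fraction of adjacent
        // character pairs that alternate between consonant and vowel.
        public static double Pronounceability(string name)
        {
            if (name.Length < 2) return 0;
            const string vowels = "aeiou";
            int alternations = 0;
            for (int i = 1; i < name.Length; i++)
                if (vowels.Contains(char.ToLower(name[i])) !=
                    vowels.Contains(char.ToLower(name[i - 1])))
                    alternations++;
            return (double)alternations / (name.Length - 1);
        }
    }

    // Fitness.Pronounceability("Artekzo") ~ 0.67  >  Pronounceability("Xhjkxc") = 0.0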
No. Each time you run the GA, you are giving it the eventual answer. This is great for showing how a GA works and how powerful it can be, but it does not have any purpose beyond that.
You could write an EA that writes code in a dynamic language like IronPython with the goal of creating code that a) executes without crashing and b) analyzes the stock market and intelligently buys and sells stock.
That's a very simplistic take on what would be necessary, but it's possible. You would need a host that provides a lot of methods for the IronPython code (technical indicators, etc) and a database of ticks.
It would also be smart not to just generate any old random code, lest you format your own hard drive. You need a sandbox, you need to limit the namespaces that are accessible, and you would need to impose a time limit to avoid infinite loops. You could also provide semantic guidelines that allow it to choose appropriate approved keywords instead of just stringing random letters together - this would greatly speed up evolution.
So, I was involved with a project that did everything but the EA. We had a satellite dish that got real-time stock ticks from the NASDAQ, a service for trading that had an API, and a primitive decision making "brain" that made decisions as the ticks came in.
Sadly, one of the partners flipped out, quit his job, forked the project (got his own dish, etc), and started trading with logic that wasn't ready. He lost a bunch of money. It turns out that for some people this type of project is only a step away from common gambling. But anyway, the project kind of fizzled out after that. Evolving the logic part is the missing link though. And I know there are people out there doing this type of thing.
I have used GAs in two real-life research problems.
One was a power optimization problem (maximize the number of appliances turned on while meeting the available-power constraint and the service guarantee for each appliance).
Another was radio network optimization: maximizing the coverage area given a fixed equipment budget.
GAs have one main disadvantage: they usually work at "genetic speed", i.e. slowly, so using them in serious time-dependent projects is quite risky.