I am an apprentice and I just finished my first .NET web application, whose main premise is to parse, group, and visualize logs from ELmah.io (Error Logging Modules and Handlers). I had a look into ML.NET Model Builder and multi-class classification, which would allow me to add a single class to a log (for example, priority) and train a model on that straight from SQL Server tables. What I'm more interested in is multi-label classification, which ML.NET does not support at the moment, so I was looking into Accord.NET. I have to admit it's a bit confusing and very hard to find any tutorials - I was able to run a simple binary classification example from their website, but that's not what I'm looking for.
Any guidance on where to start would be appreciated (maybe another alternative to ML.NET?).
From what I've read, my table structure would look something like this:
An example of my front end:
Thanks,
Jakub
As of today, ML.NET does not support multi-label classification.
What you can do instead is train a model on each label separately and then combine the results.
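As a rough illustration of that idea (a minimal sketch, not an official ML.NET feature), you could train one binary classifier per label. The LogEntry class, the column names, and the IsUrgent/IsDatabase labels below are hypothetical placeholders for whatever your SQL Server table actually contains:

using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical input row: one text column plus one boolean column per label.
public class LogEntry
{
    public string Message { get; set; }
    public bool IsUrgent { get; set; }      // label 1
    public bool IsDatabase { get; set; }    // label 2 (train a separate model for this one)
}

public static class BinaryRelevanceDemo
{
    // Trains a single binary model for one label column; call it once per label
    // and combine the per-label predictions afterwards.
    public static ITransformer TrainForLabel(MLContext ml, IEnumerable<LogEntry> rows, string labelColumn)
    {
        IDataView data = ml.Data.LoadFromEnumerable(rows);
        var pipeline = ml.Transforms.Text.FeaturizeText("Features", nameof(LogEntry.Message))
            .Append(ml.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: labelColumn, featureColumnName: "Features"));
        return pipeline.Fit(data);
    }
}

At prediction time you would run the same input through each per-label model (for example via a PredictionEngine per model) and report every label whose classifier answers true.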
I am going to analyze the data using NEAT (the Encog C# version). I would then like to confirm the network structure and the network weights once the learning has converged. Although I have been reading the Encog documentation, I cannot find relevant articles or sample code. Would it be possible to tell me how to do this (sample code or another way)?
I would suggest you check out the Pluralsight courses by Abishek Kumar. You can watch the courses by signing up for a free trial.
To "confirm the network weights", I would suggest you start with Kumar's "Introduction to Machine Learning with ENCOG 3". Start with his examples of "Resilient Propagation" or "Quick Propagation" training algorithms, as these are the easiest to setup (its tricky to set the parameters for many of the other training algorithms). These algorithms will setup the weights of your network.
To "confirm the network structure", I would recommend Kumar's "Advanced Machine Learning with ENCOG" course which has two sections on network tuning. I would suggest you only use only one hidden layer and use the "Pruning" techniques described in his tutorial (or downloadable code) to choose the number of neurons in the hidden layer.
To decide when the "learning has converged", I would recommend you follow his examples (or download his example code) and use StopTrainingStrategy (so you automatically stop training when the global error starts increasing) and EarlyStoppingStrategy (so you automatically stop training when the cross-validation error starts increasing).
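For reference, here is a minimal Encog C# sketch in the spirit of the standard XOR example (not taken from the courses). It trains with Resilient Propagation until the error falls below a threshold and then prints the learned weights; the DumpWeights/GetWeight calls at the end reflect the BasicNetwork members I am aware of, so double-check them against your Encog version:

using System;
using Encog.Engine.Network.Activation;
using Encog.ML.Data;
using Encog.ML.Data.Basic;
using Encog.Neural.Networks;
using Encog.Neural.Networks.Layers;
using Encog.Neural.Networks.Training.Propagation.Resilient;

public static class XorWeightsDemo
{
    public static void Main()
    {
        double[][] input = { new[] { 0.0, 0.0 }, new[] { 0.0, 1.0 }, new[] { 1.0, 0.0 }, new[] { 1.0, 1.0 } };
        double[][] ideal = { new[] { 0.0 }, new[] { 1.0 }, new[] { 1.0 }, new[] { 0.0 } };
        IMLDataSet trainingSet = new BasicMLDataSet(input, ideal);

        // 2-3-1 feed-forward network with one hidden layer.
        var network = new BasicNetwork();
        network.AddLayer(new BasicLayer(null, true, 2));
        network.AddLayer(new BasicLayer(new ActivationSigmoid(), true, 3));
        network.AddLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
        network.Structure.FinalizeStructure();
        network.Reset();

        // Resilient Propagation needs no learning-rate tuning.
        var train = new ResilientPropagation(network, trainingSet);
        do
        {
            train.Iteration();
        } while (train.Error > 0.01);

        // Inspect the converged weights.
        Console.WriteLine(network.DumpWeights());
        Console.WriteLine(network.GetWeight(0, 0, 0)); // layer 0, neuron 0 -> next-layer neuron 0
    }
}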
I have a Lucene index with a lot of text data. Each item has a description, and I want to extract the most common words from the description and generate tags to classify each item based on it. Is there a Lucene.NET library for doing this, or any other library for text classification?
No. Lucene.NET can do search, indexing, text normalization, and "find more like this" functionality, but not text classification.
What to suggest depends on your requirements, so more detail might be needed.
Generally, though, the easiest way is to use an external service. They all have REST APIs, and it's very easy to interact with them from C# (see the sketch after the list below).
From external services:
Open Calais
uClassify
Google Prediction API
Text Classify
Alchemy API
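None of these services share an API, but the shape of the call is the same everywhere. Here is a minimal sketch with a made-up endpoint, query string, and key; substitute the real URL and parameters of whichever service you pick:

using System.Net.Http;
using System.Threading.Tasks;

public static class TaggingClient
{
    private static readonly HttpClient Http = new HttpClient();

    // Endpoint, query string and response format below are placeholders, not a real API.
    public static async Task<string> ClassifyAsync(string description)
    {
        var response = await Http.PostAsync(
            "https://api.example-classifier.test/v1/classify?apikey=YOUR_KEY",
            new StringContent(description));
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync(); // typically XML or JSON with suggested tags
    }
}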
There are also good Java toolkits like Mahout. As I remember, Mahout can also be used as a service, so integrating with it is not a problem at all.
I had a similar "auto tagging" task in C#, and I used Open Calais for it. It's free for up to 50,000 transactions per day, which was enough for me. uClassify also has good pricing; for example, the "Indie" license is $99 per year.
But maybe external services and Mahout are not your way. Then take a look at the DBpedia project and RDF.
Finally, you can at least use some implementation of the Naive Bayes algorithm. It's easy, and everything stays under your control.
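If you go that do-it-yourself route, a multinomial Naive Bayes tagger really is short. This is only a rough sketch (class and method names are made up, and there is no stemming or stopword handling):

using System;
using System.Collections.Generic;
using System.Linq;

// Rough multinomial Naive Bayes sketch with Laplace smoothing.
// Train() takes (description, tag) pairs; Predict() returns the most likely tag.
public class NaiveBayesTagger
{
    private readonly Dictionary<string, Dictionary<string, int>> _wordCounts =
        new Dictionary<string, Dictionary<string, int>>();
    private readonly Dictionary<string, int> _docCounts = new Dictionary<string, int>();
    private readonly HashSet<string> _vocabulary = new HashSet<string>();
    private int _totalDocs;

    private static IEnumerable<string> Tokenize(string text)
    {
        return text.ToLowerInvariant()
            .Split(new[] { ' ', ',', '.', ';', ':', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public void Train(string description, string tag)
    {
        _totalDocs++;
        _docCounts[tag] = _docCounts.TryGetValue(tag, out var d) ? d + 1 : 1;
        if (!_wordCounts.TryGetValue(tag, out var counts))
            _wordCounts[tag] = counts = new Dictionary<string, int>();
        foreach (var word in Tokenize(description))
        {
            _vocabulary.Add(word);
            counts[word] = counts.TryGetValue(word, out var c) ? c + 1 : 1;
        }
    }

    public string Predict(string description)
    {
        var tokens = Tokenize(description).ToList();
        return _docCounts.Keys.OrderByDescending(tag =>
        {
            // log P(tag) + sum over words of log P(word | tag), with Laplace smoothing.
            double score = Math.Log((double)_docCounts[tag] / _totalDocs);
            int tagTotal = _wordCounts[tag].Values.Sum();
            foreach (var word in tokens)
            {
                _wordCounts[tag].TryGetValue(word, out var count);
                score += Math.Log((count + 1.0) / (tagTotal + _vocabulary.Count));
            }
            return score;
        }).First();
    }
}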
This is a very hard problem, but if you don't want to spend much time on it you can take all words which have between 5% and 10% frequency in the whole document. Or you can simply take the 5 most common words.
Doing tag extraction well is very, very hard. It is so hard that whole companies live off web services exposing such an API.
You can also do stopword removal (using a fixed stopword list obtained from the internet).
And you can find common N-grams (for example, pairs), which you can use to find multi-word tags.
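Putting those three ideas together (frequency cut-off, stopwords, word pairs), a rough C# sketch might look like this; the stopword list and the top-5 cut-off are just placeholders:

using System;
using System.Collections.Generic;
using System.Linq;

public static class TagSuggester
{
    // Tiny placeholder list; in practice use a full stopword list obtained from the internet.
    private static readonly HashSet<string> Stopwords =
        new HashSet<string> { "the", "a", "an", "and", "or", "of", "to", "in", "is", "for" };

    public static IEnumerable<string> SuggestTags(string description, int count = 5)
    {
        var words = description.ToLowerInvariant()
            .Split(new[] { ' ', ',', '.', ';', ':', '\n', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !Stopwords.Contains(w))
            .ToList();

        // Candidate tags: single words plus adjacent word pairs (bigrams).
        var unigrams = words;
        var bigrams = words.Zip(words.Skip(1), (a, b) => a + " " + b);

        return unigrams.Concat(bigrams)
            .GroupBy(t => t)
            .OrderByDescending(g => g.Count())
            .Take(count)
            .Select(g => g.Key);
    }
}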
In my next project I will have to implement an automation solution to test a hardware device. Basically, the test involves an industrial robotic arm picking up a device to be tested, holding it at a specified position, and then using a series of other devices like motors and sensors to exercise several areas of the product under test.
So my test automation solution will need to communicate with several controllers, either issuing actuation commands or getting information from sensors.
The first idea that comes to mind is to define the sequence of steps for each controller in a custom XML language. In this language I'd need to define primitives such as "MOVE", "IF", "WAIT", "SIGNAL", etc. These primitives would be used to define the operation script for each controller. Each controller runs asynchronously but eventually gets synchronized, hence the need for things like "WAIT" and "SIGNAL".
I did a basic search on Google and the only things I was able to find were either really old (I don't need to comply with industrial standards; it's a small venture) or XML dialects that were designed for something else.
Question is - do you know of any XML standard that I could use instead of creating my own?
EDIT: I'm currently investigating a plan execution language by NASA that looks promising. It's called PLEXIL. If anybody knows anything about it, please feel free to contribute.
Have you reviewed PARSL? It's an XML based robotic scripting language which incorporates sensors, looping, and conditional behavior.
With XML you can create your 'own standard'. You can define the structure using a DTD (Document Type Definition) file. In this manner you can specify exactly what your XML has to look like.
The DTD is a schema that contains the structure and constraints you want to put on your XML file. Have a look here on Wikipedia for more info.
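For example, a hypothetical DTD plus a conforming script using the primitives mentioned in the question might look like this (element and attribute names are made up, not from any standard):

<?xml version="1.0"?>
<!-- Hypothetical element names; adapt to your controllers and primitives. -->
<!DOCTYPE script [
  <!ELEMENT script (move | wait | signal | if)*>
  <!ELEMENT move   EMPTY>
  <!ATTLIST move   axis CDATA #REQUIRED position CDATA #REQUIRED>
  <!ELEMENT wait   EMPTY>
  <!ATTLIST wait   event CDATA #REQUIRED>
  <!ELEMENT signal EMPTY>
  <!ATTLIST signal event CDATA #REQUIRED>
  <!ELEMENT if     (move | wait | signal)*>
  <!ATTLIST if     sensor CDATA #REQUIRED equals CDATA #REQUIRED>
]>
<script>
  <move axis="arm" position="pick"/>
  <signal event="device-in-place"/>
  <wait event="test-finished"/>
  <if sensor="gripper" equals="closed">
    <move axis="arm" position="home"/>
  </if>
</script>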
Hope this is helpful!
What is the easiest way to programmatically extract structured data from a bunch of web pages?
I am currently using an Adobe AIR program I have written to follow the links on one page and grab a section of data off of the subsequent pages. This actually works fine, and for programmers I think this (or other languages) provides a reasonable approach, to be written on a case-by-case basis. Maybe there is a specific language or library that allows a programmer to do this very quickly, and if so I would be interested in knowing what it is.
Also do any tools exist which would allow a non-programmer, like a customer support rep or someone in charge of data acquisition, to extract structured data from web pages without the need to do a bunch of copy and paste?
If you do a search on Stackoverflow for WWW::Mechanize & pQuery you will see many examples using these Perl CPAN modules.
However, because you have mentioned "non-programmer", perhaps the Web::Scraper CPAN module may be more appropriate? It's more DSL-like and so perhaps easier for a "non-programmer" to pick up.
Here is an example from the documentation for retrieving tweets from Twitter:
use URI;
use Web::Scraper;

# Scrape each li.status element into a hash with body, timestamp and permalink.
my $tweets = scraper {
    process "li.status", "tweets[]" => scraper {
        process ".entry-content", body => 'TEXT';
        process ".entry-date", when => 'TEXT';
        process 'a[rel="bookmark"]', link => '@href';
    };
};

my $res = $tweets->scrape( URI->new("http://twitter.com/miyagawa") );

for my $tweet (@{$res->{tweets}}) {
    print "$tweet->{body} $tweet->{when} (link: $tweet->{link})\n";
}
I found YQL to be very powerful and useful for this sort of thing. You can select any web page from the internet, and it will make it valid and then allow you to use XPath to query sections of it. You can output it as XML or JSON ready for loading into another script/application.
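For example, a YQL statement of roughly this form pulls the nodes matching an XPath expression out of an arbitrary page (the URL and XPath here are just illustrative):

select * from html where url="http://example.com/" and xpath='//div[@class="description"]'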
I wrote up my first experiment with it here:
http://www.kelvinluck.com/2009/02/data-scraping-with-yql-and-jquery/
Since then YQL has become more powerful with the addition of the EXECUTE keyword, which allows you to write your own logic in JavaScript and run it on Yahoo!'s servers before returning the data to you.
A more detailed writeup of YQL is here.
You could create a data table for YQL to get at the basics of the information you are trying to grab, and then the person in charge of data acquisition could write very simple queries (in a DSL which is pretty much English) against that table. It would be easier for them than "proper programming", at least...
There is Sprog, which lets you graphically build processes out of parts (Get URL -> Process HTML Table -> Write File), and you can put Perl code in any stage of the process, or write your own parts for non-programmer use. It looks a bit abandoned, but still works well.
I use a combination of Ruby with Hpricot and Watir; it gets the job done very efficiently.
If you don't mind it taking over your computer, and you happen to need JavaScript support, WatiN is a pretty damn good browsing tool. Written in C#, it has been very reliable for me in the past, providing a nice browser-independent wrapper for running through and getting text from pages.
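As a small sketch of what that looks like (the URL and link text are placeholders; WatiN drives a real Internet Explorer window):

using System;
using WatiN.Core;

public static class WatinDemo
{
    [STAThread]
    public static void Main()
    {
        // Opens a real IE window, navigates, and reads the page text back out.
        using (var browser = new IE("http://example.com/products"))
        {
            browser.Link(Find.ByText("Next page")).Click();
            Console.WriteLine(browser.Text); // full visible text of the page
        }
    }
}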
Are commercial tools viable answers? If so, check out http://screen-scraper.com/ - it is super easy to set up and use to scrape websites. They have a free version which is actually fairly complete. And no, I am not affiliated with the company :)
I'm trying to implement NLP in my project.
I need to tag words as Person, Location, Organisation, etc. If anybody knows the logic, please let me know.
Regards,
Stack
The task you want to perform is known as Named Entity Recognition (NER).
The majority of software for doing NER is in Java; for example, the Stanford NER system and the OpenNLP NER system. There are far fewer similar libraries written in C#; however, I found SharpNLP through a Google search. I have not used it personally, so I have no idea how well it works.
There is a nice web-service by Reuters: http://www.opencalais.com/.
You can access it via an API.
I thought that the demo was impressive http://viewer.opencalais.com/.
I did not pursue it further, as I want to create a German application, and Calais only supports English.