I'm trying to implement NLP in my project.
I need to tag words as Person, Location, Organisation, etc. If anybody knows the logic, please let me know.
Regards,
Stack
The task you want to perform is known as Named Entity Recognition (NER).
Most software for doing NER is written in Java, for example the Stanford NER system and the OpenNLP NER system. There are far fewer similar libraries written in C#; however, I found SharpNLP through a Google search. I have not used it personally, so I have no idea how well it works.
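Stanford NER and OpenNLP use statistically trained models, which are far more robust than anything hand-written. This sketch only shows the shape of the task: a minimal dictionary (gazetteer) lookup tagger, written in Python for brevity. The word lists, `MAX_SPAN` and `tag_entities` are hypothetical illustrations and are not part of any library mentioned here. A lookup like this cannot tag names it has never seen, which is why the trained systems exist.

```python
import re

# Hypothetical gazetteers; a real NER system learns these decisions from annotated text.
GAZETTEER = {
    "PERSON": {"barack obama", "angela merkel"},
    "LOCATION": {"new york", "berlin"},
    "ORGANIZATION": {"reuters", "microsoft"},
}
MAX_SPAN = 3  # longest entity, in tokens, that we try to match


def tag_entities(text):
    """Return (token, label) pairs, greedily matching the longest known span first."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    tagged = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            label = next((lab for lab, names in GAZETTEER.items() if candidate in names), None)
            if label:
                tagged.append((" ".join(tokens[i:i + n]), label))
                i += n
                break
        else:
            tagged.append((tokens[i], "O"))  # "O" = outside any known entity
            i += 1
    return tagged


print(tag_entities("Angela Merkel visited Microsoft in New York."))
```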
There is a nice web service by Reuters, http://www.opencalais.com/, which you can access via an API.
I thought the demo at http://viewer.opencalais.com/ was impressive.
I did not pursue it further, as I want to create a German application and Calais only supports English.
I would like to build an application framework that is mainly interpreted.
Say the source code were stored in a database where users could edit it, and the latest version would always be executed.
Can anyone give me some ideas on how to implement something like this?
cheers,
gabor
In .NET, you can use reflection and CodeDOM to compile code on the fly, but neither approach is really simple or practical. Mono has some ability to interpret C# on the fly as well, but I haven't looked closely at it yet.
Another alternative is to go with an interpreted .Net language like Boo or IronPython as the language for your database code.
Either way, make sure you think long and hard about the security of your platform. Allowing users to execute arbitrary code is always an exercise fraught with peril. It's often too tempting to look for a simple eval() method, and even if one exists, that is not good enough for this kind of scenario.
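The C# route (CodeDOM or Reflection.Emit) takes a fair amount of ceremony. As a rough analogue of compiling source text at run time, here is a Python sketch using the built-in `compile()` and `exec()`. The `load_function` helper and the stored source are hypothetical. Trimming `__builtins__` as done here is *not* a security boundary, which is exactly the eval() danger described above.

```python
def load_function(source, name, allowed=None):
    """Compile `source` at run time and return the function called `name` from it."""
    # The filename argument only shows up in tracebacks; "<db>" marks code loaded from storage.
    code = compile(source, "<db>", "exec")
    # A trimmed builtins dict limits accidents, but it is NOT a sandbox:
    # determined code can escape it, so never run untrusted input this way.
    namespace = {"__builtins__": dict(allowed or {})}
    exec(code, namespace)
    return namespace[name]


stored_source = """
def discount(price, percent):
    return round(price * (1 - percent / 100), 2)
"""

discount = load_function(stored_source, "discount", {"round": round})
print(discount(80.0, 25))  # 60.0
```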
Try Mono (http://www.monoproject.org). It supports many scripting languages, including JavaScript.
If you don't want to use any scripting you can use CodeDOM or Reflection (see Reflection.Emit).
Here are some really useful links on the topic:
Dynamically executing code in .Net (here you can find a tool which can be very helpful)
Late Binding and On-the-Fly Code Generation Using Reflection in C#
Dynamic Source Code Generation and Compilation
Usually a program uses a scripting language such as Lua or JavaScript for its scriptable parts.
To answer your technical question: You don't want to write your own language and interpreter. That's too much work for you to do. So pick some other language, say Python or Lua, and look for the documentation that lets your C program hand it blocks of code to execute. Of course, the script needs to be able to do something, so you'll need to find how to expose your program's objects to the script. Also, what will happen if a client is running the program when you update its source code in the database? Should the client restart? Are you going to store the entire program as a single row in this database, or did you want to store individual functions? That affects how you structure your updates.
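This is a Python sketch of the "store individual functions" option. It keeps one row per function in SQLite with a version number, recompiles a function only when its stored version changes, and exposes a deliberately small host object to the scripts. The table layout and the `ScriptHost`, `HostApi` and `save_script` names are all hypothetical. In .NET the same roles would be played by your own schema and whichever scripting engine you embed.

```python
import sqlite3


class HostApi:
    """The only program object scripts can see; keep this surface deliberately small."""

    def __init__(self):
        self.log = []

    def notify(self, message):
        self.log.append(message)


class ScriptHost:
    """Loads one function per database row and recompiles it when its version changes."""

    def __init__(self, conn, api):
        self.conn = conn
        self.api = api
        self.cache = {}  # function name -> (version, compiled function)

    def get(self, name):
        row = self.conn.execute(
            "SELECT version, source FROM scripts WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        version, source = row
        cached = self.cache.get(name)
        if cached is None or cached[0] != version:
            # Scripts see the host object as the global `app`. They also get the
            # normal builtins here, so this is not a security sandbox.
            namespace = {"app": self.api}
            exec(compile(source, f"<script:{name}>", "exec"), namespace)
            self.cache[name] = (version, namespace[name])
        return self.cache[name][1]


def save_script(conn, name, version, source):
    conn.execute(
        "INSERT OR REPLACE INTO scripts (name, version, source) VALUES (?, ?, ?)",
        (name, version, source),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scripts (name TEXT PRIMARY KEY, version INTEGER, source TEXT)")
api = HostApi()
host = ScriptHost(conn, api)

save_script(conn, "greet", 1, "def greet(who):\n    app.notify('hi ' + who)\n")
host.get("greet")("Ann")
save_script(conn, "greet", 2, "def greet(who):\n    app.notify('hello ' + who)\n")  # an "update"
host.get("greet")("Ann")
print(api.log)
```

A caller that already holds the version-1 function keeps running it until its next `get` call. That is one possible answer to the "what happens during an update" question; whether it is acceptable depends on your scenario.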
To address other issues with your question: Why do you want to do this? Making "interpreted language" part of your design spec for a system is not often a good sign. Is the real requirement something like this: "I update the program often and I want users to always have the latest copy?" If so, there are other, better ways to go about this (just give us your actual scenario and requirements).
I have a VFD display from Soundgraph and I want to know whether there are any current APIs (C#, Java, C++, C, etc.) to program it.
Without the model I can't say more about it, however if it's a text VFD it is very likely based on the HD44780 Character LCD controller.
This is probably not the best place for asking questions about a specific device like that. I'd recommend checking out the manufacturer's website to see if they have anything or maybe there's a user's forum where you can find something. If you don't find anything, it might be worth writing your own library and then making it publicly available.
As it's a text display, it is likely HD44780-compatible. For this display driver chip you will find tons of driver software and documentation. If it's equipped with a proprietary controller chip, the odds are much worse: you would have to check the datasheet for information on the protocol used and then write your own driver software.
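If the display does turn out to be HD44780-compatible, the instruction set is small and documented in the controller's datasheet. The Python sketch below only builds the command bytes. It is unknown here how those bytes reach the Soundgraph unit (USB, serial, a vendor driver), so the `send(byte, is_data)` callback is a hypothetical placeholder for the real transport. Real initialization also needs timing delays, plus a special wake-up sequence in 4-bit mode, and both are omitted.

```python
CLEAR_DISPLAY = 0x01
RETURN_HOME = 0x02


def entry_mode(increment=True, shift=False):
    # 0b000001 I/D S: cursor moves right after each character when increment=True
    return 0x04 | (int(increment) << 1) | int(shift)


def display_control(display=True, cursor=False, blink=False):
    # 0b00001 D C B: display on/off, underline cursor, blinking cursor
    return 0x08 | (int(display) << 2) | (int(cursor) << 1) | int(blink)


def function_set(eight_bit=True, two_lines=True, font_5x10=False):
    # 0b001 DL N F: bus width, number of lines, font size
    return 0x20 | (int(eight_bit) << 4) | (int(two_lines) << 3) | (int(font_5x10) << 2)


ROW_OFFSETS = (0x00, 0x40)  # DDRAM start address of each line on a typical 2-line display


def set_cursor(row, col):
    return 0x80 | (ROW_OFFSETS[row] + col)


def init_display(send):
    """`send(byte, is_data)` is hypothetical; is_data models the HD44780 RS pin."""
    for cmd in (function_set(), display_control(), entry_mode(), CLEAR_DISPLAY):
        send(cmd, False)


def write_line(send, row, text):
    send(set_cursor(row, 0), False)
    # Most printable ASCII codes map directly to glyphs in the standard character ROM.
    for ch in text.encode("ascii"):
        send(ch, True)
```

With the defaults this yields the familiar sequence 0x38, 0x0C, 0x06, 0x01, and line 2 starts at 0xC0.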
Is there any C# library that can detect the language of a particular piece of text? For example, for the input text "This is a sentence", it should detect the language as "English", and for "Esto es una sentencia" it should detect "Spanish".
I understand that language detection from text is not a deterministic problem. But both Google Translate and Bing Translator have an "Auto detect" option, which best-guesses the input language. Is there something similar available publicly, preferably in C#?
Yes indeed, TextCat is very good for language identification, and it has many implementations in different languages.
There were no ports to .NET, so I wrote one: NTextCat (NuGet, Online Demo).
It is a pure .NET Standard 2.0 DLL plus a command-line interface to it. By default, it uses a profile of 14 languages.
Any feedback is much appreciated! New ideas and feature requests are welcome too :)
Language detection is a pretty hard thing to do.
Some languages are much easier to detect than others simply because of the diacritics and digraphs/trigraphs they use. For example, double-acute accents are used almost exclusively in Hungarian. The dotless i ‘ı’ is used exclusively [I think] in Turkish, t-comma (not t-cedilla) is used only in Romanian, and the eszett ‘ß’ occurs only in German.
Some digraphs, trigraphs and tetragraphs are also a good give-away. For example, you'll most likely find ‘eeuw’ and ‘ieuw’ primarily in Dutch, and ‘tsch’ and ‘dsch’ primarily in German etc.
More giveaways would include common words or common prefixes/suffixes used in a particular language. Sometimes even the punctuation that is used can help determine a language (quote-style and use, etc).
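The giveaways above can be turned into a crude scorer. This Python sketch counts occurrences of a few cue strings per language. The cue lists are illustrative picks taken from the examples above, and `guess_by_cues` is a hypothetical name. The heuristic only works when a distinctive cue actually appears in the text; otherwise it returns None.

```python
CUES = {
    "Hungarian": ["\u0151", "\u0171"],  # ő, ű: double-acute vowels
    "Turkish": ["\u0131"],  # ı: dotless i
    "Romanian": ["\u021b"],  # ț: t-comma (not t-cedilla)
    "German": ["\u00df", "tsch", "dsch"],  # ß and common German clusters
    "Dutch": ["eeuw", "ieuw"],
}


def guess_by_cues(text):
    """Return the language whose cues occur most often, or None if no cue matches."""
    text = text.lower()
    scores = {lang: sum(text.count(cue) for cue in cues) for lang, cues in CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_by_cues("Die Straße nach Deutschland"))
```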
If such a library exists I would like to know about it, since I'm working on one myself.
You can find a C# implementation based on trigram (3-gram) analysis here:
http://idsyst.hu/development/language_detector.html
Here is a simple detector based on bigram statistics (basically, you learn from a large body of text which bigrams occur most frequently in each language, then count those bigrams in a piece of text and compare the counts with the values you learned):
http://allantech.blogspot.com/2007/07/automatic-language-detection.html
This is probably good enough for many (most?) applications and doesn't require Internet access.
Of course it will perform worse than Google's or Bing's algorithms (which themselves aren't great). If you need excellent detection performance, you would have to do a lot of hard work over huge amounts of data.
The other option would be to leverage Google's or Bing's APIs if your app has Internet access.
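The bigram approach comes down to two steps. First, learn how often each letter pair occurs in sample text for each language. Then compare the pairs in the unknown text against each learned profile. The Python sketch below uses cosine similarity for the comparison, which is one reasonable choice and not necessarily what the linked article does. The two training sentences are placeholders for the large body of text you would really need.

```python
import math
from collections import Counter


def bigrams(text):
    text = " " + " ".join(text.lower().split()) + " "  # pad so word boundaries count too
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(samples):
    """samples: {language: training text} -> {language: bigram counts}."""
    return {lang: bigrams(text) for lang, text in samples.items()}


def detect(profiles, text):
    target = bigrams(text)
    return max(profiles, key=lambda lang: cosine(profiles[lang], target))


SAMPLES = {  # placeholder training data; real profiles need far more text
    "English": "the quick brown fox jumps over the lazy dog and this is the end of the story",
    "Spanish": "el rápido zorro marrón salta sobre el perro perezoso y esto es el final de la historia",
}
profiles = train(SAMPLES)
print(detect(profiles, "the lazy dog"))
```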
You'll want a machine learning algorithm based on hidden Markov models, trained by processing a bunch of texts in different languages.
Then, when it gets unidentified text, the language with the closest 'score' is the winner.
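A common simplification of this idea uses a plain (not hidden) character-level Markov chain. You estimate P(next character | previous character) for each language, then pick the language under which the unknown text has the highest log-probability. The Python sketch below uses add-one smoothing so that unseen character pairs don't zero out a score. It is not a hidden Markov model, and the training sentences are placeholders.

```python
import math
from collections import Counter, defaultdict


class CharMarkovModel:
    """First-order character Markov chain with add-one (Laplace) smoothing."""

    def __init__(self, text):
        text = " " + text.lower() + " "
        self.transitions = defaultdict(Counter)
        for prev, nxt in zip(text, text[1:]):
            self.transitions[prev][nxt] += 1
        self.alphabet_size = len(set(text)) + 1  # +1 reserves mass for unseen characters

    def log_prob(self, text):
        text = " " + text.lower() + " "
        total = 0.0
        for prev, nxt in zip(text, text[1:]):
            counts = self.transitions.get(prev, Counter())
            total += math.log((counts[nxt] + 1) / (sum(counts.values()) + self.alphabet_size))
        return total


def identify(models, text):
    return max(models, key=lambda lang: models[lang].log_prob(text))


models = {  # placeholder training data
    "English": CharMarkovModel("the quick brown fox jumps over the lazy dog and this is the end of the story"),
    "Spanish": CharMarkovModel("el rápido zorro marrón salta sobre el perro perezoso y esto es el final de la historia"),
}
print(identify(models, "el perro perezoso"))
```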
There is a simple tool to identify text language:
http://www.detectlanguage.com/
I've found that "textcat" is very useful for this. I've used a PHP implementation, PHP Text Cat, based on this original implementation, and found it reliable. If you have a look at the sources, you'll find it's not a terrifyingly difficult thing to implement in the language of your choice. The hard work (the letter combinations that are relevant to a particular language) is all in there as data.
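The core of TextCat's method is small. It builds a ranked profile of the most frequent character n-grams (lengths 1 to 5, word boundaries marked with `_`). It then compares profiles with an "out-of-place" distance: the sum of rank differences, plus a fixed penalty for n-grams the language profile lacks. The Python sketch below follows that outline. The profile size of 300, the simple whitespace tokenization and the sample texts are assumptions for illustration, not TextCat's shipped data.

```python
from collections import Counter

TOP = 300  # profile size; also the penalty for an n-gram missing from a language profile


def profile(text, max_n=5, top=TOP):
    """Map each of the `top` most frequent character n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, penalty=TOP):
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def classify(lang_profiles, text):
    doc = profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


lang_profiles = {  # placeholder training data; real profiles come from large corpora
    "English": profile("the quick brown fox jumps over the lazy dog and this is the end of the story"),
    "Spanish": profile("el rápido zorro marrón salta sobre el perro perezoso y esto es el final de la historia"),
}
print(classify(lang_profiles, "the lazy dog"))
```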
I am trying to find information (and hopefully C# source code) about creating a basic AI tool that can understand English words, grammar and context.
The idea is to train the AI using as many written documents as possible and then, based on these documents, have the AI create its own creative writing in proper English that makes sense to a human.
While the idea is simple, I do realise that the hurdles are huge; any starting points or good resources will be appreciated.
A basic AI tool that you can use to do something like this is a Markov Chain. It's actually not too tricky to write!
See: http://pscode.com/vb/scripts/ShowCode.asp?txtCodeId=2031&lngWId=10
If that's not enough, you might be able to store WordNet synsets in your Markov chain instead of just words. This gives you some sense of the meaning of the words.
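A word-level Markov chain can be sketched in a few lines of Python. Map each two-word prefix to the words that followed it in the training text, then walk the chain by picking a random successor at each step. The function names and the toy corpus are hypothetical. Storing WordNet synsets instead of raw words, as suggested above, would change the keys but not the structure.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the list of words that followed it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, start, max_words=20, rng=None):
    """Walk the chain from `start`, a tuple with as many words as the chain's order."""
    rng = rng or random.Random()
    order = len(start)
    output = list(start)
    while len(output) < max_words:
        successors = chain.get(tuple(output[-order:]))
        if not successors:  # dead end: this prefix only appeared at the end of the text
            break
        output.append(rng.choice(successors))
    return " ".join(output)


corpus = "the cat sat on the mat and the cat ran off the mat"
chain = build_chain(corpus)
print(generate(chain, ("the", "cat"), rng=random.Random(1)))
```

Because duplicate successors are kept in the lists, frequent continuations are proportionally more likely to be chosen.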
To be able to recompose a document, you are going to need a way to filter out the bad results.
Which means:
You are going to have to write a program that can evaluate whether the output is valid (grammatical and syntactic validity is the best you can check reliably). This would require NLP.
You would need lots of training data and test data
You would need to watch out for overtraining (take a look at ROC curves)
Instead of writing a tool you could:
Manually score the output (this will take a long time to properly train the algorithm)
For this, Amazon Mechanical Turk might be a good idea
The irony of this: the computer would have a difficult time "creatively" composing something new. All of its worth would be based on its previous experiences [training data].
There are some good references and further reading in this Natural Language article.
As others said, a Markov chain seems to be the most suitable approach for such a task. A nice description of implementing a Markov chain can be found in Kernighan & Pike, The Practice of Programming, section 3.1. A nice description of text generation is also given in Programming Pearls.
One thing, though not quite what you need, would be a Markov chain of words. Here's a link I found by a quick search: http://blog.figmentengine.com/2008/10/markov-chain-code.html, but you can find much more information by searching for it.
Take a look at http://www.nltk.org/ (Natural Language Toolkit), lots of powerful tools there. They use Python (not C#) but Python is easy enough to pick up. Much easier to pick up than the breadth and depth of natural language processing, at least.
I agree that you will have trouble creating something creative. You could also use a keyword spinner on certain words. You might also want to implement a stop-word filter to remove anything colloquial.
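A stop-word filter is just a set lookup. The Python sketch below drops words found in a small hand-picked list. `STOP_WORDS` is a hypothetical sample; real lists, such as the ones shipped with NLTK, are much longer. A list like this targets common function words, so filtering colloquialisms would need a list of those instead.

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "it"}  # hypothetical sample list


def remove_stop_words(text):
    """Tokenize into words (keeping apostrophes) and drop any word in STOP_WORDS."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if w.lower() not in STOP_WORDS]


print(remove_stop_words("The cat is in the hat and it is happy"))  # ['cat', 'hat', 'happy']
```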
Actually, maybe not full-blown Lex/Yacc. I'm implementing a command-interpreter front-end to administer a webapp. I'm looking for something that'll take a grammar definition and turn it into a parser that directly invokes methods on my object. Similar to how ASP.NET MVC can figure out which controller method to invoke, and how to pony up the arguments.
So, if the user types "create foo" at my command-prompt, it should transparently call a method:
private void Create(string id) { /* ... */ }
Oh, and if it could generate help text from (e.g.) attributes on those controller methods, that'd be awesome, too.
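The "create foo" to `Create(id)` mapping does not strictly need a parser generator; it is plain reflection. You look up a method by the command name, bind the remaining words to its parameters, and build help text from metadata on the methods. The sketch below does this in Python with `getattr`, `inspect.signature` and docstrings, which stand in for C# reflection and attributes. The `AdminCommands` class and its commands are hypothetical.

```python
import inspect


class AdminCommands:
    """Hypothetical controller: every public method is a command."""

    def __init__(self):
        self.items = set()

    def create(self, item_id):
        """create <id>: add a new item."""
        self.items.add(item_id)
        return f"created {item_id}"

    def delete(self, item_id):
        """delete <id>: remove an item."""
        self.items.discard(item_id)
        return f"deleted {item_id}"


def dispatch(controller, line):
    """Map 'create foo' to controller.create('foo'), much as reflection would in C#."""
    parts = line.split()
    if not parts:
        return ""
    name, args = parts[0].lower(), parts[1:]
    method = getattr(controller, name, None)
    if name.startswith("_") or not inspect.ismethod(method):
        return f"unknown command: {name}"
    try:
        inspect.signature(method).bind(*args)  # validate the argument count before calling
    except TypeError:
        return f"usage: {inspect.getdoc(method)}"
    return method(*args)


def help_text(controller):
    """One line per public command, taken from each method's docstring."""
    return "\n".join(
        inspect.getdoc(member) or name
        for name, member in inspect.getmembers(controller, inspect.ismethod)
        if not name.startswith("_")
    )


admin = AdminCommands()
print(dispatch(admin, "create foo"))
print(help_text(admin))
```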
I've done a couple of small projects with GPLEX/GPPG, which are pretty straightforward reimplementations of LEX/YACC in C#. I've not used any of the other tools above, so I can't really compare them, but these worked fine.
GPPG can be found here and GPLEX here.
That being said, I agree, a full LEX/YACC solution probably is overkill for your problem. I would suggest generating a set of bindings using IronPython: it interfaces easily with .NET code, non-programmers seem to find the basic syntax fairly usable, and it gives you a lot of flexibility/power if you choose to use it.
I'm not sure Lex/Yacc will be of any help. You'll just need a basic tokenizer and an interpreter, which are faster to write by hand. If you're still set on the parsing route, see Irony.
As a side note: have you considered PowerShell and its cmdlets?
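A hand-written tokenizer for a command prompt can be quite short. This Python sketch splits a line on whitespace while treating double-quoted text as a single token, so `create "my item"` yields two tokens. The quoting rules are an assumption; Python's standard `shlex.split` implements fuller shell-style rules if you would rather not write your own.

```python
def tokenize(line):
    """Split on whitespace, treating "double-quoted text" as a single token."""
    tokens, current, in_quotes, has_token = [], [], False, False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            has_token = True  # so that "" still produces an (empty) token
        elif ch.isspace() and not in_quotes:
            if has_token:
                tokens.append("".join(current))
                current, has_token = [], False
        else:
            current.append(ch)
            has_token = True
    if in_quotes:
        raise ValueError("unterminated quote")
    if has_token:
        tokens.append("".join(current))
    return tokens


print(tokenize('create "my item" now'))  # ['create', 'my item', 'now']
```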
Also look at Antlr, which has C# support.
It's still an early CTP, so it can't be used in production apps, but you may be interested in Oslo/MGrammar:
http://msdn.microsoft.com/en-us/oslo/
Jison has been getting a lot of traction recently. It is a Bison port to JavaScript. Because of its extremely simple nature, I've ported the Jison parsing/lexing template to PHP, and now to C#. It is still very new, but if you get a chance, take a look at it here: https://github.com/robertleeplummerjr/jison/tree/master/ports/csharp/Jison
If you don't fear alpha software and want an alternative to Lex/Yacc for creating your own languages, you might look into Oslo. I would recommend sitting through the recordings of sessions TL27 and TL31 from last year's PDC. TL31 directly addresses the creation of Domain Specific Languages using Oslo.
Coco/R is a compiler generator with a .NET implementation. You could try that out, but I'm not sure if getting such a library to work would be faster than writing your own tokenizer.
http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/
I would suggest csflex, a C# port of flex, the most famous Unix scanner generator.
I believe lex/yacc are already included in one of the SDKs (i.e. RTM), either the Windows SDK or the .NET Framework SDK.
The Gardens Point Parser Generator here provides Yacc/Bison functionality for C#. It can be downloaded here. A useful example using GPPG is provided here.
As Anton said, PowerShell is probably the way to go. If you do want a lex/yacc implementation, then Malcolm Crowe has a good set.
Edit: Direct Link to the Compiler Tools
Just for the record, implementation of lexer and LALR parser in C# for C#:
http://code.google.com/p/naive-language-tools/
It should be similar in use to Lex/Yacc; however, these tools (NLT) are not generators, so forget about speed.