What libraries are there for writing internationalized C# applications?
Typical functionality the library should provide:
Validation of country-specific data (e.g. VAT numbers, phone numbers, addresses,...)
Validation of bank and financial coordinates (e.g. credit card numbers, IBAN,...)
Language-specific functionality (e.g. numbers to words and back, summarizing,...)
Language-specific content filtering (e.g. swearword filtering...)
An example of such libraries in Perl would be the Internationalization/Locale section of CPAN.
What C# solutions are available?
Note: I am not looking for an introduction to the System.Globalization namespace :)
Note 2: Should I assume that there are no options available? Is anyone interested in joining forces to create one?
Note 3: Edited to make the question appear on the front page in the hope of more answers. This isn't such a hard question; how is it possible that Stackers don't ever do i18n?
One project that is working towards a database of globalization, internationalization and localization knowledge is the Unicode Common Locale Data Repository, based on the old ICU project at IBM.
As it is a database of XML data it doesn't contain any .NET-specific code, but as a body of knowledge it is very good.
Only a smallish subset is in the .NET framework. Microsoft hasn't gone near any of the supplemental stuff, like postcode formats, number spelling (for check/cheque amounts), etc. Standard time zone names (from the Olson/tz distribution), etc. are also included, with mappings to the Windows-specific names. Some of the hierarchical locale-specific behaviours also have better support.
I wouldn't say that no one does i18n, but I don't know of any generic tools that can be used for every project. Maintaining a database with all of the information you are looking for would be an epic project. It sounds like what you're looking for isn't a specific C# library, but more a collection of information online that you can draw from. If you were able to find a repository of swear words in various languages (for example), it would be trivial for you to use this in C#. I think that a single easy-to-use assembly that wraps up all of your requirements is going to be impossible to find.
Have a look at
http://www.microsoft.com/globaldev/getwr/dotneti18n.mspx
and
http://www.dotneti18n.com/
String to number and vice versa can be done as follows:
// requires: using System.Globalization;
var culture = new CultureInfo(locale);
int number = Convert.ToInt32(myString, culture.NumberFormat);
string str = Convert.ToString(myNumber, culture.NumberFormat);
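One caveat worth a sketch: Convert.ToInt32(string, IFormatProvider) parses with NumberStyles.Integer, so input that contains the culture's thousands separator is rejected. The helper below is only an illustration (the class and method names are made up for this example); it opts in to NumberStyles.AllowThousands and formats with the standard "N0" format string.

```csharp
using System;
using System.Globalization;

// Illustrative helper, not part of any library mentioned in this thread.
public static class CultureNumbers
{
    // Parses an integer that may contain the culture's thousands separator.
    public static bool TryParseInt(string text, string cultureName, out int value)
    {
        var culture = new CultureInfo(cultureName);
        return int.TryParse(
            text,
            NumberStyles.Integer | NumberStyles.AllowThousands,
            culture,
            out value);
    }

    // Formats an integer with the culture's grouping rules ("N0" = no decimals).
    public static string Format(int value, string cultureName)
    {
        return value.ToString("N0", new CultureInfo(cultureName));
    }
}
```

Round-tripping a value through Format and TryParseInt with the same culture name should give back the original number, whatever separator that culture uses.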
As to checking VAT numbers and addresses, I'm interested in that too, but haven't found anything useful so far.
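Two of the checks from the question need no country database at all, since they are plain checksums: card numbers use the Luhn algorithm and IBANs use a mod-97 check. Here is a minimal sketch, assuming you only want the checksum and not issuer or per-country length rules (the class and method names are invented for this example):

```csharp
using System;

// Illustrative checksum helpers; they do not verify that an account exists.
public static class FinancialChecks
{
    // Luhn check: from the right, double every second digit and sum the digits.
    // Spaces and hyphens are skipped; any other non-digit fails.
    public static bool PassesLuhn(string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return false;

        int sum = 0;
        int digitCount = 0;
        bool doubleIt = false; // the rightmost digit is not doubled

        for (int i = input.Length - 1; i >= 0; i--)
        {
            char c = input[i];
            if (c == ' ' || c == '-') continue;
            if (c < '0' || c > '9') return false;

            int d = c - '0';
            if (doubleIt)
            {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
            digitCount++;
        }
        return digitCount > 1 && sum % 10 == 0;
    }

    // IBAN check: move the first four characters to the end, map A..Z to
    // 10..35, and require the resulting number to be 1 modulo 97.
    // Country-specific lengths are NOT checked here.
    public static bool PassesIbanChecksum(string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return false;

        string iban = input.Replace(" ", "").ToUpperInvariant();
        if (iban.Length < 5 || iban.Length > 34) return false;

        string rearranged = iban.Substring(4) + iban.Substring(0, 4);
        int remainder = 0;
        foreach (char c in rearranged)
        {
            int value;
            if (c >= '0' && c <= '9') value = c - '0';
            else if (c >= 'A' && c <= 'Z') value = c - 'A' + 10;
            else return false;

            // Letters expand to two decimal digits, so shift by 100 instead of 10.
            remainder = (value < 10 ? remainder * 10 + value : remainder * 100 + value) % 97;
        }
        return remainder == 1;
    }
}
```

VAT numbers and addresses are a different story: their formats vary per country, so they need rule data rather than a single formula.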
Not exactly a "library", per se, but I've actually run into a great service (for pay), by a company called E4X (former client of mine).
What they provide is complete localization of your ecommerce site, including language translations, currency exchanges, local billing and handling of financial transactions including region-specific taxes etc, and more. They even deal with the logistics of physical shipping...
Worth looking into, for an ecommerce business. Let 'em know I sent you... ;-)
That's a huge endeavor. Let's start with one simple problem: phone numbers. Google's libphonenumber library at http://code.google.com/p/libphonenumber/ has a C# port at https://bitbucket.org/pmezard/libphonenumber-csharp with notes at http://blog.thekieners.com/2011/06/06/using-googles-libphonenumber-in-microsoft-net-with-c/. It appears to be a good library for handling both US and int'l numbers.
Is there any C# algorithm by which personal and place names can be extracted from text?
e.g., given the following text:
St. Mark died at Alexandria, in Egypt. He was martyred, I think.
However, that has nothing to do with my legend. About the founding of
the city of Venice--
(taken from "The Innocents Abroad" by Mark Twain)
...is there any way to extract:
St. Mark
Alexandria (or better yet, "Alexandria, Egypt")
Venice
?
I realize that there is no way to get 100% accuracy (where all place names and personal names are captured, and no "false positives" are added), but 80% accuracy could be very valuable.
I understand that each word could be compared with an encyclopedia or some such, but there must be a better way. Also, how could the algorithm know to combine "St." and "Mark" and to see "Alexandria, in Egypt" as "Alexandria, Egypt"?
I noticed that the links provided here are a bit dated. One project that is still active (and free [correction: GPL, so free for non-commercial]) is the Stanford Natural Language Processing (NLP) libraries (https://nlp.stanford.edu/software/). You can demo their Named Entity Recognition (NER) here. It even has a .NET wrapper (http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordNER.html).
Microsoft also offers many similar algorithms through Azure Cognitive Services. You would be most interested in Entity Linking (https://azure.microsoft.com/en-us/services/cognitive-services/entity-linking-intelligence-service/)
I hope this helps future viewers.
You are best off using some kind of API that will be able to perform this kind of entity matching, as what you are asking is potentially very complex and requires some degree of semantic textual analysis backed up by a large database. I'd recommend looking at APIs such as:
OpenCalais - English Semantic Metadata: Entity/Fact/Event Definitions and Descriptions web-service
Calais supports a rich set of semantic metadata, including entities, events and facts.
Alchemy API - Entity Extraction API
AlchemyAPI is capable of identifying people, companies, organizations, cities, geographic features, and other typed entities within your HTML, text, or web-based content. We employ sophisticated statistical algorithms and natural language processing technology to analyze your information, extracting the semantic richness embedded within.
I want to design an efficient and flexible architecture to process swaps based on a standard financial protocol, FpML (Financial products Markup Language).
I've searched the internet but didn't find much information. The definitions I found are:
SWAP (definition):
Swap refers to an exchange of one financial instrument for another between the parties concerned. This exchange takes place at a predetermined time, as specified in the contract.
FPML:
FPML (Financial products Markup Language) is the open source XML
standard for electronic dealing and processing of OTC derivatives. It
establishes the industry protocol for sharing information on, and
dealing in, financial derivatives and structured products.
Seems you have got an assignment to complete. I had to build almost the same thing using C# and MS SQL.
Here are a few references that should help:
http://www.fpml.org
http://www.investinganswers.com/financial-dictionary/optionsderivatives/interest-rate-swap-2252
http://www.investopedia.com/terms/s/swap.asp
http://en.wikipedia.org/wiki/Swap_(finance)
http://www.investopedia.com/terms/e/equityswap.asp
http://www.fpml.org/documents/FpML5-products-framework.pdf
http://www.investopedia.com/articles/optioninvestor/07/swaps.asp
Hope this will help you.
Have a look at the following stackoverflow posting: http://bit.ly/python-xml-swap
Just wondering if anyone is aware of any low-level phone number storage constructs for C#. I have been surprised to find that all of my searches for such a library have proved fruitless.
Essentially I am hoping for something that can take a string phone number input (in all its varied goodness) and both validate and segment the given string into its various sections (i.e. country code, area code, number) along with providing a common format to store this data.
Does any such library exist? If not, any idea why something like this hasn't been attempted? (Is it really that hard a problem?)
I think that the main reason is that between people themselves there seems to be no real standard way of writing the phone number. For instance, people living on small islands tend to not have regional codes and there is no need for the country code when calling residents of the same island.
This changes when you move to larger places. Also, I have seen certain numbers written as (XXXX)XXXXX or XXXXX-XXXXX or XXXXXXXXXX.
The standard way of dealing with this seems to be with regular expressions. The developer usually takes a few possible input formats and uses regular expressions to validate and transform the format of the number.
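A rough sketch of that regular-expression approach, assuming the only goal is to accept the punctuation variants mentioned above and store a digits-only form. The class name and the 5 to 15 digit bounds are my own assumptions, and nothing here splits out a country or area code:

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative normalizer; it knows nothing about national numbering plans.
public static class PhoneNormalizer
{
    // Optional leading '+', then digits mixed with spaces, dots, hyphens or parentheses.
    private static readonly Regex Allowed =
        new Regex(@"^\+?[0-9\s().-]+$", RegexOptions.Compiled);

    // Strips the punctuation and keeps a leading '+' if one was given.
    public static bool TryNormalize(string input, out string normalized)
    {
        normalized = string.Empty;
        if (string.IsNullOrWhiteSpace(input)) return false;

        string trimmed = input.Trim();
        if (!Allowed.IsMatch(trimmed)) return false;

        string digits = Regex.Replace(trimmed, "[^0-9]", "");
        // 15 is the E.164 maximum; the lower bound of 5 is an arbitrary assumption.
        if (digits.Length < 5 || digits.Length > 15) return false;

        normalized = trimmed[0] == '+' ? "+" + digits : digits;
        return true;
    }
}
```

With this, "(0123) 45678" and "0123-45678" both normalize to "012345678", which is enough for storage and comparison but says nothing about whether the number is actually assigned.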
So it turns out there is actually a pretty awesome library available after all.
Original Java version: http://code.google.com/p/libphonenumber/
C# version: https://bitbucket.org/pmezard/libphonenumber-csharp/wiki/Home
Me discussing the library on my blog.
I have just been looking at the 'Forgiving Format' design pattern (e.g. http://ui-patterns.com/patterns/ForgivingFormat), however I am surprised that I can't find any libraries implementing this (specifically for simple date/times). Does anybody know of any (preferably open source) libraries for this?
Thanks
I don't think this is a design pattern, rather a UI pattern... (edit: I just noticed the name of the website you linked to :) )
As a matter of fact, this functionality exists in some libraries. The first one to spring to mind is dateJS, a JavaScript parsing library that allows for fuzzy date input. However, the last I heard of it, there hadn't been much activity in the project.
Apart from dates, countries, etc., I think that any project of this kind is very business-specific; first you've got to learn how users express themselves and how to translate this into business terms. Working on a generic translator doesn't look feasible, at least not without a lot of configuration.
The Forgiving Format design pattern is heavily dependent upon your interface. If you are using HTML 4 and have only a text box, how is it supposed to know that only numbers are acceptable? How is it supposed to know that 2.30 is supposed to mean 2:30 as in hour of the day? Et cetera.
There are jQuery plugins which steer user input in the right direction using general rules, however you're the one to determine what is acceptable and what isn't in the end. And if you wanted to have a field which accepted either telephone numbers or e-mail addresses, you'd be hard-pressed to find a library which validates it as such without a little tweaking.
Ultimately it comes down to you to be able to determine what is tolerated input and what isn't. Libraries merely help you do the more common validation.
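To make the "2.30 means 2:30" case concrete, here is a small sketch of a forgiving time-of-day parser built on DateTime.TryParseExact with a list of accepted formats. The format list is exactly the kind of decision you have to make yourself; the one below is an assumption for illustration, and the class name is invented.

```csharp
using System;
using System.Globalization;

// Illustrative forgiving parser for a time-of-day text box.
public static class ForgivingTime
{
    // Formats are tried in order; '.' is tolerated in place of ':'.
    private static readonly string[] Formats =
    {
        "H:mm", "H.mm", "HHmm", "h:mm tt", "h.mm tt"
    };

    public static bool TryParse(string input, out TimeSpan timeOfDay)
    {
        timeOfDay = default(TimeSpan);
        if (string.IsNullOrWhiteSpace(input)) return false;

        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            input.Trim(),
            Formats,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AllowWhiteSpaces,
            out parsed);

        // Only the time part is meaningful; the date part defaults to today.
        if (ok) timeOfDay = parsed.TimeOfDay;
        return ok;
    }
}
```

Here "2.30", "2:30" and "0230" all come back as 02:30, while "2:30 PM" becomes 14:30. Whether "2.30" should instead mean two and three tenths of an hour is a business decision no library can make for you.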
I require POS tagging for my files in the corpus.
I have successfully followed the installation instructions of SharpNlp
I am using the binary version
I created a new c# project in: E:\sharp\sharpapp
location of Models Folder is: E:\sharp\sharpapp\bin\Models
location of my SharpNlp Binary is: E:\sharp\SharpNLP-1.0.2529-Bin
I have also followed the instructions to modify both .config files "ParseTree.Exe" and "ToolsExamples.Exe"
Now in my C# project I have a class called tagging.cs where I have to access my corpus text files and do POS tagging for those files. Can anybody show me how to use SharpNlp to do so?
Please provide the steps.
In a nutshell, SharpNLP is
a port to C# of OpenNLP Tools and OpenNLP MaxEnt
a connector to WordNet
a set of pre-computed models, mostly for the English language
utility modules such as integration with SQLite
It should be noted that the port of the OpenNLP libraries is relatively informal, with various class and property name changes, possibly loose preservation of features and semantics and no apparent connection with the original Java projects' lifecycle. This situation will likely ensure that in time the OpenNLP portion of SharpNLP will be more akin to distant cousins than twin sisters...
Nevertheless, it is possible to use examples and documentation from OpenNLP to complement the relatively thin support material available with SharpNLP. Between the source code of SharpNLP and resources like the OpenNLP API reference and the OpenNLP wiki, one can generally map things and adapt accordingly.
A loose guide could be the study of this particular source file, which makes use of OpenNLP in a way that seems close to what you may need. Note the name changes between OpenNLP and SharpNLP: for example, the POSTaggerME class becomes MaximumEntropyPosTagger, and the Parse() method and its overload turn into TagSentence() and such.
A more general hint is to understand...
...the sequence of steps typically necessary to perform POS Tagging.
This is a very high-level approximate description but, I think, useful.
get the text to be tagged = string(s) of text
Initialize a text parser
parse it = an "array" (or other container) with individual tokens i.e. words and punctuation characters.
initialize the POS Tagger, in particular tell it which model it should use
feed the [ordered] sequence of tokens to the POS Tagger
Ta dah! Use the POS tags for the eventual purpose of your NLP application.
Note how the above sequence assumes that the model is readily available.
The model is a representation of the statistical "profile" of text in general, obtained from training the Tagger with a set of text readily tagged.
SharpNLP comes with a model for generic English language, but in order to tag other languages or if the specific corpora to be tagged belongs to a particular domain (say medical reports or Tweets or...) it may be preferable to re-train the tagger to improve its precision.
Open/SharpNLP, like most POS Taggers, whether stand-alone or used through their API, typically includes features to train the tagger (= to produce a model given a sample set of text readily tagged) and also to verify the quality of the model/tagger so produced (= to compare the tags produced on a test set with the tags expected for this set).
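The tokenize-then-tag sequence listed earlier can be sketched in a few lines. To be clear, the interfaces and classes below are hypothetical stand-ins written for this answer, not SharpNLP's API: the tokenizer is a simple regex and the "model" is a lookup table, where a real tagger would load a trained statistical model. The point is only to show the shape of the pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical abstractions; SharpNLP's real classes have different names and signatures.
public interface ITokenizer { string[] Tokenize(string text); }
public interface IPosTagger { string[] Tag(string[] tokens); }

// Toy tokenizer: runs of word characters, or single punctuation marks.
public sealed class RegexTokenizer : ITokenizer
{
    public string[] Tokenize(string text)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\w+|[^\w\s]"))
            tokens.Add(m.Value);
        return tokens.ToArray();
    }
}

// Toy tagger: looks each lower-cased token up in a word-to-tag table.
public sealed class LookupTagger : IPosTagger
{
    private readonly IDictionary<string, string> _model;
    private readonly string _fallbackTag;

    public LookupTagger(IDictionary<string, string> model, string fallbackTag)
    {
        _model = model;
        _fallbackTag = fallbackTag;
    }

    public string[] Tag(string[] tokens)
    {
        var tags = new string[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            string tag;
            tags[i] = _model.TryGetValue(tokens[i].ToLowerInvariant(), out tag)
                ? tag
                : _fallbackTag; // unknown words get the fallback tag
        }
        return tags;
    }
}

public static class TaggingPipeline
{
    // Takes the text, tokenizes it, feeds the ordered tokens to the tagger,
    // and pairs each token with its tag.
    public static List<KeyValuePair<string, string>> Run(
        string text, ITokenizer tokenizer, IPosTagger tagger)
    {
        string[] tokens = tokenizer.Tokenize(text);
        string[] tags = tagger.Tag(tokens);

        var result = new List<KeyValuePair<string, string>>();
        for (int i = 0; i < tokens.Length; i++)
            result.Add(new KeyValuePair<string, string>(tokens[i], tags[i]));
        return result;
    }
}
```

In your tagging.cs you would read each corpus file into a string, call something like Run on it, and swap the two toy classes for SharpNLP's tokenizer and tagger once you have mapped their actual names from the SharpNLP source.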
Kindly read through the article that I have written for this. It will give you a detailed step by step method with sample code snippets.
Easy way of Integrating SharpNLP into your project in Visual Studio
I hope this was useful.