Recommended data format for describing the rules of chess - c#

I'm going to be writing a chess server and one or more clients for chess and I want to describe the rules of chess (e.g. allowable moves based on game state, rules for when a game is complete) in a programming language independant way. This is a bit tricky since some of the chess rules (e.g. King Castling, en passent, draws based on 3 or more repeated moves) are based not only on the board layout but also on the history of moves.
I would prefer the format to be:
textual
human readable
based on a standard (e.g. YAML, XML)
easily parsable in a variety of languages
But I am willing to sacrifice any of these for a suitable solution.
My main question is: How can I build algorithms of such a complexity that operate on such complex state from a data format?
A followup queston is: Can you provide an example of a similar problem solved in a similar manner that can act as a starting point?
Edit: In response to a request for clarity -- consider that I will have a server written in Python, one client written in C# and another client written in Java. I would like to avoid specifying the rules (e.g. for allowable piece movement, circumstances for check, etc.) in each place. I would prefer to specify these rules once in a language independant manner.

Let's think. We're describing objects (locations and pieces) with states and behaviors. We need to note a current state and an ever-changing set of allowed state changes from a current state.
This is programming. You don't want some "meta-language" that you can then parse in a regular programming language. Just use a programming language.
Start with ordinary class definitions in an ordinary language. Get it all to work. Then, those class definitions are the definition of chess.
With only miniscule exceptions, all programming languages are
Textual
Human readable
Reasonably standardized
Easily parsed by their respective compilers or interpreters.
Just pick a language, and you're done. Since it will take a while to work out the nuances, you'll probably be happier with a dynamic language like Python or Ruby than with a static language like Java or C#.
If you want portability. Pick a portable language. If you want the language embedded in a "larger" application, then, pick the language for your "larger" application.
Since the original requirements were incomplete, a secondary minor issue is how to have code that runs in conjunction with multiple clients.
Don't have clients in multiple languages. Pick one. Java, for example, and stick with it.
If you must have clients in multiple languages, then you need a language you can embed in all three language run-time environments. You have two choices.
Embed an interpreter. For example Python, Tcl and JavaScript are lightweight interpreters that you can call from C or C# programs. This approach works for browsers, it can work for you. Java, via JNI can make use of this, also. There are BPEL rules engines that you can try this with.
Spawn an interpreter as a separate subprocess. Open a named pipe or socket or something between your app and your spawned interpreter. Your Java and C# clients can talk with a Python subprocess. Your Python server can simply use this code.

There's already a widely used format specific to chess called Portable Game Notation. There's also Smart Game Format, which is adaptable to many different games.

I would suggest Prolog for describing the rules.

This is answering the followup question :-)
I can point out that one of the most popular chess servers around documents its protocol here (Warning, FTP link, and does not support passive FTP), but only to write interfaces to it, not for any other purpose. You could start writing a client for this server as a learning experience.
One thing that's relevant is that good chess servers offer many more features than just a move relay.
That said, there is a more basic protocol used to interface to chess engines, documented here.
Oh, and by the way: Board Representation at Wikipedia
Anything beyond board representation belongs to the program itself, as many have already pointed out.

Edit: Overly wordy answer deleted.
The short answer is, write the rules in Python. Use Iron Python to interface that to the C# client, and Jython for the Java client.

What I've gathered from the responses so far:
For chess board data representations:
See the Wikipedia article on [chess board representations](http://en.wikipedia.org/wiki/Board_representation_(chess)).
For chess move data representations:
See the Wikipedia articles on Portable Game Notation and Algebraic Chess Notation
For chess rules representations:
This must be done using a programming language. If one wants to reduce the amount of code written in the case where the rules will be implemented in more than one language then there are a few options
Use a language where an embedable interpreter exists for the target languages (e.g. Lua, Python).
Use a Virtual Machine that the common languages can compile to (e.g. IronPython for C#, JPython for Java).
Use a background daemon or sub-process for the rules with which the target languages can communicate.
Reimplement the rules algorithms in each target language.
Although I would have liked a declarative syntax that could have been interpreted by mutliple languages to enforce the rules of chess my research has lead me to no likely candidate. I have a suspicion that Constraint Based Programming may be a possible route given that solvers exist for many languages but I am not sure they would truly fulfill this requirement. Thanks for all the attention and perhaps in the future an answer will appear.

Drools has a modern human readable rules implementation -- https://www.jboss.org/drools/.
They have a way users can enter their rules in Excel. A lot more users can understand what is in Excel than in other tools.

To represent the current state of a board (including castling possibilities etc) you can use
Forsyth-Edwards Notation, which will give you a short ascii representation. e.g.:
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Would be the opening board position.
Then to represent a particular move from a position you could use numeric move notation (as used in correspondence chess), which give you a short (4-5 digits) representation of a move on the board.
As to represent the rules - I'd love to know myself. Currently the rules for my chess engine are just written in Python and probably aren't as declarative as I'd like.

I would agree with the comment left by ΤΖΩΤΖΙΟΥ, viz. just let the server do the validation and let the clients submit a potential move. If that's not the way you want to take the design, then just write the rules in Python as suggested by S. Lott and others.
It really shouldn't be that hard. You can break the rules down into three major categories:
- Rules that rely on the state of the board (castling, en passant, draws, check, checkmate, passing through check, is it even this player's turn, etc.)
- Rules that apply to all pieces (can't occupy the same square as another piece of your own colour, moving to a square w/ opponent's piece == capture, can't move off the board)
- Rules that apply to each individual piece. (pawns can't move backwards, castles can't move diagonally, etc)
Each rule can be implemented as a function, and then for each half-move, validity is determined by seeing if it passes all of the validations.
For each potential move submitted, you would just need to check the rules in the following order:
is the proposed move potentially valid? (the right "shape" for the piece)
does it fit the restraints of the board? (is the piece blocked, would it move off the edge)
does the move violate state requirements? (am I in check after this move? do I move through check? is this en passant capture legal?)
If all of those are ok, then the server should accept the move as legal…

Related

If it is possible to auto-format code before and after a source control commit, checkout, diff, etc. does a company really need a standard code style?

If it is possible to auto-format code before and after a source control commit, checkout, diff, etc. does a company really need a standard code style?
It feels like standard coding style debates that have been raging since programming began like "put the bracket on the following line" or "properly indent your (" are no longer essential.
I realize in languages where white space matters the diff will have to consider it but for languages where the style is a personal preference is there really a need to worry about it anymore?
Auto-format can really only address whitespace.
It wont address developers giving variables bizarre nonsensical names.
It won't address some developers having functions return null on an error vs throwing an exception.
I'm sure others can think of more examples.
This is what we do at my work:
We all use Eclipse. We don't have a policy for using Eclipse but somehow none of us is an IDEA/IntelliJ guy. We also think our code should be written with legacy in mind. This means our code has to be readable in a certain way even years after (#1) no matter who wrote it and if that person even is in the company anymore.
Eclipse has couple handy features, automatic format on save and a specific Formatter tool. As you can see from the linked screenshot, it can be configured with XML. Thus there's a bunch of premade XML:s available for every worker in our company so that when a new guy comes in, we walk him through of the whole process and configure their Eclipse for them (yes, it's slightly evil thing to do) so that it actually uses those formatting XML:s we have provided. We do not enforce automatic format on save, we don't want to be completely intrusive, we just want to push all our developers into the right directions. For even increased compatability, we mostly use rules defined in JCC.
Next comes the important part, the actual builds. We are those who embrace automatic builds and for that we use Hudson Continuous Integration Server. There's two important parts in our configurations beyond this:
We use CVS loginfo to trigger builds whenever something is committed.
We utilize several plugins available for Hudson, including Continuous Integration Game in conjuction with the most important one, Checkstyle.
The Checkstyle Plugin is the magician in our code style enforcement guide line:
After commiting code to CVS, Hudson build is triggered
After build has been completed succesfully (all unit tests pass etc.), Checkstyle inspects the actual source files
Checkstyle ranks the code based rules we have defined for it
Continuous Integration Game sees the result of Checkstyle and awards/takes away points for the person who has the ownership for the relevant part of the code
Leaderboard shows total points for every commiter in system
Basically this means that when anyone commits ugly code into our CVS, our build server automatically reduces that person's points.
This means that eventually any one of us can be ranked on the Leaderboard based on the general code quality in both look and OO principles such as Law of Demeter, Cyclomatic complexity etc. etc. Naturally this isn't a completely serious statistic, but it's a good indication you're doing something wrong when causing a build to be initiated in our CI won't reduce your points - most of our commits are worth between 1 and 5 points.
And is it working? Sort of, I don't think anyone of us at my work writes ugly or unmaintainable code and personally I love to hunt all kinds of scores so it's definitely motivating me to make code that looks nice and follows all the OO paradigms I know of.
And do we as a company really need it? I think we do as you should see from reading this entire answer, it can be considered a good practice for the advancements it brings.
#1: in a related note, I refactored legacy code from 2002 today which used those standards, didn't look "bad" at all even in its original form and certainly not worse in its new form
No, not really.
If you can actually get it to work consistently and not make it flag code has changed due to a different style of laying the code out.
However, this is just a small part of coding standards. It won't cover multiple return statements, the use or not of ternary operators, etc.
It is always nice if the coding style that the shop uses is the same one that is also followed by the development tools.
Otherwise, if there is a large body of code that already follows a shop standard which is NOT the same as that of the tools you have two choices:
Modify all of the code to follow the tool standard, or
Maintain the existing shop standard.
Many shops do the latter. Regardless, there does need to be some kind of standard, and it does need to be followed.
Some development tools allow you to tweak their standard. In some cases you may be able to bring the tools in alignment with the shop standard.
It probably doesn't matter that much anymore if you can ensure that everybody in the team sees the source code "correctly" formatted, whatever they think it is. However I've not seen a system that can do that - you can do parts of it (say, reformat before and after checkin/checkout) but these days you also have to consider web interfaces into the version control, external code review systems that interact directly with the version control system etc.
The main purpose of a standard code style is (IMHO) to ensure that you can read other team members' code easily without having to start reverse engineering it because all the code is written using the same sort of guiding principles. Indenting and parentheses placement seem to be a major hangup on this but they are only a very small and in my opinion, somewhat overblown and not very important part of the need to make code consistent.
Unfortunately I'm not aware of any tools that can automatically apply consistent coding principles to source code...
Yes, coding styles are needed if there is a desire to have a homogeneous code base. Such a code base can be useful in preventing individual ownership of parts of the code base, which can cause problems when people leave the team. If you can't imagine having wildly different styles and problems understanding all of it, just look at all the different ways English text can be organized in various communications, all written but quite different such as tweets, e-mail, text messages, IM, message board posts, etc. and changes in fonts, capitalization, decorations, etc.

What Python features will excite the interest of a C# developer? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
For someone who’s been happily programming in C# for quite some time now and planning to learn a new language I find the Python community more closely knit than many others.
Personally dynamic typing puts me off, but I am fascinated by the way the Python community rallies around it. There are a lot of other things I expect I would miss in Python (LINQ, expression trees, etc.)
What are the good things about Python that developers love? Stuff that’ll excite me more than C#.
For me its the flexibility and elegance, but there are a handful of things I wish could be pulled in from other languages though (better threading, more robust expressions).
In typical I can write a little bit of code in python and do a lot more than the same amount of lines in many other languages. Also, in python code form is of utmost importance and the syntax lends its self to highly readable, clean looking code. That of course helps out with maintenance.
I love having a command line interpreter that I can quickly prototype an algorithm in rather than having to start up a new project, code, compile, test, repeat. Not to mention the fact I can use it to help me automate my server maintenance as well (I double as a SA for my company).
The last thing that comes to mind immediately is the vast amounts of libraries. There are a lot of things already solved out there, the built-in library has a lot to offer, and the third party ones are many times very good (not always though).
Being able to type in some code and get the result back immediately.
(Disclaimer: I use both C# and Python regularly, and I think both have their good and bad points.)
I'm primarily .NET developer and using Python for me personal projects.
What are the good things about python that developers love?
I can say for myself - Python is like a breath of fresh air.
1) It's simple to learn, took about a week for me in the evenings. I'm saying about Python + Django. Python syntax is quite simple.
2) It's simple to use. No troubles installing Python + Django on Windows at all.
3) It can be run on Windows and UNIX.
4) I need it for web, so I get cheaper hosting than ASP.NET.
5) All the advantages of Python language over C#. Like tuples - so useful!
The only thing I don't like is that my favorite IDE Visual Studio doesn't support it (I know about IronPython, don't you worry).
I'm a very heavy user of both C# and Python; I've built very complicated applications in both languages, and I've also embedded Python scripting in my major C# application. I'm not using either to do much in the way of web work right now, but other than that I feel like I'm pretty qualified to answer the question.
The things about Python that excite me, in particular:
The deep integration of generators into the language. This was the first thing that made me realize that I needed to take a long, serious look at Python. My appreciation for this has deepened considerably since I've become conversant with the itertools module, which looks like a nifty set of tools but is in fact a new way of life.
The coupling of dynamic typing and the fact that everything's an object makes pretty sophisticated techniques extremely simple to implement. It's so easy to replace logic with tables in Python (e.g. o = class_map[k]() instead of if k='foo': o = Foo()) that it becomes a basic technique. It's so normal in Python to write methods that take methods as parameters that you don't raise an eyebrow when you see d = defaultdict(list).
zip, and the methods that are designed with it in mind. It takes a while before you can intuitively grasp what dict(zip(k, v)) and d.update(zip(k, v)) are doing, but it's a paradigm-shifting moment when you get there. An entire universe of uninteresting and potentially error-laden code eliminated, just by using one function. Then you start designing functions and classes with the expectation that they'll be used in conjunction with zip, and suddenly your code gets simpler and easier. (Protip: Or itertools.izip. Or itertools.izip_longest.)
Speaking of dictionaries, the way that they're deeply integrated into the language. Understanding what a line of code like self.__dict__.update(**kwargs) does is another one of those paradigm-shifting moments.
List comprehensions and generator expressions, of course.
Inexpensive exceptions.
An interactive intepreter.
Function decorators.
IronPython, which is so much simpler to use than we have any right to expect.
And that's without even getting into the remarkable array of functionality in the standard modules, or the ridiculous bounty of third-party tools like BeautifulSoup or SQL Alchemy or Pylons.
One of the most direct benefits that I've gotten from getting deeply into Python is that it has greatly improved my C# code. I could generally understand code that had a variable of type Dictionary<string, Action<Foo>> in it, but it didn't seem natural to write it. (I use static dictionaries to replace hard-coded logic far more frequently today than I did a year ago.) I have no difficulty understanding what LINQ is doing now, or how IEnumerable<T> and return yield work.
So what don't I like about Python?
Dynamic typing really limits what you can do with static code analysis. Not only isn't there a tool like Resharper for Python, in a language where it's possible to write getattr(x, y)() there really can't be.
It has a bunch of inelegant conventions. How I would love to be able to go back in time and try to talk GVR out of the idea that lambda expressions should be introduced with the word lambda - it's pretty damning that something as fundamental as lambda expressions should be more concise in C# than they are in Python. The leading and trailing double-underscore convention is horrible, and the fact that people mutely acquiesce to it is testimony to Dostoevsky's observation that man is the animal who can get used to anything. And don't get me started on the fact that a module with the name of StringIO was allowed to get out the door.
Some of the features that make Python work on multiple platforms also make it kind of baffling. It's easy to use import, but it's really not easy to understand what the hell it's actually doing. (Where is it looking? What does __init__.py do? Etc.)
The amazingly rich library of standard modules is so amazingly rich that it's hard to know what's in it. It's often easier to write a function than it is to find out whether or not there's something in the standard library that does the same thing - I'm looking at you, itertools.chain.
Your question is kind of like a plumber asking why carpenters are always going on and on about hammers. After all the plumber doesn't have a hammer and has never missed it. Python (even IronPython) and C# target different types of developers and different types of programs. I am very comfortable in Python and enjoy the freedom to focus on the business rules without being distracted by the syntax requirements of the language. On the other hand I have written some fairly substantial code in C# and would be very concerned about the lack of type safety had I taken on the same task in Python. This is not to say that Python is a "toy" language. You can (and people have) write a complete medium or large application in Python. You have the freedom of dynamic typing, but you also have the responsibility to keep it all straight (frameworks help here). Similarly you can write a small application in C#, but you will bring along some overhead you do not likely need.
So if the problem is a nail use a hammer, if the problem is a screw use a screw driver. In other words spend some time to learn Python, get to know it's streangths (text processing, quick coding cycles, simple clean code, etc) and then when you are looking at tackling a new problem ask whether you would be better off in Python or C#. One thing is certain. So long as C# is the only programming language you know, it is the only one you will ever use.
Pat O
My language of choice is C#, and I didn't quite see the point for me to learn Python so far. This talk from PDC09 really piqued my interest: the guy demonstrates how you can use IronPython (or IronRuby) to make a C# app scriptable (in his demo, drop a Python script in a text box, and it works with/extends your C# code). I found this really fascinating: I don't even know where I would start to do something similar in C#, and this made me at least appreciate that it brings something different to the table, which could really enrich what I can develop!
I'm an asymmetrical user of both languages, in a sense that I use C# mostly professionally and Python for all my "fun" projects (not that work is never fun, but... you know...)
This difference of context may skew my perspective, including my opinion that they are two distinct types (pun intended) of languages for, generally, distinct purposes.
This said, it may not be a coincidence that Python is, at this point in time, [one of?] the languages of choice for all kinds of cutting edge, somewhat scholarly, technology/science oriented projects. (And BTW, this "scholarly" keyword here doesNOT imply, that Python is a university toy, plenty of "serious" applications in plenty of domains/industry are proof to the contrary). This may be due to several factors:
(I don't develop most points, readily well expressed in other responses)
the openness and quasi universal availability of Python (unlike C# !)
the lightweight / ease of use / low learning curve
the extensive, high quality, "standard" library and the extensiver (and occasionally bum quality, but on the whole available, open-sourced, etc.) additional library.
the wide array of open source projects in Python language
the relative ease to bind with C/C++ for reusing legacy code, but also for placing performance-critical portions of a project
the generally higher level of abstraction of may constructs of the language
the multi-paradigms (imperative, object oriented and functional)
the availability of practitioners in so many domain of science and technology
and, yes, the
"herd mentality effect" mentioned in a remark, possibly in a [self?] deriding way. The fact that a language attracts a broad, "closely knit" community, makes it attractive too, beyond the superficial ("look cool" and such) traits of herd mentality. Put in broader context, sometimes the best technology/language to use is not measured on the its intrinsic merits but on the overall "picture", including the user community.
I like all stuff with [] and {}. Selectors like this [-1:1]. Possibility to write less code, but more something meaningfull, that gives to write Models and other declarative things very DRY.
Like any programming language, it is just a tool in the box or a brush by which you may paint your creation. Any creative endeavour requires that the artist loves the tools he uses; otherwise, the outcome suffers. Some people like Python for the same reason others love Perl. Incidentally, I have found that most Python lovers loathe Perl's flexible and expressive syntax. As a Perl lover, I don't hate Python, but consider it to be overly structured and restrictive.
If you ask me, all of these throngs of people who seem to love Python were silently suffering under the tool choices before Python came into being. Some suffered under Perl, others under something else. In other words, I believe that when Python came along, it found a large group of silent sufferers longing for a tool like Python.
I can't program in Python because I can't "think" in Python. I can "think" in Perl, therefore, it is the tool I prefer. The silently suffering mass of, now, Python users seem to have found some long lost salvation. Now if they could only keep their evangelism to themselves :).
If you are familiar with the .NET CLR and prefer a statically-typed language, but you like Python's lightweight syntax, then perhaps Boo is the language for you.
Don't get me wrong, I am and will always be a devoted fan of C#.
But sometimes there are things I can't do in C#. lthough C# keeps reducing those gaps, Python is still the language I go to to fill them.
It's dynamic, flexible, powerful, and clean. Lovely language. Whenever I need to script or build dynamic or functional (as in functional programming) software, I go Python.
For me Python is the most elegant language I've used. The syntax is minimalist (significantly less punctuation than most) and intentionally modeled after the psuedo-code conventions which are ubiquitously used by programmer to outline their intentions.
Python's if __name__ == '__main__': suite encourages re-use and test driven development.
For example, the night before last I hacked together to run thousands of ssh jobs (with about 100 concurrently) and gather up all the results (output, error messages, exit values) ... and record the time take on each. It also handles timeouts (An ssh command can stall indefinitely on connection to a thrashing system --- it's connection timeouts and retry options don't apply after the socket connection is made, not matter if the authentication stalls). This only takes a few dozen lines of Python and it's really is easiest to create it as a class (defined above the __main__ suite) and do my command line parsing in a simple wrapper down inside __main__. That's sufficient to do the job at hand (I ran the script on 25,000 hosts the next day, in about two hours). It I can now use this code in other scripts as easily as:
from sshwrap import SSHJobMan
cmd = '/etc/init.d/foo restart'
targets = queryDB(some_criteria)
job = SSHJobMan(cmd, targets)
job.start()
while not job.done():
completed = job.poll()
# ...
# Deal with incremental disposition of of completed jobs
for each in sorted(job.results):
# ...
# Summarize results
... and so on.
So my script can be used for simple jobs ... and it can be imported as a module for more specialized work that couldn't be described on my wrapper's command line. (For example I could start up "consumer" subprocesses for handling other work on each host where the job was successful while spitting out service tickets or automated reboot requests for all hosts reporting timeouts or failures, etc).
For modules which have no standalone usage I can use the __main__ suite to contain unit-tests. Thus every module can contain its own tests ... which, in fact, can be integrated into the "doc strings" using the doctest module from the standard libraries. (Which, incidentally, means that properly formatted examples in the documentary comments can be kept in sync with the implementation ... since they are parts of the unit-test suite).
The main thing I like about Python is its very concise, readable syntax. Though using indentation as a block delimiter can seem strange at first, once you begin to code a lot in the language I find it begins to make sense. Though the core language is quite simple, its more advanced features, e.g. list comprehension, decorators and generators, are rather useful too.
In addition, the Python standard library is just fantastic; its documentation is very well written, and it contains a lot of very useful packages. I also find that there are plenty of good bindings for C libraries, such as PyGTK, Webkit and Qt, to name but a few.
One caveat is that Python, like most dynamic languages, is quite slow in comparison with compiled, statically-typed languages. However, you can easily extend it with C, allowing you to write code requiring better performance in C and the rest in Python.
It's a great language overall, and (for me at least) makes coding more productive and enjoyable.

Detect language of text [duplicate]

This question already has answers here:
How to detect the language of a string?
(9 answers)
Closed 8 years ago.
Is there any C# library which can detect the language of a particular piece of text? i.e. for an input text "This is a sentence", it should detect the language as "English". Or for "Esto es una sentencia" it should detect the language as "Spanish".
I understand that language detection from text is not a deterministic problem. But both Google Translate and Bing Translator have an "Auto detect" option, which best-guesses the input language. Is there something similar available publicly, preferably in C#?
Yes indeed, TextCat is very good for language identification. And it has a lot of implementations in different languages.
There were no ports in .Net. So I have written one: NTextCat (NuGet, Online Demo).
It is pure .NET Standard 2.0 DLL + command line interface to it. By default, it uses a profile of 14 languages.
Any feedback is very appreciated! New ideas and feature requests are welcomed too :)
Language detection is a pretty hard thing to do.
Some languages are much easier to detect than others simply due to the diacritics and digraphs/trigraphs used. For example, double-acute accents are used almost exclusively in Hungarian. The dotless i ‘ı’, is used exclusively [I think] in Turkish, t-comma (not t-cedilla) is used only in Romanian, and the eszett ‘ß’ occurs only in German.
Some digraphs, trigraphs and tetragraphs are also a good give-away. For example, you'll most likely find ‘eeuw’ and ‘ieuw’ primarily in Dutch, and ‘tsch’ and ‘dsch’ primarily in German etc.
More giveaways would include common words or common prefixes/suffixes used in a particular language. Sometimes even the punctuation that is used can help determine a language (quote-style and use, etc).
If such a library exists I would like to know about it, since I'm working on one myself.
Please find a C# implementation based on of 3grams analysis here:
http://idsyst.hu/development/language_detector.html
Here you have a simple detector based on bigram statistics (basically means learning from a big set which bigrams occur more frequently on each language and then count those in a piece of text, comparing to your previously detected values):
http://allantech.blogspot.com/2007/07/automatic-language-detection.html
This is probably good enough for many (most?) applications and doesn't require Internet access.
Of course it will perform worse than Google's or Bing's algorithm (which themselves aren't great). If you need excellent detection performance you would have to do both a lot of hard work and over huge amounts of data.
The other option would be to leverage Google's or Bing APIs if your app has Internet access.
You'll want a machine learning algorithm based on hidden markov chains, process a bunch of texts in different languages.
Then when it gets to the unidentified text, the language that has the closer 'score' is the winner.
There is a simple tool to identify text language:
http://www.detectlanguage.com/
I've found that "textcat" is very useful for this. I've used a PHP implementation, PHP Text Cat, based on this this original implementation, and found it reliable. If you have a look at the sources, you'll find it's not a terrifyingly difficult thing to implement in the language of your choice. The hard work -- the letter combinations that are relevant to a particular language -- is all in there as data.

What's the "Hello World!" of genetic algorithms good for?

I found this very cool C++ sample , literally the "Hello World!" of genetic algorithms.
I so decided to re-code the whole thing in C# and this is the result.
Now I am asking myself: is there any practical application along the lines of generating a target string starting from a population of random strings?
EDIT: my buddy on twitter just tweeted that "is useful for transcription type things such as translation. Does not have to be Monkey's". I wish I had a clue.
Is there any practical application along the lines of generating a target string starting from a population of random strings?
Sure. Imagine any scenario in which you know how to evaluate the fitness of a particular string, and in which the choices are discrete and constrained in some way:
Picking pronounceable names ("Xhjkxc" has low fitness; "Artekzo" has high fitness)
Trying out a series of chess moves
Guessing the combination to a safe, assuming you can tell how close you are to unlocking each tumbler
Picking phone numbers that evaluate to words (e.g. "843-2378" has high fitness because it spells "THE-BEST")
No. Each time you run the GA, you are giving it the eventual answer. This is great for showing how a GA works and to show how powerful it can be, but it does not have any purpose beyond that.
You could write an EA that writes code in a dynamic language like IronPython with the goal of creating code that a) executes without crashing and b) analyzes the stock market and intelligently buys and sells stock.
That's a very simplistic take on what would be necessary, but it's possible. You would need a host that provides a lot of methods for the IronPython code (technical indicators, etc) and a database of ticks.
It would also be smart to not just generate any old random code, lest you format your own hard-drive. You need a sandbox, and you need to limit the namespaces that are accessable, and you would need to provide a time limit to avoid infinite loops. You could also provide symantic guidelines that allow it to choose appropriate approved keywords instead of just stringing random letters together -- this would greatly speed up evolution.
So, I was involved with a project that did everything but the EA. We had a satellite dish that got real-time stock ticks from the NASDAQ, a service for trading that had an API, and a primitive decision making "brain" that made decisions as the ticks came in.
Sadly, one of the partners flipped out, quit his job, forked the project (got his own dish, etc), and started trading with logic that wasn't ready. He lost a bunch of money. It turns out that for some people this type of project is only a step away from common gambling. But anyway, the project kind of fizzled out after that. Evolving the logic part is the missing link though. And I know there are people out there doing this type of thing.
I have used GA in 2 real life research problems.
One was a power optimization problem (maximize number of appliances turned on, meeting the available power constraint and service guarantee for each appliance)
Another was for radio network optimization, maximizing the coverage area given a fixed equipment budget
GA has one main disadvantage, it usually works with genetic speed so using it in some serious time-dependant projects is quite risky.

Should you use international identifiers in Java/C#?

C# and Java allow almost any character in class names, method names, local variables, etc.. Is it bad practice to use non-ASCII characters, testing the boundaries of poor editors and analysis tools and making it difficult for some people to read, or is American arrogance the only argument against?
I would stick to english, simply because you usually never know who is working on that code, and because some third-party tools used in the build/testing/bugtracking progress may have problems. Typing äöüß on a Non-German Keyboard is simply a PITA, and I simply believe that anyone involved in software development should speak english, but maybe that's just my arrogance as a non-native-english speaker.
What you call "American arrogance" is not whether or not your program uses international variable names, it's when your program thinks "Währung" and "Wahrung" are the same words.
I'd say it entirely depends on who's working on the codebase.
If you have a small group of developers who all share a common language and you don't ever plan needing anyone who doesn't speak the language to work on the code then go ahead and use whatever characters you want.
If you need to have people of varying cultures and languages working on the code then it's probably best to stick with English since it's the common denominator for just about everyone in the world.
If your business are non-English speakers, and you think Domain Driven Design has something to it, then there is another aspect: How do we, as developers, use the same domain language as our business without any translation overhead?
That does not only mean translations between languages, say English and Norwegian, but also between different words. We should use the exact same words as our business for our entity classes and services.
I have found it easier to just give in and use my native language. Now that my code use the same words, it's easier to have a conversation with my domain experts. And after a while you get used to it, just like how you got used to code without Hungarian notation.
I used to work in a development team that happily wiped their asses with any naming (and for that matter any other coding) conventions. Believe it or not, having to cope with ä's and ö's in the code was a contributing factor of me resigning. Though I'm Finnish, I prefer writing code with US keyboard settings because curly and square brackets are a pain to write in a Finnish keyboard (try right alt and 7 and 0 for curlies).
So I say stick with the ascii characters.
Here's an example of where I've used non-ASCII identifiers, because I found it more readable than replacing the greek letters with their English names. Even though I don't have θ or φ on my keyboard (I relied on copy-and-paste.)
However these are all local variables. I would keep non-ASCII identifiers out of public interfaces.
It depends:
Does your team conform to any existing standards that require your using ASCII?
Is your code ever going to be feasibly reused or read by someone who doesn't speak your native language?
Do you envision a scenario where you'll need to ask for help online and will therefore not be able to copy-paste your code sample in as-is?
Are you certain your entire suite of tools support code encoding?
If you answered 'yes' to any of the above, stay ASCII only. If not, go forward at your own risk.
Part of the problem is that the Java/C# language and its libraries are based on English words like if and toString(). I personally would not like to switch between non-English language and English while reading code.
However, if your database, UI, business logics (including metaphors) are already in some non-English language, there's no need to translate every method names and variables into English.
IF you get past the other prerequisites you then have one extra (IMHO more important) one - How difficult is the symbol to type.
On my regular en-us keyboard, the only way I know of to type the letter ç is to hold alt, and hit 0227 on the numeric keypad, or copy and paste.
This would be a HUGE big roadblock in the way of typing quickly. You don't want to slow your coding down with trivial stuff like this if you aren't forced to. International keyboards may alleviate this, but then what happens if you have to code on your laptop which doesn't have an international keyboard, etc?
I would stick to ASCII characters because if anyone in your development team is using an SDK that only supports ASCII or you wanted to make your code open source, alot of problems could arise. Personally, I would not do it even if you are not planning on bringing anyone who doesn't speak the language in on the project, because you are running a business and it seems to me that one running a business would want his business to expand, which in this day and age means transcending national borders. My opinion is that English is the language of the realm, and even if you name your variables in a different language, there is little to no point to use any non-ASCII characters in your programming. Leave it up to the language to deal with it if you are handling data that is UTF8: my iPhone program (which involves tons of user data going in between the phone and server) has full UTF8 support, but has no UTF8 in the source code. It just seems to open such a large can of worms for almost no benefit.
There is another hazzard to using non-ASCII characters, though it will probably only bite in obscure cases. The allowed characters are defined in terms of the methods Character.isJavaIdentifierStart(int) and Character.isJavaIdentifierPart(int), which are defined in terms of Unicode. However, the exact version of Unicode used depends on the version Java platform, as specified in the documentation for java.lang.Character.
Since character properties change slightly from one Unicode version to the next, it's possible (but probably very unlikely) you could have identifiers that are valid in one version of Java, but not in the next.
As already pointed out, unless method names mostly match the language, it is a bit weird to constantly switch languages while reading.
For the Scandinavian languages & German, which I can speak and thus speak for, I would at least recommend using standard substitutions, ie.
ä/æ -> ae, ö/ø -> oe, å -> aa, ü -> ue
etc. just in case as others may find it difficult to type the original letters without keyboard/keymap changes. Think if you suddenly had to work with a codebase where the developers used a third language (for instance including the French ç) and didn't do this.. Switching between more than 2 keymaps to type efficiently would be painful in my experience.

Categories

Resources