I'm Creating a Multi Language website with at least 5 language, what should I consider
On a technical front not a lot, you can use a framework like Zend or Kohana or Rails etc. which usually have the ability to replace the content with tags and then fill the tag with the language of choice at run time. The different languages reside in appropriately named directories and can be triggered by the browser language tag or another mechanism. If you are not using a framework with this facility then study one to see how it is done.
After that and in no particular order.
Why multilingual? you really need a compelling reason to do it as the workload you are taking on is large and complex and onerous. I know all the reasons about how and why people like sites in their native language but for a multi lingual site's investment in money you need to be making a proper return on the investment and not just doing it for the sake of it.
Localising, L10N or internationalisation i18n, is not just about language. It is about about cultural differences. Anglo Saxons like cool restrained san-serif type sites. Latin and Latin American cultures like more vibrant colours and cursive type faces. And so on. So you need to have a mechanism that will change the css for each language as well (well to be truly effective you do)
Who is doing the translation? Remember it is an idiomatic translation you need, Google Translate API will not cut it and you need a native speaker to translate from and to the target. So for example if you are using FIGS (French Italian etc) from an English original. You need a translator for English to French, English to German, English to Italian, and English to Spanish (see the costs mounting?). A good bureau will provide all this for you and manage the process though.
Proof reading. Can you speak 5 languages well enough to check that above work is correct?
Maintenance. Again assuming English is the base language and there is a new page or a page rewrite or even a typo you need to go through the above process and update the site so you need a good workflow and process control system to ensure that the changes and updates work effectively. The ongoing maintenance can be crippling in time and cost, work it out before you start.
Beware of advice form people who localise programmes/applications It is not remotely the same thing.
Many solutions actually use separate web sites for each localisation rather than the all in one approach. This can be counter intuitive when we want to put it all into one "technical" package. However you can by separating the sites easily cope with different styles, and character sets and ltr text etc. You can stagger updates and manage the workflow more effectively, adding new site is far far simpler and it allow you to use the different URL's that may be required and allows you to optimize each site for SEO.
Please see: Best Practices for Developing World-Ready Applications
Walkthrough: Using Resources for Localization with ASP.NET
When I developed bi-directional web applications, I found the following practices helped too much:
To make your page as easy to globalize
as possible, follow these best
practices:
❑ Avoid using absolute positioning and
sizes for controls.
❑ Use the entire width and height of
forms.
❑ Size elements relative to the
overall size of the form.
❑ Use a separate table cell for each
control.
❑ Avoid enabling the NoWrap property
in tables.
❑ Avoid specifying the Align property
in tables.
source: MCTS Self-Paced Training kit: Microsoft .NET Framework 2.0 Web-based client development.
As a general tip, I found using tables instead of just DIVs is very helopful.
Related
I've been tasked to create (or seek something that is already working) a centralized server with an API that has the ability to return a PDF file passing some data, and the name of the template, it has to be a robust solution, enterprise ready. The goal is as follows:
A series of templates for different company things. (Invoices, Orders, Order Plannings, etc)
A way of returning a PDF from external software (Websites, ERP, etc)
Can be an already ready enterprise solution, but they are pressing for a custom one.
Can be any language, but we don't have any dedicated Java programmers in-house. We are PHP / .NET, some of us dabble, but the learning curve could be a little steep.
So, I've been reading. One way we've thought it may be possible is installing a jasper reports server, and creating the templates in Jaspersoft Studio, then using the API to return the PDF files. A colleague stands for this option, because it's mostly done, but 1º is java and 2º I think it's like using a hammer to crack a nut.
Other option we've been toying with is to use C# with iTextSharp to create a server, and create our own API that returns exactly the PDF with the data we need. Doing this we could have some benefits, like using the database connector we have already made and extracting most of the data from the database, instead of having to pass around a big chunk of data, but as it is bare, it doesn't really have a templating system. We'd have create something from with the XMLWorker or with c# classes but it's not really "easy" as drag and drop. For this case I've been reading about XFA too, but documentation on the iText site is misleading and not clear.
I've been also reading about some other alternatives, like PrinceXML, PDFBox, FOP, etc, but the concept will be the same as iText, we'd have to do it ourselves.
My vote, even if it's more work is to go the route of iText and use HTML / CSS for the templates, but my colleagues claim that the templates should be able to be changed every other week (I doubt it), and be easy. HTML / CSS would be too much work.
So the real question is, how do other business approach this? Did I leave anything out on my search? Is there an easier way to achieve this?
PS: I didn't know if SO would be the correct place for this question, but I'm mostly lost and risking a "too broad question" or "off topic" tag doesn't seem that bad.
EDIT:
Input should be sent with the same request. If we decide the C# route, we can get ~70% of the data from the ERP directly, but anyway, it should accept a post request with some data (template, and data needed for that template, like an invoice data, or the invoice ID if we have access to the ERP).
Output should be a PDF (not interested in other formats, just PDF).
Templates will be updated only by IT. (Mostly us, the development team).
Performance wise, I don't know how much muscle we'll need, but right now, without any increase, we are looking at ~500/1000 PDFs daily, mostly printed from 10 to 10.30 and from 12 to 13h. Then maybe 100 more the rest of the day.
TOP performance should not be more than ~10000 daily when the planets align, and is sales season (twice a year). That should be our ceiling for the years to come.
The templates have some requirements:
Have repeating blocks (invoice lines, for example).
Have images as background, as watermark and as blocks.
Have to be multi language (translatable, with the same data).
Have some blocks that are only show on a condition.
Blocks dependent on the page (PDF header / page header / page footer / PDF footer)
Template will maybe have to do calculations over some of the data, I don't think we'll ever need this, but it's something in the future may be asked by the company.
The PDFs don't need to be stored, as we have a document management system, maybe in the future we could link them.
Extra data: Right now we are using "Fast-Reports v2 VCL"
Your question shows you've been considering the problem in detail before asking for help so I'm sure SO will be friendly.
Certainly one thing you haven't detailed much in your description is the broader functional requirements. You mentioned cracking a nut with a hammer, but I think you are focused mostly on the technology/interfacing. If you consider your broader requirements for the documents you need to create, the variables involved, it's might be a bigger nut that you think.
The approach I would suggest is to prototype solutions, assuming you have some room to do so. From your research, pick maybe the best 3 to try which may well include the custom build you have in mind. Put them through some real use-cases end to end - rough as possible but realistic. One or two key documents you need to output should be used across all solutions. Make sure you are covering the most important or most common requirements in terms of:
Input Format(s) - who can/should be updating templates. What is the ideal requirement and what is the minimum requirement?
Output Requirement(s) - who are you delivering to and what formats are essential/desirable
Data Requirement(s) - what are your sources of data and how hard/easy is it to get data from your sources to the reporting system in the format needed?
Template feature(s) - if you are using templates, what features do the templates need? This includes input format(s) but I was mostly thinking of features of the engine like repeating/conditional content, image insertion, table manipulation etc. ie are your invoices, orders and planning documents plain or complex
API requirements - do you have any broader API requirements. You mentioned you use PHP so a PHP library or Web/Web Service is likely to be a good starting point.
Performance - you haven't mentioned any performance characteristics but certainly if you are working at scale (enterprise) it would be worth even rough-measuring the throughput.
iText and Jasper are certainly enterprise grade engines you can rely on. You may wish to look at Docmosis (please note I work for the company) and probably do some searches for PDF libraries that use templates.
A web service interface is possibly a key feature you might want to look at. A REST API is easy to call from PHP and virtually any technology stack. It means you will likely have options about how you can architect a solution, and it's typically easy to prototype against. If you decide to go down the prototyping path and try Docmosis, start with the cloud service since you can prototype/integrate very quickly.
I hope that helps.
From my years of experience in working with PDF I think you should pay attention to the following points:
The performance: You may do the fastest performance with API based pdf files generation in comparision to HTML or XML to PDF generation (because of an additional layer of conversion involved). Considering peaks in the load you may want to calculate the cost of scaling up the generation by adding more servers (and estimate the cost of additional servers or resources required per additional pdf file per day).
Ease of iterations and changes: how often will you need to adjust templates? If you are going to create templates just once (with some iterations) but then no changes required then you should be OK by just coding them using the API. Otherwise you should strongly consider using HTML or XML for templates to simplify changes and to decrease the complexity of making changes in templates;
Search and indexing: If you may need to run search among created documents then you should consider storing indexes of documents generated or maybe store more the source data in XML along with PDF file generated;
Long time preservation: you should better conform to PDF/A sub-format in case you are looking for a long time digital preservation for your documents. See the VeraPDF open source initiative that you may use to validate generated and incoming PDF documents against the conformance to PDF/A requirements;
Preserving source files The PDF format itself was not designed to be edited (though there are some PDF editors already) so you may consider the need of preserving the source data to be able to regenerate PDF documents later and probably introduce additional output formats later.
The question might sound weird, but I am planning to create a asp.net website, which when fully done, will ideally cater to all countries.
I am currently in the architecture phase.. and is there anything that I should keep in might when doing this?
like
saving all datetime fields in utc
using user's timezone to display all time related data
all labels in the website to be localizable
is there anything else??
thanks,
Chris
A few additional points:
Some languages read from Right to
Left (Hebrew for example), which
will affect your UI.
Make sure your datastore supports
unicode (NVARHCAR vs VARCHAR).
Provide an easy way for translators
to contribute content. Usually means
creating a Data Driven Resource
Provider.
Internationalization and Localization is a good place to start.
You should think about how the localization process will take place. I assume you are not a native speaker in all languages you want to use for your application.
There are several approachs on how to address this: For example, there are companies that specialize in localization, meaning you give them an excel sheet, or an xml file.
You should also think about, where do you want to have all these localizations. Do you only want them in your ASP.net application, meaning in only one place? Then the resource file will be your way to go, because they are easy to handle and easy to send to localization studios.
But if you want to use the localizations in more than one place, you need to store them in a web service or in a database. Keep in mind that using localizations across multiple plattforms (e.g. web site, administrative tools) will force you to write import/export functionality for the used tables. (Because you won't give the localization company access to your database)
I would start by looking here: http://msdn.microsoft.com/en-us/library/c6zyy3s9.aspx.
I also guess you are working on doing a SQL database. If that is the case look at things like using nvarchars.
If it is possible to auto-format code before and after a source control commit, checkout, diff, etc. does a company really need a standard code style?
It feels like standard coding style debates that have been raging since programming began like "put the bracket on the following line" or "properly indent your (" are no longer essential.
I realize in languages where white space matters the diff will have to consider it but for languages where the style is a personal preference is there really a need to worry about it anymore?
Auto-format can really only address whitespace.
It wont address developers giving variables bizarre nonsensical names.
It won't address some developers having functions return null on an error vs throwing an exception.
I'm sure others can think of more examples.
This is what we do at my work:
We all use Eclipse. We don't have a policy for using Eclipse but somehow none of us is an IDEA/IntelliJ guy. We also think our code should be written with legacy in mind. This means our code has to be readable in a certain way even years after (#1) no matter who wrote it and if that person even is in the company anymore.
Eclipse has couple handy features, automatic format on save and a specific Formatter tool. As you can see from the linked screenshot, it can be configured with XML. Thus there's a bunch of premade XML:s available for every worker in our company so that when a new guy comes in, we walk him through of the whole process and configure their Eclipse for them (yes, it's slightly evil thing to do) so that it actually uses those formatting XML:s we have provided. We do not enforce automatic format on save, we don't want to be completely intrusive, we just want to push all our developers into the right directions. For even increased compatability, we mostly use rules defined in JCC.
Next comes the important part, the actual builds. We are those who embrace automatic builds and for that we use Hudson Continuous Integration Server. There's two important parts in our configurations beyond this:
We use CVS loginfo to trigger builds whenever something is committed.
We utilize several plugins available for Hudson, including Continuous Integration Game in conjuction with the most important one, Checkstyle.
The Checkstyle Plugin is the magician in our code style enforcement guide line:
After commiting code to CVS, Hudson build is triggered
After build has been completed succesfully (all unit tests pass etc.), Checkstyle inspects the actual source files
Checkstyle ranks the code based rules we have defined for it
Continuous Integration Game sees the result of Checkstyle and awards/takes away points for the person who has the ownership for the relevant part of the code
Leaderboard shows total points for every commiter in system
Basically this means that when anyone commits ugly code into our CVS, our build server automatically reduces that person's points.
This means that eventually any one of us can be ranked on the Leaderboard based on the general code quality in both look and OO principles such as Law of Demeter, Cyclomatic complexity etc. etc. Naturally this isn't a completely serious statistic, but it's a good indication you're doing something wrong when causing a build to be initiated in our CI won't reduce your points - most of our commits are worth between 1 and 5 points.
And is it working? Sort of, I don't think anyone of us at my work writes ugly or unmaintainable code and personally I love to hunt all kinds of scores so it's definitely motivating me to make code that looks nice and follows all the OO paradigms I know of.
And do we as a company really need it? I think we do as you should see from reading this entire answer, it can be considered a good practice for the advancements it brings.
#1: in a related note, I refactored legacy code from 2002 today which used those standards, didn't look "bad" at all even in its original form and certainly not worse in its new form
No, not really.
If you can actually get it to work consistently and not make it flag code has changed due to a different style of laying the code out.
However, this is just a small part of coding standards. It won't cover multiple return statements, the use or not of ternary operators, etc.
It is always nice if the coding style that the shop uses is the same one that is also followed by the development tools.
Otherwise, if there is a large body of code that already follows a shop standard which is NOT the same as that of the tools you have two choices:
Modify all of the code to follow the tool standard, or
Maintain the existing shop standard.
Many shops do the latter. Regardless, there does need to be some kind of standard, and it does need to be followed.
Some development tools allow you to tweak their standard. In some cases you may be able to bring the tools in alignment with the shop standard.
It probably doesn't matter that much anymore if you can ensure that everybody in the team sees the source code "correctly" formatted, whatever they think it is. However I've not seen a system that can do that - you can do parts of it (say, reformat before and after checkin/checkout) but these days you also have to consider web interfaces into the version control, external code review systems that interact directly with the version control system etc.
The main purpose of a standard code style is (IMHO) to ensure that you can read other team members' code easily without having to start reverse engineering it because all the code is written using the same sort of guiding principles. Indenting and parentheses placement seem to be a major hangup on this but they are only a very small and in my opinion, somewhat overblown and not very important part of the need to make code consistent.
Unfortunately I'm not aware of any tools that can automatically apply consistent coding principles to source code...
Yes, coding styles are needed if there is a desire to have a homogeneous code base. Such a code base can be useful in preventing individual ownership of parts of the code base, which can cause problems when people leave the team. If you can't imagine having wildly different styles and problems understanding all of it, just look at all the different ways English text can be organized in various communications, all written but quite different such as tweets, e-mail, text messages, IM, message board posts, etc. and changes in fonts, capitalization, decorations, etc.
What libraries are there to write C# internationalized applications?
Typical functionalities that should be contained in the library:
Validation of country specific data (e.g. VAT numbers, phone numbers, addresses,...)
Validation of bank and financial coordinates (e.g. Credit Card numbers, IBAN,...)
Language-specific functionalities (e.g. numbers to words to numbers, summarize,...)
Language specific content filtering (e.g. swearword filtering...)
An example of such libraries in Perl would be the Internationalization/Locale section of CPAN.
What C# solutions are available?
Note: I am not looking for an introduction to the System.Globalization namespace :)
Note 2: Should I desume that there are no options available? Is someone interested in joining forces and create one?
Note 3: Edit to make the question appear on front page in hope of more answers. This isn't such a hard question, how is it possible that Stackers don't ever do i18n?
One project that is working towards a database of globalization, internationalization and localization knowledge is the Unicode Common Locale Data Repository, based on the old ICU project at IBM.
As it is a database of XML data it doesn't contain any .NET-specific code, but as a body of knowledge it is very good.
Only a smallish subset is in the .NET framework. Microsoft hasn't gone near any of the supplemental stuff, like postcode formats, number spelling (for check/cheque amounts), etc. Standard time zone names (from the Olson/tz distribution), etc. are also included, with mappings to the Windows-specific names. Some of the hierarchical locale-specific behaviours also have better support.
I wouldn't say that no one does i18n, but I don't know of any generic tools that can be used for every project. Maintaining a database with all of the information you are looking for would be an epic project. It sounds like what you're looking for isn't a specific C# library, but more a collection of information online that you can draw from. If you were able to find a repository of swear words in various languages (for example), it would be trivial for you to use this in C#. I think that finding a solution that wraps up all of your requirements into an easy-to-use assembly is going to be impossible to find.
Have a look at
http://www.microsoft.com/globaldev/getwr/dotneti18n.mspx
and
http://www.dotneti18n.com/
String to number and vice versa can be dones as following:
culture = new CultureInfo(locale);
int number = Convert.ToInt32(myString, culture.NumberFormat);
string str= Convert.ToString(myNumber, culture.NumberFormat);
As to checking VATS and adresses, I'm interested in that too, haven't found anything useful so far.
Not exactly a "library", per se, but I've actually ran into a great service (for pay), by a company called E4X (former client of mine).
What they provide is complete localization of your ecommerce site, including language translations, currency exchanges, local billing and handling of financial transactions including region-specific taxes etc, and more. They even deal with logisitics of physical shipping...
Worth looking into, for an ecommerce business. Let 'em know I sent you... ;-)
That's a huge endeavor. Let's start with one simple problem: phone numbers. Libphonenumber Google library at http://code.google.com/p/libphonenumber/ has a C# port at https://bitbucket.org/pmezard/libphonenumber-csharp with notes at http://blog.thekieners.com/2011/06/06/using-googles-libphonenumber-in-microsoft-net-with-c/. Appears to be a good library for handling both US and int'l numbers.
I'm using Visual Studio (2005 and up). I am looking into trying out making an application where the user can change language for all menues, input formats and such. How would I go on doing this, as I suppose that there is some complete feature within .Net that can help me with this?
I need to take the following into account (and fill me in if I miss some obvious stuff)
Strings (menues, texts)
Input data (parsing floats, dates, etc..)
Should be easy to add support for another language
I'm not an expert with .NET by any means but Localization is never just as simple as "swapping out String values" or "changing date formats". There is much more to be taken into consideration such as layout, proper text placement.
Take Chinese for example. The way you read is top to bottom not left to right. If properly localized the app should take that into account.
http://msdn.microsoft.com/en-us/library/y99d1cd3(VS.80).aspx seems to be a good start though if you're dealing with Windows Forms.
The classic recipe is: design the app with no native language but a localization facility, and develop an initialization into one language (e.g., English). So you build the app and localize it into English every night; without the localization step it would not be usable. Do that well, and the resources for the initial sample localization can be replaced with those for any other language. Take into account non-roman scripts from the beginning. It's much cleaner to have a no-language app that always requires localization rather than a language-specific app that needs to have its native language subtracted and a replacement added.
For strings you should just separate your strings from your code (having an XML/DLL that will transform string IDs to real strings is one way to go). However you do need to make sure that you are supporting double byte characters for some languages (this is relevant if you use C/C++).
For input data what you want is to have different locale's. In Java this is relatively easy, and if you use C# it probably is quite easy also. In C/C++ I don't really know. The basic idea is that the input parsers should be different based on the locale selected at that time. So each field (textfield, textbox, etc.) must have an abstract parser that is then implemented by a different class depending on the locale (right to left, double byte, etc.).
Check the Java implementation for details on how they did it. It is quite functional.
You definitely need to be using the .NET ResourceManager and the resx file xml format, however there are a number of approaches to using this.
It really depends on what you are wanting to achieve. For me I wanted a single xml resource file (for each supported language) that could be modified by anyone. I created a helper class that loaded the global resource file into ResourceManager (once only) and I had a helper function that gives me the required resource for a given name. The only disadvantage in this approach was that I could not leverage dynamic binding of resources to properties.
I found this better and easier to manage than multiple or embedded resource files for every form. Additionally exactly the same approach can used in an ASP.NET application. I also found this approach means that outsourcing translation of resources and shipping language packs to customers much more manageable.
Microsoft's recommended approach is to use satellite assemblies, as described in Packaging and Deploying Resources. If you're using a ResourceManager to load resources, .NET will load the correct resources for the CurrentUICulture. This defaults to the user's current UI language setting in Windows.
It is possible to localize Windows Forms either through Visual Studio or an external tool, WinRes.exe. This article describes WinRes and how to use Visual Studio to localize the form.