C#: Autogenerating DDL and ORM classes from XML schema (XSD) file


I have a rather large XSD file available here.
I want to generate the following from the file:
1. Generate DDL (for PostgreSQL). The DDL should contain initial values where appropriate, as specified by the 'permitted' values in the XSD.
2. Generate an ORM that will allow me to perform CRUD operations on the records in the database created in step 1.
Can anyone suggest a tool or series of tools/technologies to achieve this?
In case I have to roll my own solution, can someone suggest a good tutorial for XSLT (preferably a cookbook - since I already know some XML/XPath).
Incidentally, I tried xsd.exe on Windows - it failed and printed an error message suggesting that there was a circular reference in the XSD file. I then tried xsd.exe on mono, that worked - but the file created had some invalid statements. I am guessing (perhaps incorrectly) that xsd.exe is NOT the way to achieve these twin goals - if I am wrong, let me know.
Also, I took a look at Ann Lewkowicz's XSLT transform file for generating DDL from an XSD file, but it appeared to get stuck in an infinite loop and also complained about 'infinite recursion'.
So I need help with the following:
1. First of all, can anyone test/check whether the XSD file is indeed screwed up and, if it is, how to fix it?
2. How do I go about generating a DDL and ORM from the XSD file?

Personally I would write the generator myself. There may be good generators out there, but I haven't seen any. All the ones I've tried (though I never used an XSD as the starting point) generate terrible code and, worse, are nearly impossible to customize to handle every quirk that inevitably turns up.
Doing so is a lot less work than people seem to imagine, and gives many benefits, not least that you'll actually have total control over exactly what is generated. And you could even (and quite easily) take it to the next level and generate the stuff at run-time. The latter is hardly meaningful if the schema is final, but can be a huge time-saver if it's constantly evolving.
I'm quite sure this isn't the answer you were hoping for, and I'd be interested too if anyone knows of good tools for the job.
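To give an idea of how small a hand-rolled generator can start out, here is a minimal sketch that handles only one piece of the problem: turning named enumeration types in an XSD into PostgreSQL lookup tables with their permitted values. The class name EnumDdlGenerator, the one-table-per-enumeration layout, and the column name code are all assumptions made for illustration. It reads the schema as plain XML with LINQ to XML, so it does not resolve imports, complex types, or circular references.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class EnumDdlGenerator
{
    static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // Quote a SQL string literal by doubling embedded single quotes.
    static string Literal(string value) => "'" + value.Replace("'", "''") + "'";

    // Quote a SQL identifier by doubling embedded double quotes.
    static string Identifier(string name) => "\"" + name.Replace("\"", "\"\"") + "\"";

    // Emits one lookup table per named xs:simpleType that lists xs:enumeration values.
    public static string Generate(string xsdText)
    {
        var sql = new StringBuilder();
        foreach (XElement simpleType in XDocument.Parse(xsdText).Descendants(Xs + "simpleType"))
        {
            string name = (string)simpleType.Attribute("name");
            XElement restriction = simpleType.Element(Xs + "restriction");
            if (name == null || restriction == null) continue;

            var values = restriction.Elements(Xs + "enumeration")
                .Select(e => (string)e.Attribute("value"))
                .Where(v => v != null)
                .ToList();
            if (values.Count == 0) continue;

            string table = Identifier(name);
            string rows = string.Join(", ", values.Select(v => "(" + Literal(v) + ")"));
            sql.AppendLine("CREATE TABLE " + table + " (code text PRIMARY KEY);");
            sql.AppendLine("INSERT INTO " + table + " (code) VALUES " + rows + ";");
        }
        return sql.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical schema fragment, not taken from the asker's XSD.
        string xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "<xs:simpleType name='Color'><xs:restriction base='xs:string'>" +
            "<xs:enumeration value='red'/><xs:enumeration value='green'/>" +
            "</xs:restriction></xs:simpleType></xs:schema>";
        Console.Write(EnumDdlGenerator.Generate(xsd));
    }
}
```

Tables for complex types and the ORM classes would be generated by the same kind of loop over the schema, which is where having full control over the output starts to pay off.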

Related

Multiple Versions C# classes/XSD using XSD.exe

I am using XSD.exe to convert a pretty complex XML-Schema (XSD-file) to C# Classes. I am then using XmlSerializer to read XML into memory and work with the data.
In the future, the XSD will change. So there will be a new version. I will have to create a new cs file with XSD.exe. But I still want to support the old versions of XML files as well.
What is the best way to go about this and support both the old and new versions of XML files? Obviously, the classes XSD.exe creates will have the same names. So I can't really just generate another cs file in parallel with XSD.exe.
Any ideas are welcome. Thanks in advance!

XML Data Binding has the advantage of enabling you to code against strongly typed classes rather than untyped nodes, but this can make versioning tricky.
Information about this can be found in the 'Schema Versioning' section of Liquid XML Data Binder 2021 - Getting Started documentation.

Data binding technologies (that convert XSD definitions into types in a strongly-typed programming language) are an absolute pain when the schema is large, complex, or changing. My strong advice would be, find a different approach. I've earned a lot of consulting money helping people dig themselves out of this hole.
Use technologies that are better at coping with change and variety. XSLT, XQuery, LINQ, or even DOM if you must. XSLT and XQuery come with schema-awareness as an option so you can get some of the benefits (having your program code checked against the schema) without the heavy price of rebuilding and retesting your application every time there's a change.
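As a rough illustration of the LINQ option, the sketch below reads a value straight from the XML tree and tolerates a renamed or missing element instead of failing at deserialization time. The element names Customer and CustomerName and the class TolerantReader are invented for the example; they do not come from the asker's schema.

```csharp
using System;
using System.Xml.Linq;

public static class TolerantReader
{
    // Accepts the newer element name first, falls back to the older one,
    // and returns a placeholder when neither is present.
    public static string ReadCustomer(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        return (string)root.Element("CustomerName")
            ?? (string)root.Element("Customer")
            ?? "(unknown)";
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(TolerantReader.ReadCustomer("<Order><Customer>Bob</Customer></Order>"));
        Console.WriteLine(TolerantReader.ReadCustomer("<Order><CustomerName>Ann</CustomerName></Order>"));
    }
}
```

The trade-off is that nothing is checked at compile time, so the element names live in strings and typos only show up when the code runs.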

Thank you for your answers.
For now, I placed the code generated by XSD.exe in separate namespaces and have the generated classes derive from a base class.
This way, I can use either class for generating/reading the XML. It appears to be working for me right now, as the schema will not change without a new version. Any changes made will be put into a new version.
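For anyone landing here later, that arrangement might look roughly like the sketch below. The Order classes stand in for the XSD.exe output, and the version attribute on the root element is an assumption made for the example; how a real file announces its version depends on the schema.

```csharp
using System;
using System.IO;
using System.Xml.Linq;
using System.Xml.Serialization;

namespace Contracts
{
    // Shared base class so calling code does not care which version it got.
    public abstract class OrderBase
    {
        public abstract string Describe();
    }
}

namespace Contracts.V1
{
    [XmlRoot("Order")]
    public class Order : OrderBase
    {
        public string Customer { get; set; }
        public override string Describe() => "v1 order for " + Customer;
    }
}

namespace Contracts.V2
{
    [XmlRoot("Order")]
    public class Order : OrderBase
    {
        public string CustomerName { get; set; }
        public string Currency { get; set; }
        public override string Describe() => "v2 order for " + CustomerName + " in " + Currency;
    }
}

public static class OrderLoader
{
    // Picks the class for the file's version, then deserializes with it.
    public static Contracts.OrderBase Load(string xml)
    {
        // Hypothetical convention: a missing version attribute means version 1.
        string version = (string)XDocument.Parse(xml).Root.Attribute("version") ?? "1";
        Type type = version == "2" ? typeof(Contracts.V2.Order) : typeof(Contracts.V1.Order);

        var serializer = new XmlSerializer(type);
        using (var reader = new StringReader(xml))
        {
            return (Contracts.OrderBase)serializer.Deserialize(reader);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(OrderLoader.Load("<Order><Customer>Bob</Customer></Order>").Describe());
        Console.WriteLine(OrderLoader.Load(
            "<Order version='2'><CustomerName>Ann</CustomerName><Currency>EUR</Currency></Order>").Describe());
    }
}
```

Since XSD.exe emits partial classes, the base class can be attached in a separate hand-written file, which keeps the generated files untouched when they are regenerated.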

Generate class from XSD - compact solution

I have an XSD file, which seems rather complex (I am very new to working with XSD).
My task is to create a program that generates XML files based on the XSD schema (in more detail: we will get a CSV file with the data, and the data needs to be serialized into XML). I did some research and tried various techniques for generating a C# class from the XSD file; the most 'compact' result came from the xsd2code plugin for Visual Studio.
Nonetheless, this plugin generated over 7,000 lines of code, which quite shocked me, as it was just one giant mess (for me).
My question now is: is there a better way (or maybe some switch I forgot to check) to generate a rather compact C# class? If not, what is the next step that people take once they have the C# class? Do they have to do additional manual post-processing so that the file is more 'programmer-friendly', or ...?
Thank you for your guidance; any help or tip will be highly appreciated!

Reflect and Load code at design time in Visual Studio

I have an XML file that lists a series of items, and the items are often referenced by their name in code.
Is there any way to use reflection within Visual Studio to make this list 'accessible', sort of like IntelliSense? I've seen it done before - and I'm pretty sure it's ridiculously difficult, but I figure it can't hurt to at least ask.

I would recommend against using reflection for this.
Apart from the added complexity in the code you are also opening the code up to abuse from somebody modifying your XML to get your code to do what they want (think injection attack).
You would be better off parsing the XML as usual but using a big if / switch statement to define how the code runs. That way you have more chance of catching any problems and validating the input.
From string to function call sounds great but will bite you in the bum.

I think he wants to access the XML from C# code with IntelliSense.
My guess is that you would have to build some sort of code generator that generates a C# class with members for your XML items... kind of how Visual Studio generates code for resource files.
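A bare-bones version of such a generator could look like the sketch below. It assumes the XML lists its entries as item elements with a name attribute, which is a guess since the question does not show the file, and it emits a class called ItemNames with one string constant per entry. Name collisions and C# keywords are not handled.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class ItemNamesGenerator
{
    // Turns an arbitrary item name into something usable as a C# identifier.
    public static string ToIdentifier(string name)
    {
        var sb = new StringBuilder();
        foreach (char c in name)
        {
            sb.Append(char.IsLetterOrDigit(c) ? c : '_');
        }
        if (sb.Length == 0 || char.IsDigit(sb[0]))
        {
            sb.Insert(0, '_');
        }
        return sb.ToString();
    }

    // Emits C# source for a static class with one constant per distinct item name.
    public static string Generate(string xml)
    {
        var names = XDocument.Parse(xml)
            .Descendants("item")
            .Select(e => (string)e.Attribute("name"))
            .Where(n => n != null)
            .Distinct();

        var source = new StringBuilder();
        source.AppendLine("public static class ItemNames");
        source.AppendLine("{");
        foreach (string name in names)
        {
            // Escape backslashes and quotes so the emitted literal stays valid.
            string literal = name.Replace("\\", "\\\\").Replace("\"", "\\\"");
            source.AppendLine("    public const string " + ToIdentifier(name) + " = \"" + literal + "\";");
        }
        source.AppendLine("}");
        return source.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.Write(ItemNamesGenerator.Generate(
            "<items><item name='Iron Sword'/><item name='2nd-Key'/></items>"));
    }
}
```

Once the generated file is part of the project, typing ItemNames. brings up the list in IntelliSense, and a name removed from the XML turns into a compile error instead of a run-time surprise.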

Using reflection for code gen?

I'm writing a console tool to generate some C# code for objects in a class library. The best/easiest way I can actually generate the code is to use reflection after the library has been built. It works great, but this seems like a haphazard approach at best. Since the generated code will be compiled with the library, after making a change I'll need to build the solution twice to get the final result, etc. Some of these issues could be mitigated with a build script, but it still feels like a bit too much of a hack to me.
My question is, are there any high-level best practices for this sort of thing?

It's pretty unclear what you are doing, but what does seem clear is that you have some baseline code and, based on some of its properties, you want to generate more code.
So the key issues here are: given the baseline code, how do you extract interesting properties, and how do you generate code from those properties?
Reflection is a way to extract properties of code running in (well, at least loaded into) the same execution environment as the code that uses reflection. The problem with reflection is that it only provides a very limited set of properties, typically lists of classes, methods, or perhaps names of arguments. If all the code generation you want to do can be done with just that, then reflection seems just fine. But if you want more detailed properties about the code, reflection won't cut it.
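For the simple case where class and member names are enough, a reflection-driven generator can be as small as the sketch below. The Customer type and the idea of emitting a matching Dto class are made up for the example, and it only copes with non-generic property types.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Text;

// Stand-in for a type from the already-built class library.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class DtoGenerator
{
    // Emits C# source for a class that mirrors the public instance properties of a type.
    public static string Generate(Type type)
    {
        var source = new StringBuilder();
        source.AppendLine("public class " + type.Name + "Dto");
        source.AppendLine("{");

        // GetProperties makes no ordering promise, so sort for stable output.
        var properties = type
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(x => x.Name, StringComparer.Ordinal);

        foreach (PropertyInfo property in properties)
        {
            source.AppendLine("    public " + property.PropertyType.FullName + " "
                + property.Name + " { get; set; }");
        }
        source.AppendLine("}");
        return source.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.Write(DtoGenerator.Generate(typeof(Customer)));
    }
}
```

Everything this generator knows comes from names and types in the metadata. Anything about what the method bodies do is out of reach, which is the limitation described above.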
In fact, the only artifact from which truly arbitrary code properties can be extracted is the source code as a character string (how else could you answer whether the number of characters between the add operator and the T in the middle of the variable name is a prime number?). As a practical matter, properties you can get from character strings are generally not very helpful (see the example I just gave :).
The compiler guys have spent the last 50 years figuring out how to extract interesting program properties and you'd be a complete idiot to ignore what they've learned in that half century.
They have settled on a number of relatively standard "compiler data structures": abstract syntax trees (ASTs), symbol tables (STs), control flow graphs (CFGs), data flow facts (DFFs), program triples, pointer analyses, etc.
If you want to analyze or generate code, your best bet is to process it first into such standard compiler data structures and then do the job. If you have ASTs, you can answer all kinds of questions about what operators and operands are used. If you have STs, you can answer questions about where-defined, where-visible and what-type. If you have CFGs, you can answer questions about "this-before-that" and "what conditions does statement X depend upon". If you have DFFs, you can determine which assignments affect the actions at a point in the code. Reflection will never provide this IMHO, because it will always be limited to what the runtime system developers are willing to keep around when running a program. (Maybe someday they'll keep all the compiler data structures around, but then it won't be reflection; it will just finally be compiler support).
Now, after you have determined the properties of interest, what do you do for code generation? Here the compiler guys have been so focused on generation of machine code that they don't offer standard answers. The guys that do are the program transformation community (http://en.wikipedia.org/wiki/Program_transformation). Here the idea is to keep at least one representation of your program as ASTs, and to provide special support for matching source code syntax (by constructing pattern-match ASTs from the code fragments of interest), and provide "rewrite" rules that say in effect, "when you see this pattern, then replace it by that pattern under this condition".
By connecting the condition to various property-extracting mechanisms from the compiler guys, you get a relatively easy way to say what you want, backed up by that 50 years of experience. Such program transformation systems have the ability to read in source code, carry out analysis and transformations, and generally to regenerate code after transformation.
For your code generation task, you'd read the baseline code into ASTs, apply analyses to determine the properties of interest, use transformations to generate new ASTs, and then spit out the answer.
For such a system to be useful, it also has to be able to parse and prettyprint a wide variety of source code languages, so that folks other than C# lovers can also have the benefits of code analysis and generation.
These ideas are all reified in the DMS Software Reengineering Toolkit. DMS handles C, C++, C#, Java, COBOL, JavaScript, PHP, Verilog, ... and a lot of other languages.
(I'm the architect of DMS, so I have a rather biased view. YMMV).

Have you considered using T4 templates for performing the code generation? It looks like it's getting much more publicity and attention now and more support in VS2010.
This tutorial seems database centric but it may give you some pointers: http://www.olegsych.com/2008/09/t4-tutorial-creatating-your-first-code-generator/ in addition there was a recent Hanselminutes on T4 here: http://www.hanselminutes.com/default.aspx?showID=170.
Edit: Another great place is the T4 tag here on StackOverflow: https://stackoverflow.com/questions/tagged/t4
EDIT: (By asker, new developments)
As of VS2012, T4 now supports reflection over an active project in a single step. This means you can make a change to your code, and the compiled output of the T4 template will reflect the newest version, without requiring you to perform a second reflect/build step. With this capability, I'm marking this as the accepted answer.

You may wish to use CodeDom, so that you only have to build once.
First, I would read this CodeProject article to make sure there are not language-specific features you'd be unable to support without using Reflection.

From what I understand, you could use something like Common Compiler Infrastructure (http://ccimetadata.codeplex.com/) to programmatically analyze your existing C# source.
This looks pretty involved to me though, and CCI apparently only has full support for C# language spec 2. A better strategy may be to streamline your existing method instead.

I'm not sure of the best way to do this, but you could do this:
1. As a post-build step on your base dll, run the code generator.
2. As another post-build step, run csc or msbuild to build the generated dll.
3. Other things which depend on the generated dll will also need to depend on the base dll, so the build order remains correct.

Mapping internal data elements to external vendors' XML schema

I'm considering Altova MapForce (or something similar) to produce either XSLT and/or a Java or C# class to do the translation. Today, we pull data right out of the database and manually build an XML string that we post to a webservice.
Should it be db -> (internal)XML -> XSLT -> (External)XML? What do you folks do out there in the wide world?

I would use one of the out-of-the-box XML serialization classes to do your internal XML generation, and then use XSLT to transform to the external XML. You might generate a schema as well, to enforce that the translation code (whatever drives your XSLT translation) continues to get the XML it expects in case changes to the object break things.
There are a number of XSLT editors on the market that will help you do the mappings, but I prefer to just use a regular XML editor.
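In C#, the two stages could be wired together along these lines. The Invoice class, the VendorInvoice target format, and the inline stylesheet are all invented for the sketch; a real mapping would live in its own .xslt file and follow the vendor's schema.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;
using System.Xml.Xsl;

// Hypothetical internal object.
public class Invoice
{
    public string Number { get; set; }
    public decimal Total { get; set; }
}

public static class VendorExport
{
    // Hypothetical mapping from the internal XML shape to the vendor's shape.
    const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='/Invoice'>
    <VendorInvoice id='{Number}'><Amount><xsl:value-of select='Total'/></Amount></VendorInvoice>
  </xsl:template>
</xsl:stylesheet>";

    // Stage 1: object -> internal XML via the built-in serializer.
    public static string ToInternalXml(Invoice invoice)
    {
        var serializer = new XmlSerializer(typeof(Invoice));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, invoice);
            return writer.ToString();
        }
    }

    // Stage 2: internal XML -> external XML via XSLT.
    public static string ToVendorXml(string internalXml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }
        using (var input = XmlReader.Create(new StringReader(internalXml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var invoice = new Invoice { Number = "INV-1", Total = 100m };
        Console.WriteLine(VendorExport.ToVendorXml(VendorExport.ToInternalXml(invoice)));
    }
}
```

Keeping the mapping in the stylesheet means a vendor format change touches one file rather than the code that builds the objects.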

Yeah, I think you're heading down the right path with MapForce. If you don't want to write code to perform the actual transformation, MapForce can do that for you as well. This may be better long term because it's less code to maintain.
Steer clear of more expensive options (e.g. BizTalk) unless you really need B2B integration and orchestration.

What database are you using? Oracle has some nice XML mapping tools. There are some Java binding tools (one is http://java.sun.com/developer/technicalArticles/WebServices/jaxb). However, if you have the luxury, consider using Ruby, which has nice built-in "to_xml" methods.

Tip #1: Avoid all use of XSLT.
The tool support sucks. The resulting solution will be unmaintainable.
Tip #2: Eliminate all unnecessary steps.
Just translate your resultset (assuming you're using JDBC or equiv) to the outbound XML.
Tip #3: Assume all use of a schema-based tool to be incorrect and plan accordingly.
In other words, just fake it. If you have to squirt out some mutant SOAP (redundant, I know) payload just mock up a working SOAP message and then turn it into a template. Velocity doesn't suck.
That said, the best/correct answer, is to use an "XML Writer" style solution. There's a few.
The best is the one I wrote, LOX (Lightweight Objects for XML).
The public API uses a Builder design pattern. Due to some magic under the hood, it's impossible to create malformed XML.
Please note: If XML is the answer, you've asked the wrong question. Sometimes, we're forced against our will to use it in some way. When that happens, it's crucial to use tools which minimize developer effort and improve code maintainability.
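For readers on .NET rather than Java, the closest built-in equivalent of the writer style is System.Xml.XmlWriter. The sketch below is not LOX and does not use a builder API; it only shows the same idea of writing rows straight to the outbound XML while the writer takes care of escaping. The orders/order element names and the row shape are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

public static class OrderXml
{
    // Writes one <order> element per row; the writer escapes attribute and text content.
    public static string Write(IEnumerable<KeyValuePair<string, decimal>> rows)
    {
        var buffer = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(buffer, settings))
        {
            writer.WriteStartElement("orders");
            foreach (KeyValuePair<string, decimal> row in rows)
            {
                writer.WriteStartElement("order");
                writer.WriteAttributeString("customer", row.Key);
                writer.WriteString(XmlConvert.ToString(row.Value));
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
        return buffer.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        var rows = new List<KeyValuePair<string, decimal>>
        {
            new KeyValuePair<string, decimal>("A & B", 10.5m),
        };
        Console.WriteLine(OrderXml.Write(rows));
    }
}
```

In a real export the rows would come from the data reader loop, so there is no intermediate object model or stylesheet to maintain.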
