Find number of serialized objects - c#

My issue is trying to determine a number of objects created, the objects being serialized from an XML document. The XML document should be set up for simplicity, so any developer can add an additional object and need no further modification to the code. However each of these objects need to be handled/updated seperately, and specifically, some of the objects are of different sub-classes, which need to be handled differently. So what would be my simplest course of action, allowing other to add objects via the XML, but still ensuring the proper logic happenes for each?

This is totally a bad idea, but if you want something constructive...
Model your XML document objects and include some kind of known syntax for you to specify Lambda expressions in it. So if you enter a
<BinaryExpresion>
<NodeType>Add</NodeType>
<Left>3</Left>
<Right>4</Right>
</BinaryExpression>
Then when you read and compile the expression, you could run that code against the data if the XML object and do something (in this case, executing 3 + 4)

Related

Finding all XPaths in a XQuery using Saxon-HE with C#

Situational Background: XSD with SCH
XML Schema (XSD)
I have an XML schema definition ("the schema") that includes several other XSDs, all in the same namespace. Some of those import other XSDs from foreign namespaces. All in all, the schema declares several global elements that can be instantiated as XML documents. Let's call them Global_1, Global_2 and Global_3.
Business Rules (SCH)
The schema is augmented by a Schematron file that defines the "business rules". It defines a number of abstract rules, and each abstract rule contains a number of assertions using the data model defined via XSD. For instance:
<sch:pattern>
<sch:rule id="rule_A" abstract="true">
<sch:assert test="if (abc:a/abc:b = '123') then abc:x/abc:y = ('aaa', 'bbb', 'ccc') else true()" id="A-01">Error message</sch:assert>
<sch:assert test="not(abc:c = 'abcd' and abc:d = 'zz')" id="A-02">Some other error message</sch:assert>
</sch:rule>
<!-- (...) -->
</sch:pattern>
Each abstract rule is extended by one or more non-abstract (concrete) rule that defines a specific context in which the abstract rule's assertions are to be validated. For example:
<sch:pattern>
<!-- (...) -->
<sch:rule context="abc:Global_1/abc:x/abc:y">
<sch:extends rule="rule_A"/>
</sch:rule>
<sch:rule context="abc:Global_2/abc:j//abc:k/abc:l">
<sch:extends rule="rule_A"/>
</sch:rule>
<!-- (...) -->
</sch:pattern>
In other words, all the assertions defined within the abstract rule_A are being applied to their specific contexts.
Both "the schema" and "the business rules" are subject to change - my program gets them at run-time and I don't know their content at design-time. The only thing I can safely assume is that there are no endless recursive structures in the schema: There is always one definite leaf node for every type and no type contains itself. Put differently, there are no "infinite loops" possible in the instances.
The Problem I want To Solve
Basically, I want to evaluate programmatically if each of the defined rules is correct. Since correctness can be quite a problematic topic, here by correctness I simply mean: Each XPath used in a rule (i.e. its context and within the XQueries of its inherited assertions) is "possible", meaning it can exist according to the data model defined in the schema. If, for instance, a namespace prefix is forgotten (abc:a/b instead of abc:a/abc:b), this XPath will never return anything other than an empty node set. The same is true if one step in the XPath is accidentally omitted, or spelled wrong, etc. This is obviously not a very strong claim for "correctness" of such a rule, but it'll do for a first step.
My Approach Towards A Solution For This
At least to me it doesn't seem like a trivial problem to evaluate an XPath (not to speak of the entire XQuery!) designed for the instance of a schema against the actual schema, given how it may contain axis steps like //, ancestor::, sibling::, etc. So I decided to construct something I would call a "maximum instance": By recursively iterating through all global elements and their children (and the structure of their respective complex types etc.), I build an XML instance at run-time that contains every possible element and attribute where it would be in the normal instance, but all at once. So every optional element/attribute, every element within a choice block and so on. So, said maximum instance would look something like this:
<maximumInstance>
<Global_1>
<abc:a>
<abc:b additionalAttribute="some_fixed_value">
<abc:j/>
<abc:k/>
<abc:l/>
</abc:b>
</abc:a>
</Global_1>
<Global_2>
<abc:x>
<abc:y>
<abc:a/>
<abc:z>
<abc:l/>
</abc:z>
</abc:y>
</abc:x>
</Global_2>
<Global_3>
<!-- ... -->
</Global_3>
<!-- ... -->
</maximumInstance>
All it takes now is to iterate over all abstract rules: And for every assertion in each abstract rule it must be checked that for every context the respective abstract rule is extended by, every XPath within an assertion results in a non-empty node set when evaluated against the maximum instance.
Where I'm stuck
I have written a C# (.NET Framework 4.8) program that parses "the schema" into said "maximum instance" (which is an XDocument at run-time). It also parses the business rules into a structure that makes it easy to get each abstract rule, its assertions, and the contexts these assertions are to be validated against.
But currently, I only have each complete XQuery (just like they are in the Schematron file) which effectively creates an assertion. But I actually need to break the XQuery down into its components (I guess I'd need the abstract syntax tree) so that I would have all individual XPaths. For instance, when given the XQuery if (abc:a/abc:b = '123') then abc:x/abc:y = ('aaa', 'bbb', 'ccc') else true(), I would need to retrieve abc:a/abc:b and abc:x/abc:y.
I assume that this could be done using Saxon-HE (or maybe another Parser/Compiler currently available for C# I don't know about). Unfortunately, I have yet to understand how to make use of Saxon well enough to even find at least a valid starting point for what I want to achieve. I've been trying to use the abstract syntax tree (so I can access the respective XPaths in the XQuery) seemingly accessible via XQueryExecutable:
Processor processor = new Processor();
XQueryCompiler xqueryCompiler = processor.NewXQueryCompiler();
XQueryExecutable exe = xqueryCompiler.Compile(xquery);
var AST = exe.getUnderlyingCompiledQuery();
var st = new XDocument();
st.Add(new XElement("root"));
XdmNode node = processor.NewDocumentBuilder().Build(st.CreateReader());
AST.explain((node); // <-- this is an error!
But that doesn't get me anywhere: I don't find any properties exposed I could work with? And while VS offers me to use AST.explain(...) (which seems promising), I'm unable to figure out what to parametrize here. I tried using a XdmNode which I thought would be a Destination? But also, I am using Saxon 10 (via NuGet), while Destination seems to be from Saxon 9: net.sf.saxon.s9api.Destination?!
Does anybody who was kind enough to read through all of this have any advice for me on how to tackle this? :-) Or, maybe there's a better way to solve my problem I haven't thought of - I'm also grateful for suggestions.
TL;DR
Sorry for the wall of text! In short: I have Schematron rules that augment an XML schema with business logic. To evaluate these rules (not: validate instances against the rules!) without actual XML instances, I need to break down the XQueries which make up the Schematron's assertions into their components so that I can handle all XPaths used in them. I think it can be done with Saxon-HE, but my knowledge is too limited to even understand what a good starting point what be for that. I'm also open for suggestions regarding a possibly better approach to solve my actual problem (as described in detail above).
Thank you for taking the time to read this.
If this were an XSD schema rather than a Schematron schema, then Saxon-EE would do the job for you automatically: this is very similar what a schema-aware XQuery processor attempts to do. But another difference is that in schema-aware XQuery, you can't assume that every element named foo is a valid instance of the element declaration named foo in the schema; it's quite legitimate, for example, for a query to transform valid instances into invalid instances, or vice versa. The input and output, after all, might conform to different schemas.
Saxon uses path analysis to do this: it looks at path expressions to see "where they might lead". Path analysis is also used to assess streamability, and to support document projection (building a trimmed-down tree representation of the source document that leaves out the parts that the query cannot reach). The path analysis in Saxon is by no means complete, for example it doesn't attempt to handle recursive functions. Although all these operations require Saxon-EE, the basic path analysis code is actually present in Saxon-HE, but I would offer no guarantee that it works for any purpose other than those described.
You're basically right that this is a tough problem you've set yourself, and I wish you luck with it.
Another approach you could adopt that wouldn't involve grovelling around the Saxon internals is to convert the XQuery to XQueryX, which is an XML representation of the parse tree, and then inspect the XQueryX (presumably using XQuery) to find the parts you need.
While XQueryX (as pointed out by Michael Kay) would theoretically have been exactly what I was looking for, unfortunately I could not find anything useful regarding an implementation for .NET during my research.
So I eventually solved the whole thing by creating my own parser using the XPath3.1 grammar for ANTLR4 as an ideal starting point. This way, I am now able to retrieve a syntax tree of any Schematron rule expression, allowing me to extract each contained XPath expression (and its sub expressions) separately.
Note that another stumbling block has been the fact that .NET still (!) only handles XPath 1.0 genuinely: While my parser does everything as supposed to, for some of the found expressions .NET gave me "illegal token" errors when trying to evaluate them. Installing the XPath2 NuGet package by Chertkov/Heyenrath was the solution.

C# custom file parsing with 2 delimiters and different record types

I have a (not quite valid) CSV file that contains rows of multiple types. Any record could be one of about 6 different types and each type has a different number of properties. The first part of any row contains the timestamp and the type of record, followed by a standard CSV of the data.
Example
1456057920 PERSON, Ted Danson, 123 Fake Street, 555-123-3214, blah
1476195120 PLACE, Detroit, Michigan, 12345
1440581532 THING, Bucket, Has holes, Not a good bucket
And to make matters more complex, I need to be able to do different things with the records depending on certain criteria. So a PERSON type can be automatically inserted into a DB without user input, but a THING type would be displayed on screen for the user to review and approve before adding to DB and continuing the parse, etc.
Normally, I would use a library like CsvHelper to map the records to a type, but in this case since the types could be different, and the first part uses a space instead of comma, I dont know how to do that with a standard CSV library. So currently how I am doing it each loop is:
String split based off comma.
Split the first array item by the space.
Use a switch statement to determine the type and create the object.
Put that object into a List of type object.
Get confused as to where to go now because i now have a list of various types and will have to use yet another switch or if to determine the next parts.
I don't really know for sure if I will actually need that List but I have a feeling the user will want the ability to manually flip through records in the file.
By this point, this is starting to make for very long, confusing code, and my gut feeling tells me there has to be a cleaner way to do this. I thought maybe using Type.GetType(string) would help simplify the code some, but this seems like it might be terribly inefficient in a loop with 10k+ records and might make things even more confusing. I then thought maybe making some interfaces might help, but I'm not the greatest at using interfaces in this context and I seem to end up in about this same situation.
So what would be a more manageable way to parse this file? Are there any C# parsing libraries out there that would be able to handle something like this?
You can implement an IRecord interface that has a Timestamp property and a Process method (perhaps others as well).
Then, implement concrete types for each type of record.
Use a switch statement to determine the type and create and populate the correct concrete type.
Place each object in a List
After that you can do whatever you need. Some examples:
Loop through each item and call Process() to handle it.
Use linq .OfType<{concrete type}> to segment the list. (Warning with 10k
records, this would be slow since it would traverse the entire list for each concrete type.)
Use an overridden ToString method to give a single text representation of the IRecord
If using WPF, you can define a datatype template for each concrete type, bind an ItemsControl derivative to a collection of IRecords and your "detail" display (e.g. ListItem or separate ContentControl) will automagically display the item using the correct DataTemplate
Continuing in my comment - well that depends. What u described is actually pretty good for starters, u can of course expand it to a series of factories one for each object type - so that you move from explicit switch into searching for first factory that can parse a line. Might prove useful if u are looking to adding more object types in the future - you just add then another factory for new kind of object. Up to you if these objects should share a common interface. Interface is used generally to define a a behavior, so it doesn't seem so. Maybe you should rather just a Dictionary? You need to ask urself if you actually need strongly typed objects here? Maybe what you need is a simple class with ObjectType property and Dictionary of properties with some helper methods for easy typed properties access like GetBool, GetInt or generic Get?

Generate a C# object based on an xml file?

This may be way out in left field, crazy, but I just need to ask before I go on implementing this massive set of classes.
Basically, I'm writing a binary message parser that decodes a certain military message format into an object. The problem is that there are literally hundreds of different message types and they share almost nothing in common with each other. So the way I'm planning to implement this is to create hundreds of different objects.
However, even though the message attributes share nothing in common, the method for decoding them is fairly straightforward and follows a pattern. So I'm planning to write a code generator to generate all the objects and the decode logic for each message type.
What would be really sweet is if there was some way to dynamically create an object based on some schema. It doesn't necessarily have to be XML, but XML is pretty easy to work with.
Is this possible in C#?
I would like the interface to look something like this:
var decodedMessage = MessageDecoder.Decode(byteArray);
Where the MessageDecoder figures out what type of message it is and then returns the appropriate object. It will probably return an interface which implements a MessageType Property or something like that.
Basically what I'm wondering is if there is a way to have one object called Message, which implements a MessageType Property. And then Depending on the MessageType, the Message object transforms into whatever type of message it is, so I don't have to spend the time creating all of these message types.
ExpandOobject Where you can dynamically add fields to an object.
A good starting point is here.
Is xsd.exe what you are looking for? It can take an XML file or a schema and generate the c# classes. One problem that you might encounter though is that some of the military message formats are VERY obtuse. You could end up with some very large code files.
Look at T4 templates. They let you write code to generate code, they are integrated into the IDE, and they are quite easy really.
EDIT: There is no way to do what you are after with var, because var requires the right-hand side of the assignment to be statically typed (at compile time). I suppose that you could dynamically generate that statement, then compile and run it, but that's a very painful approach.
If you have XSD's for all of the message types, then you can use xsd.exe as #jle suggests. If not, then I am curious about the following:
// Let's assume this works
var decodedMessage = MessageDecoder.Decode(byteArray);
// Now what? I don't know what properties there are on decodedMessage, so I cant do anything with it.

Should I have seperate objects for Input and Output, does this pattern have a name?

I am looking for the name of this pattern and to understand the reasons for this design.
I have:
Input data(from DB): being put in an
object, containing data
corresponding to a table row and
also the objects of it's foreign
keys. Data could come from DB or
generated from code for unit
testing.
A calculator: that analyses,
processes and modify this data. I
want the input data to stay external
to this calculator to be able to
unit test it with data generated
from code or coming from and XML
file instead of a DB.
Results data: Then I need to update
my database, BUT also those results
can be used to validate against
expected results in the case of unit
testing.
Is this a Separation of Concern Pattern? Is there other known patterns involved here?
Should my result object, containing result data, sit in the same object as my input data and null at the begining then updated after the processing, or should the results object be totally independent from the input object, and why please.
I would suggest that your output object be completely independent from your input object. Otherwise your input object has to "know" about the output object. That seems unreasonable because any time you change the output object, you by extension change the input object. Or perhaps you want multiple different output objects. Then the input object needs to know about all of the output objects, or you have an untyped reference (i.e. object in .NET) that you'll need to cast.
The input object is exactly that: it represents the data as its received from your data store or whatever. That input data is passed through some transformation to create an output object of some kind, which might bear very little relation to the input object.
In short, there's no good reason for the input object to have any knowledge of the output object. In fact, the input object shouldn't even be concerned that the output object exists. Giving the input object that knowledge will result in unnecessary coupling, which will make it more difficult to modify your code in the future.
There are many different things in your list which seem to specialized to be part of one single pattern. The cornerstone of you describe seems to be the Repository pattern though.
At a high level, I would refer to this as "ETL", "Extract, Transform and Load". This term is more commonly used in a BI context, but it is the exact same thing conceptually.

Linq to DataTable without enumerating fields

i´m trying to query a DataTable object without specifying the fields, like this :
var linqdata = from ItemA in ItemData.AsEnumerable()
select ItemA
but the returning type is
System.Data.EnumerableRowCollection<System.Data.DataRow>
and I need the following returning type
System.Data.EnumerableRowCollection<<object,object>>
(like the standard anonymous type)
Any idea?
Thanks
If I understand you correctly, you'd like to get a collection of objects that you don't need to define in your code but that are usable in a strongly typed fashion. Sadly, no you can't.
An anonymous type seems like some kind of variant or dynamic object, but it is in fact a strongly typed class that is defined at compile time. .NET defines the type for you automatically behind the scenes. In order for .net to be able to do this, it has to have some clue from the code with which to infer the type definition. It has to have something like:
from ItemA in ItemData.AsEnumerable()
select ItemA.Item("Name"), ItemA.Item("Email")
so it knows what members to define. There's no way to get around it, the information has to logically be there for the anonymous type to be defined.
Depending on why exactly your are trying to do this, there are some options.
If you want intellisense while still encapsulating your data access, you can return xml instead of a datatable from your encapsulated data access class. (You can convert data tables to xml very easily. You'll want to use the new System.Xml.Linq classes like the XElement. They're great!) Then you can use VS2008's ability to create an xsd schema from xml. Then use/import that schema at the top of your code page, and you have intellisense.
If you have to have an object an with properties for your data, but don't want to define a class/structure for them, you'll love the new dynamic objects coming in C#4.0/VB10. You have object properties based on what the sql returns, but you won't have intellisense. There is also a performance cost to this, but (a) that might not matter for your situation and (b) it actually is not so bad in some situations.
If you're just trying to avoid making a lot of classes, consider defining structs/structures on the same code file, beneath your class definition. When you add more columns to your result set, it's easy to adjust a struct with more public fields.
In short you can have any two of the following three: (a) dynamic, (b) strontly-typed objects, (3) intellisense. But not all three.
There is one way to accomplish what you want, but it required knowledge of dynamic linq. You would build the query during run-time and then use it. I am no expert and have never really played around with it, but here is a link to Scott Guthrie's blog about it - Dynamic Linq. Hope that helps.
Wade

Categories

Resources