Validate text file (not XML) with XSD? - c#

I need to validate a flat file (a text file) against an XSD file (schema). I found out how to do this for an XML file, but not for a text file.
Is there a base class for doing that?
The contents of the text file are as follows:
Header
SubHeader (many)
Records (many)
Footer

An XSD cannot be used to validate an arbitrary text file, only an XML file.
The validation rules specified in the W3C XML Schema Recommendation are defined against XML elements and attributes, not arbitrary text:
Throughout this specification, Definition: the word valid and its
derivatives are used to refer to [the following:]
[...] whether an element or
attribute information item satisfies the constraints embodied in the
relevant components of an XML Schema
[Order rearranged and emphasis added from original source.]

XSD stands for XML Schema Definition. You can only use it to check XML, not arbitrary text.
Your best bet would be to refresh your Regex skills.
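As a rough sketch of the Regex route: the question does not say how header, sub-header, record and footer lines are told apart, so the code below assumes a hypothetical convention in which the first character of each line is a type code (H, S, R, F). The class name `FlatFileLayout` and the sample lines are likewise made up for illustration. It only checks the overall line order, not the fields inside each line.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FlatFileLayout
{
    // Assumption (not from the question): each line starts with a type code,
    // H = header, S = sub-header, R = record, F = footer.
    // Expected shape: one header, 1+ sub-headers, 1+ records, one footer.
    private static readonly Regex Shape = new Regex(@"\AHS+R+F\z");

    public static bool IsValid(string[] lines)
    {
        // Collapse the file to one type code per non-empty line, e.g. "HSRRF".
        string codes = new string(lines.Where(l => l.Length > 0)
                                       .Select(l => l[0])
                                       .ToArray());
        return Shape.IsMatch(codes);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample data; a real file could be read with File.ReadAllLines.
        string[] sample = { "H|2014-02-14", "S|batch-1", "R|1|foo", "R|2|bar", "F|2" };
        Console.WriteLine(FlatFileLayout.IsValid(sample));
    }
}
```

If sub-headers and records can alternate in the real file, the pattern would need to change accordingly (for example to a repeated group); that depends on a layout detail the question leaves open.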

Maybe you want to use Flat File Checker, an open-source application for validating data in import and export files. This article may help: https://www.codeproject.com/Articles/43398/Validating-data-with-Flat-File-Checker

Related

C# multiple writeattributestring to xml not showing in correct order

xtw.WriteStartElement("cXML");
xtw.WriteAttributeString("payloadID", payloadidstr);
xtw.WriteAttributeString("timestamp", utctime());
xtw.WriteAttributeString("version", "1.2.024");
The code above works fine for generating the XML attributes. Opening the XML file in Notepad shows the following string, which is correct:
cXML payloadID="1392408819113-4172669982087053277#123.456.789.10" timestamp="2014-02-14T12:13:39-08:00" version="1.2.024"
But when I open the XML file in any browser, the attribute order is changed, like this:
cXML version="1.2.024" timestamp="2015-01-15T16:54:48-08:00" payloadID="150120150454480293-832257153#123.456.789.10"
Can someone tell me why the browser does not show the attributes in the correct order, or how to show multiple strings under one element?
XML does not define an ordering for attributes, so there is no "correct" order; compliant readers and writers are free to order them however they please.
Per the spec section 3.1:
the order of attribute specifications in a start-tag or empty-element tag is not significant
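Since the order is not significant, code that consumes the document is safer looking attributes up by name than by position. A minimal sketch using LINQ to XML follows; the helper name `AttributeLookup` and the shortened attribute values are placeholders, not taken from the question.

```csharp
using System;
using System.Xml.Linq;

public static class AttributeLookup
{
    // Returns the value of the named attribute on the root element,
    // or null if the attribute is absent.
    public static string Get(string xml, string attributeName)
    {
        XElement root = XElement.Parse(xml);
        return (string)root.Attribute(attributeName);
    }
}

public static class Program
{
    public static void Main()
    {
        // The same element with its attributes written in two different orders.
        string asWritten = "<cXML payloadID=\"abc\" timestamp=\"t\" version=\"1.2.024\" />";
        string reordered = "<cXML version=\"1.2.024\" timestamp=\"t\" payloadID=\"abc\" />";

        // Both lookups yield the same value regardless of attribute order.
        Console.WriteLine(AttributeLookup.Get(asWritten, "version"));
        Console.WriteLine(AttributeLookup.Get(reordered, "version"));
    }
}
```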

XmlReader - read xml with colons in tag names

I have very complex XML that I need to read with XmlReader.
The elements in the xml are as the following:
<log:event>
<ev:logger>some text</ev:logger>
<ev:line>24</ev:line>
<ev:ex>
<ev:levelone>some message</ev:levelone>
<ev:leveltwo>some other message</ev:leveltwo>
</ev:ex>
</log:event>
XmlReader does not know how to read this, since the namespace prefixes are not declared anywhere in the XML.
I would have fixed that programmatically (by adding the namespace declarations to the strings), but the file is huge, so that is impossible.
(I don't control the file creation.)
Any suggestions on how that file can be read as XML without namespace declarations?
Thanks!
You can still add the namespaces: just read the file into memory and manipulate it there. I do this with several XML-based APIs from machine manufacturers that don't comply with XML standards, to make their output easier to read with normal XML parsers.
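For a file too large to hold in memory, one alternative that avoids editing the text is to hand XmlReader the missing prefix bindings through an XmlParserContext. This is a sketch of a different technique from the one in the answer above; the namespace URIs are placeholders, since the real ones are not given in the question, and `PrefixedXml` is a made-up helper name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PrefixedXml
{
    // Streams through the input and collects the qualified element names.
    public static List<string> ReadElementNames(TextReader input)
    {
        var nameTable = new NameTable();
        var namespaces = new XmlNamespaceManager(nameTable);
        // Placeholder URIs: the question does not say what "log" and "ev" map to.
        namespaces.AddNamespace("log", "urn:example:log");
        namespaces.AddNamespace("ev", "urn:example:ev");

        // The context pre-declares the prefixes, so the reader accepts them
        // even though the document itself never declares them.
        var context = new XmlParserContext(nameTable, namespaces, null, XmlSpace.None);

        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(input, new XmlReaderSettings(), context))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }
}

public static class Program
{
    public static void Main()
    {
        string xml = "<log:event><ev:logger>some text</ev:logger><ev:line>24</ev:line></log:event>";
        foreach (string name in PrefixedXml.ReadElementNames(new StringReader(xml)))
        {
            Console.WriteLine(name);
        }
    }
}
```

Because the reader streams the input, the same method could be given a StreamReader over the large file instead of a StringReader.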

Remove Empty Line From XML Document

I am currently facing a problem. I load an XML file in C#, remove some nodes from it, and append some others. The problem is that removing nodes leaves empty lines behind in the XML file, and I want to remove those lines.
Also, when I append nodes to a parent node, I want a new line after each closing tag.
For example, my XML file is:
<intro id="S0001">
<title>Introduction Title</title>
<para>This is a paragraph. Note that paragraphs can contain other block–level objects, such as lists, as well as directly containing text.</para>
<para>The introduction can contain all of the text objects that a section can contain, except that it cannot be divided into parts, sections and sub–sections.</para>
<para>The introduction can contain tables:</para>
</intro><part>
<no>Part A</no> Article Structure <sup>&lpar;Part Title&rpar;</sup><section1 id="S0002">
<no>Sect 1</no>
<title>First Section in Part 1 <sup>&lpar;Section 1 Title&rpar;</sup></title>
<shortsectionhead>Short Section Header</shortsectionhead>
<para>This is a section in the first part of the article.</para>
</section1><section1 id="S0003">
Code:
XmlNode partNnode = xmlDoc.SelectSingleNode("//part");
XmlNode introNode=xmlDoc.SelectSingleNode("//intro");
XmlDocumentFragment newNode=xmlDoc.CreateDocumentFragment();
newNode.InnerXml=partNnode.OuterXml;
introNode.ParentNode.InsertAfter(newNode,introNode);
partNnode.ParentNode.RemoveChild(partNnode);
partNnode = xmlDoc.SelectSingleNode("//part");
XmlNodeList nodeList = xmlDoc.SelectNodes("//section1");
foreach (XmlNode refrangeNode in nodeList)
{
    newNode = xmlDoc.CreateDocumentFragment();
    newNode.InnerXml = refrangeNode.OuterXml;
    partNnode.AppendChild(newNode);
}
Please help me
Thanks in advance
If you load and save an XML file with C#, the XML should be formatted correctly (an easy way to format strange-looking XML files is just to load and save them with some C# code).
If I understand your question correctly, you are just not happy with the format of the XML file?
As in, you want (A):
</intro><part>
But you get (B):
</intro>
<part>
If that is the question then, in my eyes, you want a strange thing, because:
a) Code doesn't care how the XML file is formatted, and
b) The format in (B) is the correct one.
If, for whatever reason, you want to change it, you have to open the XML file as a string and check manually for closing and opening tags.
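The load-and-save approach mentioned above could look roughly like this. It is a sketch, not the asker's code: `XmlTidy` and the sample document are made up, and it relies on `PreserveWhitespace = false` to drop whitespace-only nodes on load and on `XmlWriterSettings.Indent` to put each element on its own line on save.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlTidy
{
    // Reloads the XML without insignificant whitespace and writes it back indented.
    public static string Reindent(string xml)
    {
        // Whitespace-only nodes (including leftover blank lines) are dropped on load.
        var doc = new XmlDocument { PreserveWhitespace = false };
        doc.LoadXml(xml);

        // OmitXmlDeclaration keeps the sample output free of an XML declaration.
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            doc.Save(writer);
        }
        return output.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical input with blank lines left behind by node removal.
        string messy = "<root>\n\n<intro><title>T</title></intro>\n\n\n<part><no>A</no></part>\n</root>";
        Console.WriteLine(XmlTidy.Reindent(messy));
    }
}
```

One caveat: the writer does not re-indent elements with mixed content (text sitting next to child elements, as in the `<part>` element of the question), so those stay on one line.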

Creating XSD Dynamically

I have two inputs. I get as input one XML file. I have to create an XSD file for this XML file. This XML file has tags which depend on another input. But that XML file should have certain tags for sure. For example, the XML file has the following structure :
<A>
<B>
<C>...</C>
<D>...</D>
<E>
<F>...</F>
<G>...</G>
</E>
</B>
</A>
Here, in this XML file, the A, B and E tags are compulsory. But the tags C and D inside the B tag and the tags F and G inside the E tag depend on another input. So I need to create an XSD dynamically (I know that the A, B and E tags must be present, and I know about the other tags from the other input) and validate the input XML file against that XML Schema. Can someone tell me how I can do this in C#?
I have no idea what you're asking.
An XSD is a blue-print for constructing a business-valid XML document. You do not generally create XSD documents dynamically. You obtain an XSD document so that you can create an XML document that will be valid in a specific business usage or validate XML documents against that schema.
I know XML serialization in C# is covered in great depth on the web.
Have you looked at XSLT yet? It's very useful for creating one XML file based on another. If you can access an XSLT engine from C# (I guess that's possible), I can help you set up the XSLT stylesheet.
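If the schema really does have to be assembled at run time, as the question describes, the System.Xml.Schema object model can build one in memory. The following is only a sketch under several assumptions that are not in the question: every variable tag is a simple string element, the children appear in a fixed sequence with E last inside B, and the second input arrives as two string arrays. `DynamicSchema` and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Schema;

public static class DynamicSchema
{
    // A leaf element holding plain text (assumed type: xs:string).
    private static XmlSchemaElement Leaf(string name)
    {
        return new XmlSchemaElement
        {
            Name = name,
            SchemaTypeName = new XmlQualifiedName("string", XmlSchema.Namespace)
        };
    }

    // An element whose content is a fixed sequence of child elements.
    private static XmlSchemaElement Container(string name, IEnumerable<XmlSchemaElement> children)
    {
        var sequence = new XmlSchemaSequence();
        foreach (XmlSchemaElement child in children)
        {
            sequence.Items.Add(child);
        }
        return new XmlSchemaElement
        {
            Name = name,
            SchemaType = new XmlSchemaComplexType { Particle = sequence }
        };
    }

    // A, B and E are always present; the names of their leaf children come from the caller.
    public static XmlSchemaSet Build(string[] bChildren, string[] eChildren)
    {
        XmlSchemaElement e = Container("E", eChildren.Select(n => Leaf(n)));
        XmlSchemaElement b = Container("B", bChildren.Select(n => Leaf(n)).Concat(new[] { e }));
        XmlSchemaElement a = Container("A", new[] { b });

        var schema = new XmlSchema();
        schema.Items.Add(a);

        var set = new XmlSchemaSet();
        set.Add(schema);
        set.Compile();
        return set;
    }

    // Returns false if the validating reader reports any error or warning.
    public static bool IsValid(string xml, XmlSchemaSet schemas)
    {
        bool valid = true;
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema, Schemas = schemas };
        // Also report warnings, e.g. a root element the schema does not declare.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (sender, args) => valid = false;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return valid;
    }
}

public static class Program
{
    public static void Main()
    {
        XmlSchemaSet schemas = DynamicSchema.Build(new[] { "C", "D" }, new[] { "F", "G" });
        string xml = "<A><B><C>1</C><D>2</D><E><F>3</F><G>4</G></E></B></A>";
        Console.WriteLine(DynamicSchema.IsValid(xml, schemas));
    }
}
```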

How to get the text from XML with position in the XML file?

I want to parse HTML (you can treat it as XML, converted via Tidy) and get all the text nodes (meaning the visible nodes inside the body tag) and their locations in the XML file. Location means the position of the text in the flat XML file.
XmlTextReader implements IXmlLineInfo - if you look at the docs for IXmlLineInfo it gives an example of reading an XML file and reporting the location of each node.
EDIT: For those saying it's irrelevant, it may well be irrelevant to the XML - but quite possibly not to a human. If you're trying to tell people where to look in the XML for particular bits, it can be very helpful to report line numbers and positions.
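A minimal sketch of the IXmlLineInfo approach, using a reader from XmlReader.Create rather than XmlTextReader directly: the reader is cast to IXmlLineInfo and, where line information is available, the line and column of each text node are recorded. `TextLocator` and the sample markup are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TextLocator
{
    // Returns one "line:column text" entry per text node in the document.
    public static List<string> Locate(string xml)
    {
        var found = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Readers that parse text input expose positions via IXmlLineInfo.
            var lineInfo = reader as IXmlLineInfo;
            while (reader.Read())
            {
                // Whitespace-only nodes have a different NodeType and are skipped.
                if (reader.NodeType == XmlNodeType.Text
                    && lineInfo != null && lineInfo.HasLineInfo())
                {
                    found.Add(lineInfo.LineNumber + ":" + lineInfo.LinePosition + " " + reader.Value);
                }
            }
        }
        return found;
    }
}

public static class Program
{
    public static void Main()
    {
        string xml = "<body>\n<p>Hello</p>\n<p>World</p>\n</body>";
        foreach (string entry in TextLocator.Locate(xml))
        {
            Console.WriteLine(entry);
        }
    }
}
```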
The SAX specification for reading XML (which almost all XML tools implement) provides a ContentHandler with a Locator which allows you to get the line and character (column) number.
int getColumnNumber()
Return the column number where the current document event ends.
int getLineNumber()
Return the line number where the current document event ends.
(I missed the requirement for C#. The example above is for Java but I will try to find the corresponding C# interface).
The event could be a string of characters.
SAX for .NET is described in:
http://saxdotnet.sourceforge.net/
You should not rely on text position in an XML file (whitespace is completely ignored by any sane parser). What you can (and should) do is use XPath to identify the nodes you are interested in, and then extract the text from those nodes. If you're interested in just the text nodes, then the query "//text()" will grab all of them.
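In C#, the "//text()" query can be run through XmlDocument.SelectNodes. A short sketch follows; `TextNodes` is a hypothetical helper and the sample markup is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TextNodes
{
    // Returns the value of every text node matched by the XPath query //text().
    public static List<string> All(string xml)
    {
        // With the default PreserveWhitespace = false, whitespace-only
        // text between elements is not loaded, so it is not returned.
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var values = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//text()"))
        {
            values.Add(node.Value);
        }
        return values;
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (string text in TextNodes.All("<body><p>Hello</p><p>World</p></body>"))
        {
            Console.WriteLine(text);
        }
    }
}
```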
