I would like to load an XML file from an absolute path. I have tried this:
XmlDocument doc = new XmlDocument();
doc.Load(@"C:\Users\Accueil\Desktop\TestEDI\ARTest.xml");
But I get the error:
the character '<', hexadecimal value 0x3c, cannot be included in a name.
You will get this error if you have a use of < other than as the open tag of an xml element.
For example, <my<Element> could give you this error, because the parser finds the second < while it is expecting either part of the tag name for myElement or the closing tag >.
Another example would be that you wanted to use < in the body of some xml text:
<inequality>Here is an example of an inequality: x < 5</inequality>
The way to avoid this is to make sure that every use of '<' other than as the start of a tag is encoded as a proper XML entity; in this case, that would be &lt;
As Andy has said it looks as though you are using restricted characters in your xml file...
Taken from here...
This gives an error message:
<message>if salary < 1000 then</message>
This is fine:
<message>if salary &lt; 1000 then</message>
There are 5 pre-defined entity references in XML:
&lt; < less than
&gt; > greater than
&amp; & ampersand
&apos; ' apostrophe
&quot; " quotation mark
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than character is legal, but it is a good habit to replace it.
So replace those illegal characters, or consider using CDATA.
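A minimal sketch of both options, assuming LINQ to XML (System.Xml.Linq) is available. The class and method names are made up for illustration. XElement escapes text content when it is written out, and XCData wraps the text in a CDATA section instead:

```csharp
using System.Xml.Linq;

public static class EscapeSketch
{
    // Text content is escaped automatically when the element is written out.
    public static string Escaped(string text)
    {
        return new XElement("message", text).ToString();
    }

    // The same text wrapped in a CDATA section, where '<' and '&' may appear literally.
    public static string AsCData(string text)
    {
        return new XElement("message", new XCData(text)).ToString();
    }
}
```

Calling EscapeSketch.Escaped("if salary < 1000 then") yields the &lt; form, so the result can be parsed again without the error above.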
Try this:
DataSet ds = new DataSet();
// Open the .xml file for reading; FileShare.ReadWrite lets other processes keep reading or writing it meanwhile
using (FileStream fs = new FileStream("/*YOUR XML FILE PATH*/.xml", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (XmlReader xmlFile = XmlReader.Create(fs, new XmlReaderSettings())) // Wraps the stream in an XML reader
{
    ds.ReadXml(xmlFile); // Loads the XML into the DataSet, which can then be used as a data source (e.g. for a DataGridView)
}
In some C# code, I am parsing a string as XML using XPathDocument.
The string is retrieved from SDL Trados Studio. Depending on the XML being worked on (how it was originally created and loaded for translation), the string sometimes has a BOM and sometimes does not.
Edit: The 'xml' is actually parsed from the segments of the source and target text and the structure element. The textual elements are escaped for xml and the markup and text is joined in one string. So if the markup has BOM in the xliff, then the string will have BOM.
I am trying to actually parse any of the xmls, independent of encoding. So at this point my solution is to remove the BOM with Substring.
Here is my code:
//Recreate XML files (extractor returns two string arrays)
string strSourceXML = String.Join("", extractor.TextSrc);
string strTargetXML = String.Join("", extractor.TextTgt);
//strip BOM
strSourceXML = strSourceXML.Substring(strSourceXML.IndexOf("<?"));
strTargetXML = strTargetXML.Substring(strTargetXML.IndexOf("<?"));
//Transform XML with the preview XSL
var xSourceDoc = new XPathDocument(strSourceXML);
var xTargetDoc = new XPathDocument(strTargetXML);
I have searched for a better solution, through several articles, such as these, but I found no better solution yet:
XML - Data At Root Level is Invalid
Parsing XML with C#
Parsing complex XML with C#
Parsing : String to XML
XmlReader breaks on UTF-8 BOM
Any advice to solve this more elegantly?
The constructor of XPathDocument taking a String argument https://msdn.microsoft.com/en-us/library/te0h7f95%28v=vs.110%29.aspx takes a URI with the XML file location. If you have a string with XML markup then use a StringReader over that string e.g.
XPathDocument xSourceDoc;
using (TextReader tr = new StringReader(strSourceXML))
{
xSourceDoc = new XPathDocument(tr);
}
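For the BOM itself, one possible sketch, assuming the BOM arrives as a leading U+FEFF character in the already-decoded string. The class and method names are illustrative only:

```csharp
using System.IO;
using System.Xml.XPath;

public static class BomSketch
{
    public static XPathDocument Load(string xml)
    {
        // U+FEFF is how a BOM shows up once the bytes have been decoded into a string.
        string cleaned = xml.TrimStart('\uFEFF');
        using (TextReader tr = new StringReader(cleaned))
        {
            return new XPathDocument(tr);
        }
    }
}
```

Unlike Substring(IndexOf("<?")), this does not throw when the string has no XML declaration, and it leaves strings without a BOM untouched.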
When I save an existing XML file to a new location, entities in the content are lost and replaced with a question mark.
See the screenshots below: the entity for ‐ (a hyphen, written as a hex character reference) is present while reading, but it is replaced with a question mark after saving to another location.
While Reading as Inner XML
While Reading as Inner Text
After Saving XML File
EDIT 1
Below is my code
string path = @"C:\work\myxml.XML";
string pathnew = @"C:\work\myxml_new.XML";
//GetFileEncoding(path);
XmlDocument document = new XmlDocument();
XmlDeclaration xmlDeclaration = document.CreateXmlDeclaration("1.0","US-ASCII",null);
//document.CreateXmlDeclaration("1.0", null, null);
document.Load(path);
string x = document.InnerText;
document.Save(pathnew);
EDIT 2
My source file looks like the one below. I need to retain the entities as they are.
The issue here seems to be the handling of encoding of entity references by the specific XmlWriter implementation internal to XmlDocument.
The issue disappears if you create an XmlWriter yourself - the unsupported character will be correctly encoded as an entity reference. This XmlWriter is a different (and newer) implementation that sets an EncoderFallback that encodes characters as entity references for characters that can't be encoded. Per the remarks in the docs, the default fallback mechanism is to encode a question mark.
var settings = new XmlWriterSettings
{
Indent = true,
Encoding = Encoding.GetEncoding("US-ASCII")
};
using (var writer = XmlWriter.Create(pathnew, settings))
{
document.Save(writer);
}
As an aside, I'd recommend using the LINQ to XML XDocument API; it's much nicer to work with than the creaky old XmlDocument API. And its version of Save doesn't have this problem, either!
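To check the XmlWriter behaviour without touching the file system, here is a small sketch that saves to a MemoryStream instead of pathnew. The class name, method name, and sample input are assumptions for illustration:

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class AsciiSaveSketch
{
    public static string SaveAsAscii(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);

        // A writer created through XmlWriter.Create falls back to character
        // references for characters the target encoding cannot represent.
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };
        using (var ms = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                document.Save(writer);
            }
            return Encoding.ASCII.GetString(ms.ToArray());
        }
    }
}
```

With an input such as "<a>x\u2010y</a>", the hyphen should come back as a hex character reference rather than a question mark.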
I am working on some code in C# where I want the output as the text present at some xpath from some xml-file. Now as the xml file keeps changing and so do the namespaces, I don't want to hardcode the namespaces in the code.
XmlDocument xml = new XmlDocument();
xml.Load(file);
XmlNamespaceManager nsMgr = new XmlNamespaceManager(xml.NameTable);
Since the namespaces keep changing, I am thinking of reading the XML and using some string operations to get the namespace prefix and URI strings:
string s = System.IO.File.ReadAllText(file);
string[] s1 = new string[1];
s1[0]="xmlns:";
string[] s3 = s.Split(s1, System.StringSplitOptions.None);
foreach (string k in s3)
{
nsMgr.AddNamespace( k.Substring( 0 , k.IndexOf('=') - 1) , need help for this )
}
Some sample values of k after the split operation are:
"xs=\"http://www.w3.org/2001/XMLSchema\" "
"location=\"urn:x-ABC:content:location:mastering:1\" "
"entity=\"urn:x-ABC:content:identified-entities:mastering:1\" "
I need help on second parameter of nsMgr.AddNamespace(). Also if there is a cleaner way of adding namespaces without hardcoding.
EDIT: Clarifying what I am doing here. I am trying to write a WinForms application through which one can get the text present at some XPath in some XML. The form takes two inputs: one is the XML file location and the other is the XPath. The output should be the text at that XPath location.
For example
<a>
<b>abc
</b>
</a>
If the user searches for //b or a/b, then the output should be abc.
The code works fine when there are no namespaces, but when the XML has namespaces defined I need to include them in the code. As I want to make the code generic, I cannot hardcode the namespaces.
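One way to avoid both the hardcoding and the string splitting is to let the parser report the xmlns declarations. The sketch below assumes LINQ to XML is acceptable in place of XmlDocument. The class and method names, and the "d" prefix given to a default namespace, are hypothetical choices made for this example:

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class NamespaceSketch
{
    // Registers every xmlns declaration found anywhere in the document.
    public static XmlNamespaceManager CollectNamespaces(XDocument doc)
    {
        var nsMgr = new XmlNamespaceManager(new NameTable());
        foreach (XAttribute attr in doc.Descendants().Attributes().Where(a => a.IsNamespaceDeclaration))
        {
            // xmlns="..." has no prefix, but XPath 1.0 needs one: "d" is an arbitrary choice.
            string prefix = attr.Name.Namespace == XNamespace.None ? "d" : attr.Name.LocalName;
            nsMgr.AddNamespace(prefix, attr.Value);
        }
        return nsMgr;
    }

    // Returns the trimmed text of the first element matching the XPath, or null.
    public static string TextAt(string xml, string xpath)
    {
        XDocument doc = XDocument.Parse(xml);
        XElement match = doc.XPathSelectElement(xpath, CollectNamespaces(doc));
        return match == null ? null : match.Value.Trim();
    }
}
```

If the same prefix is bound to different URIs in different parts of the file, the last declaration wins in this sketch, so such documents would need extra handling.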
I am new to XML and I am now trying to read an xml file.
I googled and tried this way of reading the XML, but I get this error.
Reference to undeclared entity 'Ccaron'. Line 2902, position 9.
When I go to line 2902 I got this,
<H0742>&Ccaron;opova 14, POB 1725,
SI-1000 Ljubljana</H0742>
This is the way I try
XmlDocument xDoc = new XmlDocument();
xDoc.Load(file);
XmlNodeList nodes = xDoc.SelectNodes("nodeName");
foreach (XmlNode n in nodes)
{
if (n.SelectSingleNode("H0742") != null)
{
row.IrNbr = n.SelectSingleNode("H0742").InnerText;
}
.
.
.
}
When I look at w3school, & is illegal in xml.
EDIT :
This is the encoding. I wonder whether it's related to the problem somehow.
encoding='iso-8859-1'
Thanks in advance.
EDIT :
They gave me an .ENT file and I can reference online ftp.MyPartnerCompany.com/name.ent.
In this .ENT file
I see entities like that
<!ENTITY Cacute "Ć"> <!-- latin capital letter C with acute,
U+0106 Latin Extended-A -->
How can I reference it in my xml Parsing ?
I prefer to reference online since they may add new anytime.
Thanks in advance !!!
The first thing to be aware of is that the problem isn't in your software.
As you are new to XML, I'm going to guess that defining entities isn't something you've come across before. Character entities are shortcuts for arbitrary pieces of text (one or more characters). The most common place you are going to see them is in the situation you are in now. At some point, your XML has been created by someone who wanted to type the character 'Č' or 'č' (that's upper and lower case C with caron, if your font can't display it).
However, in XML we only have a few predeclared entities (ampersand, less than, greater than, double quote and apostrophe). Any other character entities need to be declared. In order to parse your file correctly you will need to do one of two things: either replace the character entity with something that doesn't cause the parser issues, or declare the entity.
To declare the entity, you can use something called an "internal subset" - a specialised form of the DTD statement you might see at the top of your XML file. Something like this:
<!DOCTYPE root-element
[ <!ENTITY Ccaron "&#x10C;">
<!ENTITY ccaron "&#x10D;">]
>
Placing that statement at the beginning of the XML file (change the 'root-element' to match yours) will allow the parser to resolve the entity.
Alternatively, simply change the &Ccaron; to &#x10C; and your problem will also be resolved.
The &# notation is a numeric entity, giving appropriate unicode value for the character (the 'x' indicates that it's in hex).
You could always just type the character too but that requires knowledge of the ins and outs of your keyboard and region.
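The internal-subset approach can be sketched as follows. This assumes the DOCTYPE is prepended to the XML text before parsing, and that DTD processing is switched on explicitly, since readers created with XmlReader.Create prohibit DTDs by default. The root element name and the class and method names are placeholders:

```csharp
using System.IO;
using System.Xml;

public static class EntitySketch
{
    public static string ReadWithInternalSubset()
    {
        // "root" stands in for the real root element of the file.
        const string xml =
            "<!DOCTYPE root [ <!ENTITY Ccaron \"&#x10C;\"> ]>" +
            "<root><H0742>&Ccaron;opova 14</H0742></root>";

        // Allow the reader to process the DOCTYPE and its entity declarations.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };
        var doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc.SelectSingleNode("/root/H0742").InnerText;
    }
}
```

Enabling DTD processing is only advisable for input from a trusted source. Pulling declarations from an external .ENT file would additionally need an XmlResolver, which this sketch does not cover.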
&Ccaron; isn't XML; it isn't even defined in the HTML 4 entity reference, which, by the way, isn't XML either. XML supports very few of those entities, but if you look up the entity and find it, you can use its Unicode equivalent instead. For example, &Scaron; is invalid XML but &#352; isn't. (Scaron was the closest I could find to Ccaron.)
Your XML file isn't well-formed and, so, can't be used as XmlDocument. Period.
You have two options:
Open that file as a regular text file and fix that symptom.
Fix your XML generator, which is your real problem. That generator isn't producing the file with System.Xml; it is probably concatenating several strings, on the theory that "XML is just a text file". You should repair it, or opening a generated XML file will always be a surprise.
EDIT: As you can't fix your XML generator, I recommend opening the file with File.ReadAllText and running a regular expression to re-encode that & or to strip out the entire entity (as we can't translate it):
Console.WriteLine(
    Regex.Replace("<H0742>&Ccaron;opova 14, { POB &amp; SI-1000 &</H0742>",
    @"&((?!#)\S*?;)?", match =>
    {
        switch (match.Value)
        {
            case "&lt;":
            case "&gt;":
            case "&amp;":
            case "&quot;":
            case "&apos;":
                return match.Value; // correctly encoded
            case "&":
                return "&amp;"; // bare ampersand: encode it
            default: // unknown entity, here you can choose:
                // to remove the entire entity:
                return "";
                // or to encode just its & character instead:
                // return "&amp;" + match.Value.Substring(1);
        }
    }));
&Ccaron; is an entity reference. It is likely that the entity reference is intended to stand for the character Č, in order to produce: Čopova.
However, that entity must be declared, or the XML parser will not know what should be substituted for the entity reference as it parses the XML.
Solution:
byte[] encodedString = Encoding.UTF8.GetBytes(xml);
// Put the byte array into a stream and rewind it to the beginning
MemoryStream ms = new MemoryStream(encodedString);
ms.Flush();
ms.Position = 0;
// Build the XmlDocument from the MemoryStream of UTF-8 encoded bytes
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(ms);
We're creating a system that outputs some data to an XML schema. Some of the fields in this schema need their formatting preserved, as the end system will parse them into what may be a Word document layout. To do this we're using <![CDATA[Some formatted text]]> tags inside the App.Config file, then putting that into an appropriate property field in an xsd.exe-generated class from our schema. Ideally the formatting wouldn't be our problem, but unfortunately that's just how the system is going.
The App.Config section looks as follows:
<header>
<![CDATA[Some sample formatted data]]>
</header>
The data assignment looks as follows:
HeaderSection header = ConfigurationManager.GetSection("header") as HeaderSection;
report.header = "<[CDATA[" + header.Header + "]]>";
Finally, the Xml output is handled as follows:
xs = new XmlSerializer(typeof(report));
fs = new FileStream (reportLocation, FileMode.Create);
xs.Serialize(fs, report);
fs.Flush();
fs.Close();
This should in theory produce a section in the final XML that has the information wrapped in CDATA tags. However, the angle brackets are being converted into &lt; and &gt;.
I've looked at ways of disabling output escaping, but so far can only find references to XSLT sheets. I've also tried @"<[CDATA[" with the strings, but again no luck.
Any help would be appreciated!
You're confusing markup with content.
When you assign the string "<![CDATA[ ... ]]>" to the value, you are saying that is the content that you wish to put in there. The XmlSerializer does not, and indeed should not, attempt to infer any markup semantics from this content, and simply escapes it according to the normal rules.
If you want CDATA markup in there, then you need to explicitly instruct the serializer to do so. Some examples of how to do this are here.
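One commonly used pattern, sketched here under the assumption that the generated class can be edited or extended, is to hide the string property from the serializer and expose an XmlCDataSection in its place. ReportSketch and its member names are hypothetical stand-ins for the xsd.exe-generated report class:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot("report")]
public class ReportSketch
{
    // The plain string the rest of the program works with; not serialized directly.
    [XmlIgnore]
    public string Header { get; set; }

    // Serialized in place of Header, so the serializer emits CDATA markup
    // instead of escaping the text as ordinary content.
    [XmlElement("header")]
    public XmlCDataSection HeaderCData
    {
        get { return new XmlDocument().CreateCDataSection(Header ?? ""); }
        set { Header = value == null ? null : value.Value; }
    }

    public static string Serialize(ReportSketch report)
    {
        var xs = new XmlSerializer(typeof(ReportSketch));
        using (var sw = new StringWriter())
        {
            xs.Serialize(sw, report);
            return sw.ToString();
        }
    }
}
```

With this shape, the assignment becomes report.Header = header.Header, with no CDATA delimiters concatenated onto the string.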
Have you tried changing
report.header = "<[CDATA[" + header.Header + "]]>";
to
report.header = "<![CDATA[" + header.Header + "]]>";