I'm having problems reading the ampersand symbol from an XML file:
XElement xmlElements = XElement.Load(Path_Xml_Data_File);
I get an error when I have:
<Name>Patrick & Phill</Name>
Error: Name cannot begin with the ' ' character, hexadecimal value 0x20. (System.Xml.XmlException)
Or with special Portuguese characters:
<Extra>Direc&ccedil;&atilde;o Assistida</Extra> (= <Extra>Direcção Assistida</Extra>)
Error: Reference to undeclared entity 'ccedil'
Any idea how to solve this problem?
I'm afraid that you're dealing with malformed XML.
To represent the ampersand, the data that you're loading should use the "&amp;" entity.
The &ccedil; (ç) and &atilde; (ã) named entities are not part of the XML standard; they are more commonly found in HTML (although they can be added to XML by the use of a DTD).
You could use HtmlTidy to tidy up the data first, or you could write something to convert the bare ampersands into entities on the incoming files.
For example:
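// Requires: using System.Text.RegularExpressions;
public string CleanUpData(string data)
{
    // Escape bare ampersands (an '&' followed by whitespace).
    var r = new Regex(@"&\s");
    string output = r.Replace(data, "&amp; ");
    // Swap the HTML-only named entities for numeric character references.
    output = output.Replace("&ccedil;", "&#xE7;");
    output = output.Replace("&atilde;", "&#xE3;");
    return output;
}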
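A slightly more general variant of the same idea could look like the sketch below. It is only an illustration: the names XmlSanitizer and Sanitize are hypothetical, and it assumes that WebUtility.HtmlDecode recognises the HTML entity names that occur in your data. It also rewrites the text blindly, so an '&' inside a CDATA section or a comment would be escaped as well.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class XmlSanitizer
{
    // The five entities that XML itself predefines; all others need a DTD.
    private static readonly string[] XmlEntities = { "amp", "lt", "gt", "quot", "apos" };

    public static string Sanitize(string data)
    {
        // Step 1: turn HTML named entities (e.g. &ccedil;) into numeric references.
        string output = Regex.Replace(data, @"&([A-Za-z][A-Za-z0-9]*);", m =>
        {
            if (Array.IndexOf(XmlEntities, m.Groups[1].Value) >= 0)
                return m.Value;                      // already valid XML
            string decoded = WebUtility.HtmlDecode(m.Value);
            if (decoded == m.Value)
                return m.Value;                      // unknown name: handled in step 2
            var sb = new StringBuilder();
            for (int i = 0; i < decoded.Length; i += char.IsSurrogatePair(decoded, i) ? 2 : 1)
                sb.Append("&#x").Append(char.ConvertToUtf32(decoded, i).ToString("X")).Append(';');
            return sb.ToString();
        });

        // Step 2: escape every '&' that does not start a reference XML understands.
        return Regex.Replace(output, @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)", "&amp;");
    }
}
```

The cleaned string can then be handed to XElement.Parse instead of loading the file directly with XElement.Load.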
I have a file, uploaded via the web, named "Schränke Wintsch.pdf".
The file name is saved in an XML file like so:
<File>Schra?nke Wintsch.pdf</File>
If I debug this in C# and manually add an ä, it is saved correctly:
<File>Schra?nke Wintsch-ä.pdf</File>
OK, I know it is an encoding problem.
But why is the same ä character represented with different char codes (see the example in image 2)?
XML defines the encoding used within the document in the declaration at the top of the file. It looks something like this: <?xml version="1.0" encoding="ISO-8859-9" ?>.
If you append the string, make sure to use the same encoding to avoid a mismatch.
Test appending the char bytes and see if that helps.
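// On .NET Core/.NET 5+, ISO-8859-9 may first need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
var en = Encoding.GetEncoding("ISO-8859-9");
// Round-trip through the same encoding instance (GetBytes is an instance method).
string roundTripped = en.GetString(en.GetBytes("ä"));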
The original XML that you have uses the Unicode 'COMBINING DIAERESIS' character (int value 776, U+0308), so ä is represented by two characters.
(Note how the combining character has been displayed as ? in the <File>Schra?nke Wintsch.pdf</File> image in your post.)
The 776 code says to put the double-dots above the previous character (an a).
However, where you typed in the ä, it has been stored as the single Unicode character with code 228 (U+00E4).
The question you need to answer is: Why is the original source XML using the "Combining Diaeresis" character rather than the more usual ä? (Without knowing the origin of the XML file, we cannot answer that question.)
Incidentally, you can "normalise" those sorts of characters by using string.Normalize(), as demonstrated by the following program:
using System;
namespace Demo
{
static class Program
{
static void Main()
{
char[] a = {(char)97, (char)776};
string s = new string(a);
Console.WriteLine(s + " -> " + s.Length); // Prints a¨ -> 2
var t = s.Normalize();
Console.WriteLine(t + " -> " + t.Length); // Prints ä -> 1
}
}
}
Note how the length of s is 2, but the length of t is only 1 (and it contains the single character ä).
So you might be able to improve things by using string.Normalize() to normalise these unexpected characters.
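Applied to the file-name case, a minimal sketch might normalise the name just before it is written to the XML. The helper name ToFileElement is hypothetical, and it assumes the <File> element from the question and that composed form (NFC) is what you want stored.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class FileNameXml
{
    // Hypothetical helper: builds the <File> element from an uploaded file name.
    public static XElement ToFileElement(string uploadedName)
    {
        // Compose "a" + COMBINING DIAERESIS (U+0308) into the single character "ä" (U+00E4).
        string normalized = uploadedName.IsNormalized(NormalizationForm.FormC)
            ? uploadedName
            : uploadedName.Normalize(NormalizationForm.FormC);
        return new XElement("File", normalized);
    }
}
```

For example, ToFileElement("Schra\u0308nke Wintsch.pdf") stores the name with the precomposed ä, so both spellings of the same file name end up identical in the XML.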
string.Normalize() is the working solution for the string "Schränke Wintsch-ä.pdf". It is now correctly saved as Schränke Wintsch-ä.pdf.
Is there a way/function/reg-ex in SQL Server or Visual Studio by which we can escape any character/special character within a string?
I have a page with several text fields where the user can enter any kind of string (including special characters). As a result, I show a JSON string of 'Key'/'Value' pairs built from those text field entries.
For example, I have these fields on a page:
Name , LastName , Address
And the entered values for above fields are:
Name : *-+-#. Wwweee4426554456666yyyy5uuuuttrrrreree6655zfgh\\][;'/.uuuuuuuu66uuyt,+_)(*&^%$##!~|}{:\\\"?><\\\\][;'/.,+_)(*&^%$##!~|}{:\\\"?><\\\\][;'/.,+_)(*&^%$##!~|}{:\\\"?><\\\\][;'/.,+_)(*&^%$##!~|}{:\
LastName : Piterson
Address : Park Road, LA
And I am showing the output as a JSON string like the one below:
[{"Key":"Name","Value":"*-+-#.Wwweee4426554456666yyyy5uuuuttrrrreree6655zfgh\\][;'/.uuuuuuuu66uuyt,+_)(*&^%$##!~|}{:\\\"?><\\\\][;'/.,+_)(*&^%$##!~|}{:\\\"?><\\\\][;'/.,+_)(*&^%$##!~|}{:\\\"?><\\\\][;'/.,+_)(*&^%$##!~|}{:\"},{"Key":"LastName","Value":"Piterson"},{"Key":"Address","Value":"Park Road, LA"}]
But while parsing this string I get the parsing error below:
"After parsing a value an unexpected character was encountered: K. Path '[4].Value', line 1, position 1246."
I am using the SQL Server function below to escape the string:
ALTER FUNCTION [dbo].[fnEscapeString](@text NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
    -- Escape backslashes first so the backslashes added for quotes are not doubled.
    IF (CHARINDEX('\', @text) > 0)
        SET @text = REPLACE(@text, '\', '\\')
    IF (CHARINDEX('"', @text) > 0)
        SET @text = REPLACE(@text, '"', '\"')
    RETURN @text
END
This function works in many other cases (with many other strings), but not with the string above. I think it is not sufficient to escape every kind of string.
So is there a way to turn any string into valid JSON? Maybe a regex or an SQL function can do that. Please suggest.
You can directly convert your table data to JSON in SQL Server 2016, for example:
SELECT name, surname
FROM emp
FOR JSON AUTO
In lower versions, however, you have to convert your SQL table data to XML and then to JSON.
Please refer to this link on parsing SQL data to JSON:
http://www.codeproject.com/Articles/815371/Data-Parsing-SQL-to-JSON
You can try this as mentioned here
var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
eval('(' + text + ')');
Try converting the input string to JSON by using:
a) System.Web.HttpUtility.JavaScriptStringEncode
string jsonEncoded = HttpUtility.JavaScriptStringEncode(s);
or
b) NuGet Package Newtonsoft.Json
string jsonEncoded = JsonConvert.ToString(s);
Reference: How to escape JSON string?
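An alternative to escaping each value by hand is to let a serializer build the whole Key/Value array. The sketch below assumes .NET Core 3.0 or later, where System.Text.Json is built in (neither of the two options above); the Field and FieldJson names are hypothetical. Note that the default encoder writes some characters such as & and < as \uXXXX escapes, which is still valid JSON.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical shape of one form field, matching the {"Key":..., "Value":...} output.
public sealed class Field
{
    public string Key { get; set; } = "";
    public string Value { get; set; } = "";
}

public static class FieldJson
{
    // The serializer escapes quotes, backslashes and control characters itself.
    public static string Serialize(IEnumerable<Field> fields) =>
        JsonSerializer.Serialize(fields);

    public static List<Field> Parse(string json) =>
        JsonSerializer.Deserialize<List<Field>>(json) ?? new List<Field>();
}
```

Because the serializer sees the raw value, input that ends in a backslash or contains quotes round-trips without any manual Replace calls.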
I am converting a string into UTF-8 bytes, but it does not accept special characters and does not convert them. Please help me convert these special characters as well in C#.
byte[] bytes = Encoding.UTF8.GetBytes("<Shipper>A & G VENLO BV</Shipper>");
Do not lead people astray. Your code throws a System.Xml.XmlException while parsing the XML.
The fact is that the string <Shipper>A & G VENLO BV</Shipper> is not well-formed XML. The & symbol in XML must be escaped.
You have to create XML using the right approach:
XmlDocument xmlDoc = new XmlDocument();
XmlElement shipper = xmlDoc.CreateElement("Shipper");
shipper.InnerText = "A & G VENLO BV";
xmlDoc.AppendChild(shipper);
As a result, you will get the well-formed XML:
<Shipper>A &amp; G VENLO BV</Shipper>
Now you can work with it:
byte[] bytes = Encoding.UTF8.GetBytes(shipper.OuterXml);
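The same thing can be sketched with LINQ to XML, which also escapes the ampersand when the element is serialized. The class and method names below are hypothetical.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class ShipperDemo
{
    public static byte[] BuildShipperBytes(string shipperName)
    {
        // The raw text goes in unescaped; XElement escapes '&' when it is written out.
        var shipper = new XElement("Shipper", shipperName);
        string xml = shipper.ToString(SaveOptions.DisableFormatting);
        return Encoding.UTF8.GetBytes(xml);
    }
}
```

Decoding the bytes from BuildShipperBytes("A & G VENLO BV") gives back <Shipper>A &amp; G VENLO BV</Shipper>.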
I am trying to execute the following code sample:
var xmlDocument = new XmlDocument();
string documentTagName = "testName)";
XmlNode headerElement = xmlDocument.CreateElement(documentTagName);
Of course I get an XmlException:
The ')' character, hexadecimal value 0x... (doesn't matter), cannot be included in a name
Because I have the ) symbol in documentTagName. And of course I'll get the same exception if documentTagName is like this:
documentTagName = "testName(";
or like this:
documentTagName = "testName:";
Because all of these characters ('(' , ')' , ':') are invalid for the XML tag name. I have checked many links but cannot find the list of all invalid characters for an XML tag name. Can anybody help me?
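Rather than maintaining a list of invalid characters by hand, one option is to let the framework check or encode the name. This is only a sketch: the XmlNames class and its method names are hypothetical, while XmlConvert.VerifyNCName, XmlConvert.EncodeLocalName and XmlConvert.DecodeName are existing System.Xml members.

```csharp
using System;
using System.Xml;

public static class XmlNames
{
    // True if the string can be used as an element name without a prefix.
    public static bool IsValidLocalName(string name)
    {
        try
        {
            XmlConvert.VerifyNCName(name);
            return true;
        }
        catch (XmlException)
        {
            return false; // contains a character that is not allowed in a name
        }
        catch (ArgumentNullException)
        {
            return false; // null or empty
        }
    }

    // Replaces invalid characters with escape sequences so the result is a valid name.
    public static string MakeSafe(string name) => XmlConvert.EncodeLocalName(name);
}
```

With this, xmlDocument.CreateElement(XmlNames.MakeSafe(documentTagName)) should no longer throw, and XmlConvert.DecodeName recovers the original string when the name is read back.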
I have an XML string that contains an apostrophe. I replace the apostrophe with its encoded equivalent (&#39;) and parse the revised string into an XElement. The XElement, however, is turning the &#39; back into an apostrophe.
How do I force XElement.Parse to preserve the encoded string?
string originalXML = @"<Description><data>Mark's Data</data></Description>"; //for illustration purposes only
string encodedApostrophe = originalXML.Replace("'", "&#39;");
XElement xe = XElement.Parse(encodedApostrophe);
This is correct behavior. In places where ' is allowed, it works the same as &apos;, &#39; or &#x27;. If you want to include the literal string &#39; in the XML, you should encode the &:
originalXML.Replace("'", "&amp;#39;")
Or parse the original XML and modify that:
XElement xe = XElement.Parse(originalXML);
var data = xe.Element("data");
data.Value = data.Value.Replace("'", "&#39;");
But doing this seems really weird. Maybe there is a better solution to the problem you're trying to solve.
Also, these encodings are not "ASCII equivalents"; they are called character entity references. And the numeric ones are based on the Unicode code point of the character.
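The behaviour described above can be checked with a small sketch. The class and method names are hypothetical.

```csharp
using System;
using System.Xml.Linq;

public static class ApostropheDemo
{
    // Returns the text the parser sees for the given XML fragment.
    public static string ParsedValue(string xml) => XElement.Parse(xml).Value;

    // Serializes an element whose text is the given string, taken literally.
    public static string SerializeLiteral(string text) =>
        new XElement("data", text).ToString(SaveOptions.DisableFormatting);
}
```

ParsedValue returns Mark's Data for <data>Mark&apos;s Data</data>, <data>Mark&#39;s Data</data> and <data>Mark&#x27;s Data</data> alike, which is why the parsed XElement shows a plain apostrophe. Going the other way, SerializeLiteral("Mark&#39;s Data") produces <data>Mark&amp;#39;s Data</data>, because the & of the literal text is escaped on output.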