Special Character Issue in XML Creation using SQLXMLBULKLOADLib - C#

I am trying to load a csv file using SQLXMLBULKLOADLib, which first converts the csv to XML and then maps it to the database model. My csv file contains special characters. When SQLXMLBULKLOADLib loads it into XML, the special characters are converted to a different representation like , ,  etc. I am not aware of what , ,  is. How do I handle this in XML and SQL Server? I need to show the exact special characters present in the csv file.

I got an answer from http://social.msdn.microsoft.com/Forums/en-au/sqlxml/thread/5c46314d-ec4c-4ec2-91c9-7ceb466af3c6

Related

Convert file path to UTF-8

I want to get, print and write to a text file the full path on disk of a file named A&T+X-8_L_R1.png, but when I print it I get A&amp;T+X-8_L_R1.png.
AFAIK I need to change the encoding. I did a search and found this potential solution but it doesn't work:
String filePathString = relativeUri.ToString();
byte[] bytes = Encoding.Default.GetBytes(filePathString);
filePathString = Encoding.UTF8.GetString(bytes);
filePathNode.SetValue(filePathString);
This is the full code of my class: http://pastebin.com/dZLGeS8p
The class searches recursively for *.png files and creates an XML structure from their paths. When I save the XML file the special characters from the paths like & are changed.
Can anyone point me to a solution?
You are writing an XML file, not a plain text file. In XML, an ampersand needs to be escaped as &amp;.
So the result you get is perfectly ok. It's even required to be like this.
I recommend opening the XML file with an application that can properly validate and display XML. It'll be easier to see that the file is correct.
The UTF-8 conversion in your code isn't required. If the XML file is encoded in UTF-8, your XML classes will take care of any required conversions.
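To see that the escaping is only a serialization detail, here is a minimal sketch of a round trip using LINQ to XML. The class name AmpersandRoundTrip and the element name "file" are made up for illustration; they are not from the pastebin code.

```csharp
using System;
using System.Xml.Linq;

public static class AmpersandRoundTrip
{
    // Builds a single <file> element holding the path and returns its markup.
    public static string Serialize(string path)
    {
        return new XElement("file", path).ToString();
    }

    // Parses the markup again and returns the decoded text content.
    public static string Deserialize(string xml)
    {
        return XElement.Parse(xml).Value;
    }

    public static void Main()
    {
        string path = "A&T+X-8_L_R1.png";
        string xml = Serialize(path);
        Console.WriteLine(xml);               // the ampersand is escaped in the markup
        Console.WriteLine(Deserialize(xml));  // the parser returns the original text
    }
}
```

Any XML-aware reader should give you back the original path, so there is nothing to "fix" in the saved file.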

How to write '&' in xml?

I am using xmlTextWriter to create the xml.
writer.WriteStartElement("book");
writer.WriteAttributeString("author", "j.k.rowling");
writer.WriteAttributeString("year", "1990");
writer.WriteString("&");
writer.WriteEndElement();
But now I need to write '&', and xmlTextWriter automatically writes it as "&amp;".
So is there any workaround to do this?
I am creating the xml by reading a doc file. So if I read "-" then in the xml I need to write "&ndash;". But while writing, it is written as "&amp;ndash;".
So, for example, if I am trying to write a node with the text good-bad, I actually need to write my XML such as <node>good&ndash;bad</node>. This is a requirement of my project.
In a proper XML file, you cannot have a standalone & character unless it is an escape character. So if you need an XML node to contain good&ndash;bad, then it will have to be encoded as good&amp;ndash;bad. There is no workaround, as anything different would not be valid XML. The only way to make it work is to just write the XML file as plain text how you want it, but then it could not be read by an XML parser as it is not proper XML.
Here's a code example of my suggested workaround (you didn't specify a language, so I am showing you in C#, but Java should have something similar):
using (var sw = new StreamWriter(stream))
{
    // other code to write XML-like data
    sw.WriteLine("<node>good&ndash;bad</node>");
    // other code to write XML-like data
}
As you discovered, another option is the WriteRaw() method on XmlTextWriter (in C#), which writes an unencoded string, but that does not change the fact that the result is not going to be a valid XML file.
As I mentioned, if you tried to read this with an XML parser, it would fail because &ndash; is not a predefined XML entity, so the file is not valid XML.
&ndash; is an HTML character entity, so escaping it in XML should not normally be necessary.
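If the goal is just to get an en dash into the document, a sketch of two ways that stay valid XML is below: write the character itself, or write a numeric character reference with XmlWriter.WriteCharEntity. The class and method names (EnDashDemo, WriteLiteral, WriteNumericReference) are hypothetical, and this assumes the consumer accepts either form instead of the literal text &ndash;.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EnDashDemo
{
    private static readonly XmlWriterSettings Settings =
        new XmlWriterSettings { OmitXmlDeclaration = true };

    // Writes the en dash (U+2013) as a literal character.
    public static string WriteLiteral()
    {
        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, Settings))
        {
            writer.WriteStartElement("node");
            writer.WriteString("good\u2013bad");
            writer.WriteEndElement();
        }
        return output.ToString();
    }

    // Writes the en dash as a numeric character reference.
    public static string WriteNumericReference()
    {
        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, Settings))
        {
            writer.WriteStartElement("node");
            writer.WriteString("good");
            writer.WriteCharEntity('\u2013');
            writer.WriteString("bad");
            writer.WriteEndElement();
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteLiteral());
        Console.WriteLine(WriteNumericReference());
    }
}
```

Both outputs parse back to the same text, which is usually what matters to whatever reads the file. If the consumer really insists on the literal characters &ndash;, then it is not reading XML and the WriteRaw() caveat above still applies.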
In the XML language, & is the escape character, so &amp; is the appropriate string representation of &. You cannot use just a & character because the & character has a special meaning, and therefore a single & character would be misinterpreted by the parser.
You will see similar behavior with the <, >, ", and ' characters. All have meaning within the XML language, so they must be escaped if you need to represent them as text in a document.
Here's a reference to all of the character entities in XML (and HTML) from Wikipedia. Each will always be represented by the escape character and the name (&gt;, &lt;, &quot;, &apos;).
In XML, & must be escaped as &amp;. The & character is reserved for entities and thus not allowed otherwise. Entities are used to escape characters with special meanings in XML.
Other software reading the XML has to decode the entity again. &lt; for < and &gt; for > are other examples; some related markup languages such as HTML provide even more of these.
I think you will need to encode it. Like so:
colTest = "&"
writer.WriteEncodedText(colTest)

How to convert Binary data from database to String(text)

I have uploaded my MS Word file to a database in binary format. I am able to retrieve it back. But I am planning to open the Word file in read-only mode. I have done many operations on the file stored in the database, like track revisions, protection etc. Now I just want one thing to happen: I want to efficiently convert the binary data back to the text (string) that was originally stored in the database.
Here are some ways I am trying to get text back from binary but all of them return symbols(format not supported) rather than text.
string str1 = System.Text.ASCIIEncoding.ASCII.GetString(bytes);
string x = Encoding.ASCII.GetString(bytes).ToLower();
Any suggestions?
MS Word files are not "plain text". You cannot read them by merely using a text decoder.
Making a Word document read-only is something you do before saving it (Word.Document.Protect()).
If you are using .docx, it is a ZIP package of XML parts, so the text is already stored as XML (UTF-8) inside the archive.
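For the .docx case, a rough sketch of pulling the text out of the stored bytes is below. It assumes the main part lives at word/document.xml (the usual location, though strictly it is defined by the package relationships) and that System.IO.Compression is available. DocxText and ExtractText are made-up names. It will not work for the older binary .doc format.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML main namespace.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the paragraph text of a .docx held in memory, one line per paragraph.
    public static string ExtractText(byte[] docxBytes)
    {
        using (var stream = new MemoryStream(docxBytes))
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            // Assumption: the main document part is at the conventional path.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");

            using (Stream part = entry.Open())
            {
                XDocument doc = XDocument.Load(part);
                // Concatenate all text runs (w:t) inside each paragraph (w:p).
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length == 1)
            Console.WriteLine(ExtractText(File.ReadAllBytes(args[0])));
    }
}
```

This ignores headers, footnotes, tracked-change markup and so on; a library such as the Open XML SDK would be the more robust route if you need those.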

Reading only XML from a text file which contains text, binary, and XML data?

I have a text file (.txt) which has text data, binary data, and XML data all mixed together within it. I've googled around for a few minutes and cannot figure out how to only extract the XML from this text file. Can the good users of SO offer some suggestion?
I'm using C# 4.0.
Since I cannot simply load the text file into an XDocument, I've been messing with regex, but this approach is getting me nowhere.
First of all, a file can't be text and binary simultaneously: if it contains binary data, it's a binary file. But from your description it seems like it's a text file with some binary data in text-encoded form.
If you know what the root tag name is, then use a substring search to locate the start and end of the xml document, "cut" it, and then you can process it in any way you want.
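A minimal sketch of that substring approach, assuming the root element name is known, appears only once in the file, and has a separate closing tag. XmlCutter and ExtractXml are hypothetical names.

```csharp
using System;
using System.Xml.Linq;

public static class XmlCutter
{
    // Returns the text from the first "<rootName" up to and including the
    // first following "</rootName>", or null if either marker is missing.
    public static string ExtractXml(string content, string rootName)
    {
        string open = "<" + rootName;
        string close = "</" + rootName + ">";

        int start = content.IndexOf(open, StringComparison.Ordinal);
        if (start < 0) return null;

        int end = content.IndexOf(close, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return content.Substring(start, end - start + close.Length);
    }

    public static void Main()
    {
        string mixed = "header text \u0001\u0002 <config><a>1</a></config> trailing data";
        string xml = ExtractXml(mixed, "config");
        XDocument doc = XDocument.Parse(xml);   // now safe to load as XML
        Console.WriteLine(doc.Root.Name);
    }
}
```

Note the simple prefix match would also hit an element whose name merely starts with rootName, and nested elements with the same name as the root would cut the fragment short, so treat this as a starting point.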

Reading XML file with Invalid character

I am using DataSet.ReadXml() to read an XML string. I get an error because the XML string contains the invalid character 0x1F, which is 'US' (Unit Separator). This is contained within fully formed tags.
The data is extracted from an Oracle DB using a Perl script. What would be the best way to escape this character so that the XML is read correctly?
EDIT: XML String:
<RESULT>
<DEPARTMENT>Oncology</DEPARTMENT>
<DESCRIPTION>Oncology</DESCRIPTION>
<STUDY_NAME>**7360C hsd**</STUDY_NAME>
<STUDY_ID>27</STUDY_ID>
</RESULT>
Between the C and h in the bold part is where the US separator is; when pasted here it actually shows as a space. So I want to know how I can ignore that in an XML string.
If you look at section 2.2 of the XML recommendation, you'll see that 0x1F is not in the range of characters allowed in XML documents. So while the string you're looking at may look like an XML document to you, it isn't one.
You have two problems. The relatively small one is what to do about this document. I'd probably preprocess the string and discard any character that's not legal in well-formed XML, but then I don't know anything about the relatively large problem.
And the relatively large problem is: what's this data doing in there in the first place? What purpose (if any) do non-visible ASCII characters in the middle of a (presumably) human-readable data field serve? Why doesn't the Perl script that produces this string fail when it encounters an illegal character?
I'll bet you one American dollar that it's because the person who wrote that script is using string manipulation and not an XML library to emit the XML document. Which is why, as I've said time and again, you should never use string manipulation to produce XML. (There are certainly exceptions. If you're writing a throwaway application, for instance, or an XML parser. Or if your name's Tim Bray.)
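For the small problem, a sketch of the preprocessing step could look like the following. It drops every character that XmlConvert.IsXmlChar rejects while keeping valid surrogate pairs. XmlSanitizer and StripInvalidXmlChars are names invented for this example.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlSanitizer
{
    // Removes characters that are not allowed in XML 1.0 documents
    // (for example 0x1F), keeping well-formed surrogate pairs intact.
    public static string StripInvalidXmlChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (i + 1 < input.Length && char.IsSurrogatePair(c, input[i + 1]))
            {
                // Characters above U+FFFF are legal; copy both halves.
                sb.Append(c).Append(input[i + 1]);
                i++;
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string raw = "<STUDY_NAME>7360C\u001Fhsd</STUDY_NAME>";
        string clean = StripInvalidXmlChars(raw);
        Console.WriteLine(XElement.Parse(clean).Value);
    }
}
```

The cleaned string can then be handed to DataSet.ReadXml through a StringReader. Whether silently dropping the separator is acceptable, or it should become a space, depends on what the field is supposed to contain.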
Your XmlReader/TextReader must be created with the correct encoding. You can create it as below and pass it to your DataSet:
StreamReader reader = new StreamReader("myfile.xml",Encoding.ASCII); // or correct encoding
myDataset.ReadXml(reader);
