XML invalid chars replacement C# - c#

I have a string that is displayed as XML, but it contains some invalid characters, for example:
string s = "<root> something here <XMLElement>hello</XMLElement> somethig here too </root>";
where XMLElement stands for any entry of a list such as XMLElements = {"bold", "italic", ...}.
What I need is to replace the < and </ that are followed by any of the XMLElements, and the matching >, with &lt; and &gt; as appropriate.
The <root> element must be kept.
So far I have tried some regex:
strAux = Regex.Replace(strAux, "bold=\"[^\"]*\"",
    match => match.Value.Replace("<", "&lt;").Replace(">", "&gt;"));
or
List<string> startsWith = new List<string> { "<", "</" };
foreach (var stw in startsWith)
{
    int nextLt = 0;
    while ((nextLt = strAux.IndexOf(stw, nextLt)) != -1)
    {
        bool isMatch = strAux.Substring(nextLt + 1).StartsWith(BoldElement); // needs to check all the XMLElements
        // is element, leave it
        if (isMatch)
        {
            // it's not, replace
            strAux = string.Format(@"{0}&lt;{1}", strAux.Substring(0, nextLt), strAux.Substring(nextLt + 1, strAux.Length - (nextLt + 1)));
        }
        nextLt++;
    }
}
Also tried
XmlDocument doc = new XmlDocument();
XmlElement element = doc.CreateElement("root");
element.InnerText = strAux;
Console.WriteLine(element.OuterXml);
strAux = element.OuterXml.Replace("<root>", "").Replace("</root>", "");
return strAux;
But it will repeat the `<root>` too.
Nothing worked as I expected. Are there any other ideas? Thanks.

What you have is well-formed XML, so you can use the XML APIs to help you:
Using LINQ to XML (which is generally the better API):
var element = XElement.Parse(s);
element.Value = string.Concat(element.Nodes());
var result = element.ToString();
Or using the older XmlDocument API:
var doc = new XmlDocument();
doc.LoadXml(s);
var root = doc.DocumentElement;
root.InnerText = root.InnerXml;
var result = root.OuterXml;
The result for both is:
<root> something here &lt;XMLElement&gt;hello&lt;/XMLElement&gt; somethig here too </root>
See this fiddle for a demo.

You should be using the XmlWriter class.
Sample from the documentation:
XmlWriterSettings settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.ConformanceLevel = ConformanceLevel.Fragment;
settings.CloseOutput = false;
// Create the XmlWriter object and write some content.
MemoryStream strm = new MemoryStream();
XmlWriter writer = XmlWriter.Create(strm, settings);
writer.WriteElementString("someNode", "someValue");
writer.Flush();
writer.Close();
https://msdn.microsoft.com/en-us/library/system.xml.xmlwriter(v=vs.110).aspx

It sounds like your input is well-formed XML, but you want to escape some of the tags. The issue here is that there's no way for the code to know which tags are valid and which aren't.
One way to do this is to create a list of valid tags.
List<string> validTags = new List<string>() { "root", "..." };
Then use a regex to pick out all instances of <tag> or </tag> and replace them if they're not in the list.
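A minimal sketch of that first approach follows. It assumes the tags carry no attributes and that only `root` is valid; the class and method names (`TagEscaper`, `EscapeUnknownTags`) are made up for illustration and are not from any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagEscaper
{
    // Matches <name> or </name> with no attributes; group 1 captures the tag name.
    private static readonly Regex SimpleTag = new Regex(@"</?([A-Za-z_][\w.\-]*)>");

    // Escapes the angle brackets of every simple tag whose name is not in validTags.
    public static string EscapeUnknownTags(string input, ISet<string> validTags)
    {
        return SimpleTag.Replace(input, m =>
            validTags.Contains(m.Groups[1].Value)
                ? m.Value
                : m.Value.Replace("<", "&lt;").Replace(">", "&gt;"));
    }

    public static void Main()
    {
        // Assumption: "root" is the only tag that should survive as markup.
        var validTags = new HashSet<string> { "root" };
        string s = "<root> something here <XMLElement>hello</XMLElement> somethig here too </root>";
        Console.WriteLine(EscapeUnknownTags(s, validTags));
        // prints: <root> something here &lt;XMLElement&gt;hello&lt;/XMLElement&gt; somethig here too </root>
    }
}
```

Tags with attributes or self-closing tags would need a more permissive pattern, which is one reason the string-based route is fragile.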
Another way which is faster and easier, but requires more information up front, is to create a list of tags which aren't valid.
List<string> invalidTags = new List<string>() { "XMLElement", "..." };
Simple string manipulation will do, now.
string s = GetYourXMLString();
invalidTags.ForEach(t => s = s.Replace($"</{t}>", $"&lt;/{t}&gt;")
                              .Replace($"<{t}>", $"&lt;{t}&gt;"));
The second way should really only be used if you know which foreign tags are making (or will ever make) an appearance. If not, the first approach should be used. One clever possibility is to dynamically create the list of valid tags using reflection or a data contract so that changes to the XML spec will be automatically reflected in your code.
For example, if each element is a property of an object, you might get the list like this:
var validTags = typeof(MyObjectType).GetProperties()
    .Select(p => p.Name) // PropertyInfo exposes the property name as Name
    .ToList();
Of course, the property names likely won't be the actual tag names, AND often you'll want to only include certain properties. So you make an attribute class to designate the desired properties (let's call it XMLTagName) and then you can do this:
var validTags = typeof(MyObjectType).GetProperties()
.Select(p => p.GetCustomAttribute<XMLTagName>()?.TagName)
.Where(tagName => tagName != null) //gets rid of properties that aren't tagged
.ToList();
Even with all that, you're still committing the crime of string manipulation on raw XML. After all, the best real solution here is to figure out how to fix the incoming XML to actually contain the data you want. But if that's not a possibility, the above should do the job.
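For completeness, here is one way the attribute class mentioned above might look. This is only a sketch: `XMLTagName` is the name used in this answer, while `MyObjectType`, its properties, and the `ValidTagList` helper are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Marks a property as corresponding to an XML tag with the given name.
[AttributeUsage(AttributeTargets.Property)]
public sealed class XMLTagName : Attribute
{
    public string TagName { get; }
    public XMLTagName(string tagName) { TagName = tagName; }
}

// Hypothetical type whose tagged properties define the valid tags.
public class MyObjectType
{
    [XMLTagName("root")]
    public string Root { get; set; }

    [XMLTagName("title")]
    public string Title { get; set; }

    public int NotATag { get; set; } // untagged, so it is skipped
}

public static class ValidTagList
{
    // Collects the tag names of all properties carrying the attribute.
    public static List<string> For(Type type)
    {
        return type.GetProperties()
                   .Select(p => p.GetCustomAttribute<XMLTagName>()?.TagName)
                   .Where(tagName => tagName != null)
                   .ToList();
    }

    public static void Main()
    {
        // Reflection does not guarantee property order, so the order of the output may vary.
        Console.WriteLine(string.Join(", ", For(typeof(MyObjectType))));
    }
}
```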

Related

Getting innertext of XML node

When I try M Adeel Khalid's code I get nothing, and when I try the others I get errors. I'm missing something, but I can't see it. My code looks like this, but I get an error on Descendants saying "XmlDocument does not contain a definition for Descendants". As you can probably see, I'm pretty new to this, so bear with me.
protected void btnRetVare_Click(object sender, EventArgs e)
{
fldRetVare.Visible = true;
try
{
functions func = new functions();
bool exists = func.checForMatch(txtRetVare.Text);
string myNumber = txtRetVare.Text;
if (Page.IsValid)
{
if (!exists)
{
txtRetVare.Text= "Varenummer findes ikke";
}
else
{
XmlDocument xmldoc = new XmlDocument();
//xmldoc.Load(Server.MapPath(map));
xmldoc.LoadXml(Server.MapPath(map));
//var Varenummer2055component = xmldoc.SelectNodes("s/Reservedele/Component[Varenummer/text()='"+txtRetVare+"']/Remarks");
//if (Varenummer2055component.Count == 1)
//{
// var remarks = Varenummer2055component[0].InnerText;
// txtRetBemærkninger.Text = remarks.ToString();
//}
string remarks = (from xml2 in xmldoc.Descendants("Component")
where xml2.Element("Varenummer").Value == txtRetVare.Text
select xml2.Element("Remarks")).FirstOrDefault().Value;
txtRetBemærkninger.Text = remarks;
}
}
You can get it this way.
XDocument xdoc = XDocument.Load(XmlPath);
string remarks = (from xml2 in xdoc.Descendants("Component")
where xml2.Element("Varenummer").Value == "2055"
select xml2.Element("Remarks")).FirstOrDefault().Value;
I've tested this code.
Hope it helps.
Use XPath to select the correct node:
XmlDocument xml = new XmlDocument();
xml.LoadXml(@"
<Reservedele>
<Component>
<Type>Elektronik</Type>
<Art>Wheel</Art>
<Remarks>erter</Remarks>
<Varenummer>2055</Varenummer>
<OprettetAf>jg</OprettetAf>
<Date>26. januar 2017</Date>
</Component>
<Component>
<Type>Forbrugsvarer</Type>
<Art>Bulb</Art>
<Remarks>dfdh</Remarks>
<Varenummer>2055074</Varenummer>
<OprettetAf>jg</OprettetAf>
<Date>27. januar 2017</Date>
</Component>
</Reservedele>");
var Varenummer2055component = xml.SelectNodes("/Reservedele/Component[Varenummer/text()='2055']/Remarks");
if (Varenummer2055component.Count == 1)
{
var remarks = Varenummer2055component[0].InnerText;
}
I think the First extension method of LINQ to XML is simple enough and meets the requirements of your question.
var document = XDocument.Load(pathTopXmlFile);
var remark =
document.Descendants("Component")
.First(component => component.Element("Varenummer").Value.Equals("2055"))
.Element("Remarks")
.Value;
The First method will throw an exception if the XML doesn't contain an element with Varenummer = 2055.
If there is a possibility that the given number doesn't exist in the XML file, you can use the FirstOrDefault extension method and add a null check:
var document = XDocument.Load(pathTopXmlFile);
var component =
document.Descendants("Component")
.FirstOrDefault(comp => comp.Element("Varenummer").Value.Equals("2055"));
var remark = component != null ? component.Element("Remarks").Value : null;
To save a new value you can use the same approach and, after setting the new value, save the document back to the same file:
var document = XDocument.Load(pathTopXmlFile);
var component =
document.Descendants("Component")
.FirstOrDefault(comp => comp.Element("Varenummer").Value.Equals("2055"));
component.Element("Remarks").Value = newValueFromTextBox;
document.Save(pathTopXmlFile);
One more approach, which is overkill in your particular case but can be useful if you use other values from the XML, is serialization.
You can create a class that represents the data of your XML file and then use serialization to load and save the data (see examples of XML serialization).
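A rough sketch of the serialization approach, assuming the XML has the Reservedele/Component shape shown in the XPath answer. The class names and the `ReservedeleStore` helper are made up for illustration; only `XmlSerializer` and its attributes come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

[XmlRoot("Reservedele")]
public class Reservedele
{
    // Each <Component> child of the root becomes one list entry.
    [XmlElement("Component")]
    public List<Component> Components { get; set; } = new List<Component>();
}

public class Component
{
    public string Type { get; set; }
    public string Art { get; set; }
    public string Remarks { get; set; }
    public string Varenummer { get; set; }
    public string OprettetAf { get; set; }
    public string Date { get; set; }
}

public static class ReservedeleStore
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Reservedele));

    // Deserializes the whole document into objects.
    public static Reservedele Parse(string xml)
    {
        using (var reader = new StringReader(xml))
            return (Reservedele)Serializer.Deserialize(reader);
    }

    // Serializes the objects back to an XML string.
    public static string ToXml(Reservedele data)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, data);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var data = Parse(@"<Reservedele>
  <Component>
    <Type>Elektronik</Type><Art>Wheel</Art><Remarks>erter</Remarks>
    <Varenummer>2055</Varenummer><OprettetAf>jg</OprettetAf><Date>26. januar 2017</Date>
  </Component>
</Reservedele>");
        var component = data.Components.FirstOrDefault(c => c.Varenummer == "2055");
        if (component != null)
        {
            component.Remarks = "new remark"; // edit the object, then write everything back
            Console.WriteLine(ToXml(data));
        }
    }
}
```

To work with a file instead of a string, the same serializer can be given a `StreamReader` or `StreamWriter`.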

Get certain xml node and save the value

Considering the following XML:
<Stations>
<Station>
<Code>HT</Code>
<Type>123</Type>
<Names>
<Short>H'bosch</Short>
<Middle>Den Bosch</Middle>
<Long>'s-Hertogenbosch</Long>
</Names>
<Country>NL</Country>
</Station>
</Stations>
There are multiple nodes. I need the value of each node.
I've got the XML from a webpage (http://webservices.ns.nl/ns-api-stations-v2)
Login (--) Pass (--)
Currently I take the XML as a string and parse it into an XDocument.
var xml = XDocument.Parse(xmlString);
foreach (var e in xml.Elements("Long"))
{
var stationName = e.ToString();
}
You can retrieve "Station" nodes using XPath, then get each subsequent child node using more XPath. This example isn't using Linq, which it looks like you possibly are trying to do from your question, but here it is:
XmlDocument xml = new XmlDocument();
xml.Load(xmlStream);
XmlNodeList stations = xml.SelectNodes("//Station");
foreach (XmlNode station in stations)
{
var code = station.SelectSingleNode("Code").InnerXml;
var type = station.SelectSingleNode("Type").InnerXml;
var longName = station.SelectSingleNode("Names/Long").InnerXml;
var blah = "you should get the point by now";
}
NOTE: If your xmlStream variable is a String, rather than a Stream, use xml.LoadXml(xmlStream); for line 2, instead of xml.Load(xmlStream). If this is the case, I would also encourage you to name your variable to be more accurately descriptive of the object you're working with (aka. xmlString).
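This will give you all the values of "Long" for every Station element. Note that Descendants is needed rather than Elements, because Long is nested several levels below the document root.
var xml = XDocument.Parse(xmlString);
var longStationNames = xml.Descendants("Long").Select(e => e.Value);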
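If more than one value per station is needed, a LINQ to XML projection along these lines may be convenient. It is a sketch only: `StationInfo` and `StationParser` are hypothetical names, and the XML is assumed to have the Stations/Station shape from the question. Because Long sits under Station/Names, the query starts from `Descendants("Station")` and walks down from there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical holder for the values read from one <Station> element.
public class StationInfo
{
    public string Code;
    public string Type;
    public string LongName;
}

public static class StationParser
{
    public static List<StationInfo> Parse(string xmlString)
    {
        return XDocument.Parse(xmlString)
            .Descendants("Station")
            .Select(s => new StationInfo
            {
                // Casting an XElement to string yields null when the element is missing.
                Code = (string)s.Element("Code"),
                Type = (string)s.Element("Type"),
                LongName = (string)s.Element("Names")?.Element("Long")
            })
            .ToList();
    }

    public static void Main()
    {
        var stations = Parse(@"<Stations><Station><Code>HT</Code><Type>123</Type>
<Names><Short>H'bosch</Short><Middle>Den Bosch</Middle><Long>'s-Hertogenbosch</Long></Names>
<Country>NL</Country></Station></Stations>");
        foreach (var station in stations)
            Console.WriteLine(station.Code + ": " + station.LongName);
    }
}
```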

Xml within an Xml

I basically want to know how to insert a XmlDocument inside another XmlDocument.
The first XmlDocument will have the basic header and footer tags.
The second XmlDocument will be the body/data tag which must be inserted into the first XmlDocument.
string tableData = null;
using(StringWriter sw = new StringWriter())
{
rightsTable.WriteXml(sw);
tableData = sw.ToString();
}
XmlDocument xmlTable = new XmlDocument();
xmlTable.LoadXml(tableData);
StringBuilder build = new StringBuilder();
using (XmlWriter writer = XmlWriter.Create(build, new XmlWriterSettings { OmitXmlDeclaration = true }))
{
writer.WriteStartElement("dataheader");
//need to insert the xmlTable here somehow
writer.WriteEndElement();
}
Is there an easier solution to this?
Use importNode feature in your document parser.
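In .NET this corresponds to XmlDocument.ImportNode. A small sketch, assuming both documents are already available as strings; the element names and the `ImportNodeSketch` class are placeholders, not part of the question's code.

```csharp
using System;
using System.Xml;

public static class ImportNodeSketch
{
    // Loads both documents and appends a deep copy of the inner root under the outer root.
    public static string Merge(string outerXml, string innerXml)
    {
        var outer = new XmlDocument();
        outer.LoadXml(outerXml);

        var inner = new XmlDocument();
        inner.LoadXml(innerXml);

        // A node belongs to the document that created it, so it has to be
        // imported (copied) into the target document before it can be appended.
        XmlNode imported = outer.ImportNode(inner.DocumentElement, true);
        outer.DocumentElement.AppendChild(imported);

        return outer.OuterXml;
    }

    public static void Main()
    {
        Console.WriteLine(Merge("<dataheader></dataheader>", "<table><row>1</row></table>"));
    }
}
```

Unlike the CDATA route, the inserted content stays real XML that consumers can query.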
You can use this code based on CreateCDataSection method
// Create an XmlCDataSection from your document
var cdata = xmlTable.CreateCDataSection("<test></test>");
XmlElement root = xmlTable.DocumentElement;
// Append the cdata section to your node
root.AppendChild(cdata);
Link : http://msdn.microsoft.com/fr-fr/library/system.xml.xmldocument.createcdatasection.aspx
I am not sure what you are really looking for but this can show how to merge two xml documents (using Linq2xml)
string xml1 =
@"<xml1>
<header>header1</header>
<footer>footer</footer>
</xml1>";
string xml2 =
@"<xml2>
<body>body</body>
<data>footer</data>
</xml2>";
var xdoc1 = XElement.Parse(xml1);
var xdoc2 = XElement.Parse(xml2);
xdoc1.Descendants().First(d => d.Name == "header").AddAfterSelf(xdoc2.Elements());
var newxml = xdoc1.ToString();
OUTPUT
<xml1>
<header>header1</header>
<body>body</body>
<data>footer</data>
<footer>footer</footer>
</xml1>
You will need to write the inner XML files in CDATA sections.
Use writer.WriteCData for such nodes, passing in the inner XML as text.
writer.WriteCData(xmlTable.OuterXml);
Another option (thanks DJQuimby) is to encode the XML to some XML compatible format (say base64) - note that the encoding used must be XML compatible and that some encoding schemes will increase the size of the encoded document (base64 adds ~30%).

Parsing XML document using .Descendents(value)

I am trying to parse an xml document that I have created. However xml.Descendants(value) doesn't work if value has certain characters (including space, which is my problem).
My xml is structured like this:
<stockists>
<stockistCountry country="Great Britain">
<stockist>
<name></name>
<address></address>
</stockist>
</stockistCountry>
<stockistCountry country="Germany">
<stockist>
<name></name>
<address></address>
</stockist>
</stockistCountry>
...
</stockists>
And my C# code for parsing looks like this:
string path = String.Format("~/Content/{0}/Content/Stockists.xml", Helper.Helper.ResolveBrand());
XElement xml = XElement.Load(Server.MapPath(path));
var stockistCountries = from s in xml.Descendants("stockistCountry")
select s;
StockistCountryListViewModel stockistCountryListViewModel = new StockistCountryListViewModel
{
BrandStockists = new List<StockistListViewModel>()
};
foreach (var stockistCountry in stockistCountries)
{
StockistListViewModel stockistListViewModel = new StockistListViewModel()
{
Country = stockistCountry.FirstAttribute.Value,
Stockists = new List<StockistDetailViewModel>()
};
var stockist = from s in xml.Descendants(stockistCountry.FirstAttribute.Value) // point of failure for 'Great Britain'
select s;
foreach (var stockistDetail in stockist)
{
StockistDetailViewModel stockistDetailViewModel = new StockistDetailViewModel
{
StoreName = stockistDetail.FirstNode.ToString(),
Address = stockistDetail.LastNode.ToString()
};
stockistListViewModel.Stockists.Add(stockistDetailViewModel);
}
stockistCountryListViewModel.BrandStockists.Add(stockistListViewModel);
}
return View(stockistCountryListViewModel);
I am wondering whether I am approaching the XML parsing correctly, and whether I shouldn't have spaces in my attributes, etc. How can I fix it so that Great Britain will parse?
"However xml.Descendants(value) doesn't work if value has certain characters"
XElement.Descendants() expects an XName for the tag, not for the value.
And XML tags are indeed not allowed to contain spaces.
Your sample XML however only contains a value for an attribute, and the space there is fine.
Update:
I think you need
//var stockist = from s in xml.Descendants(stockistCountry.FirstAttribute.Value)
// select s;
var stockists = stockistCountry.Descendants("stockist");
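Put together, the nested loop could look roughly like the sketch below. The sample values ("Shop A", "1 High St") and the `StockistSketch` class are invented for illustration; reading the attribute by name with `Attribute("country")` is a suggestion, not something the original code did.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class StockistSketch
{
    // Returns one "country: name, address" line per stockist.
    public static List<string> Describe(XElement xml)
    {
        var lines = new List<string>();
        foreach (var country in xml.Elements("stockistCountry"))
        {
            // The attribute value is plain text, so the space in "Great Britain" is fine.
            string countryName = (string)country.Attribute("country");

            // Query the children of this country element, not the whole document.
            foreach (var stockist in country.Elements("stockist"))
            {
                string name = (string)stockist.Element("name");
                string address = (string)stockist.Element("address");
                lines.Add(countryName + ": " + name + ", " + address);
            }
        }
        return lines;
    }

    public static void Main()
    {
        var xml = XElement.Parse(@"<stockists>
  <stockistCountry country=""Great Britain"">
    <stockist><name>Shop A</name><address>1 High St</address></stockist>
  </stockistCountry>
</stockists>");
        foreach (var line in Describe(xml))
            Console.WriteLine(line);
        // prints: Great Britain: Shop A, 1 High St
    }
}
```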

XmlReader to return node as-is without children

I'm traversing a large XML document using XmlReader and stitching it into a much smaller and more manageable XmlDocument. Along the way, I find a node that's interesting so to move it I do this:
targetDoc.LoadXml("<result></result>");
// Some interesting code removed
using (XmlReader r = XmlReader.Create(file))
{
while (r.Read())
{
if (r.NodeType == XmlNodeType.Element)
{
if (r.Name == match)
{
// Put the node into the target document
targetDoc.FirstChild.InnerXml = r.ReadOuterXml();
return targetDoc;
}
}
}
}
This is all well and good, except I'd like to include the node without its descendants. What I'm interested in is the node itself with its attributes. The descendants are very large, bulky and uninteresting at this point. (And reading them into memory all at once will cause out of memory errors...)
Is there an easy way to get the text (?) of the found element with its attributes -- but not its descendants -- into the target document?
I don't think there's a built-in way of doing it. I think you have to read out the attributes and content yourself.
e.g.
static void Main(string[] args)
{
    var xml = @"<root>
<parent a1 = 'foo' a2 = 'bar'>Some Parent text
<child a3 = 'frob' a2= 'brob'> Some Child Text
</child>
</parent>
</root>";
    var file = new StringReader(xml);
    using (XmlReader r = XmlReader.Create(file))
    {
        while (r.Read())
        {
            if (r.NodeType == XmlNodeType.Element)
            {
                if (r.Name == "parent")
                {
                    var output = new StringBuilder();
                    var settings = new XmlWriterSettings();
                    settings.OmitXmlDeclaration = true;
                    using (var elementWriter = XmlWriter.Create(output, settings))
                    {
                        elementWriter.WriteStartElement(r.Name);
                        elementWriter.WriteAttributes(r, false);
                        elementWriter.WriteValue(r.ReadString());
                        elementWriter.WriteEndElement();
                    }
                    Console.WriteLine(output.ToString());
                }
            }
        }
    }
    if (System.Diagnostics.Debugger.IsAttached)
        Console.ReadLine();
}
Will produce
<parent a1="foo" a2="bar">Some Parent text</parent>
Press any key to continue . . .
You can try XmlNode.CloneNode(bool deep) method.
deep: true to recursively clone the subtree under the specified node; false to clone only the node itself.
Not necessarily a great way, but you can read the string until you get to the end of the start tag, and then manually append an end tag and load that into an XmlDocument.
edit:
Thinking something like:
string xml = r.ReadOuterXml();
int firstEndTag = xml.IndexOf('>');
int lastStartTag = xml.LastIndexOf('<');
string newXml = xml.Substring(0, firstEndTag) + xml.Substring(lastStartTag);
This might not be valid at all, given that there's a large string right there. Your way might be the best. Neither are pretty, but I personally can't think of a better way, given your constraints (which is not to say that a better way doesn't exist).
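Another option that avoids reading the descendants at all is to build the element in the target document directly from the reader's attributes. The sketch below assumes the element and its attributes use no namespaces; `ShallowCopySketch` and `CopyElementWithoutChildren` are made-up names.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ShallowCopySketch
{
    // Copies the first element named `match`, with its attributes but without
    // any child nodes, into a new <result> document.
    public static XmlDocument CopyElementWithoutChildren(TextReader input, string match)
    {
        var targetDoc = new XmlDocument();
        targetDoc.LoadXml("<result></result>");

        using (XmlReader r = XmlReader.Create(input))
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Element && r.Name == match)
                {
                    XmlElement copy = targetDoc.CreateElement(r.Name);

                    // From an element, MoveToNextAttribute moves to its first attribute.
                    while (r.MoveToNextAttribute())
                        copy.SetAttribute(r.Name, r.Value);

                    targetDoc.DocumentElement.AppendChild(copy);
                    return targetDoc; // stop before the reader reaches the children
                }
            }
        }
        return targetDoc; // no match: only the empty <result> element
    }

    public static void Main()
    {
        var xml = "<root><parent a1='foo' a2='bar'>Some Parent text<child a3='frob'/></parent></root>";
        var doc = CopyElementWithoutChildren(new StringReader(xml), "parent");
        Console.WriteLine(doc.OuterXml);
    }
}
```

Since the method returns as soon as the start tag has been handled, the bulky subtree is never read or buffered.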
