Edit Xml InnerText accessed as Plain Text using RegEx? - c#

I have the following dummy fake sample:
<family>
<member> dad </member>
<member> mum </member>
<member> son </member>
<member> grandad<> </member>
</family>
I have been given a document to convert into XML, but I have been unsuccessful so far. I have no control over how the (HTML) document given to me is created, but I need to convert it to XML so that I can transform it using a stylesheet.
TidyManaged and HAP are no good to me at this stage in my workflow. I will explain more if people are interested in knowing why.
In order for me to use HAP successfully, I need the above sample to look like the below:
<family>
<member> dad </member>
<member> mum </member>
<member> son </member>
<member> grandad&lt;&gt; </member>
</family>
My last approach before I give up on this problem would be to read in my source HTML document, treat it as a plain text document, and read it line by line.
I need a regex that will successfully match the inner text of an element, i.e.:
<member> grandad<> </member>
Would give me the string:
"grandad<>"
If I can get this far, I should be able to convert the angle brackets into their HTML entity equivalents. This should then pass as valid XML, allowing me to load it into an XDocument.
I would then replace the original line with this one:
<member> grandad&lt;&gt; </member>
Once all special characters have been escaped like this, I will be in a position to leverage the benefits of HTML Agility Pack (HAP); otherwise I will have to give up.
Thanks for reading.

The simplest regular expression:
var reg = new Regex(@"(?<=<(\w+)>)(.*)(?=</\1>)");
var input = "<member> grandad<Regexp is a bad tool because of <strong>this</strong>> </member>";
var output = reg.Match(input).Value;
The problem will be if your member tag contains any white space or attributes, or if more than one member tag is on a single line. If you can provide the ugliest example, I'll adjust the expression to fit your input.
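As a rough sketch of the line-by-line idea from the question (this is not the answerer's code): assuming each element sits alone on its own line and carries no attributes, the inner text can be captured and escaped in one pass. The names InnerTextEscaper and EscapeLine are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class InnerTextEscaper
{
    // Assumption: one element per line, no attributes, same tag name opens and closes.
    private static readonly Regex ElementLine = new Regex(
        @"^(?<open>\s*<(?<tag>\w+)>)(?<text>.*)(?<close></\k<tag>>\s*)$");

    // Escapes &, < and > inside the element's inner text.
    // Lines that do not look like <tag>text</tag> are returned unchanged.
    public static string EscapeLine(string line)
    {
        return ElementLine.Replace(line, m =>
            m.Groups["open"].Value
            + Escape(m.Groups["text"].Value)
            + m.Groups["close"].Value);
    }

    private static string Escape(string text)
    {
        // '&' must go first so the entities produced below are not escaped again.
        // Note: entities already present in the source would be double-escaped.
        return text.Replace("&", "&amp;")
                   .Replace("<", "&lt;")
                   .Replace(">", "&gt;");
    }
}
```

Calling InnerTextEscaper.EscapeLine on each line read from the file and writing the results back out would give text that an XML parser is more likely to accept. Nested markup that should stay as markup gets escaped too, which is the limitation the answer warns about.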

If you can process each document manually, then you can use Notepad++.
The Reindent XML functionality (TextFX -> TextFX HTML Tools -> Reindent XML) will automatically insert the entities you want.

Related

How to remove the time from datetime format in particular xml element?

I need to remove the Time from the date-time format.
My current output:
<Members>
<Member>
<EmployeeName>Dorothy Yan</EmployeeName>
<DateofBirth>05-30-2016T00:00:00+05:30</DateofBirth>
</Member>
<Member>
<EmployeeName>Dorothy Yan</EmployeeName>
<DateofBirth>01-25-2014T00:00:00+05:30</DateofBirth>
</Member>
</Members>
I need to remove the time from DateofBirth
Correct Output
<Members>
<Member>
<EmployeeName>Dorothy Yan</EmployeeName>
<DateofBirth>05-30-2016</DateofBirth>
</Member>
<Member>
<EmployeeName>Dorothy Yan</EmployeeName>
<DateofBirth>01-25-2014</DateofBirth>
</Member>
</Members>
I'm using C# code in a Web API project. Can anyone help me? I searched Google but was unable to find a solution.
You can do it easily with a regex (without handling XElement or doing DateTime parsing):
String xmlInitialContent = .... // Content of your initial xml
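// Verbatim string so \d reaches the regex engine; the literal '+' of the UTC offset must be escaped.
Regex rgx = new Regex(@"T\d\d:\d\d:\d\d\+\d\d:\d\d");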
String result = rgx.Replace(xmlInitialContent, String.Empty) ;
You should post the code that creates the date so we can see what type of date data you used and how you get the output. I will assume you're creating the XML from C#, because you tagged the question with c#. Either way, you should be able to do it with a format pattern.
With ToString, like this:
DateTime thisDate1 = new DateTime(2011, 6, 10);
String dateFormat = thisDate1.ToString("MM dd yyyy");
See: http://msdn.microsoft.com/en-us/library/8kb3ddd4(v=vs.110).aspx
In your case the pattern would be MM-dd-yyyy, like the example.
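A minimal sketch of that suggestion, assuming the XML is built with LINQ to XML and the date is held as a DateTime (the question does not show how the XML is produced, so MemberXml and BuildMember are hypothetical names):

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class MemberXml
{
    // Builds one <Member> element with the date written as MM-dd-yyyy,
    // so no time or offset part ever reaches the output.
    public static XElement BuildMember(string employeeName, DateTime dateOfBirth)
    {
        return new XElement("Member",
            new XElement("EmployeeName", employeeName),
            new XElement("DateofBirth",
                // Invariant culture keeps the digits independent of the server locale.
                dateOfBirth.ToString("MM-dd-yyyy", CultureInfo.InvariantCulture)));
    }
}
```

Formatting the value explicitly as a string avoids the default round-trip date format that produces the T00:00:00+05:30 suffix.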

How to encode xml doc as a base64 binary object

I'm trying to call an XML-RPC web service method that takes one parameter (an array of values): key and leads.
Key must be named 'key' and must have a value of type string.
Leads is an XML document containing the leads data. This must be packaged as a binary object. This value must be named leads and must be of type base64.
So the single parameter for this method call in Python is:
r = proxy.leads({'key': key, 'leads': doc})
My first question is how can I do this in c#? The closest thing .net has to that is a dictionary object which won't work for this.
Secondly, how do I make the xml doc a binary object of type base64? Is that the same as converting a byte[] array to base64 string? Like this:
Convert.ToBase64String(byteArray)
Here is what the request should look like:
<?xml version="1.0" encoding="iso-8859-1"?>
<methodCall>
<methodName>leads</methodName>
<params>
<param>
<value>
<struct>
<member>
<name>key</name>
<value>
<string>XXXXXXXXXXX</string>
</value>
</member>
<member>
<name>leads</name>
<value>
<base64>PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPGxlYWRzPgogICA8bGVhZD4K
ICAgICAgPGlkPjM5OTk3PC9pZD4KICAgICAgPEZpcnN0TmFtZT5Cb2IgSmltPC9GaXJzdE5hbWU+
CiAgICAgIDxMYXN0TmFtZT5TbWl0aDwvTGFzdE5hbWU+CiAgICAgIDxBZGRyZXNzPjEyMzQgV2Vz
:
:
ICAgICA8UmVjZWl2ZUFkZGxJbmZvPlllczwvUmVjZWl2ZUFkZGxJbmZvPgogICAgICA8bG9wX3dj
X3N0YXR1cz5ObzwvbG9wX3djX3N0YXR1cz4KICAgPC9sZWFkPgo8L2xlYWRzPg==
</base64>
</value>
</member>
</struct>
</value>
</param>
</params>
</methodCall>
I'm completely stuck on this problem. Any help would be much appreciated.
See http://codinghints.blogspot.com/2010/03/xml-rpc-calls-with-c.html for how to call the service manually. There are probably libraries that do it in a nicer way.
How you specify parameters depends on the approach you use to construct the request. If you construct the request manually (I'd recommend XDocument to build the XML rather than String.Format, although String.Format may be OK in very simple cases like your example), you just put the values into the right places in the boilerplate XML.
Yes, byte array to base64 is Convert.ToBase64String(byteArray).
Something like the following could be enough (but please try to use proper ways to construct XML for code that is not single-use):
String.Format("<?xml versi... <name>key</name><value><string>{0}</string>...",
key, Convert.ToBase64String(byteArray));
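The XDocument route recommended above could look roughly like this. It is a sketch, not a tested client: LeadsRequest, Build, and Member are made-up names, the leads document is assumed to be available as a string, and UTF-8 is assumed for the payload bytes because the sample base64 decodes to an XML declaration that names UTF-8.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class LeadsRequest
{
    // Builds the <methodCall> body shown in the question.
    public static XDocument Build(string key, string leadsXml)
    {
        // Assumption: the service expects the UTF-8 bytes of the leads document.
        string payload = Convert.ToBase64String(Encoding.UTF8.GetBytes(leadsXml));

        return new XDocument(
            new XElement("methodCall",
                new XElement("methodName", "leads"),
                new XElement("params",
                    new XElement("param",
                        new XElement("value",
                            new XElement("struct",
                                Member("key", new XElement("string", key)),
                                Member("leads", new XElement("base64", payload))))))));
    }

    // One <member> of the struct: a name plus a typed value.
    private static XElement Member(string name, XElement typedValue)
    {
        return new XElement("member",
            new XElement("name", name),
            new XElement("value", typedValue));
    }
}
```

The struct here plays the role of the Python dictionary: each dictionary entry becomes one member element. Sending the document (for example, POSTing it as text/xml) is left out, since the endpoint is not given in the question.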

Convert node section into one line

I have an XML document with a lot of items like this:
<item>
<key>
<unsignedShort>x</unsignedShort>
</key>
<value>
<unsignedShort>y</unsignedShort>
</value>
</item>
<item>
......
Now I want every section to look like this:
<item><key><unsignedShort>x</unsignedShort></key><value><unsignedShort>y</unsignedShort></value></item>
You would have to implement your own XmlWriter. XmlTextWriter allows you to use an unindented format, but that would result in the entire document being on one line. So, yeah, you'd have to roll your own, unless you're OK with the entire document sitting on one line.
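If a full custom writer is more than you need, one possible workaround (an alternative to the approach above, sketched under the assumption that LINQ to XML is available, that items are not nested inside other items, and that you write the surrounding root element yourself) is to serialize each item separately without formatting and join the results with line breaks. ItemFormatter and OneLinePerItem are illustrative names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ItemFormatter
{
    // Returns every <item> element serialized on its own single line.
    // The document should be loaded without LoadOptions.PreserveWhitespace,
    // otherwise the original indentation is kept as text nodes.
    public static string OneLinePerItem(XDocument doc)
    {
        return string.Join(
            Environment.NewLine,
            doc.Descendants("item")
               .Select(item => item.ToString(SaveOptions.DisableFormatting)));
    }
}
```

This only produces the item lines; the opening and closing tags of the parent element would have to be written around them.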

Flatten XML structure by element with linq to xml

I recently created a post about flattening an XML structure so that every element and its values were turned into attributes on the root element. I got some great answers and got it working. However, the sad thing is that by flattening, the client meant to flatten the elements, not turn them into attributes :-/
What I have is this:
<members>
<member xmlns="mynamespace" id="1" status="1">
<sensitiveData>
<notes/>
<url>someurl</url>
<altUrl/>
<date1>somedate</date1>
<date2>someotherdate</date2>
<description>some description</description>
<tags/>
<category>some category</category>
</sensitiveData>
<contacts>
<contact contactId="1">
<contactPerson>some contact person</contactPerson>
<phone/>
<mobile>mobile number</mobile>
<email>some@email.com</email>
</contact>
</contacts>
</member>
</members>
And what I need is the following:
<members>
<member xmlns="mynamespace" id="1" status="1">
<sensitiveData/>
<notes/>
<url>someurl</url>
<altUrl/>
<date1>somedate</date1>
<date2>someotherdate</date2>
<description>some description</description>
<tags/>
<category>some category</category>
<contacts/>
<contact contactId="1"></contact>
<contactPerson>some contact person</contactPerson>
<phone/>
<mobile>mobile number</mobile>
<email>some@email.com</email>
</member>
</members>
So basically all elements are kept, but flattened into child nodes of <member>. I do know that it's not pretty at all to parse XML documents like this, but it's basically the only option left, as the CMS we're importing data into requires this flat structure and the XML document comes from an external web service.
I started to make a recursive method for this, but I've got an odd feeling that it could be made smoother (well, as smooth as possible at least) with some LINQ to XML (?) I'm not the best at linq to xml, so I hope there's someone out there who would be helpful to give a hint on how to solve this? :-)
This seems to work - there may be neater approaches, admittedly:
var doc = XDocument.Load("test.xml");
XNamespace ns = "mynamespace";
var member = doc.Root.Element(ns + "member");
// This will *sort* of flatten, but create copies...
var descendants = member.Descendants().ToList();
// So we need to strip child elements from everywhere...
// (but only elements, not text nodes). The ToList() call
// materializes the query, so we're not removing while we're iterating.
foreach (var nested in descendants.Elements().ToList())
{
nested.Remove();
}
member.ReplaceNodes(descendants);

Is this structure valid in XML RPC?

Is it valid in XML-RPC to have an unbounded array of elements without having them inside an array/data parent? From my limited experience with XML-RPC, I have seen that arrays should be listed as such:
<member>
<name>Name</name>
<value>
<array>
<data>
<value>
<string>Red</string>
</value>
<value>
<string>Blue</string>
</value>
</data>
</array>
</value>
</member>
...with the parent Name having children strings Red and Blue. However, the 3rd party RPC service we are integrating with sends arrays of unbounded elements without having them inside of the array/data element, but inside a struct, e.g.
<member>
<name>Name</name>
<value>
<struct>
<member><name>Option0</name>
<value><string>Red</string></value>
</member>
<member><name>Option1</name>
<value><string>Blue</string></value>
</member>
</struct>
</value>
</member>
...with the values of Option0 and Option1 encapsulated inside a struct.
The problem I am facing is that when designing the classes that will be serialized, I would have to design my class such as
private string Option0;
private string Option1;
...
...instead of:
private string[] Name;
As I do not know the number of unbounded fields coming back in the structure, it seems the right way to accomplish the task would be to have an array of strings to enumerate through. However, there are no arrays in the response XML, just structures with a dynamic number of fields. Because of that, I would have to list a large number of fields to conform to the structure, even though it's not really a structure but an array. Is there something I am missing about XML-RPC?
Yes, it is a perfectly valid XML-RPC struct. We have the same case and are using Cook Computing's XML-RPC.NET, which works perfectly. It has a special class called XmlRpcStruct; you just need to use it in your XML-RPC method request or response.
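To illustrate the idea without depending on any particular library (so this is not XML-RPC.NET's API, just plain LINQ to XML with made-up names StructReader and ReadStringMembers): a struct with an unknown number of members maps naturally onto a dictionary rather than onto fixed fields.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class StructReader
{
    // Reads every <member> of an XML-RPC <struct> into a name -> string value map.
    // Assumption: each member has a <name>, and its <value> holds a <string>;
    // members with other value types end up with a null value.
    public static Dictionary<string, string> ReadStringMembers(XElement structElement)
    {
        return structElement
            .Elements("member")
            .ToDictionary(
                m => (string)m.Element("name"),
                m => (string)m.Element("value")?.Element("string"));
    }
}
```

With the sample from the question, the result would hold Option0 and Option1 as keys, and the values can be enumerated like the string array the asker wanted.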
