How to read a XML file [duplicate] - c#

This question already has answers here:
How does one parse XML files? [closed]
(12 answers)
Closed 6 years ago.
I have the following XML structure for a card game. I want to load the card titles and descriptions into two arrays that I can use to randomize the cards.
<Cards>
<CardTitles>
<Title>Some Title</Title>
.
.
.
.
.
</CardTitles>
<CardDesc>
<Desc>Some description</Desc>
</CardDesc>
</Cards>
But no matter what I do or what code I write, I'm unable to get the actual text from the proper tag. The closest I got was following this example: https://msdn.microsoft.com/en-us/library/system.xml.xmlreader.readsubtree(v=vs.110).aspx
I know I'm not supposed to ask for complete solutions but I'm just stumped. Any help in getting this matter cleared up to me will be great.

Supposing that you have an xml file named sample.xml in C:\temp you can use LINQ To XML:
XElement x = XElement.Load(@"c:\temp\Sample.xml");
IEnumerable<string> titles = from title in x.Element("CardTitles").Elements()
                             select title.Value;
IEnumerable<string> descriptions = from description in x.Element("CardDesc").Elements()
                                   select description.Value;
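Since the question asks for two arrays that can be randomized, the query above could be extended roughly as follows. This is a minimal sketch, not the answerer's code: the names `CardLoader`, `ReadValues`, and `Shuffle` are made up for illustration, and an inline XML string stands in for the file on disk.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CardLoader
{
    // Returns the text of every child element under the named container,
    // or an empty array when the container is missing.
    public static string[] ReadValues(XElement root, string containerName)
    {
        XElement container = root.Element(containerName);
        return container == null
            ? new string[0]
            : container.Elements().Select(e => e.Value).ToArray();
    }

    // In-place Fisher-Yates shuffle.
    public static void Shuffle<T>(T[] items, Random rng)
    {
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);   // 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }

    public static void Main()
    {
        // Inline sample (assumption) standing in for XElement.Load(path).
        XElement root = XElement.Parse(
            "<Cards><CardTitles><Title>A</Title><Title>B</Title></CardTitles>" +
            "<CardDesc><Desc>First</Desc><Desc>Second</Desc></CardDesc></Cards>");

        string[] titles = ReadValues(root, "CardTitles");
        string[] descriptions = ReadValues(root, "CardDesc");

        Shuffle(titles, new Random());
        Console.WriteLine(string.Join(", ", titles));
        Console.WriteLine(string.Join(", ", descriptions));
    }
}
```

If each title belongs to the description at the same position, shuffling one array on its own would break that pairing; shuffling an array of indices and using it for both is one way around that.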

Instead of going the XmlReader route, you can use XmlSerializer, which is much simpler and more straightforward to use.
https://msdn.microsoft.com/en-us/library/58a18dwa(v=vs.110).aspx
You'd have something like this:
<Cards>
<CardTitles>
<Title>Some Title</Title>
</CardTitles>
<CardDesc>
<Desc>Some description</Desc>
</CardDesc>
</Cards>
.Net Classes
public class Cards {
public CardTitles CardTitles;
public CardDesc CardDesc;
}
public class CardTitles {
public String Title;
}
public class CardDesc {
public String Desc;
}
And then use XmlSerializer.Deserialize method.
XmlSerializer xmlSerializer = new XmlSerializer(typeof(Cards));
StringReader inputStrReader = new StringReader(inputString);
Cards cards = (Cards)xmlSerializer.Deserialize(inputStrReader);
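The classes above hold a single Title and a single Desc. If the file contains several of each, as the question's XML suggests, one possible mapping uses XmlArray and XmlArrayItem so that each wrapper element becomes a string array. The sketch below is an assumption about what the asker needs; `CardDeck` and `CardDeckDemo` are illustrative names, not part of the original answer.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("Cards")]
public class CardDeck
{
    // <CardTitles> wraps a repeated <Title> element.
    [XmlArray("CardTitles")]
    [XmlArrayItem("Title")]
    public string[] Titles;

    // <CardDesc> wraps a repeated <Desc> element.
    [XmlArray("CardDesc")]
    [XmlArrayItem("Desc")]
    public string[] Descriptions;
}

public static class CardDeckDemo
{
    // Deserializes an XML string into a CardDeck.
    public static CardDeck Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(CardDeck));
        using (var reader = new StringReader(xml))
        {
            return (CardDeck)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        CardDeck deck = Parse(
            "<Cards><CardTitles><Title>A</Title><Title>B</Title></CardTitles>" +
            "<CardDesc><Desc>First</Desc></CardDesc></Cards>");
        Console.WriteLine(deck.Titles.Length);
        Console.WriteLine(deck.Descriptions[0]);
    }
}
```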

Related

Count how many times there is a specific text in a string and get the values in a array [duplicate]

This question already has answers here:
Read a XML (from a string) and get some fields - Problems reading XML
(5 answers)
Closed 12 months ago.
I have an XML file that I am having trouble deserializing, so I'm trying to work around it. I have an XML string and I want to get a value out of it. Let's say this is my XML string:
string XMLstring = @"<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<InputText>123</InputText>
<InputText>Apple</InputText>
<InputText>John</InputText>
</note>";
Now, I have tried something like checking whether XMLstring contains InputText, but I want some way to get all three values from there and then use them somewhere. Is there any way I can do this without having to deserialize it?
You can use LINQ-to-XML to parse the string and obtain the values.
using System;
using System.Linq;
using System.Xml.Linq;

public static class Program
{
    public static void Main()
    {
        var xml = @"<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body><InputText>123</InputText><InputText>Apple</InputText><InputText>John</InputText></note>";
        // Collect the text of every <InputText> element, in document order.
        var list = XDocument.Parse(xml).Descendants("InputText").Select(x => x.Value);
        foreach (var item in list) Console.WriteLine(item);
    }
}
Output:
123
Apple
John

Printing links out in correct format using HtmlAgilityPack [duplicate]

This question already has answers here:
How to extract full url with HtmlAgilityPack - C#
(2 answers)
Closed 4 years ago.
I’ve been scraping a website using HtmlAgilityPack, but I need the links to print out in the proper format. On the page I am scraping, some of the links include the proper “https://...” prefix; most, however, start with something else.
For example, a few of the links print starting with “/xxx” or simply “.//”. Is there any way to go through the links I have scraped and print them with the proper “https://” prefix in front of them?
Currently my code looks like this:
var hg = doc.DocumentNode.SelectNodes("//body[@class]");
//Sort through list and print
foreach (var node in hg)
{
    foreach(HtmlNode node2 in node.SelectNodes(".//a[@href]"))
    {
        string attributeValue = node2.GetAttributeValue("href", "");
        if (attributeValue[0:7] != "https://")
        {
            Console.WriteLine("https://url/" + node2.Attributes["href"].Value);
        }
    }
}
Console.ReadLine();
I’ve been trying to use indexing of the attributeValue string to see what the link starts with, but keep getting an error telling me I can’t use indexing there. Perhaps there is a better way to check the beginning of the links I am unaware of?
I’m a novice at C#, and any help understanding this issue would be greatly appreciated!
Try using StartsWith as opposed to trying to index the string:
var hg = doc.DocumentNode.SelectNodes("//body[@class]");
//Sort through list and print
foreach (var node in hg)
{
    foreach(HtmlNode node2 in node.SelectNodes(".//a[@href]"))
    {
        string attributeValue = node2.GetAttributeValue("href", "");
        if (!attributeValue.StartsWith("https://"))
        {
            Console.WriteLine("https://url/" + node2.Attributes["href"].Value);
        }
    }
}
Console.ReadLine();
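Prepending a fixed prefix can produce doubled slashes for hrefs such as "/xxx". An alternative, sketched here rather than taken from the answer, is to let System.Uri resolve each href against the address of the page that was scraped. `LinkResolver` and `ToAbsolute` are illustrative names, and the page address is a placeholder.

```csharp
using System;

public static class LinkResolver
{
    // Resolves an href against the page address. An href that is already
    // absolute keeps its own scheme and host. Returns null when the href
    // cannot be turned into a URI.
    public static string ToAbsolute(Uri pageUri, string href)
    {
        Uri resolved;
        return Uri.TryCreate(pageUri, href, out resolved) ? resolved.AbsoluteUri : null;
    }

    public static void Main()
    {
        // Hypothetical page address; use the URL the document was loaded from.
        var page = new Uri("https://example.com/articles/index.html");

        foreach (var href in new[] { "/xxx", "./about", "https://other.example/page" })
        {
            Console.WriteLine(ToAbsolute(page, href));
        }
    }
}
```

Inside the loop above, `node2.GetAttributeValue("href", "")` would be the value passed as `href`.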

Writing contents of a class to a file [duplicate]

This question already has answers here:
Best practices for serializing objects to a custom string format for use in an output file
(8 answers)
Closed 7 years ago.
I have a class, all string properties like this:
public class MyClass
{
public string Name {get;set;}
public string Phone {get;set;}
// a bunch of more fields....
}
And a list of that class, List<MyClass> myListOfObjects, that I have populated with values throughout the program.
And then I have a file (csv) with headers:
Name, Phone, etc...
I want to write the contents of that myListOfObjects into that file.
I was thinking of just looping through them and writing one row per object, but I wanted to see whether there is a nicer way.
You can write all your data in one shot, like
var list = new List<MyClass>();
var fileData = list.Select(row => string.Join(",", row.Name, row.Phone, row.Etc)).ToArray();
File.WriteAllLines(@"C:\YourFile", fileData);
Note: This is one way to simplify the file write, but it doesn't escape field values, so a Name that contains a comma will break the row.
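One way to close that gap is to quote any field that contains a comma, a double quote, or a line break, and to double embedded quotes, which is the convention most CSV readers expect. The sketch below assumes only the Name and Phone properties from the question; `CsvWriter`, `Escape`, `ToLines`, and the output path are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class MyClass
{
    public string Name { get; set; }
    public string Phone { get; set; }
}

public static class CsvWriter
{
    // Quotes a field when it contains a comma, quote, or line break,
    // doubling any embedded quotes. Null becomes an empty field.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Produces a header line followed by one line per object.
    public static IEnumerable<string> ToLines(IEnumerable<MyClass> rows)
    {
        yield return "Name,Phone";
        foreach (var row in rows)
        {
            yield return string.Join(",", Escape(row.Name), Escape(row.Phone));
        }
    }

    public static void Main()
    {
        var list = new List<MyClass>
        {
            new MyClass { Name = "Doe, Jane", Phone = "555-0100" }
        };
        // Placeholder path in the temp directory.
        File.WriteAllLines(Path.Combine(Path.GetTempPath(), "people.csv"), ToLines(list));
    }
}
```

For anything beyond simple exports, a dedicated CSV library is usually the safer choice.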

How To Parse JSON Object Using Regular Expression in C# [duplicate]

This question already has answers here:
Regex To Extract An Object From A JSON String
(3 answers)
Closed 8 years ago.
string sample = "{\"STACK_SIZE\":4,\"thes_stack\":[4,4]}";
How can I parse it using a regular expression in C#?
First of all, the backslashes in your sample are only C# escape sequences for the quotes; the string itself holds valid JSON, so they need to stay in the literal.
Second, rather than a regular expression, you can use a library like JSON.NET to parse your sample.
string sample = "{\"STACK_SIZE\":4,\"thes_stack\":[4,4]}";
var parsed = JsonConvert.DeserializeObject<dynamic>(sample);
That will parse it into a dynamic type. If you want something more strongly typed, create your own class:
class StackInfo
{
    public int STACK_SIZE { get; set; }
    public int[] thes_stack { get; set; }
}
Then you can deserialize into it:
string sample = "{\"STACK_SIZE\":4,\"thes_stack\":[4,4]}";
var parsed = JsonConvert.DeserializeObject<StackInfo>(sample);
But since you didn't say exactly what you need, or what your problem is with the suggestions in the comments, no one can really help you further.

How do I get the contents of an XML element using a XmlSerializer?

I have an XML reader on this XML string:
<?xml version="1.0" encoding="UTF-8" ?>
<story id="1224488641nL21535800" date="20 Oct 2008" time="07:44">
<title>PRESS DIGEST - PORTUGAL - Oct 20</title>
<text>
<p> LISBON, Oct 20 (Reuters) - Following are some of the main
stories in Portuguese newspapers on Monday. Reuters has not
verified these stories and does not vouch for their accuracy. </p>
<p>More HTML stuff here</p>
</text>
</story>
I created an XSD and a corresponding class for deserialization.
[System.Xml.Serialization.XmlRootAttribute(Namespace="", IsNullable=false)]
public class story {
[System.Xml.Serialization.XmlAttributeAttribute()]
public string id;
[System.Xml.Serialization.XmlAttributeAttribute()]
public string date;
[System.Xml.Serialization.XmlAttributeAttribute()]
public string time;
public string title;
public string text;
}
I then create an instance of the class using the Deserialize method of XmlSerializer.
XmlSerializer ser = new XmlSerializer(typeof(story));
return (story)ser.Deserialize(xr);
Now, the text member of story is always null. How do I change my story class so that the XML is parsed as expected?
EDIT:
Using an XmlText does not work and I have no control over the XML I'm parsing.
I found a very unsatisfactory solution.
Change the class like this (ugh!)
// ...
[XmlElement("HACK - this should never match anything")]
public string text;
// ...
And change the calling code like this (yuck!)
XmlSerializer ser = new XmlSerializer(typeof(story));
string text = string.Empty;
ser.UnknownElement += delegate(object sender, XmlElementEventArgs e) {
if (e.Element.Name != "text")
throw new XmlException(
string.Format(CultureInfo.InvariantCulture,
"Unknown element '{0}' cannot be deserialized.",
e.Element.Name));
text += e.Element.InnerXml;
};
story result = (story)ser.Deserialize(xr);
result.text = text;
return result;
This is a really bad way of doing it because it breaks encapsulation. Is there a better way of doing it?
If the text tag only ever contains p tags, the following suggestion may be useful in the short term.
Instead of story having the text field as a string, you could make it an array of strings. You could then apply the XmlArray attributes (XmlArrayAttribute and XmlArrayItemAttribute) with the right parameters so that it maps to:
<text>
<p>blah</p>
<p>blib</p>
</text>
Which is a step closer, but not completely what you need.
Another option is to make a class like:
public class Text //Obviously a bad name for a class...
{
public string[] p;
public string[] pre;
}
And again use the XmlArray attributes to get it to look right. I'm not sure they are that configurable, because I've only used them for simple types before.
Edit:
Using:
[System.Xml.Serialization.XmlRootAttribute(Namespace = "", IsNullable = false)]
public class story
{
[System.Xml.Serialization.XmlAttributeAttribute()]
public string id;
[System.Xml.Serialization.XmlAttributeAttribute()]
public string date;
[System.Xml.Serialization.XmlAttributeAttribute()]
public string time;
public string title;
[XmlArrayItem("p")]
public string[] text;
}
Works well with the supplied XML, but having the class seems a little more complicated. It ends up as something similar to:
<text>
<p>
<p>qwertyuiop</p>
<p>asdfghjkl</p>
</p>
<pre>
<pre>stuff</pre>
<pre>nonsense</pre>
</pre>
</text>
which is obviously not what is desired.
You could implement IXmlSerializable for your class and handle the inner elements there, this means that you keep the code for deserializing your data inside the target class (thus avoiding your problem with encapsulation). It's a simple enough data type that the code should be trivial to write.
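A minimal sketch of that idea follows. It assumes the document shape shown in the question (three attributes, a title element, and a text element containing markup) and uses XmlReader.ReadInnerXml to keep the inner markup of text as a raw string. `Story` and `StoryDemo` are illustrative names, and writing is deliberately left unimplemented.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

[XmlRoot("story")]
public class Story : IXmlSerializable
{
    public string Id, Date, Time, Title, Text;

    public XmlSchema GetSchema() { return null; }

    public void ReadXml(XmlReader reader)
    {
        // The reader is positioned on the <story> start tag.
        Id = reader.GetAttribute("id");
        Date = reader.GetAttribute("date");
        Time = reader.GetAttribute("time");

        if (reader.IsEmptyElement) { reader.Read(); return; }

        reader.ReadStartElement();                    // step inside <story>
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            if (reader.LocalName == "title")
                Title = reader.ReadElementContentAsString();
            else if (reader.LocalName == "text")
                Text = reader.ReadInnerXml();         // raw markup, <p> tags included
            else
                reader.Skip();                        // ignore anything unexpected
        }
        reader.ReadEndElement();                      // consume </story>
    }

    // Serialization is out of scope for this sketch.
    public void WriteXml(XmlWriter writer) { throw new NotImplementedException(); }
}

public static class StoryDemo
{
    public static Story Parse(string xml)
    {
        var ser = new XmlSerializer(typeof(Story));
        using (var xr = XmlReader.Create(new StringReader(xml)))
        {
            return (Story)ser.Deserialize(xr);
        }
    }

    public static void Main()
    {
        Story s = Parse(
            "<story id=\"1\" date=\"20 Oct 2008\" time=\"07:44\">" +
            "<title>Digest</title><text><p>One</p><p>Two</p></text></story>");
        Console.WriteLine(s.Title);
        Console.WriteLine(s.Text);
    }
}
```

The calling code stays the same two lines as in the question, and the parsing logic lives inside the class.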
Looks to me that the XML is incorrect.
Since you use HTML tags within the text tag the HTML tags are interpreted as XML.
You should use CDATA to correctly interpret the data or escape < and >.
Since you do not have control over the XML you could use StreamReader instead.
XmlReader interprets the HTML tags as XML which is not what you want.
XmlSerializer will however strip the HTML tags within the text tag.
Perhaps using the XmlAnyElement attribute instead of handling the UnknownElement event may be more elegant.
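As a rough illustration of the XmlAnyElement idea, the text element can be captured as a DOM node and its markup read from InnerXml. This is a sketch under the assumption that the document looks like the one in the question; `StoryWithRawText`, `TextElement`, and `TextMarkup` are made-up names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot("story")]
public class StoryWithRawText
{
    [XmlAttribute("id")] public string Id;
    [XmlAttribute("date")] public string Date;
    [XmlAttribute("time")] public string Time;
    [XmlElement("title")] public string Title;

    // Captures the <text> element as a DOM node instead of mapping it to a string.
    [XmlAnyElement("text")] public XmlElement TextElement;

    // Convenience accessor for the markup inside <text>.
    [XmlIgnore]
    public string TextMarkup
    {
        get { return TextElement == null ? null : TextElement.InnerXml; }
    }
}

public static class RawTextDemo
{
    public static StoryWithRawText Parse(string xml)
    {
        var ser = new XmlSerializer(typeof(StoryWithRawText));
        using (var reader = new StringReader(xml))
        {
            return (StoryWithRawText)ser.Deserialize(reader);
        }
    }

    public static void Main()
    {
        StoryWithRawText s = Parse(
            "<story id=\"1\" date=\"20 Oct 2008\" time=\"07:44\">" +
            "<title>Digest</title><text><p>One</p><p>Two</p></text></story>");
        Console.WriteLine(s.Title);
        Console.WriteLine(s.TextMarkup);
    }
}
```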
Have you tried xsd.exe? It allows you to create xsd's from xml doc's and then generate classes from the xsd that should be ripe for xml deserialization.
I encountered this same issue after using XSD.exe to generate an XSD from XML and then classes from the XSD. I added an [XmlText] attribute before the class of the object in the generated class file (called P in my case, because of the <p> tag it was inferring as an XML node) and it worked instantly, pulling in the complete HTML content that was inside the parent node and putting it in that P object, which I then renamed to something more useful.
