Hi, I am trying to parse some XML from an unusual XML document in iCalendar format. I have had a lot of trouble just parsing the data, but thanks to help from people on Stack Overflow I can now load it. Now I need some help reading the individual nodes. Here is a link to the XML file I am parsing (http://datastore.unm.edu/events/events.xml).
I am using the pivot app template from Visual Studio 2010 to create this app. In MainViewModel.cs I am modifying the following code in the hope that the text of the summary tag will appear in place of "LineOne" (code listed below). For example, from the XML file linked above, I would like LineOne = Lobo's Got Talent.
I need help figuring out the best way to achieve this. I will also need LineTwo to contain the date and time, and LineThree to contain the description.
Thank you for your time and help, it has been greatly appreciated!
public void LoadData()
{
    var webClient = new WebClient();
    // Subscribe before starting the request so the completion event cannot be missed.
    webClient.OpenReadCompleted += new OpenReadCompletedEventHandler(webClient_OpenReadCompleted);
    webClient.OpenReadAsync(new Uri("http://datastore.unm.edu/events/events.xml"));
}
public void webClient_OpenReadCompleted(object sender,
    OpenReadCompletedEventArgs e)
{
    XDocument unmXdoc = XDocument.Load(e.Result, LoadOptions.None);
    // For now the whole document is dumped into LineOne.
    this.Items.Add(new ItemViewModel() { LineOne = unmXdoc.ToString(),
        LineTwo = "", LineThree = "" });
}
Thank you for looking and helping!
The XML is fine; I think you are running into a namespace issue here. You have two options. You can strip the namespace from the XML file if you are sure you do not need it. The preferred option is to work with the namespace and specify it when building the fully qualified element names:
private readonly XNamespace dataNamspace = "urn:ietf:params:xml:ns:icalendar-2.0";
public void webClient_OpenReadCompleted(object sender,
    OpenReadCompletedEventArgs e)
{
    XDocument unmXdoc = XDocument.Load(e.Result, LoadOptions.None);
    // Every element name has to be qualified with the namespace.
    this.Items = from p in unmXdoc.Descendants(dataNamspace + "vevent").Elements(dataNamspace + "properties")
                 select new ItemViewModel
                 {
                     LineOne = this.GetElementValue(p, "summary"),
                     LineTwo = this.GetElementValue(p, "description"),
                     LineThree = this.GetElementValue(p, "categories"),
                 };
    lstData.ItemsSource = this.Items;
}
// Returns the text of the named child element, or an empty string if it is missing.
private string GetElementValue(XElement element, string fieldName)
{
    var childElement = element.Element(dataNamspace + fieldName);
    return childElement != null ? childElement.Value : String.Empty;
}
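To get the date and time into LineTwo, the same namespace-qualified lookup should work for the start time. Here is a minimal sketch. It assumes the feed follows the usual xCal layout, where each vevent has a properties element containing summary and dtstart; the EventReader class and its ReadEvents method are names made up for this example, so check them against the actual file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class EventReader
{
    // Namespace taken from the answer above; every element name must be qualified with it.
    private static readonly XNamespace Ns = "urn:ietf:params:xml:ns:icalendar-2.0";

    // Returns one "summary | start" line per vevent found in the given markup.
    // Assumption: each vevent has <properties> with <summary> and <dtstart> children.
    public static List<string> ReadEvents(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants(Ns + "vevent")
                  .Elements(Ns + "properties")
                  .Select(p => (string)p.Element(Ns + "summary")
                               + " | "
                               + (string)p.Element(Ns + "dtstart"))
                  .ToList();
    }
}
```

Casting an XElement to string returns null when the element is missing instead of throwing, so an event without a dtstart simply produces an empty right-hand side.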
I've got a few web pages that have static data in HTML mark-up tables. By this, I mean, manually maintained text:
<table border="1" >
<tr><th>Number</th><th>Date</th><th>BW</th><th>WW</th><th>%</th><th>Type</th><th>CED</th><th>BW</th><th>WW</th><th>YW</th><th>Mlk</th><th>Me</th></tr>
<tr><td>313</td><td>9/16/2013</td><td>74</td><td>512</td><td>100</td><td>861U</td><td>3</td><td>-1.1</td><td>54</td><td>85</td><td>16</td><td></td></tr>
<tr><td>315</td><td>10/6/2013</td><td>-</td><td>-</td><td>-</td><td>W179</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr>
<tr><td>316</td><td>10/102013</td><td>72</td><td>595</td><td>94.2</td><td>W179</td><td>7</td><td>-2.3</td><td>53</td><td>80</td><td>21</td><td>-3</td></tr>
<tr><td>350</td><td>10/11/2013</td><td>71</td><td>703</td><td>100</td><td>W179</td><td>7</td><td>-2.3</td><td>46</td><td>72</td><td>20</td><td>-5</td></tr>
<tr><td>392</td><td>3/8/2013</td><td>61</td><td>651</td><td>100</td><td>RANGER</td><td>7</td><td>-2.3</td><td>52</td><td>82</td><td>20</td><td>-2</td></tr>
<tr><td>303</td><td>7/3/2013</td><td>63</td><td>-</td><td>97.1</td><td>W179</td><td>8</td><td>-3.2</td><td>N/A</td><td>82</td><td>21</td><td>-8</td></tr>
<tr><td>304</td><td>7/8/2013</td><td>62</td><td>-</td><td>97.1</td><td>W179</td><td>7</td><td>-3.9</td><td>N/A</td><td>69</td><td>20</td><td>-4</td></tr>
<tr><td>397</td><td>3/18/2013</td><td>78</td><td>621</td><td>100</td><td>STATEMENT</td><td>6</td><td>-2.7</td><td>55</td><td>84</td><td>19</td><td>5</td></tr>
<tr><td>395</td><td>3/17/2013</td><td>63</td><td>716</td><td>94.2</td><td>STATEMENT</td><td>5</td><td>-2.7</td><td>54</td><td>85</td><td>19</td><td>5</td></tr>
<tr><td>390</td><td>3/6/2013</td><td>66</td><td>583</td><td>94.2</td><td>ENVY</td><td>2</td><td>-0.6</td><td>55</td><td>80</td><td>23</td><td>2</td></tr>
<tr><td>388</td><td>3/4/2013</td><td>53</td><td>621</td><td>100</td><td>STATEMENT</td><td>10</td><td>-5.1</td><td>49</td><td>82</td><td>20</td><td>2</td></tr>
<tr><td>300</td><td>3/22/2013</td><td>61</td><td>633</td><td>100</td><td>RANGER</td><td>8</td><td>-2.8</td><td>49</td><td>81</td><td>19</td><td>-2</td></tr>
<tr><td>379</td><td>2/1/2013</td><td>55</td><td>518</td><td>100</td><td>STATEMENT</td><td>8</td><td>-4.1</td><td>61</td><td>98</td><td>18</td><td>1</td></tr>
<tr><td>398</td><td>3/20/2013</td><td>62</td><td>664</td><td>100</td><td>RANGER</td><td>6</td><td>-2.3</td><td>53</td><td>83</td><td>20</td><td>0</td></tr>
<tr><td>384</td><td>2/10/2013</td><td>61</td><td>650</td><td>100</td><td>ENVY</td><td>3</td><td>-1</td><td>50</td><td>70</td><td>19</td><td>4</td></tr>
<tr><td>369</td><td>1/30/2013</td><td>76</td><td>651</td><td>100</td><td>STATEMENT</td><td>5</td><td>-2.4</td><td>60</td><td>99</td><td>20</td><td>8</td></tr>
<tr><td>373</td><td>1/21/2013</td><td>71</td><td>433</td><td>100</td><td>STATEMENT</td><td>4</td><td>-1.6</td><td>55</td><td>89</td><td>17</td><td>3</td></tr>
<tr><td>393</td><td>3/10/2013</td><td>63</td><td>717</td><td>100</td><td>STATEMENT</td><td>3</td><td>-4.6</td><td>51</td><td>91</td><td>20</td><td>5</td></tr>
<tr><td>389</td><td>3/8/2013</td><td>72</td><td>723</td><td>88.3</td><td>ENVY</td><td>4</td><td>-0.6</td><td>54</td><td>76</td><td>24</td><td>2</td></tr>
<tr><td>364</td><td>10/1/2012</td><td>60</td><td>574</td><td>100</td><td>RANGER</td><td>1</td><td>0.4</td><td>56</td><td>84</td><td>21</td><td>2</td></tr>
</table>
Currently, I am contemplating using a WebClient.DownloadString to pull all of the text in, and try to create an XML file out of it by parsing each row <tr>.
That sounds tedious, and I would rather not reinvent the wheel. Besides, a few good solutions would give me something to look at for ideas on how to best approach writing my version.
Has anyone come across some code that can do this?
I've started, to give you an idea of what I'm working on:
private const string XML_DATA = "App_Data/page_data.xml";
private const string TABLE_START = "<table>";
private const string TABLE_STOP = "</table>";
private string[] TABLE_ROW = { "<tr>", "</tr>" };
private string[] TABLE_HEAD = { "<th>", "</th>" };
private string[] TABLE_DET = { "<td>", "</td>" };
private void load_data() {
if (!File.Exists(XML_DATA)) {
string HtmlText;
using (var client = new WebClient()) {
HtmlText = client.DownloadString(Server.MapPath("/Sales.aspx"));
}
if (!String.IsNullOrEmpty(HtmlText)) {
var lcTxt = HtmlText.ToLower();
int len0 = TABLE_START.Length;
int tStart = lcTxt.IndexOf(TABLE_START) + len0;
int tStop = lcTxt.IndexOf(TABLE_STOP);
if ((len0 < tStart) && (tStart < tStop)) {
var tableString = HtmlText.Substring(tStart, tStop - tStart);
var tableRows = tableString.Split(TABLE_ROW, StringSplitOptions.RemoveEmptyEntries);
foreach (var row in tableRows) {
if (-1 < row.IndexOf(TABLE_HEAD[0])) {
//
} else {
//
}
}
}
}
}
}
Of course, you can see that this is already going to fail, because the markup uses <table border="1">, which does not match the "<table>" constant.
Yes, easy to fix, but I'd rather have a working guide that has already been through a lot of debugging steps.
UPDATE: I tried using XmlDocument's LoadXml method, but it can't seem to read basic HTML:
You definitely shouldn't be trying to parse that manually. Other people have already solved that problem.
If your markup is valid XML (and from what you've shown us, it looks like it is), then you can just parse it as XML:
XmlDocument doc = new XmlDocument();
doc.LoadXml(HtmlString);
doc.Save("myfile.xml");
But for that matter, if it's already valid XML markup, and all you need to do is save it as a file, then you don't need to parse it. Just save it:
File.WriteAllText("myfile.xml", HtmlString);
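If you want row-oriented XML rather than a copy of the markup, here is a rough sketch using LINQ to XML. It assumes the table fragment is well-formed XML (no unclosed tags and no HTML-only entities such as &nbsp;). TableToXml and ToRows are hypothetical names for this example. Header text like "%" is not a valid element name and "BW"/"WW" appear twice in your table, so the header goes into a name attribute instead of becoming the element name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TableToXml
{
    // Converts a well-formed <table> fragment into
    // <rows><row><cell name="Number">313</cell>...</row>...</rows>.
    public static XElement ToRows(string tableMarkup)
    {
        XElement table = XElement.Parse(tableMarkup);
        var rows = table.Elements("tr").ToList();

        // The first row is assumed to hold the <th> header cells.
        var headers = rows.First().Elements("th").Select(th => th.Value).ToList();

        return new XElement("rows",
            rows.Skip(1).Select(tr =>
                new XElement("row",
                    tr.Elements("td").Select((td, i) =>
                        new XElement("cell",
                            // Fall back to a positional name if a row has extra cells.
                            new XAttribute("name", i < headers.Count ? headers[i] : "col" + i),
                            td.Value)))));
    }
}
```

Calling TableToXml.ToRows(tableString).Save("myfile.xml") would then write the converted rows to disk. Attributes such as border="1" do not matter here, because the parser matches on element names only.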
I have created a simple web crawler, but I want to add recursion so that for every page that is opened I also get the URLs on that page. I have no idea how to do that. I would also like to use threads to make it faster.
Here is my code
namespace Crawler
{
public partial class Form1 : Form
{
String Rstring;
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
WebRequest myWebRequest;
WebResponse myWebResponse;
String URL = textBox1.Text;
myWebRequest = WebRequest.Create(URL);
myWebResponse = myWebRequest.GetResponse();//Returns a response from an Internet resource
Stream streamResponse = myWebResponse.GetResponseStream();//return the data stream from the internet
//and save it in the stream
StreamReader sreader = new StreamReader(streamResponse);//reads the data stream
Rstring = sreader.ReadToEnd();//reads it to the end
String Links = GetContent(Rstring);//gets the links only
textBox2.Text = Rstring;
textBox3.Text = Links;
streamResponse.Close();
sreader.Close();
myWebResponse.Close();
}
private String GetContent(String Rstring)
{
String sString="";
HTMLDocument d = new HTMLDocument();
IHTMLDocument2 doc = (IHTMLDocument2)d;
doc.write(Rstring);
IHTMLElementCollection L = doc.links;
foreach (IHTMLElement links in L)
{
sString += links.getAttribute("href", 0);
sString += "\n";
}
return sString;
}
I fixed your GetContent method as follows to get the new links from a crawled page:
public ISet<string> GetNewLinks(string content)
{
    // Matches the value of an href attribute that directly follows "<a".
    Regex regexLink = new Regex("(?<=<a\\s*?href=(?:'|\"))[^'\"]*?(?=(?:'|\"))");
    ISet<string> newLinks = new HashSet<string>();
    foreach (var match in regexLink.Matches(content))
    {
        if (!newLinks.Contains(match.ToString()))
            newLinks.Add(match.ToString());
    }
    return newLinks;
}
Updated
Fixed: regex should be regexLink. Thanks @shashlearner for pointing this out (my typo).
I have created something similar using Reactive Extensions.
https://github.com/Misterhex/WebCrawler
I hope it can help you.
Crawler crawler = new Crawler();
var observable = crawler.Crawl(new Uri("http://www.codinghorror.com/"));
observable.Subscribe(onNext: Console.WriteLine,
onCompleted: () => Console.WriteLine("Crawling completed"));
The following includes an answer/recommendation.
I believe you should use a DataGridView instead of a TextBox, because the links (URLs) that were found are easier to read in a grid.
You could change:
textBox3.Text = Links;
to
dataGridView.DataSource = Links;
Now for my question: you haven't included the using directives (the "using System. ..." lines at the top of the file). Which ones did you use? I would appreciate it if you could add them, as I can't figure them out.
From a design standpoint, I've written a few web crawlers. Basically you want to implement a depth-first search using a Stack data structure. You can also use a breadth-first search (with a queue instead of a stack), but the list of pending URLs can grow very large and you'll likely run into memory issues. Good luck.
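A rough sketch of that stack-based traversal is below. CrawlSketch, Crawl, and the getLinks delegate are hypothetical names for this example; getLinks stands in for whatever downloads a page and extracts its links. It keeps a visited set so no page is fetched twice, and a page limit so the crawl terminates.

```csharp
using System;
using System.Collections.Generic;

public static class CrawlSketch
{
    // Depth-first crawl starting at 'start'. Returns the pages in the order they were visited.
    // 'getLinks' is supplied by the caller and returns the links found on a page.
    public static List<Uri> Crawl(Uri start, Func<Uri, IEnumerable<Uri>> getLinks, int maxPages)
    {
        var visited = new HashSet<Uri>();
        var order = new List<Uri>();
        var pending = new Stack<Uri>();
        pending.Push(start);

        while (pending.Count > 0 && order.Count < maxPages)
        {
            Uri current = pending.Pop();
            if (!visited.Add(current))
                continue; // already crawled this page

            order.Add(current);
            foreach (Uri link in getLinks(current))
            {
                if (!visited.Contains(link))
                    pending.Push(link);
            }
        }
        return order;
    }
}
```

Replacing the Stack with a Queue (Enqueue and Dequeue instead of Push and Pop) turns this into a breadth-first crawl without changing anything else. Threading is left out here to keep the traversal itself readable.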
I'm writing a Silverlight application with a class "Home". In this class I read an .xml file and write its contents to a ListBox. In another class, "Overview", I want to show the same .xml file. I know it would be silly to write the same code again as in the class "Home".
The problem is how to reach this data.
My question is how can I reuse the method LoadXMLFile() from another class?
The code.
// Read the .xml file in the class "Home"
public void LoadXMLFile()
{
WebClient xmlClient = new WebClient();
xmlClient.DownloadStringCompleted += new DownloadStringCompletedEventHandler(XMLFileLoaded);
xmlClient.DownloadStringAsync(new Uri("codeFragments.xml", UriKind.RelativeOrAbsolute));
}
private void XMLFileLoaded(object sender, DownloadStringCompletedEventArgs e)
{
if (e.Error == null)
{
string xmlData = e.Result;
XDocument xDoc = XDocument.Parse(xmlData);
var tagsXml = from c in xDoc.Descendants("Tag") select c.Attribute("name");
List<Tag> lsTags = new List<Tag>();
foreach (string tagName in tagsXml)
{
Tag oTag = new Tag();
oTag.name = tagName;
var tags = from d in xDoc.Descendants("Tag")
where d.Attribute("name").Value == tagName
select d.Elements("oFragments");
var tagXml = tags.ToArray()[0];
foreach (var tag in tagXml)
{
CodeFragments oFragments = new CodeFragments();
oFragments.tagURL = tag.Attribute("tagURL").Value;
//Tags.tags.Add(oFragments);
oTag.lsTags.Add(oFragments);
}
lsTags.Add(oTag);
}
//List<string> test = new List<string> { "a","b","c" };
lsBox.ItemsSource = lsTags;
}
}
Create a class to read the XML file, make references to this from your other classes in order to use it. Say you call it XmlFileLoader, you would use it like this in the other classes:
var xfl = new XmlFileLoader();
var data = xfl.LoadXMLFile();
If I were you, I would make the LoadXMLFile function take a Uri parameter to make it more reusable:
var data = xfl.LoadXMLFile(uriToDownload);
You could create a class whose single responsibility is loading XML and returning it, leaving the class that calls your LoadXmlFile method to determine how to handle the resulting XML.
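One thing to watch: DownloadStringAsync returns before the download has finished, so a loader built on it cannot simply return the data. A minimal sketch of one way around that is below, where the caller passes in callbacks. The XmlFileLoader class and the parameter names are only suggestions, not an existing API, and TagNames is a small helper added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Xml.Linq;

public class XmlFileLoader
{
    // Downloads the file at 'uri' and hands the parsed document to 'onLoaded'.
    // Download or parse failures are passed to 'onError' instead.
    public void LoadXMLFile(Uri uri, Action<XDocument> onLoaded, Action<Exception> onError)
    {
        var client = new WebClient();
        client.DownloadStringCompleted += (sender, e) =>
        {
            if (e.Error != null)
            {
                onError(e.Error);
                return;
            }
            try
            {
                onLoaded(XDocument.Parse(e.Result));
            }
            catch (Exception ex)
            {
                onError(ex);
            }
        };
        client.DownloadStringAsync(uri);
    }

    // Helper that both "Home" and "Overview" could share: the name attribute of every Tag element.
    public static List<string> TagNames(XDocument doc)
    {
        return doc.Descendants("Tag")
                  .Select(t => (string)t.Attribute("name"))
                  .ToList();
    }
}
```

Each class would then call something like loader.LoadXMLFile(new Uri("codeFragments.xml", UriKind.RelativeOrAbsolute), doc => lsBox.ItemsSource = XmlFileLoader.TagNames(doc), ex => MessageBox.Show(ex.Message)); and decide for itself what to do with the document.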
I have looked all over for this. It could be me just typing the wrong thing in search I'm not sure. So, if you know a good tutorial or example of this please share. I'm trying to learn.
I have a C# Windows Form app I'm working on. I have information (movies in this case) saved in an XML file. I saved the xml file like this.
//Now we add new movie.
XmlElement nodRoot = doc.DocumentElement;
string allMyChildren = nodRoot.InnerText;
string capitalized = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(movieEditNameTextbox.Text);
int indexLookForNewMake = allMyChildren.IndexOf(capitalized);
if (indexLookForNewMake >= 0)
{
MessageBox.Show("Movie is already saved.", "Error");
}
else
{
XmlElement el = doc.CreateElement("Name");
el.InnerText = capitalized;
doc.DocumentElement.AppendChild(el);
//Check if Year is really a Number.
if (movieEditYearTextbox.Text.All(Char.IsDigit))
{
//Remove ' cause it gives errors.
string capitalizedFixed = capitalized.Replace("'", "");
string capitalizedFinalFixed = capitalizedFixed.Replace("\"", "");
//Assign Attribute to each New one.
el.SetAttribute("Name", capitalizedFinalFixed);
el.SetAttribute("Type", movieEditTypeDropdown.Text);
el.SetAttribute("Year", movieEditYearTextbox.Text);
//Reset all fields, they don't need data now.
movieEditNameTextbox.Text = "";
movieEditYearTextbox.Text = "";
movieEditTypeDropdown.SelectedIndex = -1;
removeMovieTextbox.Text = "";
doc.Save("movie.xml");
label4.Text = "Movie Has been Edited";
loadXml();
}
else
{
//Error out. Year not a Number
MessageBox.Show("Check movie year. Seems it isn't a number.", "Error");
}
}
That all works fine. Now what I'm trying to do is let you choose a directory, have the app search that directory and its subdirectories for file names, and save them into the XML file.
I used this to try to accomplish that. It does pull the list, but it doesn't save the new information.
I can't use LINQ, as it causes a conflict with other code for some reason.
DirectoryInfo dirCustom = new DirectoryInfo(@"D:\Video");
FileInfo[] filCustom;
filCustom = dirCustom.GetFiles("*",SearchOption.AllDirectories);
//Open XML File.
XmlDocument doc = new XmlDocument();
doc.Load("movie.xml");
XmlElement el = doc.CreateElement("Name");
string fulCustoms = filCustom.ToString();
foreach (FileInfo filFile in filCustom)
{
string capitalized = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(filFile.Name);
string capitalizedFixed = capitalized.Replace("\"", "");
el.SetAttribute("Name", capitalizedFixed);
el.SetAttribute("Type", "EDIT TYPE");
el.SetAttribute("Year", "EDIT YEAR");
richTextBox1.AppendText(capitalizedFixed + "\r\n");
}
doc.Save("movie.xml");
label4.Text = "Movie Has been Edited";
loadXml();
Now, the richTextBox does display the information correctly, but it doesn't get saved.
The loadXml() is just my noobish way to refresh the DataGridView.
I'm completely lost and don't know where to turn. I know my coding is probably horrible, lol. I'm new to this. This is the first more complex application I have worked on.
I can't think of anymore information that would help you understand what I mean. I hope you do.
Thank you so much for your help.
Not sure exactly what your loadXml() method does, but my only piece of advice for your issue is to change the way you are implementing this functionality.
Create a class called Movie
public class Movie
{
    public Movie() {}
    public String Title { get; set; }
    // ... further properties go here
}
Then create a MovieList
public class MovieList : List<Movie> { }
Then implement the following 2 methods inside the MovieList.
public static void Serialize(String path, MovieList movieList)
{
    XmlSerializer serializer = new XmlSerializer(typeof(MovieList));
    using (StreamWriter streamWriter = new StreamWriter(path))
    {
        serializer.Serialize(streamWriter, movieList);
    }
}
public static MovieList Deserialize(String path)
{
    XmlSerializer serializer = new XmlSerializer(typeof(MovieList));
    using (StreamReader streamReader = new StreamReader(path))
    {
        return (MovieList) serializer.Deserialize(streamReader);
    }
}
That's it... You now have your object serialized and you can retrieve the data to populate through binding or whatever other methods you choose.
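For the directory scan, the same idea might look roughly like the sketch below. It is self-contained, so it repeats Movie and MovieList. The Type and Year properties, the MovieStore class, and the FromDirectory method are assumptions made for this example, not part of the classes above. Note that a new Movie is created for every file inside the loop, so each file ends up as its own entry.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Movie
{
    public string Title { get; set; }
    public string Type { get; set; }   // assumed property, mirrors the "Type" attribute
    public string Year { get; set; }   // assumed property, mirrors the "Year" attribute
}

public class MovieList : List<Movie> { }

public static class MovieStore
{
    // Builds one Movie per file found under 'root', including subdirectories.
    public static MovieList FromDirectory(string root)
    {
        var movies = new MovieList();
        foreach (string path in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
        {
            movies.Add(new Movie
            {
                Title = Path.GetFileNameWithoutExtension(path),
                Type = "EDIT TYPE",
                Year = "EDIT YEAR"
            });
        }
        return movies;
    }

    // Writes the whole list to an XML file.
    public static void Save(string path, MovieList movies)
    {
        var serializer = new XmlSerializer(typeof(MovieList));
        using (var writer = new StreamWriter(path))
        {
            serializer.Serialize(writer, movies);
        }
    }

    // Reads the list back from an XML file written by Save.
    public static MovieList Load(string path)
    {
        var serializer = new XmlSerializer(typeof(MovieList));
        using (var reader = new StreamReader(path))
        {
            return (MovieList)serializer.Deserialize(reader);
        }
    }
}
```

Something like MovieStore.Save("movie.xml", MovieStore.FromDirectory(@"D:\Video")); would then replace the manual XmlDocument code. The file layout produced by XmlSerializer differs from your current movie.xml, so existing data would need to be converted once.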
How can I iterate SharePoint lists and subsites from a C# program? Is the SharePoint.dll from a SharePoint installation required for this, or is there a "Sharepoint client" dll available for remotely accessing that data?
Use the SharePoint web services; in particular, the Webs and Lists web services do what you ask.
For SharePoint 2007:
http://msdn.microsoft.com/en-us/library/bb862916(v=office.12).aspx
For SharePoint 2007 you will need to access the web services. In SharePoint 2010, there is a SharePoint client object model.
http://msdn.microsoft.com/en-us/library/ee857094%28office.14%29.aspx
I happen to be dealing with this very thing now... this works. I've dumbed down the code a bit to focus on just the mechanics. It's rough around the edges, but hopefully you get the idea. It's working for me.
Also, be sure to set up a web reference using the URL of your SharePoint site. Use that as your "web reference" below.
private <web reference>.Lists _Service;
private String _ListGuid, _ViewGuid;
private void Initialize()
{
    _Service = new <web reference>.Lists();
    _Service.Credentials = System.Net.CredentialCache.DefaultCredentials;
    _Service.Url = "https://sharepointsite/_vti_bin/lists.asmx";
}
// Turns a display name such as "Part Number" into the attribute name used in the result XML.
private String SpFieldName(String FieldName, Boolean Prefix)
{
    return String.Format("{0}{1}", Prefix ? "ows_" : null,
        FieldName.Replace(" ", "_x0020_"));
}
// Returns the value of the named field on a row, or null if the row does not have it.
private String GetFieldValue(XmlAttributeCollection AttributesList,
    String AttributeName)
{
    AttributeName = SpFieldName(AttributeName, true);
    return AttributesList[AttributeName] == null ?
        null : AttributesList[AttributeName].Value;
}
public void GetList()
{
    string rowLimit = "2000"; // or whatever
    System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
    System.Xml.XmlElement query = xmlDoc.CreateElement("Query");
    System.Xml.XmlElement viewFields = xmlDoc.CreateElement("ViewFields");
    System.Xml.XmlElement queryOptions =
        xmlDoc.CreateElement("QueryOptions");
    queryOptions.InnerXml = "";
    System.Xml.XmlNode nodes = _Service.GetListItems(_ListGuid, _ViewGuid,
        query, viewFields, rowLimit, null, null);
    foreach (System.Xml.XmlNode node in nodes)
    {
        if (node.Name.Equals("rs:data"))
        {
            for (int i = 0; i < node.ChildNodes.Count; i++)
            {
                if (node.ChildNodes[i].Name.Equals("z:row"))
                {
                    // Each z:row carries the field values as attributes.
                    XmlAttributeCollection att =
                        node.ChildNodes[i].Attributes;
                    String title = GetFieldValue(att, "Title");
                    String partNumber = GetFieldValue(att, "Part Number");
                }
            }
        }
    }
}
Also, the SpFieldName method is not iron-clad. It's just a good guess, for most field names in a list. This, unfortunately, is a journey of discovery. You need to expose the XML to find the actual field names if they don't match.
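If you run into field names containing characters other than spaces, one option that may get you closer is XmlConvert.EncodeName from System.Xml, which uses the same _xHHHH_ escape style. This is only a sketch: SpNames and AttributeNameFor are made-up names, and I am assuming the encoding lines up with what SharePoint produces. It will not help when the internal name differs from the display name, for example when a column was renamed after it was created.

```csharp
using System;
using System.Xml;

public static class SpNames
{
    // Builds a guess at the attribute name for a display name,
    // e.g. "Part Number" becomes "ows_Part_x0020_Number".
    public static string AttributeNameFor(string displayName)
    {
        return "ows_" + XmlConvert.EncodeName(displayName);
    }
}
```

It is still a guess, so dumping node.OuterXml for one row remains the reliable way to see the real attribute names.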
Good hunting.