URL XML Parsing in C#

I want to get data from an XML site, but only specific values: the Forex Buying values for USD/TRY, GBP/TRY and EUR/TRY. I don't know how to extract those values from the data. I have a test console program, and it looks like this:
using System;
using System.Xml;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string XmlUrl = "https://www.tcmb.gov.tr/kurlar/today.xml";
XmlTextReader reader = new XmlTextReader(XmlUrl);
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element: // The node is an element.
Console.Write("<" + reader.Name);
while (reader.MoveToNextAttribute()) // Read the attributes.
Console.Write(" " + reader.Name + "='" + reader.Value + "'");
Console.WriteLine(">");
break;
case XmlNodeType.Text: //Display the text in each element.
Console.WriteLine(reader.Value);
break;
case XmlNodeType.EndElement: //Display the end of the element.
Console.Write("</" + reader.Name);
Console.WriteLine(">");
break;
}
}
}
}
}
How can I extract just the values I want from the XML?
My desired output is a list of this class:
public class ParaBirimi
{
public decimal ForexBuying { get; set; }
public string Name { get; set; } // Values like USD, GBP, EUR
}

It is better to use the LINQ to XML API. It has been available in the .NET Framework since 2007 (version 3.5).
Here is your starting point. You can extend it to read any attribute or element.
XML fragment
<Tarih_Date Tarih="28.05.2021" Date="05/28/2021" Bulten_No="2021/100">
<Currency CrossOrder="0" Kod="USD" CurrencyCode="USD">
<Unit>1</Unit>
<Isim>ABD DOLARI</Isim>
<CurrencyName>US DOLLAR</CurrencyName>
<ForexBuying>8.5496</ForexBuying>
<ForexSelling>8.5651</ForexSelling>
<BanknoteBuying>8.5437</BanknoteBuying>
<BanknoteSelling>8.5779</BanknoteSelling>
<CrossRateUSD/>
<CrossRateOther/>
</Currency>
<Currency CrossOrder="1" Kod="AUD" CurrencyCode="AUD">
<Unit>1</Unit>
<Isim>AVUSTRALYA DOLARI</Isim>
<CurrencyName>AUSTRALIAN DOLLAR</CurrencyName>
<ForexBuying>6.5843</ForexBuying>
<ForexSelling>6.6272</ForexSelling>
<BanknoteBuying>6.5540</BanknoteBuying>
<BanknoteSelling>6.6670</BanknoteSelling>
<CrossRateUSD>1.2954</CrossRateUSD>
<CrossRateOther/>
</Currency>
...
</Tarih_Date>
void Main()
{
const string URL = @"https://www.tcmb.gov.tr/kurlar/today.xml";
XDocument xdoc = XDocument.Load(URL);
foreach (XElement elem in xdoc.Descendants("Currency"))
{
Console.WriteLine("CrossOrder: '{0}', Kod: '{1}', CurrencyCode: '{2}', Isim: '{3}', CurrencyName: '{4}'"
, elem.Attribute("CrossOrder").Value
, elem.Attribute("Kod").Value
, elem.Attribute("CurrencyCode").Value
, elem.Element("Isim").Value
, elem.Element("CurrencyName").Value
);
}
}
Output:
CrossOrder: '0', Kod: 'USD', CurrencyCode: 'USD', Isim: 'ABD DOLARI', CurrencyName: 'US DOLLAR'
CrossOrder: '1', Kod: 'AUD', CurrencyCode: 'AUD', Isim: 'AVUSTRALYA DOLARI', CurrencyName: 'AUSTRALIAN DOLLAR'
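The answer above prints every currency but stops short of the list the question asks for. A minimal sketch that builds a List<ParaBirimi> for USD, GBP and EUR might look like this. It repeats the question's ParaBirimi class so it compiles on its own; the names KurReader and GetRates are hypothetical. It filters on the CurrencyCode attribute and parses ForexBuying with CultureInfo.InvariantCulture, because the feed writes decimals with a dot while the Turkish culture (tr-TR) uses a comma as its decimal separator.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public class ParaBirimi
{
    public decimal ForexBuying { get; set; }
    public string Name { get; set; } // Values like USD, GBP, EUR
}

// Hypothetical helper class, not part of any library.
public static class KurReader
{
    // Currency codes to keep from the feed.
    static readonly HashSet<string> Wanted = new HashSet<string> { "USD", "GBP", "EUR" };

    public static List<ParaBirimi> GetRates(XDocument xdoc)
    {
        return xdoc.Descendants("Currency")
            // (string) on a missing attribute yields null, which Contains treats as "not wanted".
            .Where(c => Wanted.Contains((string)c.Attribute("CurrencyCode")))
            .Select(c => new ParaBirimi
            {
                Name = (string)c.Attribute("CurrencyCode"),
                // The feed uses '.' as the decimal separator, so parse culture-independently.
                ForexBuying = decimal.Parse((string)c.Element("ForexBuying"), CultureInfo.InvariantCulture)
            })
            .ToList();
    }
}
```

Assuming the feed is reachable, it could be called as KurReader.GetRates(XDocument.Load("https://www.tcmb.gov.tr/kurlar/today.xml")). The sketch does not divide by the Unit element, which is 1 in the fragment above; if you extend it to other currencies, check Unit before comparing rates.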

Related

C# web scraper to grab the number of Google results for a specific search term

I've been working on a web scraper as a Windows Forms application in C#. The user enters a search term, and the program then splits the search string into individual words and looks up the number of search results for each word through Yahoo and Google.
My issue lies in navigating the huge HTML document. I've tried multiple approaches, such as iterating recursively and comparing IDs, as well as using lambdas with Where statements. Both return null. I also manually looked through the HTML document to make sure the div with the id I want exists.
The id I'm looking for is "resultStats", but it is deeply nested. My code looks like this:
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace WebScraper2._0
{
public class Webscraper
{
private string Google = "http://google.com/#q=";
private string Yahoo = "http://search.yahoo.com/search?p=";
private HtmlWeb web = new HtmlWeb();
private HtmlDocument GoogleDoc = new HtmlDocument();
private HtmlDocument YahooDoc = new HtmlDocument();
public Webscraper()
{
Console.WriteLine("Init");
}
public int WebScrape(string searchterms)
{
//Console.WriteLine(searchterms);
string[] ssize = searchterms.Split(new char[0]);
int YahooMatches = 0;
int GoogleMatches = 0;
foreach (var term in ssize)
{
//Console.WriteLine(term);
var y = web.Load(Yahoo + term);
var g = web.Load(Google + term + "&cad=h");
YahooMatches += YahooFilter(y);
GoogleMatches += GoogleFilter(g);
}
Console.WriteLine("Yahoo found " + YahooMatches.ToString() + " matches");
Console.WriteLine("Google found " + GoogleMatches.ToString() + " matches");
return YahooMatches + GoogleMatches;
}
//Parse to get correct info
public int YahooFilter(HtmlDocument doc)
{
//Look for node with correct ID
IEnumerable<HtmlNode> nodes = doc.DocumentNode.Descendants().Where(n => n.HasClass("mw-jump-link"));
foreach (var item in nodes)
{
// displaying final output
Console.WriteLine(item.InnerText);
}
//TODO: Return search resultamount.
return 0;
}
int testCounter = 0;
string toReturn = "";
bool foundMatch = false;
//Parse to get correct info
public int GoogleFilter(HtmlDocument doc)
{
if (doc == null)
{
Console.WriteLine("Null");
}
foreach (var node in doc.DocumentNode.ChildNodes)
{
toReturn += Looper(node, testCounter, toReturn, foundMatch);
}
Console.WriteLine(toReturn);
/*
var stuff = doc.DocumentNode.Descendants("div")
.Where(node => node.GetAttributeValue("id", "")
.Equals("extabar")).ToList();
IEnumerable<HtmlNode> nodes = doc.DocumentNode.Descendants().Where(n => n.HasClass("appbar"));
*/
return 0;
}
public string Looper(HtmlNode node, int counter, string returnstring, bool foundMatch)
{
Console.WriteLine("Loop started" + counter.ToString());
counter++;
Console.WriteLine(node.Id);
if (node.Id == "resultStats")
{
returnstring += node.InnerText;
}
foreach (HtmlNode n in node.Descendants())
{
Looper(n, counter, returnstring, foundMatch);
}
return returnstring;
}
}
}
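Independent of Google's markup, the recursive Looper method has two problems that would make it miss the node even when it is present. Strings are immutable and passed by value, so returnstring += node.InnerText inside a recursive call changes only that call's copy, and the value returned by Looper(n, ...) is discarded. It also recurses over node.Descendants() (every level below the node) instead of node.ChildNodes, so nodes are visited many times over. The sketch below shows the corrected recursion pattern; to keep it runnable without HtmlAgilityPack it uses System.Xml.Linq's XElement as a stand-in for HtmlNode, and TreeSearch and FindById are hypothetical names. With HtmlAgilityPack itself, HtmlDocument.GetElementbyId performs this lookup directly. Note also that everything after # in http://google.com/#q= is a URL fragment, which is not sent to the server, so the downloaded page may not contain any results at all.

```csharp
using System.Xml.Linq;

// Hypothetical helper; XElement stands in for HtmlAgilityPack's HtmlNode.
public static class TreeSearch
{
    // Depth-first search over direct children only. The result of each
    // recursive call is returned to the caller instead of being discarded.
    public static XElement FindById(XElement node, string id)
    {
        if ((string)node.Attribute("id") == id)
            return node;

        foreach (XElement child in node.Elements())
        {
            XElement found = FindById(child, id);
            if (found != null)
                return found;
        }
        return null; // not found anywhere under this node
    }
}
```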
I made a Google HTML scraper a few weeks ago; a few things to consider:
First: Google doesn't like it when you try to scrape its search HTML. While I was running a list of companies to get their addresses and phone numbers, Google blocked my IP from accessing its website for a while (which caused a hilarious panic in the office).
Second: Google changes the HTML of the page (id names, etc.), so relying on ids won't work. In my case I used a combination of HTML tags and specific text to parse the response and extract the information I wanted.
Third: It's better to just use their API to get the information you need; just make sure you respect the free-tier query limit and you should be golden.
Here is the code I used.
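// SearchResult below requires: using System.Net; using System.Collections.Specialized;
// Returns the text between the first strStart and the next strEnd after it, or "" if either is missing.
public static string getBetween(string strSource, string strStart, string strEnd)
{
int start = strSource.IndexOf(strStart, StringComparison.Ordinal);
if (start < 0)
{
return "";
}
start += strStart.Length;
// Search for strEnd only after strStart; an strEnd that occurs only earlier would otherwise make Substring throw.
int end = strSource.IndexOf(strEnd, start, StringComparison.Ordinal);
if (end < 0)
{
return "";
}
return strSource.Substring(start, end - start);
}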
public void SearchResult()
{
//Run a Google Search
string uriString = "http://www.google.com/search";
string keywordString = "Search String";
WebClient webClient = new WebClient();
NameValueCollection nameValueCollection = new NameValueCollection();
nameValueCollection.Add("q", keywordString);
webClient.QueryString.Add(nameValueCollection);
string result = webClient.DownloadString(uriString);
string search = getBetween(result, "Address", "Hours");
rtbHtml.Text = getBetween(search, "\">", "<");
}
In my case I used the strings "Address" and "Hours" to limit the information I wanted to extract.
Edit: Fixed the logic and added the code I used.
Edit 2: Forgot to add the getBetween method (sorry, it's my first answer).

Build XPath for node from XmlReader

I am writing an application which parses dynamic xml from various sources and traverses the XML and returns all the unique elements.
Because the XML files are sometimes very large, I am using an XmlReader to parse the XML structure due to memory constraints.
public IDictionary<string, int> Discover(string filePath)
{
Dictionary<string, int> nodeTable = new Dictionary<string, int>();
using (XmlReader reader = XmlReader.Create(filePath))
{
while (!reader.EOF)
{
if (reader.NodeType == XmlNodeType.Element)
{
if (!nodeTable.ContainsKey(reader.LocalName))
{
nodeTable.Add(reader.LocalName, reader.Depth);
}
}
reader.Read();
}
}
Debug.WriteLine("The node table has {0} items.", nodeTable.Count);
return nodeTable;
}
This works a treat and performs well; however, the final piece of the puzzle eludes me: I am trying to generate the XPath for each element.
At first this seemed straightforward, using something like this:
var elements = new Stack<string>();
string path = string.Empty;
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
elements.Push(reader.LocalName);
break;
case XmlNodeType.EndElement:
elements.Pop();
break;
case XmlNodeType.Text:
path = string.Join("/", elements.Reverse());
break;
}
}
But this only gives me part of the solution. I want to return the XPath for every node in the tree that contains data, and also detect whether a given node tree contains nested collections of data.
For example:
<customers>
<customer id="2">
<name>ted smith</name>
<addresses>
<address1>
<line1></line1>
</address1>
<address2>
<line1></line1>
<line2></line2>
</address2>
</addresses>
</customer>
<customer id="322">
<name>smith mcsmith</name>
<addresses>
<address1>
<line1></line1>
<line2></line2>
</address1>
<address2>
<line1></line1>
<line2></line2>
</address2>
</addresses>
</customer>
</customers>
Keep in mind that the data is completely dynamic and the schema is unknown.
So the output should include:
/customer/name
/customer/address1/line1
/customer/address1/line2
/customer/address2/line1
/customer/address2/line2
I prefer a recursive method to push/pop. See the code below:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string input =
"<customers>" +
"<customer id=\"2\">" +
"<name>ted smith</name>" +
"<addresses>" +
"<address1>" +
"<line1></line1>" +
"</address1>" +
"<address2>" +
"<line1></line1>" +
"<line2></line2>" +
"</address2>" +
"</addresses>" +
"</customer>" +
"<customer id=\"322\">" +
"<name>smith mcsmith</name>" +
"<addresses>" +
"<address1>" +
"<line1></line1>" +
"<line2></line2>" +
"</address1>" +
"<address2>" +
"<line1></line1>" +
"<line2></line2>" +
"</address2>" +
"</addresses>" +
"</customer>" +
"</customers>";
StringReader sReader = new StringReader(input);
XmlReader reader = XmlReader.Create(sReader);
Node root = new Node();
ReadNode(reader, root);
}
static bool ReadNode(XmlReader reader, Node node)
{
Boolean done = false;
Boolean endElement = false;
while(done = reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
if (node.name.Length == 0)
{
node.name = reader.Name;
GetAttrubutes(reader, node);
}
else
{
Node newNode = new Node();
newNode.name = reader.Name;
if (node.children == null)
{
node.children = new List<Node>();
}
node.children.Add(newNode);
GetAttrubutes(reader, newNode);
done = ReadNode(reader, newNode);
}
break;
case XmlNodeType.EndElement:
endElement = true;
break;
case XmlNodeType.Text:
node.text = reader.Value;
break;
case XmlNodeType.Attribute:
if (node.attributes == null)
{
node.attributes = new Dictionary<string, string>();
}
node.attributes.Add(reader.Name, reader.Value);
break;
}
if (endElement)
break;
}
return done;
}
static void GetAttrubutes(XmlReader reader, Node node)
{
for (int i = 0; i < reader.AttributeCount; i++)
{
if (i == 0) node.attributes = new Dictionary<string, string>();
reader.MoveToNextAttribute();
node.attributes.Add(reader.Name, reader.Value);
}
}
}
public class Node
{
public string name = string.Empty;
public string text = string.Empty;
public Dictionary<string, string> attributes = null;
public List<Node> children = null;
}
}
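The recursive answer builds a tree of Node objects but never produces the paths the question asks for. Since the question uses XmlReader for memory reasons, here is a minimal streaming sketch, assuming that "contains data" means a leaf element (one with no child elements). It keeps the currently open elements in a list, records the path when a leaf closes, and counts how often each path occurs; a count above 1 is one hint that a path belongs to a repeated structure. Unlike the question's push/pop snippet, it also handles self-closing elements such as <line1/>, which produce no EndElement node. LeafPathFinder and FindLeafPaths are hypothetical names, and the sketch emits full paths from the root rather than the shortened forms listed in the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;

// Hypothetical helper, not a library API.
public static class LeafPathFinder
{
    // Maps every distinct path to a leaf element to the number of times it occurs.
    public static Dictionary<string, int> FindLeafPaths(XmlReader reader)
    {
        var counts = new Dictionary<string, int>();
        // Open elements, root first; HasChild records whether a child element was seen.
        var open = new List<(string Name, bool HasChild)>();

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (open.Count > 0)
                    open[open.Count - 1] = (open[open.Count - 1].Name, true);

                if (reader.IsEmptyElement)
                {
                    // <x/> produces no EndElement, so record it as a leaf immediately.
                    Record(counts, open.Select(f => f.Name).Concat(new[] { reader.LocalName }));
                }
                else
                {
                    open.Add((reader.LocalName, false));
                }
            }
            else if (reader.NodeType == XmlNodeType.EndElement)
            {
                if (!open[open.Count - 1].HasChild)
                    Record(counts, open.Select(f => f.Name));
                open.RemoveAt(open.Count - 1);
            }
        }
        return counts;
    }

    static void Record(Dictionary<string, int> counts, IEnumerable<string> names)
    {
        string path = "/" + string.Join("/", names);
        counts[path] = counts.TryGetValue(path, out int n) ? n + 1 : 1;
    }
}
```

For the question's sample input this reports five distinct paths, such as /customers/customer/addresses/address1/line1; /customers/customer/addresses/address1/line2 occurs once and the other four occur twice.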

While loop jumps unexpectedly while reading XML

I am reading an XML file using XmlDocument and XmlNodeReader. I don't know what happens in the while loop, but it fails to run several parts of the code.
Here is my C# code:
public string TitleXml;
public string NameXml;
public string TypeXml;
public string ValueXml;
public Guid GuidXml;
public string DataString;
XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(MyParent.xmlstring);
XmlNodeReader xreader = new XmlNodeReader(xdoc);
while (xreader.Read())
{
switch (xreader.Name)
{
case"GUID":
GuidXml = Guid.Parse(xreader.ReadInnerXml());
//after this break the name of the xreader changes.
break;
case "Type":
TypeXml = xreader.ReadInnerXml();
break;
case "Name":
NameXml = xreader.ReadInnerXml();
break;
case "Title":
TitleXml = xreader.ReadInnerXml();
break;
}
}
xreader.Close();
}
Here is my XML:
<Item>
<GUID>9A4FA56F-EAA0-49AF-B7F0-8CA09EA39167</GUID>
<Type>button</Type>
<Title>Save</Title>
<Value>submit</Value>
<Name>btnsave</Name>
<MaxLen>5</MaxLen>
</Item>
It doesn't exactly answer your question, but an easier way of solving this (at least in my opinion) would be:
XDocument doc = XDocument.Load("test.xml");
string TitleXml = doc.Descendants("Title").Single().Value;
string NameXml = doc.Descendants("Name").Single().Value;
string TypeXml = doc.Descendants("Type").Single().Value;
string ValueXml = doc.Descendants("Value").Single().Value;
Guid GuidXml = Guid.Parse(doc.Descendants("GUID").Single().Value);
I also think you should use Linq-to-XML, but for your example I'd explicitly list the elements, like so (compilable example program):
using System;
using System.Xml.Linq;
namespace ConsoleApplication1
{
internal class Program
{
static void Main()
{
string xml =
@"<Item>
<GUID>9A4FA56F-EAA0-49AF-B7F0-8CA09EA39167</GUID>
<Type>button</Type>
<Title>Save</Title>
<Value>submit</Value>
<Name>btnsave</Name>
<MaxLen>5</MaxLen>
</Item>";
XElement elem = XElement.Parse(xml);
Guid GuidXml = Guid.Parse(elem.Element("GUID").Value);
Console.WriteLine(GuidXml);
string TypeXml = elem.Element("Type").Value;
Console.WriteLine(TypeXml);
string NameXml = elem.Element("Name").Value;
Console.WriteLine(NameXml);
string TitleXml = elem.Element("Title").Value;
Console.WriteLine(TitleXml);
}
}
}
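As for why the original loop misbehaves: ReadInnerXml() already moves the reader past the element's end tag onto the next node, and the while (xreader.Read()) condition then advances once more. XmlDocument.LoadXml drops insignificant whitespace by default, so that next node is the following element itself (Type after GUID); its start tag is stepped over and its value is lost, which matches the observation that the reader's name changes after the break. If you want to stay with a reader, one sketch is to call Read() only when nothing was consumed on that iteration. ItemReader and ReadFields are hypothetical names, and ReadElementContentAsString is used in place of ReadInnerXml; it likewise advances past the end tag.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not a library API.
public static class ItemReader
{
    // Reads the text of the requested elements without skipping any nodes.
    public static Dictionary<string, string> ReadFields(string xml, params string[] fieldNames)
    {
        var wanted = new HashSet<string>(fieldNames);
        var values = new Dictionary<string, string>();

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && wanted.Contains(reader.Name))
                {
                    // Capture the name first: the read below moves the reader on.
                    string name = reader.Name;
                    // This already positions the reader after </name>,
                    // so Read() must NOT be called again on this iteration.
                    values[name] = reader.ReadElementContentAsString();
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return values;
    }
}
```

Usage: ItemReader.ReadFields(xml, "GUID", "Type", "Title", "Name") returns all four values, after which Guid.Parse can be applied to the "GUID" entry as in the question.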

XML: take the position of an element and go directly there on next use

So I have a huge XML file (a Wikipedia XML dump).
My school project requires that I be able to do a really fast search on this XML file (so no, I can't import it into an SQL database).
So of course I want to create an indexer that writes to a separate file (probably XML) something like this: [content to search]:[byte offset to the start of the XML node that contains the content]
My question is: how can I get the position of the element, and how can I jump to that position in the XML when a search needs it?
The project is in C#. Thank you in advance.
Later edit: I am trying to work with XmlReader, but I am open to any other suggestions.
For the moment, this is how I read my XML for a non-indexed search:
XmlReader reader = XmlReader.Create(FileName);
while (reader.Read())
{
switch (reader.Name)
{
case "page":
Boolean found = false;
String title = "";
String element = "<details>";
readMore(reader, "title");
title = reader.Value;
if (title.Contains(word))
{
found = true;
}
readMore(reader, "text");
String content = reader.Value;
if (!found && content.Contains(word))
{
found = true;
}
if (found)
{
element += "<summary>" + title + " (click)</summary>";
element += content;
element += "</details>";
result.Add(element);
}
break;
}
}
reader.Close();
if (result.Count == 0)
{
result.Add("No results were found");
}
return result;
…
static void readMore(XmlReader reader, String name)
{
while (reader.Name != name)
{
reader.Read();
}
reader.Read();
}
The correct solution would be to use an intermediary binary format; but if you can't do that, and assuming that you use the DOM, I don't see any solution other than storing the position of the node in the DOM tree as a list of indexes.
Example in JavaScript (it should be much the same in C#):
function getPosition(node) {
var pos = [], i = 0;
while (node != document.documentElement) {
if (node.previousSibling) {
++i;
node = node.previousSibling;
} else {
pos.unshift(i);
i = 0;
node = node.parentNode;
}
}
return pos;
}
function getNode(pos) {
var node = document.documentElement;
for (var i = 0; i < pos.length; ++i) {
node = node.childNodes[pos[i]];
}
return node;
}
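The answer above stores a DOM path, which requires loading the whole document. For the byte offsets the question asks about, note that XmlReader does not report byte positions (through IXmlLineInfo it can report line and column numbers). A different technique, sketched here under stated assumptions, is to find the offsets by scanning the raw bytes, then seek the stream to a stored offset and parse just that one element in fragment mode. It assumes the file is UTF-8 and that every page begins with a literal <page> start tag without attributes; that text cannot appear unescaped inside element content, though it could inside a CDATA section or comment. PageIndex, IndexPageOffsets and ReadPageAt are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not a library API.
public static class PageIndex
{
    // Returns the byte offset of every "<page>" start tag. In UTF-8 every byte of a
    // multi-byte character is >= 0x80, so an ASCII pattern can be matched byte by byte.
    public static List<long> IndexPageOffsets(Stream stream)
    {
        byte[] pattern = Encoding.ASCII.GetBytes("<page>");
        var offsets = new List<long>();
        int matched = 0;   // pattern bytes matched so far
        long position = 0; // index of the byte just read
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            if (b == pattern[matched])
                matched++;
            else
                // '<' occurs only at the start of the pattern, so after a mismatch
                // a new match can begin only if the current byte is '<'.
                matched = (b == pattern[0]) ? 1 : 0;

            if (matched == pattern.Length)
            {
                offsets.Add(position - pattern.Length + 1);
                matched = 0;
            }
            position++;
        }
        return offsets;
    }

    // Seeks to a stored offset and parses only the <page> element found there.
    public static XElement ReadPageAt(Stream stream, long offset)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // the root element is never seen
            CloseInput = false
        };
        using (XmlReader reader = XmlReader.Create(stream, settings))
        {
            reader.MoveToContent();
            // ReadSubtree confines parsing to this one element, so the reader never
            // reaches the document root's unmatched closing tag.
            using (XmlReader page = reader.ReadSubtree())
            {
                return XElement.Load(page);
            }
        }
    }
}
```

To build the index, open the dump with a FileStream (a larger buffer may help, since the scan reads one byte at a time), store the offsets next to the indexed content, and later call ReadPageAt on a stream opened over the same file. Because the root element is never read, the loaded elements have no namespace even if the dump declares a default namespace on its root.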

Get attribute name in addition to attribute value in XML

I am receiving dynamic XML where I won't know the attribute names. I tried to make a simple example (see the XML and code below): I can get the attribute values, i.e. "myName", "myNextAttribute", and "blah", but I can't get the attribute names, i.e. "name", "nextAttribute", and "etc1". Any ideas? I figure it has to be something easy I'm missing, but I'm sure missing it.
static void Main(string[] args)
{
string xml = "<test name=\"myName\" nextAttribute=\"myNextAttribute\" etc1=\"blah\"/>";
TextReader sr = new StringReader(xml);
using (XmlReader xr = XmlReader.Create(sr))
{
while (xr.Read())
{
switch (xr.NodeType)
{
case XmlNodeType.Element:
if (xr.HasAttributes)
{
for (int i = 0; i < xr.AttributeCount; i++)
{
System.Windows.Forms.MessageBox.Show(xr.GetAttribute(i));
}
}
break;
default:
break;
}
}
}
}
You can see this in the MSDN example:
if (reader.HasAttributes) {
Console.WriteLine("Attributes of <" + reader.Name + ">");
while (reader.MoveToNextAttribute()) {
Console.WriteLine(" {0}={1}", reader.Name, reader.Value);
}
// Move the reader back to the element node.
reader.MoveToElement();
}
Your switch is unnecessary since you only have a single case; try rolling it into your if statement instead:
if (xr.NodeType == XmlNodeType.Element && xr.HasAttributes)
{
...
}
Note that the && operator evaluates in order, so if xr.NodeType is not XmlNodeType.Element, HasAttributes is never evaluated and the if block is skipped.
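To answer the question directly: after MoveToAttribute(i), the reader is positioned on the attribute node itself, so Name returns the attribute's name and Value its value. A minimal sketch of the question's loop rewritten that way, collecting pairs instead of showing message boxes (AttributeLister and ListAttributes are hypothetical names):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not a library API.
public static class AttributeLister
{
    // Returns (name, value) pairs for every attribute of every element, in document order.
    public static List<KeyValuePair<string, string>> ListAttributes(string xml)
    {
        var result = new List<KeyValuePair<string, string>>();
        using (XmlReader xr = XmlReader.Create(new StringReader(xml)))
        {
            while (xr.Read())
            {
                if (xr.NodeType == XmlNodeType.Element && xr.HasAttributes)
                {
                    for (int i = 0; i < xr.AttributeCount; i++)
                    {
                        // Now positioned on the attribute, so Name/Value describe it.
                        xr.MoveToAttribute(i);
                        result.Add(new KeyValuePair<string, string>(xr.Name, xr.Value));
                    }
                    // Return to the element node before continuing the read loop.
                    xr.MoveToElement();
                }
            }
        }
        return result;
    }
}
```

For the question's <test .../> element this returns name=myName, nextAttribute=myNextAttribute and etc1=blah.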
