Parse particular text from an XML string - c#

I'm writing an app which reads an RSS feed and places items on a map.
I need to read only the lat and lon numbers from this string:
http://www.digitalvision.se/feed.aspx?isAlert=true&lat=53.647351&lon=-1.933506
This is contained in the link tags.
I'm a bit of a programming noob, but I'm writing this in C#/Silverlight using LINQ to XML.
Should this text be extracted while parsing, or after parsing by sending it to a separate class?
Many thanks for your assistance.
EDIT: I'm going to try a regex on this.
This is where I need to integrate the regex into my code. I need to take the lat and lon from the link element and separate them into two variables I can use (the results are part of a foreach loop that creates a list).
var events = from ev in document.Descendants("item")
             select new
             {
                 Title = (ev.Element("title").Value),
                 Description = (ev.Element("description").Value),
                 Link = (ev.Element("link").Value),
             };
The question is that I'm not quite sure where to put the regex (once I work out how to use the regex properly! :-) ).

Try this:
var url = "http://www.xxxxxxxxxxxxxx.co.uk/map.aspx?isTrafficAlert=true&lat=53.647351&lon=-1.93350";
var items = url.Split('?')[1]
.Split('&')
.Select(i => i.Split('='))
.ToDictionary(o => o[0], o => o[1]);
var lon = items["lon"];
var lat = items["lat"];

If you only need the Lat and Lon values and the feed is just one big XML string, you can do the whole thing with a regular expression:
var rssFeed = @"http://www.xxxxxxxxxxxxxx.co.uk/map.aspx?isTrafficAlert=true&lat=53.647351&lon=-1.933506
http://www.xxxxxxxxxxxxxx.co.uk/map.aspx?isTrafficAlert=true&lat=53.647352&lon=-1.933507
http://www.xxxxxxxxxxxxxx.co.uk/map.aspx?isTrafficAlert=true&lat=53.647353&lon=-1.933508
http://www.xxxxxxxxxxxxxx.co.uk/map.aspx?isTrafficAlert=true&lat=53.647354&lon=-1.933509";
var regex = new Regex(@"lat=(?<Lat>[+-]?\d*\.\d*)&lon=(?<Lon>[+-]?\d*\.\d*)");
var latLongPairs = new List<Tuple<decimal, decimal>>();
foreach (Match match in regex.Matches(rssFeed))
{
    // Parse with the invariant culture (System.Globalization) so "." is always the decimal separator.
    var lat = Convert.ToDecimal(match.Groups["Lat"].Value, CultureInfo.InvariantCulture);
    var lon = Convert.ToDecimal(match.Groups["Lon"].Value, CultureInfo.InvariantCulture);
    latLongPairs.Add(new Tuple<decimal, decimal>(lat, lon));
}
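To address where the extraction could sit in the question's LINQ to XML query, here is a minimal sketch that pulls lat and lon out of each link while the items are projected. The MapItem and FeedParser names are hypothetical and not from the question, and the sketch assumes each link carries lat and lon as query string parameters; items without both values are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical result type; the question used an anonymous type instead.
public class MapItem
{
    public string Title { get; set; }
    public string Link { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class FeedParser
{
    // Each pattern finds one query parameter, so the order of lat and lon does not matter.
    private static readonly Regex LatRegex = new Regex(@"[?&]lat=(?<v>[+-]?\d+(\.\d+)?)");
    private static readonly Regex LonRegex = new Regex(@"[?&]lon=(?<v>[+-]?\d+(\.\d+)?)");

    public static List<MapItem> Parse(string rssXml)
    {
        XDocument document = XDocument.Parse(rssXml);
        return (from ev in document.Descendants("item")
                // The cast yields null for a missing element instead of throwing.
                let link = (string)ev.Element("link") ?? ""
                let lat = LatRegex.Match(link)
                let lon = LonRegex.Match(link)
                where lat.Success && lon.Success
                select new MapItem
                {
                    Title = (string)ev.Element("title"),
                    Link = link,
                    // Invariant culture keeps "." as the decimal separator on every machine.
                    Latitude = double.Parse(lat.Groups["v"].Value, CultureInfo.InvariantCulture),
                    Longitude = double.Parse(lon.Groups["v"].Value, CultureInfo.InvariantCulture)
                }).ToList();
    }
}
```

Doing the extraction inside the query keeps each item's coordinates next to its title, so the later foreach loop only has to place the items on the map.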

Related

Get everything after Slash c#

I'm trying to figure out the best way to get everything after the last / character in a string. My current code and an example string are below.
var url = dr.FindElements(By.XPath("//*[@id=\"u_0_3\"]/div/h1/a"));
foreach (var item in url)
{
    if (item.GetAttribute("href").ToString().Contains("https://www.facebook.com/"))
    {
        listBox4.Items.Add("here");
    }
}
The href looks like "http://facebook.com/xxx".
I want only the xxx part, which is the username, in my list box without the rest of the URL.
If you're at the point where you've got the string you want to work with, here are two ways to do this:
Split the string by / and take the last part:
var stringToProcess = "https://www.facebook.com/ProfileName";
var partsOfString = stringToProcess.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
var profileName = partsOfString.Last();
Use the Uri class to extract the last part:
var stringToProcess = "https://www.facebook.com/ProfileName";
var stringToProcessAsUri = new Uri(stringToProcess);
var profileNameFromUri = stringToProcessAsUri.Segments.Last();
This is the "strictly better" way, as it will give you a clean result even if the profile address has a query string attached to it, e.g.:
var stringToProcess = "https://www.facebook.com/ProfileName?abc=def";
var stringToProcessAsUri = new Uri(stringToProcess);
var profileNameFromUri = stringToProcessAsUri.Segments.Last();
The variable profileNameFromUri will still contain only ProfileName.
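One caveat with Uri.Segments is that a segment keeps its trailing slash, so an address ending in "/" yields "ProfileName/" rather than "ProfileName". The following is a minimal sketch of a helper that trims the path first; ProfileUrl and GetLastSegment are hypothetical names, and the sketch assumes the input is an absolute URL.

```csharp
using System;

public static class ProfileUrl
{
    // Returns the last path segment of an absolute URL, ignoring
    // any query string, fragment, and trailing slash.
    public static string GetLastSegment(string url)
    {
        var uri = new Uri(url);
        // AbsolutePath contains only the path, e.g. "/ProfileName/".
        string path = uri.AbsolutePath.TrimEnd('/');
        int lastSlash = path.LastIndexOf('/');
        // When no slash is left, lastSlash is -1 and the whole (empty) path is returned.
        return Uri.UnescapeDataString(path.Substring(lastSlash + 1));
    }
}
```

With this helper, a call such as listBox4.Items.Add(ProfileUrl.GetLastSegment(item.GetAttribute("href"))) would add only the username part to the list box.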

How to deal with non-fixed xelement?

I'm using a method to parse an XDocument into an object, but I have a problem with these lines of code:
var xElementTax = xElementXml.Element(xn + "tax");
var aux = xElementTax.Element(xn + "taxNN").Value;
In the taxNN XName, the NN part is a random number, e.g. tax01, tax02, tax03 and so on. It could be any two-digit number.
How can I deal with this situation where I don't have a fixed tag? The only fixed part of the tag is the word tax.
Thanks.
Are you looping through all the elements of xElementTax?
If so, you can just go with this:
foreach (XElement auxElement in xElementTax.Elements())
{
    var aux = auxElement.Value;
    // And so on
}
If you want only those which match "taxNN", you can instead go with:
foreach (XElement auxElement in xElementTax.Elements().Where(x => x.Name.Namespace == xn && x.Name.LocalName.StartsWith("tax")))
{
    var aux = auxElement.Value;
    // ...
}
If there's only going to be one of them, you can go with:
XElement auxElement = xElementTax.Elements()
    .Where(x => x.Name.Namespace == xn && x.Name.LocalName.StartsWith("tax"))
    .FirstOrDefault();
// FirstOrDefault returns null when nothing matches, so guard before reading Value.
var aux = auxElement != null ? auxElement.Value : null;
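Since the question says the suffix is always a two-digit number, a stricter match than a prefix test may be useful, because a prefix test would also accept a name such as taxable. Here is a minimal sketch under that assumption; TaxReader and ReadTaxValues are hypothetical names, and xn is assumed to be the XNamespace used in the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class TaxReader
{
    // "tax" followed by exactly two digits, e.g. tax01 or tax17.
    private static readonly Regex TaxName = new Regex(@"^tax\d{2}$");

    // Returns the values of all child elements of parent named taxNN in namespace xn.
    public static List<string> ReadTaxValues(XElement parent, XNamespace xn)
    {
        return parent.Elements()
            .Where(e => e.Name.Namespace == xn && TaxName.IsMatch(e.Name.LocalName))
            .Select(e => e.Value)
            .ToList();
    }
}
```

Comparing Name.LocalName rather than Name.ToString() avoids having to account for the "{namespace}" prefix that the expanded name carries.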

Removing duplicate words in url and filtering out words not in a list

I have links in which the subcategory is repeated a bunch of times. I want to remove the repeats, and keep a subcategory only if it is in a certain list, while also keeping the last part of the link. The list is:
About
Video
Example1
Example2
www.example.com/About/About/Videos/Videos/Videos/Featured/5-great-videos
should be
www.example.com/about/videos/5-great-videos
Any help?
How about this one, using LINQ:
string str = "www.example.com/About/About/Videos/Videos/Videos/Featured/5-great-videos";
var result = str.Split('/').GroupBy(x=>x).Select(x=>x.Key).Aggregate((a,b)=>a+"/"+b);
If you prefer not to use LINQ:
string url = "www.example.com/About/About/Videos/Videos/Videos/Featured/5-great-videos";
List<string> urlList = new List<string>();
foreach (string urlToken in url.Split('/'))
if (!urlList.Contains(urlToken)) urlList.Add(urlToken);
url = String.Join("/", urlList.ToArray());
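Both snippets above remove the repeats but leave Featured in place and keep the original casing. As a minimal sketch of the remaining requirements (keep only listed categories, lowercase them, always keep the last part), something like the following could work. UrlCleaner and Clean are hypothetical names, the input is assumed to have no scheme prefix such as http://, and the allowed list is assumed to contain "Videos" as it appears in the example link rather than "Video".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlCleaner
{
    // Keeps the host, the distinct allowed categories (lowercased), and the last part.
    public static string Clean(string url, IEnumerable<string> allowedCategories)
    {
        var allowed = new HashSet<string>(allowedCategories, StringComparer.OrdinalIgnoreCase);
        string[] parts = url.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2) return url; // nothing to clean

        string host = parts[0];
        string last = parts[parts.Length - 1];

        // Everything between the host and the last part is treated as a category.
        var categories = parts.Skip(1).Take(parts.Length - 2)
            .Where(p => allowed.Contains(p))
            .Select(p => p.ToLowerInvariant())
            .Distinct();

        return string.Join("/", new[] { host }.Concat(categories).Concat(new[] { last }));
    }
}
```

Calling UrlCleaner.Clean with the example link and the list About, Videos, Example1, Example2 returns www.example.com/about/videos/5-great-videos.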

Get all RSS links on a website

I'm currently writing a very basic program that'll firstly go through the html code of a website to find all RSS Links, and thereafter put the RSS Links into an array and parse each content of the links into an existing XML file.
However, I'm still learning C# and I'm not that familiar with all the classes yet. I have done all this in PHP by writing my own class with file_get_contents() and also using cURL to do the work. I managed to do it in Java as well. Anyhow, I'm trying to achieve the same result in C#, but I think I'm doing something wrong here.
TLDR; What's the best way to write the regex to find all RSS links on a website?
So far, my code looks like this:
private List<string> getRSSLinks(string websiteUrl)
{
    List<string> links = new List<string>();
    MatchCollection collection = Regex.Matches(websiteUrl, @"(<link.*?>.*?</link>)", RegexOptions.Singleline);
    foreach (Match singleMatch in collection)
    {
        string text = singleMatch.Groups[1].Value;
        Match matchRSSLink = Regex.Match(text, @"type=\""(application/rss+xml)\""", RegexOptions.Singleline);
        if (matchRSSLink.Success)
        {
            links.Add(text);
        }
    }
    return links;
}
Don't use regex to parse HTML. Use an HTML parser instead. See this link for the explanation.
I prefer HtmlAgilityPack for parsing HTML:
using (var client = new WebClient())
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(client.DownloadString("http://www.xul.fr/en-xml-rss.html"));
    var rssLinks = doc.DocumentNode.Descendants("link")
        .Where(n => n.Attributes["type"] != null && n.Attributes["type"].Value == "application/rss+xml")
        .Select(n => n.Attributes["href"].Value)
        .ToArray();
}

Improving Linq-XML to Object query

I want to use LINQ to extract data from an XML document and place it into a list:
<Data>
  <FlightData DTS="20110216 17:17" flight="1234" origin="CYYZ" dest="CYUL" aircraft="945">
    <TLDRequest>
      <Airline>ABC</Airline>
      <AcReg>C-FABC</AcReg>
      <CalcType>T</CalcType>
      <OAT>-05</OAT>
      <Wind>060/10</Wind>
      <Flaps>5</Flaps>
      <Switches></Switches>
      <Runways>
        <Rwy>6L</Rwy>
        <Rwy>6R</Rwy>
      </Runways>
      ...
    </TLDRequest>
    ...
  </FlightData>
</Data>
My LINQ code in C# works, and I can get attributes from the FlightData tag, but I think it could be more efficient, especially when getting data from the TLDRequest tag. Can I get some insight into best practices for reaching and reading child tags?
public static List<ACARS_Phase> createAcarsPhaseObject(XDocument xDoc)
{
    return (from ao in xDoc.Descendants("FlightData")
            select new ACARS_Phase
            {
                FlightDate = DateTime.ParseExact(ao.Attribute("DTS").Value, "yyyyMMdd HH:mm", new CultureInfo("en-CA")),
                FlightNumber = ao.Attribute("flight").Value,
                Origin = ao.Attribute("origin").Value,
                Destination = ao.Attribute("dest").Value,
                InternalFinNumber = ao.Attribute("aircraft").Value,
                OperatorCode = ao.Element("TLDRequest").Element("Airline").Value,
                RegistrationNumber = ao.Element("TLDRequest").Element("AcReg").Value,
                Wind = ao.Element("TLDRequest").Element("Wind").Value,
                Flaps = ao.Element("TLDRequest").Element("Flaps").Value,
                OAT = ao.Element("TLDRequest").Element("OAT").Value,
            }).ToList();
}
Best regards
Your query is fine, generally speaking. If you want to cut down on some of the redundancy, consider using let to get the TLDRequest element once, so you repeat yourself a bit less.
return (from ao in xDoc.Descendants("FlightData")
        let request = ao.Element("TLDRequest")
        select new ACARS_Phase
        {
            // stuff
            OperatorCode = request.Element("Airline").Value,
            RegistrationNumber = request.Element("AcReg").Value,
            // etc.
        }).ToList();
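As a further sketch of reading child tags, the let clause can be combined with explicit casts, which return null for a missing element or attribute instead of throwing, and with the Elements extension method to collect the repeated Rwy values. FlightRequest and FlightReader are hypothetical stand-ins, since the ACARS_Phase class is not shown in the question, and only a few of the properties are included.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the question's ACARS_Phase class.
public class FlightRequest
{
    public string FlightNumber { get; set; }
    public string OperatorCode { get; set; }
    public string Switches { get; set; }
    public List<string> Runways { get; set; }
}

public static class FlightReader
{
    public static List<FlightRequest> Read(XDocument xDoc)
    {
        return (from ao in xDoc.Descendants("FlightData")
                let request = ao.Element("TLDRequest")
                where request != null // skip flights without a TLDRequest
                select new FlightRequest
                {
                    // Casting yields null when the attribute or element is absent.
                    FlightNumber = (string)ao.Attribute("flight"),
                    OperatorCode = (string)request.Element("Airline"),
                    Switches = (string)request.Element("Switches"),
                    // Collects every Rwy under Runways; empty list when there are none.
                    Runways = request.Elements("Runways").Elements("Rwy")
                                     .Select(r => r.Value)
                                     .ToList()
                }).ToList();
    }
}
```

Whether a missing element should be tolerated or treated as an error depends on the feed; .Value is the stricter choice because it fails immediately on unexpected input.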
