Better way to use LINQ To XML for an HTML Page - c#

I am looking for specific items on a web page.
What I did (to test, so far) is working just fine, but it is really ugly to my eyes. I would like suggestions for doing this in a more concise manner, that is, with ONE LINQ query instead of the current two.
document.GetXDocument();
string xmlns = "{http://www.w3.org/1999/xhtml}";
var AllElements = from AnyElement in document.fullPage.Descendants(xmlns + "div")
where AnyElement.Attribute("id") != null && AnyElement.Attribute("id").Value == "maincolumn"
select AnyElement;
// This first query returns only one LARGE element.
XDocument subdocument = new XDocument(AllElements);
var myElements = from item in subdocument.Descendants(xmlns + "img")
where String.IsNullOrEmpty(item.Attribute("src").Value.Trim()) != true
select item;
foreach (var element in myElements)
{
Console.WriteLine(element.Attribute("src").Value.Trim());
}
Assert.IsNotNull(myElements.Count());
I know I could directly look for "img", but I want to be able to get other types of items in those pages, like links and some text.
I strongly doubt this is the best way!

The same logic in a single query:
var myElements = from element in document.fullPage.Descendants(xmlns + "div")
where element.Attribute("id") != null
&& element.Attribute("id").Value == "maincolumn"
from item in new XDocument(element).Descendants(xmlns + "img")
where !String.IsNullOrEmpty(item.Attribute("src").Value.Trim())
select item;

If you insist on parsing the web page as XML, try this:
var elements =
from element in document.Descendants(xmlns + "div")
where (string)element.Attribute("id") == "maincolumn"
from element2 in element.Descendants(xmlns + "img")
let src = ((string)element2.Attribute("src") ?? "").Trim()
where !String.IsNullOrEmpty(src)
select new {
element2,
src
};
foreach (var item in elements) {
Console.WriteLine(item.src);
}
Notes:
What is the type of document? I am assuming it's an XDocument. If that is the case, you can use Descendants directly on XDocument. (OTOH, if document is an XDocument, where does that fullPage property come from?)
Cast the XAttribute to a string. If the attribute is missing, the result of the cast will be null. This saves the double check. (It doesn't offer any performance benefit.)
Use let to "save" a value for later reuse, in this case for use in the foreach. That is only worthwhile if you need the values, though; if all you need is that final Assert, it might be more efficient to use Any instead of Count. Any only has to iterate over the first result in order to return a value; Count has to iterate over all of them.
Why is subdocument of type XDocument? Wouldn't XElement be the appropriate type?
You can also use String.IsNullOrWhiteSpace to check for whitespace in src, instead of String.IsNullOrEmpty, assuming you want to process the src as is, with any whitespace it might have.
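To make the notes above concrete, here is a minimal sketch that combines the cast-to-string trick, let, and Any in one place. It assumes the page is available as an XDocument; the class name MainColumnImages, the method GetSources, and the sample markup are made up for illustration and are not from the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class MainColumnImages
{
    // XHTML namespace, as used in the question.
    private static readonly XNamespace Xhtml = "http://www.w3.org/1999/xhtml";

    // Returns the trimmed, non-blank src values of every img inside div#maincolumn.
    public static string[] GetSources(XDocument page)
    {
        var sources =
            from div in page.Descendants(Xhtml + "div")
            // The cast yields null when the id attribute is missing,
            // so no separate null check is needed.
            where (string)div.Attribute("id") == "maincolumn"
            from img in div.Descendants(Xhtml + "img")
            let src = (string)img.Attribute("src")
            where !String.IsNullOrWhiteSpace(src)
            select src.Trim();
        return sources.ToArray();
    }

    public static void Main()
    {
        // Hypothetical sample page, only for demonstration.
        var page = XDocument.Parse(
            "<html xmlns='http://www.w3.org/1999/xhtml'><body>" +
            "<div id='maincolumn'><img src=' a.png '/><img src='  '/><img/></div>" +
            "<div id='other'><img src='b.png'/></div>" +
            "</body></html>");

        string[] sources = GetSources(page);
        Console.WriteLine(string.Join(",", sources)); // a.png
        Console.WriteLine(sources.Any());             // True
    }
}
```

The img without a src and the one with a blank src are both skipped without throwing, which is the point of casting instead of using Value.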

Related

How to compare ItemElements of a Radcombobox with an expected string?

I need to check whether a Radcombobox has ItemElements that match my expected string. Here is what I'm trying to do:
foreach (IRadComboBoxItem item in comboBox.ItemElements)
{
var itemExists = comboBox.ItemElements.FirstOrDefault(items => item.Text.Contains(expectedString));
if (itemExists == null) continue;
itemExists.Select();
return true;
}
However comboBox.Text.Contains(expectedString) is not supported as I'm comparing IRadComboBoxItem with a string. Could you please suggest how to achieve this?
Use the LINQ Any method:
return comboBox.ItemElements.Any(item => item.Text.Contains(expectedString));
In your code above you mixed up the use of several LINQ methods:
FirstOrDefault returns the first item in a collection that matches a predicate, or default(T) if there is none.
Then, if the result is not null, you call Select but do not assign its result to anything.
You have this code in a foreach loop, but the loop is not needed: the LINQ methods already iterate over the collection behind the scenes. (Note also that the lambda parameter is named items while the predicate tests the loop variable item.)
Following your comment, what you want is:
var wantedItem = comboBox.ItemElements.FirstOrDefault(item => item.Text.Contains(expectedString));
if(wantedItem != null)
{
//What you want to do with item
}
I haven't worked with RadComboBox myself, but according to this site, this may work:
RadComboBoxItem item = comboBox.FindItemByText(expectedString);
I assume it returns null if no item is found.

linq looping through tags with the same name

I'm somewhat new to LINQ and could use some help.
I have an xml file that looks like this:
<InputPath>
<path isRename="Off" isRouter="Off" pattern="pattern-1">d:\temp1</path>
<path isRename="Off" isRouter="pattern-1">d:\temp2</path>
</InputPath>
I need to loop through and get the key values of the tag "path".
What I have so far is
var results = from c in rootElement.Descendants("InputPath") select c;
foreach (XElement _path in results)
{
string value = _path.Element("path").Value;
}
But I only get the last <path> value. Any help would be great.
Have you tried just enumerating the path items?
foreach (var element in rootElement.Descendants("path"))
{
var value = element.Value;
}
You'll only get the first element that way, because that's what the Element method gives you: the first child element with the given name.
If you want multiple elements you can just use Elements instead:
// Note: the query expression here is pointless.
var results = from c in rootElement.Descendants("InputPath") select c;
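foreach (XElement _path in results)
{
    // Elements returns a sequence, so iterate over it to read each Value.
    foreach (XElement path in _path.Elements("path"))
    {
        string value = path.Value;
        // Use value here...
    }
}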
Alternatively, use the Elements extension method and do it all in one go:
foreach (var path in rootElement.Descendants("InputPath").Elements("path"))
{
string value = path.Value;
// Use value here
}
If that doesn't help, please give more information about what you're trying to do and what the problem is.
If by "last" you mean "the element contents" that's because you're using the Value property. If you want the attributes within the path element, you need the Attribute method, as shown by IamStalker, although personally I'd usually cast the XAttribute to string (or whatever) rather than using the Value property, in case the attribute is missing. (It depends on what you want the behaviour to be in that case.)
What you need is to loop through the path elements and read their attributes, like so:
foreach (XElement xElem in rootElement.Descendants("path"))
{
string isRename = xElem.Attribute("isRename").Value;
}
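As a rough sketch of the two suggestions above taken together (enumerating every path element, and casting attributes to string so a missing one gives null instead of throwing), something like the following should work. The wrapping config element, the class name PathReader, and the "n/a" placeholder are assumptions added for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PathReader
{
    // Builds one description per <path> element found under any <InputPath>.
    public static string[] Describe(XElement root)
    {
        return root.Descendants("InputPath")
            .Elements("path")
            .Select(p => string.Format("{0} (isRename={1}, pattern={2})",
                p.Value,
                // The cast returns null for a missing attribute; Value would throw.
                (string)p.Attribute("isRename") ?? "n/a",
                (string)p.Attribute("pattern") ?? "n/a"))
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical document modelled on the XML in the question.
        XElement root = XElement.Parse(
            @"<config><InputPath>
                <path isRename='Off' isRouter='Off' pattern='pattern-1'>d:\temp1</path>
                <path isRename='Off'>d:\temp2</path>
              </InputPath></config>");

        foreach (string line in Describe(root))
        {
            Console.WriteLine(line);
        }
        // d:\temp1 (isRename=Off, pattern=pattern-1)
        // d:\temp2 (isRename=Off, pattern=n/a)
    }
}
```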

Extracting XElement children and grandchildren by name

I have an XElement (myParent) containing multiple levels of children that I wish to extract data from. The elements of interest are at known locations in the parent.
I understand that I am able to get a child element by:
myParent.Element(childName);
or
myParent.Element(level1).Element(childName);
I am having trouble figuring out how to do this if I want to loop through an array or a list of elements that are at different levels. For instance, I am interested in getting the following set of elements:
myParent.Element("FieldOutputs").Element("Capacity");
myParent.Element("EngOutputs").Element("Performance")
myParent.Element("EngOutputs").Element("Unit").Element("Efficiency")
How can I define these locations in an array so that I can simply loop through the array?
i.e.
string[] myStringArray = {"FieldOutputs.Capacity", "EngOutputs.Performance", "EngOutputs.Unit.Efficiency"};
for (int i=0; i< myArray.Count(); i++)
{
XElement myElement = myParent.Element(myStringArray);
}
I understand that the method above does not work, but just wanted to show effectively what I am trying to achieve.
Any feedback is appreciated.
Thank you,
Justin
While normally I'm reluctant to suggest using XPath, it's probably the most appropriate approach here, using XPathSelectElement:
string[] paths = { "FieldOutputs/Capacity", "EngOutputs/Performance",
"EngOutputs/Unit/Efficiency"};
foreach (string path in paths)
{
XElement element = parent.XPathSelectElement(path);
if (element != null)
{
// ...
}
}
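For reference, a self-contained version of the loop above might look like this. Note that XPathSelectElement is an extension method, so the System.Xml.XPath namespace has to be imported. The class name PathLookup, the Root wrapper, and the element values are invented for the example.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath; // XPathSelectElement is an extension method defined here

public static class PathLookup
{
    // Returns the text of the element at the given relative path, or null if it is absent.
    public static string ValueAt(XElement parent, string path)
    {
        XElement element = parent.XPathSelectElement(path);
        return element == null ? null : element.Value;
    }

    public static void Main()
    {
        // Hypothetical data shaped like the structure described in the question.
        XElement myParent = XElement.Parse(
            "<Root>" +
            "<FieldOutputs><Capacity>10</Capacity></FieldOutputs>" +
            "<EngOutputs><Performance>0.9</Performance>" +
            "<Unit><Efficiency>0.8</Efficiency></Unit></EngOutputs>" +
            "</Root>");

        string[] paths = { "FieldOutputs/Capacity", "EngOutputs/Performance",
                           "EngOutputs/Unit/Efficiency", "EngOutputs/Missing" };

        foreach (string path in paths)
        {
            Console.WriteLine(path + " = " + (ValueAt(myParent, path) ?? "(not found)"));
        }
    }
}
```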
The Descendants() method is what you're looking for, I believe. For example:
var descendants = myParent.Descendants();
foreach (var e in descendants) {
...
}
http://msdn.microsoft.com/en-us/library/system.xml.linq.xelement.descendants.aspx
Edit:
Looking at your question more closely, it looks like you may want to use XPathSelectElements():
var descendants = myParent.XPathSelectElements("./FieldOutputs/Capacity | ./EngOutputs/Performance | ./EngOutputs/Unit/Efficiency");
http://msdn.microsoft.com/en-us/library/bb351355.aspx

Can you improve the performance of this linq-to-xml method?

I really need to be somewhere else this morning. So, I have decided to post a performance question here instead.
The code below works, but it calls the Load and Save methods multiple times. This seems far from efficient. Please could someone provide the code so that the load and save lines occur outside the loop? I wish to call load and save only once.
Thanks chaps :)
public void RemoveNodes(IList<String> removeItems)
{
foreach (String removeItem in removeItems)
{
XDocument document = XDocument.Load(fullFilePath);
var results = from item in document.Descendants(elementName)
let attr = item.Attribute(attributeName)
where attr != null && attr.Value == removeItem.ToString()
select item;
results.ToList().ForEach(item => item.Remove());
document.Save(fullFilePath);
}
}
You've already given the answer yourself - just move the Load and Save calls outside the loop. It's not clear to me where you were having problems implementing that yourself...
You can make your query slightly simpler too though:
XDocument document = XDocument.Load(fullFilePath);
foreach (String removeItem in removeItems)
{
var results = from item in document.Descendants(elementName)
where (string) item.Attribute(attributeName) == removeItem
select item;
results.ToList().ForEach(item => item.Remove());
}
document.Save(fullFilePath);
This uses the fact that the conversion from XAttribute to string returns null if the attribute reference itself is null.
You don't even need to use a query expression:
var results = document.Descendants(elementName)
.Where(item => (string) item.Attribute(attributeName) == removeItem);
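Put together, a runnable sketch of the restructured method could look like the following. To keep it testable, the removal works on an XDocument that the caller loads and saves exactly once. The class name NodeRemover, the parameter list, the sample XML, and the temp file are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class NodeRemover
{
    // Removes every element named elementName whose attribute equals one of
    // removeItems, and returns how many elements were removed.
    public static int RemoveNodes(XDocument document, string elementName,
                                  string attributeName, IList<string> removeItems)
    {
        int removed = 0;
        foreach (string removeItem in removeItems)
        {
            var matches = document.Descendants(elementName)
                .Where(item => (string)item.Attribute(attributeName) == removeItem)
                .ToList(); // materialize first: removing while enumerating is unsafe
            matches.ForEach(item => item.Remove());
            removed += matches.Count;
        }
        return removed;
    }

    public static void Main()
    {
        // Hypothetical file, created only for this demonstration.
        string fullFilePath = Path.GetTempFileName();
        File.WriteAllText(fullFilePath,
            "<items><item name='a'/><item name='b'/><item name='c'/><item/></items>");

        XDocument document = XDocument.Load(fullFilePath); // load once
        int removed = RemoveNodes(document, "item", "name", new List<string> { "a", "c" });
        document.Save(fullFilePath);                       // save once

        Console.WriteLine(removed);                                                  // 2
        Console.WriteLine(XDocument.Load(fullFilePath).Descendants("item").Count()); // 2
        File.Delete(fullFilePath);
    }
}
```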

How to access a particular data in LINQ query result?

I know, this is very simple for you guys.
Please consider the following code:
string[] str = { "dataReader", "dataTable", "gridView", "textBox", "bool" };
var s = from n in str
where n.StartsWith("data")
select n;
foreach (var x in s)
{
Console.WriteLine(x.ToString());
}
Console.ReadLine();
Supposedly, it will print:
dataReader
dataTable
right?
What if for example I don't know the data, and what the results of the query will be (but I'm sure it will return some results) and I just want to print the second item that will be produced by the query, what should my code be instead of using foreach?
Is there something like array-indexing here?
You're looking for Enumerable.ElementAt.
var secondMatch = str.Where(item => item.StartsWith("data")) //consider null-test
.ElementAt(1);
Console.WriteLine(secondMatch); //ToString() is redundant
Since Where streams its results, this will be efficient: enumeration of the source sequence will be discontinued after the second match (the one you're interested in) has been found.
If you find that the implicit guarantee you have that the source will contain two matches is not valid, you can use ElementAtOrDefault.
var secondMatch = str.Where(item => item.StartsWith("data"))
.ElementAtOrDefault(1);
if(secondMatch == null) // because default(string) == null
{
// There are no matches or just a single match..
}
else
{
// Second match found..
}
You could use array-indexing here as you say, but only after you load the results into... an array. This will of course mean that the entire source sequence has to be enumerated and the matches loaded into the array, so it's a bit of a waste if you are only interested in the second match.
var secondMatch = str.Where(item => item.StartsWith("data"))
.ToArray()[1]; //ElementAt will work too
You have a few options:
s.Skip(1).First();
s.ElementAt(1);
The first is better suited to scenarios where you want X elements after skipping the first Y elements. The second is clearer when you just need a single element at a specific position.
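Here is a small sketch comparing the options mentioned in the answers, using the array from the question. The class name SecondMatch and the helper Second are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SecondMatch
{
    // Returns the second item starting with prefix, or null when there are fewer than two.
    public static string Second(IEnumerable<string> source, string prefix)
    {
        return source.Where(item => item.StartsWith(prefix)).ElementAtOrDefault(1);
    }

    public static void Main()
    {
        string[] str = { "dataReader", "dataTable", "gridView", "textBox", "bool" };
        var s = str.Where(n => n.StartsWith("data"));

        Console.WriteLine(s.ElementAt(1));              // dataTable
        Console.WriteLine(s.Skip(1).First());           // dataTable
        Console.WriteLine(Second(str, "data"));         // dataTable
        Console.WriteLine(Second(str, "grid") == null); // True: only one match
    }
}
```

ElementAt and First throw when there is no second match, so the OrDefault variant is the safer choice when that guarantee is in doubt.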
