I am using the HTMLAgilityPack to parse through some HTML. I am getting the result set that I expect when using an XPath query combined with a LINQ query. Is there a way to combine them both into a single LINQ query?
var test = doc.DocumentNode.SelectNodes("//div[@class='product']");
foreach (var item in test)
{
var result = from input in item.Descendants("span")
where input.Attributes["class"].Value == "Bold"
where input.InnerHtml.ToUpper() == "GETITEM" // compare against upper case, since ToUpper() never yields "GetItem"
select input;
return result;
}
If you'd want to gather all the spans together (if I'm correct in assuming that's what you want)...
I'd first convert it to a more fluent notation (I find SelectMany much easier to grasp that way - but that's just me)
(disclaimer: I'm writing this from memory, copy/pasting your code, and I'm not at VS at the moment, so you'd need to check it and make it right if there are any issues, but I think I got it more or less OK)
var test = doc.DocumentNode.SelectNodes("//div[@class='product']");
foreach(var item in test)
item.Descendants("span").Where(input => input.Attributes["class"].Value == "Bold").Where(input => input.InnerHtml.ToUpper() == "GETITEM").Select(input => input);
and finally...
var allSpans = doc.DocumentNode.SelectNodes("//div[@class='product']")
.SelectMany(item => item.Descendants("span").Where(input => input.Attributes["class"].Value == "Bold").Where(input => input.InnerHtml.ToUpper() == "GETITEM"));
...or along those lines
Just wanted to show you the other way to do SelectMany in LINQ. This is a style choice, and many people here on SO would prefer the .SelectMany extension method, because they can see how the monad is applied to the IEnumerable. I prefer this method since it's much closer to how a functional programming model would do it.
return from product in doc.DocumentNode.SelectNodes("//div[@class='product']")
from input in product.Descendants("span")
where input.Attributes["class"].Value == "Bold"
where input.InnerHtml.ToUpper() == "GETITEM"
select input;
return (
from result in
from item in doc.DocumentNode.SelectNodes("//div[@class='product']")
from input in item.Descendants("span")
where input.Attributes["class"].Value == "Bold"
where input.InnerHtml.ToUpper() == "GETITEM"
select input
select result
).First();
If you really wanted it all in one query you could with something like this:
var result = doc.DocumentNode.SelectNodes("//div[@class='product']")
.SelectMany(e => e.Descendants("span")
.Where(x => x.Attributes["class"].Value == "Bold" &&
x.InnerHtml.ToUpper() == "GETITEM"))
.ToList();
return result;
I would recommend spacing it out a bit though for the sake of readability, something more like this:
var nodes = doc.DocumentNode.SelectNodes("//div[@class='product']");
// Keep the query result; calling SelectMany without assigning it discards the matches.
var result = nodes.SelectMany(e => e.Descendants("span")
.Where(x => x.Attributes["class"].Value == "Bold" &&
x.InnerHtml.ToUpper() == "GETITEM"));
return result.ToList();
The SelectMany() method will flatten the results of the inner queries into a single IEnumerable<T>.
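To make the flattening concrete without depending on HtmlAgilityPack, here is a minimal sketch on in-memory data. The tuple array `products` is a hypothetical stand-in for the parsed div/span nodes, and `ToUpperInvariant()` is swapped in for `ToUpper()` only to keep the output culture-independent. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the parsed HTML: each "product" holds (Class, Text) spans.
var products = new List<(string Class, string Text)[]>
{
    new (string Class, string Text)[] { ("Bold", "GetItem"), ("Plain", "GetItem") },
    new (string Class, string Text)[] { ("Bold", "Other"), ("Bold", "getitem") },
};

// Method syntax: SelectMany flattens the per-product matches into one sequence.
var viaMethods = products
    .SelectMany(spans => spans.Where(s => s.Class == "Bold" &&
                                          s.Text.ToUpperInvariant() == "GETITEM"))
    .ToList();

// Query syntax: the compiler translates the second "from" clause into a SelectMany call.
var viaQuery = (from spans in products
                from s in spans
                where s.Class == "Bold"
                where s.Text.ToUpperInvariant() == "GETITEM"
                select s).ToList();

Console.WriteLine(viaMethods.Count);                   // 2
Console.WriteLine(viaMethods.SequenceEqual(viaQuery)); // True
```

Both forms yield one flat list of matching spans, so choosing between them is a matter of readability.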
So I know the query notation
var word = from s in stringList
where s.Length == 3
select s;
is equivalent to the dot notation
var word = stringList
.Where(s => s.Length == 3)
.Select(s => s);
But how do you convert this dot notation to a query notation?
var word = wordsList
.Single(p => p.Id == savedId);
I couldn't find many resources on Google.
You can't. A lot of LINQ functions can't be used in the query syntax. At best, you can combine both and do something like
var word = (from p in wordsList
where p.Id == savedId
select p).Single();
but in the simple case of collection.Single(condition), the "dot notation" seems more readable to me.
There is a list of keywords used by LINQ on MSDN; you can see from that list which functions are integrated into the language.
Single doesn't have an exact equivalent in query notation.
The best you can do is to wrap your query in parentheses and call .Single yourself.
The only thing you can do is:
var word = (from w in wordsList
where w.Id == savedId
select w).Single();
but it will not be exactly the same. It will be transformed into
var word = wordsList
.Where(p => p.Id == savedId)
.Single();
Single is not part of the query notation. You could change your first example to the following to achieve what you want:
var word = (from s in stringList where s.Length == 3 select s).Single();
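As a quick check of the answers above, this sketch compares the two forms on made-up data. The `wordsList` contents and `savedId` value are hypothetical, and the code assumes C# 9 or later.

```csharp
using System;
using System.Linq;

// Hypothetical data: (Id, Text) pairs standing in for the question's wordsList.
var wordsList = new[] { (Id: 1, Text: "cat"), (Id: 2, Text: "dog"), (Id: 3, Text: "owl") };
int savedId = 2;

// Dot notation with a predicate.
var viaDot = wordsList.Single(p => p.Id == savedId);

// Query notation for the filter, then a method call for Single.
var viaQuery = (from p in wordsList
                where p.Id == savedId
                select p).Single();

Console.WriteLine(viaDot.Text);        // dog
Console.WriteLine(viaDot == viaQuery); // True

// Single throws InvalidOperationException when more than one element matches.
bool threw = false;
try { wordsList.Single(p => p.Id > 1); }
catch (InvalidOperationException) { threw = true; }
Console.WriteLine(threw);              // True
```

Both forms return the same element; the query form just filters with Where first and then calls the parameterless Single.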
I'm trying to debug code that a fellow developer wrote and LINQ expressions are making the task painful. I don't know how to debug around complicated LINQ expressions, so can anyone tell me what the equivalent code is without them?
instanceIdList.AddRange(
strname.Instances
.Where(z => instancehealthList.Find(y => y.InstanceId == z.InstanceId
&& y.State == "InService") != null)
.Select(x => x.InstanceId)
.ToList()
.Select(instanceid => new ServerObj(servertype, instanceid))
);
Also is this well written? In general is this sort of LINQ encouraged or frowned upon?
Refactoring the query using loops would look something like this:
var serverObjList = new List<ServerObj>();
foreach (var inst in strname.Instances)
{
foreach (var health in instancehealthList)
{
if (inst.InstanceId == health.InstanceId && health.State == "InService")
{
serverObjList.Add(new ServerObj(servertype, inst.InstanceId));
break; // stop at the first matching health record, like Find
}
}
}
instanceIdList.AddRange(serverObjList);
Rather than rewrite it to a series of foreach loops, you could eagerly-execute the expression after each operation, allowing you to inspect the data-set at intermediate steps, like so:
var soFar = strname.Instances.Where(z => instancehealthList.Find(y => y.InstanceId == z.InstanceId && y.State == "InService") != null).ToList();
var soFar2 = soFar.Select(x => x.InstanceId).ToList();
List<ServerObj> soFar3 = soFar2.Select(instanceId => new ServerObj(servertype, instanceId)).ToList();
instanceIdList.AddRange( soFar3 );
Of course, I feel this Linq isn't well-written.
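One possible tidier shape, offered only as a sketch: collect the in-service IDs once and filter against that set, which avoids the nested Find and the mid-chain ToList(). The string IDs and tuple records below are hypothetical stand-ins for the question's types, and the code assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for strname.Instances and instancehealthList.
var instances = new[] { "i-1", "i-2", "i-3" };
var instanceHealthList = new[]
{
    (InstanceId: "i-1", State: "InService"),
    (InstanceId: "i-2", State: "OutOfService"),
    (InstanceId: "i-3", State: "InService"),
};

// Collect the healthy IDs once, so each lookup is a hash probe rather than a list scan.
var inService = new HashSet<string>(
    instanceHealthList.Where(h => h.State == "InService").Select(h => h.InstanceId));

// Filter in a single pass; a projection to ServerObj could follow with Select.
var healthyIds = instances.Where(id => inService.Contains(id)).ToList();

Console.WriteLine(string.Join(", ", healthyIds)); // i-1, i-3
```

Each step is a named local, so a debugger can inspect `inService` and `healthyIds` separately.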
I am using the query below to grab all records that have SubCategoryName == subCatName, and I want to return all of their ProductIDs as a list of ints. The problem is that when my code runs it returns only 1 int (record) instead of all of them. How can I make it return all of the records that have that subCatName? It returns a list with Count = 1 and Capacity = 4. So it is an int[4], but only the first element [0] holds an actual product ID; the rest are zero?
public List<int> GetPRodSubCats(string subCatName)
{
var _db = new ProductContext();
if (subCatName != null)
{
var query = _db.ProductSubCat
.Where(x => x.SubCategory.SubCategoryName == subCatName)
.Select(p => p.ProductID);
return query.ToList<int>();
}
return null;
}
As Daniel already has mentioned, the code should work. But maybe you are expecting that it's case-insensitive or ignores white-spaces. So this is more tolerant:
subCatName = subCatName.Trim();
List<int> productIDs = _db.ProductSubCat
.Where(x => String.Equals(x.SubCategory.SubCategoryName.Trim(), subCatName, StringComparison.OrdinalIgnoreCase))
.Select(p => p.ProductID)
.ToList();
This seems more like expected behavior here. How do you know that you don't have only 1 record that satisfies the Where predicate?
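The "int[4] with only one real value" the debugger shows is consistent with how List<T> works: Count is the number of stored elements, while Capacity is the size of the internal array. A small sketch (the ID value 42 is made up; assumes C# 9 or later):

```csharp
using System;
using System.Collections.Generic;

var ids = new List<int>();
ids.Add(42); // hypothetical product ID

// Count is the number of stored elements; Capacity is the size of the backing array,
// which grows ahead of need, so the unused slots show up as zeros in the debugger.
Console.WriteLine(ids.Count);                 // 1
Console.WriteLine(ids.Capacity >= ids.Count); // True
```

So a list with Count = 1 really does hold a single matching record, whatever its Capacity.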
Your code is correct, however you might want to normalize your comparison.
x => x.SubCategory.SubCategoryName == subCatName
to use a specific case for instance:
x => x.SubCategory.SubCategoryName.ToLower() == subCatName.ToLower()
you might also consider a Trim.
When I execute these lines
drpdf["meno"] = matches.Cast<Match>().Where(c => c.Groups["ID"].Value == i.ToString()).Select(c => c.Groups["meno"].Value);
drpdf["info"] = matches.Cast<Match>().Where(c => c.Groups["ID"].Value == i.ToString()).Select(c => Regex.Replace(c.Groups["zvysok"].Value, @"^,\s?", string.Empty));
the DataRow does not get the value that I want; instead it gets
System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Text.RegularExpressions.Match,System.String]
Can you please help me select/cast the return value to a readable type? Thanks anyway. Ondro
Your LINQ queries use Select, so you get an IEnumerable<T> back. If you want the result of your LINQ query, and are expecting exactly one result, add .Single():
drpdf["meno"] = matches.Cast<Match>()
.Where(c => c.Groups["ID"].Value == i.ToString())
.Select(c => c.Groups["meno"].Value)
.Single();
On the other hand, if your query can have multiple results, you should use .First() instead to take the first result. At that point, however, it depends what your scenario is and what you're trying to capture.
Something like:
matches.Cast<Match>()
.Where(c => c.Groups["ID"].Value == i.ToString())
.Select(c => c.Groups["meno"].Value)
.FirstOrDefault(); // this expression will evaluate the linq
// expression, so you get the string you want
Please note: you should only use FirstOrDefault or SingleOrDefault if null is actually a valid value in your context (as noted by @Daniel Hilgarth).
If null is not a valid result and instead you want an empty string, append a ?? String.Empty to the expression:
matches
...
.FirstOrDefault() ?? String.Empty;
The results of your queries are enumerable objects. Calling ToString() on these doesn't give you a meaningful string representation, as you have already noticed. You need to generate a string appropriate for display.
If you simply want to display the contents as a comma-separated list, you can use String.Join() to do this:
var menos = matches.Cast<Match>()
.Where(c => c.Groups["ID"].Value == i.ToString())
.Select(c => c.Groups["meno"].Value);
drpdf["meno"] = String.Join(", ", menos);
Otherwise if you intended to select a single result, use Single() to select that single string result.
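The options above can be compared side by side in a small sketch. The input string and regex pattern are invented for illustration; only the group names ID and meno come from the question. Assumes C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input and pattern; the group names mirror the question's.
var matches = Regex.Matches("1:Anna;2:Boris;1:Cyril", @"(?<ID>\d+):(?<meno>\w+)");
int i = 1;

// Deferred query: assigning this directly would store the iterator object, not a string.
var menos = matches.Cast<Match>()
    .Where(c => c.Groups["ID"].Value == i.ToString())
    .Select(c => c.Groups["meno"].Value);

string joined = string.Join(", ", menos);               // all matches as one string
string first = menos.FirstOrDefault() ?? string.Empty;  // first match, or empty when none

Console.WriteLine(joined); // Anna, Cyril
Console.WriteLine(first);  // Anna
```

Here Single() would throw, because two matches share ID 1; which terminal call fits depends on how many matches are expected.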
I have two queries, and I'm using the result of the first one in the second one, like this:
var temp = (ObjectTable.Where(o => o.Category == "Y"));
var anonymousObjList = temp.Select(o => new {o, IsMax = (o.Value == temp.Max(x => x.Value))});
Is there a way to combine these into one query?
EDIT:
I cannot just chain them directly because I'm using temp.Max() in the second query.
Why? It would be clearer (and more efficient) to make it three:
var temp = (ObjectTable.Where(o => o.Category == "Y"));
int max = temp.Max(x => x.Value);
var anonymousObjList = temp.Select(o => new {o, IsMax = (o.Value == max)});
You can do it in one statement using query syntax and the let keyword. It only evaluates the 'max' once, so it is just like the three separate statements, just in one line.
var anonymousObjList = from o in ObjectTable
where o.Category == "Y"
let max = ObjectTable.Max(m => m.Value)
select new { o, IsMax = (o.Value == max) };
This is the only time I ever use query syntax. You can't do this using method syntax!
edit: ReSharper suggests
var anonymousObjList = ObjectTable.Where(o => o.Category == "Y")
.Select(o => new {o, max = ObjectTable.Max(m => m.Value)})
.Select(@t => new {@t.o, IsMax = (@t.o.Value == @t.max)});
however this is not optimal. The first Select is projecting a max Property for each item in ObjectTable - the Max function will be evaluated for every item. If you use query syntax it's only evaluated once.
Again, you can only do this with query syntax. I'm not fan of query syntax but this makes it worthwhile, and is the only case in which I use it. ReSharper is wrong.
Possibly the most straightforward refactoring is to replace all instances of "temp" with the value of temp. Since it appears that this value is immutable, the refactoring should be valid (yet ugly):
var anonymousObjList = ObjectTable.Where(o => o.Category == "Y")
.Select(o => new {o, IsMax = (o.Value == ObjectTable.Where(p => p.Category == "Y").Max(x => x.Value))}); // inner parameter renamed to p, since older compilers reject reusing o
As has already been pointed out, this query really has no advantages over the original, since queries use deferred execution and can be built up. I would actually suggest splitting the query even more:
var temp = (ObjectTable.Where(o => o.Category == "Y"));
var maxValue = temp.Max(x => x.Value);
var anonymousObjList = temp.Select(o => new {o, IsMax = (o.Value == maxValue)});
This is better than the original because every call to "Max" causes another iteration over the entire dataset. Since it was called inside the Select of the original, Max was being called n times. That makes the original O(n^2)!
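The difference can be observed with LINQ to Objects by counting how often the Max selector runs. This is a rough sketch on a made-up int array rather than the question's ObjectTable, and it assumes C# 9 or later; a database-backed provider may translate the query differently.

```csharp
using System;
using System.Linq;

// Hypothetical data standing in for the filtered ObjectTable values.
var values = new[] { 3, 9, 4, 9, 1 };
int selectorCalls = 0;

// Max inside the projection: re-evaluated for every element.
var nested = values
    .Select(v => new { v, IsMax = v == values.Max(x => { selectorCalls++; return x; }) })
    .ToList();
int nestedCalls = selectorCalls;

// Max hoisted into a local: evaluated once.
selectorCalls = 0;
int max = values.Max(x => { selectorCalls++; return x; });
var hoisted = values.Select(v => new { v, IsMax = v == max }).ToList();
int hoistedCalls = selectorCalls;

Console.WriteLine($"{nestedCalls} vs {hoistedCalls}"); // 25 vs 5
```

With five elements the nested form runs the selector 5 x 5 = 25 times, against 5 for the hoisted form, and both mark the same two elements as the maximum.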