Can you improve the performance of this linq-to-xml method? - c#

I really need to be somewhere else this morning. So, I have decided to post a performance question here instead.
The code below works, but it calls the Load and Save methods once per item, which seems far from efficient. Could someone please provide the code so that the Load and Save lines occur outside the loop? I wish to call Load and Save only once.
Thanks chaps :)
public void RemoveNodes(IList<String> removeItems)
{
foreach (String removeItem in removeItems)
{
XDocument document = XDocument.Load(fullFilePath);
var results = from item in document.Descendants(elementName)
let attr = item.Attribute(attributeName)
where attr != null && attr.Value == removeItem.ToString()
select item;
results.ToList().ForEach(item => item.Remove());
document.Save(fullFilePath);
}
}

You've already given the answer yourself - just move the Load and Save calls outside the loop. It's not clear to me where you were having problems implementing that yourself...
You can make your query slightly simpler too though:
XDocument document = XDocument.Load(fullFilePath);
foreach (String removeItem in removeItems)
{
var results = from item in document.Descendants(elementName)
where (string) item.Attribute(attributeName) == removeItem
select item;
results.ToList().ForEach(item => item.Remove());
}
document.Save(fullFilePath);
This uses the fact that the conversion from XAttribute to string returns null if the attribute reference itself is null.
You don't even need to use a query expression:
var results = document.Descendants(elementName)
.Where(item => (string) item.Attribute(attributeName) == removeItem);
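Putting the pieces of this answer together, a minimal sketch of the whole method might look like the following. It is an illustration, not the asker's final code: the NodeRemover class, the parameter list, and the HashSet lookup are assumptions added here. It relies on the Remove extension method from System.Xml.Linq, which takes its own snapshot of the matching nodes before removing them, so no ToList call is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NodeRemover
{
    // Removes every <elementName> element whose attributeName value is in removeItems.
    // In the original method, XDocument.Load would run once before this call
    // and document.Save once after it.
    public static void RemoveNodes(XDocument document, string elementName,
                                   string attributeName, IEnumerable<string> removeItems)
    {
        // Hypothetical optimisation: one set lookup per element instead of one pass per item.
        var wanted = new HashSet<string>(removeItems);

        document.Descendants(elementName)
                .Where(item =>
                {
                    // The cast yields null when the attribute is missing.
                    var value = (string)item.Attribute(attributeName);
                    return value != null && wanted.Contains(value);
                })
                .Remove();
    }

    public static void Main()
    {
        var document = XDocument.Parse(
            "<root><item id='a'/><item id='b'/><item/><item id='c'/></root>");
        RemoveNodes(document, "item", "id", new[] { "a", "c" });
        Console.WriteLine(document.Root.Elements("item").Count()); // 2
    }
}
```

With this shape the document is traversed once regardless of how many items are to be removed.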

Related

Querying a chain of list of lists with LINQ

I am working with an XML standard called SDMX. It's fairly complicated but I'll make it as short as possible. I am receiving an object called CategoryScheme. This object can contain a number of Category, and each Category can contain more Category, and so on, the chain can be infinite. Every Category has an unique ID.
Usually each Category contains a lot of Categories. Together with this object I am receiving an Array, that contains the list of IDs that indicates where a specific Category is nested, and then I am receiving the ID of that category.
What I need to do is to create an object that maintains the hierarchy of the Category objects, but each Category must have only one child and that child has to be the one of the tree that leads to the specific Category.
So I had an idea, but in order to do this I should generate LINQ queries inside a cycle, and I have no clue how to do this. More information of what I wanted to try is commented inside the code
Let's go to the code:
public void RemoveCategory(ArtefactIdentity ArtIdentity, string CategoryID, string CategoryTree)
{
try
{
WSModel wsModel = new WSModel();
// Prepare Art Identity and Array
ArtIdentity.Version = ArtIdentity.Version.Replace("_", ".");
var CatTree = JArray.Parse(CategoryTree).Reverse();
// Get Category Scheme
ISdmxObjects SdmxObj = wsModel.GetCategoryScheme(ArtIdentity, false, false);
ICategorySchemeMutableObject CatSchemeObj = SdmxObj.CategorySchemes.FirstOrDefault().MutableInstance;
foreach (var Cat in CatTree)
{
// The cycle should work like this.
// At every iteration it must delete all the elements except the correct one
// and on the next iteration it must delete all the elements of the previously selected element
// At the end, I need to have the CatSchemeObj full of the all chains of categories.
// Iteration 1...
//CatSchemeObj.Items.ToList().RemoveAll(x => x.Id != Cat.ToString());
// Iteration 2...
//CatSchemeObj.Items.ToList().SingleOrDefault().Items.ToList().RemoveAll(x => x.Id != Cat.ToString());
// Iteration 3...
//CatSchemeObj.Items.ToList().SingleOrDefault().Items.ToList().SingleOrDefault().Items.ToList().RemoveAll(x => x.Id != Cat.ToString());
// Etc...
}
}
catch (Exception ex)
{
throw ex;
}
}
Thank you for your help.
So, as I already said in my comment, building a recursive function should fix the issue. If you're new to it, you can find some basic information about recursion in C# here.
The method could look something like this:
private void DeleteRecursively(int currentRecursionLevel, string[] catTree, ICategorySchemeMutableObject catSchemeObj)
{
catSchemeObj.Items.ToList().RemoveAll(x => x.Id != catTree[currentRecursionLevel].ToString());
var leftoverObject = catSchemeObj.Items.ToList().SingleOrDefault();
if(leftoverObject != null) DeleteRecursively(++currentRecursionLevel, catTree, leftoverObject);
}
Afterwards you can call this method in your main method, instead of the loop:
DeleteRecursively(0, CatTree, CatSchemeObj);
But as I also said, keep in mind that calling the method inside the loop makes no sense to me: the first call already clears the tree down to the one leftover path, so calling it again with the same tree but another category will leave an empty tree in CatSchemeObj.
CAUTION! Another thing I noticed just now: calling ToList on your Items property and then deleting entries will NOT affect your source object, because ToList creates a new list. The new list holds references to the same original objects, but a removal only affects the new list. So you must either write the resulting list back to your Items property or find a way to delete directly in the Items collection. (If Items is exposed as an IEnumerable rather than a concrete collection type, you have to write it back.)
Just try it out with this simple example, and you will see that the original list is not modified.
IEnumerable<int> test = new List<int>() { 1, 2, 3, 4 , 1 };
test.ToList().RemoveAll(a => a != 1);
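To make the effect visible, here is a slightly longer sketch of the same experiment that prints the counts. The ToListCopyDemo class and its method name are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListCopyDemo
{
    // Removes from a ToList() copy and returns that copy; source is left as it was.
    public static List<int> KeepOnesInCopy(List<int> source)
    {
        List<int> copy = source.ToList();
        copy.RemoveAll(a => a != 1);
        return copy;
    }

    public static void Main()
    {
        var source = new List<int> { 1, 2, 3, 4, 1 };

        var copy = KeepOnesInCopy(source);
        Console.WriteLine(source.Count); // 5, the original list is untouched
        Console.WriteLine(copy.Count);   // 2, only the copy shrank

        // Calling RemoveAll on the list itself does change it.
        source.RemoveAll(a => a != 1);
        Console.WriteLine(source.Count); // 2
    }
}
```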
Edited:
So here is another possible approach, following the discussion below.
I'm not sure what you really need, so just try it out.
int counter = 0;
// Materialize the IDs once so they can be indexed
var ids = CatTree.Select(c => c.ToString()).ToList();
var list = CatSchemeObj.Items.ToList();
// Walk down one level per ID; stop when the IDs or the tree run out
while (counter < ids.Count && list.Count > 0)
{
    var id = ids[counter++];
    var match = list.SingleOrDefault(x => x.Id == id); // or != ? play with it
    if (match == null)
    {
        break;
    }
    list = match.Items.ToList();
}
I just translated your problem into 2 solutions, but I am not sure you won't run into trouble because of the SingleOrDefault call. It returns the only item, or the default value if there is none, and throws an InvalidOperationException if there is more than one. I know you said only one item is left at that point, but still... :)
Let me know in a comment whether this worked for you or not.
//solution 1
// inside of this loop check each child list if empty or not
foreach (var Cat in CatTree)
{
    // Work on Items itself (assumed to be an IList); ToList() would only change a copy
    var list = CatSchemeObj.Items;
    while (list != null && list.Count > 0)
    {
        // iterate backwards so that removing by index is safe
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (list[i].Id != Cat.ToString()) list.RemoveAt(i);
        }
        // at most one item is left if the IDs are unique
        var remaining = list.SingleOrDefault();
        list = remaining == null ? null : remaining.Items;
    }
}
//solution 2
foreach (var Cat in CatTree)
{
    CleanTheCat(Cat.ToString(), CatSchemeObj.Items);
}
//use this recursive function outside of the loop because it will call itself
// IList<ICategoryMutableObject> is an assumption; place here whatever type your Items has
void CleanTheCat(string cat, IList<ICategoryMutableObject> items)
{
    if (items == null || items.Count == 0) return;
    // iterate backwards so that removing by index is safe
    for (int i = items.Count - 1; i >= 0; i--)
    {
        if (items[i].Id != cat) items.RemoveAt(i);
    }
    var catObj = items.SingleOrDefault();
    if (catObj != null)
    {
        CleanTheCat(cat, catObj.Items);
    }
}
Thank you to whoever tried to help but I solved it by myself in a much easier way.
I just sent the full CategoryScheme object to the method that converts it to XML; after that, one line did the trick:
XmlDocument.Descendants("Category").Where(x => !CatList.Contains(x.Attribute("id").Value)).RemoveIfExists();

How to compare ItemElements of a Radcombobox with an expected string?

I need to check whether a RadComboBox has ItemElements that match my expected string. Here is what I'm trying to do:
foreach (IRadComboBoxItem item in comboBox.ItemElements)
{
var itemExists = comboBox.ItemElements.FirstOrDefault(items => item.Text.Contains(expectedString));
if (itemExists == null) continue;
itemExists.Select();
return true;
}
However comboBox.Text.Contains(expectedString) is not supported as I'm comparing IRadComboBoxItem with a string. Could you please suggest how to achieve this?
Use the LINQ method Any:
return comboBox.ItemElements.Any(item => item.Text.Contains(expectedString));
Your code above mixes up several things:
FirstOrDefault returns the first item in a collection that matches a predicate, or default(T) if there is none.
If the result is not null you call Select() on it, but you do nothing with whatever that call returns.
The lambda parameter is named items, but the predicate tests the outer loop variable item, so on each pass every element is judged by the current loop item rather than by itself. You don't need the foreach loop at all, because the LINQ methods do the looping for you behind the scenes.
Following your comment, what you want is:
var wantedItem = comboBox.ItemElements.FirstOrDefault(item => item.Text.Contains(expectedString));
if(wantedItem != null)
{
//What you want to do with item
}
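As a self-contained illustration of the two calls, here is a sketch that uses a made-up ComboItem class in place of IRadComboBoxItem. Only the Text property and a Select method are modelled, and both are assumptions about the real interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IRadComboBoxItem.
public class ComboItem
{
    public string Text { get; set; }
    public bool Selected { get; private set; }
    public void Select() { Selected = true; }
}

public static class ComboSearch
{
    // Selects the first item whose text contains expected; returns false if none does.
    public static bool SelectMatching(IEnumerable<ComboItem> items, string expected)
    {
        var match = items.FirstOrDefault(i => i.Text.Contains(expected));
        if (match == null) return false;
        match.Select();
        return true;
    }

    public static void Main()
    {
        var items = new List<ComboItem>
        {
            new ComboItem { Text = "Red apple" },
            new ComboItem { Text = "Green pear" }
        };
        Console.WriteLine(items.Any(i => i.Text.Contains("pear"))); // True
        Console.WriteLine(SelectMatching(items, "pear"));           // True
        Console.WriteLine(items[1].Selected);                       // True
        Console.WriteLine(SelectMatching(items, "plum"));           // False
    }
}
```

Any is enough when only a yes/no answer is needed; FirstOrDefault is the one to use when the matching item itself is needed afterwards.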
I haven't worked with RadComboBox myself, but according to this site the following may work:
RadComboBoxItem item = comboBox.FindItemByText(expectedString);
I assume it returns null if it doesn't find a match.

Better way to use LINQ To XML for an HTML Page

I am looking for specific items on a web page.
What I did (to test, so far) is working just fine, but is really ugly to my eyes. I would like to get suggestions to do this in a more concise manner, that is ONE Linq query instead of 2 now....
document.GetXDocument();
string xmlns = "{http://www.w3.org/1999/xhtml}";
var AllElements = from AnyElement in document.fullPage.Descendants(xmlns + "div")
where AnyElement.Attribute("id") != null && AnyElement.Attribute("id").Value == "maincolumn"
select AnyElement;
// this first query bring only one LARGE Element.
XDocument subdocument = new XDocument(AllElements);
var myElements = from item in subdocument.Descendants(xmlns + "img")
where String.IsNullOrEmpty(item.Attribute("src").Value.Trim()) != true
select item;
foreach (var element in myElements)
{
Console.WriteLine(element.Attribute("src").Value.Trim());
}
Assert.IsNotNull(myElements.Count());
I know I could directly look for "img", but I want to be able to get other types of items in those pages, like links and some text.
I strongly doubt this is the best way!
The same logic in single query:
var myElements = from element in document.fullPage.Descendants(xmlns + "div")
where element.Attribute("id") != null
&& element.Attribute("id").Value == "maincolumn"
from item in new XDocument(element).Descendants(xmlns + "img")
where !String.IsNullOrEmpty(item.Attribute("src").Value.Trim())
select item;
If you insist on parsing the web page as XML, try this:
var elements =
from element in document.Descendants(xmlns + "div")
where (string)element.Attribute("id") == "maincolumn"
from element2 in element.Descendants(xmlns + "img")
let src = ((string)element2.Attribute("src") ?? "").Trim() // a missing src becomes ""
where !String.IsNullOrEmpty(src)
select new {
element2,
src
};
foreach (var item in elements) {
Console.WriteLine(item.src);
}
Notes:
What is the type of document? I am assuming it's an XDocument. If that is the case, you can call Descendants directly on the XDocument. (OTOH, if document is an XDocument, where does that fullPage property come from?)
Cast the XAttribute to a string. If the attribute is missing, the result of the cast will be null. This saves the double check. (It doesn't offer any performance benefit.)
Use let to "save" a value for later reuse, in this case for use in the foreach. If all you need is that final Assert, it might be more efficient to use Any instead of Count: Any only has to look at the first result in order to return a value, while Count has to iterate over all of them.
Why is subdocument of type XDocument? Wouldn't XElement be the appropriate type?
You can also use String.IsNullOrWhiteSpace (available from .NET 4) to check for whitespace in src instead of trimming and calling String.IsNullOrEmpty, assuming you want to process the src as is, with any whitespace it might have.
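The notes above can be tried out with a small runnable sketch. The ImageSources class, the InDiv method, and the sample markup are invented for illustration, and the page is assumed to be well-formed XHTML held in an XDocument.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ImageSources
{
    static readonly XNamespace Xhtml = "http://www.w3.org/1999/xhtml";

    // Returns the trimmed, non-empty src values of every img inside the div with the given id.
    public static List<string> InDiv(XDocument page, string divId)
    {
        return (from div in page.Descendants(Xhtml + "div")
                where (string)div.Attribute("id") == divId      // null-safe: cast gives null if id is missing
                from img in div.Descendants(Xhtml + "img")
                let src = ((string)img.Attribute("src") ?? "").Trim()
                where src.Length > 0
                select src).ToList();
    }

    public static void Main()
    {
        var page = XDocument.Parse(
            "<html xmlns='http://www.w3.org/1999/xhtml'><body>" +
            "<div id='maincolumn'><img src=' a.png '/><img/><p><img src='b.png'/></p></div>" +
            "<div id='other'><img src='c.png'/></div>" +
            "</body></html>");

        foreach (var src in InDiv(page, "maincolumn"))
        {
            Console.WriteLine(src); // a.png, then b.png
        }
    }
}
```

Querying div.Descendants directly avoids copying the div into a second XDocument, which is the point of the XElement remark above.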

Is if(items != null) superfluous before foreach(T item in items)?

I often come across code like the following:
if ( items != null)
{
foreach(T item in items)
{
//...
}
}
Basically, the if condition ensures that foreach block will execute only if items is not null. I'm wondering if the if condition is really needed, or foreach will handle the case if items == null.
I mean, can I simply write
foreach(T item in items)
{
//...
}
without worrying about whether items is null or not? Is the if condition superfluous? Or this depends on the type of items or maybe on T as well?
You still need to check if (items != null) otherwise you will get NullReferenceException. However you can do something like this:
List<string> items = null;
foreach (var item in items ?? new List<string>())
{
item.Dump();
}
but you might check performance of it. So I still prefer having if (items != null) first.
Based on Eric Lippert's suggestion I changed the code to:
List<string> items = null;
foreach (var item in items ?? Enumerable.Empty<string>())
{
item.Dump();
}
Using C# 6 you could use the new null conditional operator together with List<T>.ForEach(Action<T>) (or your own IEnumerable<T>.ForEach extension method).
List<string> items = null;
items?.ForEach(item =>
{
// ...
});
The real takeaway here should be that a sequence should almost never be null in the first place. Simply make it an invariant in all of your programs that if you have a sequence, it is never null. It is always initialized to be the empty sequence or some other genuine sequence.
If a sequence is never null then obviously you don't need to check it.
Actually there is a feature request on that here: https://github.com/dotnet/csharplang/discussions/1081#issuecomment-443209795
And the response is quite logical:
I think that most foreach loops are written with the intent of iterating a non-null collection. If you try iterating through null you should get your exception, so that you can fix your code.
You could always test it out with a null list... but this is what I found on the msdn website
foreach-statement:
foreach ( type identifier in expression ) embedded-statement
If expression has the value null, a System.NullReferenceException is thrown.
You can encapsulate the null check in an extension method and use a lambda:
public static class EnumerableExtensions {
public static void ForEach<T>(this IEnumerable<T> self, Action<T> action) {
if (self != null) {
foreach (var element in self) {
action(element);
}
}
}
}
The code becomes:
items.ForEach(item => {
...
});
It can be even more concise if you just want to call a method that takes an item and returns void:
items.ForEach(MethodThatTakesAnItem);
It is not superfluous. At runtime items will be cast to an IEnumerable and its GetEnumerator method will be called. That dereferences items, which fails if items is null.
You do need this. You'll get an exception when foreach accesses the container to set up the iteration otherwise.
Under the covers, foreach uses an interface implemented on the collection class to perform the iteration. The generic equivalent interface is here.
The foreach statement of the C# language (for each in Visual Basic) hides the complexity of the enumerators. Therefore, using foreach is recommended instead of directly manipulating the enumerator.
The test is necessary, because if the collection is null, foreach will throw a NullReferenceException. It's actually quite simple to try it out.
List<string> items = null;
foreach(var item in items)
{
Console.WriteLine(item);
}
This throws a NullReferenceException with the message Object reference not set to an instance of an object.
As mentioned here, you need to check that it is not null:
Do not use an expression that evaluates to null.
The accepted answer is getting old.
Nowadays, nullable reference types are used extensively and help the compiler understand what you're trying to achieve (and avoid mistakes).
Which means that your list could be this :
List<Item>? list
...OR... this :
List<Item> list
You'll need to check for nullability only for the former case.
Same thing goes for items:
List<Item?> list
...OR... this :
List<Item> list
You'll need to check for nullability of items only for the former case.
And of course finally you have this :
List<Item?>? list
where anything (list and items) could potentially be null.
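A short sketch of how these annotations play out in code, assuming C# 8 or later with nullable reference types enabled. The NullableDemo class and its method names are invented for the example, and string stands in for Item. Note that the annotations are compile-time hints only: code compiled without them can still pass null at runtime.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullableDemo
{
    // Both the list and its elements may be null, so both need handling.
    public static int TotalLengthLenient(List<string?>? list)
    {
        int total = 0;
        foreach (var item in list ?? Enumerable.Empty<string?>())
        {
            total += item?.Length ?? 0;
        }
        return total;
    }

    // Neither the list nor its elements are declared nullable, so no checks are written.
    public static int TotalLengthStrict(List<string> list)
    {
        int total = 0;
        foreach (var item in list)
        {
            total += item.Length;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(TotalLengthLenient(null));                                   // 0
        Console.WriteLine(TotalLengthLenient(new List<string?> { "ab", null, "c" }));  // 3
        Console.WriteLine(TotalLengthStrict(new List<string> { "ab", "c" }));          // 3
    }
}
```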
In C# 6 you can write something like this:
// some string from file or UI, i.e.:
// a) string s = "Hello, World!";
// b) string s = "";
// ...
var items = s?.Split(new char[] { ',', '!', ' ' }) ?? Enumerable.Empty<string>();
foreach (var item in items)
{
//..
}
It's basically Vlad Bezden's solution, but it applies the ?? expression up front so that items is never null and therefore survives the foreach, rather than putting the check inside the foreach header.

Remove repetitive, hard coded loops and conditions in C#

I have a class that compares 2 instances of the same objects, and generates a list of their differences. This is done by looping through the key collections and filling a set of other collections with a list of what has changed (this may make more sense after viewing the code below). This works, and generates an object that lets me know what exactly has been added and removed between the "old" object and the "new" one.
My question/concern is this...it is really ugly, with tons of loops and conditions. Is there a better way to store/approach this, without having to rely so heavily on endless groups of hard-coded conditions?
public void DiffSteps()
{
try
{
//Confirm that there are 2 populated objects to compare
if (NewStep.Id != Guid.Empty && SavedStep.Id != Guid.Empty)
{
//<TODO> Find a good way to compare quickly if the objects are exactly the same...hash?
//Compare the StepDoc collections:
OldDocs = SavedStep.StepDocs;
NewDocs = NewStep.StepDocs;
Collection<StepDoc> docstoDelete = new Collection<StepDoc>();
foreach (StepDoc oldDoc in OldDocs)
{
bool delete = false;
foreach (StepDoc newDoc in NewDocs)
{
if (newDoc.DocId == oldDoc.DocId)
{
delete = true;
}
}
if (delete)
docstoDelete.Add(oldDoc);
}
foreach (StepDoc doc in docstoDelete)
{
OldDocs.Remove(doc);
NewDocs.Remove(doc);
}
//Same loop(s) for StepUsers...omitted for brevity
//This is a collection of users to delete; it is the collection
//of users that has not changed. So, this collection also needs to be checked
//to see if the permisssions (or any other future properties) have changed.
foreach (StepUser user in userstoDelete)
{
//Compare the two
StepUser oldUser = null;
StepUser newUser = null;
foreach(StepUser oldie in OldUsers)
{
if (user.UserId == oldie.UserId)
oldUser = oldie;
}
foreach (StepUser newie in NewUsers)
{
if (user.UserId == newie.UserId)
newUser = newie;
}
if(oldUser != null && newUser != null)
{
if (oldUser.Role != newUser.Role)
UpdatedRoles.Add(newUser.Name, newUser.Role);
}
OldUsers.Remove(user);
NewUsers.Remove(user);
}
}
}
catch(Exception ex)
{
string errorMessage =
String.Format("Error generating diff between Step objects {0} and {1}", NewStep.Id, SavedStep.Id);
log.Error(errorMessage,ex);
throw;
}
}
The targeted framework is 3.5.
Are you using .NET 3.5? I'm sure LINQ to Objects would make a lot of this much simpler.
Another thing to think about is that if you've got a lot of code with a common pattern, where just a few things change (e.g. "which property am I comparing?"), then that's a good candidate for a generic method taking a delegate to represent that difference.
EDIT: Okay, now we know we can use LINQ:
Step 1: Reduce nesting
Firstly I'd take out one level of nesting. Instead of:
if (NewStep.Id != Guid.Empty && SavedStep.Id != Guid.Empty)
{
// Body
}
I'd do:
if (NewStep.Id == Guid.Empty || SavedStep.Id == Guid.Empty)
{
return;
}
// Body
Early returns like that can make code much more readable.
Step 2: Finding docs to delete
This would be much nicer if you could simply specify a key function to Enumerable.Intersect. You can specify an equality comparer, but building one of those is a pain, even with a utility library. Ah well.
var oldDocIds = OldDocs.Select(doc => doc.DocId);
var newDocIds = NewDocs.Select(doc => doc.DocId);
var deletedIds = oldDocIds.Intersect(newDocIds).ToDictionary(x => x);
var deletedDocs = OldDocs.Where(doc => deletedIds.ContainsKey(doc.DocId));
Step 3: Removing the docs
Either use the existing foreach loop, or change the properties. If your properties are actually of type List<T> then you could use RemoveAll.
Step 4: Updating and removing users
foreach (StepUser deleted in usersToDelete)
{
// Should use SingleOrDefault here if there should only be one
// matching entry in each of NewUsers/OldUsers. The
// code below matches your existing loop.
StepUser oldUser = OldUsers.LastOrDefault(u => u.UserId == deleted.UserId);
StepUser newUser = NewUsers.LastOrDefault(u => u.UserId == deleted.UserId);
// Existing code here using oldUser and newUser
}
One option to simplify things even further would be to implement an IEqualityComparer using UserId (and one for docs with DocId).
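A minimal sketch of that idea, written to stay within .NET 3.5. The KeyComparer class is a helper invented here, and the StepDoc class is a hypothetical stand-in with an int DocId, since the question does not show the real type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's StepDoc.
public class StepDoc
{
    public int DocId { get; set; }
    public string Name { get; set; }
}

// Compares items by a key chosen by the caller, e.g. DocId or UserId.
public class KeyComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> keySelector;

    public KeyComparer(Func<T, TKey> keySelector)
    {
        this.keySelector = keySelector;
    }

    public bool Equals(T x, T y)
    {
        return EqualityComparer<TKey>.Default.Equals(keySelector(x), keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        return EqualityComparer<TKey>.Default.GetHashCode(keySelector(obj));
    }
}

public static class DiffDemo
{
    public static void Main()
    {
        var oldDocs = new List<StepDoc>
        {
            new StepDoc { DocId = 1, Name = "a" },
            new StepDoc { DocId = 2, Name = "b" }
        };
        var newDocs = new List<StepDoc>
        {
            new StepDoc { DocId = 2, Name = "b" },
            new StepDoc { DocId = 3, Name = "c" }
        };
        var byId = new KeyComparer<StepDoc, int>(d => d.DocId);

        // Intersect yields the items of oldDocs whose key also occurs in newDocs.
        var unchanged = oldDocs.Intersect(newDocs, byId).ToList();
        var removed = oldDocs.Except(newDocs, byId).ToList();
        var added = newDocs.Except(oldDocs, byId).ToList();

        Console.WriteLine(unchanged.Single().DocId); // 2
        Console.WriteLine(removed.Single().DocId);   // 1
        Console.WriteLine(added.Single().DocId);     // 3
    }
}
```

The same comparer type works for users by passing u => u.UserId, which removes the need for a second hand-written loop.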
As you are using at least .NET 2.0, I recommend implementing Equals and GetHashCode ( http://msdn.microsoft.com/en-us/library/7h9bszxx.aspx ) on StepDoc. As a hint of how it can clean up your code, you could replace something like this:
Collection<StepDoc> docstoDelete = new Collection<StepDoc>();
foreach (StepDoc oldDoc in OldDocs)
{
bool delete = false;
foreach (StepDoc newDoc in NewDocs)
{
if (newDoc.DocId == oldDoc.DocId)
{
delete = true;
}
}
if (delete) docstoDelete.Add(oldDoc);
}
foreach (StepDoc doc in docstoDelete)
{
OldDocs.Remove(doc);
NewDocs.Remove(doc);
}
with this:
oldDocs.FindAll(newDocs.Contains).ForEach(delegate(StepDoc doc) {
oldDocs.Remove(doc);
newDocs.Remove(doc);
});
This assumes oldDocs is a List of StepDoc.
If both StepDoc and StepUser implement IComparable<T>, and they are stored in collections that implement IList<T>, then you can use the following helper method to simplify this function. Just call it twice, once with the StepDoc collections and once with the StepUser collections. Use the beforeRemoveCallback to implement the special logic used to do your role updates. I'm assuming the collections don't contain duplicates. I've left out argument checks.
public delegate void BeforeRemoveMatchCallback<T>(T item1, T item2);
public static void RemoveMatches<T>(
IList<T> list1, IList<T> list2,
BeforeRemoveMatchCallback<T> beforeRemoveCallback)
where T : IComparable<T>
{
// looping backwards lets us safely modify the collection "in flight"
// without requiring a temporary collection (as required by a foreach
// solution)
for(int i = list1.Count - 1; i >= 0; i--)
{
for(int j = list2.Count - 1; j >= 0; j--)
{
if(list1[i].CompareTo(list2[j]) == 0)
{
// do any cleanup stuff in this function, like your role assignments
if(beforeRemoveCallback != null)
beforeRemoveCallback(list1[i], list2[j]);
list1.RemoveAt(i);
list2.RemoveAt(j);
break;
}
}
}
}
Here is a sample beforeRemoveCallback for your updates code:
BeforeRemoveMatchCallback<StepUser> callback =
delegate(StepUser oldUser, StepUser newUser)
{
if(oldUser.Role != newUser.Role)
UpdatedRoles.Add(newUser.Name, newUser.Role);
};
What framework are you targeting? (This will make a difference in the answer.)
Why is this a void function?
Shouldn't the signature look like:
DiffResults results = object.CompareTo(object2);
If you want to hide the traversal of the tree-like structure you could create an IEnumerator subclass that hides the "ugly" looping constructs and then use CompareTo interface:
MyTraverser t = new MyTraverser(oldDocs, newDocs);
foreach (object oldOne in t)
{
if (oldOne.CompareTo(t.CurrentNewOne) != 0)
{
// use RTTI to figure out what to do with the object
}
}
However, I'm not at all sure that this particularly simplifies anything. I don't mind seeing the nested traversal structures. The code is nested, but not complex or particularly difficult to understand.
Using multiple lists in foreach is easy. Do this:
foreach (TextBox t in col)
{
foreach (TextBox d in des) // here des and col are list having textboxes
{
// here remove first element then and break it
RemoveAt(0);
break;
}
}
It works much as a hypothetical foreach (TextBox t in col && TextBox d in des) would.
