Remove duplicate items from a collection except the first occurrence - C#

I have a Collection of type string that can contain any number of elements.
Now I need to find all the elements that are duplicated, keep only the first occurrence of each, and delete the rest.
For example:
public class CollectionCategoryTitle
{
public long CollectionTitleId { get; set; }
public bool CollectionTitleIdSpecified { get; set; }
public string SortOrder { get; set; }
public TitlePerformance performanceField { get; set; }
public string NewOrder { get; set; }
}
List<CollectionCategoryTitle> reorderTitles =
(List<CollectionCategoryTitle>)json_serializer
.Deserialize<List<CollectionCategoryTitle>>(rTitles);
Now I need to process this collection so that it removes duplicates but keeps the first occurrence.
EDIT:
I have updated the code; I need to compare on the "NewOrder" property.
Thanks

For your specific case:
var withoutDuplicates = reorderTitles.GroupBy(z => z.NewOrder).Select(z => z.First()).ToList();
For the more general case, Distinct() is generally preferable. For example:
List<int> a = new List<int>();
a.Add(4);
a.Add(1);
a.Add(2);
a.Add(2);
a.Add(4);
a = a.Distinct().ToList();
will return 4, 1, 2. Note that Distinct doesn't guarantee the order of the returned data (the current implementation does seem to return them based on the order of the original data - but that is undocumented and thus shouldn't be relied upon).

Use the Enumerable.Distinct<T>() extension method to do this.

EDIT: mjwills correctly points out that guaranteed ordering is important in the question, so the other two suggestions are not spec-guaranteed to work. Leaving just the one that gives this guarantee.
private static IEnumerable<CollectionCategoryTitle> DistinctNewOrder(IEnumerable<CollectionCategoryTitle> src)
{
HashSet<string> seen = new HashSet<string>();
// To compare strings differently, pass a comparer to the constructor, such as
// new HashSet<string>(StringComparer.CurrentCultureIgnoreCase)
foreach(var item in src)
if(seen.Add(item.NewOrder))
yield return item;
}
/*...*/
// DistinctNewOrder is a plain static method (no "this" parameter), so call it directly
var distinctTitles = DistinctNewOrder(reorderTitles).ToList();
Finally, only use .ToList() after the call to DistinctNewOrder() if you actually need it to be a list. If you're going to process the results once and then do no further work, you're better off not creating a list which wastes time and memory.
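The same HashSet idea works for any key, not just NewOrder. Here is a minimal sketch, assuming a hypothetical helper named DistinctByKey (the name, the class and the sample data are made up for illustration; newer .NET versions also ship a built-in Enumerable.DistinctBy). It yields the first item seen for each key, in source order, and stays lazy until you enumerate it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctExtensions
{
    // Hypothetical helper: yields the first item seen for each key, in source order.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keySelector,
        IEqualityComparer<TKey> comparer = null)
    {
        var seen = new HashSet<TKey>(comparer ?? EqualityComparer<TKey>.Default);
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the key is already present.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }
}

public static class DistinctDemo
{
    public static void Main()
    {
        var words = new List<string> { "b", "a", "B", "c", "a" };
        var result = words.DistinctByKey(w => w, StringComparer.OrdinalIgnoreCase).ToList();
        Console.WriteLine(string.Join(", ", result)); // b, a, c
    }
}
```

For the question's list that would read reorderTitles.DistinctByKey(t => t.NewOrder).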

Related

C# Binary search

I am trying to learn and practice binary search, but I cannot understand how binary search could work for a List of objects, or just strings for example. It does not look that complicated when you deal with numbers. But how do you actually perform a binary search on, for instance, a list of objects that each hold a name property as a string value?
Binary search assumes a sorted collection. So, you have to provide a compare(a,b) function. That function returns a negative number, zero, or a positive number (often -1, 0 or 1) as the result of the comparison. The function implementation for numbers or chars is trivial. But you can implement much more complex logic that takes one or more object properties into consideration. As long as you provide that function, you can sort any collection of objects and you can apply binary search on that collection, provided the search uses the same comparison the collection was sorted with.
You would do it the same way as with numbers, the only difference is, that you access the property of the instance you are looking at.
For example items[x].Value instead of items[x].
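To make that concrete, here is a minimal hand-written sketch. The Item class, its Value property and the method name BinarySearchByValue are assumptions made up for the example. It also assumes the list is already sorted by Value with ordinal string comparison, which is the same comparison the search uses.

```csharp
using System;
using System.Collections.Generic;

public class Item
{
    public string Value { get; set; }
}

public static class PropertySearch
{
    // Returns the index of an item whose Value equals target, or -1 if there is none.
    // Assumes items is sorted by Value using ordinal comparison.
    public static int BinarySearchByValue(IList<Item> items, string target)
    {
        int lo = 0, hi = items.Count - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            int cmp = string.CompareOrdinal(items[mid].Value, target);
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1;  // middle item sorts before target: search the right half
            else hi = mid - 1;          // middle item sorts after target: search the left half
        }
        return -1;
    }

    public static void Main()
    {
        var items = new List<Item>
        {
            new Item { Value = "apple" },
            new Item { Value = "banana" },
            new Item { Value = "cherry" },
            new Item { Value = "date" }
        };
        Console.WriteLine(BinarySearchByValue(items, "cherry")); // 2
        Console.WriteLine(BinarySearchByValue(items, "fig"));    // -1
    }
}
```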
Say you have a list of friends that use the Friend class and you want to use a binary search to find out whether a friend exists in this list. Firstly, a binary search should only be conducted on a sorted list. Using a lambda with OrderBy, you can sort the list and then convert it to an array. In this example I collect the input from a textbox (which holds the name of the friend to look for) after a button is clicked, and then conduct the binary search. The Friend class must also implement IComparable<Friend>.
class Friend : IComparable<Friend>
{
public string Name { get; set; }
public string Likes { get; set; }
public string Dislikes { get; set; }
public int Birthday { get; set; }
//Used for the binary search
public int CompareTo(Friend other)
{
return this.Name.CompareTo(other.Name);
}
}
class MainWindow
{
List<Friend> friends = new List<Friend>();
//other functions here populated the list
private void OnButtonClick(object sender, EventArgs e)
{
Friend[] sortedArray = friends.OrderBy(f => f.Name).ToArray();
int index = Array.BinarySearch(sortedArray, new Friend() { Name = tb_binarySearch.Text });
if (index < 0)
{
Console.WriteLine("Friend does not exist in list");
}
else
{
Console.WriteLine("Your friend exists at index {0}", index);
}
}
}
If the returned index is negative, the object does not exist. Otherwise it is the index of the object in the sorted array.
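The negative value carries some information too: Array.BinarySearch documents it as the bitwise complement of the index of the first element larger than the value (or of the array length if there is none). A small sketch of using that, where the helper name InsertionIndex and the sample names are made up for illustration:

```csharp
using System;

public static class InsertionPoint
{
    // Returns the index of value if present, otherwise the index at which
    // it would have to be inserted to keep the array sorted.
    public static int InsertionIndex(string[] sorted, string value)
    {
        int index = Array.BinarySearch(sorted, value, StringComparer.Ordinal);
        return index >= 0 ? index : ~index;
    }

    public static void Main()
    {
        // Must already be sorted with the same comparer that the search uses.
        string[] names = { "Alice", "Carol", "Dave" };
        Console.WriteLine(InsertionIndex(names, "Carol")); // 1 (found)
        Console.WriteLine(InsertionIndex(names, "Bob"));   // 1 (would go before Carol)
        Console.WriteLine(InsertionIndex(names, "Zed"));   // 3 (would go at the end)
    }
}
```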

How should I remove elements from a generic list based on whether each object includes elements from another list, in C#, using predicate logic?

I am trying to learn C# by making a simple program that shows the user sushi rolls given their desired ingredients. i.e. a user wants a roll with crab, and the program will spit out a list of sushi rolls that contain crab.
I've created a Roll class
public class Roll
{
private string name;
private List<string> ingredients = new List<string>();
}
With some getters and setters and other various methods.
In the GUI, I have some checkboxes which each call an update() method from the Controller class, which then needs to check a list of rolls against a list of ingredients given by the GUI checkboxes. What I have is this:
class Controller
{
static List<Roll> Rolls = new List<Roll>();
static RollList RL = new RollList();
static List<String> ingredients = new List<String>();
static Roll roll = new Roll();
public void update()
{
foreach(Roll roll in Rolls)
{
foreach (String ingredient in ingredients)
if (!roll.checkForIngredient(ingredient))
Rolls.Remove(roll);
}
}
}
But a System.InvalidOperationException is thrown saying that because the collection was modified, the operation can't execute. OK, that's fair, but then what's the best way to do this? Here on Stack Overflow there's a post about removing elements from a generic list while iterating over it.
This was good and pointed me in the right direction, but unfortunately, my predicate condition simply doesn't match the top answer's.
It would have to iterate over the ingredients list, and I'm not even sure that's possible...
list.RemoveAll(roll => !roll.containsIngredient(each string ingredient in ingredients) );
shudder
I've tried the for loop, but I can't seem to get the enumeration to work either, and I wonder if it's even necessary to enumerate the class for just this method.
So I come here to try and find an elegant, professional solution to my problem. Keep in mind that I'm new to C# and I'm not all too familiar with predicate logic or enumeration on classes.
To use RemoveAll you can rewrite your condition to this:
list.RemoveAll(roll => !ingredients.All(roll.checkForIngredient));
This relies on a method group conversion: the compiler turns the method name roll.checkForIngredient into a delegate, so the line is effectively equivalent to this:
list.RemoveAll(roll => !ingredients.All(i => roll.checkForIngredient(i)));
Which is what you want. If not all the ingredients are present, remove the roll.
Now, having said that, since you say you're a beginner, perhaps you feel more comfortable keeping your loop, if you could just make it work (i.e. stop it crashing because the collection is modified inside the loop). To do that, make a copy of the collection and loop through the copy. You can do this by modifying the foreach statement to this:
foreach(Roll roll in Rolls.ToList())
This will create a list based copy of the Rolls collection, and then loop on that. The list will not be modified, even if Rolls is, it is a separate copy containing all the elements of Rolls when it was created.
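If you would rather avoid the copy, another common pattern (not from the answer above, just a sketch) is an index-based for loop that walks the list backwards. Removing the item at index i only shifts the items after i, which the loop has already visited. The helper name RemoveWhere is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ReverseRemoval
{
    // Removes every item for which shouldRemove returns true.
    // Walking from the end means RemoveAt never shifts an unvisited item.
    public static void RemoveWhere<T>(List<T> list, Predicate<T> shouldRemove)
    {
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (shouldRemove(list[i]))
                list.RemoveAt(i);
        }
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        RemoveWhere(numbers, n => n % 2 == 0);
        Console.WriteLine(string.Join(", ", numbers)); // 1, 3, 5
    }
}
```

This does the same job as List<T>.RemoveAll, so in real code RemoveAll is usually the simpler choice; the loop is mainly useful for seeing what happens.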
As requested in the comments, I'll try to explain how this line of code works:
list.RemoveAll(roll => !ingredients.All(roll.checkForIngredient));
The RemoveAll method (List<T>.RemoveAll) takes a Predicate<T>, which is a delegate: a reference to a method that takes an item and returns a bool.
This can be a lambda, syntax that creates an anonymous method, using the => operator. An anonymous method is basically a method declared where you want to use it, without a name, hence the anonymous part. Let's rewrite the code to use an anonymous method instead of a lambda:
list.RemoveAll(delegate(Roll roll)
{
return !ingredients.All(roll.checkForIngredient);
});
This is the exact same compiled code as for the lambda version above, just using the bit more verbose syntax of an anonymous method.
So, how does the code inside the method work?
The All method is an extension method, found on the Enumerable class: Enumerable.All.
It will basically loop through all the elements of the collection it is extending, in this case the ingredients list holding the ingredients the user selected, and call the predicate function on each. If the predicate returns false for any element, the result of calling All is false. If all the calls return true, the result is true. Note that if the collection (ingredients) is empty, the result is also true.
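The three cases can be checked with a few lines. This is only a sketch: the HasAll helper and the plain string lists stand in for the question's Roll class and its checkForIngredient method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AllDemo
{
    // True when every wanted ingredient appears in the roll's ingredient list.
    public static bool HasAll(List<string> rollIngredients, List<string> wanted)
    {
        return wanted.All(w => rollIngredients.Contains(w));
    }

    public static void Main()
    {
        var roll = new List<string> { "crab", "rice" };
        Console.WriteLine(HasAll(roll, new List<string> { "crab" }));        // True
        Console.WriteLine(HasAll(roll, new List<string> { "crab", "eel" })); // False
        Console.WriteLine(HasAll(roll, new List<string>()));                 // True (empty list)
    }
}
```

The last case means that with no checkboxes ticked, RemoveAll(roll => !ingredients.All(...)) removes nothing.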
So let's try to rewrite our lambda code, which again looked like this:
list.RemoveAll(roll => !ingredients.All(roll.checkForIngredient));
Into a more verbose method, not using the All extension method:
list.RemoveAll(delegate(Roll roll)
{
bool all = true;
foreach (var ingredient in ingredients)
if (!roll.checkForIngredient(ingredient))
{
all = false;
break;
}
return !all;
});
This now starts to look like your original piece of code, except that we're using the RemoveAll method, which needs a predicate that returns whether to remove the item or not. Since if all is false, we need to remove the roll, we use the not operator ! to reverse that value.
Since you are both new to C# but also asked for an elegant solution, I will give you an example of how to solve this using a more object-oriented approach.
First of all, any "thing" of significance should be modeled as a class, even if it has just one property. This makes it easier to extend the behavior later on. You already defined a class for Roll. I would also add a class for Ingredient:
public class Ingredient
{
private string _name;
public string Name
{
get { return _name; }
}
public Ingredient(string name)
{
_name = name;
}
}
Note the Name property which only has a getter, and the constructor which accepts a string name. This might look like unnecessary complexity at first but will make your code more straightforward to consume further down the road.
Next, we'll modify your Roll class according to this guideline and give it some helper methods that make it easier for us to check if a roll contains a certain (list of) ingredients:
public class Roll
{
private string _name;
private List<Ingredient> _ingredients = new List<Ingredient>();
public string Name
{
// By only exposing the property through a getter, you are preventing the name
// from being changed after the roll has been created
get { return _name; }
}
public IEnumerable<Ingredient> Ingredients
{
// Exposing the list as IEnumerable<Ingredient> keeps consumers from calling List.Add directly,
// so they have to go through AddIngredient, where you can do any necessary checks
get { return _ingredients; }
}
public Roll(string name)
{
_name = name;
}
public bool AddIngredient(Ingredient ingredient)
{
// Returning a boolean value to indicate whether the ingredient was already present,
// gives the consumer of this class a way to present feedback to the end user
bool alreadyHasIngredient = _ingredients.Any(i => i.Name == ingredient.Name);
if (!alreadyHasIngredient)
{
_ingredients.Add(ingredient);
return true;
}
return false;
}
public bool ContainsIngredients(IEnumerable<Ingredient> ingredients)
{
// We use a method group to check for all of the supplied ingredients
// whether or not they exist
return ingredients.All(ContainsIngredient);
// Could be rewritten as: ingredients.All(i => ContainsIngredient(i));
}
public bool ContainsIngredient(Ingredient ingredient)
{
// We simply check if an ingredient is present by comparing their names
return _ingredients.Any(i => i.Name == ingredient.Name);
}
}
Pay attention to the ContainsIngredient and ContainsIngredients methods here. Now you can do stuff like if (roll.ContainsIngredient(ingredient)), which will make your code more expressive and more readable. You'll see this in action in the next class that I'm going to add, RollCollection.
You are modeling collections of food to pick from, presumably in the context of a restaurant menu or some similar domain. You might as well go ahead and model just that: a RollCollection. This will allow you to encapsulate some meaningful logic inside of the collection.
Again, this sort of thing tends to require some boilerplate code and may look overly complex at first, but it will make your classes easier to consume. So let's add a RollCollection:
// Requires: using System.Collections; using System.Collections.Generic; using System.Linq;
public class RollCollection : IEnumerable<Roll>
{
private List<Roll> _rolls = new List<Roll>();
public RollCollection()
{
// We need to provide a default constructor if we want to be able
// to instantiate an empty RollCollection and then add rolls later on
}
public RollCollection(IEnumerable<Roll> rolls)
{
// By providing a constructor overload which accepts an IEnumerable<Roll>,
// we have the opportunity to create a new RollCollection based on a filtered existing collection of rolls
_rolls = rolls.ToList();
}
public RollCollection WhichContainIngredients(IEnumerable<Ingredient> ingredients)
{
IEnumerable<Roll> filteredRolls = _rolls
.Where(r => r.ContainsIngredients(ingredients));
return new RollCollection(filteredRolls);
}
public bool AddRoll(Roll roll)
{
// Similar to AddIngredient
bool alreadyContainsRoll = _rolls.Any(r => r.Name == roll.Name);
if (!alreadyContainsRoll)
{
_rolls.Add(roll);
return true;
}
return false;
}
#region IEnumerable implementation
public IEnumerator<Roll> GetEnumerator()
{
foreach (Roll roll in _rolls)
{
yield return roll;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
#endregion
}
WhichContainIngredients is the thing we were really looking for, as it allows you to do something like this:
// I have omitted the (proper) instantiation of Rolls and ChosenIngredients for brevity here
public RollCollection Rolls { get; set; }
public List<Ingredient> ChosenIngredients { get; set; }
public void Update()
{
Rolls = Rolls.WhichContainIngredients(ChosenIngredients);
}
This is simple and clean, just the sort of thing you want to be doing in your presentation layer. The logic to accomplish your requirement is now nicely encapsulated in the RollCollection class.
EDIT: a more complete (but still simplified) example of how your Controller class might end up looking:
public class Controller
{
private RollCollection _availableRolls = new RollCollection();
private List<Ingredient> _availableIngredients = new List<Ingredient>();
public RollCollection AvailableRolls
{
get { return _availableRolls; }
}
public List<Ingredient> AvailableIngredients
{
get { return _availableIngredients; }
}
public RollCollection RollsFilteredByIngredients
{
get { return AvailableRolls.WhichContainIngredients(ChosenIngredients); }
}
public List<Ingredient> ChosenIngredients { get; set; }
public Controller()
{
ChosenIngredients = new List<Ingredient>();
InitializeTestData();
}
private void InitializeTestData()
{
Ingredient ingredient1 = new Ingredient("Ingredient1");
Ingredient ingredient2 = new Ingredient("Ingredient2");
Ingredient ingredient3 = new Ingredient("Ingredient3");
_availableIngredients.Add(ingredient1);
_availableIngredients.Add(ingredient2);
_availableIngredients.Add(ingredient3);
Roll roll1 = new Roll("Roll1");
roll1.AddIngredient(ingredient1);
roll1.AddIngredient(ingredient2);
Roll roll2 = new Roll("Roll2");
roll2.AddIngredient(ingredient3);
_availableRolls.AddRoll(roll1);
_availableRolls.AddRoll(roll2);
}
}
"I am trying to learn C# by making a simple program that shows the user sushi rolls given their desired ingredients. i.e. a user wants a roll with crab, and the program will spit out a list of sushi rolls that contain crab."
Here's my solution to the given problem:
public class Roll
{
public string Name { get; set; }
private List<string> ingredients = new List<string>();
public IList<string> Ingredients { get { return ingredients; } }
public bool Contains(string ingredient)
{
return Ingredients.Any(i => i.Equals(ingredient));
}
}
You can use the LINQ extension method .Where to filter your collection of Rolls
public class Program
{
static void Main()
{
var allRolls = new List<Roll>
{
new Roll
{
Name = "Roll 1",
Ingredients = { "IngredientA", "Crab", "IngredientC" }
},
new Roll
{
Name = "Roll 2",
Ingredients = { "IngredientB", "IngredientC" }
},
new Roll
{
Name = "Roll 3",
Ingredients = { "Crab", "IngredientA" }
}
};
var rollsWithCrab = allRolls.Where(roll => roll.Contains("Crab"));
foreach (Roll roll in rollsWithCrab)
{
Console.WriteLine(roll.Name);
}
}
}
From what I see you're trying to remove all rolls that don't contain crab from your list of rolls. A better approach is to filter out those rolls that don't contain crab (using .Where), you can then use .ToList() if you need to manipulate the whole list directly rather than iterating through the collection (fetching one item at a time).
You should read up on Delegates, Iterators, Extension Methods and LINQ to better understand what's going on under the covers.

Efficient data pattern to store three related items for easy access

I am having a little trouble designing an efficient way of storing some bookmark elements for a Word automation project. Here is what I need to do. I need to get every bookmark start, bookmark end and bookmark id stored in a neat data structure that lets me access any one of these three objects, given one of them, with the least runtime complexity. If, for example, I didn't need the id to be stored, I could just make a dictionary and use the bookmark start as a key and the bookmark end as a value to get an access time of O(1). But is there a logical, simple and efficient structure that gives this functionality with all three of these items coupled together?
Thanks
If bookmark start and bookmark end are both zero-based integers, you could store them in simple arrays. A dictionary lookup is only O(1) on average (it pays for hashing and can degrade with collisions), but in practice it is very close to array speed.
Create a basic object with the 3 fields, then have 2 arrays as indices to the actual data.
public class Bookmark
{
public int ID { get; set; }
public int Start { get; set; }
public int End { get; set; }
}
// setting up your bookmark indices
const int NumBookmarks = 200;
Bookmark[] startIndices = new Bookmark[NumBookmarks];
Bookmark[] endIndices = new Bookmark[NumBookmarks];
// add a new bookmark
Bookmark myBookmark = new Bookmark(){ID=5, Start=10, End=30};
startIndices[myBookmark.Start] = myBookmark;
endIndices[myBookmark.End] = myBookmark;
// get a bookmark
Bookmark found = startIndices[10];
Of course using an array is probably the least flexible, but will be the fastest.
If you don't need absolute speed, you can create a List, and use the Find method.
Bookmark myBookmark = myBookmarks.Find(x=>x.Start==10);
It is called a DTO:
public class Bookmark
{
public int Id { get; set; }
public DateTime Start { get; set; }
public DateTime End { get; set; }
}
You can store those bookmarks in a dictionary:
var dict = new Dictionary<int, Bookmark>();
dict[bookmark.Id] = bookmark;
You are looking at implementing a customized Map/Hash/Dictionary class for this purpose.
Basically:
class BookmarkObj { /* similar to steven's */ }
class BookmarkStore {
Dictionary<int, BookmarkObj> byId;
Dictionary<DateTime, BookmarkObj> byStartDate;
Dictionary<DateTime, BookmarkObj> byEndDate;
/* Boring init code */
public void Insert(BookmarkObj obj) {
byId[obj.Id] = obj;
byStartDate[obj.Start] = obj;
byEndDate[obj.End] = obj;
}
public BookmarkObj GetById(int id) {
return byId[id];
}
/* And so on */
}
This data structure doesn't really map onto IDictionary, but you could maybe make it implement ICollection if iteration and contracts are important to you.
If O(1) lookup is not that important, you could alternatively just implement this as a List and do lookups using LINQ to simplify your life a bit. Remember, premature optimization is bad.
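A minimal sketch of that simpler list-based variant, assuming integer Start and End positions as in the first answer. The BookmarkList class and its method names are made up for illustration. Every lookup scans the list, so it is O(n), which is often fine for the number of bookmarks in one document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Bookmark
{
    public int Id { get; set; }
    public int Start { get; set; }
    public int End { get; set; }
}

// Hypothetical wrapper: one list, three linear-time lookups.
public class BookmarkList
{
    private readonly List<Bookmark> _items = new List<Bookmark>();

    public void Add(Bookmark bookmark) { _items.Add(bookmark); }

    // Each method returns null when nothing matches.
    public Bookmark FindById(int id) { return _items.FirstOrDefault(b => b.Id == id); }
    public Bookmark FindByStart(int start) { return _items.FirstOrDefault(b => b.Start == start); }
    public Bookmark FindByEnd(int end) { return _items.FirstOrDefault(b => b.End == end); }
}

public static class BookmarkDemo
{
    public static void Main()
    {
        var store = new BookmarkList();
        store.Add(new Bookmark { Id = 5, Start = 10, End = 30 });
        Bookmark found = store.FindByEnd(30);
        Console.WriteLine(found.Id); // 5
    }
}
```

If profiling later shows the lookups matter, the internals can be swapped for the three dictionaries without changing the callers.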

Converting nested foreach loops to LINQ

I've written the following code to set the properties on various classes. It works, but one of my new year's resolutions is to make as much use of LINQ as possible and obviously this code doesn't. Is there a way to rewrite it in a "pure LINQ" format, preferably without using the foreach loops? (Even better if it can be done in a single LINQ statement - substatements are fine.)
I tried playing around with join but that didn't get me anywhere, hence I'm asking for an answer to this question - preferably without an explanation, as I'd prefer to "decompile" the solution to figure out how it works. (As you can probably guess I'm currently a lot better at reading LINQ than writing it, but I intend to change that...)
public void PopulateBlueprints(IEnumerable<Blueprint> blueprints)
{
XElement items = GetItems();
// item id => name mappings
var itemsDictionary = (
from item in items
select new
{
Id = Convert.ToUInt32(item.Attribute("id").Value),
Name = item.Attribute("name").Value,
}).Distinct().ToDictionary(pair => pair.Id, pair => pair.Name);
foreach (var blueprint in blueprints)
{
foreach (var material in blueprint.Input.Keys)
{
if (itemsDictionary.ContainsKey(material.Id))
{
material.Name = itemsDictionary[material.Id];
}
else
{
Console.WriteLine("m: " + material.Id);
}
}
if (itemsDictionary.ContainsKey(blueprint.Output.Id))
{
blueprint.Output.Name = itemsDictionary[blueprint.Output.Id];
}
else
{
Console.WriteLine("b: " + blueprint.Output.Id);
}
}
}
Definition of the requisite classes follow; they are merely containers for data and I've stripped out all the bits irrelevant to my question:
public class Material
{
public uint Id { get; set; }
public string Name { get; set; }
}
public class Product
{
public uint Id { get; set; }
public string Name { get; set; }
}
public class Blueprint
{
public IDictionary<Material, uint> Input { get; set; }
public Product Output { get; set; }
}
I don't think this is actually a good candidate for conversion to LINQ - at least not in its current form.
Yes, you have a nested foreach loop - but you're doing something else in the top-level foreach loop, so it's not the easy-to-convert form which just contains nesting.
More importantly, the body of your code is all about side-effects, whether that's writing to the console or changing the values within the objects you've found. LINQ is great when you've got a complicated query and you want to loop over that to act on each item in turn, possibly with side-effects... but your queries aren't really complicated, so you wouldn't get much benefit.
One thing you could do is give Material and Product a common interface (INameAndId here) containing Id and Name. Then you could write a single method to update the products and materials via itemsDictionary based on a query for each:
UpdateNames(itemsDictionary, blueprints.Select(x => x.Output));
UpdateNames(itemsDictionary, blueprints.SelectMany(x => x.Input.Keys));
...
private static void UpdateNames<TSource>(Dictionary<uint, string> idMap,
IEnumerable<TSource> source) where TSource : INameAndId
{
foreach (TSource item in source)
{
string name;
if (idMap.TryGetValue(item.Id, out name))
{
item.Name = name;
}
}
}
This is assuming you don't actually need the console output. If you do, you could always pass in the appropriate prefix and add an "else" block in the method. Note that I've used TryGetValue instead of performing two lookups on the dictionary for each iteration.
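The TryGetValue point is worth seeing in isolation. ContainsKey followed by the indexer hashes the key twice; TryGetValue does the lookup once and reports whether it succeeded. A small sketch, where the NameOrFallback helper and the sample data are made up for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class LookupDemo
{
    // Single lookup: returns the mapped name, or fallback when the id is unknown.
    public static string NameOrFallback(Dictionary<uint, string> idMap, uint id, string fallback)
    {
        string name;
        return idMap.TryGetValue(id, out name) ? name : fallback;
    }

    public static void Main()
    {
        var idMap = new Dictionary<uint, string> { { 34u, "Tritanium" } };
        Console.WriteLine(NameOrFallback(idMap, 34u, "(unknown)")); // Tritanium
        Console.WriteLine(NameOrFallback(idMap, 99u, "(unknown)")); // (unknown)
    }
}
```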
I'll be honest, I did not read your code. For me, your question answered itself when you said "code to set the properties." You should not be using LINQ to alter the state of objects or to produce side effects. Yes, I know that you could write extension methods that would cause that to happen, but you'd be abusing the functional paradigm espoused by LINQ, and possibly creating a maintenance burden, especially for other developers who probably won't be finding any books or articles supporting your endeavor.
As you're interested in doing as much as possible with Linq, you might like to try the VS plugin ReSharper. It will identify loops (or portions of loops) that can be converted to Linq operators. It does a bunch of other helpful stuff with Linq too.
For example, loops that sum values are converted to use Sum, and loops that apply an internal filter are changed to use Where. Even string concatenation or other accumulation over a sequence is converted to Aggregate. I've learned more about Linq from trying the changes it suggests.
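To give an idea of what those rewrites look like, here is a hand-written sketch of the three conversions mentioned. It is not ReSharper's actual output, and the method names are made up; each comment shows the loop that the LINQ call replaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoopsToLinq
{
    // Loop form: int total = 0; foreach (var v in values) total += v;
    public static int Total(IEnumerable<int> values)
    {
        return values.Sum();
    }

    // Loop form: foreach (var v in values) if (v % 2 == 0) result.Add(v);
    public static List<int> Evens(IEnumerable<int> values)
    {
        return values.Where(v => v % 2 == 0).ToList();
    }

    // Loop form: start with the first word, then append " " + w for each later word.
    // Aggregate without a seed throws InvalidOperationException on an empty sequence.
    public static string JoinWords(IEnumerable<string> words)
    {
        return words.Aggregate((acc, w) => acc + " " + w);
    }

    public static void Main()
    {
        int[] values = { 1, 2, 3, 4 };
        Console.WriteLine(Total(values));                          // 10
        Console.WriteLine(string.Join(", ", Evens(values)));       // 2, 4
        Console.WriteLine(JoinWords(new[] { "a", "b", "c" }));     // a b c
    }
}
```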
Plus ReSharper is awesome for about 1000 other reasons as well :)
As others have said, you probably don't want to do it without foreach loops. The loops signify side-effects, which is the whole point of the exercise. That said, you can still LINQ it up:
var materialNames =
from blueprint in blueprints
from material in blueprint.Input.Keys
where itemsDictionary.ContainsKey(material.Id)
select new { material, name = itemsDictionary[material.Id] };
foreach (var update in materialNames)
update.material.Name = update.name;
var outputNames =
from blueprint in blueprints
where itemsDictionary.ContainsKey(blueprint.Output.Id)
select new { blueprint, name = itemsDictionary[blueprint.Output.Id] };
foreach (var update in outputNames)
update.blueprint.Output.Name = update.name;
What about this
(from blueprint in blueprints
from material in blueprint.Input.Keys
where itemsDictionary.ContainsKey(material.Id)
select new { material, name = itemsDictionary[material.Id] })
.ToList()
.ForEach(rs => rs.material.Name = rs.name);
(from blueprint in blueprints
where itemsDictionary.ContainsKey(blueprint.Output.Id)
select new { blueprint, name = itemsDictionary[blueprint.Output.Id] })
.ToList()
.ForEach(rs => rs.blueprint.Output.Name = rs.name);
See if this works
var res = from blueprint in blueprints
from material in blueprint.Input.Keys
join item in items on
material.Id equals Convert.ToUInt32(item.Attribute("id").Value)
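select material.Set(x => { x.Name = item.Attribute("name").Value; });
You won't find a Set method in the framework; it is an extension method you have to write yourself. Note that the query is lazy, so nothing is updated until res is enumerated (for example with res.ToList()).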
public static class LinqExtensions
{
/// <summary>
/// Used to modify properties of an object returned from a LINQ query
/// </summary>
public static TSource Set<TSource>(this TSource input,
Action<TSource> updater)
{
updater(input);
return input;
}
}

Why does List<T>.Sort method reorder equal IComparable<T> elements?

I have a problem with how the List Sort method deals with sorting. Given the following element:
class Element : IComparable<Element>
{
public int Priority { get; set; }
public string Description { get; set; }
public int CompareTo(Element other)
{
return Priority.CompareTo(other.Priority);
}
}
If I try to sort it this way:
List<Element> elements = new List<Element>()
{
new Element()
{
Priority = 1,
Description = "First"
},
new Element()
{
Priority = 1,
Description = "Second"
},
new Element()
{
Priority = 2,
Description = "Third"
}
};
elements.Sort();
Then the first element is the previously second element "Second". Or, in other words, this assertion fails:
Assert.AreEqual("First", elements[0].Description);
Why is .NET reordering my list when the elements are essentially the same? I'd like for it to only reorder the list if the comparison returns a non-zero value.
From the documentation of the List.Sort() method from MSDN:
This method uses Array.Sort, which uses the QuickSort algorithm. This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.
Here's the link:
http://msdn.microsoft.com/en-us/library/b0zbh7b6.aspx
Essentially, the sort is performing as designed and documented.
Here is an extension method SortStable() for List<T> where T : IComparable<T>:
public static void SortStable<T>(this List<T> list) where T : IComparable<T>
{
var listStableOrdered = list.OrderBy(x => x, new ComparableComparer<T>()).ToList();
list.Clear();
list.AddRange(listStableOrdered);
}
private class ComparableComparer<T> : IComparer<T> where T : IComparable<T>
{
public int Compare(T x, T y)
{
return x.CompareTo(y);
}
}
Test:
[Test]
public void SortStable()
{
var list = new List<SortItem>
{
new SortItem{ SortOrder = 1, Name = "Name1"},
new SortItem{ SortOrder = 2, Name = "Name2"},
new SortItem{ SortOrder = 2, Name = "Name3"},
};
list.SortStable();
Assert.That(list.ElementAt(0).SortOrder, Is.EqualTo(1));
Assert.That(list.ElementAt(0).Name, Is.EqualTo("Name1"));
Assert.That(list.ElementAt(1).SortOrder, Is.EqualTo(2));
Assert.That(list.ElementAt(1).Name, Is.EqualTo("Name2"));
Assert.That(list.ElementAt(2).SortOrder, Is.EqualTo(2));
Assert.That(list.ElementAt(2).Name, Is.EqualTo("Name3"));
}
private class SortItem : IComparable<SortItem>
{
public int SortOrder { get; set; }
public string Name { get; set; }
public int CompareTo(SortItem other)
{
return SortOrder.CompareTo(other.SortOrder);
}
}
In the test method, if you call Sort() method instead of SortStable(), you can see that the test would fail.
You told it how to compare things and it did. You should not rely on the internal implementation of Sort in your application. That's why it lets you override CompareTo. If you want to have a secondary sort parameter ("description" in this case), code it into your CompareTo. Relying on how Sort just happens to work is a great way to code in a bug that is very difficult to find.
You could find a stable quicksort for .NET or use a merge sort (which is already stable).
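For reference, a merge sort is short enough to write by hand. The following is a minimal sketch, not a tuned implementation; the class and method names are made up. The one detail that makes it stable is taking from the left run when two items compare equal.

```csharp
using System;
using System.Collections.Generic;

public static class StableSort
{
    // Sorts the list in place using a top-down merge sort with one temporary buffer.
    public static void MergeSort<T>(List<T> list, Comparison<T> compare)
    {
        if (list.Count < 2) return;
        Sort(list, new T[list.Count], 0, list.Count, compare);
    }

    // Sorts the half-open range [start, end).
    private static void Sort<T>(List<T> list, T[] buffer, int start, int end, Comparison<T> compare)
    {
        if (end - start < 2) return;
        int mid = start + (end - start) / 2;
        Sort(list, buffer, start, mid, compare);
        Sort(list, buffer, mid, end, compare);

        int left = start, right = mid, k = start;
        while (left < mid && right < end)
        {
            // "<=" prefers the left run on ties, which is what keeps equal items in order.
            if (compare(list[left], list[right]) <= 0) buffer[k++] = list[left++];
            else buffer[k++] = list[right++];
        }
        while (left < mid) buffer[k++] = list[left++];
        while (right < end) buffer[k++] = list[right++];
        for (int i = start; i < end; i++) list[i] = buffer[i];
    }

    public static void Main()
    {
        var words = new List<string> { "bb", "a", "cc", "d", "eee" };
        MergeSort(words, (x, y) => x.Length.CompareTo(y.Length));
        Console.WriteLine(string.Join(", ", words)); // a, d, bb, cc, eee
    }
}
```

The price is the extra buffer, the same size as the list.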
See the other responses for why List.Sort() is unstable. If you need a stable sort and are using .NET 3.5, try Enumerable.OrderBy() (LINQ).
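A short sketch of the OrderBy route using the question's Element shape (without IComparable, since OrderBy takes a key selector). Enumerable.OrderBy is documented as a stable sort, so elements with the same Priority keep their original relative order. The helper name SortByPriority is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Element
{
    public int Priority { get; set; }
    public string Description { get; set; }
}

public static class OrderByDemo
{
    // Returns a new list; the input sequence is left untouched.
    public static List<Element> SortByPriority(IEnumerable<Element> elements)
    {
        return elements.OrderBy(e => e.Priority).ToList();
    }

    public static void Main()
    {
        var elements = new List<Element>
        {
            new Element { Priority = 2, Description = "Third" },
            new Element { Priority = 1, Description = "First" },
            new Element { Priority = 1, Description = "Second" }
        };
        foreach (var e in SortByPriority(elements))
            Console.WriteLine(e.Description); // First, Second, Third (one per line)
    }
}
```

Unlike List<T>.Sort, this does not sort in place, so assign the result back if you need the original variable updated.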
You can fix this by adding an "index value" to your structure, and including that in the CompareTo method when Priority.CompareTo returns 0. You would then need to initialize the "index" value before doing the sort.
The CompareTo method would look like this:
public int CompareTo(Element other)
{
var ret = Priority.CompareTo(other.Priority);
if (ret == 0)
{
ret = Comparer<int>.Default.Compare(Index, other.Index);
}
return ret;
}
Then instead of doing elements.Sort(), you would do:
for(int i = 0; i < elements.Count; ++i)
{
elements[i].Index = i;
}
elements.Sort();
In some applications, when a list of items is sorted according to some criterion, preserving the original order of items which compare equal is unnecessary. In other applications, it is necessary. Sort methods which preserve the arrangement of items with matching keys (called "stable sorts") are generally either much slower than those which do not ("unstable sorts"), or else they require a significant amount of temporary storage (and are still somewhat slower). The first "standard library" sort routine to become widespread was probably the qsort() function included in the standard C library. That library would frequently have been used to sort lists that were large relative to the total amount of memory available. The library would have been much less useful if it had required an amount of temporary storage proportional to the number of items in the array to be sorted.
Sort methods that will be used under frameworks like Java or .net could practically make use of much more temporary storage than would have been acceptable in a C qsort() routine. A temporary memory requirement equal to the size of the array to be sorted would in most cases not pose any particular problem. Nonetheless, since it's been traditional for libraries to supply a Quicksort implementation, that seems to be the pattern followed by .net.
If you wanted to sort based on two fields instead of one you could do this:
class Element : IComparable<Element>
{
public int Priority { get; set; }
public string Description { get; set; }
public int CompareTo(Element other)
{
if (Priority.CompareTo(other.Priority) == 0)
{
return Description.CompareTo(other.Description);
} else {
return Priority.CompareTo(other.Priority);
}
}
}
Obviously, this doesn't make the sort stable; however, it does satisfy your assertion and gives you control over element order when priorities are equal.
