List<T> BinarySearch using a String indexer


So I have a CreditCard class that has some properties, one of which is for the card number as a String (public string Number { get; set; }). I'm storing the CreditCard objects in a CreditCardList class which has the variable List (private List<CreditCard> cclist = new List<CreditCard>();). I want to be able to retrieve a CreditCard by its card number by sorting the List first, then using the BinarySearch method on the List. I also want to do this by passing a String indexer of the number to search for into the BinarySearch method, along with a comparer if I need one.
This is what I have so far for the method to get the CreditCard matching the number, but Visual Studio 2013 gives me an error on the line: int index = cclist.BinarySearch(cclist[input], new CreditCardComparer()); "the best overloaded method match for 'System.Collections.Generic.List.this[int]' has some invalid arguments." I assume it's because I'm using the String indexer wrong or something.
public List<CreditCard> GetCardByNumber (string input)
{
List<CreditCard> tempList = new List<CreditCard>();
// save the current unsorted list to a temporary list to revert back to after sorting
List<CreditCard> originalList = new List<CreditCard>(cclist.Capacity);
for (int i = 0; i < cclist.Capacity; i++)
{
originalList[i] = cclist[i];
}
// begin sorting for binary search of card number
cclist.Sort();
int index = cclist.BinarySearch(cclist[input], new CreditCardComparer());
if (index < 0)
{
tempList.Add(cclist[input]);
}
// revert back to the original unsorted list
for (int i = 0; i < originalList.Capacity; i++)
{
cclist[i] = originalList[i];
}
// return the found credit card matching the specified number
return tempList;
}// end GetCardByNumber (string input)
Here are my int and string indexers:
public CreditCard this[int i]
{
get
{
if (i < 0 || i >= cclist.Count)
{
throw new ArgumentOutOfRangeException("index " + i + " does not exist");
}
return cclist[i];
}
set
{
if (i < 0 || i >= cclist.Count)
{
throw new ArgumentOutOfRangeException("index " + i + " does not exist");
}
cclist[i] = value;
saveNeeded = true;
}
}// end CreditCard this[int i]
public CreditCard this[string input]
{
get
{
foreach (CreditCard cc in cclist)
{
if (cc.Number == input)
{
return cc;
}
}
return null;
}
}// end CreditCard this[string number]
And here is my comparer class:
public class CreditCardComparer : IComparer<CreditCard>
{
public override int Compare(CreditCard x, CreditCard y)
{
return x.Number.CompareTo(y.Number);
}
}// end CreditCardComparer : IComparer<CreditCard>
And lastly, here are the necessities for my list sorting and what not...
class CreditCard : IEquatable<CreditCard>, IComparable<CreditCard>
{
public bool Equals (CreditCard other)
{
if (this.Number == other.Number)
{
return true;
}
else
{
return false;
}
}// end Equals(CreditCard other)
public int CompareTo(CreditCard other)
{
return Number.CompareTo(other.Number);
}// end CompareTo(CreditCard other)
}
Is it truly possible to do what I'm attempting, i.e. sending a string indexer that returns a CreditCard object based on a string into a BinarySearch method of List?
Also, I can provide any more code if necessary, but I felt like this was a little much to begin with.

A System.Collections.Generic.List<T> is indexed by an int; it does not let you use a string.
If you want to use a string as your indexer (primary key), you should probably use a Dictionary<string,CreditCard> instead.
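A minimal sketch of that Dictionary approach, assuming the card number is unique per card. The CreditCard class here is a stripped-down stand-in, and the Add method and field name are hypothetical, not taken from the question:

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the question's CreditCard class (only Number is assumed).
public class CreditCard
{
    public string Number { get; set; }
}

public class CreditCardList
{
    // Keyed by card number, so a lookup needs neither sorting nor searching.
    private readonly Dictionary<string, CreditCard> cards =
        new Dictionary<string, CreditCard>();

    // Hypothetical helper: stores the card under its number.
    public void Add(CreditCard card)
    {
        cards[card.Number] = card; // replaces an existing card with the same number
    }

    // String indexer: returns null when the number is unknown.
    public CreditCard this[string number]
    {
        get
        {
            CreditCard card;
            return cards.TryGetValue(number, out card) ? card : null;
        }
    }
}
```

With this shape, creditCardList["1234"] is a single hash lookup, at the cost of no longer keeping the cards in insertion order.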

There are a couple things amiss in your GetCardByNumber method. First is the method returns an entire list instead of a single CreditCard, which goes against the method name. Second, the binary search is not even needed since you do the searching in the string indexer first:
public CreditCard this[string input]
{
get
{
foreach (CreditCard cc in cclist)
{
if (cc.Number == input)
{
return cc;
}
}
return null;
}
}
By this point, you've already found the CreditCard with the information you need, so why search for it again in a BinarySearch? Thirdly, as was covered in landoncz's answer, you can't use a string as an index for a List<T>. What you probably intended to use was the CreditCardList instead of the List<CreditCard>
CreditCardList creditCardList = new CreditCardList();
CreditCard a = creditCardList["1234"]; //correct
List<CreditCard> cclist = new List<CreditCard>();
CreditCard b = cclist["1234"]; //incorrect. This is where your error is coming from.
If you're trying to access the indexer property inside of the class that implements it (which I'm assuming you are trying to do in your GetCardByNumber method), just use this[index]:
public class CreditCardList
{
public CreditCard this[string s] { /*Implementation*/ }
public CreditCard GetCard(string s)
{
return this[s]; // right here!
}
}
Now... according to your comment, "Retrieve the CreditCard with a specified number if it exists using the BinarySearch method in List<T> in the implementation of a String indexer.", it seems to me that the assignment wants you doing something along these lines. (a thing to note is that I'm not sure of your entire implementation of the CreditCard class, so please excuse the naive instantiation in the following code)
public class CreditCardList
{
private List<CreditCard> cclist = new List<CreditCard>();
public CreditCardList()
{
//For the sake of an example, let's magically populate the list.
MagicallyPopulateAList(cclist);
}
public CreditCard this[string s] /* In the implementation of a String indexer... */
{
get
{
CreditCard ccToSearchFor = new CreditCard() { Number = s }; // a getter has no 'value'; use the indexer argument
cclist.Sort();
/* ...use the BinarySearch method... */
int index = cclist.BinarySearch(ccToSearchFor);
if (index >= 0)
return cclist[index]; /* ...to retrieve a CreditCard. */
else
throw new ArgumentException("Credit Card Number not found.");
}
}
}
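If the original order of the list has to survive the lookup (as the question's GetCardByNumber tries to arrange), one option is to sort a copy and search that instead. The following is only a sketch under assumptions: CreditCard is reduced to its Number property, and CreditCardNumberComparer and Add are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

public class CreditCard
{
    public string Number { get; set; }
}

// Orders cards by number using ordinal string comparison.
public class CreditCardNumberComparer : IComparer<CreditCard>
{
    public int Compare(CreditCard x, CreditCard y)
    {
        return string.CompareOrdinal(x.Number, y.Number);
    }
}

public class CreditCardList
{
    private readonly List<CreditCard> cclist = new List<CreditCard>();
    private readonly CreditCardNumberComparer comparer = new CreditCardNumberComparer();

    public void Add(CreditCard card)
    {
        cclist.Add(card);
    }

    public CreditCard this[string number]
    {
        get
        {
            // Sort a copy so the order of cclist itself is left untouched.
            var sorted = new List<CreditCard>(cclist);
            sorted.Sort(comparer);

            // BinarySearch needs an element to compare against, so build a probe.
            var probe = new CreditCard { Number = number };
            int index = sorted.BinarySearch(probe, comparer);
            return index >= 0 ? sorted[index] : null;
        }
    }
}
```

The same comparer must be used for both Sort and BinarySearch, otherwise the search result is undefined. Sorting on every lookup costs more than the linear scan it replaces, so this only pays off if the sorted copy is cached between lookups.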

Related

How to find all the possible words using adjacent letters in a matrix

I have the following test matrix:
a l i
g t m
j e a
I intend to create an algorithm that helps me find every possible word from a given minimum length to a maximum length using adjacent letters only.
For example:
Minimum: 3 letters
Maximum: 6 letters
Based on the test matrix, I should have the following results:
ali
alm
alg
alt
ati
atm
atg
...
atmea
etc.
I created a test code (C#) that has a custom class which represents the letters.
Each letter knows its neighbors and has a generation counter (for keeping track of them during traversal).
Here is its code:
public class Letter
{
public int X { get; set; }
public int Y { get; set; }
public char Character { get; set; }
public List<Letter> Neighbors { get; set; }
public Letter PreviousLetter { get; set; }
public int Generation { get; set; }
public Letter(char character)
{
Neighbors = new List<Letter>();
Character = character;
}
public void SetGeneration(int generation)
{
foreach (var item in Neighbors)
{
item.Generation = generation;
}
}
}
I figured out that if I want it to be dynamic, it has to be based on recursion.
Unfortunately, the following code creates the first 4 words, then stops. It is no wonder, as the recursion is stopped by the specified generation level.
The main problem is that the recursion returns only one level but it would be better to return to the starting point.
private static void GenerateWords(Letter input, int maxLength, StringBuilder sb)
{
if (input.Generation >= maxLength)
{
if (sb.Length == maxLength)
{
allWords.Add(sb.ToString());
sb.Remove(sb.Length - 1, 1);
}
return;
}
sb.Append(input.Character);
if (input.Neighbors.Count > 0)
{
foreach (var child in input.Neighbors)
{
if (input.PreviousLetter == child)
continue;
child.PreviousLetter = input;
child.Generation = input.Generation + 1;
GenerateWords(child, maxLength, sb);
}
}
}
So, I feel a little stuck, any idea how I should proceed?

From here, you can treat this as a graph traversal problem. You start at each given letter, finding each path of length min_size to max_size, with 3 and 6 as those values in your example. I suggest a recursive routine that builds the words as paths through the grid. This will look something like the following; replace types and pseudo-code with your preferences.
<array_of_string> build_word(size, current_node) {
if (size == 1) return current_node.letter as an array_of_string;
result = <empty array_of_string>
for each next_node in current_node.neighbours {
solution_list = build_word(size-1, next_node);
for each word in solution_list {
// add current_node.letter to front of that word.
// add this new word to the result array
}
}
return the result array_of_string
}
Does that move you toward a solution?
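For reference, the pseudo-code above could be written in C# roughly as follows. This is a sketch, not a finished solution: the graph is assumed to be given as an array of letters plus an adjacency array, and WordBuilder, BuildWords, letters and neighbours are names made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class WordBuilder
{
    // Returns every word of exactly `size` letters that starts at `node`.
    // letters[i] is the character of node i; neighbours[i] lists the nodes adjacent to i.
    // As in the pseudo-code, nothing stops a path from revisiting a node.
    public static List<string> BuildWords(int size, int node, char[] letters, int[][] neighbours)
    {
        var result = new List<string>();
        if (size == 1)
        {
            result.Add(letters[node].ToString());
            return result;
        }
        foreach (int next in neighbours[node])
        {
            foreach (string tail in BuildWords(size - 1, next, letters, neighbours))
            {
                result.Add(letters[node] + tail); // prepend the current letter
            }
        }
        return result;
    }
}
```

To cover a range of lengths, call it once per size from the minimum to the maximum and once per starting node. Because the routine does not track visited nodes, a letter can appear more than once in a word; a visited set passed down the recursion would be needed to rule that out.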

When solving this kind of problem, I tend to use immutable classes because everything is so much easier to reason about. The following implementation makes use of an ad hoc ImmutableStack because it's pretty straightforward to implement one. In production code I'd probably want to look into System.Collections.Immutable to improve performance (visited would be an ImmutableHashSet<> to point out the obvious case).
So why do I need an immutable stack? To keep track of the current character path and visited "locations" inside the matrix. Because the selected tool for the job is immutable, sending it down recursive calls is a no-brainer: we know it can't change, so I don't have to worry about my invariants at every recursion level.
So let's implement an immutable stack.
We'll also implement a helper struct Coordinates that encapsulates our "locations" inside the matrix, gives us value equality semantics, and offers a convenient way to obtain the valid neighbors of any given location. It will obviously come in handy.
public class ImmutableStack<T>: IEnumerable<T>
{
private readonly T head;
private readonly ImmutableStack<T> tail;
public static readonly ImmutableStack<T> Empty = new ImmutableStack<T>(default(T), null);
public int Count => this == Empty ? 0 : tail.Count + 1;
private ImmutableStack(T head, ImmutableStack<T> tail)
{
this.head = head;
this.tail = tail;
}
public T Peek()
{
if (this == Empty)
throw new InvalidOperationException("Can not peek an empty stack.");
return head;
}
public ImmutableStack<T> Pop()
{
if (this == Empty)
throw new InvalidOperationException("Can not pop an empty stack.");
return tail;
}
public ImmutableStack<T> Push(T value) => new ImmutableStack<T>(value, this);
public IEnumerator<T> GetEnumerator()
{
var current = this;
while (current != Empty)
{
yield return current.head;
current = current.tail;
}
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
struct Coordinates: IEquatable<Coordinates>
{
public int Row { get; }
public int Column { get; }
public Coordinates(int row, int column)
{
Row = row;
Column = column;
}
public bool Equals(Coordinates other) => Column == other.Column && Row == other.Row;
public override bool Equals(object obj)
{
if (obj is Coordinates)
{
return Equals((Coordinates)obj);
}
return false;
}
public override int GetHashCode() => unchecked(27947 ^ Row ^ Column);
public IEnumerable<Coordinates> GetNeighbors(int rows, int columns)
{
var increasedRow = Row + 1;
var decreasedRow = Row - 1;
var increasedColumn = Column + 1;
var decreasedColumn = Column - 1;
var canIncreaseRow = increasedRow < rows;
var canIncreaseColumn = increasedColumn < columns;
var canDecreaseRow = decreasedRow > -1;
var canDecreaseColumn = decreasedColumn > -1;
if (canDecreaseRow)
{
if (canDecreaseColumn)
{
yield return new Coordinates(decreasedRow, decreasedColumn);
}
yield return new Coordinates(decreasedRow, Column);
if (canIncreaseColumn)
{
yield return new Coordinates(decreasedRow, increasedColumn);
}
}
if (canIncreaseRow)
{
if (canDecreaseColumn)
{
yield return new Coordinates(increasedRow, decreasedColumn);
}
yield return new Coordinates(increasedRow, Column);
if (canIncreaseColumn)
{
yield return new Coordinates(increasedRow, increasedColumn);
}
}
if (canDecreaseColumn)
{
yield return new Coordinates(Row, decreasedColumn);
}
if (canIncreaseColumn)
{
yield return new Coordinates(Row, increasedColumn);
}
}
}
Ok, now we need a method that traverses the matrix visiting each position once returning words that have a specified minimum number of characters and don't exceed a specified maximum.
public static IEnumerable<string> GetWords(char[,] matrix,
Coordinates startingPoint,
int minimumLength,
int maximumLength)
That looks about right. Now, when recursing we need to keep track of which characters we've visited. That's easy using our immutable stack, so our recursive method will look like:
static IEnumerable<string> getWords(char[,] matrix,
ImmutableStack<char> path,
ImmutableStack<Coordinates> visited,
Coordinates coordinates,
int minimumLength,
int maximumLength)
Now the rest is just plumbing and connecting the wires:
public static IEnumerable<string> GetWords(char[,] matrix,
Coordinates startingPoint,
int minimumLength,
int maximumLength)
=> getWords(matrix,
ImmutableStack<char>.Empty,
ImmutableStack<Coordinates>.Empty,
startingPoint,
minimumLength,
maximumLength);
static IEnumerable<string> getWords(char[,] matrix,
ImmutableStack<char> path,
ImmutableStack<Coordinates> visited,
Coordinates coordinates,
int minimumLength,
int maximumLength)
{
var newPath = path.Push(matrix[coordinates.Row, coordinates.Column]);
var newVisited = visited.Push(coordinates);
if (newPath.Count > maximumLength)
{
yield break;
}
else if (newPath.Count >= minimumLength)
{
yield return new string(newPath.Reverse().ToArray());
}
foreach (Coordinates neighbor in coordinates.GetNeighbors(matrix.GetLength(0), matrix.GetLength(1)))
{
if (!visited.Contains(neighbor))
{
foreach (var word in getWords(matrix,
newPath,
newVisited,
neighbor,
minimumLength,
maximumLength))
{
yield return word;
}
}
}
}
And we're done. Is this the most elegant or fastest algorithm? Probably not, but I find it the most understandable and therefore maintainable. Hope it helps you out.
UPDATE Based upon comments below, I've run a few test cases one of which is:
var matrix = new[,] { {'a', 'l'},
{'g', 't'} };
var words = GetWords(matrix, new Coordinates(0,0), 2, 4);
Console.WriteLine(string.Join(Environment.NewLine, words.Select((w,i) => $"{i:00}: {w}")));
And the outcome is the expected:
00: ag
01: agl
02: aglt
03: agt
04: agtl
05: at
06: atl
07: atlg
08: atg
09: atgl
10: al
11: alg
12: algt
13: alt
14: altg

Search Integer Array for duplicates

I'm currently creating a very basic game in C#, and I have an inventory system in which a very simple command (Items.Add(id, amount)) adds items to the inventory. What I want to be able to do, which my current system does not do, is effectively "search" my inventory array, which is a 2D array holding the item id and item amount. My current system is like this:
public static void add(int id, int amount)
{
for (int i = 0; i < Ship_Builder.Player.invCount; i++)
{
if (Ship_Builder.Player.inv[i, 0] == 0)
{
Ship_Builder.Player.inv[i, 0] = id;
Ship_Builder.Player.inv[i, 1] = amount;
}
}
Ship_Builder.Player.invCount++;
}
and I want it to (in an else if) be able to search the array. I did have this:
else if (Ship_Builder.Player.inv[i, 0] == Ship_Builder.Player.inv[i + 1, 0])
{
//Do
}
Before, but it didn't work how I wanted it to.
Any help would be greatly appreciated thanks,
Laurence.

As the comments suggest, you should use a Dictionary for such a task. But if you have to use a 2-d array, which is (I presume) pre-populated with zeros before we add any items to it, then an if-else statement like the one you propose won't do the trick. What you need to do is iterate through the array looking for a matching id first, and each time the ids don't match, check whether the id you're currently looking at is equal to 0. If it is, then you have traversed all "slots" which had some items in them without finding a match, which means this item must go into another, empty slot.
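public static void add(int id, int amount)
{
// Scan every slot of the array (assumed to be an int[,]), so the first empty slot can be reached.
for (int i = 0; i < Ship_Builder.Player.inv.GetLength(0); i++)
{
if (Ship_Builder.Player.inv[i, 0] != id)
{
if (Ship_Builder.Player.inv[i, 0] == 0)
{
Ship_Builder.Player.inv[i, 0] = id;
Ship_Builder.Player.inv[i, 1] = amount;
Ship_Builder.Player.invCount++;
return; // stored in the first empty slot; stop scanning
}
}
else
{
Ship_Builder.Player.inv[i, 1] += amount;
return; // existing entry updated; stop scanning
}
}
}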
Warning! My answer assumes that new items go into the empty slot with the smallest possible index. Also, if you remove items by setting their id to zero, you'll have to traverse the whole array in search of a matching id before you can allocate a new slot, which might get very expensive time-wise if the array is large.

There's a lot going on here (and there isn't enough detail to give any answer except in broad strokes), but how I would approach something like this would be to start with using object oriented designs rather than relying on indexed positions in arrays. I'd define something like this:
public class InventoryItem
{
public int Id { get; set; }
public int Amount { get; set; }
// Now I can add other useful properties here too
// ...like Name perhaps?
}
Now I'd make my inventory a Dictionary<int,InventoryItem> and adding something to my inventory might look something like this:
public void Add(int id, int amount)
{
// assuming myInventory is our inventory
if (myInventory.ContainsKey(id)) {
myInventory[id].Amount += amount;
}
else {
myInventory[id] = new InventoryItem()
{
Id = id,
Amount = amount
};
}
}
Now it's not necessary that you actually use the InventoryItem class; you could just stick with Dictionary<int,int>, but you'll probably find as you work through it that you'd much rather have some objects to work with.
Then you could probably have a master dictionary of all objects and just add them to your inventory, so you end up with something like:
public void Add(InventoryItem item, int amount)
{
// assuming myInventory is our inventory
if (myInventory.ContainsKey(item.Id)) {
myInventory[item.Id].Amount += amount;
}
else {
myInventory[item.Id] = new InventoryItem(item) // assuming you added a
// copy constructor, for example
{
Amount = amount
};
}
}
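For comparison, the plain Dictionary<int,int> variant mentioned above might look like the sketch below. The Inventory class and its member names are hypothetical, chosen only for this example; the key is the item id and the value is the amount held.

```csharp
using System;
using System.Collections.Generic;

public class Inventory
{
    // Maps item id to the amount held.
    private readonly Dictionary<int, int> amounts = new Dictionary<int, int>();

    public void Add(int id, int amount)
    {
        int current;
        amounts.TryGetValue(id, out current); // current stays 0 when the id is absent
        amounts[id] = current + amount;
    }

    public int AmountOf(int id)
    {
        int current;
        return amounts.TryGetValue(id, out current) ? current : 0;
    }
}
```

TryGetValue avoids the double lookup of ContainsKey followed by the indexer, and the dictionary replaces the slot scan through the 2D array with a single hash lookup.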

Depending on speed performance requirements (using arrays should be only slightly faster than this) you could just skip the hard coded values and arrays altogether. This has a few semi-advanced topics:
public abstract class InventoryItem
// or interface
{
public abstract string Name { get; }
public int Count { get; set; }
}
public class InventoryGold : InventoryItem
{
public override string Name { get { return "Gold"; } }
}
public abstract class InventoryWeapon : InventoryItem { }
public class OgreSlayingKnife : InventoryWeapon
{
public override string Name { get { return "Ogre Slaying Knife"; } }
public int VersusOgres { get { return +9; } }
}
// Extension methods must live in a static class.
public static class InventoryExtensions
{
// Sets the count for the item of type TItem, adding or removing it as needed.
public static void UpdateCount<TItem>(this ICollection<InventoryItem> instance,
int absoluteCount)
where TItem : InventoryItem, new()
{
var item = instance.OfType<TItem>().FirstOrDefault();
if (item == null)
{
if (absoluteCount > 0)
{
item = new TItem(); // default(TItem) would be null for a class
item.Count = absoluteCount;
instance.Add(item);
}
}
else if (absoluteCount > 0)
item.Count = absoluteCount;
else
instance.Remove(item);
}
}
// Probably should be a Hashset
var inventory = new List<InventoryItem>();
inventory.UpdateCount<InventoryGold>(10);
inventory.UpdateCount<OgreSlayingKnife>(1);

On string interning and alternatives

I have a large file which, in essence contains data like:
Netherlands,Noord-holland,Amsterdam,FooStreet,1,...,...
Netherlands,Noord-holland,Amsterdam,FooStreet,2,...,...
Netherlands,Noord-holland,Amsterdam,FooStreet,3,...,...
Netherlands,Noord-holland,Amsterdam,FooStreet,4,...,...
Netherlands,Noord-holland,Amsterdam,FooStreet,5,...,...
Netherlands,Noord-holland,Amsterdam,BarRoad,1,...,...
Netherlands,Noord-holland,Amsterdam,BarRoad,2,...,...
Netherlands,Noord-holland,Amsterdam,BarRoad,3,...,...
Netherlands,Noord-holland,Amsterdam,BarRoad,4,...,...
Netherlands,Noord-holland,Amstelveen,BazDrive,1,...,...
Netherlands,Noord-holland,Amstelveen,BazDrive,2,...,...
Netherlands,Noord-holland,Amstelveen,BazDrive,3,...,...
Netherlands,Zuid-holland,Rotterdam,LoremAve,1,...,...
Netherlands,Zuid-holland,Rotterdam,LoremAve,2,...,...
Netherlands,Zuid-holland,Rotterdam,LoremAve,3,...,...
...
This is a multi-gigabyte file. I have a class that reads this file and exposes these lines (records) as an IEnumerable<MyObject>. This MyObject has several properties (Country,Province,City, ...) etc.
As you can see there is a LOT of duplication of data. I want to keep exposing the underlying data as an IEnumerable<MyObject>. However, some other class might (and probably will) make some hierarchical view/structure of this data like:
Netherlands
Noord-holland
Amsterdam
FooStreet [1, 2, 3, 4, 5]
BarRoad [1, 2, 3, 4]
...
Amstelveen
BazDrive [1, 2, 3]
...
...
Zuid-holland
Rotterdam
LoremAve [1, 2, 3]
...
...
...
...
When reading this file, I do, essentially, this:
foreach (line in myfile) {
fields = line.split(",");
yield return new MyObject {
Country = fields[0],
Province = fields[1],
City = fields[2],
Street = fields[3],
//...other fields
};
}
Now, to the actual question at hand: I could use string.Intern() to intern the Country, Province, City, and Street strings (those are the main 'villains'; MyObject has several other properties not relevant to the question).
foreach (line in myfile) {
fields = line.split(",");
yield return new MyObject {
Country = string.Intern(fields[0]),
Province = string.Intern(fields[1]),
City = string.Intern(fields[2]),
Street = string.Intern(fields[3]),
//...other fields
};
}
This will save about 42% of memory (tested and measured) when holding the entire dataset in memory since all duplicate strings will be a reference to the same string. Also, when creating the hierarchical structure with a lot of LINQ's .ToDictionary() method the keys (Country, Province etc.) of the resp. dictionaries will be much more efficient.
However, one of the drawbacks (aside from a slight loss of performance, which is not a problem) of using string.Intern() is that the strings won't be garbage collected anymore. But when I'm done with my data I do want all that stuff garbage collected (eventually).
I could use a Dictionary<string, string> to 'intern' this data but I don't like the "overhead" of having a key and value where I am, actually, only interested in the key. I could set the value to null or use the same string as the value (which will result in the same reference in key and value). It's only a small price of a few bytes to pay, but it's still a price.
Something like a HashSet<string> makes more sense to me. However, I cannot get a reference to a string in the HashSet; I can see if the HashSet contains a specific string, but not get a reference to that specific instance of the located string in the HashSet. I could implement my own HashSet for this, but I am wondering what other solutions you kind StackOverflowers may come up with.
Requirements:
My "FileReader" class needs to keep exposing an IEnumerable<MyObject>
My "FileReader" class may do stuff (like string.Intern()) to optimize memory usage
The MyObject class cannot change; I won't make a City class, Country class etc. and have MyObject expose those as properties instead of simple string properties
Goal is to be (more) memory efficient by de-duplicating most of the duplicate strings in Country, Province, City etc.; how this is achieved (e.g. string interning, internal hashset / collection / structure of something) is not important. However:
I know I can stuff the data in a database or use other solutions in that direction; I am not interested in these kinds of solutions.
Speed is only of secondary concern; the quicker the better of course, but a (slight) loss in performance while reading/iterating the objects is no problem
Since this is a long-running process (as in: windows service running 24/7/365) that, occasionally, processes a bulk of this data I want the data to be garbage-collected when I'm done with it; string interning works great but will, in the long run, result in a huge string pool with lots of unused data
I would like any solutions to be "simple"; adding 15 classes with P/Invokes and inline assembly (exaggerated) is not worth the effort. Code maintainability is high on my list.
This is more of a 'theoretical' question; it's purely out of curiosity / interest that I'm asking. There is no "real" problem, but I can see that in similar situations this might be a problem to someone.
For example: I could do something like this:
public class StringInterningObject
{
private HashSet<string> _items;
public StringInterningObject()
{
_items = new HashSet<string>();
}
public string Add(string value)
{
if (_items.Add(value))
return value; //New item added; return value since it wasn't in the HashSet
//MEH... this will quickly go O(n)
return _items.First(i => i.Equals(value)); //Find (and return) actual item from the HashSet and return it
}
}
But with a large set of (to be de-duplicated) strings this will quickly bog down. I could have a peek at the reference source for HashSet or Dictionary or... and build a similar class that doesn't return bool for the Add() method but the actual string found in the internals/bucket.
The best I could come up with until now is something like:
public class StringInterningObject
{
private ConcurrentDictionary<string, string> _items;
public StringInterningObject()
{
_items = new ConcurrentDictionary<string, string>();
}
public string Add(string value)
{
return _items.AddOrUpdate(value, value, (v, i) => i);
}
}
Which has the "penalty" of having a Key and a Value where I'm actually only interested in the Key. Just a few bytes though, a small price to pay. Coincidentally this also yields 42% less memory usage; the same result as when using string.Intern().
tolanj came up with System.Xml.NameTable:
public class StringInterningObject
{
private System.Xml.NameTable nt = new System.Xml.NameTable();
public string Add(string value)
{
return nt.Add(value);
}
}
(I removed the lock and string.Empty check (the latter since the NameTable already does that))
xanatos came up with a CachingEqualityComparer:
public class StringInterningObject
{
private class CachingEqualityComparer<T> : IEqualityComparer<T> where T : class
{
public System.WeakReference X { get; private set; }
public System.WeakReference Y { get; private set; }
private readonly IEqualityComparer<T> Comparer;
public CachingEqualityComparer()
{
Comparer = EqualityComparer<T>.Default;
}
public CachingEqualityComparer(IEqualityComparer<T> comparer)
{
Comparer = comparer;
}
public bool Equals(T x, T y)
{
bool result = Comparer.Equals(x, y);
if (result)
{
X = new System.WeakReference(x);
Y = new System.WeakReference(y);
}
return result;
}
public int GetHashCode(T obj)
{
return Comparer.GetHashCode(obj);
}
public T Other(T one)
{
if (object.ReferenceEquals(one, null))
{
return null;
}
object x = X.Target;
object y = Y.Target;
if (x != null && y != null)
{
if (object.ReferenceEquals(one, x))
{
return (T)y;
}
else if (object.ReferenceEquals(one, y))
{
return (T)x;
}
}
return one;
}
}
private CachingEqualityComparer<string> _cmp;
private HashSet<string> _hs;
public StringInterningObject()
{
_cmp = new CachingEqualityComparer<string>();
_hs = new HashSet<string>(_cmp);
}
public string Add(string item)
{
if (!_hs.Add(item))
item = _cmp.Other(item);
return item;
}
}
(Modified slightly to "fit" my "Add() interface")
As per Henk Holterman's request:
public class StringInterningObject
{
private Dictionary<string, string> _items;
public StringInterningObject()
{
_items = new Dictionary<string, string>();
}
public string Add(string value)
{
string result;
if (!_items.TryGetValue(value, out result))
{
_items.Add(value, value);
return value;
}
return result;
}
}
I'm just wondering if there's maybe a neater/better/cooler way to 'solve' my (not so much of an actual) problem. By now I have enough options I guess
Here are some numbers I came up with for some simple, short, preliminary tests:
Non optimized: Memory: ~4,5Gb; Load time: ~52s
StringInterningObject (see above, the ConcurrentDictionary variant): Memory: ~2,6Gb; Load time: ~49s
string.Intern(): Memory: ~2,3Gb; Load time: ~45s
System.Xml.NameTable: Memory: ~2,3Gb; Load time: ~41s
CachingEqualityComparer: Memory: ~2,3Gb; Load time: ~58s
StringInterningObject (see above, the (non-concurrent) Dictionary variant) as per Henk Holterman's request: Memory: ~2,3Gb; Load time: ~39s
Although the numbers aren't very definitive, it seems that the many memory-allocations for the non-optimized version actually slow down more than using either string.Intern() or the above StringInterningObjects which results in (slightly) longer load times. Also, string.Intern() seems to 'win' from StringInterningObject but not by a large margin; << See updates.
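On the HashSet<string> idea discussed above: newer framework versions (.NET Framework 4.7.2 and later, .NET Core 2.0 and later) add HashSet<T>.TryGetValue, which hands back the instance actually stored in the set. Assuming such a target framework is available, a pool along these lines avoids the key/value duplication. StringPool is a hypothetical name, and this variant is not part of the measurements listed above.

```csharp
using System;
using System.Collections.Generic;

public class StringPool
{
    private readonly HashSet<string> items = new HashSet<string>(StringComparer.Ordinal);

    // Returns the pooled instance equal to value, adding value if it is new.
    public string Add(string value)
    {
        string existing;
        if (items.TryGetValue(value, out existing))
            return existing; // reuse the instance already stored in the set
        items.Add(value);
        return value;
    }
}
```

Once the pool itself is no longer referenced, the strings it holds become eligible for garbage collection like any other object, which is the property string.Intern() lacks.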

I've had exactly this requirement and indeed asked on SO, but without anything like the detail of your question I got no useful responses. One built-in option is System.Xml.NameTable, which is basically a string atomization object, which is what you are looking for. We had the following (we've actually moved to Intern because we do keep these strings for the lifetime of the app):
if (name == null) return null;
if (name == "") return string.Empty;
lock (m_nameTable)
{
return m_nameTable.Add(name);
}
on a private NameTable.
http://referencesource.microsoft.com/#System.Xml/System/Xml/NameTable.cs,c71b9d3a7bc2d2af shows it's implemented as a simple hashtable, i.e. only storing one reference per string.
The downside? It's completely string specific. If you do cross-test for memory / speed I'd be interested to see the results. We were already using System.Xml heavily; it might of course not seem so natural if you were not.

When in doubt, cheat! :-)
public class CachingEqualityComparer<T> : IEqualityComparer<T> where T : class
{
public T X { get; private set; }
public T Y { get; private set; }
public IEqualityComparer<T> DefaultComparer = EqualityComparer<T>.Default;
public bool Equals(T x, T y)
{
bool result = DefaultComparer.Equals(x, y);
if (result)
{
X = x;
Y = y;
}
return result;
}
public int GetHashCode(T obj)
{
return DefaultComparer.GetHashCode(obj);
}
public T Other(T one)
{
if (object.ReferenceEquals(one, X))
{
return Y;
}
if (object.ReferenceEquals(one, Y))
{
return X;
}
throw new ArgumentException("one");
}
public void Reset()
{
X = default(T);
Y = default(T);
}
}
Example of use:
var comparer = new CachingEqualityComparer<string>();
var hs = new HashSet<string>(comparer);
string str = "Hello";
string st1 = str.Substring(2);
hs.Add(st1);
string st2 = str.Substring(2);
// st1 and st2 are distinct strings!
if (object.ReferenceEquals(st1, st2))
{
throw new Exception();
}
comparer.Reset();
if (hs.Contains(st2))
{
string cached = comparer.Other(st2);
Console.WriteLine("Found!");
// cached is st1
if (!object.ReferenceEquals(cached, st1))
{
throw new Exception();
}
}
I've created an equality comparer that "caches" the last Equal terms it analyzed :-)
Everything could then be encapsulated in a subclass of HashSet<T>
/// <summary>
/// A HashSet&lt;T&gt; that, through a clever use of an internal
/// comparer, can have an AddOrGet and a TryGet
/// </summary>
/// <typeparam name="T"></typeparam>
public class HashSetEx<T> : HashSet<T> where T : class
{
public HashSetEx()
: base(new CachingEqualityComparer<T>())
{
}
public HashSetEx(IEqualityComparer<T> comparer)
: base(new CachingEqualityComparer<T>(comparer))
{
}
public T AddOrGet(T item)
{
if (!Add(item))
{
var comparer = (CachingEqualityComparer<T>)Comparer;
item = comparer.Other(item);
}
return item;
}
public bool TryGet(T item, out T item2)
{
if (Contains(item))
{
var comparer = (CachingEqualityComparer<T>)Comparer;
item2 = comparer.Other(item);
return true;
}
item2 = default(T);
return false;
}
private class CachingEqualityComparer<T> : IEqualityComparer<T> where T : class
{
public WeakReference X { get; private set; }
public WeakReference Y { get; private set; }
private readonly IEqualityComparer<T> Comparer;
public CachingEqualityComparer()
{
Comparer = EqualityComparer<T>.Default;
}
public CachingEqualityComparer(IEqualityComparer<T> comparer)
{
Comparer = comparer;
}
public bool Equals(T x, T y)
{
bool result = Comparer.Equals(x, y);
if (result)
{
X = new WeakReference(x);
Y = new WeakReference(y);
}
return result;
}
public int GetHashCode(T obj)
{
return Comparer.GetHashCode(obj);
}
public T Other(T one)
{
if (object.ReferenceEquals(one, null))
{
return null;
}
object x = X.Target;
object y = Y.Target;
if (x != null && y != null)
{
if (object.ReferenceEquals(one, x))
{
return (T)y;
}
else if (object.ReferenceEquals(one, y))
{
return (T)x;
}
}
return one;
}
}
}
Note the use of WeakReference so that there aren't useless references to objects that could prevent garbage collection.
Example of use:
var hs = new HashSetEx<string>();
string str = "Hello";
string st1 = str.Substring(2);
hs.Add(st1);
string st2 = str.Substring(2);
// st1 and st2 are distinct strings!
if (object.ReferenceEquals(st1, st2))
{
throw new Exception();
}
string stFinal = hs.AddOrGet(st2);
if (!object.ReferenceEquals(stFinal, st1))
{
throw new Exception();
}
string stFinal2;
bool result = hs.TryGet(st1, out stFinal2);
if (!object.ReferenceEquals(stFinal2, st1))
{
throw new Exception();
}
if (!result)
{
throw new Exception();
}

edit3:
Rather than keeping a string in every object, put the strings in duplicate-free lists; this saves much more RAM.
The class MyObjectOptimized holds only int indexes into those lists, so reads are instant.
If the lists are short (around 1,000 items), the cost of setting values won't be noticeable.
I assumed every string has 5 characters.
This reduces memory usage:
per instance: 112 bytes / 16 bytes = 7x gain
total: 5 GB / 7 ≈ 0.7 GB + sizeof(Country_li, Province_li, etc.)
With Int16 indexes, RAM usage is halved again.
*Note:* the Int16 range is -32,768 to +32,767,
so make sure no list holds more than 32,767 items.
Usage is the same, but with the class MyObjectOptimized:
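// requires: using System.Collections.Generic; using System.IO;
// ReadObjects is a hypothetical wrapper: yield return needs an iterator method, not main()
static IEnumerable<MyObjectOptimized> ReadObjects(string myfile)
{
// you can use the same code as before
foreach (string line in File.ReadLines(myfile))
{
string[] fields = line.Split(',');
yield return new MyObjectOptimized
{
Country = fields[0],
Province = fields[1],
City = fields[2],
Street = fields[3],
//...other fields
};
}
}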
Required classes:
// size of a single string: 18 bytes (empty string) + 2 bytes per char allocated
// RAM cost of one class instance: 4 * (18 + 2 * charCount)
// assuming each string has at least 5 chars
// cost: 4 * (18 + 2 * 5) = 112 bytes
class MyObject
{
string Country ;
string Province ;
string City ;
string Street ;
}
public static class Exts
{
public static int AddDistinct_and_GetIndex(this List<string> list ,string value)
{
// a single IndexOf scan both detects duplicates and finds the index
int index = list.IndexOf(value);
if (index < 0) {
list.Add(value);
index = list.Count - 1;
}
return index;
}
}
// 1 class instance ram cost : 4*4 byte = 16 byte
class MyObjectOptimized
{
//those int's could be int16 depends on your distinct item counts
int Country_index ;
int Province_index ;
int City_index ;
int Street_index ;
// manually implemented properties do not increase the instance size,
// whereas fields WILL increase it
public string Country{
get {return Country_li[Country_index]; }
set { Country_index = Country_li.AddDistinct_and_GetIndex(value); }
}
public string Province{
get {return Province_li[Province_index]; }
set { Province_index = Province_li.AddDistinct_and_GetIndex(value); }
}
public string City{
get {return City_li[City_index]; }
set { City_index = City_li.AddDistinct_and_GetIndex(value); }
}
public string Street{
get {return Street_li[Street_index]; }
set { Street_index = Street_li.AddDistinct_and_GetIndex(value); }
}
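// beware: these lists are static, so they are shared by all instances.
// they must be initialized, otherwise the setters throw a NullReferenceException
static List<string> Country_li = new List<string>();
static List<string> Province_li = new List<string>();
static List<string> City_li = new List<string>();
static List<string> Street_li = new List<string>();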
}
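
The list-based lookup above scans the whole list on every set, which is fine for short lists but slows down as the number of distinct strings grows. As a rough sketch of one alternative (not part of the original answer), the index can be found through a Dictionary<string, int> kept next to the list. The names StringPool and AddDistinctAndGetIndex are hypothetical, and null values are not handled here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps each distinct string to a stable index
// in O(1) average time instead of scanning a list.
public sealed class StringPool
{
    private readonly List<string> items = new List<string>();
    private readonly Dictionary<string, int> indexes = new Dictionary<string, int>();

    // Returns the index of value, adding it to the pool the first time it is seen.
    public int AddDistinctAndGetIndex(string value)
    {
        int index;
        if (!indexes.TryGetValue(value, out index))
        {
            index = items.Count;
            items.Add(value);
            indexes.Add(value, index);
        }
        return index;
    }

    // Reads a pooled string back by its index.
    public string this[int index]
    {
        get { return items[index]; }
    }

    public int Count
    {
        get { return items.Count; }
    }
}

public static class StringPoolDemo
{
    public static void Main()
    {
        var countries = new StringPool();
        int a = countries.AddDistinctAndGetIndex("France");
        int b = countries.AddDistinctAndGetIndex("Italy");
        int c = countries.AddDistinctAndGetIndex("France"); // already pooled

        Console.WriteLine("{0} {1} {2} {3}", a, b, c, countries.Count); // 0 1 0 2
        Console.WriteLine(countries[c]);                                // France
    }
}
```

The trade-off is extra memory for the dictionary itself, which only pays off when the number of distinct strings is small compared with the number of objects.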

How to remove items from a List of Structs that exist in another List

public struct RegistryApp
{
public string VendorName;
public string Name;
public string Version;
}
I have two List<RegistryApp> that hold all applications currently installed on the Windows box. Why two? One list holds all x86 applications and the other holds all x64 applications.
List<RegistryApp> x64Apps64List = new List<RegistryApp>();
List<RegistryApp> x64Apps32List = new List<RegistryApp>();
Once those two are populated with the data retrieved from the registry, I try the following to make sure there are no duplicates. This worked decently on List<string> but does not work with List<RegistryApp>.
List<RegistryApp> ListOfAllAppsInstalled = new List<RegistryApp>();
IEnumerable<RegistryApp> x86Apps = x64Apps32List.Except(x64Apps64List);
IEnumerable<RegistryApp> x64Apps = x64Apps64List.Except(x64Apps32List);
foreach (RegistryApp regitem in x86Apps)
{
if ((regitem.Name != null) &&
(regitem.Name.Length > 2) &&
(regitem.Name != ""))
{
ListOfAllAppsInstalled.Add(regitem);
}
}
foreach (RegistryApp regitem in x64Apps)
{
if ((regitem.Name != null) &&
(regitem.Name.Length > 2) &&
(regitem.Name != ""))
{
ListOfAllAppsInstalled.Add(regitem);
}
}
Any way to pull this off?

EDITED
To remove items from a List of structs that exist in another List, see the solution provided by Cuong Le here:
https://stackoverflow.com/a/12784937/1507182
By using the Distinct parameterless extension method on the List type, we can remove those duplicate elements.
Then, we can optionally invoke the ToList extension to get an actual List with the duplicates removed.
static void Main()
{
// List with duplicate elements.
List<int> mylist = new List<int>();
mylist.Add(1);
mylist.Add(2);
mylist.Add(3);
mylist.Add(3);
mylist.Add(4);
mylist.Add(4);
mylist.Add(4);
foreach (int value in mylist)
{
Console.WriteLine("Before: {0}", value);
}
// Get distinct elements and convert into a list again.
List<int> distinct = mylist.Distinct().ToList();
foreach (int value in distinct)
{
Console.WriteLine("After: {0}", value);
}
}
If my answer has solved your problem, click the "Accept as solution" button; doing so will help others find the solution.

For Except to work, the type you are using it on must be comparable for equality. To make it work for your custom struct, you need to do one of two things: either override GetHashCode and Equals so that Except can compare your struct:
public struct RegistryApp
{
public string VendorName;
public string Name;
public string Version;
public override bool Equals(object obj)
{
if (!(obj is RegistryApp))
return false;
RegistryApp ra = (RegistryApp) obj;
return ra.VendorName == this.VendorName &&
ra.Name == this.Name &&
ra.Version == this.Version;
}
public override int GetHashCode()
{
// the fields may be null, so fall back to an empty string before hashing
return (VendorName ?? "").GetHashCode() ^ (Name ?? "").GetHashCode() ^ (Version ?? "").GetHashCode();
}
}
or use the overload of Except that accepts your own comparer and pass one in. See MSDN for an example.
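
The second option could look roughly like the sketch below. The comparer name RegistryAppComparer and the sample data are made up for illustration; the struct is repeated only so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct RegistryApp
{
    public string VendorName;
    public string Name;
    public string Version;
}

// Hypothetical comparer: two apps are equal when all three fields match.
public sealed class RegistryAppComparer : IEqualityComparer<RegistryApp>
{
    public bool Equals(RegistryApp a, RegistryApp b)
    {
        return a.VendorName == b.VendorName
            && a.Name == b.Name
            && a.Version == b.Version;
    }

    public int GetHashCode(RegistryApp app)
    {
        // Null fields hash as empty strings so the comparer never throws.
        return (app.VendorName ?? "").GetHashCode()
             ^ (app.Name ?? "").GetHashCode()
             ^ (app.Version ?? "").GetHashCode();
    }
}

public static class ExceptDemo
{
    public static void Main()
    {
        var apps32 = new List<RegistryApp>
        {
            new RegistryApp { VendorName = "Contoso", Name = "Editor", Version = "1.0" },
            new RegistryApp { VendorName = "Contoso", Name = "Viewer", Version = "2.0" }
        };
        var apps64 = new List<RegistryApp>
        {
            new RegistryApp { VendorName = "Contoso", Name = "Editor", Version = "1.0" }
        };

        // Keep only the 32-bit entries that have no equal 64-bit entry.
        List<RegistryApp> only32 = apps32.Except(apps64, new RegistryAppComparer()).ToList();
        foreach (RegistryApp app in only32)
        {
            Console.WriteLine(app.Name); // Viewer
        }
    }
}
```

A separate comparer leaves the struct untouched, which helps when different callers need different notions of equality (for example, ignoring Version).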

Help in Indexers code

I'm currently studying the indexers chapter, but I can't understand "this[int pos]" and "this[string data]" in the following code. Could anyone help me with this?
class OvrIndexer
{
private string[] myData;
private int arrSize;
public OvrIndexer(int size)
{
arrSize = size;
myData = new string[size];
for (int i=0; i < size; i++)
{
myData[i] = "empty";
}
}
public string this[int pos]
{
get
{
return myData[pos];
}
set
{
myData[pos] = value;
}
}
public string this[string data]
{
get
{
int count = 0;
for (int i=0; i < arrSize; i++)
{
if (myData[i] == data)
{
count++;
}
}
return count.ToString();
}
set
{
for (int i=0; i < arrSize; i++)
{
if (myData[i] == data)
{
myData[i] = value;
}
}
}
}
}

One accesses the data by integer index, the other by string:
var indexer = new OvrIndexer(5);
// Sets the first item of the indexer to "Value1"
indexer[0] = "Value1";
// Replaces every item whose value is "Key2" with "Value2"
indexer["Key2"] = "Value2";

These are just indexer definitions for your class, so you can use syntax like this to reach the contents of the class's internal array.
myOvrIndexer[3]
will return the 4th element of the myData array.
myOvrIndexer["Test"]
will return the number of elements whose content is "Test", as a string.
Note that this class is mostly useless, as it just wraps an array and does not add any useful functionality except the indexer that takes a string instead of an index, but from a learning perspective it should do a good job of explaining what's going on.
The main purpose of the indexers is to avoid having to create methods, and thereby having to write syntax like this:
myOvrIndexer.GetElement(3);
myOvrIndexer.SetElement(3, myValue);
I think we both agree that this syntax looks better:
myOvrIndexer[3];
myOvrIndexer[3] = myValue;

this[int pos]
The getter will return the value at index specified.
The setter will set the value at the index specified.
whereas
this[string data]
The getter will return the count of occurrences of the value you are looking up.
The setter will replace all value matches found with the new value.
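
To see both indexers side by side, here is a small sketch. It repeats a condensed version of the class from the question so that it compiles on its own; the demo class name and the sample values are made up.

```csharp
using System;

class OvrIndexer
{
    private readonly string[] myData;

    public OvrIndexer(int size)
    {
        myData = new string[size];
        for (int i = 0; i < size; i++) myData[i] = "empty";
    }

    // Positional access: reads or writes one slot.
    public string this[int pos]
    {
        get { return myData[pos]; }
        set { myData[pos] = value; }
    }

    // Value-based access: counts matches on get, replaces matches on set.
    public string this[string data]
    {
        get
        {
            int count = 0;
            foreach (string s in myData) if (s == data) count++;
            return count.ToString();
        }
        set
        {
            for (int i = 0; i < myData.Length; i++)
                if (myData[i] == data) myData[i] = value;
        }
    }
}

static class IndexerDemo
{
    static void Main()
    {
        var ind = new OvrIndexer(4);
        ind[0] = "first";
        Console.WriteLine(ind[0]);        // first
        Console.WriteLine(ind["empty"]);  // 3
        ind["empty"] = "filled";          // replaces the three remaining "empty" slots
        Console.WriteLine(ind["empty"]);  // 0
        Console.WriteLine(ind[3]);        // filled
    }
}
```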

List<T> BinarySearch using a String indexer - c#

System.Collections.Generic.List<T> has only an int indexer; it does not let you index by string, which is why cclist[input] fails to compile. If you want to use a string as your key, you should probably use a Dictionary<string, CreditCard> instead.
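
A minimal sketch of that suggestion, assuming a CreditCard class that has only the Number property from the question. The Add method, the string indexer that returns null for a missing card, and the sample card numbers are illustrative choices, not the asker's code.

```csharp
using System;
using System.Collections.Generic;

public class CreditCard
{
    public string Number { get; set; }
}

public class CreditCardList
{
    // Keyed by card number, so no sorting or binary search is needed.
    private readonly Dictionary<string, CreditCard> cards =
        new Dictionary<string, CreditCard>();

    public void Add(CreditCard card)
    {
        // A later card with the same number replaces the earlier one.
        cards[card.Number] = card;
    }

    // String indexer: returns the matching card, or null when none is stored.
    public CreditCard this[string number]
    {
        get
        {
            CreditCard card;
            return cards.TryGetValue(number, out card) ? card : null;
        }
    }
}

public static class CreditCardDemo
{
    public static void Main()
    {
        var list = new CreditCardList();
        list.Add(new CreditCard { Number = "4111111111111111" });

        CreditCard found = list["4111111111111111"];
        Console.WriteLine(found != null ? found.Number : "not found"); // 4111111111111111
        Console.WriteLine(list["0000"] == null);                       // True
    }
}
```

Lookups by key are O(1) on average, and the original list order never has to be saved and restored around a sort.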
