Enum and IEnumerable in C#

My Time enum contains Annual, Monthly, Weekly, Daily and Hourly.
I want to find the minimum of a sequence of these values and return it.
How can I do this? Here is the code I tried.
private Time DecideMinTime(IEnumerable<Time> g)
{
var minTime = Time.Hourly;
foreach (var element in g)
{
minTime = element;
}
return minTime;
}

Assuming that the numeric value of the enum elements decides what the minimum is:
private Time DecideMinTime(IEnumerable<Time> g)
{
if (g == null) { throw new ArgumentNullException("g"); }
return (Time)g.Cast<int>().Min();
}
If the numeric values indicate the opposite order then you would use .Max() instead of .Min().
As indicated, the numeric order is not consistent. This can be worked around simply by using a mapping indicating the correct order:
static class TimeOrdering
{
private static readonly Dictionary<Time, int> timeOrderingMap;
static TimeOrdering()
{
timeOrderingMap = new Dictionary<Time, int>();
timeOrderingMap[Time.Hourly] = 1;
timeOrderingMap[Time.Daily] = 2;
timeOrderingMap[Time.Weekly] = 3;
timeOrderingMap[Time.Monthly] = 4;
timeOrderingMap[Time.Annual] = 5;
}
public static Time DecideMinTime(IEnumerable<Time> g)
{
if (g == null) { throw new ArgumentNullException("g"); }
return g.MinBy(i => timeOrderingMap[i]);
}
public static TSource MinBy<TSource>(
this IEnumerable<TSource> self,
Func<TSource, int> ordering)
{
if (self == null) { throw new ArgumentNullException("self"); }
if (ordering == null) { throw new ArgumentNullException("ordering"); }
using (var e = self.GetEnumerator()) {
if (!e.MoveNext()) {
throw new ArgumentException("Sequence is empty.", "self");
}
var minElement = e.Current;
var minOrder = ordering(minElement);
while (e.MoveNext()) {
var curOrder = ordering(e.Current);
if (curOrder < minOrder) {
minOrder = curOrder;
minElement = e.Current;
}
}
return minElement;
}
}
}

To make it easier you can assign explicit numeric values to your enum:
enum Time : byte {Hourly=1, Daily=2, Weekly=3, Monthly=4, Annual=5};
and then
private static Time DecideMinTime(IEnumerable<Time> g)
{
return g.Min();
}
That way you avoid casting back and forth.
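As a minimal sketch of this idea, assuming the enum is declared with explicit values as above: Enumerable.Min() compares enum values by their underlying numeric value, so no cast is needed. The TimeDemo class name and the null check are illustrative additions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Explicit values make the numeric order match the intended order.
public enum Time : byte { Hourly = 1, Daily = 2, Weekly = 3, Monthly = 4, Annual = 5 }

public static class TimeDemo
{
    // Returns the smallest Time in the sequence.
    // Min() throws InvalidOperationException when the sequence is empty.
    public static Time DecideMinTime(IEnumerable<Time> g)
    {
        if (g == null) { throw new ArgumentNullException("g"); }
        return g.Min();
    }
}
```

For example, TimeDemo.DecideMinTime(new[] { Time.Annual, Time.Daily, Time.Weekly }) returns Time.Daily.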

Related

Modify multiple fields of an object in an Enumerable at the same time in C#

If I have an Enumerable of objects and want to modify multiple fields of a single one that I already know the index, I currently do:
var myObject = myEnumerable[index];
myObject.one = 1;
myObject.two = 2;
Is there a way to compact that? To make it simpler?
As an example, in VB you can do:
With myEnumerable[index]
.one = 1
.two = 2
End With
PS: using doesn't work here as the object would need to implement IDisposable, and we don't always control the object. I'm looking for a generic way to do this.
Here is a handmade, reflection-based way of obtaining what you wanted. You can use it on any object.
class Program
{
static void Main(string[] args)
{
var item = new Item
{
One = 0,
Two = 0
};
item.SetProperties(new string[] { "One", "Two" }, new object[] { 1, 2 });
}
}
public class Item
{
public int One { get; set; }
public int Two { get; set; }
}
public static class Extensions
{
public static void SetProperties<T>(this T obj, IEnumerable<string> propertiesNames, IEnumerable<object> propertiesValues)
{
if (propertiesNames.Count() != propertiesValues.Count())
{
throw new ArgumentException("propertiesNames and propertiesValues must have the same number of elements.");
}
var properties = obj.GetType().GetProperties();
for (int i = 0; i < propertiesNames.Count(); i++)
{
var property = properties.FirstOrDefault(x => x.Name == propertiesNames.ElementAt(i));
if (property is null)
{
throw new ArgumentException();
}
property.SetValue(obj, propertiesValues.ElementAt(i));
}
}
public static void SetFields<T>(this T obj, IEnumerable<string> fieldsNames, IEnumerable<object> fieldsValues)
{
if (fieldsNames.Count() != fieldsValues.Count())
{
throw new ArgumentException("fieldsNames and fieldsValues must have the same number of elements.");
}
var fields = obj.GetType().GetFields();
for (int i = 0; i < fieldsNames.Count(); i++)
{
var field = fields.FirstOrDefault(x => x.Name == fieldsNames.ElementAt(i));
if (field is null)
{
throw new ArgumentException();
}
field.SetValue(obj, fieldsValues.ElementAt(i));
}
}
}
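If compile-time checking matters more than setting members by name, a lighter alternative is a small extension method that takes a lambda. This is only a sketch: With and SampleItem are hypothetical names, not an existing framework API, and the class constraint is an assumption made so that the lambda never mutates a copy of a struct.

```csharp
using System;

public static class WithExtensions
{
    // Hypothetical helper: runs 'configure' against 'target' and returns the
    // same instance, so several members can be set in one expression.
    public static T With<T>(this T target, Action<T> configure) where T : class
    {
        if (target == null) { throw new ArgumentNullException("target"); }
        if (configure == null) { throw new ArgumentNullException("configure"); }
        configure(target);
        return target;
    }
}

// Hypothetical sample type used only to demonstrate the helper.
public class SampleItem
{
    public int One { get; set; }
    public int Two { get; set; }
}

// Usage, assuming 'items' is a List<SampleItem> or an array:
//   items[index].With(o => { o.One = 1; o.Two = 2; });
```

Unlike the reflection version, a misspelled member name here is a compile error rather than a runtime exception.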

Overlapping Ranges Check for Overlapping

I have a list of ranges and I would like to find out if they overlap.
I have the following code, which does not seem to be working. Is there an easier way to do this, or a way that works? :)
Thanks in advance for any advice.
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private IList<Range> rangeList;
private void Form1_Load(object sender, EventArgs e)
{
rangeList.Add(new Range{FromNumber = 0, ToNumber = 100});
rangeList.Add(new Range { FromNumber = 101, ToNumber = 200 });
// this range should over lap and throw an exception
rangeList.Add(new Range { FromNumber = 199, ToNumber = 300 });
}
private bool RangesOverlap()
{
var bigList = new List<List<int>>();
foreach (var range in this.rangeList)
{
bigList.Add(new List<int> { range.FromNumber , range.ToNumber });
}
IEnumerable<IEnumerable<int>> lists = bigList;
return lists
.Where(c => c != null && c.Any())
.Aggregate(Enumerable.Intersect)
.ToList().Count > 0;
}
}
public class Range
{
public int FromNumber { get; set; }
public int ToNumber { get; set; }
}
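First flatten the ranges, ordered by their start, into a single list of numbers, and then check whether that list is in sorted order (the expression below is true when the ranges overlap):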
rangeList
.OrderBy(p => p.FromNumber)
.Select(p => new[] { p.FromNumber, p.ToNumber })
.SelectMany(p => p)
.Aggregate((p, q) => q >= p ? q : int.MaxValue) == int.MaxValue
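The same sort-then-scan idea can be written as an explicit loop, which may be easier to read and debug. The sketch below is an illustration, not the answerer's code: NumberRange stands in for the question's Range class, RangeChecks and AnyOverlap are hypothetical names, bounds are treated as inclusive, and every range is assumed to have FromNumber <= ToNumber.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the Range class from the question.
public class NumberRange
{
    public int FromNumber { get; set; }
    public int ToNumber { get; set; }
}

public static class RangeChecks
{
    // Returns true if any two ranges share at least one number.
    // Bounds are inclusive, so 0-100 and 100-200 count as overlapping.
    public static bool AnyOverlap(IEnumerable<NumberRange> ranges)
    {
        if (ranges == null) { throw new ArgumentNullException("ranges"); }
        var sorted = ranges.OrderBy(r => r.FromNumber).ToList();
        if (sorted.Count == 0) { return false; }
        int maxEnd = sorted[0].ToNumber; // largest end seen so far
        for (int i = 1; i < sorted.Count; i++)
        {
            // After sorting by start, a range overlaps an earlier one
            // exactly when it starts at or before the largest end so far.
            if (sorted[i].FromNumber <= maxEnd) { return true; }
            maxEnd = Math.Max(maxEnd, sorted[i].ToNumber);
        }
        return false;
    }
}
```

With the ranges from the question (0-100, 101-200, 199-300) this returns true, because 199 is not greater than 200.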
In the past I faced a challenge where I had to write a validating algorithm for ranges that are created by the user (integers or reals). I had to check 3 things:
Ranges are continuous
Ranges are not overlapping
LOW value must always be <= HIGH
So I came up with the following PHP algorithm.
//Let's hardcode an array of integers (it can be of reals as well):
$range = array
(
array(1, 13),
array(14, 20),
array(21, 45),
array(46, 50),
array(51, 60)
);
//And the validation algorithm:
$j = 0;
$rows = sizeof($range);
$step = 1; // This indicates the step of the values.
// 1 for integers, 0.1 or 0.01 for reals
for ($x=0; $x<$rows; $x++)
for ($y=0; $y<$rows; $y++) {
if ( ($range[$x][0] <= $range[$y][0]) AND ($range[$y][0] <= $range[$x][1]) AND ($x != $y) ) {
echo "Ranges intercepting"; // replace with error handling code
break 2; // leave both loops; only two loops enclose this statement
}
if ( $range[$x][0] > $range[$x][1] ) {
echo "Not valid high & low"; // replace with error handling code
break 2; // leave both loops; only two loops enclose this statement
}
if ( $range[$x][0] - $range[$y][1] == $step ) {
$j++;
}
}
if ( $j != $rows - 1 ) // n continuous ranges have exactly n - 1 adjacent pairs
echo "Not continuous"; // replace with error handling code
We iterate through the ranges and compare them in pairs. For each pair we:
find overlapping ranges and raise an error
find start & end points in reverse and raise an error
If none of the above occurs, we count the pairs in which one range starts exactly one step (1, 0.1, 0.01, etc.) after another range ends. This count must equal the number of ranges minus one. Otherwise we raise an error.
Hope this helps!
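For readers working in C#, the three checks can be sketched as follows. This is not a line-by-line port of the PHP above: it sorts the ranges by their low value and compares neighbours instead of comparing every pair, it handles integers with a step of 1 only, and RangeValidator and Validate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeValidator
{
    // Validates integer ranges given as (low, high) pairs with a step of 1.
    // Returns null when the ranges are valid, otherwise a short error message.
    public static string Validate(IList<Tuple<int, int>> ranges)
    {
        if (ranges == null) { throw new ArgumentNullException("ranges"); }

        // 1. LOW must be <= HIGH in every range.
        if (ranges.Any(r => r.Item1 > r.Item2)) { return "Not valid high & low"; }

        var sorted = ranges.OrderBy(r => r.Item1).ToList();
        for (int i = 1; i < sorted.Count; i++)
        {
            int prevHigh = sorted[i - 1].Item2;
            int low = sorted[i].Item1;

            // 2. A range may not start at or before the previous range's end.
            if (low <= prevHigh) { return "Ranges intercepting"; }

            // 3. Each range must start exactly one step after the previous end.
            if (low - prevHigh != 1) { return "Not continuous"; }
        }
        return null;
    }
}
```

The sample data from the PHP snippet, (1, 13), (14, 20), (21, 45), (46, 50), (51, 60), passes all three checks, so Validate returns null for it.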
You can fulfill your new requirement with a slight modification of RezaArab's answer:
rangeList
.Select(p => new[] { p.FromNumber, p.ToNumber })
.SelectMany(p => p.Distinct())
.Aggregate((p, q) => q >= p ? q : int.MaxValue) == int.MaxValue
The solution to this problem can be as simple as writing your own RangeList : IList<Range> whose Add() method throws an exception when the specified range overlaps with one or more ranges that are already in the collection.
Working example:
class Range
{
public int FromNumber { get; set; }
public int ToNumber { get; set; }
public bool Intersects(Range range)
{
if ( this.FromNumber <= range.ToNumber )
{
return (this.ToNumber >= range.FromNumber);
}
else if ( this.ToNumber >= range.FromNumber )
{
return (this.FromNumber <= range.ToNumber);
}
return false;
}
}
class RangeList : IList<Range>
{
private readonly IList<Range> innerList;
#region Constructors
public RangeList()
{
this.innerList = new List<Range>();
}
public RangeList(int capacity)
{
this.innerList = new List<Range>(capacity);
}
public RangeList(IEnumerable<Range> collection)
{
if ( collection == null )
throw new ArgumentNullException("collection");
var overlap = from left in collection
from right in collection.SkipWhile(right => left != right)
where left != right
select left.Intersects(right);
if ( overlap.SkipWhile(value => value == false).Any() )
throw new ArgumentOutOfRangeException("collection", "The specified collection contains overlapping ranges.");
this.innerList = new List<Range>(collection);
}
#endregion
private bool IsUniqueRange(Range range)
{
if ( range == null )
throw new ArgumentNullException("range");
return !(this.innerList.Any(range.Intersects));
}
private Range EnsureUniqueRange(Range range)
{
if ( !IsUniqueRange(range) )
{
throw new ArgumentOutOfRangeException("range", "The specified range overlaps with one or more other ranges in this collection.");
}
return range;
}
public Range this[int index]
{
get
{
return this.innerList[index];
}
set
{
this.innerList[index] = EnsureUniqueRange(value);
}
}
public void Insert(int index, Range item)
{
this.innerList.Insert(index, EnsureUniqueRange(item));
}
public void Add(Range item)
{
this.innerList.Add(EnsureUniqueRange(item));
}
#region Remaining implementation details
public int IndexOf(Range item)
{
return this.innerList.IndexOf(item);
}
public void RemoveAt(int index)
{
this.innerList.RemoveAt(index);
}
public void Clear()
{
this.innerList.Clear();
}
public bool Contains(Range item)
{
return this.innerList.Contains(item);
}
public void CopyTo(Range[] array, int arrayIndex)
{
this.innerList.CopyTo(array, arrayIndex);
}
public int Count
{
get { return this.innerList.Count; }
}
public bool IsReadOnly
{
get { return this.innerList.IsReadOnly; }
}
public bool Remove(Range item)
{
return this.innerList.Remove(item);
}
public IEnumerator<Range> GetEnumerator()
{
return this.innerList.GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.innerList.GetEnumerator();
}
#endregion
}
Usage:
IList<Range> rangeList = new RangeList();
try
{
rangeList.Add(new Range { FromNumber = 12, ToNumber = 12 });
rangeList.Add(new Range { FromNumber = 13, ToNumber = 20 }); // should NOT overlap
rangeList.Add(new Range { FromNumber = 12, ToNumber = 20 }); // should overlap
}
catch ( ArgumentOutOfRangeException exception )
{
Console.WriteLine(exception.Message);
}

Why is ConcurrentDictionary.AddOrUpdate method slow?

I am working on a thread-safe multi-valued dictionary. Internally it uses a ConcurrentDictionary (.NET 4.0) with a custom linked list as the value; items with the same key are added to that linked list. The problem is that inserting an item with ConcurrentDictionary's AddOrUpdate method (approach 1) runs noticeably slower than calling TryGetValue to check whether the key is present and then adding or updating the value manually inside a lock (approach 2). Inserting 3 million records takes around 20 seconds with the first approach and around 9.5 seconds with the second on the same machine (2nd-generation Intel i3, 2.2 GHz, 4 GB RAM). There must be something I am missing that I have not been able to figure out yet.
I have also checked the code for concurrent dictionary but it seems to do the same thing as I am doing inside a lock:
public TValue AddOrUpdate(TKey key, Func<TKey, TValue> addValueFactory, Func<TKey, TValue, TValue> updateValueFactory)
{
if (key == null) throw new ArgumentNullException("key");
if (addValueFactory == null) throw new ArgumentNullException("addValueFactory");
if (updateValueFactory == null) throw new ArgumentNullException("updateValueFactory");
TValue newValue, resultingValue;
while (true)
{
TValue oldValue;
if (TryGetValue(key, out oldValue))
//key exists, try to update
{
newValue = updateValueFactory(key, oldValue);
if (TryUpdate(key, newValue, oldValue))
{
return newValue;
}
}
else //try add
{
newValue = addValueFactory(key);
if (TryAddInternal(key, newValue, false, true, out resultingValue))
{
return resultingValue;
}
}
}
}
Here is the code for thread safe multi valued dictionary (approach 2 is commented, uncomment it to check the difference).
Update: There are Remove, Add and other methods also which I have not pasted below.
class ValueWrapper<U, V>
{
private U _key;
private V _value;
public ValueWrapper(U key, V value)
{
this._key = key;
this._value = value;
}
public U Key
{
get { return _key; }
}
public V Value
{
get { return _value; }
set { _value = value; }
}
}
class LinkNode<Type>
{
public LinkNode(Type data)
{
Data = data;
}
public LinkNode<Type> Next;
public Type Data;
}
public class SimpleLinkedList<T>
{
#region Instance Member Variables
private LinkNode<T> _startNode = null;
private LinkNode<T> _endNode = null;
private int _count = 0;
#endregion
public void AddAtLast(T item)
{
if (_endNode == null)
_endNode = _startNode = new LinkNode<T>(item);
else
{
LinkNode<T> node = new LinkNode<T>(item);
_endNode.Next = node;
_endNode = node;
}
_count++;
}
public T First
{
get { return _startNode == null ? default(T) : _startNode.Data; }
}
public int Count
{
get { return _count; }
}
}
class MultiValThreadSafeDictionary<U, T>
{
private ConcurrentDictionary<U, SimpleLinkedList<ValueWrapper<U, T>>> _internalDictionary;
private ReaderWriterLockSlim _slimLock = new ReaderWriterLockSlim();
public MultiValThreadSafeDictionary()
{
_internalDictionary = new ConcurrentDictionary<U, SimpleLinkedList<ValueWrapper<U, T>>>(2, 100);
}
public T this[U key]
{
get
{
throw new NotImplementedException();
}
set
{
/* ****Approach 1 using AddOrUpdate**** */
_internalDictionary.AddOrUpdate(key, (x) =>
{
SimpleLinkedList<ValueWrapper<U, T>> list = new SimpleLinkedList<ValueWrapper<U, T>>();
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
//_internalDictionary[key] = list;
return list;
},
(k, existingList) =>
{
try
{
_slimLock.EnterWriteLock();
if (existingList.Count == 0)
{
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
existingList.AddAtLast(vw);
}
else
existingList.First.Value = value;
return existingList;
}
finally
{
_slimLock.ExitWriteLock();
}
});
/* ****Approach 2 not using AddOrUpdate**** */
/*
try
{
_slimLock.EnterWriteLock();
SimpleLinkedList<ValueWrapper<U, T>> list;
if (!_internalDictionary.TryGetValue(key, out list))
{
list = new SimpleLinkedList<ValueWrapper<U, T>>();
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
_internalDictionary[key] = list;
//_iterator.AddAtLast(vw);
return;
}
if (list.Count == 0)
{
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
//_iterator.AddAtLast(vw);
}
else
list.First.Value = value;
}
finally
{
_slimLock.ExitWriteLock();
}
*/
}
}
}
The test code only inserts items, all with unique keys. It is as follows.
MultiValThreadSafeDictionary<string, int> testData = new MultiValThreadSafeDictionary<string, int>();
Task t1 = new Task(() =>
{
for (int i = 0; i < 1000000; i++)
{
testData[i.ToString()] = i;
}
}
);
Task t2 = new Task(() =>
{
for (int i = 1000000; i < 2000000; i++)
{
testData[i.ToString()] = i;
}
}
);
Task t3 = new Task(() =>
{
for (int i = 2000000; i < 3000000; i++)
{
testData[i.ToString()] = i;
}
}
);
Stopwatch watch = new Stopwatch();
watch.Start();
t1.Start();
t2.Start();
t3.Start();
t1.Wait();
t2.Wait();
t3.Wait();
watch.Stop();
Console.WriteLine("time taken:" + watch.ElapsedMilliseconds);
Update 1:
Based on the answer from '280Z28', I am rephrasing the question. Why do GetOrAdd and 'my' method take almost the same time, even though my method takes an extra lock and also calls TryGetValue? And why does AddOrUpdate take twice as long as GetOrAdd? The code for all of the approaches is below:
The GetOrAdd and AddOrUpdate methods in ConcurrentDictionary (.NET 4) have the following code:
public TValue GetOrAdd(TKey key, TValue value)
{
if (key == null) throw new ArgumentNullException("key");
TValue resultingValue;
TryAddInternal(key, value, false, true, out resultingValue);
return resultingValue;
}
public TValue AddOrUpdate(TKey key, Func<TKey, TValue> addValueFactory, Func<TKey, TValue, TValue> updateValueFactory)
{
if (key == null) throw new ArgumentNullException("key");
if (addValueFactory == null) throw new ArgumentNullException("addValueFactory");
if (updateValueFactory == null) throw new ArgumentNullException("updateValueFactory");
TValue newValue, resultingValue;
while (true)
{
TValue oldValue;
if (TryGetValue(key, out oldValue))
//key exists, try to update
{
newValue = updateValueFactory(key, oldValue);
if (TryUpdate(key, newValue, oldValue))
{
return newValue;
}
}
else //try add
{
newValue = addValueFactory(key);
if (TryAddInternal(key, newValue, false, true, out resultingValue))
{
return resultingValue;
}
}
}
}
GetOrAdd in my code is used as follows (taking 9 seconds):
SimpleLinkedList<ValueWrapper<U, T>> existingList = new SimpleLinkedList<ValueWrapper<U, T>>();
existingList = _internalDictionary.GetOrAdd(key, existingList);
try
{
_slimLock.EnterWriteLock();
if (existingList.Count == 0)
{
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
existingList.AddAtLast(vw);
}
else
existingList.First.Value = value;
}
finally
{
_slimLock.ExitWriteLock();
}
AddOrUpdate is used as follows (taking 20 seconds on all adds, no updates). As described in one of the answers this approach is not suitable for update.
_internalDictionary.AddOrUpdate(key, (x) =>
{
SimpleLinkedList<ValueWrapper<U, T>> list = new SimpleLinkedList<ValueWrapper<U, T>>();
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
return list;
},
(k, existingList ) =>
{
try
{
_slimLock.EnterWriteLock();
if (existingList.Count == 0)
{
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
existingList.AddAtLast(vw);
}
else
existingList.First.Value = value;
return existingList;
}
finally
{
_slimLock.ExitWriteLock();
}
});
Code without GetOrAdd and AddOrUpdate is as follows (taking 9.5 seconds):
try
{
_slimLock.EnterWriteLock();
VerySimpleLinkedList<ValueWrapper<U, T>> list;
if (!_internalDictionary.TryGetValue(key, out list))
{
list = new VerySimpleLinkedList<ValueWrapper<U, T>>();
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
_internalDictionary[key] = list;
return;
}
if (list.Count == 0)
{
ValueWrapper<U, T> vw = new ValueWrapper<U, T>(key, value);
list.AddAtLast(vw);
}
else
list.First.Value = value;
}
finally
{
_slimLock.ExitWriteLock();
}
You should not be using AddOrUpdate for this code. This is extremely clear because your update method never actually updates the value stored in ConcurrentDictionary - it always returns the existingList argument unchanged. Instead, you should be doing something like the following.
SimpleLinkedList<ValueWrapper<U, T>> list = _internalDictionary.GetOrAdd(key, CreateEmptyList);
// operate on list here
...
private static SimpleLinkedList<ValueWrapper<U, T>> CreateEmptyList(U key) // GetOrAdd expects a Func<U, TValue>, so the factory takes the key
{
return new SimpleLinkedList<ValueWrapper<U, T>>();
}
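A self-contained sketch of that pattern follows. It makes several assumptions that differ from the question's code: a plain List<TValue> replaces the custom linked list, values are appended rather than overwritten, each list is protected by locking on the list instance instead of one shared ReaderWriterLockSlim, and MultiValueStore, Add and CountFor are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public class MultiValueStore<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, List<TValue>> _lists =
        new ConcurrentDictionary<TKey, List<TValue>>();

    // Adds a value under the key. GetOrAdd returns the list already stored
    // for the key, or stores and returns a new one. The factory may run more
    // than once under contention, but every caller gets the stored list.
    public void Add(TKey key, TValue value)
    {
        List<TValue> list = _lists.GetOrAdd(key, k => new List<TValue>());
        lock (list) // List<T> itself is not thread-safe
        {
            list.Add(value);
        }
    }

    // Returns how many values are stored under the key (0 if the key is absent).
    public int CountFor(TKey key)
    {
        List<TValue> list;
        if (!_lists.TryGetValue(key, out list)) { return 0; }
        lock (list) { return list.Count; }
    }
}
```

Because the lock is per list, writers to different keys do not block each other once the list has been obtained; whether that is faster in practice than a single global lock would need to be measured for the workload in question.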
Read operations on the dictionary are performed in a lock-free manner.
As mentioned in http://msdn.microsoft.com/en-us/library/dd287191.aspx
The implementation of AddOrUpdate uses fine-grained locking while it checks whether the item already exists, whereas a read that you perform yourself first is lock-free and therefore faster; reading first also reduces the locking required for items that already exist.

How to make the below code generic?

I have the following method:
private int SaveRecord(PartnerViewLog partnerViewLog, PortalConstant.DataSourceType DataSourceType, Func<IDataAccess, PartnerViewLog, int> chooseSelector)
{
int results = -1;
var dataPlugin = DataPlugins.FirstOrDefault(i => i.Metadata["SQLMetaData"].ToString() == DataSourceType.EnumToString());
if (dataPlugin != null)
{
results = chooseSelector(dataPlugin.Value, partnerViewLog);
}
return results;
}
I invoke it as follows:
public int SavePartnerViewLog(PartnerViewLog partnerViewLog, PortalConstant.DataSourceType DataSourceType)
{
return SaveRecord(partnerViewLog, DataSourceType, (i, u) => i.SavePartnerViewLog(partnerViewLog));
}
As you can see, PartnerViewLog is a class. I now want to make SaveRecord generic, so that it works with any class.
How can I do that?
Try the following:
private int SaveRecord<T>(T record, PortalConstant.DataSourceType dataSourceType, Func<IDataAccess, T, int> chooseSelector)
{
int results = -1;
var dataPlugin = DataPlugins.FirstOrDefault(i => i.Metadata["SQLMetaData"].ToString() == dataSourceType.EnumToString());
if (dataPlugin != null)
{
results = chooseSelector(dataPlugin.Value, record);
}
return results;
}
private int SaveRecord<T>(T partnerViewLog, PortalConstant.DataSourceType dataSourceType, Func<IDataAccess, T, int> chooseSelector)
{
...
}
Everything else remains the same.
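To show how the generic version is called for more than one record type, here is a reduced sketch. All types in it are hypothetical stand-ins: the plugin lookup and the DataSourceType parameter from the question are replaced by a single IDataAccess passed to the constructor, and UserLog, SaveUserLog, InMemoryDataAccess and Repository are invented for illustration.

```csharp
using System;

// Hypothetical stand-ins for the types in the question.
public class PartnerViewLog { public string Page { get; set; } }
public class UserLog { public string User { get; set; } }

public interface IDataAccess
{
    int SavePartnerViewLog(PartnerViewLog log);
    int SaveUserLog(UserLog log);
}

public class InMemoryDataAccess : IDataAccess
{
    public int SavePartnerViewLog(PartnerViewLog log) { return 1; }
    public int SaveUserLog(UserLog log) { return 2; }
}

public class Repository
{
    private readonly IDataAccess _dataAccess; // null models "no matching plugin"

    public Repository(IDataAccess dataAccess) { _dataAccess = dataAccess; }

    // T is inferred from the 'record' argument at each call site.
    private int SaveRecord<T>(T record, Func<IDataAccess, T, int> chooseSelector)
    {
        return _dataAccess != null ? chooseSelector(_dataAccess, record) : -1;
    }

    public int SavePartnerViewLog(PartnerViewLog log)
    {
        return SaveRecord(log, (dataAccess, record) => dataAccess.SavePartnerViewLog(record));
    }

    public int SaveUserLog(UserLog log)
    {
        return SaveRecord(log, (dataAccess, record) => dataAccess.SaveUserLog(record));
    }
}
```

Note that each lambda uses its own record parameter rather than capturing the outer variable, which is what the second Func argument is for.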

Optimized Generic List Split

Read the edit below for more information.
I have some code below that I use to split a generic list of Object when the item is of a certain type.
public static IEnumerable<object>[] Split(this IEnumerable<object> tokens, TokenType type) {
List<List<object>> t = new List<List<object>>();
int currentT = 0;
t.Add(new List<object>());
foreach (object list in tokens) {
if ((list is Token) && (list as Token).TokenType == type) {
currentT++;
t.Add(new List<object>());
}
else if ((list is TokenType) && ((TokenType)list )== type) {
currentT++;
t.Add(new List<object>());
}
else {
t[currentT].Add(list);
}
}
return t.ToArray();
}
I don't have a clear question so much as I am curious whether anyone knows of ways to optimize this code. I call it many times and it seems to be quite the beast as far as clock cycles go. Any ideas? I can also make it a Wiki if anyone is interested; maybe we can keep track of the latest changes.
Update: I'm trying to parse out specific tokens. It's a list of Token objects mixed with objects of some other class. Token has a property (an enum) of type TokenType. I need to find all the Token objects and split on each of them.
{a b c T d e T f g h T i j k l T m}
would split like
{a b c}{d e}{f g h}{i j k l}{m}
EDIT UPDATE:
It seems like all of my speed problems come from the constant creation and addition of generic lists. Does anyone know how I can avoid that?
This is the profile of what is happening if it helps anyone.
Your code looks fine.
My only suggestion would be replacing IEnumerable<object> with the non-generic IEnumerable. (In System.Collections)
EDIT:
On further inspection, you're casting more times than necessary.
Replace the if with the following code:
var token = list as Token;
if (token != null && token.TokenType == type) {
Also, you can get rid of your currentT variable by writing t[t.Count - 1] or t.Last(). This will make the code clearer, but might have a tiny negative effect on performance.
Alternatively, you could store a reference to the inner list in a variable and use it directly. (This will slightly improve performance)
Finally, if you can change the return type to List<List<Object>>, you can return t directly; this will avoid an array copy and will be noticeably faster for large lists.
By the way, your variable names are confusing; you should swap the names of t and list.
Type-testing and casts can be a performance killer. If at all possible, your token types should implement a common interface or abstract class. Instead of passing in an object, you should pass in an IToken which wraps your object.
Here's some concept code you can use to get started:
using System;
using System.Collections.Generic;
namespace Juliet
{
interface IToken<T>
{
bool IsDelimeter { get; }
T Data { get; }
}
class DelimeterToken<T> : IToken<T>
{
public bool IsDelimeter { get { return true; } }
public T Data { get { throw new Exception("No data"); } }
}
class DataToken<T> : IToken<T>
{
public DataToken(T data)
{
this.Data = data;
}
public bool IsDelimeter { get { return false; } }
public T Data { get; private set; }
}
class TokenFactory<T>
{
public IToken<T> Make()
{
return new DelimeterToken<T>();
}
public IToken<T> Make(T data)
{
return new DataToken<T>(data);
}
}
class Program
{
static List<List<T>> SplitTokens<T>(IEnumerable<IToken<T>> tokens)
{
List<List<T>> res = new List<List<T>>();
foreach (IToken<T> token in tokens)
{
if (token.IsDelimeter)
{
res.Add(new List<T>());
}
else
{
if (res.Count == 0)
{
res.Add(new List<T>());
}
res[res.Count - 1].Add(token.Data);
}
}
return res;
}
static void Main(string[] args)
{
TokenFactory<string> factory = new TokenFactory<string>();
IToken<string>[] tokens = new IToken<string>[]
{
factory.Make("a"), factory.Make("b"), factory.Make("c"), factory.Make(),
factory.Make("d"), factory.Make("e"), factory.Make(),
factory.Make("f"), factory.Make("g"), factory.Make("h"), factory.Make(),
factory.Make("i"), factory.Make("j"), factory.Make("k"), factory.Make("l"), factory.Make(),
factory.Make("m")
};
List<List<string>> splitTokens = SplitTokens(tokens);
for (int i = 0; i < splitTokens.Count; i++)
{
Console.Write("{");
for (int j = 0; j < splitTokens[i].Count; j++)
{
Console.Write("{0}, ", splitTokens[i][j]);
}
Console.Write("}");
}
Console.ReadKey(true);
}
}
}
In principle, you can create instances of IToken<object> to have it generalized to tokens of multiple classes.
A: An all-lazy implementation will suffice if you just iterate through the results in a nested foreach:
using System;
using System.Collections.Generic;
public static class Splitter
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, Predicate<T> match)
{
using (IEnumerator<T> enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
yield return Split(enumerator, match);
}
}
}
static IEnumerable<T> Split<T>(IEnumerator<T> enumerator, Predicate<T> match)
{
do
{
if (match(enumerator.Current))
{
yield break;
}
else
{
yield return enumerator.Current;
}
} while (enumerator.MoveNext());
}
}
Use it like this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace MyTokenizer
{
class Program
{
enum TokenTypes { SimpleToken, UberToken }
class Token { public TokenTypes TokenType = TokenTypes.SimpleToken; }
class MyUberToken : Token { public MyUberToken() { TokenType = TokenTypes.UberToken; } }
static void Main(string[] args)
{
List<object> objects = new List<object>(new object[] { "A", Guid.NewGuid(), "C", new MyUberToken(), "D", new MyUberToken(), "E", new MyUberToken() });
var splitOn = TokenTypes.UberToken;
foreach (var list in objects.Split(x => x is Token && ((Token)x).TokenType == splitOn))
{
foreach (var item in list)
{
Console.WriteLine(item);
}
Console.WriteLine("==============");
}
Console.ReadKey();
}
}
}
B: If you need to process the results some time later and you wish to do it out-of-order, or you partition on one thread and then possibly dispatch the segments to multiple threads, then this would probably provide a good starting point:
using System;
using System.Collections.Generic;
using System.Linq;
public static class Splitter2
{
public static IEnumerable<IEnumerable<T>> SplitToSegments<T>(this IEnumerable<T> source, Predicate<T> match)
{
T[] items = source.ToArray();
for (int startIndex = 0; startIndex < items.Length; startIndex++)
{
int endIndex = startIndex;
for (; endIndex < items.Length; endIndex++)
{
if (match(items[endIndex])) break;
}
yield return EnumerateArraySegment(items, startIndex, endIndex - 1);
startIndex = endIndex;
}
}
static IEnumerable<T> EnumerateArraySegment<T>(T[] array, int startIndex, int endIndex)
{
for (; startIndex <= endIndex; startIndex++)
{
yield return array[startIndex];
}
}
}
C: If you really must return a collection of List<T>s (which I doubt, unless you explicitly want to mutate them later), then try to initialize them to a given capacity before copying:
public static List<List<T>> SplitToLists<T>(this IEnumerable<T> source, Predicate<T> match)
{
List<List<T>> lists = new List<List<T>>();
T[] items = source.ToArray();
for (int startIndex = 0; startIndex < items.Length; startIndex++)
{
int endIndex = startIndex;
for (; endIndex < items.Length; endIndex++)
{
if (match(items[endIndex])) break;
}
List<T> list = new List<T>(endIndex - startIndex);
list.AddRange(EnumerateArraySegment(items, startIndex, endIndex - 1));
lists.Add(list);
startIndex = endIndex;
}
return lists;
}
D: If this is still not enough, I suggest you roll your own lightweight List implementation that can copy a range directly to its internal array from another instance.
My first thought would be instead of looking up t[currentT] all the time, just store a currentList and add directly to that.
This is the best I could do to eliminate as much of the allocation times as possible for the function (should only allocate when it goes over the capacity, which should be no more than what is required to create the largest sub list in the results). I've tested this implementation and it works as you described.
Please note that the results of the prior sub list are destroyed when the next list in the group is accessed.
public static IEnumerable<IEnumerable> Split(this IEnumerable tokens, TokenType type)
{
ArrayList currentT = new ArrayList();
foreach (object list in tokens)
{
Token token = list as Token;
if ((token != null) && token.TokenType == type)
{
yield return currentT;
currentT.Clear();
//currentT = new ArrayList(); <-- Use this instead of 'currentT.Clear();' if you want the returned lists to be a different instance
}
else if ((list is TokenType) && ((TokenType)list) == type)
{
yield return currentT;
currentT.Clear();
//currentT = new ArrayList(); <-- Use this instead of 'currentT.Clear();' if you want the returned lists to be a different instance
}
else
{
currentT.Add(list);
}
}
yield return currentT; // also emit the items that follow the last delimiter
}
EDIT
Here's another version that doesn't make use of another list at all (shouldn't be doing any allocations). Not sure how well this will compare, but it does work as requested (also I've got no idea how this one will go if you try to cache a sub 'array').
Also, both of these require a "using System.Collections" statement (in addition to the Generic namespace).
private static IEnumerable SplitInnerLoop(IEnumerator iter, TokenType type)
{
do
{
Token token = iter.Current as Token;
if ((token != null) && token.TokenType == type)
{
break;
}
else if ((iter.Current is TokenType) && ((TokenType)iter.Current) == type)
{
break;
}
else
{
yield return iter.Current;
}
} while (iter.MoveNext());
}
public static IEnumerable<IEnumerable> Split(this IEnumerable tokens, TokenType type)
{
IEnumerator iter = tokens.GetEnumerator();
while (iter.MoveNext())
{
yield return SplitInnerLoop(iter, type);
}
}
I think the following scenarios are broken cases, assuming that list items are lower-case letters and the item with the matching token type is T:
{ T a b c ... };
{ ... x y z T };
{ ... j k l T T m n o ... };
{ T }; and
{ }
Which will result in:
{ { } { a b c ... } };
{ { ... x y z } { } };
{ { ... j k l } { } { } { m n o ... } };
{ { } }; and
{ }
Doing a straight refactoring:
public static IEnumerable<object>[] Split(this IEnumerable<object> tokens,
TokenType type) {
var outer = new List<List<object>>();
var inner = new List<object>();
foreach (var item in tokens) {
Token token = item as Token;
if (token != null && token.TokenType == type) {
outer.Add(inner);
inner = new List<object>();
continue;
}
inner.Add(item);
}
outer.Add(inner);
return outer.ToArray();
}
To fix the broken cases (assuming that those are truly broken), I recommend:
public static IEnumerable<object>[] Split(this IEnumerable<object> tokens,
TokenType type) {
var outer = new List<List<object>>();
var inner = new List<object>();
var enumerator = tokens.GetEnumerator();
while (enumerator.MoveNext()) {
Token token = enumerator.Current as Token;
if (token == null || token.TokenType != type) {
inner.Add(enumerator.Current);
}
else if (inner.Count > 0) {
outer.Add(inner);
inner = new List<object>();
}
}
if (inner.Count > 0) {
outer.Add(inner); // keep the items that follow the last delimiter
}
return outer.ToArray();
}
Which will result in:
{ { a b c ... } };
{ { ... x y z } };
{ { ... j k l } { m n o ... } };
{ }; and
{ }
Using LINQ you could try this: (I did not test it...)
public static IEnumerable<object>[] Split(this IEnumerable<object> tokens, TokenType type)
{
List<List<object>> l = new List<List<object>>();
l.Add(new List<object>());
return tokens.Aggregate(l, (c, n) =>
{
var t = n as Token;
if (t != null && t.TokenType == type)
{
// A separator starts a new group in the accumulator
c.Add(new List<object>());
}
else
{
c.Last().Add(n);
}
// Return the accumulator, not the token
return c;
}).ToArray();
}
Second try:
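public static IEnumerable<IEnumerable<object>> Split(this IEnumerable<object> tokens, TokenType type)
{
// Collect the positions of the separator tokens first (note: tokens is enumerated more than once)
var indexes = tokens.Select((t, index) => new { token = t as Token, index })
.Where(t => t.token != null && t.token.TokenType == type)
.Select(t => t.index)
.ToList();
int prevIndex = -1;
foreach (int item in indexes)
{
int start = prevIndex; // copy, because the lambda below runs lazily
yield return tokens.Where((t, index) => index > start && index < item);
prevIndex = item;
}
// Items after the last separator
int last = prevIndex;
yield return tokens.Where((t, index) => index > last);
}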
Here is one possibility.
The Token class (it could be any class):
public class Token
{
public string Name { get; set; }
public TokenType TokenType { get; set; }
}
Now the TokenType enum (this could be any other grouping factor):
public enum TokenType
{
Type1,
Type2,
Type3,
Type4,
Type5,
}
The extension method (declare it any way you choose):
public static class TokenExtension
{
public static IEnumerable<Token>[] Split(this IEnumerable<Token> tokens)
{
return tokens.GroupBy(token => token.TokenType).ToArray();
}
}
Sample usage (I used a web project to try this out):
List<Token> tokens = new List<Token>();
tokens.Add(new Token { Name = "a", TokenType = TokenType.Type1 });
tokens.Add(new Token { Name = "b", TokenType = TokenType.Type1 });
tokens.Add(new Token { Name = "c", TokenType = TokenType.Type1 });
tokens.Add(new Token { Name = "d", TokenType = TokenType.Type2 });
tokens.Add(new Token { Name = "e", TokenType = TokenType.Type2 });
tokens.Add(new Token { Name = "f", TokenType = TokenType.Type3 });
tokens.Add(new Token { Name = "g", TokenType = TokenType.Type3 });
tokens.Add(new Token { Name = "h", TokenType = TokenType.Type3 });
tokens.Add(new Token { Name = "i", TokenType = TokenType.Type4 });
tokens.Add(new Token { Name = "j", TokenType = TokenType.Type4 });
tokens.Add(new Token { Name = "k", TokenType = TokenType.Type4 });
tokens.Add(new Token { Name = "l", TokenType = TokenType.Type4 });
tokens.Add(new Token { Name = "m", TokenType = TokenType.Type5 });
StringBuilder stringed = new StringBuilder();
foreach (Token token in tokens)
{
stringed.Append(token.Name);
stringed.Append(", ");
}
Response.Write(stringed.ToString());
Response.Write("<br />");
var q = tokens.Split();
foreach (var list in q)
{
stringed = new StringBuilder();
foreach (Token token in list)
{
stringed.Append(token.Name);
stringed.Append(", ");
}
Response.Write(stringed.ToString());
Response.Write("<br />");
}
So all I am doing is using LINQ. Feel free to add or remove things; you can actually go crazy with this and group on many different properties.
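Grouping on more than one property could be sketched with an anonymous-type key, since anonymous types compare by value. The second key used here (the length of `Name`) is purely an assumption chosen for illustration, as are the `MultiKeyGrouping` class and its `Describe` method; `Token` and `TokenType` are cut-down copies of the definitions above so that the block stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum TokenType { Type1, Type2 }

public class Token
{
    public string Name { get; set; }
    public TokenType TokenType { get; set; }
}

public static class MultiKeyGrouping
{
    // Groups by token type AND by name length, then renders one line per group.
    // Groups come out in order of first appearance of each key.
    public static List<string> Describe(IEnumerable<Token> tokens)
    {
        return tokens
            .GroupBy(t => new { t.TokenType, t.Name.Length })
            .Select(g => g.Key.TokenType + "/" + g.Key.Length + ": "
                         + string.Join(", ", g.Select(t => t.Name)))
            .ToList();
    }
}
```

Given tokens named a, bb and c of Type1 and dd of Type2, in that order, `Describe` returns the three lines `Type1/1: a, c`, `Type1/2: bb` and `Type2/2: dd`.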
Do you need to convert it to an array? You could potentially use LINQ and delayed execution to return the results.
EDIT:
With the clarified question it would be hard to bend LINQ to make it return the results the way you want. If you still want to have the execution of each cycle delayed you could write your own enumerator.
I recommend performance-testing this against the other options to see whether it is actually faster in your scenario. Managing the iterator may add overhead, which would hurt in cases with little data.
I hope this helps.
// This is the easy way to make your own iterator using the C# syntax
// It will return sets of separated tokens in a lazy fashion
// This sample is based on the version provided by @Ants
public static IEnumerable<IEnumerable<object>> Split(this IEnumerable<object> tokens,
TokenType type) {
var current = new List<object>();
foreach (var item in tokens)
{
Token token = item as Token;
if (token != null && token.TokenType == type)
{
if( current.Count > 0)
{
yield return current;
current = new List<object>();
}
}
else
{
current.Add(item);
}
}
if( current.Count > 0)
yield return current;
}
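The lazy behaviour of the iterator version can be observed by counting how many source items are read before the first group comes back. The following is a rough sketch, not part of the original answer: `LazySplitDemo` and `ItemsReadForFirstGroup` are hypothetical names, the `Token` and `TokenType` definitions are minimal stand-ins, and `Split` is restated so that the block is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the types used in the question.
public enum TokenType { Separator }

public class Token
{
    public TokenType TokenType { get; set; }
}

public static class LazySplitDemo
{
    // Same logic as the iterator version: yields each non-empty group on demand.
    public static IEnumerable<IEnumerable<object>> Split(this IEnumerable<object> tokens, TokenType type)
    {
        var current = new List<object>();
        foreach (var item in tokens)
        {
            var token = item as Token;
            if (token != null && token.TokenType == type)
            {
                if (current.Count > 0)
                {
                    yield return current;
                    current = new List<object>();
                }
            }
            else
            {
                current.Add(item);
            }
        }
        if (current.Count > 0)
        {
            yield return current;
        }
    }

    // Hypothetical helper: how many source items are read to produce the first group.
    // Throws InvalidOperationException if the source contains no group at all.
    public static int ItemsReadForFirstGroup(IEnumerable<object> source, TokenType type)
    {
        int read = 0;
        // Select is deferred as well, so the counter only advances as Split pulls items.
        var counted = source.Select(x => { read++; return x; });
        counted.Split(type).First();
        return read;
    }
}
```

For the sequence a, b, T, c, d (with `T` a separator token), `ItemsReadForFirstGroup` returns 3: the items after the first separator are never touched unless the caller asks for the next group.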
Warning: This compiles but might still have hidden bugs. It is getting late here.
// This does the same thing, but all by hand.
// You could also use this approach to lazily iterate through the 'current' list
// This is probably overkill and substantially more complex
public class TokenSplitter : IEnumerable<IEnumerable<object>>, IEnumerator<IEnumerable<object>>
{
IEnumerator<object> _enumerator;
IEnumerable<object> _tokens;
TokenType _target;
List<object> _current;
bool _isDone = false;
public TokenSplitter(IEnumerable<object> tokens, TokenType target)
{
_tokens = tokens;
_target = target;
Reset();
}
// Cruft from the IEnumerable and generic IEnumerator
public IEnumerator<IEnumerable<object>> GetEnumerator() { return this; }
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public IEnumerable<object> Current { get { return _current; } }
public void Dispose() { }
#region IEnumerator Members
object System.Collections.IEnumerator.Current { get { return Current; } }
// Advance to the next non-empty set of tokens, if there is one
public bool MoveNext()
{
if (_isDone) return false;
return FillCurrent();
}
// Reset the enumerators so that you could reuse this structure if you wanted
public void Reset()
{
_isDone = false;
_enumerator = _tokens.GetEnumerator();
_current = null;
}
// Fills the current set with tokens up to the next separator (or the end of input).
// Returns false when no further non-empty set exists.
private bool FillCurrent()
{
_current = new List<object>();
// Accumulate tokens eagerly; this too could be an enumerator to delay the process more
while (_enumerator.MoveNext())
{
Token token = _enumerator.Current as Token;
if (token == null || token.TokenType != _target)
{
_current.Add(_enumerator.Current);
}
else if (_current.Count > 0)
{
// A separator closes a non-empty set; leading and repeated separators are skipped
return true;
}
}
_isDone = true;
return _current.Count > 0;
}
#endregion
}
