Running number of items in subgroups within ienumerable of items

Say I have an
IEnumerable< IEnumerable< string > > rowsOfTextColumns
The inner ienumerable string values represent columns in a row, thus the outer ienumerable stores several rows of text columns.
Like: 3 rows by 4 columns:
12345 foo 2014-10-16 09:55 blah
12345 foo 2014-10-16 09:55 bleh
67890 bar 2014-10-16 09:58 ugh
The DateTime column values are not unique - as you can see in the example, several entries at the same time are possible. But datetime makes most sense to use as ID in my data.
Since I want a unique ID for each row, I would like to add a column to each row "on the fly" that contains the running number of that row among the entries with the same datetime, starting at 1. Like this:
12345 foo 2014-10-16 09:55 blah (1)
12345 foo 2014-10-16 09:55 bleh (2)
67890 bar 2014-10-16 09:58 ugh (1)
(For clarification: the unique id would be a compound of datetime + running number within datetime subgroup)
Sure I know how to do this some way.
But - how is this done most elegantly, e.g. using LINQ / functional programming aspects of C#?
Furthermore I am curious, how would the same be done most elegantly in F#?
EDIT #1: better illustrated the source data format
EDIT #2:
All right, using GroupBy as suggested in one comment, I got this so far (in C#; see my selected answer for the F# code):
var groupsByDatetime = rowsOfTextColumns.GroupBy( rec => rec.ElementAt(2) );
var extendedRows =
    groupsByDatetime.SelectMany( g =>
        g.Select( (columns, i) =>
            columns.Concat( new[] { (1 + i).ToString() } ) ) );
Anyone bids less? :)
Well doesn't look too bad already I guess.

This groups the items and maps each item to include its index within the group.
let groupAndIndexItems keySelector =
    Seq.groupBy keySelector
    >> Seq.map (fun (key, items) ->
        let indexedItems = items |> Seq.mapi (fun i x -> x, i)
        key, indexedItems
    )
Example usage:
[
    12345, "foo", "2014-10-16 09:55", "blah"
    12345, "foo", "2014-10-16 09:55", "bleh"
    67890, "bar", "2014-10-16 09:58", "ugh"
]
|> groupAndIndexItems (fun (_, _, s, _) -> s)
Output:
val it : seq<string * seq<(int * string * string * string) * int>> =
  seq
    [("2014-10-16 09:55",
      seq [((12345, "foo", "2014-10-16 09:55", "blah"), 0);
           ((12345, "foo", "2014-10-16 09:55", "bleh"), 1)]);
     ("2014-10-16 09:58",
      seq [((67890, "bar", "2014-10-16 09:58", "ugh"), 0)])]
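For comparison, the same running-number idea can be sketched in Python, the other language used in this collection. This is only an illustration, not part of either answer: the helper name add_running_number and the key_index parameter are made up for the example. Unlike the GroupBy/SelectMany version, it keeps the rows in their original order and numbers from 1.

```python
from collections import defaultdict

def add_running_number(rows, key_index):
    """Append a 1-based running number per key value to each row, keeping row order."""
    seen = defaultdict(int)  # key value -> number of rows seen so far with that key
    result = []
    for row in rows:
        key = row[key_index]
        seen[key] += 1
        result.append(list(row) + [str(seen[key])])
    return result

rows = [
    ["12345", "foo", "2014-10-16 09:55", "blah"],
    ["12345", "foo", "2014-10-16 09:55", "bleh"],
    ["67890", "bar", "2014-10-16 09:58", "ugh"],
]
numbered = add_running_number(rows, key_index=2)
for row in numbered:
    print(" ".join(row))
```

A single pass with a counter per key avoids building the groups at all, at the cost of being less declarative than the LINQ and F# pipelines.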

Related

Calculate a total for each record in a list based on filtered list

I have the following sample list:
Index  No    Path       A  B  C  Amount
1      1000  1000       a  b  c  700
2      1001  1000.1001  a  b  c  100
3      1001  1000.1001  a  b  d  200
I need to iterate over the list and, for each record, calculate a value and store it in a new column based on specific conditions:
filter records where Path contains record's No
filter record with same values in columns A, B, and C
calculate a sum of Amount and save it as TotalAmount
To give you an example:
Index  No    Path       A  B  C  Amount  TotalAmount
1      1000  1000       a  b  c  700     800
2      1001  1000.1001  a  b  c  100     100
3      1001  1000.1001  a  b  d  200     200
For the first record, I need to find all records in the list whose Path contains the record's No (1000) and that have the same values in columns A, B, and C. So in this example, for the first record we take the records with Index = 1 and Index = 2, sum their Amount values, and return the result in the TotalAmount column.
My idea was this:
foreach (var record in List)
{
    var totalAmount = List
        .Where(e =>
            e.Path.Contains(record.No) &&
            e.A == record.A &&
            e.B == record.B &&
            e.C == record.C)
        .Sum(e => e.Amount);
}
However, it doesn't return what I want, and I don't know how to save the result back to the list after the calculation.

If you're not getting the proper result from the Sum operation, it's likely due to the Contains method returning records that you don't want (because 10000 contains 1000, for example). One way to handle that would be to add periods when checking the beginning, middle, and end of the value.
The other issue is as a previous answer described - you need to set the TotalAmount property of the record (assuming it has one):
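foreach (var record in list)
{
    record.TotalAmount = list
        .Where(e =>
            (e.Path == record.No ||                  // the path consists of this No only
             e.Path.StartsWith($"{record.No}.") ||   // No is the first component
             e.Path.Contains($".{record.No}.") ||    // No is a middle component
             e.Path.EndsWith($".{record.No}")) &&    // No is the last component
            e.A == record.A &&
            e.B == record.B &&
            e.C == record.C)
        .Sum(e => e.Amount);
}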

Assign the result to the current item inside the loop:
list.ForEach(record =>
{
    var totalAmount = list
        .Where(e => e.Path.Contains(record.No) && e.A == record.A && e.B == record.B && e.C == record.C)
        .Sum(e => e.Amount);
    record.TotalAmount = totalAmount;
});

First, rather than putting the TotalAmount on your record type, I recommend creating a new type to represent the result you're looking for.
public record Source(int Index, string No, string Path, string A, string B, string C, double Amount);
public record Totals(int Index, string No, string Path, string A, string B, string C, double Amount, double TotalAmount);
Next, I would create an intermediate representation of your data that's easier to reason with. It sounds like the combination of (A, B, C) has a meaning, and it sounds like your Path is really kind of a collection of the parent elements to the current record.
var intermediates = sources.Select(
    source => new
    {
        source,
        pathComponents = source.Path.Split('.').ToHashSet(),
        abc = (source.A, source.B, source.C)
    });
Then let's group those values by the combined abc value, so we can quickly look up the items that belong to the same group as a given entry.
var byAbc = intermediates.ToLookup(e => e.abc);
Finally, calculate the totals:
var totals =
    from intermediate in intermediates
    let totalAmount = byAbc[intermediate.abc]
        .Where(e => e.pathComponents.Contains(intermediate.source.No))
        .Sum(e => e.source.Amount)
    let source = intermediate.source
    select new Totals(
        source.Index,
        source.No,
        source.Path,
        source.A,
        source.B,
        source.C,
        source.Amount,
        totalAmount);
Here are a few of the benefits to this approach:
By breaking down the problem into individual steps with immutable behavior:
it's possible to step through the code and visually inspect (or log) the results from each line of code.
it's possible to jump back to a previous step in the debugger and walk through the code again without changing the program's behavior.
the results of the individual steps can be put into named variables that help the reader understand intent.
the individual steps can easily be refactored out to separate methods or classes.
By representing the path as a collection of individual pieces which can be compared using equality checks, we avoid bugs in the Contains logic. For example, if a record had a No of 100, all of the paths in the above example would match it if you used a simple string.Contains check, even though none of the paths actually includes 100 as a component of their path.
By using data structures like Lookup and HashSet, we avoid high asymptotic complexity, which means that this scales well to really large data sets.
By using a different type for the input and output, you prevent the introduction of bugs in your application where someone needs to use the TotalAmount property, but is given a list where that property has not yet been populated.
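The lookup-plus-set idea described above is not specific to C#. As a rough sketch under stated assumptions, here it is in Python with the sample data from the question; the dictionary-based record layout and the names by_abc and total_amount are invented for this illustration and are not part of the answer's code.

```python
from collections import defaultdict

records = [
    {"Index": 1, "No": "1000", "Path": "1000",      "A": "a", "B": "b", "C": "c", "Amount": 700},
    {"Index": 2, "No": "1001", "Path": "1000.1001", "A": "a", "B": "b", "C": "c", "Amount": 100},
    {"Index": 3, "No": "1001", "Path": "1000.1001", "A": "a", "B": "b", "C": "d", "Amount": 200},
]

# Group once by (A, B, C); each entry keeps its path split into a set of components.
by_abc = defaultdict(list)
for rec in records:
    key = (rec["A"], rec["B"], rec["C"])
    by_abc[key].append((set(rec["Path"].split(".")), rec["Amount"]))

def total_amount(rec):
    """Sum Amount over the same (A, B, C) group where rec's No is a path component."""
    group = by_abc[(rec["A"], rec["B"], rec["C"])]
    return sum(amount for components, amount in group if rec["No"] in components)

totals = [total_amount(rec) for rec in records]
print(totals)

# A substring test would wrongly accept "100"; a component test does not.
print("100" in "1000.1001", "100" in "1000.1001".split("."))
```

The input records are left untouched and the totals are produced as a separate list, mirroring the answer's point about keeping input and output apart.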

string name of variable(object) [duplicate]

I would like to be able to get the name of a variable as a string, but I don't know if Python has that much introspection capability. Something like:
>>> print(my_var.__name__)
'my_var'
I want to do that because I have a bunch of variables I'd like to turn into a dictionary like :
bar = True
foo = False
>>> my_dict = dict(bar=bar, foo=foo)
>>> print my_dict
{'foo': False, 'bar': True}
But I'd like something more automatic than that.
Python has locals() and vars(), so I guess there is a way.

As unwind said, this isn't really something you do in Python - variables are actually name mappings to objects.
However, here's one way to try and do it:
>>> a = 1
>>> for k, v in list(locals().items()):
...     if v is a:
...         a_as_str = k
...
>>> a_as_str
'a'
>>> type(a_as_str)
<class 'str'>

I've wanted to do this quite a lot. This hack is very similar to rlotun's suggestion, but it's a one-liner, which is important to me:
Python 2:
blah = 1
blah_name = [k for k, v in locals().iteritems() if v is blah][0]
Python 3+:
blah = 1
blah_name = [k for k, v in locals().items() if v is blah][0]

Are you trying to do this?
dict( (name,eval(name)) for name in ['some','list','of','vars'] )
Example
>>> some= 1
>>> list= 2
>>> of= 3
>>> vars= 4
>>> dict( (name,eval(name)) for name in ['some','list','of','vars'] )
{'list': 2, 'some': 1, 'vars': 4, 'of': 3}

This is a hack. It will not work on all Python implementations or distributions (in particular, those that do not have traceback.extract_stack).
import traceback
def make_dict(*expr):
    (filename,line_number,function_name,text)=traceback.extract_stack()[-2]
    begin=text.find('make_dict(')+len('make_dict(')
    end=text.find(')',begin)
    text=[name.strip() for name in text[begin:end].split(',')]
    return dict(zip(text,expr))
bar=True
foo=False
print(make_dict(bar,foo))
# {'foo': False, 'bar': True}
Note that this hack is fragile:
make_dict(bar,
          foo)
(calling make_dict on 2 lines) will not work.
Instead of trying to generate the dict out of the values foo and bar,
it would be much more Pythonic to generate the dict out of the string variable names 'foo' and 'bar':
dict([(name,locals()[name]) for name in ('foo','bar')])

This is not possible in Python, which really doesn't have "variables". Python has names, and there can be more than one name for the same object.
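To make the point about names concrete, here is a small sketch. The helper names_of and the demo function are hypothetical, written only for this illustration. It looks an object up in a namespace by identity and shows that the answer can be several names or none at all.

```python
def names_of(obj, namespace):
    """Return every name in namespace that is bound to exactly this object."""
    return sorted(name for name, value in namespace.items() if value is obj)

def demo():
    first = [1, 2, 3]
    second = first            # a second name for the same list object
    snapshot = dict(locals()) # copy of the local namespace at this point
    return names_of(first, snapshot), names_of([1, 2, 3], snapshot)

two_names, no_names = demo()
print(two_names)  # both names refer to the same object
print(no_names)   # an equal but distinct list has no name here
```

Any "get the variable name" helper built on this kind of lookup has to decide what to do in both of these cases.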

I think my problem will help illustrate why this question is useful, and it may give a bit more insight into how to answer it. I wrote a small function to do a quick inline head check on various variables in my code. Basically, it lists the variable name, data type, size, and other attributes, so I can quickly catch any mistakes I've made. The code is simple:
def details(val):
    vn = val.__name__ # If such a thing existed
    vs = str(val)
    print("The Value of "+ str(vn) + " is " + vs)
    print("The data type of " + vn + " is " + str(type(val)))
So if you have some complicated dictionary / list / tuple situation, it would be quite helpful to have the interpreter return the variable name you assigned. For instance, here is a weird dictionary:
import numpy as np
m = 'abracadabra'
mm=[]
for n in m:
    mm.append(n)
mydic = {'first':(0,1,2,3,4,5,6),'second':mm,'third':np.arange(0.,10)}
details(mydic)
The Value of mydic is {'second': ['a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a'], 'third': array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]), 'first': (0, 1, 2, 3, 4, 5, 6)}
The data type of mydic is <type 'dict'>
details(mydic['first'])
The Value of mydic['first'] is (0, 1, 2, 3, 4, 5, 6)
The data type of mydic['first'] is <type 'tuple'>
details(mydic.keys())
The Value of mydic.keys() is ['second', 'third', 'first']
The data type of mydic.keys() is <type 'list'>
details(mydic['second'][0])
The Value of mydic['second'][0] is a
The data type of mydic['second'][0] is <type 'str'>
I'm not sure if I put this in the right place, but I thought it might help. I hope it does.

I wrote a neat little useful function based on the answer to this question. I'm putting it here in case it's useful.
def what(obj, callingLocals=locals()):
    """
    quick function to print name of input and value.
    If not for the default-Valued callingLocals, the function would always
    get the name as "obj", which is not what I want.
    """
    name = None  # stays None if no name in callingLocals refers to obj
    for k, v in list(callingLocals.items()):
        if v is obj:
            name = k
    print(name, "=", obj)
usage:
>>> a = 4
>>> what(a)
a = 4

I find that if you already have a specific list of values, the way described by @S.Lott is the best; however, the way described below works well to get all variables and classes added throughout the code WITHOUT the need to provide variable names, though you can specify them if you want. The code can be extended to exclude classes.
import types
import math # mainly showing that you could import what you will before d
# Everything after this counts
d = dict(globals())
def kv_test(k,v):
    return (k not in d and
            k not in ['d','args'] and
            type(v) is not types.FunctionType)
def magic_print(*args):
    if len(args) == 0:
        return {k:v for k,v in globals().iteritems() if kv_test(k,v)}
    else:
        return {k:v for k,v in magic_print().iteritems() if k in args}
if __name__ == '__main__':
    foo = 1
    bar = 2
    baz = 3
    print magic_print()
    print magic_print('foo')
    print magic_print('foo','bar')
Output:
{'baz': 3, 'foo': 1, 'bar': 2}
{'foo': 1}
{'foo': 1, 'bar': 2}

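In Python 3 this is easy:
myVariable = 5
for v in list(locals()):  # copy the keys first: at module level the loop variable v is itself added to locals()
    if id(v) == id("myVariable"):  # identity check; this relies on CPython interning identifier-like strings
        print(v, locals()[v])
this will print:
myVariable 5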

Python 3. Use inspect to capture the calling local namespace, then use the ideas presented here. More than one name can match, as has been pointed out; this returns the first one.
import inspect
def varname(var):
    frame = inspect.currentframe()
    try:
        # search the caller's locals, not this function's own namespace
        for name, value in frame.f_back.f_locals.items():
            if value is var:
                return name
    finally:
        del frame  # avoid keeping a reference cycle alive

Here's the function I created to read the variable names. It's more general and can be used in different applications:
def get_variable_name(*variable):
    '''gets string of variable name
    inputs
        variable (str)
    returns
        string
    '''
    if len(variable) != 1:
        raise Exception('len of variables inputed must be 1')
    try:
        return [k for k, v in locals().items() if v is variable[0]][0]
    except:
        return [k for k, v in globals().items() if v is variable[0]][0]
To use it in the specified question:
>>> foo = False
>>> bar = True
>>> my_dict = {get_variable_name(foo):foo,
...            get_variable_name(bar):bar}
>>> my_dict
{'bar': True, 'foo': False}

In reading the thread, I saw an awful lot of friction. It's easy enough to give
a bad answer, then let someone give the correct answer. Anyway, here is what I found.
From effbot.org (http://effbot.org/zone/python-objects.htm#names):
The names are a bit different — they’re not really properties of the object, and the object itself doesn't know what it’s called.
An object can have any number of names, or no name at all.
Names live in namespaces (such as a module namespace, an instance namespace, a function’s local namespace).
Note: that it says the object itself doesn’t know what it’s called, so that was the clue. Python objects are not self-referential. Then it says, Names live in namespaces. We have this in TCL/TK. So maybe my answer will help (but it did help me)
jj = 123
print eval("'" + str(id(jj)) + "'")
print dir()
166707048
['__builtins__', '__doc__', '__file__', '__name__', '__package__', 'jj']
So there is 'jj' at the end of the list.
Rewrite the code as:
jj = 123
print eval("'" + str(id(jj)) + "'")
for x in dir():
    print id(eval(x))
161922920
['__builtins__', '__doc__', '__file__', '__name__', '__package__', 'jj']
3077447796
136515736
3077408320
3077656800
136515736
161922920
This nasty bit of code id's the name of variable/object/whatever-you-pedantics-call-it.
So, there it is. The memory address of 'jj' is the same when we look for it directly, as when we do the dictionary look up in global name space. I'm sure you can make a function to do this. Just remember which namespace your variable/object/wypci is in.
QED.

I wrote the package sorcery to do this kind of magic robustly. You can write:
from sorcery import dict_of
my_dict = dict_of(foo, bar)

Maybe I'm overthinking this but..
str_l = next((k for k,v in locals().items() if id(l) == id(v)))
>>> bar = True
>>> foo = False
>>> my_dict=dict(bar=bar, foo=foo)
>>> next((k for k,v in locals().items() if id(bar) == id(v)))
'bar'
>>> next((k for k,v in locals().items() if id(foo) == id(v)))
'foo'
>>> next((k for k,v in locals().items() if id(my_dict) == id(v)))
'my_dict'

import re
import traceback
pattern = re.compile(r'[\W+\w+]*get_variable_name\((\w+)\)')
def get_variable_name(x):
    return pattern.match(traceback.extract_stack(limit=2)[0][3]).group(1)
a = 1
b = a
c = b
print get_variable_name(a)
print get_variable_name(b)
print get_variable_name(c)

I uploaded a solution to pypi. It's a module defining an equivalent of C#'s nameof function.
It iterates through the bytecode instructions for the frame it's called in, getting the names of the variables/attributes passed to it. The names are found in the .argrepr of the LOAD instructions following the function's name.

Most objects don't have a __name__ attribute. (Classes, functions, and modules do; any more builtin types that have one?)
What else would you expect for print(my_var.__name__) other than print("my_var")? Can you simply use the string directly?
You could "slice" a dict:
def dict_slice(D, keys, default=None):
    return dict((k, D.get(k, default)) for k in keys)
print dict_slice(locals(), ["foo", "bar"])
# or use set literal syntax if you have a recent enough version:
print dict_slice(locals(), {"foo", "bar"})
Alternatively:
throw = object() # sentinel
def dict_slice(D, keys, default=throw):
    def get(k):
        v = D.get(k, throw)
        if v is not throw:
            return v
        if default is throw:
            raise KeyError(k)
        return default
    return dict((k, get(k)) for k in keys)

Well, I encountered the very same need a few days ago and had to get a variable's name which was pointing to the object itself.
And why was it so necessary?
In short, I was building a plug-in for Maya. The core plug-in was built in C++, but the GUI is drawn through Python (as it's not processor intensive). Since I don't yet know how to return multiple values from the plug-in other than the default MStatus, to update a dictionary in Python I had to pass the plug-in the name of the variable that points to the object implementing the GUI (and containing the dictionary), and then use MGlobal::executePythonCommand() to update the dictionary from Maya's global scope.
To do that what I did was something like:
import time
class foo(bar):
    def __init__(self):
        super(foo, self).__init__()
        self.time = time.time() #almost guaranteed to be unique on a single computer
    def name(self):
        g = globals()
        for x in g:
            if isinstance(g[x], type(self)):
                if g[x].time == self.time:
                    return x
                    #or you could:
                    #return filter(None,[x if g[x].time == self.time else None for x in g if isinstance(g[x], type(self))])
                    #and return all keys pointing to object itself
I know that it is not a perfect solution, since many keys in globals could be pointing to the same object, e.g.:
a = foo()
b = a
b.name()
>>>b
or
>>>a
and that the approach isn't thread-safe. Correct me if I am wrong.
At least this approach solved my problem: it gets the name of any variable in the global scope that points to the object itself, which I then pass to the plug-in as an argument for its internal use.
I tried this on int (the primitive integer class), but the problem is that these primitive classes can't be replaced this way (please correct the technical terminology if it's wrong). You could re-implement int and then do int = foo, but a = 3 will never be an object of foo, only of the primitive. To overcome that, you have to write a = foo(3) to get a.name() to work.

With Python 2.7 and newer there are also dictionary comprehensions, which make it a bit shorter. If possible, I would use getattr instead of the eval used in the top answer (eval is evil). self can be any object that has the context you're looking at. It can be an object, or locals=locals(), etc.
{name: getattr(self, name) for name in ['some', 'vars', 'here']}

I was working on a similar problem. @S.Lott said "If you have the list of variables, what's the point of "discovering" their names?" My answer is: just to see if it could be done, and in case for some reason you want to sort your variables by type into lists. Anyway, in my research I came across this thread, and my solution is a bit expanded and is based on @rlotun's solution. One other thing: @unutbu said, "This idea has merit, but note that if two variable names reference the same value (e.g. True), then an unintended variable name might be returned." In this exercise that was true, so I dealt with it by using a list comprehension similar to this for each possibility: isClass = [i for i in isClass if i != 'item']. Without it, "item" would show up in each list.
__metaclass__ = type
from types import *
class Class_1: pass
class Class_2: pass
list_1 = [1, 2, 3]
list_2 = ['dog', 'cat', 'bird']
tuple_1 = ('one', 'two', 'three')
tuple_2 = (1000, 2000, 3000)
dict_1 = {'one': 1, 'two': 2, 'three': 3}
dict_2 = {'dog': 'collie', 'cat': 'calico', 'bird': 'robin'}
x = 23
y = 29
pie = 3.14159
eee = 2.71828
house = 'single story'
cabin = 'cozy'
isClass = []; isList = []; isTuple = []; isDict = []; isInt = []; isFloat = []; isString = []; other = []
mixedDataTypes = [Class_1, list_1, tuple_1, dict_1, x, pie, house, Class_2, list_2, tuple_2, dict_2, y, eee, cabin]
print '\nMIXED_DATA_TYPES total count:', len(mixedDataTypes)
for item in mixedDataTypes:
    try:
        # if isinstance(item, ClassType): # use this for old class types (before 3.0)
        if isinstance(item, type):
            for k, v in list(locals().iteritems()):
                if v is item:
                    mapping_as_str = k
                    isClass.append(mapping_as_str)
            isClass = [i for i in isClass if i != 'item']
        elif isinstance(item, ListType):
            for k, v in list(locals().iteritems()):
                if v is item:
                    mapping_as_str = k
                    isList.append(mapping_as_str)
            isList = [i for i in isList if i != 'item']
        elif isinstance(item, TupleType):
            for k, v in list(locals().iteritems()):
                if v is item:
                    mapping_as_str = k
                    isTuple.append(mapping_as_str)
            isTuple = [i for i in isTuple if i != 'item']
        elif isinstance(item, DictType):
            for k, v in list(locals().iteritems()):
                if v is item:
                    mapping_as_str = k
                    isDict.append(mapping_as_str)
            isDict = [i for i in isDict if i != 'item']
        elif isinstance(item, IntType):
            for k, v in list(locals().iteritems()):
                if v is item:
                    mapping_as_str = k
                    isInt.append(mapping_as_str)
            isInt = [i for i in isInt if i != 'item']
        elif isinstance(item, FloatType):
            for k, v in list(locals().iteritems()):
                if v is item:
                    mapping_as_str = k
                    isFloat.append(mapping_as_str)
            isFloat = [i for i in isFloat if i != 'item']
        elif isinstance(item, StringType):
            for k, v in list(locals().iteritems()):
                if v is item:
                    mapping_as_str = k
                    isString.append(mapping_as_str)
            isString = [i for i in isString if i != 'item']
        else:
            for k, v in list(locals().iteritems()):
                if v is item:
                    mapping_as_str = k
                    other.append(mapping_as_str)
            other = [i for i in other if i != 'item']
    except (TypeError, AttributeError), e:
        print e
print '\n isClass:', len(isClass), isClass
print ' isList:', len(isList), isList
print ' isTuple:', len(isTuple), isTuple
print ' isDict:', len(isDict), isDict
print ' isInt:', len(isInt), isInt
print ' isFloat:', len(isFloat), isFloat
print 'isString:', len(isString), isString
print ' other:', len(other), other
# my output and the output I wanted
'''
MIXED_DATA_TYPES total count: 14
isClass: 2 ['Class_1', 'Class_2']
isList: 2 ['list_1', 'list_2']
isTuple: 2 ['tuple_1', 'tuple_2']
isDict: 2 ['dict_1', 'dict_2']
isInt: 2 ['x', 'y']
isFloat: 2 ['pie', 'eee']
isString: 2 ['house', 'cabin']
other: 0 []
'''

You can use easydict:
>>> from easydict import EasyDict as edict
>>> d = edict({'foo':3, 'bar':{'x':1, 'y':2}})
>>> d.foo
3
>>> d.bar.x
1
>>> d = edict(foo=3)
>>> d.foo
3
another example:
>>> d = edict(log=False)
>>> d.debug = True
>>> d.items()
[('debug', True), ('log', False)]

On Python 3, this function will get the outermost name in the stack:
import inspect
def retrieve_name(var):
    """
    Gets the name of var. Searches from the outermost frame inwards.
    :param var: variable to get name from.
    :return: string
    """
    for fi in reversed(inspect.stack()):
        names = [var_name for var_name, var_val in fi.frame.f_locals.items() if var_val is var]
        if len(names) > 0:
            return names[0]
It can be used anywhere in the code. It traverses the reversed stack looking for the first match.

While this is probably an awful idea, it is along the same lines as rlotun's answer but it'll return the correct result more often.
import inspect
def getVarName(getvar):
    frame = inspect.currentframe()
    callerLocals = frame.f_back.f_locals
    for k, v in list(callerLocals.items()):
        if v is getvar():
            callerLocals.pop(k)
            try:
                getvar()
                callerLocals[k] = v
            except NameError:
                callerLocals[k] = v
                del frame
                return k
    del frame
You call it like this:
bar = True
foo = False
bean = False
fooName = getVarName(lambda: foo)
print(fooName) # prints "foo"

Pass the variable as a keyword argument, convert the keys to a list, then return the first one:
def get_var_name(**kwargs):
    """get variable name
    get_var_name(var = var)
    Returns:
        [str] -- var name
    """
    return list(kwargs.keys())[0]

This will not return the name of a variable, but you can easily create a dictionary from global variables.
class CustomDict(dict):
    def __add__(self, other):
        return CustomDict({**self, **other})
class GlobalBase(type):
    def __getattr__(cls, key):
        return CustomDict({key: globals()[key]})
    def __getitem__(cls, keys):
        return CustomDict({key: globals()[key] for key in keys})
class G(metaclass=GlobalBase):
    pass
x, y, z = 0, 1, 2
print('method 1:', G['x', 'y', 'z']) # Outcome: method 1: {'x': 0, 'y': 1, 'z': 2}
print('method 2:', G.x + G.y + G.z) # Outcome: method 2: {'x': 0, 'y': 1, 'z': 2}

With python-varname you can easily do it:
pip install python-varname
from varname import Wrapper
foo = Wrapper(True)
bar = Wrapper(False)
your_dict = {val.name: val.value for val in (foo, bar)}
print(your_dict)
# {'foo': True, 'bar': False}
Disclaimer: I'm the author of that python-varname library.

>>> a = 1
>>> b = 1
>>> id(a)
34120408
>>> id(b)
34120408
>>> a is b
True
>>> id(a) == id(b)
True
Because a and b refer to the same object here, an identity-based lookup of the name for a may return either 'a' or 'b'.

How to group array of char/string with UNION?

I have a two dimensional array of char, called Letters[ ][ ]
Letters[0][0] = A
[0][1] = B
Letters[1][0] = C
[1][1] = D
Letters[2][0] = B
[2][1] = A
[2][2] = F
Letters[3][0] = I
[3][1] = F
[3][2] = J
I need to group it, so it will be something like this:
group[0] [0] = A
group[0] [1] = B
group[0] [2] = F
group[0] [3] = I
group[0] [4] = J
group[1] [0] = C
group[1] [1] = D
My logic so far is to check every element against the other elements. If two rows share a letter, they are merged into one group together with all of their other elements, without duplicates. But I'm not sure whether to use C# LINQ Union or just standard array access.
What is the best way to group it? Or are there any other solutions for this?

I think a pure LINQ solution would be overly complex. This isn't (if I understand your specification correctly) a simple union operation. You want to union based on non-empty intersections. That would mean having to first rearrange the data so LINQ can do a join, to find the data that matches, and since LINQ will only join on equality, doing that while preserving the original grouping information is going to result in syntax that would be more trouble than it's worth, IMHO.
Here is a non-LINQ approach that works for the example you've given:
static void Main(string[] args)
{
    char[][] letters =
    {
        new [] { 'A', 'B' },
        new [] { 'C', 'D' },
        new [] { 'B', 'A', 'F' },
        new [] { 'I', 'F', 'J' },
    };
    List<HashSet<char>> sets = new List<HashSet<char>>();
    foreach (char[] row in letters)
    {
        List<int> setIndexes = Enumerable.Range(0, sets.Count)
            .Where(i => row.Any(ch => sets[i].Contains(ch))).ToList();
        CoalesceSets(sets, row, setIndexes);
    }
    foreach (HashSet<char> set in sets)
    {
        Console.WriteLine("{ " + string.Join(", ", set) + " }");
    }
}
private static void CoalesceSets(List<HashSet<char>> sets, char[] row, List<int> setIndexes)
{
    if (setIndexes.Count == 0)
    {
        sets.Add(new HashSet<char>(row));
    }
    else
    {
        HashSet<char> targetSet = sets[setIndexes[0]];
        targetSet.UnionWith(row);
        for (int i = setIndexes.Count - 1; i >= 1; i--)
        {
            targetSet.UnionWith(sets[setIndexes[i]]);
            sets.RemoveAt(setIndexes[i]);
        }
    }
}
It builds up sets of the input data by scanning the previously identified sets to find which ones the current row of data intersects with, and then coalesces these sets into a single set containing all of the members (your specification appears to impose transitive membership…i.e. if one letter joins sets A and B, and a different letter joins set B and C, you want A, B, and C all joined into a single set).
This isn't an optimal solution, but it's readable. You could avoid the O(N^2) search by maintaining a Dictionary<char, int> to map each character to the set which contains it. Then instead of scanning all the sets, it's a simple lookup for each character in the current row, to build up the list of set indexes. But there's a lot more "housekeeping" code going that approach; I would not bother implementing it that way unless you find a proven performance issue doing it the more basic way.
By the way: I have a vague recollection I've seen this type of question before on Stack Overflow, i.e. this sort of transitive unioning of sets. I looked for the question but couldn't find it. You may have more luck, and may find there is additional helpful information with that question and its answers.
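This kind of transitive merging is what a disjoint-set (union-find) structure is designed for. The sketch below is a different technique from the list-of-sets code in the answer, written in Python only as an illustration; the function name group_connected is made up for the example. It maps each letter to a representative, in the spirit of the "dictionary from character to set" idea mentioned above.

```python
def group_connected(rows):
    """Merge rows that share any element, transitively, and return sorted groups."""
    parent = {}  # element -> parent element; a root is its own parent

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for row in rows:
        for item in row:
            union(row[0], item)  # tie every element of the row to the row's first element

    groups = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())

letters = [["A", "B"], ["C", "D"], ["B", "A", "F"], ["I", "F", "J"]]
groups = group_connected(letters)
print(groups)
```

For the sample input this yields one group with A, B, F, I, J and one with C, D, matching the grouping asked for in the question.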

Sorting a List<String> containing x and y coordinates

Can anyone please help me sort the following List of Strings:
The List<String> contains coordinates
[0] "0 0"
[1] "0 1"
[2] "0 2"
[3] "0 3"
[4] "1 1"
[5] "1 2"
[6] "1 3"
Although it may not always be in that order, I would like to make sure it is by sorting it (by X coordinate ascending, then by Y coordinate ascending).
I have tried this, but it does not alter the list at all (see below):
boardObjectList.OrderBy(p => (p.Split())[0]).ThenBy(p=> (p.Split())[1]);
Any ideas?
Thanks,
JP

OrderBy and ThenBy do not modify the original list; they only return a new sorted sequence (in the form of an IEnumerable<>). What you need to do is create a new List<> from the resulting IEnumerable<>, like this:
// Note that we are assigning the variable to a new list
boardObjectList = boardObjectList.OrderBy(p => (p.Split())[0])
                                 .ThenBy(p => (p.Split())[1])
                                 .ToList(); // Also note that we call ToList,
                                            // to get a List from an IEnumerable
You will get strange results when storing numbers in strings, and trying to sort. I recommend changing your code to this:
boardObjectList = boardObjectList.OrderBy(p => int.Parse(p.Split()[0]))
                                 .ThenBy(p => int.Parse(p.Split()[1]))
                                 .ToList();
This method converts the strings into integers before sorting. The reason to do this is that string sorting sorts alphabetically, leading to sorting like this:
1
10
11
12
2
3
4
5
6
7
8
9
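The difference between text ordering and numeric ordering is easy to reproduce in any language. Here is a small Python sketch, included only as an illustration, with made-up sample coordinates that contain a two-digit value.

```python
coords = ["10 2", "2 10", "2 9", "1 3"]

# Sorting on the split strings compares character by character, so "10" sorts before "2".
as_text = sorted(coords, key=lambda s: s.split())

# Converting each part to int first gives the intended numeric order.
as_numbers = sorted(coords, key=lambda s: tuple(int(part) for part in s.split()))

print(as_text)
print(as_numbers)
```

Only the key function differs between the two calls, which corresponds to adding int.Parse inside OrderBy and ThenBy.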

Here's a possible solution. I use a separate struct for the integer coordinate and translate the split string into instances of that.
// Defined elsewhere
struct Coord
{
    public int x;
    public int y;
}
// Where you're doing your work...
var intCoords = new List<Coord>();
foreach (var coord in boardObjectList)
{
    var str = coord.Split(new char[] { ' ' });
    intCoords.Add(new Coord() {
        x = Int32.Parse(str[0]),
        y = Int32.Parse(str[1])
    });
}
// Do the actual sort. Ensure you assign the result to a variable
var newCoords = intCoords.OrderBy(x => x.x).ThenBy(x => x.y).ToList();

Of course it won't alter the list at all - LINQ does not modify underlying collections, it simply creates queries.
What you are after is storing the query result in a new list:
boardObjectList = boardObjectList.OrderBy(p => (p.Split())[0]).ThenBy(p=> (p.Split())[1]).ToList();
EDIT: had another look. You should not compare strings like that - what will happen if any of those "numbers" will be larger than "9"? Here's the updated solution:
boardObjectList = boardObjectList
    .Select(p => new { P = p, Split = p.Split() })
    .OrderBy(x => int.Parse(x.Split[0]))
    .ThenBy(x => int.Parse(x.Split[1]))
    .Select(x => x.P)
    .ToList();
EDIT2: You can also do it without LINQ with less memory overhead:
boardObjectList.Sort((a, b) =>
{
// split a and b
var aSplit = a.Split();
var bSplit = b.Split();
// see if there's a difference in first coordinate
int diff = int.Parse(aSplit[0]) - int.Parse(bSplit[0]);
if (diff == 0)
{
// if there isn't, return difference in the second
return int.Parse(aSplit[1]) - int.Parse(bSplit[1]);
}
// positive if a.x>b.x, negative if a.x<b.x - exactly what sort expects
return diff;
});
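A possible variation on the comparison above, as a sketch only: int.CompareTo returns the sign directly, so there is no subtraction that could overflow for extreme values. The sample list is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data in the "x y" format used in the question.
var boardObjectList = new List<string> { "10 2", "2 11", "2 3", "1 7" };

boardObjectList.Sort((a, b) =>
{
    var aSplit = a.Split();
    var bSplit = b.Split();
    // Compare the first coordinate; fall back to the second only on a tie.
    int byX = int.Parse(aSplit[0]).CompareTo(int.Parse(bSplit[0]));
    return byX != 0
        ? byX
        : int.Parse(aSplit[1]).CompareTo(int.Parse(bSplit[1]));
});

Console.WriteLine(string.Join(" | ", boardObjectList)); // 1 7 | 2 3 | 2 11 | 10 2
```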

LINQ Aggregate algorithm explained

This might sound lame, but I have not been able to find a really good explanation of Aggregate.
Good means short, descriptive, comprehensive with a small and clear example.

The easiest-to-understand definition of Aggregate is that it performs an operation on each element of the list taking into account the operations that have gone before. That is to say it performs the action on the first and second element and carries the result forward. Then it operates on the previous result and the third element and carries forward. etc.
Example 1. Summing numbers
var nums = new[]{1,2,3,4};
var sum = nums.Aggregate( (a,b) => a + b);
Console.WriteLine(sum); // output: 10 (1+2+3+4)
This adds 1 and 2 to make 3. Then adds 3 (result of previous) and 3 (next element in sequence) to make 6. Then adds 6 and 4 to make 10.
Example 2. create a csv from an array of strings
var chars = new []{"a","b","c","d"};
var csv = chars.Aggregate( (a,b) => a + ',' + b);
Console.WriteLine(csv); // Output a,b,c,d
This works in much the same way. It concatenates a, a comma and b to make a,b. Then it concatenates a,b with a comma and c to make a,b,c, and so on.
Example 3. Multiplying numbers using a seed
For completeness, there is an overload of Aggregate which takes a seed value.
var multipliers = new []{10,20,30,40};
var multiplied = multipliers.Aggregate(5, (a,b) => a * b);
Console.WriteLine(multiplied); //Output 1200000 ((((5*10)*20)*30)*40)
Much like the above examples, this starts with a value of 5 and multiplies it by the first element of the sequence, 10, giving a result of 50. This result is carried forward and multiplied by the next number in the sequence, 20, to give a result of 1000. This continues through the remaining 2 elements of the sequence.
Live examples: http://rextester.com/ZXZ64749
Docs: http://msdn.microsoft.com/en-us/library/bb548651.aspx
Addendum
Example 2, above, uses string concatenation to create a list of values separated by a comma. This is a simplistic way to explain the use of Aggregate which was the intention of this answer. However, if using this technique to actually create a large amount of comma separated data, it would be more appropriate to use a StringBuilder, and this is entirely compatible with Aggregate using the seeded overload to initiate the StringBuilder.
var chars = new []{"a","b","c", "d"};
var csv = chars.Aggregate(new StringBuilder(), (a,b) => {
if(a.Length>0)
a.Append(",");
a.Append(b);
return a;
});
Console.WriteLine(csv);
Updated example: http://rextester.com/YZCVXV6464

It partly depends on which overload you're talking about, but the basic idea is:
Start with a seed as the "current value"
Iterate over the sequence. For each value in the sequence:
Apply a user-specified function to transform (currentValue, sequenceValue) into (nextValue)
Set currentValue = nextValue
Return the final currentValue
You may find the Aggregate post in my Edulinq series useful - it includes a more detailed description (including the various overloads) and implementations.
One simple example is using Aggregate as an alternative to Count:
// 0 is the seed, and for each item, we effectively increment the current value.
// In this case we can ignore "item" itself.
int count = sequence.Aggregate(0, (current, item) => current + 1);
Or perhaps summing all the lengths of strings in a sequence of strings:
int total = sequence.Aggregate(0, (current, item) => current + item.Length);
Personally I rarely find Aggregate useful - the "tailored" aggregation methods are usually good enough for me.
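For comparison, here is a small sketch of the two Aggregate calls above next to the tailored methods that give the same results. The sample strings are invented for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical sample sequence.
var sequence = new[] { "apple", "fig", "kiwi" };

// The Aggregate versions from the answer.
int countViaAggregate = sequence.Aggregate(0, (current, item) => current + 1);
int totalViaAggregate = sequence.Aggregate(0, (current, item) => current + item.Length);

// The tailored equivalents.
int count = sequence.Count();
int total = sequence.Sum(item => item.Length);

Console.WriteLine($"{countViaAggregate} {count}"); // 3 3
Console.WriteLine($"{totalViaAggregate} {total}"); // 12 12
```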

Super short
Aggregate works like fold in Haskell/ML/F#.
Slightly longer
.Max(), .Min(), .Sum(), .Average() all iterate over the elements in a sequence and aggregate them using the respective aggregate function. .Aggregate() is a generalized aggregator in that it allows the developer to specify the start state (aka seed) and the aggregate function.
I know you asked for a short explanation, but since others gave a couple of short answers I figured you might be interested in a slightly longer one.
Long version with code
One way to illustrate what it does is to show how you would implement sample standard deviation once using foreach and once using .Aggregate. Note: I haven't prioritized performance here, so I iterate over the collection several times unnecessarily.
First a helper function used to create a sum of quadratic distances:
static double SumOfQuadraticDistance (double average, int value, double state)
{
var diff = (value - average);
return state + diff * diff;
}
Then Sample Standard Deviation using ForEach:
static double SampleStandardDeviation_ForEach (
this IEnumerable<int> ints)
{
var length = ints.Count ();
if (length < 2)
{
return 0.0;
}
const double seed = 0.0;
var average = ints.Average ();
var state = seed;
foreach (var value in ints)
{
state = SumOfQuadraticDistance (average, value, state);
}
var sumOfQuadraticDistance = state;
return Math.Sqrt (sumOfQuadraticDistance / (length - 1));
}
Then once using .Aggregate:
static double SampleStandardDeviation_Aggregate (
this IEnumerable<int> ints)
{
var length = ints.Count ();
if (length < 2)
{
return 0.0;
}
const double seed = 0.0;
var average = ints.Average ();
var sumOfQuadraticDistance = ints
.Aggregate (
seed,
(state, value) => SumOfQuadraticDistance (average, value, state)
);
return Math.Sqrt (sumOfQuadraticDistance / (length - 1));
}
Note that these functions are identical except for how sumOfQuadraticDistance is calculated:
var state = seed;
foreach (var value in ints)
{
state = SumOfQuadraticDistance (average, value, state);
}
var sumOfQuadraticDistance = state;
Versus:
var sumOfQuadraticDistance = ints
.Aggregate (
seed,
(state, value) => SumOfQuadraticDistance (average, value, state)
);
So what .Aggregate does is that it encapsulates this aggregator pattern and I expect that the implementation of .Aggregate would look something like this:
public static TAggregate Aggregate<TAggregate, TValue> (
this IEnumerable<TValue> values,
TAggregate seed,
Func<TAggregate, TValue, TAggregate> aggregator
)
{
var state = seed;
foreach (var value in values)
{
state = aggregator (state, value);
}
return state;
}
Using the Standard deviation functions would look something like this:
var ints = new[] {3, 1, 4, 1, 5, 9, 2, 6, 5, 4};
var average = ints.Average ();
var sampleStandardDeviation = ints.SampleStandardDeviation_Aggregate ();
var sampleStandardDeviation2 = ints.SampleStandardDeviation_ForEach ();
Console.WriteLine (average);
Console.WriteLine (sampleStandardDeviation);
Console.WriteLine (sampleStandardDeviation2);
IMHO
So does .Aggregate help readability? In general I love LINQ because I think .Where, .Select, .OrderBy and so on greatly help readability (if you avoid inlined hierarchical .Selects). Aggregate has to be in LINQ for completeness reasons, but personally I am not so convinced that .Aggregate adds readability compared to a well-written foreach.

A picture is worth a thousand words
Reminder:
Func<X, Y, R> is a function with two inputs of type X and Y, that returns a result of type R.
Enumerable.Aggregate has three overloads:
Overload 1:
A Aggregate<A>(IEnumerable<A> a, Func<A, A, A> f)
Example:
new[]{1,2,3,4}.Aggregate((x, y) => x + y); // 10
This overload is simple, but it has the following limitations:
the sequence must contain at least one element,
otherwise the function will throw an InvalidOperationException.
elements and result must be of the same type.
Overload 2:
B Aggregate<A, B>(IEnumerable<A> a, B bIn, Func<B, A, B> f)
Example:
var hayStack = new[] {"straw", "needle", "straw", "straw", "needle"};
var nNeedles = hayStack.Aggregate(0, (n, e) => e == "needle" ? n+1 : n); // 2
This overload is more general:
a seed value must be provided (bIn).
the collection can be empty,
in this case, the function will yield the seed value as result.
elements and result can have different types.
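The differences between the first two overloads can be checked with a short sketch like the following (the variable names are only for illustration):

```csharp
using System;
using System.Linq;

var empty = Array.Empty<int>();

// Overload 1 (no seed) throws on an empty sequence.
bool threw = false;
try
{
    empty.Aggregate((x, y) => x + y);
}
catch (InvalidOperationException)
{
    threw = true;
}

// Overload 2 (with seed) simply returns the seed.
int seeded = empty.Aggregate(0, (x, y) => x + y);

// With a seed, the result type may differ from the element type (int -> string).
string digits = new[] { 1, 2, 3 }.Aggregate("", (s, n) => s + n);

Console.WriteLine(threw);  // True
Console.WriteLine(seeded); // 0
Console.WriteLine(digits); // 123
```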
Overload 3:
C Aggregate<A,B,C>(IEnumerable<A> a, B bIn, Func<B,A,B> f, Func<B,C> f2)
The third overload is not very useful IMO.
The same can be written more succinctly by using overload 2 followed by a function that transforms its result.
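As a rough sketch of that equivalence, reusing the haystack example from overload 2 (the "needles" label is just an invented result format):

```csharp
using System;
using System.Linq;

var hayStack = new[] { "straw", "needle", "straw", "straw", "needle" };

// Overload 3: the last argument transforms the final accumulator value.
string viaSelector = hayStack.Aggregate(
    0,
    (n, e) => e == "needle" ? n + 1 : n,
    n => $"{n} needles");

// Overload 2 followed by the same transformation.
int nNeedles = hayStack.Aggregate(0, (n, e) => e == "needle" ? n + 1 : n);
string viaOverload2 = $"{nNeedles} needles";

Console.WriteLine(viaSelector);  // 2 needles
Console.WriteLine(viaOverload2); // 2 needles
```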
The illustrations are adapted from this excellent blogpost.

Aggregate is basically used to combine (for example, sum up) the items of a sequence into a single value.
According to MSDN:
"Applies an accumulator function over a sequence."
Example 1: Add all the numbers in a array.
int[] numbers = new int[] { 1,2,3,4,5 };
int aggregatedValue = numbers.Aggregate((total, nextValue) => total + nextValue);
*Important: by default, the initial aggregate value is the first element of the sequence,
i.e. total starts at 1 here, because 1 is the first element of the array.
variable explanation
total: holds the running sum (the aggregated value) returned by the func.
nextValue: the next value in the array. This value is then added to the aggregated value, i.e. total.
Example 2: Add all items in an array, with the initial accumulator value set to 10.
int[] numbers = new int[] { 1,2,3,4,5 };
int aggregatedValue = numbers.Aggregate(10, (total, nextValue) => total + nextValue);
arguments explanation:
the first argument is the initial value (the seed), which is used as the starting point when adding the first value in the array.
the second argument is a func that takes 2 ints:
1. total: as before, this holds the running sum (the aggregated value) returned by the func after each calculation.
2. nextValue: the next value in the array. This value is then added to the aggregated value, i.e. total.
Debugging this code will also give you a better understanding of how Aggregate works.

In addition to all the great answers here already, I've also used it to walk an item through a series of transformation steps.
If a transformation is implemented as a Func<T,T>, you can add several transformations to a List<Func<T,T>> and use Aggregate to walk an instance of T through each step.
A more concrete example
You want to take a string value and walk it through a series of text transformations that could be built programmatically.
var transformationPipeLine = new List<Func<string, string>>();
transformationPipeLine.Add((input) => input.Trim());
transformationPipeLine.Add((input) => input.Substring(1));
transformationPipeLine.Add((input) => input.Substring(0, input.Length - 1));
transformationPipeLine.Add((input) => input.ToUpper());
var text = " cat ";
var output = transformationPipeLine.Aggregate(text, (input, transform)=> transform(input));
Console.WriteLine(output);
This will create a chain of transformations: Remove leading and trailing spaces -> remove first character -> remove last character -> convert to upper-case. Steps in this chain can be added, removed, or reordered as needed, to create whatever kind of transformation pipeline is required.
The end result of this specific pipeline, is that " cat " becomes "A".
This can become very powerful once you realize that T can be anything. It could be used for image transformations such as filters, using Bitmap as an example.
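To illustrate the "T can be anything" point, here is a minimal sketch with a generic helper. RunPipeline is a hypothetical name introduced for this example, not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: walks an input of any type T through each step in order.
static T RunPipeline<T>(T input, IEnumerable<Func<T, T>> steps) =>
    steps.Aggregate(input, (value, step) => step(value));

// The same helper works for numbers...
var numberSteps = new List<Func<int, int>> { x => x + 3, x => x * 2, x => x - 1 };
int number = RunPipeline<int>(5, numberSteps); // ((5 + 3) * 2) - 1

// ...and for strings.
var textSteps = new List<Func<string, string>> { s => s.Trim(), s => s.ToUpper() };
string text = RunPipeline<string>(" cat ", textSteps);

Console.WriteLine(number); // 15
Console.WriteLine(text);   // CAT
```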

Learned a lot from Jamiec's answer.
If the only need is to generate CSV string, you may try this.
var csv3 = string.Join(",",chars);
Here is a test with 1 million strings
0.28 seconds = Aggregate w/ String Builder
0.30 seconds = String.Join
Source code is here

Definition
Aggregate is an extension method for generic collections (IEnumerable<T>). It applies a function to each item of a collection, and it also takes the result of each call as the initial value for the next iteration. As a result, we get a single computed value (min, max, avg, or another statistical value) from a collection.
In that sense, Aggregate is a safe way to implement a recursive function.
Safe, because it iterates over each item of the collection exactly once, so a wrong exit condition can't cause an infinite loop. Recursive, because the current function's result is used as a parameter for the next function call.
Syntax:
collection.Aggregate(seed, func, resultSelector);
seed - the initial value;
func - our recursive function. It can be a lambda expression, a Func delegate, or a method of the form T F(T result, T nextValue);
resultSelector - a function (like func) or an expression that computes, transforms or converts the final result.
How it works:
var nums = new[]{1, 2};
var result = nums.Aggregate(1, (acc, n) => acc + n); //result = (1 + 1) + 2 = 4
var result2 = nums.Aggregate(0, (acc, n) => acc + n, response => (decimal)response / 2); //result2 = ((0 + 1) + 2) / 2 = 3 / 2 = 1.5
Practical usage:
Find Factorial from a number n:
int n = 7;
var numbers = Enumerable.Range(1, n);
var factorial = numbers.Aggregate((result, x) => result * x);
which is doing the same thing as this function:
public static int Factorial(int n)
{
if (n < 1) return 1;
return n * Factorial(n - 1);
}
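One difference worth noting: the unseeded Aggregate throws on an empty range (n = 0), while the recursive function returns 1. A possible sketch that matches the recursive version for n = 0 uses a seed of 1. The long accumulator is my own choice here, since int overflows beyond 12!.

```csharp
using System;
using System.Linq;

// Seeded variant: an empty range (n == 0) yields the seed, i.e. 0! == 1.
// Note: Enumerable.Range itself throws for a negative count.
static long Factorial(int n) =>
    Enumerable.Range(1, n).Aggregate(1L, (result, x) => result * x);

Console.WriteLine(Factorial(0)); // 1
Console.WriteLine(Factorial(7)); // 5040
```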
Aggregate() is one of the most powerful LINQ extension methods, like Select() and Where(). We can use it to replace the functionality of Sum(), Min(), Max() and Average(), or to change it by adding extra context:
var numbers = new[]{3, 2, 6, 4, 9, 5, 7};
var avg = numbers.Aggregate(0.0, (result, x) => result + x, response => (double)response/(double)numbers.Count());
var min = numbers.Aggregate((result, x) => (result < x)? result: x);
More complex usage of extension methods:
var path = @"c:\path-to-folder";
string[] txtFiles = Directory.GetFiles(path).Where(f => f.EndsWith(".txt")).ToArray();
var output = txtFiles.Select(f => File.ReadAllText(f, Encoding.Default)).Aggregate((result, content) => result + content);
var summaryPath = Path.Combine(path, "summary.txt"); // Path.Combine inserts the directory separator
File.WriteAllText(summaryPath, output, Encoding.Default);
Console.WriteLine("Text files merged into: {0}", summaryPath); //or other log info

This is an explanation about using Aggregate on a Fluent API such as Linq Sorting.
var list = new List<Student>();
var sorted = list
.OrderBy(s => s.LastName)
.ThenBy(s => s.FirstName)
.ThenBy(s => s.Age)
.ThenBy(s => s.Grading)
.ThenBy(s => s.TotalCourses);
Let's say we want to implement a sort function that takes a set of fields. This is very easy using Aggregate instead of a for-loop, like this:
public static IOrderedEnumerable<Student> MySort(
this List<Student> list,
params Func<Student, object>[] fields)
{
var firstField = fields.First();
var otherFields = fields.Skip(1);
var init = list.OrderBy(firstField);
return otherFields.Aggregate(init, (resultList, current) => resultList.ThenBy(current));
}
And we can use it like this:
var sorted = list.MySort(
s => s.LastName,
s => s.FirstName,
s => s.Age,
s => s.Grading,
s => s.TotalCourses);
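For anyone who wants to try the idea without a Student class, here is a self-contained sketch of the same technique. The tuple shape and the sample data are assumptions made for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: tuples stand in for the Student class.
var students = new List<(string LastName, string FirstName, int Age)>
{
    ("Smith", "Zoe", 20),
    ("Jones", "Amy", 22),
    ("Smith", "Adam", 21),
    ("Smith", "Adam", 19),
};

// The sort keys, in priority order.
var keys = new Func<(string LastName, string FirstName, int Age), object>[]
{
    s => s.LastName,
    s => s.FirstName,
    s => s.Age,
};

// OrderBy on the first key seeds the fold; every remaining key adds a ThenBy.
var sorted = keys.Skip(1).Aggregate(
    students.OrderBy(keys[0]),
    (ordered, key) => ordered.ThenBy(key));

var printed = string.Join(" | ", sorted.Select(s => $"{s.LastName} {s.FirstName} {s.Age}"));
Console.WriteLine(printed); // Jones Amy 22 | Smith Adam 19 | Smith Adam 21 | Smith Zoe 20
```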

Aggregate can be used to sum the columns of a jagged (array-of-arrays) integer array:
int[][] nonMagicSquare =
{
new int[] { 3, 1, 7, 8 },
new int[] { 2, 4, 16, 5 },
new int[] { 11, 6, 12, 15 },
new int[] { 9, 13, 10, 14 }
};
IEnumerable<int> rowSums = nonMagicSquare
.Select(row => row.Sum());
IEnumerable<int> colSums = nonMagicSquare
.Aggregate(
(priorSums, currentRow) =>
priorSums.Select((priorSum, index) => priorSum + currentRow[index]).ToArray()
);
Select with index is used within the Aggregate func to sum the matching columns and return a new Array; { 3 + 2 = 5, 1 + 4 = 5, 7 + 16 = 23, 8 + 5 = 13 }.
Console.WriteLine("rowSums: " + string.Join(", ", rowSums)); // rowSums: 19, 27, 44, 46
Console.WriteLine("colSums: " + string.Join(", ", colSums)); // colSums: 25, 24, 45, 42
But counting the number of trues in a Boolean array is more difficult since the accumulated type (int) differs from the source type (bool); here a seed is necessary in order to use the second overload.
bool[][] booleanTable =
{
new bool[] { true, true, true, false },
new bool[] { false, false, false, true },
new bool[] { true, false, false, true },
new bool[] { true, true, false, false }
};
IEnumerable<int> rowCounts = booleanTable
.Select(row => row.Select(value => value ? 1 : 0).Sum());
IEnumerable<int> seed = new int[booleanTable.First().Length];
IEnumerable<int> colCounts = booleanTable
.Aggregate(seed,
(priorSums, currentRow) =>
priorSums.Select((priorSum, index) => priorSum + (currentRow[index] ? 1 : 0)).ToArray()
);
Console.WriteLine("rowCounts: " + string.Join(", ", rowCounts)); // rowCounts: 3, 1, 2, 2
Console.WriteLine("colCounts: " + string.Join(", ", colCounts)); // colCounts: 3, 2, 1, 2

Everyone has given their explanation. Here is mine.
The Aggregate method applies a function to each item of a collection. For example, take the collection { 6, 2, 8, 3 } and the function Add (operator +): Aggregate computes (((6+2)+8)+3) and returns 19.
var numbers = new List<int> { 6, 2, 8, 3 };
int sum = numbers.Aggregate(func: (result, item) => result + item);
// sum: (((6+2)+8)+3) = 19
In this example, a named method Add is passed instead of a lambda expression.
var numbers = new List<int> { 6, 2, 8, 3 };
int sum = numbers.Aggregate(func: Add);
// sum: (((6+2)+8)+3) = 19
private static int Add(int x, int y) { return x + y; }

A short and essential definition might be this: the LINQ Aggregate extension method lets you declare a kind of recursive function over the elements of a list. The function takes two operands: the elements of the list, one at a time and in order, and the result of the previous iteration (or nothing, on the first step).
In this way you can compute the factorial of a number, or concatenate strings.
