C# Splitting a List<string> Value - c#

I have a List with values {"1 120 12", "1 130 22", "2 110 21", "2 100 18"}, etc.
List<string> myList = new List<string>();
myList.Add("1 120 12");
myList.Add("1 130 22");
myList.Add("2 110 21");
myList.Add("2 100 18");
I need to group by the first number (the ID) and sum the subsequent values for each ID,
i.e. for ID = 1 -> 120+130=250 and 12+22=34, and so on. I have to return an array with these values.
I know I can get these individual values, add them to an array and split it by the empty space between them with something like:
string[] arr2 = arr[i].Split(' ');
and loop thru them to do the sum of each value, but... is there an easy way to do it straight using Lists or Linq Lambda expression?

You can do it in LINQ like this:
var result = myList.Select(x => x.Split(' ').Select(int.Parse))
.GroupBy(x => x.First())
.Select(x => x.Select(y => y.Skip(1).ToArray())
.Aggregate(new [] {0,0}, (y,z) => new int[] {y[0] + z[0], y[1] + z[1]}));
First, the strings are split and converted to int, then they are grouped by ID, then the ID is dropped, and in the end, they are summed together.
But I strongly recommend not doing it in LINQ, because this expression is not easy to understand. If you do it the classic way with a loop, it is quite clear what is going on at first sight. But put this code containing the loop into a separate method, because that way it won't distract you and you still only call a one-liner as in the LINQ solution.
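As a rough illustration of the loop-in-a-separate-method approach recommended above, here is a minimal sketch. The method name SumById and the choice of a Dictionary<int, int[]> as the return type are assumptions made for this example, not something the question prescribes. It is written with top-level statements (C# 9 or later) and does no input validation.

```csharp
using System;
using System.Collections.Generic;

var myList = new List<string> { "1 120 12", "1 130 22", "2 110 21", "2 100 18" };

// The call site stays a one-liner, just like the LINQ version.
Dictionary<int, int[]> result = SumById(myList);

foreach (var entry in result)
{
    Console.WriteLine($"ID {entry.Key}: {entry.Value[0]}, {entry.Value[1]}");
}

// Hypothetical helper: sums the second and third numbers of each line,
// keyed by the first number (the ID). Assumes every line holds three integers.
static Dictionary<int, int[]> SumById(IEnumerable<string> lines)
{
    var sums = new Dictionary<int, int[]>();
    foreach (string line in lines)
    {
        string[] parts = line.Split(' ');
        int id = int.Parse(parts[0]);

        // First time we see this ID: start its totals at zero.
        if (!sums.TryGetValue(id, out int[] totals))
        {
            totals = new int[2];
            sums[id] = totals;
        }

        totals[0] += int.Parse(parts[1]);
        totals[1] += int.Parse(parts[2]);
    }
    return sums;
}
```

If an array is really required as the final result, the dictionary values can be projected into one afterwards.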

To do it straight, no LINQ, perhaps:
var d = new Dictionary<string, (int A, int B)>();
foreach(var s in myList){
var bits = s.Split();
if(!d.ContainsKey(bits[0]))
d[bits[0]] = (int.Parse(bits[1]), int.Parse(bits[2]));
else {
(int A, int B) x = d[bits[0]];
d[bits[0]] = (x.A + int.Parse(bits[1]), x.B + int.Parse(bits[2]));
}
}
Using LINQ to parse the int, and switching to using TryGetValue, will tidy it up a bit:
var d = new Dictionary<int, (int A, int B)>();
foreach(var s in myList){
var bits = s.Split().Select(int.Parse).ToArray();
if(d.TryGetValue(bits[0], out (int A, int B) x))
d[bits[0]] = (x.A + bits[1], x.B + bits[2]);
else
d[bits[0]] = (bits[1], bits[2]);
}
Introducing a local function to safely get either the existing nums in the dictionary or a (0,0) pair might reduce it a bit too:
var d = new Dictionary<int, (int A, int B)>();
(int A, int B) safeGet(int i) => d.ContainsKey(i) ? d[i]: (0,0);
foreach(var s in myList){
var bits = s.Split().Select(int.Parse).ToArray();
var nums = safeGet(bits[0]);
d[bits[0]] = (bits[1] + nums.A, bits[2] + nums.B);
}
Is it any more readable than a linq version? Hmm... Depends on your experience with Linq, and tuples, I suppose..

I know this question already has a lot of answers, but I have not seen one yet that focuses on readability.
If you split your code into a parsing phase and a calculation phase, we can use LINQ without sacrificing readability or maintainability, because each phase only does one thing:
List<string> myList = new List<string>();
myList.Add("1 120 12");
myList.Add("1 130 22");
myList.Add("2 110 21");
myList.Add("2 100 18");
var parsed = (from item in myList
let split = item.Split(' ')
select new
{
ID = int.Parse(split[0]),
Foo = int.Parse(split[1]),
Bar = int.Parse(split[2])
});
var summed = (from item in parsed
group item by item.ID into groupedByID
select new
{
ID = groupedByID.Key,
SumOfFoo = groupedByID.Sum(g => g.Foo),
SumOfBar = groupedByID.Sum(g => g.Bar)
}).ToList();
foreach (var s in summed)
{
Console.WriteLine($"ID: {s.ID}, SumOfFoo: {s.SumOfFoo}, SumOfBar: {s.SumOfBar}");
}

You can do it in LINQ if you want, but I think a plain loop will be much easier to edit and optimize. In my experience, logic like this rarely stays this simple for long: usually we need to add more values, more parsing, and so on, which makes a single LINQ expression less suitable for everyday use.
var query = myList.Select(a => a.Split(' ').Select(int.Parse).ToArray())
.GroupBy(
index => index[0],
amount => new
{
First = amount[1],
Second = amount[2]
},
(index, amount) => new
{
Index = index,
SumFirst = amount.Sum(a => a.First),
SumSecond = amount.Sum(a => a.Second)
}
);

is there an easy way to do it straight using Lists or Linq Lambda expression?
Maybe, is it wise to do this? Probably not. Your code will be hard to understand, impossible to unit test, the code will probably not be reusable, and small changes are difficult.
But let's first answer your question as a one LINQ statement:
const char separatorChar = ' ';
IEnumerable<string> inputText = ...
var result = inputText
// split every input string and parse each part as an integer
.Select(text => text.Split(separatorChar).Select(Int32.Parse))
.Select(numbers => new
{
Id = numbers.First(),
Sum = numbers.Skip(1).Sum(),
});
Not reusable, hard to unit test, difficult to change, not efficient, do you need more arguments?
It would be better to have a procedure that converts one input string into a proper object that contains what your input string really represents.
Alas, you didn't tell us whether every input string contains three integer numbers, or whether some might contain invalid text, or more or fewer than three integer numbers.
You also forgot to tell us what your input string represents.
So I'll just make up an identifier:
class ProductSize
{
public int ProductId {get; set;} // The first number in the string
public int Width {get; set;} // The 2nd number
public int Height {get; set;} // The 3rd number
}
You need a static procedure with input a string, and output one ProductSize:
public static ProductSize FromText(string productSizeText)
{
// Todo: check input
const char separatorChar = ' ';
var splitNumbers = productSizeText.Split(separatorChar)
.Select(splitText => Int32.Parse(splitText))
.ToList();
return new ProductSize
{
ProductId = splitNumbers[0],
Width = splitNumbers[1],
Height = splitNumbers[2],
};
}
I need to count based on the first number (ID) is and sum the consequent values for this IDs
After creating the method FromText this is easy:
IEnumerable<string> textProductSizes = ...
var result = textProductSizes.Select(text => ProductSize.FromText(text))
.Select(productSize => new
{
Id = productSize.ProductId,
Sum = productSize.Width + productSize.Height,
});
If your strings do not always have three numbers
If you don't have always three numbers, then you won't have Width and Height, but a property:
IEnumerable<int> Numbers {get; set;} // TODO: invent proper name
And in FromText:
var splitText = productSizeText.Split(separatorChar);
return new ProductSize
{
ProductId = Int32.Parse(splitText[0]),
// deferred: the numbers are only parsed when Numbers is enumerated
Numbers = splitText.Skip(1)
.Select(text => Int32.Parse(text)),
};
I deliberately keep it an IEnumerable, so if you don't use all Numbers, you won't have parsed numbers for nothing.
The LINQ:
var result = textProductSizes.Select(text => ProductSize.FromText(text))
.Select(productSize => new
{
Id = productSize.ProductId,
Sum = productSize.Numbers.Sum(),
});
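To tie this back to the original question (summing each column per ID, with the input check that was left as a Todo above), here is a minimal sketch. To keep it self-contained it uses a named tuple as a stand-in for the ProductSize class, and TryParseProductSize is a hypothetical name. It assumes that lines which are not exactly three integers should simply be skipped, which may not be what you want. Top-level statements require C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var textProductSizes = new List<string> { "1 120 12", "1 130 22", "oops", "2 110 21", "2 100 18" };

var totals = textProductSizes
    .Select(text => TryParseProductSize(text))
    .Where(size => size.HasValue)          // assumption: silently skip invalid lines
    .Select(size => size.Value)
    .GroupBy(size => size.ProductId)
    .Select(g => new
    {
        ProductId = g.Key,
        TotalWidth = g.Sum(s => s.Width),
        TotalHeight = g.Sum(s => s.Height),
    })
    .ToList();

foreach (var t in totals)
{
    Console.WriteLine($"ProductId: {t.ProductId}, TotalWidth: {t.TotalWidth}, TotalHeight: {t.TotalHeight}");
}

// Hypothetical stand-in for ProductSize.FromText:
// returns null when the text does not consist of exactly three integers.
static (int ProductId, int Width, int Height)? TryParseProductSize(string text)
{
    string[] parts = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    if (parts.Length == 3
        && int.TryParse(parts[0], out int id)
        && int.TryParse(parts[1], out int width)
        && int.TryParse(parts[2], out int height))
    {
        return (id, width, height);
    }
    return null;
}
```

Because parsing and summing are separate steps, the parsing function can be unit tested on its own, which is the point this answer is making.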

Related

Convert ordered comma separated list into tuples with ordered element number (a la SQL STRING_SPLIT) using C# 6.0/.Net Framework 4.8

I can't seem to find a ready answer to this, or even if the question has ever been asked before, but I want functionality similar to the SQL STRING_SPLIT functions floating around, where each item in a comma separated list is identified by its ordinal in the string.
Given the string "abc,xyz,def,tuv", I want to get a list of tuples like:
<1, "abc">
<2, "xyz">
<3, "def">
<4, "tuv">
Order is important, and I need to preserve the order, and be able to take the list and further join it with another list using linq, and be able to preserve the order. For example, if a second list is <"tuv", "abc">, I want the final output of the join to be:
<1, "abc">
<4, "tuv">
Basically, I want the comma separated string to determine the ORDER of the end result, where the comma separated string contains ALL possible strings, and it is joined with an unordered list of a subset of strings, and the output is a list of ordered tuples that consists only of the elements in the second list, but in the order determined by the comma separated string at the beginning.
I could likely figure out all of this on my own if I could just get a C# equivalent to all the various SQL STRING_SPLIT functions out there, which do the split but also include the ordinal element number in the output. But I've searched, and I find nothing for C# but splitting a string into individual elements, or splitting them into tuples where both elements of the tuple are in the string itself, not generated integers to preserve order.
The order is the important thing to me here. So if an element number isn't readily possible, a way to inner join two lists and guarantee preserving the order of the first list while returning only those elements in the second list would be welcome. The tricky part for me is this last part: the result of a join needs a specific (not easy to sort by) order. The ordinal number would give me something to sort by, but if I can inner join with some guarantee the output is in the same order as the first input, that'd work too.
That should work on .NET framework.
using System.Linq;
string str = "abc,xyz,def,tuv";
string str2 = "abc,tuv";
IEnumerable<PretendFileObject> secondList = str2.Split(',').Select(x => new PretendFileObject() { FileName = x });
var tups = str.Split(',')
.Select((x, i) => { return (i + 1, x); })
.Join(secondList, //Join Second list ON
item => item.Item2 //This is the filename in the tuples
,item2 => item2.FileName, // This is the filename property for a given object in the second list to join on
(item,item2) => new {Index = item.Item1,FileName = item.Item2, Obj = item2})
.OrderBy(JoinedObject=> JoinedObject.Index)
.ToList();
foreach (var tup in tups)
{
Console.WriteLine(tup.Obj.FileName);
}
public class PretendFileObject
{
public string FileName { get; set; }
public string Foo { get; set; }
}
Original Response Below
If you want to stick to something SQL-like, here is how to do it with LINQ operators. The Select method has an overload with a built-in index parameter you can make use of, and you can use IntersectBy (available from .NET 6) to perform an easy inner join.
using System.Linq;
string str = "abc,xyz,def,tuv";
string str2 = "abc,tuv";
var secondList = str2.Split(',');
var tups = str.Split(',')
.Select((x, i) => { return (i + 1, x); })
.IntersectBy(secondList, s=>s.Item2) //Filter down to only the strings found in both.
.ToList();
foreach(var tup in tups)
{
Console.WriteLine(tup);
}
This will get you list of tuples
var input = "abc,xyz,def,tuv";
string[] items = input.Split(',');
var tuples = new List<(int, string)>();
for (int i = 0; i < items.Length; i++)
{
tuples.Add(((i + 1), items[i]));
}
If you then want to match this against a list containing "tuv" and "abc" and keep their numbers, you probably want a "Left Join". I am not sure how best to do that using LINQ, because you first need to iterate the original list to assign each item its int, and then join. Alternatively, you can join first and assign the int afterwards, but then the order is technically not guaranteed. If you assign the int first, you can sort by it at the end.
I am slightly confused by "and be able to take the list and further join it with another list using linq". A join usually means combining data into an aggregate result, but in your case it seems you want a subset, not joined data.
--
"I want to remove any items from the second list that are not in the first list, and then I need to iterate over the second list IN THE ORDER of the first list"
var input2 = "xxx,xyz,yyy,tuv,";
string[] items2 = input2.Split(',');
IEnumerable<(int, string)> finalTupleOutput =
tuples.Join(items2, t => t.Item2, i2 => i2, (t, i2) => (t.Item1, i2)).OrderBy(tpl => tpl.Item1);
This will give you what you want: the matching items from the second list, in the order of the first list.
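If the second list is only needed for membership, a filter is another way to get the same result without a join or a sort. The sketch below is an assumption about what the asker needs, not part of the answer above: it numbers the items first and then keeps only those present in a HashSet, relying on the fact that Where yields elements in source order. The variable names are made up for the example, and value tuples need C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string ordered = "abc,xyz,def,tuv";
var subset = new HashSet<string> { "tuv", "abc" };   // unordered second list

List<(int Ordinal, string Value)> result = ordered
    .Split(',')
    .Select((value, index) => (Ordinal: index + 1, Value: value)) // 1-based ordinal
    .Where(item => subset.Contains(item.Value))                   // keeps source order
    .ToList();

foreach (var item in result)
{
    Console.WriteLine($"<{item.Ordinal}, \"{item.Value}\">");
}
```

For the sample data this keeps "abc" with ordinal 1 and "tuv" with ordinal 4, in that order.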
with LINQ
string inputString = "abc,xyz,def,tuv";
var output = inputString.Split(',')
.Select((item, index) => { return (index + 1, item); });
now you can use the output list as you want to use.
Not 100% sure what you're after, but here's an attempt:
string[] vals = new[] { "abc", "xyz", "dev", "tuv"};
string[] results = new string[vals.Length];
int index = 0;
for (int i = 0; i < vals.Length; i++)
{
results[i] = $"<{++index},\"{vals[i]}\">";
}
foreach (var item in results)
{
Console.WriteLine(item);
}
This produces:
<1,"abc">
<2,"xyz">
<3,"dev">
<4,"tuv">
Given the example
For example, if a second list is <"tuv", "abc">, I want the final
output of the join to be:
<1, "abc"> <4, "tuv">
I think this might be close?
List<string> temp = new List<string>() { "abc", "def", "xyz", "tuv" };
List<string> temp2 = new List<string>() { "dbc", "ace", "zyw", "tke", "abc", "xyz" };
var intersect = temp.Intersect(temp2).Select((list, idx) => (idx+1, list));
This produces an intersect result that has the elements from list 1 that are also in list 2, which in this case would be:
<1, "abc">
<2, "xyz">
If you want all the elements from both lists, switch the Intersect to Union.

Get Elements from String List in order of Occurrence in provided string

Hi I have List of strings as below.
List<string> MyList = new List<string> { "[FirstName]", "[LastName]", "[VoicePhoneNumber]", "[SMSPhoneNumber]" };
I need to get all the elements from the List if exist in string in order. For example my string is
string MessageContent = "Hello [LastName] [FirstName]There, this message is for [SMSPhoneNumber]";
Right now I am doing
var Exists = MyList.Where(MessageContent.Contains);
This new list has all the items from MyList which occurred in the MessageContent string, but not in order.
How can I get them in order of occurrence in the string?
Desired List as per example is = { "[LastName]","[FirstName]","[SMSPhoneNumber]" }
I would suggest using IndexOf to determine position (and thereby order) as well as existence to avoid searching MessageContent twice at the expense of sorting the answer:
var ans = MyList.Select(w => new { w, pos = MessageContent.IndexOf(w) })
.Where(wp => wp.pos >= 0)
.OrderBy(wp => wp.pos)
.Select(wp => wp.w)
.ToList();
However, if a field may appear more than once, or if you think scanning MessageContent once is faster than calling IndexOf once per MyList member and then sorting (it probably is not), you can invert the search (using Span to avoid generating lots of Strings):
var ans2 = Enumerable.Range(0, MessageContent.Length - MyList.Select(w => w.Length).Min() + 1)
.Select(p => MyList.FirstOrDefault(w => MessageContent.AsSpan().Slice(p).StartsWith(w)))
.Where(w => w != null)
.ToList();
I did it Using
var Exists = MyList.Where(MessageContent.Contains).OrderBy(s => MessageContent.IndexOf(s));
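For the case where a placeholder can appear more than once, a regular expression is a different technique from the ones shown above that also returns matches in order of occurrence. This is only a sketch: it assumes no placeholder is a prefix of another (alternation picks the first alternative that matches at a position), and the local variable names are chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var myList = new List<string> { "[FirstName]", "[LastName]", "[VoicePhoneNumber]", "[SMSPhoneNumber]" };
string messageContent = "Hello [LastName] [FirstName]There, this message is for [SMSPhoneNumber]";

// Escape each placeholder (the brackets are regex metacharacters) and join them as alternatives.
string pattern = string.Join("|", myList.Select(Regex.Escape));

// Matches are returned left to right, so the order of occurrence is preserved.
List<string> inOrder = Regex.Matches(messageContent, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(string.Join(", ", inOrder));
```

Unlike the IndexOf version, a placeholder that occurs twice shows up twice in the result; add Distinct() if that is not wanted.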

Iterate over a collection of strings using LINQ

I put the following code segment in .NET Fiddle, but it printed System.Linq.Enumerable+WhereArrayIterator`1[System.String]. I'd like to print each item in result in order to understand how Select works. Can someone please point out what the problem is? Many thanks!
string[] sequ1 = { "abcde", "fghi", "jkl", "mnop", "qrs" };
string[] sequ2 = { "abc", "defgh", "ijklm", "nop" };
var result =sequ1.Select( n1 => sequ2.Where(n2 => n1.Length < n2.Length) );
foreach( var y in result)
{
Console.WriteLine(y);
}
You are actually returning a collection of collections.
sequ1.Select( n1 => sequ2.Where(n2 => n1.Length < n2.Length) );
For each element in sequ1, this statement filters sequ2 to find all of the elements from the second sequence where the current value in the first sequence is shorter than it and then maps to a new collection containing each of those results.
To describe what Select is actually doing:
You start with a collection of things. In your case: sequ1 which has type IEnumerable<string>
You supply it with a function, this function takes an argument of the type of thing you supplied it with a collection of and has a return type of some other thing, in your case:
n1 => sequ2.Where(n2 => n1.Length < n2.Length)
Your function takes a string and returns an IEnumerable<string>
Finally, it returns a result containing a collection of each element in the original collection transformed to some new element by the function you supplied it with.
So you started with IEnumerable<string> and ended up with IEnumerable<IEnumerable<string>>.
That means you have a collection for each value that appears in sequ1.
As such, you would expect the result to be:
{{}, {"defgh", "ijklm"}, {"defgh", "ijklm"}, {"defgh", "ijklm"}, {"defgh", "ijklm"}}
You can inspect the results by adding another loop.
foreach(var y in result)
{
foreach(var z in y)
{
Console.WriteLine(z);
}
}
Change your Select to SelectMany:
var result = sequ1.SelectMany(n1 => sequ2.Where(n2 => n1.Length < n2.Length));
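To make the difference between the two operators concrete, here is a small sketch that runs both on the arrays from the question. The names nested and flat are just for this example; ToList is added only so the results can be counted and printed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] sequ1 = { "abcde", "fghi", "jkl", "mnop", "qrs" };
string[] sequ2 = { "abc", "defgh", "ijklm", "nop" };

// Select: one inner collection per element of sequ1 (a collection of collections).
List<List<string>> nested = sequ1
    .Select(n1 => sequ2.Where(n2 => n1.Length < n2.Length).ToList())
    .ToList();

// SelectMany: the same inner collections, flattened into a single sequence.
List<string> flat = sequ1
    .SelectMany(n1 => sequ2.Where(n2 => n1.Length < n2.Length))
    .ToList();

foreach (List<string> inner in nested)
{
    Console.WriteLine("{" + string.Join(", ", inner) + "}");
}
Console.WriteLine(string.Join(", ", flat));
```

Select yields five inner lists (the first one empty), while SelectMany yields a single list of eight strings.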
I may be wrong, but I think the OP wants to compare both arrays, and for each element, print the longest one.
If that's the case, I would do it as follows:
var result = sequ1.Take(sequ2.Length)
.Select((n1, i) =>
(n1.Length > sequ2.ElementAt(i).Length)
? n1
: sequ2.ElementAt(i));
Explanation:
Use Take to only go as far as the length of the second array, which avoids an out-of-range exception from ElementAt later on.
Use Select, with two arguments, the first is the string, the second is the index.
Use ElementAt to find the corresponding element in sequ2
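If pairwise comparison really is the goal, Zip is an alternative to the Take, Select and ElementAt combination described above: it pairs elements by position and stops at the end of the shorter sequence. This is a sketch of that swapped-in technique, using the same tie rule as the code above (the element from sequ2 wins when the lengths are equal).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] sequ1 = { "abcde", "fghi", "jkl", "mnop", "qrs" };
string[] sequ2 = { "abc", "defgh", "ijklm", "nop" };

// Zip pairs sequ1[i] with sequ2[i]; the unmatched fifth element of sequ1 is ignored.
List<string> longest = sequ1
    .Zip(sequ2, (a, b) => a.Length > b.Length ? a : b)
    .ToList();

Console.WriteLine(string.Join(", ", longest));
```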
I'm not sure this example will help you understand how Select works. I think a simpler example is this:
public class Person {
public string Name { get; set; }
public string LastName { get; set; }
}
public class Test {
public Test() {
List<Person> persons = new List<Person>();
persons.Add(new Person() { Name = "Person1",LastName = "LastName1" });
persons.Add(new Person() { Name = "Person2",LastName = "LastName2" });
var getNamesFromPersons = persons.Select(p => p.Name);
}
}
If you are beginning c#, you need to sideline the keyword "var" from your code.
Force yourself to write out what the variables really are:
If you had forgone the use of var, you would have seen why your code wrote to the console what it did.
string[] sequ1 = { "abcde", "fghi", "jkl", "mnop", "qrs", };
string[] sequ2 = { "abc", "defgh", "ijklm", "nop", };
IEnumerable<IEnumerable<string>> result = sequ1.Select(n1 => sequ2.Where(n2 => n1.Length < n2.Length));
foreach (IEnumerable<string> y in result)
{
foreach (string z in y)
{
Console.WriteLine(z);
}
}

Find the index position of duplicate entries in a comma separated string

My problem just got more complicated than I thought and I've just wiped out my original question... So I'll probably post multiple questions depending on how I get on with this.
Anyway, back to the problem. I need to find the index position of duplicate entries in string that contains csv data. For example,
FirstName,LastName,Address,Address,Address,City,PostCode,PostCode, Country
As you can see the Address is duplicated and I need to find out the index of each duplicates assuming first index position starts at 0.
If you have a better suggestion on how to do this, let me know, but assuming it can be done, could we maybe have a Dictionary<string, List<int>>?
So if I had to code this, you would have:
duplicateIndexList.Add(2);
duplicateIndexList.Add(3);
duplicateIndexList.Add(4);
myDuplicateList.Add("Address", duplicateIndexList);
duplicateIndexList.Add(6);
duplicateIndexList.Add(7);
myDuplicateList.Add("PostCode", duplicateIndexList);
Obviously I don't want to do this but is it possible to achieve the above using Linq to do this? I could probably write a function that does this, but I love seeing how things can be done with Linq.
In case you're curious as to why I want to do this? Well, in short, I have an xml definition which is used to map csv fields to a database field and I want to first find out if there are any duplicate columns, I then want to append the relevant values from the actual csv row i.e. Address = Address(2) + Address(3) + Address(4), PostCode = PostCode(6) + PostCode(7)
The next part will be how to remove all the relevant values from the csv string defined above based on the indexes found once I have appended their actual values, but that will be the next part.
Thanks.
T.
UPDATE:
Here is the function that does what I want but as I said, linq would be nice. Note that in this function I'm using a list instead of the comma separated string as I haven't converted that list yet to a csv string.
Dictionary<string, List<int>> duplicateEntries = new Dictionary<string, List<int>>();
int indexPosition = 0;
foreach (string fieldName in Mapping.Select(m=>m.FieldName))
{
string key = fieldName.ToUpper();
if (duplicateEntries.ContainsKey(key))
{
List<int> indexes = duplicateEntries[key];
indexes.Add(indexPosition);
duplicateEntries[key] = indexes;
indexes = null;
}
else
{
duplicateEntries.Add(key, new List<int>() { indexPosition });
}
indexPosition += 1;
}
Maybe this will help clarify what I'm trying to achieve.
You need to do the following:
Use .Select on the resulting array to project a new IEnumerable of objects that contains the index of the item in the array along with the value.
Use either ToLookup or GroupBy and ToDictionary to group the results by column value.
Seems like an ILookup<string, int> would be appropriate here:
var lookup = columnArray
.Select((c, i) => new { Value = c, Index = i })
.ToLookup(o => o.Value, o => o.Index);
List<int> addressIndexes = lookup["Address"].ToList(); // 2, 3, 4
Or if you wanted to create a Dictionary<string, List<int>>:
Dictionary<string, List<int>> dictionary = columnArray
.Select((c, i) => new { Value = c, Index = i })
.GroupBy(o => o.Value, o => o.Index)
.ToDictionary(grp => grp.Key, grp => grp.ToList());
List<int> addressIndexes = dictionary["Address"]; // 2, 3, 4
Edit
(in response to updated question)
This should work:
Dictionary<string, List<int>> duplicateEntries = Mapping
.Select((m, i) => new { Value = m.FieldName, Index = i })
.GroupBy(o => o.Value, o => o.Index)
.ToDictionary(grp => grp.Key, grp => grp.ToList());
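Applied to the header line from the question, the GroupBy and ToDictionary approach could look like the sketch below. Two things here are assumptions beyond the answer above: column names are trimmed and compared case-insensitively (the question's loop upper-cases the key, and " Country" has a leading space), and groups with a single index are filtered out so that only real duplicates remain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string header = "FirstName,LastName,Address,Address,Address,City,PostCode,PostCode, Country";

Dictionary<string, List<int>> duplicates = header
    .Split(',')
    .Select((name, index) => new { Name = name.Trim(), Index = index })
    // group the indexes by column name, ignoring case
    .GroupBy(c => c.Name, c => c.Index, StringComparer.OrdinalIgnoreCase)
    // keep only names that occur more than once
    .Where(g => g.Count() > 1)
    .ToDictionary(g => g.Key, g => g.ToList(), StringComparer.OrdinalIgnoreCase);

foreach (var entry in duplicates)
{
    Console.WriteLine($"{entry.Key}: {string.Join(", ", entry.Value)}");
}
```

For this header the dictionary holds Address with indexes 2, 3, 4 and PostCode with indexes 6, 7.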
You could do something like :
int count = 0;
var numbered_collection =
from line in File.ReadAllLines("your_csv_name.csv").Skip(1)
let parts = line.Split(',')
select new CarClass()
{
Id = count++,
First_Field = parts[0],
Second_Field = parts[1], // rinse and repeat
};
This gives you an Id per item (and also skips the first line, which holds the header). You could put it in a method if you want to automatically map the names from the first CSV line to the fields.
From there, you can do:
var duplicates = (from items in numbered_collection
group items by items.First_Field into g
select g)
.Where(g => g.Count() > 1);
Now you have all the groups where you actually have duplicates, and you can just get the 'Id' from the object to know which one is the duplicated.

How to Select the Token and Index from a Comma Separated String in Linq

Given a comma delimited string "a,b,c" I would like to split the string and select the token and its respective index into a list.
In other words, I want "a,b,c".Split(',') to return a list of:
a, 1
b, 2
c, 3
I attempted the solution myself, but this is as close as I get. Of course, I only use a.Index() in the final line to indicate what I am trying to do.
public class var
{
public string Token;
public int Index;
}
List<var> varList = "a,b,c"
.Split(',')
.Select(a => new var { Token = a, Index = a.Index() });
You can use the other overload of Select, .Select(Func<string, int, TResult>) which gives us the index of the value.
List<Var> varList = "a,b,c".Split(',')
.Select((a, i) => new Var { Token = a, Index = i + 1 })
.ToList();
You can use the overload of Select which provides indexing:
var list = "a,b,c".Split(',').Select((a,i) => new { Token = a, Index = i+1 }).ToList();
On a side note - I would recommend not using var as a class name, as it will conflict with the C# var keyword.
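If a dedicated class is not needed at all, the same indexed overload of Select works with named value tuples (C# 7 or later). A minimal sketch, with the tuple element names chosen to mirror the class in the question:

```csharp
using System;
using System.Linq;

(string Token, int Index)[] tokens = "a,b,c"
    .Split(',')
    .Select((token, i) => (Token: token, Index: i + 1)) // i is zero-based, so add 1
    .ToArray();

foreach (var t in tokens)
{
    Console.WriteLine($"{t.Token}, {t.Index}");
}
```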
