I found this piece of code on stackoverflow some time ago, but can't seem to find it again. All credit to the author. Sorry I could not link.
QUESTION? Can some C# LINQ guru please break down this statement step by step as I am having difficulty understanding it. It certainly works well and does the job, but how?
Line to Split
var line = $"13351.750815 26646.150876 6208.767863 26646.150876 1219.200000 914.400000 0.000000 1 \"Beam 1\" 0 1 1 1 0 1 1e8f59dd-142d-4a4d-81ff-f60f93f674b3";
var splitLineResult = line.Trim().Split('"')
.Select((element, index) => index % 2 == 0
? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
: new string[] { element })
.SelectMany(element => element).ToList();
Result of Statement in LinqPad
You need to begin by analysing the input you have in hand.
13351.750815 26646.150876 6208.767863 26646.150876 1219.200000 914.400000 0.000000 1 "Beam 1" 0 1 1 1 0 1 1e8f59dd-142d-4a4d-81ff-f60f93f674b3
The input consists of a few alphanumeric strings separated by whitespace. However, there is one special case that needs to be handled as well: the text "Beam 1" is enclosed in quotes and contains a space, so it has to stay together as a single token.
Now, let's break down the Linq statement.
line.Trim().Split('"')
The first call splits the input on the quote delimiter. This splits the string into 3 parts: the text before the quoted section, the quoted text itself, and the text after it.
As you can see, the first part (index 0) and the last part (index 2) need to be split further, while the element at index 1 has already been parsed. This is where the second part of the LINQ statement comes into the picture.
.Select((element, index) => index % 2 == 0
? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
: new string[] { element })
In the above statement, the index % 2 == 0 test inside Select((element, index) => ...) checks whether the current element sits at an even index. If so, the substring is split further on the delimiter ' ' (a space). Otherwise, a single-element array containing 'Beam 1' is created.
At the end of the second part, what you get is a collection of 3 sub-collections of strings (IEnumerable<string[]>).
What remains is to flatten the parent collection into a single collection. This is done using Enumerable.SelectMany.
.SelectMany(element => element).ToList();
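To see what each stage produces, the statement can be pulled apart into three variables. This is only a sketch of the same query, written as C# top-level statements (C# 9 or later); the names quoteParts, groups and tokens are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var line = "13351.750815 26646.150876 6208.767863 26646.150876 1219.200000 914.400000 0.000000 1 \"Beam 1\" 0 1 1 1 0 1 1e8f59dd-142d-4a4d-81ff-f60f93f674b3";

// Stage 1: split on the quote character.
// Even indexes hold text outside the quotes, odd indexes hold quoted text.
string[] quoteParts = line.Trim().Split('"');

// Stage 2: split the unquoted parts on spaces, keep the quoted part whole.
List<string[]> groups = quoteParts
    .Select((element, index) => index % 2 == 0
        ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        : new[] { element })
    .ToList();

// Stage 3: flatten the three arrays into one list of tokens.
List<string> tokens = groups.SelectMany(group => group).ToList();

Console.WriteLine(quoteParts.Length); // 3
Console.WriteLine(groups[0].Length);  // 8
Console.WriteLine(groups[1][0]);      // Beam 1
Console.WriteLine(groups[2].Length);  // 7
Console.WriteLine(tokens.Count);      // 16
```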
Hope that helped in understanding the Linq query better
Breaking up the statement
var splitLineResult = line.Trim().Split('"')
.Select((element, index) => index % 2 == 0
? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
: new string[] { element }).ToList()
.SelectMany(element => element).ToList();
Trim(): Removes any whitespace at the beginning or the end.
Split('"'): Splits the string into an array of strings, using the double quote as the delimiter.
Select(): Projects each element being iterated into a new form. This overload also passes the element's index in the array (starting at 0) to the lambda.
Within your Select, you have index % 2 == 0, which is true only for every other element (indexes 0, 2, ...).
You can write an if/else as an expression with ? and :
a. "If true, print 1, else print 0" can be thought of as true ? print 1 : print 0
SelectMany(): Flattens the results. Instead of returning 3 arrays, it returns the elements of each array as one sequence.
ToList(): Converts the sequence to a list.
Seems like there are some really good answers here already, but I had already almost finished mine when I saw people had beat me to it, so here's my version. Hope it brings something to the table.
My approach was writing it out in non Linq code with comments
var line = $"13351.750815 26646.150876 6208.767863 26646.150876 1219.200000 914.400000 0.000000 1 \"Beam 1\" 0 1 1 1 0 1 1e8f59dd-142d-4a4d-81ff-f60f93f674b3";
line = line.Trim(); //remove leading and trailing white space
var tempArray1 = line.Split('"'); //split the line to a string array on "
//so the string array we get, is everything before, between, and after "
//.Select((element, index)
//Basically doing a for-loop on the array we got
List<string[]> tempListForElements = new List<string[]>(); //initialize a temporary list of string arrays we're going to be using
for (var index = 0; index < tempArray1.Length; index++)
{
var element = tempArray1[index];
//now we're getting to the ternary, which is basically like an inline if-else statement
//index % 2 == 0 ? <if true> : <if false>
//if remainder of division by 2 is 0, so basically a way of doing two
//different things for every other iterator of the loop
if (index % 2 == 0)
{
//if on the first or last iteration of the loop (before and after " in the line)
tempListForElements.Add(element.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)); //we create yet another string array splitting on each whitespace and add it to our temporary list
}
else
{
//if on the second iteration of the loop (between ")
tempListForElements.Add(new string[] { element }); //we're creating yet another string array, this time there's just one element in the array though, and then add it to our temporary list
}
}
//.SelectMany(element => element).ToList()
//we're basically turning our list of 3 string arrays into one list of strings
//I haven't typed this part out in non-LINQ code, since I just realized there are some good answers here already,
//but imagine initializing a collection of the correct size and then a foreach loop adding each string to it in order.
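For completeness, the flattening step described in the last comment could look roughly like this without LINQ. It is a sketch only: the sample list stands in for tempListForElements from the loop above and its values are shortened, hypothetical data.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the list built by the for loop (shortened sample values).
var tempListForElements = new List<string[]>
{
    new[] { "13351.750815", "1" },
    new[] { "Beam 1" },
    new[] { "0", "1" }
};

// Non-LINQ equivalent of .SelectMany(element => element).ToList():
// walk every array in order and append each of its strings.
var flattened = new List<string>();
foreach (string[] array in tempListForElements)
{
    foreach (string item in array)
    {
        flattened.Add(item);
    }
}

Console.WriteLine(string.Join(" | ", flattened));
// 13351.750815 | 1 | Beam 1 | 0 | 1
```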
If you convert this LINQ statement into a normal for loop, it will look like this.
var line = $"13351.750815 26646.150876 6208.767863 26646.150876 1219.200000 914.400000 0.000000 1 \"Beam 1\" 0 1 1 1 0 1 1e8f59dd-142d-4a4d-81ff-f60f93f674b3";
string[] splitLineResult = line.Trim().Split('"');
var list = new List<string[]>();
for (int i = 0; i < splitLineResult.Length; i++)
{
if (i % 2 == 0)
{
list.Add(splitLineResult[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
else
{
list.Add(new string[] { splitLineResult[i] });
}
}
var finalList = list.SelectMany(x=>x).ToList();
For the SelectMany method, you can refer to the MS documentation: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.selectmany?view=netframework-4.8
I have to find where a * is. It could be in the 1st, 2nd, or 3rd position, or not present at all.
The positions are separated by pipes |
Thus
No * wildcard would be
`ABC|DEF|GHI`
However, while that could be 1 scenario, the other 3 are
string testPosition1 = "*|DEF|GHI";
string testPosition2 = "ABC|*|GHI";
string testPosition3 = "ABC|DEF|*";
I gather that I should use IndexOf, but it seems I should incorporate the | (pipe) to know the position, not just the character offset, as the values in each of the 3 places could be long or short. I just want to end up knowing whether * is in the first, second, or third position (or not there at all).
So far I was doing this, but it doesn't tell me whether the * is before the 1st or 2nd pipe:
if(testPosition1.IndexOf("*") > 0)
{
// Look for pipes?
}
There are lots of ways you could approach this. The most readable might actually just be to do it the hard way (i.e. scan the string to find the first '*' character, keeping track of how many '|' characters you see along the way).
That said, this could be similarly readable and more concise:
int wildcardPosition = Array.IndexOf(testPosition1.Split('|'), "*");
This returns -1 if the wildcard is not found; otherwise it returns the 0-based index of the segment of the '|'-delimited string that contains the wildcard string.
This only works if the wildcard is exactly the one-character string "*". If you need to support other variations on that, you will still want to split the string, but then you can loop over the array looking for whatever criteria you need.
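As a rough illustration of that last point, Array.FindIndex accepts a predicate, so the split-then-search idea still fits on one line. The rule used here (ignore whitespace around the wildcard) is only an assumed example of "other variations".

```csharp
using System;

string input = "ABC| * |GHI";

// Array.FindIndex returns the index of the first element matching the
// predicate, or -1 if none matches. The trim-then-compare rule is an
// assumption made for this example.
int wildcardPosition = Array.FindIndex(
    input.Split('|'),
    segment => segment.Trim() == "*");

Console.WriteLine(wildcardPosition); // 1
```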
You can try LINQ: split the string at the pipe character and then get the index of the element that is exactly a *.
var x = testPosition2.Split('|').Select((k, i) => new { text = k, index = i}).FirstOrDefault(p => p.text == "*" );
if(x != null) Console.WriteLine(x.index);
The first line splits the string at the pipe, creating an array of strings. This sequence is passed to the Select extension method, which enumerates it, passing the string text (k) and the index (i). With these two parameters we build a sequence of anonymous objects with two properties (text and index). FirstOrDefault extracts from this sequence the first object whose text equals *, or null if there is none, and we can print the index property of that object.
The other answers are fine (and likely better), however here is another approach, the good old fashioned for loop and the try-get pattern
public bool TryGetStar(string input, out int index)
{
var split = input.Split('|');
for (index = 0; index < split.Length; index++)
if (split[index] == "*")
return true;
return false;
}
Or, if you were dealing with large strings and trying to save allocations, you could remove the Split entirely and use a single O(n) pass:
public bool TryGetStar(string input, out int index)
{
index = 0;
for (var i = 0; i < input.Length; i++)
if (input[i] == '|') index++;
else if (input[i] == '*') return true;
return false;
}
Note: if performance is a consideration, you could also use unsafe code and pointers, or Span<char>, which may afford a small amount of efficiency.
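A span-based variant might look like the sketch below. It assumes a runtime where ReadOnlySpan<char> is available (.NET Core 2.1 or later, or the System.Memory package) and is written as top-level statements; whether it is actually faster than the plain loop would need measuring.

```csharp
using System;

// Same contract as TryGetStar above: index is the number of pipes
// seen before the first '*'.
static bool TryGetStar(ReadOnlySpan<char> input, out int index)
{
    index = 0;
    int star = input.IndexOf('*');   // first '*' in the span, or -1
    if (star < 0) return false;

    // Count the pipes in front of the star without allocating substrings.
    foreach (char c in input.Slice(0, star))
    {
        if (c == '|') index++;
    }
    return true;
}

bool found = TryGetStar("ABC|DEF|*", out int position);
Console.WriteLine(found + " " + position); // True 2
```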
Try DotNETFiddle:
testPosition.IndexOf("*") - testPosition.Replace("|","").IndexOf("*")
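Find the index of the wildcard ("*") and see how far it moves if you remove the pipe ("|") characters. The result is a zero-based index. Note that when there is no wildcard, both IndexOf calls return -1, so the result is 0, the same as for the first position; check testPosition.Contains("*") first if the wildcard may be absent.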
From the question you have the following code segment:
if(testPosition1.IndexOf("*") > 0)
{
}
If you're now inside the if statement, you're sure the asterisk exists.
From that point, an efficient solution could be to check the first two chars and the last two chars. Note that the condition has to be >= 0 rather than > 0: IndexOf returns 0 when the asterisk is the first character, so > 0 would skip the first position.
if (testPosition1.IndexOf("*") >= 0)
{
if (testPosition1[0] == '*' && testPosition1[1] == '|')
{
// First position.
}
else if (testPosition1[testPosition1.Length - 1] == '*' && testPosition1[testPosition1.Length - 2] == '|')
{
// Third (last) position.
}
else
{
// Second position.
}
}
This assumes that no more than one * can exist, and also that if an * exists, it can only be surrounded by pipes. For example, I assume an input like ABC|DEF|G*H is invalid.
If you want to remove these assumptions, you could do a one-pass loop over the string, keeping track of the necessary information.
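One possible shape for such a one-pass loop is sketched below. The helper name FindWildcardSegment and the choice to throw a FormatException on invalid input are assumptions for illustration, not something the question asks for.

```csharp
using System;

// Returns the zero-based segment that consists of a lone "*", or -1 if
// there is no wildcard. Throws when the input breaks the assumed rules
// (more than one '*', or a '*' mixed with other text in a segment).
static int FindWildcardSegment(string input)
{
    int segment = 0;            // segment currently being scanned
    int segmentLength = 0;      // characters seen in the current segment
    bool starInSegment = false; // whether the current segment has a '*'
    int found = -1;

    // Run one step past the end so the last segment is closed too.
    for (int i = 0; i <= input.Length; i++)
    {
        if (i == input.Length || input[i] == '|')
        {
            if (starInSegment)
            {
                if (segmentLength != 1) throw new FormatException("'*' must be the whole segment.");
                if (found >= 0) throw new FormatException("More than one '*'.");
                found = segment;
            }
            segment++;
            segmentLength = 0;
            starInSegment = false;
        }
        else
        {
            segmentLength++;
            if (input[i] == '*') starInSegment = true;
        }
    }
    return found;
}

Console.WriteLine(FindWildcardSegment("ABC|*|GHI"));   // 1
Console.WriteLine(FindWildcardSegment("ABC|DEF|GHI")); // -1
```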
In my foreach loop, the char c represents each character in the string currentDependency, and if c is '|', its position is added to the List sectionsSpots. However, it seems to loop through characters that have already been found, which means I probably don't understand exactly how the loop is working.
In Debug 1, it goes through each of the characters in currentDependency just as expected. In Debug 2, however, whenever the if statement passes, it always reports an index of 1. That is correct for the first |, but the subsequent |'s should have indexes of 19 and 38. Why does .IndexOf(c) return the position of the first c that passed the if statement, when the code should be looking at characters later in the string? Thanks in advance!
string currentDependency = ">|Policies/Tax/-0.3|Policies/Trade/0.3|Power/Trader:Farmer/0.4";
List<int> sectionsSpots = new List<int> { };
foreach (char c in currentDependency)//add spots of separations of sections
{
Debug.Log("CurrentChar: " + c.ToString());//DEBUG 1
if (c.ToString().Contains("|"))
{
sectionsSpots.Add(currentDependency.IndexOf(c));
Debug.Log("| found in string, index of " + currentDependency.IndexOf(c));//DEBUG 2
}
}
//Output:
//CurrentChar: >
//CurrentChar: |
//| found in string, index of 1
//CurrentChar: P
//CurrentChar: o
//[...]
//CurrentChar: 3
//CurrentChar: |
//| found in string, index of 1////Why is the "index of 1", rather than of 19, if I already made it through the previous | with an index of 1?
//[and so on...]
Each time, IndexOf finds the first occurrence (index 1), because it always searches from the start of the string. See: https://learn.microsoft.com/en-us/dotnet/api/system.string.indexof?view=netframework-4.8
An easier way to achieve your goal is:
for (int i = 0; i < currentDependency.Length; i++)
{
if (currentDependency[i] == '|')
{
sectionsSpots.Add(i);
Debug.Log("| found in string, index of " + i);//DEBUG 2
}
}
Using your own code as a base, I modified how you call the IndexOf function to make use of its other parameters.
List<Int32> sectionsSpots = new List<Int32>();
string currentDependency = ">|Policies/Tax/-0.3|Policies/Trade/0.3|Power/Trader:Farmer/0.4";
Int32 startPosition = 0;
foreach (char c in currentDependency)//add spots of separations of sections
{
Debug.Print("CurrentChar: " + c.ToString());//DEBUG 1
if (c.Equals('|'))
{
Int32 position = currentDependency.IndexOf(c, startPosition);
sectionsSpots.Add(position);
Debug.Print("| found in string, index of " + position);//DEBUG 2
startPosition = position + 1;
}
}
By passing in a position, the IndexOf function will start looking for the required character from a different starting point, not the very start of the string.
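The same start-index overload also allows dropping the foreach entirely and letting IndexOf do the scanning. This is a sketch of that variation, with Console.WriteLine standing in for the Debug calls used above.

```csharp
using System;
using System.Collections.Generic;

string currentDependency = ">|Policies/Tax/-0.3|Policies/Trade/0.3|Power/Trader:Farmer/0.4";
var sectionsSpots = new List<int>();

// IndexOf(char, startIndex) searches from startIndex onwards and
// returns -1 once no further match exists.
int position = currentDependency.IndexOf('|');
while (position >= 0)
{
    sectionsSpots.Add(position);
    position = currentDependency.IndexOf('|', position + 1);
}

Console.WriteLine(string.Join(", ", sectionsSpots)); // 1, 19, 38
```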
I want to remove duplicate items from an array, ignoring spaces, so Distinct().ToArray() won't work.
I copied the array to a list and enumerate backwards through the array with a nested loop. I compare the array item of the inner loop with that of the outer loop. The loop runs without problems, but when I remove an item from the copied list that is indexed to the inner loop, I get an exception.
for (int k = aArray.Length - 1; k > 0 ; k--)
{
for ( int j = k -1; j >= 0; j--)
{
if (Regex.Replace(aArray[k], @"\s", "") ==
Regex.Replace(aArray[j], @"\s", ""))
{
aList.RemoveAt(j);
}
}
}
How can I enumerate through an array and remove items from a copy of that array based on a comparison of items in the array? Thanks.
Edit: Given three strings, one containing NOSPACE, one containing NO SPACE (1 sp), and one containing NO SPACE (2 sp), but otherwise the same, I want to remove two of those strings. Doesn't matter which two. Distinct won't work because it doesn't ignore spaces, and the suggested answer removes all spaces.
2nd edit: Greg's answer does work, but I can't upvote it (less than 15). I've been struggling with this all day....
3rd edit: Greg's answer works, but removes all spaces in strings. I want to remove items that are identical except for spaces, and leave spaces in items. I still think that enumeration should work, somehow.
Try this:
var result = aArray.Select(x => x.Replace(" ", string.Empty)).Distinct();
You can use Regex.Replace(x, @"\s", "") instead of x.Replace if you want to.
var array = new[] { " 1 2 ", " 12 ", "1 ", " 1", "12", " 1 ", "3 3", " 33" };
var result = array
.ToLookup(k => k.Replace(" ", string.Empty))
.Select(v => v.First())
.ToArray();
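The idea behind the ToLookup version is to group the items by their space-stripped form and keep one original item per group, so the surviving items keep their spaces. A similar sketch using GroupBy and the regex from the question follows; the sample values are taken from the question's edit plus one made-up extra item.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var array = new[] { "NOSPACE", "NO SPACE", "NO  SPACE", "OTHER" };

// Group by the whitespace-free form, then keep the first original
// string of each group (spaces inside it are left untouched).
string[] result = array
    .GroupBy(item => Regex.Replace(item, @"\s", ""))
    .Select(group => group.First())
    .ToArray();

Console.WriteLine(string.Join(", ", result)); // NOSPACE, OTHER
```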
Hi everyone. I have this small task to do:
There are two sequences of numbers:
A[0], A[1], ... , A[n].
B[0], B[1], ... , B[m].
Do the following operations with the sequence A:
Remove the items whose indices are divisible by B[0].
In the items remained, remove those whose indices are divisible by B[1].
Repeat this process up to B[m].
Output the items finally remained.
Input is like this: (where -1 is delimiter for two sequences A and B)
1 2 4 3 6 5 -1 2 -1
Here goes my code (explanation done via comments):
List<int> result = new List<int>(); // list for sequence A
List<int> values = new List<int>(); // list for holding value to remove
var input = Console.ReadLine().Split().Select(int.Parse).ToArray();
var len = Array.IndexOf(input, -1); // getting index of the first -1 (delimiter)
result = input.ToList(); // converting input array to List
result.RemoveRange(len, input.Length - len); // and deleting everything beyond first delimiter (including it)
for (var i = len + 1; i < input.Length - 1; i++) // for the number of elements in the sequence B
{
for (var j = 0; j < result.Count; j++) // going through all elmnts in sequence A
{
if (j % input[i] == 0) // if index is divisible by B[i]
{
values.Add(result[j]); // adding associated value to List<int> values
}
}
foreach (var value in values) // after all elements in sequence A have been looked upon, now deleting those who apply to criteria
{
result.Remove(value);
}
}
But the problem is that I'm only passing 5/11 test cases. About 25% fail with 'Wrong result' and another 25% with 'Timed out'. I understand that my code is probably very badly written, but I really can't work out how to improve it.
So, if someone more experienced could clarify the following points for me, it would be very cool:
1. Am I parsing the console input right? I feel like it could be done in a more elegant/efficient way.
2. Is my logic of collecting the values that match the criteria and storing them for later deletion efficient in terms of performance? Or is there another way to do it?
3. Why is this code not passing all test-cases or how would you change it in order to pass all of them?
I'm rewriting this answer, since I misunderstood the problem completely the first time. Undoubtedly, the problem in your code is the removal of elements: result.Remove(value) searches for the first element equal to value, so it is slow and removes the wrong item when A contains duplicates, and the values list is never cleared between passes. Let's try to avoid removal altogether. Make a new array C, where you store all the numbers that should be left in the array A after each pass. So if the index id is not divisible by B[i], you add A[id] to the array C. Then, after checking all the indices against the B[i] value, you replace the array A with the array C and do the same for B[i + 1]. Repeat until you reach the end of the array B.
The algorithm:
1. For each value in B:
   2. For each id from 0 to length(A) - 1 (indices are 0-based, as in the question):
      3. If id % value != 0, add A[id] to C
   4. A = C
5. Return A.
EDIT: Be sure to make a new array C for each iteration of loop 1 (or clear C after copying its contents into A).
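In C#, the algorithm above could be sketched as follows. The helper name Sieve, the 0-based indexing, and the decision to skip a divisor of 0 are assumptions; the task statement would need to confirm them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Builds a fresh list per divisor instead of removing items in place.
static List<int> Sieve(IReadOnlyList<int> a, IEnumerable<int> b)
{
    var current = a.ToList();
    foreach (int divisor in b)
    {
        if (divisor == 0) continue; // assumption: ignore 0 to avoid dividing by zero

        var kept = new List<int>(current.Count);
        for (int id = 0; id < current.Count; id++)
        {
            // Keep only items whose current index is not divisible.
            if (id % divisor != 0) kept.Add(current[id]);
        }
        current = kept;
    }
    return current;
}

// Parse "A... -1 B... -1" by locating the two delimiters.
int[] input = "1 2 4 3 6 5 -1 2 -1".Split(' ').Select(int.Parse).ToArray();
int first = Array.IndexOf(input, -1);
int second = Array.IndexOf(input, -1, first + 1);
int[] seqA = input.Take(first).ToArray();
int[] seqB = input.Skip(first + 1).Take(second - first - 1).ToArray();

Console.WriteLine(string.Join(" ", Sieve(seqA, seqB))); // 2 3 5
```

Each pass is linear in the current length of A, which avoids the repeated searches that List.Remove performs.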
I have two collections of String like
List<String> l_lstOne = new List<String> { "100", "1X0", "X11", "XXX" },
l_lstTwo = new List<String> { "000", "110", "100", "000" };
I need to compare the two lists and make the second list like
{ "000", "1X0", "X00", "XXX" }
Note:
Both lists will contain the same number of elements, and all elements will have the same length.
The comparison works like this:
If the mth element in l_lstOne has an 'X' in the nth position, then the nth position of the mth element in l_lstTwo should be replaced by 'X'.
Example
l_lstOne   l_lstTwo   Output
100        000        000
1X0        110        1X0
X11        100        X00
So, to solve this I used a nested for loop. Here is my source code:
for (int l_nIndex = 0; l_nIndex < l_lstTwo.Count; l_nIndex++)
{
String l_strX = String.Empty;
for (int l_nInnerIndex = 0; l_nInnerIndex < l_lstTwo[l_nIndex].Length; l_nInnerIndex++)
{
l_strX += l_lstOne[l_nIndex][l_nInnerIndex] == 'X' ? 'X' : l_lstTwo[l_nIndex][l_nInnerIndex];
}
l_lstTwo[l_nIndex] = l_strX;
}
This code works fine, but it takes too long to execute: almost 600 milliseconds to process 200000 elements, each of length 16.
Moreover, I need a LINQ or lambda method to solve this. Please help me do this. Thanks in advance.
LINQ will not help you here; LINQ is not meant to modify collections.
You can make your code substantially faster by building a char[] instead of a string; right now, you're building 3.2 million string objects because of the +=.
Instead, you can write
char[] l_strX = new char[l_lstTwo[l_nIndex].Length];
for (int l_nInnerIndex = 0; l_nInnerIndex < l_strX.Length; l_nInnerIndex++)
{
l_strX[l_nInnerIndex] = l_lstOne[l_nIndex][l_nInnerIndex] == 'X' ? 'X' : l_lstTwo[l_nIndex][l_nInnerIndex];
}
l_lstTwo[l_nIndex] = new string(l_strX);
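Put together with the outer loop, the char[] approach might look like the following sketch. It is a variation rather than the exact code above: it starts from ToCharArray() and only overwrites the masked positions, and the shortened loop variable names are mine.

```csharp
using System;
using System.Collections.Generic;

var l_lstOne = new List<string> { "100", "1X0", "X11", "XXX" };
var l_lstTwo = new List<string> { "000", "110", "100", "000" };

for (int i = 0; i < l_lstTwo.Count; i++)
{
    string mask = l_lstOne[i];
    char[] chars = l_lstTwo[i].ToCharArray(); // one buffer per element

    for (int j = 0; j < chars.Length; j++)
    {
        if (mask[j] == 'X') chars[j] = 'X';   // overwrite masked positions only
    }

    l_lstTwo[i] = new string(chars);          // one string allocation per element
}

Console.WriteLine(string.Join(", ", l_lstTwo)); // 000, 1X0, X00, XXX
```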
You could do it with the following statement in .NET 3.5
IEnumerable <String> result =
Enumerable.Range(0, l_lstOne.Count)
.Select(i => Enumerable.Range(0, l_lstOne[i].Length)
.Aggregate(string.Empty, (innerResult, x) => innerResult += l_lstOne[i][x] == 'X' ? 'X' : l_lstTwo[i][x]));
Mh, if I understand it correctly the words in l_lstOne act as a mask for the words in l_lstTwo where the mask is transparent unless it's an X. How about this:
l_lstOne.Zip(l_lstTwo,
(w1, w2) => new String(w1.Zip(w2, (c1, c2) => c1 == 'X' ? c1 : c2).ToArray()))
Zip is a LINQ extension method, available from .NET 4 on, which combines the elements of two lists like a zipper. The outer Zip creates the word pairs to iterate over, and the inner one applies the mask (take each character from the second word unless word one has an X in that position).
Also note that this creates a new sequence of strings rather than replacing the ones in l_lstTwo - that's the Linq way of doing things.
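A small runnable sketch of the Zip approach with the sample data from the question, materializing the new sequence with ToList() (the variable name masked is made up):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var l_lstOne = new List<string> { "100", "1X0", "X11", "XXX" };
var l_lstTwo = new List<string> { "000", "110", "100", "000" };

// Outer Zip pairs up the words, inner Zip pairs up their characters.
List<string> masked = l_lstOne
    .Zip(l_lstTwo, (w1, w2) =>
        new string(w1.Zip(w2, (c1, c2) => c1 == 'X' ? c1 : c2).ToArray()))
    .ToList();

Console.WriteLine(string.Join(", ", masked)); // 000, 1X0, X00, XXX
```

If the original list has to end up holding the result, assigning masked back to the l_lstTwo variable is one option.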