I want to remove duplicate items from an array, ignoring spaces, so Distinct().ToArray() won't work.
I copied the array to a list and enumerate backwards through the array with a nested loop, comparing the array item of the inner loop with that of the outer loop. The loop itself runs without problems, but when I remove the item at the inner loop's index from the copied list, I get an exception.
for (int k = aArray.Length - 1; k > 0 ; k--)
{
for ( int j = k -1; j >= 0; j--)
{
if (Regex.Replace(aArray[k], @"\s", "") ==
Regex.Replace(aArray[j], @"\s", ""))
{
aList.RemoveAt(j);
}
}
}
How can I enumerate through an array and remove items from a copy of that array based on a comparison of items in the array? Thanks.
Edit: Given three strings, one containing NOSPACE, one containing NO SPACE (1 sp), and one containing NO SPACE (2 sp), but otherwise the same, I want to remove two of those strings. Doesn't matter which two. Distinct won't work because it doesn't ignore spaces, and the suggested answer removes all spaces.
2nd edit: Greg's answer does work, but I can't upvote it (less than 15). I've been struggling with this all day....
3rd edit: Greg's answer works, but removes all spaces in strings. I want to remove items that are identical except for spaces, and leave spaces in items. I still think that enumeration should work, somehow.
Try this:
var result = aArray.Select(x => x.Replace(" ", string.Empty)).Distinct();
You can use Regex.Replace(x, @"\s", "") instead of x.Replace if you want to (\s matches any whitespace character, not just spaces).
var array = new[] { " 1 2 ", " 12 ", "1 ", " 1", "12", " 1 ", "3 3", " 33" };
var result = array
.ToLookup(k => k.Replace(" ", string.Empty))
.Select(v => v.First())
.ToArray();
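The exception in the question most likely comes from the loop bounds being taken from aArray while the items are removed from aList, so j can end up pointing past the end of the shrinking list. A minimal sketch of a single-pass alternative is below. It compares on a whitespace-free key but keeps the original strings untouched. DistinctIgnoringWhitespace is a hypothetical helper name, and keeping the first occurrence is an assumption (the question says it doesn't matter which one survives).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: keeps the first string seen for each whitespace-free key.
static string[] DistinctIgnoringWhitespace(IEnumerable<string> items)
{
    var seen = new HashSet<string>();
    var kept = new List<string>();
    foreach (var item in items)
    {
        // The key has all whitespace removed; the original item is stored unchanged.
        var key = Regex.Replace(item, @"\s", "");
        if (seen.Add(key)) // Add returns false when the key is already present
        {
            kept.Add(item);
        }
    }
    return kept.ToArray();
}

var sample = new[] { "NOSPACE", "NO SPACE", "NO  SPACE", "OTHER" };
var unique = DistinctIgnoringWhitespace(sample);
Console.WriteLine(string.Join("|", unique)); // NOSPACE|OTHER
```

Nothing is removed while iterating, so there is no index to go stale.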
I found this piece of code on stackoverflow some time ago, but can't seem to find it again. All credit to the author. Sorry I could not link.
QUESTION? Can some C# LINQ guru please break down this statement step by step as I am having difficulty understanding it. It certainly works well and does the job, but how?
Line to Split
var line = $"13351.750815 26646.150876 6208.767863 26646.150876 1219.200000 914.400000 0.000000 1 \"Beam 1\" 0 1 1 1 0 1 1e8f59dd-142d-4a4d-81ff-f60f93f674b3";
var splitLineResult = line.Trim().Split('"')
.Select((element, index) => index % 2 == 0
? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
: new string[] { element })
.SelectMany(element => element).ToList();
Result of Statement in LinqPad
You need to begin by analysing the input you have in hand.
13351.750815 26646.150876 6208.767863 26646.150876 1219.200000 914.400000 0.000000 1 "Beam 1" 0 1 1 1 0 1 1e8f59dd-142d-4a4d-81ff-f60f93f674b3
The input consists of a number of alphanumeric strings separated by whitespace. However, there is one special case that needs to be handled as well: the token "Beam 1" is enclosed in quotes and contains a space.
Now, let's break down the Linq statement.
line.Trim().Split('"')
The first call splits the input on the quote character. This yields 3 parts: the text before the opening quote, the text between the quotes, and the text after the closing quote.
The first part (index 0) and the last part (index 2) need to be split further, while the element at index 1 (Beam 1) is already a complete token. This is where the second part of the Linq statement comes into the picture.
.Select((element, index) => index % 2 == 0
? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
: new string[] { element })
In the above statement, the index % 2 == 0 condition inside Select((element, index) => ...) checks whether the current element sits at an even index. If so, the substring is split further on the delimiter ' ' (a space), and RemoveEmptyEntries drops the empty strings left by leading, trailing or repeated spaces. Otherwise, it creates an array with the single element 'Beam 1'.
At the end of the second part, what you get is a collection of 3 sub-collections of strings (IEnumerable<string[]>).
What now needs to be done is to create a single collection by flattening the parent collection. This is done using Enumerable.SelectMany.
.SelectMany(element => element).ToList();
Hope that helped in understanding the Linq query better
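To see the three steps in isolation, here is a small sketch that runs them one at a time. The input line is a shortened, made-up stand-in for the one in the question, and the intermediate variable names (parts, nested, tokens) are only for illustration.

```csharp
using System;
using System.Linq;

var line = "  1.5 2.5 \"Beam 1\" 0 1 abc-123 "; // shortened, hypothetical input

// Step 1: split on the quote character, which gives 3 parts:
// parts[0] = "1.5 2.5 ", parts[1] = "Beam 1", parts[2] = " 0 1 abc-123"
var parts = line.Trim().Split('"');

// Step 2: even-indexed parts lie outside the quotes and are split on spaces;
// odd-indexed parts were inside the quotes and are kept whole.
var nested = parts.Select((element, index) => index % 2 == 0
    ? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    : new string[] { element });

// Step 3: flatten the sequence of arrays into one list.
var tokens = nested.SelectMany(element => element).ToList();

Console.WriteLine(string.Join("|", tokens)); // 1.5|2.5|Beam 1|0|1|abc-123
```

Note that the even/odd trick relies on the quotes being balanced: every odd-indexed part is text that sat between two quote characters.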
Breaking up the statement
var splitLineResult = line.Trim().Split('"')
.Select((element, index) => index % 2 == 0
? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
: new string[] { element }).ToList()
.SelectMany(element => element).ToList();
Trim(): Removes any whitespace at the beginning or the end.
Split('"'): Splits the string into an array of strings, with the double quote as the delimiter.
Select(): Projects each element into a new value. This overload also passes the element's index in the array returned by Split (starting at 0).
Within your Select statement, you have index % 2 == 0, which is true only for every other element (indices 0, 2, ...).
You can write an if-else as an expression with ? and :
a. "If true print 1 else print 0" can be thought of as true ? print 1 : print 0
SelectMany(): Flattens the results. Instead of returning 3 arrays, it returns the elements of each array as one sequence.
ToList(): Converts the sequence to a list.
Seems like there are some really good answers here already, but I had already almost finished mine when I saw people had beat me to it, so here's my version. Hope it brings something to the table.
My approach was writing it out in non Linq code with comments
var line = $"13351.750815 26646.150876 6208.767863 26646.150876 1219.200000 914.400000 0.000000 1 \"Beam 1\" 0 1 1 1 0 1 1e8f59dd-142d-4a4d-81ff-f60f93f674b3";
line = line.Trim(); //remove leading and trailing white space
var tempArray1 = line.Split('"'); //split the line to a string array on "
//so the string array we get, is everything before, between, and after "
//.Select((element, index)
//Basically doing a for-loop on the array we got
List<string[]> tempListForElements = new List<string[]>(); //initialize a temporary list of string arrays we're going to be using
for (var index = 0; index < tempArray1.Length; index++)
{
var element = tempArray1[index];
//now we're getting to the ternary, which is basically like an inline if-else statement
//index % 2 == 0 ? <if true> : <if false>
//if remainder of division by 2 is 0, so basically a way of doing two
//different things for every other iterator of the loop
if (index % 2 == 0)
{
//if on the first or last iteration of the loop (before and after " in the line)
tempListForElements.Add(element.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)); //we create yet another string array splitting on each whitespace and add it to our temporary list
}
else
{
//if on the second iteration of the loop (between ")
tempListForElements.Add(new string[] { element }); //we're creating yet another string array, this time there's just one element in the array though, and then add it to our temporary list
}
}
//.SelectMany(element => element).ToList()
//we're basically turning our list of 3 string arrays into one string array
//I won't type it out in non-LINQ code, since I just realized there are some good answers here already,
//but imagine initializing a string array with the correct size and then a foreach loop adding each string to it in order.
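For completeness, the flattening step that the comments above leave to the imagination could look roughly like this without LINQ. This is a sketch that uses a List<string> instead of a pre-sized array, and the contents of tempListForElements here are made-up stand-ins for what the loop above would produce.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the list built by the loop above (hypothetical contents).
var tempListForElements = new List<string[]>
{
    new[] { "1", "2" },
    new[] { "Beam 1" },
    new[] { "0", "1" }
};

// Equivalent of .SelectMany(element => element).ToList():
// walk every array in order and copy its items into one flat list.
var flattened = new List<string>();
foreach (var array in tempListForElements)
{
    foreach (var item in array)
    {
        flattened.Add(item);
    }
}

Console.WriteLine(string.Join("|", flattened)); // 1|2|Beam 1|0|1
```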
If you convert this Linq statement into a normal for loop, it will look like this.
var line = $"13351.750815 26646.150876 6208.767863 26646.150876 1219.200000 914.400000 0.000000 1 \"Beam 1\" 0 1 1 1 0 1 1e8f59dd-142d-4a4d-81ff-f60f93f674b3";
string[] splitLineResult = line.Trim().Split('"');
var list = new List<string[]>();
for (int i = 0; i < splitLineResult.Length; i++)
{
if (i % 2 == 0)
{
list.Add(splitLineResult[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
else
{
list.Add(new string[] { splitLineResult[i] });
}
}
var finalList = list.SelectMany(x=>x).ToList();
For the SelectMany method, you can refer to the MS documentation: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.selectmany?view=netframework-4.8
Hello everyone. I have this small task to do:
There are two sequences of numbers:
A[0], A[1], ... , A[n].
B[0], B[1], ... , B[m].
Do the following operations with the sequence A:
Remove the items whose indices are divisible by B[0].
In the items remained, remove those whose indices are divisible by B[1].
Repeat this process up to B[m].
Output the items finally remained.
Input is like this: (where -1 is delimiter for two sequences A and B)
1 2 4 3 6 5 -1 2 -1
Here goes my code (explanation done via comments):
List<int> result = new List<int>(); // list for sequence A
List<int> values = new List<int>(); // list for holding value to remove
var input = Console.ReadLine().Split().Select(int.Parse).ToArray();
var len = Array.IndexOf(input, -1); // getting index of the first -1 (delimiter)
result = input.ToList(); // converting input array to List
result.RemoveRange(len, input.Length - len); // and deleting everything beyond first delimiter (including it)
for (var i = len + 1; i < input.Length - 1; i++) // for the number of elements in the sequence B
{
for (var j = 0; j < result.Count; j++) // going through all elmnts in sequence A
{
if (j % input[i] == 0) // if index is divisible by B[i]
{
values.Add(result[j]); // adding associated value to List<int> values
}
}
foreach (var value in values) // after all elements in sequence A have been looked upon, now deleting those who apply to criteria
{
result.Remove(value);
}
}
But the problem is that I'm only passing 5/11 test cases. 25% fail with 'Wrong result' and another 25% with 'Timed out'. I understand that my code is probably very badly written, but I really can't work out how to improve it.
So, if someone more experienced could clarify the following points for me, it would be very cool:
1. Am I parsing the console input right? I feel like it could be done in a more elegant/efficient way.
2. Is my logic of collecting the values that match the criteria and storing them for later deletion efficient in terms of performance? Or is there another way to do it?
3. Why is this code not passing all test cases, and how would you change it in order to pass all of them?
I'm writing the answer once again, since I misunderstood the problem completely. The problem in your code is undoubtedly the removal of elements, so let's try to avoid that. Make a new array C in which you store all the numbers that should be left in A after each removal pass: if index id is not divisible by B[i], add A[id] to C. Then, after checking all the indices against B[i], replace A with C and do the same for B[i + 1]. Repeat until you reach the end of B.
The algorithm:
1. For each value in B:
2.     For each id from 1 to length(A):
3.         If id % value != 0, add A[id] to C
4.     A = C
5. Return A.
EDIT: Be sure to make a new array C for each iteration of loop 1 (or, if A = C copies the elements rather than sharing the same array, clear C after replacing A with it).
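The steps above can be sketched in C# as follows. Filter is a hypothetical helper name, and the 1-based ids follow the pseudocode; whether the task really counts indices from 1 is an assumption you should check against the problem statement. Two things in the question's code that this avoids: the values list is never cleared between passes, and result.Remove(value) removes the first element with that value, which is not necessarily the one at the index you found.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; ids are 1-based as in the steps above (an assumption about the task).
static List<int> Filter(IEnumerable<int> a, IEnumerable<int> b)
{
    var current = a.ToList();
    foreach (var value in b)
    {
        var next = new List<int>(current.Count); // a fresh C for every value of B
        for (var id = 1; id <= current.Count; id++)
        {
            if (id % value != 0)
            {
                next.Add(current[id - 1]); // the list itself is 0-based
            }
        }
        current = next; // A = C
    }
    return current;
}

// Parsing as in the question: everything before the first -1 is A,
// everything between the first and the second -1 is B.
var input = "1 2 4 3 6 5 -1 2 -1".Split().Select(int.Parse).ToArray();
var firstDelimiter = Array.IndexOf(input, -1);
var seqA = input.Take(firstDelimiter);
var seqB = input.Skip(firstDelimiter + 1).TakeWhile(x => x != -1);

var remaining = Filter(seqA, seqB);
Console.WriteLine(string.Join(" ", remaining)); // 1 4 6
```

Each pass is a single linear scan that only appends, so the cost is proportional to the current length of A per value of B, instead of the repeated searching and shifting that Remove does.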
I am trying to extract strings which are items of a List. There are 200 lines to search, and the list contains 54474 items. For each line, I want to extract every list item that occurs in it as a substring. I pass both the line and the list to the function as arguments, as below:
private static string FindMatchingSkill(string line, List<string> skillsfromMongoDB)
{
StringBuilder builtString = new StringBuilder();
foreach (var item in skillsfromMongoDB)
{
string temp = " " + item;
builtString.Append(line.Substring(line.IndexOf(temp), temp.Length).Trim() + ", ");
}
return builtString.ToString();
}
The first thing you want to do is not to substring the original string, instead, print out the item from the list.
Instead of:
Console.WriteLine(line.Substring(line.IndexOf(item), item.Length).Trim() + ", ");
use
Console.Write(item +", ");
But to do that, you need to get only the items that are actually in the string, so your loop should be something like this:
foreach (var item in data.Where(i => line.IndexOf(i) > -1))
That might leave you with some false positives, since if the line contains javascript but not java, you will get both (java is a substring of javascript).
So the next step is to identify what is a full word and what is not. That might be a problem, since dot net is two words but just one item. Also, items in the original string might be followed by chars other than white space, like a comma, a dot, a semicolon etc.
So instead of just using IndexOf, you need to also make sure the item you found is not a part of a larger item - and since your list items are not restricted to be a single word, that poses a real difficulty.
I would probably suggest something like this:
// Note: this still throws if the item sits at the very start or very end of the line.
foreach (var item in data.Where(i => line.IndexOf(i) > -1 && !Char.IsLetter(line[line.IndexOf(i) + i.Length]) && !Char.IsLetter(line[line.IndexOf(i) - 1])))
{
Console.Write(item +", ");
}
This tests the chars directly before and after the item to make sure they are not letters. If either is a letter, it's a false positive. Please note that since your items might contain non-letter chars, you might still get false positives: if you have both dot net core and dot net in the list, but the line only has dot net core, you will get a false positive for dot net. However, this is an edge case that I think is probably safe to ignore.
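A bounds-safe version of the same check might look like the sketch below. ContainsWholeItem is a hypothetical helper name, not something from the question. It also retries later occurrences, so java is still found in a line that mentions javascript first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: true when `item` occurs in `text` with no letter directly before or after it.
static bool ContainsWholeItem(string text, string item)
{
    if (string.IsNullOrEmpty(item)) return false;

    var start = text.IndexOf(item, StringComparison.Ordinal);
    while (start >= 0)
    {
        var end = start + item.Length; // index of the first char after the match
        var letterBefore = start > 0 && char.IsLetter(text[start - 1]);
        var letterAfter = end < text.Length && char.IsLetter(text[end]);
        if (!letterBefore && !letterAfter)
        {
            return true;
        }
        // This occurrence was part of a longer word; look for a later one.
        start = text.IndexOf(item, start + 1, StringComparison.Ordinal);
    }
    return false;
}

var line = "Dheeraj has experience in dot net java programming Oracle javascript and Delphi";
var data = new List<string> { "Delphi", "dot net", "java", "Oracle", "script" };

var found = data.Where(i => ContainsWholeItem(line, i)).ToList();
Console.WriteLine(string.Join(", ", found)); // Delphi, dot net, java, Oracle
```

The dot net versus dot net core edge case described above still applies to this sketch.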
Here is an example:
var searchInLines = new string[200]; // filled with resumes
var dictionary = new string[50000]; // search dictionary
// Dictionary<TKey, TValue>.Add is not thread-safe, so the result is built
// by the query itself rather than by calling Add from the parallel workers.
var result = searchInLines.AsParallel()
.WithDegreeOfParallelism(Environment.ProcessorCount * 2)
.Select(searchInLine => new
{
Line = searchInLine,
Matches = dictionary.Where(s => searchInLine.Contains(s)).ToArray()
})
.ToDictionary(x => x.Line, x => x.Matches);
This produces a dictionary that maps each resume to the dictionary items found in it.
If you use the inaccurate string.Contains, it runs quickly, in 0.2 seconds.
If you use a regex like Regex.IsMatch(searchInLine, $"\\b{s}\\b") (to match whole words), it runs slowly, in 30 seconds.
The choice is yours
Since the list of data is the bigger collection, it is not a good candidate to loop through. I would suggest looping through the line instead, as it is smaller, assuming there is always a space between words.
List<string> data = new List<string>() { "Delphi", "dot net", "java", "Oracle" };
String line = "Dheeraj has experience in dot net java programming Oracle javascript and Delphi";
foreach (var item in line.Split(new char[] { ' ' }))
{
// Note: List<string>.Contains does a linear search; a HashSet<string> makes this lookup much faster
if(data.Contains(item))
{
Console.WriteLine(item);
}
}
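One catch with splitting on spaces is that a multi-word item such as dot net is never seen as a single token. A rough sketch that keeps the "loop over the line" idea, but uses a HashSet<string> for the lookups and also tests pairs of adjacent words, is shown below. Limiting items to at most two words and ignoring case are assumptions made for this example only.

```csharp
using System;
using System.Collections.Generic;

var data = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Delphi", "dot net", "java", "Oracle" };
var line = "Dheeraj has experience in dot net java programming Oracle javascript and Delphi";
var words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

var found = new List<string>();
for (var i = 0; i < words.Length; i++)
{
    // Single word: HashSet lookups do not scan the whole collection.
    if (data.Contains(words[i]))
    {
        found.Add(words[i]);
    }
    // Two adjacent words, to catch items such as "dot net".
    if (i + 1 < words.Length)
    {
        var pair = words[i] + " " + words[i + 1];
        if (data.Contains(pair))
        {
            found.Add(pair);
        }
    }
}

Console.WriteLine(string.Join(", ", found)); // dot net, java, Oracle, Delphi
```

Because whole tokens are compared, javascript does not produce a false positive for java here.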
I had three values (Name, Course and Average) which were assigned to 3 arrays. I had to combine them and put them in a listbox. Now, I need to be able to select the same line and break it back up into the 3 variables.
My listbox output looks like this:
Lastname, firstname SEC360 93.5
I tried to do a split with space, but that breaks up my lastname and firstname, which need to be one combined variable with the comma included (I need to check it against the array in which it is placed). I cannot do substring either, because I do not have a set value. Any ideas?
EDIT:
I am sorry everyone. I'm an inexperienced programmer (to say the least) and new to this site.
This is where I loaded the arrays:
studentNamesAr[studentCount] = studentNameTxtBox.Text;
courseAr[studentCount] = courseNumTxtBox.Text;
gradesAr[studentCount, 0] = Convert.ToInt32(grade1TxtBox.Text);
gradesAr[studentCount, 1] = Convert.ToInt32(grade2TxtBox.Text);
gradesAr[studentCount, 2] = Convert.ToInt32(grade3TxtBox.Text);
gradesAr[studentCount, 3] = Convert.ToInt32(grade4TxtBox.Text);
This is where I load the arrays to the listbox:
for (int i = 0; i != studentCount; i++)
{
studentAvg = ((gradesAr[i, 0] + gradesAr[i, 1] + gradesAr[i, 2] + gradesAr[i, 3]) / 4);
studentListBox.Items.Add(string.Format("{0, -20} {1, 20} {2, 20:F1}", studentNamesAr[i], courseAr[i], studentAvg));
}
Yes this is Windows Forms.
Data is not lost in arrays. When the program runs I should have maybe 5 entries. I need to split them back up so that when I select one from the listbox (to delete it), I will find the values in the 3 arrays and delete them, and then shift the remaining array values up.
If you preserve the order of your items in the listbox, the SelectedIndex of the listbox should match up with the indices from your original arrays. So once you have that, you can go about shifting your arrays up (which is pretty tedious).
For the record, there are much better constructs to use to approach this problem, but I'm assuming the stipulation of using arrays is part of a homework assignment. :)
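A minimal sketch of the shifting, assuming the array names from the question and a fixed capacity of 10 (the capacity and the sample data are made up). RemoveAtAndShift is a hypothetical helper, and the variable selected stands in for studentListBox.SelectedIndex, which is -1 when nothing is selected.

```csharp
using System;

// Hypothetical helper: removes the entry at `index` by moving later entries up one slot.
static void RemoveAtAndShift<T>(T[] array, int index, int count)
{
    for (var slot = index; slot < count - 1; slot++)
    {
        array[slot] = array[slot + 1];
    }
    array[count - 1] = default(T); // clear the now-unused last slot
}

var studentNamesAr = new string[10];
var courseAr = new string[10];
var gradesAr = new int[10, 4];
var studentCount = 3;

studentNamesAr[0] = "Doe, Jane"; courseAr[0] = "SEC360";
studentNamesAr[1] = "Roe, Rick"; courseAr[1] = "SEC280";
studentNamesAr[2] = "Poe, Ann";  courseAr[2] = "CIS115";
for (var row = 0; row < studentCount; row++)
    for (var g = 0; g < 4; g++)
        gradesAr[row, g] = 90 + row;

var selected = 1; // stand-in for studentListBox.SelectedIndex
if (selected >= 0 && selected < studentCount)
{
    RemoveAtAndShift(studentNamesAr, selected, studentCount);
    RemoveAtAndShift(courseAr, selected, studentCount);

    // The two-dimensional grades array is shifted row by row.
    for (var row = selected; row < studentCount - 1; row++)
        for (var g = 0; g < 4; g++)
            gradesAr[row, g] = gradesAr[row + 1, g];
    for (var g = 0; g < 4; g++)
        gradesAr[studentCount - 1, g] = 0;

    studentCount--;
}

Console.WriteLine(studentNamesAr[1] + " " + courseAr[1]); // Poe, Ann CIS115
```

In the form you would also call studentListBox.Items.RemoveAt(selected) so that the list box indices keep matching the arrays.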
You could hold the value separated by some other delimiter in the .Tag property.
I suggest replacing the ", " (notice the space after the ',') with something else (like ";") and then using the Split method with a space as the parameter.
string s = "Lastname, firstname SEC360 93.5";
string[] result = s.Replace(", ", ";").Split(' ');
Then you can restore result[0] by replacing ";" with ", ".
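Since the list box lines in the question are padded with the {0, -20} style format, a plain Split(' ') would also return a lot of empty strings. Here is a sketch of the same replace-and-split idea with StringSplitOptions.RemoveEmptyEntries added. The invariant culture is used only so the example prints 93.5 on every machine, and the trick assumes the name contains no other spaces and no ';'.

```csharp
using System;
using System.Globalization;

// Build a line the same way the question does (same format string).
var line = string.Format(CultureInfo.InvariantCulture,
    "{0, -20} {1, 20} {2, 20:F1}", "Lastname, firstname", "SEC360", 93.5);

// Hide the space inside the name, then split on spaces and drop the padding.
var parts = line.Replace(", ", ";").Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

var name = parts[0].Replace(";", ", ");                               // restore the original name
var course = parts[1];
var average = double.Parse(parts[2], CultureInfo.InvariantCulture);

Console.WriteLine(name);    // Lastname, firstname
Console.WriteLine(course);  // SEC360
```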
I'm having two collections of String like
List<String> l_lstOne = new List<String> { "100", "1X0", "X11", "XXX" },
l_lstTwo = new List<String> { "000", "110", "100", "000" };
I need to compare the two lists and make the second list like
{ "000", "1X0", "X00", "XXX" }
Note:
Both lists will contain the same number of elements, and the length of each element will be the same.
The comparison is like
If the mth element in l_lstOne has an 'X' in the nth position, then the nth position of the mth element in l_lstTwo should be replaced by 'X'.
Example
l_lstOne l_lstTwo Output
100 000 000
1X0 110 1X0
X11 100 X00
So, to solve this I used a nested for loop. Here is my source code:
for (int l_nIndex = 0; l_nIndex < l_lstTwo.Count; l_nIndex++)
{
String l_strX = String.Empty;
for (int l_nInnerIndex = 0; l_nInnerIndex < l_lstTwo[l_nIndex].Length; l_nInnerIndex++)
{
l_strX += l_lstOne[l_nIndex][l_nInnerIndex] == 'X' ? 'X' : l_lstTwo[l_nIndex][l_nInnerIndex];
}
l_lstTwo[l_nIndex] = l_strX;
}
This code is working fine, but it takes too long to execute: almost 600 milliseconds to process 200000 elements, each of length 16.
Moreover, I need a Linq or lambda method to solve this. So please help me to do this. Thanks in advance.
LINQ will not help you here; LINQ is not meant to modify collections.
You can make your code substantially faster by building a char[] instead of a string; right now, you're building 3.2 million string objects because of the +=.
Instead, you can write
char[] l_strX = new char[l_lstTwo[l_nIndex].Length];
for (int l_nInnerIndex = 0; l_nInnerIndex < l_strX.Length; l_nInnerIndex++)
{
l_strX[l_nInnerIndex] = l_lstOne[l_nIndex][l_nInnerIndex] == 'X' ? 'X' : l_lstTwo[l_nIndex][l_nInnerIndex];
}
l_lstTwo[l_nIndex] = new string(l_strX);
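Put together with the outer loop and the sample data from the question, the char[] idea could look like the sketch below. It is a slight variation: it starts from ToCharArray() on the target string and overwrites only the masked positions. The local names mask and chars are made up for the example.

```csharp
using System;
using System.Collections.Generic;

var l_lstOne = new List<string> { "100", "1X0", "X11", "XXX" };
var l_lstTwo = new List<string> { "000", "110", "100", "000" };

for (int l_nIndex = 0; l_nIndex < l_lstTwo.Count; l_nIndex++)
{
    string mask = l_lstOne[l_nIndex];
    // ToCharArray gives a mutable copy of the target string.
    char[] chars = l_lstTwo[l_nIndex].ToCharArray();
    for (int i = 0; i < chars.Length; i++)
    {
        if (mask[i] == 'X')
        {
            chars[i] = 'X';
        }
    }
    // One string allocation per element instead of one per character.
    l_lstTwo[l_nIndex] = new string(chars);
}

Console.WriteLine(string.Join(", ", l_lstTwo)); // 000, 1X0, X00, XXX
```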
You could do it with the following statement in .NET 3.5
IEnumerable <String> result =
Enumerable.Range(0, l_lstOne.Count)
.Select(i => Enumerable.Range(0, l_lstOne[i].Length)
.Aggregate(string.Empty, (innerResult, x) => innerResult += l_lstOne[i][x] == 'X' ? 'X' : l_lstTwo[i][x]));
Mh, if I understand it correctly the words in l_lstOne act as a mask for the words in l_lstTwo where the mask is transparent unless it's an X. How about this:
l_lstOne.Zip(l_lstTwo,
(w1, w2) => new String(w1.Zip(w2, (c1, c2) => c1 == 'X' ? c1 : c2).ToArray()))
Zip is a Linq extension method, available from .NET 4 on, which combines the elements of two sequences pairwise, like a zipper. The outer Zip creates the word pairs to iterate over, and the inner one applies the mask (take each character from the second word unless word one has an X in that position).
Also note that this creates a new sequence of strings rather than replacing the ones in l_lstTwo - that's the Linq way of doing things.
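Here is the Zip approach as a complete snippet run on the lists from the question. The result variable name masked and the final ToList() are additions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var l_lstOne = new List<string> { "100", "1X0", "X11", "XXX" };
var l_lstTwo = new List<string> { "000", "110", "100", "000" };

// Outer Zip pairs up the words; inner Zip pairs up their characters.
List<string> masked = l_lstOne
    .Zip(l_lstTwo, (w1, w2) => new string(w1.Zip(w2, (c1, c2) => c1 == 'X' ? c1 : c2).ToArray()))
    .ToList();

Console.WriteLine(string.Join(", ", masked));   // 000, 1X0, X00, XXX
Console.WriteLine(string.Join(", ", l_lstTwo)); // 000, 110, 100, 000 (unchanged)
```

If the second list really has to be replaced, assigning masked back to l_lstTwo afterwards does that without modifying anything during the query.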