Create nested list (array) of strings from list of strings - c#

I have a list of names. From these, I want to create a new list of lists (or jagged array, if that works better), where the lower-level lists contain variations of the names.
The basic idea is that you take a name and remove one letter at a time to create a list which features all of these creations, plus the original name. So for example, if your names are "bob" and "alice", I want to output
[["bo", "bb","ob","bob"], ["alic", "alie", "alce", "aice", "lice", "alice"]].
I can easily do this for just one name, but I run into problems I can't resolve when I try to create such a nested list (I'm relatively new to C#). Here's a toy example of what I've been trying:
List<string> names = new List<string>()
{
    "alice",
    "bob",
    "curt"
};
//initialize jagged array
string[][] modifiedNames = new string[names.Count][];
//iterate over all names in "names" list
foreach (string name in names)
{
    int nameIndex = names.IndexOf(name);
    //initialize lower level of array
    modifiedNames[nameIndex] = new string[name.Length];
    //create variations of a given name
    for (int i = 0; i < name.Length; i++)
    {
        string newName = name.Substring(0, i) + name.Substring(i + 1);
        if (modifiedNames[nameIndex].Contains(newName) == false)
            modifiedNames[nameIndex].Add(newName);
    }
    modifiedNames[nameIndex].Add(name);
}
I've tried several versions thereof, both with lists and arrays, but to no avail. The error message I get in this case is
'string[]' does not contain a definition for 'Add' and no accessible extension method 'Add' accepting a first argument of type 'string[]' could be found (are you missing a using directive or an assembly reference?)
Thanks a lot for your help!

First, that error is telling you that there is no Add() function for an array. As JohnB points out, a List would probably be a better fit here.
Second, I don't like the string[][] anyway. I'd use IDictionary<string, IList<string>>, storing your original name as the key, and the modified names as the value. That way the original and modified versions are stored together and you don't need to cross reference names with modifiedNames (one of which is a List and the other (currently) an array).
IDictionary<string, IList<string>> names = new Dictionary<string, IList<string>>();
names.Add("alice", new List<string>());
names.Add("bob", new List<string>());
names.Add("curt", new List<string>());
foreach (KeyValuePair<string, IList<string>> name in names)
{
for (int i = 0; i < name.Key.Length; i++)
{
string newName = name.Key.Substring(0, i) + name.Key.Substring(i + 1);
if (!name.Value.Contains(newName))
{
name.Value.Add(newName);
}
}
}
Hope this helps.

How about using a list of lists (of string) while you are working in the method and then converting it into an array of array before returning? Or even just returning a list of list, if the return type is not set in stone?
Here is a suggestion:
var names = new List<string>()
{
"alice",
"bob",
"curt"
};
var nameVariations = new List<List<string>>();
foreach (var name in names)
{
var variationsOfName = new List<string>();
for (int i = 0; i < name.Length; i++)
{
var newName = name.Substring(0, i) + name.Substring(i + 1);
if (!variationsOfName.Contains(newName))
{
variationsOfName.Add(newName);
}
}
nameVariations.Add(variationsOfName);
}
return nameVariations.Select(variationsOfName => variationsOfName.ToArray()).ToArray();
Note: for this to compile you'll need to add Linq (using System.Linq;).

I would do this in two steps. First, write a simple method that will get a list of name variations for a single name. We can simplify the code using some System.Linq extension methods, like Select() and ToList(). The Select statement below treats the string as a character array, and for each character t at index i, it selects the substring from name up to that character and adds the substring from name after that character, returning an IEnumerable<string>, which we create a new List<string> from. Then we finally add the original name to the list and return it:
public static List<string> GetNameVariations(string name)
{
var results = name.Select((t, i) =>
name.Substring(0, i) + name.Substring(i + 1, name.Length - (i + 1)))
.ToList();
results.Add(name);
return results;
}
And then we can use this method to get a List<List<string>> of names from a list of names using another method. Here we are calling GetNameVariations for each name in names (which returns a new List<string> for each name), and returning these lists in a new List<List<string>>:
public static List<List<string>> GetNameVariations(List<string> names)
{
return names.Select(GetNameVariations).ToList();
}
In use, this might look like (using your example):
private static void Main()
{
var names = new List<string> {"bob", "alice", "curt"};
foreach (var nameVariations in GetNameVariations(names))
{
Console.WriteLine(string.Join(", ", nameVariations));
}
GetKeyFromUser("\nDone! Press any key to exit...");
}
Output
ob, bb, bo, bob
lice, aice, alce, alie, alic, alice
urt, crt, cut, cur, curt

You can simplify your logic by using String.Remove(i, 1) to remove one character at a time, iterating over values of i for the length of the string. You can write your query as a one-liner:
var result = names.Select(name => name.Select((_, i) => name.Remove(i, 1)).Reverse().Concat(new[] { name }).ToList()).ToList();
Reformatted for readability:
var result = names
.Select(name =>
name.Select((_, i) => name.Remove(i, 1))
.Reverse()
.Concat(new[] { name })
.ToList())
.ToList();
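As a rough illustration, the same Remove(i, 1) idea can be wrapped in a small helper that also drops duplicate variations, which the loop in the question was trying to do. This is only a sketch: the class name NameVariations and the methods Of and OfAll are invented for the example, and it keeps the variations in left-to-right order rather than reversing them.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NameVariations
{
    // Hypothetical helper: every string obtained by deleting one character,
    // without duplicates, followed by the original name.
    public static List<string> Of(string name)
    {
        return Enumerable.Range(0, name.Length)
            .Select(i => name.Remove(i, 1))   // delete the character at index i
            .Distinct()                       // "aab" would otherwise yield "ab" twice
            .Concat(new[] { name })           // keep the original as the last entry
            .ToList();
    }

    // Apply the helper to each name to build the nested list.
    public static List<List<string>> OfAll(IEnumerable<string> names)
    {
        return names.Select(Of).ToList();
    }
}
```

Calling NameVariations.OfAll(names) on the list from the question would then give one inner list per name.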

I think this is a fairly simple way to go:
List<string> names = new List<string>()
{
"alice",
"bob",
"curt"
};
string[][] modifiedNames =
names
.Select(name =>
Enumerable
.Range(0, name.Length)
.Select(x => name.Substring(0, x) + name.Substring(x + 1))
.Concat(new [] { name })
.ToArray())
.ToArray();
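That gives:
[["lice", "aice", "alce", "alie", "alic", "alice"], ["ob", "bb", "bo", "bob"], ["urt", "crt", "cut", "cur", "curt"]]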

Related

Remove names that contain another in a list

I have a file with "Name|Number" in each line and I wish to remove the lines with names that contain another name in the list.
For example, if there is "PEDRO|3" , "PEDROFILHO|5" , "PEDROPHELIS|1" in the file, i wish to remove the lines "PEDROFILHO|5" , "PEDROPHELIS|1".
The list has 1.8 million lines, I made it like this but its too slow :
List<string> names = File.ReadAllLines("firstNames.txt").ToList();
List<string> result = File.ReadAllLines("firstNames.txt").ToList();
foreach (string name in names)
{
string tempName = name.Split('|')[0];
List<string> temp = names.Where(t => t.Contains(tempName)).ToList();
foreach (string str in temp)
{
if (str.Equals(name))
{
continue;
}
result.Remove(str);
}
}
File.WriteAllLines("result.txt",result);
Does anyone know a faster way? Or how to improve the speed?
Since you are looking for matches anywhere in the word, you will end up with an O(n²) algorithm. You can improve the implementation a bit by avoiding removal of strings from a list, which is itself an O(n) operation:
var toDelete = new HashSet<string>();
var names = File.ReadAllLines("firstNames.txt");
foreach (string name in names) {
var tempName = name.Split('|')[0];
toDelete.UnionWith(
// Length constraint removes self-matches
names.Where(t => t.Length > name.Length && t.Contains(tempName))
);
}
File.WriteAllLines("result.txt", names.Where(name => !toDelete.Contains(name)));
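One possible refinement, sketched here under the assumption that every line has the form Name|Number with a non-empty name: split each line once up front and compare only the name parts, so the digits after the | never influence the length check. The class and method names are made up for the example, and the approach is still quadratic.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NameFilter
{
    // Hypothetical helper: keeps only the lines whose name does not contain
    // another, strictly shorter name from the same input. Still O(n^2).
    public static List<string> RemoveContainingNames(IReadOnlyList<string> lines)
    {
        // Split every "Name|Number" line once instead of inside the inner loop.
        var names = lines.Select(line => line.Split('|')[0]).ToArray();

        var keep = new List<string>();
        for (int i = 0; i < lines.Count; i++)
        {
            bool containsOther = false;
            for (int j = 0; j < names.Length && !containsOther; j++)
            {
                // A strictly shorter name can never be the line's own name,
                // so no separate self-match check is needed.
                containsOther = names[j].Length < names[i].Length
                    && names[i].Contains(names[j]);
            }
            if (!containsOther) keep.Add(lines[i]);
        }
        return keep;
    }
}
```

With the files from the question, this could be used as File.WriteAllLines("result.txt", NameFilter.RemoveContainingNames(File.ReadAllLines("firstNames.txt"))).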
This works, but I don't know if it's quicker; I haven't tested it on millions of lines. Remove the ToLower calls if the names are all in the same case.
List<string> names = File.ReadAllLines(@"C:\Users\Rob\Desktop\File.txt").ToList();
var result = names.Where(w => !names.Any(a=> w.Split('|')[0].Length> a.Split('|')[0].Length && w.Split('|')[0].ToLower().Contains(a.Split('|')[0].ToLower())));
File.WriteAllLines(@"C:\Users\Rob\Desktop\result.txt", result);
test file had
Rob|1
Robbie|2
Bert|3
Robert|4
Jan|5
John|6
Janice|7
Carol|8
Carolyne|9
Geoff|10
Geoffrey|11
Result had
Rob|1
Bert|3
Jan|5
John|6
Carol|8
Geoff|10

List function, how to get an average of scores for each name- c# console application

I have a list function on a console application on C#. This list function has different items where they look something like 'matt,5' 'matt,7' 'jack,4' 'jack,8' etc...
I want to combine the entries so that each name appears only once, with the numbers after it averaged. For example, matt's entries would become 'matt,(5+7)/2', which would then display as 'matt,6'.
So far I have this...
currentFileReader = new StreamReader(file);
List<string> AverageList = new List<string>();
while (!currentFileReader.EndOfStream)
{
string text = currentFileReader.ReadLine();
AverageList.Add(text.ToString());
}
AverageList.GroupBy(n => n).Any(c => c.Count() > 1);
Not really sure where to go from here.
What you need to do is Split each string item on the comma, group by the first element of the resulting array, and average the second element (after parsing it to int), something like:
List<string> AverageList = new List<string> { "matt,5", "matt,7", "jack,4", "jack,8" };
var query = AverageList.Select(s => s.Split(','))
.GroupBy(sp => sp[0])
.Select(grp =>
new
{
Name = grp.Key,
Avg = grp.Average(t=> int.Parse(t[1])),
});
foreach (var item in query)
{
Console.WriteLine("Name: {0}, Avg: {1}", item.Name, item.Avg);
}
and it will give you:
Name: matt, Avg: 6
Name: jack, Avg: 6
But, a better option would be to use a class with Name and Score properties instead of comma separated string values.
(The code above doesn't check for invalid input values).
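A minimal sketch of that suggestion follows. The type ScoreEntry and the helper class ScoreParser are hypothetical names chosen for illustration; the parser skips malformed lines instead of throwing, which is one way to handle the invalid input mentioned above.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical type; the name and its properties are illustrative only.
public sealed class ScoreEntry
{
    public string Name { get; set; }
    public int Score { get; set; }
}

public static class ScoreParser
{
    // Parses lines such as "matt,5". Lines that do not have exactly two
    // fields, or whose score is not an integer, are skipped.
    public static List<ScoreEntry> Parse(IEnumerable<string> lines)
    {
        var entries = new List<ScoreEntry>();
        foreach (var line in lines)
        {
            var parts = line.Split(',');
            if (parts.Length == 2 &&
                int.TryParse(parts[1], NumberStyles.Integer, CultureInfo.InvariantCulture, out int score))
            {
                entries.Add(new ScoreEntry { Name = parts[0].Trim(), Score = score });
            }
        }
        return entries;
    }

    // Average score per name.
    public static Dictionary<string, double> AverageByName(IEnumerable<ScoreEntry> entries)
    {
        return entries
            .GroupBy(e => e.Name)
            .ToDictionary(g => g.Key, g => g.Average(e => e.Score));
    }
}
```

Working with typed entries keeps the parsing in one place, so the grouping code no longer needs to index into split arrays.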
First, populate your unformatted data into a List; as you can see, I called it rawScores. Then Split each line on the comma that delimits it. Next, check whether the person already exists in your Dictionary: if so, add the score to that person's list; if not, create a new entry for the person.
After that, you simply have to compute the Average of each List.
Hope this helps!
var scores = new Dictionary<string, List<int>>();
var rawScores = new List<string>();
rawScores.ForEach(raw =>
{
var split = raw.Split(',');
if (scores.Keys.All(a => a != split[0]))
{
scores.Add(split[0], new List<int> {Convert.ToInt32(split[1])});
}
else
{
var existing = scores.FirstOrDefault(f => f.Key == split[0]);
existing.Value.Add(Convert.ToInt32(split[1]));
}
});
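The snippet above stops once the scores are grouped. The averaging step might look like the following sketch, which assumes every line is well formed. The class and method names are invented for the example, and TryGetValue is swapped in for the scan over Keys so that each line needs only one dictionary lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScoreAverages
{
    // Groups "name,score" lines by name and returns the average per name.
    // Assumes every line is well formed; malformed input will throw.
    public static Dictionary<string, double> FromLines(IEnumerable<string> rawScores)
    {
        var scores = new Dictionary<string, List<int>>();
        foreach (var raw in rawScores)
        {
            var split = raw.Split(',');
            // TryGetValue does a single hash lookup instead of scanning all keys.
            if (!scores.TryGetValue(split[0], out var list))
            {
                list = new List<int>();
                scores.Add(split[0], list);
            }
            list.Add(Convert.ToInt32(split[1]));
        }
        // Average() on a sequence of int comes from System.Linq and returns a double.
        return scores.ToDictionary(pair => pair.Key, pair => pair.Value.Average());
    }
}
```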

Add value to array in C#

Hello, I'm just starting out in C# and am practicing with arrays. My question is how I can add a name, "steve", to the array in this code:
string[] names = new string[] {"Matt", "Joanne", "Robert"};
foreach (string i in names)
{
richTextBox1.AppendText(i + Environment.NewLine);
}
Can anyone help me?
You can resize an array; however, it's probably better to just use a list if you need a collection whose size changes.
Note that resizing an array actually just creates a new array of the size you want behind the scenes and copies over all the data.
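To make the "new array behind the scenes" point concrete, here is a small sketch that keeps a second reference to the original array and checks what Array.Resize does to it. The class and method names are invented for the example.

```csharp
using System;

public static class ArrayResizeDemo
{
    // Returns true when Array.Resize replaced the array object rather than
    // growing the existing one in place.
    public static bool ResizeAllocatesNewArray()
    {
        string[] names = { "Matt", "Joanne", "Robert" };
        string[] before = names;              // second reference to the original array

        Array.Resize(ref names, names.Length + 1);
        names[names.Length - 1] = "Steve";

        // 'before' still has three elements; 'names' now refers to a new array of four.
        return !ReferenceEquals(before, names) && before.Length == 3 && names.Length == 4;
    }
}
```

Any other code still holding the old reference keeps seeing the three-element array, which is one reason a List is usually the easier choice.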
Arrays don't play well with this idea. Usually, people use List for this kind of thing.
List<string> names = new List<string> {"Matt", "Joanne", "Robert"};
names.Add("Steve");
foreach (string i in names)
{
richTextBox1.AppendText(i + Environment.NewLine);
}
You can't add elements to an array once the array has been created. You can:
Add the element before the array has been created as a literal:
string[] names = new string[] {"Matt", "Joanne", "Robert", "Steve", "Another name", "Tons of other names"};
Or you can use a collection that allows you to add elements after it has been created such as a List. To use a List instead of array, make sure you have the following directive using System.Collections.Generic at the top of your main file (should be included by default). Now you can do:
List<string> names = new List<string> {"Matt", "Joanne", "Robert"};
names.Add("Steve");
names.Add("Another one");
Although you can expand .NET arrays, in a situation like this you would be better off with a List<string>:
List<string> names = new List<string> {"Matt", "Joanne", "Robert"};
Now you can add a new name to names by calling Add:
names.Add("Steve");
Note: rather than using AppendText in a loop, you could use string.Join, like this:
richTextBox1.AppendText(string.Join(Environment.NewLine, names));
To add the Item to the array, using the code you provided, you can do this:
string[] names = new string[] { "Matt", "Joanne", "Robert" };
Array.Resize(ref names, names.Length + 1);
names[names.Length - 1] = "Steve";
foreach (string i in names)
{
richTextBox1.AppendText(i + Environment.NewLine);
}
Consider using this code instead, that uses List:
List<string> names = new List<string> { "Matt", "Joanne", "Robert" };
names.Add("Steve"); // Add a new entry
richTextBox1.AppendText(String.Join(Environment.NewLine, names));
The array has a fixed size. You created it with three elements, so it will always have three elements. You can modify any element like so:
names[index] = "value";
You can make a list from an array by writing:
List<string> list = names.OfType<string>().ToList();
and then continue from there as the others mentioned!
Example for resizing your array:
string[] names = { "Matt", "Joanne", "Robert" };
Array.Resize(ref names, names.Length + 1);
names[names.Length - 1] = "Steve";
Steve has given the proper reference above.

How to remove Duplicates from two List except few elements which may be duplicate also?

I have two lists that have some elements in common. I want to remove the duplicate elements except for a few, as described below. The order of the strings must stay the same, and the two lists may not contain the same number of elements.
List A    List B
ASCB      ASCB
test1     test1
test2     test5
test3     test3
test4     test6
Arinc     Arinc
testA     testC
testB     testB
testC
tesctD
Now I want to remove all elements common to both lists except ASCB and Arinc. How can I do that?
I would just store the special values (ASCB, ARINC, etc.) in their own list so I can use Except to get the difference between the two sets. You can add the special values in afterwards.
List<string> except = listA.Except(listB).Concat(listB.Except(listA)).Concat(SpecialValues).ToList();
You have to call except twice because first we get items in A that are not in B. Then we add items that are in B but not in A. Finally we add the special values (I'm assuming SpecialValues is a collection with the strings you don't want removed).
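The two Except calls can be sketched as a helper like the one below. The class, method, and parameter names are hypothetical. Note two properties of this approach: Except also collapses repeated items within a list, and the special values end up at the end of the result rather than in their original positions.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListDiff
{
    // Items found in exactly one of the two lists, followed by the special values.
    public static List<string> UncommonPlusSpecial(
        IEnumerable<string> listA,
        IEnumerable<string> listB,
        IEnumerable<string> specialValues)
    {
        return listA.Except(listB)                // in A but not in B
            .Concat(listB.Except(listA))          // in B but not in A
            .Concat(specialValues)                // re-add the values to keep
            .ToList();
    }
}
```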
You'd have to test performance as I suspect it's not the most efficient.
List<string> wordstoKeep = new List<string>() { "ASCB", "Arinc" };
foreach (string str in listB)
{
int index = listA.FindIndex(x => x.Equals(str, StringComparison.OrdinalIgnoreCase));
if (index >= 0)
{
if (!wordstoKeep.Any(x => x.Equals(str, StringComparison.OrdinalIgnoreCase)))
listA.RemoveAt(index);
}
else
listA.Add(str);
}
var listA = new List<string>{"ASCB","test1","test2"};
var listB = new List<string>{"ASCB","test1","test2"};
var combinedList = listA.Where(a => a.Contains("test"))
.Concat(listB.Where(b => b.Contains("test")))
.Distinct().Dump();
outputs 'test1', 'test2'
your filter conditions are contained in your Where clause.
Where can be whatever condition you want to filter by:
Where(a => a != "ASCB" or whatever...
Concat joins the two lists. Then call Distinct() to get unique entries.
Going off the requirement that order must be the same
if(B.Count != A.Count)
return;
List<String> reserved = new List<string>{ "ASCB", "ARINC" };
for (int i = A.Count -1; i >= 0; i--)
{
if (!reserved.Contains(A[i].ToUpper()) && A[i] == B[i])
{
A.RemoveAt(i);
B.RemoveAt(i);
}
}
This works:
var listA = new List<string>()
{
"ASCB",
"test1",
"test2",
"test3",
"test4",
"Arinc",
"testA",
"testB"
};
var listB = new List<string>()
{
"ASCB",
"test1",
"test5",
"test3",
"test6",
"Arinc",
"testC",
"testB"
};
var dontRemoveThese = new List<string>(){"ASCB", "Arinc"};
var listToRemove = new List<string>();
foreach (var str in listA)
if (listB.Contains(str))
listToRemove.Add(str);
foreach (var str in listToRemove){
if (dontRemoveThese.Contains(str))
continue;
listA.Remove(str);
listB.Remove(str);
}
I like this solution because you can see what happens. I'd rather have 10 lines of code where it's obvious what happens than 1-3 lines of obscure magic.

C# dedupe List based on split

I'm having a hard time deduping a list based on a specific delimiter.
For example I have 4 strings like below:
apple|pear|fruit|basket
orange|mango|fruit|turtle
purple|red|black|green
hero|thor|ironman|hulk
In this example I should want my list to only have unique values in column 3, so it would result in an List that looks like this,
apple|pear|fruit|basket
purple|red|black|green
hero|thor|ironman|hulk
In the above example I would have gotten rid of line 2 because line 1 had the same result in column 3. Any help would be awesome, deduping is tough in C#.
how i'm testing this:
static void Main(string[] args)
{
BeginListSet = new List<string>();
startHashSet();
}
public static List<string> BeginListSet { get; set; }
public static void startHashSet()
{
string[] BeginFileLine = File.ReadAllLines(@"C:\testit.txt");
foreach (string begLine in BeginFileLine)
{
BeginListSet.Add(begLine);
}
}
public static IEnumerable<string> Dedupe(IEnumerable<string> list, char seperator, int keyIndex)
{
var hashset = new HashSet<string>();
foreach (string item in list)
{
var array = item.Split(seperator);
if (hashset.Add(array[keyIndex]))
yield return item;
}
}
Something like this should work for you
static IEnumerable<string> Dedupe(this IEnumerable<string> input, char seperator, int keyIndex)
{
var hashset = new HashSet<string>();
foreach (string item in input)
{
var array = item.Split(seperator);
if (hashset.Add(array[keyIndex]))
yield return item;
}
}
...
var list = new string[]
{
"apple|pear|fruit|basket",
"orange|mango|fruit|turtle",
"purple|red|black|green",
"hero|thor|ironman|hulk"
};
foreach (string item in list.Dedupe('|', 2))
Console.WriteLine(item);
Edit: In the linked question Distinct() with Lambda, Jon Skeet presents the idea in a much better fashion, in the form of a DistinctBy custom method. While similar, his is far more reusable than the idea presented here.
Using his method, you could write
var deduped = list.DistinctBy(item => item.Split('|')[2]);
And you could later reuse the same method to "dedupe" another list of objects of a different type by a key of possibly yet another type.
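For readers without the linked answer at hand, a DistinctBy-style extension might look roughly like the sketch below. This is not Jon Skeet's code, only an illustration of the idea. It is named DistinctByKey here (a made-up name) so that it does not clash with the Enumerable.DistinctBy method that ships with .NET 6 and later.

```csharp
using System;
using System.Collections.Generic;

public static class DedupeExtensions
{
    // Hypothetical stand-in for a DistinctBy-style method: yields the first
    // item seen for each key and skips later items with the same key.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the key is already present.
            if (seenKeys.Add(keySelector(item)))
                yield return item;
        }
    }
}
```

Usage would mirror the line above: list.DistinctByKey(item => item.Split('|')[2]).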
Try this:
var list = new string[]
{
"apple|pear|fruit|basket",
"orange|mango|fruit|turtle",
"purple|red|black|green",
"hero|thor|ironman|hulk "
};
var dedup = new List<string>();
var filtered = new List<string>();
foreach (var s in list)
{
var filter = s.Split('|')[2];
if (dedup.Contains(filter)) continue;
filtered.Add(s);
dedup.Add(filter);
}
// Console.WriteLine(filtered);
Can you use a HashSet instead? That will eliminate dupes automatically for you as they are added.
Maybe you could sort the lines alphabetically by their |-delimited words and store the words in a grid (one column per position). Then, when you try to insert a line, just check whether the relevant column already has a word starting with the same character.
If LINQ is an option, you can do something like this:
// assume strings is a collection of strings
List<string> list = strings.Select(a => a.Split('|')) // split each line by '|'
.GroupBy(a => a[2]) // group by third column
.Select(a => a.First()) // select first line from each group
.Select(a => string.Join("|", a))
.ToList(); // convert to list of strings
Edit (per Jeff Mercado's comment), this can be simplified further:
List<string> list =
strings.GroupBy(a => a.Split('|')[2]) // group by third column
.Select(a => a.First()) // select first line from each group
.ToList(); // convert to list of strings
