Find common string in list of strings C#

I have a list of strings:
List<string> list = new List<string>
{
"AAB AOC 321",
"AABAOC-WEB_A",
"AABAOC-WEB_B"
};
Now I want to extract the longest common prefix from this list of strings, ignoring white space and special characters such as "_", which should give the result below:
"AABAOC"
I have tried the below method to achieve the same:
var samples = new[] {
"AAB AOC 321",
"AABAOC-WEB_A",
"AABAOC-WEB_B" };
var commonPrefix = new string(samples.First().Substring(0, samples.Min(s => s.Length))
.TakeWhile((c, i) => samples.All(s => s[i] == c)).ToArray());
But the above method does not ignore white space and special characters, so it gives the result "AAB". I have tried to play with TakeWhile but could not get what I want.

You need to transform your strings - remove anything that should be ignored from them.
You can use LINQ for that as well:
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
var samples = new[] {"AAB AOC 321", "AABAOC-WEB_A", "AABAOC-WEB_B"};
var ignored = new HashSet<char>("-& _");
// use char[] internally
// use ToList() to avoid multiple enumerations
var transformed = samples
.Select(s => s.Where(c => !ignored.Contains(c)).ToArray()).ToList();
foreach (var s in transformed)
System.Console.WriteLine(s);
// just use transformed here - adjusted Substring to Take
var commonPrefix = new string (transformed.First()
.Take(transformed.Min(s => s.Length))
.TakeWhile((c, i) => transformed.All(s => s[i] == c))
.ToArray());
System.Console.WriteLine($"\nPrefix {commonPrefix}");
}
}
Output:
AABAOC321
AABAOCWEBA
AABAOCWEBB
Prefix AABAOC
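If the set of characters to ignore is not known in advance, one possible variation is to keep only letters and digits instead of listing the ignored characters. The following is a minimal sketch under that assumption; the class and method names (PrefixSketch, CommonAlphanumericPrefix) are made up for illustration, and whether "letters and digits only" matches your data is something you would need to check.

```csharp
using System;
using System.Linq;

public static class PrefixSketch
{
    // Hypothetical helper: strips everything except letters and digits,
    // then returns the longest prefix shared by all cleaned strings.
    public static string CommonAlphanumericPrefix(string[] samples)
    {
        if (samples.Length == 0) return string.Empty;

        // Materialize once so the cleaned strings are not rebuilt per character.
        var cleaned = samples
            .Select(s => new string(s.Where(c => char.IsLetterOrDigit(c)).ToArray()))
            .ToList();

        // Never look past the end of the shortest cleaned string.
        int minLength = cleaned.Min(s => s.Length);

        return new string(cleaned[0]
            .Take(minLength)
            .TakeWhile((c, i) => cleaned.All(s => s[i] == c))
            .ToArray());
    }

    public static void Main()
    {
        var samples = new[] { "AAB AOC 321", "AABAOC-WEB_A", "AABAOC-WEB_B" };
        Console.WriteLine(CommonAlphanumericPrefix(samples)); // AABAOC
    }
}
```

Compared with the HashSet of ignored characters, this trades explicit control for less maintenance: any new punctuation in the input is dropped automatically.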

Related

How to prevent a System.IndexOutOfRangeException in a LINQ WHERE?

I'm facing this exception when I'm using String.Split with random strings.
List<string> linhas = new List<string>();
linhas.Add("123;abc");
linhas.Add("456;def");
linhas.Add("789;ghi");
linhas.Add("chocolate");
var novas = linhas.Where(l => l.ToString().Split(';')[1]=="def");
The last string, "chocolate", doesn't contain a ";", so String.Split returns an array with a single string, "chocolate". That's why you get the exception when you try to access the second element.
You could use ElementAtOrDefault, which returns null (the default value for string) when the index is out of range:
var novas = linhas.Where(l => l.Split(';').ElementAtOrDefault(1) == "def");
A longer approach using an anonymous type:
var novas = linhas
.Select(l => new { Line = l, Split = l.Split(';') })
.Where(x => x.Split.Length >= 2 && x.Split[1] == "def")
.Select(x => x.Line);
I'm going to expand a little on Tim's answer and show a way to do a few extra things within your LINQ queries.
You can expand the logic within your Where clause to do some additional processing, which can make your code a bit more readable. This is good for something small:
var novas = linhas.Where(l =>
{
var parts = l.Split(';');
return parts.Length > 1 ? parts[1] == "def" : false;
});
If you need multiple statements, you can wrap the body of your clause within curly braces, but then you need the return keyword.
Alternatively, if you have a lot of information that would make something inline like that unreadable, you can also use a separate method within your query.
public void FindTheStringImLookingFor()
{
var linhas = new List<string>();
linhas.Add("123;abc");
linhas.Add("456;def");
linhas.Add("789;ghi");
linhas.Add("chocolate");
var words = linhas.Where(GetTheStringIWant);
}
private bool GetTheStringIWant(string s)
{
var parts = s.Split(';');
// Do a lot of other operations that take a few lines.
return parts.Length > 1 ? parts[1] == "def" : false;
}

How to find the nearest string in a List in LINQ?

If I want to find the exact match or the next nearest for a string.
Using SQL, I can do :
SELECT TOP 1 *
FROM table
WHERE Code >= @searchcode
ORDER BY Code
How might I achieve this using LINQ and a List of the records?
I was expecting to be able to do something like:
var find = ListDS.Where(c => c.Code >= searchcode).First();
but you can't compare strings that way.
Note that Code is an arbitrary string: letters, numbers, symbols, whatever.
Nearest means if you have a list containing "England", "France", "Spain", and you search for "France" then you get "France". If you search for "Germany" you get "Spain".
Here is some simple code that may help you:
List<string> ls = new List<string>();
ls.Add("ddd");
ls.Add("adb");
var vv = from p in ls where p.StartsWith("a") select p;
This selects all elements that start with the string "a".
If Code is an int this might work:
var find = ListDS.Where(c => c.Code >= searchcode).OrderBy(c => c.Code).First();
otherwise you need to convert it to one:
int code = int.Parse(searchcode);
var find = ListDS.Where(c => Convert.ToInt32(c.Code) >= code).OrderBy(c => Convert.ToInt32(c.Code)).First();
Try this solution:
class Something
{
public string Code;
public Something(string code)
{
this.Code = code;
}
}
class Program
{
static void Main(string[] args)
{
List<Something> ListDS = new List<Something>();
ListDS.Add(new Something("test1"));
ListDS.Add(new Something("searchword1"));
ListDS.Add(new Something("test2"));
ListDS.Add(new Something("searchword2"));
string searchcode = "searchword";
var find = ListDS.First(x => x.Code.Contains(searchcode));
Console.WriteLine(find.Code);
Console.ReadKey();
}
}
I replaced your >= with .Contains. You can also add the action into First, no need for Where.
It will not find the "nearest" item, just the first one containing your search term.
You can compare strings in C#; CompareTo uses alphabetical order:
var find = ListDS.Where(c => c.Code.CompareTo(searchcode) >= 0)
.OrderBy(c => c.Code) // order by Code so that First() returns the closest match
.First();
See the CompareTo docs.
Note that with this method strings are compared character by character, not numerically, so "10" sorts before "2".
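To make the "exact match or next nearest" behaviour concrete, here is a minimal sketch using the country names from the question. It assumes ordinal (byte-wise, culture-independent) ordering is acceptable, which is a choice made for this example and not something the question specifies; the names NearestSketch and FindNearest are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NearestSketch
{
    // Hypothetical helper: returns the smallest value that is >= searchCode
    // in ordinal order, or null when every value sorts before searchCode.
    public static string FindNearest(IEnumerable<string> codes, string searchCode)
    {
        return codes
            .Where(c => string.CompareOrdinal(c, searchCode) >= 0)
            .OrderBy(c => c, StringComparer.Ordinal)
            .FirstOrDefault();
    }

    public static void Main()
    {
        var countries = new List<string> { "Spain", "England", "France" };
        Console.WriteLine(FindNearest(countries, "France"));  // France
        Console.WriteLine(FindNearest(countries, "Germany")); // Spain
        Console.WriteLine(FindNearest(countries, "Sweden") ?? "(no match)"); // (no match)
    }
}
```

FirstOrDefault is used instead of First so that a search term beyond the last entry yields null instead of throwing an InvalidOperationException.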

Performing operations on a collection of string arrays

I am reading a file that contains rows like
pathName; additionalString; maybeSomeNumbers
I read it using
var lines = File.ReadAllLines(fileListFile);
var fileListEntries = from line in lines
where !line.StartsWith("#")
select line.Split(';').ToArray();
This works well so far. However I would like to change the drive letter in the pathName. I could convert fileListEntries to an array and loop across elements [i][0], but is there a way that I could do this operation on the collection directly?
Use the LINQ extension method syntax in order to be able to use code blocks { ... } in the lambda expressions. If you do so, you have to include an explicit return-statement.
var fileListEntries = lines
.Where(l => !l.StartsWith("#"))
.Select(l => {
string[] columns = l.Split(';');
if (Path.IsPathRooted(columns[0])) {
string root = Path.GetPathRoot(columns[0]);
columns[0] = Path.Combine(@"X:\", columns[0].Substring(root.Length));
}
return columns;
})
.ToArray();
I think you can do it inline with LINQ.
File.ReadAllLines() returns a string array, so you can rewrite each line inside the query before splitting it. Calling Replace() with the first character would replace every occurrence of that character in the line, so this version swaps only the first character:
var replace = "The string to replace the drive letter";
var lines = File.ReadAllLines(fileListFile);
var fileListEntries = from line in lines
where !line.StartsWith("#")
select (replace + line.Substring(1)).Split(';');
You could just call a method in your select that modifies the text in the manner that you would like.
static void Main(string[] args)
{
var fileListEntries = from line in lines
where !(line.StartsWith("#"))
select ( ModifyString(line));
}
private static string[] ModifyString(string line)
{
string[] elements = line.Split(';');
elements[0] = "modifiedString";
return elements;
}
lines.Where(l => !l.StartsWith("#"))
.Select(l => string.Concat(driveLetter, l.Substring(1)))
.Select(l => l.Split(';'));

Is there a cleaner way to split delimited text into data structures?

I have this code:
private IEnumerable<FindReplacePair> ConstructFindReplacePairs(string inputFilePath)
{
var arrays = from line in File.ReadAllLines(Path.GetFullPath(inputFilePath))
select line.Split('|');
var pairs = from array in arrays
select new FindReplacePair { Find = array[0], Replace = array[1] };
return pairs;
}
I'm wondering if there is a clean linq syntax to do this operation in only one query, because it feels like there should be.
I tried chaining the from clauses (a SelectMany), but it splits up the data too much and I could not get to the separate arrays to select from (instead I got individual strings one at a time).
IEnumerable<FindReplacePair> ConstructFindReplacePairs(string inputFilePath)
{
return File.ReadAllLines(Path.GetFullPath(inputFilePath))
.Select(line => line.Split('|'))
.Select(array => new FindReplacePair {
Find = array[0],
Replace = array[1]
});
}
OR
IEnumerable<FindReplacePair> ConstructFindReplacePairs(string inputFilePath)
{
return from line in File.ReadAllLines(Path.GetFullPath(inputFilePath))
let array = line.Split('|')
select new FindReplacePair {
Find = array[0], Replace = array[1]
};
}
You can also add a where condition to check that the array has more than one element.
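As a rough illustration of that where condition, the sketch below filters out lines that do not contain a "|" before building the pairs. The FindReplacePair class shown here is an assumed shape (two string properties), since the question does not include its definition, and the input is an in-memory list instead of a file so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the type used in the question; the real class may differ.
public class FindReplacePair
{
    public string Find { get; set; }
    public string Replace { get; set; }
}

public static class PairParserSketch
{
    // Hypothetical helper: skips lines that do not split into at least two parts.
    public static List<FindReplacePair> Parse(IEnumerable<string> lines)
    {
        return (from line in lines
                let array = line.Split('|')
                where array.Length > 1
                select new FindReplacePair { Find = array[0], Replace = array[1] })
               .ToList();
    }

    public static void Main()
    {
        var lines = new[] { "foo|bar", "no delimiter here", "a|b" };
        foreach (var pair in Parse(lines))
            Console.WriteLine($"{pair.Find} -> {pair.Replace}");
        // foo -> bar
        // a -> b
    }
}
```

Without the where clause, the second line would throw an IndexOutOfRangeException when array[1] is read.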
Not sure if this is cleaner, just a little bit shorter.
IEnumerable<FindReplacePair> allFindReplacePairs = File.ReadLines(inputFilePath)
.Select(l => new FindReplacePair { Find = l.Split('|')[0], Replace = l.Split('|')[1] });
Note that I'm using File.ReadLines, which does not need to read all lines into memory first. It works like a StreamReader.
When it comes down to prettifying LINQ, I usually write out simple loop and Resharper will suggest a better LINQ optimisation, e.g.
foreach (var split in File.ReadAllLines(inputFilePath).Select(l => l.Split('|')))
yield return new FindReplacePair { Find = split[0], Replace = split[1] };
R# converts it to
return File.ReadAllLines(inputFilePath).Select(l => l.Split('|')).Select(split => new FindReplacePair { Find = split[0], Replace = split[1] });
That said you might as well use builtin type, e.g. .ToDictionary(l => l[0], l => l[1]) or add a method on FindReplacePair, i.e.
return File.ReadAllLines(inputFilePath).Select(l => l.Split('|')).Select(FindReplacePair.Create);
public static FindReplacePair Create(string[] split)
{
return new FindReplacePair { Find = split.First(), Replace = split.Last() };
}

Determine if string appears more than once in string array (C#)

I have an array of strings, f.e.
string [] letters = { "a", "a", "b", "c" };
I need to find a way to determine if any string in the array appears more than once.
I thought the best way is to make a new string-array without the string in question and to use Contains,
foreach (string letter in letters)
{
string [] otherLetters = //?
if (otherLetters.Contains(letter))
{
//etc.
}
}
but I cannot figure out how.
If anyone has a solution for this or a better approach, please answer.
The easiest way is to use GroupBy:
var lettersWithMultipleOccurences = letters.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(g => g.Key);
This will first group your array using the letters as keys. It then returns only those groups with multiple entries and returns the key of these groups. As a result, you will have an IEnumerable<string> containing all letters that occur more than once in the original array. In your sample, this is only "a".
Beware: because LINQ uses deferred execution, enumerating lettersWithMultipleOccurences multiple times will perform the grouping and filtering multiple times. To avoid this, call ToList() on the result:
var lettersWithMultipleOccurences = letters.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(g => g.Key)
.ToList();
lettersWithMultipleOccurences will now be of type List<string>.
You can use the LINQ extension methods:
if (letters.Distinct().Count() == letters.Count()) {
// no duplicates
}
Enumerable.Distinct removes duplicates. Thus, letters.Distinct() would return three elements in your example.
Create a HashSet from the array and compare their sizes; if the set is smaller, the array contains duplicates:
var set = new HashSet<string>(letters);
bool hasDoubleLetters = set.Count != letters.Length;
A HashSet will give you good performance:
HashSet<string> hs = new HashSet<string>();
foreach (string letter in letters)
{
if (hs.Contains(letter))
{
// etc. (letter occurs more than once)
}
else
{
hs.Add(letter);
}
}
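The loop above can be shortened a little, because HashSet<T>.Add returns false when the item is already present. The following is a minimal sketch of that idea; the names DuplicateSketch and HasDuplicates are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateSketch
{
    // Hypothetical helper: true as soon as any item is seen a second time.
    public static bool HasDuplicates(IEnumerable<string> items)
    {
        var seen = new HashSet<string>();
        // Add returns false for an item that is already in the set,
        // and Any stops enumerating at the first such item.
        return items.Any(item => !seen.Add(item));
    }

    public static void Main()
    {
        string[] letters = { "a", "a", "b", "c" };
        Console.WriteLine(HasDuplicates(letters));               // True
        Console.WriteLine(HasDuplicates(new[] { "a", "b" }));    // False
    }
}
```

Unlike comparing Distinct().Count() with the array length, this stops at the first duplicate instead of always walking the whole array.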
