Is there a cleaner way to split delimited text into data structures?

I have this code:
private IEnumerable<FindReplacePair> ConstructFindReplacePairs(string inputFilePath)
{
    var arrays = from line in File.ReadAllLines(Path.GetFullPath(inputFilePath))
                 select line.Split('|');
    var pairs = from array in arrays
                select new FindReplacePair { Find = array[0], Replace = array[1] };
    return pairs;
}
I'm wondering if there is a clean linq syntax to do this operation in only one query, because it feels like there should be.
I tried chaining the from clauses (a SelectMany), but it splits up the data too much and I could not get to the separate arrays to select from (instead I got individual strings one at a time).

IEnumerable<FindReplacePair> ConstructFindReplacePairs(string inputFilePath)
{
    return File.ReadAllLines(Path.GetFullPath(inputFilePath))
        .Select(line => line.Split('|'))
        .Select(array => new FindReplacePair {
            Find = array[0],
            Replace = array[1]
        });
}
OR
IEnumerable<FindReplacePair> ConstructFindReplacePairs(string inputFilePath)
{
    return from line in File.ReadAllLines(Path.GetFullPath(inputFilePath))
           let array = line.Split('|')
           select new FindReplacePair {
               Find = array[0], Replace = array[1]
           };
}
You can also add a where condition to check that the array has more than one element.
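A minimal sketch of that where condition, under some assumptions: the lines come from an in-memory array rather than a file, and a value tuple stands in for FindReplacePair so the snippet is self-contained. Neither substitution is from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for File.ReadAllLines(inputFilePath).
string[] lines = { "foo|bar", "malformed line", "a|b|c" };

// A tuple stands in for FindReplacePair (assumption, for self-containment).
List<(string Find, string Replace)> pairs =
    (from line in lines
     let array = line.Split('|')
     where array.Length > 1   // skip lines that have no delimiter
     select (Find: array[0], Replace: array[1])).ToList();

foreach (var pair in pairs)
    Console.WriteLine($"{pair.Find} -> {pair.Replace}");
```

The line without a '|' is dropped silently instead of throwing IndexOutOfRangeException; whether skipping is the right policy depends on how strict the input format should be.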

Not sure if this is cleaner, just a little bit shorter.
IEnumerable<FindReplacePair> allFindReplacePairs = File.ReadLines(inputFilePath)
    .Select(l => new FindReplacePair { Find = l.Split('|')[0], Replace = l.Split('|')[1] });
Note that I'm using File.ReadLines, which does not need to read all lines into memory first; it reads the file lazily, much like a StreamReader.

When it comes to prettifying LINQ, I usually write out a simple loop and let ReSharper suggest a better LINQ form, e.g.
foreach (var split in File.ReadAllLines(inputFilePath).Select(l => l.Split('|')))
    yield return new FindReplacePair { Find = split[0], Replace = split[1] };
R# converts it to
return File.ReadAllLines(inputFilePath)
    .Select(l => l.Split('|'))
    .Select(split => new FindReplacePair { Find = split[0], Replace = split[1] });
That said, you might as well use a built-in type, e.g. .ToDictionary(l => l[0], l => l[1]), or add a factory method on FindReplacePair, i.e.
return File.ReadAllLines(inputFilePath).Select(l => l.Split('|')).Select(FindReplacePair.Create);
public static FindReplacePair Create(string[] split)
{
    return new FindReplacePair { Find = split.First(), Replace = split.Last() };
}
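For the built-in type route, here is a rough sketch of the ToDictionary variant. The input array is a hypothetical stand-in for the file contents. Keep in mind that ToDictionary throws an ArgumentException on duplicate keys, so this only fits if every find string is unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for File.ReadAllLines(inputFilePath).
string[] lines = { "foo|bar", "baz|qux" };

// Key = text to find, value = replacement text.
Dictionary<string, string> replacements = lines
    .Select(l => l.Split('|'))
    .ToDictionary(parts => parts[0], parts => parts[1]);

Console.WriteLine(replacements["foo"]);
```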

Related

Getting a list of strings with only the first and last character from another list LINQ

From a given list of strings I need to use LINQ to generate a new sequence of strings, where each string consists of the first and last characters of the corresponding string in the original list.
Example:
stringList: new[] { "ehgrtthrehrehrehre", "fjjgoerugrjgrehg", "jgnjirgbrnigeheruwqqeughweirjewew" },
expected: new[] { "ee", "fg", "jw" });
list2 = stringList.Select(e => {e = "" + e[0] + e[e.Length - 1]; return e; }).ToList();
This is what I've tried, it works, but I need to use LINQ to solve the problem and I'm not sure how to adapt my solution.

just for the sake of completeness here is a version using Zip
var stringList = new string [] { "ehgrtthrehrehrehre", "fjjgoerugrjgrehg", "jgnjirgbrnigeheruwqqeughweirjewew" };
var result = stringList.Zip(stringList, (first, last) => $"{first.First()}{last.Last()}");

As mentioned in the comments, Select is already part of LINQ, so you can use this code:
var output = arr.Select(x => new string(new char[] { x.First(), x.Last() })).ToList();

Here you go:
var newList = stringList.Select(e => $"{e[0]}{e[e.Length - 1]}").ToList();

Approach with LINQ and String.Remove():
string[] input = new[] { "ehgrtthrehrehrehre", "fjjgoerugrjgrehg", "jgnjirgbrnigeheruwqqeughweirjewew" };
string[] result = input.Select(x => x.Remove(1, x.Length - 2)).ToArray();

How to return substring of linq results

Let's say I have a list of strings:
originalList = { "XX.one", "XX.two", "YY.three" }
I want to use linq to select and return a list with {"one", "two"}.
if I do for example
resultList = originalList.FindAll(o => o.StartsWith("XX"));
I will get resultList = { "XX.one", "XX.two" } but what I want is resultList = { "one", "two" }
Any way to solve this?
EDIT: Thanks to all who answered. I've chosen the split function of @er-mfahhgk since it does the minimum of manipulation and doesn't depend on the size of the prefix.

You can filter with Where on your desired prefix, then use Select with the Split function on the dot (.) to pick the second part, like this:
var resultList = originalList.Where(o => o.StartsWith("XX"))
    .Select(x => x.Split('.')[1])
    .ToList();
And finally your output will be,
foreach (var item in resultList)
{
Console.WriteLine(item);
}
Console.ReadLine();
Output:
one
two

result = originalList.Where(o => o.StartsWith("XX"))
    .Select(x => x.Replace("XX.", ""))
    .ToList();

You could try this:
resultList = originalList.Where(o => o.StartsWith("XX"))
    .Select(x => x.Substring(3))
    .ToList();
( edited to correct wording of Substring )
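If the hard-coded 3 is a concern, one possible variation derives the offset from the prefix itself. This is only a sketch; the prefix constant (including the dot) is an assumption of mine, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var originalList = new List<string> { "XX.one", "XX.two", "YY.three" };

// Assumed prefix, including the separator dot.
const string prefix = "XX.";

List<string> resultList = originalList
    .Where(o => o.StartsWith(prefix, StringComparison.Ordinal))
    .Select(o => o.Substring(prefix.Length))   // drop exactly the prefix
    .ToList();

Console.WriteLine(string.Join(", ", resultList));
```

Changing the prefix then needs no other edit, because the Substring offset follows prefix.Length.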

How to prevent a System.IndexOutOfRangeException in a LINQ WHERE?

I'm facing this exception when I'm using String.Split with random strings.
List<string> linhas = new List<string>();
linhas.Add("123;abc");
linhas.Add("456;def");
linhas.Add("789;ghi");
linhas.Add("chocolate");
var novas = linhas.Where(l => l.ToString().Split(';')[1]=="def");

The last string, "chocolate", doesn't contain a ";", so String.Split returns an array with a single element, "chocolate". That's why you get the exception when you try to access the second element.
You could use ElementAtOrDefault instead, which returns null (the default for string) when the index is out of range:
var novas = linhas.Where(l => l.Split(';').ElementAtOrDefault(1) == "def");
A longer approach using an anonymous type:
var novas = linhas
    .Select(l => new { Line = l, Split = l.Split(';') })
    .Where(x => x.Split.Length >= 2 && x.Split[1] == "def")
    .Select(x => x.Line);
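To see the ElementAtOrDefault behaviour in isolation, here is a small self-contained sketch using the same sample data as the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var linhas = new List<string> { "123;abc", "456;def", "789;ghi", "chocolate" };

// "chocolate" splits into a one-element array, so index 1 falls back to null.
var second = "chocolate".Split(';').ElementAtOrDefault(1);
Console.WriteLine(second == null);   // prints True

// The comparison with "def" is simply false for the null case, so no exception.
List<string> novas = linhas
    .Where(l => l.Split(';').ElementAtOrDefault(1) == "def")
    .ToList();

Console.WriteLine(string.Join(", ", novas));
```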

I'm going to expand a little on Tim's answer and show a way to do a few extra things within your LINQ queries.
You can expand the logic within your Where clause to do some additional processing, which can make your code a bit more readable. This would be good for something small:
var novas = linhas.Where(l =>
{
    var parts = l.Split(';');
    return parts.Length > 1 && parts[1] == "def";
});
If you need multiple statements, you can wrap the body of your clause within curly braces, but then you need the return keyword.
Alternatively, if you have a lot of information that would make something inline like that unreadable, you can also use a separate method within your query.
public void FindTheStringImLookingFor()
{
    var linhas = new List<string>();
    linhas.Add("123;abc");
    linhas.Add("456;def");
    linhas.Add("789;ghi");
    linhas.Add("chocolate");
    var words = linhas.Where(GetTheStringIWant);
}
private bool GetTheStringIWant(string s)
{
    var parts = s.Split(';');
    // Do a lot of other operations that take a few lines.
    return parts.Length > 1 && parts[1] == "def";
}

Performing operations on a collection of string arrays

I am reading a file that contains rows like
pathName; additionalString; maybeSomeNumbers
I read it using
var lines = File.ReadAllLines(fileListFile);
var fileListEntries = from line in lines
where !line.StartsWith("#")
select line.Split(';').ToArray();
This works well so far. However I would like to change the drive letter in the pathName. I could convert fileListEntries to an array and loop across elements [i][0], but is there a way that I could do this operation on the collection directly?

Use the LINQ extension method syntax in order to be able to use code blocks { ... } in the lambda expressions. If you do so, you have to include an explicit return-statement.
var fileListEntries = lines
    .Where(l => !l.StartsWith("#"))
    .Select(l => {
        string[] columns = l.Split(';');
        if (Path.IsPathRooted(columns[0])) {
            // Replace the original root (e.g. "C:\") with the new drive.
            string root = Path.GetPathRoot(columns[0]);
            columns[0] = Path.Combine(@"X:\", columns[0].Substring(root.Length));
        }
        return columns;
    })
    .ToArray();

I think you can do it inline with LINQ.
File.ReadAllLines() returns a string array, so you should be able to swap the first character of each line from the collection before splitting it. (Calling line.Replace(line[0], replace) would not compile with a string replacement, and it would replace every occurrence of that character, not just the drive letter.)
var replace = "The string to replace the drive letter";
var lines = File.ReadAllLines(fileListFile);
var fileListEntries = from line in lines
                      where !line.StartsWith("#")
                      select (replace + line.Substring(1)).Split(';');

You could just call a method in your select that modifies the text in the manner that you would like.
static void Main(string[] args)
{
    var fileListEntries = from line in lines
                          where !line.StartsWith("#")
                          select ModifyString(line);
}
private static string[] ModifyString(string line)
{
    string[] elements = line.Split(';');
    elements[0] = "modifiedString";
    return elements;
}

lines.Where(l => !l.StartsWith("#")).
    Select(l => string.Concat(driveLetter, l.Substring(1))).
    Select(l => l.Split(';'));

Sorting a generic list by an external sort order

I have a generic list
Simplified example
var list = new List<string>()
{
"lorem1.doc",
"lorem2.docx",
"lorem3.ppt",
"lorem4.pptx",
"lorem5.doc",
"lorem6.doc",
};
What I would like to do is to sort these items based on an external list ordering
In example
var sortList = new[] { "pptx", "ppt", "docx", "doc" };
// Or
var sortList = new List<string>() { "pptx", "ppt", "docx", "doc" };
Is there anything built-in to linq that could help me achieve this or do I have to go the foreach way?

With the list you can use IndexOf for Enumerable.OrderBy:
var sorted = list.OrderBy(s => sortList.IndexOf(Path.GetExtension(s)));
So the index of the extension in the sortList determines the priority in the other list. Unknown extensions have highest priority since their index is -1.
But you need to add a dot to the extension to get it working:
var sortList = new List<string>() { ".pptx", ".ppt", ".docx", ".doc" };
If that's not an option you have to fiddle around with Substring or Remove, for example:
var sorted = list.OrderBy(s => sortList.IndexOf(Path.GetExtension(s).Remove(0,1)));
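If unknown extensions should sort last rather than first, one option is to map the -1 from IndexOf to int.MaxValue. A sketch, where "notes.txt" is a made-up entry added only to show the unknown case:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

var sortList = new List<string> { ".pptx", ".ppt", ".docx", ".doc" };
var list = new List<string> { "lorem1.doc", "notes.txt", "lorem4.pptx", "lorem2.docx" };

List<string> sorted = list
    .OrderBy(s =>
    {
        int index = sortList.IndexOf(Path.GetExtension(s));
        // IndexOf returns -1 for unknown extensions; push those to the end.
        return index < 0 ? int.MaxValue : index;
    })
    .ToList();

Console.WriteLine(string.Join(", ", sorted));
```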

This solution will work even if some file names do not have extensions:
var sortList = new List<string>() { "pptx", "ppt", "docx", "doc" };
var list = new List<string>()
{
"lorem1.doc",
"lorem2.docx",
"lorem3.ppt",
"lorem4.pptx",
"lorem5.doc",
"lorem6.doc",
};
var result =
list.OrderBy(f => sortList.IndexOf(Path.GetExtension(f).Replace(".","")));

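You could try using the Array.IndexOf() method (with sortList declared as the array from the question, without leading dots):
var sortedList = list.OrderBy(i => Array.IndexOf(sortList, System.IO.Path.GetExtension(i).TrimStart('.'))).ToList();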

A sortDictionary would be more efficient:
var sortDictionary = new Dictionary<string, int> {
    { ".pptx", 0 },
    { ".ppt" , 1 },
    { ".docx", 2 },
    { ".doc" , 3 } };
var sortedList = list.OrderBy(i => {
    var s = Path.GetExtension(i);
    int rank;
    if (sortDictionary.TryGetValue(s, out rank))
        return rank;
    return int.MaxValue; // for unknown at end, or -1 for at start
});
This way the lookup is O(1) rather than O(# of extensions).
Also, if you have a large number of filenames and a small number of extensions, it might actually be faster to do
var sortedList = list
    .GroupBy(p => Path.GetExtension(p))
    .OrderBy(g => {
        int rank;
        if (sortDictionary.TryGetValue(g.Key, out rank))
            return rank;
        return int.MaxValue; // for unknown at end, or -1 for at start
    })
    .SelectMany(g => g);
This means the sort scales by the number of distinct extensions in the input, rather than the number of items in the input.
This also allows you to give two extensions the same priority.
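As a sketch of the shared-priority idea: give both PowerPoint extensions rank 0 and both Word extensions rank 1. The ranks and the extensionless "readme" entry are illustrative choices of mine. Since Enumerable.OrderBy is a stable sort, items with the same rank keep their original relative order.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Illustrative ranks: two extensions may share one priority.
var sortDictionary = new Dictionary<string, int>
{
    { ".pptx", 0 }, { ".ppt", 0 },
    { ".docx", 1 }, { ".doc", 1 }
};

var list = new List<string> { "lorem1.doc", "lorem2.docx", "lorem3.ppt", "lorem4.pptx", "readme" };

List<string> sorted = list
    .OrderBy(f => sortDictionary.TryGetValue(Path.GetExtension(f), out int rank)
        ? rank
        : int.MaxValue)   // unknown or missing extension goes last
    .ToList();

Console.WriteLine(string.Join(", ", sorted));
```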

Here's another way that does not use OrderBy:
var res =
sortList.SelectMany(x => list.Where(f => Path.GetExtension(f).EndsWith(x)));
Note that the complexity of this approach is O(n * m), with n = sortList.Count and m = list.Count.
The worst-case complexity of the OrderBy approach is instead O(n * m * log m), but in general it will probably be faster (since IndexOf does not always take O(n)). However, with small n and m you won't notice any difference.
For big lists the fastest way (complexity O(n + m)) could be constructing a temporary lookup, i.e.:
var lookup = list.ToLookup(x => Path.GetExtension(x).Remove(0,1));
var res = sortList.Where(x => lookup.Contains(x)).SelectMany(x => lookup[x]);
