LINQ: splitting a string into substrings sequentially, using a delimiter

I have an array of strings, e.g.:
string[] list = { "900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org" };
string delimiter = ".";
foreach (var item in list)
{
    var result = item.Split(' ', '.').Aggregate((a, b) => a + delimiter + b);
    Console.WriteLine(result);
}
My result now:
900.google.mail.com
50.yahoo.com
1.intel.mail.com
5.wiki.org
I want to split every string into substrings like this:
1- 900
2- google.mail.com
3- mail.com
4- com
etc...
How can I do this?
Thanks in advance.

public string[] GetParentDomains(string[] input) {
    return input
        .SelectMany(s => s.Split(' '))
        .SelectMany(s => {
            // Emit the token itself, then each shorter dot-separated suffix.
            string[] splitDomain = s.Split('.');
            return Enumerable.Range(0, splitDomain.Length)
                .Select(counter =>
                    String.Join(".", splitDomain.Skip(counter)))
                .ToArray();
        })
        .ToArray();
}
Output
900
google.mail.com
mail.com
com
50
yahoo.com
com
1
intel.mail.com
mail.com
com
5
wiki.org
org
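
To see the suffix step on its own, here is a minimal sketch for a single token. DomainSplitter and DomainSuffixes are hypothetical names used only for illustration; they are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainSplitter
{
    // Hypothetical helper (not from the original answer): yields the dotted
    // name itself, then each shorter suffix, longest first.
    public static IEnumerable<string> DomainSuffixes(string domain)
    {
        string[] labels = domain.Split('.');
        // Skipping 0, 1, 2, ... leading labels produces each suffix in turn.
        return Enumerable.Range(0, labels.Length)
                         .Select(skip => string.Join(".", labels.Skip(skip)));
    }
}
```

Calling DomainSplitter.DomainSuffixes("google.mail.com") yields google.mail.com, mail.com and com, while a token without dots, such as 900, comes back unchanged.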

Related

Count occurrences of a string in a list and display them in the console

I'm creating a log parser. Right now I'm able to go through all the files in a folder line by line and extract the substring I want, which is the value after "fct=". I do all of that with Regex and put the results in a List.
Now I want to count the occurrences of every string in my list and display them.
I'm using GroupBy, but when I display the result every count is 1.
Actual:
720 1x
720 1x
710 1x
And it should be:
720 2x
710 1x
I found that the problem is that I read my file line by line, so unless the same "fct=" value appears twice on one line it is never counted as 2; every line where it appears reports a count of 1.
So I need a way to count over my whole list rather than over each line of the file.
I'm a real beginner and not sure how to do this, so any tips would be appreciated.
Here's a sample of the log data:
<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC</dat></logurl>
<dat>XN=KEY,CN=RTU FCT=4515</dat>LBZ=test.sqi</logurl>
<dat>XN=KEY,CN=RT</dat>FCT=10019</logurl>
I want to display:
FCT=10019 2x
FCT=4515 1x
My Code:
class Program
{
    static void Main(string[] args)
    {
        int counter = 0;
        string[] dirs = Directory.GetFiles(@"C:/LogParser/LogParserV1", "*.txt");
        StreamWriter sw = new StreamWriter("C:/LogParser/LogParserV1/test.txt");
        char[] delimiters = { '<', ',', '&', ':', ' ', '\\', '\'' };
        string patternfct = "(?<=FCT=)[0-9]*";
        foreach (string fileName in dirs)
        {
            StreamReader sr = new StreamReader(fileName);
            {
                String lineRead;
                while ((lineRead = sr.ReadLine()) != null)
                {
                    // Find every value that follows "fct=" on this line
                    var listfct = Regex.Matches(lineRead, patternfct,
                        RegexOptions.IgnoreCase).Cast<Match>().Select(x => x.Value).ToList();
                    var fctGroups = listfct.GroupBy(i => i);
                    foreach (var grp in fctGroups)
                    {
                        var fct = grp.Key;
                        var total = grp.Count();
                        System.Console.WriteLine("fct=" + fct + " " + "Total=" + total);
                    }
                    counter++;
                }
                System.Console.WriteLine(fileName);
                sr.Close();
                sw.Close();
            }
        }
        // Suspend the screen.
        System.Console.ReadLine();
    }
}

You can try querying the data with the help of LINQ:
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
...
Regex regex = new Regex("(?<=FCT=)[0-9]*", RegexOptions.IgnoreCase);
var records = Directory
    .EnumerateFiles(@"C:/LogParser/LogParserV1", "*.txt")
    .SelectMany(file => File.ReadLines(file))
    .SelectMany(line => regex
        .Matches(line)
        .Cast<Match>()
        .Select(match => match.Value))
    .GroupBy(number => number)
    .Select(group => $"FCT={group.Key} {group.Count()}x");
foreach (string record in records)
    Console.WriteLine(record);
Demo: we can't mimic the directory and files here, so I've removed
Directory
    .EnumerateFiles(@"C:/LogParser/LogParserV1", "*.txt")
    .SelectMany(file => File.ReadLines(file))
and added testLines instead:
string[] testLines = new string[] {
    "<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC</dat></logurl>",
    "<dat>XN=KEY,CN=RTU FCT=4515</dat>LBZ=test.sqi</logurl>",
    "<dat>XN=KEY,CN=RT</dat>FCT=10019</logurl>",
};
Regex regex = new Regex("(?<=FCT=)[0-9]*", RegexOptions.IgnoreCase);
var records = testLines
    .SelectMany(line => regex
        .Matches(line)
        .Cast<Match>()
        .Select(match => match.Value))
    .GroupBy(number => number)
    .Select(group => $"FCT={group.Key} {group.Count()}x");
foreach (string record in records)
    Console.WriteLine(record);
Outcome:
FCT=10019 2x
FCT=4515 1x
Edit: If you want to include the file name in the records, you can use anonymous objects:
var records = Directory
    .EnumerateFiles(@"C:/LogParser/LogParserV1", "*.txt")
    .SelectMany(file => File
        .ReadLines(file)
        .Select(line => new {
            file = file,
            line = line,
        }))
    .SelectMany(item => regex
        .Matches(item.line)
        .Cast<Match>()
        .Select(match => new {
            file = item.file,
            number = match.Value
        }))
    .GroupBy(item => new {
        file = item.file,
        number = item.number
    })
    .OrderBy(group => group.Key.file)
    .ThenBy(group => group.Key.number)
    .Select(group => $"{group.Key.file} has FCT={group.Key.number} {group.Count()}x");
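
If LINQ feels like too much at first, the same idea (count across all lines, not per line) can be sketched with a plain loop and one shared dictionary. FctCounter and Count are hypothetical names, and the pattern uses + rather than * as this sketch's own choice, so that an empty value after "FCT=" is not counted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FctCounter
{
    // Same look-behind idea as the answer above; "+" instead of "*" is an
    // assumption of this sketch, not part of the original pattern.
    private static readonly Regex FctPattern =
        new Regex("(?<=FCT=)[0-9]+", RegexOptions.IgnoreCase);

    // Hypothetical helper: one dictionary is shared by every line, so a value
    // seen on two different lines ends up with a count of 2.
    public static Dictionary<string, int> Count(IEnumerable<string> lines)
    {
        var totals = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            foreach (Match match in FctPattern.Matches(line))
            {
                totals.TryGetValue(match.Value, out int seen); // seen stays 0 if the key is new
                totals[match.Value] = seen + 1;
            }
        }
        return totals;
    }
}
```

The key point is where the dictionary lives: it is created once, outside the loop over lines, which is exactly what the per-line GroupBy in the question was missing.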

C# RegEx Pattern to Split a String into 2 Character Substring

I am trying to figure out a regex to use to split a string into 2 character substring.
Let's say we have the following string:
string str = "Idno1";
string pattern = @"\w{2}";
Using the pattern above will get me "Id" and "no", but it will skip the "1" since it doesn't match the pattern. I would like the following results:
string str = "Idno1"; // ==> "Id" "no" "1 "
string str2 = "Id n o 2"; // ==> "Id", " n", " o", " 2"

LINQ makes this code easy, and the version below works in a fiddle.
The idea: with chunkSize = 2, as you require, treat the characters at indexes 0, 2, 4, ... as chunk starts, then take chunkSize characters from each start and join them into a string.
public static IEnumerable<string> ProperFormat(string s)
{
    var chunkSize = 2;
    return s.Where((x, i) => i % chunkSize == 0)                        // one element per chunk
            .Select((x, i) => s.Skip(i * chunkSize).Take(chunkSize))   // characters of chunk i
            .Select(x => string.Join("", x));
}
With your inputs, I get this output:
Idno1 -->
Id
no
1
Id n o 2 -->
Id
n
o
2

LINQ really is the better fit in this case. This method splits a string into chunks of arbitrary size:
public static IEnumerable<string> SplitInChunks(string s, int size = 2)
{
    return s.Select((c, i) => new { c, id = i / size })   // id is the chunk number
            .GroupBy(x => x.id, x => x.c)
            .Select(g => new string(g.ToArray()));
}
But if you are bound to regex, use this code:
public static IEnumerable<string> SplitInChunksWithRegex(string s, int size = 2)
{
    var regex = new Regex($".{{1,{size}}}");
    return regex.Matches(s).Cast<Match>().Select(m => m.Value);
}
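
For comparison, the same chunking can be sketched without LINQ or regex, using a loop over Substring. Chunker and SplitInChunksLoop are hypothetical names for this illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    // Hypothetical helper: cuts s into pieces of at most `size` characters.
    public static List<string> SplitInChunksLoop(string s, int size = 2)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var chunks = new List<string>();
        for (int start = 0; start < s.Length; start += size)
        {
            // The last chunk may be shorter than size.
            int length = Math.Min(size, s.Length - start);
            chunks.Add(s.Substring(start, length));
        }
        return chunks;
    }
}
```

One difference worth knowing: in the regex version, . does not match a newline by default, so a string containing line breaks would be chunked differently there, whereas this loop treats every character the same.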

LINQ search in list

I have this list:
var allPlaces = new[]
{
    new { Name = "Red apple", OtherKnownNames = "Green" },
    new { Name = "Orange", OtherKnownNames = "" },
    new { Name = "Banana", OtherKnownNames = "the" },
}.ToList();
My query is "the apple".
My code does not return the first and third items. The query has 2 words separated by a space, and I want an item to be returned if any word in the query starts with a word from its Name or OtherKnownNames.
var query = "the apple";
var queryParts = query.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
var filteredList =
    allPlaces
        .Where(p =>
            p.Name
                .Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
                .Any(pp => queryParts.Any(qp => qp.StartsWith(pp)))
            || p.OtherKnownNames
                .Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
                .Any(pp => queryParts.Any(qp => qp.StartsWith(pp))))
        .ToList();

This assumes you want to ignore case and accept names that match the beginning of query words (based on your example with StartsWith).
Use an extension method to make splitting without empty entries nicer:
public static string[] SplitNoEmpty(this string s, params char[] seps) => s.Split(seps, StringSplitOptions.RemoveEmptyEntries);
Then you can simply split up the query string and search for matches:
var qwords = query.SplitNoEmpty(' ');
var ans = allPlaces
    .Where(p => qwords.Any(qw => (p.Name + " " + p.OtherKnownNames).SplitNoEmpty(' ')
        .Any(nw => qw.StartsWith(nw, StringComparison.CurrentCultureIgnoreCase))))
    .ToList();
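
Here is a self-contained sketch of the same matching rule that can be run against the sample data. PlaceSearch and Find are hypothetical names; tuples stand in for the anonymous objects, and OrdinalIgnoreCase is this sketch's own choice of comparison rather than the CurrentCultureIgnoreCase used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlaceSearch
{
    // Splits on spaces and drops empty entries.
    private static string[] Words(string s) =>
        s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Hypothetical helper: returns the Name of every place for which some
    // query word starts with a word taken from Name or OtherKnownNames.
    public static List<string> Find(
        IEnumerable<(string Name, string OtherKnownNames)> places, string query)
    {
        string[] queryWords = Words(query);
        return places
            .Where(p => Words(p.Name + " " + p.OtherKnownNames)
                .Any(w => queryWords.Any(q =>
                    q.StartsWith(w, StringComparison.OrdinalIgnoreCase))))
            .Select(p => p.Name)
            .ToList();
    }
}
```

With the three sample places and the query "the apple", this returns "Red apple" (matched by apple) and "Banana" (matched by the), and skips "Orange".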

Sorting a string based on prefixes

If you are given an array with random prefixes, like this:
DOG_BOB
CAT_ROB
DOG_DANNY
MOUSE_MICKEY
DOG_STEVE
HORSE_NEIGH
CAT_RUDE
HORSE_BOO
MOUSE_STUPID
How would I go about sorting this so that I have 4 different arrays/lists of strings?
The end result should give me 4 string arrays or lists with:
DOG_BOB,DOG_DANNY,DOG_STEVE <-- Array 1
HORSE_NEIGH, HORSE_BOO <-- Array 2
MOUSE_MICKEY, MOUSE_STUPID <-- Array 3
CAT_RUDE, CAT_ROB <-- Array 4
Sorry about the names, I just made them up.
var fieldNames = typeof(animals).GetFields()
    .Select(field => field.Name)
    .ToList();
List<string> cats = new List<string>();
List<string> dogs = new List<string>();
List<string> mice = new List<string>();
List<string> horse = new List<string>();
foreach (var n in fieldNames)
{
    var fieldValues = typeof(animals).GetField(n).GetValue(n);
    // Here's what I'm trying to do, with if statements
    if (n.ToString().ToLower().Contains("horse"))
    {
    }
}
So I need them split into string arrays or string lists, NOT just strings.

string[] strings = new string[] {
    "DOG_BOB",
    "CAT_ROB",
    "DOG_DANNY",
    "MOUSE_MICKEY",
    "DOG_STEVE",
    "HORSE_NEIGH",
    "CAT_RUDE",
    "HORSE_BOO",
    "MOUSE_STUPID"};
string[] results = strings.GroupBy(s => s.Split('_')[0])
    .Select(g => String.Join(",", g))
    .ToArray();
Or maybe something like this:
List<List<string>> res = strings.ToLookup(s => s.Split('_')[0], s => s)
    .Select(g => g.ToList())
    .ToList();

var groups = fieldNames.GroupBy(n => n.Split('_')[0]);
Usage:
foreach (var group in groups)
{
    // group.Key is the prefix (DOG, HORSE, CAT, etc.)
    foreach (var name in group)
    {
        // every name that shares this prefix
    }
}

foreach (String s in strings)
{
    if (s.StartsWith("CAT_"))
        cats.Add(s);
    else if (s.StartsWith("HORSE_"))
        horses.Add(s);
    // ...
}
Or:
foreach (String s in strings)
{
    String[] split = s.Split(new Char[] { '_' });
    if (split[0].Equals("CAT"))
        cats.Add(s);
    else if (split[0].Equals("HORSE"))
        horses.Add(s);
    // ...
}
But I would prefer the first one.

Algorithmically, I'd do the following:
1. Parse out all unique prefixes, using "_" as your delimiter.
2. Loop through your list of prefixes.
2a. Retrieve any values that have your prefix (loop/find/regex, depending on structure).
2b. Place the retrieved values in a List.
2c. Sort the list.
3. Output your results, or do what you need with your collections.
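
The steps above could be sketched roughly as follows. PrefixGrouper and GroupByPrefix are hypothetical names, and the sketch assumes every name contains an underscore; a name without one would be left out of every group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixGrouper
{
    // Hypothetical helper following the outlined steps.
    public static Dictionary<string, List<string>> GroupByPrefix(IEnumerable<string> names)
    {
        List<string> all = names.ToList();

        // Step 1: collect the unique prefixes (text before the first "_").
        List<string> prefixes = all.Select(n => n.Split('_')[0]).Distinct().ToList();

        var result = new Dictionary<string, List<string>>();

        // Step 2: for each prefix, retrieve its values, store them, sort them.
        foreach (string prefix in prefixes)
        {
            // Matching on prefix + "_" keeps DOG from also matching DOGGY_...
            List<string> matches = all
                .Where(n => n.StartsWith(prefix + "_", StringComparison.Ordinal))
                .ToList();
            matches.Sort(StringComparer.Ordinal);
            result[prefix] = matches;
        }

        // Step 3 is left to the caller: print or otherwise use the lists.
        return result;
    }
}
```

This scans the whole input once per prefix, which is fine for short lists; a single GroupBy pass, as in the other answers, avoids the repeated scans.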

You can sort the list up front and then group by prefix:
string[] input = new string[] {"DOG_BOB","CAT_ROB","DOG_DANNY","MOUSE_MICKEY","DOG_STEVE","HORSE_NEIGH","CAT_RUDE","HORSE_BOO","MOUSE_STUPID"};
string[] sortedInput = input.OrderBy(x => x).ToArray();
var distinctSortedPrefixes = sortedInput.Select(item => item.Split('_')[0]).Distinct().ToArray();
Dictionary<string, string[]> orderedByPrefix = new Dictionary<string, string[]>();
for (int prefixIndex = 0; prefixIndex < distinctSortedPrefixes.Length; prefixIndex++)
{
    string prefix = distinctSortedPrefixes[prefixIndex];
    // Filter the sorted array so each group stays sorted; include the "_" so
    // that one prefix cannot match a longer prefix that merely starts with it.
    var group = sortedInput.Where(item => item.StartsWith(prefix + "_")).ToArray();
    orderedByPrefix.Add(prefix, group);
}

With LINQ, using something like
names.GroupBy(s => s.Substring(0, s.IndexOf("_")))  // group by prefix
     .Select(g => string.Join(",", g))              // join each group with commas
     .ToList();                                     // take the results
See it in action (some extra .ToArray() calls included for .NET 3.0 compatibility)

This LINQ expression does what you want:
var result = data.GroupBy(s => s.Split('_')[0])
    .Select(group => String.Join(", ", group))
    .ToList();
For a list of lists of strings, use this expression:
var result = data.GroupBy(s => s.Split('_')[0])
    .Select(group => group.ToList())
    .ToList();

Mapping numbers to letters

I had an interview question asking this:
A text file has the following lines:
1: A C D
4: A B
5: D F
7: A E
9: B C
Every line has a unique integer followed by a colon and one or more letters. These letters are delimited by spaces (one or more).
#2 Write a short program in the language of your choice that outputs a sorted list like:
A: 1 4 7
B: 4 9
C: 1 9
D: 1 5
E: 7
F: 5
I'm not looking for someone to solve it, but I always get confused by problems like this. I'd like to do it in C# and was wondering whether I should store each line in a 2D array. What is the best way to handle this? After storing it, how do I relist each line by letter rather than by number?
Just looking for pointers here.

You can solve the problem by creating a Lookup mapping letters to a collection of numbers. You can use the extension method ToLookup to create a Lookup.
Warning: Spoilers ahead
Using LINQ you can do it like this (breaks on invalid input):
var text = @"1: A C D
4: A B
5: D F
7: A E
9: B C";
var lookup = text
    .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(
        line => new {
            Number = Int32.Parse(line.Split(':').First()),
            Letters = line.Split(':').Skip(1).First().Split(
                new[] {' '}, StringSplitOptions.RemoveEmptyEntries
            )
        }
    )
    .SelectMany(x => x.Letters, (x, letter) => new { x.Number, Letter = letter })
    .OrderBy(x => x.Letter)
    .ToLookup(x => x.Letter, x => x.Number);
foreach (var item in lookup)
    Console.WriteLine(item.Key + ": " + String.Join(" ", item.ToArray()));

If you are familiar with LINQ, the code below can give you what you are looking for:
var result = File.ReadAllLines("inFile").SelectMany(line =>
    {
        // RemoveEmptyEntries copes with letters separated by more than one space
        var ar = line.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        var num = int.Parse(ar[0].Split(":".ToCharArray())[0]);
        return ar.Skip(1).Select(s => new Tuple<string, int>(s, num));
    })
    .GroupBy(t => t.Item1)
    .OrderBy(g => g.Key) // sorted by letter, as the expected output shows
    .Select(g => g.Key + ": " + g.Select(t => t.Item2.ToString()).Aggregate((a, b) => a + " " + b));
File.WriteAllLines("outFile", result);

I know you said you didn't want full answers, but this kind of thing is fun. It looks like others have come up with similar solutions, but here's another way to represent it - in "one line" of code (but lots of brackets!) :)
var data = @"1: A C D
4: A B
5: D F
7: A E
9: B C";
Console.WriteLine(
    String.Join(
        Environment.NewLine,
        (from line in data.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
         let lineParts = line.Split(new[] { ':', ' ' }, StringSplitOptions.RemoveEmptyEntries)
         from letter in lineParts.Skip(1)
         select new { Number = lineParts[0], Letter = letter })
        .ToLookup(l => l.Letter, l => l.Number)
        .OrderBy(l => l.Key)
        .Select(l => String.Format("{0}: {1}", l.Key, String.Join(" ", l)))));
Oh, and would I write code like that in production? Probably not, but it's fun in an exercise like this!

The thing that will help you solve this is an IDictionary<char, IList<int>>.
Yet Another Linq Masturbatory Implementation ("Look Ma! No loops!")
using System;
using System.IO;
using System.Linq;
public static class Program
{
    public static void Main(string[] args)
    {
        File.ReadAllLines("input.txt")
            .Select(line =>
            {
                var split = line.Split(":".ToCharArray(), 2);
                // Keep the whole number, so multi-digit line numbers also work
                return new { digit = split[0].Trim(),
                             chars = split[1]
                                 .Split(" \t".ToCharArray())
                                 .Select(s => s.Trim())
                                 .Where(s => !String.IsNullOrEmpty(s))
                                 .Select(s => s[0])
                           };
            })
            .SelectMany(p => p.chars.Select(ch => new { p.digit, ch }))
            .GroupBy(p => p.ch, p => p.digit)
            .ToList()
            .ForEach(g => Console.WriteLine("{0}: {1}", g.Key, string.Join(" ", g)));
    }
}
Of course you can replace GroupBy with ToLookup

I would use a Dictionary<string, List<int>>. Reading the input, I add 1 to the lists at keys A, C, D, then 4 to the lists at keys A, B, etc., so getting the result is just a lookup by letter.
Like this, in a non-esoteric way:
string inp = @"1: A C D
4: A B
5: D F
7: A E
9: B C";
Dictionary<string, List<int>> res = new Dictionary<string, List<int>>();
StringReader sr = new StringReader(inp);
string line;
while (null != (line = sr.ReadLine()))
{
    if (!string.IsNullOrEmpty(line))
    {
        // tokens[0] is the number, the remaining tokens are the letters
        string[] tokens = line.Split(": ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        int idx = int.Parse(tokens[0]);
        for (int i = 1; i < tokens.Length; ++i)
        {
            if (!res.ContainsKey(tokens[i]))
                res[tokens[i]] = new List<int>();
            res[tokens[i]].Add(idx);
        }
    }
}
res will contain the result of letter->list of numbers.

String parsing using Split(":") and Split(" ").
Then fill
Dictionary<int, List<string>>
and translate it into
Dictionary<string, List<int>>
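
A rough sketch of that two-step idea, under the assumption that each line has exactly one colon. MappingInverter, Parse and Invert are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MappingInverter
{
    // Step 1 (hypothetical helper): "1: A C D" becomes 1 -> [A, C, D].
    public static Dictionary<int, List<string>> Parse(IEnumerable<string> lines)
    {
        var byNumber = new Dictionary<int, List<string>>();
        foreach (string line in lines)
        {
            string[] halves = line.Split(':');
            if (halves.Length != 2) continue; // skip lines that do not fit the format
            int number = int.Parse(halves[0].Trim());
            byNumber[number] = halves[1]
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .ToList();
        }
        return byNumber;
    }

    // Step 2 (hypothetical helper): invert to letter -> numbers.
    // SortedDictionary keeps the letters in order; visiting the numbers in
    // ascending order keeps each list sorted as well.
    public static SortedDictionary<string, List<int>> Invert(
        Dictionary<int, List<string>> byNumber)
    {
        var byLetter = new SortedDictionary<string, List<int>>(StringComparer.Ordinal);
        foreach (var pair in byNumber.OrderBy(p => p.Key))
        {
            foreach (string letter in pair.Value)
            {
                if (!byLetter.TryGetValue(letter, out var numbers))
                {
                    numbers = new List<int>();
                    byLetter[letter] = numbers;
                }
                numbers.Add(pair.Key);
            }
        }
        return byLetter;
    }
}
```

Printing each entry as key + ": " + string.Join(" ", value) then gives lines in the requested shape. The intermediate Dictionary<int, List<string>> is not strictly needed, but it mirrors the two dictionaries described above.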

You could store the input in an IDictionary, and reverse it to produce your output.
Take a look at this question.

I see that multiple similar (loop) and not so similar (LINQ) solutions have already been posted, but since I've written this I thought I'd throw it into the mix.
static void Main(string[] args)
{
    var result = new SortedDictionary<char, List<int>>();
    var lines = System.IO.File.ReadAllLines(@"input.txt");
    foreach (var line in lines)
    {
        var split = line.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
        // split[0] looks like "1:"; drop the colon so multi-digit numbers parse too
        var lineNumber = Int32.Parse(split[0].TrimEnd(':'));
        foreach (var letter in split.Skip(1))
        {
            var key = letter[0];
            if (!result.ContainsKey(key))
            {
                result.Add(key, new List<int> { lineNumber });
            }
            else
            {
                result[key].Add(lineNumber);
            }
        }
    }
    foreach (var item in result)
    {
        Console.WriteLine(String.Format("{0}: {1}", item.Key, String.Join(" ", item.Value)));
    }
    Console.ReadKey();
}

An important part of the interview process is asking about and verifying assumptions. Although your description states the file is structured as an integer followed by letters, the example you give shows the integers in increasing order. If that's the case, you can avoid all of the LINQ craziness and implement a much more efficient solution:
var results = new Dictionary<char, List<int>>();
foreach (var line in File.ReadAllLines(@"input.txt"))
{
    var split = line.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries);
    var num = int.Parse(split[0].TrimEnd(':'));
    for (int i = 1; i < split.Length; i++)
    {
        char letter = split[i][0];
        if (!results.ContainsKey(letter))
            results[letter] = new List<int>();
        // Numbers arrive in increasing order, so each list stays sorted
        results[letter].Add(num);
    }
}
