Linq select with regex

I want to extract strings like aaa.a1 and aaa.a2 from my list. All of these strings contain "aaa.".
How can I combine Regex with Linq?
var inputList = new List<string>() { "bbb aaa.a1 bbb", "ccc aaa.a2 ccc" };
var result = inputList.Where(x => x.Contains(@"aaa.")).Select(x => x ???? ).ToList();

You may use
var inputList = new List<string>() { "bbb aaa.a1 bbb", "ccc aaa.a2 ccc" };
var result = inputList
    .Select(i => Regex.Match(i, @"\baaa\.\S+").Value)
    .Where(x => !string.IsNullOrEmpty(x))
    .ToList();
foreach (var s in result)
    Console.WriteLine(s);
Output:
aaa.a1
aaa.a2
The Regex.Match(i, @"\baaa\.\S+").Value part tries to match the following pattern in each item (Regex.Match never returns null, so no null-conditional operator is needed; a failed match has an empty Value):
\b - a word boundary
aaa\. - an aaa. substring
\S+ - 1+ non-whitespace chars.
The .Where(x => !string.IsNullOrEmpty(x)) will discard empty items that result from the items with no matching strings.
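Regex.Match returns only the first match in each item. If an item may contain several aaa.* tokens, a variant with Regex.Matches and SelectMany could collect all of them. This is a minimal sketch, written as top-level statements (C# 9 or later); the sample input with two tokens in one item is made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the first item contains two tokens, the last one none.
var inputList = new List<string> { "bbb aaa.a1 aaa.a3 bbb", "ccc aaa.a2 ccc", "no match here" };

var result = inputList
    // Regex.Matches yields every match in the item; SelectMany flattens them.
    .SelectMany(i => Regex.Matches(i, @"\baaa\.\S+").Cast<Match>())
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(string.Join(", ", result)); // aaa.a1, aaa.a3, aaa.a2
```

Items without a match simply contribute nothing, so no extra Where filter is needed here.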

You could try a slightly different solution, which keeps the whole items that contain a match instead of extracting the matched substrings:
var result = inputList
    .Where(i => Regex.IsMatch(i, @"\baaa\.[a-z0-9]+"))
    // or even
    // .Where(i => Regex.IsMatch(i, @"\ba+\.[a-z0-9]+"))
    .ToList();

Related

Extract values from a string into arrays

I have a string like this:
john "is my best buddy" and he loves "strawberry juice"
I want to:
Extract texts within double-quotes into a string array array1
Split texts outside of double-quotes by spaces and then insert them into another string array (array2).
Output:
array1[0]: is my best buddy
array1[1]: strawberry juice
array2[0]: john
array2[1]: and
array2[2]: he
array2[3]: loves
Any help is appreciated.

Clearly, this is a call for Regular Expressions:
var str = @"john ""is my best buddy"" and he loves ""strawberry juice""";
// Matches either a quoted phrase (captured without the quotes) or a single word
var regex = new Regex("(\"(?'quoted'[^\"]+)\")|(?'word'\\w+)",
    RegexOptions.Singleline | RegexOptions.Compiled);
var matches = regex.Matches(str);
// Values of the "quoted" group: texts inside double quotes
var quotes = matches.Cast<Match>()
    .SelectMany(m => m.Groups.Cast<Group>())
    .Where(g => g.Name == "quoted" && g.Success)
    .Select(g => g.Value)
    .ToArray();
// Values of the "word" group: words outside double quotes
var words = matches.Cast<Match>()
    .SelectMany(m => m.Groups.Cast<Group>())
    .Where(g => g.Name == "word" && g.Success)
    .Select(g => g.Value)
    .ToArray();

The most common word in spaceless string

I have a very long string of text that is many words separated by camelCase like so:
AedeagalAedilityAedoeagiAefaldnessAegeriidaeAeginaAeipathyAeneolithicAeolididaeAeonialAerialityAerinessAerobia
I need to find the most common word and the number of times it has been used. I don't know how to do this because there are no spaces and I am new to C#.
I have tried many methods but none seem to work. I'd be very grateful for any advice.
I have a github repo with the file being downloaded and a few tests already done here: https://github.com/Imstupidpleasehelp/C-code-test
Thank you.

You can try querying the string with the help of regular expressions and Linq:
string source = ...
var result = Regex
    .Matches(source, "[A-Z][a-z]*")
    .Cast<Match>()
    .Select(match => match.Value)
    .GroupBy(word => word)
    .Select(group => (word : group.Key, count : group.Count()))
    .OrderByDescending(pair => pair.count)
    .First();
Console.Write($"{result.word} appears {result.count} time(s)");

string[] split = Regex.Split(exampleString, "(?<=[A-Za-z])(?=[A-Z][a-z])");
var result = split.GroupBy(s => s)
    .OrderByDescending(g => g.Count())
    .Select(g => new { Word = g.Key, Occurrences = g.Count() });
result will contain pairs of (Word, Occurrences) for all words, most frequent first.
If you want just the first one (the one with the most occurrences) use
var result = split.GroupBy(s => s)
    .OrderByDescending(g => g.Count())
    .Select(g => new { Word = g.Key, Occurrences = g.Count() })
    .First();
Keep in mind that two or more words can have the same number of occurrences, in which case First() gives you only one of them.

A non-linq approach using for loop and IsUpper to separate the words.
string data = "AedeagalAedilityAedoeagiAefaldness";
var words = new List<string>();
var temp = new StringBuilder();
for(int i = 0;i < data.Length;i++)
{
temp.Append(data[i]);
if (i == data.Length-1 || char.IsUpper(data[i+1]))
{
words.Add(temp.ToString());
temp.Clear();
}
}
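The loop above only separates the words; it does not count them yet. One way to finish the job without LINQ is a Dictionary<string, int> tally. This is a minimal sketch, written as top-level statements (C# 9 or later); the sample string and the names counts, bestWord and bestCount are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sample input in which "Aedeagal" occurs twice.
string data = "AedeagalAedilityAedeagalAefaldness";

// Separate the words at each upper-case letter, as in the loop above.
var words = new List<string>();
var temp = new StringBuilder();
for (int i = 0; i < data.Length; i++)
{
    temp.Append(data[i]);
    if (i == data.Length - 1 || char.IsUpper(data[i + 1]))
    {
        words.Add(temp.ToString());
        temp.Clear();
    }
}

// Tally how often each word occurs.
var counts = new Dictionary<string, int>();
foreach (string word in words)
{
    counts.TryGetValue(word, out int current); // current stays 0 for a new word
    counts[word] = current + 1;
}

// Pick the word with the highest count (on a tie, whichever is enumerated first).
string bestWord = "";
int bestCount = 0;
foreach (var pair in counts)
{
    if (pair.Value > bestCount)
    {
        bestWord = pair.Key;
        bestCount = pair.Value;
    }
}

Console.WriteLine($"{bestWord} appears {bestCount} time(s)"); // Aedeagal appears 2 time(s)
```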

Check if String Contains Match in Enumerable.Range Filter List

I want to check if a string contains a word or number from a list and remove it from the string.
I want to use Enumerable.Range() to create the filter list and use it to filter many different strings.
I'm trying to combine two previous answers:
https://stackoverflow.com/a/49733139/6806643
https://stackoverflow.com/a/49740832/6806643
The sentence I want to filter:
This is a A05B09 hello 02 100 test
Filter
A00B00-A100B100, 01-100, 000-100, hello
Should read:
This is a test
Old Way
For Loop - Works
http://rextester.com/BJL70824
New Way
Enumerable Range List - Does not work
http://rextester.com/ZSCM64375
C#
List<List<string>> filters = Enumerable.Range(0, 101)
.SelectMany(a => Enumerable.Range(0, 101).Select(b => "A{0:00}B{1:00}"))
.Select(i => Enumerable.Range(0, 10).Select(c => string.Empty).ToList())
.SelectMany(a => Enumerable.Range(0, 101).Select(b => "{0:000}"))
.SelectMany(a => Enumerable.Range(0, 101).Select(b => "{0:00}"))
.SelectMany(a => Enumerable.Range(0, 1).Select(b => "hello"))
.ToList();
List<string> matches = new List<string>();
// Sentence
string sentence = "This is a A05B09 hello 02 100 test";
string newSentence = string.Empty;
// Find Matches
for (int i = 0; i < filters.Count; i++)
{
// Add to Matches List
if (sentence.Contains(filters[i].ToString()))
{
matches.Add(filters[i]);
}
}
// Filter Sentence
newSentence = Regex.Replace(
sentence
, @"(?<!\S)(" + string.Join("|", matches) + @")(?!\S)"
, ""
, RegexOptions.IgnoreCase
);
// Display New Sentence
Console.WriteLine(newSentence);

I think creating a list of all possible combinations is a very bad approach. You are creating huge lists which will make your process use a lot of RAM and be very slow without any good reason. Why not just create a good Regex? For example, with this expression, you get your desired string:
\b(A\d\dB\d\d|A100B100|0?\d\d|100|hello)\b\s*
That is assuming you don't want to replace stuff like A101B101 or 123.
If you want to replace those as well, the regex is a bit simpler:
\b(A\d\d\d?B\d\d\d?|\d\d\d?|hello)\b\s*
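As a rough sketch of how the first of these expressions might be applied to the sentence from the question (written as top-level statements, C# 9 or later; the variable name pattern is made up, and RegexOptions.IgnoreCase is carried over from the question's code):

```csharp
using System;
using System.Text.RegularExpressions;

string sentence = "This is a A05B09 hello 02 100 test";

// One pattern replaces the generated filter list. The trailing \s* also
// swallows the whitespace after each removed token.
string pattern = @"\b(A\d\dB\d\d|A100B100|0?\d\d|100|hello)\b\s*";

string newSentence = Regex.Replace(sentence, pattern, "", RegexOptions.IgnoreCase);

Console.WriteLine(newSentence); // This is a test
```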

This line does not seem to meet your requirements, because the format placeholders are never filled in:
.SelectMany(a => Enumerable.Range(0, 101).Select(b => "A{0:00}B{1:00}"))
Can you try this Linq instead?
List<string> filters = Enumerable.Range(0, 101)
.SelectMany(a => Enumerable.Range(0, 101).Select(b => $"A{a:00}B{b:00}"))
.Union(Enumerable.Range(0, 101).Select(b => $"{b:000}"))
.Union(Enumerable.Range(0, 101).Select(b => $"{b:00}"))
.Union(new List<string> {"hello"})
.ToList();
This version, which uses string.Format instead of string interpolation, gives the expected result on rextester:
List<string> filters = Enumerable.Range(0, 101)
.SelectMany(a => Enumerable.Range(0, 101).Select(b => string.Format("A{0:00}B{1:00}", a, b)))
.Union(Enumerable.Range(0, 101).Select(b => string.Format("{0:000}", b)))
.Union(Enumerable.Range(0, 101).Select(b => string.Format("{0:00}", b)))
.Union(new List<string> { "hello" })
.ToList();

What is elegant way to create array from list?

I have this string:
"(Id=7) OR (Id=6) OR (Id=8)"
from the string above how can I create array or list like this:
"Id=6"
"Id=7"
"Id=8"

Without using Regex but with some Linq you could write
string test = "(Id=7) OR (Id=6) OR (Id=8)";
var result = test
    .Split(new string[] { " OR " }, StringSplitOptions.None)
    .Select(x => x.Trim('(', ')'))
    .ToList();
If you also need to take into account the AND operator, or a variable number of spaces between the AND/OR and the conditions, then you could change the code to this:
string test = "(Id=7) OR (Id=6) OR (Id=8)";
var result = test
    .Split(new string[] { "OR", "AND" }, StringSplitOptions.None)
    .Select(x => x.Trim('(', ')', ' '))
    .ToList();

I suggest combining regex and LINQ powers:
var result = Regex.Matches(input, @"\(([^()]+)\)")
    .Cast<Match>()
    .Select(p => p.Groups[1].Value)
    .ToList();
The \(([^()]+)\) pattern matches every (...) substring, and Group 1 (the part inside the unescaped parentheses) is used to build the final list.

Simply grab the matches
(?<=\()[^)]*(?=\))
See demo.
https://regex101.com/r/iJ7bT6/18
string strRegex = @"(?<=\()[^)]*(?=\))";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
string strTargetString = @"(Id=7) OR (Id=6) OR (Id=8)";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}

Identifying and grouping similar items in a collection of strings

I have a collection of strings like the following:
List<string> codes = new List<string>
{
"44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
};
Each string is made up of two components separated by a full stop - a prefix code and a subcode. Some of the strings don't have sub codes.
I want to be able to combine the strings whose prefixes are the same and output them as follows, together with the other codes:
44(01,02,03,04,05,06,07,08),46,47.10
I'm stuck at the first hurdle of this, which is how to identify and group together the codes whose prefix values are the same, so that I can combine them into a single string as you can see above.

You can do:
var query = codes.Select(c =>
new
{
SplitArray = c.Split('.'), //to avoid multiple split
Value = c
})
.Select(c => new
{
Prefix = c.SplitArray.First(), //you can avoid multiple split if you split first and use it later
PostFix = c.SplitArray.Last(),
Value = c.Value,
})
.GroupBy(r => r.Prefix)
.Select(grp => new
{
Key = grp.Key,
Items = grp.Count() > 1 ? String.Join(",", grp.Select(t => t.PostFix)) : "",
Value = grp.First().Value,
});
This is how it works:
Split each item in the list on the delimiter and populate an anonymous type with the prefix, the postfix and the original value.
Then group on Prefix.
Finally, select the key of each group and join its postfix values using String.Join.
For output:
foreach (var item in query)
{
if(String.IsNullOrWhiteSpace(item.Items))
Console.WriteLine(item.Value);
else
Console.WriteLine("{0}({1})", item.Key, item.Items);
}
Output would be:
44(01,02,03,04,05,06,07,08)
46
47.10

Try this:
var result = codes.Select(x => new { SplitArr = x.Split('.'), OriginalValue = x })
    .GroupBy(x => x.SplitArr[0])
    .Select(x => new
    {
        Prefix = x.Key,
        subCode = x.Count() > 1 ?
            String.Join(",", x.Select(z => z.SplitArr[1])) : "",
        OriginalValue = x.First().OriginalValue
    });
You can print your desired output like this:
foreach (var item in result)
{
    // Groups with a single item are printed unchanged
    if (String.IsNullOrEmpty(item.subCode))
        Console.Write("{0},", item.OriginalValue);
    else
        Console.Write("{0}({1}),", item.Prefix, item.subCode);
}

Outlined idea:
Use Dictionary<string, List<string>> for collecting your result
in a loop over your list, use string.split() .. the first element will be your Dictionary key ... create a new List<string> there if the key doesn't exist yet
if the result of split has a second element, append that to the List
use a second loop to format that Dictionary to your output string
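The outline above could be sketched roughly as follows. This is only one possible reading of it, written as top-level statements (C# 9 or later); the names groups, order and pieces are made up, and the extra order list is an assumption added to keep the prefixes in their original order:

```csharp
using System;
using System.Collections.Generic;

var codes = new List<string> { "44.01", "44.02", "46", "47.10" };

// Prefix -> list of its subcodes.
var groups = new Dictionary<string, List<string>>();
var order = new List<string>(); // prefixes in order of first appearance

// First loop: split each code and collect the subcodes per prefix.
foreach (string code in codes)
{
    string[] parts = code.Split('.');
    if (!groups.TryGetValue(parts[0], out List<string> subcodes))
    {
        subcodes = new List<string>();
        groups[parts[0]] = subcodes;
        order.Add(parts[0]);
    }
    if (parts.Length > 1) subcodes.Add(parts[1]);
}

// Second loop: format each prefix depending on how many subcodes it has.
var pieces = new List<string>();
foreach (string prefix in order)
{
    List<string> subs = groups[prefix];
    if (subs.Count == 0) pieces.Add(prefix);
    else if (subs.Count == 1) pieces.Add(prefix + "." + subs[0]);
    else pieces.Add(prefix + "(" + string.Join(",", subs) + ")");
}

string output = string.Join(",", pieces);
Console.WriteLine(output); // 44(01,02),46,47.10
```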
Of course, linq is possible too, e.g.
List<string> codes = new List<string>() {
"44.01", "44.05", "47", "42.02", "44.03" };
var result = string.Join(",",
codes.OrderBy(x => x)
.Select(x => x.Split('.'))
.GroupBy(x => x[0])
.Select((x) =>
{
if (x.Count() == 0) return x.Key;
else if (x.Count() == 1) return string.Join(".", x.First());
else return x.Key + "(" + string.Join(",", x.Select(e => e[1]).ToArray()) + ")";
}).ToArray());
Gotta love linq ... haha ... I think this is a monster.

You can do it all in one clever LINQ:
var grouped = codes.Select(x => x.Split('.'))
.Select(x => new
{
Prefix = int.Parse(x[0]),
Subcode = x.Length > 1 ? int.Parse(x[1]) : (int?)null
})
.GroupBy(k => k.Prefix)
.Select(g => new
{
Prefix = g.Key,
Subcodes = g.Where(s => s.Subcode.HasValue).Select(s => s.Subcode)
})
.Select(x =>
x.Prefix +
(x.Subcodes.Count() == 1 ? string.Format(".{0}", x.Subcodes.First()) :
x.Subcodes.Count() > 1 ? string.Format("({0})", string.Join(",", x.Subcodes))
: string.Empty)
).ToArray();
First it splits each item into code and subcode.
Then it groups by code and collects all subcodes of each group.
Finally it selects each group in the appropriate format.
Note that int.Parse drops leading zeros, so the subcode "01" is printed as 1.
Looking at the problem, I think you should stop just before the last Select and let the data presentation be done in another part/method of your application.

The old fashioned way:
List<string> codes = new List<string>() { "44.01", "44.05", "47", "42.02", "44.03" };
string output = "";
for (int i = 0; i < codes.Count; i++)
{
    // Append ".." so that items[1] always exists, even without a subcode
    string[] items = (codes[i] + "..").Split('.');
    int pos1 = output.IndexOf("," + items[0] + "(");
    if (pos1 < 0) output += "," + items[0] + "(" + items[1] + ")"; // first occurrence of code: add it
    else
    {   // Code already inserted: find the insert point
        int pos2 = output.IndexOf(')', pos1);
        output = output.Substring(0, pos2) + "," + items[1] + output.Substring(pos2);
    }
}
// Drop the leading comma and the empty "()" left by codes without a subcode
if (output.Length > 0) output = output.Substring(1).Replace("()", "");

This will work, including the correct formats for no subcodes, a single subcode, multiple subcodes. It also doesn't assume the prefix or subcodes are numeric, so it leaves leading zeros as is. Your question didn't show what to do in the case you have a prefix without subcode AND the same prefix with subcode, so it may not work in that edge case (44,44.01). I have it so that it ignores the prefix without subcode in that edge case.
List<string> codes = new List<string>
{
"44.01", "44.02", "44.03", "44.04", "44.05", "44.06", "44.07", "44.08", "46", "47.10"
};
var result=codes.Select(x => (x+".").Split('.'))
.Select(x => new
{
Prefix = x[0],
Subcode = x[1]
})
.GroupBy(k => k.Prefix)
.Select(g => new
{
Prefix = g.Key,
Subcodes = g.Where(s => s.Subcode!="").Select(s => s.Subcode)
})
.Select(x =>
x.Prefix +
(x.Subcodes.Count() == 0 ? string.Empty :
string.Format(x.Subcodes.Count()>1?"({0})":".{0}",
string.Join(",", x.Subcodes)))
).ToArray();

General idea, but I'm sure replacing the Substring calls with Regex would be a lot better as well:
List<string> newCodes = new List<string>();
// Assumes every prefix is exactly two characters long
foreach (string sub1 in codes.Select(item => item.Substring(0, 2)).Distinct())
{
    StringBuilder code = new StringBuilder();
    code.Append(sub1 + "(");
    // Everything after the prefix, without the separating full stop
    var subs = codes.Where(item => item.Substring(0, 2) == sub1)
        .Select(item => item.Substring(2).TrimStart('.'));
    code.Append(string.Join(",", subs));
    code.Append(")");
    newCodes.Add(code.ToString());
}

You could go a couple ways... I could see you making a Dictionary<string,List<string>> so that you could have "44" map to a list of {".01", ".02", ".03", etc.} This would require you processing the codes before adding them to this list (i.e. separating out the two parts of the code and handling the case where there is only one part).
Or you could put them into a SortedSet and provide your own IComparer<string> which knows that these are codes and how to sort them (at least that'd be more reliable than grouping them alphabetically). Iterating over this SortedSet would still require special logic, though, so perhaps the Dictionary to List option above is still preferable.
In either case you would still need to handle a special case "46" where there is no second element in the code. In the dictionary example, would you insert a String.Empty into the list? Not sure what you'd output if you got a list {"46", "46.1"} -- would you display as "46(null,1)" or... "46(0,1)"... or "46(,1)" or "46(1)"?
