I am creating a word cloud, so I am splitting my sentences with Regex in LINQ, grouping the words and counting them. However, I don't want some blacklisted words to appear in my cloud, so I load those words into a DataTable (dtBlackList) and check against it in LINQ, as shown in the code below:
var result = (Regex.Split(StringsForWordCloud, @"\W+")
    .GroupBy(s => s, StringComparer.InvariantCultureIgnoreCase)
    .Where(q => q.Key.Trim() != "")
    .Where(q => (dtBlackList.Select("blacklistword = '" + q.Key.Trim() + "'").Count() == 0))
    .OrderByDescending(g => g.Count())
    .Select(p => new { Word = p.Key, Count = p.Count() })
).Take(200);
Will this query affect my performance badly? Is this the right way to check against a datatable?
A LINQ query like this one runs a separate DataTable.Select for each distinct word produced by the Regex.Split operation. I'm referring to this line of code:
.Where(q => (dtBlackList.Select("blacklistword = '" + q.Key.Trim() + "'").Count() == 0))
I've had to deal with a lot of performance problems on the project I'm working on right now, caused by situations similar to this one.
In general, running an extra query per item to check or complete data you have already extracted from your database is not a good practice.
In your case, I think it's much better to extract the blacklist words once, with a single query, and then exclude that list from the set of words you have just extracted, as follows:
var words = Regex.Split(StringsForWordCloud, @"\W+")
    .GroupBy(s => s, StringComparer.InvariantCultureIgnoreCase)
    .Where(q => q.Key.Trim() != "")
    .OrderByDescending(g => g.Count())
    .Select(p => new { Word = p.Key, Count = p.Count() });
// Now extract all the words in the blacklist
IEnumerable<string> blackList = dtBlackList...
// Now exclude them from the set of words all at once
var result = words.Where(w => !blackList.Contains(w.Word))
    .OrderByDescending(w => w.Count)
    .Take(200);
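A minimal runnable sketch of the "read the blacklist once" idea. The DataTable contents and the input sentence are made-up sample data; the column name blacklistword is taken from the question's filter expression. Loading the words into a HashSet with a case-insensitive comparer (an assumption chosen to mirror the case-insensitive grouping, and DataTable's default case-insensitive Select) makes each check a hash lookup rather than a scan:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample data standing in for the real blacklist table and text.
var dtBlackList = new DataTable();
dtBlackList.Columns.Add("blacklistword", typeof(string));
dtBlackList.Rows.Add("the");
dtBlackList.Rows.Add("and");

string StringsForWordCloud = "The cat and the dog. The cat sat.";

// Read the blacklist a single time into a case-insensitive set.
var blackList = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase);
foreach (DataRow row in dtBlackList.Rows)
{
    blackList.Add(((string)row["blacklistword"]).Trim());
}

var result = Regex.Split(StringsForWordCloud, @"\W+")
    .Where(s => !string.IsNullOrWhiteSpace(s))      // Split can yield "" at the edges
    .GroupBy(s => s, StringComparer.InvariantCultureIgnoreCase)
    .Where(g => !blackList.Contains(g.Key))         // O(1) lookup per distinct word
    .Select(g => new { Word = g.Key, Count = g.Count() })
    .OrderByDescending(w => w.Count)
    .Take(200)
    .ToList();

foreach (var w in result)
{
    Console.WriteLine($"{w.Word}: {w.Count}");      // cat: 2, dog: 1, sat: 1
}
```

Because Where preserves order, the blacklist filter can sit before or after the sort; filtering first simply sorts fewer items.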
Related
A sequence of non-empty strings stringList is given, containing only uppercase letters of the Latin alphabet. For all strings starting with the same letter, determine their total length and obtain a sequence of strings of the form "S-C", where S is the total length of all strings from stringList that begin with the character C. Order the resulting sequence in descending order of the numerical values of the sums, and for equal values of the sums, in ascending order of the C character codes.
This question is related to one of my previous questions.
One solution that works is this one:
stringList.GroupBy(x => x[0]).Select(g => $"{g.Sum(x => x.Length)}-{g.Key}");
The problem is that with this given example I don't know where to add the OrderByDescending()/ThenBy() clauses in order to get the correctly sorted list.
Create an intermediate data structure to store needed info and use it for sorting and then building the output:
stringList
.GroupBy(x => x[0])
.Select(g => (Length: g.Sum(x => x.Length), Char: g.Key))
.OrderByDescending(t => t.Length)
.ThenBy(t => t.Char)
.Select(t => $"{t.Length}-{t.Char}");
You're almost there. The cleanest way of doing it would be to make a more complex object with the properties you care about, use those to sort, then keep only what you want in the output. Like:
stringList
.GroupBy(x => x[0])
.Select(g => new {
Len = g.Sum(x => x.Length),
Char = g.Key,
Val = $"{g.Sum(x => x.Length)}-{g.Key}"
})
.OrderByDescending(x => x.Len)
.ThenBy(x => x.Char)
.Select(x => x.Val);
You can add a Select after the GroupBy to transform the groups into an anonymous object containing the things you want to sort by. Then you can use OrderByDescending and ThenBy to sort. After that, Select the formatted string you want:
stringList.GroupBy(x => x[0]) // assuming all strings are non-empty
.Select(g => new {
LengthSum = g.Sum(x => x.Length),
FirstChar = g.Key
})
.OrderByDescending(x => x.LengthSum)
.ThenBy(x => x.FirstChar)
.Select(x => $"{x.LengthSum}-{x.FirstChar}");
Alternatively, do it in the query syntax with let clauses, which I find more readable:
var query = from str in stringList
group str by str[0] into g
let lengthSum = g.Sum(x => x.Length)
let firstChar = g.Key
orderby lengthSum descending, firstChar
select $"{lengthSum}-{firstChar}";
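To check the tie-breaking rule, here is the tuple-based version run on a small made-up input in which A and C both total 4. The sample strings are hypothetical; the query itself follows the answers above:

```csharp
using System;
using System.Linq;

// Hypothetical sample input: non-empty, uppercase strings.
string[] stringList = { "BB", "AAA", "A", "CCCC", "B", "BC" };

var result = stringList
    .GroupBy(x => x[0])                                   // group by first letter
    .Select(g => (Length: g.Sum(x => x.Length), Char: g.Key))
    .OrderByDescending(t => t.Length)                     // larger totals first
    .ThenBy(t => t.Char)                                  // equal totals: ascending char code
    .Select(t => $"{t.Length}-{t.Char}")
    .ToList();

Console.WriteLine(string.Join(", ", result));             // 5-B, 4-A, 4-C
```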
I have a very long string of text that is many words separated by camelCase like so:
AedeagalAedilityAedoeagiAefaldnessAegeriidaeAeginaAeipathyAeneolithicAeolididaeAeonialAerialityAerinessAerobia
I need to find the most common word and the number of times it is used. I don't know how to do this because there are no spaces, and I'm new to C#.
I have tried many methods but none seem to work; I'd be very grateful for any advice.
I have a github repo with the file being downloaded and a few tests already done here: https://github.com/Imstupidpleasehelp/C-code-test
Thank you.
You can try querying the string with the help of regular expressions and LINQ:
string source = ...
var result = Regex
.Matches(source, "[A-Z][a-z]*")
.Cast<Match>()
.Select(match => match.Value)
.GroupBy(word => word)
.Select(group => (word : group.Key, count : group.Count()))
.OrderByDescending(pair => pair.count)
.First();
Console.Write($"{result.word} appears {result.count} times");
string[] split = Regex.Split(exampleString, "(?<=[A-Za-z])(?=[A-Z][a-z])");
var result = split.GroupBy(s => s)
.Where(g=> g.Count()>=1 )
.OrderByDescending(g => g.Count())
.Select(g => new{ Word = g.Key, Occurrences = g.Count()});
The variable result will contain (Word, Occurrences) pairs for all words, ordered from most to least frequent.
If you want just the first one (the one with the most occurrences), use:
var result = split.GroupBy(s => s)
.Where(g=> g.Count()>=1 )
.OrderByDescending(g => g.Count())
.Select(g => new{ Word = g.Key, Occurrences = g.Count()}).First();
Keep in mind that two or more words can have the same number of occurrences, so using First() would give you only one of them.
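If every word tied for the top count is needed, one option is to compute the maximum first and then keep all groups that reach it. A minimal sketch; the input string is made up so that two words tie, and the split pattern is the one from the answer above:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input in which "Aero" and "Aedile" both appear twice.
string exampleString = "AeroAedileAeroAegisAedile";

// Split before each capital letter that starts a new word.
string[] split = Regex.Split(exampleString, "(?<=[A-Za-z])(?=[A-Z][a-z])");

var groups = split
    .GroupBy(s => s)
    .Select(g => new { Word = g.Key, Occurrences = g.Count() })
    .ToList();

// Find the highest count, then keep every word that reaches it.
int max = groups.Max(g => g.Occurrences);
var mostCommon = groups.Where(g => g.Occurrences == max).ToList();

foreach (var w in mostCommon)
{
    Console.WriteLine($"{w.Word} appears {w.Occurrences} times");
}
```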
A non-LINQ approach that uses a for loop and char.IsUpper to separate the words:
string data = "AedeagalAedilityAedoeagiAefaldness";
var words = new List<string>();
var temp = new StringBuilder();
for(int i = 0;i < data.Length;i++)
{
temp.Append(data[i]);
if (i == data.Length-1 || char.IsUpper(data[i+1]))
{
words.Add(temp.ToString());
temp.Clear();
}
}
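The loop above only splits the words; the counting step is left open. A minimal non-LINQ sketch that finishes the job with a Dictionary, on a made-up variant of the sample string in which one word repeats. On ties, the word that first reaches the top count wins (an arbitrary choice for this sketch):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical input; "Aedeagal" appears twice.
string data = "AedeagalAedilityAedeagalAefaldness";

// Split on uppercase letters, as in the loop above.
var words = new List<string>();
var temp = new StringBuilder();
for (int i = 0; i < data.Length; i++)
{
    temp.Append(data[i]);
    if (i == data.Length - 1 || char.IsUpper(data[i + 1]))
    {
        words.Add(temp.ToString());
        temp.Clear();
    }
}

// Count each word and track the most frequent one as we go.
var counts = new Dictionary<string, int>();
string mostCommon = "";
int best = 0;
foreach (var word in words)
{
    counts.TryGetValue(word, out int n);   // n is 0 when the word is new
    counts[word] = ++n;
    if (n > best)
    {
        best = n;
        mostCommon = word;
    }
}

Console.WriteLine($"{mostCommon} appears {best} times");   // Aedeagal appears 2 times
```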
I want to check if a string contains a word or number from a list and remove it from the string.
I want to use Enumerable.Range() to create the filter list and use it to filter many different strings.
I'm trying to combine two previous answers:
https://stackoverflow.com/a/49733139/6806643
https://stackoverflow.com/a/49740832/6806643
The sentence I want to filter:
This is a A05B09 hello 02 100 test
Filter
A00B00-A100B100, 01-100, 000-100, hello
Should read:
This is a test
Old Way
For Loop - Works
http://rextester.com/BJL70824
New Way
Enumerable Range List - Does not work
http://rextester.com/ZSCM64375
C#
List<List<string>> filters = Enumerable.Range(0, 101)
.SelectMany(a => Enumerable.Range(0, 101).Select(b => "A{0:00}B{1:00}"))
.Select(i => Enumerable.Range(0, 10).Select(c => string.Empty).ToList())
.SelectMany(a => Enumerable.Range(0, 101).Select(b => "{0:000}"))
.SelectMany(a => Enumerable.Range(0, 101).Select(b => "{0:00}"))
.SelectMany(a => Enumerable.Range(0, 1).Select(b => "hello"))
.ToList();
List<string> matches = new List<string>();
// Sentence
string sentence = "This is a A05B09 hello 02 100 test";
string newSentence = string.Empty;
// Find Matches
for (int i = 0; i < filters.Count; i++)
{
// Add to Matches List
if (sentence.Contains(filters[i].ToString()))
{
matches.Add(filters[i]);
}
}
// Filter Sentence
newSentence = Regex.Replace(
sentence
, @"(?<!\S)(" + string.Join("|", matches) + @")(?!\S)"
, ""
, RegexOptions.IgnoreCase
);
// Display New Sentence
Console.WriteLine(newSentence);
I think creating a list of all possible combinations is a very bad approach. You are creating large lists, which make your process use a lot of RAM and run slowly for no good reason. Why not just write a good regex? For example, this expression gives you your desired string:
\b(A\d\dB\d\d|A100B100|0?\d\d|100|hello)\b\s*
That is assuming you don't want to replace stuff like A101B101 or 123.
If you want to replace those as well, the regex is a bit simpler:
\b(A\d\d\d?B\d\d\d?|\d\d\d?|hello)\b\s*
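A minimal sketch of applying the first expression with Regex.Replace to the sentence from the question. As written the pattern is case-sensitive, unlike the question's call with RegexOptions.IgnoreCase; the trailing \s* consumes the space after each removed token so no double spaces remain:

```csharp
using System;
using System.Text.RegularExpressions;

string sentence = "This is a A05B09 hello 02 100 test";

// A00B00..A99B99, A100B100, two- or three-digit numbers up to 099, 100, and "hello",
// each as a whole word, plus any whitespace that follows it.
var filter = new Regex(@"\b(A\d\dB\d\d|A100B100|0?\d\d|100|hello)\b\s*");

string newSentence = filter.Replace(sentence, "");
Console.WriteLine(newSentence);   // This is a test
```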
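This line of yours does not seem to meet your requirements: .SelectMany(a => Enumerable.Range(0, 101).Select(b => "A{0:00}B{1:00}")). The format string is never applied to a and b, so every element is just the literal text "A{0:00}B{1:00}".
Can you try this LINQ?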
List<string> filters = Enumerable.Range(0, 101)
.SelectMany(a => Enumerable.Range(0, 101).Select(b => $"A{a:00}B{b:00}"))
.Union(Enumerable.Range(0, 101).Select(b => $"{b:000}"))
.Union(Enumerable.Range(0, 101).Select(b => $"{b:00}"))
.Union(new List<string> {"hello"})
.ToList();
This version, which uses string.Format instead of string interpolation, gives the expected result on rextester:
List<string> filters = Enumerable.Range(0, 101)
.SelectMany(a => Enumerable.Range(0, 101).Select(b => string.Format("A{0:00}B{1:00}", a, b)))
.Union(Enumerable.Range(0, 101).Select(b => string.Format("{0:000}", b)))
.Union(Enumerable.Range(0, 101).Select(b => string.Format("{0:00}", b)))
.Union(new List<string> { "hello" })
.ToList();
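The filter list still has to be applied to the sentence. One alternative to the question's per-filter Contains loop, sketched here as an assumption rather than the answerer's method, is to load the list into a HashSet and look up each space-separated word. This rejoins the kept words with single spaces, so the original spacing is not preserved exactly:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Filter values built as in the answer above, stored in a case-insensitive set.
var filters = new HashSet<string>(
    Enumerable.Range(0, 101)
        .SelectMany(a => Enumerable.Range(0, 101).Select(b => string.Format("A{0:00}B{1:00}", a, b)))
        .Union(Enumerable.Range(0, 101).Select(b => string.Format("{0:000}", b)))
        .Union(Enumerable.Range(0, 101).Select(b => string.Format("{0:00}", b)))
        .Union(new List<string> { "hello" }),
    StringComparer.OrdinalIgnoreCase);

string sentence = "This is a A05B09 hello 02 100 test";

// One hash lookup per word instead of one substring search per filter value.
string newSentence = string.Join(" ",
    sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !filters.Contains(word)));

Console.WriteLine(filters.Count);   // 10403
Console.WriteLine(newSentence);     // This is a test
```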
I have a list of files, say prg_3.txt, prg_2.txt, prg_1.txt.
I need to loop over the files and merge them in the order 1, 2, 3.
The query I am using is as follows:
var Groups = shortfilenames.GroupBy(s => s.Substring(0, s.IndexOf('_'))).ToList();
The above query creates a group named prg that contains the 3 files.
Now I need to sort them in the order 1, 2, 3, i.e. by the numbers in their file names.
I am getting grouped results, but I am not sure how to order the elements within each group.
Please help, and let me know if you have any questions.
Edit:
Will this be good enough?
var userGroups = shortfilenames.GroupBy(s => s.Substring(0, s.IndexOf('_'))).Select(g=>g.OrderBy(x=>x.Substring(x.IndexOf('_',x.Length-x.IndexOf('_')))));
This should work but probably won't be so efficient:
shortfilenames
.GroupBy(s => s.Substring(0, s.IndexOf('_')))
.Select(
g => g.OrderBy(x => int.Parse(new String(x.Where(char.IsDigit).ToArray()))));
This will not work if your file names contain additional digits. Here is another solution that fixes that; according to your comment, it should work with the format you specified:
shortfilenames
.GroupBy(s => s.Substring(0, s.IndexOf('_')))
.Select(g => g.OrderBy(
x =>
{
var index = x.IndexOf('_');
return int.Parse(x.Substring(index + 1, x.LastIndexOf('.') - index - 1));
}));
Since the names follow the same pattern, what's the problem with simply using OrderBy on the names themselves?
var v = new string[] {"prg_3.txt","prg_2.txt", "prg_1.txt"};
var sorted = v.OrderBy(name => name);
You get:
prg_1.txt
prg_2.txt
prg_3.txt
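Plain name ordering works for single-digit suffixes, but a string comparison looks at one character at a time, so a hypothetical prg_10.txt would sort before prg_2.txt. A minimal sketch comparing the two orderings; the file list is made up, and StringComparer.Ordinal is used to make the string comparison explicit. The numeric key is the substring between the first '_' and the last '.', as in the corrected answer above:

```csharp
using System;
using System.Linq;

// Hypothetical file list that includes a two-digit suffix.
string[] shortfilenames = { "prg_3.txt", "prg_10.txt", "prg_2.txt", "prg_1.txt" };

// Character-by-character ordering: "prg_10" comes before "prg_2".
var byName = shortfilenames.OrderBy(name => name, StringComparer.Ordinal).ToList();

// Ordering by the parsed number gives 1, 2, 3, 10.
var byNumber = shortfilenames
    .OrderBy(name =>
    {
        int underscore = name.IndexOf('_');
        int dot = name.LastIndexOf('.');
        return int.Parse(name.Substring(underscore + 1, dot - underscore - 1));
    })
    .ToList();

Console.WriteLine(string.Join(", ", byName));    // prg_1.txt, prg_10.txt, prg_2.txt, prg_3.txt
Console.WriteLine(string.Join(", ", byNumber));  // prg_1.txt, prg_2.txt, prg_3.txt, prg_10.txt
```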
If you want to sort inner groupings by file name this should do the trick:
shortfilenames.GroupBy(s => s.Substring(0, s.IndexOf('_'))).Select(g => g.OrderBy(e => e)).ToList();
I have a problem in my C# application. I have some school classes in the database, for example 8-B, 9-A, 10-C, 11-C and so on. When I use an order by clause to sort them, the string comparison gives these results:
10-C
11-C
8-B
9-A
but I want them sorted as integers, based on the first integer in the string,
i.e.
8-B
9-A
10-C
11-C
I hope that makes sense.
I've tried this, but it throws an exception:
var query = cx.Classes.Select(x=>x.Name)
.OrderBy( x=> new string(x.TakeWhile(char.IsDigit).ToArray()));
Please help me; I want the classes ordered this way.
Maybe Split will do?
.OrderBy(x => Convert.ToInt32(x.Split('-')[0]))
.ThenBy(x => x.Split('-')[1])
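A runnable version of the Split idea on in-memory sample data. The names are hypothetical stand-ins for cx.Classes.Select(x => x.Name). Against a database, the LINQ provider may not be able to translate string.Split to SQL, so a common workaround (assumed here, not tested against the asker's setup) is to materialize the names first, e.g. with .AsEnumerable(), before ordering:

```csharp
using System;
using System.Linq;

// Hypothetical class names; in the question these come from cx.Classes.
string[] names = { "10-C", "8-B", "11-C", "9-B", "9-A" };

var sorted = names
    .OrderBy(x => Convert.ToInt32(x.Split('-')[0]))        // numeric part: 8 < 9 < 10 < 11
    .ThenBy(x => x.Split('-')[1], StringComparer.Ordinal)  // letter breaks ties
    .ToList();

Console.WriteLine(string.Join(", ", sorted));   // 8-B, 9-A, 9-B, 10-C, 11-C
```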
If the input is well-formed enough, this would do:
var maxLen = cx.Classes.Max(x => x.Name.Length);
var query = cx.Classes.Select(x => x.Name).OrderBy(x => x.PadLeft(maxLen));
You can left-pad with zeros to a fixed length that suits your data, for example 6:
.OrderBy(x => x.PadLeft(6, '0'))
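A minimal sketch of the padding approach on the sample names. Padding lines up the numeric prefixes, so a character-by-character comparison agrees with numeric order. The chosen width must be at least the length of the longest name; PadLeft leaves a longer string unchanged, which would break the ordering:

```csharp
using System;
using System.Linq;

string[] names = { "10-C", "8-B", "11-C", "9-A" };

// What the sort key looks like for each name at width 6.
var padded = names.Select(x => x.PadLeft(6, '0')).ToList();   // 0010-C, 0008-B, 0011-C, 0009-A

// Sort by the padded key but keep the original names.
var sorted = names.OrderBy(x => x.PadLeft(6, '0'), StringComparer.Ordinal).ToList();

Console.WriteLine(string.Join(", ", sorted));   // 8-B, 9-A, 10-C, 11-C
```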
This is fundamentally the same approach as Andrius's answer, written out more explicitly:
var names = new[] { "10-C", "8-B", "9-A", "11-C" };
var sortedNames =
(from name in names
let parts = name.Split('-')
select new {
fullName = name,
number = Convert.ToInt32(parts[0]),
letter = parts[1]
})
.OrderBy(x => x.number)
.ThenBy(x => x.letter)
.Select(x => x.fullName);
It's my naive assumption that this would be more efficient because the Split is only processed once in the initial select rather than in both OrderBy and ThenBy, but for all I know the extra "layers" of LINQ may outweigh any gains from that.