How many specified chars are in a string? - c#

Given an example string such as 550e8400-e29b-41d4-a716-446655440000, how can one count how many - characters it contains?
I'm currently using:
int total = "550e8400-e29b-41d4-a716-446655440000".Split('-').Length + 1;
Is there any method where we don't need to add 1, such as Count?
All other methods, such as Contains and IndexOf, only return a boolean or the first position; nothing returns how many were found.
What am I missing?

You can use the LINQ method Enumerable.Count for this purpose (note that a string is an IEnumerable<char>):
int numberOfHyphens = text.Count(c => c == '-');
The argument is a Func<char, bool>, a predicate that specifies when an item is deemed to have 'passed' the filter.
This is (loosely speaking) equivalent to:
int numberOfHyphens = 0;
foreach (char c in text)
{
if (c == '-') numberOfHyphens++;
}

using System.Linq;
..
int total = "550e8400-e29b-41d4-a716-446655440000".Count(c => c == '-');

The most straightforward method is to simply loop through the characters, as that is what any algorithm has to do one way or another:
int total = 0;
foreach (char c in theString) {
if (c == '-') total++;
}
You can use extension methods to do basically the same:
int total = theString.Count(c => c == '-');
Or:
int total = theString.Aggregate(0, (t, c) => c == '-' ? t + 1 : t);
Then there are interesting (but less efficient) tricks, like removing the characters and comparing the lengths:
int total = theString.Length - theString.Replace("-", String.Empty).Length;
Or using a regular expression to find all occurrences of the character:
int total = Regex.Matches(theString, "-").Count;

int total = "550e8400-e29b-41d4-a716-446655440000".Count(c => c == '-');

To find the number of '-' in a string, you are going to need to loop through the string and check each character, so the simplest thing to do is to just write a function that does that. Using Split actually takes more time because it creates arrays for no reason.
Also, it's confusing what you are trying to do, and it even looks like you got it wrong (you need to subtract 1).
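The "just write a function" suggestion above could look like the following minimal sketch. CharCounter, CountChar and CountCharViaSplit are illustrative names, not framework APIs. The second method is only there to show why the Split approach needs "- 1" rather than "+ 1".

```csharp
using System;

public static class CharCounter
{
    // Counts occurrences of target by scanning the string once.
    public static int CountChar(string text, char target)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        int total = 0;
        foreach (char c in text)
        {
            if (c == target) total++;
        }
        return total;
    }

    // Split yields one more piece than there are separators, hence the "- 1".
    // It also allocates an array of substrings, which the loop above avoids.
    public static int CountCharViaSplit(string text, char target)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return text.Split(target).Length - 1;
    }
}
```

For the GUID in the question, both methods return 4.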

Try this:
string s = "550e8400-e29b-41d4-a716-446655440000";
int result = s.ToCharArray().Count( c => c == '-');

Related

Find first difference in strings case insensitive given culture

There are many ways to compare two strings to find the first index where they differ, but if I require case-insensitivity in any given culture, then the options go away.
This is the only way I can think to do such a comparison:
static int FirstDiff(string str1, string str2)
{
for (int i = 1; i <= str1.Length && i <= str2.Length; i++)
if (!string.Equals(str1.Substring(0, i), str2.Substring(0, i), StringComparison.CurrentCultureIgnoreCase))
return i - 1;
return -1; // strings are identical
}
Can anyone think of a better way that doesn't involve so much string allocation?
For testing purposes:
// Turkish word 'open' contains the letter 'ı' which is the lowercase of 'I' in Turkish, but not English
string lowerCase = "açık";
string upperCase = "AÇIK";
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
FirstDiff(lowerCase, upperCase); // Should return 2
Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
FirstDiff(lowerCase, upperCase); // Should return -1
Edit: Checking both ToUpper and ToLower for each character appears to work for any example that I can come up with. But will it work for all cultures? Perhaps this is a question better directed at linguists.
One way to reduce the number of string allocations is to reduce the number of times you do a comparison. We can borrow from the binary search algorithm for searching arrays in this case, and start out by comparing a substring that is half the length of the string. Then we continue to add or remove half of the remaining indexes (depending on whether or not the strings were equal), until we get to the first instance of inequality.
In general this should speed up the search time:
public static int FirstDiffBinarySearch(string str1, string str2)
{
// "Fail fast" checks
if (string.Equals(str1, str2, StringComparison.CurrentCultureIgnoreCase))
return -1;
if (str1 == null || str2 == null) return 0;
int min = 0;
int max = Math.Min(str1.Length, str2.Length);
int mid = (min + max) / 2;
while (min <= max)
{
if (string.Equals(str1.Substring(0, mid), str2.Substring(0, mid),
StringComparison.CurrentCultureIgnoreCase))
{
min = mid + 1;
}
else
{
max = mid - 1;
}
mid = (min + max) / 2;
}
return mid;
}
You could compare characters instead of strings. This is far from optimized, and rather quick and dirty, but something like this appears to work
for (int i = 0; i < str1.Length && i < str2.Length; i++)
if (char.ToLower(str1[i]) != char.ToLower(str2[i]))
return i;
This should work with culture as well, according to the docs: https://learn.microsoft.com/en-us/dotnet/api/system.char.tolower?view=netframework-4.8
Casing rules are obtained from the current culture.
To convert a character to lowercase by using the casing conventions of the current culture, call the ToLower(Char, CultureInfo) method overload with a value of CurrentCulture for its culture parameter.
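As a rough sketch, the character loop above can be wrapped in a method that takes the culture explicitly through the ToLower(Char, CultureInfo) overload quoted from the docs. CharDiff and FirstDiffByChar are hypothetical names, and the handling of strings of different lengths is an assumption borrowed from the other answers here.

```csharp
using System;
using System.Globalization;

public static class CharDiff
{
    // Returns the index of the first position where the two strings differ
    // under the given culture's lowercase mapping, or -1 if none is found.
    public static int FirstDiffByChar(string str1, string str2, CultureInfo culture)
    {
        int length = Math.Min(str1.Length, str2.Length);
        for (int i = 0; i < length; i++)
        {
            if (char.ToLower(str1[i], culture) != char.ToLower(str2[i], culture))
                return i;
        }
        // One string is a prefix of the other: they differ where the shorter one ends.
        return str1.Length == str2.Length ? -1 : length;
    }
}
```

Passing CultureInfo.CurrentCulture reproduces the behaviour of the snippet above. Results for culture-specific letters depend on the culture data available to the runtime.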
You need to check both ToLower and ToUpper.
private static int FirstDiff(string str1, string str2)
{
int length = Math.Min(str1.Length, str2.Length);
TextInfo textInfo = CultureInfo.CurrentCulture.TextInfo;
for (int i = 0; i < length; ++i)
{
if (textInfo.ToUpper(str1[i]) != textInfo.ToUpper(str2[i]) ||
textInfo.ToLower(str1[i]) != textInfo.ToLower(str2[i]))
{
return i;
}
}
return str1.Length == str2.Length ? -1 : length;
}
I am reminded of one additional oddity of characters (or rather Unicode code points): some of them act as surrogate pairs, and they have no relevance to any culture unless the two halves of the pair appear next to one another. For more information about Unicode interpretation standards, see the document that @orhtej2 linked in his comment.
While trying out different solutions I stumbled upon this particular class, and I think it will best suit my needs: System.Globalization.StringInfo (The MS Doc Example shows it in action with surrogate pairs)
The class breaks the string down into sections by pieces that need each other to make sense (rather than by strictly character). I can then compare each piece by culture using string.Equals and return the index of the first piece that differs:
static int FirstDiff(string str1, string str2)
{
var si1 = StringInfo.GetTextElementEnumerator(str1);
var si2 = StringInfo.GetTextElementEnumerator(str2);
bool more1, more2;
while ((more1 = si1.MoveNext()) & (more2 = si2.MoveNext())) // single & to avoid short circuiting the right counterpart
if (!string.Equals(si1.Current as string, si2.Current as string, StringComparison.CurrentCultureIgnoreCase))
return si1.ElementIndex;
if (more1 || more2)
return si1.ElementIndex;
else
return -1; // strings are equivalent
}
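To make the "pieces that need each other" idea concrete, here is a small sketch that only inspects how StringInfo splits a string. TextElements and its two methods are illustrative wrappers; the calls they make (StringInfo.LengthInTextElements and StringInfo.ParseCombiningCharacters) are existing framework members.

```csharp
using System.Globalization;

public static class TextElements
{
    // Number of text elements, which can be smaller than string.Length
    // when the string contains surrogate pairs.
    public static int CountTextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    // Start index (in chars) of each text element.
    public static int[] TextElementStarts(string s)
    {
        return StringInfo.ParseCombiningCharacters(s);
    }
}
```

For "a\uD83D\uDE00b" (a letter, one surrogate pair, a letter), Length is 4 but CountTextElements returns 3, and the elements start at indices 0, 1 and 3. Those start indices are the values that ElementIndex reports in the method above.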
Here's a slightly different approach. A string can be enumerated as a sequence of char, so I'm using that along with LINQ.
var list1 = "Hellow".ToLower().ToList();
var list2 = "HeLio".ToLower().ToList();
var diffIndex = list1.Zip(list2, (item1, item2) => item1 == item2)
.Select((match, index) => new { Match = match, Index = index })
.Where(a => !a.Match)
.Select(a => a.Index).FirstOrDefault();
If they match, diffIndex will be zero. Otherwise it will be the index of the first mismatching character.
Edit:
A little improved version with casting to lower case on the go. And initial ToList() was really redundant.
var diffIndex = list1.Zip(list2, (item1, item2) => char.ToLower(item1) == char.ToLower(item2))
.Select((match, index) => new { Match = match, Index = index })
.Where(a => !a.Match)
.Select(a => a.Index).FirstOrDefault();
Edit2:
Here's a working version that can be shortened further. This is the best of the three, since the previous two return 0 when the strings match. Here, if the strings match you get null, and the index otherwise.
var list1 = "Hellow";
var list2 = "HeLio";
var diffIndex = list1.Zip(list2, (item1, item2) => char.ToLower(item1) == char.ToLower(item2))
.Select((match, index) => new { Match = match, Index = index })
.FirstOrDefault(x => !x.Match)?.Index;
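One thing to keep in mind is that Zip stops at the end of the shorter string, so "abc" and "abcd" would be reported as matching. A possible way to cover that case is sketched below. ZipDiff and FirstDiffIndex are hypothetical names, and treating the shorter length as the difference index is an assumption, not part of the answer above.

```csharp
using System;
using System.Linq;

public static class ZipDiff
{
    // Index of the first differing character (lowercased per the current culture),
    // the shorter length if one string is a prefix of the other, or null if equal.
    public static int? FirstDiffIndex(string a, string b)
    {
        int? mismatch = a.Zip(b, (x, y) => char.ToLower(x) == char.ToLower(y))
            .Select((match, index) => new { Match = match, Index = index })
            .FirstOrDefault(p => !p.Match)?.Index;

        if (mismatch.HasValue) return mismatch;
        return a.Length == b.Length ? (int?)null : Math.Min(a.Length, b.Length);
    }
}
```

With the sample strings "Hellow" and "HeLio" this returns 3.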

How to treat integers from a string as multi-digit numbers and not individual digits?

My input is a string of integers, which I have to check whether they are even and display them on the console, if they are. The problem is that what I wrote checks only the individual digits and not the numbers.
string even = "";
while (true)
{
string inputData = Console.ReadLine();
if (inputData.Equals("x", StringComparison.OrdinalIgnoreCase))
{
break;
}
for (int i = 0; i < inputData.Length; i++)
{
if (inputData[i] % 2 == 0)
{
even +=inputData[i];
}
}
}
foreach (var e in even)
Console.WriteLine(e);
bool something = string.IsNullOrEmpty(even);
if( something == true)
{
Console.WriteLine("N/A");
}
For example, if the input is:
12
34
56
my output is going to be
2
4
6 (every number needs to be displayed on a new line).
What am I doing wrong? Any help is appreciated.
Use string.Split to get the independent sections and then int.TryParse to check if it is a number (check Parse v. TryParse). Then take only even numbers:
var evenNumbers = new List<int>();
foreach(var s in inputData.Split(" "))
{
if(int.TryParse(s, out var num) && num % 2 == 0)
evenNumbers.Add(num); // If can't use collections: Console.WriteLine(num);
}
(notice the use of out vars introduced in C# 7.0)
If you can use linq then similar to this answer:
var evenNumbers = inputData.Split(" ")
    .Select(s => (int.TryParse(s, out var value), value))
    .Where(pair => pair.Item1 && pair.value % 2 == 0) // parsed successfully and even
    .Select(pair => pair.value);
I think you do too many things here at once. Instead of already checking if the number is even, it is better to solve one problem at a time.
First we can make substrings by splitting the string into "words". Next we convert every substring to an int, and finally we filter on even numbers, like:
var words = inputData.Split(' '); // split the words by a space
var intwords = words.Select(int.Parse); // convert these to ints
var evenwords = intwords.Where(x => x % 2 == 0); // keep only the even ones
foreach (var even in evenwords) { // print the even numbers
    Console.WriteLine(even);
}
Here it can still happen that some "words" are not integers, for example "12 foo 34". So you will need to implement some extra filtering between splitting and converting.
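The extra filtering mentioned above could be done with int.TryParse, roughly as in this sketch. EvenFilter and EvenNumbers are illustrative names, and silently skipping tokens that are not integers is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;

public static class EvenFilter
{
    // Splits on whitespace, skips tokens that are not integers, keeps even values.
    public static List<int> EvenNumbers(string inputData)
    {
        var result = new List<int>();
        // A null separator array makes Split treat any whitespace as a delimiter.
        string[] words = inputData.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            if (int.TryParse(word, out int number) && number % 2 == 0)
                result.Add(number);
        }
        return result;
    }
}
```

For the input "12 foo 34 7" this yields 12 and 34, which can then be printed one per line with Console.WriteLine.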

I'm trying to find a specific word within a char array

Using C# I'm trying to find a specific word within a char array. Also, I don't want the same letter used more than once i.e. the word is 'hello' and I'm trying to find it within a random array of letters, so if the letter 'l' is used out of the random array of letters, I don't want it to be used again. There should be another 'l' within the array of letters to be used as the second 'l' in "hello". Just trying to be precise. A simple answer would be very helpful. Thank you.
Here is my attempt so far.
public static char [] Note = "hello".ToCharArray();
public static char [] Newspaper = "ahrenlxlpoz".ToCharArray();
static void main(string[] args)
{
Array.Sort(Note);
Array.Sort(Newspaper);
if(Newspaper.Contains<Note>)
{
Console.Write("It should display the letters of Note found within Newspaper");
}
}
I assume by "contains" you mean that Newspaper has enough copies of each letter to make up Note. For example, you need at least two l's to make up the word "hello". If so, you basically need to count the occurrences of each letter in both strings, and make sure the count of each letter in Note is less than or equal to the count of that letter in Newspaper.
var dictNote = Note.GroupBy(c => c).ToDictionary(g => g.Key, g => g.Count());
var dictNews = Newspaper.GroupBy(c => c).ToDictionary(g => g.Key, g => g.Count());
bool contains = dictNote.All(x =>
dictNews.ContainsKey(x.Key) && x.Value <= dictNews[x.Key]);
In fact, a string is a char array. And the most "classic" way to do this would be:
string Note = "hello";
char[] Newspaper = "ahrenlxlpoz".ToCharArray();
string res = "";
for (int i = 0; i < Note.Length; i++)
for (int j = 0; j < Newspaper.Length; j++)
if (Note[i] == Newspaper[j])
{
res += Newspaper[j];
Newspaper[j] = ' ';
break;
}
//This prints the remaining characters in Newspaper. I avoid repeating chars.
for (int i = 0; i < Newspaper.Length; i++ )
Console.Write(Newspaper[i]+"\n");
Console.Write("\n\n");
if (Note.Equals(res)) Console.Write("Word found");
else Console.Write("Word NOT found");
Console.Read();
At the end, res will be "hello". Print res in the console. I replaced each used letter with ' ' so it can't be used twice, as the answer above suggested. The code then compares the result with the word and tells you whether it found the word in the string. Try changing Newspaper to "ahrenlxlpaz" and it will tell you the word is NOT found :)
Try this:
public static char[] Note = "hello".ToCharArray();
public static char[] Newspaper = "ahrenlxlpoz".ToCharArray();
foreach (char c in Note) //check each character of Note
{
if (Newspaper.Contains(c))
{
Console.Write(c); //it will display hello
}
}
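Note that Contains only checks for presence, so a single 'l' in Newspaper would satisfy both l's of "hello". If each newspaper letter should be usable only once, as the question asks, a counting sketch along these lines may help. LetterPool and CanBuild are hypothetical names, not existing APIs.

```csharp
using System.Collections.Generic;

public static class LetterPool
{
    // True if every character of note can be taken from newspaper,
    // using each newspaper character at most once.
    public static bool CanBuild(string note, string newspaper)
    {
        var available = new Dictionary<char, int>();
        foreach (char c in newspaper)
        {
            available.TryGetValue(c, out int count); // count stays 0 if c is new
            available[c] = count + 1;
        }
        foreach (char c in note)
        {
            if (!available.TryGetValue(c, out int left) || left == 0)
                return false;
            available[c] = left - 1; // consume one copy of the letter
        }
        return true;
    }
}
```

CanBuild("hello", "ahrenlxlpoz") is true, while removing one 'l' or the 'o' from the newspaper string makes it false.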

What's the best way to split a list of strings to match first and last letters?

I have a long list of words in C#, and I want to find all the words within that list that have the same first and last letters and that have a length of between, say, 5 and 7 characters. For example, the list might have:
"wasted was washed washing was washes watched watches wilts with wastes wits washings"
It would return
Length: 5-7, First letter: w, Last letter: d, "wasted, washed, watched"
Length: 5-7, First letter: w, Last letter: s, "washes, watches, wilts, wastes"
Then I might change the specification for a length of 3-4 characters which would return
Length: 3-4, First letter: w, Last letter: s, "was, wits"
I found this method of splitting which is really fast, made each item unique, used the length and gave an excellent start:
Spliting string into words length-based lists c#
Is there a way to modify/use that to take account of first and last letters?
EDIT
I originally asked about the 'fastest' way because I usually solve problems like this with lots of string arrays (which are slow and involve a lot of code). LINQ and lookups are new to me, but I can see that the ILookup used in the solution I linked to is amazing in its simplicity and is very fast. I don't actually need the minimum processor time. Any approach that avoids me creating separate arrays for this information would be fantastic.
This one-liner will give you groups with the same first/last letter in your range:
int min = 5;
int max = 7;
var results = str.Split()
.Where(s => s.Length >= min && s.Length <= max)
.GroupBy(s => new { First = s.First(), Last = s.Last()});
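To turn such groups into lines shaped like the ones in the question, something like the following sketch could be used. WordGroups and Describe are illustrative names, and the Distinct() call is an assumption based on the question's wish for unique items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordGroups
{
    // Groups distinct words of the given length range by (first, last) letter
    // and renders one line per group, in order of first appearance.
    public static List<string> Describe(string text, int min, int max)
    {
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Distinct()
            .Where(s => s.Length >= min && s.Length <= max)
            .GroupBy(s => new { First = s[0], Last = s[s.Length - 1] })
            .Select(g => string.Format(
                "Length: {0}-{1}, First letter: {2}, Last letter: {3}, \"{4}\"",
                min, max, g.Key.First, g.Key.Last, string.Join(", ", g)))
            .ToList();
    }
}
```

For the sample list and a range of 5-7, this produces three lines. Besides the two groups shown in the question there is a separate group for "washing", because it is the only word ending in g.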
var minLength = 5;
var maxLength = 7;
var firstPart = "w";
var lastPart = "d";
var words = new List<string> { "washed", "wash" }; // so on
var matches = words.Where(w => w.Length >= minLength && w.Length <= maxLength &&
w.StartsWith(firstPart) && w.EndsWith(lastPart))
.ToList();
For the most part, this should be fast enough, unless you're dealing with tens of thousands of words and worrying about milliseconds. Then we can look further.
Just in LINQPad I created this:
void Main()
{
var words = new []{"wasted", "was", "washed", "washing", "was", "washes", "watched", "watches", "wilts", "with", "wastes", "wits", "washings"};
var firstLetter = "w";
var lastLetter = "d";
var minimumLength = 5;
var maximumLength = 7;
var sortedWords = words.Where(w => w.StartsWith(firstLetter) && w.EndsWith(lastLetter) && w.Length >= minimumLength && w.Length <= maximumLength);
sortedWords.Dump();
}
If that isn't fast enough, I would create a lookup table:
Dictionary<char, Dictionary<char, List<string>>> lookupTable;
and do:
lookupTable[firstLetter][lastLetter].Where(<check length>)
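A possible way to build and query such a lookup table is sketched below. WordLookup, Build and Find are hypothetical names, and the letters are taken as char rather than string so they can serve as dictionary keys.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WordLookup
{
    // Builds first letter -> last letter -> words. Built once, queried many times.
    public static Dictionary<char, Dictionary<char, List<string>>> Build(IEnumerable<string> words)
    {
        return words
            .Where(w => w.Length > 0)
            .Distinct()
            .GroupBy(w => w[0])
            .ToDictionary(
                byFirst => byFirst.Key,
                byFirst => byFirst
                    .GroupBy(w => w[w.Length - 1])
                    .ToDictionary(byLast => byLast.Key, byLast => byLast.ToList()));
    }

    // Returns matching words, or an empty list when either letter is absent.
    public static List<string> Find(
        Dictionary<char, Dictionary<char, List<string>>> table,
        char first, char last, int minLength, int maxLength)
    {
        if (table.TryGetValue(first, out var byLast) &&
            byLast.TryGetValue(last, out var candidates))
        {
            return candidates
                .Where(w => w.Length >= minLength && w.Length <= maxLength)
                .ToList();
        }
        return new List<string>();
    }
}
```

Using TryGetValue instead of the indexer avoids a KeyNotFoundException when no word starts or ends with the requested letter.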
Here's a method that does exactly what you want. You are only given a list of strings and the min/max length, correct? You aren't given the first and last letters to filter on. This method processes all the first/last letters in the strings.
private static void ProcessInput(string[] words, int minLength, int maxLength)
{
var groups = from word in words
where word.Length > 0 && word.Length >= minLength && word.Length <= maxLength
let key = new Tuple<char, char>(word.First(), word.Last())
group word by key into @group
orderby Char.ToLowerInvariant(@group.Key.Item1), @group.Key.Item1, Char.ToLowerInvariant(@group.Key.Item2), @group.Key.Item2
select @group;
Console.WriteLine("Length: {0}-{1}", minLength, maxLength);
foreach (var group in groups)
{
Console.WriteLine("First letter: {0}, Last letter: {1}", group.Key.Item1, group.Key.Item2);
foreach (var word in group)
Console.WriteLine("\t{0}", word);
}
}
Just as a quick thought, I have no clue if this would be faster or more efficient than the linq solutions posted, but this could also be done fairly easily with regular expressions.
For example, if you wanted to get 5-7 letter length words that begin with "w" and end with "s", you could use a pattern along the lines of:
\bw[A-Za-z]{3,5}s\b
(and this could fairly easily be made to be more variable driven - For example, have a variable for first letter, min length, max length, last letter and plug them in to the pattern to replace w, 3, 5 & s)
Then, using the Regex class, you could just take the matches as your list.
Again, I don't know how this compares efficiency-wise to linq, but I thought it might deserve mention.
Hope this helps!!
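The variable-driven pattern described above might be assembled as in this sketch. WordPattern, Build and Find are illustrative names. The sketch assumes minLength is at least 2 (one character each for the first and last letter) and that words consist of ASCII letters only, as in the pattern above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordPattern
{
    // Builds \b<first>[A-Za-z]{min-2,max-2}<last>\b; assumes minLength >= 2.
    public static string Build(char first, char last, int minLength, int maxLength)
    {
        return @"\b" + Regex.Escape(first.ToString())
             + "[A-Za-z]{" + (minLength - 2) + "," + (maxLength - 2) + "}"
             + Regex.Escape(last.ToString()) + @"\b";
    }

    // Returns every whole-word match of the generated pattern, in order.
    public static List<string> Find(string text, char first, char last, int minLength, int maxLength)
    {
        return Regex.Matches(text, Build(first, last, minLength, maxLength))
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Build('w', 's', 5, 7) gives the pattern shown above, and running Find over the question's word list with those arguments returns washes, watches, wilts and wastes.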

Count the number of characters between opening and closing '<' and '>' for an entire file

I would like to count the number of characters between opening and closing '<' and '>' for an entire file (e.g. <tag>bla<tag> == 6). I could always write a quick algo to do it, but I am curious to know if there is another way. Maybe Regular expression?
Thanks
It's probably what you already had in mind, but:
string s = System.IO.File.ReadAllText("myfile.txt");
bool inbrackets = false;
int count = 0;
foreach (char ch in s)
{
if (ch == '<')
inbrackets = true;
else if (ch == '>')
inbrackets = false;
else if (inbrackets)
++count;
}
System.Console.WriteLine("count = " + count);
Update: If you want to handle embedded brackets, just use an int counter instead of a bool. Sorry, that's obvious, but just an afterthought.
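The int-counter variant mentioned in the update could look roughly like this. BracketCounter and CountInside are hypothetical names, and ignoring an unmatched '>' is an assumption about how malformed input should be treated.

```csharp
public static class BracketCounter
{
    // Counts characters at nesting depth >= 1; the brackets themselves are not counted.
    public static int CountInside(string s)
    {
        int depth = 0;
        int count = 0;
        foreach (char ch in s)
        {
            if (ch == '<')
            {
                depth++;
            }
            else if (ch == '>')
            {
                if (depth > 0) depth--; // ignore an unmatched '>'
            }
            else if (depth > 0)
            {
                count++;
            }
        }
        return count;
    }
}
```

For the question's example "<tag>bla<tag>" this returns 6, and for a nested input such as "<a<b>c>" it returns 3.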
You could with regex do it like this:
var brackets = new char[] {'<', '>'};
int counter = 0;
foreach (var match in System.Text.RegularExpressions.Regex.Matches(data, @"</?[^<>]+>"))
counter += match.ToString().Trim(brackets).TrimStart('/').Length;
This also counts ending tags correctly if you happen to have those as well.
Assuming there's no nesting tags, and you have well formed input
var charcount = File.ReadAllText(@"C:\foo.txt").Split('<')
.Select(x => x.IndexOf('>')).Where(x => x > 0).Sum();
If you have nesting or need error checking, obviously you're going to need to write something more thorough.
int sum = new Regex("<([^<>]+?)>").Matches("<tag>bla<tag>")
.Cast<Match>()
.Sum(m => m.Value.Length - 2);
// sum == 6
