Determine if string has all unique characters - c#

I'm working through an algorithm problem set which poses the following question:
"Determine if a string has all unique characters. Assume you can only use arrays".
I have a working solution, but I would like to see if there is anything better optimized in terms of time complexity. I do not want to use LINQ. Appreciate any help you can provide!
static void Main(string[] args)
{
FindDupes("crocodile");
}
static string FindDupes(string text)
{
if (text.Length == 0 || text.Length > 256)
{
Console.WriteLine("String is either empty or too long");
}
char[] str = new char[text.Length];
char[] output = new char[text.Length];
int strLength = 0;
int outputLength = 0;
foreach (char value in text)
{
bool dupe = false;
for (int i = 0; i < strLength; i++)
{
if (value == str[i])
{
dupe = true;
break;
}
}
if (!dupe)
{
str[strLength] = value;
strLength++;
output[outputLength] = value;
outputLength++;
}
}
return new string(output, 0, outputLength);
}

If time complexity is all you care about you could map the characters to int values, then have an array of bool values which remember if you've seen a particular character value previously.
Something like ... [not tested]
bool[] array = new bool[256]; // or larger for Unicode
foreach (char value in text)
if (array[(int)value])
return false;
else
array[(int)value] = true;
return true;

try this,
string RemoveDuplicateChars(string key)
{
string table = string.Empty;
string result = string.Empty;
foreach (char value in key)
{
if (table.IndexOf(value) == -1)
{
table += value;
result += value;
}
}
return result;
}
usage
Console.WriteLine(RemoveDuplicateChars("hello"));
Console.WriteLine(RemoveDuplicateChars("helo"));
Console.WriteLine(RemoveDuplicateChars("Crocodile"));
output
helo
helo
Crocdile

public boolean ifUnique(String toCheck){
String str="";
for(int i=0;i<toCheck.length();i++)
{
if(str.contains(""+toCheck.charAt(i)))
return false;
str+=toCheck.charAt(i);
}
return true;
}
EDIT:
You may also consider to omit the boundary case where toCheck is an empty string.

The following code works:
static void Main(string[] args)
{
isUniqueChart("text");
Console.ReadKey();
}
static Boolean isUniqueChart(string text)
{
if (text.Length == 0 || text.Length > 256)
{
Console.WriteLine(" The text is empty or too larg");
return false;
}
Boolean[] char_set = new Boolean[256];
for (int i = 0; i < text.Length; i++)
{
int val = text[i];//already found this char in the string
if (char_set[val])
{
Console.WriteLine(" The text is not unique");
return false;
}
char_set[val] = true;
}
Console.WriteLine(" The text is unique");
return true;
}

If the string has only lower case letters (a-z) or only upper case letters (A-Z) you can use a very optimized O(1) solution.Also O(1) space.
c++ code :
bool checkUnique(string s){
if(s.size() >26)
return false;
int unique=0;
for (int i = 0; i < s.size(); ++i) {
int j= s[i]-'a';
if(unique & (1<<j)>0)
return false;
unique=unique|(1<<j);
}
return true;
}

Remove Duplicates in entire Unicode Range
Not all characters can be represented by a single C# char. If you need to take into account combining characters and extended unicode characters, you need to:
parse the characters using StringInfo
normalize the characters
find duplicates amongst the normalized strings
Code to remove duplicate characters:
We keep track of the entropy, storing the normalized characters (each character is a string, because many characters require more than 1 C# char). In case a character (normalized) is not yet stored in the entropy, we append the character (in specified form) to the output.
public static class StringExtension
{
public static string RemoveDuplicateChars(this string text)
{
var output = new StringBuilder();
var entropy = new HashSet<string>();
var iterator = StringInfo.GetTextElementEnumerator(text);
while (iterator.MoveNext())
{
var character = iterator.GetTextElement();
if (entropy.Add(character.Normalize()))
{
output.Append(character);
}
}
return output.ToString();
}
}
Unit Test:
Let's test a string that contains variations on the letter A, including the Angstrom sign Å. The Angstrom sign has unicode codepoint u212B, but can also be constructed as the letter A with the diacritic u030A. Both represent the same character.
// ÅÅAaA
var input = "\u212BA\u030AAaA";
// ÅAa
var output = input.RemoveDuplicateChars();
Further extensions could allow for a selector function that determines how to normalize characters. For instance the selector (x) => x.ToUpperInvariant().Normalize() would allow for case-insensitive duplicate removal.

public static bool CheckUnique(string str)
{
int accumulator = 0;
foreach (int asciiCode in str)
{
int shiftedBit = 1 << (asciiCode - ' ');
if ((accumulator & shiftedBit) > 0)
return false;
accumulator |= shiftedBit;
}
return true;
}

Related

Get Difference Between Two Strings in Terms of Remove and Insert Actions

So I have a text box and on the text changed event I have the old text and the new text, and want to get the difference between them. In this case, I want to be able to recreate the new text with the old text using one remove function and one insert function. That is possible because there are a few possibilities of the change that was in the text box:
Text was only removed (one character or more using selection) - ABCD -> AD
Text was only added (one character or more using paste) - ABCD -> ABXXCD
Text was removed and added (by selecting text and entering text in the same action) - ABCD -> AXD
So I want to have these functions:
Sequence GetRemovedCharacters(string oldText, string newText)
{
}
Sequence GetAddedCharacters(string oldText, string newText)
{
}
My Sequence class:
public class Sequence
{
private int start;
private int end;
public Sequence(int start, int end)
{
StartIndex = start; EndIndex = end;
}
public int StartIndex { get { return start; } set { start = value; Length = end - start + 1; } }
public int EndIndex { get { return end; } set { end = value; Length = end - start + 1; } }
public int Length { get; private set; }
public override string ToString()
{
return "(" + StartIndex + ", " + EndIndex + ")";
}
public static bool operator ==(Sequence a, Sequence b)
{
if(IsNull(a) && IsNull(b))
return true;
else if(IsNull(a) || IsNull(b))
return false;
else
return a.StartIndex == b.StartIndex && a.EndIndex == b.EndIndex;
}
public override bool Equals(object obj)
{
return base.Equals(obj);
}
public static bool operator !=(Sequence a, Sequence b)
{
if(IsNull(a) && IsNull(b))
return false;
else if(IsNull(a) || IsNull(b))
return true;
else
return a.StartIndex != b.StartIndex && a.EndIndex != b.EndIndex;
}
public override int GetHashCode()
{
return base.GetHashCode();
}
static bool IsNull(Sequence sequence)
{
try
{
return sequence.Equals(null);
}
catch(NullReferenceException)
{
return true;
}
}
}
Extra Explanation: I want to know which characters were removed and which characters were added to the text in order to get the new text so I can recreate this. Let's say I have ABCD -> AXD. 'B' and 'C' would be the characters that were removed and 'X' would be the character that was added. So the output from the GetRemovedCharacters function would be (1, 2) and the output from the GetAddedCharacters function would be (1, 1). The output from the GetRemovedCharacters function refers to indexes in the old text and the output from the GetAddedCharacters function refers to indexes in the old text after removing the removed characters.
EDIT: I've thought of a few directions:
This code I created* which returns the sequence that was affected - if characters were removed it returns the sequence of the characters that were removed in the old text; if characters were added it returns the sequence of the characters that were added in the new text. It does not return the right value (which I myself not sure what I want it to be) when removing and adding text.
Maybe the SelectionStart property in the text box could help - the position of the caret after the text was changed.
*
private static Sequence GetChangeSequence(string oldText, string newText)
{
if(newText.Length > oldText.Length)
{
for(int i = 0; i < newText.Length; i++)
if(i == oldText.Length || newText[i] != oldText[i])
return new Sequence(i, i + (newText.Length - oldText.Length) - 1);
return null;
}
else if(newText.Length < oldText.Length)
{
for(int i = 0; i < oldText.Length; i++)
if(i == newText.Length || oldText[i] != newText[i])
return new Sequence(i, i + (oldText.Length - newText.Length) - 1);
return null;
}
else
return null;
}
Thanks.
A simple string comparison wont do the job since you are asking for a algorithm which supports added and removed chars at the same time and is hence not easy to achive in a few lines of code. Id suggest to use a library instead of writing your own comparison algorithm.
Have a look at this project for example.
I quickly threw this together to give you an idea of what I did to solve your question. It doesn't use your classes but it does find an index so it's customizable for you.
There are also obvious limitations to this as it is just bare bones.
This method will spot out changes made to the original string by comparing it to the changed string
// Find the changes made to a string
string StringDiff (string originalString, string changedString)
{
string diffString = "";
// Iterate over the original string
for (int i = 0; i < originalString.Length; i++)
{
// Get the character to search with
char diffChar = originalString[i];
// If found char in the changed string
if (FindInString(diffChar, changedString, out int index))
{
// Remove from the changed string at the index as we don't want to match to this char again
changedString = changedString.Remove(index, 1);
}
// If not found then this is a difference
else
{
// Add to diff string
diffString += diffChar;
}
}
return diffString;
}
This method will return true at the first matching occurrence (an obvious limitation but this is more to give you an idea)
// Find char at first occurence in string
bool FindInString (char c, string search, out int index)
{
index = -1;
// Iterate over search string
for (int i = 0; i < search.Length; i++)
{
// If found then return true with index
if (c == search[i])
{
index = i;
return true;
}
}
return false;
}
This is a simple helper method to show you an example
void SplitStrings(string oldStr, string newStr)
{
Console.WriteLine($"Old : {oldStr}, New: {newStr}");
Console.WriteLine("Removed - " + StringDiff(oldStr, newStr));
Console.WriteLine("Added - " + StringDiff(newStr, oldStr));
}
I've done it.
static void Main(string[] args)
{
while(true)
{
Console.WriteLine("Enter the Old Text");
string oldText = Console.ReadLine();
Console.WriteLine("Enter the New Text");
string newText = Console.ReadLine();
Console.WriteLine("Enter the Caret Position");
int caretPos = int.Parse(Console.ReadLine());
Sequence removed = GetRemovedCharacters(oldText, newText, caretPos);
if(removed != null)
oldText = oldText.Remove(removed.StartIndex, removed.Length);
Sequence added = GetAddedCharacters(oldText, newText, caretPos);
if(added != null)
oldText = oldText.Insert(added.StartIndex, newText.Substring(added.StartIndex, added.Length));
Console.WriteLine("Worked: " + (oldText == newText).ToString());
Console.ReadKey();
Console.Clear();
}
}
static Sequence GetRemovedCharacters(string oldText, string newText, int caretPosition)
{
int startIndex = GetStartIndex(oldText, newText);
if(startIndex != -1)
{
Sequence sequence = new Sequence(startIndex, caretPosition + (oldText.Length - newText.Length) - 1);
if(SequenceValid(sequence))
return sequence;
}
return null;
}
static Sequence GetAddedCharacters(string oldText, string newText, int caretPosition)
{
int startIndex = GetStartIndex(oldText, newText);
if(startIndex != -1)
{
Sequence sequence = new Sequence(GetStartIndex(oldText, newText), caretPosition - 1);
if(SequenceValid(sequence))
return sequence;
}
return null;
}
static int GetStartIndex(string oldText, string newText)
{
for(int i = 0; i < Math.Max(oldText.Length, newText.Length); i++)
if(i >= oldText.Length || i >= newText.Length || oldText[i] != newText[i])
return i;
return -1;
}
static bool SequenceValid(Sequence sequence)
{
return sequence.StartIndex >= 0 && sequence.EndIndex >= 0 && sequence.EndIndex >= sequence.StartIndex;
}

Compare numeric values as string

I have a method which gets two string. These strings can contain numbers, ASCII chars or both at the same time.
The algorithm works like this:
Split both strings into char Arrays A and B.
Try to parse element Ai and Bi to an int
Compare element Ai with element Bi, in case of integers use direct comparison, in case of chars use ordinal string comparison.
Do work based on the result
Now, I'm wondering: Do I really need to parse the elements to int? I simply could compare each element in an ordinal string comparison and would get the same result, right?
What are the performance implications here? Is parsing and normal comparison faster than ordinal string comparison? Is it slower?
Is my assumption (using ordinal string comparison instead of parsing and comparing) correct?
Here is the method in question:
internal static int CompareComponentString(this string componentString, string other)
{
bool componentEmpty = string.IsNullOrWhiteSpace(componentString);
bool otherEmtpy = string.IsNullOrWhiteSpace(other);
if (componentEmpty && otherEmtpy)
{
return 0;
}
if (componentEmpty)
{
return -1;
}
if (otherEmtpy)
{
return 1;
}
string[] componentParts = componentString.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
string[] otherParts = other.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < Math.Min(componentParts.Length, otherParts.Length); i++)
{
string componentChar = componentParts[i];
string otherChar = otherParts[i];
int componentNumVal, otherNumVal;
bool componentIsNum = int.TryParse(componentChar, out componentNumVal);
bool otherIsNum = int.TryParse(otherChar, out otherNumVal);
if (componentIsNum && otherIsNum)
{
if (componentNumVal.CompareTo(otherNumVal) == 0)
{
continue;
}
return componentNumVal.CompareTo(otherNumVal);
}
else
{
if (componentIsNum)
{
return -1;
}
if (otherIsNum)
{
return 1;
}
int comp = string.Compare(componentChar, otherChar, StringComparison.OrdinalIgnoreCase);
if (comp != 0)
{
return comp;
}
}
}
return componentParts.Length.CompareTo(otherParts.Length);
}
This are strings that might be used. I might add only the part after the minus sign is used.
1.0.0-alpha
1.0.0-alpha.1
1.0.0-alpha.beta
1.0.0-beta.2
With this method you can create a compare string for each of your string. These strings are comparable by simple alphanumeric comparison.
Assumptions:
There is a minus in the string separating the common part and the indiv part
before the minus is always a substring of three integer values divided by a dot
These integer values are not higher than 999 (look at variable "MaxWidth1")
behind the minus is another substring consisting of several parts, also divided by a dot
The second substring's parts may be numeric or alphanumeric with a max. width of 7 (look at "MaxWidth2")
The second substring consists of max. 5 parts (MaxIndivParts)
Put this method wherever you want:
public string VersionNumberCompareString(string versionNumber, int MaxWidth1=3, int MaxWidth2=7,int MaxIndivParts=5){
string result = null;
int posMinus = versionNumber.IndexOf('-');
string part1 = versionNumber.Substring(0, posMinus);
string part2 = versionNumber.Substring(posMinus+1);
var integerValues=part1.Split('.');
result = integerValues[0].PadLeft(MaxWidth1, '0');
result += integerValues[1].PadLeft(MaxWidth1, '0');
result += integerValues[2].PadLeft(MaxWidth1, '0');
var alphaValues = part2.Split('.');
for (int i = 0; i < MaxIndivParts;i++ ) {
if (i <= alphaValues.GetUpperBound(0)) {
var s = alphaValues[i];
int casted;
if (int.TryParse(s, out casted)) //if int: treat as number
result += casted.ToString().PadLeft(MaxWidth2, '0');
else //treat as string
result += s.PadRight(MaxWidth2, ' ');
}
else
result += new string(' ', MaxWidth2);
}
return result; }
You call it like this:
var s1 = VersionNumberCompareString("1.3.0-alpha.1.12");
//"001003000alpha 00000010000012 "
var s2 = VersionNumberCompareString("0.11.4-beta");
//"000011004beta "
var s3 = VersionNumberCompareString("2.10.11-beta.2");
//"002010011beta 0000002 "
Be aware of the final " sign. All strings are of the same length!
Hope this helps...
that's .net comparison logic for ascii strings -
private unsafe static int CompareOrdinalIgnoreCaseHelper(String strA, String strB)
{
Contract.Requires(strA != null);
Contract.Requires(strB != null);
Contract.EndContractBlock();
int length = Math.Min(strA.Length, strB.Length);
fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
{
char* a = ap;
char* b = bp;
while (length != 0)
{
int charA = *a;
int charB = *b;
Contract.Assert((charA | charB) <= 0x7F, "strings have to be ASCII");
// uppercase both chars - notice that we need just one compare per char
if ((uint)(charA - 'a') <= (uint)('z' - 'a')) charA -= 0x20;
if ((uint)(charB - 'a') <= (uint)('z' - 'a')) charB -= 0x20;
//Return the (case-insensitive) difference between them.
if (charA != charB)
return charA - charB;
// Next char
a++; b++;
length--;
}
return strA.Length - strB.Length;
}
}
having said that, Unless you have a strict performance constaint, i would say if you get the same result from an already implemented & tested function, its better to reuse it and not to reinvent the wheel.
It saves so much time in implementation, unit testing, debugging & bug fixing time. & helps keep the software simple.

Convert Dash-Separated String to camelCase via C#

I have a large XML file that contain tag names that implement the dash-separated naming convention. How can I use C# to convert the tag names to the camel case naming convention?
The rules are:
1. Convert all characters to lower case
2. Capitalize the first character after each dash
3. Remove all dashes
Example
Before Conversion
<foo-bar>
<a-b-c></a-b-c>
</foo-bar>
After Conversion
<fooBar>
<aBC></aBC>
</fooBar>
Here's a code example that works, but it's slow to process - I'm thinking that there is a better way to accomplish my goal.
string ConvertDashToCamelCase(string input)
{
input = input.ToLower();
char[] ca = input.ToCharArray();
StringBuilder sb = new StringBuilder();
for(int i = 0; i < ca.Length; i++)
{
if(ca[i] == '-')
{
string t = ca[i + 1].ToString().toUpper();
sb.Append(t);
i++;
}
else
{
sb.Append(ca[i].ToString());
}
}
return sb.ToString();
}
The reason your original code was slow is because you're calling ToString all over the place unnecessarily. There's no need for that. There's also no need for the intermediate array of char. The following should be much faster, and faster than the version that uses String.Split, too.
string ConvertDashToCamelCase(string input)
{
StringBuilder sb = new StringBuilder();
bool caseFlag = false;
for (int i = 0; i < input.Length; ++i)
{
char c = input[i];
if (c == '-')
{
caseFlag = true;
}
else if (caseFlag)
{
sb.Append(char.ToUpper(c));
caseFlag = false;
}
else
{
sb.Append(char.ToLower(c));
}
}
return sb.ToString();
}
I'm not going to claim that the above is the fastest possible. In fact, there are several obvious optimizations that could save some time. But the above is clean and clear: easy to understand.
The key is the caseFlag, which you use to indicate that the next character copied should be set to upper case. Also note that I don't automatically convert the entire string to lower case. There's no reason to, since you'll be looking at every character anyway and can do the appropriate conversion at that time.
The idea here is that the code doesn't do any more work than it absolutely has to.
For completeness, here's also a regular expression one-liner (inspred by this JavaScript answer):
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input, "-.", m => m.Value.ToUpper().Substring(1));
It replaces all occurrences of -x with x converted to upper case.
Special cases:
If you want lower-case all other characters, replace input with input.ToLower() inside the expression:
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input.ToLower(), "-.", m => m.Value.ToUpper().Substring(1));
If you want to support multiple dashes between words (dash--case) and have all of the dashes removed (dashCase), replace - with -+ in the regular expression (to greedily match all sequences of dashes) and keep only the final character:
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input, "-+.", m => m.Value.ToUpper().Substring(m.Value.Length - 1));
If you want to support multiple dashes between words (dash--case) and remove only the final one (dash-Case), change the regular expression to match only a dash followed by a non-dash (rather than a dash followed by any character):
string ConvertDashToCamelCase(string input) =>
Regex.Replace(input, "-[^-]", m => m.Value.ToUpper().Substring(1));
string ConvertDashToCamelCase(string input)
{
string[] words = input.Split('-');
words = words.Select(element => wordToCamelCase(element));
return string.Join("", words);
}
string wordToCamelCase(string input)
{
return input.First().ToString().ToUpper() + input.Substring(1).ToLower();
}
Here is an updated version of #Jim Mischel's answer that will ignore the content - i.e. it will only camelCase tag names.
string ConvertDashToCamelCase(string input)
{
StringBuilder sb = new StringBuilder();
bool caseFlag = false;
bool tagFlag = false;
for(int i = 0; i < input.Length; i++)
{
char c = input[i];
if(tagFlag)
{
if (c == '-')
{
caseFlag = true;
}
else if (caseFlag)
{
sb.Append(char.ToUpper(c));
caseFlag = false;
}
else
{
sb.Append(char.ToLower(c));
}
}
else
{
sb.Append(c);
}
// Reset tag flag if necessary
if(c == '>' || c == '<')
{
tagFlag = (c == '<');
}
}
return sb.ToString();
}
using System;
using System.Text;
public class MyString
{
public static string ToCamelCase(string str)
{
char[] s = str.ToCharArray();
StringBuilder sb = new StringBuilder();
for(int i = 0; i < s.Length; i++)
{
if (s[i] == '-' || s[i] == '_')
sb.Append(Char.ToUpper(s[++i]));
else
sb.Append(s[i]);
}
return sb.ToString();
}
}

How to insert/remove hyphen to/from a plain string in c#?

I have a string like this;
string text = "6A7FEBFCCC51268FBFF";
And I have one method for which I want to insert the logic for appending the hyphen after 4 characters to 'text' variable. So, the output should be like this;
6A7F-EBFC-CC51-268F-BFF
Appending hyphen to above 'text' variable logic should be inside this method;
public void GetResultsWithHyphen
{
// append hyphen after 4 characters logic goes here
}
And I want also remove the hyphen from a given string such as 6A7F-EBFC-CC51-268F-BFF. So, removing hyphen from a string logic should be inside this method;
public void GetResultsWithOutHyphen
{
// Removing hyphen after 4 characters logic goes here
}
How can I do this in C# (for desktop app)?
What is the best way to do this?
Appreciate everyone's answer in advance.
GetResultsWithOutHyphen is easy (and should return a string instead of void
public string GetResultsWithOutHyphen(string input)
{
// Removing hyphen after 4 characters logic goes here
return input.Replace("-", "");
}
for GetResultsWithHyphen, there may be slicker ways to do it, but here's one way:
public string GetResultsWithHyphen(string input)
{
// append hyphen after 4 characters logic goes here
string output = "";
int start = 0;
while (start < input.Length)
{
output += input.Substring(start, Math.Min(4,input.Length - start)) + "-";
start += 4;
}
// remove the trailing dash
return output.Trim('-');
}
Use regex:
public String GetResultsWithHyphen(String inputString)
{
return Regex.Replace(inputString, #"(\w{4})(\w{4})(\w{4})(\w{4})(\w{3})",
#"$1-$2-$3-$4-$5");
}
and for removal:
public String GetResultsWithOutHyphen(String inputString)
{
return inputString.Replace("-", "");
}
Here's the shortest regex I could come up with. It will work on strings of any length. Note that the \B token will prevent it from matching at the end of a string, so you don't have to trim off an extra hyphen as with some answers above.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
string text = "6A7FEBFCCC51268FBFF";
for (int i = 0; i <= text.Length;i++ )
Console.WriteLine(hyphenate(text.Substring(0, i)));
}
static string hyphenate(string s)
{
var re = new Regex(#"(\w{4}\B)");
return re.Replace (s, "$1-");
}
static string dehyphenate (string s)
{
return s.Replace("-", "");
}
}
}
var hyphenText = new string(
text
.SelectMany((i, ch) => i%4 == 3 && i != text.Length-1 ? new[]{ch, '-'} : new[]{ch})
.ToArray()
)
something along the lines of:
public string GetResultsWithHyphen(string inText)
{
var counter = 0;
var outString = string.Empty;
while (counter < inText.Length)
{
if (counter % 4 == 0)
outString = string.Format("{0}-{1}", outString, inText.Substring(counter, 1));
else
outString += inText.Substring(counter, 1);
counter++;
}
return outString;
}
This is rough code and may not be perfectly, syntactically correct
public static string GetResultsWithHyphen(string str) {
return Regex.Replace(str, "(.{4})", "$1-");
//if you don't want trailing -
//return Regex.Replace(str, "(.{4})(?!$)", "$1-");
}
public static string GetResultsWithOutHyphen(string str) {
//if you just want to remove the hyphens:
//return input.Replace("-", "");
//if you REALLY want to remove hyphens only if they occur after 4 places:
return Regex.Replace(str, "(.{4})-", "$1");
}
For removing:
String textHyphenRemoved=text.Replace('-',''); should remove all of the hyphens
for adding
StringBuilder strBuilder = new StringBuilder();
int startPos = 0;
for (int i = 0; i < text.Length / 4; i++)
{
startPos = i * 4;
strBuilder.Append(text.Substring(startPos,4));
//if it isn't the end of the string add a hyphen
if(text.Length-startPos!=4)
strBuilder.Append("-");
}
//add what is left
strBuilder.Append(text.Substring(startPos, 4));
string textWithHyphens = strBuilder.ToString();
Do note that my adding code is untested.
GetResultsWithOutHyphen method
public string GetResultsWithOutHyphen(string input)
{
return input.Replace("-", "");
}
GetResultsWithOutHyphen method
You could pass a variable instead of four for flexibility.
public string GetResultsWithHyphen(string input)
{
string output = "";
int start = 0;
while (start < input.Length)
{
char bla = input[start];
output += bla;
start += 1;
if (start % 4 == 0)
{
output += "-";
}
}
return output;
}
This worked for me when I had a value for a social security number (123456789) and needed it to display as (123-45-6789) in a listbox.
ListBox1.Items.Add("SS Number : " & vbTab & Format(SSNArray(i), "###-##-####"))
In this case I had an array of Social Security Numbers. This line of code alters the formatting to put a hyphen in.
Callee
public static void Main()
{
var text = new Text("THISisJUSTanEXAMPLEtext");
var convertText = text.Convert();
Console.WriteLine(convertText);
}
Caller
public class Text
{
private string _text;
private int _jumpNo = 4;
public Text(string text)
{
_text = text;
}
public Text(string text, int jumpNo)
{
_text = text;
_jumpNo = jumpNo < 1 ? _jumpNo : jumpNo;
}
public string Convert()
{
if (string.IsNullOrEmpty(_text))
{
return string.Empty;
}
if (_text.Length < _jumpNo)
{
return _text;
}
var convertText = _text.Substring(0, _jumpNo);
int start = _jumpNo;
while (start < _text.Length)
{
convertText += "-" + _text.Substring(start, Math.Min(_jumpNo, _text.Length - start));
start += _jumpNo;
}
return convertText;
}
}

C# string comparison ignoring spaces, carriage return or line breaks

How can I compare 2 strings in C# ignoring the case, spaces and any line-breaks. I also need to check if both strings are null then they are marked as same.
Thanks!
You should normalize each string by removing the characters that you don't want to compare and then you can perform a String.Equals with a StringComparison that ignores case.
Something like this:
string s1 = "HeLLo wOrld!";
string s2 = "Hello\n WORLd!";
string normalized1 = Regex.Replace(s1, #"\s", "");
string normalized2 = Regex.Replace(s2, #"\s", "");
bool stringEquals = String.Equals(
normalized1,
normalized2,
StringComparison.OrdinalIgnoreCase);
Console.WriteLine(stringEquals);
Here Regex.Replace is used first to remove all whitespace characters. The special case of both strings being null is not treated here but you can easily handle that case before performing the string normalization.
This may also work.
String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols) == 0
Edit:
IgnoreSymbols: Indicates that the string comparison must ignore symbols, such as
white-space characters, punctuation, currency symbols, the percent
sign, mathematical symbols, the ampersand, and so on.
Remove all the characters you don't want and then use the ToLower() method to ignore case.
edit: While the above works, it's better to use StringComparison.OrdinalIgnoreCase. Just pass it as the second argument to the Equals method.
First replace all whitespace via regular expression from both string and then use the String.Compare method with parameter ignoreCase = true.
string a = System.Text.RegularExpressions.Regex.Replace("void foo", #"\s", "");
string b = System.Text.RegularExpressions.Regex.Replace("voidFoo", #"\s", "");
bool isTheSame = String.Compare(a, b, true) == 0;
If you need performance, the Regex solutions on this page run too slow for you. Maybe you have a large list of strings you want to sort. (A Regex solution is more readable however)
I have a class that looks at each individual char in both strings and compares them while ignoring case and whitespace. It doesn't allocate any new strings. It uses the char.IsWhiteSpace(ch) to determine whitespace, and char.ToLowerInvariant(ch) for case-insensitivity (if required). In my testing, my solution runs about 5x - 8x faster than a Regex-based solution. My class also implements IEqualityComparer's GetHashCode(obj) method using this code in another SO answer. This GetHashCode(obj) also ignores whitespace and optionally ignores case.
Here's my class:
private class StringCompIgnoreWhiteSpace : IEqualityComparer<string>
{
public bool Equals(string strx, string stry)
{
if (strx == null) //stry may contain only whitespace
return string.IsNullOrWhiteSpace(stry);
else if (stry == null) //strx may contain only whitespace
return string.IsNullOrWhiteSpace(strx);
int ix = 0, iy = 0;
for (; ix < strx.Length && iy < stry.Length; ix++, iy++)
{
char chx = strx[ix];
char chy = stry[iy];
//ignore whitespace in strx
while (char.IsWhiteSpace(chx) && ix < strx.Length)
{
ix++;
chx = strx[ix];
}
//ignore whitespace in stry
while (char.IsWhiteSpace(chy) && iy < stry.Length)
{
iy++;
chy = stry[iy];
}
if (ix == strx.Length && iy != stry.Length)
{ //end of strx, so check if the rest of stry is whitespace
for (int iiy = iy + 1; iiy < stry.Length; iiy++)
{
if (!char.IsWhiteSpace(stry[iiy]))
return false;
}
return true;
}
if (ix != strx.Length && iy == stry.Length)
{ //end of stry, so check if the rest of strx is whitespace
for (int iix = ix + 1; iix < strx.Length; iix++)
{
if (!char.IsWhiteSpace(strx[iix]))
return false;
}
return true;
}
//The current chars are not whitespace, so check that they're equal (case-insensitive)
//Remove the following two lines to make the comparison case-sensitive.
chx = char.ToLowerInvariant(chx);
chy = char.ToLowerInvariant(chy);
if (chx != chy)
return false;
}
//If strx has more chars than stry
for (; ix < strx.Length; ix++)
{
if (!char.IsWhiteSpace(strx[ix]))
return false;
}
//If stry has more chars than strx
for (; iy < stry.Length; iy++)
{
if (!char.IsWhiteSpace(stry[iy]))
return false;
}
return true;
}
public int GetHashCode(string obj)
{
if (obj == null)
return 0;
int hash = 17;
unchecked // Overflow is fine, just wrap
{
for (int i = 0; i < obj.Length; i++)
{
char ch = obj[i];
if(!char.IsWhiteSpace(ch))
//use this line for case-insensitivity
hash = hash * 23 + char.ToLowerInvariant(ch).GetHashCode();
//use this line for case-sensitivity
//hash = hash * 23 + ch.GetHashCode();
}
}
return hash;
}
}
private static void TestComp()
{
var comp = new StringCompIgnoreWhiteSpace();
Console.WriteLine(comp.Equals("abcd", "abcd")); //true
Console.WriteLine(comp.Equals("abCd", "Abcd")); //true
Console.WriteLine(comp.Equals("ab Cd", "Ab\n\r\tcd ")); //true
Console.WriteLine(comp.Equals(" ab Cd", " A b" + Environment.NewLine + "cd ")); //true
Console.WriteLine(comp.Equals(null, " \t\n\r ")); //true
Console.WriteLine(comp.Equals(" \t\n\r ", null)); //true
Console.WriteLine(comp.Equals("abcd", "abcd h")); //false
Console.WriteLine(comp.GetHashCode(" a b c d")); //-699568861
//This is -699568861 if you #define StringCompIgnoreWhiteSpace_CASE_INSENSITIVE
// Otherwise it's -1555613149
Console.WriteLine(comp.GetHashCode("A B c \t d"));
}
Here's my testing code (with a Regex example):
private static void SpeedTest()
{
const int loop = 100000;
string first = "a bc d";
string second = "ABC D";
var compChar = new StringCompIgnoreWhiteSpace();
Stopwatch sw1 = Stopwatch.StartNew();
for (int i = 0; i < loop; i++)
{
bool equals = compChar.Equals(first, second);
}
sw1.Stop();
Console.WriteLine(string.Format("char time = {0}", sw1.Elapsed)); //char time = 00:00:00.0361159
var compRegex = new StringCompIgnoreWhiteSpaceRegex();
Stopwatch sw2 = Stopwatch.StartNew();
for (int i = 0; i < loop; i++)
{
bool equals = compRegex.Equals(first, second);
}
sw2.Stop();
Console.WriteLine(string.Format("regex time = {0}", sw2.Elapsed)); //regex time = 00:00:00.2773072
}
private class StringCompIgnoreWhiteSpaceRegex : IEqualityComparer<string>
{
public bool Equals(string strx, string stry)
{
if (strx == null)
return string.IsNullOrWhiteSpace(stry);
else if (stry == null)
return string.IsNullOrWhiteSpace(strx);
string a = System.Text.RegularExpressions.Regex.Replace(strx, #"\s", "");
string b = System.Text.RegularExpressions.Regex.Replace(stry, #"\s", "");
return String.Compare(a, b, true) == 0;
}
public int GetHashCode(string obj)
{
if (obj == null)
return 0;
string a = System.Text.RegularExpressions.Regex.Replace(obj, #"\s", "");
return a.GetHashCode();
}
}
I would probably start by removing the characters you don't want to compare from the string before comparing. If performance is a concern, you might look at storing a version of each string with the characters already removed.
Alternatively, you could write a compare routine that would skip over the characters you want to ignore. But that just seems like more work to me.
You can also use the following custom function
public static string ExceptChars(this string str, IEnumerable<char> toExclude)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.Length; i++)
{
char c = str[i];
if (!toExclude.Contains(c))
sb.Append(c);
}
return sb.ToString();
}
public static bool SpaceCaseInsenstiveComparision(this string stringa, string stringb)
{
return (stringa==null&&stringb==null)||stringa.ToLower().ExceptChars(new[] { ' ', '\t', '\n', '\r' }).Equals(stringb.ToLower().ExceptChars(new[] { ' ', '\t', '\n', '\r' }));
}
And then use it following way
"Te st".SpaceCaseInsenstiveComparision("Te st");
Another option is the LINQ SequenceEquals method which according to my tests is more than twice as fast as the Regex approach used in other answers and very easy to read and maintain.
public static bool Equals_Linq(string s1, string s2)
{
return Enumerable.SequenceEqual(
s1.Where(c => !char.IsWhiteSpace(c)).Select(char.ToUpperInvariant),
s2.Where(c => !char.IsWhiteSpace(c)).Select(char.ToUpperInvariant));
}
public static bool Equals_Regex(string s1, string s2)
{
return string.Equals(
Regex.Replace(s1, #"\s", ""),
Regex.Replace(s2, #"\s", ""),
StringComparison.OrdinalIgnoreCase);
}
Here the simple performance test code I used:
var s1 = "HeLLo wOrld!";
var s2 = "Hello\n WORLd!";
var watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
Equals_Linq(s1, s2);
}
Console.WriteLine(watch.Elapsed); // ~1.7 seconds
watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
Equals_Regex(s1, s2);
}
Console.WriteLine(watch.Elapsed); // ~4.6 seconds
An approach not optimized for performance, but for completeness.
normalizes null
normalizes unicode, combining characters, diacritics
normalizes new lines
normalizes white space
normalizes casing
code snippet:
public static class StringHelper
{
public static bool AreEquivalent(string source, string target)
{
if (source == null) return target == null;
if (target == null) return false;
var normForm1 = Normalize(source);
var normForm2 = Normalize(target);
return string.Equals(normForm1, normForm2);
}
private static string Normalize(string value)
{
Debug.Assert(value != null);
// normalize unicode, combining characters, diacritics
value = value.Normalize(NormalizationForm.FormC);
// normalize new lines to white space
value = value.Replace("\r\n", "\n").Replace("\r", "\n");
// normalize white space
value = Regex.Replace(value, #"\s", string.Empty);
// normalize casing
return value.ToLowerInvariant();
}
}
I would Trim the string using Trim() to remove all the
whitespace.
Use StringComparison.OrdinalIgnoreCase to ignore case sensitivity ex. stringA.Equals(stringB, StringComparison.OrdinalIgnoreCase)

Categories

Resources