Get Difference Between Two Strings in Terms of Remove and Insert Actions - c#

So I have a text box and on the text changed event I have the old text and the new text, and want to get the difference between them. In this case, I want to be able to recreate the new text with the old text using one remove function and one insert function. That is possible because there are a few possibilities of the change that was in the text box:
Text was only removed (one character or more using selection) - ABCD -> AD
Text was only added (one character or more using paste) - ABCD -> ABXXCD
Text was removed and added (by selecting text and entering text in the same action) - ABCD -> AXD
So I want to have these functions:
Sequence GetRemovedCharacters(string oldText, string newText)
{
}
Sequence GetAddedCharacters(string oldText, string newText)
{
}
My Sequence class:
public class Sequence
{
private int start;
private int end;
public Sequence(int start, int end)
{
StartIndex = start; EndIndex = end;
}
public int StartIndex { get { return start; } set { start = value; Length = end - start + 1; } }
public int EndIndex { get { return end; } set { end = value; Length = end - start + 1; } }
public int Length { get; private set; }
public override string ToString()
{
return "(" + StartIndex + ", " + EndIndex + ")";
}
public static bool operator ==(Sequence a, Sequence b)
{
if(IsNull(a) && IsNull(b))
return true;
else if(IsNull(a) || IsNull(b))
return false;
else
return a.StartIndex == b.StartIndex && a.EndIndex == b.EndIndex;
}
public override bool Equals(object obj)
{
return base.Equals(obj);
}
public static bool operator !=(Sequence a, Sequence b)
{
if(IsNull(a) && IsNull(b))
return false;
else if(IsNull(a) || IsNull(b))
return true;
else
return a.StartIndex != b.StartIndex && a.EndIndex != b.EndIndex;
}
public override int GetHashCode()
{
return base.GetHashCode();
}
static bool IsNull(Sequence sequence)
{
try
{
return sequence.Equals(null);
}
catch(NullReferenceException)
{
return true;
}
}
}
Extra Explanation: I want to know which characters were removed and which characters were added to the text in order to get the new text so I can recreate this. Let's say I have ABCD -> AXD. 'B' and 'C' would be the characters that were removed and 'X' would be the character that was added. So the output from the GetRemovedCharacters function would be (1, 2) and the output from the GetAddedCharacters function would be (1, 1). The output from the GetRemovedCharacters function refers to indexes in the old text and the output from the GetAddedCharacters function refers to indexes in the old text after removing the removed characters.
EDIT: I've thought of a few directions:
This code I created* which returns the sequence that was affected - if characters were removed it returns the sequence of the characters that were removed in the old text; if characters were added it returns the sequence of the characters that were added in the new text. It does not return the right value (which I myself not sure what I want it to be) when removing and adding text.
Maybe the SelectionStart property in the text box could help - the position of the caret after the text was changed.
*
private static Sequence GetChangeSequence(string oldText, string newText)
{
if(newText.Length > oldText.Length)
{
for(int i = 0; i < newText.Length; i++)
if(i == oldText.Length || newText[i] != oldText[i])
return new Sequence(i, i + (newText.Length - oldText.Length) - 1);
return null;
}
else if(newText.Length < oldText.Length)
{
for(int i = 0; i < oldText.Length; i++)
if(i == newText.Length || oldText[i] != newText[i])
return new Sequence(i, i + (oldText.Length - newText.Length) - 1);
return null;
}
else
return null;
}
Thanks.

A simple string comparison wont do the job since you are asking for a algorithm which supports added and removed chars at the same time and is hence not easy to achive in a few lines of code. Id suggest to use a library instead of writing your own comparison algorithm.
Have a look at this project for example.

I quickly threw this together to give you an idea of what I did to solve your question. It doesn't use your classes but it does find an index so it's customizable for you.
There are also obvious limitations to this as it is just bare bones.
This method will spot out changes made to the original string by comparing it to the changed string
// Find the changes made to a string
string StringDiff (string originalString, string changedString)
{
string diffString = "";
// Iterate over the original string
for (int i = 0; i < originalString.Length; i++)
{
// Get the character to search with
char diffChar = originalString[i];
// If found char in the changed string
if (FindInString(diffChar, changedString, out int index))
{
// Remove from the changed string at the index as we don't want to match to this char again
changedString = changedString.Remove(index, 1);
}
// If not found then this is a difference
else
{
// Add to diff string
diffString += diffChar;
}
}
return diffString;
}
This method will return true at the first matching occurrence (an obvious limitation but this is more to give you an idea)
// Find char at first occurence in string
bool FindInString (char c, string search, out int index)
{
index = -1;
// Iterate over search string
for (int i = 0; i < search.Length; i++)
{
// If found then return true with index
if (c == search[i])
{
index = i;
return true;
}
}
return false;
}
This is a simple helper method to show you an example
void SplitStrings(string oldStr, string newStr)
{
Console.WriteLine($"Old : {oldStr}, New: {newStr}");
Console.WriteLine("Removed - " + StringDiff(oldStr, newStr));
Console.WriteLine("Added - " + StringDiff(newStr, oldStr));
}

I've done it.
static void Main(string[] args)
{
while(true)
{
Console.WriteLine("Enter the Old Text");
string oldText = Console.ReadLine();
Console.WriteLine("Enter the New Text");
string newText = Console.ReadLine();
Console.WriteLine("Enter the Caret Position");
int caretPos = int.Parse(Console.ReadLine());
Sequence removed = GetRemovedCharacters(oldText, newText, caretPos);
if(removed != null)
oldText = oldText.Remove(removed.StartIndex, removed.Length);
Sequence added = GetAddedCharacters(oldText, newText, caretPos);
if(added != null)
oldText = oldText.Insert(added.StartIndex, newText.Substring(added.StartIndex, added.Length));
Console.WriteLine("Worked: " + (oldText == newText).ToString());
Console.ReadKey();
Console.Clear();
}
}
static Sequence GetRemovedCharacters(string oldText, string newText, int caretPosition)
{
int startIndex = GetStartIndex(oldText, newText);
if(startIndex != -1)
{
Sequence sequence = new Sequence(startIndex, caretPosition + (oldText.Length - newText.Length) - 1);
if(SequenceValid(sequence))
return sequence;
}
return null;
}
static Sequence GetAddedCharacters(string oldText, string newText, int caretPosition)
{
int startIndex = GetStartIndex(oldText, newText);
if(startIndex != -1)
{
Sequence sequence = new Sequence(GetStartIndex(oldText, newText), caretPosition - 1);
if(SequenceValid(sequence))
return sequence;
}
return null;
}
static int GetStartIndex(string oldText, string newText)
{
for(int i = 0; i < Math.Max(oldText.Length, newText.Length); i++)
if(i >= oldText.Length || i >= newText.Length || oldText[i] != newText[i])
return i;
return -1;
}
static bool SequenceValid(Sequence sequence)
{
return sequence.StartIndex >= 0 && sequence.EndIndex >= 0 && sequence.EndIndex >= sequence.StartIndex;
}

Related

C# - How do I determine if two strings have only one single letter in common using recursion

I've been trying for several hours to find a way to determine if two strings have one single letter in common (exclusively one) using recursion in C#...
For instance if the word1 was "hello" and the word2 was "bye" it should return true because there is only one "e" in common. However, if the word1 was "hello" and the word2 was "yellow" or "banana" it should return false because there is more than one letter in common between "hello" and "yellow" and none in "banana"
This is what i've done so far, but i do not understand why it doesn't return the expected result:
private static bool didHaveOneCaracterInCommon(string word1, string word2, int index)
{
int indexChar = 0;
if(index + 1 < word1.Length)
indexChar = word2.IndexOf(word1[index]);
if (indexCar != -1) //There is at least one char in common
{
//Verify if there is another one character in common
if ( (index + 1 < word1.Length && didHaveOneCaracterInCommon(word1,word2.Remove(indexChar, 1), index + 1))
return false;
return true;
}
if (index + 1 == word1.Length)
return false;
return didHaveOneCaracterInCommon(word1, word2, index + 1);
}
Thank you in advance !
you can approach it like this
public static bool ExclusiveCharInCommon(string l, string r)
{
int CharactersInCommon(string f, string s)
{
if (f.Length == 0) return 0;
return ((s.IndexOf(f[0]) != -1) ? 1 : 0) + CharactersInCommon(f.Substring(1), s);
}
return CharactersInCommon(l, r) == 1;
}
I would recommend approaching it slightly differently with a clearer base case
private static bool charInCommon(string word1, string word2, int index)
{
int indexChar = 0;
indexChar = word2.IndexOf(word1[index]);
if (indexChar != -1)
{
return true;
}
return false;
}
private static bool onlyOneCaracterInCommon(string word1, string word2, int index = 0, bool commonfound = false)
{
if (index >= word1.Length) { return commonfound; }
if (commonfound) //if you find another return false
{
if (charInCommon(word1, word2, index))
{ return false; }
}
else
{
if (charInCommon(word1, word2, index))
{ commonfound = true; }
return onlyOneCaracterInCommon(word1, word2, index + 1, commonfound);
}
return onlyOneCaracterInCommon(word1, word2, index + 1, commonfound);
}
Edit: Changed from pseudocode to real code
There are two basecases here:
Edit : changed signature to
private static bool onlyOneCaracterInCommon(string word1, string word2, int index = 0, bool commonfound = false) so you can call it with just word1 and word2.
1) Reach the end of string
2) Reach the second character that both strings share in common.

Best way to split string into lines with maximum length, without breaking words

I want to break a string up into lines of a specified maximum length, without splitting any words, if possible (if there is a word that exceeds the maximum line length, then it will have to be split).
As always, I am acutely aware that strings are immutable and that one should preferably use the StringBuilder class. I have seen examples where the string is split into words and the lines are then built up using the StringBuilder class, but the code below seems "neater" to me.
I mentioned "best" in the description and not "most efficient" as I am also interested in the "eloquence" of the code. The strings will never be huge, generally splitting into 2 or three lines, and it won't be happening for thousands of lines.
Is the following code really bad?
private static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
stringToSplit = stringToSplit.Trim();
var lines = new List<string>();
while (stringToSplit.Length > 0)
{
if (stringToSplit.Length <= maximumLineLength)
{
lines.Add(stringToSplit);
break;
}
var indexOfLastSpaceInLine = stringToSplit.Substring(0, maximumLineLength).LastIndexOf(' ');
lines.Add(stringToSplit.Substring(0, indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine : maximumLineLength).Trim());
stringToSplit = stringToSplit.Substring(indexOfLastSpaceInLine >= 0 ? indexOfLastSpaceInLine + 1 : maximumLineLength);
}
return lines.ToArray();
}
Even when this post is 3 years old I wanted to give a better solution using Regex to accomplish the same:
If you want the string to be splitted and then use the text to be displayed you can use this:
public string SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Replace(stringToSplit, #"(.{1," + maximumLineLength +#"})(?:\s|$)", "$1\n");
}
If on the other hand you need a collection you can use this:
public MatchCollection SplitToLines(string stringToSplit, int maximumLineLength)
{
return Regex.Matches(stringToSplit, #"(.{1," + maximumLineLength +#"})(?:\s|$)");
}
NOTES
Remember to import regex (using System.Text.RegularExpressions;)
You can use string interpolation on the match:
$#"(.{{1,{maximumLineLength}}})(?:\s|$)"
The MatchCollection works almost like an Array
Matching example with explanation here
How about this as a solution:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ').Concat(new [] { "" });
return
words
.Skip(1)
.Aggregate(
words.Take(1).ToList(),
(a, w) =>
{
var last = a.Last();
while (last.Length > maximumLineLength)
{
a[a.Count() - 1] = last.Substring(0, maximumLineLength);
last = last.Substring(maximumLineLength);
a.Add(last);
}
var test = last + " " + w;
if (test.Length > maximumLineLength)
{
a.Add(w);
}
else
{
a[a.Count() - 1] = test;
}
return a;
});
}
I reworked this as prefer this:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
var words = stringToSplit.Split(' ');
var line = words.First();
foreach (var word in words.Skip(1))
{
var test = $"{line} {word}";
if (test.Length > maximumLineLength)
{
yield return line;
line = word;
}
else
{
line = test;
}
}
yield return line;
}
I don't think your solution is too bad. I do, however, think you should break up your ternary into an if else because you are testing the same condition twice. Your code might also have a bug. Based on your description, it seems you want lines <= maxLineLength, but your code counts the space after the last word and uses it in the <= comparison resulting in effectively < behavior for the trimmed string.
Here is my solution.
private static IEnumerable<string> SplitToLines(string stringToSplit, int maxLineLength)
{
string[] words = stringToSplit.Split(' ');
StringBuilder line = new StringBuilder();
foreach (string word in words)
{
if (word.Length + line.Length <= maxLineLength)
{
line.Append(word + " ");
}
else
{
if (line.Length > 0)
{
yield return line.ToString().Trim();
line.Clear();
}
string overflow = word;
while (overflow.Length > maxLineLength)
{
yield return overflow.Substring(0, maxLineLength);
overflow = overflow.Substring(maxLineLength);
}
line.Append(overflow + " ");
}
}
yield return line.ToString().Trim();
}
It is a bit longer than your solution, but it should be more straightforward. It also uses a StringBuilder so it is much faster for large strings. I performed a benchmarking test for 20,000 words ranging from 1 to 11 characters each split into lines of 10 character width. My method completed in 14ms compared to 1373ms for your method.
Try this (untested)
private static IEnumerable<string> SplitToLines(string value, int maximumLineLength)
{
var words = value.Split(' ');
var line = new StringBuilder();
foreach (var word in words)
{
if ((line.Length + word.Length) >= maximumLineLength)
{
yield return line.ToString();
line = new StringBuilder();
}
line.AppendFormat("{0}{1}", (line.Length>0) ? " " : "", word);
}
yield return line.ToString();
}
~6x faster than the accepted answer
More than 1.5x faster than the Regex version in Release Mode (dependent on line length)
Optionally keep the space at the end of the line or not (the regex version always keeps it)
static IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength, bool removeSpace = true)
{
int start = 0;
int end = 0;
for (int i = 0; i < stringToSplit.Length; i++)
{
char c = stringToSplit[i];
if (c == ' ' || c == '\n')
{
if (i - start > maximumLineLength)
{
string substring = stringToSplit.Substring(start, end - start); ;
start = removeSpace ? end + 1 : end; // + 1 to remove the space on the next line
yield return substring;
}
else
end = i;
}
}
yield return stringToSplit.Substring(start); // remember last line
}
Here is the example code used to test speeds (again, run on your own machine and test in Release mode to get accurate timings)
https://dotnetfiddle.net/h5I1GC
Timings on my machine in release mode .Net 4.8
Accepted Answer: 667ms
Regex: 368ms
My Version: 117ms
My requirement was to have a line break at the last space before the 30 char limit.
So here is how i did it. Hope this helps anyone looking.
private string LineBreakLongString(string input)
{
var outputString = string.Empty;
var found = false;
int pos = 0;
int prev = 0;
while (!found)
{
var p = input.IndexOf(' ', pos);
{
if (pos <= 30)
{
pos++;
if (p < 30) { prev = p; }
}
else
{
found = true;
}
}
outputString = input.Substring(0, prev) + System.Environment.NewLine + input.Substring(prev, input.Length - prev).Trim();
}
return outputString;
}
An approach using recursive method and ReadOnlySpan (Tested)
public static void SplitToLines(ReadOnlySpan<char> stringToSplit, int index, ref List<string> values)
{
if (stringToSplit.IsEmpty || index < 1) return;
var nextIndex = stringToSplit.IndexOf(' ');
var slice = stringToSplit.Slice(0, nextIndex < 0 ? stringToSplit.Length : nextIndex);
if (slice.Length <= index)
{
values.Add(slice.ToString());
nextIndex++;
}
else
{
values.Add(slice.Slice(0, index).ToString());
nextIndex = index;
}
if (stringToSplit.Length <= index) return;
SplitToLines(stringToSplit.Slice(nextIndex), index, ref values);
}

Determine if string has all unique characters

I'm working through an algorithm problem set which poses the following question:
"Determine if a string has all unique characters. Assume you can only use arrays".
I have a working solution, but I would like to see if there is anything better optimized in terms of time complexity. I do not want to use LINQ. Appreciate any help you can provide!
static void Main(string[] args)
{
FindDupes("crocodile");
}
static string FindDupes(string text)
{
if (text.Length == 0 || text.Length > 256)
{
Console.WriteLine("String is either empty or too long");
}
char[] str = new char[text.Length];
char[] output = new char[text.Length];
int strLength = 0;
int outputLength = 0;
foreach (char value in text)
{
bool dupe = false;
for (int i = 0; i < strLength; i++)
{
if (value == str[i])
{
dupe = true;
break;
}
}
if (!dupe)
{
str[strLength] = value;
strLength++;
output[outputLength] = value;
outputLength++;
}
}
return new string(output, 0, outputLength);
}
If time complexity is all you care about you could map the characters to int values, then have an array of bool values which remember if you've seen a particular character value previously.
Something like ... [not tested]
bool[] array = new bool[256]; // or larger for Unicode
foreach (char value in text)
if (array[(int)value])
return false;
else
array[(int)value] = true;
return true;
try this,
string RemoveDuplicateChars(string key)
{
string table = string.Empty;
string result = string.Empty;
foreach (char value in key)
{
if (table.IndexOf(value) == -1)
{
table += value;
result += value;
}
}
return result;
}
usage
Console.WriteLine(RemoveDuplicateChars("hello"));
Console.WriteLine(RemoveDuplicateChars("helo"));
Console.WriteLine(RemoveDuplicateChars("Crocodile"));
output
helo
helo
Crocdile
public boolean ifUnique(String toCheck){
String str="";
for(int i=0;i<toCheck.length();i++)
{
if(str.contains(""+toCheck.charAt(i)))
return false;
str+=toCheck.charAt(i);
}
return true;
}
EDIT:
You may also consider to omit the boundary case where toCheck is an empty string.
The following code works:
static void Main(string[] args)
{
isUniqueChart("text");
Console.ReadKey();
}
static Boolean isUniqueChart(string text)
{
if (text.Length == 0 || text.Length > 256)
{
Console.WriteLine(" The text is empty or too larg");
return false;
}
Boolean[] char_set = new Boolean[256];
for (int i = 0; i < text.Length; i++)
{
int val = text[i];//already found this char in the string
if (char_set[val])
{
Console.WriteLine(" The text is not unique");
return false;
}
char_set[val] = true;
}
Console.WriteLine(" The text is unique");
return true;
}
If the string has only lower case letters (a-z) or only upper case letters (A-Z) you can use a very optimized O(1) solution.Also O(1) space.
c++ code :
bool checkUnique(string s){
if(s.size() >26)
return false;
int unique=0;
for (int i = 0; i < s.size(); ++i) {
int j= s[i]-'a';
if(unique & (1<<j)>0)
return false;
unique=unique|(1<<j);
}
return true;
}
Remove Duplicates in entire Unicode Range
Not all characters can be represented by a single C# char. If you need to take into account combining characters and extended unicode characters, you need to:
parse the characters using StringInfo
normalize the characters
find duplicates amongst the normalized strings
Code to remove duplicate characters:
We keep track of the entropy, storing the normalized characters (each character is a string, because many characters require more than 1 C# char). In case a character (normalized) is not yet stored in the entropy, we append the character (in specified form) to the output.
public static class StringExtension
{
public static string RemoveDuplicateChars(this string text)
{
var output = new StringBuilder();
var entropy = new HashSet<string>();
var iterator = StringInfo.GetTextElementEnumerator(text);
while (iterator.MoveNext())
{
var character = iterator.GetTextElement();
if (entropy.Add(character.Normalize()))
{
output.Append(character);
}
}
return output.ToString();
}
}
Unit Test:
Let's test a string that contains variations on the letter A, including the Angstrom sign Å. The Angstrom sign has unicode codepoint u212B, but can also be constructed as the letter A with the diacritic u030A. Both represent the same character.
// ÅÅAaA
var input = "\u212BA\u030AAaA";
// ÅAa
var output = input.RemoveDuplicateChars();
Further extensions could allow for a selector function that determines how to normalize characters. For instance the selector (x) => x.ToUpperInvariant().Normalize() would allow for case-insensitive duplicate removal.
public static bool CheckUnique(string str)
{
int accumulator = 0;
foreach (int asciiCode in str)
{
int shiftedBit = 1 << (asciiCode - ' ');
if ((accumulator & shiftedBit) > 0)
return false;
accumulator |= shiftedBit;
}
return true;
}

How to insert/remove hyphen to/from a plain string in c#?

I have a string like this;
string text = "6A7FEBFCCC51268FBFF";
And I have one method for which I want to insert the logic for appending the hyphen after 4 characters to 'text' variable. So, the output should be like this;
6A7F-EBFC-CC51-268F-BFF
Appending hyphen to above 'text' variable logic should be inside this method;
public void GetResultsWithHyphen
{
// append hyphen after 4 characters logic goes here
}
And I want also remove the hyphen from a given string such as 6A7F-EBFC-CC51-268F-BFF. So, removing hyphen from a string logic should be inside this method;
public void GetResultsWithOutHyphen
{
// Removing hyphen after 4 characters logic goes here
}
How can I do this in C# (for desktop app)?
What is the best way to do this?
Appreciate everyone's answer in advance.
GetResultsWithOutHyphen is easy (and should return a string instead of void
public string GetResultsWithOutHyphen(string input)
{
// Removing hyphen after 4 characters logic goes here
return input.Replace("-", "");
}
for GetResultsWithHyphen, there may be slicker ways to do it, but here's one way:
public string GetResultsWithHyphen(string input)
{
// append hyphen after 4 characters logic goes here
string output = "";
int start = 0;
while (start < input.Length)
{
output += input.Substring(start, Math.Min(4,input.Length - start)) + "-";
start += 4;
}
// remove the trailing dash
return output.Trim('-');
}
Use regex:
public String GetResultsWithHyphen(String inputString)
{
return Regex.Replace(inputString, #"(\w{4})(\w{4})(\w{4})(\w{4})(\w{3})",
#"$1-$2-$3-$4-$5");
}
and for removal:
public String GetResultsWithOutHyphen(String inputString)
{
return inputString.Replace("-", "");
}
Here's the shortest regex I could come up with. It will work on strings of any length. Note that the \B token will prevent it from matching at the end of a string, so you don't have to trim off an extra hyphen as with some answers above.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
string text = "6A7FEBFCCC51268FBFF";
for (int i = 0; i <= text.Length;i++ )
Console.WriteLine(hyphenate(text.Substring(0, i)));
}
static string hyphenate(string s)
{
var re = new Regex(#"(\w{4}\B)");
return re.Replace (s, "$1-");
}
static string dehyphenate (string s)
{
return s.Replace("-", "");
}
}
}
var hyphenText = new string(
text
.SelectMany((i, ch) => i%4 == 3 && i != text.Length-1 ? new[]{ch, '-'} : new[]{ch})
.ToArray()
)
something along the lines of:
public string GetResultsWithHyphen(string inText)
{
var counter = 0;
var outString = string.Empty;
while (counter < inText.Length)
{
if (counter % 4 == 0)
outString = string.Format("{0}-{1}", outString, inText.Substring(counter, 1));
else
outString += inText.Substring(counter, 1);
counter++;
}
return outString;
}
This is rough code and may not be perfectly, syntactically correct
public static string GetResultsWithHyphen(string str) {
return Regex.Replace(str, "(.{4})", "$1-");
//if you don't want trailing -
//return Regex.Replace(str, "(.{4})(?!$)", "$1-");
}
public static string GetResultsWithOutHyphen(string str) {
//if you just want to remove the hyphens:
//return input.Replace("-", "");
//if you REALLY want to remove hyphens only if they occur after 4 places:
return Regex.Replace(str, "(.{4})-", "$1");
}
For removing:
String textHyphenRemoved=text.Replace('-',''); should remove all of the hyphens
for adding
StringBuilder strBuilder = new StringBuilder();
int startPos = 0;
for (int i = 0; i < text.Length / 4; i++)
{
startPos = i * 4;
strBuilder.Append(text.Substring(startPos,4));
//if it isn't the end of the string add a hyphen
if(text.Length-startPos!=4)
strBuilder.Append("-");
}
//add what is left
strBuilder.Append(text.Substring(startPos, 4));
string textWithHyphens = strBuilder.ToString();
Do note that my adding code is untested.
GetResultsWithOutHyphen method
public string GetResultsWithOutHyphen(string input)
{
return input.Replace("-", "");
}
GetResultsWithOutHyphen method
You could pass a variable instead of four for flexibility.
public string GetResultsWithHyphen(string input)
{
string output = "";
int start = 0;
while (start < input.Length)
{
char bla = input[start];
output += bla;
start += 1;
if (start % 4 == 0)
{
output += "-";
}
}
return output;
}
This worked for me when I had a value for a social security number (123456789) and needed it to display as (123-45-6789) in a listbox.
ListBox1.Items.Add("SS Number : " & vbTab & Format(SSNArray(i), "###-##-####"))
In this case I had an array of Social Security Numbers. This line of code alters the formatting to put a hyphen in.
Callee
public static void Main()
{
var text = new Text("THISisJUSTanEXAMPLEtext");
var convertText = text.Convert();
Console.WriteLine(convertText);
}
Caller
public class Text
{
private string _text;
private int _jumpNo = 4;
public Text(string text)
{
_text = text;
}
public Text(string text, int jumpNo)
{
_text = text;
_jumpNo = jumpNo < 1 ? _jumpNo : jumpNo;
}
public string Convert()
{
if (string.IsNullOrEmpty(_text))
{
return string.Empty;
}
if (_text.Length < _jumpNo)
{
return _text;
}
var convertText = _text.Substring(0, _jumpNo);
int start = _jumpNo;
while (start < _text.Length)
{
convertText += "-" + _text.Substring(start, Math.Min(_jumpNo, _text.Length - start));
start += _jumpNo;
}
return convertText;
}
}

How to extract phrases and then words in a string of text?

I have a search method that takes in a user-entered string, splits it at each space character and then proceeds to find matches based on the list of separated terms:
string[] terms = searchTerms.ToLower().Trim().Split( ' ' );
Now I have been given a further requirement: to be able to search for phrases via double quote delimiters a la Google. So if the search terms provided were:
"a line of" text
The search would match occurrences of "a line of" and "text" rather than the four separate terms [the open and closing double quotes would also need to be removed before searching].
How can I achieve this in C#? I would assume regular expressions would be the way to go, but haven't dabbled in them much so don't know if they are the best solution.
If you need any more info, please ask. Thanks in advance for the help.
Here's a regex pattern that would return matches in groups named 'term':
("(?<term>[^"]+)"\s*|(?<term>[^ ]+)\s*)+
So for the input:
"a line" of text
The output items identified by the 'term' group would be:
a line
of
text
Regular expressions would definitely be the way to go...
You should check this MSDN link out for some info on the Regex class:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
and here is an excellent link to learn some regular expression syntax:
http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx
Then to add some code examples, you could be doing it something along these lines:
string searchString = "a line of";
Match m = Regex.Match(textToSearch, searchString);
or if you just want to find out if the string contains a match or not:
bool success = Regex.Match(textToSearch, searchString).Success;
use the regular expression builder here
http://gskinner.com/RegExr/
and you will be able to manipulate the regular expression to how you need it displayed
Use Regexs....
string textToSearchIn = ""a line of" text";
string result = Regex.Match(textToSearchIn, "(?<=").*?(?=")").Value;
or if more then one, put this into a match collection...
MatchCollection allPhrases = Regex.Matches(textToSearchIn, "(?<=").*?(?=")");
The Knuth-Morris-Pratt (KMP algorithm)is recognised as the fastest algorithm for finding substrings in strings (well, technically not strings but byte-arrays).
using System.Collections.Generic;
namespace KMPSearch
{
public class KMPSearch
{
public static int NORESULT = -1;
private string _needle;
private string _haystack;
private int[] _jumpTable;
public KMPSearch(string haystack, string needle)
{
Haystack = haystack;
Needle = needle;
}
public void ComputeJumpTable()
{
//Fix if we are looking for just one character...
if (Needle.Length == 1)
{
JumpTable = new int[1] { -1 };
}
else
{
int needleLength = Needle.Length;
int i = 2;
int k = 0;
JumpTable = new int[needleLength];
JumpTable[0] = -1;
JumpTable[1] = 0;
while (i <= needleLength)
{
if (i == needleLength)
{
JumpTable[needleLength - 1] = k;
}
else if (Needle[k] == Needle[i])
{
k++;
JumpTable[i] = k;
}
else if (k > 0)
{
JumpTable[i - 1] = k;
k = 0;
}
i++;
}
}
}
public int[] MatchAll()
{
List<int> matches = new List<int>();
int offset = 0;
int needleLength = Needle.Length;
int m = Match(offset);
while (m != NORESULT)
{
matches.Add(m);
offset = m + needleLength;
m = Match(offset);
}
return matches.ToArray();
}
public int Match()
{
return Match(0);
}
public int Match(int offset)
{
ComputeJumpTable();
int haystackLength = Haystack.Length;
int needleLength = Needle.Length;
if ((offset >= haystackLength) || (needleLength > ( haystackLength - offset)))
return NORESULT;
int haystackIndex = offset;
int needleIndex = 0;
while (haystackIndex < haystackLength)
{
if (needleIndex >= needleLength)
return haystackIndex;
if (haystackIndex + needleIndex >= haystackLength)
return NORESULT;
if (Haystack[haystackIndex + needleIndex] == Needle[needleIndex])
{
needleIndex++;
}
else
{
//Naive solution
haystackIndex += needleIndex;
//Go back
if (needleIndex > 1)
{
//Index of the last matching character is needleIndex - 1!
haystackIndex -= JumpTable[needleIndex - 1];
needleIndex = JumpTable[needleIndex - 1];
}
else
haystackIndex -= JumpTable[needleIndex];
}
}
return NORESULT;
}
public string Needle
{
get { return _needle; }
set { _needle = value; }
}
public string Haystack
{
get { return _haystack; }
set { _haystack = value; }
}
public int[] JumpTable
{
get { return _jumpTable; }
set { _jumpTable = value; }
}
}
}
Usage :-
using System;
using System.Collections.Generic;
using System.Text;
using System.Reflection;
namespace KMPSearch
{
class Program
{
static void Main(string[] args)
{
if (args.Length != 2)
{
Console.WriteLine("Usage: " + Environment.GetCommandLineArgs()[0] + " haystack needle");
}
else
{
KMPSearch search = new KMPSearch(args[0], args[1]);
int[] matches = search.MatchAll();
foreach (int i in matches)
Console.WriteLine("Match found at position " + i+1);
}
}
}
}
Try this, It'll return an array for text. ex: { "a line of" text "notepad" }:
string textToSearch = "\"a line of\" text \" notepad\"";
MatchCollection allPhrases = Regex.Matches(textToSearch, "(?<=\").*?(?=\")");
var RegArray = allPhrases.Cast<Match>().ToArray();
output: {"a line of","text"," notepad" }

Categories

Resources