Find matching word in C#

I'm having trouble solving this problem. I have a string variable, for example:
string text="ABCD,ABCDABCD,ADCDS";
I need to search for a string value like "BC" in the above string and find the positions where "BC" occurs, i.e. if we search for "BC" in that string variable, it should produce the output 1,6
0 1 2 3 4 5 6 7 8 9 10 11 12 13
-------------------------------------------------------
| A | B | C | D | , | A | B | C | D | , | A | D | C | S |
-------------------------------------------------------
The problem is we can't use the built-in string class methods contains(), lastIndexOf(). Can anyone help me to do this?

The problem is we cant use built in string class methods
'contains()','lastIndexOf()'. can anyone help me to do this?
Then you can build your own. I assume that even Substring is forbidden.
string text="ABCD,ABCDABCD,ADCDS";
string whatToFind = "BC";
List<int> result = new List<int>();
for(int index=0; index < text.Length; index++)
{
if(index + whatToFind.Length > text.Length)
break;
bool matches = true;
for(int index2=0; index2<whatToFind.Length; index2++)
{
matches = text[index+index2] == whatToFind[index2];
if(!matches)
break;
}
if(matches)
result.Add(index);
}
Here's the running code: http://ideone.com/s7ej3

You probably can't use regular expressions in your homework. The best solution is to think of your string as a char array. Read about http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

Rolling your own version of IndexOf is not hard (as per the answers you've already received), and since it's homework, you can probably get away with it.
However, as you can probably imagine, a simple for loop is not the most efficient way to do it. String searching is an important topic, and although you probably won't need to implement it outside of homework ever again, you can read about it for your own edification.
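As a rough illustration of what a faster search can look like, here is a minimal sketch of the Boyer-Moore-Horspool simplification (not the full Boyer-Moore algorithm from the link above). The class and method names are made up for this example, and it only uses the string indexer and Length, on the assumption that those are allowed in the homework.

```csharp
using System;
using System.Collections.Generic;

public static class HorspoolSearch
{
    // Returns every index at which pattern occurs in text (overlapping matches included).
    public static List<int> FindAll(string text, string pattern)
    {
        var hits = new List<int>();
        int n = text.Length, m = pattern.Length;
        if (m == 0 || m > n) return hits;

        // Shift table: how far the window may move when its last character is c.
        // Built from every pattern character except the final one.
        var shift = new Dictionary<char, int>();
        for (int k = 0; k < m - 1; k++)
            shift[pattern[k]] = m - 1 - k;

        int pos = 0;
        while (pos <= n - m)
        {
            // Compare the window with the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) j--;
            if (j < 0) hits.Add(pos);

            // Characters missing from the table allow a full jump of m.
            char last = text[pos + m - 1];
            pos += shift.TryGetValue(last, out int s) ? s : m;
        }
        return hits;
    }
}
```

For example, HorspoolSearch.FindAll("ABCD,ABCDABCD,ADCDS", "BC") yields 1, 6, 10. With a two-character pattern the gain over the naive loop is small; the skipping pays off with longer patterns.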

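string text = "ABCD,ABCDABCD,ADCDS";
int location = -1; // index of the last match found, or -1 if there is none
for (int i = 0; i < text.Length - 1; i++) // stop one early so text[i + 1] stays in range
    if (text[i] == 'B')
        if (text[i + 1] == 'C')
        {
            location = i;
            i++;
        }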
EDIT:
List<int> locations = new List<int>();
string text = "ABCD,ABCDABCD,ADCDS";
for (int i = 0; i < text.Length - 1; i++) // stop one early so text[i + 1] stays in range
    if (text[i] == 'B')
        if (text[i + 1] == 'C')
        {
            locations.Add(i);
            i++;
        }

This should work out for you:
string text="ABCD,ABCDABCD,ADCDS";
var seekindex = 0;
var positions = new List<int>();
while( seekindex < text.Length ){
var index = text.IndexOf( "BC", seekindex);
if( index > -1){
positions.Add(index);
seekindex = index + 1;
}else{
break;
}
}
This uses the IndexOf method with a start index to make sure that we continue searching from our previous hit location the next time, until IndexOf returns -1, indicating no more hits.
positions will contain the indexes at the end, and the result is actually 1,6,10 and not 1,6 ;)
EDIT
Just realized he could not use IndexOf. Trying again :)
string text="ABCD,ABCDABCD,ADCDS";
var positions = new List<int>();
for( int i = 0; i < text.Length-1; i++ ){
if( text[i] == 'B' && text[i+1] == 'C' ){
positions.Add(i);
}
}
It might seem like there is a performance problem here, since the if statement checks both the current and the next char, and therefore checks all chars twice.
But in fact it won't. Because of the AND (&&) in between, if text[i] is not B, the second check is not performed, since the if is already known to fail.

Below is a perfectly working example to your requirements, but is also nice and slow and also has a big memory footprint:
string text = "ABCD,ABCDABCD,ADCDS";
string whatToFind = "BC";
string delim = "";
for(int index=0; index < text.Length; index++)
{
if(index + whatToFind.Length > text.Length)
break;
if(text.Substring(index, whatToFind.Length) == whatToFind)
{
Console.Out.Write(delim + index.ToString()); // Write, not WriteLine, so the output reads 1,6,10
delim = ",";
}
}
I leave it to the reader as an exercise to improve the performance and memory usage. It's more useful to understand where and why this is slow than to achieve a faster answer.

Related

How to find text between two tabs

I have a file that looks similar to the following:
Tomas | Nordstrom | Sweden | Europe | World
(the character "|" in the above line represents a tab, new column)
Now I want a string containing only the text in the 4th column.
I have succeeded in finding characters at a certain spot in the line. But that spot changes according to the number of characters in each column.
I could really use some nice input on this.
Thanks in advance.
/Tomas
This can be done using the Split method like this:
string s = "Tomas|Nordstrom|Sweden|Europe|World";
string[] stringArray = s.Split( new string[] { "|" }, StringSplitOptions.None );
Console.WriteLine( stringArray[3] );
This will print out "Europe", because that is located at index 3 in stringArray.
Edit:
The same can be achieved using Regex like this:
string[] stringRegex = Regex.Split( s, @"\|+" );
Basic algorithm would be iterating characters, until n-1 tabs found, then take chars up to the next tab or the end of string.
Depending on requirements, if performance is critical, you might need to implement a scanning algorithm manually.
You might be surprised how slow string splitting is. Well, not by itself, but the overall approach requires:
Scanning to the end of the string
Creation of all of the split parts on the heap
Collecting garbage
Consider the following benchmark of the two approaches:
void Main()
{
string source = "Tomas\tNordstrom\tSweden\tEurope\tWorld";
var sw = Stopwatch.StartNew();
string result = null;
var n = 100000000;
for (var i = 0; i < n; i++)
{
result = FindBySplitting(source);
}
sw.Stop();
var splittingNsop = (double)sw.ElapsedMilliseconds / n * 1000000.0;
Console.WriteLine("Splitting. {0} ns/op",splittingNsop);
Console.WriteLine(result);
sw.Restart();
for (var i = 0; i < n; i++)
{
result = FindByScanning(source);
}
sw.Stop();
var scanningNsop = (double)sw.ElapsedMilliseconds / n * 1000000.0;
Console.WriteLine("Scanning. {0} ns/op",
scanningNsop);
Console.WriteLine(result);
Console.WriteLine("Scanning over splitting: {0}", splittingNsop / scanningNsop);
}
string FindBySplitting(string s)
{
return s.Split('\t')[3];
}
string FindByScanning(string s)
{
int l = s.Length, p = 0, q = 0, c = 0;
while (c++ < 4 - 1)
while (p < l && s[p++] != '\t')
;
for (q = p; q < l && s[q] != '\t'; q++)
;
return s.Substring(p, q - p);
}
Scanning algorithm implemented in pure C# outperforms the splitting one implemented on the low level by a factor of 4.6 on my laptop:
Splitting. 174.81 ns/op
Europe
Scanning. 37.58 ns/op
Europe
Scanning over splitting: 4.65167642362959

Adding 'space' in C# textbox

Hi guys, so I need to add a 'space' between each character in my displayed text box.
I am giving the user a masked word like this He__o for him to guess and I want to convert this to H e _ _ o
I am using the following code to randomly replace characters with '_'
char[] partialWord = word.ToCharArray();
int numberOfCharsToHide = word.Length / 2; //divide word length by 2 to get chars to hide
Random randomNumberGenerator = new Random(); //generate rand number
HashSet<int> maskedIndices = new HashSet<int>(); //This is to make sure that I select unique indices to hide. Hashset helps in achieving this
for (int i = 0; i < numberOfCharsToHide; i++) //counter until it reaches words to hide
{
int rIndex = randomNumberGenerator.Next(0, word.Length); //init rindex
while (!maskedIndices.Add(rIndex))
{
rIndex = randomNumberGenerator.Next(0, word.Length); //This is to make sure that I select unique indices to hide. Hashset helps in achieving this
}
partialWord[rIndex] = '_'; //replace with _
}
return new string(partialWord);
I have tried: partialWord[rIndex] = '_ '; however, this brings the error "Too many characters in literal".
I have tried: partialWord[rIndex] = "_ "; however, this returns the error "Cannot convert type string to char".
Any idea how I can proceed to achieve a space between each character?
Thanks
The following code should do as you ask. I think the code is pretty self-explanatory, but feel free to ask if anything is unclear as to the why or how of the code.
// char[] partialWord is used from question code
char[] result = new char[(partialWord.Length * 2) - 1];
for(int i = 0; i < result.Length; i++)
{
result[i] = i % 2 == 0 ? partialWord[i / 2] : ' ';
}
return new string(result);
Since the resulting string is longer than the original string, you can't use only one char array because its length is constant.
Here's a solution with StringBuilder:
var builder = new StringBuilder(word);
for (int i = 0 ; i < word.Length ; i++) {
builder.Insert(i * 2, " ");
}
return builder.ToString().TrimStart(' '); // TrimStart is called here to remove the leading whitespace. If you want to keep it, delete the call.
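Another option, assuming LINQ is acceptable here, is to let string.Join place the separators. This is only a sketch; the SpaceOut helper name is made up for the example and takes the already masked word as a string.

```csharp
using System;
using System.Linq;

public static class Spacing
{
    // Hypothetical helper: turns "He__o" into "H e _ _ o".
    public static string SpaceOut(string masked)
    {
        // Convert each char to a one-character string, then join with single spaces.
        return string.Join(" ", masked.Select(c => c.ToString()));
    }
}
```

In the question's code this would be called as Spacing.SpaceOut(new string(partialWord)). Because Join only puts the separator between elements, there is no leading or trailing space to trim.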

String.Substring() crashes with certain inputs

I'm trying to split a string with Substring(), and I keep getting crashes with certain values. The problematic line is (according to the "debugging" I tried):
string sub = str.Substring(beg,i);
and the whole code is :
static void Prints(string str)
{
int beg = 0;
for (int i = 0; i < str.Length; i++)
{
if (str[i] == '*')
{
Console.WriteLine(i);
//Console.WriteLine("before");
string sub = str.Substring(beg,i);
//Console.WriteLine("after");
beg = i+1;
if (sub.Length % 2 == 0)
{
Console.WriteLine(sub.Length/2);
int n = sub.Length / 2;
Console.WriteLine("{0} {1}", sub[n-1], sub[n]);
}
else
{
int n = sub.Length / 2;
Console.WriteLine(sub[n]);
}
}
}
}
The error happens when the input is:
hi*its*
That's the output:
h i
Unhandled Exception: System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
Parameter name: length
at System.String.Substring(Int32 startIndex, Int32 length)
at _39.Program.Prints(String str) in D:\12\39\Program.cs:line 36
at _39.Program.Main(String[] args) in D:\12\39\Program.cs:line 13
I know there might be a better way using Split(), but I still want to understand what causes the error.
Thanks in advance
Doron.
The problem is that the second argument of Substring is a length, not an end index, and you're not subtracting the start position (beg) from i.
If you look at the debug output you will find that:
str.Substring(3, 1) = "i"
str.Substring(3, 2) = "it"
str.Substring(3, 3) = "its"
str.Substring(3, 4) = "its*"
str.Substring(3, 5) = // Error! You're beyond the end of the string.
So clearly you are attempting to pull (in your example) 6 characters from the string starting at position 3. This would require an input string with a total length of 9 or more (3 + 6). Your input string is only 7 chars long.
Try tokenizing your string. As soon as you try manually tokenizing using indices and counting, things go wrong. Tokenizing is a godsend :)
Good Luck!
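Since Substring(startIndex, length) takes a length as its second argument, one way to repair the loop from the question is to pass i - beg instead of i. The sketch below keeps the same scanning approach but collects the pieces in a list; the class and method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class StarSegments
{
    // Returns the pieces of str that end at each '*' (the '*' itself is excluded).
    public static List<string> SegmentsBeforeStars(string str)
    {
        var parts = new List<string>();
        int beg = 0;
        for (int i = 0; i < str.Length; i++)
        {
            if (str[i] == '*')
            {
                // The segment runs from beg up to (not including) i,
                // so its length is i - beg, not i.
                parts.Add(str.Substring(beg, i - beg));
                beg = i + 1;
            }
        }
        return parts;
    }
}
```

For the input hi*its* this gives "hi" and "its", and the second call becomes Substring(3, 3), which stays inside the 7-character string.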

Char/String comparison

I'm trying to add a suggestion feature to the search function in my program, e.g. I type "janw doe" in the search section and it outputs "NO MATCH - did you mean jane doe?". I'm not sure what the problem is, maybe something to do with char/string comparison. I've tried comparing both variables as type char (e.g. char temp --> temp.Contains ... etc.) but an error appears (char does not contain a definition for Contains). Would love any help on this! 8)
if (found == false)
{
Console.WriteLine("\n\nMATCH NOT FOUND");
int charMatch = 0, charCount = 0;
string[] checkArray = new string[26];
//construction site /////////////////////////////////////////////////////////////////////////////////////////////////////////////
for (int controlLoop = 0; controlLoop < contPeople.Length; controlLoop++)
{
foreach (char i in userContChange)
{
charCount = charCount + 1;
}
for (int i = 0; i < userContChange.Length; )
{
string temp = contPeople[controlLoop].name;
string check=Convert.ToString(userContChange[i]);
if (temp.Contains(check))
{
charMatch = charMatch + 1;
}
}
int half = charCount / 2;
if (charMatch >= half)
{
checkArray[controlLoop] = contPeople[controlLoop].name;
}
}
///////////////////////////////////////////////////////////////////////////////////////////////////////////
Console.WriteLine("Did you mean: ");
for (int a = 0; a < checkArray.Length; a++)
{
Console.WriteLine(checkArray[a]);
}
///////////////////////////////////////////////////////////////////////////////////////////////////
A string is made up of many characters. A character is a primitive, so it doesn't "contain" any other items. A string is basically an array of characters.
For comparing string and characters:
char a = 'A';
String alan = "Alan";
Debug.Assert(alan[0] == a);
Or, if you have a single-character string, I suppose:
char a = 'A';
String alan = "A";
Debug.Assert(alan == a.ToString());
All of these asserts are true
But, the main reason I wanted to comment on your question, is to suggest an alternative approach for suggesting "Did you mean?". There's an algorithm called Levenshtein Distance which calculates the "number of single character edits" required to convert one string to another. It can be used as a measure of how close two strings are. You may want to look into how this algorithm works because it could help you.
Here's an applet that I found which demonstrates: Approximate String Matching with k-differences
Also the wikipedia link Levenshtein distance
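To make the idea concrete, here is a minimal sketch of the standard dynamic-programming computation of Levenshtein distance, keeping only two rows of the table. The class and method names are invented for the example, and how to turn the distance into a "did you mean" threshold is left open.

```csharp
using System;

public static class EditDistance
{
    // Number of single-character insertions, deletions and substitutions
    // needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];

        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // deleting the first i chars of a
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1,   // insertion
                             prev[j] + 1),      // deletion
                    prev[j - 1] + cost);        // substitution or match
            }
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }
}
```

With this, EditDistance.Levenshtein("janw doe", "jane doe") is 1, so a search could suggest the stored name with the smallest distance to the user's input.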
The char type cannot have .Contains() because it is a value type holding only one character.
In your case (if I understand correctly), maybe you need to use .Equals() or the == operator.
Note: in C#, both .Equals() and the == operator compare string contents, because String overloads ==, even though it is a reference type. A reference comparison only happens when the operands are typed as object.
Hope this helps!
The char type doesn't have a Contains() method, but you can use it like this: 'a'.ToString().Contains(...)
If performance is not a concern, here is another simple way:
var input = "janw doe";
var people = new string[] { "abc", "123", "jane", "jane doe" };
var found = Array.IndexOf(people, input); // or use FirstOrDefault(), FindIndex, a search engine... (BinarySearch would need a sorted array)
if (found < 0)//not found
{
var i = input.ToArray();
var target = "";
//most similar
//target = people.OrderByDescending(p => p.ToArray().Intersect(i).Count()).FirstOrDefault();
//as you code:
foreach (var p in people)
{
var count = p.ToArray().Intersect(i).Count();
if (count > input.Length / 2)
{
target = p;
break;
}
}
if (!string.IsNullOrWhiteSpace(target))
{
Console.WriteLine(target);
}
}

What is the most efficient way to detect if a string contains a number of consecutive duplicate characters in C#?

For example, a user entered "I love this post!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
the consecutive duplicate exclamation mark "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" should be detected.
The following regular expression detects repeating chars. You could raise the threshold or limit this to specific characters to make it more robust.
int threshold = 3;
string stringToMatch = "thisstringrepeatsss";
string pattern = @"(.)\1{" + (threshold - 1) + @",}"; // any char followed by at least threshold - 1 copies of itself
Regex r = new Regex(pattern);
Match m = r.Match(stringToMatch);
while(m.Success)
{
Console.WriteLine("character passes threshold " + m.ToString());
m = m.NextMatch();
}
Here's an example of a function that searches for a sequence of consecutive chars of a specified length and also ignores white space characters:
public static bool HasConsecutiveChars(string source, int sequenceLength)
{
if (string.IsNullOrEmpty(source))
return false;
if (source.Length == 1)
return false;
int charCount = 1;
for (int i = 0; i < source.Length - 1; i++)
{
char c = source[i];
if (Char.IsWhiteSpace(c))
continue;
if (c == source[i+1])
{
charCount++;
if (charCount >= sequenceLength)
return true;
}
else
charCount = 1;
}
return false;
}
Edit fixed range bug :/
Can be done in O(n) easily: for each character, if the previous character is the same as the current, increment a temporary count. If it's different, reset your temporary count. At each step, update your global if needed.
For abbccc you get:
a => temp = 1, global = 1
b => temp = 1, global = 1
b => temp = 2, global = 2
c => temp = 1, global = 2
c => temp = 2, global = 2
c => temp = 3, global = 3
=> c appears three times. Extend it to get the position, then you should be able to print the "ccc" substring.
You can extend this to give you the starting position fairly easily, I'll leave that to you.
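The temp/global bookkeeping described above can be sketched as follows, including the starting position. The names are made up for the example, and returning a tuple assumes C# 7 or later.

```csharp
using System;

public static class Runs
{
    // Returns the start index and length of the longest run of identical characters.
    // For an empty or null string the result is (-1, 0).
    public static (int Start, int Length) LongestRun(string s)
    {
        if (string.IsNullOrEmpty(s)) return (-1, 0);

        int bestStart = 0, bestLen = 1; // the "global" values
        int runStart = 0;               // where the current run began

        for (int i = 1; i < s.Length; i++)
        {
            // A different character starts a new run, i.e. resets the temporary count.
            if (s[i] != s[i - 1]) runStart = i;

            int len = i - runStart + 1; // the "temp" count
            if (len > bestLen)
            {
                bestLen = len;
                bestStart = runStart;
            }
        }
        return (bestStart, bestLen);
    }
}
```

For "abbccc" this returns start 3 and length 3, so s.Substring(3, 3) gives "ccc". Comparing the returned length against a threshold answers the original question in a single pass.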
Here is a quick solution I crafted with some extra duplicates thrown in for good measure. As others pointed out in the comments, some duplicates are going to be completely legitimate, so you may want to narrow your criteria to punctuation instead of mere characters.
string input = "I loove this post!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!aa";
int index = -1;
int count =1;
List<string> dupes = new List<string>();
for (int i = 0; i < input.Length-1; i++)
{
if (input[i] == input[i + 1])
{
if (index == -1)
index = i;
count++;
}
else if (index > -1)
{
dupes.Add(input.Substring(index, count));
index = -1;
count = 1;
}
}
if (index > -1)
{
dupes.Add(input.Substring(index, count));
}
A better way, in my opinion, is to create an array in which each element is responsible for one pair of identical adjacent characters in the string, e.g. first aa, then bb, cc, dd. Every element of the array starts at 0.
To solve the problem, loop over the string with a for and update the array values.
You can then analyze this array for whatever you want.
Example: for the string bbaaaccccdab, your result array would be { 2, 1, 3 }, because 'aa' is found 2 times, 'bb' is found once (at the start of the string), and 'cc' is found three times.
Why 'cc' three times? Because 'cc'cc & c'cc'c & cc'cc'.
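A minimal sketch of this pair-counting idea is shown below. It swaps the fixed array for a Dictionary keyed by character, which is an assumption made here to avoid picking an alphabet size; the names are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class PairCounter
{
    // Counts, for each character, how many times it is immediately followed by itself.
    public static Dictionary<char, int> CountAdjacentPairs(string s)
    {
        var counts = new Dictionary<char, int>();
        for (int i = 1; i < s.Length; i++)
        {
            if (s[i] == s[i - 1])
            {
                // TryGetValue leaves n at 0 when the character has not been seen yet.
                counts.TryGetValue(s[i], out int n);
                counts[s[i]] = n + 1;
            }
        }
        return counts;
    }
}
```

For "bbaaaccccdab" the dictionary holds a: 2, b: 1, c: 3, matching the example above. A run of k identical characters contributes k - 1 pairs, so a count of at least threshold - 1 is necessary for a run of threshold characters, but not sufficient if the pairs come from separate runs.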
Use LINQ! (For everything, not just this)
string test = "aabb";
return test.Where((item, index) => index > 0 && item.Equals(test.ElementAt(index - 1)));
// returns the sequence 'a', 'b': each character that equals the one before it
OR
string test = "aabb";
return test.Where((item, index) => index > 0 && item.Equals(test.ElementAt(index - 1))).Any();
// returns true
