String.Substring() crashes with certain inputs

String.Substring() crashes with certain inputs - c#

I trying to split a string with Substring(), and I am having a problem I keep getting crashes with certin values.The problematic lane is(according to the "debugging" i tried):
string sub = str.Substring(beg,i);
and the whole code is :
static void Prints(string str)
{
int beg = 0;
for (int i = 0; i < str.Length; i++)
{
if (str[i] == '*')
{
Console.WriteLine(i);
//Console.WriteLine("before");
string sub = str.Substring(beg,i);
//Console.WriteLine("after");
beg = i+1;
if (sub.Length % 2 == 0)
{
Console.WriteLine(sub.Length/2);
int n = sub.Length / 2;
Console.WriteLine("{0} {1}", sub[n-1], sub[n]);
}
else
{
int n = sub.Length / 2;
Console.WriteLine(sub[n]);
}
The eror happens when the input is :
hi*its*
thats the output:
h i
Unhandled Exception: System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
Parameter name: length
at System.String.Substring(Int32 startIndex, Int32 length)
at _39.Program.Prints(String str) in D:\12\39\Program.cs:line 36
at _39.Program.Main(String[] args) in D:\12\39\Program.cs:line 13
I know there might be a better way using split() but I still want to understand what cause the eror.
Thanks in advance
Doron.

The problem is that you're not subtracting the distance you are into the string from the overall length.
If you look at the debug output you will find that:
str.Substring(3, 1) = "i"
str.Substring(3, 2) = "it"
str.Substring(3, 3) = "its"
str.Substring(3, 4) = "its*"
str.Substring(3, 5) = // Error! You're beyond the end of the string.
So clearly you are attempting to pull (in your example) 6 characters from the string starting at position 3. This would require an input string with total length 10 or more (as substring is Zero Index based). Your input string is only 7 chars long.
Try tokenizing your string. As soon as you try manually tokenizing using indices and counting things go wrong. Tokenizing is a god send :)
Good Luck!

Related

How do you do a string split with 2 chars counts in C#?

I know how to do a string split if there's a letter, number, that I want to replace.
But how could I do a string.Split() by 2 char counts without replacing any existing letters, number, etc...?
Example:
string MAC = "00122345"
I want that string to output: 00:12:23:45

You could create a LINQ extension method to give you an IEnumerable<string> of parts:
public static class Extensions
{
public static IEnumerable<string> SplitNthParts(this string source, int partSize)
{
if (string.IsNullOrEmpty(source))
{
throw new ArgumentException("String cannot be null or empty.", nameof(source));
}
if (partSize < 1)
{
throw new ArgumentException("Part size has to be greater than zero.", nameof(partSize));
}
return Enumerable
.Range(0, (source.Length + partSize - 1) / partSize)
.Select(pos => source
.Substring(pos * partSize,
Math.Min(partSize, source.Length - pos * partSize)));
}
}
Usage:
var strings = new string[] {
"00122345",
"001223453"
};
foreach (var str in strings)
{
Console.WriteLine(string.Join(":", str.SplitNthParts(2)));
}
// 00:12:23:45
// 00:12:23:45:3
Explanation:
Use Enumerable.Range to get number of positions to slice string. In this case its the length of the string + chunk size - 1, since we need to get a big enough range to also fit leftover chunk sizes.
Enumerable.Select each position of slicing and get the startIndex using String.Substring using the position multiplied by 2 to move down the string every 2 characters. You will have to use Math.Min to calculate the smallest size leftover size if the string doesn't have enough characters to fit another chunk. You can calculate this by the length of the string - current position * chunk size.
String.Join the final result with ":".
You could also replace the LINQ query with yield here to increase performance for larger strings since all the substrings won't be stored in memory at once:
for (var pos = 0; pos < source.Length; pos += partSize)
{
yield return source.Substring(pos, Math.Min(partSize, source.Length - pos));
}

You can use something like this:
string newStr= System.Text.RegularExpressions.Regex.Replace(MAC, ".{2}", "$0:");
To trim the last colon, you can use something like this.
newStr.TrimEnd(':');
Microsoft Document

Try this way.
string MAC = "00122345";
MAC = System.Text.RegularExpressions.Regex.Replace(MAC,".{2}", "$0:");
MAC = MAC.Substring(0,MAC.Length-1);
Console.WriteLine(MAC);

A quite fast solution, 8-10x faster than the current accepted answer (regex solution) and 3-4x faster than the LINQ solution
public static string Format(this string s, string separator, int length)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.Length; i += length)
{
sb.Append(s.Substring(i, Math.Min(s.Length - i, length)));
if (i < s.Length - length)
{
sb.Append(separator);
}
}
return sb.ToString();
}
Usage:
string result = "12345678".Format(":", 2);

Here is a one (1) line alternative using LINQ Enumerable.Aggregate.
string result = MAC.Aggregate("", (acc, c) => acc.Length % 3 == 0 ? acc += c : acc += c + ":").TrimEnd(':');

An easy to understand and simple solution.
This is a simple fast modified answer in which you can easily change the split char.
This answer also checks if the number is even or odd , to make the suitable string.Split().
input : 00122345
output : 00:12:23:45
input : 0012234
output : 00:12:23:4
//The List that keeps the pairs
List<string> MACList = new List<string>();
//Split the even number into pairs
for (int i = 1; i <= MAC.Length; i++)
{
if (i % 2 == 0)
{
MACList.Add(MAC.Substring(i - 2, 2));
}
}
//Make the preferable output
string output = "";
for (int j = 0; j < MACList.Count; j++)
{
output = output + MACList[j] + ":";
}
//Checks if the input string is even number or odd number
if (MAC.Length % 2 == 0)
{
output = output.Trim(output.Last());
}
else
{
output += MAC.Last();
}
//input : 00122345
//output : 00:12:23:45
//input : 0012234
//output : 00:12:23:4

Taking certain string characters and returning the string

I asked this question yesterday but it wasn't well received mainly due to how I asked it so ill try do better this time.
I have a string variable called message. lets say message equals "ABCDABCDABCDABCD"
now I need to do some processing on the characters in the string but not all at the same time, I want to access characters [0][4][8][12] on the first pass of the function, put each of these characters in a string and return it which is easy done if I pass an integer to my function lets say 4 and with in a for loop do
if(i % int == 0)
{
string += message[i];
}
this should return "AAAA"
the next time I call the function ill need elements [0][1], [4][5], [8][9], [12][13] and the time after that ill need [0][1][2], [4][5][6], [8][9][10], [12][13][14].
I need the characters returned in a string in the order they were taken, I could do this by changing my int I pass the function but then id need to call the function several times and do work on the returned strings to get them into the order they were taken, which I have already tried and it slowed my program down when dealing with large messages > 10k characters.
Please don't delete or put my question on hold, im quite happy to give more information on my problem if its not clear, ill seldom post to this site and usually try and find a solution myself, there are too many acceptance junkies on here for my liking. but I would appreciate some help from some of them regarding this.Thanks
Edit
I understand its not easy to figure it out and I have to say im not the best at describing it, its a vigenere cracker in WPF, I have done the kasiski examination on a piece of text and graphed out all the data, it finds the key length 90% of the time or gives me the best clue to what the key might be, now im calculating the frequency of bi,tri and quad grams of the message based on the data from the kasiski exam, lets say the key is 5 and the message is "ABCDABCDABCDABCD" im calculating probability on only the characters of the key Im changing so when I try key AAAAA im only wanting to calculate monograms on elements [0][4][9][14] of the message, ill run through 26 characters up to ZAAAA and take the most probable then I move onto element [1] of the key, lets say FAAAA gave the best score on the first element of the key. now I need elements [0][1],[5][6],[9][10][13][14] as im calculating probability on 2 pieces on the key FCAAA, so the length of the key and what key character im working on will determine what elements of the message ill be taking.

One-liner with LINQ (I use Batch extension from MoreLINQ, but you can use your own) which selects all required chars from input string:
string message = "ABCDABCDABCDABCD";
int size = 4;
int charsToTake = 2;
var characters = message.Batch(size).SelectMany(b => b.Take(charsToTake));
If you need result as string, you can easily create one:
var result = new String(characters.ToArray());
// ABABABAB
More efficient way - create your own method which will split string by substrings of required length:
public static IEnumerable<string> ToSubstrings(this string s, int length)
{
int index = 0;
while (index + length < s.Length)
{
yield return s.Substring(index, length);
index += length;
}
if (index < s.Length)
yield return s.Substring(index);
}
I would also create method for safe getting substring from start of string (to avoid annoying string length check and passing zero as start index):
public static string SubstringFromStart(this string s, int length)
{
return s.Substring(0, Math.Min(s.Length, length));
}
Now its very clear what you are doing:
var substrings = message.ToSubstrings(size)
.Select(s => s.SubstringFromStart(charsToTake));
var result = String.Concat(substrings);

Here is a simple program which does what you want, if I understand correctly:
static void Main(string[] args)
{
string data = "ABCDABCDABCDABCD";
Console.WriteLine(StrangeSubstring(data,4, 1));
// "AAAA"
Console.WriteLine(StrangeSubstring(data,4, 2));
// "ABABABAB"
Console.WriteLine(StrangeSubstring(data,4, 3));
// "ABCABCABCABC"
}
static string StrangeSubstring(string input, int modulo, int length)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < input.Length; ++i)
{
if (i % modulo == 0)
{
for (int j = 0; j<length; ++j)
{
if (i+j < input.Length)
sb.Append(input[i+j]);
}
}
}
return sb.ToString();
}

My solution will be like this
static string MethodName(int range){
StringBuilder sb = new StringBuilder();
for(int i = 0 ; i < str.Length ; i++){
if(i % 4 == 0){
sb.Append(str[i]);
for(int j = i + 1 ; j <= i + range ; j ++){
if(j >= str.Length)
break;
sb.Append(str[j]);
}
}
}
return sb.ToString();
}

you can parse your string to a char array :
string message="ABCDABCDABCDABCD";
char[] myCharArray = message.ToCharArray();
string result="";
for(int i=0, i<myCharArray.Length -1 ; i++)
{
if(i%4 ==0)
result+=myCharArray[i];
}
EDIT 1 :
public string[] myfunction(char[] charArray)
{
List<string> result = new List<string>();
for(int i=0, i<charArray.length -1; i=i+4)
{
result.add(charArray[i]+charArray[i+1])
}
return result.toArray();
}

This is a recursive solution. In YourFunction, PatternLength is the length of the character pattern which is repeated (so, 4 for "ABCD"), Offset is where you start in the pattern (e.g. 0 if you start with "A") and SubstringLength is the number of characters.
The function call in Main will give you all "A". If you change SubstringLength to 2, it gives you all "AB". There is no error handling, make sure then PatternLength<=Offest+SubstringLength
namespace Foo
{
class Bar
{
static void Main(string[] args)
{
Console.WriteLine(YourFunction("ABCABCABCABCABCABCABC", 3, 0,1));
Console.ReadKey();
}
static string YourFunction(string SubString, int PatternLength, int Offset, int SubstringLength)
{
string result;
if (SubString.Length <= PatternLength)
{
result = SubString.Substring(Offset, SubstringLength);
}
else
{
result = YourFunction(SubString.Substring(PatternLength, (SubString.Length - PatternLength)), PatternLength, Offset, SubstringLength) + SubString.Substring(Offset, SubstringLength);
}
return result;
}
}
}

erroneous character fixing of strings in c#

I have five strings like below,
ABBCCD
ABBDCD
ABBDCD
ABBECD
ABBDCD
all the strings are basically same except for the fourth characters. But only the character that appears maximum time will take the place. For example here D was placed 3 times in the fourth position. So, the final string will be ABBDCD. I wrote following code, but it seemed to be less efficient in terms of time. Because this function can be called million times. What should I do to improve the performance?
Here changedString is the string to be matched with other 5 strings. If Any position of the changed string is not matched with other four, then the maxmum occured character will be placed on changedString.
len is the length of the strings which is same for all strings.
for (int i = 0; i < len;i++ )
{
String findDuplicate = string.Empty + changedString[i] + overlapStr[0][i] + overlapStr[1][i] + overlapStr[2][i] +
overlapStr[3][i] + overlapStr[4][i];
char c = findDuplicate.GroupBy(x => x).OrderByDescending(x => x.Count()).First().Key;
if(c!=changedString[i])
{
if (i > 0)
{
changedString = changedString.Substring(0, i) + c +
changedString.Substring(i + 1, changedString.Length - i - 1);
}
else
{
changedString = c + changedString.Substring(i + 1, changedString.Length - 1);
}
}
//string cleanString = new string(findDuplicate.ToCharArray().Distinct().ToArray());
}

I'm not quite sure what you are going to do, but if it is about sorting strings by some n-th character, then the best way is to use Counting Sort http://en.wikipedia.org/wiki/Counting_sort It is used for sorting array of small integers and is quite fine for chars. It has linear O(n) time. The main idea is that if you know all your possible elements (looks like they can be only A-Z here) then you can create an additional array and count them. For your example it will be {0, 0, 1 ,3 , 1, 0,...} if we use 0 for 'A', 1 for 'B' and so on.

There is a function that might help performance-wise as it runs five times faster. The idea is to count occurrences yourself using a dictionary to convert character to a position into counting array, increment value at this position and check if it is greater than previously highest number of occurrences. If it is, current character is top and is stored as result. This repeats for each string in overlapStr and for each position within the strings. Please read comments inside code to see details.
string HighestOccurrenceByPosition(string[] overlapStr)
{
int len = overlapStr[0].Length;
// Dictionary transforms character to offset into counting array
Dictionary<char, int> char2offset = new Dictionary<char, int>();
// Counting array. Each character has an entry here
int[] counters = new int[overlapStr.Length];
// Highest occurrence characters found so far
char[] topChars = new char[len];
for (int i = 0; i < len; ++i)
{
char2offset.Clear();
// faster! char2offset = new Dictionary<char, int>();
// Highest number of occurrences at the moment
int highestCount = 0;
// Allocation of counters - as previously unseen character arrives
// it is given a slot at this offset
int lastOffset = 0;
// Current offset into "counters"
int offset = 0;
// Small optimization. As your data seems very similar, this helps
// to reduce number of expensive calls to TryGetValue
// You might need to remove this optimization if you don't have
// unused value of char in your dataset
char lastChar = (char)0;
for (int j = 0; j < overlapStr.Length; ++ j)
{
char thisChar = overlapStr[j][i];
// If this is the same character as last one
// Offset already points to correct cell in "counters"
if (lastChar != thisChar)
{
// Get offset
if (!char2offset.TryGetValue(thisChar, out offset))
{
// First time seen - allocate & initialize cell
offset = lastOffset;
counters[offset] = 0;
// Map character to this cell
char2offset[thisChar] = lastOffset++;
}
// This is now last character
lastChar = thisChar;
}
// increment and get count for character
int charCount = ++counters[offset];
// This is now highestCount.
// TopChars receives current character
if (charCount > highestCount)
{
highestCount = charCount;
topChars[i] = thisChar;
}
}
}
return new string(topChars);
}
P.S. This is certainly not the best solution. But as it is significantly faster than original I thought I should help out.

Help me to understand this c# code

this code in Beginning C# 3.0: An Introduction to Object Oriented Programming
this is a program that has the user enter a couple of sentences in a multi - line textbox and then count how many times each letter occurs in that text
private const int MAXLETTERS = 26; // Symbolic constants
private const int MAXCHARS = MAXLETTERS - 1;
private const int LETTERA = 65;
.........
private void btnCalc_Click(object sender, EventArgs e)
{
char oneLetter;
int index;
int i;
int length;
int[] count = new int[MAXLETTERS];
string input;
string buff;
length = txtInput.Text.Length;
if (length == 0) // Anything to count??
{
MessageBox.Show("You need to enter some text.", "Missing Input");
txtInput.Focus();
return;
}
input = txtInput.Text;
input = input.ToUpper();
for (i = 0; i < input.Length; i++) // Examine all letters.
{
oneLetter = input[i]; // Get a character
index = oneLetter - LETTERA; // Make into an index
if (index < 0 || index > MAXCHARS) // A letter??
continue; // Nope.
count[index]++; // Yep.
}
for (i = 0; i < MAXLETTERS; i++)
{
buff = string.Format("{0, 4} {1,20}[{2}]", (char)(i + LETTERA)," ",count[i]);
lstOutput.Items.Add(buff);
}
}
I do not understand this line
count[index]++;
and this line of code
buff = string.Format("{0, 4} {1,20}[{2}]", (char)(i + LETTERA)," ",count[i]);

count[index]++; means "add 1 to the value in count at index index". The ++ is specifically known as incrementing. What the code is doing is tallying the number of occurrences of a letter.
buff = string.Format("{0, 4} {1,20}[{2}]", (char)(i + LETTERA)," ",count[i]); is formatting a line of output. With string.Format, you first pass in a format specifier that works like a template or form letter. The parts between { and } specify how the additional arguments passed into string.Format are used. Let me break down the format specification:
{0, 4} The first (index 0) argument (which is the letter, in this case).
The ,4 part means that when it is output, it should occupy 4 columns
of text.
{1,20} The second (index 1) argument (which is a space in this case).
The ,20 is used to force the output to be 20 spaces instead of 1.
{2} The third (index 2) argument (which is the count, in this case).
So when string.Format runs, (char)(i + LETTERA) is used as the first argument and is plugged into the {0} portion of the format. " " is plugged into {1}, and count[i] is plugged into {2}.

count[index]++;
That's a post-increment. If you were to save the return of that it would be count[index] prior to the increment, but all it basically does is increment the value and return the value prior to the increment. As for the reason why there is a variable inside square brackets, it is referencing a value in the index of an array. In other words, if you wanted to talk about the fifth car on the street, you may consider something like StreetCars(5). Well, in C# we use square brackets and zero-indexing, so we would have something like StreetCars[4]. If you had a Car array call StreetCars you could reference the 5th Car by using the indexed value.
As for the string.Format() method, check out this article.

c# getting a string within another string

i have a string like this:
some_string = "A simple demo of SMS text messaging.\r\n+CMGW: 3216\r\n\r\nOK\r\n\"
im coming from vb.net and i need to know in c#, if i know the position of CMGW, how do i get "3216" out of there?
i know that my start should be the position of CMGW + 6, but how do i make it stop as soon as it finds "\r" ??
again, my end result should be 3216
thank you!

Find the index of \r from the start of where you're interested in, and use the Substring overload which takes a length:
// Production code: add validation here.
// (Check for each index being -1, meaning "not found")
int cmgwIndex = text.IndexOf("CMGW: ");
// Just a helper variable; makes the code below slightly prettier
int startIndex = cmgwIndex + 6;
int crIndex = text.IndexOf("\r", startIndex);
string middlePart = text.Substring(startIndex, crIndex - startIndex);

If you know the position of 3216 then you can just do the following
string inner = some_string.SubString(positionOfCmgw+6,4);
This code will take the substring of some_string starting at the given position and only taking 4 characters.
If you want to be more general you could do the following
int start = positionOfCmgw+6;
int endIndex = some_string.IndexOf('\r', start);
int length = endIndex - start;
string inner = some_string.SubString(start, length);

One option would be to start from your known index and read characters until you hit a non-numeric value. Not the most robust solution, but it will work if you know your input's always going to look like this (i.e., no decimal points or other non-numeric characters within the numeric part of the string).
Something like this:
public static int GetNumberAtIndex(this string text, int index)
{
if (index < 0 || index >= text.Length)
throw new ArgumentOutOfRangeException("index");
var sb = new StringBuilder();
for (int i = index; i < text.Length; ++i)
{
char c = text[i];
if (!char.IsDigit(c))
break;
sb.Append(c);
}
if (sb.Length > 0)
return int.Parse(sb.ToString());
else
throw new ArgumentException("Unable to read number at the specified index.");
}
Usage in your case would look like:
string some_string = #"A simple demo of SMS text messaging.\r\n+CMGW: 3216\r\n...";
int index = some_string.IndexOf("CMGW") + 6;
int value = some_string.GetNumberAtIndex(index);
Console.WriteLine(value);
Output:
3216

If you're looking to extract the number portion of 'CMGW: 3216' then a more reliable method would be to use regular expressions. That way you can look for the entire pattern, and not just the header.
var some_string = "A simple demo of SMS text messaging.\r\n+CMGW: 3216\r\n\r\nOK\r\n";
var match = Regex.Match(some_string, #"CMGW\: (?<number>[0-9]+)", RegexOptions.Multiline);
var number = match.Groups["number"].Value;

More general, if you don't know the start position of CMGW but the structure remains as before.
String s;
char[] separators = {'\r'};
var parts = s.Split(separators);
parts.Where(part => part.Contains("CMGW")).Single().Reverse().TakeWhile(c => c != ' ').Reverse();

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

String.Substring() crashes with certain inputs - c#

Related

How do you do a string split with 2 chars counts in C#?

Taking certain string characters and returning the string

erroneous character fixing of strings in c#

Help me to understand this c# code

c# getting a string within another string

Categories

Resources