How to compare strings with multiple lines


I have a method that generates the following output, which I save in a string; let's say that string is named output:
HDRPB509030978SENTRIC MUSIC 01.102013070914290620130709
GRHREV0000102.100000000000
REV0000000000000000AWAITING YOUR CALL EN00000000044021 POP000436Y ORI PHIL
TRL000010000000100000022
What I am trying to do is hard-code the above lines and compare them to the generated output. I am hard-coding it like this, replacing each line break with \n:
string hardCoded = " HDRPB509030978SENTRIC MUSIC \n01.102013070914290620130709 \n GRHREV0000102.100000000000 \n REV0000000000000000AWAITING YOUR CALL \nEN00000000044021 POP000436Y ORI PHIL \n TRL000010000000100000022 ";
Now when I compare them with
output == hardCoded
OR
Assert.AreEqual(output,hardCoded);
the result is false. How do I compare these two?

Comparing multi-line strings is no different from comparing single-line strings: the strings you compare must match character-for-character, including whitespace and line breaks. If your generated string uses the \r\n separator instead of \n, then the string constant that you expect to get must contain the same separator as well. You can check which separators you use by setting a breakpoint and examining the generated string in a debugger.
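A minimal sketch of that comparison, assuming the only mismatch is the line-break style (\r\n versus \n): normalize both strings to \n before comparing. The helper names NormalizeNewlines and EqualsIgnoringLineEndings are hypothetical, not part of .NET. Note that this does not hide other differences, such as the extra spaces around each \n in the hard-coded string above.

```csharp
using System;

// Hypothetical helper: converts Windows (\r\n) and classic Mac (\r) line breaks to \n.
static string NormalizeNewlines(string s) =>
    s.Replace("\r\n", "\n").Replace("\r", "\n");

// Hypothetical helper: ordinal, character-by-character comparison after normalizing line breaks.
static bool EqualsIgnoringLineEndings(string a, string b) =>
    string.Equals(NormalizeNewlines(a), NormalizeNewlines(b), StringComparison.Ordinal);

string generated = "line one\r\nline two"; // e.g. output written with \r\n line breaks
string expected  = "line one\nline two";   // hard-coded with \n
Console.WriteLine(generated == expected);                          // False
Console.WriteLine(EqualsIgnoringLineEndings(generated, expected)); // True
```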
Rather than hard-coding the string for unit testing, consider reading it from a resource. This would let you edit the string in a text editor, and inspect it visually for differences.
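One way to do this, sketched under the assumption that a plain text file shipped with the tests is acceptable in place of an embedded resource: load the expected text and normalize its line breaks, so the file can be edited in any editor. The file name and the LoadExpected helper are hypothetical; the demo writes the file to the temp folder first so the sketch runs on its own.

```csharp
using System;
using System.IO;

// Hypothetical helper: reads the expected output from a text file and converts \r\n to \n,
// so the comparison does not depend on the line-ending style the file was saved with.
static string LoadExpected(string path) =>
    File.ReadAllText(path).Replace("\r\n", "\n");

// Demo only: create a sample file in the temp folder (hypothetical file name).
string path = Path.Combine(Path.GetTempPath(), "expected-output.txt");
File.WriteAllText(path, "HDR line\r\nTRL line");

string expected = LoadExpected(path);
Console.WriteLine(expected == "HDR line\nTRL line"); // True
```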
Finally, if you do not need the whitespace to match, you could define a function that compares strings excluding whitespace:
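// requires: using System; using System.Linq;
static bool EqualsExcludingWhitespace(String a, String b) {
    // Compare only the non-whitespace characters of both strings, in order.
    return a.Where(c => !Char.IsWhiteSpace(c))
        .SequenceEqual(b.Where(c => !Char.IsWhiteSpace(c)));
}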

Related

C# ignore part of string split

In my C# application I concatenate form data into a string in the format expected by a web service.
string firstName = "Test";
string lastName = "Test";
string freeText = "this is some free text, thanks";
string submitString = firstName + "," + lastName + "," + freeText;
Later in the application I need to pick this apart when it is returned from the webservice to be used somewhere else.
string[] returnData = submitString.Split(',');
However, if freeText contains a comma, Split breaks it into separate elements of the returnData array, and I would like to keep the contents of freeText as one whole string (despite it containing a comma).
Is there a quick way to ignore the commas inside that field when splitting (rather than stopping the customer from entering a comma)?

If the following two conditions are satisfied:
there is a fixed number of "fields" in your comma-separated string (e.g. 3) and
only the last one can contain additional commas,
then, yes, you can use the String.Split(char[], int) overload to specify the maximum number of items to return:
var s = "Test,Test,this is some free text, thanks";
var a = s.Split(new[] {','}, 3); // return at most 3 items
Console.WriteLine(a[0]); // prints Test
Console.WriteLine(a[1]); // prints Test
Console.WriteLine(a[2]); // prints this is some free text, thanks
Otherwise, the answer is "no", because String.Split has no way to see a difference between a "field-separating comma" and a "user-entered comma". How would it know to split Test,Test,free text, thanks,Test as Test/Test/free text, thanks/Test or Test/Test/free text/ thanks,Test?
However, there are a few other ways to solve this problem:
What you have is essentially a string with "comma-separated values" (CSV). If you use a professional CSV library (instead of String.Join/String.Split), values that contain commas will be quoted, and those commas will be ignored when extracting the values.
An easier solution might be to use a different string format altogether: If you encode your values in a JSON array instead of a CSV string, the JSON library will take care of encoding/decoding values that include special characters.
Obviously, if you can avoid encoding all values in a single string at all and just use an array or some other data structure instead, the problem would just disappear. However, there is not enough background in your question to know whether this is a viable option.
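A sketch of the JSON option, assuming System.Text.Json is available (it ships with .NET Core 3.0 and later); other JSON libraries work along the same lines. The comma inside the third value survives the round trip because it sits inside a quoted JSON string.

```csharp
using System;
using System.Text.Json;

string[] values = { "Test", "Test", "this is some free text, thanks" };

// Encode the values as a JSON array; commas inside a value stay inside its quoted string.
string submitString = JsonSerializer.Serialize(values);
Console.WriteLine(submitString); // ["Test","Test","this is some free text, thanks"]

// Decode back into exactly three elements.
string[] returnData = JsonSerializer.Deserialize<string[]>(submitString) ?? Array.Empty<string>();
Console.WriteLine(returnData.Length); // 3
Console.WriteLine(returnData[2]);     // this is some free text, thanks
```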

Is it possible to enter a New Line in a string without Escape Sequences?

I want a string to have a new line in it, but I cannot use escape sequences because the interface I am sending my string to does not recognize them. As far as I know, C# does not actually store a new line in the string; rather, it stores the escape sequence, causing the literal characters to be passed rather than what they actually mean.
My best guess is that I would somehow have to convert the number 10 (the decimal value of a line feed in the ASCII table) into a character. But I'm not sure how to do that, because C# converts the number to its string form ("10") if I try this:
"hello" + 10 + "world"
Any suggestions?

If you say "hello\nworld", the actual string will contain:
hello
world
There will be an actual new-line character in the string. At no point are the characters \ and n stored in the string.
There are a few ways to get the exact same result, but a simple \n in the string is a common way.
A simple cast should also do the same:
"hello" + (char)10 + "world"
This is likely slightly slower, because it involves string concatenation at run time. I say "likely" because it could probably be optimized away, or because a real-world example using \n would also involve string concatenation, taking roughly the same amount of time.
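A small check of both claims: the compiled string holds one line-feed character (code 10) and no backslash, and the (char)10 cast yields an identical string.

```csharp
using System;

string s = "hello\nworld";

// The compiler turns the escape sequence \n into a single line-feed character (code 10).
Console.WriteLine(s.Length);        // 11: "hello" + one character + "world"
Console.WriteLine(s.IndexOf('\\')); // -1: no backslash is stored
Console.WriteLine((int)s[5]);       // 10

// Casting the number 10 to char produces an identical string.
string t = "hello" + (char)10 + "world";
Console.WriteLine(s == t);          // True
```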

If you want the current platform's newline sequence, use Environment.NewLine: it is \r\n on Windows and \n on Unix-like systems, which makes it the portable choice.
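A sketch showing the platform dependence. If the receiving interface expects one specific separator (for example, exactly \n), Environment.NewLine may not be what it wants on every platform, so it is worth checking against the receiver's requirements.

```csharp
using System;

// Environment.NewLine is "\r\n" on Windows and "\n" on Unix-like systems such as Linux and macOS.
string text = "hello" + Environment.NewLine + "world";
Console.WriteLine(Environment.NewLine == "\r\n" ? "CRLF" : "LF");
Console.WriteLine(text.Length); // 12 on Windows, 11 on Linux/macOS
```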

You could use XML for communication, if your receiver can handle it.

Find and replace ASCII character with a new line

I am trying to find every occurrence of an ASCII character in a string and replace it with a new line. Here is what I have so far:
public string parseText(string inTxt)
{
//String builder based on the string passed into the method
StringBuilder n = new StringBuilder(inTxt);
//Convert the ASCII character we're looking for to a string
string replaceMe = char.ConvertFromUtf32(187);
//Replace all occurrences of the string with a new line
n.Replace(replaceMe, Environment.NewLine);
//Convert our StringBuilder to a string and output it
return n.ToString();
}
This does not add a new line; the string all remains on one line. I'm not sure what the problem is. I have also tried this, with the same result:
n.Replace(replaceMe, "\n");
Any suggestions?

char.ConvertFromUtf32, whilst correct, is not the simplest way to read a character based on its ASCII numeric value. (ConvertFromUtf32 is mainly intended for Unicode code points that lie outside the BMP, which result in surrogate pairs. This is not something you'd encounter in English or most modern languages.) Rather, you should just cast it using (char).
char c = (char)187;
string replaceMe = c.ToString();
You may, of course, define a string with the required character as a literal in your code: "»".
Your Replace would then be simplified to:
n.Replace("»", "\n");
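A quick sketch of the cast and the simplified Replace. The » character is written as the escape \u00BB here only so that the example does not depend on the source file's encoding; the literal "»" works the same when the file is saved as UTF-8.

```csharp
using System;
using System.Text;

// Casting the numeric value gives the character directly.
char c = (char)187;
Console.WriteLine(c == '\u00BB');                              // True (the » character)
Console.WriteLine(char.ConvertFromUtf32(187) == c.ToString()); // True: same result for this BMP code point

// Replace every » with a line feed; StringBuilder.Replace edits the builder in place.
var n = new StringBuilder("first\u00BBsecond");
n.Replace("\u00BB", "\n");
Console.WriteLine(n.ToString()); // "first" and "second" on separate lines
```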
Finally, on a technical level, ASCII only covers characters whose value lies in the 0–127 range. Character 187 is not ASCII; however, it corresponds to » in ISO 8859-1, Windows-1252, and Unicode, which collectively are by far the most popular encodings in use today.
Edit: I just tested your original code and found that it actually works. Are you sure the result remains on one line? It might be an issue with the way the debugger renders strings in its single-line view, where line breaks are displayed as \r\n escape sequences.
Note that those \r\n sequences do represent actual newlines, despite being displayed as literals. You can check this from the multi-line display (by clicking on the magnifying glass next to the value).
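A sketch reproducing that test: the question's method, written as a local function, run on a made-up sample string. The result splits into two lines because StringBuilder.Replace changes the builder it is called on, so ignoring its return value does not matter here.

```csharp
using System;
using System.Text;

// The method from the question, unchanged apart from formatting.
static string parseText(string inTxt)
{
    StringBuilder n = new StringBuilder(inTxt);
    string replaceMe = char.ConvertFromUtf32(187);
    n.Replace(replaceMe, Environment.NewLine); // modifies n in place; return value ignored
    return n.ToString();
}

string result = parseText("line one\u00BBline two");
string[] lines = result.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
Console.WriteLine(lines.Length); // 2
Console.WriteLine(result);       // printed on two lines
```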

StringBuilder.Replace modifies the builder in place and returns a reference to that same StringBuilder (so that calls can be chained), so this should work as well:
StringBuilder replaced = n.Replace(replaceMe, Environment.NewLine);
return replaced.ToString();

Comparing strings that contain formatting in C#

I'm working on a function that modifies its output (in string form) according to some settings, such as line spacing. To test such scenarios, I'm using string literals for the expected result, as shown below.
The method generates this output using a StringBuilder (AppendLine). One issue I have run into is comparing such strings. In the example below, both strings are equal in terms of what they represent, and that result is what I care about; however, when comparing the two strings, one literal and one not, equality naturally fails. This is because one of the strings emits line breaks, while the other only shows the escape sequences it contains.
What would be the best way of solving this equality problem? I do care about formatting such as new lines in the result of the method; this is crucially important.
Code:
string expected = @"Test\n\n\nEnd Test.";
string result = "Test\n\n\nEnd Test";
Console.WriteLine(expected);
Console.WriteLine(result);
Output:
Test\n\n\nEnd Test.
Test


End Test

The @ prefix (a verbatim string literal) tells the compiler to take the string exactly as it is written. So it doesn't turn \n into a line-feed character; the backslash and the n stay as two separate characters.
Since the string assigned to your result variable doesn't have the same prefix, the compiler processes its escape sequences. If you would like to continue to use the @ prefix, just do the following:
string expected = @"Test


End Test";
You'll have to type the line breaks directly inside the string. They become whatever line endings the source file is saved with (typically \r\n on Windows), so check that they match what your method produces (AppendLine appends Environment.NewLine).
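A sketch of the difference, with made-up strings. The last comparison normalizes \r\n to \n because a multi-line verbatim literal picks up the line endings of the file it is saved in.

```csharp
using System;

string regular  = "Test\nEnd";  // \n becomes one line-feed character
string verbatim = @"Test\nEnd"; // backslash and 'n' stay as two characters

Console.WriteLine(regular.Length);  // 8
Console.WriteLine(verbatim.Length); // 9

// A verbatim literal spanning several lines contains the source file's own line endings,
// so normalize them before comparing with \n-based text.
string multiLine = @"Test
End";
Console.WriteLine(multiLine.Replace("\r\n", "\n") == regular); // True
```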

You're using the term "literal" incorrectly. "Literal" simply means an actual value that exists in code. In other words, values exist in code either as variables (for the sake of simplicity I'm including constants in this group) or as literals. Variables are an abstract notion of a value, whereas literals are a value.
All this is to say that both of your strings are string literals, as they're hard-coded into your application. The @ prefix simply tells the compiler to keep backslashes (indeed, anything other than a double quote) in the string as written, rather than evaluating the escape sequences when compiling the string literal into the assembly.
First of all, whatever your function returns (either a string that contains standard escape sequences for newlines rather than newlines themselves, or a string that actually contains newlines) is what your test variable should contain. Make your tests as close to the actual output as possible, as the more work you do to massage the values into a comparable form the more code paths you have to test. If you're looking to be able to compare a string with formatting escape sequences embedded into it to a string where those sequences have been evaluated (essentially comparing the two strings in your example), then I would say this:
1. Be sure that this is really what you want to do.
2. You'll have to duplicate the functionality of the C# compiler in interpreting these values and turning your "format string" into a "formatted string".
For item 2, a RegEx processor is probably going to be the simplest option. See the C# language documentation for the list of string escape sequences.
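A minimal sketch of item 2 using Regex.Unescape from System.Text.RegularExpressions. It assumes only common escapes such as \n, \t and \\ appear: Regex.Unescape follows .NET regular-expression escape rules, which overlap with, but are not identical to, C# string escapes.

```csharp
using System;
using System.Text.RegularExpressions;

string formatString = @"Test\n\n\nEnd Test"; // escape sequences kept as text
string formatted    = "Test\n\n\nEnd Test";  // escape sequences evaluated by the compiler

// Regex.Unescape converts escapes such as \n, \t and \\ into the characters they stand for.
string unescaped = Regex.Unescape(formatString);
Console.WriteLine(unescaped == formatted); // True
```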

I feel somewhat enlightened, yet annoyed at what I discovered.
This is my first project using MSTest, and after a failing test I was selecting View Test Details to see how and why my test failed. The formatting for string output in this details display is very poor, for example you get:
Assert.AreEqual failed. Expected:<TestTest End>. Actual:<TestTest End>.
This is for formatted text; the strange thing is that if you have \r (carriage returns) instead of \n (line feeds), the formatting is actually somewhat correct.
It turns out that to view the correct output you need to run the tests in debug mode. In other words, when you have a failing test, run it in debug mode and the exception will be caught and displayed as follows:
Assert.AreEqual failed. Expected:<Test
Test End>. Actual:<Test
Test End>.
The above obviously containing the correct formatting.
In the end, my initial method of storing the expectations (with formatting) in strings turned out to be correct; my unfamiliarity with MSTest made me question it, because valid values were being displayed back to me in a misleading way.

Use a regex to strip whitespace before you do your comparison?
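A sketch of that suggestion, with made-up sample strings and a hypothetical StripWhitespace helper. It discards every whitespace difference, including line breaks, so it only fits tests where formatting does not matter.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes every whitespace character (spaces, tabs, \r, \n, ...).
static string StripWhitespace(string s) => Regex.Replace(s, @"\s+", "");

string expected = " HDR \n TRL ";
string actual   = "HDR\r\nTRL";
Console.WriteLine(expected == actual);                                   // False
Console.WriteLine(StripWhitespace(expected) == StripWhitespace(actual)); // True
```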

.NET string IndexOf unexpected result

A string variable str contains the following somewhere inside it: se\">
I'm trying to find the beginning of it using:
str.IndexOf("se\\\">")
which returns -1
Why isn't it finding the substring?
Note: because of an edit, the snippet showed five backslashes in a row for a while; the original had three.

Your code is in fact searching for 'se\\">'. When searching for strings including backslashes I usually find it easier to use verbatim strings:
str.IndexOf(@"se\"">")
In this case you also have a quote in the search string, so there is still some escaping, but I personally find it easier to read.
Update: my answer was based on the edit that introduced extra slashes in the parameter to the IndexOf call. Based on current version, I would place my bet on str simply not containing the expected character sequence.
Update 2:
Based on the comments on this answer, there seems to be some confusion regarding the role of the '\' character in the strings. When you inspect a string in the Visual Studio debugger, it is displayed with escape characters.
So, if you have a text box and type 'c:\' in it, inspecting the Text property in the debugger will show 'c:\\'. An extra backslash is added for escaping purposes. The actual string content is still 'c:\' (which can be verified by checking the Length property of the string; it will be 3, not 4).
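A two-line check of the Length claim:

```csharp
using System;

// The debugger displays this string as "c:\\", but it contains a single backslash.
string path = "c:\\";
Console.WriteLine(path.Length);    // 3
Console.WriteLine(path == @"c:\"); // True
```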
If we take the following string (taken from one of the comments)
" '<em
class=\"correct_response\">a
night light</em><br
/><br /><table
width=\"100%\"><tr><td
class=\"right\">Ingrid</td></tr></table>')"
...the \" sequences are simply escaped quotation marks; the backslashes are not part of the string content. So, you are in fact looking for 'se">', not 'se\">'. Either of these will work:
str.IndexOf(@"se"">"); // verbatim string; escape quotation mark by doubling it
str.IndexOf("se\">"); // regular string; escape quotation mark using backslash
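A sketch with a shortened version of the string above (trimmed for brevity), showing that both forms find the substring and that the triple-backslash form does not:

```csharp
using System;

// What the debugger displays as  class=\"correct_response\"  is really  class="correct_response".
string str = "<em class=\"correct_response\">a night light</em>";

Console.WriteLine(str.IndexOf("se\">"));   // 25: regular string, \" is a quote
Console.WriteLine(str.IndexOf(@"se"">"));  // 25: verbatim string, "" is a quote
Console.WriteLine(str.IndexOf("se\\\">")); // -1: searches for se\"> with a real backslash
```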

This works:
string str = "<case\\\">";
int i = str.IndexOf("se\\\">"); // i = 3
Maybe you're not correctly escaping one of the two strings?
Edit: there's an extra pair of backslashes in the string you are searching for.

Maybe the str variable does not actually contain the backslash.
It may be just that when you mouse over the variable while debugging, the debugger tooltip will show the escape character.
e.g. If you put a breakpoint after this assignment
string str = "123\"456";
the tooltip will show 123\"456 and not 123"456.
However if you click on the visualize icon, you will get the correct string 123"456

Following code:
public static void RunSnippet()
{
string s = File.ReadAllText (@"D:\txt.txt");
Console.WriteLine (s);
int i = s.IndexOf("se\\\">");
Console.WriteLine (i);
}
Gives following output:
some text before se\"> some text after
17
Seems to work for me...

TextBox2.Text = TextBox1.Text.IndexOf("se\"">")
seems to work in VB.

Consider using verbatim strings, in which a double quote has to be written as "". For example:
var source = @"abdefghise\"">jklmon";
Console.WriteLine(source.IndexOf(@"se\"">")); // returns 8

If you are looking for se\">
then
str.IndexOf(@"se\"">")
is less error-prone. Note the double "" and single \
Edit, after the comment: it seems the string as displayed may itself contain escaping, in which case the \" in se\"> was an escaped quote, so the actual text is simply se"> and the call to use is IndexOf("se\">").
