Unquote string in C#


I have a data file in an INI-like format that needs to be read by both some C code and some C# code. The C code expects string values to be surrounded by quotes. The equivalent C# code uses an underlying class that I have no control over, and it includes the quotes as part of the output string. That is, data file contents of
MY_VAL="Hello World!"
gives me
"Hello World!"
in my C# string, when I really need it to contain
Hello World!
How do I conditionally (only when the first and last characters are both ") remove the quotes and get the string contents that I want?

On your string use Trim with the " as char:
.Trim('"')

I usually call String.Trim() for that purpose:
string source = "\"Hello World!\"";
string unquoted = source.Trim('"');
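One caveat with Trim('"'), shown here as a minimal sketch: it strips every leading and trailing quote whether or not they are balanced, so it is not really "conditional". The class name, the wrapper method, and the sample values are made up for illustration.

```csharp
using System;

public static class TrimQuoteDemo
{
    // Thin wrapper (hypothetical name) so the behaviour is easy to call and compare.
    public static string TrimQuotes(string s) => s.Trim('"');

    public static void Main()
    {
        Console.WriteLine(TrimQuotes("\"Hello World!\""));  // Hello World!
        Console.WriteLine(TrimQuotes("\"unbalanced"));       // unbalanced (lone quote removed too)
        Console.WriteLine(TrimQuotes("\"\"doubled\"\""));    // doubled (all bounding quotes removed)
    }
}
```

If a lone or doubled quote must survive, a check on the first and last character before stripping is the safer route.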

My implementation checks that the quotes are present on both sides:
public string UnquoteString(string str)
{
    if (String.IsNullOrEmpty(str))
        return str;
    int length = str.Length;
    // strip only when the string both starts and ends with a quote
    if (length > 1 && str[0] == '\"' && str[length - 1] == '\"')
        str = str.Substring(1, length - 2);
    return str;
}

Just take the returned string and do a Trim('"');

Being obsessive here (that's me; no comment about you), you may want to consider
.Trim(' ').Trim('"').Trim(' ')
so that any bounding spaces outside the quoted string are trimmed first, then the quotation marks are stripped, and finally any bounding spaces inside the quotes are removed.
If you want to keep bounding white space inside the quotes, omit the final .Trim(' ').
Embedded spaces and quotation marks are preserved. Chances are they are intended and should not be deleted.
Check what the no-argument Trim() does with characters such as form feeds and tabs: it removes all leading and trailing white space, not just spaces, but it never touches embedded characters. It may be that one or both of the Trim(' ') calls should be just Trim().
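To make the difference between Trim(' ') and the no-argument Trim() concrete, here is a small sketch. The class and method names are invented for the example, and the input is an assumed worst case with a tab in front and a CR/LF behind the quoted value.

```csharp
using System;

public static class WhitespaceTrimDemo
{
    // Trims only the space character, then the quotes.
    public static string SpacesOnly(string s) => s.Trim(' ').Trim('"');

    // Trims all leading/trailing white space (tabs, newlines, form feeds, ...), then the quotes.
    public static string AllWhitespace(string s) => s.Trim().Trim('"');

    public static void Main()
    {
        string raw = "\t \"Hello World!\" \r\n";

        // The tab and the line break block Trim(' '), so the quotes are never reached.
        Console.WriteLine(SpacesOnly(raw) == raw);            // True

        Console.WriteLine("[" + AllWhitespace(raw) + "]");    // [Hello World!]
    }
}
```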

If you know there will always be " at the end and beginning, this would be the fastest way.
s = s.Substring(1, s.Length - 2);

Use string replace function or trim function.
If you just want to remove first and last quotes use substring function.
string myworld = "\"Hello World!\"";
string start = myworld.Substring(1, (myworld.Length - 2));

I would suggest using the Replace() method. Note that it removes every quote in the string, including any embedded ones.
string str = "\"HelloWorld\"";
string result = str.Replace("\"", string.Empty);

What you are trying to do is often called "stripping" or "unquoting". Usually, a quoted value is not only surrounded by quotation characters (like " in this case); it may also contain escape sequences that let the quotation character itself appear inside the quoted text.
In short, you should consider using something like:
string s = @"""Hey """"Mikey""""!""";
s = s.Trim('"').Replace(@"""""", @"""");
Or, when the apostrophe is the quote character:
string s = @"'Hey ''Mikey''!'";
s = s.Trim('\'').Replace("''", "'");
Also, values that contain no whitespace are sometimes not quoted at all. That is why it is reasonable to check for quotation characters before trimming.
Consider creating a helper function that does this job, as in the example below.
using System;
public class Program
{
    public static string StripQuotes(string text, char quote, string unescape)
    {
        string with = quote.ToString();
        if (quote != '\0')
        {
            // strip only when the text both starts and ends with the quote character
            if (text.Length >= 2 && text.StartsWith(with) && text.EndsWith(with))
            {
                text = text.Trim(quote);
            }
        }
        if (!string.IsNullOrEmpty(unescape))
        {
            text = text.Replace(unescape, with);
        }
        return text;
    }
    public static void Main()
    {
        string text = @"""Hello World!""";
        Console.WriteLine(text);
        // That will do the job
        // Output: Hello World!
        string strippedText = text.Trim('"');
        Console.WriteLine(strippedText);
        string escapedText = @"""My name is \""Bond\"".""";
        Console.WriteLine(escapedText);
        // That will *not* do the job well
        // Output: My name is \"Bond\".
        string strippedEscapedText = escapedText.Trim('"');
        Console.WriteLine(strippedEscapedText);
        // Allow \" to be used inside quoted text
        // Output: My name is "Bond".
        string strippedEscapedText2 = escapedText.Trim('"').Replace(@"\""", @"""");
        Console.WriteLine(strippedEscapedText2);
        // Use a function that checks whether the text is quoted
        // and unescapes it if needed.
        string t1 = @"""My name is \""Bond\"".""";
        // Output: "My name is \"Bond\"."
        Console.WriteLine(t1);
        // Output: My name is "Bond".
        Console.WriteLine(StripQuotes(t1, '"', @"\"""));
        string t2 = @"""My name is """"Bond"""".""";
        // Output: "My name is ""Bond""."
        Console.WriteLine(t2);
        // Output: My name is "Bond".
        Console.WriteLine(StripQuotes(t2, '"', @""""""));
    }
}
https://dotnetfiddle.net/TMLWHO

Here's my solution as extension method:
public static class StringExtensions
{
public static string UnquoteString(this string inputString) => inputString.TrimStart('"').TrimEnd('"');
}
It just trims at the start and the end...

Related

C# - Split a string separated by ':'

I'm trying to split this string:
PublishDate: "2011-03-18T11:08:07.983"
I tried the Split method, but without success.
str.Split(new[] { ':', ' ' }, StringSplitOptions.RemoveEmptyEntries)
As a result I get PublishDate 2011-03-18T11 08 07.983
But the correct result is PublishDate 2011-03-18T11:08:07.983
What do I need to do?

Split(String, Int32, StringSplitOptions)
Splits a string into a maximum number of substrings based on a specified delimiting string and, optionally, options.
str.Split(':', 2, StringSplitOptions.RemoveEmptyEntries)
https://learn.microsoft.com/en-us/dotnet/api/system.string.split?view=net-6.0#system-string-split(system-string-system-int32-system-stringsplitoptions)
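Limiting the split to two parts keeps the colons inside the timestamp, but the second part still carries the leading space and the quotes. A minimal sketch of the clean-up follows. SplitNameValue is a hypothetical helper name, and the char[] overload is used on the assumption that it is available on every .NET version.

```csharp
using System;

public static class SplitOnceDemo
{
    // Splits "Name: \"value\"" at the first colon only, then cleans up both halves.
    public static string[] SplitNameValue(string line)
    {
        string[] parts = line.Split(new[] { ':' }, 2, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < parts.Length; i++)
        {
            // Remove surrounding white space first, then surrounding quotes.
            parts[i] = parts[i].Trim().Trim('"');
        }
        return parts;
    }

    public static void Main()
    {
        string[] parts = SplitNameValue("PublishDate: \"2011-03-18T11:08:07.983\"");
        Console.WriteLine(string.Join(" ", parts)); // PublishDate 2011-03-18T11:08:07.983
    }
}
```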

I would solve like this:
locate the index of the first :. The property name will be all the characters before this, which you can extract with Substring and Trim to remove whitespace before the colon, if present.
locate the index of the first " and last ". Characters between the first and last quotes are the property value.
string input = "PublishDate: \"2011-03-18T11:08:07.983\"";
int iColon = input.IndexOf(':');
int iOpenQuote = input.IndexOf('"', iColon);
int iCloseQuote = input.LastIndexOf('"');
string propertyName = input.Substring(0, iColon).Trim();
string propertyValue = input.Substring(iOpenQuote + 1, iCloseQuote - iOpenQuote - 1);
This does not handle escaped characters within the property value (for example, a literal quote or newline embedded with a typical escape sequence like \" or \n). But it is likely good enough to extract a date/time string, and it permits all characters in the value because of the use of LastIndexOf. However, it is not robust against malformed input, so you will want to add checks for a missing colon, a missing opening quote, or a missing closing quote (in which case the start and end quote have the same index).
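The checks mentioned above could look roughly like this. It is only a sketch: PropertyLineParser and TryParse are names invented here, and returning false on malformed input is one possible design choice among several (throwing an exception would be another).

```csharp
using System;

public static class PropertyLineParser
{
    // Returns false instead of throwing when the colon or either quote is missing.
    public static bool TryParse(string input, out string name, out string value)
    {
        name = null;
        value = null;
        if (string.IsNullOrEmpty(input))
            return false;

        int iColon = input.IndexOf(':');
        if (iColon < 0)
            return false;                                  // no colon

        int iOpenQuote = input.IndexOf('"', iColon);
        int iCloseQuote = input.LastIndexOf('"');
        if (iOpenQuote < 0 || iCloseQuote <= iOpenQuote)
            return false;                                  // no quote, or only one quote

        name = input.Substring(0, iColon).Trim();
        value = input.Substring(iOpenQuote + 1, iCloseQuote - iOpenQuote - 1);
        return true;
    }

    public static void Main()
    {
        if (TryParse("PublishDate: \"2011-03-18T11:08:07.983\"", out string n, out string v))
        {
            Console.WriteLine(n + " " + v); // PublishDate 2011-03-18T11:08:07.983
        }
    }
}
```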

So if I got you right, you want this as the result: PublishDate 2011-03-18T11:08:07.983.
In that case I would recommend the string.Replace method.
using System;
public class HelloWorld
{
    public static void Main(string[] args)
    {
        string yourData = "PublishDate: \"2011-03-18T11:08:07.983\"";
        // First replace the colon and the space after PublishDate with a single space,
        // then remove the quotes around the timestamp -> "2011-03-18T11:08:07.983"
        yourData = yourData.Replace(": ", " ").Replace("\"", "");
        // Output the result -> PublishDate 2011-03-18T11:08:07.983
        Console.WriteLine(yourData);
    }
}

.NET Regex To Remove Line Breaks Within Quotes

I am trying to clean up a text file so that it can be imported into Excel but the text file contains line breaks within several of the double quoted fields. The file is tab delimited.
Example would be:
"12313"\t"1234"\t"123
5679"
"test"\t"test"\t"test"
"test"\t"test"\t"test"
"12313"\t"1234"\t"123
5679"
I need to remove the line breaks so that it will ultimately display like:
"12313"\t"1234"\t"1235679"
"test"\t"test"\t"test"
"test"\t"test"\t"test"
"12313"\t"1234"\t"1235679"
The "\t" is the tab delimiter.
I've looked at several other solutions on SO but they don't seem to deal with multiple lines. We've tried using several CSV parser solutions but can't seem to get them to work for this scenario. The goal is to pass the entire string into a REGEX expression and have it return with all line breaks between quotes removed while the line breaks outside of the quotes remain.

You can use this regex:
(?!(([^"]*"){2})*[^"]*$)\n+
Working Demo
This one matches one or more newline characters that are not followed by an even number of quotes. (It assumes there are no escaped quotes in the data.)

This worked for me:
var fixedCsvFileContent = Regex.Replace(csvFileContent, @"(?!(([^""]*""){2})*[^""]*$)\n+", string.Empty);
This didn't work:
var fixedCsvFileContent = Regex.Replace(csvFileContent, @"(?!(([^""]*""){2})*[^""]*$)\n+", string.Empty, RegexOptions.Multiline);
So do not add RegexOptions.Multiline here: with that option $ matches at the end of every line instead of only at the end of the input, so the lookahead no longer counts the quotes up to the end of the string.
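A small sketch of both variants on a shortened version of the sample data from the question. The class and method names are hypothetical and the input is cut down to two records to keep the example short.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedNewlineDemo
{
    // Same pattern as in the answers above; $ has to mean "end of the whole string".
    private const string Pattern = @"(?!(([^""]*""){2})*[^""]*$)\n+";

    public static string RemoveNewlinesInsideQuotes(string text) =>
        Regex.Replace(text, Pattern, string.Empty);

    // For comparison only: with Multiline, $ matches before every newline.
    public static string WithMultiline(string text) =>
        Regex.Replace(text, Pattern, string.Empty, RegexOptions.Multiline);

    public static void Main()
    {
        string input = "\"12313\"\t\"1234\"\t\"123\n5679\"\n\"test\"\t\"test\"\t\"test\"";

        // The break inside "123\n5679" is removed, the break between the records stays.
        Console.WriteLine(RemoveNewlinesInsideQuotes(input));

        // With Multiline nothing is removed at all.
        Console.WriteLine(WithMultiline(input) == input); // True
    }
}
```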

If just removing blank lines works:
string text = Regex.Replace(inputString, @"\n\n", "", RegexOptions.None | RegexOptions.Multiline);

I ran into a similar problem, but some of my files can be very large, so running a regex over the whole content would be heavy. Instead I wanted something like ReadLine that ignores line breaks inside quotes. This is the solution I am using.
It is an extension to the StreamReader class, used for reading the CSV files. Like some of the regex solutions here, it makes sure the number of quotes is even: it calls ReadLine, checks whether the number of quotes is odd and, if so, calls ReadLine again until the number of quotes is even:
public static class Extensions
{
    public static string ReadEntry(this StreamReader sr)
    {
        string strReturn = "";
        //get first bit
        strReturn += sr.ReadLine();
        //And get more lines until the number of quotes is even
        while (strReturn.GetNumberOf("\"").IsOdd())
        {
            string strNow = sr.ReadLine();
            //stop at end of file so an unbalanced quote cannot loop forever
            if (strNow == null)
            {
                break;
            }
            strReturn += strNow;
        }
        //Then return what we've gotten
        if (strReturn == "")
        {
            return null;
        }
        else
        {
            return strReturn;
        }
    }
    public static int GetNumberOf(this string s, string strSearchString)
    {
        return s.Length - s.Replace(strSearchString, "").Length;
    }
    public static Boolean IsOdd(this int i)
    {
        return i % 2 != 0;
    }
}

string output = Regex.Replace(input, @"(?<=[^""])\r\n", string.Empty);
Demo with the input provided

Insert spaces between words on a camel-cased token [duplicate]

Is there a nice function to turn something like
FirstName
to this:
First Name?

See: .NET - How can you split a "caps" delimited string into an array?
Especially:
Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")

Here's an extension method that I have used extensively for this kind of thing
public static string SplitCamelCase( this string str )
{
    return Regex.Replace(
        Regex.Replace(
            str,
            @"(\P{Ll})(\P{Ll}\p{Ll})",
            "$1 $2"
        ),
        @"(\p{Ll})(\P{Ll})",
        "$1 $2"
    );
}
It also handles strings like IBMMakeStuffAndSellIt, converting it to IBM Make Stuff And Sell It (IIRC).
Syntax explanation (credit):
{Ll} is the Unicode character category "Letter, lowercase" (as opposed to {Lu}, "Letter, uppercase"). \P is a negative match, while \p is a positive match, so \P{Ll} is literally "not lowercase" and \p{Ll} is "lowercase".
So this regex splits on two patterns. 1: "Uppercase, Uppercase, Lowercase" (which would match the MMa in IBMMake and result in IBM Make), and 2. "Lowercase, Uppercase" (which would match on the eS in MakeStuff). That covers all camelcase breakpoints.
TIP: Replace space with hyphen and call ToLower to produce HTML5 data attribute names.
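The tip could be sketched as follows. ToDataAttributeName and the "data-" prefix are assumptions made for the example. SplitCamelCase repeats the two-pass regex from above as a plain static method so the block stands on its own.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DataAttributeDemo
{
    // Same two-pass regex as the extension method above.
    public static string SplitCamelCase(string str)
    {
        string pass1 = Regex.Replace(str, @"(\P{Ll})(\P{Ll}\p{Ll})", "$1 $2");
        return Regex.Replace(pass1, @"(\p{Ll})(\P{Ll})", "$1 $2");
    }

    // Hypothetical helper: "FirstName" -> "data-first-name".
    public static string ToDataAttributeName(string propertyName) =>
        "data-" + SplitCamelCase(propertyName).Replace(' ', '-').ToLowerInvariant();

    public static void Main()
    {
        Console.WriteLine(SplitCamelCase("IBMMakeStuffAndSellIt")); // IBM Make Stuff And Sell It
        Console.WriteLine(ToDataAttributeName("FirstName"));        // data-first-name
    }
}
```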

Simplest Way:
var res = Regex.Replace("FirstName", "([A-Z])", " $1").Trim();

You can use a regular expression:
Match ([^^])([A-Z])
Replace $1 $2
In code:
String output = System.Text.RegularExpressions.Regex.Replace(
input,
"([^^])([A-Z])",
"$1 $2"
);

/// <summary>
/// Parse the input string by placing a space between character case changes in the string
/// </summary>
/// <param name="strInput">The string to parse</param>
/// <returns>The altered string</returns>
public static string ParseByCase(string strInput)
{
// The altered string (with spaces between the case changes)
string strOutput = "";
// The index of the current character in the input string
int intCurrentCharPos = 0;
// The index of the last character in the input string
int intLastCharPos = strInput.Length - 1;
// for every character in the input string
for (intCurrentCharPos = 0; intCurrentCharPos <= intLastCharPos; intCurrentCharPos++)
{
// Get the current character from the input string
char chrCurrentInputChar = strInput[intCurrentCharPos];
// At first, set previous character to the current character in the input string
char chrPreviousInputChar = chrCurrentInputChar;
// If this is not the first character in the input string
if (intCurrentCharPos > 0)
{
// Get the previous character from the input string
chrPreviousInputChar = strInput[intCurrentCharPos - 1];
} // end if
// Put a space before each upper case character if the previous character is lower case
if (char.IsUpper(chrCurrentInputChar) == true && char.IsLower(chrPreviousInputChar) == true)
{
// Add a space to the output string
strOutput += " ";
} // end if
// Add the character from the input string to the output string
strOutput += chrCurrentInputChar;
} // next
// Return the altered string
return strOutput;
} // end method

Regex:
http://weblogs.asp.net/jgalloway/archive/2005/09/27/426087.aspx
http://stackoverflow.com/questions/773303/splitting-camelcase
(probably the best - see the second answer)
http://bytes.com/topic/c-sharp/answers/277768-regex-convert-camelcase-into-title-case
To convert from UpperCamelCase to Title Case, use this line:
Regex.Replace("UpperCamelCase", @"(\B[A-Z])", @" $1");
To convert from both lowerCamelCase and UpperCamelCase to Title Case, use a MatchEvaluator:
public string toTitleCase(Match m)
{
    char c = m.Captures[0].Value[0];
    return ((c >= 'a') && (c <= 'z')) ? Char.ToUpper(c).ToString() : " " + c;
}
and change your regex a little with this line:
Regex.Replace("UpperCamelCase or lowerCamelCase", @"(\b[a-z]|\B[A-Z])", new MatchEvaluator(toTitleCase));

Read from file without special characters

I'm using a StreamReader to open a text file and grab its contents. I need to grab just the text from the file without any escape characters (\n, \r, \", etc.). Google is failing me right now. Any ideas?

There are no escape characters in text that you read from a file. Escape characters are used when you write a string literal, for example in program code. I assume you mean that you want to replace any white-space characters with plain spaces.
You can use a regular expression to match white-space characters and replace them with spaces. It's easier to use File.ReadAllText to read the text from the file:
string text = Regex.Replace(File.ReadAllText(fileName), @"[\r\n\t ]+", " ");

Why don't you just call ReadToEnd and then Split the string?
// using statement and whatever code here
var rawContent = sr.ReadToEnd();
var usefulContent = rawContent.Split(new []{ "\r\n", "\\" },
StringSplitOptions.RemoveEmptyEntries);
Note: you'll want to tweak the separators in the Split method; this is just an example.
You could also simply Replace the unwanted characters:
// using statement and whatever code here
var rawContent = sr.ReadToEnd();
var usefulContent = rawContent
.Replace("\r\n", "" )
.Replace("\\", "");

If you're trying to do it as you stream, call StreamReader.Read() in a while loop and test the characters one by one.
If you're able to grab the entire file contents into a string, use a regular expression to strip the undesirable characters. Check out RegexHero: http://regexhero.net/tester/
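A minimal sketch of the streaming variant. Which characters count as "undesirable" is an assumption here (CR, LF and tab), the method name is invented, and a StringReader stands in for the file's StreamReader so the example runs without a file on disk.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamFilterDemo
{
    // Reads one character at a time and drops CR, LF and tab.
    // Works with any TextReader, including a StreamReader over a file.
    public static string ReadWithoutControlChars(TextReader reader)
    {
        var sb = new StringBuilder();
        int next;
        while ((next = reader.Read()) != -1)   // Read() returns -1 at end of input
        {
            char c = (char)next;
            if (c != '\r' && c != '\n' && c != '\t')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        using (var reader = new StringReader("one\r\ntwo\tthree"))
        {
            Console.WriteLine(ReadWithoutControlChars(reader)); // onetwothree
        }
    }
}
```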

Assume you have read the entire file in a string s
for (int i = 0; i < s.Length; i++)
{
if (char.IsLetterOrDigit(s, i)) // or if (!char.IsWhiteSpace(s, i))
{
// append to StringBuilder
}
}
If IsLetterOrDigit or IsWhiteSpace don't fit your needs you can create your own method and call it.

You can use a general-purpose function to skip all the characters you don't need:
public string SkipChars(string InputString, char[] CharsToSkip)
{
string result = InputString;
foreach (var chr in CharsToSkip)
{
result = result.Replace(chr.ToString(), "");
}
return result;
}
usage:
string test = "one\ntwo\tthree";
MessageBox.Show(SkipChars(test, new char[] { '\n', '\t' }));

How to remove leading and trailing spaces from a string

I have the following input:
string txt = " i am a string ";
I want to remove the spaces from the start and the end of the string.
The result should be: "i am a string"
How can I do this in C#?

String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working, then it is highly likely that the "spaces" aren't spaces but some other non-printing or white-space character, possibly tabs. In this case you need to use the String.Trim overload that takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
You can add to this list as and when you come across more space like characters that are in your input data. Storing this list of characters in your database or configuration file would also mean that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4 .Trim() removes any character that Char.IsWhiteSpace returns true for so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.
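A sketch of that combination: the default Trim() first, then a pass with your own list. The extra characters chosen here (zero-width space U+200B and the byte order mark U+FEFF) are only an assumed example of characters that Char.IsWhiteSpace does not report as white space, and the names are invented.

```csharp
using System;

public static class TrimExtraDemo
{
    // Characters not covered by Char.IsWhiteSpace that should still be removed.
    // This list is an assumption; adjust it to what actually shows up in your data.
    private static readonly char[] ExtraChars = { '\u200B', '\uFEFF' };

    public static string TrimAll(string s)
    {
        // Default trim first, then the custom list, then once more in case
        // ordinary white space was hiding behind the extra characters.
        return s.Trim().Trim(ExtraChars).Trim();
    }

    public static void Main()
    {
        string txt = " \u200Bi am a string\u200B ";
        Console.WriteLine(char.IsWhiteSpace('\u200B'));   // False
        Console.WriteLine("[" + TrimAll(txt) + "]");      // [i am a string]
    }
}
```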

You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim); // txt = "i am a string"
EDIT:
txt = txt.Replace(" ", ""); // txt = "iamastring"

I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString";
It's not rocket science.

txt = txt.Trim();

Or you can split your string into a string array, splitting on spaces, and then append every item of the array to an empty string.
Maybe this is not the best or fastest method, but you can try it if the other answers aren't what you want.
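That split-and-rebuild idea might look like the sketch below. The names are made up, and string.Join is used in place of manual concatenation. Note one side effect: runs of spaces inside the string are collapsed to a single space as well.

```csharp
using System;

public static class SplitJoinDemo
{
    public static string TrimBySplitting(string s)
    {
        // RemoveEmptyEntries drops the empty items produced by leading,
        // trailing and repeated spaces.
        string[] words = s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }

    public static void Main()
    {
        Console.WriteLine("[" + TrimBySplitting("  i am   a string ") + "]"); // [i am a string]
    }
}
```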

text.Trim() is to be used
string txt = " i am a string ";
txt = txt.Trim();

Use the Trim method.

static void Main()
{
    // A.
    // Example strings with multiple whitespaces.
    string s1 = "He saw a cute\tdog.";
    string s2 = "There\n\twas another sentence.";
    // B.
    // Create the Regex.
    Regex r = new Regex(@"\s+");
    // C.
    // Collapse each run of whitespace into a single space.
    string s3 = r.Replace(s1, " ");
    Console.WriteLine(s3);
    // D.
    // Collapse each run of whitespace into a single space.
    string s4 = r.Replace(s2, " ");
    Console.WriteLine(s4);
    Console.ReadLine();
}
OUTPUT:
He saw a cute dog.
There was another sentence.

You Can Use
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"
