replace unicode character

replace unicode character - c#

String jData="Memur adayar\u0131n\u0131n en b\u00fcy\u00fck sorunar"
+ "\u0131ndan KPSS \u0 131 ";
jData = Regex.Replace(jData, @"\\u0 ", @"\\u0", RegexOptions.Compiled).Trim();
I have to replace "\u0 " in jData with "\u0" (i.e. remove the trailing whitespace character if there is one) but the method I used isn't working. What should I do?

So you've got some malformed Unicode escapes in the string and you want to fix them by removing any whitespace after the 0. That's simple enough:
jData = Regex.Replace(jData, @"(\\u0)\s+(\w+)", "$1$2");
The hardest part of all this is figuring out what all the backslashes are supposed to mean. C# helps you with that by supporting an alternative string literal syntax, the verbatim string (prefixed with @). In a verbatim string the compiler does not treat the backslash as an escape character, so the only character you have to escape with a backslash is the backslash itself, and that is for the benefit of the regex engine. (You have to escape quotation marks too, but you do that with another quote, i.e. "").
With that out of the way, the real reason I answered this question was to advise you not to use RegexOptions.Compiled. I'm sure you've heard many people say it makes the regex work faster. That's true, but it's an oversimplification. Read this article for a good discussion of this issue. Do yourself a favor and forget RegexOptions.Compiled even exists until you run into a problem you can't solve without it.
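As a minimal sketch of the fix above, assuming the backslashes are meant to be literal characters in the data (so the sample is written as a verbatim string, and shortened here for illustration), the regex version and a plain string replacement give the same result:

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim literal: every backslash below is a real character in the string.
// (Shortened, illustrative sample; not the full string from the question.)
string jData = @"sorunlar\u0131ndan KPSS \u0 131 ";

// \\u0 matches the literal text \u0, \s+ the stray whitespace,
// and \w+ the remainder of the escape sequence.
string viaRegex = Regex.Replace(jData, @"(\\u0)\s+(\w+)", "$1$2").Trim();

// The same repair without a regex, for the single-space case.
string viaReplace = jData.Replace(@"\u0 ", @"\u0").Trim();

Console.WriteLine(viaRegex);
Console.WriteLine(viaReplace);
```

The well-formed escapes are left alone because `\s+` requires at least one whitespace character after `\u0`.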

find: @"\\u0 "
replace: @"\\u0"
They are the same. Try it with a capital O or a normal o.

I think I got it working
string jData= @"Memur adayar\u0131n\u0131n en b\u00fcy\u00fck sorunar\u0131ndan KPSS \u0 131 ";
jData = Regex.Replace(jData, @"\\u0 ", @"\u0", RegexOptions.Compiled).Trim();
Notice I added an extra '@' in front of the input string. And in the regex part I changed the third argument to @"\u0"

There's a problem with your example string. I'm supposing that you actually wanted the backslashes in the string, in which case the simplest approach is to put @ before the string literals. And then I believe you have the opposite problem in the second line, where you should have either used just one backslash in each string, or omitted the @.
There's no reason to use Regex.Replace() here. jData.Replace() would suffice just fine:
String jData=@"Memur adayar\u0131n\u0131n en b\u00fcy\u00fck sorunar"
+ @"\u0131ndan KPSS \u0 131 ";
jData = jData.Replace(@"\u0 ", @"\u0").Trim();

Related

Replace characters where certain character does not follow a comma?

Is there a way to use wildcards to define the following:
I would like a "\"" to come before and after a comma when the comma does not already have a "\"" before it or after it.
I am a little unsure how to do the negation.
EDIT Sample data:
"col1,col2,col3"
should become
"\"col1\",\"col2\",\"col3\""
where "\"" just means a quote string

Use the "negative look behind" assertion:
(?<!\\),
Can't give you a better answer without having sample input/output.

Try (?<!\"),(?!\"), which uses zero-width assertions (a negative lookbehind and a negative lookahead).
I'm busy now and will explain later; sorry about that.
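A small sketch of how the lookaround pattern behaves, assuming the input is a partly quoted string like the one used elsewhere in this thread (the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// One bare comma and one comma that is already surrounded by quotes.
string input = "col1,col2\",\"col3";

// (?<!") : the comma must not be preceded by a quote.
// (?!")  : the comma must not be followed by a quote.
string quoted = Regex.Replace(input, "(?<!\"),(?!\")", "\",\"");

Console.WriteLine(quoted);
```

Only the bare comma is rewritten. Note that a comma with a quote on just one side is also skipped, since both assertions must hold for a match.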

Replace everything that matches the following: ^(\\\"),^(\\\") with: \",\"
It means anything but a backslash followed by a quote, followed by a comma, followed by anything but a backslash followed by a quote.

Use regular expressions or a simple replace:
string s = "col1,col2\",\"col3";
// remove all existing quotes, then replace every comma with a quoted comma
string r = s.Replace("\"", "").Replace(",", "\",\"");
// r = "col1\",\"col2\",\"col3"
But this does not do what your sample data looks like:
"col1,col2,col3" should become "col1\",\"col2\",\"col3\""
This isn't following your rule (look at the trailing \" !). Maybe you want to wrap all col's, so you can add a \" at the beginning and the end, too. (Assuming the separator is always just ,, not including spaces)

I know this thread is a bit old but for the new visitors this can also be done:
string sample = "col1,col2,col3";
string result = sample.Replace("\"", "");
result = "\"" + result.Replace(",", "\",\"") + "\"";
Hope it helps!

Regex : replace a string

I'm currently facing a (little) blocking issue. I'd like to replace one substring with another using a regular expression. But here is the catch: I suck at regex.
Regex.Replace(contenu, "Request.ServerVariables("*"))",
"ServerVariables('test')");
Basically I'd like to replace whatever is between the " by "test". I tried ".{*}" as a pattern but it doesn't work.
Could you give me some tips, I'd appreciate it!

There are several issues you need to take care of.
1. You are using special characters in your regex (., parens, quotes) -- you need to escape these with a backslash. And you need to escape the backslashes with another backslash as well because we're in a C# string literal, unless you prefix the string with @, in which case the escaping rules are different.
2. The expression to match "any number of whatever characters" is .*. In this case, you would want to match any number of non-quote characters, which is [^"]*.
3. In contrast to (1) above, the replacement string is not a regular expression so you don't want any backslashes there.
4. You need to store the return value of the replace somewhere.
The end result is
var result = Regex.Replace(contenu,
@"Request\.ServerVariables\(""[^""]*""\)",
"Request.ServerVariables('test')");
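To see the four points working together, here is a short sketch. The contents of `contenu` are an assumption made up for illustration; the question does not show the real input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input; the real contents of contenu are not shown in the question.
string contenu = "host = Request.ServerVariables(\"HTTP_HOST\")";

// Verbatim pattern: \. and \( \) are regex escapes, "" is a literal quote,
// and [^""]* matches everything up to the closing quote.
string result = Regex.Replace(contenu,
    @"Request\.ServerVariables\(""[^""]*""\)",
    "Request.ServerVariables('test')");

Console.WriteLine(result);
```

Regex.Replace returns a new string and leaves `contenu` unchanged, which is why the return value has to be stored.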

Based purely on my knowledge of regex (and not how they are done in C#), the pattern you want is probably:
"[^"]*"
ie - match a " then match everything that's not a " then match another "
You may need to escape the double-quotes to make your regex-parser actually match on them... that's what I don't know about C#

Try to avoid where you can the '.*' in regex, you can usually find what you want to get by avoiding other characters, for example [^"]+ not quoted, or ([^)]+) not in parenthesis. So you may just want "([^"]+)" which should give you the whole thing in [0], then in [1] you'll find 'test'.
You could also just replace '"' with '' I think.

Taryn East's regex includes the *. You should remove it if it is just a placeholder for any value:
"[^"]"
BTW: You can test this regex with this cool editor: http://rubular.com/r/1MMtJNF3kM

Replace single backslash with double backslash

It seems simple enough, right? Well, I don't know.
Here's the code I'm trying:
input = Regex.Replace(input, "\\", "\\\\\\");
However, I'm receiving an error,
ArgumentException was unhandled - parsing "\" - Illegal \ at end of pattern.
How do I do this?

The first one should be "\\\\", not "\\". It works like this:
You have written "\\".
This translates to the sequence \ in a string.
The regex engine then reads this, which translates as backslash which isn't escaping anything, so it throws an error.
With regex, it's much easier to use a "verbatim string". In this case the verbatim string would be @"\\". When using verbatim strings you only have to consider escaping for the regex engine, as backslashes are treated literally. The second string will also be @"\\", as it will not be interpreted by the regex engine.

If you want to replace one backslash with two, it might be clearer to eliminate one level of escaping in the regular expression by using @"..." as the format for your string literals, also known as a verbatim string. It is then easier to see that
string output = Regex.Replace(input, @"\\", @"\\");
is a replacement from \ to \\.
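The two levels of escaping can be checked directly. This is a minimal sketch with an illustrative input path; it relies on the fact that in a .NET replacement string only `$` is special, so the backslashes there are inserted as they are:

```csharp
using System;
using System.Text.RegularExpressions;

// The regular literal and the verbatim literal describe the same two-character string.
string viaEscapes = "\\\\";
string viaVerbatim = @"\\";
bool samePattern = viaEscapes == viaVerbatim;

// Illustrative input containing one backslash.
string input = @"C:\temp";

// Pattern: two characters that the regex engine reads as "one literal backslash".
// Replacement: two characters that are inserted unchanged.
string output = Regex.Replace(input, @"\\", @"\\");

Console.WriteLine(samePattern);
Console.WriteLine(output);
```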

I know it's too late to help you, but maybe someone else will benefit from this. Anyway, this worked for me:
text = text.Replace(@"\",@"\\");
and I find it even simpler.
Cheers!

var result = Regex.Replace(@"afd\tas\asfd\", @"\\", @"\\");
The pattern is the string \\, which means a single \ in regex.
The replacement is not processed as a regex pattern, so it is inserted as is when replacing.

If you intend to use the input in a regex pattern later, it can be a good idea to use Regex.Escape.
input = Regex.Escape(input);
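A quick sketch of why Regex.Escape matters when a string is reused as a pattern (the file path is a made-up example):

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up example text containing backslashes and a dot.
string input = @"C:\temp\file.txt";

// Used raw as a pattern, \t and \f are read as tab and form feed,
// so the text does not even match itself.
bool rawMatches = Regex.IsMatch(input, input);

// Regex.Escape backslash-escapes the metacharacters so the pattern matches literally.
string escaped = Regex.Escape(input);
bool escapedMatches = Regex.IsMatch(input, "^" + escaped + "$");

Console.WriteLine(escaped);
Console.WriteLine(rawMatches);
Console.WriteLine(escapedMatches);
```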

regex.replace #number;#

What would be the regex expression to find (PoundSomenumberSemiColonPound) (aka #Number;#)? I used this but not working
string st = Regex.Replace(string1, @"(#([\d]);#)", string.Empty);

You're looking for #\d+;#.
\d matches a single numeric character
+ matches one or more of the preceding character.

(\x23\d+\x3B\x23)
# and / are both used as delimiters around patterns, hence the trouble. Try using the above (when I run into trouble with specific characters I usually revert to their hex equivalents; asciitable.com has a good reference).
EDIT Forgot to group for replacement.
EDITv2 The below worked for me:
String string1 = "sdlfkjsld#132;#sdfsdfsdf#1;#sdfsdfsf#34d;#sdfs";
String string2 = System.Text.RegularExpressions.Regex.Replace(string1, @"(\x23\d+\x3B\x23)", String.Empty);
Console.WriteLine("from: {0}\r\n to: {1}", string1, string2);
Output:
from: sdlfkjsld#132;#sdfsdfsdf#1;#sdfsdfsf#34d;#sdfs
to: sdlfkjsldsdfsdfsdfsdfsdfsf#34d;#sdfs
Press any key to continue . . .

You don't need a character class when using \d, and as SLaks points out you need + to match one or more digits. Also, since you're not capturing anything the parentheses are redundant too, so something like this should do it
string st = Regex.Replace(string1, @"#\d+;#", string.Empty);

You may need to escape the # symbols, since they're usually interpreted as comment markers, in addition to @SLaks's comment about using + to allow multiple digits.

How to merge or inject the "@" character into a string that includes escape characters, without defining the string variable from scratch, in C#

Hi, I have two related questions.
1) Suppose we have:
string strMessage = "\nHellow\n\nWorld";
Console.WriteLine(strMessage);
The result is:
Hellow
World
Now, if we want to show the string in its original form on one line, we must redefine the variable from scratch:
string strOrignelMessage = @"\nHellow\n\nWorld";
Console.WriteLine(strOrignelMessage);
The result is:
\nHellow\n\nWorld
and everything is OK.
I am wondering whether there is a way to avoid defining the new variable (strOrignelMessage) for this purpose, and instead use only the first string variable (strMessage) with some trick to print it on one line.
At first I tried the following workaround, but it has a bug. Suppose we have:
string strMessage = "a\aa\nbb\nc\rccc";
string strOrigenalMessage = strMessage.Replace("\n", "\\n").Replace("\r", "\\r");
Console.WriteLine(strOrigenalMessage);
The result is: aa\nbb\nc\rccc
Notice that the first "\" is not printed. My second question is:
2) How can we fix this new problem with the single "\" in the string?
I hope I have titled this issue correctly and that my explanation is sufficient. Thanks.

No, because the compiler has already converted all of your escaped characters in the original string to the characters they represent. After the fact, it is too late to convert them to non-special characters. You can do a search and replace, converting '\n' to literally @"\n", but that is whacky and you're better off defining the string correctly in the first place. If you wanted to escape the backslashes in the first place, why not put an extra backslash character in front of each of them:
Instead of "\n" use "\\n".
Updated in response to your comment:
If the string is coming from user input, you don't need to escape the backslash, because it will be stored as a backslash in the input string. The escape character only works as an escape character in string literals in code (and not preceded by @, which makes them verbatim string literals).
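For completeness, the search-and-replace route mentioned above can be sketched as follows. It only covers the escapes it is explicitly told about (here backslash, \n and \r); extending it to other control characters is left as an assumption about what the data contains.

```csharp
using System;

// The compiler has already turned each \n into a real newline character.
string strMessage = "\nHellow\n\nWorld";

// Map the characters back to their two-character escape sequences.
// Backslashes go first so the ones added by later steps are not doubled.
string oneLine = strMessage
    .Replace("\\", "\\\\")
    .Replace("\n", "\\n")
    .Replace("\r", "\\r");

Console.WriteLine(oneLine);
```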

If you want "\n\n\a\a\r\blah" to print as \n\n\a\a\r\blah without @, just replace all \ with \\
\ is the escaper in a non-verbatim string. So you simply need to escape the escaper, as it were.

If you want to use both strings, but want to have only one in the code, then write the string with @, and construct the other one with Replace(@"\n","\n").

Explanation for Anthony Pegram (if I understand you correctly) and anyone else who finds it useful:
I think I have found the answer to question 2.
At first, unfortunately, I thought that the escape characters were limited to \n, \t, \r and \v, and this confused me because my sample string used \a and \b, so the compiler's behavior made no sense to me.
But I finally found that \a and \b are in the escape-character set too, and that using "\" without a valid escape character after it raises a compile-time error (it seems so funny when I think about my mistake again).
Please refer to this useful MSDN article for more info:
2.4.4.5 String literals
You cannot replace \ (a single \) with \\, because fundamentally a regular string literal cannot contain a single \ without a valid escape character after it. So we could not write a string like this in code:
string strTest = "abc\pwww"; // compile-time error
To retrieve a version of a string with its escape characters deactivated, we can simply use the string.Replace method as I did before.
Excuse me for the long story, and thank you all for your cooperation.
