I have a string that is returned to me which contains escape characters.
Here is a sample string
"test\40gmail.com"
As you can see, it contains escape characters. I need it converted to its real value, which is
"test@gmail.com"
How can I do this?
If you are looking to replace all escaped character codes, not only the code for @, you can use this snippet of code to do the conversion:
public static string UnescapeCodes(string src) {
var rx = new Regex("\\\\([0-9A-Fa-f]+)");
var res = new StringBuilder();
var pos = 0;
foreach (Match m in rx.Matches(src)) {
res.Append(src.Substring(pos, m.Index - pos));
pos = m.Index + m.Length;
res.Append((char)Convert.ToInt32(m.Groups[1].ToString(), 16));
}
res.Append(src.Substring(pos));
return res.ToString();
}
The code relies on a regular expression to find all sequences of hex digits, converting them to int, and casting the resultant value to a char.
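Note that the pattern above accepts any run of hex digits, so an escape followed directly by a hex letter (for example \40abc.com) would be read as one large number. If every escape is known to be a backslash plus exactly two hex digits, which is an assumption here and not something the question states, a narrower variant could look like the sketch below. The names EscapeCodes and Unescape are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EscapeCodes
{
    // Assumption: every escape is a backslash followed by exactly two hex digits.
    private static readonly Regex TwoHexDigits = new Regex(@"\\([0-9A-Fa-f]{2})");

    public static string Unescape(string src)
    {
        // The evaluator runs once per match and returns the decoded character.
        return TwoHexDigits.Replace(src, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```

With this sketch, EscapeCodes.Unescape(@"test\40gmail.com") returns "test@gmail.com", and @"\40abc.com" becomes "@abc.com" because only the two digits after the backslash are consumed.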
string test = @"test\40gmail.com";
test = test.Replace(@"\40", "@");
If you want a more general approach ...
HTML Decode
The sample string provided ("test\40gmail.com") is JID escaped. It is not malformed, and HttpUtility/WebUtility will not correctly handle this escaping scheme.
You can certainly do it with string or regex functions, as suggested in the answers from dasblinkenlight and C.Barlow. This is probably the cleanest way to achieve the desired result. I'm not aware of any .NET libraries for decoding JID escaping, and a brief search hasn't turned up much. Here is a link to some source which may be useful, though.
I just wrote this piece of code and it seems to work beautifully... It requires that the escape sequence is in hex, and it is valid for values 0x00 to 0xFF.
// Example
str = remEscChars(@"Test\x0D"); // str = "Test\r"
Here is the code.
private string remEscChars(string str)
{
    int pos;
    // Decode each \xNN sequence until none remain.
    while ((pos = str.IndexOf(@"\x")) >= 0)
    {
        // The two characters after "\x" are the hex value.
        string subStr = str.Substring(pos + 2, 2);
        string escStr = Convert.ToString(Convert.ToChar(Convert.ToInt32(subStr, 16)));
        str = str.Replace(@"\x" + subStr, escStr);
    }
    return str;
}
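.NET provides the static methods Regex.Unescape and Regex.Escape to perform this task and back again. Regex.Unescape converts escapes such as \x0D, \u0026, \n and \t. Be aware that it reads a backslash followed by digits as an octal code, so \40 becomes a space (octal 40 is 32) rather than @; for the sample string in the question you still need a hex-based approach.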
https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.unescape
Edit:
Because of your responses I think I've asked the question wrong.
It's not that my solution doesn't work or isn't very clean. I'm interested in whether there is a general way to format a string, like you can with an int or other data types.
I couldn't find one, but I hope there is one.
So that's the question I wanted to ask:
Does C# provide a way to format strings, like it does for an int or other data types?
I'm looking for something like this:
myString.Format(myFormat);
or:
myFormattedString = String.Format(myString, myFormat);
And if the answer is no, it's also ok. I just want to know it. (And maybe someone else as well)
Original question:
What's the best way to change the format of a string?
So I have a string that looks like this:
"123456789012345678"
And now I want that:
"12.34.567890.12345678"
I'm using this, but I don't find it very clean:
private string FormatString(string myString)
{
return myString.Insert(2, ".").Insert(5, ".").Insert(12, ".");
}
Things I've tried:
// Too long.
private string FormatString(string myString)
{
return myString.Substring(0, 2)
+ "."
+ myString.Substring(2, 2)
+ "."
+ myString.Substring(4, 6)
+ "."
+ myString.Substring(10, 8);
}
// Conversion from string -> long -> string.
private string FormatString(string myString)
{
return String.Format("{0:##'.'##'.'######'.'########}", long.Parse(myString));
}
I'm looking for something like that:
private string FormatString(string myString)
{
return String.Format("{0:##'.'##'.'######'.'########}", myString);
}
I don't see anything wrong with your code, but if you want a better matching system, you might want to consider regular expressions:
(\d{2})(\d{2})(\d{6})(\d{8})
And replace it with:
$1.$2.$3.$4
(In action)
But my two cents: keep it like it is.
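For reference, here is a minimal sketch of how that pattern could be applied with Regex.Replace in C#. The class and method names (DottedFormat, Apply) are hypothetical, and the anchors ^ and $ are an addition so that only an 18-digit input is rewritten. In .NET the replacement string is not a regular expression, so the dots in it need no backslash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DottedFormat
{
    public static string Apply(string digits)
    {
        // Four capture groups of 2, 2, 6 and 8 digits, joined with dots.
        return Regex.Replace(digits,
            @"^(\d{2})(\d{2})(\d{6})(\d{8})$",
            "$1.$2.$3.$4");
    }
}
```

DottedFormat.Apply("123456789012345678") returns "12.34.567890.12345678". Input that does not match the pattern is returned unchanged, which may or may not be what you want.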
Well...when the framework does not provide what you want, you can always do it yourself.
I've made this method as an experiment. It can surely be optimized and is not fully tested, but it gives you an idea of what you could do:
private string FormatString(string myString,string format)
{
const char number = '#';
const char character = '%';
StringBuilder sb = new StringBuilder();
if (format.Length < myString.Length) throw new Exception("Invalid format string");
int i = 0;
foreach (char c in format)
{
switch (c)
{
case number:
if (char.IsDigit(myString[i]))
{
sb.Append(myString[i]);
i++;
}
else
{
throw new Exception("Format string doesn't match input string");
}
break;
case character:
if (!char.IsDigit(myString[i]))
{
sb.Append(myString[i]);
i++;
}
else
{
throw new Exception("Format string doesn't match input string");
}
break;
default:
sb.Append(c);
break;
}
}
return sb.ToString();
}
This method expects the format string to have either a # to denote digit, a % to denote a character, or any other character that would be copied literally to the formatted string.
Usage:
string test = FormatString("123456789012345678", "##.##.######.########");
//outputs 12.34.567890.12345678
string test = FormatString("12345F789012345678", "##.##.#%####.########");
//outputs 12.34.5F7890.12345678
If your string will always be a number then you can do it like this:
string stringData = "123456789012345678";
string dataInFormat = Convert.ToInt64(stringData).ToString(@"##\.##\.######\.########");
First convert the string to a long and then apply the format to it. In your case it would look like this:
private string FormatString(string myString)
{
return Convert.ToInt64(myString).ToString(@"##\.##\.######\.########");
}
I have a string "201607" and I need to split it into 2 separate types: 2016 into an int and 07 into a byte. I have seen string split functions, which all use delimiters, but that won't work here. Is there an easier way to do this, or do I have to split it into chars and then reconstruct them in C#?
Try it also:
string input="201607";
int IntPart=Convert.ToInt32(input.Substring(0,4));
byte BytePart=Convert.ToByte(input.Substring(4));
Try this Example
string input="201607";
int integerPart=0;
if(int.TryParse(input.Substring(0,4),out integerPart))
{
Console.WriteLine("Integer value is {0}",integerPart);
}
else
{
Console.WriteLine("Conversion Failed");
}
byte bytePart = byte.Parse(input.Substring(4));
Console.WriteLine("Byte Part is {0}",bytePart);
Perhaps try this too:
var input = "201607";
var matches = Regex.Match(input, "(\\d{4})(\\d{2})");
var integerPart = int.Parse(matches.Groups[1].Captures[0].Value);
var bytePart = byte.Parse(matches.Groups[2].Captures[0].Value);
I've been using C# String.Format for formatting numbers before like this (in this example I simply want to insert a space):
String.Format("{0:### ###}", 123456);
output:
"123 456"
In this particular case, the number is a string. My first thought was to simply parse it to a number, but it makes no sense in the context, and there must be a prettier way.
Following does not work, as ## looks for numbers
String.Format("{0:### ###}", "123456");
output:
"123456"
What is the string equivalent to # when formatting? The awesomeness of String.Format is still fairly new to me.
You have to parse the string to a number first.
int number = int.Parse("123456");
String.Format("{0:### ###}", number);
Of course you could also use string methods, but that's less reliable and less safe:
string strNumber = "123456";
String.Format("{0} {1}", strNumber.Remove(3), strNumber.Substring(3));
As Heinzi pointed out, you cannot use a format specifier with string arguments.
So, instead of String.Format, you may use following:
string myNum="123456";
myNum=myNum.Insert(3," ");
Not very beautiful, and the extra work might outweigh the gains, but if the input is a string in that format, you could do:
var str = "123456";
var result = String.Format("{0} {1}", str.Substring(0,3), str.Substring(3));
string is not IFormattable:
Console.WriteLine("123456" is IFormattable); // False
Console.WriteLine(21321 is IFormattable); // True
There is no point in supplying a format if the argument is not IFormattable; the only way is to convert your string to int or long.
We're doing string manipulation, so we could always use a regex.
Adapted slightly from here:
class MyClass
{
static void Main(string[] args)
{
string sInput, sRegex;
// The string to search.
sInput = "123456789";
// The regular expression.
sRegex = "[0-9][0-9][0-9]";
Regex r = new Regex(sRegex);
MyClass c = new MyClass();
// Assign the replace method to the MatchEvaluator delegate.
MatchEvaluator myEvaluator = new MatchEvaluator(c.ReplaceNums);
// Replace matched characters using the delegate method.
sInput = r.Replace(sInput, myEvaluator);
// Write out the modified string.
Console.WriteLine(sInput);
}
public string ReplaceNums(Match m)
// Replace each Regex match with match + " "
{
return m.ToString()+" ";
}
}
How's that?
It's been ages since I used C# and I can't test, but this may work as a one-liner which may be "neater" if you only need it once:
sInput = new Regex("[0-9][0-9][0-9]").Replace(sInput, m => m.Value + " ");
There is no way to do what you want unless you parse the string first.
Based on your comments, you only really need simple formatting, so you are better off just implementing a small helper method and that's it. (IMHO it's not really a good idea to parse the string if it isn't logically a number; you can't be sure that a future input string will be a number at all.)
I'd go for something similar to:
public static string Group(this string s, int groupSize = 3, char groupSeparator = ' ')
{
var formattedIdentifierBuilder = new StringBuilder();
for (int i = 0; i < s.Length; i++)
{
if (i != 0 && (s.Length - i) % groupSize == 0)
{
formattedIdentifierBuilder.Append(groupSeparator);
}
formattedIdentifierBuilder.Append(s[i]);
}
return formattedIdentifierBuilder.ToString();
}
EDIT: Generalized to generic grouping size and group separator.
The problem is that # is a Digit placeholder and it is specific to numeric formatting only. Hence, you can't use this on strings.
Either parse the string to a numeric, so the formatting rules apply, or use other methods to split the string in two.
string.Format("{0:### ###}", int.Parse("123456"));
I am working on a project in Unity which uses Assembly C#. I am trying to get special characters such as é, but the console just displays a blank character: "". For instance, translating "How are you?" should return "Cómo Estás?", but it returns "Cmo Ests". I put the returned string "Cmo Ests" in a character array and found that the missing letters are non-null blank characters. I am using Encoding.UTF8, and when I do:
char ch = '\u00e9';
print (ch);
It will print "é". I have tried getting the bytes off of a given string using:
byte[] utf8bytes = System.Text.Encoding.UTF8.GetBytes(temp);
While translating "How are you?", it returns an array of bytes, but for the special characters such as é I get the series of bytes 239, 191, 189, which is a replacement character.
What information do I need to retrieve from the characters in order to accurately determine which character each one is? Do I need to do something with the information that Google gives me, or is it something else? I need a general solution that I can place in my program and that will work for any input string. If anyone can help, it would be greatly appreciated.
Here is the code that is referenced:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using UnityEngine;
using System.Collections;
using System.Net;
using HtmlAgilityPack;
public class Dictionary{
string[] formatParams;
HtmlDocument doc;
string returnString;
char[] letters;
public char[] charString;
public Dictionary(){
formatParams = new string[2];
doc = new HtmlDocument();
returnString = "";
}
public string Translate(String input, String languagePair, Encoding encoding)
{
formatParams[0]= input;
formatParams[1]= languagePair;
string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}", formatParams);
string result = String.Empty;
using (WebClient webClient = new WebClient())
{
webClient.Encoding = encoding;
result = webClient.DownloadString(url);
}
doc.LoadHtml(result);
input = alter (input);
string temp = doc.DocumentNode.SelectSingleNode("//span[@title='"+input+"']").InnerText;
charString = temp.ToCharArray();
return temp;
}
// Use this for initialization
void Start () {
}
string alter(string inputString){
returnString = "";
letters = inputString.ToCharArray();
for(int i=0; i<inputString.Length;i++){
if(letters[i]=='\''){
returnString = returnString + "'";
}else{
returnString = returnString + letters[i];
}
}
return returnString;
}
}
Maybe you should use another API/URL. This function below uses a different url that returns JSON data and seems to work better:
public static string Translate(string input, string fromLanguage, string toLanguage)
{
using (WebClient webClient = new WebClient())
{
string url = string.Format("http://translate.google.com/translate_a/t?client=j&text={0}&sl={1}&tl={2}", Uri.EscapeUriString(input), fromLanguage, toLanguage);
string result = webClient.DownloadString(url);
// I used JavaScriptSerializer but another JSON parser would work
JavaScriptSerializer serializer = new JavaScriptSerializer();
Dictionary<string, object> dic = (Dictionary<string, object>)serializer.DeserializeObject(result);
Dictionary<string, object> sentences = (Dictionary<string, object>)((object[])dic["sentences"])[0];
return (string)sentences["trans"];
}
}
If I run this in a Console App:
Console.WriteLine(Translate("How are you?", "en", "es"));
It will display
¿Cómo estás?
I don't know much about the GoogleTranslate API, but my first thought is that you've got a Unicode Normalization problem.
Have a look at System.String.Normalize() and its friends.
Unicode is very complicated, so I'll oversimplify! Many symbols can be represented in different ways in Unicode: 'é' could be represented as 'é' (one character), or as an 'e' + 'accent character' (two characters), or, depending on what comes back from the API, something else altogether.
The Normalize function will convert your string to one with the same Textual meaning, but potentially a different binary value which may fix your output problem.
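To make the two representations concrete, here is a small sketch. The class and method names (NormalizeDemo, Compose) are only for illustration, and whether normalization actually fixes the output problem in the question is not established.

```csharp
using System;
using System.Text;

public static class NormalizeDemo
{
    public static string Compose(string s)
    {
        // Form C combines a base letter and a combining mark where possible.
        return s.Normalize(NormalizationForm.FormC);
    }
}
```

For example, "e\u0301" (an 'e' followed by a combining acute accent) has Length 2, while NormalizeDemo.Compose("e\u0301") is the single character "\u00e9".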
You actually pretty much have it. Just insert the coded letter with a \u and it works.
string mystr = "C\u00f3mo Est\u00e1s?";
There are several issues with your approach. First of all, UTF-8 is a multibyte encoding: any non-ASCII character (char code > 127) is encoded as a sequence of two to four bytes. Your sequence 239, 191, 189 (EF BF BD in hex) is a single character, U+FFFD, the Unicode replacement character. A decoder emits it when it meets bytes it cannot decode, so the accented letter was probably already lost before you called GetBytes. UTF-16 uses 2-byte code units, so every character in the Basic Multilingual Plane maps to one unsigned short (0-65535); characters outside it take two code units (a surrogate pair).
The char type in C# is a two-byte type that holds one UTF-16 code unit, so it is effectively an unsigned short. This contrasts with other languages, such as C/C++, where the char type is a 1-byte type.
So in your case, unless you really need to be using byte[] arrays, you should use char[] arrays. Or, if you want to encode the characters so that they can be used in HTML, you can iterate through the characters, check whether the character code is > 127 and, if so, replace it with its &#xHEX; numeric character reference.
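Both points can be checked with a short sketch. The class and method names below (EncodingDemo, Utf8Bytes, ToNumericEntities) are hypothetical; the second method works per UTF-16 code unit, so characters outside the Basic Multilingual Plane would need extra handling that is not shown here.

```csharp
using System;
using System.Linq;
using System.Text;

public static class EncodingDemo
{
    // Returns the UTF-8 bytes of a string as a comma-separated list.
    public static string Utf8Bytes(string s)
    {
        return string.Join(", ", Encoding.UTF8.GetBytes(s).Select(b => b.ToString()));
    }

    // Replaces every UTF-16 code unit above 127 with a numeric character reference.
    public static string ToNumericEntities(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c > 127)
                sb.Append("&#x").Append(((int)c).ToString("X")).Append(';');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

EncodingDemo.Utf8Bytes("\uFFFD") gives "239, 191, 189", the byte series from the question, while a real é ("\u00e9") gives "195, 169". EncodingDemo.ToNumericEntities("C\u00f3mo Est\u00e1s?") gives "C&#xF3;mo Est&#xE1;s?".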
I had the same problem while working on one of my projects [Language Resource Localization Translation].
I was doing the same thing, using System.Text.Encoding.UTF8.GetBytes(), and because of the UTF-8 encoding I was receiving special characters like yours,
e.g. 239, 191, 189, in the result string.
Please take a look at my solution; I hope it helps.
Don't set an encoding at all. Google Translate will return characters like á as themselves in the string. Do some string manipulation and read the string as it is.
Generic solution [works for every language that Google Translate supports]
try
{
//Don't use UtF Encoding
// use default webclient encoding
var url = String.Format("http://www.google.com/translate_t?hl=en&text={0}&langpair={1}", "►" + txtNewResourceValue.Text.Trim() + "◄", "en|" + item.Text.Substring(0, 2));
var webClient = new WebClient();
string result = webClient.DownloadString(url); //get all data from google translate in UTF8 coding..
int start = result.IndexOf("id=result_box");
int end = result.IndexOf("id=spell-place-holder");
int length = end - start;
result = result.Substring(start, length);
result = reverseString(result);
start = result.IndexOf(";8669#&");//◄
end = result.IndexOf(";8569#&"); //►
length = end - start;
result = result.Substring(start +7 , length - 8);
objDic2.Text = reverseString(result);
//hard code substring; finding the correct translation within the string.
dictList.Add(objDic2);
}
catch (Exception ex)
{
lblMessages.InnerHtml = "<strong>Google translate exception occured no resource saved..." + ex.Message + "</strong>";
error = true;
}
public static string reverseString(string s)
{
char[] arr = s.ToCharArray();
Array.Reverse(arr);
return new string(arr);
}
As you can see from the code, no encoding has been performed, and I am sending two special marker characters, as "►" + txtNewResourceValue.Text.Trim() + "◄", to determine the start and end of the translation returned by Google.
I have also checked through my language utility tool: I get "Cómo Estás?" when sending
"How are you" to Google Translate... :)
Best regards
[Shaz]
---------------------------Edited-------------------------
public string Translate(String input, String languagePair)
{
try
{
//Don't use UtF Encoding
// use default webclient encoding
//input [string to translate]
//Languagepair [eg|es]
var url = String.Format("http://www.google.com/translate_t?hl=en&text={0}&langpair={1}", "►" + input.Trim() + "◄", languagePair);
var webClient = new WebClient();
string result = webClient.DownloadString(url); //get all data from google translate
int start = result.IndexOf("id=result_box");
int end = result.IndexOf("id=spell-place-holder");
int length = end - start;
result = result.Substring(start, length);
result = reverseString(result);
start = result.IndexOf(";8669#&");//◄
end = result.IndexOf(";8569#&"); //►
length = end - start;
result = result.Substring(start + 7, length - 8);
//return transalted string
return reverseString(result);
}
catch (Exception ex)
{
return "Google translate exception occurred, no resource saved..." + ex.Message;
}
}
I want decode URL A to B:
A) http:\/\/example.com\/xyz?params=id%2Cexpire\u0026abc=123
B) http://example.com/xyz?params=id,expire&abc=123
This is a sample URL and I am looking for a general solution, not A.Replace("\/", "/")...
Currently I use HttpUtility.UrlDecode(A, Encoding.UTF8), and I have tried other encodings, but I cannot produce URL B!
You only need this function
System.Text.RegularExpressions.Regex.Unescape(str);
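Regex.Unescape resolves the backslash escapes (\/ and \u0026) but leaves percent-encoding such as %2C untouched, so getting all the way to URL B takes a second step. A minimal sketch, assuming the input contains only these two kinds of escaping; the names UrlCleanup and Decode are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlCleanup
{
    public static string Decode(string raw)
    {
        // Step 1: resolve backslash escapes such as \/ and \u0026.
        string unescaped = Regex.Unescape(raw);
        // Step 2: resolve percent-encoded bytes such as %2C.
        return Uri.UnescapeDataString(unescaped);
    }
}
```

Calling UrlCleanup.Decode(@"http:\/\/example.com\/xyz?params=id%2Cexpire\u0026abc=123") returns "http://example.com/xyz?params=id,expire&abc=123". Unlike HttpUtility.UrlDecode, Uri.UnescapeDataString does not turn + into a space, which may matter for form-encoded query strings.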
This is a basic example I was able to come up with:
static void Sample()
{
var str = @"http:\/\/example.com\/xyz?params=id%2Cexpire\u0026abc=123";
str = str.Replace("\\/", "/");
str = HttpUtility.UrlDecode(str);
str = Regex.Replace(str, @"\\u(?<code>[0-9A-Fa-f]{4})", CharMatch);
Console.Out.WriteLine("value = {0}", str);
}
private static string CharMatch(Match match)
{
var code = match.Groups["code"].Value;
int value = Convert.ToInt32(code, 16);
return ((char) value).ToString();
}
This is probably missing a lot, depending on the types of URLs you are going to get. It doesn't do any error checking, and it doesn't handle escaped literals (for example, \\u0026 should become \u0026 rather than &). I'd recommend writing a few unit tests around this with various inputs to get started.