Unicode characters string

Unicode characters string - c#

I have the following string:
string s = "\\u0625\\u0647\\u0644";
When I print it, I get:
\u0625\u0647\u0644
How can I get the real, printable Unicode characters instead of this \uxxxx representation?

If you really don't control the string, you need to replace those escape sequences with their values:
s = Regex.Replace(s, @"\\u([0-9A-Fa-f]{4})", m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());
and hope that the string doesn't contain escaped backslashes (\\) as well.
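A minimal sketch that packages the Regex.Replace approach above as a reusable method. The class and method names (UnicodeEscapes, DecodeUEscapes) are illustrative, not from any library. It assumes every escape is exactly a backslash, 'u' and four hex digits; text outside the escapes is kept as-is, and escaped backslashes are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // Replaces every literal "\uXXXX" sequence (backslash, 'u', four hex digits)
    // with the UTF-16 code unit it names; all other text is left untouched.
    public static string DecodeUEscapes(string s)
    {
        return Regex.Replace(
            s,
            @"\\u([0-9A-Fa-f]{4})", // "\\u" matches a literal backslash followed by 'u'
            m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString()); // base 16
    }
}
```

For example, DecodeUEscapes("\\u0625\\u0647\\u0644") returns the three Arabic letters U+0625 U+0647 U+0644. Whether they display correctly in a console also depends on the console's output encoding.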

Asker posted this as an answer to their question:
I have found the answer:
s = System.Text.RegularExpressions.Regex.Unescape(s);
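A small sketch of the Regex.Unescape approach (the UnescapeDemo wrapper is hypothetical). Keep in mind that Regex.Unescape is meant for regex escape sequences, so it decodes more than \uXXXX: for example, \t in the input also becomes a tab character. Input containing a backslash sequence it does not recognize can make it throw an ArgumentException.

```csharp
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Regex.Unescape decodes \uXXXX, but also every other regex escape it
    // recognizes, such as \t (tab) and \n (newline).
    public static string Decode(string s) => Regex.Unescape(s);
}
```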

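Try a regex:
using System;
using System.Text;
using System.Text.RegularExpressions;
string inputString = "\\u0625\\u0647\\u0644";
var stringBuilder = new StringBuilder();
// "\\u" in the pattern matches a literal backslash followed by 'u'
foreach (Match match in Regex.Matches(inputString, @"\\u([\dA-Fa-f]{4})"))
{
    // The captured digits are hexadecimal, so parse them in base 16
    stringBuilder.Append((char)Convert.ToInt32(match.Groups[1].Value, 16));
}
var result = stringBuilder.ToString(); // contains only the decoded characters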

I had the following string "\u0001" and I wanted to get the value of it.
I tried a lot, but this is what worked for me:
int val = Convert.ToInt32(Convert.ToChar("\u0001")); // val = 1;
If you have multiple chars, you can use the following technique:
var original = "\u0001\u0002";
var s = "";
for (int i = 0; i < original.Length; i++)
{
    s += Convert.ToInt32(original[i]); // numeric value of each char
}
// s will be "12"
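Concatenating the values, as above, can be ambiguous: "12" could be the codes 1 and 2 or the single code 12. A minimal sketch (the CodeUnits name is illustrative) that formats each UTF-16 code unit as four hex digits with a separator instead:

```csharp
using System.Linq;

public static class CodeUnits
{
    // Returns the numeric value of each UTF-16 code unit as a four-digit
    // uppercase hex string, separated by single spaces.
    public static string ToHex(string s) =>
        string.Join(" ", s.Select(c => ((int)c).ToString("X4")));
}
```

For example, ToHex("\u0001\u0002") gives "0001 0002", which matches the \uXXXX notation used in the question.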

I would suggest the use of String.Normalize. You can find everything here:
http://msdn.microsoft.com/it-it/library/8eaxk1x2.aspx

Related

Substring IndexOf in c#

I have a string that looks like this: "texthere^D123456_02". But I want my result to be D123456.
This is what I do so far:
if (name.Contains("_"))
{
name = name.Substring(0, name.LastIndexOf('_'));
}
With this I at least remove the _02; however, if I try the same approach for ^, I always get back texthere, even when I use name.IndexOf("^").
I also tried checking only for ^, to at least get the result D123456_02, but the result is still the same.
I even tried name.Replace("^", ...) first and then the Substring approach I used before, but again the result stays the same.
texthere is not always the same length, so .Remove() is out of the question.
What am I doing wrong?
Thanks

When calling Substring, you should not start from 0 but from the index you found:
String name = "texthere^D123456_02";
int indexTo = name.LastIndexOf('_');
if (indexTo < 0)
indexTo = name.Length;
int indexFrom = name.LastIndexOf('^', indexTo - 1);
if (indexFrom >= 0)
name = name.Substring(indexFrom + 1, indexTo - indexFrom - 1);
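The same index arithmetic can be wrapped in a small helper that returns null when a delimiter is missing instead of throwing. This is a sketch; the Substrings.Between name and the null convention are assumptions, not part of the original answer. It searches for the end delimiter only after the start delimiter, which keeps the Substring length non-negative.

```csharp
public static class Substrings
{
    // Returns the text between the first occurrence of `start` and the next
    // occurrence of `end` after it, or null if either delimiter is missing.
    public static string Between(string s, char start, char end)
    {
        int from = s.IndexOf(start);
        if (from < 0) return null;
        from += 1; // skip past the start delimiter itself
        int to = s.IndexOf(end, from);
        if (to < 0) return null;
        // Substring takes a start index and a length, not an end index
        return s.Substring(from, to - from);
    }
}
```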

string s = "texthere^D123456_02";
string result= s.Substring(s.IndexOf("^") + 1);//Remove all before
result = result.Remove(result.IndexOf("_"));//Remove all after

Use the String.Split method:
var split1 = name.Split('^')[1];
var yourText = split1.Split('_')[0];
Or you could use a regular expression (Regex) to achieve basically the same thing.

Your easiest solution would be to split the string first, and then use your original solution for the second part.
string name = "texthere^D123456_02";
string secondPart = name.Split('^')[1]; // This will be D123456_02
Afterwards you can use the Substring as before.

With Regular Expression
string s = "texthere^D123456_02";
Regex r1 = new Regex(@"\^(.*)_");
MatchCollection mc = r1.Matches(s);
Console.WriteLine("Result is " + mc[0].Groups[1].Value);

An alternative to what's already been suggested is to use regex:
string result = Regex.Match("texthere^D123456_02", @"\^(.*)_").Groups[1].Value; // D123456

Use a regex:
Regex regex = new Regex(@"\^(.*)_");
Match match = regex.Match(name);
if (match.Success)
{
    name = match.Groups[1].Value;
}
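One caveat with (.*): it is greedy, so if the text after the caret contains more than one underscore, the capture runs up to the last one. A sketch (the CaretField name is hypothetical) that uses a negated character class to stop at the first underscore after the caret:

```csharp
using System.Text.RegularExpressions;

public static class CaretField
{
    // [^_]* cannot cross an underscore, so the capture ends at the first "_"
    // after the caret; a greedy (.*) would extend to the last "_".
    private static readonly Regex Pattern = new Regex(@"\^([^_]*)_");

    // Returns the captured text, or null when the pattern does not match.
    public static string Extract(string name)
    {
        Match match = Pattern.Match(name);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

On "a^B_1_2" this returns "B", whereas \^(.*)_ would capture "B_1".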

An easier way would be to use Split:
string s = "texthere^D123456_02";
string[] a = s.Split('^', '_'); // { "texthere", "D123456", "02" }
if (a.Length == 3) // correct format
{
    string result = a[1]; // "D123456"
}

Well, if you use the same code you posted, it's doing the right thing: Substring(0, ...) starts retrieving characters at index 0 and stops when it finds "^", so what you get is "texthere".
If you want only the value, use this:
name = name.Substring(0, name.LastIndexOf('_')).Substring(name.IndexOf("^") + 1);
It first removes everything from the "_" onward, and then everything up to and including the "^".

Substring takes a start position and a length, so you need to find where the caret and the underscore are and calculate the length from them:
var name = "texthere^D123456_02";
if (name.Contains('_'))
{
    var caretPos = name.IndexOf('^') + 1; // skip past the caret
    var underscorePos = name.IndexOf('_');
    var length = underscorePos - caretPos;
    var substring = name.Substring(caretPos, length);
    Debug.Print(substring); // Debug is in System.Diagnostics
}

Try this and let me know how it goes
string inputtext = "texthere^D123456_02";
string pattern = @".+\^([A-Z]+[0-9]+)_[0-9]+"; // "\_" is not a valid escape in .NET regex, so "_" is left unescaped
string result = Regex.Match(inputtext, pattern).Groups[1].Value;

string name = "texthere^D123456_02";
Console.WriteLine(name.Split('_', '^')[1]);
This splits your string at every occurrence of _ and ^ and returns an array of the pieces. Since the string you need, D123456, is at index 1 (i.e. the 2nd position), that is the element printed.

If you just want the "D123456", String.Split() is all you need; there is no need for anything else. Just pick the index you want afterwards. Split() has overloads that take several separators and StringSplitOptions for exactly this reason.
//...
var source = "texthere^D123456_02";
var result = source.Split(new char[] {'^', '_'}, StringSplitOptions.RemoveEmptyEntries)[1];
Console.WriteLine(result);
//Outputs: "D123456"
Hope this helps.

best possible way to get given substring

Let's say I have a string in the following format:
[val1].[val2].[val3] ...
What is the best way to get the value from the last bracket set [valx]?
So for the given example
[val1].[val2].[val3]
the result would be val3

You have to define "best" first: best in terms of readability, or in terms of CPU cycles?
I assume this is efficient and readable enough (Last() requires using System.Linq):
string values = "[val1].[val2].[val3]";
string lastValue = values.Split('.').Last().Trim('[',']');
Or with Substring, which can be more efficient, but it's not as safe, since you have to handle the case that there is no dot at all:
lastValue = values.Substring(values.LastIndexOf('.') + 1).Trim('[',']');
So you need to check this first:
int indexOflastDot = values.LastIndexOf('.');
if(indexOflastDot >= 0)
{
lastValue = values.Substring(indexOflastDot + 1).Trim('[',']');
}
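If the values themselves may contain dots, splitting on '.' breaks them apart. A sketch that instead searches for the last '[' and the following ']' (the BracketPath name is illustrative; it assumes values never contain square brackets):

```csharp
public static class BracketPath
{
    // Returns the text inside the last "[...]" pair, or null if there is none.
    public static string LastValue(string path)
    {
        int open = path.LastIndexOf('[');
        if (open < 0) return null;
        int close = path.IndexOf(']', open + 1);
        if (close < 0) return null;
        // Length excludes both brackets
        return path.Substring(open + 1, close - open - 1);
    }
}
```

For example, LastValue("[a].[b.c]") returns "b.c", where Split('.').Last() would return only "c]".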

For a quick solution to your problem (not a structural one), I'd say:
var startIndex = input.LastIndexOf(".["); // index of the last ".["
then use the Substring method:
var value = input.Substring(startIndex + 2, input.Length - (startIndex + 2)); // 2 is the length of ".["
then remove the "]" with the TrimEnd function:
value = value.TrimEnd(']');
But this is by no means the only solution, and it is not a structural one; it is just one of many answers to your problem.

I think you want to access valx.
The easiest solution that comes to mind is this one:
public void Test()
{
    var splitted = "[val1].[val2].[val3]".Split('.');
    var val3 = splitted[2]; // "[val3]", still including the brackets
}

You can use the following:
string[] myStrings = "[val1].[val2].[val3]".Split('.');
Now you can access the values by index. For the last one, use myStrings[myStrings.Length - 1].

Provided that none of val1...valN contains '.', '[' or ']', you can use simple LINQ code:
String str = @"[val1].[val2].[val3]";
String[] vals = str.Split('.').Select(x => x.TrimStart('[').TrimEnd(']')).ToArray();
Or, if all you want is the last value:
String str = @"[val1].[val2].[val3]";
String last = str.Split('.').Last().TrimStart('[').TrimEnd(']');
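A regex-based variant of the LINQ code above, sketched under the assumption that values contain no ']' characters; unlike splitting on '.', it keeps dots inside a value intact. The BracketValues name is illustrative.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketValues
{
    // Captures the contents of every "[...]" group, in order of appearance.
    public static string[] All(string path) =>
        Regex.Matches(path, @"\[([^\]]*)\]")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();
}
```

The last value is then simply All(str).Last(), provided the string contains at least one bracket pair.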

I'm assuming you always need the last bracket set. I would do it like this:
string input = "[val1].[val2].[val3]";
string[] splittedInput = input.Split('.');
string lastBraceSet = splittedInput[splittedInput.Length - 1];
string result = lastBraceSet.Substring(1, lastBraceSet.Length - 2); // strip "[" and "]"

string str = "[val1].[val2].[val3]";
string last = str.Split('.').LastOrDefault();
string result = last.Replace("[", "").Replace("]", "");

string input="[val1].[val2].[val3]";
int startpoint=input.LastIndexOf("[")+1;
string result=input.Substring(startpoint,input.Length-startpoint-1);

I'd use the regex below. One warning: it won't work if there are unbalanced square brackets after the last pair of brackets. Most of the answers given suffer from that, though.
string s = "[val1].[val2].[val3]";
string pattern = @"(?<=\[)[^\]]+(?=\][^\[\]]*$)";
Match m = Regex.Match(s, pattern);
string result;
if (m.Success)
{
    result = m.Value;
}

I would use a regular expression, as it is the clearest from an intention point of view:
string input = "[val1].[val2].[val3] ...";
string match = Regex.Matches(input, @"\[val\d+\]")
    .Cast<Match>()
    .Select(m => m.Value)
    .Last();

Regex to find and replace a year in a string

I have a string in my C# code:
The.Big.Bang.Theory.(2013).S07E05.Release.mp4
I need to find an occurrence of (2013) and replace the whole thing, including the brackets, with ___ (three underscores). So the output would be:
The.Big.Bang.Theory.___.S07E05.Release.mp4
Is there a regex that can do this? Or is there a better method?
I then do some processing on the new string, but later I need to report that '(2013)' was removed, so I need to store the value that is replaced.

Tried with your string. It works:
string pattern = @"\(\d{4}\)";
string search = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var m = Regex.Replace(search, pattern, "___");
Console.WriteLine(m);
This will find any 4-digit number enclosed in opening/closing brackets.
If the year can change, I think Regex is the best approach.
This code, in turn, tells you whether there is a match for your pattern and what it is:
var k = Regex.Matches(search, pattern);
if (k.Count > 0)
    Console.WriteLine(k[0].Value);

Many of these answers forgot the original question in that you wanted to know what you are replacing.
string pattern = @"\((19|20)\d{2}\)";
string search = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
string replaced = Regex.Match(search, pattern).Captures[0].ToString();
string output = Regex.Replace(search, pattern, "___");
Console.WriteLine("found: {0} output: {1}",replaced,output);
gives you the output
found: (2013) output: The.Big.Bang.Theory.___.S07E05.Release.mp4
Here is an explanation of my pattern too.
\( -- match the (
(19|20) -- match the numbers 19 or 20. I assume this is a date for TV shows or movies from 1900 to now.
\d{2} -- match 2 more digits
\) -- match )
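Instead of matching and then replacing in two passes, Regex.Replace also accepts a MatchEvaluator, which can record each replaced value as it goes. A minimal sketch (the YearStripper name is hypothetical; it uses the simpler \(\d{4}\) pattern from the first answer):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class YearStripper
{
    // Replaces every "(yyyy)" with "___" and appends each replaced text to
    // `removed`, in a single pass over the input.
    public static string Strip(string input, List<string> removed)
    {
        return Regex.Replace(input, @"\(\d{4}\)", m =>
        {
            removed.Add(m.Value); // remember what is being removed
            return "___";         // replacement text for this match
        });
    }
}
```

With the question's file name, Strip returns "The.Big.Bang.Theory.___.S07E05.Release.mp4" and the list ends up containing "(2013)".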

Here is a working snippet from a console application, note the regex \(\d{4}\):
var r = new System.Text.RegularExpressions.Regex(@"\(\d{4}\)");
var s = r.Replace("The.Big.Bang.Theory.(2013).S07E05.Release.mp4", "___");
Console.WriteLine(s);
and the output from the console application:
The.Big.Bang.Theory.___.S07E05.Release.mp4
You can also check the pattern in an online regex tester such as Rubular.
Below is a modified solution taking into consideration your additional requirement:
var m = r.Match("The.Big.Bang.Theory.(2013).S07E05.Release.mp4");
if (m.Success)
{
var s = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4".Replace(m.Value, "___");
var valueReplaced = m.Value;
}

Try this:
string s = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var info = Regex.Split(
    Regex.Matches(s, @"\(.*?\)")
        .Cast<Match>().First().ToString(), @"[\s,]+");
s = s.Replace(info[0], "___");
Result
The.Big.Bang.Theory.___.S07E05.Release.mp4

Try this:
string str = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var matches = Regex.Matches(str, @"\([0-9]{4}\)");
List<string> removed = new List<string>();
foreach (Match match in matches)
{
    removed.Add(match.Value); // remember each "(yyyy)" before replacing it
}
str = Regex.Replace(str, @"\([0-9]{4}\)", "___");
Console.WriteLine("Removed Strings are:");
foreach (string s in removed)
{
    Console.WriteLine(s);
}
output:
Removed Strings are:
(2013)

You don't need a regex for a simple replace (you can use one, but it's not needed), as long as you know the exact text to replace:
var name = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var replacedName = name.Replace("(2013)", "___");

Extract digit in a string

I have a list of strings:
goal0=1234.4334abc12423423
goal1=-234234
asdfsdf
I want to extract the number part from the strings that start with goal;
in the above case, that is
1234.4334, -234234
(if there are two groups of digits, take the first one).
How can I do this easily?
Note that "goal0=" is part of the string; goal0 is not a variable.
Therefore I would like to get the first group of digits that comes after "=".

You can do the following:
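string input = "goal0=1234.4334abc12423423";
input = input.Substring(input.IndexOf('=') + 1);
// Keep characters while they look like part of a number (requires System.Linq)
IEnumerable<char> stringQuery2 = input.TakeWhile(c => Char.IsDigit(c) || c == '.' || c == '-');
string result = string.Empty;
foreach (char c in stringQuery2)
    result += c;
// InvariantCulture (System.Globalization) makes "." the decimal separator in every culture
double dResult = double.Parse(result, CultureInfo.InvariantCulture);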

Try this:
string s = "goal0=-1234.4334abc12423423";
string matches = Regex.Match(s, @"(?<=^goal\d+=)-?\d+(\.\d+)?").Value;
The regex says:
(?<=^goal\d+=) - A positive lookbehind: look back and make sure goal(1 or more digits)= is at the start of the string, but don't make it part of the match
-? - An optional minus sign (the ? means 0 or 1 times)
\d+ - One or more digits
(\.\d+)? - An optional decimal point followed by 1 or more digits
This also works if your string contains multiple decimal points, because it only takes the first group of digits after the first decimal point, if there is one.
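Applied to the whole list from the question, the same pattern can skip lines that don't start with goal and parse the rest. A sketch (the GoalNumbers name is illustrative); parsing with CultureInfo.InvariantCulture assumes the numbers always use '.' as the decimal separator.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class GoalNumbers
{
    // Same pattern as above: lookbehind for "goal<digits>=" at the start,
    // then an optional minus sign, digits, and an optional fractional part.
    private static readonly Regex Pattern = new Regex(@"(?<=^goal\d+=)-?\d+(\.\d+)?");

    // Returns the parsed number for each line that starts with "goal<n>=";
    // all other lines are skipped.
    public static List<double> Extract(IEnumerable<string> lines)
    {
        var results = new List<double>();
        foreach (string line in lines)
        {
            Match m = Pattern.Match(line);
            if (m.Success)
            {
                // InvariantCulture so "." is always the decimal separator
                results.Add(double.Parse(m.Value, CultureInfo.InvariantCulture));
            }
        }
        return results;
    }
}
```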

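Use a regex for the extraction, applied to the text after "=" (otherwise the digit in "goal0" is matched first):
string input = line.Substring(line.IndexOf('=') + 1);
string x = Regex.Match(input, @"\d+").Value;
Now convert the resulting string to a number:
int finalNumber = Int32.Parse(x);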

Please try this:
string sample = "goal0=1234.4334abc12423423goal1=-234234asdfsdf";
Regex test = new Regex(@"(?<=\=)\-?\d*(\.\d*)?", RegexOptions.Singleline);
MatchCollection matchlist = test.Matches(sample);
string[] result = new string[matchlist.Count];
if (matchlist.Count > 0)
{
for (int i = 0; i < matchlist.Count; i++)
result[i] = matchlist[i].Value;
}
Hope it helps.
I didn't get the question at first. Sorry, but it works now.

I think this simple expression should work:
Regex.Match(input, @"\d+").Value

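You can use the old VB Val() function from C#. It extracts a number from the front of a string, and it's already available in the framework (in .NET Framework projects, add a reference to Microsoft.VisualBasic). Pass it the part after "=", because Val stops at the first character that can't be part of a number:
double result0 = Microsoft.VisualBasic.Conversion.Val(goal0.Substring(goal0.IndexOf('=') + 1));
double result1 = Microsoft.VisualBasic.Conversion.Val(goal1.Substring(goal1.IndexOf('=') + 1));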

string s = "1234.4334abc12423423";
var result = System.Text.RegularExpressions.Regex.Match(s, @"-?\d+");

List<String> list = new List<String>();
list.Add("goal0=1234.4334abc12423423");
list.Add("goal1=-23423");
list.Add("asdfsdf");
Regex regex = new Regex(@"^goal\d+=(?<GoalNumber>-?\d+\.?\d+)");
foreach (string s in list)
{
if(regex.IsMatch(s))
{
string numberPart = regex.Match(s).Groups["GoalNumber"].Value;
// do something with numberPart
}
}

Removing unwanted characters from a string

I have a program where I take a date from an RSS file and attempt to convert it into a DateTime. Unfortunately, the RSS file that I have to use has a lot of spacing issues. When I parse the string I get this:
"\t\t\n\t\t4/13/2011\n\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t"
I want to remove all of the \t's and \n's. So far, all of these have failed:
finalDateString.Trim('\t');
finalDateString.Trim('\n');
finalDateString.Trim();
finalDateString.Replace("\t", "");
finalDateString.Replace("\n", "");
finalDateString.Replace(" ", "");
Every one of the commands will return the same string. Any suggestions?
(I tagged RSS in the case that there is an RSS reason for this)

You need to assign the output of Replace back to the variable. Strings are immutable, so Replace and Trim return a new string instead of changing the original. You don't need the Trim calls either, as the Replace calls get rid of all of them.
finalDateString = finalDateString.Replace("\t", "");
finalDateString = finalDateString.Replace("\n", "");
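Since only the leading and trailing tabs and newlines need to go here, Trim() with no arguments is enough, as long as its result is assigned. A sketch (the RssDate name is hypothetical) that also parses with an explicit format, assuming the feed always writes dates as month/day/year:

```csharp
using System;
using System.Globalization;

public static class RssDate
{
    // Trim() returns a new string, so its result must be stored. With no
    // arguments it removes all leading and trailing whitespace ('\t', '\n', ' ').
    public static DateTime Parse(string raw)
    {
        string trimmed = raw.Trim();
        // Assumption: the feed always uses month/day/year, e.g. "4/13/2011".
        return DateTime.ParseExact(trimmed, "M/d/yyyy", CultureInfo.InvariantCulture);
    }
}
```

Using an explicit format and InvariantCulture keeps the result independent of the machine's regional settings.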

First, you can remove all the whitespace from your string by using a 1-character regular expression:
String finalDateTimeString = "\t\t\n\t\t4/13/2011\n\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t";
Regex whitespaceRegex = new Regex("\\s");
finalDateTimeString = whitespaceRegex.Replace(finalDateTimeString, "");
I just tested this, and it worked.
Second, I just tested calling DateTime.Parse() on your string, and it worked without even removing the whitespace. So maybe you don't even have to do that.
String finalDateTimeString = "\t\t\n\t\t4/13/2011\n\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t";
DateTime finalDateTime = DateTime.Parse(finalDateTimeString);
// finalDateTime.ToString() == "4/13/2011 12:00:00 AM" (with an en-US current culture)
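A regex-free alternative for removing all whitespace, sketched with LINQ: char.IsWhiteSpace covers tabs, newlines and spaces (roughly the same set as \s). The Whitespace class name is illustrative.

```csharp
using System.Linq;

public static class Whitespace
{
    // Keeps every character for which char.IsWhiteSpace is false.
    public static string RemoveAll(string s) =>
        new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());
}
```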

I would use a regular expression:
string strRegex = @"([\s])";
Regex myRegex = new Regex(strRegex);
string strTargetString = @" 4/13/2011 ";
string strReplace = "";
string result = myRegex.Replace(strTargetString, strReplace);

Using Regex.Replace:
string result = Regex.Replace(data, "[\\t\\n]", ""); // no comma inside [...], or commas would be removed too

All the previously posted answers remove all whitespace from the string, but it would be more robust to only remove leading and trailing whitespace.
finalDateTimeString = Regex.Replace(finalDateTimeString, @"^\s+", "");
finalDateTimeString = Regex.Replace(finalDateTimeString, @"\s+$", "");
[ I don't know C#, so I'm guessing at the syntax from the other posts. Corrections welcome. ]

private String stringclear(String str)
{
String tuslar = "qwertyuopasdfghjklizxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM._0123456789 :;-+/*#%()[]!\nüÜğĞİışŞçÇöÖ"; // also you can add utf-8 chars
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.Length; i++)
{
if (tuslar.Contains(str[i].ToString())) //from tuslar string. non special chars
sb.Append(str[i]);
if (str[i] == (char)13) // special char (enter key)
sb.Append(str[i]);
}
return sb.ToString();
}
