Removing unwanted characters from a string - c#

I have a program where I take a date from an RSS file and attempt to convert it into a DateTime. Unfortunately, the RSS file that I have to use has a lot of spacing issues. When I parse the string I get this:
"\t\t\n\t\t4/13/2011\n\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t"
I want to remove all of the \t's and \n's. So far, all of these attempts have failed:
finalDateString.Trim('\t');
finalDateString.Trim('\n');
finalDateString.Trim();
finalDateString.Replace("\t", "");
finalDateString.Replace("\n", "");
finalDateString.Replace(" ", "");
Every one of the commands will return the same string. Any suggestions?
(I tagged RSS in the case that there is an RSS reason for this)

You need to assign the output of Replace back to the original variable, because strings are immutable and Replace returns a new string. You do not need Trim either, as Replace will get rid of every occurrence.
finalDateString = finalDateString.Replace("\t", "");
finalDateString = finalDateString.Replace("\n", "");
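To illustrate why the attempts in the question appeared to do nothing, here is a minimal sketch. The class and method names (WhitespaceDemo, StripTabsAndNewlines) are made up for the example, and stripping "\r" as well is an assumption that goes beyond the string shown in the question.

```csharp
using System;

public static class WhitespaceDemo
{
    // Hypothetical helper: removes tabs and newlines anywhere in the string.
    // Handling "\r" too is an assumption; the question only shows \t and \n.
    public static string StripTabsAndNewlines(string input)
    {
        // Each Replace call returns a new string; the original is never modified.
        return input.Replace("\t", "").Replace("\n", "").Replace("\r", "");
    }

    public static void Main()
    {
        string raw = "\t\t\n\t\t4/13/2011\n\t\t\t\t";

        raw.Trim();                          // result is discarded, raw is unchanged
        Console.WriteLine(raw.Length);       // still the length of the padded string

        string trimmed = raw.Trim();         // keep the returned value instead
        Console.WriteLine(trimmed);
        Console.WriteLine(StripTabsAndNewlines(raw));
    }
}
```

Since the unwanted characters here only surround the date, assigning the result of Trim() is enough; chaining Replace is only needed when they can also appear in the middle.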

First, you can remove all the whitespace from your string by using a 1-character regular expression:
String finalDateTimeString = "\t\t\n\t\t4/13/2011\n\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t";
Regex whitespaceRegex = new Regex("\\s");
finalDateTimeString = whitespaceRegex.Replace(finalDateTimeString, "");
I just tested this, and it worked.
Second, I just tested calling DateTime.Parse() on your string, and it worked without even removing the whitespace. So maybe you don't even have to do that.
String finalDateTimeString = "\t\t\n\t\t4/13/2011\n\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t";
DateTime finalDateTime = DateTime.Parse(finalDateTimeString);
// finalDateTime.ToString() == "4/13/2011 12:00:00 AM" (with en-US culture settings)
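One caveat: DateTime.Parse uses the current culture, so "4/13/2011" may fail or be read differently on a machine with day-first date settings. A hedged sketch that pins the format, assuming the feed always uses month/day/year (the RssDateDemo and ParseRssDate names are made up for the example):

```csharp
using System;
using System.Globalization;

public static class RssDateDemo
{
    // Hypothetical helper: trims surrounding whitespace, then parses with a fixed
    // format. "M/d/yyyy" is an assumption about the feed, not something it guarantees.
    public static DateTime ParseRssDate(string raw)
    {
        return DateTime.ParseExact(raw.Trim(), "M/d/yyyy", CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        DateTime date = ParseRssDate("\t\t\n\t\t4/13/2011\n\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t");
        Console.WriteLine(date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```

If the feed can contain malformed dates, DateTime.TryParseExact with the same arguments avoids the exception.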

I would use regular expressions:
string strRegex = @"([\s])";
Regex myRegex = new Regex(strRegex);
string strTargetString = @" 4/13/2011 ";
string strReplace = "";
string result = myRegex.Replace(strTargetString, strReplace);

Using Regex.Replace:
string result = Regex.Replace(data, "[\\t\\n]", "");

All the previously posted answers remove all whitespace from the string, but it would be more robust to only remove leading and trailing whitespace.
finalDateTimeString = Regex.Replace(finalDateTimeString, @"^\s+", "");
finalDateTimeString = Regex.Replace(finalDateTimeString, @"\s+$", "");
[ I don't know C#, so I'm guessing at the syntax from the other posts. Corrections welcome. ]

private String stringclear(String str)
{
String tuslar = "qwertyuopasdfghjklizxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM._0123456789 :;-+/*#%()[]!\nüÜğĞİışŞçÇöÖ"; // also you can add utf-8 chars
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.Length; i++)
{
if (tuslar.Contains(str[i].ToString())) //from tuslar string. non special chars
sb.Append(str[i]);
if (str[i] == (char)13) // special char (enter key)
sb.Append(str[i]);
}
return sb.ToString();
}

Related

How can I eliminate a quote from the start of my string using regex?

I have strings that sometimes start like this:
"[1][v5r,vi][uk]
Other times like this:
[1][v5r,vi][uk]
How can I remove the " when it appears at the start of a string using Regex? I know I need to do something like this, but not sure how to set it up:
regex = new Regex(@"(\n )?\[ant=[^\]]*\]");
regex.Replace(item.JmdictMeaning, "");
If the string always starts with [1]:
int indexOfFirstElement = item.IndexOf("[1]");
if (indexOfFirstElement > 0)
item = item.Substring(indexOfFirstElement);
If you just want to start at the first [:
int indexOfFirstElement = item.IndexOf('[');
if (indexOfFirstElement > 0)
item = item.Substring(indexOfFirstElement);
Simpler than Regex, which is probably overkill for this problem.
Here you go
string input = @" ""[1][v5r,vi][uk]";
string pattern = @"^\s*""?|""?\s*$";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, "");
Console.WriteLine(result);
string.StartsWith will do the trick
string str = "\"[1][v5r,vi][uk]";
if(str.StartsWith('"'))
str = str.Substring(1);
It can be done using IndexOf and Substring:
string str = "\"a[1][v5r,vi][uk]";
Console.WriteLine(str.Substring(str.IndexOf('[')));
Use TrimStart() to remove this character if it exists:
string str = "\"a[1][v5r,vi][uk]";
str= str.TrimStart('\"');

Removing parts of the path from string

Consider the following string
string path = @"\\ParentDirectory\All_Attachments$\BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt";
I am trying to modify the path by removing the \\ParentDirectory\All_Attachments$\. So I want my final string to look like:
BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt
I have come up with the following regex
string pattern = @"(?<=\$)(\\)";
string returnValue = Regex.Replace(path, pattern, "", RegexOptions.IgnoreCase);
With the above if I do Console.WriteLine(returnValue) I get
\\ParentDirectory\All_Attachments$BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt
So it only removes the \ after the $. Can someone tell me how to remove the whole prefix, please?
The code below should do the trick.
string path = @"\\ParentDirectory\All_Attachments$\BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt";
var result = Regex.Replace(path,
@"^ # Start of string
[^$]+ # Anything that is not '$' at least one time
\$ # The '$' sign
\\ # The \ after the '$'
", String.Empty, RegexOptions.IgnorePatternWhitespace);
When executed in LinqPad it gives the following result:
BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt
As an alternative that avoids a regex or split/join, you could just run along the string until you have seen four backslashes:
string result = null;
for (int i = 0, m = 0; i < path.Length; i++)
if (path[i] == '\\' && ++m == 4) {
result = path.Substring(i + 1);
break;
}
You could use a regex that matches the first two groups of backslash(es) followed by one or more non-backslashes, plus the $ and the backslash after it:
string returnValue = Regex.Replace(path, @"^(?:\\+[^\\]+){2}\$\\", "");
Or split on $, join the string array without its first element, and then trim \ from the start:
string returnValue = string.Join(null, path.Split('$').Skip(1)).TrimStart('\\');
Note that Skip requires a using System.Linq; directive.
You can use a combination of Substring() and IndexOf() to accomplish your goal. Skip two characters so that both the $ and the backslash after it are dropped:
string result = path.Substring(path.IndexOf("$") + 2);
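IndexOf returns -1 when the marker is not found, so a bare Substring one-liner can return an unexpected slice for paths that do not contain the share name. A guarded sketch, assuming the prefix always ends with the two characters `$\` (PathPrefixDemo and StripSharePrefix are made-up names for the example):

```csharp
using System;

public static class PathPrefixDemo
{
    // Hypothetical helper: returns everything after the first "$\" marker,
    // or the path unchanged when the marker is absent.
    public static string StripSharePrefix(string path)
    {
        int index = path.IndexOf("$\\", StringComparison.Ordinal);
        return index < 0 ? path : path.Substring(index + 2);
    }

    public static void Main()
    {
        string path = @"\\ParentDirectory\All_Attachments$\BATCH_NUMBERS\TS0001\SubDirectory\FileName.txt";
        Console.WriteLine(StripSharePrefix(path));
    }
}
```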

Delete string from a double in C#

I am sending data from an Arduino to C# and have a problem. The value I get from the serial read comes with a "\r" at the end, for example "19.42\r". I found a way to delete the characters after my number using Regex, but it also strips the decimal point: "19.42\r" becomes "1942". How can I remove the trailing characters but still keep the value as a double?
line = Regex.Replace(line, @"[^\d]", string.Empty);
You want to trim the whitespace from the end of the string.
Use
line = line.TrimEnd();
If you need to actually extract a double number from a string with regex, use
var my_number = string.Empty;
var match = Regex.Match(line, @"[0-9]+\.[0-9]+");
if (match.Success)
{
my_number = match.Value;
}
If the number may have no fractional part, use the regex @"[0-9]*\.?[0-9]+" instead.
string data = "19.42\r";
return data.Substring(0, data.Length - 1);
or even better
data.TrimEnd('\r')
If \r is always the character you want to remove:
string str = "awdawdaw\r";
str = str.Replace("\r", "");
If the trailing character varies, drop the last character instead:
string str = "awdawdaw\\";
str = str.Substring(0, str.Length - 1); // the trailing character is removed
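Since the goal in the question is a double rather than a cleaned-up string, the trimming can be combined with the conversion. A minimal sketch, assuming the Arduino always sends a "." as the decimal separator (SerialValueDemo and TryReadValue are made-up names for the example):

```csharp
using System;
using System.Globalization;

public static class SerialValueDemo
{
    // Hypothetical helper: trims "\r", "\n" and other surrounding whitespace, then
    // parses with the invariant culture so "." is always the decimal separator.
    public static bool TryReadValue(string line, out double value)
    {
        return double.TryParse(line.Trim(), NumberStyles.Float,
                               CultureInfo.InvariantCulture, out value);
    }

    public static void Main()
    {
        double value;
        if (TryReadValue("19.42\r", out value))
            Console.WriteLine(value.ToString(CultureInfo.InvariantCulture));
        else
            Console.WriteLine("Not a number");
    }
}
```

Using TryParse rather than Parse means a partial or corrupted serial line is reported instead of throwing.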

Extra delimiter in MAC Address reformat

I've looked at several questions on here about formatting and validating MAC addresses, which is where I developed my regex from. The problem is that when I update the field, the newly formatted MAC contains extra delimiters, or, if the input has no delimiters, the MAC fails to validate. I'm new to using regex, so can someone clarify why this is happening?
if (checkMac(NewMacAddress.Text) == true)
{
string formattedMAC = NewMacAddress.Text;
formattedMAC.Replace(" ", "").Replace(":", "").Replace("-", ""); //attempt to remove the delimiters before formatting
var regex = "(.{2})(.{2})(.{2})(.{2})(.{2})(.{2})";
var replace = "$1:$2:$3:$4:$5:$6";
var newformat = Regex.Replace(formattedMAC, regex, replace);
NewMacAddress.Text = newformat.ToString();
}
Here is the checkmac function
protected bool checkMac(string macaddress)
{
macaddress.Replace(" ", "").Replace(":", "").Replace("-", "");
Regex r = new Regex("^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$");
if (r.Match(macaddress).Success)
{
return true;
}
else
{
return false;
}
}
This is sample output for the extra delimiter that I was talking about. 00::5:0::56::b:f:00:7f
I was able to get the original MAC from a textbox. This also occurs with the MAC address I get from screen scrapes.
Your code is not working as intended for two reasons:
String.Replace does not modify the string you pass in, but returns a new string instead (strings are immutable). You have to assign the result of String.Replace to a variable.
Your checkMac function only allows mac addresses with delimiters. You can simply remove this restriction to resolve your problems.
The working code then becomes something along the lines of:
string newMacAddress = "00::5:0::56::b:f:00:7f";
if (checkMac(newMacAddress) == true)
{
string formattedMAC = newMacAddress;
formattedMAC = formattedMAC.Replace(" ", "").Replace(":", "").Replace("-", ""); //attempt to remove the delimiters before formatting
var regex = "(.{2})(.{2})(.{2})(.{2})(.{2})(.{2})";
var replace = "$1:$2:$3:$4:$5:$6";
var newformat = Regex.Replace(formattedMAC, regex, replace);
newMacAddress = newformat.ToString();
}
protected static bool checkMac(string macaddress)
{
macaddress = macaddress.Replace(" ", "").Replace(":", "").Replace("-", "");
Regex r = new Regex("^([0-9A-Fa-f]{12})$");
if (r.Match(macaddress).Success)
{
return true;
}
else
{
return false;
}
}
You're close. I'm first going to answer using Ruby because that's what I'm most familiar with at the moment, and it should be sufficient for you to understand how to get it working in C#. Maybe I can convert it to C# later.
using these elements:
\A - start of entire string
[0-9a-fA-F] - any hex digit
{2} - twice
[:-]? - either ":" or "-" or "" (no delimiter)
\Z - end of entire string, before ending newline if it exists
() - parenthetical match in order to reference parts of the regex, e.g. match[1]
This regex will do what you need:
mac_address_regex = /\A([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})[:-]?([0-9a-fA-F]{2})\Z/
You can both validate and sanitize input using this regex:
match = mac_address_regex.match(new_mac_address.text)
if match.present?
sanitized_mac_addr = (1..6).map { |i| match[i] }.join(":") # join match[i] for i = (1,2,3,4,5,6)
sanitized_mac_addr.upcase! # uppercase
else
sanitized_mac_addr = nil
end
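For reference, a rough C# translation of the Ruby approach above might look like the following. It is a sketch, not a tested drop-in: the MacAddressDemo and Sanitize names are made up, and like the Ruby version it accepts ":" or "-" or no delimiter between octets, but not spaces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MacAddressDemo
{
    // One captured octet: exactly two hex digits.
    private const string Octet = "([0-9a-fA-F]{2})";

    // Six octets separated by an optional ':' or '-', anchored to the whole string.
    private static readonly Regex MacRegex = new Regex(
        @"\A" + string.Join("[:-]?", Enumerable.Repeat(Octet, 6)) + @"\Z");

    // Hypothetical helper: returns the address as "AA:BB:CC:DD:EE:FF",
    // or null when the input is not a valid MAC address.
    public static string Sanitize(string input)
    {
        Match match = MacRegex.Match(input);
        if (!match.Success)
            return null;

        // Groups 1..6 hold the six octets.
        return string.Join(":", Enumerable.Range(1, 6).Select(i => match.Groups[i].Value))
                     .ToUpperInvariant();
    }

    public static void Main()
    {
        Console.WriteLine(Sanitize("00-50-56-bf-00-7f") ?? "(invalid)");
        Console.WriteLine(Sanitize("005056bf007f") ?? "(invalid)");
        Console.WriteLine(Sanitize("00::5:0::56::b:f:00:7f") ?? "(invalid)");
    }
}
```

Validation and formatting happen in one pass, so there is no separate step that strips delimiters first.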

Unicode characters string

I have the following String of characters.
string s = "\\u0625\\u0647\\u0644";
When I print the above sequence, I get:
\u0625\u0647\u0644
How can I get the real printable Unicode characters instead of this \uxxxx representation?
If you really don't control the string, then you need to replace those escape sequences with their values:
Regex.Replace(s, @"\\u([0-9A-Fa-f]{4})", m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());
and hope that you don't have \\ escapes in there too.
Asker posted this as an answer to their question:
I have found the answer:
s = System.Text.RegularExpressions.Regex.Unescape(s);
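As a quick check that the two approaches agree on the string from the question, here is a small sketch (UnescapeDemo and its method names are made up for the example). Note that Regex.Unescape also interprets other escapes such as \n and \t, which may or may not be what you want for arbitrary input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Replaces each literal \uXXXX sequence with the character it encodes.
    public static string DecodeWithReplace(string s)
    {
        return Regex.Replace(s, @"\\u([0-9A-Fa-f]{4})",
            m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());
    }

    // Lets the framework interpret all escape sequences in the string.
    public static string DecodeWithUnescape(string s)
    {
        return Regex.Unescape(s);
    }

    public static void Main()
    {
        string s = "\\u0625\\u0647\\u0644";   // literal backslash-u sequences
        string viaReplace = DecodeWithReplace(s);
        string viaUnescape = DecodeWithUnescape(s);

        Console.WriteLine(viaReplace.Length);           // three characters expected
        Console.WriteLine(viaReplace == viaUnescape);   // both approaches should agree
    }
}
```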
Try Regex:
String inputString = "\\u0625\\u0647\\u0644";
var stringBuilder = new StringBuilder();
foreach (Match match in Regex.Matches(inputString, @"\\u([\dA-Fa-f]{4})"))
{
// Parse the four hex digits as base 16 and append the corresponding character.
stringBuilder.Append((char)Convert.ToInt32(match.Groups[1].Value, 16));
}
var result = stringBuilder.ToString();
I had the following string "\u0001" and I wanted to get the value of it.
I tried a lot but this is what worked for me
int val = Convert.ToInt32(Convert.ToChar("\u0001")); // val = 1;
if you have multiple chars you can use the following technique
var original ="\u0001\u0002";
var s = "";
for (int i = 0; i < original.Length; i++)
{
s += Convert.ToInt32(Convert.ToChar(original[i]));
}
// s will be "12"
I would suggest the use of String.Normalize. You can find everything here:
http://msdn.microsoft.com/it-it/library/8eaxk1x2.aspx
