C# string.replace to remove illegal characters [duplicate]

This question already has answers here:
How to remove illegal characters from path and filenames?
(30 answers)
Closed 9 years ago.
I'm working on a program that reads files and saves pieces of them according to their column's title. Some of those titles contain characters that are illegal in file names, so I've written this piece of code to handle those issues.
string headerfile = saveDir + "\\" + tVS.Nodes[r].Text.Replace("\"", "").Replace
("/","").Replace(":"," -").Replace(">","(Greater Than)") + ".csv";
Is there a nicer way of doing this where I don't have 4 .Replace() calls? Or is there some sort of built-in illegal character remover I don't know of?
Thanks!
EDIT: It does not need to replace the characters with anything specific. A blank space is sufficient.

Regular expressions are generally a good way to do that, but not when you're replacing every character with something different. You might consider replacing them all with the same thing, and just using System.IO.Path.GetInvalidFileNameChars().
string filename = tVS.Nodes[r].Text;
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
    filename = filename.Replace(c, '_');
}

System.IO.Path.GetInvalidFileNameChars() has all the invalid characters.
Here's a sample method:
public static string SanitizeFileName(string fileName, char replacementChar = '_')
{
    var blackList = new HashSet<char>(System.IO.Path.GetInvalidFileNameChars());
    var output = fileName.ToCharArray();
    for (int i = 0, ln = output.Length; i < ln; i++)
    {
        if (blackList.Contains(output[i]))
        {
            output[i] = replacementChar;
        }
    }
    return new String(output);
}
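
A more compact variant of the same idea, as a minimal sketch: split the name on every invalid character and rejoin the pieces with the replacement. The names FileNameSketch and SanitizeWithSplit are made up for illustration. The set returned by Path.GetInvalidFileNameChars() differs by platform, so the result depends on where the code runs.

```csharp
using System;
using System.IO;

public static class FileNameSketch
{
    // Hypothetical helper: replaces every character that the current
    // platform reports as invalid in a file name with `replacement`.
    public static string SanitizeWithSplit(string fileName, string replacement = "_")
    {
        if (fileName == null) throw new ArgumentNullException(nameof(fileName));

        // Split cuts the name at each invalid character; Join glues the
        // remaining pieces back together with the replacement in between.
        string[] parts = fileName.Split(Path.GetInvalidFileNameChars());
        return string.Join(replacement, parts);
    }
}
```

Passing an empty string as the replacement removes the invalid characters instead of substituting them, which matches the edit in the question.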

Have a look at Regex.Replace here, it will do everything you desire when it comes to stripping out characters individually. Selective replacement of other strings may be trickier.

string charsToRemove = @"\/:"; // TODO: complete this list
string filename = tVS.Nodes[r].Text;
string pattern = string.Format("[{0}]", Regex.Escape(charsToRemove));
filename = Regex.Replace(filename, pattern, "");
If you just want to remove illegal chars, rather than replacing them with something else you can use this.

Related

C#: get rid of multiple invalid characters in string [duplicate]

This question already has answers here:
Replace multiple characters in a C# string
(15 answers)
Closed 3 years ago.
I am new to C#. Say that I have a string like this:
string test = "yes/, I~ know# there@ are% invalid£ characters$ in& this* string^";
If I wanted to get rid of a single invalid symbol, I would do:
if (test.Contains('/'))
{
test = test.Replace("/","");
}
But is there a way I can use a list of symbols as argument of the Contains and Replace functions, instead of deleting symbols one by one?

I would go with the regular expression solution
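test = Regex.Replace(test, @"\/|~|#|@|%|£|\$|&|\*|\^", "");
Add a | (the regex "or" operator) between each character you want to match, then run the replace.
Bear in mind that \/ matches a plain /, and that characters with a special meaning in a regex, such as $, * and ^, must be escaped with a backslash.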

You'll likely be better off defining acceptable characters than trying to think of and code for everything you need to eliminate.
Because you mention that you are learning, sounds like the perfect time to learn about Regular Expressions. Here are a couple of links to get you started:
Regular Expression Language - Quick Reference (MSDN)
C# Regex.Match Examples (DotNetPerls)
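
The allow-list idea can be sketched as follows. This is only an illustration: the set of acceptable characters (ASCII letters, digits, spaces and commas) is an assumption, and the names WhitelistSketch and KeepAllowed are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WhitelistSketch
{
    // Assumed allow-list: ASCII letters, digits, spaces and commas.
    // The leading ^ inside the brackets negates the class, so the
    // pattern matches every character that is NOT on the list.
    private static readonly Regex NotAllowed = new Regex("[^A-Za-z0-9 ,]");

    // Removes every character that is not on the allow-list.
    public static string KeepAllowed(string input)
    {
        return NotAllowed.Replace(input, "");
    }
}
```

For example, KeepAllowed("yes/, I~ know# these% are$ invalid*") returns "yes, I know these are invalid". Any symbol you did not think of is removed as well, which is the main advantage over listing the unwanted characters.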

I don't think there is such a feature out of the box.
I think your idea is pretty much on point, although in my opinion you don't really need the if (test.Contains(..)) part. With it, the string is scanned once to see whether the character is present, and then scanned again to replace it.
It would be faster just to replace the special characters right away. So...
List<string> specialChars = new List<string>() {"*", "/", "&"};
for (var i = 0; i < specialChars.Count; i++)
{
test = test.Replace(specialChars[i],"");
}

Your solution is:
Path.GetInvalidPathChars()
So the code would look something like this:
string illegal = "yes/, I~ know# there@ are% invalid£ characters$ in& this* string^";
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
foreach (char c in invalid)
{
    illegal = illegal.Replace(c.ToString(), "");
}

Another variant:
List<string> chars = new List<string> {"!", "#"};
string test = "My funny! string#";
foreach (var c in chars)
{
test = test.Replace(c,"");
}
No need to use Contains as Replace does that.
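
The loop-based answers above call Replace once per unwanted character. As a hedged alternative, the sketch below filters the string in a single pass with LINQ. RemoveCharsSketch and RemoveAll are hypothetical names, not part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RemoveCharsSketch
{
    // Returns `input` without any of the characters in `unwanted`.
    // A string can be passed as `unwanted` because string implements
    // IEnumerable<char>.
    public static string RemoveAll(string input, IEnumerable<char> unwanted)
    {
        var set = new HashSet<char>(unwanted);

        // Keep only the characters that are not in the set.
        return new string(input.Where(c => !set.Contains(c)).ToArray());
    }
}
```

For example, RemoveCharsSketch.RemoveAll("My funny! string#", "!#") returns "My funny string".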

How to remove backslashes in a string

I get a string from a website called "willekeurigwoord.nl" which means random word. So when I get the string from the site with HtmlAgilityPack, it is formatted like "\n\t\t\tkegelvrucht\r\n \t\n\t\t".
So the word that I get is "kegelvrucht", but before and after the word there are backslashes. When I try to remove them they are ignored, even when I put "@" in front of the string or use double backslashes ("\\").
So my question is, how do I remove the \ in my string?
I did try everything that is in the comment lines.
private string RandomWordOnline() //Get the word online
{
    //get string from html file with htmlagilitypack
    var webGet = new HtmlWeb();
    var doc = webGet.Load("http://www.willekeurigwoord.nl/");
    String word = doc.DocumentNode.SelectSingleNode("//h1").InnerText;
    //word = word.Replace(@"\", "");
    //word = @word.Trim(new char[] {' ','\\'});
    //word = word.Substring(8, word.Length - 13);
    //word = word.Substring(0, 13);
    //trying to remove backslash, does not work
    for (int i = 0; i < word.Length; i++)
    {
        char chrWord = Convert.ToChar(word.Substring(i, 1));
        char backslash = Convert.ToChar(@"\");
        if (chrWord == backslash)
        {
            word = word.Remove(i, 1);
        }
    }
    return word;
}

Those backslashes are not in the string, they are just a representation of tabs, carriage returns and line feeds. For example, a string which Visual Studio shows as \t\t\n\n is only 4 characters long, not 8.
You can get rid of them just like this:
var webGet = new HtmlWeb();
var doc = webGet.Load("http://www.willekeurigwoord.nl/");
String word = doc.DocumentNode.SelectSingleNode("//h1").InnerText;
string fixedWord = word.Trim();
Trim removes all whitespace that surrounds your text, including tabs and newlines. If you only want to remove some specific characters, or to remove them in the middle of the string, you need to do something like this:
string fixedWord = word.Replace("\t", "").Replace("\n", "").Replace("\r", "").Trim();
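
The difference between an escape sequence and a real backslash can be checked with a small sketch. The class name EscapeSketch and its members are made up for illustration.

```csharp
public static class EscapeSketch
{
    // Regular literal: each \t or \n is ONE control character,
    // so this string has 4 characters.
    public const string Escaped = "\t\t\n\n";

    // Verbatim literal: a real backslash followed by the letter t,
    // so this string has 2 characters.
    public const string Verbatim = @"\t";

    // Strips leading and trailing whitespace, including tabs,
    // carriage returns and line feeds.
    public static string Clean(string raw)
    {
        return raw.Trim();
    }
}
```

Here EscapeSketch.Escaped.Length is 4 and EscapeSketch.Verbatim.Length is 2, and EscapeSketch.Clean("\n\t\t\tkegelvrucht\r\n \t\n\t\t") returns "kegelvrucht". A Replace that searches for @"\" finds nothing in the downloaded text because that text contains no backslash characters.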

Just call Trim() on your string:
string cleaned = word.Trim();
It will remove all leading and trailing whitespace, which includes all of the characters you want removed.

Probably a C# String expert will know the answer you are looking for. But this is a great example of where post-C languages make things harder. Probably your \ is being taken as an escape character by the compiler, so the code never sees it at run time.
By the way, "word" is a terrible choice for a label because it is reserved in most languages (meaning a type 16 bits wide or something similar).
In C, you just go through the string character by character and copy each one into a new string based on whether it is or isn't '\\'. (I didn't test/debug this, and you need to add bounds checking unless you know the sizes of all the strings.)
i = j = 0;
while (strIn[i] != '\0') {
    if (strIn[i] != '\\') {
        strOut[j++] = strIn[i];
    }
    i++;
}
strOut[j] = '\0'; /* terminate the copy */
(If that sounds like extra work, know that at run time, your C# is doing that anyway, and hiding the required interaction with the memory manager from you so you don't know why your program runs slowly.)

Splitting string with dots and spaces [duplicate]

This question already has answers here:
Regex split string but keep separators
(2 answers)
Closed 7 years ago.
Let's say I have the following string: "Hey. This is a test. Haha.". I want to split it so that it becomes these three lines:
Hey.
This is a test.
Haha.
(Note that the space after the dot is preserved).
I tried to split the string using the Split method, splitting on the dot, but it returns 3 new strings with a space at the start of each string, and it removes the dots. I want to keep the dot and keep the space after it.
How can I achieve this?
EDIT: I found a workaround, but I'm sure there's a simpler way:
string a = "Hey. This is a test. Haha.";
string[] splitted = a.Split('.');
foreach(string b in splitted)
{
if (b.Length < 3)
{
continue;
}
string f = b.Remove(0, 1);
Console.WriteLine(f + ". ");
}

I can't test this, but based on Darin Dimitrov's post:
string input = "Hey. This is a test. Haha.";
string result = input.Replace(". ", ".\n");
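
If separate strings are needed rather than one string with line breaks, a split that keeps the separators can be sketched with a regex lookbehind. This is only one possible approach, and the names SentenceSplitSketch and SplitKeepingDots are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SentenceSplitSketch
{
    // Splits at every position that is preceded by a dot and a space.
    // The lookbehind (?<=...) matches a position, not characters, so
    // the dot and the space stay attached to the preceding piece.
    public static string[] SplitKeepingDots(string input)
    {
        return Regex.Split(input, @"(?<=\. )");
    }
}
```

For "Hey. This is a test. Haha." this yields three pieces: "Hey. ", "This is a test. " and "Haha.".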

Remove characters between different parameters [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 7 years ago.
I have a string of different emails
ex: "email1@uy.com, email2@iu.it, email3@uu.edu" etc, etc
I would like to formulate a Regex that creates the following output
ex: "email1,email2,email3" etc, etc
How can I remove the characters between an "@" and a ",", while keeping the "," and the space, in C#?
Thank you so much for the help!!

If you want to replace all characters between @ and the comma with nothing, the easiest option is to use Regex.Replace:
var emails = "a@m.com, b@m.com, d@m.com";
var result = Regex.Replace(emails, "@[^,]+", string.Empty);
// result is "a, b, d"
Please note that it leaves the space after each comma in the result, as you asked for in your question, though your example result has the spaces removed.
The regular expression looks for all substrings that start with an '@' character followed by one or more characters that are not a comma. Those substrings are replaced with an empty string.

Replacing all occurrences of @[^,]+ with an empty string will do the job.
The expression matches sequences that start at @, inclusive, and run up to a comma or to the end, exclusive. Therefore, commas in the original string of e-mails would be kept.
Demo.

Maybe you don't need a regex; in that case you can do the following:
string input = "email1@uy.com, email2@iu.it, email3@uu.edu";
input = input.Replace(" ", "");
string[] occurrences = input.Split(',');
for (int i = 0; i < occurrences.Length; i++)
{
    string s = occurrences[i];
    occurrences[i] = s.Substring(0, s.IndexOf('@'));
}
string final = string.Join(", ", occurrences);

How to remove spaces and newlines in a string [duplicate]

This question already has answers here:
Fastest way to remove white spaces in string
(13 answers)
Closed 9 years ago.
Sorry, I'm not very experienced with C# ASP.NET; I hope I can make myself understood.
I have this situation
string content = ClearHTMLTags(HttpUtility.HtmlDecode(e.Body));
content=content.Replace("\r\n", "");
content=content.Trim();
((Post)sender).Description = content + "...";
I want to make sure that the string content contains neither spaces (Trim) nor carriage returns with line feeds. I'm using the code above, but it doesn't work well.
Any suggestions?
Thank you very much
Fabry

You can remove all whitespace with this regex:
content = Regex.Replace(content, @"\s+", string.Empty);
See MSDN for which characters count as whitespace.
By the way, you are mistaking Trim for something that removes all spaces; in fact it only removes whitespace at the beginning and at the end of the string. If you want to remove all spaces and carriage returns, use my regex.
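
The same result can be had without a regex. The following is a minimal sketch using LINQ and char.IsWhiteSpace; the names WhitespaceSketch and RemoveWhitespace are made up for illustration.

```csharp
using System.Linq;

public static class WhitespaceSketch
{
    // Drops every whitespace character (spaces, tabs, carriage
    // returns, line feeds, ...) anywhere in the string, not only at
    // the ends as Trim does.
    public static string RemoveWhitespace(string input)
    {
        return string.Concat(input.Where(c => !char.IsWhiteSpace(c)));
    }
}
```

For example, WhitespaceSketch.RemoveWhitespace(" a\r\n b\tc ") returns "abc".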

this should do it
String text = @"hdjhsjhsdcj/sjksdc\t\r\n asdf";
string[] charactersToReplace = new string[] { @"\t", @"\n", @"\r", " " };
foreach (string s in charactersToReplace)
{
text = text.Replace(s, "");
}

A simple change: you only missed the @ symbol.
string content = ClearHTMLTags(HttpUtility.HtmlDecode(e.Body));
content=content.Replace(@"\r\n", "");
content=content.Trim();
((Post)sender).Description = content + "...";
