Remove illegal characters from filename to burn on CD - c#

I have done some research, but my app downloads MP3 files and every once in a while I get a weird filename, which doesn't hurt until I try to burn the files to CD. Below is a good example.
The Animals - House of the Rising Sun (1964) + clip compilation ♫♥ 50 YEARS - counting.mp3
I have some code that tries to catch illegal characters, but it doesn't stop this filename. Is there a better way to catch the weird stuff? The code I currently use is:
public static string RemoveIllegalFileNameChars(string input, string replacement = "")
{
    if (input.Contains("?"))
    {
        input = input.Replace('?', char.Parse(" "));
    }
    if (input.Contains("&"))
    {
        input = input.Replace('&', char.Parse("-"));
    }
    var regexSearch = new string(Path.GetInvalidFileNameChars()) +
                      new string(Path.GetInvalidPathChars());
    var r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
    return r.Replace(input, replacement);
}

The CD file system is different to the OS file system, so those Path.GetInvalidX functions don't really apply to CDs.
I'm not sure, but possibly the standard you are looking at is ISO 9660
https://en.wikipedia.org/wiki/ISO_9660
Which has an extremely limited character set in filenames.
I think the Joliet extension to that standard must be in play:
https://en.wikipedia.org/wiki/Joliet_(file_system)
I think that maybe you are running into the filename length problem more than anything: "The specification only allows filenames to be up to 64 Unicode characters in length". Your filename is 90 characters long.
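If length really is the problem, a minimal sketch of shortening the name while keeping the extension could look like the following. The class and method names (JolietNames, TruncateFileName) are hypothetical, and the sketch assumes the 64-character limit quoted above is counted in UTF-16 code units, which is what string.Length reports.

```csharp
using System;
using System.IO;

public static class JolietNames
{
    // 64 is the limit quoted above for Joliet filenames.
    public const int MaxJolietLength = 64;

    // Hypothetical helper: shortens the base name so that
    // base name + extension fits into maxLength characters.
    // Expects a bare file name, not a full path.
    public static string TruncateFileName(string fileName, int maxLength = MaxJolietLength)
    {
        if (string.IsNullOrEmpty(fileName) || fileName.Length <= maxLength)
            return fileName;

        string extension = Path.GetExtension(fileName);              // e.g. ".mp3"
        string baseName = Path.GetFileNameWithoutExtension(fileName);

        // Pathological case: the extension alone does not fit.
        if (extension.Length >= maxLength)
            return fileName.Substring(0, maxLength);

        int keep = maxLength - extension.Length;
        // TrimEnd avoids a dangling space right before the extension.
        return baseName.Substring(0, keep).TrimEnd() + extension;
    }
}
```

It is probably best to call this after the illegal characters have been removed, so the limit is applied to the name that actually gets burned.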

The following code will turn non-ASCII characters into '?':
string sOut = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
Then you can use a sOut.Replace("?", "") call to take them out. Does this seem like it would work for you?
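One caveat with the round trip through '?' is that it also removes any question mark that was genuinely part of the name. A small alternative sketch that filters by character code instead is shown here; the names AsciiFilter and RemoveNonAscii are made up for illustration.

```csharp
using System.Linq;

public static class AsciiFilter
{
    // Keeps only characters in the 7-bit ASCII range (0..127).
    // Characters such as the music note and heart are dropped,
    // while a literal '?' in the input is left alone.
    public static string RemoveNonAscii(string input)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        return new string(input.Where(c => c <= 127).ToArray());
    }
}
```

Invalid filename characters would still need to be removed separately, for example with the Path.GetInvalidFileNameChars() approach from the question.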

Although your file name is valid in this case, the suggested way to catch invalid file names is the GetInvalidFileNameChars() method.
string fileName = "The Animals - House of the Rising Sun ? (1964) + clip compilation ♫♥ 50 YEARS - counting.mp3";
byte[] bytes = Encoding.ASCII.GetBytes(fileName);
char[] characters = Encoding.ASCII.GetChars(bytes);
string name = new string(characters);
StringBuilder fileN = new StringBuilder(name);
foreach (char c in Path.GetInvalidFileNameChars())
{
fileN.Replace(c, '_');
}
string validFileName = fileN.ToString();
https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getinvalidfilenamechars?view=netframework-4.7.2

Thanks for all your help. The final working code is listed below.
public static string RemoveIllegalFileNameChars(string input, string replacement = "")
{
    if (input.Contains("?"))
    {
        input = input.Replace('?', char.Parse(" "));
    }
    if (input.Contains("&"))
    {
        input = input.Replace('&', char.Parse("-"));
    }
    var regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
    var r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
    // check for non-ASCII characters: the encoder turns each one into '?'
    byte[] bytes = Encoding.ASCII.GetBytes(input);
    char[] chars = Encoding.ASCII.GetChars(bytes);
    string line = new String(chars);
    line = line.Replace("?", "");
    //MessageBox.Show(line);
    return r.Replace(line, replacement);
}

Related

Best way to use Regex and int's

I have an application that loops through a group of documents and if a value is detected, then the user receives a prompt to replace this value. My current code looks like the following:
if (alllines[i].Contains("$"))
{
    // prompt
    int dollarIndex = alllines[i].IndexOf("%");
    string nextTenChars = alllines[i].Substring(dollarIndex + 1, 18);
    string PromtText = nextTenChars.Replace("%", "").Replace("/*", "").Replace("*/", "");
    string promptValue = CreateInput.ShowDialog(PromtText, fi.FullName);
    if (promptValue.Equals(""))
    {
    }
    else
    {
        alllines[i] = alllines[i].Replace("$", promptValue);
        File.WriteAllLines(fi.FullName, alllines.ToArray());
    }
}
As you can see, the prompt box displays the 18 characters after the index (in this case, of %). However, if there are fewer than 18 characters, the application crashes. What I want to do is use regex, but I am unsure how to apply it to the code in its current state.
If I use the code below I get the error "Cannot convert from int to string". Any help would be appreciated.
Regex regex = new Regex(@"(\$.{1,10})");
var chars = regex.Matches(dollarIndex);
This should work:
Regex regex = new Regex(@"(/*%.{1,50})");
var chars = regex.Match(alllines[i]).ToString();
string promptValue = CreateInput.ShowDialog(PromtText, fi.FullName);
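If the regex is not a hard requirement, the crash itself can also be avoided by clamping the length passed to Substring. The following is only a sketch; TextPreview and TakeAfter are hypothetical names, not part of the original code.

```csharp
using System;

public static class TextPreview
{
    // Returns up to maxLength characters that follow the first
    // occurrence of marker, or an empty string if marker is absent.
    public static string TakeAfter(string line, char marker, int maxLength)
    {
        int index = line.IndexOf(marker);
        if (index < 0)
            return string.Empty;

        int start = index + 1;
        // Never ask Substring for more characters than remain in the line.
        int length = Math.Min(maxLength, line.Length - start);
        return line.Substring(start, length);
    }
}
```

In the question's loop this would replace the fixed Substring(dollarIndex + 1, 18) call, for example TextPreview.TakeAfter(alllines[i], '%', 18).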

Get specific words from string c#

I am working on a final-year project. I have a file that contains some text. I need to get the words from this file that carry the "//jj" tag, e.g. abc//jj, bcd//jj, etc.
Suppose the file contains the following text:
ffafa adada//bb adad ssss//jj aad adad adadad aaada dsdsd//jj
dsdsd sfsfhf//vv
dfdfdf
I need all the words that are associated with the //jj tag. I have been stuck on this for the past few days.
The code I am trying:
// Create OpenFileDialog
Microsoft.Win32.OpenFileDialog dlg = new Microsoft.Win32.OpenFileDialog();
// Set filter for file extension and default file extension
dlg.DefaultExt = ".txt";
dlg.Filter = "Text documents (.txt)|*.txt";
// Display OpenFileDialog by calling ShowDialog method
Nullable<bool> result = dlg.ShowDialog();
// Get the selected file name and display in a TextBox
string filename = string.Empty;
if (result == true)
{
// Open document
filename = dlg.FileName;
FileNameTextBox.Text = filename;
}
string text;
using (var streamReader = new StreamReader(filename, Encoding.UTF8))
{
text = streamReader.ReadToEnd();
}
string FilteredText = string.Empty;
string pattern = @"(?<before>\w+) //jj (?<after>\w+)";
MatchCollection matches = Regex.Matches(text, pattern);
for (int i = 0; i < matches.Count; i++)
{
FilteredText="before:" + matches[i].Groups["before"].ToString();
//Console.WriteLine("after:" + matches[i].Groups["after"].ToString());
}
textbx.Text = FilteredText;
I can't find my result, please help me.
With LINQ you could do this with one line:
string[] taggedwords = input.Split(' ').Where(x => x.EndsWith(@"//jj")).ToArray();
And all your //jj words will be there...
Personally I think Regex is overkill if that's definitely how the string will look. You haven't specified that you definitely need to use Regex so why not try this instead?
// A list that will hold the words ending with '//jj'
List<string> results = new List<string>();
// The text you provided
string input = @"ffafa adada//bb adad ssss//jj aad adad adadad aaada dsdsd//jj dsdsd sfsfhf//vv dfdfdf";
// Split the string on the space character to get each word
string[] words = input.Split(' ');
// Loop through each word
foreach (string word in words)
{
// Does it end with '//jj'?
if(word.EndsWith(@"//jj"))
{
// Yes, add to the list
results.Add(word);
}
}
// Show the results
foreach(string result in results)
{
MessageBox.Show(result);
}
Results are:
ssss//jj
dsdsd//jj
Obviously this is not quite as robust as a regex, but you didn't provide any more detail for me to go on.
You have an extra space in your regex; it assumes there's a space before "//jj". What you want is:
string pattern = @"(?<before>\w+)//jj (?<after>\w+)";
This regular expression will yield the words you are looking for:
string pattern = "(\\S*)\\/\\/jj";
A bit nicer without backslash escaping:
(\S*)\/\/jj
Matches will include the //jj but you can get the word from the first bracketed group.
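To show how the first group is read out, here is a small self-contained sketch built on the same idea. TagExtractor and WordsWithTag are hypothetical names, and the pattern is a slight variation: \S+ requires at least one character before the tag, and the lookahead (?!\S) requires the tag to end the word.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Returns every word that is immediately followed by the given tag,
    // without the tag itself. Works across spaces and line breaks because
    // \S matches any non-whitespace character.
    public static List<string> WordsWithTag(string text, string tag)
    {
        var results = new List<string>();
        string pattern = @"(\S+)" + Regex.Escape(tag) + @"(?!\S)";

        foreach (Match m in Regex.Matches(text, pattern))
        {
            // Group 0 is the whole match (word plus tag); group 1 is the word.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Called as TagExtractor.WordsWithTag(text, "//jj") on the sample text from the question, this should give ssss and dsdsd.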

How can I remove a last part of a string with an unknown int?

So, I'm making a file transfer program from one PC in my house to the other. The client can look through the server's files and take what it wants. (Makes it very easy for moving projects/documents/music). This is an example of what a string of a file looks like:
New Text Document.txt : "(FILE)-(" + f.Length + " Bytes)"
My problem is removing : "(FILE)-(" + f.Length + " Bytes)".
How can I remove JUST that part from the string? Where the f.Length is unknown...
Thanks!
Just as an alternative to the regex answers, one option is to use LastIndexOf to find the last occurrence of a known part of the string (e.g. (FILE)).
var oldString = "ThisIsAString (FILE)-(1234 Bytes)";
int indexToRemoveTo = oldString.LastIndexOf("(FILE)");
// Get all the characters from the start of the string to "(FILE)"
var newString = oldString.Substring(0, indexToRemoveTo);
I hope I've got what you want
string contents = "some text (FILE)-(5435 Bytes) another text";
string result = Regex.Replace(contents, @"\(FILE\)-\(\d+ Bytes\)", "");
Console.WriteLine (result);
Prints:
some text another text
Solution to remove everything after .txt
string contents = "some text .txt (FILE)-(5435 Bytes) another text";
string lastSegment = ".txt";
var result = contents.Substring(0, contents.IndexOf(lastSegment) + lastSegment.Length);
Console.WriteLine (result);
prints some text .txt
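// Group 1 captures everything before the "(FILE)-(N Bytes)" suffix,
// minus an optional " : " separator.
var match = Regex.Match(pattern: @"^(.*?)\s*:?\s*\(FILE\)-\(\d+ Bytes\)$", input: name);
if (match.Success)
{
    string fileName = match.Groups[1].Value;
}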
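One thing to watch with the LastIndexOf approach is that it returns -1 when the marker is missing, and Substring(0, -1) throws. A guarded sketch is below; FileEntry and StripFileSuffix are hypothetical names, and trimming a trailing " : " separator is an assumption about the exact format of the entries.

```csharp
using System;

public static class FileEntry
{
    // Removes the trailing "(FILE)-(N Bytes)" part from a listing entry.
    // Entries without the marker are returned unchanged.
    public static string StripFileSuffix(string entry)
    {
        const string marker = "(FILE)";
        int index = entry.LastIndexOf(marker, StringComparison.Ordinal);
        if (index < 0)
            return entry;

        // Assumption: the name and the marker are separated by " : ".
        return entry.Substring(0, index).TrimEnd(' ', ':');
    }
}
```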

In C# how can I prepare a string to be valid for windows directory name

I am writing a C# program which reads certain tags from files and based on tag values it creates a directory structure.
Now, there could be anything in those tags.
If a tag value is not suitable as a directory name, I have to prepare it by replacing the offending characters with something suitable, so that directory creation does not fail.
I was using the following code, but I realised it is not enough:
path = path.Replace("/","-");
path = path.Replace("\\","-");
Please advise on the best way to do it.
thanks,
Import System.IO namespace and for path use
Path.GetInvalidPathChars
and for filename use
Path.GetInvalidFileNameChars
For example:
string filename = "salmnas dlajhdla kjha;dmas'lkasn";
foreach (char c in Path.GetInvalidFileNameChars())
    filename = filename.Replace(System.Char.ToString(c), "");
foreach (char c in Path.GetInvalidPathChars())
    filename = filename.Replace(System.Char.ToString(c), "");
Then you can use Path.Combine to add the tags and create a path:
string mypath = Path.Combine(@"C:\", "First_Tag", "Second_Tag");
// returns C:\First_Tag\Second_Tag
You can use the full list of invalid characters here to handle the replacement as desired. These are available directly via the Path.GetInvalidFileNameChars and Path.GetInvalidPathChars methods.
The characters you must not use are: ? < > | : \ / * "
string PathFix(string path)
{
List<string> _forbiddenChars = new List<string>();
_forbiddenChars.Add("?");
_forbiddenChars.Add("<");
_forbiddenChars.Add(">");
_forbiddenChars.Add(":");
_forbiddenChars.Add("|");
_forbiddenChars.Add("\\");
_forbiddenChars.Add("/");
_forbiddenChars.Add("*");
_forbiddenChars.Add("\"");
for (int i = 0; i < _forbiddenChars.Count; i++)
{
path = path.Replace(_forbiddenChars[i], "");
}
return path;
}
Tip: You can't include a double-quote ("), but you can include two single quotes ('').
In this case:
string PathFix(string path)
{
List<string> _forbiddenChars = new List<string>();
_forbiddenChars.Add("?");
_forbiddenChars.Add("<");
_forbiddenChars.Add(">");
_forbiddenChars.Add(":");
_forbiddenChars.Add("|");
_forbiddenChars.Add("\\");
_forbiddenChars.Add("/");
_forbiddenChars.Add("*");
//_forbiddenChars.Add("\""); Do not delete the double-quote character, so we could replace it with 2 quotes (before the return).
for (int i = 0; i < _forbiddenChars.Count; i++)
{
path = path.Replace(_forbiddenChars[i], "");
}
path = path.Replace("\"", "''"); //Replacement here
return path;
}
You'll of course use only one of those (or combine them to one function with a bool parameter for replacing the quote, if needed)
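A rough sketch of the combined version with a bool parameter might look like this. The class name PathFixer and the parameter name quotesToApostrophes are invented for illustration; the character list is the same one used above.

```csharp
public static class PathFixer
{
    // Same forbidden characters as in the two functions above,
    // except the double quote, which is handled separately.
    private static readonly char[] Forbidden =
        { '?', '<', '>', ':', '|', '\\', '/', '*' };

    // When quotesToApostrophes is true, each double quote becomes two
    // single quotes; otherwise double quotes are removed like the rest.
    public static string PathFix(string path, bool quotesToApostrophes = false)
    {
        foreach (char c in Forbidden)
        {
            path = path.Replace(c.ToString(), "");
        }
        return path.Replace("\"", quotesToApostrophes ? "''" : "");
    }
}
```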
The correct answer of Nikhil Agrawal has some syntax errors.
Just for the reference, here is a compiling version:
public static string MakeValidFolderNameSimple(string folderName)
{
if (string.IsNullOrEmpty(folderName)) return folderName;
foreach (var c in System.IO.Path.GetInvalidFileNameChars())
folderName = folderName.Replace(c.ToString(), string.Empty);
foreach (var c in System.IO.Path.GetInvalidPathChars())
folderName = folderName.Replace(c.ToString(), string.Empty);
return folderName;
}
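Since the question asked about replacing characters rather than deleting them, here is a variation as a sketch. FolderNames and MakeValidFolderName are hypothetical names. It also trims trailing spaces and periods, because Windows does not accept directory names that end in either.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class FolderNames
{
    // Replaces every invalid file name character with the given
    // replacement and trims trailing spaces and periods.
    public static string MakeValidFolderName(string name, char replacement = '-')
    {
        if (string.IsNullOrEmpty(name))
            return name;

        // A set gives constant-time lookups while scanning the name once.
        var invalid = new HashSet<char>(Path.GetInvalidFileNameChars());
        var sb = new StringBuilder(name.Length);

        foreach (char c in name)
        {
            sb.Append(invalid.Contains(c) ? replacement : c);
        }
        return sb.ToString().TrimEnd(' ', '.');
    }
}
```

Note that the result can still be empty (for example, if the tag was only periods), so the caller may want a fallback name for that case.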

c# split line with .txt

Let's say I have 5 lines within a text file called users.txt, and each line has the following information:
username:password
How would I go about splitting each line within the file and storing the username as one string and the password as another?
I have the code below to grab a random line. It is used in another part of my project as well, so I don't want it to be altered. I was thinking of calling another function after the line has been grabbed, but I have no idea how to split it on the ':'.
private static string GetRandomLine(string file)
{
List<string> lines = new List<string>();
Random rnd = new Random();
int i = 0;
try
{
if (File.Exists(file))
{
//StreamReader to read our file
StreamReader reader = new StreamReader(file);
//Now we loop through each line of our text file
//adding each line to our list
while (!(reader.Peek() == -1))
lines.Add(reader.ReadLine());
//Now we need a random number
i = rnd.Next(lines.Count);
//Close our StreamReader
reader.Close();
//Dispose of the instance
reader.Dispose();
//Now write out the random line to the TextBox
return lines[i].Trim();
}
else
{
//file doesn't exist so return nothing
return string.Empty;
}
}
catch (IOException ex)
{
MessageBox.Show("Error: " + ex.Message);
return string.Empty;
}
}
You should be able to use string.Split:
string line = GetRandomLine(file);
string[] parts = line.Split(':');
string user = parts[0];
string pass = parts[1];
That being said, you may also want to add error checking (ie: make sure parts has 2 elements, etc).
This is much cleaner, and handles cases where the password might contain ':'s
Of course, I would expect you to ensure that passwords are hashed rather than stored as plain text, and that hashed passwords don't contain any ':'s. But just in case they do, this is what would work:
Split() will cause other problems.
bool GetUsernamePassword(string line, ref string uname, ref string pwd)
{
int idx = line.IndexOf(':') ;
if (idx == -1)
return false;
uname = line.Substring(0, idx);
pwd = line.Substring(idx + 1);
return true;
}
Usage:
string username_password = "username:password";
string uname = String.Empty;
string pwd = String.Empty;
if (!GetUsernamePassword(username_password, ref uname, ref pwd))
{
// Handle error: incorrect format
}
Console.WriteLine("{0} {1} {2}", username_password, uname, pwd);
By the way, like all the other solutions before it, this won't work if the username contains ':'. But it will handle the case where the password contains ':'.
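For what it's worth, Split can give the same behaviour if it is told to return at most two parts, using the String.Split(char[], int) overload. A minimal sketch follows; Credentials and TrySplit are hypothetical names.

```csharp
using System;

public static class Credentials
{
    // Splits "username:password" on the first ':' only, so any further
    // ':' characters stay in the password. Returns false if there is no ':'.
    public static bool TrySplit(string line, out string user, out string password)
    {
        user = null;
        password = null;
        if (line == null)
            return false;

        // The count argument of 2 stops splitting after the first separator.
        string[] parts = line.Split(new[] { ':' }, 2);
        if (parts.Length != 2)
            return false;

        user = parts[0];
        password = parts[1];
        return true;
    }
}
```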
To split the string is simple:
string[] components = myUserAndPass.Split(':');
string userName = components[0];
string passWord = components[1];
Try to read the following stackoverflow pages:
C# Tokenizer - keeping the separators
Does C# have a String Tokenizer like Java's?
Use the Split() method.
For example, in this case
string[] info = lines[i].Split(':');
info[0] will have the username and info[1] will have the password.
Try something like this...
string []split = line.Split(':');
string username = split[0];
string pwd = split[1];
Reed Copsey gave a nice answer already, so instead of giving another solution, I'd just like to make one comment about your code. You can use the using statement to handle calling the StreamReader's Close and Dispose methods for you. This way, if an error happens, you don't have to worry that the stream is left open.
Changing your code slightly would make it look like:
//StreamReader to read our file
using(StreamReader reader = new StreamReader(file))
{
//Now we loop through each line of our text file
//adding each line to our list
while (!(reader.Peek() == -1))
lines.Add(reader.ReadLine());
//Now we need a random number
i = rnd.Next(lines.Count);
}
//Now write out the random line to the TextBox
return lines[i].Trim();
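If altering the method were an option, File.ReadAllLines would take care of opening and closing the file altogether. This is only a sketch of that idea, with a hypothetical class name RandomLines. It also guards against an empty file, where indexing lines[0] would otherwise throw.

```csharp
using System;
using System.IO;

public static class RandomLines
{
    // One shared Random instance, so calls made in quick succession
    // do not all start from the same seed.
    private static readonly Random Rng = new Random();

    // Returns a random trimmed line from the file, or an empty string
    // if the file is missing or has no lines.
    public static string GetRandomLine(string file)
    {
        if (!File.Exists(file))
            return string.Empty;

        string[] lines = File.ReadAllLines(file);
        if (lines.Length == 0)
            return string.Empty;

        return lines[Rng.Next(lines.Length)].Trim();
    }
}
```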
