Regex get group block with specific start and end each group - c#

If we had some string like :
----------DBVer=1
/*some sql script*/
----------DBVer=1
----------DBVer=2
/*some sql script*/
----------DBVer=2
----------DBVer=n
/*some sql script*/
----------DBVer=n
Can we extract scripts between first DBVer=1 and second DBVer=1 and so on... with regex?
I thing we must have some placehoder for regex, and tel regex engine if saw DBVer=digitA pick string until DBVer=digitA again if saw DBVer=digitB pick string until DBVer=digitB and so on...
Can we implement this with regex and if we can how?

Yes, using backreferences and lookarounds, you can capture the scripts:
var pattern = #"(?<=(?<m>-{10}DBVer=\d+)\r?\n).*(?=\r?\n\k<m>)";
var scripts = Regex.Matches(input, pattern, RegexOptions.Singleline)
.Cast<Match>()
.Select(m => m.Value);
Here, we capture the m (marker) group with (?<m>-{10}DBVer=\d+) and reuse the m value later in the regex with \k<m> to match against the end marker.
In order for .* to match newline chars, it is necessary to turn on Singleline mode. This, in turn, means we have to be specific about our newlines. In Singleline mode, these can be accounted for in a non-platform specific way with \r?\n.

Try code below. Not RegEx but works very well.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;
namespace ConsoleApplication6
{
class Program
{
const string FILENAME = #"c:\temp\test.txt";
static void Main(string[] args)
{
Script.ReadScripts(FILENAME);
}
}
public class Script
{
enum State
{
Get_Script,
Read_Script
}
public static List<Script> scripts = new List<Script>();
public int version { get; set; }
public string script { get; set; }
public static void ReadScripts(string filename)
{
string inputLine = "";
string pattern = "DBVer=(?'version'\\d+)";
State state = State.Get_Script;
StreamReader reader = new StreamReader(filename);
Script newScript = null;
while ((inputLine = reader.ReadLine()) != null)
{
inputLine = inputLine.Trim();
if (inputLine.Length > 0)
{
switch (state)
{
case State.Get_Script :
if(inputLine.StartsWith("-----"))
{
newScript = new Script();
scripts.Add(newScript);
string version =
Regex.Match(inputLine, pattern).Groups["version"].Value;
newScript.version = int.Parse(version);
newScript.script = "";
state = State.Read_Script;
}
break;
case State.Read_Script :
if (inputLine.StartsWith("-----"))
{
state = State.Get_Script;
}
else
{
if (newScript.script.Length == 0)
{
newScript.script = inputLine;
}
else
{
newScript.script += "\n" + inputLine;
}
}
break;
}
}
}
}
}
}

Related

How To read Block of text in a text file

I have a file in which i have to read text between startscriptexpression$ and Finish scriptExpression$, and also read between startupdatedescription$ and startupdatedescription$[
The problem is that i want to re write the code in a cleaner format.
My Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace Vesrion
{
class Program
{
static void Main(string[] args)
{
string path = #"C:\Users\Development\Desktop\Read\Test.txt";
using (var reader = new StreamReader(path))
{
var textInBetween = new List<string>();
var ListOFDescription = new List<string>();
string NewString = "";
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
//Reads First line,
switch (line)
{
case "StartScriptExpression$":
continue;
case "FinishScriptExpression$":
if (line.Contains("FinishScriptExpression$"))
{
line = "";
}
string Something = string.Join("", textInBetween);
textInBetween = line.Split(',').ToList();
string[] lines = Something.Split(
new string[] { Environment.NewLine },
StringSplitOptions.None);
foreach (var S in lines)
{
ListOFDescription.Add(S);
Console.WriteLine(S);
}
NewString += ListOFDescription;
break;
case "StartUpdateDescription$":
//Console.WriteLine(Environment.NewLine);
continue;
case "FinishUpdateDescription$":
// Console.WriteLine(Environment.NewLine);
continue;
default:
textInBetween.Add(line);
//Console.WriteLine(line);
break;
}
}
}
}
}
}
Text inside start and finish expression must be in a list of string array.
text inside startupdatedescription and finishupdatedescription must be in a string.
.
One way to do it is using regular expression https://dotnetfiddle.net/pxBAMv

How do I parse and find a string in a C# script?

I've been trying to parse and search for a specific word in a big string, but I can't seem to be able to figure it out. I have created a script that connects a Twitch Channel's chat into unity.
An example of a message would be:
"#badge-info=subscriber/4;badges=moderator/1,subscriber/3,bits/1;bits=1;color=;display-name=TwitchUser1234;emotes=;flags=;id=da6ec4c6-af61-4346-abc-123456789;mod=1;room-id=12345678;subscriber=1;tmi-sent-ts=160987654321;turbo=0;user-id=123456789;user-type=mod :TwitchUser1234#TwitchUser1234.tmi.twitch.tv PRIVMSG #thechannelyouarewatching :PogChamp1 Another Test Bit"
I tried parsing and searching for the string 'bits' the message by doing:
private void GameInputs(string ChatInputs)
{
string Search;
Search = ChatInputs.Split(";", "=");
if(string "bits" in Search)
{
print("I made it here.");
}
}
I'm at a complete loss and have no idea how to do this. Any help is appreciated.
If my full code is needed it is:
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using System;
using System.ComponentModel;
using System.Net.Sockets;
using System.IO;
public class TwitchChat : MonoBehaviour
{
private TcpClient twitchClient;
private StreamReader reader;
private StreamWriter writer;
public string username, password, channelName; // http://twitchapps.com/tmi
// Start is called before the first frame update
void Start()
{
Connect();
}
void Update()
{
if(!twitchClient.Connected)
{
Connect();
}
ReadChat();
}
private void Connect()
{
twitchClient = new TcpClient("irc.chat.twitch.tv", 6667);
reader = new StreamReader(twitchClient.GetStream());
writer = new StreamWriter(twitchClient.GetStream());
writer.WriteLine("PASS " + password);
writer.WriteLine("NICK " + username);
writer.WriteLine("USER " + username + " 8 * :" + username);
writer.WriteLine("JOIN #" + channelName);
writer.WriteLine("CAP REQ :twitch.tv/tags");
writer.Flush();
}
private void ReadChat()
{
if (twitchClient.Available > 0)
{
var message = reader.ReadLine();
print(message);
GameInputs(message);
}
}
private void GameInputs(string ChatInputs)
{
string Search;
Search = ChatInputs.Split(";", "=");
if(string "bits" in Search)
{
print("I made it here.");
}
}
}
If you want to pull the value of "bits=xx" out, this would do it:
var b = value.Split(';').FirstOrDefault(s => s.StartsWith("bits="))?[5..];
b will be null if "bits=" is not present
If you're going to parse a lot of values out of this string consider turning it into a dictionary:
var c = new []{'='};
var d = value.Split(';').ToDictionary(s => s.Split(c,2)[0], s => s.Split(c,2)[1]);
It's slightly inefficient to split twice, if it bothers you, you can sub string:
value.Split(';').ToDictionary(s => s[..s.IndexOf('=')], s => s[s.IndexOf('=')+1..]);
This gives a dictionary of string, so you can do like:
if(d.ContainsKey("bits")){
var bits = int.Parse(d["bits"]);
...
String has a method Contains(string) that does the job:
if (ChantInputs.Contains("bits")
{
print("I made it here.");
}
You can try below.
private void GameInputs(string ChatInputs)
{
string[] Search = ChatInputs.Split(new char[] { ';', '=' });
foreach(string s in Search)
{
if(s == "bits")
{
print("I made it here.");
}
}
}
Below is the working code.
using System;
namespace ConsoleApp3
{
class Program
{
static void Main(string[] args)
{
string str = "one;two;test;three;test+test";
string[] strs = str.Split(new char[] { ';', '+' });
foreach(string s in strs)
{
if(s == "test")
{
Console.WriteLine(s);
}
}
Console.ReadLine();
}
}
}

How to select precisely a code block in String (Methods, Functions)

I have a code in String and I need to select only the method that I want, for example:
"public void GetEntities(List<PublisherBO> _publishers, WebResourceBO _webResourceBO, SqlBO _sqlBO) {
List<EntityBO> entities = new List<EntityBO>();
List<EntityMetadata> results = entityDAO.GetAllEntities();
EntitiesMetadata = results;
entities = ConvertEntities(results);
Data = entities;
}
public void ResetConnection() {
sqlOpen = false;
dynamicsOpen = false;
if (Service != null) {
if (Service.OrganizationWebProxyClient != null)
Service.OrganizationWebProxyClient.Close();
Service.Dispose();
}
}"
And I want to select only this:
"public void ResetConnection() {
sqlOpen = false;
dynamicsOpen = false;
if (Service != null) {
if (Service.OrganizationWebProxyClient != null)
Service.OrganizationWebProxyClient.Close();
Service.Dispose();
}
}"
I know that I have to use IndexOf Methods, the complicated is consider the { } because I know that I need to finish the select on code in } but how to consider the others { } inside this block of code and precisely select all method.
And a note: I cannot divide this in lines, because some codes have only 1 line (javascript)
This is a regex pattern that will work to match code blocks. It can fail if there are unclosed {} inside of strings.
(.*?){(?:[^{}]+|{(?<n>)|}(?<-n>))+(?(n)(?!))*}
Regex Storm
Here is a working example in C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace ConsoleApp1
{
class Program
{
static void Main (string[] args)
{
var text = #"public void GetEntities(List<PublisherBO> _publishers, WebResourceBO _webResourceBO, SqlBO _sqlBO) {
List<EntityBO> entities = new List<EntityBO>();
List<EntityMetadata> results = entityDAO.GetAllEntities();
EntitiesMetadata = results;
entities = ConvertEntities(results);
Data = entities;
}
public void ResetConnection() {
sqlOpen = false;
dynamicsOpen = false;
if (Service != null) {
if (Service.OrganizationWebProxyClient != null)
Service.OrganizationWebProxyClient.Close();
Service.Dispose();
}
}";
var matches = Regex.Matches(text, "(.*?){(?:[^{}]+|{(?<n>)|}(?<-n>))+(?(n)(?!))*}");
if (matches.Count > 0)
{
}
}
}
}
Here is a link to regex storm showing the original pattern matching C# code as well as JS code with various syntax
Regex Storm with C# and JS

Searching Specific Data From a File

I have a File having text and few numbers.I just want to extract numbers from it.How do I go about it ???
I tried using all that split thing but no luck so far.
My File is like this:
AT+CMGL="ALL"
+CMGL: 5566,"REC READ","Ufone"
Dear customer, your DAY_BUCKET subscription will expire on 02/05/09
+CMGL: 5565,"REC READ","+923466666666"
KINDLY TELL ME THE WAY TO EXTRACT NUMBERS LIKE +923466666666 from this File so I can put them into another File or textbox.
Thanks
Here's an example using the String.Split. The "number" contains a '+', so really it should be treated as a string not a number. I'm presuming it's a telephone number with the '+' potentially used for international calls? If it is a telephone number, you need to be careful of dashes, spaces in the number as well as extension numbers added to the end eg "+9234 666-66666 ext 235" and so on...
Anyway - hopefully the example is useful in getting to grips with Split.
The code include unit tests using NUnit v2.4.8
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using NUnit.Framework;
using System.Text.RegularExpressions;
namespace SO.NumberExtractor.Test
{
public class NumberExtracter
{
public List<string> ExtractNumbers(string lines)
{
List<string> numbers = new List<string>();
string[] seperator = { System.Environment.NewLine };
string[] seperatedLines = lines.Split(seperator, StringSplitOptions.RemoveEmptyEntries);
foreach (string line in seperatedLines)
{
string s = ExtractNumber(line);
numbers.Add(s);
}
return numbers;
}
public string ExtractNumber(string line)
{
string s = line.Split(',').Last<string>().Trim('"');
return s;
}
public string ExtractNumberWithoutLinq(string line)
{
string[] fields = line.Split(',');
string s = fields[fields.Length - 1];
s = s.Trim('"');
return s;
}
}
[TestFixture]
public class NumberExtracterTest
{
private readonly string LINE1 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666666\"";
private readonly string LINE2 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666667\"";
private readonly string LINE3 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666668\"";
[Test]
public void ExtractOneLineWithoutLinq()
{
string expected = "+923466666666";
NumberExtracter c = new NumberExtracter();
string result = c.ExtractNumberWithoutLinq(LINE1);
Assert.AreEqual(expected, result);
}
[Test]
public void ExtractOneLineUsingLinq()
{
string expected = "+923466666666";
NumberExtracter c = new NumberExtracter();
string result = c.ExtractNumber(LINE1);
Assert.AreEqual(expected, result);
}
[Test]
public void ExtractMultipleLines()
{
StringBuilder sb = new StringBuilder();
sb.AppendLine(LINE1);
sb.AppendLine(LINE2);
sb.AppendLine(LINE3);
NumberExtracter ne = new NumberExtracter();
List<string> extractedNumbers = ne.ExtractNumbers(sb.ToString());
string expectedFirst = "+923466666666";
string expectedSecond = "+923466666667";
string expectedThird = "+923466666668";
Assert.AreEqual(expectedFirst, extractedNumbers[0]);
Assert.AreEqual(expectedSecond, extractedNumbers[1]);
Assert.AreEqual(expectedThird, extractedNumbers[2]);
}
}
}
If the numbers are all at the end of the lines then you can use code like the following
foreach ( string line in File.ReadAllLines(#"c:\path\to\file.txt") ) {
Match result = Regex.Match(line, #"\+(\d+)""$");
if ( result.Success ) {
var number = result.Groups[1].Value;
// do what you want with the number
}
}
How large is the file? If the file is under a few megabytes in size I would recommend loading the file contents into a string and using a compiled regular expression to extract matches.
Here's a quick example:
Regex NumberExtractor = new Regex("[0-9]{7,16}",RegexOptions.Compiled);
/// <summary>
/// Extracts numbers between seven and sixteen digits long from the target file.
/// Example number to be extracted: +923466666666
/// </summary>
/// <param name="TargetFilePath"></param>
/// <returns>List of the matching numbers</returns>
private IEnumerable<ulong> ExtractLongNumbersFromFile(string TargetFilePath)
{
if (String.IsNullOrEmpty(TargetFilePath))
throw new ArgumentException("TargetFilePath is null or empty.", "TargetFilePath");
if (File.Exists(TargetFilePath) == false)
throw new Exception("Target file does not exist!");
FileStream TargetFileStream = null;
StreamReader TargetFileStreamReader = null;
string FileContents = "";
List<ulong> ReturnList = new List<ulong>();
try
{
TargetFileStream = new FileStream(TargetFilePath, FileMode.Open);
TargetFileStreamReader = new StreamReader(TargetFileStream);
FileContents = TargetFileStreamReader.ReadToEnd();
MatchCollection Matches = NumberExtractor.Matches(FileContents);
foreach (Match CurrentMatch in Matches) {
ReturnList.Add(System.Convert.ToUInt64(CurrentMatch.Value));
}
}
catch (Exception ex)
{
//Your logging, etc...
}
finally
{
if (TargetFileStream != null) {
TargetFileStream.Close();
TargetFileStream.Dispose();
}
if (TargetFileStreamReader != null)
{
TargetFileStreamReader.Dispose();
}
}
return (IEnumerable<ulong>)ReturnList;
}
Sample Usage:
List<ulong> Numbers = (List<ulong>)ExtractLongNumbersFromFile(#"v:\TestExtract.txt");

C# Sanitize File Name

I recently have been moving a bunch of MP3s from various locations into a repository. I had been constructing the new file names using the ID3 tags (thanks, TagLib-Sharp!), and I noticed that I was getting a System.NotSupportedException:
"The given path's format is not supported."
This was generated by either File.Copy() or Directory.CreateDirectory().
It didn't take long to realize that my file names needed to be sanitized. So I did the obvious thing:
public static string SanitizePath_(string path, char replaceChar)
{
string dir = Path.GetDirectoryName(path);
foreach (char c in Path.GetInvalidPathChars())
dir = dir.Replace(c, replaceChar);
string name = Path.GetFileName(path);
foreach (char c in Path.GetInvalidFileNameChars())
name = name.Replace(c, replaceChar);
return dir + name;
}
To my surprise, I continued to get exceptions. It turned out that ':' is not in the set of Path.GetInvalidPathChars(), because it is valid in a path root. I suppose that makes sense - but this has to be a pretty common problem. Does anyone have some short code that sanitizes a path? The most thorough I've come up with this, but it feels like it is probably overkill.
// replaces invalid characters with replaceChar
public static string SanitizePath(string path, char replaceChar)
{
// construct a list of characters that can't show up in filenames.
// need to do this because ":" is not in InvalidPathChars
if (_BadChars == null)
{
_BadChars = new List<char>(Path.GetInvalidFileNameChars());
_BadChars.AddRange(Path.GetInvalidPathChars());
_BadChars = Utility.GetUnique<char>(_BadChars);
}
// remove root
string root = Path.GetPathRoot(path);
path = path.Remove(0, root.Length);
// split on the directory separator character. Need to do this
// because the separator is not valid in a filename.
List<string> parts = new List<string>(path.Split(new char[]{Path.DirectorySeparatorChar}));
// check each part to make sure it is valid.
for (int i = 0; i < parts.Count; i++)
{
string part = parts[i];
foreach (char c in _BadChars)
{
part = part.Replace(c, replaceChar);
}
parts[i] = part;
}
return root + Utility.Join(parts, Path.DirectorySeparatorChar.ToString());
}
Any improvements to make this function faster and less baroque would be much appreciated.
To clean up a file name you could do this
private static string MakeValidFileName( string name )
{
string invalidChars = System.Text.RegularExpressions.Regex.Escape( new string( System.IO.Path.GetInvalidFileNameChars() ) );
string invalidRegStr = string.Format( #"([{0}]*\.+$)|([{0}]+)", invalidChars );
return System.Text.RegularExpressions.Regex.Replace( name, invalidRegStr, "_" );
}
A shorter solution:
var invalids = System.IO.Path.GetInvalidFileNameChars();
var newName = String.Join("_", origFileName.Split(invalids, StringSplitOptions.RemoveEmptyEntries) ).TrimEnd('.');
Based on Andre's excellent answer but taking into account Spud's comment on reserved words, I made this version:
/// <summary>
/// Strip illegal chars and reserved words from a candidate filename (should not include the directory path)
/// </summary>
/// <remarks>
/// http://stackoverflow.com/questions/309485/c-sharp-sanitize-file-name
/// </remarks>
public static string CoerceValidFileName(string filename)
{
var invalidChars = Regex.Escape(new string(Path.GetInvalidFileNameChars()));
var invalidReStr = string.Format(#"[{0}]+", invalidChars);
var reservedWords = new []
{
"CON", "PRN", "AUX", "CLOCK$", "NUL", "COM0", "COM1", "COM2", "COM3", "COM4",
"COM5", "COM6", "COM7", "COM8", "COM9", "LPT0", "LPT1", "LPT2", "LPT3", "LPT4",
"LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
};
var sanitisedNamePart = Regex.Replace(filename, invalidReStr, "_");
foreach (var reservedWord in reservedWords)
{
var reservedWordPattern = string.Format("^{0}\\.", reservedWord);
sanitisedNamePart = Regex.Replace(sanitisedNamePart, reservedWordPattern, "_reservedWord_.", RegexOptions.IgnoreCase);
}
return sanitisedNamePart;
}
And these are my unit tests
[Test]
public void CoerceValidFileName_SimpleValid()
{
var filename = #"thisIsValid.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual(filename, result);
}
[Test]
public void CoerceValidFileName_SimpleInvalid()
{
var filename = #"thisIsNotValid\3\\_3.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("thisIsNotValid_3__3.txt", result);
}
[Test]
public void CoerceValidFileName_InvalidExtension()
{
var filename = #"thisIsNotValid.t\xt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("thisIsNotValid.t_xt", result);
}
[Test]
public void CoerceValidFileName_KeywordInvalid()
{
var filename = "aUx.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("_reservedWord_.txt", result);
}
[Test]
public void CoerceValidFileName_KeywordValid()
{
var filename = "auxillary.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("auxillary.txt", result);
}
string clean = String.Concat(dirty.Split(Path.GetInvalidFileNameChars()));
there are a lot of working solutions here. just for the sake of completeness, here's an approach that doesn't use regex, but uses LINQ:
var invalids = Path.GetInvalidFileNameChars();
filename = invalids.Aggregate(filename, (current, c) => current.Replace(c, '_'));
Also, it's a very short solution ;)
I'm using the System.IO.Path.GetInvalidFileNameChars() method to check invalid characters and I've got no problems.
I'm using the following code:
foreach( char invalidchar in System.IO.Path.GetInvalidFileNameChars())
{
filename = filename.Replace(invalidchar, '_');
}
I wanted to retain the characters in some way, not just simply replace the character with an underscore.
One way I thought was to replace the characters with similar looking characters which are (in my situation), unlikely to be used as regular characters. So I took the list of invalid characters and found look-a-likes.
The following are functions to encode and decode with the look-a-likes.
This code does not include a complete listing for all System.IO.Path.GetInvalidFileNameChars() characters. So it is up to you to extend or utilize the underscore replacement for any remaining characters.
private static Dictionary<string, string> EncodeMapping()
{
//-- Following characters are invalid for windows file and folder names.
//-- \/:*?"<>|
Dictionary<string, string> dic = new Dictionary<string, string>();
dic.Add(#"\", "Ì"); // U+OOCC
dic.Add("/", "Í"); // U+OOCD
dic.Add(":", "¦"); // U+00A6
dic.Add("*", "¤"); // U+00A4
dic.Add("?", "¿"); // U+00BF
dic.Add(#"""", "ˮ"); // U+02EE
dic.Add("<", "«"); // U+00AB
dic.Add(">", "»"); // U+00BB
dic.Add("|", "│"); // U+2502
return dic;
}
public static string Escape(string name)
{
foreach (KeyValuePair<string, string> replace in EncodeMapping())
{
name = name.Replace(replace.Key, replace.Value);
}
//-- handle dot at the end
if (name.EndsWith(".")) name = name.CropRight(1) + "°";
return name;
}
public static string UnEscape(string name)
{
foreach (KeyValuePair<string, string> replace in EncodeMapping())
{
name = name.Replace(replace.Value, replace.Key);
}
//-- handle dot at the end
if (name.EndsWith("°")) name = name.CropRight(1) + ".";
return name;
}
You can select your own look-a-likes. I used the Character Map app in windows to select mine %windir%\system32\charmap.exe
As I make adjustments through discovery, I will update this code.
I think the problem is that you first call Path.GetDirectoryName on the bad string. If this has non-filename characters in it, .Net can't tell which parts of the string are directories and throws. You have to do string comparisons.
Assuming it's only the filename that is bad, not the entire path, try this:
public static string SanitizePath(string path, char replaceChar)
{
int filenamePos = path.LastIndexOf(Path.DirectorySeparatorChar) + 1;
var sb = new System.Text.StringBuilder();
sb.Append(path.Substring(0, filenamePos));
for (int i = filenamePos; i < path.Length; i++)
{
char filenameChar = path[i];
foreach (char c in Path.GetInvalidFileNameChars())
if (filenameChar.Equals(c))
{
filenameChar = replaceChar;
break;
}
sb.Append(filenameChar);
}
return sb.ToString();
}
I have had success with this in the past.
Nice, short and static :-)
public static string returnSafeString(string s)
{
foreach (char character in Path.GetInvalidFileNameChars())
{
s = s.Replace(character.ToString(),string.Empty);
}
foreach (char character in Path.GetInvalidPathChars())
{
s = s.Replace(character.ToString(), string.Empty);
}
return (s);
}
Here's an efficient lazy loading extension method based on Andre's code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace LT
{
public static class Utility
{
static string invalidRegStr;
public static string MakeValidFileName(this string name)
{
if (invalidRegStr == null)
{
var invalidChars = System.Text.RegularExpressions.Regex.Escape(new string(System.IO.Path.GetInvalidFileNameChars()));
invalidRegStr = string.Format(#"([{0}]*\.+$)|([{0}]+)", invalidChars);
}
return System.Text.RegularExpressions.Regex.Replace(name, invalidRegStr, "_");
}
}
}
Your code would be cleaner if you appended the directory and filename together and sanitized that rather than sanitizing them independently. As for sanitizing away the :, just take the 2nd character in the string. If it is equal to "replacechar", replace it with a colon. Since this app is for your own use, such a solution should be perfectly sufficient.
using System;
using System.IO;
using System.Linq;
using System.Text;
public class Program
{
public static void Main()
{
try
{
var badString = "ABC\\DEF/GHI<JKL>MNO:PQR\"STU\tVWX|YZA*BCD?EFG";
Console.WriteLine(badString);
Console.WriteLine(SanitizeFileName(badString, '.'));
Console.WriteLine(SanitizeFileName(badString));
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
}
}
private static string SanitizeFileName(string fileName, char? replacement = null)
{
if (fileName == null) { return null; }
if (fileName.Length == 0) { return ""; }
var sb = new StringBuilder();
var badChars = Path.GetInvalidFileNameChars().ToList();
foreach (var #char in fileName)
{
if (badChars.Contains(#char))
{
if (replacement.HasValue)
{
sb.Append(replacement.Value);
}
continue;
}
sb.Append(#char);
}
return sb.ToString();
}
}
Based #fiat's and #Andre's approach, I'd like to share my solution too.
Main difference:
its an extension method
regex is compiled at first use to save some time with a lot executions
reserved words are preserved
public static class StringPathExtensions
{
private static Regex _invalidPathPartsRegex;
static StringPathExtensions()
{
var invalidReg = System.Text.RegularExpressions.Regex.Escape(new string(Path.GetInvalidFileNameChars()));
_invalidPathPartsRegex = new Regex($"(?<reserved>^(CON|PRN|AUX|CLOCK\\$|NUL|COM0|COM1|COM2|COM3|COM4|COM5|COM6|COM7|COM8|COM9|LPT0|LPT1|LPT2|LPT3|LPT4|LPT5|LPT6|LPT7|LPT8|LPT9))|(?<invalid>[{invalidReg}:]+|\\.$)", RegexOptions.Compiled);
}
public static string SanitizeFileName(this string path)
{
return _invalidPathPartsRegex.Replace(path, m =>
{
if (!string.IsNullOrWhiteSpace(m.Groups["reserved"].Value))
return string.Concat("_", m.Groups["reserved"].Value);
return "_";
});
}
}

Categories

Resources