Running a RegEx on an ITextViewLine (Visual Studio Extension)


I want to run a RegEx search to find all occurrences of certain keywords in the editor window, then draw adornments on them and add tags to them.
Is there any way to run a RegEx on an ITextViewLine?
This is what my calling function looks like:
private void OnLayoutChanged(object sender, TextViewLayoutChangedEventArgs e)
{
foreach (ITextViewLine line in e.NewOrReformattedLines)
{
this.CreateVisuals(line);
}
}
private void CreateVisuals(ITextViewLine line)
{
IWpfTextViewLineCollection textViewLines = _wpfTextView.TextViewLines;
// Run RegEx match here and do some stuff for all matches
}
As suggested by @stribizhev, I tried using FormattedSpan as follows:
private void CreateVisuals(ITextViewLine line)
{
var textViewLines = _wpfTextView.TextViewLines;
var snapshot = textViewLines.FormattedSpan;
var text = snapshot.ToString();
var todoRegex = new Regex(@"\/\/\s*TODO\b");
var match = todoRegex.Match(text);
if (match.Success)
{
int matchStart = line.Start.Position + match.Index;
var span = new SnapshotSpan(_wpfTextView.TextSnapshot, Span.FromBounds(matchStart, matchStart + match.Length));
DrawAdornment(textViewLines, span);
}
}
But this causes a NullReferenceException at the call to DrawAdornment, telling me that span is unset.
Moreover, by putting breakpoints on all lines in the CreateVisuals function, I saw that the highlighting only starts when the line containing the TODO scrolls out of view or becomes the first line in the viewport.
The input I tried was:
using System;
public class Class1
{
public Class1()
{
// TODO: It's a good thing to have todos
}
}
The code is able to put adornments sometimes but they are shifted slightly to the right and appear on three different lines.

I finally got it to work. There are two ways to do it.
My way (easier):
private void CreateVisuals()
{
var textViewLines = _wpfTextView.TextViewLines;
var text = textViewLines.FormattedSpan.Snapshot.GetText();
var todoRegex = new Regex(@"\/\/\s*TODO\b");
var match = todoRegex.Match(text);
while (match.Success)
{
var matchStart = match.Index;
var span = new SnapshotSpan(_wpfTextView.TextSnapshot, Span.FromBounds(matchStart, matchStart + match.Length));
DrawAdornment(textViewLines, span);
match = match.NextMatch();
}
}
The tough(er) way: (From this article)
/// <summary>
/// This will get the text of the ITextView line as it appears in the actual user editable
/// document.
/// Jared Parsons: https://gist.github.com/4320643
/// </summary>
public static bool TryGetText(IWpfTextView textView, ITextViewLine textViewLine, out string text)
{
var extent = textViewLine.Extent;
var bufferGraph = textView.BufferGraph;
try
{
var collection = bufferGraph.MapDownToSnapshot(extent, SpanTrackingMode.EdgeInclusive, textView.TextSnapshot);
var span = new SnapshotSpan(collection[0].Start, collection[collection.Count - 1].End);
//text = span.ToString();
text = span.GetText();
return true;
}
catch
{
text = null;
return false;
}
}
Regex todoLineRegex = new Regex(@"\/\/\s*TODO\b");
private void CreateVisuals(ITextViewLine line)
{
IWpfTextViewLineCollection textViewLines = _view.TextViewLines;
string text = null;
if (TryGetText(_view, line, out text))
{
var match = todoLineRegex.Match(text);
if (match.Success)
{
// match.Index is relative to the line text, so offset it by the line start
int matchStart = line.Start.Position + match.Index;
var span = new SnapshotSpan(_view.TextSnapshot, Span.FromBounds(matchStart, matchStart + match.Length));
DrawAdornment(textViewLines, span);
}
}
}
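The offset arithmetic in the per-line version can be checked without the editor. The following is a stand-alone sketch, not part of the original answer: `LineMatchOffsets.Find` and its parameters are hypothetical names, a plain string stands in for the line text, and `lineStart` plays the role of `line.Start.Position`. It uses `Regex.Matches` so that every occurrence on a line is reported rather than only the first.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineMatchOffsets
{
    private static readonly Regex TodoRegex = new Regex(@"//\s*TODO\b");

    // Returns (start, length) pairs in snapshot coordinates for every match on one line.
    // lineStart is assumed to be the snapshot position of the first character of lineText.
    public static List<Tuple<int, int>> Find(string lineText, int lineStart)
    {
        var result = new List<Tuple<int, int>>();
        foreach (Match match in TodoRegex.Matches(lineText))
        {
            // match.Index is relative to lineText, so shift it by the line start.
            result.Add(Tuple.Create(lineStart + match.Index, match.Length));
        }
        return result;
    }
}
```

For a line `"    // TODO: fix"` that starts at position 100, this yields one pair with start 104 and length 7, which is what `Span.FromBounds(matchStart, matchStart + match.Length)` expects.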

Related

EnvDTE - Get the Find text results from 'FindResult 1" window in Visual Studio with C#

According to this post, you can drive Visual Studio's Find feature from code.
I updated Asif Iqbal K's code from the article slightly to eliminate a build error.
public const string vsWindowKindFindResults1 = "{0F887920-C2B6-11D2-9375-0080C747D9A0}";
public string FindInFiles(string searchText)
{
EnvDTE80.DTE2 dte;
dte = (EnvDTE80.DTE2)System.Runtime.InteropServices.Marshal.GetActiveObject("VisualStudio.DTE");
dte.MainWindow.Activate();
EnvDTE.Find find = dte.Find;
find.Action = EnvDTE.vsFindAction.vsFindActionFindAll;
find.FindWhat = searchText;
find.MatchWholeWord = false;
find.ResultsLocation = EnvDTE.vsFindResultsLocation.vsFindResults1;
find.Target = EnvDTE.vsFindTarget.vsFindTargetSolution;
find.PatternSyntax = EnvDTE.vsFindPatternSyntax.vsFindPatternSyntaxRegExpr;
find.SearchSubfolders = true;
var x = dte.Find.FindWhat;
EnvDTE.vsFindResult result = find.Execute();
var findWindow = dte.Windows.Item(vsWindowKindFindResults1);
string data = "";
System.Threading.Thread.Sleep(5000);//Comment out this code to see the problem, this line of code is not the solution though.
if (result == EnvDTE.vsFindResult.vsFindResultFound)
{
var selection = findWindow.Selection as EnvDTE.TextSelection;
selection.SelectAll();
data = selection.Text;
}
return data;
}
As far as I can see, the problem is that the function returns the string (string data) too early, so it cannot get all the text from the results window.
The code comes close to getting the find results. The one remaining puzzle is how to check that the find operation has completed before reading the text.
So the question is: what code should replace
System.Threading.Thread.Sleep(5000);
so that the function FindInFiles() can get all the text of the 'FindResult 1" window?
Thanks for reading.

Here is the solution
EnvDTE80.DTE2 s_dte;
EnvDTE.FindEvents s_findEvents;
public const string vsWindowKindFindResults1 = "{0F887920-C2B6-11D2-9375-0080C747D9A0}";
public frmFindHelper()
{
InitializeComponent();
s_dte = (EnvDTE80.DTE2)System.Runtime.InteropServices.Marshal.GetActiveObject("VisualStudio.DTE");
s_dte.MainWindow.Activate();
s_findEvents = s_dte.Events.FindEvents;
s_findEvents.FindDone += new EnvDTE._dispFindEvents_FindDoneEventHandler(OnFindDone);
}
private void OnFindDone(EnvDTE.vsFindResult result, bool cancelled)
{
if (result == EnvDTE.vsFindResult.vsFindResultFound)
{
var findWindow = s_dte.Windows.Item(vsWindowKindFindResults1);
string data = "";
var selection = findWindow.Selection as EnvDTE.TextSelection;
selection.SelectAll();
data = selection.Text;
MessageBox.Show("Done!");
}
}
private void btnFind_Click(object sender, EventArgs e)
{
EnvDTE.Find find = s_dte.Find;
find.Action = EnvDTE.vsFindAction.vsFindActionFindAll;
find.FindWhat = txtSearch.Text;
find.MatchWholeWord = false;
find.ResultsLocation = EnvDTE.vsFindResultsLocation.vsFindResults1;
find.Target = EnvDTE.vsFindTarget.vsFindTargetSolution;
find.PatternSyntax = EnvDTE.vsFindPatternSyntax.vsFindPatternSyntaxRegExpr;
find.SearchSubfolders = true;
var x = s_dte.Find.FindWhat;
EnvDTE.vsFindResult result = find.Execute();
}
Thanks to Ed Dore from this post
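If the caller wants to wait for the result rather than react inside the handler, the same event can be wrapped in a task. The sketch below is an illustration only and does not touch EnvDTE: `FakeFinder` is a hypothetical stand-in for an object that reports completion through an event in the way `FindEvents.FindDone` does, and `FindAwaiter.RunAsync` is an assumed helper name. The point is the shape: subscribe first, start the work, complete a `TaskCompletionSource` from the handler.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for an object that signals completion through an event.
public class FakeFinder
{
    public event Action<bool> FindDone;   // argument: whether anything was found

    public void Execute(bool found)
    {
        // A real search would finish later, on its own schedule.
        Task.Run(() => FindDone?.Invoke(found));
    }
}

public static class FindAwaiter
{
    // Subscribes before starting the search so the completion cannot be missed.
    public static Task<bool> RunAsync(FakeFinder finder, bool found)
    {
        var tcs = new TaskCompletionSource<bool>();
        Action<bool> handler = null;
        handler = result =>
        {
            finder.FindDone -= handler;   // one-shot subscription
            tcs.TrySetResult(result);
        };
        finder.FindDone += handler;
        finder.Execute(found);
        return tcs.Task;
    }
}
```

With the real API, the handler would have the `FindDone` signature shown above and would read the results window before completing the task. Which thread the event arrives on is not covered by this sketch.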

Profile Language Syntax Highlighting C# Extension for Visual Studio

I have a Visual Studio add-in for Java-like profile files. A sample file can be found here.
The problem occurs when the \ character is inserted at the end of a line. When a backslash is used, the next line is interpreted as a continuation: as a value (or as a key, or as a key and a value). A simple example is:
1. part1_key\
2. part2_key\
3. part3_key = value
4. key = part1_value\
5. part2_value
The syntax highlighter works when the file is loaded, but when a line is modified, only that line is re-evaluated and highlighted. So when the \ is inserted on line 4, for example, line 4 is highlighted but line 5 is not updated. Line 5 is updated only when it is itself modified (by adding or removing a character or a space).
I've created an ITaggerProvider which creates an ITagger whose tag type is Key, Value or Comment. The ITagger class is as follows:
internal sealed class PropertiesTokenTagger : ITagger<PropertiesTokenTag> {
private readonly Regex keyValuePattern = new Regex(@"(?<!^\s*|\\)([ \t]*[=:][ \t]*|[ \t]+)");
private readonly Regex separatorPattern = new Regex(@"^([=:]|[ \t]+)");
private readonly Regex commentPattern = new Regex(@"^\s*[#!]");
private readonly Regex escapedLineEndPattern = new Regex(@"\\$");
public event EventHandler<SnapshotSpanEventArgs> TagsChanged {
add { }
remove { }
}
public IEnumerable<ITagSpan<PropertiesTokenTag>> GetTags(NormalizedSnapshotSpanCollection spans) {
// sadly `spans` gets one line at a time, so previouslyEscapedValue will not get the chance to be used
foreach (var curSpan in spans) {
var containingLine = curSpan.Start.GetContainingLine();
var lineStartLoc = containingLine.Start.Position;
var lineText = containingLine.GetText();
var previousIsNotComment = false;
var previousLine = curSpan.Snapshot.Lines.LastOrDefault(l => l.End <= curSpan.Start);
if (previousLine != null && escapedLineEndPattern.IsMatch(previousLine.GetText())) {
var previousToken = GetTags(new NormalizedSnapshotSpanCollection(previousLine.Extent)).ToList();
if (previousToken.Count > 0) {
var propertiesTokenTypes = previousToken.Last().Tag.Type;
if (propertiesTokenTypes == PropertiesValue) {
var valueSpan = new SnapshotSpan(curSpan.Snapshot, new Span(lineStartLoc, lineText.Length));
yield return new TagSpan<PropertiesTokenTag>(valueSpan, new PropertiesTokenTag(PropertiesValue));
continue;
}
if (propertiesTokenTypes == PropertiesKey && separatorPattern.IsMatch(lineText)) {
var valueSpan = new SnapshotSpan(curSpan.Snapshot, new Span(lineStartLoc, lineText.Length));
yield return new TagSpan<PropertiesTokenTag>(valueSpan, new PropertiesTokenTag(PropertiesValue));
continue;
}
previousIsNotComment = propertiesTokenTypes != PropertiesComment;
}
}
if (commentPattern.IsMatch(lineText) && !previousIsNotComment) {
var commentSpan = new SnapshotSpan(curSpan.Snapshot, new Span(lineStartLoc, lineText.Length));
yield return new TagSpan<PropertiesTokenTag>(commentSpan, new PropertiesTokenTag(PropertiesComment));
continue;
}
if (keyValuePattern.IsMatch(lineText)) {
var splitPosition = keyValuePattern.Split(lineText)[0].Length;
var keySpan = new SnapshotSpan(curSpan.Snapshot, new Span(lineStartLoc, splitPosition));
yield return new TagSpan<PropertiesTokenTag>(keySpan, new PropertiesTokenTag(PropertiesKey));
var valueSpan = new SnapshotSpan(curSpan.Snapshot, new Span(lineStartLoc + splitPosition + 1, lineText.Length - splitPosition - 1));
yield return new TagSpan<PropertiesTokenTag>(valueSpan, new PropertiesTokenTag(PropertiesValue));
} else {
var keySpan = new SnapshotSpan(curSpan.Snapshot, new Span(lineStartLoc, lineText.Length));
yield return new TagSpan<PropertiesTokenTag>(keySpan, new PropertiesTokenTag(PropertiesKey));
}
}
}
From what I can see, the problem is that the GetTags method is called by the editor, and I cannot call it manually to update other lines. The method returns the TagSpans that intersect the NormalizedSnapshotSpanCollection parameter, so if I return a tag for a line other than the current one, that tag is not processed.
LE: I've solved it. The solution was to raise the TagsChanged event from outside the GetTags method, and also in the aggregator from ITagger.
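Raising TagsChanged still leaves the question of which lines to report. One possible way to decide, sketched here with plain strings instead of snapshot lines, is to expand the edited line to the whole backslash-continued logical line. `ContinuationRange.Expand` is a hypothetical helper, not part of the question's code, and it uses the same simple "ends with a backslash" test as `escapedLineEndPattern`.

```csharp
using System;
using System.Collections.Generic;

public static class ContinuationRange
{
    // True when the line ends with a backslash, i.e. the logical line continues.
    private static bool Continues(string line)
    {
        return line.EndsWith("\\", StringComparison.Ordinal);
    }

    // Returns the first and last (0-based) line index of the logical line
    // that contains changedLine.
    public static Tuple<int, int> Expand(IList<string> lines, int changedLine)
    {
        int first = changedLine;
        while (first > 0 && Continues(lines[first - 1])) first--;

        int last = changedLine;
        while (last < lines.Count - 1 && Continues(lines[last])) last++;

        return Tuple.Create(first, last);
    }
}
```

For the five-line example above, an edit on line 4 (index 3) expands to indices 3 through 4, so the span passed to TagsChanged would cover lines 4 and 5. An edit that removes a trailing backslash also changes how the next line should be tagged, so a real tagger would probably need to include the following line as well.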

trouble comparing strings in wp7 application

Here is a sample of my code.
Here I receive a string variable from another page.
protected override void OnNavigatedTo(System.Windows.Navigation.NavigationEventArgs e)
{
base.OnNavigatedTo(e);
string newparameter = this.NavigationContext.QueryString["search"];
weareusingxml();
displayResults(newparameter);
}
private void displayResults(string search)
{
bool flag = false;
try
{
using (IsolatedStorageFile myIsolatedStorage = IsolatedStorageFile.GetUserStoreForApplication())
{
using (IsolatedStorageFileStream stream = myIsolatedStorage.OpenFile("People.xml", FileMode.Open))
{
XmlSerializer serializer = new XmlSerializer(typeof(List<Person>));
List<Person> data = (List<Person>)serializer.Deserialize(stream);
List<Result> results = new List<Result>();
for (int i = 0; i < data.Count; i++)
{
string temp1 = data[i].name.ToUpper();
string temp2 = "*" + search.ToUpper() + "*";
if (temp1 == temp2)
{
results.Add(new Result() {name = data[i].name, gender = data[i].gender, pronouciation = data[i].pronouciation, definition = data[i].definition, audio = data[i].audio });
flag = true;
}
}
this.listBox.ItemsSource = results;
}
}
}
catch
{
textBlock1.Text = "error loading page";
}
if(!flag)
{
textBlock1.Text = "no matching results";
}
}
Nothing is loaded into the list when the code is run; I just get the message "no matching results".

It looks like you are trying to do a contains search (my guess, based on the * you added around the search string). You can remove the '*' and do a string.Contains match instead.
Try this:
string temp1 = data[i].name.ToUpper();
string temp2 = search.ToUpper();
if (temp1.Contains(temp2))
{
// add the result to the list, as before
}

It looks like you are trying to check whether one string contains another (i.e. a substring match), not whether they are equal.
In C#, you do it like this:
string haystack = "Applejuice box";
string needle = "juice";
if (haystack.Contains(needle))
{
// Match
}
Or, in your case (after dropping the * you added to the string temp2):
if (temp1.Contains(temp2))
{
// add them to the list
}
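Both answers keep the ToUpper() calls from the question. A variant that avoids them is sketched below, using `String.IndexOf` with `StringComparison.OrdinalIgnoreCase`. `NameSearch.ContainsIgnoreCase` is a hypothetical helper name, and treating null inputs as "no match" is an assumption made for this sketch.

```csharp
using System;

public static class NameSearch
{
    // True when haystack contains needle, ignoring case. Null inputs never match.
    public static bool ContainsIgnoreCase(string haystack, string needle)
    {
        if (haystack == null || needle == null) return false;
        return haystack.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the loop from the question this would read `if (NameSearch.ContainsIgnoreCase(data[i].name, search))`. Note that a search string still wrapped in `*` would only match names that literally contain asterisks.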

Have you checked to make sure data.Count > 0?

Searching Specific Data From a File

I have a file containing text and a few numbers. I just want to extract the numbers from it. How do I go about it?
I tried using Split, but no luck so far.
My File is like this:
AT+CMGL="ALL"
+CMGL: 5566,"REC READ","Ufone"
Dear customer, your DAY_BUCKET subscription will expire on 02/05/09
+CMGL: 5565,"REC READ","+923466666666"
Kindly tell me how to extract numbers like +923466666666 from this file so I can put them into another file or a textbox.
Thanks

Here's an example using String.Split. The "number" contains a '+', so really it should be treated as a string, not a number. I'm presuming it's a telephone number, with the '+' potentially used for international calls? If it is a telephone number, you need to be careful of dashes and spaces in the number, as well as extension numbers added to the end, e.g. "+9234 666-66666 ext 235" and so on...
Anyway, hopefully the example is useful in getting to grips with Split.
The code includes unit tests using NUnit v2.4.8.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using NUnit.Framework;
using System.Text.RegularExpressions;
namespace SO.NumberExtractor.Test
{
public class NumberExtracter
{
public List<string> ExtractNumbers(string lines)
{
List<string> numbers = new List<string>();
string[] seperator = { System.Environment.NewLine };
string[] seperatedLines = lines.Split(seperator, StringSplitOptions.RemoveEmptyEntries);
foreach (string line in seperatedLines)
{
string s = ExtractNumber(line);
numbers.Add(s);
}
return numbers;
}
public string ExtractNumber(string line)
{
string s = line.Split(',').Last<string>().Trim('"');
return s;
}
public string ExtractNumberWithoutLinq(string line)
{
string[] fields = line.Split(',');
string s = fields[fields.Length - 1];
s = s.Trim('"');
return s;
}
}
[TestFixture]
public class NumberExtracterTest
{
private readonly string LINE1 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666666\"";
private readonly string LINE2 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666667\"";
private readonly string LINE3 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666668\"";
[Test]
public void ExtractOneLineWithoutLinq()
{
string expected = "+923466666666";
NumberExtracter c = new NumberExtracter();
string result = c.ExtractNumberWithoutLinq(LINE1);
Assert.AreEqual(expected, result);
}
[Test]
public void ExtractOneLineUsingLinq()
{
string expected = "+923466666666";
NumberExtracter c = new NumberExtracter();
string result = c.ExtractNumber(LINE1);
Assert.AreEqual(expected, result);
}
[Test]
public void ExtractMultipleLines()
{
StringBuilder sb = new StringBuilder();
sb.AppendLine(LINE1);
sb.AppendLine(LINE2);
sb.AppendLine(LINE3);
NumberExtracter ne = new NumberExtracter();
List<string> extractedNumbers = ne.ExtractNumbers(sb.ToString());
string expectedFirst = "+923466666666";
string expectedSecond = "+923466666667";
string expectedThird = "+923466666668";
Assert.AreEqual(expectedFirst, extractedNumbers[0]);
Assert.AreEqual(expectedSecond, extractedNumbers[1]);
Assert.AreEqual(expectedThird, extractedNumbers[2]);
}
}
}

If the numbers are all at the end of the lines then you can use code like the following
foreach ( string line in File.ReadAllLines(@"c:\path\to\file.txt") ) {
Match result = Regex.Match(line, @"\+(\d+)""$");
if ( result.Success ) {
var number = result.Groups[1].Value;
// do what you want with the number
}
}
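The same idea can be applied to the whole text at once instead of line by line. The sketch below is a variation, not the answer's code: `PhoneNumberExtractor.Extract` is a hypothetical name, `RegexOptions.Multiline` makes `$` match at each line end, the optional `\r` tolerates Windows line endings, and the capture group is moved so the leading '+' is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhoneNumberExtractor
{
    // A quoted +digits field at the end of a line; \r? tolerates Windows line endings.
    private static readonly Regex QuotedNumber =
        new Regex(@"""(\+\d+)""\r?$", RegexOptions.Multiline);

    public static List<string> Extract(string text)
    {
        var numbers = new List<string>();
        foreach (Match match in QuotedNumber.Matches(text))
        {
            numbers.Add(match.Groups[1].Value);   // includes the leading '+'
        }
        return numbers;
    }
}
```

Run over the four sample lines from the question, this returns a single entry, `+923466666666`; the `"Ufone"` sender is skipped because it is not a '+' followed by digits.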

How large is the file? If the file is under a few megabytes in size I would recommend loading the file contents into a string and using a compiled regular expression to extract matches.
Here's a quick example:
Regex NumberExtractor = new Regex("[0-9]{7,16}",RegexOptions.Compiled);
/// <summary>
/// Extracts numbers between seven and sixteen digits long from the target file.
/// Example number to be extracted: +923466666666
/// </summary>
/// <param name="TargetFilePath"></param>
/// <returns>List of the matching numbers</returns>
private IEnumerable<ulong> ExtractLongNumbersFromFile(string TargetFilePath)
{
if (String.IsNullOrEmpty(TargetFilePath))
throw new ArgumentException("TargetFilePath is null or empty.", "TargetFilePath");
if (File.Exists(TargetFilePath) == false)
throw new Exception("Target file does not exist!");
FileStream TargetFileStream = null;
StreamReader TargetFileStreamReader = null;
string FileContents = "";
List<ulong> ReturnList = new List<ulong>();
try
{
TargetFileStream = new FileStream(TargetFilePath, FileMode.Open);
TargetFileStreamReader = new StreamReader(TargetFileStream);
FileContents = TargetFileStreamReader.ReadToEnd();
MatchCollection Matches = NumberExtractor.Matches(FileContents);
foreach (Match CurrentMatch in Matches) {
ReturnList.Add(System.Convert.ToUInt64(CurrentMatch.Value));
}
}
catch (Exception ex)
{
//Your logging, etc...
}
finally
{
if (TargetFileStream != null) {
TargetFileStream.Close();
TargetFileStream.Dispose();
}
if (TargetFileStreamReader != null)
{
TargetFileStreamReader.Dispose();
}
}
return (IEnumerable<ulong>)ReturnList;
}
Sample Usage:
List<ulong> Numbers = (List<ulong>)ExtractLongNumbersFromFile(@"v:\TestExtract.txt");

C# Sanitize File Name

I recently have been moving a bunch of MP3s from various locations into a repository. I had been constructing the new file names using the ID3 tags (thanks, TagLib-Sharp!), and I noticed that I was getting a System.NotSupportedException:
"The given path's format is not supported."
This was generated by either File.Copy() or Directory.CreateDirectory().
It didn't take long to realize that my file names needed to be sanitized. So I did the obvious thing:
public static string SanitizePath_(string path, char replaceChar)
{
string dir = Path.GetDirectoryName(path);
foreach (char c in Path.GetInvalidPathChars())
dir = dir.Replace(c, replaceChar);
string name = Path.GetFileName(path);
foreach (char c in Path.GetInvalidFileNameChars())
name = name.Replace(c, replaceChar);
return dir + name;
}
To my surprise, I continued to get exceptions. It turned out that ':' is not in the set returned by Path.GetInvalidPathChars(), because it is valid in a path root. I suppose that makes sense, but this has to be a pretty common problem. Does anyone have some short code that sanitizes a path? The most thorough approach I've come up with is this, but it feels like it is probably overkill.
// replaces invalid characters with replaceChar
public static string SanitizePath(string path, char replaceChar)
{
// construct a list of characters that can't show up in filenames.
// need to do this because ":" is not in InvalidPathChars
if (_BadChars == null)
{
_BadChars = new List<char>(Path.GetInvalidFileNameChars());
_BadChars.AddRange(Path.GetInvalidPathChars());
_BadChars = Utility.GetUnique<char>(_BadChars);
}
// remove root
string root = Path.GetPathRoot(path);
path = path.Remove(0, root.Length);
// split on the directory separator character. Need to do this
// because the separator is not valid in a filename.
List<string> parts = new List<string>(path.Split(new char[]{Path.DirectorySeparatorChar}));
// check each part to make sure it is valid.
for (int i = 0; i < parts.Count; i++)
{
string part = parts[i];
foreach (char c in _BadChars)
{
part = part.Replace(c, replaceChar);
}
parts[i] = part;
}
return root + Utility.Join(parts, Path.DirectorySeparatorChar.ToString());
}
Any improvements to make this function faster and less baroque would be much appreciated.

To clean up a file name you could do this
private static string MakeValidFileName( string name )
{
string invalidChars = System.Text.RegularExpressions.Regex.Escape( new string( System.IO.Path.GetInvalidFileNameChars() ) );
string invalidRegStr = string.Format( @"([{0}]*\.+$)|([{0}]+)", invalidChars );
return System.Text.RegularExpressions.Regex.Replace( name, invalidRegStr, "_" );
}

A shorter solution:
var invalids = System.IO.Path.GetInvalidFileNameChars();
var newName = String.Join("_", origFileName.Split(invalids, StringSplitOptions.RemoveEmptyEntries) ).TrimEnd('.');

Based on Andre's excellent answer but taking into account Spud's comment on reserved words, I made this version:
/// <summary>
/// Strip illegal chars and reserved words from a candidate filename (should not include the directory path)
/// </summary>
/// <remarks>
/// http://stackoverflow.com/questions/309485/c-sharp-sanitize-file-name
/// </remarks>
public static string CoerceValidFileName(string filename)
{
var invalidChars = Regex.Escape(new string(Path.GetInvalidFileNameChars()));
var invalidReStr = string.Format(@"[{0}]+", invalidChars);
var reservedWords = new []
{
"CON", "PRN", "AUX", "CLOCK$", "NUL", "COM0", "COM1", "COM2", "COM3", "COM4",
"COM5", "COM6", "COM7", "COM8", "COM9", "LPT0", "LPT1", "LPT2", "LPT3", "LPT4",
"LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
};
var sanitisedNamePart = Regex.Replace(filename, invalidReStr, "_");
foreach (var reservedWord in reservedWords)
{
// Escape the word so that the '$' in "CLOCK$" is matched literally
var reservedWordPattern = string.Format("^{0}\\.", Regex.Escape(reservedWord));
sanitisedNamePart = Regex.Replace(sanitisedNamePart, reservedWordPattern, "_reservedWord_.", RegexOptions.IgnoreCase);
}
return sanitisedNamePart;
}
And these are my unit tests
[Test]
public void CoerceValidFileName_SimpleValid()
{
var filename = @"thisIsValid.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual(filename, result);
}
[Test]
public void CoerceValidFileName_SimpleInvalid()
{
var filename = @"thisIsNotValid\3\\_3.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("thisIsNotValid_3__3.txt", result);
}
[Test]
public void CoerceValidFileName_InvalidExtension()
{
var filename = @"thisIsNotValid.t\xt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("thisIsNotValid.t_xt", result);
}
[Test]
public void CoerceValidFileName_KeywordInvalid()
{
var filename = "aUx.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("_reservedWord_.txt", result);
}
[Test]
public void CoerceValidFileName_KeywordValid()
{
var filename = "auxillary.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("auxillary.txt", result);
}
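The reserved-word pattern above only fires when the name is followed by a dot, so a bare name such as "CON" passes through unchanged. A regex-free check that also covers that case is sketched below. `ReservedNames.IsReserved` is a hypothetical helper; the word list is copied from the answer above, and comparing everything before the first dot is this sketch's assumption about how the names should be matched.

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNames
{
    // Same word list as CoerceValidFileName above, compared case-insensitively.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "CLOCK$", "NUL",
            "COM0", "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT0", "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };

    // True when the part of fileName before the first dot is a reserved word.
    public static bool IsReserved(string fileName)
    {
        int dot = fileName.IndexOf('.');
        string stem = dot < 0 ? fileName : fileName.Substring(0, dot);
        return Reserved.Contains(stem);
    }
}
```

A caller could then prefix or rename the file when `IsReserved` returns true, in the same spirit as the `_reservedWord_` replacement above.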

string clean = String.Concat(dirty.Split(Path.GetInvalidFileNameChars()));

There are a lot of working solutions here. Just for the sake of completeness, here's an approach that doesn't use regex but uses LINQ:
var invalids = Path.GetInvalidFileNameChars();
filename = invalids.Aggregate(filename, (current, c) => current.Replace(c, '_'));
Also, it's a very short solution ;)

I'm using the System.IO.Path.GetInvalidFileNameChars() method to check invalid characters and I've got no problems.
I'm using the following code:
foreach( char invalidchar in System.IO.Path.GetInvalidFileNameChars())
{
filename = filename.Replace(invalidchar, '_');
}
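One caveat with every loop over Path.GetInvalidFileNameChars(): the returned set depends on the platform the code runs on, and on Unix-like systems it contains only the null character and '/'. If the names must be valid on Windows regardless of where the code runs, the characters can be listed explicitly. The following is a sketch under that assumption; `PortableSanitizer` and its members are hypothetical names.

```csharp
using System;
using System.Text;

public static class PortableSanitizer
{
    // Characters Windows rejects in file names, listed explicitly (an assumption of
    // this sketch) instead of being read from Path.GetInvalidFileNameChars().
    private const string WindowsInvalid = "\\/:*?\"<>|";

    public static string Sanitize(string fileName, char replacement = '_')
    {
        var sb = new StringBuilder(fileName.Length);
        foreach (char c in fileName)
        {
            // Control characters (below 32) are rejected by Windows as well.
            bool bad = c < 32 || WindowsInvalid.IndexOf(c) >= 0;
            sb.Append(bad ? replacement : c);
        }
        return sb.ToString();
    }
}
```

For example, `PortableSanitizer.Sanitize("a:b?c.txt")` gives `a_b_c.txt` on any platform.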

I wanted to retain the characters in some way, not just simply replace the character with an underscore.
One way I thought was to replace the characters with similar looking characters which are (in my situation), unlikely to be used as regular characters. So I took the list of invalid characters and found look-a-likes.
The following are functions to encode and decode with the look-a-likes.
This code does not include a complete listing for all System.IO.Path.GetInvalidFileNameChars() characters. So it is up to you to extend or utilize the underscore replacement for any remaining characters.
private static Dictionary<string, string> EncodeMapping()
{
//-- Following characters are invalid for windows file and folder names.
//-- \/:*?"<>|
Dictionary<string, string> dic = new Dictionary<string, string>();
dic.Add(@"\", "Ì"); // U+00CC
dic.Add("/", "Í"); // U+00CD
dic.Add(":", "¦"); // U+00A6
dic.Add("*", "¤"); // U+00A4
dic.Add("?", "¿"); // U+00BF
dic.Add(@"""", "ˮ"); // U+02EE
dic.Add("<", "«"); // U+00AB
dic.Add(">", "»"); // U+00BB
dic.Add("|", "│"); // U+2502
return dic;
}
public static string Escape(string name)
{
foreach (KeyValuePair<string, string> replace in EncodeMapping())
{
name = name.Replace(replace.Key, replace.Value);
}
//-- handle dot at the end
if (name.EndsWith(".")) name = name.CropRight(1) + "°";
return name;
}
public static string UnEscape(string name)
{
foreach (KeyValuePair<string, string> replace in EncodeMapping())
{
name = name.Replace(replace.Value, replace.Key);
}
//-- handle dot at the end
if (name.EndsWith("°")) name = name.CropRight(1) + ".";
return name;
}
You can select your own look-a-likes. I used the Character Map app in windows to select mine %windir%\system32\charmap.exe
As I make adjustments through discovery, I will update this code.

I think the problem is that you first call Path.GetDirectoryName on the bad string. If this has non-filename characters in it, .Net can't tell which parts of the string are directories and throws. You have to do string comparisons.
Assuming it's only the filename that is bad, not the entire path, try this:
public static string SanitizePath(string path, char replaceChar)
{
int filenamePos = path.LastIndexOf(Path.DirectorySeparatorChar) + 1;
var sb = new System.Text.StringBuilder();
sb.Append(path.Substring(0, filenamePos));
for (int i = filenamePos; i < path.Length; i++)
{
char filenameChar = path[i];
foreach (char c in Path.GetInvalidFileNameChars())
if (filenameChar.Equals(c))
{
filenameChar = replaceChar;
break;
}
sb.Append(filenameChar);
}
return sb.ToString();
}

I have had success with this in the past.
Nice, short and static :-)
public static string returnSafeString(string s)
{
foreach (char character in Path.GetInvalidFileNameChars())
{
s = s.Replace(character.ToString(),string.Empty);
}
foreach (char character in Path.GetInvalidPathChars())
{
s = s.Replace(character.ToString(), string.Empty);
}
return (s);
}

Here's an efficient lazy loading extension method based on Andre's code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace LT
{
public static class Utility
{
static string invalidRegStr;
public static string MakeValidFileName(this string name)
{
if (invalidRegStr == null)
{
var invalidChars = System.Text.RegularExpressions.Regex.Escape(new string(System.IO.Path.GetInvalidFileNameChars()));
invalidRegStr = string.Format(@"([{0}]*\.+$)|([{0}]+)", invalidChars);
}
return System.Text.RegularExpressions.Regex.Replace(name, invalidRegStr, "_");
}
}
}

Your code would be cleaner if you appended the directory and filename together and sanitized that rather than sanitizing them independently. As for sanitizing away the :, just take the 2nd character in the string. If it is equal to "replacechar", replace it with a colon. Since this app is for your own use, such a solution should be perfectly sufficient.

using System;
using System.IO;
using System.Linq;
using System.Text;
public class Program
{
public static void Main()
{
try
{
var badString = "ABC\\DEF/GHI<JKL>MNO:PQR\"STU\tVWX|YZA*BCD?EFG";
Console.WriteLine(badString);
Console.WriteLine(SanitizeFileName(badString, '.'));
Console.WriteLine(SanitizeFileName(badString));
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
}
}
private static string SanitizeFileName(string fileName, char? replacement = null)
{
if (fileName == null) { return null; }
if (fileName.Length == 0) { return ""; }
var sb = new StringBuilder();
var badChars = Path.GetInvalidFileNameChars().ToList();
foreach (var @char in fileName)
{
if (badChars.Contains(@char))
{
if (replacement.HasValue)
{
sb.Append(replacement.Value);
}
continue;
}
sb.Append(@char);
}
return sb.ToString();
}
}

Based on @fiat's and @Andre's approaches, I'd like to share my solution too.
Main differences:
it's an extension method
the regex is compiled at first use, which saves time over many executions
reserved words are preserved
public static class StringPathExtensions
{
private static Regex _invalidPathPartsRegex;
static StringPathExtensions()
{
var invalidReg = System.Text.RegularExpressions.Regex.Escape(new string(Path.GetInvalidFileNameChars()));
_invalidPathPartsRegex = new Regex($"(?<reserved>^(CON|PRN|AUX|CLOCK\\$|NUL|COM0|COM1|COM2|COM3|COM4|COM5|COM6|COM7|COM8|COM9|LPT0|LPT1|LPT2|LPT3|LPT4|LPT5|LPT6|LPT7|LPT8|LPT9))|(?<invalid>[{invalidReg}:]+|\\.$)", RegexOptions.Compiled);
}
public static string SanitizeFileName(this string path)
{
return _invalidPathPartsRegex.Replace(path, m =>
{
if (!string.IsNullOrWhiteSpace(m.Groups["reserved"].Value))
return string.Concat("_", m.Groups["reserved"].Value);
return "_";
});
}
}
