String literal recognition problem - c#

I'm trying to recognize a string literal by reading the input one character at a time.
Here is the skeleton of my scanner:
public sealed class Scanner
{
// some class inner implementations
/// <summary>
///
/// </summary>
/// <param name="Line"></param>
/// <param name="LineNumber"></param>
public void Run(String Line, Int32 LineNumber)
{
var ChPosition = default(Int32);
var ChCurrent = default(Char);
var Value = new StringBuilder();
while (default(Char) != Line.ElementAtOrDefault<Char>(ChPosition))
{
ChCurrent = Line.ElementAtOrDefault<Char>(ChPosition);
#region [Whitespace]
if (Char.IsWhiteSpace(ChCurrent))
{
ChPosition++;
}
#endregion
else
{
switch (ChCurrent)
{
#region [String Literal (")]
case '"':
{
// skipping " sign, include only string inner value
ChCurrent = Line.ElementAtOrDefault<Char>(++ChPosition);
// ...? Problematic place!!!
this.Tokens.Enqueue(new SharedEntities.Token
{
Class = SharedEntities.Token.TokenClass.StringLiteral,
Value = Value.ToString()
}
);
Value.Clear();
ChPosition++;
break;
}
#endregion
default:
{
throw new ScanningException(
"<syntax_error#" + ChCurrent.ToString() + ">\n"
+ "Unsupported character appeared at: {ln: "
+ LineNumber.ToString()
+ "; pos: "
+ (ChPosition + 1).ToString()
+ "}"
);
}
} // [switch(ChCurrent)]
} // [if(Char.IsWhiteSpace(ChCurrent))...else]
} // [while(default(Char) != Line.ElementAtOrDefault<Char>(ChPosition))]
} // [public void Run(String Line, Int32 LineNumber)]
} // [public sealed class Scanner]
My goal is to parse a Pascal-like string literal: "{everything enclosed is allowed, except a lone "; only the pair "" is allowed}".

First, you are evidently using some kind of parsing library. You would have a better chance of getting help if you trimmed your code down, as I did below, so that anybody can copy, paste, and run it.
The answer is simple: your string-literal region does not consume the whole literal. Here is your code, modified so that it runs without any additional library:
public class Test
{
static char ElementAtOrDefault(string value, int position)
{
return position >= value.Length ? default(char) : value[position];
}
static string parseStringLiteral(string value, ref int ChPosition)
{
StringBuilder Value = new StringBuilder();
char ChCurrent = ElementAtOrDefault(value, ++ChPosition);
while (ChCurrent != '"')
{
Value.Append(ChCurrent);
ChCurrent = ElementAtOrDefault(value, ++ChPosition);
if (ChCurrent == '"')
{
// "" sequence only acceptable
if (ElementAtOrDefault(value, ChPosition + 1) == '"')
{
Value.Append(ChCurrent);
// skip 2nd double quote
ChPosition++;
// move position next
ChCurrent = ElementAtOrDefault(value, ++ChPosition);
}
}
else if (default(Char) == ChCurrent)
{
// message: unterminated string
throw new Exception("ScanningException");
}
}
ChPosition++;
return Value.ToString();
}
public static void test(string literal)
{
Console.WriteLine("testing literal with " + literal.Length +
" chars:\n" + literal);
try
{
int pos = 0;
string res = parseStringLiteral(literal, ref pos);
Console.WriteLine("Parsed " + res.Length + " chars:\n" + res);
}
catch (Exception ex)
{
Console.WriteLine("Error: " + ex.Message);
}
Console.WriteLine();
}
public static int Main(string[] args)
{
test(@"""Hello Language Design""");
test(@"""Is there any problems with the """"strings""""?""");
test(@"""v#:';?325;.<>,|+_)""(*&^%$##![]{}\|-_=""");
return 0;
}
}
Running this program produces output:
testing literal with 23 chars:
"Hello Language Design"
Parsed 21 chars:
Hello Language Design
testing literal with 45 chars:
"Is there any problems with the ""strings""?"
Parsed 41 chars:
Is there any problems with the "strings"?
testing literal with 39 chars:
"v#:';?325;.<>,|+_)"(*&^%$##![]{}\|-_="
Parsed 18 chars:
v#:';?325;.<>,|+_)
So it works for your tests, but the algorithm is not correct. Try running:
// the literal contains only "", so it should produce ", but it does not
test(@"""""""""");
And you will incorrectly get:
testing literal with 4 chars:
""""
Parsed 0 chars:
The problem is that when the while condition encounters a " character, it never checks whether the next character is also a ":
while (ChCurrent != '"') //bug
Of course, I wrote a corrected version for you :-)
Here it is (it keeps your style; it is just an edited version of your code):
static string parseStringLiteral(string value, ref int ChPosition)
{
StringBuilder Value = new StringBuilder();
char ChCurrent = ElementAtOrDefault(value, ++ChPosition);
bool goon = true;
while (goon)
{
if (ChCurrent == '"')
{
// "" sequence only acceptable
if (ElementAtOrDefault(value, ChPosition + 1) == '"')
{
Value.Append(ChCurrent);
// skip 2nd double quote
ChPosition++;
// move position next
ChCurrent = ElementAtOrDefault(value, ++ChPosition);
}
else goon = false; //break;
}
else if (default(Char) == ChCurrent)
{
// message: unterminated string
throw new Exception("ScanningException");
}
else
{
Value.Append(ChCurrent);
ChCurrent = ElementAtOrDefault(value, ++ChPosition);
}
}
ChPosition++;
return Value.ToString();
}
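For a quick sanity check, the same rule (a lone " closes the literal, a "" pair yields one quote) can also be written as a single index-based loop. This is only a minimal sketch: the names PascalString and Read are made up for illustration, and it assumes text[pos] is the opening quote.

```csharp
using System;
using System.Text;

public static class PascalString
{
    // Assumes text[pos] is the opening quote. Returns the inner value of the
    // literal; on return, pos is the index just past the closing quote.
    public static string Read(string text, ref int pos)
    {
        var value = new StringBuilder();
        int i = pos + 1;                       // skip the opening quote
        while (i < text.Length)
        {
            if (text[i] != '"')
            {
                value.Append(text[i++]);       // ordinary character
            }
            else if (i + 1 < text.Length && text[i + 1] == '"')
            {
                value.Append('"');             // "" stands for one quote
                i += 2;
            }
            else
            {
                pos = i + 1;                   // a lone quote closes the literal
                return value.ToString();
            }
        }
        throw new FormatException("Unterminated string literal.");
    }
}
```

Called with pos = 0 on the four-character input """", it should return a single " and leave pos at 4, which is the case the first version got wrong.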
Happy coding :-)

Related

Stack overflow in a recursive void method

My function parses text: it finds a Vector3 and modifies it, and if another Vector3 exists, the function calls itself again.
It works fine for small text files, but with big text files I always get a stack overflow. The parser searches for the text and exits if nothing is found, and the remaining file content gets smaller on every pass, so it is not an infinite loop. The error occurs at random places, usually here: int pos = text.IndexOf(search); (in the support module).
private void ParseText()
{
if (isWorking)
{
isWorking = IsVectorReplace();
ParseText();
ProcessShow();
}
else
{
ParserWorkComplite();
}
}
private bool IsVectorReplace()
{
//find index of substring start
int indexOfSubstringStart = fileContent.IndexOf(prefix);
if (indexOfSubstringStart == -1) { return false; } //vector3 not find
//find index of substring end
int nextCharIndx = indexOfSubstringStart;
Char ch = fileContent[nextCharIndx];
while (ch.ToString() != suffix)
{
ch = fileContent[nextCharIndx];
nextCharIndx++;
}
int startCutIndex = indexOfSubstringStart + prefix.Length;
int endCutIndex = nextCharIndx - (indexOfSubstringStart + prefix.Length + 1);
//search done. parse vector
string vectorTextContent = fileContent.Substring(startCutIndex, endCutIndex);
string oldVecText, newVecText;
string vectorNewTextContent = "";
parseVector3 = ConvertFromString(vectorTextContent);
parseVector3 += shiftVector3;
vectorNewTextContent = ConvertVect(parseVector3);
oldVecText = prefix + vectorTextContent + suffix;
newVecText = prefix + vectorNewTextContent + suffix;
string replaceText = ReplaceFirst(fileContent, oldVecText, newVecText);
//Debug.WriteLine("VEC OLD " + vectorTextContent + " VEC NEW "+ vectorNewTextContent);
int lastIndex = endCutIndex; // indexOfSubstringStart + newVecText.Length;
//save and cut file
string savePartText = fileContent.Remove(lastIndex);
partsOfFile.Add(savePartText);
fileContent = fileContent.Remove(0, savePartText.Length);
return true;
}
Some supporting methods:
//find vec in string
Vector3 ConvertFromString(string input)
{
if (input != null)
{
var vals = input.Split(',').Select(s => s.Trim()).ToArray();
if (vals.Length == 3)
{
NumberStyles style = System.Globalization.NumberStyles.Any;
CultureInfo culture = CultureInfo.InvariantCulture;
Single v1, v2, v3;
if (Single.TryParse(vals[0], style, culture, out v1) && Single.TryParse(vals[1], style, culture, out v2) && Single.TryParse(vals[2], style, culture, out v3))
return new Vector3(v1, v2, v3);
else
throw new ArgumentException();
}
else
throw new ArgumentException();
}
else
throw new ArgumentException();
}
//convert vect to text back
private string ConvertVect(Vector3 v)
{
string data = "";
string v0, v1, v2;
v0 = v.X.ToString().Replace(",", ".");
v1 = v.Y.ToString().Replace(",", ".");
v2 = v.Z.ToString().Replace(",", ".");
data = v0 + "," + v1 + "," + v2 + "";
return data;
}
public string ReplaceFirst(string text, string search, string replace)
{
int pos = text.IndexOf(search);
if (pos < 0)
{
return text;
}
return text.Substring(0, pos) + replace + text.Substring(pos + search.Length);
}
This isn't exactly what you asked for, but based on the code you have provided, I would propose an entirely different approach using Regex.
Here is some code to use as a starting point.
You'll need this using statement in your .cs file:
using System.Text.RegularExpressions;
//fileContent is assumed to be your original string containing Vector3 instances.
//prefix and suffix are assumed to be strings that mark the beginning and end of the Vector3 string respectively
string vectorRegex = $"(?<={ Regex.Escape(prefix) }).+?(?={ Regex.Escape(suffix) })";
string replacedContent = Regex.Replace(fileContent, vectorRegex, ModifyVector);
private static string ModifyVector(Match vectorMatch)
{
//This is adapted from your code
Vector3 parseVector3 = ConvertFromString(vectorMatch.Value);
parseVector3 += shiftVector3;
return ConvertVect(parseVector3);
}
The code above will do the following:
Find all instances of text between the prefix and suffix delimiters in fileContent
Invoke the ModifyVector function for each match
Replace the matched value with the output of the ModifyVector function
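The three steps above can be tried in isolation with a self-contained sketch. To keep it free of external types it uses three plain floats instead of Vector3; the class name VectorShifter, the method Shift, and the delimiters used in the usage note are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class VectorShifter
{
    // Shifts every "x,y,z" triple found between prefix and suffix by (dx, dy, dz).
    public static string Shift(string content, string prefix, string suffix,
                               float dx, float dy, float dz)
    {
        // Lookbehind/lookahead keep the delimiters out of the matched text.
        string pattern = "(?<=" + Regex.Escape(prefix) + ").+?(?=" + Regex.Escape(suffix) + ")";
        float[] shift = { dx, dy, dz };

        return Regex.Replace(content, pattern, match =>
        {
            string[] parts = match.Value.Split(',');
            if (parts.Length != 3)
                return match.Value;            // not a triple: leave untouched

            var shifted = new string[3];
            for (int i = 0; i < 3; i++)
            {
                float component;
                if (!float.TryParse(parts[i].Trim(), NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out component))
                    return match.Value;        // unparsable: leave untouched
                shifted[i] = (component + shift[i]).ToString(CultureInfo.InvariantCulture);
            }
            return string.Join(",", shifted);
        });
    }
}
```

For example, Shift("a Vector3(1,2,3) b", "Vector3(", ")", 1f, 1f, 1f) should give "a Vector3(2,3,4) b". Because Regex.Replace walks the matches itself, the call stack does not grow with the number of vectors in the file.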

Get Difference Between Two Strings in Terms of Remove and Insert Actions

So I have a text box, and in the text-changed event I have the old text and the new text and want to get the difference between them. Specifically, I want to be able to recreate the new text from the old text using one remove operation and one insert operation. That is possible because a single change in the text box can only take one of a few forms:
Text was only removed (one character or more using selection) - ABCD -> AD
Text was only added (one character or more using paste) - ABCD -> ABXXCD
Text was removed and added (by selecting text and entering text in the same action) - ABCD -> AXD
So I want to have these functions:
Sequence GetRemovedCharacters(string oldText, string newText)
{
}
Sequence GetAddedCharacters(string oldText, string newText)
{
}
My Sequence class:
public class Sequence
{
private int start;
private int end;
public Sequence(int start, int end)
{
StartIndex = start; EndIndex = end;
}
public int StartIndex { get { return start; } set { start = value; Length = end - start + 1; } }
public int EndIndex { get { return end; } set { end = value; Length = end - start + 1; } }
public int Length { get; private set; }
public override string ToString()
{
return "(" + StartIndex + ", " + EndIndex + ")";
}
public static bool operator ==(Sequence a, Sequence b)
{
if(IsNull(a) && IsNull(b))
return true;
else if(IsNull(a) || IsNull(b))
return false;
else
return a.StartIndex == b.StartIndex && a.EndIndex == b.EndIndex;
}
public override bool Equals(object obj)
{
return base.Equals(obj);
}
public static bool operator !=(Sequence a, Sequence b)
{
// Defined as the negation of == so the two operators can never disagree
// (the previous && test returned false when only one index differed).
return !(a == b);
}
public override int GetHashCode()
{
return base.GetHashCode();
}
static bool IsNull(Sequence sequence)
{
try
{
return sequence.Equals(null);
}
catch(NullReferenceException)
{
return true;
}
}
}
Extra Explanation: I want to know which characters were removed and which characters were added to the text in order to get the new text so I can recreate this. Let's say I have ABCD -> AXD. 'B' and 'C' would be the characters that were removed and 'X' would be the character that was added. So the output from the GetRemovedCharacters function would be (1, 2) and the output from the GetAddedCharacters function would be (1, 1). The output from the GetRemovedCharacters function refers to indexes in the old text and the output from the GetAddedCharacters function refers to indexes in the old text after removing the removed characters.
EDIT: I've thought of a few directions:
The code I created* returns the sequence that was affected: if characters were removed, it returns the sequence of the removed characters in the old text; if characters were added, it returns the sequence of the added characters in the new text. It does not return the right value (and I am not even sure what that value should be) when text is removed and added in the same action.
Maybe the SelectionStart property in the text box could help - the position of the caret after the text was changed.
*
private static Sequence GetChangeSequence(string oldText, string newText)
{
if(newText.Length > oldText.Length)
{
for(int i = 0; i < newText.Length; i++)
if(i == oldText.Length || newText[i] != oldText[i])
return new Sequence(i, i + (newText.Length - oldText.Length) - 1);
return null;
}
else if(newText.Length < oldText.Length)
{
for(int i = 0; i < oldText.Length; i++)
if(i == newText.Length || oldText[i] != newText[i])
return new Sequence(i, i + (oldText.Length - newText.Length) - 1);
return null;
}
else
return null;
}
Thanks.
A simple string comparison won't do the job, since you are asking for an algorithm that supports added and removed characters at the same time, and that is not easy to achieve in a few lines of code. I'd suggest using a library instead of writing your own comparison algorithm.
Have a look at this project for example.
I quickly threw this together to give you an idea of what I did to solve your question. It doesn't use your classes but it does find an index so it's customizable for you.
There are also obvious limitations to this as it is just bare bones.
This method picks out the characters of the original string that have no counterpart in the changed string:
// Find the changes made to a string
string StringDiff (string originalString, string changedString)
{
string diffString = "";
// Iterate over the original string
for (int i = 0; i < originalString.Length; i++)
{
// Get the character to search with
char diffChar = originalString[i];
// If found char in the changed string
if (FindInString(diffChar, changedString, out int index))
{
// Remove from the changed string at the index as we don't want to match to this char again
changedString = changedString.Remove(index, 1);
}
// If not found then this is a difference
else
{
// Add to diff string
diffString += diffChar;
}
}
return diffString;
}
This method will return true at the first matching occurrence (an obvious limitation but this is more to give you an idea)
// Find char at first occurence in string
bool FindInString (char c, string search, out int index)
{
index = -1;
// Iterate over search string
for (int i = 0; i < search.Length; i++)
{
// If found then return true with index
if (c == search[i])
{
index = i;
return true;
}
}
return false;
}
This is a simple helper method to show you an example
void SplitStrings(string oldStr, string newStr)
{
Console.WriteLine($"Old : {oldStr}, New: {newStr}");
Console.WriteLine("Removed - " + StringDiff(oldStr, newStr));
Console.WriteLine("Added - " + StringDiff(newStr, oldStr));
}
I've done it.
static void Main(string[] args)
{
while(true)
{
Console.WriteLine("Enter the Old Text");
string oldText = Console.ReadLine();
Console.WriteLine("Enter the New Text");
string newText = Console.ReadLine();
Console.WriteLine("Enter the Caret Position");
int caretPos = int.Parse(Console.ReadLine());
Sequence removed = GetRemovedCharacters(oldText, newText, caretPos);
if(removed != null)
oldText = oldText.Remove(removed.StartIndex, removed.Length);
Sequence added = GetAddedCharacters(oldText, newText, caretPos);
if(added != null)
oldText = oldText.Insert(added.StartIndex, newText.Substring(added.StartIndex, added.Length));
Console.WriteLine("Worked: " + (oldText == newText).ToString());
Console.ReadKey();
Console.Clear();
}
}
static Sequence GetRemovedCharacters(string oldText, string newText, int caretPosition)
{
int startIndex = GetStartIndex(oldText, newText);
if(startIndex != -1)
{
Sequence sequence = new Sequence(startIndex, caretPosition + (oldText.Length - newText.Length) - 1);
if(SequenceValid(sequence))
return sequence;
}
return null;
}
static Sequence GetAddedCharacters(string oldText, string newText, int caretPosition)
{
int startIndex = GetStartIndex(oldText, newText);
if(startIndex != -1)
{
Sequence sequence = new Sequence(GetStartIndex(oldText, newText), caretPosition - 1);
if(SequenceValid(sequence))
return sequence;
}
return null;
}
static int GetStartIndex(string oldText, string newText)
{
for(int i = 0; i < Math.Max(oldText.Length, newText.Length); i++)
if(i >= oldText.Length || i >= newText.Length || oldText[i] != newText[i])
return i;
return -1;
}
static bool SequenceValid(Sequence sequence)
{
return sequence.StartIndex >= 0 && sequence.EndIndex >= 0 && sequence.EndIndex >= sequence.StartIndex;
}
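When the change really is one contiguous remove plus one insert, as the question assumes, the caret position is not strictly needed: trimming the common prefix and the common suffix of the two strings leaves exactly the removed and inserted parts. The following is a minimal sketch of that idea; SingleEditDiff, Edit, Compute and Apply are hypothetical names, not part of the code above.

```csharp
using System;

public static class SingleEditDiff
{
    // One edit: remove RemovedLength chars at Start, then insert Inserted there.
    public sealed class Edit
    {
        public int Start;
        public int RemovedLength;
        public string Inserted;
    }

    public static Edit Compute(string oldText, string newText)
    {
        int max = Math.Min(oldText.Length, newText.Length);

        // Length of the common prefix.
        int prefix = 0;
        while (prefix < max && oldText[prefix] == newText[prefix])
            prefix++;

        // Length of the common suffix, never overlapping the prefix.
        int suffix = 0;
        while (suffix < max - prefix &&
               oldText[oldText.Length - 1 - suffix] == newText[newText.Length - 1 - suffix])
            suffix++;

        return new Edit
        {
            Start = prefix,
            RemovedLength = oldText.Length - prefix - suffix,
            Inserted = newText.Substring(prefix, newText.Length - prefix - suffix)
        };
    }

    // Replays the edit on the old text to rebuild the new text.
    public static string Apply(string oldText, Edit edit)
    {
        return oldText.Remove(edit.Start, edit.RemovedLength).Insert(edit.Start, edit.Inserted);
    }
}
```

For ABCD -> AXD this gives Start = 1, RemovedLength = 2 and Inserted = "X", which corresponds to the removed sequence (1, 2) and the added sequence (1, 1) from the question. With repeated characters (for example AAB -> AB) several edits are equally valid, and this sketch reports one of them, not necessarily the one the user performed.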

How to parse C# generic type names?

How can I parse C#-style generic type names of the format List<int> or Dictionary<string,int> or even more complex Dictionary<string,Dictionary<System.String,int[]>>. Assume that these names are strings and may not actually represent existing types. It should just as easily be able to parse BogusClass<A,B,Vector<C>>. To be clear, I am NOT interested in parsing .NET internal type names of the format List`1[[System.Int32]], but actual C# type names as they would appear in the source code, with or without namespace qualifiers using dot notation.
Regular expressions are out because these are nested structures. I thought perhaps the System.CodeDom.CodeTypeReference constructor would parse it for me since it has string BaseType and CodeTypeReferenceCollection TypeArguments members, but those apparently need to be set manually.
CodeTypeReference is the kind of structure I need:
class TypeNameStructure
{
public string Name;
public TypeNameStructure[] GenericTypeArguments;
public bool IsGenericType{get;}
public bool IsArray{get;} //would be nice to detect this as well
public TypeNameStructure( string friendlyCSharpName )
{
//Parse friendlyCSharpName into name and generic type arguments recursively
}
}
Are there any existing classes in the framework to achieve this kind of type name parsing? If not, how would I go about parsing this?
Well, I had a lot of fun writing this little parsing class using Regex and named capture groups (?<Name>group).
My approach was that each 'type definition' string could be broken up as a set of the following: Type Name, optional Generic Type, and optional array marker '[ ]'.
So given the classic Dictionary<string, byte[]> you would have Dictionary as the type name and string, byte[] as your inner generic type string.
We can split the inner generic type on the comma (',') character and recursively parse each type string using the same Regex. Each successful parse should be added to the parent type information and you can build a tree hierarchy.
With the previous example, we would end up with an array of {string, byte[]} to parse. Both of these are easily parsed and set to part of Dictionary's inner types.
On ToString() it's simply a matter of recursively outputting each type's friendly name, including inner types. So Dictionary would output its type name, then iterate through all inner types, outputting their type names, and so forth.
class TypeInformation
{
static readonly Regex TypeNameRegex = new Regex(@"^(?<TypeName>[a-zA-Z0-9_]+)(<(?<InnerTypeName>[a-zA-Z0-9_,\<\>\s\[\]]+)>)?(?<Array>(\[\]))?$", RegexOptions.Compiled);
readonly List<TypeInformation> innerTypes = new List<TypeInformation>();
public string TypeName
{
get;
private set;
}
public bool IsArray
{
get;
private set;
}
public bool IsGeneric
{
get { return innerTypes.Count > 0; }
}
public IEnumerable<TypeInformation> InnerTypes
{
get { return innerTypes; }
}
private void AddInnerType(TypeInformation type)
{
innerTypes.Add(type);
}
private static IEnumerable<string> SplitByComma(string value)
{
var strings = new List<string>();
var sb = new StringBuilder();
var level = 0;
foreach (var c in value)
{
if (c == ',' && level == 0)
{
strings.Add(sb.ToString());
sb.Clear();
}
else
{
sb.Append(c);
}
if (c == '<')
level++;
if(c == '>')
level--;
}
strings.Add(sb.ToString());
return strings;
}
public static bool TryParse(string friendlyTypeName, out TypeInformation typeInformation)
{
typeInformation = null;
// Try to match the type to our regular expression.
var match = TypeNameRegex.Match(friendlyTypeName);
// If that fails, the format is incorrect.
if (!match.Success)
return false;
// Scrub the type name, inner type name, and array '[]' marker (if present).
var typeName = match.Groups["TypeName"].Value;
var innerTypeFriendlyName = match.Groups["InnerTypeName"].Value;
var isArray = !string.IsNullOrWhiteSpace(match.Groups["Array"].Value);
// Create the root type information.
TypeInformation type = new TypeInformation
{
TypeName = typeName,
IsArray = isArray
};
// Check if we have an inner type name (in the case of generics).
if (!string.IsNullOrWhiteSpace(innerTypeFriendlyName))
{
// Split each type by the comma character.
var innerTypeNames = SplitByComma(innerTypeFriendlyName);
// Iterate through all inner type names and attempt to parse them recursively.
foreach (string innerTypeName in innerTypeNames)
{
TypeInformation innerType = null;
var trimmedInnerTypeName = innerTypeName.Trim();
var success = TypeInformation.TryParse(trimmedInnerTypeName, out innerType);
// If the inner type fails, so does the parent.
if (!success)
return false;
// Success! Add the inner type to the parent.
type.AddInnerType(innerType);
}
}
// Return the parsed type information.
typeInformation = type;
return true;
}
public override string ToString()
{
// Create a string builder with the type name prefilled.
var sb = new StringBuilder(this.TypeName);
// If this type is generic (has inner types), append each recursively.
if (this.IsGeneric)
{
sb.Append("<");
// Get the number of inner types.
int innerTypeCount = this.InnerTypes.Count();
// Append each inner type's friendly string recursively.
for (int i = 0; i < innerTypeCount; i++)
{
sb.Append(innerTypes[i].ToString());
// Check if we need to add a comma to separate from the next inner type name.
if (i + 1 < innerTypeCount)
sb.Append(", ");
}
sb.Append(">");
}
// If this type is an array, we append the array '[]' marker.
if (this.IsArray)
sb.Append("[]");
return sb.ToString();
}
}
I made a console app to test it, it seems to work with most cases I threw at it.
Here's the code:
class MainClass
{
static readonly int RootIndentLevel = 2;
static readonly string InputString = @"BogusClass<A,B,Vector<C>>";
public static void Main(string[] args)
{
TypeInformation type = null;
Console.WriteLine("Input = {0}", InputString);
var success = TypeInformation.TryParse(InputString, out type);
if (success)
{
Console.WriteLine("Output = {0}", type.ToString());
Console.WriteLine("Graph:");
OutputGraph(type, RootIndentLevel);
}
else
Console.WriteLine("Parsing error!");
}
static void OutputGraph(TypeInformation type, int indentLevel = 0)
{
Console.WriteLine("{0}{1}{2}", new string(' ', indentLevel), type.TypeName, type.IsArray ? "[]" : string.Empty);
foreach (var innerType in type.InnerTypes)
OutputGraph(innerType, indentLevel + 2);
}
}
And here's the output:
Input = BogusClass<A,B,Vector<C>>
Output = BogusClass<A, B, Vector<C>>
Graph:
  BogusClass
    A
    B
    Vector
      C
There are some possible lingering issues, such as multidimensional arrays. It will more than likely fail on something like int[,] or string[][].
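One part of that limitation sits in the splitting step: a comma inside int[,] is at angle-bracket level 0 and would be treated as a separator. A possible adjustment, shown here only as a sketch with the hypothetical names TypeArgumentSplitter and SplitTopLevel, is to count square brackets as nesting too. The regular expression would still need its own changes to accept [,] and [][] suffixes; that part is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TypeArgumentSplitter
{
    // Splits on commas that are outside both <...> and [...].
    public static List<string> SplitTopLevel(string value)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        int depth = 0;
        foreach (char c in value)
        {
            if (c == '<' || c == '[') depth++;
            else if (c == '>' || c == ']') depth--;

            if (c == ',' && depth == 0)
            {
                parts.Add(current.ToString().Trim());   // top-level separator
                current.Clear();
            }
            else
            {
                current.Append(c);                      // part of the current argument
            }
        }
        parts.Add(current.ToString().Trim());
        return parts;
    }
}
```

With this, "int[,],string[][]" should split into int[,] and string[][] rather than into three pieces.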
Answering my own question. I wrote the following class to achieve the results I need; give it a spin.
public class TypeName
{
public string Name;
public bool IsGeneric;
public List<ArrayDimension> ArrayDimensions;
public List<TypeName> TypeArguments;
public class ArrayDimension
{
public int Dimensions;
public ArrayDimension()
{
Dimensions = 1;
}
public override string ToString()
{
return "[" + new String(',', Dimensions - 1) + "]";
}
}
public TypeName()
{
Name = null;
IsGeneric = false;
ArrayDimensions = new List<ArrayDimension>();
TypeArguments = new List<TypeName>();
}
public static string MatchStructure( TypeName toMatch, TypeName toType )
{
return null;
}
public override string ToString()
{
string str = Name;
if (IsGeneric)
str += "<" + string.Join( ",", TypeArguments.Select<TypeName,string>( tn => tn.ToString() ) ) + ">";
foreach (ArrayDimension d in ArrayDimensions)
str += d.ToString();
return str;
}
public string FormatForDisplay( int indent = 0 )
{
var spacing = new string(' ', indent );
string str = spacing + "Name: " + Name + "\r\n" +
spacing + "IsGeneric: " + IsGeneric + "\r\n" +
spacing + "ArraySpec: " + string.Join( "", ArrayDimensions.Select<ArrayDimension,string>( d => d.ToString() ) ) + "\r\n";
if (IsGeneric)
{
str += spacing + "GenericParameters: {\r\n" + string.Join( spacing + "},{\r\n", TypeArguments.Select<TypeName,string>( t => t.FormatForDisplay( indent + 4 ) ) ) + spacing + "}\r\n";
}
return str;
}
public static TypeName Parse( string name )
{
int pos = 0;
bool dummy;
return ParseInternal( name, ref pos, out dummy );
}
private static TypeName ParseInternal( string name, ref int pos, out bool listTerminated )
{
StringBuilder sb = new StringBuilder();
TypeName tn = new TypeName();
listTerminated = true;
while (pos < name.Length)
{
char c = name[pos++];
switch (c)
{
case ',':
if (tn.Name == null)
tn.Name = sb.ToString();
listTerminated = false;
return tn;
case '>':
if (tn.Name == null)
tn.Name = sb.ToString();
listTerminated = true;
return tn;
case '<':
{
tn.Name = sb.ToString();
tn.IsGeneric = true;
sb.Length = 0;
bool terminated = false;
while (!terminated)
tn.TypeArguments.Add( ParseInternal( name, ref pos, out terminated ) );
var t = name[pos-1];
if (t == '>')
continue;
else
throw new Exception( "Missing closing > of generic type list." );
}
case '[':
ArrayDimension d = new ArrayDimension();
tn.ArrayDimensions.Add( d );
analyzeArrayDimension: //label for looping over multidimensional arrays
if (pos < name.Length)
{
char nextChar = name[pos++];
switch (nextChar)
{
case ']':
continue; //array specifier terminated
case ',': //multidimensional array
d.Dimensions++;
goto analyzeArrayDimension;
default:
throw new Exception( @"Expecting ""]"" or "","" after ""["" for array specifier but encountered """ + nextChar + @"""." );
}
}
throw new Exception( "Expecting ] or , after [ for array type, but reached end of string." );
default:
sb.Append(c);
continue;
}
}
if (tn.Name == null)
tn.Name = sb.ToString();
return tn;
}
}
If I run the following:
Console.WriteLine( TypeName.Parse( "System.Collections.Generic.Dictionary<Vector<T>,int<long[]>[],bool>" ).FormatForDisplay() );
It produces the following output, a structured dump of the parsed TypeName:
Name: System.Collections.Generic.Dictionary
IsGeneric: True
ArraySpec:
GenericParameters: {
    Name: Vector
    IsGeneric: True
    ArraySpec:
    GenericParameters: {
        Name: T
        IsGeneric: False
        ArraySpec:
    }
},{
    Name: int
    IsGeneric: True
    ArraySpec: []
    GenericParameters: {
        Name: long
        IsGeneric: False
        ArraySpec: []
    }
},{
    Name: bool
    IsGeneric: False
    ArraySpec:
}

Splitting CamelCase

This is all asp.net c#.
I have an enum
public enum ControlSelectionType
{
NotApplicable = 1,
SingleSelectRadioButtons = 2,
SingleSelectDropDownList = 3,
MultiSelectCheckBox = 4,
MultiSelectListBox = 5
}
The numerical value of this is stored in my database. I display this value in a datagrid.
<asp:boundcolumn datafield="ControlSelectionTypeId" headertext="Control Type"></asp:boundcolumn>
The ID means nothing to a user so I have changed the boundcolumn to a template column with the following.
<asp:TemplateColumn>
<ItemTemplate>
<%# Enum.Parse(typeof(ControlSelectionType), DataBinder.Eval(Container.DataItem, "ControlSelectionTypeId").ToString()).ToString()%>
</ItemTemplate>
</asp:TemplateColumn>
This is a lot better... However, it would be great if there was a simple function I can put around the Enum to split it by Camel case so that the words wrap nicely in the datagrid.
Note: I am fully aware that there are better ways of doing all this. This screen is purely used internally and I just want a quick hack in place to display it a little better.
I used:
public static string SplitCamelCase(string input)
{
return System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", " $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim();
}
Taken from http://weblogs.asp.net/jgalloway/archive/2005/09/27/426087.aspx
vb.net:
Public Shared Function SplitCamelCase(ByVal input As String) As String
Return System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", " $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim()
End Function
Here is a dotnet Fiddle for online execution of the c# code.
Indeed, a regex replace is the way to go, as described in the other answer. However, this might also be of use to you if you want to go in a different direction:
using System.ComponentModel;
using System.Reflection;
...
public static string GetDescription(System.Enum value)
{
FieldInfo fi = value.GetType().GetField(value.ToString());
DescriptionAttribute[] attributes = (DescriptionAttribute[])fi.GetCustomAttributes(typeof(DescriptionAttribute), false);
if (attributes.Length > 0)
return attributes[0].Description;
else
return value.ToString();
}
This will allow you to define your enums as:
public enum ControlSelectionType
{
[Description("Not Applicable")]
NotApplicable = 1,
[Description("Single Select Radio Buttons")]
SingleSelectRadioButtons = 2,
[Description("Completely Different Display Text")]
SingleSelectDropDownList = 3,
}
Taken from
http://www.codeguru.com/forum/archive/index.php/t-412868.html
This regex (^[a-z]+|[A-Z]+(?![a-z])|[A-Z][a-z]+) can be used to extract all words from the camelCase or PascalCase name. It also works with abbreviations anywhere inside the name.
MyHTTPServer will contain exactly 3 matches: My, HTTP, Server
myNewXMLFile will contain 4 matches: my, New, XML, File
You could then join them into a single string using string.Join.
string name = "myNewUIControl";
string[] words = Regex.Matches(name, "(^[a-z]+|[A-Z]+(?![a-z])|[A-Z][a-z]+)")
.OfType<Match>()
.Select(m => m.Value)
.ToArray();
string result = string.Join(" ", words);
As @DanielB noted in the comments, that regex won't work for numbers (or with underscores), so here is an improved version that supports any identifier with words, acronyms, numbers, and underscores (a slightly modified version of @JoeJohnston's), see online demo (fiddle):
([A-Z]+(?![a-z])|[A-Z][a-z]+|[0-9]+|[a-z]+)
Extreme example: __snake_case12_camelCase_TLA1ABC → snake, case, 12, camel, Case, TLA, 1, ABC
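To try the improved pattern on the identifiers mentioned in this answer, it can be wrapped in a small helper. This is just a sketch; the class IdentifierWords and its methods Split and ToSpaced are names invented for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdentifierWords
{
    // Acronyms first, then capitalized words, digit runs, and lowercase runs.
    private static readonly Regex WordRegex =
        new Regex("([A-Z]+(?![a-z])|[A-Z][a-z]+|[0-9]+|[a-z]+)");

    // Returns the words of the identifier in order of appearance.
    public static string[] Split(string identifier)
    {
        return WordRegex.Matches(identifier)
                        .Cast<Match>()
                        .Select(m => m.Value)
                        .ToArray();
    }

    // Joins the words with single spaces for display.
    public static string ToSpaced(string identifier)
    {
        return string.Join(" ", Split(identifier));
    }
}
```

IdentifierWords.ToSpaced("MyHTTPServer") should give "My HTTP Server", and the extreme example above should come out as "snake case 12 camel Case TLA 1 ABC". Underscores match none of the alternatives, so they are simply dropped.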
Tillito's answer does not handle strings already containing spaces well, or Acronyms. This fixes it:
public static string SplitCamelCase(string input)
{
return Regex.Replace(input, "(?<=[a-z])([A-Z])", " $1", RegexOptions.Compiled);
}
If C# 3.0 is an option you can use the following one-liner to do the job:
Regex.Matches(YOUR_ENUM_VALUE_NAME, "[A-Z][a-z]+").OfType<Match>().Select(match => match.Value).Aggregate((acc, b) => acc + " " + b).TrimStart(' ');
Here's an extension method that handles numbers and multiple uppercase characters sanely, and also allows for upper-casing specific acronyms in the final string:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;
using System.Text.RegularExpressions;
using System.Web.Configuration;
namespace System
{
/// <summary>
/// Extension methods for the string data type
/// </summary>
public static class ConventionBasedFormattingExtensions
{
/// <summary>
/// Turn CamelCaseText into Camel Case Text.
/// </summary>
/// <param name="input"></param>
/// <returns></returns>
/// <remarks>Use AppSettings["SplitCamelCase_AllCapsWords"] to specify a comma-delimited list of words that should be ALL CAPS after split</remarks>
/// <example>
/// wordWordIDWord1WordWORDWord32Word2
/// Word Word ID Word 1 Word WORD Word 32 Word 2
///
/// wordWordIDWord1WordWORDWord32WordID2ID
/// Word Word ID Word 1 Word WORD Word 32 Word ID 2 ID
///
/// WordWordIDWord1WordWORDWord32Word2Aa
/// Word Word ID Word 1 Word WORD Word 32 Word 2 Aa
///
/// wordWordIDWord1WordWORDWord32Word2A
/// Word Word ID Word 1 Word WORD Word 32 Word 2 A
/// </example>
public static string SplitCamelCase(this string input)
{
if (input == null) return null;
if (string.IsNullOrWhiteSpace(input)) return "";
var separated = input;
separated = SplitCamelCaseRegex.Replace(separated, @" $1").Trim();
//Set ALL CAPS words
if (_SplitCamelCase_AllCapsWords.Any())
foreach (var word in _SplitCamelCase_AllCapsWords)
separated = SplitCamelCase_AllCapsWords_Regexes[word].Replace(separated, word.ToUpper());
//Capitalize first letter
var firstChar = separated.First(); //NullOrWhiteSpace handled earlier
if (char.IsLower(firstChar))
separated = char.ToUpper(firstChar) + separated.Substring(1);
return separated;
}
private static readonly Regex SplitCamelCaseRegex = new Regex(@"
(
(?<=[a-z])[A-Z0-9] (?# lower-to-other boundaries )
|
(?<=[0-9])[a-zA-Z] (?# number-to-other boundaries )
|
(?<=[A-Z])[0-9] (?# cap-to-number boundaries; handles a specific issue with the next condition )
|
(?<=[A-Z])[A-Z](?=[a-z]) (?# handles longer strings of caps like ID or CMS by splitting off the last capital )
)"
, RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace
);
private static readonly string[] _SplitCamelCase_AllCapsWords =
(WebConfigurationManager.AppSettings["SplitCamelCase_AllCapsWords"] ?? "")
.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(a => a.ToLowerInvariant().Trim())
.ToArray()
;
private static Dictionary<string, Regex> _SplitCamelCase_AllCapsWords_Regexes;
private static Dictionary<string, Regex> SplitCamelCase_AllCapsWords_Regexes
{
get
{
if (_SplitCamelCase_AllCapsWords_Regexes == null)
{
_SplitCamelCase_AllCapsWords_Regexes = new Dictionary<string,Regex>();
foreach(var word in _SplitCamelCase_AllCapsWords)
_SplitCamelCase_AllCapsWords_Regexes.Add(word, new Regex(@"\b" + word + @"\b", RegexOptions.Compiled | RegexOptions.IgnoreCase));
}
return _SplitCamelCase_AllCapsWords_Regexes;
}
}
}
}
You can use C# extension methods
public static string SpacesFromCamel(this string value)
{
if (value.Length > 0)
{
var result = new List<char>();
char[] array = value.ToCharArray();
foreach (var item in array)
{
if (char.IsUpper(item) && result.Count > 0)
{
result.Add(' ');
}
result.Add(item);
}
return new string(result.ToArray());
}
return value;
}
Then you can use it like
var result = "TestString".SpacesFromCamel();
Result will be
Test String
Using LINQ:
var chars = ControlSelectionType.NotApplicable.ToString().SelectMany((x, i) => i > 0 && char.IsUpper(x) ? new char[] { ' ', x } : new char[] { x });
Console.WriteLine(new string(chars.ToArray()));
I also have an enum whose names I had to separate. In my case this method solved the problem:
string SeparateCamelCase(string str)
{
for (int i = 1; i < str.Length; i++)
{
if (char.IsUpper(str[i]))
{
str = str.Insert(i, " ");
i++;
}
}
return str;
}
public enum ControlSelectionType
{
NotApplicable = 1,
SingleSelectRadioButtons = 2,
SingleSelectDropDownList = 3,
MultiSelectCheckBox = 4,
MultiSelectListBox = 5
}
public class NameValue
{
public string Name { get; set; }
public object Value { get; set; }
}
public static List<NameValue> EnumToList<T>(bool camelcase)
{
var array = (T[])(Enum.GetValues(typeof(T)).Cast<T>());
var array2 = Enum.GetNames(typeof(T)).ToArray<string>();
List<NameValue> lst = null;
for (int i = 0; i < array.Length; i++)
{
if (lst == null)
lst = new List<NameValue>();
string name = "";
if (camelcase)
{
name = array2[i].CamelCaseFriendly();
}
else
name = array2[i];
T value = array[i];
lst.Add(new NameValue { Name = name, Value = value });
}
return lst;
}
public static string CamelCaseFriendly(this string pascalCaseString)
{
Regex r = new Regex("(?<=[a-z])(?<x>[A-Z])|(?<=.)(?<x>[A-Z])(?=[a-z])");
return r.Replace(pascalCaseString, " ${x}");
}
//In your form
protected void Button1_Click1(object sender, EventArgs e)
{
DropDownList1.DataSource = GeneralClass.EnumToList<ControlSelectionType>(true);
DropDownList1.DataTextField = "Name";
DropDownList1.DataValueField = "Value";
DropDownList1.DataBind();
}
The solution from Eoin Campbell works well unless you have a web service.
In that case you need to do the following, because the Description attribute is not serializable.
[DataContract]
public enum ControlSelectionType
{
[EnumMember(Value = "Not Applicable")]
NotApplicable = 1,
[EnumMember(Value = "Single Select Radio Buttons")]
SingleSelectRadioButtons = 2,
[EnumMember(Value = "Completely Different Display Text")]
SingleSelectDropDownList = 3,
}
public static string GetDescriptionFromEnumValue(Enum value)
{
EnumMemberAttribute attribute = value.GetType()
.GetField(value.ToString())
.GetCustomAttributes(typeof(EnumMemberAttribute), false)
.SingleOrDefault() as EnumMemberAttribute;
return attribute == null ? value.ToString() : attribute.Value;
}
And if you don't fancy using regex - try this:
public static string SeperateByCamelCase(this string text, char splitChar = ' ') {
var output = new StringBuilder();
for (int i = 0; i < text.Length; i++)
{
var c = text[i];
//if not the first and the char is upper
if (i > 0 && char.IsUpper(c)) {
var wasLastLower = char.IsLower(text[i - 1]);
if (i + 1 < text.Length) //is there a next
{
var isNextUpper = char.IsUpper(text[i + 1]);
if (!isNextUpper) //if next is not upper (start of a word).
{
output.Append(splitChar);
}
else if (wasLastLower) //last was lower, I'm upper and my next is upper (start of an acronym): 'abcdHTTP' to 'abcd HTTP'
{
output.Append(splitChar);
}
}
else
{
//last letter - if it's upper and the previous letter was lower: 'abcdA' to 'abcd A'
if (wasLastLower)
{
output.Append(splitChar);
}
}
}
output.Append(c);
}
return output.ToString();
}
It passes these tests. It doesn't handle numbers, but I didn't need it to.
[TestMethod()]
public void ToCamelCaseTest()
{
var testData = new string[] { "AAACamel", "AAA", "SplitThisByCamel", "AnA", "doesnothing", "a", "A", "aasdasdAAA" };
var expectedData = new string[] { "AAA Camel", "AAA", "Split This By Camel", "An A", "doesnothing", "a", "A", "aasdasd AAA" };
for (int i = 0; i < testData.Length; i++)
{
var actual = testData[i].SeperateByCamelCase();
var expected = expectedData[i];
Assert.AreEqual(expected, actual);
}
}
#JustSayNoToRegex
Takes a C# identifier, with underscores and numbers, and converts it to a space-separated string.
public static class StringExtensions
{
public static string SplitOnCase(this string identifier)
{
if (identifier == null || identifier.Length == 0) return string.Empty;
var sb = new StringBuilder();
if (identifier.Length == 1) sb.Append(char.ToUpperInvariant(identifier[0]));
else if (identifier.Length == 2) sb.Append(char.ToUpperInvariant(identifier[0])).Append(identifier[1]);
else {
if (identifier[0] != '_') sb.Append(char.ToUpperInvariant(identifier[0]));
for (int i = 1; i < identifier.Length; i++) {
var current = identifier[i];
var previous = identifier[i - 1];
if (current == '_' && previous == '_') continue;
else if (current == '_') {
sb.Append(' ');
}
else if (char.IsLetter(current) && previous == '_') {
sb.Append(char.ToUpperInvariant(current));
}
else if (char.IsDigit(current) && char.IsLetter(previous)) {
sb.Append(' ').Append(current);
}
else if (char.IsLetter(current) && char.IsDigit(previous)) {
sb.Append(' ').Append(char.ToUpperInvariant(current));
}
else if (char.IsUpper(current) && char.IsLower(previous)
&& (i < identifier.Length - 1 && char.IsUpper(identifier[i + 1]) || i == identifier.Length - 1)) {
sb.Append(' ').Append(current);
}
else if (char.IsUpper(current) && i < identifier.Length - 1 && char.IsLower(identifier[i + 1])) {
sb.Append(' ').Append(current);
}
else {
sb.Append(current);
}
}
}
return sb.ToString();
}
}
Tests:
[TestFixture]
static class HelpersTests
{
[Test]
public static void Basic()
{
Assert.AreEqual("Foo", "foo".SplitOnCase());
Assert.AreEqual("Foo", "_foo".SplitOnCase());
Assert.AreEqual("Foo", "__foo".SplitOnCase());
Assert.AreEqual("Foo", "___foo".SplitOnCase());
Assert.AreEqual("Foo 2", "foo2".SplitOnCase());
Assert.AreEqual("Foo 23", "foo23".SplitOnCase());
Assert.AreEqual("Foo 23 A", "foo23A".SplitOnCase());
Assert.AreEqual("Foo 23 Ab", "foo23Ab".SplitOnCase());
Assert.AreEqual("Foo 23 Ab", "foo23_ab".SplitOnCase());
Assert.AreEqual("Foo 23 Ab", "foo23___ab".SplitOnCase());
Assert.AreEqual("Foo 23", "foo__23".SplitOnCase());
Assert.AreEqual("Foo Bar", "Foo_bar".SplitOnCase());
Assert.AreEqual("Foo Bar", "Foo____bar".SplitOnCase());
Assert.AreEqual("AAA", "AAA".SplitOnCase());
Assert.AreEqual("Foo A Aa", "fooAAa".SplitOnCase());
Assert.AreEqual("Foo AAA", "fooAAA".SplitOnCase());
Assert.AreEqual("Foo Bar", "FooBar".SplitOnCase());
Assert.AreEqual("Mn M", "MnM".SplitOnCase());
Assert.AreEqual("AS", "aS".SplitOnCase());
Assert.AreEqual("As", "as".SplitOnCase());
Assert.AreEqual("A", "a".SplitOnCase());
Assert.AreEqual("_", "_".SplitOnCase());
}
}
Simple version similar to some of the above, but with logic to not auto-insert the separator (which is by default, a space, but can be any char) if there's already one at the current position.
Uses a StringBuilder rather than 'mutating' strings.
public static string SeparateCamelCase(this string value, char separator = ' ') {
var sb = new StringBuilder();
var lastChar = separator;
foreach (var currentChar in value) {
if (char.IsUpper(currentChar) && lastChar != separator)
sb.Append(separator);
sb.Append(currentChar);
lastChar = currentChar;
}
return sb.ToString();
}
Example:
Input : 'ThisIsATest'
Output : 'This Is A Test'
Input : 'This IsATest'
Output : 'This Is A Test' (Note: Still only one space between 'This' and 'Is')
Input : 'ThisIsATest' (with separator '_')
Output : 'This_Is_A_Test'
Try this:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
Console
.WriteLine(
SeparateByCamelCase("TestString") == "Test String" // True
);
}
public static string SeparateByCamelCase(string str)
{
return String.Join(" ", SplitByCamelCase(str));
}
public static IEnumerable<string> SplitByCamelCase(string str)
{
if (str.Length == 0)
return new List<string>();
return
new List<string>
{
Head(str)
}
.Concat(
SplitByCamelCase(
Tail(str)
)
);
}
public static string Head(string str)
{
return new String(
str
.Take(1)
.Concat(
str
.Skip(1)
.TakeWhile(IsLower)
)
.ToArray()
);
}
public static string Tail(string str)
{
return new String(
str
.Skip(
Head(str).Length
)
.ToArray()
);
}
public static bool IsLower(char ch)
{
return ch >= 'a' && ch <= 'z';
}
}
See sample online

How to determine if a File Matches a File Mask?

I need to decide whether a file name matches a file mask. The file mask can contain * or ? characters. Is there a simple solution for this?
bool bFits = Fits("myfile.txt", "my*.txt");
private bool Fits(string sFileName, string sFileMask)
{
??? anything simple here ???
}
I appreciated finding Joel's answer; it saved me some time as well! I did, however, have to make a few changes to make the method do what most users would expect:
I removed the 'this' keyword preceding the first argument. It does nothing here (though it could be useful if the method is intended to be an extension method, in which case it needs to be public and contained within a static class and itself be a static method).
I made the regular expression case-independent to match standard Windows wildcard behavior (so e.g. "c*.*" and "C*.*" both return the same result).
I added starting and ending anchors to the regular expression, again to match standard Windows wildcard behavior (so e.g. "stuff.txt" would be matched by "stuff*" or "s*" or "s*.*" but not by just "s").
private bool FitsMask(string fileName, string fileMask)
{
Regex mask = new Regex(
'^' +
fileMask
.Replace(".", "[.]")
.Replace("*", ".*")
.Replace("?", ".")
+ '$',
RegexOptions.IgnoreCase);
return mask.IsMatch(fileName);
}
2009.11.04 Update: Match one of several masks
For even more flexibility, here is a plug-compatible method built on top of the original. This version lets you pass multiple masks (hence the plural on the second parameter name fileMasks) separated by lines, commas, vertical bars, or spaces. I wanted it so that I could let the user put as many choices as desired in a ListBox and then select all files matching any of them. Note that some controls (like a ListBox) use CR-LF for line breaks while others (e.g. RichTextBox) use just LF--that is why both "\r\n" and "\n" show up in the Split list.
private bool FitsOneOfMultipleMasks(string fileName, string fileMasks)
{
return fileMasks
.Split(new string[] {"\r\n", "\n", ",", "|", " "},
StringSplitOptions.RemoveEmptyEntries)
.Any(fileMask => FitsMask(fileName, fileMask));
}
2009.11.17 Update: Handle fileMask inputs more gracefully
The earlier version of FitsMask (which I have left in for comparison) does a fair job but since we are treating it as a regular expression it will throw an exception if it is not a valid regular expression when it comes in. The solution is that we actually want any regex metacharacters in the input fileMask to be considered literals, not metacharacters. But we still need to treat period, asterisk, and question mark specially. So this improved version of FitsMask safely moves these three characters out of the way, transforms all remaining metacharacters into literals, then puts the three interesting characters back, in their "regex'ed" form.
One other minor improvement is to allow for case-independence, per standard Windows behavior.
private bool FitsMask(string fileName, string fileMask)
{
string pattern =
'^' +
Regex.Escape(fileMask.Replace(".", "__DOT__")
.Replace("*", "__STAR__")
.Replace("?", "__QM__"))
.Replace("__DOT__", "[.]")
.Replace("__STAR__", ".*")
.Replace("__QM__", ".")
+ '$';
return new Regex(pattern, RegexOptions.IgnoreCase).IsMatch(fileName);
}
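To illustrate why the escaping step matters, here is a rough sketch that builds both pattern strings side by side and checks whether each one is a valid regular expression. The names MaskEscapingDemo, NaivePattern, EscapedPattern and IsValidRegex are invented for this example; the two conversions mirror the earlier and the improved FitsMask above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MaskEscapingDemo
{
    // Earlier conversion: regex metacharacters in the mask leak through.
    public static string NaivePattern(string fileMask)
    {
        return "^" + fileMask.Replace(".", "[.]")
                             .Replace("*", ".*")
                             .Replace("?", ".") + "$";
    }

    // Improved conversion: hide the three wildcard characters behind
    // placeholders, escape everything else, then restore the wildcards.
    public static string EscapedPattern(string fileMask)
    {
        return "^" + Regex.Escape(fileMask.Replace(".", "__DOT__")
                                          .Replace("*", "__STAR__")
                                          .Replace("?", "__QM__"))
                          .Replace("__DOT__", "[.]")
                          .Replace("__STAR__", ".*")
                          .Replace("__QM__", ".") + "$";
    }

    // True if the Regex constructor accepts the pattern.
    public static bool IsValidRegex(string pattern)
    {
        try { new Regex(pattern); return true; }
        catch (ArgumentException) { return false; }
    }
}
```

For a mask such as "(draft*.txt", the naive pattern has an unbalanced parenthesis and is rejected by the Regex constructor, while the escaped pattern is valid and matches a file literally named "(draft-2.txt".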
2010.09.30 Update: Somewhere along the way, passion ensued...
I have been remiss in not updating this earlier but these references will likely be of interest to readers who have made it to this point:
I embedded the FitsMask method as the heart of a WinForms user control aptly called a FileMask--see the API here.
I then wrote an article featuring the FileMask control published on Simple-Talk.com, entitled Using LINQ Lambda Expressions to Design Customizable Generic Components. (While the method itself does not use LINQ, the FileMask user control does, hence the title of the article.)
Try this:
private bool FitsMask(string sFileName, string sFileMask)
{
Regex mask = new Regex(sFileMask.Replace(".", "[.]").Replace("*", ".*").Replace("?", "."));
return mask.IsMatch(sFileName);
}
Many people don't know this, but .NET includes an internal class called "PatternMatcher" (in the "System.IO" namespace).
This static class contains only one method:
public static bool StrictMatchPattern(string expression, string name)
This method is used by .NET whenever it needs to compare file names against a wildcard (FileSystemWatcher, GetFiles(), etc.).
Using Reflector, I exposed the code here.
I didn't really go through it to understand how it works, but it works great.
So this is the code for anyone who doesn't want to go the less efficient Regex route:
public static class PatternMatcher
{
// Fields
private const char ANSI_DOS_QM = '<';
private const char ANSI_DOS_STAR = '>';
private const char DOS_DOT = '"';
private const int MATCHES_ARRAY_SIZE = 16;
// Methods
public static bool StrictMatchPattern(string expression, string name)
{
int num9;
char ch = '\0';
char ch2 = '\0';
int[] sourceArray = new int[16];
int[] numArray2 = new int[16];
bool flag = false;
if (((name == null) || (name.Length == 0)) || ((expression == null) || (expression.Length == 0)))
{
return false;
}
// Lower-case only after the null check above; doing it first would throw on null input.
expression = expression.ToLowerInvariant();
name = name.ToLowerInvariant();
if (expression.Equals("*") || expression.Equals("*.*"))
{
return true;
}
if ((expression[0] == '*') && (expression.IndexOf('*', 1) == -1))
{
int length = expression.Length - 1;
if ((name.Length >= length) && (string.Compare(expression, 1, name, name.Length - length, length, StringComparison.OrdinalIgnoreCase) == 0))
{
return true;
}
}
sourceArray[0] = 0;
int num7 = 1;
int num = 0;
int num8 = expression.Length * 2;
while (!flag)
{
int num3;
if (num < name.Length)
{
ch = name[num];
num3 = 1;
num++;
}
else
{
flag = true;
if (sourceArray[num7 - 1] == num8)
{
break;
}
}
int index = 0;
int num5 = 0;
int num6 = 0;
while (index < num7)
{
int num2 = (sourceArray[index++] + 1) / 2;
num3 = 0;
Label_00F2:
if (num2 != expression.Length)
{
num2 += num3;
num9 = num2 * 2;
if (num2 == expression.Length)
{
numArray2[num5++] = num8;
}
else
{
ch2 = expression[num2];
num3 = 1;
if (num5 >= 14)
{
int num11 = numArray2.Length * 2;
int[] destinationArray = new int[num11];
Array.Copy(numArray2, destinationArray, numArray2.Length);
numArray2 = destinationArray;
destinationArray = new int[num11];
Array.Copy(sourceArray, destinationArray, sourceArray.Length);
sourceArray = destinationArray;
}
if (ch2 == '*')
{
numArray2[num5++] = num9;
numArray2[num5++] = num9 + 1;
goto Label_00F2;
}
if (ch2 == '>')
{
bool flag2 = false;
if (!flag && (ch == '.'))
{
int num13 = name.Length;
for (int i = num; i < num13; i++)
{
char ch3 = name[i];
num3 = 1;
if (ch3 == '.')
{
flag2 = true;
break;
}
}
}
if ((flag || (ch != '.')) || flag2)
{
numArray2[num5++] = num9;
numArray2[num5++] = num9 + 1;
}
else
{
numArray2[num5++] = num9 + 1;
}
goto Label_00F2;
}
num9 += num3 * 2;
switch (ch2)
{
case '<':
if (flag || (ch == '.'))
{
goto Label_00F2;
}
numArray2[num5++] = num9;
goto Label_028D;
case '"':
if (flag)
{
goto Label_00F2;
}
if (ch == '.')
{
numArray2[num5++] = num9;
goto Label_028D;
}
break;
}
if (!flag)
{
if (ch2 == '?')
{
numArray2[num5++] = num9;
}
else if (ch2 == ch)
{
numArray2[num5++] = num9;
}
}
}
}
Label_028D:
if ((index < num7) && (num6 < num5))
{
while (num6 < num5)
{
int num14 = sourceArray.Length;
while ((index < num14) && (sourceArray[index] < numArray2[num6]))
{
index++;
}
num6++;
}
}
}
if (num5 == 0)
{
return false;
}
int[] numArray4 = sourceArray;
sourceArray = numArray2;
numArray2 = numArray4;
num7 = num5;
}
num9 = sourceArray[num7 - 1];
return (num9 == num8);
}
}
None of these answers quite seem to do the trick, and msorens's is needlessly complex. This one should work just fine:
public static Boolean MatchesMask(string fileName, string fileMask)
{
String convertedMask = "^" + Regex.Escape(fileMask).Replace("\\*", ".*").Replace("\\?", ".") + "$";
Regex regexMask = new Regex(convertedMask, RegexOptions.IgnoreCase);
return regexMask.IsMatch(fileName);
}
This makes sure possible regex chars in the mask are escaped, replaces the \* and \?, and surrounds it all by ^ and $ to mark the boundaries.
Of course, in most situations it's far more useful to turn this into a FileMaskToRegex helper function that returns the Regex object, so you build it only once and can then loop over your list of file names, checking each one against it.
public static Regex FileMaskToRegex(string fileMask)
{
String convertedMask = "^" + Regex.Escape(fileMask).Replace("\\*", ".*").Replace("\\?", ".") + "$";
return new Regex(convertedMask, RegexOptions.IgnoreCase);
}
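As a rough sketch of that build-once, match-many usage: the Regex is created a single time and then applied to every name in a list. FileMaskFilter and Filter are hypothetical names for this example; FileMaskToRegex is copied from above so the block stands on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FileMaskFilter
{
    // Same conversion as above: escape the mask, then turn \* and \? back into wildcards.
    public static Regex FileMaskToRegex(string fileMask)
    {
        string convertedMask = "^" + Regex.Escape(fileMask).Replace("\\*", ".*").Replace("\\?", ".") + "$";
        return new Regex(convertedMask, RegexOptions.IgnoreCase);
    }

    // Builds the Regex once and reuses it for every file name.
    public static List<string> Filter(IEnumerable<string> fileNames, string fileMask)
    {
        Regex regex = FileMaskToRegex(fileMask);
        return fileNames.Where(name => regex.IsMatch(name)).ToList();
    }
}
```

For example, filtering "my.txt", "myfile.txt", "other.txt", "MYNOTES.TXT" and "my.txt.bak" with the mask "my*.txt" should keep the first, second and fourth names.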
Use the WildcardPattern class from System.Management.Automation, available as a NuGet package or in the Windows PowerShell SDK.
WildcardPattern pattern = new WildcardPattern("my*.txt");
bool fits = pattern.IsMatch("myfile.txt");
From Windows 7 onward, using P/Invoke (without the 260-character limit):
// UNICODE_STRING for Rtl... method
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct UNICODE_STRING
{
public ushort Length;
public ushort MaximumLength;
[MarshalAs(UnmanagedType.LPWStr)]
string Buffer;
public UNICODE_STRING(string buffer)
{
if (buffer == null)
Length = MaximumLength = 0;
else
Length = MaximumLength = unchecked((ushort)(buffer.Length * 2));
Buffer = buffer;
}
}
// RtlIsNameInExpression method from NtDll.dll system library
public static class NtDll
{
[DllImport("NtDll.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
[return: MarshalAs(UnmanagedType.U1)]
public extern static bool RtlIsNameInExpression(
ref UNICODE_STRING Expression,
ref UNICODE_STRING Name,
[MarshalAs(UnmanagedType.U1)]
bool IgnoreCase,
IntPtr Zero
);
}
public bool MatchMask(string mask, string fileName)
{
// Expression must be uppercase for IgnoreCase == true (see MSDN for RtlIsNameInExpression)
UNICODE_STRING expr = new UNICODE_STRING(mask.ToUpper());
UNICODE_STRING name = new UNICODE_STRING(fileName);
// Returns true when fileName matches the mask.
return NtDll.RtlIsNameInExpression(ref expr, ref name, true, IntPtr.Zero);
}
Fastest version of the previously proposed function:
public static bool FitsMasks(string filePath, params string[] fileMasks)
{
return FileMasksToRegex(fileMasks).IsMatch(filePath);
}
public static Regex FileMasksToRegex(params string[] fileMasks)
{
// Arrays compare by reference, so the cache is keyed on the joined masks instead.
// '\0' cannot appear in a file name, which makes it a safe delimiter.
// Note: unlike an unordered string[] comparer, this key depends on the order of the masks.
string key = string.Join("\0", fileMasks);
Regex regex;
if (!_maskRegexes.TryGetValue(key, out regex))
{
// Group the alternatives so that ^ and $ apply to every mask, not just the first and last.
StringBuilder sb = new StringBuilder("^(?:");
bool first = true;
foreach (string fileMask in fileMasks)
{
if (first) first = false; else sb.Append("|");
sb.Append('(');
foreach (char c in fileMask)
{
switch (c)
{
case '*': sb.Append(".*"); break;
case '?': sb.Append("."); break;
default:
sb.Append(Regex.Escape(c.ToString()));
break;
}
}
sb.Append(')');
}
sb.Append(")$");
regex = new Regex(sb.ToString(), RegexOptions.IgnoreCase);
_maskRegexes[key] = regex;
}
return regex;
}
static readonly Dictionary<string, Regex> _maskRegexes = new Dictionary<string, Regex>();
Notes:
Re-using Regex objects.
Using StringBuilder to optimize Regex creation (multiple .Replace() calls are slow).
Multiple masks, combined with OR.
Another version returning the Regex.
If PowerShell is available, it has direct support for wildcard type matching (as well as Regex).
WildcardPattern pat = new WildcardPattern("a*.b*");
if (pat.IsMatch(filename)) { ... }
I didn't want to copy the source code, so like @frankhommers I came up with a reflection-based solution.
Note the code comment, which I found in the reference source, about the use of wildcards in the name argument.
public static class PatternMatcher
{
static MethodInfo strictMatchPatternMethod;
static PatternMatcher()
{
var typeName = "System.IO.PatternMatcher";
var methodName = "StrictMatchPattern";
var assembly = typeof(Uri).Assembly;
var type = assembly.GetType(typeName, true);
strictMatchPatternMethod = type.GetMethod(methodName, BindingFlags.Static | BindingFlags.Public) ?? throw new MissingMethodException($"{typeName}.{methodName} not found");
}
/// <summary>
/// Tells whether a given name matches the expression given with a strict (i.e. UNIX like) semantics.
/// </summary>
/// <param name="expression">Supplies the input expression to check against</param>
/// <param name="name">Supplies the input name to check for.</param>
/// <returns></returns>
public static bool StrictMatchPattern(string expression, string name)
{
// https://referencesource.microsoft.com/#system/services/io/system/io/PatternMatcher.cs
// If this class is ever exposed for generic use,
// we need to make sure that name doesn't contain wildcards. Currently
// the only component that calls this method is FileSystemWatcher and
// it will never pass a name that contains a wildcard.
if (name.Contains('*')) throw new FormatException("Wildcard not allowed");
return (bool)strictMatchPatternMethod.Invoke(null, new object[] { expression, name });
}
}
For .NET Core, this is the way Microsoft does it:
private bool MatchPattern(ReadOnlySpan<char> relativePath)
{
ReadOnlySpan<char> name = IO.Path.GetFileName(relativePath);
if (name.Length == 0)
return false;
if (Filters.Count == 0)
return true;
foreach (string filter in Filters)
{
if (FileSystemName.MatchesSimpleExpression(filter, name, ignoreCase: !PathInternal.IsCaseSensitive))
return true;
}
return false;
}
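Filters and PathInternal in the snippet above belong to FileSystemWatcher's own implementation, but FileSystemName.MatchesSimpleExpression itself is a public API in System.IO.Enumeration (.NET Core 2.1 and later), so it can be called directly. A minimal sketch, where SimpleMaskMatcher and Fits are names invented here and case-insensitive matching is an assumption that suits Windows-style masks:

```csharp
using System.IO.Enumeration;

public static class SimpleMaskMatcher
{
    // Thin wrapper over the framework's own matcher for * and ? masks.
    // ignoreCase is hard-coded to true here; pass false for case-sensitive file systems.
    public static bool Fits(string fileName, string fileMask)
    {
        return FileSystemName.MatchesSimpleExpression(fileMask, fileName, ignoreCase: true);
    }
}
```

SimpleMaskMatcher.Fits("myfile.txt", "my*.txt") should return true, and the same call with "myfile.doc" should return false.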
The way Microsoft itself seems to do it for .NET 4.6 is documented on GitHub:
private bool MatchPattern(string relativePath) {
string name = System.IO.Path.GetFileName(relativePath);
if (name != null)
return PatternMatcher.StrictMatchPattern(filter.ToUpper(CultureInfo.InvariantCulture), name.ToUpper(CultureInfo.InvariantCulture));
else
return false;
}
My version, which supports the ** wildcard:
static Regex FileMask2Regex(string mask)
{
var sb = new StringBuilder(mask);
// hide wildcards
sb.Replace("**", "affefa0d52e84c2db78f5510117471aa-StarStar");
sb.Replace("*", "affefa0d52e84c2db78f5510117471aa-Star");
sb.Replace("?", "affefa0d52e84c2db78f5510117471aa-Question");
sb.Replace("/", "affefa0d52e84c2db78f5510117471aa-Slash");
sb.Replace("\\", "affefa0d52e84c2db78f5510117471aa-Slash");
sb = new StringBuilder(Regex.Escape(sb.ToString()));
// unhide wildcards
sb.Replace("affefa0d52e84c2db78f5510117471aa-StarStar", @".*");
sb.Replace("affefa0d52e84c2db78f5510117471aa-Star", @"[^/\\]*");
sb.Replace("affefa0d52e84c2db78f5510117471aa-Question", @"[^/\\]");
sb.Replace("affefa0d52e84c2db78f5510117471aa-Slash", @"[/\\]");
sb.Append("$");
// allowed to have prefix
sb.Insert(0, @"^(?:.*?[/\\])?");
return new Regex(sb.ToString(), RegexOptions.IgnoreCase);
}
How about using reflection to get access to the function in the .NET framework?
Like this:
public class PatternMatcher
{
public delegate bool StrictMatchPatternDelegate(string expression, string name);
public StrictMatchPatternDelegate StrictMatchPattern;
public PatternMatcher()
{
Type patternMatcherType = typeof(FileSystemWatcher).Assembly.GetType("System.IO.PatternMatcher");
MethodInfo patternMatchMethod = patternMatcherType.GetMethod("StrictMatchPattern", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.Public);
StrictMatchPattern = (expression, name) => (bool)patternMatchMethod.Invoke(null, new object[] { expression, name });
}
}
void Main()
{
PatternMatcher patternMatcher = new PatternMatcher();
Console.WriteLine(patternMatcher.StrictMatchPattern("*.txt", "test.txt")); //displays true
Console.WriteLine(patternMatcher.StrictMatchPattern("*.doc", "test.txt")); //displays false
}
