How can I use XML Schema regex in C#?

I have a two-part question:
1. How can I get the regex expression of an XSD facet and then use it to determine if a string matches the restriction? In my mind, this is "How do I convert XML Schema regex to .NET Regex", but I'm open to suggestions if you have another way for me to do it other than converting the expression.
2. If the test (#1) fails, how can I use the XSD pattern regex to automatically create a string which does satisfy the constraint?

XmlSchemaDatatype.ParseValue is your answer. Assuming the associated simple type has more facets and you only want to validate against the pattern one(s), you simply have to find the pattern facet(s) in XmlSchemaSimpleTypeRestriction.Facets, then create a new XmlSchemaSimpleType whose Content is a new XmlSchemaSimpleTypeRestriction holding new pattern facet(s) built from the values you collected. Then, using this newly created simple type, invoke XmlSchemaDatatype.ParseValue.
I would advise against your suggestion in the comment, since the regex "dialects" are different.
I am not aware of such a thing, available for free or otherwise. I am sure it can be done but I never found something that would actually work, when I needed it myself. If you do find one, please share.
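The ParseValue approach from the first paragraph of this answer could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name PatternFacetChecker, the method name Matches, and the type name "PatternOnly" are hypothetical, and it assumes the base type is xs:string and that the pattern values have already been read from the original restriction's Facets.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

public static class PatternFacetChecker
{
    // Builds a throwaway simple type that restricts xs:string (an assumption)
    // by the given XSD pattern values, then validates the candidate against it.
    public static bool Matches(string candidate, params string[] xsdPatterns)
    {
        var restriction = new XmlSchemaSimpleTypeRestriction
        {
            BaseTypeName = new XmlQualifiedName("string", XmlSchema.Namespace)
        };
        foreach (string pattern in xsdPatterns)
        {
            restriction.Facets.Add(new XmlSchemaPatternFacet { Value = pattern });
        }

        // "PatternOnly" is a hypothetical name; a global type must be named.
        var type = new XmlSchemaSimpleType { Name = "PatternOnly", Content = restriction };
        var schema = new XmlSchema();
        schema.Items.Add(type);

        // Compiling the schema set is what populates the Datatype property.
        var set = new XmlSchemaSet();
        set.Add(schema);
        set.Compile();

        var compiled = (XmlSchemaSimpleType)set.GlobalTypes[new XmlQualifiedName("PatternOnly")];
        try
        {
            compiled.Datatype.ParseValue(candidate, new NameTable(), null);
            return true;
        }
        catch (XmlSchemaException)
        {
            // ParseValue reports a facet violation by throwing.
            return false;
        }
    }
}
```

Because the schema processor evaluates the pattern itself, no translation between regex dialects is needed with this route.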

It is not too difficult to convert an XML Schema regex to a .NET regex.
Basically, you need to replace a few escapes such as \c and \D with their .NET alternatives such as \p{_xmlC} and \P{_xmlD}.
You also need to wrap the expression in ^ and $ markers.
.NET implements this in the Preprocess method in https://github.com/Microsoft/referencesource/blob/master/System.Xml/System/Xml/Schema/FacetChecker.cs
If you decide to copy-paste the implementation, be careful, though.
You need to replace the loop
for (int position = 0; position < length - 2; position ++)
with
for (int position = 0; position < length - 1; position ++)
because for optimization reasons Preprocess assumes the input expression is enclosed in parentheses.
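For illustration, the conversion described in this answer might be written from scratch along these lines. This is a sketch under assumptions, not the framework's Preprocess code: the names XsdRegexConverter and ToDotNet are hypothetical, only the escapes \c, \d, \i, \w and their uppercase negations are translated, and the expression is wrapped in ^(?: ... )$ rather than bare ^ and $ so that a top-level alternation stays anchored on both sides.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class XsdRegexConverter
{
    // Translates the XSD-specific escapes to .NET's \p{_xmlX} categories
    // and anchors the whole expression. Other dialect differences are
    // not handled here.
    public static string ToDotNet(string xsdPattern)
    {
        var sb = new StringBuilder("^(?:");
        for (int i = 0; i < xsdPattern.Length; i++)
        {
            char ch = xsdPattern[i];
            if (ch == '\\' && i + 1 < xsdPattern.Length)
            {
                char next = xsdPattern[++i]; // consume the escaped character
                switch (next)
                {
                    case 'c': case 'd': case 'i': case 'w':
                        sb.Append(@"\p{_xml").Append(char.ToUpperInvariant(next)).Append('}');
                        break;
                    case 'C': case 'D': case 'I': case 'W':
                        sb.Append(@"\P{_xml").Append(next).Append('}');
                        break;
                    default:
                        // Any other escape (including \\) is copied unchanged.
                        sb.Append('\\').Append(next);
                        break;
                }
            }
            else
            {
                sb.Append(ch);
            }
        }
        return sb.Append(")$").ToString();
    }
}
```

A call such as new Regex(XsdRegexConverter.ToDotNet(facetValue)).IsMatch(input) would then stand in for the schema check, where facetValue is the Value of the pattern facet.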

Related

Is there a syntactically legal expression that has 2 consecutive identifiers separated only by white space in C#?

That might not be the best way to phrase it, but I'm considering writing a tool that converts identifiers separated by spaces in my code to camel case. A quick example:
var zoo animals = GetZooAnimals(); // i can't help but type this
var zooAnimals = GetZooAnimals(); // i want it to rewrite it like this
I was wondering if writing a tool like this would run into any ambiguities assuming it ignores all keywords. The only reason I can think of is if there is a syntactically valid expression with 2 identifiers only separated by white space.
Looking through the grammar I could not immediately find a place that allows it, but perhaps someone else would know better.
On a side note, I realize this is not a practical solution to a real problem a lot of people have, but just something I do all the time and wanted to take a stab at fixing with tools instead of forcing myself to always write camel case.
It is hard to tell whether a space-separated sequence of identifiers represents a single variable or not without doing full semantic analysis. For example
Myclass myVariable;
is a pair of space-separated identifiers which are perfectly valid. This would cause an ambiguity if you want to camel-case both type names and variable names.
If one enters:
csharp> var i j = 3;
(1,7): error CS1525: Unexpected symbol `j', expecting `,', `;', or `='
in the csharp interactive shell, one gets an error generated by the parser (an (LA)LR parser keeps track of which tokens it can expect next). Such a parser works left to right, so it doesn't know which characters will come next; it only knows that the next token must be one of those listed above.
So there is probably no way to use spaces inside a name, at least not when declaring a variable.
Furthermore, based on this context-free grammar for C#, there doesn't seem to be a case where one can place two identifiers next to each other. It is, for instance, possible for a primary expression to be an identifier, but there is no situation where a primary expression is placed next to an identifier (or with an optional part in between).
As @dasblinkenlight says, you can indeed see the rule "local-variable-declaration":
type variable-declarator
with type that can be evaluated to an identifier and variable-declarator starting with an identifier. You can however know that the type is the first identifier (or the var keyword). Some kind of rewrite rule is thus:
(\w+)(\s+\w+)+ -> \1 concat(\2)
where you need to combine (concat) the identifiers matched by the second group, as in the case of an assignment.
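The rewrite rule above could be prototyped like this. It is only a sketch: the names IdentifierJoiner and Rewrite are hypothetical, it handles just the "declaration with assignment" shape from the question, and it treats the first word on the line as the type (or var) without checking for other keywords.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdentifierJoiner
{
    // Group 1: the type or "var". Group 2: two or more whitespace-separated
    // words that should become one identifier. Group 3: the "=" sign.
    private static readonly Regex Declaration =
        new Regex(@"^(\s*\w+)\s+(\w+(?:\s+\w+)+)(\s*=)");

    public static string Rewrite(string line)
    {
        return Declaration.Replace(line, m =>
        {
            string[] words = Regex.Split(m.Groups[2].Value, @"\s+");
            // Keep the first word as typed and capitalize the rest.
            string joined = words[0] + string.Concat(
                words.Skip(1).Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
            return m.Groups[1].Value + " " + joined + m.Groups[3].Value;
        });
    }
}
```

A line that already has a single identifier before the = sign does not match the pattern and is returned unchanged.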

.Net regex not replacing correctly

Background
I am trying to do some regex matching and replacing, but for some reason the replacement isn't correct in .NET.
Regex pattern - "^.*?/rebate/?$"
Input string - "/my-tax/rebate"
Replacement string - "/new-path/rebate"
Basically, if the word 'rebate' is seen in a string, the input string needs to be replaced entirely by the replacement string.
Problem
If I create a regex with the pattern and execute
patternMatch.Pattern.Replace("/my-tax/rebate", "/new-path/rebate")
I get /my-tax/new-path/rebate, which isn't correct.
But, if I execute -
new Regex(@"^.*?/rebate/?$").Replace("/my-tax/rebate", "/new-path/rebate"),
the result is correct - /new-path/rebate
Why is that?
patternMatch is an object with two properties - one Pattern (which is the Regex Pattern) and another one is TargetPath (which is the replacement string). In this example, I am only using the pattern property.
(Two screenshots followed here: the value of patternMatch.Pattern in the debugger, and the results at run time.)
You are simply using the function incorrectly. I'm not sure how you are getting /my-tax/new-path/rebate, since it gives me an error on ideone.com (maybe you have a regex named Pattern?).
Anyway, you shouldn't have any issues with using the function like this:
patternMatch.Replace("/my-tax/rebate", "/new-path/rebate");
ideone demo
A number of points in your question are incorrect. The regex is replacing correctly.
Per @XiaoguangQiao's comment, what is patternMatch.Pattern.Replace? Your example...
var patternMatch = new Regex("^.*?/rebate/?$");
patternMatch.Pattern.Replace("/my-tax/rebate", "/new-path/rebate");
...errors with the message...
'System.Text.RegularExpressions.Regex' does not contain a definition for 'Pattern' and no extension method 'Pattern' accepting a first argument of type 'System.Text.RegularExpressions.Regex' could be found
...when I throw it into a quick LINQPad 4 query (set to C# Statement(s)).
pattern is a private string field of System.Text.RegularExpressions.Regex; and patternMatch.Replace("/my-tax/rebate", "/new-path/rebate") - which I expect is what you meant - yields the correct result ("/new-path/rebate") rather than the incorrect result you said you get ("/my-tax/new-path/rebate").
Otherwise your pattern(s) (i.e. with and without the extra / that @rene pointed out) is fine for the input ("/my-tax/rebate") and replacement ("/new-path/rebate") you initially outline - insofar as they match and yield the result you want. You can check this outside your code in quick fiddles with the extra / and without the extra /.
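As a self-contained check of the point above, calling Replace on the Regex instance itself might look like this. The names RebateRewrite and Apply are hypothetical; only the pattern, input, and replacement come from the question.

```csharp
using System.Text.RegularExpressions;

public static class RebateRewrite
{
    // The pattern from the question; the anchors make it match the whole input.
    private static readonly Regex Pattern = new Regex(@"^.*?/rebate/?$");

    // Regex.Replace is called on the Regex instance, not on a Pattern property.
    public static string Apply(string path, string targetPath)
    {
        return Pattern.Replace(path, targetPath);
    }
}
```

Since the pattern consumes the entire input, the whole string is replaced by the target path; an input that does not end in /rebate is returned as-is.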
Use the String.Replace method.
str.Replace("rebate", "new-path/rebate")
http://msdn.microsoft.com/en-us/library/fk49wtc1%28v=vs.110%29.aspx

How to validate a textbox that it must contain the values starting from fixed combination of words?

I am trying to validate a textbox so that its value must start with the fixed word "temp". The user must enter temp before entering anything else in the textbox.
Please help.
Regards.
Have you tried regular expressions? Regular expressions are a way to see if a string contains a specified sequence of characters, and are much more robust than a simple 'search'! They're a powerful tool, and I would suggest searching for a tutorial.
I noticed you said this is client side, so here's a page describing regexp in javascript. I haven't used regular expressions in javascript, but they can be very useful. Of course, regular expressions are also available in C#.
Basically you'll want to use "^temp" as your pattern. The '^' will make sure that the matching starts at the beginning of the string you're testing, and check to see if 'temp' is there. If the pattern doesn't match, the string doesn't have 'temp' at the start of it.
var stringToTest = "TemP this should match"
var pattern = /^temp/i
var result = pattern.test(stringToTest)
Above is a simple example that I pulled from W3Schools. As you can see, it uses '^temp' as its pattern, with the modifier 'i' to make the check case-insensitive, so that it doesn't matter how the user types in 'temp' (it could be Temp, temP, teMp, teMP, tEmp, etc.).
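Since the answer mentions that regular expressions are also available in C#, the same check might look like this on the server side. This is a sketch; the names PrefixValidator and StartsWithTemp are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class PrefixValidator
{
    // True when the input begins with "temp", ignoring case.
    // A null input is treated as empty and therefore fails the check.
    public static bool StartsWithTemp(string input)
    {
        return Regex.IsMatch(input ?? string.Empty, "^temp", RegexOptions.IgnoreCase);
    }
}
```

A call such as PrefixValidator.StartsWithTemp(textBox.Text) could then decide whether to accept the value, where textBox stands for whatever control holds the input.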

Regular expression for filenames that doesn't exclude whitespaces

I have been using this regular expression to extract file names out of file path strings:
Regex r = new Regex(@"\w+[.]\w+$+");
This works, as long as there is no space in the file name. For example:
r.Match(@"c:\somestuff\myfile.doc").Value == "myfile.doc"
r.Match(@"c:\somestuff\my file.doc").Value == "file.doc"
I need my regular expression to give me "my file.doc", and not just "file.doc"
I tried messing around with the expression myself. In particular I tried adding \s+ after learning that that is for matching whitespaces. I didn't get the results I hoped for.
I did devise a solution just to get the job done: I started at the end of the string, went backwards until a backslash was reached. This gave me the file name in reverse order (i.e. cod.elifym) into an array of chars, then I used Array.Reverse() to turn it around. However I'd like to learn how to achieve this by simply modifying my original regular expression.
Does it have to be a regular expression? Use System.IO.Path.GetFileName() instead.
Regex r = new Regex(@"[\w ]+\.\w+$");
A working regex might simply look like:
[^\\]+$
Consider using:
System.IO.Path.GetFileName(path)
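The two regex suggestions above can be compared side by side. This is a sketch with hypothetical names (FileNameDemo, LastSegment, NameWithSpaces). Path.GetFileName is left out of the code because its result depends on which characters the current platform treats as directory separators.

```csharp
using System.Text.RegularExpressions;

public static class FileNameDemo
{
    // Everything after the last backslash, whatever characters it contains.
    public static string LastSegment(string path)
    {
        return Regex.Match(path, @"[^\\]+$").Value;
    }

    // Word characters and spaces, a dot, then an extension.
    public static string NameWithSpaces(string path)
    {
        return Regex.Match(path, @"[\w ]+\.\w+$").Value;
    }
}
```

The first pattern is the more permissive of the two: it also accepts names with hyphens, several dots, or no extension, while the second only allows word characters and spaces before a single extension.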

Quick & Dirty way to update "IDs" in a string formatted as XML (C#)

For a one-shot operation, I need to parse the contents of an XML string and change the numbers of the "ID" field. However, I cannot risk changing anything else in the string, e.g. whitespace, line feeds, etc. MUST remain as they are!
In my experience, XmlReader tends to mess up whitespace and may even reformat the XML, so I don't want to use it (but feel free to convince me otherwise). This also screams for RegEx but ... I'm not good at RegEx, particularly not with the .NET implementation.
Here's a short part of the string; the number in the ID field needs to be updated in some cases. There can be many such VAR entries in the string. So I need to convert each ID to Int32, compare and modify it, then put it back into the string.
<VAR NAME="sf_name" ID="1001210">
I am looking for the simplest (in terms of coding time) and safest way to do this.
The regex pattern you are looking for is:
ID="(\d+)"
Match group 1 would contain the number. Use a MatchEvaluator Delegate to replace matches with dynamically calculated replacements.
Regex r = new Regex("ID=\"(\\d+)\"");
string outputXml = r.Replace(inputXml, new MatchEvaluator(ReplaceFunction));
where ReplaceFunction is something like this:
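public string ReplaceFunction(Match m)
{
// m.Groups[1] holds the digits of the ID
int id = int.Parse(m.Groups[1].Value);
// compare and modify id here as needed
// the return value replaces the whole match, so rebuild the attribute
return "ID=\"" + id + "\"";
}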
If you need I can expand the Regex to match more specifically. Currently all ID values (that contain numbers only) are replaced. You can also build that bit of "extra intelligence" into the match evaluator function and make it return the match unchanged if you don't want to change it.
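Putting the pieces of this answer together, a version that leaves unaffected IDs byte-for-byte unchanged could look like the sketch below. The names IdRewriter and Rewrite and the remap delegate are hypothetical, and the \b in front of ID is an added assumption so that attributes such as GUID="..." are not touched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdRewriter
{
    // \b keeps attribute names that merely end in "ID" from matching.
    private static readonly Regex IdAttribute = new Regex(@"\bID=""(\d+)""");

    // remap receives each ID and returns the new value (or the same value).
    public static string Rewrite(string xml, Func<int, int> remap)
    {
        return IdAttribute.Replace(xml, m =>
        {
            int oldId = int.Parse(m.Groups[1].Value);
            int newId = remap(oldId);
            // Return the original text when nothing changes, so formatting
            // such as leading zeros is preserved.
            return newId == oldId ? m.Value : "ID=\"" + newId + "\"";
        });
    }
}
```

Only the matched attribute text is ever rewritten, so whitespace and line feeds elsewhere in the string stay exactly as they were. Note that int.Parse throws if an ID does not fit into Int32.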
Take a look at the PreserveWhitespace property of the XmlDocument class.
