I am making a programming language in native C++, and I am writing a basic editor for it in C# .NET WinForms. I am using a SyntaxRTB, and I would like a regex to catch the following error:
declare is not followed by string / int / float / bool / array / char
How would I do that?
(The syntax to declare a variable is declare variable_type variable_name; the whitespace would have to be accounted for too.)
I have declare(?!string), but am still confused.
If you want a regex, you need a zero-width negative lookahead.
But if you're constructing a language, this isn't the way to go. Full-blown language parsers are a different entity.
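For the regex route, a minimal sketch of the negative lookahead could look like the following. The class and method names (DeclareChecker, HasBadDeclare) are made up for illustration, and the type list is taken from the question; this flags a single line and is not a substitute for a real parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DeclareChecker
{
    // Matches the whole word "declare" when it is NOT followed by
    // whitespace and one of the known type names as a whole word.
    private static readonly Regex BadDeclare = new Regex(
        @"\bdeclare\b(?!\s+(?:string|int|float|bool|array|char)\b)");

    // Hypothetical helper: true when the line contains an invalid declaration.
    public static bool HasBadDeclare(string line) => BadDeclare.IsMatch(line);

    public static void Main()
    {
        Console.WriteLine(HasBadDeclare("declare int count"));     // False
        Console.WriteLine(HasBadDeclare("declare integer count")); // True
        Console.WriteLine(HasBadDeclare("declare"));               // True
    }
}
```

The \s+ inside the lookahead is what accounts for the whitespace, and the trailing \b keeps "integer" from being accepted as "int".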
Although I agree with @fejesjoco, this is the expression I used here:
(declare)[\s](int|string|float|bool|array|char)[\s](.*)
Check for !match(pattern) to further diagnose an issue.
You are going to want to use Lookahead assertion. To be honest, I'm decent in Regex but I'm not really the guy you want explaining it to you.
This link will explain it better than I can, and this link provides a fairly decent Regex editor.
I am trying to add a feature that works with certain Unicode groups in a string. I found this question, which suggests the following solution; it removes every character outside the stated range:
s = Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);
This works fine.
In my research, though, I came across the use of unicode blocks, which I find to be far more readable.
InBasic_Latin = U+0000–U+007F
More often, I saw recommendations pointing people to use the actual codes themselves (\u0000-\u007F) rather than these blocks (InBasic_Latin). I could see the benefit of explicitly declaring a range when you need some subset of that block or a specific unicode, but when you really just want that entire grouping using the block declaration it seems more friendly to readability and even programmability to use the block name instead.
So, generally, my question is why would \u0000–\u007F be considered a better syntax than InBasic_Latin?
It depends on your regex engine, but some (like .NET, Java, Perl) do support Unicode blocks:
if (Regex.IsMatch(subjectString, @"\p{IsBasicLatin}")) {
    // Successful match
}
Others don't (like JavaScript, PCRE, Python, Ruby, R and most others), so you need to spell out those codepoints manually or use an extension like Steve Levithan's XRegExp library for JavaScript.
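To make the comparison concrete, here is a small sketch that strips non-ASCII characters both ways in .NET: once with the explicit range from the question and once with the named block. The class and method names are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BasicLatinDemo
{
    // Explicit range: remove everything outside U+0000 to U+007F.
    public static string StripByRange(string s) =>
        Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);

    // Named block: \P{...} (capital P) matches any character NOT in the block.
    public static string StripByBlock(string s) =>
        Regex.Replace(s, @"\P{IsBasicLatin}", string.Empty);

    public static void Main()
    {
        // "naive cafe" written with precomposed accented letters (U+00EF, U+00E9).
        string sample = "na\u00EFve caf\u00E9";
        Console.WriteLine(StripByRange(sample)); // nave caf
        Console.WriteLine(StripByBlock(sample)); // nave caf
    }
}
```

Both calls give the same result in .NET; the range form is simply the one that ports unchanged to engines without block support.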
I would like to know how Wikimedia transforms its model syntax ({{model|options}}) into HTML code.
I have a regex for a simple model ({{.*?}}) but it fails for a nested model (ex: {{model|options containing a {{submodel|options}}...}})
Remember,
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. - Jamie Zawinski
That said, you can read: Forum tags. What is the best way to implement them? I made an example of nested tags, both with "pure" Regex and with a "more stable" C# parser that uses a little of Regexes but keeps the stack out of the Regex hands.
You can do it with balancing groups. They aren't part of "base" Regex (and some people don't consider them to be true regexes).
But I wouldn't program something as big as Wiki with something like a regex. The problem with regexes is that it's quite difficult to write them so that they don't backtrack (there is an option to do it, but it's difficult to build a regex that doesn't need backtracking or that needs only a limited amount of backtracking), and when they begin to backtrack it's the end: they could stall for minutes searching for the right combination of captures.
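As an illustration of the balancing-group approach only, the following sketch finds the outermost {{...}} spans, including nested ones. It is a .NET-specific pattern; the class name, method name and the group name "depth" are assumptions for this example, and it says nothing about how the wiki software itself does the job.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TemplateFinder
{
    private static readonly Regex Outer = new Regex(@"
        \{\{                        # opening delimiter of the outermost template
        (?>
            (?!\{\{|\}\}) .         # any character that does not start a delimiter
          | \{\{ (?<depth>)         # nested opener: push onto the depth stack
          | \}\} (?<-depth>)        # nested closer: pop; fails when the stack is empty
        )*
        (?(depth)(?!))              # reject the match if an opener was never closed
        \}\}                        # closing delimiter of the outermost template
        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Hypothetical helper: returns each outermost {{...}} span, nested content included.
    public static string[] FindOutermost(string text) =>
        Outer.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        foreach (string t in FindOutermost("x {{a|{{b}}}} y {{c}}"))
            Console.WriteLine(t);
        // {{a|{{b}}}}
        // {{c}}
    }
}
```

The atomic group (?>...) limits backtracking inside each step, but a pathological input can still be slow, which is the concern raised above.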
I have never used regular expressions before. I plan to use them to solve my problem, but I am not sure whether they can help me.
I have a situation where I need to store a rule or formula for building string values, like the following examples, in a database field, then retrieve this rule and build the string value.
FacilityCode + Left(ModelNO,2)
Right(PO,3) + Left(Serial,2)
Is this achievable using .NET regular expressions? Are there any good tutorials or simple examples for this problem?
Regexp : http://msdn.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
But it doesn't seem to fit :)
It might be better to code some random string generator. Regex is for searching data not creating data.
The thing to remember about regex is that it is like an aircraft carrier; it does one thing very very well, it does not do other jobs very well at all.
An aircraft carrier moves planes very well on the ocean; it does not make a cheese sandwich well AT ALL!!
That is to say, if you use regex when you shouldn't, you will almost certainly use far more processing power than if you used another tool for that job. HTML parsing comes to mind.
Regex is provided as part of System.Text.RegularExpressions, but you can't rely exclusively on it. It'll let you search existing strings, but you'll need to implement your own logic for building new strings based on what you find in the existing data.
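As a rough sketch of that split of responsibilities, the code below uses a regex only to recognise each term of a stored rule and ordinary string code to build the result. Everything here is an assumption made for illustration: the RuleBuilder class, the "+" separator, the Left/Right semantics (first or last n characters) and the field dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RuleBuilder
{
    // One term: either Left(Field,n) / Right(Field,n) or a bare field name.
    private static readonly Regex Term = new Regex(
        @"^\s*(?:(?<func>Left|Right)\(\s*(?<field>\w+)\s*,\s*(?<count>\d+)\s*\)|(?<field>\w+))\s*$",
        RegexOptions.IgnoreCase);

    // Builds the string described by the rule from the given field values.
    public static string Build(string rule, IDictionary<string, string> fields)
    {
        return string.Concat(rule.Split('+').Select(part =>
        {
            Match m = Term.Match(part);
            if (!m.Success) throw new FormatException("Unrecognised term: " + part);
            string value = fields[m.Groups["field"].Value];
            if (!m.Groups["func"].Success) return value; // bare field name
            // Clamp the count so short values do not throw.
            int n = Math.Min(int.Parse(m.Groups["count"].Value), value.Length);
            return m.Groups["func"].Value.Equals("Left", StringComparison.OrdinalIgnoreCase)
                ? value.Substring(0, n)
                : value.Substring(value.Length - n);
        }));
    }

    public static void Main()
    {
        var fields = new Dictionary<string, string>
        {
            { "FacilityCode", "ABC" }, { "ModelNO", "XY123" },
            { "PO", "PO4567" }, { "Serial", "SN99" }
        };
        Console.WriteLine(Build("FacilityCode + Left(ModelNO,2)", fields)); // ABCXY
        Console.WriteLine(Build("Right(PO,3) + Left(Serial,2)", fields));   // 567SN
    }
}
```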
Also, keep in mind that System.Text.RegularExpressions works differently from regexp in Perl and other implementations. For example, it doesn't recognize POSIX character class definitions.
Since you're new to regex, you might want to check out the "Regular Expressions User Guide" on zytrax.com. It's not as comprehensive as an O'Reilly manual, but it'll do as a start.
I am given a string that contains several different combinations of data.
For example:
string data = "(age=20&gender=male) or (city=newyork)";
string data1 = "(job=engineer&gender=female)";
string data2 = "(foo =1 or foo = 2) & (bar =1)";
I need to parse this string, create a structure out of it, and evaluate it as a condition against another object, e.g. if the object has these properties, then do something, else skip, etc.
What are the best practices to do this?
Should I use a parser such as ANTLR and generate tokens out of the string?
Reminder: there are several combinations of how this string is created, but it's all and/or.
Something like ANTLR is probably overkill for this.
A simple implementation of the shunting-yard algorithm would probably do the trick quite nicely.
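A minimal sketch of that idea follows. It uses the two-stack variant of shunting-yard that evaluates operators as they are popped instead of first emitting postfix output. The ConditionEvaluator name, the dictionary standing in for "another object", the rule that and/& binds tighter than or, and the case-sensitive value comparison are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ConditionEvaluator
{
    // One token: a parenthesis, an operator (&, and, or), or a "key = value" test.
    private static readonly Regex Token = new Regex(
        @"\G\s*(?:(?<paren>[()])|(?<op>&|\band\b|\bor\b)|(?<key>\w+)\s*=\s*(?<value>\w+))",
        RegexOptions.IgnoreCase);

    private static int Precedence(string op) => op == "or" ? 1 : 2; // "and" binds tighter

    public static bool Evaluate(string expression, IDictionary<string, string> obj)
    {
        var operands = new Stack<bool>();    // results of tests and sub-expressions
        var operators = new Stack<string>(); // pending "and" / "or" / "("

        void Apply(string op)
        {
            bool right = operands.Pop(), left = operands.Pop();
            operands.Push(op == "or" ? left || right : left && right);
        }

        expression = expression.Trim();
        int pos = 0;
        while (pos < expression.Length)
        {
            Match m = Token.Match(expression, pos);
            if (!m.Success) throw new FormatException("Unexpected input at position " + pos);
            pos = m.Index + m.Length;

            if (m.Groups["key"].Success)
            {
                // A comparison is true when the object has the key with exactly that value.
                operands.Push(obj.TryGetValue(m.Groups["key"].Value, out string actual)
                              && actual == m.Groups["value"].Value);
            }
            else if (m.Groups["op"].Success)
            {
                string op = m.Groups["op"].Value.ToLowerInvariant();
                if (op == "&") op = "and";
                // Resolve pending operators of equal or higher precedence first.
                while (operators.Count > 0 && operators.Peek() != "("
                       && Precedence(operators.Peek()) >= Precedence(op))
                    Apply(operators.Pop());
                operators.Push(op);
            }
            else if (m.Groups["paren"].Value == "(")
            {
                operators.Push("(");
            }
            else
            {
                // Closing parenthesis: resolve everything back to the matching "(".
                while (operators.Count > 0 && operators.Peek() != "(")
                    Apply(operators.Pop());
                if (operators.Count == 0) throw new FormatException("Unbalanced ')'");
                operators.Pop();
            }
        }
        while (operators.Count > 0)
        {
            string op = operators.Pop();
            if (op == "(") throw new FormatException("Unbalanced '('");
            Apply(op);
        }
        return operands.Pop();
    }

    public static void Main()
    {
        var person = new Dictionary<string, string>
        {
            { "age", "20" }, { "gender", "female" }, { "city", "newyork" }
        };
        Console.WriteLine(Evaluate("(age=20&gender=male) or (city=newyork)", person)); // True
        Console.WriteLine(Evaluate("(job=engineer&gender=female)", person));           // False
    }
}
```

Here the regex only splits the input into tokens; the nesting and operator precedence are handled by the two stacks.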
Using regular expressions may work if the example is very simple, but it will more likely lead to a code that is impossible to maintain. Using some other approach to parsing seems like a good idea.
I would take a look at NCalc - it is mainly focused on parsing mathematical expressions, but it seems to be quite customizable (you can specify your functions and constants), so it may work in your scenario as well.
If this is too complex for your purpose, you can use any "parser generator" for C#. ANTLR is one great option; here is an example that shows how to start writing something like your example: "Five minute introduction to ANTLR".
You could also try using F#, which is a great language for this kind of problem. See for example FsLex Sample by Chris Smith, which shows a simple mathematical evaluator - processing the parsed expression in F# would be a lot easier than in C#. In F#, you could also use FParsec, which is very lightweight, but may be a bit difficult to follow if you're not used to F#.
I suggest you have a look at regular expressions: http://www.codeproject.com/KB/dotnet/regextutorial.aspx
Antlr is a great tool, but you can probably do this with regular expressions. One of the nice things about the .NET regex engine is support for nested constructs. See
http://retkomma.wordpress.com/2007/10/30/nested-regular-expressions-explained/
and this SO post.
Seems like you might want to use Regular Expressions to do this.
Read up a little bit on Regular Expressions in .NET. Here are some good articles:
http://msdn.microsoft.com/en-us/library/hs600312.aspx
http://www.regular-expressions.info/dotnet.html
When it comes time to write and test your regular expression, I would highly recommend using RegExLib.com's regex tester.
I'm writing a CMS in ASP.NET/C#, and I need to process things like this on every page request:
<html>
<head>
<title>[Title]</title>
</head>
<body>
<form action="[Action]" method="get">
[TextBox Name="Email", Background=Red]
[Button Type="Submit"]
</form>
</body>
</html>
and replace the [...] of course.
My question is how I should implement it: with ANTLR or with Regex? Which will be faster? Note that if I implement it with ANTLR, I think I will need to implement XML in addition to the [..].
I will need to implement parameters, etc.
EDIT: Please note that my regex could look something like this:
public override string ToString()
{
    return Regex.Replace(Input, @"\[
        \s*(?<name>\w+)\s*
        (?<parameter>
            [\s,]*
            (?<paramName>\w+)
            \s*
            =
            \s*
            (
                (?<paramValue>\w+)
                |
                (""(?<paramValue>[^""]*)"")
            )
        )*
        \]", (match) =>
    {
        ...
    }, RegexOptions.IgnorePatternWhitespace);
}
Whether the correct tool is RegEx or ANTLR or even something else entirely should be heavily dependent on your requirements. The best answer to a "what tool to use" question shouldn't be primarily based on performance, but on the right tool for the job.
RegEx is a text search tool. If all you need to do is pull strings out of strings then it's often the hammer of choice. You'll likely want a tool to help you build your RegEx. I'd recommend Expresso, but there are lots of options out there.
ANTLR is a compiler generator. If you need error messages and parse actions or any of the complicated things that come with a compiler then it's a good option.
What it looks like you're doing is XML search/replace, have you considered XPath? That would be my suggestion.
Choosing the right tool for the job is definitely important, something that should be researched and thought out before development begins. In all cases, it's important to fully understand the program requirements before making any decisions. Do you have a specification for the project? If not, spending the time to come up with one will save you all the time that a poor tool choice can cost you.
Hope that helps!
The relative performance of ANTLR and RegEx depends on the RegEx implementation in C#. I know from experience that ANTLR is fast enough.
In ANTLR you can ignore certain content, like the XML. You can also search for the [ and ] and continue processing from there.
Both RegEx and ANTLR support your kind of parameters (I'm not sure about the "etc.").
In terms of development speed, RegEx is slightly faster for a case like this. You can use an online tool to develop the RegEx and watch the capture groups while you edit it (search for "regex gskinner").
Then ANTLR has perfect support for "error-messages": they show line/column numbers and what was wrong. RegEx doesn't have this support.
A general approach for RegEx would be: create a "global scan" RegEx which finds the correct [...] groups in your content. Let the "..." be captured by a group, and then apply another RegEx to this smaller content (which splits it on the equals signs and commas). This way you get the best runtime performance and it's easy to develop.
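A sketch of this two-pass approach might look like the following. The TagExpander class, the render callback and both patterns are assumptions made for illustration; in particular, the first pattern assumes a quoted value never contains a ']' character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExpander
{
    // Pass 1 ("global scan"): find each [Name ...] group and capture the raw parameter text.
    private static readonly Regex TagScan =
        new Regex(@"\[\s*(?<name>\w+)(?<body>[^\]]*)\]");

    // Pass 2: pick key=value pairs out of the captured body (value bare or double-quoted).
    private static readonly Regex ParamScan =
        new Regex(@"(?<key>\w+)\s*=\s*(?:""(?<value>[^""]*)""|(?<value>\w+))");

    // Replaces every tag with whatever the caller-supplied render function returns.
    public static string Expand(string input,
        Func<string, IDictionary<string, string>, string> render)
    {
        return TagScan.Replace(input, m =>
        {
            var parameters = new Dictionary<string, string>();
            foreach (Match p in ParamScan.Matches(m.Groups["body"].Value))
                parameters[p.Groups["key"].Value] = p.Groups["value"].Value;
            return render(m.Groups["name"].Value, parameters);
        });
    }

    public static void Main()
    {
        string page = "<title>[Title]</title>[TextBox Name=\"Email\", Background=Red]";
        // Toy renderer: prints the tag name and how many parameters it carried.
        Console.WriteLine(Expand(page, (name, p) => name + ":" + p.Count));
        // <title>Title:0</title>TextBox:2
    }
}
```

Keeping both Regex instances in static fields means the patterns are parsed once rather than on every page request.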
If the language you are parsing is regular then regular expressions are certainly an option. If it is not then ANTLR may be your only choice. If I understand these matters correctly XML is not regular.