I have never used regular expressions before and plan to use them to solve my problem, but I'm not sure whether they can help.
I have a situation where I need to store a rule or formula for building string values, like the following examples, in a database field, then retrieve the rule and build the string value.
FacilityCode + Left(ModelNO,2)
Right(PO,3) + Left(Serial,2)
Is this achievable using .NET regular expressions? Are there any good tutorials or simple examples for this kind of problem?
Regexp : http://msdn.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
But it doesn't seem to fit :)
It might be better to code some random string generator. Regex is for searching data not creating data.
The thing to remember about regex is that it is like an aircraft carrier; it does one thing very very well, it does not do other jobs very well at all.
An aircraft carrier moves planes very well on the ocean; it does not make a cheese sandwich well AT ALL!!
That is to say, if you use regex when you shouldn't you will almost certainly use far more processing power than if you used another tool for that job. Html parsing comes to mind.
Regex is provided as part of System.Text.RegularExpressions, but you can't rely exclusively on it. It'll let you search existing strings, but you'll need to implement your own logic for building new strings based on what you find in the existing data.
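As a rough illustration of that split, here is a minimal sketch in which a regex only recognises each term of a stored rule, while ordinary code builds the result. The class name RuleBuilder, the dictionary of field values, and the restriction to Left, Right and bare field names joined by + are assumptions made for this example, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class RuleBuilder
{
    // One term of a rule: Left(Field,n), Right(Field,n), or a bare field name.
    private static readonly Regex Term = new Regex(
        @"^\s*(?:(?<fn>Left|Right)\s*\(\s*(?<field>\w+)\s*,\s*(?<n>\d+)\s*\)|(?<field>\w+))\s*$",
        RegexOptions.IgnoreCase);

    // Builds the string described by a rule such as "FacilityCode + Left(ModelNO,2)".
    public static string Build(string rule, IDictionary<string, string> fields)
    {
        var result = new StringBuilder();
        foreach (string part in rule.Split('+'))
        {
            Match m = Term.Match(part);
            if (!m.Success)
                throw new FormatException($"Unrecognised term: '{part.Trim()}'");

            // Throws KeyNotFoundException if the rule names an unknown field.
            string value = fields[m.Groups["field"].Value];
            if (m.Groups["fn"].Success)
            {
                // Clamp n so short values do not throw.
                int n = Math.Min(int.Parse(m.Groups["n"].Value), value.Length);
                bool isLeft = m.Groups["fn"].Value.Equals("Left", StringComparison.OrdinalIgnoreCase);
                value = isLeft ? value.Substring(0, n) : value.Substring(value.Length - n);
            }
            result.Append(value);
        }
        return result.ToString();
    }
}
```

With FacilityCode = "FAC" and ModelNO = "M1234", RuleBuilder.Build("FacilityCode + Left(ModelNO,2)", fields) would give "FACM1". Anything beyond this tiny grammar (nested calls, literals) would need a real parser.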
Also, keep in mind that System.Text.RegularExpressions works differently from regexp in Perl and other implementations. For example, it doesn't recognize POSIX character class definitions.
Since you're new to regex, you might want to check out the "Regular Expressions User Guide" on zytrax.com. It's not as comprehensive as an O'Reilly manual, but it'll do as a start.
Related
Hi fellow programmers and nerds!
When creating regular expressions in Visual Studio, the IDE will highlight the string if it's preceded by a verbatim identifier (for example, @"Some string"). This looks something like this:
(Notice the way the string is highlighted). Most of you will have seen this by now, I'm sure.
My problem: I am using a package acquired from NuGet which deals with regular expressions, and they have a function which takes in a regular expression string, however their function doesn't have the syntax highlighting.
As you can see, this makes reading the Regex string a pain. It's not all that important, but the highlighting would reduce the time and effort it takes to decipher an expression, especially in a case like mine where there will be quite a few of these expressions.
The question
So what I'm wanting to know is, is there a way to make a function highlight the string this way*, or is it just something that's hardwired into the IDE for the specific case of the Regex c-tor? Is there some sort of annotation which can be tacked onto the function to achieve this with minimal effort, or would it be necessary to use some sort of extension?
*I have wrapped the call to AddStyle() into one of my own functions anyway, and the string will be passed as a parameter, so if any modifications need to be made to achieve the syntax-highlight, they can be made to my function. Therefore the fact that the AddStyle() function is from an external library should be irrelevant.
If it's a lot of work then it's not worth my time, somebody else is welcome to develop an extension to solve this, but if there is a way...
Important distinction
Please bear in mind I am talking about Visual Studio, NOT Visual Studio Code.
Also, if there is a way to pull the original expression string from the Regex, I might do it that way, since performance isn't a huge concern here as this is a once-on-startup thing, however I would prefer not to do it that way. I don't actually need the Regex object.
According to https://devblogs.microsoft.com/dotnet/visual-studio-2019-net-productivity/#regex-language-support and https://www.meziantou.net/visual-studio-tips-and-tricks-regex-editing.htm you can mark the string with a special comment to get syntax highlighting:
// language=regex
var str = @"[A-Z]\d+";
or
MyMethod(/* language=regex */ @"[A-Z]\d+");
(the comment may contain more than just this language=regex part)
The first linked blog talks about a preview, but this feature is also present in the final product.
.NET 7 introduces the new [StringSyntax(...)] attribute, which is used in .NET 7 on more than 350 string, string[], and ReadOnlySpan<char> parameters, properties, and fields to highlight to an interested tool what kind of syntax is expected to be passed or set.
https://devblogs.microsoft.com/dotnet/regular-expression-improvements-in-dotnet-7/?WT_mc_id=dotnet-35129-website&hmsr=joyk.com&utm_source=joyk.com&utm_medium=referral
So for a method argument you should just use:
void MyMethod([StringSyntax(StringSyntaxAttribute.Regex)] string regex);
Here is a video demonstrating the feature: https://youtu.be/Y2YOaqSAJAQ
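Applied to the wrapper described in the question, the attribute could be used roughly as follows. This is only a sketch: StyleRegistry, AddStyleRule and the styleName parameter are hypothetical names, and the comment marks where the library's real AddStyle() call would go, since its signature is not shown in the question. It assumes .NET 7 or later.

```csharp
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

public static class StyleRegistry
{
    private static readonly List<string> Patterns = new List<string>();

    // The attribute tells tooling that 'pattern' holds regex syntax,
    // so string literals passed at call sites can be highlighted as regexes.
    public static void AddStyleRule(
        [StringSyntax(StringSyntaxAttribute.Regex)] string pattern,
        string styleName)
    {
        // A real wrapper would forward 'pattern' to the library's AddStyle() here.
        Patterns.Add(pattern);
    }

    public static IReadOnlyList<string> Registered => Patterns;
}
```

A call such as StyleRegistry.AddStyleRule(@"[A-Z]\d+", "heading") should then get the same colouring as a string passed to the Regex constructor, without creating a Regex object.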
I would like to know how Wikimedia transforms its model syntax ({{model|options}}) into HTML code.
I have a regex for a simple model ({{.*?}}), but it fails for a nested model (e.g. {{model|options containing a {{submodel|options}}...}}).
Remember,
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. - Jamie Zawinski
That said, you can read: Forum tags. What is the best way to implement them? I made an example of nested tags, both with "pure" Regex and with a "more stable" C# parser that uses a little of Regexes but keeps the stack out of the Regex hands.
You can do it with balancing groups. They aren't part of "base" regex (and some people don't consider them to be true regexes).
But I wouldn't build something as big as a wiki on a regex. The problem with regexes is that it's quite difficult to write them so that they don't backtrack (there is a way to prevent it, but it's hard to build a regex that needs no backtracking or only a limited amount of it), and once they begin to backtrack it's the end: they can stall for minutes searching for the right combination of captures.
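For reference, a balancing-group pattern for the nested {{...}} case might look like the sketch below. The class and method names are made up for the example, and the pattern only finds the outermost templates; it does not parse the options or cope with the many special cases of real wiki markup.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateFinder
{
    // \{\{                      opening braces of the outer template
    // (?> ... )*                atomic group, repeated:
    //   (?:(?!\{\{|\}\}).)+       text that starts neither "{{" nor "}}"
    //   (?<open>\{\{)             nested opener: push onto the "open" stack
    //   (?<-open>\}\})            nested closer: pop (fails if the stack is empty)
    // (?(open)(?!))             fail if any nested opener is still unclosed
    // \}\}                      closing braces of the outer template
    private static readonly Regex Outer = new Regex(
        @"\{\{(?>(?:(?!\{\{|\}\}).)+|(?<open>\{\{)|(?<-open>\}\}))*(?(open)(?!))\}\}",
        RegexOptions.Singleline);

    // Returns each outermost {{...}} block, nested blocks included in its text.
    public static List<string> FindTopLevel(string text)
    {
        var found = new List<string>();
        foreach (Match m in Outer.Matches(text))
            found.Add(m.Value);
        return found;
    }
}
```

On "a {{model|x {{sub|y}} z}} b {{other}}" this yields two blocks, the first one containing the nested {{sub|y}}. Unbalanced input still makes the engine backtrack, which is the cost described above.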
I am given a string that contains several different combinations of data.
For example :
string data = "(age=20&gender=male) or (city=newyork)"
string data1 = "(job=engineer&gender=female)"
string data2 = "(foo =1 or foo = 2) & (bar =1)"
I need to parse this string, build a structure out of it, and evaluate that structure against the properties of another object: for example, if the object has these properties, do something; otherwise skip it.
What are the best practices to do this?
Should I use a parser such as ANTLR and generate tokens out of the string?
Reminder: the string can be built in several different combinations, but it is all and/or.
Something like ANTLR is probably overkill for this.
A simple implementation of the shunting-yard algorithm would probably do the trick quite nicely.
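A minimal sketch of that idea, under some assumptions of mine: conditions are always key=value, "&" and "and" bind tighter than "or", values are compared case-insensitively, and the object is represented as a dictionary of property names to values. The names ConditionEvaluator and Evaluate are made up for the example, and characters the tokenizer does not recognise are silently skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ConditionEvaluator
{
    private static readonly Regex Token = new Regex(
        @"(?<key>\w+)\s*=\s*(?<val>\w+)|\(|\)|&|\band\b|\bor\b",
        RegexOptions.IgnoreCase);

    // Yields "key=value" for comparisons and "&", "|", "(", ")" for everything else.
    private static IEnumerable<string> Tokenize(string expression)
    {
        foreach (Match m in Token.Matches(expression))
        {
            if (m.Groups["key"].Success)
            {
                yield return m.Groups["key"].Value + "=" + m.Groups["val"].Value;
                continue;
            }
            string op = m.Value.ToLowerInvariant();
            yield return op == "and" ? "&" : op == "or" ? "|" : op;
        }
    }

    private static int Precedence(string op) => op == "&" ? 2 : 1;

    // Shunting-yard: reorder the infix tokens into postfix (RPN).
    private static List<string> ToPostfix(string expression)
    {
        var output = new List<string>();
        var operators = new Stack<string>();
        foreach (string token in Tokenize(expression))
        {
            if (token == "(")
            {
                operators.Push(token);
            }
            else if (token == ")")
            {
                while (operators.Count > 0 && operators.Peek() != "(")
                    output.Add(operators.Pop());
                if (operators.Count == 0) throw new FormatException("Unbalanced ')'");
                operators.Pop(); // discard the matching "("
            }
            else if (token == "&" || token == "|")
            {
                while (operators.Count > 0 && operators.Peek() != "("
                       && Precedence(operators.Peek()) >= Precedence(token))
                    output.Add(operators.Pop());
                operators.Push(token);
            }
            else
            {
                output.Add(token); // a key=value comparison
            }
        }
        while (operators.Count > 0)
        {
            if (operators.Peek() == "(") throw new FormatException("Unbalanced '('");
            output.Add(operators.Pop());
        }
        return output;
    }

    // Evaluates the postfix form against the object's properties.
    public static bool Evaluate(string expression, IDictionary<string, string> properties)
    {
        var stack = new Stack<bool>();
        foreach (string token in ToPostfix(expression))
        {
            if (token == "&" || token == "|")
            {
                bool right = stack.Pop();
                bool left = stack.Pop();
                stack.Push(token == "&" ? left && right : left || right);
            }
            else
            {
                string[] kv = token.Split('=');
                stack.Push(properties.TryGetValue(kv[0], out var actual)
                           && string.Equals(actual, kv[1], StringComparison.OrdinalIgnoreCase));
            }
        }
        return stack.Pop();
    }
}
```

For an object with age=20, gender=male and city=boston, Evaluate("(age=20&gender=male) or (city=newyork)", properties) returns true, because the first group holds even though the city does not match.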
Using regular expressions may work if the example is very simple, but it will more likely lead to code that is impossible to maintain. Using some other approach to parsing seems like a good idea.
I would take a look at NCalc - it is mainly focused on parsing mathematical expressions, but it seems to be quite customizable (you can specify your functions and constants), so it may work in your scenario as well.
If this is too complex for your purpose, you can use any "parser generator" for C#. Using ANTLR is one great option - here is an example that shows how to start writing something like your example Five minute introduction to ANTLR
You could also try using F#, which is a great language for this kind of problem. See for example FsLex Sample by Chris Smith, which shows a simple mathematical evaluator - processing the parsed expression in F# would be a lot easier than in C#. In F#, you could also use FParsec, which is very lightweight, but may be a bit difficult to follow if you're not used to F#.
I suggest you have a look at regular expressions: http://www.codeproject.com/KB/dotnet/regextutorial.aspx
Antlr is a great tool, but you can probably do this with regular expressions. One of the nice things about the .NET regex engine is support for nested constructs. See
http://retkomma.wordpress.com/2007/10/30/nested-regular-expressions-explained/
and this SO post.
Seems like you might want to use Regular Expressions to do this.
Read up a little bit on Regular Expressions in .NET. Here are some good articles:
http://msdn.microsoft.com/en-us/library/hs600312.aspx
http://www.regular-expressions.info/dotnet.html
When it comes time to write and test your regular expression, I would highly recommend using RegExLib.com's regex tester.
I'd like to give users the ability to search through a large list of businesses, but still find near matches.
Does anyone have any recommendations on how best to go about this when you're not targeting simple dictionary words, but instead complex names like ABC Business Name?
Regards.
Check out the wikipedia article on Levenshtein distance. It's a fairly simple concept to wrap your head around and pretty easy to implement an algorithm in whichever language you are using, in your case, C#.
I found an example in C# for you here.
Also, here is an example of a spelling corrector from Peter Norvig of Google. It was said on the SO podcast a few episodes ago that Jon Skeet attempted a rewrite of this same algorithm in C#. Not sure if he completed it and/or made it publicly available though.
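For anyone who wants to try the edit-distance idea straight away, here is a minimal sketch of the standard two-row dynamic-programming version. The class name FuzzyMatch is made up, and the comparison is case-sensitive; you would probably normalise case and whitespace before comparing business names.

```csharp
using System;

public static class FuzzyMatch
{
    // Number of single-character insertions, deletions and substitutions
    // needed to turn 'a' into 'b'.
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Distance from the empty prefix of 'a' to each prefix of 'b'.
        for (int j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1,   // insertion
                             previous[j] + 1),     // deletion
                    previous[j - 1] + cost);       // substitution or match
            }
            (previous, current) = (current, previous); // reuse the two rows
        }
        return previous[b.Length];
    }
}
```

A search could then treat a stored name as a near match when the distance is below some threshold relative to the name's length; the right threshold is something to tune on real data.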
Consider using keyword matching together with edit-distance-based similarity. You might also combine this with a record of what was "originally searched" versus what was "actually clicked".
This is probably a crazy solution, but could you split the business name by space and then search on either all the words or maybe just the first couple?
So you might search on 'ABC' and 'Business' but leave out 'Name' as this might take too long.
You might even check to see if the string is of a certain length, then trim and just search on the first say 5 letters.
Have you had a look at "soundex" as a way of searching through your businesses? Again, I think you'd need to split the name by space.
You might check out the SQL Server SOUNDEX and DIFFERENCE functions. SOUNDEX converts a sequence of characters (such as a word) into a 4-character code which will be the same for similar-sounding words. DIFFERENCE gives a number which represents how "different" two strings are based on sound.
You could, for example, create a computed column based on the SOUNDEX function and match on that column later. Or you could use DIFFERENCE in a WHERE clause.
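If you would rather compute the code in application code than in the database, the classic American Soundex rules can be sketched as below. This is my own illustrative implementation, not SQL Server's, so edge cases (non-letters, names starting with H or W, empty input) may give different codes than the SOUNDEX function does.

```csharp
using System.Text;

public static class Phonetic
{
    // Digit for each letter A..Z; '0' marks vowels and other ignored letters.
    private const string Codes = "01230120022455012623010202";

    // Returns a letter followed by three digits, e.g. "R163".
    public static string Soundex(string word)
    {
        var sb = new StringBuilder();
        char last = '0';
        foreach (char c in word.ToUpperInvariant())
        {
            if (c < 'A' || c > 'Z') continue;         // skip anything that is not A-Z
            char code = Codes[c - 'A'];
            if (sb.Length == 0)
            {
                sb.Append(c);                         // keep the first letter itself
                last = code;
                continue;
            }
            if (c == 'H' || c == 'W') continue;       // H and W do not separate equal codes
            if (code != '0' && code != last)
                sb.Append(code);                      // new consonant sound
            last = code;                              // vowels reset 'last' to '0'
            if (sb.Length == 4) break;
        }
        return sb.ToString().PadRight(4, '0');
    }
}
```

Names that sound alike, such as "Robert" and "Rupert", end up with the same code, so the code of each word in a business name could be stored and matched on.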
I need to implement something similar to wikilinks on my site. The user is entering plain text and will enter [[asdf]] wherever there is an internal link. Only the first five examples are really applicable in the implementation I need.
Would you use regex, what expression would do this? Is there a library out there somewhere that already does this in C#?
On the pure regexp side, the expression would rather be:
\[\[([^\]\|\r\n]+?)\|([^\]\|\r\n]+?)\]\]([^\] ]\S*)
\[\[([^\]\|\r\n]+?)\]\]([^\] ]\S*)
By replacing the (.+?) suggested by David with ([^\]\|\r\n]+?), you ensure that you only capture legitimate wiki link text, without closing square brackets, pipes, or newline characters.
([^\] ]\S*) at the end ensures the wiki link expression is followed by neither a closing square bracket nor a space.
I am not sure whether there are C# libraries that already implement this kind of detection.
However, to make that kind of detection really foolproof with regexp, you should use the pushdown automaton present in the C# regexp engine, as illustrated here.
I don't know if there are existing libraries to do this, but if it were me I'd probably just use regexes:
match \[\[(.+?)\|(.+?)\]\](\S+) and replace with <a href="\2">\1</a>\3
match \[\[(.+?)\]\](\S+) and replace with <a href="\1">\1</a>\2
Or something like that, anyway.
Although this is an old question and already answered, I thought I'd add this as an addendum for anyone else coming along. The existing two answers do all the real work and got me 90% there, but here is the last bit for anyone looking for code to get straight on with trying:
string html = "Some text with a wiki style [[page2.html|link]]";
html = Regex.Replace(html, @"\[\[([^\]\|\r\n]+?)\|([^\]\|\r\n]+?)\]\]([^\] ]\S*)", @"<a href=""$1"">$2</a>$3");
html = Regex.Replace(html, @"\[\[([^\]\|\r\n]+?)\]\]([^\] ]\S*)", @"<a href=""$1"">$1</a>$2");
The only change to the actual regex is I think the original answer had the replacement parts the wrong way around, so the href was set to the display text and the link was shown on the page. I've therefore swapped them.
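One thing to watch: the patterns above require at least one non-space character directly after the closing ]], so a link at the very end of the text, or one followed by a space, is left untouched. A possible single-pass variant without that trailing group is sketched below. The class and method names are hypothetical, and it assumes the first part is the link target and the optional second part is the display text, as in the example above. Only the link parts are HTML-encoded; the surrounding text is left as it is.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class WikiLinks
{
    // [[target]] or [[target|text]]; neither part may contain ']', '|' or a line break.
    private static readonly Regex Link = new Regex(
        @"\[\[(?<target>[^\]\|\r\n]+?)(?:\|(?<text>[^\]\|\r\n]+?))?\]\]");

    // Replaces every wiki-style link in 'input' with an HTML anchor.
    public static string ToHtml(string input)
    {
        return Link.Replace(input, m =>
        {
            string target = m.Groups["target"].Value;
            // Fall back to the target when no display text was given.
            string text = m.Groups["text"].Success ? m.Groups["text"].Value : target;
            return "<a href=\"" + WebUtility.HtmlEncode(target) + "\">"
                   + WebUtility.HtmlEncode(text) + "</a>";
        });
    }
}
```

WikiLinks.ToHtml("Some text with a wiki style [[page2.html|link]]") then produces an anchor whose href is page2.html and whose visible text is "link".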