I'm a beginner with regexes. I need to validate URLs ranging from simple ones to ones with query strings, square brackets, etc. For example:
www.test.com?waa=[sample data]
The regex that I wrote only works for simple URLs. It fails for the one with square brackets. Any ideas?
Do you really need to use a regex?
bool isUri = Uri.IsWellFormedUriString("http://...", UriKind.RelativeOrAbsolute);
I would suggest taking a closer look at the following site:
http://www.regular-expressions.info/dotnet.html
Without seeing the regex you're using, I can't provide much insight, and simply giving you the answer wouldn't teach you much either. Give a man a regex and you help him for a bit; teach him regex and he's good for life.
Take a look at the following:
http://www.geekzilla.co.uk/view2D3B0109-C1B2-4B4E-BFFD-E8088CBC85FD.htm
Thanks a lot for the reply.
This is what I wrote. It works for query strings too, but it fails once I add [].
/^(https?|ftp)://(?#)(([a-z0-9$.+!*\'(),;\?&=-]|%[0-9a-f]{2})+(?#)(:([a-z0-9$.+!*\'(),;\?&=-]|%[0-9a-f]{2})+)?(?#)#)?(#)((([a-z0-9][a-z0-9-][a-z0-9].)(#)[a-z]{2}[a-z0-9-]a-z0-9|(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5].){3}(?#)(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?#))(:\d+)?(?#))(((/+([a-z0-9$_.+!*\'(),;:#&=-]|%[0-9a-f]{2}))(?# )(\?([a-z0-9$_.+!*\'(),;:#&=-]|%[0-9a-f]{2}))(?#)?)?)?(?#)(#([a-z0-9$_.+!*\'(),;:#&=-]|%[0-9a-f]{2})*)?(?#)$/i
Use this if you want URLs that start with http:
http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
If you don't want http in the URL, then go for:
([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
Related
I want to split a string into its possible word sequences. What approach should I follow?
Given string : thisisapineapple
solution 1: this is a pineapple
solution 2: this is a pine apple
Please suggest and explain possible algorithms for getting the solutions above.
Thanks :)
To answer your question, the Knuth-Morris-Pratt algorithm is powerful and not terribly difficult to implement.
Use strings from /usr/share/dict/words or /usr/dict/words as the patterns.
You need a scannerless GLR parser. Such parsers can handle words run together like this and can return ambiguous results. My own NLP library (AboditNLP) does this. WordNet is a good source for the words.
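For a small word list, plain backtracking is one simple way to enumerate every split. The following is only a sketch, not the KMP or GLR approach suggested above; the class name WordSplitter and the idea of passing the word list in as a set are my own assumptions.

```csharp
using System;
using System.Collections.Generic;

static class WordSplitter
{
    // Returns every way to split `text` into words found in `dictionary`.
    public static List<string> Split(string text, ISet<string> dictionary)
    {
        var results = new List<string>();
        Collect(text, 0, dictionary, new List<string>(), results);
        return results;
    }

    private static void Collect(string text, int start, ISet<string> dictionary,
                                List<string> current, List<string> results)
    {
        if (start == text.Length)
        {
            // The whole string has been consumed, so `current` is a full split.
            results.Add(string.Join(" ", current));
            return;
        }
        // Try every prefix of the remaining text, shortest first.
        for (int end = start + 1; end <= text.Length; end++)
        {
            string candidate = text.Substring(start, end - start);
            if (!dictionary.Contains(candidate)) continue;
            current.Add(candidate);
            Collect(text, end, dictionary, current, results);
            current.RemoveAt(current.Count - 1); // undo and try a longer prefix
        }
    }
}
```

With a dictionary containing this, is, a, pine, apple and pineapple, Split("thisisapineapple", ...) returns both "this is a pine apple" and "this is a pineapple". The running time can grow exponentially on long inputs, so a real implementation would probably want to cache results per start position.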
I have URLs like:
http://127.0.0.1:81/controller/verbOne/NXw4fDF8MXwxfDQ1?source=dddd
or
http://127.0.0.1:81/controller/verbTwo/NXw4fDF8MXwxfDQ1
I'd like to extract the last path segment (NXw4fDF8MXwxfDQ1 in these examples). The host and port can change to anything (they will change when I publish to a live server). The controller never changes, and there are two possibilities for the verb part.
Can anyone help me with the regex?
Thanks
Instead of using a regex, you could use the built-in functionality of Uri:
Uri uri = new Uri("http://127.0.0.1:81/controller/verbOne/NXw4fDF8MXwxfDQ1?source=dddd");
var lastSegment = uri.Segments.Last();
You're looking for the Uri and Path classes:
Path.GetFileName(new Uri(str).AbsolutePath)
Why are you looking for a regex? You can search for the two strings "verbOne/" and "verbTwo/" and take the substring from there to the end. Then strip off everything from the '?' onward.
I think this is faster than a regex.
krikit
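A rough sketch of that substring idea, for anyone who wants to try it. The class and method names are made up for illustration, and it assumes the verb markers appear exactly as "verbOne/" or "verbTwo/":

```csharp
using System;

static class TokenExtractor
{
    // Returns the text after "verbOne/" or "verbTwo/" with any query string removed,
    // or null when neither marker is present.
    public static string Extract(string url)
    {
        foreach (string marker in new[] { "verbOne/", "verbTwo/" })
        {
            int index = url.IndexOf(marker, StringComparison.Ordinal);
            if (index < 0) continue;

            string rest = url.Substring(index + marker.Length);
            int query = rest.IndexOf('?');
            return query >= 0 ? rest.Substring(0, query) : rest;
        }
        return null;
    }
}
```

For both URLs in the question, Extract returns "NXw4fDF8MXwxfDQ1". I have not measured whether it is actually faster than a regex.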
Everyone else here is correct that a regex is not the best solution: it can fail on input that the existing, specialized parsers handle reliably. That said, I believe you could use the following regex:
(?<=http://127\.0\.0\.1:81/controller/verb(One|Two)/)[a-zA-Z0-9]*
I don't really know what to title this, but I need some help with regular expressions. First, to clarify: I'm not trying to match HTML or XML, although it may look like it. The lines below are part of a file format I use in a program I wrote, to specify which details that program should export. There is no hierarchy involved; each new line simply contains a 'tag':
<n>
My program matches this against an enumeration, which tells it to export the name value. I also have tags like this:
<adr:home>
This specifies the home address. I use the following regex:
<((?'TAG'.*):(?'SUBTAG'.*)?)?(\s+((\w+)=('|"")?(?'VALUE'.*[^'])('|"")?)?)?>
The problem is that the regex splits the adr:home tag fine but fails to find the n tag, because it lacks a colon. When I add a ? or a *, it no longer splits adr:home and similar tags. Can anyone help? I'm sure it's something simple; this is just my first time writing a regular expression. I'm working in C#, by the way.
Will this help?
<((?'TAG'.*?)(?::(?'SUBTAG'.*))?)?(\s+((\w+)=('|"")?(?'VALUE'.*[^'])('|"")?)?)?>
I've wrapped the colon and the subtag capture in a non-capturing group and made the tag capture non-greedy.
Not entirely sure what your aim is but try this:
(?><)(?'TAG'[^:\s>]*)(:(?'SUBTAG'[^\s>:]*))?(\s\w+=['"](?'VALUE'[^'"]*)['"])?(?>>)
I find this site extremely useful for testing C# regex expressions.
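To see how that pattern behaves on both kinds of tag, here is a small sketch that wraps it in a helper. The TagParser class and the array return shape are my own choices for illustration, not part of the original program.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagParser
{
    // Pattern copied from the answer above; "" is an escaped quote in a verbatim string.
    private static readonly Regex TagPattern = new Regex(
        @"(?><)(?'TAG'[^:\s>]*)(:(?'SUBTAG'[^\s>:]*))?(\s\w+=['""](?'VALUE'[^'""]*)['""])?(?>>)");

    // Returns { tag, subtag, value }, with empty strings for parts that are absent,
    // or null when the line does not contain a tag at all.
    public static string[] Parse(string line)
    {
        Match m = TagPattern.Match(line);
        if (!m.Success) return null;
        return new[]
        {
            m.Groups["TAG"].Value,
            m.Groups["SUBTAG"].Value,
            m.Groups["VALUE"].Value
        };
    }
}
```

Parse("<n>") gives the tag "n" with an empty subtag, while Parse("<adr:home>") gives "adr" and "home". Excluding ':' and '>' from the character classes is what keeps the tag capture from swallowing the subtag.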
What if you put the colon as part of the second tag?
<((?'TAG'.*)(?'SUBTAG':.*)?)?(\s+((\w+)=('|"")?(?'VALUE'.*[^'])('|"")?)?)?>
I posted a similar question earlier, but I now realize I should have been more thorough.
I've tested a number of the URL/URI expressions listed on regexlib.com, but I can't get any of them to work as desired:
msn.com
msn-msn.net
yahoo.c!om
http://www.yahoo.com
msn msn
test ! number 1
Here is how I want them to behave:
msn.com (match)
msn-msn.net (match)
yahoo.c!om (fail)
http://www.yahoo.com (match)
msn msn (fail)
test ! number 1 (fail)
I'm using the tester here: http://regexlib.com/RETester.aspx before testing in my own app (C#, .NET 4.0)
The expression that is closest is this, but it doesn't match the http://www.yahoo.com one:
^[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*
Any help is appreciated. Additionally, somebody should come up with a more human-readable equivalent to RegEx...this stuff is a nightmare.
Thanks,
Beems
If you can't guarantee that the URL-esque pattern you're trying to match has a scheme/protocol, then the safest thing to do is match against top-level domains:
^(https?://)?[^/]*\.(possibly|really|long|list|of|valid|top|level|domains)
From your post it's evidently not necessary to go into the path, hash, or querystring parts of a URL, so that's it!
This one appears to work as desired:
[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?
Can anyone see any issues with this with regard to my original query? I don't need to validate whether the TLD is a real one, so that part isn't an issue.
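One way to check this is to run the expression over the six sample inputs. This is just a sketch; the UrlCheck class name is made up, and the pattern is used unanchored, exactly as written above.

```csharp
using System;
using System.Text.RegularExpressions;

static class UrlCheck
{
    // The expression from the post above, unanchored as written.
    private static readonly Regex Pattern =
        new Regex(@"[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?");

    // True when any part of the input matches the expression.
    public static bool LooksLikeUrl(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

With IsMatch, the six inputs from the question come out as desired: msn.com, msn-msn.net and http://www.yahoo.com match, and the other three fail. One thing to watch: because the pattern has no ^ and $ anchors, an input such as "msn msn.com" also passes, since the "msn.com" part matches on its own.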
I agree with kojiro.
But this does match your tests:
http://www.rubular.com/r/gUb4U6Pzux
I have never used regular expressions before and plan to use them to solve my problem, but I'm not sure whether they can help me.
I need to store a rule or formula for building string values, like the following examples, in a database field, and then retrieve the rule and build the string value from it.
FacilityCode + Left(ModelNO,2)
Right(PO,3) + Left(Serial,2)
Is this achievable using .NET regular expressions? Are there any good tutorials or simple examples for this problem?
Regex references: http://msdn.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
But regex doesn't seem like a good fit :)
It might be better to write your own string-building code. Regex is for searching data, not creating data.
The thing to remember about regex is that it is like an aircraft carrier; it does one thing very very well, it does not do other jobs very well at all.
An aircraft carrier moves planes very well on the ocean; it does not make a cheese sandwich well AT ALL!!
That is to say, if you use regex when you shouldn't, you will almost certainly use far more processing power than if you used another tool for the job. HTML parsing comes to mind.
Regex is provided as part of System.Text.RegularExpressions, but you can't rely exclusively on it. It'll let you search existing strings, but you'll need to implement your own logic for building new strings based on what you find in the existing data.
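As a rough illustration of that split, here is a sketch where a regex only recognizes each term of the rule and ordinary code does the string building. Everything here is an assumption for the example: the RuleEvaluator name, passing field values in as a dictionary, and supporting only '+', Left and Right.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class RuleEvaluator
{
    // Matches one function term such as Left(ModelNO,2) or Right(PO,3).
    private static readonly Regex FunctionTerm =
        new Regex(@"^(Left|Right)\(\s*(\w+)\s*,\s*(\d+)\s*\)$");

    // Builds a string from a rule like "FacilityCode + Left(ModelNO,2)".
    // `fields` maps field names to their current values (hypothetical input shape).
    public static string Evaluate(string rule, IDictionary<string, string> fields)
    {
        var result = new StringBuilder();
        foreach (string rawTerm in rule.Split('+'))
        {
            string term = rawTerm.Trim();
            Match m = FunctionTerm.Match(term);
            if (!m.Success)
            {
                result.Append(fields[term]); // plain field reference
                continue;
            }

            string value = fields[m.Groups[2].Value];
            // Clamp the count so short values do not throw.
            int count = Math.Min(int.Parse(m.Groups[3].Value), value.Length);
            result.Append(m.Groups[1].Value == "Left"
                ? value.Substring(0, count)
                : value.Substring(value.Length - count));
        }
        return result.ToString();
    }
}
```

With FacilityCode = "F01" and ModelNO = "AB123", the rule "FacilityCode + Left(ModelNO,2)" evaluates to "F01AB". An unknown field name throws a KeyNotFoundException here, which a real implementation would probably want to handle more gracefully.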
Also, keep in mind that System.Text.RegularExpressions works differently from regex in Perl and other implementations. For example, it doesn't recognize POSIX character class definitions such as [[:alpha:]].
Since you're new to regex, you might want to check out the "Regular Expressions User Guide" on zytrax.com. It's not as comprehensive as an O'Reilly manual, but it'll do as a start.