Regex to match a fragment of the URL - c#

I have URLs like:
http://127.0.0.1:81/controller/verbOne/NXw4fDF8MXwxfDQ1?source=dddd
or
http://127.0.0.1:81/controller/verbTwo/NXw4fDF8MXwxfDQ1
I'd like to extract the last part of the path (NXw4fDF8MXwxfDQ1 in both examples). The host and port can change to anything (they will change when I publish to a live server). The controller never changes, and the verb part has two possibilities (verbOne or verbTwo).
Can anyone help me with the regex?
Thanks

Instead of using a regex, you could use the built-in functionality of Uri (Last() needs using System.Linq;):
Uri uri = new Uri("http://127.0.0.1:81/controller/verbOne/NXw4fDF8MXwxfDQ1?source=dddd");
var lastSegment = uri.Segments.Last();
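A runnable sketch of the Uri.Segments approach, using the two URLs from the question. The helper name LastSegment is hypothetical, and the TrimEnd('/') only matters if a URL might end with a trailing slash (an assumption, not something the question shows):

```csharp
using System;
using System.Linq;

// For /controller/verbOne/NXw4fDF8MXwxfDQ1, Uri.Segments is
// { "/", "controller/", "verbOne/", "NXw4fDF8MXwxfDQ1" }; the query (?source=dddd)
// is not part of Segments, so Last() is already the token.
static string LastSegment(string url) =>
    new Uri(url).Segments.Last().TrimEnd('/'); // TrimEnd only matters for a trailing slash

Console.WriteLine(LastSegment("http://127.0.0.1:81/controller/verbOne/NXw4fDF8MXwxfDQ1?source=dddd"));
Console.WriteLine(LastSegment("http://127.0.0.1:81/controller/verbTwo/NXw4fDF8MXwxfDQ1"));
```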

You're looking for the Uri and Path classes:
Path.GetFileName(new Uri(str).AbsolutePath)
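For comparison, a minimal sketch of the Path.GetFileName variant. Because AbsolutePath does not include the query string, ?source=dddd is dropped before Path.GetFileName sees the path:

```csharp
using System;
using System.IO;

// AbsolutePath is "/controller/verbOne/NXw4fDF8MXwxfDQ1" (no query string),
// so Path.GetFileName returns just the last path segment.
string str = "http://127.0.0.1:81/controller/verbOne/NXw4fDF8MXwxfDQ1?source=dddd";
string token = Path.GetFileName(new Uri(str).AbsolutePath);
Console.WriteLine(token); // NXw4fDF8MXwxfDQ1
```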

Why use a regex? You can search for the two strings "verbOne/" or "verbTwo/" and take the substring that follows, then cut off everything from the '?' onwards.
I think this is faster than a regex.
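A minimal sketch of that IndexOf/Substring idea, assuming the token always follows "verbOne/" or "verbTwo/" as in the question; ExtractToken is a hypothetical helper name and returns null when neither verb is present:

```csharp
using System;

// Hypothetical helper: find the verb marker, take what follows it,
// and cut the query string off at the first '?', if there is one.
static string? ExtractToken(string url)
{
    foreach (string marker in new[] { "verbOne/", "verbTwo/" })
    {
        int start = url.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0)
            continue;

        start += marker.Length;
        int end = url.IndexOf('?', start);
        return end < 0 ? url.Substring(start) : url.Substring(start, end - start);
    }
    return null; // neither verb was found
}

Console.WriteLine(ExtractToken("http://127.0.0.1:81/controller/verbOne/NXw4fDF8MXwxfDQ1?source=dddd"));
```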
krikit

Though everyone else here is right that a regex is not the best solution (it can fail in cases where the specialized parsers that already exist, like Uri, won't), I believe you could use the following regex. Since the host and port can change, it anchors on the controller and verb rather than on 127.0.0.1:81:
(?<=/controller/verb(One|Two)/)[a-zA-Z0-9]+
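A sketch of using a lookbehind from C#. Since the question says the host and port change, this version leaves the host out of the lookbehind and anchors on /controller/verbOne/ or /controller/verbTwo/ instead; .NET allows alternation inside a lookbehind, so (?:One|Two) is fine there. The Extract helper is hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// The lookbehind requires "/controller/verbOne/" or "/controller/verbTwo/" right before the
// token, so any host and port are accepted; the token itself is letters and digits only.
var tokenRegex = new Regex(@"(?<=/controller/verb(?:One|Two)/)[a-zA-Z0-9]+");

string? Extract(string url)
{
    Match m = tokenRegex.Match(url);
    return m.Success ? m.Value : null;
}

Console.WriteLine(Extract("http://127.0.0.1:81/controller/verbOne/NXw4fDF8MXwxfDQ1?source=dddd"));
Console.WriteLine(Extract("https://example.org:8080/controller/verbTwo/NXw4fDF8MXwxfDQ1"));
```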

Related

Regex to Replace the end of the Url

I have a URL that follows a pattern like the one below:
http://i.ebayimg.com/00/s/MTUw12323gxNTAw/$(KGr123qF,!p0F123Q~~60_12.JPG?set_id=88123231232F
I need a regex to find the end of the URL, _12.JPG, and replace it with _14.JPG. So basically I need to capture the _[numbers only].JPG pattern and replace it with my value.
var regex = new Regex(@"_\d+\.JPG");
var newUrl = regex.Replace(url, "_14.JPG");
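A self-contained version of those two lines, run against the sample URL from the question (the @ prefix makes the pattern a verbatim string, so \d and \. reach the regex engine unchanged):

```csharp
using System;
using System.Text.RegularExpressions;

string url = "http://i.ebayimg.com/00/s/MTUw12323gxNTAw/$(KGr123qF,!p0F123Q~~60_12.JPG?set_id=88123231232F";

// _\d+\.JPG matches an underscore, one or more digits, and a literal ".JPG";
// the query string after it is left untouched.
var regex = new Regex(@"_\d+\.JPG");
string newUrl = regex.Replace(url, "_14.JPG");

Console.WriteLine(newUrl);
```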
_[0-9]+\.JPG\?
works for the sample URL. You didn't really mention whether you wanted the ?set_id=88123231232F gone or not.
Basically, you normally don't need to worry about periods elsewhere in the URL. A false match is possible, but requiring the JPG extension right after the digits should keep that rare.
// JavaScript equivalent: /_(\d?\d)\.jpg/ig
var regex = new Regex(@"_(\d?\d)\.[Jj][Pp][Gg]");
That will capture one or two digits between an underscore and .jpg.
I will double-check this, but it should work for both one digit and two digits.
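If the extension can be in any case and its original casing should be kept, one option (a sketch, not part of the answer above) is to capture the extension and use RegexOptions.IgnoreCase; ReplaceImageSize and its newSize parameter are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

// _\d{1,2} is the same as _\d?\d: an underscore followed by one or two digits.
// (\.jpg) is captured so "$1" puts the extension back with its original casing.
static string ReplaceImageSize(string url, int newSize) =>
    Regex.Replace(url, @"_\d{1,2}(\.jpg)", "_" + newSize + "$1", RegexOptions.IgnoreCase);

Console.WriteLine(ReplaceImageSize(
    "http://i.ebayimg.com/00/s/MTUw12323gxNTAw/$(KGr123qF,!p0F123Q~~60_12.JPG?set_id=88123231232F", 14));
```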

URL regex - not getting it to work

I am using the following regex to find whether a URL is present in a text; however, it seems to miss some URLs like:
youtube.be/8P0BxJO
youtube.com/watch?v=VrmlFL
and also some bit.ly links (but not all)
Match m = Regex.Match(nc[i].InnerText,
    @"(http(s)?://)?([\w-]+\.)+[\w-]+(/\S\w[\w- ;,./?%&=]\S*)?");
if (m.Success)
{
    MessageBox.Show(nc[i].InnerText);
}
Any ideas how to fix it?
See this related question; the first answer should help you out. Its suggestion both finds links and then replaces them, so just take the part you need. This article and this one take different approaches that should get you more or less the same result.
Another (perhaps more reliable) non-regex approach would be to tokenize the string by splitting on spaces and punctuation, and then check whether each token is a valid URI using Uri.IsWellFormedUriString (which only works on well-formed URIs, as this question points out).
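A rough sketch of that tokenize-and-check idea. It splits on whitespace only (splitting on '.' or '/' would break the URLs apart), trims trailing punctuation, and retries scheme-less tokens such as youtube.com/watch?v=VrmlFL with an assumed http:// prefix. Requiring a '.' in the host is a heuristic of this sketch to reject plain words; FindUrls is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: whitespace-separated tokens, trailing punctuation trimmed,
// scheme-less tokens retried with an assumed "http://" prefix.
static List<string> FindUrls(string text)
{
    var found = new List<string>();
    foreach (string raw in text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
    {
        string token = raw.TrimEnd('.', ',', ';', ':', '!', '?', ')');
        string candidate = token.Contains("://") ? token : "http://" + token;

        // Heuristic (not part of the Uri API): require a '.' in the host so plain words are skipped.
        if (Uri.IsWellFormedUriString(candidate, UriKind.Absolute)
            && Uri.TryCreate(candidate, UriKind.Absolute, out var uri)
            && uri.Host.Contains("."))
        {
            found.Add(token);
        }
    }
    return found;
}

string sample = "Watch youtube.com/watch?v=VrmlFL or youtube.be/8P0BxJO, then say hello.";
foreach (string url in FindUrls(sample))
    Console.WriteLine(url);
```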

Regex to find WSDL files in HTML

I am writing a discovery service that takes a URL and returns the HTML located at that page.
From that page, I need to "scrape" all the WSDL URLs.
So I need something like the following, but I am not sure how to specify the regex to pass into the pattern matching.
string wsdlPattern = ""; // TODO: SOME REGEX THAT MATCHES WSDL http:{address}wsdl
Regex wsdlRegex = new Regex(wsdlPattern);
MatchCollection matches = wsdlRegex.Matches(html);
Can somebody please help me figure out how I can do this?
Try this:
http://[^\s]*?.wsdl
The regular text parts are obvious: it needs to start with http:// and end with .wsdl. [^\s] means "any non-whitespace character", and *? means "as few as possible" (this is necessary in case you have something like http://www.blah.com/a.wsdl<br>http://www.blah.com/b.wsdl. Without the ?, you'd match that whole thing as one string.)
This isn't perfect, but it should get you started.
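A sketch of plugging that pattern into the question's code, using Regex.Matches (Regex.Match only returns the first match) on the <br> example from the explanation; FindWsdlUrls is a hypothetical helper name:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as in the answer. Regex.Matches (not Match) returns every hit;
// the lazy *? makes each match stop at the first place ".wsdl" can match.
static string[] FindWsdlUrls(string html) =>
    Regex.Matches(html, @"http://[^\s]*?.wsdl")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

string[] found = FindWsdlUrls("http://www.blah.com/a.wsdl<br>http://www.blah.com/b.wsdl");
foreach (string url in found)
    Console.WriteLine(url);
```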
If you want to play with regex, this is a great resource:
http://www.gskinner.com/RegExr
I used the RE below for validating WSDL URLs; as you can see, I had to check whether they end with "?wsdl".
RE : (http|https):\/\/[^\s]*?.\?wsdl
Ignore Case : (?i)(http|https):\/\/[^\s]*?.\?wsdl(?-i)
( Test Case : http://localhost/WebService1.asmx?wSDl )
WSDLs can also be fetched via FTP or from local files, so:
(http|https|ftp|file)://[^\s]*?.(wsdl|WSDL)
Hope this helps!
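A sketch of the two patterns from this answer in C#. RegexOptions.IgnoreCase does the same job as the inline (?i)...(?-i) flags, and the \/ escapes are unnecessary in .NET, so they are dropped here; the ftp:// sample URL is made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// "?wsdl" endings, matched case-insensitively (same effect as (?i)...(?-i)).
var wsdlQuery = new Regex(@"(http|https)://[^\s]*?.\?wsdl", RegexOptions.IgnoreCase);

// Other schemes; the unescaped '.' before (wsdl|WSDL) matches any character except a newline.
var wsdlAnyScheme = new Regex(@"(http|https|ftp|file)://[^\s]*?.(wsdl|WSDL)");

Console.WriteLine(wsdlQuery.IsMatch("http://localhost/WebService1.asmx?wSDl"));     // True
Console.WriteLine(wsdlAnyScheme.IsMatch("ftp://example.com/services/orders.wsdl")); // True
```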

C# string.Split() Matching Both Slashes?

I've got a .NET 3.5 web application written in C# doing some URL rewriting that includes a file path, and I'm running into a problem. When I call string.Split('/') it matches both '/' and '\' characters. Is that... supposed to happen? I assumed that it would notice that the ASCII values were different and skip it, but it appears that I'm wrong.
// url = "someserver.com/user/token/files\subdir\file.jpg"
string[] buffer = url.Split('/');
The above code gives a string[] with 6 elements in it... which seems counterintuitive. Is there a way to force Split() to match ONLY the forward slash? Right now I'm lucky: since the offending slashes are at the end of the URL, I can just concatenate the rest of the elements in the string[], but it's a lot of work for what we're doing, and not a great solution to the underlying problem.
Anyone run into this before? Have a simple answer? I appreciate it!
More Code:
url = HttpContext.Current.Request.Path.Replace("http://", "");
string[] buffer = url.Split('/');
Turns out, Request.Path and Request.RawUrl are both changing my slashes, which is ridiculous. So, time to research that a bit more and figure out how to get the URL from a function that doesn't break my formatting. Thanks everyone for playing along with my insanity, sorry it was a misleading question!
When I try the following:
string url = @"someserver.com/user/token/files\subdir\file.jpg";
string[] buffer = url.Split('/');
Console.WriteLine(buffer.Length);
... I get 4. Post more code.
Something else is happening; paste more code.
string str = "a\\b/c\\d";
string[] ts = str.Split('/');
foreach (string t in ts)
{
    Console.WriteLine(t);
}
outputs
a\b
c\d
just like it should.
My guess is that you are converting / into \ somewhere.
You could use regex to convert all \ slashes to a temp char, split on /, then regex the temp chars back to \. Pain in the butt, but one option.
I suspect (without seeing your whole application) that the problem lies in the semantics of path delimiters in URLs. It sounds like you are trying to attach a semantic value to backslashes within your application that is contrary to the way HTTP protocols define and use backslashes.
This is just a guess, of course.
The best way to solve this problem might be modifying the application to encode the path in some other way (such as "%5C" for backslashes, maybe?).
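A minimal sketch of the %5C idea using Uri.EscapeDataString and Uri.UnescapeDataString, with the path from the question. It only shows the string-level round trip; whether IIS/ASP.NET passes %5C through to Request.Path unchanged is not something it verifies:

```csharp
using System;

// Percent-encode the part that contains backslashes; '\' becomes "%5C",
// so the only '/' characters left in the string are real path separators.
string filePart = @"files\subdir\file.jpg";
string url = "someserver.com/user/token/" + Uri.EscapeDataString(filePart);
Console.WriteLine(url); // someserver.com/user/token/files%5Csubdir%5Cfile.jpg

string[] buffer = url.Split('/');
string restored = Uri.UnescapeDataString(buffer[buffer.Length - 1]);
Console.WriteLine(restored); // files\subdir\file.jpg
```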
Those two functions are probably converting \ to / because \ is not a valid character in a URL (see Which characters make a URL invalid?). The browser (NOT C#, as you are inferring) is assuming that when you use that invalid character, you mean /, so it is "fixing" it for you. If you want \ in your URL, you need to encode it first.
The browsers themselves are actually the ones that make that change in the request, even if it happens behind the scenes. To verify this, just turn on Fiddler and look at the URLs that actually get sent when you go to a URL like this. IE and Chrome change the \ to / in the browser's address bar itself; Firefox doesn't, but the request goes out that way anyway.
Update:
How about this:
Regex.Split(url, "/");

Regular expression for validating a url

I'm a beginner with regexes. I need to validate everything from simple URLs to URLs with query strings, square brackets, etc. For example:
www.test.com?waa=[sample data]
The regex that I wrote only works for simple URLs. It fails for the one with square brackets. Any idea?
Do you really need to use a regex?
bool isUri = Uri.IsWellFormedUriString("http://...", UriKind.RelativeOrAbsolute);
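A short sketch showing why the square-bracket example trips up a strict check: the space and brackets have to be percent-encoded. The http:// prefix is added as an assumption, since UriKind.Absolute needs a scheme:

```csharp
using System;

// The raw query value contains a space and square brackets; encoding it first
// yields a string that IsWellFormedUriString accepts.
string raw = "http://www.test.com/?waa=[sample data]";
string escaped = "http://www.test.com/?waa=" + Uri.EscapeDataString("[sample data]");

Console.WriteLine(escaped); // http://www.test.com/?waa=%5Bsample%20data%5D
Console.WriteLine(Uri.IsWellFormedUriString(raw, UriKind.Absolute));
Console.WriteLine(Uri.IsWellFormedUriString(escaped, UriKind.Absolute));
```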
I would suggest taking a better look at the following site
http://www.regular-expressions.info/dotnet.html
Without actually seeing the regex you're using, I can't provide much insight. And giving you the answer wouldn't really teach you much either. Give a man a regex and you help him for a bit; teach him regex and he's good for life.
Take a look at the following:
http://www.geekzilla.co.uk/view2D3B0109-C1B2-4B4E-BFFD-E8088CBC85FD.htm
Thanks a lot for the reply.
This is what I wrote. It works for query strings too, but it fails when I add []:
/^(https?|ftp)://(?#)(([a-z0-9$.+!*\'(),;\?&=-]|%[0-9a-f]{2})+(?#)(:([a-z0-9$.+!*\'(),;\?&=-]|%[0-9a-f]{2})+)?(?#)#)?(#)((([a-z0-9][a-z0-9-][a-z0-9].)(#)[a-z]{2}[a-z0-9-]a-z0-9|(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5].){3}(?#)(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?#))(:\d+)?(?#))(((/+([a-z0-9$_.+!*\'(),;:#&=-]|%[0-9a-f]{2}))(?# )(\?([a-z0-9$_.+!*\'(),;:#&=-]|%[0-9a-f]{2}))(?#)?)?)?(?#)(#([a-z0-9$_.+!*\'(),;:#&=-]|%[0-9a-f]{2})*)?(?#)$/i
Use this if you want URLs with http:
http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
If you don't want to require http in the URL, go for:
(http(s)?://)?([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
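If you do stay with a regex, here is a sketch (not the answerer's pattern) that anchors the match, makes the scheme optional, lets a query follow the host directly, and adds square brackets to the allowed characters so the question's www.test.com?waa=[sample data] example passes. Which characters to allow is an assumption you would tune:

```csharp
using System;
using System.Text.RegularExpressions;

// ^...$ makes the whole string match; (https?://)? makes the scheme optional;
// [/?] lets either a path or a query start right after the host;
// \[ and \] add square brackets to the allowed path/query characters.
var urlPattern = new Regex(@"^(https?://)?([\w-]+\.)+[\w-]+([/?][\w- ./?%&=\[\]]*)?$");

foreach (var candidate in new[] { "www.test.com?waa=[sample data]", "http://www.test.com/path?q=1", "not a url" })
    Console.WriteLine($"{candidate} -> {urlPattern.IsMatch(candidate)}");
```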
