How to remove PROTOCOL from URI - c#

How can I remove the protocol from a URI, i.e. remove the "http://" part?

You can use the System.Uri class like this:
System.Uri uri = new Uri("http://stackoverflow.com/search?q=something");
string uriWithoutScheme = uri.Host + uri.PathAndQuery + uri.Fragment;
This will give you stackoverflow.com/search?q=something
Edit: this also works for about:blank :-)

The best (and to me most beautiful) way is to use the Uri class for parsing the string to an absolute URI and then use the GetComponents method with the correct UriComponents enumeration to remove the scheme:
Uri uri;
if (Uri.TryCreate("http://stackoverflow.com/...", UriKind.Absolute, out uri))
{
return uri.GetComponents(UriComponents.AbsoluteUri &~ UriComponents.Scheme, UriFormat.UriEscaped);
}
For further reference: the UriComponents enumeration is decorated with the FlagsAttribute, so bitwise operations (e.g. & and |) can be used on it. In this case, & ~ clears the UriComponents.Scheme bit from UriComponents.AbsoluteUri by combining the bitwise AND operator with the bitwise complement operator.
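A small sketch of that flag arithmetic, runnable without parsing any URL. The names UriFlagsDemo, AllButScheme and Includes are made up for illustration; only UriComponents itself comes from the framework.

```csharp
using System;

public static class UriFlagsDemo
{
    // All components of an absolute URI (scheme, user info, host, port, path,
    // query, fragment) with the Scheme bit cleared via AND + complement.
    public static readonly UriComponents AllButScheme =
        UriComponents.AbsoluteUri & ~UriComponents.Scheme;

    // True when every bit of 'flag' is set in 'parts' (the same test Enum.HasFlag performs).
    public static bool Includes(UriComponents parts, UriComponents flag) =>
        (parts & flag) == flag;
}
```

With this, uri.GetComponents(UriFlagsDemo.AllButScheme, UriFormat.UriEscaped) is the same call as in the answer above; Includes(AllButScheme, UriComponents.Scheme) is false while Host, Port, Path, Query and Fragment are still set.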

In the general sense (not limited to http/https), an absolute URI is always a scheme followed by a colon, followed by scheme-specific data. So the only safe thing to do is cut at the scheme:
string s = "http://stackoverflow.com/questions/4517240/";
int i = s.IndexOf(':');
if (i > 0) s = s.Substring(i + 1);
In the case of http and a few others you may also want to .TrimStart('/'), but this is not part of the scheme, and is not guaranteed to exist. Trivial example: about:blank.

You could use a regular expression for this. The sample below captures everything after "http://" or "https://":
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string txt = "http://www.google.com";
// Match "http://" or "https://" and capture what follows, up to whitespace or a quote.
// (The capture group must start after "://", otherwise the scheme is kept.)
string re1 = "(?:http|https)://([^\\s\"]+)";
Regex r = new Regex(re1, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
string urlWithoutScheme = m.Groups[1].Value;
Console.WriteLine("(" + urlWithoutScheme + ")"); // prints (www.google.com)
}
Console.ReadLine();
}
}
}
Let me know if this helps
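If the goal is to drop any scheme rather than only http/https, a sketch with Regex.Replace could anchor the pattern at the start of the string and follow the RFC 3986 scheme syntax (a letter, then letters, digits, '+', '-' or '.'). The class name SchemeRegex is hypothetical. Because the pattern requires "://", schemes written without slashes (such as about:blank or mailto:) are left alone, and because of the ^ anchor a "://" buried later in the string, e.g. in a query parameter, is never touched.

```csharp
using System.Text.RegularExpressions;

public static class SchemeRegex
{
    // ^ anchors at the start of the input (no Multiline), so at most one prefix is removed.
    // [A-Za-z][A-Za-z0-9+.-]* is the RFC 3986 scheme grammar; "://" must follow it.
    private static readonly Regex LeadingScheme =
        new Regex(@"^[A-Za-z][A-Za-z0-9+.-]*://", RegexOptions.Compiled);

    // Returns the input without its leading "scheme://", or unchanged if there is none.
    public static string RemoveScheme(string url) => LeadingScheme.Replace(url, string.Empty);
}
```

For example, SchemeRegex.RemoveScheme("http://www.google.com") returns "www.google.com", while "www.example.com" comes back unchanged.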

It's not the most beautiful way, but try something like this:
var uri = new Uri("http://www.example.com");
var scheme = uri.Scheme;
var result = uri.ToString().Substring(scheme.Length + 3); // + 3 skips the "://" after the scheme

The above answers work in most cases, but IMO it's not a complete solution:
uri.Host + uri.PathAndQuery + uri.Fragment;
drops port if specified (e.g. http://www.example.com:8080/path/ becomes www.example.com/path/ )
uri.GetComponents(UriComponents.AbsoluteUri & ~UriComponents.Scheme, UriFormat.UriEscaped)
preserves ports and seems generally better, but in some cases (most likely malformed input, but not impossible) I got some characters escaped that shouldn't have been.
In both cases a '/' is added after a bare host (e.g. http://www.example.com becomes www.example.com/), so if your URL is sensitive to that difference, or you care how it looks, you need to check whether the slash was present before and, if not, TrimEnd it.
On top of that, both of those solutions throw an exception if the Uri is considered invalid, so if your URL already lacks a scheme (e.g. www.example.com), the code above fails.
If you want something really generic and working for input over which you might not have control (e.g. user input), I'd probably stick to a simpler solution, e.g:
static string RemoveScheme(string url)
{
var endOfSchemeIdx = url.IndexOf("://", StringComparison.Ordinal);
if (endOfSchemeIdx != -1)
return url.Substring(endOfSchemeIdx + 3);
return url;
}
You can also fetch the scheme via a library like Flurl (which doesn't throw an exception on www.example.com), look up the first occurrence of the scheme followed by "://", and delete it if it exists. I feel safer if the rest of the URL is not processed by any library, unless that is your intention.
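One way to keep the GetComponents approach while handling the two caveats described in this answer (the added '/' and the exception on scheme-less input) is sketched below. It is a helper built on stated assumptions, not a drop-in: the names SchemeStripper and WithoutScheme are hypothetical, input that does not parse as an absolute URI is returned unchanged, and only a slash added at the very end is removed.

```csharp
using System;

public static class SchemeStripper
{
    // Returns the URL without "scheme://", keeping user info, port, path, query and fragment.
    // Assumption: unparseable input (e.g. "www.example.com") is returned as-is instead of throwing.
    public static string WithoutScheme(string url)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri))
        {
            return url;
        }

        string result = uri.GetComponents(
            UriComponents.AbsoluteUri & ~UriComponents.Scheme,
            UriFormat.UriEscaped);

        // Uri normalizes an empty path to "/", so "http://www.example.com" comes back as
        // "www.example.com/". Drop that trailing slash if the caller did not write one.
        if (!url.EndsWith("/") && result.EndsWith("/"))
        {
            result = result.TrimEnd('/');
        }

        return result;
    }
}
```

A slash that Uri inserts before a query string (http://www.example.com?q=1 normalizes to http://www.example.com/?q=1) is not undone by this sketch; if that matters, the IndexOf("://") approach above avoids the normalization entirely.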

Related

Parse Line and Break it into Variables

I have a text file that contains only the full version number of an application, which I need to extract and then parse into separate variables.
For example, let's say version.cs contains 19.1.354.6
The code I'm using does not seem to be working:
char[] delimiter = { '.' };
string currentVersion = System.IO.File.ReadAllText(@"C:\Applicaion\version.cs");
string[] partsVersion;
partsVersion = currentVersion.Split(delimiter);
string majorVersion = partsVersion[0];
string minorVersion = partsVersion[1];
string buildVersion = partsVersion[2];
string revisVersion = partsVersion[3];
Although your problem is with the file (most likely it contains text other than the version), why don't you use the Version class, which is designed for exactly this kind of task?
var version = new Version("19.1.354.6");
var major = version.Major; // etc..
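If the file turns out to hold the version plus surrounding whitespace (for example a trailing newline), trimming before parsing is usually enough. A minimal sketch using Version.TryParse, so bad input yields null instead of an exception; the names VersionFile, ParseVersionText and ReadVersionFile are hypothetical:

```csharp
using System;
using System.IO;

public static class VersionFile
{
    // Trims whitespace such as a trailing "\r\n", then parses; returns null if the
    // text is not a version (e.g. the file holds a whole C# attribute instead).
    public static Version ParseVersionText(string text)
    {
        return Version.TryParse(text.Trim(), out Version version) ? version : null;
    }

    // Hypothetical wrapper that reads the file the question mentions.
    public static Version ReadVersionFile(string path) => ParseVersionText(File.ReadAllText(path));
}
```

ParseVersionText("19.1.354.6\r\n") gives a Version whose Major, Minor, Build and Revision are 19, 1, 354 and 6.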
What you have works fine with the correct input, so I would suggest making sure there is nothing else in the file you're reading.
In the future, please provide error information, since we can't usually tell exactly what you expect to happen, only what we know should happen.
In light of that, I would also suggest looking into using Regex for parsing in the future. In my opinion, it provides a much more flexible solution for your needs. Here's an example of regex to use:
var regex = new Regex(@"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)");
var match = regex.Match("19.1.354.6");
if (match.Success)
{
Console.WriteLine("Match[1]: "+match.Groups[1].Value);
Console.WriteLine("Match[2]: "+match.Groups[2].Value);
Console.WriteLine("Match[3]: "+match.Groups[3].Value);
Console.WriteLine("Match[4]: "+match.Groups[4].Value);
}
else
{
Console.WriteLine("No match found");
}
which outputs the following:
// Match[1]: 19
// Match[2]: 1
// Match[3]: 354
// Match[4]: 6

How to retrieve the locale(country) code from URL?

I have a URL, which is like http://example.com/UK/Deal.aspx?id=322
My target is to remove the locale(country) part, to make it like http://example.com/Deal.aspx?id=322
Since the URL may have other similar formats like: https://ssl.example.com/JP/Deal.aspx?id=735, using "substring" function is not a good idea.
What I can think of is to use the following properties to separate the parts, and map them back together later.
HttpContext.Current.Request.Url.Scheme
HttpContext.Current.Request.Url.Host
HttpContext.Current.Request.Url.AbsolutePath
HttpContext.Current.Request.Url.Query
And, suppose HttpContext.Current.Request.Url.AbsolutePath will be:
/UK/Deal.aspx?id=322
I am not sure how to deal with this, since my boss asked me not to use regular expressions (he thinks they will impact performance).
Apart from regular expressions, is there any other way to remove UK from it?
P.S.: the UK part may be JP, DE, or another country code.
By the way, for the USA there is no country code, and the URL will be http://example.com/Deal.aspx?id=322
Please also take this situation into consideration.
Thank you.
Assuming the URL contains a two-letter ISO country code, you can use the UriBuilder class to remove that segment from the path without using a regex.
E.g.
var sourceUri = new Uri("http://example.com/UK/Deal.aspx?id=322");
if (IsLocaleEnabled(sourceUri))
{
var builder = new UriBuilder(sourceUri);
// Segments[1] is "UK/"; note that Replace removes every occurrence of it in the path
builder.Path
= builder.Path.Replace(sourceUri.Segments[1], string.Empty);
// Construct the Uri with the new path
Uri newUri = builder.Uri;
}
Update:
// Cache the instance for performance benefits.
static readonly Regex regex = new Regex(@"^[a-zA-Z]{2}/$", RegexOptions.Compiled);
/// <summary>
/// Checks whether the first URL segment after the root
/// is a two-letter ISO code (e.g. "UK/")
/// </summary>
private bool IsLocaleEnabled(Uri sourceUri)
{
// Update: compiled regexes are usually faster to execute than non-compiled ones, at the cost of slower construction.
// A URL with only the root ("/") has no Segments[1], so check the length first.
return sourceUri.Segments.Length > 1 && regex.IsMatch(sourceUri.Segments[1]);
}
For performance, cache the regex (i.e. keep it in a static readonly field): there's no need to parse a predefined pattern on every request.
Result - http://example.com/Deal.aspx?id=322
It all depends on whether the country code always has the same position. If it doesn't, then some more details on the possible formats are required. Maybe you could check whether the first segment has two characters, to be sure it really is a country code (not sure if this is reliable, though). Or you could start from the file name, if it's always in the format /[optionalCountryCode]/deal.aspx?...
How about these two approaches (on string level):
public string RemoveCountryCode()
{
Uri originalUri = new Uri("http://example.com/UK/Deal.aspx?id=322");
string hostAndPort = originalUri.GetLeftPart(UriPartial.Authority);
// v1: if country code is always there, always has same position and always
// has format 'XX' this is definitely the easiest and fastest
string trimmedPathAndQuery = originalUri.PathAndQuery.Substring("/XX/".Length);
// v2: if country code is always there, always has same position but might
// not have a fixed format (e.g. XXX)
trimmedPathAndQuery = string.Join("/", originalUri.PathAndQuery.Split('/').Skip(2));
// in both cases you need to join it with the authority again
return string.Format("{0}/{1}", hostAndPort, trimmedPathAndQuery);
}
If the AbsolutePath will always have the format /XX/pagename.aspx, where XX is the two-letter country code (the ?id=### part is in Url.Query, not in AbsolutePath), then you can just strip off the first 3 characters.
Example that removes the first 3 characters:
var targetURL = HttpContext.Current.Request.Url.AbsolutePath.Substring(3);
If the country code could be different lengths, then you could find the index of the second / character and start the substring from there.
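var sourceURL = HttpContext.Current.Request.Url.AbsolutePath;
var firstOccurrence = sourceURL.IndexOf('/');
// Start searching after the first '/', otherwise IndexOf finds the same character again
var secondOccurrence = sourceURL.IndexOf('/', firstOccurrence + 1);
// No second '/' means there is no country code (e.g. the USA case), so keep the path as is
var targetURL = secondOccurrence >= 0 ? sourceURL.Substring(secondOccurrence) : sourceURL;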
The easy way would be to treat it as a string: split it by the "/" separator, remove the fourth element, and then join the parts back with the "/" separator:
string myURL = "https://ssl.example.com/JP/Deal.aspx?id=735";
List<string> myURLsplit = myURL.Split('/').ToList();
// RemoveAt returns void, so it cannot be chained; index 3 is "JP" in { "https:", "", "ssl.example.com", "JP", "Deal.aspx?id=735" }
myURLsplit.RemoveAt(3);
myURL = string.Join("/", myURLsplit);
RESULT: https://ssl.example.com/Deal.aspx?id=735
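Since the question rules out regular expressions and also has to cope with the USA case (no country code at all), here is a sketch that inspects Uri.Segments directly. The key assumption, stated loudly: any first path segment of exactly two ASCII letters is treated as a country code, so a URL such as /js/app.js would also lose its first segment. CountryCodeUrl and RemoveCountryCode are hypothetical names.

```csharp
using System;

public static class CountryCodeUrl
{
    // Removes a leading two-letter segment ("UK/", "JP/", "DE/") without using a regex.
    // URLs without one (the USA case) are returned unchanged.
    public static string RemoveCountryCode(string url)
    {
        var uri = new Uri(url);
        string[] segments = uri.Segments; // e.g. { "/", "UK/", "Deal.aspx" }

        // Need the root, the country segment and at least one more segment (the page).
        if (segments.Length < 3 || !IsCountrySegment(segments[1]))
        {
            return url;
        }

        var builder = new UriBuilder(uri)
        {
            // Keep the root "/" plus everything after the country segment; the query is kept by UriBuilder.
            Path = segments[0] + string.Join(string.Empty, segments, 2, segments.Length - 2)
        };
        return builder.Uri.ToString();
    }

    private static bool IsCountrySegment(string segment) =>
        segment.Length == 3 && segment[2] == '/' && IsAsciiLetter(segment[0]) && IsAsciiLetter(segment[1]);

    private static bool IsAsciiLetter(char c) => (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
```

For example, RemoveCountryCode("http://example.com/UK/Deal.aspx?id=322") returns "http://example.com/Deal.aspx?id=322", and the USA form http://example.com/Deal.aspx?id=322 is returned as it was.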

Regular Expression for allowing multiple language input

Quick question regarding regular expression validation on textbox entry. Basically, I have a textbox that I use for user input in the form of a website address. The user can input anything (it doesn't have to be a valid website address, e.g. www.facebook.com); they could enter "blah blah", and that's fine, but it will not run.
What I am after is accepting other languages (Arabic, Greek, Chinese, etc.), because at present I only allow English characters.
The code for the method is below. I believe I will have to switch this from a whitelist to a blacklist: instead of checking what matches, change the expression to match invalid characters, and if the user enters one of these, don't allow it.
public static bool IsValidAddress(string path)
{
bool valid = false;
valid = (path.Length > 0);
if (valid)
{
string regexPattern = @"([0-9a-zA-Z*?]{1})([-0-9a-zA-Z_\.*?]{0,254})";
// Eliminate the '"' character first up so it simplifies regular expressions.
valid = (path.Contains("\"") == false);
if (valid)
{
valid = IsValidAddress(path, regexPattern);
}
if (valid)
{
// Need an additional check to determine that the address does not begin with xn--,
// which is not permitted by the Internationalized Domain Name standard.
valid = (path.IndexOf("xn--") != 0);
}
}
return valid;
}
As you can see, I have the 0-9a-zA-Z included, but by default this will eliminate other languages, whereas I wish to include the languages.
Any help is greatly appreciated. If I've confused anyone, sorry! I can give more information if it is needed.
Thanks.
I don't know why you're trying to validate URIs with a regex. .NET's Uri class is surely a much better match for your task, no?
Uri uri;
if(!Uri.TryCreate(uriString, UriKind.Absolute, out uri))
{
//it's a bad URI
}
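If you do keep a regex, .NET's Unicode category escapes let the question's pattern accept other scripts: \p{L} matches any letter (Latin, Greek, Arabic, CJK and so on) and \p{N} any number. The sketch below is a hypothetical replacement for the question's IsValidAddress overload; unlike the original it anchors the pattern with ^ and \z, because an unanchored pattern can match just part of the input. The '"' and "xn--" checks from the question are not repeated here.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeAddressCheck
{
    // Same shape as the question's pattern, with \p{L}\p{N} in place of 0-9a-zA-Z:
    // one letter/number (or * ?), then up to 254 letters, numbers, '-', '_', '.', '*' or '?'.
    private static readonly Regex AddressPattern =
        new Regex(@"^[\p{L}\p{N}*?][-\p{L}\p{N}_.*?]{0,254}\z", RegexOptions.Compiled);

    public static bool IsValidAddress(string path) =>
        path.Length > 0 && AddressPattern.IsMatch(path);
}
```

With this, inputs such as "例子.测试" or "παραδειγμα.gr" pass, while input containing spaces or quotes does not.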

C# Regex.Replace

This is my set of strings inside richtextbox1:
/Category/5
/Category/4
/Category/19
/Category/22
/Category/26
/Category/27
/Category/24
/Category/3
/Category/1
/Category/15
http://example.org/Category/15/noneedtoadd
I want to replace every leading "/" with a URL like "http://example.com/".
output:
http://example.com/Category/5
http://example.com/Category/4
http://example.com/Category/19
http://example.com/Category/22
http://example.com/Category/26
http://example.com/Category/27
http://example.com/Category/24
http://example.com/Category/3
http://example.com/Category/1
http://example.com/Category/15
http://example.org/Category/15/noneedtoadd
just asking, what is the pattern for that? :)
You don't need a regular expression here. Iterate through the items in your list and use String.Format to build the desired URL.
String.Format(@"http://example.com{0}", str);
If you want to check whether one of the items in that textbox is already a fully formed URL before prepending the string, use String.StartsWith on the item:
if (!str.StartsWith("http://")) {
// use String.Format
}
Since you're dealing with URIs, you can take advantage of the Uri class, which can resolve relative URIs:
Uri baseUri = new Uri("http://example.com/");
Uri result1 = new Uri(baseUri, "/Category/5");
// result1 == {http://example.com/Category/5}
Uri result2 = new Uri(baseUri, "http://example.org/Category/15/noneedtoadd");
// result2 == {http://example.org/Category/15/noneedtoadd}
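Applied to the whole textbox, the same idea could look like the sketch below. It assumes one URL per line and that "relative" simply means "starts with /", which mirrors the question. The relative part is created with UriKind.Relative explicitly because, on non-Windows .NET, a string beginning with "/" may otherwise be parsed as an absolute file path. CategoryLinks and ResolveLines are hypothetical names.

```csharp
using System;
using System.Linq;

public static class CategoryLinks
{
    // Resolves each non-empty line against baseUrl; lines that do not start with '/'
    // (e.g. absolute http:// URLs) are kept unchanged.
    public static string ResolveLines(string text, string baseUrl)
    {
        var baseUri = new Uri(baseUrl);
        var resolved = text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .Select(line => line[0] == '/'
                ? new Uri(baseUri, new Uri(line, UriKind.Relative)).ToString()
                : line);
        return string.Join("\n", resolved);
    }
}
```

Calling ResolveLines on the question's list with "http://example.com/" as the base yields http://example.com/Category/5 and so on, while http://example.org/Category/15/noneedtoadd stays as it is.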
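The raw regex pattern is ^/, which matches a slash at the beginning of a line. Because the textbox holds several lines, pass RegexOptions.Multiline; without it, ^ only matches at the very start of the whole text.
Regex.Replace(text, @"^/", "http://example.com/", RegexOptions.Multiline)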

RegEx for an IP Address

I'm trying to extract the value (IP address) of wan_ip with this source code:
What's wrong? I'm sure the regex pattern is correct.
String input = @"var product_pic_fn=;var firmware_ver='20.02.024';var wan_ip='92.75.120.206';if (parent.location.href != window.location.href)";
Regex ip = new Regex(@"[\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b");
string[] result = ip.Split(input);
foreach (string bla in result)
{
Console.WriteLine(bla);
}
Console.Read();
The [ shouldn't be at the start of your pattern. Also, you probably want to use Matches(...).
Try:
String input = @"var product_pic_fn=;var firmware_ver='20.02.024';var wan_ip='92.75.120.206';if (parent.location.href != window.location.href)";
Regex ip = new Regex(@"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b");
MatchCollection result = ip.Matches(input);
Console.WriteLine(result[0]);
This is a very old post and you should use the accepted solution, but consider using a correct regex for an IPv4 address:
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
If you want to reject extra characters before or after the address, you can use:
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
Try this:
Match match = Regex.Match(input, @"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}");
if (match.Success)
{
Console.WriteLine(match.Value);
}
If you just want to check whether a string is a valid IP address, use IPAddress.TryParse:
using System.Net;
bool isIP(string host)
{
IPAddress ip;
return IPAddress.TryParse(host, out ip);
}
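Note that IPAddress.TryParse is more permissive than a dotted-quad regex: it also accepts IPv6 addresses and shorthand IPv4 forms such as "127.1". If you need exactly four decimal parts, one option is to check the shape first and only then call TryParse, as in this sketch. Rejecting leading zeros is my own assumption (IPAddress may read a part such as 010 as octal); StrictIPv4 and TryParseDottedQuad are hypothetical names.

```csharp
using System.Net;
using System.Net.Sockets;

public static class StrictIPv4
{
    // Accepts only a.b.c.d with four decimal parts in 0-255 and no leading zeros,
    // then lets IPAddress.TryParse build the address.
    public static bool TryParseDottedQuad(string text, out IPAddress address)
    {
        address = null;
        string[] parts = text.Split('.');
        if (parts.Length != 4)
        {
            return false;
        }

        foreach (string part in parts)
        {
            // 1 to 3 characters, and "0" is the only part allowed to start with '0'.
            if (part.Length == 0 || part.Length > 3 || (part.Length > 1 && part[0] == '0'))
            {
                return false;
            }
            foreach (char c in part)
            {
                if (c < '0' || c > '9') return false;
            }
            if (int.Parse(part) > 255)
            {
                return false;
            }
        }

        return IPAddress.TryParse(text, out address)
            && address.AddressFamily == AddressFamily.InterNetwork;
    }
}
```

With this, "92.75.120.206" is accepted, while "355.168.0.1", "127.1" and "::1" are rejected.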
I know this post isn't new, but, I've tried several of the proposed solutions and none of them work quite as well as one I found thanks to a link provided by Justin Jones. They have quite a few for IP Address but this is the top of the list and using LinqPad (I LOVE LinqPad) most tests I've thrown at it work extremely well. I recommend utilizing this one rather than any of the previous provided expressions:
^(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])$
Give that a shot in LinqPad with the following:
// \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b 355.168.0.1 = 355.168.0.1 (Not Correct)
// ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) 355.168.0.1 = 55.168.0.1 (Not correct)
// \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} 355.168.0.1 = 355.168.0.1 (Not Correct)
// ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ 355.168.0.1 = 355.168.0.1 (Not Correct)
// ^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$ 355.168.0.1 = 355.168.0.1 (Not Correct)
// ^(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])$ 355.168.0.1 = No Match (Correct)
Match match = Regex.Match("355.168.0.1", @"^(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])$");
if (match.Success) {
Console.WriteLine(match.Value);
}
else {
Console.WriteLine("No match.");
}
With the new regex, 355.168.0.1 gives No Match, which is correct, as noted in the comments.
I welcome any tweaks to this as I'm working on a tool that is making use of the expression and am always looking for better ways of doing this.
UPDATE: I've created a .NET Fiddle project to provide a working example of this expression along with a list of IP Addresses that test various values. Feel free to tinker with it and try various values to exercise this expression and provide any input if you find a case where the expression fails. https://dotnetfiddle.net/JoBXdI
UPDATE 2: Better yet refer to this post: Another related question.
Thanks and I hope this helps!
Regex.IsMatch(input, @"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$")
Avoid using \b: it allows characters before or after the IP.
For example, ...198.192.168.12... was considered valid.
Use ^ and $ instead if you can split the input into chunks that isolate the IP address.
Regex regexIP = new Regex(@"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$");
if (regexIP.Match(textBoxIP.Text).Success){
String myIP = textBoxIP.Text;
}
Note that the above does not validate the values: as pointed out, 172.316.254.1 was accepted. It only checks the format.
UPDATE: To validate FORMATTING and VALUES you could use
Regex regexIP = new Regex(@"^([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\.([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\.([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])\.([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])$");
if (regexIP.Match(textBoxIP.Text).Success){
String myIP = textBoxIP.Text;
}
(note using ([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5]) for each numeric value)
Credit: https://stackoverflow.com/a/10682785/4480932
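When the input cannot be split into chunks (like the JavaScript line in the question), lookarounds can stand in for \b while also refusing a neighbouring dot or digit, so 1.2.3.4.5 yields no partial match. A sketch; IpExtractor and Extract are hypothetical names, and like the ^...$ version it checks only the format, not the 0-255 range:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class IpExtractor
{
    // (?<![\d.]) : the match must not be preceded by a digit or '.'
    // (?![\d.])  : and must not be followed by one, so longer dotted runs are skipped.
    private static readonly Regex DottedQuad =
        new Regex(@"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])");

    public static string[] Extract(string input) =>
        DottedQuad.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
}
```

On the question's input it returns a single element, 92.75.120.206; the firmware version 20.02.024 has only three parts and is skipped.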
I think you need to get rid of the [ - is that a stray character or what?
Regex(@"[\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")
Regex(@"\A\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\z") // try this instead (the dots must be escaped, or they match any character)
I took this pattern from UrlAttribute.cs in the System.ComponentModel.DataAnnotations namespace. As you can see, I took just a piece of the original pattern from the source.
Regex.IsMatch(input, @"^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])$");
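(\d{1,3}\.){3}(\d{1,3})[^\d\.]
Check this. Note that it requires a character other than a digit or a dot right after the address (that character becomes part of the match), so it will not match an address at the very end of the input, and it does not check that each value is at most 255.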
Another variant, depending on how you want to treat padding (here a.b.c.00 is considered an invalid format):
^(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])){3}$
In Python:
>>> from re import match
>>> ip_regex = r'^{0}\.{0}\.{0}\.{0}$'.format(r'(25[0-5]|(?:2[0-4]|1\d|[1-9])?\d)')
>>> match(ip_regex, '10.11.12.13')
<re.Match object; span=(0, 11), match='10.11.12.13'>
>>> _.groups()
('10', '11', '12', '13')
>>>
