IP address parsing in .NET - c#

I'm using IPAddress.TryParse() to parse IP addresses. However, it's a little too permissive (parsing "1" returns 0.0.0.1). I'd like to limit the input to dotted octet notation. What's the best way to do this?
(Note: I'm using .NET 2.0)
Edit
Let me clarify:
I'm writing an app that will scan a range of IPs looking for certain devices (basically a port scanner). When the user enters "192.168.0.1" for the starting address, I want to automatically fill in "192.168.0.255" as the ending address. The problem is that when they type "1", it parses as "0.0.0.1" and the ending address fills in as "0.0.0.255" - which looks goofy.

If you are interested in parsing the format, then I'd use a regular expression. Here's a good one (source):
bool IsDottedDecimalIP(string possibleIP)
{
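    // Needs: using System.Net; using System.Text.RegularExpressions;
    Regex r = new Regex(@"\b(?:\d{1,3}\.){3}\d{1,3}\b");
    IPAddress parsed; // TryParse requires an out argument; null will not compile
    return r.IsMatch(possibleIP) && IPAddress.TryParse(possibleIP, out parsed);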
}
That regex doesn't catch invalid IPs but does enforce your pattern. The TryParse checks their validity.

An IPv4 address is actually a 32-bit number. It is not xxx.xxx.xxx.xxx; that's just a human-readable format for the same value. So IP address 1 is actually 0.0.0.1.
EDIT: Given the clarification, you could either go with a regex as has been suggested, or you could expand the shortcuts to your liking. If you want "1" to appear as "1.0.0.0", you could append ".0.0.0" to the input and still use the parse method.
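As an alternative to the regex, the dotted-octet restriction from the question can also be sketched by splitting the input and parsing each octet with byte.TryParse, which is available in .NET 2.0. This is only a sketch under assumptions: the class name DottedQuad and both method names are hypothetical, the ending address is assumed to be "same first three octets, last octet 255" as in the question's example, and leading zeros are read as decimal here rather than octal.

```csharp
using System;
using System.Globalization;
using System.Net;

// Hypothetical helper, not part of the framework.
static class DottedQuad
{
    // Accepts only four dot-separated decimal octets (0-255); rejects "1", "1.2", etc.
    public static bool TryParse(string text, out IPAddress address)
    {
        address = null;
        if (text == null) return false;

        string[] parts = text.Split('.');
        if (parts.Length != 4) return false;

        byte[] octets = new byte[4];
        for (int i = 0; i < 4; i++)
        {
            // NumberStyles.None allows digits only: no sign, no whitespace, no hex.
            if (!byte.TryParse(parts[i], NumberStyles.None,
                               CultureInfo.InvariantCulture, out octets[i]))
                return false;
        }

        address = new IPAddress(octets);
        return true;
    }

    // Assumption: the scan range ends at x.y.z.255 for a start address x.y.z.n.
    public static IPAddress LastInSlash24(IPAddress start)
    {
        byte[] octets = start.GetAddressBytes();
        octets[3] = 255;
        return new IPAddress(octets);
    }
}
```

With this, the ending address would only be filled in when DottedQuad.TryParse succeeds, so typing "1" leaves the ending field untouched.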

Related

Why does IPAddress.Parse("192.168.001.001") work while IPAddress.Parse("192.168.001.009") doesn't?

I'm stuck trying to parse IP addresses from an API result where each of the four parts of the IPv4 address comes padded with leading zeroes. Something like this:
127.000.000.001 instead of 127.0.0.1
I started getting parse errors when trying to parse 192.168.001.009. It also fails for 192.168.001.008, but works for 007, 006, 005 up to 001!!!
It also fails for 192.168.001.018, but works for .017, .016 down to 010!
It works for 192.168.001.8 or .8 and also 192.168.001.18 and .19...
Is this a bug in the CLR? Or am I missing something stupid?
Just try:
IPAddress.Parse("192.168.001.007"); // works
IPAddress.Parse("192.168.001.87"); // works
IPAddress.Parse("192.168.001.008"); // throws exception
IPAddress.Parse("192.168.001.19"); // works
IPAddress.Parse("192.168.001.019"); // throws exception
// and so on!
The numbers, since they are starting with 0, are being interpreted as octal instead of decimal. These are not C# literals, so it's up to the library to interpret it one way or another.
A simple way to test it would be to construct an IP ending in ".010", parse it, and you'll see that it was parsed as an ip ending in .8.
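A possible quick and dirty solution would be to strip the leading zeros before parsing: search for the regex /\b0+(?=\d)/ and replace it with the empty string. (A simpler pattern such as /\.0*/ replaced with "." would also wipe out an all-zero octet like .000 and would miss the first octet.)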
You can find more information on the wikipedia entry for Dot-decimal-notation:
A popular implementation of IP networking, originating in 4.2BSD, contains a function inet_aton() for converting IP addresses in character strings representation to internal binary storage. In addition to the basic four-decimals format and full 32-bit addresses, it also supported intermediate syntaxes of octet.24bits (e.g. 10.1234567; for Class A addresses) and octet.octet.16bits (e.g. 172.16.12345; for Class B addresses). It also allowed the numbers to be written in hexadecimal and octal, by prefixing them with 0x and 0, respectively. These features continue to be supported by software until today, even though they are seen as non-standard. But this also means addresses where an IP address component is written with a leading zero digit may be interpreted differently by different programs: some will ignore the leading zero, some will interpret the number as octal.
This is probably because 00X or 0XY are treated as octal numbers, which allow only the digits 0 through 7. The digits 8 and 9 are an error.
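The strip-the-leading-zeros workaround can be sketched as a small helper that normalizes the string before it reaches IPAddress.Parse. This is a minimal sketch; the class and method names are hypothetical, and the pattern used here (leading zeros at a word boundary that are followed by another digit) is one possible choice, picked so that an all-zero octet such as 000 still collapses to 0.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the framework.
static class IpNormalizer
{
    // Removes leading zeros from every dotted segment so that the parser
    // cannot interpret a segment as octal. "000" becomes "0", "009" becomes "9".
    public static string StripLeadingZeros(string dotted)
    {
        // \b anchors at the start of a segment; the lookahead keeps the last digit.
        return Regex.Replace(dotted, @"\b0+(?=\d)", "");
    }
}
```

The normalized string can then be passed to IPAddress.Parse or IPAddress.TryParse as usual.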

regular expression get all hosts from html

I'm trying to get all URLs with one regular expression. Currently I'm using this pattern:
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
However that regex returns the pages/files, instead of hosts. So instead of having to run a second regular expression, I'm hoping someone here can help
This returns http://www.yoursite.com/index.html
I'm attempting to return yoursite.com.
Also, the regex will be parsing from HTML and the hosts will be checked afterwards, so 100% accuracy isn't crucial.
Assuming that your regex:
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
Actually does parse the Urls (I haven't checked it), you could easily use a capture group to get the host:
/^(https?:\/\/)?(?<host>([\da-z\.-]+)\.([a-z\.]{2,6}))([\/\w \.-]*)*\/?$/
When you get the Match result, you can examine Groups["host"] to get the host name.
But you're much better off, in my opinion, just using Uri.TryCreate, although you'll need a little logic to get around the possible lack of a scheme. That is:
// Verbatim string: "\/" is not a valid escape in a regular C# string literal.
if (!Regex.IsMatch(line, @"^https?://"))
    line = "http://" + line;
Uri uri;
if (Uri.TryCreate(line, UriKind.Absolute, out uri))
{
    // it's a valid url.
    host = uri.Host;
}
Parsing Urls is a pretty tricky business. For example, no individual dotted segment can exceed 63 characters, and there's nothing preventing the last dotted segment from having numbers or hyphens. Nor is it limited to 6 characters. You're better off passing the entire string to Uri.TryCreate than you are trying to duplicate the craziness of URL parsing with a single regular expression.
It's possible that the rest of the Url (after the host name) could be trash. If you want to eliminate that bit causing a problem, then extract everything up to the end of the host name:
^https?:\/\/[^\/]*
Then run that through Uri.TryCreate.
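Putting those pieces together (add a scheme if it is missing, cut the string at the end of the host name, then hand it to Uri.TryCreate) might look like the sketch below. The class name HostExtractor and the method TryGetHost are hypothetical names chosen for illustration, and the case-insensitive matching of the scheme is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the framework.
static class HostExtractor
{
    public static bool TryGetHost(string line, out string host)
    {
        host = null;
        if (string.IsNullOrEmpty(line)) return false;

        // Work around a missing scheme so Uri.TryCreate sees an absolute URL.
        if (!Regex.IsMatch(line, @"^https?://", RegexOptions.IgnoreCase))
            line = "http://" + line;

        // Keep only scheme + host so trash after the host name cannot break parsing.
        Match prefix = Regex.Match(line, @"^https?://[^/]*", RegexOptions.IgnoreCase);

        Uri uri;
        if (!prefix.Success || !Uri.TryCreate(prefix.Value, UriKind.Absolute, out uri))
            return false;

        host = uri.Host;
        return true;
    }
}
```

Note that this yields the full host (for example www.yoursite.com), not the registered domain; trimming that down is a separate problem.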
To capture just the yoursite.com from sample text http://www.yoursite.com/index?querystring=value you could use this expression, however this does not validate the string:
^(https?:\/\/)?(?:[^.\/?]*[.])?([^.\/?]*[.][^.\/?]*)
Live demo: http://www.rubular.com/r/UNR7qiQ0Eq

How to validate ip address in C#

I'm writing an application that uses IP addresses. I have to validate that they start from at least 1.0.0.1, but the regex below accepts 0.0.0.0:
\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
I also tried changing it to:
\b(25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
This code does not accept 0.0.0.0 but does not accept 100.0.0.0 to 109.0.0.0 either.
Can someone help?
Use
IPAddress addr;
bool ok = IPAddress.TryParse(str, out addr);
Then, if that worked (ok is true), get the numbers using
addr.GetAddressBytes();
and then check the byte values for the correct conditions using normal if statements.
Save yourself the pain! Convert to a string, split on the dot character and check that there are exactly 4 segments, each a number in the range 0 (or 1, where you need a non-zero octet) to 255.
Otherwise, if you use RegexBuddy (which is a fantastic app for regexes), it has a bunch of IP address examples in its library, including one for 0.0.0.0 to 255.255.255.255:
\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b
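The split-and-check suggestion can be sketched as follows. The names Ipv4Validator and IsValidDottedQuad are hypothetical, and the reading of "at least 1.0.0.1" as "the first octet must be at least 1" is an assumption; adjust the bounds if the requirement is stricter.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the framework.
static class Ipv4Validator
{
    // minFirstOctet = 1 rejects 0.x.x.x while still accepting 100.0.0.0.
    public static bool IsValidDottedQuad(string text, int minFirstOctet)
    {
        if (text == null) return false;

        string[] parts = text.Split('.');
        if (parts.Length != 4) return false;

        for (int i = 0; i < 4; i++)
        {
            string part = parts[i];
            if (part.Length == 0 || part.Length > 3) return false;

            // Digits only: no signs, spaces, or hex prefixes.
            foreach (char c in part)
                if (c < '0' || c > '9') return false;

            int value = int.Parse(part, CultureInfo.InvariantCulture);
            int min = (i == 0) ? minFirstOctet : 0;
            if (value < min || value > 255) return false;
        }
        return true;
    }
}
```

Compared with the regex attempts in the question, the range rule lives in one plain comparison, so changing the lower bound does not risk excluding values such as 100 to 109.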
Try using this,
ValidIpAddressRegex = @"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";

Read X-Forwarded-For header

I want to read the value of the X-Forwarded-For header in a request.
I've tried
HttpContext.Current.Request.Headers["X-Forwarded-For"].Split(new char[] { ',' }).FirstOrDefault();
in C#.
Or do I need to split the header by ":" and then take the second string?
I am asking this because Wikipedia says
The general format of the field is:
X-Forwarded-For: client1, proxy1, proxy2
The format that you get in return is client1, proxy1, proxy2
So you split it with the comma, and get the first to see the ip of your client.
If it helps, this is a simple way of getting the user's IP address, taking the X_FORWARDED_FOR header into account:
var forwardedFor = Request.ServerVariables["HTTP_X_FORWARDED_FOR"];
var userIpAddress = String.IsNullOrWhiteSpace(forwardedFor) ?
Request.ServerVariables["REMOTE_ADDR"] : forwardedFor.Split(',').Select(s => s.Trim()).FirstOrDefault();
Don't forget that X-Forwarded-For can contain whatever the client writes there. It can contain XSS or SQL injection payloads.
Sometimes the first entry may contain one of the local (private) reserved addresses, which is not useful. Also, the first position(s) are open to spoofing.
Update - April 2018: Sampling the cases of a live production website where the first address is local (private) indicates some configuration issue on the end user's network or his ISP. The cases are occurring only rarely (<1%) and consistently for the same end users.
The answer below suggests walking from right to left until you hit a public address. Not sure anyone actually does this but it points out the issue.
https://husobee.github.io/golang/ip-address/2015/12/17/remote-ip-go.html
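The right-to-left walk mentioned above could be sketched like this in C#. This is an illustration under assumptions, not the linked article's code: the names ForwardedFor and RightmostPublicAddress are hypothetical, only IPv4 entries are considered, and "private" is taken to mean the 10/8, 172.16/12, 192.168/16, 127/8 and 169.254/16 ranges.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, not part of the framework.
static class ForwardedFor
{
    // Returns the right-most public IPv4 entry, or null if none is found.
    public static string RightmostPublicAddress(string headerValue)
    {
        if (string.IsNullOrEmpty(headerValue)) return null;

        string[] entries = headerValue.Split(',');
        for (int i = entries.Length - 1; i >= 0; i--)
        {
            string candidate = entries[i].Trim();
            IPAddress address;

            // The header is untrusted input: skip anything that does not parse.
            if (!IPAddress.TryParse(candidate, out address)) continue;
            if (address.AddressFamily != AddressFamily.InterNetwork) continue; // IPv4 only in this sketch

            if (!IsPrivateOrLoopback(address)) return candidate;
        }
        return null;
    }

    static bool IsPrivateOrLoopback(IPAddress address)
    {
        byte[] b = address.GetAddressBytes();
        return b[0] == 10
            || b[0] == 127
            || (b[0] == 172 && b[1] >= 16 && b[1] <= 31)
            || (b[0] == 192 && b[1] == 168)
            || (b[0] == 169 && b[1] == 254);
    }
}
```

Entries to the right are added by proxies closer to the server, which is why this walk trusts them more than the left-most, client-supplied value.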

Regular expression to extract domain name from any domain

I'm trying to extract the domain name from a string in C#. You don't necessarily have to use a RegEx but we should be able to extract yourdomain.com from all of the following:
yourdomain.com
www.yourdomain.com
http://www.yourdomain.com
http://www.yourdomain.com/
store.yourdomain.com
http://store.yourdomain.com
whatever.youdomain.com
*.yourdomain.com
Also, any TLD is acceptable, so replace .com in all of the above with .net, .org, .co.uk, etc.
If no scheme present (no colon in string), prepend "http://" to make it a valid URL.
Pass string to Uri constructor.
Access the Uri's Host property.
Now you have the hostname. What exactly you consider the ‘domain name’ of a given hostname is a debatable point. I'm guessing you don't simply mean everything after the first dot.
It's not possible to distinguish hostnames like ‘whatever.youdomain.com’ from domains-in-an-SLD like ‘warwick.ac.uk’ from just the strings. Indeed, there is even a bit of grey area about what is and isn't a public SLD, given the efforts of some registrars to carve out their own niches.
A common approach is to maintain a big list of SLDs and other suffixes used by unrelated entities. This is what web browsers do to stop unwanted public cookie sharing. Once you've found a public suffix, you could add the one nearest prefix in the host name split by dots to get the highest-level entity responsible for the given hostname, if that's what you want. Suffix lists are hell to maintain, but you can piggy-back on someone else's efforts.
Alternatively, if your app has the time and network connection to do it, it could start sniffing for information on the hostname. For example, it could do a whois query for the hostname and keep looking at each parent until it got a result; that would be the domain name of the lowest-level entity responsible for the given hostname.
Or, if all that's too much work, you could try just chopping off any leading ‘www.’ present!
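The simplest route described in this answer (prepend a scheme when there is no colon, let Uri extract the host, then chop a leading "www.") can be sketched as below. The names HostNames and GetHostWithoutWww are hypothetical, and the sketch deliberately does not attempt public-suffix handling, so store.yourdomain.com is returned unchanged.

```csharp
using System;

// Hypothetical helper, not part of the framework.
static class HostNames
{
    // Returns the host with any leading "www." removed, or null if the input is not a usable URL.
    public static string GetHostWithoutWww(string input)
    {
        if (string.IsNullOrEmpty(input)) return null;

        // No colon means no scheme: prepend one so the string is a valid absolute URL.
        string candidate = input.IndexOf(':') < 0 ? "http://" + input : input;

        Uri uri;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri)) return null;

        string host = uri.Host;
        return host.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? host.Substring(4)
            : host;
    }
}
```

Reducing store.yourdomain.com to yourdomain.com would need one of the suffix-list or lookup approaches discussed above.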
I would recommend trying this yourself, using Regulator and a regex cheat sheet:
http://sourceforge.net/projects/regulator/
http://regexlib.com/CheatSheet.aspx
You can also find some good info on regular expressions at Coding Horror.
Have a look at this other answer. It was for PHP but you'll easily get the regex out of the 4-5 lines of PHP and you can benefit from the discussion that followed (see Alnitak's answer).
A regex doesn't really fit your requirement of "any TLD", since the set of TLDs is quite large and continually in flux. If you limited your scope to this pattern (matched case-insensitively):
(?<domain>[^\.]+\.([A-Z]+$|co\.[A-Z]+$))
You would catch .anything and .co.anything, which I imagine covers most realistic cases...
