Compare two text files if contains match in c#


I have been searching for a solution and haven't been able to find one for this situation.
How can I compare two text files so that only the matches are displayed?
ipaddress.txt contains:
10.30.16.221
10.30.16.228
10.30.16.223
I have another text file that lists each DNS name along with its IP address.
dns.txt contains:
dogs.com 10.30.16.221
cats.com 10.30.16.222
snakes.com 10.30.16.223
How can I compare ipaddress.txt and dns.txt so that only the lines containing a matching IP address are returned?
It should return
dogs.com 10.30.16.221
snakes.com 10.30.16.223

Read all the lines of ipaddress.txt into a HashSet (ToHashSet() needs using System.Linq and .NET Framework 4.7.2 / .NET Core 2.0 or later; on older frameworks use new HashSet<string>(...) instead):
var ips = File.ReadAllLines("ipaddress.txt").ToHashSet();
Then, for every line in dns.txt, ask the HashSet whether it contains the IP from that line:
var matches = File.ReadAllLines("dns.txt").Where(line => ips.Contains(line.Split().Last()));
Extracting just the IP and doing an exact lookup is more efficient than, e.g., putting the IPs in a list and asking for each DNS line "does this line end with any of the IPs in the list?". You could also extract the IP with something like line[line.LastIndexOf(' ')+1..] (range syntax, C# 8 or later). Note that both of these assume dns.txt is well formed, with no trailing spaces etc.; if the data in it is a bit wonky you'll need to clean it up first. You could also use something like:
var ips = File.ReadAllLines("ipaddress.txt");
var matches = File.ReadAllLines("dns.txt").Where(line => ips.Any(ip => line.Contains(ip)));
but it's potentially much less efficient for large lists, and Contains can also give false positives, because 10.30.16.22 is a substring of 10.30.16.221.
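Putting the HashSet approach together, here is a minimal sketch that works on in-memory lines so it can be tried without the files. The class and method names (DnsMatcher, MatchingLines) are made up for illustration, and it assumes the IP is the last whitespace-separated token on each DNS line:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DnsMatcher
{
    // Returns the DNS lines whose last whitespace-separated token is one of the given IPs.
    public static List<string> MatchingLines(IEnumerable<string> ipLines, IEnumerable<string> dnsLines)
    {
        // Trim and drop blank lines so stray whitespace does not break the exact lookup.
        var ips = new HashSet<string>(
            ipLines.Select(line => line.Trim()).Where(line => line.Length > 0));

        return dnsLines
            .Where(line =>
            {
                // Passing null as the separator array splits on any whitespace.
                var parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
                return parts.Length > 0 && ips.Contains(parts[parts.Length - 1]);
            })
            .ToList();
    }
}
```

With the files from the question you would call it as DnsMatcher.MatchingLines(File.ReadAllLines("ipaddress.txt"), File.ReadAllLines("dns.txt")).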

Related

Regular expression that matches all valid format IPv6 addresses

At first glance, I concede that this question looks like a duplicate of this question and any other related to it:
Regular expression that matches valid IPv6 addresses
That question in fact has an answer that nearly answers my question, but not fully.
The code from that question which I have issues with, yet had the most success with, is as shown below:
private string RemoveIPv6(string sInput)
{
string pattern = @"(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))";
//That is one looooong regex! From: https://stackoverflow.com/a/17871737/3472690
//if (IsCompressedIPv6(sInput))
// sInput = UncompressIPv6(sInput);
string output = Regex.Replace(sInput, pattern, "");
if (output.Contains("Addresses"))
output = output.Substring(0, "Addresses: ".Length);
return output;
}
The issue I have with the regex pattern from that answer (David M. Syzdek's answer) is that it doesn't match and remove the full IPv6 addresses I'm throwing at it.
I'm mainly using the regex pattern to replace IPv6 addresses in strings with an empty string.
For instance,
Addresses: 2404:6800:4003:c02::8a
As well as...
Addresses: 2404:6800:4003:804::200e
And finally...
Addresses: 2001:4998:c:a06::2:4008
None of these is fully matched by the regex.
The regex will return me the remaining parts of the string as shown below:
Addresses: 8a
Addresses: 200e
Addresses: 2:4008
As can be seen, it has left remnants of the IPv6 addresses, which is hard to detect and remove, due to the varying formats that the remnants take on. Below is the regex pattern by itself for better analysis:
(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
Therefore, my question is, how can this regex pattern be corrected so it can match, and therefore allow the complete removal of any IPv6 addresses, from a string that doesn't solely contain the IPv6 address(es) itself?
Alternatively, how can the code snippet I provided above be corrected to provide the required outcome?
For those who may be wondering, I am getting the string from the StandardOutput of nslookup commands, and the IPv6 addresses will always differ. For the examples above, I got those IPv6 addresses from "google.com" and "yahoo.com".
I am not using the built-in function to resolve DNS entries for a good reason, which I don't think will matter for the moment, therefore I am using nslookup.
In case it is needed, the code that calls that function is shown below (it is itself part of another method):
string output = "";
string garbagecan = "";
string tempRead = "";
string lastRead = "";
using (StreamReader reader = nslookup.StandardOutput)
{
while (reader.Peek() != -1)
{
if (LinesRead > 3)
{
tempRead = reader.ReadLine();
tempRead = RemoveIPv6(tempRead);
if (tempRead.Contains("Addresses"))
output += tempRead;
else if (lastRead.Contains("Addresses"))
output += tempRead.Trim() + Environment.NewLine;
else
output += tempRead + Environment.NewLine;
lastRead = tempRead;
}
else
garbagecan = reader.ReadLine();
LinesRead++;
}
}
return output;
The corrected regex should remove only IPv6 addresses and leave IPv4 addresses untouched. The string passed to the regex will not contain the IPv6 address(es) alone; it will almost always contain other details, so the index at which the addresses appear is unpredictable. It should also be noted that, for some reason, the regex skips all IPv6 addresses after the first one it finds.
Apologies if any details are missing; I will try my best to add them when alerted. I would also prefer working code samples, if possible, as I have almost zero knowledge of regex.

(?:^|(?<=\s))(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))(?=\s|$)
Using lookarounds you can enforce a complete match rather than a partial match. See demo.
https://regex101.com/r/cT0hV4/5

(?i)(?<ipv6>(?:[\da-f]{0,4}:){1,7}(?:(?<ipv4>(?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?))|[\da-f]{0,4}))
Demo: Regex101
Github Repository
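As an alternative to the regexes above (a different technique, not the one either answer proposes), a minimal sketch could split the line on whitespace and let IPAddress.TryParse decide which tokens are IPv6 addresses. It assumes the addresses in the nslookup output are separated from other text by spaces or tabs, and it collapses runs of whitespace as a side effect. The class name Ipv6Stripper is made up for illustration:

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

static class Ipv6Stripper
{
    // Removes every whitespace-delimited token that parses as an IPv6 address.
    public static string RemoveIPv6(string input)
    {
        var kept = input
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => !IsIPv6(token));
        return string.Join(" ", kept);
    }

    // The colon check skips tokens such as "1", which TryParse accepts as IPv4 0.0.0.1.
    static bool IsIPv6(string token) =>
        token.IndexOf(':') >= 0
        && IPAddress.TryParse(token, out var address)
        && address.AddressFamily == AddressFamily.InterNetworkV6;
}
```

Because whole tokens are tested, a compressed address such as 2404:6800:4003:c02::8a is either removed completely or left alone, so no remnants like "8a" are left behind, and IPv4 addresses are untouched.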

Extract hostname from fully qualified domain name (FQDN)

I need to implement a method which extracts a hostname from an FQDN. For example, if a hypothetical mail server is mymail.somecollege.edu, I want to get mymail as the result.
If I get an illegal string (not a real FQDN), I need to get null or some error code.
How can I extract the hostname?
I don't want to parse the input myself; I am looking for an existing API instead.
Thanks
I tried searching for the first dot '.'; the substring before it is the hostname.
But I am looking for an existing API.

I could not find any helper/API class to obtain the hostname (mymail) from the FQDN (mymail.somecollege.edu). You may have to just parse it as you mentioned: extract everything up to the first "." character. NOTE: a single hostname label cannot contain a "." character.
var fullyQualifiedDomainName = Dns.GetHostEntry("computer").HostName;
var dotIndex = fullyQualifiedDomainName.IndexOf('.');
// No dot means the name is already a bare hostname.
var hostName = dotIndex < 0 ? fullyQualifiedDomainName : fullyQualifiedDomainName.Substring(0, dotIndex);
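For the "return null on an illegal string" part of the question, one possible sketch combines Uri.CheckHostName (a real framework method that validates host name syntax only, without any DNS lookup) with the first-dot split. The class and method names (Fqdn, HostLabel) are made up, and treating a name without a dot as "not an FQDN" is an assumption:

```csharp
using System;

static class Fqdn
{
    // Returns the first label of a syntactically valid FQDN, or null otherwise.
    public static string HostLabel(string fqdn)
    {
        // CheckHostName returns Dns only for strings that look like DNS names;
        // IP literals and malformed input yield other values.
        if (string.IsNullOrEmpty(fqdn) || Uri.CheckHostName(fqdn) != UriHostNameType.Dns)
            return null;

        int dot = fqdn.IndexOf('.');
        // Assumption: a name without any dot is a bare hostname, not an FQDN.
        return dot > 0 ? fqdn.Substring(0, dot) : null;
    }
}
```

This still does not prove that the name exists; it only rejects input that cannot be a DNS name.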

Read X-Forwarded-For header

I want to read the value of the X-Forwarded-For header in a request.
I've tried
HttpContext.Current.Request.Headers["X-Forwarded-For"].Split(new char[] { ',' }).FirstOrDefault();
in C#.
Or do I need to split the header by ":" and then take the second string?
I am asking because Wikipedia says:
The general format of the field is:
X-Forwarded-For: client1, proxy1, proxy2

The value you get back has the format client1, proxy1, proxy2 (the header name and the ":" are not part of it).
So you split it on the comma and take the first entry to get the IP of your client.

If it helps, this is a simple way of getting the user's IP address that takes the X_FORWARDED_FOR header into account:
var forwardedFor = Request.ServerVariables["HTTP_X_FORWARDED_FOR"];
var userIpAddress = String.IsNullOrWhiteSpace(forwardedFor) ?
Request.ServerVariables["REMOTE_ADDR"] : forwardedFor.Split(',').Select(s => s.Trim()).FirstOrDefault();

Don't forget that X-Forwarded-For can contain whatever the client writes there. It may carry an XSS or SQL injection payload.

Sometimes the first entry contains one of the local (private) reserved addresses, which is not useful. The first position(s) are also open to spoofing.
Update - April 2018: Sampling the cases of a live production website where the first address is local (private) indicates some configuration issue on the end user's network or his ISP. The cases are occurring only rarely (<1%) and consistently for the same end users.
The answer below suggests walking from right to left until you hit a public address. Not sure anyone actually does this but it points out the issue.
https://husobee.github.io/golang/ip-address/2015/12/17/remote-ip-go.html
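The right-to-left walk mentioned above could be sketched roughly as follows. This is an illustration only: the names (ForwardedFor, RightmostPublic) are made up, only IPv4 entries are considered, and the list of non-public ranges is limited to 10/8, 172.16/12, 192.168/16, loopback and link-local:

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

static class ForwardedFor
{
    // Walks the header from right to left and returns the first public IPv4 entry, or null.
    public static string RightmostPublic(string header)
    {
        if (string.IsNullOrWhiteSpace(header)) return null;
        return header.Split(',')
            .Select(entry => entry.Trim())
            .Reverse()
            .FirstOrDefault(entry => IsPublicIPv4(entry));
    }

    static bool IsPublicIPv4(string candidate)
    {
        // Entries that do not parse (spoofed junk, injected payloads) are simply skipped.
        if (!IPAddress.TryParse(candidate, out var ip) || ip.AddressFamily != AddressFamily.InterNetwork)
            return false;

        byte[] b = ip.GetAddressBytes();
        bool isPrivate =
            b[0] == 10 ||
            (b[0] == 172 && b[1] >= 16 && b[1] <= 31) ||
            (b[0] == 192 && b[1] == 168) ||
            b[0] == 127 ||
            (b[0] == 169 && b[1] == 254);
        return !isPrivate;
    }
}
```

The result is only as trustworthy as the proxies that appended the rightmost entries, so it should still be treated as untrusted input.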

Regular expression to extract domain name from any domain

I'm trying to extract the domain name from a string in C#. You don't necessarily have to use a RegEx but we should be able to extract yourdomain.com from all of the following:
yourdomain.com
www.yourdomain.com
http://www.yourdomain.com
http://www.yourdomain.com/
store.yourdomain.com
http://store.yourdomain.com
whatever.youdomain.com
*.yourdomain.com
Also, any TLD is acceptable, so the .com in all of the above could equally be .net, .org, .co.uk, etc.

If no scheme present (no colon in string), prepend "http://" to make it a valid URL.
Pass string to Uri constructor.
Access the Uri's Host property.
Now you have the hostname. What exactly you consider the ‘domain name’ of a given hostname is a debatable point. I'm guessing you don't simply mean everything after the first dot.
It's not possible to distinguish hostnames like ‘whatever.youdomain.com’ from domains-in-an-SLD like ‘warwick.ac.uk’ from just the strings. Indeed, there is even a bit of grey area about what is and isn't a public SLD, given the efforts of some registrars to carve out their own niches.
A common approach is to maintain a big list of SLDs and other suffixes used by unrelated entities. This is what web browsers do to stop unwanted public cookie sharing. Once you've found a public suffix, you could add the one nearest prefix in the host name split by dots to get the highest-level entity responsible for the given hostname, if that's what you want. Suffix lists are hell to maintain, but you can piggy-back on someone else's efforts.
Alternatively, if your app has the time and network connection to do it, it could start sniffing for information on the hostname. eg. it could do a whois query for the hostname, and keep looking at each parent until it got a result and that would be the domain name of the lowest-level entity responsible for the given hostname.
Or, if all that's too much work, you could try just chopping off any leading ‘www.’ present!
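The first three steps of this answer, plus the "chop off a leading www." fallback, could look roughly like this. The names (HostExtractor, HostOf, StripWww) are made up for illustration, and no public-suffix handling is attempted:

```csharp
using System;

static class HostExtractor
{
    // Prepends a scheme when none is present, then lets Uri do the parsing.
    public static string HostOf(string input)
    {
        string candidate = input.IndexOf(':') >= 0 ? input : "http://" + input;
        return Uri.TryCreate(candidate, UriKind.Absolute, out var uri) ? uri.Host : null;
    }

    // The cheap fallback: drop a leading "www." and nothing else.
    public static string StripWww(string host) =>
        host.StartsWith("www.", StringComparison.OrdinalIgnoreCase) ? host.Substring(4) : host;
}
```

For example, StripWww(HostOf("http://www.yourdomain.com/")) yields yourdomain.com, while store.yourdomain.com is returned unchanged, which is exactly where a public suffix list would be needed.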

I would recommend trying this yourself, using Regulator and a regex cheat sheet:
http://sourceforge.net/projects/regulator/
http://regexlib.com/CheatSheet.aspx
You can also find some good info on regular expressions at Coding Horror.

Have a look at this other answer. It was for PHP but you'll easily get the regex out of the 4-5 lines of PHP and you can benefit from the discussion that followed (see Alnitak's answer).

A regex doesn't really fit your requirement of "any TLD", since the set of TLDs is large and continually in flux. If you limited your scope to:
(?<domain>[^\.]+\.([A-Z]+$|co\.[A-Z]+$))
you would catch .anything and .co.anything (with RegexOptions.IgnoreCase, since the character classes are upper-case only), which I imagine covers most realistic cases...

IP address parsing in .NET

I'm using IPAddress.TryParse() to parse IP addresses. However, it's a little too permissive (parsing "1" returns 0.0.0.1). I'd like to limit the input to dotted octet notation. What's the best way to do this?
(Note: I'm using .NET 2.0)
Edit
Let me clarify:
I'm writing an app that will scan a range of IPs looking for certain devices (basically a port scanner). When the user enters "192.168.0.1" for the starting address, I want to automatically fill in "192.168.0.255" as the ending address. The problem is that when they type "1", it parses as "0.0.0.1" and the ending address fills in as "0.0.0.255" - which looks goofy.

If you are interested in parsing the format, then I'd use a regular expression. Here's a good one (source):
bool IsDottedDecimalIP(string possibleIP)
{
Regex r = new Regex(@"\b(?:\d{1,3}\.){3}\d{1,3}\b");
IPAddress parsed; // required by TryParse; the value itself is not used
return r.IsMatch(possibleIP) && IPAddress.TryParse(possibleIP, out parsed);
}
That regex doesn't reject out-of-range octets such as 999, but it does enforce the dotted pattern. The TryParse call then checks validity.
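A regex-free variant of the same idea, as a rough sketch: split on '.', require exactly four parts, and parse each as a byte. The names (DottedQuad, EndOfRange) are made up for illustration, the code uses newer C# syntax than .NET 2.0 compilers accept only in spirit (it avoids out var, but has not been tried on 2.0), and filling the last octet with 255 is taken from the scenario in the question:

```csharp
using System;
using System.Globalization;
using System.Net;

static class DottedQuad
{
    // Accepts only four dot-separated decimal octets in the range 0-255.
    public static bool TryParse(string text, out IPAddress address)
    {
        address = null;
        if (text == null) return false;

        string[] parts = text.Split('.');
        if (parts.Length != 4) return false;

        var bytes = new byte[4];
        for (int i = 0; i < 4; i++)
        {
            // NumberStyles.None allows digits only: no signs, whitespace or hex.
            if (!byte.TryParse(parts[i], NumberStyles.None, CultureInfo.InvariantCulture, out bytes[i]))
                return false;
        }

        address = new IPAddress(bytes);
        return true;
    }

    // Suggests an end address by setting the last octet to 255.
    public static IPAddress EndOfRange(IPAddress start)
    {
        byte[] bytes = start.GetAddressBytes();
        bytes[bytes.Length - 1] = 255;
        return new IPAddress(bytes);
    }
}
```

With this, typing "1" is rejected outright instead of becoming 0.0.0.1, so the ending address is only filled in once a full dotted address has been entered.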

An IPv4 address is actually a 32-bit number; xxx.xxx.xxx.xxx is just a human-readable format for it. So IP address 1 really is 0.0.0.1.
EDIT: Given the clarification, you could either go with a regex as has been suggested, or you could expand the shortcuts to your liking: if you want "1" to appear as "1.0.0.0", you could append the missing ".0.0.0" yourself and still use the parse method.
