At first glance, I concede that this question looks like a duplicate of the following question and of others related to it:
Regular expression that matches valid IPv6 addresses
That question does have an answer that comes close to solving my problem, but not fully.
The code based on that answer, which I had the most success with but which still gives me trouble, is shown below:
private string RemoveIPv6(string sInput)
{
    string pattern = @"(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))";
    //That is one looooong regex! From: https://stackoverflow.com/a/17871737/3472690
    //if (IsCompressedIPv6(sInput))
    //    sInput = UncompressIPv6(sInput);
    string output = Regex.Replace(sInput, pattern, "");
    if (output.Contains("Addresses"))
        output = output.Substring(0, "Addresses: ".Length);
    return output;
}
The issue I have with the regex pattern provided in this answer, David M. Syzdek's answer, is that it doesn't match and remove the full form of the IPv6 addresses I'm throwing at it.
I'm mainly using the regex pattern to replace IPv6 addresses in strings with blanks (an empty string).
For instance,
Addresses: 2404:6800:4003:c02::8a
As well as...
Addresses: 2404:6800:4003:804::200e
And finally...
Addresses: 2001:4998:c:a06::2:4008
None of these is fully matched by the regex.
The regex will return me the remaining parts of the string as shown below:
Addresses: 8a
Addresses: 200e
Addresses: 2:4008
As can be seen, it has left remnants of the IPv6 addresses, which are hard to detect and remove because of the varying formats they take. Below is the regex pattern by itself for better analysis:
(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
Therefore, my question is, how can this regex pattern be corrected so it can match, and therefore allow the complete removal of any IPv6 addresses, from a string that doesn't solely contain the IPv6 address(es) itself?
Alternatively, how can the code snippet I provided above be corrected to provide the required outcome?
For those who may be wondering, I am getting the string from the StandardOutput of nslookup commands, and the IPv6 addresses will always differ. For the examples above, I got those IPv6 addresses from "google.com" and "yahoo.com".
I have a good reason for not using the built-in function to resolve DNS entries, which I don't think matters for the moment; that is why I am using nslookup.
The code that calls that function, in case it is required, is below (it is itself part of another function/method):
string output = "";
string garbagecan = "";
string tempRead = "";
string lastRead = "";
using (StreamReader reader = nslookup.StandardOutput)
{
    while (reader.Peek() != -1)
    {
        if (LinesRead > 3)
        {
            tempRead = reader.ReadLine();
            tempRead = RemoveIPv6(tempRead);
            if (tempRead.Contains("Addresses"))
                output += tempRead;
            else if (lastRead.Contains("Addresses"))
                output += tempRead.Trim() + Environment.NewLine;
            else
                output += tempRead + Environment.NewLine;
            lastRead = tempRead;
        }
        else
            garbagecan = reader.ReadLine();
        LinesRead++;
    }
}
return output;
The corrected regex should remove only IPv6 addresses and leave IPv4 addresses untouched. The string passed to the regex will not contain the IPv6 address(es) alone; it will almost always contain other details, so the index at which the addresses appear is unpredictable. It should also be noted that, for some reason, the regex skips every IPv6 address after the first one.
Apologies if any details are missing; I will do my best to add them when alerted. I would also prefer working code samples, if possible, as I have almost zero knowledge of regex.
(?:^|(?<=\s))(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))(?=\s|$)
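Using lookarounds, you can enforce a complete match rather than a partial one. Without them, the alternation stops at the first alternative that succeeds: ([0-9a-fA-F]{1,4}:){1,7}: matches 2404:6800:4003:c02:: and leaves 8a behind. The trailing (?=\s|$) rejects that partial match and forces the engine to try the later alternatives. See demo.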
https://regex101.com/r/cT0hV4/5
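The effect of the lookarounds can be seen on a much smaller pattern. The sketch below is only an illustration, not the full pattern from the answer: it keeps just two of the original alternatives (an assumption made for readability), and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Reduced pattern (assumption): only two of the original alternatives,
    // "groups followed by ::" and "groups, ::, then one final group".
    const string Core =
        @"(?:(?:[0-9a-fA-F]{1,4}:){1,7}:|(?:[0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4})";

    // The same alternation wrapped in the lookarounds from the answer.
    const string Anchored = @"(?:^|(?<=\s))" + Core + @"(?=\s|$)";

    // Removes every match of the chosen pattern from the input.
    public static string Strip(string input, bool anchored)
    {
        return Regex.Replace(input, anchored ? Anchored : Core, "");
    }
}
```

With the sample line "Addresses: 2404:6800:4003:c02::8a", the unanchored pattern leaves "Addresses: 8a", while the anchored one leaves "Addresses: ". The other sample addresses need the remaining alternatives of the full pattern.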
(?i)(?<ipv6>(?:[\da-f]{0,4}:){1,7}(?:(?<ipv4>(?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?))|[\da-f]{0,4}))
Demo: Regex101
Github Repository
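As an alternative to maintaining a long pattern, the address check can be delegated to the framework. This is a different technique from the regex answers above, sketched under the assumption that addresses in the nslookup output are separated from other text by whitespace; the class name IPv6Stripper is hypothetical. It uses System.Net.IPAddress.TryParse and keeps any token that is not an IPv6 address, including IPv4 addresses.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text.RegularExpressions;

public static class IPv6Stripper
{
    // Replaces every whitespace-delimited token that parses as an IPv6
    // address with an empty string; all other tokens are left as they are.
    public static string RemoveIPv6(string input)
    {
        return Regex.Replace(input, @"\S+", m =>
            IPAddress.TryParse(m.Value, out IPAddress address)
                && address.AddressFamily == AddressFamily.InterNetworkV6
                ? ""
                : m.Value);
    }
}
```

Because the surrounding whitespace is not part of the match, labels such as "Addresses:" and the spacing of the line are preserved.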
I wish to replace email addresses in a string with something else. It does not work for me.
string body = "this is a test abc@emailadx.com";
string pattern = @"\b[!#$%&'*+./0-9=?_`a-z{|}~^-]+@[.0-9a-z-]+\.[a-z]{2,6}\b";
Regex.Replace(body, pattern, "Hidden Email Address");
return body;
Any hints would be helpful please.
You want to do this:
return Regex.Replace(body, pattern, "Hidden Email Address");
If you look at the documentation for Regex.Replace, you'll see that it returns the newly replaced string. It does not affect the string that was passed in.
NOTE: this is assuming you're using C#. But I'm guessing you are, from the syntax.
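FURTHERMORE: If your regex still isn't working well, try this one from the Regular Expressions Cookbook (by Goyvaerts & Levithan):
@"^[\w!#$%&'*+/=?`{|}~^.-]+@[A-Z0-9.-]+$"
Note that it is anchored with ^ and $, so it validates a whole string rather than finding addresses inside a longer one, and the [A-Z] class needs RegexOptions.IgnoreCase to accept lower-case domains.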
I'm a complete newbie to RegEx and I'm sure it'll be brilliant to use once I know how to use it. :P
I have a couple of textBoxes and I was wondering if anyone could help me accomplish what I need.
In the EMail textbox, I'd like to make sure the user writes in a valid email: xxx@yyy.zzz
Is there a way for RegEx to help me out?
I'd also really like a way to format the name the user writes down. So if a user writes in "SerGIo TAPIA gutTIerrez", I want to format that string (behind the scenes, before saving it) to "Sergio Tapia Gutierrez". Can RegEx do this?
Thanks so much SO.
(inb4 Rex :P )
A complete and accurate regex for email validation is surprisingly difficult; I trust you can use Google to find some examples.
The general rule for email validation is to actually try to send an email.
Well, this is an easy one! :)
no, there exists no regex that can validate* e-mail addresses;
no, regex cannot transform "SerGIo TAPIA gutTIerrez" into "Sergio Tapia Gutierrez". Sure, some languages like Perl (and perhaps others) can mix in some fancy stuff inside regexes to do this, but it is not the regex that actually performs the transformation. Regex only matches text, plain and simple.
* by 'validate' I mean checking whether the address actually exists.
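For the name formatting, one non-regex route in .NET is TextInfo.ToTitleCase. The sketch below is a minimal illustration; the class and method names are hypothetical, and the invariant culture is an assumption (a specific culture may be more appropriate for real names).

```csharp
using System;
using System.Globalization;

public static class NameFormatter
{
    public static string ToProperCase(string name)
    {
        // ToTitleCase leaves all-uppercase words alone (it treats them as
        // acronyms), so the input is lower-cased first.
        TextInfo textInfo = CultureInfo.InvariantCulture.TextInfo;
        return textInfo.ToTitleCase(name.ToLowerInvariant());
    }
}
```

Calling NameFormatter.ToProperCase("SerGIo TAPIA gutTIerrez") returns "Sergio Tapia Gutierrez".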
This is one way, but there are many others.
public static bool isEmail(string emailAddress)
{
    if (string.IsNullOrEmpty(emailAddress))
        return false;
    Regex EmailAddress = new Regex(@"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$");
    return EmailAddress.IsMatch(emailAddress);
}
http://www.cambiaresearch.com/c4/bf974b23-484b-41c3-b331-0bd8121d5177/Parsing-Email-Addresses-with-Regular-Expressions.aspx
public bool TestEmailRegex(string emailAddress)
{
    // string patternLenient = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";
    // Regex reLenient = new Regex(patternLenient);
    string patternStrict = @"^(([^<>()[\]\\.,;:\s@""]+"
        + @"(\.[^<>()[\]\\.,;:\s@""]+)*)|("".+""))@"
        + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
        + @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
        + @"[a-zA-Z]{2,}))$";
    Regex reStrict = new Regex(patternStrict);
    // bool isLenientMatch = reLenient.IsMatch(emailAddress);
    // return isLenientMatch;
    bool isStrictMatch = reStrict.IsMatch(emailAddress);
    return isStrictMatch;
}
Every day I receive thousands of emails, and I want to parse the body of these emails to load the content into a database.
My problem is that I am currently parsing the email body manually, and I would like to change the logic to a regular expression in C#.
Here is the body of the emails:
Gentilissima Agenzia Nexity Residenziale
il nostro utente:
Sig./Sig.ra :Pablo Azorin
Email: pabloazorin#gmail.com
Tel.: 02322-498900
sta cercando un immobile con le seguenti caratteristiche:
Categoria: Residenziale
Tipologia: Villa
Tipo di contratto: Vendita
Comune: Assago Prov. Milano
Zona: non specificata
Fascia di prezzo: non specificata
I need to extract the text in bold, and I thought a regex is what I need for this...
I look forward to your suggestions on how to make it work.
Thanks!
--Pablo
Assuming that the parts in your email that are not bold always occur like that in all your emails, you can easily grab all the parts from your email with the regex:
Sig\./Sig\.ra :(.*)
Email: (.*)
Tel\.: (.*)
sta cercando un immobile con le seguenti caratteristiche:
Categoria: (.*)
Tipologia: (.*)
Tipo di contratto: (.*)
Comune: (.*)
Zona: (.*)
Fascia di prezzo: (.*)
In C#
Regex regexObj = new Regex(@"Sig\./Sig\.ra :(.*)
Email: (.*)
Tel\.: (.*)
sta cercando un immobile con le seguenti caratteristiche:
Categoria: (.*)
Tipologia: (.*)
Tipo di contratto: (.*)
Comune: (.*)
Zona: (.*)
Fascia di prezzo: (.*)");
Match matchObj = regexObj.Match(subjectString);
string Sig = matchObj.Groups[1].Value;
string Email = matchObj.Groups[2].Value;
// and so on to get all the other parts
Read Mastering Regular Expressions. It will teach you everything you need to know to complete this and other similar regex problems, and will give you enough understanding and insight to get you started writing much more complicated regular expressions.
For email downloading I used Mailbee .Net objects. This library is quite easy to use and is well documented. But if you want to avoid programming you can also use an email parser like EmailParser2Database.
If the emails are in the same format always, you can do this a number of different ways. A simple way of doing it would be to split on the newline and take a substring on each line, starting after the label.
With regexes, you'd probably create a regex that creates a number of named captures. You can then index into the Groups property of the match on the name of each named group in order to get the value out of it. This is a little more complex, of course.
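A minimal sketch of the named-capture idea, limited to two of the fields for brevity. The group names email and phone, the class name LeadParser, and the sample values in the usage note are assumptions for illustration; the labels come from the email body in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadParser
{
    // Each field sits on its own line; [^\r\n]* stops at the line break,
    // so the pattern works with both \n and \r\n line endings.
    static readonly Regex Fields = new Regex(
        @"^Email:[ \t]*(?<email>[^\r\n]*)\r?\n" +
        @"Tel\.:[ \t]*(?<phone>[^\r\n]*)",
        RegexOptions.Multiline);

    public static bool TryParse(string body, out string email, out string phone)
    {
        Match m = Fields.Match(body);
        // Index into Groups by name rather than by position.
        email = m.Success ? m.Groups["email"].Value.Trim() : null;
        phone = m.Success ? m.Groups["phone"].Value.Trim() : null;
        return m.Success;
    }
}
```

For a body containing the lines "Email: mario.rossi@example.com" and "Tel.: 000-000000" (made-up values), TryParse returns true and fills in both strings. The remaining fields would be added to the pattern in the same way.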
I think it would be much better to split this string into an array of lines.
You can initialize a dictionary with all the titles as keys,
then search each line for a title from the dictionary ("Email:" for example) and put the result back into the dictionary as the value.
At the end you will have a dictionary with all the titles and values.
I think you don't need a regex for that.
Actually, that way the order of the titles won't matter.
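The dictionary approach described above could look roughly like the sketch below. The class name LabelledLineParser and the choice to pass the titles as parameters are assumptions; the titles themselves (for example "Email:" and "Tel.:") come from the email body in the question.

```csharp
using System;
using System.Collections.Generic;

public static class LabelledLineParser
{
    public static Dictionary<string, string> Parse(string body, params string[] titles)
    {
        // Start with every title as a key so that missing fields are visible.
        var result = new Dictionary<string, string>();
        foreach (string title in titles)
            result[title] = "";

        string[] lines = body.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            foreach (string title in titles)
            {
                if (trimmed.StartsWith(title, StringComparison.Ordinal))
                {
                    // Everything after the title is the value.
                    result[title] = trimmed.Substring(title.Length).Trim();
                    break;
                }
            }
        }
        return result;
    }
}
```

Since each line is checked against every title, the order of the fields in the email does not affect the result.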
We found that for spam filtering and other high-volume applications, regular expressions are a bit slow for parsing MIME headers, which is what you want to do. The code is somewhat specialized, but I wrote a C state machine for doing the parsing which is as fast as you'll get without going to something like re2c. The code is not for the faint of heart, but it is blindingly fast.
For emails I think you'll find an explicit state machine is easier to work with than regular expressions. It's also the last refuge of the goto statement!
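To give a rough idea of what an explicit state machine for "Name: value" lines looks like, here is a small sketch in C#. It is not the C code mentioned above and makes simplifying assumptions: it does not handle folded (continuation) header lines, and the class name HeaderStateMachine is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HeaderStateMachine
{
    enum State { Name, Value }

    public static List<KeyValuePair<string, string>> Parse(string text)
    {
        var headers = new List<KeyValuePair<string, string>>();
        var name = new StringBuilder();
        var value = new StringBuilder();
        State state = State.Name;

        // The appended newline flushes the last line of the input.
        foreach (char c in text + "\n")
        {
            if (c == '\r') continue;
            switch (state)
            {
                case State.Name:
                    if (c == ':') state = State.Value;        // first colon ends the name
                    else if (c == '\n') name.Clear();         // line without a colon: discard
                    else name.Append(c);
                    break;
                case State.Value:
                    if (c == '\n')
                    {
                        headers.Add(new KeyValuePair<string, string>(
                            name.ToString().Trim(), value.ToString().Trim()));
                        name.Clear();
                        value.Clear();
                        state = State.Name;
                    }
                    else value.Append(c);                     // later colons belong to the value
                    break;
            }
        }
        return headers;
    }
}
```

Each character is examined once and the only state kept is the current mode plus two buffers, which is the property that makes this style fast for high-volume parsing.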
You really don't want to do this manually, or with regular expressions. There are many different ways to encode data in an email, and many emails that don't strictly conform to the spec that can still be parsed. I have had success with AnPOP in a .NET environment.