Is there a way to separate the email content (body text) from an added signature using IMAP packages?
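var mailList = Client.Search(SearchCondition.Unseen()); // a sequence of message UIDs; an IEnumerable cannot be indexed
var email = Client.GetMessage(mailList.First());        // First() requires "using System.Linq;"
string body = email.Body;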
This is a rather difficult problem.
For text/plain, you can look for the line "-- " (three characters, including the trailing space). For text/html, you can look for the CSS classes gmail_signature and moz-signature. For all mail, you can look for trailing text that matches the trailing text of the previous message from the same address.
However, none of this is foolproof. Lots of HTML sigs don't use those CSS classes (Outlook, for example, uses no relevant CSS), lots of plaintext sigs don't use "-- ", and lots of middlecrapware inserts text after the signature, so the "trailing text" may not be at the very end.
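A minimal C# sketch of the text/plain heuristic, assuming you already have the body as a string (for example `email.Body` from the question). `SignatureSplitter` is a hypothetical helper, not part of any IMAP library; it only looks for the first line that is exactly "-- ", so every caveat in the previous paragraph still applies. HTML bodies would need an HTML parser to look for the `gmail_signature` / `moz-signature` classes instead.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: splits a text/plain body at the conventional "-- " delimiter line.
public static class SignatureSplitter
{
    // A line consisting of exactly "-- " (optionally followed by "\r" before "\n").
    // With RegexOptions.Multiline, ^ matches after each "\n" and $ matches before each "\n".
    private static readonly Regex Delimiter =
        new Regex(@"^-- \r?$", RegexOptions.Multiline);

    public static (string Body, string Signature) Split(string plainText)
    {
        Match m = Delimiter.Match(plainText);
        if (!m.Success)
        {
            // No conventional delimiter: treat everything as body (a signature may still be there).
            return (plainText, string.Empty);
        }

        string body = plainText.Substring(0, m.Index).TrimEnd();

        // Skip the delimiter line itself and the "\n" that ends it, if present.
        int sigStart = m.Index + m.Length;
        if (sigStart < plainText.Length && plainText[sigStart] == '\n')
        {
            sigStart++;
        }

        return (body, plainText.Substring(sigStart).Trim());
    }
}
```

For example, `SignatureSplitter.Split("Hello\n-- \nJohn Doe")` yields a body of `"Hello"` and a signature of `"John Doe"`, while a body whose signature starts with `"--"` (no trailing space) is returned unsplit.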
I've run a security scan on my server and got some CRLF injection warnings.
So, as recommended, I've sanitized all my query parameter inputs like this:
var encodedStringSafeFromCRLF = Server.UrlDecode(Request.QueryString["address"])
.Replace("\r", string.Empty)
.Replace("%0d", string.Empty)
.Replace("%0D", string.Empty)
.Replace("\n", string.Empty)
.Replace("%0a", string.Empty)
.Replace("%0A", string.Empty);
Suppose a genuine user sends me an address via the "address" query parameter, for example:
https://mywebsite.com/details?instId=151711&address=24%20House%20Road%0aSomePlace%0aCountry
Since "%0A" is stripped from that string, the address becomes
'24 House RoadSomePlaceCountry', which is not what I expected.
How should I handle this?
If I sanitize CRLF, I change how the input is interpreted.
If I don't sanitize the input string, my server is open to CRLF attacks.
Any suggestions?
If you really need the user to supply data with CRLF sequences, then I would not filter those. As always, never trust user-supplied data in any way: do not use it unencoded to generate HTTP headers or responses, and do not write it raw to log files.
In general, it's safer to filter the other way around: specify all the characters you are willing to accept, and filter out everything else.
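A sketch of the allow-list idea in C#. The allowed set (letters, digits, space, "\n" and a few punctuation marks) is an assumption for postal addresses, not a recommendation for every field, and `AllowList` is a hypothetical name. "\n" is kept deliberately so multi-line addresses survive; "\r" and every other control character are dropped.

```csharp
using System.Linq;

// Hypothetical allow-list filter for an address field.
public static class AllowList
{
    // Besides letters and digits (char.IsLetterOrDigit, so non-ASCII letters too),
    // only these characters are kept. Adjust to your own data.
    private const string ExtraAllowed = " ,.-'#/\n";

    public static string Filter(string input) =>
        new string((input ?? string.Empty)
            .Where(c => char.IsLetterOrDigit(c) || ExtraAllowed.IndexOf(c) >= 0)
            .ToArray());
}
```

The value is still user data after filtering; keeping "\n" is only safe as long as it never reaches an HTTP header unencoded.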
If you need to write the data to a log, you could for example URL encode the data first, so that "naked" CR LF are never written there.
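In C#, `Uri.EscapeDataString` is one way to do that URL encoding; `LogSafe` is a hypothetical wrapper name.

```csharp
using System;

public static class LogSafe
{
    // Percent-encodes a user-supplied value before it is written to a log.
    // CR and LF become %0D and %0A, so the value can never start a new log line.
    public static string Encode(string value) =>
        Uri.EscapeDataString(value ?? string.Empty);
}
```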
I might internally specify that I use just \n as the newline, and convert all \r, \n, and \r\n into just one representation \n internally. So the rest of the code does not have to handle all versions.
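That normalization step could look like this; `Newlines.Normalize` is a hypothetical helper name.

```csharp
public static class Newlines
{
    // Converts "\r\n" first, then any lone "\r", into "\n", so the rest of the
    // code only ever has to deal with a single newline representation.
    public static string Normalize(string s) =>
        (s ?? string.Empty).Replace("\r\n", "\n").Replace('\r', '\n');
}
```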
In a view I rendered two mailto links, both with a body parameter. One has a short body, the other a very long one. When I click the link with the shorter body, it works and Outlook opens. The link with the longer body does not work (I click and nothing happens). This only happens in Chrome; in other browsers both links work. I noticed that in Chrome's page source the longer body text is shortened with some notation. This might be the issue.
Does anyone know how to solve this problem? Any help would be appreciated.
Check for spaces between the colon and the recipients, and between multiple recipients, and remove them.
Parameters that can be used with mailto (the first one is introduced with ?, any further ones with &):
mailto: set the recipient, or recipients, separated by commas
&cc= set the CC recipient(s)
&bcc= set the BCC recipient(s)
&subject= set the email subject; URL-encode the value for longer sentences (for example, replace spaces with %20)
&body= set the body of the message, including line breaks. Line breaks should be converted to %0A (RFC 6068 specifies %0D%0A).
A MailTo Generator can be found here.
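A minimal C# sketch of assembling such a link, assuming only subject and body are needed (cc and bcc would follow the same pattern). `MailtoBuilder` is hypothetical; it uses `Uri.EscapeDataString`, which encodes spaces as %20 rather than +, and it normalizes line breaks to CRLF so they come out as %0D%0A.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for building a mailto: URI.
public static class MailtoBuilder
{
    public static string Build(IEnumerable<string> recipients, string subject, string body)
    {
        // Recipients are comma-separated, with surrounding spaces removed.
        string to = string.Join(",", recipients.Select(r => r.Trim()));

        var fields = new List<string>();
        if (!string.IsNullOrEmpty(subject))
        {
            // EscapeDataString turns spaces into %20 and "&" into %26.
            fields.Add("subject=" + Uri.EscapeDataString(subject));
        }
        if (!string.IsNullOrEmpty(body))
        {
            // Normalize line breaks to CRLF so they are encoded as %0D%0A.
            string crlfBody = body.Replace("\r\n", "\n").Replace("\n", "\r\n");
            fields.Add("body=" + Uri.EscapeDataString(crlfBody));
        }

        // The first field follows "?", the rest are joined with "&".
        return fields.Count == 0
            ? "mailto:" + to
            : "mailto:" + to + "?" + string.Join("&", fields);
    }
}
```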
I have a textarea whose contents are saved to a database and then used as the body of an email.
I allow tokens to be used as placeholders for information pertaining to that message.
If I don't touch the placeholders at all the email sends just fine with the line breaks exactly as they are in the textbox (the email is being sent in plain text).
However, when I start using the replace function, the new line characters start disappearing and all the lines get pushed together.
For example:
Body = Body.Replace("%procedure%", CurrentOrder.Description);
This replaces the text %procedure%, but also removes the newline at the end of the line, even when the newline isn't directly after the replaced text.
Any ideas?
EDIT:
For now, I'm just replacing "\n" with "<br />" and sending the email as HTML. I would rather keep it as plain text, as I have no control over the recipients at all.
EDIT 2: It appears to be an issue with Outlook itself, not the email. I viewed the exact same email in Gmail, and the formatting was correct.
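For what it's worth, `string.Replace` only touches the matched text; because .NET strings are immutable it returns a new string, which must be assigned back. A hypothetical helper for several tokens, sketched with `StringBuilder.Replace`, leaves the "\r\n" line breaks intact:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: replaces each %token% with its value.
public static class TokenReplacer
{
    public static string Apply(string template, IReadOnlyDictionary<string, string> values)
    {
        var sb = new StringBuilder(template);
        foreach (var pair in values)
        {
            // Only the exact "%key%" text is replaced; line breaks are untouched.
            sb.Replace("%" + pair.Key + "%", pair.Value ?? string.Empty);
        }
        return sb.ToString();
    }
}
```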
Outlook removes line breaks unless the line ends with two spaces.
If you're testing the emails with an account you read in Outlook, try adding two spaces before each line break and see if that fixes it.
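If you want to apply that workaround in code, a sketch, assuming the body is plain text and that the two-space trick works for your Outlook version (this thread does not verify it across versions). `OutlookPlainText` is a hypothetical name.

```csharp
using System.Linq;

public static class OutlookPlainText
{
    // Appends two spaces to every line and rejoins the lines with CRLF.
    public static string PadLineEnds(string body)
    {
        string[] lines = (body ?? string.Empty).Replace("\r\n", "\n").Split('\n');
        return string.Join("\r\n", lines.Select(line => line + "  "));
    }
}
```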
If it's for HTML, I would first of all replace the line breaks with BR tags:
str = str.Replace(Environment.NewLine, "<br/>");
Either that, or instead of using a multiline textarea, use a jQuery or AJAX HTML editor or something similar.
Alternatively, you could swap the line breaks for a placeholder (as above) and then swap them back afterwards.
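A sketch of the HTML route in C#. One thing worth adding to the BR replacement: HTML-encode the user's text first, so characters like < and & from the textarea don't turn into markup. `PlainToHtml` is a hypothetical name; it uses `System.Net.WebUtility.HtmlEncode`.

```csharp
using System.Net;

public static class PlainToHtml
{
    // Encodes the text for HTML first, then turns each line break into <br/>.
    public static string ToHtml(string text) =>
        WebUtility.HtmlEncode(text ?? string.Empty)
            .Replace("\r\n", "\n")
            .Replace("\n", "<br/>");
}
```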
I need to parse email files with regex in C#: the file contains several emails, and I want to split it into its constituents, e.g. From, To, Bcc, etc.
The regex I'm using for email addresses is
"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
The problem I'm having is that the To, Cc and Bcc fields sometimes contain more than one address and span more than one line:
To: Me meagain <me@me.com>,
Me1 meagain <me1@me.com>,Me3 meagain <me1@me.com>
Also, which regex will match the message?
Parsing an email message with regular expressions is a terrible idea. You might be able to parse the constituent parts with regular expressions, but finding the constituent parts with regular expressions is going to give you fits.
The normal case, of course, is pretty easy. But then you run across something like a message that has an embedded message within it. That is, the content includes a full email message with From:, To:, Bcc:, etc. And your naive regex parser thinks, "Oh, boy! I found a new message!"
You're better off reading and understanding the Internet Message Format and writing a real parser, or using something already written like OpenPop.NET.
Also, check out the suggestions in Reading Email using Pop3 in C# and https://stackoverflow.com/questions/26606/free-pop3-net-library, among others.
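To illustrate the multi-line To/Cc/Bcc problem from the question without a regex for the addresses themselves, here is a sketch that swaps in `System.Net.Mail.MailAddressCollection` (part of .NET, not OpenPop.NET) to parse the address list. It assumes you have already isolated one header's value; unfolding removes a line break only when the next line starts with a space or tab, which is how RFC 2822 folds long headers. `AddressHeader` is a hypothetical helper.

```csharp
using System.Linq;
using System.Net.Mail;
using System.Text.RegularExpressions;

// Hypothetical helper for one To/Cc/Bcc header value that may be folded over several lines.
public static class AddressHeader
{
    // RFC 2822 folding: a line break followed by a space or tab continues the same
    // header, so unfolding removes the line break and keeps the whitespace.
    public static string Unfold(string raw) =>
        Regex.Replace(raw, @"\r?\n(?=[ \t])", string.Empty);

    // MailAddressCollection.Add(string) accepts a comma-separated list,
    // including display names such as "Me meagain <me@me.com>".
    public static string[] ParseAddresses(string headerValue)
    {
        var addresses = new MailAddressCollection();
        addresses.Add(Unfold(headerValue));
        return addresses.Select(a => a.Address).ToArray();
    }
}
```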
A good example of the difficulty you'll face is that your regular expression for matching email addresses is inadequate. According to section 3.2.4 of RFC2822 (linked above), the following characters are allowed in the "local-part" of the email address:
atext = ALPHA / DIGIT / ; Any character except controls,
"!" / "#" / ; SP, and specials.
"$" / "%" / ; Used for atoms
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
The domain name can contain any ASCII except whitespace and the "\" character, and has to meet some format requirements. Then there's the "obsolete" stuff that, although deprecated, is still in use. And that's just in parsing email addresses. If you look at the stuff that can be included in the other fields, I think you'll agree that trying to parse it with regular expressions is going to be frustrating at best.
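To make the point concrete, here is a C# sketch comparing the question's pattern (with @ restored and anchored so it must match the whole string) against a pattern for RFC 2822's dot-atom local part built from the atext list above. It covers only the simple dot-atom form, not quoted strings or the obsolete syntax; `LocalPartCheck` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class LocalPartCheck
{
    // atext from RFC 2822 section 3.2.4, ASCII only.
    private const string Atext = @"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";

    // dot-atom-text: one or more atext characters, optionally separated by single dots.
    private static readonly Regex DotAtom =
        new Regex("^" + Atext + @"+(?:\." + Atext + @"+)*\z");

    // The question's pattern, anchored so it has to match the whole address.
    private static readonly Regex QuestionPattern =
        new Regex(@"^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*\z");

    public static bool IsDotAtomLocalPart(string localPart) => DotAtom.IsMatch(localPart);

    public static bool MatchesQuestionPattern(string address) => QuestionPattern.IsMatch(address);
}
```

A local part such as `o'brien` is valid dot-atom text, yet `o'brien@example.com` is rejected by the question's pattern.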
http://www.codeproject.com/KB/office/reading_an_outlook_msg.aspx
The above tutorial will give you a decent idea of how to read *.msg files from the file system. If you work with the System.Net.Mail.MailMessage object, you can get at information such as:
senders,
recipients,
attachments,
HTML email body,
plain-text email body,
etc...
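Note that `System.Net.Mail.MailMessage` does not read .msg files by itself; the tutorial covers that part. Once the data is in a `MailMessage`, the items in the list map onto its properties roughly like this (a sketch; `MessageSummary` is a hypothetical name):

```csharp
using System.Linq;
using System.Net.Mail;

public static class MessageSummary
{
    // Pulls the sender, recipients, attachment count, HTML flag and the number
    // of alternate views (e.g. a plain-text version next to an HTML body).
    public static (string From, string[] To, int Attachments, bool IsHtml, int AlternateViews) Summarize(MailMessage m) =>
        (m.From?.Address,
         m.To.Select(a => a.Address).ToArray(),
         m.Attachments.Count,
         m.IsBodyHtml,
         m.AlternateViews.Count);
}
```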
I created an API called SigParser that does this for you. It breaks reply-chain emails into their parts and handles problems like these where lines are split. You get an array of the individual response bodies, along with who each section of the email was sent to, if that data was in the reply-chain header.
I have an ASP.NET/C# application, part of which converts WWW links to mailto links in an HTML email.
For example, if I have a link such as:
www.site.com
It gets rewritten as:
mailto:my@address.com?Subject=www.site.com
This works extremely well until I run into URLs with ampersands, which cause the subject to be truncated.
For example the link:
www.site.com?val1=a&val2=b
Shows up as:
mailto:my@address.com?Subject=www.site.com?val1=a&val2=b
Which is exactly what I want, but then when clicked, it creates a message with:
subject=www.site.com?val1=a
Which has dropped the &val2, which makes sense as & is the delimiter in a mailto command.
So, I have tried various other ways to work around this, with no success.
I have tried explicitly quoting the subject='' part, and that did nothing.
In C#, I replaced '&' with '&amp;', which Live Mail and Thunderbird just turn back into:
www.site.com?val1=a&val2=b
I replaced '&' with '%26' which resulted in:
mailto:my@address.com?Subject=www.site.com?val1=a%26amp;val2=b
In the mail with the subject:
www.site.com?val1=a&amp;val2=b
EDIT:
In response to how the URL is being built: this is heavily trimmed down, but it is the gist of it. In place of att.Value.Replace I have tried System.Web.HttpUtility.UrlEncode calls, which also fail.
HtmlAgilityPack.HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href]");
if (nodes != null) // SelectNodes can return null when nothing matches
{
    foreach (HtmlAgilityPack.HtmlNode link in nodes)
    {
        HtmlAgilityPack.HtmlAttribute att = link.Attributes["href"];
        att.Value = att.Value.Replace("&", "%26");
    }
}
Try mailto:my@address.com?Subject=www.site.com?val1=a%26val2=b
&amp; is an HTML escape code, whereas %26 is a URL escape code. Since this is a URL, %26 is all you need.
EDIT: I figured that's how you were building your URL. Don't build URLs that way! You need to get the %26 in there before you let anything else parse or escape it. If you really must do it this way (which you should try to avoid), then search for "&amp;" instead of just "&", because the string has already been HTML-escaped at this point.
So, ideally, you build your URL properly before it's HTML escaped. If you can't do it properly, at least search for the right string instead of the wrong one. "&" is the wrong one.
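A sketch of building the link properly in C#: percent-encode the subject with `Uri.EscapeDataString` before the string ever reaches the HTML, so & becomes %26 (and ? and = become %3F and %3D). `MailtoLink` is a hypothetical helper, and the address is the question's placeholder.

```csharp
using System;

public static class MailtoLink
{
    // Encode the subject *before* anything HTML-escapes it, so "&" can no longer
    // be read as a mailto field delimiter.
    public static string Build(string address, string subject) =>
        "mailto:" + address + "?subject=" + Uri.EscapeDataString(subject);
}
```

`MailtoLink.Build("my@address.com", "www.site.com?val1=a&val2=b")` produces `mailto:my@address.com?subject=www.site.com%3Fval1%3Da%26val2%3Db`, which contains no literal & for later HTML escaping to mangle. `HttpUtility.UrlEncode`, by contrast, encodes spaces as +, which a mail client may show literally in a mailto subject.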
You can't put arbitrary characters in the subject. You could try using the System.Web.HttpUtility.UrlEncode function on the subject's value.
Using the URL escape code %26 is the right way.
Sadly, this still doesn't work on the Android OS because of bug 8023.
What I ended up doing for my case was eliminating the &.
For example, www.site.com/mytest.php?val1=a=b=c, where the values after the second and third = stand in for val2 and val3, so it is equivalent to www.site.com?val1=a&val2=b&val3=c.
In mytest.php I explode on ? and then explode again on =.
A total hack I know but it does work for me.