Best Practice to parse email and collect information you need - c#

We have implemented an algorithm to parse emails and collect information from them. For example, if the email is:
Hi John,
Here is your reservation info
Name : John F
Date : 2/12/2013
State : NY ....
We save a configuration for each email form: look for keys such as "Name", "Date" and "State"; ":" is the delimiter, and anything that follows ":" is the value for that key. This is how we parse and collect the information we want.
We read the email from a Gmail inbox. Sometimes the email body we get from Gmail is cluttered, so our algorithm can't read the KEY:VALUE pairs and captures nothing. The actual email in the Gmail inbox looks neat and tidy, with all its formatting, but the email source we get in code is different. I'm not sure whether this has anything to do with encoding. Please suggest what the reason could be.
Really appreciate your thoughts.
Here is an example: the first image is what we see in Gmail and the second image is what we get as the source. Note that we are parsing the plain text, NOT the HTML.
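The question does not show the raw source, so the actual cause of the clutter is unknown. The following is only a minimal sketch of the KEY : VALUE lookup described above, written so that it tolerates mixed line endings, non-breaking spaces and extra whitespace around the delimiter. The names ReservationParser and Parse are hypothetical, and the sketch assumes each pair sits on its own line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ReservationParser
{
    // Returns the value found for each configured key; keys that do not
    // appear in the body are simply absent from the result.
    public static Dictionary<string, string> Parse(string body, IEnumerable<string> keys)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Normalize line endings and non-breaking spaces before matching.
        string text = body.Replace("\r\n", "\n").Replace('\r', '\n').Replace('\u00A0', ' ');

        foreach (string key in keys)
        {
            // "Key : Value" on a single line, with optional spaces or tabs
            // before the key and around the ':' delimiter.
            string pattern = @"^[ \t]*" + Regex.Escape(key) + @"[ \t]*:[ \t]*(?<value>[^\n]*)$";
            Match m = Regex.Match(text, pattern, RegexOptions.Multiline | RegexOptions.IgnoreCase);
            if (m.Success)
            {
                result[key] = m.Groups["value"].Value.Trim();
            }
        }
        return result;
    }
}
```

Calling ReservationParser.Parse(body, new[] { "Name", "Date", "State" }) on the sample email would yield one entry per key that is present. If the body is still encoded (for example quoted-printable), it would have to be decoded before this step; that part is not covered here.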

Related

Parsing Email(Email format is not Fixed) for certain fields in c#

We have a requirement to parse an email and get certain fields, such as the name, address and From email address, from the mail body and header. The problem is that the format of the email is not fixed: the fields can come in any order in the body, but we only need the values of the fields mentioned above.
I am able to get the whole email, but I don't understand how to extract the values of the required fields.
Please have a look and help me out.
Regards
Vineet More
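One way to make the order of the fields irrelevant is to read every labelled line into a dictionary first and look up the required fields afterwards. The sketch below assumes each field appears on its own line as "Label: value", which the question does not confirm; MailFieldExtractor, ReadFields and SenderAddress are hypothetical names. The sender is taken from a raw From header value with System.Net.Mail.MailAddress.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mail;

public static class MailFieldExtractor
{
    // Reads every "Label: value" line into a case-insensitive dictionary,
    // so the order of the fields in the body does not matter.
    public static Dictionary<string, string> ReadFields(string body)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string[] lines = body.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon <= 0) continue;            // no label before a colon on this line

            string key = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();

            // Keep the first occurrence of each label.
            if (key.Length > 0 && !fields.ContainsKey(key))
            {
                fields[key] = value;
            }
        }
        return fields;
    }

    // Extracts the bare address from a From header value such as
    // "Some Name <someone@example.com>". Throws FormatException if invalid.
    public static string SenderAddress(string fromHeader)
    {
        return new MailAddress(fromHeader).Address;
    }
}
```

After ReadFields(body), the required values can be fetched with TryGetValue("Name", out ...) and TryGetValue("Address", out ...), wherever they appeared in the body.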

Send RTF in a HTML body by email C#

We have a C# application which sends emails to clients. These emails can contain information about several things, and this information can include a note.
Example email:
Person: John
Age: 35
Note: He works as developer.
(John's picture)
(Excel table)
Person: Mary
Age: 40
Note: (Another picture)
bla bla bla
Until now we extracted the plain text of the note, but now we want to send the whole note (it is written in RTF and can contain images, Excel tables and so on).
The email body is HTML and can contain several notes.
Does anyone know the best way to add these notes to the email? Is it even possible, given that the body is an HTML document and I have to add several notes? Would it be easier to add each note as an image (rendered from the RTF), or is it better to convert it to HTML?
I hope you can help me or guide me.
Thank you in advance.
Regards.
It's definitely possible; I did something similar. First read the RTF contents into a string (say rtfContent) using InputStreamReader, then pass this string to a method ConvertRtf2Html(rtfContent). You can follow this link to download the project, which converts RTF content to HTML and much more (I don't know all of its functionality, as I used only the ConvertRtf2Html() method).

building a multiple email regular expression for a regular expression validator on asp.net c#?

I'm trying to make a form in ASP.NET where you can send emails. I want a textbox where I can type all the email addresses I want to send a message to...
However, I want to add a validator so that the user always enters the correct syntax, and then use that string in C# to send the message properly.
For now, I have a validator for a single email address...
<asp:RegularExpressionValidator ID="RegularExpressionValidator5" runat="server"
ControlToValidate="tbxAlClEmail" ErrorMessage="E-mail Invalido"
Font-Bold="True" ForeColor="Red"
ValidationExpression="\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
ValidationGroup="vgpNuevoCliente">
</asp:RegularExpressionValidator>
So, we can see that...
email = \w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
(I know that it is difficult to find a regular expression that can validate every kind of email address... but this will have to do.)
Now I want to adapt this regular expression for an ASP.NET validator:
email ( (,|;) [SPACE]* email )*
Example of the input I want to accept:
john@hotmail.com, amy@yahoo.com,diana@hotmail.com; alicia@gmail.com
I hope you can help me with that...
Thanks in advance
What you have won't work to validate email addresses.
Using regex for email validation is extremely hard to do right and there are literally hundreds of examples on the net of validators that get close... but still aren't perfect.
But that's only a small part of the problem anyway. Your absolute best bet is to drop trying to validate the address and simply send the messages while telling the user which addresses failed.
If you are doing it right, then you aren't including every address in the TO field anyway and instead are sending distinct messages to each individual address. Which would make it fairly easy to report errors.
Of course, even then, numerous mail servers are configured not to respond at all when a message is sent to a bad address; they just black-hole the message. No amount of validation is going to get past that.
For fun, you might want to read the following in its entirety so that you understand the full problem: http://www.regular-expressions.info/email.html
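If a format check is still wanted despite the caveats above, the list grammar from the question, email ( (,|;) [SPACE]* email )*, can be composed from the single-address pattern. This is only a sketch: it inherits every weakness of that single-address pattern (written here with '@' as the separator), and AddressListPattern and IsValidList are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class AddressListPattern
{
    // Single-address pattern taken from the question; known to be imperfect.
    public const string Email = @"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    // email ( (,|;) [SPACE]* email )*  anchored to the whole input.
    public static readonly string List = "^" + Email + "([,;] *" + Email + ")*$";

    // True when the input is one or more addresses separated by ',' or ';',
    // each separator optionally followed by spaces.
    public static bool IsValidList(string input)
    {
        return Regex.IsMatch(input, List);
    }
}
```

The same composed string could be used as the ValidationExpression of the validator. Passing the check says nothing about whether an address actually exists, so per-address error reporting when sending is still needed.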

How to know if attachment is a signature in an Outlook email

I have an Outlook email and I need to process its attachments. But when iterating through the attachments, if the attachment is a signature I want to skip it.
To know if the attachment is a signature I am using:
outlookMailItem.Attachments[i].PropertyAccessor.GetProperty(
"http://schemas.microsoft.com/mapi/proptag/0x3712001E");
But I am getting an Outlook security alert.
Is there another way using a safer code? Can it be done using Redemption?
PR_ATTACH_CONTENT_ID property is a good indication that an attachment is an embedded image, but there are attachments that have PR_ATTACH_CONTENT_ID property set, but they are not embedded images (Lotus Notes likes to set PR_ATTACH_CONTENT_ID on all attachments).
Even if PR_ATTACH_CONTENT_ID is not set, Outlook can use PR_ATTACH_CONTENT_LOCATION or PR_ATTACH_LONG_FILENAME to load an embedded image.
The only real test is to parse the HTML body and figure out which <img> tags refer to the attachments.
Redemption (I am its author) will let you access that property using RDOAttachment.Fields, you can also use RDOAttachment.Hidden property, which jumps through a few hoops to figure out whether an attachment is an embedded image and not a "real" attachment.
RDOAttachment.Hidden property works well only if the email format is HTML. For emails in Rich Text Format, the signature image would be treated as any other attachment and will have this value as false. A better bet would be to use "Attachment.Type", which works for both HTML and Rich Text. For signature, it would always be olOLE and for other attachments, it would be olByValue. So, you can filter the signature images using this property. However, note that, if the email format is Rich Text and if you have a screenshot embedded within the email, it's treated as olOLE type.
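The HTML-body test mentioned above can be sketched roughly as follows. It only looks for <img> tags whose src is a cid: reference and compares them with an attachment's content ID; it does not cover images resolved through PR_ATTACH_CONTENT_LOCATION or PR_ATTACH_LONG_FILENAME, and it uses a regular expression rather than a real HTML parser. EmbeddedImageFinder and its methods are hypothetical names, and obtaining the HTML body and content IDs from Outlook or Redemption is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmbeddedImageFinder
{
    // Matches <img ... src="cid:SOME-ID"> and captures SOME-ID.
    private static readonly Regex CidImage = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*[""']cid:(?<id>[^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Collects every content ID referenced by an <img> tag in the HTML body.
    public static HashSet<string> ReferencedContentIds(string htmlBody)
    {
        var ids = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in CidImage.Matches(htmlBody))
        {
            ids.Add(m.Groups["id"].Value);
        }
        return ids;
    }

    // An attachment counts as embedded only if the body actually refers to it.
    public static bool IsEmbedded(string contentId, HashSet<string> referenced)
    {
        return !string.IsNullOrEmpty(contentId) && referenced.Contains(contentId);
    }
}
```

With this check, an attachment that merely has a content ID set (as described for Lotus Notes) but is not referenced from the body would not be treated as an embedded image.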

Stop Auto-hyperlink in Outlook, Gmail etc

My web application sends emails to users. The email contains a link for further user action. Our security standards require that the link in the email cannot be clickable. However, the email clients recognize https:// in the email and auto-link the URL.
Any idea how to stop the email clients from auto-linking? I am thinking that if I skip the https://, it may stop the auto-linking. But if I have to keep the https://, is there any way to avoid auto-linking?
The link in the email is dynamically constructed in the c# code.
I know this thread is old, but I just had this issue myself, and wasn't thrilled by the GIF image fix. If you're working with HTML emails, a slightly nicer solution is to break up the link text with a non-rendering tag that tricks the parser. I'm a fan of a simple nonexistent <z>:
https<z>://securesite.</z>com
It even works in Stack Overflow posts: https://securesite.com.
Hope this helps someone.
I too wish to disable this, as I believe this is a "valid" use as to not wanting auto-linking (one reason is the designer wants it that way, and they are currently paying the bills).
In email sent that has no images, the header has the domain name in it:
EXTRANET.EXAMPLE.COM
I even put inline styles to make sure it stays white on a black background:
<span style="font-size: 1.5em;padding: 0.5em 0;text-transform: uppercase; font-weight:bold;color:#FFFFFF;text-decoration:none;">EXTRANET.EXAMPLE.COM</span>
Gmail makes this a link, adds an underline and also turns it bright blue instead of the intended white.
At first I tried replacing the dots with the HTML entity &#46;, which made it look fine but didn't fool the Gmail parser.
So I added a spanned space, which works just fine (i.e. it fools Gmail's parser):
<span style="font-size: 1.5em;padding: 0.5em 0;text-transform: uppercase; font-weight:bold;color:#FFFFFF;text-decoration:none;">EXTRANET<span style="font-size:0.1em"> </span>.<span style="font-size:0.1em"> </span>EXAMPLE<span style="font-size:0.1em"> </span>.<span style="font-size:0.1em"> </span>COM</span>
Just create a plain <span> tag around the colon (<span>:</span>) or something like that :)
Replace the actual text with a small GIF image that looks like text.
Email parsers will not recognize text within an image.
My application has a similar security requirement. The solution we used was to add an underscore to the beginning of the URL (_http://).
Sorry to dredge up an old question, but I just tried the answer suggested by pieman72, and found that it didn't work in Outlook 2007–2013. However, wrapping the individual elements of the URL within table cells did fool the Outlook parser:
Visit <table><tr><td>www.</td><td>website</td><td>.com</td></tr></table> for more information.
I ran a sample message through the Email On Acid test suite and found that it eluded the parser on all the major e-mail clients which automatically convert URLs (Outlook, iOS, Android 2.2, etc.) I did not run any deliverability tests.
@raugfer suggests in another answer: wrap the email/URL with an anchor.
<a name="myname">test@email.com</a>
Quoting from that answer:
Since the text is already wrapped in a hyperlink, Gmail gives up and
leave it alone. :)
(Note: also worked for Apple mail client.)
Necroing the question, I know, but it's relevant... I'd like to present a reasonable scenario where Gmail's auto-linking (at least - haven't tested other clients) doesn't make sense.
A client has an application form on their site, where visitors fill out some personal information and submit it. The system then sends a notification email to the client, presenting the information the visitor supplied.
I'm wanting to enhance the email sent to the client by adding a <textarea> at the bottom, with the fields the visitor filled out presented in CSV format so that the client can simply copy it all and paste it into a spreadsheet.
Gmail, however, fails to recognize that the URLs and email addresses are inside a <textarea> tag, and "helpfully" adds the ... link code around the URL/email - inside the <textarea>. This results in the raw HTML link code showing up in the <textarea>.
This is what I did:
Replace all instances of "." with <span style=""color:transparent; font-size:0px;"">[{</span>.<span style=""color:transparent; font-size:0px;"">}]</span>
Replace all instances of "@" with <span style=""color:transparent; font-size:0px;"">[{</span>@<span style=""color:transparent; font-size:0px;"">}]</span>
These characters stopped it parsing links and email addresses, but aren't visible to the user. The negative is that when you copy and paste an email for example, you end up with: "test1{[{.}]}domain{[{.}]}com"
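Since the link is built in C#, the two replacements described above can be applied in a single pass so that neither replacement touches markup inserted by the other. This is a minimal sketch; AutoLinkBreaker and Obfuscate are hypothetical names, and the input is assumed to be text that is already safe to place in an HTML body.

```csharp
using System.Text.RegularExpressions;

public static class AutoLinkBreaker
{
    // Invisible wrappers placed around each '.' and '@'.
    private const string Open = "<span style=\"color:transparent; font-size:0px;\">[{</span>";
    private const string Close = "<span style=\"color:transparent; font-size:0px;\">}]</span>";

    // Wraps every '.' and '@' in the hidden spans so that mail clients
    // no longer see a contiguous URL or email address.
    public static string Obfuscate(string text)
    {
        return Regex.Replace(text, "[.@]", m => Open + m.Value + Close);
    }
}
```

As noted above, the hidden characters come along when a recipient copies the text, so this only suits content that is meant to be read rather than pasted.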
