I built an email parser that extracts TIFF attachments from emails sent by two different fax providers, RingCentral and eFax.
The application uses POP3 to retrieve the email as a text stream and then parses the text to identify the section that represents the TIFF image.
By converting that section of text to a byte array and using a BinaryWriter, I'm able to create the TIFF file on my local hard drive.
public void SaveToFile(string filepath)
{
    // using ensures the stream is flushed and closed even if Write throws
    using (var bw = new BinaryWriter(new FileStream(filepath, FileMode.Create)))
    {
        bw.Write(this.Data);
    }
}
The issue is that the eFax email attachments cause runtime errors when converting the text to a byte array.
//_data is a byte array
//RawData is a string
_data = Convert.FromBase64String(RawData); //fails on this line
I get the following error:
The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or a non-white space character among the padding characters.
I assume it has something to do with the encoding/decoding of the string, but I've tried various encoding types and still get the error.
Some additional information:
Programming Language: C#
Email Host: GMail
If I manually forward the email back to myself, the parser works, but will not work against the original.
I even tried auto-forwarding in GMail but this did not work.
In response to the first comment below: thanks for your reply.
The TIFF file is created by taking the section of text from the email that is associated to the TIFF file attachment, converting it to a byte array, and saving the file with a .tiff file extension. This works fine for all RingCentral emails. For example, the RingCentral email section header looks like this:
------=_NextPart_3327195283162919167883
Content-Type: image/tiff; name="18307730038-0803-141603-326.tif"
Content-Transfer-Encoding: base64
Content-Description: 18307730038-0803-141603-326.tif
Content-Disposition: attachment; filename="18307730038-0803-141603-326.tif"
Please note the Content-Transfer-Encoding value of base64. This explains why I use the following C# conversion code:
_data = Convert.FromBase64String(tiffEmailString);
_data is the private backing field for the this.Data property used in the SaveToFile method above (i.e. _data is what gets written when this.Data is read).
Now for the eFax (the email that fails) section header:
Content-Type: image/tiff; name=FAX_20130802_1375447833_61.tif
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="FAX_20130802_1375447833_61.tif"
Content-MD5: 1B2M2Y8AsgTpgAmY7PhCfg==
It too shows base64, so shouldn't the Convert.FromBase64String() call work?
I'm also going to check whether my parser is grabbing additional text. But if I'm missing something, please point it out. Thanks.
Latest update:
As it turns out, the issue was not the encoding but my parser! I was inadvertently including an additional header value in the attachment text. It's working now. Thanks.
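For anyone hitting the same error, here is a defensive sketch of that fix (the helper name and input layout are hypothetical, not my actual parser): strip everything up to the first blank line before decoding, since Convert.FromBase64String tolerates whitespace but not stray header text.

```csharp
using System;

public class AttachmentDecoder
{
    // Hypothetical helper: isolate the base64 body of a MIME part.
    // Headers end at the first blank line; anything before it must not
    // be fed to Convert.FromBase64String.
    public static byte[] DecodeAttachment(string mimePart)
    {
        int bodyStart = mimePart.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        string body = bodyStart >= 0 ? mimePart.Substring(bodyStart + 4) : mimePart;

        // FromBase64String skips embedded whitespace (line breaks),
        // so only the surrounding headers need to be removed
        return Convert.FromBase64String(body.Trim());
    }
}
```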
What I have to do
I have to create a text file (.txt, .doc, ...) containing the exact text passed by a .NET Web API, emojis included, and attach it to an email.
Situation:
I have a project with a .NET Web API. One of my routes creates a text file and attaches it to an email, with some text passed by a device that may contain emojis.
I can't figure out how to save the emojis correctly. If I copy-paste an emoji into a Word or Notepad file it works, but if I save it through my code it doesn't. I suppose it is due to the encoding, but I have tried Unicode, UTF-32, UTF-8, ASCII, ...
I tried many solutions found here on SO, but none of them worked for me.
For example, this emoji (copy-pasted from the .NET debugger) --> 🎶 is converted into a quotation mark or ¶ó, depending on the encoding used.
How can I save emoji as text into a file so that they can be read by the receivers?
This is what I've done:
//smsText is a string containing emojis
byte[] bytes = Encoding.Unicode.GetBytes(smsText);
Attachment attachment = new Attachment(new MemoryStream(bytes), tokenKey + ".doc");
attachment.ContentType = new ContentType("application/ms-word");
List<Attachment> attachments = new List<Attachment>();
attachments.Add(attachment);
//send email with attachments
Note that smsText, in the debugger, contains the 🎶 correctly displayed.
The email correctly reaches the receiver with the .doc attachment, but the attachment doesn't contain the emojis.
Your smsText contains a plaintext string. You can't just write that string into a stream or file and then call it a Word file*.
Word files are binary files with a specific format. You need to use a library that can write this format, or use Interop to interoperate with an existing Word installation.
See for example Free library to MS Word.
And if you're fine with plaintext files, just write the text's bytes to a stream and propagate the appropriate encoding (in this case Unicode, being UTF-16 on .NET).
*: yes you can, just like that Excel tries its best to format an HTML table as an Excel document, but you shouldn't.
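If plaintext is acceptable, a minimal sketch of that last suggestion (the method name is hypothetical; smsText and tokenKey come from the question; writing a UTF-8 BOM is an assumption that helps most editors auto-detect the encoding):

```csharp
using System.IO;
using System.Net.Mail;
using System.Net.Mime;
using System.Text;

public class Example
{
    public static Attachment BuildTextAttachment(string smsText, string tokenKey)
    {
        // UTF8Encoding(true) emits a byte-order mark, so editors like
        // Notepad and Word recognize the file as UTF-8 and render emojis
        var utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);

        var stream = new MemoryStream();
        byte[] bom = utf8.GetPreamble();
        byte[] body = utf8.GetBytes(smsText);
        stream.Write(bom, 0, bom.Length);
        stream.Write(body, 0, body.Length);
        stream.Position = 0;

        // A .txt attachment with an honest content type, instead of
        // pretending the plaintext is a Word document
        return new Attachment(stream, tokenKey + ".txt", MediaTypeNames.Text.Plain);
    }
}
```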
While using MimeKit to convert .eml files to .msg files, I'm running into an issue that appears to be related to encoding.
With an EML file containing the following, for instance:
--__NEXTPART_20160610_5EF5CF91_471687D
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
添付ファイル名テスト
The result is garbage in the body content:
・Y・t・t・#・C・・・シ・e・X・g
Additionally, base-64 encoded ü characters are showing up as ?? when the EML file is read. I've downloaded the latest release of MimeKit, but it doesn't seem to make a difference.
The .eml files open properly in Outlook 2016, but MimeKit does not appear to be able to read and decode the files properly.
There are a few problems with your above MIME snippet :(
Content-Transfer-Encoding: 7bit is obviously not true, although that's not likely to be the problem (MimeKit ignores values of 7bit and 8bit for this very reason).
Most importantly, however, is the fact that the charset parameter is iso-2022-jp but the content itself is very clearly not iso-2022-jp (it looks like utf-8).
When you get the TextPart.Text value, MimeKit gets that string by converting the raw stream content using the charset specified in the Content-Type header. If that is wrong, then the Text property will also have the wrong value.
The good news is that TextPart has GetText methods that allow you to specify a charset override.
I would recommend trying:
var text = part.GetText (Encoding.UTF8);
See if that works.
FWIW, iso-2022-jp is an encoding that forces Japanese characters into a 7bit ascii form that looks like complete gibberish. This is what your Japanese text would look like if it were actually in iso-2022-jp:
BE:IU%U%!%$%kL>%F%9%H
That's how I know it's not iso-2022-jp :)
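To illustrate with a quick roundtrip sketch: iso-2022-jp shifts into JIS character sets using escape sequences, so every encoded byte stays in the 7-bit range.

```csharp
using System;
using System.Linq;
using System.Text;

public class Iso2022Demo
{
    public static void Main()
    {
        // On .NET Core/.NET 5+, iso-2022-jp requires registering
        // CodePagesEncodingProvider first; on .NET Framework it is built in
        var jis = Encoding.GetEncoding("iso-2022-jp");
        byte[] encoded = jis.GetBytes("添付ファイル名テスト");

        // Every byte is 7-bit ASCII, hence the gibberish look when a
        // reader mislabels or misdecodes it
        Console.WriteLine(encoded.All(b => b < 0x80)); // True
    }
}
```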
Update:
Ultimately, the solution will probably be something like this:
var encodings = new List<Encoding> ();
string text = null;

try {
    var encoding = Encoding.GetEncoding (part.ContentType.Charset,
        new EncoderExceptionFallback (),
        new DecoderExceptionFallback ());

    encodings.Add (encoding);
} catch (ArgumentException) {
} catch (NotSupportedException) {
}

// add utf-8 as our first fallback
encodings.Add (Encoding.GetEncoding (65001,
    new EncoderExceptionFallback (),
    new DecoderExceptionFallback ()));

// add iso-8859-1 as our final fallback
encodings.Add (Encoding.GetEncoding (28591,
    new EncoderExceptionFallback (),
    new DecoderExceptionFallback ()));

for (int i = 0; i < encodings.Count; i++) {
    try {
        text = part.GetText (encodings[i]);
        break;
    } catch (DecoderFallbackException) {
        // this means that the content did not convert cleanly
    }
}
I have to convert the content of a mail message to XML format, but I am facing some encoding problems. Indeed, all my accented characters and some others are displayed in the message file as their hex values.
Ex :
é is displayed =E9,
ô is displayed =F4,
= is displayed =3D...
The mail is configured to be sent with iso-8859-1 coding and I can see these parameters in the file :
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Notepad++ detects the file as "ANSI as UTF-8".
I need to convert it in C# (I am in a script task in an SSIS project) so that it is readable, and I cannot manage to do that.
I tried encoding it as UTF-8 in my StreamReader, but it does nothing. Despite my reading on the topic, I still do not really understand the steps that lead to my problem or how to solve it.
I point out that Outlook decodes the message well and the accented characters are displayed correctly.
Thanks in advance.
OK, I was looking in the wrong direction. The keyword here is "Quoted-Printable". This is where my issue comes from and this is what I really have to decode.
In order to do that, I followed the example posted by Martin Murphy in this thread :
C#: Class for decoding Quoted-Printable encoding?
The method described is :
public static string DecodeQuotedPrintables(string input)
{
    // requires: using System.Text.RegularExpressions;
    var occurences = new Regex(@"=[0-9A-F]{2}", RegexOptions.Multiline);
    var matches = occurences.Matches(input);

    foreach (Match match in matches)
    {
        char hexChar = (char) Convert.ToInt32(match.Groups[0].Value.Substring(1), 16);
        input = input.Replace(match.Groups[0].Value, hexChar.ToString());
    }

    // soft line breaks ("=" at end of line) are removed last
    return input.Replace("=\r\n", "");
}
To summarize, I open a StreamReader in UTF-8 and append each line read to a string, like this:
myString += line + "\r\n";
I then open my StreamWriter, also in UTF-8, and write the decoded myString variable to it:
myStreamWriter.WriteLine(DecodeQuotedPrintables(myString));
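As a quick sanity check of the decoding, here is the same method applied to a sample quoted-printable line (the sample text is made up; é is =E9 in ISO-8859-1, and "=\r\n" is a soft line break):

```csharp
using System;
using System.Text.RegularExpressions;

public class QpDemo
{
    // Same approach as the method above: replace each =XX escape with
    // the character for that hex value, then drop soft line breaks
    public static string DecodeQuotedPrintables(string input)
    {
        foreach (Match match in Regex.Matches(input, @"=[0-9A-F]{2}"))
        {
            char hexChar = (char)Convert.ToInt32(match.Value.Substring(1), 16);
            input = input.Replace(match.Value, hexChar.ToString());
        }
        return input.Replace("=\r\n", "");
    }

    public static void Main()
    {
        // "été d=\r\nemain" -> "été demain"
        Console.WriteLine(DecodeQuotedPrintables("=E9t=E9 d=\r\nemain"));
    }
}
```

Note this works here because ISO-8859-1 byte values map directly onto the first 256 Unicode code points; for multi-byte charsets like UTF-8 a per-character cast would not be enough.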
I am making a custom HttpWebRequest call to a GSoap web service that returns an XML + XOP envelope which contains a PDF Image as binary.
I am grabbing the response and taking the binary code in between the boundary string.
Finally, I convert the binary to byte[] and save it as a PDF.
Now, I can see the PDF metadata, so the encoding seems partly right, but when I try to open it I get an "insufficient data for an image" error and the image inside the PDF is not displayed.
I am converting the binary string via this:
retBytes = System.Text.Encoding.UTF8.GetBytes(modStr);
Where modStr is the string starting with %PDF-1.1 and ending with %%EOF. Do I need to do more encoding/decoding so that the image shows up granted I can see everything else (pages/metadata, etc)?
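For context, a short sketch of why round-tripping raw binary through a UTF-8 string is lossy: bytes that are not valid UTF-8 sequences get replaced during decoding, so compressed image streams inside the PDF cannot survive even when the ASCII metadata does. (The byte values below are illustrative, not real PDF content.)

```csharp
using System;
using System.Text;

public class BinaryRoundTrip
{
    public static void Main()
    {
        // "%PDF" (valid ASCII) followed by bytes that are not valid UTF-8
        byte[] original = { 0x25, 0x50, 0x44, 0x46, 0xFF, 0xD8, 0x81 };

        string asText = Encoding.UTF8.GetString(original);
        byte[] roundTripped = Encoding.UTF8.GetBytes(asText);

        // Invalid bytes became U+FFFD replacement characters on decode,
        // so the re-encoded bytes no longer match the original
        Console.WriteLine(roundTripped.Length == original.Length); // False
    }
}
```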
I am reading in a file and want to add MIME header information to the top of the file.
For example: I have a text file called test.txt. I read the file into a byte array, then want to append the following to the beginning:
Content-Type: text/plain;
name="test.txt"
Content-Transfer-Encoding: binary
Content-Disposition: attachment;
filename="test.txt"
How can I determine what the content type is? And how would you recommend adding this to the beginning of the file? I was planning on creating a string with that information, converting it to a byte array, and sticking it on the front of my file buffer, but I am worried about running into encoding problems.
You can't add header information into the file itself; it is transmitted along with the file when you are using certain protocols (chiefly SMTP and HTTP).
EDIT: If you wish to work out the content-type (also known as Internet media type) from the file content, you may wish to look at something like mime-util or Apache Tika.
EDIT 2: The answers to this question will help with content-type detection in .NET:
Using .NET, how can you find the mime type of a file based on the file signature not the extension?
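One simple option in .NET 4.5+ is System.Web.MimeMapping, sketched below. Note that the linked question is about signature-based detection; this sketch only maps the file extension.

```csharp
using System;

public class MimeDemo
{
    public static void Main()
    {
        // Requires a reference to the System.Web assembly.
        // Maps by file extension only; unknown extensions fall back to
        // "application/octet-stream".
        string contentType = System.Web.MimeMapping.GetMimeMapping("test.txt");
        Console.WriteLine(contentType); // text/plain
    }
}
```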
EDIT 3: If you know the file format you are working on, you can add any arbitrary information you wish to it. You will need to special-case each file format, though. I can't imagine why you want protocol information inside your file, but that's up to you!
EDIT 4: To add text to the beginning of a text file:
static void WriteBeginning(string filename, string insertedtext)
{
    string tempfile = Path.GetTempFileName();

    // using guarantees both streams are closed even if an exception occurs
    using (var writer = new StreamWriter(tempfile))
    using (var reader = new StreamReader(filename))
    {
        writer.WriteLine(insertedtext);
        while (!reader.EndOfStream)
            writer.WriteLine(reader.ReadLine());
    }

    File.Copy(tempfile, filename, true);
    File.Delete(tempfile);
}
(credit)