Reading email from a mail file - c#

I have a huge number of mail archives that I want to de-duplicate and sort out. The archives are either in mbox format or contain a single mail message. To add a bit of complication, some of the files have Windows EOL sequences, and some have Unix EOL.
Using C#, how do I read an archive and split it into its individual messages, or read a single message file? In Python, I would use the mailbox.mbox class, but I cannot see the matching functionality in the C# documentation.

It is unlikely that you will find a library to read that file for C# - there aren't that many Unix users who also use C#.
What I would do is either:
- Read the Python code and port it to C#, or
- Find a description of the mbox format online. As it is a Unix format, chances are that it is just a plain text file, which should be easy enough to parse.

Most standard Unix mail files delimit entries with a line starting with "From " (including the trailing space).
So if you read in the mail file as a text file and switch to a new mail entry every time you see the string "From " at the start of a line, it should work.
- Any body lines that start with "From " should already have been escaped by the email program (typically as ">From ").
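
The splitting rule described above can be sketched in a few lines. This is a minimal Python illustration that should port to C# almost line for line, not a full mbox parser. The function name `split_mbox` is made up for this example. It works on bytes so that Windows and Unix line endings are both preserved. It assumes the mail program has already escaped body lines that begin with "From ".

```python
def split_mbox(raw: bytes) -> list[bytes]:
    """Split raw mbox bytes into messages at lines starting with b"From "."""
    messages = []
    current = []
    # keepends=True keeps each line's original \n or \r\n terminator.
    for line in raw.splitlines(keepends=True):
        # A "From " line starts a new message, unless nothing has been read yet.
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages
```

A file that holds a single message with no "From " separator comes back as a one-element list, so the same function covers both kinds of file in the question. Each returned message still includes its "From " separator line.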

If it is a one-time activity, I think these are the easiest steps to sort the messages:
1. join all the mbox files into one
2. load the combined file into Thunderbird as a local folder
3. run one of the duplicate message finder add-ons on the folder
4. delete the duplicates found
5. compact the folder
6. take the dup-free message list :)
Duplicate Eliminators (Add-Ons for Thunderbird)
I've used this: Remove Duplicate Messages (Alternate)
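
If a scripted route is preferred over the manual steps, the same idea can be sketched with Python's standard mailbox module. This sketch assumes that two messages with the same Message-ID header are duplicates. That is an assumption of this example, not something the add-ons above are known to do. The function name `dedupe_mbox` is hypothetical.

```python
import mailbox


def dedupe_mbox(src_path: str, dst_path: str) -> int:
    """Copy messages from src_path to dst_path, skipping repeated Message-IDs.

    Returns the number of messages written.
    """
    seen = set()
    written = 0
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        for msg in src:
            key = msg.get("Message-ID")
            if key is not None:
                key = str(key).strip()
                if key in seen:
                    continue  # assumed duplicate: same Message-ID already copied
                seen.add(key)
            # Messages without a Message-ID are kept, since they cannot be compared.
            dst.add(msg)
            written += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return written
```

Comparing on Message-ID alone is cheap but coarse. A stricter variant could also compare a hash of the body.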

Related

How to split the chain (Reply & Forward) of email from MSG or EML file using the C# application?

I have an application. It first reads thousands of MSG and EML files and then converts them into Word documents. Some of the individual emails contain a chain of emails (replies and forwards). We want to split every single email into a separate Word document file. When we open an email in Outlook and open the Word document in Word, the look and feel must match exactly in both applications.
We want the best approach to achieve this.
I have tried regular expressions, but the chain is not split properly. The look and feel also does not match exactly.

Manipulate and process email

I am using Thunderbird and I need to somehow manipulate and process the e-mails that are stored in the inbox.msf file using C#. In C# I am using the SS2.Pop3 library. So, I was wondering, how do you parse an .msf file?
.msf files are mail summary files and do not contain emails (admittedly, it's been like 10 years since I looked). Thunderbird uses standard UNIX mbox formatted files to store mail.
There is only 1 parser for those files available in C# and that is MimeKit (which also happens to be my library).
I'm pretty sure that if you have an Inbox.msf file, right next to it should be a file called Inbox which has the actual mail in it. That's the file you want to parse.
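
As a rough illustration of "parse the file next to the .msf", here is a Python sketch using the standard mailbox module. The answer's own suggestion for C# is MimeKit, and this is not MimeKit code. The helper names `mbox_for_summary` and `list_subjects` are invented for the example. It assumes the mail file has the same name as the summary file minus the .msf extension.

```python
import mailbox
from pathlib import Path


def mbox_for_summary(msf_path: str) -> Path:
    """Return the path of the mail file assumed to sit next to a .msf file."""
    p = Path(msf_path)
    if p.suffix.lower() != ".msf":
        raise ValueError(f"not a .msf file: {msf_path}")
    return p.with_suffix("")  # "Inbox.msf" -> "Inbox"


def list_subjects(msf_path: str) -> list[str]:
    """Open the mbox file behind a summary file and collect Subject headers."""
    box = mailbox.mbox(str(mbox_for_summary(msf_path)), create=False)
    try:
        return [str(msg.get("Subject", "")) for msg in box]
    finally:
        box.close()
```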

C# Windows Application input text file format

I need help developing a Windows application using C#.NET in VS2010. The functionality is very simple: the user will input a text file, and my program is supposed to extract the relevant data from the text file and output it to CSV, text or whatever.
My biggest problem whenever I deal with text files is the format. When you open the input text file in Notepad or WordPad it looks perfect, layout and all. But once I start programming, I realize that what I am seeing is not the way the data is stored inside the file. I have read many articles on Unicode, UTF, etc., but I don't have a definite way to know exactly what my file format is. So the end result is that I end up getting many exceptions.
In Unix shell scripting it used to be simple. There are some good Unix commands like less, which is similar to more but also displays any formatting characters inside the file. There are also some useful commands like unix2dos and dos2unix.
Nevertheless, is there some program, code or professional method which can find the exact file formatting of my input file and then reformat it to "plain text", so that the data extraction becomes easy and bug-free?
Thanks
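
One common way to approach the problem described above is to look for a byte-order mark (BOM) at the start of the file and then normalize the line endings, much as dos2unix would. The Python sketch below shows the idea only. The function name `to_plain_text` and the UTF-8 fallback are assumptions of this example. A file without a BOM in some other legacy encoding would still need to be identified by other means.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def to_plain_text(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes using the BOM if present, then normalize EOLs to \\n."""
    encoding = fallback  # assumption: files without a BOM are UTF-8
    for bom, name in _BOMS:
        if raw.startswith(bom):
            encoding = name  # these codecs consume the BOM while decoding
            break
    text = raw.decode(encoding, errors="replace")
    # Convert Windows (\r\n) and old Mac (\r) line endings to Unix (\n).
    return text.replace("\r\n", "\n").replace("\r", "\n")
```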

How to parse Amadeus air ticket file

Amadeus produces an AIR file like the one below for every flight reservation.
I need to read the reservation number and the source and destination airports from this file.
I searched Google for "Amadeus air format" but haven't found a format description.
The Wikipedia entry about EDIFACT is a bit different; it does not describe this content.
Where can I find information about the file structure?
How do I parse this file?
I have no idea about the file structure: does it contain records like an SQL table, or is it a set of reservation protocol instructions like a PostScript file?
The application should work in Microsoft Windows, preferably in Visual FoxPro or C#.
FoxPro or Microsoft Visual Studio 2012 Express can be used as the programming environment.
Google returns only Amadeus user guides and tutorials, like the one in the comment and those at
http://www.amadeusschweiz.com/en/documentation/usermanuals.html
Those are user manuals. The most promising looks to be the Amadeus Air user guide from this:
The file which I received was named air.txt and the first token in the file is AIR-BLK206.
Maybe BLK206 is some booking format descriptor. Google returns some
documents like mine using this, so it looks like it is commonly used.
This file probably describes how to reserve a ticket, which produces the air.txt file.
I searched this and the ticket user guide for BLK, but those do not contain this abbreviation.
Commands in the user manual look different from those in this file.
How can I use this information to extract the reservation number and destination airport
from this file?
I haven't found a format description using Google. There are Amadeus user guides, tutorials and quick reference files similar to the ones you posted, but I don't understand how to use them to parse this file.
One message says that this is a form of EDIFACT. However, the EDIFACT message
sample in Wikipedia is also different.
I need to create a quick prototype for a customer which shows that we can read those files.
Maybe there are some programs which can be used to display it in human-readable form?
You should consider contacting your local Amadeus Support for help regarding this. They offer excellent documentation around pretty much everything you need to know.
I'm 100% sure that what you're looking at is not actually EDIFACT. EDIFACT is very much delimited with pluses (+) and colons (:).
The example actually looks a lot more like a screen capture from the Amadeus Selling Platform with spaces replaced by semicolons (;). It's most likely a file in the Amadeus Interface Record format.
Also, parsing this file requires you to know a great deal about how the Amadeus GDS works. And that's not very easy. A flight booking might seem like a trivial thing, but it's a very complex world made up of strange ways to handle things.
Here is the product documentation (not including specifications): http://www.amadeus.com/travelagencies/x52025.html
Consider looking for your local sales office at http://www.amadeus.com/
Contact Amadeus. You'll need to sign a non-disclosure agreement, and they'll send you the complete documentation. Note that AIR files contain more than just flight tickets. They can contain exchange tickets, refunds, TASFs, EMDs, MCOs, and hotel/(localised) train/car/boat bookings.
And note that the format of the AIR files can be tweaked using Amadeus ProPrinter.
From previous attempts to hack my way through "alien" file formats, my first piece of advice would be to get as many files as you can for flights you already know the details of. This will let you see any commonalities between the files to give you an indication of recurring patterns.
At first glance the ";" would seem to be a separator / delimiter for information - having never seen this format before it could be that the format is data-only with the reading application pulling out elements based on tacit knowledge of the file structure.
Match known information
Build patterns
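
The "match known information" step can be sketched as a small search over delimited fields. The Python below assumes ";" is the field separator, as guessed above. The function name `locate_value` is hypothetical, and the sample in the usage note is made up for illustration. It is not real AIR content.

```python
def locate_value(text: str, known: str, sep: str = ";") -> list[tuple[int, int]]:
    """Return (line_number, field_index) pairs for fields equal to `known`.

    Line numbers are 1-based, field indexes are 0-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for idx, field in enumerate(line.split(sep)):
            if field.strip() == known:
                hits.append((line_no, idx))
    return hits
```

Running this over several files with a value you already know (for instance a reservation number taken from the booking itself) shows whether it always lands on the same line and field position. With an invented input such as "HDR;ABC123;X\nSEG;LHR;JFK", `locate_value(sample, "ABC123")` reports line 1, field 1.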
Parsing EDIFACT/Amadeus files is much more complicated than getting the manual from Amadeus. It takes inventing tree algorithms, managing large data files, timing procedures, etc. If you do not have the time and drive to reinvent the wheel all over again, you'd better look for an existing solution. This is not a direct answer to the question you posed, but it is the outcome of some experience.
AIR files are data files and can be read using Notepad, WordPad or any similar program.
They include data for air, hotel, train, car, etc., and each has a specific format in which the data is structured in the document.
If you contact your local Amadeus office, you can get the document which explains the data structure of each of these files (a 273-page document). The delimiter/separator in this file is “,”.
The text AMD BLK 206 refers to the Amadeus bulk file, and 206 is the file structure, which should become clear once you go through the document.
Please go through this EDIFACT tutorial or basic guide, which helps you understand the file structure, and then you can easily parse it the way you like.
Moreover, there are tools available to parse EDIFACT files, like Notepad++ or EDInotepad.

How to convert a Word file content (with images and tables) into RFC 822 format using c#?

I want to read a Word file (it contains images and tables).
After that I want to convert its content into "RFC 822 format".
I am looking for some APIs and sample code for the above.
Start with the Open XML SDK 2.0 for Microsoft Office to read the Word file.
RFC 822 is not the specification you're looking for.
The abstract for RFC 5322 reveals one reason:
This document specifies the Internet Message Format (IMF), a syntax
for text messages that are sent between computer users, within the
framework of "electronic mail" messages. This specification is a
revision of Request For Comments (RFC) 2822, which itself superseded
Request For Comments (RFC) 822, "Standard for the Format of ARPA
Internet Text Messages", updating it to reflect current practice and
incorporating incremental changes that were specified in other RFCs.
This paragraph from section 1.1 ("Scope") reveals another:
This document specifies a syntax only for text messages. In
particular, it makes no provision for the transmission of images,
audio, or other sorts of structured data in electronic mail messages.
There are several extensions published, such as the MIME document
series ([RFC2045], [RFC2046], [RFC2049]), which describe mechanisms
for the transmission of such data through electronic mail, either by
extending the syntax provided here or by structuring such messages to
conform to this syntax. Those mechanisms are outside of the scope of
this specification.
I think that you actually want the answer to a different question, like:
How can I convert a Word message to a MIME-based format that will look nice in a variety of mail user-agents?
If I'm wrong, and you really are asking how to convert Word documents to plain text emails, you should probably take a look at antiword, a highly featureful tool for converting Word documents to plain text, PostScript, PDF, or docbook. (There's no indication that it can do anything with images when outputting plain text, though — yet, anyway.)
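
To make the MIME suggestion concrete, here is a minimal Python sketch using the standard email package. It assumes the Word content has already been converted to HTML and the image extracted as PNG bytes by some other tool. That conversion step is not shown. The function name `build_html_mail` and the content ID `img1` are made up for the example.

```python
from email.message import EmailMessage


def build_html_mail(subject: str, plain: str, html: str,
                    image_bytes: bytes, image_cid: str = "img1") -> EmailMessage:
    """Build a multipart/alternative message: plain text plus HTML with one inline image."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(plain)                     # text/plain fallback
    msg.add_alternative(html, subtype="html")  # preferred HTML rendering
    html_part = msg.get_payload()[1]
    # Attach the image to the HTML part; the HTML refers to it as src="cid:img1".
    html_part.add_related(image_bytes, maintype="image", subtype="png",
                          cid=f"<{image_cid}>")
    return msg
```

Tables need no special handling at the MIME level because they travel inside the HTML part. Only binary resources such as images become separate related parts.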
