Get message text from Lync database - c#

I have a connection to the Lync SQL database. The problem is that some messages are stored as HTML and some look like this:
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}}
{\colortbl ;\red0\green0\blue0;}
{\*\generator Riched20 15.0.4420}{\*\mmathPr\mwrapIndent1440 }\viewkind4\uc1
\pard\cf1\embo\f0\fs20 this\embo0 \embo is\embo0 \embo from\embo0 \embo
db\embo0\f1\par
{\*\lyncflags rtf=1}}
It's easy to handle HTML-encoded messages, but how can I get at least the text from the other type? Does the Lync SDK allow this? I couldn't find a way to do it with the Lync SDK.
Even if the Lync SDK can extract the message text, I don't want to install the SDK just for this purpose. I hope there is a better way. Maybe there are free third-party parsers for this?

The text is in RTF format. You can convert the RTF text to plain text using the RichTextBox in the System.Windows.Forms namespace.
First, create a RichTextBox and assign the RTF text to it:
System.Windows.Forms.RichTextBox richTextBox = new System.Windows.Forms.RichTextBox();
richTextBox.Rtf = rtfText;
You can then read the plain text:
string plainText = richTextBox.Text;
For the text in your example, plainText contains: this is from db.

Related

Save text with emoji to file become '?'

What I have to do
I have to create a text file (.txt, .doc, ...) containing the exact text passed to a .NET Web API (so including emojis) and attach it to an email.
Situation:
I have a project with a .NET Web API. One of my routes creates a text file and attaches it to an email, using text sent by a device that may contain emojis.
I can't figure out how to save emojis correctly. If I copy-paste an emoji into a word or notepad file it works, but if I save it through my code it doesn't. I suppose it is due to formatting, but I tried Unicode, UTF-32, UTF-8, ASCII,...
I tried many solutions found here on SO, but none of them worked for me.
For example this emoji (copy-pasted from .net debugger) --> 🎶 is converted into quotation mark or ¶ó, based on encoding used.
How can I save emoji as text into a file so that they can be read by the receivers?
This is what I've done:
//smsText is a string containing emojis
byte[] bytes = Encoding.Unicode.GetBytes(smsText);
Attachment attachment = new Attachment(new MemoryStream(bytes), tokenKey + ".doc");
attachment.ContentType = new ContentType("application/ms-word");
List <Attachment> attachments = new List<Attachment>();
attachments.Add(attachment);
//send email with attachments
Note that smsText, with debugger, contains the 🎶 correctly displayed.
The email correctly reaches the receiver with the .doc attachment, but the attachment doesn't contain the emojis.
Your smsText contains a plaintext string. You can't just write that string into a stream or file that you then call a Word file*.
Word files are binary files with a specific format. You need to use a library that can write this format, or use Interop to interoperate with an existing Word installation.
See for example Free library to MS Word.
And if you're fine with plaintext files, just write the text's bytes to a stream and propagate the appropriate encoding (in this case Unicode, being UTF-16 on .NET).
*: Yes, you can, in the same way that Excel does its best to open an HTML table as a spreadsheet, but you shouldn't.
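As a rough sketch of the plaintext route, the snippet below encodes the text as UTF-8 and prepends a byte order mark so that editors such as Notepad can detect the encoding. UTF-8 is swapped in here for the UTF-16 mentioned above; either works as long as the reader is told which one was used. The class name, method name, and file name are made up for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class EmojiTextFile
{
    // Encodes the text as UTF-8 and prepends the UTF-8 byte order mark
    // (EF BB BF) so that readers can detect the encoding.
    public static byte[] ToUtf8WithBom(string text)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] payload = Encoding.UTF8.GetBytes(text);
        return bom.Concat(payload).ToArray();
    }

    public static void Main()
    {
        // U+1F3B6 is the musical notes emoji from the question.
        string smsText = "Concert tonight \U0001F3B6";
        byte[] bytes = ToUtf8WithBom(smsText);

        // Hypothetical file name; File.ReadAllText detects the BOM on read.
        File.WriteAllBytes("message.txt", bytes);
        string roundTrip = File.ReadAllText("message.txt");
        Console.WriteLine(roundTrip == smsText);
    }
}
```

For the email case, the same bytes could be wrapped in a MemoryStream and passed to new Attachment(stream, "message.txt", "text/plain") instead of being written to disk.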

Converting .MSG files to .TXT; Should I use Microsoft.Interop Outlook?

I'm trying to do a conversion from .msg files into .txt. I have two questions.
1) I've been investigating and found the Microsoft.Interop Outlook package. With it I can extract HTMLBody, To, the sent date, and a few other properties, but it feels like a very manual process because I have to strip out all the HTML markup such as <br>, &nbsp;, a href, etc.
Here is my current code...
MailItem mailItem = outlookApp.Session.OpenSharedItem(item) as MailItem;
TextFile textFile = new TextFile(); //collection of properties I am interested in
textFile.To = mailItem.To;
textFile.Subject = mailItem.Subject;
textFile.Sent = mailItem.SentOn.ToString();
textFile.Name = Path.GetFileNameWithoutExtension(item);
var atttach = mailItem.Attachments; //Really just want the names
textFile.Body = RemoveStuff(mailItem.HTMLBody); //manually removing all html tags
textFiles.Add(textFile);
Marshal.ReleaseComObject(mailItem);
Does anyone know if there is a more effective way to do this in C# or a way using Interop that I am not aware of?
2) If I go the interop route, is there a way to bypass the Outlook popup asking whether to allow access to Outlook? It seems inefficient if my goal is to create a converter.
Any help is greatly appreciated.
Thanks!
Firstly, why are you using HTMLBody property instead of the plain text Body?
Secondly, you can use MailItem.SaveAs(..., olTxt) to save the message as a text file. Or do you mean something else by txt file?
The security prompt is raised by Outlook if your antivirus app is not up to date. If you cannot control the environment where your code runs, Extended MAPI (C++ or Delphi only) or a wrapper like Redemption (any language; I am its author) are pretty much your only options. See http://www.outlookcode.com/article.aspx?id=52 for more details.
In Redemption, you can have something like the following:
using Redemption;
...
RDOSession session = new RDOSession();
RDOMail msg = session.GetMessageFromMsgFile(TheFileName);
msg.SaveAs(TxtFileName, rdoSaveAsType.olTXT);

C# Web scraper copying text

I have a web scraper written in C# for extracting data. I want to copy text from the web browser control and paste it into a Word file programmatically. When I try to extract rich text box content using its ID and InnerText, the text contains encoded characters like %2c.
I need to get the text with all its formatting, but I can't find a way to do it. I have tried Encoding, HttpUtility.UrlDecode, SendKeys and elem.InvokeMember() without success.
How can I programmatically copy and paste text from web browser control preserving formatting?
Here is the sample data to extract:
Description
The Advance Concepts Engineering team designs and develops new vehicles which will meet future regulatory requirements and customer competitive requirements. A qualified candidate will be responsible for the total vehicle packaging. The candidate will identify and resolve adaptation and packaging issues as the vehicle moves toward production. They will lead cross functional team meetings working with Systems & Components, Advance Manufacturing, Service, etc. to ensure that the solutions are optimized for all stages of the vehicle's life.
HtmlElement elem = wb.Document.GetElementById("ctl00_contplhDynamic_txtDescrContentHiddenTextarea");
if (elem == null) return;
elem.InvokeMember("Click");
//elem.InvokeMember("Select All");
//elem.InvokeMember("Copy");
SendKeys.SendWait("^a");
SendKeys.SendWait("^c");
Clipboard.Clear();
elem.Focus();
elem.InvokeMember("Right Click");
elem.InvokeMember("Select All");
elem.InvokeMember("Copy");
Clipboard.SetText(elem.InnerText);
string clipbrdText = Clipboard.GetText();
string data = elem.InnerText;
richTextBox1.Text = data;
string temp = System.Web.HttpUtility.UrlDecode(data);
Encoding iso = Encoding.GetEncoding("windows-1252");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(data);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string msg = iso.GetString(isoBytes);
The text containing "%2c" etc. has been encoded. If you are getting the content of a web page, you are decoding the HTML, not the URL. You can use HttpUtility.HtmlDecode or, if you are using .NET 4.0 or above, WebUtility.HtmlDecode, which is available in the System.Net namespace.
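To see which decoder applies to a given string, a small comparison may help. Note that a sequence such as "%2c" is percent-encoding, which the URL decoder handles, while HTML decoding handles entities such as "&amp;". The sample strings below are invented for illustration, and WebUtility.UrlDecode assumes .NET 4.5 or later.

```csharp
using System;
using System.Net;

public static class DecodeDemo
{
    public static void Main()
    {
        // Illustrative inputs, not taken from the scraped page.
        string percentEncoded = "Service%2c etc.";
        string entityEncoded = "Systems &amp; Components";

        // Percent-encoding is reversed by the URL decoder.
        Console.WriteLine(WebUtility.UrlDecode(percentEncoded));   // Service, etc.

        // HTML entities are reversed by the HTML decoder.
        Console.WriteLine(WebUtility.HtmlDecode(entityEncoded));   // Systems & Components

        // The HTML decoder leaves percent-encoded text untouched.
        Console.WriteLine(WebUtility.HtmlDecode(percentEncoded));  // Service%2c etc.
    }
}
```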
You should note that Word does not use HTML for its formatting, so you won't be able to paste HTML tags and expect it to recognise them. i.e. <strong>Description</strong> will not result in bold text if you type that into Word.
EDIT:
It looks like you are mixing two different ways to copy the text in the code you pasted - both SendKeys.SendWait("^c"); and elem.InvokeMember("Copy");. I presume both of these methods work?
I think the problem you are having lies in the way you are getting the text. I see you're using Clipboard.GetText() to get the text. Try specifying that it is formatted text using Clipboard.GetText(TextDataFormat.Rtf) or Clipboard.GetText(TextDataFormat.Html). This should hopefully copy the string preserving the formatting.

Is Lotus notes email client unable to render <br > tag?

I have a weird problem with Lotus Notes 8.5. In my project I send a meeting invitation to the user, and for that I generate an .ics file. Here is how I generate it:
var body = "Dear Raj, \n\n How are you? line break is not working \n\n how?";
using (TextWriter writer = File.CreateText("../test.ics"))
{
writer.WriteLine("BEGIN:VCALENDAR");
writer.WriteLine("PRODID:-//Microsoft Corporation//Outlook 11.0 MIMEDIR//EN");
writer.WriteLine("VERSION:2.0");
writer.WriteLine("METHOD:REQUEST");
writer.WriteLine("BEGIN:VEVENT");
writer.WriteLine("ATTENDEE;ROLE=REQ-PARTICIPANT;RSVP=TRUE:MAILTO:participant@company.com");
writer.WriteLine("ORGANIZER;CN=\"Organizer\":MAILTO:organizer@test.ccc");
writer.WriteLine("DTSTART:20141231T010000Z");
writer.WriteLine("DTEND:20141231T010000Z");
writer.WriteLine("TRANSP:OPAQUE");
writer.WriteLine("SEQUENCE:0");
writer.WriteLine("UID:Company-interview-123");
writer.WriteLine("DTSTAMP:20141223T232322Z");
writer.WriteLine("SUMMARY:Interview Scheduled for Job");
writer.WriteLine("DESCRIPTION:{0}", body.Replace("\n","<br />"));
//Adding below property actually fixed the issue.
writer.WriteLine("X-ALT-DESC;FMTTYPE=text/html:{0}", body.Replace("\n","<br />"));
writer.WriteLine("LOCATION:Test Location");
writer.WriteLine("PRIORITY:5");
writer.WriteLine("X-MICROSOFT-CDO-IMPORTANCE:1");
writer.WriteLine("CLASS:PUBLIC");
writer.WriteLine("BEGIN:VALARM");
writer.WriteLine("TRIGGER:-PT15M");
writer.WriteLine("ACTION:DISPLAY");
writer.WriteLine("DESCRIPTION:Reminder");
writer.WriteLine("END:VALARM");
writer.WriteLine("END:VEVENT");
writer.WriteLine("END:VCALENDAR");
}
But the Lotus email client displays the content literally.
It shows:
Dear Raj, <br><br> How are you? line break is not working <br><br> how?
On all other email clients, my content is displaying as
Dear Raj,
How are you? line break is not working
how?
Am I missing something here?
Updated my .ics generation code to add X-ALT-DESC;FMTTYPE=text/html: to fix the issue
I just checked with a vcard that contains your text in Lotus Notes 8.5 and IBM Notes 9, and it worked exactly as expected. BUT: it worked with your "original" text, without the replace. RFC 2445 states that line breaks have to be encoded as \n:
An intentional formatted text line break MUST only be included in a
"TEXT" property value by representing the line break with the
character sequence of BACKSLASH (US-ASCII decimal 92), followed by a
LATIN SMALL LETTER N (US-ASCII decimal 110) or a LATIN CAPITAL LETTER
N (US-ASCII decimal 78), that is "\n" or "\N".
That means: use
writer.WriteLine("DESCRIPTION:{0}", body);
instead of
writer.WriteLine("DESCRIPTION:{0}", body.Replace("\n","<br>"));
And your problem should be solved.
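One caveat, assuming body is built from a C# string literal as in the question: "\n" in C# is an actual newline character, so writing body unchanged puts raw line breaks into the file rather than the backslash-plus-n sequence the RFC text describes. A minimal sketch of escaping a TEXT value (backslash, semicolon, comma and newline, following the TEXT rules in RFC 5545 section 3.3.11) could look like this. The class and method names are hypothetical, and folding of long lines is not handled.

```csharp
using System;

public static class IcsText
{
    // Escapes a string for use as an iCalendar TEXT value.
    // Backslashes are escaped first so that the backslashes added
    // by the later replacements are not doubled.
    public static string Escape(string value)
    {
        return value
            .Replace("\\", "\\\\")
            .Replace(";", "\\;")
            .Replace(",", "\\,")
            .Replace("\r\n", "\\n")
            .Replace("\n", "\\n");
    }

    public static void Main()
    {
        string body = "Dear Raj, \n\n How are you?";
        // Writes a single line with literal \, and \n sequences.
        Console.WriteLine("DESCRIPTION:" + Escape(body));
    }
}
```

In the question's code this would replace the body.Replace(...) call inside the DESCRIPTION line.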
The DESCRIPTION property is not meant to contain any rich text/html content but only plain text.
Lotus Notes may use some other property (an X- property) to convey a rich text description. Or it may use an ALTREP parameter on the DESCRIPTION that points to another MIME body part in the invitation. See https://www.rfc-editor.org/rfc/rfc5545#section-3.2.1
So what you probably want to do is to send an invitation containing rich text from Lotus Notes to some external account, and then see what the MIME message that you receive looks like.

HTML String Encoding

I have a requirement to export a PPT from C# without using the interop DLLs. I am able to do that, but when I append an HTML string such as "<b>Krishna</b><br/><strong>Ram</strong>" to a slide, it shows the raw markup instead of the rendered text. Can anyone help me?
It appears that PPT does not currently support HTML rendering directly in PPT. You must either export your slide show as HTML, or use the built in formatting as shown in answer to the following question: Apply Font Formatting to PowerPoint Text Programatically.
Set tr = ActiveWindow.Selection.SlideRange.Shapes(1).TextFrame.TextRange
With tr
    .Text = "Hi There Buddy!"
    .Words(1).Font.Bold = msoTrue
End With
For an idea of the settings in C# and Office 2010 specifically see Font Members.
You should be able to test my assertion yourself, by HTML encoding your text using the HttpServerUtility.HtmlEncode Method:
String TestString = "This is a <Test String>.";
String EncodedString = Server.HtmlEncode(TestString);
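Server.HtmlEncode needs an ASP.NET context. As a sketch that runs in a plain console program, WebUtility.HtmlEncode from System.Net (assuming .NET 4.0 or later) produces the same kind of encoded string:

```csharp
using System;
using System.Net;

public static class EncodeDemo
{
    public static void Main()
    {
        string testString = "This is a <Test String>.";
        // Angle brackets become entities, so the text shows up literally.
        string encoded = WebUtility.HtmlEncode(testString);
        Console.WriteLine(encoded);  // This is a &lt;Test String&gt;.
    }
}
```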
