Write Regex Expression To Include Name, Email and Phone No [duplicate] - c#

I am redacting text from PDFs. I need a regex in C# that keeps only my name, my phone number and my email address, and matches every other piece of text so that it gets redacted. For example:
Actual text: Hi, My name is Jordan Bush. My email is Jordan.Bush@abc.com. My phone no is +44 (0) 7595446609.
Text I need: Jordan Bush Jordan.Bush@abc.com +44 (0) 7595446609
An image of what I need is also attached.
Redacted Text
Note: my name, my phone number and my email address can occur more than once in a text.
I tried "\b(?!(?:Jordan|Bush|@|abc|com|44 (0) 7595446609)|or)\b)\w+" but did not get the required text. It does not redact "@abc.com" when it follows another name, as in "Joseph.Buttler@abc.com": "@abc.com" stays visible, although it should be converted into █. It also does not convert the phone number 44 (0) 7595446609 into █. A screenshot is attached.
Here is the input: Input Text
Here is the output: Text I get after adding multiple names, multiple phone numbers and multiple email addresses
Here is the code:
using iText.Kernel.Colors;
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;
using iText.Layout.Properties;
using iText.PdfCleanup;
using iText.PdfCleanup.Autosweep;
using System.Text.RegularExpressions;
namespace GeneratePdfDemo
{
public class Program
{
static void Main(string[] args)
{
PdfWriter writer = new PdfWriter("D:\\demo.pdf");
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
Paragraph header = new Paragraph("HEADER")
.SetTextAlignment(TextAlignment.CENTER)
.SetFontSize(20);
Paragraph body = new Paragraph("Hi, My name is Jordan Bush. My email is Jordan.Bush@abc.com. My phone no is +44 (0) 7595446609." +
"Hi, My name is Jordan Bush. My email is Jordan.Bush@abc.com. My phone no is +44 (0) 7595446609."
)
.SetTextAlignment(TextAlignment.CENTER)
.SetFontSize(20);
Paragraph body1 = new Paragraph("Hi, My name is Jordan Bush. My email is Jordan.Bush@abc.com. My phone no is +44 (0) 7595446609." +
"Hi, My name is Joesph Buttler. My email is Joesph.Buttler@abc.com. My phone no is +012346578." +
"Hi, My name is Joesph Buttler. My email is Joesph Buttler@abc.com. My phone no is +012346578."
)
.SetTextAlignment(TextAlignment.CENTER)
.SetFontSize(20);
document.Add(header);
document.Add(body);
document.Add(body1);
document.Close();
/*PdfDocument*/ pdf = new PdfDocument(new PdfReader("D:\\demo.pdf"), new PdfWriter("D:\\demo_1.pdf"));
CompositeCleanupStrategy cleanupStrategy = new CompositeCleanupStrategy();
cleanupStrategy.Add(new RegexBasedCleanupStrategy(new Regex(@"\b(?!(?:Jordan|Bush|Jordan\.Bush@abc\.com|44 (0) 7595446609)\b)\w+", RegexOptions.IgnoreCase)).SetRedactionColor(ColorConstants.BLACK));
PdfCleaner.AutoSweepCleanUp(pdf, cleanupStrategy);
PdfCleaner.CleanUpRedactAnnotations(pdf);
pdf.Close();
}
}
}
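Two things in the pattern above are worth checking. The unescaped `(0)` is a capturing group that matches the character `0`, not the literal text "(0)", and `\w+` can never consume `@`, `.`, `+` or brackets, so those characters are left unredacted. The following is a minimal sketch of one possible alternative, not a verified fix for iText: it works on whitespace-delimited tokens and escapes the kept phrases with `Regex.Escape`. The class `RedactSketch` and the helper `BuildPattern` are hypothetical names, the demo uses plain `Regex.Replace` instead of `RegexBasedCleanupStrategy`, and how iText feeds page text to the regex is not checked here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RedactSketch
{
    // Hypothetical helper: builds a pattern that matches every whitespace-delimited
    // token EXCEPT the tokens that make up the phrases to keep.
    public static string BuildPattern(params string[] keepPhrases)
    {
        // Split phrases such as "+44 (0) 7595446609" into tokens and escape each one,
        // so that "(", ")", "+" and "." are treated as literal characters.
        var keepTokens = keepPhrases
            .SelectMany(p => p.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .Distinct()
            .Select(Regex.Escape);

        string keep = string.Join("|", keepTokens);

        // (?<!\S) : we are at the start of a token
        // (?!...) : the token is not a kept token (optionally followed by punctuation)
        // \S+     : consume the whole token, including '@', '.', '+' and brackets
        return @"(?<!\S)(?!(?:" + keep + @")[.,;:!?]*(?!\S))\S+";
    }

    public static void Main()
    {
        string pattern = BuildPattern("Jordan Bush", "Jordan.Bush@abc.com", "+44 (0) 7595446609");
        string text = "Hi, My name is Jordan Bush. My email is Jordan.Bush@abc.com. " +
                      "My phone no is +44 (0) 7595446609.";

        // Every matched token is replaced by a block character.
        Console.WriteLine(Regex.Replace(text, pattern, "█", RegexOptions.IgnoreCase));
    }
}
```

Because tokens are compared one by one, "Joseph.Buttler@abc.com" is matched (and would be redacted) as a whole. The trade-off is that a kept token such as "Jordan" also survives when it appears next to a different surname.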

Related

Embed byte array data as image to email

I tried to embed the image as a base64 data URI, but it does not work: the image is not displayed in the email.
string imreBase64DataHeader = LogoFromByteArray(_org.O_Logo);
image = "<img src='data:image/png;base64," + imreBase64DataHeader + "' alt='img' />";
body += string.Format("<div>{0}</div> <br/><br/>", image);
Note: I have a pdf attachment in the email already.
The logo is saved in the db as byte array and I need to add this in the email signature.
Thank you in advance for the help.
Base64 images are not supported in Outlook for desktop. Instead, you need to save the image on the disk and then attach it to the email. Then in the message body you can refer to such attachments by using the cid: prefix:
.Attachments.Add "C:\Users\JoeSchmo\Pictures\ImageName.jpg", olByValue, 0
.HTMLBody = "<BODY><IMG src=""cid:ImageName.jpg"" width=200> </BODY>"
You may also consider setting the following properties on the attachment:
Const PR_ATTACH_MIME_TAG = "http://schemas.microsoft.com/mapi/proptag/0x370E001E"
Const PR_ATTACH_CONTENT_ID = "http://schemas.microsoft.com/mapi/proptag/0x3712001E"
Const PR_ATTACHMENT_HIDDEN = "http://schemas.microsoft.com/mapi/proptag/0x7FFE000B"
Set oPA = Attachment.PropertyAccessor
oPA.SetProperty PR_ATTACH_MIME_TAG, "image/jpeg"
oPA.SetProperty PR_ATTACH_CONTENT_ID, "cidName"
oPA.SetProperty PR_ATTACHMENT_HIDDEN, True
Read more about this in the Embed Images in New Messages using a Macro article.
You have to first convert byte array to base64encoded string.
Try this code.
string imageTag = "<img id ='Icon' src='data:image/*;base64,"+(Convert.ToBase64String(imageByte))+"' >";
Also, if you are using Outlook as your email client, you might need to download the images in the email preview.
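The `cid:` approach from the first answer can also be sketched in C#. This assumes the mail is sent with `System.Net.Mail`, which the question does not state; `InlineLogoSketch`, `BuildMessage` and the content id `companyLogo` are hypothetical names, and the image is assumed to be a PNG as in the question's data URI.

```csharp
using System.IO;
using System.Net.Mail;
using System.Net.Mime;
using System.Text;

public static class InlineLogoSketch
{
    // Hypothetical helper: embeds the logo bytes as an inline resource that the
    // HTML body references through its Content-ID.
    public static MailMessage BuildMessage(string from, string to, byte[] logoBytes)
    {
        var message = new MailMessage(from, to);

        // The HTML refers to the image by "cid:" instead of a data: URI.
        string html = "<div><img src=\"cid:companyLogo\" alt=\"img\" /></div> <br/><br/>";
        AlternateView view = AlternateView.CreateAlternateViewFromString(
            html, Encoding.UTF8, MediaTypeNames.Text.Html);

        // The byte array from the database is wrapped in a stream; no file on disk is needed.
        var logo = new LinkedResource(new MemoryStream(logoBytes), "image/png")
        {
            ContentId = "companyLogo",
            TransferEncoding = TransferEncoding.Base64
        };

        view.LinkedResources.Add(logo);
        message.AlternateViews.Add(view);
        return message;
    }
}
```

Other attachments, such as the PDF mentioned in the question, can still be added to `message.Attachments` as before.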

c# - Unable to read the circled text from an image using tessnet2 and Tesseract-OCR

I'm trying to write code that reads text from a JPG image and prints it to the console. I am using tessnet2 and Tesseract-OCR (in C#). Everything works fine with plain text, but there is a problem when the text is inside a circle, like here:
Currently the console returns this:
118 : Text
61 : 1
219 : #
Sometimes it changes # to ~ (depending on the size of the circle).
Here is my code:
var image = new Bitmap(@"D:\OCR\texttoread.bmp");
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
ocr.Init(@"C:\tessdata", "eng", false);
List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
foreach (tessnet2.Word word in result)
{
Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
}
Console.Read();
Can someone tell me what I should do to read this text?
Try the IronOCR library with the code below. I think it is more accurate; I hope this helps.
var Ocr = new AutoOcr();
var Result = Ocr.Read(bmpCrop);
string text = Result.Text;
return text;

Sending Swedish and Chinese signs to Docx using OpenXML and RTF

Goal
Passing Swedish and Chinese signs to a DocX-file in a RTF format.[2]
Description
I need to dynamically generate a RTF-formatted string containing Swedish and Chinese signs and send it to an existing Docx-file. I have managed to handle the Swedish diaereses (åäö) but I can't manage to get the Chinese signs to be shown properly, instead they are shown as ????
private void buttonSendDiaeresesToDocx_Click(object sender, EventArgs e)
{
var desktop = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
var filename = @"SpecialCharactersInDocx.docx";
var filepath = Path.Combine(desktop, filename);
//Dynamic content fetched from the database.
var content = "This should be Swedish and Chinese signs -> åäö - 部件名称";
var rtfEncodedString = new StringBuilder();
rtfEncodedString.Append(@"{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard ");
rtfEncodedString.Append(content);
rtfEncodedString.Append(@"\par}");
removeExistingFile(filepath);
createEmptyDocx(filepath);
addRtfToWordDocument(filepath, rtfEncodedString.ToString());
openDocx(filepath);
}
private void addRtfToWordDocument(string filepath, string rtfEncodedString)
{
//Implemented as suggested at
//http://stackoverflow.com/a/14861397/1997617
using (WordprocessingDocument doc = WordprocessingDocument.Open(filepath, true))
{
string altChunkId = "AltChunkId1";
MainDocumentPart mainDocPart = doc.MainDocumentPart;
AlternativeFormatImportPart chunk = mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Rtf, altChunkId);
using (MemoryStream ms = new MemoryStream(Encoding.Default.GetBytes(rtfEncodedString)))
{
chunk.FeedData(ms);
}
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainDocPart.Document.Body.ReplaceChild(
altChunk, mainDocPart.Document.Body.Elements<Paragraph>().Last());
mainDocPart.Document.Save();
}
}
I have tried different encodings for the memory stream (Default, ASCII, UTF8, GB18030, ...) but none seems to work. I have also tried converting the encoding of the rtfEncodedString variable before passing it to the addRtfToWordDocument method.
How do I make both the Swedish and the Chinese signs show properly in the document?
Notes and references
The above code snippet is the part of my solution that I think is relevant for the question. The entire code sample can be downloaded at http://www.bjornlarsson.se/externals/SpecialCharactersInDocx02.zip
The RTF format is needed in the real world application since the content is to be shown as a table (with bold text) in the document.
You could use WordPad to create the RTF string for you. Open WordPad, paste your content and save it to a file. Then open that file in a text editor to read the RTF.
Your RTF string then looks like this:
{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1031{\fonttbl{\f0\fnil Consolas;}{\f1\fnil\fcharset0 Consolas;}{\f2\fnil\fcharset134 SimSun;}{\f3\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 10.0.10586}\viewkind4\uc1
\pard\sa200\sl276\slmult1\f0\fs19\lang7 This should be Swedish and Chinese signs -> \f1\'e5\'e4\'f6 - \f2\'b2\'bf\'bc\'fe\'c3\'fb\'b3\'c6\f3\fs22\par
}
Maybe it helps. I tested the RTF string with your code and it works!
To generate the RTF string dynamically via a RichTextBox:
private void buttonSendDiaeresesToDocx_Click(object sender, EventArgs e)
{
var desktop = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
var filename = @"SpecialCharactersInDocx.docx";
var filepath = Path.Combine(desktop, filename);
removeExistingFile(filepath);
createEmptyDocx(filepath);
var rtfEncodedString = new StringBuilder();
string contentOriginal = "This should be Swedish and Chinese signs -> åäö - 部件名称";
string rtfStart =
"{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1031{\\fonttbl{\\f0\\fnil\\fcharset0 Microsoft Sans Serif;}{\\f1\\fmodern\\fprq6\\fcharset134 SimSun;}}\r\n\\viewkind4\\uc1\\pard\\f0\\fs17 ";
RichTextBox rtfBox = new RichTextBox {Text = contentOriginal};
string content = rtfBox.Rtf;
content = content.Replace(rtfStart, "");
rtfEncodedString.Append(rtfStart);
rtfEncodedString.Append(content);
rtfEncodedString.Append(@"\par}");
addRtfToWordDocument(filepath, rtfEncodedString.ToString());
openDocx(filepath);
}
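Another option that avoids both WordPad and a RichTextBox is to escape every non-ASCII character with the RTF `\uN?` notation, where N is the UTF-16 code unit written as a signed 16-bit decimal and `?` is the fallback character. The sketch below is a technique swapped in for illustration, not the method used in the answer above; `RtfEscapeSketch` and `EscapeRtf` are hypothetical names, and it has not been tested against the altChunk import in the question.

```csharp
using System.Globalization;
using System.Text;

public static class RtfEscapeSketch
{
    // Hypothetical helper: returns an ASCII-only string that can be placed
    // inside an RTF document body.
    public static string EscapeRtf(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // RTF control characters must be escaped with a backslash.
                sb.Append('\\').Append(c);
            }
            else if (c < 0x80)
            {
                sb.Append(c);
            }
            else
            {
                // \uN expects a signed 16-bit value; the invariant culture keeps
                // the minus sign as a plain ASCII '-'.
                short code = unchecked((short)c);
                sb.Append("\\u").Append(code.ToString(CultureInfo.InvariantCulture)).Append('?');
            }
        }
        return sb.ToString();
    }
}
```

With this helper, `rtfEncodedString.Append(EscapeRtf(content))` would replace the direct append of `content`. Since the result contains only ASCII characters, the encoding chosen for the memory stream should no longer affect the Swedish or Chinese text.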

remove margin from email attached word document

I have an HTML page that is attached to the email as a Word document.
string body = String.Empty;
body = new StreamReader("execlude.html").ReadToEnd();
byte[] data = Encoding.ASCII.GetBytes(body);
MemoryStream ms = new MemoryStream(data);
var message = new System.Net.Mail.MailMessage(email.From, email.To);
message.Attachments.Add(new Attachment(ms, "excluded.doc", "application/msword"));
The attachment opens in Word format, but the margins are too big. Please let me know how to remove the margins.
Thanks in advance.
Try these links:
Changing the margins of a Word Document
Setting word document table cell margin programmatically using c#
https://social.msdn.microsoft.com/Forums/sharepoint/en-US/30377e45-6473-4385-a83d-664ad5cc7dea/how-to-set-margin-of-word-document-using-c?forum=worddev

Microsoft.Office.Interop.Word cannot return complete email address

I am using Microsoft.Office.Interop.Word to read the text in a Word document and return every word in the file. The document contains an email address, and unfortunately I do not get the complete email address back. For example: the email address is abc@xyz.com but I get (1)abc (2)@ (3)xyz (4). (5)com.
How to get the complete email address by using Microsoft.Office.Interop.Word? Thanks.
The code:
Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
Document document = application.Documents.Open(txtUploadedPathToken.Text);
// Loop through all words in the document.
int count = document.Words.Count;
foreach (Microsoft.Office.Interop.Word.Range range in document.Words)
{
string text = range.Text;
tableLayoutPanel2.Controls.Add(new Label() { Text = text, Anchor = AnchorStyles.Left, AutoSize = true });
}
I think I have found the solution: use Content.Text and Split, as in the code below.
string doctexts = document.Content.Text;
string[] docwords = doctexts.Split(' ');
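Note that `Split(' ')` only splits on spaces, while the text returned by Word typically separates paragraphs with a carriage return, so words on either side of a paragraph break can end up glued together. A minimal sketch of a whitespace-based split follows; `WordSplitSketch` and `SplitOnWhitespace` are hypothetical names, and the sample string stands in for `document.Content.Text`.

```csharp
using System;

public static class WordSplitSketch
{
    // Hypothetical helper: splits on any whitespace (spaces, tabs, '\r', '\n')
    // and drops empty entries. A null separator array means "all whitespace".
    public static string[] SplitOnWhitespace(string documentText)
    {
        return documentText.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Stand-in for document.Content.Text: two paragraphs separated by '\r'.
        string sample = "Contact: abc@xyz.com\rSecond  paragraph";
        foreach (string word in SplitOnWhitespace(sample))
        {
            Console.WriteLine(word);
        }
    }
}
```

The email address stays in one piece because it contains no whitespace.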
