Prepare text to paste to MS Word with proper alignment


I am working on a C# application that calculates some values, which are placed into a DataGridView. When I select values from a column of the DataGridView and paste them into a column of a table in a Word document, I want the values to be center-aligned. Even if I set all cells of the DataGridView to be center-aligned, the values show up left-aligned when I paste them into the Word table.
The code used to copy the selected cells to the clipboard is:
Clipboard.SetDataObject(this.dataGridView1.GetClipboardContent());
How can I prepare the text so that it appears centered when I paste it? If I copy and paste text from another column in Word, it keeps its original alignment. This suggests that there are some special characters surrounding the cell values, but I don't know how to view those characters (maybe I could add them to each value).

I'm going to cheat and start with an incomplete answer.
This RTF text seems like it should work for a single value:
{\rtf1\ansi\qc
This is some centered text.}
There's a hard newline after \qc (or, I think, just a \n). If I put that RTF into a file and open it in WordPad, it shows up centered.
I've been playing with System.Windows.Forms.Clipboard:
Clipboard.SetData(DataFormats.Rtf, "{\\rtf1\\qc\n{\\b foo}}");
If I run the above, even in a console application, I can then paste into MS Word with Ctrl+V. The bold formatting works, but unfortunately the centering doesn't.
In any case, I then looked at pasting into an MS Word table, and clearly it's not just a matter of text with newlines: some delimiter or other is required to mark the cell boundaries. So not only does my RTF not work, there's likely at least one more step (a wrapper beyond the RTF) needed to get a "column" rather than just a block of text.
Feel free not to vote; I just thought something here might help avoid both of us doing the same thing twice.
EDIT: DataFormats.Html may also work, and it could even be the format your grid control normally uses (though it also supports CSV).
However, HTML on the clipboard needs an extra header that I haven't figured out yet. It is described here: How to set HTML to clipboard in C#?
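Below is a minimal sketch of that header. It assumes the "HTML Format" clipboard layout: a Version line followed by StartHTML, EndHTML, StartFragment and EndFragment byte offsets, counted in UTF-8 from the start of the data. The function name BuildClipboardHtml is hypothetical. Whether Word keeps a text-align:center style on pasted table cells is an untested assumption.

```csharp
using System;
using System.Text;

// Hypothetical helper: wraps an HTML fragment in the "HTML Format" clipboard layout.
// All offsets are byte counts in UTF-8, measured from the start of the whole string.
static string BuildClipboardHtml(string fragment)
{
    // Every number is padded to 10 digits, so the header length does not depend on the values.
    const string headerTemplate =
        "Version:0.9\r\n" +
        "StartHTML:{0:D10}\r\n" +
        "EndHTML:{1:D10}\r\n" +
        "StartFragment:{2:D10}\r\n" +
        "EndFragment:{3:D10}\r\n";
    const string htmlStart = "<html><body>\r\n<!--StartFragment-->";
    const string htmlEnd = "<!--EndFragment-->\r\n</body></html>";

    int headerLength = Encoding.UTF8.GetByteCount(string.Format(headerTemplate, 0, 0, 0, 0));
    int startHtml = headerLength;
    int startFragment = startHtml + Encoding.UTF8.GetByteCount(htmlStart);
    int endFragment = startFragment + Encoding.UTF8.GetByteCount(fragment);
    int endHtml = endFragment + Encoding.UTF8.GetByteCount(htmlEnd);

    return string.Format(headerTemplate, startHtml, endHtml, startFragment, endFragment)
        + htmlStart + fragment + htmlEnd;
}

// A one-column table whose cells ask for centred text (whether Word honours this is untested).
string cells = "<table>"
    + "<tr><td style=\"text-align:center\">1.23</td></tr>"
    + "<tr><td style=\"text-align:center\">4.56</td></tr>"
    + "</table>";
string html = BuildClipboardHtml(cells);
Console.WriteLine(html);
```

In the WinForms application, the resulting string would be passed to Clipboard.SetData(DataFormats.Html, html). That call is not exercised here. If the values contain non-ASCII characters, check how the framework encodes the string on the clipboard, because the offsets assume UTF-8.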

I will post my solution here so it can be used (and improved) by other people. Place a populated DataGridView and a button on a form; the button calls the function CopyColumn().
// Members of a Form that holds dataGridView1 and a button whose Click handler calls CopyColumn().
// Requires: using System.Text; using System.Windows.Forms;
void CopyColumn()
{
    if (this.dataGridView1.GetCellCount(DataGridViewElementStates.Selected) > 0)
    {
        try
        {
            // Let the grid put its default formats on the clipboard, then read back the plain text.
            Clipboard.SetDataObject(this.dataGridView1.GetClipboardContent());
            string sText = Clipboard.GetText();
            string sColumn = FormatColumn(sText);
            // Replace the clipboard contents with a one-column RTF table whose cells are centred.
            Clipboard.SetData(DataFormats.Rtf, sColumn);
        }
        catch (System.Runtime.InteropServices.ExternalException)
        {
            MessageBox.Show("The Clipboard could not be accessed. Please try again.");
        }
    }
}

string FormatColumn(string sValues)
{
    int nlines = NumLines(sValues);
    string[] values = Values(sValues);
    // RTF document header; "{\*\generator ...}" is an ignorable destination that readers may skip.
    string sStart = @"{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang3081{\fonttbl{\f0\froman\fprq2\fcharset0 Times New Roman;}}
{\*\generator Riched20 10.0.14393}\viewkind4\uc1";
    string sEnd = "}";
    // One table row: a bordered, vertically centred cell (\clvertalc) whose right edge is at 1706 twips,
    // holding a paragraph that is inside the table (\intbl) and centred (\qc).
    string sRowStart = @"\trowd\trgaph85\trleft5\trbrdrl\brdrs\brdrw10 \trbrdrt\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trpaddl85\trpaddr85\trpaddfl3\trpaddfr3
\clvertalc\clbrdrl\brdrw10\brdrs\clbrdrt\brdrw10\brdrs\clbrdrr\brdrw10\brdrs\clbrdrb\brdrw10\brdrs \cellx1706
\pard\intbl\widctlpar\qc ";
    string sRowEnd = @"\cell\row";

    var sb = new StringBuilder(sStart);
    for (int i = 0; i < nlines; i++)
    {
        sb.Append(sRowStart).Append(values[i]).Append(sRowEnd);
    }
    sb.Append(sEnd);
    return sb.ToString();
}

int NumLines(string sValue)
{
    string[] values = sValue.Split('\r');
    return values.Length;
}

string[] Values(string sValue)
{
    // One selected cell per line: split on '\r' and drop the '\n' that follows it.
    string[] values = sValue.Split('\r');
    for (int i = 0; i < values.Length; i++)
    {
        values[i] = values[i].Replace("\n", "");
    }
    return values;
}
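FormatColumn inserts each cell value into the RTF verbatim. A value containing \, { or } would therefore break the table markup, and characters outside code page 1252 may not survive. A small escaping step could be applied to each value before it is appended, for example .Append(EscapeRtf(values[i])). This is a sketch: the helper name EscapeRtf is hypothetical. It relies on the \uc1 already present in the header, which tells readers to skip one fallback character after each \uN.

```csharp
using System;
using System.Text;

// Hypothetical helper: escapes one cell value for use inside RTF text.
static string EscapeRtf(string value)
{
    var sb = new StringBuilder(value.Length);
    foreach (char c in value)
    {
        if (c == '\\' || c == '{' || c == '}')
        {
            sb.Append('\\').Append(c);            // RTF special characters get a backslash
        }
        else if (c > 127)
        {
            // \uN takes a signed 16-bit decimal; the "?" is the single fallback
            // character that \uc1 in the document header tells readers to skip.
            sb.Append(@"\u").Append((int)(short)c).Append('?');
        }
        else
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

Console.WriteLine(EscapeRtf(@"5 {approx.}"));
Console.WriteLine(EscapeRtf("1.5 ± 0.1"));
```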

I suspect this is not related to your VBA but is a bug in MS Word (tested with 2010 and 2016).
The following is enough to reproduce it (I'm not sure it's the same bug you're suffering from):
1. Open a new Word doc.
2. Insert a table (let's say 4 columns, 4 rows).
3. Mark all cells and format them as centered.
4. Type some text into a cell, select it and copy it.
Now there are two scenarios:
A: Paste the copied text into a single free cell -> The pasted text is centered, as expected.
B: Mark several adjacent free cells and paste the text -> The pasted text is left-aligned instead of centered!!
If I use LibreOffice (which I recommend to you, too), all this works fine (no surprise).
No, I did not pay for MS Office, but I have to use it at work :/

Related

Why does the Notepad++ [NULL] character not paste?

I am new to this site, and I don't know if I am providing enough info - I'll do my best =)
If you use Notepad++, then you will know what I am talking about: when a user loads a .exe into Notepad++, each NUL (\x0) character is shown as NULL, with white text on a black background. I tried pasting it into Visual Studio, hoping to get the same output, but it just pasted some spaces.
Does anyone know whether this is a certain key combination, or something similar? I would like to display NULL in place of \x0, just like Notepad++ does.

Unlike the regular Notepad, Notepad++ renders text with custom styling, as most modern text editors do. Whenever Notepad++ reads a null character (ASCII code 0), it draws the string "NUL" with white text on a black background instead of displaying nothing, which is what you are seeing. You can apply the same kind of custom styling in a RichTextBox.
NOTE: This is by no means an efficient solution. I'm traversing each line twice just to take advantage of existing methods; this could be done manually in a single pass. It is only meant as a hint about how you can do it. Also, I wrote the code carefully but haven't run it because I don't have the tools at the moment. I apologise for any mistakes; let me know and I'll update it.
Step 1: Read the text file line by line and replace every null character in the line with the string "NUL" using String.Replace(). Then append the modified text to your RichTextBox.
Step 2: Traverse the line again using String.IndexOf() to find the start index of each "NUL" word. Use these indexes to select text in the RichTextBox, then style the selection with RichTextBox.SelectionColor and RichTextBox.SelectionBackColor.
In the code, richTextBoxCursor holds the start index of the current line within the RichTextBox.
// Requires: using System; using System.Drawing; using System.IO; using System.Text;
using (StreamReader sr = new StreamReader(@"c:\test.txt", Encoding.UTF8))
{
    int richTextBoxCursor = 0;
    while (!sr.EndOfStream)
    {
        richTextBoxCursor = richTextBox.TextLength;
        string line = sr.ReadLine();
        line = line.Replace("\0", "NUL");
        // ReadLine drops the line break, so add one back to keep the lines apart.
        richTextBox.AppendText(line + "\n");
        int i = 0;
        while (true)
        {
            i = line.IndexOf("NUL", i, StringComparison.Ordinal);
            if (i == -1) break;
            // Select(start, length): richTextBoxCursor + i is where this "NUL" begins
            // in the RichTextBox, and 3 is the length of "NUL".
            richTextBox.Select(richTextBoxCursor + i, 3);
            richTextBox.SelectionColor = Color.White;
            richTextBox.SelectionBackColor = Color.Black;
            i += 3;
        }
    }
}
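As the NOTE says, the two passes could be merged. Below is a minimal single-pass sketch of the string side, kept separate from the RichTextBox so it can be checked on its own. It replaces each '\0' with "NUL" and records where each marker starts. This also avoids highlighting a literal "NUL" that was already in the file. MarkNulls is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: replaces each '\0' in a line with a visible marker in a single pass
// and records the start index of every marker it inserted.
static (string Text, List<int> MarkerStarts) MarkNulls(string line, string marker = "NUL")
{
    var sb = new StringBuilder(line.Length);
    var starts = new List<int>();
    foreach (char c in line)
    {
        if (c == '\0')
        {
            starts.Add(sb.Length);   // the marker starts at the current end of the output
            sb.Append(marker);
        }
        else
        {
            sb.Append(c);
        }
    }
    return (sb.ToString(), starts);
}

var (text, markerStarts) = MarkNulls("MZ\0\0PE NUL");
Console.WriteLine(text);
Console.WriteLine(string.Join(",", markerStarts));
```

Only the two real null characters are reported here, at indexes 2 and 5. The literal "NUL" at the end of the sample is not reported. In the form, each reported start plus richTextBoxCursor would be passed to richTextBox.Select(start, 3).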

Notepad++ may use custom or special fonts to show these particular characters. This behavior may not be appropriate for all text editors, so many of them don't show these characters.
If you want to write a text editor that visualizes these characters, you will probably need to implement this behavior yourself. Looking at the Notepad++ source may help.

Text editor
As far as I know, to make Visual Studio display non-printable characters you need to install an extension from the marketplace at https://marketplace.visualstudio.com.
One such extension is Invisible Character Visualizer. I have neither tried nor recommend it; I just did a quick search and it was the first result.
Having said that, copy-pasting binaries is a risky business.
You may try Edit > Advanced > View White Space first.
Binary editor
To really see what's going on, you could use Visual Studio's binary editor: File > Open > (Open With... option) > Binary Editor > OK.

To answer your question:
It's a symbolic representation of the byte 00h.
When you copy and paste, you copy the values themselves. Notepad++ merely shows symbols in place of those values (because it is configured to do so in that editor).

iTextSharp PDFParser extracting text to textbox

I want to extract text from a PDF into a textbox in ASP.NET, and I have tried the code from the project here.
I have successfully extracted the text from my PDF, but the result is first exported to a .txt file, and it has no line breaks and no whitespace between words.
For example, if this is the PDF text:
Hello World
This is the word ----------------------------------------------- This is word too
End of Hello World
The result looks like this:
HelloWorld Thisistheword Thisiswordtoo EndofHelloWorld
What should I do to get a space between every word and a new line after every line?
Also, in http://www.codeproject.com/Articles/14170/Extract-Text-from-PDF-in-C-100-NET I saw the following code:
int totalLen = 68;
float charUnit = ((float)totalLen) / (float)reader.NumberOfPages;
int totalWritten = 0;
float curUnit = 0;
What's the use of it?
Edit:
After searching some more, I found the solution in the comment here.
I just needed to update my itextsharp.dll to a newer version (I use version 5.4.4.0) and add the function the comment describes. Now the result is what I wanted.

There seems to be some sort of Trim() function happening in the PDFParser.
In addition, the ExtractTextFromPDFBytes method checks for the wrong newline tokens: it should not be checking for 'TD' and 'Td'.
Check for iTextSharp.text.Chunk.NEWLINE instead.
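About the totalLen/charUnit snippet quoted in the question: the variable names suggest a console progress bar 68 characters wide. This is an assumption, since the rest of the article's loop is not quoted here. Under that reading, charUnit is the share of the bar per page. curUnit accumulates fractional shares until a whole '#' can be written, and totalWritten lets the bar be padded to full width at the end. The hypothetical reconstruction below returns the bar as a string instead of writing it to the console.

```csharp
using System;
using System.Text;

// Hypothetical reconstruction: spread a progress bar of totalLen characters over
// numberOfPages, writing whole characters as the per-page share accumulates.
static string BuildProgressBar(int numberOfPages, int totalLen = 68)
{
    float charUnit = ((float)totalLen) / (float)numberOfPages;  // bar characters per page
    int totalWritten = 0;
    float curUnit = 0;
    var bar = new StringBuilder();

    for (int page = 1; page <= numberOfPages; page++)
    {
        // ... extract the text of this page here ...
        curUnit += charUnit;
        while (curUnit >= 1 && totalWritten < totalLen)
        {
            bar.Append('#');        // in a console app: Console.Write("#")
            totalWritten++;
            curUnit--;
        }
    }
    // Float rounding can leave the bar short; pad it to the full width.
    while (totalWritten < totalLen)
    {
        bar.Append('#');
        totalWritten++;
    }
    return bar.ToString();
}

Console.WriteLine(BuildProgressBar(3));
```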

Get different style sections in Microsoft Publisher via Interop

I have a little C# app that is extracting text from a Microsoft Publisher file via the COM Interop API.
This works fine, but I'm struggling when one section contains multiple styles. Potentially every character in a word could have a different font, format, etc.
Do I really have to compare character by character? Or is there something that returns the different style sections, similar to how I can get the different paragraphs?
foreach (Microsoft.Office.Interop.Publisher.Shape shp in pg.Shapes)
{
    if (shp.HasTextFrame == MsoTriState.msoTrue)
    {
        text.Append(shp.TextFrame.TextRange.Text);
        for (int i = 0; i < shp.TextFrame.TextRange.WordsCount; i++)
        {
            TextRange range = shp.TextFrame.TextRange.Words(i + 1, 1);
            string test = range.Text;
        }
    }
}
Or is there a better way in general to extract the text from a Publisher file? I have to be able to write it back with the same formatting, because it's for a translation.

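You could consider using the clipboard to copy text sections as RTF, which you can later paste back as RTF, as in the example below for Word. I am not familiar with Publisher's object model.
Microsoft.Office.Interop.Word.Range range = wordDocument.Content.Paragraphs[1].Range; // Word collections are 1-based
range.Copy(); // Word places the formatted text on the clipboard, including an RTF version
string rtf = System.Windows.Forms.Clipboard.GetText(TextDataFormat.Rtf);
System.Windows.Forms.Clipboard.SetText(rtf, TextDataFormat.Rtf); // later: put it back, then paste with Range.Paste()
Other than that, I have not found a collection of applied styles when using interop with any of the Office products.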

We tried an approach where we compared as many font properties as possible for every character. Not pretty, but it works in most cases.
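The character-by-character comparison can at least be turned into style runs: consecutive characters whose format snapshots are equal are merged into one run. The sketch below leaves out how each snapshot is read through Publisher interop. The tuple of font name, size, bold and italic used in the example is an assumption, and GroupRuns is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: merges consecutive characters with equal format snapshots into runs.
// TFormat is whatever per-character snapshot you read through interop.
static List<(string Text, TFormat Format)> GroupRuns<TFormat>(IEnumerable<(char Ch, TFormat Format)> chars)
{
    var runs = new List<(string Text, TFormat Format)>();
    var current = new StringBuilder();
    TFormat currentFormat = default;
    var comparer = EqualityComparer<TFormat>.Default;

    foreach (var (ch, format) in chars)
    {
        if (current.Length > 0 && !comparer.Equals(format, currentFormat))
        {
            runs.Add((current.ToString(), currentFormat));   // style changed: close the run
            current.Clear();
        }
        currentFormat = format;
        current.Append(ch);
    }
    if (current.Length > 0)
        runs.Add((current.ToString(), currentFormat));      // close the last run
    return runs;
}

// Assumed snapshot: (font name, size, bold, italic).
var plain = ("Arial", 10f, false, false);
var bold = ("Arial", 10f, true, false);
var sample = new List<(char, (string, float, bool, bool))>();
foreach (char c in "Hello ") sample.Add((c, plain));
foreach (char c in "world") sample.Add((c, bold));
foreach (var run in GroupRuns(sample))
    Console.WriteLine($"[{run.Text}] bold={run.Format.Item3}");
```

For a translation, each run's text could then be translated and written back with its snapshot's formatting, though splitting sentences across runs is its own problem.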

Delete Lines From Beginning of Multiline Textbox in C#

Is there a graceful way in C# to delete multiple lines of text from the beginning of a multiline textbox? I am using Microsoft Visual C# 2008 Express Edition.
EDIT - Additional Details
The multiline textbox in my application is disabled (i.e. it is only editable by the application itself), and every line is terminated with a "\r\n".

This is an incomplete question, so assuming you are using either TextBox or RichTextBox, you can use the Lines property defined in TextBoxBase.
// Get all the lines as an array
string[] lines = this.textBox.Lines;
You can then work with this array and set it back:
this.textBox.Lines = newLinesArray;
This might not be the most elegant way, but it will remove lines from the beginning.
EDIT: you don't need Select; just using Skip is fine.
// Number of lines to remove from the beginning (Skip and ToArray require using System.Linq;)
int numOfLines = 30;
var lines = this.textBox1.Lines;
var newLines = lines.Skip(numOfLines);
this.textBox1.Lines = newLines.ToArray();

This solution works for me in WPF:
while (LogTextBox.LineCount > Constants.LogMaximumLines)
{
    LogTextBox.Text = LogTextBox.Text.Remove(0, LogTextBox.GetLineLength(0));
}
You can replace LogTextBox with the name of your text box, and Constants.LogMaximumLines with the maximum number of lines you would like your text box to have.

Unfortunately, no, there is no "elegant" way to delete lines from the text of a multiline TextBox, regardless of whether you are using ASP.NET, WinForms, or WPF/Silverlight. In every case, you build a string that does not contain the lines you don't want and set the Text property.
WinForms will help you a little bit by pre-splitting the Text value into lines, using the Lines property, but it's not very helpful because it's a string array, and it's not exactly easy to delete an element of an array.
Generally, this algorithm will work for all possible versions of the TextBox class:
var lines = (from item in myTextBox.Text.Split('\n') select item.Trim()); // Trim() also removes the '\r' left by Split('\n')
lines = lines.Skip(numLinesToSkip);
myTextBox.Text = string.Join(Environment.NewLine, lines.ToArray());
Note: I'm using Environment.NewLine specifically for the case of Silverlight on a Unix platform. For all other cases, you're perfectly fine using "\r\n" in the string.Join call.
Also, I do not consider this an elegant solution, even though it's only 3 lines. What it does is the following:
1. splits the single string into an array of strings
2. iterates over that array and builds a second array that does not include the skipped lines
3. joins the array back into a single string.
I do not consider it elegant because it essentially builds two separate arrays, then builds a string from the second array. A more elegant solution would not do this.
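One way to avoid the intermediate arrays relies on the question's statement that every line ends with "\r\n". Find the end of the N-th line with String.IndexOf, then keep the rest with a single Substring. RemoveFirstLines is a hypothetical helper. It still creates one new string, which setting the Text property requires anyway.

```csharp
using System;

// Hypothetical helper: drops the first `count` lines of `text`, assuming each line
// ends with "\r\n". It scans for line breaks with IndexOf instead of splitting into arrays.
static string RemoveFirstLines(string text, int count)
{
    int index = 0;                         // start of the first line that is kept
    for (int n = 0; n < count; n++)
    {
        int next = text.IndexOf("\r\n", index, StringComparison.Ordinal);
        if (next < 0)
            return string.Empty;           // no more line breaks: every remaining line is removed
        index = next + 2;                  // move past the "\r\n"
    }
    return text.Substring(index);
}

string log = "first\r\nsecond\r\nthird\r\n";
Console.WriteLine(RemoveFirstLines(log, 2));
```

In the application this would be used as myTextBox.Text = RemoveFirstLines(myTextBox.Text, numLinesToSkip);.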

One thing to keep in mind is that the Lines collection of the TextBox does not accurately reflect what the user sees as lines. The Lines collection basically works off of carriage returns, whereas the user could see lines wrapping from one line to the next without a carriage return. This may or may not be the behavior you want.
For example, the user would see the below as three lines, but the Lines collection will show 2 (since there are only 2 carriage returns):
This is line number
one.
This is line 2.
Also, if the form, and the text control are resizable the visible lines in the text will change as the control grows or shrinks.
I wrote a blog post several years ago on how to determine the number of lines in the textbox as the user sees them, and how to get the index of a given line: http://ryanfarley.com/blog/archive/2004/04/07/511.aspx. Perhaps this post will help.

// Keep only the last maxNumberLines lines (Skip and ToArray require using System.Linq;)
if (txtLog.Lines.Length > maxNumberLines)
{
    txtLog.Lines = txtLog.Lines.Skip(txtLog.Lines.Length - maxNumberLines).ToArray();
}

Reading line by line

I have a program that generates a plain text file. The structure (layout) is always the same. Example:
Text File:
LinkLabel
"Hello, this text will appear in a LinkLabel once it has been
added to the form. This text may not always cover more than one line. But will always be surrounded by quotation marks."
240, 780
So, to explain what is going on in that file:
Control
Text
Location
When a button on the form is clicked and the user opens one of these files from the OpenFileDialog, I need to be able to read each line. Starting from the top, I want to check which control it is. Then, starting on the second line, I need to get all the text inside the quotation marks, regardless of whether it is one line of text or more. On the next line (after the closing quotation mark), I need to extract the location (240, 780). I have thought of a few ways of going about this. However, when I write them down and put them into practice, they don't make much sense, and I end up finding ways in which they won't work.
Has anybody ever done this before? Would anybody be able to provide any help, suggestions or advice on how I'd go about doing this?
I have looked up CSV files but that seems too complicated for something that seems so simple.
Thanks
jase

You could use a regular expression to get the lines from the text:
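// Requires: using System; using System.IO; using System.Text.RegularExpressions;
// \r?\n accepts Windows and Unix line endings; (?:\r?\n|$) also accepts a missing final newline.
MatchCollection lines = Regex.Matches(File.ReadAllText(fileName), @"(.+?)\r?\n""([^""]+)""\r?\n(\d+), (\d+)(?:\r?\n|$)");
foreach (Match match in lines)
{
    string control = match.Groups[1].Value;
    string text = match.Groups[2].Value;
    int x = Int32.Parse(match.Groups[3].Value);
    int y = Int32.Parse(match.Groups[4].Value);
    Console.WriteLine("{0}, \"{1}\", {2}, {3}", control, text, x, y);
}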

I'll try and write down the algorithm, the way I solve these problems (in comments):
// while not at end of file
//     read control
//     read line of text
//     while last char in line is not "
//         read line of text
//     read location
Try and write code that does what each comment says and you should be able to figure it out.
HTH.

You are trying to implement a parser, and the best strategy for that is to divide the problem into smaller pieces. You also need a TextReader, which lets you read the input line by line.
You should separate your ReadControl method into three methods: ReadControlType, ReadText, ReadLocation. Each method is responsible for reading only the item it should read and leave the TextReader in a position where the next method can pick up. Something like this.
public Control ReadControl(TextReader reader)
{
    string controlType = ReadControlType(reader);
    string text = ReadText(reader);
    Point location = ReadLocation(reader);
    // ... create and return the control ...
}
Of course, ReadText is the most interesting one, since it spans multiple lines. In fact it's a loop that calls TextReader.ReadLine until the line ends with a quotation mark:
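private string ReadText(TextReader reader)
{
    string line = reader.ReadLine();
    string text = line.Substring(1); // Strip first quotation mark.
    while (!text.EndsWith("\""))
    {
        line = reader.ReadLine();
        if (line == null)
            throw new EndOfStreamException("Missing closing quotation mark.");
        // Keep the line break that ReadLine removed, so multi-line text isn't glued together.
        text += Environment.NewLine + line;
    }
    return text.Substring(0, text.Length - 1); // Strip last quotation mark.
}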
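The sketch below puts the three readers together and reads every record from a TextReader until the end of the input, following the loop in the earlier pseudocode answer. To stay testable without WinForms, it returns (type, text, location) tuples instead of Control objects. The method names follow the answer above, but their bodies (and ReadAll) are assumptions. Line breaks inside the quotes are kept as Environment.NewLine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Line 1 of a record: the control type, e.g. "LinkLabel".
static string ReadControlType(TextReader reader) =>
    reader.ReadLine()?.Trim() ?? throw new EndOfStreamException("Expected a control type.");

// The quoted text, which may span several lines; inner line breaks are kept.
static string ReadText(TextReader reader)
{
    string line = reader.ReadLine() ?? throw new EndOfStreamException("Expected quoted text.");
    string text = line.TrimStart().Substring(1);   // strip the opening quotation mark
    while (!text.EndsWith("\""))
    {
        line = reader.ReadLine() ?? throw new EndOfStreamException("Missing closing quotation mark.");
        text += Environment.NewLine + line;
    }
    return text.Substring(0, text.Length - 1);     // strip the closing quotation mark
}

// The location line, e.g. "240, 780".
static (int X, int Y) ReadLocation(TextReader reader)
{
    string line = reader.ReadLine() ?? throw new EndOfStreamException("Expected a location.");
    string[] parts = line.Split(',');
    return (int.Parse(parts[0].Trim()), int.Parse(parts[1].Trim()));
}

// Reads records until the end of the input, skipping blank lines between them.
static List<(string Type, string Text, (int X, int Y) Location)> ReadAll(TextReader reader)
{
    var records = new List<(string Type, string Text, (int X, int Y) Location)>();
    while (reader.Peek() != -1)
    {
        string type = ReadControlType(reader);
        if (type.Length == 0)
            continue;
        string text = ReadText(reader);
        var location = ReadLocation(reader);
        records.Add((type, text, location));
    }
    return records;
}

string sample = "LinkLabel\r\n\"Hello, this text will appear in a LinkLabel once it has been\r\nadded to the form.\"\r\n240, 780\r\n";
foreach (var record in ReadAll(new StringReader(sample)))
    Console.WriteLine($"{record.Type} at {record.Location}: {record.Text}");
```

With a file, a StreamReader opened on the path from the OpenFileDialog would take the place of the StringReader.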

This kind of stuff gets irritating: it's conceptually simple, but you can end up with gnarly code. You've got a comparatively simple case, with one record per file. It gets much harder if you have lots of records and want to deal nicely with badly formed records (consider writing a parser for a language such as C#).
For large-scale problems, one might use a grammar-driven parser such as this: link text
Much of your complexity comes from the lack of regularity in the file. The first field is terminated by a newline, the second is delimited by quotes, the third is terminated by a comma...
My first recommendation would be to adjust the format of the file so that it's really easy to parse. You write the file, so you're in control. For example, just don't allow newlines in the text, and put each item on its own line. Then you can just read four lines, job done.
