Insert List<string> into word document - c#

I'm currently trying to find a way to read in, and insert data into a word document. So far this is what I have gotten:
class Program
{
static void Main(string[] args)
{
var FileName = @"C:\temp\test.DOC";
List<string> data = new List<string>();
Application app = new Application();
Document doc = app.Documents.Open(@"C:\temp\test.DOC");
foreach (Paragraph objParagraph in doc.Paragraphs)
{
data.Add(objParagraph.Range.Text.Trim());
}
//data.Insert
data.Insert(16, "Test 1");
data.Insert(16, "\tTest 2\tName\tAmount");
data.Insert(16, "Test 3");
data.Insert(16, "Test 4");
data.Insert(16, "Test 5");
data.Insert(16, "Test 6");
data.Insert(16, "Test 7");
data.Insert(16, "Test 8");
data.Insert(16, "Test 9");
data.Insert(16, "Test 10");
var x = doc.Paragraphs.Add();
x.Range.Text.Insert(0,"\tTest 2\tName\tAmount");
doc.SaveAs2(@"C:\temp\test3.DOC");
((_Document)doc).Close();
((_Application)app).Quit();
}
}
Now, this successfully populates the List data - but I'm trying to append each new test element at the [16]th index, and save it into the word document. Is there a simple way to accomplish this, or am I just over-thinking this issue?
I realize the string list is separate from the Document object which represents the word document.
I have a few other places in the document where I am using bookmarks to add data, but I don't think it is possible to use bookmarks for placing the data in this instance, and if I don't have to use bookmarks here I'd rather avoid them.
EDIT: I am trying to insert X amount of elements at the [16]th position within the data[].
EDIT 2:
Essentially I am sourcing the data dynamically, and I'm not sure how many records/rows I'll need to add to the document, so it could be as follows:
[15]
[16]\tName\tID\tAMOUNT
[17]\tName\tID\tAMOUNT
[18]\tName\tID\tAMOUNT
Since the headers will already be there (NAME,ID,AMOUNT), and each time I run the program I'm not sure how many elements I'll be inserting into the document - so as long as each element is placed under one another, and on the 16th line in the document template I have setup that should accomplish what I am trying to do.
Image 1 - Image into string array
Image 2 - Image after adding content into the string - this is what the resulting document should look like (this is what is to be saved)
I'm attempting to put each element (i.e. Test1, Test2, Test3) in its own column (see above).

Again I am totally confused as to why you want to read the word file into a string list array. This simply adds the text you show after line 15 into the word document. You do not specify WHERE Test 1, Test 2, Test3... are coming from.
Edit: Added a try-catch just in case the document does not have at least 16 paragraphs.
static void Main(string[] args)
{
List<string> data = new List<string>();
Application app = new Application();
Document doc = app.Documents.Open(@"C:\temp\test.DOC");
string testRows = "Test 1\n\tTest 2\tName\tAmount\nTest 3\nTest 4\nTest 5\nTest 6\nTest 7\nTest 8\nTest 9\nTest 10\n";
try
{
var x = doc.Paragraphs[16];
x = doc.Paragraphs.Add(x.Range);
x.Range.Text = testRows;
doc.SaveAs2(@"C:\temp\test3.DOC");
}
catch (System.Runtime.InteropServices.COMException e)
{
Console.WriteLine("COMException: " + e.StackTrace.ToString());
Console.ReadKey();
}
((_Document)doc).Close();
((_Application)app).Quit();
}
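If the rows come from a dynamic source rather than a hard-coded string, the text block assigned to Range.Text could be built from a list. This is only a sketch: the RowText class and its Build method are hypothetical names, and it simply mirrors the "\n"-terminated layout of the testRows string above.

```csharp
using System.Collections.Generic;
using System.Text;

public static class RowText
{
    // Joins the rows into one string, ending every row with '\n'
    // in the same way as the hard-coded testRows string.
    public static string Build(IEnumerable<string> rows)
    {
        var sb = new StringBuilder();
        foreach (string row in rows)
        {
            sb.Append(row).Append('\n');
        }
        return sb.ToString();
    }
}
```

With this in place, the assignment in the try block would become x.Range.Text = RowText.Build(rows); where rows is whatever list the program has collected.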

So what I figured out (for my purposes) is that it is easiest to insert a list of strings into makeshift columns separated by tabs by inserting at specific paragraphs.
Since I am using bookmarks to place text as well - I found it useful to work from a copy of a document instead of worrying about removing/creating bookmarks each time.
When populating the list that you are going to place at a specific paragraph mark, it is useful to append tab characters as well as newline characters on the fly. Later on this will make it easier to loop through the list and place the rows nicely on the document.
Depending on how you place the columns, some logic will be needed to space everything correctly. I did this by setting maximum lengths for the columns, trimming values that were too long, and accommodating shorter/longer values by adding a specific number of tab characters.
So, my columns I am populating would look like:
myList.Add("\t12345678912345\tJohn Doe\t\t\t\t123456\r\n");
myList.Add("\t987654321654987\tJohn Smith\t\t\t\t98765\r\n");
These lines would be inserted at paragraph 17 and placed neatly under headers.
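The trimming and tab-padding logic described above might look roughly like the following. It is a sketch under loud assumptions: ColumnRow, MaxNameLength and TabWidth are hypothetical, and the arithmetic pretends that every character is one eighth of a tab stop wide, which is only approximately true with proportional fonts, so the numbers would need tuning against the real template.

```csharp
using System;

public static class ColumnRow
{
    // Hypothetical layout values; adjust them to the template's tab stops.
    private const int MaxNameLength = 24;
    private const int TabWidth = 8;

    public static string Format(string id, string name, string amount)
    {
        name = name ?? string.Empty;

        // Trim names that are too long for the column.
        if (name.Length > MaxNameLength)
        {
            name = name.Substring(0, MaxNameLength);
        }

        // The amount column starts one tab stop after the widest possible name.
        int amountColumn = MaxNameLength + TabWidth;

        // Tab stops the name has already filled completely.
        int filledStops = name.Length / TabWidth;

        // Shorter names need more tabs to reach the amount column.
        int tabs = (amountColumn - filledStops * TabWidth) / TabWidth;

        return "\t" + id + "\t" + name + new string('\t', tabs) + amount + "\r\n";
    }
}
```

Each row would then be added with something like myList.Add(ColumnRow.Format(id, name, amount)); instead of counting tabs by hand.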
Lastly, I decided to use bookmarks to place single lines of text like the date,title, and signature values since those values don't need to be correctly spaced or anything.
At the end I delete the copy of the word document I'm working on, and delete the pdf (since in my case I'm sending it via email)
Thank you for the help @JohnG - I hope this answer might help others who come across it. I removed the try-catch since I'm working from the template as well.
File.Copy(sCurrentPath + "\\" + "testTemplate.DOC", sCurrentPath + "\\" + "test.DOC");
Application app = new Application();
Document doc = app.Documents.Add(sCurrentPath + "\\" + "test.DOC");
foreach (string sValue in myList)
{
// Add a new paragraph at paragraph 17 and fill it with the row text.
var para = doc.Paragraphs[17];
para = doc.Paragraphs.Add(para.Range);
para.Range.Text = sValue;
}
if (doc.Bookmarks.Exists("Date"))
{
object oBookMark = "Date";
doc.Bookmarks.get_Item(ref oBookMark).Range.Text = DateTime.Now.ToString("MM/dd/yyyy");
}
if (doc.Bookmarks.Exists("Signature"))
{
object oBookMark = "Signature";
doc.Bookmarks.get_Item(ref oBookMark).Range.Text = "My Name";
}
if (doc.Bookmarks.Exists("Title"))
{
object oBookMark = "Title";
doc.Bookmarks.get_Item(ref oBookMark).Range.Text = "Title Here";
}
doc.ExportAsFixedFormat(sCurrentPath + "\\" + "test.pdf", WdExportFormat.wdExportFormatPDF);
File.Delete(sCurrentPath + "\\" + "test.DOC");
File.Delete(sCurrentPath + "\\" + "test.pdf");
((_Document)doc).Close();
((_Application)app).Quit();

Related

Word found unreadable content in docm

I am writing a program in C# using Open XML that transfers data from excel to word.
Currently, I have this:
internal override void UpdateSectionSheets(int sectionNum, List<List<string>> tableContents)
{
using (WordprocessingDocument doc = WordprocessingDocument.Open(MainForm.WordFileDialog.FileName, true))
{
List<Table> tables = doc.MainDocumentPart.Document.Descendants<Table>().ToList();
foreach(Table table in tables)
{
int row = 1;
if (table.Descendants<TableRow>().FirstOrDefault().Descendants<TableCell>().FirstOrDefault().InnerText == sectionNum.ToString())
{
foreach(var item in tableContents[0])
{
// splits the tableContents[0][row - 1] into individual strings at each instance of "\n\n"
String str = tableContents[0][row - 1];
String[] separator = {"\n\n"};
Int32 count = 6; // max 6 sub strings (usually only two but allowed for extra)
String[] subStrs = str.Split(separator, count, StringSplitOptions.RemoveEmptyEntries);
// transfer comment
table.Descendants<TableRow>().ElementAt(row).Descendants<TableCell>().ElementAt(2).RemoveAllChildren<Paragraph>(); // removes the existing contents in the cell
foreach (string s in subStrs)
{
// for every substring, create a new paragraph and append the substring to that new paragraph. Makes it so that each sentence is on its own line
Text text = new Text(s);
table.Descendants<TableRow>().ElementAt(row).Descendants<TableCell>().ElementAt(2).AppendChild(new Paragraph(new Run(text)));
}
// transfer verdict
table.Descendants<TableRow>().ElementAt(row).Descendants<TableCell>().ElementAt(3).RemoveAllChildren<Paragraph>();
Paragraph p = new Paragraph(new ParagraphProperties(new Justification() { Val = JustificationValues.Center }));
p.Append(new Run(new Text(tableContents[1][row - 1])));
table.Descendants<TableRow>().ElementAt(row).Descendants<TableCell>().ElementAt(3).AppendChild(p);
row++;
}
}
}
doc.Save();
}
}
I believe the line causing the issue is: table.Descendants<TableRow>().ElementAt(row).Descendants<TableCell>().ElementAt(2).AppendChild(new Paragraph(new Run(text)));
If I put new Text(tableContents[0][row - 1]) in place of (text) in the above line, the program will run and word doc will open with no errors, but the output is not in the format I need.
The program runs without throwing any errors, but when I try to open the word doc it gives a "word found unreadable content in xxx.docm" error. If I say I trust the source and want word to recover the document, I can open the doc and see that the code is working how I want. However, I don't want to have to do that every time. Does anyone know what is causing the error and how I can fix it?

Extract words from a doc/docx file c#

I want to extract all the words from a Word file (doc/docx) and put them into a list. It seems that Microsoft.Office.Interop only works if I want to extract paragraphs and add them to a list.
List<string> data = new List<string>();
Microsoft.Office.Interop.Word.Application app = new
Microsoft.Office.Interop.Word.Application();
Document doc = app.Documents.Open(dlg.FileName);
foreach (Paragraph objParagraph in doc.Paragraphs)
data.Add(objParagraph.Range.Text.Trim());
((_Document)doc).Close();
((_Application)app).Quit();
I also found a way to extract the text word by word, but it doesn't work with a big document because the loop throws an exception.
Dictionary<int, string> motRap = new Dictionary<int, string>();
Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
Document document = application.Documents.Open("C:/Users/Titri/Desktop/test/test/bin/Debug/po.txt");
// Loop through all words in the document.
int count = document.Words.Count;
for (int i = 1; i <= count; i++)
{
string text = document.Words[i].Text;
motRap.Add(i, text);
}
// Close word.
application.Quit();
So my question is whether there is a way to extract the words from a big Word file. I suspect that Microsoft.Office.Interop is not the right tool for extracting from a big file.
Sorry my english is not good.
The object inside a paragraph is called a Run, though I don't know whether it is available in Interop. For better performance, I would suggest switching to the Open XML SDK if you have to process a large number of documents.
If you want to stick to Interop, why not split each paragraph into an array (using a space as the delimiter) and add all the words from that?
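The split-per-paragraph idea could be sketched as below. WordSplitter is a hypothetical helper, and it assumes the paragraph texts have already been collected into a list, as in the first snippet of the question. Punctuation stays attached to the words, since only whitespace is treated as a delimiter.

```csharp
using System;
using System.Collections.Generic;

public static class WordSplitter
{
    // Whitespace characters treated as word delimiters.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    // Splits every paragraph on whitespace and collects the words in order.
    public static List<string> SplitIntoWords(IEnumerable<string> paragraphs)
    {
        var words = new List<string>();
        foreach (string paragraph in paragraphs)
        {
            words.AddRange(paragraph.Split(Separators, StringSplitOptions.RemoveEmptyEntries));
        }
        return words;
    }
}
```

This makes one COM call per paragraph instead of one per word, because the splitting itself happens on ordinary strings.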

Rich Text to Plain Text via C#?

I have a program that reads through a Microsoft Word 2010 document and puts all text read from the first column of every table into a datatable. However, the resulting text also includes special formatting characters (that are usually invisible in the original Word document).
Is there a way that I can take the string of text that I've read and strip all the formatting characters from it?
The program is pretty simple, and uses the Microsoft.Office.Interop.Word assemblies. Here is the main loop where I'm grabbing the text from the document:
// Loop through each table in the document,
// grab only text from cells in the first column
// in each table.
foreach (Table tb in docs.Tables)
{
for (int row = 1; row <= tb.Rows.Count; row++)
{
var cell = tb.Cell(row, 1);
var listNumber = cell.Range.ListFormat.ListString;
var text = listNumber + " " + cell.Range.Text;
dt.Rows.Add(text);
}
}
EDIT: Here is what the text ("1. Introduction") looks like in the Word document:
This is what it looks like before being put into my datatable:
And this is what it looks like when put into the datatable:
So, I'm trying to figure out a simple way to get rid of the control characters that seem to be appearing (\r, \a, \n, etc).
EDIT: Here is the code I'm trying to use. I created a new method to convert the string:
private string ConvertToText(string rtf)
{
using (RichTextBox rtb = new RichTextBox())
{
rtb.Rtf = rtf;
return rtb.Text;
}
}
When I run the program, it bombs with the following error:
The variable rtf, at this point, looks like this:
RESOLUTION: I trimmed the unneeded characters before writing them to the datatable.
// Loop through each table in the document,
// grab only text from cells in the first column
// in each table.
foreach (Table tb in docs.Tables)
{
for (int row = 1; row <= tb.Rows.Count; row++)
{
var charsToTrim = new[] { '\r', '\a', ' ' };
var cell = tb.Cell(row, 1);
var listNumber = cell.Range.ListFormat.ListString;
var text = listNumber + " " + cell.Range.Text;
text = text.TrimEnd(charsToTrim);
dt.Rows.Add(text);
}
}
I don't know exactly what formatting you're trying to remove, but you could try something like:
text = new string(text.Where(c => !Char.IsControl(c)).ToArray());
That should strip the non-printing characters out.
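As a self-contained sketch of that filter (TextCleaner is a hypothetical name): note that Char.IsControl also matches tabs and line feeds, so this removes every control character and not only the "\r\a" that ends the text of a Word table cell.

```csharp
using System.Linq;

public static class TextCleaner
{
    // Returns the input without any control characters
    // (this includes '\r', '\a', '\n' and '\t').
    public static string StripControlChars(string text)
    {
        return new string(text.Where(c => !char.IsControl(c)).ToArray());
    }
}
```

If tabs or line breaks inside a cell need to survive, trimming only the trailing characters, as in the resolution above, is the gentler choice.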
An alternative is to add a rich text box to your form (you can keep it hidden if you don't want to show it) and, once you have read all your data, assign it to the rich text box. Like
//rtfText is rich text
//rtBox is rich text box
rtBox.Rtf = rtfText;
//get simple text here.
string plainText = rtBox.Text;
Why don't you give this a try:
using System;
using System.Text.RegularExpressions;
public class Example
{
static string CleanInput(string strIn)
{
// Replace invalid characters with empty strings.
try {
return Regex.Replace(strIn, @"[^\w\.@-]", "",
RegexOptions.None, TimeSpan.FromSeconds(1.5));
}
// If we timeout when replacing invalid characters,
// we should return Empty.
catch (RegexMatchTimeoutException) {
return String.Empty;
}
}
}
Here's a link for it as well.
http://msdn.microsoft.com/en-us/library/844skk0h.aspx
A totally different approach would be to look at the Open XML SDK.
This example should get you started.

how to get text file rows with no delimiter into array

I have a text file that I'm trying to input into an array called columns.
Each row in the text file belongs to a different attribute in a sub-class I have created.
For example, row 2 in my text file is a date that I would like to pass over. I do not want to use Split because I do not have a delimiter, but I do not know an alternative. I do not fully understand the code below, so any help would be welcome. When I try to run it, it says that columns[1] is out of range. Thank you.
StreamReader textIn =
new StreamReader(
new FileStream(path, FileMode.OpenOrCreate, FileAccess.Read));
//create the list
List<Event> events = new List<Event>();
while (textIn.Peek() != -1)
{
string row = textIn.ReadLine();
string[] columns = row.Split(' ');
Event special = new Event();
special.Day = Convert.ToInt32(columns[0]);
special.Time = Convert.ToDateTime(columns[1]);
special.Price = Convert.ToDouble(columns[2]);
special.StrEvent = columns[3];
special.Description = columns[4];
events.Add(special);
}
Input file sample:
1
8:00 PM
25.00
Beethoven's 9th Symphony
Listen to the ninth and final masterpiece by Ludwig van Beethoven.
2
6:00 PM
15.00
Baseball Game
Come watch the championship team play their archrival--No work stoppages, guaranteed.
Well, one way to do it (though it is a bit ugly) would be to use File.ReadAllLines, and then loop through the array, something like this:
string[] lines = File.ReadAllLines(path);
int index = 0;
while (index < lines.Length)
{
Event special = new Event();
special.Day = Convert.ToInt32(lines[index]);
special.Time = Convert.ToDateTime(lines[index + 1]);
special.Price = Convert.ToDouble(lines[index + 2]);
special.StrEvent = lines[index + 3];
special.Description = lines[index + 4];
events.Add(special);
index += 5; // move on to the next five-line record
}
This is very brittle code - a lot can break it. What if one of the events is missing a line? What if there are multiple blank lines in it? What if one of the Convert.Toxxx methods throws an error?
If you have the option to change the format of the file, I strongly recommend you make it delimited at least. If you can't change the format, you'll need to make the code sample above more robust so that it can handle blank lines, failed conversions, missing lines, etc.
Much, much, much easier to use a delimited file. Even easier to use an XML or JSON file.
Delimited File (CSV)
Let's say you have the same sample input, but this time it's a CSV file, like this:
1,8:00 PM,25.00,"Beethoven's 9th Symphony","Listen to the ninth and final masterpiece by Ludwig van Beethoven."
2,6:00 PM,15.00,"Baseball Game","Come watch the championship team play their archrival--No work stoppages, guaranteed"
I put quotes around the last two items so that a comma inside them won't break the parsing.
For CSV files, I like to use the Microsoft.VisualBasic.FileIO.TextFieldParser class, which despite its name can be used in C#. Don't forget to add a reference to Microsoft.VisualBasic and a using directive (using Microsoft.VisualBasic.FileIO;).
The following code will allow you to parse the above CSV sample:
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] {","};
parser.TextFieldType = FieldType.Delimited;
parser.HasFieldsEnclosedInQuotes = true;
string[] parsedLine;
while (!parser.EndOfData)
{
parsedLine = parser.ReadFields();
Event special = new Event();
special.Day = Convert.ToInt32(parsedLine[0]);
special.Time = Convert.ToDateTime(parsedLine[1]);
special.Price = Convert.ToDouble(parsedLine[2]);
special.StrEvent = parsedLine[3];
special.Description = parsedLine[4];
events.Add(special);
}
}
This still has some issues though - you would need to handle cases where there are missing fields, and I would recommend using TryParse methods instead of Convert.Toxxx, but it's a little easier (I think) than the non-delimited sample.
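A TryParse-based version of the field conversion could look something like this. It is a sketch, not a drop-in replacement: the Event class here is an assumption reconstructed from the property names in the question, EventParser is a hypothetical helper, and the invariant culture is chosen only so the sample values parse the same way on every machine.

```csharp
using System;
using System.Globalization;

// Assumed shape of the Event class, based on the properties used in the question.
public class Event
{
    public int Day { get; set; }
    public DateTime Time { get; set; }
    public double Price { get; set; }
    public string StrEvent { get; set; }
    public string Description { get; set; }
}

public static class EventParser
{
    // Returns false instead of throwing when a field is missing or malformed.
    public static bool TryParseEvent(string[] fields, out Event result)
    {
        result = null;
        if (fields == null || fields.Length < 5)
        {
            return false; // missing fields
        }

        int day;
        DateTime time;
        double price;
        CultureInfo culture = CultureInfo.InvariantCulture;

        if (!int.TryParse(fields[0], NumberStyles.Integer, culture, out day)) return false;
        if (!DateTime.TryParse(fields[1], culture, DateTimeStyles.None, out time)) return false;
        if (!double.TryParse(fields[2], NumberStyles.Float, culture, out price)) return false;

        result = new Event
        {
            Day = day,
            Time = time,
            Price = price,
            StrEvent = fields[3],
            Description = fields[4]
        };
        return true;
    }
}
```

Inside the while loop, the body would shrink to a call such as if (EventParser.TryParseEvent(parsedLine, out special)) events.Add(special); with a log entry or counter for rows that fail.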
XML File (Using LINQ to XML)
Now let's try it with an XML file and use LINQ to XML to get the data:
<Events>
<Event>
<Day>1</Day>
<Time>8:00 PM</Time>
<Price>25.00</Price>
<Title><![CDATA[Beethoven's 9th Symphony]]></Title>
<Description><![CDATA[Listen to the ninth and final masterpiece by Ludwig van Beethoven.]]></Description>
</Event>
<Event>
<Day>2</Day>
<Time>6:00 PM</Time>
<Price>15.00</Price>
<Title><![CDATA[Baseball Game]]></Title>
<Description><![CDATA[Come watch the championship team play their archrival--No work stoppages, guaranteed]]></Description>
</Event>
</Events>
I've used CDATA for the title and description so that special characters won't break the XML parsing.
This is easily parsed into your Events by the following code:
XDocument doc = XDocument.Load(path);
List<Event> events = (from x in doc.Descendants("Event")
select new Event {
Day = Convert.ToInt32(x.Element("Day").Value),
Time = Convert.ToDateTime(x.Element("Time").Value),
Price = Convert.ToDouble(x.Element("Price").Value),
StrEvent = x.Element("Title").Value,
Description = x.Element("Description").Value
}).ToList();
Of course, this is still not perfect as you still have the possibility of conversion failures or missing elements.
Pipe-Delimited File Example
Per our discussion in the comments, if you want to use the pipe (|), you need to put each event (in its entirety) on one line, like this:
1|8:00 PM|25.00|Beethoven's 9th Symphony|Listen to the ninth and final masterpiece by Ludwig van Beethoven.
2|6:00 PM|15.00|Baseball Game|Come watch the championship team play their archrival--No work stoppages, guaranteed
You can still use the TextFieldParser example above if you like (just change the delimiter from , to |), or if you want you can use your original code.
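If you go back to plain Split for the pipe-delimited layout, a minimal sketch might be the following. PipeLine is a hypothetical helper; the field count of five is taken from the sample rows, and capping the split at five pieces is a design choice so that a stray | inside the description does not produce extra fields.

```csharp
using System;

public static class PipeLine
{
    private const int FieldCount = 5; // Day, Time, Price, title, description

    // Splits one line on '|' into at most five fields;
    // any further '|' stays inside the last field.
    public static string[] SplitFields(string line)
    {
        return line.Split(new[] { '|' }, FieldCount);
    }
}
```

A line that yields fewer than five fields is malformed and should be skipped or reported before any conversion is attempted.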
Some Final Thoughts
I wanted to also address the original code and show why it wasn't working. The main reason was that you were reading one line at a time, and then splitting on ' '. This would have been a good start if all the fields were on the same line (although it still would have had problems because of spaces in the Time, StrEvent and Description fields), but they weren't.
So when you read the first line (which was 1) and split on ' ', you got one value back (1). When you tried to access the next element of the split array, you got the index out of range error because there was no columns[1] for that line.
Essentially, you were trying to treat each line as if it had all the fields in it, when in reality it was one field per line.
For your given sample file something like
string[] lines = File.ReadAllLines(path);
for (int index = 4; index < lines.Length; index += 5)
{
Event special = new Event();
special.Day = Convert.ToInt32(lines[index - 4]);
special.Time = Convert.ToDateTime(lines[index - 3]);
special.Price = Convert.ToDouble(lines[index - 2]);
special.StrEvent = lines[index - 1];
special.Description = lines[index];
events.Add(special);
}
Would do the job, but like Tim already mentioned, you should consider changing your file format.
Delimiters can be dropped if the column values never contain the separating character or if every field has a fixed size. Under that condition you can read the file and split the fields by position.
If you want to read from a file and load the data into variables automatically, I suggest serializing and deserializing the variables to a file, but that file won't be a plain text file!

Text boxes and Xml in C#

I just started using VS2010 and C#.
I'm trying to create an app which takes values from two textboxes and adds it to an existing xml file.
eg.
Text Box 1 Text Box 2
---------- ----------
A B
C D
E F
I want the resultant xml file to be like this.
<root>
<element>
<Entry1>A</Entry1>
<Entry2>B</Entry2>
</element>
</root>
and so on...
Can this be done using C#?
I'm unable to add the entries alternately, i.e. Entry1 should contain Text Box 1 line #1 and Entry2 should contain Text Box 2 line #1.
Any help would be appreciated.
Thanks
You need to split the string retrieved from the text box based on the new line like this:
string[] lines = theText.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
Once you have the values split for each text box, you can use the System.Xml.Linq.XDocument class and loop through the values that you retrieved above.
Something like this:
XDocument srcTree = new XDocument(new XElement("Root",
new XElement("entry1", "textbox value1")));
You can query the XML document with LINQ or save it to an XML file using the Save method of XDocument.
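Putting the two steps together for the layout shown in the question, a sketch could look like this. EntryXml and BuildRoot are hypothetical names; it assumes that line N of the first text box pairs with line N of the second, and it ignores any unpaired lines left over when the boxes differ in length.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class EntryXml
{
    // Builds <root><element><Entry1/><Entry2/></element>...</root>,
    // one <element> per pair of lines.
    public static XElement BuildRoot(string text1, string text2)
    {
        string[] separators = { "\r\n", "\n" };
        string[] left = text1.Split(separators, StringSplitOptions.None);
        string[] right = text2.Split(separators, StringSplitOptions.None);

        // Only lines that exist in both text boxes are paired.
        int pairs = Math.Min(left.Length, right.Length);

        return new XElement("root",
            Enumerable.Range(0, pairs).Select(i =>
                new XElement("element",
                    new XElement("Entry1", left[i]),
                    new XElement("Entry2", right[i]))));
    }
}
```

To append to an existing file instead of creating a new root, the same element elements could be added to the root of a document loaded with XDocument.Load and then written back with Save.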
The below code will give you a string of XML data from the textboxes:
private string createXmlTags(TextBox textBox1, TextBox textBox2)
{
string strXml = string.Empty;
string[] text1Val = textBox1.Text.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
string[] text2Val = textBox2.Text.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
int count = 1;
IList<XElement> testt = new List<XElement>();
for (int i = 0; i < text1Val.Count(); i++)
{
testt.Add(new XElement("Entry" + count, text1Val[i]));
// Add the matching line from the second text box, if there is one.
if (i < text2Val.Length && !String.IsNullOrEmpty(text2Val[i]))
{
count = count + 1;
testt.Add(new XElement("Entry" + count, text2Val[i]));
}
count = count + 1;
}
foreach (var xElement in testt)
{
strXml += xElement.ToString();
}
return strXml;
}
You can then insert the code to an existing xml document. Follow: How can I build XML in C#? and How to change XML Attribute
Read here: XDocument or XmlDocument
I will have the decency of not copying the code from there. All the basics you need to know about creating an XML document are well explained there.
There are two options, I would personally go with XDocument.
I know there's no code in this answer but since you haven't tried anything, not even apparently searching Google (believe me, you'd find it), I'd rather point you in the right direction than "giving you the fish".
