C# How to get the automatic number of a heading in Word Interop

I have code (as shown below) that loops through all the paragraphs and finds a specific string. When I find the string, I print out the automated number of the header it belongs to. The problem is that I want the header number as well and can't figure out how to get it. Code here:
Edit: Cleaned up code with working content
string LastHeader = "";
foreach (Paragraph paragraph in aDoc.Paragraphs)
{
    Microsoft.Office.Interop.Word.Style thisStyle =
        (paragraph.Range.get_Style() as Microsoft.Office.Interop.Word.Style);
    if (thisStyle != null && thisStyle.NameLocal.StartsWith("Heading 2"))
    {
        LastHeader = paragraph.Range.ListFormat.ListString + " " +
            paragraph.Range.Text.Replace("\"", "\"\"").Replace("\r", "");
    }
    else
    {
        string content = paragraph.Range.Text;
        for (int i = 0; i < textToSearch.Count; i++)
        {
            if (content.Contains(textToSearch[i]))
            {
                // Do stuff here
            }
        }
    }
}

Every time I re-read your information I get the feeling I'm not understanding completely what you're asking, so the following may contain more information than you're looking for...
To get the numbering of a specific Paragraph:
paragraph.Range.ListFormat.ListString
Word has some built-in bookmarks that can give you information that is otherwise a lot of work to determine. One of these is \HeadingLevel. This gets the first paragraph formatted with a Heading style that precedes a SELECTION. And that's the sticky part, because we don't really want to be making selections... But it dates back to the old WordBasic days when code mimicked user actions, so we're stuck with that.
textToAnalyse.Select();
string number = aDoc.Bookmarks[@"\HeadingLevel"].Range.ListFormat.ListString;
The bookmark call, itself, will NOT change the selection.
The other option I see, since you're already looping the Paragraphs collection, would be to check the paragraph.Range.Style and, if it begins with the string "Heading" (assuming that's not used otherwise in these documents' styles) save the ListFormat.ListString so that you can call on it if your condition is met.
I do have to wonder, however, why you're "walking" the paragraphs collection and not using Word's built-in Find capability as that would be much faster. It can search text AND (style) formatting.
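As a rough illustration of the Find-based approach, here is a minimal sketch that combines Word's Find with the \HeadingLevel bookmark described above. The class and method names (HeadingLookup, HeadingNumbersFor) are made up for this example, and it assumes a reference to Microsoft.Office.Interop.Word with embedded interop types, so that the optional COM parameters can be omitted. It has not been checked against documents where a hit occurs before the first heading; in that case the bookmark lookup may fail.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class HeadingLookup
{
    // Hypothetical helper: returns the heading number for every hit of `term`.
    public static List<string> HeadingNumbersFor(Word.Document doc, string term)
    {
        var numbers = new List<string>();
        Word.Range rng = doc.Content;

        rng.Find.ClearFormatting();
        rng.Find.Text = term;
        rng.Find.Forward = true;
        rng.Find.Wrap = Word.WdFindWrap.wdFindStop; // stop at the end of the document

        while (rng.Find.Execute())
        {
            // After a successful Execute, rng covers the hit.
            // The built-in bookmark works relative to the selection.
            rng.Select();
            numbers.Add(doc.Bookmarks[@"\HeadingLevel"].Range.ListFormat.ListString);

            // Collapse to the end of the hit so the next search continues from there.
            rng.Collapse(Word.WdCollapseDirection.wdCollapseEnd);
        }
        return numbers;
    }
}
```

Calling HeadingLookup.HeadingNumbersFor(aDoc, textToSearch[i]) for each search term would replace the paragraph loop, at the cost of changing the selection for every hit.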

Related

How to extract the external JS and CSS files of a web page using Selenium

I need to dig into my web page and pull out the external CSS and JS files in order to do some comparisons for my web testing.
I've used:
IJavaScriptExecutor ijse_JS = iwd as IJavaScriptExecutor;
string sHtml = ijse_JS.ExecuteScript("return document.documentElement.innerHTML;").ToString();
I'd like the external CSS and JS files in the form of a string for me to interrogate later.
Since this is for C#, I'll take anything that you can give. It doesn't have to be Selenium, just whatever works.
The central premise as to why I'm doing this is that I want to run this script on a site where I don't know what it has yet. I want to run a scan on it, make a change to something, then do another scan, note the differences, and spit them out into a spreadsheet which I can query. Right now, I have a Dictionary<string, HashSet<string>> with tag names as keys, and the values being a collection of attributes that are usable by that tag. I have another collection full of every possible CSS property as well. What I've been doing is running through the collection of tags, creating a collection of IWebElements for each, and then iterating through that whole collection:
string sCurrentTag = "";
// iterate through a HashSet<string> of all of the possible tags for HTML (premade)
for (int k = 0; k < HashSet_of_All_Tags.Count; k++)
{
    // I want to find all of the instances of that specific tag now
    sCurrentTag = HashSet_of_All_Tags.ElementAt(k);
    // create a collection of IWebElement that is of every instance of the tag that I'm looking at
    IReadOnlyCollection<IWebElement> iwe = iwebdriver_object.FindElements(By.TagName(sCurrentTag));
    // now run through it and get all of the styling and attribute values (if they are set)
    for (int j = 0; j < iwe.Count; j++)
    {
        IWebElement Element_Im_Looking_At = iwe.ElementAt(j);
        // reset the strings that hold all of the attribute values and CSS values (once per element)
        sAttr = sStyle = "";
        // This finds the tag that I'm looking at, and then it will look at the corresponding HashSet at that spot in the dictionary
        // that holds all of the relevant attributes for that element, i.e. the 'div' key will have 'align' and all global attributes
        // packed into the HashSet. If there is nothing set, then don't bother keeping it. If something is set, keep it in the format
        // of "Tag=Attribute_Value; ". I will then iterate through this string later with RegEx to get the information that I care about
        for (int i = 0; i < dsh_Tags[Element_Im_Looking_At.TagName].Count(); i++)
        {
            // dsh_Tags is a Dictionary<string, HashSet<string>>
            sCurr = dsh_Tags[Element_Im_Looking_At.TagName].ElementAt(i);
            s = Element_Im_Looking_At.GetAttribute(sCurr);
            if (s != null && s != "")
                sAttr += $"{sCurr}={s}; ";
        }
        // This loop iterates through all CSS properties and gets every value, and packs it into a string in the same format as the one above
        // but does it for CSS styles
        for (int i = 0; i < hs_AllCSSProp.Count(); i++)
        {
            sCurr = hs_AllCSSProp.ElementAt(i);
            s = Element_Im_Looking_At.GetCssValue(sCurr);
            if (s != null && s != "")
                sStyle += $"{sCurr}={s}; ";
        }
    }
}
As you may be able to gather, there is an immense amount of overhead. I've done a bunch of stuff with threads, which has helped with speed, but this still takes a very long time, as I have to go through the entire collection of HTML tags, their corresponding attributes, and the entire collection of CSS properties. I do this because, again, I don't know exactly what I'm looking at initially. My plan was to look through the CSS and JavaScript FIRST, use regex to find all of the attributes that could change and all of the CSS that is being altered, and then create much smaller collections for the corresponding elements. I would then iterate through only the things that I want (which will usually be only 2 or 3 attributes and 4 or 5 CSS properties for each element) instead of every possible one. When I did this on a page where I knew everything about it and what would change, and only included the stuff that I care about, it increased speed by ~90%.
So this brings me to the point that I'm at now. If I could dynamically figure out what I needed to look at and then create libraries of sorts on the side for the site that I'm looking at, I could run a script that would scan all of the CSS and all of the scripting ONCE, store those values, and then scan the whole page, make one change, scan the whole page again and note the differences, except this time I would only be looking at the stuff that I need to.
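One way to get at the external files is to pull their URLs out of the HTML string already obtained from ExecuteScript. The following is only a sketch under stated assumptions: the class name ExternalResources and its methods are invented for this example, and the regular expressions are a simplification that only handles quoted attribute values (a real HTML parser would be more robust, and an attribute such as data-src would also match).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ExternalResources
{
    // <script ... src="..."> with a quoted src value.
    static readonly Regex ScriptSrc = new Regex(
        @"<script\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);
    // Any <link ...> tag; rel and href are checked separately.
    static readonly Regex LinkTag = new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);
    static readonly Regex RelStylesheet = new Regex(
        @"\brel\s*=\s*[""']?[^""'>]*\bstylesheet\b", RegexOptions.IgnoreCase);
    static readonly Regex Href = new Regex(
        @"\bhref\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);

    // URLs of external scripts, exactly as written in the markup.
    public static List<string> ScriptUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in ScriptSrc.Matches(html))
            urls.Add(m.Groups[1].Value);
        return urls;
    }

    // URLs of external stylesheets, exactly as written in the markup.
    public static List<string> StylesheetUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match tag in LinkTag.Matches(html))
        {
            if (!RelStylesheet.IsMatch(tag.Value)) continue;
            Match href = Href.Match(tag.Value);
            if (href.Success) urls.Add(href.Groups[1].Value);
        }
        return urls;
    }

    // Turns a relative URL into an absolute one, based on the page URL.
    public static string Resolve(string pageUrl, string url)
    {
        return new Uri(new Uri(pageUrl), url).AbsoluteUri;
    }
}
```

Each resolved URL could then be downloaded as a string, for example with HttpClient.GetStringAsync, and scanned once before the element-by-element comparison.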

"ProofError" elements causing issues in OpenXML

I am trying to read through a word document using Open XML.
I am looking for key tags within the document in order to identify the values I need to pick up from the document.
I am looping through each paragraph, and then each run within the document, to be able to find these.
However, it appears that the spelling and grammar check is causing problems: it splits up the "runs" within the document by inserting "ProofError" elements around any errors it identifies, which makes it difficult to parse the document correctly.
I have tried to remove all ProofError elements and save the document; however, they appear to come back.
If I run the spelling and grammar check within MS Word manually there is no issue, though this isn't practical.
Does anyone know a way I can get around this?
Sample text from doc:
Communication System: UID 0, CW (0); Frequency: 900 MHz;Duty Cycle: 1:1
Medium: 900MHz HSL Medium parameters used: f = 900 MHz; σ = 0.979 S/m; εr = 40.68; ρ = 1000 kg/m3
Code used to explore the document
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(openFileDialog.FileName, false))
{
    // start looking through the file here
    // correct proof errors here
    Body body = wordDocument.MainDocumentPart.Document.Body;
    foreach (Paragraph p in body.OfType<Paragraph>())
    {
        p.GetType();
        List<ProofError> errList = new List<ProofError>();
        foreach (ProofError err in p.OfType<ProofError>())
        {
            errList.Add(err);
        }
        foreach (ProofError err in errList)
        {
            err.Remove();
        }
    }
    wordDocument.Save();
}
The code above removes the ProofError elements from each paragraph. I hoped that doing this and saving would merge any similar runs together; however, the proof errors come back when saving.
Screenshot below should show you the children of a paragraph.
Link to an example document which throws up errors - these are due to the language being incorrect, but I have no control over the format coming in to me, and there will be other errors thrown up unrelated to language.
Sample File
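One possible workaround, offered here only as a sketch and not taken from the original thread, is to avoid depending on run boundaries at all: the InnerText property of a Paragraph concatenates the text of all its runs, so the ProofError elements between them no longer matter. The class name ParagraphTextReader and the search key "Frequency:" (taken from the sample text above) are assumptions for illustration.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class ParagraphTextReader
{
    // Hypothetical helper: prints every paragraph that contains the key text.
    public static void PrintMatchingParagraphs(string path, string key)
    {
        // Opened read-only (isEditable = false); nothing is modified or saved.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            foreach (Paragraph p in body.Descendants<Paragraph>())
            {
                // InnerText joins the text of all runs in the paragraph,
                // regardless of how proofing marks have split them.
                string text = p.InnerText;
                if (text.Contains(key))
                {
                    Console.WriteLine(text);
                }
            }
        }
    }
}
```

For example, ParagraphTextReader.PrintMatchingParagraphs(openFileDialog.FileName, "Frequency:") would print the first sample line as a single string, which can then be split on ";" to pick out the individual values.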

Find a range of text with specific formatting with Word interop

I have a MS Word add-in that needs to extract text from a range of text based solely on its formatting: in my case in particular, if the text is underlined or struck through, the range of characters/words that are underlined or struck through need to be found so that I can keep track of them.
My first idea was to use Range.Find, as is outlined here, but that won't work when I have no idea what the string is that I'm looking for:
var rng = doc.Range(someStartRange, someEndRange);
rng.Find.Forward = true;
rng.Find.Format = true;
// I removed this line in favor of putting it inside Execute()
//rng.Find.Text = "";
rng.Find.Font.Underline = WdUnderline.wdUnderlineSingle;
// this works
rng.Find.Execute("");
int foundNumber = 0;
while (rng.Find.Found)
{
    foundNumber++;
    // this needed to be added as well, as per the link above
    rng.Find.Execute("");
}
MessageBox.Show("Underlined strings found: " + foundNumber.ToString());
I would happily parse the text myself, but am not sure how to do this while still knowing the formatting. Thanks in advance for any ideas.
EDIT:
I changed my code to fix the find underline issue, and with that change the while loop never terminates. More specifically, rng.Find.Found finds the underlined text, but it finds the same text over and over, and never terminates.
EDIT 2:
Once I added the additional Execute() call inside the while loop, the find functioned as needed.
You need
rng.Find.Font.Underline = WdUnderline.wdUnderlineSingle;
(At the moment you are setting the formatting for the specified rng, rather than the formatting for the Find)

How do I text wrap to next acrofield in iTextSharp?

How do I get text to wrap from one acrofield to the next? I have an Adobe PDF doc our client gave us. It has acrofields one atop another (all with the same name). They want the text to wrap from one to another when it reaches the end of the line. All the other examples I see out there do not deal with filling in acrofields that wrap. Please help!
// loop through disabilities and display them
foreach (var disability in formNature.Disabilities)
{
    fields.SetField("EVALUATION", disability.PrimaryDisabilityName + "; ");
}
In theory this should loop through all the disabilities they had entered on the web form and display them one after another, wrapping the text when it reaches the end of each line. But instead it only displays one item in the field.
This isn't a complete answer unfortunately.
First, when you call SetField() you are erasing the current contents of the field and replacing it with your new value. When done in a loop only the last value will ever be stored then. What you need to do is loop through each value and concatenate them into one big string.
string buf = "";
foreach (var disability in formNature.Disabilities)
{
    buf += disability.PrimaryDisabilityName + "; ";
}
buf = buf.Trim();
Second, the PDF standard to the best of my knowledge does not support chaining of fields for overflow which is what you are looking for. The only way that I know of to accomplish what you are trying is to actually measure the strings and compare them to the widths of the fields and truncate them as needed. To do this you will need to find the font used for the given field, create a BaseFont from it and use that to Measure the string. Then compare that with the field's rectangle and use only the characters that "fit" into that field. Repeat as needed.
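The measure-and-truncate idea can be sketched independently of the PDF library. In the sketch below, the class name FieldFitter and its Split method are invented for this example; the measure callback is an assumption standing in for whatever width function is used. With iTextSharp that could be something like s => baseFont.GetWidthPoint(s, fontSize), with the widths taken from each field's rectangle.

```csharp
using System;
using System.Collections.Generic;

static class FieldFitter
{
    // Splits `text` on spaces into one chunk per field, so that each chunk's
    // measured width does not exceed that field's width. Words that fit in
    // no field are returned in `overflow`.
    public static List<string> Split(string text, IList<float> fieldWidths,
                                     Func<string, float> measure, out string overflow)
    {
        var words = new Queue<string>(
            text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        var chunks = new List<string>();

        foreach (float width in fieldWidths)
        {
            string line = "";
            while (words.Count > 0)
            {
                // Try adding the next word; stop when the line would be too wide.
                string candidate = line.Length == 0 ? words.Peek() : line + " " + words.Peek();
                if (measure(candidate) > width) break;
                line = candidate;
                words.Dequeue();
            }
            chunks.Add(line);
        }

        overflow = string.Join(" ", words);
        return chunks;
    }
}
```

Note that a single word wider than a field is never split in this sketch; it simply ends up in overflow together with everything after it.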
That all said, I would really really recommend that you just edit the PDF and replace the multiple fields with one large field that supports multiple lines. Your life will be much, much easier.

C# Saving "X" times into one .txt file without overwriting last string

Well, now I have a new problem.
I'm writing code in C#.
I want to save the text from textBoxName into the group.txt file each time I enter a string into the textbox and click the Save button. It should be saved in this order (if it's possible to sort it A-Z, that would be great):
1. Petar Milutinovic
2. Ljiljana Milutinovic
3. Stefan Milutinovic
4. ... etc
I can't get it to work. I tried to use techniques from my first question, but no solution yet :(
This is an easy one I guess, but I'm still a beginner and I need this badly...
Try to tackle this from a top-down approach. Write out what should happen, because it's not obvious from your question.
Example:
1. User enters a value in a (single-line?) textbox
2. User clicks Save
3. One new line is appended to the end of a file, with the contents of the textbox in step 1
Note: each line is prefixed with a line number, in the form "X. Sample" where X is the line number and Sample is the text from the textbox.
Is the above accurate?
(If you just want to add a line to a text file, see http://msdn.microsoft.com/en-us/library/ms143356.aspx - File.AppendAllText(filename, myTextBox.Text + Environment.NewLine); may be what you want)
Here's a simple little routine you can use to read, sort, and write the file. There are loads of ways this can be done, mine probably isn't even the best. Even now I'm thinking "I could have written that using a FileStream and done the iteration for counting then", but they're micro-optimizations that can be done later if you have performance issues with multi-megabyte files.
public static void AddUserToGroup(string userName)
{
    // Read the users from the file
    List<string> users = File.ReadAllLines("group.txt").ToList();
    // Strip out the index number
    users = users.Select(u => u.Substring(u.IndexOf(". ") + 2)).ToList();
    users.Add(userName); // Add the new user
    users.Sort((x,y) => x.CompareTo(y)); // Sort
    // Reallocate the number
    for (int i = 0; i < users.Count; i++)
    {
        users[i] = (i + 1).ToString() + ". " + users[i];
    }
    // Write to the file again
    File.WriteAllLines("group.txt", users);
}
If you need the file to be sorted every time a new line is added, you'll either have to load the file into a list, add the line, and sort it, or use some sort of search (I'd recommend a binary search) to determine where the new line belongs and insert it accordingly. The second approach doesn't have many advantages, though, as you basically have to rewrite the entire file in order to insert a line; it only saves you writing time in the best-case scenario, which occurs when the line to be inserted falls at the end of the file. The second method is a bit lighter on the processor, as you aren't attempting to sort every line, but for small files the difference will be unnoticeable.
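The binary-search variant can be sketched as follows. The class and method names (SortedInsert, InsertSorted) are made up for this example, and it assumes the list holds the names without their line-number prefix and is already sorted with the same comparer that is passed to BinarySearch.

```csharp
using System;
using System.Collections.Generic;

static class SortedInsert
{
    // Inserts `name` into an already sorted list and returns the index used.
    public static int InsertSorted(List<string> sortedNames, string name)
    {
        int index = sortedNames.BinarySearch(name, StringComparer.Ordinal);
        // When the item is not found, BinarySearch returns the bitwise
        // complement of the index where it should be inserted.
        if (index < 0) index = ~index;
        sortedNames.Insert(index, name);
        return index;
    }
}
```

The numbering still has to be regenerated and the file rewritten afterwards, which is why this mostly saves sorting work rather than file I/O.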
