Find superscript and subscript in MS Word? - C#

I want to convert Word files to HTML so I can insert them into a database, but I can't find a way to detect subscript and superscript text.
How can I find these in a Microsoft Word file?

I found this solution to my question:
    // range, doc and i are assumed to come from the enclosing paragraph loop (not shown).
    // Word reports True as -1 and mixed formatting as wdUndefined, so test for "not 0".
    if (range.Font.Subscript != 0 || range.Font.Superscript != 0)
    {
        // Find every superscript run in the current paragraph.
        var supTempRange = doc.Paragraphs[i + 1].Range;
        supTempRange.Find.ClearFormatting();
        supTempRange.Find.Format = true;
        supTempRange.Find.Font.Superscript = 1;
        while (supTempRange.Find.Execute())
        {
            MessageBox.Show(supTempRange.Text);
        }
        // Find every subscript run in the current paragraph.
        var subTempRange = doc.Paragraphs[i + 1].Range;
        subTempRange.Find.ClearFormatting();
        subTempRange.Find.Format = true;
        subTempRange.Find.Font.Subscript = 1;
        while (subTempRange.Find.Execute())
        {
            MessageBox.Show(subTempRange.Text);
        }
    }

Related

C# Split and get value

I have a small problem getting items from webBrowser.Document.
The part of the document that I need is this:
primary-text,7.gm2-body-2">**ineedthis.se**</div> <div jstcache="194"
I need to extract "ineedthis.se", which will be different every time.
My code is this:
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
System.IO.StreamReader sr = new System.IO.StreamReader(webBrowser1.DocumentStream.ToString());
string rssourcecode = sr.ReadToEnd();
Regex r = new Regex("7.gm2-body-2 > </ div > < div ", RegexOptions.Multiline);
MatchCollection matches = r.Matches(rssourcecode);
// Dim r As New System.Text.RegularExpressions.Regex("here need splitersorsomething", RegexOptions.Multiline)
foreach (Match itemcode in matches)
{
listBox1.Items.Add(itemcode.Value.Split(here need splites).GetValue(2));
}
}
So, can you please help me with the right splitters? Thanks a lot.
Your question was not very clear, but what I understand is that you want to get part of a string (a substring) from a larger string.
Assuming you have the value saved in a variable (and assuming the ** in the question is only emphasis, not part of the text):
string stringValue = "primary-text,7.gm2-body-2\">ineedthis.se</div> <div jstcache=\"194\"";
and what you want to extract is: ineedthis.se
then you can try stringValue.Substring(27, 12); the first argument is the start index and the second is the length, not the end index.
For reference: https://www.c-sharpcorner.com/UploadFile/mahesh/substring-in-C-Sharp/
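Because the wanted text changes every time, fixed indexes only work for one particular input. A minimal sketch of a more flexible approach, assuming the text always sits between the `gm2-body-2">` attribute ending and the next `</div>`, is to locate both markers with IndexOf. The class and method names here (HtmlSnippet, TryBetween) are hypothetical, not part of any library:

```csharp
using System;

public static class HtmlSnippet
{
    // Tries to read the text between the first startMarker and the next endMarker.
    // Returns false (and an empty value) when either marker is missing.
    public static bool TryBetween(string source, string startMarker, string endMarker, out string value)
    {
        value = string.Empty;
        int start = source.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return false;
        start += startMarker.Length;                       // skip past the marker itself
        int end = source.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return false;
        value = source.Substring(start, end - start);      // second argument is a length
        return true;
    }
}
```

With the string from the question, HtmlSnippet.TryBetween(stringValue, "gm2-body-2\">", "</div>", out var host) would place the text between the two markers in host.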
Let's say your string is the following:
    var str = "<div primary-text,7.gm2-body-2\">**some random text**</div> <div jstcache=\"194\"";
The code below is a very rudimentary solution, but it should help you get to your answer:
    var foundStart = false;   // seen the opening "<div"
    var foundEnd = false;     // seen the '>' that closes the opening tag
    var startIndex = -1;
    var length = 0;
    for (var index = 0; index < str.Length; index++)
    {
        if (!foundStart)
        {
            // Guard the Substring call so it cannot run past the end of the string.
            if (index + 4 <= str.Length && str.Substring(index, 4).Equals("<div"))
            {
                foundStart = true;
                continue;
            }
        }
        if (foundStart && !foundEnd)
        {
            if (str[index].Equals('>'))
            {
                foundEnd = true;
                continue;
            }
        }
        if (foundStart && foundEnd)
        {
            if (startIndex == -1)
                startIndex = index;
            // Count every content character, including the first one.
            length++;
            // Stop at the end of the string or just before the next tag.
            if (index + 1 >= str.Length || str[index + 1].Equals('<'))
                break;
        }
    }
    // This is your answer (empty when no <div ...> content was found)
    Console.WriteLine(startIndex >= 0 ? str.Substring(startIndex, length) : string.Empty);

How can I use indexof and substring to find words in a string?

In the constructor :
var tempFR = File.ReadAllText(file);
GetResults(tempFR);
Then :
private List<string> GetResults(string file)
{
List<string> results = new List<string>();
string word = textBox1.Text;
string[] words = word.Split(new string[] { ",," }, StringSplitOptions.None);
for(int i = 0; i < words.Length; i++)
{
int start = file.IndexOf(words[i], 0);
results.Add(file.Substring(start));
}
return results;
}
words contains, in this case, three words: System, public, test.
I want to find all the words in the file and add them to the list results using IndexOf and Substring.
The way it is now, the value of start is -1 all the time.
To clarify some things:
This is a screenshot of textBox1:
That is why I'm using two commas to split and get the words.
This screenshot shows the words after splitting them from textBox1:
And this is the content of the file string:
I want to add all the words found in the file to the results list.
Looking at the last screenshot, there should be 11 results:
three times the word using, three times the word system, and five times the word public.
But the variable start is -1.
Update:
I tried Barns' solutions, but they are not working well for me.
First, the code that performs the search, then loops over the files and reports to the BackgroundWorker:
int numberofdirs = 0;
void DirSearch(string rootDirectory, string filesExtension, string[] textToSearch, BackgroundWorker worker, DoWorkEventArgs e)
{
List<string> filePathList = new List<string>();
int numberoffiles = 0;
try
{
filePathList = SearchAccessibleFilesNoDistinct(rootDirectory, null, worker, e).ToList();
}
catch (Exception err)
{
}
label21.Invoke((MethodInvoker)delegate
{
label21.Text = "Phase 2: Searching in files";
});
MyProgress myp = new MyProgress();
myp.Report4 = filePathList.Count.ToString();
foreach (string file in filePathList)
{
try
{
var tempFR = File.ReadAllText(file);
_busy.WaitOne();
if (worker.CancellationPending == true)
{
e.Cancel = true;
return;
}
bool reportedFile = false;
for (int i = 0; i < textToSearch.Length; i++)
{
if (tempFR.IndexOf(textToSearch[i], StringComparison.InvariantCultureIgnoreCase) >= 0)
{
if (!reportedFile)
{
numberoffiles++;
myp.Report1 = file;
myp.Report2 = numberoffiles.ToString();
myp.Report3 = textToSearch[i];
myp.Report5 = FindWordsWithtRegex(tempFR, textToSearch);
backgroundWorker1.ReportProgress(0, myp);
reportedFile = true;
}
}
}
numberofdirs++;
label1.Invoke((MethodInvoker)delegate
{
label1.Text = string.Format("{0}/{1}", numberofdirs, myp.Report4);
label1.Visible = true;
});
}
catch (Exception err)
{
}
}
}
I already have the words array in textToSearch and the file content in tempFR, and I'm using Barns' first solution:
private List<string> FindWordsWithtRegex(string filecontent, string[] words)
{
var res = new List<string>();
foreach (var word in words)
{
Regex reg = new Regex(word);
var c = reg.Matches(filecontent);
int k = 0;
foreach (var g in c)
{
Console.WriteLine(g.ToString());
res.Add(g + ":" + k++);
}
}
Console.WriteLine("Results of FindWordsWithtRegex");
res.ForEach(f => Console.WriteLine(f));
Console.WriteLine();
return res;
}
But the results I get in the list res are not the same as the output of Barns' solutions. These are the results in res for the first file:
In this case there are two words, system and using, but it found only using, 3 times, although system also appears 3 times in the file content. The output format is also not the same as in Barns' solutions:
Here is an alternative using Regex instead of IndexOf. Note that I created my own string to parse, so my results will be a bit different.
EDIT
    private List<string> FindWordsWithCountRegex(string filecontent, string[] words)
    {
        var res = new List<string>();
        foreach (var word in words)
        {
            // Escape the word so characters such as '.' or '+' are matched literally.
            Regex reg = new Regex(Regex.Escape(word), RegexOptions.IgnoreCase);
            // Count is a property of MatchCollection, so no LINQ call is needed.
            var c = reg.Matches(filecontent).Count;
            res.Add(word + ":" + c);
        }
        return res;
    }
Simply change this part and use a single char, typically a space rather than a comma:
string[] words = word.Split(' ');
int start = file.IndexOf(words[i],0);
start will be -1 if the word is not found.
MSDN: IndexOf(String, Int32)
for(int i = 0; i < words.Length; i++)
{
int start = file.IndexOf(words[i], 0);
// only add to results if word is found (index >= 0)
if (start >= 0) results.Add(file.Substring(start));
}
If you want all appearances of the words, you need an extra loop:
    for (int i = 0; i < words.Length; i++)
    {
        // an empty word would match at every position and never advance
        if (words[i].Length == 0) continue;
        int startIdx = 0;
        while (startIdx < file.Length)
        {
            int idx = file.IndexOf(words[i], startIdx);
            // stop searching for this word once there are no more matches
            if (idx < 0) break;
            // add to results
            results.Add(file.Substring(idx));
            // and let the search continue from the end of the word just found
            startIdx = idx + words[i].Length;
        }
    }
Maybe you want a case-insensitive search:
file.IndexOf(words[i], 0, StringComparison.CurrentCultureIgnoreCase);
MSDN: StringComparer Class
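Putting the pieces of this answer together, here is a small self-contained sketch that collects the position of every occurrence of one word, ignoring case by default. The names WordFinder and FindAll are made up for illustration; the only framework call it relies on is the String.IndexOf(string, int, StringComparison) overload:

```csharp
using System;
using System.Collections.Generic;

public static class WordFinder
{
    // Returns the start index of every non-overlapping occurrence of word in text.
    public static List<int> FindAll(string text, string word,
        StringComparison comparison = StringComparison.OrdinalIgnoreCase)
    {
        var positions = new List<int>();
        // An empty word would "match" at every index, so treat it as no match.
        if (string.IsNullOrEmpty(word)) return positions;

        int index = text.IndexOf(word, 0, comparison);
        while (index >= 0)
        {
            positions.Add(index);
            // Continue right after the match just found.
            index = text.IndexOf(word, index + word.Length, comparison);
        }
        return positions;
    }
}
```

Calling FindAll once per entry of words and adding up the list sizes gives the total number of hits; returning positions rather than file.Substring(start) avoids copying the rest of the file for every match.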

Docx - Removing section of document

Is there a way to remove sections of a document where I can specify the beginning and ending tags?
I need a way to remove a section of the document by passing in both my start and end markers (##DELETEBEGIN and ##DELETEEND).
For example, I have this in my document:
Hello, welcome to this document
##DELETEBEGIN{Some values to check in the code}
Some text that will be removed if the value is true
##DELETEEND
Final Line
If you need to delete text from ##DELETEBEGIN to ##DELETEEND, where ##DELETEBEGIN is not at the beginning of a Paragraph and ##DELETEEND is not at the end of a Paragraph, this code should work.
DocX document = DocX.Load("C:\\Users\\phil\\Desktop\\text.docx");
bool flag = false;
List<List<string>> list1 = new List<List<string>>();
List<string> list2 = new List<string>();
foreach (Novacode.Paragraph item in document.Paragraphs)
{
//use this if you need whole text of a paragraph
string paraText = item.Text;
var result = paraText.Split(' ');
int count = 0;
list2 = new List<string>();
//use this if you need word by word
foreach (var data in result)
{
string word = data.ToString();
if (word.Contains("##DELETEBEGIN")) flag = true;
if (word.Contains("##DELETEEND"))
{
flag = false;
list2.Add(word);
}
if (flag) list2.Add(word);
count++;
}
list1.Add(list2);
}
for (int i = 0; i < list1.Count(); i++)
{
string temp = "";
for (int y = 0; y < list1[i].Count(); y++)
{
if (y == 0)
{
temp = list1[i][y];
continue;
}
temp += " " + list1[i][y];
}
if (!temp.Equals("")) document.ReplaceText(temp, "");
}
document.Save();
I have to give some credit to this post for looping through each word.
I think I have found a solution to this; at least it works for me. Please let me know if there is anything I can do better.
deleteCommand would be the ##DELETEBEGIN string and deleteEndCommand would be the ##DELETEEND string:
    private void RemoveSection(DocX doc, string deleteCommand, string deleteEndCommand)
    {
        try
        {
            // -1 means "not found", so a marker in the very first paragraph (index 0) still counts
            int deleteStart = -1;
            int deleteEnd = -1;
            // Get the indexes of the paragraphs containing the start and end markers
            for (int i = 0; i < doc.Paragraphs.Count; i++)
            {
                if (doc.Paragraphs[i].Text.Contains(deleteCommand))
                    deleteStart = i;
                if (doc.Paragraphs[i].Text.Contains(deleteEndCommand))
                    deleteEnd = i;
            }
            if (deleteStart >= 0 && deleteEnd >= deleteStart)
            {
                // always delete at paraIndex, because the remaining paragraphs shift up when one is removed
                int paraIndex = deleteStart;
                for (int i = deleteStart; i <= deleteEnd; i++)
                {
                    doc.RemoveParagraphAt(paraIndex);
                }
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.ToString());
        }
    }
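The method above remembers only the last pair of markers it sees. As a rough sketch of how several marked sections could be handled, the same idea can be tried out on a plain list of paragraph texts, with no DocX dependency. SectionFilter and RemoveSections are hypothetical names, and mapping the result back onto a real document is left out:

```csharp
using System;
using System.Collections.Generic;

public static class SectionFilter
{
    // Returns the paragraphs that lie outside every begin/end marker pair.
    // Paragraphs that contain a marker are dropped together with the section.
    public static List<string> RemoveSections(IEnumerable<string> paragraphs,
        string beginMarker, string endMarker)
    {
        var kept = new List<string>();
        bool inside = false;
        foreach (string p in paragraphs)
        {
            if (!inside && p.Contains(beginMarker)) inside = true;   // section starts here
            if (!inside) kept.Add(p);                                // keep text outside sections
            if (inside && p.Contains(endMarker)) inside = false;     // section ends here
        }
        return kept;
    }
}
```

For the example document in the question, this keeps only "Hello, welcome to this document" and "Final Line".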

Highlighting User Defined Keywords in RichTextBox

I am searching XML files to see whether their contents match the words entered in the textboxes txtComKeyword1, txtComKeyword2, txtComKeyword3 and/or txtComKeyword4. The function below works, but how can I highlight the keywords that the user entered in the four textboxes wherever they appear in my richComResults RichTextBox?
For example, my user fills in the four textboxes, i.e. txtComKeyword1, txtComKeyword2, txtComKeyword3 and txtComKeyword4. My code then parses the XML file to see if the nodes contain these four keywords. If they do, the nodes' data is output to richComResults, and I want to highlight those four keywords (e.g. txtComKeyword1=hello, txtComKeyword2=bye, txtComKeyword3=morning, txtComKeyword4=night). If these four words are found and appear in richComResults, they should be highlighted with color.
I have no clue after searching for a while, and my case is quite different from other questions. I am a newbie in programming, so your help would be much appreciated. Thank you!
My Code:
private void searchComByKeywords()
{
// Process the list of files found in the directory.
string[] fileEntries = Directory.GetFiles(sourceDir);
foreach (string fileName in fileEntries)
{
XmlDocument xmlDoc = new XmlDocument(); //* create an xml document object.
string docPath = fileName;
xmlDoc.Load(docPath); //* load the XML document from the specified file.
XmlNodeList nodeList = xmlDoc.GetElementsByTagName("item");
foreach (XmlNode node in nodeList)
{
XmlElement itemElement = (XmlElement) node;
string itemDescription = itemElement.GetElementsByTagName("description")[0].InnerText;
if (txtComKeyword1.Text != (String.Empty) && itemDescription.ToLower().Contains(txtComKeyword1.Text.ToLower()) ||
txtComKeyword2.Text != (String.Empty) && itemDescription.ToLower().Contains(txtComKeyword2.Text.ToLower()) ||
txtComKeyword3.Text != (String.Empty) && itemDescription.ToLower().Contains(txtComKeyword3.Text.ToLower()) ||
txtComKeyword4.Text != (String.Empty) && itemDescription.ToLower().Contains(txtComKeyword4.Text.ToLower()))
{
string itemTitle = itemElement.GetElementsByTagName("title")[0].InnerText;
string itemDate = itemElement.GetElementsByTagName("pubDate")[0].InnerText;
string itemAuthor = itemElement.GetElementsByTagName("author")[0].InnerText;
richComResults.AppendText("Author: " + itemAuthor + "\nDate: " + itemDate + "\nTitle: " + itemTitle + "\nDescription: " + itemDescription + "\n\n--------\n\n");
}
}
}
}
Try this:
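int pointer = 0;
int index = 0;
string keyword = txtComKeyword1.Text; // the word typed by the user, not the name of the control
while (keyword.Length > 0) // an empty keyword would never advance the search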
{
index = richComResults.Text.IndexOf(keyword, pointer);
//if keyword not found
if (index == -1)
{
break;
}
richComResults.Select(index, keyword.Length);
richComResults.SelectionFont = new System.Drawing.Font(richComResults.Font, FontStyle.Bold);
pointer = index + keyword.Length;
}
This searches for the keyword and highlights it. Then it continues the search after the found keyword. The pointer is used to keep track of the search position in your text. The index marks the position of the found keyword.
Jan's answer contains great content, but I shuddered mildly at the while(true) and break aspect! Here's my tweaked (case-insensitive) version...
int nextHigh = RTF.Text.IndexOf(txSearch, 0, StringComparison.OrdinalIgnoreCase);
while (nextHigh >= 0)
{
RTF.Select(nextHigh, txSearch.Length);
RTF.SelectionColor = Color.Red; // Or whatever
RTF.SelectionFont = new Font("Arial", 12, FontStyle.Bold); // you like
nextHigh = RTF.Text.IndexOf(txSearch, nextHigh + txSearch.Length, StringComparison.OrdinalIgnoreCase);
}
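Since the question has four keyword boxes, one possible variation is to find the positions of all keywords in a single pass and then apply the formatting to each position. The sketch below assumes a regular-expression alternation is acceptable; KeywordSpans and Find are illustrative names, and the RichTextBox part is kept out so the code runs without WinForms:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordSpans
{
    // Returns (start, length) for every case-insensitive match of any non-empty keyword.
    public static List<Tuple<int, int>> Find(string text, IEnumerable<string> keywords)
    {
        var spans = new List<Tuple<int, int>>();
        // Escape each keyword so it is matched literally, and skip empty textboxes.
        var escaped = keywords
            .Where(k => !string.IsNullOrEmpty(k))
            .Select(k => Regex.Escape(k))
            .ToList();
        if (escaped.Count == 0) return spans;

        string pattern = string.Join("|", escaped);
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
            spans.Add(Tuple.Create(m.Index, m.Length));
        return spans;
    }
}
```

Each returned pair can then be passed to richComResults.Select(start, length) before setting SelectionColor or SelectionBackColor, in the same way as the loops above.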
Try this code:
void ParseLine(string line)
{
Regex r = new Regex("([ \\t{}():;])");
String[] tokens = r.Split(line);
foreach (string token in tokens)
{
// Set the tokens default color and font.
richTextBox1.SelectionColor = Color.Black;
richTextBox1.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
// Check whether the token is a keyword.
String[] keywords = { "Author", "Date", "Title", "Description", };
for (int i = 0; i < keywords.Length; i++)
{
if (keywords[i] == token)
{
// Apply alternative color and font to highlight keyword.
richTextBox1.SelectionColor = Color.Blue;
richTextBox1.SelectionFont = new Font("Courier New", 10, FontStyle.Bold);
break;
}
}
richTextBox1.SelectedText = token;
}
richTextBox1.SelectedText = "\n";
}
After filling your string with the output of your method, call my method:
string strRich =
"Author : Habib\nDate : 2012-08-10 \nTitle : mytitle \nDescription : desc\n";
Regex r = new Regex("\\n");
String[] lines = r.Split(strRich);
foreach (string l in lines)
{
ParseLine(l);
}
enjoy.

making a description text !

My database has a News table which consists of Id, Title and txt.
I need to get a description from the whole text stored in the txt field, but without any markup like <...>, just plain text. How can I do this?
You can use the HTML Agility Pack:
http://htmlagilitypack.codeplex.com/
to extract all the text nodes in the HTML.
This question explains how you would do that:
C#: HtmlAgilityPack extract inner text
public static string Strip(string source)
{
char[] array = new char[source.Length];
int arrayIndex = 0;
bool inside = false;
for (int i = 0; i < source.Length; i++)
{
char let = source[i];
if (let == '<')
{
inside = true;
continue;
}
if (let == '>')
{
inside = false;
continue;
}
if (!inside)
{
array[arrayIndex] = let;
arrayIndex++;
}
}
string text = new string(array, 0, arrayIndex);
return System.Text.RegularExpressions.Regex.Replace(text, @"\s+", " ").Trim();
}
Can you do something like this?
(Get the record first, then add a property that returns part of the text.)
return Text.Length >= 100 ? Text.Substring(0, 100) : Text;
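Cutting at exactly 100 characters can split a word in half. A small sketch of one way around that, assuming the markup has already been stripped, is to cut back to the last space inside the limit and mark the cut with "...". Description and Shorten are hypothetical names:

```csharp
using System;

public static class Description
{
    // Returns text unchanged when it fits in maxLength characters. Otherwise keeps
    // at most maxLength characters, cuts back to the last space when there is one,
    // and appends "..." to show that something was removed.
    public static string Shorten(string text, int maxLength)
    {
        if (text == null) return string.Empty;
        if (text.Length <= maxLength) return text;

        string cut = text.Substring(0, maxLength);
        int lastSpace = cut.LastIndexOf(' ');
        if (lastSpace > 0) cut = cut.Substring(0, lastSpace);   // avoid ending mid-word
        return cut.TrimEnd() + "...";
    }
}
```

For a news description this could be called as Description.Shorten(plainText, 100), where plainText is the stripped content of the txt field.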
