making a description text ! - c#

I have in my database the News Table which consist of => Id, Title, txt .
I need to be able to get a description text from the whole text which exist in txt Field , but without any codes like <...> , just a pure text !! how can I do this !?

By using the HTML Agility Pack:
http://htmlagilitypack.codeplex.com/
To extract all the text nodes in the HTML.
This question explains how you would do that:
C#: HtmlAgilityPack extract inner text

public static string Strip(string source)
{
char[] array = new char[source.Length];
int arrayIndex = 0;
bool inside = false;
for (int i = 0; i < source.Length; i++)
{
char let = source[i];
if (let == '<')
{
inside = true;
continue;
}
if (let == '>')
{
inside = false;
continue;
}
if (!inside)
{
array[arrayIndex] = let;
arrayIndex++;
}
}
string text = new string(array, 0, arrayIndex);
return System.Text.RegularExpressions.Regex.Replace(text, #"\s+", " ").Trim();
}

Can you so something like:
(Get the record first then add a property to return some of the text)
return Text.Length >= 100 ? Text.SubString(0,100) : Text;

Related

C# Split and get value

I have a little problem with getting items from webbrowser.document.
the part of code in document tha i need is this>
primary-text,7.gm2-body-2">**ineedthis.se**</div> <div jstcache="194"
I need to parse the "ineedthis.se" that will be different every time.
my code is this
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
System.IO.StreamReader sr = new System.IO.StreamReader(webBrowser1.DocumentStream.ToString());
string rssourcecode = sr.ReadToEnd();
Regex r = new Regex("7.gm2-body-2 > </ div > < div ", RegexOptions.Multiline);
MatchCollection matches = r.Matches(rssourcecode);
// Dim r As New System.Text.RegularExpressions.Regex("here need splitersorsomething", RegexOptions.Multiline)
foreach (Match itemcode in matches)
{
listBox1.Items.Add(itemcode.Value.Split(here need splites).GetValue(2));
}
}
so. can you please help me with right splitters ? thanks a lot
Your question was not very clear,still what I could understand is you want to get a part of a string(Substring) from a string.
Assuming you have a string value saved in variable:
string stringValue = primary-text,7.gm2-body-2">**ineedthis.se**</div> <div jstcache="194";
And you want to extract is: ineedthis.se
So you can try out stringValue.Substring(31,44);.
You can take reference from : https://www.c-sharpcorner.com/UploadFile/mahesh/substring-in-C-Sharp/
Let's say your string is following
var str = "<div primary-text,7.gm2-body-2\">**some random text**</div> <div jstcache=\"194\"";
The below is a very rudimentary solution but will help you in your answer
var foundStart = false;
var foundEnd = false;
var startIndex = -1;
var length = 0;
for(var index = 0; index < str.Length; index++)
{
if (!foundStart)
{
if (str[index].Equals('<') && str.Substring(index, 4).Equals("<div"))
{
foundStart = true;
continue;
}
}
if (foundStart && !foundEnd)
{
if (str[index].Equals('>'))
{
foundEnd = true;
continue;
}
}
if (foundStart && foundEnd)
{
if (startIndex == -1)
startIndex = index;
else
length++;
if (str[index + 1].Equals('<'))
{
foundStart = false;
foundEnd = false;
break;
}
}
}
//// This is your answer
Console.WriteLine(str.Substring(startIndex, length));

How can I use indexof and substring to find words in a string?

In the constructor :
var tempFR = File.ReadAllText(file);
GetResults(tempFR);
Then :
private List<string> GetResults(string file)
{
List<string> results = new List<string>();
string word = textBox1.Text;
string[] words = word.Split(new string[] { ",," }, StringSplitOptions.None);
for(int i = 0; i < words.Length; i++)
{
int start = file.IndexOf(words[i], 0);
results.Add(file.Substring(start));
}
return results;
}
words contains in this case 3 words System , public , test
I want to find all the words in file and add them to the list results using indexof and substring.
The way it is now start value is -1 all the time.
To clear some things.
This is a screenshot of the textBox1 :
That is why I'm using two commas to split and get the words.
This screenshot showing the words after split them from the textBox1 :
And this is the file string content :
I want to add to the List results all the words in the file.
When looking at the last screenshot there should be 11 results.
Three time the word using three times the word system five times the word public.
but the variable start is -1
Update :
Tried Barns solution/s but for me it's not working good.
First the code that make a search and then loop over the files and reporting to backgroundworker :
int numberofdirs = 0;
void DirSearch(string rootDirectory, string filesExtension, string[] textToSearch, BackgroundWorker worker, DoWorkEventArgs e)
{
List<string> filePathList = new List<string>();
int numberoffiles = 0;
try
{
filePathList = SearchAccessibleFilesNoDistinct(rootDirectory, null, worker, e).ToList();
}
catch (Exception err)
{
}
label21.Invoke((MethodInvoker)delegate
{
label21.Text = "Phase 2: Searching in files";
});
MyProgress myp = new MyProgress();
myp.Report4 = filePathList.Count.ToString();
foreach (string file in filePathList)
{
try
{
var tempFR = File.ReadAllText(file);
_busy.WaitOne();
if (worker.CancellationPending == true)
{
e.Cancel = true;
return;
}
bool reportedFile = false;
for (int i = 0; i < textToSearch.Length; i++)
{
if (tempFR.IndexOf(textToSearch[i], StringComparison.InvariantCultureIgnoreCase) >= 0)
{
if (!reportedFile)
{
numberoffiles++;
myp.Report1 = file;
myp.Report2 = numberoffiles.ToString();
myp.Report3 = textToSearch[i];
myp.Report5 = FindWordsWithtRegex(tempFR, textToSearch);
backgroundWorker1.ReportProgress(0, myp);
reportedFile = true;
}
}
}
numberofdirs++;
label1.Invoke((MethodInvoker)delegate
{
label1.Text = string.Format("{0}/{1}", numberofdirs, myp.Report4);
label1.Visible = true;
});
}
catch (Exception err)
{
}
}
}
I have the words array already in textToSearch and the file content in tempFR then I'm using the first solution of Barns :
private List<string> FindWordsWithtRegex(string filecontent, string[] words)
{
var res = new List<string>();
foreach (var word in words)
{
Regex reg = new Regex(word);
var c = reg.Matches(filecontent);
int k = 0;
foreach (var g in c)
{
Console.WriteLine(g.ToString());
res.Add(g + ":" + k++);
}
}
Console.WriteLine("Results of FindWordsWithtRegex");
res.ForEach(f => Console.WriteLine(f));
Console.WriteLine();
return res;
}
But the results I'm getting in the List res is not the same output in Barns solution/s this is the results I'm getting the List res for the first file :
In this case two words system and using but it found only the using 3 times but there is also system 3 times in the file content. and the output format is not the same as in the Barns solutions :
Here is an alternative using Regex instead of using IndexOf. Note I have created my own string to parse, so my results will be a bit different.
EDIT
private List<string> FindWordsWithCountRegex(string filecontent, string[] words)
{
var res = new List<string>();
foreach (var word in words)
{
Regex reg = new Regex(word, RegexOptions.IgnoreCase);
var c = reg.Matches(filecontent).Count();
res.Add(word + ":" + c);
}
return res;
}
Simple change this part and use a single char typically a space not a comma:
string[] words = word.Split(' ');
int start = file.IndexOf(words[i],0);
start will be -1 if the word is not found.
MSDN: IndexOf(String, Int32)
for(int i = 0; i < words.Length; i++)
{
int start = file.IndexOf(words[i], 0);
// only add to results if word is found (index >= 0)
if (start >= 0) results.Add(file.Substring(start));
}
If you want all appearance of the words you need an extra loop
int fileLength = file.Length;
for(int i = 0; i < words.Length; i++)
{
int startIdx = 0;
while (startIdx < fileLength ){
int idx = file.IndexOf(words[i], startIdx]);
if (start >= 0) {
// add to results
results.Add(file.Substring(start));
// and let Word-search continue from last found Word Position Ending
startIdx = (start + words.Length);
}
}
int start = file.IndexOf(words[i], 0);
// only add to results if word is found (index >= 0)
if (start >= 0) results.Add(file.Substring(start));
}
MayBe you want a caseinsensitiv search
file.IndexOf(words[i], 0, StringComparison.CurrentCultureIgnoreCase); MSDN: StringComparer Class

How to extract the content of a text file within a scope delimined by a string marker

I have a Console File,
I need to match a string ("Seq Started"), And if I get the string I want to copy all text till I get another string("Seq Ended") in a txt file.
You can use this.
We load the file and parse lines to search the starting and ending deliminers sequences assuming only one of each of them are allowed.
Then if the section is correclty found, we extract the lines of the soure using Linq and we save the result to the desired file.
using System.Linq;
using System.Collections.Generic;
static void Test()
{
string delimiterStart = "Seq Started";
string delimiterEnd = "Seq Ended";
string filenameSource = "c:\\sample source.txt";
string filenameDest = "c:\\sample dest.txt";
var result = new List<string>();
var lines = File.ReadAllLines(filenameSource);
int indexStart = -1;
int indexEnd = -1;
for ( int index = 0; index < lines.Length; index++ )
{
if ( lines[index].Contains(delimiterStart) )
if ( indexStart == -1 )
indexStart = index + 1;
else
{
Console.WriteLine($"Only one \"{delimiterStart}\" is allowed in file {filenameSource}.");
indexStart = -1;
break;
}
if ( lines[index].Contains(delimiterEnd) )
{
indexEnd = index;
break;
}
}
if ( indexStart != -1 && indexEnd != -1 )
{
result.AddRange(lines.Skip(indexStart).Take(indexEnd - indexStart));
File.WriteAllLines(filenameDest, result);
Console.WriteLine($"Content of file \"{filenameSource}\" extracted to file {filenameDest}.");
}
else
{
if ( indexStart == -1 )
Console.WriteLine($"\"{delimiterStart}\" not found in file {filenameSource}.");
if ( indexEnd == -1 )
Console.WriteLine($"\"{delimiterEnd}\" not found in file {filenameSource}.");
}
}
// Read each line of the file into a string array. Each element
// of the array is one line of the file.
string[] lines = System.IO.File.ReadAllLines(#"C:\Users\Public\TestFolder\WriteLines2.txt");
// Display the file contents by using a foreach loop.
System.Console.WriteLine("Contents of WriteLines2.txt = ");
foreach (string line in lines)
{
if(line == "Seq Started" && line != "Seq Ended")
{
//here you get each line in start to end one by one.
//apply your logic here.
}
}
Check the second example in below link-
[https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/file-system/how-to-read-from-a-text-file][1]

Docx - Removing section of document

Is there a way to remove sections of a document where i can specify the beginning and ending tags?
i need a way that i can remove a section of the document by passing in both my start and end catches, (##DELETEBEGIN and ##DELETEEND)
for example i have this in my document:
Hello, welcome to this document
##DELETEBEGIN{Some values to check in the code}
Some text that will be removed if the value is true
##DELETEEND
Final Line
If you need to delete text from ##DELETEBEGIN to ##DELETEEND, where ##DELETEBEGIN is not at the beginning of a Paragraph and ##DELETEEND is not at the end of a Paragraph, this code should work.
DocX document = DocX.Load("C:\\Users\\phil\\Desktop\\text.docx");
bool flag = false;
List<List<string>> list1 = new List<List<string>>();
List<string> list2 = new List<string>();
foreach (Novacode.Paragraph item in document.Paragraphs)
{
//use this if you need whole text of a paragraph
string paraText = item.Text;
var result = paraText.Split(' ');
int count = 0;
list2 = new List<string>();
//use this if you need word by word
foreach (var data in result)
{
string word = data.ToString();
if (word.Contains("##DELETEBEGIN")) flag = true;
if (word.Contains("##DELETEEND"))
{
flag = false;
list2.Add(word);
}
if (flag) list2.Add(word);
count++;
}
list1.Add(list2);
}
for (int i = 0; i < list1.Count(); i++)
{
string temp = "";
for (int y = 0; y < list1[i].Count(); y++)
{
if (y == 0)
{
temp = list1[i][y];
continue;
}
temp += " " + list1[i][y];
}
if (!temp.Equals("")) document.ReplaceText(temp, "");
}
document.Save();
I have to give some credit to this post for looping through each word.
I think i have found a solution to this, at least it works for me, please let me know if there is anything i can do better:
the deleteCommand would be the ##DELETEBEGIN string and the deleteEndCommand would be the ##DELETEEND
private void RemoveSection(DocX doc, string deleteCommand, string deleteEndCommand)
{
try
{
int deleteStart = 0;
int deleteEnd = 0;
//Get the array of the paragraphs containing the start and end catches
for (int i = 0; i < doc.Paragraphs.Count; i++)
{
if (doc.Paragraphs[i].Text.Contains(deleteCommand))
deleteStart = i;
if (doc.Paragraphs[i].Text.Contains(deleteEndCommand))
deleteEnd = i;
}
if (deleteStart > 0 && deleteEnd > 0)
{
//delete from the paraIndex as the arrays will shift when a paragraph is deleted
int paraIndex = deleteStart;
for (int i = deleteStart; i <= deleteEnd; i++)
{
doc.RemoveParagraphAt(paraIndex);
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
}

Splitting CSV files with commas in the values [duplicate]

How to split the CSV file in c sharp? And how to display this?
I've been using the TextFieldParser Class in the Microsoft.VisualBasic.FileIO namespace for a C# project I'm working on. It will handle complications such as embedded commas or fields that are enclosed in quotes etc. It returns a string[] and, in addition to CSV files, can also be used for parsing just about any type of structured text file.
Display where? About splitting, the best way is to use a good library to that effect.
This library is pretty good, I can recommend it heartily.
The problems using naïve methods is that the usually fail, there are tons of considerations without even thinking about performance:
What if the text contains commas
Support for the many existing formats (separated by semicolon, or text surrounded by quotes, or single quotes, etc.)
and many others
Import Micorosoft.VisualBasic as a reference (I know, its not that bad) and use Microsoft.VisualBasic.FileIO.TextFieldParser - this handles CSV files very well, and can be used in any .Net language.
read the file one line at a time, then ...
foreach (String line in line.Split(new char[] { ',' }))
Console.WriteLine(line);
This is a CSV parser I use on occasion.
Usage: (dgvMyView is a datagrid type.)
CSVReader reader = new CSVReader("C:\MyFile.txt");
reader.DisplayResults(dgvMyView);
Class:
using System.IO;
using System.Text.RegularExpressions;
using System.Windows.Forms;
public class CSVReader
{
private const string ESCAPE_SPLIT_REGEX = "({1}[^{1}]*{1})*(?<Separator>{0})({1}[^{1}]*{1})*";
private string[] FieldNames;
private List<string[]> Records;
private int ReadIndex;
public CSVReader(string File)
{
Records = new List<string[]>();
string[] Record = null;
StreamReader Reader = new StreamReader(File);
int Index = 0;
bool BlankRecord = true;
FieldNames = GetEscapedSVs(Reader.ReadLine());
while (!Reader.EndOfStream)
{
Record = GetEscapedSVs(Reader.ReadLine());
BlankRecord = true;
for (Index = 0; Index <= Record.Length - 1; Index++)
{
if (!string.IsNullOrEmpty(Record[Index])) BlankRecord = false;
}
if (!BlankRecord) Records.Add(Record);
}
ReadIndex = -1;
Reader.Close();
}
private string[] GetEscapedSVs(string Data)
{
return GetEscapedSVs(Data, ",", "\"");
}
private string[] GetEscapedSVs(string Data, string Separator, string Escape)
{
string[] Result = null;
int Index = 0;
int PriorMatchIndex = 0;
MatchCollection Matches = Regex.Matches(Data, string.Format(ESCAPE_SPLIT_REGEX, Separator, Escape));
Result = new string[Matches.Count];
for (Index = 0; Index <= Result.Length - 2; Index++)
{
Result[Index] = Data.Substring(PriorMatchIndex, Matches[Index].Groups["Separator"].Index - PriorMatchIndex);
PriorMatchIndex = Matches[Index].Groups["Separator"].Index + Separator.Length;
}
Result[Result.Length - 1] = Data.Substring(PriorMatchIndex);
for (Index = 0; Index <= Result.Length - 1; Index++)
{
if (Regex.IsMatch(Result[Index], string.Format("^{0}[^{0}].*[^{0}]{0}$", Escape))) Result[Index] = Result[Index].Substring(1, Result[Index].Length - 2);
Result[Index] = Result[Index].Replace(Escape + Escape, Escape);
if (Result[Index] == null) Result[Index] = "";
}
return Result;
}
public int FieldCount
{
get { return FieldNames.Length; }
}
public string GetString(int Index)
{
return Records[ReadIndex][Index];
}
public string GetName(int Index)
{
return FieldNames[Index];
}
public bool Read()
{
ReadIndex = ReadIndex + 1;
return ReadIndex < Records.Count;
}
public void DisplayResults(DataGridView DataView)
{
DataGridViewColumn col = default(DataGridViewColumn);
DataGridViewRow row = default(DataGridViewRow);
DataGridViewCell cell = default(DataGridViewCell);
DataGridViewColumnHeaderCell header = default(DataGridViewColumnHeaderCell);
int Index = 0;
ReadIndex = -1;
DataView.Rows.Clear();
DataView.Columns.Clear();
for (Index = 0; Index <= FieldCount - 1; Index++)
{
col = new DataGridViewColumn();
col.CellTemplate = new DataGridViewTextBoxCell();
header = new DataGridViewColumnHeaderCell();
header.Value = GetName(Index);
col.HeaderCell = header;
DataView.Columns.Add(col);
}
while (Read())
{
row = new DataGridViewRow();
for (Index = 0; Index <= FieldCount - 1; Index++)
{
cell = new DataGridViewTextBoxCell();
cell.Value = GetString(Index).ToString();
row.Cells.Add(cell);
}
DataView.Rows.Add(row);
}
}
}
I had got the result for my query. its like simple like i had read a file using io.file. and all the text are stored into a string. After that i splitted with a seperator. The code is shown below.
using System;
using System.Collections.Generic;
using System.Text;
namespace CSV
{
class Program
{
static void Main(string[] args)
{
string csv = "user1, user2, user3,user4,user5";
string[] split = csv.Split(new char[] {',',' '});
foreach(string s in split)
{
if (s.Trim() != "")
Console.WriteLine(s);
}
Console.ReadLine();
}
}
}
The following function takes a line from a CSV file and splits it into a List<string>.
Arguments:
string line = the line to split
string textQualifier = what (if any) text qualifier (i.e. "" or "\"" or "'")
char delim = the field delimiter (i.e. ',' or ';' or '|' or '\t')
int colCount = the expected number of fields (0 means don't check)
Example usage:
List<string> fields = SplitLine(line, "\"", ',', 5);
// or
List<string> fields = SplitLine(line, "'", '|', 10);
// or
List<string> fields = SplitLine(line, "", '\t', 0);
Function:
private List<string> SplitLine(string line, string textQualifier, char delim, int colCount)
{
List<string> fields = new List<string>();
string origLine = line;
char textQual = '"';
bool hasTextQual = false;
if (!String.IsNullOrEmpty(textQualifier))
{
hasTextQual = true;
textQual = textQualifier[0];
}
if (hasTextQual)
{
while (!String.IsNullOrEmpty(line))
{
if (line[0] == textQual) // field is text qualified so look for next unqualified delimiter
{
int fieldLen = 1;
while (true)
{
if (line.Length == 2) // must be final field (zero length)
{
fieldLen = 2;
break;
}
else if (fieldLen + 1 >= line.Length) // must be final field
{
fieldLen += 1;
break;
}
else if (line[fieldLen] == textQual && line[fieldLen + 1] == textQual) // escaped text qualifier
{
fieldLen += 2;
}
else if (line[fieldLen] == textQual && line[fieldLen + 1] == delim) // must be end of field
{
fieldLen += 1;
break;
}
else // not a delimiter
{
fieldLen += 1;
}
}
string escapedQual = textQual.ToString() + textQual.ToString();
fields.Add(line.Substring(1, fieldLen - 2).Replace(escapedQual, textQual.ToString())); // replace escaped qualifiers
if (line.Length >= fieldLen + 1)
{
line = line.Substring(fieldLen + 1);
if (line == "") // blank final field
{
fields.Add("");
}
}
else
{
line = "";
}
}
else // field is not text qualified
{
int fieldLen = line.IndexOf(delim);
if (fieldLen != -1) // check next delimiter position
{
fields.Add(line.Substring(0, fieldLen));
line = line.Substring(fieldLen + 1);
if (line == "") // final field must be blank
{
fields.Add("");
}
}
else // must be last field
{
fields.Add(line);
line = "";
}
}
}
}
else // if there is no text qualifier, then use existing split function
{
fields.AddRange(line.Split(delim));
}
if (colCount > 0 && colCount != fields.Count) // count doesn't match expected so throw exception
{
throw new Exception("Field count was:" + fields.Count.ToString() + ", expected:" + colCount.ToString() + ". Line:" + origLine);
}
return fields;
}
Problem: Convert a comma separated string into an array where commas in "quoted strings,,," should not be considered as separators but as part of an entry
Input:
String: First,"Second","Even,With,Commas",,Normal,"Sentence,with ""different"" problems",3,4,5
Output:
String-Array: ['First','Second','Even,With,Commas','','Normal','Sentence,with "different" problems','3','4','5']
Code:
string sLine;
sLine = "First,\"Second\",\"Even,With,Commas\",,Normal,\"Sentence,with \"\"different\"\" problems\",3,4,5";
// 1. Split line by separator; do not split if separator is within quotes
string Separator = ",";
string Escape = '"'.ToString();
MatchCollection Matches = Regex.Matches(sLine,
string.Format("({1}[^{1}]*{1})*(?<Separator>{0})({1}[^{1}]*{1})*", Separator, Escape));
string[] asColumns = new string[Matches.Count + 1];
int PriorMatchIndex = 0;
for (int Index = 0; Index <= asColumns.Length - 2; Index++)
{
asColumns[Index] = sLine.Substring(PriorMatchIndex, Matches[Index].Groups["Separator"].Index - PriorMatchIndex);
PriorMatchIndex = Matches[Index].Groups["Separator"].Index + Separator.Length;
}
asColumns[asColumns.Length - 1] = sLine.Substring(PriorMatchIndex);
// 2. Remove quotes
for (int Index = 0; Index <= asColumns.Length - 1; Index++)
{
if (Regex.IsMatch(asColumns[Index], string.Format("^{0}[^{0}].*[^{0}]{0}$", Escape))) // If "Text" is sourrounded by quotes (but ignore double quotes => "Leave ""inside"" quotes")
{
asColumns[Index] = asColumns[Index].Substring(1, asColumns[Index].Length - 2); // "Text" => Text
}
asColumns[Index] = asColumns[Index].Replace(Escape + Escape, Escape); // Remove double quotes ('My ""special"" text' => 'My "special" text')
if (asColumns[Index] == null) asColumns[Index] = "";
}
The output array is asColumns

Categories

Resources