Highlighting User Defined Keywords in RichTextBox - c#

I am searching XML files to see if there are contents which match the words inserted in these textboxes txtComKeyword1, txtComKeyword2, txtComKeyword3 and/or txtComKeyword4. The function below is working, but may I know how can I highlight the keywords that user entered in the four textboxes that match that appear in my richComResults richtextbox?
For example, my user will fill in those four textboxes ie. txtComKeyword1, txtComKeyword2, txtComKeyword3 and txtComKeyword4. Then, my code will parse the XML file to see if the nodes contain these four keywords, if yes, the nodes' data will be output on my richComResults, I wanna highlight those four keywords (eg txtComKeyword1=hello, txtComKeyword2=bye, txtComKeyword3=morning, txtComKeyword4=night). These 4 words, if found and appear in richComResults, will be highlighted with color.
I have no clue after searching for a while, my case is much different from other questions. I am a newbie in programming, your help would be much appreciated. Thank you!
My Code:
private void searchComByKeywords()
{
// Process the list of files found in the directory.
string[] fileEntries = Directory.GetFiles(sourceDir);
foreach (string fileName in fileEntries)
{
XmlDocument xmlDoc = new XmlDocument(); //* create an xml document object.
string docPath = fileName;
xmlDoc.Load(docPath); //* load the XML document from the specified file.
XmlNodeList nodeList = xmlDoc.GetElementsByTagName("item");
foreach (XmlNode node in nodeList)
{
XmlElement itemElement = (XmlElement) node;
string itemDescription = itemElement.GetElementsByTagName("description")[0].InnerText;
if (txtComKeyword1.Text != (String.Empty) && itemDescription.ToLower().Contains(txtComKeyword1.Text.ToLower()) ||
txtComKeyword2.Text != (String.Empty) && itemDescription.ToLower().Contains(txtComKeyword2.Text.ToString()) ||
txtComKeyword3.Text != (String.Empty) && itemDescription.ToLower().Contains(txtComKeyword3.Text.ToString()) ||
txtComKeyword4.Text != (String.Empty) && itemDescription.ToLower().Contains(txtComKeyword4.Text.ToString()))
{
string itemTitle = itemElement.GetElementsByTagName("title")[0].InnerText;
string itemDate = itemElement.GetElementsByTagName("pubDate")[0].InnerText;
string itemAuthor = itemElement.GetElementsByTagName("author")[0].InnerText;
richComResults.AppendText("Author: " + itemAuthor + "\nDate: " + itemDate + "\nTitle: " + itemTitle + "\nDescription: " + itemDescription + "\n\n--------\n\n");
}
}
}
}

Try this:
int pointer = 0;
int index = 0;
string keyword = "txtComKeyword1";
while (true)
{
index = richComResults.Text.IndexOf(keyword, pointer);
//if keyword not found
if (index == -1)
{
break;
}
richComResults.Select(index, keyword.Length);
richComResults.SelectionFont = new System.Drawing.Font(richComResults.Font, FontStyle.Bold);
pointer = index + keyword.Length;
}
This searches for the keyword and highlights it. Then it continues the search after the found keyword. The pointer is used to keep track of the search position in your text. The index marks the position of the found keyword.

Jan's answer contains great content, but I shuddered mildly at the while(true) and break aspect! Here's my tweaked (case-insensitive) version...
int nextHigh = RTF.Text.IndexOf(txSearch, 0, StringComparison.OrdinalIgnoreCase);
while (nextHigh >= 0)
{
RTF.Select(nextHigh, txSearch.Length);
RTF.SelectionColor = Color.Red; // Or whatever
RTF.SelectionFont = new Font("Arial", 12, FontStyle.Bold); // you like
nextHigh = RTF.Text.IndexOf(txSearch, nextHigh + txSearch.Length, StringComparison.OrdinalIgnoreCase);
}

try this code :
void ParseLine(string line)
{
Regex r = new Regex("([ \\t{}():;])");
String[] tokens = r.Split(line);
foreach (string token in tokens)
{
// Set the tokens default color and font.
richTextBox1.SelectionColor = Color.Black;
richTextBox1.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
// Check whether the token is a keyword.
String[] keywords = { "Author", "Date", "Title", "Description", };
for (int i = 0; i < keywords.Length; i++)
{
if (keywords[i] == token)
{
// Apply alternative color and font to highlight keyword.
richTextBox1.SelectionColor = Color.Blue;
richTextBox1.SelectionFont = new Font("Courier New", 10, FontStyle.Bold);
break;
}
}
richTextBox1.SelectedText = token;
}
richTextBox1.SelectedText = "\n";
}
and after fill your string str with your method call my method :
string strRich =
"Author : Habib\nDate : 2012-08-10 \nTitle : mytitle \nDescription : desc\n";
Regex r = new Regex("\\n");
String[] lines = r.Split(strRich);
foreach (string l in lines)
{
ParseLine(l);
}
enjoy.

Related

(C# WPF) "TextPointer.GetTextInRun" method ignores characters like "\r\n"

In C# WPF, I have a function to retrieve text from flowdocumentreader:
static string GetText(TextPointer textStart, TextPointer textEnd)
{
StringBuilder output = new StringBuilder();
TextPointer tp = textStart;
while (tp != null && tp.CompareTo(textEnd) < 0)
{
if (tp.GetPointerContext(LogicalDirection.Forward) ==
TextPointerContext.Text)
{
output.Append(tp.GetTextInRun(LogicalDirection.Forward));
}
tp = tp.GetNextContextPosition(LogicalDirection.Forward);
}
return output.ToString();
}
Then I use the function as follow:
string test = GetText(rtb.Document.ContentStart, rtb.Document.ContentEnd);
However, the string "test" ignores all the line breaks, which means "\r\n". It does keep the tab character, "\t".
My question is how to keep all the line breaks? I want to automatically highlight the first sentence of each paragraph, so I need to detect the line break characters, "\r\n".
Thanks in advance for your time.
Update:
I load the .rtf document into flowdocumentreader like this:
if (dlg.FileName.LastIndexOf(".rtf") != -1)
{
paraBodyText.Inlines.Clear();
string temp = File.ReadAllText(dlg.FileName, Encoding.UTF8);
MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(temp));
TextRange textRange = new TextRange(flow.ContentStart, flow.ContentEnd);
textRange.Load(stream, DataFormats.Rtf);
myDocumentReader.Document = flow;
stream.Close();
}
Edited
assuming that each paragraph has at least one sentence that ends with a dot, you can use the following code to make the first sentence bold:
List<TextRange> ranges = new List<TextRange>();
foreach (Paragraph p in rtb.Document.Blocks.OfType<Paragraph>())
{
TextPointer pointer = null;
foreach (Run r in p.Inlines.OfType<Run>())
{
int index = r.Text.IndexOf(".");
if (index != -1)
{
pointer = r.ContentStart.GetPositionAtOffset(index);
}
}
if (pointer == null)
continue;
var firsSentence = new TextRange(p.ContentStart, pointer);
ranges.Add(firsSentence);
}
foreach (var r in ranges)
{
r.ApplyPropertyValue(TextElement.FontWeightProperty, FontWeights.Bold);
}

How to find the index of a paragraph in word 2010

I am reading a word document in c#,where after reading it, I need to enter a comment for selected paragraphs.
So I need to find the index of paragraph through c#, is it possible??
foreach (Microsoft.Office.Interop.Word.Paragraph aPar in oDoc.Paragraphs) // looping through all the paragh in document
{
Microsoft.Office.Interop.Word.Range parRng = aPar.Range;
string sText = parRng.Text;
if (sText == para[1].ToString()) // found the paragraph and i need the index of this paragraph
{
oDoc.Comments.Add(oDoc.Paragraphs[0].Range, ref comments); // to add the comment in document
}
}
If I found the index of that paragraph, Can I insert the comment on that paragraph? Is it possible?
Or is there any other way to do this?
Try this
foreach (Microsoft.Office.Interop.Word.Paragraph aPar in oDoc.Paragraphs) // loads all words in document
{
Microsoft.Office.Interop.Word.Range parRng = aPar.Range;
string sText = parRng.Text.Replace("\r","");
if (sText == txtBoxParagraph.Text ) // found the paragraph and i need the index of this paragraph
{
oDoc.Comments.Add(parRng, txtBoxComments.Text); // to add the comment in document
}
}
It works for me.
int i = 1;
foreach (Word.Paragraph aPar in oDoc.Paragraphs)
{
string sText = aPar.Range.Text;
if (sText != "\r")
{
if (sText == para[1].ToString() + "\r")
{
Word.Range range = oDoc.Paragraphs[i + 1].Range;
if (!range.Text.Contains("GUID:"))
{
int pEnd = aPar.Range.End;
string guid = "GUID:" + para[0].Replace("{", "").Replace("}", "");
int length = guid.Length;
aPar.Range.InsertAfter(guid);
Word.Range parRng = oDoc.Range(pEnd, pEnd + length);
parRng.Font.Hidden = 1;
parRng.InsertParagraphAfter();
}
}
}
i++;
}

Avoid Scanning EntireRichtextbox Highlighting [duplicate]

This question already has answers here:
RichTextBox syntax highlighting in real time--Disabling the repaint
(3 answers)
Closed 9 years ago.
im working on a code editor and i came up with this set of codes:
public class Test2 : Form {
RichTextBox m_rtb = null;
public static void Main() {
Application.Run(new Test2());
}
public Test2() {
Text = "Test2";
ClientSize = new Size(400, 400);
m_rtb = new RichTextBox();
m_rtb.Multiline = true;
m_rtb.WordWrap = false;
m_rtb.AcceptsTab = true;
m_rtb.ScrollBars = RichTextBoxScrollBars.ForcedBoth;
m_rtb.Dock = DockStyle.Fill;
m_rtb.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
m_rtb.SelectionColor = Color.Black;
Controls.Add(m_rtb);
Parse();
m_rtb.TextChanged += new EventHandler(this.TextChangedEvent);
}
void Parse() {
String inputLanguage =
"// Comment.\n" +
"using System;\n" + "\n" +
"public class Stuff : Form { \n" +
" public static void Main(String args) {\n" +
" }\n" +
"}\n" ;
// Foreach line in input,
// identify key words and format them when adding to the rich text box.
Regex r = new Regex("\\n");
String [] lines = r.Split(inputLanguage);
foreach (string l in lines) {
ParseLine(l);
}
}
void ParseLine(string line) {
Regex r = new Regex("([ \\t{}();])");
String [] tokens = r.Split(line);
foreach (string token in tokens) {
// Set the token's default color and font.
m_rtb.SelectionColor = Color.Black;
m_rtb.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
// Check for a comment.
if (token == "//" || token.StartsWith("//")) {
// Find the start of the comment and then extract the whole comment.
int index = line.IndexOf("//");
string comment = line.Substring(index, line.Length - index);
m_rtb.SelectionColor = Color.LightGreen;
m_rtb.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
m_rtb.SelectedText = comment;
break;
}
// Check whether the token is a keyword.
String [] keywords = { "public", "void", "using", "static", "class" };
for (int i = 0; i < keywords.Length; i++) {
if (keywords[i] == token) {
// Apply alternative color and font to highlight keyword.
m_rtb.SelectionColor = Color.Blue;
m_rtb.SelectionFont = new Font("Courier New", 10, FontStyle.Bold);
break;
}
}
m_rtb.SelectedText = token;
}
m_rtb.SelectedText = "\n";
}
private void TextChangedEvent(object sender, EventArgs e) {
// Calculate the starting position of the current line.
int start = 0, end = 0;
for (start = m_rtb.SelectionStart - 1; start > 0; start--) {
if (m_rtb.Text[start] == '\n') { start++; break; }
}
// Calculate the end position of the current line.
for (end = m_rtb.SelectionStart; end < m_rtb.Text.Length; end++) {
if (m_rtb.Text[end] == '\n') break;
}
// Extract the current line that is being edited.
String line = m_rtb.Text.Substring(start, end - start);
// Backup the users current selection point.
int selectionStart = m_rtb.SelectionStart;
int selectionLength = m_rtb.SelectionLength;
// Split the line into tokens.
Regex r = new Regex("([ \\t{}();])");
string [] tokens = r.Split(line);
int index = start;
foreach (string token in tokens) {
// Set the token's default color and font.
m_rtb.SelectionStart = index;
m_rtb.SelectionLength = token.Length;
m_rtb.SelectionColor = Color.Black;
m_rtb.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
// Check for a comment.
if (token == "//" || token.StartsWith("//")) {
// Find the start of the comment and then extract the whole comment.
int length = line.Length - (index - start);
string commentText = m_rtb.Text.Substring(index, length);
m_rtb.SelectionStart = index;
m_rtb.SelectionLength = length;
m_rtb.SelectionColor = Color.LightGreen;
m_rtb.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
break;
}
// Check whether the token is a keyword.
String [] keywords = { "public", "void", "using", "static", "class" };
for (int i = 0; i < keywords.Length; i++) {
if (keywords[i] == token) {
// Apply alternative color and font to highlight keyword.
m_rtb.SelectionColor = Color.Blue;
m_rtb.SelectionFont = new Font("Courier New", 10, FontStyle.Bold);
break;
}
}
index += token.Length;
}
// Restore the users current selection point.
m_rtb.SelectionStart = selectionStart;
m_rtb.SelectionLength = selectionLength;
}
}
problem was everytime i press space keys or type the entire code editor keeps on scanning like searching on what to highlight and i find it a bit annoying ...
so i just want to ask for possible solution about this ... to avoid highlighting of whole richtextbox like scanning what to highlight next .
thanks a lot in advance for the help! more power!
I answered this in your most recent question, but in case someone else is reading this and doesn't find it, I'll post it here (Since this is specifically about performance):
You can use a couple of things to improve performance:
1) You can get the line that the user is editing by getting the text range from the current selection. I would recommend using WPF's richtextbox as it contains much more features and has the useful TextPointer class which you can utilise to get the current line. It seems like you are using WinForms however, so this can be done with these few lines of code (with the addition of some trivial code):
int start_index = RTB.GetFirstCharIndexOfCurrentLine();
int line = RTB.GetLineFromCharIndex(index);
int last_index = RTB.GetFirstCharIndexFromLine(line+1);
RTB.Select(start_index, last_index);
You can then work with the current selection.
2) If you don't want it to update so frequently, you can create a timer which measures the delay since the last edit, and reset the timer if another edit is made before the timer elapses.

How to call a method from another one?

I'm working on a code-editor and I want to call the string line into a keyargs event which is inside another void-returning method.
Output should occur when I type enter key, and then the selected-list from ComboBox should append to text held in RichTextBox.
Now to fulfill that, I'd like to ask you, how to call this method:
void Parse()
{
String inputLanguage =
"using System;\n" + "\n" +
"public class Stuff : Form { \n" +
" public static void Main(String args) {\n" +
"\n" + "\n" +
" }\n" +
"}\n";
// Foreach line in input,
// identify key words and format them when adding to the rich text box.
Regex r = new Regex("\\n");
String[] lines = r.Split(inputLanguage);
foreach (string l in lines)
{
ParseLine(l);
}
}
void ParseLine(string line)
{
Regex r = new Regex("([ \\t{}();])");
String[] tokens = r.Split(line);
foreach (string token in tokens)
{
// Set the token's default color and font.
rtb.SelectionColor = Color.Black;
rtb.SelectionFont = new Font("Courier New", 10, FontStyle.Regular);
// Check for a comment.
if (token == "//" || token.StartsWith("//"))
{
// Find the start of the comment and then extract the whole comment.
int index = line.IndexOf("//");
rtb.SelectedText = comment;
break;
}
// Check whether the token is a keyword.
var keywordsDef = new KeyWord();
String[] keywords = keywordsDef.keywords;
for (int i = 0; i < keywords.Length; i++)
{
if (keywords[i] == token)
{
// Apply alternative color and font to highlight keyword.
HighlighType.keywordsType(rtb);
break;
}
}
rtb.SelectedText = token;
}
rtb.SelectedText = "\n";
}
from within this one:
void lb_KeyDown(object sender, KeyEventArgs e)
{
if (e.KeyCode == Keys.Escape)
{
lb.Visible = false;
lb.Items.Clear();
}
if (e.KeyCode == Keys.Enter)
{
//ParseLine(string line);
Parse();
string comment = line.Substring(index, line.Length - index);
rtb.SelectedText = comment + " " + lb.SelectedIndex.ToString();
}
}
I really need help. Big thanks in advance!
You are passing the parameter wrong. You can not pass a type when calling a method. The commented line should read
ParseLine(line);
The variable line must be declared somewhere above ParseLine. What it contains is up to you, but probably you want to set
string line = lb.Text;
So your code could read like this:
void lb_KeyDown(object sender, KeyEventArgs e)
{
if (e.KeyCode == Keys.Escape)
{
lb.Visible = false;
lb.Items.Clear();
}
if (e.KeyCode == Keys.Enter)
{
string line = lb.Text;
ParseLine(line);
//Parse();
string comment = line.Substring(index, line.Length - index);
rtb.SelectionColor = Color.Green;
rtb.SelectionFont = new Font("Courier New", 10, FontStyle.Italic);
rtb.SelectedText = comment + " " + lb.SelectedIndex.ToString();
}
}
Calling the function is not the problem, but you need some way of retrieving the current line in whichever editor you're using. Once you have retrieved it, you can call ParseLine on it, but until you have it you have nothing to work on.

Reading CSV file and storing values into an array

I am trying to read a *.csv-file.
The *.csv-file consist of two columns separated by semicolon (";").
I am able to read the *.csv-file using StreamReader and able to separate each line by using the Split() function. I want to store each column into a separate array and then display it.
Is it possible to do that?
You can do it like this:
using System.IO;
static void Main(string[] args)
{
using(var reader = new StreamReader(#"C:\test.csv"))
{
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(';');
listA.Add(values[0]);
listB.Add(values[1]);
}
}
}
My favourite CSV parser is one built into .NET library. This is a hidden treasure inside Microsoft.VisualBasic namespace.
Below is a sample code:
using Microsoft.VisualBasic.FileIO;
var path = #"C:\Person.csv"; // Habeeb, "Dubai Media City, Dubai"
using (TextFieldParser csvParser = new TextFieldParser(path))
{
csvParser.CommentTokens = new string[] { "#" };
csvParser.SetDelimiters(new string[] { "," });
csvParser.HasFieldsEnclosedInQuotes = true;
// Skip the row with the column names
csvParser.ReadLine();
while (!csvParser.EndOfData)
{
// Read current line fields, pointer moves to the next line.
string[] fields = csvParser.ReadFields();
string Name = fields[0];
string Address = fields[1];
}
}
Remember to add reference to Microsoft.VisualBasic
More details about the parser is given here: http://codeskaters.blogspot.ae/2015/11/c-easiest-csv-parser-built-in-net.html
LINQ way:
var lines = File.ReadAllLines("test.txt").Select(a => a.Split(';'));
var csv = from line in lines
select (from piece in line
select piece);
^^Wrong - Edit by Nick
It appears the original answerer was attempting to populate csv with a 2 dimensional array - an array containing arrays. Each item in the first array contains an array representing that line number with each item in the nested array containing the data for that specific column.
var csv = from line in lines
select (line.Split(',')).ToArray();
Just came across this library: https://github.com/JoshClose/CsvHelper
Very intuitive and easy to use. Has a nuget package too which made is quick to implement: https://www.nuget.org/packages/CsvHelper/27.2.1. Also appears to be actively maintained which I like.
Configuring it to use a semi-colon is easy: https://github.com/JoshClose/CsvHelper/wiki/Custom-Configurations
You can't create an array immediately because you need to know the number of rows from the beginning (and this would require to read the csv file twice)
You can store values in two List<T> and then use them or convert into an array using List<T>.ToArray()
Very simple example:
var column1 = new List<string>();
var column2 = new List<string>();
using (var rd = new StreamReader("filename.csv"))
{
while (!rd.EndOfStream)
{
var splits = rd.ReadLine().Split(';');
column1.Add(splits[0]);
column2.Add(splits[1]);
}
}
// print column1
Console.WriteLine("Column 1:");
foreach (var element in column1)
Console.WriteLine(element);
// print column2
Console.WriteLine("Column 2:");
foreach (var element in column2)
Console.WriteLine(element);
N.B.
Please note that this is just a very simple example. Using string.Split does not account for cases where some records contain the separator ; inside it.
For a safer approach, consider using some csv specific libraries like CsvHelper on nuget.
I usually use this parser from codeproject, since there's a bunch of character escapes and similar that it handles for me.
Here is my variation of the top voted answer:
var contents = File.ReadAllText(filename).Split('\n');
var csv = from line in contents
select line.Split(',').ToArray();
The csv variable can then be used as in the following example:
int headerRows = 5;
foreach (var row in csv.Skip(headerRows)
.TakeWhile(r => r.Length > 1 && r.Last().Trim().Length > 0))
{
String zerothColumnValue = row[0]; // leftmost column
var firstColumnValue = row[1];
}
If you need to skip (head-)lines and/or columns, you can use this to create a 2-dimensional array:
var lines = File.ReadAllLines(path).Select(a => a.Split(';'));
var csv = (from line in lines
select (from col in line
select col).Skip(1).ToArray() // skip the first column
).Skip(2).ToArray(); // skip 2 headlines
This is quite useful if you need to shape the data before you process it further (assuming the first 2 lines consist of the headline, and the first column is a row title - which you don't need to have in the array because you just want to regard the data).
N.B. You can easily get the headlines and the 1st column by using the following code:
var coltitle = (from line in lines
select line.Skip(1).ToArray() // skip 1st column
).Skip(1).Take(1).FirstOrDefault().ToArray(); // take the 2nd row
var rowtitle = (from line in lines select line[0] // take 1st column
).Skip(2).ToArray(); // skip 2 headlines
This code example assumes the following structure of your *.csv file:
Note: If you need to skip empty rows - which can by handy sometimes, you can do so by inserting
where line.Any(a=>!string.IsNullOrWhiteSpace(a))
between the from and the select statement in the LINQ code examples above.
You can use Microsoft.VisualBasic.FileIO.TextFieldParser dll in C# for better performance
get below code example from above article
static void Main()
{
string csv_file_path=#"C:\Users\Administrator\Desktop\test.csv";
DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);
Console.WriteLine("Rows count:" + csvData.Rows.Count);
Console.ReadLine();
}
private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
{
DataTable csvData = new DataTable();
try
{
using(TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = true;
string[] colFields = csvReader.ReadFields();
foreach (string column in colFields)
{
DataColumn datecolumn = new DataColumn(column);
datecolumn.AllowDBNull = true;
csvData.Columns.Add(datecolumn);
}
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
//Making empty value as null
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
}
csvData.Rows.Add(fieldData);
}
}
}
catch (Exception ex)
{
}
return csvData;
}
Hi all, I created a static class for doing this.
+ column check
+ quota sign removal
public static class CSV
{
public static List<string[]> Import(string file, char csvDelimiter, bool ignoreHeadline, bool removeQuoteSign)
{
return ReadCSVFile(file, csvDelimiter, ignoreHeadline, removeQuoteSign);
}
private static List<string[]> ReadCSVFile(string filename, char csvDelimiter, bool ignoreHeadline, bool removeQuoteSign)
{
string[] result = new string[0];
List<string[]> lst = new List<string[]>();
string line;
int currentLineNumner = 0;
int columnCount = 0;
// Read the file and display it line by line.
using (System.IO.StreamReader file = new System.IO.StreamReader(filename))
{
while ((line = file.ReadLine()) != null)
{
currentLineNumner++;
string[] strAr = line.Split(csvDelimiter);
// save column count of dirst line
if (currentLineNumner == 1)
{
columnCount = strAr.Count();
}
else
{
//Check column count of every other lines
if (strAr.Count() != columnCount)
{
throw new Exception(string.Format("CSV Import Exception: Wrong column count in line {0}", currentLineNumner));
}
}
if (removeQuoteSign) strAr = RemoveQouteSign(strAr);
if (ignoreHeadline)
{
if(currentLineNumner !=1) lst.Add(strAr);
}
else
{
lst.Add(strAr);
}
}
}
return lst;
}
private static string[] RemoveQouteSign(string[] ar)
{
for (int i = 0;i< ar.Count() ; i++)
{
if (ar[i].StartsWith("\"") || ar[i].StartsWith("'")) ar[i] = ar[i].Substring(1);
if (ar[i].EndsWith("\"") || ar[i].EndsWith("'")) ar[i] = ar[i].Substring(0,ar[i].Length-1);
}
return ar;
}
}
I have spend few hours searching for a right library, but finally I wrote my own code :)
You can read file (or database) with whatever tools you want and then apply the following routine to each line:
private static string[] SmartSplit(string line, char separator = ',')
{
var inQuotes = false;
var token = "";
var lines = new List<string>();
for (var i = 0; i < line.Length; i++) {
var ch = line[i];
if (inQuotes) // process string in quotes,
{
if (ch == '"') {
if (i<line.Length-1 && line[i + 1] == '"') {
i++;
token += '"';
}
else inQuotes = false;
} else token += ch;
} else {
if (ch == '"') inQuotes = true;
else if (ch == separator) {
lines.Add(token);
token = "";
} else token += ch;
}
}
lines.Add(token);
return lines.ToArray();
}
var firstColumn = new List<string>();
var lastColumn = new List<string>();
// your code for reading CSV file
foreach(var line in file)
{
var array = line.Split(';');
firstColumn.Add(array[0]);
lastColumn.Add(array[1]);
}
var firstArray = firstColumn.ToArray();
var lastArray = lastColumn.ToArray();
Here's a special case where one of data field has semicolon (";") as part of it's data in that case most of answers above will fail.
Solution in that case will be
string[] csvRows = System.IO.File.ReadAllLines(FullyQaulifiedFileName);
string[] fields = null;
List<string> lstFields;
string field;
bool quoteStarted = false;
foreach (string csvRow in csvRows)
{
lstFields = new List<string>();
field = "";
for (int i = 0; i < csvRow.Length; i++)
{
string tmp = csvRow.ElementAt(i).ToString();
if(String.Compare(tmp,"\"")==0)
{
quoteStarted = !quoteStarted;
}
if (String.Compare(tmp, ";") == 0 && !quoteStarted)
{
lstFields.Add(field);
field = "";
}
else if (String.Compare(tmp, "\"") != 0)
{
field += tmp;
}
}
if(!string.IsNullOrEmpty(field))
{
lstFields.Add(field);
field = "";
}
// This will hold values for each column for current row under processing
fields = lstFields.ToArray();
}
The open-source Angara.Table library allows to load CSV into typed columns, so you can get the arrays from the columns. Each column can be indexed both by name or index. See http://predictionmachines.github.io/Angara.Table/saveload.html.
The library follows RFC4180 for CSV; it enables type inference and multiline strings.
Example:
using System.Collections.Immutable;
using Angara.Data;
using Angara.Data.DelimitedFile;
...
ReadSettings settings = new ReadSettings(Delimiter.Semicolon, false, true, null, null);
Table table = Table.Load("data.csv", settings);
ImmutableArray<double> a = table["double-column-name"].Rows.AsReal;
for(int i = 0; i < a.Length; i++)
{
Console.WriteLine("{0}: {1}", i, a[i]);
}
You can see a column type using the type Column, e.g.
Column c = table["double-column-name"];
Console.WriteLine("Column {0} is double: {1}", c.Name, c.Rows.IsRealColumn);
Since the library is focused on F#, you might need to add a reference to the FSharp.Core 4.4 assembly; click 'Add Reference' on the project and choose FSharp.Core 4.4 under "Assemblies" -> "Extensions".
I have been using csvreader.com(paid component) for years, and I have never had a problem. It is solid, small and fast, but you do have to pay for it. You can set the delimiter to whatever you like.
using (CsvReader reader = new CsvReader(s) {
reader.Settings.Delimiter = ';';
reader.ReadHeaders(); // if headers on a line by themselves. Makes reader.Headers[] available
while (reader.ReadRecord())
... use reader.Values[col_i] ...
}
I am just student working on my master's thesis, but this is the way I solved it and it worked well for me. First you select your file from directory (only in csv format) and then you put the data into the lists.
List<float> t = new List<float>();
List<float> SensorI = new List<float>();
List<float> SensorII = new List<float>();
List<float> SensorIII = new List<float>();
using (OpenFileDialog dialog = new OpenFileDialog())
{
try
{
dialog.Filter = "csv files (*.csv)|*.csv";
dialog.Multiselect = false;
dialog.InitialDirectory = ".";
dialog.Title = "Select file (only in csv format)";
if (dialog.ShowDialog() == DialogResult.OK)
{
var fs = File.ReadAllLines(dialog.FileName).Select(a => a.Split(';'));
int counter = 0;
foreach (var line in fs)
{
counter++;
if (counter > 2) // Skip first two headder lines
{
this.t.Add(float.Parse(line[0]));
this.SensorI.Add(float.Parse(line[1]));
this.SensorII.Add(float.Parse(line[2]));
this.SensorIII.Add(float.Parse(line[3]));
}
}
}
}
catch (Exception exc)
{
MessageBox.Show(
"Error while opening the file.\n" + exc.Message,
this.Text,
MessageBoxButtons.OK,
MessageBoxIcon.Error
);
}
}
This is my 2 simple static methods to convert text from csv file to List<List<string>> and vice versa. Each method use row convertor.
This code should take into account all the possibilities of the csv file. You can define own csv separator and this methods try to correct escape double 'quote' char, and deals with the situation when all text in quotes are one cell and csv separator is inside quoted string including multiple lines in one cell and can ignore empty rows.
Last method is only for testing. So you can ignore it, or test your own, or others solution with this test method :). For testing I used this hard csv with 2 rows on 4 lines:
0,a,""bc,d
"e, f",g,"this,is, o
ne ""lo
ng, cell""",h
This is final code. For simplicity, I removed all try catch blocks.
using System;
using System.Collections.Generic;
using System.Linq;
public static class Csv {
public static string FromListToString(List<List<string>> csv, string separator = ",", char quotation = '"', bool returnFirstRow = true)
{
string content = "";
for (int row = 0; row < csv.Count; row++) {
content += (row > 0 ? Environment.NewLine : "") + RowFromListToString(csv[row], separator, quotation);
}
return content;
}
public static List<List<string>> FromStringToList(string content, string separator = ",", char quotation = '"', bool returnFirstRow = true, bool ignoreEmptyRows = true)
{
List<List<string>> csv = new List<List<string>>();
string[] rows = content.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
if (rows.Length <= (returnFirstRow ? 0 : 1)) { return csv; }
List<string> csvRow = null;
for (int rowIndex = 0; rowIndex < rows.Length; rowIndex++) {
(List<string> row, bool rowClosed) = RowFromStringToList(rows[rowIndex], csvRow, separator, quotation);
if (rowClosed) { if (!ignoreEmptyRows || row.Any(rowItem => rowItem.Length > 0)) { csv.Add(row); csvRow = null; } } // row ok, add to list
else { csvRow = row; } // not fully created, continue
}
if (!returnFirstRow) { csv.RemoveAt(0); } // remove header
return csv;
}
public static string RowFromListToString(List<string> csvData, string separator = ",", char quotation = '"')
{
csvData = csvData.Select(element =>
{
if (element.Contains(quotation)) {
element = element.Replace(quotation.ToString(), quotation.ToString() + quotation.ToString());
}
if (element.Contains(separator) || element.Contains(Environment.NewLine)) {
element = "\"" + element + "\"";
}
return element;
}).ToList();
return string.Join(separator, csvData);
}
public static (List<string>, bool) RowFromStringToList(string csvRow, List<string> continueWithRow = null, string separator = ",", char quotation = '"')
{
bool rowClosed = true;
if (continueWithRow != null && continueWithRow.Count > 0) {
// in previous result quotation are fixed so i need convert back to double quotation
string previousCell = quotation.ToString() + continueWithRow.Last().Replace(quotation.ToString(), quotation.ToString() + quotation.ToString()) + Environment.NewLine;
continueWithRow.RemoveAt(continueWithRow.Count - 1);
csvRow = previousCell + csvRow;
}
char tempQuote = (char)162;
while (csvRow.Contains(tempQuote)) { tempQuote = (char)(tempQuote + 1); }
char tempSeparator = (char)(tempQuote + 1);
while (csvRow.Contains(tempSeparator)) { tempSeparator = (char)(tempSeparator + 1); }
csvRow = csvRow.Replace(quotation.ToString() + quotation.ToString(), tempQuote.ToString());
if(csvRow.Split(new char[] { quotation }, StringSplitOptions.None).Length % 2 == 0) { rowClosed = !rowClosed; }
string[] csvSplit = csvRow.Split(new string[] { separator }, StringSplitOptions.None);
List<string> csvList = csvSplit
.ToList()
.Aggregate("",
(string row, string item) => {
if (row.Count((ch) => ch == quotation) % 2 == 0) { return row + (row.Length > 0 ? tempSeparator.ToString() : "") + item; }
else { return row + separator + item; }
},
(string row) => row.Split(tempSeparator).Select((string item) => item.Trim(quotation).Replace(tempQuote, quotation))
).ToList();
if (continueWithRow != null && continueWithRow.Count > 0) {
return (continueWithRow.Concat(csvList).ToList(), rowClosed);
}
return (csvList, rowClosed);
}
public static bool Test()
{
string csvText = "0,a,\"\"bc,d" + Environment.NewLine + "\"e, f\",g,\"this,is, o" + Environment.NewLine + "ne \"\"lo" + Environment.NewLine + "ng, cell\"\"\",h";
List<List<string>> csvList = new List<List<string>>() { new List<string>() { "0", "a", "\"bc", "d" }, new List<string>() { "e, f", "g", "this,is, o" + Environment.NewLine + "ne \"lo" + Environment.NewLine + "ng, cell\"", "h" } };
List<List<string>> csvTextAsList = Csv.FromStringToList(csvText);
bool ok = Enumerable.SequenceEqual(csvList[0], csvTextAsList[0]) && Enumerable.SequenceEqual(csvList[1], csvTextAsList[1]);
string csvListAsText = Csv.FromListToString(csvList);
return ok && csvListAsText == csvText;
}
}
Usage examples:
// get List<List<string>> representation of csv
var csvFromText = Csv.FromStringToList(csvAsText);
// read csv file with custom separator and quote
// return no header and ignore empty rows
var csvFile = File.ReadAllText(csvFileFullPath);
var csvFromFile = Csv.FromStringToList(csvFile, ";", '"', false, false);
// get text representation of csvData from List<List<string>>
var csvAsText = Csv.FromListToString(csvData);
Notes:
This: char tempQuote = (char)162; is first rare character from ASCI table. The script searches for this, or the first next few ascii character that is NOT in the text and uses it as a temporary escape and quote characters.
Still wrong. You need to compensate for "" in quotes.
Here is my solution Microsoft style csv.
/// <summary>
/// Microsoft style csv file. " is the quote character, "" is an escaped quote.
/// </summary>
/// <param name="fileName"></param>
/// <param name="sepChar"></param>
/// <param name="quoteChar"></param>
/// <param name="escChar"></param>
/// <returns></returns>
public static List<string[]> ReadCSVFileMSStyle(string fileName, char sepChar = ',', char quoteChar = '"')
{
List<string[]> ret = new List<string[]>();
string[] csvRows = System.IO.File.ReadAllLines(fileName);
foreach (string csvRow in csvRows)
{
bool inQuotes = false;
List<string> fields = new List<string>();
string field = "";
for (int i = 0; i < csvRow.Length; i++)
{
if (inQuotes)
{
// Is it a "" inside quoted area? (escaped litteral quote)
if(i < csvRow.Length - 1 && csvRow[i] == quoteChar && csvRow[i+1] == quoteChar)
{
i++;
field += quoteChar;
}
else if(csvRow[i] == quoteChar)
{
inQuotes = false;
}
else
{
field += csvRow[i];
}
}
else // Not in quoted region
{
if (csvRow[i] == quoteChar)
{
inQuotes = true;
}
if (csvRow[i] == sepChar)
{
fields.Add(field);
field = "";
}
else
{
field += csvRow[i];
}
}
}
if (!string.IsNullOrEmpty(field))
{
fields.Add(field);
field = "";
}
ret.Add(fields.ToArray());
}
return ret;
}
}
I have a library that is doing exactly you need.
Some time ago I had wrote simple and fast enough library for work with CSV files. You can find it by the following link: https://github.com/ukushu/DataExporter/blob/master/Csv.cs
It works with CSV like with 2 dimensions array. Exactly like you need.
As example, in case of you need all of values of 3rd row only you need is to write:
Csv csv = new Csv();
csv.FileOpen("c:\\file1.csv");
var allValuesOf3rdRow = csv.Rows[2];
or to read 2nd cell of 3rd row:
var value = csv.Rows[2][1];
Headers are required in csv for json conversion in the below code
You can use below code as is without making any changes.
This code will work with two row headers or with one row header.
Below code reads the uploaded IForm File and converts to memory stream.
If you want to use file path instead of uploaded file you can replace
new StreamReader(ms, System.Text.Encoding.UTF8, true)) with new StreamReader("../../examplefilepath");
using (var ms = new MemoryStream())
{
administrativesViewModel.csvFile.CopyTo(ms);
ms.Position = 0;
using (StreamReader csvReader = new StreamReader(ms, System.Text.Encoding.UTF8, true))
{
List<string> lines = new List<string>();
while (!csvReader.EndOfStream)
{
var line = csvReader.ReadLine();
var values = line.Split(';');
if (values[0] != "" && values[0] != null)
{
lines.Add(values[0]);
}
}
var csv = new List<string[]>();
foreach (string item in lines)
{
csv.Add(item.Split(','));
}
var properties = lines[0].Split(',');
int csvI = 1;
var listObjResult = new List<Dictionary<string, string>>();
if (lines.Count() > 1)
{
var ln = lines[0].Substring(0, lines[0].Count() - 1);
var ln1 = lines[1].Substring(0, lines[1].Count() - 1);
var lnSplit = ln.Split(',');
var ln1Split = ln1.Split(',');
if (lnSplit.Count() != ln1Split.Count())
{
properties = lines[1].Split(',');
csvI = 2;
}
}
for (int i = csvI; i < csv.Count(); i++)
{
var objResult = new Dictionary<string, string>();
if (csvI > 0)
{
var splitProp = lines[0].Split(":");
if (splitProp.Count() > 1)
{
if (splitProp[0] != "" && splitProp[0] != null && splitProp[1] != "" && splitProp[1] != null)
{
objResult.Add(splitProp[0], splitProp[1]);
}
}
}
for (int j = 0; j < properties.Length; j++)
if (!properties[j].Contains(":"))
{
objResult.Add(properties[j], csv[i][j]);
}
listObjResult.Add(objResult);
}
var result = JsonConvert.SerializeObject(listObjResult);
var result2 = JArray.Parse(result);
Console.WriteLine(result2);
}
}
look at this
using CsvFramework;
using System.Collections.Generic;
namespace CvsParser
{
public class Customer
{
public int Id { get; set; }
public string Name { get; set; }
public List<Order> Orders { get; set; }
}
public class Order
{
public int Id { get; set; }
public int CustomerId { get; set; }
public int Quantity { get; set; }
public int Amount { get; set; }
public List<OrderItem> OrderItems { get; set; }
}
public class Address
{
public int Id { get; set; }
public int CustomerId { get; set; }
public string Name { get; set; }
}
public class OrderItem
{
public int Id { get; set; }
public int OrderId { get; set; }
public string ProductName { get; set; }
}
class Program
{
static void Main(string[] args)
{
var customerLines = System.IO.File.ReadAllLines(#"Customers.csv");
var orderLines = System.IO.File.ReadAllLines(#"Orders.csv");
var orderItemLines = System.IO.File.ReadAllLines(#"OrderItemLines.csv");
CsvFactory.Register<Customer>(builder =>
{
builder.Add(a => a.Id).Type(typeof(int)).Index(0).IsKey(true);
builder.Add(a => a.Name).Type(typeof(string)).Index(1);
builder.AddNavigation(n => n.Orders).RelationKey<Order, int>(k => k.CustomerId);
}, false, ',', customerLines);
CsvFactory.Register<Order>(builder =>
{
builder.Add(a => a.Id).Type(typeof(int)).Index(0).IsKey(true);
builder.Add(a => a.CustomerId).Type(typeof(int)).Index(1);
builder.Add(a => a.Quantity).Type(typeof(int)).Index(2);
builder.Add(a => a.Amount).Type(typeof(int)).Index(3);
builder.AddNavigation(n => n.OrderItems).RelationKey<OrderItem, int>(k => k.OrderId);
}, true, ',', orderLines);
CsvFactory.Register<OrderItem>(builder =>
{
builder.Add(a => a.Id).Type(typeof(int)).Index(0).IsKey(true);
builder.Add(a => a.OrderId).Type(typeof(int)).Index(1);
builder.Add(a => a.ProductName).Type(typeof(string)).Index(2);
}, false, ',', orderItemLines);
var customers = CsvFactory.Parse<Customer>();
}
}
}

Categories

Resources