c# reading in a text file into datatable - c#

I need to read files that look like this into a DataTable:
A02 BLANK031
B02 F357442
C02 F264977
D02 BLANK037
E02 F272521
F02 E121562
G02 F264972
H02 F332321
A03 E208240
B03 F313854
C03 E229786
D03 E229787
E03 F307584
F03 F357478
I have a weird delimiter and some trailing spaces.
How would I read this into a DataTable such that the first column contains 'A02', 'B02', ... and the second column contains 'BLANK031', 'F357442', etc.?
Currently I am doing:
DataTable dt = new DataTable();
using (TextReader tr = File.OpenText(batchesAddresses[index]))
{
string line;
while ((line = tr.ReadLine()) != null)
{
string[] items = Regex.Split(line, ' ');
if (dt.Columns.Count == 0)
{
// Create the data columns for the data table based on the number of items
// on the first line of the file
for (int i = 0; i < items.Length; i++)
dt.Columns.Add(new DataColumn("Column" + i, typeof(string)));
}
dt.Rows.Add(items);
}
}
But this is not working because I have trailing spaces and multiple spaces between columns.

If you use:
static readonly char[] space = { ' ' };
...
string[] items = line.Split(space, StringSplitOptions.RemoveEmptyEntries);
you should get the 2 values you expect, although something more selective might be desirable, especially if the right-hand-side might contain a space in the middle.
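Putting that split into the loop from the question could look roughly like the sketch below. The class name PlateFileParser, the method ToDataTable and the column names are made up for illustration, and it assumes every useful line has at least two whitespace-separated values and that the second value contains no spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class PlateFileParser   // hypothetical name, not from the question
{
    private static readonly char[] Whitespace = { ' ', '\t' };

    // Builds a two-column DataTable from lines such as "A02   BLANK031  ".
    public static DataTable ToDataTable(IEnumerable<string> lines)
    {
        var dt = new DataTable();
        dt.Columns.Add("Column0", typeof(string));
        dt.Columns.Add("Column1", typeof(string));

        foreach (string line in lines)
        {
            // RemoveEmptyEntries drops the empty strings produced by
            // repeated or trailing delimiters.
            string[] items = line.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);
            if (items.Length < 2)
                continue; // skip blank or malformed lines

            dt.Rows.Add(items[0], items[1]);
        }
        return dt;
    }
}
```

With a file on disk, something like PlateFileParser.ToDataTable(File.ReadLines(batchesAddresses[index])) should work, since File.ReadLines returns the lines lazily as an IEnumerable<string>.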

Change your regex to something like (\w{3})\s+(\w{5,10}). This means: capture 3 word characters (letters, digits, or underscore) into group 1, skip one or more whitespace characters, and then capture 5 to 10 word characters into group 2.
Then do:
// verbatim string (@"...") so the backslashes reach the regex engine unchanged
Regex r = new Regex(@"(\w{3})\s+(\w{5,10})");
Match m = r.Match(line);
if (m.Success)
{
    string col1 = m.Groups[1].Value;
    string col2 = m.Groups[2].Value;
}

The error regarding System.StringSplitOptions seems to be a casting bug in the compiler. Add a line prior to your split statement that defines the desired StringSplitOptions and then use the variable in the split statement.
static readonly char[] space = { ' ' };
static readonly StringSplitOptions options = StringSplitOptions.RemoveEmptyEntries;
...
string[] items = line.Split(space, options);
This should work for all overloads.

Related

How to copy a list in a Word table cell to Excel cell

I have the following test table in Word, with one cell having a multilevel list:
Using the code below, I can copy cells from the Word Table to a corresponding cell in an Excel worksheet:
foreach (Microsoft.Office.Interop.Word.Table table in objDoc.Tables)
{
for (int row = 1; row <= table.Rows.Count; row++)
{
for (int col = 1; col <= table.Columns.Count; col++)
{
string text = table.Cell(row, col).Range.Text;
worksheet.Cells[row, col] = text;
}
}
}
However, I get the following result where the Word cell containing the list is not copied properly into Excel:
I have also tried the following:
worksheet.Cells[row, col] = table.Cell(row, col).Range.FormattedText;
But I get the same results.
I also tried converting the list in the Word file by copying and pasting with Keep Text Only to remove Word's automatic formatting, and manually deleting the tabs. That yielded this result:
Although I can get the text with the list numbers, I do not get a carriage return, line break, or line feed to separate the items in the list.
At the very least, I would like to preserve the list numbering and line breaks without having to manually cut/paste with Keep Text Only; and I want to avoid having to parse the text for the list numbers (which could be numbers or letters) and inserting line feeds.
There are multiple problems involved with achieving the stated result:
Excel doesn't use the same character as Word for new lines or new paragraphs. (In this case it must be new paragraphs since the numbering is being generated.) Excel wants ANSI 10; Word is using ANSI 13. So that needs to be converted.
Automatic Line numbering is formatting. Passing a string loses formatting; it can only be carried across using Copy. Or the numbering has to be converted to plain text.
Another issue is the "dot" at the end of the cell content, which is again ANSI 13 in combination with ANSI 7 (end-of-cell marker). This should also be removed.
The following bit of sample code takes care of all three conversions. (Note: this is VBA code that I've converted off the top of my head, so watch out for small syntax "gotchas")
Word.Range rng = table.Cell(rowCounter, colCounter).Range;
//convert the numbers to plain text, then undo the conversion
rng.ListFormat.ConvertNumbersToText();
string cellContent = rng.Text;
objDoc.Undo(1);
//remove end-of-cell characters
cellContent = TrimCellText2(cellContent);
//replace remaining paragraph marks with the Excel new line character
//(strings are immutable, so the result of Replace must be assigned)
cellContent = cellContent.Replace((char)13, (char)10);
worksheet.Cells[rowCounter, colCounter].Value = cellContent;

//cut off ANSI 13 + ANSI 7 from the end of the string coming from a
//Word table cell
private string TrimCellText2(string s)
{
    return s.TrimEnd((char)13, (char)7);
}
With the help of Cindy Meister, combined with the answer from Paul Walls in this other question for replacing characters in a C# string, here is the resulting answer.
foreach (Microsoft.Office.Interop.Word.Table table in objDoc.Tables)
{
for (int row = 1; row <= table.Rows.Count; row++)
{
for (int col = 1; col <= table.Columns.Count; col++)
{
// Convert the formatted list number to plain text, then undo the conversion
table.Cell(row, col).Range.ListFormat.ConvertNumbersToText();
string cellContent = table.Cell(row, col).Range.Text;
objDoc.Undo(1);
// remove end-of-cell characters
cellContent = trimCellText2(cellContent);
// Replace remaining paragraph marks with the excel newline character
char[] linefeeds = new char[] { '\r', '\n' };
string[] temp1 = cellContent.Split(linefeeds, StringSplitOptions.RemoveEmptyEntries);
cellContent = String.Join("\n", temp1);
// Replace tabs from the list format conversion with spaces
char[] tabs = new char[] { '\t', ' ' };
string[] temp2 = cellContent.Split(tabs, StringSplitOptions.RemoveEmptyEntries);
cellContent = String.Join(" ", temp2);
worksheet.Cells[row, col] = cellContent;
}
}
}
private static string trimCellText2(string myString)
{
    // Strip every trailing ANSI 13 (paragraph mark) and ANSI 7 (end-of-cell marker).
    // TrimEnd is also safe for an empty string.
    return myString.TrimEnd((char)13, (char)7);
}
And here is the resulting output in Excel: Excel Output

How to split text into paragraphs?

I need to split a string into paragraphs and count those paragraphs (paragraphs separated by 2 or more empty lines).
In addition, I need to read each word from the text and be able to tell which paragraph each word belongs to.
For example (each paragraph is more than one line, and two empty lines separate the paragraphs):
This is
the first
paragraph


This is
the second
paragraph


This is
the third
paragraph
Something like this should work for you:
var paragraphMarker = Environment.NewLine + Environment.NewLine;
var paragraphs = fileText.Split(new[] {paragraphMarker},
StringSplitOptions.RemoveEmptyEntries);
foreach (var paragraph in paragraphs)
{
var words = paragraph.Split(new[] {' '},
StringSplitOptions.RemoveEmptyEntries)
.Select(w => w.Trim());
//do something
}
You may need to change the line delimiter, since files can use different variants such as "\n", "\r", or "\r\n".
You can also pass specific characters to the Trim function to remove symbols like '.', ',', '!', '"' and others.
Edit: To add more flexibility you can use regexp for splitting paragraphs:
var paragraphs = Regex.Split(fileText, @"(\r\n?|\n){2}")
.Where(p => p.Any(char.IsLetterOrDigit));
foreach (var paragraph in paragraphs)
{
var words = paragraph.Split(new[] {' '},
StringSplitOptions.RemoveEmptyEntries)
.Select(w => w.Trim());
//do something
}
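For the second half of the question (knowing which paragraph each word belongs to), a minimal sketch could pair every word with a paragraph number. The names ParagraphSplitter and WordsWithParagraph are invented here, and the sketch assumes that "two or more empty lines" means three or more consecutive line breaks with no spaces on the empty lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParagraphSplitter   // hypothetical name
{
    // Three or more consecutive line breaks = at least two empty lines.
    private static readonly Regex ParagraphBreak = new Regex(@"(?:\r\n|\r|\n){3,}");
    private static readonly char[] WordSeparators = { ' ', '\t', '\r', '\n' };

    // Returns (paragraph number, word) pairs; paragraph numbers start at 1.
    public static List<KeyValuePair<int, string>> WordsWithParagraph(string text)
    {
        var result = new List<KeyValuePair<int, string>>();
        int paragraphNumber = 0;
        foreach (string paragraph in ParagraphBreak.Split(text))
        {
            string[] words = paragraph.Split(WordSeparators, StringSplitOptions.RemoveEmptyEntries);
            if (words.Length == 0)
                continue; // ignore empty chunks at the start or end of the text

            paragraphNumber++;
            foreach (string word in words)
                result.Add(new KeyValuePair<int, string>(paragraphNumber, word));
        }
        return result;
    }
}
```

The paragraph count is then simply the Key of the last entry, or 0 if the list is empty.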
I think you want to split the text into paragraphs, but do you have a delimiter that tells you where to split the string? For example, if you want to identify paragraphs by ".", this should do the trick:
string paragraphs="My first paragraph. Once upon a time";
string[] words = paragraphs.Split('.');
foreach (string word in words)
{
Console.WriteLine(word);
}
The result for this will be:
My first paragraph
Once upon a time
Just remember that the "." character is removed by the split.
public static List<string> SplitLine(string isstr, int size = 100)
{
var words = isstr.Split(new[] { ' ' },
StringSplitOptions.RemoveEmptyEntries);
List<string> lo = new List<string>();
string tmp = "";
int i = 0;
for (i = 0; i < words.Length; i++)
{
if ((tmp.Length + words[i].Length) > size)
{
lo.Add(tmp);
tmp = "";
}
tmp += " " + words[i];
}
if (!String.IsNullOrWhiteSpace(tmp))
{
lo.Add(tmp);
}
return lo;
}

How to count only letters in a string?

In the following code I split text into words, insert each word into a table, and count the number of letters in each word.
The problem is that the counter also counts whitespace at the beginning of each line, which gives me the wrong value for some of the words.
How can I count only the letters of each word?
var str = reader1.ReadToEnd();
char[] separators = new char[] {' ', ',', '/', '?'}; //Clean punctuation from copying
var words = str.Split(separators, StringSplitOptions.RemoveEmptyEntries).ToArray(); //Insert all the song words into "words" string
string constring1 = "datasource=localhost;port=3306;username=root;password=123";
using (var conDataBase1 = new MySqlConnection(constring1))
{
conDataBase1.Open();
for (int i = 0; i < words.Length; i++)
{
int numberOfLetters = words[i].ToCharArray().Length; //Calculate the numbers of letters in each word
var songtext = "insert into myproject.words (word_text,word_length) values('" + words[i] + "','" + numberOfLetters + "');"; //Insert words list and length into words table
MySqlCommand cmdDataBase1 = new MySqlCommand(songtext, conDataBase1);
try
{
cmdDataBase1.ExecuteNonQuery();
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
}
This will be a simple and fast way of doing so:
int numberOfLetters = words[i].Count(c => !Char.IsWhiteSpace(c));
Another simple option is to call Trim() first and then do your normal calculation. Note that str.Trim() only strips whitespace at the very start and end of the whole text, so it helps only if that is where the stray whitespace is; for whitespace at the start of every line, trim each word instead.
var words = str.Trim().Split(separators, StringSplitOptions.RemoveEmptyEntries);
Then all you need is (without the redundant conversion):
int numberOfLetters = words[i].Length;
See String.Trim()
int numberOfLetters = words[i].Trim().ToCharArray().Length; //Calculate the numbers of letters in each word
Instead of ' ' use \s+, since it matches one or more whitespace characters at once and therefore splits on any run of whitespace.
Regex.Split(myString, @"\s+");
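If the goal really is to count letters only (ignoring whitespace, digits and punctuation alike), one possible sketch is to count the characters that char.IsLetter accepts. The helper name LetterCounter.CountLetters is invented for this example.

```csharp
using System;
using System.Linq;

public static class LetterCounter   // hypothetical name
{
    // Counts only the characters that char.IsLetter classifies as letters,
    // so whitespace, digits and punctuation are all ignored.
    public static int CountLetters(string word)
    {
        return word.Count(char.IsLetter);
    }
}
```

In the loop from the question this would replace words[i].ToCharArray().Length with LetterCounter.CountLetters(words[i]). Whether an apostrophe or a digit should count is a design decision; this sketch counts neither.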

How to get value specific column value in csv using c#?

I am working on a project in C# WinForms.
I want to get the values of one column in a CSV file.
var reader = new StreamReader(File.OpenRead(@"C:\test.csv"));
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(';');
listA.Add(values[0]);
}
------------------
no |name |
------------------
1 |wwwwww
2 |yyyyy
3 |aaaaa
4 |bbbbbb
This is the code I am using now. It reads the file row by row. I want all of the name values in listA.
Does anyone have an idea?
There is no way to read a single column of a CSV file without reading the whole file. You can use a wrapper (for example, the LINQ to CSV library), but it will just hide the read operation.
Yes - you're currently splitting on ;
Try using a comma instead.
Better to use a dedicated library, by the way.
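For a simple file without quoted fields, picking one column while reading line by line might be sketched as follows. CsvColumnReader, ReadColumn and its parameters are illustrative names, and the sketch assumes a comma delimiter and a header row by default; the name column from the question would then be index 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvColumnReader   // hypothetical name
{
    // Returns the values of one column, assuming simple delimited lines
    // with no quoted fields that contain the delimiter.
    public static List<string> ReadColumn(IEnumerable<string> lines, int columnIndex,
                                          char delimiter = ',', bool hasHeader = true)
    {
        var values = new List<string>();
        foreach (string line in lines.Skip(hasHeader ? 1 : 0))
        {
            string[] fields = line.Split(delimiter);
            if (columnIndex < fields.Length)   // skip short or empty lines
                values.Add(fields[columnIndex].Trim());
        }
        return values;
    }
}
```

Usage could be as short as CsvColumnReader.ReadColumn(File.ReadLines(@"C:\test.csv"), 1). Once fields can contain quoted delimiters, a dedicated CSV library is the safer choice.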
Some frown upon Regex, but I think it provides good flexibility. Here is an example inspired by Adrian Mejia. Basically, you can choose particular characters between which the delimiter is valid in context, i.e. a comma in "hello, world" or 'hello, world' would be valid.
static void Main(string[] args)
{
string csv = "Hello,1,3.5,25,\"speech marks\",'inverted commas'\r\nWorld,2,4,60,\"again, more speech marks\",'something else in inverted commas, with a comma'";
// General way to create grouping constructs which are valid 'text' fields
string p = "{0}([^{0}]*){0}"; // match group '([^']*)' (inverted commas) or \"([^\"]*)\" (speech marks)
string c = "(?<={0}|^)([^{0}]*)(?:{0}|$)"; // commas or other delimiter group (?<=,|^)([^,]*)(?:,|$)
char delimiter = ','; // this can be whatever delimiter you like
string p1 = String.Format(p, "\""); // speechmarks group (0)
string p2 = String.Format(p, "'"); // inverted comma group (1)
string c1 = String.Format(c, delimiter); // delimiter group (2)
/*
* The first capture group will be speech marks ie. "some text, "
* The second capture group will be inverted commas ie. 'this text'
* The third is everything else seperated by commas i.e. this,and,this will be [this][and][this]
* You can extend this to customise delimiters that represent text where a comma between is a valid entry eg. "this text, complete with a pause, is perfectly valid"
*
* */
//string pattern = "\"([^\"]*)\"|'([^']*)'|(?<=,|^)([^,]*)(?:,|$)";
string pattern = String.Format("{0}|{1}|{2}", new object[] { p1, p2, c1 }); // The actual pattern to match based on groups
string text = csv;
// If you're reading from a text file then this will do the trick. Uses the ReadToEnd() to put the whole file to a string.
//using (TextReader tr = new StreamReader("PATH TO MY CSV FILE", Encoding.ASCII))
//{
// text = tr.ReadToEnd(); // just read the whole stream
//}
string[] lines = text.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries); // if you have a blank line just remove it?
Regex regex = new Regex(pattern, RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase); // compile for speed
List<object> rowsOfColumns = new List<object>();
foreach (string row in lines)
{
List<string> columns = new List<string>();
// Find matches.
MatchCollection matches = regex.Matches(row);
foreach (Match match in matches)
{
for (int ii = 0; ii < match.Groups.Count; ii++)
{
if (match.Groups[ii].Success) // ignore things that don't match
{
columns.Add(match.Groups[ii].Value.TrimEnd(new char[] { delimiter })); // strip the delimiter
break;
}
}
}
// Do something with your columns here (add to List for example)
rowsOfColumns.Add(columns);
}
}
var reader = new StreamReader(File.OpenRead(@"C:\test.csv"));
List<string> listA = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
string[] dates = line.Split(',');
for (int i = 0; i < dates.Length; i++)
{
if(i==0)
listA.Add(dates[0]);
}
}

Replace placeholders in order

I have a part of a URL like this:
/home/{value1}/something/{anotherValue}
Now I want to replace everything between the braces, including the braces themselves, with values from a string array.
I tried this regex pattern: \{[a-zA-Z_]\} but it doesn't work.
Later (in C#) I want to replace the first match with the first value of the array, the second with the second, and so on.
Update: The /'s can't be used to separate. Only the placeholders {...} should be replaced.
Example: /home/before{value1}/and/{anotherValue}
String array: {"Tag", "1"}
Result: /home/beforeTag/and/1
I hoped it could work like this:
string input = @"/home/before{value1}/and/{anotherValue}";
string pattern = @"\{[a-zA-Z_]\}";
string[] values = {"Tag", "1"};
MatchCollection mc = Regex.Match(input, pattern);
for(int i, ...)
{
mc.Replace(values[i];
}
string result = mc.GetResult;
Edit:
Thank you Devendra D. Chavan and ipr101, both solutions are great!
You can try this code fragment,
// Begin with '{' followed by any number of word like characters and then end with '}'
var pattern = @"{\w*}";
var regex = new Regex(pattern);
var replacementArray = new [] {"abc", "cde", "def"};
var sourceString = @"/home/{value1}/something/{anotherValue}";
var matchCollection = regex.Matches(sourceString);
for (int i = 0; i < matchCollection.Count && i < replacementArray.Length; i++)
{
sourceString = sourceString.Replace(matchCollection[i].Value, replacementArray[i]);
}
[a-zA-Z_] describes a character class that matches a single character. For whole words, you have to add * at the end (any number of characters within a-zA-Z_).
Then, to have 'value1' matched, you need to add digit support: [a-zA-Z0-9_]*, which can be abbreviated as \w*
So try this one: {\w*}
But for replacing in C#, string.Split('/') might be easier, as Fredrik proposed.
You could use a delegate, something like this -
string[] strings = {"dog", "cat"};
int counter = -1;
string input = @"/home/{value1}/something/{anotherValue}";
Regex reg = new Regex(@"\{([a-zA-Z0-9]*)\}");
string result = reg.Replace(input, delegate(Match m) {
counter++;
// return only the replacement value so the braces are removed
return strings[counter];
});
My two cents:
// input string
string txt = "/home/{value1}/something/{anotherValue}";
// template replacements
string[] str_array = { "one", "two" };
// regex to match a template
Regex regex = new Regex("{[^}]*}");
// replace the first template occurrence for each element in array
foreach (string s in str_array)
{
txt = regex.Replace(txt, s, 1);
}
Console.Write(txt);
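A variation on the answers above, as a sketch only: a single Regex.Replace call with a match evaluator that walks through the array and leaves any surplus placeholders untouched instead of throwing. PlaceholderReplacer and ReplaceInOrder are hypothetical names, and leaving extra placeholders unchanged is an assumption about the desired behaviour.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlaceholderReplacer   // hypothetical name
{
    // A '{', then any characters that are not braces, then a '}'.
    private static readonly Regex Placeholder = new Regex(@"\{[^{}]*\}");

    // Replaces the n-th {placeholder} with values[n]; placeholders beyond
    // the end of the array are left unchanged.
    public static string ReplaceInOrder(string input, string[] values)
    {
        int index = 0;
        return Placeholder.Replace(input, match =>
            index < values.Length ? values[index++] : match.Value);
    }
}
```

Because the evaluator is called once per match, from left to right, the first placeholder gets the first value, the second gets the second, and identical placeholder names are still replaced one at a time.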
