Skipping Lines with Dashes in a text file with regex in c#

I have a text file with SQL commands. I've written some code to "ignore" the comments and blank spaces in order to get just the commands (I will post the code below, along with a sample of the text file and the output). That works fine, but the text file also contains lines such as "-----------------------------------" that I need to ignore. I've written code to ignore them, but I can't figure out why it doesn't work properly.
Code:
public string[] Parser(string caminho)
{
    string text = File.ReadAllText(caminho);
    var Linha = Regex.Replace(text, @"\/\**?\*\/", " ");
    var Commands = Linha.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
        .Where(line => !string.IsNullOrWhiteSpace(line))
        .Where(line => !Regex.IsMatch(line, @"^[\s\-]+$"))
        .ToArray();
    return Commands;
}
This is the .Where I added to "ignore" the dashed lines:
.Where(line => !Regex.IsMatch(line, @"^[\s-]+$"))
Sample of text with the dashes:
/
---------------------------------------------------------------------
UPDATE CDPREPORTSQL
SET COMANDOSQL_FROM =
'SELECT DESCONTO,EMPCOD,EMPDSC,LINVER,NOMESISTEMA,OBS,ORCCOD,ORCVER,PEDCOD,PEDDSC,
ROUND(PRCUNIT*#CAMBIO#,5) PRCUNIT,
ROUND(PRCUNITSEMDESC*#CAMBIO#,5) PRCUNITSEMDESC,
PROPCHECK,QTDGLOB,QTDPROP,REFCOD,REFDSC,EMPCODVER, COEFGERAL_PLT FROM #OWNER#.VW_PROPOSTAS',
COMANDOSQL_WHERE =
'WHERE ORCCOD=#ORCCOD# AND ORCVER=#ORCVER# AND NOMESISTEMA=#NOMESISTEMA# AND PEDCOD=#MYCOD#'
WHERE REPID = 'CDP0000057'
/
---------------------------------------------------------------------
Sample of the output:
---------------------------------------------------------------------
UPDATE CDPREPORTSQL
SET COMANDOSQL_FROM =
'SELECT DESCONTO,EMPCOD,EMPDSC,LINVER,NOMESISTEMA,OBS,ORCCOD,ORCVER,PEDCOD,PEDDSC,
ROUND(PRCUNIT*#CAMBIO#,5) PRCUNIT,
ROUND(PRCUNITSEMDESC*#CAMBIO#,5) PRCUNITSEMDESC,
PROPCHECK,QTDGLOB,QTDPROP,REFCOD,REFDSC,EMPCODVER, COEFGERAL_PLT FROM #OWNER#.VW_PROPOSTAS',
COMANDOSQL_WHERE =
'WHERE ORCCOD=#ORCCOD# AND ORCVER=#ORCVER# AND NOMESISTEMA=#NOMESISTEMA# AND PEDCOD=#MYCOD#'
WHERE REPID = 'CDP0000057'
---------------------------------------------------------------------
These are the examples of statements that can occur and that I need to process:
/* */
UPDATE Orc
/*UPDATE comando */
set MercadoInt = 'N', Coef_KrMo = 1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/
Another one:
/* */
---- comment
UPDATE Orc set MercadoInt = 'N', Coef_KrMo =
-1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/
And another one:
/* */
UPDATE Orc set MercadoInt = 'N', Coef_KrMo = 1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/
Note that I need to process them even if there are commented sections in the middle of the statement.
Note that everything else is working fine (it "ignores" the comments and blank spaces).
The '/' is just there to separate the commands in the text file.

All this seems rather complex and slow. If you just want to find/reject lines of dashes, why not use:
if (line.StartsWith("----"))
(Assuming that 4 dashes is sufficient to detect such lines unambiguously)
If there may be whitespace at the start of the line, then:
if (line.Trim().StartsWith("----"))
Not only is this approach infinitely more readable than regex, it'll most probably be much faster.
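To make the suggestion concrete, here is a minimal sketch that applies the StartsWith check per line rather than per '/'-separated block, since each block in the question spans several lines. It is not the asker's code: the names DashLineFilter and RemoveDashLines are made up for illustration, and the four-dash threshold is the assumption stated above.

```csharp
using System;
using System.Linq;

public static class DashLineFilter
{
    // Hypothetical helper: drops every line that, once trimmed, starts with "----".
    // Lines such as "-1, Coef_KrMt = 1" are kept because they start with a single dash.
    public static string RemoveDashLines(string block)
    {
        var kept = block
            .Split('\n')
            .Select(line => line.TrimEnd('\r'))   // tolerate Windows line endings
            .Where(line => !line.Trim().StartsWith("----", StringComparison.Ordinal));
        return string.Join("\n", kept);
    }
}
```

Calling RemoveDashLines on each block produced by the Split in the question would also drop lines like "---- comment", which appears to be what the question wants, but that is worth verifying against the real file.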

The code below works on the examples you gave.
private const string DashComment = @"(^|\s+)--.*(\n|$)";
private const string SlashStarComment = @"\/\*.*?\*\/";
private string[] CommandSplitter(string text)
{
    // strip /* ... */ comments ('.' does not match a newline here, so only single-line comments are removed)
    var strip1 = Regex.Replace(text, SlashStarComment, " ", RegexOptions.Multiline);
    // strip -- ... comments up to the end of the line
    var strip2 = Regex.Replace(strip1, DashComment, "\n", RegexOptions.Multiline);
    // split into individual commands separated by '/'
    var commands = strip2.Split(new[] {'/'}, StringSplitOptions.RemoveEmptyEntries);
    return commands.Where(line => !String.IsNullOrWhiteSpace(line))
        .ToArray();
}
I took the three examples you posted in your question and put them in a single string. It looks like this (yeah, it's ugly):
private const string Test1 = @"/* */
UPDATE Orc
/*UPDATE comando */
set MercadoInt = 'N', Coef_KrMo = 1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/
/* */
---- comment
UPDATE Orc set MercadoInt = 'N', Coef_KrMo =
-1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/
/* */
UPDATE Orc set MercadoInt = 'N', Coef_KrMo = 1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/";
Then, I called the CommandSplitter:
var result = CommandSplitter(Test1);
And output the results:
foreach (var t in result)
{
Console.WriteLine(t);
Console.WriteLine("////////////////////////");
}
That removed the /* ... */ comments and the -- ... comments.
It also worked on this example:
private const string Test2 =
"Update Orc set /* this is a comment */ MercadoInt = 'N' -- this is another comment\n" +
"Where MercadoInt is NULL --another comment";
The output:
Update Orc set MercadoInt = 'N'
Where MercadoInt is NULL
Update:
The code above returns an array of commands. Each command consists of multiple lines. If you want to remove extraneous spaces at the beginning of lines, and eliminate blank lines, then you have to process each individual command separately. So you'd want to extend the CommandSplitter like this:
private string[] CommandSplitter(string text)
{
// strip /* ... */ comments
var strip1 = Regex.Replace(text, SlashStarComment, " ", RegexOptions.Multiline);
var strip2 = Regex.Replace(strip1, DashComment, "\n", RegexOptions.Multiline);
// split into individual commands separated by '/'
var commands = strip2.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
return commands.Select(cmd => cmd.Split(new[] {'\n'})
.Select(l => l.Trim()))
.Select(lines => string.Join("\n", lines.Where(l => !string.IsNullOrWhiteSpace(l))))
.ToArray();
}
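One caveat worth checking against your own files: RegexOptions.Multiline only changes how ^ and $ behave; it does not make . match a newline. A /* ... */ comment that spans several lines would therefore survive the SlashStarComment pattern above. The sketch below uses an equivalent pattern with RegexOptions.Singleline for that case. The class and method names are hypothetical, and it assumes "/*" never occurs inside a SQL string literal.

```csharp
using System.Text.RegularExpressions;

public static class BlockCommentStripper
{
    // Equivalent to SlashStarComment; Singleline makes '.' match '\n' as well,
    // so comments that span several lines are removed too.
    private static readonly Regex BlockComment =
        new Regex(@"/\*.*?\*/", RegexOptions.Singleline);

    // Replaces each /* ... */ comment with a single space.
    public static string Strip(string sql)
    {
        return BlockComment.Replace(sql, " ");
    }
}
```

The lazy quantifier (.*?) matters here: with a greedy .* and Singleline, everything between the first "/*" and the last "*/" in the file would be removed.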

From what I understand, you have a text file with multiple SQL commands, separated by:
/
---------------------------------------------------------------------
And you only want the text in between these dashes. If so, why not split the text with Regex.Split, then get out all the elements?
This regex seems to work:
\/\n\n-+
Based on the Regex.Split documentation, the code would be:
string input = File.ReadAllText(caminho);
string pattern = @"\/\n\n-+"; // verbatim string: "\/" is not a valid escape in a regular C# string
string[] substrings = Regex.Split(input, pattern);
foreach (string match in substrings)
{
    // do cool stuff with your cool query
}
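The pattern above expects exactly "\n\n" between the slash and the dashes, so it may not match files with Windows line endings or without a blank line. As a hedged variation, the sketch below accepts any whitespace between the two, as long as the dashes start on a new line. The names CommandFile and SplitCommands are invented for this example, and the separator shape is an assumption based on the sample in the question.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CommandFile
{
    // Assumed separator: '/', optional whitespace, a line break, then a run of dashes.
    private static readonly Regex Separator = new Regex(@"/\s*\n-+");

    // Splits the file text on the separator and drops empty pieces.
    public static string[] SplitCommands(string text)
    {
        return Separator.Split(text)
            .Select(part => part.Trim())
            .Where(part => part.Length > 0)
            .ToArray();
    }
}
```

A final command whose closing '/' is not followed by a dashed line keeps that trailing '/', so it may need one more trim step.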

If you don't want to use regex, you could also use !line.TrimStart().StartsWith("-"); it should give the same result, and I think it is faster.

I've written the code like this; so far it is working well.
public string[] Parser(string caminho)
{
List<string> Commands2 = new List<string>();
string text = File.ReadAllText(caminho);
var Linha = Regex.Replace(text, @"\/\**?\*\/", " ");
var Commands = Linha.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
.Where(line => !string.IsNullOrWhiteSpace(line))
.Where(line => !Regex.IsMatch(line, @"^[\s\-]+$"))
.ToArray();
Commands2 = Commands.ToList();
for(int idx = 0; idx < Commands2.Count; idx ++)
{
if (Commands2[idx].TrimStart().StartsWith("-"))
{
string linha = Commands2[idx];
string linha2 = linha.Remove(linha.IndexOf('-'), linha.LastIndexOf('-') - 1);
Commands2[idx] = linha2;
}
}
//test the output to a .txt file
StreamWriter Comandos = new StreamWriter(Directory.GetParent(caminho).ToString() + "Out.txt", false);
foreach (string linha in Commands2)
{
Comandos.Write(linha);
}
Comandos.Close();
return Commands2.ToArray();
}
After they analyzed my code, they said that I can't use this (as mentioned above) because it won't work in some cases, such as comments in the middle of the statements.
I will now try doing it with TSql120Parser.

Related

How to align columns of different length in a txt file, based on max length of each column?

I have a txt file with 18 columns, where text values are enclosed in single quotes ('') and columns are separated by commas. Each line represents a row of values for an INSERT statement of a SQLite query:
(1999,1999,1999,1999,1999,0,0,'flaggr.png',261, 'Βάκχειος', 'Spl-up','B ', 'Pagrati/Athens,Attica,Greece', 'N/A', 'Hellenic Mythology', '','', ''),
(2000,2000,2000,2000,2000,0,2010,'flagru.png',3340, 'Анклав Снов', 'Act', 'G/D ', 'Bryansk,Russia', '2008-2009(as Vampire''s Crypt),2010-present', 'N/A', '','', ''),
(2001,2001,2001,2001,2001,0,2002,'flagru.png',271, 'Аркона', 'Act','P/FO ', 'Moscow,Russia', '2002(as Гиперборея),2002-present', 'Slavic Pism and FOtales, Legends, Mythology', '', '', ''),
(2002,2002,2002,2002,2002,0,1988,'flagru.png',470, 'Аспид', 'Spl-up','PROG ', 'Volgodonsk,Rostovregion,Russia', '1988-1997,2010-?', 'Politics, Horror, Death', '', '', ''),
(2003,2003,2003,2003,2003,0,2000,'flagua.png',359, 'Ірій', 'Unknown','FO D /G ', 'Lviv,Ukraine', '2000-?', 'Slavic mythology, Ukrainian FOlore', '', '', ''),
(2004,2004,2004,2004,2004,0,2011,'flagru.png',3036579, 'Лесьяр', 'Act','P FO ', 'Moscow,Russia', '2011-present', 'Pism, FOlore, Social matters, Feelings', '', '', ''),
(2005,2005,2005,2005,2005,0,2003,'flagru.png',218, 'М8Л8ТХ', 'Act','B with RAC', 'Tver,Ukraine(posterior),Russia', '2003-present', 'National Pride, National Socialism, Hatred, War, Intolerance, Pism', '', '', ''),
(2006,2006,2006,2006,2006,0,0,'flagru.png',354037, 'Рельос', 'Act','PR/POST-/ (early), G/POST-, Ambient (later)', 'Baltiisk,Kaliningradregion,Russia', 'N/A', 'N/A', '', '',''),
(2007,2007,2007,2007,2007,0,2006,'flagru.png',32937, 'Сивый Яр', 'Act','P/POST-B ', 'Vyritsa,Leningradregion,Russia', '2006-present', 'Pism, Pride, Heritage, Poetry, Slavonic Mythology', '', '', ''),
(2008,2008,2008,2008,2008,0,2001,'flagru.png',44, 'Темнозорь', 'Act','FO/B ', 'Moscow,Russia', '2001-present', 'Nature, Slavonic Pism, War, Right-wing nationalism', '4394', '', ''),
(2009,2009,2009,2009,2009,0,1993,'flagru.png',80, 'Эпидемия', 'Act','Pow ', 'Moscow,Russia', '1993-present', 'Fantasy, Tolkien, Elves', '', '', ''),
(2010,2010,2010,2010,2010,0,0,'flagjp.png',354039, 'こくまろみるく', 'Act','G/Pow ', 'N/A,Japan', 'N/A', 'Bizarre, Macabre', '', '', ''),
(2011,2011,2011,2011,2011,0,2012,'flagus.png',38723, 'מזמור', 'Act','B/Drone/D ', 'Portland,Oregon,United States', '2012-present', 'N/A', '', '', ''),
(2012,2012,2012,2012,2012,0,2004,'flaglb.png',67, 'دمار', 'Spl-up','B/Death ', 'Hamra,Beirut,Lebanon', '2004-2006', 'War, Pride, Blasphemy, Supremacy', '', '', ''),
(2013,2013,2013,2013,2013,0,2006,'flagcn.png',760, '原罪', 'Act','B (early), G/B (later)', 'Chengdu,SichuanProvince,China', '2006-present', 'Misanthropy, Hatred, Depression, War, Revelation', '', '', ''),
(2014,2014,2014,2014,2014,0,1995,'flagtw.png',443, '閃靈', 'Act','Melodic B/Death/FO ', 'Taipei,Taiwan', '1995-present', 'Taiwanese Myths and Legends, Anti-Fascism, History', '4443', '', ''),
(2015,2015,2015,2015,2015,0,2001,'flagjp.png',31450, '電気式華憐音楽集団', 'Act','Pow/G', 'N/A,Japan', '2001-present', 'Anime, Fantasy, Liberty', '', '', '');
What would be the best way to align all columns so for instance the first two rows become:
(1999,1999,1999,1999,1999,0,0, 'flaggr.png',261, 'Βάκχειος', 'Spl-up', 'B ', 'Pagrati/Athens,Attica,Greece', 'N/A', 'Hellenic Mythology', '','', ''),
(2000,2000,2000,2000,2000,0,2010,'flagru.png',3340, 'Анклав Снов', 'Act', 'G/D ', 'Bryansk,Russia', '2008-2009(as Vampire''s Crypt),2010-present', 'N/A', '','', ''),
I was thinking of:
Split all line strings in the file using comma as separator
Compute each column's max length and store it in memory
Loop over the file again, but this time use the computed max lengths and write the output
The code I came up with is something like the following. However, I realized there is one issue: some columns have commas inside the single quotes, like 'bla1,bla2,bla3' (columns 12 to 18 could have inner commas...), so if I split the string using comma, I will not get 18 columns.
After hitting that problem, I do not know how to continue...
What would be the way to split by comma while taking the single quotes of some strings into account?
private static void AdjustColumnsInFile(string filePath, string outputFile)
{
//array to store max size of each column
int[] sizes = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
foreach (var line in File.ReadLines(filePath))
{
var words = line.Split(',');
if (words.Length == 18)
{
var i = 0;
//get max value of each column
foreach (var word in words)
{
sizes[i] = sizes[i] < word.Length ? word.Length : sizes[i];
i++;
}
}
}
...
using (var sw = new StreamWriter(outputFile))
{
foreach (var l in newLines)
{
sw.WriteLine($"{l}");
}
}
}

As I understand it, your only problem is how to split a string on commas, given that some commas might appear inside '' quotes. You can do that with a regular expression:
,(?=(?:[^\']*\'[^\']*\')*[^\']*$)
It basically matches a comma that is followed by zero or an even number of quotes ('). If a comma appears inside '' quotes, then in a valid string it is followed by an odd number of quotes, so it will not match.
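As a quick check of that parity argument, here is a small self-contained sketch that applies the same lookahead to a shortened row. The class and method names (QuotedSplit, SplitColumns) are illustrative only and not part of the code that follows.

```csharp
using System.Text.RegularExpressions;

public static class QuotedSplit
{
    // Matches commas followed by an even number of single quotes up to the end
    // of the input, that is, commas that sit outside '...' literals.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(@",(?=(?:[^']*'[^']*')*[^']*$)");

    public static string[] SplitColumns(string row)
    {
        return CommaOutsideQuotes.Split(row);
    }
}
```

For the input 1999,'flaggr.png', 'Pagrati/Athens,Attica,Greece' this yields three columns rather than five. A doubled quote such as Vampire''s adds two quotes, so it does not disturb the even/odd count. Because the lookahead scans to the end of the row for every comma, the cost grows roughly quadratically with row length, which should be fine for rows of this size.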
The rest should be easy, first calculate sizes:
//array to store max size of each column
int[] sizes = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
foreach (var line in File.ReadLines(filePath)) {
var tmp = line.Trim(); // remove leading and trailing whitespace
tmp = tmp.Remove(tmp.Length - 2, 2); // remove closing ) and , or ;
tmp = tmp.Remove(0, 1); // remove opening (
// split by comma
var words = Regex.Split(tmp, @",(?=(?:[^\']*\'[^\']*\')*[^\']*$)");
if (words.Length == 18) {
for (int i = 0; i < words.Length; i++) {
var word = words[i].Trim(); // remove whitespace
sizes[i] = sizes[i] < word.Length ? word.Length : sizes[i];
}
}
else throw new Exception("Invalid number of columns");
}
Then repeat and append spaces to columns which do not match expected size:
using (var writer = new StreamWriter(outputFile)) {
foreach (var line in File.ReadLines(filePath)) {
var tmp = line.Trim(); // remove trailing whitespace
bool hadTrailingComma = tmp.EndsWith(",");
tmp = tmp.Remove(tmp.Length - 2, 2); // remove closing ) and , or ;
tmp = tmp.Remove(0, 1); // remove opening (
var words = Regex.Split(tmp, @",(?=(?:[^\']*\'[^\']*\')*[^\']*$)");
var newLine = String.Join(",", words.Select((w, i) =>
{
w = w.Trim();
var targetSize = sizes[i];
if (w.Length < targetSize)
return w + new string(' ', targetSize - w.Length); // append spaces until max length
return w;
}));
writer.WriteLine($"({newLine}){(hadTrailingComma ? "," : ";")}");
}
}
Note that because of unicode characters such as こくまろみるく your output file might appear not aligned correctly, while in reality it is (that is - each column has the same size in characters).

C# split text when delimiter may be in values [duplicate]

Given
2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,"Corvallis, OR",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34
How to use C# to split the above information into strings as follows:
2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34
As you can see, one of the columns contains a comma (Corvallis, OR).
Based on
C# Regex Split - commas outside quotes
string[] result = Regex.Split(samplestring, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

Use the Microsoft.VisualBasic.FileIO.TextFieldParser class. This will handle parsing a delimited file, TextReader or Stream where some fields are enclosed in quotes and some are not.
For example:
using System;                       // Console
using System.IO;                    // StringReader
using Microsoft.VisualBasic.FileIO; // TextFieldParser
string csv = "2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,\"Corvallis, OR\",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
TextFieldParser parser = new TextFieldParser(new StringReader(csv));
// You can also read from a file
// TextFieldParser parser = new TextFieldParser("mycsvfile.csv");
parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
string[] fields;
while (!parser.EndOfData)
{
fields = parser.ReadFields();
foreach (string field in fields)
{
Console.WriteLine(field);
}
}
parser.Close();
This should result in the following output:
2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34
See Microsoft.VisualBasic.FileIO.TextFieldParser for more information.
You need to add a reference to Microsoft.VisualBasic in the Add References .NET tab.

This is very late, but it may be helpful for someone. We can use a regex as below.
Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
String[] Fields = CSVParser.Split(Test);

I see that if you paste csv delimited text in Excel and do a "Text to Columns", it asks you for a "text qualifier". It's defaulted to a double quote so that it treats text within double quotes as literal. I imagine that Excel implements this by going one character at a time, if it encounters a "text qualifier", it keeps going to the next "qualifier". You can probably implement this yourself with a for loop and a boolean to denote if you're inside literal text.
public string[] CsvParser(string csvText)
{
List<string> tokens = new List<string>();
int last = -1;
int current = 0;
bool inText = false;
while(current < csvText.Length)
{
switch(csvText[current])
{
case '"':
inText = !inText; break;
case ',':
if (!inText)
{
tokens.Add(csvText.Substring(last + 1, (current - last)).Trim(' ', ','));
last = current;
}
break;
default:
break;
}
current++;
}
if (last != csvText.Length - 1)
{
tokens.Add(csvText.Substring(last+1).Trim());
}
return tokens.ToArray();
}

You could split on all commas that have an even number of quotes following them.
You may also want to look at the spec for the CSV format regarding the handling of commas.
Useful link: C# Regex Split - commas outside quotes
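A minimal sketch of that idea for double-quoted CSV follows. It also removes the enclosing quotes, which the bare split leaves in place. The names CsvLineSplitter, Split and Unquote are assumptions made for this example, and the input is assumed to be a single well-formed line (balanced quotes, no embedded line breaks).

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CsvLineSplitter
{
    // A comma is a delimiter when an even number of double quotes follows it.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] Split(string line)
    {
        return CommaOutsideQuotes.Split(line)
            .Select(field => Unquote(field))
            .ToArray();
    }

    // Removes one pair of enclosing quotes and turns "" back into " (the usual CSV escape).
    private static string Unquote(string field)
    {
        if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
        {
            return field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
        }
        return field;
    }
}
```

For files with quoted fields that span several lines, or with unbalanced quotes, a dedicated parser is likely the safer choice.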

Use a library like LumenWorks to do your CSV reading. It'll handle fields with quotes in them and will likely overall be more robust than your custom solution by virtue of having been around for a long time.

It is a tricky matter to parse .csv files when the .csv file could be either comma separated strings, comma separated quoted strings, or a chaotic combination of the two. The solution I came up with allows for any of the three possibilities.
I created a method, ParseCsvRow() which returns an array from a csv string. I first deal with double quotes in the string by splitting the string on double quotes into an array called quotesArray. Quoted string .csv files are only valid if there is an even number of double quotes. Double quotes in a column value should be replaced with a pair of double quotes (This is Excel's approach). As long as the .csv file meets these requirements, you can expect the delimiter commas to appear only outside of pairs of double quotes. Commas inside of pairs of double quotes are part of the column value and should be ignored when splitting the .csv into an array.
My method will test for commas outside of double quote pairs by looking only at even indexes of the quotesArray. It also removes double quotes from the start and end of column values.
public static string[] ParseCsvRow(string csvrow)
{
const string obscureCharacter = "ᖳ";
if (csvrow.Contains(obscureCharacter)) throw new Exception("Error: csv row may not contain the " + obscureCharacter + " character");
var unicodeSeparatedString = "";
var quotesArray = csvrow.Split('"'); // Split string on double quote character
if (quotesArray.Length > 1)
{
for (var i = 0; i < quotesArray.Length; i++)
{
// CSV must use double quotes to represent a quote inside a quoted cell
// Quotes must be paired up
// Test if a comma lays outside a pair of quotes. If so, replace the comma with an obscure unicode character
if (i % 2 == 0) // even index: this segment lies outside a pair of quotes
{
var s = quotesArray[i].Trim();
switch (s)
{
case ",":
quotesArray[i] = obscureCharacter; // Change quoted comma separated string to quoted "obscure character" separated string
break;
}
}
// Build string and Replace quotes where quotes were expected.
unicodeSeparatedString += (i > 0 ? "\"" : "") + quotesArray[i].Trim();
}
}
else
{
// String does not have any pairs of double quotes. It should be safe to just replace the commas with the obscure character
unicodeSeparatedString = csvrow.Replace(",", obscureCharacter);
}
var csvRowArray = unicodeSeparatedString.Split(obscureCharacter[0]);
for (var i = 0; i < csvRowArray.Length; i++)
{
var s = csvRowArray[i].Trim();
if (s.StartsWith("\"") && s.EndsWith("\""))
{
csvRowArray[i] = s.Length > 2 ? s.Substring(1, s.Length - 2) : ""; // Remove start and end quotes.
}
}
return csvRowArray;
}
One downside of my approach is the way I temporarily replace delimiter commas with an obscure unicode character. This character needs to be so obscure, it would never show up in your .csv file. You may want to put more handling around this.

This question and its duplicates have a lot of answers. I tried this one that looked promising, but found some bugs in it. I heavily modified it so that it would pass all of my tests.
/// <summary>
/// Returns a collection of strings that are derived by splitting the given source string at
/// characters given by the 'delimiter' parameter. However, a substring may be enclosed between
/// pairs of the 'qualifier' character so that instances of the delimiter can be taken as literal
/// parts of the substring. The method was originally developed to split comma-separated text
/// where quotes could be used to qualify text that contains commas that are to be taken as literal
/// parts of the substring. For example, the following source:
/// A, B, "C, D", E, "F, G"
/// would be split into 5 substrings:
/// A
/// B
/// C, D
/// E
/// F, G
/// When enclosed inside of qualifiers, the literal for the qualifier character may be represented
/// by two consecutive qualifiers. The two consecutive qualifiers are distinguished from a closing
/// qualifier character. For example, the following source:
/// A, "B, ""C"""
/// would be split into 2 substrings:
/// A
/// B, "C"
/// </summary>
/// <remarks>Originally based on: https://stackoverflow.com/a/43284485/2998072</remarks>
/// <param name="source">The string that is to be split</param>
/// <param name="delimiter">The character that separates the substrings</param>
/// <param name="qualifier">The character that is used (in pairs) to enclose a substring</param>
/// <param name="toTrim">If true, then whitespace is removed from the beginning and end of each
/// substring. If false, then whitespace is preserved at the beginning and end of each substring.
/// </param>
public static List<String> SplitQualified(this String source, Char delimiter, Char qualifier,
Boolean toTrim)
{
// Avoid throwing exception if the source is null
if (String.IsNullOrEmpty(source))
return new List<String> { "" };
var results = new List<String>();
var result = new StringBuilder();
Boolean inQualifier = false;
// The algorithm is designed to expect a delimiter at the end of each substring, but the
// expectation of the caller is that the final substring is not terminated by delimiter.
// Therefore, we add an artificial delimiter at the end before looping through the source string.
String sourceX = source + delimiter;
// Loop through each character of the source
for (var idx = 0; idx < sourceX.Length; idx++)
{
// If current character is a delimiter
// (except if we're inside of qualifiers, we ignore the delimiter)
if (sourceX[idx] == delimiter && inQualifier == false)
{
// Terminate the current substring by adding it to the collection
// (trim if specified by the method parameter)
results.Add(toTrim ? result.ToString().Trim() : result.ToString());
result.Clear();
}
// If current character is a qualifier
else if (sourceX[idx] == qualifier)
{
// ...and we're already inside of qualifier
if (inQualifier)
{
// check for double-qualifiers, which is escape code for a single
// literal qualifier character.
if (idx + 1 < sourceX.Length && sourceX[idx + 1] == qualifier)
{
idx++;
result.Append(sourceX[idx]);
continue;
}
// Since we found only a single qualifier, that means that we've
// found the end of the enclosing qualifiers.
inQualifier = false;
continue;
}
else
// ...we found an opening qualifier
inQualifier = true;
}
// If current character is neither qualifier nor delimiter
else
result.Append(sourceX[idx]);
}
return results;
}
Here are the test methods to prove that it works:
[TestMethod()]
public void SplitQualified_00()
{
// Example with no substrings
String s = "";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "" }, substrings);
}
[TestMethod()]
public void SplitQualified_00A()
{
// just a single delimiter
String s = ",";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "", "" }, substrings);
}
[TestMethod()]
public void SplitQualified_01()
{
// Example with no whitespace or qualifiers
String s = "1,2,3,1,2,3";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_02()
{
// Example with whitespace and no qualifiers
String s = " 1, 2 ,3, 1 ,2\t, 3 ";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_03()
{
// Example with whitespace and no qualifiers
String s = " 1, 2 ,3, 1 ,2\t, 3 ";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(
new List<String> { " 1", " 2 ", "3", " 1 ", "2\t", " 3 " },
substrings);
}
[TestMethod()]
public void SplitQualified_04()
{
// Example with no whitespace and trivial qualifiers.
String s = "1,\"2\",3,1,2,\"3\"";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
s = "\"1\",\"2\",3,1,\"2\",3";
substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_05()
{
// Example with no whitespace and qualifiers that enclose delimiters
String s = "1,\"2,2a\",3,1,2,\"3,3a\"";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2,2a", "3", "1", "2", "3,3a" },
substrings);
s = "\"1,1a\",\"2,2b\",3,1,\"2,2c\",3";
substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1,1a", "2,2b", "3", "1", "2,2c", "3" },
substrings);
}
[TestMethod()]
public void SplitQualified_06()
{
// Example with qualifiers enclosing whitespace but no delimiter
String s = "\" 1 \",\"2 \",3,1,2,\"\t3\t\"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" },
substrings);
}
[TestMethod()]
public void SplitQualified_07()
{
// Example with qualifiers enclosing whitespace but no delimiter
String s = "\" 1 \",\"2 \",3,1,2,\"\t3\t\"";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", "2 ", "3", "1", "2", "\t3\t" },
substrings);
}
[TestMethod()]
public void SplitQualified_08()
{
// Example with qualifiers enclosing whitespace but no delimiter; also whitespace btwn delimiters
String s = "\" 1 \", \"2 \" , 3,1, 2 ,\" 3 \"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" },
substrings);
}
[TestMethod()]
public void SplitQualified_09()
{
// Example with qualifiers enclosing whitespace but no delimiter; also whitespace btwn delimiters
String s = "\" 1 \", \"2 \" , 3,1, 2 ,\" 3 \"";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 ", " 3", "1", " 2 ", " 3 " },
substrings);
}
[TestMethod()]
public void SplitQualified_10()
{
// Example with qualifiers enclosing whitespace and delimiter
String s = "\" 1 \",\"2 , 2b \",3,1,2,\" 3,3c \"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2 , 2b", "3", "1", "2", "3,3c" },
substrings);
}
[TestMethod()]
public void SplitQualified_11()
{
// Example with qualifiers enclosing whitespace and delimiter; also whitespace btwn delimiters
String s = "\" 1 \", \"2 , 2b \" , 3,1, 2 ,\" 3,3c \"";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 , 2b ", " 3", "1", " 2 ", " 3,3c " },
substrings);
}
[TestMethod()]
public void SplitQualified_12()
{
// Example with tab characters between delimiters
String s = "\t1,\t2\t,3,1,\t2\t,\t3\t";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_13()
{
// Example with newline characters between delimiters
String s = "\n1,\n2\n,3,1,\n2\n,\n3\n";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_14()
{
// Example with qualifiers enclosing whitespace and delimiter, plus escaped qualifier
String s = "\" 1 \",\"\"\"2 , 2b \"\"\",3,1,2,\" \"\"3,3c \"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "\"2 , 2b \"", "3", "1", "2", "\"3,3c" },
substrings);
}
[TestMethod()]
public void SplitQualified_14A()
{
// Example with qualifiers enclosing whitespace and delimiter, plus escaped qualifier
String s = "\"\"\"1\"\"\"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "\"1\"" },
substrings);
}
[TestMethod()]
public void SplitQualified_15()
{
// Instead of comma-delimited and quote-qualified, use pipe and hash
// Example with no whitespace or qualifiers
String s = "1|2|3|1|2,2f|3";
var substrings = s.SplitQualified('|', '#', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2,2f", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_16()
{
// Instead of comma-delimited and quote-qualified, use pipe and hash
// Example with qualifiers enclosing whitespace and delimiter
String s = "# 1 #|#2 | 2b #|3|1|2|# 3|3c #";
// whitespace should be removed
var substrings = s.SplitQualified('|', '#', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2 | 2b", "3", "1", "2", "3|3c" },
substrings);
}
[TestMethod()]
public void SplitQualified_17()
{
// Instead of comma-delimited and quote-qualified, use pipe and hash
// Example with qualifiers enclosing whitespace and delimiter; also whitespace btwn delimiters
String s = "# 1 #| #2 | 2b # | 3|1| 2 |# 3|3c #";
// whitespace should be preserved
var substrings = s.SplitQualified('|', '#', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 | 2b ", " 3", "1", " 2 ", " 3|3c " },
substrings);
}

I had a problem with a CSV that contains fields with a quote character in them, so using the TextFieldParser, I came up with the following:
// Requires: using Microsoft.VisualBasic.FileIO; (TextFieldParser, MalformedLineException)
private static string[] parseCSVLine(string csvLine)
{
using (TextFieldParser TFP = new TextFieldParser(new MemoryStream(Encoding.UTF8.GetBytes(csvLine))))
{
TFP.HasFieldsEnclosedInQuotes = true;
TFP.SetDelimiters(",");
try
{
return TFP.ReadFields();
}
catch (MalformedLineException)
{
// Double every quote that is not adjacent to a delimiter, then retry.
string errorLine = TFP.ErrorLine;
StringBuilder m_sbLine = new StringBuilder();
for (int i = 0; i < errorLine.Length; i++)
{
// The upper-bound check avoids reading past the end when the line ends with a quote.
if (i > 0 && i < errorLine.Length - 1 && errorLine[i] == '"' && errorLine[i + 1] != ',' && errorLine[i - 1] != ',')
m_sbLine.Append("\"\"");
else
m_sbLine.Append(errorLine[i]);
}
return parseCSVLine(m_sbLine.ToString());
}
}
}
A StreamReader is still used to read the CSV line by line, as follows:
using(StreamReader SR = new StreamReader(FileName))
{
while (SR.Peek() >-1)
myStringArray = parseCSVLine(SR.ReadLine());
}

Cinchoo ETL, an open source library, automatically handles column values that contain separators.
string csv = @"2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,""Corvallis, OR"",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
using (var p = ChoCSVReader.LoadText(csv))
{
Console.WriteLine(p.Dump());
}
Output:
Key: Column1 [Type: String]
Value: 2
Key: Column2 [Type: String]
Value: 1016
Key: Column3 [Type: String]
Value: 7/31/2008 14:22
Key: Column4 [Type: String]
Value: Geoff Dalgas
Key: Column5 [Type: String]
Value: 6/5/2011 22:21
Key: Column6 [Type: String]
Value: http://stackoverflow.com
Key: Column7 [Type: String]
Value: Corvallis, OR
Key: Column8 [Type: String]
Value: 7679
Key: Column9 [Type: String]
Value: 351
Key: Column10 [Type: String]
Value: 81
Key: Column11 [Type: String]
Value: b437f461b3fd27387c5d8ab47a293d35
Key: Column12 [Type: String]
Value: 34
For more information, please see the CodeProject article.
Hope it helps.

How many elements ( values ) are in each line in a text file

What should I use to get the number of elements in each line? An example of the text file is given below. All I want is the number of elements in each line: the first line has 4 elements, the second has 3, and so on.
1 5 4 6
2 4 6
1 9 8 7 5 3
3 2 1 1
private static void Skaitymaz(Trikampis[] trikampiai)
{
string line = null;
using (StreamReader reader = new StreamReader(@"U2.txt"))
{
string eilute = null;
while (null != (eilute = reader.ReadLine()))
{
int[] values = eilute.Split(' ');
}
}
}

Try:
string line = null;
using (StreamReader reader = new StreamReader(@"U2.txt"))
{
string eilute = null;
while (null != (eilute = reader.ReadLine()))
{
string[] values = eilute.Split(' ');
int noOfElement = values.Length;
}
}

You need to get the length of the array after the split:
values.Length

Something like this (LINQ): read each line, split it on spaces or tabs, and count the items:
var numbers = File
.ReadLines(@"C:\MyText.txt")
.Select(line => line.Split(new Char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries).Length);
// Test: 4, 3, 6, 4
Console.Write(String.Join(", ", numbers));

Convert string list to corresponding int c#

I am making a quiz and have pulled a series of strings from a text file and added them to a list, further separating the file info into individual strings. My question is, how would I make the individual strings correspond to a numerical value? For example A = 1, B = 2, and so on.
The following code depicts the creation of the list and the adding of elements:
List<string> keyPool = new List<string>();
OpenFileDialog keyLoad = new OpenFileDialog();
keyLoad.Multiselect = false;
if (keyLoad.ShowDialog() == DialogResult.OK)
{
foreach (String fileName in keyLoad.FileNames)
{
key = File.ReadAllText(fileName);
kLabel.Text = ("Key:" + System.Environment.NewLine);
k1 = key.Split(new string[] { System.Environment.NewLine }, System.StringSplitOptions.RemoveEmptyEntries)[0];
keyPool.Add(k1);
k2 = key.Split(new string[] { System.Environment.NewLine }, System.StringSplitOptions.RemoveEmptyEntries)[1];
keyPool.Add(k2);
k3 = key.Split(new string[] { System.Environment.NewLine }, System.StringSplitOptions.RemoveEmptyEntries)[2];
keyPool.Add(k3);
k4 = key.Split(new string[] { System.Environment.NewLine }, System.StringSplitOptions.RemoveEmptyEntries)[3];
keyPool.Add(k4);
k5 = key.Split(new string[] { System.Environment.NewLine }, System.StringSplitOptions.RemoveEmptyEntries)[4];
keyPool.Add(k5);
k6 = key.Split(new string[] { System.Environment.NewLine }, System.StringSplitOptions.RemoveEmptyEntries)[5];
keyPool.Add(k6);
k7 = key.Split(new string[] { System.Environment.NewLine }, System.StringSplitOptions.RemoveEmptyEntries)[6];
keyPool.Add(k7);
k8 = key.Split(new string[] { System.Environment.NewLine }, System.StringSplitOptions.RemoveEmptyEntries)[7];
keyPool.Add(k8);
k9 = key.Split(new string[] { System.Environment.NewLine }, System.StringSplitOptions.RemoveEmptyEntries)[8];
keyPool.Add(k9);
k10 = key.Split(new string[] { System.Environment.NewLine }, System.StringSplitOptions.RemoveEmptyEntries)[9];
keyPool.Add(k10);
}
}
How would this be done?

In general terms, you are looking for a dictionary. In this case, we are using it to define the mapping between strings and numbers:
Dictionary<string, int> mapping = new Dictionary<string, int>();
mapping.Add("A", 1);
...
int value = mapping["A"];
You could take advantage of the ASCII table if you just want to convert the first few capital letters to numbers:
int value = (int)stringValue[0] - (int)'A' + 1;
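
A minimal sketch of the character-arithmetic approach with some input checking added, assuming each key is a single letter from A to Z. The names `LetterMap` and `ToNumber` are hypothetical and not part of the original answer:

```csharp
using System;

public static class LetterMap
{
    // Hypothetical helper: maps "A".."Z" (case-insensitive) to 1..26.
    public static int ToNumber(string key)
    {
        if (string.IsNullOrWhiteSpace(key))
            throw new ArgumentException("Key must not be empty.", nameof(key));

        // Ignore surrounding whitespace and look only at the first character.
        char c = char.ToUpperInvariant(key.Trim()[0]);
        if (c < 'A' || c > 'Z')
            throw new ArgumentOutOfRangeException(nameof(key), "Expected a letter A-Z.");

        // 'A' - 'A' + 1 == 1, 'B' - 'A' + 1 == 2, and so on.
        return c - 'A' + 1;
    }
}
```

A dictionary remains the better fit when the mapping is not a simple alphabetical sequence.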

Assuming each "string" is a single letter, such as A, B, and so on, you can set up an enum and parse each letter into its appropriate enum value:
public enum Letter
{
A = 1,
B = 2,
C = 3,
D = 4,
E = 5,
F = 6,
G = 7,
H = 8,
I = 9,
J = 10
}
You only have to split the string once. The split puts all values into an array, which you can loop over with foreach to build up your list:
List<Letter> keyPool = new List<Letter>();
var letters = key.Split(new string[] { System.Environment.NewLine },
System.StringSplitOptions.RemoveEmptyEntries);
foreach(var letter in letters)
{
keyPool.Add((Letter)Enum.Parse(typeof(Letter), letter));
}
To convert and use it as an int, you can just cast it:
Letter letter = Letter.A;
int a = (int)letter;

Use an enum:
enum Keys
{
A=1,
B,
C,
//continue onward
}
To convert to/from string:
string s = Keys.B.ToString();
Keys key = (Keys)Enum.Parse(typeof(Keys), s);
To convert to/from int:
int i = (int)Keys.B;
Keys keyFromI = (Keys)i;

You don't need an enum or a dictionary for the alphabet! Here is a clear example; use it if you like:
static void Main(string[] args)
{
string initialString = "abc".ToUpper();
string numberString = "";
foreach (char c in initialString)
{
// 'A' is 65 in ASCII, so subtracting 64 maps A to 1, B to 2, ...
numberString += (int)c - 64;
}
Console.WriteLine(numberString);
}

Dividing a list of strings

Didn't quite know what to title this question so please feel free to edit.
I have a list of strings where all elements are strings with a length of 40.
What I want to do is split the list elements at character 20 and push the last part of the now divided string to the next element in the list, appending all other elements in the list.
E.g.
list[0] = 0011
list[1] = 2233
list[2] = 4455
^split here
// new list results in:
list[0] = 00
list[1] = 11
list[2] = 22
list[3] = 33
list[4] = 44
list[5] = 55
How can this be achieved?

list = list.SelectMany(s => new [] { s.Substring(0, 20), s.Substring(20, 20) })
.ToList();

list = list.SelectMany(x=>new[]{x.Substring(0, 20), x.Substring(20)}).ToList();

Not sure why you want to do that, but it's quite simple with linq:
List<string> split = list.SelectMany(s => new []{s.Substring(0, 2), s.Substring(2)}).ToList();
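
The answers above assume every string has exactly the expected length; `Substring(20, 20)` throws if an element is shorter than 40 characters. A hedged sketch of a more general version, which cuts each string into pieces of at most `size` characters, might look like this. `StringChunker` and `SplitEvery` are hypothetical names introduced only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringChunker
{
    // Hypothetical helper: splits each string into consecutive pieces of at most
    // `size` characters and returns all pieces in their original order.
    public static List<string> SplitEvery(IEnumerable<string> items, int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size));

        return items
            .SelectMany(s => Enumerable
                // Number of pieces, rounded up so a short tail is kept.
                .Range(0, (s.Length + size - 1) / size)
                .Select(i => s.Substring(i * size, Math.Min(size, s.Length - i * size))))
            .ToList();
    }
}
```

For the question's 40-character strings, the call would be `StringChunker.SplitEvery(list, 20)`; with the shortened sample from the question, a size of 2 gives the six two-character pieces.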

If you must work with the existing array:
const int elementCount = 3;
const int indexToSplit = 2;
string[] list = new string[elementCount * 2] { "0011", "0022", "0033", null, null, null };
for (int i = elementCount; i > 0; --i)
{
var str = list[i-1];
var left = str.Substring( 0, indexToSplit );
var right = str.Substring( indexToSplit, str.Length - indexToSplit );
var rightIndex = i * 2 - 1;
list[rightIndex] = right;
list[rightIndex - 1] = left;
}
foreach( var str in list )
{
Console.WriteLine( str );
}

Skipping Lines with Dashes in a text file with regex in c# - c#

If you don't want to use regex, you could also use !line.TrimStart().StartsWith("-"). It should give the same result here, and I think it is faster.
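
A minimal sketch of this non-regex check, assuming it is applied to individual lines rather than to the '/'-separated chunks. In the sample from the question, a chunk holds both the dashed line and the command that follows it, so testing the whole chunk would discard the command as well. `DashLineFilter` and `RemoveDashLines` are hypothetical names, not from the original post:

```csharp
using System;
using System.Linq;

public static class DashLineFilter
{
    // Hypothetical helper: removes every line whose first non-whitespace
    // character is '-' and returns the remaining lines in order.
    public static string[] RemoveDashLines(string text)
    {
        return text
            // Accept both Windows and Unix line endings.
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Where(line => !line.TrimStart().StartsWith("-", StringComparison.Ordinal))
            .ToArray();
    }
}
```

Note that this also drops any line that merely begins with a dash, such as a `--` SQL comment, so it is only equivalent to the regex when such lines do not need to be kept.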
