C# text file and regular expressions
I seem to be having a problem with the following file:
*User Type 0: Database Administrator
Users of this Type:
Database Administrator DBA Can Authorise:Y Administrator:Y
DM3 Admin Account DM3 Can Authorise:Y Administrator:Y
Permissions for these users:
Data - Currencies Parameters - Database Add FRA Deal Reports - Confirmation Production
Add Currency Amend Database Parameters Cancel FRA Deal Reports - System Printer Definitions
Delete Currency Parameters - Data Retention Amend FRA Deal Save System Printers
Amend Currency Amend Data Retention Parameters Amend Settlements Only Custom Confs/Tickets
Amend Currency Rates Data - Rate References Verify FRA Deal Add Custom Confs/Tickets
Amend Currency Holidays Add Rate Reference Add FRA Deal (Restricted) Delete Custom Confs/Tickets
Add Nostro Delete Rate Reference Release FRA Deal Amend Custom Confs/Tickets
Amend Nostro Amend Rate Reference Deal - IRS Reports - System Report Batches
Delete Nostro Deal - Call Accounts Add IRS Deal Save System Batches
Data - Currency Pairs Open Call Account Cancel IRS Deal Reports - View Reports Spooled
Add Currency Pair Amend Call Account Amend IRS Deal View - Audits
Delete Currency Pair Close Call Account Amend Settlements Only Print Audit
Amend Currency Pair Amend Settlements Only Verify IRS Deal Print Audit Detail
Data - Books Data - Sales Relationship Mgrs Add IRS Deal (Restricted) Filter Audit*
I am using a regular expression to check each line for a pattern. In total there are three patterns that need to match. If you look at the first three lines, that is all the information that needs to be taken from the file. The problem I'm having is that my regex is not matching. I also need to extract the information that sits between two of those lines. How do I do that?
This is the code i have so far:
string path = @"C:/User Permissions.txt";
string t = File.ReadAllText(path);
//Uses regular expression check to match the specified string pattern
string pattern1 = @"User Type ";
string pattern2 = @"Users of this Type:";
string pattern3 = @"Permissions for these users:";
Regex rgx1 = new Regex(pattern1);
Regex rgx2 = new Regex(pattern2);
Regex rgx3 = new Regex(pattern3);
MatchCollection matches = rgx1.Matches(t);
List<string[]> test = new List<string[]>();
foreach (var match in matches)
{
string[] newString = match.ToString().Split(new string[] { @"User Type ", }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 3; i <= newString.Length; i++)
{
test.Add(new string[] { newString[0], newString[1], newString[i - 1] });
}
}
MatchCollection matches2 = rgx2.Matches(t);
List<string[]> test2 = new List<string[]>();
foreach (var match2 in matches2)
{
string[] newString = match2.ToString().Split(new string[] { @"Permissions for these users: ", }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 3; i <= newString.Length; i++)
{
test2.Add(new string[] { newString[0], newString[1], newString[i - 1] });
}
}
MatchCollection matches3 = rgx3.Matches(t);
List<string[]> test3 = new List<string[]>();
foreach (var match3 in matches3)
{
string[] newString = match3.ToString().Split(new string[] { @"Users of this Type: ", }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 3; i <= newString.Length; i++)
{
test3.Add(new string[] { newString[0], newString[1], newString[i - 1] });
}
}
foreach (var line in test)
{
Console.WriteLine(line[0]);
Console.ReadLine();
}
Console.ReadLine();
Guffa's code seems very efficient compared to mine. The only problem I'm having now is how to extract the lines between "Users of this Type:" and "Permissions for these users:". How would I go about doing this? Obviously, checking whether the name begins on a new line won't help.
No, you are not checking each line for a pattern. You are looking for the pattern in the entire file as a single string, and each match contains only the exact text that matched. Splitting a match on that same text therefore leaves nothing useful: you get two empty strings, which RemoveEmptyEntries then discards.
If I understand correctly, each line consists of a key and a value, so there is not really any point in using regular expressions for this. Just loop through the lines and compare strings.
Here is a start:
string path = @"C:/User Permissions.txt";
string[] lines = File.ReadAllLines(path);
foreach (string line in lines) {
    if (line.StartsWith("User Type ")) {
        Console.WriteLine("User type:" + line.Substring(10));
    } else if (line.StartsWith("Users of this Type:")) {
        Console.WriteLine("Users:" + line.Substring(19));
    } else if (line.StartsWith("Permissions for these users:")) {
        Console.WriteLine("Permissions:" + line.Substring(28));
    }
}
Edit:
Here is how to use a regular loop instead of a foreach, so that you can use an inner loop that reads lines:
string path = @"C:/User Permissions.txt";
string[] lines = File.ReadAllLines(path);
int line = 0;
while (line < lines.Length) {
    if (lines[line].StartsWith("User Type ")) {
        Console.WriteLine("User type:" + lines[line].Substring(10));
    } else if (lines[line].StartsWith("Users of this Type:")) {
        line++;
        // Inner loop: every line up to the permissions header is a user
        while (line < lines.Length && !lines[line].StartsWith("Permissions for these users:")) {
            Console.WriteLine("User: " + lines[line]);
            line++;
        }
        // Step back so the outer loop sees the permissions header itself
        line--;
    } else if (lines[line].StartsWith("Permissions for these users:")) {
        Console.WriteLine("Permissions:" + lines[line].Substring(28));
    }
    line++;
}
You are not going to succeed in extracting the data that you want from this text dump using regular expressions (and hardly with any other technique without investing a lot of effort).
The most important obstacle to using regular expressions that I can see is that the information is actually listed in columns across the text file.
The problem is best illustrated with the fact that the category
Data - Sales Relationship Mgrs
is in one column whereas all the permissions for that category are in the next column.
Please investigate whether this information can be obtained in a different way.
Still, here is a rough algorithmic strategy for dealing with the file as is:
Read the file line by line,
Look at predefined offsets into the line for the information you are interested in.
When you get to the information stacked in columns, you could temporarily append each column to separate collections as you parse each line
Finally attempt to extract the privileges from a concatenation of all the temporary columns.
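The strategy above could be sketched roughly as follows. This is only an illustration under assumptions: the column start offsets are hypothetical (the post shows the dump with its whitespace collapsed, so the real widths have to be measured from the actual file), and the names ColumnDump, SliceLine and ReadColumns are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnDump
{
    // Hypothetical column start offsets; measure the real ones from the file.
    public static readonly int[] Offsets = { 0, 30, 60, 90 };

    // Cuts one fixed-width line into its column cells and trims the padding.
    // Offsets are assumed to be in ascending order.
    public static string[] SliceLine(string line, int[] offsets)
    {
        var cells = new string[offsets.Length];
        for (int i = 0; i < offsets.Length; i++)
        {
            int start = offsets[i];
            if (start >= line.Length) { cells[i] = ""; continue; }
            int end = i + 1 < offsets.Length ? offsets[i + 1] : line.Length;
            end = Math.Min(end, line.Length);
            cells[i] = line.Substring(start, end - start).Trim();
        }
        return cells;
    }

    // Appends each cell to its own column list, then concatenates the columns
    // so that a category and the permissions listed below it end up adjacent.
    public static List<string> ReadColumns(IEnumerable<string> lines, int[] offsets)
    {
        var columns = new List<string>[offsets.Length];
        for (int i = 0; i < columns.Length; i++) columns[i] = new List<string>();

        foreach (string line in lines)
        {
            string[] cells = SliceLine(line, offsets);
            for (int i = 0; i < cells.Length; i++)
                if (cells[i].Length > 0) columns[i].Add(cells[i]);
        }

        var all = new List<string>();
        foreach (var column in columns) all.AddRange(column);
        return all;
    }
}
```

The concatenated list can then be walked once, treating entries such as "Data - Currencies" as category headers and everything up to the next header as that category's permissions.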
Related
SQL Insert not considering blank values for the insert in my C# code
I have a nice piece of C# code which allows me to import data into a table with fewer columns than in the SQL table (as the file format is consistently bad). My problem comes when I have a blank entry in a column. The VALUES clause does not pick up an empty column from the CSV, and so I receive the error
You have more insert columns than values
Here is the query printed to a message box... As you can see there is nothing for Crew members 4 to 11, below is the file... Please see my code:
SqlConnection ADO_DB_Connection = new SqlConnection();
ADO_DB_Connection = (SqlConnection) (Dts.Connections["ADO_DB_Connection"].AcquireConnection(Dts.Transaction) as SqlConnection);
// Inserting data of file into table
int counter = 0;
string line;
string ColumnList = "";
// MessageBox.Show(fileName);
System.IO.StreamReader SourceFile = new System.IO.StreamReader(fileName);
while ((line = SourceFile.ReadLine()) != null)
{
    if (counter == 0)
    {
        ColumnList = "[" + line.Replace(FileDelimiter, "],[") + "]";
    }
    else
    {
        string query = "Insert into " + TableName + " (" + ColumnList + ") ";
        query += "VALUES('" + line.Replace(FileDelimiter, "','") + "')";
        // MessageBox.Show(query.ToString());
        SqlCommand myCommand1 = new SqlCommand(query, ADO_DB_Connection);
        myCommand1.ExecuteNonQuery();
    }
    counter++;
}
If you could advise how to include those fields in the insert that would be great. Here is the same file but opened with a text editor and not given in picture format...
Date,Flight_Number,Origin,Destination,STD_Local,STA_Local,STD_UTC,STA_UTC,BLOC,AC_Reg,AC_Type,AdultsPAX,ChildrenPAX,InfantsPAX,TotalPAX,AOC,Crew 1,Crew 2,Crew 3,Crew 4,Crew 5,Crew 6,Crew 7,Crew 8,Crew 9,Crew 10,Crew 11
05/11/2022,241,BOG,SCL,15:34,22:47,20:34,02:47,06:13,N726AV,"AIRBUS A-319 ",0,0,0,36,AV,100612,161910,323227
I'm not touching the potential for SQL injection, as I'm free-handing this code. If this is a system-generated file (mainframe extract, dump from Dynamics or a LoB app), the probability of SQL injection is awfully low.
// Needs "using System.Linq;" for Count().
// FileDelimiter is a string in the question; Count() compares chars, so take its first character.
char FileDelimiterChar = FileDelimiter[0];
int columnCount = 0;
while ((line = SourceFile.ReadLine()) != null)
{
    if (counter == 0)
    {
        ColumnList = "[" + line.Replace(FileDelimiter, "],[") + "]";
        // How many columns in line 1. Assumes no embedded commas.
        // Add 1 as we will have one fewer delimiters than columns.
        columnCount = line.Count(x => x == FileDelimiterChar) + 1;
    }
    else
    {
        string query = "Insert into " + TableName + " (" + ColumnList + ") ";
        // HACK: this fails if there are embedded delimiters
        int foundColumns = line.Count(x => x == FileDelimiterChar) + 1;
        // At this point, we know how many columns we have and how many we should have.
        string csv = line.Replace(FileDelimiter, "','");
        // Pad out the current line with empty strings: each appended ','
        // closes the previous value and opens one more empty value.
        // Probably a classier LINQ way of doing this, or a string.Concat approach.
        for (int index = foundColumns; index < columnCount; index++)
        {
            csv += "','";
        }
        query += "VALUES('" + csv + "')";
        // MessageBox.Show(query.ToString());
        SqlCommand myCommand1 = new SqlCommand(query, ADO_DB_Connection);
        myCommand1.ExecuteNonQuery();
    }
    counter++;
}
Something like that should get you a solid shove in the right direction. The concept is that you need to inspect the first line and see how many columns you should have. Then, for each line of data, work out how many columns you actually have and stub in the empty string for the rest. If you change this up to use SqlCommand objects and parameters, the approximate logic is still the same: you add all the expected parameters by figuring out the columns in the first line, then for each line you add your values, and if you have a short row you just send the empty string (or DBNull, or whatever your system expects).
The big takeaway, in my opinion, is that CSV parsing libraries exist for a reason. There are so many cases not addressed in the above pseudocode that you'll likely want to trash the current approach in favor of a standard parsing library and, while you're at it, address the potential security flaws.
I see your updated comment that you'll take the formatting concerns back to the source party. If they can't address them, I would envision your SSIS package being Script Task -> Data Flow Task. The Script Task wrangles the unruly data into a strict CSV dialect that a Data Flow Task can handle, preprocessing the data into a new file instead of trying to modify the existing one in place. The Data Flow then becomes a chip shot of Flat File Source -> OLE DB Destination.
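For the parameterized variant mentioned above, the padding step can be isolated into a small helper. This is a minimal sketch, not the answerer's code: RowPadding and PadRow are hypothetical names, and it assumes each row has already been split into fields by a real CSV parser (a naive Split on the delimiter breaks on quoted fields that contain it).

```csharp
using System;
using System.Linq;

public static class RowPadding
{
    // Pads a short row with empty strings so that it has exactly
    // columnCount values, one per column named in the header line.
    public static string[] PadRow(string[] values, int columnCount)
    {
        if (values.Length > columnCount)
            throw new ArgumentException("Row has more values than the header has columns.");

        return values
            .Concat(Enumerable.Repeat("", columnCount - values.Length))
            .ToArray();
    }
}
```

Each padded value can then be bound to its own parameter (for example with cmd.Parameters.AddWithValue("@p" + i, values[i])) instead of being concatenated into the SQL text, which also removes the injection concern.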
Here's how you can process this file... I would still ask for JSON or XML though. You need two outputs set up: Flight Info (the first 16 columns) and Flight Crew (a business key [flight number and date maybe] and CrewID). It seems to me the problem is how the crew is handled in the CSV. So the basic steps are: read the file, use a regex to split it, write the first 16 columns to output 1 and the rest (with the key) to Flight Crew. And skip the header row on your read.
var lines = System.IO.File.ReadAllLines("filepath");
// Some code I stole to split quoted CSVs
var r = new System.Text.RegularExpressions.Regex("(?:^|,)(?=[^\"]|(\")?)\"?((?(1)(?:[^\"]|\"\")*|[^,\"]*))\"?(?=,|$)");
for (int i = 1; i < lines.Length; i++)
{
    var m = r.Matches(lines[i]); // Gives you all matches in a MatchCollection
    // The first 16 columns are always correct
    OutputBuffer0.AddRow();
    OutputBuffer0.Date = m[0].Groups[2].Value;
    OutputBuffer0.FlightNumber = m[1].Groups[2].Value;
    // [And so on until m[15]]
    for (int j = 16; j < m.Count; j++)
    {
        OutputBuffer1.AddRow(); // This is a new output that you need to set up
        OutputBuffer1.FlightNumber = m[1].Groups[2].Value;
        // [Keep adding to make a business key here]
        OutputBuffer1.CrewID = m[j].Groups[2].Value;
    }
}
Be careful, as I just typed all this out to give you a general plan without any testing. For example, m[0] might actually be m[0].Value, and all of the values will be strings that need to be converted. To check how the regex processes your rows, please visit https://regex101.com/r/y8Ayag/1 for an explanation. You can even paste in your row data.
UPDATE: I just tested this and it works now. I needed to escape the regex, and to specify that you want the value of group 2. I also needed to fix the System.IO namespace in the File.ReadAllLines call.
The solution that I implemented in the end avoided the script task completely, which also means no SQL injection possibilities. I've done a flat file import: everything goes into one column, then I use string_split and a pivot in SQL, insert into a staging table, tidy up, and load into main.
Flat File Import to single column table -> SQL transform -> Load
This also allowed me to iterate through the files better using a Foreach Loop Container. ELT on this occasion. Thanks for all the help and guidance.
Issue renaming two columns in a CSV file instead of one
I need to be able to rename the column in a spreadsheet from 'idn_prod' to 'idn_prod1', but there are two columns with this name. I have tried implementing code from similar posts, but I've only been able to update both columns. Below you'll find the code I have that just renames both columns.
//locate and edit column in csv
string file1 = @"C:\Users\username\Documents\AppDevProjects\import.csv";
string[] lines = System.IO.File.ReadAllLines(file1);
System.IO.StreamWriter sw = new System.IO.StreamWriter(file1);
foreach(string s in lines)
{
    sw.WriteLine(s.Replace("idn_prod", "idn_prod1"));
}
I expect only the 2nd column to be renamed, but the actual output is that both are renamed. Here are the first couple rows of the CSV:
I'm assuming that you only need to update the column header, and that the actual rows need not be updated.
var file1 = @"test.csv";
var lines = System.IO.File.ReadAllLines(file1);
var columnHeaders = lines[0];
var textToReplace = "idn_prod";
var newText = "idn_prod1";
// LastIndexOf ensures that you pick the second idn_prod
var indexToReplace = columnHeaders.LastIndexOf(textToReplace);
// Remove the second idn_prod and insert the updated value in its place
columnHeaders = columnHeaders
    .Remove(indexToReplace, textToReplace.Length)
    .Insert(indexToReplace, newText);
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(file1))
{
    sw.WriteLine(columnHeaders);
    foreach (var str in lines.Skip(1)) // Skip needs "using System.Linq;"
    {
        sw.WriteLine(str);
    }
    sw.Flush();
}
Replace the foreach (string s in lines) loop with a for loop, so that you know which line you are on, and rename only the 2nd column in the header line.
I believe the only way to handle this properly is to crack the header line (the first string, which has the column names) into individual parts, separated by commas or tabs or whatever, and run through the columns one at a time yourself. Your loop would consider the first line from the file, use the Split function on the delimiter, and look for the column you're interested in:
bool headerSeen = false;
foreach (string s in lines)
{
    if (!headerSeen)
    {
        // special: this is the header
        string[] parts = s.Split('\t');
        for (int i = 0; i < parts.Length; i++)
        {
            if (parts[i] == "idn_prod")
            {
                // only fix the *first* one seen
                parts[i] = "idn_prod1";
                break;
            }
        }
        sw.WriteLine(string.Join("\t", parts));
        headerSeen = true;
    }
    else
    {
        sw.WriteLine(s);
    }
}
The only reason this is even remotely possible is that it's the header and not the individual lines; headers tend to be more predictable in format, and you worry less about quoting and fields that contain the delimiter, etc.
Trying this on the individual data lines will rarely work reliably: if your delimiter is a comma, what happens if an individual field contains a comma? Then you have to worry about quoting, and this enters all kinds of fun.
For doing any real CSV work in C#, it's really worth looking into a package that specializes in this, and I've been thrilled with CsvHelper from Josh Close. Highly recommended.
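Since the question asks for the second idn_prod column rather than the first, the header-splitting idea can be generalized to "rename the Nth occurrence". The following is a rough sketch under the same assumption as above (header names contain no quoted delimiters); HeaderRename and RenameNth are hypothetical names, not part of any library.

```csharp
using System;

public static class HeaderRename
{
    // Renames the occurrence-th (1-based) column called oldName in a
    // delimited header line and leaves every other column untouched.
    public static string RenameNth(string header, char delimiter,
                                   string oldName, string newName, int occurrence)
    {
        string[] parts = header.Split(delimiter);
        int seen = 0;
        for (int i = 0; i < parts.Length; i++)
        {
            if (parts[i] == oldName && ++seen == occurrence)
            {
                parts[i] = newName;
                break;
            }
        }
        return string.Join(delimiter.ToString(), parts);
    }
}
```

Calling RenameNth(lines[0], ',', "idn_prod", "idn_prod1", 2) on the header and writing the remaining lines back unchanged would rename only the second column.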
Finding multiple semi predictable patterns in a string
Alright, so I'm writing an application that needs to be able to extract a VAT number from an invoice (https://en.wikipedia.org/wiki/VAT_identification_number). The biggest challenge to overcome here is that, as is apparent from the Wikipedia article I have linked to, each country uses its own format for these VAT numbers (the Netherlands uses a 14-character number while Germany uses an 11-character number).
In order to extract these numbers, I throw every line from the invoice into an array of strings, and for each string I test whether it has a length equal to one of the VAT formats. If that checks out, I check whether the string also contains a country code ("NL", "DE", etc.).
string[] ProcessedFile = Reader.ProcessFile(Input);
foreach(string S in ProcessedFile)
{
    RtBEditor.AppendText(S + "\n");
}
foreach(string X in ProcessedFile)
{
    string S = X.Replace(" ", string.Empty);
    if (S.Length == 7)
    {
        if (S.Contains("GBGD"))
        {
            MessageBox.Show("Land = Groot Britanie (Regering)");
        }
    }
    /* repeat for all other lengths and country codes. */
The problems with this code are:
1st: if there is a string that happens to have the same length as one of the VAT formats, and it has a country code embedded in it, the code will incorrectly think that it has found the VAT number.
2nd: in some cases, the VAT number will be included like "VAT-number: [VAT-number]". In this case, the text that precedes the actual number is added to its length, making the program unable to detect the actual VAT number.
My assumption is that the best way to fix this is to somehow isolate the VAT number from the strings altogether, but I have yet to find a way to actually do this. Does anyone by any chance know a potential solution? Many thanks in advance!
EDIT: Added a dummy invoice to clarify what kind of data is contained within the invoices.
As someone in the comments had pointed out, the best way to fix this is by using Regex. After trying around a bit I came to the following solution:
public Regex FilterNormaal = new Regex(@"[A-Z]{2}(\d)+B?\d*");
private void BtnUitlezen_Click(object sender, EventArgs e)
{
    RtBEditor.Clear();
    /* Temp dummy vatcodes for initial testing. */
    Form1.Dummy1.VAT = "NL855291886B01";
    Form1.Dummy2.VAT = "DE483270846";
    Form1.Dummy3.VAT = "SE482167803501";
    OCR Reader = new OCR();
    /* Grab and process image */
    if(openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        try
        {
            Input = new Bitmap(openFileDialog1.FileName);
        }
        catch
        {
            MessageBox.Show("Please open an image file.");
        }
    }
    string[] ProcessedFile = Reader.ProcessFile(Input);
    foreach(string S in ProcessedFile)
    {
        string X = S.Replace(" ", string.Empty);
        RtBEditor.AppendText(X + "\n");
    }
    foreach (Match M in FilterNormaal.Matches(RtBEditor.Text))
    {
        MessageBox.Show(M.Value);
    }
}
At first, I attempted to iterate through my array of strings to find a match, but for reasons unknown, this did not yield any results. When applying the regex to the entire textbox, it did output the results I needed.
How can I analyse millions of strings that merge into each other?
I have millions of strings, around 8GB worth of HEX; each string is 3.2kb in length. Each of these strings contains multiple parts of data I need to extract. This is an example of one such string: GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ$GPGGA,104646.091,,,,,0,0,,,M,,M,,*41$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test2ÿÿ3ÿÿ4ÿÿ5ÿÿ6ÿÿ7ÿÿ8ÿÿ9ÿÿ:ÿÿ;ÿÿ<ÿÿ=ÿÿ>ÿÿ?ÿÿ#ÿÿAÿÿBÿÿCÿÿDÿÿEÿÿFÿÿGÿÿHÿÿIÿÿJÿÿ$GPGGA,104647.091,,,,,0,0,,,M,,M,,*40$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header TestKÿÿLÿÿMÿÿNÿÿOÿÿPÿÿQÿÿRÿÿSÿÿTÿÿUÿÿVÿÿWÿÿXÿÿYÿÿZÿÿ[ÿÿ\ÿÿ]ÿÿ^ÿÿ_ÿÿ`ÿÿaÿÿbÿÿcÿÿ$GPGGA,104648.091,,,,,0,0,,,M,,M,,*4F$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Testdÿÿeÿÿfÿÿgÿÿhÿÿiÿÿjÿÿkÿÿlÿÿmÿÿnÿÿoÿÿpÿÿqÿÿrÿÿsÿÿtÿÿuÿÿvÿÿwÿÿxÿÿyÿÿzÿÿ{ÿÿ|ÿÿ$GPGGA,104649.091,,,,,0,0,,,M,,M,,*4E$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test}ÿÿ~ÿÿ.ÿÿ€ÿÿ.ÿÿ‚ÿÿƒÿÿ„ÿÿ…ÿÿ†ÿÿ‡ÿÿˆÿÿ‰ÿÿŠÿÿ‹ÿÿŒÿÿ.ÿÿŽÿÿ.ÿÿ.ÿÿ‘ÿÿ’ÿÿ“ÿÿ”ÿÿ•ÿÿ$GPGGA,104650.091,,,,,0,0,,,M,,M,,*46$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Head as you can see it is pretty much this repeated: GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ I want to separate this string into two lists like this: _GPSList $GPGGA,104644.091,,,,,0,0,,,M,,M,,*43 $GPVTG,0.00,T,,M,0.00,N,0.00,K,N* $GPVTG,0.00,T,,M,0.00,N,0.00,K,N _WavList 32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ 32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ Issue 1: This repetition isn't containing within a single 
string; it overflows into the next string. So if some data crosses the boundary between the end of one string and the start of the next, how do I deal with that?
Issue 2: How do I analyse the string and extract only the parts I need?
The solution I'm providing is not a complete answer but more like an idea which might help you get what you want. Everything else which I present is an assumption on my behalf.
//Assuming your data is stored in a file "yourdatafile"
//Splitting all the text on "$" assuming this will separate GPSData
string[] splittedstring = File.ReadAllText("yourdatafile").Split('$');
//I found an extra string lingering in the sample you provided
//because I splitted on "$", so you gotta take that into account
var GPSList = new List<string>();
var WAVList = new List<string>();
foreach (var str in splittedstring)
{
    //So if the string contains "Header" we would want to separate it from GPS data
    if (str.Contains("Header"))
    {
        string temp = str.Remove(str.IndexOf("Header"));
        int indexOfAsterisk = temp.LastIndexOf("*");
        string stringBeforeAsterisk = str.Substring(0, indexOfAsterisk + 1);
        string stringAfterAsterisk = str.Replace(stringBeforeAsterisk, "");
        WAVList.Add(stringAfterAsterisk);
        GPSList.Add("$" + stringBeforeAsterisk);
    }
    else
        GPSList.Add("$" + str);
}
This provides the exact output you need; the only exception is that extra string. Also, some non-standard characters might look like black blocks.
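The answer above reads the whole file at once, which does not cover Issue 1 (records that straddle two strings) and may not be practical for 8 GB of data. One possible approach, sketched here under assumptions, is to keep the unfinished tail of each chunk and prepend it to the next one. RecordStitcher is a hypothetical helper, and it assumes every record starts with the literal marker "$GPGGA", as in the sample.

```csharp
using System;
using System.Collections.Generic;

public sealed class RecordStitcher
{
    private const string Marker = "$GPGGA"; // assumed to start every record
    private string _carry = "";

    // Feeds one chunk and returns the records that are now complete.
    // Text after the last marker may be unfinished, so it is kept and
    // prefixed to the next chunk. Text before the first marker ever seen
    // (a record cut off at the start of the data) is discarded.
    public List<string> Feed(string chunk)
    {
        string buffer = _carry + chunk;
        var records = new List<string>();

        int start = buffer.IndexOf(Marker, StringComparison.Ordinal);
        if (start < 0)
        {
            _carry = buffer; // no marker yet; keep everything for later
            return records;
        }

        while (true)
        {
            int next = buffer.IndexOf(Marker, start + Marker.Length, StringComparison.Ordinal);
            if (next < 0) break;
            records.Add(buffer.Substring(start, next - start));
            start = next;
        }

        _carry = buffer.Substring(start);
        return records;
    }

    // Call once after the last chunk to retrieve whatever is left over.
    public string Flush()
    {
        string rest = _carry;
        _carry = "";
        return rest;
    }
}
```

Each record returned by Feed contains one GPS part and one "Header Test" part, so it can be split further with the same IndexOf("Header") / LastIndexOf("*") logic as in the answer. Because the payload after the header looks like binary audio data, it may be safer to do this on bytes rather than on decoded strings.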
Error trying to read csv file
Good day, I am having trouble reading CSV files in my ASP.NET project. It always returns the error
index out of range cannot find column 6
Before I go on explaining what I did, here is the code:
string savepath;
HttpPostedFile postedFile = context.Request.Files["Filedata"];
savepath = context.Server.MapPath("files");
string filename = postedFile.FileName;
todelete = savepath + @"\" + filename;
string forex = savepath + @"\" + filename;
postedFile.SaveAs(savepath + @"\" + filename);
DataTable tblcsv = new DataTable();
tblcsv.Columns.Add("latitude");
tblcsv.Columns.Add("longitude");
tblcsv.Columns.Add("mps");
tblcsv.Columns.Add("activity_type");
tblcsv.Columns.Add("date_occured");
tblcsv.Columns.Add("details");
string ReadCSV = File.ReadAllText(forex);
foreach (string csvRow in ReadCSV.Split('\n'))
{
    if (!string.IsNullOrEmpty(csvRow))
    {
        //Adding each row into datatable
        tblcsv.Rows.Add();
        int count = 0;
        foreach (string FileRec in csvRow.Split('-'))
        {
            tblcsv.Rows[tblcsv.Rows.Count - 1][count] = FileRec;
            count++;
        }
    }
}
I tried using comma-separated columns, but the text in the last column contains commas, so I tried the - symbol instead to make sure that there are no excess commas in the text file. The same error is still popping up. Am I doing something wrong? Thank you in advance.
Your CSV file might have more than 6 columns in one or more rows. In that case the split in the inner foreach finds more fields, but tblcsv has only 6 columns, so there is nowhere to assign the extra values. Try something like this:
foreach (string FileRec in csvRow.Split('-'))
{
    if (count > 5)
        break; // ignore any fields beyond the sixth
    tblcsv.Rows[tblcsv.Rows.Count - 1][count] = FileRec;
    count++;
}
However, it would be better to check for additional columns before processing and handle the issue.
// This will hold the rows that have more than 6 fields
StringBuilder errors = new StringBuilder();
foreach (string csvRow in ReadCSV.Split('\n'))
{
    if (!string.IsNullOrEmpty(csvRow))
    {
        DataRow dr = tblcsv.NewRow();
        int count = 0;
        bool valid = true;
        foreach (string FileRec in csvRow.Split('-'))
        {
            try
            {
                dr[count] = FileRec;
            }
            catch (IndexOutOfRangeException)
            {
                errors.AppendLine(csvRow);
                valid = false;
                break;
            }
            count++;
        }
        //Adding each valid row into datatable
        if (valid)
            tblcsv.Rows.Add(dr);
    }
}
Now we know which CSV rows are causing the errors, and the rest are processed successfully. Check whether the rows collected in errors contain the desired input; if not, correct the values in the CSV file.
You can't treat the file as a CSV if the delimiter appears inside a field. In this case you can use a regular expression to extract the first five fields, each terminated by a dash, then read the rest of the line as the sixth field.
With a regex you can match the entire string and even avoid splitting lines. Regular expressions can also be faster than splits and use less memory, because they don't create a temporary string for every field. That's why they are used extensively to parse log files. The ability to capture fields by name doesn't hurt either.
The following sample parses the entire file contents and captures each field in a named group. The last field captures everything to the end of the line:
var pattern = "^(?<latitude>.*?)-(?<longitude>.*?)-(?<mps>.*?)-(?<activity_type>.*?)-" +
              "(?<date_occured>.*?)-(?<details>.*)$";
var regex = new Regex(pattern, RegexOptions.Multiline);
var matches = regex.Matches(ReadCSV); // the file contents, not the path in forex
foreach (Match match in matches)
{
    DataRow row = tblcsv.NewRow();
    row["latitude"] = match.Groups["latitude"].Value;
    row["longitude"] = match.Groups["longitude"].Value;
    ...
    tblcsv.Rows.Add(row);
}
The (?<latitude>.*?)- pattern captures everything up to the first dash into a group named latitude. The .*? pattern means the matching isn't greedy, i.e. it won't try to capture everything to the end of the line but will stop when the first - is encountered. If the file uses Windows line endings, the last group will also include the trailing \r, so trim it.
The group names match the column names, which means you can add all fields with a loop:
foreach (Match match in matches)
{
    var row = tblcsv.NewRow();
    foreach (DataColumn col in tblcsv.Columns)
    {
        row[col.ColumnName] = match.Groups[col.ColumnName].Value;
    }
    tblcsv.Rows.Add(row);
}