I am importing an Excel sheet into a SQL Server database. The sheet has three columns:
id (numbers only) | data | passport
Before importing it, I want to check certain things:
the passport must begin with a letter, and the rest of its characters must be digits
the id must be numeric only
I am able to check the passport, but I am not able to check the id, even though I am using the same code I used for the passport check.
using (DbDataReader dr = command.ExecuteReader())
{
// SQL Server Connection String
string sqlConnectionString = "Data Source=DITSEC3;Initial Catalog=test;Integrated Security=True";
con.Open();
DataTable dt7 = new DataTable();
dt7.Load(dr);
DataRow[] ExcelRows = new DataRow[dt7.Rows.Count];
DataColumn[] ExcelColumn = new DataColumn[dt7.Columns.Count];
//=================================================
for (int i1 = 0; i1 < dt7.Rows.Count; i1++)
{
if (dt7.Rows[i1]["passport"] == null)
{
dt7.Rows[i1]["passport"] = 0;
}
if (dt7.Rows[i1]["id"] == null)
{
dt7.Rows[i1]["id"] = 0;
}
string a = Convert.ToString(dt7.Rows[i1]["passport"]);
string b = dt7.Rows[i1]["id"].ToString();
if (!string.IsNullOrEmpty(b))
{
int idlen = b.Length;
for (int j = 0; j < idlen; j++)
{
if (Char.IsDigit(b[j]))
{
//action
}
if(!Char.IsDigit(b[j]))
{
flag = flag + 1;
int errline = i1 + 2;
Label12.Text = "Error at line: " + errline.ToString();
//Label12.Visible = true;
}
}
if (!String.IsNullOrEmpty(a))
{
int len = a.Length;
for (int j = 1; j < len; j++)
{
if (Char.IsLetter(a[0]) && Char.IsDigit(a[j]) && !Char.IsSymbol(a[j]))
{
//action
}
else
{
flag = flag + 1;
int errline = i1 + 2;
Label12.Text = "Error at line: " + errline.ToString();
//Label12.Visible = true;
}
}
}
}
For some strange reason, when I use a breakpoint I can see the value of id as long as the id is numeric in Excel. The moment the flow reaches a cell whose id is 25h547, the value of b turns into "". Is there any reason for this? I can give you the entire code if you require.
What seems to be happening is that, when the data is imported into the holding DataTable, one type is assumed for the whole column based on the first record: if the first record is alphanumeric, all records in the column are assumed to be alphanumeric; if the first one is numeric, all records are assumed to be numeric, so any alphanumeric records further down the column come back blank. I solved the problem myself by modifying the connection string: "Excel 8.0;IMEX=1;HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text"
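Once the columns come through as text, the two checks from the question (id is digits only, passport is one letter followed by digits) could also be written as regular expressions instead of character loops. This is only a minimal sketch: the helper names IsValidId and IsValidPassport are made up for illustration, and it assumes plain ASCII letters and digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the id must consist of digits only (at least one).
static bool IsValidId(string value) =>
    value != null && Regex.IsMatch(value, @"^[0-9]+$");

// Hypothetical helper: the passport must be one letter followed by digits.
static bool IsValidPassport(string value) =>
    value != null && Regex.IsMatch(value, @"^[A-Za-z][0-9]+$");

Console.WriteLine(IsValidId("25h547"));          // False
Console.WriteLine(IsValidId("25547"));           // True
Console.WriteLine(IsValidPassport("A1234567"));  // True
Console.WriteLine(IsValidPassport("1234567A"));  // False
```

An empty cell fails both checks here; if blank ids should be allowed, that would need a separate rule.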
"IMEX=1;" tells the driver to always read "intermixed" (numbers, dates, strings etc) data columns as text.
Specify the IMEX mode in the connection string to handle mixed values.
See: Mixed values in excel rows
Missing values. The Excel driver reads a certain number of rows (by
default, 8 rows) in the specified source to guess at the data type of
each column. When a column appears to contain mixed data types,
especially numeric data mixed with text data, the driver decides in
favor of the majority data type, and returns null values for cells
that contain data of the other type. (In a tie, the numeric type
wins.) Most cell formatting options in the Excel worksheet do not seem
to affect this data type determination. You can modify this behavior
of the Excel driver by specifying Import Mode. To specify Import Mode,
add IMEX=1 to the value of Extended Properties in the connection
string of the Excel connection manager in the Properties window
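As a rough illustration of where IMEX=1 goes, here is how the Extended Properties part could be put together for an .xls workbook read through the Jet OLE DB provider. The helper name and the file path are made up, and HDR=YES is an assumption (it keeps the header row as column names, which the code in the question relies on).

```csharp
using System;

// Hypothetical helper: builds an OLE DB connection string for an .xls file.
// IMEX=1 asks the driver to read mixed-type columns as text.
static string BuildExcelConnectionString(string workbookPath) =>
    "Provider=Microsoft.Jet.OLEDB.4.0;" +
    "Data Source=" + workbookPath + ";" +
    "Extended Properties=\"Excel 8.0;HDR=YES;IMEX=1\"";

// The path below is only an example.
Console.WriteLine(BuildExcelConnectionString(@"C:\data\people.xls"));
```

The Extended Properties value has to be quoted because it contains semicolons of its own.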
I'm trying to parse Excel (.xls, .xlsx) files. The structure of files is the same except for the amount of the records.
I need to parse the industry. In this case it is "FinTech". Since it is in one cell, I guess I have to use a regex such as ^Industry: (.*)$?
It also has to find at which row/column the list of people starts and put it into an IEnumerable<Person>. It could use the following regexes.
Number always consists of 6 digits. ^[0-9]{6}$
Name consists of at least two words where each one of them starts with a capital letter. ^([a-zA-Z]+\s?\b){2,}$
A test .xlsx file can be found here https://docs.google.com/spreadsheets/d/15SR04cHXgGLWe0cuOOuuB5vUZigebh96/edit?usp=sharing&ouid=112418126731411268789&rtpof=true&sd=true.
List of people
Normal condition
Industry: FinTech
# Number Name
1 226250 Zain Griffiths
2 226256 Michael Houghton
3 226259 Hugo Willis Johnson
4 226264 Anna-Maria Rose
The actual question
First of all, I'm not completely sure whether my regexes are correct. I was only able to display the rows and the columns, but I'm not sure how to actually parse the industry and the list of people into an IEnumerable<Person>. So how do I do that?
Snippet
// Program.cs
var excel = new ExcelParser();
var sheet1 = excel.Import(@"a.xlsx");
Console.OutputEncoding = Encoding.UTF8;
for (var i = 0; i < sheet1.Rows.Count; i++)
{
for (var j = 0; j < sheet1.Columns.Count; j++)
{
var cell = sheet1.Rows[i][j].ToString()?.Trim();
Console.Write($"Column: {cell} | ");
}
Console.WriteLine();
}
Console.ReadLine();
// ExcelParser.cs
public sealed class ExcelParser
{
public ExcelParser()
{
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
}
public DataTable Import(string filePath)
{
// does file exist?
if (!File.Exists(filePath))
{
throw new FileNotFoundException();
}
// .xls or .xlsx allowed
var extension = new FileInfo(filePath).Extension.ToLowerInvariant();
if (extension is not (".xls" or ".xlsx"))
{
throw new NotSupportedException();
}
// read .xls or .xlsx
using var stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
using var reader = ExcelReaderFactory.CreateReader(stream);
var dataSet = reader.AsDataSet(new ExcelDataSetConfiguration
{
ConfigureDataTable = _ => new ExcelDataTableConfiguration
{
UseHeaderRow = false
}
});
// Sheet1
return dataSet.Tables[0];
}
}
The structure of files is the same except for the amount of the records
As long as the table is structured (or semi-structured), you can state one or two simple assumptions and parse the tables based on them; if the structure does not follow the assumptions, you return false (throw an exception, etc.).
Actually, designing regexes to parse the table is itself a way of encoding assumptions. I just want to keep it simple, so, based on the problem statement, here are my assumptions:
There will be an "industry" (or "industry:", after calling .ToLower()) string in a separate cell (a regex would do nothing more than find such a string), and the industry's name will be in the same cell.[1]
The first person's name will be next to the first 6-digit-number cell.[2]
Here is the code
public (string industryName, List<string> peopleNames) ParseSheet(DataTable sheet1)
{
// 1. Get Indices of industry cell and first Name in people names..
var industryCellIndex = (-1, -1, false);
var peopleFirstCellIndex = (-1, -1, false);
for (var i = 0; i < sheet1.Rows.Count; i++)
{
for (var j = 0; j < sheet1.Columns.Count; j++)
{
// .ToLower() added
var cell = sheet1.Rows[i][j].ToString()?.Trim().ToLower();
if (cell.StartsWith("industry"))
{
industryCellIndex = (i, j, true);
break;
}
// the name after the first 6-digits number cell will be the first name in people records
if (cell.Length == 6 && int.TryParse(cell, out _))
{
peopleFirstCellIndex = (i, j + 1, true);
break;
}
}
if (industryCellIndex.Item3 && peopleFirstCellIndex.Item3)
break;
}
if (!industryCellIndex.Item3 || !peopleFirstCellIndex.Item3)
{
// throw new Exception("Excel file is not normalized!");
return (null, null);
}
// 2. retrieve the desired data
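var industryName = sheet1.Rows[industryCellIndex.Item1][industryCellIndex.Item2]
.ToString()
.Replace(":", ""); // will do nothing if there were no ":"
// the cell keeps its original casing, so search for "industry" case-insensitively
industryName = industryName.Substring(industryName.IndexOf("industry", StringComparison.OrdinalIgnoreCase) + "industry".Length).Trim();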
var peopleNames = new List<string>();
var colIndex = peopleFirstCellIndex.Item2;
for (var rowIndex = peopleFirstCellIndex.Item1;
rowIndex < sheet1.Rows.Count;
rowIndex++)
{
peopleNames.Add(sheet1.Rows[rowIndex][colIndex].ToString()?.Trim());
}
return (industryName, peopleNames);
}
[1] If this assumption needs some editing (for example, the industry name might be in the cell next to the one that has the "industry" string), the idea is still the same; you can account for this when parsing.
[2] And, for example, after the "#" cell by 2 columns and 1 row.
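For comparison, the same two assumptions could be expressed with the regexes from the question. This is just a sketch under a few assumptions of mine: the sheet is rebuilt in memory rather than read from the .xlsx file, ParsePeople is a made-up name, and a (Number, Name) tuple stands in for the Person type, which the question does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the industry name and (Number, Name) pairs.
static (string Industry, List<(string Number, string Name)> People) ParsePeople(DataTable table)
{
    string industry = null;
    var people = new List<(string Number, string Name)>();

    foreach (DataRow row in table.Rows)
    {
        for (var col = 0; col < table.Columns.Count; col++)
        {
            var cell = row[col].ToString().Trim();

            // Assumption 1: the industry name shares a cell with "Industry:".
            var industryMatch = Regex.Match(cell, @"^Industry:\s*(.*)$", RegexOptions.IgnoreCase);
            if (industryMatch.Success)
            {
                industry = industryMatch.Groups[1].Value.Trim();
                break;
            }

            // Assumption 2: the name sits right after a 6-digit number cell.
            if (Regex.IsMatch(cell, @"^[0-9]{6}$") && col + 1 < table.Columns.Count)
            {
                people.Add((cell, row[col + 1].ToString().Trim()));
                break;
            }
        }
    }

    return (industry, people);
}

// In-memory stand-in for the sample sheet from the question.
var demoSheet = new DataTable();
demoSheet.Columns.Add("c0");
demoSheet.Columns.Add("c1");
demoSheet.Columns.Add("c2");
demoSheet.Rows.Add("List of people", "", "");
demoSheet.Rows.Add("Industry: FinTech", "", "");
demoSheet.Rows.Add("#", "Number", "Name");
demoSheet.Rows.Add("1", "226250", "Zain Griffiths");
demoSheet.Rows.Add("2", "226256", "Michael Houghton");
demoSheet.Rows.Add("3", "226259", "Hugo Willis Johnson");
demoSheet.Rows.Add("4", "226264", "Anna-Maria Rose");

var parsed = ParsePeople(demoSheet);
Console.WriteLine(parsed.Industry);      // FinTech
Console.WriteLine(parsed.People.Count);  // 4
```

Note that the name regex from the question is not used here; it does not accept a hyphenated name such as "Anna-Maria Rose", so the sketch simply takes whatever follows the number cell.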
I am making a program in Visual Studio where you can read in an excel file in a specific format and where my program converts the data from the excel file in a different format and stores it in a database table.
Below you can find a part of my code where something strange happens
//copy schema into new datatable
DataTable _longDataTable = _library.Clone();
foreach (DataRow drlibrary in _library.Rows)
{
//count number of variables in a row
string check = drlibrary["Check"].ToString();
int varCount = check.Length - check.Replace("{", "").Length;
int count_and = 0;
if (check.Contains("and") || check.Contains("or"))
{
count_and = Regex.Matches(check, "and").Count;
varCount = varCount - count_and;
}
//loop through number of counted variables in order to add rows to long datatable (one row per variable)
for (int i = 1; i <= varCount; i++)
{
var newRow = _longDataTable.NewRow();
newRow.ItemArray = drlibrary.ItemArray;
string j = i.ToString();
//fill variablename with variable number
if (i < 10)
{
newRow["VariableName"] = "Variable0" + j;
}
else
{
newRow["VariableName"] = "Variable" + j;
}
}
}
When varCount equals 1, I get the following error message when running the program after inserting an excel file
The source contains no DataRows.
I don't know why I can't run the for loop with just one iteration. Anyone who can help me?
I have a csv file with 8 columns, and I am trying to populate an object with 8 variables, each being a list to hold the columns in the csv file. Firstly, I am populating a DataTable with my csv data.
I am now trying to populate my object with the data from the DataTable
DataTable d = GetDataTableFromCSVFile(file);
CoolObject l = new CoolObject();
for (int i = 0; i < d.Rows.Count; i++)
{
l.column1[i] = d.Rows[i].Field<int>("column1"); <-- error here
}
And here is my CoolObject
public class CoolObject
{
public List<int> column1 { set; get; }
protected CoolObject()
{
column1 = new List<int>();
}
}
Unfortunately I am receiving an error on the highlighted line:
System.InvalidCastException: Specified cast is not valid
Why is this not allowed? How do I work around it?
Evidently your DataTable contains columns of type string, so do the integer validation in the GetDataTableFromCSVFile method, so that consumers of this method don't need to worry about it.
private DataTable GetDataTableFromCSVFile(string file)
{
var data = new DataTable();
data.Columns.Add("Column1", typeof(int));
// Read lines of file
// line is an imaginary object which contains the values of one row of csv data
foreach(var line in lines)
{
var row = data.NewRow();
int.TryParse(line.Column1Value, out int column1Value);
row.SetField("Column1", column1Value); // will set 0 if value is invalid
// other columns
data.Rows.Add(row); // NewRow() alone does not attach the row to the table
}
return data;
}
Then there is another problem with your code: you assign new values to the List<int> through an index while the list is still empty
l.column1[i] = d.Rows[i].Field<int>("column1");
The line above will throw an exception because an empty list doesn't have an item at index i.
So in the end your method will look like this
DataTable d = GetDataTableFromCSVFile(file);
CoolObject l = new CoolObject();
foreach (DataRow row in d.Rows)
{
l.column1.Add(row.Field<int>("column1"));
}
In case you are using some third-party library for retrieving data from csv to DataTable - you can check if that library provide possibility to validate/convert string values to expected types in DataTable.
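Putting the two points together (a typed int column filled via int.TryParse, and List<int>.Add instead of indexing), a self-contained version could look roughly like this. BuildTable is a made-up stand-in for GetDataTableFromCSVFile that takes lines already read from the file, and splitting on a comma is a simplification that ignores quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical stand-in for GetDataTableFromCSVFile, working on in-memory lines.
static DataTable BuildTable(IEnumerable<string> csvLines)
{
    var table = new DataTable();
    table.Columns.Add("column1", typeof(int));

    foreach (var line in csvLines)
    {
        var firstField = line.Split(',')[0];
        // TryParse leaves 0 in 'number' when the text is not a valid integer.
        int.TryParse(firstField, out var number);
        table.Rows.Add(number);
    }

    return table;
}

var csvTable = BuildTable(new[] { "10,a", "x,b", "30,c" });

var column1 = new List<int>();
foreach (DataRow row in csvTable.Rows)
{
    // Add grows the list; indexing into an empty list would throw.
    column1.Add(row.Field<int>("column1"));
}

Console.WriteLine(string.Join(", ", column1)); // 10, 0, 30
```

Silently turning bad values into 0 follows the answer above; whether that is acceptable depends on the data.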
Sounds like someone didn't enter a number in one of the cells. You'll have to perform a validation check before reading the value.
for (int i = 0; i < d.Rows.Count; i++)
{
object o = d.Rows[i]["column1"];
if (!(o is int)) continue;
l.column1.Add((int)o);
}
Or perhaps it is a number but for some reason is coming through as a string. You could try it this way:
for (int i = 0; i < d.Rows.Count; i++)
{
int n;
bool ok = int.TryParse(d.Rows[i]["column1"].ToString(), out n);
if (!ok) continue;
l.column1.Add(n);
}
UPDATED: added full block of code where error occurs
UPDATE 2: I found a weird anomaly. The code has been breaking on that line consistently when the tabName variable equals "service line prior year". This morning, for grins, I changed the tab name to "test", so the tabName variable in turn equals "test", and it worked more often than not. I am really at a loss.
I have researched a ton and can't find anything that addresses what is happening in my code. It happens randomly though. Sometimes it doesn't happen, then other times it happens in the same spot, but all on this part of the code (on the line templateSheet = templateBook.Sheets[tabName];):
public void ExportToExcel(DataSet dataSet, string filePath, int i, int h, Excel.Application excelApp)
{
//create the excel definitions again.
//Excel.Application excelApp = new Excel.Application();
//excelApp.Visible = true;
FileInfo excelFileInfo = new FileInfo(filePath);
Boolean fileOpenTest = IsFileOpen(excelFileInfo);
Excel.Workbook templateBook;
Excel.Worksheet templateSheet;
//check to see if the template is already open, if its not then open it,
//if it is then bind it to work with it
if (!fileOpenTest)
{ templateBook = excelApp.Workbooks.Open(filePath); }
else
{ templateBook = (Excel.Workbook)System.Runtime.InteropServices.Marshal.BindToMoniker(filePath); }
//this grabs the name of the tab to dump the data into from the "Query Dumps" Tab
string tabName = lstQueryDumpSheet.Items[i].ToString();
templateSheet = templateBook.Sheets[tabName];
excelApp.Calculation = Excel.XlCalculation.xlCalculationManual;
templateSheet = templateBook.Sheets[tabName];
// Copy DataTable
foreach (System.Data.DataTable dt in dataSet.Tables)
{
// Copy the DataTable to an object array
object[,] rawData = new object[dt.Rows.Count + 1, dt.Columns.Count];
// Copy the values to the object array
for (int col = 0; col < dt.Columns.Count; col++)
{
for (int row = 0; row < dt.Rows.Count; row++)
{ rawData[row, col] = dt.Rows[row].ItemArray[col]; }
}
// Calculate the final column letter
string finalColLetter = string.Empty;
string colCharset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
int colCharsetLen = colCharset.Length;
if (dt.Columns.Count > colCharsetLen)
{ finalColLetter = colCharset.Substring((dt.Columns.Count - 1) / colCharsetLen - 1, 1); }
finalColLetter += colCharset.Substring((dt.Columns.Count - 1) % colCharsetLen, 1);
//this grabs the cell address from the "Query Dump" sheet, splits it on the '=' and
//pulls out only the cell address (i.e., "address=a3" becomes "a3")
string dumpCellString = lstQueryDumpText.Items[i].ToString();
string dumpCell = dumpCellString.Split('=').Last();
//refers to the range in which we are dumping the DataSet. The upper left hand cell is
//defined by the 'dumpCell' variable and the bottom right cell is defined by the
//final column letter and the count of rows.
string firstRef = "";
string baseRow = "";
if (char.IsLetter(dumpCell, 1))
{
char[] createCellRef = dumpCell.ToCharArray();
firstRef = createCellRef[0].ToString() + createCellRef[1].ToString();
for (int z = 2; z < createCellRef.Count(); z++)
{
baseRow = baseRow + createCellRef[z].ToString();
}
}
else
{
char[] createCellRef = dumpCell.ToCharArray();
firstRef = createCellRef[0].ToString();
for (int z = 1; z < createCellRef.Count(); z++)
{
baseRow = baseRow + createCellRef[z].ToString();
}
}
int baseRowInt = Convert.ToInt32(baseRow);
int startingCol = ColumnLetterToColumnIndex(firstRef);
int endingCol = ColumnLetterToColumnIndex(finalColLetter);
int finalCol = startingCol + endingCol;
string endCol = ColumnIndexToColumnLetter(finalCol - 1);
int endRow = (baseRowInt + (dt.Rows.Count - 1));
string cellCheck = endCol + endRow;
string excelRange;
if (dumpCell.ToUpper() == cellCheck.ToUpper())
{
excelRange = string.Format(dumpCell + ":" + dumpCell);
}
else
{
excelRange = string.Format(dumpCell + ":{0}{1}", endCol, endRow);
}
//this dumps the cells into the range on Excel as defined above
templateSheet.get_Range(excelRange, Type.Missing).Value2 = rawData;
//checks to see if all the SQL queries have been run from the "Query Dump" tab, if not, continue
//the loop, if it is the last one, then save the workbook and move on.
if (i == lstSqlAddress.Items.Count - 1)
{
excelApp.Calculation = Excel.XlCalculation.xlCalculationAutomatic;
/*Run through the value save sheet array then grab the address from the corresponding list
place in the address array. If the address reads "whole sheet" then save the whole page,
else set the addresses range and value save that.*/
//for (int y = 0; y < lstSaveSheet.Items.Count; y++)
//{
// MessageBox.Show("Save Sheet: " + lstSaveSheet.Items[y] + "\n" + "Save Address: " + lstSaveRange.Items[y]);
//}
//run the macro to hide the unused columns
excelApp.Run("ReportMakerExecute");
//save excel file as hospital name and move onto the next
SaveTemplateAs(templateBook, h);
//close the open Excel App before looping back
//Marshal.ReleaseComObject(templateSheet);
//Marshal.ReleaseComObject(templateBook);
//templateSheet = null;
//templateBook = null;
//GC.Collect();
//GC.WaitForPendingFinalizers();
}
//Close excel Applications
//excelApp.Quit();
//Marshal.ReleaseComObject(templateSheet);
//Marshal.FinalReleaseComObject(excelApp);
//excelApp = null;
//templateSheet = null;
// GC.Collect();
//GC.WaitForPendingFinalizers();
}
}
The try/catch block is of no use either. This is the error:
"An unhandled exception of type 'System.AccessViolationException' occurred inSQUiRE (Sql QUery REtriever) v1.exe. Additional information: Attempted to read or write protected memory. This is often an indication that other memory is corrupt."
System.AccessViolationException normally happens when native (non-.NET) code tries to access unallocated memory. .NET then surfaces it in the managed world as this exception.
Your code itself does not have any unsafe block, so the access violation must be happening inside Excel.
Given that it sometimes happens and sometimes does not, I would say that it can be caused by parallel Excel usage (I think Excel COM is not thread-safe).
I would recommend putting all your code inside a lock block, to prevent Excel from being used in parallel. Something like this:
public void ExportToExcel(DataSet dataSet, string filePath, int i, int h, Excel.Application excelApp)
{
lock(this.GetType()) // You can change this to another instance to be used as a mutex
{
// Your original code here
}
}
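If parallel access really is the cause, a dedicated private lock object is usually suggested over locking on a Type, since a Type instance is reachable by any other code that might lock on it too. Below is a small sketch of the idea with no Excel involved; ExportSerialized is a made-up placeholder for the export method, and the counters only exist to show that no two calls overlap.

```csharp
using System;
using System.Threading.Tasks;

// One shared lock object that nothing else can reach.
var excelLock = new object();
var callsInside = 0;
var maxConcurrent = 0;

// Hypothetical placeholder for ExportToExcel.
void ExportSerialized(int i)
{
    lock (excelLock)
    {
        callsInside++;
        maxConcurrent = Math.Max(maxConcurrent, callsInside);
        // ... the Excel interop work would go here ...
        callsInside--;
    }
}

// Call the method from many threads at once.
Parallel.For(0, 100, ExportSerialized);

Console.WriteLine(maxConcurrent); // 1
```

In a class this would typically be a private static readonly object field rather than a local variable.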
So, long story short and three days of testing later, it was because of the Excel file that the code was trying to open and fill with SQL results. The buffer was filling up and causing an exception. It just happened at the same point in every run because the load time for the Excel file was the determining factor in whether it worked or failed.
So after the load I just added a delaying do...while loop that checked whether the file was accessible, and it stopped the failures. fileOpenTest was taken from here
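do
{
// Block for two seconds; an un-awaited Task.Delay returns immediately
Thread.Sleep(2000);
// re-check, otherwise the loop condition never changes
fileOpenTest = IsFileOpen(excelFileInfo);
}
while(!fileOpenTest);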
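A loop like that can spin forever if the condition never becomes true, so a variant with a timeout may be safer. The following is only a sketch: WaitUntil is a made-up helper, and the lambda in the usage part stands in for the real file check.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: polls 'condition' until it returns true or 'timeout' elapses.
static bool WaitUntil(Func<bool> condition, TimeSpan pollInterval, TimeSpan timeout)
{
    var clock = Stopwatch.StartNew();
    while (!condition())
    {
        if (clock.Elapsed >= timeout)
        {
            return false; // gave up
        }
        Thread.Sleep(pollInterval);
    }
    return true;
}

// Stand-in condition that becomes true on the third check.
var attempts = 0;
var ready = WaitUntil(() => ++attempts >= 3,
                      TimeSpan.FromMilliseconds(10),
                      TimeSpan.FromSeconds(5));

Console.WriteLine(ready); // True
```

In the export code the condition would be something like () => IsFileOpen(excelFileInfo), with the caller deciding what to do when the helper returns false.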
I have a dataset returned from one table with 3 columns; study_key, version and interp_text_rtf.
I write the interp_text_rtf to a file and need to attach the version value (an int between 1 and 9, depending on how many versions there are) to the name of the file created from the interp_text_rtf contents.
I have this code so far:
public void WriteRTF(DataSet aDataSet)
{
foreach (DataTable aDataTable in aDataSet.Tables)
{
for (int i = 0; i < aDataTable.Rows.Count; i++)
{
foreach (DataColumn wDataColumn in aDataTable.Columns)
{
if (wDataColumn.ColumnName == "interpretation_text_rtf")
{
rtf = aDataTable.Rows[i][wDataColumn].ToString();
}
}
File.WriteAllText(mPath + rtfFile + i + ".rtf", rtf);
}
}
}
Right now it is just incrementing "i" with 0, 1, 2 etc depending on how many versions but I need to use value from version column instead of increment.
Thanks in advance for any help.
Why are you iterating the columns?
You can access them directly using the column name:
foreach (DataTable aDataTable in aDataSet.Tables)
{
for (int i = 0; i < aDataTable.Rows.Count; i++)
{
rtf = aDataTable.Rows[i]["interpretation_text_rtf"].ToString();
version = aDataTable.Rows[i]["version"].ToString();
File.WriteAllText(mPath + rtfFile + i + version +".rtf", rtf);
}
}
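If the file name should carry only the version value rather than the loop counter, the name could be built from the version column alone. Here is a sketch that just produces the names without writing any files; BuildRtfFileNames, the "out" folder and the "study_" prefix are made up for the example, and the table is built in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

// Hypothetical helper: one file name per row, suffixed with that row's version.
static List<string> BuildRtfFileNames(DataTable table, string folder, string baseName)
{
    var names = new List<string>();
    foreach (DataRow row in table.Rows)
    {
        var version = row["version"].ToString();
        names.Add(Path.Combine(folder, baseName + version + ".rtf"));
    }
    return names;
}

// In-memory stand-in for the table returned by the query.
var studies = new DataTable();
studies.Columns.Add("version", typeof(int));
studies.Columns.Add("interpretation_text_rtf", typeof(string));
studies.Rows.Add(1, "{\\rtf1 first}");
studies.Rows.Add(2, "{\\rtf1 second}");

var fileNames = BuildRtfFileNames(studies, "out", "study_");
foreach (var fileName in fileNames)
{
    Console.WriteLine(fileName);
}
```

This assumes the version is unique per file; if two rows share a version, the second write would overwrite the first.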