I am looking for advice on the best way to parse a Microsoft Excel file and update/store the data in a given SQL Server database. I am using ASP.NET MVC, so I plan on having a page/view take in an Excel spreadsheet; from that user-supplied file I will need to use C# to parse the data from the columns and update the database, matching rows on the spreadsheet column that contains the key column of the database table. The spreadsheet will always be in the same format, so I only need to handle one format. This seems like a pretty common task; I am just looking for the best way to approach it before getting started. I am using Entity Framework in my current application, but I don't have to use it.
I found this solution which seems like it could be a good option:
public IEnumerable<MyEntity> ReadEntitiesFromFile(string filePath)
{
    var myEntities = new List<MyEntity>();
    using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
    using (var reader = ExcelReaderFactory.CreateOpenXmlReader(stream))
    {
        while (reader.Read())
        {
            var myEntity = new MyEntity();
            myEntity.MyProperty1 = reader.GetString(1);
            myEntity.MyProperty2 = reader.GetInt32(2);
            myEntities.Add(myEntity);
        }
    }
    return myEntities;
}
Here is an example of what a file might look like (Clock # is the key).
So given a file in this format, I want to match each user to a data table record using the Clock # and update that record with the values from the other cells. Each column in the spreadsheet has a corresponding column in the data table. All help is much appreciated.
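For the update step, what I have in mind with Entity Framework is roughly the sketch below (MyDbContext, Employees, and ClockNumber are placeholder names standing in for my real model):

// Sketch: update existing records matched on the key column (Clock #).
// MyDbContext, Employees, and ClockNumber are placeholder names.
public void UpdateFromSpreadsheet(IEnumerable<MyEntity> rows)
{
    using (var db = new MyDbContext())
    {
        foreach (var row in rows)
        {
            var existing = db.Employees.SingleOrDefault(e => e.ClockNumber == row.ClockNumber);
            if (existing == null)
                continue; // no match for this Clock #: skip (or insert, if required)

            existing.MyProperty1 = row.MyProperty1;
            existing.MyProperty2 = row.MyProperty2;
        }
        db.SaveChanges();
    }
}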
You can use the classes in the Microsoft.Office.Interop.Excel namespace, which abstract away everything in the solution you found. Instead of me rewriting it, you can check out this article: http://www.codeproject.com/Tips/696864/Working-with-Excel-Using-Csharp.
Better yet, why not bypass the middleman? You can use an existing ETL tool, such as Pentaho or Talend, to go straight from Excel to your database. These tools offer a lot of customization and are fairly straightforward to use. I've used Pentaho quite a lot for literally what you're describing, and it saved me the headache of writing the code myself. Unless you want to or need to write it yourself, I think the ETL route is the best approach.
Try this:
public DataTable GetDataTableOfExcel(string file_path)
{
    DataTable dt = new DataTable();
    using (OleDbConnection conn = new OleDbConnection())
    {
        string Import_FileName = Server.MapPath(file_path);
        string fileExtension = Path.GetExtension(Import_FileName);
        if (fileExtension == ".xlsx")
            conn.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + Import_FileName + ";Extended Properties='Excel 12.0 Xml;HDR=YES;'";
        using (OleDbCommand comm = new OleDbCommand())
        {
            comm.CommandText = "Select * from [Sheet1$]";
            comm.Connection = conn;
            using (OleDbDataAdapter da = new OleDbDataAdapter())
            {
                da.SelectCommand = comm;
                da.Fill(dt); // Fill opens and closes the connection itself
            }
        }
    }
    return dt;
}
Now your data is in a DataTable, and you can build insert queries from its rows. file_path is the Excel file's full path, including the directory name.
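For instance, a parameterized insert per row could look like this (just a sketch; MyTable, its columns, and sqlConnectionString are placeholders, and System.Data.SqlClient is assumed):

// Sketch: insert each DataTable row with a parameterized command.
// MyTable, Col1, Col2, and sqlConnectionString are placeholders.
using (var sqlConn = new SqlConnection(sqlConnectionString))
{
    sqlConn.Open();
    foreach (DataRow row in dt.Rows)
    {
        using (var cmd = new SqlCommand(
            "INSERT INTO MyTable (Col1, Col2) VALUES (@c1, @c2)", sqlConn))
        {
            cmd.Parameters.AddWithValue("@c1", row["Col1"]);
            cmd.Parameters.AddWithValue("@c2", row["Col2"]);
            cmd.ExecuteNonQuery();
        }
    }
}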
Here is my connection string for a .txt file, along with a piece of code:
public class FileTransfers
{
    public void fileFromDrive(string filename)
    {
        FileInfo file = new FileInfo(filename);
        string fileConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
            file.DirectoryName +
            ";Extended Properties='text;HDR=YES;FMT=Delimited(,)';";
        using (OleDbConnection con = new OleDbConnection(fileConnectionString))
        {
            using (OleDbCommand cmd = new OleDbCommand(
                string.Format("SELECT * FROM [{0}]", file.Name), con))
            {
                con.Open();
                using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
                {
                    DataTable tbl = new DataTable("Attendance");
                    adp.Fill(tbl);
                }
            }
        }
    }
}
But the problem is that when I debug, the records in tbl show the data in only one column, even though there are 7 columns and hundreds of rows in my .txt file. I have tried FMT=Delimited(,), FMT=TabDelimited, and FMT=Fixed, but I didn't get multiple columns. I know every entry needs a (,) at its end, but I can't do that manually.
There are some details you need to consider when doing this, as Jan Schreuder mentions in his article Using OleDb to Import Text Files:
The Jet engine makes assumptions about the content of the file. This can result in incorrect imports. For example, it might think a column contains date values. But in fact, your file should treat the columns as a string. In these cases, you should create a Schema.Ini file that describes the type of value for each column. The class creates a Schema.Ini file before it opens the delimited file, but only to specify what the delimiter is. You may want to change this to use pre-defined INI files that describe your input file.
So go ahead and create the schema.ini file as prescribed, and your issue will be gone. Its contents should look like this:
[FileName.csv]
ColNameHeader=True
Format=CSVDelimited
For more details on the how-tos, refer to the following MSDN guide: Schema.ini File (Text File Driver).
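If the driver's type guessing is still wrong after that, schema.ini can also declare each column's type explicitly, along these lines (the column names and types here are only an illustration, not your actual file):

[FileName.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=EmployeeId Text
Col2=EntryDate Date
Col3=HoursWorked Double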
I'm working with SQL Server 2014. One of the features of my web app is to upload CSV files, and import the data into a table (called TF) in my database (called TMPA).
I have no idea how to do this.
string excelPath = Server.MapPath("~/Files/") + Path.GetFileName(FileUpload1.PostedFile.FileName);
FileUpload1.SaveAs(excelPath);
SqlConnection con = new SqlConnection(@"Data Source=SAMSUNG-PC\SQLEXPRESS;Initial Catalog=TMPA;Persist Security Info=True");
StreamReader sr = new StreamReader(excelPath);
string line = sr.ReadLine();
string[] value = line.Split(',');
DataTable dt = new DataTable();
DataRow row;
foreach (string dc in value)
{
    dt.Columns.Add(new DataColumn(dc));
}
while (!sr.EndOfStream)
{
    value = sr.ReadLine().Split(',');
    if (value.Length == dt.Columns.Count)
    {
        row = dt.NewRow();
        row.ItemArray = value;
        dt.Rows.Add(row);
    }
}
sr.Close();
SqlBulkCopy bc = new SqlBulkCopy(con.ConnectionString, SqlBulkCopyOptions.TableLock);
bc.DestinationTableName = "TF";
bc.BatchSize = dt.Rows.Count;
con.Open();
bc.WriteToServer(dt);
bc.Close();
con.Close();
I tried this code, but it wouldn't work.
PS: TF has more columns than the CSV file: some of the columns are computed and should be calculated automatically after each insert.
Here is the layout of my CSV file (4 columns):
IdProduit,Mois,Reel,Budget
IdProduit is a string, Mois is a date, Reel and Budget are floats.
On the other hand, my SQL Server table looks like this:
|IdProduit|Mois|Reel|Budget|ReelPreviousMois|VarReelBudget|VarReelPrvM|...
|---------|----|----|------|----------------|-------------|-----------|-----
All the other columns should either be null or automatically calculated.
Help me!
I fixed it using an open-source .NET library called FileHelpers.
Here's the link: http://www.filehelpers.net/
Here's what I did:
<asp:FileUpload ID="FileUpload1" runat="server" />
<asp:Button ID="Button1" OnClick = "UploadF" runat="server" Text="Importer" />
And here's the code-behind:
[DelimitedRecord("|")]
public class TBFtable
{
    public string IdProduit;
    public DateTime Mois;
    public float Reel;
    public float Budget;
}
protected void UploadF(object sender, EventArgs e)
{
    string excelPath = Server.MapPath("~/Files/") + Path.GetFileName(FileUpload1.PostedFile.FileName);
    FileUpload1.SaveAs(excelPath);
    SqlServerStorage storage = new SqlServerStorage(typeof(TBFtable), ConfigurationManager.ConnectionStrings["bd"].ConnectionString);
    storage.InsertSqlCallback = new InsertSqlHandler(GetInsertSqlCust);
    TBFtable[] res = CommonEngine.ReadFile(typeof(TBFtable), excelPath) as TBFtable[];
    storage.InsertRecords(res);
    ScriptManager.RegisterClientScriptBlock(this, this.GetType(), "alertMessage", "alert('Données enregistrées avec succès !')", true);
}

protected string GetInsertSqlCust(object record)
{
    TBFtable obj = (TBFtable)record;
    return String.Format("INSERT INTO TF (IdProduit, Mois, Reel, Budget) VALUES ('{0}', '{1}', '{2}', '{3}');", obj.IdProduit, obj.Mois, obj.Reel, obj.Budget);
}
You're on the right path. Using SqlBulkCopy will provide the best performance when inserting the data into SQL Server. However, as opposed to writing your own CSV parser, I would use the stellar one provided in the .NET Framework via the TextFieldParser class in the Microsoft.VisualBasic assembly. As for the partial dataset: SqlBulkCopy does support it through its ColumnMappings collection; map just the columns your DataTable contains, and the remaining (computed or nullable) destination columns are left untouched.
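A sketch of that mapping, reusing the DataTable (dt) and connection (con) from the question:

// Sketch: map only the four CSV columns; TF's computed columns are
// left for SQL Server to fill in.
using (var bulk = new SqlBulkCopy(con.ConnectionString, SqlBulkCopyOptions.TableLock))
{
    bulk.DestinationTableName = "TF";
    bulk.ColumnMappings.Add("IdProduit", "IdProduit");
    bulk.ColumnMappings.Add("Mois", "Mois");
    bulk.ColumnMappings.Add("Reel", "Reel");
    bulk.ColumnMappings.Add("Budget", "Budget");
    bulk.WriteToServer(dt);
}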
I know this is an old question, but for whoever may be interested.
Unless you're sure that your files will never be massive, you should avoid loading the whole batch into memory and sending it all at once to SQL Server, as happens with the DataTable approach and (I think) your accepted answer. You risk an OutOfMemoryException on the client side (your file-processing server in this case) or, worse still, on the SQL Server side. You can avoid that by using the SqlBulkCopy class together with an implementation of the IDataReader interface.
I wrote a package that I think could be of interest in cases such as yours. The code would look like so:
var dataReader = new CsvDataReader("pathToYourCsv",
    new List<TypeCode>(4)
    {
        TypeCode.String,   // IdProduit
        TypeCode.DateTime, // Mois
        TypeCode.Double,   // Reel
        TypeCode.Double    // Budget
    });
this.bulkCopyUtility.BulkCopy("tableName", dataReader);
There are also additional configuration options for more complex scenarios (flexible column mapping, additional static column values which are not present in the csv file, value transformation).
The package is open sourced (project on Github) and should work on .NET Core and .NET Framework.
As a side comment, the SQL Server recovery model may be important when doing massive SQL imports. Whenever possible, use Simple or Bulk Logged to avoid huge transaction logs.
I was just wondering: how do I import large Excel files into MySQL with C#? My coding experience isn't great, and I was hoping someone could give me a rough idea of where to start. So far, I have been able to load Excel files into a DataGridView with the following code:
string PathConn = " Provider=Microsoft.JET.OLEDB.4.0;Data Source=" + pathTextBox.Text + ";Extended Properties =\"Excel 8.0;HDR=Yes;\";";
OleDbConnection conn = new OleDbConnection(PathConn);
conn.Open();
OleDbDataAdapter myDataAdapter = new OleDbDataAdapter("Select * from [" + loadTextBox.Text + "$]", conn);
table = new DataTable();
myDataAdapter.Fill(table);
But after that, I don't know how to extract the information and save it into the MySQL database. Assuming I have an empty schema created beforehand, how do I upload Excel files into MySQL? Thanks.
I think you would then need to loop over the items in the DataTable and do something with them (such as an INSERT statement into your DB), like so:
foreach (DataRow dr in table.Rows)
{
    string s = dr[0].ToString(); // the first column in the DataTable, as columns are zero-indexed
}
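For the MySQL side, a parameterized insert per row is the simplest starting point. A sketch, assuming the MySql.Data connector (namespace MySql.Data.MySqlClient); my_table, its columns, and mySqlConnectionString are placeholders:

// Sketch: insert each DataTable row into MySQL with a parameterized command.
// my_table, col1, col2, and mySqlConnectionString are placeholders.
using (var myConn = new MySqlConnection(mySqlConnectionString))
{
    myConn.Open();
    foreach (DataRow dr in table.Rows)
    {
        using (var cmd = new MySqlCommand(
            "INSERT INTO my_table (col1, col2) VALUES (@p0, @p1)", myConn))
        {
            cmd.Parameters.AddWithValue("@p0", dr[0]);
            cmd.Parameters.AddWithValue("@p1", dr[1]);
            cmd.ExecuteNonQuery();
        }
    }
}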
This is what I do in data migration scenarios, whether from one SQL Server to another or from data files to SQL:
1. Create the new table on the destination SQL Server (column names, primary key, etc.).
2. Load the existing data into a DataTable (that's what you did already).
3. Query the new table with the DataAdapter into another DataTable (same as you did with the Excel file, except you now query the SQL table).
4. Load the old data from 'table' into 'newTable' using the DataTable method Load().
string PathConn = (MYSQL Connection String goes here)
OleDbConnection conn = new OleDbConnection(PathConn);
conn.Open();
OleDbDataAdapter myDataAdapter = new OleDbDataAdapter("Select * from [" + loadTextBox.Text + "$]", conn);
newTable = new DataTable();
myDataAdapter.Fill(newTable);
Now use the Load() Method on the new table:
newTable.Load(table.CreateDataReader(), <Specify LoadOption here>)
Matching columns will be imported into the new DataTable. (You can ensure the mapping through using Aliases in the select statements)
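For example, if the primary key should decide whether an incoming row updates an existing one or is added, Upsert is the option to reach for:

// Upsert: rows whose primary key matches are updated, the rest are added.
newTable.Load(table.CreateDataReader(), LoadOption.Upsert);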
After loading the existing data into the new table, you can use a DataAdapter to write the changes back to the database. Example for writing data back: ConnString is the connection string for the DB, selectStmt can be the same one you used on the empty table before, and the newTable is provided as dtToWrite.
public static void writeDataTableToServer(string ConnString, string selectStmt, DataTable dtToWrite)
{
    using (OdbcConnection odbcConn = new OdbcConnection(ConnString))
    {
        odbcConn.Open();
        using (OdbcTransaction trans = odbcConn.BeginTransaction())
        using (OdbcDataAdapter daTmp = new OdbcDataAdapter(selectStmt, odbcConn))
        using (OdbcCommandBuilder cb = new OdbcCommandBuilder(daTmp))
        {
            try
            {
                cb.ConflictOption = ConflictOption.OverwriteChanges;
                daTmp.UpdateBatchSize = 5000;
                daTmp.SelectCommand.Transaction = trans;
                daTmp.SelectCommand.CommandTimeout = 120;
                daTmp.InsertCommand = cb.GetInsertCommand();
                daTmp.InsertCommand.Transaction = trans;
                daTmp.InsertCommand.CommandTimeout = 120;
                daTmp.UpdateCommand = cb.GetUpdateCommand();
                daTmp.UpdateCommand.Transaction = trans;
                daTmp.UpdateCommand.CommandTimeout = 120;
                daTmp.DeleteCommand = cb.GetDeleteCommand();
                daTmp.DeleteCommand.Transaction = trans;
                daTmp.DeleteCommand.CommandTimeout = 120;
                daTmp.Update(dtToWrite);
                trans.Commit();
            }
            catch (OdbcException)
            {
                trans.Rollback();
                throw;
            }
        }
    }
}
Hope this helps.
A primary key on the newTable is necessary; otherwise you might get a CommandBuilder exception.
BR
Therak
You're halfway there: you have obtained the information from the Excel spreadsheet and have it stored in a DataTable.
The first thing you need to do before you import a significant amount of data into SQL is to validate what you have read in from the spreadsheets.
You have a few options. One is to do something very similar to how you read your data in: use a SqlDataAdapter to perform an INSERT into the SQL database. All you really need to do in that case is create a new connection and write the INSERT command.
There are many examples of doing this on here.
Another option, which I would use, is LINQ to CSV (http://linqtocsv.codeplex.com/).
With this you can load all of your data into class objects, which makes it easier to validate each object before you perform your INSERT into SQL.
If you have limited experience, then use the SqlDataAdapter to connect to your DB.
Good Luck
I know this topic has been done to death, but I am at my wits' end.
I need to parse a CSV. It's a pretty average CSV, and the parsing logic was written using OleDb by another developer who swore that it worked before he went on vacation :)
CSV sample:
Dispatch Date,Master Tape,Master Time Code,Material ID,Channel,Title,Version,Duration,Language,Producer,Edit Date,Packaging,1 st TX,Last TX,Usage,S&P Rating,Comments,Replace,Event TX Date,Alternate Title
,a,b,c,d,e,f,g,h,,i,,j,k,,l,m,,n,
The problem I have is that I get various errors depending on the connection string I try.
When I try the connection string:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source="D:\TEST.csv\";Extended Properties="text;HDR=No;FMT=Delimited"
I get the error:
'D:\TEST.csv' is not a valid path. Make sure that the path name is spelled correctly and that you are connected to the server on which the file resides.
When I try the connection string:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=D:\TEST.csv;Extended Properties=Excel 12.0;
or the connection string
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\TEST.csv;Extended Properties=Excel 8.0;
I get the error:
External table is not in the expected format.
I am considering throwing away all the code and starting from scratch. Is there something obvious I am doing wrong?
You should indicate only the directory name in your connection string; the file name will be used in the query:
var filename = @"c:\work\test.csv";
var connString = string.Format(
    @"Provider=Microsoft.Jet.OleDb.4.0;Data Source={0};Extended Properties=""Text;HDR=YES;FMT=Delimited""",
    Path.GetDirectoryName(filename));
using (var conn = new OleDbConnection(connString))
{
    conn.Open();
    var query = "SELECT * FROM [" + Path.GetFileName(filename) + "]";
    using (var adapter = new OleDbDataAdapter(query, conn))
    {
        var ds = new DataSet("CSV File");
        adapter.Fill(ds);
    }
}
And instead of OleDB you could use a decent CSV parser (or another one).
An alternate solution is to use the TextFieldParser class (part of the .NET Framework itself): https://learn.microsoft.com/en-us/dotnet/api/microsoft.visualbasic.fileio.textfieldparser
This way you do not have to rely on another developer who has gone on holiday. I have used it so many times and have not hit any snags.
I'm posting this from work, so the example below is a from-memory sketch rather than tested code.
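Something along these lines, for a comma-delimited file (adjust the path and delimiter as needed):

// Minimal sketch: TextFieldParser lives in Microsoft.VisualBasic.FileIO
// (add a reference to the Microsoft.VisualBasic assembly).
using (var parser = new TextFieldParser(@"D:\TEST.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true;
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields();
        // process the fields of the current row here
    }
}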
It seems your first row contains the column names, so you need to include the HDR=YES property, like this:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=D:\TEST.csv;Extended Properties="Excel 12.0;HDR=YES";
Try the connection string:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\TEST.csv;Extended Properties=\"Excel 8.0;IMEX=1\""
var s = @"D:\TEST.csv";
string dir = Path.GetDirectoryName(s);
string sConnection = "Provider=Microsoft.Jet.OLEDB.4.0;"
    + "Data Source=\"" + dir + "\\\";"
    + "Extended Properties=\"text;HDR=YES;FMT=Delimited\"";
As part of a project I'm working on in C# I need to read in a .dbf file. The first thing I want to do is to get the schema table from the file. I have code that works as long as the filename (without the extension) is not longer than 8 characters.
For example, let's say I have a file named MyLongFilename.dbf. The following code does not work; it throws the following exception: “The Microsoft Jet database engine could not find the object 'MyLongFilename'. Make sure the object exists and that you spell its name and the path name correctly.”
string cxn = "PROVIDER=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyLongFilename;Extended Properties=dBASE 5.0";
OleDbConnection connection = new OleDbConnection(cxn);
To get past this exception, the next step is to use a name the OleDbConnection likes ('MyLongF~1' instead of 'MyLongFilename'), which leads to this:
string cxn = "PROVIDER=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyLongF~1;Extended Properties=dBASE 5.0";
OleDbConnection connection = new OleDbConnection(cxn);
This does successfully return an OleDbConnection. Now to get the schema table I try the following:
connection.Open();
DataTable schemaTable = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Columns,
    new object[] { null, null, fileNameNoExt, null });
This returns a DataTable with no rows. If I rename the filename to 8 or less characters then this code works and I get back a row for each field in the database.
With the long filename, I know the returned connection is valid because I can use it to fill a DataSet like so:
string selectQuery = "SELECT * FROM [MyLongF~1#DBF];";
OleDbCommand command = new OleDbCommand(selectQuery, connection);
connection.Open();
OleDbDataAdapter dataAdapter = new OleDbDataAdapter();
dataAdapter.SelectCommand = command;
DataSet dataSet = new DataSet();
dataAdapter.Fill(dataSet);
This gives me back a DataSet containing a DataTable with all of the data from the dbf file.
So the question is how can I get just the schema table for the long named dbf file? Of course I can work around the issue by renaming/copying the file, but that’s a hack I don’t want to have to make. Nor do I want to fill the DataSet with the top 1 record and deduce the schema from columns.
According to MSDN, the folder represents the database and the files represent the tables. You should therefore use the directory path, without the filename, in the connection string, and pass the table name as part of the restrictions to GetOleDbSchemaTable.
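Put together, that would look something like the sketch below (whether the table-name restriction needs the extension, e.g. 'MyLongFilename' vs. 'MyLongFilename#DBF', may take some experimenting):

// Sketch: connect to the directory (the "database"), then ask for the
// columns of the table named after the file.
string cxn = @"PROVIDER=Microsoft.Jet.OLEDB.4.0;Data Source=C:\;Extended Properties=dBASE 5.0";
using (var connection = new OleDbConnection(cxn))
{
    connection.Open();
    DataTable schemaTable = connection.GetOleDbSchemaTable(
        OleDbSchemaGuid.Columns,
        new object[] { null, null, "MyLongFilename", null });
}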
Well, I think the connection string should be

string cxn = @"PROVIDER=Microsoft.Jet.OLEDB.4.0;Data Source=C:\;Extended Properties=dBASE 5.0";
OleDbConnection connection = new OleDbConnection(cxn);
The other thing is, maybe you should try another provider. I got a big speed boost a while ago when I used this:

string cxn = @"PROVIDER=VFPOLEDB.1;Data Source=C:\;Extended Properties=dBASE 5.0";

But you need VFP 7 installed, or install the Microsoft OLE DB Provider for Visual FoxPro 9.0 from here.
const string connectionString = @"Provider = vfpoledb; Data Source = {0}; Collating Sequence = general;";
OleDbConnection conn = new OleDbConnection(string.Format(connectionString, dirName));
conn.Open();
OleDbCommand cmd = new OleDbCommand(string.Format("select * from {0}", fileName), conn);
Is fileNameNoExt holding the short filename version? Also, MyLongF~1 is 9 characters, not 8.
If you have a single (and possibly small) dbf file, you can work around the problem by copying the dbf file elsewhere and opening the copy instead of the original file.
I believe that the DataSource should represent the directory that contains the .DBF files. Each .DBF file corresponds to a table in that directory.
My guess is c:\MyLongF~1 is a short name for a directory that contains a filename corresponding to MyLongF~1#DBF
Can you verify whether or not this is the case?