Our ASP.NET web app lets users import data from Excel. We're currently hosted on our own servers, so we have access to the file system from both the web server and the SQL server. The process currently works like this:
1. Save the uploaded file to a temp folder on the server
2. Execute T-SQL to read the uploaded file directly from the file system into a SQL temp table via OLEDB
3. Execute T-SQL to read data from the temp table and process as needed (e.g. compare to existing data and update/insert as appropriate)
Step 2 looks something like this:
Select * into #MY_TEMP_TABLE
From OpenRowSet(
'Microsoft.ACE.OLEDB.12.0',
'Excel 12.0; Database=PATH_TO_MY_UPLOADED_FILE; HDR=YES',
'Select * From [MY_WORKSHEET_NAME$]')
This is very fast and straightforward compared to (for example) reading the file into a datatable in .NET using EPPlus and then inserting the data row by row.
We're in the process of making the app Azure-ready (Azure Website and SQL Database, not VM). We can upload the file to blob storage and get its contents into a byte array, but then we're stuck with the row-by-row processing approach, which is slow and brittle.
Is there a fast way to programmatically bulk upload Excel to SQL in Azure?
I'd look at one of the commercial Excel components from the likes of ComponentOne. Read the spreadsheet's contents into memory and then write it into Azure SQL Database using standard ADO.NET techniques (a rough sketch follows below).
This will probably be more reliable and you can utilise retry logic for transient failures (http://www.asp.net/aspnet/overview/developing-apps-with-windows-azure/building-real-world-cloud-apps-with-windows-azure/transient-fault-handling).
Note that you need to be aware of Throttling behaviour in Azure SQL Database and how it might impact your app: http://msdn.microsoft.com/en-us/library/azure/dn338079.aspx
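Whichever Excel component you use (a commercial one such as ComponentOne, or EPPlus as mentioned in the question), the pattern is the same: load the worksheet into a DataTable in memory, then push it to Azure SQL Database in a single bulk write with SqlBulkCopy rather than inserting row by row. A rough sketch using EPPlus, with hypothetical sheet, table and connection names:

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using OfficeOpenXml; // EPPlus

static class ExcelImporter
{
    public static void ImportExcel(Stream uploadedFile, string connectionString)
    {
        var table = new DataTable();

        using (var package = new ExcelPackage(uploadedFile))
        {
            var sheet = package.Workbook.Worksheets["MY_WORKSHEET_NAME"]; // hypothetical sheet name
            int cols = sheet.Dimension.End.Column;
            int rows = sheet.Dimension.End.Row;

            // first row is the header (equivalent to HDR=YES in the OPENROWSET call)
            for (int c = 1; c <= cols; c++)
                table.Columns.Add(sheet.Cells[1, c].Text);

            for (int r = 2; r <= rows; r++)
            {
                var row = table.NewRow();
                for (int c = 1; c <= cols; c++)
                    row[c - 1] = sheet.Cells[r, c].Value ?? DBNull.Value;
                table.Rows.Add(row);
            }
        }

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            // one bulk write instead of one INSERT per row; column order must match
            // the hypothetical staging table, or add explicit ColumnMappings
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.ExcelStaging" })
                bulk.WriteToServer(table);
        }
    }
}

From there, the existing T-SQL that compares and updates existing data can run against the staging table, and the SqlBulkCopy call is a natural place to apply the retry logic mentioned above.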
Related
I have a requirement to save a large amount (>100 GB per day) of transactional data to Azure Data Lake Storage Gen2. The data is many small JSON transactions, so I was planning to batch the transactions together into logical file groups to avoid creating lots of small files. This will allow data analysis to occur over the entire dataset.
However, I also have a separate requirement to retrieve individual transactions from a C# app. Is that possible? There doesn't seem to be an appropriate method on the REST API, and the U-SQL examples that I've found don't seem to be exposed to C# apps in any way.
Maybe I'm trying to use data lake for the wrong purpose but I don't want to save this quantity of data twice if I can help it.
Thanks!
This solution will allow T-SQL queries against all your JSON files:
1. Create a Data Factory pipeline to read the JSON files and output Parquet-formatted files.
2. Use Azure Synapse Workspace On-Demand to read the Parquet files with OPENROWSET pointing to the Azure Storage location of the Parquet files.
3. In Synapse Workspace On-Demand, create a SQL Server login for the C# app.
4. Use ADO.NET to send SQL commands from C# (a sketch follows below).
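A minimal sketch of step 4 using plain ADO.NET against the Synapse On-Demand (serverless) endpoint; the endpoint name, storage path, login and TransactionId column are assumptions about your setup:

using System;
using Microsoft.Data.SqlClient;

class TransactionLookup
{
    static void Main()
    {
        // hypothetical On-Demand endpoint and the login created for the C# app
        const string connStr =
            "Server=myworkspace-ondemand.sql.azuresynapse.net;Database=<database>;" +
            "User ID=csharp_app;Password=<password>;Encrypt=True;";

        // OPENROWSET reads the Parquet files that Data Factory wrote to storage;
        // the path and the TransactionId column are assumptions about your layout/schema
        const string query = @"
            SELECT TOP 1 *
            FROM OPENROWSET(
                    BULK 'https://mystorageaccount.dfs.core.windows.net/transactions/**',
                    FORMAT = 'PARQUET') AS txn
            WHERE txn.TransactionId = @id;";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(query, conn))
        {
            cmd.Parameters.AddWithValue("@id", "00000000-0000-0000-0000-000000000000");
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine(reader["TransactionId"]);
            }
        }
    }
}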
I'm trying to download a big file from FTP, store it somewhere in Azure storage, and then run BULK INSERT on Azure SQL to load the data into a table.
I have an FTP server that I read data from as CSV files. Some of those files are very large, about 1.5 GB or even more. So far, I have been downloading these files into memory and then saving them to the database using SqlBulkCopy from C# on Azure, but now I'm getting an OutOfMemoryException, which seems to be due to the size of the files.
That's why I'm thinking about using BULK INSERT directly from SQL on Azure, but then that SQL instance needs access to the storage the file is downloaded to, and of course it can't be my local machine: it seems I can't run a BULK INSERT command on SQL Server on Azure when the source file is on my local storage.
Is there any way to download and save a file into Azure storage that SQL has access to, and then execute BULK INSERT?
You can use Data Factory to copy the data from the FTP server to Azure SQL.
Data Factory has good performance for transferring big data:
Data Factory supports FTP as a connector.
Please reference these tutorials:
Copy data from FTP server by using Azure Data Factory
Copy data to or from Azure SQL Database by using Azure Data Factory
Copy data from Azure Blob storage to a SQL database by using the Copy Data tool
How can I download the file directly from FTP into that storage on Azure?
You can create a pipeline using FTP as the source and Storage as a linked service.
You can also copy a big file from FTP to Azure SQL directly.
Hope this helps.
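For completeness, if you still want the BULK INSERT route mentioned in the question: once the FTP file has been landed in Blob Storage (by Data Factory or a stream-to-blob upload), Azure SQL Database can bulk-load it directly from the blob through an external data source, so nothing has to pass through app memory. A rough sketch with hypothetical names; the database-scoped credential and external data source are assumed to have been created beforehand:

using Microsoft.Data.SqlClient;

class BulkLoadFromBlob
{
    static void Main()
    {
        // One-time setup run separately (hypothetical names):
        //   CREATE DATABASE SCOPED CREDENTIAL BlobCredential
        //       WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '<sas-token>';
        //   CREATE EXTERNAL DATA SOURCE FtpDrops WITH (
        //       TYPE = BLOB_STORAGE,
        //       LOCATION = 'https://mystorageaccount.blob.core.windows.net/ftp-drops',
        //       CREDENTIAL = BlobCredential);

        const string bulkInsert = @"
            BULK INSERT dbo.ImportedRows
            FROM 'big-file.csv'
            WITH (DATA_SOURCE = 'FtpDrops', FORMAT = 'CSV', FIRSTROW = 2, TABLOCK);";

        using (var conn = new SqlConnection("<azure-sql-connection-string>"))
        using (var cmd = new SqlCommand(bulkInsert, conn))
        {
            cmd.CommandTimeout = 0;   // a 1.5 GB file can take a while
            conn.Open();
            cmd.ExecuteNonQuery();    // the load runs entirely server-side
        }
    }
}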
I'm developing an application using C# 4.0 and SQL Server 2008 R2 Express. My application needs to store and retrieve files (docx, pdf, png) locally and remotely. Which approach would be best?
Store the files in a separate database (problem: restricted to 10 GB)
Use a Windows shared folder (how would I do this?)
Use an FTP server (which server and library, and how would I do this?)
SQL Server supports FILESTREAM, so if you have enough control over the SQL Server install to enable that feature then it seems like a good fit for you.
FILESTREAM integrates the SQL Server Database Engine with an NTFS file system by storing varbinary(max) binary large object (BLOB) data as files on the file system. Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
Files stored directly in the file system with FILESTREAM don't count towards the database size because they aren't stored in the DB.
To confirm with an official source: https://learn.microsoft.com/en-us/sql/relational-databases/blob/filestream-compatibility-with-other-sql-server-features
SQL Server Express supports FILESTREAM. The 10-GB database size limit does not include the FILESTREAM data container.
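On the C# side, FILESTREAM data can be written as a stream via SqlFileStream, so large files never have to be fully loaded into memory. A rough sketch, assuming a hypothetical dbo.Documents table with a ROWGUIDCOL Id, a Name column and a VARBINARY(MAX) FILESTREAM column named Content:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;

static class DocumentStore
{
    public static void Save(string connectionString, string localPath)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())   // FILESTREAM access requires an open transaction
            {
                var id = Guid.NewGuid();
                string serverPath;
                byte[] txContext;

                using (var cmd = new SqlCommand(
                    @"INSERT INTO dbo.Documents (Id, Name, Content) VALUES (@id, @name, 0x);
                      SELECT Content.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
                      FROM dbo.Documents WHERE Id = @id;", conn, tx))
                {
                    cmd.Parameters.AddWithValue("@id", id);
                    cmd.Parameters.AddWithValue("@name", Path.GetFileName(localPath));

                    using (var reader = cmd.ExecuteReader())
                    {
                        reader.Read();
                        serverPath = reader.GetString(0);   // NTFS path of the FILESTREAM data
                        txContext = (byte[])reader[1];
                    }
                }

                // stream the local file into the FILESTREAM store chunk by chunk
                using (var dest = new SqlFileStream(serverPath, txContext, FileAccess.Write))
                using (var src = File.OpenRead(localPath))
                {
                    src.CopyTo(dest);
                }

                tx.Commit();
            }
        }
    }
}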
I have an ASP.NET (C#) site in Azure and an accompanying Azure SQL database. I need to upload 1 GB+ CSV files and get them parsed and into my Azure SQL database. I can get the CSV files into an Azure blob, and I now have the URL to the blob (newGuid.txt).
Either from SQL or from the web app, how do I parse this CSV and get it inserted into my Azure SQL database? (The CSV has 36 columns, if that helps.)
I can't figure out how to reference the URL to use SqlBulkCopy. I initially thought I would use BULK INSERT, but Azure SQL doesn't allow that. I can't download the files locally and use BCP for each one.
I agree this is an old question, but as it still gets traction, here is the answer:
You can use Azure Data Factory to move data from Blob Storage (and many more data sources) to Azure SQL Database (and many other data sinks).
There is nothing simpler today to achieve your goal. Here are some tutorials:
Move data by using Copy Activity
Move data to and from Azure Blob
Move data to and from SQL Server running on-premises or on Azure VM (IaaS)
You can configure a data pipeline to run just once, at a specific time, or on a schedule. You can make it copy a single file, or use a file pattern to pick up more files, etc.
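Data Factory is the least-effort route. If you'd rather keep the work in the web app, another option is to stream the blob rather than buffer the whole 1 GB, and feed SqlBulkCopy in batches; a rough sketch, assuming a hypothetical staging table and naive comma splitting (use a real CSV parser for quoted fields):

using System;
using System.Data;
using System.IO;
using Azure.Storage.Blobs;          // current Azure Storage SDK
using Microsoft.Data.SqlClient;

class CsvBlobImport
{
    static void Main()
    {
        // hypothetical blob URL (with SAS) and a destination table with 36 matching columns
        var blob = new BlobClient(new Uri("https://myaccount.blob.core.windows.net/uploads/newGuid.txt?<sas>"));

        var batch = new DataTable();
        for (int i = 0; i < 36; i++)
            batch.Columns.Add("Col" + i, typeof(string));

        using (var conn = new SqlConnection("<azure-sql-connection-string>"))
        {
            conn.Open();
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.CsvStaging", BulkCopyTimeout = 0 })
            using (var reader = new StreamReader(blob.OpenRead()))   // streamed, not buffered in full
            {
                string line;
                bool header = true;
                while ((line = reader.ReadLine()) != null)
                {
                    if (header) { header = false; continue; }        // skip the header row
                    batch.Rows.Add(line.Split(','));                 // naive split; quoted fields need a CSV library

                    if (batch.Rows.Count == 10000)                   // flush in chunks to keep memory flat
                    {
                        bulk.WriteToServer(batch);
                        batch.Clear();
                    }
                }
                if (batch.Rows.Count > 0)
                    bulk.WriteToServer(batch);
            }
        }
    }
}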
I have a data-logging application (C#/.NET) that logs data to a SQLite database. This database is written to constantly while the application is running. It is also possible for the database to be archived and a new database created once the SQLite database reaches a predefined size.
I'm writing a web application for reporting on the data. My web setup is C#/.NET with SQL Server. Clients will be able to see their own data, gathered online from their instance of my application.
For test purposes, I've written a rough-and-dirty application which basically reads from the SQLite DB and then injects the data into SQL Server using SQL; I run the application once to populate the online SQL Server DB.
My application is written in c# and is modular so I could add a process that periodically checks the SQLite DB then transfer new data in batches to my SQL Server.
My question is: if I wanted to continually synchronise the client-side SQLite database(s) with my server while the application is logging data, what would be the best way of going about this?
Is there any technology/strategy I should be looking into employing here? Any recommended techniques?
Several options come to mind. You can add a timestamp to each table that you want to copy from and then select rows written after the last update. This is fast and will work if you archive the database and start with an empty one.
You can also journal your updates for each table into an XML string that describes the changes and store that into a new table that is treated as a queue.
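A rough sketch of the timestamp approach above: read the SQLite rows written since the last sync and bulk-copy them to SQL Server. The table and column names and the lastsync.txt bookkeeping are hypothetical:

using System;
using System.IO;
using Microsoft.Data.SqlClient;
using Microsoft.Data.Sqlite;

class DeltaSync
{
    static void Main()
    {
        // remember where the previous run got to (hypothetical bookkeeping file)
        DateTime lastSync = File.Exists("lastsync.txt")
            ? DateTime.Parse(File.ReadAllText("lastsync.txt"))
            : DateTime.MinValue;
        DateTime thisRun = DateTime.UtcNow;

        using (var sqlite = new SqliteConnection("Data Source=datalog.db"))
        {
            sqlite.Open();
            using (var cmd = sqlite.CreateCommand())
            {
                cmd.CommandText =
                    "SELECT Id, SensorId, Value, CreatedUtc FROM LogEntries WHERE CreatedUtc > $since";
                cmd.Parameters.AddWithValue("$since", lastSync);

                using (var reader = cmd.ExecuteReader())
                using (var sql = new SqlConnection("<server-connection-string>"))
                {
                    sql.Open();
                    using (var bulk = new SqlBulkCopy(sql) { DestinationTableName = "dbo.LogEntries", BatchSize = 5000 })
                        bulk.WriteToServer(reader);   // streams rows straight from SQLite into SQL Server
                }
            }
        }

        File.WriteAllText("lastsync.txt", thisRun.ToString("o"));
    }
}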
You could take a look at the Sync Framework. How complex is the schema that you're looking to sync up & is it only one-way or does data need to come back down?
As a simple solution, I'd look at exporting the data in some delimited format and then using bcp/BULK INSERT to pull it into your central server.
You might want to investigate the concept of log shipping.
There is an open-source project on GitHub, also available on NuGet, called SyncWinR. It implements the Sync Framework Toolkit to enable synchronization with WinRT or Windows Phone 8 and SQLite.
You can access the project from https://github.com/Mimetis/SyncWinRT.