Reading data from Azure Data Lake in C#

I have a requirement to save a large amount (>100GB per day) of transactional data to a Data Lake Storage Gen2 account. The data is many small JSON transactions, so I was planning to batch the transactions together into logical file groups to avoid creating lots of small files. This will allow data analysis to occur over the entire dataset.
However, I also have a separate requirement to retrieve individual transactions from a C# app. Is that possible? There doesn't seem to be an appropriate method on the REST API, and the U-SQL examples that I've found don't seem to be exposed to C# apps in any way.
Maybe I'm trying to use data lake for the wrong purpose but I don't want to save this quantity of data twice if I can help it.
Thanks!

This solution will allow T-SQL queries against all your JSON files:
Create a Data Factory pipeline to read the JSON files and output Parquet-formatted files.
Use Azure Synapse workspace on-demand (serverless) SQL to read the Parquet files with OPENROWSET pointing to the Azure Storage location of those files.
In the Synapse workspace, create a SQL login for the C# app.
Use ADO.NET to send SQL commands from C# (a sketch of the last two steps follows below).
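As a rough illustration of the last two steps, here is a minimal sketch rather than a definitive implementation: the serverless endpoint name, the SQL login, the storage path and the TransactionId column are all placeholders for whatever you actually create.

// Minimal sketch: query an individual transaction from the Parquet files
// through the Synapse serverless SQL endpoint using plain ADO.NET.
// Endpoint, credentials, storage path and column names are assumptions.
using System;
using System.Data;
using Microsoft.Data.SqlClient;

class TransactionLookup
{
    // Serverless endpoint plus the SQL login created for the C# app.
    const string ConnectionString =
        "Server=myworkspace-ondemand.sql.azuresynapse.net;Database=master;" +
        "User ID=csharp_app;Password=...;Encrypt=True;";

    public static void PrintTransaction(string transactionId)
    {
        // OPENROWSET reads the Parquet files straight from the data lake;
        // the wildcard lets one query span every file in the folder.
        const string sql = @"
            SELECT TOP 1 *
            FROM OPENROWSET(
                BULK 'https://mystorageaccount.dfs.core.windows.net/transactions/parquet/*.parquet',
                FORMAT = 'PARQUET') AS t
            WHERE t.TransactionId = @transactionId;";

        using var connection = new SqlConnection(ConnectionString);
        using var command = new SqlCommand(sql, connection);
        command.Parameters.Add("@transactionId", SqlDbType.NVarChar, 64).Value = transactionId;

        connection.Open();
        using var reader = command.ExecuteReader();
        while (reader.Read())
            for (int i = 0; i < reader.FieldCount; i++)
                Console.WriteLine($"{reader.GetName(i)}: {reader.GetValue(i)}");
    }
}

Partitioning the Parquet output (for example by date) and filtering on the partition will keep these point lookups from scanning the whole dataset.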

Related

Insert a csv file into Azure SQL from an Azure Blob

I have an asp.net (c#) site in Azure and an accompanying Azure SQL database. I need to upload 1gb+ csv files and get them parsed and into my Azure SQL database. I can get the csv files into an Azure blob. I now have the URL to the blob (newGuid.txt).
Either from SQL or from the web app, how do I parse this csv and get it inserted into my Azure SQL database? (the csv has 36 columns if that helps)
I can't figure out how to reference the URL in order to use SqlBulkCopy. I initially thought I would BULK INSERT, but Azure SQL doesn't allow that. I can't download the files locally and use BCP for each one.
I agree this is an old question, but since it still gets traction, here is the answer:
You can use Azure Data Factory to move data from Blob Storage (and many more data sources) to Azure SQL Database (and many other data sinks).
There is nothing simpler today to achieve your goal. Here are some tutorials:
Move data by using Copy Activity
Move data to and from Azure Blob
Move data to and from SQL Server running on-premises or on Azure VM (IaaS)
You can configure a data pipeline to run just once, at a specific time, or on a schedule. You can make it copy a single file, or use a file pattern to pick up more files, etc.
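If you would rather keep everything in application code instead of Data Factory, here is a rough sketch of the SqlBulkCopy route the question mentions. The account, container, destination table and the 36 untyped columns are placeholders, the naive Split would need a real CSV parser (e.g. CsvHelper) to handle quoted fields, and for 1GB+ files you would stream in batches rather than buffer one DataTable.

// Hedged sketch: stream a CSV blob into Azure SQL with SqlBulkCopy.
// All names and the 36-column layout are placeholders.
using System.Data;
using System.IO;
using Azure.Storage.Blobs;
using Microsoft.Data.SqlClient;

class BlobCsvLoader
{
    public static void Load()
    {
        var blob = new BlobClient(
            "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=...",
            "uploads", "newGuid.txt");

        var table = new DataTable();
        for (int i = 0; i < 36; i++)
            table.Columns.Add($"Column{i + 1}", typeof(string));

        using var reader = new StreamReader(blob.OpenRead());
        string line;
        while ((line = reader.ReadLine()) != null)
            table.Rows.Add(line.Split(','));   // naive: no quoted-field handling

        using var connection = new SqlConnection(
            "Server=tcp:myserver.database.windows.net;Database=mydb;User ID=...;Password=...;Encrypt=True;");
        connection.Open();

        using var bulk = new SqlBulkCopy(connection)
        {
            DestinationTableName = "dbo.StagingTable",
            BatchSize = 10000
        };
        bulk.WriteToServer(table);
    }
}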

Consolidating Multiple API Responses into a Single Large File and Writing to an Azure Blob

I am trying to send data returned from a JSON REST API into flat files on an Azure Storage Blob. Rather than storing the response from each request in a separate file, I would like to consolidate the responses into large batches of 400,000 or so (1 JSON response per line in the file), and then write a single large file (~1 GB) to an Azure blob.
Does anyone know of a good way to handle this in C# code, or perhaps there is a better way to handle it with an existing tool or framework?
I have something similar to this set up, where I log REST API data daily (around 2-3GB) and write it to an external data store.
I ended up using log4net to write the responses to a local file, which is nice because it can automatically roll the files based on something like a date. So you'll end up with one file per day.
Then you can set up a job that uses the Azure C# SDK to upload the files.
(Check out RollingFileAppender for log4net: http://logging.apache.org/log4net/release/config-examples.html - towards the bottom of the page)
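A minimal sketch of the batching idea, assuming the newer Azure.Storage.Blobs SDK: append each JSON response as one line to a local batch file, and once it reaches the target size upload it as a block blob and start a new file. The container name, the ~1GB threshold and the file naming are placeholders, and there is no locking or retry handling here.

// Hedged sketch: consolidate API responses into ~1GB line-delimited files
// and upload each completed batch to blob storage.
using System;
using System.IO;
using Azure.Storage.Blobs;

class ResponseBatcher
{
    const long MaxBatchBytes = 1L * 1024 * 1024 * 1024;   // ~1 GB per blob

    readonly BlobContainerClient _container =
        new BlobContainerClient("UseDevelopmentStorage=true", "api-batches");

    string _currentFile = Path.GetTempFileName();

    public void Append(string jsonResponse)
    {
        // One JSON response per line, as described in the question.
        File.AppendAllText(_currentFile, jsonResponse + Environment.NewLine);

        if (new FileInfo(_currentFile).Length >= MaxBatchBytes)
            Flush();
    }

    public void Flush()
    {
        _container.CreateIfNotExists();

        var blobName = $"batch-{DateTime.UtcNow:yyyyMMdd-HHmmss}.jsonl";
        using (var stream = File.OpenRead(_currentFile))
            _container.GetBlobClient(blobName).Upload(stream, overwrite: true);

        File.Delete(_currentFile);
        _currentFile = Path.GetTempFileName();
    }
}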

How to import Excel document programmatically into Azure SQL table?

Our ASP.NET web app lets users import data from Excel. We're currently hosted on our own servers, so have access to the file system both from the web server and the SQL server. The process currently works like this:
Save uploaded file to temp folder on server
Execute T-SQL to read uploaded file directly from the file system to a SQL temp table via OLEDB
Execute T-SQL to read data from temp table and process as needed (e.g. compare to existing data and update/insert as appropriate)
Step 2 looks something like this:
Select * Into #MY_TEMP_TABLE
From OpenRowSet(
    'Microsoft.ACE.OLEDB.12.0',
    'Excel 12.0; Database=PATH_TO_MY_UPLOADED_FILE; HDR=YES',
    'Select * From [MY_WORKSHEET_NAME$]')
This is very fast and straightforward compared to (for example) reading the file into a datatable in .NET using EPPlus and then inserting the data row by row.
We're in the process of making the app Azure-ready (Azure Website and SQL Database, not VM). We can upload the file to blob storage and get its contents into a byte array, but then we're stuck with the row-by-row processing approach, which is slow and brittle.
Is there a fast way to programmatically bulk upload Excel to SQL in Azure?
I'd look at one of the commercial Excel components from the likes of ComponentOne. Read the spreadsheet's contents into memory and then write it into Azure SQL Database using standard ADO.NET techniques.
This will probably be more reliable and you can utilise retry logic for transient failures (http://www.asp.net/aspnet/overview/developing-apps-with-windows-azure/building-real-world-cloud-apps-with-windows-azure/transient-fault-handling).
Note that you need to be aware of Throttling behaviour in Azure SQL Database and how it might impact your app: http://msdn.microsoft.com/en-us/library/azure/dn338079.aspx
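As a rough sketch of that read-into-memory-then-bulk-write approach, here is one way it could look using EPPlus (which the question already mentions) in place of a commercial component, and SqlBulkCopy instead of row-by-row inserts. The staging table name and the header-row assumption are placeholders.

// Hedged sketch: read the uploaded worksheet into a DataTable with EPPlus,
// then bulk-load it into a staging table with SqlBulkCopy.
using System.Data;
using System.IO;
using System.Linq;
using Microsoft.Data.SqlClient;
using OfficeOpenXml;

class ExcelImporter
{
    public static void Import(Stream uploadedXlsx, string connectionString)
    {
        // EPPlus 5+ requires a license context to be declared.
        ExcelPackage.LicenseContext = LicenseContext.NonCommercial;

        var table = new DataTable();

        using (var package = new ExcelPackage(uploadedXlsx))
        {
            var sheet = package.Workbook.Worksheets.First();
            int cols = sheet.Dimension.End.Column;
            int rows = sheet.Dimension.End.Row;

            // First row is assumed to be headers (HDR=YES in the original query).
            for (int c = 1; c <= cols; c++)
                table.Columns.Add(sheet.Cells[1, c].Text);

            for (int r = 2; r <= rows; r++)
                table.Rows.Add(Enumerable.Range(1, cols)
                    .Select(c => (object)sheet.Cells[r, c].Text).ToArray());
        }

        using var connection = new SqlConnection(connectionString);
        connection.Open();

        // Bulk-load into a staging table, then process/merge with T-SQL as before.
        using var bulk = new SqlBulkCopy(connection) { DestinationTableName = "dbo.ImportStaging" };
        bulk.WriteToServer(table);
    }
}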

Is it possible to have a Windows Phone 7.x application update variables from an online source?

I am working on a currency converter program, and I am attempting to allow the program to update all of the conversion rates from an online source, so that the currency conversion is always up to date. Can this be done, and can anyone give me an idea on a better way of phrasing what I am trying to do? Thanks
You should store the data in a file in isolated storage or in a SQL CE database. You can then check if there are new versions of data online periodically or on application start and then update the data inside.
To check if there is new data online, you need to either reuse an existing service or provide your own.
For your WP7 app, you need fresh conversion-rate data. Since a device may not always be connected to the Internet, you would store your data locally in one of the following ways.
A few options:
Storing data in a text file or XML in isolated storage
Using a database, such as the built-in SQL CE, or alternatives like SQLite or Sterling
To bring in fresh data (i.e. conversion rates in your case), you need to connect to a web service over HTTP to fetch it. The service may be one you host yourself or a third-party one.
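A rough WP7-era sketch of that flow: fetch the latest rates over HTTP and cache them in isolated storage so the converter still works offline. The service URL, the payload handling and the setting keys are placeholders.

// Hedged sketch for WP7/Silverlight: refresh rates from a web service and
// cache the raw payload in IsolatedStorageSettings for offline use.
using System;
using System.IO.IsolatedStorage;
using System.Net;

public class RateCache
{
    public void Refresh()
    {
        var client = new WebClient();
        client.DownloadStringCompleted += (s, e) =>
        {
            if (e.Error != null) return;            // offline: keep the cached copy

            var settings = IsolatedStorageSettings.ApplicationSettings;
            settings["rates"] = e.Result;           // raw JSON/XML payload from the service
            settings["ratesUpdatedUtc"] = DateTime.UtcNow;
            settings.Save();
        };
        client.DownloadStringAsync(new Uri("http://example.com/api/latest-rates"));
    }

    public string GetCachedRates()
    {
        var settings = IsolatedStorageSettings.ApplicationSettings;
        return settings.Contains("rates") ? (string)settings["rates"] : null;
    }
}

You would call Refresh() on application start (and perhaps on a timer), and read GetCachedRates() whenever a conversion is performed.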

Best way to synchronise client database with server database

I have a datalogging application (c#/.net) that logs data to a SQLite database. This database is written to constantly while the application is running. It is also possible for the database to be archived and a new database created once the size of the SQLite database reaches a predefined size.
I'm writing a web application for reporting on the data. My web setup is c#/.Net with a SQL Server. Clients will be able to see their own data gathered online from their instance of my application.
For test purposes, to have data to test with, I've written a rough-and-ready application which basically reads from the SQLite DB and then injects the data into the SQL Server using SQL; I run the application once to populate the online SQL Server DB.
My application is written in c# and is modular so I could add a process that periodically checks the SQLite DB then transfer new data in batches to my SQL Server.
My question is, if I wanted to continually synchronise the client-side SQLite database(s) with my server as the application is datalogging, what would be the best way of going about this?
Is there any technology/strategy I should be looking into employing here? Any recommended techniques?
Several options come to mind. You can add a timestamp to each table that you want to copy from and then select rows written after the last update. This is fast and will work if you archive the database and start with an empty one.
You can also journal your updates for each table into an XML string that describes the changes and store that into a new table that is treated as a queue.
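A minimal sketch of the timestamp option, assuming Microsoft.Data.Sqlite on the client side; the LogEntries table, the LoggedAtUtc/Sensor/Value columns and the row-at-a-time inserts are placeholders (you would batch or bulk-load for real volumes).

// Hedged sketch: pull only rows logged since the last successful sync from
// the local SQLite file and push them to the central SQL Server.
using System;
using Microsoft.Data.Sqlite;
using Microsoft.Data.SqlClient;

class IncrementalSync
{
    public static DateTime Push(string sqlitePath, string serverConnectionString, DateTime lastSyncUtc)
    {
        DateTime newestSeen = lastSyncUtc;

        using var source = new SqliteConnection($"Data Source={sqlitePath}");
        using var target = new SqlConnection(serverConnectionString);
        source.Open();
        target.Open();

        var select = source.CreateCommand();
        select.CommandText =
            "SELECT LoggedAtUtc, Sensor, Value FROM LogEntries WHERE LoggedAtUtc > $since";
        select.Parameters.AddWithValue("$since", lastSyncUtc);

        using var reader = select.ExecuteReader();
        while (reader.Read())
        {
            var loggedAt = reader.GetDateTime(0);

            // Row-at-a-time for clarity only; batch these for real data volumes.
            using var insert = new SqlCommand(
                "INSERT INTO dbo.LogEntries (LoggedAtUtc, Sensor, Value) VALUES (@t, @s, @v)", target);
            insert.Parameters.AddWithValue("@t", loggedAt);
            insert.Parameters.AddWithValue("@s", reader.GetString(1));
            insert.Parameters.AddWithValue("@v", reader.GetDouble(2));
            insert.ExecuteNonQuery();

            if (loggedAt > newestSeen) newestSeen = loggedAt;
        }

        return newestSeen;   // persist this as the new high-water mark for the next run
    }
}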
You could take a look at the Sync Framework. How complex is the schema that you're looking to sync up & is it only one-way or does data need to come back down?
As a simple solution I'd look at exporting data in some delimited format and then using bcp/BULK INSERT to pull it into your central server.
You might want to investigate the concept of log shipping.
There is an open-source project on GitHub, also available on NuGet, called SyncWinRT. It implements the Sync Framework Toolkit to enable synchronization of WinRT or Windows Phone 8 apps with SQLite.
You can access the project from https://github.com/Mimetis/SyncWinRT.
