Reading from SQL Server - need to read from CSV - c#

At the moment, I source my data from a SQL Server (2008) database. The current method is to use a DataTable, which is then passed around and used.
if (parameters != null)
{
    SqlDataAdapter _dataAdapter = new SqlDataAdapter(SqlQuery, CreateFORSConnection());
    foreach (var param in parameters)
    {
        _dataAdapter.SelectCommand.Parameters.AddWithValue(param.Name, param.Value);
    }
    DataTable ExtractedData = new DataTable(TableName);
    _dataAdapter.Fill(ExtractedData);
    return ExtractedData;
}
return null;
But now, the user has said that we can also get data from txt files, which have the same structure as the tables in SQL Server. So, if I have a table called 'Customer', then I have a CSV file called Customer with the same column structure. The first line in the CSV holds the column names, which match my tables.
Would it be possible to read the txt file into a data table, and then run a SELECT on that data table somehow? Most of my queries are single table queries:
SELECT * FROM Table WHERE Code = 111
There is, however, ONE case where I do a join. That may be a bit more tricky, but I can make a plan. If I can get the txt files into data tables first, I can work with that.
Using the above code, can I not just change the connection string to read from a CSV instead of SQL Server?

First, you'll need to read the CSV data into a DataTable. There are many CSV parsers out there, but since you prefer using ADO.NET, you can use the OleDB client. See the following article.
http://www.switchonthecode.com/tutorials/csharp-tutorial-using-the-built-in-oledb-csv-parser
Joining is a bit harder, since the two sets of data live in different places. But what you can do is get two DataTables (one from each source), then use LINQ to join them.
Inner join of DataTables in C#
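The LINQ join mentioned above might look roughly like the sketch below. It assumes both DataTables are already filled (one from SQL Server, one from a file). The Orders table and the CustomerCode, Name and Total columns are hypothetical names chosen for illustration; only Code comes from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableJoinSketch
{
    // Inner-joins customers to orders on the customer code and
    // returns one (Name, Total) pair per matching order row.
    // Column names other than "Code" are hypothetical.
    public static List<Tuple<string, decimal>> JoinCustomersToOrders(
        DataTable customers, DataTable orders)
    {
        var query = from c in customers.AsEnumerable()
                    join o in orders.AsEnumerable()
                        on c.Field<int>("Code") equals o.Field<int>("CustomerCode")
                    select Tuple.Create(c.Field<string>("Name"), o.Field<decimal>("Total"));

        return query.ToList();
    }
}
```

Rows without a partner on the other side are dropped, as in a SQL INNER JOIN.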

You could read the text file into a List<string> (if there is just 1 column per file), and then use LINQ to query the list. For example:
var result = from entry in myList
             where entry == "111"
             select entry;
Of course, this example is kind of useless since all you get back is the same string you are searching for. But if there are multiple columns in the file, and they match the columns in your DataTable, why not read the file into the data table, and then use LINQ to query the table?
Here is a simple tutorial about how to use LINQ to query a DataTable:
http://blogs.msdn.com/b/adonet/archive/2007/01/26/querying-datasets-introduction-to-linq-to-dataset.aspx
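As a hand-rolled alternative to a CSV parser, reading the file into a DataTable and then querying it with LINQ could be sketched as follows. This is a minimal sketch under stated assumptions: the file is comma-separated, the first line holds the column names, no field is quoted or contains a comma, and every column is loaded as a string. The class and method names are made up for this example.

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;

public static class CsvTableSketch
{
    // Loads a comma-separated file whose first line holds the column names.
    // Assumption: no quoted fields or embedded commas; all values stay strings.
    public static DataTable Load(string path, string tableName)
    {
        var table = new DataTable(tableName);
        using (var reader = new StreamReader(path))
        {
            string header = reader.ReadLine();
            if (header == null) return table; // empty file: no columns, no rows

            foreach (string name in header.Split(','))
                table.Columns.Add(name.Trim(), typeof(string));

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0) continue; // skip blank lines
                // A line with more fields than columns makes Rows.Add throw.
                table.Rows.Add(line.Split(','));
            }
        }
        return table;
    }

    // Rough equivalent of: SELECT * FROM Table WHERE Code = <code>
    public static DataRow[] FindByCode(DataTable table, string code)
    {
        return table.AsEnumerable()
                    .Where(row => row.Field<string>("Code") == code)
                    .ToArray();
    }
}
```

Because the values are kept as strings, the comparison is a string comparison; converting columns to their real types would be a further step.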

Related

Save large result set to multiple XML files

I have a stored procedure that returns a large result set (nearly 20 million records). I need to save this result to multiple XML files. I am currently using ADO.NET to fill a DataSet, but it quickly throws System.OutOfMemoryException. What other methods can I use to accomplish this?
Are you using SQL Server?
If so, there is a SQL clause (FOR XML) that automatically converts the result of a query into an XML structure, which you then receive as a string in the application.
Options:
Split the string into several pieces and save them to files (in the application).
Modify the stored procedure to split the result into several XML objects, return them as separate strings, one per row (1 row => 1 object), and save each of them to a file.
Write a new stored procedure that calls the original one, splits the result into X XML objects, and returns X XML strings that you just have to save in the application.
Not using SQL Server?
Do the XML formatting in the stored procedure, or write a new one that does it.
Anyway, I think it will be easier to do the XML formatting server side.
Assuming you are using SQL Server - you can use paging in your stored procedure. ROW_NUMBER is an option. SQL Server 2012 and above support OFFSET and FETCH.
Also, how many DataTables are you filling? There are row limits for DataTables.
The maximum number of rows that a DataTable can store is 16,777,216
https://msdn.microsoft.com/en-us/library/system.data.datatable.aspx
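A different route from the paging suggested above is to avoid the DataSet altogether and stream the rows. The sketch below takes any IDataReader (for SQL Server, that would be the SqlDataReader returned by SqlCommand.ExecuteReader) and writes a fixed number of rows per file, so only one row is held in memory at a time. The file naming scheme, the rows/row element names and the assumption that column names are valid XML element names are all choices made for this example.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.IO;
using System.Xml;

public static class XmlChunkWriterSketch
{
    // Streams rows from the reader into files part1.xml, part2.xml, ...
    // with at most rowsPerFile rows each. Returns the number of files written.
    public static int Write(IDataReader reader, string directory, int rowsPerFile)
    {
        if (rowsPerFile <= 0) throw new ArgumentOutOfRangeException("rowsPerFile");

        int fileCount = 0;
        int rowsInFile = 0;
        XmlWriter writer = null;
        try
        {
            while (reader.Read())
            {
                if (writer == null)
                {
                    // Start the next file lazily, only when there is a row for it.
                    fileCount++;
                    writer = XmlWriter.Create(Path.Combine(directory, "part" + fileCount + ".xml"));
                    writer.WriteStartElement("rows");
                }

                writer.WriteStartElement("row");
                for (int i = 0; i < reader.FieldCount; i++)
                {
                    // Assumption: column names are valid XML element names.
                    string text = Convert.ToString(reader.GetValue(i), CultureInfo.InvariantCulture);
                    writer.WriteElementString(reader.GetName(i), text);
                }
                writer.WriteEndElement();

                if (++rowsInFile == rowsPerFile)
                {
                    writer.WriteEndElement(); // close <rows>
                    writer.Dispose();
                    writer = null;
                    rowsInFile = 0;
                }
            }

            if (writer != null) writer.WriteEndElement(); // close the last, partial file
        }
        finally
        {
            if (writer != null) writer.Dispose();
        }
        return fileCount;
    }
}
```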

Bulk insert c# datatable to sql server - Efficient check for duplicate

I wish to import many files into a database (with custom business logic preventing simple SSIS package use).
High level description of problem:
Pull existing sql data into DataTable (~1M rows)
Read excel file into 2d array in one chunk
validate fields row by row (custom logic)
Check for duplicate in existing DataTable
Insert into DataRow
Bulk insert DataTable into SQL table
Problem with my approach:
Each row must be checked for duplicates. I thought a call to the remote server to leverage SQL would be too slow, so I opted for LINQ. The query is simple, but the size of the dataset causes it to crawl (90% of execution time is spent in this query checking the fields).
var existingRows = from row in recordDataTable.AsEnumerable()
where row.Field<int>("Entry") == entry
&& row.Field<string>("Device") == dev
select row;
bool update = existingRows.Count() > 0;
What other ways might there be to more efficiently check for duplicates?
With LINQ, this query basically loops over all of your ~1M records every time you check for a duplicate.
You would be better off putting the data into a dictionary so that your lookups run against an in-memory index.
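The dictionary idea could be sketched like this, assuming the duplicate key is the (Entry, Device) pair from the question's query. The index is built once in a single pass over the existing rows; each later check is then a hash lookup instead of a scan. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DuplicateIndexSketch
{
    // Builds an in-memory index keyed on (Entry, Device), one pass over the table.
    public static Dictionary<Tuple<int, string>, DataRow> BuildIndex(DataTable existing)
    {
        var index = new Dictionary<Tuple<int, string>, DataRow>();
        foreach (DataRow row in existing.Rows)
        {
            var key = Tuple.Create(row.Field<int>("Entry"), row.Field<string>("Device"));
            index[key] = row; // if the table already holds duplicates, the last one wins
        }
        return index;
    }

    // Hash lookup that replaces the per-row LINQ query.
    public static bool IsDuplicate(Dictionary<Tuple<int, string>, DataRow> index, int entry, string dev)
    {
        return index.ContainsKey(Tuple.Create(entry, dev));
    }
}
```

Rows accepted during the import would need to be added to the index as well, so that duplicates within the imported files are also caught.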

sql query to fill dataset with multiple datatables based on specific column

I have a SQL Server table called tSongList that contains the following information:
colAlbumID, colSongName, colAlbumTrackNumber, colRequestedCount, colPlayPriority
The purpose of this table is to help a DJ keep a list of which songs the DJ should play and which album they are from. I have a C# class that will take a list of songs from a specific albumID and calculate the colPlayPriority based on the colRequestedCount. I have designed this class to take a DataTable containing the columns above and compute the necessary information.
So my question is if I want to use SQL to select all the rows from the tSongList, how do I get the SQL result into multiple DataTables grouped by colAlbumID? In other words, I want a DataTable for each Album that contains its song information.
I know that I can use a SqlDataAdapter to fill a DataSet and since a DataSet can contain multiple DataTables, is there a way to construct a SQL query to return a DataSet containing the DataTables grouped by albumID?
Also if this can't be done, should I just select everything into one DataTable and use the Select function to get a DataRow array instead?
One option is to return a single DataTable and use LINQ to create your grouping.
var albumGroups = from row in dtAlbums.AsEnumerable()
                  group row by row.Field<int>("colAlbumID") into g
                  // g.Key is the album ID; CopyToDataTable() builds one DataTable per album
                  select new { AlbumID = g.Key, Songs = g.CopyToDataTable() };
I think the example on MSDN will show you how to do this. You will make life much easier for yourself if you properly normalise your data first. You can create a hierarchical DataSet with the necessary relationships in place making it trivial to get all songs for an album.
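If the goal is a DataSet that holds one DataTable per album, the grouping approach can be taken one step further, as in this rough sketch. It assumes colAlbumID is stored as an int, and the "Album" + ID table names are invented for the example.

```csharp
using System;
using System.Data;
using System.Linq;

public static class AlbumSplitSketch
{
    // Splits a single song table into one DataTable per album ID.
    // Assumption: colAlbumID is an int column.
    public static DataSet SplitByAlbum(DataTable songs)
    {
        var result = new DataSet();
        foreach (var album in songs.AsEnumerable().GroupBy(r => r.Field<int>("colAlbumID")))
        {
            // CopyToDataTable copies the group's rows into a new table
            // with the same columns as the source table.
            DataTable albumTable = album.CopyToDataTable();
            albumTable.TableName = "Album" + album.Key; // hypothetical naming scheme
            result.Tables.Add(albumTable);
        }
        return result;
    }
}
```

Each table in the returned DataSet can then be passed to the existing priority class unchanged.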

Quickest way to run SQL on a multidimensional array

I want to take a table as represented by a multidimensional string array (column names in another array) and use a SQL SELECT statement to get a subset of rows.
The Catch:
the table is input by the user (different table each time)
the SQL is also input by the user
Do I need to:
Create a table in SQL
Populate the table in SQL
Query the table
or is there a simpler solution? E.g. converting the multidimensional array to a DataTable and then running the SQL on that object?
I think you would be able to use DataTable for this. It's normally used to store data retrieved from a database but you can populate it manually. The best part is the DataTable.Select() method, which allows you to write just the WHERE clause of a query and it will return the matching rows.
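A minimal sketch of that idea: populate a DataTable from the array by hand, then pass the user's filter to DataTable.Select(). It assumes every column is a string, and note that Select() understands the DataColumn expression syntax (roughly the WHERE clause), not full SQL statements. The class and method names are hypothetical.

```csharp
using System;
using System.Data;

public static class ArrayTableSketch
{
    // Builds a DataTable from column names plus a [row, column] string array.
    // Assumption: all columns are treated as strings.
    public static DataTable FromArray(string[] columnNames, string[,] cells)
    {
        var table = new DataTable();
        foreach (string name in columnNames)
            table.Columns.Add(name, typeof(string));

        for (int r = 0; r < cells.GetLength(0); r++)
        {
            DataRow row = table.NewRow();
            for (int c = 0; c < cells.GetLength(1); c++)
                row[c] = cells[r, c];
            table.Rows.Add(row);
        }
        return table;
    }

    // whereClause is only the filter part, e.g. "Code = '111'".
    public static DataRow[] Filter(DataTable table, string whereClause)
    {
        return table.Select(whereClause);
    }
}
```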
You can create your own expression tree representing the query the user enters. This is how LINQ works under the hood. It would help if you could give an example of what you're trying to achieve, and say what kind of application you are writing (a C# web application, for example).
For instance, if you are letting your users enter new rows into a table in some kind of GUI, you could do this in a data grid and enable column filtering to achieve the result mentioned above.
In a web app, you could have an input box above each column where users can enter a value to filter that column on.

Join multiple DataRows into a single DataRow

I am writing this in C# using .NET 3.5. I have a System.Data.DataSet object with a single DataTable that uses the following schema:
Id : uint
AddressA: string
AddressB: string
Bytes : uint
When I run my application, let's say the DataTable gets filled with the following:
1 192.168.0.1 192.168.0.10 300
2 192.168.0.1 192.168.0.20 400
3 192.168.0.1 192.168.0.30 300
4 10.152.0.13 167.10.2.187 80
I'd like to be able to query this DataTable where AddressA is unique and the Bytes column is summed together (I'm not sure I'm saying that correctly). In essence, I'd like to get the following result:
1 192.168.0.1 1000
2 10.152.0.13 80
I ultimately want this result in a DataTable that can be bound to a DataGrid, and I need to update/regenerate this result every 5 seconds or so.
How do I do this? DataTable.Select() method? If so, what does the query look like? Is there an alternate/better way to achieve my goal?
EDIT: I do not have a database. I'm simply using an in-memory DataSet to store the data, so a pure SQL solution won't work here. I'm trying to figure out how to do it within the DataSet itself.
For readability (and because I love it) I would try to use LINQ:
var aggregatedAddresses = from DataRow row in dt.Rows
                          group row by row["AddressA"] into g
                          select new {
                              Address = g.Key,
                              Byte = g.Sum(row => (uint)row["Bytes"])
                          };
int i = 1;
foreach(var row in aggregatedAddresses)
{
    result.Rows.Add(i++, row.Address, row.Byte);
}
If a performance issue is discovered with the LINQ solution, I would go with a manual solution, summing up the rows in a loop over the original table and inserting them into the result table.
You can also bind the aggregatedAddresses directly to the grid instead of putting it into a DataTable.
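For reference, a self-contained variant of the LINQ approach, including the result table that the snippet above leaves undefined, could look like the sketch below. It assumes the Bytes column is stored as uint, as in the question's schema, and sums into a long to leave headroom. The result table's name and column types are choices made for this example.

```csharp
using System;
using System.Data;
using System.Linq;

public static class AddressTotalsSketch
{
    // Groups rows by AddressA and sums Bytes per group into a new DataTable.
    // Assumption: the source Bytes column has type uint.
    public static DataTable SumBytesByAddress(DataTable source)
    {
        var result = new DataTable("Totals");
        result.Columns.Add("Id", typeof(int));
        result.Columns.Add("AddressA", typeof(string));
        result.Columns.Add("Bytes", typeof(long));

        var groups = source.AsEnumerable()
            .GroupBy(row => row.Field<string>("AddressA"))
            .Select(g => new
            {
                Address = g.Key,
                Total = g.Sum(row => (long)row.Field<uint>("Bytes"))
            });

        int id = 1;
        foreach (var g in groups)
            result.Rows.Add(id++, g.Address, g.Total);

        return result;
    }
}
```

Calling this on a timer and rebinding the grid to the returned table would cover the periodic refresh described in the question.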
The most efficient solution would be to do the sum in SQL directly:
select AddressA, SUM(bytes) from ... group by AddressA
I agree with Steven as well that doing this on the server side is the best option. If you are using .NET 3.5 though, you don't have to go through what Rune suggests. Rather, use the extension methods for datasets to help query and sum the values.
Then, you can map it easily to an anonymous type which you can set as the data source for your grid (assuming you don't allow edits to this, which I don't see how you can, since you are aggregating the data).
I agree with Steven that the best way to do this is to do it in the database. But if that isn't an option you can try the following:
Make a new datatable and add the columns you need manually using DataTable.Columns.Add(name, datatype)
Step through the first data table's Rows collection and, for each row, create a new row in your new data table using DataTable.NewRow()
Copy the values of the columns found in the first table into the new row
Find the matching row in the other data table using Select() and copy out the final value into the new data row
Add the row to your new data table using DataTable.Rows.Add(newRow)
This will give you a new data table containing the combined data from the two tables. It won't be very fast, but unless you have huge amounts of data it will probably be fast enough. But try to avoid doing a LIKE-query in the Select, for that one is slow.
One optimization is possible if both tables contain rows with identical primary keys. You could then sort both tables and step through them, fetching both data rows by their array index. This would rid you of the Select call.
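The manual steps listed above might be sketched as follows. The sketch assumes the key column is an integer column present in both tables and that the value column exists only in the second table. The method and parameter names are hypothetical.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class ManualCombineSketch
{
    // Copies every row of 'first' and appends 'valueColumn' from the matching row of 'second'.
    // Assumptions: keyColumn is an integer column in both tables,
    // and valueColumn exists only in 'second'.
    public static DataTable Combine(DataTable first, DataTable second,
                                    string keyColumn, string valueColumn)
    {
        var combined = new DataTable("Combined");
        foreach (DataColumn col in first.Columns)
            combined.Columns.Add(col.ColumnName, col.DataType);
        combined.Columns.Add(valueColumn, second.Columns[valueColumn].DataType);

        foreach (DataRow source in first.Rows)
        {
            DataRow target = combined.NewRow();
            foreach (DataColumn col in first.Columns)
                target[col.ColumnName] = source[col];

            // Exact-match filter on the key, not a LIKE query.
            string filter = string.Format(CultureInfo.InvariantCulture,
                                          "[{0}] = {1}", keyColumn, source[keyColumn]);
            DataRow[] matches = second.Select(filter);
            if (matches.Length > 0)
                target[valueColumn] = matches[0][valueColumn]; // otherwise stays DBNull

            combined.Rows.Add(target);
        }
        return combined;
    }
}
```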
