I have a dataset fetched from an ODBC data source (MySQL), and I need to temporarily put it into a SQL Server DB to be fetched by another process (and then transformed further).
Instead of creating a nicely formatted database table I'd rather have one field of type text and whack the stuff there.
This is only a data migration exercise, a one-off, so I don't care about elegance, performance, or any other aspects of it.
All I want is to be able to de-serialize the "text-blob" (or binary) back into anything resembling a dataset. A Dictionary object would do the trick too.
Any quick fixes for me? :)
Use DataSet.WriteXml to write out your "text-blob" then use DataSet.ReadXml later when you want to translate the "text-blob" back into a DataSet to perform whatever subsequent manipulations you want to do.
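A minimal sketch of that round trip, assuming the blob lives in a TEXT/NVARCHAR(MAX) column (the helper names here are just illustrative):

    using System.Data;
    using System.IO;

    // Serialize the DataSet (schema + rows) into one string you can dump into the text column.
    static string ToBlob(DataSet ds)
    {
        using (var writer = new StringWriter())
        {
            // WriteSchema keeps the column types so ReadXml can rebuild them later.
            ds.WriteXml(writer, XmlWriteMode.WriteSchema);
            return writer.ToString();
        }
    }

    // Deserialize the stored text back into a DataSet.
    static DataSet FromBlob(string blob)
    {
        var ds = new DataSet();
        using (var reader = new StringReader(blob))
        {
            ds.ReadXml(reader, XmlReadMode.ReadSchema);
        }
        return ds;
    }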
Why don't you do everything on a SSIS package, extract from MySQL, transform as you wish and load wherever you need it?
I am creating an application using Visual Studio that, of course, uses a database. I don't get why we create a DataSet: I tried some queries without creating a DataSet and they worked perfectly. The queries I will be using are UPDATE, DELETE, INSERT and SELECT (simple and complex ones).
So should I use a DataSet, and why?
Note: the database is a big one, and as I understand it, creating a DataSet will create a copy of the database, so will this cause a storage (memory) problem?
You are looking for ways to work without a DataSet.
For INSERT, UPDATE and DELETE you can use System.Data.Sql and System.Data.SqlClient: open your own SqlConnection and proceed (https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection?view=dotnet-plat-ext-3.1). For reads (SELECT), however, it is practical to use a DataSet. At this level (below Entity Framework!) a DataSet can be filled with any data you want via a DataAdapter's Fill() method. The only class I know of that can read without a DataSet is the DataReader; see https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/retrieving-data-using-a-datareader
NOTE: as mentioned in the comments above, these low-level ways of accessing a database are inherently unsafe, because you have to build and pass explicit SQL to ADO.NET, which leaves you exposed to malicious user input. This can be avoided with a parameterized DataAdapter + DataSet, or with Entity Framework; either way you can avoid SQL injection.
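A rough sketch of both styles, with a made-up connection string and table (the parameter is what keeps user input out of the SQL text):

    using System.Data;
    using System.Data.SqlClient;

    string connStr = "Server=.;Database=MyDb;Integrated Security=true;"; // placeholder
    string userInput = "Alice"; // stands in for whatever the user typed

    // Parameterized INSERT via SqlCommand (same pattern for UPDATE/DELETE).
    using (var conn = new SqlConnection(connStr))
    using (var cmd = new SqlCommand("INSERT INTO Customers (Name) VALUES (@name)", conn))
    {
        cmd.Parameters.AddWithValue("@name", userInput);
        conn.Open();
        cmd.ExecuteNonQuery();
    }

    // Parameterized SELECT filled into a DataSet via a SqlDataAdapter.
    var ds = new DataSet();
    using (var conn = new SqlConnection(connStr))
    using (var adapter = new SqlDataAdapter("SELECT Id, Name FROM Customers WHERE Name = @name", conn))
    {
        adapter.SelectCommand.Parameters.AddWithValue("@name", userInput);
        adapter.Fill(ds, "Customers"); // Fill opens and closes the connection itself
    }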
If you have a big database you should consider using an ORM like Entity Framework. If you stick with classic ADO.NET, a DataSet helps you handle your data. Inside it you will have a DataTable that represents your data, so whenever you run a SELECT, the matching rows of your database are stored temporarily in your DataSet/DataTable. That said, you don't need your whole database in there, only the rows you need to work on.
Every time you add a new record to the DataTable, it is flagged as Added. When you modify or delete a row, it is flagged as Modified or Deleted accordingly. So after all your handling you can save your DataSet back to the database.
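A rough sketch of that fill/modify/update cycle (the table, columns and connStr are made-up placeholders):

    using System.Data;
    using System.Data.SqlClient;

    var adapter = new SqlDataAdapter("SELECT Id, Name FROM Customers", connStr);
    var builder = new SqlCommandBuilder(adapter); // generates the INSERT/UPDATE/DELETE commands

    var table = new DataTable();
    adapter.Fill(table);                    // rows arrive flagged Unchanged

    table.Rows[0]["Name"] = "New name";     // row is now flagged Modified
    var row = table.NewRow();
    row["Name"] = "Brand new customer";
    table.Rows.Add(row);                    // row is flagged Added

    adapter.Update(table);                  // writes only the changed rows back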
I need some help. I've been back and forth on which direction I should go, and there are some options, none of which I like or can use.
I wrote a generic data dump tool that pulls data from a specified server and dumps it to a comma-delimited file. Its configuration, and the query to run, come from a SQL table created specifically for this tool. However, I have a new requirement: some data dumps need data pulled from different servers and merged together, but I don't want to alter the tool for this "custom" type of pull/dump. I'm trying to keep it generic so I'm not constantly coding on it. My thought is to create a lib that my reporting tool can use for each of these custom pulls, where the data returned by the lib is a SqlDataReader object. However, since this lib will have to pull from different servers and merge the data, is it possible for the lib to create a SqlDataReader of its own with this pulled data and return it to the data dump tool, or am I thinking too much into this?
I don't want to return an array because that's not how the tool loops through data now, mainly because some of my existing data dumps are millions of rows, so my existing loop is a DataReader loop to keep memory down. However, the libs may create a two-dimensional array, as long as it can be converted to a SqlDataReader object before returning. This way I don't have to change much of the looping within my application.
Hope that all makes sense. I have it in my head bouncing around so I ended up writing this like 10 times.
Edit: Keep in mind, each record will be scattered across 3 servers and will have to be merged. These are three different processes that work together, but they have their own servers. ID from server 1 will relate to Server1ID on Server2, for example.
All the ADO.NET data access classes implement common interfaces so you can return an IDataReader instead of SqlDataReader.
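For instance, the lib could merge everything into a DataTable and hand back its CreateDataReader(), which is an IDataReader the existing loop can consume (the method and column names here are hypothetical):

    using System.Data;

    public IDataReader GetMergedData()
    {
        var merged = new DataTable();
        merged.Columns.Add("Id", typeof(int));
        merged.Columns.Add("Value", typeof(string));

        // ... pull from the different servers and add the merged rows here ...
        merged.Rows.Add(1, "example");

        // DataTableReader implements IDataReader, so the dump tool's
        // while (reader.Read()) loop keeps working unchanged.
        return merged.CreateDataReader();
    }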
I've come up with a solution. I'll write the object, which will be dynamic based on the "custom" report to be generated. This object will pull data from the first server and insert it into a local table in SQL Server. It will then go to the next server, pull data based on the first pull, and update the same local table. Finally it will hit the last server to pull the remaining data, which also gets merged into my local table. Once all that is merged correctly, I'll SELECT * back as the DataReader the original caller (the data dump exe) needs. It seems to be the only real way to make this work without modifying the original exe for each custom data pull.
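A rough sketch of that staging-table approach (the staging table, queries and connection strings are all placeholders):

    using System.Data;
    using System.Data.SqlClient;

    public IDataReader BuildCustomReport(string server1ConnStr, string localConnStr)
    {
        // 1. Pull from server 1 and bulk-load into a local staging table.
        using (var source = new SqlConnection(server1ConnStr))
        using (var local = new SqlConnection(localConnStr))
        {
            source.Open();
            local.Open();
            using (var cmd = new SqlCommand("SELECT Id, Amount FROM Orders", source))
            using (var reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(local) { DestinationTableName = "dbo.Staging" })
            {
                bulk.WriteToServer(reader);
            }
        }

        // 2. ... repeat for servers 2 and 3, updating the staging rows by ID ...

        // 3. Hand the merged result back as the DataReader the dump exe expects.
        var conn = new SqlConnection(localConnStr);
        conn.Open();
        var select = new SqlCommand("SELECT * FROM dbo.Staging", conn);
        // CloseConnection ties the connection's lifetime to the reader's.
        return select.ExecuteReader(CommandBehavior.CloseConnection);
    }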
Thanks for everyone's input.
I have a DataGrid in my app. This DataGrid fetches some data from a MySQL DB. To tell the truth, the rows are fetched from a List<>, because I'm not able to fetch the data from a DataSet (and I don't know why).
Anyway, when I update a field in my app I want these changes to be reflected in the list and therefore in the table in my DB.
Any idea?
Also, is it a good option to save table data in a List<>, or is it better to save it in a DataSet/DataTable?
Thank you.
I'll answer your last question first. In general, you want to use DataSet/DataTable, because they have many methods and properties related to database functionality. A List<> may get you where you're going for now, but extending it in the future will be a nightmare.
I would focus on getting a DataSet properly filled from your Database (see: http://dev.mysql.com/usingmysql/dotnet/ if you don't know where to start), and then simply setting your DataGrid to use that DataSet as its DataSource. You can then use things like LINQ to SQL or Entity Framework to better model that DataSet in code. Assuming you have the proper ODBC drivers installed, it should be as simple as creating the correct Connection String and doing everything normally from there.
You can definitely do things the way you're doing now, but you will have to manually send any SQL update statements instead of relying on the automatic ways of doing it. But I would seriously consider reworking it to use proper .NET data objects.
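Assuming the MySQL Connector/NET provider (MySql.Data), a minimal sketch of filling a DataSet and binding it to the grid could look like this (connection string, table and control names are placeholders; for a WinForms DataGridView you would set DataSource instead of ItemsSource):

    using System.Data;
    using MySql.Data.MySqlClient;

    var connStr = "Server=localhost;Database=mydb;Uid=user;Pwd=secret;"; // placeholder
    var adapter = new MySqlDataAdapter("SELECT * FROM products", connStr);
    var builder = new MySqlCommandBuilder(adapter); // lets adapter.Update() write changes back

    var ds = new DataSet();
    adapter.Fill(ds, "products");

    // Bind the table to the grid (WPF DataGrid shown here).
    myDataGrid.ItemsSource = ds.Tables["products"].DefaultView;

    // After the user edits rows in the grid, push the changes back to MySQL.
    adapter.Update(ds, "products");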
I want to setup a table that can:
Save the data on the user's machine
Reference & present the data in the GUI
Capable of adding rows dynamically during runtime
What's the best way to go about this?
DataGridView or TableLayoutPanel or...? I'm having trouble with SQL Server CE, as I was going to connect it to the DataGridView, but I'm very new to this kind of work and wondered whether it was even necessary to use SQL at all.
SQL CE should work OK, but no: you don't have to use SQL. You could just populate a DataSet and save/load that to a file on disk. Or you could use any other serializable object tree and a serializer such as XmlSerializer etc. All of these should work fine with standard bindings like DataGridView. Note, though, that databases get you more granular control over the data. It all depends on whether that is valuable, or if a single flat file will suffice.
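A minimal sketch of the DataSet-on-disk option, assuming a WinForms DataGridView named dataGridView1 (the file path and columns are placeholders):

    using System.Data;

    var table = new DataTable("Items");
    table.Columns.Add("Name", typeof(string));
    table.Columns.Add("Quantity", typeof(int));

    // Bind to the grid; rows added here (or typed into the grid) show up immediately.
    dataGridView1.DataSource = table;
    table.Rows.Add("Widget", 3);

    // Save to the user's machine...
    table.WriteXml(@"C:\Temp\items.xml", XmlWriteMode.WriteSchema);

    // ...and load it back the next time the app starts.
    var loaded = new DataTable();
    loaded.ReadXml(@"C:\Temp\items.xml");
    dataGridView1.DataSource = loaded;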
Say I have a few tables in the MSSQL database, each with about 5-10 attributes. There are some simple associations between the tables, but each of the tables has 500,000 to 1,000,000 rows.
There is an algorithm that runs on that data (all of it), so before running the algorithm, I have to retrieve all the data from the database. The algorithm does not change the data, only reads it, so I just need to retrieve the data.
I am using LINQ to SQL. Retrieving all the data takes about two minutes. What I want to know is whether serializing the data to a file and then deserializing it (when needed) would actually load the data faster.
The data is about 200 MB, and I don't mind saving it to disk. So, would it be faster if the objects were deserialized from the file or by using LINQ 2 SQL DataContext?
Any experiences with this?
I would argue that LINQ to SQL may not be the best choice for this kind of application. When you are talking about that many objects, you incur quite a lot of overhead creating the object instances (your persistent classes).
I would choose a solution where a stored procedure retrieves only the necessary data via ADO.NET, the application stores it in memory (memory is cheap nowadays, 200MB should not be a problem) and the analyzing algorithm is run on the in-memory data.
I don't think you should store the data on file. In the end, your database is also simply one or more files that are read by the database engine. So you either
let the database engine read your data and you analyze it, or
let the database engine read your data, you write it to file, you read the file (reading the same data again, but now you do it yourself) and you analyze the data
The latter option involves a lot of overhead without any advantages as far as I can see.
EDIT: If your data changes very infrequently, you may consider preprocessing your data before analyzing and caching the preprocessed data somewhere (in the database or on the file system). This only makes sense if your preprocessed data can be analyzed (a lot) faster than the raw data. Maybe some preprocessing can be done in the database itself.
You should try to use ADO.NET directly without the LINQ to SQL layer on top of it, i.e. using an SqlDataReader to read the data.
If you work sequentially with the data, you can get the records from the reader when you need them without having to read them all into memory first.
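Roughly (the query and the per-row processing are placeholders):

    using System.Data.SqlClient;

    using (var conn = new SqlConnection(connectionString)) // connectionString: placeholder
    using (var cmd = new SqlCommand("SELECT Id, Value FROM BigTable", conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // Only the current row is held in memory, so the working set stays flat.
                int id = reader.GetInt32(0);
                string value = reader.GetString(1);
                // ... feed the row to the algorithm here ...
            }
        }
    }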
If you have a process that operates on most of the data in a database... then that sounds like a job for a stored procedure. It won't be object oriented, but it will be a lot faster and less brittle.
Since you are doing this in C# and your database is MS SQL (since you use LINQ to SQL), could you not run your code in a managed stored procedure? That would allow you to keep your current code as it is, but loading the data would be much faster since the code would be running inside SQL Server.
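The rough shape of such a SQL CLR procedure, with the query and the processing as placeholders (the "context connection" is what keeps the data inside the SQL Server process):

    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server;

    public class Procedures
    {
        [SqlProcedure]
        public static void RunAnalysis()
        {
            // The context connection reuses the calling session's connection,
            // so no data crosses the network.
            using (var conn = new SqlConnection("context connection=true"))
            using (var cmd = new SqlCommand("SELECT * FROM SomeTable", conn))
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // ... run the algorithm on each row here ...
                    }
                }
            }
        }
    }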