The situation: I have a list of queries, each written to select data from its respective table. I want to store this list of queries in an SSIS object variable and iterate through it, using each query as an OLE DB source in a DFT.
Is there any way to do this so that the DFT source component does not complain about incorrect metadata after switching to a query that uses a different table than the first?
The destination will be changing as well. I know that you can delay validation, but I don't believe that helps with the changing metadata.
No, if the metadata is not the same for all queries, then you cannot use them in a single data flow task. The metadata for a DFT is set at design time and cannot change or "refresh" during a run. You're correct that delaying validation will not help with this.
You might want to look into Biml, which can dynamically generate packages based on metadata.
I've hit a wall when it comes to adding a new entity object (a regular SQL table) to the Data Context using LINQ-to-SQL. This isn't about the drag-and-drop method that is cited regularly in many other threads; that method has worked for me repeatedly without issue.
The end goal is relatively simple. I need to find a way to add a table that gets created at runtime via a stored procedure to the current Data Context of the LINQ-to-SQL dbml file. I'll then need to be able to use the regular LINQ query methods/extension methods (InsertOnSubmit(), DeleteOnSubmit(), Where(), Contains(), FirstOrDefault(), etc...) on this new table object through the existing Data Context. Essentially, I need to procedurally create the code that would otherwise be generated automatically when you use the drag-and-drop method during development (when the application isn't running), but generate that same code while the application is running, triggered by a command and/or event.
More Detail
There's one table that gets used a lot and, over the course of an entire year, collects many thousands of rows. Each row contains a timestamp and this table needs to be divided into multiple tables based on the year that the row was added.
Current Solution (using one table)
Single table with tens of thousands of rows which are constantly queried against.
Table is added to Data Context during development using drag-and-drop, so there are no additional coding issues
Significant performance decrease over time
Goals (using multiple tables)
(Complete) While the application is running, use C# code to check if a table for the current year already exists. If it does, no action is taken. If not, a new table gets created using a stored procedure with the current year as a prefix on the table name (2017_TableName, 2018_TableName, 2019_TableName, and so on...).
(Incomplete) While the application is still running, add the newly created table to the active LINQ-to-SQL Data Context (the same code that would otherwise be added using drag-and-drop during development).
(Incomplete) Run regular LINQ queries against the newly added table.
Final Thoughts
Other than the above, my only remaining concern is how to write the C# code that references a table that may or may not already exist. Is it possible to use a variable in place of the standard 'DB_DataContext.2019_TableName' syntax in order to actually get the table's data into a UI control? Is there a way to simply create an Enumerable of all the tables whose names are prefixed with a year and then select the most recent one?
From what I've read so far, the most likely solution seems to involve a tool like SQLMetal or Huagati, which (based solely on what I've read) can generate the code I need at runtime and update the corresponding dbml file. I have no experience using these types of add-ons, so any additional insight into them would be appreciated.
Lastly, I've seen some references to LINQ-to-Entities and/or LINQ-to-Objects. Would these be the components I'm looking for?
Thanks for reading through a rather lengthy first post. Any comments/criticisms are welcome.
The simplest way to achieve what you want is to redirect in SQL Server and leave your client code alone. At design time, create your L2S DataContext or EF DbContext against a database that contains only the single table. Then at run time, substitute a view or synonym for that table that points to the "current year" table.
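A minimal sketch of the synonym swap, assuming the yearly tables follow the 2017_TableName naming from the question and that the placeholder table the model was generated against has been dropped or renamed so the synonym can take its name (connectionString and all object names here are illustrative):

```
using System;
using System.Data.SqlClient;

// Point the fixed name the DataContext knows (dbo.TableName) at this year's physical table.
string year = DateTime.Now.Year.ToString();
const string swapSql = @"
    IF OBJECT_ID('dbo.TableName', 'SN') IS NOT NULL
        DROP SYNONYM dbo.TableName;
    EXEC('CREATE SYNONYM dbo.TableName FOR dbo.[' + @year + '_TableName]');";

using (var conn = new SqlConnection(connectionString))   // connectionString assumed to exist
using (var cmd = new SqlCommand(swapSql, conn))
{
    cmd.Parameters.AddWithValue("@year", year);
    conn.Open();
    cmd.ExecuteNonQuery();
}
```

The LINQ code keeps querying the one mapped table name; only the object behind the synonym changes each year.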
HOWEVER, this should not be necessary in the first place. SQL Server supports partitioning, so you can store the data in physically separate structures but query it as a single logical table. SQL Server also supports columnstore indexes, which compress the data and can store many millions of rows with excellent performance.
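For completeness, a rough sketch of what the partitioning route looks like; the table, columns, and boundary values below are illustrative only, and you would normally run this from a migration script rather than application code:

```
// Rough sketch only: one logical table, one partition per year.
// Execute the batch with SqlCommand.ExecuteNonQuery or your deployment tooling.
const string createPartitionedTable = @"
CREATE PARTITION FUNCTION pf_ByYear (datetime2)
    AS RANGE RIGHT FOR VALUES ('2018-01-01', '2019-01-01', '2020-01-01');

CREATE PARTITION SCHEME ps_ByYear
    AS PARTITION pf_ByYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.TableName
(
    Id           int IDENTITY(1,1) NOT NULL,
    RowTimestamp datetime2         NOT NULL,
    Payload      nvarchar(200)     NULL,
    CONSTRAINT PK_TableName PRIMARY KEY CLUSTERED (RowTimestamp, Id)
) ON ps_ByYear (RowTimestamp);";
```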
I am writing a GUI that will be integrated with SAP Business One. I'm having a difficult time determining how to load, edit, and save the data in the best way possible (reliable, fast, easy).
I have two tables, called Structures and StructureRows (not great names). A structure can contain other structures, generics, and specifics. The structure rows hold all of these items and have a type associated with them. The generics are placeholders for specifics, and the specifics are actual items in inventory.
A job will contain job metadata as well as n structures. On the screen where you edit the job, you can add structures and delete structures as well as edit the rows underneath them. For example, if you added Structure 1 to Job 1 and Structure 1 contains Generic 1, the user would be able to swap Generic 1 for a Specific.
I understand how to store the data, but I don't know what the best method to load and save the data is...
I see a few different options:
When someone adds a structure to a job, load the structure, and then recursively load any structures beneath it (the generics and specifics will already be loaded). I would put this all into an object model such as a List&lt;Structure&gt;, where each Structure object would have a List&lt;Generic&gt; and a List&lt;Specific&gt;. When I save the changes back to the database, I would have to manually loop through the data and persist the changes.
Somehow load the data into a view in SQL and then group and order the DataTable/DataSet on the client side. Bind the data to a GridView so changes are automatically reflected in the DataSet. When you go to save, could SQL / ADO.NET handle the updates automatically? This seems like the ideal solution, but I don't know how to actually implement it...
The part that throws me off is being able to add a structure to a structure. If it weren't for this, I would select the Specifics and Generics from the StructureRows table and group them in the GUI based on the Structure they belong to. I would have them in a DataTable and bind that to the GridView so any changes were persisted automatically to the DataTable, and then I could turn around and push them to SQL very easily...
Is loading and saving the data manually via an object model the only option I have? If not, how would you do it? I'm not sure if I'm just making it more complicated than it needs to be or if this is actually difficult to do with C#, ADO.NET, and MS SQL.
The hierarchyid data type was introduced in SQL Server 2008 to handle this kind of thing. I haven't done it myself, but here's a place to start that gives a fair example of how to use it.
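A rough, untested sketch of the idea; one hierarchyid column encodes the whole structure tree, so "a structure inside a structure" is just a deeper node (all names below are illustrative, not taken from your schema):

```
const string hierarchySketch = @"
CREATE TABLE dbo.StructureRows
(
    Node      hierarchyid  NOT NULL PRIMARY KEY,
    NodeLevel AS Node.GetLevel(),
    ItemType  nvarchar(20) NOT NULL,  -- 'Structure', 'Generic' or 'Specific'
    ItemCode  nvarchar(50) NOT NULL
);

-- Everything underneath a given structure, however deeply nested:
DECLARE @parent hierarchyid = hierarchyid::Parse('/1/');
SELECT ItemType, ItemCode
FROM   dbo.StructureRows
WHERE  Node.IsDescendantOf(@parent) = 1;";
```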
That being said, if you aren't wedded to your current tables and don't need to query the individual elements (in other words, you are always dealing with the job as a whole), I'd be tempted to store the data for each job as XML. (If you were building a web-based app, you could also go with JSON.) It preserves the hierarchy, and there are many tools in .NET for working with XML. There's also a built-in TreeView class for WinForms, and doubtless other third-party controls available.
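If you go the XML route, a minimal sketch using XmlSerializer; the class and property names below are made up for illustration, not taken from your schema:

```
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class JobDocument
{
    public int JobId { get; set; }
    public List<StructureNode> Structures { get; set; } = new List<StructureNode>();
}

public class StructureNode
{
    public string Name { get; set; }
    public List<StructureNode> ChildStructures { get; set; } = new List<StructureNode>();
    public List<string> Generics { get; set; } = new List<string>();
    public List<string> Specifics { get; set; } = new List<string>();
}

public static class JobXml
{
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(JobDocument));

    // Serialize the whole job hierarchy into one string for an xml / nvarchar(max) column.
    public static string ToXml(JobDocument job)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, job);
            return writer.ToString();
        }
    }

    // Rebuild the object graph when the job is loaded for editing.
    public static JobDocument FromXml(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (JobDocument)Serializer.Deserialize(reader);
        }
    }
}
```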
I have a dataset fetched from an ODBC data source (MySQL), and I need to temporarily put it into a SQL Server DB to be fetched by another process (and then transformed further).
Instead of creating a nicely formatted database table, I'd rather have one field of type text and whack the whole thing in there.
This is only a data migration exercise, a one-off, so I don't care about elegance, performance, or any other aspects of it.
All I want is to be able to de-serialize the "text-blob" (or binary) back into anything resembling a dataset. A Dictionary object would do the trick too.
Any quick fixes for me? :)
Use DataSet.WriteXml to write out your "text-blob", then use DataSet.ReadXml later when you want to turn the "text-blob" back into a DataSet and perform whatever subsequent manipulations you want to do.
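A minimal sketch, assuming dataSet already holds the rows pulled over ODBC and the staging table has a single text/nvarchar(max) column:

```
using System.Data;
using System.IO;

// Flatten the whole DataSet (schema + rows) into one string.
string blob;
using (var writer = new StringWriter())
{
    dataSet.WriteXml(writer, XmlWriteMode.WriteSchema);   // dataSet assumed to exist
    blob = writer.ToString();
}

// ... INSERT blob into the single text column of the staging table ...

// Later, in the other process, rebuild a DataSet from the stored text.
var restored = new DataSet();
using (var reader = new StringReader(blob))
{
    restored.ReadXml(reader, XmlReadMode.ReadSchema);
}
```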
Why don't you do everything in an SSIS package: extract from MySQL, transform as you wish, and load it wherever you need it?
I have built a backend system that allows a user to add multiple content sections, widgets, etc.
I want to keep the queries to the SQL Server to a minimum for performance reasons. This is my current flow:
I check my main table to see which widgets have been added.
I run through each row and build a "batch" SQL query that gets content from multiple tables.
I run the completed list of queries.
I populate the results into a DataSet.
Now for the problem:
The tables will never be in the same order, and I can't find a way to name the returned tables.
Is it best to just dedicate a column in each returned DataTable to specify what it actually is, and loop through the DataSet?
Or is there actually a way of naming the returned tables?
There is no way to do it automatically, as far as I know. You can give table mappings a try: http://geekswithblogs.net/dotNETvinz/archive/2009/08/03/why-dataset-creates-tablen-as-the-default-table-name.aspx
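A sketch of what table mappings look like; note that the mapping is positional (Table, Table1, Table2, ...), so it only helps if you know the order in which the batch returns its result sets (batchSql, connection, and the table names below are made up):

```
using System.Data;
using System.Data.SqlClient;

// batchSql and connection are assumed to already exist.
var adapter = new SqlDataAdapter(batchSql, connection);

// Positional mapping: the first result set becomes "Widgets", the second "ContentSections", ...
adapter.TableMappings.Add("Table",  "Widgets");
adapter.TableMappings.Add("Table1", "ContentSections");
adapter.TableMappings.Add("Table2", "Banners");

var ds = new DataSet();
adapter.Fill(ds);

DataTable widgets = ds.Tables["Widgets"];
```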
Nobody replied with an answer for naming the DataTables in a DataSet, so I ended up adding a column to each DataTable that makes it unique and "searchable", which solved my problem.
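A rough sketch of that workaround, assuming each SELECT in the batch also returns a literal identifier column (the SourceName column name here is made up):

```
using System.Data;

// Each query in the batch includes something like: SELECT 'Widgets' AS SourceName, ...
// so the returned table can be recognised no matter what order it comes back in.
static DataTable FindTable(DataSet ds, string sourceName)
{
    foreach (DataTable table in ds.Tables)
    {
        if (table.Columns.Contains("SourceName")
            && table.Rows.Count > 0
            && (string)table.Rows[0]["SourceName"] == sourceName)
        {
            return table;
        }
    }
    return null;
}
```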
I am trying to use MongoDB, C# and NoRM to work on some sample projects, but at this point I'm having a much harder time wrapping my head around the data model. With an RDBMS, related data is no problem. In MongoDB, however, I'm having a difficult time deciding what to do with it.
Let's use StackOverflow as an example... I have no problem understanding that the majority of data on a question page should be included in one document. Title, question text, revisions, comments... all good in one document object.
Where I start to get hazy is on the question of user data like username, avatar, reputation (which changes especially often)... Do you denormalize and update thousands of document records every time there is a user change or do you somehow link the data together?
What is the most efficient way to accomplish a user relationship without causing tons of queries to happen on each page load? I noticed the DbReference<T> type in NoRM, but haven't found a great way to use it yet. What if I have nullable optional relationships?
Thanks for your insight!
The balance that I have found is using SQL as the normalized database and Mongo as the denormalized copy. I use an ESB to keep them in sync with each other. I use a concept that I call "prepared documents" and "stored documents". Stored documents are data that is only kept in Mongo; they are useful for data that isn't relational. The prepared documents contain data that can be rebuilt from the normalized database. They act as living caches in a way: they can be rebuilt from scratch if the data ever falls out of sync (for complicated documents this is an expensive process, because they require many queries to rebuild). They can also be updated one field at a time. This is where the service bus comes in: it responds to events sent after the normalized database has been updated and then updates the relevant Mongo prepared documents.
Use each database for its strengths. Allow SQL to be the write database that ensures data integrity. Let Mongo be the read-only database that is blazing fast and can contain sub-documents so that you need fewer queries.
** EDIT **
I just re-read your question and realized what you were actually asking for. I'm leaving my original answer in case it's helpful at all.
The way I would handle the Stack Overflow example you gave is to store the user id in each comment. You would load up the post, which would have all of the comments in it. That's one query.
You would then traverse the comment data and pull out an array of user ids that you need to load, and load those in a single batch query (using the Q.In() query operator). That's two queries total. You would then need to merge the data into a final form. There is a balance to strike between when to do it like this and when to use something like an ESB to update each document manually. Use whatever works best for each individual scenario in your data structure.
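Sketching the same two-query pattern with the current official MongoDB C# driver rather than NoRM's Q.In(), whose exact signature I won't vouch for here; postId and the class shapes are illustrative:

```
using System.Collections.Generic;
using System.Linq;
using MongoDB.Driver;

public class Comment { public string UserId { get; set; } public string Text { get; set; } }
public class Post    { public string Id { get; set; } public List<Comment> Comments { get; set; } }
public class User    { public string Id { get; set; } public string Name { get; set; } public int Reputation { get; set; } }

public static class PostLoader
{
    public static (Post Post, Dictionary<string, User> UserById) Load(IMongoDatabase db, string postId)
    {
        var posts = db.GetCollection<Post>("posts");
        var users = db.GetCollection<User>("users");

        // Query 1: the post, with its embedded comments.
        var post = posts.Find(p => p.Id == postId).FirstOrDefault();

        // Query 2: batch-load every user referenced by those comments ($in).
        var userIds  = post.Comments.Select(c => c.UserId).Distinct().ToList();
        var userById = users.Find(Builders<User>.Filter.In(u => u.Id, userIds))
                            .ToList()
                            .ToDictionary(u => u.Id);

        // Caller merges the two: render comment.Text next to userById[comment.UserId].Name.
        return (post, userById);
    }
}
```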
I think you need to strike a balance.
If I were you, I'd just reference the userid instead of their name/reputation in each post.
Unlike with an RDBMS, though, you would opt to have the comments embedded in the document.
Why do you want to avoid denormalization and updating "thousands of document records"? MongoDB is designed for denormalization. Stack Overflow handles millions of pieces of data in the background, and some of that data can be stale for a short period; that's okay.
So the main idea of the above is that you should keep denormalized documents so they can be displayed quickly in the UI.
You can't query by a referenced document, so either way you need denormalization.
I also suggest looking into CQRS and event sourcing architectures; they will let you update all of this data through a queue.