I am trying to merge two DataTables - one representing current data, and one representing proposed insertions to that data.
These tables have the same schema, both with a simple, user-provided primary key string.
What I'd like is for an error to be thrown if a proposed insertion row has a key that is already present in the current data. Instead, the proposed addition just gets merged as a proposed alteration to the existing row, which is not what I want.
My current code is something along the lines of
currentData.EnforceConstraints = false;
currentData.Merge(additions);
currentData.EnforceConstraints = true;
where I'm actually merging whole DataSets, not just DataTables. I was hoping to get an error on the EnforceConstraints = true line, but I don't.
I also tried using diffgrams, but had the same problem - duplicate insertions get treated as modifications.
Is there a way to merge a set of insertions into a DataSet and have duplicate PKs be treated as an error rather than an update?
Similarly, since modified DataRows remember their original values, I'd hope that merging a modified row whose original values don't match the target row's current values would throw an exception too.
Isn't the Unique flag used for this purpose? My understanding is that for Merge it will merge rows based on Primary Key.
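One possible workaround (a sketch, not from the original posts): check the incoming keys by hand before merging, since Merge treats a matching primary key as an update rather than a conflict. The table name and single-column primary key below are placeholders.
// "Records" is a placeholder table name.
DataTable current = currentData.Tables["Records"];
DataTable incoming = additions.Tables["Records"];
foreach (DataRow row in incoming.Rows)
{
    object key = row[current.PrimaryKey[0].ColumnName];
    if (current.Rows.Contains(key))           // Rows.Contains matches on the primary key
        throw new ConstraintException("Duplicate primary key: " + key);
}
currentData.Merge(additions);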
I am working on a project which reads information from CSV files posted twice a day and stores the info in a database. Each CSV file may contain rows from previous files. Unfortunately, to get a unique row in the CSV files, you have to assign 8 columns as the primary key, which I find ridiculous to work with, so I really want to reduce that number down to one. So far, the only idea I have is to create a hash of all of the primary key columns, or just append them all into one string. Before I do this, I'd like to know if there might be a better way to reduce the 8 primary key columns down to one.
PK columns are defined as:
// ....
table.Columns.Add("plantNumber",typeof(string)); //e.g. 341
table.Columns.Add("shipLocation",typeof(string)); //e.g. 11000047
table.Columns.Add("shipDate",typeof(DateTime)); //e.g. 2017/04/18 00:00
table.Columns.Add("releaseNumber",typeof(string)); //e.g. VH6516128
table.Columns.Add("releaseDate",typeof(DateTime)); //e.g. 2017/04/14
table.Columns.Add("orderNumber",typeof(string)); //e.g. 216967
table.Columns.Add("orderLine",typeof(string)); //e.g. 0011
table.Columns.Add("sequence",typeof(string)); //e.g. 044
// ....
table.PrimaryKey = new DataColumn[]
{
table.Columns["plantNumber"],
table.Columns["shipLocation"],
table.Columns["shipDate"],
table.Columns["releaseDate"],
table.Columns["releaseNumber"],
table.Columns["orderNumber"],
table.Columns["orderLine"],
table.Columns["sequence"],
};
Note: the reason many of the seemingly numeric fields are treated as strings instead of ints is that they are quoted in the CSV file and may begin with zeros, which I need to preserve. I am also not 100% certain they will never contain letters.
UPDATE:
I don't consider an auto-incrementing number to be a good solution, because I still need to ensure that the combination of the 8 columns is unique, not only within the SQL DB but within the DataTable itself. The individual columns by themselves are not unique; only the combination of the columns is.
To me that is not a primary key. A primary key isn't "the only thing unique" in your row; a unique index can do the same for you.
A primary key (in my opinion) should just be a single (often numerical) value to technically represent the data as unique. Functionally, something else can define a row as unique, as you have in your sample here, but I wouldn't make that the primary key for that reason alone.
There's nothing wrong with a compound index; that's how relational databases work. But if you really have to, you could concat or hash the 8 values that build the unique key into a single column. That would have the adverse effect of making your data static, though, unless you rebuild the hash/concat index.
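If you do go the concat/hash route, a minimal sketch against the DataTable from the question might look like this (the delimiter and the choice of SHA-256 are illustrative assumptions, not requirements):
using System;
using System.Data;
using System.Security.Cryptography;
using System.Text;

static string ComputeRowKey(DataRow row)
{
    // Join the eight key columns with a delimiter that cannot occur in the
    // data; the round-trip "o" format keeps the DateTime values unambiguous.
    string joined = string.Join("\u001F",
        (string)row["plantNumber"],
        (string)row["shipLocation"],
        ((DateTime)row["shipDate"]).ToString("o"),
        (string)row["releaseNumber"],
        ((DateTime)row["releaseDate"]).ToString("o"),
        (string)row["orderNumber"],
        (string)row["orderLine"],
        (string)row["sequence"]);

    using (var sha = SHA256.Create())
    {
        byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(joined));
        return BitConverter.ToString(hash).Replace("-", ""); // 64 hex chars
    }
}
As the answer above notes, the hashed column has to be recomputed whenever any of the eight source values changes.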
I am using C#.
I have thousands of rows in DynamoDB, like:
Now I insert a few more rows, but with different columns, like:
Here, Row 3 and Row 4 are the new rows.
Now I need a structure like the one below:
I want the old rows to have a default of 0 or a blank string for the new columns,
and I don't want to iterate through the old rows and update each item.
Is there any way to set default values?
What you're trying to achieve doesn't really work with NoSQL type databases. Technically, there is no concept of rows and columns when it comes to DynamoDB. Instead, there are items (your records) that contain any number of attributes. You can read more about it here.
If you're using an object mapper, I'd recommend dealing with this logic on that level, i.e. when getting items from DynamoDB you can assign default values there if there is a need to do so.
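For example, a sketch using the .NET object persistence model (class, table, and attribute names here are assumptions): items written before an attribute existed simply leave the corresponding property at its initialized value when loaded via DynamoDBContext.
using Amazon.DynamoDBv2.DataModel;

[DynamoDBTable("myTable")]          // assumed table name
public class Record
{
    [DynamoDBHashKey]
    public string Id { get; set; }

    // Old items that lack these attributes come back with the
    // initializers below untouched.
    public int NewNumericColumn { get; set; } = 0;
    public string NewTextColumn { get; set; } = "";
}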
This cannot be achieved with a DynamoDB query. DynamoDB only "knows" about indexed keys (partition key, sort key and secondary indices) and is not able to tell you the names of all the properties that exist in your table (unlike a relational database).
The only thing I can think of is (sketched below):
Scan the entire table
Store all property names in e.g. a HashSet<T>
For each item: add the missing properties, assign your default values, and update it
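A rough C# sketch of that scan-and-backfill approach with the AWS SDK for .NET; the table name "myTable", the partition key "id", and the default value "0" are all assumptions, not from the original question:
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

static async Task BackfillDefaultsAsync()
{
    var client = new AmazonDynamoDBClient();
    var allNames = new HashSet<string>();
    var items = new List<Dictionary<string, AttributeValue>>();

    // 1) Scan the entire table, collecting every attribute name seen.
    Dictionary<string, AttributeValue> startKey = null;
    do
    {
        var page = await client.ScanAsync(new ScanRequest
        {
            TableName = "myTable",              // assumed table name
            ExclusiveStartKey = startKey
        });
        items.AddRange(page.Items);
        foreach (var item in page.Items)
            allNames.UnionWith(item.Keys);
        startKey = (page.LastEvaluatedKey != null && page.LastEvaluatedKey.Count > 0)
            ? page.LastEvaluatedKey : null;
    } while (startKey != null);

    // 2) For each item, add the missing attributes with a default value.
    foreach (var item in items)
        foreach (var name in allNames)
        {
            if (item.ContainsKey(name)) continue;
            await client.UpdateItemAsync(new UpdateItemRequest
            {
                TableName = "myTable",
                Key = new Dictionary<string, AttributeValue> { ["id"] = item["id"] }, // assumed partition key
                UpdateExpression = "SET #attr = :default",
                ExpressionAttributeNames = new Dictionary<string, string> { ["#attr"] = name },
                ExpressionAttributeValues = new Dictionary<string, AttributeValue>
                {
                    [":default"] = new AttributeValue { S = "0" } // "0" as default; DynamoDB historically rejected empty strings
                }
            });
        }
}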
I'm trying to populate a DataTable, to build a LocalReport, using the following:
MySqlConnection cn = new MySqlConnection(Properties.Settings.Default.dbConnectionString);
MySqlCommand cmd = new MySqlCommand();
cmd.Connection = cn;
cmd.CommandType = CommandType.Text;
cmd.CommandText = "SELECT ... LEFT JOIN ... WHERE ..."; /* query snipped */
// prepare data
dt.Clear();
cn.Open();
// fill datatable
dt.Load(cmd.ExecuteReader());
// fill report
rds = new ReportDataSource("InvoicesDataSet_InvoiceTable", dt);
reportViewerLocal.LocalReport.DataSources.Clear();
reportViewerLocal.LocalReport.DataSources.Add(rds);
At one point I noticed that the report was incomplete; it was missing one record. I changed a few conditions so that the query would return exactly two rows and... surprise: the report showed only one row instead of two. I tried to debug it to find where the problem was and got stuck at
dt.Load(cmd.ExecuteReader());
That's when I noticed that the DataReader contained two records but the DataTable contained only one. By accident, I added an ORDER BY clause to the query and noticed that this time the report showed correctly.
Apparently, the DataReader contains two rows but the DataTable only reads both of them if the SQL query string contains an ORDER BY (otherwise it only reads the last one). Can anyone explain why this is happening and how it can be fixed?
Edit:
When I first posted the question, I said it was skipping the first row; later I realized that it actually only read the last row, and I've edited the text accordingly (at that time all the records were grouped in two rows, so it appeared to skip the first when it actually only showed the last). This may be caused by the fact that it didn't have a unique identifier by which to distinguish between the rows returned by MySQL, so adding the ORDER BY clause caused it to create a unique identifier for each row.
This is just a theory and I have nothing to support it, but all my tests seem to lead to the same result.
After fiddling around quite a bit I found that the DataTable.Load method expects a primary key column in the underlying data. If you read the documentation carefully, this becomes obvious, although it is not stated very explicitly.
If you have a column named "id" it seems to use that (which fixed it for me). Otherwise, it just seems to use the first column, whether it is unique or not, and overwrites rows with the same value in that column as they are being read. If you don't have a column named "id" and your first column isn't unique, I'd suggest explicitly setting the primary key column(s) of the DataTable before loading the DataReader.
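For example, a minimal sketch of that last suggestion (the column names "invoiceId" and "amount" are made up for illustration and should match your query's select list):
DataTable dt = new DataTable();
dt.Columns.Add("invoiceId", typeof(int));    // hypothetical column names,
dt.Columns.Add("amount", typeof(decimal));   // matching the query's columns
dt.PrimaryKey = new DataColumn[] { dt.Columns["invoiceId"] };
dt.Load(cmd.ExecuteReader()); // rows with distinct keys are no longer collapsed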
Just in case anyone is having a similar problem as canceriens: I was using If DataReader.Read ... instead of If DataReader.HasRows to check for existence before calling dt.Load(DataReader). Doh!
I had the same issue. I took a hint from your blog and put an ORDER BY clause in the query so that the ordered columns together form a unique key for all the records returned by the query. It solved the problem. Kinda weird.
Don't use
dr.Read()
because it moves the pointer to the next row, so DataTable.Load skips that row.
Remove this line and hopefully it will work.
Had the same issue. It is because the primary key on all the rows is the same. It's probably what's being used to key the results, and therefore it's just overwriting the same row over and over again.
The documentation for DataTable.Load points to the Fill method to explain how it works, and that page states that the operation is primary-key aware. Since primary keys can only occur once and are used as the keys for the rows...
"The Fill operation then adds the rows to destination DataTable objects in the DataSet, creating the DataTable objects if they do not already exist. When creating DataTable objects, the Fill operation normally creates only column name metadata. However, if the MissingSchemaAction property is set to AddWithKey, appropriate primary keys and constraints are also created." (http://msdn.microsoft.com/en-us/library/zxkb3c3d.aspx)
Came across this problem today.
Nothing in this thread fixed it unfortunately, but then I wrapped my SQL query in another SELECT statement and it worked!
Eg:
SELECT * FROM (
SELECT ..... < YOUR NORMAL SQL STATEMENT HERE />
) allrecords
Strange....
Can you grab the actual query that is running from SQL profiler and try running it? It may not be what you expected.
Do you get the same result when using a SqlDataAdapter.Fill(dataTable)?
Have you tried different command behaviors on the reader? MSDN Docs
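For instance, a sketch of that command-behavior suggestion: CommandBehavior.KeyInfo asks the provider to return primary-key and schema metadata along with the rows, which DataTable.Load relies on.
using (var reader = cmd.ExecuteReader(CommandBehavior.KeyInfo))
{
    dt.Load(reader);
}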
I know this is an old question, but for me the thing that worked, whilst querying an Access database and noticing it was missing one row from the query, was to change the following:
if (reader.Read()) - misses a row.
if (reader.HasRows) - missing row appears.
For anyone else that comes across this thread as I have, the answer regarding the DataTable being populated by a unique ID from MySql is correct.
However, if a table contains multiple unique IDs but only a single ID column is returned by the MySQL command (instead of receiving all columns by using '*'), then the DataTable will only organize by the single ID that was given and act as if a GROUP BY was used in your query.
So in short, the DataReader will pull all records, while DataTable.Load() will only see the unique ID retrieved and use that to populate the DataTable, thus skipping rows of information.
Not sure why you're missing the row in the datatable, is it possible you need to close the reader? In any case, here is how I normally load reports and it works every time...
Dim deals As New DealsProvider()
Dim adapter As New ReportingDataTableAdapters.ReportDealsAdapter
Dim report As ReportingData.ReportDealsDataTable = deals.GetActiveDealsReport()
rptReports.LocalReport.DataSources.Add(New ReportDataSource("ActiveDeals_Data", report))
Curious to see if it still happens.
In my case neither ORDER BY nor dt.AcceptChanges() works, and I don't know why. I have 50 records in the database but the DataTable only shows 49, skipping the first row; and if there is only one record in the DataReader, it shows nothing at all.
How bizarre...
Have you tried calling dt.AcceptChanges() after the dt.Load(cmd.ExecuteReader()) call to see if that helps?
I know this is an old question, but I was experiencing the same problem and none of the workarounds mentioned here did help.
In my case, using an alias on the column that is used as the PrimaryKey solved the issue.
So, instead of
SELECT a
, b
FROM table
I used
SELECT a as gurgleurp
, b
FROM table
and it worked.
I had the same problem. Do not use dataReader.Read() at all; it moves the pointer to the next row. Instead, call dataTable.Load(dataReader) directly.
Encountered the same problem. I also tried selecting a unique first column, but the DataTable was still missing a row.
Selecting the first column (which is also unique) in a GROUP BY solved the problem, i.e.
select uniqueData,.....
from mytable
group by uniqueData;
This solves the problem.
I am wondering which method is the best way to store a list of integers in a SQL column,
e.g. "1,2,3,4,6,7"
EDIT: These values represent other IDs in SQL tables. The row would look like
[1] [2]
id, listOfOtherIDs
The choices I have researched so far are:
A varchar of separated values that is "explode-able", i.e. by commas or tabs
An XML containing all the values individually
Using individual rows for each value.
Which is the best method to use?
Thanks,
Ian
A single element of a record can only refer to one value; it's a basic database design principle.
You will have to change the database's design: use a single row for each value.
You might want to read up on normalization.
As is shown here in the description of the first normal form:
First normal form states that at every row and column intersection in the table there exists a single value, and never a list of values. For example, you cannot have a field named Price in which you place more than one Price. If you think of each intersection of rows and columns as a cell, each cell can hold only one value.
While Jeroen's answer is valid for "multi-valued" attributes, there are genuine situations where multiple comma-separated values may actually represent one large value. Things like path data (on a map), integer sequences, lists of prime factors and many more could well be stored in a comma-separated varchar. I think it is better to explain what exactly you are storing and how you need to retrieve and use that value.
EDIT:
Looking at your edit, if by IDs you mean the PK of another table, then this sounds like a genuine M-N relation between this table and the one whose IDs you're storing. This stuff should really be stored in a separate gerund, which BTW is a table that has the PK of each of these tables as FKs, thus linking the related rows of both tables. So Jeroen's answer suits your situation very well.
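For illustration, a sketch of such a junction table populated with the example values from the question; the table and column names are made up, and connection is assumed to be an open SqlConnection:
using (var cmd = new SqlCommand(
    "INSERT INTO ItemLinks (itemId, otherId) VALUES (@itemId, @otherId)", connection))
{
    cmd.Parameters.Add("@itemId", SqlDbType.Int);
    cmd.Parameters.Add("@otherId", SqlDbType.Int);
    cmd.Parameters["@itemId"].Value = 1;
    foreach (int otherId in new[] { 1, 2, 3, 4, 6, 7 })
    {
        cmd.Parameters["@otherId"].Value = otherId;
        cmd.ExecuteNonQuery(); // one row per linked ID instead of "1,2,3,4,6,7"
    }
}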
I have two tables. One contains binary data and the other contains the metadata. I am attempting to delete the entire row from both tables, but keep getting the error:
Invalid data encountered. A required relationship is missing.
Examine StateEntries to determine the source of the constraint violation.
The rest of the info is not very helpful. Here is my code currently.
var attachment = _attachmentBinaryRepository.Single(w => w.Id == id);
_attachmentBinaryRepository.Delete(attachment);
_unitOfWork.Commit();
return true;
I was handed this project, but I understand the basics of table splitting. I am just lost in regard to deleting from both tables. I assume this code is only trying to delete from one table, the one containing the binary data.
Anyone have suggestions?
I don't have the code with me, but I ended up fixing this by retrieving the corresponding rows from all of the tables in the relationship. The rows then deleted without any trouble.
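For anyone hitting the same error, a rough sketch of what that fix can look like in Entity Framework; the context, set, and navigation property names are guesses, not from the original code:
var attachment = context.Attachments         // hypothetical DbSet name
    .Include("BinaryContent")                // eager-load the table-split half
    .Single(a => a.Id == id);
context.Attachments.Remove(attachment);      // EF now deletes both rows
context.SaveChanges();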