Compare 2 large DataTables (70k records) - C#

How do I compare two large DataTables in C#? The DataTable.Select method takes forever.
I need to compare each record’s field values with the other table. The source and target field data types might be different, e.g. Table1’s field1 data type is INT and Table2’s field1 data type is VARCHAR.

You need to profile the application to find out exactly what is slow: iterating through the columns and comparing values, or finding the matching records (rows) that need to be compared.
If finding the records is the problem, one solution is to convert a table to a dictionary. That works if your tables have a unique column: the key is the unique column value for a record and the value is the whole row. Then iterate over the first DataTable, get the unique column value, and look up the matching row of the 2nd DataTable in the dictionary instead of calling DataTable.Select for every row.
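A minimal sketch of the dictionary approach, assuming both tables share a unique key column (called Id here) and that the columns to compare have the same names in both tables; the table contents are made up. Field values are normalized to culture-independent strings so that an INT 100 and a VARCHAR "100" compare equal, which is one way to handle the type mismatch from the question, not the only one.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

// Hypothetical sample data: "Id" is assumed to be a unique key in both tables.
// Table1.Amount is INT and Table2.Amount is VARCHAR, as in the question.
var table1 = new DataTable("Table1");
table1.Columns.Add("Id", typeof(int));
table1.Columns.Add("Amount", typeof(int));
table1.Rows.Add(1, 100);
table1.Rows.Add(2, 200);
table1.Rows.Add(3, 300);

var table2 = new DataTable("Table2");
table2.Columns.Add("Id", typeof(int));
table2.Columns.Add("Amount", typeof(string));
table2.Rows.Add(1, "100");   // same value, different type
table2.Rows.Add(2, "250");   // different value

var diffs = CompareTables(table1, table2, "Id", new[] { "Amount" });
foreach (string d in diffs)
    Console.WriteLine(d);

// Turns any field value into a culture-independent string, so INT 100 and
// VARCHAR "100" compare equal; DBNull becomes an empty string.
static string Normalize(object value) =>
    value is DBNull ? "" : (Convert.ToString(value, CultureInfo.InvariantCulture) ?? "").Trim();

// Builds the key -> row dictionary once, so each lookup afterwards is a hash
// lookup instead of a DataTable.Select call per row.
static Dictionary<string, DataRow> IndexByKey(DataTable table, string keyColumn)
{
    var index = new Dictionary<string, DataRow>(table.Rows.Count);
    foreach (DataRow row in table.Rows)
        index[Normalize(row[keyColumn])] = row;   // a duplicate key would overwrite silently
    return index;
}

// Returns one message per source row missing from the target, or per differing field.
static List<string> CompareTables(DataTable source, DataTable target, string keyColumn, string[] columns)
{
    Dictionary<string, DataRow> targetIndex = IndexByKey(target, keyColumn);
    var differences = new List<string>();
    foreach (DataRow sourceRow in source.Rows)
    {
        string key = Normalize(sourceRow[keyColumn]);
        if (!targetIndex.TryGetValue(key, out var targetRow))
        {
            differences.Add($"{key}: missing in target");
            continue;
        }
        foreach (string column in columns)
        {
            string a = Normalize(sourceRow[column]);
            string b = Normalize(targetRow[column]);
            if (a != b)
                differences.Add($"{key}.{column}: '{a}' vs '{b}'");
        }
    }
    return differences;
}
```

This only reports rows that exist in the source; running it again with the tables swapped finds rows that exist only in the target. If the VARCHAR side may hold values such as "0100" or "100.0", parsing both sides to a number before comparing would be more robust than a string comparison.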
If the issue is the comparison between two rows, it is better to show the code; there may be extra comparisons or casts. It's hard to tell without code.

Related

Getting the average of values in a row - C#

I have a SQL Server database with a table that has these columns:
1stAP_TB, 2ndAP_TB, 3rdAP_TB, 4thAP_TB, 1steng_TB, 2ndeng_TB, 3rdeng_TB, 4theng_TB
All of them are in one row, and each number is entered individually into its own column. Now I need to know how to get the average of 1stAP_TB, 2ndAP_TB, 3rdAP_TB and 4thAP_TB when they are stored across a row.
Also, multiple rows of data will be saved in the database. I am using C#.
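Try the following. Both methods average all eight columns; to average only the four AP columns, list just those four (and, in Method 2, divide by the number of columns listed).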
create table aveexample
(a1stAP_TB int,
a2ndAP_TB int,
a3rdAP_TB int,
a4thAP_TB int,
a1steng_TB int,
a2ndeng_TB int,
a3rdeng_TB int,
a4theng_TB int
)
Sample data
insert into aveexample values(1,2,3,4,5,6,7,8)
insert into aveexample values(11,22,33,44,55,66,77,78)
insert into aveexample values(2,3,1,4,10,10,45,5)
Method 1
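-- CAST so AVG does not truncate the result (AVG over int columns returns int)
select *, (select AVG(CAST(totaldata AS decimal(10,2)))
from (values(a1stAP_TB),
(a2ndAP_TB),(a3rdAP_TB),(a4thAP_TB),(a1steng_TB),
(a2ndeng_TB),(a3rdeng_TB),(a4theng_TB)) total(totaldata)) as average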
from aveexample
Method 2
-- divide by 8.0, not 8, so integer division does not truncate the average
select ((a1stAP_TB)+
(a2ndAP_TB)+(a3rdAP_TB)+(a4thAP_TB)+(a1steng_TB)+
(a2ndeng_TB)+(a3rdeng_TB)+(a4theng_TB))/8.0 as Average
from aveexample
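Since the question mentions C#, here is a sketch of the same per-row average computed in code instead of SQL, averaging only the four AP columns as the question asks. The DataTable is filled in memory with the sample data above; in a real application it would be loaded from SQL Server, which this sketch leaves out. Enumerable.Average over int values returns a double, so there is no integer truncation.

```csharp
using System;
using System.Data;
using System.Linq;

// In-memory copy of the sample rows; only the four AP columns are needed here.
var table = new DataTable("aveexample");
string[] apColumns = { "a1stAP_TB", "a2ndAP_TB", "a3rdAP_TB", "a4thAP_TB" };
foreach (string name in apColumns)
    table.Columns.Add(name, typeof(int));
table.Rows.Add(1, 2, 3, 4);
table.Rows.Add(11, 22, 33, 44);
table.Rows.Add(2, 3, 1, 4);

// One average per row, taken across the AP columns of that row.
var averages = table.Rows.Cast<DataRow>()
    .Select(row => apColumns.Average(col => (int)row[col]))
    .ToList();

foreach (double avg in averages)
    Console.WriteLine(avg);
```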
It is difficult to give concrete advice given the very limited description in the question, but from the description and comments so far, it seems to me like the database needs to be redesigned to better fit your requirements. First, you have no ID field, so there is no way to differentiate one row from the next. Then, what you are left with is a series of repeated values. The clue here is that you have "1st", "2nd", "3rd" in the column names. That's probably a sign that those columns need to be moved into rows of a related table. It may not instantly seem to be the best approach, but this is called "First Normal Form" and is a typical best practice with SQL databases. See also Database Normalization Basics.
It seems to me that what you have here is some entity (which you haven't mentioned in your question) that has a number of values associated with it. The 'entity' here should be given a unique ID and then all of the values for that entity stored with its ID.
You might have a table with the following columns:
CREATE TABLE MyItems (
ID int NOT NULL,
Sequence int NOT NULL,
Value int NOT NULL,
CONSTRAINT PK_MyItems_ID_Sequence PRIMARY KEY
(ID,Sequence)
)
Note: ID + sequence forms the unique primary key for the table and makes every row unique. This also lets you keep track of what order the items were added in. This may or may not be important to you but every table should probably have a unique primary key.
Your data table would then look something like this (the example represents two different entities, the first having 4 values and the second having 3 values):
It's difficult to show a sensible example without knowing more about the application and what it does... but with this table design you have a basis from which to add values one at a time, as you said you needed, and a way to query them back. You can use grouping to produce things like totals and averages, or you can do that in code by iterating over the results of a query or in a LINQ statement.
You can then compute the average for an entity of a given ID using a LINQ query along the lines of:
var average = MyItems.Where(p => p.ID == 1).Average(q => q.Value);
As an example of the flexibility of this sort of approach, you could just as easily compute the average of every second value entered across the entire database:
var averageOfSecondItems = MyItems.Where(p => p.Sequence == 2).Average(q => q.Value);
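To try these queries without a database, MyItems can be replaced by an in-memory list (in the answer it presumably comes from an ORM such as Entity Framework, which is an assumption). The sketch below uses a list of (ID, Sequence, Value) tuples with made-up numbers, two entities with four and three values, and also shows the grouping mentioned above to get one average per ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the MyItems table; the values are invented.
var myItems = new List<(int ID, int Sequence, int Value)>
{
    (1, 1, 10), (1, 2, 20), (1, 3, 30), (1, 4, 40),
    (2, 1, 5),  (2, 2, 7),  (2, 3, 9),
};

// Average for a single entity.
double averageForId1 = myItems.Where(p => p.ID == 1).Average(q => q.Value);

// Average of every second value entered, across all entities.
double averageOfSecondItems = myItems.Where(p => p.Sequence == 2).Average(q => q.Value);

// Grouping gives one average per entity.
Dictionary<int, double> averagesById = myItems
    .GroupBy(p => p.ID)
    .ToDictionary(g => g.Key, g => g.Average(q => q.Value));

Console.WriteLine(averageForId1);
Console.WriteLine(averageOfSecondItems);
foreach (var pair in averagesById)
    Console.WriteLine($"ID {pair.Key}: {pair.Value}");
```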
The example I've shown deals with one type of value. In your question it appears that you might have two different types of value. There are several ways you could handle that - for example you could add another column to the table if the values are always entered in pairs, or you could create a second table to hold the separate values. Again, it's hard to make a recommendation based on the limited information given.
If putting your data into First Normal Form seems like a lot of work, then your application might be a better fit for a document database ("NoSQL" database), but that is really a different question. In the question, a SQL database was specified so I've concentrated on that.

How to auto-increment and automatically renumber a database column value

I have a SQL table that stores various data. The primary key is an integer that is incremented by 1 whenever a new row is entered. As long as we keep adding rows it works fine, but when we delete a row in the middle or at the end, it causes problems.
For example, I have added 5 rows to the table, so the sr_num column goes up to 5. When I delete the 4th record, the sr_num column is left as 1, 2, 3, 5.
I want it to be 1, 2, 3, 4: as soon as I delete the 4th entry, I want the 5th one to take the 4th position and its number as well.
This must happen for every row.
No. That is not what your primary key is for. It is only a logical reference that guarantees uniqueness. You should mentally ignore the fact that it happens to be an integer. @Adriano and @marc_s are both correct. Let go of the idea that you could or should renumber your primary key values. There are some rare occasions when you might consider it, but this is not one of them.
Instead, you could set up a query (or view) that uses ROW_NUMBER() (as @Adriano mentioned). Then you will have your consecutive numbers without touching your primary key values. People usually refer to this as an ordinal column, or simply Ord.
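The idea of computing the ordinal when reading instead of renumbering the keys can be sketched in C# as below; in SQL Server the query itself would use ROW_NUMBER() OVER (ORDER BY sr_num). The sr_num values are hypothetical: rows 1 to 5 with the 4th deleted.

```csharp
using System;
using System.Linq;

// Stored primary-key values after deleting the 4th row; these are never rewritten.
int[] srNums = { 1, 2, 3, 5 };

// Number the rows at read time, ordered by the key, similar to ROW_NUMBER().
var numbered = srNums
    .OrderBy(k => k)
    .Select((k, i) => (Ord: i + 1, SrNum: k))
    .ToList();

foreach (var (ord, srNum) in numbered)
    Console.WriteLine($"Ord {ord} -> sr_num {srNum}");
```

The displayed numbers stay consecutive after any delete, while the stored keys, and any foreign keys pointing at them, stay untouched.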
What you want to do is a bad idea.
For example, if sr_num is referenced by a foreign key in another table, then once you update sr_num you also need to update the other table with the same value.

varchar column sorted by itself

I have a table in which some columns are varchar. I have noticed that the rows in the table are sorted automatically. Instead, I want the rows to be in the same order in which they were inserted into the table. Any clues? Note that I haven't applied any ORDER BY clause, and the dates are the same for all rows.
As you can see, although I added Testing Book 3 first, it appears below Testing Book 2, which is not what I want.
Is it because my PK is composite?
You did not specify exactly which RDBMS you are using, but I can say the following with regards to Microsoft SQL Server:
You CANNOT guarantee ANY predictable / repeatable ordering without an ORDER BY clause
If you want rows to be ordered by when they were inserted, you need to: add a new column that is either an IDENTITY (could be INT or BIGINT) or a DATETIME / DATETIME2 datatype with a default constraint of GETDATE() or GETUTCDATE() AND ORDER BY this new field
The new field has nothing to do with a PK. This is in reference to a suggestion someone else made. A PK is for relationships, not sorting, and while an IDENTITY is typically used for a PK, there are plenty of situations in which you have a PK of one or more non-auto-incrementing fields and still have an auto-incrementing field.
If you need the detail on what millisecond / nanosecond the records are inserted as well as the guaranteed / repeatable sort, then do both the DATETIME / DATETIME2 and IDENTITY fields.
Adding one, or both, of these fields does not imply any specific index structure. Their existence merely allows you to create one or more indexes that would include them to enforce your desired ordering.
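An in-memory C# analogue of this advice, as a sketch rather than SQL Server itself: DataColumn.AutoIncrement plays the role of an IDENTITY column, and insertion order is recovered only by sorting on it explicitly. The table and column names (Books, InsertSeq, Title) are hypothetical.

```csharp
using System;
using System.Data;
using System.Linq;

var books = new DataTable("Books");
DataColumn seq = books.Columns.Add("InsertSeq", typeof(long));
seq.AutoIncrement = true;      // filled in automatically, like IDENTITY
seq.AutoIncrementSeed = 1;
seq.AutoIncrementStep = 1;
books.Columns.Add("Title", typeof(string));

// Insert "Testing Book 3" first, as in the question.
foreach (string title in new[] { "Testing Book 3", "Testing Book 2" })
{
    DataRow row = books.NewRow();
    row["Title"] = title;
    books.Rows.Add(row);
}

// Ordering by another column puts Book 2 first.
string[] byTitle = books.Select("", "Title ASC").Select(r => (string)r["Title"]).ToArray();

// Ordering explicitly by the sequence column gives the insertion order back.
string[] byInsertion = books.Select("", "InsertSeq ASC").Select(r => (string)r["Title"]).ToArray();

Console.WriteLine(string.Join(", ", byTitle));
Console.WriteLine(string.Join(", ", byInsertion));
```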
Please note that SQL does not guarantee ordering when inserting or selecting rows.
You can see answers for a question similar to yours here
To order by insertion, I would add a DATETIME column that is set to the date/time of the insert. (You can see how to do this in the accepted answer to this question.)
Then use ORDER BY on that DATETIME column in your SELECT statements.

Compare two datatable rows to two ints in any order

I'm doing some comparisons: I have a DataTable with one text column, and I compare each row of the DataTable with all the others.
My goal is to avoid comparing the same pair twice.
I thought of writing the IDs of compared rows to another DataTable, so that each time I can check whether two rows have already been compared.
Table of already compared rows:
------
1245 4589
5589 6952
2233 2339
So if I want to compare rows with ids 6952 and 5589, I want to see if there is row with columns 6952/5589 or 5589/6952 in table of already compared rows.
What is the simplest way?
I think you can add another string column that stores the IDs of compared rows in delimited form,
e.g. ,1234,5434,32453,
Then you just need a string comparison: check whether that column's value contains ,ID,.
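A different technique from the delimited-string column, shown as a sketch: keep a HashSet of ID pairs, always storing the smaller ID first, so 6952/5589 and 5589/6952 become the same key and the check is a single hash lookup. The function names PairKey and MarkCompared are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Each compared pair is stored once, smaller ID first.
var compared = new HashSet<(int, int)>();

// Order-independent key: (6952, 5589) and (5589, 6952) both become (5589, 6952).
static (int, int) PairKey(int a, int b) => a < b ? (a, b) : (b, a);

// Returns true the first time a pair is seen, false if it was already compared.
bool MarkCompared(int a, int b) => compared.Add(PairKey(a, b));

MarkCompared(5589, 6952);
bool firstTime = MarkCompared(1245, 4589);
bool alreadyDone = !MarkCompared(6952, 5589);   // same pair, reversed order

Console.WriteLine($"{firstTime} {alreadyDone}");
```

If the rows are compared in nested loops, starting the inner loop at the row after the outer one (j = i + 1) visits each pair exactly once, so no bookkeeping is needed at all.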

Insert rows into Access Table in order from c#

I have a sorted list of insert statements that I am trying to write to an Access db. I have triple verified that the list of insert statements is in the correct order. When I open the mdb file the records are never in order. Maybe for the first 100 records, but after that it starts getting out of whack.
I am really at a loss here, any ideas? Note that this table is being created dynamically in C# first, i.e. the set of columns is not predictable each time this code needs to be run.
Maybe you just need to add an ID (AutoNumber) field to the tables and sort by it; then the insertion order can be recovered.
When adding rows to any database, the concept of "Order inside a Table" is meaningless.
You get your order when retrieving records by using an ORDER BY.
Make sure you have an ID or TimeStamp column to sort on.
