I have 200,000 records in a database with the PK as a varchar(50)
Every 5 minutes I do a SELECT COUNT(*) FROM TABLE
If that result is greater than the List.Count I then execute
"SELECT * FROM TABLE WHERE PRIMARYKEY NOT IN ( " + myList.ToCSVString() + ")"
The reason I do this is because records are being added to the table via another process.
This query takes a long time to run and I also believe it's throwing an OutOfMemoryException.
Is there a better way to implement this?
Thanks
SQL Server has a solution for this: add a timestamp (rowversion) column. Every time a row in the table is inserted or updated, its timestamp is set to a new, higher value.
Add an index for the timestamp column.
Instead of just storing ids in memory, store ids and last timestamp.
To update:
select max timestamp
select all the rows between old max timestamp and current max timestamp
merge that into the list
Handling deletions is a bit more tricky, but can be achieved if you tombstone as opposed to delete.
Can you change the table?
If so, you might want to add a new auto incremented column that will serve as the PK TableId.
On each SELECT save the max id and on the next select add where TableId > maxId.
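A minimal sketch of the "remember the max id" idea, using Python's built-in sqlite3 as a stand-in for the real database so it can be run as-is. The table name records and the columns table_id and pk are made up for illustration; on SQL Server the new column would be an IDENTITY column instead of AUTOINCREMENT.

```python
import sqlite3


def fetch_new_rows(conn, last_seen_id):
    """Return rows added since last_seen_id, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT table_id, pk FROM records WHERE table_id > ? ORDER BY table_id",
        (last_seen_id,),
    ).fetchall()
    # Keep the old mark when nothing new arrived.
    new_max = rows[-1][0] if rows else last_seen_id
    return rows, new_max


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the varchar key stays, table_id is the new auto-incremented column.
conn.execute(
    "CREATE TABLE records (table_id INTEGER PRIMARY KEY AUTOINCREMENT, pk TEXT UNIQUE)"
)
conn.executemany("INSERT INTO records (pk) VALUES (?)", [("a",), ("b",)])

cached, max_id = fetch_new_rows(conn, 0)               # initial load
conn.execute("INSERT INTO records (pk) VALUES ('c')")  # another process adds a row
added, max_id = fetch_new_rows(conn, max_id)           # only the new row comes back
cached.extend(added)
```

Each poll only transfers the rows above the saved id, so the big NOT IN list is never sent to the server.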
Create an INT PK, and use something like this:
"SELECT * FROM TABLE WHERE MY_ID > " + myList.Last().Id;
If you can't change your PK, create another column of a date/time type with NOW() as the default value (GETDATE() in SQL Server), and use it to query for new items.
Create another table in the database with a single column for the primary key. When your application starts, insert the PKs into this table. Then you can detect added keys directly with a select rather than checking the count:
select PrimaryKey from Table where PrimaryKey not in (select PrimaryKey from OtherTable)
If this CSV list is large, I would recommend loading it into a temp table, putting an index on it, and doing a left join that keeps only the rows where the temp-table key is null:
select tbl.*
from table tbl
left join #tmpTable tmp on tbl.primarykey = tmp.primarykey
where tmp.primarykey is null
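Here is a rough, runnable sketch of the same temp-table anti-join, with Python's sqlite3 standing in for SQL Server. The names main_table, known_keys and payload are placeholders I made up; in T-SQL the temp table would be #tmpTable as in the query above.

```python
import sqlite3


def find_unknown_rows(conn, known_keys):
    """Return rows of main_table whose key is not in known_keys."""
    conn.execute("DROP TABLE IF EXISTS temp.known_keys")
    # PRIMARY KEY gives the temp table an index for the join.
    conn.execute("CREATE TEMP TABLE known_keys (primarykey TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO known_keys (primarykey) VALUES (?)",
        ((k,) for k in known_keys),
    )
    return conn.execute(
        """
        SELECT tbl.*
        FROM main_table AS tbl
        LEFT JOIN known_keys AS tmp ON tbl.primarykey = tmp.primarykey
        WHERE tmp.primarykey IS NULL
        ORDER BY tbl.primarykey
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (primarykey TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO main_table VALUES (?, ?)",
    [("k1", "x"), ("k2", "y"), ("k3", "z")],
)
missing = find_unknown_rows(conn, ["k1", "k3"])  # the list the app already holds
```

The keys travel as bound parameters into an indexed table, so the query text stays small no matter how long the list gets.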
edit: a Primary Key should not be a varchar. It should almost always be an incrementing int/bigint. This would've been a lot easier: select * from table where primarykey > #lastknownkey
Smack the DB programmer who designed this.. :p
This design would also cause index fragmentation because rows won't be inserted in a linear fashion.
First, sorry for my bad English; it is not my native language.
My problem: I have a table with around 10 million bank transaction records. It doesn't have a PK and isn't sorted by any column.
My job is to create a page that filters the data and exports it to CSV, but the limit for a CSV export is around 200k rows.
I have some ideas:
create 800 tables for the 800 ATMs (just an idea, I know it's stupid), copy data from the main table into them once per day, and export 800 CSV files
use Linq to get 100k records at a time and skip those on the next pass. But I am stuck because Skip needs an OrderBy, and I get an OutOfMemoryException with it:
db.tblEJTransactions.OrderBy(u => u.Id).Take(100000).ToList()
Can anyone help me? Every idea is welcome (my boss said I can use anything, including creating hundreds of tables, using NoSQL, ...).
If you don't have a primary key in your table, then add one.
The simplest and easiest is to add an int IDENTITY column.
ALTER TABLE dbo.T
ADD ID int NOT NULL IDENTITY (1, 1)
ALTER TABLE dbo.T
ADD CONSTRAINT PK_T PRIMARY KEY CLUSTERED (ID)
If you can't alter the original table, create a copy.
Once the table has a primary key you can sort by it and select chunks/pages of 200K rows with predictable results.
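One way to do the chunking once the key exists is keyset paging: remember the last id you exported and ask for the next batch above it, so only one chunk is in memory at a time. The sketch below uses Python's sqlite3 so it runs standalone; the columns atm and amount are invented, and LIMIT plays the role that TOP would play in SQL Server.

```python
import csv
import io
import sqlite3


def iter_chunks(conn, chunk_size):
    """Yield lists of at most chunk_size rows, in primary-key order."""
    last_id = 0  # ids start at 1, as with IDENTITY (1, 1)
    while True:
        rows = conn.execute(
            "SELECT id, atm, amount FROM transactions "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last exported key


def chunk_to_csv(rows):
    """Render one chunk as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "atm", "amount"])
    writer.writerows(rows)
    return buf.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, atm TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions (atm, amount) VALUES (?, ?)",
    [("ATM%d" % (i % 2), i * 10) for i in range(5)],
)
chunks = list(iter_chunks(conn, 2))  # use 200000 for the real export
csv_files = [chunk_to_csv(c) for c in chunks]
```

Unlike Skip/Take, the WHERE id > ? form does not re-read the rows that were already exported on every pass.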
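I'm not sure about my solution, but you can try it:
select top 1000000 *, row_number() over (order by (select null)) from tblEJTransactions
The above query returns the rows with a row number attached. Note that order by (select null) numbers them in an arbitrary order, which is not guaranteed to be the same from one run to the next.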
And then you can use Linq to get the result.
I'm using EF to insert and retrieve data from the DB.
Is there any way to insert a new row at a specified position?
For example, I have 10 rows with IDs ranging from 0 to 9, and the new row I'm inserting should end up at position 4.
I'm using ASP.NET MVC 5 and LINQ.
Thank you.
The simple answer is no. Order has no meaning unless it's explicit in a database system. Sure in most cases I can insert into a table and pull from this exact table and get the exact order as it was inserted, but this is undefined...and the only guarantee is to use an ORDER BY clause.
If you are talking about changing an auto number property, this is also not possible, the database does not go back and fill in gaps with id numbers. If numbering is critical and important to you don't set the auto-increment property.
Your ID and order position are different things.
For the ID you use an auto-number column and you shouldn't mess with that.
For the order you use another column, and run a trigger when a new row is inserted to update all the rows.
So when the new row is inserted with order_id = 4, all the rows from that position on get updated,
something like
UPDATE table
set order_id = order_id + 1
where order_id >= 4
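To make the shift-then-insert idea concrete, here is a small sketch using Python's sqlite3. It does the shift from application code inside one transaction rather than in a trigger, and the table items with columns name and order_id is made up for the example.

```python
import sqlite3


def insert_at_position(conn, name, position):
    """Make room at position, then insert the new row there."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE items SET order_id = order_id + 1 WHERE order_id >= ?",
            (position,),
        )
        conn.execute(
            "INSERT INTO items (name, order_id) VALUES (?, ?)",
            (name, position),
        )


def names_in_order(conn):
    """Read the rows back in their explicit order."""
    return [r[0] for r in conn.execute("SELECT name FROM items ORDER BY order_id")]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, order_id INTEGER)"
)
conn.executemany(
    "INSERT INTO items (name, order_id) VALUES (?, ?)",
    [(n, i) for i, n in enumerate("abcde")],
)
insert_at_position(conn, "new", 2)
ordered = names_in_order(conn)
```

The auto-generated id of the new row is still the next free one; only order_id says where the row appears, and only when you ORDER BY it.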
So, I would do so quickly:
I would design the database so the primary key is not auto-incremented, and assign the id on save according to the desired position. Obviously add an IF to check whether that id is available; if it is not, either cascade the subsequent ids or move the row currently holding that id to the end.
for example
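var existing = myDb.MyTable.Find(position); // row currently holding the target id, if any
if (existing == null)
{
    // The id is free: insert the new row at that id.
    myDb.MyTable.Add(new MyTable { id = position, Field = value });
}
else
{
    // The id is taken. EF does not let you change the key of a tracked entity,
    // so copy the occupant to the end and reuse its row for the new value.
    var max = myDb.MyTable.Max(x => x.id);
    myDb.MyTable.Add(new MyTable { id = max + 1, Field = existing.Field });
    existing.Field = value;
}
myDb.SaveChanges();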
sorry if translate is no good! ;-)
I want to get a new row id for "products". For this I use the MAX SQL command as follows (the command is in the click event of the insert-new-record button):
SqlCommand cmd = new SqlCommand("Select ISNULL(MAX(id)+1,0) from products", SqlCon);
The issue is that when there are rows with IDs 10, 11, 12 (12 is the MAX) and I delete the record with id 12, MAX+1 gives me 12, while the new row's id will be 13 (the "id" field is the PK with identity increment 1).
Can I do it another way?
example:
id product
-- -------
1  dog
2  cat
3  mouse
4  elephant
When I delete row 4 I get MAX(id)+1 = 4, but I want to get 5 since this is the next row id.
I suspect the actual question is: how can I find the ID of the row I just inserted, so I can use it as a foreign key in related tables or in an image file name?
SQL Server since 2005 provides the OUTPUT clause in INSERT, UPDATE, DELETE statements that returns the values of the columns just inserted or modified. In the case of the insert statement, the syntax is:
insert into Products (Product)
OUTPUT inserted.ID
VALUES ('xxx')
This is a better option than the IDENT_CURRENT or SCOPE_IDENTITY values because it returns the values using a single statement and there is no ambiguity about what is returned:
IDENT_CURRENT returns the last identity value generated for the table by any session, so it may return another user's ID when multiple users are writing to the table
SCOPE_IDENTITY returns the last identity value generated in the current session and scope, no matter the table
You can return more than one column:
insert into Products (Product)
OUTPUT inserted.ID, inserted.Product
VALUES ('xxx')
You can execute this statement with ExecuteScalar if you return only one column, or with ExecuteReader if you want to return more columns.
In the case of UPDATE or DELETE statements, the deleted table contains the old (deleted or overwritten) values and inserted contains the new values.
Note ORMs like Entity Framework use such statements already to retrieve auto-generated IDs and update saved objects. In this case one only needs to read the ID property of the saved objects.
I will take a stab at what I think you are after. :)
If you include SELECT SCOPE_IDENTITY(); in your SQL you will get the ID you need:
INSERT INTO products (
* your fields *
)
VALUES (
* your values *
);
SELECT SCOPE_IDENTITY();
And then in your code you can have:
var Id = Convert.ToInt32(cmd.ExecuteScalar());
This will give you the id of the record you have inserted.
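To show why reading the generated id back from the insert beats guessing it with MAX(id)+1, here is a small runnable sketch in Python's sqlite3. It is only an analogy: cursor.lastrowid plays the role that SCOPE_IDENTITY() or OUTPUT inserted.ID plays on SQL Server, and SQLite's AUTOINCREMENT is used because, like the identity column in the question, it does not hand out a deleted id again.

```python
import sqlite3


def insert_product(conn, name):
    """Insert a product and return the id the database generated for it."""
    cur = conn.execute("INSERT INTO products (product) VALUES (?)", (name,))
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY AUTOINCREMENT, product TEXT)"
)
for name in ["dog", "cat", "mouse", "elephant"]:
    insert_product(conn, name)

conn.execute("DELETE FROM products WHERE id = 4")

# Guessing the next id from the data that is left in the table...
guessed_id = conn.execute(
    "SELECT COALESCE(MAX(id) + 1, 1) FROM products"
).fetchone()[0]
# ...versus asking the database which id it actually used.
actual_id = insert_product(conn, "horse")
```

After deleting row 4, the guess is 4 while the row really gets id 5, which is the mismatch described in the question.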
One possible solution could be that you don't delete the rows. You can add a flag and make it inactive/deleted. That way your row numbers will always be preserved and your code will give you the max Id.
I think the OP tries to tackle the wrong problem...
When you insert a new product into the products table, you should try to retrieve the new id directly with the scope_identity function as such (SQLServer!):
string sql = "insert into products(name) values('Yellow Cup'); SELECT SCOPE_IDENTITY();";
var sqlCommand = new SqlCommand(sql, conn);
var id = Convert.ToInt32(sqlCommand.ExecuteScalar()); // SCOPE_IDENTITY() returns a numeric value
Definitely MAX is not what anybody would use in this case. The closest solution would be to get the most recently used identity value and then increment it by 1 (in your case) or by the identity increment, whatever it is.
select ident_current('products') + 1
Caution - although this solves your purpose for now, beware that 'ident_current' will return the identity value set by other sessions as well. In simple words, if some other request/trigger/execution causes the id to be incremented before your button click finishes, you will get the id generated by that insert and not the one you expected.
I was given a task to insert over 1000 rows with 4 columns. The table in question does not have a PK or FK. Let's say it contains columns ID, CustomerNo, Description. The records needed to be inserted can have the same CustomerNo and Description values.
I read about importing data to a temporary table, comparing it with the real table, removing duplicates, and moving new records to the real table.
I also could have 1000 queries that check if such a record already exists and insert data if it does not. But I'm too ashamed to try that out for obvious reasons.
I'm not expecting any specific code, because I did not give any specific details. What I'm hoping for is some pseudocode or general advice for completing such tasks. I can't wait to give some upvotes!
So the idea is, you don't want to insert an entry if there's already an entry with the same ID?
If so, after you import your data into a temporary table, you can accomplish what you're looking for in the where clause of a select statement:
insert into table
select ID, CustomerNo, Description from #data_source
where (#data_source.ID not in (select table.ID from table))
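A runnable sketch of the staging-table approach, again with Python's sqlite3 in place of SQL Server. The table names customers and staging and the column names are placeholders; I also added DISTINCT so duplicates inside the imported batch are collapsed, and an IS NOT NULL guard because NOT IN matches nothing if the subquery returns a NULL.

```python
import sqlite3


def merge_new_rows(conn):
    """Copy staged rows whose id is not yet in customers; return how many were added."""
    with conn:  # run the insert in one transaction
        cur = conn.execute(
            """
            INSERT INTO customers (id, customer_no, description)
            SELECT DISTINCT id, customer_no, description
            FROM staging
            WHERE id NOT IN (SELECT id FROM customers WHERE id IS NOT NULL)
            """
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
# No PK or FK on the target, as in the question.
conn.execute("CREATE TABLE customers (id INTEGER, customer_no TEXT, description TEXT)")
conn.execute("CREATE TABLE staging (id INTEGER, customer_no TEXT, description TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'C1', 'x')")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [(1, "C1", "x"), (2, "C2", "y"), (2, "C2", "y"), (3, "C2", "y")],
)
added_first = merge_new_rows(conn)   # ids 2 and 3 are new
added_second = merge_new_rows(conn)  # nothing left to add
```

It is one set-based statement instead of 1000 check-then-insert round trips, and running it twice is harmless.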
I would suggest loading the data into a temp table or table variable. Then you can do a "Select Into" using the distinct keyword, which will remove the duplicated records.
You will always need to read the target table, unless you bulk load the target table into a temp table (at this point you will have two temp tables), compare both, eliminate duplicates and then insert into the target table. But even this is not accurate, because a new row can be inserted into the target table while you do this.
Is there a way to randomize the rows in SQL Server?
I don't want to retrieve the rows in a random manner, I know how to do that.
I want to shuffle the row IDs in the database (ex. ID1 will change to ID27 and ID27 will change to ID1).
I can copy all records to a temporary table, truncate the original table and insert the records back from the temporary table using a parallel loop for randomization.
Is there an easier way to do this?
ID is the identity seed, auto incremented
This sounds like a really strange requirement. Since the id is an identity you can't change that, so you'll have to swap all the other data on the row, which you could probably do with something like this:
select
a.id as old_id,
b.*
into #newdata
from
(
select
id,
row_number() over (order by id) as rn
from
data
) a
join (
select
*,
row_number() over (order by newid()) as rn
from
data
) b on a.rn = b.rn
This creates a temp table with the old and new id numbers plus all the columns from the table. You could then use this temp table to update all the columns of the rows in the original table.
I can't really recommend doing this, especially if there are a lot of rows. Before doing this you should probably take a table-level exclusive lock on the table, just in case.
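For a small table, the "keep the ids, swap the rest of the row" idea can also be done from application code. The sketch below is a different technique from the row_number() join above: it reads everything into memory, shuffles in Python and writes the values back, using sqlite3 so it is runnable. The table data with a single extra column name is a made-up example.

```python
import random
import sqlite3


def shuffle_payloads(conn, rng=random):
    """Keep every id in place but reassign the other column values in random order."""
    rows = conn.execute("SELECT id, name FROM data ORDER BY id").fetchall()
    ids = [r[0] for r in rows]
    names = [r[1] for r in rows]
    rng.shuffle(names)  # in-place shuffle of the non-key values
    with conn:  # write all updates in one transaction
        conn.executemany(
            "UPDATE data SET name = ? WHERE id = ?",
            list(zip(names, ids)),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO data (name) VALUES (?)", [("row%d" % i,) for i in range(10)]
)
before = conn.execute("SELECT id, name FROM data ORDER BY id").fetchall()
shuffle_payloads(conn, random.Random(42))  # seeded only to make the demo repeatable
after = conn.execute("SELECT id, name FROM data ORDER BY id").fetchall()
```

With more than one non-key column you would shuffle whole tuples of values together so each row's data stays intact.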