Trying to implement grouping in IEnumerable to stream from Database - c#

Currently the application I'm working with uses strongly typed DataSets to work with data from the DB. We have a table called COM_ControlIn that represents a "file" and several other tables have a relationship with the control table. The one I need to stream from is called COM_GenericTransactionItems. There is a column in this table called COMControlIn_UID which links it up to the control table as the name suggests.
We have several methods to fetch data from this table, such as one that finds all records for a given COMControlIn_UID, but the problem with all of these is that they fetch all records at once, which is becoming a problem now that the sheer amount of data is causing us to hit .NET's memory limit. All of our existing code uses strongly typed datasets built from XSDs generated by Visual Studio from the database schema.
My idea was to use IEnumerable to "stream" batches of records from the database instead of fetching everything at once, while still keeping the strongly typed datasets we've used previously to keep it compatible without major changes. The code I've written looks more or less like this:
COM_GenericTransactionItemsDS com_GenericTransactionItemsDS = new COM_GenericTransactionItemsDS();
long lastUID = 0;
using (SqlConnection sqlConnection = new SqlConnection("..."))
{
sqlConnection.Open();
SqlCommand sqlCommand = new SqlCommand("SELECT MAX(UID) FROM COM_GenericTransactionItems WHERE COMControlIn_UID = " + p_COMControlIn_UID, sqlConnection);
//because apparently I'm not allowed to straight cast...
long maxUID = Convert.ToInt64(sqlCommand.ExecuteScalar());
while (lastUID < maxUID)
{
com_GenericTransactionItemsDS.Clear();
using (SqlDataAdapter sqlDataAdapter = new SqlDataAdapter())
{
//Build Select
string strSQL = "SELECT TOP(" + fetchAmount + ") " + SQL_Columns + " FROM COM_GenericTransactionItems " +
"WHERE COMControlIn_UID = " + p_COMControlIn_UID.ToString() + " AND UID > " + lastUID + " ORDER BY UID";
//Get Data
sqlDataAdapter.SelectCommand = new SqlCommand(strSQL, sqlConnection);
sqlDataAdapter.SelectCommand.CommandTimeout = Convert.ToInt32(context.strContext[(int)eCCE_Context._COMMAND_TIMEOUT]);
sqlDataAdapter.Fill(com_GenericTransactionItemsDS, "COM_GenericTransactionItems");
lastUID = com_GenericTransactionItemsDS.COM_GenericTransactionItems.Max(r => r.UID);
}
yield return com_GenericTransactionItemsDS;
}
}
It works extremely well for fetching data and has dropped our memory usage significantly, but I have run into a problem a little further down the line.
I need to group items within this table by a specific column (a date), but the notion of this conflicts with the whole batching approach, because you need to know what your entire dataset looks like to do the grouping.
I can't do the grouping in SQL because I need the data in a sort of key-value pair like Linq used to give me before I switched to using this method (unless there is a way for me to do this in SQL).
When I try using SelectMany to flatten all of my rows into one enumerable I get RowNotInTableException whenever I try to access any of them. I don't really know what else to try.
For reference, this is the Linq query I use to do the grouping:
var dateGroups = from row in p_COM_GenericTransactionItemsDS.SelectMany(c => c.COM_GenericTransactionItems) group row by (DateTime)row[tableDefinitions.CaptureDate] into groups select groups;
I think the problem lies with the way I'm returning data from my streaming method, but I don't know how else to do it. Ideally I'd like to extract all the rows out of our data tables into an IEnumerable and just iterate through that, but DataRows don't keep the table's schema (I've read the schema is kept in the DataTable they're related to), so once you remove them from the dataset they are essentially useless.

I've solved my problem. I changed my streaming method to loop through the items it receives in a batch, make a copy of them and return them one by one, like so:
foreach (COM_GenericTransactionItemsDS.COM_GenericTransactionItemsRow row in com_GenericTransactionItemsDS.COM_GenericTransactionItems.Rows)
{
lastUID = row.UID;
COM_GenericTransactionItemsDS.COM_GenericTransactionItemsRow newRow = com_GenericTransactionItemsDS.COM_GenericTransactionItems.NewCOM_GenericTransactionItemsRow();
newRow.ItemArray = row.ItemArray;
yield return newRow;
}
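The same idea (fetch in batches, yield row by row) can be sketched without the typed DataSet. This is only a minimal illustration under stated assumptions: `Item`, `BatchStreaming` and the `fetchBatch` delegate are hypothetical names, and `fetchBatch` stands in for the `SELECT TOP(n) ... WHERE UID > lastUID ORDER BY UID` query from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one COM_GenericTransactionItems row.
public sealed class Item
{
    public long UID { get; set; }
    public DateTime CaptureDate { get; set; }
}

public static class BatchStreaming
{
    // Yields items one by one while fetching them batch by batch.
    // fetchBatch(lastUID, batchSize) is assumed to return up to batchSize
    // items with UID > lastUID, ordered by UID.
    public static IEnumerable<Item> Stream(
        Func<long, int, IReadOnlyList<Item>> fetchBatch, int batchSize)
    {
        long lastUID = 0;
        while (true)
        {
            IReadOnlyList<Item> batch = fetchBatch(lastUID, batchSize);
            if (batch.Count == 0)
                yield break; // an empty batch means everything has been read

            foreach (Item item in batch)
            {
                lastUID = item.UID; // remember where the next batch starts
                yield return item;
            }
        }
    }
}
```

A caller can then group the flattened stream directly, for example `BatchStreaming.Stream(fetch, 1000).GroupBy(i => i.CaptureDate)`. Note that GroupBy still has to enumerate the whole stream and keeps the grouped items in memory before it returns the first group, so the batching only limits memory on the fetching side. This sketch also stops on the first empty batch instead of querying MAX(UID) up front, at the cost of one extra query at the end.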

Related

ADO.NET and SQLite single cell select performance

I want to create a simple database at runtime, fill it with data from an internal resource and then read each record in a loop. Previously I used LiteDb for that, but I couldn't squeeze any more speed out of it, so I chose SQLite.
I think there are a few things to improve that I am not aware of.
Database creation process:
First step is to create table
using var create = transaction.Connection.CreateCommand();
create.CommandText = "CREATE TABLE tableName (Id TEXT PRIMARY KEY, Value TEXT) WITHOUT ROWID";
create.ExecuteNonQuery();
Next insert command is defined
var insert = transaction.Connection.CreateCommand();
insert.CommandText = "INSERT OR IGNORE INTO tableName VALUES (@Id, @Record)";
var idParam = insert.CreateParameter();
var valueParam = insert.CreateParameter();
idParam.ParameterName = "@" + IdColumn;
valueParam.ParameterName = "@" + ValueColumn;
insert.Parameters.Add(idParam);
insert.Parameters.Add(valueParam);
Through loop each value is inserted
idParameter.Value = key;
valueParameter.Value = value.ValueAsText;
insert.Parameters["@Id"] = idParameter;
insert.Parameters["@Value"] = valueParameter;
insert.ExecuteNonQuery();
Transaction commit
transaction.Commit();
Create index
using var index = transaction.Connection.CreateCommand();
index.CommandText = "CREATE UNIQUE INDEX idx_tableName ON tableName(Id);";
index.ExecuteNonQuery();
After that I perform a million selects (each retrieving a single value):
using var command = _connection.CreateCommand();
command.CommandText = "SELECT Value FROM tableName WHERE Id = @id;";
var param = command.CreateParameter();
param.ParameterName = "@id";
param.Value = id;
command.Parameters.Add(param);
return command.ExecuteReader(CommandBehavior.SingleResult).ToString();
One connection is shared for all selects and never closed. Inserting is quite fast (less than a minute), but the selects are very troublesome. Is there a way to improve them?
The table is quite big (around 2 million records) and Value contains quite heavy serialized objects.
The System.Data.SQLite provider is used, and the connection string contains these additional options: Version=3;Journal Mode=Off;Synchronous=off;
If you go for performance, you need to consider this: each independent SELECT command is a round trip to the DB with some extra cost. It's similar to an N+1 select problem in the case of parent-child relations.
The best thing you can do is fetch a list of values in a single query:
SELECT Value FROM tableName WHERE Id IN (1, 2, 3, 4, ...);
Here's a link on how to code that: https://www.mikesdotnetting.com/article/116/parameterized-in-clauses-with-ado-net-and-linq
Instead of recreating the select command for every Id, you could create it once and only execute it for each Id. From your code it seems every select goes through CreateCommand/CreateParameter and so on. See this for example: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.prepare?view=net-5.0 - you run .Prepare() once and then only execute (the calls don't need to be ExecuteNonQuery).
You could then check whether ExecuteScalar is faster, since it avoids creating a reader for a single value: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.executescalar?view=net-5.0
If ExecuteScalar does not prove faster, you could try CommandBehavior.SingleRow instead of CommandBehavior.SingleResult in your ExecuteReader call as a possible optimisation. According to this: https://learn.microsoft.com/en-us/dotnet/api/system.data.commandbehavior?view=net-5.0 it might work. I doubt it, but if the first two don't help, why not try it too.
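The "create once, prepare, then only execute" suggestion could look roughly like the sketch below. It is written against the generic System.Data interfaces so it does not depend on a particular provider; `ValueLookup` is a hypothetical helper name, the `@id` parameter prefix is an assumption, and the connection passed in is assumed to be already open. It also reads the value with ExecuteScalar, because calling ToString() on a data reader does not give you the column value.

```csharp
using System;
using System.Data;

// Reuses one prepared command for many single-value lookups.
public sealed class ValueLookup : IDisposable
{
    private readonly IDbCommand _command;
    private readonly IDbDataParameter _idParam;

    // connection: an already open connection, shared across all lookups.
    public ValueLookup(IDbConnection connection)
    {
        _command = connection.CreateCommand();
        _command.CommandText = "SELECT Value FROM tableName WHERE Id = @id;";

        _idParam = _command.CreateParameter();
        _idParam.ParameterName = "@id";
        _idParam.DbType = DbType.String;
        _command.Parameters.Add(_idParam);

        _command.Prepare(); // prepare the statement once, up front
    }

    // Returns the stored value, or null when no row matches the id.
    public string Get(string id)
    {
        _idParam.Value = id; // only the parameter value changes per call
        object result = _command.ExecuteScalar();
        return result == null || result == DBNull.Value
            ? null
            : Convert.ToString(result);
    }

    public void Dispose()
    {
        _command.Dispose();
    }
}
```

Usage would be along the lines of creating one `ValueLookup` next to the shared connection and calling `Get(id)` inside the loop. Whether this is measurably faster than the original code depends on the provider, so it is worth timing both.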

Getting Database values into variable form c#

I am developing a cricket simulation and I need to retrieve certain statistics from a player's data. I've got the following code.
public List<float> BattingData()
{
con.ConnectionString = ConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString.ToString();
string query = "SELECT [INNS], [NOT OUTS], [AVG] FROM [" + batTeam + "] WHERE [Player] = '" + name + "';";
SqlCommand com = new SqlCommand(query, con);
con.Open();
using (SqlDataReader reader = com.ExecuteReader())
{
if(reader.HasRows)
{
while (reader.NextResult())
{
Innings = Convert.ToInt32(reader["INNS"]);
NotOuts = Convert.ToInt32(reader["NOT OUTS"]);
Avg = Convert.ToSingle(reader["AVG"]);
}
}
}
con.Close();
OutRatePG = (Innings = NotOuts) / Innings;
OutRatePB = OutRatePG / 240;
RunsPB = Avg / 240;
battingData.Add(OutRatePB);
battingData.Add(RunsPB);
return battingData;
}
The error I am getting is that when I try to divide by 'Innings' it says it cannot divide by zero, so I think the variables are being returned as zero and no data is being assigned to them.
This line is the issue:
while (reader.NextResult())
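What this does is move the reader to the next result set, skipping any rows left unread. Your query returns only one result set, so NextResult() returns false straight away and the loop body never runs. To advance a reader to the next row, you need to call reader.Read() instead.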
You have some other issues with your code:
1) You appear to have a separate table for each team. This is incorrect database design. You should create a Team table, with each team in it, and then foreign key your TeamResults table to it. Query it using INNER JOIN.
2) You are concatenating user-entered values into your query. This leaves you open to SQL injection attacks. Use parameters instead. (You cannot parameterize a table name, which is another reason to follow point 1.)
3) You do not need to check for HasRows. If there are no rows, Read() will return false.
4) It looks like you only want one row. If that is the case, you don't want a while(reader.Read()) loop but an if(reader.Read()) instead. (If you only need a single value, you can refactor the code to use command.ExecuteScalar().)
Check whether the value of Innings in the database records is 0.
You can also try the code below before performing any operation:
if(Innings>0) { OutRatePG = (Innings - NotOuts) / Innings; }
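The calculation itself can be isolated into a small function, which makes the zero check and the division easy to verify. This is a sketch that assumes Innings and NotOuts are integers (the field types are not shown in the question); `BattingStats.Compute` is a hypothetical name, and the divisor 240 is taken from the question as is.

```csharp
using System;
using System.Collections.Generic;

public static class BattingStats
{
    // Returns { out rate per ball, runs per ball }, the two values the
    // question's BattingData() adds to its list.
    public static List<float> Compute(int innings, int notOuts, float avg)
    {
        if (innings <= 0)
            throw new ArgumentOutOfRangeException(
                nameof(innings), "No innings recorded for this player.");

        // Subtract (not assign) and cast before dividing:
        // with two ints, 15 / 20 would truncate to 0.
        float outRatePerGame = (float)(innings - notOuts) / innings;
        float outRatePerBall = outRatePerGame / 240;
        float runsPerBall = avg / 240;

        return new List<float> { outRatePerBall, runsPerBall };
    }
}
```

Throwing on zero innings is one possible choice; returning a default value instead may suit the simulation better.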

Checking and Saving/Loading from MySQL C#

I am making something that requires MySQL. I have the saving done from in-game, which is simply done by INSERT.
I have a column that will have a password in and I need to check if the inputted password matched any of the rows and then if it is, get all of the contents of the row then save it to variables.
Does anyone have an idea how to do this in C#?
//////////////////////////
I have found how to save and get the string, however it will only get 1 string at a time :(
MySql.Data.MySqlClient.MySqlCommand command = conn.CreateCommand();
command.CommandText = "SELECT * FROM (player) WHERE (pass)";
command.ExecuteNonQuery();
command.CommandType = System.Data.CommandType.Text;
MySql.Data.MySqlClient.MySqlDataReader reader = command.ExecuteReader();
reader.Read();
ayy = reader.GetString(1);
print (ayy);
if(ayy == password){
//something
}
My usual approach is to use a MySqlDataAdapter to fill a DataTable. You can then iterate through the rows and try to match the password.
Something like this:
DataTable dt = new DataTable();
using (MySqlDataAdapter adapter = new MySqlDataAdapter(query, connection))
{
adapter.Fill(dt);
}
foreach (DataRow row in dt.Rows)
{
//Supposing you stored your password in a string field in your database
if (row.Field<string>("columnName").Equals("password"))
{
//Do something with it
}
}
I hope this compiles since I typed this from my phone. You can find a nice explanation and example here.
However, if you need data for a specific user, why not ask the database for exactly that? Your query would be like this:
SELECT * FROM usercolumn WHERE user_id = input_id AND pass = input_pass
Since I suppose every user is unique, you will now get the data for that specific user only, meaning you no longer have to check passwords in code.
For the SQL statement, you should be able to search your database as follows and get only the entry you need back from it.
"SELECT * FROM table_name WHERE column_name LIKE input_string"
If input_string contains any of the special characters for SQL string comparison (% and _, I believe) you'll just have to escape them which can be done quite simply with regex. As I said in the comments, it's been a while since I've done SQL, but there's plenty of resources online for perfecting that query.
This should then return the entire row, and if I'm thinking correctly you should be able to then put the entire row into an array of objects all at once, or simply read them string by string and convert to values as needed using one of the Convert methods, as found here: http://msdn.microsoft.com/en-us/library/system.convert(v=vs.110).aspx
Edit as per Prix's comment: Data entered into the MySQL table should not need conversion.
Example to get an integer:
string x = [...];
[...]
var y = Convert.ToInt32(x);
If you're able to get them into object arrays, that works as well.
object[] obj = [...];
[...]
var x0 = Convert.To[...](obj[0]);
var x1 = Convert.To[...](obj[1]);
Etcetera.

fast/efficient way to run a query in Access based on datatable rows?

I have a datatable that may have 1000 or so rows in it. I need to go thru the datatable row by row, get the value of a column, run a query (Access 2007 DB) and update the datatable with the result. Here's what I have so far, which works:
String FilePath = "c:\\MyDB.accdb";
string QueryString = "SELECT MDDB.NDC, MDDB.NDC_DESC "
+ "FROM MDDB_MASTER AS MDDB WHERE MDDB.NDC = @NDC";
OleDbConnection strAccessConn = new OleDbConnection(string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + FilePath));
strAccessConn.Open();
OleDbDataReader reader = null;
int rowcount = InputTable.Rows.Count; //InputTable is the datatable
int count = 0;
while (count < rowcount)
{
string NDC = InputTable.Rows[count]["NDC"].ToString();
//NDC is a column in InputTable
OleDbCommand cmd = new OleDbCommand(QueryString, strAccessConn);
cmd.Parameters.Add("@NDC", OleDbType.VarChar).Value = NDC;
reader = cmd.ExecuteReader();
while (reader.Read())
{
//update the NDCDESC column with the query result
//the query should only return 1 line
dataSet1.Tables["InputTable"].Rows[count]["NDCDESC"] = reader.GetValue(1).ToString();
}
dataGridView1.Refresh();
count++;
}
strAccessConn.Close();
However this seems very inefficient since the query needs to run one time for each row in the datatable. Is there a better way?
You're thinking of an update query. You don't actually have to go over every row one by one. SQL is a set based language, so you only have to write a single statement that it should do for all rows.
Do this:
1) Create > Query Design
2) Close the dialog that selects tables
3) Make sure you're in sql mode (top left corner)
4) Paste this:
UPDATE INPUTTABLE
INNER JOIN MDDB_MASTER ON INPUTTABLE.NDC = MDDB_MASTER.NDC
SET INPUTTABLE.NDCDESC = [MDDB_MASTER].[NDC_DESC];
5) Switch to design mode to see what happens. You may have to correct the input table's name; I couldn't find it. I'm assuming they're both in the same database.
You'll see the query type is now an update query.
You can set this text as the command's CommandText and run it through cmd.ExecuteNonQuery(), and the whole thing should run very quickly. If it doesn't, you'll need an index on one of the tables.
This works by joining the two tables on NDC and then copying NDC_DESC over from MDDB_MASTER to the input table.
I missed the part about InputTable coming from Excel.
For better speed, instead of executing the query in Access over and over, you can get all rows from MDDB_MASTER into a datatable in one select statement:
SELECT MDDB.NDC, MDDB.NDC_DESC FROM MDDB_MASTER
And then use the DataTable.Select method to filter the right row.
mddb_master.Select("NDC = '" + NDC + "'")
This will be done in memory and should be much faster than all the round trips you have now. Especially over the network these round trips are expensive. 225k rows should be only a few MB (roughly a JPEG image) so that shouldn't be an issue.
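Put together, the in-memory lookup might look like the following sketch. It assumes both tables are already loaded as DataTables with string columns named as in the question (NDC and NDCDESC in the input table, NDC and NDC_DESC in the master table); `NdcLookup.FillDescriptions` is a hypothetical helper name.

```csharp
using System;
using System.Data;

public static class NdcLookup
{
    // Copies NDC_DESC from the in-memory master table into the NDCDESC
    // column of each input row, matching on NDC. No database round trips.
    public static void FillDescriptions(DataTable inputTable, DataTable mddbMaster)
    {
        foreach (DataRow row in inputTable.Rows)
        {
            // Double any single quotes so the value is a valid filter literal.
            string ndc = row["NDC"].ToString().Replace("'", "''");

            DataRow[] matches = mddbMaster.Select("NDC = '" + ndc + "'");
            if (matches.Length > 0)
                row["NDCDESC"] = matches[0]["NDC_DESC"].ToString();
        }
    }
}
```

Rows without a match are left untouched. If the number of lookups grows, a Dictionary keyed on NDC built once from the master table is a common alternative to calling Select per row.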
You could use the "IN" clause to build a bigger query such as:
string QueryString = "SELECT MDDB.NDC, MDDB.NDC_DESC "
+ "FROM MDDB_MASTER AS MDDB WHERE MDDB.NDC IN (";
int rowcount = InputTable.Rows.Count; //InputTable is the datatable
int count = 0;
while (count < rowcount)
{
string NDC = InputTable.Rows[count]["NDC"].ToString();
QueryString += (count == 0 ? "" : ",") + "'" + NDC + "'";
count++; //move to the next row, otherwise the loop never ends
}
QueryString += ")";
You can optimize that with StringBuilders since that could be a lot of strings but that's a job for you. :)
Then in a single query, you would get all the NDC descriptions you need and avoid performing 1000 queries. You would then roll through the reader, find values in the InputTable, and update them. Of course, in this case, you're looping through the InputTable multiple times, but it might be a better option, especially if your InputTable could hold duplicate NDC values.
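As a variation on the IN approach, the query text can be built with a StringBuilder and positional `?` placeholders (the style the OLE DB provider uses for parameters) instead of concatenating the values themselves. This is a sketch only; `InClauseBuilder.Build` is a hypothetical helper, and it returns the distinct values so the caller can add one parameter per placeholder in the same order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class InClauseBuilder
{
    // Builds "... WHERE MDDB.NDC IN (?, ?, ...)" with one positional
    // placeholder per distinct NDC, and returns the values in placeholder order.
    public static (string Sql, List<string> Values) Build(IEnumerable<string> ndcs)
    {
        List<string> values = ndcs.Distinct().ToList();
        if (values.Count == 0)
            throw new ArgumentException("At least one NDC is required.", nameof(ndcs));

        var sql = new StringBuilder(
            "SELECT MDDB.NDC, MDDB.NDC_DESC FROM MDDB_MASTER AS MDDB WHERE MDDB.NDC IN (");
        for (int i = 0; i < values.Count; i++)
        {
            if (i > 0) sql.Append(", ");
            sql.Append('?');
        }
        sql.Append(')');

        return (sql.ToString(), values);
    }
}
```

The caller would assign `Sql` to the command text and add one parameter per entry of `Values`, in order, before executing the reader once. Removing duplicates up front keeps the list as short as possible.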
Also, note that you have a OleDbDataReader leak in your code. You keep reassigning the reader reference to a new instance of a reader before disposing of the old reader. Same with commands. You keep instantiating a new command but are not disposing of it properly.

Sorting the result of a stored procedure

I'd like to sort on a column in the result of a stored procedure without having to add the Order By clause in the stored procedure. I don't want the data to be sorted after I have executed the query, sorting should be part of the query if possible. I have the following code:
public static DataTable RunReport(ReportQuery query)
{
OffertaDataContext db = new OffertaDataContext();
Report report = (from r in db.Reports where r.Id == (int)query.ReportId select r).Single();
//???: check security clearance.
DataSet dataSet = new DataSet();
/*
doesn't work, I guess the "Result" table hasn't been created yet;
if(!string.IsNullOrEmpty(query.SortField))
{
dataSet.DefaultViewManager.DataViewSettings["Result"].Sort = query.SortField + " " + (query.SortAscending ? "ASC" : "DESC");
}
*/
using (SqlConnection conn = new SqlConnection(Config.ConnectionString))
{
conn.Open();
using (SqlCommand exec = conn.CreateCommand())
{
using (SqlDataAdapter adapter = new SqlDataAdapter())
{
exec.Connection = conn;
exec.CommandType = CommandType.StoredProcedure;
exec.CommandText = report.ReportProc;
adapter.SelectCommand = exec;
try
{
adapter.Fill(dataSet, query.Skip, query.Take, "Result");
}
catch (Exception e)
{
throw e;
}
finally
{
conn.Close();
}
return dataSet.Tables["Result"];
}
}
}
}
How do I add sorting?
Get the DataTable you are populating in the dataSet ("Result").
There's no way to sort the DataTable itself, except via the query, view, or stored procedure that populates it.
Since you don't want to do it in the SP, you can sort the DefaultView of the DataTable, or any DataView that is associated with the DataTable.
You can achieve it using the Sort property of the DataView. This is a string which specifies the column (or columns) to sort on, and the order (ASC or DESC).
Example:
myTable.DefaultView.Sort = "myColumn DESC";
You can now use the DefaultView to do whatever you want (bind it to something or whatever)
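Applied to a filled table, the DefaultView approach might look like this sketch. `ResultSorting.Sort` is a hypothetical helper; its `sortField` and `ascending` arguments mirror the SortField and SortAscending properties used in the question.

```csharp
using System;
using System.Data;

public static class ResultSorting
{
    // Returns a sorted copy of the table, built from its DefaultView.
    public static DataTable Sort(DataTable table, string sortField, bool ascending)
    {
        if (!table.Columns.Contains(sortField))
            throw new ArgumentException("Unknown sort column: " + sortField, nameof(sortField));

        // Same expression format as in the example above: "column ASC|DESC".
        table.DefaultView.Sort = sortField + (ascending ? " ASC" : " DESC");

        // ToTable() materialises the rows in the view's order.
        return table.DefaultView.ToTable();
    }
}
```

This has to run after Fill, once the "Result" table exists. Keep in mind that it only sorts the rows that were actually fetched, so when combined with the Skip/Take arguments of Fill it orders a single page rather than the full result.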
To be honest, since you are using DataTable, you might as well just sort at the client.
Dynamic sorting (at the server) via SPs etc is always a pain; to do it in pure TSQL, you either need some horribly inefficient CASE block at the end of the SELECT, or you need to use dynamic SQL (for example via sp_ExecuteSQL), manipulating the ORDER BY in the final query. The only other option (in raw TSQL) would be to EXEC/INTO to get the data into a table variable (or temp table), then SELECT from this with an ORDER BY.
If it is an option, LINQ-to-SQL actually does OK at this; it supports querying (and composing against) UDFs - so rather than an SP, code the query in a UDF (the SP can always just SELECT from the UDF if you need to support legacy callers). Then you can use "order by" etc in a LINQ query:
var qry = from row in ctx.SomeMethod(args)
orderby row.Name, row.Key
select row;
(or there are various methods for adding a dynamic sort to a LINQ query - the above is just a simple example)
the final TSQL will be something like:
SELECT blah FROM theudf(args) ORDER BY blah
i.e. it will get it right, and do the "ORDER BY" at the server. This is particularly useful when used with Skip() and Take() to get paged data.
