I have to read in a large record set, process it, then write it out to a flat file.
The large result set comes from a Stored Proc in SQL 2000.
I currently have:
var results = session.CreateSQLQuery("exec usp_SalesExtract").List();
I would like to be able to read the result set row by row, to reduce the memory footprint.
Thanks
NHibernate is not designed for that usage. Plus, you're not really using its features.
So, in this case, it's better to use raw ADO.NET.
Why not just use SQL Server's bcp utility (http://msdn.microsoft.com/en-us/library/aa174646%28SQL.80%29.aspx) to write the file from the stored procedure? If you need to apply logic to the data, modify the procedure to do what you need.
NHibernate doesn't let you do this directly.
You can do it with an ADO.NET SqlDataReader, using the session.Connection property:
// session.Connection is an IDbConnection, so cast it to use the SqlClient types.
using (SqlCommand MyCommand = new SqlCommand("usp_SalesExtract", (SqlConnection)session.Connection))
{
MyCommand.CommandType = CommandType.StoredProcedure;
using (SqlDataReader MyDataReader = MyCommand.ExecuteReader())
{
while (MyDataReader.Read())
{
// handle row data (MyDataReader[0] ...)
}
}
}
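To make the "handle row data" step concrete, here is a minimal sketch of writing each row to a flat file as it is read, so only the current row is held in memory. The FlatFileExport class, the WriteRows name and the "|" delimiter are my own assumptions, not anything from NHibernate or the question; it takes any IDataReader, so it can be tried without a database.

```csharp
using System;
using System.Data;
using System.IO;

public static class FlatFileExport
{
    // Hypothetical helper: writes every row of the reader as one delimited line.
    // Only the current row's values are buffered; nothing is accumulated in a list.
    public static int WriteRows(IDataReader reader, TextWriter writer, string delimiter = "|")
    {
        var values = new object[reader.FieldCount];
        var count = 0;
        while (reader.Read())
        {
            reader.GetValues(values);                         // copy the current row
            writer.WriteLine(string.Join(delimiter, values)); // DBNull is written as an empty field
            count++;
        }
        return count; // number of rows written
    }
}
```

In the loop above you would pass MyDataReader and a StreamWriter opened on the output file. Real data probably needs escaping of the delimiter and culture-invariant formatting of numbers and dates, which this sketch leaves out.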
If you can query the data with LINQ to NHibernate, you can stream the results with the following extension method (pass the ISessionImplementor in if you don't like the reflection hack):
public static EnumerableImpl Stream<T>(this IQueryable<T> source)
{
var provider = ((NhQueryable<T>) source).Provider as DefaultQueryProvider;
var sessionImpl = (ISessionImplementor)provider.GetType()
.GetProperty("Session", BindingFlags.NonPublic |
BindingFlags.Instance).GetValue(provider);
var expression = new NhLinqExpression(source.Expression, sessionImpl.Factory);
var query = sessionImpl.CreateQuery(expression);
query.SetParameters(expression.ParameterValuesByName);
provider.SetResultTransformerAndAdditionalCriteria(
query, expression, expression.ParameterValuesByName);
return (EnumerableImpl)((AbstractQueryImpl2)query).Enumerable();
}
private static void SetParameters(this IQuery query,
IDictionary<string, Tuple<object, IType>> parameters)
{
foreach (var parameterName in query.NamedParameters)
{
var param = parameters[parameterName];
if (param.Item1 == null)
{
if (typeof(IEnumerable).IsAssignableFrom(param.Item2.ReturnedClass) &&
param.Item2.ReturnedClass != typeof(string))
query.SetParameterList(parameterName, null, param.Item2);
else query.SetParameter(parameterName, null, param.Item2);
}
else
{
if (param.Item1 is IEnumerable && !(param.Item1 is string))
query.SetParameterList(parameterName, (IEnumerable)param.Item1);
else if (param.Item2 != null)
query.SetParameter(parameterName, param.Item1, param.Item2);
else query.SetParameter(parameterName, param.Item1);
}
}
}
You'll need to wrap it in a using statement to ensure the reader is closed:
using (var results = session.Query<Fark>().Take(50).Where(x => x.Enabled).Stream())
{
results.ForEach(x => writer.WriteLine(x.ToCsv()));
}
I have a fairly agnostic ADO.NET application that connects to a number of databases and is able to extract the necessary information to run. I have hit a snag with DB2 and how it handles named parameters, particularly when I reuse a named parameter in the same query. I know of a couple of ways to get around this by simply adding more parameters, but in theory it should work as it does on other databases that I connect to as the parameter name is the same.
What I'm doing is a bit more complicated and involves subqueries etc, but to demonstrate, take the following query:
select value from test.table where cola=#key1 and colb=#key1;
The named parameter #key1 is used twice.
My code is as follows:
try
{
DbProviderFactory dbfFactory = DbProviderFactories.GetFactory("IBM.Data.DB2.iSeries");
using (DbConnection dbConnection = dbfFactory.CreateConnection())
{
dbConnection.ConnectionString = "DataSource=xxx.xxx.xxx.xxx;UserID=xxxxxxxx;password=xxxxxxxxx";
using (DbCommand dbCommand = dbConnection.CreateCommand())
{
IDbDataParameter iddpParameter1 = dbCommand.CreateParameter();
iddpParameter1.ParameterName = "#key1";
iddpParameter1.DbType = DbType.String;
iddpParameter1.Value = "1";
dbCommand.Parameters.Add(iddpParameter1);
dbCommand.CommandType = CommandType.Text;
dbCommand.CommandText = "select value from test.table where cola=#key1 and colb=#key1";
dbConnection.Open();
using (IDataReader idrReader = dbCommand.ExecuteReader())
{
while (idrReader.Read())
{
...
}
}
}
} // end dbConnection
} // end try
catch (Exception ex)
{
Console.Write(ex.Message);
}
When I run this I get an exception that tells me:
System.InvalidOperationException: Not enough parameters specified. The command requires 2 parameter(s), but only 1 parameter(s) exist in the parameter collection.
I get what it is telling me, but I'm looking for help in figuring out how I can have the provider use the named parameter for both parameters as they are the same. It seems that it is doing a blind count of named parameters and not realizing that they are the same named parameters. SQL Server seems to allow me to do this with the same code above. I'm guessing it's just one of those differences in the providers, but hoping someone has run into this and has a solution for DB2 that doesn't get into specific DB2 code.
Thanks, appreciate the assistance.
Well, I did a little more digging, and I wonder if it might be the connector that you are using. I'm doing the following, which is very similar to what you are doing.
In my app config file I have:
<connectionStrings>
<add name="AWOLNATION" providerName="Ibm.Data.DB2" connectionString="Server=sail:50000;Database=Remix;" />
</connectionStrings>
In my DatabaseManager class I initialize it like so:
public static DatabaseManager Instance(string connectionStringName)
{
var connectionStringSettings = ConfigurationManager.ConnectionStrings[connectionStringName];
if (connectionStringSettings == null) throw new MissingMemberException("[app.config]", string.Format("ConnectionStrings[{0}]", connectionStringName));
return new DatabaseManager(connectionStringSettings);
}
private DatabaseManager(ConnectionStringSettings connectionInformation)
{
_connectionInformation = connectionInformation;
_parameters = new Dictionary<string, object>();
}
private void Initialize()
{
_connection = DbProviderFactories.GetFactory(_connectionInformation.ProviderName).CreateConnection();
_connection.ConnectionString = _connectionInformation.ConnectionString;
_command = _connection.CreateCommand();
}
I add parameters a little differently, though. I have a Dictionary<string,object> that I add to when setting up my query. To use your example, I would have had this:
public IEnumerable<object> GetSomething(string key)
{
var sql = "select value from test.table where cola = #key1 and colb = #key1";
_manager.AddParameter("#key1", key);
return _manager.ExecuteReader<object>(sql, ToSomethignUseful);
}
private object ToSomethignUseful(DatabaseManager arg)
{
return new { Value = arg.GetArgument<object>("value") };
}
Reading is where the OP and I have similar code:
public IEnumerable<T> ExecuteReader<T>(string sql, Func<DatabaseManager, T> conversionBlock)
{
Initialize();
using (_connection)
{
_connection.Open();
_command.CommandText = sql;
_command.CommandType = CommandType.Text;
if (_parameters.Count > 0)
AddParameters(_command, _parameters);
_parameters.Clear();
using (_reader = _command.ExecuteReader())
{
while (_reader.Read())
{
yield return conversionBlock(this);
}
}
}
}
private static void AddParameters(DbCommand command, Dictionary<string, object> parameters)
{
foreach (var param in parameters)
{
command.Parameters.Add(CreateParameter(command, param.Key, param.Value));
}
}
private static DbParameter CreateParameter(DbCommand command, string key, object value)
{
var parameter = command.CreateParameter();
parameter.ParameterName = key;
parameter.Value = value;
return parameter;
}
Running this code works for me, so I wonder if the difference is in the provider that we are using. I'm using named parameters in production and have been for at least a year now, possibly closer to two.
I will say that I did get the same error when essentially running the same query twice, as shown in this code:
public static void IndendedPrintForEach<T>(this IEnumerable<T> array, string header, Func<T, string> consoleStringConverterMethod)
{
var list = array.ToList();
var color = Console.ForegroundColor;
Console.ForegroundColor = ConsoleColor.Magenta;
Console.WriteLine($"<<<{header}>>>");
Console.ForegroundColor = color;
if (!list.Any())
{
Console.ForegroundColor = ConsoleColor.DarkRed;
Console.WriteLine(" ************NoItemsFound************");
Console.ForegroundColor = color;
}
else
{
foreach (var item in list)
Console.WriteLine($" {consoleStringConverterMethod(item)}");
}
}
The statement var list = array.ToList() on line 3 was what fixed the problem you are seeing for me. Before, I had if (!array.Any()), which ran the query and consumed the parameters (I clear them out before I execute the query). When I then enumerated the sequence to print each item, the query ran a second time, now with no parameters, and I got the error. The fix was to materialize the query once with ToList() and then do my checking and printing on the list.
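The re-running behaviour is easy to reproduce without a database. This is only a sketch: RunQuery is a made-up stand-in for an iterator method like ExecuteReader<T> above, and the Executions counter is there purely to show how many times the body runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the "query" body actually starts executing.
    public static int Executions;

    // Hypothetical stand-in for ExecuteReader<T>: because it uses yield return,
    // nothing runs until the result is enumerated, and it runs again on every enumeration.
    public static IEnumerable<int> RunQuery()
    {
        Executions++;
        yield return 1;
        yield return 2;
    }
}
```

Calling Any() and then iterating the same IEnumerable starts the body twice, while calling ToList() first runs it once and every later check works on the in-memory list.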
You answered your own question: "Unfortunately, I have not found a solution. I had to create another named parameter and just assign it the same value"
Oracle/DB2/Sybase especially are difficult with SQL queries and parameters.
Parameters should appear in the SQL query in the same order in which they are added to the C# SQL command (Oracle, Sybase)
Put parentheses around the parts of the SQL query's where clause that use parameters (all)
Make sure the SQL data types are matching the C# parameter data types (all)
Check for overflow/underflow of parameter data so that the SQL query does not error
Pass in null/empty string in the appropriate format for the database. Ideally, write C# helper methods that create each SQL parameter in the correct format for the database
Oracle is particularly finicky about this. Take the time to build a C# wrapper library that constructs the query object correctly, constructs the parameters correctly, and adds the parameters to the query.
Put in notes that the order in which parameters are added should match the order of the "#" parameters in the SQL query.
This wrapper library is your documentation, for you and the next developer, to avoid the problems you've encountered.
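As one possible shape for such a wrapper, here is a sketch of the "create another named parameter with the same value" workaround done automatically: every repeated occurrence of a parameter gets its own uniquely named placeholder, and the name/value pairs come back in the order they appear in the SQL. RepeatedParameters, Expand and the "_1" suffix scheme are my own inventions, not part of any provider, and the regex is naive: it would also rewrite matches inside string literals or comments.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedParameters
{
    // Hypothetical helper. Rewrites the SQL so each occurrence of a known parameter
    // has a unique name, and appends (name, value) pairs to 'expanded' in SQL order.
    // 'prefix' is the parameter marker character used in the SQL text.
    public static string Expand(string sql, IDictionary<string, object> values,
        List<KeyValuePair<string, object>> expanded, char prefix)
    {
        var seen = new Dictionary<string, int>();
        var pattern = Regex.Escape(prefix.ToString()) + @"\w+";
        return Regex.Replace(sql, pattern, match =>
        {
            var name = match.Value;
            if (!values.ContainsKey(name)) return name; // not a known parameter; leave it alone

            int n;
            seen.TryGetValue(name, out n);               // n stays 0 on first sight
            seen[name] = n + 1;

            // First occurrence keeps its name; later ones get _1, _2, ...
            var unique = n == 0 ? name : name + "_" + n;
            expanded.Add(new KeyValuePair<string, object>(unique, values[name]));
            return unique;
        });
    }
}
```

With the query from the question as written, Expand(sql, values, expanded, '#') yields "... cola=#key1 and colb=#key1_1" plus two parameters carrying the same value, which you would then add to the command in that order. It does not check whether a generated name collides with a parameter you already have.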
I'm looking for the absolute easiest way to call stored procedures from C# without explicit parameter objects, like so:
using (DataTable dt=conn.ExecuteQuery("MySP", param1, "param2", p3, p4)) {
On first invocation the library queries the DB schema for the SP signature then caches it for subsequent calls.
A) Is there any way to do it THIS SIMPLY with the Enterprise Library Data Access Block?
B) I don't find ORMs attractive because of synchronization issues between schema and code metadata.
I DID find this generator-less wrapper but am hoping there is a major library or best practice I somehow just haven't discovered yet.
I do have an example using plain ADO.NET, where the call is
ExecuteNonQuery("dbo.[Sp_Skp_UpdateFuncties]", parameters);
This is in a DataBaseManager class, which holds the database connection string:
public class DataBaseManager
{
...
public int ExecuteStoredProcedure(string storedprocedureNaam, IEnumerable<KeyValuePair<string, object>> parameters)
{
var sqlCommand = new SqlCommand
{
Connection = DatabaseConnectie.SqlConnection,
CommandType = CommandType.StoredProcedure,
CommandText = storedprocedureNaam,
};
foreach (KeyValuePair<string, object> keyValuePair in parameters)
{
sqlCommand.Parameters.Add(
new SqlParameter { ParameterName = "@" + keyValuePair.Key, Value = keyValuePair.Value ?? DBNull.Value }
);
}
if (sqlCommand == null)
throw new KoppelingException("Stored procedure ({0}) aanroepen lukt niet", storedprocedureNaam);
return sqlCommand.ExecuteNonQuery();
}
....
}
Dapper?
var rows = conn.Query("procname",
new { name = "abc", id = 123 }, // <=== args, fully named and typed
commandType: CommandType.StoredProcedure
).ToList();
The above is the dynamic API which allows automatic binding to column names:
foreach(var row in rows) {
int x = row.X; // look ma, no column mappings
string y = row.Y;
//...
}
But you can also use Query<SomeType> and it will populate an object model for you. When binding to an object model it includes all the meta-programming / caching you might expect from people obsessive-compulsive about performance. Hint: I usually use the generic API - it is very very fast.
TL;DR I'm using EntityFramework 5.0 with Oracle and need to query a table for two columns only using index with NVL of two columns.
Details, after hours of attempts... I'll try to organize this as best I can.
The desired SQL query should be:
SELECT t.Code, NVL(t.Local, t.Global) Description
FROM Shows t
Where t.Code = 123
So what is the problem? If I want to use Context.Shows.Parts.SqlQuery(query) I must return the whole row (*), but then I get Table Access Full, so I must return only the desired columns.
The next thing I tried (there were actually a lot of attempts before this one), which gives very close results, was the null-coalescing operator (??):
Context.Shows.Where(x => x.Code == 123)
.Select(x => new { x.Code, Description = x.Local ?? x.Global });
But the SQL it generates is complicated, uses CASE WHEN, and does not use my index on Code, Nvl(Local, Global), which is critical!
My next step was using Database.SqlQuery
context.Database.SqlQuery<Tuple<int, string>>("the Raw-SQLQuery above");
But I get an error that Tuple must not be abstract and must have a default constructor (it doesn't have one).
The final step, which I dislike, is creating a class that has only those two properties (Code, Description). Now it works great, but I don't want to write a class for each query like that.
Ideas?
This is a no-solution answer.
I think whatever you try, you can't do that. Even if you define your own mutable generic Tuple, it will fail, since the name of each property must match the name of a column:
SqlQuery(String, Object[]): Creates a raw SQL query that will
return elements of the given generic type. The type can be any type
that has properties that match the names of the columns returned from
the query, or can be a simple primitive type.
I think the best you can do is create your own generic method for querying the database via the classic Command and ExecuteReader pattern. Untested, but you get the idea:
public static IEnumerable<Tuple<T>> SqlQuery<T>(this DbContext context, string sql)
{
// SqlConnection is SQL Server specific; for another database, swap in that provider's connection and command types.
using (var connection = new SqlConnection(context.Database.Connection.ConnectionString))
using (var command = new SqlCommand(sql, connection))
{
connection.Open();
using (var reader = command.ExecuteReader())
{
// Read() advances row by row; NextResult() would skip to the next result set instead.
while (reader.Read())
{
yield return new Tuple<T>((T)reader[0]);
}
}
}
}
public static IEnumerable<Tuple<T1, T2>> SqlQuery<T1, T2>(this DbContext context, string sql)
{
using (var connection = new SqlConnection(context.Database.Connection.ConnectionString))
using (var command = new SqlCommand(sql, connection))
{
connection.Open();
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
yield return new Tuple<T1, T2>((T1)reader[0], (T2)reader[1]);
}
}
}
}
I have the following class which I am using to read in large amounts of data from an Access database.
public class ConnectToAccess
{
private readonly string _connectionString;
public ConnectToAccess(String connectionString)
{
_connectionString = connectionString;
}
public List<String> GetData(String sql)
{
var data = new List<String>();
using (var connection = new OleDbConnection(_connectionString))
{
using (var command = connection.CreateCommand())
{
command.CommandText = sql;
command.CommandType = CommandType.Text;
connection.Open();
using (var reader = command.ExecuteReader())
{
if (reader != null && reader.HasRows)
while (reader.Read())
{
data.Add(reader["First Name"] + " " + reader["Last Name"]);
}
}
}
}
return data;
}
}
As is, this code is working and is successfully pulling data in from the database. However, I would like to enhance the GetData() method to make it more dynamic. I would like it to somehow return a list of anonymous objects, where each object has properties relating to the columns of the dataset returned.
I've been coding in .Net for a while, but I'm still rather new at many concepts. I'm not quite sure how to create this list of anonymous objects that mirror the columns in the dataset most effectively. I'm also not sure what return type I would use in this case, I'm thinking maybe List. Then I suppose I would need to use reflection to pull the data out of those anonymous objects and transfer it into where it needs to go.
If anyone can help me with any significant part of this puzzle, I would be most obliged.
You can't have an anonymous type as a return type.
Why not just return a DataTable? You can even use a DataAdapter to make the process a lot easier. It also gets you the schema.
If you insist on getting objects for everything:
public IEnumerable<T> GetData<T>(String sql, Func<IDataReader, T> selector)
{
//code elided
while (reader.Read())
{
yield return selector(reader);
}
}
Now you can use it with a selector:
var people = GetData("Select * from People", reader => new Person { Name = (string)reader["Name"], Age = (int)reader["Age"] });
people.Take(5); //first five records only
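If the goal is rows whose "properties" mirror whatever columns the query returns, another option is one dictionary per row, keyed by column name. This is just a sketch under my own naming (RowReader and ReadRows are not from the question); it also copes with column names such as "First Name" that could not be property names anyway.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowReader
{
    // Hypothetical helper: yields one dictionary per row, keyed by column name
    // (case-insensitive). Database NULLs are mapped to C# null.
    public static IEnumerable<Dictionary<string, object>> ReadRows(IDataReader reader)
    {
        while (reader.Read())
        {
            var row = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);
            for (var i = 0; i < reader.FieldCount; i++)
            {
                row[reader.GetName(i)] = reader.IsDBNull(i) ? null : reader.GetValue(i);
            }
            yield return row;
        }
    }
}
```

Inside GetData you would enumerate ReadRows(reader) while the reader is still open, for example by calling ToList() before leaving the using block, and read values with row["First Name"]. No reflection is needed to get the data back out.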
I have 1M HTML files that I need to parse, and then insert the extracted information into my SQL Server database. The information parsed out of each file ends up in multiple tables because of the relationships among the objects I have parsed out.
I am using Entity Framework to do this right now, but adding each piece of information to the proper object on the EF context takes a long time and is not efficient! I need this to be faster, especially since I have so many files to process.
What is the fastest way to parse a lot of files in parallel and insert the results into SQL Server when the items you are adding have relationships?
Also, is there a better technology for this? Like Informatica?
I think SqlBulkCopy Class will be the best option in this case.
You can make a generic wrapper around the SqlBulkCopy class, which will allow you to use SqlBulkCopy on any entity. Below is the wrapper for LINQ-to-SQL, but the same idea will work with Entity Framework, assuming your entities map to tables one-to-one.
public void BulkInsert<TBusinessObject>(IEnumerable<TBusinessObject> entities, int timeoutInSeconds)
where TBusinessObject : class, IBusinessObject
{
AssertUtilities.ArgumentAllNotNull(entities, "entities");
AssertUtilities.ArgumentNotNegative(timeoutInSeconds, "timeoutInSeconds");
var metaTable = Mapping.GetTable(typeof(TBusinessObject));
if (metaTable == null)
throw new DataAccessException("MetaTable is not found.");
var insertDataMembers = metaTable.RowType.PersistentDataMembers
.Where(arg => !arg.IsDbGenerated)
.OrderBy(arg => arg.Ordinal)
.ToList();
using (var dataTable = new DataTable())
{
dataTable.Locale = CultureInfo.InvariantCulture;
var dataColumns = insertDataMembers
.Select(arg => new DataColumn(arg.MappedName))
.ToArray();
dataTable.Columns.AddRange(dataColumns);
foreach (var entity in entities)
{
var itemArray = insertDataMembers
.Select(arg => arg.StorageAccessor.GetBoxedValue(entity))
.ToArray();
dataTable.Rows.Add(itemArray);
}
try
{
if (Connection.State != ConnectionState.Open)
Connection.Open();
var sqlConnection = (SqlConnection)Connection;
var sqlTransaction = (SqlTransaction)Transaction;
using (var bulkCopy = new SqlBulkCopy(sqlConnection, SqlBulkCopyOptions.Default, sqlTransaction))
{
bulkCopy.BulkCopyTimeout = timeoutInSeconds;
bulkCopy.DestinationTableName = metaTable.TableName;
foreach (var dataColumn in dataColumns)
bulkCopy.ColumnMappings.Add(dataColumn.ColumnName, dataColumn.ColumnName);
bulkCopy.WriteToServer(dataTable);
}
}
catch (Exception exception)
{
throw DataAccessExceptionTranslator.Translate(exception);
}
}
}
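For Entity Framework there is no LINQ-to-SQL MetaTable to read the columns from, so one rough substitute is to build the DataTable from the entity's public properties by reflection. This is a sketch only: DataTableBuilder and ToDataTable are my own names, and it assumes every readable public property is a plain column whose name matches the destination table, so navigation properties and database-generated columns would have to be filtered out first.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Reflection;

public static class DataTableBuilder
{
    // Hypothetical helper: one column per readable public instance property,
    // one row per item. Nullable<T> properties become columns of type T.
    public static DataTable ToDataTable<T>(IEnumerable<T> items)
    {
        var props = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .ToArray();

        var table = new DataTable();
        table.Locale = System.Globalization.CultureInfo.InvariantCulture;
        foreach (var p in props)
        {
            table.Columns.Add(p.Name, Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType);
        }

        foreach (var item in items)
        {
            // null property values are stored as DBNull, as DataTable expects
            table.Rows.Add(props.Select(p => p.GetValue(item, null) ?? DBNull.Value).ToArray());
        }
        return table;
    }
}
```

The resulting table can be handed to SqlBulkCopy.WriteToServer with the same column mappings loop as in the wrapper above. For related tables you would still need to insert parents first and carry their keys into the child rows yourself.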