Associative Array/Hash/Hashtable using Connector/NET - c#

I am working with ASP.NET and C#, using MySQL's Connector/NET provider to connect to a MySQL database (no surprises there).
That works fine: I can connect, run queries and so on. But is it possible to return the results as a Hashtable or similar? That would save running a DESCRIBE against the same table to get the column names and using those values to build the hash each time.

The MySQL C/C++ connector, which I assume Connector/NET wraps (rather than re-implementing it in C#), returns a two-dimensional array containing the results. This is only the column and row data, not the column names. The API also returns a field (column name) value through mysql_fetch_field_direct(), a separate function call after obtaining the query results. This too is a two-dimensional array. The connector itself doesn't contain an API for merging the two separate results (column names plus column/row data) into a hash table.
Instead of making a second query to obtain the column names, all you need to do is call mysql_fetch_field_direct() for each column as you progress through assigning values. This gives you the field name along with the data contained in that column/row. At that point it's up to the developer how to arrange the data, such as storing it in a hash table.
I use a helper function as a wrapper around query execution that stores each row in a binary tree with the column name being the key and returns a linked list of trees for me to do with what I need.
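With Connector/NET itself you don't need the C API for any of this: the data reader already exposes column names through GetName(). Here's a minimal sketch of the idea (the helper name, query, and connection string are placeholders, not part of any answer above):
using System.Collections;
using System.Collections.Generic;
using MySql.Data.MySqlClient;

static List<Hashtable> QueryToHashtables(string connectionString, string sql)
{
    var rows = new List<Hashtable>();
    using (var conn = new MySqlConnection(connectionString))
    using (var cmd = new MySqlCommand(sql, conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // Key each value by its column name; no DESCRIBE needed.
                var row = new Hashtable(reader.FieldCount);
                for (int i = 0; i < reader.FieldCount; i++)
                    row[reader.GetName(i)] = reader.GetValue(i);
                rows.Add(row);
            }
        }
    }
    return rows;
}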

In .NET you typically work with DataTables and DataSets. A DataTable is made up of DataRows, which are very similar to hashtables, and in most cases you can use those directly. But if you need a Hashtable you can use this code:
public static Hashtable convertDataRowToHashTable(DataRow dr)
{
    if (dr == null)
    {
        return null;
    }
    Hashtable ret = new Hashtable(dr.Table.Columns.Count);
    for (int iColNr = 0; iColNr < dr.Table.Columns.Count; iColNr++)
    {
        ret[dr.Table.Columns[iColNr].ColumnName] = dr[iColNr];
    }
    return ret;
}
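For example, a hypothetical usage sketch, assuming the DataTable was filled through a MySqlDataAdapter (the query and connection string are placeholders):
// Fill a DataTable from MySQL, then convert each row to a Hashtable.
var table = new DataTable();
using (var adapter = new MySqlDataAdapter("SELECT * FROM customers", connectionString))
{
    adapter.Fill(table);
}
foreach (DataRow row in table.Rows)
{
    Hashtable hash = convertDataRowToHashTable(row);
    // hash["some_column"] now holds that row's value for some_column
}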
The other direction (Hashtable to DataRow) is not as easy, because DataRow does not have a public constructor (by design). You have to call newRow = myDataTable.NewRow(); to get a new row instance, and then you can work with the row almost like a hashtable:
newRow["column1"]="some value";
But if you need a new column, you have to add it to the DataTable, not to the DataRow: myTable.Columns.Add("name", typeof(string));
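A minimal sketch of that direction (this helper is invented for illustration, not part of the answer above):
using System;
using System.Collections;
using System.Data;

// Copy Hashtable entries into a new DataRow of an existing DataTable.
static DataRow ConvertHashtableToDataRow(Hashtable hash, DataTable table)
{
    // Columns are added to the DataTable, never to the DataRow itself.
    foreach (DictionaryEntry entry in hash)
    {
        string columnName = (string)entry.Key;
        if (!table.Columns.Contains(columnName))
            table.Columns.Add(columnName, entry.Value != null ? entry.Value.GetType() : typeof(object));
    }

    DataRow newRow = table.NewRow();   // rows can only be created by their table
    foreach (DictionaryEntry entry in hash)
        newRow[(string)entry.Key] = entry.Value ?? DBNull.Value;

    table.Rows.Add(newRow);
    return newRow;
}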
hope this helps

SqlBulkCopy WriteToServer with an IDataReader instead of DataTable and Programmatically Adjusted Field Values

We have working code in C# that uses SqlBulkCopy to insert records into a table from a stored procedure source. At a high level:
Reads data from a stored procedure that puts the records into a DataTable. Essentially it calls the SP and uses a data adapter to fill the DataTable. Let's call this srcDataTable.
Dynamically maps the column names between source and destination through configuration, a table that's similar to the following:
TargetTableName | ColumnFromSource  | ColumnInDestination | DefaultValue | Formatting
TableA          | StudentFirstName  | FirstName           | NULL         | NULL
TableA          | StudentLastName   | LastName            | NULL         | NULL
TableA          | Birthday          | Birthdate           | 1/1/1900     | dd/MM/yyyy
Based on the mapping from #2, sets up new rows from srcDataTable, using DataTable.NewRow(), in another DataTable that matches the structure of the destination table (the one ColumnInDestination refers to). Let's call this targetDataTable. As you can see from the table, there may be instances where the value from the source is not specified, or needs to be formatted a certain way. This is the primary reason we're adding data rows on the fly to another data table; the adjustment and defaulting of the values is handled in code.
Call SqlBulkCopy to write all the rows in targetDataTable to the actual SQL table.
This approach has been working all right in tandem with stored procedures that use FETCH and OFFSET so they only return X rows at a time, to deal with memory constraints. Unfortunately, as we get more and more data sources that are north of 50 million rows, and as we have to share servers, we need a faster way to do this while keeping memory consumption in check. Researching options, it seems that using an IDataReader with SqlBulkCopy would let us limit the code's memory consumption without having to delegate fetching X records at a time to the stored procedure itself.
In terms of preserving current functionality, it looks like we can use SqlBulkCopy's column mappings (SqlBulkCopyColumnMapping) to keep mapping the fields even if they're named differently. What I can't confirm, however, is the defaulting or formatting of the values.
Is there a way to extend the DataReader's Read() method so that we can introduce that same logic to revise whatever value will be written to the destination if there's configuration asking us to? So a) check if the current row has a value populated from the source, b) default its value to the destination table if configured, and c) apply formatting rules as it gets written to the destination table.
You appear to be asking "can I make my own class that implements IDataReader and has some altered logic to the Read() method?"
The answer is yes; you can write your own data reader that does whatever it likes in Read(), even format the server's hard disk as soon as it's called. When you implement an interface you aren't "extend[ing] the DataReader's Read method"; you're providing your own implementation that externally appears to obey a specific contract, but the implementation detail is entirely up to you. If you want, upon every read, to pull a row from database X into a temp array and zip through the array tweaking the values to apply a default or other adjustment before returning true, that's fine.
If you wanted to do the value adjustment in the GetXXX methods instead, that's also fine; you're writing the reader, so you decide. All the bulk copier is going to do is call Read() until it returns false and write the data it gets from, e.g., GetValue(). (If it wasn't immediately clear: Read() doesn't produce the data that will be written, GetValue() does. Read() is just an instruction to move to the next set of data to be written, and it doesn't even have to do that. You could implement Read() as { return DateTime.Now.DayOfWeek == DayOfWeek.Monday; } and GetValue() as { return Guid.NewGuid().ToString(); } and your copy operation would spend until 23:59:59.999 filling the database with GUIDs, but only on Mondays.)
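Purely as an illustration (this is not the poster's code, and every name in it is invented): a wrapping IDataReader that substitutes a default value for one column, while every other member simply delegates to the source reader.
using System;
using System.Data;

// Hypothetical sketch: wraps any IDataReader and substitutes a default value
// for one column whenever the source value is NULL.
public sealed class DefaultingDataReader : IDataReader
{
    private readonly IDataReader _inner;
    private readonly int _ordinal;     // column to default
    private readonly object _default;  // value used when the source is DBNull

    public DefaultingDataReader(IDataReader inner, int ordinal, object defaultValue)
    {
        _inner = inner;
        _ordinal = ordinal;
        _default = defaultValue;
    }

    // SqlBulkCopy calls Read() to advance, then GetValue() for each column.
    public bool Read() => _inner.Read();

    public object GetValue(int i) =>
        i == _ordinal && _inner.IsDBNull(i) ? _default : _inner.GetValue(i);

    public bool IsDBNull(int i) => i == _ordinal ? false : _inner.IsDBNull(i);

    // Everything else is a straight pass-through to the wrapped reader.
    public int FieldCount => _inner.FieldCount;
    public object this[int i] => GetValue(i);
    public object this[string name] => GetValue(GetOrdinal(name));
    public string GetName(int i) => _inner.GetName(i);
    public int GetOrdinal(string name) => _inner.GetOrdinal(name);
    public Type GetFieldType(int i) => _inner.GetFieldType(i);
    public string GetDataTypeName(int i) => _inner.GetDataTypeName(i);
    public DataTable GetSchemaTable() => _inner.GetSchemaTable();
    public int GetValues(object[] values)
    {
        int n = Math.Min(values.Length, FieldCount);
        for (int i = 0; i < n; i++) values[i] = GetValue(i);
        return n;
    }
    public bool GetBoolean(int i) => _inner.GetBoolean(i);
    public byte GetByte(int i) => _inner.GetByte(i);
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferoffset, int length)
        => _inner.GetBytes(i, fieldOffset, buffer, bufferoffset, length);
    public char GetChar(int i) => _inner.GetChar(i);
    public long GetChars(int i, long fieldoffset, char[] buffer, int bufferoffset, int length)
        => _inner.GetChars(i, fieldoffset, buffer, bufferoffset, length);
    public IDataReader GetData(int i) => _inner.GetData(i);
    public DateTime GetDateTime(int i) => _inner.GetDateTime(i);
    public decimal GetDecimal(int i) => _inner.GetDecimal(i);
    public double GetDouble(int i) => _inner.GetDouble(i);
    public float GetFloat(int i) => _inner.GetFloat(i);
    public Guid GetGuid(int i) => _inner.GetGuid(i);
    public short GetInt16(int i) => _inner.GetInt16(i);
    public int GetInt32(int i) => _inner.GetInt32(i);
    public long GetInt64(int i) => _inner.GetInt64(i);
    public string GetString(int i) => _inner.GetString(i);
    public int Depth => _inner.Depth;
    public bool IsClosed => _inner.IsClosed;
    public int RecordsAffected => _inner.RecordsAffected;
    public bool NextResult() => _inner.NextResult();
    public void Close() => _inner.Close();
    public void Dispose() => _inner.Dispose();
}
SqlBulkCopy could then consume it directly, e.g. bulkCopy.WriteToServer(new DefaultingDataReader(sourceReader, ordinal, "1900-01-01"));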
The question is a bit unclear. It looks like the actual question is whether it's possible to transform data before using SqlBulkCopy with a data reader.
There are a lot of ways to do it, and the appropriate one depends on how the rest of the ETL code works. Does it only work with data readers? Or does it load batches of rows that can be modified in memory?
Use IEnumerable<> and ObjectReader
FastMember's ObjectReader class creates an IDataReader wrapper over any IEnumerable<T> collection. This means that both strongly-typed .NET collections and iterator results can be sent to SqlBulkCopy.
IEnumerable<string> lines = File.ReadLines(filePath);
using (var bcp = new SqlBulkCopy(connection))
using (var reader = ObjectReader.Create(lines, "FileName"))
{
    bcp.DestinationTableName = "SomeTable";
    bcp.WriteToServer(reader);
}
It's possible to create a transformation pipeline using LINQ queries and iterator methods this way, and feed the result to SqlBulkCopy using ObjectReader. The code is a lot simpler than trying to create a custom IDataReader.
In this example, Dapper can be used to return query results as an IEnumerable<>:
IEnumerable<Order> orders = connection.Query<Order>(
    "select ... where category=@category",
    new { category = "Cars" });

var ordersWithDate = orders.Select(ord => new OrderWithDate {
    ....
    SaleDate = DateTime.Parse(ord.DateString, CultureInfo.GetCultureInfo("en-GB"))
});

using var reader = ObjectReader.Create(ordersWithDate, "Id", "SaleDate", ...);
Custom transforming data readers
It's also possible to create custom data readers by implementing the IDataReader interface. Libraries like ExcelDataReader and CsvHelper provide such wrappers over their results. CsvHelper's CsvDataReader creates an IDataReader wrapper over the parsed CSV results. The downside is that IDataReader has a lot of methods to implement. GetSchemaTable will have to be implemented to provide column and type information to later transformation steps and to SqlBulkCopy.
An IDataReader may be dynamic, but it requires adding a lot of hand-coded type information to work. In CsvDataReader most methods just forward the call to the underlying CsvReader, e.g.:
public long GetInt64(int i)
{
    return csv.GetField<long>(i);
}

public string GetName(int i)
{
    return csv.Configuration.HasHeaderRecord
        ? csv.HeaderRecord[i]
        : string.Empty;
}
But GetSchemaTable() is 70 lines, with defaults that aren't optimal. Why use string as the column type when the parser can already parse date and numeric data, for example?
One way to get around this is to create a new custom IDataReader using a copy of the previous reader's schema table and adding the extra columns. CsvDataReader's constructor accepts a DataTable schemaTable parameter to handle cases where its own GetSchemaTable isn't good enough. That DataTable could be modified to add extra columns:
/// <param name="csv">The CSV.</param>
/// <param name="schemaTable">The DataTable representing the file schema.</param>
public CsvDataReader(CsvReader csv, DataTable schemaTable = null)
{
    this.csv = csv;
    csv.Read();
    if (csv.Configuration.HasHeaderRecord)
    {
        csv.ReadHeader();
    }
    else
    {
        skipNextRead = true;
    }
    this.schemaTable = schemaTable ?? GetSchemaTable();
}
A DerivedColumnReader could be created that does just that in its constructor:
public DerivedColumnReader(string sourceName, string targetName,
                           Func<TSource, TResult> func, DataTable schemaTable)
{
    ...
    AddSchemaColumn(schemaTable, targetName);
    _schemaTable = schemaTable;
}
void AddSchemaColumn(DataTable dt, string targetName)
{
    var row = dt.NewRow();
    row["AllowDBNull"] = true;
    row["BaseColumnName"] = targetName;
    row["ColumnName"] = targetName;
    row["ColumnMapping"] = MappingType.Element;
    row["ColumnOrdinal"] = dt.Rows.Count + 1;
    row["DataType"] = typeof(TResult);
    //20-30 more properties
    dt.Rows.Add(row);
}
That's a lot of boilerplate that's eliminated with LINQ.
Just providing closure to this. The main question really was how to avoid running into out-of-memory exceptions when fetching data from SQL without employing FETCH and OFFSET in the stored procedure. The resolution didn't require getting fancy with a custom reader similar to SqlDataReader; it just required counting rows and calling SqlBulkCopy in batches. The code is similar to what's written below:
using (var dataReader = sqlCmd.ExecuteReader(CommandBehavior.SequentialAccess))
{
    int rowCount = 0;
    while (dataReader.Read())
    {
        DataRow dataRow = SourceDataSet.Tables[source.ObjectName].NewRow();
        for (int i = 0; i < SourceDataSet.Tables[source.ObjectName].Columns.Count; i++)
        {
            dataRow[SourceDataSet.Tables[source.ObjectName].Columns[i]] = dataReader[i];
        }
        SourceDataSet.Tables[source.ObjectName].Rows.Add(dataRow);
        rowCount++;

        if (rowCount % recordLimitPerBatch == 0)
        {
            // Apply our field mapping
            ApplyFieldMapping();
            // Write it up
            WriteRecordsIntoDestinationSQLObject();
            // Remove from our dataset once we get to this point
            SourceDataSet.Tables[source.ObjectName].Rows.Clear();
        }
    }
}
Here ApplyFieldMapping() makes field-specific changes to the contents of the DataTable, and WriteRecordsIntoDestinationSQLObject() performs the SqlBulkCopy write. This allowed us to call the stored procedure just once to fetch the data, and let the loop keep memory in check by writing records out and clearing them once we hit the preset recordLimitPerBatch.

Reading from SQL Server - need to read from CSV

At the moment, I source my data from a SQL Server (2008) database. The current method is to use a DataTable, which is then passed around and used.
if (parameters != null)
{
    SqlDataAdapter _dataAdapter = new SqlDataAdapter(SqlQuery, CreateFORSConnection());
    foreach (var param in parameters)
    {
        _dataAdapter.SelectCommand.Parameters.AddWithValue(param.Name, param.Value);
    }
    DataTable ExtractedData = new DataTable(TableName);
    _dataAdapter.Fill(ExtractedData);
    return ExtractedData;
}
return null;
But now the user has said that we can also get data from txt files, which have the same structure as the tables in SQL Server. So if I have a table called 'Customer', then I have a CSV file for Customer with the same column structure. The first line in the CSV holds the column names, and they match my tables.
Would it be possible to read the txt file into a data table, and then run a SELECT on that data table somehow? Most of my queries are single table queries:
SELECT * FROM Table WHERE Code = 111
There is, however, ONE case where I do a join. That may be a bit more tricky, but I can make a plan. If I can get the txt files into data tables first, I can work with that.
Using the above code, can I not change the connection string to rather read from a CSV instead of SQL Server?
First, you'll need to read the CSV data into a DataTable. There are many CSV parsers out there, but since you prefer using ADO.NET, you can use the OleDB client. See the following article.
http://www.switchonthecode.com/tutorials/csharp-tutorial-using-the-built-in-oledb-csv-parser
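A rough sketch of that approach using the Jet text driver (the folder and file names are placeholders, and the provider/connection-string details may need adjusting for your environment, e.g. the ACE provider on 64-bit machines):
using System.Data;
using System.Data.OleDb;

static DataTable ReadCsvIntoDataTable(string folder, string fileName)
{
    // The data source is the folder; the file name acts as the "table" name.
    string connectionString =
        @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folder +
        @";Extended Properties=""text;HDR=Yes;FMT=Delimited""";

    var table = new DataTable(fileName);
    using (var adapter = new OleDbDataAdapter(
        "SELECT * FROM [" + fileName + "]", connectionString))
    {
        adapter.Fill(table);   // HDR=Yes makes the first CSV line the column names
    }
    return table;
}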
Joining is a bit harder, since both sets of data live in different places. But what you can do is get two DataTables (one from each source), then use Linq to join them.
Inner join of DataTables in C#
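A hypothetical sketch of such a join, assuming two DataTables named customers and orders with invented columns (AsEnumerable() and Field<T>() come from System.Data.DataSetExtensions):
using System.Data;
using System.Linq;

// Join a Customer DataTable to an Order DataTable on CustomerId.
var joined = from c in customers.AsEnumerable()
             join o in orders.AsEnumerable()
                 on c.Field<int>("CustomerId") equals o.Field<int>("CustomerId")
             select new
             {
                 CustomerId = c.Field<int>("CustomerId"),
                 Name = c.Field<string>("Name"),
                 OrderTotal = o.Field<decimal>("Total")
             };

foreach (var row in joined)
{
    // row.CustomerId, row.Name and row.OrderTotal are now available
}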
You could read the text file into a List<string> (if there is just 1 column per file), and then use LINQ to query the list. For example:
var result = from entry in myList
             where entry == "111"
             select entry;
Of course, this example is kind of useless since all you get back is the same string you are searching for. But if there are multiple columns in the file, and they match the columns in your DataTable, why not read the file into the data table, and then use LINQ to query the table?
Here is a simple tutorial about how to use LINQ to query a DataTable:
http://blogs.msdn.com/b/adonet/archive/2007/01/26/querying-datasets-introduction-to-linq-to-dataset.aspx
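For instance, a minimal sketch of the single-table case, SELECT * FROM Table WHERE Code = 111, run against a DataTable (assumes a reference to System.Data.DataSetExtensions; dataTable and the Code column stand in for your own names):
using System.Data;
using System.Linq;

// dataTable was filled either from SQL Server or from the CSV file.
var matches = dataTable.AsEnumerable()
                       .Where(row => row.Field<int>("Code") == 111);

// If the calling code expects a DataTable, convert the result back.
// CopyToDataTable throws on an empty sequence, hence the guard.
DataTable filtered = matches.Any()
    ? matches.CopyToDataTable()
    : dataTable.Clone();   // empty table with the same schema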

SAP connector 3.0 .NET set value on table structure

I'm trying to get data from SAP via SAP Connector 3.0 on a MVC3 application.
There is no problems with the connection.
My problem is that when I try to set values on a structure from a table, it says:
"TABLE [STRUCTURE ZHRS_ABSENCES]: cannot set value (array storing element values is null)"
My code is the following:
//create function
IRfcFunction function = conex.Repository
.CreateFunction("Z_HR_PORTAL_GET_EMPLOYEE_DATA");
//get table from function
IRfcTable absenceHoli = function.GetTable("P_ABSENCES");
//setting value to structure
absenceHoli.SetValue(0, "0000483"); //this is where the error occurs
I'm not sure about the connector you're using, but there's a similar common misunderstanding when using JCo. A table parameter can hold multiple lines. You'll usually have to append a line to the table. This will probably return some kind of structure that you'll be able to fill. Also check this answer.
I think you just need to Append a new row before trying to call SetValue
e.g.
absenceHoli.Append();
absenceHoli.SetValue("ColumnName", "0000483"); // Add further SetValue statements for further columns
You can get the column names by putting a breakpoint after you've got the table structure and examining it, which is probably nicer than just specifying column indexes.
In my case I needed to use Insert:
absenceHoli.Insert();
absenceHoli.SetValue(..., ...);

Can I access entire DataTable if all I have is a single DataRow?

DataRow contains a Table property, which seems to return the entire Table to which this row belongs.
I'd like to know if I can use that table safely, or if there are gotcha's.
The documentation at http://msdn.microsoft.com/en-us/library/system.data.datarow.table.aspx says "A DataRow does not necessarily belong to any table's collection of rows. This behavior occurs when the DataRow has been created but not added to the DataRowCollection.", but I know for a fact my row belongs to a table.
In terms of pointers, if each Row from a DataTable points back to the original DataTable, then I'm good to go. Is that all the 'Table' property does?
Just to explain why I'm trying to get entire Table based on a single DataRow:
I'm using linq to join two (sometimes more) tables. I'd like to have a generic routine which takes the output of linq (var), and generate a single DataTable with all results.
I had opened another question at stackoverflow (Join in LINQ that avoids explicitly naming properties in "new {}"?), but so far there doesn't seem to be a generic solution, so I'm trying to write one.
If you know the row is part of a table, then yes, you can access it without any problem. If there's a possibility the row may not be associated with a table, then check whether the property is null:
if(row.Table == null)
{
}
else
{
}
As long as it's not null, you can use it freely.

Join multiple DataRows into a single DataRow

I am writing this in C# using .NET 3.5. I have a System.Data.DataSet object with a single DataTable that uses the following schema:
Id : uint
AddressA: string
AddressB: string
Bytes : uint
When I run my application, let's say the DataTable gets filled with the following:
1 192.168.0.1 192.168.0.10 300
2 192.168.0.1 192.168.0.20 400
3 192.168.0.1 192.168.0.30 300
4 10.152.0.13 167.10.2.187 80
I'd like to be able to query this DataTable where AddressA is unique and the Bytes column is summed together (I'm not sure I'm saying that correctly). In essence, I'd like to get the following result:
1 192.168.0.1 1000
2 10.152.0.13 80
I ultimately want this result in a DataTable that can be bound to a DataGrid, and I need to update/regenerate this result every 5 seconds or so.
How do I do this? DataTable.Select() method? If so, what does the query look like? Is there an alternate/better way to achieve my goal?
EDIT: I do not have a database. I'm simply using an in-memory DataSet to store the data, so a pure SQL solution won't work here. I'm trying to figure out how to do it within the DataSet itself.
For readability (and because I love it) I would try to use LINQ:
var aggregatedAddresses = from DataRow row in dt.Rows
                          group row by row["AddressA"] into g
                          select new
                          {
                              Address = g.Key,
                              Byte = g.Sum(row => (uint)row["Bytes"])
                          };

int i = 1;
foreach (var row in aggregatedAddresses)
{
    result.Rows.Add(i++, row.Address, row.Byte);
}
If a performance issue is discovered with the LINQ solution, I would go with a manual solution: summing up the rows in a loop over the original table and inserting them into the result table.
You can also bind the aggregatedAddresses directly to the grid instead of putting it into a DataTable.
The most efficient solution would be to do the sum in SQL directly:
select AddressA, SUM(bytes) from ... group by AddressA
I agree with Steven as well that doing this on the server side is the best option. If you are using .NET 3.5 though, you don't have to go through what Rune suggests. Rather, use the extension methods for datasets to help query and sum the values.
Then, you can map it easily to an anonymous type which you can set as the data source for your grid (assuming you don't allow edits to this, which I don't see how you can, since you are aggregating the data).
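A minimal sketch of that idea (AsEnumerable() and Field<T>() come from System.Data.DataSetExtensions; dataTable and the grid binding are placeholders):
using System.Data;
using System.Linq;

var summary = dataTable.AsEnumerable()
    .GroupBy(row => row.Field<string>("AddressA"))
    .Select((g, index) => new
    {
        Id = index + 1,
        AddressA = g.Key,
        Bytes = g.Sum(row => (long)row.Field<uint>("Bytes"))
    })
    .ToList();

// Bind the anonymous-type list straight to the grid, e.g.:
// myDataGrid.DataSource = summary;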
I agree with Steven that the best way to do this is to do it in the database. But if that isn't an option you can try the following:
Make a new datatable and add the columns you need manually using DataTable.Columns.Add(name, datatype)
Step through the first datatable's Rows collection and, for each row, create a new row in your new datatable using DataTable.NewRow()
Copy the values of the columns found in the first table into the new row
Find the matching row in the other data table using Select() and copy out the final value into the new data row
Add the row to your new data table using DataTable.Rows.Add(newRow)
This will give you a new data table containing the combined data from the two tables. It won't be very fast, but unless you have huge amounts of data it will probably be fast enough; a rough sketch follows below. But try to avoid doing a LIKE query in the Select, as that is slow.
One possible optimization would be possible if both tables contains rows with identical primary keys. You could then sort both tables and step through them fetching both data rows using their array index. This would rid you of the Select call.
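A rough sketch of the manual approach described in the steps above (the table and column names are invented, and the join key is assumed to be Id):
using System.Data;

// Combine rows from tableA and tableB into one new table, matched on "Id".
static DataTable MergeOnId(DataTable tableA, DataTable tableB)
{
    var merged = new DataTable("Merged");
    merged.Columns.Add("Id", typeof(uint));
    merged.Columns.Add("AddressA", typeof(string));
    merged.Columns.Add("Bytes", typeof(uint));

    foreach (DataRow rowA in tableA.Rows)
    {
        DataRow newRow = merged.NewRow();
        newRow["Id"] = rowA["Id"];
        newRow["AddressA"] = rowA["AddressA"];

        // Find the matching row in the second table by its key value.
        DataRow[] matches = tableB.Select("Id = " + rowA["Id"]);
        if (matches.Length > 0)
        {
            newRow["Bytes"] = matches[0]["Bytes"];
        }

        merged.Rows.Add(newRow);
    }
    return merged;
}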
