We are currently using OleDb in an older application at our company.
I have started profiling the application, and dotTrace tells me that this code is one of the bottlenecks. In total it takes about 18 s to execute (avg. 6 ms per call).
m_DataSet = new DataSet("CommandExecutionResult");
m_DataAdapter.SelectCommand = m_OleDbCommand;
m_DataAdapter.Fill(m_DataSet, "QueryResult"); // <-- bottleneck
ReturnValue = m_DataSet.Tables[0].Copy();
m_InsertedRecordId = -1;
m_EffectedRecords = m_DataSet.Tables[0].Rows.Count;
I know there may be ways to reduce the number of queries. But is there a way to get a DataTable from an Access database without using the DataAdapter?
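(To make the question concrete, what I am hoping for is something along the lines of skipping the DataSet and loading the reader directly; a sketch only, reusing the fields from above and not tested:)
// using System.Data; using System.Data.OleDb;
using (OleDbDataReader reader = m_OleDbCommand.ExecuteReader())
{
    DataTable table = new DataTable("QueryResult");
    table.Load(reader);                    // fills the DataTable straight from the reader
    ReturnValue = table;                   // no intermediate DataSet or Copy() needed
    m_EffectedRecords = table.Rows.Count;
}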
Scenario: I want to write my own autocomplete API for addresses, much like the one Google offers (very basic: street, house number, city, postcode, country). It is intended for private use and training purposes only. I want to cover about 1 million addresses for a start.
Technology used: .NET Framework (not Core), C#, Visual Studio, OSMSharp, Microsoft SQL Server, Web API 2 (although I will probably switch to ASP.NET Core in the future).
Approach:
1. Set up the project (Web API 2, or a console project for demo purposes).
2. Download the relevant file from OpenStreetMap using DownloadClient() (https://download.geofabrik.de/).
3. Read the file using OSMSharp and filter out the relevant data.
4. Convert the filtered data to a DataTable.
5. Use the DataTable to feed the SqlBulkCopy method and import the data into the database (a sketch follows this list).
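(For reference, step 5 boils down to something like the following; a minimal sketch, where the connection string and destination table name are placeholders:)
// using System.Data.SqlClient;
using (var connection = new SqlConnection(@"Server=.;Database=Addresses;Integrated Security=true"))   // placeholder connection string
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.Addresses";   // placeholder table name
        bulkCopy.BatchSize = 10000;
        bulkCopy.WriteToServer(dataTable);                 // the DataTable produced in step 4
    }
}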
Problem: Step 4 is taking way too long. For a file like "Regierungsbezirk Köln" in the osm.pbf format, which is about 160 MB (the uncompressed OSM file is about 2.8 GB), we're talking about 4-5 hours. I want to optimize this. The bulk copy of the DataTable into the database, on the other hand (about 1 million rows), takes just about 5 seconds. (Woah. Amazing.)
Minimal Reproduction: https://github.com/Cr3pit0/OSM2Database-Minimal-Reproduction
What I tried:
Use a stored procedure in SQL Server. This comes with a whole different set of problems, and I didn't quite manage to get it working (mainly because the uncompressed osm.pbf file is over 2 GB and SQL Server doesn't like that).
Come up with a different approach to filter and convert the data from the file to a DataTable (or CSV).
Use the Overpass API. Although I read somewhere that the Overpass API is not intended for data sets above 10,000 entries.
Ask the Jedi grandmasters on Stack Overflow for help. (Currently in process ... :D)
Code Extract:
public static DataTable getDataTable_fromOSMFile(string FileDownloadPath)
{
    Console.WriteLine("Finished Downloading. Reading File into Stream...");
    using (var fileStream = new FileInfo(FileDownloadPath).OpenRead())
    {
        PBFOsmStreamSource source = new PBFOsmStreamSource(fileStream);
        if (source.Any() == false)
        {
            return new DataTable();
        }
        Console.WriteLine("Finished Reading File into Stream. Filtering and Formatting RawData to Addresses...");
        Console.WriteLine();
        DataTable dataTable = convertAdressList_toDataTable(
            source.Where(x => x.Type == OsmGeoType.Way && x.Tags.Count > 0 && x.Tags.ContainsKey("addr:street"))
                  .Select(Address.fromOSMGeo)
                  .Distinct(new AddressComparer())
        );
        return dataTable;
    }
}
private static DataTable convertAdressList_toDataTable(IEnumerable<Address> addresses)
{
    DataTable dataTable = new DataTable();
    if (addresses.Any() == false)
    {
        return dataTable;
    }
    dataTable.Columns.Add("Id");
    dataTable.Columns.Add("Street");
    dataTable.Columns.Add("Housenumber");
    dataTable.Columns.Add("City");
    dataTable.Columns.Add("Postcode");
    dataTable.Columns.Add("Country");
    Int32 counter = 0;
    Console.WriteLine("Finished Filtering and Formatting. Writing Addresses From Stream to a DataTable Class for the Database-SQLBulkCopy-Process ");
    foreach (Address address in addresses)
    {
        dataTable.Rows.Add(counter + 1, address.Street, address.Housenumber, address.City, address.Postcode, address.Country);
        counter++;
        if (counter % 10000 == 0 && counter != 0)
        {
            Console.WriteLine("Wrote " + counter + " Rows From Stream to DataTable.");
        }
    }
    return dataTable;
}
Okay, I think I got it. I'm down to about 12 minutes for a file size of about 600 MB and about 3.1 million rows of data after filtering.
The first thing I tried was to replace the logic that populates my DataTable with FastMember. That worked, but didn't give the performance increase I was hoping for (I canceled the process after 3 hours...). After more research I stumbled upon an old project called "osm2mssql" (https://archive.codeplex.com/?p=osm2mssql). I used a small part of its code, which reads the data directly from the osm.pbf file, and modified it for my use case (→ extracting address data from ways). I did actually use FastMember to write an IEnumerable<Address> to the DataTable (roughly as sketched below), but I no longer need OSMSharp and whatever extra dependencies it has. So thank you very much for the suggestion of FastMember; I will certainly keep that library in mind for future projects.
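(Roughly what the FastMember part looks like in my case; a sketch only, assuming Address exposes properties matching the column names:)
// using FastMember; using System.Data;
var dataTable = new DataTable();
using (var reader = ObjectReader.Create(addresses, "Street", "Housenumber", "City", "Postcode", "Country"))
{
    dataTable.Load(reader);   // streams the IEnumerable<Address> into the DataTable
}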
For those who are interested, I updated my GitHub project accordingly (https://github.com/Cr3pit0/OSM2Database-Minimal-Reproduction), although I didn't test it thoroughly because I have moved on from the test project to the real deal, which is a Web API.
I'm quite sure it can be optimized further, but I don't think I care at the moment. 12 minutes for a method that might be called once a month to update the whole database is fine, I guess. Now I can move on to optimizing my queries for the autocomplete.
So thank you very much to whoever wrote "osm2mssql".
I need to export thousands of documents from a Cosmos DB, and I am wondering whether there is a more efficient way to get them all (I haven't been able to figure one out by browsing the documentation and searching).
Right now I am using the FeedIterator to get my results:
Database database = m_cosmosClient.GetDatabase(m_databaseId);
DatabaseResponse databaseResponse = await database.ReadAsync();
// The response from Azure Cosmos
DatabaseProperties properties = databaseResponse;
Container container = databaseResponse.Database.GetContainer(m_cosmosDbContainer);
QueryDefinition query = new QueryDefinition(queryString);
QueryRequestOptions queryOptions = new QueryRequestOptions { MaxItemCount = 10000, MaxBufferedItemCount = 10000 };
List<Article> results = new List<Article>();
FeedIterator<Article> resultSetIterator = container.GetItemQueryIterator<Article>(query, null, queryOptions);
while (resultSetIterator.HasMoreResults)
{
    FeedResponse<Article> response = await resultSetIterator.ReadNextAsync();
    results.AddRange(response);
    if (response.Diagnostics != null)
    {
        Console.WriteLine($"\nQueryWithSqlParameters Diagnostics: {response.Diagnostics.ToString()}");
    }
}
I am worried that, without some form of multi-tasking, I could run out of memory, and then again it is always nice to have a faster run time.
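(To keep memory flat, the direction I am leaning toward is writing each page out as it arrives instead of collecting everything in the results list; a rough sketch, with the output path and serializer choice as placeholders:)
int page = 0;
while (resultSetIterator.HasMoreResults)
{
    FeedResponse<Article> response = await resultSetIterator.ReadNextAsync();
    // serialize just this page and let it go out of scope instead of adding it to 'results'
    string json = Newtonsoft.Json.JsonConvert.SerializeObject(response.Resource);
    System.IO.File.WriteAllText($@"C:\export\articles_{page++}.json", json);   // placeholder path
}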
The Cosmos DB Data Migration Tool is a good (and simple) option if you want to run the extract from a workstation. It can be run interactively or automated using scripts.
Creating a job in Azure Data Factory is a bit more complex but also offers a lot more flexibility.
This article discusses the various options for data migration in and out of Cosmos DB.
I am currently trying to get the value of a GPU's total VRAM using the NuGet package OpenHardwareMonitor.
I know that it is possible to get the value through the package; however, I have been trying for quite a while and have not found the specific code to accomplish the task.
I am not looking for an answer that gets the total VRAM by any other means, such as WMI. I am just looking for an answer using OpenHardwareMonitor.
If you have the solution to this problem, that would be greatly appreciated!
The problem is that the NuGet package is built from an older version of the code. In the meantime, additional sensors have been added that include details about total, free and used GPU memory (at least for NVIDIA GPUs). See this diff.
If that package ever gets updated, you should be able to find the memory details in the list of sensors:
var computer = new Computer();
computer.GPUEnabled = true;
computer.Open();
var gpu = computer.Hardware.First(x => x.HardwareType == HardwareType.GpuNvidia);
var totalVideoRamInMB = gpu.Sensors.First(x => x.Name.Equals("GPU Memory Total")).Value / 1024;
computer.Close();
Until then, a workaround would be to extract the memory information from the GetReport() result, where the GPU memory information looks like this:
Memory Info
Value[0]: 2097152
Value[1]: 2029816
Value[2]: 0
Value[3]: 8221004
Value[4]: 753168
Where Value[0] is the total GPU memory and Value[4] the amount of free GPU memory in kB. So with some regex magic, we can extract that information:
var pattern = @"Memory Info.*Value\[0\]:\s*(?<total>[0-9]+).*Value\[4\]:\s*(?<free>[0-9]+)";
var computer = new Computer();
computer.GPUEnabled = true;
computer.Open();
var gpu = computer.Hardware.First(x => x.HardwareType == HardwareType.GpuNvidia);
var report = gpu.GetReport();
var match = Regex.Match(report, pattern, RegexOptions.Singleline);
var totalVideoRamInMB = float.Parse(match.Groups["total"].Value) / 1024;
var freeVideoRamInMB = float.Parse(match.Groups["free"].Value) / 1024;
computer.Close();
Note that OpenHardwareMonitor only implements GPU memory information for NVIDIA GPUs.
I've searched all over and I now have to ask SO. I'm trying to construct a simple dataflow using EzAPI. It's been anything but easy, but I'm committed to figuring this out. What I can't figure out is how to get the EzOleDbDestination working. Here's my complete code:
var a = new Application();
// using a template since it's impossible to set up an ADO.NET connection to MySQL
// using EzAPI and potentially even with the raw SSIS API...
var pkg = new EzPackage(a.LoadPackage(@"C:\...\Package.dtsx", null));
pkg.Name = "Star";
var df = new EzDataFlow(pkg);
df.Name = "My DataFlow";
var src = new EzAdoNetSource(df);
src.Name = "Source Database";
src.SqlCommand = "SELECT * FROM enum_institution";
src.AccessMode = AccessMode.AM_SQLCOMMAND;
src.Connection = new EzConnectionManager(pkg, pkg.Connections["SourceDB"]);
src.ReinitializeMetaData();
var derived = new EzDerivedColumn(df);
derived.AttachTo(src);
derived.Name = "Prepare Dimension Attributes";
derived.LinkAllInputsToOutputs();
derived.Expression["SourceNumber"] = "id";
derived.Expression["Name"] = "(DT_STR,255,1252)description";
// EDIT: reordered the operation here and I no longer get an error, but
// I'm not getting any mappings or any input columns when I open the package in the designer
var dest = new EzOleDbDestination(df);
dest.AttachTo(derived, 0, 0);
dest.Name = "Target Database";
dest.AccessMode = 0;
dest.Table = "[dbo].[DimInstitution]";
dest.Connection = new EzConnectionManager(pkg, pkg.Connections["TargetDB"]);
// this comes from Yahia's link
var destInput = dest.Meta.InputCollection[0];
var destVirInput = destInput.GetVirtualInput();
var destInputCols = destInput.InputColumnCollection;
var destExtCols = destInput.ExternalMetadataColumnCollection;
var sourceColumns = derived.Meta.OutputCollection[0].OutputColumnCollection;
foreach(IDTSOutputColumn100 outputCol in sourceColumns) {
    // Now getting COM Exception here...
    var extCol = destExtCols[outputCol.Name];
    if(extCol != null) {
        // Create an input column from an output col of previous component.
        destVirInput.SetUsageType(outputCol.ID, DTSUsageType.UT_READONLY);
        var inputCol = destInputCols.GetInputColumnByLineageID(outputCol.ID);
        if(inputCol != null) {
            // map the input column with an external metadata column
            dest.Comp.MapInputColumn(destInput.ID, inputCol.ID, extCol.ID);
        }
    }
}
Basically, anything that involves a call to ReinitializeMetaData() results in 0xC0090001; that method is where the error happens. There's no real documentation to help me, so I have to rely on any gurus here.
I should mention that the source DB is MySQL and the target DB is SQL Server. Building packages like this using the SSIS designer works fine, so I know it's possible.
Feel free to tell me if I'm doing anything else wrong.
EDIT: here's a link to the base package I'm using as a template: http://www.filedropper.com/package_1 . I've redacted the connection details, but any MySQL and SQL Server database will do. The package will read from MySQL (using the MySQL ADO.NET Connector) and write to SQL Server.
The database schema is mostly irrelevant. For testing, just make a table in MySQL that has two columns: id (int) and description (varchar), with id being the primary key. Make equivalent columns in SQL Server. The goal here is simply to copy from one to the other. It may end up being more complex at some point, but I have to get past this hurdle first.
I can't test this now BUT I am rather sure that the following will help you get it working:
Calling ReinitializeMetadata() causes the component to fetch the table metadata. This should only be called after setting the AccessMode and related property. You are calling it before setting AccessMode...
Various samples including advice on debugging problems
define the derived column(s) directly in the SQL command instead of using an EzDerivedColumn
try to get it working with two SQL Server DBs first; some of the available MySQL ADO.NET providers have shortcomings under some circumstances
UPDATE - as per comments some more information on debugging this and a link to a complete end-to-end sample with source:
http://blogs.msdn.com/b/mattm/archive/2009/08/03/looking-up-ssis-hresult-comexception-errorcode.aspx
http://blogs.msdn.com/b/mattm/archive/2009/08/03/debugging-a-comexception-during-package-generation.aspx
Complete working sample with source
I've had this exact same issue and was able to resolve it with a lot of experimentation. In short, you must set the connection for both the source and the destination, and only call AttachTo after both connections are set. You must call AttachTo for every component.
I've written a blog post about starting with an SSIS package as a template and then manipulating it programmatically to produce a set of new packages.
The article explains the issue in more detail.
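Applied to the code in the question, that ordering would look roughly like this (a sketch only, not tested; names and connections are reused from the question):
var src = new EzAdoNetSource(df);
src.Name = "Source Database";
src.Connection = new EzConnectionManager(pkg, pkg.Connections["SourceDB"]);   // connection first
src.AccessMode = AccessMode.AM_SQLCOMMAND;
src.SqlCommand = "SELECT * FROM enum_institution";
src.ReinitializeMetaData();                                                   // only after connection and access mode are set

var derived = new EzDerivedColumn(df);
derived.AttachTo(src);

var dest = new EzOleDbDestination(df);
dest.Connection = new EzConnectionManager(pkg, pkg.Connections["TargetDB"]);  // set the destination connection before AttachTo
dest.AccessMode = 0;
dest.Table = "[dbo].[DimInstitution]";
dest.AttachTo(derived, 0, 0);
dest.ReinitializeMetaData();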
When using ODP.NET to load data into a spatial database, I'm using a UDT to define the SDO_GEOMETRY type.
I then use ArrayBindCount on the OracleCommand to load batches of data. Everything works, but I see a constant increase in the memory of the process, and performance counters show the same thing.
The parameter is created using:
var param = new OracleParameter("geom", OracleDbType.Object);
param.UdtTypeName = "MDSYS.SDO_GEOMETRY";
param.Direction = ParameterDirection.Input;
cmd.Parameters.Add(param);
Also, I set cmd.AddToStatementCache = false to prevent data from ending up in there.
When adding data I use:
param.Value = new object[numRowsToInsert];
for (int row = 0; row < numRowsToInsert; row++)
{
    OracleUDT.SdoGeometry geoVal = rowstoinsert[row].geom;
    (param.Value as object[])[row] = geoVal;
}
...
cmd.ExecuteNonQuery(); //THIS IS WHERE MEMORY LEAK APPEARS TO BE
..
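(For context, the general array-binding pattern here is to set ArrayBindCount to the number of array elements and execute once; roughly like this, with the insert statement being a placeholder:)
cmd.CommandText = "INSERT INTO spatial_table (geom) VALUES (:geom)";  // placeholder statement and table name
cmd.ArrayBindCount = numRowsToInsert;   // one execution binds the whole object[] as a batch
cmd.ExecuteNonQuery();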
I tried running the program with ExecuteNonQuery() removed, and then there is no memory leak at all.
Edit:
I also tried removing the UDT parameter and running through the program, also without any leak. So it looks like the problem is closely related to UDTs and to when statements are executed.
I'm using ODP.NET 11.2.0.2.1
Anyone got any clue?
Is there something I need to clean up that does not get created when ExecuteNonQuery() is not run?
Thought I'd give a follow-up on this one.
After numerous emails with Oracle tech support, I finally got this accepted as a bug.
This appears to be Bug 10157396, which is fixed in 12.1, is planned to be fixed in 11.2.0.4, and has been backported to 11.2.0.2 (available in Patch Bundle 18). It can be downloaded from MyOracleSupport as patches 10098816 (11.2.0.2) and 13897456 (Bundle 18) as a temporary solution while we wait for a backport to 11.2.0.3 or until 11.2.0.4 is released.