Excel: List<List<string>> per row - C#

I have a small program where you can select some database tables and create an Excel file with all the values for each table. This is my solution for creating the Excel file:
foreach (var selectedDatabase in this.lstSourceDatabaseTables.SelectedItems)
{
    // creates a new worksheet for each selected table
    foreach (TableRetrieverItem databaseTable in tableItems.FindAll(e => e.TableName.Equals(selectedDatabase)))
    {
        _xlWorksheet = (Excel.Worksheet)xlApp.Worksheets.Add();
        _xlWorksheet.Name = databaseTable.TableName.Length > 31
            ? databaseTable.TableName.Substring(0, 31)
            : databaseTable.TableName;
        _xlWorksheet.Cells[1, 1] = string.Format("{0}.{1}", databaseTable.TableOwner, databaseTable.TableName);
        ColumnRetriever retrieveColumn = new ColumnRetriever(SourceConnectionString);
        IEnumerable<ColumnRetrieverItem> dbColumns = retrieveColumn.RetrieveColumns(databaseTable.TableName);
        var results = retrieveColumn.GetValues(databaseTable.TableName);
        int i = 1;
        // results.Item3 is a List<List<string>> that holds all values of the table,
        // one inner list per row
        for (int j = 0; j < results.Item3.Count(); j++)
        {
            int tmp = 1;
            foreach (var value in results.Item3[j])
            {
                _xlWorksheet.Cells[j + 3, tmp] = value;
                tmp++;
            }
        }
    }
}
It works, but when a table has 5,000 or more values it takes a very long time.
Does anyone know a better way to write the List<List<string>> row by row than my for/foreach solution?

I use the GetExcelColumnName function in my code sample to convert a column count to the Excel column name.
The whole idea is that writing Excel cells one by one is very slow. So instead, precompute the whole table of values and then assign the result in a single operation. To assign values to a two-dimensional range, use a two-dimensional array of values:
var rows = results.Item3.Count;
var cols = results.Item3.Max(x => x.Count);
object[,] values = new object[rows, cols];
// TODO: initialize values from results content
// get the appropriate range
Range range = w.Range["A3", GetExcelColumnName(cols) + (rows + 2)];
// assign all values at once
range.Value = values;
You may need to adjust some details of the index ranges - I can't test my code right now.
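A minimal sketch of the TODO above, filling the array from results.Item3 (the List<List<string>> described in the question):
// Hedged fill of the values array, one inner list per row.
for (int r = 0; r < rows; r++)
{
    var rowValues = results.Item3[r];
    for (int c = 0; c < rowValues.Count; c++)
    {
        values[r, c] = rowValues[c];
    }
}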

As far as I can see, you didn't do any profiling. I recommend profiling first (for example with dotTrace) to see which parts of your code are actually causing the performance issues.
In my experience there are rare cases (almost none) where code executes slower than the database requests, even if the code is really awful in algorithmic terms.
First, I recommend filling your Excel sheet by rows, not by columns. If your table has many columns, filling column by column causes multiple round trips to the database, which is a big hit to performance.
Second, write to Excel in batches - by rows. Think of Excel files as mini-databases, with the same 'a batch is faster than one by one' principle, as sketched below.
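A rough sketch of that row-batch idea, assuming the same results.Item3 structure and the GetExcelColumnName helper from the previous answer - one COM call per row instead of one per cell:
// Writes one whole row per Range assignment instead of cell by cell.
for (int r = 0; r < results.Item3.Count; r++)
{
    object[] rowValues = results.Item3[r].Cast<object>().ToArray();
    Excel.Range rowRange = _xlWorksheet.Range["A" + (r + 3),
        GetExcelColumnName(rowValues.Length) + (r + 3)];
    rowRange.Value = rowValues; // a 1-D array is laid out across the row
}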

Related

C# FlaUI: get all the values from a DataGridView of another program

I'm trying to pull all the values from another program's DataGridView. For that I'm using FlaUI. I wrote code that does what I want; however, it is very slow. Is there a faster way to pull all the values from another program's DataGridView using FlaUI?
my code:
var desktop = automation.GetDesktop();
var window = desktop.FindFirstDescendant(cf => cf.ByName("History: NEWLIFE")).AsWindow();
var table = window.FindFirstDescendant(cf => cf.ByName("DataGridView")).AsDataGridView();
// Remove the last row if we have the "add" row
int rowscount = table.FindAllChildren(cf => cf.ByProcessId(30572)).Length - 2;
for (int i = 0; i < rowscount; i++)
{
    string string1 = "Row " + i;
    string string2 = "Symbol Row " + i;
    var RowX = table.FindFirstDescendant(cf => cf.ByName(string1));
    var SymbolRowX = RowX.FindFirstDescendant(cf => cf.ByName(string2));
    SCAN.Add("" + SymbolRowX.Patterns.LegacyIAccessible.Pattern.Value);
}
var message = string.Join(Environment.NewLine, SCAN);
MessageBox.Show(message);
Thank you in advance.
Searching for descendants is pretty slow, as it goes through all objects in the tree until it finds the desired control (or no controls are left). It might be much faster to use the grid pattern to find the desired cells, or to get all rows at once and loop through them, as sketched below.
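A rough sketch of that grid approach, assuming the table element from the question can be wrapped with AsGrid() (whether Rows and Cells are populated depends on what the target application exposes):
// Walk rows and cells through the grid wrapper instead of
// searching descendants by name for every single row.
var grid = table.AsGrid();
foreach (var row in grid.Rows)
{
    foreach (var cell in row.Cells)
    {
        SCAN.Add("" + cell.Patterns.LegacyIAccessible.Pattern.Value);
    }
}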
Alternatively, you could try caching, since UIA uses inter-process calls, which are generally slow. Each Find method or value-property access makes such a call, and with a large grid that can add up pretty badly. For exactly that case, using UIA caching can make sense.
For that, you would get everything you need (all descendants of the table and the LegacyIAccessible pattern) in one go inside a cache request and then loop through those elements in code with CachedChildren and the like.
A simple example for this can be found at the FlaUI wiki at https://github.com/FlaUI/FlaUI/wiki/Caching:
var grid = <FindGrid>.AsGrid();
var cacheRequest = new CacheRequest();
cacheRequest.TreeScope = TreeScope.Descendants;
cacheRequest.Add(Automation.PropertyLibrary.Element.Name);
using (cacheRequest.Activate())
{
    var rows = grid.Rows;
    foreach (var row in rows)
    {
        foreach (var cell in row.CachedChildren)
        {
            Console.WriteLine(cell.Name);
        }
    }
}
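Since the question reads the cell values through LegacyIAccessible, that pattern would also need to go into the cache request; a hedged addition to the sample above, assuming FlaUI's pattern library exposes it as LegacyIAccessiblePattern:
// Assumed API: also cache the LegacyIAccessible pattern so reading
// the value inside the loop does not trigger extra cross-process calls.
cacheRequest.Add(Automation.PatternLibrary.LegacyIAccessiblePattern);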

Find matching records in DataTable as fast as possible

I have C# DataTables with very large numbers of rows, and in my importer app I must query these hundreds of thousands of times in a given import. So I'm trying to find the fastest possible way to search. Thus far I am puzzling over very strange results. First, here are 2 different approaches I have been experimenting with:
APPROACH #1
public static bool DoesRecordExist(string keyColumn, string keyValue, DataTable dt)
{
    if (dt != null && dt.Rows.Count > 0)
        return dt.Select($"{keyColumn} = '{SafeTrim(keyValue)}'").Count() > 0;
    else
        return false;
}
APPROACH #2
public static bool DoesRecordExist(string keyColumn, string keyValue, DataTable dt)
{
    if (dt != null && dt.Rows.Count > 0)
    {
        int counter = dt.AsEnumerable().Where(r => string.Equals(SafeTrim(r[keyColumn]), keyValue, StringComparison.CurrentCultureIgnoreCase)).Count();
        return counter > 0;
    }
    else
        return false;
}
In a mock test I run each method 15,000 times, handing in hardcoded data. This is apples-to-apples, a fair test. Approach #1 is dramatically faster. But in actual app execution, Approach #1 is dramatically slower.
Why the counterintuitive results? Is there some other faster way to query datatables that I haven't tried?
EDIT: The reason I use DataTables as opposed to other types of collections is that all my data sources are either MySQL tables or CSV files, so DataTables seemed like a logical choice. Some of these tables contain 10+ columns, so different types of collections seemed an awkward match.
If you want faster access and still want to stick with DataTables, use a dictionary to store the row numbers for the given keys. Here I assume that each key is unique in the DataTable. If not, you would have to use a Dictionary<string, List<int>> or Dictionary<string, HashSet<int>> to store the indexes (see the sketch after the usage example below).
var indexes = new Dictionary<string, int>();
for (int i = 0; i < dt.Rows.Count; i++)
{
    indexes.Add((string)dt.Rows[i][keyColumn], i);
}
Now you can access a row in a super fast way with
var row = dt.Rows[indexes[theKey]];
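A minimal sketch of the non-unique variant mentioned above, assuming the same dt and keyColumn:
// One list of row indexes per key value, for keys that are not unique.
var indexesByKey = new Dictionary<string, List<int>>();
for (int i = 0; i < dt.Rows.Count; i++)
{
    var key = (string)dt.Rows[i][keyColumn];
    if (!indexesByKey.TryGetValue(key, out var list))
    {
        list = new List<int>();
        indexesByKey[key] = list;
    }
    list.Add(i);
}
// All rows matching a key:
foreach (var rowIndex in indexesByKey[theKey])
{
    var matchingRow = dt.Rows[rowIndex];
}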
I have a very similar issue, except that I need the actual first occurrence of a matching row.
Using .Select(...).FirstOrDefault (Approach 1) takes 38 minutes to run.
Using .Where(...).FirstOrDefault (Approach 2) takes 6 minutes to run.
In a similar situation where I didn't need FirstOrDefault, but just needed to find and work with the uniquely matching record, what I found to be by far the fastest is a Hashtable where the key is the combined values of the columns you are trying to match and the value is the DataRow itself. Finding a match is near-instant.
The function is:
public Hashtable ConvertToLookup(DataTable myDataTable, params string[] pKeyFieldNames)
{
    Hashtable myLookup = new Hashtable(StringComparer.InvariantCultureIgnoreCase); // Makes the key case-insensitive
    foreach (DataRow myRecord in myDataTable.Rows)
    {
        string myHashKey = "";
        foreach (string strKeyFieldName in pKeyFieldNames)
        {
            myHashKey += Convert.ToString(myRecord[strKeyFieldName]).Trim();
        }
        if (myLookup.ContainsKey(myHashKey) == false)
        {
            myLookup.Add(myHashKey, myRecord);
        }
    }
    return myLookup;
}
The usage is...
// Build the lookup table
Hashtable myLookUp = ConvertToLookup(myDataTable, "Col1Name", "Col2Name");
// Use it
if (myLookUp.ContainsKey(mySearchForValue) == true)
{
    DataRow myRecord = (DataRow)myLookUp[mySearchForValue];
}
BINGO! I wanted to share this as a different answer because my previous one might suit a slightly different approach. In this scenario I was able to go from 8 MINUTES down to 6 SECONDS, using neither of the two approaches above.
Again, the key is a hash table, or in my case a dictionary because I had multiple records. To recap: I needed to delete one row from my DataTable for every matching record I found in another DataTable, with the goal that in the end my first DataTable only contained the "missing" records.
This uses a different function...
// -----------------------------------------------------------
// Creates a Dictionary with grouping counts from a DataTable
public Dictionary<string, Int32> GroupBy(DataTable myDataTable, params string[] pGroupByFieldNames)
{
    Dictionary<string, Int32> myGroupBy = new Dictionary<string, Int32>(StringComparer.InvariantCultureIgnoreCase); // Makes the key case-insensitive
    foreach (DataRow myRecord in myDataTable.Rows)
    {
        string myKey = "";
        foreach (string strGroupFieldName in pGroupByFieldNames)
        {
            myKey += Convert.ToString(myRecord[strGroupFieldName]).Trim();
        }
        if (myGroupBy.ContainsKey(myKey) == false)
        {
            myGroupBy.Add(myKey, 1);
        }
        else
        {
            myGroupBy[myKey] += 1;
        }
    }
    return myGroupBy;
}
Now, say you have a table of records that you want to use as the "match values", based on Col1 and Col2:
Dictionary<string, Int32> myQuickLookUpCount = GroupBy(myMatchTable, "Col1", "Col2");
And now the magic. We loop through the primary table and remove one instance of a record for each instance in the matching table. This is the part that took 8 minutes with Approach #2 (or 38 minutes with Approach #1), but now takes only seconds.
myDataTable.AcceptChanges(); // Trick that allows us to delete during a foreach!
foreach (DataRow myDataRow in myDataTable.Rows)
{
    // Grab the key values
    string strKey1Value = Convert.ToString(myDataRow["Col1"]);
    string strKey2Value = Convert.ToString(myDataRow["Col2"]);
    if (myQuickLookUpCount.TryGetValue(strKey1Value + strKey2Value, out Int32 intTotalCount) == true && intTotalCount > 0)
    {
        myDataRow.Delete(); // Mark our row for deletion
        myQuickLookUpCount[strKey1Value + strKey2Value] -= 1; // Decrement our counter
    }
}
myDataTable.AcceptChanges(); // Commits our changes and actually deletes the rows

Random rows from LINQ to SQL with no duplicates

What is the best (and fastest) way to retrieve random rows using LINQ to SQL with unique data / no duplicate records? I'd prefer to do it in one statement - is that possible?
I found this relevant question, but I don't think that approach results in unique records.
Here is what I have tried so far:
// first approach
AirAsiaDataContext LinqDataCtx = new AirAsiaDataContext();
var tes = (from u in LinqDataCtx.users.AsEnumerable()
           orderby Guid.NewGuid()
           select u).Take(5);

// second approach
var usr = from u in LinqDataCtx.users
          select u;
int count = usr.Count(); // 1st round-trip
int index = new Random().Next(count);
List<user> tes2 = new List<user>();
for (int i = 0; i < 5; i++)
{
    tes2.Add(usr.Skip(index).FirstOrDefault()); // 2nd round-trip
}
As you can see above, I have tried two solutions. They work, but the code above does not guarantee unique records; there is still a chance of duplicates.
db.TableName.OrderBy(x => Guid.NewGuid()).FirstOrDefault();
If you want unique data / no duplicate records, you had better use another collection to store the rows you have already taken, as sketched below.
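A minimal sketch of that idea using the question's data context, tracking the primary keys already taken (the Id property is an assumption about the users table, not from the question):
// Picks 5 distinct random users by remembering which primary keys
// have already been taken. u.Id is an assumed primary-key column.
var taken = new HashSet<int>();
var picked = new List<user>();
var rnd = new Random();
int total = LinqDataCtx.users.Count();
while (picked.Count < 5 && taken.Count < total)
{
    int index = rnd.Next(total);
    var candidate = LinqDataCtx.users
        .OrderBy(u => u.Id)   // LINQ to SQL needs a sorted input before Skip
        .Skip(index)
        .First();
    if (taken.Add(candidate.Id))
    {
        picked.Add(candidate);
    }
}
Each iteration is one round trip, so this is only practical for picking a handful of rows.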

Fastest way to update all rows to have the same value in one column of a DataTable, without a loop, in C#

I have a DataTable "users" with a column "is_published" in it. I have about 100k rows.
What is the fastest way to update the value in that column so that every row has the same value, 1?
I tried a classic foreach loop and it's too slow; I also tried LINQ:
dsData.Tables["users"].Select().ToList().ForEach(x => x["is_published"] = 1);
and it still isn't fast enough.
The variant with an Expression also doesn't work for me, because afterwards the field is read-only and I can't change the value again.
This is C#.
When you create your table, you can simply give the column a default value:
DataTable dt = new DataTable();
dt.Columns.Add("is_published", typeof(System.Int32));
dt.Columns["is_published"].DefaultValue = 1;
Then, when you need to change a row back to the default value (if you ever need to):
// Say your user selects the row whose index is 2,
// and say your column's index is 5.
// Index the row directly - ItemArray returns a copy,
// so assigning into it would not change the table.
dt.Rows[2][5] = dt.Columns["is_published"].DefaultValue;
or, by column name:
dt.Rows[2]["is_published"] = dt.Columns["is_published"].DefaultValue;
Separate the select and the update into two operations. Skip the ToList() call and instead iterate over the IEnumerable collection with foreach and update the value:
var rows = dsData.Tables["users"].Select();
foreach (var row in rows)
{
    row["is_published"] = 1;
}
The ToList forces an immediate query evaluation, which in this case acts as a copy of all items from the IEnumerable collection, so you can gain some speed here. I ran some tests, and the result in this case was (using your code and the modification): ToList is 3 times slower than iterating over the IEnumerable and doing a single update!
IMO 40 seconds is an awful lot for 100K items. If your DataTable is bound to a DataGridView or some other UI control, I believe it is the update of the GUI that is taking so long, not the update of the values itself. In my tests the update using ToList took fractions of a second (on my simple Lenovo netbook with an AMD E-450 processor, and I assume you are not using a 386 machine). Try suspending the UI before updating and refreshing the values, and then enable it again - see the example in this SO post.
My original post (and as I can see, you gained some speed using the code - interesting):
More of an experiment on my part, but it is possible to:
convert the table to XML
fetch all elements that should be changed
change them
write the changed XML back to the table
The code:
// temp table
var dataTable = new DataTable("Table 1");
dataTable.Columns.Add("title", typeof(string));
dataTable.Columns.Add("number", typeof(int));
dataTable.Columns.Add("subnum1", typeof(int));
dataTable.Columns.Add("subnum2", typeof(int));
// add temp data
Enumerable.Range(1, 100000).ToList().ForEach(e =>
{
    dataTable.Rows.Add(new object[] { "A", 1, 2, 3 });
});
// "bulk update"!
var sb = new StringBuilder();
var xmlWriter = XmlWriter.Create(sb);
dataTable.WriteXml(xmlWriter);
var xml = XDocument.Parse(sb.ToString());
// take column to change
var elementsToChange = xml.Descendants("title").ToList();
// the list is referenced to the XML, so the XML is changed too!
elementsToChange.ForEach(e => e.Value = "Z");
// clear current table
dataTable.Clear();
// write changed data back to table
dataTable.ReadXml(xml.CreateReader());
The table is updated. IMO the parts that make this solution slow are the conversion from and to XML and the filling of the StringBuilder. On the other hand, the pure update of the list is probably faster than the update of the table.
Finally! I sped up the update so that it takes 2-3 seconds. I added BeginLoadData() and EndLoadData():
DataTable dt = ToDataSet().Tables["users"];
var sb = new StringBuilder();
var xmlWriter = XmlWriter.Create(sb);
dt.WriteXml(xmlWriter);
var xml = XDocument.Parse(sb.ToString());
xml.Descendants("is_published").ToList().ForEach(e => e.Value = "1");
dt.Clear();
dt.BeginLoadData();
dt.ReadXml(xml.CreateReader());
dt.EndLoadData();

Less expensive way to find duplicate rows in a DataTable?

I want to find all rows in a DataTable where each of a group of columns is a duplicate. My current idea is to get a list of the indexes of all rows that appear more than once, as follows:
public List<int> findDuplicates_New()
{
    string[] duplicateCheckFields = { "Name", "City" };
    List<int> duplicates = new List<int>();
    List<string> rowStrs = new List<string>();
    string rowStr;
    // convert each DataRow to a delimited string and add it to the list rowStrs
    foreach (DataRow dr in submissionsList.Rows)
    {
        rowStr = string.Empty;
        foreach (DataColumn dc in submissionsList.Columns)
        {
            // only use the duplicateCheckFields in the string
            if (duplicateCheckFields.Contains(dc.ColumnName))
            {
                rowStr += dr[dc].ToString() + "|";
            }
        }
        rowStrs.Add(rowStr);
    }
    // count how many of each row string are in the list;
    // add the string's index (which will match the row's index)
    // to the duplicates list if more than 1
    for (int c = 0; c < rowStrs.Count; c++)
    {
        if (rowStrs.Count(str => str == rowStrs[c]) > 1)
        {
            duplicates.Add(c);
        }
    }
    return duplicates;
}
However, this isn't very efficient: it's O(n^2) to go through the list of strings and get the count of each string. I looked at this solution but couldn't figure out how to use it with more than 1 field. I'm looking for a less expensive way to handle this problem.
Try this:
How can I check for an exact match in a table where each row has 70+ columns?
The essence is to build a set where you store hashes for the rows and only compare rows whose hashes collide; the complexity will be O(n). A sketch of how to apply this with more than one field follows below.
...
If you have a large number of rows and storing the hashes themselves is an issue (an unlikely case, but still...), you can use a Bloom filter. The core idea of a Bloom filter is to calculate several different hashes of each row and use them as addresses into a bitmap. As you scan through the rows, you only need to double-check the rows for which all the bits in the bitmap were already set.
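A minimal sketch of that idea for the question's two check fields, simplified to key a dictionary by the composite string itself rather than by a separate hash; it still replaces the O(n^2) string counting with a single O(n) pass:
// Counts each composite key in one pass, then collects the indexes
// of rows whose key occurs more than once.
public List<int> FindDuplicateIndexes(DataTable table, params string[] keyFields)
{
    var counts = new Dictionary<string, int>();
    var keys = new List<string>(table.Rows.Count);
    foreach (DataRow row in table.Rows)
    {
        string key = string.Join("|", keyFields.Select(f => row[f].ToString()));
        keys.Add(key);
        counts[key] = counts.TryGetValue(key, out var n) ? n + 1 : 1;
    }
    var duplicates = new List<int>();
    for (int i = 0; i < keys.Count; i++)
    {
        if (counts[keys[i]] > 1)
        {
            duplicates.Add(i);
        }
    }
    return duplicates;
}
Usage for the question's case would be FindDuplicateIndexes(submissionsList, "Name", "City").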
