I am developing a small time-management app (so that I can learn C#/WPF). I need to know the best way to return calculations to various TextBlocks on one of my forms.
I have a table called "tblActivity" and I need to calculate how many times certain values exist. In the old days of VBA I would have simply used DSum or DCount, but I'm not sure as to the most efficient/correct/fastest way to return this sort of data (the fields are indexed, by the way).
If you want to query the table as a whole, you would do something like this:
int rowCount = tblActivity.Rows.Count;
If you want the count where a column meets certain criteria, run a Select statement:
DataRow[] selectedRows = tblActivity.Select("Index = 12 AND Index2 = 'something'");
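The number of matching rows is then just the length of the returned array:
int matchCount = selectedRows.Length;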
And if you need to display the data as well as the count, you can still do this:
int count = 0;
foreach (DataRow row in tblActivity.Rows)
{
    string valueFromTable = row["Column"].ToString();
    // display the data if you must
    count += 1;
}
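Incidentally, if you are after a closer analogue of DCount/DSum, DataTable.Compute takes an aggregate expression plus a filter. A small sketch ("Index" and "Index2" are from above; the "Duration" column is a made-up example):
int activityCount = (int)tblActivity.Compute("COUNT(Index)", "Index2 = 'something'");
object totalDuration = tblActivity.Compute("SUM(Duration)", "Index2 = 'something'");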
I am creating an Excel file using the Open XML SDK. In this process, I have the scenario below.
I need to add data to a Dictionary<uint, string> only if the key does not already exist. For that I am using the code below.
var dataLines = sheetData.Elements<Row>().ToList();
for (int i = 0; i < dataLines.Count; i++)
{
var x = dataLines[i];
if (!dataDictionary.TryGetValue(x.RowIndex.Value, out var res)) // 700 seconds, 1,279,999,998 Hit counts
{
dataDictionary.Add(x.RowIndex.Value, x.OuterXml);
}
}
When I create an Excel sheet with around 90,000-92,000 rows, the line with the if condition in the above code takes 700 seconds to complete (checked with a performance profiler; that line also has 1,279,999,998 hit counts).
How can I reduce the time that line consumes?
Is there any better way to achieve this in less time?
If the if statement is slow, one option you have is to eliminate it entirely and use the indexer of the dictionary to set the value. This means that the "last match will win". If you want the "first match to win", all you have to do is reverse the order you are iterating the list.
var dataLines = sheetData.Elements<Row>().ToList();
for (int i = dataLines.Count - 1; i >= 0; i--)
{
var x = dataLines[i];
dataDictionary[x.RowIndex.Value] = x.OuterXml;
}
If x.RowIndex.Value is unique, it doesn't matter which direction you iterate.
If it is important that the key is sorted in ascending order, you can use a SortedDictionary<TKey, TValue>.
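A minimal sketch of that swap; the reverse loop above works unchanged:
var dataDictionary = new SortedDictionary<uint, string>(); // keys kept in ascending order
Just note that its indexer costs O(log n) per write rather than the hash-based dictionary's O(1).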
But as others have pointed out, it seems odd that you have so many hit counts. There is probably recursion going on in your application that you need to track down.
I tried the trial version of GemBox.Spreadsheet. When I get Cells[i, j].Value in a for() or foreach() loop, it seems to re-calculate the value every time I read it. So I tried calling Calculate() first and then reading Cell.Value, but that takes the same time too.
workSheet.Calculate(); // <- after this, values are calculated, am I right?
for (int i = 0; i < workSheet.GetUsedCellRange(true).LastRowIndex + 1; ++i)
{
    // ... inner iteration over columns ...
    var value = workSheet.Cells[i, j].Value; // <- re-calculates the value(?)
}
So here is my question:
Can I get the calculated values? Or is there a pre-calculate function, or some way to get more speed?
Unfortunately, I'm not sure exactly what you're asking; could you please try reformulating your question a bit so that it's easier to understand?
Nevertheless, here is some information which I hope you'll find useful.
To iterate through all cells, you should use one of the following:
1.
foreach (ExcelRow row in workSheet.Rows)
{
foreach (ExcelCell cell in row.AllocatedCells)
{
var value = cell.Value;
// ...
}
}
2.
for (CellRangeEnumerator enumerator = workSheet.Cells.GetReadEnumerator(); enumerator.MoveNext(); )
{
ExcelCell cell = enumerator.Current;
var value = cell.Value;
// ...
}
3.
for (int r = 0, rCount = workSheet.Rows.Count; r < rCount; ++r)
{
for (int c = 0, cCount = workSheet.CalculateMaxUsedColumns(); c < cCount; ++c)
{
var value = workSheet.Cells[r, c].Value;
// ...
}
}
I believe all of them will have pretty much the same performance.
However, depending on the spreadsheet's content, the last one could end up a bit slower. This is because it does not iterate exclusively through allocated cells.
For instance, let's say you have a spreadsheet with 2 rows. The first row is empty, it has no data, and the second row has 3 cells. If you use approach 1 or 2, you will iterate only through those 3 cells in the second row; but if you use approach 3, you will iterate through 3 cells in the first row (which previously were not allocated and now are, because we accessed them) and then through the 3 cells in the second row.
Now regarding the calculation, note that when you save a file with an Excel application, it saves the last calculated formula values in it. In that case you don't have to call the Calculate method, because you already have the required values in the cells.
You should call the Calculate method when you need to update, that is re-calculate, the formulas in your spreadsheet, for instance after you have added or modified some cell values.
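For example (a small sketch; the cell addresses and formula are made up):
workSheet.Cells["A1"].Value = 10; // modify an input cell
workSheet.Calculate(); // re-calculate once, after the modifications
var result = workSheet.Cells["B1"].Value; // e.g. B1 contains =A1*2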
Last, regarding your questions again:
Can I get the calculated values?
Yes, that line of code var value = workSheet.Cells[i, j].Value; should give you the calculated value, because you used the Calculate method before it. However, if you have formulas that are currently not supported by GemBox.Spreadsheet's calculation engine, it will not be able to calculate those values. You can find a list of currently supported Excel formula functions here.
Or is there a pre-calculate function, or some way to get more speed?
I don't know what a "pre-calculate function" would mean here, and for speed please refer to the first part of this answer.
I have a string that contains: "# of rows, # of columns, R<row>C<col>=Serial#, ...".
How do I create a DataGrid table with the number of rows and columns defined, and then place the serial #s into the grid?
Examples:
2,1,R1C1=111,R2C1=112,
2,2,R1C1=211,R1C2=212,R2C1=213,R2C2=214,
thanks
Below is code that does what you are asking; however, I must point out some problems with this approach. First, getting the total rows and cols from the first two elements in order to create your table is risky. If that data is wrong, this code will most likely crash or silently omit data. For example, if the input is 2,2 followed by six RxCy=... items, the loops will only read the first 4 values and drop the rest.
Worse, if the input is 2,2 followed by only two items, the code will crash when it tries to access an element of splitArray that isn't there. Either way is not good.
My point is: to be safe, it would be a better approach to check how much data is actually there before you create the grid. You can get the number of items from splitArray.Length minus the two leading elements that define the dimensions, and check that it matches those dimensions. That way your loops won't go out of bounds because the supplied data was wrong (a sketch of such a check follows the code below). It seems redundant and error-prone to supply the dimension values when you can get that info from the data itself.
I still am not 100% sure what you want to accomplish here. It looks like a search of some form. This is what I am picturing:
Looking at your (previous) screenshots, it appears that after you type into the Serial # text box and click the "Search Txt Files" button, it searches for data that came from the input string (i.e. "PLX51…") and then has the grid display the "filtered" results that match (or are LIKE) what's in the Serial # text box. If this is true, I would ignore the RxCy values and put the data in a single column, then wire up a key-press event on the text box to filter the grid whenever the user types into the Serial # text box.
Otherwise I am lost as to why you would need to create the data in the fashion described. Just because the input has unnecessary data doesn't mean you have to use it. Just a thought.
string inputString = "2,2,R1C1=211,R1C2=212,R2C1=213,R2C2=214";
string[] splitArray = inputString.Split(',');
int totalRows = int.Parse(splitArray[0]);
int totalCols = int.Parse(splitArray[1]);
int itemIndex = 2;
// add the columns
for (int i = 0; i < totalCols; i++)
{
dataGridView1.Columns.Add("Col", "Col");
}
// add the rows
dataGridView1.Rows.Add(totalRows);
for (int i = 0; i < totalRows; i++)
{
for (int j = 0; j < totalCols; j++)
{
// keep only the serial number after the '=' sign (e.g. "R1C1=211" -> "211")
int eq = splitArray[itemIndex].IndexOf('=');
dataGridView1.Rows[i].Cells[j].Value = eq >= 0 ? splitArray[itemIndex].Substring(eq + 1) : splitArray[itemIndex];
itemIndex++;
}
}
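And here is the validation sketch mentioned above (a hypothetical helper, not part of the original code): it checks the declared dimensions against the data actually supplied, before you build the grid.
private bool TryGetDimensions(string[] splitArray, out int totalRows, out int totalCols)
{
    totalRows = totalCols = 0;
    if (splitArray.Length < 2 ||
        !int.TryParse(splitArray[0], out totalRows) ||
        !int.TryParse(splitArray[1], out totalCols))
        return false;
    // count only non-empty items (a trailing comma produces an empty entry)
    int itemCount = 0;
    for (int i = 2; i < splitArray.Length; i++)
        if (!string.IsNullOrWhiteSpace(splitArray[i]))
            itemCount++;
    return itemCount == totalRows * totalCols;
}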
I want the Excel spreadsheet cells I populate with C# to expand or contract so that all their content displays without my manually adjusting the width of the cells: "just enough" width to display the data, no more, no less.
I tried this:
_xlSheet = (MSExcel.Excel.Worksheet)_xlSheets.Item[1];
_xlSheet.Columns.AutoFit();
_xlSheet.Rows.AutoFit();
...but it does nothing in my current project (it works fine in a small POC sandbox app that contains no ranges). Speaking of ranges, the reason this doesn't work might have something to do with my having created cell ranges like so:
var rowRngMemberName = _xlSheet.Range[_xlSheet.Cells[1, 1], _xlSheet.Cells[1, 6]];
rowRngMemberName.Merge(Type.Missing);
rowRngMemberName.Font.Bold = true;
rowRngMemberName.Font.Italic = true;
rowRngMemberName.Font.Size = 20;
rowRngMemberName.Value2 = shortName;
...and then adding "normal"/generic single-cell values after that.
In other words, I have values that span multiple columns - several rows of that. Then below that, I revert to "one cell, one value" mode.
Is this the problem?
If so, how can I resolve it?
Is it possible to have independent sections of a spreadsheet whose formatting (autofitting) isn't affected by other parts of the sheet?
UPDATE
As for getting multiple rows to accommodate a value, I'm using this code:
private void AddDescription(String desc)
{
int curDescriptionBottomRow = curDescriptionTopRow + 3;
var range =
_xlSheet.Range[_xlSheet.Cells[curDescriptionTopRow, 1], _xlSheet.Cells[curDescriptionBottomRow, 1]];
range.Merge();
range.Font.Bold = true;
range.VerticalAlignment = XlVAlign.xlVAlignCenter;
range.Value2 = desc;
}
...and it accomplishes the merged, multi-row description cell I was after.
AutoFit is what is needed after all, but the key is to call it at the right time: after all other manipulation has been done. Otherwise, subsequent manipulation can undo the auto-fitting.
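In code terms, a minimal sketch (same _xlSheet as above):
// ... merge ranges, set fonts and write all values first ...
_xlSheet.Columns.AutoFit(); // only now, once all other manipulation is done
_xlSheet.Rows.AutoFit();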
If I understand what you are asking correctly, you are looking to wrap text... at least that's the official term for it:
xlWorkSheet.Range["A4:A4"].Cells.WrapText = true;
Here is the documentation: https://msdn.microsoft.com/en-us/library/office/ff821514.aspx
I'm writing a C# application that runs a number of regular expressions (~10) on a lot (~25 million) of strings. I did try to google this, but any searches for regex with "slows down" are full of tutorials about how backreferencing etc. slows down regexes. I am assuming that this is not my problem because my regexes start out fast and slow down.
For the first million or so strings it takes about 60ms per 1000 strings to run the regular expressions. By the end, it has slowed down to the point where it's taking about 600ms. Does anyone know why?
It was worse, but I improved it by using Regex instances instead of the static cached methods, and by compiling the expressions that I could.
Some of my regexes need to vary e.g. depending on the user's name it might be
mike said (\w*) or john said (\w*)
My understanding is that it is not possible to compile those regexes and pass in parameters (e.g. saidRegex.Match(inputString, userName)).
Does anyone have any suggestions?
[Edited to accurately reflect speed - was per 1000 strings, not per string]
This may not be a direct answer to your question about RegEx performance degradation - which is somewhat fascinating. However - after reading all of the commentary and discussion above - I'd suggest the following:
Parse the data once, splitting out the matched data into a database table. It looks like you're trying to capture the following fields:
Player_Name | Monetary_Value
If you were to create a database table containing these values per-row, and then catch each new row as it is being created - parse it - and append to the data table - you could easily do any kind of analysis / calculation against the data - without having to parse 25M rows again and again (which is a waste).
Additionally - on the first run, if you were to break the 25M records down into 100,000 record blocks, then run the algorithm 250 times (100,000 x 250 = 25,000,000) - you could enjoy all the performance you're describing with no slow-down, because you're chunking up the job.
In other words - consider the following:
Create a database table as follows:
CREATE TABLE PlayerActions (
RowID INT PRIMARY KEY IDENTITY,
Player_Name VARCHAR(50) NOT NULL,
Monetary_Value MONEY NOT NULL
)
Create an algorithm that breaks your 25m rows down into 100k chunks. Example using LINQ / EF5 as an assumption.
public void ParseFullDataSet(IEnumerable<String> dataSource) {
var rowCount = dataSource.Count();
var setCount = rowCount / 100000; // integer division: number of full chunks
if (rowCount % 100000 != 0)
    setCount++; // plus one chunk for the remainder
for (int i = 0; i < setCount; i++) {
var set = dataSource.Skip(i * 100000).Take(100000);
ParseSet(set);
}
}
public void ParseSet(IEnumerable<String> dataSource) {
String playerName = String.Empty;
decimal monetaryValue = 0.0m;
// Assume here that the method reflects your RegEx generator.
String regex = RegexFactory.Generate();
foreach (String data in dataSource) {
Match match = Regex.Match(data, regex);
if (match.Success) {
playerName = match.Groups[1].Value;
// Might want to add error handling here.
monetaryValue = Convert.ToDecimal(match.Groups[2].Value);
db.PlayerActions.Add(new PlayerAction() {
// ID = ..., // Set at DB layer using Auto_Increment
Player_Name = playerName,
Monetary_Value = monetaryValue
});
// If not using Entity Framework, use another method to insert
// a row to your database table.
}
}
// Save once per chunk rather than once per row; calling SaveChanges
// for every row is itself a significant performance cost.
db.SaveChanges();
}
Run the above one time to get all of your pre-existing data loaded up.
Create a hook someplace which allows you to detect the addition of a new row. Every time a new row is created, call:
ParseSet(new List<String>() { newValue });
or if multiples are created at once, call:
ParseSet(newValues); // Where newValues is an IEnumerable<String>
Now you can do whatever computational analysis or data mining you want from the data, without having to worry about performance over 25m rows on-the-fly.
Regex does take time to compute. However, you can make it more compact using some tricks.
You can also use string functions in C# to avoid the regex functions.
The code would be lengthier but might improve performance.
String has several functions to cut and extract characters and do the pattern matching you need,
e.g. IndexOfAny, LastIndexOf, Contains...
string str = "mon";
string[] str2 = new string[] { "mon", "tue", "wed" };
if (Array.IndexOf(str2, str) >= 0)
{
    // success code
}
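For the "name said (\w*)" case from the question, here is a rough sketch of that string-function approach (a hypothetical helper; \w is approximated as letters, digits and underscore):
static string ExtractSaidValue(string input, string userName)
{
    string marker = userName + " said ";
    int idx = input.IndexOf(marker, StringComparison.Ordinal);
    if (idx < 0) return null;
    int start = idx + marker.Length;
    int end = start;
    while (end < input.Length && (char.IsLetterOrDigit(input[end]) || input[end] == '_'))
        end++;
    return input.Substring(start, end - start);
}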