I need to optimize the code below so it executes faster, whether by using more memory or by running in parallel. Currently it takes 2 minutes to process a single record on a Windows 10 64-bit PC with 16 GB of RAM.
data1 list array length = 1000
data2 list array length = 100000
data3 list array length = 100
for (int d1 = 0; d1 < data1.Count; d1++)
{
if (data1[d1].status == "UNMATCHED")
{
for (int d2 = 0; d2 < data2.Count; d2++)
{
if (data2[d2].status == "UNMATCHED")
{
bool vMatched = false;
for (int d3 = 0; d3 < data3.Count; d3++)
{
if (data3[d3].rule == "rule1")
{
if (data1[d1].value == data2[d2].value)
{
data1[d1].status = "MATCHED";
data2[d2].status = "MATCHED";
vMatched = true;
break;
}
}
else if (data3[d3].rule == "rule2")
{
...
}
else if (data3[d3].rule == "rule100")
{
...
}
}
if (vMatched)
break;
}
}
}
}
First of all, for any kind of performance-oriented programming, avoid using strings for values such as a status; use more appropriate types, like enums or bools, instead. Another recommendation is to profile your code, so you know which parts actually take time.
In the given example only one rule is shown, and the rule check depends on neither d1 nor d2, so the data3 loop could be eliminated by first checking whether this rule exists and only then proceeding with the matching.
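A minimal sketch of that hoisting, assuming "rule1" is the only rule in play. The tuple lists are hypothetical stand-ins for data1, data2 and data3, and a bool replaces the "UNMATCHED"/"MATCHED" strings:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: (Value, Matched) tuples replace the items of data1/data2,
// a bool replaces the status strings, and data3 is reduced to its rule names.
var data1 = new List<(int Value, bool Matched)> { (1, false), (2, false), (3, false) };
var data2 = new List<(int Value, bool Matched)> { (3, false), (1, false), (9, false) };
var data3 = new List<string> { "rule1" };

// The rule check depends on neither d1 nor d2, so it runs once, outside both loops.
bool rule1Active = data3.Contains("rule1");

if (rule1Active)
{
    for (int d1 = 0; d1 < data1.Count; d1++)
    {
        if (data1[d1].Matched) continue;

        for (int d2 = 0; d2 < data2.Count; d2++)
        {
            if (!data2[d2].Matched && data1[d1].Value == data2[d2].Value)
            {
                // The List<T> indexer returns a copy of a value tuple, so write back whole items.
                data1[d1] = (data1[d1].Value, true);
                data2[d2] = (data2[d2].Value, true);
                break;
            }
        }
    }
}

Console.WriteLine(string.Join(", ", data1.Select(x => x.Value + ":" + x.Matched)));
```

This removes the 100x factor from data3, but the data1 x data2 scan is still quadratic.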
This matching between items in data1 & data2 essentially pairs unmatched items with the same value. Whenever problems like this occur, the standard solution is some kind of search structure, like a dictionary, to get better than linear search time. For example
var data2Dictionary = data2.ToDictionary(d => Tuple.Create(d.value, d.status), d => d);
This should drastically decrease the time needed to find an item with a specific value and status. Keep in mind that the code above will throw if multiple items share the same value and status, and that the dictionary key will not be updated if an item's value or status changes.
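One way to sidestep both caveats is sketched below. It is a variation, not the ToDictionary line above: it maps each value to a Queue of data2 indices, so duplicate values are allowed, and a matched item is dequeued instead of relying on a status stored in the key. The int arrays are hypothetical sample data. Because indices are enqueued in ascending order, each data1 item is paired with the lowest-index unmatched data2 item, as in the original loop:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data: the values of data1 and data2.
int[] values1 = { 5, 7, 5, 8 };
int[] values2 = { 5, 5, 7, 1 };
var matched1 = new bool[values1.Length];
var matched2 = new bool[values2.Length];

// Index data2 once: value -> queue of indices that are still unmatched.
// A queue per key tolerates duplicate values, unlike ToDictionary, which throws.
var index = new Dictionary<int, Queue<int>>();
for (int d2 = 0; d2 < values2.Length; d2++)
{
    if (!index.TryGetValue(values2[d2], out var queue))
    {
        queue = new Queue<int>();
        index[values2[d2]] = queue;
    }
    queue.Enqueue(d2);
}

// One dictionary lookup per data1 item instead of a scan over all of data2.
for (int d1 = 0; d1 < values1.Length; d1++)
{
    if (index.TryGetValue(values1[d1], out var queue) && queue.Count > 0)
    {
        int d2 = queue.Dequeue();   // earliest unmatched data2 item with this value
        matched1[d1] = true;
        matched2[d2] = true;
    }
}

Console.WriteLine(string.Join(",", matched1));
```

Building the index is O(m) and the matching pass is O(n) on average, instead of O(n * m).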
You can avoid restarting the second loop from 0 every time by keeping the index of the first item in data2 that is still "UNMATCHED".
This should reduce the complexity.
In the worst case:
Now: 1000 * 100000 * 100 = 10,000,000,000 iterations
New: (1000 + 100000) * 100 = 10,100,000 iterations
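A small sketch of that pointer, using hypothetical int arrays and bool flags in place of data1, data2 and the status strings. Note that the pointer only skips the leading run of already-matched items, so the saving depends on matches landing in data2 roughly in order; an early data2 item that never matches keeps the pointer where it is:

```csharp
using System;

// Hypothetical sample data standing in for data1 and data2.
int[] values1 = { 1, 2, 3 };
int[] values2 = { 1, 2, 3, 4 };
var matched2 = new bool[values2.Length];

int firstUnmatched = 0;   // every data2 index before this one is already matched
int comparisons = 0;

foreach (int v in values1)
{
    // Advance past the leading run of matched items; later gaps are still scanned.
    while (firstUnmatched < values2.Length && matched2[firstUnmatched])
        firstUnmatched++;

    for (int d2 = firstUnmatched; d2 < values2.Length; d2++)
    {
        comparisons++;
        if (!matched2[d2] && values2[d2] == v)
        {
            matched2[d2] = true;
            break;
        }
    }
}

Console.WriteLine(comparisons);   // 3 here; starting every scan at 0 would take 6
```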
I'm working on a project where I need the following code to run when a button is pressed:
if (listView1.SelectedItems[0].SubItems[3].Text == "0") //Checks to see Value
{
listView1.SelectedItems[0].SubItems[3].Text = "1";// If Value is Greater, Increase and Change ListView
questionNumberLabel.Text = listView1.SelectedItems[0].SubItems[3].Text;// Increase and Change Label
}
Right now I have this repeated about 10 times, with each value increasing by one. I know this is ugly and hard to maintain, and it bloats the file. I've tried a few things, primarily this:
if (listView1.SelectedItems[0].SubItems[3].Text == "0")
{
for (var i = 1; i < 100;)
{
if (!Int32.TryParse(listView1.SelectedItems[0].SubItems[3].Text, out i))
{
i = 0;
}
i++;
listView1.SelectedItems[0].SubItems[3].Text = i.ToString();
Console.WriteLine(i);
}
}
But instead of just adding one, it runs all 100 iterations and then ends. The reason this is becoming a pain in the *** is that
listView1.SelectedItems[0].SubItems[3].Text
is just that: a string, not an int. That's why I parsed it and tried to work with the number, but it still isn't giving me the outcome I want.
I've also tried this:
string listViewItemToChange = listView1.SelectedItems[0].SubItems[3].Text;
and then parsing that string, to make it prettier. It behaved the same as before and still didn't give me the outcome I want. To reiterate: I want the string taken from the ListView converted to an int, used in the for loop, incremented by 1, then converted back to a string and shown in my ListView.
Please help :(
You say you want the text from a ListView subitem converted to an int, which is then used in a loop.
So: first you create your loop variable, i; then inside the loop you assign it up to three different values, two of which are immediately changed again by the i++. None of it makes sense, and you shouldn't manipulate your loop variable like that (unless you understand what you're doing).
If you move the statements around a little:
int itemsToCheck = 10; // "Now I have this repeated about 10 times "
for (var item = 0; item < itemsToCheck; item++)
{
int i;
if (!Int32.TryParse(listView1.SelectedItems[item].SubItems[3].Text, out i))
{
i = 0;
}
i++;
listView1.SelectedItems[item].SubItems[3].Text = i.ToString();
Console.WriteLine(i);
}
Something along those lines is what you're looking for. I haven't changed what your code does with i; I just added a loop count, itemsToCheck, and used a separate loop variable so that your loop variable and the parsed value are not one and the same, which would likely be buggy.
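If the goal is simply to add one to the subitem each time the button is clicked, the parse/increment/format step can live in a small helper with no loop at all. A minimal sketch; NextNumber is a hypothetical name, and the ListView wiring appears only in a comment because it needs a WinForms form to run:

```csharp
using System;

// Hypothetical helper: parse the text, treat non-numeric text as 0, add one,
// and return the new text. In the button's click handler it would be used like:
//   var sub = listView1.SelectedItems[0].SubItems[3];
//   sub.Text = NextNumber(sub.Text);
//   questionNumberLabel.Text = sub.Text;
static string NextNumber(string text)
{
    int current = int.TryParse(text, out int parsed) ? parsed : 0;
    return (current + 1).ToString();
}

Console.WriteLine(NextNumber("0"));
Console.WriteLine(NextNumber("7"));
```

Each click then advances the value by exactly one, which replaces the ten repeated if blocks.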
Maybe this gives you an idea. You can use this syntax (declaring the out variable inline) starting with C# 7.0:
var s = listView1.SelectedItems[0].SubItems[3].Text;
var isNumeric = int.TryParse(s, out int n);
if(isNumeric is true && n > 0){
questionNumberLabel.Text = s;
}
Or, more concisely:
var s = listView1.SelectedItems[0].SubItems[3].Text;
if(int.TryParse(s, out int n) && n > 0){
questionNumberLabel.Text = s;
}
I have a file with comma-separated values (CSV) in this format:
26/09/2015,GROUP_1,0,0,0,0,0,0,0,0,0,0,12345.006,12345.006,27469.005,27469.005,27983.005,27983.005,28081.005,0,0,0,28105.005,28105.005,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Every number represents the work hours over a 15-minute interval, in the range 8:00 am - 8:00 pm. The first start time is 08:00:00 and the last start time is 19:45:00; there are 49 "columns" of data.
0,0,0,0,0,0,0,0,0,0,12345.006,12345.006,27469.005,27469.005,27983.005,27983.005,28081.005,0,0,0,28105.005,28105.005,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
The date is the date when the "event" happened; it's the date in the data. What I need is to find values that repeat in consecutive columns and report the time range they cover. For example, the first two non-zero values are the same:
12345.006,12345.006
These start at 10:30 and 10:45; I need to merge these and report 12345 hours for the time span 10:30 - 11:00 am.
I read the file and have those values as an array; the problem I'm having is how to "group" the same values into the appropriate time ranges. This is what I have so far:
DateTime startDate = new DateTime(2015,08,05);
DateTime finishDate = new DateTime(2015,08,05);
int column = 0;
for (int i = 0; i < data.Length; i++)
{
//timerange start with every 15 minutes by column
if (column >= 2)
{
if (data[i] != "0")
{
//Getting rid of decimals; they are not necessary, and that's how the file has it, I don't know why
if (data[i].Contains('.'))
{
data[i] = data[i].Substring(0, data[i].LastIndexOf('.'));
}
//we check if there is a next index to compare the same value
if ((i + 1) < data.Length)
{
var nextElem = data[i + 1];
if (nextElem != "0")
{
//strip the decimals from the next value too (Substring returns a new string, so assign it)
if (nextElem.Contains('.'))
{
nextElem = nextElem.Substring(0, nextElem.LastIndexOf('.'));
}
}
else
{
//the next value is 0... something here
}
//CRUCIAL PART: if the current value is the same as the next one, it means they share the time range
if (data[i] == nextElem)
{
//the same value as the next one
//I need to identify when it's the first time I'm comparing a value with the next one, so I can set a start date
//I need to sum the total amount of time ranges for every repetition they have and save when the value starts and when the value is different (so it's a new value)
}
else
{
//it's not the same value, so technically the finishdate will be set here?
}
}
else
{
//there is not more indexes, so finishdate will be here
}
}
}
//column++;
}
I hope I explained that clearly. Thanks!
You need to create a class like the one in the code below. I changed the date to US format for testing. The code below reads from a string using StringReader; when reading from a file, use StreamReader instead.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
List<DataSample> samples = new List<DataSample>();
string data = "9/26/2015,GROUP_1,0,0,0,0,0,0,0,0,0,0,12345.006,12345.006,27469.005,27469.005,27983.005,27983.005,28081.005," +
"0,0,0,28105.005,28105.005,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0\n";
StringReader reader = new StringReader(data);
string inputline = "";
while ((inputline = reader.ReadLine()) != null)
{
string[] dataArray = inputline.Split(new char[] { ',' });
DateTime startDate = DateTime.Parse(dataArray[0]);
startDate = startDate.AddHours(8);
DateTime timeCounter = startDate;
string groupName = dataArray[1];
for (int i = 2; i < dataArray.Length; i++)
{
if (dataArray[i] != "0")
{
DataSample newSample = new DataSample();
samples.Add(newSample);
newSample.name = groupName;
newSample.time = timeCounter;
newSample.value = double.Parse(dataArray[i]);
}
timeCounter = timeCounter.AddMinutes(15);
}
}
var groupByValue = samples.AsEnumerable()
.GroupBy(x => x.value)
.ToList();
foreach (var group in groupByValue)
{
Console.WriteLine("Value : {0}, Times : {1}", group.Key.ToString(), string.Join(",",group.Select(x => x.time.ToString())));
}
Console.ReadLine();
}
}
public class DataSample
{
public string name { get; set; }
public DateTime time { get; set; }
public double value { get; set; }
}
}
This is not a coding service; you need to go a little farther. You've done a good job of outlining your algorithm; now, you should put in a few print statements to track the operation of your code. Do the loops and if statements give you the control flow you expected? A good way to do this is to put your comments into print statements, such as in your last inside comment:
Console.WriteLine("there is not more indexes, so finishdate will be here");
Also print the loop index, values you found, etc.
Once you have corrected any flow problems there, start filling in the comment-only blocks with useful code, a few lines or one block at a time.
When you hit a specific problem, post your code and the actual output. That's where StackOverflow is designed to help you.
In the meantime, I'd like to suggest a change to your outer loop: drive it with a while loop, so you can freely advance your index as needed. Right now you're keeping two variables for almost the same purpose, i and column. Instead, use just one, something like this (assuming data holds only the 15-minute values, without the date and group name):
int column = 0;
while (column < data.Length)
{
// Find all of the indices with the same consecutive value
int finishIndex = column;
while (finishIndex + 1 < data.Length && data[finishIndex + 1] == data[column])
{
finishIndex++;
}
// You now have the range of work slots to merge.
Console.WriteLine("Time slots {0} - {1} have {2} work hours", column, finishIndex, (int)double.Parse(data[column], System.Globalization.CultureInfo.InvariantCulture));
column = finishIndex + 1;
}
You will still have to convert column numbers to times: 15 minutes * column + 8:00am. I've also left out a few intermediate nice steps, but I think that you already have them in your comments.
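A minimal sketch that combines the run-merging loop with that column-to-time conversion, assuming slot 0 starts at 08:00, every slot lasts 15 minutes, and "0" slots should not be reported. MergeRuns is a hypothetical name; the sample line is the one from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Merge runs of equal consecutive values into (start, end, hours) ranges; end is exclusive.
static List<(TimeSpan Start, TimeSpan End, double Hours)> MergeRuns(string[] values)
{
    var ranges = new List<(TimeSpan Start, TimeSpan End, double Hours)>();
    int column = 0;
    while (column < values.Length)
    {
        int finish = column;
        while (finish + 1 < values.Length && values[finish + 1] == values[column])
        {
            finish++;
        }
        double hours = double.Parse(values[column], CultureInfo.InvariantCulture);
        if (hours != 0)
        {
            // 8:00 am + 15 minutes * column
            TimeSpan start = TimeSpan.FromHours(8) + TimeSpan.FromMinutes(15 * column);
            TimeSpan end = TimeSpan.FromHours(8) + TimeSpan.FromMinutes(15 * (finish + 1));
            ranges.Add((start, end, hours));
        }
        column = finish + 1;
    }
    return ranges;
}

string line = "26/09/2015,GROUP_1,0,0,0,0,0,0,0,0,0,0,12345.006,12345.006,27469.005,27469.005,27983.005,27983.005,28081.005,0,0,0,28105.005,28105.005,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0";
string[] slots = line.Split(',').Skip(2).ToArray();   // drop the date and group name

var merged = MergeRuns(slots);
foreach (var r in merged)
{
    string startText = r.Start.ToString(@"hh\:mm");
    string endText = r.End.ToString(@"hh\:mm");
    Console.WriteLine(startText + " - " + endText + ": " + r.Hours + " hours");
}
```

The first range comes out as 10:30 - 11:00 with 12345.006 hours, which matches the example in the question.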
Does this get you moving?
I am comparing two lists of data that were generated from a binary file. I have a good idea of why it's running slowly: when there's a significant number of records, it does unneeded, redundant work.
For example, if 1a == 1a, the condition is true. Since 2a != 1a, why even bother checking it? I need to stop 1a from being checked again; otherwise, it will still check the first record when it goes to check the 400,000th record. I thought about making the second for loop a foreach, but I can't remove 1a while iterating through the nested loop.
The number of items in either 'for loop' can vary. I don't think a single for loop using 'i' will work, since the match can be anywhere. I'm reading from a binary file.
This is my current code. The program has been running for over an hour, and it's still going. I removed a lot of my iterating code for readability.
for (int i = 0; i < origItemList.Count; i++)
{
int modFoundIndex = 0;
Boolean foundIt = false;
for (int g = 0; g < modItemList.Count; g++)
{
if ((origItemList[i].X == modItemList[g].X)
&& (origItemList[i].Y == modItemList[g].Y)
&& (origItemList[i].Z == modItemList[g].Z)
&& (origItemList[i].M == modItemList[g].M))
{
foundIt = true;
modFoundIndex = g;
break;
}
else
{
foundIt = false;
}
}
if (foundIt)
{
/*
* This runs assuming it finds a matching x,y,z,m
coordinate. It then checks the database file.
*
*/
//grab the rows where the coordinates match
DataRow origRow = origDbfFile.dataset.Tables[0].Rows[i];
DataRow modRow = modDbfFile.dataset.Tables[0].Rows[modFoundIndex];
//number matched indicates how many columns were matched
int numberMatched = 0;
//get the number of columns to match in order to detect all changes
int numOfColumnsToMatch = origDbfFile.datatable.Columns.Count;
List<String> mismatchedColumns = new List<String>();
//check each column name for a change
foreach (String columnName in columnNames)
{
//this grabs whatever value is in that field
String origRowValue = "" + origRow.Field<Object>(columnName);
String modRowValue = "" + modRow.Field<Object>(columnName);
//check if they are the same
if (origRowValue.Equals(modRowValue))
{
//if they are the same, increase the number matched by one
numberMatched++;
}
else
{
//add the column to the list of columns that don't match
mismatchedColumns.Add(columnName);
}
}
/* In the event it matches 15/16 columns, show the change */
if (numberMatched != numOfColumnsToMatch)
{
//Grab the shapeFile in question
Item differentAttrShpFile = origItemList[i];
//start blue highlighting
result += "<div class='turnBlue'>";
//show where the change was made at
result += "Change Detected at<br/> point X: " +
differentAttrShpFile.X + ",<br/> point Y: " +
differentAttrShpFile.Y + ",<br/>";
result += "</div>"; //end turnblue div
foreach (String mismatchedColumn in mismatchedColumns)
{
//iterate changes here
}
}
}
}
You're coming at this in a totally wrong way. The loop you have is O(n^2); breaking when you find the match will, on average, cut the time in half for a hit, and that's not enough. If you have a quarter million items in the list, this loop executes up to 62 billion times, and even if the compiler optimizes out the extra array lookups, you're still looking at at least a trillion instructions. Don't do O(n^2) for large n if you can possibly help it!
What you need to do is get rid of the O(n^2) aspect of this. My suggestion:
1) Define a hashing function that looks at x, y, z and m and produces an integer value; my inclination would be to use one that's the word size of your target platform.
2) Iterate over both lists and compute hashes for everything.
3) Build an index over one of the lists, mapping each hash to its object. I suspect a dictionary is the best data structure here, but a simple sorted array would also do.
4) Iterate over the list you didn't build the index over and look up each hash in the index. With a dictionary that's an O(n) task; with a sorted array and binary search it's O(n log n).
5) When the hashes match, do a full comparison to confirm the hit is real: you can get the occasional collision even with a good 64-bit hash, and you'll get a decent number of them if your hashes are 32-bit.
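A minimal C# sketch of steps 2-5, with one substitution: instead of a hand-written hash, it keys a Dictionary on a value tuple of the four coordinates. ValueTuple combines the fields' hash codes and compares field by field in Equals, and Dictionary calls Equals after a hash hit, so the confirmation in step 5 is built in. The coordinate type (double) and the sample lists are assumptions standing in for origItemList and modItemList:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical coordinates standing in for origItemList / modItemList items.
var origItems = new List<(double X, double Y, double Z, double M)>
{
    (1, 2, 3, 4), (5, 6, 7, 8), (9, 9, 9, 9)
};
var modItems = new List<(double X, double Y, double Z, double M)>
{
    (9, 9, 9, 9), (1, 2, 3, 4), (0, 0, 0, 0)
};

// Steps 2-3: index modItems once, mapping each coordinate tuple to its row index.
var modIndex = new Dictionary<(double, double, double, double), int>();
for (int g = 0; g < modItems.Count; g++)
{
    if (!modIndex.ContainsKey(modItems[g]))
        modIndex[modItems[g]] = g;   // keep the first occurrence, like the original break
}

// Steps 4-5: one average O(1) lookup per original item; Equals confirms each hit.
var matches = new List<(int OrigIndex, int ModIndex)>();
for (int i = 0; i < origItems.Count; i++)
{
    if (modIndex.TryGetValue(origItems[i], out int modFoundIndex))
        matches.Add((i, modFoundIndex));
}

foreach (var (origIndex, modPos) in matches)
    Console.WriteLine("orig row " + origIndex + " -> mod row " + modPos);
```

Each pair (i, modFoundIndex) can then feed the existing DataRow column comparison unchanged.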
This is similar to what Loren said, but expressed in .NET terms :)
1. Override the GetHashCode method to return a (weighted) sum of x, y, z and m, and override the Equals method to compare these sums.
2. Before the loop, iterate over modItemList (a List) and build a HashSet from it.
3. In the loop over origItemList, first check whether origItemList[i] exists in the HashSet using YourModHashSet.Contains(MyObject).
4. If .Contains returns false, carry on: there is no match.
5. If .Contains returns true, iterate through modItemList and apply your current x, y, z and m comparison. Use the List rather than the HashSet here, because the set keeps only one of several objects that share a hash code.
Also, I would use foreach instead of for, because I've seen foreach give slightly better results (5 to 30% faster) in cases like this.
Update:
I created the MyObject class like this:
public class MyObject
{
public int X, Y, Z, M;
public override int GetHashCode()
{
return X*10000 + Y*100 + Z*10 + M;
}
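public override bool Equals(object obj)
{
// equal hash codes are treated as equal objects; the full X/Y/Z/M check happens later
return obj is MyObject other && other.GetHashCode() == this.GetHashCode();
}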
}
The GetHashCode method is important here. We don't want many false positives; a false positive occurs when the hash matches for some other combination of X, Y, Z and M. The best way to prevent false positives is to multiply each member so that each one occupies its own decimal digit(s) in the hash code. Make sure the result does not exceed int.MaxValue. If the expected values of X, Y, Z and M are small, you should be fine.
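To make the "small values" condition concrete: with the weights 10000, 100, 10 and 1 used in MyObject, M and Z each get one decimal digit and Y gets two, so as soon as M reaches 10 it spills into Z's digit. A tiny standalone check, where WeightedHash is a hypothetical helper mirroring MyObject.GetHashCode:

```csharp
using System;

// Hypothetical helper mirroring MyObject.GetHashCode: X*10000 + Y*100 + Z*10 + M.
static int WeightedHash(int x, int y, int z, int m) => x * 10000 + y * 100 + z * 10 + m;

// Two different points: the second has M = 10, which overlaps Z's decimal digit.
int a = WeightedHash(0, 0, 1, 0);
int b = WeightedHash(0, 0, 0, 10);

Console.WriteLine(a);       // 10
Console.WriteLine(b);       // 10
Console.WriteLine(a == b);  // True: a false positive for the HashSet check
```

So this scheme stays collision-free only while M and Z are single digits and Y is below 100 (all non-negative); the full X/Y/Z/M comparison afterwards is what keeps the results correct.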
var set2 = new HashSet<MyObject>(); // list1 and list2 are List<MyObject>
var s1 = DateTime.Now;
MyObject matchingElement;
int totalmatch = 0;
foreach (MyObject elem in list2)
set2.Add(elem);
foreach (MyObject t1 in list1)
{
if (set2.Contains(t1))
{
matchingElement = null;
foreach (MyObject t2 in list2)
{
if (t1.X == t2.X && t1.Y == t2.Y && t1.Z == t2.Z && t1.M == t2.M)
{
totalmatch++;
matchingElement = t2;
break;
}
}
//Do Something on matchingElement if not null
}
}
Console.WriteLine("set foreach with contains: " + (DateTime.Now - s1).TotalSeconds + "\t Total Match: " + totalmatch);
The code above is the sample I was trying to describe in my answer. It should run very fast if only a few matches are expected.