Possible memory leak in simple batch file processing function in C#

I'm running a very simple function that reads lines from a text file in batches. Each line contains a SQL query, so the function grabs a specified number of queries, executes them against the SQL database, then grabs the next batch of queries until the entire file has been read. The problem is that, with very large files, the process slows down considerably over time. I'm guessing there is a memory leak somewhere in the function, but I can't determine where it might be. Nothing else is running while this function executes. My programming skills are crude at best, so please go easy on me. :)
for (int x = 0; x <= totalBatchesInt; x++)
{
    var lines = System.IO.File.ReadLines(file).Skip(skipCount).Take(batchSize).ToArray();
    string test = string.Join("\n", lines);
    SqlCommand cmd = new SqlCommand(test.ToString());
    try
    {
        var rowsEffected = qm.ExecuteNonQuery(CommandType.Text, cmd.CommandText, 6000, true);
        totalRowsEffected = totalRowsEffected + rowsEffected;
        globalRecordCounter = globalRecordCounter + rowsEffected;
        fileRecordCounter = fileRecordCounter + rowsEffected;
        skipCount = skipCount + batchSize;
        TraceSource.TraceEvent(TraceEventType.Information, (int)ProcessEvents.Starting,
            "Rows progress for " + folderName + "_" + fileName + " : " + fileRecordCounter.ToString() +
            " of " + linesCount + " records");
    }
    catch (Exception esql)
    {
        TraceSource.TraceEvent(TraceEventType.Information, (int)ProcessEvents.Cancelling,
            "Error processing file " + folderName + "_" + fileName + " : " + esql.Message.ToString() +
            ". Aborting file read");
    }
}

There are many things wrong with your code:
1. You never dispose your command. SqlCommand implements IDisposable, so leaving it for the garbage collector to clean up is bad practice; wrap it in a using block (or, since only its CommandText is passed to qm.ExecuteNonQuery, don't create it at all).
2. You shouldn't be sending those commands individually anyway. Either send them all at once in one command, or use transactions to group them together.
3. This one is the reason it's getting slower over time: File.ReadLines(file).Skip(skipCount).Take(batchSize) reads the same file from the beginning on every batch and skips a growing number of lines each time. Skipped lines are still read and discarded, so each batch is slower than the last, and the total work grows roughly with the square of the number of batches.
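To see why re-reading with Skip/Take gets slower over time, a minimal sketch can count how many lines each approach pulls from the source. It replaces File.ReadLines with an in-memory counting sequence (an assumption, so it runs without a file), assumes batchSize is positive, and the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RereadCost
{
    // Stand-in for File.ReadLines: yields totalLines items and counts every line produced.
    private static IEnumerable<int> CountingLines(int totalLines, long[] produced)
    {
        for (int i = 0; i < totalLines; i++)
        {
            produced[0]++;
            yield return i;
        }
    }

    // The question's pattern: restart the sequence and Skip past earlier lines for every batch.
    public static long SkipTakeCost(int totalLines, int batchSize)
    {
        var produced = new long[1];
        for (int skip = 0; skip < totalLines; skip += batchSize)
        {
            _ = CountingLines(totalLines, produced).Skip(skip).Take(batchSize).ToArray();
        }
        return produced[0];
    }

    // The fix: one enumerator, consumed batch by batch (assumes batchSize > 0).
    public static long SinglePassCost(int totalLines, int batchSize)
    {
        var produced = new long[1];
        using var lines = CountingLines(totalLines, produced).GetEnumerator();
        var batch = new List<int>(batchSize);
        do
        {
            batch.Clear();
            while (batch.Count < batchSize && lines.MoveNext())
                batch.Add(lines.Current);
        } while (batch.Count == batchSize);
        return produced[0];
    }
}
```

For 10,000 lines in batches of 100, the single pass pulls each line once, while the Skip/Take loop pulls at least 505,000 lines, and that gap keeps widening as the file grows.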
To fix #3, simply create the enumerator once and iterate it in batches. In pure C#, you can do something like:
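using var enumerator = File.ReadLines(file).GetEnumerator();
for (int x = 0; x <= totalBatchesInt; x++)
{
    var lines = new List<string>(batchSize);
    // Check the count first: calling MoveNext() on a full batch would advance
    // past a line and silently drop it.
    while (lines.Count < batchSize && enumerator.MoveNext())
        lines.Add(enumerator.Current);
    if (lines.Count == 0)
        break; // end of file
    string test = string.Join("\n", lines);
    // your code...
}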
Or if you're using MoreLINQ (which I recommend), something like this:
foreach(var lines in File.ReadLines(file).Batch(batchSize))
{
// your code...
}
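If neither MoreLINQ's Batch nor the built-in Enumerable.Chunk (available from .NET 6) is an option, a hand-rolled batching iterator is short. This is a sketch, not a library API: the name InBatches is hypothetical, it assumes batchSize is positive, and it pulls lines lazily so the file is enumerated only once.

```csharp
using System;
using System.Collections.Generic;

public static class BatchExtensions
{
    // Splits a sequence into consecutive batches of at most batchSize items,
    // enumerating the source exactly once (unlike repeated Skip/Take).
    public static IEnumerable<List<T>> InBatches<T>(this IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate(source, batchSize); // separate iterator so argument checks run eagerly
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // fresh list so callers can keep earlier batches
            }
        }
        if (batch.Count > 0)
            yield return batch; // final, possibly shorter, batch
    }
}
```

Usage would mirror the MoreLINQ version: foreach (var lines in File.ReadLines(file).InBatches(batchSize)) { string test = string.Join("\n", lines); ... }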

Related

Second for loop isn't running inside my update

This if statement inside Update() contains two for loops, but only the first one runs once the if condition is met, and I don't know why.
I'm writing path-optimization code in Unity. I need to find the path that passes through a set of nodes (points/positions) given by a positions array, where the array index is the order the path should follow. Some paths between two nodes are repeated; for example, A to B and B to A are considered the same path, and each repeat should eventually thicken the rendered line AB. So I tried to sort the positions array into two separate arrays so I could check whether any pair of nodes (that is, any line) is repeated. That's where I ran into a problem with the if statement in Update().
The first loop should sort the original array for later comparison. The second one is just for testing whether the first one did its job; there is no comparison yet. However, after hitting Play and satisfying the if condition, I can see all the Debug.Log output from the first loop and everything looks normal, including the sorting, while the second loop doesn't print anything at all.
I tried commenting out the first loop, and then the second one runs.
I tried moving the second loop outside the if statement, after it, without commenting out the first one, and the second one still doesn't run.
I tried putting the second loop before the first one inside the if statement; then the second one runs and the first one doesn't.
So I think this might be some kind of syntax error, or am I using the if statement wrong? Please help.
if (l > 0)//activate when we choose any preset processes
{
for (int n = 0; n <= positions.Length; n++)//this loop will sort all the pos1 and pos 2 into array for current frame
{
curPos_1 = positions[n];//current position of node 1
curPos_2 = positions[n + 1];
Debug.Log("CURPOS_1 of line number " + n + " is " + curPos_1);
Debug.Log("CURPOS_2 of line number " + n + " is " + curPos_2);
flag[n] = 0;
Pos_1[n] = curPos_1;
Pos_2[n] = curPos_2;
Debug.Log("POS_1 array of line number " + n + " is " + Pos_1[n]);
Debug.Log("POS_2 array of line number " + n + " is " + Pos_2[n]);
}
for (int o = 0; o <= positions.Length; o++)
{
Debug.Log("flag of number " + o + " is " + flag[o]);
}
}
As described, both for loops should print something, not just one of them.
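Have you checked your Unity Console window?
In your first loop you read positions[n + 1], but the condition n <= positions.Length lets n keep going to the end of the array, so at n = positions.Length - 1 the index n + 1 is already past the end (an off-by-one error). That throws an IndexOutOfRangeException, which aborts the rest of that Update() call, including the second loop.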
Correct code should be something like this:
var floats = new float[100];
for (var i = 0; i < floats.Length - 1; i++)
{
var f1 = floats[i];
var f2 = floats[i + 1];
}
Now, Unity behaves a bit like ON ERROR RESUME NEXT here: an unhandled exception in Update() is logged to the Console and the game keeps running, so it's highly probable that an error has occurred but you haven't seen it (did you turn off the red error toggle in the Console?).
Also, for some conditions only you know about (you didn't post the whole context), it could work once after you've changed some state of your program.
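With the loop bound written as in the snippet above (i < Length - 1), the goal described in the question, treating A to B and B to A as the same segment and counting repeats, can be sketched with a dictionary keyed on an order-independent pair. This sketch assumes plain int node ids instead of Unity's Vector3 so it runs outside Unity; SegmentCounter is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class SegmentCounter
{
    // Counts how often each undirected segment appears in a path of node ids.
    // The key stores the smaller id first, so (A, B) and (B, A) map to the same entry.
    public static Dictionary<(int, int), int> Count(int[] path)
    {
        var counts = new Dictionary<(int, int), int>();
        // i < path.Length - 1 keeps path[i + 1] inside the array.
        for (int i = 0; i < path.Length - 1; i++)
        {
            int a = path[i], b = path[i + 1];
            var key = a < b ? (a, b) : (b, a);
            counts.TryGetValue(key, out int n); // n is 0 when the key is new
            counts[key] = n + 1;
        }
        return counts;
    }
}
```

In Unity, the ids could be indices into a list of distinct node positions, and the count for each key could then drive the width of the rendered line.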

Excel Interop Secondary AxisGroup only appearing sometimes

So, I wrote a short little C# program that will take some text files and produce some scatterplots out of them. The plot has 2 series on it and 2 y-axes. For some reason, the second series does not appear on every run. However, it always appears if you step through the program with the debugger.
My code for the 2 series is as follows:
//show records data
Microsoft.Office.Interop.Excel.Series series1b = seriesCollection2.NewSeries();
series1b.AxisGroup = XlAxisGroup.xlPrimary;
series1b.Name = "Records";
series1b.XValues = ws.get_Range("A" + FACT_TABLE_START.ToString() + ":A" + FACT_TABLE_END.ToString());
series1b.Values = ws.get_Range("D" + FACT_TABLE_START.ToString() + ":D" + FACT_TABLE_END.ToString());
System.Threading.Thread.Sleep(500);
//show duration data
Microsoft.Office.Interop.Excel.Series series2b = seriesCollection2.NewSeries();
series2b.AxisGroup = XlAxisGroup.xlSecondary;
series2b.Name = "Duration";
series2b.XValues = ws.get_Range("A" + FACT_TABLE_START.ToString() + ":A" + FACT_TABLE_END.ToString());
series2b.Values = ws.get_Range("E" + FACT_TABLE_START.ToString() + ":E" + FACT_TABLE_END.ToString());
Probably about 3/4 of the time, the second series will appear just fine. But the other 1/4, series2b will not appear on the plot. I am guessing something asynchronous is going on here? I added the Sleep(500) statement because it seemed to make series2b appear more often.
Why are my graphs only sometimes being created properly?

Method of storing strings other than List<string>

I'm developing a poker app and want to store the names of all the card combinations. I'm planning to use a list and do something like this:
private static List<string> combinationNames = new List<string>
{
" High Card ",
" Pair ",
" Two Pair ",
" Three of a Kind ",
" Straight ",
" Flush ",
" Full House ",
" Four of a Kind ",
" Straight Flush ",
" Royal Flush ! "
};
for (int j = 0; j < combinationNames.Count; j++)
{
if (current == j)
{
MessageBox.Show("You Have : ", combinationNames[j]);
}
}
So is there a better way of storing those names and later access them like I did ?
There's not much to go on in your question to understand what's specifically wrong with the code you have. That said, at the very least I would expect the following to be an improvement:
private static readonly string[] combinationNames =
{
" High Card ",
" Pair ",
" Two Pair ",
" Three of a Kind ",
" Straight ",
" Flush ",
" Full House ",
" Four of a Kind ",
" Straight Flush ",
" Royal Flush ! "
};
if (current >= 0 && current < combinationNames.Length)
{
// The two-argument overload is Show(text, caption), which would put the name in the title bar.
MessageBox.Show("You Have : " + combinationNames[current]);
}
I.e.:
Since the list won't change, it can be an array instead of a list.
Since the list object won't change, the variable can be readonly.
All your code does with j is compare it to current, so there's no need to enumerate every possible value of j; just make sure current is within the valid range and then use its value directly.
A note on that last point: it's not really clear where you get current from, but it should probably already be guaranteed valid before you get as far as displaying the text, so you shouldn't really need the range check. I only added it so that the new code stays reasonably consistent with the behavior of the code you showed (what little was there).
If you need more specific advice than the above, please explain more precisely what you think would be "better" and in what way the code you have now is not sufficiently addressing your needs. I.e. what does the code do now and how is that different from what you want it to do?
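If current is meant to be a hand rank, one option is to give it a type: a hypothetical HandRank enum whose values line up with the array indices, so the lookup can't silently drift out of order. This is a sketch under that assumption; the trimmed names and the message format are illustrative, not taken from the question.

```csharp
using System;

public enum HandRank
{
    HighCard, Pair, TwoPair, ThreeOfAKind, Straight,
    Flush, FullHouse, FourOfAKind, StraightFlush, RoyalFlush
}

public static class HandNames
{
    // Indexed by (int)HandRank, so the order must match the enum declaration.
    private static readonly string[] Names =
    {
        "High Card", "Pair", "Two Pair", "Three of a Kind", "Straight",
        "Flush", "Full House", "Four of a Kind", "Straight Flush", "Royal Flush!"
    };

    public static string Describe(HandRank rank)
    {
        int i = (int)rank;
        if (i < 0 || i >= Names.Length)
            throw new ArgumentOutOfRangeException(nameof(rank)); // e.g. an invalid cast value
        return "You have: " + Names[i];
    }
}
```

The call site would then be MessageBox.Show(HandNames.Describe(rank)), with no loop over indices.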

Outlook.Items Restrict() weird behavior

I just want to filter my mails with the Restrict method, like so:
restriction += "[ReceivedTime] < '" + ((DateTime)time).ToString("yyyy-MM-dd HH:mm") + "'";
var count = oFolder.Items.Restrict(restriction).Count;//Cast<object>().ToList();
for (int i = 0; i < count; i++)
{
var crntReceivedTime = ((OutLook.MailItem)oFolder.Items.Restrict(restriction).Cast<object>().ToList()[i]).ReceivedTime;
if (crntReceivedTime > time)
{
string t = "";
}
}
Theoretically, the line string t = ""; should never be reached, because the restriction should exclude every item whose ReceivedTime is later than time.
The problem is that the line is reached, which means the restricted Items collection contains entries it shouldn't.
Did I do something wrong or is the Restrict()-method just failing?
Firstly, you are using multiple dot notation. You are calling Restrict (which is expensive even if it is called only once) on every iteration of the loop, and converting the whole result to a list each time as well. Call it once, cache the returned (restricted) Items collection, then loop over the items in that collection.
Secondly, what is the full restriction? You are using += to add an extra restriction on ReceivedTime. What is the actual value of the restriction variable?
Edit: I had no problem with the following script executed from OutlookSpy (I am its author - click Script button, paste the script, click Run):
restriction = " [ReceivedTime] < '2011-06-11 00:00' "
set Folder = Application.ActiveExplorer.CurrentFolder
set restrItems = Folder.Items.Restrict(restriction)
for each item in restrItems
if TypeName(item) = "MailItem" Then
Debug.Print item.ReceivedTime & " - " & item.Subject
End If
next
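On the question of what the full restriction actually is: in a .NET custom format string, ':' is replaced by the current culture's time separator, so formatting with CultureInfo.InvariantCulture keeps the date in the shape used in the script above, and conditions appended with += need an explicit AND between them. A minimal sketch with hypothetical helper names; it only builds the filter string, which would then be passed to a single Restrict call whose result is cached.

```csharp
using System;
using System.Globalization;

public static class OutlookFilter
{
    // Formats the date with the invariant culture so ':' is not localized.
    public static string ReceivedBefore(DateTime time) =>
        "[ReceivedTime] < '" + time.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture) + "'";

    // Joins filter conditions with AND instead of plain string concatenation.
    public static string And(string existing, string condition) =>
        string.IsNullOrWhiteSpace(existing) ? condition : existing + " AND " + condition;
}
```

Whether Outlook parses this date format depends on the machine's regional settings, so it is worth printing the final string before calling Restrict.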

ODBCDataReader hangs randomly?

So, I have a huge query that I need to run on an Access DB. I'm trying to use a for loop to break it up because I can't run the query all at once (it has an IN clause with 50k values). The reader is causing all kinds of problems, hanging and such. Most of the time, when I break the loop into chunks of 50 to 10,000 values, the reader reads 400 (exactly 400) values, hangs for about 3 minutes, then does another hundred or so, hangs again, ad infinitum. If I use more than 10k values per query, it gets to 2696, hangs, does another 1k or so, and so on. I have never really worked with ODBC, SQL or any kind of database, so it must be something stupid, or is this expected? Maybe there's a better way to do something like this? Here's the code that is looped:
//connect to mdb
OdbcConnection mdbConn = new OdbcConnection();
mdbConn.ConnectionString = @"Driver={Microsoft Access Driver (*.mdb)};DBQ=C:\PINAL_IMAGES.mdb;";
mdbConn.Open();
OdbcCommand mdbCmd = mdbConn.CreateCommand();
mdbCmd.CommandText = @"SELECT RAW_NAME,B FROM 026_006_886 WHERE (B='CM1' OR B='CM2') AND MERGEDNAME IN" + imageChunk;
OdbcDataReader mdbReader = mdbCmd.ExecuteReader();
while (mdbReader.Read())
{
sw.WriteLine(@"for /R %%j in (" + mdbReader[0] + @") do move %%~nj.tif .\" + mdbReader[1] + @"\done");
linesRead++;
Console.WriteLine(linesRead);
}
mdbConn.Close();
Here's how I populate the imageChunk variable for the IN clause, by reading 5000 lines (one value per line) from a text file with a StreamReader:
string imageChunk = "(";
for (int j = 0; j < 5000; j++)
{
string image;
if ((image = sr.ReadLine()) != null)
{
imageChunk += @"'" + sr.ReadLine() + @"',";
}
else
{
break;
}
}
imageChunk = imageChunk.Substring(0, imageChunk.Length - 1);
imageChunk += ")";
Your connection to the DB and execution of the queries seem OK to me. I suspect the "hanging" comes from running the query multiple times. A couple of tips for speed: columns B and MERGEDNAME should have indexes on them, and refactoring your table structure may also improve speed. Are your MergedNames truly random? If so, you are probably stuck with the speed you have. As @Remou suggests, I would also compare the total runtime of uploading your MergedNames list to a table, joining against that table to get your results, and then deleting the table on completion.
Ended up using a data adapter... Was slow but provided constant feedback instead of freezing up. Never really got a good answer why, but got some advice on smarter ways to perform a large query.
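Separately from the hanging, the imageChunk loop in the question calls sr.ReadLine() twice per iteration (once in the if, once when appending), so every other value is skipped. A sketch of a chunk builder that uses the line it already read, assuming one value per line and escaping embedded single quotes by doubling them; BuildInList is a hypothetical name, and it accepts any TextReader, so the existing StreamReader works unchanged.

```csharp
using System.Collections.Generic;
using System.IO;

public static class InListBuilder
{
    // Reads up to maxValues lines and returns an SQL IN list like ('a','b'),
    // or null when the reader is already exhausted.
    public static string BuildInList(TextReader reader, int maxValues)
    {
        var quoted = new List<string>(maxValues);
        string line;
        while (quoted.Count < maxValues && (line = reader.ReadLine()) != null)
        {
            // Use the line that was just read, doubling embedded single quotes.
            quoted.Add("'" + line.Replace("'", "''") + "'");
        }
        return quoted.Count == 0 ? null : "(" + string.Join(",", quoted) + ")";
    }
}
```

Calling it in a loop, while ((imageChunk = InListBuilder.BuildInList(sr, 5000)) != null) { ... }, would run one query per chunk until the file is exhausted.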
