String.Format slow, need faster alternative - C#

I was hoping to get some advice on how to speed up the following function. Specifically, I'm hoping to find a faster way to convert numbers (mostly doubles, IIRC there's one int in there) to strings to store as ListView subitems. As it stands, this function takes 9 seconds to process 16 orders! Absolutely insane, especially considering that, with the exception of the call that processes the DateTimes, it's all just string conversion.
I thought it was the actual displaying of the ListView items that was slow, so I did some research and found that adding all subitems to an array and using AddRange was far faster than adding the items one at a time. I implemented the change, but saw no improvement in speed.
I then added some stopwatches around each line to narrow down exactly what's causing the slowdown; unsurprisingly, the call to the DateTime function is the biggest slowdown, but I was surprised to see that the string.Format calls were extremely slow as well, and given how many of them there are, they make up the majority of the time.
private void ProcessOrders(List<MyOrder> myOrders)
{
    lvItems.Items.Clear();
    marketInfo = new MarketInfo();
    ListViewItem[] myItems = new ListViewItem[myOrders.Count];
    string[] mySubItems = new string[8];
    int counter = 0;
    MarketInfo.GetTime();
    CurrentTime = MarketInfo.CurrentTime;
    DateTime OrderIssueDate = new DateTime();
    foreach (MyOrder myOrder in myOrders)
    {
        string orderIsBuySell = "Buy";
        if (!myOrder.IsBuyOrder)
            orderIsBuySell = "Sell";
        var listItem = new ListViewItem(orderIsBuySell);
        mySubItems[0] = (myOrder.Name);
        mySubItems[1] = (string.Format("{0:g}", myOrder.QuantityRemaining) + "/" + string.Format("{0:g}", myOrder.InitialQuantity));
        mySubItems[2] = (string.Format("{0:f}", myOrder.Price));
        mySubItems[3] = (myOrder.Local);
        if (myOrder.IsBuyOrder)
        {
            if (myOrder.Range == -1)
                mySubItems[4] = ("Local");
            else
                mySubItems[4] = (string.Format("{0:g}", myOrder.Range));
        }
        else
            mySubItems[4] = ("N/A");
        mySubItems[5] = (string.Format("{0:g}", myOrder.MinQuantityToBuy));
        string IssueDateString = (myOrder.DateWhenIssued + " " + myOrder.TimeWhenIssued);
        if (DateTime.TryParse(IssueDateString, out OrderIssueDate))
            mySubItems[6] = (string.Format(MarketInfo.ParseTimeData(CurrentTime, OrderIssueDate, myOrder.Duration)));
        else
            mySubItems[6] = "Error getting date";
        mySubItems[7] = (string.Format("{0:g}", myOrder.ID));
        listItem.SubItems.AddRange(mySubItems);
        myItems[counter] = listItem;
        counter++;
    }
    lvItems.BeginUpdate();
    lvItems.Items.AddRange(myItems.ToArray());
    lvItems.EndUpdate();
}
Here's the time data from a sample run:
0: 166686
1: 264779
2: 273716
3: 136698
4: 587902
5: 368816
6: 955478
7: 128981
Where the numbers correspond to the indexes of the array. All other lines took so few ticks as to be negligible compared to these.
Although I'd like to use the number formatting of string.Format for pretty output, I'd like even more to be able to load a list of orders within my lifetime, so if there's an alternative to string.Format that's considerably faster but without the bells and whistles, I'm all for it.
Edit: Thanks to all of the people who suggested the myOrder class might be using getter methods rather than actually storing the variables as I originally thought. I checked that and sure enough, that was the cause of my slowdown. Although I don't have access to the class to change it, I was able to piggyback onto the method call to populate myOrders and copy each of the variables to a list within the same call, then use that list when populating my Listview. Populates pretty much instantly now. Thanks again.

I find it hard to believe that simple string.Format calls are causing your slowness problems - it's generally a very fast call, especially for nice simple ones like most of yours.
But one thing that might give you a few microseconds...
Replace
string.Format("{0:g}", myOrder.MinQuantityToBuy)
with
myOrder.MinQuantityToBuy.ToString("g")
This will work when you're doing a straight format of a single value, but isn't any good for more complex calls.

I put all the string.Format calls into a loop and was able to run them all 1 million times in under a second, so your problem isn't string.Format; it's somewhere else in your code.
Perhaps some of these properties have logic in their getter methods? What sort of times do you get if you comment out all the code for the listview?

It is definitely not string.Format that is slowing you down. Suspect the property accesses from myOrder.
On one of the format calls, try declaring a few local variables and setting them to the properties you're formatting, then pass those local variables to your string.Format and re-time it. You may find that your string.Format now runs at lightning speed, as it should.
Now, property accesses usually don't take much time to run. However, I've seen some classes where each property access is logged (for an audit trail). Check whether this is the case and whether some operation is keeping your property access from returning immediately.
If there is some operation holding up a property access, try to queue up those operations (e.g. queue up the logging calls) and have a background thread execute them, so the property access returns immediately.
Also, never put slow-running code (e.g. elaborate calculations) into a property accessor/getter, nor code that has side effects. People using the class will not be aware that it will be slow (since most property accesses are fast) or has side effects (since most property accesses do not have side effects). If the access is slow, rename the accessor to a GetXXX() method. If it has side effects, name the method something that conveys this fact.
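To illustrate that guideline with a minimal sketch (a hypothetical Order class and audit log, not the poster's actual code): the cheap read stays a property, while the slow, side-effecting work gets an explicit method name:
public interface IAuditLog
{
    void Record(string message);
}

public class Order
{
    private readonly decimal price;

    public Order(decimal price) { this.price = price; }

    // Fast, side-effect-free access stays a property.
    public decimal Price
    {
        get { return price; }
    }

    // Anything that logs, calculates heavily or goes over the network is a method,
    // so callers can tell from the name that it may be slow or have side effects.
    public decimal GetAuditedPrice(IAuditLog log)
    {
        log.Record("Price read for order"); // the side effect is explicit here
        return price;
    }
}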

Wow. I feel a little stupid now. I've spent hours beating my head against the wall trying to figure out why a simple string operation would be taking so long. MarketOrders is (I thought) an array of myOrders, which is populated by an explicit call to a method which is severely restricted as to how many times per second it can be run. I don't have access to that code to check, but I had been assuming that myOrders were simple structs with member variables that were assigned when MarketOrders is populated, so the string.Format calls would simply be acting on existing data. On reading all of the replies that point to the access of the myOrder data as the culprit, I started thinking about it and realized that MarketOrders is likely just an index, rather than an array, and the myOrder info is being read on demand. So every time I call an operation on one of its variables, I'm calling the slow lookup method, waiting for it to become eligible to run again, returning to my method, calling the next lookup, etc. No wonder it's taking forever.
Thanks for all of the replies. I can't believe that didn't occur to me.

I am glad that you got your issue solved. However, I did a small refactoring of your method and came up with this:
private void ProcessOrders(List<MyOrder> myOrders)
{
    lvItems.Items.Clear();
    marketInfo = new MarketInfo();
    ListViewItem[] myItems = new ListViewItem[myOrders.Count];
    string[] mySubItems = new string[8];
    int counter = 0;
    MarketInfo.GetTime();
    CurrentTime = MarketInfo.CurrentTime;
    // ReSharper disable TooWideLocalVariableScope
    DateTime orderIssueDate;
    // ReSharper restore TooWideLocalVariableScope
    foreach (MyOrder myOrder in myOrders)
    {
        string orderIsBuySell = myOrder.IsBuyOrder ? "Buy" : "Sell";
        var listItem = new ListViewItem(orderIsBuySell);
        mySubItems[0] = myOrder.Name;
        mySubItems[1] = string.Format("{0:g}/{1:g}", myOrder.QuantityRemaining, myOrder.InitialQuantity);
        mySubItems[2] = myOrder.Price.ToString("f");
        mySubItems[3] = myOrder.Local;
        if (myOrder.IsBuyOrder)
            mySubItems[4] = myOrder.Range == -1 ? "Local" : myOrder.Range.ToString("g");
        else
            mySubItems[4] = "N/A";
        mySubItems[5] = myOrder.MinQuantityToBuy.ToString("g");
        // This code smells:
        string issueDateString = string.Format("{0} {1}", myOrder.DateWhenIssued, myOrder.TimeWhenIssued);
        if (DateTime.TryParse(issueDateString, out orderIssueDate))
            mySubItems[6] = MarketInfo.ParseTimeData(CurrentTime, orderIssueDate, myOrder.Duration);
        else
            mySubItems[6] = "Error getting date";
        mySubItems[7] = myOrder.ID.ToString("g");
        listItem.SubItems.AddRange(mySubItems);
        myItems[counter] = listItem;
        counter++;
    }
    lvItems.BeginUpdate();
    lvItems.Items.AddRange(myItems.ToArray());
    lvItems.EndUpdate();
}
This method should be further refactored:
Remove outer dependencies, with Inversion of Control (IoC) in mind, by using dependency injection (DI);
Create a new property "DateTimeWhenIssued" for MyOrder that will return a DateTime data type. This should be used instead of joining two strings (DateWhenIssued and TimeWhenIssued) and then parsing them into DateTime (a sketch follows this list);
Rename ListViewItem, as this is a built-in class;
ListViewItem should have a new constructor for the boolean "IsBuyOrder": var listItem = new ListViewItem(myOrder.IsBuyOrder), instead of a string "Buy" or "Sell";
The "mySubItems" string array should be replaced with a class for better readability and extensibility;
Lastly, the foreach (MyOrder myOrder in myOrders) could be replaced with a "for" loop, as you are using a counter anyway. Besides, "for" loops are faster too.
Hopefully you do not mind my suggestions and that they are doable in your situation.
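For the DateTimeWhenIssued suggestion above, a minimal sketch (assuming DateWhenIssued and TimeWhenIssued are strings, as the original concatenation implies, and using a nullable DateTime to stand in for the "error getting date" case):
public DateTime? DateTimeWhenIssued
{
    get
    {
        // Combines and parses the two string fields once, in one place, instead of in the UI code.
        DateTime issued;
        return DateTime.TryParse(DateWhenIssued + " " + TimeWhenIssued, out issued)
            ? issued
            : (DateTime?)null;
    }
}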
PS. Are you using generic arrays? ListViewItem.SubItems property could be
public List<string> SubItems { get; set; }

Related

Is it possible to add SearchResultCollections together? Or return more than one?

I'm searching in Active Directory, and I have to make things perform fast. Sadly, I need to search in different domains and different OUs. My goal is to write a function that does multiple searches and returns the results of all of them. Sadly, SearchResultCollections cannot be added together. I think converting them to any other IEnumerable would affect the speed, or iterating over the results and building up a list from them would also be inefficient in terms of speed (correct me here if I'm wrong). Do you guys have any idea?
using System.DirectoryServices;
public SearchResultCollection SearchAD(ObjectClassValue objectClass, PropertyComparator comparator, string input, IEnumerable<string> searchPaths, IEnumerable<string> propertiesToCompareWith, IEnumerable<string> propertiesToLoad = null)
{
    string objClass = getObjectClass(objectClass);
    DirectoryEntry entry;
    SearchResultCollection result = null;
    var searcher = new DirectorySearcher();
    searcher.PropertiesToLoad.AddRange(propertiesToLoad.ToArray());
    searcher.SearchScope = SearchScope.Subtree;
    searcher.Filter = buildSearchFilter(objectClass, comparator, input, propertiesToCompareWith);
    for (int i = 0; i < searchPaths.ToArray().Length; i++)
    {
        entry = new DirectoryEntry(searchPaths.ToArray()[i]);
        using (searcher)
        {
            //here I want to add the results together
        }
    }
}
The only thing that stands out is your repeated use of searchPaths.ToArray(). You're converting that collection to an array twice on every iteration of the loop (once when it makes the comparison, and again when you use a value from it). It's an easy thing to change: Don't convert it to an array at all. Just use a foreach loop on searchPaths directly:
foreach (var searchPath in searchPaths)
{
entry = new DirectoryEntry(searchPath);
// etc
}
When programming with AD, the biggest slowdowns will come from the number of network requests made and the amount of data you're asking for. I wrote an article about how to optimize performance, which might help you: Active Directory: Better Performance
You're already using PropertiesToLoad, which is good. That's important, for reasons I describe in that article. But there might be other things you can change in your code where you actually make the search.
Your query could also affect the speed of the search. A bad query can really slow things down. So be aware of that.
A small thing which won't affect performance: Subtree is the default search scope, so this line isn't needed:
searcher.SearchScope = SearchScope.Subtree;
When it comes to combining the results, adding them all to the same collection in whatever format you need will be just fine. It won't add to the overall time in any significant way.
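As a rough sketch of that combining step (a hypothetical helper; it assumes the searcher's filter and properties are already configured as in the question):
using System.Collections.Generic;
using System.DirectoryServices;

public static List<SearchResult> SearchAllPaths(DirectorySearcher searcher, IEnumerable<string> searchPaths)
{
    var allResults = new List<SearchResult>();
    foreach (var searchPath in searchPaths)
    {
        using (var entry = new DirectoryEntry(searchPath))
        {
            searcher.SearchRoot = entry; // point the existing searcher at this domain/OU
            using (SearchResultCollection results = searcher.FindAll())
            {
                foreach (SearchResult result in results)
                    allResults.Add(result); // combine everything into one collection
            }
        }
    }
    return allResults;
}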

Change foreach loop to for loop

I am creating a Fingerprint Verification System in which I have to match fingerprints using records in a database. I have used a foreach loop to do so, but it is taking almost 40 seconds for only 350 records. I want to speed it up. I want to convert my foreach loop into a for loop, but I am facing some difficulties in initializing the for loop. Here is my code.
foreach (KeyValuePair<decimal, MemoryStream> tt in profileDic)
{
    this.Invoke(new Function(delegate()
    {
        textBox1.Text += "\n" + tt.Key.ToString();
    }));
    temp = new DPFP.Template(tt.Value);
    Verificator = new DPFP.Verification.Verification();
    featuresForVerification = ExtractFeatures(Sample, DPFP.Processing.DataPurpose.Verification);
    if (featuresForVerification != null)
    {
        DPFP.Verification.Verification.Result result = new DPFP.Verification.Verification.Result();
        Verificator.Verify(featuresForVerification, temp, ref result);
        #region UI Envoke
        this.Invoke(new Function(delegate()
        {
            if (result.Verified)
            {
                MessageBox.Show("FAR " + result.FARAchieved + "\n" + tt.Key.ToString());
            }
            else
            {
                //MessageBox.Show("No Match");
            }
        }));
        #endregion
    }
    this.Invoke(new Function(delegate()
    {
        progressBar1.Value += 1;
    }));
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
}
I am confused by the first line, foreach (KeyValuePair<decimal, MemoryStream> tt in profileDic). Can someone tell me how I can iterate through every item in the Dictionary object profileDic using a for loop? I am not sure how to get KeyValuePair<decimal, MemoryStream> tt in profileDic when using a for loop.
I have to match [entries against a list]. I have used foreach loop to do
so but it is taking almost 40 seconds for only 350 records. I want to speed
it up. I want my foreach loop to convert into a for loop [...]
At such a point it is a good idea to just step back and think about what we are doing here in general. Performance optimization usually comes on two levels: algorithms and workflow.
Essentially, the problem to be solved here is to find an entry in a potentially large set of records. There may be two causes why this is slow:
the list is very large, and iterating it takes ages
the list may not be that large, but the routine is called quite often
The list is very large
If we remember our knowledge about so-called Big-O notation and think about it, we may quickly find that an array search takes O(n), while a hash set search, for example, would only take O(1) in normal cases; only in the worst case will we be back down to O(n). Not bad!
By some lucky coincidence (and with the help of the cheat sheet linked above) we find out that a Dictionary<K,V> or, alternatively, a database index is more or less what we want: a dictionary is basically a "pimped" hash set, and a database index is typically a B-tree, which performs at Θ(log(n)). The final decision of whether we should use a dictionary or a database table mostly depends on the amount of data we are talking about.
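As a small, self-contained illustration of that difference (made-up data, not the fingerprint code): a linear scan of a list versus a keyed lookup in a Dictionary<K,V>:
using System;
using System.Collections.Generic;
using System.Linq;

class LookupDemo
{
    static void Main()
    {
        // Build some sample records keyed by an id.
        var records = Enumerable.Range(0, 100000)
                                .Select(i => new { Id = i, Payload = "record " + i })
                                .ToList();

        // O(n): scan the whole list until the id matches.
        var viaScan = records.FirstOrDefault(r => r.Id == 99999);

        // O(1) on average: build the dictionary once, then look up by key.
        var byId = records.ToDictionary(r => r.Id);
        var viaLookup = byId[99999];

        Console.WriteLine(viaScan.Payload == viaLookup.Payload); // True
    }
}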
As a practical example, I recently had a piece of code on my desk that iterated through a list in the same linear manner. The code did this inside two other, nested loops. The algorithm took 4 hours to complete. After introducing two dictionaries at strategic places I had it down to under a minute. I leave it to the amused reader to calculate the percentage.
Hence, the real question to ask is not "is for faster than foreach?" (no) but instead we should ask: "How can I reorganize my data structures and optimize the algorithms involved to make it perform?"
The code is called quite often
That's another, but related, problem. In some cases the data structures can't really be optimized, or doing so would cause way too many changes in the code. But when we look closely at what the profiler is telling us, we may find that the costly routine is called 800,000 times in 15 seconds and that these calls alone contribute a fair amount to the total time.
If we look even closer, we may find that we call the routine with a very limited set of input data, so that essentially a large portion of the calls may just be omitted by caching the results of the costly operation. I just had such a case last week where I was able to reduce the number of database calls to 5% of the original amount. One can imagine what that did to overall performance.
In this second case we therefore should ask ourselves a slightly different question: "Why are we doing this at all? How can we change the logical workflow to avoid most of these calls? Is there maybe a completely different way to achieve the same results?".
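A minimal sketch of that kind of result caching (the LoadNameFromDatabase call is hypothetical): each result is stored by its input key, so repeated calls with the same input skip the expensive work entirely:
using System.Collections.Generic;

class CustomerNameCache
{
    private readonly Dictionary<int, string> cache = new Dictionary<int, string>();

    public string GetCustomerName(int customerId)
    {
        string name;
        // Only pay for the expensive call the first time we see this id.
        if (!cache.TryGetValue(customerId, out name))
        {
            name = LoadNameFromDatabase(customerId); // hypothetical costly call
            cache[customerId] = name;
        }
        return name;
    }

    private string LoadNameFromDatabase(int customerId)
    {
        // Placeholder for the real, expensive lookup.
        return "customer " + customerId;
    }
}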
Summary (TL;DR)
There are two basic approaches with every performance optimization:
Algorithmic or "low level": Quicksort or Bubblesort? Tree, List or HashSet?
Workflow and Logic: Why do we have to call that particular costly routine 5 million times?
Rather than converting the foreach loop into a for loop, you might actually want to break out of the loop (break; in C#, Exit For in VB) once you have already found the result.
What biometric system are you using? This kind of work should be done in the biometric device. But if you really need to find a person directly in the database, you should not use C# collections, but the database itself.
Every fingerprint has its minutiae. These are the unique features of a fingerprint. There are algorithms that transform them into storable data, for example for a database. This could look like an MD5 hash.
Next, when you have records in the database with minutiae, you can just ask the database for this value.
It should work like that: you get the minutiae (or complete data that can be stored directly in the database) and then ask the database for this value, something like:
SELECT *
FROM users
WHERE fingerprint = :your_data
Database operations are far faster than iterating through any collection in any way.
To answer your stated question: to replace a foreach loop with a for loop, replace:
foreach (KeyValuePair<decimal, MemoryStream> tt in profileDic)
{
//...
}
with
for (var i = 0; i < profileDic.Count; i++)
{
    KeyValuePair<decimal, MemoryStream> tt = profileDic.ElementAt(i);
    //...
}
To use this you'd also need to include a using System.Linq; statement in your code.
That said, this assumes that the order of elements in your dictionary will not change, which is not guaranteed (unless you're using a SortedDictionary or OrderedDictionary).
A better approach is therefore:
decimal[] keys = profileDic.Keys.ToArray();
for (var i = 0; i < keys.Length; i++)
{
    var tt = new KeyValuePair<decimal, MemoryStream>(keys[i], profileDic[keys[i]]);
    //...
}
But this adds more overhead / likely pushes the for loop's time over that of the foreach loop / is a micro-optimisation that won't solve your real performance issue.
Per comments, the loop is likely not your problem, but what's occurring within that loop (if in this part of the code at all).
We don't know enough about your code to comment on it, so it's likely best that you investigate yourself; e.g. using the performance analysis techniques described here: https://msdn.microsoft.com/en-us/library/ms182372.aspx
I've refactored your code to make it more readable (i.e. by pulling the UI updates into their own methods, so they don't clutter the main method).
I also moved those operations which look like they wouldn't need to be updated each iteration outside of your loop... but without knowing your code this is pure guesswork / so no guarantees.
Finally, I removed your code that alters the priority of the current thread at the end of each iteration. Playing with thread priorities is not a good way to fix slow code; there are only certain cases where it's appropriate, and seeing its context here I'm over 99% certain that this is not one of those cases.
//...
featuresForVerification = ExtractFeatures(Sample, DPFP.Processing.DataPurpose.Verification); //since this appears unaffected by our profileDic values, let's initialise once
if (featuresForVerification != null)
{
    DPFP.Verification.Verification verificator = new DPFP.Verification.Verification();
    foreach (KeyValuePair<decimal, MemoryStream> tt in profileDic)
    {
        string key = tt.Key.ToString(); //we use this a lot, so let's only convert it to string once, then reuse that
        UIReportCurrentKey(key);
        temp = new DPFP.Template(tt.Value);
        DPFP.Verification.Verification.Result result = new DPFP.Verification.Verification.Result();
        verificator.Verify(featuresForVerification, temp, ref result);
        UIReportMatch(result, key);
        //if a match were found, would we want to keep comparing, or exit on first match? If the latter, add code to record which record matched, then use break to exit the loop
        UIIncremementProgressBar();
    }
}
else
{
    throw new NoFeaturesToVerifyException("The verification tool was not given any features to verify");
    //alternatively set progress bar to complete / whatever other UI actions you have /
    //whatever you want to happen if no features were selected for comparison
}
//...
#region UI Updaters
/*
    I don't know much about winforms updates; have a feeling these can be made more efficient too,
    but for the moment just shoving the code into its own functions to make the main method less
    cluttered with UI logic.
*/
// Adds the key of the item currently being processed to the UI textbox
private void UIReportCurrentKey(string key)
{
    this.Invoke(new Function(delegate()
    {
        textBox1.Text += "\n" + key;
    }));
}

private void UIReportMatch(DPFP.Verification.Verification.Result result, string key)
{
    if (result.Verified)
    {
        this.Invoke(new Function(delegate()
        {
            MessageBox.Show("FAR " + result.FARAchieved + "\n" + key);
        }));
    }
    /*
    else
    {
        this.Invoke(new Function(delegate()
        {
            MessageBox.Show("No Match");
        }));
    }
    */
}

private void UIIncremementProgressBar()
{
    this.Invoke(new Function(delegate()
    {
        progressBar1.Value++;
    }));
}
#endregion UI Updaters
#region Custom Exceptions
public class NoFeaturesToVerifyException : ApplicationException
{
    //Choose base class appropriately
    public NoFeaturesToVerifyException() : base() { }
    public NoFeaturesToVerifyException(string message) : base(message) { }
    //...
}
#endregion Custom Exceptions

Avoid memory leaks with strings

I've found a memory leak in my parser. I don't know how to fix that problem.
Let's look at this basic routine.
private void parsePage()
{
    String[] tmp = null;
    foreach (String row in rows)
    {
        tmp = row.Split(new[] { " " }, StringSplitOptions.None);
        PrivateRow t = new PrivateRow();
        t.Field1 = tmp[1];
        t.Field2 = tmp[2];
        t.Field3 = tmp[3];
        t.Field4 = String.Join(" ", tmp);
        myBigCollection.Add(t);
    }
}

private void parseFromFile()
{
    String[] tmp = null;
    foreach (String row in rows)
    {
        PrivateRow t = new PrivateRow();
        t.Field1 = "mystring1";
        t.Field2 = "mystring2222";
        t.Field3 = "mystring3333";
        t.Field4 = "mystring1 xxx yy zzz";
        myBigCollection.Add(t);
    }
}
Launching parsePage() on a collection (rows is a List of 100,000 elements) makes my app grow from 20 MB to 70 MB.
Launching parseFromFile(), which builds the SAME collection from a file but avoids the split/join, takes about 1 MB.
Using a memory profiler, I see that the "t" fields and PrivateRow keep references to the String.Split() arrays and the String.Join results.
I suppose that's because I assign a reference, not a copy, that can be garbage collected.
OK, using 70 MB isn't a big deal, but when I run this in production, against a lot of sites, it can reach 2.5-3 GB...
Cheers
This isn't a memory leak per se. It's actually behaving properly. The reason your second function uses so much less memory, is simply because you only have four strings in use. Each of these four strings is allocated only once, and subsequent uses of the strings for new t.Fieldx instances actually refer to the same string values. Strings are immutable, so if you refer to the same string value more than once, it can be handled by the same string instances. See the paragraph labelled "Interning" at this article on String in .NET for some more detail on this.
In your first function, you have what are probably mostly different strings for each field, and each time through the loop. That simply is much more varied data. The fact that those strings are held on to is what you want to have happen for as long as your PrivateRow objects exist.
You don't have a memory leak at all; it just takes the garbage collector time to process it.
I suppose that's because I assign a reference, not a copy, that can
be garbage collected.
That is not quite the right way to think about it: assigning a string copies only the reference, as with any other reference type. The strings themselves are created by Split and Join, and they stay reachable for as long as your PrivateRow objects hold them, so the garbage collector cannot reclaim them.
Now, what about a possible solution, in case you are under real memory pressure? If you have a massive amount of strings to process from a file, you may look at two options.
1) Process them in sequence by reading a stream (not loading everything at once), keeping as little data in memory as possible.
2) Use a MemoryMappedFile to, again, load only chunks of data and process them in sequence.
The second can be combined with the first.
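A minimal sketch of the first option (hypothetical file name and a trimmed-down PrivateRow; it also reuses the original row for Field4 to avoid a re-join): File.ReadLines streams the file one line at a time instead of holding every row in memory:
using System;
using System.IO;

class PrivateRow
{
    public string Field1;
    public string Field2;
    public string Field3;
    public string Field4;
}

class StreamingParser
{
    static void Main()
    {
        // File.ReadLines yields one line at a time; the whole file is never held in memory at once.
        foreach (string row in File.ReadLines("rows.txt"))
        {
            string[] tmp = row.Split(new[] { " " }, StringSplitOptions.None);
            var t = new PrivateRow
            {
                Field1 = tmp.Length > 1 ? tmp[1] : null,
                Field2 = tmp.Length > 2 ? tmp[2] : null,
                Field3 = tmp.Length > 3 ? tmp[3] : null,
                Field4 = row // reuse the original row instead of re-joining the parts
            };
            Process(t); // hand each row off instead of accumulating them all
        }
    }

    static void Process(PrivateRow row)
    {
        // Placeholder: write to a database, aggregate, etc.
        Console.WriteLine(row.Field4);
    }
}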
Like others have said, there is no evidence of a memory leak here, just delayed garbage collection. All memory should be cleaned up eventually.
That being said, there are a couple things you can do to help keep memory usage lower or recover it more quickly:
1)You should be able to replace
t.Field4 = String.Join(" ", tmp);
with
t.Field4 = row;
You created tmp by splitting row, then you're joining it back together. Avoid creating a new string by just using row.
2) Call GC.Collect(); at the end of the method to request immediate garbage collection. This won't reduce the memory used within the method, but it should free up memory more quickly.
If your application is memory-usage critical and there is a lot of repeating data you should replace string values with Enums.

Would the process of determining whether a string has been modified cause performance hits? If so, could we optimize this further?

I am in need of a way to detect whether a string changes within my code, however, I am at the same time cautious about my performance:
In reference to this question and answer.
Specifically, this code snippet:
// holds a copy of the previous value for comparison purposes
private string oldString = string.Empty;

private void button6_Click(object sender, EventArgs e)
{
    // Get the new string value
    string newString = //some varying value I get from other parts of my program

    // Compare the old string to the new one
    if (oldString != newString)
    {
        // The string values are different, so update the ListBox
        listBox1.Items.Clear();
        listBox1.Items.Add(x + /*other things*/);
    }

    // Save the new value back into the temporary variable
    oldString = newString;
}
I am currently working on a Grasshopper 3D component. Each component is itself a class library, and the main method is a method called SolveInstance(). I'm not actually sure under what conditions it runs, but what I do know is that, at a minimum, it runs a number of times a second, so the graphical UI is pretty much real-time, or close enough to be imperceptible to the human eye.
For my particular example, this is what my particular case would look like (it's untested pseudo-code).
// instance vars
private string _oldOutputString = string.Empty;
private string _newOutputString = string.Empty;

// Begin SolveInstance() method

// This constructor call saves a string to _newOutputString based on two lists
_valueList = new ValueList(firstList, secondList);

// Compare the old string to the new one
if (_oldOutputString != _newOutputString)
{
    // Save the new value back into the temporary variable
    _oldOutputString = _newOutputString;

    // Call eventargs method
    Menu_MyCustomItemClicked(_sender, _e);
}

DA.SetData(0, _oldOutputString);
My question is: Would doing this, where that particular piece of code gets called many times a second, take a hit in performance?
That string comparison should take on the order of a microsecond or less.
You're only doing it once per button click.
How fast can you click a button - ten times per second?
That means, worst case, that comparison can cost you on the order of ten microseconds per second, or 0.001 percent of time.
Don't worry about anything taking less than 1 percent of time, or even 10%, because if you could fix it, it would save you no more than that.
Would doing this, where that particular piece of code gets called many times a second, take a hit in performance?
Yes.
That's not the important thing though, the important thing is will it cause a significant hit.
Which is a matter of how long it takes, vs. what's significant to you.
The only way to know is to measure.
However, it's worth considering what makes some string comparisons slower than others.
The fastest string comparison is where two strings are in fact the exact same object. In particular;
string a = "ABC";
string b = a;
bool c = a == b; // Very fast.
The next fastest is when one, but not both are null (both being null is an example of the above case, anyway).
The next fastest is when they have different lengths, for those types of comparison where they can't be equivalent if they have different lengths. This doesn't apply to case-insensitive comparisons, because if you capitalise "weißbier" to "WEISSBIER" the lengths are different, but it does apply to exact-match comparisons.
The next fastest is when they differ early on.
The slowest is when two strings are different objects, but are in fact equal.
The average cost of string equality tests is proportional to the length of the strings.
We can reduce the cost of the slowest case by interning strings (whether in the default intern pool or in a custom cache of strings), and if we know all strings have gone through the same process we can make all equality comparisons as fast as the fastest case (because we've ensured that two strings are either the same instance or not equivalent). However, doing so itself takes time, so it's not always worth it.
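A minimal sketch of that fast path (illustrative only): after both sides have been through string.Intern (or any shared cache), equal values are the same instance, so equality can short-circuit on a reference check:
using System;

class InternDemo
{
    static void Main()
    {
        // Two equal strings built at runtime are normally distinct objects.
        string a = new string('x', 3);
        string b = new string('x', 3);
        Console.WriteLine(ReferenceEquals(a, b)); // False: a full character comparison is needed

        // After interning, equal values share one instance...
        string ia = string.Intern(a);
        string ib = string.Intern(b);
        Console.WriteLine(ReferenceEquals(ia, ib)); // True

        // ...so equality can stop at the reference check.
        Console.WriteLine(ia == ib); // True, effectively as fast as the fastest case
    }
}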
In all, if you are only changing the string on a real change, then in practice it will be much faster than if you are rebuilding the string repeatedly, potentially ending up with the same string you had to begin with.

IEnumerable<string> and string[]

Is there any advantage to using this
private static IEnumerable<string> GetColumnNames(this IDataReader reader)
{
    for (int i = 0; i < reader.FieldCount; i++)
        yield return reader.GetName(i);
}
instead of this
private static string[] GetColumnNames(this IDataReader reader)
{
    var columnNames = new string[reader.FieldCount];
    for (int i = 0; i < reader.FieldCount; i++)
        columnNames[i] = reader.GetName(i);
    return columnNames;
}
Here is how I use this method
int orderId = _noOrdinal;
IEnumerable<string> columnNames = reader.GetColumnNames();
if (columnNames.Contains("OrderId"))
    orderId = reader.GetOrdinal("OrderId");
while (reader.Read())
    yield return new BEContractB2CService
    {
        //..................
        Order = new BEOrder
        {
            Id = orderId == _noOrdinal ? Guid.Empty : reader.GetGuid(orderId)
        },
        //............................
The two approaches are quite different so it depends on what you are subsequently going to do with the result I would say.
For example:
The first case requires the data reader to remain open until the result is read; the second doesn't. So how long are you going to hold this result for, and do you want to leave the data reader open that long?
The first case is less performant if you are definitely going to read the data, but probably more performant if you often don't, particularly if there is a lot of data.
The result from your first case should only be read/iterated/searched once. The second case can be stored and searched multiple times.
If you have a large amount of data then the first case could be used in such a way that you don't need to bring all that data in to memory in one go. But again that really depends on what you do with the IEnumerable in the calling method.
Edit:
Given your use-case the methods are probably pretty much equivalent for any given measure of 'good-ness'. Tables don't tend to have many columns, and your use of .Contains ensures the data will be read every time. Personally I would stick with the array method here if only because it's a more straightforward approach.
What's the next line of the code... is it looking for a different column name? If so the second case is the only way to go.
One reason off the top of my head: the array version means you have to spend time building the array first. Most of the code's clients may not actually need an array; in my experience, most code is just going to iterate over the result, in which case why waste time building an array (or a list, as an alternative) you never actually need?
The first one is lazy. That is, your code is not evaluated until you iterate the enumerable; because it is compiled into an iterator state machine, it runs until it yields a value, then returns control to the calling code until you ask for the next value via MoveNext. Additionally, with LINQ you can get the second behaviour from the first by calling ToArray on it. The reason you might want to do this is to make sure you get the data as it is when you make the call, rather than when you iterate, in case the values change in between.
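A small, generic illustration of that difference (not tied to IDataReader): the iterator's body runs again every time it is enumerated, while ToArray captures a snapshot once:
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
    static int calls;

    static IEnumerable<string> GetNames()
    {
        for (int i = 0; i < 3; i++)
        {
            calls++; // count how often the body actually runs
            yield return "name" + i;
        }
    }

    static void Main()
    {
        IEnumerable<string> lazy = GetNames();
        Console.WriteLine(calls);        // 0: nothing has run yet

        Console.WriteLine(lazy.Count()); // 3
        Console.WriteLine(lazy.Count()); // 3: enumerated again, so the body re-runs
        Console.WriteLine(calls);        // 6

        string[] snapshot = GetNames().ToArray(); // materialise once
        Console.WriteLine(snapshot.Length);       // 3
        Console.WriteLine(calls);                 // 9: reusing the snapshot causes no further runs
    }
}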
One advantage has to do with memory consumption. If FieldCount is say 1 million, then the latter needs to allocate an array with 1 million entries, while the former does not.
This benefit depends on how the method is consumed though. For example, if you are processing a list of files one-by-one, then there is no need to know all the files up front.
