Is there any advantage to using this
private static IEnumerable<string> GetColumnNames(this IDataReader reader)
{
for (int i = 0; i < reader.FieldCount; i++)
yield return reader.GetName(i);
}
instead of this
private static string[] GetColumnNames(this IDataReader reader)
{
var columnNames = new string[reader.FieldCount];
for (int i = 0; i < reader.FieldCount; i++)
columnNames[i] = reader.GetName(i);
return columnNames;
}
Here is how I use this method
int orderId = _noOrdinal;
IEnumerable<string> columnNames = reader.GetColumnNames();
if (columnNames.Contains("OrderId"))
orderId = reader.GetOrdinal("OrderId");
while (reader.Read())
yield return new BEContractB2CService
{
//..................
Order = new BEOrder
{ Id = orderId == _noOrdinal ?
Guid.Empty : reader.GetGuid(orderId)
},
//............................
The two approaches are quite different so it depends on what you are subsequently going to do with the result I would say.
For example:
The first case requires the data reader to remain open until the result is read, the second doesn't. So how long are you going to hold this result for and do you want to leave the data reader open that long.
The first case is less performant if you are definitely going to read the data, but probably more performant if you often don't, particularly if there is a lot of data.
The result from your first case should only be read/iterated/searched once. Then second case can be stored and searched multiple times.
If you have a large amount of data then the first case could be used in such a way that you don't need to bring all that data in to memory in one go. But again that really depends on what you do with the IEnumerable in the calling method.
Edit:
Given your use-case the methods are probably pretty much equivalent for any given measure of 'good-ness'. Tables don't tend to have many columns, and your use of .Contains ensures the data will be read every time. Personally I would stick with the array method here if only because it's a more straightforward approach.
What's the next line of the code... is it looking for a different column name? If so the second case is the only way to go.
On reason off the top of my head: The array version means you have to spend time building the array first. Most of the code's clients may not necessarily need a specific array. Mostly, i've found, that most code is just going to iterate over it in which case, why waste time building an array (or list as an alternative) you never actually need.
The first one is lazy. That is your code is not evaluated until you iterate the enumerable and because you use closures it will run the code until it yields a value then turn control back to the calling code until you iterate to the next value via MoveNext. Additionally with linq you can achieve the second one by calling the first and then calling ToArray. The reason you might want to do this is to make sure you get the data as it is when you make the call versus when you iterate in case the values change in between.
One advantage has to do with memory consumption. If FieldCount is say 1 million, then the latter needs to allocate an array with 1 million entries, while the former does not.
This benefit depends on how the method is consumed though. For example, if you are processing a list of files one-by-one, then there is no need to know all the files up front.
Related
I have a two-dimensional array, object[,] mData
I need to retrieve the index of a specific string (key) in that object array without using a loop (e.g foreach, for).
The reason why I don't want to use a loop is because I am trying to optimize a current block of code that uses a loop which causes the process to take a long time since it is managing too much data.
Is there a way to do this?
CODE
`
object [,] mData = null;
string sKey = String.Empty;
for (int iIndex = 0; iIndex < mData.GetUpperBound(1); iIndex++)
{
if (mData[0, iIndex].Value == sKey);
{
return;
}
}
`
You will need a loop to linear search for an element. Even if there is a method that you can call to get the index (which there isn't, I don't think), there would still be a loop inside the method.
If you're worried about performance, try a binary search if the data is sorted.
I am trying to optimize a current block of code that uses a loop which causes the process to take a long time since it is managing too many data.
Loops don't neccessarily make your code run significantly slower. The core of your problem is that you have too many data. If you have that many data, then slowness is expected. What you can do is to run the time-consuming operation asynchronously so that the UI doesn't freeze.
My question is raised based on this question, I had posted an answer on that question..here
This is the code.
var lines = System.IO.File.ReadLines(#"C:\test.txt");
var Minimum = lines[0];//Default length set
var Maximum = "";
foreach (string line in lines)
{
if (Maximum.Length < line.Length)
{
Maximum = line;
}
if (Minimum.Length > line.Length)
{
Minimum = line;
}
}
and alternative for this code using LINQ (My approach)
var lines = System.IO.File.ReadLines(#"C:\test.txt");
var Maximum = lines.OrderByDescending(a => a.Length).First().ToString();
var Minimum = lines.OrderBy(a => a.Length).First().ToString();
LINQ is easy to read and implement..
I want to know which one is good for performance.
And how Linq work internally for OrderByDescending and OrderBy for ordering by length?
You can read the source code for OrderBy.
Stop doing micro-optimizing or premature-optimization on your code. Try to write code that performs correctly, then if you face a performance problem later then profile your application and see where is the problem. If you have a piece of code which have performance problem due to finding the shortest and longest string then start to optimize this part.
We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. Yet we should not pass
up our opportunities in that critical 3% - Donald Knuth
File.ReadLines is returning an IEnumerable<string>, It means that if you do a foreach over it it will return data to you one by one. I think the best performance improvement you can do here is to improve the reading of file from the disk. If it is small enough to load the whole file into memory use File.ReadAllLines, if it is not try reading the file in big chunks that fits in memory. Reading a file line by line will cause performance degradation due to I/O operation from disk. So the problem here is not how LINQ or loop perform, The problem is in number of disk reads.
With the second method, you are not only sorting the lines twice... You are reading the file twice. This because File.ReadLines returns a IEnumerable<string>. This clearly shows why you shouldn't ever ever enumerate a IEnumerable<> twice unless you know how it was built. If you really want to do it, add a .ToList() or a .ToArray() that will materialize the IEnumerable<> to a collection... And while the first method has a memory footprint of a single line of text (because it reads the file one line at a time), the second method will load the whole file in memory to sort it, so will have a much bigger memory footprint, and if the file is some hundred mb, the difference is big (note that technically you could have a file with a single line of text long 1gb, so this rule isn't absolute... It is for reasonable files that have lines long up to some hundred characters :-) )
Now... Someone will tell you that premature optimization is evil, but I'll tell you that ignorance is twice evil.
If you know the difference between the two blocks of code then you can do an informed choice between the two... Otherwise you are simply randomly throwing rocks until it seems to work. Where seems to work is the keyword here.
In my opinion, you need to understand some points for deciding what is the best way.
First, let's think that we want to solve the problem with LINQ. Then, to write the most optimized code, you must understand Deferred Execution. Most Linq methods, such as Select, Where, OrderBy, Skip, Take and some others uses DE. So, what is Deferred Execution? It means that, these methods will not be executed unless the user doesn't need them. These methods will just create iterator. And this iterator is ready to be executed, when we need them. So, how can user make them execute? The answer is, with the help of foreach which will call GetEnumerator or other Linq methods. Such as, ToList(), First(), FirstOrDefault(), Max() and some others.
These process will help us to gain some performance.
Now, let's come back to your problem. File.ReadLines will return IEnumerable<string>, which it means that, it will not read the lines, unless we need them. In your example, you have twice called sorting method for this object, which it means that it will sort this collection over again twice. Instead of that, you can sort the collection once, then call ToList() which will execute the OrderedEnumerable iterator and then get first and last element of the collection which physically inside our hands.
var orderedList = lines
.OrderBy(a => a.Length) // This method uses deferred execution, so it is not executed yet
.ToList(); // But, `ToList()` makes it to execute.
var Maximum = orderedList.Last();
var Minimum = orderedList.First();
BTW, you can find OrderBy source code, here.
It returns OrderedEnumerable instance and the sorting algorithm is here:
public IEnumerator<TElement> GetEnumerator()
{
Buffer<TElement> buffer = new Buffer<TElement>(source);
if (buffer.count > 0)
{
EnumerableSorter<TElement> sorter = GetEnumerableSorter(null);
int[] map = sorter.Sort(buffer.items, buffer.count);
sorter = null;
for (int i = 0; i < buffer.count; i++) yield return buffer.items[map[i]];
}
}
And now, let's come back to another aspect which effects the performance. If you see, Linq uses another element to store sorted collection. Of course, it will take some memory, which tells us it is not the most efficent way.
I just tried to explain you how does Linq work. But, I am very agree with #Dotctor as a result to your overall answer. Just, don't forget that, you can use File.ReadAllLines which will not return IEnumerable<stirng>, but string[].
What does it mean? As I tried to explain in the beginning, difference is that, if it is IEnumerable, then .net will read line one by one when enuemrator enumerates over iterator. But, if it is string[], then all lines in our application memory.
The most efficient approach is to avoid LINQ here, the approach using foreach needs only one enumeration.
If you want to put the whole file into a collection anyway you could use this:
List<string> orderedLines = System.IO.File.ReadLines(#"C:\test.txt")
.OrderBy(l => l.Length)
.ToList();
string shortest = orderedLines.First();
string longest = orderedLines.Last();
Apart from that you should read about LINQ's deferred execution.
Also note that your LINQ approach does not only order all lines twice to get the longest and the shortest, it also needs to read the whole file twice since File.ReadLines is using a StreamReader(as opposed to ReadAllLines which reads all lines into an array first).
MSDN:
When you use ReadLines, you can start enumerating the collection of
strings before the whole collection is returned; when you use
ReadAllLines, you must wait for the whole array of strings be returned
before you can access the array
In general that can help to make your LINQ queries more efficient, f.e. if you filter out lines with Where, but in this case it's making things worse.
As Jeppe Stig Nielsen has mentioned in a comment, since OrderBy needs to create another buffer-collection internally(with ToList the second), there is another approach that might be more efficient:
string[] allLines = System.IO.File.ReadAllLines(#"C:\test.txt");
Array.Sort(allLines, (x, y) => x.Length.CompareTo(y.Length));
string shortest = allLines.First();
string longest = allLines.Last();
The only drawback of Array.Sort is that it performs an unstable sort as opposed to OrderBy. So if two lines have the same length the order might not be maintained.
I was hoping to get some advice on how to speed up the following function. Specifically, I'm hoping to find a faster way to convert numbers (mostly doubles, IIRC there's one int in there) to strings to store as Listview subitems. As it stands, this function takes 9 seconds to process 16 orders! Absolutely insane, especially considering that with the exception of the call to process the DateTimes, it's all just string conversion.
I thought it was the actual displaying of the listview items that was slow, so I did some research and found that adding all subitems to an array and using Addrange was far faster than adding the items one at a time. I implemented the change, but got no better speed.
I then added some stopwatches around each line to narrow down exactly what's causing the slowdown; unsurprisingly, the call to the datetime function is the biggest slowdown, but I was surprised to see that the string.format calls were extremely slow as well, and given the number of them, make up the majority of my time.
private void ProcessOrders(List<MyOrder> myOrders)
{
lvItems.Items.Clear();
marketInfo = new MarketInfo();
ListViewItem[] myItems = new ListViewItem[myOrders.Count];
string[] mySubItems = new string[8];
int counter = 0;
MarketInfo.GetTime();
CurrentTime = MarketInfo.CurrentTime;
DateTime OrderIssueDate = new DateTime();
foreach (MyOrder myOrder in myOrders)
{
string orderIsBuySell = "Buy";
if (!myOrder.IsBuyOrder)
orderIsBuySell = "Sell";
var listItem = new ListViewItem(orderIsBuySell);
mySubItems[0] = (myOrder.Name);
mySubItems[1] = (string.Format("{0:g}", myOrder.QuantityRemaining) + "/" + string.Format("{0:g}", myOrder.InitialQuantity));
mySubItems[2] = (string.Format("{0:f}", myOrder.Price));
mySubItems[3] = (myOrder.Local);
if (myOrder.IsBuyOrder)
{
if (myOrder.Range == -1)
mySubItems[4] = ("Local");
else
mySubItems[4] = (string.Format("{0:g}", myOrder.Range));
}
else
mySubItems[4] = ("N/A");
mySubItems[5] = (string.Format("{0:g}", myOrder.MinQuantityToBuy));
string IssueDateString = (myOrder.DateWhenIssued + " " + myOrder.TimeWhenIssued);
if (DateTime.TryParse(IssueDateString, out OrderIssueDate))
mySubItems[6] = (string.Format(MarketInfo.ParseTimeData(CurrentTime, OrderIssueDate, myOrder.Duration)));
else
mySubItems[6] = "Error getting date";
mySubItems[7] = (string.Format("{0:g}", myOrder.ID));
listItem.SubItems.AddRange(mySubItems);
myItems[counter] = listItem;
counter++;
}
lvItems.BeginUpdate();
lvItems.Items.AddRange(myItems.ToArray());
lvItems.EndUpdate();
}
Here's the time data from a sample run:
0: 166686
1: 264779
2: 273716
3: 136698
4: 587902
5: 368816
6: 955478
7: 128981
Where the numbers are equal to the indexes of the array. All other lines were so low in ticks as to be negligible compared to these.
Although I'd like to be able to use the number formatting of string.format for pretty output, I'd like to be able to load a list of orders within my lifetime more, so if there's an alternative to string.format that's considerably faster but without the bells and whistles, I'm all for it.
Edit: Thanks to all of the people who suggested the myOrder class might be using getter methods rather than actually storing the variables as I originally thought. I checked that and sure enough, that was the cause of my slowdown. Although I don't have access to the class to change it, I was able to piggyback onto the method call to populate myOrders and copy each of the variables to a list within the same call, then use that list when populating my Listview. Populates pretty much instantly now. Thanks again.
I find it hard to believe that simple string.Format calls are causing your slowness problems - it's generally a very fast call, especially for nice simple ones like most of yours.
But one thing that might give you a few microseconds...
Replace
string.Format("{0:g}", myOrder.MinQuantityToBuy)
with
myOrder.MinQuantityToBuy.ToString("g")
This will work when you're doing a straight format of a single value, but isn't any good for more complex calls.
I put all the string.format calls into a loop and was able to run them all 1 million times in under a second, so your problem isn't string.format...it's somewhere else in your code.
Perhaps some of these properties have logic in their getter methods? What sort of times do you get if you comment out all the code for the listview?
It is definitely not string.Format that is slowing you down. Suspect the property accesses from myOrder.
On one of the format calls, try to declare a few local variables and set those to the properties you try to format, then pass those local variables to yoru string.Format and retime. You may find that your string.Format now runs in lightning speed as it should.
Now, property accesses usually don't require much time to run. However, I've seen some classes where each property access is logged (for audit trail). Check if this is the case and if some operation is holding your property access from returning immediately.
If there is some operation holding a property access, try to queue up those operations (e.g. queue up the logging calls) and have a background thread execute them. Return the property access immediately.
Also, never put slow-running code (e.g. elaborate calculations) into a property accesser/getter, nor code that has side-effects. People using the class will not be aware that it will be slow (since most property accesses are fast) or has side effects (since most property accesses do not have side effects). If the access is slow, rename the access to a GetXXX() function. If it has side effects, name the method something that conveys this fact.
Wow. I feel a little stupid now. I've spent hours beating my head against the wall trying to figure out why a simple string operation would be taking so long. MarketOrders is (I thought) an array of myOrders, which is populated by an explicit call to a method which is severely restricted as far as how times per second it can be run. I don't have access to that code to check, but I had been assuming that myOrders were simple structs with member variables that were assigned when MarketOrders is populated, so the string.format calls would simply be acting on existing data. On reading all of the replies that point to the access of the myOrder data as the culprit, I started thinking about it and realized that MarketOrders is likely just an index, rather than an array, and the myOrder info is being read on demand. So every time I call an operation on one of its variables, I'm calling the slow lookup method, waiting for it to become eligible to run again, returning to my method, calling the next lookup, etc. No wonder it's taking forever.
Thanks for all of the replies. I can't believe that didn't occur to me.
I am glad that you got your issue solved. However I did a small refactoring on your method and came up with this:
private void ProcessOrders(List<MyOrder> myOrders)
{
lvItems.Items.Clear();
marketInfo = new MarketInfo();
ListViewItem[] myItems = new ListViewItem[myOrders.Count];
string[] mySubItems = new string[8];
int counter = 0;
MarketInfo.GetTime();
CurrentTime = MarketInfo.CurrentTime;
// ReSharper disable TooWideLocalVariableScope
DateTime orderIssueDate;
// ReSharper restore TooWideLocalVariableScope
foreach (MyOrder myOrder in myOrders)
{
string orderIsBuySell = myOrder.IsBuyOrder ? "Buy" : "Sell";
var listItem = new ListViewItem(orderIsBuySell);
mySubItems[0] = myOrder.Name;
mySubItems[1] = string.Format("{0:g}/{1:g}", myOrder.QuantityRemaining, myOrder.InitialQuantity);
mySubItems[2] = myOrder.Price.ToString("f");
mySubItems[3] = myOrder.Local;
if (myOrder.IsBuyOrder)
mySubItems[4] = myOrder.Range == -1 ? "Local" : myOrder.Range.ToString("g");
else
mySubItems[4] = "N/A";
mySubItems[5] = myOrder.MinQuantityToBuy.ToString("g");
// This code smells:
string issueDateString = string.Format("{0} {1}", myOrder.DateWhenIssued, myOrder.TimeWhenIssued);
if (DateTime.TryParse(issueDateString, out orderIssueDate))
mySubItems[6] = MarketInfo.ParseTimeData(CurrentTime, orderIssueDate, myOrder.Duration);
else
mySubItems[6] = "Error getting date";
mySubItems[7] = myOrder.ID.ToString("g");
listItem.SubItems.AddRange(mySubItems);
myItems[counter] = listItem;
counter++;
}
lvItems.BeginUpdate();
lvItems.Items.AddRange(myItems.ToArray());
lvItems.EndUpdate();
}
This method should be further refactored:
Remove outer dependencies Inversion of control (IoC) in mind and by using dependency injection (DI);
Create new property "DateTimeWhenIssued" for MyOrder that will return DateTime data type. This should be used instead of joining two strings (DateWhenIssued and TimeWhenIssued) and then parsing them into DateTime;
Rename ListViewItem as this is a built in class;
ListViewItem should have a new constructor for boolean "IsByOrder": var listItem = new ListViewItem(myOrder.IsBuyOrder). Instead of a string "Buy" or "Sell";
"mySubItems" string array should be replaced with a class for better readability and extendibility;
Lastly, the foreach (MyOrder myOrder in myOrders) could be replaced with a "for" loop as you are using a counter anyway. Besides "for" loops are faster too.
Hopefully you do not mind my suggestions and that they are doable in your situation.
PS. Are you using generic arrays? ListViewItem.SubItems property could be
public List<string> SubItems { get; set; };
I'd like to insert an int into a sorted array. This operation is going to be performed very often, so it needs to be as fast as possible.
It is possible and even preferred to use a List or any other class instead of an array
All values are in the 1 to 34 range
The array typically contains exactly 14 values
I was thinking of many different approaches, including binary search and simple insert-on-copy, but found it hard to decide. Also, I felt like I missed an idea. Do you have experiences on this topic or any new ideas to consider?
I will use an int array whose length is 35(because you said range 1-34) to record the status of the numbers.
int[] status = Enumerable.Repeat(0, 35).ToArray();
//an array contains 35 zeros
//which means currently there is no elements in the array
status[10] = 1; // now the array have only one number: 10
status[11] ++; // a new number 11 is added to the list
So if you want to add a number i to the list:
status[i]++; // O(1) to add a number
To remove an i from the list:
status[i]--; // O(1) to remove a number
Want to know all the numebrs in the list?
for (int i = 0; i < status.Length; i++)
{
if (status[i] > 0)
{
for (int j = 0; j < status[i]; j++)
Console.WriteLine(i);
}
}
//or more easier using LINQ
var result = status.SelectMany((i, index) => Enumerable.Repeat(index, i));
The following example may help you understand my code better:
the real number array: 1 12 12 15 9 34 // i don't care if it's sorted
the status array: status[1]=1,status[12]=2,status[15]=1,status[9]=1,status[34]=1
all others are 0
At 14 values this is a pretty small array, I don't think switching to a smarter data structure such as a list will win you much, especially if you fast good random access. Even binary search may actually be slower than linear search at this scale. Are you sure that, say, insert-on-copy does not satisfy your performance requirements?
This operation is going to be performed very often, so it needs to be as fast as possible.
The things that you notice happen "very often" are frequently not the bottlenecks in the program - it's often surprising what the actual bottlenecks are. You should code something simple and measure the actual performance of your program before performing any optimizations.
I was thinking of many different approaches, including binary search and simple insert-on-copy, but found it hard to decide.
Assuming that this is the bottleneck, the big-O performance of the different methods is not going to be relevant here because of the small size of your array. It is easier to just try a few different approaches, measure the results, see which performs best and choose that method. If you have followed the advice from the first paragraph you already have a profiler setup that you can use for this step too.
For inserting into the middle, a LinkedList<int> would be the fastest option - anything else involves copying data. At 14 elements, don't stress over binary search etc - just walk forwards to the item you want:
using System;
using System.Collections.Generic;
static class Program
{
static void Main()
{
LinkedList<int> data = new LinkedList<int>();
Random rand = new Random(12345);
for (int i = 0; i < 20; i++)
{
data.InsertSortedValue(rand.Next(300));
}
foreach (int i in data) Console.WriteLine(i);
}
}
static class LinkedListExtensions {
public static void InsertSortedValue(this LinkedList<int> list, int value)
{
LinkedListNode<int> node = list.First, next;
if (node == null || node.Value > value)
{
list.AddFirst(value);
}
else
{
while ((next = node.Next) != null && next.Value < value)
node = next;
list.AddAfter(node, value);
}
}
}
Doing the brute-force approach is the best decision here because 14 isn't a number :). However, this is not a scalable decision, since should 14 become 14000 one day that will cause problems
What is the most common operation with your array?
Insert? Read?
Heap data structure will give you O(log(14)) for both of them. SortedDictionary may hit your performance.
Using a simple array will give you O(1) for reading and O(14) for insert.
By the way, have you tried System.Collections.Generic.SortedDictionary ot System.Collections.Generic.SortedList?
If you're on .Net 4 you should take a look at the SortedSet<T>. Otherwise take a look at SortedDictionary<TKey, TValue> where you make TValue as object and just put null into it, cause you're just interested into the keys.
If there is no repeated value on the array and the possible values won´t change maybe a fixed size array where the value is equal to the index is a good choice
Both insert and read are O(1)
You have a range of possible values from 1-34 which is rather narrow. So the fastest way would likely be using an array with 34 slots. To insert a number n just do array[n-1]++ and to remove it do array[n.1]-- (if n>0).
To check if a value exists in your collection you do array[n-1]>0.
edit: Damn...Danny was faster. :)
Write a method takes an array of integers and sorts them in place using Bubble Sort. The method is not allowed to create any additional arrays. Bubble Sort is a simple sorting algorithm that works by looping through the array to be sorted, comparing each pair of adjacent elements and swapping them if they are in the wrong order.
i am sending out email to a list of people. I have the list of recipients in array but the list can get up to 500 people. There is a limitation on the number of recipients that my mail server sends out at once (50 recipients)
so if the list is > 50 i need to break it up in to different mails.
What is the best way to take one array and break it up into arrays of 50
for example:
if array is 120 long, i would expect 3 arrays returned, one with 50, another with 50 and a third with 20.
You could use the Batch operation from MoreLINQ:
Person[] array = ...;
var arrays = list.Batch(50).Select(x = x.ToArray());
foreach (Person[] shorterArray in arrays)
{
...
}
(If you're happy with IEnumerable<Person> instead of arrays, you don't need the Select call of course.)
Maybe ArraySegment<T> works for you? You'd have to split it up manually though, but this is not hard in a loop.
int recipient = 0;
while (recipient < recipients.Count) {
ArraySegment<string> recipientSegment = new ArraySegment<string>(recipients, recipient, Math.Min(50, recipients.Count-recipient));
// build your message here, using the recipientSegment for the names
recipient += 50;
}
I would simply iterate over the complete array, building up the recipients string, then sending out an email when the limit is reached, then resetting the string and continuing on with the iteration until the next limit event or until the end of the array is reached.
If you can use LINQ when you may find this useful: Linq: How to group by maximum number of items
Shouldn't LINQ be the right stuff for this?
A common method for "paging" results from a set is to combine the Skip and Take methods provided by LINQ. This solution is great because it can be further combined with other LINQ methods to implement filtering, ordering, etc. as needed.
I'm not sure what the performance considerations are for your application, so keep in mind that this may not perform very well for sets where the number of pages is relatively large (i.e., batch size is significantly smaller than the total size of the set), but it's at least fairly straightforward for anyone familiar with this style of coding.
Here's an example of what this implementation might look like:
List<EmailAddress> list = new List<EmailAddress>();
const int BATCH_SIZE = 50;
for (int i = 0; i < list.Count; i += BATCH_SIZE)
{
IEnumerable<EmailAddress> currentBatch =
list.Skip(i).Take(BATCH_SIZE);
// do stuff...
}