Most efficient way to determine first character in string?

Most efficient way to determine first character in string? - c#

Which of these methods are the most efficient one or is there a better way to do it?
this.returnList[i].Title[0].ToString()
or
this.returnList[i].Title.Substring(0, 1)

They're both very fast:
Char Index
var sample = "sample";
var clock = new Stopwatch();
for (var i = 0; i < 10; i++)
{
clock.Start();
for (var j = 0; j < 10000000; j++)
{
var first = sample[0].ToString();
}
clock.Stop();
Console.Write(clock.Elapsed);
clock.Reset();
}
// Results
00:00:00.2012243
00:00:00.2207168
00:00:00.2184807
00:00:00.2258847
00:00:00.2296456
00:00:00.2261465
00:00:00.2120131
00:00:00.2221702
00:00:00.2346083
00:00:00.2330840
Substring
var sample = "sample";
var clock = new Stopwatch();
for (var i = 0; i < 10; i++)
{
clock.Start();
for (var j = 0; j < 10000000; j++)
{
var first = sample.Substring(0, 1);
}
clock.Stop();
Console.Write(clock.Elapsed);
clock.Reset();
}
// Results
00:00:00.3268155
00:00:00.3337077
00:00:00.3439908
00:00:00.3273090
00:00:00.3380794
00:00:00.3400650
00:00:00.3280275
00:00:00.3333719
00:00:00.3295982
00:00:00.3368425
I also agree with BrokenGlass that using the char index is a cleaner way of writing it. Plus if you're doing it 10 trillion times it'll be much faster!

There is a big loophole in your code that may cause problems, depending on what you mean by "first character" and what returnList contains.
C# strings contain UTF-16, which is a variable-length encoding, and if returnList is an array of strings, then returnList[i] might only be one char of a Unicode point. If you want to return the first Unicode grapheme of a string:
string s = returnList[i].Title;
if (string.IsNullOrEmpty(s))
return s;
int charsInGlyph = char.IsSurrogatePair(s, 0) ? 2 : 1;
return s.Substring(0, charsInGlyph);
You can run into the same problems with BOMs, tagged, and combining characters; these are all valid characters but are not meaningful if displayed to a user.
If you want Unicode points or graphemes, not chars, you must use strings; Unicode graphemes can be more than one char.

I don't think it would matter much efficiency wise, but in my opinion the clearer, more idiomatic and hence more maintainable way of returning the first character is using the index operator:
char c = returnList[i].Title[0];
This assumes of course there is at least one character, if that's not a given you have to check for that.

Those should be close to identical in performance.
The expensive part of the operation is to create the string, and there is no more efficient way to do that.
Unless of couse you want to pre-create strings for all possible characters and store in a dictionary, but that would use up a lot of memory for such a trivial task.

returnList[I].Title[0] is much faster as it does not need to create a new string, only accessing a char from the original one.
Of course, it will throw an exception if the string is empty, so you should check that first.
As a rule of thumb, never use strings with a fixed length of 1. that's what char is for.
The performance difference is not likely to matter though, but the better readability certainly will.

Related

Smallest and fastest way to capture the output of a forloop

I am working on an optimization pass for our asp.net web application and I want to shrink down and speed up as much of the backend as I can.
Is there any way to shrink down a for loop so that it becomes a lambda expression maybe?
A pure example of something that I might want to shrink down:
string outS = "";
for (int i = 0; i < length; i++)
{
outS += random.Next(0, 9).ToString();
}
return int.Parse(outS).ToString();
Instead of creating a new variable, performing some sort of function to generate it, and then returning it, is there a way where I can do all that in a single line? Like returning a lambda expression function?
Or is the current functionality the fastest way to do it anyway?
Like this silly example:
return => for(int i = 0; i < length; i++)
{
random.Next(0, 9).ToString();
}

If the logic method returns a random number that has a specified number of digits in the length variable, you could do something like this:
private int TestMethod(int length) =>
new Random().Next((int)Math.Pow(10, length - 1), (int)Math.Pow(10, length));
This form has some adventages:
No string concatenation. String concatenation is not recommended if many concatenations are performed.
The number of times a random number is requested is reduced (in the above code, the Random.Next method is only invoked once).
Nevertheless, it's a good practice that you perform a profiler comparing alternatives. You can do it with the Diagnostic Tools View in Visual Studio, and with the StopWatch class

If you're just wondering about condensing lines of code, you can use Linq's ForEach method and pass in a lambda expression.
This:
string outS = "";
for (int i = 0; i < length; i++)
{
outS += random.Next(0, 9).ToString();
}
return int.Parse(outS).ToString();
Becomes this:
string outS = "";
new List<int>(10).ForEach(_ => outS += random.Next(0, 9).ToString());
return int.Parse(outS).ToString();

Can I achieve linear (or close to) complexity concatenating strings without allocation?

{
std::string s = "this is a string ";
std::string res = std::string();
int sLength = s.length();
for(int i = 0; i < sLength; i++)
res += s[i];
}
I would imagine this C++ is strictly linear in complexity (with regard to character allocation), as opposed to the quadratic complexity of the equivalent in C#.
However, the fact that in this case, memory is not allocated beforehand, are we actually looking at quadratic complexity as well ? If yes, would allocation of a char array help us to achieve linear complexity ?

You could achieve linear complexity by calling std::string::reserve. This would prevent re-allocations as the string grows:
std::string s = "this is a string ";
std::string res = "what is this? ";
res.reserve(s.length() + res.length());
for(int i = 0; i < s.length(); i++)
res += s[i];
Obviously you wouldn't concatenate strings like that in real life.

According to cppr, adding a character to a string has constant complexity (although I think this carries an implicit "amortized", as for string::push_back).
So your code already has linear complexity.
If you care about the allocation cost and have an upper bound for the size beforehand, you can also just reserve enough space to avoid any reallocation and copying of the string. Then, not even a plain char buffer should beat your performance (significantly).

What would be the shortest way to sum up the digits in odd and even places separately

I've always loved reducing number of code lines by using simple but smart math approaches. This situation seems to be one of those that need this approach. So what I basically need is to sum up digits in the odd and even places separately with minimum code. So far this is the best way I have been able to think of:
string number = "123456789";
int sumOfDigitsInOddPlaces=0;
int sumOfDigitsInEvenPlaces=0;
for (int i=0;i<number.length;i++){
if(i%2==0)//Means odd ones
sumOfDigitsInOddPlaces+=number[i];
else
sumOfDigitsInEvenPlaces+=number[i];
{
//The rest is not important
Do you have a better idea? Something without needing to use if else

int* sum[2] = {&sumOfDigitsInOddPlaces,&sumOfDigitsInEvenPlaces};
for (int i=0;i<number.length;i++)
{
*(sum[i&1])+=number[i];
}

You could use two separate loops, one for the odd indexed digits and one for the even indexed digits.
Also your modulus conditional may be wrong, you're placing the even indexed digits (0,2,4...) in the odd accumulator. Could just be that you're considering the number to be 1-based indexing with the number array being 0-based (maybe what you intended), but for algorithms sake I will consider the number to be 0-based.
Here's my proposition
number = 123456789;
sumOfDigitsInOddPlaces=0;
sumOfDigitsInEvenPlaces=0;
//even digits
for (int i = 0; i < number.length; i = i + 2){
sumOfDigitsInEvenPlaces += number[i];
}
//odd digits, note the start at j = 1
for (int j = 1; i < number.length; i = i + 2){
sumOfDigitsInOddPlaces += number[j];
}
On the large scale this doesn't improve efficiency, still an O(N) algorithm, but it eliminates the branching

Since you added C# to the question:
var numString = "123456789";
var odds = numString.Split().Where((v, i) => i % 2 == 1);
var evens = numString.Split().Where((v, i) => i % 2 == 0);
var sumOfOdds = odds.Select(int.Parse).Sum();
var sumOfEvens = evens.Select(int.Parse).Sum();

Do you like Python?
num_string = "123456789"
odds = sum(map(int, num_string[::2]))
evens = sum(map(int, num_string[1::2]))

This Java solution requires no if/else, has no code duplication and is O(N):
number = "123456789";
int[] sums = new int[2]; //sums[0] == sum of even digits, sums[1] == sum of odd
for(int arrayIndex=0; arrayIndex < 2; ++arrayIndex)
{
for (int i=0; i < number.length()-arrayIndex; i += 2)
{
sums[arrayIndex] += Character.getNumericValue(number.charAt(i+arrayIndex));
}
}

Assuming number.length is even, it is quite simple. Then the corner case is to consider the last element if number is uneven.
int i=0;
while(i<number.length-1)
{
sumOfDigitsInEvenPlaces += number[ i++ ];
sumOfDigitsInOddPlaces += number[ i++ ];
}
if( i < number.length )
sumOfDigitsInEvenPlaces += number[ i ];
Because the loop goes over i 2 by 2, if number.length is even, removing 1 does nothing.
If number.length is uneven, it removes the last item.
If number.length is uneven, then the last value of i when exiting the loop is that of the not yet visited last element.
If number.length is uneven, by modulo 2 reasoning, you have to add the last item to sumOfDigitsInEvenPlaces.
This seems slightly more verbose, but also more readable, to me than Anonymous' (accepted) answer. However, benchmarks to come.
Well, the compiler seems to think my code more understandable as well, since he removes it all if I don't print the results (which explains why I kept getting a time of 0 all along...). The other code though is obfuscated enough for even the compiler.
In the end, even with huge arrays, it's pretty hard for clock_t to tell the difference between the two. You get about a third less instructions in the second case, but since everything's in cache (and your running sums even in registers) it doesn't matter much.
For the curious, I've put the disassembly of both versions (compiled from C) here : http://pastebin.com/2fciLEMw

Time complexity of a powerset generating function

I'm trying to figure out the time complexity of a function that I wrote (it generates a power set for a given string):
public static HashSet<string> GeneratePowerSet(string input)
{
HashSet<string> powerSet = new HashSet<string>();
if (string.IsNullOrEmpty(input))
return powerSet;
int powSetSize = (int)Math.Pow(2.0, (double)input.Length);
// Start at 1 to skip the empty string case
for (int i = 1; i < powSetSize; i++)
{
string str = Convert.ToString(i, 2);
string pset = str;
for (int k = str.Length; k < input.Length; k++)
{
pset = "0" + pset;
}
string set = string.Empty;
for (int j = 0; j < pset.Length; j++)
{
if (pset[j] == '1')
{
set = string.Concat(set, input[j].ToString());
}
}
powerSet.Add(set);
}
return powerSet;
}
So my attempt is this:
let the size of the input string be n
in the outer for loop, must iterate 2^n times (because the set size is 2^n).
in the inner for loop, we must iterate 2*n times (at worst).
1. So Big-O would be O((2^n)*n) (since we drop the constant 2)... is that correct?
And n*(2^n) is worse than n^2.
if n = 4 then
(4*(2^4)) = 64
(4^2) = 16
if n = 100 then
(10*(2^10)) = 10240
(10^2) = 100
2. Is there a faster way to generate a power set, or is this about optimal?

A comment:
the above function is part of an interview question where the program is supposed to take in a string, then print out the words in the dictionary whose letters are an anagram subset of the input string (e.g. Input: tabrcoz Output: boat, car, cat, etc.). The interviewer claims that a n*m implementation is trivial (where n is the length of the string and m is the number of words in the dictionary), but I don't think you can find valid sub-strings of a given string. It seems that the interviewer is incorrect.
I was given the same interview question when I interviewed at Microsoft back in 1995. Basically the problem is to implement a simple Scrabble playing algorithm.
You are barking up completely the wrong tree with this idea of generating the power set. Nice thought, clearly way too expensive. Abandon it and find the right answer.
Here's a hint: run an analysis pass over the dictionary that builds a new data structure more amenable to efficiently solving the problem you actually have to solve. With an optimized dictionary you should be able to achieve O(nm). With a more cleverly built data structure you can probably do even better than that.

2. Is there a faster way to generate a power set, or is this about optimal?
Your algorithm is reasonable, but your string handling could use improvement.
string str = Convert.ToString(i, 2);
string pset = str;
for (int k = str.Length; k < input.Length; k++)
{
pset = "0" + pset;
}
All you're doing here is setting up a bitfield, but using a string. Just skip this, and use variable i directly.
for (int j = 0; j < input.Length; j++)
{
if (i & (1 << j))
{
When you build the string, use a StringBuilder, not creating multiple strings.
// At the beginning of the method
StringBuilder set = new StringBuilder(input.Length);
...
// Inside the loop
set.Clear();
...
set.Append(input[j]);
...
powerSet.Add(set.ToString());
Will any of this change the complexity of your algorithm? No. But it will significantly reduce the number of extra String objects you create, which will provide you a good speedup.

Fastest way to count number of uppercase characters in c#

Any thoughts on the efficiency of this? ...
CommentText.ToCharArray().Where(c => c >= 'A' && c <= 'Z').Count()

Ok, just knocked up some code to time your method against this:
int count = 0;
for (int i = 0; i < s.Length; i++)
{
if (char.IsUpper(s[i])) count++;
}
The result:
Yours: 19737 ticks
Mine: 118 ticks
Pretty big difference! Sometimes the most straight-forward way is the most efficient.
Edit
Just out of interest, this:
int count = s.Count(c => char.IsUpper(c));
Comes in at at around 2500 ticks. So for a "Linqy" one-liner it's pretty quick.

First there is no reason you need to call ToCharArray() since, assuming CommentText is a string it is already an IEnumerable<char>. Second, you should probably be calling char.IsUpper instead of assuming you are only dealing with ASCII values. The code should really look like,
CommentText.Count(char.IsUpper)
Third, if you are worried about speed there isn't much that can beat the old for loop,
int count = 0;
for (int i = 0; i < CommentText.Length; i++)
if (char.IsUpper(CommentText[i]) count++;
In general, calling any method is going to be slower than inlining the code but this kind of optimization should only be done if you are absolutely sure this is the bottle-neck in your code.

You're only counting standard ASCII and not ÃÐÊ etc.
How about
CommentText.ToCharArray().Where(c => Char.IsUpper(c)).Count()

Without even testing I'd say
int count = 0;
foreach (char c in commentText)
{
if (Char.IsUpper(c))
count++;
}
is faster, off now to test it.

What you are doing with that code is to create a collection with the characters, then create a new collection containing only the uppercase characters, then loop through that collection only to find out how many there are.
This will perform better (but still not quite as good as a plain loop), as it doesn't create the intermediate collections:
CommentText.Count(c => Char.IsUpper(c))
Edit: Removed the ToCharArray call also, as Matt suggested.

I've got this
Regex x = new Regex("[A-Z]{1}",
RegexOptions.Compiled | RegexOptions.CultureInvariant);
int c = x.Matches(s).Count;
but I don't know if its particularly quick. It won't get special chars either, I s'pose
EDIT:
Quick comparison to this question's answer. Debug in vshost, 10'000 iterations with the string:
aBcDeFGHi1287jKK6437628asghwHllmTbynerA
The answer: 20-30 ms
The regex solution: 140-170 ms

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Most efficient way to determine first character in string? - c#

Which of these methods are the most efficient one or is there a better way to do it? this.returnList[i].Title[0].ToString() or this.returnList[i].Title.Substring(0, 1)

Related

Smallest and fastest way to capture the output of a forloop

Can I achieve linear (or close to) complexity concatenating strings without allocation?

What would be the shortest way to sum up the digits in odd and even places separately

Time complexity of a powerset generating function

Fastest way to count number of uppercase characters in c#

Categories

Resources