Algorithm Comparison C# [closed]

I am currently looking at typical interview questions, to get me in the right frame of mind.
I am trying to come up with my own solutions to the problems instead of trying to remember the given solutions.
The problem is that I'm not sure whether my solutions are optimal or have a major design flaw that I am not seeing.
So here is one of the solutions I came up with for the basic "is this string unique" problem, i.e. checking whether all characters in a string are unique.
public static bool IsUnique(string str)
{
    bool isUnique = true;
    for (int i = 0; i < str.Length; i++)
    {
        if (str.LastIndexOf(str.ElementAt(i)) != i)
        {
            isUnique = false;
            break;
        }
    }
    return isUnique;
}
Does anyone have advice on whether this code is optimal and has acceptable time and space complexity?

For the purposes of this answer I will refer to Big-O notation to indicate complexity of an algorithm. The trick to efficiency is to realize the minimum Big-O measurement at which the problem can be solved, and then attempt to replicate that efficiency.
You can derive some efficiency facts by thinking about the algorithm logically: to check if all characters are unique, you need to evaluate all characters. So that's an O(n) traversal of the string guaranteed, and I doubt you'd easily get more efficient than that. Now, can you solve it yourself in O(n) or O(2n) time? If so, that's pretty decent because your algorithm is in linear time and will scale linearly (steadily get slower for larger string inputs).
Your current algorithm loops over the string and then for each character, iterates over the string again to compare it and find an equal character. This makes the algorithm an n traversal where each visit does an n traversal itself, so an O(n^2) algorithm. This is known as a polynomial time algorithm, which is not very good because it does not scale linearly; it scales polynomially. This means that your algorithm will get much slower with larger inputs, and that's a bad thing.
A quick change to make it slightly more efficient would be to start the search for an equal character at the current index in the string + 1. You know that all previously checked characters are unique, so you only care about the characters ahead of you. This becomes an n traversal where each visit scans only the remainder of the string (less work as you move along), but it is still an O(n^2) algorithm: the total work is n + (n-1) + ... + 1 = n(n+1)/2, which grows with the square of n. This is also a polynomial time algorithm, as before, but is slightly more efficient. It will still scale badly with larger inputs, however.
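For illustration, a minimal sketch of that tweak (still O(n^2) in the worst case, just with less work per pass; the method name is mine):
public static bool IsUniqueForwardScan(string str)
{
    for (int i = 0; i < str.Length; i++)
    {
        // only look at characters after position i; earlier positions were already checked
        if (str.IndexOf(str[i], i + 1) != -1)
        {
            return false;
        }
    }
    return true;
}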
Think of alternative ways to avoid repeated iterations. These often come at the cost of memory, but are practical. I know how I would try and solve it, but telling you my answer doesn't help you learn. ;)
EDIT: As you requested, I'll share my answer
I'd do it by having a HashSet that I load each visited character into. HashSet lookups and adds are approximately an O(1) operation. The beauty of the HashSet.Add method is that it returns true if it added the value and false if the value already existed (which is the condition that determines your algorithm result). So mine would be:
var hashSet = new HashSet<char>();
foreach (char c in myString)
{
if (!hashSet.Add(c))
{
return false;
}
}
return true;
Pros: O(n) linear algorithm.
Cons: Extra memory used for HashSet.
EDIT2: Everyone loves cheap LINQ tricks, so here's another way
var hashSet = new HashSet<char>();
return !myString.Any(c => !hashSet.Add(c)); // true only if no Add call fails, i.e. no duplicates

Using a HashSet is more efficient, as it has a constant lookup time, O(1), compared to looking up a character in a string, which has a linear lookup time, O(n):
public static bool AreCharsUnique(string str)
{
    var charset = new HashSet<char>();
    foreach (char c in str)
    {
        if (charset.Contains(c))
        {
            return false;
        }
        else
        {
            charset.Add(c);
        }
    }
    return true;
}
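A quick usage check of the method above:
Console.WriteLine(AreCharsUnique("abcdef"));  // True
Console.WriteLine(AreCharsUnique("hello"));   // False ('l' repeats)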


Can you speed up this algorithm? C# / C++ [closed]

Hey, I've been working on something from time to time and it has become relatively large now (and slow). However, I managed to pinpoint the bottleneck after carefully measuring performance as a function of time.
Say I want to "permute" the string "ABC". What I mean by "permute" is not quite a permutation but rather a continuous substring set following this pattern:
A
AB
ABC
B
BC
C
I have to check for every substring if it is contained within another string S2 so I've done some quick'n dirty literal implementation as follows:
int strlen1 = str1.Length;
string sub;
for (int i = 0; i < strlen1; i++)
{
    // substrings of str1 starting at i, growing one character at a time
    for (int j = 1; j <= strlen1 - i; j++)
    {
        sub = str1.Substring(i, j);
        if (str2.Contains(sub)) { /* do stuff */ }
        else break;
    }
}
This was very slow initially, but then I realised that if the shorter prefix doesn't exist, there is no need to check the longer ones: if sub isn't contained within str2, I can break out of the inner loop.
OK, this gave blazing fast results, but when calculating my algorithm's complexity I realised that in the worst case this is N^4? I had forgotten that String.Contains() and String.Substring() both have their own complexities (N or N^2, I forget which).
The fact that I make a huge number of calls to those inside a second for loop makes it perform rather... well, N^4 says enough.
However, I also calculated the average run-time mathematically, using probability theory to evaluate the probability of the substring growing in a pool of randomly generated strings (this was my baseline), measuring when the probability became > 0.5 (50%).
This showed (roughly) an exponential relationship between the number of different characters and the string length, which means that in the scenarios where I use my algorithm the length of string1 will most probably never exceed 7.
Thus the average complexity would be ~O(N * M), where N is the length of string1 and M is the length of string2. Because I tested N as a function of a constant M, I got linear growth, ~O(N) (not bad compared to the N^4, eh?).
I did time testing and plotted a graph which showed nearly perfect linear growth, so my actual results matched my mathematical predictions (yay!).
However, this was NOT taking into account the cost of String.Contains() and String.Substring(), which made me wonder if this could be optimized even further?
I've also been thinking of doing this in C++ because I need rather low-level stuff. What do you guys think? I have put a great deal of time into analysing this; I hope I've explained everything clearly enough :)!
Your question is tagged both C++ and C#.
In C++ the optimal solution will be to use iterators and std::search. The original strings remain unmodified, and no intermediate objects get created. There won't be an equivalent of your Substring() calls taking place at all, so this eliminates that part of the overhead.
This should achieve the theoretically-best performance: brute force search, testing all permutations, with no intermediate object construction or destruction, other than the iterators themselves, which simply replace your two int index variables. I can't think of any faster way of implementing this basic algorithm.
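On the C# side, a rough sketch of the same idea (no intermediate Substring objects) is to search spans instead of string copies; this assumes a runtime with ReadOnlySpan&lt;char&gt; support (.NET Core 2.1 or later):
ReadOnlySpan<char> s1 = str1.AsSpan();
ReadOnlySpan<char> s2 = str2.AsSpan();
for (int i = 0; i < s1.Length; i++)
{
    for (int j = 1; j <= s1.Length - i; j++)
    {
        // Slice creates a view over str1, so no substring is allocated
        if (s2.IndexOf(s1.Slice(i, j)) >= 0) { /* do stuff */ }
        else break;   // a longer extension of a missing substring cannot match either
    }
}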
Are you testing one string against one string? If you test a bunch of strings against another bunch of strings, it is a whole different story. Even if you have the best algorithm for comparing one string against another, O(X), it does not mean that repeating it M*N times gives you the best algorithm for processing M strings against N.
When I made something similar, I built a dictionary of all substrings of all N strings:
Dictionary<string, List<int>>
The string key is a substring and the int is the index of the string that contains that substring. Then I tested all substrings of all M strings against it. The speed was suddenly not O(M*N*X) but O(max(M,N)*S), where S is the number of substrings of one string. Depending on M, N, X and S, that may be faster. I am not saying the dictionary of substrings is the best approach; I just want to point out that you should always try to see the whole picture.
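A rough sketch of how that index might be built (names such as firstBunch are illustrative, not the poster's actual code):
// Index every substring of every string in the first bunch.
var index = new Dictionary<string, List<int>>();
for (int s = 0; s < firstBunch.Length; s++)
{
    string str = firstBunch[s];
    for (int i = 0; i < str.Length; i++)
    {
        for (int j = 1; j <= str.Length - i; j++)
        {
            string sub = str.Substring(i, j);
            if (!index.TryGetValue(sub, out var owners))
            {
                owners = new List<int>();
                index[sub] = owners;
            }
            owners.Add(s);
        }
    }
}
// Later: for each substring of a string from the second bunch,
// index.TryGetValue(sub, out owners) tells you which strings contain it.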

Algorithm for Math.Random [closed]

This was a question asked by an interviewer; I was unable to answer it.
The question was: assume you want to pick a random number from a given array.
The conditions are that you are not supposed to pick elements sequentially and
not to use the built-in Random function.
I have no idea. I'd like to know what Math.Random actually does for us.
I googled and didn't find the implementation/logic behind it.
Anyone know?
So far three people have told you to use the last digit of Ticks. This doesn't work. Try doing so in a tight loop and you will quickly see why it is a bad idea.
The question is not very well posed. I like giving ambiguously posed questions in interviews because you get to find out how the candidate deals with an ambiguous situation. In this case I would immediately push back and find out what the interviewer means by "random". Is pseudo-randomness good enough? Is there a source of high-quality entropy available?
Once you have a clarified question it should be easier to answer.
The problem comes down to managing entropy. If you have a very weak source of entropy -- like the value of Ticks (not the last digit, which is worthless, but the entire value) then you can use that to seed a pseudo-random-number generator. If you have a high quality source of entropy then you can just use that to generate random bits directly.
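To illustrate the "use high-quality entropy directly" route, here is a minimal hedged sketch that pulls bytes from the OS cryptographic RNG and reduces them to an array index (the method name is mine, not from the question):
using System;
using System.Security.Cryptography;

static int PickRandomIndex(int arrayLength)
{
    byte[] buffer = new byte[4];
    using (var rng = RandomNumberGenerator.Create())
    {
        rng.GetBytes(buffer);   // fills the buffer with cryptographically strong random bytes
    }
    int value = BitConverter.ToInt32(buffer, 0) & int.MaxValue;  // clear the sign bit
    return value % arrayLength;  // note: a plain modulo introduces a slight bias for large ranges
}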
Guaranteed to be random. (tongue FIRMLY in cheek):
void Main()
{
    Enumerable.Range(0, 10).Select(x => ComeOnItsKindaRandom(0, 10)).Dump();
}

public int ComeOnItsKindaRandom(int minValue, int maxValue)
{
    var query = "http://www.random.org/integers/?num=1&min={0}&max={1}&col=1&base=10&format=plain&rnd=new";
    var request = WebRequest.Create(string.Format(query, minValue, maxValue));
    var response = request.GetResponse();
    using (var sr = new StreamReader(response.GetResponseStream()))
    {
        var body = sr.ReadToEnd().Trim();
        return int.Parse(body);
    }
}
If you only want to get one item from an array, without using the Random class, you could use a modulo function with an unknown value, such as DateTime.Now.Ticks:
string[] items = new[] { "1", "2", "3", "4", "5" };
// Modulo items.Length returns a value from 0 to Length - 1
int index = (int)(DateTime.Now.Ticks % items.Length);
Console.WriteLine(items[index]);
I would use
DateTime.Now.Ticks
and then take just enough digits; for example, for Math.Random(10) I would take only the last two.
Or you can take the modulo of the ticks, as follows:
public static class MyMath
{
    private static int counter = 1;

    public static int Random(int max)
    {
        counter++;
        long ticks = DateTime.Now.Ticks;
        int result = Math.Abs((int)(ticks / counter) % max);
        return result;
    }
}
See the following test:
[Test]
public void test()
{
    List<int> test = new List<int>();
    for (int i = 0; i < 10; i++)
    {
        test.Add(MyMath.Random(100));
    }
    Console.WriteLine("result:");
    foreach (int i in test)
    {
        Console.WriteLine(i);
    }
}
Here is an implementation of Random numbers in C. You could try rewriting it in C#.
Random Numbers for C: End, at last?
http://www.cse.yorku.ca/~oz/marsaglia-rng.html
It seems to be of very high quality.
Writing this code in an interview might not be easy, but you could definitely tell them the ideas it uses.
Just one idea, you could use one of the last digits in DateTime.Now.Ticks to get a basis to choose the index. Or maybe some hash function on the same thing. Or use a web service that can give you random numbers measured from radiation. Math.Random just picks from predefined tables (yes indeed, it's not truly random).
I think they were just seeing if you knew about LCG (Linear Congruential Generator) algorithms.
The maths behind them is somewhat tricky however, so I doubt they could expect you to be able to write one off the top of your head.
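For reference, a minimal LCG sketch (constants taken from Numerical Recipes, modulus 2^32 via unsigned wrap-around; illustrative only, not a production-quality generator):
public class TinyLcg
{
    private uint _state;

    public TinyLcg(uint seed)
    {
        _state = seed;
    }

    public int Next(int maxValue)
    {
        // x(n+1) = (1664525 * x(n) + 1013904223) mod 2^32
        _state = unchecked(1664525u * _state + 1013904223u);
        return (int)(_state % (uint)maxValue);
    }
}

// e.g. var lcg = new TinyLcg((uint)DateTime.Now.Ticks); int index = lcg.Next(array.Length);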
But failing that, couldn't you just cheat like this to generate a random index?
int index = (Guid.NewGuid().GetHashCode() & int.MaxValue) % array.Length; // mask the sign bit so the index can't be negative
First of all, it's known as pseudo-random rather than just random, since a truly random series is impossible to generate in a purely computational way.
Most pseudo-random number generators (PRNGs) have this form:
at time 0: R(0) = Random(Seed)
at time i: R(i) = Random(R(i-1))
Second, random doesn't mean you can't know what the i-th outcome will be, but rather that the series is robust and it's very difficult to guess the formula or the seed given a chain of outcomes.
Hope this helps

What is the fastest way to calculate frequency distribution for array in C#?

I am just wondering what the best approach is for that calculation. Let's assume I have an input array of values and an array of boundaries, and I want to calculate/bucketize the frequency distribution for each segment in the boundaries array.
Is it a good idea to use a bucket search for that?
Actually, I found this question: Calculating frequency distribution of a collection with .Net/C#
But I do not understand how to use buckets for that purpose, because the size of each bucket can be different in my situation.
EDIT:
After all the discussion, I have an inner/outer loop solution, but I still want to eliminate the inner loop with a Dictionary to get O(n) performance. If I understood correctly, I need to hash input values into a bucket index, so we need some sort of hash function with O(1) complexity. Any ideas how to do it?
Bucket Sort is already O(n^2) worst case, so I would just do a simple inner/outer loop here. Since your bucket array is necessarily shorter than your input array, keep it on the inner loop. Since you're using custom bucket sizes, there are really no mathematical tricks that can eliminate that inner loop.
int[] freq = new int[buckets.Length - 1];
foreach (int d in input)
{
    for (int i = 0; i < buckets.Length - 1; i++)
    {
        if (d >= buckets[i] && d < buckets[i + 1])
        {
            freq[i]++;
            break;
        }
    }
}
It's also O(n^2) worst case but you can't beat the code simplicity. I wouldn't worry about optimization until it becomes a real issue. If you have a larger bucket array, you could use a binary search of some sort. But, since frequency distributions are typically < 100 elements, I doubt you'd see a lot of real-world performance benefit.
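For completeness, a hedged sketch of that binary-search variant (assuming buckets is sorted ascending, with the same bucket semantics as the loop above):
int[] freq = new int[buckets.Length - 1];
foreach (int d in input)
{
    // BinarySearch returns the match index, or the bitwise complement
    // of the index of the next larger element when there is no match.
    int pos = Array.BinarySearch(buckets, d);
    int bucket = pos >= 0 ? pos : ~pos - 1;
    if (bucket >= 0 && bucket < freq.Length)
    {
        freq[bucket]++;
    }
}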
If your input array represents real-world data (with its patterns) and the array of boundaries is too large to iterate over again and again in an inner loop, you can consider the following approach:
First of all, sort your input array. For real-world data I would recommend considering Timsort (see the Wikipedia article); it provides very good performance guarantees for the patterns that appear in real-world data.
Then traverse the sorted array and compare each value with the current boundary:
If the value in the input array is less than the boundary, increment the frequency counter for this boundary.
If the value in the input array is greater than the boundary, move on to the next boundary and increment the counter for the new boundary.
In code it can look something like this (using an array of sorted boundary values and a separate array of counters):
Timsort(myArray);                          // or Array.Sort(myArray); Timsort just suits real-world data well

int[] boundaries = GetBoundaries();        // sorted upper bounds, one per bucket
int[] freq = new int[boundaries.Length];   // frequency counter per bucket
int boundPos = 0;

for (int i = 0; i < myArray.Length; i++)
{
    // advance to the first boundary that still covers the current (sorted) value
    while (boundPos < boundaries.Length - 1 && myArray[i] >= boundaries[boundPos])
    {
        boundPos++;
    }
    freq[boundPos]++;
}

Optimizing a Recursive Function for Very Large Lists .Net

I have built an application that is used to simulate the number of products that a company can produce in different "modes" per month. This simulation is used to aid in finding the optimal series of modes to run in for a month to best meet the projected sales forecast for the month.
This application has been working well, until recently when the plant was modified to run in additional modes. It is now possible to run in 16 modes. For a month with 22 work days this yields 9,364,199,760 possible combinations. This is up from 8 modes in the past, which would have yielded a mere 1,560,780 possible combinations. The PC that runs this application is on the old side and cannot handle the number of calculations before an out-of-memory exception is thrown. In fact, the entire application cannot support more than 15 modes because it uses integers to track the number of modes and it exceeds the upper limit for an integer.
Barring that issue, I need to do what I can to reduce the memory utilization of the application and optimize it to run as efficiently as possible, even if it cannot achieve the stated goal of 16 modes. I was considering writing the data to disk rather than storing the list in memory, but before I take on that overhead, I would like to get people's opinion on the method to see if there is any room for optimization there.
EDIT
Based on suggestions from a few people to consider something more academic than merely calculating every possible answer, below is a brief explanation of how the optimal run (combination of modes) is chosen.
Currently the computer determines every possible way that the plant can run for the number of work days that month. For example, 3 modes for a maximum of 2 work days would result in the combinations (where the number represents the mode chosen): (1,1), (1,2), (1,3), (2,2), (2,3), (3,3).
In each mode a product produces at a different rate; for example, in mode 1, product x may produce at 50 units per hour, where product y produces at 30 units per hour and product z produces at 0 units per hour. Each combination is then multiplied by work hours and production rates. The run that produces numbers most closely matching the forecasted value for each product for the month is chosen.
However, because in some months the plant does not meet the forecasted value for a product, the algorithm increases the priority of that product for the next month to ensure that by the end of the year the product has met its forecast. Since warehouse space is tight, it is important that products not overproduce too much either.
Thank you
private List<List<int>> _modeIterations = new List<List<int>>();

private void CalculateCombinations(int modes, int workDays, string combinationValues)
{
    List<int> _tempList = new List<int>();
    if (modes == 1)
    {
        combinationValues += Convert.ToString(workDays);
        string[] _combinations = combinationValues.Split(',');
        foreach (string _number in _combinations)
        {
            _tempList.Add(Convert.ToInt32(_number));
        }
        _modeIterations.Add(_tempList);
    }
    else
    {
        for (int i = workDays + 1; --i >= 0; )
        {
            CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",");
        }
    }
}
This kind of optimization problem is difficult but extremely well-studied. You should probably read up in the literature on it rather than trying to re-invent the wheel. The keywords you want to look for are "operations research" and "combinatorial optimization problem".
It is well-known in the study of optimization problems that finding the optimal solution to a problem is almost always computationally infeasible as the problem grows large, as you have discovered for yourself. However, it is frequently the case that finding a solution guaranteed to be within a certain percentage of the optimal solution is feasible. You should probably concentrate on finding approximate solutions. (After all, your sales targets are already just educated guesses, therefore finding the optimal solution is already going to be impossible; you haven't got complete information.)
What I would do is start by reading the wikipedia page on the Knapsack Problem:
http://en.wikipedia.org/wiki/Knapsack_problem
This is the problem of "I've got a whole bunch of items of different values and different weights, I can carry 50 pounds in my knapsack, what is the largest possible value I can carry while meeting my weight goal?"
This isn't exactly your problem, but clearly it is related -- you've got a certain amount of "value" to maximize, and a limited number of slots to pack that value into. If you can start to understand how people find near-optimal solutions to the knapsack problem, you can apply that to your specific problem.
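To make the connection concrete, here is a minimal 0/1 knapsack sketch (the textbook dynamic-programming formulation, not your plant model; think of weights as days consumed and values as how well a run matches the forecast):
// best[w] = the best total value achievable with capacity w
static int Knapsack(int capacity, int[] weights, int[] values)
{
    var best = new int[capacity + 1];
    for (int item = 0; item < weights.Length; item++)
    {
        // iterate capacity downwards so each item is used at most once
        for (int w = capacity; w >= weights[item]; w--)
        {
            best[w] = Math.Max(best[w], best[w - weights[item]] + values[item]);
        }
    }
    return best[capacity];
}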
You could process the permutation as soon as you have generated it, instead of collecting them all in a list first:
public delegate void Processor(List<int> args);

private void CalculateCombinations(int modes, int workDays, string combinationValues, Processor processor)
{
    if (modes == 1)
    {
        List<int> _tempList = new List<int>();
        combinationValues += Convert.ToString(workDays);
        string[] _combinations = combinationValues.Split(',');
        foreach (string _number in _combinations)
        {
            _tempList.Add(Convert.ToInt32(_number));
        }
        processor.Invoke(_tempList);
    }
    else
    {
        for (int i = workDays + 1; --i >= 0; )
        {
            CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",", processor);
        }
    }
}
I am assuming here that your current pattern of work is something along the lines of
CalculateCombinations(initial_value_1, initial_value_2, initial_value_3);
foreach (List<int> list in _modeIterations)
{
    // ... process the list ...
}
With the direct-process approach, this would be
private void ProcessPermutation(List<int> args)
{
    // ... process ...
}

// ... somewhere else ...
CalculateCombinations(initial_value_1, initial_value_2, initial_value_3, ProcessPermutation);
I would also suggest that you try to prune the search tree as early as possible; if you can already tell that certain combinations of the arguments will never yield anything that can be processed, you should catch those during generation and avoid the recursion altogether, if possible.
In newer versions of C#, generating the combinations with an iterator function might let you keep the original structure of your code. I haven't really used this feature (yield) as of yet, so I cannot comment on it in depth.
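For what it's worth, a hedged sketch of that iterator idea (the method name is mine; the generation logic mirrors the code above but yields each combination lazily instead of storing it):
private IEnumerable<List<int>> EnumerateCombinations(int modes, int workDays, string combinationValues)
{
    if (modes == 1)
    {
        var result = new List<int>();
        combinationValues += Convert.ToString(workDays);
        foreach (string number in combinationValues.Split(','))
        {
            result.Add(Convert.ToInt32(number));
        }
        yield return result;
    }
    else
    {
        for (int i = workDays + 1; --i >= 0; )
        {
            foreach (var combination in EnumerateCombinations(modes - 1, workDays - i, combinationValues + i + ","))
            {
                yield return combination;
            }
        }
    }
}

// usage, mirroring the question's numbers: foreach (List<int> run in EnumerateCombinations(16, 22, "")) { /* process run */ }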
The problem lies more in the Brute Force approach that in the code itself. It's possible that brute force might be the only way to approach the problem but I doubt it. Chess, for example, is unresolvable by Brute Force but computers play at it quite well using heuristics to discard the less promising approaches and focusing on good ones. Maybe you should take a similar approach.
On the other hand we need to know how each "mode" is evaluated in order to suggest any heuristics. In your code you're only computing all possible combinations which, anyway, will not scale if the modes go up to 32... even if you store it on disk.
if (modes == 1)
{
    List<int> _tempList = new List<int>();
    combinationValues += Convert.ToString(workDays);
    string[] _combinations = combinationValues.Split(',');
    foreach (string _number in _combinations)
    {
        _tempList.Add(Convert.ToInt32(_number));
    }
    processor.Invoke(_tempList);
}
Everything in this block of code is executed over and over again, so no line in that code should make use of memory without freeing it. The most obvious place to avoid memory craziness is to write out combinationValues to disk as it is processed (i.e. use a FileStream, not a string). I think that in general, doing string concatenation the way you are doing here is bad, since every concatenation results in memory sadness. At least use a StringBuilder (see Back to Basics, which discusses the same issue in terms of C). There may be other places with issues, though. The simplest way to figure out why you are getting an out of memory error may be to use a memory profiler (download link from download.microsoft.com).
By the way, my tendency with code like this is to have a global List object that is Clear()ed rather than having a temporary one that is created over and over again.
I would replace the List objects with my own class that uses preallocated arrays to hold the ints. I'm not really sure about this right now, but I believe that each integer in a List is boxed, which means much more memory is used than with a simple array of ints.
Edit: On the other hand it seems I am mistaken: Which one is more efficient : List<int> or int[]

working with incredibly large numbers in .NET

I'm trying to work through the problems on projecteuler.net but I keep running into a couple of problems.
The first is a question of storing large quantities of elements in a List<T>. I keep getting OutOfMemoryExceptions when storing large quantities in the list.
Now I admit I might not be doing these things in the best way but, is there some way of defining how much memory the app can consume?
It usually crashes when I get to about 100,000,000 elements :S
Secondly, some of the questions require the addition of massive numbers. I use ulong data type where I think the number is going to get super big, but I still manage to wrap past the largest supported int and get into negative numbers.
Do you have any tips for working with incredibly large numbers?
Consider System.Numerics.BigInteger.
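A quick sketch of what that looks like (requires a reference to the System.Numerics assembly, available since .NET 4.0):
using System;
using System.Numerics;

BigInteger factorial = 1;
for (int i = 2; i <= 100; i++)
{
    factorial *= i;   // never overflows; BigInteger grows as needed
}
Console.WriteLine(factorial);                 // 100! (158 digits)
Console.WriteLine(BigInteger.Pow(2, 1000));   // 2^1000, far beyond ulong.MaxValue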
You need to use a large number class that uses some basic math principles to split these operations up. This implementation of a C# BigInteger library on CodeProject seems to be the most promising. The article has some good explanations of how operations with massive numbers work, as well.
Also see:
Big integers in C#
As far as Project Euler goes, you might be barking up the wrong tree if you are hitting OutOfMemory exceptions. From their website:
Each problem has been designed according to a "one-minute rule", which means that although it may take several hours to design a successful algorithm with more difficult problems, an efficient implementation will allow a solution to be obtained on a modestly powered computer in less than one minute.
As user Jakers said, if you're using big numbers, you're probably doing it wrong.
Of the Project Euler problems I've done, none have required big-number math so far.
It's more about finding the proper algorithm to avoid big numbers.
Want hints? Post here, and we might have an interesting Euler-thread started.
I assume this is C#? F# has built in ways of handling both these problems (BigInt type and lazy sequences).
You can use both F# techniques from C#, if you like. The BigInt type is reasonably usable from other languages if you add a reference to the core F# assembly.
Lazy sequences are basically just syntax friendly enumerators. Putting 100,000,000 elements in a list isn't a great plan, so you should rethink your solutions to get around that. If you don't need to keep information around, throw it away! If it's cheaper to recompute it than store it, throw it away!
See the answers in this thread. You probably need to use one of the third-party big integer libraries/classes available or wait for C# 4.0 which will include a native BigInteger datatype.
As far as defining how much memory an app will use, you can check the available memory before performing an operation by using the MemoryFailPoint class.
This allows you to preallocate memory before doing the operation, so you can check if an operation will fail before running it.
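A hedged sketch of that pattern (the size argument is in megabytes and the numbers here are illustrative; RunMemoryHungryOperation is a hypothetical stand-in for your own code):
using System;
using System.Runtime;

try
{
    // Ask the CLR up front whether roughly 512 MB is likely to be available.
    using (new MemoryFailPoint(512))
    {
        RunMemoryHungryOperation();   // hypothetical method representing the real work
    }
}
catch (InsufficientMemoryException)
{
    Console.WriteLine("Not enough memory to start the operation safely.");
}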
string Add(string s1, string s2)
{
    bool carry = false;
    string result = string.Empty;

    if (s1.Length < s2.Length)
        s1 = s1.PadLeft(s2.Length, '0');
    if (s2.Length < s1.Length)
        s2 = s2.PadLeft(s1.Length, '0');

    for (int i = s1.Length - 1; i >= 0; i--)
    {
        var augend = Convert.ToInt64(s1.Substring(i, 1));
        var addend = Convert.ToInt64(s2.Substring(i, 1));
        var sum = augend + addend;
        sum += (carry ? 1 : 0);
        carry = false;
        if (sum > 9)
        {
            carry = true;
            sum -= 10;
        }
        result = sum.ToString() + result;
    }

    if (carry)
    {
        result = "1" + result;
    }
    return result;
}
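For example (assuming the Add method above):
Console.WriteLine(Add("999", "1"));                   // 1000
Console.WriteLine(Add("18446744073709551615", "1"));  // one past ulong.MaxValue: 18446744073709551616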
I am not sure if it is a good way of handling it, but I use the following in my project.
I have a "double theRelevantNumber" variable and an "int PowerOfTen" for each item and in my relevant class I have a "int relevantDecimals" variable.
So... when large numbers is encountered they are handled like this:
First they are changed to x,yyy form. So if the number 123456,789 was inputed and the "powerOfTen" was 10, it would start like this:
theRelevantNumber = 123456,789
PowerOfTen = 10
The number was then: 123456,789*10^10
It is then changed to:
1,23456789*10^15
It is then rounded by the number of relevant decimals (for example 5) to 1,23456 and then saved along with "PowerOfTen = 15"
When adding or subtracting numbers, any digits outside the relevant decimals are ignored. Meaning if you take:
1*10^15 + 1*10^10, it becomes 1,00001 if "relevantDecimals" is 5 but does not change at all if "relevantDecimals" is 4.
This method lets you deal with numbers up to doubleLimit*10^intLimit without any problem, and at least in OOP it is not that hard to keep track of.
You don't need to use BigInteger. You can do this even with a string array of numbers.
class Solution
{
    static void Main(String[] args)
    {
        int n = 5;
        string[] unsorted = new string[6] { "3141592653589793238", "1", "3", "5737362592653589793238", "3", "5" };
        string[] result = SortStrings(n, unsorted);
        foreach (string s in result)
            Console.WriteLine(s);
        Console.ReadLine();
    }

    static string[] SortStrings(int size, string[] arr)
    {
        Array.Sort(arr, (left, right) =>
        {
            if (left.Length != right.Length)
                return left.Length - right.Length;
            return left.CompareTo(right);
        });
        return arr;
    }
}
If you want to work with incredibly large numbers look here...
MIKI Calculator
I am not a professional programmer; I write for myself, sometimes, so sorry for the unprofessional use of C#, but the program works. I will be grateful for any advice and corrections.
I use this calculator to generate 32-character passwords from numbers that are around 58 digits long.
Since the program adds numbers in the string format, you can perform calculations on numbers with the maximum length of the string variable. The program uses long lists for the calculation, so it is possible to calculate on larger numbers, possibly 18x the maximum capacity of the list.
