Time complexity of a power set generating function - C#

I'm trying to figure out the time complexity of a function that I wrote (it generates a power set for a given string):
public static HashSet<string> GeneratePowerSet(string input)
{
    HashSet<string> powerSet = new HashSet<string>();
    if (string.IsNullOrEmpty(input))
        return powerSet;
    int powSetSize = (int)Math.Pow(2.0, (double)input.Length);
    // Start at 1 to skip the empty string case
    for (int i = 1; i < powSetSize; i++)
    {
        string str = Convert.ToString(i, 2);
        string pset = str;
        for (int k = str.Length; k < input.Length; k++)
        {
            pset = "0" + pset;
        }
        string set = string.Empty;
        for (int j = 0; j < pset.Length; j++)
        {
            if (pset[j] == '1')
            {
                set = string.Concat(set, input[j].ToString());
            }
        }
        powerSet.Add(set);
    }
    return powerSet;
}
So my attempt is this:
let the size of the input string be n
in the outer for loop, must iterate 2^n times (because the set size is 2^n).
in the inner for loop, we must iterate 2*n times (at worst).
1. So Big-O would be O((2^n)*n) (since we drop the constant 2)... is that correct?
And n*(2^n) is worse than n^2.
if n = 4 then
(4*(2^4)) = 64
(4^2) = 16
if n = 10 then
(10*(2^10)) = 10240
(10^2) = 100
2. Is there a faster way to generate a power set, or is this about optimal?

A comment:
the above function is part of an interview question where the program is supposed to take in a string, then print out the words in the dictionary whose letters are an anagram subset of the input string (e.g. Input: tabrcoz Output: boat, car, cat, etc.). The interviewer claims that an n*m implementation is trivial (where n is the length of the string and m is the number of words in the dictionary), but I don't think you can find valid sub-strings of a given string that cheaply. It seems to me that the interviewer is incorrect.
I was given the same interview question when I interviewed at Microsoft back in 1995. Basically the problem is to implement a simple Scrabble playing algorithm.
You are barking up completely the wrong tree with this idea of generating the power set. Nice thought, clearly way too expensive. Abandon it and find the right answer.
Here's a hint: run an analysis pass over the dictionary that builds a new data structure more amenable to efficiently solving the problem you actually have to solve. With an optimized dictionary you should be able to achieve O(nm). With a more cleverly built data structure you can probably do even better than that.
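To make the hint concrete, here is one way such an analysis pass might look: precompute a letter-count signature for every dictionary word once, so each query only costs a fixed number of comparisons per word. This is only a sketch; the class, the method names and the tiny word list are illustrative, not part of the question.
using System;
using System.Collections.Generic;

static class AnagramSubsetLookup
{
    // One-time analysis pass: a 26-entry letter-count signature per dictionary word.
    static Dictionary<string, int[]> BuildSignatures(IEnumerable<string> words)
    {
        var signatures = new Dictionary<string, int[]>();
        foreach (var word in words)
            signatures[word] = CountLetters(word);
        return signatures;
    }

    static int[] CountLetters(string s)
    {
        var counts = new int[26];
        foreach (char ch in s.ToLowerInvariant())
            if (ch >= 'a' && ch <= 'z')
                counts[ch - 'a']++;
        return counts;
    }

    // A word can be spelled from the input if it never needs more of any letter
    // than the input supplies, so each word costs at most 26 comparisons.
    static IEnumerable<string> WordsSpellableFrom(string input, Dictionary<string, int[]> signatures)
    {
        var available = CountLetters(input);
        foreach (var pair in signatures)
        {
            bool ok = true;
            for (int i = 0; i < 26 && ok; i++)
                ok = pair.Value[i] <= available[i];
            if (ok)
                yield return pair.Key;
        }
    }

    static void Main()
    {
        var words = new[] { "boat", "car", "cat", "zebra" };    // stand-in for the real dictionary
        var signatures = BuildSignatures(words);
        foreach (var word in WordsSpellableFrom("tabrcoz", signatures))
            Console.WriteLine(word);                            // boat, car, cat
    }
}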

2. Is there a faster way to generate a power set, or is this about optimal?
Your algorithm is reasonable, but your string handling could use improvement.
string str = Convert.ToString(i, 2);
string pset = str;
for (int k = str.Length; k < input.Length; k++)
{
    pset = "0" + pset;
}
All you're doing here is setting up a bitfield, but using a string. Just skip this, and use variable i directly.
for (int j = 0; j < input.Length; j++)
{
    if ((i & (1 << j)) != 0)
    {
When you build the string, use a StringBuilder instead of creating multiple intermediate strings.
// At the beginning of the method
StringBuilder set = new StringBuilder(input.Length);
...
// Inside the loop
set.Clear();
...
set.Append(input[j]);
...
powerSet.Add(set.ToString());
Will any of this change the complexity of your algorithm? No. But it will significantly reduce the number of extra String objects you create, which will provide you a good speedup.
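Putting those two changes together, the whole method might look roughly like this (a sketch, assuming using System.Collections.Generic; and using System.Text; are in scope, and .NET 4 or later for StringBuilder.Clear()):
public static HashSet<string> GeneratePowerSet(string input)
{
    HashSet<string> powerSet = new HashSet<string>();
    if (string.IsNullOrEmpty(input))
        return powerSet;

    int powSetSize = 1 << input.Length;              // 2^n subsets
    StringBuilder set = new StringBuilder(input.Length);

    // Start at 1 to skip the empty subset, as in the original.
    for (int i = 1; i < powSetSize; i++)
    {
        set.Clear();
        for (int j = 0; j < input.Length; j++)
        {
            // Bit j of i decides whether input[j] is in this subset.
            if ((i & (1 << j)) != 0)
                set.Append(input[j]);
        }
        powerSet.Add(set.ToString());
    }
    return powerSet;
}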

Related

2D Array Input in same line

How do I take 2D array input on the same line? In C#, Console.ReadLine() only lets us take input one value at a time; I want to take the input as a row.
int[,] arr = new int[m, n];
for (i = 0; i < m; i++)
{
    for (j = 0; j < n; j++)
    {
        arr[i, j] = int.Parse(Console.ReadLine());
    }
}
I want to take input this way
2 2,
10 20,
30 40
Entering a two-dimensional array on the command line is going to be error prone and frustrating for users. But if you MUST do it:
Figure out what symbols will separate values. (Commas, spaces?)
Figure out what symbols will separate array dimensions. (Pipes, perhaps? Whatever you choose, make sure it isn't the same symbol you use for separating values.)
Prompt the user for data and capture it into a string.
Validate the data.
Write a parser that parses your data into a multi-dimensional array.
I'd advise against trying to do this. But I don't dictate your requirements.
var delimiter = ' ';
for (var i = 0; i < n; i++) {
    var row = Console.ReadLine();
    var _arr = row.Trim().Split(delimiter);
    for (var j = 0; j < m; j++) {
        arr[j, i] = int.Parse(_arr[j].Trim());
    }
}
Update:
@mike-hofer, I wholeheartedly agree with the "error prone and frustrating for users" characterization of this kind of input. I assume it is rather for quick and dirty testing. Plus, exactly the same approach applies if you have to read the array from a file line by line, so there is some broader value in this question.
@rahi-ratul75, the code above does not do any error checking. The most likely error will be an entry that won't parse as an integer. You may want, therefore, to use int.TryParse() and, when it returns false, ask the user to re-enter the line (see the sketch after the list below). The main logic, however, is there:
read the line
split it into an array
parse the entry into an integer
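A minimal sketch of that re-prompting loop, assuming m rows of n space-separated values and that m and n are already known:
int[,] arr = new int[m, n];
for (var i = 0; i < m; i++)
{
    int[] rowValues = null;
    while (rowValues == null)
    {
        Console.Write("Row {0}: ", i + 1);
        var parts = (Console.ReadLine() ?? "").Trim().Split(' ');
        var parsed = new int[n];
        bool ok = parts.Length == n;
        for (var j = 0; ok && j < n; j++)
        {
            ok = int.TryParse(parts[j].Trim(), out parsed[j]);
        }
        if (ok)
            rowValues = parsed;                      // accept the row
        else
            Console.WriteLine("Invalid row, please re-enter.");
    }
    for (var j = 0; j < n; j++)
    {
        arr[i, j] = rowValues[j];
    }
}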

Binary search slower, what am I doing wrong?

EDIT: It looks like this is normal behavior, so can anyone recommend a faster way to do these numerous intersections?
My problem is this: I have 8000 lists (strings in each list). For each list (ranging from size 50 to 400), I'm comparing it to every other list and performing a calculation based on the size of the intersection. So I'll do
list1 (intersect) list1 = number
list1 (intersect) list2 = number
list1 (intersect) list888 = number
And I do this for every list. Previously, I had a HashList and my code was essentially this (well, I was actually searching through properties of an object, so I had to modify the code a bit, but it's basically this):
I have my two versions below, but if anyone knows anything faster, please let me know!
Loop through AllLists, getting each list, starting with list1, and then do this:
foreach (List<string> list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the smaller list
    {
        foreach (string word in list1)
        {
            if (block.generator_list.Contains(word))
            {
                // simple integer count
            }
        }
    }
    // a little more code, but the same, looping through the other list if it's smaller/bigger
}
Then I make the lists into regular lists, and applied Sort(), which changed my code to
foreach (List<string> list in AllLists)
{
    if (list1_length < list_length) // just a check so I'm looping through the smaller list
    {
        for (int i = 0; i < list1_length; i++)
        {
            var test = list.BinarySearch(list1[i]);
            if (test > -1)
            {
                // simple integer count
            }
        }
    }
}
The first version takes about 6 seconds, the other one takes more than 20 (I just stop it there because otherwise it would take more than a minute!), and this is for a smallish subset of the data.
I'm sure there's a drastic mistake somewhere, but I can't find it.
Well I have tried three distinct methods for achieving this (assuming I understood the problem correctly). Please note I have used HashSet<int> in order to more easily generate random input.
setting up:
List<HashSet<int>> allSets = new List<HashSet<int>>();
Random rand = new Random();
for (int i = 0; i < 8000; ++i) {
    HashSet<int> ints = new HashSet<int>();
    for (int j = 0; j < rand.Next(50, 400); ++j) {
        ints.Add(rand.Next(0, 1000));
    }
    allSets.Add(ints);
}
the three methods I checked (code is what runs in the inner loop):
the loop:
note that you are getting duplicated results in your code (intersecting set A with set B and later intersecting set B with set A).
It won't affect your performance thanks to the list length check you are doing. But iterating this way is clearer.
for (int i = 0; i < allSets.Count; ++i) {
    for (int j = i + 1; j < allSets.Count; ++j) {
    }
}
first method:
used IEnumerable.Intersect() to get the intersection with the other list and checked IEnumerable.Count() to get the size of the intersection.
var intersect = allSets[i].Intersect(allSets[j]);
count = intersect.Count();
this was the slowest one averaging 177s
second method:
cloned the smaller of the two sets I was intersecting, then used ISet.IntersectWith() and checked the resulting set's Count.
HashSet<int> intersect;
HashSet<int> intersectWith;
if (allSets[i].Count < allSets[j].Count) {
    intersect = new HashSet<int>(allSets[i]);
    intersectWith = allSets[j];
} else {
    intersect = new HashSet<int>(allSets[j]);
    intersectWith = allSets[i];
}
intersect.IntersectWith(intersectWith);
count = intersect.Count;
this one was slightly faster, averaging 154s
third method:
did something very similar to what you did: iterated over the shorter set and checked ISet.Contains on the longer set.
int count;
HashSet<int> loopingSet;
HashSet<int> containsSet;
for (int i = 0; i < allSets.Count; ++i) {
    for (int j = i + 1; j < allSets.Count; ++j) {
        count = 0;
        if (allSets[i].Count < allSets[j].Count) {
            loopingSet = allSets[i];
            containsSet = allSets[j];
        } else {
            loopingSet = allSets[j];
            containsSet = allSets[i];
        }
        foreach (int k in loopingSet) {
            if (containsSet.Contains(k)) {
                ++count;
            }
        }
    }
}
this method was by far the fastest (as expected), averaging 66s
conclusion
the method you're using is the fastest of these three. I certainly can't think of a faster single threaded way to do this. Perhaps there is a better concurrent solution.
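Since each pair's count is independent of the others, the outer loop also parallelizes naturally if a single thread still isn't fast enough. A rough sketch using Parallel.For (assumes using System.Threading.Tasks;, the allSets list from the setup above, and that you want to keep the counts rather than just compute them):
int n = allSets.Count;
var counts = new int[n][];   // counts[i][j - i - 1] = size of the intersection of allSets[i] and allSets[j]
Parallel.For(0, n, i => {
    counts[i] = new int[n - i - 1];
    for (int j = i + 1; j < n; ++j) {
        // iterate the smaller set, probe the larger one (reads only, so this is safe to do in parallel)
        var smaller = allSets[i].Count < allSets[j].Count ? allSets[i] : allSets[j];
        var larger = ReferenceEquals(smaller, allSets[i]) ? allSets[j] : allSets[i];
        int c = 0;
        foreach (int k in smaller) {
            if (larger.Contains(k)) {
                ++c;
            }
        }
        counts[i][j - i - 1] = c;
    }
});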
I've found that one of the most important considerations in iterating/searching any kind of collection is to choose the collection type very carefully. To iterate through a normal collection for your purposes will not be the most optimal. Try using something like:
System.Collections.Generic.HashSet<T>
Using the Contains() method while iterating over the shorter list of two (as you mentioned you're already doing) should give close to O(1) performance, the same as key lookups in the generic Dictionary type.
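For example, if list is one of the other lists and list1 is the one being iterated (as in the question), a minimal sketch would be:
// Build a HashSet once per list so each Contains lookup is ~O(1) instead of O(n).
var listAsSet = new HashSet<string>(list);
int count = 0;
foreach (string word in list1)
{
    if (listAsSet.Contains(word))
    {
        count++;
    }
}
In practice you would convert each of the 8000 lists to a HashSet once, up front, rather than rebuilding it inside the pairwise loop.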

Weighted Permutations Algorithm

Team:
I am building a business rules engine that is contextually aware -- but it's weighted -- or in other words each business rule has a level of granularity defined by the segments of a key. The segments are not combinatorial in that they cannot be weighted in any order, but rather permutable like a combination lock (interestingly enough improperly named but widely accepted).
However, to reduce the amount of code that's necessary to provide the business rules, we are only building exclusion files, meaning that each segment could end up with a specific key value or ALL.
So, now that we have an abstract background, let's take a concrete example. The defined segments are as follows:
Line of Business (LOB)
Company
State
Now, let's assume for this example that the LOB is ABC, Company is G and State is WY. If you break that down I should get the following permutations:
ABC_G_WY
ABC_G_ALL
ABC_ALL_WY
ABC_ALL_ALL
ALL_G_WY
ALL_G_ALL
ALL_ALL_WY
ALL_ALL_ALL
However, I need an algorithm to solve that problem. The segments must also be returned in the aforementioned order, because you must always find the most specific rule first.
I look forward to your responses and thank you all in advance!
public static void Main(string[] args)
{
    List<string> inputValues = new List<string>() { "ABC", "G", "WY" };
    List<string> results = new List<string>();
    int permutations = (int)Math.Pow(2.0, (double)inputValues.Count);
    for (int i = 0; i < permutations; i++)
    {
        int mask = 1;
        Stack<string> lineValues = new Stack<string>();
        for (int j = inputValues.Count - 1; j >= 0; j--, mask <<= 1)
        {
            if ((i & mask) == 0)
            {
                lineValues.Push(inputValues[j]);
            }
            else
            {
                lineValues.Push("ALL");
            }
        }
        results.Add(string.Join("_", lineValues.ToArray())); // ToArray can go away in 4.0(?) I've been told. I'm still on 3.5
    }
    foreach (string s in results)
    {
        Console.WriteLine(s);
    }
    Console.WriteLine("Press any key to exit...");
    Console.ReadKey(true);
}
If I understand the question right, you should:
- Generate all binary strings of length N (there will be 2^N of them).
- Sort them by the number of bits set.
- Generate the rules: a rule has 'ALL' in position i if bit number i in the corresponding binary string is set.
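A small sketch of that approach (assumes using System.Linq;). Note that ordering by the number of set bits groups the rules by how many segments are wildcarded, which is close to, but not exactly, the order listed in the question:
List<string> inputValues = new List<string>() { "ABC", "G", "WY" };
int n = inputValues.Count;

// Every mask from 0 to 2^N - 1; bit i (counted from the left) set means segment i becomes "ALL".
var masks = Enumerable.Range(0, 1 << n)
    .OrderBy(m => Convert.ToString(m, 2).Count(c => c == '1'))   // fewest wildcards first
    .ThenBy(m => m);

foreach (int mask in masks)
{
    var segments = inputValues.Select((value, i) => (mask & (1 << (n - 1 - i))) != 0 ? "ALL" : value);
    Console.WriteLine(string.Join("_", segments.ToArray()));
}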

Most efficient way to determine first character in string?

Which of these methods are the most efficient one or is there a better way to do it?
this.returnList[i].Title[0].ToString()
or
this.returnList[i].Title.Substring(0, 1)
They're both very fast:
Char Index
var sample = "sample";
var clock = new Stopwatch();
for (var i = 0; i < 10; i++)
{
    clock.Start();
    for (var j = 0; j < 10000000; j++)
    {
        var first = sample[0].ToString();
    }
    clock.Stop();
    Console.Write(clock.Elapsed);
    clock.Reset();
}
// Results
00:00:00.2012243
00:00:00.2207168
00:00:00.2184807
00:00:00.2258847
00:00:00.2296456
00:00:00.2261465
00:00:00.2120131
00:00:00.2221702
00:00:00.2346083
00:00:00.2330840
Substring
var sample = "sample";
var clock = new Stopwatch();
for (var i = 0; i < 10; i++)
{
    clock.Start();
    for (var j = 0; j < 10000000; j++)
    {
        var first = sample.Substring(0, 1);
    }
    clock.Stop();
    Console.Write(clock.Elapsed);
    clock.Reset();
}
// Results
00:00:00.3268155
00:00:00.3337077
00:00:00.3439908
00:00:00.3273090
00:00:00.3380794
00:00:00.3400650
00:00:00.3280275
00:00:00.3333719
00:00:00.3295982
00:00:00.3368425
I also agree with BrokenGlass that using the char index is a cleaner way of writing it. Plus if you're doing it 10 trillion times it'll be much faster!
There is a big pitfall in your code that may cause problems, depending on what you mean by "first character" and what returnList contains.
C# strings are UTF-16, which is a variable-length encoding, so if returnList holds arbitrary strings, Title[0] might be only one char of a surrogate pair rather than a complete Unicode code point. If you want to return the first Unicode grapheme of a string:
string s = returnList[i].Title;
if (string.IsNullOrEmpty(s))
    return s;
int charsInGlyph = char.IsSurrogatePair(s, 0) ? 2 : 1;
return s.Substring(0, charsInGlyph);
You can run into the same problems with BOMs, tag characters, and combining characters; these are all valid chars but are not meaningful when displayed to a user on their own.
If you want Unicode code points or graphemes rather than chars, you must work with strings; a Unicode grapheme can be more than one char.
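If what you actually need is the first grapheme (text element) rather than just handling surrogate pairs, System.Globalization.StringInfo can do the splitting for you; a small sketch:
using System.Globalization;

static string FirstGrapheme(string s)
{
    if (string.IsNullOrEmpty(s))
        return s;
    // Returns the complete text element at index 0, including
    // surrogate pairs and any combining characters that follow.
    return StringInfo.GetNextTextElement(s, 0);
}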
I don't think it would matter much efficiency wise, but in my opinion the clearer, more idiomatic and hence more maintainable way of returning the first character is using the index operator:
char c = returnList[i].Title[0];
This assumes of course that there is at least one character; if that's not a given, you have to check for it.
Those should be close to identical in performance.
The expensive part of the operation is to create the string, and there is no more efficient way to do that.
Unless of course you want to pre-create strings for all possible characters and store them in a dictionary, but that would use up a lot of memory for such a trivial task.
returnList[i].Title[0] is much faster, as it does not need to create a new string, only access a char from the original one.
Of course, it will throw an exception if the string is empty, so you should check that first.
As a rule of thumb, never use strings with a fixed length of 1; that's what char is for.
The performance difference is not likely to matter though, but the better readability certainly will.

Search a file for a sequence of bytes (C#)

I'm writing a C# application in which I need to search a file (could be very big) for a sequence of bytes, and I can't use any libraries to do so. So, I need a function that takes a byte array as an argument and returns the position of the byte following the given sequence. The function doesn't have to be fast, it simply has to work. Any help would be greatly appreciated :)
If it doesn't have to be fast you could use this:
int GetPositionAfterMatch(byte[] data, byte[] pattern)
{
    for (int i = 0; i <= data.Length - pattern.Length; i++)
    {
        bool match = true;
        for (int k = 0; k < pattern.Length; k++)
        {
            if (data[i + k] != pattern[k])
            {
                match = false;
                break;
            }
        }
        if (match)
        {
            return i + pattern.Length;
        }
    }
    return -1; // no match found
}
But I really would recommend you use the Knuth-Morris-Pratt algorithm; it's the algorithm mostly used as the basis of IndexOf methods for strings. The algorithm above will perform really slowly, except for small arrays and small patterns.
The straight-forward approach as pointed out by Turrau works, and for your purposes is probably good enough, since you say it doesn't have to be fast - especially since for most practical purposes the algorithm is much faster than the worst case O(n*m). (Depending on your pattern I guess).
For an optimal solution you can also check out the Knuth-Morris-Pratt algorithm, which makes use of partial matches which in the end is O(n+m).
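For reference, here is roughly what a Knuth-Morris-Pratt version with the same contract as the method above could look like (a sketch: it returns the index just past the first match, or -1 if there is none):
static int GetPositionAfterMatchKmp(byte[] data, byte[] pattern)
{
    if (pattern.Length == 0 || data.Length < pattern.Length)
        return -1;

    // failure[k] = length of the longest proper prefix of pattern[0..k]
    // that is also a suffix of it.
    int[] failure = new int[pattern.Length];
    for (int k = 1, len = 0; k < pattern.Length; )
    {
        if (pattern[k] == pattern[len])
            failure[k++] = ++len;
        else if (len > 0)
            len = failure[len - 1];
        else
            failure[k++] = 0;
    }

    // Scan the data, falling back via the failure table on a mismatch.
    for (int i = 0, j = 0; i < data.Length; )
    {
        if (data[i] == pattern[j])
        {
            i++; j++;
            if (j == pattern.Length)
                return i;                // index just past the match
        }
        else if (j > 0)
            j = failure[j - 1];
        else
            i++;
    }
    return -1;
}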
Here's an extract of some code I used to do a Boyer-Moore type search. It's meant to work on pcap files, so it operates record by record, but it should be easy enough to modify to suit searching a long binary file. It's sort of extracted from some test code, so I hope I got everything for you to follow along. Also look up Boyer-Moore searching on Wikipedia, since that is what it's based off of.
int[] badMatch = new int[256];
byte[] pattern; // the pattern we are searching for

// badMatch is an array indexed by every possible byte value (defined as static later).
// We use it as a jump table to know how many characters we can skip comparison on,
// so first, we prefill every possibility with the length of our search string.
for (int i = 0; i < badMatch.Length; i++)
{
    badMatch[i] = pattern.Length;
}

// Now we calculate the individual maximum jump length for each byte that appears in the search string.
for (int i = 0; i < pattern.Length - 1; i++)
{
    badMatch[pattern[i] & 0xff] = pattern.Length - i - 1;
}

// Place the bytes you want to run the search against in the payload variable.
byte[] payload = <bytes>

// Search the packet starting at offset, and try to match the last character.
// (offset, end, cont, i, j and k come from the surrounding record-processing loop.)
// Each time we loop, we increment by whatever our jump value is.
for (i = offset + pattern.Length - 1; i < end && cont; i += badMatch[payload[i] & 0xff])
{
    // if our payload character equals our search string character, continue matching counting backwards
    for (j = pattern.Length - 1, k = i; (j >= 0) && (payload[k] == pattern[j]) && cont; j--)
    {
        k--;
    }
    // if we matched every character, then we have a match; add it to the packet list and exit the search (cont = false)
    if (j == -1)
    {
        // we MATCHED!!!
        //i = end;
        cont = false;
    }
}
