I want to calculate Jaccard similarity on 10 000 texts.
Jaccard similarity is easy to calculate: the size of the intersection divided by the size of the union.
string sTtxt1 = "some text one";
string sTtxt2 = "some text two";
string sTtxt3 = "some text three";
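// Split each text into words so the set operations compare words, not whole strings.
HashSet<string[]> hashText = new HashSet<string[]>();
hashText.Add(sTtxt1.Split(' '));
hashText.Add(sTtxt2.Split(' '));
hashText.Add(sTtxt3.Split(' '));
// The result matrix must be allocated before it is written to.
double[,] dSimilarityValue = new double[hashText.Count, hashText.Count];
for (int i = 0; i < hashText.Count; i++)
{
    dSimilarityValue[i, i] = 1.0; // a text is identical to itself
    for (int j = i + 1; j < hashText.Count; j++)
    {
        dSimilarityValue[i, j] = (double) hashText.ElementAt(i).Intersect(hashText.ElementAt(j)).Count() / (double) hashText.ElementAt(i).Union(hashText.ElementAt(j)).Count();
    }
}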
With .NET 4, what rules should I follow to parallelize this?
Thank you!
Just make the inner loop a Parallel.For (see the Parallel class):
Parallel.For(0, N, i =>
{
// Do Work.
});
// The inner loop runs over j, starting at i + 1, so j is the parallel index.
Parallel.For(i + 1, hashText.Count, j =>
{
    dSimilarityValue[i, j] =
        (double)hashText.ElementAt(i).Intersect(hashText.ElementAt(j)).Count() /
        (double)hashText.ElementAt(i).Union(hashText.ElementAt(j)).Count();
});
I also think it would be better to declare the size of the array when you create it with new.
I don't know what you mean by "rules".
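A fuller sketch of the same idea, under some assumptions that are not in the question: each text is split on spaces into a HashSet<string> once, the outer loop is the one made parallel (so each task gets a whole row of pairs instead of a single pair), and the names JaccardSketch, Jaccard and SimilarityMatrix are made up for illustration. With .NET 4 this could look roughly like:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class JaccardSketch
{
    // Jaccard similarity of two word sets: |A ∩ B| / |A ∪ B|.
    public static double Jaccard(HashSet<string> a, HashSet<string> b)
    {
        // Convention chosen for this sketch: two empty sets count as identical.
        if (a.Count == 0 && b.Count == 0) return 1.0;

        int intersection = 0;
        foreach (string word in a)
        {
            if (b.Contains(word)) intersection++;
        }
        int union = a.Count + b.Count - intersection;
        return (double)intersection / union;
    }

    // Fills a symmetric similarity matrix, parallelizing the outer loop.
    public static double[,] SimilarityMatrix(IList<string> texts)
    {
        int n = texts.Count;

        // Tokenize once, up front, so nothing is recomputed inside the loops.
        var sets = new HashSet<string>[n];
        for (int i = 0; i < n; i++)
        {
            sets[i] = new HashSet<string>(
                texts[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        }

        var result = new double[n, n];
        Parallel.For(0, n, i =>
        {
            result[i, i] = 1.0;
            for (int j = i + 1; j < n; j++)
            {
                double s = Jaccard(sets[i], sets[j]);
                result[i, j] = s; // upper triangle
                result[j, i] = s; // mirrored lower triangle
            }
        });
        return result;
    }
}
```

Each matrix cell is written by exactly one iteration of the outer loop and the sets are only read, so no locking is needed. For the three example texts from the question, the first pair gets 0.5 (2 shared words out of 4 distinct ones).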
I have a float array containing 1M floats
I want to downsample it: out of every 4 floats I want to keep only 1. So I am doing this:
for (int i = 0; i < floatArray.Length; i++) {
    if (i % 4 == 0) {
        resultFloat.Add(floatArray[i]);
    }
}
This works fine, but it takes a long time to run through all the elements. Is there another method that would give better performance?
I can see two factors that might be slowing down performance.
As you have already been offered, you should set the step to 4:
for (int i = 0; i < floatArray.Length; i += 4)
{
resultFloat.Add(floatArray[i]);
}
It looks like resultFloat is a list of floats. I suggest using an array instead of a list, like this:
int m = (floatArray.Length + 3) / 4;
float[] resultFloat = new float[m];
for (int i = 0, k = 0; i < floatArray.Length; i += 4, k++)
{
resultFloat[k] = floatArray[i];
}
Just increment your loop by 4 each iteration instead of by 1:
for(int i = 0; i< floatArray.Length; i+=4)
{
resultFloat.Add(floatArray[i]);
}
If you really have an issue with performance, then you'd be even better off not using a dynamic container for the results, but a statically sized array.
float[] resultFloat = new float[(floatArray.Length + 3) >> 2];
for(int i = 0; i < resultFloat.Length; i++)
resultFloat[i] = floatArray[i << 2];
Usually performance isn't an issue though, and you shouldn't optimize until a profiler has given you proof that you should. In all other cases the more readable code is preferable.
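Just to add another option: you can use Parallel.For instead of a normal for loop. Whether it is actually the fastest depends on the array size and on how little work each element needs, so measure it before relying on it.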
int resultLength = (floatArray.Length + 3) / 4;
var resultFloat = new float[resultLength];
Parallel.For(0, resultLength, i =>
{
resultFloat[i] = floatArray[i * 4];
});
// Sample data: the values 1 to 16.
List<decimal> list = new List<decimal> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
// Where with an index keeps every 4th element; this predicate keeps indices 3, 7, 11 and 15 (the values 4, 8, 12, 16).
var sampleData = list.Where((x, i) => (i + 1) % 4 == 0).ToList();
I have a 2D array for a lottery I am creating. Basically it's a set of 2 integers:
int[,] coupon = new int[rowAmount, numAmount];
Where rowAmount is the number of rows, and numAmount is the number of numbers in each row.
Now I need to select the numbers for each row; however, there must not be duplicates of a number within a row.
for (int r = 0; r < rowAmount; ++r)
{
for (int n = 0; n < numAmount; ++n)
{
userNum = lotRng.Next(1, numAmount * rngMult);
while (COUPON CONTAINS DUPLICATE NUMBER ON SECOND SPOT )
{
userNum = lotRng.Next(1, numAmount * rngMult);
}
coupon[r, n] = userNum;
}
}
My issue is the while part: I cannot figure out how to check whether coupon already contains userNum in the second dimension (the numAmount slot). For lists I used to just call list.Contains(), but that doesn't seem to work here.
Whether it makes sense to optimize performance depends on the size of your array.
If it does, one possibility is to keep the array sorted and use Array.BinarySearch:
https://msdn.microsoft.com/en-us/library/2cy9f6wb(v=vs.110).aspx
There are also a number of ways to optimize the data structure itself.
A solution with an array of lists is one of my favourites for this. It is very similar to my other answer with jagged arrays, but faster, because List<T>.Contains searches the list directly, whereas the LINQ searches call a delegate for every element.
const int rowAmount = 1000;
const int numAmount=1000;
const int rngMult = 10;
Random lotRng = new Random();
var coupon = new List<int>[rowAmount];
int userNum;
for (int r = 0; r < rowAmount; r++)
{
coupon[r]= new List<int>();
for (int n = 0; n < numAmount; ++n)
{
do userNum = lotRng.Next(1, numAmount * rngMult);
while (coupon[r].Contains(userNum));
coupon[r].Add(userNum);
}
}
Of course it would be also possible to use a list of lists (kind of 2D lists), if necessary.
var couponLoL = new List<List<int>>();
The following quick and dirty approach shows one way of copying a 2D array into a list, but I do not recommend it here for several reasons (it runs inside the loop, and it boxes the value types):
var coupon= new int[rowAmount,numAmount];
[..]
do userNum = lotRng.Next(1, numAmount * rngMult);
while (coupon.Cast<int>().ToList().Contains(userNum));
In this special case it makes even less sense, because it would search the whole 2D array for the duplicate value rather than just one row. But it is worth knowing how to convert a 2D array to a 1D sequence (and then to a list).
Solution with jagged arrays: if you want to access rows and columns in C#, a jagged array is very convenient, and unless you care very much about how efficiently the array is stored internally, I strongly recommend jagged arrays for this.
Jagged arrays are simply arrays of arrays.
const int rowAmount = 1000;
const int numAmount=1000;
const int rngMult = 10;
int userNum;
Random lotRng = new Random();
var coupon = new int[rowAmount][];
for (int r = 0; r < rowAmount; r++)
{
coupon[r] = new int[numAmount];
for (int n = 0; n < numAmount; ++n)
{
do userNum = lotRng.Next(1, numAmount * rngMult);
while (Array.Exists(coupon[r], x => x == userNum));
coupon[r][n] = userNum; // jagged arrays are indexed with [r][n], not [r, n]
}
}
The Array.Exists method used above works only on one-dimensional arrays, which is enough here, and it needs no LINQ. The same check with the LINQ method Any:
while (coupon[r].Any(x => x == userNum));
If you had to search for a duplicate value in two dimensions, you would need one more loop again, but that is still one level of nesting less than without these methods.
LINQ is elegant, but normally not the fastest method (although you would need very large arrays, with millions of elements, for that to matter).
For other possibilities of using Linq, look for example here:
How to use LINQ with a 2 dimensional array
Another idea would be to use a one-dimensional array of size rowAmount * numAmount.
It needs a little bit of thinking, but allows the simplest and fastest access for searching.
Solution with plain loops over the 2D array. It is not elegant, but you could refactor the search loop into its own method to make it look better. Note, however, that a linear search like the one shown is not a really fast solution either.
This is only the inner part of the two for loops, not the full code (the full code is in my other answers here):
bool foundDup;
do
{
    userNum = lotRng.Next(1, numAmount * rngMult);
    foundDup = false;
    for (var x = 0; x < coupon.GetLength(1); x++) // Iterate over second dimension only
    {
        if (coupon[r, x] == userNum)
        {
            foundDup = true;
            break;
        }
    }
} while (foundDup);
coupon[r, n] = userNum;
In the special context of this question, you can optimize the loop:
for (var x = 0; x < n; x++)
As you say in your comment you need to check all n fields vs. your new userNum. You can solve this with the following code:
for (int r = 0; r < rowAmount; ++r)
{
    for (int n = 0; n < numAmount; ++n)
    {
        userNum = lotRng.Next(1, numAmount * rngMult);
        for (int x = 0; x < n; x++) // Iterate over the slots already filled in this row
        {
            if (coupon[r, x] == userNum)
            {
                userNum = lotRng.Next(1, numAmount * rngMult);
                x = -1; // New candidate: restart the scan so earlier slots are checked again
            }
        }
        coupon[r, n] = userNum;
    }
}
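The duplicate check itself can also be handed to a HashSet<int> that is reset for every row, which avoids rescanning the row for each candidate. The following is only a sketch under assumptions: the class and method names (CouponSketch, FillCoupon) are hypothetical, and the random range 1 to numAmount * rngMult (exclusive) is taken from the question.

```csharp
using System;
using System.Collections.Generic;

public static class CouponSketch
{
    // Fills a rowAmount x numAmount array so that no number repeats within a row.
    public static int[,] FillCoupon(int rowAmount, int numAmount, int rngMult, Random lotRng)
    {
        int upper = numAmount * rngMult; // exclusive upper bound, as in the question

        // Values 1 .. upper - 1 must be enough to fill one row without repeats,
        // otherwise the do/while below would never terminate.
        if (numAmount > 0 && upper - 1 < numAmount)
            throw new ArgumentException("Not enough distinct values to fill a row.");

        var coupon = new int[rowAmount, numAmount];
        var seen = new HashSet<int>(); // numbers already used in the current row

        for (int r = 0; r < rowAmount; r++)
        {
            seen.Clear();
            for (int n = 0; n < numAmount; n++)
            {
                int userNum;
                do
                {
                    userNum = lotRng.Next(1, upper);
                } while (!seen.Add(userNum)); // Add returns false if the number is already present
                coupon[r, n] = userNum;
            }
        }
        return coupon;
    }
}
```

A call such as CouponSketch.FillCoupon(rowAmount, numAmount, rngMult, lotRng) would then replace the nested loops. The result stays a plain int[,], so the rest of the code does not have to change.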
This is a really easy question, but I can't figure out a way around it. Apparently the almost-ordered shuffle has a bug: it may shuffle a few more numbers than you ask for. The code is rather simple:
public void Section1Task1AlmostOrdered(int arraySize, int percentage)
{
int[] testArray = new int[arraySize];
Console.WriteLine("Ordered List: ");
for (int i = 1; i <= testArray.Length; i++)
{
testArray[i-1] = i;
Console.Write(i + "\t");
}
Console.WriteLine("Almost Ordered List: ");
testArray = shuffler.AlmostOrdered(arraySize, percentage);
for (int i = 0; i < testArray.Length; i++)
{
Console.Write(testArray[i] + "\t");
}
}
The shuffler is this part of the code:
public int[] AlmostOrdered(int n, double p)
{
if (p > 100)
{
throw new InvalidOperationException("Cannot shuffle more than 100% of the numbers");
}
int shuffled = 0;
//Create and Populate an array
int[] array = new int[n];
for(int i = 1; i <= n; i++)
{
array[i-1] = i;
}
//Calculate numbers to shuffle
int numsOutOfPlace = (int) Math.Ceiling(n * (p / 100));
int firstRandomIndex = 0;
int secondRandomIndex = 0;
do
{
firstRandomIndex = this.random.Next(n-1);
// to make sure that the two numbers are not the same
do
{
secondRandomIndex = this.random.Next(n - 1);
} while (firstRandomIndex == secondRandomIndex);
int temp = array[firstRandomIndex];
array[firstRandomIndex] = array[secondRandomIndex];
array[secondRandomIndex] = temp;
shuffled++;
}
while (shuffled < numsOutOfPlace);
return array;
}
When I enter 10 for the array size and 40 for the percentage to be shuffled, it shuffles 5 numbers instead of 4. Is there a way to make this method more accurate?
Likely the problem is with the calculation:
int numsOutOfPlace = (int)Math.Ceiling(n * (p / 100));
So if p=40 and n=10, then in theory you should get 4. But you're dealing with floating point numbers. So if (p/100) returns 0.400000000001, then the result will be 4.000000001, and Math.Ceiling will round that up to 5.
You might want to replace Math.Ceiling with Math.Round and see how that works out.
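If the percentage is always a whole number, the floating-point question can be sidestepped by computing the ceiling with integer arithmetic. This is a sketch only: it assumes an int percentage (the question's method takes a double), and the names ShuffleCountSketch, NumbersToShuffle and CountOutOfPlace are made up for illustration. Note also that the loop in the question counts swaps, and a single swap moves two numbers, so the second helper can be used to check how many numbers a shuffle actually displaced.

```csharp
using System;

public static class ShuffleCountSketch
{
    // Ceiling of n * percentage / 100, using integer arithmetic only.
    public static int NumbersToShuffle(int n, int percentage)
    {
        if (n < 0) throw new ArgumentOutOfRangeException("n");
        if (percentage < 0 || percentage > 100) throw new ArgumentOutOfRangeException("percentage");

        // Adding 99 before the integer division rounds up; long avoids overflow for large n.
        return (int)(((long)n * percentage + 99) / 100);
    }

    // Counts the elements that are no longer at their original position,
    // assuming the array was originally filled with 1..n in order.
    public static int CountOutOfPlace(int[] array)
    {
        int count = 0;
        for (int i = 0; i < array.Length; i++)
        {
            if (array[i] != i + 1) count++;
        }
        return count;
    }
}
```

With the values from the question, NumbersToShuffle(10, 40) evaluates (400 + 99) / 100, which is 4.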
I want to print a multiplication table in C# but it's not aligned!
When the user types a number n in the textbox, it should print an n*n table.
What can I do?
for (int i = 1; i <= Convert.ToInt32(textBox1.Text); i++)
{
for (int j = 1; j <= Convert.ToInt32(textBox1.Text); j++)
{
richTextBox1.Text += Convert.ToString(i * j) + " ";
}
richTextBox1.Text += "\n";
}
Set the font of RichTextBox to the monospaced font Courier New, then add the text to RichTextBox using String.Format and setting alignment for the result of multiplication. Use a positive number to align right and use a negative number to align left:
var n = 5;
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= n; j++)
{
this.richTextBox1.AppendText(string.Format("{0,3}", i * j));
}
this.richTextBox1.AppendText(Environment.NewLine);
}
Instead of hard-coding the width as in {0,3}, you can use the code below to derive the width from the number of characters in the largest value in the table, n*n:
Left Aligned:
string.Format("{0,-" +((n*n).ToString().Length + 1).ToString() +"}", i * j)
Right Aligned:
string.Format("{0," +((n*n).ToString().Length + 1).ToString() +"}", i * j)
If you want to align using spaces, you need to use a monospaced font (like Courier or Consolas). Otherwise you can use tabs, but the numbers themselves won't be aligned this way, and since the numbers in your routine can get considerably big, they may end up wider than the tab spacing, which breaks the alignment.
As a general rule, if you want to align any kind of text box, go with a monospaced font.
You can pad with spaces, using, for example, String.PadLeft or String.PadRight.
This would be as simple as changing:
richTextBox1.Text += Convert.ToString(i * j) + " ";
to:
richTextBox1.Text += Convert.ToString(i * j).PadLeft(5);
However, this assumes that all numbers are at most 5 characters wide.
For your precise routine, you could calculate the maximum width instead, so you'd end up with something like:
// convert your input only once
int myNumber = Convert.ToInt32(textBox1.Text);
// pad with the maximum possible length, plus one space
int padAmount = (myNumber * myNumber).ToString().Length + 1;
for (int i = 1; i <= myNumber; i++)
{
    for (int j = 1; j <= myNumber; j++)
    {
        // pad each product to the width needed to fit all possible numbers
        richTextBox1.Text += (i * j).ToString().PadLeft(padAmount);
    }
    // end the row; use Environment.NewLine instead of `\n`
    richTextBox1.Text += Environment.NewLine;
}
Here's a fiddle. It's (for obvious reasons) for console, so in my fiddle the input number is fixed (it's in myNumber) and the output is just a string (instead of richTextBox1.Text), but it should show how it works.
Although I've made a few changes (I only convert the input number once, and use Environment.NewLine instead of \n), this is still far from optimal: you should build your string with a StringBuilder and assign it once, instead of repeatedly appending to the Text property. I've made a fiddle with this approach, and memory consumption went down by over 30 MB (to just a handful of KB) just by using StringBuilder.
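The StringBuilder variant described above could look roughly like this. It is a sketch, not the code from the fiddle: the class and method names (TableSketch, BuildTable) are hypothetical, and the padding rule is the same one as before (the width of n*n plus one space).

```csharp
using System;
using System.Text;

public static class TableSketch
{
    // Builds an n*n multiplication table with right-aligned columns.
    public static string BuildTable(int n)
    {
        // Width of the largest product, plus one space as a separator.
        int padAmount = (n * n).ToString().Length + 1;

        var sb = new StringBuilder();
        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= n; j++)
            {
                sb.Append((i * j).ToString().PadLeft(padAmount));
            }
            sb.Append(Environment.NewLine); // end of row
        }
        return sb.ToString();
    }
}
```

In the form from the question, the whole table would then be assigned in one step, for example richTextBox1.Text = TableSketch.BuildTable(myNumber);. The control still needs a monospaced font for the columns to line up.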
I think the best solution is to use tabs instead of a special font and padding with spaces.
For tabs you must add a "\t" after every number. The "\t" is interpreted as a tab character.
for (int i = 1; i <= Convert.ToInt32(textBox1.Text); i++)
{
for (int j = 1; j <= Convert.ToInt32(textBox1.Text); j++)
{
richTextBox1.Text += Convert.ToString(i * j) + "\t "; //here at the end
}
richTextBox1.Text += "\n";
}
Keep in mind, though, that a tab stop has a fixed width. If some of your numbers are too long for one tab stop, you need 2 tabs after the short numbers. But for small tables like yours this is no problem.
You can change the position of the tabs with the SelectionTabs property:
this.richTextBox1.SelectionTabs = new[] { 20, 40, 60, 80 };
BTW, you should use a StringBuilder to concatenate multiple string parts into one. It would also be more efficient to parse the number from textBox1 only once rather than on every iteration.
var sb = new StringBuilder(); //In namespace System.Text
var x = Convert.ToInt32(textBox1.Text); //parse only once
for (int i = 1; i <= x; i++)
{
for (int j = 1; j <= x; j++)
{
sb.Append(Convert.ToString(i * j));
sb.Append("\t ");
}
sb.Append("\n");
}
richTextBox1.Text += sb.ToString();
I heard about Counting Sort and wrote my version of it based on what I understood.
public void my_counting_sort(int[] arr)
{
int range = 100;
int[] count = new int[range];
for (int i = 0; i < arr.Length; i++) count[arr[i]]++;
int index = 0;
for (int i = 0; i < count.Length; i++)
{
while (count[i] != 0)
{
arr[index++] = i;
count[i]--;
}
}
}
The above code works perfectly.
However, the algorithm given in CLRS is different. Below is my implementation
public int[] counting_sort(int[] arr)
{
int k = 100;
int[] count = new int[k + 1];
for (int i = 0; i < arr.Length; i++)
count[arr[i]]++;
for (int i = 1; i <= k; i++)
count[i] = count[i] + count[i - 1];
int[] b = new int[arr.Length];
for (int i = arr.Length - 1; i >= 0; i--)
{
b[count[arr[i]]] = arr[i];
count[arr[i]]--;
}
return b;
}
I've directly translated this from pseudocode to C#. The code doesn't work: I get an IndexOutOfRangeException.
So my questions are:
What's wrong with the second piece of code?
What's the difference, algorithm-wise, between my naive implementation and the one given in the book?
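The problem with your version is that it won't work if the elements carry satellite data, because it rebuilds the keys from the counts instead of moving the original elements.
The CLRS version works in that case, and it is stable.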
EDIT:
Here's an implementation of the CLRS version in Python, which sorts pairs (key, value) by key:
def sort(a):
B = 101
count = [0] * B
for (k, v) in a:
count[k] += 1
for i in range(1, B):
count[i] += count[i-1]
b = [None] * len(a)
for i in range(len(a) - 1, -1, -1):
(k, v) = a[i]
count[k] -= 1
b[count[k]] = a[i]
return b
>>> print(sort([(3,'b'),(2,'a'),(3,'l'),(1,'s'),(1,'t'),(3,'e')]))
[(1, 's'), (1, 't'), (2, 'a'), (3, 'b'), (3, 'l'), (3, 'e')]
It should be
b[count[arr[i]]-1] = arr[i];
I'll leave it to you to track down why ;-).
I don't think they perform any differently. The second just pushes the correlation of counts out of the loop so that it's simplified a bit within the final loop. That's not necessary as far as I'm concerned. Your way is just as straightforward and probably more readable. In fact (I don't know about C# since I'm a Java guy) I would expect that you could replace that inner while-loop with a library array fill; something like this:
for (int i = 0; i < count.Length; i++)
{
arrayFill(arr, index, count[i], i);
index += count[i];
}
In Java the method is java.util.Arrays.fill(...).
The problem is that you have hard-coded the length of the count array to 100. Its length should be m + 1, where m is the maximum element of the original array. This is the main reason to consider counting sort in the first place: if you know that all elements of the array are smaller than some constant, it works great.
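Putting the two fixes from the answers together, a corrected C# version might look like the sketch below. It assumes non-negative integer keys, sizes the count array from the largest element instead of a hard-coded 100, and decrements the count before using it as an index (the pseudocode in the book uses 1-based arrays). The class name CountingSortSketch is made up for illustration.

```csharp
using System;

public static class CountingSortSketch
{
    // Stable counting sort for non-negative integers; returns a new sorted array.
    public static int[] CountingSort(int[] arr)
    {
        if (arr.Length == 0) return new int[0];

        // Find the largest key so the count array is exactly big enough.
        int k = arr[0];
        foreach (int value in arr)
        {
            if (value < 0) throw new ArgumentException("Keys must be non-negative.");
            if (value > k) k = value;
        }

        int[] count = new int[k + 1];
        foreach (int value in arr) count[value]++;

        // Prefix sums: count[v] becomes the number of elements <= v.
        for (int i = 1; i <= k; i++) count[i] += count[i - 1];

        // Walk the input backwards so equal keys keep their relative order.
        int[] b = new int[arr.Length];
        for (int i = arr.Length - 1; i >= 0; i--)
        {
            count[arr[i]]--;           // turn the 1-based rank into a 0-based index
            b[count[arr[i]]] = arr[i];
        }
        return b;
    }
}
```

Decrementing first is equivalent to writing b[count[arr[i]] - 1] = arr[i]; and then decrementing, as suggested above.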