How to handle arrays of extremely large strings in C#

I'm implementing the Floyd-Warshall algorithm, as can be found here.
I'm not just interested in the shortest distance between nodes, but also in the path corresponding to that distance.
To do this, I have modified the algorithm as follows:
double[,] dist = new double[V, V]; // existing line
string[,] connections = new string[V, V]; // new line, needed for remembering the path
...
for (i = 0; i < V; i++)
{
    for (j = 0; j < V; j++)
    {
        dist[i, j] = graph[i, j];
        connections[i, j] = $"({i},{j})"; // added: initialisation of "connections"
    }
}
...
if (dist[i, k] + dist[k, j] < dist[i, j])
{
    dist[i, j] = dist[i, k] + dist[k, j];
    connections[i, j] = connections[i, k] + "-" + connections[k, j]; // Added for remembering shortest path
}
I'm running this algorithm on a snake-like list of one million locations, simply added one after the other.
As a result, my connections array looks as follows:
[0, 0] "(0,0)"
[0, 1] "(0,1)"
[0, 2] "(0,1)-(1,2)"
[0, 3] "(0,1)-(1,2)-(2,3)"
[0, 4] "(0,1)-(1,2)-(2,3)-(3,4)"
[0, 5] "(0,1)-(1,2)-(2,3)-(3,4)-(4,5)"
...
[0, 787] "(0,1)-(1,2)-...(786,787)" // length of +7000 characters
...
... and this is where it stands at the moment of my OutOfMemoryException (what a surprise) :-)
I would like to avoid that OutOfMemoryException, and I've started thinking about different techniques:
Forcing garbage collection once a variable is no longer needed (in case this is not already done)
"Swapping" very large objects between memory and hard disk, in order to free up memory.
I believe the second option is the most realistic (don't kill me if I'm wrong) :-)
Is there a technique in C# which makes that possible?
Oh, and if you react with "You're an idiot! There's a far better way to keep the shortest paths in Floyd-Warshall!", don't hesitate to tell me how :-)
Thanks in advance
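A common alternative, not part of the original post, is to keep a next-hop matrix: next[i, j] stores the index of the first node after i on the best path found so far to j, and the full path is rebuilt only when you ask for it. Each cell is then a single int instead of a string that grows with the path length. Be aware that this alone does not make V = 1,000,000 feasible: any V×V matrix has 10^12 cells, so even the existing dist array of doubles would need about 8 TB. For a chain-like graph that large, running a single-source algorithm (such as Dijkstra) per query is likely the more practical route. The sketch below uses C# 9+ top-level statements; FloydWarshall, GetPath and the 4-node test graph are illustrative names and data of my own, and it assumes graph[i, i] == 0 and double.PositiveInfinity for missing edges.

```csharp
using System;
using System.Collections.Generic;

// Floyd-Warshall that records the next hop instead of a path string.
// Assumes graph[i, i] == 0 and double.PositiveInfinity where there is no edge.
static (double[,] Dist, int[,] Next) FloydWarshall(double[,] graph)
{
    int v = graph.GetLength(0);
    var dist = new double[v, v];
    var next = new int[v, v]; // first node after i on the best known path to j, or -1
    for (int i = 0; i < v; i++)
    {
        for (int j = 0; j < v; j++)
        {
            dist[i, j] = graph[i, j];
            next[i, j] = double.IsPositiveInfinity(graph[i, j]) ? -1 : j;
        }
    }
    for (int k = 0; k < v; k++)
        for (int i = 0; i < v; i++)
            for (int j = 0; j < v; j++)
                if (dist[i, k] + dist[k, j] < dist[i, j])
                {
                    dist[i, j] = dist[i, k] + dist[k, j];
                    next[i, j] = next[i, k]; // head toward k first
                }
    return (dist, next);
}

// Rebuilds the path current -> target on demand; empty list if target is unreachable.
static List<int> GetPath(int[,] next, int current, int target)
{
    var path = new List<int>();
    if (next[current, target] == -1) return path;
    path.Add(current);
    while (current != target)
    {
        current = next[current, target];
        path.Add(current);
    }
    return path;
}

// Usage: a 4-node chain 0-1-2-3 with unit weights (made-up data, a tiny "snake").
double inf = double.PositiveInfinity;
double[,] chain =
{
    { 0, 1, inf, inf },
    { 1, 0, 1, inf },
    { inf, 1, 0, 1 },
    { inf, inf, 1, 0 },
};
var (distances, nextHop) = FloydWarshall(chain);
Console.WriteLine(string.Join("-", GetPath(nextHop, 0, 3))); // prints 0-1-2-3
Console.WriteLine(distances[0, 3]);                           // prints 3
```

The relaxation test is the same comparison as in the question; only the bookkeeping changes from concatenating connections[i, k] and connections[k, j] to copying a single index, next[i, j] = next[i, k].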

Related

Sorting Loop Not Sorting First Variable

This is my first Stack Overflow post, so sorry if it's not in the right format. In any case, the following is in C#, and I'm using Visual Studio 2019 with .NET Core 3.1.
Goal: Sorting an Array of integers in ascending order ex. [3, 2, 5, 1] -> [1, 2, 3, 5]
Problem: Method sorts everything but the first element ex. [3, 2, 5, 1] -> [3, 1, 2, 5]
Code:
for (int i = 0; i < array.Length; i++)
{
    if (i != array.Length - 1 && array[i] > array[i + 1])
    {
        int lowerValue = array[i + 1];
        int higherValue = array[i];
        array[i] = lowerValue;
        array[i + 1] = higherValue;
        i = 0;
    }
}
I've put in Console.WriteLine() statements (one before the for loop to show the starting array, and one in the if statement to show how the array is updated when a change happens) to see how it's adjusting the array as it goes along. And yes, I know I could use the debugger, but I don't currently have any calls to the method this loop is in. The output regularly shows it running through an iteration without changing anything.
Also, if anyone could tell me the time complexity of this, that would be great. My initial guess is that it's linear, but I feel like the "i = 0" statement and the way it swaps values make it exponential. Thanks!

When I run your code, I get [2, 1, 3, 5].
As @Flydog57 says, it would be useful to go through every single step to see where it doesn't behave as desired. The code looks a bit like a bubble sort without the inner loop. If I see it correctly, you are missing the repeated passes over the array. Have a look at bubble sort.
My suggestion for the sorting looks like this:
using System.Linq; // needed for OrderBy and ToArray
int[] array = { 3, 2, 5, 1 };
var sortedArray = array.OrderBy(p => p).ToArray(); // { 1, 2, 3, 5 }
Regarding O notation, I am unfortunately a bit rusty and can't help. But for me it looks like O(n).

Your algorithm is not a variant of bubble sort: it resets the loop counter whenever it finds two neighbours in descending order. In bubble sort, after every pass we are sure that the global max/min is in its final place, but your algorithm can't be sure of that until the execution for the last i.
In your algorithm, the left part of the array is always partially ordered.
First, the problem with your code is that after you reset the counter to 0, the loop's i++ immediately increments it to 1, so the scan never restarts at index 0 and the first pair is never compared again. If you set it to -1 instead, the loop increments it to 0, which is the desired behavior, and the array gets sorted. The if block should look like this:
if (i != array.Length - 1 && array[i] > array[i + 1])
{
    int lowerValue = array[i + 1];
    int higherValue = array[i];
    array[i] = lowerValue;
    array[i + 1] = higherValue;
    i = -1;
}
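The order of the algorithm: the worst case happens when the array is in descending order. Then every element has to travel all the way to the front, one swap at a time, and after each swap the scan restarts from index 0. An element that starts at index p needs p swaps, and the scans leading to them cost p, p - 1, ..., 1 comparisons, i.e. p(p+1)/2 in total. Summing over all elements, plus the final pass of n - 1 comparisons, gives:
1 + 3 + 6 + ... + (n-1)n/2 + (n-1) = (n-1)n(n+1)/6 + (n-1) = O(n^3)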
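To check the growth rate empirically, here is a small sketch (C# 9+ top-level statements; the SortAndCount name and the test sizes are my own) that runs the corrected loop with i = -1 on arrays in descending order and counts how often array[i] > array[i + 1] is evaluated.

```csharp
using System;
using System.Linq;

// Runs the "reset to -1" sort in place and returns the number of
// array[i] > array[i + 1] comparisons it performed.
static long SortAndCount(int[] array)
{
    long comparisons = 0;
    for (int i = 0; i < array.Length; i++)
    {
        if (i != array.Length - 1)
        {
            comparisons++;
            if (array[i] > array[i + 1])
            {
                (array[i], array[i + 1]) = (array[i + 1], array[i]); // swap neighbours
                i = -1; // the loop's i++ brings this back to 0
            }
        }
    }
    return comparisons;
}

foreach (int n in new[] { 10, 100, 1000 })
{
    int[] descending = Enumerable.Range(1, n).Reverse().ToArray(); // worst case
    Console.WriteLine($"n = {n}: {SortAndCount(descending)} comparisons");
}
```

For n = 10 it reports 174 comparisons and for n = 100 it reports 166,749, matching (n-1)n(n+1)/6 + (n-1): a tenfold larger input costs roughly a thousand times more work, which is what cubic growth looks like.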

I have a for loop that assigns characters to a char array, error: System.IndexOutOfRangeException

I'm making a program that has a few simple games like Tic-Tac-Toe and so on. I made a char array that contains 'x' for cross and 'o' for circle. I wrote a for loop just to assign placeholder characters. When I run it, it underlines the curly bracket at the end of the loop and says "System.IndexOutOfRangeException". What confuses me is how a for loop can be out of bounds.
I tried changing "i<3" to "i<2".
Even if that had worked, I would ask anyway, because I wouldn't understand why it worked.
char[,] CoC = new char[2, 2];
for (int i = 0; i < 3; i++)
{
    CoC[i, 0] = 'a';
    CoC[i, 1] = 'b';
    CoC[i, 2] = 'c';
}
"CoC" stands for CrossOrCircle, "a, b, c" are just placeholders

CoC is created as new char[2, 2], meaning it has only two elements in each dimension (valid indices 0 and 1), while your loop uses indices up to 2. You'll need to initialize it to a larger size to accommodate the loop you have there:
char[,] CoC = new char[3, 3];
// Here -------^--^

char[,] CoC = new char[3, 3];
will work, because your loop needs a 3x3 array and not a 2x2 one:
it writes indices 0 to 2 in both dimensions.
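One way to keep the loop and the allocation from drifting apart again is to take the loop bounds from the array itself with GetLength. A minimal sketch (C# 9+ top-level statements; board and placeholders are illustrative names, not from the question):

```csharp
using System;

// Placeholder characters per column, as in the question.
char[] placeholders = { 'a', 'b', 'c' };
// 3 rows, and exactly as many columns as there are placeholders.
char[,] board = new char[3, placeholders.Length];

// GetLength(0) is the number of rows, GetLength(1) the number of columns,
// so these bounds always match the array that was actually allocated.
for (int row = 0; row < board.GetLength(0); row++)
{
    for (int col = 0; col < board.GetLength(1); col++)
    {
        board[row, col] = placeholders[col];
    }
}

Console.WriteLine(board[2, 2]); // prints c
```

If the array size changes later, the loop adapts automatically instead of throwing IndexOutOfRangeException.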

Faster conversion of BGR packed to RGB planar pixel format

From an SDK I get images that have the pixel format BGR packed, i.e. BGRBGRBGR. For another application, I need to convert this format to RGB planar RRRGGGBBB.
I am using C# .NET 4.5 32-bit, and the data is in byte arrays of the same size.
Right now I am iterating through the source array and assigning the BGR values to their appropriate places in the target array, but that takes too long (180 ms for a 1.3-megapixel image). The processor the code runs on supports MMX, SSE, SSE2, SSE3 and SSSE3.
Is there a way to speed up the conversion?
edit: Here is the conversion I am using:
// the array with the BGRBGRBGR pixel data
byte[] source;
// the array with the RRRGGGBBB pixel data
byte[] result;
// the amount of pixels in one channel, width*height
int imageSize;
for (int i = 0; i < source.Length; i += 3)
{
    result[i/3] = source[i + 2]; // R
    result[i/3 + imageSize] = source[i + 1]; // G
    result[i/3 + imageSize * 2] = source[i]; // B
}
edit: I tried splitting the access to the source array into three loops, one for each channel, but it didn't really help. So I'm open to suggestions.
for (int i = 0; i < source.Length; i += 3)
{
    result[i/3] = source[i + 2]; // R
}
for (int i = 0; i < source.Length; i += 3)
{
    result[i/3 + imageSize] = source[i + 1]; // G
}
for (int i = 0; i < source.Length; i += 3)
{
    result[i/3 + imageSize * 2] = source[i]; // B
}
Bump, because the question is still unanswered. Any advice is very much appreciated!
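For comparison, here is a scalar sketch of the same conversion that walks pixel by pixel with a separate source index, so the loop body has no division by 3. Whether this is measurably faster than the original loop depends on the JIT and the hardware, so treat it as something to benchmark rather than a guaranteed win. BgrPackedToRgbPlanar is a hypothetical name, and the sketch assumes source and result both have length 3 * imageSize. The method body only uses features available on .NET 4.5; the top-level usage lines need C# 9.

```csharp
using System;

// Converts BGRBGR... (packed) to RRR...GGG...BBB... (planar).
// Assumes source.Length == result.Length == 3 * imageSize.
static void BgrPackedToRgbPlanar(byte[] source, byte[] result, int imageSize)
{
    int gOffset = imageSize;     // start of the G plane
    int bOffset = 2 * imageSize; // start of the B plane
    for (int p = 0, s = 0; p < imageSize; p++, s += 3)
    {
        result[p] = source[s + 2];           // R
        result[gOffset + p] = source[s + 1]; // G
        result[bOffset + p] = source[s];     // B
    }
}

// Two pixels with (B, G, R) = (1, 2, 3) and (4, 5, 6).
byte[] src = { 1, 2, 3, 4, 5, 6 };
byte[] dst = new byte[src.Length];
BgrPackedToRgbPlanar(src, dst, 2);
Console.WriteLine(string.Join(",", dst)); // prints 3,6,2,5,1,4
```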

You could try SSSE3's PSHUFB (Packed Shuffle Bytes) instruction. Make sure you are using aligned memory reads/writes. You will have to do something tricky to deal with the last dangling B value in each XMMWORD-sized block. It might be tough to get right, but it should be a huge speedup. You could also look for library code. I'm guessing you will need to make a C or C++ DLL and use P/Invoke, but maybe there is a way to use SSE instructions from C# that I don't know about.
Edit: this question is about a slightly different problem (ARGB to BGR), but the techniques used are similar to what you need.

The Basler company has an SDK for its cameras, called Basler Pylon, which works on Windows and Linux.
This SDK has APIs for C++, C# and more.
It has an image conversion class PixelDataConverter, which seems to be what you need.

Matching Two Lists

I have a table that contains human entered observation data. There is a column that is supposed to correspond to another list; the human entered value should identically match that in a sort of master list of possibilities.
The problem, however, is that the human-entered data is abbreviated, misspelled, etc. Is there a mechanism that does some sort of similarity search to find what the human-entered data should actually be?
Examples (human entered -> should be):
Carbon-12 -> Carbon(12)
South Korea -> Republic of Korea
farenheit -> Fahrenheit
The only idea I really have is to break the human-entered data into 3-character sections and check whether they are contained in the entries of the Should Be list, then pick the highest-rated entry. Later, it could present the user with a choice of the top 10 or so.
I'm not necessarily interested in an absolutely perfect solution; if it got about 70% right, it would save A LOT of time going through the list.
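The 3-character idea above is essentially trigram matching. A minimal sketch, assuming Jaccard similarity (shared trigrams divided by all distinct trigrams) as the score; Trigrams, TrigramSimilarity and the three-entry master list are illustrative names and data (C# 9+ top-level statements):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Splits a string into overlapping, lower-cased 3-character pieces.
static HashSet<string> Trigrams(string s)
{
    s = s.ToLowerInvariant();
    var set = new HashSet<string>();
    for (int i = 0; i + 3 <= s.Length; i++)
        set.Add(s.Substring(i, 3));
    return set;
}

// Jaccard similarity of the trigram sets: 0 = nothing shared, 1 = same set.
static double TrigramSimilarity(string a, string b)
{
    var ta = Trigrams(a);
    var tb = Trigrams(b);
    if (ta.Count == 0 && tb.Count == 0) // both shorter than 3 characters
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase) ? 1.0 : 0.0;
    var common = new HashSet<string>(ta);
    common.IntersectWith(tb);
    int shared = common.Count;
    return (double)shared / (ta.Count + tb.Count - shared);
}

string[] master = { "Carbon(12)", "Republic of Korea", "Fahrenheit" };
foreach (string entered in new[] { "Carbon-12", "farenheit" })
{
    // Rank the whole master list and keep the best candidates (here: top 2).
    var top = master.OrderByDescending(m => TrigramSimilarity(entered, m)).Take(2);
    Console.WriteLine($"{entered}: {string.Join(" | ", top)}");
}
```

With this scoring, "farenheit" vs "Fahrenheit" shares 5 of 10 distinct trigrams (score 0.5), while synonym cases such as "South Korea" vs "Republic of Korea" score lower, so a threshold combined with a manual review of the top candidates is probably needed.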

One option is to look for a small Levenshtein distance between two strings rather than requiring an exact match. This would help find matches where there are minor spelling differences or typos.
Another option is to normalize the strings before comparing them. The normalization techniques that make sense depend on your specific application but it could for example involve:
Removing all punctuation.
Converting UK spellings to US spellings.
Using the scientific name for a substance instead of its common names.
etc.
You can then compare the normalized forms of the members of each list instead of the original forms. You may also want to consider using a case-insensitive comparison instead of a case-sensitive comparison.
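A minimal sketch of the punctuation and case normalization described above (C# 9+ top-level statements). The Normalize helper and the sample master list are illustrative; UK/US spellings and synonyms such as South Korea / Republic of Korea would need a hand-made lookup table on top of this.

```csharp
using System;
using System.Linq;
using System.Text;

// Keeps only letters and digits, lower-cased, so punctuation, spacing and case
// differences disappear: "Carbon-12", "Carbon (12)" and "carbon(12)" all become "carbon12".
static string Normalize(string s)
{
    var sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if (char.IsLetterOrDigit(c))
            sb.Append(char.ToLowerInvariant(c));
    }
    return sb.ToString();
}

string[] master = { "Carbon(12)", "Republic of Korea", "Fahrenheit" };
// Index the master list by normalized form (assumes no two entries share a key).
var byNormalized = master.ToDictionary(x => Normalize(x));

foreach (string entered in new[] { "Carbon-12", "carbon ( 12 )", "South Korea" })
{
    string match = byNormalized.TryGetValue(Normalize(entered), out string found) ? found : "(no match)";
    Console.WriteLine($"{entered} -> {match}");
}
```

This prints Carbon-12 -> Carbon(12), carbon ( 12 ) -> Carbon(12) and South Korea -> (no match), which shows both what normalization buys you and where a synonym table is still needed.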

You can try to calculate the similarity of two strings using Levenshtein distance:
private static int CalcLevensteinDistance(string left, string right)
{
    if (left == right)
        return 0;
    int[,] matrix = new int[left.Length + 1, right.Length + 1];
    for (int i = 0; i <= left.Length; i++)
        // delete
        matrix[i, 0] = i;
    for (int j = 0; j <= right.Length; j++)
        // insert
        matrix[0, j] = j;
    for (int i = 0; i < left.Length; i++)
    {
        for (int j = 0; j < right.Length; j++)
        {
            if (left[i] == right[j])
                matrix[i + 1, j + 1] = matrix[i, j];
            else
            {
                // deletion or insertion
                matrix[i + 1, j + 1] = System.Math.Min(matrix[i, j + 1] + 1, matrix[i + 1, j] + 1);
                // substitution
                matrix[i + 1, j + 1] = System.Math.Min(matrix[i + 1, j + 1], matrix[i, j] + 1);
            }
        }
    }
    return matrix[left.Length, right.Length];
}
Then calculate the similarity between two strings as a fraction from 0 (completely different) to 1 (identical):
public static double CalcSimilarity(string left, string right, bool ignoreCase)
{
    if (ignoreCase)
    {
        left = left.ToLowerInvariant();
        right = right.ToLowerInvariant();
    }
    double distance = CalcLevensteinDistance(left, right);
    if (distance == 0.0)
        return 1.0;
    // Normalize by the longer string so the result lies between 0 and 1
    double longestStringSize = System.Math.Max(left.Length, right.Length);
    double percent = distance / longestStringSize;
    return 1.0 - percent;
}

Have you considered using one (or several) drop-down lists to enforce correct input? In my opinion, that would be a better approach in most cases when considering usability and user-friendliness. It would also make processing this input a lot easier. With free-text input, you'd probably get many different ways to write the same thing, and you'll "never" be able to anticipate every way of writing anything complex.
Example: As you wrote, "carbon-12", "Carbon 12", "Carbon ( 12 )", "Carbon (12)", "Carbon - 12", etc. For this alone, the possibilities are nearly endless. When you also consider things like "South Korea" vs "Republic of Korea", where the mapping is not 1:1 (what about North Korea? Or just "Korea"?), this gets even harder.
Of course, I know nothing about your application and might be completely wrong. But usually, when you expect complex values in a certain format, a drop-down list will in many cases both make your job as a developer easier and give the end user a better experience.

C# search with resemblance / affinity

Suppose we have an IEnumerable collection with 20,000 Person objects.
Then suppose we have created another Person object.
We want to list all Persons that resemble this Person.
That means, for instance: if the Surname affinity is more than 90%, add that Person to the list.
e.g. ("Andrew" vs "Andrw")
What is the most effective / quick way of doing this?
Iterating through the collection and comparing char by char with affinity determination? OR?
Any ideas?
Thank you!

You may be interested in:
Levenshtein Distance Algorithm
Peter Norvig - How to Write a Spelling Corrector
(you'll be interested in the part where he compares a word against a collection of existing words)

Depending on how often you'll need to do this search, the brute force iterate and compare method might be fast enough. Twenty thousand records really isn't all that much and unless the number of requests is large your performance may be acceptable.
That said, you'll have to implement the comparison logic yourself, and if you want a large degree of flexibility (or if you find you need to work on performance), you might want to look at something like Lucene.Net. Most of the text search engines I've seen and worked with have been more file-based, but I think you can index in-memory objects as well (though I'm not sure about that).
Good luck!

I'm not sure whether you're asking for help writing the search given an existing affinity function, or for help writing the affinity function itself. So for the moment I'll assume you're completely lost.
Given that assumption, you'll notice that I divided the problem into two pieces, and that's what you need to do as well. You need to write a function that takes two string inputs and returns a boolean indicating whether the inputs are sufficiently similar. Then you need a separate search that accepts a delegate matching any function with that signature.
The basic signature for your affinity function might look like this:
bool IsAffinityMatch(string p1, string p2)
And then your search would look like this:
MyPersonCollection.Where(p => IsAffinityMatch(p.Surname, OtherPerson.Surname));
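A minimal sketch of such an affinity function, assuming similarity is defined as 1 - distance / (length of the longer string) over a Levenshtein distance. The extra threshold parameter, the case-insensitive comparison and the tuples standing in for Person objects are my own choices, not part of the question (C# 9+ top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Levenshtein distance using two rows instead of a full matrix.
static int Levenshtein(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) prev[j] = j; // j insertions
    for (int i = 1; i <= a.Length; i++)
    {
        curr[0] = i; // i deletions
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
        }
        (prev, curr) = (curr, prev); // the row just filled becomes "previous"
    }
    return prev[b.Length];
}

// Hypothetical variant of IsAffinityMatch with an explicit threshold (0..1).
static bool IsAffinityMatch(string p1, string p2, double threshold)
{
    string x = p1.ToUpperInvariant(), y = p2.ToUpperInvariant();
    int longer = Math.Max(x.Length, y.Length);
    if (longer == 0) return true; // two empty strings
    double similarity = 1.0 - (double)Levenshtein(x, y) / longer;
    return similarity >= threshold;
}

// Tuples stand in for Person objects (made-up data).
var people = new List<(string FirstName, string Surname)>
{
    ("Ann", "Andrew"), ("Bob", "Andrews"), ("Cy", "Anderson"),
};
var matches = people.Where(p => IsAffinityMatch(p.Surname, "Andrw", 0.8)).ToList();
Console.WriteLine(string.Join(", ", matches.Select(p => p.Surname))); // prints Andrew
```

Note that with this metric "Andrew" vs "Andrw" scores 1 - 1/6 ≈ 0.83, so a strict 90% cutoff would reject the question's own example; the threshold has to be tuned to whichever metric you choose.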

Here is the source code of my affinity method:
/// <summary>
/// Compute the Levenshtein distance, ignoring case and leading/trailing whitespace
/// </summary>
/// <param name="s">String 1</param>
/// <param name="t">String 2</param>
/// <returns>Distance between the two strings.
/// The larger the number, the bigger the difference.
/// Returns -1 if exactly one of the strings is null or empty.
/// </returns>
private static int Compare(string s, string t)
{
    // If both strings are empty, they can't be compared, but other fields can still match
    if (string.IsNullOrEmpty(s) && string.IsNullOrEmpty(t)) return 0;
    // If one string has a value and the other doesn't, it's definitely not a match
    if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(t)) return -1;
    s = s.ToUpperInvariant().Trim();
    t = t.ToUpperInvariant().Trim();
    int n = s.Length;
    int m = t.Length;
    if (n == 0) return m;
    if (m == 0) return n;
    int[,] d = new int[n + 1, m + 1];
    for (int i = 0; i <= n; i++) d[i, 0] = i; // i deletions
    for (int j = 0; j <= m; j++) d[0, j] = j; // j insertions
    for (int i = 1; i <= n; i++)
    {
        for (int j = 1; j <= m; j++)
        {
            int cost = s[i - 1] == t[j - 1] ? 0 : 1;
            d[i, j] = System.Math.Min(
                System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), // deletion, insertion
                d[i - 1, j - 1] + cost);                           // substitution
        }
    }
    return d[n, m];
}
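That means that if 0 is returned, the two strings are identical (ignoring case and leading/trailing whitespace); -1 means exactly one of them is null or empty.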
