Array.Reverse algorithm? - c#

What kind of algorithm Array.Reverse(string a), uses behind the scene to reverse the string?

UPDATE: See the bottom of this answer for one truly horrifying ramification of reversing a string in-place in .NET.
"Good" Answer
In .NET, there's no Array.Reverse overload that takes a string. That said, here's how one might be implemented if it were to exist:
static string ReverseString(string str) {
char[] reversed = new char[str.Length];
for (int i = 0; i < reversed.Length; ++i)
reversed[i] = str[str.Length - 1 - i];
return new string(reversed);
}
Note that in .NET this method has to return a string, since the System.String type is immutable and so you couldn't reverse one in-place.
Scary Answer
OK, actually, it is possible to reverse a string in-place in .NET.
Here's one way, which requires compiling in unsafe mode:
static unsafe void ReverseString(string str) {
int i = 0;
int j = str.Length - 1;
fixed (char* fstr = str) {
while (i < j) {
char temp = fstr[j];
fstr[j--] = fstr[i];
fstr[i++] = temp;
}
}
}
And here's another way, which uses reflection and does not need to be compiled in unsafe mode:
static void ReverseString(string str) {
int i = 0;
int j = str.Length - 1;
// what a tricky bastard!
MethodInfo setter = typeof(string).GetMethod(
"SetChar",
BindingFlags.Instance | BindingFlags.NonPublic
);
while (i < j) {
char temp = str[j];
setter.Invoke(str, new object[] { j--, str[i] });
setter.Invoke(str, new object[] { i++, temp });
}
}
Totally inadvisable and reckless, yes -- not to mention that it would likely have horrendous performance. But possible nonetheless.
The Horror
Oh, and by the way, in case there's any doubt in your mind whatsoever that you should never do anything like this: be aware that either of the ReverseString methods I've provided above will actually allow you to write the following utterly bizarre program:
ReverseString("Hello!");
Console.WriteLine("Hello!");
The above code will output, believe it or not*:
!olleH
So yeah, unless you want all hell to break loose in your code, don't reverse a string in-place. Even though technically you can ;)
*You can try it for yourself if you don't believe me.

Probably a standard in-place reversal algorithm.
function reverse-in-place(a[0..n])
for i from 0 to floor(n/2)
swap(a[i], a[n-i])
Sources
Wikipedia/In-place algorithm

The algorithm is probably using two pointers i and j that start at 0 and length-1 respectively. Then the characters at position i and j are swapped (with the help of a temporal variable) and i is incremented and j decremented by 1. These steps are repeated until both pointers reach each other (i ≥ j).
In pseudo-code:
i := 0;
j := a.length-1;
while (i < j) do
tmp := a[i];
a[i] := a[j];
a[j] := tmp;
i := i+1;
j := j-1;
endwhile;

According to Reflector, Array.Reverse(Array) (there's no string variation) first calls something called TrySZReverse, for which I can't find the implementation. I assume it's some sort of heavily optimized native method..
If that fails, it does something like this:
int num = index;
int num2 = (index + length) - 1;
while (num < num2)
{
object obj2 = objArray[num];
objArray[num] = objArray[num2];
objArray[num2] = obj2;
num++;
num2--;
}
So, an in place algorithm, where it swaps the values at each end, then moves inward, repeatedly.

Here's a general-purpose, language-independent, question-appropriate answer: It copies the input string to the output string, reading from one end and writing to the other.

My suggestion:
private string Reverse(string text)
{
char[] c = text.ToCharArray(0, text.Length);
Array.Reverse(c);
return new string(c);
}

Related

Make custom class instances reinstantiate on operator = is it possible?

I have a code like this:
public static IEnumerable<IntEx> FibbonacciNumbersStr()
{
IntEx j = 0;
IntEx i = 1;
for (Int64 k = 0; k < Int64.MaxValue; k++)
{
yield return j;
IntEx oldi = i;
i = i+j;
j = oldi;
}
}
IntEx is a custom 'int' class able to calculate gigantic numbers and write them as string in any given number base, the problem here is, when I say IntEx oldi = i, oldi will be a POINTER to i and ends up not doing its purpose, (thus making that fibonacci sequence totally wrong. Tho, if I do
public static IEnumerable<IntEx> FibbonacciNumbersStr()
{
IntEx j = 0;
IntEx i = 1;
for (Int64 k = 0; k < Int64.MaxValue; k++)
{
yield return j;
IntEx oldi = new IntEx(i);
i = i+j;
j = oldi;
}
}
It works perfectly.
Is there any way I can get the second result using the simple = operator to reinstantiate instead of pointing to, thus imitating better the behavior of Int64 or something?
I mean, the point here is I may end up wanting to distribute this code for other people to work with and they would probably end up having that 'point to instead of being a new value' problem when using IntEx class -_-
Any ideas?
There is no way to override the = behavior in C#. If IntEx is a class type then assignment will always cause the LHS to refer to the same value as the RHS.
This can be solved to a degree by inserting a struct into the equation. In struct assignment are fields are copied memberwise between the LHS and RHS. It still can't be customized but does ensure some separation of the values
you can always create a static method to set a value.
Thus this solutions would work.
IntEx.SetValue(i);
or make the IntEx produce instances not by creating new but making a static method to it like.
IntEx.CreateInstance(i);

Is there an equivalent to the F# Seq.windowed in C#?

I am working on some C# code dealing with problems like moving averages, where I often need to take a List / IEnumerable and work on chunks of consecutive data. The F# Seq module has a great function, windowed, which taking in a Sequence, returns a sequence of chunks of consecutive elements.
Does C# have an equivalent function out-of-the-box with LINQ?
You can always just call SeqModule.Windowed from C#, you just need to reference FSharp.Core.Dll. The function names are also slightly mangled, so you call Windowed rather than windowed, so that it fits with the C# capitalisation conventions
You could always roll your own (or translate the one from F# core):
let windowed windowSize (source: seq<_>) =
checkNonNull "source" source
if windowSize <= 0 then invalidArg "windowSize" (SR.GetString(SR.inputMustBeNonNegative))
seq { let arr = Microsoft.FSharp.Primitives.Basics.Array.zeroCreateUnchecked windowSize
let r = ref (windowSize-1)
let i = ref 0
use e = source.GetEnumerator()
while e.MoveNext() do
arr.[!i] <- e.Current
i := (!i + 1) % windowSize
if !r = 0 then
yield Array.init windowSize (fun j -> arr.[(!i+j) % windowSize])
else
r := (!r - 1) }
My attempt looks like this, it's way slower than just calling F# directly (as suggested by John Palmer). I'm guessing it's because of F# using an Unchecked array.:
public static IEnumerable<T[]> Windowed<T>(this IEnumerable<T> list, int windowSize)
{
//Checks elided
var arr = new T[windowSize];
int r = windowSize - 1, i = 0;
using(var e = list.GetEnumerator())
{
while(e.MoveNext())
{
arr[i] = e.Current;
i = (i + 1) % windowSize;
if(r == 0)
yield return ArrayInit<T>(windowSize, j => arr[(i + j) % windowSize]);
else
r = r - 1;
}
}
}
public static T[] ArrayInit<T>(int size, Func<int, T> func)
{
var output = new T[size];
for(var i = 0; i < size; i++) output[i] = func(i);
return output;
}
John Palmer's answer is great, here is an example based on his answer.
var numbers = new[] {1, 2, 3, 4, 5};
var windowed = SeqModule.Windowed(2, numbers);
You may (or not) want to add ToArray() to the end, without ToArray, the return type is still in F# world (Sequence). With ToArray, it is back to C# world (Array).
The Reactive Extensions have a few operators to help with this, such as Buffer and Window. The Interactive Extensions, which can be found in the experimental branch, add these and a significant number of additional operators to LINQ.

What's wrong with my implementation of the KMP algorithm?

static void Main(string[] args)
{
string str = "ABC ABCDAB ABCDABCDABDE";//We should add some text here for
//the performance tests.
string pattern = "ABCDABD";
List<int> shifts = new List<int>();
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
NaiveStringMatcher(shifts, str, pattern);
stopWatch.Stop();
Trace.WriteLine(String.Format("Naive string matcher {0}", stopWatch.Elapsed));
foreach (int s in shifts)
{
Trace.WriteLine(s);
}
shifts.Clear();
stopWatch.Restart();
int[] pi = new int[pattern.Length];
Knuth_Morris_Pratt(shifts, str, pattern, pi);
stopWatch.Stop();
Trace.WriteLine(String.Format("Knuth_Morris_Pratt {0}", stopWatch.Elapsed));
foreach (int s in shifts)
{
Trace.WriteLine(s);
}
Console.ReadKey();
}
static IList<int> NaiveStringMatcher(List<int> shifts, string text, string pattern)
{
int lengthText = text.Length;
int lengthPattern = pattern.Length;
for (int s = 0; s < lengthText - lengthPattern + 1; s++ )
{
if (text[s] == pattern[0])
{
int i = 0;
while (i < lengthPattern)
{
if (text[s + i] == pattern[i])
i++;
else break;
}
if (i == lengthPattern)
{
shifts.Add(s);
}
}
}
return shifts;
}
static IList<int> Knuth_Morris_Pratt(List<int> shifts, string text, string pattern, int[] pi)
{
int patternLength = pattern.Length;
int textLength = text.Length;
//ComputePrefixFunction(pattern, pi);
int j;
for (int i = 1; i < pi.Length; i++)
{
j = 0;
while ((i < pi.Length) && (pattern[i] == pattern[j]))
{
j++;
pi[i++] = j;
}
}
int matchedSymNum = 0;
for (int i = 0; i < textLength; i++)
{
while (matchedSymNum > 0 && pattern[matchedSymNum] != text[i])
matchedSymNum = pi[matchedSymNum - 1];
if (pattern[matchedSymNum] == text[i])
matchedSymNum++;
if (matchedSymNum == patternLength)
{
shifts.Add(i - patternLength + 1);
matchedSymNum = pi[matchedSymNum - 1];
}
}
return shifts;
}
Why does my implemention of the KMP algorithm work slower than the Naive String Matching algorithm?
The KMP algorithm has two phases: first it builds a table, and then it does a search, directed by the contents of the table.
The naive algorithm has one phase: it does a search. It does that search much less efficiently in the worst case than the KMP search phase.
If the KMP is slower than the naive algorithm then that is probably because building the table is taking you longer than it takes to simply search the string naively in the first place. Naive string matching is usually very fast on short strings. There is a reason why we don't use fancy-pants algorithms like KMP inside the BCL implementations of string searching. By the time you set up the table, you could have done half a dozen searches of short strings with the naive algorithm.
KMP is only a win if you have enormous strings and you are doing lots of searches that allow you to re-use an already-built table. You need to amortize away the huge cost of building the table by doing lots of searches using that table.
And also, the naive algorithm only has bad performance in bizarre and unlikely scenarios. Most people are searching for words like "London" in strings like "Buckingham Palace, London, England", and not searching for strings like "BANANANANANANA" in strings like "BANAN BANBAN BANBANANA BANAN BANAN BANANAN BANANANANANANANANAN...". The naive search algorithm is optimal for the first problem and highly sub-optimal for the latter problem; but it makes sense to optimize for the former, not the latter.
Another way to put it: if the searched-for string is of length w and the searched-in string is of length n, then KMP is O(n) + O(w). The Naive algorithm is worst case O(nw), best case O(n + w). But that says nothing about the "constant factor"! The constant factor of the KMP algorithm is much larger than the constant factor of the naive algorithm. The value of n has to be awfully big, and the number of sub-optimal partial matches has to be awfully large, for the KMP algorithm to win over the blazingly fast naive algorithm.
That deals with the algorithmic complexity issues. Your methodology is also not very good, and that might explain your results. Remember, the first time you run code, the jitter has to jit the IL into assembly code. That can take longer than running the method in some cases. You really should be running the code a few hundred thousand times in a loop, discarding the first result, and taking an average of the timings of the rest.
If you really want to know what is going on you should be using a profiler to determine what the hot spot is. Again, make sure you are measuring the post-jit run, not the run where the code is jitted, if you want to have results that are not skewed by the jit time.
Your example is too small and it does not have enough repetitions of the pattern where KMP avoids backtracking.
KMP can be slower than the normal search in some cases.
A Simple KMPSubstringSearch Implementation.
https://github.com/bharathkumarms/AlgorithmsMadeEasy/blob/master/AlgorithmsMadeEasy/KMPSubstringSearch.cs
using System;
using System.Collections.Generic;
using System.Linq;
namespace AlgorithmsMadeEasy
{
class KMPSubstringSearch
{
public void KMPSubstringSearchMethod()
{
string text = System.Console.ReadLine();
char[] sText = text.ToCharArray();
string pattern = System.Console.ReadLine();
char[] sPattern = pattern.ToCharArray();
int forwardPointer = 1;
int backwardPointer = 0;
int[] tempStorage = new int[sPattern.Length];
tempStorage[0] = 0;
while (forwardPointer < sPattern.Length)
{
if (sPattern[forwardPointer].Equals(sPattern[backwardPointer]))
{
tempStorage[forwardPointer] = backwardPointer + 1;
forwardPointer++;
backwardPointer++;
}
else
{
if (backwardPointer == 0)
{
tempStorage[forwardPointer] = 0;
forwardPointer++;
}
else
{
int temp = tempStorage[backwardPointer];
backwardPointer = temp;
}
}
}
int pointer = 0;
int successPoints = sPattern.Length;
bool success = false;
for (int i = 0; i < sText.Length; i++)
{
if (sText[i].Equals(sPattern[pointer]))
{
pointer++;
}
else
{
if (pointer != 0)
{
int tempPointer = pointer - 1;
pointer = tempStorage[tempPointer];
i--;
}
}
if (successPoints == pointer)
{
success = true;
}
}
if (success)
{
System.Console.WriteLine("TRUE");
}
else
{
System.Console.WriteLine("FALSE");
}
System.Console.Read();
}
}
}
/*
* Sample Input
abxabcabcaby
abcaby
*/

C# compress a byte array

I do not know much about compression algorithms. I am looking for a simple compression algorithm (or code snippet) which can reduce the size of a byte[,,] or byte[]. I cannot make use of System.IO.Compression. Also, the data has lots of repetition.
I tried implementing the RLE algorithm (posted below for your inspection). However, it produces array's 1.2 to 1.8 times larger.
public static class RLE
{
public static byte[] Encode(byte[] source)
{
List<byte> dest = new List<byte>();
byte runLength;
for (int i = 0; i < source.Length; i++)
{
runLength = 1;
while (runLength < byte.MaxValue
&& i + 1 < source.Length
&& source[i] == source[i + 1])
{
runLength++;
i++;
}
dest.Add(runLength);
dest.Add(source[i]);
}
return dest.ToArray();
}
public static byte[] Decode(byte[] source)
{
List<byte> dest = new List<byte>();
byte runLength;
for (int i = 1; i < source.Length; i+=2)
{
runLength = source[i - 1];
while (runLength > 0)
{
dest.Add(source[i]);
runLength--;
}
}
return dest.ToArray();
}
}
I have also found a java, string and integer based, LZW implementation. I have converted it to C# and the results look good (code posted below). However, I am not sure how it works nor how to make it work with bytes instead of strings and integers.
public class LZW
{
/* Compress a string to a list of output symbols. */
public static int[] compress(string uncompressed)
{
// Build the dictionary.
int dictSize = 256;
Dictionary<string, int> dictionary = new Dictionary<string, int>();
for (int i = 0; i < dictSize; i++)
dictionary.Add("" + (char)i, i);
string w = "";
List<int> result = new List<int>();
for (int i = 0; i < uncompressed.Length; i++)
{
char c = uncompressed[i];
string wc = w + c;
if (dictionary.ContainsKey(wc))
w = wc;
else
{
result.Add(dictionary[w]);
// Add wc to the dictionary.
dictionary.Add(wc, dictSize++);
w = "" + c;
}
}
// Output the code for w.
if (w != "")
result.Add(dictionary[w]);
return result.ToArray();
}
/* Decompress a list of output ks to a string. */
public static string decompress(int[] compressed)
{
int dictSize = 256;
Dictionary<int, string> dictionary = new Dictionary<int, string>();
for (int i = 0; i < dictSize; i++)
dictionary.Add(i, "" + (char)i);
string w = "" + (char)compressed[0];
string result = w;
for (int i = 1; i < compressed.Length; i++)
{
int k = compressed[i];
string entry = "";
if (dictionary.ContainsKey(k))
entry = dictionary[k];
else if (k == dictSize)
entry = w + w[0];
result += entry;
// Add w+entry[0] to the dictionary.
dictionary.Add(dictSize++, w + entry[0]);
w = entry;
}
return result;
}
}
Have a look here. I used this code as a basis to compress in one of my work projects. Not sure how much of the .NET Framework is accessbile in the Xbox 360 SDK, so not sure how well this will work for you.
The problem with that RLE algorithm is that it is too simple. It prefixes every byte with how many times it is repeated, but that does mean that in long ranges of non-repeating bytes, each single byte is prefixed with a "1". On data without any repetitions this will double the file size.
This can be avoided by using Code-type RLE instead; the 'Code' (also called 'Token') will be a byte that can have two meanings; either it indicates how many times the single following byte is repeated, or it indicates how many non-repeating bytes follow that should be copied as they are. The difference between those two codes is made by enabling the highest bit, meaning there are still 7 bits available for the value, meaning the amount to copy or repeat per such code can be up to 127.
This means that even in worst-case scenarios, the final size can only be about 1/127th larger than the original file size.
A good explanation of the whole concept, plus full working (and, in fact, heavily optimised) C# code, can be found here:
http://www.shikadi.net/moddingwiki/RLE_Compression
Note that sometimes, the data will end up larger than the original anyway, simply because there are not enough repeating bytes in it for RLE to work. A good way to deal with such compression failures is by adding a header to your final data. If you simply add an extra byte at the start that's on 0 for uncompressed data and 1 for RLE compressed data, then, when RLE fails to give a smaller result, you just save it uncompressed, with the 0 in front, and your final data will be exactly one byte larger than the original. The system at the other side can then read that starting byte and use that to determine if the following data should be uncompressed or just copied.
Look into Huffman codes, it's a pretty simple algorithm. Basically, use fewer bits for patterns that show up more often, and keep a table of how it's encoded. And you have to account in your codewords that there are no separators to help you decode.

How can we prepend strings with StringBuilder?

I know we can append strings using StringBuilder. Is there a way we can prepend strings (i.e. add strings in front of a string) using StringBuilder so we can keep the performance benefits that StringBuilder offers?
Using the insert method with the position parameter set to 0 would be the same as prepending (i.e. inserting at the beginning).
C# example : varStringBuilder.Insert(0, "someThing");
Java example : varStringBuilder.insert(0, "someThing");
It works both for C# and Java
Prepending a String will usually require copying everything after the insertion point back some in the backing array, so it won't be as quick as appending to the end.
But you can do it like this in Java (in C# it's the same, but the method is called Insert):
aStringBuilder.insert(0, "newText");
If you require high performance with lots of prepends, you'll need to write your own version of StringBuilder (or use someone else's). With the standard StringBuilder (although technically it could be implemented differently) insert require copying data after the insertion point. Inserting n piece of text can take O(n^2) time.
A naive approach would be to add an offset into the backing char[] buffer as well as the length. When there is not enough room for a prepend, move the data up by more than is strictly necessary. This can bring performance back down to O(n log n) (I think). A more refined approach is to make the buffer cyclic. In that way the spare space at both ends of the array becomes contiguous.
Here's what you can do If you want to prepend using Java's StringBuilder class:
StringBuilder str = new StringBuilder();
str.Insert(0, "text");
You could try an extension method:
/// <summary>
/// kind of a dopey little one-off for StringBuffer, but
/// an example where you can get crazy with extension methods
/// </summary>
public static void Prepend(this StringBuilder sb, string s)
{
sb.Insert(0, s);
}
StringBuilder sb = new StringBuilder("World!");
sb.Prepend("Hello "); // Hello World!
You could build the string in reverse and then reverse the result.
You incur an O(n) cost instead of an O(n^2) worst case cost.
If I understand you correctly, the insert method looks like it'll do what you want. Just insert the string at offset 0.
I haven't used it but Ropes For Java Sounds intriguing. The project name is a play on words, use a Rope instead of a String for serious work. Gets around the performance penalty for prepending and other operations. Worth a look, if you're going to be doing a lot of this.
A rope is a high performance
replacement for Strings. The
datastructure, described in detail in
"Ropes: an Alternative to Strings",
provides asymptotically better
performance than both String and
StringBuffer for common string
modifications like prepend, append,
delete, and insert. Like Strings,
ropes are immutable and therefore
well-suited for use in multi-threaded
programming.
Try using Insert()
StringBuilder MyStringBuilder = new StringBuilder("World!");
MyStringBuilder.Insert(0,"Hello "); // Hello World!
You could create an extension for StringBuilder yourself with a simple class:
namespace Application.Code.Helpers
{
public static class StringBuilderExtensions
{
#region Methods
public static void Prepend(this StringBuilder sb, string value)
{
sb.Insert(0, value);
}
public static void PrependLine(this StringBuilder sb, string value)
{
sb.Insert(0, value + Environment.NewLine);
}
#endregion
}
}
Then, just add:
using Application.Code.Helpers;
To the top of any class that you want to use the StringBuilder in and any time you use intelli-sense with a StringBuilder variable, the Prepend and PrependLine methods will show up. Just remember that when you use Prepend, you will need to Prepend in reverse order than if you were Appending.
Judging from the other comments, there's no standard quick way of doing this. Using StringBuilder's .Insert(0, "text") is approximately only 1-3x as fast as using painfully slow String concatenation (based on >10000 concats), so below is a class to prepend potentially thousands of times quicker!
I've included some other basic functionality such as append(), subString() and length() etc. Both appends and prepends vary from about twice as fast to 3x slower than StringBuilder appends. Like StringBuilder, the buffer in this class will automatically increase when the text overflows the old buffer size.
The code has been tested quite a lot, but I can't guarantee it's free of bugs.
class Prepender
{
private char[] c;
private int growMultiplier;
public int bufferSize; // Make public for bug testing
public int left; // Make public for bug testing
public int right; // Make public for bug testing
public Prepender(int initialBuffer = 1000, int growMultiplier = 10)
{
c = new char[initialBuffer];
//for (int n = 0; n < initialBuffer; n++) cc[n] = '.'; // For debugging purposes (used fixed width font for testing)
left = initialBuffer / 2;
right = initialBuffer / 2;
bufferSize = initialBuffer;
this.growMultiplier = growMultiplier;
}
public void clear()
{
left = bufferSize / 2;
right = bufferSize / 2;
}
public int length()
{
return right - left;
}
private void increaseBuffer()
{
int nudge = -bufferSize / 2;
bufferSize *= growMultiplier;
nudge += bufferSize / 2;
char[] tmp = new char[bufferSize];
for (int n = left; n < right; n++) tmp[n + nudge] = c[n];
left += nudge;
right += nudge;
c = new char[bufferSize];
//for (int n = 0; n < buffer; n++) cc[n]='.'; // For debugging purposes (used fixed width font for testing)
for (int n = left; n < right; n++) c[n] = tmp[n];
}
public void append(string s)
{
// If necessary, increase buffer size by growMultiplier
while (right + s.Length > bufferSize) increaseBuffer();
// Append user input to buffer
int len = s.Length;
for (int n = 0; n < len; n++)
{
c[right] = s[n];
right++;
}
}
public void prepend(string s)
{
// If necessary, increase buffer size by growMultiplier
while (left - s.Length < 0) increaseBuffer();
// Prepend user input to buffer
int len = s.Length - 1;
for (int n = len; n > -1; n--)
{
left--;
c[left] = s[n];
}
}
public void truncate(int start, int finish)
{
if (start < 0) throw new Exception("Truncation error: Start < 0");
if (left + finish > right) throw new Exception("Truncation error: Finish > string length");
if (finish < start) throw new Exception("Truncation error: Finish < start");
//MessageBox.Show(left + " " + right);
right = left + finish;
left = left + start;
}
public string subString(int start, int finish)
{
if (start < 0) throw new Exception("Substring error: Start < 0");
if (left + finish > right) throw new Exception("Substring error: Finish > string length");
if (finish < start) throw new Exception("Substring error: Finish < start");
return toString(start,finish);
}
public override string ToString()
{
return new string(c, left, right - left);
//return new string(cc, 0, buffer); // For debugging purposes (used fixed width font for testing)
}
private string toString(int start, int finish)
{
return new string(c, left+start, finish-start );
//return new string(cc, 0, buffer); // For debugging purposes (used fixed width font for testing)
}
}
This should work:
aStringBuilder = "newText" + aStringBuilder;

Categories

Resources