I have to calculate CRC32 checksum for a string in C# and send it to an external application.
On the other end they will calculate it using Java.
But my checksum does not match on the their end.
e.g. CRC32 checksum of the following string
43HLV109520DAP10072la19z6
is 1269993351 on their end.
And 2947932745 at my end using C#
Please tell me what's going wrong in my code.
I am using this 0xffffffff default seed and following crc table
readonly static uint[] CRCTable = new uint[] {
0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA, 0x076DC419,
0x706AF48F, 0xE963A535, 0x9E6495A3, 0x0EDB8832, 0x79DCB8A4,
0xE0D5E91E, 0x97D2D988, 0x09B64C2B, 0x7EB17CBD, 0xE7B82D07,
0x90BF1D91, 0x1DB71064, 0x6AB020F2, 0xF3B97148, 0x84BE41DE,
0x1ADAD47D, 0x6DDDE4EB, 0xF4D4B551, 0x83D385C7, 0x136C9856,
0x646BA8C0, 0xFD62F97A, 0x8A65C9EC, 0x14015C4F, 0x63066CD9,
0xFA0F3D63, 0x8D080DF5, 0x3B6E20C8, 0x4C69105E, 0xD56041E4,
0xA2677172, 0x3C03E4D1, 0x4B04D447, 0xD20D85FD, 0xA50AB56B,
0x35B5A8FA, 0x42B2986C, 0xDBBBC9D6, 0xACBCF940, 0x32D86CE3,
0x45DF5C75, 0xDCD60DCF, 0xABD13D59, 0x26D930AC, 0x51DE003A,
0xC8D75180, 0xBFD06116, 0x21B4F4B5, 0x56B3C423, 0xCFBA9599,
0xB8BDA50F, 0x2802B89E, 0x5F058808, 0xC60CD9B2, 0xB10BE924,
0x2F6F7C87, 0x58684C11, 0xC1611DAB, 0xB6662D3D, 0x76DC4190,
0x01DB7106, 0x98D220BC, 0xEFD5102A, 0x71B18589, 0x06B6B51F,
0x9FBFE4A5, 0xE8B8D433, 0x7807C9A2, 0x0F00F934, 0x9609A88E,
0xE10E9818, 0x7F6A0DBB, 0x086D3D2D, 0x91646C97, 0xE6635C01,
0x6B6B51F4, 0x1C6C6162, 0x856530D8, 0xF262004E, 0x6C0695ED,
0x1B01A57B, 0x8208F4C1, 0xF50FC457, 0x65B0D9C6, 0x12B7E950,
0x8BBEB8EA, 0xFCB9887C, 0x62DD1DDF, 0x15DA2D49, 0x8CD37CF3,
0xFBD44C65, 0x4DB26158, 0x3AB551CE, 0xA3BC0074, 0xD4BB30E2,
0x4ADFA541, 0x3DD895D7, 0xA4D1C46D, 0xD3D6F4FB, 0x4369E96A,
0x346ED9FC, 0xAD678846, 0xDA60B8D0, 0x44042D73, 0x33031DE5,
0xAA0A4C5F, 0xDD0D7CC9, 0x5005713C, 0x270241AA, 0xBE0B1010,
0xC90C2086, 0x5768B525, 0x206F85B3, 0xB966D409, 0xCE61E49F,
0x5EDEF90E, 0x29D9C998, 0xB0D09822, 0xC7D7A8B4, 0x59B33D17,
0x2EB40D81, 0xB7BD5C3B, 0xC0BA6CAD, 0xEDB88320, 0x9ABFB3B6,
0x03B6E20C, 0x74B1D29A, 0xEAD54739, 0x9DD277AF, 0x04DB2615,
0x73DC1683, 0xE3630B12, 0x94643B84, 0x0D6D6A3E, 0x7A6A5AA8,
0xE40ECF0B, 0x9309FF9D, 0x0A00AE27, 0x7D079EB1, 0xF00F9344,
0x8708A3D2, 0x1E01F268, 0x6906C2FE, 0xF762575D, 0x806567CB,
0x196C3671, 0x6E6B06E7, 0xFED41B76, 0x89D32BE0, 0x10DA7A5A,
0x67DD4ACC, 0xF9B9DF6F, 0x8EBEEFF9, 0x17B7BE43, 0x60B08ED5,
0xD6D6A3E8, 0xA1D1937E, 0x38D8C2C4, 0x4FDFF252, 0xD1BB67F1,
0xA6BC5767, 0x3FB506DD, 0x48B2364B, 0xD80D2BDA, 0xAF0A1B4C,
0x36034AF6, 0x41047A60, 0xDF60EFC3, 0xA867DF55, 0x316E8EEF,
0x4669BE79, 0xCB61B38C, 0xBC66831A, 0x256FD2A0, 0x5268E236,
0xCC0C7795, 0xBB0B4703, 0x220216B9, 0x5505262F, 0xC5BA3BBE,
0xB2BD0B28, 0x2BB45A92, 0x5CB36A04, 0xC2D7FFA7, 0xB5D0CF31,
0x2CD99E8B, 0x5BDEAE1D, 0x9B64C2B0, 0xEC63F226, 0x756AA39C,
0x026D930A, 0x9C0906A9, 0xEB0E363F, 0x72076785, 0x05005713,
0x95BF4A82, 0xE2B87A14, 0x7BB12BAE, 0x0CB61B38, 0x92D28E9B,
0xE5D5BE0D, 0x7CDCEFB7, 0x0BDBDF21, 0x86D3D2D4, 0xF1D4E242,
0x68DDB3F8, 0x1FDA836E, 0x81BE16CD, 0xF6B9265B, 0x6FB077E1,
0x18B74777, 0x88085AE6, 0xFF0F6A70, 0x66063BCA, 0x11010B5C,
0x8F659EFF, 0xF862AE69, 0x616BFFD3, 0x166CCF45, 0xA00AE278,
0xD70DD2EE, 0x4E048354, 0x3903B3C2, 0xA7672661, 0xD06016F7,
0x4969474D, 0x3E6E77DB, 0xAED16A4A, 0xD9D65ADC, 0x40DF0B66,
0x37D83BF0, 0xA9BCAE53, 0xDEBB9EC5, 0x47B2CF7F, 0x30B5FFE9,
0xBDBDF21C, 0xCABAC28A, 0x53B39330, 0x24B4A3A6, 0xBAD03605,
0xCDD70693, 0x54DE5729, 0x23D967BF, 0xB3667A2E, 0xC4614AB8,
0x5D681B02, 0x2A6F2B94, 0xB40BBE37, 0xC30C8EA1, 0x5A05DF1B,
0x2D02EF8D
};
CRC32 is calculated over a sequence of bytes and not over a string. So to calculate CRC32 you need to transform the string into bytes first. If you use a different encoding to transform a string to a sequence of bytes the result will be different.
Thus you need to use the same encoding on both sides. I recommend using UTF-8 without BOM.
I have calculated CRC32 with Java and got the same you got in C#. I.e. CRC32(43HLV109520DAP10072la19z6)=2947932745. This means that either they have a bug in java, or you have a bug during transmission.
Code follows.
I suggest you try to send simple data to java application, like zeros or ones, and try to deduce how do they compute CRC.
public static void main(String[] args) {
CRC32 crc32 = new CRC32();
String data = "43HLV109520DAP10072la19z6";
String[] cs = new String[] {"utf8" /*, "cp1252", "cp866" */};
byte[] array;
byte b;
for(int i=0; i<cs.length; ++i) {
array = data.getBytes(Charset.forName(cs[i]));
crc32.reset();
crc32.update(array);
System.out.println(String.format("%s: %d", cs[i], crc32.getValue()));
/*
for(int j=0; j<array.length/2; j++) {
b = array[i];
array[i] = array[array.length-1-i];
array[array.length-1-i] = b;
}
*/
for(int j=0; j<array.length; j+=2) {
b = array[i];
array[i] = array[i+2];
array[i+1] = b;
}
crc32.reset();
crc32.update(array);
System.out.println(String.format("of modified: %d", crc32.getValue()));
}
}
UPDATE
Endiannes reverse also not help
for(int j=0; j<array.length; j+=4) {
b = array[i];
array[i] = array[i+3];
array[i+3] = b;
b = array[i+1];
array[i+1] = array[i+2];
array[i+2] = b;
}
Without delving into any detail, the problem can be related to Java's lack of unsigned integer types. The problem could happen at the int level, but also at the byte level. This is one avenue of investigation.
CRC is calculated over a sequence of bytes and not over a string.
Whichever CRC in java looks different due unavailability of Unsigned int in java.
Convert calculated Int CRC into Hex String and take last 2 Bytes (length 4)
That is your actual CRC unsigned Int.
String hexCrc = Integer.toHexString(crcCalculated);
hexCrc = hexCrc.substring(hexCrc.length()-4);
compare hex CRC of c# and Java both should be same.
I'm currently making a game but I seem to have problems reading values from a text file. For some reason, when I read the value, it gives me the ASCII code of the value rather than the actual value itself when I wrote it to the file. I've tried about every ASCII conversion function and string conversion function, but I just can't seem to figure it out.
I use a 2D array of integers. I use a nested for loop to write each element into the file. I've looked at the file and the values are correct, but I don't understand why it's returning the ASCII code. Here's the code I'm using to write and read to file:
Writing to file:
for (int i = 0; i < level.MaxRows(); i++)
{
for (int j = 0; j < level.MaxCols(); j++)
{
fileWrite.Write(level.GetValueAtIndex(i, j) + " ");
//Console.WriteLine(level.GetValueAtIndex(i, j));
}
//add new line
fileWrite.WriteLine();
}
And here's the code where I read the values from the file:
string str = "";
int iter = 0; //used to iterate in each column of array
for (int i = 0; i < level.MaxRows(); i++)
{
iter = 0;
//TODO: For some reason, the file is returning ASCII code, convert to int
//keep reading characters until a space is reached.
str = fileRead.ReadLine();
//take the above string and extract the values from it.
//Place each value in the level.
foreach (char id in str)
{
if (id != ' ')
{
//convert id to an int
num = (int)id;
level.ChangeTile(i, iter, num);
iter++;
}
}
This is the latest version of the loop that I use to read the values. Reading other values is fine; it's just when I get to the array, things go wrong. I guess my question is, why did the conversion to ASCII happen? If I can figure that out, then I might be able to solve the issue. I'm using XNA 4 to make my game.
This is where the convertion to ascii is happening:
fileWrite.Write(level.GetValueAtIndex(i, j) + " ");
The + operator implicitly converts the integer returned by GetValueAtIndex into a string, because you are adding it to a string (really, what did you expect to happen?)
Furthermore, the ReadLine method returns a String, so I am not sure why you'd expect a numeric value to magically come back here. If you want to write binary data, look into BinaryWriter
This is where you are converting the characters to character codes:
num = (int)id;
The id variable is a char, and casting that to int gives you the character code, not the numeric value.
Also, this converts a single character, not a whole number. If you for example have "12 34 56 " in your text file, it will get the codes for 1, 2, 3, 4, 5 and 6, not 12, 34 and 56.
You would want to split the line on spaces, and parse each substring:
foreach (string id in str.Split(' ')) {
if (id.Length > 0) {
num = Int32.Parse(id);
level.ChangeTile(i, iter, num);
iter++;
}
}
Update: I've kept the old code (below) with the assumption that one record was on each line, but I've also added a different way of doing it that should work with multiple integers on a line, separated by a space.
Multiple records on one line
str = fileRead.ReadLine();
string[] values = str.Split(new Char[] {' '});
foreach (string value in values)
{
int testNum;
if (Int32.TryParse(str, out testnum))
{
// again, not sure how you're using iter here
level.ChangeTile(i, iter, num);
}
}
One record per line
str = fileRead.ReadLine();
int testNum;
if (Int32.TryParse(str, out testnum))
{
// however, I'm not sure how you're using iter here; if it's related to
// parsing the string, you'll probably need to do something else
level.ChangeTile(i, iter, num);
}
Please note that the above should work if you write out each integer line-by-line (i.e. how you were doing it via the WriteLine which you remarked out in your code above). If you switch back to using a WriteLine, this should work.
You have:
foreach (char id in str)
{
//convert id to an int
num = (int)id;
A char is an ASCII code (or can be considered as such; technically it is a unicode code-point, but that is broadly comparable assuming you are writing ANSI or low-value UTF-8).
What you want is:
num = (int)(id - '0');
This:
fileWrite.Write(level.GetValueAtIndex(i, j) + " ");
converts the int returned from level.GetValueAtIndex(i, j) into a string. Assuming the function returns the value 5 for a particular i and j then you write "5 " into the file.
When you then read it is being read as a string which consists of chars and you get the ASCII code of 5 when you cast it simply to an int. What you need is:
foreach (char id in str)
{
if (id != ' ')
{
//convert id to an int
num = (int)(id - '0'); // subtract the ASCII value for 0 from your current id
level.ChangeTile(i, iter, num);
iter++;
}
}
However this only works if you only ever are going to have single digit integers (only 0 - 9). This might be better:
foreach (var cell in fileRead.ReadLine().Split(' '))
{
num = Int.Parse(cell);
level.ChangeTile(i, iter, num);
iter++;
}
I'm trying to figure out the time complexity of a function that I wrote (it generates a power set for a given string):
public static HashSet<string> GeneratePowerSet(string input)
{
HashSet<string> powerSet = new HashSet<string>();
if (string.IsNullOrEmpty(input))
return powerSet;
int powSetSize = (int)Math.Pow(2.0, (double)input.Length);
// Start at 1 to skip the empty string case
for (int i = 1; i < powSetSize; i++)
{
string str = Convert.ToString(i, 2);
string pset = str;
for (int k = str.Length; k < input.Length; k++)
{
pset = "0" + pset;
}
string set = string.Empty;
for (int j = 0; j < pset.Length; j++)
{
if (pset[j] == '1')
{
set = string.Concat(set, input[j].ToString());
}
}
powerSet.Add(set);
}
return powerSet;
}
So my attempt is this:
let the size of the input string be n
in the outer for loop, must iterate 2^n times (because the set size is 2^n).
in the inner for loop, we must iterate 2*n times (at worst).
1. So Big-O would be O((2^n)*n) (since we drop the constant 2)... is that correct?
And n*(2^n) is worse than n^2.
if n = 4 then
(4*(2^4)) = 64
(4^2) = 16
if n = 100 then
(10*(2^10)) = 10240
(10^2) = 100
2. Is there a faster way to generate a power set, or is this about optimal?
A comment:
the above function is part of an interview question where the program is supposed to take in a string, then print out the words in the dictionary whose letters are an anagram subset of the input string (e.g. Input: tabrcoz Output: boat, car, cat, etc.). The interviewer claims that a n*m implementation is trivial (where n is the length of the string and m is the number of words in the dictionary), but I don't think you can find valid sub-strings of a given string. It seems that the interviewer is incorrect.
I was given the same interview question when I interviewed at Microsoft back in 1995. Basically the problem is to implement a simple Scrabble playing algorithm.
You are barking up completely the wrong tree with this idea of generating the power set. Nice thought, clearly way too expensive. Abandon it and find the right answer.
Here's a hint: run an analysis pass over the dictionary that builds a new data structure more amenable to efficiently solving the problem you actually have to solve. With an optimized dictionary you should be able to achieve O(nm). With a more cleverly built data structure you can probably do even better than that.
2. Is there a faster way to generate a power set, or is this about optimal?
Your algorithm is reasonable, but your string handling could use improvement.
string str = Convert.ToString(i, 2);
string pset = str;
for (int k = str.Length; k < input.Length; k++)
{
pset = "0" + pset;
}
All you're doing here is setting up a bitfield, but using a string. Just skip this, and use variable i directly.
for (int j = 0; j < input.Length; j++)
{
if (i & (1 << j))
{
When you build the string, use a StringBuilder, not creating multiple strings.
// At the beginning of the method
StringBuilder set = new StringBuilder(input.Length);
...
// Inside the loop
set.Clear();
...
set.Append(input[j]);
...
powerSet.Add(set.ToString());
Will any of this change the complexity of your algorithm? No. But it will significantly reduce the number of extra String objects you create, which will provide you a good speedup.
I'm writing a C# application in which I need to search a file (could be very big) for a sequence of bytes, and I can't use any libraries to do so. So, I need a function that takes a byte array as an argument and returns the position of the byte following the given sequence. The function doesn't have to be fast, it simply has to work. Any help would be greatly appreciated :)
If it doesn't have to be fast you could use this:
int GetPositionAfterMatch(byte[] data, byte[]pattern)
{
for (int i = 0; i < data.Length - pattern.Length; i++)
{
bool match = true;
for (int k = 0; k < pattern.Length; k++)
{
if (data[i + k] != pattern[k])
{
match = false;
break;
}
}
if (match)
{
return i + pattern.Length;
}
}
}
But I really would recommend you to use Knuth-Morris-Pratt algorithm, it's the algorithm mostly used as a base of IndexOf methods for strings. The algorithm above will perform really slow, exept for small arrays and small patterns.
The straight-forward approach as pointed out by Turrau works, and for your purposes is probably good enough, since you say it doesn't have to be fast - especially since for most practical purposes the algorithm is much faster than the worst case O(n*m). (Depending on your pattern I guess).
For an optimal solution you can also check out the Knuth-Morris-Pratt algorithm, which makes use of partial matches which in the end is O(n+m).
Here's an extract of some code I used to do a boyer-moore type search. It's mean to work on pcap files, so it operates record by record, but should be easy enough to modify to suit just searching a long binary file. It's sort of extracted from some test code, so I hope I got everything for you to follow along. Also look up boyer-moore searching on wikipedia, since that is what it's based off of.
int[] badMatch = new int[256];
byte[] pattern; //the pattern we are searching for
//badMath is an array of every possible byte value (defined as static later).
//we use this as a jump table to know how many characters we can skip comparison on
//so first, we prefill every possibility with the length of our search string
for (int i = 0; i < badMatch.Length; i++)
{
badMatch[i] = pattern.Length;
}
//Now we need to calculate the individual maximum jump length for each byte that appears in my search string
for (int i = 0; i < pattern.Length - 1; i++)
{
badMatch[pattern[i] & 0xff] = pattern.Length - i - 1;
}
// Place the bytes you want to run the search against in the payload variable
byte[] payload = <bytes>
// search the packet starting at offset, and try to match the last character
// if we loop, we increment by whatever our jump value is
for (i = offset + pattern.Length - 1; i < end && cont; i += badMatch[payload[i] & 0xff])
{
// if our payload character equals our search string character, continue matching counting backwards
for (j = pattern.Length - 1, k = i; (j >= 0) && (payload[k] == pattern[j]) && cont; j--)
{
k--;
}
// if we matched every character, then we have a match, add it to the packet list, and exit the search (cont = false)
if (j == -1)
{
//we MATCHED!!!
//i = end;
cont = false;
}
}