How do I use this CRC-32C C# library?

I've downloaded this library https://github.com/robertvazan/crc32c.net for a project I'm working on. I need CRC in one part of the project, so I downloaded the library, as it is obviously going to be much faster than anything I could write in the near future.
I have some understanding of how CRC works; I once made a software implementation of it (as a learning exercise) that worked. But I must be doing something incredibly stupid while trying to get this library to work and not realizing it. No matter what I do, I can't seem to get crc = 0, even though the arrays were not changed.
Basically, my question is, how do I actually use this library to check for integrity of a byte array?
The way I understand it, I should call Crc32CAlgorithm.Compute(array) once to compute the CRC the first time, then call it again on an array that has the previously returned value appended (I've tried appending it, and also zeroing the last 4 bytes of the array before putting the returned value there); if the second call returns 0, the array was unchanged.
Please help me, I don't know what I'm doing wrong.
EDIT: It doesn't work right when I do this (yes, I realize LINQ is slow; this is just an example):
using (var hash = new Crc32CAlgorithm())
{
    var array = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8 };
    var crc = hash.ComputeHash(array);                 // 4-byte CRC
    var arrayWithCrc = array.Concat(crc).ToArray();    // message + CRC appended
    Console.WriteLine(string.Join(" ", hash.ComputeHash(arrayWithCrc)));
}
Console outputs: 199 75 103 72

You do not need to append a CRC to a message and compute the CRC of that in order to check a CRC. Just compute the CRC on the message on one end, send that CRC along with the message, compute CRC on just the message on the other end (not including the sent CRC), and then compare the CRC you computed to the one that was sent with the message.
They should be equal to each other. That's all there is to it. That works for any hash you might use, not just CRCs.
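For illustration, here is a minimal sketch of that round trip, using the static Crc32CAlgorithm.Compute helper mentioned in the question (assuming, per the library's README, that it returns the CRC as a uint):

byte[] message = { 1, 2, 3, 4, 5, 6, 7, 8 };

// Sender: compute the CRC of the message and transmit both.
uint sentCrc = Crc32CAlgorithm.Compute(message);

// Receiver: recompute over the message bytes only (not the
// transmitted CRC) and compare.
uint computedCrc = Crc32CAlgorithm.Compute(message);
Console.WriteLine(computedCrc == sentCrc ? "intact" : "corrupted");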
If you feel deeply compelled to make use of the lovely mathematical property of CRCs where computing the CRC on the message with its CRC appended gives a specific result, you can. You have to append the CRC bits in the correct order, and you need to look for the "residue" of the CRC, which may not be zero.
In your case, you are in fact appending the bits in the correct order (by appending the bytes in little-endian order), and the result you are getting is the correct residue for CRC-32C. That residue is 0x48674bc7, which, separated into bytes in little-endian order and converted to decimal, is your 199 75 103 72.
You will find that if you take any sequence of bytes, compute the CRC-32C of that, append that CRC to the sequence in little-endian order, and compute the CRC-32C of the sequence plus CRC, you will always get 0x48674bc7.
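Here is a sketch of that check, assuming Compute returns the CRC as a uint and running on a little-endian machine (x86/x64), so BitConverter.GetBytes produces the bytes in the needed order:

byte[] message = { 1, 2, 3, 4, 5, 6, 7, 8 };
uint crc = Crc32CAlgorithm.Compute(message);

// Append the CRC to the message in little-endian byte order.
byte[] withCrc = new byte[message.Length + 4];
Buffer.BlockCopy(message, 0, withCrc, 0, message.Length);
Buffer.BlockCopy(BitConverter.GetBytes(crc), 0, withCrc, message.Length, 4);

// The CRC of message-plus-CRC is the fixed residue, not zero.
Console.WriteLine(Crc32CAlgorithm.Compute(withCrc) == 0x48674bc7);  // True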
However, that's a smidge slower than just comparing the two CRCs, since now you have to compute a CRC over four more bytes than before. So, really, there's no need to do it this way.

Related

Sending Int32 equal to 4, received as equal to 67108864

What's going on? I do this on the server:
var msg = Server.Api.CreateMessage();
msg.Write(2);
msg.Write(FreshChunks.Count());
Server.Api.SendMessage(msg, peer.Connection, NetDeliveryMethod.ReliableUnordered);
Then on the client it successfully reads the byte = 2, and the switch routes to a function which reads an Int32 (FreshChunks.Count). That value was 4 when sent, but when received it equals 67108864. I've tried Int16 through Int64 and UInt16 through UInt64; none of them yields the correct value.
Given that:
- in your usage of msg.Write(2), the compiler treats the 2 as an int (Int32), and
- you mentioned that you "successfully read the byte = 2",

it seems that one of these options is happening:
- msg.Write writes only the bytes that have at least one bit set (to save space), or
- msg.Write always casts the given argument to a byte.

When you then asked for 4 bytes (an Int32), you got 0x04 00 00 00; the first byte is exactly the 4 you passed. So when you ask msg.Read for more bytes than it has (you requested 4 bytes and it had only 1, due to the msg.Write logic), it apparently does one of these:
- pads the remaining bytes with zeros, or
- keeps on reading, and in your case there were three 0 bytes in the message's metadata that were returned to you.
To solve your problem, read the documentation of the Write and Read methods and understand how they behave.
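The arithmetic fits this picture: 67108864 is 0x04000000, i.e. your 4 shifted up by three zero bytes. Here is a BCL-only sketch (BinaryWriter/BinaryReader standing in for Lidgren, which this is not) that reproduces the symptom when read sizes don't match the written sizes:

using System;
using System.IO;

var ms = new MemoryStream();
var writer = new BinaryWriter(ms);
writer.Write(2);                        // written as Int32: 02 00 00 00
writer.Write(4);                        // written as Int32: 04 00 00 00

ms.Position = 0;
var reader = new BinaryReader(ms);
byte tag = reader.ReadByte();           // pulls one byte: 2
int count = reader.ReadInt32();         // reads 00 00 00 04 -> 0x04000000
Console.WriteLine(count);               // 67108864

ms.Position = 0;                        // matching read/write sizes fixes it
Console.WriteLine(reader.ReadInt32());  // 2
Console.WriteLine(reader.ReadInt32());  // 4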

Cutting random bytes off of file byte array in C#

So I've been working on this project for a while now, involving LSB steganography. Really fun stuff. Anyways, I just finished writing the code for embedding and extracting files from an image (instead of just plain text), and I'm running into a problem. I can recognize the MIME type and extension of the bytes, but because the embedded file doesn't usually take up all of the LSBs of the image, there's a lot of garbage data. So I have the extracted file plus some garbage in the byte array right after it. I need to figure out how to cut the garbage off, so that the exported file is the correct, smaller size.
TLDR: I have a byte array with a recognized file in it, with some additional random bytes. How do I find out where the file ends and the random bytes begin?
Remember this is all in C#.
Any advice is appreciated.
Link to my project for reference: https://github.com/nicosogangstar/Steg
Generally you have two options.
End of stream marker
This is the more direct approach of the two, but it may lack some versatility depending on what data you want to hide. After you embed your data, continue with embedding a unique sequence of bits/bytes such that you know it cannot be prematurely encountered in the data before it. As you extract the bits, you can stop reading once you encounter this sequence. If you expect to hide only readable text, i.e. bytes with ASCII codes between 32 and 127, your marker can be as short as eight 0s or eight 1s. However, if you intend to hide any sort of binary data, where every byte value has a chance of appearing, you may accidentally encounter the marker while extracting legitimate data and thus halt the process prematurely.
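As a sketch of the extraction side under that assumption (printable ASCII payload, so a zero byte can serve as the marker):

// Stop at the first zero byte; printable ASCII (32-127) never contains 0x00.
static byte[] TakeUntilMarker(byte[] extracted)
{
    int end = Array.IndexOf(extracted, (byte)0);
    return end < 0 ? extracted : extracted[..end];
}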
Header information
You can add a header preceding the data, e.g., another 16-24 bits (or any other amount) which can be translated to a number that tells you how many bits/bytes/pixels to read before stopping. For example, if you want to hide a byte array of size 1000, first embed 2 bytes encoding that length and then follow with the actual data. More specifically, split the length into 2 bytes, where the first byte holds the 8th to 15th bits and the second byte holds the 0th to 7th bits of the number 1000 in binary:
00000011 11101000      1000 in binary
       3      -24      byte values
You can embed all sorts of information in a header, such as whether the data is encrypted or compressed with some algorithm, the original filename of the data, how many LSBs to read when extracting the information, etc.
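A sketch of such a 16-bit length header, matching the 1000 example above (the helper names are just for illustration):

// Prefix the payload with its length as two big-endian bytes.
static byte[] AddLengthHeader(byte[] payload)
{
    var framed = new byte[payload.Length + 2];
    framed[0] = (byte)(payload.Length >> 8);    // 3 for a length of 1000
    framed[1] = (byte)(payload.Length & 0xFF);  // 232 (0xE8, or -24 signed)
    Buffer.BlockCopy(payload, 0, framed, 2, payload.Length);
    return framed;
}

// Extraction side: read the length first, then exactly that many bytes.
static byte[] StripLengthHeader(byte[] framed)
{
    int length = (framed[0] << 8) | framed[1];
    var payload = new byte[length];
    Buffer.BlockCopy(framed, 2, payload, 0, length);
    return payload;
}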

What is the fastest way to scan a C# byte[] for two bytes on a 64 bit computer and return the location?

I have an in-memory byte[] and need to locate the offset where 13 and 10 are. I will then use the following to extract that line:
String oneLine = Encoding.ASCII.GetString(bytes, 0, max);
What is the fastest way to search for the two bytes on an x64 computer? ...and convert it to a string?
Is there anything I can do other than iterating through each byte, scanning for 13 and then scanning for 10?
// Disclaimer:
// This is just for my curiosity. Perhaps I'll gain a better understanding of
// how .NET interfaces with RAM, the CPU instructions related to comparisons, etc.
//
// I don't suspect a performance problem, but I do suspect a lack of understanding
// (on my part) on how C# does low-level operations.
Not sure if it will be 'the fastest way', but you can look at the Boyer-Moore algorithm to find the indexes of the required values.
Have a look at this SO thread: Search longest pattern in byte array in C#
Boyer-Moore can beat a plain linear traversal because it skips elements based on the length of your 'needle'; the gain grows as the needle gets longer, so with a two-byte needle the advantage is modest. HTH.
Since you are looking for a two-byte sequence, you don't have to scan every byte, just every other one. If the probed index contains a 13, look at the next byte for a 10. If the probed index contains a 10, look at the previous byte for a 13. That should cut your scan time approximately in half compared to a linear search.
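A sketch of that every-other-byte scan (meant as a starting point, not a tuned implementation):

// Probe every other byte; look at a neighbour only on a hit.
static int IndexOfCrLf(byte[] bytes)
{
    for (int i = 1; i < bytes.Length; i += 2)
    {
        if (bytes[i] == 13)
        {
            if (i + 1 < bytes.Length && bytes[i + 1] == 10)
                return i;                   // CR at i, LF at i + 1
        }
        else if (bytes[i] == 10 && bytes[i - 1] == 13)
        {
            return i - 1;                   // CR at i - 1, LF at i
        }
    }
    return -1;
}

// Usage, as in the question:
// int max = IndexOfCrLf(bytes);
// string oneLine = Encoding.ASCII.GetString(bytes, 0, max);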

Hash function to obtain a limited length result

I need to hash a number (about 22 digits) and the result length must be less than 12 characters. It can be a number or a mix of characters, and must be unique. (The number entered will be unique too).
For example, if the number entered is 000000000000000000001, the result should be something like 2s5As5A62s.
I looked at the typical ones, like MD5, SHA-1, etc., but they produce results that are too long.
The problem with your question is that the input is larger than the output yet must map to it uniquely. If you're expecting a unique output, it won't happen. The reason is that if you have an input space of, say, 22 numeric digits (10^22 possibilities) and an output space of hexadecimal digits with a length of 11 digits (16^11 possibilities), you end up with more input possibilities than output possibilities.
Working out the numbers shows that you would need an output space of 19 hexadecimal digits and a perfect one-to-one function; otherwise you will have collisions pretty often (more than 50% of the time). I assume this is something you do not want, but you did not specify.
Since what you want cannot be done, I would suggest rethinking your design or using a checksum such as the cyclic redundancy check (CRC). CRC-64 produces a 64-bit output which, encoded with any Base64 algorithm, will give you something along the lines of what you want. This does not provide cryptographic strength like SHA-1, so it should never be used in anything related to information security.
However, if you were able to change your criteria to allow for longer hash outputs, then I would strongly suggest you look at SHA-512, as it provides high-quality outputs with an extremely low chance of duplication. By a low chance I mean that no two distinct inputs have yet been found that produce the same hash in the history of the algorithm.
If both of these suggestions still don't work for you, then your last alternative is probably just applying Base64 to the input data. It essentially utilizes the standard English alphabet in the best way possible to represent your data, reducing the number of characters as much as possible while retaining a complete representation of the input. This is not a hash function, but simply a method for encoding binary data.
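A sketch of that last option, packing the digits into raw bytes with BigInteger before Base64-encoding them (the input value is only an example):

using System.Numerics;

BigInteger n = BigInteger.Parse("1234567890123456789012");  // 22 digits
byte[] raw = n.ToByteArray();                // roughly 9-10 bytes
string encoded = Convert.ToBase64String(raw);
Console.WriteLine(encoded.Length);           // about 16 chars with padding

// Unlike a hash, this is reversible:
BigInteger back = new BigInteger(Convert.FromBase64String(encoded));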
Why not take MD5 or SHA-N, then re-encode it as Base64 (or base-whatever) and keep only 12 characters of it?
NB: in any case the hash will never be unique (but it can offer a low collision probability).
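A sketch of that idea; the truncation is exactly why uniqueness cannot be guaranteed:

using System.Security.Cryptography;
using System.Text;

using (var md5 = MD5.Create())
{
    byte[] hash = md5.ComputeHash(Encoding.ASCII.GetBytes("000000000000000000001"));
    // 16 hash bytes -> 24 Base64 chars; keep the first 12.
    string shortId = Convert.ToBase64String(hash).Substring(0, 12);
    Console.WriteLine(shortId);
}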
You can't use a hash if it has to be unique.
You need about 74 bits to store such a number. If you convert it to base-64 it will be about 12-13 characters.
Can you elaborate on what your requirement is for the hashing? Do you need to make sure the result is diverse? (i.e. not 1 = a, 2 = b)
Just thinking out loud, and a little bit laterally: could you not apply run-length encoding to your number, treating it as data you want to compress? You could then use the Base64 version of the compressed result.

How to properly read 16 byte unsigned integer with BinaryReader

I need to parse a binary stream in .NET and read a 16-byte unsigned integer. I would like to use the BinaryReader.ReadUIntXX() functions, but there isn't a BinaryReader.ReadUInt128() available. I assume I will have to roll my own function using ReadByte and build an array, but I don't know if that is the most efficient method.
Thanks!
I would love to take credit for this, but one quick search of the net, and voilà:
http://msdn.microsoft.com/en-us/library/bb384066.aspx
Here is the code sample (which is on the same page)
byte[] bytes = { 0, 0, 0, 25 };
// If the system architecture is little-endian (that is, little end first),
// reverse the byte array.
if (BitConverter.IsLittleEndian)
    Array.Reverse(bytes);
int i = BitConverter.ToInt32(bytes, 0);
Console.WriteLine("int: {0}", i);
// Output: int: 25
The only thing that most developers do not know is the difference between big-endian and little-endian. Well, like most things in life, the human race simply can't agree on very simple things (left- and right-hand-drive cars are a good example as well). When the bits (remember 1s and 0s and binary math) are laid out, their order determines the value of the field. One byte is eight bits; then there is signed and unsigned, but let's stick to the order. The number 1 (one) can be represented in one of two ways, 10000000 or 00000001 (see clarification in the comments for a detailed explanation); as the comment in the code suggests, big-endian is the one with the 1 in front, little-endian is the one with the zero (see http://en.wikipedia.org/wiki/Endianness - sorry, new user, and they won't let me hyperlink more than once...). Why can't we all just agree?
I learned this lesson many years ago when dealing with embedded systems....remember linking? :) Am I showing my age??
I think the comments from 0xA3, SLaks, and Darin Dimitrov answered the question, but to put it all together:
BinaryReader.ReadUInt128() does not exist in .NET's BinaryReader class, so the only solution I could find was to create my own function. As 0xA3 mentioned, there is a BigInteger type in .NET 4.0. I am in the process of creating my own function based on everyone's comments.
Thanks!
A Guid is exactly 16 bytes in size.
Guid guid = new Guid(byteArray);
But you cannot do maths with a Guid. If you need to, you can search for an implementation of a BigInteger for .NET on the internet and then convert your byte array into a BigInteger.
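A sketch combining both suggestions, assuming reader is your existing BinaryReader and the stream stores the 128-bit value little-endian (BigInteger lives in System.Numerics from .NET 4.0 on):

using System.Numerics;

byte[] raw = reader.ReadBytes(16);

Guid asGuid = new Guid(raw);      // fine for identity/equality, no arithmetic

// BigInteger reads little-endian two's complement; a trailing zero byte
// keeps the high bit from being read as a sign bit.
byte[] unsigned = new byte[17];
Buffer.BlockCopy(raw, 0, unsigned, 0, 16);
BigInteger asNumber = new BigInteger(unsigned);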
