Using the same function yields different results? - c#

Moving an old project from Framework 4.5 to .NET 6.0, there is a function that has broken badly. Calling the function the first time returns the value 1651. Calling the same function a second time returns 0. This causes our program to fail to open the PKG file properly (it fails reading the second value count). I am wondering if there's a quick fix to be done?
The File Check (The code using the broken function): Github ArkPackage.cs
private void readNewFileTable(Stream header, bool readHash = false)
{
    uint numFiles = header.ReadUInt32LE();
    var files = new OffsetFile[numFiles];
    for (var i = 0; i < numFiles; i++)
    {
        // Version 3 uses 32-bit file offsets
        long arkFileOffset = header.ReadInt64LE();
        string path = header.ReadLengthPrefixedString(System.Text.Encoding.UTF8);
        var flags = header.ReadInt32LE();
        uint size = header.ReadUInt32LE();
        if (readHash) header.Seek(4, SeekOrigin.Current); // Skips checksum
        var finalSlash = path.LastIndexOf('/');
        var fileDir = path.Substring(0, finalSlash < 0 ? 0 : finalSlash);
        var fileName = path.Substring(finalSlash < 0 ? 0 : (finalSlash + 1));
        var parent = makeOrGetDir(fileDir);
        var file = new OffsetFile(fileName, parent, contentFileMeta, arkFileOffset, size);
        file.ExtendedInfo["id"] = i;
        file.ExtendedInfo["flags"] = flags;
        files[i] = file;
        parent.AddFile(file);
    }
    var numFiles2 = header.ReadUInt32LE();
    if (numFiles != numFiles2)
        throw new Exception("Ark header appears invalid (file count mismatch)");
    for (var i = 0; i < numFiles2; i++)
    {
        files[i].ExtendedInfo["flags2"] = header.ReadInt32LE();
    }
}
The Broken Function (line 154 in the file is a variation that uses lines 161-171): Github StreamExtensions.cs
public static uint ReadUInt32LE(this Stream s) => unchecked((uint)s.ReadInt32LE());

/// <summary>
/// Read a signed 32-bit little-endian integer from the stream.
/// </summary>
/// <param name="s"></param>
/// <returns></returns>
public static int ReadInt32LE(this Stream s)
{
    int ret;
    byte[] tmp = new byte[4];
    s.Read(tmp, 0, 4); // note: the return value (bytes actually read) is ignored
    ret = tmp[0] & 0x000000FF;
    ret |= (tmp[1] << 8) & 0x0000FF00;
    ret |= (tmp[2] << 16) & 0x00FF0000;
    ret |= (tmp[3] << 24);
    return ret;
}
Before (Framework 4.5):
After (.NET 6.0):

Starting with the FileStream rewrite in .NET 6, FileStream.Read no longer blocks until the entire requested amount of data is available; it reads at most as many bytes as instructed and may return fewer. Check the return value to see how much it actually read.
If you want to read an exact number of bytes, either use a loop or use BinaryReader.ReadBytes, which implements the loop for you.
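A minimal sketch of that loop, applied to the extension method from the question (on .NET 7 and later you could instead call the built-in Stream.ReadExactly; the class name here is made up for illustration):

```csharp
using System;
using System.IO;

public static class StreamExtensionsFixed
{
    // Loop until exactly 'count' bytes have been read; Stream.Read may
    // return fewer bytes than requested, especially with the .NET 6 FileStream.
    static void FillBuffer(Stream s, byte[] buf, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = s.Read(buf, total, count - total);
            if (n == 0) throw new EndOfStreamException();
            total += n;
        }
    }

    public static int ReadInt32LE(this Stream s)
    {
        byte[] tmp = new byte[4];
        FillBuffer(s, tmp, 4);
        return tmp[0] | (tmp[1] << 8) | (tmp[2] << 16) | (tmp[3] << 24);
    }

    public static uint ReadUInt32LE(this Stream s) => unchecked((uint)s.ReadInt32LE());
}
```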
Also, that bit of code speaks volumes about the general quality of whatever library that is. You shouldn't be allocating arrays like that anymore; this isn't the 90s.

Related

C# equivalent to Perl `pack("V", value)` while packing some values into `byte[]`

I am trying to replicate the behavior of a Perl script in my C# code. When we convert any value into a Byte[], it should look the same irrespective of the language used. So I have this function call, which looks like this in Perl:
$diag_cmd = pack("V", length($s_part)) . $s_part;
where $s_part is produced by the following function, which reads the .pds file at the location C:\Users\c_desaik\Desktop\DIAG\PwrDB\offtarget\data\get_8084_gpio.pds:
sub read_pds
{
my $bin_s;
my $input_pds_file = $_[0];
open(my $fh, '<', $input_pds_file) or die "cannot open file $input_pds_file";
{
local $/;
$bin_s = <$fh>;
}
close($fh);
return $bin_s;
}
My best guess is that this function is reading the .pds file and turning it into a Byte array.
Now, I tried to replicate the behavior into c# code like following
static byte[] ConstructPacket()
{
List<byte> retval = new List<byte>();
retval.AddRange(System.IO.File.ReadAllBytes(@"C:\Users\c_desaik\Desktop\DIAG\PwrDB\offtarget\data\get_8084_gpio.pds"));
return retval.ToArray();
}
But the resulting byte array does not look the same. Is there any special mechanism I have to follow to replicate the behavior of pack("V", length($s_part)) . $s_part?
As Simon Whitehead mentioned, the template character V tells pack to pack your values into unsigned 32-bit integers, in little-endian order. So you need to convert your bytes to a list (or array) of unsigned integers.
For example:
static uint[] UnpackUint32(string filename)
{
var retval = new List<uint>();
using (var filestream = System.IO.File.Open(filename, System.IO.FileMode.Open))
{
using (var binaryStream = new System.IO.BinaryReader(filestream))
{
var pos = 0;
while (pos < binaryStream.BaseStream.Length)
{
retval.Add(binaryStream.ReadUInt32());
pos += 4;
}
}
}
return retval.ToArray();
}
And call this function:
var list = UnpackUint32(@"C:\Users\c_desaik\Desktop\DIAG\PwrDB\offtarget\data\get_8084_gpio.pds");
Update
If you want to read one length-prefixed string, or a list of them, you can use this function:
private string[] UnpackStrings(string filename)
{
var retval = new List<string>();
using (var filestream = System.IO.File.Open(filename, System.IO.FileMode.Open))
{
using (var binaryStream = new System.IO.BinaryReader(filestream))
{
var pos = 0;
while ((pos + 4) <= binaryStream.BaseStream.Length)
{
// read the length of the string
var len = binaryStream.ReadUInt32();
// read the bytes of the string
var byteArr = binaryStream.ReadBytes((int) len);
// cast these bytes to chars and append them to a StringBuilder
var sb = new StringBuilder();
foreach (var b in byteArr)
sb.Append((char)b);
// add the new string to our collection of strings
retval.Add(sb.ToString());
// calculate start position of next value
pos += 4 + (int) len;
}
}
}
return retval.ToArray();
}
pack("V", length($s_part)) . $s_part
which can also be written as
pack("V/a*", $s_part)
creates a length-prefixed string. The length is stored as a 32-bit unsigned little-endian number.
+----------+----------+----------+----------+-------- ...
| Length | Length | Length | Length | Bytes
| ( 7.. 0) | (15.. 8) | (23..16) | (31..24) |
+----------+----------+----------+----------+-------- ...
This is how you recreate the original string from the bytes:
Read 4 bytes.
If using a machine other than a little-endian machine, rearrange the bytes into the native order.
Cast those bytes into a 32-bit unsigned integer.
Read a number of bytes equal to that number.
Convert that sequence of bytes into a string.
Some languages provide tools that perform more than one of these steps.
I don't know C#, so I can't write the code for you, but I can give you an example in two other languages.
In Perl, this would be written as follows:
sub read_bytes {
my ($fh, $num_bytes_to_read) = @_;
my $buf = '';
while ($num_bytes_to_read) {
my $num_bytes_read = read($fh, $buf, $num_bytes_to_read, length($buf));
if (!$num_bytes_read) {
die "$!\n" if !defined($num_bytes_read);
die "Premature EOF\n";
}
$num_bytes_to_read -= $num_bytes_read;
}
return $buf;
}
sub read_uint32le { unpack('V', read_bytes($_[0], 4)) }
sub read_pstr { read_bytes($_[0], read_uint32le($_[0])) }
my $str = read_pstr($fh);
In C,
int read_bytes(FILE* fh, void* buf, size_t num_bytes_to_read) {
    char* p = buf;  /* char* for portable pointer arithmetic */
    while (num_bytes_to_read) {
        size_t num_bytes_read = fread(p, 1, num_bytes_to_read, fh);
        if (!num_bytes_read)
            return 0;
        num_bytes_to_read -= num_bytes_read;
        p += num_bytes_read;
    }
    return 1;
}
int read_uint32le(FILE* fh, uint32_t* p_i) {
int ok = read_bytes(fh, p_i, sizeof(*p_i));
if (!ok)
return 0;
{ /* Rearrange bytes on non-LE machines */
    const unsigned char* p = (const unsigned char*)p_i;
    *p_i = ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] << 8) | p[0];
}
return 1;
}
char* read_pstr(FILE* fh) {
uint32_t len;
char* buf = NULL;
int ok;
ok = read_uint32le(fh, &len);
if (!ok)
goto ERROR;
buf = malloc(len+1);
if (!buf)
goto ERROR;
ok = read_bytes(fh, buf, len);
if (!ok)
goto ERROR;
buf[len] = '\0';
return buf;
ERROR:
    free(buf);  /* free(NULL) is a no-op */
    return NULL;
}
char* str = read_pstr(fh);
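For completeness, since the question is about C#, the same steps can be sketched with BinaryReader (an editor's sketch, not from the original answer: BinaryReader.ReadUInt32 is always little-endian, ReadBytes implements the read loop internally, and Latin-1 decoding maps one byte to one char, matching the Perl treatment of bytes; substitute UTF-8 if the data is actually UTF-8 text):

```csharp
using System;
using System.IO;
using System.Text;

static class Pstr
{
    public static string ReadPstr(Stream s)
    {
        var br = new BinaryReader(s);
        // Steps 1-3: read 4 bytes as a little-endian 32-bit unsigned integer
        uint len = br.ReadUInt32();
        // Step 4: read that many bytes (ReadBytes loops until done or EOF)
        byte[] bytes = br.ReadBytes(checked((int)len));
        if (bytes.Length != len)
            throw new EndOfStreamException("Premature EOF");
        // Step 5: one char per byte
        return Encoding.Latin1.GetString(bytes);
    }
}
```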

Using SCardGetStatusChange to be notified of card insert/remove without polling

I'm trying to detect when a card has been inserted into a reader.
If I do a nasty polling loop like this:
public struct SCARD_READERSTATE
{
[MarshalAs(UnmanagedType.LPWStr)]
public string szReader;
public byte[] pvUserData;
public byte[] rgbAtr;
public uint dwCurrentState;
public uint dwEventState;
public uint cbAtr;
}
byte[] atr = null;
SCARD_READERSTATE[] rs = new SCARD_READERSTATE[1];
rs[0].szReader = readersList[0];
rs[0].dwCurrentState = SCARD_STATE_UNAWARE;
rs[0].dwEventState = SCARD_STATE_PRESENT;
int hctx = hContext.ToInt32();
var cardResult = SCardGetStatusChange(hctx, 100, rs, 1);
if (cardResult == 0 && rs[0].cbAtr > 0 && rs[0].rgbAtr != null)
{
atr = new byte[rs[0].cbAtr];
Array.Copy(rs[0].rgbAtr, atr, rs[0].cbAtr);
}
while ( (rs[0].dwCurrentState & SCARD_STATE_PRESENT) == 0)
{
rs = new SCARD_READERSTATE[1];
rs[0].szReader = readersList[0];
//rs[0].dwCurrentState = SCARD_STATE_PRESENT;
//rs[0].dwEventState = SCARD_STATE_PRESENT;
SCardGetStatusChange(hctx, 100000000, rs, 1);
System.Threading.Thread.Sleep(1000);
}
it works, but it has a nasty thread sleep in it. Ideally I'd like to make a blocking call to SCardGetStatusChange on a background thread and then surface up the events.
Apparently by setting the szReader to the value "\\?PnP?\Notification" it should block, as long as everything else in the struct is 0.
I've changed the code to
rs[0].szReader = "\\\\?PnP?\\Notification";
rs[0].cbAtr = 0;
rs[0].dwCurrentState = 0;
rs[0].dwEventState = 0;
rs[0].pvUserData = new byte[0];
rs[0].rgbAtr = new byte[0];
SCardGetStatusChange(hctx, 100000000, rs, 1);
but it just returns a success result immediately. Can any pInvoke masters out there see what's wrong?
In your sample, the second call to SCardGetStatusChange will block if you copy dwEventState into dwCurrentState and then reset dwEventState before calling again, so there's no need for the sleep.
The "\\?PnP?\Notification" reader name is to tell you when a new smart card reader has been attached, not when a card has been inserted. From the MSDN page on SCardGetStatusChange:
To be notified of the arrival of a new smart card reader, set the szReader member of a SCARD_READERSTATE structure to "\\?PnP?\Notification", and set all of the other members of that structure to zero.
When using the "\\?PnP?\Notification" reader name:
- the pvUserData and rgbAtr fields should be set to null (a new byte[0] is a valid pointer to a zero-length array, but what the API needs here is null pointers or zero values)
- the high 16 bits of dwCurrentState should contain the current reader count, i.e. rs[0].dwCurrentState = (readerCount << 16); the MSDN page is currently inaccurate on this point.
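The carry-forward logic for the card-insertion loop can be sketched as follows (the constant values are the standard winscard.h bits; the helper class is illustrative, not part of the question's code):

```csharp
using System;

static class CardStates
{
    // Standard winscard.h state bits:
    public const uint SCARD_STATE_CHANGED = 0x0002;
    public const uint SCARD_STATE_PRESENT = 0x0020;

    // Feed the state reported by the last SCardGetStatusChange call back in
    // as dwCurrentState (without the CHANGED bit), so the next call blocks
    // until the reader state actually changes again.
    public static uint NextCurrentState(uint dwEventState)
        => dwEventState & ~SCARD_STATE_CHANGED;
}
```

Inside the loop: after SCardGetStatusChange returns, set rs[0].dwCurrentState = CardStates.NextCurrentState(rs[0].dwEventState), clear rs[0].dwEventState, and call again with a long timeout; no Thread.Sleep is needed.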

Base-N encoding of a byte array

A couple of days ago I came across this Code Review question about Base-36 encoding a byte array. However, the answers that followed didn't touch on decoding back into a byte array, or on reusing the answer to perform encodings in other bases (radices).
The answer for the linked question uses BigInteger. So as far as implementation goes, the base and its digits could be parametrized.
The problem with BigInteger, though, is that it treats our input as an integer, while our input, a byte array, is just an opaque series of values.
If the byte array ends in a series of zero bytes, e.g. {0xFF,0x7F,0x00,0x00}, those bytes are lost when using the algorithm in the answer (it would only encode {0xFF,0x7F}).
If the last non-zero byte has its sign bit set, the following zero byte is consumed because it's treated as the BigInteger's sign byte. So {0xFF,0xFF,0x00,0x00} would encode only as {0xFF,0xFF,0x00}.
How could a .NET programmer use BigInteger to create a reasonably efficient and radix-agnostic encoder, with decoding support, plus the ability to handle endian-ness, and with the ability to 'work around' the ending zero bytes being lost?
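The precision loss described in the two bullets can be demonstrated directly (the helper below is illustrative, not part of the review code; it round-trips a byte array through BigInteger exactly as the linked answer's encoder effectively does):

```csharp
using System;
using System.Numerics;

static class BigIntLoss
{
    // BigInteger(byte[]) treats the array as a little-endian two's-complement
    // number, so trailing zero bytes are insignificant and get trimmed when
    // converting back with ToByteArray.
    public static byte[] RoundTrip(byte[] bytes)
        => new BigInteger(bytes).ToByteArray();
}
```

RoundTrip(new byte[] { 0xFF, 0x7F, 0x00, 0x00 }) returns { 0xFF, 0x7F }: both trailing zeros are dropped. RoundTrip(new byte[] { 0xFF, 0xFF, 0x00, 0x00 }) returns { 0xFF, 0xFF, 0x00 }: one zero survives, but only as the sign byte.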
edit [2020/01/26]: FWIW, the code below along with its unit test live along side my open source libraries on Github.
edit [2016/04/19]: If you're fond of exceptions, you may wish to change some of the Decode implementation code to throw InvalidDataException instead of just returning null.
edit [2014/09/14]: I've added a 'HACK' to Encode() to handle cases where the last byte in the input is signed (if you were to convert to sbyte). Only sane solution I could think of right now is to just Resize() the array by one. Additional unit tests for this case passed, but I didn't rerun perf code to account for such cases. If you can help it, always have your input to Encode() include a dummy 0 byte at the end to avoid additional allocations.
Usage
I've created a RadixEncoding class (found in the "Code" section) which initializes with three parameters:
The radix digits as a string (length determines the actual radix of course),
The assumed byte ordering (endian) of input byte arrays,
And whether or not the user wants the encode/decode logic to acknowledge ending zero bytes.
To create a Base-36 encoding, with little-endian input, and with respect given to ending zero bytes:
const string k_base36_digits = "0123456789abcdefghijklmnopqrstuvwxyz";
var base36_no_zeros = new RadixEncoding(k_base36_digits, EndianFormat.Little, false);
And then to actually perform encoding/decoding:
const string k_input = "A test 1234";
byte[] input_bytes = System.Text.Encoding.UTF8.GetBytes(k_input);
string encoded_string = base36_no_zeros.Encode(input_bytes);
byte[] decoded_bytes = base36_no_zeros.Decode(encoded_string);
Performance
Timed with Diagnostics.Stopwatch, run on an i7 860 @2.80GHz. Timing EXE run by itself, not under a debugger.
Encoding was initialized with the same k_base36_digits string from above, EndianFormat.Little, and with ending zero bytes acknowledged (even though the UTF8 bytes don't have any extra ending zero bytes)
To encode the UTF8 bytes of "A test 1234" 1,000,000 times takes 2.6567905secs
To decode the same string the same amount of times takes 3.3916248secs
To encode the UTF8 bytes of "A test 1234. Made slightly larger!" 100,000 times takes 1.1577325secs
To decode the same string the same amount of times takes 1.244326secs
Code
If you don't have a CodeContracts generator, you will have to reimplement the contracts with if/throw code.
using System;
using System.Collections.Generic;
using System.Numerics;
using Contract = System.Diagnostics.Contracts.Contract;
public enum EndianFormat
{
/// <summary>Least Significant Bit order (lsb)</summary>
/// <remarks>Right-to-Left</remarks>
/// <see cref="BitConverter.IsLittleEndian"/>
Little,
/// <summary>Most Significant Bit order (msb)</summary>
/// <remarks>Left-to-Right</remarks>
Big,
};
/// <summary>Encodes/decodes bytes to/from a string</summary>
/// <remarks>
/// Encoded string is always in big-endian ordering
///
/// <p>Encode and Decode take a <b>includeProceedingZeros</b> parameter which acts as a work-around
/// for an edge case with our BigInteger implementation.
/// MSDN says BigInteger byte arrays are in LSB->MSB ordering. So a byte buffer with zeros at the
/// end will have those zeros ignored in the resulting encoded radix string.
/// If such a loss in precision absolutely cannot occur pass true to <b>includeProceedingZeros</b>
/// and for a tiny bit of extra processing it will handle the padding of zero digits (encoding)
/// or bytes (decoding).</p>
/// <p>Note: doing this for decoding <b>may</b> add an extra byte more than what was originally
/// given to Encode.</p>
/// </remarks>
// Based on the answers from http://codereview.stackexchange.com/questions/14084/base-36-encoding-of-a-byte-array/
public class RadixEncoding
{
const int kByteBitCount = 8;
readonly string kDigits;
readonly double kBitsPerDigit;
readonly BigInteger kRadixBig;
readonly EndianFormat kEndian;
readonly bool kIncludeProceedingZeros;
/// <summary>Numerical base of this encoding</summary>
public int Radix { get { return kDigits.Length; } }
/// <summary>Endian ordering of bytes input to Encode and output by Decode</summary>
public EndianFormat Endian { get { return kEndian; } }
/// <summary>True if we want ending zero bytes to be encoded</summary>
public bool IncludeProceedingZeros { get { return kIncludeProceedingZeros; } }
public override string ToString()
{
return string.Format("Base-{0} {1}", Radix.ToString(), kDigits);
}
/// <summary>Create a radix encoder using the given characters as the digits in the radix</summary>
/// <param name="digits">Digits to use for the radix-encoded string</param>
/// <param name="bytesEndian">Endian ordering of bytes input to Encode and output by Decode</param>
/// <param name="includeProceedingZeros">True if we want ending zero bytes to be encoded</param>
public RadixEncoding(string digits,
EndianFormat bytesEndian = EndianFormat.Little, bool includeProceedingZeros = false)
{
Contract.Requires<ArgumentNullException>(digits != null);
int radix = digits.Length;
kDigits = digits;
kBitsPerDigit = System.Math.Log(radix, 2);
kRadixBig = new BigInteger(radix);
kEndian = bytesEndian;
kIncludeProceedingZeros = includeProceedingZeros;
}
// Number of characters needed for encoding the specified number of bytes
int EncodingCharsCount(int bytesLength)
{
return (int)Math.Ceiling((bytesLength * kByteBitCount) / kBitsPerDigit);
}
// Number of bytes needed to decoding the specified number of characters
int DecodingBytesCount(int charsCount)
{
return (int)Math.Ceiling((charsCount * kBitsPerDigit) / kByteBitCount);
}
/// <summary>Encode a byte array into a radix-encoded string</summary>
/// <param name="bytes">byte array to encode</param>
/// <returns>The bytes in encoded into a radix-encoded string</returns>
/// <remarks>If <paramref name="bytes"/> is zero length, returns an empty string</remarks>
public string Encode(byte[] bytes)
{
Contract.Requires<ArgumentNullException>(bytes != null);
Contract.Ensures(Contract.Result<string>() != null);
// Don't really have to do this, our code will build this result (empty string),
// but why not catch the condition before doing work?
if (bytes.Length == 0) return string.Empty;
// if the array ends with zeros, having the capacity set to this will help us know how much
// 'padding' we will need to add
int result_length = EncodingCharsCount(bytes.Length);
// List<> has a(n in-place) Reverse method. StringBuilder doesn't. That's why.
var result = new List<char>(result_length);
// HACK: BigInteger uses the last byte as the 'sign' byte. If that byte's MSB is set,
// we need to pad the input with an extra 0 (ie, make it positive)
if ( (bytes[bytes.Length-1] & 0x80) == 0x80 )
Array.Resize(ref bytes, bytes.Length+1);
var dividend = new BigInteger(bytes);
// IsZero's computation is less complex than evaluating "dividend > 0"
// which invokes BigInteger.CompareTo(BigInteger)
while (!dividend.IsZero)
{
BigInteger remainder;
dividend = BigInteger.DivRem(dividend, kRadixBig, out remainder);
int digit_index = System.Math.Abs((int)remainder);
result.Add(kDigits[digit_index]);
}
if (kIncludeProceedingZeros)
for (int x = result.Count; x < result.Capacity; x++)
result.Add(kDigits[0]); // pad with the character that represents 'zero'
// orientate the characters in big-endian ordering
if (kEndian == EndianFormat.Little)
result.Reverse();
// If we didn't end up adding padding, ToArray will end up returning a TrimExcess'd array,
// so nothing wasted
return new string(result.ToArray());
}
void DecodeImplPadResult(ref byte[] result, int padCount)
{
if (padCount > 0)
{
int new_length = result.Length + DecodingBytesCount(padCount);
Array.Resize(ref result, new_length); // new bytes will be zero, just the way we want it
}
}
#region Decode (Little Endian)
byte[] DecodeImpl(string chars, int startIndex = 0)
{
var bi = new BigInteger();
for (int x = startIndex; x < chars.Length; x++)
{
int i = kDigits.IndexOf(chars[x]);
if (i < 0) return null; // invalid character
bi *= kRadixBig;
bi += i;
}
return bi.ToByteArray();
}
byte[] DecodeImplWithPadding(string chars)
{
int pad_count = 0;
for (int x = 0; x < chars.Length; x++, pad_count++)
if (chars[x] != kDigits[0]) break;
var result = DecodeImpl(chars, pad_count);
DecodeImplPadResult(ref result, pad_count);
return result;
}
#endregion
#region Decode (Big Endian)
byte[] DecodeImplReversed(string chars, int startIndex = 0)
{
var bi = new BigInteger();
for (int x = (chars.Length-1)-startIndex; x >= 0; x--)
{
int i = kDigits.IndexOf(chars[x]);
if (i < 0) return null; // invalid character
bi *= kRadixBig;
bi += i;
}
return bi.ToByteArray();
}
byte[] DecodeImplReversedWithPadding(string chars)
{
int pad_count = 0;
for (int x = chars.Length - 1; x >= 0; x--, pad_count++)
if (chars[x] != kDigits[0]) break;
var result = DecodeImplReversed(chars, pad_count);
DecodeImplPadResult(ref result, pad_count);
return result;
}
#endregion
/// <summary>Decode a radix-encoded string into a byte array</summary>
/// <param name="radixChars">radix string</param>
/// <returns>The decoded bytes, or null if an invalid character is encountered</returns>
/// <remarks>
/// If <paramref name="radixChars"/> is an empty string, returns a zero length array
///
/// Using <paramref name="IncludeProceedingZeros"/> has the potential to return a buffer with an
/// additional zero byte that wasn't in the input. So if a 4-byte buffer was encoded, this could
/// end up returning a 5-byte buffer, with the extra byte being zero.
/// </remarks>
public byte[] Decode(string radixChars)
{
Contract.Requires<ArgumentNullException>(radixChars != null);
if (kEndian == EndianFormat.Big)
return kIncludeProceedingZeros ? DecodeImplReversedWithPadding(radixChars) : DecodeImplReversed(radixChars);
else
return kIncludeProceedingZeros ? DecodeImplWithPadding(radixChars) : DecodeImpl(radixChars);
}
};
Basic Unit Tests
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
static bool ArraysCompareN<T>(T[] input, T[] output)
where T : IEquatable<T>
{
if (output.Length < input.Length) return false;
for (int x = 0; x < input.Length; x++)
if(!output[x].Equals(input[x])) return false;
return true;
}
static bool RadixEncodingTest(RadixEncoding encoding, byte[] bytes)
{
string encoded = encoding.Encode(bytes);
byte[] decoded = encoding.Decode(encoded);
return ArraysCompareN(bytes, decoded);
}
[TestMethod]
public void TestRadixEncoding()
{
const string k_base36_digits = "0123456789abcdefghijklmnopqrstuvwxyz";
var base36 = new RadixEncoding(k_base36_digits, EndianFormat.Little, true);
var base36_no_zeros = new RadixEncoding(k_base36_digits, EndianFormat.Little, false);
byte[] ends_with_zero_neg = { 0xFF, 0xFF, 0x00, 0x00 };
byte[] ends_with_zero_pos = { 0xFF, 0x7F, 0x00, 0x00 };
byte[] text = System.Text.Encoding.ASCII.GetBytes("A test 1234");
Assert.IsTrue(RadixEncodingTest(base36, ends_with_zero_neg));
Assert.IsTrue(RadixEncodingTest(base36, ends_with_zero_pos));
Assert.IsTrue(RadixEncodingTest(base36_no_zeros, text));
}
Interestingly, I was able to port Kornman's techniques across to Java and got the expected output up to and including base 36. But when running his code from C# (compiled with C:\Windows\Microsoft.NET\Framework\v4.0.30319 csc), the output was not as expected.
For example, when base16-encoding the MD5 hashBytes for the string "hello world" below using Kornman's RadixEncoding Encode, I could see that the bytes within each group of two characters came out in the wrong order.
Rather than 5eb63bbbe01eeed093cb22bb8f5acdc3
I saw something like e56bb3bb0ee1....
This was on Windows 7.
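The byte-order mismatch comes down to the BigInteger constructors: .NET's BigInteger(byte[]) interprets the array as little-endian, while Java's new BigInteger(byte[]) interprets it as big-endian, so a straight port reads the same hash bytes as a different number. A minimal illustration (the wrapper is hypothetical, for demonstration only):

```csharp
using System;
using System.Numerics;

static class EndianDemo
{
    // .NET: first byte is least significant.
    // Java's new BigInteger(byte[]) would treat the first byte as MOST
    // significant, reading { 0x5E, 0x0B } as 0x5E0B instead of 0x0B5E.
    public static BigInteger FromBytesDotNet(byte[] bytes)
        => new BigInteger(bytes);
}
```

So for the two encoders to agree, one side has to reverse the byte array before constructing the BigInteger.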
const string input = "hello world";
public static void Main(string[] args)
{
using (System.Security.Cryptography.MD5 md5 = System.Security.Cryptography.MD5.Create())
{
byte[] inputBytes = System.Text.Encoding.ASCII.GetBytes(input);
byte[] hashBytes = md5.ComputeHash(inputBytes);
// Convert the byte array to hexadecimal string
StringBuilder sb = new StringBuilder();
for (int i = 0; i < hashBytes.Length; i++)
{
sb.Append(hashBytes[i].ToString("X2"));
}
Console.WriteLine(sb.ToString());
}
}
Java code is below for anyone interested. As mentioned above, it only works to base 36.
private static final char[] BASE16_CHARS = "0123456789abcdef".toCharArray();
private static final BigInteger BIGINT_16 = BigInteger.valueOf(16);
private static final char[] BASE36_CHARS = "0123456789abcdefghijklmnopqrstuvwxyz".toCharArray();
private static final BigInteger BIGINT_36 = BigInteger.valueOf(36);
public static String toBaseX(byte[] bytes, BigInteger base, char[] chars)
{
if (bytes == null) {
return null;
}
final int bitsPerByte = 8;
double bitsPerDigit = Math.log(chars.length) / Math.log(2);
// Number of chars to encode specified bytes
int size = (int) Math.ceil((bytes.length * bitsPerByte) / bitsPerDigit);
StringBuilder sb = new StringBuilder(size);
for (BigInteger value = new BigInteger(bytes); !value.equals(BigInteger.ZERO);) {
BigInteger[] quotientAndRemainder = value.divideAndRemainder(base);
sb.insert(0, chars[Math.abs(quotientAndRemainder[1].intValue())]);
value = quotientAndRemainder[0];
}
return sb.toString();
}

Printing PNG images to a zebra network printer

I am trying to find a way of printing images to a Zebra printer and having a lot of trouble.
According to the docs:
The first encoding, known as B64, encodes the data using the MIME
Base64 scheme. Base64 is used to encode e-mail attachments ...
Base64 encodes six bits to the byte, for an expansion of 33 percent
over the un-encoded data. The second encoding, known as Z64,
first compresses the data using the LZ77 algorithm to reduce its size.
(This algorithm is used by PKZIP and is integral to the PNG
graphics format.) The compressed data is then encoded using the
MIME Base64 scheme as described above. A CRC will be calculated
across the Base64-encoded data.
But it doesn't have a great deal more info.
Basically I was trying encoding with
private byte[] GetItemFromPath(string filepath)
{
using (MemoryStream ms = new MemoryStream())
{
using (Image img = Image.FromFile(filepath))
{
img.Save(ms, ImageFormat.Png);
return ms.ToArray();
}
}
}
Then trying to print with something like:
var initialArray = GetItemFromPath("C:\\RED.png");
string converted = Convert.ToBase64String(initialArray);
PrintThis(string.Format(@"~DYRED.PNG,P,P,{1},0,:B64:
{0}
^XA
^FO200,200^XGRED.PNG,1,1^FS
^XZ", converted, initialArray.Length));
From the sounds of it, either B64 or Z64 are both accepted.
I've tried a few variations, and a couple of methods for generating the CRC and calculating the 'size'.
But none seem to work and the download of the graphics to the printer is always getting aborted.
Has anyone managed to accomplish something like this? Or knows where I am going wrong?
All credit for this answer goes to LabVIEW Forum user Raydur. He posts a LabVIEW solution that can be opened up in LabVIEW to send images down. I personally didn't run it with my printer; I just used it to figure out the correct image code so I could replicate it in my own code. The big thing I was missing was padding my hexadecimal codes. For example, 1A is fine, but if you have just A, you need to pad a 0 in front of it and send 0A. The file size in the ZPL you send is also the original size of the byte array, not the final string representation of the data.
I've scoured many, many forums and Stack Overflow posts trying to figure this out, because it seems like such a simple thing to do. I've tried every solution posted elsewhere, but I really wanted to just print a .PNG, because the manual for my printer (Mobile QLN320) has built-in support for it. It says to send the image in either Base64 or hexadecimal, and I tried both to no avail. For anyone wanting to do Base64: I found in an older manual that you need to manually calculate CRC codes for each packet you send, so I chose the easier hexadecimal route. So here is the code I got to work!
string ipAddress = "192.168.1.30";
int port = 6101;
string zplImageData = string.Empty;
//Make sure no transparency exists. I had some trouble with this. This PNG has a white background
string filePath = @"C:\Users\Path\To\Logo.png";
byte[] binaryData = System.IO.File.ReadAllBytes(filePath);
foreach (Byte b in binaryData)
{
string hexRep = String.Format("{0:X}", b);
if (hexRep.Length == 1)
hexRep = "0" + hexRep;
zplImageData += hexRep;
}
string zplToSend = "^XA" + "^MNN" + "^LL500" + "~DYE:LOGO,P,P," + binaryData.Length + ",," + zplImageData+"^XZ";
string printImage = "^XA^FO115,50^IME:LOGO.PNG^FS^XZ";
try
{
// Open connection
System.Net.Sockets.TcpClient client = new System.Net.Sockets.TcpClient();
client.Connect(ipAddress, port);
// Write ZPL String to connection
System.IO.StreamWriter writer = new System.IO.StreamWriter(client.GetStream(),Encoding.UTF8);
writer.Write(zplToSend);
writer.Flush();
writer.Write(printImage);
writer.Flush();
// Close Connection
writer.Close();
client.Close();
}
catch (Exception ex)
{
// Catch Exception
}
The ZPL II Programming Guide documents the ~DG command and GRF format (page 124) to download images. Volume Two adds details on an optional compression format (page 52).
First, you have to convert the image to a 1bpp bi-level image, then convert it to a hex-encoded string. You can further compress the image to reduce transmission time. You can then print the image with the ^ID command.
While there is inherent support for PNG images in the ~DY command, it is poorly documented and does not seem to work on certain models of printers. The ZB64 format is basically not documented, and attempts to get more information from Zebra support have been fruitless. If you have your heart set on ZB64, you can use the Java based Zebralink SDK (look to ImagePrintDemo.java and com.zebra.sdk.printer.internal.GraphicsConversionUtilZpl.sendImageToStream).
Once you have the command data, it can be sent via TCP/IP if the printer has a print-server, or it can be sent by writing in RAW format to the printer.
The code below prints a 5 kB PNG as a 13 kB compressed GRF (60 kB uncompressed):
class Program
{
static unsafe void Main(string[] args)
{
var baseStream = new MemoryStream();
var tw = new StreamWriter(baseStream, Encoding.UTF8);
using (var bmpSrc = new Bitmap(Image.FromFile(@"label.png")))
{
tw.WriteLine(ZplImage.GetGrfStoreCommand("R:LBLRA2.GRF", bmpSrc));
}
tw.WriteLine(ZplImage.GetGrfPrintCommand("R:LBLRA2.GRF"));
tw.WriteLine(ZplImage.GetGrfDeleteCommand("R:LBLRA2.GRF"));
tw.Flush();
baseStream.Position = 0;
var gdipj = new GdiPrintJob("ZEBRA S4M-200dpi ZPL", GdiPrintJobDataType.Raw, "Raw print", null);
gdipj.WritePage(baseStream);
gdipj.CompleteJob();
}
}
class ZplImage
{
public static string GetGrfStoreCommand(string filename, Bitmap bmpSource)
{
if (bmpSource == null)
{
throw new ArgumentNullException("bmpSource");
}
validateFilename(filename);
var dim = new Rectangle(Point.Empty, bmpSource.Size);
var stride = ((dim.Width + 7) / 8);
var bytes = stride * dim.Height;
using (var bmpCompressed = bmpSource.Clone(dim, PixelFormat.Format1bppIndexed))
{
var result = new StringBuilder();
result.AppendFormat("^XA~DG{2},{0},{1},", stride * dim.Height, stride, filename);
byte[][] imageData = GetImageData(dim, stride, bmpCompressed);
byte[] previousRow = null;
foreach (var row in imageData)
{
appendLine(row, previousRow, result);
previousRow = row;
}
result.Append(#"^FS^XZ");
return result.ToString();
}
}
public static string GetGrfDeleteCommand(string filename)
{
validateFilename(filename);
return string.Format("^XA^ID{0}^FS^XZ", filename);
}
public static string GetGrfPrintCommand(string filename)
{
validateFilename(filename);
return string.Format("^XA^FO0,0^XG{0},1,1^FS^XZ", filename);
}
static Regex regexFilename = new Regex("^[REBA]:[A-Z0-9]{1,8}\\.GRF$");
private static void validateFilename(string filename)
{
if (!regexFilename.IsMatch(filename))
{
throw new ArgumentException("Filename must be in the format "
+ "R:XXXXXXXX.GRF. Drives are R, E, B, A. Filename can "
+ "be alphanumeric between 1 and 8 characters.", "filename");
}
}
unsafe private static byte[][] GetImageData(Rectangle dim, int stride, Bitmap bmpCompressed)
{
byte[][] imageData;
var data = bmpCompressed.LockBits(dim, ImageLockMode.ReadOnly, PixelFormat.Format1bppIndexed);
try
{
byte* pixelData = (byte*)data.Scan0.ToPointer();
byte rightMask = (byte)(0xff << (data.Stride * 8 - dim.Width));
imageData = new byte[dim.Height][];
for (int row = 0; row < dim.Height; row++)
{
byte* rowStart = pixelData + row * data.Stride;
imageData[row] = new byte[stride];
for (int col = 0; col < stride; col++)
{
byte f = (byte)(0xff ^ rowStart[col]);
f = (col == stride - 1) ? (byte)(f & rightMask) : f;
imageData[row][col] = f;
}
}
}
finally
{
bmpCompressed.UnlockBits(data);
}
return imageData;
}
private static void appendLine(byte[] row, byte[] previousRow, StringBuilder baseStream)
{
if (row.All(r => r == 0))
{
baseStream.Append(",");
return;
}
if (row.All(r => r == 0xff))
{
baseStream.Append("!");
return;
}
if (previousRow != null && MatchByteArray(row, previousRow))
{
baseStream.Append(":");
return;
}
byte[] nibbles = new byte[row.Length * 2];
for (int i = 0; i < row.Length; i++)
{
nibbles[i * 2] = (byte)(row[i] >> 4);
nibbles[i * 2 + 1] = (byte)(row[i] & 0x0f);
}
for (int i = 0; i < nibbles.Length; i++)
{
byte cPixel = nibbles[i];
int repeatCount = 0;
for (int j = i; j < nibbles.Length && repeatCount <= 400; j++)
{
if (cPixel == nibbles[j])
{
repeatCount++;
}
else
{
break;
}
}
if (repeatCount > 2)
{
if (repeatCount == nibbles.Length - i
&& (cPixel == 0 || cPixel == 0xf))
{
if (cPixel == 0)
{
if (i % 2 == 1)
{
baseStream.Append("0");
}
baseStream.Append(",");
return;
}
else if (cPixel == 0xf)
{
if (i % 2 == 1)
{
baseStream.Append("F");
}
baseStream.Append("!");
return;
}
}
else
{
baseStream.Append(getRepeatCode(repeatCount));
i += repeatCount - 1;
}
}
baseStream.Append(cPixel.ToString("X"));
}
}
private static string getRepeatCode(int repeatCount)
{
if (repeatCount > 419)
throw new ArgumentOutOfRangeException();
int high = repeatCount / 20;
int low = repeatCount % 20;
const string lowString = " GHIJKLMNOPQRSTUVWXY";
const string highString = " ghijklmnopqrstuvwxyz";
string repeatStr = "";
if (high > 0)
{
repeatStr += highString[high];
}
if (low > 0)
{
repeatStr += lowString[low];
}
return repeatStr;
}
private static bool MatchByteArray(byte[] row, byte[] previousRow)
{
for (int i = 0; i < row.Length; i++)
{
if (row[i] != previousRow[i])
{
return false;
}
}
return true;
}
}
internal static class NativeMethods
{
#region winspool.drv
#region P/Invokes
[DllImport("winspool.Drv", SetLastError = true, CharSet = CharSet.Unicode)]
internal static extern bool OpenPrinter(string szPrinter, out IntPtr hPrinter, IntPtr pd);
[DllImport("winspool.Drv", SetLastError = true, CharSet = CharSet.Unicode)]
internal static extern bool ClosePrinter(IntPtr hPrinter);
[DllImport("winspool.Drv", SetLastError = true, CharSet = CharSet.Unicode)]
internal static extern UInt32 StartDocPrinter(IntPtr hPrinter, Int32 level, IntPtr di);
[DllImport("winspool.Drv", SetLastError = true, CharSet = CharSet.Unicode)]
internal static extern bool EndDocPrinter(IntPtr hPrinter);
[DllImport("winspool.Drv", SetLastError = true, CharSet = CharSet.Unicode)]
internal static extern bool StartPagePrinter(IntPtr hPrinter);
[DllImport("winspool.Drv", SetLastError = true, CharSet = CharSet.Unicode)]
internal static extern bool EndPagePrinter(IntPtr hPrinter);
[DllImport("winspool.Drv", SetLastError = true, CharSet = CharSet.Unicode)]
internal static extern bool WritePrinter(
// 0
IntPtr hPrinter,
[MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] byte[] pBytes,
// 2
UInt32 dwCount,
out UInt32 dwWritten);
#endregion
#region Structs
[StructLayout(LayoutKind.Sequential)]
internal struct DOC_INFO_1
{
[MarshalAs(UnmanagedType.LPWStr)]
public string DocName;
[MarshalAs(UnmanagedType.LPWStr)]
public string OutputFile;
[MarshalAs(UnmanagedType.LPWStr)]
public string Datatype;
}
#endregion
#endregion
}
/// <summary>
/// Represents a print job in a spooler queue
/// </summary>
public class GdiPrintJob
{
IntPtr PrinterHandle;
IntPtr DocHandle;
/// <summary>
/// The ID assigned by the print spooler to identify the job
/// </summary>
public UInt32 PrintJobID { get; private set; }
/// <summary>
/// Create a print job with a enumerated datatype
/// </summary>
/// <param name="PrinterName"></param>
/// <param name="dataType"></param>
/// <param name="jobName"></param>
/// <param name="outputFileName"></param>
public GdiPrintJob(string PrinterName, GdiPrintJobDataType dataType, string jobName, string outputFileName)
: this(PrinterName, translateType(dataType), jobName, outputFileName)
{
}
/// <summary>
/// Create a print job with a string datatype
/// </summary>
/// <param name="PrinterName"></param>
/// <param name="dataType"></param>
/// <param name="jobName"></param>
/// <param name="outputFileName"></param>
public GdiPrintJob(string PrinterName, string dataType, string jobName, string outputFileName)
{
if (string.IsNullOrWhiteSpace(PrinterName))
throw new ArgumentNullException("PrinterName");
if (string.IsNullOrWhiteSpace(dataType))
throw new ArgumentNullException("PrinterName");
IntPtr hPrinter;
if (!NativeMethods.OpenPrinter(PrinterName, out hPrinter, IntPtr.Zero))
throw new Win32Exception();
this.PrinterHandle = hPrinter;
NativeMethods.DOC_INFO_1 docInfo = new NativeMethods.DOC_INFO_1()
{
DocName = jobName,
Datatype = dataType,
OutputFile = outputFileName
};
IntPtr pDocInfo = Marshal.AllocHGlobal(Marshal.SizeOf(docInfo));
RuntimeHelpers.PrepareConstrainedRegions();
try
{
Marshal.StructureToPtr(docInfo, pDocInfo, false);
UInt32 docid = NativeMethods.StartDocPrinter(hPrinter, 1, pDocInfo);
if (docid == 0)
throw new Win32Exception();
this.PrintJobID = docid;
}
finally
{
Marshal.FreeHGlobal(pDocInfo);
}
}
/// <summary>
/// Write the data of a single page or a precomposed PCL document
/// </summary>
/// <param name="data"></param>
public void WritePage(Stream data)
{
if (data == null)
throw new ArgumentNullException("data");
if (!data.CanRead && !data.CanWrite)
throw new ObjectDisposedException("data");
if (!data.CanRead)
throw new NotSupportedException("stream is not readable");
if (!NativeMethods.StartPagePrinter(this.PrinterHandle))
throw new Win32Exception();
byte[] buffer = new byte[0x14000]; /* 80k is Stream.CopyTo default */
uint read = 1;
while ((read = (uint)data.Read(buffer, 0, buffer.Length)) != 0)
{
UInt32 written;
if (!NativeMethods.WritePrinter(this.PrinterHandle, buffer, read, out written))
throw new Win32Exception();
if (written != read)
throw new InvalidOperationException("Error while writing to stream");
}
if (!NativeMethods.EndPagePrinter(this.PrinterHandle))
throw new Win32Exception();
}
/// <summary>
/// Complete the current job
/// </summary>
public void CompleteJob()
{
if (!NativeMethods.EndDocPrinter(this.PrinterHandle))
throw new Win32Exception();
}
#region datatypes
private readonly static string[] dataTypes = new string[]
{
// 0
null,
"RAW",
// 2
"RAW [FF appended]",
"RAW [FF auto]",
// 4
"NT EMF 1.003",
"NT EMF 1.006",
// 6
"NT EMF 1.007",
"NT EMF 1.008",
// 8
"TEXT",
"XPS_PASS",
// 10
"XPS2GDI"
};
private static string translateType(GdiPrintJobDataType type)
{
return dataTypes[(int)type];
}
#endregion
}
public enum GdiPrintJobDataType
{
Unknown = 0,
Raw = 1,
RawAppendFF = 2,
RawAuto = 3,
NtEmf1003 = 4,
NtEmf1006 = 5,
NtEmf1007 = 6,
NtEmf1008 = 7,
Text = 8,
XpsPass = 9,
Xps2Gdi = 10
}
For some reason I cannot get B64 to work, but luckily I was able to Google my way into making Z64 work (in 3 soul-searching days or so) using plain old JavaScript.
Somewhere else in the ZPL Programming Guide you stumble upon the CISDFCRC16 command--let's be cryptic, why not--section, which states:
"The value of the field is calculated the CRC-16 for the
contents of a specified file using the CRC16-CCITT polynomial which is
x^16 + x^12 + x^5 + 1. It is calculated using an initial CRC of
0x0000."
Japanglish aside, you can now check out the Catalogue of parametrised CRC algorithms with 16 bits (http://reveng.sourceforge.net/crc-catalogue/16.htm) and look for the XMODEM algorithm, which happens to be
width=16 poly=0x1021 init=0x0000 refin=false refout=false
xorout=0x0000 check=0x31c3 name="XMODEM"
Aha. I then started looking for the rest of the code I needed and stumbled upon the following:
LZ77-Algorithm-Based JavaScript Compressor (http://lab.polygonpla.net/js/tinylz77.html)
base-64.js
(https://github.com/beatgammit/base64-js/blob/master/lib/b64.js)
Lammert Bies' 2008 CRC Library
(http://www.lammertbies.nl/comm/info/crc-calculation.html) ported
from ANSI C--with the precaution to bitwise-AND
with 0xffff the update function return value since JavaScript
treats every number as a 32-bit signed integer.
So I read the file as a byte array (Uint8Array), parse it as a string, compress it with LZ77, turn that back into a byte array and encode it using base64, at which point I calculate the CRC and paste it all into my ZPL ~DT command for savings of about 40%. Beautiful.
Unfortunately I'm developing a proprietary solution so I cannot post any code.
Good luck!
-What one man did another can do.
After looking at the ZPL manual, you need to calculate the Cyclic Redundancy Check (CRC) for the image. Here is some C code that calculates the CRC (source):
// Update the CRC for transmitted and received data using
// the CCITT 16bit algorithm (X^16 + X^12 + X^5 + 1).
unsigned char ser_data;
static unsigned int crc;
crc = (unsigned char)(crc >> 8) | (crc << 8);
crc ^= ser_data;
crc ^= (unsigned char)(crc & 0xff) >> 4;
crc ^= (crc << 8) << 4;
crc ^= ((crc & 0xff) << 4) << 1;
You can also refer to Wikipedia's page on CRC, as it contains other code examples as well.
https://en.wikipedia.org/wiki/Cyclic_redundancy_check
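Since the question is tagged C#, here is a minimal sketch of the same XMODEM variant (poly 0x1021, init 0x0000, no reflection, no final XOR) in C#; the class name is mine, not from any library. Handily, the catalogue entry quoted earlier gives check=0x31c3 for the ASCII string "123456789", which makes a built-in sanity test:

```csharp
using System;
using System.Text;

static class Crc16Xmodem
{
    // Bit-by-bit CRC-16/XMODEM: poly 0x1021, init 0x0000,
    // no input/output reflection, no final XOR.
    public static ushort Compute(byte[] data)
    {
        ushort crc = 0x0000;
        foreach (byte b in data)
        {
            crc ^= (ushort)(b << 8);           // fold the next byte into the top bits
            for (int bit = 0; bit < 8; bit++)  // shift out 8 bits
                crc = (crc & 0x8000) != 0
                    ? (ushort)((crc << 1) ^ 0x1021)
                    : (ushort)(crc << 1);
        }
        return crc;
    }
}
```

Computing it over `Encoding.ASCII.GetBytes("123456789")` should yield the catalogue's check value 0x31C3.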
Everything else you are sending down looks good. I would look into using one of the Zebra SDKs. I know the Android one will send an image to the printer and save it for you.
Although this question has the C# tag, several other answers are not strictly C#, so here is an answer using Node 8.5+ (JavaScript) with the java bridge module and the Zebra SDK. The steps are very similar for any .NET language that can use the SDK and perform a POST request.
const { promisify } = require('util');
const java = require('java');
java.asyncOptions = {
asyncSuffix: "",
syncSuffix: "Sync",
promiseSuffix: "Promise", // Generate methods returning promises, using the suffix Promise.
promisify
};
// Include all .jar's under C:\Program Files\Zebra Technologies\link_os_sdk\PC\v2.14.5198\lib
// in your lib folder
java.classpath.push(__dirname + "/lib/ZSDK_API.jar");
var ByteArrayOutputStream = java.import('java.io.ByteArrayOutputStream');
var ZebraImageFactory = java.import('com.zebra.sdk.graphics.ZebraImageFactory');
var PrinterUtil = java.import('com.zebra.sdk.printer.PrinterUtil');
const main = async function () {
let path = `C:\\images\\yourimage.png`;
let os = new ByteArrayOutputStream();
let image = await ZebraImageFactory.getImagePromise(path);
PrinterUtil.convertGraphicPromise("E:IMAGE.PNG", image, os);
console.log(os.toStringSync()); // junk:Z64:~:CRC
console.log('done');
};
main();
Then you can print the image via ZPL like:
^XA
~DYE:IMAGE,P,P,1,,:B64:<YOURB64>:<YOURCRC>
^FO0,0^IME:IMAGE.PNG
^XZ
Using something like
await axios.post(`${printer.ip}/pstprnt`, zpl);
In this GitHub project you will find everything you need. https://github.com/BinaryKits/BinaryKits.Zpl
There is also a PNG to GRF image converter with additional data compression available.
var elements = new List<ZplElementBase>();
elements.Add(new ZplDownloadGraphics('R', "SAMPLE", System.IO.File.ReadAllBytes("sample.png")));
elements.Add(new ZplRecallGraphic(100, 100, 'R', "SAMPLE"));
var renderEngine = new ZplEngine(elements);
var zpl = renderEngine.ToZplString(new ZplRenderOptions());

Best way to find position in the Stream where given byte sequence starts

How do you think what is the best way to find position in the System.Stream where given byte sequence starts (first occurence):
public static long FindPosition(Stream stream, byte[] byteSequence)
{
long position = -1;
// ???
return position;
}
P.S. The simplest yet fastest solution is preferred. :)
I've reached this solution.
I did some benchmarks with an ASCII file that was 3,050 KB and had 38,803 lines.
With a 22-byte search array located in the last line of the file, I got the result in about 2.28 seconds (on a slow/old machine).
public static long FindPosition(Stream stream, byte[] byteSequence)
{
if (byteSequence.Length > stream.Length)
return -1;
byte[] buffer = new byte[byteSequence.Length];
using (BufferedStream bufStream = new BufferedStream(stream, byteSequence.Length))
{
int i;
while ((i = bufStream.Read(buffer, 0, byteSequence.Length)) == byteSequence.Length)
{
if (byteSequence.SequenceEqual(buffer))
return bufStream.Position - byteSequence.Length;
else
bufStream.Position -= byteSequence.Length - PadLeftSequence(buffer, byteSequence);
}
}
return -1;
}
private static int PadLeftSequence(byte[] bytes, byte[] seqBytes)
{
int i = 1;
while (i < bytes.Length)
{
int n = bytes.Length - i;
byte[] aux1 = new byte[n];
byte[] aux2 = new byte[n];
Array.Copy(bytes, i, aux1, 0, n);
Array.Copy(seqBytes, aux2, n);
if (aux1.SequenceEqual(aux2))
return i;
i++;
}
return i;
}
If you treat the stream like just another sequence of bytes, you can search it as if you were doing a string search. Wikipedia has a great article on that. Boyer-Moore is a good and simple algorithm for this.
Here's a quick hack I put together in Java. It works, and it's pretty close to, if not exactly, Boyer-Moore. Hope it helps ;)
public static final int BUFFER_SIZE = 32;
public static int [] buildShiftArray(byte [] byteSequence){
int [] shifts = new int[byteSequence.length];
int [] ret;
int shiftCount = 0;
byte end = byteSequence[byteSequence.length-1];
int index = byteSequence.length-1;
int shift = 1;
while(--index >= 0){
if(byteSequence[index] == end){
shifts[shiftCount++] = shift;
shift = 1;
} else {
shift++;
}
}
ret = new int[shiftCount];
for(int i = 0;i < shiftCount;i++){
ret[i] = shifts[i];
}
return ret;
}
public static byte [] flushBuffer(byte [] buffer, int keepSize){
byte [] newBuffer = new byte[buffer.length];
for(int i = 0;i < keepSize;i++){
newBuffer[i] = buffer[buffer.length - keepSize + i];
}
return newBuffer;
}
public static int findBytes(byte [] haystack, int haystackSize, byte [] needle, int [] shiftArray){
int index = needle.length;
int searchIndex, needleIndex, currentShiftIndex = 0, shift;
boolean shiftFlag = false;
index = needle.length;
while(true){
needleIndex = needle.length-1;
while(true){
if(index >= haystackSize)
return -1;
if(haystack[index] == needle[needleIndex])
break;
index++;
}
searchIndex = index;
needleIndex = needle.length-1;
while(needleIndex >= 0 && haystack[searchIndex] == needle[needleIndex]){
searchIndex--;
needleIndex--;
}
if(needleIndex < 0)
return index-needle.length+1;
if(shiftFlag){
shiftFlag = false;
index += shiftArray[0];
currentShiftIndex = 1;
} else if(currentShiftIndex >= shiftArray.length){
shiftFlag = true;
index++;
} else{
index += shiftArray[currentShiftIndex++];
}
}
}
public static int findBytes(InputStream stream, byte [] needle){
byte [] buffer = new byte[BUFFER_SIZE];
int [] shiftArray = buildShiftArray(needle);
int bufferSize, initBufferSize;
int offset = 0, init = needle.length;
int val;
try{
while(true){
bufferSize = stream.read(buffer, needle.length-init, buffer.length-needle.length+init);
if(bufferSize == -1)
return -1;
if((val = findBytes(buffer, bufferSize+needle.length-init, needle, shiftArray)) != -1)
return val+offset;
buffer = flushBuffer(buffer, needle.length);
offset += bufferSize-init;
init = 0;
}
} catch (IOException e){
e.printStackTrace();
}
return -1;
}
You'll basically need to keep a buffer the same size as byteSequence so that once you've found that the "next byte" in the stream matches, you can check the rest but then still go back to the "next but one" byte if it's not an actual match.
It's likely to be a bit fiddly whatever you do, to be honest :(
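The "go back to the next-but-one byte" idea above can be sketched without a ring buffer if the stream is seekable: just restart the comparison one byte past the previous candidate start by resetting Position. This is an O(n*m) brute-force illustration with a hypothetical helper class, not a tuned implementation:

```csharp
using System;
using System.IO;

static class NaiveSearch
{
    // Brute-force scan: try every start offset, rewinding via Position
    // instead of keeping a lookback buffer. Requires a seekable stream.
    public static long Find(Stream stream, byte[] sequence)
    {
        long lastStart = stream.Length - sequence.Length;
        for (long start = 0; start <= lastStart; start++)
        {
            stream.Position = start;   // restart at the "next but one" byte
            int matched = 0;
            while (matched < sequence.Length &&
                   stream.ReadByte() == sequence[matched])
                matched++;
            if (matched == sequence.Length) return start;
        }
        return -1;
    }
}
```

For a non-seekable stream you would need the buffer approach described above instead, keeping the last `sequence.Length` bytes in memory.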
I needed to do this myself, had already started, and didn't like the solutions above. I specifically needed to find where the search-byte-sequence ends. In my situation, I need to fast-forward the stream until after that byte sequence. But you can use my solution for this question too:
var afterSequence = stream.ScanUntilFound(byteSequence);
var beforeSequence = afterSequence - byteSequence.Length;
Here is StreamExtensions.cs
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace System
{
static class StreamExtensions
{
/// <summary>
/// Advances the supplied stream until the given searchBytes are found, without advancing too far (consuming any bytes from the stream after the searchBytes are found).
/// Regarding efficiency, if the stream is network or file, then MEMORY/CPU optimisations will be of little consequence here.
/// </summary>
/// <param name="stream">The stream to search in</param>
/// <param name="searchBytes">The byte sequence to search for</param>
/// <returns></returns>
public static int ScanUntilFound(this Stream stream, byte[] searchBytes)
{
// For this class code comments, a common example is assumed:
// searchBytes are {1,2,3,4} or 1234 for short
// # means value that is outside of search byte sequence
byte[] streamBuffer = new byte[searchBytes.Length];
int nextRead = searchBytes.Length;
int totalScannedBytes = 0;
while (true)
{
FillBuffer(stream, streamBuffer, nextRead);
totalScannedBytes += nextRead; //this is only used for final reporting of where it was found in the stream
if (ArraysMatch(searchBytes, streamBuffer, 0))
return totalScannedBytes; //found it
nextRead = FindPartialMatch(searchBytes, streamBuffer);
}
}
/// <summary>
/// Check all offsets, for partial match.
/// </summary>
/// <param name="searchBytes"></param>
/// <param name="streamBuffer"></param>
/// <returns>The amount of bytes which need to be read in, next round</returns>
static int FindPartialMatch(byte[] searchBytes, byte[] streamBuffer)
{
// 1234 = 0 - found it. this special case is already catered directly in ScanUntilFound
// #123 = 1 - partially matched, only missing 1 value
// ##12 = 2 - partially matched, only missing 2 values
// ###1 = 3 - partially matched, only missing 3 values
// #### = 4 - not matched at all
for (int i = 1; i < searchBytes.Length; i++)
{
if (ArraysMatch(searchBytes, streamBuffer, i))
{
// EG. Searching for 1234, have #123 in the streamBuffer, and [i] is 1
// Output: 123#, where # will be read using FillBuffer next.
Array.Copy(streamBuffer, i, streamBuffer, 0, searchBytes.Length - i);
return i; //if an offset of [i], makes a match then only [i] bytes need to be read from the stream to check if there's a match
}
}
return searchBytes.Length; // #### - not matched at all; refill the whole buffer
}
/// <summary>
/// Reads bytes from the stream, making sure the requested amount of bytes are read (streams don't always fulfill the full request first time)
/// </summary>
/// <param name="stream">The stream to read from</param>
/// <param name="streamBuffer">The buffer to read into</param>
/// <param name="bytesNeeded">How many bytes are needed. If less than the full size of the buffer, it fills the tail end of the streamBuffer</param>
static void FillBuffer(Stream stream, byte[] streamBuffer, int bytesNeeded)
{
// EG1. [123#] - bytesNeeded is 1, when the streamBuffer contains first three matching values, but now we need to read in the next value at the end
// EG2. [####] - bytesNeeded is 4
var bytesAlreadyRead = streamBuffer.Length - bytesNeeded; //invert
while (bytesAlreadyRead < streamBuffer.Length)
{
bytesAlreadyRead += stream.Read(streamBuffer, bytesAlreadyRead, streamBuffer.Length - bytesAlreadyRead);
}
}
/// <summary>
/// Checks if arrays match exactly, or with offset.
/// </summary>
/// <param name="searchBytes">Bytes to search for. Eg. [1234]</param>
/// <param name="streamBuffer">Buffer to match in. Eg. [#123] </param>
/// <param name="startAt">When this is zero, all bytes are checked. Eg. If this value 1, and it matches, this means the next byte in the stream to read may mean a match</param>
/// <returns></returns>
static bool ArraysMatch(byte[] searchBytes, byte[] streamBuffer, int startAt)
{
for (int i = 0; i < searchBytes.Length - startAt; i++)
{
if (searchBytes[i] != streamBuffer[i + startAt])
return false;
}
return true;
}
}
}
Bit of an old question, but here's my answer. I've found that reading blocks and then searching within them is extremely inefficient compared to just reading one byte at a time and going from there.
Also, IIRC, the accepted answer would fail if part of the sequence was in one block read and part in another - e.g., given 12345, searching for 23, it would read 12, not match, then read 34, not match, etc. I haven't tried it, though, seeing as it requires .NET 4.0. At any rate, this is way simpler, and likely much faster.
static long ReadOneSrch(Stream haystack, byte[] needle)
{
int b;
long i = 0;
while ((b = haystack.ReadByte()) != -1)
{
if (b == needle[i++])
{
if (i == needle.Length)
return haystack.Position - needle.Length;
}
else if (i > 1)
{
// Mismatch after a partial match: rewind to just past where the
// partial match began, so overlapping candidates (e.g. needle
// 1121 in 11121) are not missed.
haystack.Position -= i - 1;
i = 0;
}
else
i = 0;
}
return -1;
}
static long Search(Stream stream, byte[] pattern)
{
long start = -1;
stream.Seek(0, SeekOrigin.Begin);
while(stream.Position < stream.Length)
{
if (stream.ReadByte() != pattern[0])
continue;
start = stream.Position - 1;
for (int idx = 1; idx < pattern.Length; idx++)
{
if (stream.ReadByte() != pattern[idx])
{
// Rewind so bytes consumed by the failed candidate are re-checked,
// otherwise overlapping matches are skipped.
stream.Position = start + 1;
start = -1;
break;
}
}
if (start > -1)
{
return start;
}
}
return start;
}
