LZW decompression problem after Clear Code (Unix Compress .Z files) - C#

I am implementing my own decompression code for decompressing Unix COMPRESS'ed .Z files. I have the basic decompression working and tested on smaller example files, but when I test it on "real" files, which may include the so-called "Clear Code" (code 256), I run into trouble.
My code is able to decompress the file well up until that point, but after clearing the table and resetting the code length back to its initial size of 9, the next code I read is faulty: it is larger than 255 (somewhere in the 400s). As it is the first code read since the reset, I obviously don't have a matching entry in the table.
I have compared the codes read just after the reset code 256 by my code and by SharpZipLib, and noticed that SharpZipLib gets a different code after the reset than I do. Our implementations are quite different, so I am having a hard time finding the issue. I suspect that I start reading the bits in the wrong place somehow, though I have not managed to figure out what I am doing wrong...
Maybe a second pair of eyes would help?
Code:
namespace LzwDecompressor;
public class Decompressor
{
#region LZW Constants
private readonly byte[] _magicBytes = new byte[] { 0x1f, 0x9d };
private const int BlockModeMask = 0x80; //0b1000 0000
private const int MaxCodeBitsMask = 0x1f; //0b0001 1111
private const int InitialCodeLength = 9;
private const int ClearCode = 256;
#endregion
private int _maxBits;
private int _maxCode = (1 << InitialCodeLength) - 1;
private bool _blockMode;
private int _codeLength = InitialCodeLength;
private readonly Stream _inputStream;
public Decompressor(Stream stream) => _inputStream = stream;
public byte[] Decompress()
{
if (_inputStream.Length < 3)
throw new LzwDecompressorException("Input too small to even fit the required header.");
ParseHeader();
var dictionary = InitDict();
using var outStream = new MemoryStream();
var code = ReadCode();
if (code >= 256) throw new LzwDecompressorException("The first code cannot be larger than 255!");
outStream.Write(new[] { (byte)code }); //First code is always uncompressed
var old = code;
var nextIndex = _blockMode ? 257 : 256; //Skip 256 in block mode as it is a "clear code"
while ((code = ReadCode()) != -1)
{
if (_blockMode && code == ClearCode)
{
_codeLength = InitialCodeLength;
_maxCode = (1 << _codeLength) - 1;
dictionary = InitDict();
nextIndex = 257; //Block mode first index
//Logically I should here be able to read the next code and write it instantly as the first code is basically uncompressed. But as the code is wrong, I cannot do that
//code = ReadCode();
//outStream.Write(new [] { (byte)code });
//old = code;
continue;
}
var word = new List<byte>();
if (dictionary.TryGetValue(code, out var entry))
{
word.AddRange(entry);
}
else if (dictionary.Count + 1 == nextIndex)
{
word.AddRange(dictionary[old].ToArray().Concat(new[] { dictionary[old][0] }));
}
if (word.Count > 0)
{
outStream.Write(word.ToArray());
dictionary[nextIndex++] = new List<byte>(dictionary[old].ToArray().Append(word[0]));
old = code;
}
if (_codeLength == _maxBits) continue; //prevent code length growing beyond max
if (nextIndex == (1 << _codeLength))
{
_codeLength++;
_maxCode = (1 << _codeLength) - 1;
_ = dictionary.EnsureCapacity(1 << _codeLength);
}
}
return outStream.ToArray();
}
#region Private methods
private void ParseHeader()
{
if (_inputStream.ReadByte() != _magicBytes[0] || _inputStream.ReadByte() != _magicBytes[1])
{
throw new LzwDecompressorException("The given file does not contain the LZW magic bytes");
}
var descriptorByte = _inputStream.ReadByte();
_maxBits = descriptorByte & MaxCodeBitsMask;
_blockMode = (descriptorByte & BlockModeMask) > 0;
}
private static Dictionary<int, List<byte>> InitDict()
{
var dict = new Dictionary<int, List<byte>>(1 << InitialCodeLength); //2⁹ max entries
for (var i = 0; i < 256; i++) dict[i] = new List<byte> { (byte)i };
return dict;
}
private int ReadCode()
{
var code = 0x0;
for (var i = 0; i < _codeLength; i++)
{
var bit = ReadBit();
if (bit == -1) return -1;
code |= bit << i;
}
return code;
}
#region Bit Reader
private int _currentBitMask = 0x100;
private int _currentByte;
private int ReadBit()
{
if (_currentBitMask == 0x100)
{
_currentBitMask = 0x1;
var newByte = _inputStream.ReadByte();
if (newByte == -1) return -1;
_currentByte = newByte;
}
var bit = (_currentByte & _currentBitMask) > 0 ? 1 : 0;
_currentBitMask <<= 1;
return bit;
}
#endregion
#endregion
}
public class LzwDecompressorException : Exception
{
public LzwDecompressorException() { }
public LzwDecompressorException(string message) : base($"LZW Decompressor: {message}") { }
public LzwDecompressorException(string message, Exception inner) : base($"LZW Decompressor: {message}", inner) { }
}
I recognize that there may still be other stuff missing, and performance issues and improvements are bound to be found. I have not paid too much attention to those yet, as I am first and foremost looking to get it working before I start swapping in more performant data structures.

Update
I managed to get it working. I finally found an example suited to my use case (decompressing a Unix COMPRESS'ed file). I found this Python code on GitHub: unlzw
The only thing I was missing was the following byte position calculation after resetting:
# process clear code (256)
if (code == 256) and flags:
    # Flush unused input bits and bytes to next 8*bits bit boundary
    rem = (nxt - mark) % bits
    if rem:
        rem = bits - rem
        if rem > inlen - nxt:
            break
        nxt += rem
I had already refactored my original code to work with a byte array instead of the memory stream I had originally. This simplified my thinking somewhat, as I was now keeping track of the "current byte position" instead of the current byte. That makes it easy to re-calculate, on a clear code, the new byte position from which to continue reading. This Python snippet seems to be based on some old machine instructions, according to this comment from the code:
Flush unused input bits and bytes to next 8*bits bit boundary
(this is a vestigial aspect of the compressed data format
derived from an implementation that made use of a special VAX
machine instruction!)
Having implemented my own version of this calculation, based on the parameters I already had at hand, I managed to get it working! I must say I do not fully understand the logic behind it, but I am happy it works :)
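In C#, against a byte-array reader, the flush boils down to something like the following. This is a minimal sketch: _position, _markPosition and _codeLength are illustrative names for the next byte index, the byte index recorded at the last clear or code-width change, and the current code width in bits; they are not fields from the code above.
// Codes of width n are packed in groups of 8 codes = n bytes, so after a
// clear code the rest of the current n-byte group must be skipped.
private void FlushToCodeBoundary()
{
    var rem = (_position - _markPosition) % _codeLength;
    if (rem > 0)
    {
        _position += _codeLength - rem;
    }
}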

Related

How to efficiently Read the Size of a JPEG Image on Stream

I'm programming a little add-on for our business application; the goal is to take pictures with a barcode reader.
Everything works, but the problem is that the barcode reader sends the picture in intervals which are pretty random (depending on the size of the image and the baud rate). Without fully analysing the bytes I receive, there is no way to tell whether the picture has been loaded completely.
At the moment my logic tries to find the start/end of the JPEG by searching for the FF D8 and FF D9 bytes respectively. The problem is that the bytes FF D9 can also occur inside the image.
I obviously could do some specific analysis of the bytes, but as the barcode reader continuously sends data, performing time-consuming operations (debug, CPU, IO, etc.) while receiving bytes will end up in a bluescreen.
What I exactly want is:
Reading the byte on which the size of the image is shown (I couldn't even research whether the size takes the header/footer itself into consideration... do I have to account for that?).
Checking whether I have received all bytes.
I will put up the code where I receive and work with the bytes (it's in a DataReceived event, SerialPort-ish), plus a correct full image in bytes and a corrupt one; maybe that will help.
DataReceivedEvent
private void ScannerPort_DataReceived(object sender, DataReceivedEventArgs e)
{
    if (_WaitingForImage)
    {
        List<byte> imageBufferList = new List<byte>(e.RawBuffer);
        if (imageBufferList[0] == 0x24 && imageBufferList[1] == 0x69)
        {
            // strip the apparent 17-byte device header preceding the JPEG data
            for (int i = 0; i < 17; i++)
            {
                imageBufferList.RemoveAt(0);
            }
        }
        byte[] imageBuffer = imageBufferList.ToArray();
        _ImageReceiving = false;
        _WaitingForImage = false;
        this.OnImageReceived(imageBuffer);
        //imageBufferList.AddRange(e.RawBuffer);
    }
}
Full ByteArray
https://codepen.io/NicolaiWalsemann/pen/KKzxaXg
Corrupt byte Array
https://codepen.io/NicolaiWalsemann/pen/YzqONxd
Solution 1
I could easily use a timer which waits 500-2000 ms after the DataReceived event is first called. This would make sure that I have everything, and then I can parse it as much as I want. But obviously, always having to wait an arbitrary time is not what I want.
I think someone has already answered this: Detect Eof for JPG images
I couldn't say it any better.
Since you will be receiving chunks of data, you will need to parse as you go. Something like the following. It is untested and I may have the count calculation backwards (big vs little endian). It also assumes it's possible for a chunk to span 2 images or that the chunks may split FFxx codes and counts. It is also not optimized in any way, but for small images may be ok.
private List<byte> imageBuffer = new List<byte>();
private int imageState = 0;
private int skipBytes = 0;
private void ScannerPort_DataReceived(object sender, DataReceivedEventArgs e)
{
List<byte> tempBuffer = new List<byte>(e.RawBuffer);
foreach (byte b in tempBuffer)
{
_ImageReceiving = true;
imageBuffer.Add(b);
switch (imageState)
{
case 0: // Searching for FF
if(b == 0xFF)
imageState = 1;
break;
case 1: // First byte after FF
if (b == 0 || b == 1 || (b <= 0xD8 && b >= 0xD1))
{
// Code is not followed by a count
imageState = 0;
}
else if (b == 0xD9)
{
// End of image
_ImageReceiving = false;
this.OnImageReceived(imageBuffer.ToArray());
imageBuffer = new List<byte>();
imageState = 0;
}
else
{
// Code is followed by a 2-byte count
imageState = 2;
}
break;
case 2: // First count byte, big endian?
skipBytes = ((int) b) * 0x100;
imageState = 3;
break;
case 3: // Second count byte
skipBytes += b;
imageState = 4;
break;
case 4: // skip
skipBytes--;
if (skipBytes == 0) imageState = 0;
break;
}
}
}
I haven't tested this and I don't know if it's right, but this is how I would approach the problem. Compared to the other answers, there are a few considerations I've tried to deal with:
Keep the byte array as a byte array
a single read event may contain partial chunks from multiple images
Assemble a whole image in a MemoryStream
Use the chunk length information to copy or skip the whole chunk
You might also wish to set the state back to 0 and stop copying an image if the memory buffer exceeds a maximum size. Or if no new header is encountered after the last one.
private MemoryStream image = new MemoryStream();
private int state = 0;
private bool copy = false;
private int blockLen = -1;
private void ImageReceived(Stream image) {
// TODO use whole image buffer
}
private void Received(byte[] block)
{
var i = 0;
while (i < block.Length)
{
if (state == 4 && blockLen > 0)
{
var remaining = block.Length - i;
if (remaining > blockLen)
remaining = blockLen;
if (copy)
image.Write(block, i, remaining);
i += remaining;
blockLen -= remaining;
if (blockLen <= 0)
state = 0;
}
else
{
var b = block[i++];
switch (state)
{
case 0:
if (b == 0xFF)
state = 1;
break;
case 1:
if (b == 0xD8) { // SOI
copy = true;
image.Seek(0, SeekOrigin.Begin);
image.SetLength(0);
image.WriteByte((byte)0xFF); // the first byte that we already skipped
} else if (b == 0xD9) { // EOI
if (copy)
{
image.WriteByte(b);
image.Seek(0, SeekOrigin.Begin);
ImageReceived(image);
}
copy = false;
state = 0;
} else if (b == 0xFF) { // NOOP
} else if ((b & 0xF8) == 0xD0) { // RSTn
// You could verify that n cycles from 0-7
state = 0;
} else {
state = 2;
}
break;
case 2:
blockLen = b << 8;
state = 3;
break;
case 3:
    // length includes the 2 length bytes, which we've just skipped
    blockLen = (blockLen | b) - 2;
    state = blockLen > 0 ? 4 : 0; // an empty segment would otherwise leave the parser stuck in state 4
    break;
}
if (copy)
image.WriteByte(b);
}
}
}
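The ImageReceived stub above can then consume the finished stream however you like; for instance (purely illustrative, not part of the answer):
private void ImageReceived(Stream image)
{
    // The parser hands over a complete JPEG, already positioned at 0.
    // One option: persist it under a timestamp name for later lookup.
    string filename = "frame_" + DateTime.Now.ToString("yyyyMMddHHmmssfff") + ".jpg";
    using (var file = File.Open(filename, FileMode.Create, FileAccess.Write))
        image.CopyTo(file);
}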

Generating Schröder Paths

I want to generate Schröder paths from (0, 0) to (2n, 0) with
no peaks, i.e., no up step followed immediately by a down step.
Some examples for n = 3: Schröder paths.
/ is coded as U, -- is coded as R and \ is coded as D. Here is my code to generate these paths:
public static void addParen(List<String> list, int upstock, int rightstock, int downstock, bool B, char[] str, int count, int total, int n)
{
if (total == n && downstock == 0)
{
String s = copyvalueof(str);
list.Add(s);
}
if (total > n || (total==n && downstock>0) )
return;
else
{
if (upstock > 0 && total<n)
{
str[count] = 'U';
addParen(list, upstock - 1,rightstock, downstock+1,B=true, str, count + 1,total+1,n);
}
if (downstock > 0 && total<n && B==false)
{
str[count] = 'D';
addParen(list, upstock,rightstock, downstock - 1,B=false, str, count + 1,total+1,n);
}
if (rightstock > 0 && total < n)
{
str[count] = 'R';
addParen(list, upstock, rightstock-1, downstock, B = false, str, count + 1, total + 2,n);
}
}
}
public static List<String> generatePaths(int count)
{
char[] str = new char[count * 2];
bool B = false;
List<String> list = new List<String>();
addParen(list, count-1, count, 0,B,str, 0, 0,count*2);
return list;
}
The total is 2n. I start with n-1 ups, n rights and zero downs. Since there is no up yet, my bool B is false (if an up comes, a down cannot come right after it, so to prevent this I set B=true). If an up comes, then there should be a corresponding down, and the total should be incremented by one. If a right comes, the total should be incremented by two. My algorithm in general works like that, but I couldn't get a correct result with this implementation.
The original solution didn't adapt to the OP's needs, as it was too complicated to port to JavaScript, and the intention was to show better practices for solving these kinds of problems more than actually solving this particular one easily.
But in the spirit of using immutable types to solve path algorithms, we can still do so in a much simpler fashion: we'll use string.
OK, as always, let's build up our infrastructure: tools that will make our life easier:
private const char Up = 'U';
private const char Down = 'D';
private const char Horizontal = 'R';
private static readonly char[] upOrHorizontal = new[] { Up, Horizontal };
private static readonly char[] downOrHorizontal = new[] { Down, Horizontal };
private static readonly char[] all = new[] { Up, Horizontal, Down };
And a handy little helper method:
private static IList<char> GetAllPossibleDirectionsFrom(string path)
{
if (path.Length == 0)
return upOrHorizontal;
switch (path.Last())
{
case Up: return upOrHorizontal;
case Down: return downOrHorizontal;
case Horizontal: return all;
default:
Debug.Assert(false);
throw new NotSupportedException();
}
}
Remember, break down your problems into smaller problems. All difficult problems can be solved by solving smaller, easier problems. This helper method is hard to get wrong; that's good, as it's hard to write a bug inside short, easy methods.
And now, we solve the bigger problem. We'll not use iterator blocks, so porting is easier. We'll concede to using a mutable list to keep track of all the valid paths we find.
Our recursive solution is the following:
private static void getPaths(IList<string> allPaths,
string currentPath,
int height,
int maxLength,
int maxHeight)
{
if (currentPath.Length == maxLength)
{
if (height == 0)
{
allPaths.Add(currentPath);
}
}
else
{
foreach (var d in GetAllPossibleDirectionsFrom(currentPath))
{
int newHeight;
switch (d)
{
case Up:
newHeight = height + 1;
break;
case Down:
newHeight = height - 1;
break;
case Horizontal:
newHeight = height;
break;
default:
Debug.Assert(false);
throw new NotSupportedException();
}
if (newHeight < 0 /* illegal path */ ||
    newHeight > maxLength - (currentPath.Length + 1)) /* cannot possibly end at zero height */
    continue;
getPaths(allPaths,
currentPath + d.ToString(),
newHeight,
maxLength,
maxHeight);
}
}
}
Not much to say; it's pretty self-explanatory. We could cut back some on the arguments; height is not strictly necessary, as we could count the ups and downs in the current path and figure out the height we are currently at, but that seems wasteful. maxLength could also, and probably should, be removed; we have enough information with maxHeight.
Now we just need a method to kick this off:
public static IList<string> GetSchroderPathsWithoutPeaks(int n)
{
var allPaths = new List<string>();
getPaths(allPaths, "", 0, 2 * n, n);
return allPaths;
}
And we're set! If we take this out for a test drive:
var paths = GetSchroderPathsWithoutPeaks(2);
Console.WriteLine(string.Join(Environment.NewLine, paths));
We get the expected results:
URRD
URDR
RURD
RRRR
As to why your solution doesn't work? Well, the fact that you can't figure it out says it all about how unnecessarily complicated your current solution is starting to look. When that happens, it's normally a good idea to take a step back, rethink your approach, write down a clear specification of what your program should be doing step by step, and start over.

Play video frame by frame performance issues

I want to play a video (mostly .mov with Motion JPEG) frame by frame with a changing framerate. I have a function that gives me a frame number, and then I have to jump there. It will be mostly in one direction but can skip a few frames from time to time; also, the velocity is not constant.
So I have a timer asking every 40 ms for a new frame number and setting the new position.
My first approach is with DirectShow.Net (Interop.QuartzTypeLib). I render and open the video and set it to pause, so it draws the picture in the graph:
FilgraphManagerClass media = new FilgraphManagerClass();
media.RenderFile(FileName);
media.pause();
Now I will just set a new position
media.CurrentPosition = framenumber * media.AvgTimePerFrame;
Since the video is in pause mode, it will then draw every requested new position (frame). This works perfectly fine, but is really slow... the video keeps stuttering and lagging, and it's not the video source; there are enough frames recorded to play fluent video.
With some performance tests I found out that the LAV codec is the bottleneck here. It is not included directly in my project; since this is a DirectShow player, it is loaded through the codec pack installed on my PC.
Ideas:
Using the LAV codec myself directly in C#. I searched, but everyone seems to use DirectShow, building their own filters rather than using existing ones directly in a project.
Instead of seeking or setting the time, can I get single frames just by frame number and simply draw them?
Is there a completely different way to achieve what I want to do?
Background:
This project has to be a train simulator. We recorded real-time videos of trains driving from inside the cockpit and know which frame corresponds to which position. Now my C# program calculates the position of the train as a function of time and acceleration, gives back the appropriate frame number, and draws this frame.
Additional Information:
There is another project (not written by me) in C/C++ that uses DirectShow and avcodec-LAV directly, in a way similar to mine, and it works fine! That's where I got the idea to use a codec/filter like avcodec-LAV myself. But I can't find an interop or interface to work with it from C#.
Thanks everyone for reading this and trying to help! :)
Obtaining a specific frame by seeking the filter graph (the entire pipeline) is pretty slow, since every seek operation involves the following under the hood: flushing everything, possibly re-creating worker threads, seeking to the first key frame/splice point/clean point/I-frame before the requested time, then decoding from the found position, skipping frames until the originally requested time is reached.
Overall, the method works well when you scrub paused video or retrieve specific still frames. When you instead try to play this as smooth video, a significant part of the effort ends up being wasted on seeking within the video stream.
Solutions here are:
re-encode the video to remove or reduce temporal compression (e.g. Motion JPEG AVI/MOV/MP4 files)
whenever possible, prefer to skip frames and/or re-timestamp them according to your algorithm instead of seeking
keep a cache of decoded video frames and pick from there, populating it as necessary in a worker thread (a minimal cache is sketched below)
The latter two are unfortunately hard to achieve without advanced filter development (where continuous decoding, uninterrupted by seek operations, is the key to decent performance). With basic DirectShow.Net you only have basic control over streaming, and hence only the first item from the list above.
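To illustrate the third item, the cache itself can be as simple as a dictionary guarded by a lock, filled ahead of the playback position by a worker thread. The sketch below is only that, a sketch (all names are hypothetical); the hard part remains producing the decoded frames continuously without seeking.
// Sketch of a decoded-frame cache. A worker thread decodes ahead and calls
// Add; the 40 ms timer calls TryGet by frame number, and EvictBefore bounds
// memory use. All names are hypothetical.
public class FrameCache
{
    private readonly object _sync = new object();
    private readonly Dictionary<int, byte[]> _frames = new Dictionary<int, byte[]>();

    public void Add(int frameNumber, byte[] decodedFrame)
    {
        lock (_sync) _frames[frameNumber] = decodedFrame;
    }

    public bool TryGet(int frameNumber, out byte[] decodedFrame)
    {
        lock (_sync) return _frames.TryGetValue(frameNumber, out decodedFrame);
    }

    public void EvictBefore(int frameNumber) // drop frames the simulation has passed
    {
        lock (_sync)
        {
            var stale = new List<int>();
            foreach (var key in _frames.Keys)
                if (key < frameNumber) stale.Add(key);
            foreach (var key in stale) _frames.Remove(key);
        }
    }
}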
I wanted to post a comment instead of an answer, but I don't have the reputation. I think you're heading in the wrong direction with DirectShow. I've been messing with Motion JPEG for a few years now between C# and Android, and have gotten great performance with built-in .NET code (for converting a byte array to a JPEG frame) and a bit of multi-threading. I can easily achieve over 30 fps from multiple devices, with each device running in its own thread.
Below is an older version of my Motion JPEG parser from my C# app 'OmniView'. To use it, pass the network stream to Init() and handle the OnImageReceived event. Then you can easily save the frames to the hard drive for later use (perhaps with the filename set to the timestamp for easy lookup). For better performance, though, you will want to save all of the images to one file.
using OmniView.Framework.Helpers;
using System;
using System.IO;
using System.Text;
using System.Windows.Media.Imaging;
namespace OmniView.Framework.Devices.MJpeg
{
public class MJpegStream : IDisposable
{
private const int BUFFER_SIZE = 4096;
private const string tag_length = "Content-Length:";
private const string stamp_format = "yyyyMMddHHmmssfff";
public delegate void ImageReceivedEvent(BitmapImage img);
public delegate void FrameCountEvent(long frames, long failed);
public event ImageReceivedEvent OnImageReceived;
public event FrameCountEvent OnFrameCount;
private bool isHead, isSetup;
private byte[] buffer, newline, newline_src;
private int imgBufferStart;
private Stream data_stream;
private MemoryStream imgStreamA, imgStreamB;
private int headStart, headStop;
private long imgSize, imgSizeTgt;
private bool useStreamB;
public volatile bool EnableRecording, EnableSnapshot;
public string RecordPath, SnapshotFilename;
private string boundary_tag;
private bool tagReadStarted;
private bool enableBoundary;
public volatile bool OutputFrameCount;
private long FrameCount, FailedCount;
public MJpegStream() {
isSetup = false;
imgStreamA = new MemoryStream();
imgStreamB = new MemoryStream();
buffer = new byte[BUFFER_SIZE];
newline_src = new byte[] {13, 10};
}
public void Init(Stream stream) {
this.data_stream = stream;
FrameCount = FailedCount = 0;
startHeader(0);
}
public void Dispose() {
if (data_stream != null) data_stream.Dispose();
if (imgStreamA != null) imgStreamA.Dispose();
if (imgStreamB != null) imgStreamB.Dispose();
}
//=============================
public void Process() {
if (isHead) processHeader();
else {
if (enableBoundary) processImageBoundary();
else processImage();
}
}
public void Snapshot(string filename) {
SnapshotFilename = filename;
EnableSnapshot = true;
}
//-----------------------------
// Header
private void startHeader(int remaining_bytes) {
isHead = true;
headStart = 0;
headStop = remaining_bytes;
imgSizeTgt = 0;
tagReadStarted = false;
}
private void processHeader() {
int t = BUFFER_SIZE - headStop;
headStop += data_stream.Read(buffer, headStop, t);
int nl;
//
if (!isSetup) {
byte[] new_newline;
if ((nl = findNewline(headStart, headStop, out new_newline)) >= 0) {
string tag = Encoding.UTF8.GetString(buffer, headStart, nl - headStart);
if (tag.StartsWith("--")) boundary_tag = tag;
headStart = nl+new_newline.Length;
newline = new_newline;
isSetup = true;
return;
}
} else {
while ((nl = findData(newline, headStart, headStop)) >= 0) {
string tag = Encoding.UTF8.GetString(buffer, headStart, nl - headStart);
if (!tagReadStarted && tag.Length > 0) tagReadStarted = true;
headStart = nl+newline.Length;
//
if (!processHeaderData(tag, nl)) return;
}
}
//
if (headStop >= BUFFER_SIZE) {
string data = Encoding.UTF8.GetString(buffer, headStart, headStop - headStart);
throw new Exception("Invalid Header!");
}
}
private bool processHeaderData(string tag, int index) {
if (tag.StartsWith(tag_length)) {
string val = tag.Substring(tag_length.Length);
imgSizeTgt = long.Parse(val);
}
//
if (tag.Length == 0 && tagReadStarted) {
if (imgSizeTgt > 0) {
finishHeader(false);
return false;
}
if (boundary_tag != null) {
finishHeader(true);
return false;
}
}
//
return true;
}
private void finishHeader(bool enable_boundary) {
int s = shiftBytes(headStart, headStop);
enableBoundary = enable_boundary;
startImage(s);
}
//-----------------------------
// Image
private void startImage(int remaining_bytes) {
isHead = false;
imgBufferStart = remaining_bytes;
Stream imgStream = getStream();
imgStream.Seek(0, SeekOrigin.Begin);
imgStream.SetLength(imgSizeTgt);
imgSize = 0;
}
private void processImage() {
long img_r = (imgSizeTgt - imgSize - imgBufferStart);
int bfr_r = Math.Max(BUFFER_SIZE - imgBufferStart, 0);
int t = (int)Math.Min(img_r, bfr_r);
int s = data_stream.Read(buffer, imgBufferStart, t);
int x = imgBufferStart + s;
appendImageData(0, x);
imgBufferStart = 0;
//
if (imgSize >= imgSizeTgt) processImageData(0);
}
private void processImageBoundary() {
int t = Math.Max(BUFFER_SIZE - imgBufferStart, 0);
int s = data_stream.Read(buffer, imgBufferStart, t);
//
int nl, start = 0;
int end = imgBufferStart + s;
while ((nl = findData(newline, start, end)) >= 0) {
int tag_length = boundary_tag.Length;
if (nl+newline.Length+tag_length > BUFFER_SIZE) {
appendImageData(start, nl+newline.Length - start);
start = nl+newline.Length;
continue;
}
//
string v = Encoding.UTF8.GetString(buffer, nl+newline.Length, tag_length);
if (v == boundary_tag) {
appendImageData(start, nl - start);
int xstart = nl+newline.Length + tag_length;
int xsize = shiftBytes(xstart, end);
processImageData(xsize);
return;
} else {
appendImageData(start, nl+newline.Length - start);
}
start = nl+newline.Length;
}
//
if (start < end) {
int end_x = end - newline.Length;
if (start < end_x) {
appendImageData(start, end_x - start);
}
//
shiftBytes(end - newline.Length, end);
imgBufferStart = newline.Length;
}
}
private void processImageData(int remaining_bytes) {
if (EnableSnapshot) {
EnableSnapshot = false;
saveSnapshot();
}
//
try {
BitmapImage img = createImage();
if (EnableRecording) recordFrame();
if (OnImageReceived != null) OnImageReceived.Invoke(img);
FrameCount++;
}
catch (Exception) {
// output frame error ?!
FailedCount++;
}
//
if (OutputFrameCount && OnFrameCount != null) OnFrameCount.Invoke(FrameCount, FailedCount);
//
useStreamB = !useStreamB;
startHeader(remaining_bytes);
}
private void appendImageData(int index, int length) {
    Stream imgStream = getStream();
    imgStream.Write(buffer, index, length);
    imgSize += length; // 'length' is the byte count written, matching the Write call above
}
//-----------------------------
private void recordFrame() {
string stamp = DateTime.Now.ToString(stamp_format);
string filename = RecordPath+"\\"+stamp+".jpg";
//
ImageHelper.Save(getStream(), filename);
}
private void saveSnapshot() {
Stream imgStream = getStream();
//
imgStream.Position = 0;
Stream file = File.Open(SnapshotFilename, FileMode.Create, FileAccess.Write);
try {imgStream.CopyTo(file);}
finally {file.Close();}
}
private BitmapImage createImage() {
Stream imgStream = getStream();
imgStream.Position = 0;
return ImageHelper.LoadStream(imgStream);
}
//-----------------------------
private Stream getStream() {return useStreamB ? imgStreamB : imgStreamA;}
private int findNewline(int start, int stop, out byte[] data) {
for (int i = start; i < stop; i++) {
if (i < stop-1 && buffer[i] == newline_src[0] && buffer[i+1] == newline_src[1]) {
data = newline_src;
return i;
} else if (buffer[i] == newline_src[1]) {
data = new byte[] {newline_src[1]};
return i;
}
}
data = null;
return -1;
}
private int findData(byte[] data, int start, int stop) {
int data_size = data.Length;
for (int i = start; i < stop-data_size; i++) {
if (findInnerData(data, i)) return i;
}
return -1;
}
private bool findInnerData(byte[] data, int buffer_index) {
int count = data.Length;
for (int i = 0; i < count; i++) {
if (data[i] != buffer[buffer_index+i]) return false;
}
return true;
}
private int shiftBytes(int start, int end) {
int c = end - start;
for (int i = 0; i < c; i++) {
buffer[i] = buffer[end-c+i];
}
return c;
}
}
}

C# compress a byte array

I do not know much about compression algorithms. I am looking for a simple compression algorithm (or code snippet) which can reduce the size of a byte[,,] or byte[]. I cannot make use of System.IO.Compression. Also, the data has lots of repetition.
I tried implementing the RLE algorithm (posted below for your inspection). However, it produces arrays 1.2 to 1.8 times larger than the input.
public static class RLE
{
public static byte[] Encode(byte[] source)
{
List<byte> dest = new List<byte>();
byte runLength;
for (int i = 0; i < source.Length; i++)
{
runLength = 1;
while (runLength < byte.MaxValue
&& i + 1 < source.Length
&& source[i] == source[i + 1])
{
runLength++;
i++;
}
dest.Add(runLength);
dest.Add(source[i]);
}
return dest.ToArray();
}
public static byte[] Decode(byte[] source)
{
List<byte> dest = new List<byte>();
byte runLength;
for (int i = 1; i < source.Length; i+=2)
{
runLength = source[i - 1];
while (runLength > 0)
{
dest.Add(source[i]);
runLength--;
}
}
return dest.ToArray();
}
}
I have also found a Java LZW implementation based on strings and integers. I have converted it to C# and the results look good (code posted below). However, I am not sure how it works, nor how to make it work with bytes instead of strings and integers.
public class LZW
{
/* Compress a string to a list of output symbols. */
public static int[] compress(string uncompressed)
{
// Build the dictionary.
int dictSize = 256;
Dictionary<string, int> dictionary = new Dictionary<string, int>();
for (int i = 0; i < dictSize; i++)
dictionary.Add("" + (char)i, i);
string w = "";
List<int> result = new List<int>();
for (int i = 0; i < uncompressed.Length; i++)
{
char c = uncompressed[i];
string wc = w + c;
if (dictionary.ContainsKey(wc))
w = wc;
else
{
result.Add(dictionary[w]);
// Add wc to the dictionary.
dictionary.Add(wc, dictSize++);
w = "" + c;
}
}
// Output the code for w.
if (w != "")
result.Add(dictionary[w]);
return result.ToArray();
}
/* Decompress a list of output codes to a string. */
public static string decompress(int[] compressed)
{
int dictSize = 256;
Dictionary<int, string> dictionary = new Dictionary<int, string>();
for (int i = 0; i < dictSize; i++)
dictionary.Add(i, "" + (char)i);
string w = "" + (char)compressed[0];
string result = w;
for (int i = 1; i < compressed.Length; i++)
{
int k = compressed[i];
string entry = "";
if (dictionary.ContainsKey(k))
entry = dictionary[k];
else if (k == dictSize)
entry = w + w[0];
result += entry;
// Add w+entry[0] to the dictionary.
dictionary.Add(dictSize++, w + entry[0]);
w = entry;
}
return result;
}
}
Have a look here. I used this code as a basis for compression in one of my work projects. I'm not sure how much of the .NET Framework is accessible in the Xbox 360 SDK, so I'm not sure how well this will work for you.
The problem with that RLE algorithm is that it is too simple. It prefixes every byte with how many times it is repeated, but that does mean that in long ranges of non-repeating bytes, each single byte is prefixed with a "1". On data without any repetitions this will double the file size.
This can be avoided by using Code-type RLE instead; the 'Code' (also called 'Token') is a byte that can have two meanings: either it indicates how many times the single following byte is repeated, or it indicates how many non-repeating bytes follow that should be copied as they are. The two cases are distinguished by the highest bit, which leaves 7 bits for the value, so the amount to copy or repeat per code can be up to 127.
This means that even in worst-case scenarios, the final size can only be about 1/127th larger than the original file size.
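To make that concrete, a decoder for such a scheme could look like the following (a sketch that could sit next to Encode/Decode in the question's RLE class; which meaning the high bit carries is a convention, and the article linked below may use the opposite one):
// Decodes Code-type RLE. Convention assumed here:
// high bit set   -> low 7 bits = number of literal bytes that follow;
// high bit clear -> token = repeat count for the single next byte.
public static byte[] DecodeTokenRle(byte[] source)
{
    List<byte> dest = new List<byte>();
    int i = 0;
    while (i < source.Length)
    {
        byte token = source[i++];
        if ((token & 0x80) != 0)
        {
            int count = token & 0x7F;
            for (int j = 0; j < count; j++) dest.Add(source[i++]); // copy literals as-is
        }
        else
        {
            byte value = source[i++];
            for (int j = 0; j < token; j++) dest.Add(value); // expand the run
        }
    }
    return dest.ToArray();
}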
A good explanation of the whole concept, plus full working (and, in fact, heavily optimised) C# code, can be found here:
http://www.shikadi.net/moddingwiki/RLE_Compression
Note that sometimes the data will end up larger than the original anyway, simply because there are not enough repeating bytes in it for RLE to work. A good way to deal with such compression failures is to add a header to your final data: if you simply add an extra byte at the start that is 0 for uncompressed data and 1 for RLE-compressed data, then, when RLE fails to give a smaller result, you just save the data uncompressed, with the 0 in front, and your final data will be exactly one byte larger than the original. The system at the other side can then read that starting byte and use it to determine whether the following data should be decompressed or just copied.
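A sketch of that fallback wrapper, reusing the question's Encode method (the helper name is hypothetical):
// Prepends 1 if the RLE output is smaller, otherwise stores the input raw
// behind a 0 byte, so the result is at worst one byte larger than the input.
public static byte[] EncodeWithFallback(byte[] source)
{
    byte[] rle = RLE.Encode(source);
    bool useRle = rle.Length < source.Length;
    byte[] payload = useRle ? rle : source;
    byte[] result = new byte[payload.Length + 1];
    result[0] = (byte)(useRle ? 1 : 0);
    Buffer.BlockCopy(payload, 0, result, 1, payload.Length);
    return result;
}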
Look into Huffman codes; it's a pretty simple algorithm. Basically, use fewer bits for patterns that show up more often, and keep a table of how each is encoded. You also have to account for the fact that there are no separators between codewords to help you decode.
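As a concrete starting point, a Huffman code table for a byte array can be built by repeatedly merging the two lightest nodes. The sketch below is illustrative only (not from the answer) and uses plain list sorting instead of a priority queue, which is fine for at most 256 symbols:
// Build Huffman codes for the distinct byte values in 'data': merge the two
// lightest nodes until one tree remains, then read codes off the root
// ("0" = left, "1" = right). Shorter codes land on more frequent bytes.
class HuffmanNode
{
    public byte Symbol;
    public long Weight;
    public HuffmanNode Left, Right;
}

static class HuffmanSketch
{
    public static Dictionary<byte, string> BuildCodes(byte[] data)
    {
        var counts = new long[256];
        foreach (byte b in data) counts[b]++;
        var nodes = new List<HuffmanNode>();
        for (int s = 0; s < 256; s++)
            if (counts[s] > 0)
                nodes.Add(new HuffmanNode { Symbol = (byte)s, Weight = counts[s] });
        while (nodes.Count > 1)
        {
            nodes.Sort((a, b) => a.Weight.CompareTo(b.Weight));
            var merged = new HuffmanNode
            {
                Weight = nodes[0].Weight + nodes[1].Weight,
                Left = nodes[0],
                Right = nodes[1]
            };
            nodes.RemoveRange(0, 2);
            nodes.Add(merged);
        }
        var codes = new Dictionary<byte, string>();
        if (nodes.Count == 1) Walk(nodes[0], "", codes);
        return codes;
    }

    private static void Walk(HuffmanNode node, string prefix, Dictionary<byte, string> codes)
    {
        if (node.Left == null) // leaf
        {
            codes[node.Symbol] = prefix.Length > 0 ? prefix : "0";
            return;
        }
        Walk(node.Left, prefix + "0", codes);
        Walk(node.Right, prefix + "1", codes);
    }
}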

How to know the position (line number) of a StreamReader in a text file?

An example (that might not be real life, but makes my point):
public void StreamInfo(StreamReader p)
{
string info = string.Format(
    "The supplied streamreader read: {0}\n at line {1}",
    p.ReadLine(),
    p.GetLinePosition() - 1);
}
GetLinePosition here is an imaginary extension method of StreamReader.
Is this possible?
Of course I could keep count myself but that's not the question.
I came across this post while looking for a solution to a similar problem where I needed to seek the StreamReader to particular lines. I ended up creating two extension methods to get and set the position on a StreamReader. It doesn't actually provide a line number count, but in practice I just grab the position before each ReadLine(), and if the line is of interest I keep its start position for setting later, to get back to the line like so:
var index = streamReader.GetPosition();
var line1 = streamReader.ReadLine();
streamReader.SetPosition(index);
var line2 = streamReader.ReadLine();
Assert.AreEqual(line1, line2);
and the important part:
public static class StreamReaderExtensions
{
readonly static FieldInfo charPosField = typeof(StreamReader).GetField("charPos", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
readonly static FieldInfo byteLenField = typeof(StreamReader).GetField("byteLen", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
readonly static FieldInfo charBufferField = typeof(StreamReader).GetField("charBuffer", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
public static long GetPosition(this StreamReader reader)
{
// shift position back from BaseStream.Position by the number of bytes read
// into internal buffer.
int byteLen = (int)byteLenField.GetValue(reader);
var position = reader.BaseStream.Position - byteLen;
// if we have consumed chars from the buffer we need to calculate how many
// bytes they represent in the current encoding and add that to the position.
int charPos = (int)charPosField.GetValue(reader);
if (charPos > 0)
{
var charBuffer = (char[])charBufferField.GetValue(reader);
var encoding = reader.CurrentEncoding;
var bytesConsumed = encoding.GetBytes(charBuffer, 0, charPos).Length;
position += bytesConsumed;
}
return position;
}
public static void SetPosition(this StreamReader reader, long position)
{
reader.DiscardBufferedData();
reader.BaseStream.Seek(position, SeekOrigin.Begin);
}
}
This works quite well for me, and depending on your tolerance for using reflection, I think it is a fairly simple solution.
Caveats:
While I have done some simple testing using various System.Text.Encoding options, pretty much all of the data I consume with this are simple text files (ASCII).
I only ever use the StreamReader.ReadLine() method and while a brief review of the source for StreamReader seems to indicate this will still work when using the other read methods, I have not really tested that scenario.
No, not really possible. The concept of a "line number" is based upon the actual data that's already been read, not just the position. For instance, if you were to Seek() the reader to an arbitrary position, it's not actually going to read that data, so it wouldn't be able to determine the line number.
The only way to do this is to keep track of it yourself.
It is extremely easy to provide a line-counting wrapper for any TextReader:
public class PositioningReader : TextReader {
private TextReader _inner;
public PositioningReader(TextReader inner) {
_inner = inner;
}
public override void Close() {
_inner.Close();
}
public override int Peek() {
return _inner.Peek();
}
public override int Read() {
var c = _inner.Read();
if (c >= 0)
AdvancePosition((Char)c);
return c;
}
private int _linePos = 0;
public int LinePos { get { return _linePos; } }
private int _charPos = 0;
public int CharPos { get { return _charPos; } }
private int _matched = 0;
private void AdvancePosition(Char c) {
if (Environment.NewLine[_matched] == c) {
_matched++;
if (_matched == Environment.NewLine.Length) {
_linePos++;
_charPos = 0;
_matched = 0;
}
}
else {
_matched = 0;
_charPos++;
}
}
}
Drawbacks (for the sake of brevity):
Does not check constructor argument for null
Does not recognize alternate ways to terminate the lines, so it will be inconsistent with ReadLine() behavior when reading files separated by raw \r or \n (a more tolerant variant is sketched after the code below).
Does not override "block"-level methods like Read(char[], int, int), ReadBlock, ReadLine, ReadToEnd. TextReader implementation works correctly since it routes everything else to Read(); however, better performance could be achieved by
overriding those methods via routing calls to _inner. instead of base.
passing the characters read to the AdvancePosition. See the sample ReadBlock implementation:
public override int ReadBlock(char[] buffer, int index, int count) {
var readCount = _inner.ReadBlock(buffer, index, count);
for (int i = 0; i < readCount; i++)
AdvancePosition(buffer[index + i]);
return readCount;
}
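For the second drawback, the matcher can be swapped for a variant that treats a bare \r, a bare \n, and \r\n (counted once) as line terminators, which is closer to ReadLine() behavior. A sketch that replaces _matched and AdvancePosition above:
// Accepts '\r', '\n' and "\r\n" (counted once) as line ends.
private char _prevChar = '\0';
private void AdvancePosition(Char c) {
    if (c == '\r' || (c == '\n' && _prevChar != '\r')) {
        _linePos++;
        _charPos = 0;
    }
    else if (c != '\n') { // the '\n' of a "\r\n" pair was already counted
        _charPos++;
    }
    _prevChar = c;
}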
No.
Consider that it's possible to seek to any position using the underlying stream object (which could be at any point in any line).
Now consider what that would do to any count kept by the StreamReader.
Should the StreamReader go and figure out which line it's now on?
Should it just keep a count of lines read, regardless of position within the file?
There are more questions than just these that would make this a nightmare to implement, imho.
Here is a guy who implemented a StreamReader with a ReadLine() method that tracks the file position:
http://www.daniweb.com/forums/thread35078.html
I guess one should inherit from StreamReader and add the extra method to the derived class, along with some fields (_lineLength + _bytesRead):
// Reads a line. A line is defined as a sequence of characters followed by
// a carriage return ('\r'), a line feed ('\n'), or a carriage return
// immediately followed by a line feed. The resulting string does not
// contain the terminating carriage return and/or line feed. The returned
// value is null if the end of the input stream has been reached.
//
/// <include file='doc\myStreamReader.uex' path='docs/doc[@for="myStreamReader.ReadLine"]/*' />
public override String ReadLine()
{
_lineLength = 0;
//if (stream == null)
// __Error.ReaderClosed();
if (charPos == charLen)
{
if (ReadBuffer() == 0) return null;
}
StringBuilder sb = null;
do
{
int i = charPos;
do
{
char ch = charBuffer[i];
int EolChars = 0;
if (ch == '\r' || ch == '\n')
{
EolChars = 1;
String s;
if (sb != null)
{
sb.Append(charBuffer, charPos, i - charPos);
s = sb.ToString();
}
else
{
s = new String(charBuffer, charPos, i - charPos);
}
charPos = i + 1;
if (ch == '\r' && (charPos < charLen || ReadBuffer() > 0))
{
if (charBuffer[charPos] == '\n')
{
charPos++;
EolChars = 2;
}
}
_lineLength = s.Length + EolChars;
_bytesRead = _bytesRead + _lineLength;
return s;
}
i++;
} while (i < charLen);
i = charLen - charPos;
if (sb == null) sb = new StringBuilder(i + 80);
sb.Append(charBuffer, charPos, i);
} while (ReadBuffer() > 0);
string ss = sb.ToString();
_lineLength = ss.Length;
_bytesRead = _bytesRead + _lineLength;
return ss;
}
I think there is a minor bug in the code, as the length of the string is used to calculate the file position instead of the actual bytes read (so it lacks support for UTF-8 and UTF-16 encoded files).
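A sketch of the fix, using the names from the snippet above: measure the bytes of the returned string with the reader's encoding instead of counting characters.
// Encoding-aware replacement for the two lines that update _lineLength and
// _bytesRead. '\r' and '\n' occupy the same number of bytes in common
// encodings, so a representative terminator string is good enough here.
string eol = EolChars == 2 ? "\r\n" : (EolChars == 1 ? "\n" : "");
_lineLength = CurrentEncoding.GetByteCount(s) + CurrentEncoding.GetByteCount(eol);
_bytesRead = _bytesRead + _lineLength;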
I came here looking for something simple. If you're just using ReadLine() and don't care about using Seek() or anything, just make a simple subclass of StreamReader
class CountingReader : StreamReader {
private int _lineNumber = 0;
public int LineNumber { get { return _lineNumber; } }
public CountingReader(Stream stream) : base(stream) { }
public override string ReadLine() {
    var line = base.ReadLine();
    if (line != null) _lineNumber++; // don't count the null returned at end-of-file
    return line;
}
}
and then you make it the normal way, say from a FileInfo object named file
CountingReader reader = new CountingReader(file.OpenRead())
and you just read the reader.LineNumber property.
The points already made with respect to the BaseStream are valid and important. However, there are situations in which you want to read a text and know where in the text you are. It can still be useful to write that up as a class to make it easy to reuse.
I tried to write such a class now. It seems to work correctly, but it's rather slow. It should be fine when performance isn't crucial (it isn't that slow, see below).
I use the same logic to track position in the text regardless if you read a char at a time, one buffer at a time, or one line at a time. While I'm sure this can be made to perform rather better by abandoning this, it made it much easier to implement... and, I hope, to follow the code.
I did a very basic performance comparison of the ReadLine method (which I believe is the weakest point of this implementation) to StreamReader, and the difference is almost an order of magnitude. I got 22 MB/s using my class StreamReaderEx, but nearly 9 times as much using StreamReader directly (on my SSD-equipped laptop). While it could be interesting, I don't know how to make a proper reading test; maybe using 2 identical files, each larger than the disk buffer, and reading them alternately..? At least my simple test produces consistent results when I run it several times, and regardless of which class reads the test file first.
The NewLine symbol defaults to Environment.NewLine but can be set to any string of length 1 or 2. The reader considers only this symbol as a newline, which may be a drawback. At least I know Visual Studio has prompted me a fair number of times that a file I open "has inconsistent newlines".
Please note that I haven't included the Guard class; this is a simple utility class, and it should be obvious from the context how to replace it. You can even remove it, but you'd lose some argument checking and thus the resulting code would be farther from "correct". For example, Guard.NotNull(s, "s") simply checks that s is not null, throwing an ArgumentNullException (with argument name "s", hence the second parameter) should that be the case.
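For reference, a minimal stand-in that satisfies the two Guard calls used below could look like this:
// Minimal substitute for the omitted Guard class (only what StreamReaderEx needs).
internal static class Guard
{
    public static void NotNull(object value, string paramName)
    {
        if (value == null) throw new ArgumentNullException(paramName);
    }

    public static void Range(int value, int min, int max, string message)
    {
        if (value < min || value > max)
            throw new ArgumentOutOfRangeException("value", message);
    }
}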
Enough babble, here's the code:
public class StreamReaderEx : StreamReader
{
// NewLine characters (magic value -1: "not used").
int newLine1, newLine2;
// The last character read was the first character of the NewLine symbol AND we are using a two-character symbol.
bool insideNewLine;
// StringBuilder used for ReadLine implementation.
StringBuilder lineBuilder = new StringBuilder();
public StreamReaderEx(string path, string newLine = "\r\n") : base(path)
{
init(newLine);
}
public StreamReaderEx(Stream s, string newLine = "\r\n") : base(s)
{
init(newLine);
}
public string NewLine
{
get { return "" + (char)newLine1 + (char)newLine2; }
private set
{
Guard.NotNull(value, "value");
Guard.Range(value.Length, 1, 2, "Only 1 to 2 character NewLine symbols are supported.");
newLine1 = value[0];
newLine2 = (value.Length == 2 ? value[1] : -1);
}
}
public int LineNumber { get; private set; }
public int LinePosition { get; private set; }
public override int Read()
{
int next = base.Read();
trackTextPosition(next);
return next;
}
public override int Read(char[] buffer, int index, int count)
{
    int n = base.Read(buffer, index, count);
    // (the post is truncated here; presumably each character read is fed
    // through the same position-tracking logic as in Read() above:)
    for (int i = 0; i < n; i++)
        trackTextPosition(buffer[index + i]);
    return n;
}
