How does Windows calculate file size? - c#

I calculate the binary size of a file with this function:
public static int BinarySize(string path)
{
    FileStream fs = new FileStream(path, FileMode.Open);
    int hexIn;
    string ret = "";
    for (int i = 0; (hexIn = fs.ReadByte()) != -1; i++)
    {
        ret += Convert.ToString(hexIn, 2);
    }
    fs.Close();
    return ret.Length;
}
An example of my problem is when I calculate the size of this simple black PNG image (10x10 pixels).
With that function I find 640 bits => 80 bytes, but Windows says the file size is 136 bytes.
Why this difference of 56 bytes? Is it security, permissions, or some private information that Windows attaches to every file?

Convert.ToString(hexIn, 2) does not always return 8 characters; it trims leading zeros, so if hexIn is 4, it returns 100, not 00000100.
You might want to change it to Convert.ToString(hexIn, 2).PadLeft(8, '0');.
Also, you'd want to use a StringBuilder instead of a string for the ret variable.
By the way, reading the whole file to determine its size is a bit wasteful. Better to use the FileInfo class to get file information.
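For illustration, a minimal sketch of the FileInfo approach (the path and method name here are hypothetical):
using System.IO;

public static long FileSizeInBytes(string path)
{
    // FileInfo reads the size from the file system's metadata,
    // so the file contents never need to be read at all.
    return new FileInfo(path).Length;
}

// Hypothetical usage; for the 10x10 PNG from the question this
// should report the 136 bytes that Windows shows.
// long size = FileSizeInBytes(@"C:\black10x10.png");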

Related

C# Filestream Read - Recycle array?

I am working with FileStream.Read: https://msdn.microsoft.com/en-us/library/system.io.filestream.read%28v=vs.110%29.aspx
What I'm trying to do is read a large file in a loop a certain number of bytes at a time; not the whole file at once. The code example shows this for reading:
int n = fsSource.Read(bytes, numBytesRead, numBytesToRead);
The definition of "bytes" is: "When this method returns, contains the specified byte array with the values between offset and (offset + count - 1) replaced by the bytes read from the current source."
I want to read in only 1 MB at a time, so I do this:
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read))
{
    int intBytesToRead = 1024;
    int intTotalBytesRead = 0;
    int intInputFileByteLength = 0;
    byte[] btInputBlock = new byte[intBytesToRead];
    byte[] btOutputBlock = new byte[intBytesToRead];
    intInputFileByteLength = (int)fsInputFile.Length;
    while (intInputFileByteLength - 1 >= intTotalBytesRead)
    {
        if (intInputFileByteLength - intTotalBytesRead < intBytesToRead)
        {
            intBytesToRead = intInputFileByteLength - intTotalBytesRead;
        }
        // *** Problem is here ***
        int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead);
        intTotalBytesRead += n;
        fsOutputFile.Write(btInputBlock, intTotalBytesRead - n, n);
    }
    fsOutputFile.Close();
}
Where the problem area is stated, btInputBlock works on the first cycle because it reads in 1024 bytes. But then on the second loop, it doesn't recycle this byte array. It instead tries to append the new 1024 bytes into btInputBlock. As far as I can tell, you can only specify the offset and length of the file you want to read and not the offset and length of btInputBlock. Is there a way to "re-use" the array that is being dumped into by FileStream.Read or should I find another solution?
Thanks.
P.S. The exception on the read is: "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
Your code can be simplified somewhat:
int num;
byte[] buffer = new byte[1024];
while ((num = fsInputFile.Read(buffer, 0, buffer.Length)) != 0)
{
    // Do your work here
    fsOutputFile.Write(buffer, 0, num);
}
Note that Read takes the array to fill, the offset (which is the offset into the array where the bytes should be placed), and the (max) number of bytes to read.
That's because you're incrementing intTotalBytesRead, which is an offset into the array, not into the file stream. In your case it should always be zero, so each Read overwrites the previous bytes in the array rather than appending after them.
int n = fsInputFile.Read(btInputBlock, intTotalBytesRead, intBytesToRead); //currently
int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead); //should be
FileStream doesn't need an offset; every Read picks up where the last one left off.
See https://msdn.microsoft.com/en-us/library/system.io.filestream.read(v=vs.110).aspx for details.
Your Read call should be Read(btInputBlock, 0, intBytesToRead). The 2nd parameter is the offset into the array you want to start writing the bytes to. Similarly for Write you want Write(btInputBlock, 0, n) as the 2nd parameter is the offset in the array to start writing bytes from. Also you don't need to call Close as the using will clean up the FileStream for you.
using (FileStream fsInputFile = new FileStream(strInputFileName, FileMode.Open, FileAccess.Read))
{
    int intBytesToRead = 1024;
    byte[] btInputBlock = new byte[intBytesToRead];
    while (fsInputFile.Position < fsInputFile.Length)
    {
        int n = fsInputFile.Read(btInputBlock, 0, intBytesToRead);
        fsOutputFile.Write(btInputBlock, 0, n);
    }
}

How to write an array of doubles to a file very fast in C#?

I want to write something like this to a file:
FileStream output = new FileStream("test.bin", FileMode.Create, FileAccess.ReadWrite);
BinaryWriter binWtr = new BinaryWriter(output);
double[] a = new double[1000000]; // this array is filled completely
for (int i = 0; i < 1000000; i++)
{
    binWtr.Write(a[i]);
}
Unfortunately, this code takes very long to run!
(In this example, about 10 seconds!)
The file format is binary.
How can I make that faster?
You should be able to speed up the process by converting your array of doubles to an array of bytes, and then write the bytes in one shot.
This answer shows how to do the conversion (the code below comes from that answer):
static byte[] GetBytes(double[] values) {
    var result = new byte[values.Length * sizeof(double)];
    Buffer.BlockCopy(values, 0, result, 0, result.Length);
    return result;
}
With the array of bytes in hand, you can call the Write overload that takes an array of bytes:
var byteBuf = GetBytes(a);
binWtr.Write(byteBuf);
You're writing the values one by one; of course it's going to be slow.
You could do the writing in memory first and then write the whole buffer to disk at once, like this:
var arr = new double[1000000];
using (var strm = new MemoryStream())
using (var bw = new BinaryWriter(strm))
{
    foreach (var d in arr)
    {
        bw.Write(d);
    }
    bw.Flush();
    File.WriteAllBytes("myfile.bytes", strm.ToArray());
}

Achieving the best possible compression with data consisting of 0s and 1s?

I am trying to achieve the best possible compression for data that consists of just 1s and 0s in a matrix.
To demonstrate what I mean, here's a sample 6 by 6 matrix:
1,0,0,1,1,1
0,1,0,1,1,1
1,0,0,1,0,0
0,1,1,0,1,1
1,0,0,0,0,1
0,1,0,1,0,1
I'd like to compress that into as small a string or byte array as possible. The matrices I will need to compress are bigger though (always 4096 by 4096 1s and 0s).
I suppose it could be compressed quite heavily, but I'm not sure how. I'll mark the best compression as the answer. Performance does not matter.
I assume that you want to compress strings into other strings even though your data really is binary. I don't know what the best compression algorithm is (and that will vary depending on your data), but you can convert the input text into bits, compress these, and then convert the compressed bytes into a string again using base-64 encoding. This will allow you to go from string to string and still apply a compression algorithm of your choice.
The .NET framework provides the class DeflateStream that will allow you to compress a stream of bytes. The first step is to create a custom Stream that will allow you to read and write your text format. For lack of a better name I have named it TextStream. Note that to simplify matters a bit I use \n as the line ending (instead of \r\n).
class TextStream : Stream {
    readonly String text;
    readonly Int32 bitsPerLine;
    readonly StringBuilder buffer;
    Int32 textPosition;

    // Initialize a readable stream.
    public TextStream(String text) {
        if (text == null)
            throw new ArgumentNullException("text");
        this.text = text;
    }

    // Initialize a writeable stream.
    public TextStream(Int32 bitsPerLine) {
        if (bitsPerLine <= 0)
            throw new ArgumentException();
        this.bitsPerLine = bitsPerLine;
        this.buffer = new StringBuilder();
    }

    public override Boolean CanRead { get { return this.text != null; } }
    public override Boolean CanWrite { get { return this.buffer != null; } }
    public override Boolean CanSeek { get { return false; } }
    public override Int64 Length { get { throw new InvalidOperationException(); } }
    public override Int64 Position {
        get { throw new InvalidOperationException(); }
        set { throw new InvalidOperationException(); }
    }

    public override void Flush() {
    }

    public override Int32 Read(Byte[] buffer, Int32 offset, Int32 count) {
        // TODO: Validate buffer, offset and count.
        if (!CanRead)
            throw new InvalidOperationException();
        var byteCount = 0;
        Byte currentByte = 0;
        var bitCount = 0;
        for (; byteCount < count && this.textPosition < this.text.Length; this.textPosition += 1) {
            if (text[this.textPosition] != '0' && text[this.textPosition] != '1')
                continue;
            currentByte = (Byte) ((currentByte << 1) | (this.text[this.textPosition] == '0' ? 0 : 1));
            bitCount += 1;
            if (bitCount == 8) {
                buffer[offset + byteCount] = currentByte;
                byteCount += 1;
                currentByte = 0;
                bitCount = 0;
            }
        }
        if (bitCount > 0) {
            buffer[offset + byteCount] = currentByte;
            byteCount += 1;
        }
        return byteCount;
    }

    public override void Write(Byte[] buffer, Int32 offset, Int32 count) {
        // TODO: Validate buffer, offset and count.
        if (!CanWrite)
            throw new InvalidOperationException();
        for (var i = 0; i < count; ++i) {
            var currentByte = buffer[offset + i];
            for (var mask = 0x80; mask > 0; mask /= 2) {
                if (this.buffer.Length > 0) {
                    if ((this.buffer.Length + 1) % (2 * this.bitsPerLine) == 0)
                        this.buffer.Append('\n');
                    else
                        this.buffer.Append(',');
                }
                this.buffer.Append((currentByte & mask) == 0 ? '0' : '1');
            }
        }
    }

    public override String ToString() {
        if (this.text != null)
            return this.text;
        else
            return this.buffer.ToString();
    }

    public override Int64 Seek(Int64 offset, SeekOrigin origin) {
        throw new InvalidOperationException();
    }

    public override void SetLength(Int64 length) {
        throw new InvalidOperationException();
    }
}
Then you can write methods for compressing and decompressing using DeflateStream. Note that the uncompressed input is a string like the one you have provided in your question and the compressed output is a base-64 encoded string.
String Compress(String text) {
    using (var inputStream = new TextStream(text))
    using (var outputStream = new MemoryStream()) {
        using (var compressedStream = new DeflateStream(outputStream, CompressionMode.Compress))
            inputStream.CopyTo(compressedStream);
        return Convert.ToBase64String(outputStream.ToArray());
    }
}

String Decompress(String compressedText, Int32 bitsPerLine) {
    var bytes = Convert.FromBase64String(compressedText);
    using (var inputStream = new MemoryStream(bytes))
    using (var outputStream = new TextStream(bitsPerLine)) {
        using (var compressedStream = new DeflateStream(inputStream, CompressionMode.Decompress))
            compressedStream.CopyTo(outputStream);
        return outputStream.ToString();
    }
}
To test it I used a method to create a random string (using a fixed seed to always create the same string):
String CreateRandomString(Int32 width, Int32 height) {
    var random = new Random(0);
    var stringBuilder = new StringBuilder();
    for (var i = 0; i < width; ++i) {
        for (var j = 0; j < height; ++j) {
            if (i > 0 && j == 0)
                stringBuilder.Append('\n');
            else if (j > 0)
                stringBuilder.Append(',');
            stringBuilder.Append(random.Next(2) == 0 ? '0' : '1');
        }
    }
    return stringBuilder.ToString();
}
Creating a random 4,096 x 4,096 string gives an uncompressed size of 33,554,431 characters. This is compressed to 2,797,056 characters, which is a reduction to about 8% of the original size.
Skipping the base-64 encoding would increase the compression ratio even more but the output would be binary and not a string. If you also consider the input as binary you actually get the following result for random data with equal probability of 0 and 1:
Input bytes: 4,096 x 4,096 / 8 = 2,097,152
Output bytes: 2,097,792
Size after compression: 100%
Simply converting to bytes is better than doing that followed by a deflate. However, using random input but with 25% 0s and 75% 1s, you get this result:
Input bytes: 4,096 x 4,096 / 8 = 2,097,152
Output bytes: 1,757,846
Size after compression: 84%
How much deflate will compress your data really depends on the nature of the data. If it is completely random, you won't be able to get much compression after converting from text to bytes.
Hmm... as small as possible is not really possible without knowing the problem domain.
Here's the general approach:
Represent the ones and zeros in the array using bits, not bytes or characters or whatever.
Compress using a general-purpose lossless compression algorithm. The two most common are Huffman encoding and some type of LZW.
Huffman can be mathematically proven to provide the best possible compression of data; the catch is that in order to decompress the data you also need the Huffman tree, which may be as big as the original data. LZW gives you compression equivalent to Huffman (within a few percent) for most inputs, but performs best on data with repeating segments such as text.
Implementations of these compression algorithms should be easy to come by (GZIP uses LZ77, an earlier relative of LZW).
For good implementations of modern compression algorithms, go to 7zip.org. It's open source and they have a C API with a DLL, but you'll have to create the .NET interface (unless someone has already made one).
The non-general approach:
This relies on a known characteristic of the data. For example: if you know most of the data is zeroes, you can encode only the coordinates of the ones.
If the data contains patches of ones and zeros, they can be encoded with RLE or two-dimensional variants of the algorithm (a small sketch of plain RLE follows below).
Trying to create your own algorithm for specifically compressing this data will most likely not yield much.
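To illustrate the RLE idea mentioned above, a minimal sketch that turns a bit sequence into a list of run lengths (a toy encoder; the convention that the first run counts zeros is my own assumption, not a standard format):
using System.Collections.Generic;

static List<int> RunLengthEncode(bool[] bits)
{
    // Assumed convention: the first run counts 0s, so a sequence
    // starting with 1 begins with a zero-length run.
    var runs = new List<int>();
    bool current = false;
    int length = 0;
    foreach (bool bit in bits)
    {
        if (bit == current)
        {
            length++;
        }
        else
        {
            runs.Add(length);
            current = bit;
            length = 1;
        }
    }
    runs.Add(length);
    return runs;
}
For example, the bits 1,0,0,1 encode as the runs [0, 1, 2, 1]. This only pays off when the matrix actually contains long runs.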
Create a GZipStream with max CompressionLevel.
Run a 4096x4096 loop:
- set all 64 bits of a ulong to bits of the array
- when 64 bits are done, write the ulong to the compression stream and start at the first bit again
This will very easily pack your matrix into a pretty compressed block of memory.
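A minimal sketch of that idea, assuming the matrix arrives as a bool[4096,4096] (the method name and input type are illustrative; CompressionLevel.Optimal is the strongest standard setting):
using System.IO;
using System.IO.Compression;

static byte[] CompressMatrix(bool[,] matrix)
{
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        using (var writer = new BinaryWriter(gzip))
        {
            ulong packed = 0;
            int bitCount = 0;
            foreach (bool bit in matrix) // iterates row by row
            {
                packed = (packed << 1) | (bit ? 1UL : 0UL);
                if (++bitCount == 64)
                {
                    writer.Write(packed); // flush 64 packed bits at once
                    packed = 0;
                    bitCount = 0;
                }
            }
            // 4096 * 4096 is divisible by 64, so no leftover bits remain here.
        }
        return output.ToArray();
    }
}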
Using Huffman coding you can compress it quite a lot:
0 => 111
1 => 10
, => 0
\r => 1100
\n => 1101
This yields, for your example matrix (in bits):
10011101 11010010 01011001 10111101 00111010 01001011 00110110 01110111
01001110 11111001 10111101 00100111 01001011 00110110 01110111 01110111
01011001 10111101 00111010 0111010
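For concreteness, a minimal sketch of applying that fixed code table (the table is hard-coded here rather than built from measured symbol frequencies, and the resulting bit string would still need to be packed into bytes for storage):
using System.Collections.Generic;
using System.Text;

static string EncodeToBitString(string text)
{
    // Fixed code table from above.
    var codes = new Dictionary<char, string>
    {
        { '0', "111" }, { '1', "10" }, { ',', "0" },
        { '\r', "1100" }, { '\n', "1101" }
    };
    var bits = new StringBuilder();
    foreach (char c in text)
        bits.Append(codes[c]); // concatenate the code for each symbol
    return bits.ToString();
}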
If the commas, line feeds, and carriage returns can be excluded, then you only need a BitArray to store each value, although now you need to know the dimensions of the matrix when decoding. If you don't, you could store them as an int followed by the data itself when serializing.
Something like:
var input = @"1,0,0,1,1,1
0,1,0,1,1,1
1,0,0,1,0,0
0,1,1,0,1,1
1,0,0,0,0,1
0,1,0,1,0,1";
var values = new List<bool>();
foreach (var c in input)
{
    if (c == '0')
        values.Add(false);
    else if (c == '1')
        values.Add(true);
}
var ba = new BitArray(values.ToArray());
Then serialize the BitArray. You'd probably need to record the number of padding bits to properly decode the data in the general case, although 4096 * 4096 is divisible by 8, so no padding is needed here.
The BitArray approach should get you the most compression unless there is a significant amount of repeating patterns in the matrix (yes, I'm assuming the data is mostly random).
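For the serialization step, a minimal sketch (the method name and file path are hypothetical):
using System.Collections;
using System.IO;

static void SaveBits(BitArray ba, string path)
{
    // BitArray packs 8 bits per byte; round up for lengths not divisible by 8.
    // For 4096 * 4096 bits this is exactly 2,097,152 bytes, with no padding.
    var bytes = new byte[(ba.Length + 7) / 8];
    ba.CopyTo(bytes, 0);
    File.WriteAllBytes(path, bytes);
}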

insert hex data at specific offset

I need to be able to insert audio data into existing ac3 files. AC3 files are pretty simple and can be appended to each other without stripping headers or anything. The problem I have is that if you want to add/overwrite/erase a chunk of an ac3 file, you have to do it in 32ms increments, and each 32ms is equal to 1536 bytes of data. So when I insert a data chunk (which must be 1536 bytes, as I just said), I need to find the nearest offset that is divisible by 1536 (like 0, 1536 (0x600), 3072 (0xC00), etc). Let's say I can figure that out. I've read about changing a particular character at a specific offset, but I need to INSERT (not overwrite) that entire 1536-byte data chunk. How would I do that in C#, given the starting offset and the 1536-byte data chunk?
Edit: The data chunk I want to insert is basically just 32ms of silence, and I have the hex, ASCII and ANSI text translations of it. Of course, I may want to insert this chunk multiple times to get 128ms of silence instead of just 32, for example.
byte[] filbyte = File.ReadAllBytes(@"C:\abc.ac3");
byte[] tobeinserted = null; // allocate in your way using whatever encoding
byte[] total = new byte[filbyte.Length + tobeinserted.Length];
for (int i = 0, j = 0; i < total.Length; )
{
    if (i == 1536 * pos) // make pos your choice
    {
        while (j < tobeinserted.Length)
            total[i++] = tobeinserted[j++];
    }
    else
    {
        total[i] = filbyte[i - j];
        i++;
    }
}
File.WriteAllBytes(@"C:\abc.ac3", total);
Here is the helper method that will do what you need:
public static void Insert(string filepath, int insertOffset, Stream dataToInsert)
{
    var newFilePath = filepath + ".tmp";
    using (var source = File.OpenRead(filepath))
    using (var destination = File.OpenWrite(newFilePath))
    {
        CopyTo(source, destination, insertOffset); // first copy the data before the insert point
        dataToInsert.CopyTo(destination);          // write the data that needs to be inserted
        CopyTo(source, destination, (int)(source.Length - insertOffset)); // copy the remaining data
    }
    // delete the old file and rename the new one:
    File.Delete(filepath);
    File.Move(newFilePath, filepath);
}
private static void CopyTo(Stream source, Stream destination, int count)
{
    const int bufferSize = 32 * 1024;
    var buffer = new byte[bufferSize];
    var remaining = count;
    while (remaining > 0)
    {
        var toCopy = remaining > bufferSize ? bufferSize : remaining;
        var actualRead = source.Read(buffer, 0, toCopy);
        destination.Write(buffer, 0, actualRead);
        remaining -= actualRead;
    }
}
And here is an NUnit test with example usage:
[Test]
public void TestInsert()
{
    var originalString = "some original text";
    var insertString = "_ INSERTED TEXT _";
    var insertOffset = 8;
    var file = @"c:\someTextFile.txt";
    if (File.Exists(file))
        File.Delete(file);
    using (var originalData = new MemoryStream(Encoding.ASCII.GetBytes(originalString)))
    using (var f = File.OpenWrite(file))
        originalData.CopyTo(f);
    using (var dataToInsert = new MemoryStream(Encoding.ASCII.GetBytes(insertString)))
        Insert(file, insertOffset, dataToInsert);
    var expectedText = originalString.Insert(insertOffset, insertString);
    var actualText = File.ReadAllText(file);
    Assert.That(actualText, Is.EqualTo(expectedText));
}
Be aware that I have removed some checks for code clarity; do not forget to check for null, file access permissions, and file size. For example, insertOffset can be bigger than the file length; this condition is not checked here.
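For example, a few of the omitted guard clauses might look like this (a sketch, not an exhaustive list):
if (filepath == null)
    throw new ArgumentNullException("filepath");
if (dataToInsert == null)
    throw new ArgumentNullException("dataToInsert");
if (insertOffset < 0 || insertOffset > new FileInfo(filepath).Length)
    throw new ArgumentOutOfRangeException("insertOffset");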

How Write a List<int> to binary file (4 bytes long)?

I need to write a List of ints to a binary file (each int 4 bytes long), so I need to make sure that the binary file is correct. I do the following:
using (FileStream fileStream = new FileStream(binaryFileName, FileMode.Create)) // destination file
{
    using (BinaryWriter binaryWriter = new BinaryWriter(fileStream))
    {
        for (int i = 0; i < frameCodes.Count; i++)
        {
            binaryWriter.Write(frameCodes[i]);
            binaryWriter.Write(4);
        }
        binaryWriter.Close();
    }
}
at this line: binaryWriter.Write(4); I give the size, is that correct?
at this line "binaryWriter.Write(4);" I give the size, that's correct??
No, it's not correct. The line binaryWriter.Write(4); will write the integer 4 into the stream (e.g. something like 00000000 00000000 00000000 00000100).
This line is correct: binaryWriter.Write(frameCodes[i]);. It writes the integer frameCodes[i] into the stream. Since an integer requires 4 bytes, exactly 4 bytes will be written.
Of course, if your list contains X entries, then the resulting file will be of size 4*X.
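So if the goal is just the 4-byte ints themselves, drop the extra Write(4) and keep only the one Write call per element; a minimal sketch using the question's own variables:
using (FileStream fileStream = new FileStream(binaryFileName, FileMode.Create))
using (BinaryWriter binaryWriter = new BinaryWriter(fileStream))
{
    for (int i = 0; i < frameCodes.Count; i++)
    {
        binaryWriter.Write(frameCodes[i]); // each int is written as exactly 4 bytes
    }
}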
As per MSDN, these two examples might help you. I know it's not close to an answer, but it will help you get the concept.
using System;

public class Example
{
    public static void Main()
    {
        int value = -16;
        Byte[] bytes = BitConverter.GetBytes(value);

        // Convert bytes back to Int32.
        int intValue = BitConverter.ToInt32(bytes, 0);
        Console.WriteLine("{0} = {1}: {2}",
            value, intValue,
            value.Equals(intValue) ? "Round-trips" : "Does not round-trip");

        // Convert bytes to UInt32.
        uint uintValue = BitConverter.ToUInt32(bytes, 0);
        Console.WriteLine("{0} = {1}: {2}", value, uintValue,
            value.Equals(uintValue) ? "Round-trips" : "Does not round-trip");
    }
}
byte[] bytes = { 0, 0, 0, 25 };
// If the system architecture is little-endian (that is, little end first),
// reverse the byte array.
if (BitConverter.IsLittleEndian)
    Array.Reverse(bytes);
int i = BitConverter.ToInt32(bytes, 0);
Console.WriteLine("int: {0}", i);
