I have a binary file which I am reading into a collection of byte arrays.
The file contains an arbitrary number of records; each record is essentially a block of bytes of arbitrary length.
The header of the file provides the offsets of each of the records.
record 0: offset 2892
record 1: offset 4849
....
record 98: offset 328932
record 99: offset 338498
I have written code to loop through and read each record into its byte array. The difference between consecutive offsets gives me the record size. A seek to the offset followed by a call to ReadBytes() reads the record into its array.
My current incomplete solution won't work for the last record. How would you read that last record into an array (remember, it is of arbitrary length)?
As for why: each record is encrypted and needs to be decrypted separately. I am writing code which will read each record into a byte array, decrypt it, and then write all the records back to a file.
Code added at request:
//recordOffsets contain byte location of each record start. All headers (other than universal header) are contained within record 0.
recordBlocks = new List<RecordBlock>();

//store all recordOffsets. Record0 offset will be used to load rest of headers. Remaining are used to parse text of eBook.
for (int i = 0; i < standardHeader.numRecs; i++)
{
    RecordBlock r = new RecordBlock();
    r.offset = bookReader.ReadInt32(EndianReader.Endian.BigEndian);
    r.number = bookReader.ReadInt32(EndianReader.Endian.BigEndian);
    recordBlocks.Add(r);
}

foreach (RecordBlock r in recordBlocks)
{
    if (r.number == recordBlocks.Count - 1) // record numbers are zero-based, so this is the last record
    {
        ///deal with last record
    }
    else
    {
        r.size = recordBlocks[r.number + 1].offset - r.offset;
    }
    bookReader.Seek(r.offset, SeekOrigin.Begin);
    r.data = bookReader.ReadBytes(r.size);
}
System.IO.File.ReadAllBytes() will read the whole file into a byte array, and after that you can read from that byte array record by record.
You could use the Length property of the FileInfo class to determine the total number of bytes, so that you can calculate the size of the last record as well.
So you can keep most of your current logic.
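For example, a minimal sketch of that idea, reusing the names from the question's code (bookFilePath is a placeholder for wherever the file path is kept, and this assumes the last record runs to the end of the file):

// Hypothetical: bookFilePath is the path the reader was opened on.
long fileLength = new System.IO.FileInfo(bookFilePath).Length;

foreach (RecordBlock r in recordBlocks)
{
    if (r.number == recordBlocks.Count - 1)
    {
        // Last record: it runs from its offset to the end of the file.
        r.size = (int)(fileLength - r.offset);
    }
    else
    {
        r.size = recordBlocks[r.number + 1].offset - r.offset;
    }
    bookReader.Seek(r.offset, SeekOrigin.Begin);
    r.data = bookReader.ReadBytes(r.size);
}

If the format allows trailing data after the last record, you would need its length from the header instead.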
Your problem, it seems to me, is how you will get the actual record size of the last record.
You could add this information to the header explicitly; then your code would work like a charm, in my opinion.
I'm not a .NET guy, but it would seem you have a couple of options. There's got to be a way to tell the size of the file; if you can find that, you can read everything. Alternatively, the MSDN description for BinaryReader.ReadBytes() says that if you ask for more than the stream contains, you'll get whatever is left in the file. Do you know the max size of the blob you're reading? If so, just read that into pre-cleared memory.
I have to translate a project from C# to R. In this C# project I have to handle binary files.
I have three problems:
1. I am having some issues converting this code:
//C#
//this works fine
using (BinaryReader rb = new BinaryReader(archive.Entries[0].Open())){
    a = rb.ReadInt32();
    b = rb.ReadInt32();
    c = rb.ReadDouble();
}
#R
#this works, but it reads different values
#I tried to change the size in readBin, but it's the same story. The working directory is the right one
to.read <- "myBinaryFile.tmp"
line1 <- c(readBin(to.read, "integer", 2),
           readBin(to.read, "double", 1))
2. How can I read a float (in C# I have rb.ReadSingle()) in R?
3. Is there a function in R to remember the position you have reached when reading a binary file, so that next time you read it you can skip what you have already read (as in C# with BinaryReader)?
Answering your questions directly:
I am having some issues converting this code...
What is the problem here? Your code block contains the comment "but it's the same story", but what is the story? You haven't explained anything here. If your problem is with the double, you should try setting readBin(..., size = 8). In your case, your code would read line1 <- c(readBin(to.read,"integer", 2), readBin(to.read, "double", 1, 8)).
How can I read a float (in C# I have rb.ReadSingle()) in R?
Floats are 4 bytes in size in this case (I would presume), so set size = 4 in readBin().
Is there a function in R to remember the position you have reached when reading a binary file, so that next time you read it you can skip what you have already read (as in C# with BinaryReader)?
As far as I know there is nothing available (more knowledgeable people are welcome to add their inputs). You could, however, easily write a wrapper for readBin() that does this for you. For instance, you could specify how many bytes you want to discard (i.e., this can correspond to the n bytes that you have already read into R), and read in that many bytes via a dummy readBin() call like so: readBin(con = yourinput, what = "raw", n = n), where the integer n indicates the number of bytes you wish to throw away. Thereafter, you could have your wrapper read the succeeding bytes into a variable of your choice.
I'm loading a binary file into a memory stream, modifying the bytes, and then storing the file to disk. However, to save time, I retain the modified byte array to calculate a checksum. When I load the saved file from disk and calculate the checksum, the file length is about 150 bytes different from the byte array's length at the time it was saved, and obviously the checksum doesn't match the one computed before saving. Any ideas as to why this happens? I've searched and searched for clues, but it looks like I'd have to reload the file after it was saved to calculate an accurate checksum.
Also note that the shorter byte array renders its contents correctly, and so does the longer byte array; in fact, the two render identically!
Here's the code that collects the modified bytes from the memory stream:
writerStream.Flush();
storedFile = new Byte[writerStream.Length];
writerStream.Position = 0;
writerStream.Read(storedFile, 0, Convert.ToInt32(writerStream.Length));
And here's how I read the file:
using (BinaryReader readFile = new BinaryReader(Delimon.Win32.IO.File.Open(filePath, Delimon.Win32.IO.FileMode.Open)))
{
    byte[] cgmBytes = readFile.ReadBytes(Convert.ToInt32(readFile.BaseStream.Length));
    hash = fileCheck.ComputeHash(cgmBytes);
}
And here's how the file is saved:
using (BinaryWriter aWriter = new BinaryWriter(Delimon.Win32.IO.File.Create(filePath)))
{
    aWriter.Write(storedFile);
}
Any suggestions would be much appreciated.
Thx
The problem seems to have resolved itself by simply changing the point where the stream position is set:
writerStream.Flush();
writerStream.Position = 0;
storedFile = new Byte[writerStream.Length];
writerStream.Read(storedFile, 0, Convert.ToInt32(writerStream.Length));
In the previous code the Position was set after reading the stream length; now the position is set before reading the stream length. In either case the byte length DOES NOT CHANGE, but the saved file, when retrieved, now returns the identical byte length. Why? Not sure; setting the stream position does not affect the stream length, nor should it affect how a newly instantiated writer decides to save the byte array. Gremlins?
What's the best and the fastest method to remove an item from a binary file?
I have a binary file and I know that I need to remove B number of bytes from a position A, how to do it?
Thanks
You might want to consider working in batches to prevent allocation on the LOH, but that depends on the size of your file and the frequency with which you call this logic.
long skipIndex = 100;
int skipLength = 40;

using (FileStream fileStream = File.Open("file.dat", FileMode.Open))
{
    int bufferSize;
    checked
    {
        bufferSize = (int)(fileStream.Length - (skipLength + skipIndex));
    }
    byte[] buffer = new byte[bufferSize];

    // read all data after
    fileStream.Position = skipIndex + skipLength;
    fileStream.Read(buffer, 0, bufferSize);

    // write to displacement
    fileStream.Position = skipIndex;
    fileStream.Write(buffer, 0, bufferSize);

    fileStream.SetLength(fileStream.Position); // trim the file
}
Depends... There are a few ways to do this, depending on your requirements.
The basic solution is to read chunks of data from the source file into a target file, skipping over the bits that must be removed (is it always only one segment to remove, or multiple segments?). After you're done, delete the original file and rename the temp file to the original's name.
Things to keep in mind here are that you should tend towards larger chunks rather than smaller. The size of your files will determine a suitable value. 1MB is a good 'default'.
The simple approach assumes that deleting and renaming a new file is not a problem. If you have specific permissions attached to the file, or used NTFS streams or some-such, this approach won't work.
In that case, make a copy of the original file. Then, skip to the first byte after the segment to ignore in the copied file, skip to the start of the segment in the source file, and transfer bytes from copy to original. If you're using Streams, you'll want to call Stream.SetLength to truncate the original to the correct size.
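For what it's worth, here is a rough, untested sketch of the basic temp-file approach (the method and variable names are placeholders, A and B are the position and length from the question, error handling is omitted, and a using System.IO; directive is assumed):

// Copies everything except B bytes starting at offset A into a temp file,
// then swaps the temp file in place of the original.
static void RemoveSegment(string path, long a, long b)
{
    string tempPath = path + ".tmp";
    byte[] buffer = new byte[1024 * 1024]; // 1MB chunks, as suggested above

    using (FileStream input = File.OpenRead(path))
    using (FileStream output = File.Create(tempPath))
    {
        CopyRange(input, output, 0, a, buffer);                           // bytes before the segment
        CopyRange(input, output, a + b, input.Length - (a + b), buffer);  // bytes after the segment
    }

    File.Delete(path);
    File.Move(tempPath, path);
}

static void CopyRange(FileStream input, FileStream output, long start, long count, byte[] buffer)
{
    input.Position = start;
    while (count > 0)
    {
        int read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
        if (read <= 0) break;
        output.Write(buffer, 0, read);
        count -= read;
    }
}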
If you just want to rewrite the original file and remove a sequence from it, the best way is to "rearrange" the file.
The idea is:
for i = A+1 to file.length - B
    file[i] = file[i+B]
For better performance it's best to read and write in chunks and not single bytes. Test with different chunk sizes to see what works best for your target system.
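A rough, untested sketch of that chunked in-place rearrangement (again, the names are placeholders and the buffer size is an arbitrary choice):

// Shifts everything after the removed segment left by B bytes, in chunks,
// then truncates the file. Works in place, so no temp file is needed.
static void RemoveBytesInPlace(string path, long a, long b)
{
    byte[] buffer = new byte[64 * 1024];

    using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.ReadWrite))
    {
        long readPos = a + b;   // first byte to keep after the removed segment
        long writePos = a;      // where it should end up

        while (readPos < fs.Length)
        {
            fs.Position = readPos;
            int read = fs.Read(buffer, 0, buffer.Length);
            if (read <= 0) break;

            fs.Position = writePos;
            fs.Write(buffer, 0, read);

            readPos += read;
            writePos += read;
        }

        fs.SetLength(fs.Length - b); // trim the now-duplicated tail
    }
}

Because each chunk is read before anything is written over it, the shift is safe even though the read and write ranges overlap.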
I am using a Windows Mobile Compact Edition 6.5 phone and am writing binary data out to a file from Bluetooth. These files get quite large, 16M+, and what I need to do is, once the file is written, search the file for a start character and then delete everything before it, thus eliminating garbage. I cannot do this inline as the data comes in, due to graphing issues and speed, since I get a lot of data coming in and there are already too many if conditions on the incoming data. I figured it was best to post-process. Anyway, here is my dilemma: the search for the start bytes and the rewrite of the file sometimes takes 5 minutes or more. I basically move the file over to a temp file, parse through it, and rewrite a whole new file. I have to do this byte by byte.
private void closeFiles() {
    try {
        // Close file stream for raw data.
        if (this.fsRaw != null) {
            this.fsRaw.Flush();
            this.fsRaw.Close();

            // Move file, seek the first sync bytes,
            // write to fsRaw stream with sync byte and rest of data after it
            File.Move(this.s_fileNameRaw, this.s_fileNameRaw + ".old");
            FileStream fsRaw_Copy = File.Open(this.s_fileNameRaw + ".old", FileMode.Open);
            this.fsRaw = File.Create(this.s_fileNameRaw);

            int x = 0;
            bool syncFound = false;

            // search for sync byte algorithm
            while (x != -1) {
                ... logic to search for sync byte
                if (x != -1 && syncFound) {
                    this.fsRaw.WriteByte((byte)x);
                }
            }

            this.fsRaw.Close();
            fsRaw_Copy.Close();
            File.Delete(this.s_fileNameRaw + ".old");
        }
    } catch (IOException e) {
        CLogger.WriteLog(ELogLevel.ERROR, "Exception in writing: " + e.Message);
    }
}
There has got to be a faster way than this!
------------ Testing times using the answer below ------------
Initial test, my way, with one-byte reads and one-byte writes:
27 Kb/sec
Using the answer below and a 32768-byte buffer:
321 Kb/sec
Using the answer below and a 65536-byte buffer:
501 Kb/sec
You're doing a byte-wise copy of the entire file. That can't be efficient for a load of reasons. Search for the start offset (and end offset if you need both), then copy from one stream to another the entire contents between the two offsets (or the start offset and end of file).
EDIT
You don't have to read the entire contents to make the copy. Something like this (untested, but you get the idea) would work.
private void CopyPartial(string sourceName, byte syncByte, string destName)
{
    using (var input = File.OpenRead(sourceName))
    using (var reader = new BinaryReader(input))
    using (var output = File.Create(destName))
    {
        var start = 0;
        // seek to sync byte
        while (reader.ReadByte() != syncByte)
        {
            start++;
        }
        output.WriteByte(syncByte); // the sync byte itself was consumed by the loop, so write it out

        var buffer = new byte[4096]; // 4k page - adjust as you see fit
        do
        {
            var actual = reader.Read(buffer, 0, buffer.Length);
            output.Write(buffer, 0, actual);
        } while (reader.PeekChar() >= 0);
    }
}
EDIT 2
I actually needed something similar to this today, so I decided to write it without the PeekChar() call. Here's the kernel of what I did - feel free to integrate it in place of the do...while loop above.
var buffer = new byte[1024];
var total = 0;
do
{
    var actual = reader.Read(buffer, 0, buffer.Length);
    writer.Write(buffer, 0, actual);
    total += actual;
} while (total < reader.BaseStream.Length); // note: assumes the copy starts at position 0 of the stream
Don't discount an approach because you're afraid it will be too slow. Try it! It'll only take 5-10 minutes to give it a try and may result in a much better solution.
If the detection process for the start of the data is not too complex/slow, then avoiding writing data until you hit the start may actually make the program skip past the junk data more efficiently.
How to do this:
Use a simple bool to know whether or not you have detected the start of the data. If you are reading junk, then don't waste time writing it to the output, just scan it to detect the start of the data. Once you find the start, then stop scanning for the start and just copy the data to the output. Just copying the good data will incur no more than an if (found) check, which really won't make any noticeable difference to your performance.
You may find that in itself solves the problem. But you can optimise it if you need more performance:
What can you do to minimise the work you do to detect the start of the data? Perhaps if you are looking for a complex sequence you only need to check for one particular byte value that starts the sequence, and it's only if you find that start byte that you need to do any more complex checking. There are some very simple but efficient string searching algorithms that may help in this sort of case too. Or perhaps you can allocate a buffer (e.g. 4kB) and gradually fill it with bytes from your incoming stream. When the buffer is filled, then and only then search for the end of the "junk" in your buffer. By batching the work you can make use of memory/cache coherence to make the processing considerably more efficient than it would be if you did the same work byte by byte.
Do all the other "conditions on the incoming data" need to be continually checked? How can you minimise the amount of work you need to do but still achieve the required results? Perhaps some of the ideas above might help here too?
Do you actually need to do any processing on the data while you are skipping junk? If not, then you can break the whole thing into two phases (skip junk, copy data), and skipping the junk won't cost you anything when it actually matters.
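As a rough illustration of the buffered, two-phase idea described above (untested; the file names, syncByte and the buffer size are all placeholders):

// Phase 1: scan in large chunks until the sync byte is found.
// Phase 2: copy everything from that point on, without per-byte checks.
static void StripLeadingJunk(string sourcePath, string destPath, byte syncByte)
{
    byte[] buffer = new byte[64 * 1024];

    using (FileStream input = File.OpenRead(sourcePath))
    using (FileStream output = File.Create(destPath))
    {
        bool found = false;

        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            int start = 0;
            if (!found)
            {
                start = Array.IndexOf(buffer, syncByte, 0, read);
                if (start < 0)
                    continue;       // the whole chunk is junk, skip it
                found = true;       // sync byte found; keep it and everything after
            }
            output.Write(buffer, start, read - start);
        }
    }
}

Array.IndexOf keeps the per-byte scanning inside a tight library call, and once the sync byte is found the loop degenerates into a plain chunked copy.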
I want to convert an image file to a string. The following works:
MemoryStream ms = new MemoryStream();
Image1.Save(ms, ImageFormat.Jpeg);
byte[] picture = ms.ToArray();
string formmattedPic = Convert.ToBase64String(picture);
However, when saving this to an XmlWriter, it takes ages before it's saved (20 secs for a 26k image file). Is there a way to speed this up?
Thanks,
Raks
There are three points where you are doing large operations needlessly:
Getting the stream's bytes
Converting it to Base64
Writing it to the XmlWriter.
Instead, first call Length and GetBuffer. This lets you operate upon the stream's buffer directly. (Do flush it first, though.)
Then, implement base-64 yourself. It's relatively simple: you take groups of 3 bytes, do some bit-twiddling to get the index of the character each group will be converted to, and then output that character. At the very end you add some = symbols according to how many bytes were in the last block sent (== for one remainder byte, = for two remainder bytes, and none if there were no partial blocks).
Do this writing into a char buffer (a char[]). The most efficient size is a matter for experimentation, but I'd start with 2048 characters. When you've filled the buffer, call XmlWriter.WriteRaw on it, and then start writing back at index 0 again.
This way, you're doing fewer allocations, and you start on the output the moment you've got your image loaded into the memory stream. Generally, this should result in better throughput.
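If you'd rather not hand-roll the base-64 encoder, a simpler variant of the same idea is to let XmlWriter.WriteBase64 encode straight out of the stream's buffer; a minimal sketch, where writer stands for your existing XmlWriter and Image1 is the image from the question:

// Assumes 'writer' is an open XmlWriter positioned inside the element
// that should hold the image data.
using (MemoryStream ms = new MemoryStream())
{
    Image1.Save(ms, ImageFormat.Jpeg);
    ms.Flush();

    // GetBuffer avoids the extra copy that ToArray would make;
    // only the first ms.Length bytes of the buffer are valid.
    byte[] buffer = ms.GetBuffer();
    writer.WriteBase64(buffer, 0, (int)ms.Length);
}

This still avoids the ToArray and Convert.ToBase64String copies, and the writer streams the encoded output instead of building one huge string first.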