I have to translate a project from C# to R. In this C# project I have to handle binary files.
I have three problems:
1. I am having some issues converting this code:
//c#
//this works fine
using (BinaryReader rb = new BinaryReader(archive.Entries[0].Open())){
a = rb.ReadInt32();
b = rb.ReadInt32();
c = rb.ReadDouble();
}
#R
#this works, but it reads different values
#I tried changing the size in readBin, but it's the same story. The working directory is the right one
to.read <- "myBinaryFile.tmp"
line1<-c(readBin(to.read,"integer",2),
readBin(to.read,"double",1))
2. How can I read a float (in C# I have rb.ReadSingle()) in R?
3. Is there a function in R to remember the position you have reached when reading a binary file? That way, the next time you read it, you could skip what you have already read (as in C# with BinaryReader).
Answering your questions directly:
I am having some issues to convert this code...
What is the problem here? Your code block contains the comment "but it's the same story", but what is the story? You haven't explained anything here. If your problem is with the double, you should try setting readBin(..., size = 8). In your case, your code would read line1 <- c(readBin(to.read,"integer", 2), readBin(to.read, "double", 1, 8)).
How can I read a float (in C# I have rb.ReadSingle()) in R?
Floats are 4 bytes in size in this case (I would presume), so set size = 4 in readBin().
Is there a function in R to remember the position you have reached when reading a binary file, so that the next time you read it you can skip what you have already read (as in C# with BinaryReader)?
As far as I know there is nothing built in (more knowledgeable people are welcome to add their input). You could, however, easily write a wrapper for readBin() that does this for you. For instance, you could specify how many bytes you want to discard (i.e., the n bytes that you have already read into R) and consume them via a dummy readBin(), like so: readBin(con = yourinput, what = "raw", n = n), where the integer n indicates the number of bytes you wish to throw away. Thereafter, your wrapper could read the succeeding bytes into a variable of your choice. Note also that if you read from a connection opened with file(to.read, "rb") rather than passing a bare file name, successive readBin() calls continue from wherever the previous one stopped, which may be all you need.
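For illustration, here is the same idea sketched in Python rather than R, since the concept is language-agnostic: an open file handle remembers its position between reads, which is exactly what C#'s BinaryReader gives you. The file name and values below are made up.

```python
import os
import struct
import tempfile

path = os.path.join(tempfile.gettempdir(), "myBinaryFile.tmp")

# Write two Int32s and a Double, little-endian (the .NET BinaryWriter default).
with open(path, "wb") as f:
    f.write(struct.pack("<iid", 7, 42, 3.5))

# Read it back, remembering the position after the two ints.
with open(path, "rb") as f:
    a, b = struct.unpack("<ii", f.read(8))
    pos = f.tell()                         # position right after the two ints
    (c,) = struct.unpack("<d", f.read(8))

# Later, reopen the file and skip what was already read.
with open(path, "rb") as f:
    f.seek(pos)
    (c_again,) = struct.unpack("<d", f.read(8))
```

In R the closest equivalent is keeping a connection open and, if needed, using seek() on it.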
Related
I'm attempting to read a file and process it in both C# and IronPython, but I'm running into a slight problem.
When I read the file in either language, I get a byte array that's almost identical, but not quite.
For instance, the array has 1552 bytes. They're all the same except for one thing. Any time the value "10" appears in the Python implementation, the value "13" appears in the C# implementation. Aside from that, all other bytes are the same.
Here's roughly what I'm doing to get the bytes:
Python:
f = open('C:\myfile.blah')
contents = f.read()
bytes = bytearray(contents, 'cp1252')
C#:
var bytes = File.ReadAllBytes(@"C:\myfile.blah");
Perhaps I'm choosing the wrong encoding? Though I wouldn't think so, since the Python implementation behaves as I would expect and processes the file successfully.
Any idea what's going on here?
(I don't know Python.) But it looks like you need to pass the 'rb' flag:
open('C:\myfile.blah', 'rb')
Reference:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written.
Note that the values 10 and 13 give clues as to what the problem is:
Line feed is 10 in decimal and Carriage return is 13 in decimal.
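A quick way to see this translation for yourself (the file name here is invented):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "newline_demo.bin")

# Store two "lines" separated by CRLF (13, 10), as a Windows text file would.
with open(path, "wb") as f:
    f.write(b"abc\r\ndef")

# Binary mode hands back the stored bytes untouched: the 13 survives.
with open(path, "rb") as f:
    raw = f.read()

# Text mode applies newline translation: the 13,10 pair collapses to 10.
with open(path, "r") as f:
    text = f.read()
```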
I know this question has been asked before, but I can't seem to get it working with the answers I've read. I've got a CSV file of ~1.2GB. If I run the process as 32-bit I get an OutOfMemoryException; it works if I run it as a 64-bit process, but it still takes 3.4GB of memory. I know I'm storing a lot of data in my CustomData class, but still, 3.4GB of RAM? Am I doing something wrong when reading the file?
dict is a dictionary in which I just have a mapping of which property to save something in, depending on the column it's in. Am I reading the file the right way?
StreamReader reader = new StreamReader(File.OpenRead(path));
while (!reader.EndOfStream)
{
    String line = reader.ReadLine();
    String[] values = line.Split(';');
    CustomData data = new CustomData();
    string value;
    for (int i = 0; i < values.Length; i++)
    {
        dict.TryGetValue(i, out value);
        Type targetType = data.GetType();
        PropertyInfo prop = targetType.GetProperty(value);
        if (values[i] == null)
        {
            prop.SetValue(data, "NULL", null);
        }
        else
        {
            prop.SetValue(data, values[i], null);
        }
    }
    dataList.Add(data);
}
There doesn't seem to be anything wrong in your usage of the stream reader: you read a line into memory, then forget it.
However, in C# a string is encoded in memory as UTF-16 so on the average a character consumes 2 bytes in memory.
If your CSV contains also a lot of empty fields that you convert to "NULL" you add up to 7 bytes for each empty field.
So on the whole, since you basically store all the data from your file in memory, it's not really surprising that you require almost 3 times the size of the file in memory.
The actual solution is to parse your data in chunks of N lines, process them, and free them from memory.
Note: consider using a CSV parser; there is more to CSV than just commas or semicolons. What if one of your fields contains a semicolon, a newline, a quote...?
Edit
Actually each string takes up 20 + (N/2)*4 bytes in memory; see C# in Depth.
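The chunk-based approach suggested above can be sketched like this (Python for brevity; the file name is made up). Note that a real CSV parser also copes with quoted fields that contain the delimiter, which a bare Split cannot:

```python
import csv
import io

def process_in_chunks(lines, chunk_size):
    """Yield parsed rows in chunks of chunk_size, so only one chunk
    is held in memory at a time instead of the whole file."""
    reader = csv.reader(lines, delimiter=";")
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# Usage with an in-memory sample; with a real file you would pass
# open("data.csv", newline="") instead.
sample = io.StringIO("a;b;c\n1;;3\nx;y;z\n")
chunks = list(process_in_chunks(sample, chunk_size=2))
```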
Ok, a couple of points here.
As pointed out in the comments, .NET under x86 can only consume about 1.5GB per process, so consider that your maximum memory in 32-bit.
The StreamReader itself will have an overhead. I don't know if it caches the entire file in memory, or not (maybe someone can clarify?). If so, reading and processing the file in chunks might be a better solution
The CustomData class: how many fields does it have, and how many instances are created? Note you will need 32 bits for each reference in x86 and 64 bits for each reference in x64. So if you have a CustomData class with 10 fields of type System.Object, each CustomData instance requires 88 bytes before storing any data.
The dataList.Add at the end. I assume you are adding to a generic List? If so, note that List employs a doubling algorithm to resize. If you have 1GB in a List and it requires 1 more byte in size, it will create a 2GB array and copy the 1GB into it on resize. So all of a sudden the 1GB + 1 byte actually requires 3GB to manipulate. Another alternative is to use a pre-sized array.
I'm working on the new WindowsPhone platform. I have a few instances of a SoundEffectInstance that I would like to combine into a single new sound file (either SoundEffectInstance, SoundEffect, or MediaElement; it does not matter). I then want to save that file as an mp3 to the phone.
How do I do that? Normally, I would try to send all the files to a bytearray but I'm not sure if that is the correct method here, or how to convert the bytearray into an MP3 format sound.
So for example I have SoundEffectInstance soundBackground, playing from 0 to 5 seconds. I then have SoundEffectInstance chime playing from 3 to 4 seconds, and SoundEffectInstance foreground playing from 3.5 to 7 seconds. I want to combine all these into a single mp3 file that lasts 7 seconds.
There are two tasks that you are trying to accomplish here:
Combine several sound files into a single sound file
Save the resulting file as an MP3.
As far as I have found thus far you will have a good bit of challenges with item 2. To date I have not found a pure .Net MP3 encoder. All the ones I find rely on P/Invokes to native code (Which of course won't work on the phone).
As for combining the files, you don't want to treat them as a SoundEffectInstance. That class is only meant for playing and it abstracts most of the details of the sound file away. Instead you will need to treat the sound files as arrays of ints. I'm going to assume that the sample rate on all three sound files is the exact same and that these are 16-bit recordings. I am also going to assume that these wave files are recorded in mono. I'm keeping the scenario simple for now. You can extend upon it with stereo and various sample rates after you've mastered this simpler scenario.
The first 48 bytes of the wave files are nothing but header. Skip past that (for now) and read the contents of the wave files into their own arrays. Once they are all read we can start mixing them together. Ignoring the time differences at which you want these sounds to start playing, if we wanted to produce a sample that is the combined result of all three, we could do it by adding the values in the sound file arrays together and writing the result out to an array.
But there's a problem: 16-bit numbers can only go up to 32,767 (and down to -32,768). If the combined value of all three sounds goes beyond these limits you'll get really bad distortion. The easiest (though not necessarily the best) way to handle this is to consider the maximum number of simultaneous sounds that will play and scale the values down accordingly. From the 3.5 second to 4 second mark you will have all three sounds playing, so we will scale by dividing by three. Another way is to sum the samples using a data type that can go beyond this range and then normalize the values back to the 16-bit range when you are done mixing.
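Before moving to the code, the core of this idea can be shown in a few lines (Python, toy numbers; offsets are in samples rather than seconds): sum the 16-bit samples into a plain integer accumulator so nothing overflows, then scale down by the number of simultaneous sources.

```python
def mix(sounds, length):
    """sounds is a list of (offset, samples) pairs; offset is in samples."""
    acc = [0] * length                  # wide accumulator: no 16-bit overflow
    for offset, samples in sounds:
        for i, s in enumerate(samples):
            if offset + i < length:
                acc[offset + i] += s
    n = len(sounds)                     # worst-case simultaneous sources
    return [v // n for v in acc]        # crude scale-down; normalizing at the
                                        # end is the alternative described above

# Two sounds whose overlap would overflow a 16-bit short without
# the wider accumulator.
mixed = mix([(0, [30000, 30000, 30000]), (1, [30000, 30000])], 4)
```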
Let's define some parameters.
int SamplesPerSecond = 22000;
int ResultRecordingLength = 7;
short[] Sound01;
short[] Sound02;
short[] Sound03;
int[] ResultantSound;
//Insert code to populate the sound arrays here.
// Sound01.Length will equal 5.0*SamplesPerSecond
// Sound02.Length will equal 1.0*SamplesPerSecond
// Sound03.Length will equal 3.5*SamplesPerSecond
ResultantSound = new int[ResultRecordingLength*SamplesPerSecond];
Once you've got your sound files read and the array prepared for receiving the resulting file you can start rendering. There's several ways we could go about this. Here is one:
void InitResultArray(int[] resultArray)
{
    for (int i = 0; i < resultArray.Length; ++i)
    {
        resultArray[i] = 0;
    }
}

void RenderSound(short[] sourceSound, int[] resultArray, double timeOffset)
{
    int startIndex = (int)(timeOffset * SamplesPerSecond);
    for (int readIndex = 0; (readIndex < sourceSound.Length) && (readIndex + startIndex < resultArray.Length); ++readIndex)
    {
        resultArray[readIndex + startIndex] += (int)sourceSound[readIndex];
    }
}

void RangeAdjust(int[] resultArray)
{
    int max = int.MinValue;
    int min = int.MaxValue;
    for (int i = 0; i < resultArray.Length; ++i)
    {
        max = Math.Max(max, resultArray[i]);
        min = Math.Min(min, resultArray[i]);
    }
    //I want the range normalized to [-32,768..32,767];
    //you may want to normalize differently.
    double scale = 65535d / (double)(max - min);
    double offset = 32767 - (max * scale);
    for (int i = 0; i < resultArray.Length; ++i)
    {
        resultArray[i] = (int)(scale * resultArray[i] + offset);
    }
}
You would call InitResultArray to ensure the result array is filled with zeros (I believe it is by default, but I still prefer to set it to zero explicitly) and then call RenderSound() for each sound that you want in your result. After you've rendered your sounds, call RangeAdjust to normalize the sound. All that's left is to write it to a file. You'll need to convert from ints back to shorts.
short[] writeBuffer = new short[ResultantSound.Length];
for(int i=0;i<writeBuffer.Length;++i)
writeBuffer[i]=(short)ResultantSound[i];
Now the mixed sound is all ready to write to the file. There is just one thing missing, you need to write the 48 byte wave header before writing the file. I've written code on how to do that here: http://www.codeproject.com/KB/windows-phone-7/WpVoiceMemo.aspx
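As a point of comparison, here is a sketch of the final write step in Python targeting WAV rather than MP3 (the samples and file name are made up): the standard wave module builds the RIFF header for you, so none of it is hand-assembled.

```python
import os
import struct
import tempfile
import wave

samples = [0, 8000, 16000, 8000, 0, -8000, -16000, -8000]
path = os.path.join(tempfile.gettempdir(), "mixed.wav")

with wave.open(path, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(22000)    # matches SamplesPerSecond above
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Read it back to confirm the round trip.
with wave.open(path, "rb") as w:
    frames = w.getnframes()
    data = w.readframes(frames)
```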
C#.NET 4.0
I'm having an interesting problem here with reading a custom file archive format. In C#, I wrote a program that creates an archive header (some overhead info about the archive as a whole: number of files, those kinds of things). It then takes an input file to be stored, reads its bytes, and then writes some overhead about the file (filename, type, size and such) followed by the actual file data. I can also extract files from the archive through this program. To test it, I stored a png image and extracted it by reading the filesize from the overhead, allocating an array of bytes of that size, pulling the file data into that array, and writing it with a stream writer. No big deal, worked fine. Now, we go to the C++ side...
C++
My C++ program needs to read the file data in, determine the file type, and then pass it off to the appropriate processing class. The processing classes were giving errors, which they shouldn't have. So I decided to write the file data out from the C++ program after reading it using fwrite(), and the resulting file appears to be damaged. In a nutshell, this is the code being used to read the file...
unsigned char * data = 0;
char temp = 0;
__int64 fileSize = 0;
fread(&fileSize, sizeof(__int64), 1, _fileHandle);
data = new unsigned char[fileSize];
for (__int64 i = 0; i < fileSize; i++)
{
    fread(&temp, 1, 1, _fileHandle);
    data[i] = temp;
}
(I'm at work right now, so I just wrote this from memory. However, I'm 99% positive it's accurate to my code at home. I'm also not concerned with non MS Standards at the moment, so please bear with the __int64.)
I haven't gone through all 300-something thousand bytes to determine if everything is consistent, but the first 20 or so bytes that I looked at appear to be correct. I don't exactly see why there is a problem. Is there something funny about fread()? Also, to double-check the file in the archive, I removed all the archive overhead and saved just the image data to a new png image with Notepad, which worked fine.
Should I be reading this differently? Is there something wrong with using fread() to read in this data?
Given that the first n bytes appear to be correct, did you by chance forget to open the file in binary mode ("rb")? If you didn't then it's helpfully converting any sequences of \r\n into \n for you which would obviously not be what you want.
Since this question is tagged C++ did you consider using the canonical C++ approach of iostreams rather than the somewhat antiquated FILE* streams from C?
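The same pattern can be sketched in Python (file name invented): write a record as an 8-byte little-endian length followed by the payload, then read it back. The "rb"/"wb" modes matter here for exactly the reason given above; in text mode on Windows, the 13,10 pair inside the payload would be silently rewritten. The payload is also read in one bulk call rather than a byte-at-a-time loop.

```python
import os
import struct
import tempfile

payload = bytes([0x89, 0x50, 0x4E, 0x47, 13, 10, 26, 10])  # PNG-style magic
path = os.path.join(tempfile.gettempdir(), "record.bin")

with open(path, "wb") as f:
    f.write(struct.pack("<q", len(payload)))   # 8-byte signed size, like __int64
    f.write(payload)

with open(path, "rb") as f:
    (size,) = struct.unpack("<q", f.read(8))
    data = f.read(size)                        # one bulk read for the payload
```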
Today I'm cutting video at work (yay me!), and I came across a strange video format: a MOD file with a companion MOI file.
I found this article online from the wiki, and I wanted to write a file format handler, but I'm not sure how to begin.
I want to write a file format handler to read the information files. Has anyone ever done this, and how would I begin?
Edit:
Thanks for all the suggestions, I'm going to attempt this tonight, and I'll let you know. The MOI files are not very large, maybe 5KB in size at most (I don't have them in front of me).
You're in luck in that the MOI format at least spells out the file definition. All you need to do is read in the file and interpret the results based on the file definition.
Following the definition, you should be able to create a class that could read and interpret a file which returns all of the file format definitions as properties in their respective types.
Reading the file requires opening the file and generally reading it on a byte-by-byte progression, such as:
using (FileStream fs = File.OpenRead(path-to-your-file))
{
    while (true)
    {
        int b = fs.ReadByte();
        if (b == -1)
        {
            break;
        }
        //Interpret byte or bytes here....
    }
}
Per the wiki article's referenced PDF, it looks like someone already reverse engineered the format. From the PDF, here's the first entry in the format:
Hex-Address: 0x00
Data Type: 2 Byte ASCII
Value (Hex): "V6"
Meaning: Version
So, a simplistic implementation could pull the first 2 bytes of data from the file stream and convert to ASCII, which would provide a property value for the Version.
Next entry in the format definition:
Hex-Address: 0x02
Data Type: 4 Byte Unsigned Integer
Value (Hex):
Meaning: Total size of MOI-file
Interpreting the next 4 bytes and converting to an unsigned int would provide a property value for the MOI file size.
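A minimal sketch of reading those first two fields, in Python for brevity. The byte order of the 4-byte integer is an assumption here, so verify it against the PDF; this sketch uses big-endian, and the sample bytes are fabricated, not a real MOI file.

```python
import io
import struct

def read_moi_header(stream):
    version = stream.read(2).decode("ascii")              # 0x00: 2-byte ASCII, e.g. "V6"
    (total_size,) = struct.unpack(">I", stream.read(4))   # 0x02: 4-byte unsigned size
    return version, total_size

# Fabricated 6-byte sample standing in for the start of a MOI file:
sample = io.BytesIO(b"V6" + struct.pack(">I", 5120))
version, total_size = read_moi_header(sample)
```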
Hope this helps.
If the files are very large and just need to be streamed in, I would create a new reader object that uses an UnmanagedMemoryStream to read the information in.
I've done a lot of different file format processing like this. More recently, I've taken to making a lot of my readers more functional where reading tends to use 'yield return' to return read only objects from the file.
However, it all depends on what you want to do. If you are trying to create a general purpose format for use in other applications or create an API, you probably want to conform to an existing standard. If however you just want to get data into your own application, you are free to do it however you want. You could use a binaryreader on the stream and construct the information you need within your app, or get the reader to return objects representing the contents of the file.
The one thing I would recommend: make sure it implements IDisposable and that you wrap it in a using!
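The 'yield return' reader plus the IDisposable/using pattern maps naturally onto Python generators and context managers; here is a minimal sketch with an invented record layout (a 1-byte tag, a 2-byte little-endian length, then the payload), purely to illustrate the shape of such a reader.

```python
import io
import struct

def read_records(stream):
    """Lazily yield (tag, payload) records, one at a time, so the
    whole file never has to sit in memory at once."""
    while True:
        head = stream.read(3)
        if len(head) < 3:
            break                                  # end of stream
        (tag, length) = struct.unpack("<BH", head)
        yield tag, stream.read(length)

# The generator plays the role of C#'s 'yield return', and the with
# block plays the role of IDisposable/using: the stream is closed
# whether or not iteration finishes.
blob = bytes([1, 3, 0]) + b"abc" + bytes([2, 1, 0]) + b"z"
with io.BytesIO(blob) as stream:
    records = list(read_records(stream))
```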