I don't know C++ very well, especially the I/O part. Can anyone please help me translate the following C++ code into C#?
unsigned *PostingOffset, *PostingLength, NumTerms;
void LoadSubIndex(char *subindex) {
FILE *in = fopen(subindex, "rb");
if (in == 0) {
printf("Error opening sub-index file '%s'!\n", subindex);
exit(EXIT_FAILURE);
}
int len=0;
// Array of terms
char **Term;
char *TermList;
fread(&NumTerms, sizeof(unsigned), 1, in);
PostingOffset = (unsigned*)malloc(sizeof(unsigned) * NumTerms);
PostingLength = (unsigned*)malloc(sizeof(unsigned) * NumTerms);
Term = (char**)malloc(sizeof(char*) * NumTerms);
Term = (char**)malloc(sizeof(char*) * NumTerms);
// Offset of each posting
fread(PostingOffset, sizeof(unsigned), NumTerms, in);
// Length of each posting in bytes
fread(PostingLength, sizeof(unsigned), NumTerms, in);
// Number of bytes in the posting terms array
fread(&len, sizeof(unsigned), 1, in);
TermList = (char*)malloc(sizeof(char) * len);
fread(TermList, sizeof(unsigned)*len, 1, in);
unsigned k=1;
Term[0] = &TermList[0];
for (int i=1; i<len; i++) {
if (TermList[i-1] == '\0') {
Term[k] = &TermList[i];
k++;
}
}
fclose(in);
}
Thanks in advance.
I'll give you a headstart.
using (var reader = new BinaryReader(new FileStream(subindex, FileMode.Open))) {
uint numTerms = reader.ReadUInt32();
postingOffset = new UInt32[numTerms];
postingLength = new UInt32[numTerms];
var term = new byte[numTerms];
for(int i=0;i<numTerms;i++)
postingOffset[i] = reader.ReadUInt32();
for(int i=0;i<numTerms;i++)
postingLength[i] = reader.ReadUInt32();
var len = reader.ReadInt32();
var termList = new ... // byte[] or uint32[] ??
//etc
}
There's no need to close the file handle here - it will be closed when the using { } block goes out of scope.
I didn't finish it because there are some flaws in your code. With TermList you are reading in 4 times as much data as you've allocated: fread(TermList, sizeof(unsigned)*len, 1, in) reads 4*len bytes into a buffer of len chars. You shouldn't be allocating Term twice either - that will leak memory.
To turn Term back into a string, use Encoding.ASCII.GetString(term).TrimEnd('\0');
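Putting those fixes together, a complete translation might look like the sketch below. The class and field names are assumptions (the C++ version used globals), and the NUL-separated term buffer is split directly into strings rather than kept as pointers into a byte array:

```csharp
using System;
using System.IO;
using System.Text;

class SubIndex
{
    public uint[] PostingOffset;
    public uint[] PostingLength;
    public string[] Terms;

    public void Load(string subindex)
    {
        using (var reader = new BinaryReader(File.Open(subindex, FileMode.Open)))
        {
            uint numTerms = reader.ReadUInt32();
            PostingOffset = new uint[numTerms];
            PostingLength = new uint[numTerms];
            // Offset of each posting
            for (int i = 0; i < numTerms; i++)
                PostingOffset[i] = reader.ReadUInt32();
            // Length of each posting in bytes
            for (int i = 0; i < numTerms; i++)
                PostingLength[i] = reader.ReadUInt32();
            // Number of bytes in the posting terms array; note the C++
            // version over-reads here (sizeof(unsigned) * len bytes) -
            // the term list is len bytes of NUL-separated ASCII.
            int len = reader.ReadInt32();
            byte[] termList = reader.ReadBytes(len);
            Terms = Encoding.ASCII.GetString(termList)
                .Split(new[] { '\0' }, StringSplitOptions.RemoveEmptyEntries);
        }
    }
}
```

This replaces the manual pointer bookkeeping (Term[k] = &TermList[i]) with a single Split on the NUL separators.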
I am trying to replicate the behavior of a Perl script in my C# code. When we convert any value into a Byte[], it should look the same irrespective of the language used. So I have this function call, which looks like this in Perl:
$diag_cmd = pack("V", length($s_part)) . $s_part;
where $s_part is the value returned by the following function, which reads the .pds file at C:\Users\c_desaik\Desktop\DIAG\PwrDB\offtarget\data\get_8084_gpio.pds:
sub read_pds
{
my $bin_s;
my $input_pds_file = $_[0];
open(my $fh, '<', $input_pds_file) or die "cannot open file $input_pds_file";
{
local $/;
$bin_s = <$fh>;
}
close($fh);
return $bin_s;
}
My best guess is that this function is reading the .pds file and turning it into a Byte array.
Now, I tried to replicate the behavior into c# code like following
static byte[] ConstructPacket()
{
List<byte> retval = new List<byte>();
retval.AddRange(System.IO.File.ReadAllBytes(@"C:\Users\c_desaik\Desktop\DIAG\PwrDB\offtarget\data\get_8084_gpio.pds"));
return retval.ToArray();
}
But the resulting byte array does not look same. Is there any special mechanism that I have to follow to replicate the behavior of pack("V", length($s_part)) . $s_part ?
As Simon Whitehead mentioned, the template character V tells pack to pack your values into unsigned 32-bit integers in little-endian order. So you need to convert your bytes to a list (or array) of unsigned integers.
For example:
static uint[] UnpackUint32(string filename)
{
var retval = new List<uint>();
using (var filestream = System.IO.File.Open(filename, System.IO.FileMode.Open))
{
using (var binaryStream = new System.IO.BinaryReader(filestream))
{
var pos = 0;
while (pos < binaryStream.BaseStream.Length)
{
retval.Add(binaryStream.ReadUInt32());
pos += 4;
}
}
}
return retval.ToArray();
}
And call this function:
var list = UnpackUint32(@"C:\Users\c_desaik\Desktop\DIAG\PwrDB\offtarget\data\get_8084_gpio.pds");
Update
If you wanna read one length-prefixed string or a list of them, you can use this function:
private string[] UnpackStrings(string filename)
{
var retval = new List<string>();
using (var filestream = System.IO.File.Open(filename, System.IO.FileMode.Open))
{
using (var binaryStream = new System.IO.BinaryReader(filestream))
{
var pos = 0;
while ((pos + 4) <= binaryStream.BaseStream.Length)
{
// read the length of the string
var len = binaryStream.ReadUInt32();
// read the bytes of the string
var byteArr = binaryStream.ReadBytes((int) len);
// cast these bytes to chars and append them to a StringBuilder
var sb = new StringBuilder();
foreach (var b in byteArr)
sb.Append((char)b);
// add the new string to our collection of strings
retval.Add(sb.ToString());
// calculate start position of next value
pos += 4 + (int) len;
}
}
}
return retval.ToArray();
}
pack("V", length($s_part)) . $s_part
which can also be written as
pack("V/a*", $s_part)
creates a length-prefixed string. The length is stored as a 32-bit unsigned little-endian number.
+----------+----------+----------+----------+-------- ...
| Length | Length | Length | Length | Bytes
| ( 7.. 0) | (15.. 8) | (23..16) | (31..24) |
+----------+----------+----------+----------+-------- ...
This is how you recreate the original string from the bytes:
1. Read 4 bytes.
2. If using a machine other than a little-endian machine, rearrange the bytes into the native order.
3. Cast those bytes into a 32-bit unsigned integer.
4. Read a number of bytes equal to that number.
5. Convert that sequence of bytes into a string.
Some languages provide tools that perform more than one of these steps.
I don't know C#, so I can't write the code for you, but I can give you an example in two other languages.
In Perl, this would be written as follows:
sub read_bytes {
my ($fh, $num_bytes_to_read) = @_;
my $buf = '';
while ($num_bytes_to_read) {
my $num_bytes_read = read($fh, $buf, $num_bytes_to_read, length($buf));
if (!$num_bytes_read) {
die "$!\n" if !defined($num_bytes_read);
die "Premature EOF\n";
}
$num_bytes_to_read -= $num_bytes_read;
}
return $buf;
}
sub read_uint32le { unpack('V', read_bytes($_[0], 4)) }
sub read_pstr { read_bytes($_[0], read_uint32le($_[0])) }
my $str = read_pstr($fh);
In C,
int read_bytes(FILE* fh, void* buf, size_t num_bytes_to_read) {
while (num_bytes_to_read) {
size_t num_bytes_read = fread(buf, 1, num_bytes_to_read, fh);
if (!num_bytes_read)
return 0;
num_bytes_to_read -= num_bytes_read;
buf += num_bytes_read;
}
return 1;
}
int read_uint32le(FILE* fh, uint32_t* p_i) {
int ok = read_bytes(fh, p_i, sizeof(*p_i));
if (!ok)
return 0;
{ /* Rearrange bytes on non-LE machines */
const unsigned char* p = (const unsigned char*)p_i;
*p_i = ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | p[0];
}
return 1;
}
char* read_pstr(FILE* fh) {
uint32_t len;
char* buf = NULL;
int ok;
ok = read_uint32le(fh, &len);
if (!ok)
goto ERROR;
buf = malloc(len+1);
if (!buf)
goto ERROR;
ok = read_bytes(fh, buf, len);
if (!ok)
goto ERROR;
buf[len] = '\0';
return buf;
ERROR:
if (buf)
free(buf);
return NULL;
}
char* str = read_pstr(fh);
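For completeness, in C# BinaryReader.ReadUInt32 performs steps 1-3 in a single call (it always reads little-endian regardless of the machine), so a sketch of the same routine might look like:

```csharp
using System;
using System.IO;
using System.Text;

static class Pstr
{
    public static string ReadPstr(BinaryReader reader)
    {
        // Steps 1-3: read 4 bytes as a little-endian unsigned 32-bit length.
        uint len = reader.ReadUInt32();
        // Step 4: read that many bytes.
        byte[] bytes = reader.ReadBytes((int)len);
        if (bytes.Length != len)
            throw new EndOfStreamException("Premature EOF");
        // Step 5: convert that sequence of bytes into a string.
        return Encoding.ASCII.GetString(bytes);
    }
}
```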
I would like to know how I could read a file into ByteArrays that are 4 bytes long.
These arrays will be manipulated and then have to be converted back to a single array ready to be written to a file.
EDIT:
Code snippet.
var arrays = new List<byte[]>();
using (var f = new FileStream("file.cfg.dec", FileMode.Open))
{
for (int i = 0; i < f.Length; i += 4)
{
var b = new byte[4];
var bytesRead = f.Read(b, i, 4);
if (bytesRead < 4)
{
var b2 = new byte[bytesRead];
Array.Copy(b, b2, bytesRead);
arrays.Add(b2);
}
else if (bytesRead > 0)
arrays.Add(b);
}
}
foreach (var b in arrays)
{
BitArray source = new BitArray(b);
BitArray target = new BitArray(source.Length);
target[26] = source[0];
target[31] = source[1];
target[17] = source[2];
target[10] = source[3];
target[30] = source[4];
target[16] = source[5];
target[24] = source[6];
target[2] = source[7];
target[29] = source[8];
target[8] = source[9];
target[20] = source[10];
target[15] = source[11];
target[28] = source[12];
target[11] = source[13];
target[13] = source[14];
target[4] = source[15];
target[19] = source[16];
target[23] = source[17];
target[0] = source[18];
target[12] = source[19];
target[14] = source[20];
target[27] = source[21];
target[6] = source[22];
target[18] = source[23];
target[21] = source[24];
target[3] = source[25];
target[9] = source[26];
target[7] = source[27];
target[22] = source[28];
target[1] = source[29];
target[25] = source[30];
target[5] = source[31];
var back2byte = BitArrayToByteArray(target);
arrays.Clear();
arrays.Add(back2byte);
}
using (var f = new FileStream("file.cfg.enc", FileMode.Open))
{
foreach (var b in arrays)
f.Write(b, 0, b.Length);
}
EDIT 2:
Here is the Ugly Betty-looking code that accomplishes what I wanted. Now I must refine it for performance...
var arrays_ = new List<byte[]>();
var arrays_save = new List<byte[]>();
var arrays = new List<byte[]>();
using (var f = new FileStream("file.cfg.dec", FileMode.Open))
{
for (int i = 0; i < f.Length; i += 4)
{
var b = new byte[4];
var bytesRead = f.Read(b, 0, b.Length);
if (bytesRead < 4)
{
var b2 = new byte[bytesRead];
Array.Copy(b, b2, bytesRead);
arrays.Add(b2);
}
else if (bytesRead > 0)
arrays.Add(b);
}
}
foreach (var b in arrays)
{
arrays_.Add(b);
}
foreach (var b in arrays_)
{
BitArray source = new BitArray(b);
BitArray target = new BitArray(source.Length);
target[26] = source[0];
target[31] = source[1];
target[17] = source[2];
target[10] = source[3];
target[30] = source[4];
target[16] = source[5];
target[24] = source[6];
target[2] = source[7];
target[29] = source[8];
target[8] = source[9];
target[20] = source[10];
target[15] = source[11];
target[28] = source[12];
target[11] = source[13];
target[13] = source[14];
target[4] = source[15];
target[19] = source[16];
target[23] = source[17];
target[0] = source[18];
target[12] = source[19];
target[14] = source[20];
target[27] = source[21];
target[6] = source[22];
target[18] = source[23];
target[21] = source[24];
target[3] = source[25];
target[9] = source[26];
target[7] = source[27];
target[22] = source[28];
target[1] = source[29];
target[25] = source[30];
target[5] = source[31];
var back2byte = BitArrayToByteArray(target);
arrays_save.Add(back2byte);
}
using (var f = new FileStream("file.cfg.enc", FileMode.Open))
{
foreach (var b in arrays_save)
f.Write(b, 0, b.Length);
}
EDIT 3:
Loading a big file into byte arrays of 4 bytes wasn't the smartest idea...
I have over 68 million arrays being processed and manipulated. I really wonder if it's possible to load it into a single array and still have the bit manipulation work. :/
Here's another way, similar to @igofed's solution:
var arrays = new List<byte[]>();
using (var f = new FileStream("test.txt", FileMode.Open))
{
for (int i = 0; i < f.Length; i += 4)
{
var b = new byte[4];
var bytesRead = f.Read(b, 0, 4);
if (bytesRead < 4)
{
var b2 = new byte[bytesRead];
Array.Copy(b, b2, bytesRead);
arrays.Add(b2);
}
else if (bytesRead > 0)
arrays.Add(b);
}
}
//make changes to arrays
using (var f = new FileStream("test-out.txt", FileMode.Create))
{
foreach (var b in arrays)
f.Write(b, 0, b.Length);
}
Regarding your "Edit 3" ... I'll bite, although it's really a diversion from the original question.
There's no reason you need Lists of arrays, since you're just breaking up the file into a continuous list of 4-byte sequences, looping through and processing each sequence, and then looping through and writing each sequence. You can do much better. NOTE: The implementation below does not check for or handle input files whose lengths are not exactly multiples of 4. I leave that as an exercise to you, if it is important.
To directly address your comment, here is a single-array solution. We'll ditch the List objects, read the whole file into a single byte[] array, and then copy out 4-byte sections of that array to do your bit transforms, then put the result back. At the end we'll just slam the whole thing into the output file.
byte[] data;
using (Stream fs = File.OpenRead("E:\\temp\\test.bmp")) {
data = new byte[fs.Length];
fs.Read(data, 0, data.Length);
}
byte[] element = new byte[4];
for (int i = 0; i < data.Length; i += 4) {
Array.Copy(data, i, element, 0, element.Length);
BitArray source = new BitArray(element);
BitArray target = new BitArray(source.Length);
target[26] = source[0];
target[31] = source[1];
// ...
target[5] = source[31];
target.CopyTo(data, i);
}
using (Stream fs = File.OpenWrite("E:\\temp\\test_out.bmp")) {
fs.Write(data, 0, data.Length);
}
All of the ugly initial read code is gone since we're just using a single byte array. Notice I reserved a single 4-byte array before the processing loop to re-use, so we can save the garbage collector some work. Then we loop through the giant data array 4 bytes at a time and copy them into our working array, use that to initialize the BitArrays for your transforms, and then the last statement in the block converts the BitArray back into a byte array, and copies it directly back to its original location within the giant data array. This replaces BitArrayToByteArray method, since you did not provide it. At the end, writing is also easy since it's just slamming out the now-transformed giant data array.
When I ran your original solution I got an OutOfMemoryException on my original test file of 100MB, so I used a 44MB file. It consumed 650MB in memory and ran in 30 seconds. The single-array solution used 54MB of memory and ran in 10 seconds. Not a bad improvement, and it demonstrates how costly holding onto millions of small array objects is.
Here is what you want:
using (var reader = new StreamReader("inputFileName"))
{
using (var writer = new StreamWriter("outputFileName"))
{
char[] buff = new char[4];
int readCount = 0;
while((readCount = reader.Read(buff, 0, 4)) > 0)
{
//manipulations with buff
writer.Write(buff);
}
}
}
IEnumerable<byte[]> arraysOf4Bytes = File
.ReadAllBytes(path)
.Select((b,i) => new{b, i})
.GroupBy(x => x.i / 4)
.Select(g => g.Select(x => x.b).ToArray());
Is there a more efficient way to convert a byte array to an Int16 array? Or is there a way to use Buffer.BlockCopy to copy every two bytes into an Int16 array?
public static int[] BYTarrToINT16arr(string fileName)
{
try
{
int bYte = 2;
byte[] buf = File.ReadAllBytes(fileName);
int bufPos = 0;
int[] data = new int[buf.Length/2];
byte[] bt = new byte[bYte];
for (int i = 0; i < buf.Length/2; i++)
{
Array.Copy(buf, bufPos, bt, 0, bYte);
bufPos += bYte;
Array.Reverse(bt);
data[i] = BitConverter.ToInt16(bt, 0);
}
return data;
}
catch
{
return null;
}
}
Use a FileStream and a BinaryReader. Something like this:
var int16List = new List<Int16>();
using (var stream = new FileStream(filename, FileMode.Open))
using (var reader = new BinaryReader(stream))
{
try
{
while (true)
int16List.Add(reader.ReadInt16());
}
catch (EndOfStreamException)
{
// We've read the whole file
}
}
return int16List.ToArray();
You can also read the whole file into a byte[], and then use a MemoryStream instead of the FileStream if you want.
If you do this then you'll also be able to size the List appropriately up front and make it a bit more efficient.
Apart from having an off-by-one possibility in case the number of bytes is odd (you'll miss the last byte) your code is OK. You can optimize it by dropping the bt array altogether, swapping i*2 and i*2+1 bytes before calling BitConverter.ToInt16, and passing i*2 as the starting index to the BitConverter.ToInt16 method.
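A sketch of that suggestion follows; the method name is adapted from the original, and the behavior matches it (including the per-pair byte swap for big-endian input and dropping an odd trailing byte):

```csharp
using System;
using System.IO;

static class Int16Reader
{
    public static int[] BytArrToInt16Arr(string fileName)
    {
        byte[] buf = File.ReadAllBytes(fileName);
        // Integer division drops an odd trailing byte, as the original did.
        int[] data = new int[buf.Length / 2];
        for (int i = 0; i < data.Length; i++)
        {
            // Swap the pair in place instead of copying into a temp array...
            byte tmp = buf[i * 2];
            buf[i * 2] = buf[i * 2 + 1];
            buf[i * 2 + 1] = tmp;
            // ...and convert directly at offset i * 2, no bt array needed.
            data[i] = BitConverter.ToInt16(buf, i * 2);
        }
        return data;
    }
}
```

This removes one Array.Copy and one Array.Reverse call per element while producing the same values as the original loop.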
This works if you don't mind using InteropServices. I assume it is faster than the other techniques.
using System.Runtime.InteropServices;
public Int16[] Copy_Byte_Buffer_To_Int16_Buffer(byte[] buffer)
{
Int16[] result = new Int16[1];
int size = buffer.Length;
if ((size % 2) != 0)
{
/* Error here */
return result;
}
else
{
result = new Int16[size/2];
IntPtr ptr_src = Marshal.AllocHGlobal (size);
Marshal.Copy (buffer, 0, ptr_src, size);
Marshal.Copy (ptr_src, result, 0, result.Length);
Marshal.FreeHGlobal (ptr_src);
return result;
}
}
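Since the question specifically asked about Buffer.BlockCopy: it can blit the raw bytes straight into a short[] in one call. Note that, unlike the original code's Array.Reverse, it keeps the machine's native byte order, so this only produces the same values when the file's byte order matches the machine's (little-endian on typical hardware):

```csharp
using System;
using System.IO;

static class BlockCopyReader
{
    public static short[] BytesToInt16(string fileName)
    {
        byte[] buf = File.ReadAllBytes(fileName);
        short[] data = new short[buf.Length / 2];
        // One raw memory copy; no per-element loop and no reversal.
        Buffer.BlockCopy(buf, 0, data, 0, data.Length * 2);
        return data;
    }
}
```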
var bytes = File.ReadAllBytes(path);
var ints = bytes.Where((b, i) => i % 2 == 0 && i + 1 < bytes.Length)
                .Select((b, i) => BitConverter.ToInt16(bytes, i * 2));
if (bytes.Length % 2 == 1)
{
    ints = ints.Concat(new[] { BitConverter.ToInt16(new byte[] { bytes[bytes.Length - 1], 0 }, 0) });
}
return ints.ToArray();
try...
int index = 0;
var my16s = bytes.GroupBy(x => (index++) / 2)
.Select(x => BitConverter.ToInt16(x.Reverse().ToArray(),0)).ToList();
I am honestly really confused about reading binary files in C#.
I have C++ code for reading binary files:
FILE *pFile = fopen(filename, "rb");
uint n = 1024;
uint readC = 0;
do {
short* pChunk = new short[n];
readC = fread(pChunk, sizeof (short), n, pFile);
} while (readC > 0);
and it reads the following data:
-156, -154, -116, -69, -42, -36, -42, -41, -89, -178, -243, -276, -306,...
I tried convert this code to C# but cannot read such data. Here is code:
using (var reader = new BinaryReader(File.Open(filename, FileMode.Open)))
{
sbyte[] buffer = new sbyte[1024];
for (int i = 0; i < 1024; i++)
{
buffer[i] = reader.ReadSByte();
}
}
and i get the following data:
100, -1, 102, -1, -116, -1, -69, -1, -42, -1, -36
How can i get similar data?
A short is not a signed byte, it's a signed 16 bit value.
short[] buffer = new short[1024];
for (int i = 0; i < 1024; i++) {
buffer[i] = reader.ReadInt16();
}
That's because in C++ you're reading shorts and in C# you're reading signed bytes (that's what SByte means). You should use reader.ReadInt16().
Your C++ code reads 2 bytes at a time (you're using sizeof(short)), while your C# code reads one byte at a time. A SByte (see http://msdn.microsoft.com/en-us/library/d86he86x(v=vs.71).aspx) uses 8 bits of storage.
You should use the same data type to get the correct output or cast to a new type.
In C++ you are using short (I suppose the file is also written as shorts), so use short in C# as well, or you can use System.Int16. You are getting different values because short and sbyte are not equivalent: short is 2 bytes and sbyte is 1 byte.
using (var reader = new BinaryReader(File.Open(filename, FileMode.Open)))
{
System.Int16[] buffer = new System.Int16[1024];
for (int i = 0; i < 1024; i++)
{
buffer[i] = reader.ReadInt16();
}
}
The idea: Being able to take the bytes of any struct, send those bytes across a TcpClient (or through my Client wrapper), then have the receiving client load those bytes and use pointers to "paint" them onto a new struct.
The problem: It reads the bytes into the buffer perfectly; it reads the array of bytes on the other end perfectly. The "paint" operation, however, fails miserably. I write a new Vector3(1F, 2F, 3F); I read a Vector3(0F, 0F, 0F)...Obviously, not ideal.
Unfortunately, I don't see the bug - If it works one way, it should work the reverse - And the values are being filled in.
The write/read functions are as follows:
public static unsafe void Write<T>(Client client, T value) where T : struct
{
int n = System.Runtime.InteropServices.Marshal.SizeOf(value);
byte[] buffer = new byte[n];
{
var handle = System.Runtime.InteropServices.GCHandle.Alloc(value, System.Runtime.InteropServices.GCHandleType.Pinned);
void* ptr = handle.AddrOfPinnedObject().ToPointer();
byte* bptr = (byte*)ptr;
for (int t = 0; t < n; ++t)
{
buffer[t] = *(bptr + t);
}
handle.Free();
}
client.Writer.Write(buffer);
}
public static unsafe T Read<T>(Client client) where T : struct
{
T r = new T();
int n = System.Runtime.InteropServices.Marshal.SizeOf(r);
{
byte[] buffer = client.Reader.ReadBytes(n);
var handle = System.Runtime.InteropServices.GCHandle.Alloc(r, System.Runtime.InteropServices.GCHandleType.Pinned);
void* ptr = handle.AddrOfPinnedObject().ToPointer();
byte* bptr = (byte*)ptr;
for (int t = 0; t < n; ++t)
{
*(bptr + t) = buffer[t];
}
handle.Free();
}
return r;
}
Help, please, thanks.
Edit:
Well, one major problem is that I'm getting a handle to a temporary copy created when I passed in the struct value.
Edit2:
Changing "T r = new T();" to "object r = new T();" and "return r" to "return (T)r" boxes and unboxes the struct and, in the meantime, makes it a reference, so the pointer actually points to it.
However, it is slow. I'm getting 13,500 - 14,500 write/reads per second.
Edit3:
OTOH, serializing/deserializing the Vector3 through a BinaryFormatter gets about 750 writes/reads per second. So a lot slower than what I was using. :)
Edit4:
Sending the floats individually got 8,400 RW/second. Suddenly, I feel much better about this. :)
Edit5:
Tested GCHandle allocation pinning and freeing; 28,000,000 ops per second (Compared to about 1,000,000,000 Int32/int add+assign/second. So compared to integers, it's 35 times slower. However, that's still comparatively fast enough). Note that you don't appear to be able to pin classes, even if GCHandle does auto-boxed structs fine (GCHandle accepts values of type "object").
Now, if the C# guys update constraints to the point where the pointer allocation recognizes that "T" is a struct, I could just assign directly to a pointer, which is...Yep, incredibly fast.
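For what it's worth, later C# and .NET versions added exactly this: with the `unmanaged` constraint and System.Runtime.InteropServices.MemoryMarshal, a struct can be copied to and from bytes with no GCHandle, no boxing, and no unsafe blocks. A sketch (the Vector3 here is a stand-in for whatever struct is being sent):

```csharp
using System;
using System.Runtime.InteropServices;

struct Vector3
{
    public float X, Y, Z;
    public Vector3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

static class StructBytes
{
    // Copy any unmanaged struct into a fresh byte[] by viewing a
    // one-element span of T as a span of bytes.
    public static byte[] ToBytes<T>(T value) where T : unmanaged
    {
        Span<T> one = stackalloc T[1];
        one[0] = value;
        return MemoryMarshal.AsBytes(one).ToArray();
    }

    // Reinterpret the first sizeof(T) bytes of the buffer as a T.
    public static T FromBytes<T>(byte[] buffer) where T : unmanaged
    {
        return MemoryMarshal.Read<T>(buffer);
    }
}
```

This is the "assign directly to a pointer" path the edit wishes for, expressed through spans instead of raw pointers.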
Next up: Probably testing write/read using separate threads. :) See how much the GCHandle actually affects the send/receive delay.
As it turns out:
Edit6:
double start = Timer.Elapsed.TotalSeconds;
for (t = 0; t < count; ++t)
{
Vector3 from = new Vector3(1F, 2F, 3F);
// Vector3* ptr = &test;
// Vector3* ptr2 = &from;
int n = sizeof(Vector3);
if (n / 4 * 4 != n)
{
// This gets 9,000,000 ops/second;
byte* bptr1 = (byte*)&test;
byte* bptr2 = (byte*)&from;
// int n = 12;
for (int t2 = 0; t2 < n; ++t2)
{
*(bptr1 + t2) = *(bptr2 + t2);
}
}
else
{
// This speedup gets 24,000,000 ops/second.
int n2 = n / 4;
int* iptr1 = (int*)&test;
int* iptr2 = (int*)&from;
// int n = 12;
for (int t2 = 0; t2 < n2; ++t2)
{
*(iptr1 + t2) = *(iptr2 + t2);
}
}
}
So, overall, I don't think the GCHandle is really slowing things down. (Those who are thinking this is a slow way of assigning one Vector3 to another, remember that the purpose is to serialize structs into a byte[] buffer to send over a network. And, while that's not what we're doing here, it would be rather easy to do with the first method).
Edit7:
The following got 6,900,000 ops/second:
for (t = 0; t < count; ++t)
{
Vector3 from = new Vector3(1F, 2F, 3F);
int n = sizeof(Vector3);
byte* bptr2 = (byte*)&from;
byte[] buffer = new byte[n];
for (int t2 = 0; t2 < n; ++t2)
{
buffer[t2] = *(bptr2 + t2);
}
}
...Help! I've got IntruigingPuzzlitus! :D