Binary Writer/Reader extra character [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 7 years ago.
I am converting some legacy VB6 code to C# and this has me a little baffled. The VB6 code wrote certain data sequentially to a file. Each record is always 110 bytes. I can read this file just fine in the converted code, but I'm having trouble when I write the file from the converted code.
Here is a stripped down sample I wrote real quick in LINQPad:
void Main()
{
    int[,] data = new[,]
    {
        {
            0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
        },
        {
            20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39
        }
    };
    using ( MemoryStream stream = new MemoryStream() )
    {
        using ( BinaryWriter writer = new BinaryWriter( stream, Encoding.ASCII, true ) )
        {
            for( var i = 0; i < 2; i++ )
            {
                byte[] name = Encoding.ASCII.GetBytes( "Blah" + i.ToString().PadRight( 30, ' ' ) );
                writer.Write( name );
                for( var x = 0; x < 20; x++ )
                {
                    writer.Write( data[i,x] );
                }
            }
        }
        using ( BinaryReader reader = new BinaryReader( stream ) )
        {
            // Note the extra +4 is because of the problem below.
            reader.BaseStream.Seek( 30 + ( 20 * 4 ) + 4, SeekOrigin.Begin );
            string name = new string( reader.ReadChars(30) );
            Console.WriteLine( name );
            // This is the problem..This extra 4 bytes should not be here.
            //reader.ReadInt32();
            for( var x = 0; x < 20; x++ )
            {
                Console.WriteLine( reader.ReadInt32() );
            }
        }
    }
}
As you can see, I have a 30 character string written first. The string is NEVER longer than 30 characters and is padded with spaces if it is shorter. After that, twenty 32-bit integers are written. It is always 20 integers. So I know each character in a string is one byte. I know a 32 bit integer is four bytes. So in my reader sample, I should be able to seek 110 bytes ( 30 + (4 * 20) ), read 30 chars, and then read 20 ints and that's my data. However, for some reason, there is an extra 4 bytes being written after the string.
Am I just missing something completely obvious (as is normally the case for myself)? Strings aren't null terminated in .Net and this is four bytes anyway, not just an extra byte? So where is this extra 4 bytes coming from? I'm not directly calling Write(string) so it can't be a prefixed length, which it's obviously not since it's after my string. If you uncomment the ReadInt32(), it produces the desired result.

The extra 4 bytes are from the extra 4 characters you're writing. Change the string you're encoding as ASCII to this:
("Blah" + i.ToString()).PadRight(30, ' ')
That is, pad the string after you've concatenated the prefix and the integer.
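A minimal sketch of the corrected write, assuming the same layout as in the question (a 30-character ASCII name followed by twenty 32-bit integers). The names `RecordSketch`, `WriteRecords`, `ReadName` and `ReadValues` are made up for illustration. With the padding applied to the whole name, each record is exactly 110 bytes, so the second record starts at offset 110 with no stray bytes.

```csharp
using System;
using System.IO;
using System.Text;

static class RecordSketch
{
    public const int NameLength = 30;
    public const int ValueCount = 20;
    public const int RecordSize = NameLength + ValueCount * sizeof(int); // 110 bytes

    // Writes `count` records to memory and returns the raw bytes.
    public static byte[] WriteRecords(int count)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, Encoding.ASCII, true))
            {
                for (var i = 0; i < count; i++)
                {
                    // Pad AFTER concatenating, so the name is exactly 30 characters.
                    string name = ("Blah" + i).PadRight(NameLength, ' ');
                    writer.Write(Encoding.ASCII.GetBytes(name));
                    for (var x = 0; x < ValueCount; x++)
                    {
                        writer.Write(i * ValueCount + x);
                    }
                }
            }
            return stream.ToArray();
        }
    }

    // Reads the 30-character name of the record at the given zero-based index.
    public static string ReadName(byte[] file, int index)
    {
        return Encoding.ASCII.GetString(file, index * RecordSize, NameLength);
    }

    // Reads the twenty integers of the record at the given zero-based index.
    public static int[] ReadValues(byte[] file, int index)
    {
        using (var reader = new BinaryReader(new MemoryStream(file)))
        {
            reader.BaseStream.Seek(index * RecordSize + NameLength, SeekOrigin.Begin);
            var values = new int[ValueCount];
            for (var x = 0; x < ValueCount; x++)
            {
                values[x] = reader.ReadInt32();
            }
            return values;
        }
    }
}
```

Calling `RecordSketch.WriteRecords(2)` should give a 220-byte array, and `ReadName(file, 1)` should return "Blah1" followed by 25 spaces.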

Your extra four bytes are whitespace, because you aren't subtracting the length of 'Blah'. You don't know where you are in your stream. So basically, you think you're writing only 30 chars, but you really wrote 34 chars.
I know you didn't ask this - but you're writing garbage data to a file that doesn't need to be there.
Instead of padding your string with whitespace, you should just include a header or pointer that indicates the length of the next field in your file.
For example, say you have a 120 byte file. The first 4 bytes of the file indicate that the length of the following string is 96 bytes. So you read 4 bytes, get the length and then read 96 bytes. The next 4 bytes say that you have a string that's 16 bytes long, so you read the next 16 bytes and get your next string. This is pretty much how every well defined protocol works.
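The length-prefix layout described above can be sketched as follows. This is only an illustration: it assumes a 4-byte length header (as written by `BinaryWriter.Write(int)`) followed by ASCII bytes, and the names `LengthPrefixSketch`, `WriteField`, `ReadField` and `RoundTrip` are hypothetical. A file written this way would not be byte-compatible with an existing fixed-width format.

```csharp
using System;
using System.IO;
using System.Text;

static class LengthPrefixSketch
{
    // Writes a 4-byte length followed by the unpadded ASCII payload.
    public static void WriteField(BinaryWriter writer, string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        writer.Write(bytes.Length);
        writer.Write(bytes);
    }

    // Reads the 4-byte length, then exactly that many payload bytes.
    public static string ReadField(BinaryReader reader)
    {
        int length = reader.ReadInt32();
        byte[] bytes = reader.ReadBytes(length);
        if (bytes.Length != length)
        {
            throw new EndOfStreamException("Field is truncated.");
        }
        return Encoding.ASCII.GetString(bytes);
    }

    // Writes all fields to memory and reads them back in order.
    public static string[] RoundTrip(params string[] fields)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, Encoding.ASCII, true))
            {
                foreach (string field in fields)
                {
                    WriteField(writer, field);
                }
            }
            stream.Position = 0;
            using (var reader = new BinaryReader(stream, Encoding.ASCII, true))
            {
                var result = new string[fields.Length];
                for (var i = 0; i < result.Length; i++)
                {
                    result[i] = ReadField(reader);
                }
                return result;
            }
        }
    }
}
```

The trade-off is that records are no longer a fixed size, so seeking straight to record N requires either reading the preceding lengths or keeping a separate index.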

Related

Does C# have a way of casting a double array to a string similar to the C++ cast to a char*?

I have inherited C++ code that casts a double array to a char* as shown below. In C#, I have not been able to generate a string from an array of doubles that matches the string generated by the C++ cast. In C#, is there some way to generate a string from a double array that would match the string created in C++, where a simple cast to char* is done? The result of the C++ cast appears to be some kind of binary data.
I want to replace the C++ code that creates the string with C# code that will generate the same string and store it in a database memo field. I want to keep the C++ code that retrieves the string from the database memo field and converts it to a double array for use in calculations.
C++ code that casts the double array Darray to a char *:
char* s = (char*)Darray;
I have tried several things in C# that didn't create the desired string including (obvious compile error):
string s = (string) Darray;
C# code that didn't create identical string to C++ code:
int length = Darray.Length * sizeof(double);
IntPtr pnt = Marshal.AllocHGlobal(length );
Marshal.Copy(Darray, 0, pnt, Darray.Length);
byte[] Barray = new byte[length];
Marshal.Copy(pnt, Barray, 0, length);
string theString = BitConverter.ToString(Barray);
C# code that also didn't create identical string to C++ code:
BinaryFormatter formatter = new BinaryFormatter();
using (MemoryStream m = new MemoryStream())
{
formatter.Serialize(m, Darray);
m.Position = 0;
StreamReader sr = new StreamReader(m);
string theString = sr.ReadToEnd();
}
C# code that also didn't create identical string to C++ code:
byte[] theBytesData = new byte[numBytesReqd];
Buffer.BlockCopy(Darray, 0, theBytesData, 0, numBytesReqd);
string theString = Encoding.ASCII.GetString(theBytesData, 0, theBytesData.Length);
Maybe there is no solution to this problem other than a mixed language program.
For the following C++ code:
double Darray[] = { 1.0,2.0,3.0 };
char* DarrayCp = (char*)Darray;
for (int i = 0; i < blockSize; i++)
{
cout << i << "\tDarrayCp: " << DarrayCp[i] << endl;
}
I get the following output which I'd like to reproduce with C# code:
0 DarrayCp:
1 DarrayCp:
2 DarrayCp:
3 DarrayCp:
4 DarrayCp:
5 DarrayCp:
6 DarrayCp: ð
7 DarrayCp: ?
8 DarrayCp:
9 DarrayCp:
10 DarrayCp:
11 DarrayCp:
12 DarrayCp:
13 DarrayCp:
14 DarrayCp:
15 DarrayCp: @
16 DarrayCp:
17 DarrayCp:
18 DarrayCp:
19 DarrayCp:
20 DarrayCp:
21 DarrayCp:
22 DarrayCp:
23 DarrayCp: @

Since you're just reinterpreting the raw bytes in the C++ code, the output characters you're seeing depend on the encoding used by the console or whatever other output method you're using to test the C++ code.
It looks like ISO-8859-1 gives you the same output as the sample you posted:
var Darray = new double[] { 1.0, 2.0, 3.0 };
var bytes = new byte[Darray.Length * sizeof(double)];
Buffer.BlockCopy(Darray, 0, bytes, 0, bytes.Length);
var str = Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
but it's unclear to me if this conversion is actually useful for whatever you're trying to accomplish, since string and char in C# use UTF-16 characters so the resulting str has a completely different byte representation. bytes already represents the same data as DarrayCp in your C++ source, without any conversion.
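As a rough sketch of the round trip, assuming a little-endian machine and that whatever stores the string preserves every character from U+0000 to U+00FF (including the embedded NUL characters, which a database text field or a C-style string may not): ISO-8859-1 maps each byte to exactly one character, so the bytes can be recovered from the string. The names `DoubleBytesSketch`, `ToLatin1String` and `FromLatin1String` are hypothetical.

```csharp
using System;
using System.Text;

static class DoubleBytesSketch
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Copies the raw bytes of the doubles and maps each byte to one character.
    public static string ToLatin1String(double[] values)
    {
        var bytes = new byte[values.Length * sizeof(double)];
        Buffer.BlockCopy(values, 0, bytes, 0, bytes.Length);
        return Latin1.GetString(bytes);
    }

    // Reverses the mapping: one character back to one byte, then bytes to doubles.
    public static double[] FromLatin1String(string text)
    {
        byte[] bytes = Latin1.GetBytes(text);
        var values = new double[bytes.Length / sizeof(double)];
        Buffer.BlockCopy(bytes, 0, values, 0, values.Length * sizeof(double));
        return values;
    }
}
```

If the storage cannot hold arbitrary characters safely, keeping the `byte[]` itself (for example in a binary column) avoids the text conversion entirely.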

C# Convert Hex String Array to Byte Array

I have a String[] of hex values "10" "0F" "3E" "42" stored.
I found this method to convert to a Byte[]
public static byte[] ToByteArray(String HexString)
{
    int NumberChars = HexString.Length;
    byte[] bytes = new byte[NumberChars / 2];
    for (int i = 0; i < NumberChars; i += 2)
    {
        bytes[i / 2] = Convert.ToByte(HexString.Substring(i, 2), 16);
    }
    return bytes;
}
However this converts the values to the hex equivalent. But the values are already in the hex equivalent!
For example this makes "10" "0F" "3E" "42" into "16" "15" "62" "66".
I want it to directly copy the values as they are already the correct hex value.
Edit:
Basically...
I want a byte array with the literal characters in the String[]. So say the second value in String[] is 0F. I want the second byte in Byte[] to be 0F and not 15.
Any ideas?
Edit2
Let me clarify. I don't want to convert my String[] values into Hexadecimal, as they are already Hexadecimal. I want to directly copy them to a Byte[]
The problem is my string of values "10" "0F" "3E" "42" already has the hexadecimal value I want. I want the byte array to contain those exact values and not convert them; they are already in hexadecimal form.

You have to convert (or parse) the string in order to get a byte, since string and byte are different types:
// 10 == 10d
byte fromDecimal = Convert.ToByte("10"); // if "10" is a decimal representation
// 16 == 0x10
byte fromHex = Convert.ToByte("10", 16); // if "10" is a hexadecimal representation
If you want to process an array, you can try a simple Linq:
using System.Linq;
...
string[] hexValues = new string[] {
"10", "0F", "3E", "42"};
byte[] result = hexValues
.Select(value => Convert.ToByte(value, 16))
.ToArray();
If you want to print out result as hexadecimal, use formatting ("X2" format string - at least 2 hexadecimal digits, use capital letters):
// 10, 0F, 3E, 42
Console.Write(string.Join(", ", result.Select(b => b.ToString("X2"))));
Compare with same array but in a different format ("d2" - at least 2 decimal digits)
// 16, 15, 62, 66
Console.Write(string.Join(", ", result.Select(b => b.ToString("d2"))));
If no format is provided, .NET uses the default one and represents the byte in decimal:
// 16, 15, 62, 66
Console.Write(string.Join(", ", result));

You're really confusing representation and numbers here.
A string like "0F" can be seen as a representation of a number in base 16, that is, in decimal representation, 15.
Which is the exact same thing as representing 15 as F or 0F or XV or
IIIIIIIIIIIIIII or whatever other representation you choose.
Stored as null-terminated ASCII text, the string "0F" actually looks in memory like this
Hexadecimal representation:
0x30 0x46 0x00
Decimal representation:
48 70 0
Binary representation:
0b00110000 0b01000110 0b00000000
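To make the distinction concrete, here is a small sketch that contrasts the bytes of the text itself with the number the text denotes. The names `HexTextSketch`, `TextBytes` and `ParsedValue` are hypothetical; `Encoding.ASCII.GetBytes` and `Convert.ToByte(string, int)` are the standard .NET calls.

```csharp
using System;
using System.Text;

static class HexTextSketch
{
    // The bytes of the characters themselves ('0' and 'F' for "0F").
    // GetBytes does not append a terminating zero.
    public static byte[] TextBytes(string text)
    {
        return Encoding.ASCII.GetBytes(text);
    }

    // The number the text denotes when it is read as base 16.
    public static byte ParsedValue(string text)
    {
        return Convert.ToByte(text, 16);
    }
}
```

`TextBytes("0F")` yields the two bytes 0x30 and 0x46, while `ParsedValue("0F")` yields the single byte 15, which prints as "0F" again with `ToString("X2")`.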

Byte is simply a data type that covers a subset of the integers.
In C#, byte is unsigned and takes integer values ranging from 0 to 2^8-1 (255); the signed type with the range -2^7 (-128) to 2^7-1 (127) is sbyte.
Calling Convert.ToByte(string, 16) simply parses your string as a base-16 number and returns the equivalent value as a byte.
Note that a byte always holds integer data; it is used in place of an int to save space in memory, since it occupies one byte rather than four.
Please note that you will run into an error (an OverflowException) if the hexadecimal value you wish to convert to byte is less than 0 or greater than 255 (0xFF).
The link below shows an instance of this error when I try converting a string whose hexadecimal value is greater than 255.
Error when converting to Byte
You get an error whenever you do this.
I hope my answer and Dmitry Bychenko's shed more light on your problem.
Please feel free to comment if they don't.
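For reference, a small check of where the conversion actually fails, assuming C#'s `byte` (`System.Byte`, unsigned, 0 to 255; the signed -128 to 127 type is `sbyte`). The helper name `TryParseHexByte` is hypothetical.

```csharp
using System;

static class ByteRangeSketch
{
    // Returns the parsed byte, or null when the hex text is malformed
    // or denotes a value that does not fit in a byte.
    public static byte? TryParseHexByte(string hex)
    {
        try
        {
            return Convert.ToByte(hex, 16);
        }
        catch (OverflowException)
        {
            return null; // value above 0xFF
        }
        catch (FormatException)
        {
            return null; // not valid hexadecimal digits
        }
    }
}
```

With this helper, "80" and "FF" parse to 128 and 255, while "100" (256) is rejected.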

c# Convert.To/FromBase64String confusion

Assuming I have this Method.
private static void Example(string data)
{
    Console.WriteLine("Initial : {0}", data);
    data = data.PadRight(data.Length + 1, '0');
    Console.WriteLine("Step 1 : {0}", data);
    data = data.PadRight(data.Length + 4 - data.Length % 4, '=');
    Console.WriteLine("Step 2 : {0}", data);
    byte[] byteArray = Convert.FromBase64String(data);
    string newData = Convert.ToBase64String(byteArray);
    Console.WriteLine("Step 3 : {0}", newData);
}
I expect the output given the input string "1" to be as follows
Initial : 1
Step 1 : 10
Step 2 : 10==
Step 3 : 10==
Instead the output is this.
Initial : 1
Step 1 : 10
Step 2 : 10==
Step 3 : 1w==
And I have no idea why. I would expect the output to be the same as the input but it isn't.
I have tried replacing
data = data.PadRight(data.Length + 1, '0');
with
data = data + "0";
It appears with longer input strings too, for example strings with a length of 5 or 9. It works fine if I add "=" but then I exceed my padding limit with Convert.FromBase64String()
So my question is really: what is going on, and how can I get my expected output?
What am I doing wrong?
Edit: For those confused as to why I'm using base64, it is related to this question: PHP decrypting data with RSA Private Key

Basically, there's no byte array which would be encoded to 10==.
If a base64 string ends with ==, that means that the final 4 characters only represent a single byte. So only the first character and the first 2 bits of the second character are relevant. Looking at the Wikipedia table, 10 means values of:
'1' = 53    '0' = 52
110101      110100
So that's encoding a byte of 1101 0111, and then the final four bits (0100) are ignored. When you re-encode the data, it's using 0s for the final four bits instead, giving:
'1' = 53    'w' = 48
110101      110000
Fundamentally, it's not clear what you're trying to do - but if your input is part of a base64-encoded value, that's pretty odd. The code is behaving the way I'd expect it to - it's just not useful code...
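The behaviour described above can be reproduced with a short sketch. The names `Base64Sketch`, `Decode` and `Reencode` are made up for illustration; the conversions are the same `Convert.FromBase64String` and `Convert.ToBase64String` calls used in the question.

```csharp
using System;

static class Base64Sketch
{
    // Decodes the text; for "10==" only the first 8 of the 12 encoded bits are kept.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }

    // Decodes and re-encodes; the discarded trailing bits come back as zeros.
    public static string Reencode(string text)
    {
        return Convert.ToBase64String(Convert.FromBase64String(text));
    }
}
```

`Decode("10==")` gives the single byte 0xD7 (binary 1101 0111), and `Reencode("10==")` gives "1w==". A string that was produced by a base64 encoder in the first place, such as "1w==", survives the round trip unchanged.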

Parsing Plain Text Table

I'm trying to parse a table in plain text format. The program is written in Visual Studio using C#. I need to parse through the table and insert the data into the database.
Below is a sample table I will be reading in:
ID Name Value1 Value2 Value3 Value4 //header
1 nameA 3.0 0.2 2 6.2
2 nameB
3 nameC 2.9 3.0 7.3
4 nameD 1.5 3.0 1.8 1.1
5 nameE
6 nameF 1.2 2.4 3.3 2.5
7 nameG 3.0 3.2 2.1 4.5
8 nameH 88 12.4 28.9
In the example, I will need to capture data for id 1, 3, 4, 6, 7, and 8.
I thought of two ways to approach this, but neither of them works 100%.
Method 1:
By reading in the header, I can get the start index for each column. I will then use Substring to collect data for each row.
ISSUE: once it gets past a certain row (and I have no idea when this happens), the columns shift, and Substring will no longer collect the correct data.
This method will only collect correct data for 1, 3, and 4.
Method 2:
Using Regex to collect all the matches. I'm hoping this can collect ID, Name, Value1, Value2, Value3, Value4, in this order.
My pattern is (\d*?)\s\s\s+(.*?)\s\s\s+(\d*\.*\d*)\s\s\s+(\d*\.*\d*)\s\s\s+(\d*\.*\d*)\s\s\s+(\d*\.*\d*)
ISSUE: data that are collected are shifted left for some rows. For example, on ID 3, Value2 should be blank, but the regex will be reading Value2 = 3.0, Value3 = 7.3, and Value4 = blank. Same thing goes for ID 8.
Question:
How can I read in the whole table and parse them correctly?
(1) I do not know starting from which row the values will be shifted and
(2) I do not know how many cells it will be shifted by and if they are consistent.
Additional Information
The table is in a PDF file, I converted the PDF to text file so I can read in the data. The shifting data happens when a table goes across multiple pages, but it is not consistent.
EDIT
Below are some actual data:
68 BENZYL ALCOHOL 6.0 0.4 1 7.4
91 EVERNIA PRUNASTRI (OAK MOSS) 34 3 3 10
22 test 2323 23 12

OK, here you go. Use this regex pattern:
NOTE: you have to match this to any single line, not to the whole document! If you want to do it for your whole document then you have to add the 'multiline' modifier ('m'). You can do this by adding (?m) at the beginning of the regex pattern!
EDIT:
You provided some lines of your real data. Here's my updated regex pattern:
^(?<id>\d+)(?:\s{2,25})(?<name>.+?)(?:\s{2,45})(?<val1>\d+(?:\.\d+)?)?(?:\s{2,33})(?<val2>\d+(?:\.\d+)?)?(?:\s{2,14})(?<val3>\d+(?:\.\d+)?)?(?:\s{2,19})(?<val4>\d+(?:\.\d+)?)?$

How about treating this file like a fixed-length file, where you can define each column by an index and length. Once you have defined your fixed length columns, you can just get the value for the column with Substring, then Trim to clean it up.
You can wrap all this up in a Linq statement to project to an anonymous type and filter for the IDs you want.
Something like this:
static void Main(string[] args)
{
    int[] select = new int[] { 1, 3, 4, 6, 7, 8 };
    string[] lines = File.ReadAllLines("TextFile1.txt");
    var q = lines.Skip(1).Select(l => new {
        Id = Int32.Parse(GetValue(l, 0, 6)),
        Name = GetValue(l, 6, 11),
        Value1 = GetValue(l, 17, 11),
        Value2 = GetValue(l, 28, 13),
        Value3 = GetValue(l, 41, 14),
        Value4 = GetValue(l, 55, 13),
    }).Where(o => select.Contains(o.Id));
    var r = q.ToArray();
}
static string GetValue(string line, int index, int length)
{
    string value = null;
    int lineLength = line.Length;
    // Take as much of the line as we can up to column length
    if(lineLength > index)
        value = line.Substring(index, Math.Min(length, lineLength - index)).Trim();
    // Return null if we just have whitespace
    return String.IsNullOrWhiteSpace(value) ? null : value;
}

How to identify proper substring length

I'm trying to read column values from this file starting at the arrow position:
Here's my error:
I'm guessing it's because the length values are wrong.
Say I have column with value :"Dog "
with the word dog and a few spaces after it. Do I have to set the length parameter to 3 (for Dog), or can I set it to 6 to accommodate the spaces after Dog? I ask because each column length is fixed. As you can see, some words are shorter than others, and in order to be consistent I just want to set the length to the maximum column length (ex: 28 is the length of the 3rd column of my file, but not all 28 spots are taken up every time; ex: the word client is only 6 characters long).

Robert Levy's answer is correct for the issue you're seeing - you've attempted to pull a substring from a string with a starting position that is greater than the length of the string.
You're parsing a fixed-length field file, where each field has a certain amount of characters, whether or not it uses all of them, and the pos and len arrays are intended to define those field lengths for use with Substring. As long as the line you're reading matches the expected field starts and lengths, you will be ok. As soon as you come to a line that doesn't match (for example, what appears to be the totals line - 0TotalRecords: 3,390,315) the field length definitions you've been using won't work, as the format has changed (and the line length may not even be the same).
There are a couple of things I would change to make this work. First, I would change your pos and len arrays so that they take the entirety of the field, not part of it. You can use Trim() to get rid of any leading or trailing blanks. As defined, your first field will only take the last number of the Seq# (pos 4, len 1), and your second field will only take the first 5 characters of the field, even though it appears to have space for ~12 characters.
Take a look at this (it's hard to be exact working from the picture, but for purposes of demonstration it will work):
1 2 3 4
01234567890123456789012345678901234567890
Seq# Field Description
3 BELNR ACCOUNTING DOCUMENT NBR
The numbers are the position of each character in the line. I would define the pos array to be the start of the field (0 for the first field, and then the position of the first letter of the field heading for each field after that), so you would have:
Seq# = 0
Field = 6
Description = 18
The len array would hold the length of the field, which I would define as the amount of characters up to the beginning of the next field, like this:
Seq# = 6
Field = 12
Description = 28 (using what you have, as it is hard to tell)
This would make your array initialization the following:
int[] pos = new int[3] { 0, 6, 18 };
int[] len = new int[3] { 6, 12, 28 };
If you wanted the fourth field, it would start at position 46 (pos 18 + len 28 = 46).
The second thing is I would check in the loop to see if the Total Records line is there, and skip that line (most likely it's the last line):
foreach (string line in textBox1.Lines)
{
    if (!line.Contains("Total Records"))
    {
        // Pull each fixed-width field out of the line.
        for (int j = 0; j < pos.Length; j++)
        {
            val[j] = line.Substring(pos[j], len[j]).Trim();
        }
    }
}
Another way to do this would be to modify the original query and add a TakeWhile clause to it to only take lines until you hit the Total Records one:
string[] lines = File.ReadAllLines(ofd.FileName).Skip(8)
.TakeWhile(l => !l.Contains("Total Records")).ToArray();
The above would skip the first 8 lines and take all the remaining lines up to, but not including, the first line to contain "Total Records" in the string.
Then you could do something like this:
string[] lines = File.ReadAllLines(ofd.FileName).Skip(8)
    .TakeWhile(l => !l.Contains("Total Records")).ToArray();
textBox1.Lines = lines;
string[] val = new string[3];
int[] pos = new int[3] { 0, 6, 18 };
int[] len = new int[3] { 6, 12, 28 };
foreach (string line in textBox1.Lines)
{
    // Pull each fixed-width field out of the line.
    for (int j = 0; j < pos.Length; j++)
    {
        val[j] = line.Substring(pos[j], len[j]).Trim();
    }
}
Now you don't have to check for the "Total Records" line.
Of course, if there are other lines in your file, or there are records after the "Total Records" line (which I rather doubt) you'll have to handle those cases as well.
In short, the code for pulling out the substrings will only work for lines that match that particular format (or more specifically, have fields that match those positions/lengths) - anything outside of that will either give you incorrect values or throw an error (if the start position is greater than the length of the string).
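One way to avoid the exception on short lines is to clamp the requested length to what the line actually contains. The following is a sketch under the assumptions used above (three fields starting at positions 0, 6 and 18 with lengths 6, 12 and 28, which are estimates read off the picture); the names `FixedWidthSketch`, `Field` and `ParseLine` are hypothetical.

```csharp
using System;

static class FixedWidthSketch
{
    // Assumed column layout: Seq#, Field, Description.
    static readonly int[] Pos = { 0, 6, 18 };
    static readonly int[] Len = { 6, 12, 28 };

    // Returns the trimmed field, or an empty string when the line ends before the field starts.
    public static string Field(string line, int j)
    {
        if (line.Length <= Pos[j])
        {
            return string.Empty;
        }
        // Never ask Substring for more characters than the line has left.
        int available = Math.Min(Len[j], line.Length - Pos[j]);
        return line.Substring(Pos[j], available).Trim();
    }

    // Extracts all fields of one line.
    public static string[] ParseLine(string line)
    {
        var fields = new string[Pos.Length];
        for (int j = 0; j < Pos.Length; j++)
        {
            fields[j] = Field(line, j);
        }
        return fields;
    }
}
```

Clamping only prevents the exception. Lines in a different format, such as a totals line, still need to be skipped or handled separately, because their fields would be cut at the wrong positions.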

That exception is complaining about the first parameter, which suggests that your file contains a row that is shorter than 18 characters.
