Fixed length strings or structures in C# - c#

I need to create a structure or series of strings that are fixed lenght for a project I am working on. Currently it is written in COBOL and is a communication application. It sends a fixed length record via the web and recieves a fixed length record back. I would like to write it as a structure for simplicity, but so far the best thing I have found is a method that uses string.padright to put the string terminator in the correct place.
I could write a class that encapsulates this and returns a fixed length string, but I'm hoping to find a simple way to fill a structure and use it as a fixed length record.
edit--
The fixed length record is used as a parameter in a URL, so its http:\somewebsite.com\parseme?record="firstname lastname address city state zip". I'm pretty sure I won't have to worry about ascii to unicode conversions since it's in a url. It's a little larger than that and more information is passed than address, about 30 or 35 fields.

Add the MarshalAs tag to your structure. Here is an example:
<StructLayout (LayoutKind.Sequential, CharSet:=CharSet.Auto)> _
Public Structure OSVERSIONINFO
Public dwOSVersionInfoSize As Integer
Public dwMajorVersion As Integer
Public dwMinorVersion As Integer
Public dwBuildNumber As Integer
Public dwPlatformId As Integer
<MarshalAs (UnmanagedType.ByValTStr, SizeConst:=128)> _
Public szCSDVersion As String
End Structure
http://bytes.com/groups/net-vb/369711-defined-fixed-length-string-structure

You could use the VB6 compat FixedLengthString class, but it's pretty trivial to write your own.
If you need parsing and validation and all that fun stuff, you may want to take a look at FileHelpers which uses attributes to annotate and parse fixed length records.
FWIW, on anything relatively trivial (data processing, for instance) I'd probably just use a ToFixedLength() extension method that took care of padding or truncating as needed when writing out the records. For anything more complicated (like validation or parsing), I'd turn to FileHelpers.

byte[] arrayByte = new byte[35];
// Fill arrayByte with the other function
//Convert array of byte to string chain
string arrayStringConverr = Encoding.ASCII.GetString(arrayByte, 0, arrayByte.Length);

As an answer, you should be able to use char arrays of the correct size, without having to marshal.
Also, the difference between a class and a struct in .net is minimal. A struct cannot be null while a class can. Otherwise their use and capabilities are pretty much identical.
Finally, it sounds like you should be mindful of the size of the characters that are being sent. I'm assuming (I know, I know) that COBOL uses 8bit ASCII characters, while .net strings are going to use a UTF-16 encoded character set. This means that a 10 character string in COBOL is 10 bytes, but in .net, the same string is 20 bytes.

Related

ProtectedMemory - How To Ensure That The Size Of The Byte Array Is A Multiple Of 16?

I'm reading up on the ProtectedMemory class in C# (which uses the Data Protection API in Windows (DPAPI)) and I see that in order to use the Protect() Method of the class, the data to be encrypted must be stored in a byte array whose size/length is a multiple of 16.
I know how to convert many different data types to byte array form and back again, but how can I guarantee that the size of a byte array is a multiple of 16? Do I literally need to create an array whose size is a multiple of 16 and keep track of the original data's length using another variable or am I missing something? With traditional block-ciphers all of these details are handled for you automatically with padding settings. Likewise, when I attempt to convert data back to its original form from a byte array, how do I ensure that any additional bytes are ignored, assuming of course that the original data wasn't a multiple of 16.
In the code sample provided in the .NET Framework documentation, the byte array utilised just so happens to be 16 bytes long so I'm not sure what best practice is in relation to this hence the question.
Yes, just to iterate over the possibilities given in the comments (and give an answer to this nice question), you can use:
a padding method that is also used for block cipher modes, see all the options on the Wikipedia page on the subject.
prefix a length in some form or other. A fixed size of 32 bits / 4 bytes is probably easiest. Do write down the type of encoding for the size (unsigned, little endian is probably best for C#).
Both of these already operate on bytes, so you may need to define a character encoding such as UTF-8 if you use a string.
You could also use a specific encoding of the string, e.g. one defined by ASN.1 / DER and then perform zero padding. That way you can even indicate the type of the data that has been encoded in a platform independent way. You may want to read up on masochism before taking this route.

amount of memory allocated when double is converted to string

i have data structure i am passing this from server to client using data contract.
the data structure is
class datatransfer
{
double m_value1;
double m_value2;
double m_value3;
};
in the client i want to write into file.
The idea is to convert the values of the data transfer into string using string builder
than transfer the string to the client.
or
send the data structure and write the file using stream writer
which is the best approach? converting to string or send the datastructure and write to the file?
Reason for the question: to avoid the generation of string.
double size is 8 bytes. If i convert it to string what will be the size allocated .
It depends entirely on how you format the String representation of the Double. There is no pre-defined sizing guideline.
For example:
var myDouble = 5.183498029834092834D;
var shortString = myDouble.ToString("#.00"); // 5.18, uses 8 bytes
var longerString = myDouble.ToString("#.0000000"); // 5.1834980 18 bytes
Note that the sizes are the result of a Char being 2 bytes on my system.
What you're really trying to do is called serialization. There's a number of ways to do this. The simplest of which may be to just decorate your class with the [Serializable] attribute:
[Serializable]
public class datatransfer {
double m_value1;
double m_value2;
double m_value3;
}
You'll need to make your member variables public, or provide publicly accessible properties for setting the member variables. Otherwise they will not be serialized using this approach.
The size of a double when converted to string depends on the number itself and the encoding of the string.
Just as example, assuming ANSI encoding, the number 1 will need 1 byte, the number 1.123 will need 5 bytes. Moreover if you transmit that as text you need to consider more bytes used as delimiters (for example, using a space, you'll need N - 1 extra bytes). You should always transfer data in binary (if possible but it may depend on the type of connection you have and the protocol you have to use).
As a general rule you should think that binary data is smaller (then faster to transfer) but it's not easy to debug and you may have problems with a firewall. Text data is larger, verbose, you have to validate them on client side (your structure, XML for example, may be corrupted) and then slower to transfer. Big advantages are it's more easy to debug and whatever connection/protocol you may have usually you can transfer text (but don't forget you can transfer binary data encoded as text).
So this is not a definitive answer, what kind of data transfer method/framework/protocol you're using? A WCF service? .NET Remoting? Custom TCP/IP connection? If data structure is not too big you may find that binary serialization is a very good solution

Encoding-free String class for handling bytes? (Or alternative approach)

I have an application converted from Python 2 (where strings are essentially lists of bytes) and I'm using a string as a convenient byte buffer.
I am rewriting some of this code in the Boo language (Python-like syntax, runs on .NET) and am finding that the strings have an intrinsic encoding type, such as ASCII, UTF-8, etc. Most of the information dealing with bytes refer to arrays of bytes, which are (apparently) fixed length, making them quite awkward to work with.
I can obviously get bytes from a string, but at the risk of expanding some characters into multiple bytes, or discarding/altering bytes above 127, etc. This is fine and I fully understand the reasons for this - but what would be handy for me is either (a) an encoding that guarantees no conversion or discarding of characters so that I can use a string as a convenient byte buffer, or (b) some sort of ByteString class that gives the convenience of the string class. (Ideally the latter as it seems less of a hack.) Do either of these already exist? (Or are trivial to implement?)
I am aware of System.IO.MemoryStream, but the prospect of creating one of those each time and then having to make a System.IO.StreamReader at the end just to get access to ReadToEnd() doesn't seem very efficient, and this is in performance-sensitive code.
(I hope nobody minds that I tagged this as C# as I felt the answers would likely apply there also, and that C# users might have a good idea of the possible solutions.)
EDIT: I've also just discovered System.Text.StringBuilder - again, is there such a thing for bytes?
Use the Latin-1 encoding as described in this answer. It maps values in the range 128-255 unchanged, useful when you want to roundtrip bytes to chars.
UPDATE
Or if you want to manipulate bytes directly, use List<byte>:
List<byte> result = ...
...
// Add a byte at the end
result.Add(b);
// Add a collection of bytes at the end
byte[] bytesToAppend = ...
result.AddRange(bytesToAppend);
// Insert a collection of bytes at any position
byte[] bytesToInsert = ...
int insertIndex = ...
result.InsertRange(insertIndex, bytesToInsert);
// Remove a range of bytes
result.RemoveRange(index, count);
... etc ...
I've also just discovered System.Text.StringBuilder - again, is there such a thing for bytes?
The StringBuilder class is needed because regular strings are immutable, and a List<byte> gives you everything you might expect from a "StringBuilder for bytes".
I would suggest that you use MemoryStream combined with the GetBuffer() operator to retrieve the end result. Strings are actually fixed length and immutable, and to add or replace one byte to a string requires you to copy the whole thing into a new string, which is quite slow. To avoid this you would need to use a StringBuilder which allocates memory and doubles the capacity when needed, but then you might just as well use MemoryStream instead which does a similar thing but on bytes.
Each element in the string is a char and are actually two bytes because .NET strings are always UTF-16 in memory, which means you will also be wasting memory if you decide to store only one byte in each element.

Binary encoding for low bandwidth connections?

In my application I have a simple XML formatted file containing structured data. Each data entry has a data type and a value. Something like
<entry>
<field type="integer">5265</field>
<field type="float">34.23</field>
<field type="string">Jorge</field>
</entry>
Now, this formatting allow us to have the data in a human readable form in order to check for various values, as well as performing transformation and reading of the file easily for interoperability.
The problem is we have a very low bandwidth connection (about 1000 bps, yeah, thats bits per second) so XML is no exactly the best format to transmit the data. I'm looking for ways to encode the xml file into a binary equivalent that its more suitable for transmission.
Do you know of any good tutorial on the matter?
Additionally we compress the data before sending (simple GZIP) so I'm a little concerned with losing compression ratio if I go binary. Would the size be affected (when compressing) so badly that it would be a bad idea to try to optimize it in the first place?
Note: This is not premature optimization, it's a requisite. 1000 bps is a really low bandwidth so every byte counts.
Note2: Application is written in c# but any tutorial will do.
Try using ASN.1. The packed encoding rules should yield a pretty decently compressed form on their own and and the xml encoding rules should yield something equivalent to your existing xml.
Also, consider using 7zip instead of gzip.
You may want to investigate Google Protocol Buffers. They produce far smaller payloads than XML, though not necessarily the smallest payloads possible; whether they're acceptable for your use depends on a lot of factors. They're certainly easier than devising your own scheme from scratch, though.
They've been ported to C#/.NET and seem to work quite well there in my (thus far, somewhat limited) experience. There's a package at that link to integrate somewhat with VS and automatically create C# classes from the .proto files, which is very nice.
I'd dump (for transmission anyway, you could deconstruct at the sender, and reconstruct at the receiver, in Java you could use a custom Input/OutputStream to do the work neatly) the XML. Go binary with fixed fields - data type, length, data.
Say if you have 8 or fewer datatypes, encode that in three bits. Then the length, e.g., as an 8-bit value (0..255).
Then for each datatype, encode differently.
Integer/Float: BCD - 4 bits per digit, use 15 as decimal point. Or just the raw bits themselves (might want different datatypes for 8-bit int, 16-bit int, 32-bit int, 64-bit long, 32-bit float, 64-bit double).
String - can you get away with 7-bit ASCII instead of 8? Etc. All upper-case letters + digits and some punctuation could get you down to 6-bits per character.
You might want to prefix it all with the total number of fields to transmit. And perform a CRC or 8/10 encoding if the transport is lossy, but hopefully that's already handled by the system.
However don't underestimate how well XML text can be compressed. I would certainly do some calculations to check how much compression is being achieved.
Anything which is efficient at converting the plaintext form to binary is likely to make the compression ratio much worse, yes.
However, it could well be that an XML-optimised binary format will be better than the compressed text anyway. Have a look at the various XML Binary formats listed on the Wikipedia page. I have a bit of experience with WBXML, but that's all.
As JeeBee says, a custom binary format is likely to be the most efficient approach, to be honest. You can try to the gzip it, but the results of that will depend on what the data is like in the first place.
And yes, as Skirwan says, Protocol Buffers are a fairly obvious candidate here - but you may want to think about custom floating point representations, depending on what your actual requirements are. If you only need 4SF (and you know the scale) then sending a two byte integer may well be the best bet.
The first thing to try is gzip; beyond that, I would try protobuf-net - I can think of a few ways of encoding that quite easily, but it depends how you are building the xml, and whether you mind a bit of code to shim between the two formats. In particular, I can imagine representing the different data-types as either 3 optional fields on the same type, or 3 different subclasses of an abstract contract.
[ProtoContract]
class EntryItem {
[ProtoMember(1)]
public int? Int32Value {get;set;}
[ProtoMember(2)]
public float? SingleValue {get;set;}
[ProtoMember(3)]
public string StringValue {get;set;}
}
[ProtoContract]
class Entry {
[ProtoMember(1)]
public List<EntryItem> Items {get; set;}
}
With test:
[TestFixture]
public class TestEntries {
[Test]
public void ShowSize() {
Entry e = new Entry {
Items = new List<EntryItem>{
new EntryItem { Int32Value = 5265},
new EntryItem { SingleValue = 34.23F },
new EntryItem { StringValue = "Jorge" }
}
};
var ms = new MemoryStream();
Serializer.Serialize(ms, e);
Console.WriteLine(ms.Length);
Console.WriteLine(BitConverter.ToString(ms.ToArray()));
}
}
Results (21 bytes)
0A-03-08-91-29-0A-05-15-85-EB-08-42-0A-07-1A-05-4A-6F-72-67-65
I would look into configuring your app to be responsive to smaller XML fragments; in particular ones which are small enough to fit in a single network packet.
Then arrange your data to be transmitted in order of importance to the user so that they can see useful stuff and maybe even start working on it before all the data arrives.
Late response -- at least it comes before year's end ;-)
You mentioned Fast Infoset. Did you try it? It should give you the best results in terms of both compactness and performance. Add GZIP compression and the final size will be really small, and you will have avoided the processing penalties of compressing XML. WCF-Xtensions offers a Fast Infoset message encoding and GZIP/DEFLATE/LZMA/PPMs compression too (works on .NET/CF/SL/Azure).
Here's the pickle you're in though: You're compresing things with Gzip. Gzip is horrible on plain text until you hit about the length of the total concatonated works of Dickens or about 1200 lines of code. The overhead of the dictionary and other things Gzip uses for compression.
1Kbps is fine for the task of 7500 chars (it'll take about a minute given optimal conditions, but for <300 chars, you should be fine!) However if you're really that concerned, you're going to want to compress this down for brevity. Here's how I do things of this scale:
T[ype]L[ength][data data data]+
That is, that T represents the TYPE. say 0x01 for INT, 0x02 for STRING, etc. LENGTH is just an int... so 0xFF = 254 chars long, etc. An example datapacket would look like:
0x01 0x01 0x3F 0x01 0x01 0x2D 0x02 0x06 H E L L O 0x00
This says I have an INT, length 1, of value 0x3F, an INT, length 1, of value 0x2D then a STRING, length 6 of a null terminated "HELLO" (Ascii assumed). Learn the wonders that are System.Text.Encoding.Utf8.getBytes and BitConverter and ByteConverter.
for reference see This page to see just how much 1Kbps is. Really, for the size you're dealing with you should bee fine.

Getting a string, int, etc in binary representation?

Is it possible to get strings, ints, etc in binary format? What I mean is that assume I have the string:
"Hello" and I want to store it in binary format, so assume "Hello" is
11110000110011001111111100000000 in binary (I know it not, I just typed something quickly).
Can I store the above binary not as a string, but in the actual format with the bits.
In addition to this, is it actually possible to store less than 8 bits. What I am getting at is if the letter A is the most frequent letter used in a text, can I use 1 bit to store it with regards to compression instead of building a binary tree.
Is it possible to get strings, ints,
etc in binary format?
Yes. There are several different methods for doing so. One common method is to make a MemoryStream out of an array of bytes, and then make a BinaryWriter on top of that memory stream, and then write ints, bools, chars, strings, whatever, to the BinaryWriter. That will fill the array with the bytes that represent the data you wrote. There are other ways to do this too.
Can I store the above binary not as a string, but in the actual format with the bits.
Sure, you can store an array of bytes.
is it actually possible to store less than 8 bits.
No. The smallest unit of storage in C# is a byte. However, there are classes that will let you treat an array of bytes as an array of bits. You should read about the BitArray class.
What encoding would you be assuming?
What you are looking for is something like Huffman coding, it's used to represent more common values with a shorter bit pattern.
How you store the bit codes is still limited to whole bytes. There is no data type that uses less than a byte. The way that you store variable width bit values is to pack them end to end in a byte array. That way you have a stream of bit values, but that also means that you can only read the stream from start to end, there is no random access to the values like you have with the byte values in a byte array.
What I am getting at is if the letter
A is the most frequent letter used in
a text, can I use 1 bit to store it
with regards to compression instead of
building a binary tree.
The algorithm you're describing is known as Huffman coding. To relate to your example, if 'A' appears frequently in the data, then the algorithm will represent 'A' as simply 1. If 'B' also appears frequently (but less frequently than A), the algorithm usually would represent 'B' as 01. Then, the rest of the characters would be 00xxxxx... etc.
In essence, the algorithm performs statistical analysis on the data and generates a code that will give you the most compression.
You can use things like:
Convert.ToBytes(1);
ASCII.GetBytes("text");
Unicode.GetBytes("text");
Once you have the bytes, you can do all the bit twiddling you want. You would need an algorithm of some sort before we can give you much more useful information.
The string is actually stored in binary format, as are all strings.
The difference between a string and another data type is that when your program displays the string, it retrieves the binary and shows the corresponding (ASCII) characters.
If you were to store data in a compressed format, you would need to assign more than 1 bit per character. How else would you identify which character is the mose frequent?
If 1 represents an 'A', what does 0 mean? all the other characters?

Categories

Resources