How to serialize a collection as an alphanumeric string? - c#

I need to express a collection of about 10-15 short strings (and maybe some ints) as a fairly compact alphanumeric string - one which I can send as a parameter in a get request.
Basically, I'm thinking that my collection will be a hashtable, and I'd like to serialize it so it looks sort of like a viewstate string (only hopefully not so long!).
eg.
testpage.aspx?code=rO0ABXNyAAlTb21lQ2xhc3PSHbLk6OgfswIAA0kAAWl
and then testpage.aspx can deserialize this back to the original collection.
Is this possible?

One option here is to pick a delimiter, for example ¤; join the strings, encode them (perhaps UTF8), and pack the bytes as base-64...
string[] data = {"abc","123", "def"};
string s = string.Join("¤", data);
byte[] raw = Encoding.UTF8.GetBytes(s);
string alphaNumeric = Convert.ToBase64String(raw); // send this
(you may need to handle the few non-alphanumeric characters that base-64 uses).
And to reverse it:
raw = Convert.FromBase64String(alphaNumeric);
s = Encoding.UTF8.GetString(raw);
data = s.Split('¤');
If you want to send key/value pairs... well, the obvious choice would be query-string parameters themselves, since they are designed for this. But if you need it as a byte-stream:
var data = new DbConnectionStringBuilder();
data["foo"] = "abc";
data["bar"] = "123 + ;la";
string s = data.ConnectionString;
byte[] raw = Encoding.UTF8.GetBytes(s);
string alphaNumeric = Convert.ToBase64String(raw); // send this
raw = Convert.FromBase64String(alphaNumeric);
s = Encoding.UTF8.GetString(raw);
data.ConnectionString = s;
foreach (string key in data.Keys)
{
Console.WriteLine(key + "=" + data[key]);
}

Why don't you just serialize the data using protobuf-net and pass it through the Session? Or, if it has to be a string, just use XmlSerializer?
Personally, passing serialized data through the URL seems really bad to me!

you can serialize your dictionary/hashtable to JSON,
and then change it to base64 (just to make it a tad less visible and resolve possible usage of URL characters etc).
or you can just URLEncode it.

You can use standard .Net serialization and serialize your object to a MemoryStream. You can then read out the contents of the MemoryStream as a byte array and use Convert.ToBase64String on the array to get a string representation of it.
To deserialize it you can do the opposite.
If you are worried that the serialized object is too large, you can wrap the MemoryStream in a System.IO.Compression.DeflateStream to compress it.

Thank you all, putting it together I wrote a test console app. I was a bit disappointed that the resulting string was so long.
My example implements a DeflateStream, which for my datasize probably introduces more overhead than it saves with compression. But even without the compression it was still pretty big.
What I was hoping to achieve was to make something slightly more compact (obfuscation for the user was a plus, but not critical) - I suspect that it'd actually be better for me to use a plain old parameterized string. Maybe JSON might be ok, but I'm using ASP.net 2.0, and I don't think that I get a readybaked json serializer there.
Nonetheless, I learnt something new and interesting, so thanks for that!
static void Main(string[] args)
{
Hashtable ht1 = new Hashtable(1);
ht1.Add("name", "bob");
Console.WriteLine(ToCompactString(ht1));
Console.WriteLine();
string str = "name:bob";
Console.WriteLine(ToCompactString(str));
Console.ReadLine();
}
private static string ToCompactString(object obj)
{
var ms = new MemoryStream();
var ds = new DeflateStream(ms, CompressionMode.Compress);
var bf = new BinaryFormatter();
bf.Serialize(ds, obj);
byte[] bytes = ms.ToArray();
ds.Close();
ms.Close();
string result = Convert.ToBase64String(bytes);
return result;
}

Related

How can I convert an XElement to a byte array for a PutFile operation?

I need to convert a big XElement to a byte array so that it can be uploaded later to a fileshare. What is the correct method to call to do that?
Below you see the signature of a method fileShare.PutFile that is internal:
void PutFile(string folder, string fileName, byte[] content);
Then given an XElement xml, I tried converting it to a byte array by encoding its XElement.Value using Encoding.Default.GetBytes() as follows:
byte[] bytes = Encoding.Default.GetBytes(xml.Value);
fileShare.PutFile(folderName, blobName, bytes);
I am not so sure xml.Value (XElement.Value) is really what GetBytes method is really needing though. Is this correct?
To test this, I spun up a console app and put in some fake data. I did this for the XElement:
XElement root = new XElement("Root",
new XElement("Child1", 1),
new XElement("Child2", 2),
new XElement("Child3", 3),
new XElement("Child4", 4),
new XElement("Child5", 5),
new XElement("Child6", 6)
);
Then I tried that line of code putting to a byte array
byte[] bytes = Encoding.Default.GetBytes(root.Value);
Well I guess I forgot that when I step over and see Autos that bytes variable is btye[6] and when I expand - I see that [0] = 49 and so on
Now this may not mean it is not working ... or does it mean that? How can I interpret the contents of the bytes array, to check whether it is correct?
Firstly, using Encoding.Default is not recommended. From the docs:
Warning
Different computers can use different encodings as the default, and the default encoding can change on a single computer. If you use the Default encoding to encode and decode data streamed between computers or retrieved at different times on the same computer, it may translate that data incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback to map unsupported characters to characters supported by the code page. For these reasons, using the default encoding is not recommended. To ensure that encoded bytes are decoded properly, you should use a Unicode encoding, such as UTF8Encoding or UnicodeEncoding. You could also use a higher-level protocol to ensure that the same format is used for encoding and decoding.
Secondly, XElement.Value returns
A String that contains all of the text content of this element. If there are multiple text nodes, they will be concatenated.
Thus if you upload the Value you will be stripping away the entire XML markup structure from your file leaving only the plain text. While you might want to do that, it seems very unlikely. If you compare the value with the string returned by XElement.ToString() the difference should be clear.
Instead, to convert the XML contents of your XElement (including both markup and text) to a byte array, it would be better to write your XElement directly to a MemoryStream using an appropriately configured XmlWriterSettings and return the byte array thereby created. The following extension method does the job:
public static partial class XNodeExtensions
{
static Encoding DefaultEncoding { get; } = new UTF8Encoding(false); // Disable the BOM because XElement.ToString() does not include it.
public static byte [] ToByteArray(this XNode node, SaveOptions options = default, Encoding encoding = default)
{
// Emulate the settings of XElement.ToString() and XDocument.ToString()
// https://referencesource.microsoft.com/#System.Xml.Linq/System/Xml/Linq/XLinq.cs,2004
// I omitted the XML declaration because XElement.ToString() omits it, but you might want to include it, depending upon your needs.
var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Indent = (options & SaveOptions.DisableFormatting) == 0, Encoding = encoding ?? DefaultEncoding };
if ((options & SaveOptions.OmitDuplicateNamespaces) != 0)
settings.NamespaceHandling |= NamespaceHandling.OmitDuplicates;
return node.ToByteArray(settings);
}
public static byte [] ToByteArray(this XNode node, XmlWriterSettings settings)
{
using var ms = new MemoryStream();
using (var writer = XmlWriter.Create(ms, settings))
node.WriteTo(writer);
return ms.ToArray();
}
}
Now you can format your XElement to a UTF8-encoded byte array by doing:
var bytes = root.ToByteArray();
The extension method has the added advantage that, if you really need to use some encoding other than UTF8, unsupported Unicode characters will be escaped rather than replaced with a fallback as explained in this answer to XmlDocument with Kanji text content is not encoded correctly to ISO-8859-1 using XmlTextWriter.
var bytes = root.ToByteArray(encoding : Encoding.Default);
To check for correctness, you could examine the contents of the byte array in the debugger or your console app by decoding it to a string as follows:
var resultString = Encoding.UTF8.GetString(bytes);
Console.WriteLine(resultString);
Or with the default encoding:
var resultString = Encoding.Default.GetString(bytes);
You could also assert that the contents of the byte array are correct by parsing the contents back to a new XElement and checking that the result is semantically identical to the original by using XNode.DeepEquals():
Assert.IsTrue(
XNode.DeepEquals(root,
XElement.Load(new StreamReader(new MemoryStream(bytes), encoding))));
Demo fiddle here.

How can I take from Session byte's array? Inside the session was written byte's array too

Is it byte inside the session or data will be converted into the string after writing?
if yes I think I can take it like this:
var res = Encoding.UTF8.GetBytes(Session["session_state"]);
or can I take it "as is" without converting into the byte array? like:
var res = Session["session_state"] as bytes[]; // or smth. like that
The data isn't converted. If the session object is serialized (depending on how it is stored), then it is deserialized before you get access to it again.
Just cast the value to a byte array:
var res = Session["session_state"] as byte[];
or:
var res = (byte[])Session["session_state"];
Side note: A byte array can't reliably be converted to a string using UTF-8 encoding. UTF-8 is used the other way around, i.e. converting a string to bytes and then back. To make a string from bytes you would rather use something like base64.
You will always get what you stored in the session regardless of the mode you are using for the session state (inproc, state server, ...)
So the answer will be
var res = Session["session_state"] as byte[];

Bit Array to String and back to Bit Array

Possible Duplicate Converting byte array to string and back again in C#
I am using Huffman Coding for compression and decompression of some text from here
The code in there builds a huffman tree to use it for encoding and decoding. Everything works fine when I use the code directly.
For my situation, i need to get the compressed content, store it and decompress it when ever need.
The output from the encoder and the input to the decoder are BitArray.
When I tried convert this BitArray to String and back to BitArray and decode it using the following code, I get a weird answer.
Tree huffmanTree = new Tree();
huffmanTree.Build(input);
string input = Console.ReadLine();
BitArray encoded = huffmanTree.Encode(input);
// Print the bits
Console.Write("Encoded Bits: ");
foreach (bool bit in encoded)
{
Console.Write((bit ? 1 : 0) + "");
}
Console.WriteLine();
// Convert the bit array to bytes
Byte[] e = new Byte[(encoded.Length / 8 + (encoded.Length % 8 == 0 ? 0 : 1))];
encoded.CopyTo(e, 0);
// Convert the bytes to string
string output = Encoding.UTF8.GetString(e);
// Convert string back to bytes
e = new Byte[d.Length];
e = Encoding.UTF8.GetBytes(d);
// Convert bytes back to bit array
BitArray todecode = new BitArray(e);
string decoded = huffmanTree.Decode(todecode);
Console.WriteLine("Decoded: " + decoded);
Console.ReadLine();
The Output of Original code from the tutorial is:
The Output of My Code is:
Where am I wrong friends? Help me, Thanks in advance.
You cannot stuff arbitrary bytes into a string. That concept is just undefined. Conversions happen using Encoding.
string output = Encoding.UTF8.GetString(e);
e is just binary garbage at this point, it is not a UTF8 string. So calling UTF8 methods on it does not make sense.
Solution: Don't convert and back-convert to/from string. This does not round-trip. Why are you doing that in the first place? If you need a string use a round-trippable format like base-64 or base-85.
I'm pretty sure Encoding doesn't roundtrip - that is you can't encode an arbitrary sequence of bytes to a string, and then use the same Encoding to get bytes back and always expect them to be the same.
If you want to be able to roundtrip from your raw bytes to string and back to the same raw bytes, you'd need to use base64 encoding e.g.
http://blogs.microsoft.co.il/blogs/mneiter/archive/2009/03/22/how-to-encoding-and-decoding-base64-strings-in-c.aspx

Best way to deserialize a long string (response of an external web service)

I am querying a web service that was built by another developer. It returns a result set in a JSON-like format. I get three column values (I already know what the ordinal position of each column means):
[["Boston","142","JJK"],["Miami","111","QLA"],["Sacramento","042","PPT"]]
In reality, this result set can be thousands of records long.
What's the best way to parse this string?
I guess a JSON deserializer would be nice, but what is a good one to use in C#/.NET? I'm pretty sure the System.Runtime.Serialization.Json serializer won't work.
Using the built in libraries for asp.net (System.Runtime.Serialization and System.ServiceModel.Web) you can get what you want pretty easily:
string[][] parsed = null;
var jsonStr = #"[[""Boston"",""142"",""JJK""],[""Miami"",""111"",""QLA""],[""Sacramento"",""042"",""PPT""]]";
using (var ms = new System.IO.MemoryStream(System.Text.Encoding.Default.GetBytes(jsonStr)))
{
var serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(string[][]));
parsed = serializer.ReadObject(ms) as string[][];
}
A little more complex example (which was my original answer)
First make a dummy class to use for serialization. It just needs one member to hold the result which should be of type string[][].
[DataContract]
public class Result
{
[DataMember(Name="d")]
public string[][] d { get; set; }
}
Then it's as simple as wrapping your result up like so: { "d": /your results/ }. See below for an example:
Result parsed = null;
var jsonStr = #"[[""Boston"",""142"",""JJK""],[""Miami"",""111"",""QLA""],[""Sacramento"",""042"",""PPT""]]";
using (var ms = new MemoryStream(Encoding.Default.GetBytes(string.Format(#"{{ ""d"": {0} }}", jsonStr))))
{
var serializer = new System.Runtime.Serialization.Json.DataContractJsonSerializer(typeof(Result));
parsed = serializer.ReadObject(ms) as Result;
}
How about this?
It sounds like you have a pretty simple format that you could write a custom parser for, since you don't always want to wait for it to parse and return the entire thing before it uses it.
I would just write a recursive parser that looks for the tokens "[", ",", "\"", and "]" and does the appropriate thing.

How to serialize an object in C# and prevent tampering?

I have a C# class as follows:
public class TestObj
{
private int intval;
private string stringval;
private int[] intarray;
private string[] stringarray;
//... public properties not shown here
}
I would like to serialize an instance of this class into a string.
In addition:
I will be appending this string as a QueryString param to a URL. So I would like to take some effort to ensure that the string cannot be tampered with easily.
Also, I would like the serialization method to be efficient so the size of the string is minmal.
Any suggestions of specific .NET Framework classes/methods I should use?
Sign the stream and add the signature to your query. Use a HMAC signing algorithm, like HMACSHA1. You will need to have a secret between your client and your server to sign and validate the signature.
1) To serialize:
public String SerializeObject(TestObj object)
{
String Serialized = String.Empty;
MemoryStream memoryStream = new MemoryStream ( );
XmlSerializer xs = new XmlSerializer(typeof(TestObj));
XmlTextWriter xmlTextWriter = new XmlTextWriter ( memoryStream, Encoding.UTF8 );
xs.Serialize (xmlTextWriter, object);
memoryStream = (MemoryStream) xmlTextWriter.BaseStream;
Serialized = UTF8Encoding.GetString(memoryStream.ToArray());
return Serialized;
}
2) To prevent tampering:
Come up with a secret string, e.g. "MySecretWord".
Take your serialized object instance as a string, and append the secret word to it.
Hash the string (e.g. SHA or use HMAC (as suggested by Remus) )
Append the hash to the query string
On the receiving side (which also knows your "MySecretWord" secret string) you strip away the hash, take the original serialized instance, append the known secret string and hash it again. Then compare the two hashes for equality. If they are equal, your string was not modified.
You may need to Url/Base64 Encode your string so it works as a query string. This is also important as you need the query string to arrive exactly as sent.

Categories

Resources