How do I read exactly one char from a Stream?

I have a Stream with some text data (it can be ASCII, UTF-8, or Unicode; the encoding is known). I need to read exactly one char from the stream, without advancing the stream position any further than that. StreamReader is inappropriate, as it aggressively prefetches data from the stream.
Ideas?

If you want to read and decode the text one byte at a time, the most convenient approach I know of is to use the System.Text.Decoder class.
Here's a simple example:
using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        Console.OutputEncoding = Encoding.Unicode;
        string originalText = "Hello world! ブ䥺ぎょズィ穃 槞こ廤樊稧 ひゃご禺 壪";
        byte[] rgb = Encoding.UTF8.GetBytes(originalText);
        MemoryStream dataStream = new MemoryStream(rgb);
        string result = DecodeOneByteAtATimeFromStream(dataStream);
        Console.WriteLine("Result string: \"" + result + "\"");
        if (originalText == result)
        {
            Console.WriteLine("Original and result strings are equal");
        }
    }

    static string DecodeOneByteAtATimeFromStream(MemoryStream dataStream)
    {
        // The decoder keeps partial multi-byte sequences between calls.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        StringBuilder sb = new StringBuilder();
        int inputByteCount;
        byte[] inputBuffer = new byte[1];
        while ((inputByteCount = dataStream.Read(inputBuffer, 0, 1)) > 0)
        {
            // charCount is 0 until the current character is complete.
            int charCount = decoder.GetCharCount(inputBuffer, 0, 1);
            char[] rgch = new char[charCount];
            decoder.GetChars(inputBuffer, 0, 1, rgch, 0);
            sb.Append(rgch);
        }
        return sb.ToString();
    }
}
Presumably you are already aware of the drawbacks of processing data of any sort just one byte at a time. :) Suffice to say, this is not a very efficient way to do things.
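To get back to the original question (exactly one char, no read-ahead), the same Decoder idea can be wrapped so that it stops as soon as the decoder produces output. The following is only a sketch: SingleCharReader and ReadChar are hypothetical names, not part of the framework, and the code assumes the stream does not start with a byte order mark that needs skipping.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not a framework class): decodes one character at a
// time while consuming only the bytes that character needs.
public sealed class SingleCharReader
{
    private readonly Stream _stream;
    private readonly Decoder _decoder;
    private readonly byte[] _oneByte = new byte[1];
    private readonly char[] _chars;

    public SingleCharReader(Stream stream, Encoding encoding)
    {
        _stream = stream;
        _decoder = encoding.GetDecoder();
        // Worst-case number of UTF-16 code units one input byte can yield.
        _chars = new char[encoding.GetMaxCharCount(1)];
    }

    // Returns the next decoded character as a string (two code units for a
    // surrogate pair), or null if the stream ends before a character completes.
    public string ReadChar()
    {
        while (_stream.Read(_oneByte, 0, 1) == 1)
        {
            // flush: false keeps incomplete sequences buffered in the decoder.
            int produced = _decoder.GetChars(_oneByte, 0, 1, _chars, 0, false);
            if (produced > 0)
            {
                return new string(_chars, 0, produced);
            }
        }
        return null;
    }
}
```

After each call the stream position sits just past the bytes of the character that was returned, so other code can keep reading raw bytes from the same stream. The result is a string rather than a char because a character outside the Basic Multilingual Plane decodes to two UTF-16 code units.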

Related

Decompressing GZIP stream

I am trying to decompress a GZipped string which is part of response from a webservice. The string that I have is:
"[31,-117,8,0,0,0,0,0,0,0,109,-114,65,11,-62,48,12,-123,-1,75,-50,-61,-42,-127,30,122,21,111,-126,94,60,-119,-108,-72,102,44,-48,-75,-93,-21,100,56,-6,-33,-19,20,20,101,57,37,95,-14,94,-34,4,-63,-5,-72,-73,-44,-110,-117,-96,38,-88,26,-74,38,-112,3,117,-7,25,-82,5,24,-116,56,-97,-44,108,-23,28,24,-44,-85,83,34,-41,97,-88,24,-99,23,36,124,-120,94,99,-120,15,-42,-91,-108,91,45,-11,70,119,60,-110,21,-20,12,-115,-94,111,-80,-93,89,-41,-65,-127,-82,76,41,51,-19,52,90,-5,69,-85,76,-96,-128,64,22,35,-33,-23,-124,-79,-55,-1,-2,-10,-87,0,55,-76,55,10,-57,122,-9,73,42,-45,98,-44,5,-77,101,-3,58,-91,39,38,51,-15,121,21,1,0,0]"
I'm trying to decompress that string using the following method:
public static string UnZip(string value)
{
// Removing brackets from string
value = value.TrimStart('[');
value = value.TrimEnd(']');
//Transform string into byte[]
string[] strArray = value.Split(',');
byte[] byteArray = new byte[strArray.Length];
for (int i = 0; i < strArray.Length; i++)
{
if (strArray[i][0] != '-')
byteArray[i] = Convert.ToByte(strArray[i]);
else
{
int val = Convert.ToInt16(strArray[i]);
byteArray[i] = (byte)(val + 256);
}
}
//Prepare for decompress
System.IO.MemoryStream ms = new System.IO.MemoryStream(byteArray);
System.IO.Compression.GZipStream sr = new System.IO.Compression.GZipStream(ms,
System.IO.Compression.CompressionMode.Decompress);
//Reset variable to collect uncompressed result
byteArray = new byte[byteArray.Length];
//Decompress
int rByte = sr.Read(byteArray, 0, byteArray.Length);
//Transform byte[] unzip data to string
System.Text.StringBuilder sB = new System.Text.StringBuilder(rByte);
//Read the number of bytes GZipStream red and do not a for each bytes in
//resultByteArray;
for (int i = 0; i < rByte; i++)
{
sB.Append((char)byteArray[i]);
}
sr.Close();
ms.Close();
sr.Dispose();
ms.Dispose();
return sB.ToString();
}
The method is a modified version of the one in the following link:
http://www.codeproject.com/Articles/27203/GZipStream-Compress-Decompress-a-string
Sadly, the result of that method is a corrupted string. More specifically, I know that the input string contains a compressed JSON object and the output string has only some of the expected string:
"{\"rootElement\":{\"children\":[{\"children\":[],\"data\":{\"fileUri\":\"file:////Luciano/e/orto_artzi_2006_0_5_pixel/index/shapefiles/index_cd20/shp_all/index_cd2.shp\",\"relativePath\":\"/i"
Any idea what could be the problem and how to solve it?

Try
public static string UnZip(string value)
{
    // Removing brackets from string
    value = value.TrimStart('[');
    value = value.TrimEnd(']');
    //Transform string into byte[]
    string[] strArray = value.Split(',');
    byte[] byteArray = new byte[strArray.Length];
    for (int i = 0; i < strArray.Length; i++)
    {
        byteArray[i] = unchecked((byte)Convert.ToSByte(strArray[i]));
    }
    //Prepare for decompress
    using (System.IO.MemoryStream output = new System.IO.MemoryStream())
    {
        using (System.IO.MemoryStream ms = new System.IO.MemoryStream(byteArray))
        using (System.IO.Compression.GZipStream sr = new System.IO.Compression.GZipStream(ms, System.IO.Compression.CompressionMode.Decompress))
        {
            sr.CopyTo(output);
        }
        string str = Encoding.UTF8.GetString(output.GetBuffer(), 0, (int)output.Length);
        return str;
    }
}
The MemoryStream doesn't "duplicate" the byteArray but is directly backed by it, so you can't reuse the byteArray. Note also that a single Read call is not guaranteed to return all of the decompressed data, and a buffer sized to the compressed input is too small to hold the larger decompressed output; copying into a growable MemoryStream with CopyTo avoids both problems.
I'll add that I find it funny that they "compressed" a JSON of 277 characters into a stringified byte array of 620 characters.
As a side note, the memory usage of this method is through the roof. The 620-character string (which in truth is a 277-byte array) to be decompressed causes the creation of strings and arrays with a total size of 4887 bytes (including the initial 620-character string). (Disclaimer: the GC can reclaim part of this memory while the method executes.) This is OK for byte arrays of 277 bytes, but for bigger ones the memory usage will become quite large.
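The unchecked cast in the loop above is what turns the signed values in the input (which look like Java-style signed bytes) into the unsigned bytes that .NET expects. A minimal sketch of just that step, where SignedByteList and Parse are hypothetical names chosen for illustration:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses text such as "[31,-117,8]" into unsigned bytes.
public static class SignedByteList
{
    public static byte[] Parse(string text)
    {
        string[] parts = text.Trim('[', ']').Split(',');
        byte[] result = new byte[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            // Each entry is in the range -128..127; reinterpret its bits as 0..255.
            sbyte signed = sbyte.Parse(parts[i].Trim(), CultureInfo.InvariantCulture);
            result[i] = unchecked((byte)signed);
        }
        return result;
    }
}
```

With the first entries of the string from the question, -117 becomes 139 (0x8B), so the array starts with 0x1F 0x8B, the two-byte gzip magic number.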

Following on from xanatos's answer, here is a slightly modified C# version that returns a plain byte array. It takes a gzip-compressed byte array and returns the decompressed bytes.
public static byte[] Decompress(byte[] compressed_data)
{
    var outputStream = new MemoryStream();
    using (var compressedStream = new MemoryStream(compressed_data))
    using (System.IO.Compression.GZipStream sr = new System.IO.Compression.GZipStream(
        compressedStream, System.IO.Compression.CompressionMode.Decompress))
    {
        sr.CopyTo(outputStream);
        outputStream.Position = 0;
        return outputStream.ToArray();
    }
}

UTF8 Byte to String & Winsock GetStream

I'm trying to convert a large block of received bytes (length 11076) to a string.
The problem is that the resulting string is missing characters (length 10996).
The data is received over a Winsock connection. Here is the process:
public static void UpdateClient(UserConnection client)
{
string data = null;
Decoder utf8Decoder = Encoding.UTF8.GetDecoder();
Console.WriteLine("Iniciando");
byte[] buffer = ReadFully(client.TCPClient.GetStream(), 0);
int charCount = utf8Decoder.GetCharCount(buffer, 0, buffer.Length);
Char[] chars = new Char[charCount];
int charsDecodedCount = utf8Decoder.GetChars(buffer, 0, buffer.Length, chars, 0);
foreach (Char c in chars)
{
data = data + String.Format("{0}", c);
}
int buffersize = buffer.Length;
Console.WriteLine("Chars is: " + chars.Length);
Console.WriteLine("Data is: " + data);
Console.WriteLine("Byte is: " + buffer.Length);
Console.WriteLine("Size is: " + data.Length);
Server.Network.ReceiveData.SelectPacket(client.Index, data);
}
public static byte[] ReadFully(Stream stream, int initialLength)
{
if (initialLength < 1)
{
initialLength = 32768;
}
byte[] buffer = new byte[initialLength];
int read = 0;
int chunk;
chunk = stream.Read(buffer, read, buffer.Length - read);
checkreach:
read += chunk;
if (read == buffer.Length)
{
int nextByte = stream.ReadByte();
if (nextByte == -1)
{
return buffer;
}
byte[] newBuffer = new byte[buffer.Length * 2];
Array.Copy(buffer, newBuffer, buffer.Length);
newBuffer[read] = (byte)nextByte;
buffer = newBuffer;
read++;
goto checkreach;
}
byte[] ret = new byte[read];
Array.Copy(buffer, ret, read);
return ret;
}
Anyone have tips or a solution?

It's perfectly normal for UTF-8 encoded text to be more bytes than the number of characters. In UTF-8 some characters (for example á and ã) are encoded into two or more bytes.
As the ReadFully method returns garbage if you try to use it to read more than fits in the initial buffer or if it can't read the entire stream with one Read call, you shouldn't use it. Also the way that the char array is converted to a string is extremely slow. Just use a StreamReader to read the stream and decode it to a string:
public static void UpdateClient(UserConnection client) {
    string data;
    using (StreamReader reader = new StreamReader(client.TCPClient.GetStream(), Encoding.UTF8)) {
        data = reader.ReadToEnd();
    }
    Console.WriteLine("Data is: " + data);
    Console.WriteLine("Size is: " + data.Length);
    Server.Network.ReceiveData.SelectPacket(client.Index, data);
}
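If you do need the raw bytes as well, the usual replacement for ReadFully is a loop that keeps calling Read until it returns 0, since one Read call may deliver only part of the data. The sketch below uses hypothetical names (StreamText, ReadAllBytes, ReadAllUtf8). On a network stream the loop ends only when the remote side closes the connection, so a protocol that keeps the connection open needs its own message framing instead.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads a stream to its end in chunks.
public static class StreamText
{
    public static byte[] ReadAllBytes(Stream stream)
    {
        using (var collected = new MemoryStream())
        {
            byte[] chunk = new byte[4096];
            int read;
            // Read may return fewer bytes than requested; 0 means end of stream.
            while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
            {
                collected.Write(chunk, 0, read);
            }
            return collected.ToArray();
        }
    }

    public static string ReadAllUtf8(Stream stream)
    {
        return Encoding.UTF8.GetString(ReadAllBytes(stream));
    }
}
```

For the text "a\u00E7\u00E3o" (four characters, two of them accented) the UTF-8 byte array has six elements, which is the same kind of difference as 11076 bytes versus 10996 characters in the question.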

Convert binary to string not works

I created a simple program.
I create a string, compress it with the methods below, and store it in a binary column in SQL Server 2008 (binary(1000) field type).
When I read the binary data back, the resulting string matches the original string, with the same length and content, but when I try to decompress it I get an error.
I use this method to get the bytes:
System.Text.ASCIIEncoding.ASCII.GetBytes(mystring)
And this method to get the string:
System.Text.ASCIIEncoding.ASCII.GetString(binarydata)
When the string is hard-coded in the VS2012 editor, it works fine, but when I read it from SQL I get this error on the first line of the decompression method:
The input is not a valid Base-64 string as it contains a
non-base 64 character, more than two padding characters,
or a non-white space character among the padding characters.
What's wrong with my code? The two strings are the same, but this call works fine:
string test1=Decompress("mystring");
...whereas this one gives me the error and cannot decompress the retrieved string:
string temp=System.Text.ASCIIEncoding.ASCII.GetString(get data from sql) ;
string test2=Decompress(temp);
Comparing the two strings does not show any difference:
int result = string.Compare(test1, test2); // result=0
My compression method:
public static string Compress(string text)
{
byte[] buffer = Encoding.UTF8.GetBytes(text);
var memoryStream = new MemoryStream();
using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
{
gZipStream.Write(buffer, 0, buffer.Length);
}
memoryStream.Position = 0;
var compressedData = new byte[memoryStream.Length];
memoryStream.Read(compressedData, 0, compressedData.Length);
var gZipBuffer = new byte[compressedData.Length + 4];
Buffer.BlockCopy(compressedData, 0, gZipBuffer, 4, compressedData.Length);
Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gZipBuffer, 0, 4);
return Convert.ToBase64String(gZipBuffer);
}
My decompression method:
public static string Decompress(string compressedText)
{
byte[] gZipBuffer = Convert.FromBase64String(compressedText);
using (var memoryStream = new MemoryStream())
{
int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);
var buffer = new byte[dataLength];
memoryStream.Position = 0;
using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
{
gZipStream.Read(buffer, 0, buffer.Length);
}
return Encoding.UTF8.GetString(buffer);
}
}

The most likely issue is the way you are getting the string from the SQL binary field.
Currently (I am guessing, since you have not shown how you store or retrieve your data from SQL), the process is:
Compress : Text -> UTF8.GetBytes -> compress -> base64 string-> Send to Sql (transformed to binary)
Decompress: Binary -> String representation of binary -> base64 decode -> decompress -> UTF8.GetString
Your issue is that the "String representation of binary" step is not the inverse of the "Send to Sql (transformed to binary)" step. If you are storing this as a varbinary, Compress should return a byte array and Decompress should take a byte array.
public static byte[] Compress(string text)
{
    //Snip
}
public static string Decompress(byte[] compressedText)
{
    //Snip
}
This changes your process to:
Compress : Text -> UTF8.GetBytes -> compress -> Send to Sql
Decompress: Binary -> decompress -> UTF8.GetString
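The snipped bodies could look roughly like the following. This is a sketch under assumptions rather than the asker's code: GzipText is a hypothetical class name, and the 4-byte length prefix from the question is dropped because CopyTo reads until the end of the compressed data.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: compresses text to bytes and back, with no Base64 step.
public static class GzipText
{
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // leaveOpen: true so output can still be read after gzip is disposed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

The byte array returned by Compress would then be passed to SQL as a binary parameter and read back as a byte array, so no text encoding is applied to the compressed data at any point.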

How to read very long input from console in C#?

I need to load veeeery long line from console in C#, up to 65000 chars. Console.ReadLine itself has a limit of 254 chars(+2 for escape sequences), but I can use this:
static string ReadLine()
{
Stream inputStream = Console.OpenStandardInput(READLINE_BUFFER_SIZE);
byte[] bytes = new byte[READLINE_BUFFER_SIZE];
int outputLength = inputStream.Read(bytes, 0, READLINE_BUFFER_SIZE);
Console.WriteLine(outputLength);
char[] chars = Encoding.UTF7.GetChars(bytes, 0, outputLength);
return new string(chars);
}
...to overcome that limit, for up to 8190 chars(+2 for escape sequences) - unfortunately I need to enter WAY bigger line, and when READLINE_BUFFER_SIZE is set to anything bigger than 8192, error "Not enough storage is available to process this command" shows up in VS. Buffer should be set to 65536. I've tried a couple of solutions to do that, yet I'm still learning and none exceeded either 1022 or 8190 chars, how can I increase that limit to 65536? Thanks in advance.

You have to add the following lines of code to your Main() method:
byte[] inputBuffer = new byte[4096];
Stream inputStream = Console.OpenStandardInput(inputBuffer.Length);
Console.SetIn(new StreamReader(inputStream, Console.InputEncoding, false, inputBuffer.Length));
Then you can use Console.ReadLine(); to read long user input.

Try Console.Read with a StringBuilder:
StringBuilder sb = new StringBuilder();
while (true) {
    int next = Console.Read();
    if (next == -1) {
        break; // end of input; Convert.ToChar(-1) would throw
    }
    char ch = (char)next;
    sb.Append(ch);
    if (ch == '\n') {
        break;
    }
}

I agree with Manmay, that seems to work for me, and I also attempt to keep the default stdin so I can restore it afterwards:
if (dbModelStrPathname == @"con" ||
    dbModelStrPathname == @"con:")
{
    var stdin = Console.In;
    var inputBuffer = new byte[262144];
    var inputStream = Console.OpenStandardInput(inputBuffer.Length);
    Console.SetIn(new StreamReader(inputStream, Console.InputEncoding, false, inputBuffer.Length));
    dbModelStr = Console.In.ReadLine();
    Console.SetIn(stdin);
}
else
{
    dbModelStr = File.ReadAllText(dbModelStrPathname);
}

Unicode-to-string conversion in C#

How can I convert a Unicode value to its equivalent string?
For example, I have "రమెశ్", and I need a function that accepts this Unicode value and returns a string.
I was looking at the System.Text.Encoding.Convert() function, but that does not take in a Unicode value; it takes two encodings and a byte array.
I basically have a byte array that I need to save in a string field, and later I need to convert that string back to a byte array.
So I use ByteConverter.GetString(byteArray) to save the byte array to a string, but I can't get it back to a byte array.

Use .ToString():
this.Text = ((char)0x00D7).ToString();

Try the following:
byte[] bytes = ...;
string convertedUtf8 = Encoding.UTF8.GetString(bytes);
string convertedUtf16 = Encoding.Unicode.GetString(bytes); // For UTF-16
The other way around uses GetBytes():
byte[] bytesUtf8 = Encoding.UTF8.GetBytes(convertedUtf8);
byte[] bytesUtf16 = Encoding.Unicode.GetBytes(convertedUtf16);
In the Encoding class, there are more variants if you need them.
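One caveat for the use case in the question (storing an arbitrary byte array in a string field): GetString and GetBytes only round-trip bytes that are valid in the chosen encoding. For arbitrary binary data, Base64 is a common alternative. The sketch below uses a hypothetical BinaryAsText class to show the difference.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: stores arbitrary bytes in a string and gets them back.
public static class BinaryAsText
{
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }

    // True when decoding as UTF-8 and re-encoding returns the same bytes.
    public static bool SurvivesUtf8RoundTrip(byte[] data)
    {
        string text = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(text).SequenceEqual(data);
    }
}
```

Bytes such as 0xFF are not valid UTF-8, so the decoder substitutes a replacement character and the original value is lost, whereas the Base64 string converts back to exactly the same bytes.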

To convert a string to a Unicode string, do it like this: very simple... note the BytesToString function which avoids using any inbuilt conversion stuff. Fast, too.
private string BytesToString(byte[] Bytes)
{
MemoryStream MS = new MemoryStream(Bytes);
StreamReader SR = new StreamReader(MS);
string S = SR.ReadToEnd();
SR.Close();
return S;
}
private string ToUnicode(string S)
{
return BytesToString(new UnicodeEncoding().GetBytes(S));
}

UTF8Encoding Class
UTF8Encoding uni = new UTF8Encoding();
Console.WriteLine( uni.GetString(new byte[] { 1, 2 }));

There are different types of encoding. You can try some of them to see whether your byte stream gets converted correctly:
System.Text.ASCIIEncoding encodingASCII = new System.Text.ASCIIEncoding();
System.Text.UTF8Encoding encodingUTF8 = new System.Text.UTF8Encoding();
System.Text.UnicodeEncoding encodingUNICODE = new System.Text.UnicodeEncoding();
var ascii = string.Format("{0}: {1}", encodingASCII.ToString(), encodingASCII.GetString(textBytesASCII));
var utf = string.Format("{0}: {1}", encodingUTF8.ToString(), encodingUTF8.GetString(textBytesUTF8));
var unicode = string.Format("{0}: {1}", encodingUNICODE.ToString(), encodingUNICODE.GetString(textBytesCyrillic));
Have a look here as well: http://george2giga.com/2010/10/08/c-text-encoding-and-transcoding/.

var ascii = $"{new ASCIIEncoding().ToString()}: {((ASCIIEncoding)new ASCIIEncoding()).GetString(textBytesASCII)}";
var utf = $"{new UTF8Encoding().ToString()}: {((UTF8Encoding)new UTF8Encoding()).GetString(textBytesUTF8)}";
var unicode = $"{new UnicodeEncoding().ToString()}: {((UnicodeEncoding)new UnicodeEncoding()).GetString(textBytesCyrillic)}";

I wrote a loop that converts \uXXXX escape sequences in a string to the characters they represent:
string stringWithUnicodeSymbols = @"{""id"": 10440119, ""photo"": 10945418, ""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439""}";
var splitted = Regex.Split(stringWithUnicodeSymbols, @"\\u([a-fA-F\d]{4})");
string outString = "";
foreach (var s in splitted)
{
    try
    {
        if (s.Length == 4)
        {
            var decoded = ((char) Convert.ToUInt16(s, 16)).ToString();
            outString += decoded;
        }
        else
        {
            outString += s;
        }
    }
    catch (Exception)
    {
        outString += s;
    }
}
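A variation on the same idea is to let Regex.Replace do the substitution, so that only text actually matched as an escape sequence is converted. The loop above treats any four-character piece produced by the split as hex, which can misfire on ordinary text that happens to be four hex digits long. UnicodeEscapes and Decode are hypothetical names used for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces \uXXXX escape sequences with their characters.
public static class UnicodeEscapes
{
    private static readonly Regex EscapePattern =
        new Regex(@"\\u([0-9a-fA-F]{4})");

    public static string Decode(string input)
    {
        // Only the four hex digits captured after "\u" are converted.
        return EscapePattern.Replace(
            input,
            m => ((char)Convert.ToUInt16(m.Groups[1].Value, 16)).ToString());
    }
}
```

If the input is a complete JSON document, as in the example string, a JSON parser will also decode these escapes as part of normal parsing.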
