I am having the problem that whitespace of some sort is being inserted between characters when I am converting a Queue<byte> list into a string for comparison. I do not think that they are actual whitespace characters, however, because the Queue retains only seven values and when debugging I am still able to see the seven character values. See image:
Relevant code:
Queue<byte> bufKeyword = new Queue<byte>(7);
// Remove old byte from queue and add new one
if (bufKeyword.Count == 7) bufKeyword.Dequeue();
bufKeyword.Enqueue((byte)fsInput.ReadByte());
// Check buffer string for match
StringBuilder bufKeywordString = new StringBuilder();
foreach (byte qByte in bufKeyword) {
bufKeywordString.Append(Encoding.ASCII.GetString(BitConverter.GetBytes(qByte)));
}
string _bufKeywordString = bufKeywordString.ToString();
Console.WriteLine("{0}", _bufKeywordString); //DEBUG - SEE IMAGE
StringBuilder bufWriteString = new StringBuilder();
if (_bufKeywordString.StartsWith("time=")) //Does not work because of 'whitespace'
{
for (int i = 1; i < 25; i++) { bufWriteString.Append(fsInput.ReadByte()); } // Read next 24 bytes
fileWriteQueue.Enqueue(bufWriteString.ToString()); // Add this data to write queue
fileWriteQueueCount++;
fileBytesRead += 24; // Change to new spot in file
}
There is no BitConverter.GetBytes for byte argument. byte gets converted to short, and BitConverter.GetBytes(short) returns an array of two elements.
So instead of
bufKeywordString.Append(Encoding.ASCII.GetString(BitConverter.GetBytes(qByte)));
try
bufKeywordString.Append(Encoding.ASCII.GetString(new byte[] {qByte});
Related
I had not found a way of adding two bytestrings in google documentation. I tried to convert them to strings, concatenate, than convert back. But it gets stuck on conversion back to ByteString.
if (someList.Count > 3)
{
var bigString = "";
for (int i = 0; i < someList.Count; i++)
{
string partString= someList.ElementAt(i).ToBase64();
bigString += partString;
Logger.Write($"{i}");
}
Logger.Write("here");
var chunk = ByteString.FromBase64(bigString);
Logger.Write("here2");
SomeFunc(chunk);
someList.Clear();
}
It gets "here", but never to "here2".
UPD: ByteStrings in someList contain WaveIn audioData
I would definitely not suggest converting to base64 and back - there's no need for that, and it causes issues as base64 strings aren't naturally joinable like this due to padding.
It looks like you're not actually trying to concatenate two ByteStrings, but some arbitrary number. I'd suggest writing them all to a MemoryStream, then creating a new ByteString from that:
using var tempStream = new MemoryStream();
foreach (var byteString in someList)
{
byteString.WriteTo(tempStream);
}
tempStream.Position = 0;
var combinedString = ByteString.FromStream(tempStream);
I have a file which is very long, and has no line breaks, CR or LF or other delimiters.
Records are fixed length, and the first control record length is 24 and all other record lengths are of fixed length 81 bytes.
I know how to read a fixed length file per line basis and I am using Multi Record Engine and have defined classes for each 81 byte line record but can’t figure out how I can read 80 characters at a time and then parse that string for the actual fields.
You can use the FileStream to read the number of bytes you need - like in your case either 24 or 81. Keep in mind that progressing through the stream the position changes and therefor you should not use the offset (should always be 0) - also be aware that if there is no information "left" on the stream it will cause an exception.
So you would end up with something like this:
var recordlength = 81;
var buffer = new byte[recordlength];
stream.Read(buffer, 0, recordlength); // offset = 0, start at current position
var record = System.Text.Encoding.UTF8.GetString(buffer); // single record
Since the recordlength is different for the control record you could use that part into a single method, let's name it Read and use that read method to traverse through the stream untill you reach the end, like this:
public List<string> Records()
{
var result = new List<string>();
using(var stream = new FileStream(#"c:\temp\lipsum.txt", FileMode.Open))
{
// first record
result.Add(Read(stream, 24));
var record = "";
do
{
record = Read(stream);
if (!string.IsNullOrEmpty(record)) result.Add(record);
}
while (record.Length > 0);
}
return result;
}
private string Read(FileStream stream, int length = 81)
{
if (stream.Length < stream.Position + length) return "";
var buffer = new byte[length];
stream.Read(buffer, 0, length);
return System.Text.Encoding.UTF8.GetString(buffer);
}
This will give you a list of records (including the starting control record).
This is far from perfect, but an example - also keep in mind that even if the file is empty there is always 1 result in the returned list.
I have a binary file I am reading and printing into a textbox while wrapping at a set point, but it is wrapping at places it shouldn't be. I want to ignore all line feed characters except those I have defined.
There isn't a single Newline byte, rather it seems to be a series of them. I think I found the series of Hex values 00-01-01-0B that seem to correspond with where the line feeds should be.
How do I ignore existing line breaks, and use what I want instead?
This is where I am at:
shortFile = new FileStream(#"tempfile.dat", FileMode.Open, FileAccess.Read);
DisplayArea.Text = "";
byte[] block = new byte[1000];
shortFile.Position = 0;
while (shortFile.Read(block, 0, 1000) > 0)
{
string trimmedText = System.Text.Encoding.Default.GetString(block);
DisplayArea.Text += trimmedText + "\n";
}
I had just figured it out a couple minutes before dlatikay posted, but really appreciated seeing that he also had the right idea. I just replaced all control characters with spaces.
for (int i = 0; i < block.Length; i++)
{
if (block[i] < 32)
{
block[i] = 0x20;
}
}
I want to deserialize a list of 1 million pairs of (String,Guid) for a performance critical app. The format can be anything I choose, and serialization does not have the same performance requirements.
What sort of approach is best? Text or binary? Write each pair (string,guid) consecutively, or write all strings followed by all guids?
I started playing with LinqPad, (and the simpler example of deserializing strings only) and found that (slightly counter-intuitively), using a TextReader and ReadLine() was a fair bit faster than using a BinaryReader and ReadString(). (Is the filesystem cache playing tricks on me?)
public string[] DeSerializeBinary()
{
var tmr = System.Diagnostics.Stopwatch.StartNew();
long ms = 0;
string[] arr = null;
using (var rdr = new BinaryReader(new FileStream(file, FileMode.Open, FileAccess.Read)))
{
var num = rdr.ReadInt32();
arr = new String[num];
for (int i = 0; i < num; i++)
{
arr[i] = rdr.ReadString();
}
tmr.Stop();
ms = tmr.ElapsedMilliseconds;
Console.WriteLine("DeSerializeBinary took {0}ms", ms);
}
return arr;
}
public string[] DeserializeText()
{
var tmr = System.Diagnostics.Stopwatch.StartNew();
long ms = 0;
string[] arr = null;
using (var rdr = File.OpenText(file))
{
var num = Int32.Parse(rdr.ReadLine());
arr = new String[num];
for (int i = 0; i < num; i++)
{
arr[i] = rdr.ReadLine();
}
tmr.Stop();
ms = tmr.ElapsedMilliseconds;
Console.WriteLine("DeserializeText took {0}ms", ms);
}
return arr;
}
Some Edits:
I used RamMap to clear the file system cache, and it turns out there was very little difference to Text & Binary reader for strings only.
I have a fairly simple class that holds the string and guid. It also holds an int index which corresponds to its position in the list. Obviously there's no need to include this in serialization.
In a test for (binary) deSerializing Strings and Guids alternately, I get around 500ms.
Ideal timing is 50ms, or as close as I can get. However, a simple experiment showed it takes at least 120ms to read the (compressed) file into memory from a reasonably fast SSD drive, without any sort of parsing at all. So 50ms seems unlikely.
Our strings have no theoretical length restrictions. However, we can assume that the performance target only applies if they are all 20 characters or less.
Timings include opening the file.
Reading the Strings is the clear bottleneck now (hence my experiments with serializing strings only). The JIT_NewFast took 30% before I preallocated an array of 16bytes for reading GUIDs.
It's not surprising that reading a bunch of strings is faster with StreamReader than with BinaryReader. StreamReader reads in blocks from the underlying stream, and parses the strings from that buffer. BinaryReader doesn't have a buffer like that. It reads the string length from the underlying stream, and then reads that many characters. So BinaryReader makes more calls to the base stream's Read method.
But there's more to deserializing a (String, Guid) pair than just reading. You also have to parse the Guid. If you write the file in binary then the Guid is written in binary, which makes it much easier and faster to create a Guid structure. If it's a string, then you have to call new Guid(string) to parse the text and create a Guid, after you split the line into its two fields.
Hard to say which of those will be faster.
I can't imagine that we're talking about a whole lot of time here. Certainly reading a file with a million lines will take around a second. Unless the string is really long. A GUID is only 36 characters if you count the separators, right?
With BinaryWriter, you can write the file like this:
writer.Write(count); // integer number of records
foreach (var pair in pairs)
{
writer.Write(pair.theString);
writer.Write(pair.theGuid.ToByteArray());
}
And to read it, you have:
count = reader.ReadInt32();
byte[] guidBytes = new byte[16];
for (int i = 0; i < count; ++i)
{
string s = reader.ReadString();
reader.Read(guidBytes, 0, guidBytes.Length);
pairs.Add(new Pair(s, new Guid(guidBytes));
}
Whether that's faster than splitting a string and calling the Guid constructor that takes a string parameter, I don't know.
I suspect that any difference is going to be pretty slight. I'd probably go with the simplest method: a text file.
If you want to get really crazy, you can write a custom format that you can easily slurp up in just a couple of large reads (a header, an index, and two arrays for strings and GUIDs), and do everything else in memory. That would almost certainly be faster. But faster enough to warrant the extra work? Doubtful.
Update
Or maybe not doubtful. Here's some code that writes and reads a custom binary format. The format is:
count (int32)
guids (count * 16 bytes)
strings (one big concatenated string)
index (index of each string's starting character in the big string)
I assume you're using a Dictionary<string, Guid> to hold these things. But your data structure doesn't really matter. The code would be substantially the same.
Note that I tested this very briefly. I won't say that the code is 100% bug free, but I think you can get the idea of what I'm doing.
private void WriteGuidFile(string filename, Dictionary<string, Guid>guids)
{
using (var fs = File.Create(filename))
{
using (var writer = new BinaryWriter(fs, Encoding.UTF8))
{
List<int> stringIndex = new List<int>(guids.Count);
StringBuilder bigString = new StringBuilder();
// write count
writer.Write(guids.Count);
// Write the GUIDs and build the string index
foreach (var pair in guids)
{
writer.Write(pair.Value.ToByteArray(), 0, 16);
stringIndex.Add(bigString.Length);
bigString.Append(pair.Key);
}
// Add one more entry to the string index.
// makes deserializing easier
stringIndex.Add(bigString.Length);
// Write the string that contains all of the strings, combined
writer.Write(bigString.ToString());
// write the index
foreach (var ix in stringIndex)
{
writer.Write(ix);
}
}
}
}
Reading is just slightly more involved:
private Dictionary<string, Guid> ReadGuidFile(string filename)
{
using (var fs = File.OpenRead(filename))
{
using (var reader = new BinaryReader(fs, Encoding.UTF8))
{
// read the count
int count = reader.ReadInt32();
// The guids are in a huge byte array sized 16*count
byte[] guidsBuffer = new byte[16*count];
reader.Read(guidsBuffer, 0, guidsBuffer.Length);
// Strings are all concatenated into one
var bigString = reader.ReadString();
// Index is an array of int. We can read it as an array of
// ((count+1) * 4) bytes.
byte[] indexBuffer = new byte[4*(count+1)];
reader.Read(indexBuffer, 0, indexBuffer.Length);
var guids = new Dictionary<string, Guid>(count);
byte[] guidBytes = new byte[16];
int startix = 0;
int endix = 0;
for (int i = 0; i < count; ++i)
{
endix = BitConverter.ToInt32(indexBuffer, 4*(i+1));
string key = bigString.Substring(startix, endix - startix);
Buffer.BlockCopy(guidsBuffer, (i*16),
guidBytes, 0, 16);
guids.Add(key, new Guid(guidBytes));
startix = endix;
}
return guids;
}
}
}
A couple of notes here. First, I'm using BitConverter to convert the data in the byte arrays to integers. It would be faster to use unsafe code and just index into the arrays using an int32*.
You might gain some speed by using pointers to index into the guidBuffer and calling Guid Constructor (Int32, Int16, Int16, Byte, Byte, Byte, Byte, Byte, Byte, Byte, Byte) rather than using Buffer.BlockCopy to copy the GUID into the temporary array.
You could make the string index an index of lengths rather than the starting positions. That would eliminate the need for the extra value at the end of the array, but it's unlikely that it'd make any difference in the speed.
There might be other optimization opportunities, but I think you get the general idea here.
I'm currently making a game but I seem to have problems reading values from a text file. For some reason, when I read the value, it gives me the ASCII code of the value rather than the actual value itself when I wrote it to the file. I've tried about every ASCII conversion function and string conversion function, but I just can't seem to figure it out.
I use a 2D array of integers. I use a nested for loop to write each element into the file. I've looked at the file and the values are correct, but I don't understand why it's returning the ASCII code. Here's the code I'm using to write and read to file:
Writing to file:
for (int i = 0; i < level.MaxRows(); i++)
{
for (int j = 0; j < level.MaxCols(); j++)
{
fileWrite.Write(level.GetValueAtIndex(i, j) + " ");
//Console.WriteLine(level.GetValueAtIndex(i, j));
}
//add new line
fileWrite.WriteLine();
}
And here's the code where I read the values from the file:
string str = "";
int iter = 0; //used to iterate in each column of array
for (int i = 0; i < level.MaxRows(); i++)
{
iter = 0;
//TODO: For some reason, the file is returning ASCII code, convert to int
//keep reading characters until a space is reached.
str = fileRead.ReadLine();
//take the above string and extract the values from it.
//Place each value in the level.
foreach (char id in str)
{
if (id != ' ')
{
//convert id to an int
num = (int)id;
level.ChangeTile(i, iter, num);
iter++;
}
}
This is the latest version of the loop that I use to read the values. Reading other values is fine; it's just when I get to the array, things go wrong. I guess my question is, why did the conversion to ASCII happen? If I can figure that out, then I might be able to solve the issue. I'm using XNA 4 to make my game.
This is where the convertion to ascii is happening:
fileWrite.Write(level.GetValueAtIndex(i, j) + " ");
The + operator implicitly converts the integer returned by GetValueAtIndex into a string, because you are adding it to a string (really, what did you expect to happen?)
Furthermore, the ReadLine method returns a String, so I am not sure why you'd expect a numeric value to magically come back here. If you want to write binary data, look into BinaryWriter
This is where you are converting the characters to character codes:
num = (int)id;
The id variable is a char, and casting that to int gives you the character code, not the numeric value.
Also, this converts a single character, not a whole number. If you for example have "12 34 56 " in your text file, it will get the codes for 1, 2, 3, 4, 5 and 6, not 12, 34 and 56.
You would want to split the line on spaces, and parse each substring:
foreach (string id in str.Split(' ')) {
if (id.Length > 0) {
num = Int32.Parse(id);
level.ChangeTile(i, iter, num);
iter++;
}
}
Update: I've kept the old code (below) with the assumption that one record was on each line, but I've also added a different way of doing it that should work with multiple integers on a line, separated by a space.
Multiple records on one line
str = fileRead.ReadLine();
string[] values = str.Split(new Char[] {' '});
foreach (string value in values)
{
int testNum;
if (Int32.TryParse(str, out testnum))
{
// again, not sure how you're using iter here
level.ChangeTile(i, iter, num);
}
}
One record per line
str = fileRead.ReadLine();
int testNum;
if (Int32.TryParse(str, out testnum))
{
// however, I'm not sure how you're using iter here; if it's related to
// parsing the string, you'll probably need to do something else
level.ChangeTile(i, iter, num);
}
Please note that the above should work if you write out each integer line-by-line (i.e. how you were doing it via the WriteLine which you remarked out in your code above). If you switch back to using a WriteLine, this should work.
You have:
foreach (char id in str)
{
//convert id to an int
num = (int)id;
A char is an ASCII code (or can be considered as such; technically it is a unicode code-point, but that is broadly comparable assuming you are writing ANSI or low-value UTF-8).
What you want is:
num = (int)(id - '0');
This:
fileWrite.Write(level.GetValueAtIndex(i, j) + " ");
converts the int returned from level.GetValueAtIndex(i, j) into a string. Assuming the function returns the value 5 for a particular i and j then you write "5 " into the file.
When you then read it is being read as a string which consists of chars and you get the ASCII code of 5 when you cast it simply to an int. What you need is:
foreach (char id in str)
{
if (id != ' ')
{
//convert id to an int
num = (int)(id - '0'); // subtract the ASCII value for 0 from your current id
level.ChangeTile(i, iter, num);
iter++;
}
}
However this only works if you only ever are going to have single digit integers (only 0 - 9). This might be better:
foreach (var cell in fileRead.ReadLine().Split(' '))
{
num = Int.Parse(cell);
level.ChangeTile(i, iter, num);
iter++;
}