Variable chunk size - c#

I am using the code below to break stream into smaller chunks. however, my chunk size is constant and i want it to be a variable. I want program to read till it hit symbol '$'and make '$ position to be chunk size.
For example: lets say txt file contains 01234583145329$34212349$2134567009$, so my 1st chunk size should be 14, second should be 8 and third should be 10. I did some research and find out that can be achieved by indexof method, but I am not able to implement that with the code below. Please advise.
If there is another efficient way other than Indexof, please let me know.
public static IEnumerable<IEnumerable<byte>> ReadByChunk(int chunkSize)
{
IEnumerable<byte> result;
int startingByte = 0;
do
{
result = ReadBytes(startingByte, chunkSize);
startingByte += chunkSize;
yield return result;
}
while (result.Any());
}
public static IEnumerable<byte> ReadBytes(int startingByte, int byteToRead)
{
byte[] result;
using (FileStream stream = File.Open(#"C:\Users\file.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
using (BinaryReader reader = new BinaryReader(stream))
{
int bytesToRead = Math.Max(Math.Min(byteToRead, (int)reader.BaseStream.Length - startingByte), 0);
reader.BaseStream.Seek(startingByte, SeekOrigin.Begin);
result = reader.ReadBytes(bytesToRead);
int chunkSize = Index of
}
return result;
}
static void Main()
{
int chunkSize = 8;
foreach (IEnumerable<byte> bytes in ReadByChunk(chunkSize))
{
//more code
}
}

You seem to care about characters, not bytes here, as you are trying to find $ characters. Just reading bytes will only work in the specific case of "each character is encoded with one byte". Therefore, you should use ReadChar and return IEnumerable<IEnumerable<char>> instead.
You seem to be creating a new reader and stream for each chunk, which I feel is quite unnecessary. You could just create one stream and one reader in ReadByChunk, and pass it to the ReadBytes method.
The IndexOf you found is probably for strings. I assume you want to lazily read from a stream, so reading everything into a string first and then using IndexOf seems to go against your intention.
For a text file, I would also recommend you to use StreamReader. BinaryReader is for reading binary files.
Here's my attempt:
public static IEnumerable<IEnumerable<char>> ReadByChunk()
{
using (StreamReader reader = new StreamReader(File.Open(...))) {
while (reader.Peek() != -1) { // while not at the end of the stream...
yield return ReadUntilNextDollarSign(reader);
}
}
}
public static IEnumerable<char> ReadUntilNextDollarSign(StreamReader reader)
{
char c;
// while not at the end of the stream, and the next char is not a dollar sign...
while (reader.Peek() != -1 && (c = (char)reader.Read()) != '$') {
yield return c;
}
}

Related

What is different with the writing in FileStream?

When I searched the method about decompress the file by using SharpZipLib, I found lot of methods like this:
public static void TarWriteCharacters(string tarfile, string targetDir)
{
using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
{
//some codes here
using (FileStream fileWrite = File.Create(targetDir + directoryName + fileName))
{
int size = 2048;
byte[] data = new byte[2048];
while (true)
{
size = s.Read(data, 0, data.Length);
if (size > 0)
{
fileWrite.Write(data, 0, size);
}
else
{
break;
}
}
fileWrite.Close();
}
}
}
The format FileStream.Write is:
FileStream.Write(byte[] array, int offset, int count)
Now I try to separate part of read and write because I want to use thread to speed up the decompress rate in write function, and I use dynamic array byte[] and int[] to deposit the file's data and size like below
Read:
public static void TarWriteCharacters(string tarfile, string targetDir)
{
using (TarInputStream s = new TarInputStream(File.OpenRead(tarfile)))
{
//some codes here
using (FileStream fileWrite= File.Create(targetDir + directoryName + fileName))
{
int size = 2048;
List<int> SizeList = new List<int>();
List<byte[]> mydatalist = new List<byte[]>();
while (true)
{
byte[] data = new byte[2048];
size = s.Read(data, 0, data.Length);
if (size > 0)
{
mydatalist.Add(data);
SizeList.Add(size);
}
else
{
break;
}
}
test = new Thread(() =>
FileWriteFun(pathToTar, args, SizeList, mydatalist)
);
test.Start();
streamWriter.Close();
}
}
}
Write:
public static void FileWriteFun(string pathToTar , string[] args, List<int> SizeList, List<byte[]> mydataList)
{
//some codes here
using (FileStream fileWrite= File.Create(targetDir + directoryName + fileName))
{
for (int i = 0; i < mydataList.Count; i++)
{
fileWrite.Write(mydataList[i], 0, SizeList[i]);
}
fileWrite.Close();
}
}
Edit
(1)byte[] data = new byte[2048] into while loop to assign data to new array.
(2)change int[] SizeList = new int[2048] to List<int> SizeList = new List<int>() because of int range
As read on a stream is only guarantied to return one byte (typically it will be more, but you can't rely on the full requested length each time), your solution can theoretically fail after 2048 bytes as your SizeList can only hold 2048 entries.
You could use a List to hold the sizes.
Or use a MemoryStream instead of inventing your own.
But the two main problems are:
1) You keep reading into the same byte array, overwriting previously read data. When you add your data byte array to mydatalist, you must assign data to a new byte array.
2) you close your stream before the second thread is done writing.
In general threading is difficult and should only be used where you know it will improve performance. Simply reading and writing data is typically IO bound in performance, not cpu bound, so introducing a second thread will just give a small performance penalty and no gain in speed. You could use multithreading to ensure concurrent read/write operations, but most likely the disk cache will do this for you if you stick to the first solution - amd if not, using async is easier than multithreaded to achieve this.

How to get minimum and maximum values in text file in C#?

In the text file(test.txt) have value
"Hi this my code project file"
Minimum=5 and maximum=6
I need out put is
minimum= 5 ("Hi t)
maximum=6 ("Hi th)
"Hi th
I believe you are looking for a function that reads the text files into a stream and then parses it into string variable. Once you have done that you can call stringVariable.substring(0,x) to get the output sub string that you are looking for.
Here is the code demonstrates this idea.
public string void GetSubString(int x) {
byte[] buffer;
FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
try
{
int length = (int)fileStream.Length; // get file length
var buffer = new byte[length]; // create buffer
int count; // actual number of bytes read
int sum = 0; // total number of bytes read
// read until Read method returns 0 (end of the stream has been reached)
while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
sum += count; // sum is a buffer offset for next reading
}
finally
{
fileStream.Close();
}
var str = System.Text.Encoding.Default.GetString(fileStream);
string sub = str.Substring(0, x);
return sub;
}
Here is a simpler version of Jeffrey's answer:
The reason the output has a single quote is because you can't have double quotes normally within double quotes and I forgot how to escape them.
int min = 5;
int max = 6;
String s = "'Hi this my code project file'";
String minS = s.Substring(0, min);
String maxS = s.Substring(0, max);
Console.WriteLine(minS);
Console.WriteLine(maxS);

Reading a file one byte at a time in reverse order

Hi I am trying to read a file one byte at a time in reverse order.So far I only managed to read the file from begining to end and write it on another file.
I need to be able to read the file from the end to the begining and print it to another file.
This is what I have so far:
string fileName = Console.ReadLine();
using (FileStream file = new FileStream(fileName ,FileMode.Open , FileAccess.Read))
{
//file.Seek(endOfFile, SeekOrigin.End);
int bytes;
using (FileStream newFile = new FileStream("newsFile.txt" , FileMode.Create , FileAccess.Write))
{
while ((bytes = file.ReadByte()) >= 0)
{
Console.WriteLine(bytes.ToString());
newFile.WriteByte((byte)bytes);
}
}
}
I know that I have to use the Seek method on the fileStream and that gets me to the end of the file.I already did that at the commented protion of the code , but I do not know how to read the file now in the while loop.
How can I achive this?
string fileName = Console.ReadLine();
using (FileStream file = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
byte[] output = new byte[file.Length]; // reversed file
// read the file backwards using SeekOrigin.Current
//
long offset;
file.Seek(0, SeekOrigin.End);
for (offset = 0; offset < fs.Length; offset++)
{
file.Seek(-1, SeekOrigin.Current);
output[offset] = (byte)file.ReadByte();
file.Seek(-1, SeekOrigin.Current);
}
// write entire reversed file array to new file
//
File.WriteAllBytes("newsFile.txt", output);
}
You could do it by reading one byte at a time, or you could read a larger buffer, write it to the output file in reverse, and continue like that until you've reached the beginning of the file. For example:
string inputFilename = "inputFile.txt";
string outputFilename = "outputFile.txt";
using (ofile = File.OpenWrite(outputFilename))
{
using (ifile = File.OpenRead(inputFilename))
{
int bufferSize = 4096;
byte[] buffer = new byte[bufferSize];
long filePos = ifile.Length;
do
{
long newPos = Math.Max(0, filePos - bufferSize);
int bytesToRead = (int)(filePos - newPos);
ifile.Seek(newPos, SeekOrigin.Set);
int bytesRead = ifile.Read(buffer, 0, bytesToRead);
// write the buffer to the output file, in reverse
for (int i = bytesRead-1; i >= 0; --i)
{
ofile.WriteByte(buffer[i]);
}
filePos = newPos;
} while (filePos > 0);
}
}
An obvious optimization would be to reverse the buffer after you've read it, and then write it in one whole chunk to the output file.
And if you know that the file will fit into memory, it's really easy:
var buffer = File.ReadAllBytes(inputFilename);
// now, reverse the buffer
int i = 0;
int j = buffer.Length-1;
while (i < j)
{
byte b = buffer[i];
buffer[i] = buffer[j];
buffer[j] = b;
++i;
--j;
}
// and write it
File.WriteAllBytes(outputFilename, buffer);
If the file is small (fits in your RAM) then this would work:
public static IEnumerable<byte> Reverse(string inputFilename)
{
var bytes = File.ReadAllBytes(inputFilename);
Array.Reverse(bytes);
foreach (var b in bytes)
{
yield return b;
}
}
Usage:
foreach (var b in Reverse("smallfile.dat"))
{
}
If the file is large (bigger than your RAM) then this would work:
using (var inputFile = File.OpenRead("bigfile.dat"))
using (var inputFileReversed = new ReverseStream(inputFile))
using (var binaryReader = new BinaryReader(inputFileReversed))
{
while (binaryReader.BaseStream.Position != binaryReader.BaseStream.Length)
{
var b = binaryReader.ReadByte();
}
}
It uses the ReverseStream class which can be found here.

How to read mixed file of byte and string

I've a mixed file with a lot of string line and part of byte encoded data.
Example:
--Begin Attach
Content-Info: /Format=TIF
Content-Description: 30085949.tif (TIF File)
Content-Transfer-Encoding: binary; Length=220096
II*II* Îh ÿÿÿÿÿÿü³küìpsMg›Êq™Æ™Ôd™‡–h7ÃAøAú áùõ=6?Eã½/ô|û ƒú7z:>„Çÿý<þ¯úýúßj?å¿þÇéöûþ“«ÿ¾ÁøKøÈ%ŠdOÿÞÈ<,Wþ‡ÿ·ƒïüúCÿß%Ï$sŸÿÃÿ÷‡þåiò>GÈù#ä|‘ò:#ä|Š":#¢:;ˆèŽˆèʤV‘ÑÑÑÑÑÑÑÑÑçIþ×o(¿zHDDDDDFp'.Ñ:ˆR:aAràÁ¬LˆÈù!ÿÿï[ÿ¯Äàiƒ"VƒDÇ)Ê6PáÈê$9C”9C†‡CD¡pE#¦œÖ{i~Úý¯kköDœ4ÉU”8`ƒt!l2G
--End Attach--
i try to read file with streamreader:
string[] lines = System.IO.File.ReadAllLines(#"C:\Users\Davide\Desktop\20041230000D.xmm")
I read line by line the file, and when line is equal "Content-Transfer-Encoding: binary; Length=220096", i read all following lines and write a "filename"(in this case 30085949.tif) file.
But i'm reading strings, not byte data and result file is damaged (now i try with tiff file). Any suggestion for me?
SOLUTION
Thanks for reply. I've adopted this solution: I builded a LineReader extend BinaryReader:
public class LineReader : BinaryReader
{
public LineReader(Stream stream, Encoding encoding)
: base(stream, encoding)
{
}
public int currentPos;
private StringBuilder stringBuffer;
public string ReadLine()
{
currentPos = 0;
char[] buf = new char[1];
stringBuffer = new StringBuilder();
bool lineEndFound = false;
while (base.Read(buf, 0, 1) > 0)
{
currentPos++;
if (buf[0] == Microsoft.VisualBasic.Strings.ChrW(10))
{
lineEndFound = true;
}
else
{
stringBuffer.Append(buf[0]);
}
if (lineEndFound)
{
return stringBuffer.ToString();
}
}
return stringBuffer.ToString();
}
}
Where Microsoft.VisualBasic.Strings.ChrW(10) is a Line Feed.
When i parse my file:
using (LineReader b = new LineReader(File.OpenRead(path), Encoding.Default))
{
int pos = 0;
int length = (int)b.BaseStream.Length;
while (pos < length)
{
string line = b.ReadLine();
pos += (b.currentPos);
if (!beginNextPart)
{
if (line.StartsWith(BEGINATTACH))
{
beginNextPart = true;
}
}
else
{
if (line.StartsWith(ENDATTACH))
{
beginNextPart = false;
}
else
{
if (line.StartsWith("Content-Transfer-Encoding: binary; Length="))
{
attachLength = Convert.ToInt32(line.Replace("Content-Transfer-Encoding: binary; Length=", ""));
byte[] attachData = b.ReadBytes(attachLength);
pos += (attachLength);
ByteArrayToFile(#"C:\users\davide\desktop\files.tif", attachData);
}
}
}
}
}
I read a byte length from file and i read following n bytes.
Your problem here is that a StreamReader assumes that it is the only thing reading the file, and as a result it reads ahead. Your best bet is to read the file as binary and use the appropriate text encoding to retrieve the string data out of your own buffer.
Since apparently you don't mind reading the entire file into memory, you can start with a:
byte[] buf = System.IO.File.ReadAllBytes(#"C:\Users\Davide\Desktop\20041230000D.xmm");
Then assuming you're using UTF-8 for your text data:
int offset = 0;
int binaryLength = 0;
while (binaryLength == 0 && offset < buf.Length) {
var eolIdx = Array.IndexOf(offset, 13); // In a UTF-8 stream, byte 13 always represents newline
string line = System.Text.Encoding.UTF8.GetString(buf, offset, eolIdx - offset - 1);
// Process your line appropriately here, and set binaryLength if you expect binary data to follow
offset = eolIdx + 1;
}
// You don't necessarily need to copy binary data out, but just to show where it is:
var binary = new byte[binaryLength];
Buffer.BlockCopy(buf, offset, binary, 0, binaryLength);
You might also want to do a line.TrimEnd('\r'), if you expect Window-style line endings.

Replace sequence of bytes in binary file

What is the best method to replace sequence of bytes in binary file to the same length of other bytes? The binary files will be pretty large, about 50 mb and should not be loaded at once in memory.
Update: I do not know location of bytes which needs to be replaced, I need to find them first.
Assuming you're trying to replace a known section of the file.
Open a FileStream with read/write access
Seek to the right place
Overwrite existing data
Sample code coming...
public static void ReplaceData(string filename, int position, byte[] data)
{
using (Stream stream = File.Open(filename, FileMode.Open))
{
stream.Position = position;
stream.Write(data, 0, data.Length);
}
}
If you're effectively trying to do a binary version of a string.Replace (e.g. "always replace bytes { 51, 20, 34} with { 20, 35, 15 } then it's rather harder. As a quick description of what you'd do:
Allocate a buffer of at least the size of data you're interested in
Repeatedly read into the buffer, scanning for the data
If you find a match, seek back to the right place (e.g. stream.Position -= buffer.Length - indexWithinBuffer; and overwrite the data
Sounds simple so far... but the tricky bit is if the data starts near the end of the buffer. You need to remember all potential matches and how far you've matched so far, so that if you get a match when you read the next buffer's-worth, you can detect it.
There are probably ways of avoiding this trickiness, but I wouldn't like to try to come up with them offhand :)
EDIT: Okay, I've got an idea which might help...
Keep a buffer which is at least twice as big as you need
Repeatedly:
Copy the second half of the buffer into the first half
Fill the second half of the buffer from the file
Search throughout the whole buffer for the data you're looking for
That way at some point, if the data is present, it will be completely within the buffer.
You'd need to be careful about where the stream was in order to get back to the right place, but I think this should work. It would be trickier if you were trying to find all matches, but at least the first match should be reasonably simple...
My solution :
/// <summary>
/// Copy data from a file to an other, replacing search term, ignoring case.
/// </summary>
/// <param name="originalFile"></param>
/// <param name="outputFile"></param>
/// <param name="searchTerm"></param>
/// <param name="replaceTerm"></param>
private static void ReplaceTextInBinaryFile(string originalFile, string outputFile, string searchTerm, string replaceTerm)
{
byte b;
//UpperCase bytes to search
byte[] searchBytes = Encoding.UTF8.GetBytes(searchTerm.ToUpper());
//LowerCase bytes to search
byte[] searchBytesLower = Encoding.UTF8.GetBytes(searchTerm.ToLower());
//Temporary bytes during found loop
byte[] bytesToAdd = new byte[searchBytes.Length];
//Search length
int searchBytesLength = searchBytes.Length;
//First Upper char
byte searchByte0 = searchBytes[0];
//First Lower char
byte searchByte0Lower = searchBytesLower[0];
//Replace with bytes
byte[] replaceBytes = Encoding.UTF8.GetBytes(replaceTerm);
int counter = 0;
using (FileStream inputStream = File.OpenRead(originalFile)) {
//input length
long srcLength = inputStream.Length;
using (BinaryReader inputReader = new BinaryReader(inputStream)) {
using (FileStream outputStream = File.OpenWrite(outputFile)) {
using (BinaryWriter outputWriter = new BinaryWriter(outputStream)) {
for (int nSrc = 0; nSrc < srcLength; ++nSrc)
//first byte
if ((b = inputReader.ReadByte()) == searchByte0
|| b == searchByte0Lower) {
bytesToAdd[0] = b;
int nSearch = 1;
//next bytes
for (; nSearch < searchBytesLength; ++nSearch)
//get byte, save it and test
if ((b = bytesToAdd[nSearch] = inputReader.ReadByte()) != searchBytes[nSearch]
&& b != searchBytesLower[nSearch]) {
break;//fail
}
//Avoid overflow. No need, in my case, because no chance to see searchTerm at the end.
//else if (nSrc + nSearch >= srcLength)
// break;
if (nSearch == searchBytesLength) {
//success
++counter;
outputWriter.Write(replaceBytes);
nSrc += nSearch - 1;
}
else {
//failed, add saved bytes
outputWriter.Write(bytesToAdd, 0, nSearch + 1);
nSrc += nSearch;
}
}
else
outputWriter.Write(b);
}
}
}
}
Console.WriteLine("ReplaceTextInBinaryFile.counter = " + counter);
}
You can use my BinaryUtility to search and replace one or more bytes without loading the entire file into memory like this:
var searchAndReplace = new List<Tuple<byte[], byte[]>>()
{
Tuple.Create(
BitConverter.GetBytes((UInt32)0xDEADBEEF),
BitConverter.GetBytes((UInt32)0x01234567)),
Tuple.Create(
BitConverter.GetBytes((UInt32)0xAABBCCDD),
BitConverter.GetBytes((UInt16)0xAFFE)),
};
using(var reader =
new BinaryReader(new FileStream(#"C:\temp\data.bin", FileMode.Open)))
{
using(var writer =
new BinaryWriter(new FileStream(#"C:\temp\result.bin", FileMode.Create)))
{
BinaryUtility.Replace(reader, writer, searchAndReplace);
}
}
BinaryUtility code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
public static class BinaryUtility
{
public static IEnumerable<byte> GetByteStream(BinaryReader reader)
{
const int bufferSize = 1024;
byte[] buffer;
do
{
buffer = reader.ReadBytes(bufferSize);
foreach (var d in buffer) { yield return d; }
} while (bufferSize == buffer.Length);
}
public static void Replace(BinaryReader reader, BinaryWriter writer, IEnumerable<Tuple<byte[], byte[]>> searchAndReplace)
{
foreach (byte d in Replace(GetByteStream(reader), searchAndReplace)) { writer.Write(d); }
}
public static IEnumerable<byte> Replace(IEnumerable<byte> source, IEnumerable<Tuple<byte[], byte[]>> searchAndReplace)
{
foreach (var s in searchAndReplace)
{
source = Replace(source, s.Item1, s.Item2);
}
return source;
}
public static IEnumerable<byte> Replace(IEnumerable<byte> input, IEnumerable<byte> from, IEnumerable<byte> to)
{
var fromEnumerator = from.GetEnumerator();
fromEnumerator.MoveNext();
int match = 0;
foreach (var data in input)
{
if (data == fromEnumerator.Current)
{
match++;
if (fromEnumerator.MoveNext()) { continue; }
foreach (byte d in to) { yield return d; }
match = 0;
fromEnumerator.Reset();
fromEnumerator.MoveNext();
continue;
}
if (0 != match)
{
foreach (byte d in from.Take(match)) { yield return d; }
match = 0;
fromEnumerator.Reset();
fromEnumerator.MoveNext();
}
yield return data;
}
if (0 != match)
{
foreach (byte d in from.Take(match)) { yield return d; }
}
}
}
public static void BinaryReplace(string sourceFile, byte[] sourceSeq, string targetFile, byte[] targetSeq)
{
FileStream sourceStream = File.OpenRead(sourceFile);
FileStream targetStream = File.Create(targetFile);
try
{
int b;
long foundSeqOffset = -1;
int searchByteCursor = 0;
while ((b=sourceStream.ReadByte()) != -1)
{
if (sourceSeq[searchByteCursor] == b)
{
if (searchByteCursor == sourceSeq.Length - 1)
{
targetStream.Write(targetSeq, 0, targetSeq.Length);
searchByteCursor = 0;
foundSeqOffset = -1;
}
else
{
if (searchByteCursor == 0)
{
foundSeqOffset = sourceStream.Position - 1;
}
++searchByteCursor;
}
}
else
{
if (searchByteCursor == 0)
{
targetStream.WriteByte((byte) b);
}
else
{
targetStream.WriteByte(sourceSeq[0]);
sourceStream.Position = foundSeqOffset + 1;
searchByteCursor = 0;
foundSeqOffset = -1;
}
}
}
}
finally
{
sourceStream.Dispose();
targetStream.Dispose();
}
}

Categories

Resources