What is the best method to replace a sequence of bytes in a binary file with another sequence of the same length? The binary files are pretty large, about 50 MB, and should not be loaded into memory at once.
Update: I do not know the location of the bytes which need to be replaced; I need to find them first.
Assuming you're trying to replace a known section of the file:
Open a FileStream with read/write access
Seek to the right place
Overwrite existing data
Sample code coming...
public static void ReplaceData(string filename, int position, byte[] data)
{
using (Stream stream = File.Open(filename, FileMode.Open))
{
stream.Position = position;
stream.Write(data, 0, data.Length);
}
}
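For example, to patch four bytes at a known offset (the path and values here are purely illustrative):
ReplaceData(@"C:\temp\data.bin", 0x20, new byte[] { 0x20, 0x35, 0x15, 0x00 });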
If you're effectively trying to do a binary version of string.Replace (e.g. "always replace bytes { 51, 20, 34 } with { 20, 35, 15 }") then it's rather harder. As a quick description of what you'd do:
Allocate a buffer of at least the size of the data you're interested in
Repeatedly read into the buffer, scanning for the data
If you find a match, seek back to the right place (e.g. stream.Position -= buffer.Length - indexWithinBuffer;) and overwrite the data
Sounds simple so far... but the tricky bit is if the data starts near the end of the buffer. You need to remember all potential matches and how far you've matched so far, so that if you get a match when you read the next buffer's-worth, you can detect it.
There are probably ways of avoiding this trickiness, but I wouldn't like to try to come up with them offhand :)
EDIT: Okay, I've got an idea which might help...
Keep a buffer which is at least twice as big as you need
Repeatedly:
Copy the second half of the buffer into the first half
Fill the second half of the buffer from the file
Search throughout the whole buffer for the data you're looking for
That way at some point, if the data is present, it will be completely within the buffer.
You'd need to be careful about where the stream was in order to get back to the right place, but I think this should work. It would be trickier if you were trying to find all matches, but at least the first match should be reasonably simple...
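To make that concrete, here's a minimal sketch of the overlapping-buffer search. It assumes the pattern fits in half the buffer, that Read fills the buffer except at end of file, and it only reports the first match (the method name and the 4096-byte floor are my own, illustrative choices):
public static long FindPattern(Stream stream, byte[] pattern)
{
    // Buffer at least twice the pattern length, so any occurrence
    // is eventually contained completely within one buffer's worth.
    byte[] buffer = new byte[Math.Max(4096, pattern.Length * 2)];
    int half = buffer.Length / 2;
    long bufferStart = 0; // file offset corresponding to buffer[0]
    int valid = stream.Read(buffer, 0, buffer.Length);
    while (true)
    {
        // scan everything currently in the buffer
        for (int i = 0; i + pattern.Length <= valid; i++)
        {
            int j = 0;
            while (j < pattern.Length && buffer[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return bufferStart + i; // absolute file offset
        }
        if (valid < buffer.Length) return -1; // hit end of file, no match
        // slide the window: copy the second half over the first half, refill the second half
        Array.Copy(buffer, half, buffer, 0, half);
        bufferStart += half;
        valid = half + stream.Read(buffer, half, half);
    }
}
If this returns a non-negative offset, you can open the file with read/write access, seek there, and overwrite the bytes exactly as in ReplaceData above.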
My solution:
/// <summary>
/// Copy data from one file to another, replacing the search term, ignoring case.
/// </summary>
/// <param name="originalFile"></param>
/// <param name="outputFile"></param>
/// <param name="searchTerm"></param>
/// <param name="replaceTerm"></param>
private static void ReplaceTextInBinaryFile(string originalFile, string outputFile, string searchTerm, string replaceTerm)
{
    byte b;
    //UpperCase bytes to search
    byte[] searchBytes = Encoding.UTF8.GetBytes(searchTerm.ToUpper());
    //LowerCase bytes to search
    byte[] searchBytesLower = Encoding.UTF8.GetBytes(searchTerm.ToLower());
    //Temporary bytes saved during the matching loop
    byte[] bytesToAdd = new byte[searchBytes.Length];
    //Search length
    int searchBytesLength = searchBytes.Length;
    //First Upper char
    byte searchByte0 = searchBytes[0];
    //First Lower char
    byte searchByte0Lower = searchBytesLower[0];
    //Replacement bytes
    byte[] replaceBytes = Encoding.UTF8.GetBytes(replaceTerm);
    int counter = 0;
    using (FileStream inputStream = File.OpenRead(originalFile)) {
        //input length
        long srcLength = inputStream.Length;
        using (BinaryReader inputReader = new BinaryReader(inputStream)) {
            using (FileStream outputStream = File.OpenWrite(outputFile)) {
                using (BinaryWriter outputWriter = new BinaryWriter(outputStream)) {
                    for (int nSrc = 0; nSrc < srcLength; ++nSrc) {
                        //first byte
                        if ((b = inputReader.ReadByte()) == searchByte0
                            || b == searchByte0Lower) {
                            bytesToAdd[0] = b;
                            int nSearch = 1;
                            //next bytes
                            for (; nSearch < searchBytesLength; ++nSearch) {
                                //get byte, save it and test
                                if ((b = bytesToAdd[nSearch] = inputReader.ReadByte()) != searchBytes[nSearch]
                                    && b != searchBytesLower[nSearch]) {
                                    break; //fail
                                }
                                //Avoid overflow. Not needed in my case, because there is no chance of seeing searchTerm at the very end.
                                //else if (nSrc + nSearch >= srcLength)
                                //    break;
                            }
                            if (nSearch == searchBytesLength) {
                                //success
                                ++counter;
                                outputWriter.Write(replaceBytes);
                                nSrc += nSearch - 1;
                            }
                            else {
                                //failed, write the saved bytes back out
                                outputWriter.Write(bytesToAdd, 0, nSearch + 1);
                                nSrc += nSearch;
                            }
                        }
                        else {
                            outputWriter.Write(b);
                        }
                    }
                }
            }
        }
    }
    Console.WriteLine("ReplaceTextInBinaryFile.counter = " + counter);
}
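For reference, a call looks like this (paths and terms are placeholders; keep searchTerm and replaceTerm the same length if offsets elsewhere in the file matter):
ReplaceTextInBinaryFile(@"C:\temp\input.bin", @"C:\temp\output.bin", "OLDNAME", "NEWNAME");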
You can use my BinaryUtility to search and replace one or more byte sequences without loading the entire file into memory, like this:
var searchAndReplace = new List<Tuple<byte[], byte[]>>()
{
    Tuple.Create(
        BitConverter.GetBytes((UInt32)0xDEADBEEF),
        BitConverter.GetBytes((UInt32)0x01234567)),
    Tuple.Create(
        BitConverter.GetBytes((UInt32)0xAABBCCDD),
        BitConverter.GetBytes((UInt16)0xAFFE)),
};
using (var reader =
    new BinaryReader(new FileStream(@"C:\temp\data.bin", FileMode.Open)))
{
    using (var writer =
        new BinaryWriter(new FileStream(@"C:\temp\result.bin", FileMode.Create)))
    {
        BinaryUtility.Replace(reader, writer, searchAndReplace);
    }
}
BinaryUtility code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BinaryUtility
{
    public static IEnumerable<byte> GetByteStream(BinaryReader reader)
    {
        const int bufferSize = 1024;
        byte[] buffer;
        do
        {
            buffer = reader.ReadBytes(bufferSize);
            foreach (var d in buffer) { yield return d; }
        } while (bufferSize == buffer.Length);
    }

    public static void Replace(BinaryReader reader, BinaryWriter writer, IEnumerable<Tuple<byte[], byte[]>> searchAndReplace)
    {
        foreach (byte d in Replace(GetByteStream(reader), searchAndReplace)) { writer.Write(d); }
    }

    public static IEnumerable<byte> Replace(IEnumerable<byte> source, IEnumerable<Tuple<byte[], byte[]>> searchAndReplace)
    {
        foreach (var s in searchAndReplace)
        {
            source = Replace(source, s.Item1, s.Item2);
        }
        return source;
    }

    public static IEnumerable<byte> Replace(IEnumerable<byte> input, IEnumerable<byte> from, IEnumerable<byte> to)
    {
        var fromEnumerator = from.GetEnumerator();
        fromEnumerator.MoveNext();
        int match = 0;
        foreach (var data in input)
        {
            if (data == fromEnumerator.Current)
            {
                match++;
                if (fromEnumerator.MoveNext()) { continue; }
                // full pattern matched: emit the replacement instead
                foreach (byte d in to) { yield return d; }
                match = 0;
                fromEnumerator.Reset();
                fromEnumerator.MoveNext();
                continue;
            }
            if (0 != match)
            {
                // partial match failed: emit the bytes swallowed so far
                foreach (byte d in from.Take(match)) { yield return d; }
                match = 0;
                fromEnumerator.Reset();
                fromEnumerator.MoveNext();
            }
            yield return data;
        }
        if (0 != match)
        {
            // input ended mid-match: flush the pending bytes
            foreach (byte d in from.Take(match)) { yield return d; }
        }
    }
}
public static void BinaryReplace(string sourceFile, byte[] sourceSeq, string targetFile, byte[] targetSeq)
{
    FileStream sourceStream = File.OpenRead(sourceFile);
    FileStream targetStream = File.Create(targetFile);
    try
    {
        int b;
        long foundSeqOffset = -1;
        int searchByteCursor = 0;
        while ((b = sourceStream.ReadByte()) != -1)
        {
            if (sourceSeq[searchByteCursor] == b)
            {
                if (searchByteCursor == sourceSeq.Length - 1)
                {
                    // whole sequence matched: write the replacement
                    targetStream.Write(targetSeq, 0, targetSeq.Length);
                    searchByteCursor = 0;
                    foundSeqOffset = -1;
                }
                else
                {
                    if (searchByteCursor == 0)
                    {
                        // remember where this candidate match started
                        foundSeqOffset = sourceStream.Position - 1;
                    }
                    ++searchByteCursor;
                }
            }
            else
            {
                if (searchByteCursor == 0)
                {
                    targetStream.WriteByte((byte)b);
                }
                else
                {
                    // candidate failed: emit its first byte and rescan from the next one
                    targetStream.WriteByte(sourceSeq[0]);
                    sourceStream.Position = foundSeqOffset + 1;
                    searchByteCursor = 0;
                    foundSeqOffset = -1;
                }
            }
        }
    }
    finally
    {
        sourceStream.Dispose();
        targetStream.Dispose();
    }
}
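A call might look like this, reusing the example byte patterns from earlier (the file names are illustrative):
BinaryReplace(
    @"C:\temp\source.bin", new byte[] { 51, 20, 34 },
    @"C:\temp\target.bin", new byte[] { 20, 35, 15 });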
I wonder what is the best way to store binary data indexed by a string key into a single file.
These are the circumstances I'm looking at:
Data indexed by a string key with variable length (max. 255 characters, ASCII only is fine).
Binary data has variable length (500 bytes up to 10 KB).
Amount of data stored < 5,000 entries.
In production, only the functions "GetDataByKey" and "GetAllKeys" are needed, and these should therefore be fast.
Adding data is not used in production and can therefore be slow.
Is there any simple C#-based library that would fit these requirements?
I was looking at some NoSQL databases, but this seems to be a bit over the top for such a very simple data structure.
As only a small percentage of the data records are used during an application run, I would prefer not to read everything into memory on application start (e.g. using serialization), but instead read only the entries from the file that are really needed at runtime.
Any ideas or tips would be much appreciated, thanks!
Use BinaryFormatter as in the code below:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.Serialization.Formatters.Binary;
using System.Xml.Serialization;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.bin";

        static void Main(string[] args)
        {
            Read_Write readWrite = new Read_Write();
            readWrite.CreateData(1000);
            readWrite.WriteData(FILENAME);
            Data data = readWrite.GetRecord(FILENAME, "101");
        }
    }

    [Serializable()]
    [XmlRoot(ElementName = "ABC")]
    public struct Data
    {
        public byte[] name;
        public byte[] data;
    }

    public class Read_Write
    {
        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
        static extern int memcmp(byte[] b1, byte[] b2, long count);

        const int MIN_SIZE = 500;
        const int MAX_SIZE = 10000;

        public List<Data> data { get; set; }
        Dictionary<string, Data> dict = new Dictionary<string, Data>();

        public void CreateData(int numberRecords)
        {
            data = new List<Data>();
            Random rand = new Random(); // create once, outside the loop, so values differ per record
            for (int i = 0; i < numberRecords; i++)
            {
                Data newData = new Data();
                string name = i.ToString() + '\0'; //null terminate string
                newData.name = Encoding.UTF8.GetBytes(name);
                int size = rand.Next(MIN_SIZE, MAX_SIZE);
                newData.data = Enumerable.Range(0, size).Select(x => (byte)(rand.Next(0, 0xFF) & 0xFF)).ToArray();
                data.Add(newData);
            }
        }

        public void WriteData(string filename)
        {
            Stream writer = File.OpenWrite(filename);
            //write number of records
            byte[] numberOfRecords = BitConverter.GetBytes((int)data.Count());
            writer.Write(numberOfRecords, 0, 4);
            foreach (Data d in data)
            {
                BinaryFormatter formatter = new BinaryFormatter();
                formatter.Serialize(writer, d);
            }
            writer.Flush();
            writer.Close();
        }

        public Data GetRecord(string filename, string name)
        {
            Data record = new Data();
            Stream reader = File.OpenRead(filename);
            byte[] numberOfRecords = new byte[4];
            reader.Read(numberOfRecords, 0, 4);
            int records = BitConverter.ToInt32(numberOfRecords, 0);
            DateTime start = DateTime.Now;
            for (int i = 0; i < records; i++)
            {
                BinaryFormatter formatter = new BinaryFormatter();
                Data d = (Data)formatter.Deserialize(reader);
                if (name == GetString(d.name))
                {
                    record = d;
                    break;
                }
            }
            DateTime end = DateTime.Now;
            TimeSpan time = end - start;
            reader.Close();
            return record;
        }

        public string GetString(byte[] characters)
        {
            int length = characters.ToList().IndexOf(0x00);
            return Encoding.UTF8.GetString(characters, 0, length);
        }
    }
}
As there seems to be no solution/library available for this yet (probably because the problem is just too simple to share ;-) ), I've built a small class myself.
In case somebody else needs the same, this is the way I store this string-key-based binary data now:
internal class BinaryKeyStorage
{
    private const string FILE_PATH = @"data.bin";

    private static MemoryMappedFile _memoryFile;
    private static MemoryMappedViewStream _memoryFileStream;
    private static Dictionary<string, Entry> _index;

    private class Entry
    {
        public Entry(int position, int length)
        {
            Position = position;
            Length = length;
        }

        public int Position { get; }
        public int Length { get; }
    }

    public static void CreateFile(Dictionary<string, byte[]> keyValues)
    {
        // 4 bytes for int count of entries
        // and per entry:
        // - string length + 1 byte for string prefix
        // - 2x4 bytes for int address start and length
        var headerLength = 4 + keyValues.Keys.Sum(dataKey => dataKey.Length + 9);
        var nextStartPosition = headerLength;
        using (var binaryWriter = new BinaryWriter(File.Open(FILE_PATH, FileMode.Create)))
        {
            binaryWriter.Write(keyValues.Count);
            // writing header
            foreach (var keyValue in keyValues)
            {
                binaryWriter.Write(keyValue.Key);
                binaryWriter.Write(nextStartPosition);
                binaryWriter.Write(keyValue.Value.Length);
                nextStartPosition += keyValue.Value.Length;
            }
            // writing data
            foreach (var keyValue in keyValues)
            {
                binaryWriter.Write(keyValue.Value);
            }
        }
    }

    public static List<string> GetAllKeys()
    {
        InitializeIndexIfNeeded();
        return _index.Keys.ToList();
    }

    public static byte[] GetData(string key)
    {
        InitializeIndexIfNeeded();
        var entry = _index[key];
        _memoryFileStream.Seek(entry.Position, SeekOrigin.Begin);
        var data = new byte[entry.Length];
        _memoryFileStream.Read(data, 0, data.Length);
        return data;
    }

    private static void InitializeIndexIfNeeded()
    {
        if (_memoryFile != null) return;
        _memoryFile = MemoryMappedFile.CreateFromFile(FILE_PATH, FileMode.Open);
        _memoryFileStream = _memoryFile.CreateViewStream();
        _index = new Dictionary<string, Entry>();
        using (var binaryReader = new BinaryReader(_memoryFileStream, Encoding.Default, true))
        {
            var count = binaryReader.ReadInt32();
            for (var i = 0; i < count; i++)
            {
                var dataKey = binaryReader.ReadString();
                var dataPosition = binaryReader.ReadInt32();
                var dataLength = binaryReader.ReadInt32();
                _index.Add(dataKey, new Entry(dataPosition, dataLength));
            }
        }
    }
}
It just caches the file header/index (the string keys together with the position/length of the data) in memory; the actual data is read directly from the memory-mapped file only when needed.
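Usage is then just this (a hypothetical example; CreateFile would only run at build time, since adding data is allowed to be slow):
var entries = new Dictionary<string, byte[]>
{
    ["alpha"] = new byte[600],  // each value 500 bytes up to 10 KB
    ["beta"] = new byte[2048],
};
BinaryKeyStorage.CreateFile(entries);

List<string> keys = BinaryKeyStorage.GetAllKeys();
byte[] data = BinaryKeyStorage.GetData("alpha");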
I'm trying to parse a binary file as fast as possible. This is what I first tried:
using (FileStream filestream = path.OpenRead()) {
    using (var d = new GZipStream(filestream, CompressionMode.Decompress)) {
        using (MemoryStream m = new MemoryStream()) {
            d.CopyTo(m);
            m.Position = 0;
            using (BinaryReaderBigEndian b = new BinaryReaderBigEndian(m)) {
                while (b.BaseStream.Position != b.BaseStream.Length) {
                    UInt32 value = b.ReadUInt32();
                }
            }
        }
    }
}
Where the BinaryReaderBigEndian class is implemented as follows:
public class BinaryReaderBigEndian : BinaryReader {
    public BinaryReaderBigEndian(Stream stream) : base(stream) { }

    public override UInt32 ReadUInt32() {
        var x = base.ReadBytes(4);
        Array.Reverse(x);
        return BitConverter.ToUInt32(x, 0);
    }
}
Then I tried to get a performance improvement by using ReadOnlySpan instead of MemoryStream:
using (FileStream filestream = path.OpenRead()) {
    using (var d = new GZipStream(filestream, CompressionMode.Decompress)) {
        using (MemoryStream m = new MemoryStream()) {
            d.CopyTo(m);
            int position = 0;
            ReadOnlySpan<byte> stream = new ReadOnlySpan<byte>(m.ToArray());
            while (position != stream.Length) {
                UInt32 value = stream.ReadUInt32(position);
                position += 4;
            }
        }
    }
}
Where the BinaryReaderBigEndian class changed to:
public static class BinaryReaderBigEndian {
    public static UInt32 ReadUInt32(this ReadOnlySpan<byte> stream, int start) {
        var data = stream.Slice(start, 4).ToArray();
        Array.Reverse(data);
        return BitConverter.ToUInt32(data, 0);
    }
}
But, unfortunately, I didn't notice any improvement. So, where am I going wrong?
I did some measurements of your code on my computer (Intel Q9400, 8 GiB RAM, SSD disk, Win10 x64 Home, .NET Framework 4.7.2, tested with a 15 MB (when unpacked) file), with these results:
No-Span version: 520 ms
Span version: 720 ms
So the Span version is actually slower! Why? Because new ReadOnlySpan<byte>(m.ToArray()) performs an additional copy of the whole file, and ReadUInt32() also performs many slicings of the Span (slicing is cheap, but not free). Since you performed more work, you can't expect performance to be any better just because you used Span.
So can we do better? Yes. It turns out that the slowest part of your code is actually the garbage collection caused by the 4-byte arrays that the .ToArray() calls in the ReadUInt32() method repeatedly allocate. You can avoid it by implementing ReadUInt32() yourself. It's pretty easy and also eliminates the need for Span slicing. You can also replace new ReadOnlySpan<byte>(m.ToArray()) with new ReadOnlySpan<byte>(m.GetBuffer()).Slice(0, (int)m.Length), which performs a cheap slice instead of copying the whole file. Now the code looks like this:
public static void Read(FileInfo path)
{
    using (FileStream filestream = path.OpenRead())
    {
        using (var d = new GZipStream(filestream, CompressionMode.Decompress))
        {
            using (MemoryStream m = new MemoryStream())
            {
                d.CopyTo(m);
                int position = 0;
                ReadOnlySpan<byte> stream = new ReadOnlySpan<byte>(m.GetBuffer()).Slice(0, (int)m.Length);
                while (position != stream.Length)
                {
                    UInt32 value = stream.ReadUInt32(position);
                    position += 4;
                }
            }
        }
    }
}

public static class BinaryReaderBigEndian
{
    public static UInt32 ReadUInt32(this ReadOnlySpan<byte> stream, int start)
    {
        UInt32 res = 0;
        for (int i = 0; i < 4; i++)
        {
            res = (res << 8) | (((UInt32)stream[start + i]) & 0xff);
        }
        return res;
    }
}
With these changes I get from 720 ms down to 165 ms (4x faster). Sounds great, doesn't it? But we can do even better: we can avoid the MemoryStream copy completely, and inline and further optimize ReadUInt32():
public static void Read(FileInfo path)
{
    using (FileStream filestream = path.OpenRead())
    {
        using (var d = new GZipStream(filestream, CompressionMode.Decompress))
        {
            var buffer = new byte[64 * 1024];
            do
            {
                int bufferDataLength = FillBuffer(d, buffer);
                if (bufferDataLength % 4 != 0)
                    throw new Exception("Stream length not divisible by 4");
                if (bufferDataLength == 0)
                    break;
                for (int i = 0; i < bufferDataLength; i += 4)
                {
                    uint value = unchecked(
                        (((uint)buffer[i]) << 24)
                        | (((uint)buffer[i + 1]) << 16)
                        | (((uint)buffer[i + 2]) << 8)
                        | (((uint)buffer[i + 3]) << 0));
                }
            } while (true);
        }
    }
}

private static int FillBuffer(Stream stream, byte[] buffer)
{
    int read = 0;
    int totalRead = 0;
    do
    {
        read = stream.Read(buffer, totalRead, buffer.Length - totalRead);
        totalRead += read;
    } while (read > 0 && totalRead < buffer.Length);
    return totalRead;
}
And now it takes less than 90 ms (8x faster than the original!). And without Span! Span is great in situations where it allows you to perform slicing and avoid array copies, but it won't improve performance just because you blindly use it. After all, Span is designed to have performance characteristics on par with Array, but not better (and only on runtimes that have special support for it, such as .NET Core 2.1).
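As a footnote: on .NET Core 2.1+ the framework can do the big-endian conversion for you. BinaryPrimitives reads straight out of a span without allocating, so it could replace the manual shift/or expression in the loop above (a sketch I haven't benchmarked here):
using System.Buffers.Binary;

// inside the loop, instead of the shift/or expression:
uint value = BinaryPrimitives.ReadUInt32BigEndian(buffer.AsSpan(i, 4));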
I am trying to convert a zip file into a text file (XML) using the following methods. It works fine for smaller files but does not seem to work for files larger than 50 MB.
class Program
{
    public static void Main(string[] args)
    {
        try
        {
            string importFilePath = @"D:\CorpTax\Tasks\966442\CS Publish error\CSUPD20180604L.zip";
            int maxLengthInMb = 20;
            byte[] payLoad = File.ReadAllBytes(importFilePath);
            int payLoadInMb = (payLoad.Length / 1024) / 1024;
            bool splitIntoMultipleFiles = (payLoadInMb / maxLengthInMb) > 1;
            int payLoadLength = splitIntoMultipleFiles ? maxLengthInMb * 1024 * 1024 : payLoad.Length;
            if (splitIntoMultipleFiles)
            {
                foreach (byte[] splitPayLoad in payLoad.Slices(payLoadLength))
                {
                    ToXml(splitPayLoad);
                }
            }
        }
        catch (Exception ex)
        {
            throw new Exception(ex.Message);
        }
    }

    public static string ToXml(byte[] payLoad)
    {
        using (XmlStringWriter xmlStringWriter = new XmlStringWriter())
        {
            xmlStringWriter.WriteStartDocument();
            xmlStringWriter.Writer.WriteStartElement("Payload");
            xmlStringWriter.Writer.WriteRaw(Convert.ToBase64String(payLoad));
            xmlStringWriter.Writer.WriteEndElement();
            xmlStringWriter.WriteEndDocument();
            return xmlStringWriter.ToString();
        }
    }
}
I have a .zip file which is about 120 MB in size, and I get a System.OutOfMemoryException when calling Convert.ToBase64String().
So I went ahead and split the byte array into 20 MB chunks, hoping it would not fail. But I see that it works until it goes through the loop 3 times, i.e. it is able to convert 60 MB of the data, and on the 4th iteration I get the same exception. Sometimes I also get the exception at the line return xmlStringWriter.ToString().
To split the byte[] I used the following extension class:
public static class ArrayExtensions
{
    public static T[] CopySlice<T>(this T[] source, int index, int length, bool padToLength = false)
    {
        int n = length;
        T[] slice = null;
        if (source.Length < index + length)
        {
            n = source.Length - index;
            if (padToLength)
            {
                slice = new T[length];
            }
        }
        if (slice == null) slice = new T[n];
        Array.Copy(source, index, slice, 0, n);
        return slice;
    }

    public static IEnumerable<T[]> Slices<T>(this T[] source, int count, bool padToLength = false)
    {
        for (var i = 0; i < source.Length; i += count)
        {
            yield return source.CopySlice(i, count, padToLength);
        }
    }
}
I got the above code from the following link
Splitting a byte[] into multiple byte[] arrays in C#
The funny part is that the program runs fine as a console application, but when I put this code into the Windows application it throws the System.OutOfMemoryException.
Preferably you want to be doing something like this:
byte[] Packet = new byte[3 * 4096]; // a multiple of 3, so each chunk's Base64 output concatenates cleanly (no mid-stream padding)
string b64str = "";
using (FileStream fs = new FileStream(file, FileMode.Open))
{
    int i = Packet.Length;
    while (i == Packet.Length)
    {
        i = fs.Read(Packet, 0, Packet.Length);
        b64str = Convert.ToBase64String(Packet, 0, i);
        // write b64str to your XML here (e.g. XmlWriter.WriteRaw) instead of accumulating it
    }
}
With that b64str you should create your XML data as you go.
Also, it is typically unwise to allocate 20 MB in one go; arrays that large go straight to the large object heap.
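If you'd rather not manage the chunk size yourself, an alternative is to let the framework stream the Base64: reading through a CryptoStream with a ToBase64Transform handles the 3-byte block boundaries for you (a sketch; the output file name is illustrative):
using System.Security.Cryptography;

using (FileStream input = File.OpenRead(file))
using (var base64Stream = new CryptoStream(input, new ToBase64Transform(), CryptoStreamMode.Read))
using (StreamReader reader = new StreamReader(base64Stream))
using (StreamWriter output = File.CreateText(@"C:\temp\payload.b64"))
{
    char[] chunk = new char[4096];
    int n;
    while ((n = reader.Read(chunk, 0, chunk.Length)) > 0)
    {
        output.Write(chunk, 0, n); // the Base64 text accumulates on disk, never fully in memory
    }
}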
Hi, I am trying to read a file one byte at a time in reverse order. So far I have only managed to read the file from beginning to end and write it to another file.
I need to be able to read the file from the end to the beginning and print it to another file.
This is what I have so far:
string fileName = Console.ReadLine();
using (FileStream file = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
    //file.Seek(endOfFile, SeekOrigin.End);
    int bytes;
    using (FileStream newFile = new FileStream("newsFile.txt", FileMode.Create, FileAccess.Write))
    {
        while ((bytes = file.ReadByte()) >= 0)
        {
            Console.WriteLine(bytes.ToString());
            newFile.WriteByte((byte)bytes);
        }
    }
}
I know that I have to use the Seek method on the FileStream, and that gets me to the end of the file. I already did that in the commented portion of the code, but I do not know how to read the file in the while loop now.
How can I achieve this?
string fileName = Console.ReadLine();
using (FileStream file = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
    byte[] output = new byte[file.Length]; // reversed file
    // read the file backwards using SeekOrigin.Current
    long offset;
    file.Seek(0, SeekOrigin.End);
    for (offset = 0; offset < file.Length; offset++)
    {
        file.Seek(-1, SeekOrigin.Current);
        output[offset] = (byte)file.ReadByte(); // ReadByte advances the position again
        file.Seek(-1, SeekOrigin.Current);
    }
    // write entire reversed file array to new file
    File.WriteAllBytes("newsFile.txt", output);
}
You could do it by reading one byte at a time, or you could read a larger buffer, write it to the output file in reverse, and continue like that until you've reached the beginning of the file. For example:
string inputFilename = "inputFile.txt";
string outputFilename = "outputFile.txt";
using (var ofile = File.OpenWrite(outputFilename))
{
    using (var ifile = File.OpenRead(inputFilename))
    {
        int bufferSize = 4096;
        byte[] buffer = new byte[bufferSize];
        long filePos = ifile.Length;
        do
        {
            long newPos = Math.Max(0, filePos - bufferSize);
            int bytesToRead = (int)(filePos - newPos);
            ifile.Seek(newPos, SeekOrigin.Begin);
            int bytesRead = ifile.Read(buffer, 0, bytesToRead);
            // write the buffer to the output file, in reverse
            for (int i = bytesRead - 1; i >= 0; --i)
            {
                ofile.WriteByte(buffer[i]);
            }
            filePos = newPos;
        } while (filePos > 0);
    }
}
An obvious optimization would be to reverse the buffer after you've read it, and then write it in one whole chunk to the output file.
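That optimization is a small change to the loop above (a sketch):
// after ifile.Read(...), instead of the byte-by-byte WriteByte loop:
Array.Reverse(buffer, 0, bytesRead); // reverse the chunk in memory
ofile.Write(buffer, 0, bytesRead);   // then write it out in one call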
And if you know that the file will fit into memory, it's really easy:
var buffer = File.ReadAllBytes(inputFilename);
// now, reverse the buffer
int i = 0;
int j = buffer.Length-1;
while (i < j)
{
byte b = buffer[i];
buffer[i] = buffer[j];
buffer[j] = b;
++i;
--j;
}
// and write it
File.WriteAllBytes(outputFilename, buffer);
If the file is small (fits in your RAM) then this would work:
public static IEnumerable<byte> Reverse(string inputFilename)
{
    var bytes = File.ReadAllBytes(inputFilename);
    Array.Reverse(bytes);
    foreach (var b in bytes)
    {
        yield return b;
    }
}
Usage:
foreach (var b in Reverse("smallfile.dat"))
{
    // process b here
}
If the file is large (bigger than your RAM) then this would work:
using (var inputFile = File.OpenRead("bigfile.dat"))
using (var inputFileReversed = new ReverseStream(inputFile))
using (var binaryReader = new BinaryReader(inputFileReversed))
{
    while (binaryReader.BaseStream.Position != binaryReader.BaseStream.Length)
    {
        var b = binaryReader.ReadByte();
    }
}
It uses the ReverseStream class which can be found here.
I have a mixed file with a lot of string lines and a part of byte-encoded data.
Example:
--Begin Attach
Content-Info: /Format=TIF
Content-Description: 30085949.tif (TIF File)
Content-Transfer-Encoding: binary; Length=220096
II*II* Îh ÿÿÿÿÿÿü³küìpsMg›Êq™Æ™Ôd™‡–h7ÃAøAú áùõ=6?Eã½/ô|û ƒú7z:>„Çÿý<þ¯úýúßj?å¿þÇéöûþ“«ÿ¾ÁøKøÈ%ŠdOÿÞÈ<,Wþ‡ÿ·ƒïüúCÿß%Ï$sŸÿÃÿ÷‡þåiò>GÈù#ä|‘ò:#ä|Š":#¢:;ˆèŽˆèʤV‘ÑÑÑÑÑÑÑÑÑçIþ×o(¿zHDDDDDFp'.Ñ:ˆR:aAràÁ¬LˆÈù!ÿÿï[ÿ¯Äàiƒ"VƒDÇ)Ê6PáÈê$9C”9C†‡CD¡pE#¦œÖ{i~Úý¯kköDœ4ÉU”8`ƒt!l2G
--End Attach--
I tried to read the file with a StreamReader:
string[] lines = System.IO.File.ReadAllLines(@"C:\Users\Davide\Desktop\20041230000D.xmm");
I read the file line by line, and when a line equals "Content-Transfer-Encoding: binary; Length=220096", I read all the following lines and write a "filename" (in this case 30085949.tif) file.
But I'm reading strings, not byte data, and the resulting file is damaged (for now I'm testing with a TIFF file). Any suggestions for me?
SOLUTION
Thanks for the replies. I've adopted this solution: I built a LineReader extending BinaryReader:
public class LineReader : BinaryReader
{
    public LineReader(Stream stream, Encoding encoding)
        : base(stream, encoding)
    {
    }

    public int currentPos;
    private StringBuilder stringBuffer;

    public string ReadLine()
    {
        currentPos = 0;
        char[] buf = new char[1];
        stringBuffer = new StringBuilder();
        bool lineEndFound = false;
        while (base.Read(buf, 0, 1) > 0)
        {
            currentPos++;
            if (buf[0] == Microsoft.VisualBasic.Strings.ChrW(10))
            {
                lineEndFound = true;
            }
            else
            {
                stringBuffer.Append(buf[0]);
            }
            if (lineEndFound)
            {
                return stringBuffer.ToString();
            }
        }
        return stringBuffer.ToString();
    }
}
Where Microsoft.VisualBasic.Strings.ChrW(10) is a line feed.
When I parse my file:
using (LineReader b = new LineReader(File.OpenRead(path), Encoding.Default))
{
    int pos = 0;
    int length = (int)b.BaseStream.Length;
    while (pos < length)
    {
        string line = b.ReadLine();
        pos += b.currentPos;
        if (!beginNextPart)
        {
            if (line.StartsWith(BEGINATTACH))
            {
                beginNextPart = true;
            }
        }
        else
        {
            if (line.StartsWith(ENDATTACH))
            {
                beginNextPart = false;
            }
            else
            {
                if (line.StartsWith("Content-Transfer-Encoding: binary; Length="))
                {
                    attachLength = Convert.ToInt32(line.Replace("Content-Transfer-Encoding: binary; Length=", ""));
                    byte[] attachData = b.ReadBytes(attachLength);
                    pos += attachLength;
                    ByteArrayToFile(@"C:\users\davide\desktop\files.tif", attachData);
                }
            }
        }
    }
}
I read the attachment length from the file and then read the following n bytes.
Your problem here is that a StreamReader assumes that it is the only thing reading the file, and as a result it reads ahead. Your best bet is to read the file as binary and use the appropriate text encoding to retrieve the string data out of your own buffer.
Since apparently you don't mind reading the entire file into memory, you can start with:
byte[] buf = System.IO.File.ReadAllBytes(@"C:\Users\Davide\Desktop\20041230000D.xmm");
Then assuming you're using UTF-8 for your text data:
int offset = 0;
int binaryLength = 0;
while (binaryLength == 0 && offset < buf.Length) {
    var eolIdx = Array.IndexOf(buf, (byte)10, offset); // in a UTF-8 stream, byte 10 (LF) always marks the end of a line
    string line = System.Text.Encoding.UTF8.GetString(buf, offset, eolIdx - offset);
    // Process your line appropriately here, and set binaryLength if you expect binary data to follow
    offset = eolIdx + 1;
}
// You don't necessarily need to copy the binary data out, but just to show where it is:
var binary = new byte[binaryLength];
Buffer.BlockCopy(buf, offset, binary, 0, binaryLength);
You might also want to do a line.TrimEnd('\r') if you expect Windows-style line endings.