I have a huge text file, size > 4GB and I want to replace some text in it programmatically. I know the line number at which I have to replace the text but the problem is that I do not want to copy all the text (along with my replaced line) to a second file. I have to do this within the source file. Is there a way to do this in C#?
The text which has to be replaced is exactly the same size as the source text (if this helps).
Since the file is so large you may want to take a look at the .NET 4.0 support for memory mapped files. Basically you'll need to move the file/stream pointer to the location in the file, overwrite that location, then flush the file to disk. You won't need to load the entire file into memory.
For example, without using memory-mapped files, the following overwrites part of an ASCII file. The arguments are the input file, the zero-based start index (a byte offset), and the new text.
static void Main(string[] args)
{
string inputFilename = args[0];
long startIndex = long.Parse(args[1]); // long, because offsets in a file over 2 GB do not fit in an int
string newText = args[2];
using (FileStream fs = new FileStream(inputFilename, FileMode.Open, FileAccess.Write))
{
fs.Position = startIndex;
byte[] newTextBytes = Encoding.ASCII.GetBytes(newText);
fs.Write(newTextBytes, 0, newTextBytes.Length);
}
}
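The same overwrite could also be done through a memory-mapped view. This is only a minimal sketch: the class and method names (MappedOverwrite, OverwriteAt) are made up for illustration, and it assumes ASCII text, a non-empty file, and a replacement with the same byte length as the text it replaces.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

static class MappedOverwrite
{
    // Overwrites bytes in place, starting at byteOffset, without loading the whole file.
    // Assumption: newText is ASCII and has the same byte length as the text being replaced.
    public static void OverwriteAt(string path, long byteOffset, string newText)
    {
        byte[] newBytes = Encoding.ASCII.GetBytes(newText);

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        // Map only the region that changes, not the whole multi-gigabyte file.
        using (var view = mmf.CreateViewAccessor(byteOffset, newBytes.Length))
        {
            view.WriteArray(0, newBytes, 0, newBytes.Length);
            view.Flush(); // push the change to the underlying file
        }
    }
}
```

A call such as MappedOverwrite.OverwriteAt(path, 3, "abc") would replace the three bytes at offsets 3 to 5 and leave the rest of the file untouched.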
Unless the new text is exactly the same size as the old text, you will have to re-write the file. There is no way around it. You can at least do this without keeping the entire file in memory.
Hello, I tested the following and it works well. It caters to variable-length lines separated by Environment.NewLine. If you have fixed-length lines you can seek straight to the line. To convert bytes to a string and vice versa you can use Encoding.
static byte[] ReadNextLine(FileStream fs)
{
byte[] nl = new byte[] {(byte) Environment.NewLine[0],(byte) Environment.NewLine[1] };
List<byte> ll = new List<byte>();
bool lineFound = false;
while (!lineFound)
{
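int next = fs.ReadByte(); // ReadByte returns -1 at end of stream, so test before casting to byte
if (next == -1) break;
byte b = (byte)next;
ll.Add(b);
if (b == nl[0]){
next = fs.ReadByte();
if (next == -1) break;
b = (byte)next;
ll.Add(b);
if (b == nl[1]) lineFound = true;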
}
}
return ll.Count ==0?null: ll.ToArray();
}
static void Main(string[] args)
{
using (FileStream fs = new FileStream(@"c:\70-528\junk.txt", FileMode.Open, FileAccess.ReadWrite))
{
int replaceLine=1231;
byte[] b = null;
int lineCount=1;
while (lineCount<replaceLine && (b=ReadNextLine(fs))!=null ) lineCount++;//Skip Lines
long seekPos = fs.Position;
b = ReadNextLine(fs);
fs.Seek(seekPos, SeekOrigin.Begin);
string line=new string(b.Select(x=>(char)x).ToArray());
line = line.Replace("Text1", "Text2");
b=line.ToCharArray().Select(x=>(byte)x).ToArray();
fs.Write(b, 0, b.Length);
}
}
I'm guessing you'll want to use the FileStream class, seek to your position, and write your updated data there.
I have one file:
out.txt
out.txt Contains the line:
123456789
We need to write a program so that in a certain position of the file, new numbers can be inserted.
For example: insert the number 2000 in position 3.
The file should have the following result:
1232000456789
Program restrictions: lines cannot be cached from the file, because files can be ~40 GB or even more. So the buffer has to be inserted at the desired position, and the rest of the data shifted by the length of the inserted data.
Here is the broken code. It overwrites the existing characters instead of inserting, so the data that was at that position is lost:
using (FileStream fileStream = new FileStream(oldFileForUpdate, FileMode.OpenOrCreate, FileAccess.Write))
{
foreach (KeyValuePair<long, byte[]> keyValuePair in fileStructExtension.Buffer)
{
fileStream.Seek(keyValuePair.Key, SeekOrigin.Begin);
await fileStream.WriteAsync(keyValuePair.Value, 0, keyValuePair.Value.Length, cancellationToken);
}
}
Your best bet is to copy data to a temp file like so:
Copy all the bytes before the insertion point to the temp file.
Copy the bytes to be inserted to the temp file.
Copy the remainder of the bytes from the original file to the temp file.
Delete the original file.
Rename the temp file to the original file.
Sample implementation (adjust the buffer size to taste):
public static void Insert(string filename, long offset, byte[] data)
{
byte[] buffer = new byte[32768];
var tempFile = filename + ".tmp";
using (var input = File.OpenRead(filename))
using (var output = File.Create(tempFile)) // Create truncates a leftover temp file; OpenWrite would not
{
while (offset > 0)
{
int n = (int)Math.Min(offset, buffer.Length);
n = input.Read(buffer, 0, n);
if (n <= 0)
throw new InvalidOperationException("Input file is too short.");
output.Write(buffer, 0, n);
offset -= n;
}
output.Write(data, 0, data.Length);
input.CopyTo(output);
}
File.Delete(filename);
File.Move(tempFile, filename);
}
This does of course require that you can create a temporary file in the same folder as the source file, and that there is enough free disk space for the temporary file plus the original file.
I want to read a file continuously, like GNU tail with the "-f" parameter. I need it to live-read a log file.
What is the right way to do it?
A more natural approach is to use FileSystemWatcher:
var wh = new AutoResetEvent(false);
var fsw = new FileSystemWatcher(".");
fsw.Filter = "file-to-read";
fsw.EnableRaisingEvents = true;
fsw.Changed += (s,e) => wh.Set();
var fs = new FileStream("file-to-read", FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (var sr = new StreamReader(fs))
{
var s = "";
while (true)
{
s = sr.ReadLine();
if (s != null)
Console.WriteLine(s);
else
wh.WaitOne(1000);
}
}
wh.Close();
Here the main reading loop stops to wait for incoming data, and FileSystemWatcher is used only to wake it up.
You want to open a FileStream in binary mode. Periodically, seek to the end of the file minus 1024 bytes (or whatever), then read to the end and output. That's how tail -f works.
Answers to your questions:
Binary because it's difficult to randomly access the file if you're reading it as text. You have to do the binary-to-text conversion yourself, but it's not difficult. (See below)
1024 bytes because it's a nice convenient number, and should handle 10 or 15 lines of text. Usually.
Here's an example of opening the file, reading the last 1024 bytes, and converting it to text:
static void ReadTail(string filename)
{
using (FileStream fs = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
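// Seek up to 1024 bytes back from the end of the file (less if the file is shorter)
fs.Seek(-Math.Min(1024, fs.Length), SeekOrigin.End);
// Read up to 1024 bytes; Read returns how many bytes were actually read
byte[] bytes = new byte[1024];
int bytesRead = fs.Read(bytes, 0, bytes.Length);
// Convert only the bytes that were read to a string
string s = Encoding.Default.GetString(bytes, 0, bytesRead);
// or string s = Encoding.UTF8.GetString(bytes, 0, bytesRead);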
// and output to console
Console.WriteLine(s);
}
}
Note that you must open with FileShare.ReadWrite, since you're trying to read a file that's currently open for writing by another process.
Also note that I used Encoding.Default, which in US/English and for most Western European languages will be an 8-bit character encoding. If the file is written in some other encoding (like UTF-8 or another Unicode encoding), it's possible that the bytes won't convert correctly to characters. You'll have to handle that by determining the encoding if you think this will be a problem. Search Stack Overflow for info about determining a file's text encoding.
If you want to do this periodically (every 15 seconds, for example), you can set up a timer that calls the ReadTail method as often as you want. You could optimize things a bit by opening the file only once at the start of the program. That's up to you.
To continuously monitor the tail of the file, you just need to remember the length of the file before.
public static void MonitorTailOfFile(string filePath)
{
var initialFileSize = new FileInfo(filePath).Length;
var lastReadLength = initialFileSize - 1024;
if (lastReadLength < 0) lastReadLength = 0;
while (true)
{
try
{
var fileSize = new FileInfo(filePath).Length;
if (fileSize > lastReadLength)
{
using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
fs.Seek(lastReadLength, SeekOrigin.Begin);
var buffer = new byte[1024];
while (true)
{
var bytesRead = fs.Read(buffer, 0, buffer.Length);
lastReadLength += bytesRead;
if (bytesRead == 0)
break;
var text = ASCIIEncoding.ASCII.GetString(buffer, 0, bytesRead);
Console.Write(text);
}
}
}
}
catch { }
Thread.Sleep(1000);
}
}
I had to use ASCIIEncoding, because this code isn't smart enough to cater for the variable character lengths of UTF-8 on buffer boundaries.
Note: You can change the Thread.Sleep part to be different timings, and you can also link it with a filewatcher and blocking pattern - Monitor.Enter/Wait/Pulse. For me the timer is enough, and at most it only checks the file length every second, if the file hasn't changed.
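On the ASCII limitation mentioned above: one possible way around it is a stateful System.Text.Decoder, which holds on to an incomplete multi-byte sequence at the end of one buffer and completes it with the next. The following is only a sketch of that idea, not part of the tested code above; the names Utf8Chunks and DecodeInChunks are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8Chunks
{
    // Decodes a stream in fixed-size chunks. The Decoder object keeps any
    // incomplete multi-byte sequence from the end of one chunk and finishes
    // it when the next chunk arrives, so characters are never split.
    public static string DecodeInChunks(Stream stream, int chunkSize)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var bytes = new byte[chunkSize];
        // GetMaxCharCount leaves room for a character completed from leftover bytes.
        var chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];
        var result = new StringBuilder();

        int bytesRead;
        while ((bytesRead = stream.Read(bytes, 0, bytes.Length)) > 0)
        {
            int charCount = decoder.GetChars(bytes, 0, bytesRead, chars, 0);
            result.Append(chars, 0, charCount);
        }
        return result.ToString();
    }
}
```

In the monitoring loop, the same decoder instance would have to be kept alive across polls so that its leftover state survives between reads.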
This is my solution:
static IEnumerable<string> TailFrom(string file)
{
using (var reader = File.OpenText(file))
{
while (true)
{
string line = reader.ReadLine();
if (reader.BaseStream.Length < reader.BaseStream.Position)
reader.BaseStream.Seek(0, SeekOrigin.Begin);
if (line != null) yield return line;
else Thread.Sleep(500);
}
}
}
So, in your code you can do:
foreach (string line in TailFrom(file))
{
Console.WriteLine($"line read= {line}");
}
You could use the FileSystemWatcher class which can send notifications for different events happening on the file system like file changed.
private void button1_Click(object sender, EventArgs e)
{
if (folderBrowserDialog.ShowDialog() == DialogResult.OK)
{
path = folderBrowserDialog.SelectedPath;
fileSystemWatcher.Path = path;
string[] str = Directory.GetFiles(path);
string line;
fs = new FileStream(str[0], FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
tr = new StreamReader(fs);
while ((line = tr.ReadLine()) != null)
{
listBox.Items.Add(line);
}
}
}
private void fileSystemWatcher_Changed(object sender, FileSystemEventArgs e)
{
string line;
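// Read every line appended since the last event; ReadLine returns null when nothing more is available
while ((line = tr.ReadLine()) != null)
{
listBox.Items.Add(line);
}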
}
If you are just looking for a tool to do this, then check out the free version of BareTail.
Yesterday my teacher gave me a task: make something like a database in a .txt file that contains hex values, and a C# application that reads all the hex values from this database along with their offsets. I then have to use each offset to read the hex value at that offset in a file and compare the two hex values to see whether they are the same.
I am using FileSystemWatcher to watch a chosen directory for new files. With one, two, three, or a few more files it works perfectly, but if I try to copy a very big folder the application stops with "not responding".
I tried to find where the problem comes from by adding and removing functions, and found the culprit: the function that reads the file's hex value at the given offset.
public string filesHex(string path,int bytesToRead,string offsetLong)
{
byte[] byVal;
try
{
using (Stream fileStream = new FileStream(path, FileMode.Open, FileAccess.Read))
{
BinaryReader brFile = new BinaryReader(fileStream);
offsetLong = offsetLong.Replace("x", string.Empty);
long result = 0;
long.TryParse(offsetLong, System.Globalization.NumberStyles.HexNumber, null, out result);
fileStream.Position = result;
byte[] offsetByte = brFile.ReadBytes(0);
string offsetString = HexStr(offsetByte);
//long offset = System.Convert.ToInt64(offsetString, 16);
byVal = brFile.ReadBytes(bytesToRead);
}
string hex = HexStr(byVal).Substring(2);
return hex;
}
You could create a new Thread and run the filesHex method in it.
You can set your string inside the thread code and get its value afterwards, like this:
public string hex="";
public void filesHex(string path,int bytesToRead,string offsetLong)
{
byte[] byVal;
using (Stream fileStream = new FileStream(path, FileMode.Open, FileAccess.Read))
{
BinaryReader brFile = new BinaryReader(fileStream);
offsetLong = offsetLong.Replace("x", string.Empty);
long result = 0;
long.TryParse(offsetLong, System.Globalization.NumberStyles.HexNumber, null, out result);
fileStream.Position = result;
byte[] offsetByte = brFile.ReadBytes(0);
string offsetString = HexStr(offsetByte);
//long offset = System.Convert.ToInt64(offsetString, 16);
byVal = brFile.ReadBytes(bytesToRead);
}
hex = HexStr(byVal).Substring(2);
}
This would be your call:
Thread thread = new Thread(() => filesHex("a",5,"A"));//example for parameters.
thread.Start();
string hexfinal=hex;//hex is only set once the thread has finished, so read it after that (for example after thread.Join())
This does not freeze the main UI thread, because the method runs on a separate thread.
Good luck.
I am trying to write a program that transfers a file through sound (kind of like a fax). I broke up my program into several steps:
convert file to binary
convert 1 to a certain tone and 0 to another
play the tones to another computer
other computer listens to tones
other computer converts tones into binary
other computer converts binary into file.
However, I can't seem to find a way to convert a file to binary. I found a way to convert a string to binary using
public static string StringToBinary(string data)
{
StringBuilder sb = new StringBuilder();
foreach (char c in data.ToCharArray())
{
sb.Append(Convert.ToString(c, 2).PadLeft(8,'0'));
}
return sb.ToString();
}
From http://www.fluxbytes.com/csharp/convert-string-to-binary-and-binary-to-string-in-c/ .
But I can't find out how to convert a file to binary (the file could be of any extension).
So, how can I convert a file to binary? Is there a better way for me to write my program?
Why don't you just open the file in binary mode?
This function opens the file in binary mode and returns the byte array:
private byte[] GetBinaryFile(string filename)
{
byte[] bytes;
using (FileStream file = new FileStream(filename, FileMode.Open, FileAccess.Read))
{
bytes = new byte[file.Length];
file.Read(bytes, 0, (int)file.Length);
}
return bytes;
}
Then, to convert it to bits:
byte[] bytes = GetBinaryFile("filename.bin");
BitArray bits = new BitArray(bytes);
Now the bits variable holds the 0s and 1s you wanted.
Or you can just do this:
private BitArray GetFileBits(string filename)
{
byte[] bytes;
using (FileStream file = new FileStream(filename, FileMode.Open, FileAccess.Read))
{
bytes = new byte[file.Length];
file.Read(bytes, 0, (int)file.Length);
}
return new BitArray(bytes);
}
Or even shorter code could be:
private BitArray GetFileBits(string filename)
{
byte[] bytes = File.ReadAllBytes(filename);
return new BitArray(bytes);
}
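One detail to keep in mind when turning the BitArray into tones: BitArray stores the least significant bit of each byte first, so index 0 is bit 0 of the first byte. If the receiver expects the conventional most-significant-bit-first order, a string of '0' and '1' characters can be built per byte instead. A minimal sketch, with hypothetical names (BitText, ToBitString, FromBitString) and assuming the bit string length is a multiple of 8:

```csharp
using System;
using System.Text;

static class BitText
{
    // Renders each byte as eight '0'/'1' characters, most significant bit first.
    public static string ToBitString(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length * 8);
        foreach (byte b in bytes)
            sb.Append(Convert.ToString(b, 2).PadLeft(8, '0'));
        return sb.ToString();
    }

    // Inverse of ToBitString: every group of eight characters becomes one byte.
    public static byte[] FromBitString(string bits)
    {
        var bytes = new byte[bits.Length / 8];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(bits.Substring(i * 8, 8), 2);
        return bytes;
    }
}
```

The sending side would call ToBitString(File.ReadAllBytes(filename)), and the receiving side would pass the decoded bit string to FromBitString and write the result with File.WriteAllBytes.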
I'm trying to differentiate between "text files" and "binary" files, as I would effectively like to ignore files with "unreadable" contents.
I have a file that I believe is a GZIP archive. I'm trying to ignore this kind of file by detecting the magic numbers / file signature. If I open the file with the Hex editor plugin in Notepad++ I can see the first three hex codes are 1f 8b 08.
However, if I read the file using a StreamReader, I'm not sure how to get to the original bytes.
using (var streamReader = new StreamReader(@"C:\file"))
{
char[] buffer = new char[10];
streamReader.Read(buffer, 0, 10);
var s = new String(buffer);
byte[] bytes = new byte[6];
System.Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, 6);
var hex = BitConverter.ToString(bytes);
var otherhex = BitConverter.ToString(System.Text.Encoding.UTF8.GetBytes(s.ToCharArray()));
}
At the end of the using statement I have the following variable values:
hex: "1F-00-FD-FF-08-00"
otherhex: "1F-EF-BF-BD-08-00-EF-BF-BD-EF-BF-BD-0A-51-02-03"
Neither of which start with the hex values shown in Notepad++.
Is it possible to get the original bytes from the result of reading a file via StreamReader?
Your code tries to turn a binary buffer into a string. Strings in .NET are Unicode (UTF-16), so each char takes two bytes, and bytes that are not valid in the encoding the reader assumes get replaced. The result is unpredictable, as you can see.
Just use a BinaryReader and its ReadBytes method
using(FileStream fs = new FileStream(@"C:\file", FileMode.Open, FileAccess.Read))
{
using (var reader = new BinaryReader(fs, new ASCIIEncoding()))
{
byte[] buffer = reader.ReadBytes(10); // may return fewer bytes if the file is shorter
if (buffer.Length >= 3 && buffer[0] == 31 && buffer[1] == 139 && buffer[2] == 8)
{
// you have a signature match (0x1F 0x8B 0x08)....
}
}
}
Usage (for a pdf file):
Assert.AreEqual("25504446", GetMagicNumbers(filePath, 4));
Method GetMagicNumbers:
private static string GetMagicNumbers(string filepath, int bytesCount)
{
// https://en.wikipedia.org/wiki/List_of_file_signatures
byte[] buffer;
using (var fs = new FileStream(filepath, FileMode.Open, FileAccess.Read))
using (var reader = new BinaryReader(fs))
buffer = reader.ReadBytes(bytesCount);
var hex = BitConverter.ToString(buffer);
return hex.Replace("-", String.Empty).ToLower();
}
You can't. StreamReader is made to read text, not binary. Use the Stream directly to read bytes. In your case FileStream.
To guess whether a file is text or binary you could read the first 4K into a byte[] and interpret that.
Btw, you tried to force chars into bytes. This is invalid by principle. I suggest you familiarize yourself with what an Encoding is: it is the only way to convert between chars and bytes in a semantically correct way.
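As a rough illustration of reading the first 4K and interpreting it, the sketch below treats a file as binary if the sample contains a NUL byte. That rule is an assumption chosen for the example, not a definitive test: it would misclassify UTF-16 text, for instance. The names FileSniffer and LooksBinary are hypothetical.

```csharp
using System;
using System.IO;

static class FileSniffer
{
    // Heuristic: reads up to sampleSize bytes and reports "binary" if any of them is 0.
    // Assumption: plain text in ASCII, UTF-8 or an 8-bit code page does not contain NUL bytes.
    public static bool LooksBinary(string path, int sampleSize = 4096)
    {
        var buffer = new byte[sampleSize];
        int total = 0;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int n;
            // Read may return fewer bytes than requested, so loop until the sample is full or the file ends.
            while (total < buffer.Length &&
                   (n = fs.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += n;
            }
        }

        for (int i = 0; i < total; i++)
        {
            if (buffer[i] == 0) return true;
        }
        return false;
    }
}
```

A signature check like the GZIP one above can be combined with this: known magic numbers first, then the heuristic for everything else.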