I need to open a file, read each line and, if that line meets some conditions, write it to another file. The lines I'm reading are normal ASCII strings representing HEX values, and I need to write them to the new file as HEX values, not as ASCII strings.
What I have is this:
private void Button_Click(object sender, RoutedEventArgs e)
{
    byte[] arrayByte = { 0x00 };
    var linesToKeep = File.ReadLines(fileName).Where(l => l.Contains(":10"));
    foreach (string line in linesToKeep)
    {
        string partialA = line.Substring(9);
        string partialB = partialA.Remove(partialA.Length - 2);
        arrayByte = ToByteArray(partialB);
        using (var stream = new FileStream(fileName + "_gugggu", FileMode.OpenOrCreate))
        {
            FileInfo file = new FileInfo(fileName + "_gugggu");
            stream.Position = file.Length;
            stream.Write(arrayByte, 0, arrayByte.Length);
        }
    }
}
public static byte[] ToByteArray(String HexString)
{
    int NumberChars = HexString.Length;
    byte[] bytes = new byte[NumberChars / 2];
    for (int i = 0; i < NumberChars; i += 2)
    {
        bytes[i / 2] = Convert.ToByte(HexString.Substring(i, 2), 16);
    }
    return bytes;
}
This method does what I need, but it takes ages to finish; the original files have roughly 70,000 lines. Is there a better way to do this in order to increase speed?
As stated in the comments, you keep opening and closing the destination file, once for each line in the source file, and this causes a huge and unnecessary overhead.
There's no need to close, re-open and re-position the stream each time you read a line from the source file.
I created a file with 316,240 lines. The code below reads the source, determines which lines to keep and writes them to the destination file. Since I didn't have access to your source file, I used a string-to-byte conversion mock-up.
Result: it completes in about a third of a second.
Code:
private Stopwatch sw = new Stopwatch();
private FileInfo _sourceFile = new FileInfo(@"D:\source.txt");
private FileInfo _destinationFile = new FileInfo(@"D:\destination.hex");

private void ConvertFile()
{
    sw.Start();
    using (var streamReader = new StreamReader(_sourceFile.OpenRead()))
    using (var streamWrite = _destinationFile.OpenWrite())
    {
        while (!streamReader.EndOfStream)
        {
            string line = streamReader.ReadLine();
            if (line.Contains(" x")) // mock-up filter; use ":10" for the real data
            {
                var array = ToByteArray(line);
                streamWrite.Write(array, 0, array.Length);
            }
        }
    }
    sw.Stop();
    MessageBox.Show("Done in " + sw.Elapsed);
}

private byte[] ToByteArray(string hexString)
{
    // mock-up conversion: maps each char to a byte; replace with real hex-pair parsing
    return hexString.ToList().ConvertAll(c => Convert.ToByte(c)).ToArray();
}

private void Button_Click(object sender, RoutedEventArgs e)
{
    ConvertFile();
}
HTH
I was searching the web but failed to find the correct example.
The goal is to have a function:
private void InsertLine(string source, string position, string content)
and have it write to the file using a StreamWriter, so that you do not need to read all the lines first, because the file is potentially huge.
My function so far:
private void InsertLine(string source, string position, string content)
{
    if (!File.Exists(source))
        throw new Exception(String.Format("Source:{0} does not exist", source));

    var pos = GetPosition(position);
    int line_number = 0;
    string line;
    using (var fs = File.Open(source, FileMode.Open, FileAccess.ReadWrite))
    {
        var destinationReader = new StreamReader(fs);
        var writer = new StreamWriter(fs);
        while ((line = destinationReader.ReadLine()) != null)
        {
            if (line_number == pos)
            {
                writer.WriteLine(content);
                break;
            }
            line_number++;
        }
    }
}
The function does not work; nothing happens to the file.
You can't just insert a line into a file. A file is a sequence of bytes.
You need to:
Write all of the preceding lines
Write the line to be inserted
Write all of the following lines
Here's some untested code based upon yours:
private void InsertLine(string source, string position, string content)
{
    if (!File.Exists(source))
        throw new Exception(String.Format("Source:{0} does not exist", source));

    // I don't know what all of this is for....
    var pos = GetPosition(position);
    int line_number = 0;
    string line;
    using (var fs = File.Open(source, FileMode.Open, FileAccess.ReadWrite))
    {
        var destinationReader = new StreamReader(fs);
        var writer = new StreamWriter(fs);
        while ((line = destinationReader.ReadLine()) != null)
        {
            writer.WriteLine(line); // ADDED: You need to write every original line
            if (line_number == pos)
            {
                writer.WriteLine(content);
                // REMOVED the break; here. You need to write all following lines.
            }
            line_number++; // MOVED this out of the if {}. Always count lines.
        }
    }
}
This probably won't work as expected, however. You're trying to write to the same file you're reading. You should open a new (temporary) file, perform the copy + insert, and then move/rename the temporary file to replace the original file.
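A minimal sketch of that temp-file approach (the zero-based insertAt parameter and the method name are illustrative, not from the question):
private void InsertLineViaTempFile(string source, int insertAt, string content)
{
    string temp = Path.GetTempFileName();
    using (var reader = new StreamReader(source))
    using (var writer = new StreamWriter(temp))
    {
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            if (lineNumber == insertAt)
                writer.WriteLine(content); // insert before the original line
            writer.WriteLine(line);
            lineNumber++;
        }
    }
    // swap the temporary file in for the original
    File.Delete(source);
    File.Move(temp, source);
}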
I have a file that I need to load into an array. I am trying to convert a text file to an integer array using StreamReader. I am just unsure what to put in the for loop at the end of the code.
This is what I have so far:
//Global Variables
int[] Original;

//Load File
private void mnuLoad_Click_1(object sender, EventArgs e)
{
    //code to load the numbers from a file
    OpenFileDialog fd = new OpenFileDialog();
    //open the file dialog and check if a file was selected
    if (fd.ShowDialog() == DialogResult.OK)
    {
        //open file to read
        StreamReader sr = new StreamReader(fd.OpenFile());
        int Records = int.Parse(sr.ReadLine());
        //Assign Array Sizes
        Original = new int[Records];
        int[] OriginalArray;
        for (int i = 0; i < Records; i++)
        {
            //add code here
        }
    }
}
The .txt file is:
5
6
7
9
10
2
PS I am a beginner, so my coding skills are very basic!
UPDATE: I have previous experience using Line.Split and then mapping the file to arrays, but obviously that does not apply here, so what do I do now?
//as continued for above code
for (int i = 0; i < Records; i++)
{
    int Line = int.Parse(sr.ReadLine());
    OriginalArray = int.Parse(Line.Split(';')); //get error here
    Original[i] = OriginalArray[0];
}
You should just be able to use similar code to what you had above it:
Original[i] = Convert.ToInt32(sr.ReadLine());
Every time sr.ReadLine() is called it advances the reader by one line, hence iterating through the numbers in the text file.
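Dropped into the question's for loop, that becomes:
for (int i = 0; i < Records; i++)
{
    Original[i] = Convert.ToInt32(sr.ReadLine());
}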
Try this:
OpenFileDialog fd = new OpenFileDialog();
if (fd.ShowDialog() == DialogResult.OK)
{
    using (StreamReader reader = new StreamReader(fd.OpenFile()))
    {
        var list = new List<int>();
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            int value = 0;
            if (!string.IsNullOrWhiteSpace(line) && int.TryParse(line, out value))
                list.Add(value);
        }
        MessageBox.Show(list.Aggregate("", (x, y) => (string.IsNullOrWhiteSpace(x) ? "" : x + ", ") + y.ToString()));
    }
}
You can read the entire file into a string array, then parse (checking the integrity of each one).
Something like:
int[] Original;

//Load File
private void mnuLoad_Click_1(object sender, EventArgs e)
{
    //code to load the numbers from a file
    var fd = new OpenFileDialog();
    //open the file dialog and check if a file was selected
    if (fd.ShowDialog() == DialogResult.OK)
    {
        var file = fd.FileName;
        try
        {
            var ints = new List<int>();
            var data = File.ReadAllLines(file);
            foreach (var datum in data)
            {
                int value;
                if (Int32.TryParse(datum, out value))
                {
                    ints.Add(value);
                }
            }
            Original = ints.ToArray();
        }
        catch (IOException)
        {
            // blah, error
        }
    }
}
Another way to do it, if you'd like to read to the end of the file and you don't know how long it is, is with a while loop:
String line = String.Empty;
int i = 0;
while ((line = sr.ReadLine()) != null)
{
    yourArray[i] = Convert.ToInt32(line);
    i++;
    //but if you only want to write to the file w/o any other operation
    //you could just write w/o conversion or storing into an array
    sw.WriteLine(line);
    //or sw.Write(line + " "); if you'd like to have it in one row
}
//using Array.ConvertAll with a lambda converter
string[] records = File.ReadAllLines(file);
int[] unsorted = Array.ConvertAll<string, int>(records, new Converter<string, int>(i => int.Parse(i)));
I have a txt file with data such as the following:
:10FF800000040B4E00040B4E00047D1400047D148D
:10FF900000040B4E0004CF6200040B4E00040B4E15
:10FFA00000040B4E00040B4E00040B4E00040B4EDD
:10FFB00000047D1400047D1400047D1400047D14ED
:10FFC00000040B4E000000000000000000000000D4
:10FFD0000000000000040B4E0000000000000000C4
:10FFE0000000000000000000000000000000000011
:10FFF0000000000000000000060000000000BFF844
:020000020000FC
:020000040014E6
:043FF0005AC8A58C7A
:00000001FF
What I want to do with my C# program is add a line after or before a specific line; let's say add the line:
:020000098723060
before this line:
:020000020000FC
I have tried using File.ReadLines("file.txt").Last();, but that just gives me the last line. What if I want the third or fourth? Also, is there any way to identify the ":" in the file?
The simplest way - if you're happy to read the whole file into memory - would be just:
public void InsertLineBefore(string file, string lineToFind, string lineToInsert)
{
    List<string> lines = File.ReadLines(file).ToList();
    int index = lines.IndexOf(lineToFind);
    // TODO: Validation (if index is -1, we couldn't find it)
    lines.Insert(index, lineToInsert);
    File.WriteAllLines(file, lines);
}

public void InsertLineAfter(string file, string lineToFind, string lineToInsert)
{
    List<string> lines = File.ReadLines(file).ToList();
    int index = lines.IndexOf(lineToFind);
    // TODO: Validation (if index is -1, we couldn't find it)
    lines.Insert(index + 1, lineToInsert);
    File.WriteAllLines(file, lines);
}
There are significantly more efficient ways of doing this, but this approach is really simple.
A brute force approach:
string[] lines = File.ReadAllLines("file.txt");
using (StreamWriter sw = new StreamWriter("file.txt"))
{
    foreach (string line in lines)
    {
        if (line == ":020000020000FC")
            sw.WriteLine(":020000098723060");
        sw.WriteLine(line);
    }
}
I would say it is better to read and write line by line, especially if the target file tends to be large:
using (StreamReader r = new StreamReader("Test.txt"))
using (StreamWriter w = new StreamWriter("TestOut.txt"))
{
    while (!r.EndOfStream)
    {
        string line = r.ReadLine();
        w.WriteLine(line);
        if (line == ":020000020000FC")
            w.WriteLine(":020000098723060");
    }
}
Not sure if you're trying to avoid reading the entire file in due to size, etc., but can't you just read the file and then replace? E.g. (note that the search string must match the file's actual line ending, which is \r\n on Windows):
var text = File.ReadAllText(somePath);
File.WriteAllText(somePath, text.Replace(":020000020000FC\n", ":020000098723060\n:020000020000FC\n"));
Here is a solution; it may not be the best, but it does work:
public void AddTextToFile(string filePath, int lineNumber, string txt) //zero-based lineNumber
{
    Collection<string> newLines = new Collection<string>(File.ReadAllLines(filePath).ToList());
    if (lineNumber < newLines.Count)
        newLines.Insert(lineNumber, txt);
    else
        newLines.Add(txt);
    using (StreamWriter writer = new StreamWriter(filePath, false))
    {
        foreach (string s in newLines)
            writer.WriteLine(s);
    }
}
And to answer your question about determining whether ":" exists in a string: yes, in the example above you could check whether a line contains it by...
if(newLines[idx].Contains(':'))
//do something
The ":" character doesn't really help the implementation, the lines are all newline-delimited already.
Here's an attempt at a method that doesn't load it all to memory or output to a different file.
Never cross the streams.
static Int32 GetCharPos(StreamReader s)
{
    // Uses reflection to read the StreamReader's private buffer fields, so we can
    // compute the true byte position of the next unread character.
    var ia = BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField;
    Int32 charpos = (Int32)s.GetType().InvokeMember("charPos", ia, null, s, null);
    Int32 charlen = (Int32)s.GetType().InvokeMember("charLen", ia, null, s, null);
    return (Int32)s.BaseStream.Position - charlen + charpos;
}

// dataPath is assumed to be a field holding the target file's path.
static void Appsert(string data, string precedingEntry = null)
{
    if (precedingEntry == null)
    {
        using (var filestream = new FileStream(dataPath, FileMode.Append))
        using (var tw = new StreamWriter(filestream))
        {
            tw.WriteLine(data);
            return;
        }
    }

    int seekPos = -1;
    using (var readstream = new FileStream(dataPath,
        FileMode.Open, FileAccess.Read, FileShare.Write))
    using (var writestream = new FileStream(dataPath,
        FileMode.Open, FileAccess.Write, FileShare.Read))
    using (var tr = new StreamReader(readstream))
    {
        while (seekPos == -1)
        {
            var line = tr.ReadLine();
            if (line == precedingEntry)
                seekPos = GetCharPos(tr);
            else if (tr.EndOfStream)
                seekPos = (int)readstream.Length;
        }
        writestream.Seek(seekPos, SeekOrigin.Begin);
        readstream.Seek(seekPos, SeekOrigin.Begin);

        int readLength = 0;
        var readBuffer = new byte[4096];
        var writeBuffer = new byte[4096];
        var writeData = tr.CurrentEncoding.GetBytes(data + Environment.NewLine);
        int writeLength = writeData.Length;
        writeData.CopyTo(writeBuffer, 0);

        // Leapfrog: read the next chunk, write out the previous one, swap buffers.
        while (writeLength > 0)
        {
            readLength = readstream.Read(readBuffer, 0, readBuffer.Length);
            writestream.Write(writeBuffer, 0, writeLength);
            var tmp = writeBuffer;
            writeBuffer = readBuffer;
            writeLength = readLength;
            readBuffer = tmp;
        }
    }
}
Code:
static void MultipleFilesToSingleFile(string dirPath, string filePattern, string destFile)
{
    string[] fileAry = Directory.GetFiles(dirPath, filePattern);
    Console.WriteLine("Total File Count : " + fileAry.Length);
    using (TextWriter tw = new StreamWriter(destFile, true))
    {
        foreach (string filePath in fileAry)
        {
            using (TextReader tr = new StreamReader(filePath))
            {
                tw.WriteLine(tr.ReadToEnd());
                tr.Close();
                tr.Dispose();
            }
            Console.WriteLine("File Processed : " + filePath);
        }
        tw.Close();
        tw.Dispose();
    }
}
I need to optimize this as it's extremely slow: it takes 3 minutes for 45 XML files of average size 40-50 MB.
Please note: 45 files averaging 45 MB is just one example; it can be n files of size m, where n is in the thousands and m can average 128 KB. In short, it can vary.
Could you please provide any views on optimization?
General answer
Why not just use the Stream.CopyTo(Stream destination) method?
private static void CombineMultipleFilesIntoSingleFile(string inputDirectoryPath, string inputFileNamePattern, string outputFilePath)
{
    string[] inputFilePaths = Directory.GetFiles(inputDirectoryPath, inputFileNamePattern);
    Console.WriteLine("Number of files: {0}.", inputFilePaths.Length);
    using (var outputStream = File.Create(outputFilePath))
    {
        foreach (var inputFilePath in inputFilePaths)
        {
            using (var inputStream = File.OpenRead(inputFilePath))
            {
                // Buffer size can be passed as the second argument.
                inputStream.CopyTo(outputStream);
            }
            Console.WriteLine("The file {0} has been processed.", inputFilePath);
        }
    }
}
Buffer size adjustment
Please note that the mentioned method is overloaded; there are two overloads:
CopyTo(Stream destination).
CopyTo(Stream destination, int bufferSize).
The second method overload provides the buffer size adjustment through the bufferSize parameter.
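For example, reusing the streams from the snippet above, an explicit buffer can be passed like this (81920 bytes is the documented default; larger values sometimes help on big sequential copies):
inputStream.CopyTo(outputStream, 81920);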
One option is to utilize the copy command, and let it do what it does well.
Something like:
static void MultipleFilesToSingleFile(string dirPath, string filePattern, string destFile)
{
    var cmd = new ProcessStartInfo("cmd.exe",
        String.Format("/c copy {0} {1}", filePattern, destFile));
    cmd.WorkingDirectory = dirPath;
    cmd.UseShellExecute = false;
    Process.Start(cmd);
}
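One caveat: if the files are binary rather than text, copy's /b switch keeps it from stopping at a Ctrl-Z byte, e.g.:
String.Format("/c copy /b {0} {1}", filePattern, destFile)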
I would use a BlockingCollection to read, so you can read and write concurrently.
Clearly you should write to a separate physical disk to avoid hardware contention.
This code will preserve order.
Reading is going to be faster than writing, so there is no need for parallel reads.
Again, since reading is faster, limit the size of the collection so reading does not get further ahead of writing than it needs to.
A simple task that reads the next file in parallel while writing the current one has the problem of differing file sizes: writing a small file is faster than reading a big one.
I use this pattern to read and parse text on T1 and then insert into SQL on T2.
public void WriteFiles()
{
    using (BlockingCollection<string> bc = new BlockingCollection<string>(10))
    {
        // play with 10 if you have several small files then a big file
        // write can get ahead of read if not enough are queued
        TextWriter tw = new StreamWriter(@"c:\temp\alltext.text", true);
        // clearly you want to write to a different physical disk
        // ideally write to solid state even if you move the files to regular disk when done

        // Spin up a Task to populate the BlockingCollection
        using (Task t1 = Task.Factory.StartNew(() =>
        {
            string dir = @"c:\temp\";
            string fileText;
            int minSize = 100000; // play with this
            StringBuilder sb = new StringBuilder(minSize);
            string[] fileAry = Directory.GetFiles(dir, @"*.txt");
            foreach (string fi in fileAry)
            {
                Debug.WriteLine("Add " + fi);
                fileText = File.ReadAllText(fi);
                //bc.Add(fi); for testing just add filepath
                if (fileText.Length > minSize)
                {
                    if (sb.Length > 0)
                    {
                        bc.Add(sb.ToString());
                        sb.Clear();
                    }
                    bc.Add(fileText); // could be really big so don't hit sb
                }
                else
                {
                    sb.Append(fileText);
                    if (sb.Length > minSize)
                    {
                        bc.Add(sb.ToString());
                        sb.Clear();
                    }
                }
            }
            if (sb.Length > 0)
            {
                bc.Add(sb.ToString());
                sb.Clear();
            }
            bc.CompleteAdding();
        }))
        {
            // Spin up a Task to consume the BlockingCollection
            using (Task t2 = Task.Factory.StartNew(() =>
            {
                string text;
                try
                {
                    while (true)
                    {
                        text = bc.Take();
                        Debug.WriteLine("Take " + text);
                        tw.WriteLine(text);
                    }
                }
                catch (InvalidOperationException)
                {
                    // An InvalidOperationException means that Take() was called on a completed collection
                    Debug.WriteLine("That's All!");
                    tw.Close();
                    tw.Dispose();
                }
            }))
                Task.WaitAll(t1, t2);
        }
    }
}
BlockingCollection Class
I tried the solution posted by sergey-brunov for merging a 2 GB file. The system took around 2 GB of RAM for the job. I have made some changes for more optimization, and it now takes 350 MB of RAM to merge a 2 GB file.
private static void CombineMultipleFilesIntoSingleFile(string inputDirectoryPath, string inputFileNamePattern, string outputFilePath)
{
    string[] inputFilePaths = Directory.GetFiles(inputDirectoryPath, inputFileNamePattern);
    Console.WriteLine("Number of files: {0}.", inputFilePaths.Length);
    foreach (var inputFilePath in inputFilePaths)
    {
        using (var outputStream = File.AppendText(outputFilePath))
        {
            outputStream.WriteLine(File.ReadAllText(inputFilePath));
            Console.WriteLine("The file {0} has been processed.", inputFilePath);
        }
    }
}
Several things you can do:
In my experience the default buffer sizes can be increased with noticeable benefit up to about 120 KB; I suspect setting a large buffer on all streams will be the easiest and most noticeable performance booster:
new System.IO.FileStream("File.txt", System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read, 150000);
Use the Stream class, not the StreamReader class.
Read contents into a large buffer and dump them into the output stream in one go; this will speed up operations on small files.
There's no need for the redundant close/dispose: you have the using statement.
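A minimal sketch combining those points (the paths and the exact buffer size are illustrative):
const int bufferSize = 150000;
using (var input = new FileStream(@"D:\File.txt", FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize))
using (var output = new FileStream(@"D:\Out.txt", FileMode.Create, FileAccess.Write, FileShare.None, bufferSize))
{
    var buffer = new byte[bufferSize];
    int bytesRead;
    // read a large chunk, then dump it into the output stream at once
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        output.Write(buffer, 0, bytesRead);
}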
// Binary File Copy
public static void mergeFiles(string strFileIn1, string strFileIn2, string strFileOut, out string strError)
{
    strError = String.Empty;
    try
    {
        using (FileStream streamIn1 = File.OpenRead(strFileIn1))
        using (FileStream streamIn2 = File.OpenRead(strFileIn2))
        using (FileStream writeStream = File.OpenWrite(strFileOut))
        {
            // create a buffer to hold the bytes; might be bigger
            byte[] buffer = new Byte[1024];
            int bytesRead;

            // while the read method returns bytes, keep writing them to the output stream
            while ((bytesRead = streamIn1.Read(buffer, 0, 1024)) > 0)
            {
                writeStream.Write(buffer, 0, bytesRead);
            }
            while ((bytesRead = streamIn2.Read(buffer, 0, 1024)) > 0)
            {
                writeStream.Write(buffer, 0, bytesRead);
            }
        }
    }
    catch (Exception ex)
    {
        strError = ex.Message;
    }
}
I have this code that compares two text files and writes the differences to a log file, but for some reason the log.txt file is sometimes blank, even when I test with some lines starting with a *; these are not always written either. Do I have to save the text file when finished writing? Although that would not explain why it sometimes works. Any help would be great.
private void compare()
{
    string FilePath = @"c:\snapshot\1.txt";
    string Filepath2 = @"c:\snapshot\2.txt";
    int counter = 0;
    string line;
    string line2;
    var dir = "c:\\snapshot\\log.txt";
    using (FileStream fs = File.Create(dir))
    {
        fs.Dispose();
    }
    StreamWriter dest = new StreamWriter(dir);
    if (File.Exists(FilePath) & File.Exists(Filepath2))
    {
        // Read the file and display it line by line.
        using (var file = File.OpenText(FilePath))
        using (var file2 = File.OpenText(Filepath2))
        {
            while ((line = file.ReadLine()) != null & (line2 = file2.ReadLine()) != null)
            {
                if (line.Contains("*"))
                {
                    dest.WriteLine(line2);
                }
                else if (!line.Contains(line2))
                {
                    dest.WriteLine(line2);
                }
                counter++;
            }
        }
    }
    dest.Close();
}
Everything left in the buffer should be written out once you hit the Close statement on your StreamWriter. If you are missing output, it might be that you aren't reaching that line for some reason (i.e. you crash). Also, if you are trying to look at the file while it's being written (i.e. while the program is still running), you won't necessarily see everything, since the writer hasn't been closed yet.
Generally, it's better to use a using statement with the StreamWriter; that ensures it always gets closed.
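A minimal illustration of that advice, reusing the log path from the question:
using (var dest = new StreamWriter(@"c:\snapshot\log.txt"))
{
    // ... compare the files and call dest.WriteLine(...) as before ...
} // Dispose() flushes and closes the writer, even if an exception is thrown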
private void compare()
{
    string FileName1 = @"c:\snapshot\1.txt";
    string FileName2 = @"c:\snapshot\2.txt";
    string FileNameOutput = @"c:\snapshot\log.txt"; // the question's dir
    string line1, line2;
    int counter = 0; // um, what's this for? You aren't using it.
    using (FileStream fso = new FileStream(FileNameOutput, FileMode.Create, FileAccess.Write))
    {
        TextWriter dest = new StreamWriter(fso);
        using (FileStream fs1 = new FileStream(FileName1, FileMode.Open, FileAccess.Read))
        using (FileStream fs2 = new FileStream(FileName2, FileMode.Open, FileAccess.Read))
        {
            TextReader firstFile = new StreamReader(fs1);
            TextReader secondFile = new StreamReader(fs2);
            while ((line1 = firstFile.ReadLine()) != null & (line2 = secondFile.ReadLine()) != null)
            {
                if (line1.Contains("*") || !line1.Contains(line2))
                {
                    dest.Write(line2); // WriteLine would give you an extra line?
                }
                counter++;
            }
        }
        dest.Flush(); // flush the writer itself, not just the underlying stream
    }
}
I commend the overloads of FileStream to you. Done this way, the code will fail at the point you instantiate the stream if the user running it doesn't have all the required permissions. It's a nice way of showing what you intend, and what you don't.
P.S. You do know Contains is case-sensitive?
I'm not sure I'm understanding your comparison logic correctly, but I have separated the comparison from the rest of the code, so you can adjust it to your own needs:
public static void WriteDifferences(string sourcePath, string destinationPath, string differencesPath)
{
    var sourceLines = File.ReadAllLines(sourcePath).ToList();
    var destinationLines = File.ReadAllLines(destinationPath).ToList();
    // make lists equal size
    if (sourceLines.Count > destinationLines.Count)
    {
        destinationLines.AddRange(Enumerable.Range(0, sourceLines.Count - destinationLines.Count).Select(x => (string)null));
    }
    else
    {
        sourceLines.AddRange(Enumerable.Range(0, destinationLines.Count - sourceLines.Count).Select(x => (string)null));
    }
    var differences = sourceLines.Zip(destinationLines, (source, destination) => Compare(source, destination));
    File.WriteAllLines(differencesPath, differences.Where(x => x != null));
}
private static string Compare(string source, string destination)
{
    // guard against the null padding added above
    if (source == null) return destination;
    if (destination == null) return null;
    return !source.Contains(destination) || source.Contains("*") ? destination : null;
}
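Called with the paths from the question, usage looks like:
WriteDifferences(@"c:\snapshot\1.txt", @"c:\snapshot\2.txt", @"c:\snapshot\log.txt");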