Combine multiple files into single file - c#

Code:
static void MultipleFilesToSingleFile(string dirPath, string filePattern, string destFile)
{
    string[] fileAry = Directory.GetFiles(dirPath, filePattern);
    Console.WriteLine("Total File Count : " + fileAry.Length);
    using (TextWriter tw = new StreamWriter(destFile, true))
    {
        foreach (string filePath in fileAry)
        {
            using (TextReader tr = new StreamReader(filePath))
            {
                tw.WriteLine(tr.ReadToEnd());
                tr.Close();
                tr.Dispose();
            }
            Console.WriteLine("File Processed : " + filePath);
        }
        tw.Close();
        tw.Dispose();
    }
}
I need to optimize this as it's extremely slow: it takes 3 minutes to combine 45 XML files averaging 40-50 MB each.
Please note: 45 files of an average 45 MB is just one example; it can be n files of m size, where n is in the thousands and m can average 128 KB. In short, it varies.
Could you please provide any views on optimization?

General answer
Why not just use the Stream.CopyTo(Stream destination) method?
private static void CombineMultipleFilesIntoSingleFile(string inputDirectoryPath, string inputFileNamePattern, string outputFilePath)
{
    string[] inputFilePaths = Directory.GetFiles(inputDirectoryPath, inputFileNamePattern);
    Console.WriteLine("Number of files: {0}.", inputFilePaths.Length);
    using (var outputStream = File.Create(outputFilePath))
    {
        foreach (var inputFilePath in inputFilePaths)
        {
            using (var inputStream = File.OpenRead(inputFilePath))
            {
                // Buffer size can be passed as the second argument.
                inputStream.CopyTo(outputStream);
            }
            Console.WriteLine("The file {0} has been processed.", inputFilePath);
        }
    }
}
Buffer size adjustment
Please note that the mentioned method is overloaded. There are two overloads:
CopyTo(Stream destination)
CopyTo(Stream destination, int bufferSize)
The second overload lets you adjust the buffer size through the bufferSize parameter.
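For example, here is a sketch of the same copy loop with an explicit buffer size (the CopyTo default is 81920 bytes; the 1 MB value below is just something to experiment with, not a recommendation):
using (var outputStream = File.Create(outputFilePath))
{
    foreach (var inputFilePath in inputFilePaths)
    {
        using (var inputStream = File.OpenRead(inputFilePath))
        {
            // 1 MB buffer; measure against your own files and disks
            inputStream.CopyTo(outputStream, 1024 * 1024);
        }
    }
}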

One option is to utilize the copy command, and let it do what it does well.
Something like:
static void MultipleFilesToSingleFile(string dirPath, string filePattern, string destFile)
{
    var cmd = new ProcessStartInfo("cmd.exe",
        String.Format("/c copy {0} {1}", filePattern, destFile));
    cmd.WorkingDirectory = dirPath;
    cmd.UseShellExecute = false;
    Process.Start(cmd);
}
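One caveat: Process.Start returns immediately, so the merged file may not be complete when the method returns. A sketch that waits for the copy to finish (the /b switch, forcing a binary copy, is an assumption about your file content):
static void MultipleFilesToSingleFile(string dirPath, string filePattern, string destFile)
{
    var cmd = new ProcessStartInfo("cmd.exe",
        String.Format("/c copy /b {0} {1}", filePattern, destFile));
    cmd.WorkingDirectory = dirPath;
    cmd.UseShellExecute = false;
    using (var process = Process.Start(cmd))
    {
        process.WaitForExit(); // block until cmd.exe finishes the copy
    }
}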

I would use a BlockingCollection so you can read and write concurrently.
Ideally, write to a separate physical disk to avoid hardware contention.
This code preserves order.
Reading will be faster than writing, so there is no need for parallel reads.
Since reading is faster, bound the size of the collection so the reader does not get further ahead of the writer than it needs to.
A simple "read the next file in parallel while writing the current one" approach runs into the problem of varying file sizes: writing a small file is faster than reading a big one.
I use this pattern to read and parse text on T1 and then insert to SQL on T2.
public void WriteFiles()
{
    using (BlockingCollection<string> bc = new BlockingCollection<string>(10))
    {
        // play with 10 if you have several small files then a big file
        // write can get ahead of read if not enough are queued
        TextWriter tw = new StreamWriter(@"c:\temp\alltext.text", true);
        // clearly you want to write to a different physical disk
        // ideally write to solid state even if you move the files to regular disk when done
        // Spin up a Task to populate the BlockingCollection
        using (Task t1 = Task.Factory.StartNew(() =>
        {
            string dir = @"c:\temp\";
            string fileText;
            int minSize = 100000; // play with this
            StringBuilder sb = new StringBuilder(minSize);
            string[] fileAry = Directory.GetFiles(dir, "*.txt");
            foreach (string fi in fileAry)
            {
                Debug.WriteLine("Add " + fi);
                fileText = File.ReadAllText(fi);
                //bc.Add(fi); for testing just add filepath
                if (fileText.Length > minSize)
                {
                    if (sb.Length > 0)
                    {
                        bc.Add(sb.ToString());
                        sb.Clear();
                    }
                    bc.Add(fileText); // could be really big so don't hit sb
                }
                else
                {
                    sb.Append(fileText);
                    if (sb.Length > minSize)
                    {
                        bc.Add(sb.ToString());
                        sb.Clear();
                    }
                }
            }
            if (sb.Length > 0)
            {
                bc.Add(sb.ToString());
                sb.Clear();
            }
            bc.CompleteAdding();
        }))
        {
            // Spin up a Task to consume the BlockingCollection
            using (Task t2 = Task.Factory.StartNew(() =>
            {
                string text;
                try
                {
                    while (true)
                    {
                        text = bc.Take();
                        Debug.WriteLine("Take " + text);
                        tw.WriteLine(text);
                    }
                }
                catch (InvalidOperationException)
                {
                    // An InvalidOperationException means that Take() was called on a completed collection
                    Debug.WriteLine("That's All!");
                    tw.Close();
                    tw.Dispose();
                }
            }))
            {
                Task.WaitAll(t1, t2);
            }
        }
    }
}
BlockingCollection Class
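As a side note, GetConsumingEnumerable gives a cleaner consumer loop than catching InvalidOperationException from Take(). A minimal sketch of the same producer/consumer shape, reusing the paths from the code above:
using (var bc = new BlockingCollection<string>(10))
{
    // producer: read each file and queue its text
    Task producer = Task.Factory.StartNew(() =>
    {
        foreach (string fi in Directory.GetFiles(@"c:\temp\", "*.txt"))
            bc.Add(File.ReadAllText(fi));
        bc.CompleteAdding(); // signals the consumer that no more items are coming
    });
    // consumer: the foreach ends normally once CompleteAdding has been called
    Task consumer = Task.Factory.StartNew(() =>
    {
        using (var tw = new StreamWriter(@"c:\temp\alltext.text", true))
            foreach (string text in bc.GetConsumingEnumerable())
                tw.WriteLine(text);
    });
    Task.WaitAll(producer, consumer);
}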

Tried the solution posted by sergey-brunov for merging a 2 GB file. The system took around 2 GB of RAM for the job. I have made some changes for further optimization, and it now takes 350 MB of RAM to merge a 2 GB file.
private static void CombineMultipleFilesIntoSingleFile(string inputDirectoryPath, string inputFileNamePattern, string outputFilePath)
{
    string[] inputFilePaths = Directory.GetFiles(inputDirectoryPath, inputFileNamePattern);
    Console.WriteLine("Number of files: {0}.", inputFilePaths.Length);
    foreach (var inputFilePath in inputFilePaths)
    {
        using (var outputStream = File.AppendText(outputFilePath))
        {
            outputStream.WriteLine(File.ReadAllText(inputFilePath));
            Console.WriteLine("The file {0} has been processed.", inputFilePath);
        }
    }
}

Several things you can do:
In my experience, the default buffer sizes can be increased with noticeable benefit up to about 120K; I suspect setting a large buffer on all streams will be the easiest and most noticeable performance booster:
new System.IO.FileStream("File.txt", System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read, 150000);
Use the Stream class, not the StreamReader class.
Read contents into a large buffer and dump them into the output stream in one go; this will speed up small-file operations.
No need for the redundant close/dispose calls: you have the using statement.
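Putting those points together, a rough sketch (CombineFiles is a made-up name, and the 150000-byte buffer mirrors the figure above; tune it for your hardware):
private static void CombineFiles(string dirPath, string filePattern, string destFile)
{
    byte[] buffer = new byte[150000]; // large buffer, per the advice above
    using (var output = new FileStream(destFile, FileMode.Create, FileAccess.Write, FileShare.None, buffer.Length))
    {
        foreach (string path in Directory.GetFiles(dirPath, filePattern))
        {
            using (var input = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, buffer.Length))
            {
                int bytesRead;
                while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
                    output.Write(buffer, 0, bytesRead);
            }
            // no explicit Close/Dispose: the using blocks handle it
        }
    }
}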

// Binary File Copy
public static void mergeFiles(string strFileIn1, string strFileIn2, string strFileOut, out string strError)
{
    strError = String.Empty;
    try
    {
        using (FileStream streamIn1 = File.OpenRead(strFileIn1))
        using (FileStream streamIn2 = File.OpenRead(strFileIn2))
        using (FileStream writeStream = File.OpenWrite(strFileOut))
        {
            // reading directly from the input streams; no BinaryReader/BinaryWriter is needed
            // create a buffer to hold the bytes. Might be bigger.
            byte[] buffer = new Byte[1024];
            int bytesRead;
            // while the read method returns bytes keep writing them to the output stream
            while ((bytesRead = streamIn1.Read(buffer, 0, 1024)) > 0)
            {
                writeStream.Write(buffer, 0, bytesRead);
            }
            while ((bytesRead = streamIn2.Read(buffer, 0, 1024)) > 0)
            {
                writeStream.Write(buffer, 0, bytesRead);
            }
        }
    }
    catch (Exception ex)
    {
        strError = ex.Message;
    }
}
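Hypothetical usage (the paths are placeholders, not from the original post):
string error;
mergeFiles(@"c:\temp\part1.xml", @"c:\temp\part2.xml", @"c:\temp\merged.xml", out error);
if (error != String.Empty)
    Console.WriteLine("Merge failed: " + error);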

Related

Proper way to copy a file line by line to another file?

I need to open a file, read a line, and if that line meets some conditions, write it to another file. The line I'm reading is a normal ASCII string representing HEX values, and I need to write it to the new file as HEX values, not as an ASCII string.
What I have is this:
private void Button_Click(object sender, RoutedEventArgs e)
{
    byte[] arrayByte = { 0x00 };
    var linesToKeep = File.ReadLines(fileName).Where(l => l.Contains(":10"));
    foreach (string line in linesToKeep)
    {
        string partialA = line.Substring(9);
        string partialB = partialA.Remove(partialA.Length - 2);
        arrayByte = ToByteArray(partialB);
        using (var stream = new FileStream(fileName + "_gugggu", FileMode.OpenOrCreate))
        {
            FileInfo file = null;
            file = new FileInfo(fileName + "_gugggu");
            stream.Position = file.Length;
            stream.Write(arrayByte, 0, arrayByte.Length);
        }
    }
}
public static byte[] ToByteArray(String HexString)
{
    int NumberChars = HexString.Length;
    byte[] bytes = new byte[NumberChars / 2];
    for (int i = 0; i < NumberChars; i += 2)
    {
        bytes[i / 2] = Convert.ToByte(HexString.Substring(i, 2), 16);
    }
    return bytes;
}
This method does what I need, but it takes ages to finish; the original files have roughly 70000 lines... Is there a better way to do this in order to increase speed?
As stated in the comments, you keep opening and closing the destination file. You do this for each line in the source file, which causes huge and unnecessary overhead.
There's no need to close, re-open and re-position the stream each time you read a line from the source file.
I created a file with 316240 lines. The code below reads it, determines which lines to keep, and then writes them to the destination file. Since I didn't have access to your source file, I used a string-to-byte conversion mock-up.
Result:
As you see it completes in 1/3 of a second.
Code:
private Stopwatch sw = new Stopwatch();
private FileInfo _sourceFile = new FileInfo(@"D:\source.txt");
private FileInfo _destinationFile = new FileInfo(@"D:\destination.hex");

private void ConvertFile()
{
    sw.Start();
    using (var streamReader = new StreamReader(_sourceFile.OpenRead()))
    {
        using (var streamWrite = _destinationFile.OpenWrite())
        {
            while (!streamReader.EndOfStream)
            {
                string line = streamReader.ReadLine();
                if (line.Contains(" x"))
                {
                    var array = ToByteArray(line);
                    streamWrite.Write(array, 0, array.Length);
                }
            }
        }
    }
    sw.Stop();
    MessageBox.Show("Done in " + sw.Elapsed);
}

private byte[] ToByteArray(string hexString)
{
    return hexString.ToList().ConvertAll(c => Convert.ToByte(c)).ToArray();
}

private void Button_Click(object sender, RoutedEventArgs e)
{
    ConvertFile();
}
HTH

Reading every single line from txt file C#

I want to read every single line from a txt file. Instead of every line, I get every second line. The question is why, and what can I do about it?
list.txt:
60001
60002
60003
60004
...every number on its own line, and so on for 100 lines
StreamReader podz = new StreamReader(@"D:\list.txt");
string pd = "";
int count = 0;
while ((pd = podz.ReadLine()) != null)
{
    Console.WriteLine("\n number: {0}", podz.ReadLine());
    count++;
}
Console.WriteLine("\n c {0}", count);
Console.ReadLine();
Because you're reading two lines per loop iteration:
while ((pd = podz.ReadLine()) != null) // here
{
    Console.WriteLine("\n number: {0}", podz.ReadLine()); // and here
    count++;
}
Instead, read the line only once and use the pd variable you stored it in:
while ((pd = podz.ReadLine()) != null)
{
    Console.WriteLine("\n number: {0}", pd);
    count++;
}
I suggest using File instead of Streams and Readers:
var lines = File.ReadLines(@"D:\list.txt");
int count = 0;
foreach (var line in lines)
{
    Console.WriteLine("\n number: {0}", line);
    count++;
}
Console.WriteLine("\n c {0}", count);
Console.ReadLine();
Your code has some problems:
You are reading the lines incorrectly (calling ReadLine twice per iteration)
You are not closing the Stream
If the file is in use by another process (i.e. a process writing to the file) you may get some errors
The StreamReader class is useful when the file is very large; if you are dealing with a small file, you can simply call System.IO.File.ReadAllLines("FileName").
If the file is large, follow this approach:
public static List<String> ReadAllLines(String fileName)
{
    using (System.IO.FileStream fs = new System.IO.FileStream(fileName, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read))
    {
        using (System.IO.StreamReader sr = new System.IO.StreamReader(fs))
        {
            List<String> lines = new List<String>();
            while (!sr.EndOfStream)
            {
                lines.Add(sr.ReadLine());
            }
            return lines;
        }
    }
}

The process cannot access the file because it is being used by another process. (Text File will not close)

I am trying to write to a text file after this code block checks the last time the PC was restarted. The code below reads, from a text file, the last time the PC was restarted, and from there determines whether to show a splash screen. However, after this method runs, I need to write the current "System Up-Time" to the text file. But I keep getting an error that says the text file is in use. This has driven me insane. I have made sure all StreamWriters and StreamReaders are closed. I have tried using statements. I have tried GC.Collect. I feel like I have tried everything.
Any help would be appreciated.
private void checkLastResart()
{
    StreamReader sr = new StreamReader(Path.GetDirectoryName(Application.ExecutablePath) + @"\Settings.txt");
    if (sr.ReadLine() == null)
    {
        sr.Close();
        MessageBox.Show("There was an error loading 'System UpTime'. All settings have been restored to default.");
        StreamWriter sw = new StreamWriter(Path.GetDirectoryName(Application.ExecutablePath) + @"\Settings.txt", false);
        sw.WriteLine("Conversion Complete Checkbox: 0");
        sw.WriteLine("Default Tool: 0");
        sw.WriteLine("TimeSinceResart: 0");
        sw.Flush();
        sw.Close();
    }
    else
    {
        try
        {
            StreamReader sr2 = new StreamReader(Path.GetDirectoryName(Application.ExecutablePath) + @"\Settings.txt");
            while (!sr2.EndOfStream)
            {
                string strSetting = sr2.ReadLine();
                if (strSetting.Contains("TimeSinceResart:"))
                {
                    double lastTimeRecorded = double.Parse(strSetting.Substring(17));
                    // If lastTimeRecorded is greater than timeSinceRestart (computer has been restarted) OR 2 hours have passed since LVT was last run
                    if (lastTimeRecorded > timeSinceRestart || lastTimeRecorded + 7200 < timeSinceRestart)
                    {
                        runSplashScreen = true;
                    }
                    else
                    {
                        runSplashScreen = false;
                    }
                }
            }
            sr2.Close();
            sr2.Dispose();
        }
        catch (Exception e) { MessageBox.Show("An error has occured loading 'System UpTime'.\r\n\r\n" + e); }
    }
}
Below is a sample of writing to the text file after the above code has run. It doesn't matter if I open a StreamWriter or use File.WriteAllLines; an error is thrown immediately.
StreamWriter sw = new StreamWriter(Path.GetDirectoryName(Application.ExecutablePath) + @"\Settings.txt");
string[] lines = File.ReadAllLines(Path.GetDirectoryName(Application.ExecutablePath) + @"\Settings.txt");
lines[2] = "TimeSinceResart: " + timeSinceRestart;
foreach (string s in lines)
    sw.WriteLine(s);
Your writing code should be changed in this way:
string file = Path.Combine(Path.GetDirectoryName(Application.ExecutablePath), "Settings.txt");
// First read the lines into memory
string[] lines = File.ReadAllLines(file);
// then use the StreamWriter that locks the file
using (StreamWriter sw = new StreamWriter(file))
{
    lines[2] = "TimeSinceResart: " + timeSinceRestart;
    foreach (string s in lines)
        sw.WriteLine(s);
}
In this way the lock taken by the StreamWriter doesn't block the read done by File.ReadAllLines.
That said, please note a couple of things. Do not build path strings with string concatenation; use the static methods of the Path class. But most importantly, when you create a disposable object like a stream, be sure to use a using statement to close the file correctly.
To complete the answer, in response to your comment: use a using statement for the first part of your code as well.
private void checkLastResart()
{
    string file = Path.Combine(Path.GetDirectoryName(Application.ExecutablePath), "Settings.txt");
    using (StreamReader sr = new StreamReader(file))
    {
        if (sr.ReadLine() == null)
        {
            sr.Close();
            MessageBox.Show(...)
            using (StreamWriter sw = new StreamWriter(file, false))
            {
                sw.WriteLine("Conversion Complete Checkbox: 0");
                sw.WriteLine("Default Tool: 0");
                sw.WriteLine("TimeSinceResart: 0");
                sw.Flush();
            }
        }
        else
        {
            ....
        }
    } // exiting the using block closes and disposes the stream
}
Where you create sr2, sr still has settings.txt open.

writing to text file does not always work/save

I have this code that compares two text files and writes the differences to a log file, but for some reason the log.txt file is sometimes blank. Even when I test with some lines starting with a *, these are not always written either. Do I have to save the text file when I've finished writing? Although that would not explain why it sometimes works. Any help would be great.
private void compare()
{
    string FilePath = @"c:\snapshot\1.txt";
    string Filepath2 = @"c:\snapshot\2.txt";
    int counter = 0;
    string line;
    string line2;
    var dir = "c:\\snapshot\\log.txt";
    using (FileStream fs = File.Create(dir))
    {
        fs.Dispose();
    }
    StreamWriter dest = new StreamWriter(dir);
    if (File.Exists(FilePath) & File.Exists(Filepath2))
    {
        // Read the file and display it line by line.
        using (var file = File.OpenText(FilePath))
        using (var file2 = File.OpenText(Filepath2))
        {
            while (((line = file.ReadLine()) != null & (line2 = file2.ReadLine()) != null))
            {
                if (line.Contains("*"))
                {
                    dest.WriteLine(line2);
                }
                else if (!line.Contains(line2))
                {
                    dest.WriteLine(line2);
                }
                counter++;
            }
        }
    }
    dest.Close();
}
Everything left in the buffer should be written out once you hit the Close call on your StreamWriter. If output is missing, it might be that you aren't reaching that line for some reason (i.e. the program crashes first). Also, if you are trying to look at the file while it's being written (i.e. while the program is still running), you won't necessarily see everything, since the writer hasn't been closed yet.
Generally, it's better to wrap the StreamWriter in a using statement. That ensures it always gets closed.
private void compare()
{
    string FileName1 = @"c:\snapshot\1.txt";
    string FileName2 = @"c:\snapshot\2.txt";
    string FileNameOutput = @"c:\snapshot\log.txt"; //dir ???
    string line1, line2; // these declarations were missing in the original
    int counter = 0; // um, what's this for? you aren't using it.
    using (FileStream fso = new FileStream(FileNameOutput, FileMode.Create, FileAccess.Write))
    {
        TextWriter dest = new StreamWriter(fso);
        using (FileStream fs1 = new FileStream(FileName1, FileMode.Open, FileAccess.Read))
        {
            using (FileStream fs2 = new FileStream(FileName2, FileMode.Open, FileAccess.Read))
            {
                TextReader firstFile = new StreamReader(fs1);
                TextReader secondFile = new StreamReader(fs2);
                while (((line1 = firstFile.ReadLine()) != null & (line2 = secondFile.ReadLine()) != null))
                {
                    if (line1.Contains("*") || !line1.Contains(line2))
                    {
                        dest.Write(line2); // WriteLine would give you an extra line?
                    }
                    counter++;
                }
            }
        }
        dest.Flush(); // flush the StreamWriter's buffer before the underlying stream closes
        fso.Flush();
    }
}
I commend the overloads of FileStream to you. Done the way I have it, the code will crash at the point where you instantiate the stream if the user running it doesn't have the required permissions. It's a nice way of showing what you intend, and what you don't.
PS You do know Contains is case-sensitive?
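If case sensitivity matters for your data, an ordinal case-insensitive check is one option (a sketch, not part of the original answer; line1 and line2 are the variables from the code above):
// case-insensitive equivalent of line1.Contains(line2)
bool containsIgnoreCase = line1.IndexOf(line2, StringComparison.OrdinalIgnoreCase) >= 0;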
Not sure if I'm understanding your comparison logic right, but I've separated the comparison from the rest of the code so you can adjust it to your own needs:
public static void WriteDifferences(string sourcePath, string destinationPath, string differencesPath)
{
    var sourceLines = File.ReadAllLines(sourcePath).ToList();
    var destinationLines = File.ReadAllLines(destinationPath).ToList();
    // make lists equal size
    if (sourceLines.Count > destinationLines.Count)
    {
        destinationLines.AddRange(Enumerable.Range(0, sourceLines.Count - destinationLines.Count).Select(x => (string)null));
    }
    else
    {
        sourceLines.AddRange(Enumerable.Range(0, destinationLines.Count - sourceLines.Count).Select(x => (string)null));
    }
    var differences = sourceLines.Zip(destinationLines, (source, destination) => Compare(source, destination));
    File.WriteAllLines(differencesPath, differences.Where(x => x != null));
}

private static string Compare(string source, string destination)
{
    // guard the null padding added above; Contains would throw on null
    if (source == null) return destination;
    if (destination == null) return null;
    return !source.Contains(destination) || source.Contains("*") ? destination : null;
}
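A hypothetical call, reusing the paths from the question:
WriteDifferences(@"c:\snapshot\1.txt", @"c:\snapshot\2.txt", @"c:\snapshot\log.txt");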

C# How to skip number of lines while reading text file using Stream Reader?

I have a program which reads a text file and processes it to be separated into sections.
So the question is: how can the program be changed to skip reading the first 5 lines of the file while using the StreamReader to read it?
Could someone please advise on the code? Thanks!
The code:
class Program
{
    static void Main(string[] args)
    {
        TextReader tr = new StreamReader(@"C:\Test\new.txt");
        String SplitBy = "----------------------------------------";
        // Skip first 5 lines of the text file?
        String fullLog = tr.ReadToEnd();
        String[] sections = fullLog.Split(new string[] { SplitBy }, StringSplitOptions.None);
        //String[] lines = sections.Skip(5).ToArray();
        foreach (String r in sections)
        {
            Console.WriteLine(r);
            Console.WriteLine("============================================================");
        }
    }
}
Try the following
// Skip 5 lines
for (var i = 0; i < 5; i++)
{
    tr.ReadLine();
}
// Read the rest
string remainingText = tr.ReadToEnd();
If the lines have a fixed length then the most efficient way is as follows:
using (Stream stream = File.Open(fileName, FileMode.Open))
{
    stream.Seek(bytesPerLine * (myLine - 1), SeekOrigin.Begin);
    using (StreamReader reader = new StreamReader(stream))
    {
        string line = reader.ReadLine();
    }
}
And if the lines vary in length then you'll have to read them one line at a time, as follows:
using (var sr = new StreamReader("file"))
{
    for (int i = 1; i <= 5; ++i)
        sr.ReadLine();
}
If you want to do this in several places in your program, it may be a good idea to make a custom class inherited from StreamReader with the ability to skip lines.
Something like this could do:
class SkippableStreamReader : StreamReader
{
    public SkippableStreamReader(string path) : base(path) { }

    public void SkipLines(int linecount)
    {
        for (int i = 0; i < linecount; i++)
        {
            this.ReadLine();
        }
    }
}
after this you could use the SkippableStreamReader's function to skip lines.
Example:
SkippableStreamReader exampleReader = new SkippableStreamReader("file_to_read");
//do stuff
//and when needed
exampleReader.SkipLines(number_of_lines_to_skip);
I'll add two more suggestions to the list.
If there will always be a file, and you will only be reading, I suggest this:
var lines = File.ReadLines(@"C:\Test\new.txt").Skip(5).ToArray();
File.ReadLines doesn't lock the file against other processes and only loads the necessary lines into memory.
If your stream can come from other sources then I suggest this approach:
class Program
{
    static void Main(string[] args)
    {
        //it's up to you to get your stream
        var stream = GetStream();
        //Here is where you'll read your lines.
        //Any Linq statement can be used here.
        var lines = ReadLines(stream).Skip(5).ToArray();
        //Go on and do whatever you want to do with your lines...
    }

    public static IEnumerable<string> ReadLines(Stream stream)
    {
        using (var reader = new StreamReader(stream))
        {
            while (!reader.EndOfStream)
            {
                yield return reader.ReadLine();
            }
        }
    }
}
The Iterator block will automatically clean itself up once you are done with it. Here is an article by Jon Skeet going in depth into how that works exactly (scroll down to the "And finally..." section).
I'd guess it's as simple as:
static void Main(string[] args)
{
    var tr = new StreamReader(@"C:\new.txt");
    var SplitBy = "----------------------------------------";
    // Skip first 5 lines of the text file?
    foreach (var i in Enumerable.Range(1, 5)) tr.ReadLine();
    var fullLog = tr.ReadToEnd();
    String[] sections = fullLog.Split(new string[] { SplitBy }, StringSplitOptions.None);
    //String[] lines = sections.Skip(5).ToArray();
    foreach (String r in sections)
    {
        Console.WriteLine(r);
        Console.WriteLine("============================================================");
    }
}
StreamReader with ReadLine or ReadToEnd will actually read the bytes into memory; even if you are not processing those lines, they will be loaded, which affects app performance with big files (10+ MB).
If you want to skip a specific number of lines, you need to know the position in the file you want to move to, which gives you two options:
If you know the line length, you can calculate the position and move there with Stream.Seek. This is the most efficient way to skip stream content without reading it. The issue here is that you can rarely know the line length.
var linesToSkip = 10;
using (var reader = new StreamReader(fileName))
{
    reader.BaseStream.Seek(lineLength * (linesToSkip - 1), SeekOrigin.Begin);
    var myNextLine = reader.ReadLine();
    // TODO: process the line
}
If you don't know the line length, you have to read line by line and skip lines until you reach the desired line number. The issue here is that if the line number is high, you will take a performance hit.
var linesToSkip = 10;
using (var reader = new StreamReader(fileName))
{
    for (int i = 1; i <= linesToSkip; ++i)
        reader.ReadLine();
    var myNextLine = reader.ReadLine();
    // TODO: process the line
}
And if you just need to skip everything, you should do it without reading all the content into memory:
using (var reader = new StreamReader(fileName))
{
    reader.BaseStream.Seek(0, SeekOrigin.End);
    // You can wait here for other processes to write into this file and then the ReadLine will provide you with that content
    var myNextLine = reader.ReadLine();
    // TODO: process the line
}
