Event driven stdin in C# - c#

Does C# provide an event when data is received on the stdin stream for my own process? Something like Process.OutputDataReceived, only I need an event for InputDataReceived.
I've searched high and low, and learned to redirect stdin->stdout, monitor output streams of spawned apps and a ton of other stuff, but nowhere has anyone shown which event is triggered when stdin is received. Unless I use a dumb polling loop in main().
// dumb polling loop -- is this the only way? does this consume a lot of CPU?
while ((line = Console.ReadLine()) != null && line != "") {
// do work
}
Also, I need to get binary data from the stream, something like this:
using (Stream stdin = Console.OpenStandardInput())
using (Stream stdout = Console.OpenStandardOutput())
{
byte[] buffer = new byte[2048];
int bytes;
while ((bytes = stdin.Read(buffer, 0, buffer.Length)) > 0) {
stdout.Write(buffer, 0, bytes);
}
}

The loop won't consume much CPU, because ReadLine blocks while it waits for input, so it is not really polling. As far as I know, .NET has no built-in event for this. Put this code in its own worker thread and raise your own event from it.
EDIT: I was wrong here in the first place. Corrected:
You can actually read the binary data from stdin, as this SO answer says:
To read binary, the best approach is to use the raw input stream - here showing something like "echo" between stdin and stdout:
using (Stream stdin = Console.OpenStandardInput())
using (Stream stdout = Console.OpenStandardOutput())
{
byte[] buffer = new byte[2048];
int bytes;
while ((bytes = stdin.Read(buffer, 0, buffer.Length)) > 0) {
stdout.Write(buffer, 0, bytes);
}
}
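The worker-thread suggestion in this answer could be sketched roughly as follows. LineWatcher, LineReceived and Completed are hypothetical names made up for this illustration, not part of .NET; the class just wraps the blocking ReadLine loop and raises an event per line.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper (not a .NET type): raises an event for every line read.
public sealed class LineWatcher
{
    private readonly TextReader _reader;

    // Raised on the worker thread for each line received.
    public event Action<string> LineReceived;

    // Raised once when the reader reports end of input.
    public event Action Completed;

    public LineWatcher(TextReader reader)
    {
        _reader = reader;
    }

    public Thread Start()
    {
        var worker = new Thread(() =>
        {
            string line;
            // ReadLine blocks until a line arrives, so this loop does not spin.
            while ((line = _reader.ReadLine()) != null)
            {
                LineReceived?.Invoke(line);
            }
            Completed?.Invoke();
        });
        worker.IsBackground = true; // do not keep the process alive just for this thread
        worker.Start();
        return worker;
    }
}
```

For stdin you would pass Console.In, for example: var watcher = new LineWatcher(Console.In); watcher.LineReceived += line => Console.WriteLine("got: " + line); watcher.Start(); Note that the handlers run on the worker thread, so anything touching UI or shared state needs its own synchronization.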

Here's an async approach. Like OutputDataReceived, the callback runs on newlines. For binary, streaming to base64 might work. Switching it to a binary stream is harder because you can't just check for newline.
using System;
using System.Diagnostics;
using System.Threading.Tasks;
public static void ListenToParent(Action<string> onMessageFromParent)
{
Task.Run(async () =>
{
while (true) // Loop runs only once per line received
{
var text = await Console.In.ReadLineAsync();
if (text == null) break; // stdin was closed; stop instead of spinning on null
onMessageFromParent(text);
}
});
}
Here's how my parent app sets up the child process:
var child = new Process()
{
EnableRaisingEvents = true,
StartInfo =
{
FileName = ..., // .exe path
RedirectStandardOutput = true,
RedirectStandardInput = true,
UseShellExecute = false,
CreateNoWindow = true
},
};
child.Start();
child.BeginOutputReadLine();
... and how it sends a line to the child process:
child.StandardInput.WriteLine("Message from parent");
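For the binary case mentioned in this answer, one possible sketch is to read the raw stdin stream with ReadAsync and hand each block of bytes to a callback. BinaryListener and ListenAsync are hypothetical names for this illustration. The chunk boundaries are arbitrary (whatever a single read returned), so a real protocol would still need its own framing, such as a length prefix.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class BinaryListener
{
    // Hypothetical helper: invokes onChunk with a copy of each block of bytes read.
    public static async Task ListenAsync(Stream input, Action<byte[]> onChunk)
    {
        var buffer = new byte[4096];
        int read;
        // ReadAsync returns 0 at end of stream, which ends the loop.
        while ((read = await input.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
        {
            // Copy, because the shared buffer is reused by the next read.
            var chunk = new byte[read];
            Buffer.BlockCopy(buffer, 0, chunk, 0, read);
            onChunk(chunk);
        }
    }
}
```

In the child process this might be started with: Task listener = BinaryListener.ListenAsync(Console.OpenStandardInput(), chunk => { /* handle bytes */ }); The parent would then write bytes to child.StandardInput.BaseStream instead of calling WriteLine.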

Related

How to copy a large stream to another without OutOfMemoryException in C# [duplicate]

What is the best way to copy the contents of one stream to another? Is there a standard utility method for this?
From .NET 4.5 on, there is the Stream.CopyToAsync method
input.CopyToAsync(output);
This will return a Task that can be continued on when completed, like so:
await input.CopyToAsync(output);
// Code from here on will be run in a continuation.
Note that depending on where the call to CopyToAsync is made, the code that follows may or may not continue on the same thread that called it.
The SynchronizationContext that was captured when calling await will determine what thread the continuation will be executed on.
Additionally, this call (and this is an implementation detail subject to change) still sequences reads and writes (it just doesn't waste a thread blocking on I/O completion).
From .NET 4.0 on, there is the Stream.CopyTo method
input.CopyTo(output);
For .NET 3.5 and before
There isn't anything baked into the framework to assist with this; you have to copy the content manually, like so:
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[32768];
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write (buffer, 0, read);
}
}
Note 1: This method will allow you to report on progress (x bytes read so far ...)
Note 2: Why use a fixed buffer size and not input.Length? Because that Length may not be available! From the docs:
If a class derived from Stream does not support seeking, calls to Length, SetLength, Position, and Seek throw a NotSupportedException.
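As a rough illustration of Note 1, the manual loop can report progress through a callback. StreamCopy, CopyWithProgress and the onProgress parameter are names invented for this sketch; a plain Action<long> is used so it does not depend on anything newer than the framework versions this section targets.

```csharp
using System;
using System.IO;

public static class StreamCopy
{
    // Copies input to output and reports the running byte total after each write.
    // onProgress may be null when no reporting is wanted.
    public static long CopyWithProgress(Stream input, Stream output, Action<long> onProgress)
    {
        byte[] buffer = new byte[32768];
        long total = 0;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            total += read;
            if (onProgress != null)
            {
                onProgress(total);
            }
        }
        return total;
    }
}
```

A caller could pass something like total => Console.WriteLine(total + " bytes read so far"). The callback runs synchronously inside the copy loop, so a slow callback slows the copy.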
MemoryStream has .WriteTo(outstream);
and .NET 4.0 has .CopyTo on normal stream object.
.NET 4.0:
instream.CopyTo(outstream);
I use the following extension methods. They have optimized overloads for when one stream is a MemoryStream.
public static void CopyTo(this Stream src, Stream dest)
{
int size = (src.CanSeek) ? Math.Min((int)(src.Length - src.Position), 0x2000) : 0x2000;
byte[] buffer = new byte[size];
int n;
do
{
n = src.Read(buffer, 0, buffer.Length);
dest.Write(buffer, 0, n);
} while (n != 0);
}
public static void CopyTo(this MemoryStream src, Stream dest)
{
dest.Write(src.GetBuffer(), (int)src.Position, (int)(src.Length - src.Position));
}
public static void CopyTo(this Stream src, MemoryStream dest)
{
if (src.CanSeek)
{
int pos = (int)dest.Position;
int length = (int)(src.Length - src.Position) + pos;
dest.SetLength(length);
while(pos < length)
pos += src.Read(dest.GetBuffer(), pos, length - pos);
}
else
src.CopyTo((Stream)dest);
}
.NET Framework 4 introduced a new CopyTo method on the Stream class in the System.IO namespace. Using this method, we can copy one stream to another stream of a different stream class.
Here is an example:
FileStream objFileStream = File.Open(Server.MapPath("TextFile.txt"), FileMode.Open);
Response.Write(string.Format("FileStream Content length: {0}", objFileStream.Length.ToString()));
MemoryStream objMemoryStream = new MemoryStream();
// Copy File Stream to Memory Stream using CopyTo method
objFileStream.CopyTo(objMemoryStream);
Response.Write("<br/><br/>");
Response.Write(string.Format("MemoryStream Content length: {0}", objMemoryStream.Length.ToString()));
Response.Write("<br/><br/>");
There is actually a less heavy-handed way of doing a stream copy. Take note, however, that this implies that you can store the entire file in memory. Don't use this without caution if you are working with files that go into the hundreds of megabytes or more.
public static void CopySmallTextStream(Stream input, Stream output)
{
using (StreamReader reader = new StreamReader(input))
using (StreamWriter writer = new StreamWriter(output))
{
writer.Write(reader.ReadToEnd());
}
}
NOTE: There may also be some issues concerning binary data and character encodings.
The basic questions that differentiate implementations of "CopyStream" are:
size of the reading buffer
size of the writes
Can we use more than one thread (writing while we are reading).
The answers to these questions result in vastly different implementations of CopyStream and are dependent on what kind of streams you have and what you are trying to optimize. The "best" implementation would even need to know what specific hardware the streams were reading and writing to.
Unfortunately, there is no really simple solution. You can try something like this:
Stream s1, s2;
byte[] buffer = new byte[4096];
int bytesRead = 0;
while ((bytesRead = s1.Read(buffer, 0, buffer.Length)) > 0) s2.Write(buffer, 0, bytesRead);
s1.Close(); s2.Close();
The problem is that different implementations of the Stream class might behave differently when there is nothing to read. A stream reading a file from a local hard drive will probably block until the read operation has read enough data from the disk to fill the buffer, and only return less data if it reaches the end of the file. On the other hand, a stream reading from the network might return less data even though there is more data left to be received.
Always check the documentation of the specific stream class you are using before using a generic solution.
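Because a single Read call may legitimately return fewer bytes than requested, code that needs a fixed number of bytes usually loops until it has them or the stream ends. A minimal sketch, with StreamReadHelper and ReadUpTo as made-up names:

```csharp
using System;
using System.IO;

public static class StreamReadHelper
{
    // Keeps reading until count bytes have arrived or the stream ends.
    // Returns the number of bytes actually placed in buffer.
    public static int ReadUpTo(Stream source, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = source.Read(buffer, offset + total, count - total);
            if (read == 0)
            {
                break; // end of stream: no more data will ever arrive
            }
            total += read;
        }
        return total;
    }
}
```

A return value smaller than count means the stream ended early, which the caller can treat as an error if a full record was expected.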
There may be a way to do this more efficiently, depending on what kind of stream you're working with. If you can convert one or both of your streams to a MemoryStream, you can use the GetBuffer method to work directly with a byte array representing your data. This lets you use methods like Array.CopyTo, which abstract away all the issues raised by fryguybob. You can just trust .NET to know the optimal way to copy the data.
If you want a procedure that copies one stream to another, the one that Nick posted is fine, but it is missing the position reset. It should be:
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[32768];
long TempPos = input.Position;
while (true)
{
int read = input.Read (buffer, 0, buffer.Length);
if (read <= 0)
break; // leave the loop so the position reset below is reached
output.Write (buffer, 0, read);
}
input.Position = TempPos;// or you make Position = 0 to set it at the start
}
But if you are doing this inline at runtime rather than in a procedure, you should use a MemoryStream:
Stream output = new MemoryStream();
byte[] buffer = new byte[32768]; // or you specify the size you want of your buffer
long TempPos = input.Position;
while (true)
{
int read = input.Read (buffer, 0, buffer.Length);
if (read <= 0)
break; // leave the loop so the position reset below is reached
output.Write (buffer, 0, read);
}
input.Position = TempPos;// or you make Position = 0 to set it at the start
Since none of the answers have covered an asynchronous way of copying from one stream to another, here is a pattern that I've successfully used in a port forwarding application to copy data from one network stream to another. It lacks exception handling to emphasize the pattern.
const int BUFFER_SIZE = 4096;
static byte[] bufferForRead = new byte[BUFFER_SIZE];
static byte[] bufferForWrite = new byte[BUFFER_SIZE];
static Stream sourceStream = new MemoryStream();
static Stream destinationStream = new MemoryStream();
static void Main(string[] args)
{
// Initial read from source stream
sourceStream.BeginRead(bufferForRead, 0, BUFFER_SIZE, BeginReadCallback, null);
}
private static void BeginReadCallback(IAsyncResult asyncRes)
{
// Finish reading from source stream
int bytesRead = sourceStream.EndRead(asyncRes);
if (bytesRead == 0)
{
return; // end of source stream: stop instead of issuing reads forever
}
// Make a copy of the buffer as we'll start another read immediately
Array.Copy(bufferForRead, 0, bufferForWrite, 0, bytesRead);
// Write copied buffer to destination stream
destinationStream.BeginWrite(bufferForWrite, 0, bytesRead, BeginWriteCallback, null);
// Start the next read (looks like async recursion I guess)
sourceStream.BeginRead(bufferForRead, 0, BUFFER_SIZE, BeginReadCallback, null);
}
private static void BeginWriteCallback(IAsyncResult asyncRes)
{
// Finish writing to destination stream
destinationStream.EndWrite(asyncRes);
}
For .NET 3.5 and before try :
MemoryStream1.WriteTo(MemoryStream2);
Easy and safe - make new stream from original source:
MemoryStream source = new MemoryStream(byteArray);
MemoryStream copy = new MemoryStream(byteArray);
The following code solved the issue for me. It saves the content into a Stream backed by a MemoryStream and then reads the bytes back:
Stream stream = new MemoryStream();
//any function require input the stream. In mycase to save the PDF file as stream
document.Save(stream);
MemoryStream newMs = (MemoryStream)stream;
byte[] getByte = newMs.ToArray();
//Note - please dispose the stream in the finally block instead of inside using block as it will throw an error 'Access denied as the stream is closed'

Can I get a GZipStream for a file without writing to intermediate temporary storage?

Can I get a GZipStream for a file on disk without writing the entire compressed content to temporary storage? I'm currently using a temporary file on disk in order to avoid possible memory exhaustion using MemoryStream on very large files (this is working fine).
public void UploadFile(string filename)
{
using (var temporaryFileStream = File.Open("tempfile.tmp", FileMode.CreateNew, FileAccess.ReadWrite))
{
using (var fileStream = File.OpenRead(filename))
using (var compressedStream = new GZipStream(temporaryFileStream, CompressionMode.Compress, true))
{
fileStream.CopyTo(compressedStream);
}
temporaryFileStream.Position = 0;
Uploader.Upload(temporaryFileStream);
}
}
What I'd like to do is eliminate the temporary storage by creating GZipStream, and have it read from the original file only as the Uploader class requests bytes from it. Is such a thing possible? How might such an implementation be structured?
Note that Upload is a static method with signature static void Upload(Stream stream).
Edit: The full code is here if it's useful. I hope I've included all the relevant context in my sample above however.
Yes, this is possible, but not easily with any of the standard .NET stream classes. When I needed to do something like this, I created a new type of stream.
It's basically a circular buffer that allows one producer (writer) and one consumer (reader). It's pretty easy to use. Let me whip up an example. In the meantime, you can adapt the example in the article.
Later: Here's an example that should come close to what you're asking for.
using (var pcStream = new ProducerConsumerStream(BufferSize))
{
// start upload in a thread
var uploadThread = new Thread(UploadThreadProc);
uploadThread.Start(pcStream);
// Open the input file and attach the gzip stream to the pcStream
using (var inputFile = File.OpenRead("inputFilename"))
{
// create gzip stream
using (var gz = new GZipStream(pcStream, CompressionMode.Compress, true))
{
var bytesRead = 0;
var buff = new byte[65536]; // 64K buffer
while ((bytesRead = inputFile.Read(buff, 0, buff.Length)) != 0)
{
gz.Write(buff, 0, bytesRead);
}
}
}
// The entire file has been compressed and copied to the buffer.
// Mark the stream as "input complete".
pcStream.CompleteAdding();
// wait for the upload thread to complete.
uploadThread.Join();
// It's very important that you don't close the pcStream before
// the uploader is done!
}
The upload thread should be pretty simple:
void UploadThreadProc(object state)
{
var pcStream = (ProducerConsumerStream)state;
Uploader.Upload(pcStream);
}
You could, of course, put the producer on a background thread and have the upload be done on the main thread. Or have them both on background threads. I'm not familiar with the semantics of your uploader, so I'll leave that decision to you.
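ProducerConsumerStream is the answerer's own class, not something in .NET, and its source is not shown here. As a simplified stand-in (a bounded queue of byte chunks built on BlockingCollection, not the circular buffer described above), a stream with the same Write / Read / CompleteAdding shape might look like this. BlockingQueueStream is a hypothetical name.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Sketch only: one writer thread and one reader thread are assumed.
public sealed class BlockingQueueStream : Stream
{
    private readonly BlockingCollection<byte[]> _chunks;
    private byte[] _current = new byte[0];
    private int _offset;

    public BlockingQueueStream(int maxQueuedChunks)
    {
        // The bound makes Write block when the reader falls behind.
        _chunks = new BlockingCollection<byte[]>(maxQueuedChunks);
    }

    public override bool CanRead { get { return true; } }
    public override bool CanWrite { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    // Producer side: queue a private copy of the caller's bytes.
    public override void Write(byte[] buffer, int offset, int count)
    {
        if (count == 0) return;
        var copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        _chunks.Add(copy);
    }

    // Producer side: signal that no more data will be written.
    public void CompleteAdding() { _chunks.CompleteAdding(); }

    // Consumer side: blocks until data is available; returns 0 only after CompleteAdding.
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        while (_offset == _current.Length)
        {
            byte[] next;
            if (!_chunks.TryTake(out next, Timeout.Infinite)) return 0;
            _current = next;
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        return n;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _chunks.Dispose();
        base.Dispose(disposing);
    }
}
```

Used in place of pcStream in the example above, the GZipStream writes compressed chunks on one thread while Uploader.Upload reads them on another, and memory use is bounded by the queue size times the chunk size rather than by the file size. Copying every chunk costs some allocations, which a real circular buffer would avoid.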

capturing ALL stdout data using Process.Start [duplicate]

In C# (.NET 4.0 running under Mono 2.8 on SuSE) I would like to run an external batch command and capture its output in binary form. The external tool I use is called 'samtools' (samtools.sourceforge.net) and among other things it can return records from an indexed binary file format called BAM.
I use Process.Start to run the external command, and I know that I can capture its output by redirecting Process.StandardOutput. The problem is, that's a text stream with an encoding, so it doesn't give me access to the raw bytes of the output. The almost-working solution I found is to access the underlying stream.
Here's my code:
Process cmdProcess = new Process();
ProcessStartInfo cmdStartInfo = new ProcessStartInfo();
cmdStartInfo.FileName = "samtools";
cmdStartInfo.RedirectStandardError = true;
cmdStartInfo.RedirectStandardOutput = true;
cmdStartInfo.RedirectStandardInput = false;
cmdStartInfo.UseShellExecute = false;
cmdStartInfo.CreateNoWindow = true;
cmdStartInfo.Arguments = "view -u " + BamFileName + " " + chromosome + ":" + start + "-" + end;
cmdProcess.EnableRaisingEvents = true;
cmdProcess.StartInfo = cmdStartInfo;
cmdProcess.Start();
// Prepare to read each alignment (binary)
var br = new BinaryReader(cmdProcess.StandardOutput.BaseStream);
while (!cmdProcess.StandardOutput.EndOfStream)
{
// Consume the initial, undocumented BAM data
br.ReadBytes(23);
// ... more parsing follows
But when I run this, the first 23 bytes that I read are not the first 23 bytes in the output, but rather somewhere several hundred or thousand bytes downstream. I assume that StreamReader does some buffering and so the underlying stream is already advanced, say, 4K into the output. The underlying stream does not support seeking back to the start.
And I'm stuck here. Does anyone have a working solution for running an external command and capturing its stdout in binary form? The output may be very large so I would like to stream it.
Any help appreciated.
By the way, my current workaround is to have samtools return the records in text format, then parse those, but this is pretty slow and I'm hoping to speed things up by using the binary format directly.
Using StandardOutput.BaseStream is the correct approach, but you must not use any other property or method of cmdProcess.StandardOutput. For example, accessing cmdProcess.StandardOutput.EndOfStream will cause the StreamReader for StandardOutput to read part of the stream, removing the data you want to access.
Instead, simply read and parse the data from br (assuming you know how to parse the data, and won't read past the end of stream, or are willing to catch an EndOfStreamException). Alternatively, if you don't know how big the data is, use Stream.CopyTo to copy the entire standard output stream to a new file or memory stream.
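A minimal sketch of the Stream.CopyTo variant, assuming the whole output fits in memory. BinaryCapture and RunAndCapture are names invented for this example; the only members of StandardOutput it touches is BaseStream.

```csharp
using System.Diagnostics;
using System.IO;

public static class BinaryCapture
{
    // Runs a command and returns everything it wrote to stdout as raw bytes.
    public static byte[] RunAndCapture(string fileName, string arguments)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = fileName,
            Arguments = arguments,
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo))
        using (var captured = new MemoryStream())
        {
            // Use only BaseStream; EndOfStream, ReadLine etc. would buffer and swallow bytes.
            process.StandardOutput.BaseStream.CopyTo(captured);
            process.WaitForExit();
            return captured.ToArray();
        }
    }
}
```

For very large output, the MemoryStream could be swapped for a FileStream, or the BaseStream could be handed straight to a BinaryReader as described above so the data is parsed while it streams.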
Since you explicitly specified running on Suse linux and mono, you can work around the problem by using native unix calls to create the redirection and read from the stream. Such as:
using System;
using System.Diagnostics;
using System.IO;
using Mono.Unix;
class Test
{
public static void Main()
{
int reading, writing;
Mono.Unix.Native.Syscall.pipe(out reading, out writing);
int stdout = Mono.Unix.Native.Syscall.dup(1);
Mono.Unix.Native.Syscall.dup2(writing, 1);
Mono.Unix.Native.Syscall.close(writing);
Process cmdProcess = new Process();
ProcessStartInfo cmdStartInfo = new ProcessStartInfo();
cmdStartInfo.FileName = "cat";
cmdStartInfo.CreateNoWindow = true;
cmdStartInfo.Arguments = "test.exe";
cmdProcess.StartInfo = cmdStartInfo;
cmdProcess.Start();
Mono.Unix.Native.Syscall.dup2(stdout, 1);
Mono.Unix.Native.Syscall.close(stdout);
Stream s = new UnixStream(reading);
byte[] buf = new byte[1024];
int bytes = 0;
int current;
while((current = s.Read(buf, 0, buf.Length)) > 0)
{
bytes += current;
}
Mono.Unix.Native.Syscall.close(reading);
Console.WriteLine("{0} bytes read", bytes);
}
}
Under unix, file descriptors are inherited by child processes unless marked otherwise (close on exec). So, to redirect stdout of a child, all you need to do is change the file descriptor #1 in the parent process before calling exec. Unix also provides a handy thing called a pipe which is a unidirectional communication channel, with two file descriptors representing the two endpoints. For duplicating file descriptors, you can use dup or dup2 both of which create an equivalent copy of a descriptor, but dup returns a new descriptor allocated by the system and dup2 places the copy in a specific target (closing it if necessary). What the above code does, then:
Creates a pipe with endpoints reading and writing
Saves a copy of the current stdout descriptor
Assigns the pipe's write endpoint to stdout and closes the original
Starts the child process so it inherits stdout connected to the write endpoint of the pipe
Restores the saved stdout
Reads from the reading endpoint of the pipe by wrapping it in a UnixStream
Note, in native code, a process is usually started by a fork+exec pair, so the file descriptors can be modified in the child process itself, but before the new program is loaded. This managed version is not thread-safe as it has to temporarily modify the stdout of the parent process.
Since the code starts the child process without managed redirection, the .NET runtime does not change any descriptors or create any streams. So, the only reader of the child's output will be the user code, which uses a UnixStream to work around the StreamReader's encoding issue.
I checked out what's happening with Reflector. It seems to me that StreamReader doesn't read until you call read on it. But it's created with a buffer size of 0x1000, so maybe it does. Luckily, until you actually read from it, you can safely get the buffered data out of it: it has a private field byte[] byteBuffer and two integer fields, byteLen and bytePos. The first is how many bytes are in the buffer; the second is how many you have consumed, which should be zero. So first read this buffer with reflection, then create the BinaryReader.
Maybe you can try like this:
public class ThirdExe
{
private static TongueSvr _instance = null;
private Diagnostics.Process _process = null;
private Stream _messageStream;
private byte[] _recvBuff = new byte[65536];
private int _recvBuffLen;
private Queue<TonguePb.Msg> _msgQueue = new Queue<TonguePb.Msg>();
void StartProcess()
{
try
{
_process = new Diagnostics.Process();
_process.EnableRaisingEvents = false;
_process.StartInfo.FileName = "d:/code/boot/tongueerl_d.exe"; // Your exe
_process.StartInfo.UseShellExecute = false;
_process.StartInfo.CreateNoWindow = true;
_process.StartInfo.RedirectStandardOutput = true;
_process.StartInfo.RedirectStandardInput = true;
_process.StartInfo.RedirectStandardError = true;
_process.ErrorDataReceived += new Diagnostics.DataReceivedEventHandler(ErrorReceived);
_process.Exited += new EventHandler(OnProcessExit);
_process.Start();
_messageStream = _process.StandardInput.BaseStream;
_process.BeginErrorReadLine();
AsyncRead();
}
catch (Exception e)
{
Debug.LogError("Unable to launch app: " + e.Message);
}
}
private void AsyncRead()
{
// Append after any bytes still buffered from an incomplete message
_process.StandardOutput.BaseStream.BeginRead(_recvBuff, _recvBuffLen, _recvBuff.Length - _recvBuffLen
, new AsyncCallback(DataReceived), null);
}
void DataReceived(IAsyncResult asyncResult)
{
int nread = _process.StandardOutput.BaseStream.EndRead(asyncResult);
if (nread == 0)
{
Debug.Log("process read finished"); // process exit
return;
}
_recvBuffLen += nread;
Debug.LogFormat("recv data size.{0} remain.{1}", nread, _recvBuffLen);
ParseMsg();
AsyncRead();
}
void ParseMsg()
{
// Parse every complete length-prefixed message currently in the buffer
while (_recvBuffLen >= 4)
{
int len = IPAddress.NetworkToHostOrder(BitConverter.ToInt32(_recvBuff, 0));
if (len > _recvBuffLen - 4)
{
Debug.LogFormat("current call can't parse the NetMsg for data incomplete");
return;
}
TonguePb.Msg msg = TonguePb.Msg.Parser.ParseFrom(_recvBuff, 4, len);
Debug.LogFormat("recv msg count.{1}:\n {0} ", msg.ToString(), _msgQueue.Count + 1);
_msgQueue.Enqueue(msg);
// Drop the parsed message and move the leftover bytes to the front
_recvBuffLen -= len + 4;
Buffer.BlockCopy(_recvBuff, len + 4, _recvBuff, 0, _recvBuffLen);
}
}
}
The key is _process.StandardOutput.BaseStream.BeginRead(_recvBuff, 0, _recvBuff.Length, new AsyncCallback(DataReceived), null); most importantly, this turns the asynchronous reads into an event, much like Process.OutputDataReceived.

Preserving binary data in streams

Using C#, I was surprised how complicated it seemed to preserve binary info from a stream. I'm trying to download a PNG datafile using the WebRequest class, but just transferring the resulting Stream to a file without corrupting it was more verbose than I thought. First, just using StreamReader and StreamWriter was no good, as the ReadToEnd() function returns a string, which effectively doubles the size of the PNG file (probably due to the UTF conversion).
So my question is, do I really have to write all this code, or is there a cleaner way of doing it?
Stream srBytes = webResponse.GetResponseStream();
// Write to file
Stream swBytes = new FileStream("map(" + i.ToString() + ").png",FileMode.Create,FileAccess.Write);
int count = 0;
byte[] buffer = new byte[4096];
do
{
count = srBytes.Read(buffer, 0, buffer.Length);
swBytes.Write(buffer, 0, count);
}
while (count != 0);
swBytes.Close();
Using StreamReader/StreamWriter is definitely a mistake, yes - because that's trying to load the file as text, which it's not.
Options:
Use WebClient.DownloadFile as SLaks suggested
In .NET 4, use Stream.CopyTo(Stream) to copy the data in much the same way as you've got here
Otherwise, write your own utility method to do the copying, then you only need to do it once; you could even write this as an extension method, which means when you upgrade to .NET 4 you can just get rid of the utility method and use the built-in one with no change to the calling code:
public static class StreamExtensions
{
public static void CopyTo(this Stream source, Stream destination)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (destination == null)
{
throw new ArgumentNullException("destination");
}
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
{
destination.Write(buffer, 0, bytesRead);
}
}
}
Note that you should be using using statements for the web response, response stream and output stream in order to make sure they're always closed appropriately, like this:
using (WebResponse response = request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
using (Stream outputStream = File.OpenWrite("map(" + i + ").png"))
{
responseStream.CopyTo(outputStream);
}
You can call WebClient.DownloadFile(url, localPath).
In .Net 4.0, you can simplify your current code by calling Stream.CopyTo.

How to merge efficiently gigantic files with C#

I have over 125 TSV files of ~100 MB each that I want to merge. The merge operation is allowed to destroy the 125 files, but not the data. What matters is that at the end, I end up with one big file containing the content of all the files one after the other (in no specific order).
Is there an efficient way to do that? I was wondering if Windows provides an API to simply make a big "Union" of all those files? Otherwise, I will have to read all the files and write a big one.
Thanks!
So "merging" is really just writing the files one after the other? That's pretty straightforward - just open one output stream, and then repeatedly open an input stream, copy the data, close. For example:
static void ConcatenateFiles(string outputFile, params string[] inputFiles)
{
using (Stream output = File.OpenWrite(outputFile))
{
foreach (string inputFile in inputFiles)
{
using (Stream input = File.OpenRead(inputFile))
{
input.CopyTo(output);
}
}
}
}
That's using the Stream.CopyTo method which is new in .NET 4. If you're not using .NET 4, another helper method would come in handy:
private static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, bytesRead);
}
}
There's nothing that I'm aware of that is more efficient than this... but importantly, this won't take up much memory on your system at all. It's not like it's repeatedly reading the whole file into memory then writing it all out again.
EDIT: As pointed out in the comments, there are ways you can fiddle with file options to potentially make it slightly more efficient in terms of what the file system does with the data. But fundamentally you're going to be reading the data and writing it, a buffer at a time, either way.
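The comment referred to in the EDIT is not quoted here, so which file options it meant is unknown. As one assumption, FileOptions.SequentialScan (a hint that the file is read front to back) together with a larger buffer is a common candidate. FileConcatenator and the 64 KB buffer size are choices made for this sketch and should be measured rather than trusted.

```csharp
using System.IO;

public static class FileConcatenator
{
    public static void Concatenate(string outputFile, params string[] inputFiles)
    {
        const int BufferSize = 1 << 16; // 64 KB; an assumption to tune by measurement

        // FileMode.Create truncates an existing output file before writing.
        using (var output = new FileStream(outputFile, FileMode.Create, FileAccess.Write,
                                           FileShare.None, BufferSize))
        {
            foreach (string inputFile in inputFiles)
            {
                // SequentialScan hints to the OS that the file is read from start to end.
                using (var input = new FileStream(inputFile, FileMode.Open, FileAccess.Read,
                                                  FileShare.Read, BufferSize, FileOptions.SequentialScan))
                {
                    input.CopyTo(output, BufferSize);
                }
            }
        }
    }
}
```

Either way, the data is still read and written a buffer at a time, so any gain from the hint is likely to be modest and platform dependent.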
Do it from the command line:
copy 1.txt+2.txt+3.txt combined.txt
or
copy *.txt combined.txt
Do you mean with merge that you want to decide with some custom logic what lines go where? Or do you mean that you mainly want to concatenate the files into one big one?
In the case of the latter, it is possible that you don't need to do this programmatically at all, just generate one batch file with this (/b is for binary, remove if not needed):
copy /b "file 1.tsv" + "file 2.tsv" "destination file.tsv"
Using C#, I'd take the following approach. Write a simple function that copies two streams:
void CopyStreamToStream(Stream dest, Stream src)
{
int bytesRead;
// experiment with the best buffer size, often 65536 is very performant
byte[] buffer = new byte[GOOD_BUFFER_SIZE];
// copy everything
while((bytesRead = src.Read(buffer, 0, buffer.Length)) > 0)
{
dest.Write(buffer, 0, bytesRead);
}
}
// then use as follows (do in a loop, don't forget to use using-blocks)
CopyStreamToStream(yourOutputStream, yourInputStream);
Using a folder of 100MB text files totalling ~12GB, I found that a small time saving could be made over the accepted answer by using File.ReadAllBytes and then writing that out to the stream.
[Test]
public void RaceFileMerges()
{
var inputFilesPath = @"D:\InputFiles";
var inputFiles = Directory.EnumerateFiles(inputFilesPath).ToArray();
var sw = new Stopwatch();
sw.Start();
ConcatenateFilesUsingReadAllBytes(@"D:\ReadAllBytesResult", inputFiles);
Console.WriteLine($"ReadAllBytes method in {sw.Elapsed}");
sw.Reset();
sw.Start();
ConcatenateFiles(@"D:\CopyToResult", inputFiles);
Console.WriteLine($"CopyTo method in {sw.Elapsed}");
}
private static void ConcatenateFiles(string outputFile, params string[] inputFiles)
{
using (var output = File.OpenWrite(outputFile))
{
foreach (var inputFile in inputFiles)
{
using (var input = File.OpenRead(inputFile))
{
input.CopyTo(output);
}
}
}
}
private static void ConcatenateFilesUsingReadAllBytes(string outputFile, params string[] inputFiles)
{
using (var stream = File.OpenWrite(outputFile))
{
foreach (var inputFile in inputFiles)
{
var currentBytes = File.ReadAllBytes(inputFile);
stream.Write(currentBytes, 0, currentBytes.Length);
}
}
}
ReadAllBytes method in 00:01:22.2753300
CopyTo method in 00:01:30.3122215
I repeated this a number of times with similar results.
