What would be the fastest way to concatenate three files in C#? - c#

I need to concatenate 3 files using C#. A header file, content, and a footer file, but I want to do this as cool as it can be done.
Cool = really small code or really fast (non-assembly code).

I support Mehrdad Afshari on his code being exactly same as used in System.IO.Stream.CopyTo.
I would still wonder why did he not use that same function instead of rewriting its implementation.
string[] srcFileNames = { "file1.txt", "file2.txt", "file3.txt" };
string destFileName = "destFile.txt";
using (Stream destStream = File.OpenWrite(destFileName))
{
foreach (string srcFileName in srcFileNames)
{
using (Stream srcStream = File.OpenRead(srcFileName))
{
srcStream.CopyTo(destStream);
}
}
}
According to the disassembler (ILSpy) the default buffer size is 4096. CopyTo function has got an overload, which lets you specify the buffer size in case you are not happy with 4096 bytes.

void CopyStream(Stream destination, Stream source) {
int count;
byte[] buffer = new byte[BUFFER_SIZE];
while( (count = source.Read(buffer, 0, buffer.Length)) > 0)
destination.Write(buffer, 0, count);
}
CopyStream(outputFileStream, fileStream1);
CopyStream(outputFileStream, fileStream2);
CopyStream(outputFileStream, fileStream3);

If your files are text and not large, there's something to be said for dead-simple, obvious code. I'd use the following.
File.ReadAllText("file1") + File.ReadAllText("file2") + File.ReadAllText("file3");
If your files are large text files and you're on Framework 4.0, you can use File.ReadLines to avoid buffering the entire file.
File.WriteAllLines("out", new[] { "file1", "file2", "file3" }.SelectMany(File.ReadLines));
If your files are binary, See Mehrdad's answer

Another way....how about letting the OS do it for you?:
ProcessStartInfo psi = new ProcessStartInfo("cmd.exe",
String.Format(#" /c copy {0} + {1} + {2} {3}",
file1, file2, file3, dest));
psi.UseShellExecute = false;
Process process = Process.Start(psi);
process.WaitForExit();

You mean 3 text files?? Does the result need to be a file again?
How about something like:
string contents1 = File.ReadAllText(filename1);
string contents2 = File.ReadAllText(filename2);
string contents3 = File.ReadAllText(filename3);
File.WriteAllText(outputFileName, contents1 + contents2 + contents3);
Of course, with a StringBuilder and a bit of extra smarts, you could easily extend that to handle any number of input files :-)
Cheers

If you are in a Win32 environment, the most efficient solution could be to use the Win32 API function "WriteFile". There is an example in VB 6, but rewriting it in C# is not difficult.

Related

create zip file in .net with password

I'm working on a project that I need to create zip with password protected from file content in c#.
Before I've use System.IO.Compression.GZipStream for creating gzip content.
Does .net have any functionality for create zip or rar password protected file?
Take a look at DotNetZip (#AFract supplied a new link to GitHub in the comments)
It has got pretty geat documentation and it also allow you to load the dll at runtime as an embeded file.
Unfortunately there is no such functionality in the framework. There is a way to make ZIP files, but without password. If you want to create password protected ZIP files in C#, I'd recommend SevenZipSharp. It's basically a managed wrapper for 7-Zip.
SevenZipBase.SetLibraryPath(Path.Combine(
Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location) ?? Environment.CurrentDirectory,
"7za.dll"));
SevenZipCompressor compressor = new SevenZipCompressor();
compressor.Compressing += Compressor_Compressing;
compressor.FileCompressionStarted += Compressor_FileCompressionStarted;
compressor.CompressionFinished += Compressor_CompressionFinished;
string password = #"whatever";
string destinationFile = #"C:\Temp\whatever.zip";
string[] sourceFiles = Directory.GetFiles(#"C:\Temp\YourFiles\");
if (String.IsNullOrWhiteSpace(password))
{
compressor.CompressFiles(destinationFile, sourceFiles);
}
else
{
//optional
compressor.EncryptHeaders = true;
compressor.CompressFilesEncrypted(destinationFile, password, sourceFiles);
}
DotNetZip worked great in a clean way.
DotNetZip is a FAST, FREE class library and toolset for manipulating zip files.
Code
static void Main(string[] args)
{
using (ZipFile zip = new ZipFile())
{
zip.Password = "mypassword";
zip.AddDirectory(#"C:\Test\Report_CCLF5\");
zip.Save(#"C:\Test\Report_CCLF5_PartB.zip");
}
}
I want to add some more alternatives.
For .NET one can use SharpZipLib, for Xamarin use SharpZipLib.Portable.
Example for .NET:
using ICSharpCode.SharpZipLib.Zip;
// Compresses the supplied memory stream, naming it as zipEntryName, into a zip,
// which is returned as a memory stream or a byte array.
//
public MemoryStream CreateToMemoryStream(MemoryStream memStreamIn, string zipEntryName) {
MemoryStream outputMemStream = new MemoryStream();
ZipOutputStream zipStream = new ZipOutputStream(outputMemStream);
zipStream.SetLevel(3); //0-9, 9 being the highest level of compression
zipStream.Password = "Your password";
ZipEntry newEntry = new ZipEntry(zipEntryName);
newEntry.DateTime = DateTime.Now;
zipStream.PutNextEntry(newEntry);
StreamUtils.Copy(memStreamIn, zipStream, new byte[4096]);
zipStream.CloseEntry();
zipStream.IsStreamOwner = false; // False stops the Close also Closing the underlying stream.
zipStream.Close(); // Must finish the ZipOutputStream before using outputMemStream.
outputMemStream.Position = 0;
return outputMemStream;
// Alternative outputs:
// ToArray is the cleaner and easiest to use correctly with the penalty of duplicating allocated memory.
byte[] byteArrayOut = outputMemStream.ToArray();
// GetBuffer returns a raw buffer raw and so you need to account for the true length yourself.
byte[] byteArrayOut = outputMemStream.GetBuffer();
long len = outputMemStream.Length;
}
More samples can be found here.
If you can live without password functionality, one can mention ZipStorer or the built in .NET function in System.IO.Compression.

Whats the most efficient way in C# to cut off the first 4 bytes of a file?

I have a compressed (LZMA) .txt file and need to decompress it, but i have to exclude the first 4 bytes as they are not part of the file content.
I load my file like this:
byte[] curFile = File.ReadAllBytes(files[i]);
Performance is critical as i have to loop trough over 14k+ files, average file size is around 4KB.
for (int i = 0; i < filePath.Length; i++)
{
var positionToSkipTo = 4;
using (var fileStream = File.OpenRead(filePath))
{
fileStream.Seek(positionToSkipTo, SeekOrigin.Begin);
var curFile = new byte[fileStream.Length - positionToSkipTo];
fileStream.Read(curFile, 0, curFile.Length);
//Do your thing
}
}
Everything is self-explanatory. Important functions are listed at MSDN FileStream class documentation.
If you're just using a byte array, you can utilize the ConstrainedCopy method in the Array class.
Array.ConstrainedCopy(unclippedArray, 4, clippedArray, 0, unclippedArray.Length - 4);
If you're not going to just be dealing with the raw bytes, utilize a memory stream and a binary reader or a filestream like other people suggested.

capturing ALL stdout data using Process.Start [duplicate]

In C# (.NET 4.0 running under Mono 2.8 on SuSE) I would like to run an external batch command and capture its ouput in binary form. The external tool I use is called 'samtools' (samtools.sourceforge.net) and among other things it can return records from an indexed binary file format called BAM.
I use Process.Start to run the external command, and I know that I can capture its output by redirecting Process.StandardOutput. The problem is, that's a text stream with an encoding, so it doesn't give me access to the raw bytes of the output. The almost-working solution I found is to access the underlying stream.
Here's my code:
Process cmdProcess = new Process();
ProcessStartInfo cmdStartInfo = new ProcessStartInfo();
cmdStartInfo.FileName = "samtools";
cmdStartInfo.RedirectStandardError = true;
cmdStartInfo.RedirectStandardOutput = true;
cmdStartInfo.RedirectStandardInput = false;
cmdStartInfo.UseShellExecute = false;
cmdStartInfo.CreateNoWindow = true;
cmdStartInfo.Arguments = "view -u " + BamFileName + " " + chromosome + ":" + start + "-" + end;
cmdProcess.EnableRaisingEvents = true;
cmdProcess.StartInfo = cmdStartInfo;
cmdProcess.Start();
// Prepare to read each alignment (binary)
var br = new BinaryReader(cmdProcess.StandardOutput.BaseStream);
while (!cmdProcess.StandardOutput.EndOfStream)
{
// Consume the initial, undocumented BAM data
br.ReadBytes(23);
// ... more parsing follows
But when I run this, the first 23bytes that I read are not the first 23 bytes in the ouput, but rather somewhere several hundred or thousand bytes downstream. I assume that StreamReader does some buffering and so the underlying stream is already advanced say 4K into the output. The underlying stream does not support seeking back to the start.
And I'm stuck here. Does anyone have a working solution for running an external command and capturing its stdout in binary form? The ouput may be very large so I would like to stream it.
Any help appreciated.
By the way, my current workaround is to have samtools return the records in text format, then parse those, but this is pretty slow and I'm hoping to speed things up by using the binary format directly.
Using StandardOutput.BaseStream is the correct approach, but you must not use any other property or method of cmdProcess.StandardOutput. For example, accessing cmdProcess.StandardOutput.EndOfStream will cause the StreamReader for StandardOutput to read part of the stream, removing the data you want to access.
Instead, simply read and parse the data from br (assuming you know how to parse the data, and won't read past the end of stream, or are willing to catch an EndOfStreamException). Alternatively, if you don't know how big the data is, use Stream.CopyTo to copy the entire standard output stream to a new file or memory stream.
Since you explicitly specified running on Suse linux and mono, you can work around the problem by using native unix calls to create the redirection and read from the stream. Such as:
using System;
using System.Diagnostics;
using System.IO;
using Mono.Unix;
class Test
{
public static void Main()
{
int reading, writing;
Mono.Unix.Native.Syscall.pipe(out reading, out writing);
int stdout = Mono.Unix.Native.Syscall.dup(1);
Mono.Unix.Native.Syscall.dup2(writing, 1);
Mono.Unix.Native.Syscall.close(writing);
Process cmdProcess = new Process();
ProcessStartInfo cmdStartInfo = new ProcessStartInfo();
cmdStartInfo.FileName = "cat";
cmdStartInfo.CreateNoWindow = true;
cmdStartInfo.Arguments = "test.exe";
cmdProcess.StartInfo = cmdStartInfo;
cmdProcess.Start();
Mono.Unix.Native.Syscall.dup2(stdout, 1);
Mono.Unix.Native.Syscall.close(stdout);
Stream s = new UnixStream(reading);
byte[] buf = new byte[1024];
int bytes = 0;
int current;
while((current = s.Read(buf, 0, buf.Length)) > 0)
{
bytes += current;
}
Mono.Unix.Native.Syscall.close(reading);
Console.WriteLine("{0} bytes read", bytes);
}
}
Under unix, file descriptors are inherited by child processes unless marked otherwise (close on exec). So, to redirect stdout of a child, all you need to do is change the file descriptor #1 in the parent process before calling exec. Unix also provides a handy thing called a pipe which is a unidirectional communication channel, with two file descriptors representing the two endpoints. For duplicating file descriptors, you can use dup or dup2 both of which create an equivalent copy of a descriptor, but dup returns a new descriptor allocated by the system and dup2 places the copy in a specific target (closing it if necessary). What the above code does, then:
Creates a pipe with endpoints reading and writing
Saves a copy of the current stdout descriptor
Assigns the pipe's write endpoint to stdout and closes the original
Starts the child process so it inherits stdout connected to the write endpoint of the pipe
Restores the saved stdout
Reads from the reading endpoint of the pipe by wrapping it in a UnixStream
Note, in native code, a process is usually started by a fork+exec pair, so the file descriptors can be modified in the child process itself, but before the new program is loaded. This managed version is not thread-safe as it has to temporarily modify the stdout of the parent process.
Since the code starts the child process without managed redirection, the .NET runtime does not change any descriptors or create any streams. So, the only reader of the child's output will be the user code, which uses a UnixStream to work around the StreamReader's encoding issue,
I checked out what's happening with reflector. It seems to me that StreamReader doesn't read until you call read on it. But it's created with a buffer size of 0x1000, so maybe it does. But luckily, until you actually read from it, you can safely get the buffered data out of it: it has a private field byte[] byteBuffer, and two integer fields, byteLen and bytePos, the first means how many bytes are in the buffer, the second means how many have you consumed, should be zero. So first read this buffer with reflection, then create the BinaryReader.
Maybe you can try like this:
public class ThirdExe
{
private static TongueSvr _instance = null;
private Diagnostics.Process _process = null;
private Stream _messageStream;
private byte[] _recvBuff = new byte[65536];
private int _recvBuffLen;
private Queue<TonguePb.Msg> _msgQueue = new Queue<TonguePb.Msg>();
void StartProcess()
{
try
{
_process = new Diagnostics.Process();
_process.EnableRaisingEvents = false;
_process.StartInfo.FileName = "d:/code/boot/tongueerl_d.exe"; // Your exe
_process.StartInfo.UseShellExecute = false;
_process.StartInfo.CreateNoWindow = true;
_process.StartInfo.RedirectStandardOutput = true;
_process.StartInfo.RedirectStandardInput = true;
_process.StartInfo.RedirectStandardError = true;
_process.ErrorDataReceived += new Diagnostics.DataReceivedEventHandler(ErrorReceived);
_process.Exited += new EventHandler(OnProcessExit);
_process.Start();
_messageStream = _process.StandardInput.BaseStream;
_process.BeginErrorReadLine();
AsyncRead();
}
catch (Exception e)
{
Debug.LogError("Unable to launch app: " + e.Message);
}
private void AsyncRead()
{
_process.StandardOutput.BaseStream.BeginRead(_recvBuff, 0, _recvBuff.Length
, new AsyncCallback(DataReceived), null);
}
void DataReceived(IAsyncResult asyncResult)
{
int nread = _process.StandardOutput.BaseStream.EndRead(asyncResult);
if (nread == 0)
{
Debug.Log("process read finished"); // process exit
return;
}
_recvBuffLen += nread;
Debug.LogFormat("recv data size.{0} remain.{1}", nread, _recvBuffLen);
ParseMsg();
AsyncRead();
}
void ParseMsg()
{
if (_recvBuffLen < 4)
{
return;
}
int len = IPAddress.NetworkToHostOrder(BitConverter.ToInt32(_recvBuff, 0));
if (len > _recvBuffLen - 4)
{
Debug.LogFormat("current call can't parse the NetMsg for data incomplete");
return;
}
TonguePb.Msg msg = TonguePb.Msg.Parser.ParseFrom(_recvBuff, 4, len);
Debug.LogFormat("recv msg count.{1}:\n {0} ", msg.ToString(), _msgQueue.Count + 1);
_recvBuffLen -= len + 4;
_msgQueue.Enqueue(msg);
}
The key is _process.StandardOutput.BaseStream.BeginRead(_recvBuff, 0, _recvBuff.Length, new AsyncCallback(DataReceived), null); and the very very important is that convert to asynchronous reads event like Process.OutputDataReceived.

Path at MemoryStream for use in CMD

Is there a way to get the "Path" to a memorystream?
For example if i want to use CMD and point to a filepath, like "C:..." but instead the file is in a memorystream, is it possible to point it there?
I have tried searching on it but i can´t find any clear information on this.
EDIT:
If it helps, the thing i am wanting to access is an image file, a print screen like this:
using (Bitmap b = new Bitmap(Screen.PrimaryScreen.Bounds.Width, Screen.PrimaryScreen.Bounds.Height))
{
using (Graphics g = Graphics.FromImage(b))
{
g.CopyFromScreen(0, 0, 0, 0, Screen.PrimaryScreen.Bounds.Size, CopyPixelOperation.SourceCopy);
}
using (MemoryStream ms = new MemoryStream())
{
b.Save(ms, ImageFormat.Bmp);
StreamReader read = new StreamReader(ms);
ms.Position = 0;
var cwebp = new Process
{
StartInfo =
{
WindowStyle = ProcessWindowStyle.Normal,
FileName = "cwebp.exe",
Arguments = string.Format(
"-q 100 -lossless -m 6 -alpha_q 100 \"{0}\" -o \"{1}\"", ms, "C:\test.webp")
},
};
cwebp.Start();
}
}
and then some random testing to get it to work....
And the thing i want to pass it to is cwebp, a Webp encoder.
Which is why i must use CMD, as i can´t work with it at the C# level, else i wouldn´t have this problem.
Yeah that is usually protected. If you know where it is, you might be able to grab it with an unsafe pointer. It might be easier to write it to a text file that cmd could read, or push it to Console to read.
If using .NET 4.0 or greater you can use a MemoryMappedFile. I haven't toyed with this class since 4.0 beta. However, my understanding is its useful for writing memory to disk in cases where you are dealing with large amounts of data or want some level of application memory sharing.
Usage per MSDN:
static void Main(string[] args)
{
long offset = 0x10000000; // 256 megabytes
long length = 0x20000000; // 512 megabytes
// Create the memory-mapped file.
using (var mmf = MemoryMappedFile.CreateFromFile(#"c:\ExtremelyLargeImage.data", FileMode.Open,"ImgA"))
{
// Create a random access view, from the 256th megabyte (the offset)
// to the 768th megabyte (the offset plus length).
using (var accessor = mmf.CreateViewAccessor(offset, length))
{
int colorSize = Marshal.SizeOf(typeof(MyColor));
MyColor color;
// Make changes to the view.
for (long i = 0; i < length; i += colorSize)
{
accessor.Read(i, out color);
color.Brighten(10);
accessor.Write(i, ref color);
}
}
}
}
If cwebp.exe is expecting a filename, there is nothing you can put on the command line that satisfies your criteria. Anything enough like a file that the external program can open it won't be able to get its data from your program's memory. There are a few possibilities, but they probably all require changes to cwebp.exe:
You can write to the new process's standard in
You can create a named pipe from which the process can read your data
You can create a named shared memory object from which the other process can read
You haven't said why you're avoiding writing to a file, so it's hard to say which is best.

How to merge efficiently gigantic files with C#

I have over 125 TSV files of ~100Mb each that I want to merge. The merge operation is allowed destroy the 125 files, but not the data. What matter is that a the end, I end up with a big file of the content of all the files one after the other (no specific order).
Is there an efficient way to do that? I was wondering if Windows provides an API to simply make a big "Union" of all those files? Otherwise, I will have to read all the files and write a big one.
Thanks!
So "merging" is really just writing the files one after the other? That's pretty straightforward - just open one output stream, and then repeatedly open an input stream, copy the data, close. For example:
static void ConcatenateFiles(string outputFile, params string[] inputFiles)
{
using (Stream output = File.OpenWrite(outputFile))
{
foreach (string inputFile in inputFiles)
{
using (Stream input = File.OpenRead(inputFile))
{
input.CopyTo(output);
}
}
}
}
That's using the Stream.CopyTo method which is new in .NET 4. If you're not using .NET 4, another helper method would come in handy:
private static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, bytesRead);
}
}
There's nothing that I'm aware of that is more efficient than this... but importantly, this won't take up much memory on your system at all. It's not like it's repeatedly reading the whole file into memory then writing it all out again.
EDIT: As pointed out in the comments, there are ways you can fiddle with file options to potentially make it slightly more efficient in terms of what the file system does with the data. But fundamentally you're going to be reading the data and writing it, a buffer at a time, either way.
Do it from the command line:
copy 1.txt+2.txt+3.txt combined.txt
or
copy *.txt combined.txt
Do you mean with merge that you want to decide with some custom logic what lines go where? Or do you mean that you mainly want to concatenate the files into one big one?
In the case of the latter, it is possible that you don't need to do this programmatically at all, just generate one batch file with this (/b is for binary, remove if not needed):
copy /b "file 1.tsv" + "file 2.tsv" "destination file.tsv"
Using C#, I'd take the following approach. Write a simple function that copies two streams:
void CopyStreamToStream(Stream dest, Stream src)
{
int bytesRead;
// experiment with the best buffer size, often 65536 is very performant
byte[] buffer = new byte[GOOD_BUFFER_SIZE];
// copy everything
while((bytesRead = src.Read(buffer, 0, buffer.Length)) > 0)
{
dest.Write(buffer, 0, bytesRead);
}
}
// then use as follows (do in a loop, don't forget to use using-blocks)
CopStreamtoStream(yourOutputStream, yourInputStream);
Using a folder of 100MB text files totalling ~12GB, I found that a small time saving could be made over the accepted answer by using File.ReadAllBytes and then writing that out to the stream.
[Test]
public void RaceFileMerges()
{
var inputFilesPath = #"D:\InputFiles";
var inputFiles = Directory.EnumerateFiles(inputFilesPath).ToArray();
var sw = new Stopwatch();
sw.Start();
ConcatenateFilesUsingReadAllBytes(#"D:\ReadAllBytesResult", inputFiles);
Console.WriteLine($"ReadAllBytes method in {sw.Elapsed}");
sw.Reset();
sw.Start();
ConcatenateFiles(#"D:\CopyToResult", inputFiles);
Console.WriteLine($"CopyTo method in {sw.Elapsed}");
}
private static void ConcatenateFiles(string outputFile, params string[] inputFiles)
{
using (var output = File.OpenWrite(outputFile))
{
foreach (var inputFile in inputFiles)
{
using (var input = File.OpenRead(inputFile))
{
input.CopyTo(output);
}
}
}
}
private static void ConcatenateFilesUsingReadAllBytes(string outputFile, params string[] inputFiles)
{
using (var stream = File.OpenWrite(outputFile))
{
foreach (var inputFile in inputFiles)
{
var currentBytes = File.ReadAllBytes(inputFile);
stream.Write(currentBytes, 0, currentBytes.Length);
}
}
}
ReadAllBytes method in 00:01:22.2753300
CopyTo method in 00:01:30.3122215
I repeated this a number of times with similar results.

Categories

Resources