Streaming large data from different clients at the same time - C#

This is partly an architecture and partly a code issue. I have a lot of source URLs, coming from many different clients, that point to huge files I have to download and save to the filesystem.
I have hardware limits on RAM, so I want to buffer each stream in chunks of bytes, and I think it would be a good idea to initiate one thread for each stream being downloaded.
I have added code for initiating a thread/task using the Task Parallel Library, as such:
public Task RunTask(Action action)
{
    Task task = Task.Run(action);
    return task;
}
and I pass the following method as the action parameter:
public void DownloadFileThroughWebStream(WebClient webClient, Uri src, string dest, long buffersize)
{
    Stream stream = webClient.OpenRead(src);
    byte[] buffer = new byte[buffersize];
    int len;
    using (BufferedStream bufferedStream = new BufferedStream(stream))
    {
        using (FileStream fileStream = new FileStream(Path.GetFullPath(dest), FileMode.Create, FileAccess.Write))
        {
            while ((len = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                fileStream.Write(buffer, 0, len);
                fileStream.Flush();
            }
        }
    }
}
And for testing purposes I try to download some resources from HTTP URIs, initiating a thread/task for each specific download:
[Test]
public async Task DownloadSomeStream()
{
    Uri sourceUri = new Uri("http://mirrors.standaloneinstaller.com/video-sample/metaxas-keller-Bell.mpeg");
    List<Uri> streams = new List<Uri> { sourceUri, sourceUri, sourceUri };
    List<Task> tasks = new List<Task>();
    var path = "C:\\TMP\\";
    //Create task for each of the streams from uri
    int c = 1;
    foreach (var uri in streams)
    {
        WebClient webClient = new WebClient();
        Task task = taskInitiator.RunTask(() => DownloadFileThroughWebStream(webClient, uri, Path.Combine(path, "File" + c), 8192));
        tasks.Add(task);
        c++;
    }
    Task allTasksHaveCompleted = Task.WhenAll(tasks);
    await allTasksHaveCompleted;
}
I get the following exception:
System.IO.IOException: 'The process cannot access the file 'D:\TMP\File4' because it is being used by another process'
on line:
using (FileStream fileStream = new FileStream(Path.GetFullPath(dest), FileMode.Create, FileAccess.Write))
So there are two things that I don't understand with this exception:
Why is it not allowed to write, and how is another process using the file?
Why does it want to save File4 when I have only added 3 URLs, so I should only have files File1, File2, and File3?
Also, some other questions it would be nice to get thoughts on:
Is my approach right for what I want to achieve? Am I doing the task initiation with the Task Parallel Library correctly?
Any tips and tricks, best practices, etc.?

Why is it not allowed to write, and how is another process using the file?
The file is not locked by another process, but by the same process. If you open a file for writing, you basically get an exclusive lock on it. When you try to open the file again for writing from another task, it is locked, and that is why you get the error.
To handle this case, you should put a lock around writing the data to disk. You should have a separate lock object for every unique file name you are writing to, and be careful to use the proper lock!
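A minimal sketch of that idea, assuming a hypothetical helper (WriteChunked) and a shared map of lock objects keyed by the full destination path; the names are illustrative, not from the original post:
using System.Collections.Concurrent;
using System.IO;

// One lock object per unique destination path: concurrent writers to the
// same file serialize, while writers to different files proceed in parallel.
static readonly ConcurrentDictionary<string, object> FileLocks =
    new ConcurrentDictionary<string, object>();

void WriteChunked(Stream source, string dest, int bufferSize)
{
    object fileLock = FileLocks.GetOrAdd(Path.GetFullPath(dest), _ => new object());
    lock (fileLock)
    {
        using (var fileStream = new FileStream(Path.GetFullPath(dest), FileMode.Create, FileAccess.Write))
        {
            source.CopyTo(fileStream, bufferSize); // copies in bufferSize chunks
        }
    }
}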
Why does it want to save File4 when I have only added 3 URLs, so I should only have files File1, File2, and File3?
This is because you capture the variable c in the delegate you pass to Task.Run. Since these tasks normally start after the loop is over, the value of c is already 4 by then. See here for more information about closures.
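The usual fix, shown here as a sketch against the question's loop, is to copy the counter into a local inside the loop so that each closure captures its own value:
foreach (var uri in streams)
{
    WebClient webClient = new WebClient();
    int current = c; // each iteration gets its own variable for the closure to capture
    Task task = taskInitiator.RunTask(() =>
        DownloadFileThroughWebStream(webClient, uri, Path.Combine(path, "File" + current), 8192));
    tasks.Add(task);
    c++;
}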

We can create a download method that performs the download:
async Task DownloadFile(string url, string location, string fileName)
{
    using (var client = new WebClient())
    {
        await client.DownloadFileTaskAsync(url, $"{location}{fileName}");
    }
}
And the above method can be called via Task.Run() to execute simultaneous downloads of files:
IList<string> urls = new List<string>()
{
    @"http://mirrors.standaloneinstaller.com/video-sample/metaxas-keller-Bell.mpeg",
    @"https://...",
    @"https://..."
};
string location = "D:";
Directory.CreateDirectory(location);
Task.Run(async () =>
{
    var tasks = urls.Select(url =>
    {
        var fileName = url.Substring(url.LastIndexOf('/'));
        return DownloadFile(url, location, fileName);
    }).ToArray();
    await Task.WhenAll(tasks);
}).GetAwaiter().GetResult();

Related

FileStreamResult - The process cannot access the file, because it is being used by another process

ASP.NET Core
MVC controller - download file from server storage using FileStream and returning FileStreamResult
public IActionResult Download(string path, string fileName)
{
    var fileStream = System.IO.File.OpenRead(path);
    return File(fileStream, "application/force-download", fileName);
}
Everything works fine, but once the user cancels the download before it is complete, other actions in the controller working with this file (delete file, rename file) do not work because: The process cannot access the file, because it is being used by another process.
The FileStream is automatically disposed when the file download completes, but for some reason it is not disposed when the user terminates the download manually.
I have to restart the web application => the program that holds the file is IISExpress.
Does anyone know how to dispose of the stream if the user manually ends the download?
EDIT:
FileStream stream = null;
try
{
    using (stream = System.IO.File.OpenRead(path))
    {
        return File(stream, "application/force-download", fileName);
    }
}
finally
{
    stream?.Dispose();
}
This is code where I tried to close the stream after returning the FileStreamResult. I am aware that it cannot work: after return File(stream, contentType, fileName), execution immediately jumps to the finally block and the stream closes, so the download does not start because the stream is already closed.
It seems the source of the FileStreamResult class shows it has no support for cancellation.
You will need to implement your own, if required. E.g. (not tested, just imagined):
using System.IO;
using System.Threading;

namespace System.Web.Mvc
{
    public class CancellableFileStreamResult : FileResult
    {
        // default buffer size as defined in BufferedStream type
        private const int BufferSize = 0x1000;
        private readonly CancellationToken _cancellationToken;

        public CancellableFileStreamResult(Stream fileStream, string contentType,
            CancellationToken cancellationToken)
            : base(contentType)
        {
            if (fileStream == null)
            {
                throw new ArgumentNullException("fileStream");
            }
            FileStream = fileStream;
            _cancellationToken = cancellationToken;
        }

        public Stream FileStream { get; private set; }

        protected override void WriteFile(HttpResponseBase response)
        {
            // grab chunks of data and write to the output stream
            Stream outputStream = response.OutputStream;
            using (FileStream)
            {
                byte[] buffer = new byte[BufferSize];
                while (!_cancellationToken.IsCancellationRequested)
                {
                    int bytesRead = FileStream.Read(buffer, 0, BufferSize);
                    if (bytesRead == 0)
                    {
                        // no more data
                        break;
                    }
                    outputStream.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
}
You can then use it like
public ActionResult Download(string path, string fileName, CancellationToken cancellationToken)
{
    var fileStream = System.IO.File.OpenRead(path);
    var result = new CancellableFileStreamResult(
        fileStream, "application/force-download", cancellationToken);
    result.FileDownloadName = fileName;
    return result;
}
Again, this is not tested, just imagined.
Maybe this doesn't work, as the action has already finished and thus cannot be cancelled anymore.
EDIT:
The above answer was "imagined" for the ASP.NET Framework. ASP.NET Core has a quite different underlying framework: in .NET Core, the action is processed by an executor, as shown in the source. That will eventually call WriteFileAsync in the FileResultHelper. There you can see that StreamCopyOperation is called with the cancellation token context.RequestAborted. I.e., cancellation is in place in .NET Core.
The big question is: why isn't the request aborted in your case?

Parallel.ForEach memory usage keeps growing

public string SavePath { get; set; } = @"I:\files\";

public void DownloadList(List<string> list)
{
    var rest = ExcludeDownloaded(list);
    var result = Parallel.ForEach(rest, link =>
    {
        Download(link);
    });
}

private void Download(string link)
{
    using (var net = new System.Net.WebClient())
    {
        var data = net.DownloadData(link);
        var fileName = code to generate unique fileName;
        if (File.Exists(fileName))
            return;
        File.WriteAllBytes(fileName, data);
    }
}
var downloader = new DownloaderService();
var links = downloader.GetLinks();
downloader.DownloadList(links);
I observed that the project's RAM usage keeps growing.
I guess there is something wrong with the Parallel.ForEach(), but I cannot figure it out.
Is there a memory leak, or what is happening?
Update 1
After changing to the new code:
private void Download(string link)
{
    using (var net = new System.Net.WebClient())
    {
        var fileName = code to generate unique fileName;
        if (File.Exists(fileName))
            return;
        net.DownloadFile(link, fileName);
        Track theTrack = new Track(fileName);
        theTrack.Title = GetCDName();
        theTrack.Save();
    }
}
I still observed increasing memory usage after it had been running for 9 hours, though the usage grows much more slowly.
Just wondering: is it because I didn't free the memory used by theTrack?
By the way, I use the ATL package to update file metadata; unfortunately, it doesn't implement the IDisposable interface.
The Parallel.ForEach method is intended for parallelizing CPU-bound workloads. Downloading a file is an I/O-bound workload, so Parallel.ForEach is not ideal for this case because it needlessly blocks ThreadPool threads. The correct way to do it is asynchronously, with async/await. The recommended class for making asynchronous web requests is HttpClient, and for controlling the level of concurrency an excellent option is the TPL Dataflow library. For this case it is enough to use the simplest component of this library, the ActionBlock class:
async Task DownloadListAsync(List<string> list)
{
    using (var httpClient = new HttpClient())
    {
        var rest = ExcludeDownloaded(list);
        var block = new ActionBlock<string>(async link =>
        {
            await DownloadFileAsync(httpClient, link);
        }, new ExecutionDataflowBlockOptions()
        {
            MaxDegreeOfParallelism = 10
        });
        foreach (var link in rest)
        {
            await block.SendAsync(link);
        }
        block.Complete();
        await block.Completion;
    }
}

async Task DownloadFileAsync(HttpClient httpClient, string link)
{
    var fileName = Guid.NewGuid().ToString(); // code to generate unique fileName;
    var filePath = Path.Combine(SavePath, fileName);
    if (File.Exists(filePath)) return;
    var response = await httpClient.GetAsync(link);
    response.EnsureSuccessStatusCode();
    using (var contentStream = await response.Content.ReadAsStreamAsync())
    using (var fileStream = new FileStream(filePath, FileMode.Create,
        FileAccess.Write, FileShare.None, 32768, FileOptions.Asynchronous))
    {
        await contentStream.CopyToAsync(fileStream);
    }
}
The code for downloading a file with HttpClient is not as simple as the WebClient.DownloadFile(), but it's what you have to do in order to keep the whole process asynchronous (both reading from the web and writing to the disk).
Caveat: Asynchronous filesystem operations are currently not implemented efficiently in .NET. For maximum efficiency it may be preferable to avoid using the FileOptions.Asynchronous option in the FileStream constructor.
.NET 6 update: The preferable way for parallelizing asynchronous work is now the Parallel.ForEachAsync API. A usage example can be found here.
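A minimal sketch of that API, reusing the DownloadFileAsync method above (this assumes .NET 6 or later; the MaxDegreeOfParallelism value mirrors the ActionBlock example):
async Task DownloadListAsync(List<string> list)
{
    using (var httpClient = new HttpClient())
    {
        var rest = ExcludeDownloaded(list);
        var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };
        // Parallel.ForEachAsync awaits the async body instead of blocking threads
        await Parallel.ForEachAsync(rest, options, async (link, token) =>
        {
            await DownloadFileAsync(httpClient, link);
        });
    }
}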
Use WebClient.DownloadFile() to download directly to a file so you don't have the whole file in memory.
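A sketch of that suggestion applied to the question's Download method (the Guid is a stand-in for the unique-name logic, as in the answer above):
private void Download(string link)
{
    using (var net = new System.Net.WebClient())
    {
        var fileName = Guid.NewGuid().ToString(); // stand-in for the unique fileName logic
        var filePath = Path.Combine(SavePath, fileName);
        if (File.Exists(filePath))
            return;
        net.DownloadFile(link, filePath); // streams directly to disk, no full byte[] in RAM
    }
}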

MVC - How to stream a large file in 4k chunks for download

I was following this example, but when the download starts it hangs, and then after a minute it shows a server error. I guess the response ends before all the data is sent to the client.
Do you know another way I can do this, or why it's not working?
Writing to Output Stream from Action
private void StreamExport(Stream stream, System.Collections.Generic.IList<byte[]> data)
{
    using (BufferedStream bs = new BufferedStream(stream, 256 * 1024))
    using (StreamWriter sw = new StreamWriter(bs))
    {
        foreach (var stuff in data)
        {
            sw.Write(stuff);
            sw.Flush();
        }
    }
}
Can you show the calling method? What is the Stream being passed in? Is it the response stream?
There are many helpful classes you can use so that you don't have to chunk yourself, because they chunk by default. If you use StreamContent, there is a constructor overload where you can specify a buffer size. I believe the default is 10kB.
This is from memory, so it may not be complete:
[Route("download")]
[HttpGet]
public async Task<HttpResponseMessage> GetFile()
{
var response = this.Request.CreateResponse(HttpStatusCode.OK);
//don't use a using statement around the stream because the framework will dispose StreamContent automatically
var stream = await SomeMethodToGetFileStreamAsync();
//buffer size of 4kB
var content = new StreamContent(stream, 4096);
response.Content = content;
return response;
}

Can I get a GZipStream for a file without writing to intermediate temporary storage?

Can I get a GZipStream for a file on disk without writing the entire compressed content to temporary storage? I'm currently using a temporary file on disk to avoid possible memory exhaustion from using a MemoryStream on very large files (this is working fine).
public void UploadFile(string filename)
{
    using (var temporaryFileStream = File.Open("tempfile.tmp", FileMode.CreateNew, FileAccess.ReadWrite))
    {
        using (var fileStream = File.OpenRead(filename))
        using (var compressedStream = new GZipStream(temporaryFileStream, CompressionMode.Compress, true))
        {
            fileStream.CopyTo(compressedStream);
        }
        temporaryFileStream.Position = 0;
        Uploader.Upload(temporaryFileStream);
    }
}
What I'd like to do is eliminate the temporary storage by creating the GZipStream and having it read from the original file only as the Uploader class requests bytes from it. Is such a thing possible? How might such an implementation be structured?
Note that Upload is a static method with signature static void Upload(Stream stream).
Edit: The full code is here if it's useful. I hope I've included all the relevant context in my sample above however.
Yes, this is possible, but not easily with any of the standard .NET stream classes. When I needed to do something like this, I created a new type of stream.
It's basically a circular buffer that allows one producer (writer) and one consumer (reader). It's pretty easy to use. Let me whip up an example. In the meantime, you can adapt the example in the article.
Later: Here's an example that should come close to what you're asking for.
using (var pcStream = new ProducerConsumerStream(BufferSize))
{
    // start upload in a thread
    var uploadThread = new Thread(UploadThreadProc);
    uploadThread.Start(pcStream);
    // Open the input file and attach the gzip stream to the pcStream
    using (var inputFile = File.OpenRead("inputFilename"))
    {
        // create gzip stream
        using (var gz = new GZipStream(pcStream, CompressionMode.Compress, true))
        {
            var bytesRead = 0;
            var buff = new byte[65536]; // 64K buffer
            while ((bytesRead = inputFile.Read(buff, 0, buff.Length)) != 0)
            {
                gz.Write(buff, 0, bytesRead);
            }
        }
    }
    // The entire file has been compressed and copied to the buffer.
    // Mark the stream as "input complete".
    pcStream.CompleteAdding();
    // wait for the upload thread to complete.
    uploadThread.Join();
    // It's very important that you don't close the pcStream before
    // the uploader is done!
}
The upload thread should be pretty simple:
void UploadThreadProc(object state)
{
    var pcStream = (ProducerConsumerStream)state;
    Uploader.Upload(pcStream);
}
You could, of course, put the producer on a background thread and have the upload be done on the main thread. Or have them both on background threads. I'm not familiar with the semantics of your uploader, so I'll leave that decision to you.
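As a sketch of the swapped arrangement (producer on a background thread, upload on the calling thread), assuming the same ProducerConsumerStream API as above:
using (var pcStream = new ProducerConsumerStream(BufferSize))
{
    var producer = new Thread(() =>
    {
        using (var inputFile = File.OpenRead("inputFilename"))
        using (var gz = new GZipStream(pcStream, CompressionMode.Compress, true))
        {
            inputFile.CopyTo(gz); // compress into the circular buffer
        }
        pcStream.CompleteAdding(); // signal "input complete" to the reader
    });
    producer.Start();
    Uploader.Upload(pcStream); // consumes until the producer completes the stream
    producer.Join();
}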

File Stream not getting closed once I read the file at Application Launch in Windows 8

I have written code to read and write a file. When I read the file using a StreamReader in the LoadFile method, and then the next time I call the SaveFile method and try to open a StreamWriter, my application is not able to open the file and throws an exception that the file stream is already in use. I have disposed the streams, yet the streams in the Load method are not getting closed. Following is the code. Please tell me what I am doing wrong:
private async Task LoadFile()
{
    assetFolder = await ApplicationData.Current.LocalFolder.CreateFolderAsync(Constants.LocalFolderForLog, CreationCollisionOption.OpenIfExists);
    using (Stream streamData = await assetFolder.OpenStreamForReadAsync(LogFile))
    {
        using (StreamReader writer = new StreamReader(streamData))
        {
            Logger.Load(writer);
            writer.Dispose();
        }
        streamData.Dispose();
    }
}
---Save File
public static async Task SaveFile()
{
    //StorageFolder assetFolder = await ApplicationData.Current.LocalFolder.CreateFolderAsync(Constants.LocalFolderForLog, CreationCollisionOption.OpenIfExists);
    //var storageFile = await assetFolder.CreateFileAsync(
    //    LogFile,
    //    CreationCollisionOption.OpenIfExists);
    using (Stream streamData = await assetFolder.OpenStreamForWriteAsync(LogFile, CreationCollisionOption.ReplaceExisting))
    {
        using (StreamWriter writer = new StreamWriter(streamData))
        {
            Logger.Save(writer);
            writer.Dispose();
        }
        streamData.Dispose();
    }
}
You probably have a race due to the asynchronous opening of the stream.
Note that with an asynchronous operation, control immediately returns to the caller (the code which invoked LoadFile()), even though the operation is still going on in the background.
Make sure you await your first task before invoking the second one.
You can find nice examples here.
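In this case, that means awaiting the first call before making the second, e.g.:
// awaiting guarantees LoadFile's using blocks have completed and the
// read stream is closed before SaveFile reopens the file
await LoadFile();
await SaveFile();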
