I am writing an internet speed measurer. I want to measure internet speed without saving the remote data to a file or keeping it in memory - I need to fetch the data and forget it, so it seems that WebClient and StreamReader are not a good fit (maybe I should inherit from them and override some of their methods). How can I do this?
I think that when writing such a system you should also select exactly which size of file you want to download.
Anyway, if the "very large" or "infinite" size is a problem, you could download only parts of the file using HttpWebRequest.AddRange:
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.addrange.aspx
You can use System.Net.WebRequest to measure the throughput, using asynchronous calls to capture the data as it is being read.
The MSDN example code for WebRequest.BeginGetResponse shows one way to use the asynchronous methods to read the data from a remote file as it is being received. The example stores the response data in a StringBuilder, but you can skip that since you're not actually interested in the data itself.
I added some timing code to their example and tested it against a couple of large file downloads, seems to do the job you need it for.
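For reference, here is a minimal sketch of the read-and-discard approach with some timing added. It uses the Task-based GetResponseAsync (.NET 4.5+) rather than the BeginGetResponse callback pattern from the MSDN example, and the URL is just a placeholder:

using System;
using System.Diagnostics;
using System.Net;
using System.Threading.Tasks;

class SpeedTest
{
    static async Task Main()
    {
        var request = WebRequest.Create("http://example.com/largefile.bin"); // placeholder URL
        var buffer = new byte[81920];
        long totalBytes = 0;
        var stopwatch = Stopwatch.StartNew();

        using (var response = await request.GetResponseAsync())
        using (var stream = response.GetResponseStream())
        {
            int read;
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                totalBytes += read; // count the bytes, then forget them
            }
        }

        stopwatch.Stop();
        double seconds = stopwatch.Elapsed.TotalSeconds;
        double megabitsPerSecond = totalBytes * 8 / 1000000.0 / seconds;
        Console.WriteLine("Read {0:N0} bytes in {1:N1} s (~{2:N1} Mbit/s)",
            totalBytes, seconds, megabitsPerSecond);
    }
}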
Let's say I received a .csv file over the network, so I have a byte[].
I also have a parser that reads .csv files and does business things with them, using File.ReadAllLines().
So far I did:
File.WriteAllBytes(tempPath, incomingBuffer);
parser.Open(tempPath);
I won't ever need the actual file on this device, though.
Is there a way to "store" this file in some virtual place and "open" it again from there, but all in memory?
That would save me ages of waiting on the IO operations to complete (there's a good article on that on Coding Horror),
plus reduce wear on the drive (relevant if this occurred a few dozen times a minute, 24/7),
and in general eliminate a point of failure.
This is a bit in the UNIX direction, where everything is a file stream, but we're talking Windows here.
I won't ever need the actual file on this device, though. - Well, you kind of do if all your APIs expect a file on disk.
You can:
1) Get decent APIs (I am sure there are CSV parsers that take a Stream as a constructor parameter - you could then use a MemoryStream, for example; see the sketch after this list).
2) If performance is a serious issue and there is no way around the APIs, there's one simple solution: write your own RAM disk implementation, which will cache everything that is needed and page stuff to the HDD if necessary.
http://code.msdn.microsoft.com/windowshardware/RAMDisk-Storage-Driver-9ce5f699 (Oh did I mention that you absolutely need to have mad experience with drivers :p?)
There are also ready-made RAM disk solutions (Google!), which means you can just run (in your application initializer) something like 'CreateRamDisk.exe -Hdd "__MEMDISK__"' and then use File.WriteAllBytes("__MEMDISK__:\yourFile.csv", incomingBuffer);
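Coming back to option 1), here is a minimal sketch of the in-memory route, assuming incomingBuffer and parser from the question and a hypothetical parser overload that accepts the already-read lines instead of a file path (usings for System.IO and System.Collections.Generic assumed):

// Wrap the received bytes in a MemoryStream and read lines from it -
// the equivalent of File.ReadAllLines without touching the disk.
var lines = new List<string>();
using (var memory = new MemoryStream(incomingBuffer, writable: false))
using (var reader = new StreamReader(memory))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        lines.Add(line);
    }
}

parser.Open(lines.ToArray()); // hypothetical overload taking string[] instead of a path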
Alternatively, you can read about memory-mapped files (.NET 4.0 and later has nice support). However, by the sounds of it, that probably does not help you too much.
I have read a number of closely related questions but not one that hits this exactly. If it is a duplicate, please send me a link.
I am using an Angular version of the flow.js library for doing HTML5 file uploads (https://github.com/flowjs/ng-flow). This works very well and I am able to upload multiple files simultaneously in 1MB chunks. There is an ASP.NET Web API Files controller that accepts these and saves them on disk. Although I can make this work, I am not doing it efficiently and would like to know a better approach.
First, I used the MultipartFormDataStreamProvider in an async method that worked great as long as the file uploaded in a single chunk. Then I switched to just using the FileStream to write the file to disk. This also worked as long as the chunks arrived in order, but of course, I cannot rely on that.
Next, just to see it work, I wrote the chunks to individual file streams and combined them after the fact, hence the inefficiency. A 1GB file would generate a thousand chunks that needed to be read and rewritten after the upload was complete. I could hold all file chunks in memory and flush them after they are all uploaded, but I'm afraid the server would blow up.
It seems that there should be a nice asynchronous solution to this dilemma, but I don't know what it is. One possibility might be to use async/await to combine previous chunks while writing the current chunk. Another might be to use Begin/EndInvoke to create a separate thread so that the file manipulation on disk is handled independently of the thread reading from the HttpContext, but this would rely on the ThreadPool and I'm afraid the created threads would be unduly terminated when my MVC controller returns. I could create a FileWatcher that ran completely independently of ASP.NET, but that would be very kludgey.
So my questions are: 1) is there a simple solution already that I am missing? (it seems like there should be) and 2) if not, what is the best approach to solving this inside the Web API framework?
Thanks, bob
I'm not familiar with that kind of chunked upload, but I believe this should work:
Use flowTotalSize to pre-allocate the file when the first chunk comes in.
Have one SemaphoreSlim per file to serialize the asynchronous writes for that file.
Each chunk will write to its own offset (flowChunkSize * (flowChunkNumber - 1)) within the file.
This doesn't handle situations where the uploads are unexpectedly terminated. That kind of solution usually involves allocating/writing a temporary file (with a special extension) and then moving/renaming that file once the last chunk arrives.
Don't forget to ensure that your file writing is actually asynchronous.
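A minimal sketch of the approach above, assuming flow.js-style parameters (flowChunkNumber, flowChunkSize, flowTotalSize), that the caller has already resolved the target path, and usings for System.Collections.Concurrent, System.IO, System.Threading and System.Threading.Tasks; the semaphore dictionary would live for the lifetime of the application:

// One SemaphoreSlim per target file serializes the asynchronous writes for that file.
private static readonly ConcurrentDictionary<string, SemaphoreSlim> FileLocks =
    new ConcurrentDictionary<string, SemaphoreSlim>();

public async Task WriteChunkAsync(string filePath, Stream chunkData,
    int flowChunkNumber, int flowChunkSize, long flowTotalSize)
{
    var fileLock = FileLocks.GetOrAdd(filePath, _ => new SemaphoreSlim(1, 1));
    await fileLock.WaitAsync();
    try
    {
        using (var fs = new FileStream(filePath, FileMode.OpenOrCreate, FileAccess.Write,
            FileShare.None, 4096, useAsync: true))
        {
            if (fs.Length == 0)
            {
                fs.SetLength(flowTotalSize); // pre-allocate when the first chunk arrives
            }
            fs.Position = (long)flowChunkSize * (flowChunkNumber - 1); // chunk numbers are 1-based
            await chunkData.CopyToAsync(fs);
        }
    }
    finally
    {
        fileLock.Release();
    }
}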
Using @Stephen Cleary's answer, and this thread: https://github.com/flowjs/ng-flow/issues/41, I was able to make an ASP.NET Web API implementation and uploaded it for those still wondering about this question, such as @Herb Caudill:
https://github.com/samhowes/NgFlowSample/tree/master.
The original answer is the real answer to this question, but I don't have enough reputation yet to comment. I did not use a SemaphoreSlim, but instead enabled file write sharing. I did, however, pre-allocate the file and make sure that each chunk gets written to the right location by calculating an offset.
I will be contributing this to the Flow samples at: https://github.com/flowjs/flow.js/tree/master/samples
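For what it's worth, the write-sharing variant boils down to opening the pre-allocated target file with FileShare.Write and seeking to the computed offset, so no per-file lock is needed. A sketch using the same flow.js field names as above (targetPath and chunkStream are assumed to come from the request):

// Each request opens the same pre-allocated file with write sharing and
// writes its own chunk at the offset calculated from the chunk number.
using (var fs = new FileStream(targetPath, FileMode.Open, FileAccess.Write,
    FileShare.Write, 4096, useAsync: true))
{
    fs.Seek((long)flowChunkSize * (flowChunkNumber - 1), SeekOrigin.Begin);
    await chunkStream.CopyToAsync(fs);
}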
This is what I have done: upload the chunks, save them on the server, and store the location of each chunk in the database along with its order (not the order the chunks arrived in, but their order within the file).
Then I introduced another endpoint to merge those chunks. Since this part can be a long process, I used a messaging service to run it in the background.
And after the service is done merging the file, it sends a notification (or you can trigger an event).
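A minimal sketch of the merge step, assuming the database lookup has already produced the chunk file paths ordered by their position in the original file (the method and parameter names here are hypothetical):

// Append every chunk, in file order, to the final target; the caller can
// delete the chunk files once the merge has succeeded.
public async Task MergeChunksAsync(IEnumerable<string> orderedChunkPaths, string targetPath)
{
    using (var output = new FileStream(targetPath, FileMode.Create, FileAccess.Write,
        FileShare.None, 81920, useAsync: true))
    {
        foreach (var chunkPath in orderedChunkPaths)
        {
            using (var chunk = File.OpenRead(chunkPath))
            {
                await chunk.CopyToAsync(output);
            }
        }
    }
}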
Agreed, it won't fix the problem of having to save all those chunks, but after the merging is done we can just delete them from the disk. Note that some IIS configuration is required for the upload to work smoothly.
Here's my two cents on this old question. Nowadays most applications use Azure or AWS for storage, but I'm still sharing my thoughts in case it helps someone.
Thank you for taking the time to look at my question. I want to know if there's any kind of performance advantage to downloading the file first and then reading from it, instead of using XDocument.Load(url).
For your examples you can use VB.NET or C#; it's all the same to me.
In general, downloading the file first and saving it will likely be slower than just using XDocument.Load(string). The Load method which accepts a string will stream the contents directly into the XDocument reader, which eliminates extra overhead in the save/read calls. Internally, the Load(string) method creates a Stream and downloads the file, reading from the Stream directly.
However, if the XML document you're loading is static, and you're calling this multiple times, it could (potentially) make sense to cache it locally to avoid the network traffic.
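A small sketch of the cached variant, assuming the document really is static; the URL and cache path are just example values (usings for System.IO and System.Xml.Linq assumed):

// Load(string) streams the document straight from the URL, while the cached
// variant avoids repeated network traffic for a static file.
string url = "http://example.com/feed.xml";
string cachePath = @"C:\cache\feed.xml";

XDocument doc;
if (File.Exists(cachePath))
{
    doc = XDocument.Load(cachePath);   // reuse the local copy
}
else
{
    doc = XDocument.Load(url);         // streams directly, no temp file needed
    doc.Save(cachePath);               // cache for subsequent calls
}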
I'm working on improving the upload performance of a .net app that uploads groups of largeish (~15mb each) files to S3.
I have adjusted multipart options (threads, chunk size, etc.) and I think I've improved that as much as possible, but while closely watching the network utilization, I noticed something unexpected.
I iterate over a number of files in a directory and then submit each of them for upload using an instance of the S3 transfer utility like so:
// prepare the upload
this._transferUtility.S3Client.PutBucket(new PutBucketRequest().WithBucketName(streamingBucket));

var request = new TransferUtilityUploadRequest()
    .WithBucketName(streamingBucket)
    .WithFilePath(streamFile)
    .WithKey(targetFile)
    .WithTimeout(uploadTimeout)
    .WithSubscriber(this.uploadFileProgressCallback);

// start the upload
this._transferUtility.Upload(request);
Then I watch for these to complete in the uploadFileProgressCallback specified above.
However when I watch the network interface, I can see a number of distinct "humps" in my outbound traffic graph which coincide precisely with the number of files I'm uploading to S3.
Since this is an asynchronous call, I was under the impression that each transfer would begin immediately, and I'd see a stepped increase in outbound data followed by a stepped decrease as each upload completed. Based on what I'm seeing now, I wonder if these requests, while asynchronous to the calling code, are being queued up somewhere and then executed in series?
If so, I'd like to change that so the requests all begin uploading at (close to) the same time, letting me maximize the upload bandwidth I have available and reduce the overall execution time.
I poked around in the S3 .NET SDK documentation but I couldn't find any mention of this queueing mechanism or any properties/etc. that appeared to provide a way of increasing the concurrency of these calls.
Any pointers appreciated!
This is something that's not intrinsically supported by the SDKs, maybe due to simplicity requirements? I implemented my own concurrent part uploads based on this article:
http://aws.typepad.com/aws/2010/11/amazon-s3-multipart-upload.html
Some observations:
This approach is good only when you have the complete content in memory, since you have to break it into chunks and wrap them up as part uploads. In many cases it may not make sense to hold GBs of data in memory just so that you can do concurrent uploads. You may have to evaluate the trade-off there.
The SDKs have a limit of up to 16 MB for a single PUT upload, and any file beyond that size is divided into 5 MB chunks for part uploads. Unfortunately these values are not configurable, so I pretty much had to write my own multipart upload logic. The values mentioned above are for the Java SDK, and I'd expect them to be the same for the C# one too.
All operations are non-blocking which is good.
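Separately, if the goal from the question is simply to have all the files uploading at roughly the same time (rather than part-level concurrency within one object), one option is to start each TransferUtility upload on its own task and wait for the whole batch. A sketch, assuming filesToUpload is the directory listing from the question and BuildUploadRequest is a hypothetical helper that builds the TransferUtilityUploadRequest shown there (usings for System.Linq and System.Threading.Tasks assumed):

// Start every upload on its own task so they all proceed at the same time,
// then block until the whole batch has finished.
var uploadTasks = filesToUpload
    .Select(streamFile => Task.Run(() => this._transferUtility.Upload(BuildUploadRequest(streamFile))))
    .ToArray();

Task.WaitAll(uploadTasks); // or await Task.WhenAll(uploadTasks) in an async method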
In C# you could try to set the part size manually:
TransferUtilityUploadRequest request =
    new TransferUtilityUploadRequest()
        .WithPartSize(??);
Or
TransferUtilityConfig utilityConfig = new TransferUtilityConfig();
utilityConfig.MinSizeBeforePartUpload = ??;
But I don't know the defaults.
I need to download a file from a website using different threads, downloading different sections of the file at once. I know I can use the WebClient.DownloadFile method, but it doesn't support downloading a file in chunks. If you could point me to a tutorial or give me an idea of how to do it, I would appreciate it. Thanks!
The server at the other end, the one providing the file, has to support downloading in chunks as well. It would need some way to specify which byte position in the file you want to start at, instead of starting at the first byte and sending until the client stops accepting them, or it reaches the end of the file.
Assuming the server does support that, they would provide some kind of documentation on how to utilize it and you would definitely find help here turning that into code.
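One way to check is a HEAD request: if the response advertises Accept-Ranges: bytes, the server supports partial downloads. A minimal sketch (the URL is a placeholder; usings for System and System.Net assumed):

// Probe the server before attempting a chunked download.
var probe = (HttpWebRequest)WebRequest.Create("http://example.com/largefile.bin");
probe.Method = "HEAD";

using (var response = (HttpWebResponse)probe.GetResponse())
{
    bool supportsRanges = string.Equals(
        response.Headers["Accept-Ranges"], "bytes", StringComparison.OrdinalIgnoreCase);
    long totalLength = response.ContentLength; // -1 if the server did not report it
    Console.WriteLine("Ranges supported: {0}, length: {1}", supportsRanges, totalLength);
}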
To piggyback on Rex's answer, there is no fool-proof way to know. Some web servers will provide you with a Content-Length, while others will return -1 for the length. Annoying, I know.
Your best bet is to specify a fixed range and use some heuristics or analysis to estimate the length of your chunks over time.
You'll also want to look at this similar SO question on Multipart Downloading in C#.
The WebClient object has a 'Headers' property, which should let you define a 'Range' header to ask for only a part of the file.
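Note that WebHeaderCollection treats some headers as restricted, so if setting Range through WebClient.Headers is rejected, HttpWebRequest.AddRange does the same thing directly. A sketch that fetches the first megabyte (the URL and file name are placeholders; usings for System.IO and System.Net assumed):

// Ask for bytes 0..1048575 of the remote file and save them as one chunk.
var request = (HttpWebRequest)WebRequest.Create("http://example.com/largefile.bin");
request.AddRange(0, 1048575);

using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
using (var output = File.OpenWrite("chunk0.part"))
{
    stream.CopyTo(output);
}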
There are a lot of ifs here, but if you are downloading, say, a giant text file, you could actually split it into many files on the server and return the addresses of each to the client (or use a filename convention and just report how many sections there are). The client could in turn spin up threads to download each of the sections, and then reconstitute them into a single, large file.
I'm not sure of your use case, but this particular scenario may not be likely to make anything go faster, if that is the idea.