I am currently working on a small project, where I need to send a potentially large file over the internet.
After some debate I decided to go with the streaming option instead of a chunking approach. The files can potentially be very big; I don't really want to specify an exact upper bound: 2 GB, maybe 4 GB, who knows.
Naturally this can take a long time. Again, I don't really want to have a timeout; it just takes as long as it takes, it doesn't matter.
While poking around trying different files of varying sizes, I slowly, step by step, tuned the properties of my BasicHttpBinding. I am just wondering whether the values I came up with are basically okay, or if they are totally evil?
transferMode="Streamed"
sendTimeout="10675199.02:48:05.4775807"
receiveTimeout="10675199.02:48:05.4775807"
openTimeout="10675199.02:48:05.4775807"
closeTimeout="10675199.02:48:05.4775807"
maxReceivedMessageSize="9223372036854775807"
This just doesn't feel right somehow; these are simply the maximum possible values for each underlying data type. But I don't know what else to do.
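For what it's worth, those values are literally the type maxima; the same binding expressed in code would be roughly this (a sketch, my real settings live in the XML config above):

// requires System.ServiceModel
var binding = new BasicHttpBinding
{
    TransferMode = TransferMode.Streamed,
    SendTimeout = TimeSpan.MaxValue,        // 10675199.02:48:05.4775807
    ReceiveTimeout = TimeSpan.MaxValue,
    OpenTimeout = TimeSpan.MaxValue,
    CloseTimeout = TimeSpan.MaxValue,
    MaxReceivedMessageSize = long.MaxValue  // 9223372036854775807
};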
So again:
Is this basically the right approach? Or did I completely misunderstand and misuse the framework here?
Thanks
Well, a more natural approach might be to send the file as a sequence of mid-size chunks, with a final message to commit; this also makes it possible to resume after an error. There is perhaps a slight DoS issue with fully open limits...
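A rough sketch of the shape I mean (all names here are hypothetical, not a drop-in contract):

// requires System.ServiceModel
[ServiceContract]
public interface IChunkedUploadService
{
    [OperationContract]
    Guid BeginUpload(string fileName);                         // hand out an upload token

    [OperationContract]
    void UploadChunk(Guid uploadId, long offset, byte[] data); // mid-size chunks, e.g. 1-4 MB

    [OperationContract]
    long GetCommittedLength(Guid uploadId);                    // lets the client resume after an error

    [OperationContract]
    void CommitUpload(Guid uploadId);                          // final message to commit the file
}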
I already have a problem with streaming when the connection between WCF client and server goes through the VPN. If interested, read more in this thread.
If the stream is big enough to be streamed for more than a minute, an exception occurs.
Related
We are currently developing a software solution which has a client and a number of WCF services that it consumes. The issue we are having is the WCF services timing out after a period of inactivity. As far as I understand, there are a few ways to resolve this:
1. Increase the timeouts (as far as I understood, this is generally not recommended; e.g. setting the timeout to infinite/weeks is considered bad practice)
2. Periodically ping the WCF services from the client (I'm not sure I'm a huge fan of this, as it will add redundant, periodic calls)
3. Handle timeout issues and attempt to reconnect (this is slow and requires a lot of manual code)
4. Reliable Sessions - some sources mention that this is the built-in WCF pinging and message-reliability mechanism, but other sources mention that it will still time out.
What is the recommended/best way of resolving this issue? Is there any official reading material on this? I could not find all that much info myself
Thanks!
As I see it, you have to use a combination of your stated points.
You are right: increasing the timeouts is bad practice and can give you a lot of problems.
If you don't want to use Reliable Sessions, then pinging is the only applicable way to hold the connection open.
You need to handle these things regardless, whether a timeout occurs, the connection is lost or an exception is thrown. There are plenty of ways your connection can fault.
Reliable Sessions are a good way to avoid implementing a ping yourself, but technically they do nearly the same thing: WCF automatically sends an "I am still here" request.
The conclusion is that you need point 3, plus either point 2 or point 4. To reduce the manual code for point 3, you can use proxies or a wrapper around your ServiceClient that establishes a new connection if the old one faults during a request. Point 4 is easy to implement, because you only need some small additions to the binding in your config, and the traffic overhead is not that big. Point 2 is the most expensive way: you need to manage a thread/task that does nothing but ping the server, and the service needs to be extended. But as you stated before, Reliable Sessions can fail, and pings should keep you on the safe side.
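To illustrate point 3, a thin wrapper along these lines (class and member names are hypothetical) keeps the reconnect logic in one place:

// requires System.ServiceModel
public class ReconnectingClient<TChannel>
{
    private readonly ChannelFactory<TChannel> _factory;
    private TChannel _channel;

    public ReconnectingClient(ChannelFactory<TChannel> factory)
    {
        _factory = factory;
    }

    // Returns a usable channel, replacing it if the previous one has faulted or closed.
    public TChannel Channel
    {
        get
        {
            var current = _channel as ICommunicationObject;
            if (current == null ||
                current.State == CommunicationState.Faulted ||
                current.State == CommunicationState.Closed)
            {
                if (current != null)
                    current.Abort();                 // discard the broken proxy
                _channel = _factory.CreateChannel(); // and create a fresh one
            }
            return _channel;
        }
    }
}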
You should ask yourself: what is your WCF endpoint actually doing? Is the way you have your command set up the most optimal?
Perhaps it would be better for the long-running endpoint to be based on a polling system, which allows a quick query instead of waiting on the results of the endpoint's actions.
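Something along these lines (purely hypothetical names) is what I mean by polling:

// requires System.ServiceModel
[ServiceContract]
public interface ILongRunningService
{
    [OperationContract]
    Guid StartWork(string request);      // returns quickly with a job id

    [OperationContract]
    string GetStatus(Guid jobId);        // cheap call the client can poll, e.g. "Running"/"Done"

    [OperationContract]
    byte[] GetResult(Guid jobId);        // fetch the result once GetStatus says it is ready
}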
You should also consider data transfer as a possible issue. Is the amount of data you're transferring back a lot?
To get a more pointed answer, we'd need to know more about the specific endpoint as well as any other responsibilities there are for the service.
Let's say I received a .csv file over the network, so I have a byte[]. I also have a parser that reads .csv files and does business things with them, using File.ReadAllLines().
So far I did:
File.WriteAllBytes(tempPath, incomingBuffer);
parser.Open(tempPath);
I won't ever need the actual file on this device, though.
Is there a way to "store" this file in some virtual place and "open" it again from there, but all in memory?
That would save me ages of waiting on the IO operations to complete (there's a good article on that on Coding Horror),
plus reduce wear on the drive (relevant if this occurred a few dozen times a minute, 24/7),
and in general eliminate a point of failure.
This is a bit in the UNIX direction, where everything is a file stream, but we're talking Windows here.
"I won't ever need the actual file on this device, though." - Well, you kind of do if all your APIs expect a file on disk.
You can:
1) Get decent APIs (I am sure there are CSV parsers that take a Stream as a constructor parameter; you could then use a MemoryStream, for example - see the sketch at the end of this answer).
2) If performance is a serious issue and there is no way you can change the APIs, there's one "simple" solution: write your own implementation of a ramdisk, which will cache everything that is needed and page stuff out to the HDD if necessary.
http://code.msdn.microsoft.com/windowshardware/RAMDisk-Storage-Driver-9ce5f699 (Oh did I mention that you absolutely need to have mad experience with drivers :p?)
There are also ready-made ramdisk solutions (Google!), which means you can just run (in your application initializer) 'CreateRamDisk.exe -Hdd "__MEMDISK__"' (for example), and then use File.WriteAllBytes("__MEMDISK__:\yourFile.csv", incomingBuffer);
Alternatively, you can read about memory-mapped files (.NET 4.0 has nice support). However, by the sound of it, that probably does not help you too much.
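Regarding option 1: if the parser (or its replacement) can work against a Stream or a set of lines, everything can stay in memory. A minimal sketch, assuming a hypothetical parser overload that accepts the lines directly:

// requires System.IO and System.Collections.Generic
using (var stream = new MemoryStream(incomingBuffer))
using (var reader = new StreamReader(stream))
{
    var lines = new List<string>();
    string line;
    while ((line = reader.ReadLine()) != null)
        lines.Add(line);

    parser.Parse(lines); // hypothetical overload, instead of parser.Open(tempPath)
}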
I have a WCF service deployed in Azure. I have a client consuming this service which runs on Windows Phone 7. Everything worked fine, but when I tried to send some larger files, or enumerables with lots of items, to the server, errors occurred. I found out that the max message size, max array length, etc. can be configured in the configuration file. So I added a few zeros to the default values and it worked. However, I am not happy with this solution, because it is dirty.
My questions are:
1. What exactly are the disadvantages of mindlessly increasing the message size limits, and how does it affect the service?
2. What is the alternative for me instead of increasing the message size?
In particular, I need to send the server a GPS track, which consists of some metadata and a huge amount of location points.
If I understand the concept correctly, by default WCF uses SOAP, which is XML based, so the objects sent are encoded as XML (similar to XML serialization in .NET?). Can it somehow be switched to some binary mode to send BLOBs, or to upload large objects through streams? Or is my only option to bypass the WCF service completely and upload directly to server storage (like SQL Azure or the Azure Blob Service), which exposes an API to do so?
Thank you.
As Peretz mentioned in a comment, that's what is supposed to happen.
The defaults are just that--defaults. Not "recommended" settings, nor pseudo-max sizes. They're available to alter based on your needs (and should be).
You could use net.tcp binding (if you're not already) which handles data a little better (with regards to serializing), but what you're doing is well within the boundary of WCF and its abilities.
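If you do go down the net.tcp route, the relevant knobs look roughly like this in code (the sizes here are arbitrary examples; pick deliberate limits rather than the maximum):

// requires System.ServiceModel
var binding = new NetTcpBinding
{
    MaxReceivedMessageSize = 10 * 1024 * 1024, // 10 MB, for example
    TransferMode = TransferMode.Streamed       // stream large payloads instead of buffering them
};
binding.ReaderQuotas.MaxArrayLength = 10 * 1024 * 1024;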
I had pretty much the same problem with huge GPS tracks. I really suggest not using SOAP for this kind of task. You can use WebHttpBinding and implement streaming from your storage, or use something like ASP.NET Web API (it will ship with MVC 4 and can be hosted outside of IIS) for the low-level plumbing of streams to the client. This will allow you to implement multiple download streams and whatever else you might need for this kind of task.
As for the overall design of such systems, try not to think that one tool can solve all your problems; just use the right tool for the task. If you have business tasks, implement transactional WS-*-based services for them. To transfer huge amounts of data, use something like REST services. This will also help with querying the tracks.
For example: Tracks/{deviceid}/{trackDate}.{format}.
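As a sketch of that kind of endpoint in WCF terms (hypothetical names), hosted over WebHttpBinding and returning a raw stream:

// requires System.ServiceModel, System.ServiceModel.Web and System.IO
[ServiceContract]
public interface ITrackService
{
    [OperationContract]
    [WebGet(UriTemplate = "Tracks/{deviceId}/{trackDate}.{format}")]
    Stream GetTrack(string deviceId, string trackDate, string format);
}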
Feel free to ask=)
You should not arbitrarily increase message sizes by chucking 0's on the end of these settings. Yes, they are defaults which can be changed, but increasing message sizes whenever a limit is hit is not something you should always resort to. One of the reasons for having such small sizes is security: it prevents clients from flooding servers with huge messages and taking them down. It also encourages clients to send small messages, which helps with scalability.
There are different encodings that you can use; it depends on the bindings used. I thought that WCF encodes SOAP as binary anyway... but I may be wrong, I haven't touched WCF for 6 months now.
In previous projects, whenever we hit size limits we looked at cutting our data into smaller chunks. One of the best things we ever did was implement pageable grids in our GUIs which only got 10-20 or so records from the server at a time. Entity Framework was great in allowing us to write a single generic skip/take query to do this for ALL of our grids.
Just increasing sizes is an easy fix... until you can't increase any further. It's a brittle and broken approach.
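For illustration, the generic skip/take helper was along these lines (a sketch, names hypothetical; page is zero-based):

// requires System.Linq, System.Linq.Expressions and System.Collections.Generic
public static class PagingHelper
{
    public static List<T> GetPage<T, TKey>(
        IQueryable<T> source, Expression<Func<T, TKey>> orderBy, int page, int pageSize)
    {
        return source
            .OrderBy(orderBy)        // EF needs a stable order before Skip/Take
            .Skip(page * pageSize)
            .Take(pageSize)
            .ToList();
    }
}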
I need to implement some atomic writes to secondary storage. How can I make this fool proof?
If I open a file in C# using File.Open, I will receive a handle. I can write some data to it, flush it and close it. But I still have some questions. I guess the statements below are true?
Data might not be written to disk but rather exist in the Windows Disk cache
Data might not be written to disk but rather exist in the HDD cache
And this will lead to the following issues:
A power outage will cause the edits I made to the file to be reverted (on a transactional FS like NTFS)
A kernel panic will cause the edits I made to the file to be reverted (on a transactional FS like NTFS)
Am I correct in my assumptions? If so, how can I make a fool proof write to the disk?
I have looked a little bit into NoSQL and have been thinking there might be a NoSQL server that could talk to the system closer to the hardware and not return control to my software until it can guarantee that the bytes are written to disk.
All ideas and thoughts are welcome
Jens
[edit]
Maybe there is a specific amount of time I can wait before being sure that all changes are written to physical disk?
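For concreteness, the pattern I am worried about is roughly this (path and buffer are placeholders):

// requires System.IO
using (FileStream fs = File.Open(path, FileMode.Create, FileAccess.Write))
{
    fs.Write(buffer, 0, buffer.Length);
    fs.Flush(); // does this reach the platter, or only the Windows/HDD caches?
}               // Close/Dispose happens here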
The only way to make an operation fully "fool proof" is to queue it, run the operation, and confirm. Things stay in the queue, and can be run again, until they are confirmed, or "rolled back" if the confirmation is negative.
The window of time you are talking about, assuming you are not involving a network (everything is local), is very small. Still, if you want to ensure things, you queue them. MSMQ is one option. If the data comes from SQL Server, you can consider its queueing mechanism, Service Broker (not recommending this direction, but it is one way).
Ultimately, the idea here is a lot like a handshake, as used in most server to server communication. Everyone agrees things are done before both sides get rid of their piece of the work.
I am not an expert in Windows internals, but I believe you are correct. I didn't test it in great detail, but I was able to use MSMQ as a pretty reliable place to store data, with another process monitoring the queue for final processing.
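For illustration, parking the data in a local transactional queue is only a few lines (the queue path and payload here are hypothetical):

// requires a reference to System.Messaging
const string queuePath = @".\Private$\pendingWrites";
if (!MessageQueue.Exists(queuePath))
    MessageQueue.Create(queuePath, true);  // true = transactional queue

using (var queue = new MessageQueue(queuePath))
using (var tx = new MessageQueueTransaction())
{
    tx.Begin();
    queue.Send("data to persist", tx);     // stays in the queue until the
    tx.Commit();                           // consumer confirms (receives) it
}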
I've been badly let down and have received an application that in certain situations is at least 100 times too slow, which I have to release to our customers very soon (a matter of weeks).
Through some very simple profiling I have discovered that the bottleneck is its use of .NET Remoting to transfer data between a Windows service and the graphical front-end - both running on the same machine.
Microsoft guidelines say "Minimize round trips and avoid chatty interfaces": write
MyComponent.SaveCustomer("bob", "smith");
rather than
MyComponent.Firstname = "bob";
MyComponent.LastName = "smith";
MyComponent.SaveCustomer();
I think this is the root of the problem in our application. Unfortunately calls to MyComponent.* (the profiler shows that 99.999% of the time is spent in such statements) are scattered liberally throughout the source code and I don't see any hope of redesigning the interface in accordance with the guidelines above.
Edit: In fact, most of the time the front-end reads properties from MyComponent rather than writes to it. But I suspect that MyComponent can change at any time in the back-end.
I looked to see if I can read all properties from MyComponent in one go and then cache them locally (ignoring the change-at-any-time issue above), but that would involve altering hundreds of lines of code.
My question is: are there any 'quick-win' things I can try to improve performance?
I need at least a 100-times speed-up. I am a C/C++/Delphi programmer and am pretty much unfamiliar with C#/.NET/Remoting other than what I have read up on in the last couple of days. I'm looking for things that can be completed in a few days; a major restructuring of the code is not an option.
Just for starters, I have already confirmed that it is using BinaryFormatter.
(Sorry, this is probably a terrible question along the lines of 'How can I feasibly fix X if I rule out all of the feasible options'… but I'm desperate!)
Edit 2
In response to Richard's comment below: I think my question boils down to:
Is there any setting I can change to reduce the cost of a .NET Remoting round-trip when both ends of the connection are on the same machine?
Is there any setting I can change to reduce the number of round-trips - so that each invocation of a remote object property doesn't result in a separate round-trip? And might this break anything?
Under .NET Remoting you have three ways of communicating: HTTP, TCP and IPC. If the communication is on the same PC, I suggest using the IPC channel; it will speed up your calls.
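A minimal sketch of what the client-side switch to IPC looks like (port and object names are hypothetical; the server would register an IpcChannel("myAppIpcPort") and publish MyComponent under the matching URI):

// requires System.Runtime.Remoting and System.Runtime.Remoting.Channels.Ipc
static class RemotingClientSetup
{
    public static MyComponent Connect()
    {
        // Register a client-side IPC channel once at startup (named pipes under the hood).
        ChannelServices.RegisterChannel(new IpcChannel(), false);

        // Obtain the proxy over ipc:// instead of tcp:// or http://.
        return (MyComponent)Activator.GetObject(
            typeof(MyComponent), "ipc://myAppIpcPort/MyComponent.rem");
    }
}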
In short, no, there are no quick wins here. Personally I would not make MyComponent (as a DTO) a MarshalByRefObject (which is presumably the problem), as those round trips are going to cripple you. I would keep it as a regular class, and just move a few key methods to pump them around (i.e. have a MarshalByRef manager/repository/etc. class).
That should reduce round trips; if you still have problems then it will probably be bandwidth related, which is easier to fix, for example by changing the serializer. protobuf-net allows you to do this easily by simply implementing ISerializable and forwarding the two methods (one from the interface, plus the ctor) to ProtoBuf.Serializer; it then does all the work for you, and works with remoting. I can provide examples of this if you like.
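Roughly along these lines, from memory (the DTO name is hypothetical; check the protobuf-net docs for the exact SerializationInfo overloads):

// requires the protobuf-net library and System.Runtime.Serialization
[Serializable, ProtoContract]
public class CustomerDto : ISerializable
{
    [ProtoMember(1)] public string FirstName { get; set; }
    [ProtoMember(2)] public string LastName { get; set; }

    public CustomerDto() { }

    // Deserialization ctor: forward to protobuf-net.
    protected CustomerDto(SerializationInfo info, StreamingContext context)
    {
        ProtoBuf.Serializer.Merge(info, this);
    }

    // ISerializable: forward to protobuf-net.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        ProtoBuf.Serializer.Serialize(info, this);
    }
}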
Actually, protobuf-net may help with CPU usage too, as it is a much more CPU-efficient serializer.
Could you make MyComponent a class that will cache the values and only submit them when SaveCustomer() is called?
You can try compressing the traffic. Even if it isn't a 100-times improvement, you'll still gain some performance benefit.
If you need the latest data (always seeing the real value), and the cost of getting the data each time dominates the runtime, then you need to be radical.
How about changing polling to push? Rather than calling the remote side each time you need a value, have the remote side push all changes and cache the latest values locally.
Local lookups (after the initial get) are then always up to date, with all the remoting overhead happening in the background (on another thread). Just be careful about thread safety for non-atomic types.
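A minimal sketch of the local cache side of that idea (names hypothetical; the background thread receiving pushes calls ApplyUpdate, the front-end calls Get):

// requires System.Collections.Generic
public class ComponentCache
{
    private readonly object _sync = new object();
    private readonly Dictionary<string, object> _values = new Dictionary<string, object>();

    // Called on the background thread whenever the remote side pushes a change.
    public void ApplyUpdate(string propertyName, object newValue)
    {
        lock (_sync) { _values[propertyName] = newValue; }
    }

    // Called by the front-end instead of a remote property read.
    public object Get(string propertyName)
    {
        lock (_sync)
        {
            object value;
            return _values.TryGetValue(propertyName, out value) ? value : null;
        }
    }
}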