Why is passing a dataset to a web service method not good? - c#

Please explain in detail. I have been telling my mentor that everything I've researched on WCF, and many programmers all over the net, says it is NOT good to pass DataSets to the service. Why is that? I created a BUNCH of classes in the service and they work great with the application, but he says that I just wasted my time doing all that work and that he has a better way of doing it.
He keeps telling me to create a SINGLE OperationContract. There will be many functions in the service, but the OperationContract will take the string name of the function and the dataset providing the details for that function.
Is his way bad practice? Not safe? I'm just trying to understand why many people say don't use datasets.

The first reason is interoperability. If you expect consumers of your service to be implemented in any technology other than .NET, they may have lots of trouble extracting or generating the data in the DataSet, as they will have no equivalent data structure on their end.
Performance can be affected quite a bit, as well. In particular, the serialization format for untyped datasets can be huge because it will contain not just the data, but also the XSD schema for the data set, which can be quite large depending on the complexity of the DataSet. This can make your messages a lot larger, which will use more network bandwidth, take longer to transfer (particularly over high latency links), and will take more resources at the endpoint to parse.

Say the web service you have does something specific, for example it sends a bunch of emails. Say this service has one method that sends an email. The method should accept an email address, a subject and a body.
Now if we send a DataSet with the required information, the service would have to know the shape of the data and parse it.
Alternatively, if the web service accepted an object with properties for email address, subject and body, it could be used in more than one place and would be less prone to going wrong due to a malformed DataSet.
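To make that concrete, here's a rough sketch of what a typed contract for the email example might look like (the names IEmailService and EmailRequest are made up for illustration):

using System.Runtime.Serialization;
using System.ServiceModel;

// Illustrative data contract for the email example; the shape of the data
// is explicit, so any client (including non-.NET ones) knows what to send.
[DataContract]
public class EmailRequest
{
    [DataMember] public string EmailAddress { get; set; }
    [DataMember] public string Subject { get; set; }
    [DataMember] public string Body { get; set; }
}

[ServiceContract]
public interface IEmailService
{
    [OperationContract]
    void SendEmail(EmailRequest request);
}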

One more thing: you can get incorrect data using DataSet.
For example a value in the DataSet might look like the following before serialization:
<date_time>12:19:38</date_time>
In the client it would come with an offset specified:
<date_time>12:19:38.0000000-04:00</date_time>
The client code would adjust this to its local time (much like Outlook when you schedule an appointment with someone in a different timezone).
More details can be found here.
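If you are stuck with DataSets, one mitigation (a sketch, assuming you control how the DataSet is built) is to set the column's DateTimeMode so that no offset is written during serialization:

using System;
using System.Data;

static class DateTimeModeExample
{
    // Sketch: build a table whose DateTime column serializes without a UTC offset,
    // so the client does not shift the value into its local time zone.
    public static DataTable BuildLogTable()
    {
        var table = new DataTable("Log");
        var column = new DataColumn("date_time", typeof(DateTime));
        column.DateTimeMode = DataSetDateTime.Unspecified;
        table.Columns.Add(column);
        table.Rows.Add(DateTime.Parse("12:19:38"));
        return table;
    }
}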

Using WCF is not just an implementation decision - it is a design choice. When you choose to use WCF you have to leave many of your treasured OO principles behind and embrace a new set of patterns and principles associated with service orientation.
One such principle is that of explicit contracts: A service should have well defined public contracts (see this Wikipedia article). This is crucial for interoperability, but is also important so clients have an accurate picture of what functionality your service provides.
A DataSet is basically just a big bag of "stuff" - there is no limitation to what it could contain - or any well defined contract that explains how I can get data out. By using a DataSet you introduce inherent coupling between the client and the server - the client has to have "inside information" about how the DataSet was created in order to get the data out. By introducing this level of coupling between the client and service you have just negated one of the main motivations for using WCF (precisely that of decoupling the two areas of functionality to allow for independent deployment and/or development lifecycle).
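To make the contrast concrete, here is a rough sketch of the two styles being discussed (all names are invented for illustration):

using System.Data;
using System.Runtime.Serialization;
using System.ServiceModel;

// The single generic entry point: the contract says nothing about which
// operations exist or what shape the DataSet must have, so the client
// needs "inside information" to use it.
[ServiceContract]
public interface IGenericService
{
    [OperationContract]
    DataSet Execute(string functionName, DataSet details);
}

// The explicit style: each capability and the shape of its data are
// visible in the contract itself.
[DataContract]
public class CustomerDto
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

[ServiceContract]
public interface ICustomerService
{
    [OperationContract]
    CustomerDto GetCustomer(int customerId);
}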

Related

Data layer as WCF vs an in app data layer

I am working on an app which uses WCF as data layer.
I understand there are certain benefits, such as security. What would be the other benefits or handicaps of such an approach?
Wouldn't serializing and de-serializing cost performance?
How about maintenance, testing and maintainability?
What would be the other drawbacks of such an approach?
So you have a data layer and it is accessed using WCF. First, the upside to this: you can move your data layer wherever you need it and your applications should not care (as long as the DNS resolves correctly). If it is hosted inside IIS then you gain some security by using SSL as the secured layer in front of your service. And if your services are well written you can easily throw them into a load-balanced process.
On the downside you need to be concerned about how you expose that service. If it communicates the data back in XML you will suffer a much larger serialization penalty than if you used JSON as your means of serializing data.
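If you do go the JSON route on .NET 3.5 or later, a minimal sketch using the web programming model might look like this (the contract and names are illustrative, and it assumes the service is exposed over webHttpBinding with the web HTTP behavior enabled):

using System.Runtime.Serialization;
using System.ServiceModel;
using System.ServiceModel.Web;

[DataContract]
public class Customer
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

[ServiceContract]
public interface ICustomerData
{
    // Returns JSON instead of SOAP/XML, which is usually smaller on the wire.
    [OperationContract]
    [WebGet(UriTemplate = "customers/{id}", ResponseFormat = WebMessageFormat.Json)]
    Customer GetCustomer(string id);
}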
Somewhere in the middle (neither good nor bad), you would be forcing yourself to be careful (I would hope) about how you format your requests. For example, passing only a key for a delete instead of the entire record to delete. (Believe me, I've seen systems written like this!!)
You should also carefully design your services so that your svc file contains something like this:
// A thin service operation that simply delegates to the data layer.
public Customer GetCustomer(int customerID)
{
    return DataLayer.GetCustomer(customerID);
}
This way you can easily use your data layer directly if some other application is already sitting on your WCF server. A good example: you may have your data layer isolated inside your internal network, sheltered by the DMZ. Your intranet may need to access the same data layer, so you can put your intranet applications on that server and use the data layer directly. Or they can be on a different server but use the data layer libraries directly.
One final note, which we encountered a need for in one situation: if you implement something out in the DMZ that needs to access a server directly instead of being routed through the firewalls, you can easily create a proxy of your data services. The proxy just takes your service interface and implements its calls through the firewall to your service behind the DMZ. It took us maybe one day to implement this.
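A rough sketch of that kind of pass-through proxy (the contract and endpoint name below are invented; in practice the proxy implements the same interface the internal service already exposes):

using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class Customer
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

[ServiceContract]
public interface IDataService
{
    [OperationContract]
    Customer GetCustomer(int customerID);
}

// Hosted in the DMZ: implements the same contract and simply forwards
// each call through the firewall to the internal endpoint.
public class DataServiceProxy : IDataService
{
    // "InternalDataService" would be a client endpoint name in config.
    private readonly ChannelFactory<IDataService> factory =
        new ChannelFactory<IDataService>("InternalDataService");

    public Customer GetCustomer(int customerID)
    {
        IDataService channel = factory.CreateChannel();
        try
        {
            return channel.GetCustomer(customerID);
        }
        finally
        {
            ((IClientChannel)channel).Close();
        }
    }
}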
For testing: well that is no different than anywhere else you have a data layer. You need to do your tests, use repeatable data in your test setup, and proper cleanup after your tests complete. It also does not change for maintainability, etc. However you need to have a clear approach for versioning of your services to encompass interface changes. But, again, that is the same no matter where your data services lie.
Hope this helps some.

WCF single point-of-contact

WCF beginner's question: I've been told that changing the WCF contract is costly and requires constant maintenance (recreating the proxy on the client side), and therefore the preferred method is having one very generic point-of-contact method (which decides how to act, say, according to a given enum parameter).
This sounds quite smelly to me, but I haven't been able to find any information about this issue (bad choice of search keywords? probably).
Any advice, or maybe a useful link?
Thanks!
You don't need to generate the proxy again, you can simply ensure the client is built with the correct interface version. If you're very careful and only add methods, not remove or modify, that works just fine too. That's a lot of responsibility to manage, of course.
To use an interface rather than generate a client proxy, check my question from a while ago:
WCF Service Reference generates its own contract interface, won't reuse mine
You are confusing some terms here and I think you might be referring to a known flaw which has been fixed in .Net 3.5 SP1.
Recreating the WCF proxy used to be an expensive operation at runtime. This has been improved in .NET 3.5, which caches the proxy objects transparently (see the MSDN blog post on this).
If you are referring to the "code maintenance" of the proxy, then all you are really maintaining is an interface implemented at the client. If you need to maintain the interface then this comes back to basic SOA: if your services expose as much information as possible, on the assumption that they will be used for purposes you haven't yet considered, then you will likely not need to modify the interface after it is created. You should also consider your upgrade paths.
Juval Lowy has a good discussion about this problem in his book which is a little dense but has some pretty good information in it.
A piece of advice: WCF has a whole lot of features designed to make your code really simple and elegant. If you are worried about maintenance, what you may be driven to do is write an interface:
string ServiceMethod(string xml) //returns XML
Don't do this. Take the time to design a good maintainable interface and a good data/message contract. This will let WCF provide all the extras you get for free when hosting your service for interaction.
Generic (as in non-specific, monolithic) interfaces are hard to understand and program to. The reason not to define a single method as the API is that it's impossible for clients to understand what's going on, and when you change the (implicit) API of this interface, your clients will break in horrible ways that you won't detect at compile time.
It's been a while since I touched WCF, but if your clients are internal (same codebase, versioning and deployment schemes), then regenerating the WCF proxies is very easy, and having a "strong" detailed API will make your life so much easier than a generic one.
It depends on what kind of change you mean. Change to the service contract is indeed costly and should not happen. Service contracts are (or should be) at a sufficiently high level of granularity that change is very rare.
More common are changes to the types which are exposed on the service. These changes are more common and therefore you do need to approach your change in such a way as to avoid breaking existing clients if possible.
There are several ways you can do this, such as exposing your types polymorphically using an interface, but the simplest way is to ensure that changes to your types only add new data member fields and that the new fields are non-mandatory. If you can limit your changes to these then this has the lowest impact on existing clients and enables new clients to use the new fields.
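For example, a sketch of that kind of non-breaking change (the type and members are invented; the point is that the new member is optional, so existing clients keep working):

using System.Runtime.Serialization;

[DataContract]
public class OrderDto
{
    // Original members, left untouched so existing clients are unaffected.
    [DataMember]
    public int OrderId { get; set; }

    [DataMember]
    public decimal Total { get; set; }

    // Added in a later version: non-mandatory, so messages from older
    // clients that omit it still deserialize successfully.
    [DataMember(IsRequired = false)]
    public string CouponCode { get; set; }
}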
Hope this helps.
It is true that modifying the service contract (interface) would also require the client to recreate the proxy class at their end using the newly published WSDL, and may even require the client to change their code to match the new proxy. I don't think you can create a generic interface that can handle all changes to the contract further down the road. A contract has to be written very carefully so that it doesn't change often, and if there is a need to change the contract then it is better to deploy the service as a different version so that your old clients can still work with the old one.

WCF and size of DTOs

We've got a business logic/data access layer that we're exposing on a couple of different endpoints via a WCF service. We've created DTOs for use as the data contract of the service. We'll be using the service via the different endpoints for multiple different applications. In some of the applications, we only need a few fields from the DTO while in others we may need almost all of them. For those in which we only need a few, we really don't want to be sending the entire object "over the wire" every time - we'd like to pare it down to what we actually need for a given application.
I've gone back and forth between creating specific sets of DTOs for use with each application (overkill?) and using something like EmitDefaultValue=false on the members that are only needed in certain apps. I've also considered using the XmlSerializer rather than DataContractSerializer in order to have greater control over the serialization within the service.
My question is - first off, should we worry that much about the size of data we're passing? Second, assuming the answer is 'yes' or that we decide to care about it even if it is 'no', what approach is recommended here and why?
EDIT
Thanks for the responses so far. I was concerned we might be getting into premature optimizations. I'd like to leave the question open for now, however, in hopes that I can get some answers to the rest of it, both for my own edification and in case anybody else has this question and has a valid reason to need to optimize.
first off, should we worry that much about the size of data we're passing?
You didn't give the number/sizes of the fields but in general: No. You've already got the envelope(s) and the overhead of setting up the channel, a few more bytes won't matter much.
So unless we're talking about hundreds of doubles or something similar, I would first wait and if there's a real problem: experiment and measure.
Should you worry? Maybe. Performance/stress test your services and find out.
If you decide you do care...a couple options:
1. Create a different service (or maybe different operations in the same service) that returns partially hydrated DataContracts. These new services and/or operations return the same DataContracts, but only partially hydrated.
2. Create "lite" versions of your DataContracts and return those instead. Basically the same as option 1, but with this approach you don't have to worry about consumers misusing the full DataContract (potentially getting null reference exceptions and such).
I prefer option 2, but if you have control over your consumers, option 1 might work for you.
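As a sketch of option 2 (names invented), the "lite" contract simply carries the subset of members the lighter-weight applications need, while EmitDefaultValue=false, mentioned in the question, keeps unset members out of the serialized message of the full contract:

using System.Runtime.Serialization;

// Full DTO for applications that need everything.
[DataContract]
public class ProductDto
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
    [DataMember] public string Description { get; set; }
    [DataMember] public byte[] Image { get; set; }

    // When left null, this member is omitted from the serialized message.
    [DataMember(EmitDefaultValue = false)]
    public decimal? WholesalePrice { get; set; }
}

// "Lite" DTO returned by operations aimed at clients that only list products.
[DataContract]
public class ProductSummaryDto
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}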
It seems you may be entering the "premature optimization" zone. I'd avoid using application-specific DataContracts for an entity because of the maintenance problems it will cause in the long run. However, if your application has a valid need to hide information from some client applications and not others, then it's good to have multiple DataContracts for a given entity. @Henk is right: unless you're dealing with massive, deeply nested entities (in which case you have a different problem), do not "optimize" your design simply to reduce network transmission packets.

WCF: Individual methods or a generic ProcessMessage method accepting xml

My company is developing an application that receives data from another company via TCP sockets and xml messages. This is delivered to a single gateway application which then broadcasts it to multiple copies of the same internal application on various machines in our organisation.
WCF was chosen as the technology to handle the internal communications (internally bi-directional). The developers considered two methods.
1. Individual methods exposed by the WCF service for each different message received by the gateway application. The gateway application would parse the incoming external message and call the appropriate WCF service method. The incoming XML would be translated into DataContract DTOs and supplied as an argument to the appropriate WCF method.
2. The internal application exposed a WCF service with one method, "ProcessMessage", which accepted an XML string message as an argument. The internal app would parse then deserialize the received XML and process it accordingly.
The lead developer thought option two was the better option as it was "easier" to serialize/deserialize the XML. I thought the argument didn't make sense because DataContracts are serialized and deserialized by WCF and by using WCF we had better typing of our data. In option 2 someone could call the WCF service and pass in any string. I believe option 1 presents a neater interface and makes the application more maintainable and usable.
Both options would still require parsing and validation of the original xml string at some point, so it may also be a question where is the recommended place to perform this validation.
I was wondering what the current thoughts are for passing this kind of information and what people’s opinions on both alternatives are.
Option 1 is suited if you can ensure that the client always sends serialized representations of data contracts to the server.
However if you need some flexibility in the serialization/deserialization logic and not get tightly coupled with DataContracts, then option 2 looks good. Particularly useful when you want to support alternate forms of xml (say Atom representations, raw xml in custom format etc)
Also in option 2 inside the ProcessMessage() method, you have the option of deciding whether or not to deserialize the incoming xml payload (based on request headers or something that is specific to your application).
In option 1, the WCF runtime will always deserialize the payload.
I recently asked a couple of questions around this area: XML vs Objects and XML vs Objects #2. You'll find the answers to those questions interesting.
For our particular problem we've decided on a hybrid approach, with the data contract looking something like this:
// Just using fields for simplicity and no attributes shown.
class WCFDataContract
{
    // Header details
    public int id;
    public int version;
    public DateTime writeDateTime;
    public string xmlBlob;
    // Footer details
    public int anotherBitOfInformation;
    public string andSomeMoreInfo;
    public bool andABooleanJustInCase;
}
The reason we use an xmlBlob is because we own the header and footer schema but not the blob in the middle. Also, we don't really have to process that blob, rather we just pass it to another library (created by another department). The other library returns us more strongly typed data.
Good luck - I know from experience that your option 2 can be quite seductive and can sometimes be hard to argue against without being accused of being overly pure and not pragmatic enough ;)
I hope I understood this right. I think it might make sense to have your gateway app handle all the deserialization and have your internal app expose WCF services that take actual DataContract objects.
This way, your deserialization of the TCP-based XML is more centralized at the gateway, and your internal apps don't need to worry about it, they just need to expose whatever WCF services make sense, and can deal with actual objects.
If you force the internal apps to do the deserialization, you might end up with more maintenance if the format changes or whatever.
So I think I would say option 1 (unless I misunderstood).

Architectural question: In what assembly should I put which class, for a clean solution?

PREAMBLE:
This is by far the longest post I've left here...but I think it's required in this case.
I've had questions about these kinds of things for a long time: how to name assemblies, and how to divide up classes within them.
I'd like to give an example of an application here, with only a bare minimum of classes to demonstrate what I'm trying to understand.
Imagine an application that
Accepts client messages, stores them in a db, and then later dequeues them to an MTA server.
It's a Web application that has both an ASP.NET interface to write a message + attach attachments.
There's also a Silverlight client, so the webapp exposes a ClientServices WCF ServiceContract, with one OperationContract (SaveMessage).
There's also a Windows client...it does the same thing, using the same contract as the Silverlight client.
OK, that should be enough of a fake scenario to demonstrate my cluelessness.
The above will need the following classes:
Message
MessageAddress
MessageAddressType (an enum with From, To)
MessageAddressCollection
MessageAttachment
MessageAttachmentType
MessageAttachmentCollection
MessageException
MessageAddressFormatException
MessageExtensions (static extension for Message)
MessageAddressExtensions (static extension for MessageAddress)
MessageAttachmentExtensions (static extension for MessageAttachment)
Project.Contract.dll
My first stab at organizing the above into the right assemblies would be observing that Message, MessageAddress, MessageAttachment, the enums needed for their properties (MessageAddressType, MessageAttachmentType) and the collections needed for them (MessageAddressCollection, MessageAttachmentCollection) are all to be marked as [DataContract] so that they can be serialized between the WCF client and the server.
Being common to both, I think I would move them into a neutral shared assembly called Contract.
Project.Client.dll
I'll need a Client proxy of the server [ServiceContract], that refs the classes in the Contract.dll.
So now the server, which also refs Project.Contract.dll could now save serialized Messages received from a WCF Client, and save them into a db.
Plugins
Next I would realize that I would like to have these objects be processed server side by 3rd party plugins (eg; a virus checker)...
But plugins should have read-only access to the variables in order to check them, and throw errors if they see something they don't like.
So I would think about going back to have Message inherit from IMessageReadOnly ...but where to put that interface?
Project.Interfaces.dll
If I put it in an assembly called Project.Interfaces.dll, this would work for the plugins who could reference that without having a reference to Contracts.dll...but now the client has to reference both Contracts assembly AND Interfaces...doesn't sound like a good direction...
Duplicate Objects
Alternatively, I could have two Message structures (and duplicate the other MessageAttachment, etc. classes as well)...one for communicating from client to server (in the Contracts.dll), and then use a second ServerMessage/ServerMessageAddress/ServerMessageAddressCollection on the server side, which inherits from IMessageReadOnly, and then it would appear that I am closer to what I want.
With duplicate objects, plugins are limited in access, while Server BL, etc. has full access for types relevant to its work, all while the client has different but identical objects...
In fact...I should probably start considering them as non-identical, making it clearer in my head that those objects are just there to talk to clients (i.e. Contract/Comm objects)...
The Website UI
which brings up ...hum...if there are two different Messages, and they now have different properties...which one is the most appropriate to use to back the ASP.NET forms? The ServerMessage object seems fastest (no mapping going on between types)...but all the logic has already been worked out against client message objects (with different properties and internal logic). So would I use a ClientMessage, and map it to a ServerMessage, to keep the various UI logics the same across different mediums? Or should I prefer mapping, and just rewrite the UI validation?
What about the third case, Silverlight...the Contracts assembly was a Full Framework assembly...which Silverlight can't reference (different framework/build mechanism)...so the assembly that I have on the Silverlight side might be exactly the same code, but it has to be a different assembly. How does that work out?
What exactly to Consider as DataContract?
Finally...and this is, I swear, near the end of my huge question...what about the pesky extra classes that are not clearly DataContract?
For example, the MessageAddress was a DataContract. OK. And the enums it exposes are part of it...makes sense... But if the MessageAddress constructor raises a MessageAddressFormatException...is it considered part of the DataContract?
Can there be Classes common to both Server, Client, AND Plugins?
Or is it an exception that is common to BOTH ServerMessageAddress and ClientMessageAddress, so should not be duplicated, and instead be in a Common assembly...so that in the end, the client has to bind to Contracts AND Common? (Didn't we just go down this alley with the Interfaces assembly?)
What about common Base classes/Interfaces?
And should these exceptions have common base classes? For example...ClientMessageAddressException, ServerMessageAddressException, ServerMessageVirusException (from the plugin)...should I struggle to get them all to -- as best as possible -- derive from an abstract MessageException...or is there a point where inheritance/reuse is just no longer an appropriate goal to strive for?
HUGE THANKS FOR READING THIS FAR.
I'm a developer and on the tech side I can bumble along ok...but these kinds of questions, where I've had to lay out the assemblies, the architecture, myself, leave me hugely perplexed...and lose me SOOOO much time, as I drive myself batty, moving things around from one assembly to another to see which one is the best fit, all while not really certain of what I am doing, and trying to not get circular references...
So -- really -- thanks for listening, and I hope this gets read by people who can describe how to lay out the above cleanly, hopefully expressing how to think my way through it for future projects as well.
After spending 10 minutes editing the question for formatting, I'm still going to downvote it. There's no way I'm going to read all that.
Go pick up a copy of
Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries (2nd Edition)
As an architect, I've learned that it doesn't pay to get too wrapped up in getting things absolutely perfect the first time, and perfect is subjective. Refactoring, especially moving classes between assemblies, doesn't have too huge a cost. It sounds to me like you're already thinking things through logically and correctly. Here's my opinions on a few of your questions:
Q: Should I have read-only contracts for my data contract classes?
The plugins most likely shouldn't be aware of your data contracts at all. A virus checker may take a byte array, a spell checker a string and locale, etc. If you're making a general interface layer for the plugins, you should limit what's shared to just the data specific to each plugin. This will allow you to maximize their reuse. Thus, I think you'll get little payoff from creating interfaces to your data contract structures, which should mostly be dumb bags of data with little logic that are practically interfaces themselves.
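For example, a virus-checker plugin contract could be expressed purely in terms of the data it actually inspects (a sketch; the interface name is made up):

// Hypothetical plugin contract: it sees only raw attachment bytes, not the
// Message/MessageAttachment data contracts, so it never needs a reference
// to the Contract assembly and can be reused anywhere.
public interface IAttachmentScanner
{
    // Returns a human-readable reason if the content is rejected,
    // or null if it is clean.
    string Scan(byte[] content, string fileName);
}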
Q: Should I use the same data contract classes as my Silverlight app does in my ASP.NET application or use server-side classes directly?
I would go with the client message objects so you can benefit from code reuse. Object creation is fairly cheap, and I'm sure that most of the mapping would be one-to-one. It's not as fast, true, but that won't be the bottleneck in your application.
Q: Where do I put my exception classes?
I would put your example exception classes in the assembly with the data contract, since they are all raised due to contract violations or as a means to communicate errors while fulfilling the contract.
Q: Should the exceptions have common base classes?
I have yet to need to do this, but I don't know your code base as well as you do. My guess is that it will gain you little if anything.
Edit:
You may be overplanning for the future. In my experience, taking a YAGNI approach has allowed us to get the important things done more quickly. Making incremental design changes is preferred to spending valuable time building an elaborate architecture that you might never even benefit from.
