Why HttpWebRequest.GetRequestStream() tries to connect - c#

Maybe this seems like a weird question, but I came across the following situation:
I am trying to make a POST request to a service, and to add the POST data I chose to get a Stream from the request and use a StreamWriter to write the body to it.
But before I actually execute the request (with GetResponse), and even before I write to the stream object, I get an "Unable to connect" exception exactly on
var stream = request.GetRequestStream();
After a little investigation, I realized that request.GetRequestStream() actually tries to connect. The problem in my case was network connectivity to the server (a firewall issue).
BUT my question here is: why does HttpWebRequest.GetRequestStream() try to connect?
My naive assumption was that, when the request is created, no connection to the server is made yet.
I found some related questions, such as this one,
but they do not seem to answer my question exactly.
Any explanation, please?
PS: Any suggestion on how to avoid this "early" connection would be much appreciated.

.NET I/O APIs generally operate on streams, which are APIs that allow developers to read and write an ordered sequence of data. By making reading and writing into generic APIs, it enables generic libraries to operate on streams to do powerful things: compression, encryption, encoding, etc. (BTW, treating different kinds of I/O similarly has a long history, most famously in UNIX where everything is a file.)
Although reading and writing data works pretty similarly across many different kinds of streams, opening a stream is much harder to make generic. Think about the vastly different APIs you use to open a file vs. make an HTTP request vs. execute a database query.
Therefore, .NET's Stream class has no generic Open() method because getting a stream into an opened state is very different between different types of streams. Instead, the streams APIs expect to be given a stream that's already open, where "open" means that it's ready to be written to and/or read from.
Therefore, in .NET there's a typical pattern for I/O:
Write some resource-specific code to open a stream. These APIs generally return an open stream.
Hand off that open stream to generic APIs that read and/or write from it.
Close the stream (also generic) when you're done.
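For example, the file and compression APIs compose exactly this way: a file-specific call returns an open stream, generic code (GZipStream, StreamWriter) operates on it, and disposing closes it. A minimal sketch (the temp-file name is an arbitrary choice):

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class StreamPatternDemo
{
    // Step 1: resource-specific openers (File.Create / File.OpenRead) hand back
    // already-open streams; step 2: generic code writes/reads; step 3: Dispose closes.
    public static string RoundTrip(string text)
    {
        string path = Path.Combine(Path.GetTempPath(), "pattern-demo.gz");

        using (Stream file = File.Create(path))                              // open (file-specific)
        using (Stream gzip = new GZipStream(file, CompressionMode.Compress)) // generic layer
        using (var writer = new StreamWriter(gzip))
        {
            writer.Write(text);
        } // Dispose closes all three streams

        using (Stream file = File.OpenRead(path))
        using (Stream gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Note that GZipStream never needs to know it's sitting on a file; it only sees the generic Stream contract, which is exactly why the stream must arrive already open.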
Now think about how that pattern above aligns to an HTTP request, which has the following steps:
a. Lookup the server's IP address in DNS
b. Make a TCP connection to the server
c. Send the URL and request headers to the server
d. If it's a POST (or PUT or other method that sends a request body) then upload the request body. If it's a GET, this is a no-op.
e. Now read the response
f. Finally, close the connection.
(I'm ignoring a lot of real-world complexity in the steps above like SSL, keep-alive connections, cached responses, etc. but the basic workflow is accurate enough to answer your question.)
OK now put yourself in the shoes of the .NET team trying to build an HTTP client API, remembering to split the non-generic parts ("get an open stream") from the generic parts: read and/or write, and then close the stream.
If your API only had to handle GET requests, then you'd probably make the connection while executing the same API that returns the response stream. This is exactly what HttpWebRequest.GetResponse does.
But if you're sending POST requests (or PUT or other similar methods), then you have to upload data to the server. Unlike HTTP headers which are only a few KB, the data you upload in a POST could be huge. If you're uploading a 10GB file, you don't want to park it in RAM during the hours it might take to upload to the server. This would kill your client's performance in the meantime. Instead, you need a way to get a Stream so you only have to load small chunks of data into RAM before sending to the server. And remember that Stream has no Open() method, so your API must provide an open stream.
Now you have an answer to your first question: HttpWebRequest.GetRequestStream must make the network connection because if it didn't then the stream would be closed and you couldn't write to it.
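To see this in action, here is a self-contained sketch that POSTs to a local HttpListener (the port 18080 and the body are arbitrary choices); on .NET Framework, the TCP connection is established at the GetRequestStream call, which is why an unreachable server fails there rather than at GetResponse:

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;

class PostDemo
{
    public static string Run()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:18080/");
        listener.Start();

        // Handle one request in the background and return its body.
        Task<string> serverTask = Task.Run(() =>
        {
            HttpListenerContext ctx = listener.GetContext();
            string body = new StreamReader(ctx.Request.InputStream).ReadToEnd();
            ctx.Response.Close();
            return body;
        });

        var request = (HttpWebRequest)WebRequest.Create("http://localhost:18080/");
        request.Method = "POST";

        // On .NET Framework this is where the TCP connection is made;
        // it would throw here if the server were unreachable.
        using (Stream stream = request.GetRequestStream())
        using (var writer = new StreamWriter(stream))
        {
            writer.Write("name=value");
        }

        // GetResponse finishes sending the body and reads the response.
        using (var response = (HttpWebResponse)request.GetResponse()) { }

        listener.Stop();
        return serverTask.Result;
    }
}
```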
Now on to your second question: how can you delay the connection? I assume you mean that the connection should happen upon the first write to the request stream. One way to do this would be to write a class that inherits from Stream that only calls GetRequestStream as late as possible, and then delegates all methods to the underlying request stream. Something like this as a starting point:
using System;
using System.IO;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

class DelayConnectRequestStream : Stream
{
    private readonly HttpWebRequest _req;
    private Stream _stream = null;

    public DelayConnectRequestStream(HttpWebRequest req)
    {
        _req = req;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        if (_stream == null)
        {
            _stream = _req.GetRequestStream();
        }
        _stream.Write(buffer, offset, count);
    }

    public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        if (_stream == null)
        {
            // TODO: figure out if/how to make this call async too
            _stream = _req.GetRequestStream();
        }
        return _stream.WriteAsync(buffer, offset, count, cancellationToken);
    }

    // Members that don't need an open stream can answer without connecting,
    // e.g. a request stream is always write-only:
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override void Flush() { _stream?.Flush(); }
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    // You may need to decide by trial and error which other properties and
    // methods must require an open stream. Also, the code above is not thread
    // safe. For thread safety, you could use Lazy<T> or roll your own locking.
}
But honestly, the approach above seems like overkill. If I were in your shoes, I'd look at why I'm trying to defer opening the stream and see whether there's another way to solve the problem.

Related

Use gRPC to share very large file

I want to use gRPC to share very large file (more than 6GB) between endpoints and a server.
The project I'm currently working on requires a central server where endpoints can upload and download files. One of the constraints is that endpoints don't know each other, but they can receive and send messages to each other via a common bus.
To implement this server and its communication with the endpoints, I'm evaluating gRPC.
Do you think it is the best solution for streaming files? What alternatives do I have?
Thanks in advance.
gRPC with client/server streaming is capable of handling upload/download of files.
However, there's a discussion here on the performance of gRPC vs. HTTP for file upload/download, which argues HTTP will generally be faster, because it just reads/writes the incoming bytes, while gRPC performs additional serialization/deserialization for each message in the stream, adding significant overhead.
There is another blog post benchmarking the same: https://ops.tips/blog/sending-files-via-grpc/ .
If you are looking to implement something that has to handle scale, you can do some more research.
If you really want to do this over gRPC, then the key thing is to make the response "server streaming", so that instead of returning 6 GiB in one chunk, it returns multiple chunks of whatever size you need, for example 128 KiB at a time (or whatever); you can do this with something like:
syntax = "proto3";
message FileRequest {
string id = 1; // or whatever
}
message FileResponse {
bytes chunk = 1; // some segment of the file
}
service SearchService {
rpc GetFile(FileRequest) returns (stream FileResponse);
}
but nothing is automatic: it is now your job to write the multiple segments back.
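To illustrate what "writing the segments back yourself" means, here is a sketch of a C# server handler for the proto above. It assumes the Grpc.Core/Grpc.AspNetCore generated types (SearchService.SearchServiceBase, FileRequest, FileResponse); the chunk size and the id-to-path lookup are my own assumptions:

```csharp
using System.IO;
using System.Threading.Tasks;
using Google.Protobuf;
using Grpc.Core;

public class FileService : SearchService.SearchServiceBase
{
    public override async Task GetFile(FileRequest request,
        IServerStreamWriter<FileResponse> responseStream,
        ServerCallContext context)
    {
        // Hypothetical lookup from the request id to a path on disk.
        string path = Path.Combine("files", request.Id);

        var buffer = new byte[128 * 1024]; // 128 KiB per message, as suggested above
        using (var file = File.OpenRead(path))
        {
            int read;
            while ((read = await file.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // Each WriteAsync sends one FileResponse message on the stream.
                await responseStream.WriteAsync(new FileResponse
                {
                    Chunk = ByteString.CopyFrom(buffer, 0, read)
                });
            }
        }
    }
}
```

The client side similarly has to loop over the response stream and reassemble the chunks; nothing stitches the file back together for you.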
I suspect a vanilla http download-style response may be simpler!

Building on top of a custom stream

I have created a stream based on a stateless protocol; think of two web servers sending very limited requests to each other.
As such, neither will know if I suddenly stop one, since no connection will close; there will simply be no requests. There could legitimately be a gap between requests, so I don't want to treat the lack of them as a lost connection.
What I want to do is send a heartbeat to say "I'm alive". I obviously don't want the heartbeat data when I read from the stream, though, hence my question:
How do I create a new stream class that wraps another stream and sends heartbeat data without exposing it to calling code?
Assuming two similar implementations on both sides: send each block of data with a header so you can safely send zero-data heartbeat blocks. That is, translate a Write on the outer stream into several writes on the inner stream, like {Data, 100 bytes, [bytes]}, {Data, 13 bytes, [bytes]}; a heartbeat would look like {Ping, 0 bytes, []}. On the receiving end, immediately respond to a Ping with a similar empty Ping.
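A minimal sketch of that framing idea follows; the one-byte frame type and the 4-byte little-endian length header are my own choices, and a real wrapper would put this logic inside a Stream subclass:

```csharp
using System;
using System.IO;

static class Framing
{
    const byte Data = 0, Ping = 1;

    // Write one framed block to the inner stream: {type, length, payload}.
    public static void WriteFrame(Stream inner, byte type, byte[] payload, int offset, int count)
    {
        inner.WriteByte(type);
        byte[] len = BitConverter.GetBytes(count);
        inner.Write(len, 0, 4);
        if (count > 0) inner.Write(payload, offset, count);
    }

    public static void WriteData(Stream inner, byte[] buffer, int offset, int count)
        => WriteFrame(inner, Data, buffer, offset, count);

    public static void WriteHeartbeat(Stream inner)
        => WriteFrame(inner, Ping, Array.Empty<byte>(), 0, 0); // {Ping, 0 bytes, []}

    // Read the next Data payload, silently skipping Ping frames so the
    // calling code never sees heartbeat traffic.
    public static byte[] ReadData(Stream inner)
    {
        while (true)
        {
            int type = inner.ReadByte();
            if (type < 0) return null; // end of stream
            byte[] lenBytes = new byte[4];
            inner.Read(lenBytes, 0, 4);
            int count = BitConverter.ToInt32(lenBytes, 0);
            byte[] payload = new byte[count];
            int read = 0;
            while (read < count) read += inner.Read(payload, read, count - read);
            if (type == Data) return payload;
            // (a real implementation would answer a Ping with its own Ping here)
        }
    }
}
```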

How to write HTTP request content to file Asynchronously?

I am passing some data in the HTTP body using an Ajax request. On the server, the content is accessible via Request.InputStream. Now I want to write it to a file on the server's disk asynchronously. How do I do that?
If the data sent over HTTP is large, I don't want the server application to die or be badly affected, which is why I would like to write it asynchronously.
The following sample shows roughly what I am trying to do:
System.IO.Stream str;
// Create a Stream object.
str = Request.InputStream;
// TO DO Write it to file asyncronously ...
str.Write(someFile, 0, strLen);
Stream.BeginWrite & Stream.EndWrite are the things to look at. Try this MSDN link.
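On newer frameworks, Stream.CopyToAsync wraps that read/write loop in one call; a sketch of saving a request-like stream to a file (the 64 KB buffer size is an arbitrary choice):

```csharp
using System.IO;
using System.Threading.Tasks;

static class AsyncSave
{
    // Copy any input stream (e.g. Request.InputStream) to a file without
    // blocking the calling thread while the disk writes happen.
    public static async Task SaveToFileAsync(Stream input, string path)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write,
                                         FileShare.None, bufferSize: 64 * 1024,
                                         useAsync: true)) // enables overlapped I/O
        {
            await input.CopyToAsync(file);
        }
    }
}
```

In an ASP.NET handler this would be called as something like `await AsyncSave.SaveToFileAsync(Request.InputStream, somePath);` where somePath is whatever destination you choose.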
One approach, assuming you're happy not to have full control over reading the stream (i.e. you trust that something else will read it, or are happy with it not happening, or not happening in full), is to use a 'virtual stream'.
If you create a class that 'wraps' the stream you wish to use and passes all calls to that underlying stream, you can hook your own logic into the read method, so that when your stream's read method is called you read from the underlying stream, write to your own stream, and then pass the buffer back.
This is not 100% asynchronous, as the reader of the stream waits until you write the buffer to your file before receiving it, but it doesn't force the reader to wait until you've read the entire stream, and it also works well with non-seekable streams.
I wrote a BizTalk-related whitepaper on this with full source code; if you're happy to ignore the BizTalk bits, the stream implementation is discussed there in detail.
What would you like to do while the file is being written? Something tells me you're mistaken: all proper HTTP servers are multithreaded, so no matter how long a request runs, other requests can be serviced in parallel.

How to flush HttpListener response stream?

HttpListener gives you a response stream, but calling Flush means nothing (and from the sources it's clear why: it actually does nothing). Digging inside the HTTP API shows that this is a limitation of HttpListener itself.
Does anyone know exactly how to flush the response stream of HttpListener (maybe with reflection or additional P/Invokes)?
Update: You can't stream anything over HTTP if you don't have a flush option or the ability to define the buffer size.
Flush only works in most of the System.Net namespace when Transfer-Encoding is set to Chunked; otherwise the whole response is buffered and returned at once, and Flush really does nothing. At least this is what I have experienced while working with HttpWebResponse.
I've not tried this yet, but how about writing a separate TCP server for streaming responses? Then forward the request from the HttpListener to the "internal" tcp server. Using this redirect you might be able to stream data back as you need.
As for flushing it, the only way I see to do it is to simulate a dispose without actually disposing. If you can hack into the HttpResponseStream object, tell it to dispose, unset the m_Closed flag, etc., you might be able to flush the streaming data.
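Before resorting to reflection, note that switching the response to chunked transfer encoding makes HttpListener push each Write to the wire instead of buffering until Close. A self-contained sketch (port 18081 and the body text are arbitrary choices):

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading.Tasks;

class ChunkedDemo
{
    public static string Run()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:18081/");
        listener.Start();

        Task server = Task.Run(() =>
        {
            HttpListenerContext ctx = listener.GetContext();
            HttpListenerResponse response = ctx.Response;
            response.SendChunked = true; // no Content-Length; each Write goes out as a chunk

            byte[] part1 = Encoding.UTF8.GetBytes("first ");
            response.OutputStream.Write(part1, 0, part1.Length); // can reach the client immediately
            byte[] part2 = Encoding.UTF8.GetBytes("second");
            response.OutputStream.Write(part2, 0, part2.Length);
            response.Close(); // sends the terminating zero-length chunk
        });

        var req = (HttpWebRequest)WebRequest.Create("http://localhost:18081/");
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
        {
            string body = reader.ReadToEnd();
            listener.Stop();
            return body;
        }
    }
}
```

With SendChunked set, Flush remains a no-op, but it also stops mattering: the data is already on its way after each Write.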

NetworkStream.Write returns immediately - how can I tell when it has finished sending data?

Despite the documentation, NetworkStream.Write does not appear to wait until the data has been sent. Instead, it waits until the data has been copied to a buffer and then returns. That buffer is transmitted in the background.
This is the code I have at the moment. Whether I use ns.Write or ns.BeginWrite doesn't matter - both return immediately. The EndWrite also returns immediately (which makes sense since it is writing to the send buffer, not writing to the network).
bool done;
void SendData(TcpClient tcp, byte[] data)
{
NetworkStream ns = tcp.GetStream();
done = false;
ns.BeginWrite(bytWriteBuffer, 0, data.Length, myWriteCallBack, ns);
while (done == false) Thread.Sleep(10);
}
 
public void myWriteCallBack(IAsyncResult ar)
{
NetworkStream ns = (NetworkStream)ar.AsyncState;
ns.EndWrite(ar);
done = true;
}
How can I tell when the data has actually been sent to the client?
I want to wait 10 seconds (for example) for a response from the server after sending my data; otherwise I'll assume something went wrong. If it takes 15 seconds to send my data, then it will always time out, since I can only start counting from when NetworkStream.Write returns, which is before the data has been sent. I want to start counting 10 seconds from when the data has left my network card.
The amount of data and the time to send it could vary: it could take 1 second, 10 seconds, or a minute. The server does send a response when it has received the data (it's an SMTP server), but I don't want to wait forever if my data was malformed and the response will never come, which is why I need to know whether I'm waiting for the data to be sent or for the server to respond.
I might want to show the status to the user - I'd like to show "sending data to server", and "waiting for response from server" - how could I do that?
I'm not a C# programmer, but the way you've asked this question is slightly misleading. The only way to know when your data has been "received", for any useful definition of "received", is to have a specific acknowledgment message in your protocol which indicates the data has been fully processed.
The data does not "leave" your network card, exactly. The best way to think of your program's relationship to the network is:
your program -> lots of confusing stuff -> the peer program
A list of things that might be in the "lots of confusing stuff":
the CLR
the operating system kernel
a virtualized network interface
a switch
a software firewall
a hardware firewall
a router performing network address translation
a router on the peer's end performing network address translation
So, if you are on a virtual machine, which is hosted under a different operating system, that has a software firewall which is controlling the virtual machine's network behavior - when has the data "really" left your network card? Even in the best case scenario, many of these components may drop a packet, which your network card will need to re-transmit. Has it "left" your network card when the first (unsuccessful) attempt has been made? Most networking APIs would say no, it hasn't been "sent" until the other end has sent a TCP acknowledgement.
That said, the documentation for NetworkStream.Write seems to indicate that it will not return until it has at least initiated the 'send' operation:
The Write method blocks until the requested number of bytes is sent or a SocketException is thrown.
Of course, "is sent" is somewhat vague for the reasons I gave above. There's also the possibility that the data will be "really" sent by your program and received by the peer program, but the peer will crash or otherwise not actually process the data. So you should do a Write followed by a Read of a message that will only be emitted by your peer when it has actually processed the message.
TCP is a "reliable" protocol, which means the data will be received at the other end if there are no socket errors. I have seen numerous efforts at second-guessing TCP with a higher level application confirmation, but IMHO this is usually a waste of time and bandwidth.
Typically the problem you describe is handled through normal client/server design, which in its simplest form goes like this...
The client sends a request to the server and does a blocking read on the socket waiting for some kind of response. If there is a problem with the TCP connection then that read will abort. The client should also use a timeout to detect any non-network related issue with the server. If the request fails or times out then the client can retry, report an error, etc.
Once the server has processed the request and sent the response it usually no longer cares what happens - even if the socket goes away during the transaction - because it is up to the client to initiate any further interaction. Personally, I find it very comforting to be the server. :-)
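That client-side pattern (send the request, then block on a read with a timeout) can be sketched over a loopback connection; the port, the ack byte, and the 10-second timeout are arbitrary choices:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class AckDemo
{
    public static bool Run()
    {
        var server = new TcpListener(IPAddress.Loopback, 18082);
        server.Start();

        // Server: read the request, then send a one-byte acknowledgment.
        Task serverTask = Task.Run(() =>
        {
            using (TcpClient peer = server.AcceptTcpClient())
            {
                NetworkStream ns = peer.GetStream();
                ns.ReadByte();       // consume the request
                ns.WriteByte(0x06);  // ACK
            }
        });

        using (var client = new TcpClient("localhost", 18082))
        {
            NetworkStream ns = client.GetStream();
            ns.ReadTimeout = 10000;  // give up after 10 seconds
            ns.WriteByte(0x01);      // the "request"
            // Blocks until the ACK arrives; throws IOException on timeout,
            // which is the "non-network issue with the server" case.
            int ack = ns.ReadByte();
            server.Stop();
            return ack == 0x06;
        }
    }
}
```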
In general, I would recommend sending an acknowledgment from the client anyway. That way you can be 100% sure the data was received, and received correctly.
If I had to guess, the NetworkStream considers the data to have been sent once it hands the buffer off to the Windows Socket. So, I'm not sure there's a way to accomplish what you want via TcpClient.
I can not think of a scenario where NetworkStream.Write wouldn't send the data to the server as soon as possible. Barring massive network congestion or disconnection, it should end up on the other end within a reasonable time. Is it possible that you have a protocol issue? For instance, with HTTP the request headers must end with a blank line, and the server will not send any response until one occurs -- does the protocol in use have a similar end-of-message characteristic?
Here's some cleaner code than your original version, removing the delegate, field, and Thread.Sleep. It performs exactly the same way functionally.
void SendData(TcpClient tcp, byte[] data) {
NetworkStream ns = tcp.GetStream();
// BUG?: should bytWriteBuffer == data?
IAsyncResult r = ns.BeginWrite(bytWriteBuffer, 0, data.Length, null, null);
r.AsyncWaitHandle.WaitOne();
ns.EndWrite(r);
}
Looks like the question was modified while I wrote the above. The .WaitOne() may help with your timeout issue: it can be passed a timeout parameter. This is a lazy wait; the thread will not be scheduled again until the result is finished or the timeout expires.
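Applied to the code above, the timeout becomes a single argument to WaitOne (10 seconds here, matching the question); this fragment assumes an already-connected TcpClient:

```csharp
// Returns false if the write did not complete within 10 seconds.
bool SendData(TcpClient tcp, byte[] data)
{
    NetworkStream ns = tcp.GetStream();
    IAsyncResult r = ns.BeginWrite(data, 0, data.Length, null, null);
    // Lazy wait: the thread sleeps until the write completes or 10 s pass.
    if (!r.AsyncWaitHandle.WaitOne(TimeSpan.FromSeconds(10)))
        return false; // timed out before the buffer was accepted
    ns.EndWrite(r);
    return true;
}
```

Keep in mind the caveat from the answers above: "completed" here still means handed to the socket's send buffer, not acknowledged by the peer.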
I try to understand the intent of the .NET NetworkStream designers, and they must have designed it this way deliberately. After Write, the data to be sent is no longer handled by .NET. Therefore, it is reasonable that Write returns immediately (and the data will be sent out from the NIC soon after).
So in your application design, you should follow this pattern rather than trying to make it work your way. For example, using a longer timeout before any data is received from the NetworkStream can compensate for the time consumed before your command leaves the NIC.
In any case, it is bad practice to hard-code a timeout value in source files. If the timeout value is configurable at runtime, everything should work fine.
How about using the Flush() method?
ns.Flush()
That should ensure the data is written before continuing.
Below .NET is Windows sockets, which use TCP.
TCP uses ACK packets to notify the sender that the data has been transferred successfully.
So the sending machine knows when the data has been transferred, but there is no way (that I am aware of) to get that information in .NET.
edit:
Just an idea, never tried it:
Write() blocks only if the socket's buffer is full. So if we lower that buffer's size (SendBufferSize) to a very low value (8? 1? 0?) we may get what we want :)
Perhaps try setting
tcp.NoDelay = true
