I want to use gRPC to share very large files (more than 6 GB) between endpoints and a server.
The project I'm currently working on requires a central server where endpoints can upload and download files. One of the constraints is that the endpoints don't know each other, but they can send and receive messages to and from each other over a common bus.
To implement this server and its communication with the endpoints, I'm evaluating gRPC.
Do you think it is the best solution for streaming files? What alternatives do I have?
Thanks in advance.
gRPC with client/server streaming is capable of handling upload/download of files.
However, there's a discussion here on the performance of gRPC vs. HTTP for file upload/download. It argues that plain HTTP will generally be faster, because it just reads and writes the incoming bytes, while gRPC performs additional serialization/deserialization for each message in the stream, which adds significant overhead.
There is another blog post with some benchmarks on the same topic - https://ops.tips/blog/sending-files-via-grpc/ .
If you are looking to implement something that has to handle scale, you can do some more research.
If you really want to do this over gRPC, then the key thing is to make the response "server streaming", so that instead of returning 6 GiB in one chunk, it returns multiple chunks of whatever size you need, for example maybe 128 KiB at a time (or whatever); you can do this with something like:
syntax = "proto3";

message FileRequest {
    string id = 1; // or whatever
}

message FileResponse {
    bytes chunk = 1; // some segment of the file
}

service SearchService {
    rpc GetFile(FileRequest) returns (stream FileResponse);
}
but nothing is automatic: it is now your job to write the multiple segments back.
I suspect a vanilla HTTP download-style response may be simpler!
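For illustration, the segment-writing job described above might look like this on the server with Grpc.AspNetCore; the generated base class and message members follow from the proto above, but the file location lookup is an assumption:

```csharp
using System.IO;
using System.Threading.Tasks;
using Google.Protobuf;
using Grpc.Core;

public class SearchServiceImpl : SearchService.SearchServiceBase
{
    private const int ChunkSize = 128 * 1024; // 128 KiB per message

    public override async Task GetFile(FileRequest request,
        IServerStreamWriter<FileResponse> responseStream,
        ServerCallContext context)
    {
        // Mapping the id to a path is application-specific (assumed here).
        var path = Path.Combine("/data/files", request.Id);

        using var file = File.OpenRead(path);
        var buffer = new byte[ChunkSize];
        int read;
        while ((read = await file.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            // Each WriteAsync sends one FileResponse message down the stream.
            await responseStream.WriteAsync(new FileResponse
            {
                Chunk = ByteString.CopyFrom(buffer, 0, read)
            });
        }
    }
}
```

The client then reads messages off the response stream until it's exhausted and appends each chunk to its local copy.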
Related
Maybe this seems like a weird question, but I came across the following situation:
I'm trying to make a POST request to a service, and to add the POST data I chose to get a Stream from the request and use a StreamWriter to write the body to it.
But before I actually execute the request (with GetResponse), even before I write to the stream object, I get an "Unable to connect" exception exactly on
var stream = request.GetRequestStream();
After a little investigation, I realized that request.GetRequestStream() is actually trying to connect. The problem in my case was network connectivity to the server (firewall issue).
BUT my question here is: why does HttpWebRequest.GetRequestStream() try to connect?
My naive assumption was that no connection to the server is made while the request is merely being created.
I found some related questions, such as this one, but it does not seem to answer my question exactly.
Any explanation please?
PS: Any suggestion of how to avoid this "early" connection effect would be much appreciated.
.NET I/O APIs generally operate on streams, which are APIs that allow developers to read and write an ordered sequence of data. By making reading and writing into generic APIs, it enables generic libraries to operate on streams to do powerful things: compression, encryption, encoding, etc. (BTW, treating different kinds of I/O similarly has a long history, most famously in UNIX where everything is a file.)
Although reading and writing data works pretty similarly across many different kinds of streams, opening a stream is much harder to make generic. Think about the vastly different APIs you use to open a file vs. make an HTTP request vs. execute a database query.
Therefore, .NET's Stream class has no generic Open() method because getting a stream into an opened state is very different between different types of streams. Instead, the streams APIs expect to be given a stream that's already open, where "open" means that it's ready to be written to and/or read from.
Therefore, in .NET there's a typical pattern for I/O:
1. Write some resource-specific code to open a stream. These APIs generally return an open stream.
2. Hand that open stream off to generic APIs that read and/or write from it.
3. Close the stream (also generic) when you're done.
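A quick sketch of that three-step pattern, with GZipStream as the generic API in the middle (the file names are placeholders):

```csharp
using System.IO;
using System.IO.Compression;

class StreamPatternExample
{
    static void Main()
    {
        // Step 1: resource-specific code returns already-open streams.
        using (Stream input = File.OpenRead("input.txt"))
        using (Stream output = File.Create("input.txt.gz"))
        // Step 2: generic APIs work on any open stream, here compression.
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            input.CopyTo(gzip); // read from one stream, write to the other
        } // Step 3: the using blocks close all three streams here.
    }
}
```

Nothing in the compression or copying code knows (or cares) that the streams happen to be files.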
Now think about how that pattern above aligns to an HTTP request, which has the following steps:
a. Lookup the server's IP address in DNS
b. Make a TCP connection to the server
c. Send the URL and request headers to the server
d. If it's a POST (or PUT or other method that sends a request body) then upload the request body. If it's a GET, this is a no-op.
e. Now read the response
f. Finally, close the connection.
(I'm ignoring a lot of real-world complexity in the steps above like SSL, keep-alive connections, cached responses, etc. but the basic workflow is accurate enough to answer your question.)
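To make that concrete, here is roughly how those steps line up with the HttpWebRequest API (the URL and payload are placeholders):

```csharp
using System.IO;
using System.Net;
using System.Text;

class PostExample
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/upload");
        request.Method = "POST";

        // Steps a-c happen inside GetRequestStream: DNS lookup, TCP connect,
        // sending the URL and headers.
        // Step d: write the request body through the returned open stream.
        using (Stream body = request.GetRequestStream())
        {
            byte[] data = Encoding.UTF8.GetBytes("hello");
            body.Write(data, 0, data.Length);
        }

        // Step e: read the response.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string text = reader.ReadToEnd();
        } // Step f: the connection is released when the response is disposed.
    }
}
```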
OK now put yourself in the shoes of the .NET team trying to build an HTTP client API, remembering to split the non-generic parts ("get an open stream") from the generic parts: read and/or write, and then close the stream.
If your API only had to handle GET requests, then you'd probably make the connection while executing the same API that returns the response stream. This is exactly what HttpWebRequest.GetResponse does.
But if you're sending POST requests (or PUT or other similar methods), then you have to upload data to the server. Unlike HTTP headers which are only a few KB, the data you upload in a POST could be huge. If you're uploading a 10GB file, you don't want to park it in RAM during the hours it might take to upload to the server. This would kill your client's performance in the meantime. Instead, you need a way to get a Stream so you only have to load small chunks of data into RAM before sending to the server. And remember that Stream has no Open() method, so your API must provide an open stream.
Now you have an answer to your first question: HttpWebRequest.GetRequestStream must make the network connection because if it didn't then the stream would be closed and you couldn't write to it.
Now on to your second question: how can you delay the connection? I assume you mean that the connection should happen upon the first write to the request stream. One way to do this would be to write a class that inherits from Stream, calls GetRequestStream as late as possible, and then delegates all methods to the underlying request stream. Something like this as a starting point:
using System.IO;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

class DelayConnectRequestStream : Stream
{
    private HttpWebRequest _req;
    private Stream _stream = null;

    public DelayConnectRequestStream(HttpWebRequest req)
    {
        _req = req;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        if (_stream == null)
        {
            _stream = _req.GetRequestStream();
        }
        _stream.Write(buffer, offset, count);
    }

    public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        if (_stream == null)
        {
            // TODO: figure out if/how to make this async
            _stream = _req.GetRequestStream();
        }
        return _stream.WriteAsync(buffer, offset, count, cancellationToken);
    }

    // Repeat the pattern above for all needed methods on Stream.
    // You may need to decide by trial and error which properties and methods
    // must require an open stream. Some properties/methods you can probably just
    // return without opening the stream, e.g. CanRead, which will always be false,
    // so there's no need to create a stream before returning from that getter.
    // Also, the code sample above is not thread safe. For
    // thread safety, you could use Lazy<T> or roll your own locking.
}
But honestly, the approach above seems like overkill. If I were in your shoes, I'd look at why I'm trying to defer opening the stream and see whether there's another way to solve the problem.
I have a .NET Remoting service that returns a class to the client application. That class has a string property where the string can range from 1 KB to 400 KB of data.
I tried passing 256 KB worth of string from server to client, and the client was able to get it in less than 5 seconds, which is still OK, since this call will only be used for troubleshooting purposes by an administrator. However, I read
here that when sending huge data, "the socket will be blocked from receiving all other messages until it receives the remaining .... packets". If my data ever reaches MB sizes, I do not want to block the client from receiving other messages.
How can I achieve my goal of not blocking the client? Do I compress the string using GZipStream like in here? Or are there other better ways?
Good article from Tess Fernandez : https://blogs.msdn.microsoft.com/tess/2008/09/02/outofmemoryexceptions-while-remoting-very-large-datasets/
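If you do go the GZipStream route from your question, a minimal sketch of compressing and decompressing the string (the class and method names are mine):

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class StringCompressor
{
    public static byte[] Compress(string value)
    {
        byte[] raw = Encoding.UTF8.GetBytes(value);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // the GZipStream must be closed to flush its trailing bytes
            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

You'd then expose the byte[] (rather than the raw string) through the remoted class and decompress on the client. How much this helps depends on how compressible your text is.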
I am working on a C# client for a server that wraps Netty. It is a TCP/IP server, and I have tried using the C# class TcpClient, but could not write anything to the server or receive a printed response.
The Netty socket classes include the following: http://docs.jboss.org/netty/3.2/api/org/jboss/netty/channel/socket/nio/NioClientSocketChannelFactory.html http://docs.jboss.org/netty/3.2/api/org/jboss/netty/bootstrap/ClientBootstrap.html
The message is encoded as a byte[] in Java. Part of class PingSerializer, in the server code, reads as follows:
public byte[] requestToBytes(Ping message) {
    return NorbertExampleProtos.Ping.newBuilder().setTimestamp(message.timestamp).build().toByteArray();
}

public Ping requestFromBytes(byte[] bytes) {
    try {
        return new Ping(NorbertExampleProtos.Ping.newBuilder().mergeFrom(bytes).build().getTimestamp());
    } catch (InvalidProtocolBufferException e) {
        System.out.println("Invalid protocol buffer exception " + e.getMessage());
        throw new IllegalArgumentException(e);
    }
}
I would like to know whether it is possible for a client written in C# to connect to the socket, ping the server and print out the server's response, without modifying the server code or using a cross-language development tool such as Apache Thrift or IKVM to handle the messages. Thanks, I would appreciate any help.
Judging by the code sample you've given, it looks like the data is encoded using Protocol Buffers, Google's serialization format.
Fortunately, there are at least two libraries implementing Protocol Buffers for .NET:
protobuf-net: Written for .NET from the ground up, this is a good choice if you don't particularly need the C# code to look like the equivalent Java/C++ code.
protobuf-csharp-port: This is a port from the Java client code, with some .NET idioms added - so if you're working with Protocol Buffers on multiple platforms, this may be more appropriate. (Disclaimer: I did most of the coding for this port.)
The good news is that the wire format for the two is the same, because it's the standard Protocol Buffer wire format. So if you decide later on that you've made the wrong choice, you don't need to worry about the data format changing.
In terms of communicating with the server, TcpClient should be absolutely fine. You'll need to find out exactly what the protocol is - for example, whether it's Protocol Buffers over HTTP, or something similar. (If it is over HTTP, WebClient would be a simpler approach.) However, beyond that it's straight TCP/IP: you write the bytes to the server, and it should write a reply. You can use Wireshark to look at the traffic between the client and the server, if you need to trace where problems are occurring.
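As a rough sketch of the TcpClient route using protobuf-net: the host, port, and length-prefix framing below are assumptions for illustration; you'll need to match whatever framing the Netty server actually uses (Wireshark will tell you):

```csharp
using System;
using System.Net.Sockets;
using ProtoBuf;

// Mirrors the server's Ping message: field 1 is the timestamp.
[ProtoContract]
public class Ping
{
    [ProtoMember(1)]
    public long Timestamp { get; set; }
}

class PingClient
{
    static void Main()
    {
        using (var client = new TcpClient("localhost", 9000)) // host/port assumed
        using (NetworkStream stream = client.GetStream())
        {
            var ping = new Ping { Timestamp = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds() };

            // Framing is protocol-specific; a fixed 32-bit length prefix is a
            // common choice, but check what the server expects.
            Serializer.SerializeWithLengthPrefix(stream, ping, PrefixStyle.Fixed32);

            var reply = Serializer.DeserializeWithLengthPrefix<Ping>(stream, PrefixStyle.Fixed32);
            Console.WriteLine("Server replied with timestamp " + reply.Timestamp);
        }
    }
}
```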
There is an application called CS2J that will convert all of your C# code directly to Java. However, you cannot expect it to be perfect, and you will have a bit of debugging to do. It is supposed to be very accurate.
I have a C# application (the client) and I have a server. The server sends and receives all sorts of messages, which are strings, to and from the client; I am using a StreamWriter for this. The sending code on the client and the server looks pretty much the same: I take the string, encode it as UTF-8, and then send it:
public void SendMessage(String p)
{
    if (p != "")
    {
        string StringMessage = HttpUtility.UrlEncode(p, System.Text.Encoding.UTF8);
        try
        {
            swSender.WriteLine(StringMessage);
            swSender.Flush();
        }
        catch (IOException e)
        {
            //do some stuff
        }
    }
}
Now, the strings I send look something like this:
"SUBJECT####SOMEDATA1<><>SOMEDATA2<><>SOMEDATA3
This causes some problems and makes me think: is this the way big applications send and receive data? Because it looks pretty silly. If not, can someone provide an example of how big applications send messages?
Also: my way of sending messages forces me to write big nested if statements.
For example:
if (subject == "something")
    do something
else if (subject == "something else")
    do something else
How can I fix this?
It all greatly depends on your application's needs.
Generally speaking: no, inventing your own protocol is not a good idea.
There are quite a few ways to send messages from client to server.
I'd suggest you do some reading on WCF, or if you are on .NET 2.0, then .NET Remoting.
Also, you might want to consider sending HTTP messages, as there are plenty of frameworks for that.
One way is to use XML-RPC. I used this for .NET. I followed the instructions without modifying anything and got the client/server working within 30 minutes, and another 10 to modify it to my liking. Essentially you call functions normally, and through the magic of the library the call blocks while the server executes the code and then returns the results. RPC = remote procedure call.
If you're using ASP.NET, use the instructions labeled IIS, even if you're on Linux using FastCGI or Apache. I initially ignored those because they were labeled IIS, which was a mistake. There is also a .NET Remoting option (if the server isn't ASP.NET but another app).
A not-as-good option is to learn WebClient and post JSON strings to the server, then read the response as JSON. XML-RPC is pretty standard and is what I'd suggest.
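As for the big if/else chain: regardless of which framework you pick, one common pattern is a dispatch table mapping subjects to handlers (the subject names below are made up; the "SUBJECT####PAYLOAD" split mirrors the question's format):

```csharp
using System;
using System.Collections.Generic;

class MessageDispatcher
{
    // One entry per subject replaces one branch of the if/else chain.
    private readonly Dictionary<string, Action<string>> _handlers =
        new Dictionary<string, Action<string>>
        {
            { "LOGIN",   payload => Console.WriteLine("handle login: " + payload) },
            { "MESSAGE", payload => Console.WriteLine("handle message: " + payload) },
        };

    public void Dispatch(string raw)
    {
        // Split "SUBJECT####PAYLOAD" into at most two parts.
        var parts = raw.Split(new[] { "####" }, 2, StringSplitOptions.None);
        var subject = parts[0];
        var payload = parts.Length > 1 ? parts[1] : "";

        if (_handlers.TryGetValue(subject, out var handler))
            handler(payload);
        else
            Console.WriteLine("unknown subject: " + subject);
    }
}
```

Adding a new message type then means adding one dictionary entry rather than another else-if branch.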
Try using the HttpUtility.HtmlEncode method instead of UrlEncode().
I'm trying to send an image to WCF to use OCR.
For now, I have succeeded in transforming my image into a byte[] and sending it to the server using WCF. Unfortunately, it works for an array smaller than 16 KB and fails for an array larger than 17 KB.
I've already set the readerQuotas and maxArrayLength to their maximum values in web.config on the server side.
Do you know how to send big data to a WCF server, or perhaps a library to use OCR directly on WP7?
If all else fails, send it in fragments of 16 KB, followed by an "all done" message that commits it (reassembling if necessary).
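A minimal sketch of that fragmenting step (the 16 KB chunk size mirrors the limit you're hitting; the proxy operations in the usage comment are hypothetical):

```csharp
using System;
using System.Collections.Generic;

static class Chunker
{
    public static IEnumerable<byte[]> Split(byte[] data, int chunkSize = 16 * 1024)
    {
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            // The last chunk may be shorter than chunkSize.
            int size = Math.Min(chunkSize, data.Length - offset);
            var chunk = new byte[size];
            Array.Copy(data, offset, chunk, 0, size);
            yield return chunk;
        }
    }
}

// Usage sketch: send each chunk, then a final call that reassembles:
// foreach (var chunk in Chunker.Split(imageBytes)) proxy.UploadChunkAsync(chunk);
// proxy.CommitUploadAsync(); // hypothetical "all done" service operation
```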
Bit of a hack, but how about sending it with an HTTP POST if it isn't too big? Or alternatively changing the web service so it accepts a blob? (The current array limitation is a limit on the array datatype in the W3C spec.)
Finally solved.
You have to update your web.config to allow the server to receive big data. Then you have to use the Stream type in your WCF service and the byte[] type in your WP7 client. The types will match, and both the WCF and WP7 sides will agree to send and receive it.
In WCF :
public string ConvertImgToStringPiece(Stream img)
{
//.....
}
In WP7 :
Service1Client proxy = new Service1Client();
proxy.ConvertImgToStringPieceCompleted += new EventHandler<ConvertImgToStringPieceCompletedEventArgs>(proxy_ConvertImgToStringPieceCompleted);
proxy.ConvertImgToStringPieceAsync(b); //b is my byte[], more than 17 KB
I don't know if this works on WP7, but with WCF you can also use streams to upload bigger amounts of data.
You can try using a WCF session. The key thing to remember is that sessions in WCF are different from the normal sessions we use in Internet programming. It's basically a call to a method that starts the session, any number of interim calls, and then a final one that ends the session. You could have a service call that starts the session, send chunks of the image, and then call the last one, which closes the session and returns whatever you need.
http://msdn.microsoft.com/en-us/library/ms733040.aspx
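A sessionful service contract for that pattern might look like this (the interface and operation names are made up for illustration):

```csharp
using System.ServiceModel;

[ServiceContract(SessionMode = SessionMode.Required)]
public interface IOcrUploadService
{
    // Starts the session.
    [OperationContract(IsInitiating = true)]
    void StartUpload(string fileName);

    // Interim calls: one per chunk of the image.
    [OperationContract(IsInitiating = false, IsTerminating = false)]
    void UploadChunk(byte[] chunk);

    // Ends the session, reassembles the chunks, and returns the OCR result.
    [OperationContract(IsInitiating = false, IsTerminating = true)]
    string FinishUpload();
}
```

The service implementation would hold the accumulated chunks in per-session state until FinishUpload is called.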