I have this C# script that is supposed to download a zip archive from GitHub, unpack it, and put it in a specific folder:
using (var client = new HttpClient())
{
    var filePath = Path.GetFullPath("runtime");
    var url = @"https://github.com/BlackBirdTV/tank/releases/latest/download/runtime.zip?raw=true";
    ConsoleUtilities.UpdateProgress("Downloading Runtime...", 0);
    var request = await client.GetStreamAsync(url);
    var buffer = new byte[(int)bufferSize];
    var totalBytesRead = 0;
    int bytes = 0;
    while ((bytes = await request.ReadAsync(buffer, 0, buffer.Length)) != 0)
    {
        totalBytesRead += bytes;
        ConsoleUtilities.UpdateProgress($"Downloading Runtime... ({totalBytesRead} of {bufferSize} bytes read) ", (int)(totalBytesRead / bufferSize * 100));
    }
}
Decompress(buffer, filePath);
When I run this, the download starts and appears to finish, yet at a seemingly random point the data just stops. The bytes do get downloaded, as my console output shows, but they are zeroed out. Either my computer is receiving zeros (which I doubt) or the bytes aren't being written to the buffer.
Oddly enough, downloading the file in the browser works just fine.
Any help is greatly appreciated.
As I said in the comments, your problem is that each iteration of your while loop overwrites your buffer, and you never accumulate the data anywhere. So when the loop ends, all you're left with is whatever data you got in the last iteration, and that last read usually doesn't fill the buffer completely.
You could fix that bug by storing the accumulated buffer somewhere, but a far better solution is to not fuss with buffers and such and just use the built-in CopyToAsync method of Stream:
using var client = new HttpClient();
using var stream = await client.GetStreamAsync("https://github.com/BlackBirdTV/tank/releases/latest/download/runtime.zip?raw=true");
using var file = new FileStream(@"c:\temp\runtime.zip", FileMode.Create);
await stream.CopyToAsync(file);
Here I'm saving it to a local file at c:\temp\runtime.zip, but obviously change that to suit your needs. I suppose you're avoiding this method so you can track progress, which is fair enough. So if that's really important to you, read on for a fix to your original solution.
For completeness, here's your original code fixed up to work by writing the buffer to a FileStream:
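var bufferSize = 1024 * 10;
var url = @"https://github.com/BlackBirdTV/tank/releases/latest/download/runtime.zip?raw=true";
using var client = new HttpClient();
// ResponseHeadersRead lets the body be streamed and exposes Content-Length
// (when the server sends it), so progress can be computed against the real size.
using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
long? totalBytes = response.Content.Headers.ContentLength;
using var stream = await response.Content.ReadAsStreamAsync();
using var file = new FileStream(@"c:\temp\runtime.zip", FileMode.Create);
var buffer = new byte[bufferSize];
long totalBytesRead = 0;
int bytes;
while ((bytes = await stream.ReadAsync(buffer, 0, buffer.Length)) != 0)
{
    // Write only the bytes this read returned, before the next read overwrites the buffer.
    await file.WriteAsync(buffer, 0, bytes);
    totalBytesRead += bytes;
    // Multiply before dividing: totalBytesRead / total * 100 truncates to 0 with integer math.
    int percent = totalBytes > 0 ? (int)(totalBytesRead * 100 / totalBytes.Value) : 0;
    ConsoleUtilities.UpdateProgress($"Downloading Runtime... ({totalBytesRead} of {totalBytes} bytes read) ", percent);
}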
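The question's last step, `Decompress(buffer, filePath)`, is not shown. Once the loop above has written the archive to disk, one way to unpack it into the target folder is `ZipFile.ExtractToDirectory` from System.IO.Compression. This is a minimal sketch: `ExtractRuntime` is a hypothetical helper name, and the `overwriteFiles` overload assumes .NET Core 2.0 or later. Because the code above declares `file` with `using var`, that FileStream stays open until the end of the enclosing scope, so dispose it (or put the download in its own block) before extracting.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: unpacks the downloaded zip into `targetDirectory`
// (e.g. Path.GetFullPath("runtime")). Call it only after the FileStream that
// wrote `zipPath` has been disposed, so the data is flushed and the file is unlocked.
static void ExtractRuntime(string zipPath, string targetDirectory)
{
    // ExtractToDirectory would create the folder too; doing it here makes the intent explicit.
    Directory.CreateDirectory(targetDirectory);
    // overwriteFiles: true replaces files left over from a previous run
    // instead of throwing an IOException.
    ZipFile.ExtractToDirectory(zipPath, targetDirectory, overwriteFiles: true);
}
```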
I want to upload multiple files via Blazor's InputFile component, but got stuck when it comes to parallelism.
I simply want to show the user a progress bar (based on the total bytes read so far) and afterwards a simple list of the files that have already been processed.
Currently my callback is looking like this:
private async Task HandleInputFileChange(InputFileChangeEventArgs fileChangeEventArgs)
{
_filesProcessed.Clear();
_alreadyRead = 0;
var browserFiles = fileChangeEventArgs.GetMultipleFiles();
_max = browserFiles.Sum(bf => bf.Size);
await Task.WhenAll(browserFiles.Select(browserFile => Task.Run(async () =>
{
var trustedFileName = Path.GetRandomFileName();
var filePath = Path.Combine(HostEnvironment.ContentRootPath, HostEnvironment.EnvironmentName, FolderName, trustedFileName);
await using var fileStream = new FileStream(filePath, FileMode.Create);
await using var readStream = browserFile.OpenReadStream(AllowedFileSize);
int bytesRead;
var readBuffer = new byte[1024 * 10];
while ((bytesRead = await readStream.ReadAsync(readBuffer)) != 0)
{
_alreadyRead += bytesRead;
await fileStream.WriteAsync(readBuffer, 0, bytesRead);
await InvokeAsync(StateHasChanged);
}
_filesProcessed.Add(browserFile);
})));
}
However, with this code I usually end up with a NullReferenceException at
Microsoft.AspNetCore.Components.Server.Circuits.RemoteJSDataStream.ReceiveData(RemoteJSRuntime runtime, Int64 streamId, Int64 chunkId, Byte[] chunk, String error)
At this point I'm not even sure whether this is possible, as it seems to be an issue with how the framework synchronizes things.
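This question has no answer here, and the sketch below does not claim to explain the NullReferenceException inside RemoteJSDataStream. Independently of that, two lines in the handler update shared state from several `Task.Run` workers at once: `_alreadyRead += bytesRead` is a read-modify-write that concurrent tasks can interleave (losing updates), and `List<T>.Add` is not thread-safe. A minimal console sketch of thread-safe updates; all names are hypothetical, `ProcessFileAsync` stands in for reading one browser file, and `Task.Yield` stands in for the `ReadAsync` call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Shared state, playing the roles of `_alreadyRead` and `_filesProcessed`.
long alreadyRead = 0;
var filesProcessed = new List<string>();
var filesLock = new object();

// Hypothetical stand-in for reading one file in `chunks` reads of `chunkSize` bytes.
async Task ProcessFileAsync(string name, int chunks, int chunkSize)
{
    for (int i = 0; i < chunks; i++)
    {
        await Task.Yield(); // where `await readStream.ReadAsync(...)` would be
        // Atomic add; a plain `alreadyRead += chunkSize` can lose updates under concurrency.
        Interlocked.Add(ref alreadyRead, chunkSize);
    }
    // List<T> is not thread-safe, so serialize access to it.
    lock (filesLock)
    {
        filesProcessed.Add(name);
    }
}

// Starts one worker per file, like the question's Task.WhenAll/Task.Run code.
Task ProcessAllAsync(int fileCount, int chunks, int chunkSize) =>
    Task.WhenAll(Enumerable.Range(0, fileCount)
        .Select(i => Task.Run(() => ProcessFileAsync($"file{i}", chunks, chunkSize))));
```

Processing the files one at a time with a plain `foreach` and `await`, without `Task.Run`, would sidestep both races; whether that also avoids the exception remains an open question.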
I have created some Avro files. I can use the following command to convert them to JSON, just to check whether the files are OK:
java -jar avro-tools-1.8.2.jar tojson FileName.avro>outputfilename.json
Now I have some big Avro files, and the REST API I'm trying to upload them to has size limits, so I am trying to upload them in chunks using streams.
The following sample, which just reads the original file in chunks and copies them to another Avro file, creates the file perfectly:
using System;
using System.IO;

class Test
{
    public static void Main()
    {
        // Specify a file to read from and to create.
        string pathSource = @"D:\BDS\AVRO\filename.avro";
        string pathNew = @"D:\BDS\AVRO\test\filenamenew.avro";
        try
        {
            using (FileStream fsSource = new FileStream(pathSource, FileMode.Open, FileAccess.Read))
            // Create (rather than Append) so a rerun doesn't tack a second copy onto an existing file.
            using (FileStream fsNew = new FileStream(pathNew, FileMode.Create, FileAccess.Write))
            {
                byte[] buffer = new byte[(20 * 1024 * 1024) + 100];
                long numBytesToRead = fsSource.Length;
                long numBytesRead = 0;
                while (numBytesToRead > 0)
                {
                    // Read may return anything from 0 to buffer.Length bytes.
                    int bytesRead = fsSource.Read(buffer, 0, buffer.Length);
                    // Break when the end of the file is reached.
                    if (bytesRead == 0)
                        break;
                    numBytesRead += bytesRead;
                    numBytesToRead -= bytesRead;
                    // Write only the bytes actually read in this pass to the other FileStream.
                    fsNew.Write(buffer, 0, bytesRead);
                }
            }
        }
        catch (FileNotFoundException ioEx)
        {
            Console.WriteLine(ioEx.Message);
        }
    }
}
How do I know this creates a valid Avro file? Because the earlier command to convert it to JSON works again, i.e.:
java -jar avro-tools-1.8.2.jar tojson filenamenew.avro>outputfilename.json
However, when I use the same code but call a REST API instead of copying to another file, the file gets uploaded, but when I download the same file from the server and run the command above to convert it to JSON, it fails with "Not a Data file".
So obviously something is getting corrupted, and I am struggling to figure out what.
This is the snippet
string filenamefullyqualified = path + filename;
Stream stream = System.IO.File.Open(filenamefullyqualified, FileMode.Open, FileAccess.Read, FileShare.None);
long? position = 0;
byte[] buffer = new byte[(20 * 1024 * 1024) + 100];
long numBytesToRead = stream.Length;
int numBytesRead = 0;
do
{
var content = new MultipartFormDataContent();
int bytesRead = stream.Read(buffer, 0, buffer.Length);
byte[] actualbytes = new byte[bytesRead];
Array.Copy(buffer, actualbytes, bytesRead);
if (bytesRead == 0)
break;
//Append Data
url = String.Format("https://{0}.dfs.core.windows.net/raw/datawarehouse/{1}/{2}/{3}/{4}/{5}?action=append&position={6}", datalakeName, filename.Substring(0, filename.IndexOf("_")), year, month, day, filename, position.ToString());
numBytesRead += bytesRead;
numBytesToRead -= bytesRead;
ByteArrayContent byteContent = new ByteArrayContent(actualbytes);
content.Add(byteContent);
method = new HttpMethod("PATCH");
request = new HttpRequestMessage(method, url)
{
Content = content
};
request.Headers.Add("Authorization", "Bearer " + accesstoken);
var response = await client.SendAsync(request);
response.EnsureSuccessStatusCode();
position = position + request.Content.Headers.ContentLength;
Array.Clear(buffer, 0, buffer.Length);
} while (numBytesToRead > 0);
stream.Close();
I have looked through the forum threads but haven't come across anything that deals with splitting Avro files.
I have a hunch that the content of my HTTP request isn't right. What am I missing?
If you need more details, I will be happy to provide them.
I have found the problem now: it was MultipartFormDataContent. When an Avro file is uploaded that way, extra text such as boundary lines and part headers (Content-Type etc.) ends up in the body, and many lines were also removed (I do not know why).
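To see what the multipart wrapper adds, the sketch below serializes the same bytes both ways without any network call. As far as I know, the Data Lake append call stores the request body as-is, so with MultipartFormDataContent the boundary line and part headers become part of the file; an Avro reader expects the file to start with the magic bytes `Obj` followed by `0x01`, which would explain "Not a Data file". The wrapper also inflates `Content.Headers.ContentLength`, which the question's code uses to advance `position`. `SerializeBothWaysAsync` and the boundary string are illustrative choices.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Returns the exact request bodies produced by the question's approach
// (bytes wrapped in MultipartFormDataContent) and by the fix (bare ByteArrayContent).
static async Task<(byte[] Multipart, byte[] Raw)> SerializeBothWaysAsync(byte[] payload)
{
    using var multipart = new MultipartFormDataContent("example-boundary");
    multipart.Add(new ByteArrayContent(payload));
    byte[] multipartBody = await multipart.ReadAsByteArrayAsync();

    using var raw = new ByteArrayContent(payload);
    byte[] rawBody = await raw.ReadAsByteArrayAsync();
    return (multipartBody, rawBody);
}
```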
So the solution was to upload the contents as a ByteArrayContent directly, rather than adding it to a MultipartFormDataContent as I was doing earlier.
Here is the snippet. It is almost identical to the one in the question, except that it no longer uses MultipartFormDataContent:
string filenamefullyqualified = path + filename;
using (Stream stream = System.IO.File.Open(filenamefullyqualified, FileMode.Open, FileAccess.Read, FileShare.None))
{
    long position = 0;
    byte[] buffer = new byte[(20 * 1024 * 1024) + 100];
    long numBytesToRead = stream.Length;
    long numBytesRead = 0;
    do
    {
        int bytesRead = stream.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0)
            break;
        //Append Data: position is the offset in the target file where this chunk starts
        url = String.Format("https://{0}.dfs.core.windows.net/raw/datawarehouse/{1}/{2}/{3}/{4}/{5}?action=append&position={6}", datalakeName, filename.Substring(0, filename.IndexOf("_")), year, month, day, filename, position);
        numBytesRead += bytesRead;
        numBytesToRead -= bytesRead;
        // Send the raw bytes as the request body, with no multipart wrapper.
        ByteArrayContent byteContent = new ByteArrayContent(buffer, 0, bytesRead);
        method = new HttpMethod("PATCH");
        request = new HttpRequestMessage(method, url)
        {
            Content = byteContent
        };
        request.Headers.Add("Authorization", "Bearer " + accesstoken);
        var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        position += bytesRead;
    } while (numBytesToRead > 0);
}
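The snippet does not show it, but with the Data Lake Storage Gen2 REST API, appended data is (as far as I know) only committed once a final `action=flush` request is sent with `position` set to the total file length. A sketch of the whole append-then-flush sequence follows; `UploadInChunksAsync` is a hypothetical name, and the actual sending is passed in as a delegate so the request sequence can be checked without a network.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Uploads `source` as a sequence of append requests followed by one flush.
// `sendAsync` is expected to add the Authorization header, call HttpClient.SendAsync
// and check the response; it is a parameter so the sequence can be inspected offline.
static async Task<long> UploadInChunksAsync(
    Stream source, string fileUrl, int chunkSize, Func<HttpRequestMessage, Task> sendAsync)
{
    var buffer = new byte[chunkSize];
    long position = 0;
    int bytesRead;
    while ((bytesRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        // `position` is the offset in the target file where this chunk starts.
        using var append = new HttpRequestMessage(new HttpMethod("PATCH"),
            $"{fileUrl}?action=append&position={position}")
        {
            // Raw bytes only. Reusing `buffer` afterwards is safe because the
            // request has been sent by the time sendAsync completes.
            Content = new ByteArrayContent(buffer, 0, bytesRead)
        };
        await sendAsync(append);
        position += bytesRead;
    }

    // Commit everything appended so far; position must equal the total length.
    using var flush = new HttpRequestMessage(new HttpMethod("PATCH"),
        $"{fileUrl}?action=flush&position={position}")
    {
        Content = new ByteArrayContent(Array.Empty<byte>())
    };
    await sendAsync(flush);
    return position;
}
```

In the real upload, the delegate would add the bearer token header, call `client.SendAsync`, and call `EnsureSuccessStatusCode()` on the response.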
But streaming record by record cannot handle the Avro file as a whole in one transaction. We may end up with partial success if, for example, some records fail.
A small tool that could split Avro files based on a threshold number of records would be great.
The Spark-based split-by-partition technique does allow splitting a dataset into a predefined number of files, but it does not allow splitting based on the number of records; e.g., I do not want an Avro file with more than 500 records.
So we have to devise batching logic based on the heap size the application can comfortably handle, along with a two-phase commit to handle transactions.
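The record-count batching step itself can be small. A sketch, assuming .NET 6 or later for `Enumerable.Chunk`; `SplitIntoBatches` is a hypothetical name, and both writing each batch out as its own Avro file (with whatever Avro library is in use) and the two-phase commit are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Splits records into batches of at most `maxRecordsPerBatch` items (e.g. 500).
// Each batch would then be serialized to its own Avro file (not shown).
static List<T[]> SplitIntoBatches<T>(IEnumerable<T> records, int maxRecordsPerBatch)
{
    if (maxRecordsPerBatch <= 0)
        throw new ArgumentOutOfRangeException(nameof(maxRecordsPerBatch));
    // Enumerable.Chunk yields full arrays; only the last one can be shorter.
    return records.Chunk(maxRecordsPerBatch).ToList();
}
```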
I'm downloading a PDF file from an AWS S3 bucket using the official client in C#. It appears to download the whole file, but everything is 0s after 8192 (0x2000) bytes.
See below (original file on left, S3 download on right):
Any ideas as to why this is happening would be greatly appreciated.
Here's the code:
var client = new AmazonS3Client(
new AmazonS3Config
{
RegionEndpoint = RegionEndpoint.EUWest1
});
var transferUtility = new TransferUtility(client);
var request = new TransferUtilityOpenStreamRequest
{
BucketName = bucketName,
Key = key
};
using (var stream = transferUtility.OpenStream(request))
{
var bytes = new byte[stream.Length];
stream.Read(bytes, 0, (int)stream.Length);
stream.Close();
return bytes;
}
Thanks in advance,
Steve.
For anyone else hitting this issue, it was a case of having to repeatedly call Read on the stream until all bytes have been received:
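using (var stream = transferUtility.OpenStream(request))
{
    var position = 0;
    var length = (int)stream.Length;
    var bytes = new byte[length];
    while (position < length)
    {
        // Read can return fewer bytes than requested, so keep reading until the array is full.
        var read = stream.Read(bytes, position, length - position);
        if (read == 0)
            throw new EndOfStreamException($"Stream ended after {position} of {length} bytes.");
        position += read;
    }
    return bytes;
}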
Thanks to John for pointing that out.
Edit:
Or check out this extension method kindly pointed out by JohnLBevan: https://stackoverflow.com/a/24412022/361842
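When the stream's length is not known up front (network streams such as NetworkStream throw NotSupportedException from `Length`), copying into a MemoryStream avoids both the manual loop and the pre-sized array. A minimal sketch; `ReadToEndBytes` is a hypothetical name. On .NET 7 and later, `Stream.ReadExactly(buffer, offset, count)` performs the fill-the-array loop from this answer when the length is known.

```csharp
using System;
using System.IO;

// Reads a stream of unknown length to the end. CopyTo keeps calling Read
// until it returns 0, so short reads are handled for us.
static byte[] ReadToEndBytes(Stream stream)
{
    using var buffer = new MemoryStream();
    stream.CopyTo(buffer);
    return buffer.ToArray();
}
```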
I am trying to build an application that downloads a small binary file (20-25 KB) from a custom web server using HttpWebRequest.
This is the server-side code:
Stream UpdateRequest = context.Request.InputStream;
byte[] UpdateContent = new byte[context.Request.ContentLength64];
UpdateRequest.Read(UpdateContent, 0, UpdateContent.Length);
String remoteVersion = "";
for (int i = 0;i < UpdateContent.Length;i++) { //check if update is necessary
remoteVersion += (char)UpdateContent[i];
}
byte[] UpdateRequestResponse;
if (remoteVersion == remotePluginVersion) {
UpdateRequestResponse = new byte[1];
UpdateRequestResponse[0] = 0; //respond with a single byte set to 0 if no update is required
} else {
FileInfo info = new FileInfo(Path.Combine(Directory.GetCurrentDirectory(), "remote logs", "PointAwarder.dll"));
UpdateRequestResponse = File.ReadAllBytes(Path.Combine(Directory.GetCurrentDirectory(), "remote logs", "PointAwarder.dll"));
//respond with the updated file otherwise
}
//this byte is past the threshold and will not be the same in the version the client receives
Console.WriteLine("5000th byte: " + UpdateRequestResponse[5000]);
//send the response
context.Response.ContentLength64 = UpdateRequestResponse.Length;
context.Response.OutputStream.Write(UpdateRequestResponse, 0, UpdateRequestResponse.Length);
context.Response.Close();
After this the array UpdateRequestResponse contains the entire file and has been sent to the client.
The client runs this code:
//create the request
WebRequest request = WebRequest.Create(url + "pluginUpdate");
request.Method = "POST";
//create a byte array of the current version
byte[] requestContentTemp = version.ToByteArray();
int count = 0;
for (int i = 0; i < requestContentTemp.Length; i++) {
if (requestContentTemp[i] != 0) {
count++;
}
}
byte[] requestContent = new byte[count];
for (int i = 0, j = 0; i < requestContentTemp.Length; i++) {
if (requestContentTemp[i] != 0) {
requestContent[j] = requestContentTemp[i];
j++;
}
}
//send the current version
request.ContentLength = requestContent.Length;
Stream dataStream = request.GetRequestStream();
dataStream.Write(requestContent, 0, requestContent.Length);
dataStream.Close();
//get and read the response
WebResponse response = request.GetResponse();
Stream responseStream = response.GetResponseStream();
byte[] responseBytes = new byte[response.ContentLength];
responseStream.Read(responseBytes, 0, (int)response.ContentLength);
responseStream.Close();
response.Close();
//if the response contains a single 0 we are up to date, otherwise write the content of the response to file
if (responseBytes[0] != 0 || response.ContentLength > 1) {
BinaryWriter writer = new BinaryWriter(File.Open(Path.Combine(Directory.GetCurrentDirectory(), "ServerPlugins", "PointAwarder.dll"), FileMode.Create));
writer.BaseStream.Write(responseBytes, 0, responseBytes.Length);
writer.Close();
TShockAPI.Commands.HandleCommand(TSPlayer.Server, "/reload");
}
The byte array responseBytes on the client should be identical to the array UpdateRequestResponse on the server, but it isn't. After about 4000 bytes, every byte is set to 0 rather than what it should be (responseBytes[3985] is the last non-zero byte).
Does this happen because HttpWebRequest has a size limit? I can't see any bug in my code that could be causing it, and the same code works in other places where I only pass around short sequences of data (less than 100 bytes).
The MSDN pages don't mention any size limit like this.
It doesn't have any artificial limit; this is a byproduct of the streaming nature of what you're attempting to do. I have a feeling the following line is the offender:
responseStream.Read(responseBytes, 0, (int)response.ContentLength);
I've had this issue in the past (with TCP streams): a single Read doesn't fill the whole array, because not all of the data has arrived over the wire yet. This is what I would try instead:
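for (int i = 0; i < response.ContentLength; i++)
{
    // ReadByte returns an int: -1 at the end of the stream, otherwise the byte value.
    int value = responseStream.ReadByte();
    if (value == -1)
        throw new EndOfStreamException();
    responseBytes[i] = (byte)value;
}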
That way, it will make sure to read all the way until the end of the stream.
EDIT
usr's BinaryReader-based solution is much more efficient. Here is the relevant code:
BinaryReader binReader = new BinaryReader(responseStream);
const int bufferSize = 4096;
byte[] responseBytes;
using (MemoryStream ms = new MemoryStream())
{
byte[] buffer = new byte[bufferSize];
int count;
while ((count = binReader.Read(buffer, 0, buffer.Length)) != 0)
ms.Write(buffer, 0, count);
responseBytes = ms.ToArray();
}
You are assuming that Read is reading as many bytes as you request. But the requested count is just an upper limit. You must tolerate reading small chunks.
You can use var bytes = new BinaryReader(myStream).ReadBytes(count); to read an exact number. Don't call ReadByte too often because that is very CPU intensive.
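`BinaryReader.ReadBytes(count)` keeps reading internally until it has `count` bytes, but if the stream ends first it returns a shorter array instead of throwing, so the length is worth checking. A minimal sketch; `ReadExact` is a hypothetical name, and `leaveOpen: true` keeps the reader from closing the response stream when it is disposed.

```csharp
using System;
using System.IO;
using System.Text;

// Reads exactly `count` bytes. ReadBytes loops over Read internally, but returns
// a shorter array (rather than throwing) if the stream ends early, so check it.
static byte[] ReadExact(Stream stream, int count)
{
    // leaveOpen: true so disposing the reader doesn't close the underlying stream.
    using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
    byte[] bytes = reader.ReadBytes(count);
    if (bytes.Length != count)
        throw new EndOfStreamException($"Expected {count} bytes, got {bytes.Length}.");
    return bytes;
}
```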
The best solution would be to step away from the fairly manual HttpWebRequest and use HttpClient or WebClient. All of this is automated for you and you get back a byte[].
I would like to upload a large amount of data to a web server from a client machine. I jumped right to PushStreamContent so I could write directly to the stream, as the results vary in size and can be rather large.
The flow is as follows:
User runs query > Reader Ready Event Fires > Begin Upload
Once the ready event is fired, the listener picks it up and iterates over the result set, uploading the data as a multipart form:
Console.WriteLine("Query ready, uploading");
byte[] buffer = new byte[1024], form = new byte[200];
int offset = 0, byteCount = 0;
StringBuilder rowBuilder = new StringBuilder();
string builderS;
var content = new PushStreamContent(async (stream, httpContent, transportContext) =>
//using (System.IO.Stream stream = new System.IO.FileStream("test.txt", System.IO.FileMode.OpenOrCreate))
{
int bytes = 0;
string boundary = createFormBoundary();
httpContent.Headers.Remove("Content-Type");
httpContent.Headers.TryAddWithoutValidation("Content-Type", "multipart/form-data; boundary=" + boundary);
await stream.WriteAsync(form, 0, form.Length);
form = System.Text.Encoding.UTF8.GetBytes(createFormElement(boundary, "file"));
await stream.WriteAsync(form, 0, form.Length);
await Task.Run(async () =>
{
foreach (var row in rows)
{
for (int i = 0; i < row.Length; i++)
{
rowBuilder.Append(row[i].Value);
if (i + 1 < row.Length)
rowBuilder.Append(',');
else
{
rowBuilder.Append("\r\n");
}
}
builderS = rowBuilder.ToString();
rowBuilder.Clear();
byteCount = System.Text.Encoding.UTF8.GetByteCount(builderS);
bytes += byteCount;
if (offset + byteCount > buffer.Length)
{
await stream.WriteAsync(buffer, 0, offset);
offset = 0;
if (byteCount > buffer.Length)
{
System.Diagnostics.Debug.WriteLine("Expanding buffer to {0} bytes", byteCount);
buffer = new byte[byteCount];
}
}
offset += System.Text.Encoding.UTF8.GetBytes(builderS, 0, builderS.Length, buffer, offset);
}
});
await stream.WriteAsync(buffer, 0, offset);
form = System.Text.Encoding.UTF8.GetBytes(boundary);
await stream.WriteAsync(form, 0, form.Length);
await stream.FlushAsync(); //pretty sure this does nothing
System.Diagnostics.Debug.WriteLine("Wrote {0}.{1} megabytes of data", bytes / 1000000, bytes % 1000000);
});
I think the code above would work fine if I were the server: just adding stream.Close(); would finish it. However, since I am the client here, closing it causes an error (a TaskCanceledException). Waiting to read doesn't do anything either, presumably because PushStreamContent doesn't end the request unless I explicitly close the stream. That said, writing to a file produces exactly what I expect to be uploaded, so everything is written correctly.
Any ideas on what I can do here? I might be totally misusing PushStreamContent, but it seems like this should be an appropriate use case.
The solution is a little confusing at first, but it seems to make sense and, perhaps more importantly, it works:
using (var content = new MultipartFormDataContent())
{
    var pushContent = new PushStreamContent(async (stream, httpContent, transportContext) =>
    {
        //do the stream writing stuff
        stream.Close();
    });
    content.Add(pushContent);
    //post, put, etc. content here
}
This works because the stream passed to the PushStreamContent callback is not the actual request stream; it's a stream managed by HttpClient for this part of the content, much like a file added as one part of a multipart request. As a result, closing it signals the end of input for this part of the HttpContent and allows the request to be finalized.
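For the row-writing part of the question, a StreamWriter can replace the hand-rolled `buffer`/`offset` logic, and disposing it closes the stream, which is the step identified here as the signal that the content is complete. A minimal sketch, assuming .NET Core 3.0 or later (for `await using` on StreamWriter); `WriteCsvAsync` is a hypothetical name, and, like the question's code, it does not quote values that contain commas.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Writes each row as comma-separated values followed by CRLF. StreamWriter handles
// the UTF-8 encoding and buffering that the question did by hand.
static async Task WriteCsvAsync(Stream stream, IEnumerable<string[]> rows)
{
    // Disposing the writer flushes it and closes `stream`; with PushStreamContent,
    // closing the stream is what marks this part of the body as complete.
    await using var writer = new StreamWriter(stream, new UTF8Encoding(false), bufferSize: 64 * 1024);
    foreach (var row in rows)
    {
        await writer.WriteAsync(string.Join(",", row));
        await writer.WriteAsync("\r\n");
    }
}
```

Inside the PushStreamContent callback it would be called as `await WriteCsvAsync(stream, rows)` in place of the manual writes, with each row first converted to a `string[]` of its values.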