Amazon S3 File Full of Zeros - C#

I'm downloading a PDF file from an AWS S3 bucket using the official client in C#. It appears to download the whole file, but everything is 0s after 8192 (0x2000) bytes.
See below (original file on left, S3 download on right):
Any ideas as to why this is happening would be greatly appreciated.
Here's the code:
var client = new AmazonS3Client(
    new AmazonS3Config
    {
        RegionEndpoint = RegionEndpoint.EUWest1
    });
var transferUtility = new TransferUtility(client);
var request = new TransferUtilityOpenStreamRequest
{
    BucketName = bucketName,
    Key = key
};
using (var stream = transferUtility.OpenStream(request))
{
    var bytes = new byte[stream.Length];
    stream.Read(bytes, 0, (int)stream.Length);
    stream.Close();
    return bytes;
}
Thanks in advance,
Steve.

For anyone else hitting this issue, it was a case of having to repeatedly call Read on the stream until all bytes have been received:
using (var stream = transferUtility.OpenStream(request))
{
    var position = 0;
    var length = stream.Length;
    var bytes = new byte[length];
    do
    {
        position += stream.Read(bytes, position, (int)(stream.Length - position));
    } while (position < length);
    stream.Close();
    return bytes;
}
Thanks to John for pointing that out.
Edit:
Or check out this extension method kindly pointed out by JohnLBevan: https://stackoverflow.com/a/24412022/361842
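The gist of that answer is wrapping the read loop in a helper so callers can't forget it. A minimal sketch of such an extension method (the name ReadAllBytes and the 16 KB buffer size are my own choices rather than the exact code behind the link):

using System.IO;

public static class StreamExtensions
{
    // Keeps calling Read until the stream is exhausted, because a single
    // Read call may return fewer bytes than requested.
    public static byte[] ReadAllBytes(this Stream source)
    {
        var buffer = new byte[16 * 1024];
        using (var ms = new MemoryStream())
        {
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                ms.Write(buffer, 0, read);
            }
            return ms.ToArray();
        }
    }
}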

Related

C# HttpClient Abruptly stops download

I have this C# script that is supposed to download a zip archive from GitHub, unpack it, and put it in a specific folder:
using (var client = new HttpClient())
{
    var filePath = Path.GetFullPath("runtime");
    var url = @"https://github.com/BlackBirdTV/tank/releases/latest/download/runtime.zip?raw=true";
    ConsoleUtilities.UpdateProgress("Downloading Runtime...", 0);
    var request = await client.GetStreamAsync(url);
    var buffer = new byte[(int)bufferSize];
    var totalBytesRead = 0;
    int bytes = 0;
    while ((bytes = await request.ReadAsync(buffer, 0, buffer.Length)) != 0)
    {
        totalBytesRead += bytes;
        ConsoleUtilities.UpdateProgress($"Downloading Runtime... ({totalBytesRead} of {bufferSize} bytes read) ", (int)(totalBytesRead / bufferSize * 100));
    }
}
Decompress(buffer, filePath);
When I run this, the download starts and seems to finish, yet at some sporadic point it just stops. Somehow it downloads the bytes, as my console output shows, but they are zeroed out. It seems like either my computer receives zeros (which I doubt) or the bytes don't get written to the buffer.
Weirdly enough, downloading the file over the browser works just fine.
Any help is greatly appreciated
As I state in the comments, your problem is that each iteration of your while loop is overwriting your buffer, and you are not accumulating the data anywhere. So your last iteration doesn't completely fill the buffer and all you're left with is whatever data you got in the last iteration.
You could fix that bug by storing the accumulated buffer somewhere, but a far better solution is to not fuss with buffers and such and just use the built-in CopyToAsync method of Stream:
using var client = new HttpClient();
using var stream = await client.GetStreamAsync("https://github.com/BlackBirdTV/tank/releases/latest/download/runtime.zip?raw=true");
using var file = new FileStream(@"c:\temp\runtime.zip", FileMode.Create);
await stream.CopyToAsync(file);
Here I'm saving it to a local file at c:\temp\runtime.zip, but obviously change that to suit your needs. I suppose you're avoiding this method so you can track progress, which is fair enough. So if that's really important to you, read on for a fix to your original solution.
For completeness, here's your original code fixed up to work by writing the buffer to a FileStream:
var bufferSize = 1024 * 10;
var url = @"https://github.com/BlackBirdTV/tank/releases/latest/download/runtime.zip?raw=true";
var filePath = Path.GetFullPath("runtime");

using var client = new HttpClient();
using var stream = await client.GetStreamAsync(url);
using var file = new FileStream(@"c:\temp\runtime.zip", FileMode.Create);

var buffer = new byte[bufferSize];
var totalBytesRead = 0;
int bytes = 0;
while ((bytes = await stream.ReadAsync(buffer, 0, buffer.Length)) != 0)
{
    await file.WriteAsync(buffer, 0, bytes);
    totalBytesRead += bytes;
    ConsoleUtilities.UpdateProgress($"Downloading Runtime... ({totalBytesRead} of {bufferSize} bytes read) ", (int)(totalBytesRead / bufferSize * 100));
}
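A side note on the progress figure: bufferSize is only the chunk size, so dividing by it won't give you progress through the file. For a real percentage you need the total size from the Content-Length header, which means requesting the headers first. A rough sketch of that approach (standard HttpClient; the only assumption is that ConsoleUtilities.UpdateProgress takes a message and a percentage, as in your code):

using var client = new HttpClient();
using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
long? totalBytes = response.Content.Headers.ContentLength; // null if the server doesn't send it
using var stream = await response.Content.ReadAsStreamAsync();
using var file = new FileStream(@"c:\temp\runtime.zip", FileMode.Create);
var buffer = new byte[1024 * 10];
long totalBytesRead = 0;
int bytes;
while ((bytes = await stream.ReadAsync(buffer, 0, buffer.Length)) != 0)
{
    await file.WriteAsync(buffer, 0, bytes);
    totalBytesRead += bytes;
    if (totalBytes.HasValue)
        ConsoleUtilities.UpdateProgress(
            $"Downloading Runtime... ({totalBytesRead} of {totalBytes} bytes read)",
            (int)(totalBytesRead * 100 / totalBytes.Value));
}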

S3 file uploaded is getting corrupted

I have some code that uploads a file stream to an S3 bucket. One of my customers is having some issues, but I'm having trouble reproducing them on my end. Upon upload, their file size remains at 0 bytes. This does not occur every time; it seems very sporadic.
using (var webclient = new WebClient())
{
    using (MemoryStream stream = new MemoryStream(webclient.DownloadData(uri)))
    {
        using (var client = new AmazonS3Client())
        {
            PutObjectRequest request = new PutObjectRequest
            {
                BucketName = bucketName,
                Key = GetObjectKey(fileName, companyAccountId),
                InputStream = stream
            };
            if (height > 0)
                request.Metadata.Add("x-amz-meta-height", height.ToString());
            if (width > 0)
                request.Metadata.Add("x-amz-meta-width", width.ToString());
            var response = await client.PutObjectAsync(request);
        }
    }
}
Any help or suggestions to determine why an uploaded file sometimes remains at 0 bytes would be greatly appreciated.
Is there anything else that could be consuming the InputStream and leaving its position at the end? I have had a similar problem before and solved it by ensuring the stream is at the start with either
stream.Seek(0, SeekOrigin.Begin);
or
stream.Position = 0;
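Applied to your snippet, that just means rewinding the MemoryStream right before handing it to the SDK. A minimal sketch of the idea (whether something else in your real code reads the stream first is exactly what this would rule out):

using (MemoryStream stream = new MemoryStream(webclient.DownloadData(uri)))
using (var client = new AmazonS3Client())
{
    // If anything has already read the stream, Position will be at the end
    // and PutObject would see zero remaining bytes.
    stream.Position = 0;
    var request = new PutObjectRequest
    {
        BucketName = bucketName,
        Key = GetObjectKey(fileName, companyAccountId),
        InputStream = stream
    };
    var response = await client.PutObjectAsync(request);
}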

Split an avro file and upload to REST

I have created some avro files. I can use the following command to convert them to JSON, just to check whether the files are OK:
java -jar avro-tools-1.8.2.jar tojson FileName.avro>outputfilename.json
Now I have some big avro files, and the REST API I'm trying to upload to has size limitations, so I am trying to upload them in chunks using streams.
The following sample, which just reads from the original file in chunks and copies them to another avro file, creates the new file perfectly:
using System;
using System.IO;

class Test
{
    public static void Main()
    {
        // Specify a file to read from and to create.
        string pathSource = @"D:\BDS\AVRO\filename.avro";
        string pathNew = @"D:\BDS\AVRO\test\filenamenew.avro";
        try
        {
            using (FileStream fsSource = new FileStream(pathSource,
                FileMode.Open, FileAccess.Read))
            {
                byte[] buffer = new byte[(20 * 1024 * 1024) + 100];
                long numBytesToRead = (int)fsSource.Length;
                int numBytesRead = 0;
                using (FileStream fsNew = new FileStream(pathNew,
                    FileMode.Append, FileAccess.Write))
                {
                    // Read the source file into a byte array.
                    //byte[] bytes = new byte[fsSource.Length];
                    //int numBytesToRead = (int)fsSource.Length;
                    //int numBytesRead = 0;
                    while (numBytesToRead > 0)
                    {
                        int bytesRead = fsSource.Read(buffer, 0, buffer.Length);
                        byte[] actualbytes = new byte[bytesRead];
                        Array.Copy(buffer, actualbytes, bytesRead);
                        // Read may return anything from 0 to numBytesToRead.
                        // Break when the end of the file is reached.
                        if (bytesRead == 0)
                            break;
                        numBytesRead += bytesRead;
                        numBytesToRead -= bytesRead;
                        fsNew.Write(actualbytes, 0, actualbytes.Length);
                    }
                }
            }
            // Write the byte array to the other FileStream.
        }
        catch (FileNotFoundException ioEx)
        {
            Console.WriteLine(ioEx.Message);
        }
    }
}
How do I know this creates an OK avro file? Because the earlier command to convert to JSON works again, i.e.
java -jar avro-tools-1.8.2.jar tojson filenamenew.avro>outputfilename.json
However, when I use the same code but, instead of copying to another file, call a REST API, the file gets uploaded; yet downloading the same file back from the server and running the command above to convert it to JSON says "Not a Data file".
So, obviously something is getting corrupted and I am struggling to figure out what.
This is the snippet
string filenamefullyqualified = path + filename;
Stream stream = System.IO.File.Open(filenamefullyqualified, FileMode.Open, FileAccess.Read, FileShare.None);
long? position = 0;
byte[] buffer = new byte[(20 * 1024 * 1024) + 100];
long numBytesToRead = stream.Length;
int numBytesRead = 0;
do
{
    var content = new MultipartFormDataContent();
    int bytesRead = stream.Read(buffer, 0, buffer.Length);
    byte[] actualbytes = new byte[bytesRead];
    Array.Copy(buffer, actualbytes, bytesRead);
    if (bytesRead == 0)
        break;
    //Append Data
    url = String.Format("https://{0}.dfs.core.windows.net/raw/datawarehouse/{1}/{2}/{3}/{4}/{5}?action=append&position={6}", datalakeName, filename.Substring(0, filename.IndexOf("_")), year, month, day, filename, position.ToString());
    numBytesRead += bytesRead;
    numBytesToRead -= bytesRead;
    ByteArrayContent byteContent = new ByteArrayContent(actualbytes);
    content.Add(byteContent);
    method = new HttpMethod("PATCH");
    request = new HttpRequestMessage(method, url)
    {
        Content = content
    };
    request.Headers.Add("Authorization", "Bearer " + accesstoken);
    var response = await client.SendAsync(request);
    response.EnsureSuccessStatusCode();
    position = position + request.Content.Headers.ContentLength;
    Array.Clear(buffer, 0, buffer.Length);
} while (numBytesToRead > 0);
stream.Close();
I have looked through the forum threads but haven't come across anything that deals with splitting avro files.
I have a hunch that my "content" for the HTTP request isn't right. What is it that I am missing?
If you need more details, I will be happy to provide them.
I have found the problem now. It was caused by MultipartFormDataContent: when an avro file is uploaded with it, extra text such as the content type is added and many lines are removed (I do not know why).
So the solution was to upload the contents as ByteArrayContent directly, rather than adding it to a MultipartFormDataContent as I was doing earlier.
Here is the snippet, almost the same as the one in the question, except that I no longer use MultipartFormDataContent:
string filenamefullyqualified = path + filename;
Stream stream = System.IO.File.Open(filenamefullyqualified, FileMode.Open, FileAccess.Read, FileShare.None);
//content.Add(CreateFileContent(fs, path, filename, "text/plain"));
long? position = 0;
byte[] buffer = new byte[(20 * 1024 * 1024) + 100];
long numBytesToRead = stream.Length;
int numBytesRead = 0;
//while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
//{
do
{
    //var content = new MultipartFormDataContent();
    int bytesRead = stream.Read(buffer, 0, buffer.Length);
    byte[] actualbytes = new byte[bytesRead];
    Array.Copy(buffer, actualbytes, bytesRead);
    if (bytesRead == 0)
        break;
    //Append Data
    url = String.Format("https://{0}.dfs.core.windows.net/raw/datawarehouse/{1}/{2}/{3}/{4}/{5}?action=append&position={6}", datalakeName, filename.Substring(0, filename.IndexOf("_")), year, month, day, filename, position.ToString());
    numBytesRead += bytesRead;
    numBytesToRead -= bytesRead;
    ByteArrayContent byteContent = new ByteArrayContent(actualbytes);
    //byteContent.Headers.ContentType = new MediaTypeHeaderValue("text/plain");
    //content.Add(byteContent);
    method = new HttpMethod("PATCH");
    //request = new HttpRequestMessage(method, url)
    //{
    //    Content = content
    //};
    request = new HttpRequestMessage(method, url)
    {
        Content = byteContent
    };
    request.Headers.Add("Authorization", "Bearer " + accesstoken);
    var response = await client.SendAsync(request);
    response.EnsureSuccessStatusCode();
    position = position + request.Content.Headers.ContentLength;
    Array.Clear(buffer, 0, buffer.Length);
} while (numBytesToRead > 0);
stream.Close();
But streaming record by record will not be able to handle the AVRO file as a whole in one transaction; we may end up with partial success if some records fail, for example.
It would be great to have a small tool that can split AVRO files based on a threshold number of records.
The Spark-based split-by-partition technique does allow splitting a data set into a pre-defined number of files, but it does not allow splitting based on the number of records, i.e. I do not want an AVRO file with more than 500 records.
So we have to devise batching logic based on the heap size the application can comfortably handle, along with a two-phase commit to handle transactions.
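For what it's worth, the Apache.Avro C# package can read and write container files record by record, so a splitter along these lines should be possible. This is only a sketch under the assumption that the package's DataFileReader / DataFileWriter API works as I remember it (method names from memory; the helper name SplitAvro and the maxRecords parameter are mine):

using Avro.File;
using Avro.Generic;

static void SplitAvro(string inputPath, string outputPrefix, int maxRecords)
{
    using (var reader = DataFileReader<GenericRecord>.OpenReader(inputPath))
    {
        var schema = reader.GetSchema();
        int part = 0, count = 0;
        IFileWriter<GenericRecord> writer = null;
        try
        {
            while (reader.HasNext())
            {
                if (writer == null || count == maxRecords)
                {
                    writer?.Close();
                    // Each part is a complete, self-describing avro container file.
                    writer = DataFileWriter<GenericRecord>.OpenWriter(
                        new GenericDatumWriter<GenericRecord>(schema),
                        $"{outputPrefix}.part{part++}.avro");
                    count = 0;
                }
                writer.Append(reader.Next());
                count++;
            }
        }
        finally
        {
            writer?.Close();
        }
    }
}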

Can't download complete image file from skydrive using REST API

I'm working on a quick wrapper for the skydrive API in C#, but running into issues with downloading a file. For the first part of the file, everything comes through fine, but then there start to be differences in the file and shortly thereafter everything becomes null. I'm fairly sure that it's just me not reading the stream correctly.
This is the code I'm using to download the file:
public const string ApiVersion = "v5.0";
public const string BaseUrl = "https://apis.live.net/" + ApiVersion + "/";

public SkyDriveFile DownloadFile(SkyDriveFile file)
{
    string uri = BaseUrl + file.ID + "/content";
    byte[] contents = GetResponse(uri);
    file.Contents = contents;
    return file;
}

public byte[] GetResponse(string url)
{
    checkToken();
    Uri requestUri = new Uri(url + "?access_token=" + HttpUtility.UrlEncode(token.AccessToken));
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(requestUri);
    request.Method = WebRequestMethods.Http.Get;
    WebResponse response = request.GetResponse();
    Stream responseStream = response.GetResponseStream();
    byte[] contents = new byte[response.ContentLength];
    responseStream.Read(contents, 0, (int)response.ContentLength);
    return contents;
}
This is the image file I'm trying to download
And this is the image I am getting
These two images lead me to believe that I'm not waiting for the response to finish coming through, because the content-length is the same as the size of the image I'm expecting, but I'm not sure how to make my code wait for the entire response to come through or even really if that's the approach I need to take.
Here's my test code in case it's helpful
[TestMethod]
public void CanUploadAndDownloadFile()
{
    var api = GetApi();
    SkyDriveFolder folder = api.CreateFolder(null, "TestFolder", "Test Folder");
    SkyDriveFile file = api.UploadFile(folder, TestImageFile, "TestImage.png");
    file = api.DownloadFile(file);
    api.DeleteFolder(folder);

    byte[] contents = new byte[new FileInfo(TestImageFile).Length];
    using (FileStream fstream = new FileStream(TestImageFile, FileMode.Open))
    {
        fstream.Read(contents, 0, contents.Length);
    }
    using (FileStream fstream = new FileStream(TestImageFile + "2", FileMode.CreateNew))
    {
        fstream.Write(file.Contents, 0, file.Contents.Length);
    }

    Assert.AreEqual(contents.Length, file.Contents.Length);
    bool sameData = true;
    for (int i = 0; i < contents.Length && sameData; i++)
    {
        sameData = contents[i] == file.Contents[i];
    }
    Assert.IsTrue(sameData);
}
It fails at Assert.IsTrue(sameData);
This is because you don't check the return value of responseStream.Read(contents, 0, (int)response.ContentLength). Read doesn't guarantee that it will read response.ContentLength bytes; instead, it returns the number of bytes actually read. You can use a loop or Stream.CopyTo there.
Something like this:
WebResponse response = request.GetResponse();
MemoryStream m = new MemoryStream();
response.GetResponseStream().CopyTo(m);
byte[] contents = m.ToArray();
As LB already said, you need to continue to call Read() until you have read the entire stream.
Although Stream.CopyTo will copy the entire stream, it does not ensure that you read the number of bytes expected. The following method will solve this and raise an IOException if it does not read the length specified:
public static void Copy(Stream input, Stream output, long length)
{
    byte[] bytes = new byte[65536];
    long bytesRead = 0;
    int len = 0;
    while (0 != (len = input.Read(bytes, 0, Math.Min(bytes.Length, (int)Math.Min(int.MaxValue, length - bytesRead)))))
    {
        output.Write(bytes, 0, len);
        bytesRead = bytesRead + len;
    }
    output.Flush();
    if (bytesRead != length)
        throw new IOException();
}
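Applied to the GetResponse method above, that would look something like this (a sketch; it assumes the server actually sends a Content-Length header, which appears to be the case here):

WebResponse response = request.GetResponse();
using (Stream responseStream = response.GetResponseStream())
using (var m = new MemoryStream())
{
    // Copies exactly ContentLength bytes, looping internally;
    // throws IOException if the stream ends early.
    Copy(responseStream, m, response.ContentLength);
    return m.ToArray();
}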

SharpZipLib Deflater creates bad data

The original compressed data can be inflated back correctly. However, if I inflate the data, deflate it, and inflate it again, the resulting data is incorrect (the real use case is extracting data, modifying it, and compressing it again; in this test no modification occurs, so a plain round trip should work).
The resulting data is somehow "damaged": the first (roughly) 40 bytes are OK, and then a "block" of incorrect data follows (remnants of the original data are still there, but many bytes are missing).
Changing the compression level doesn't help (except that setting NO_COMPRESSION creates a somehow incomplete stream).
The question is simple: why is this happening?
using ICSharpCode.SharpZipLib.Zip.Compression;

public byte[] Inflate(byte[] inputData)
{
    Inflater inflater = new Inflater(false);
    using (var inputStream = new MemoryStream(inputData))
    using (var ms = new MemoryStream())
    {
        var inputBuffer = new byte[4096];
        var outputBuffer = new byte[4096];
        while (inputStream.Position < inputData.Length)
        {
            var read = inputStream.Read(inputBuffer, 0, inputBuffer.Length);
            inflater.SetInput(inputBuffer, 0, read);
            while (inflater.IsNeedingInput == false)
            {
                var written = inflater.Inflate(outputBuffer, 0, outputBuffer.Length);
                if (written == 0)
                    break;
                ms.Write(outputBuffer, 0, written);
            }
            if (inflater.IsFinished == true)
                break;
        }
        inflater.Reset();
        return ms.ToArray();
    }
}

public byte[] Deflate(byte[] inputData)
{
    Deflater deflater = new Deflater(Deflater.BEST_SPEED, false);
    deflater.SetInput(inputData);
    deflater.Finish();
    using (var ms = new MemoryStream())
    {
        var outputBuffer = new byte[65536 * 4];
        while (deflater.IsNeedingInput == false)
        {
            var read = deflater.Deflate(outputBuffer);
            ms.Write(outputBuffer, 0, read);
            if (deflater.IsFinished == true)
                break;
        }
        deflater.Reset();
        return ms.ToArray();
    }
}
Edit: My bad - by mistake I rewrote the first several bytes of the original compressed data. This isn't SharpZipLib's fault, but mine.
I know this is a tangential answer, but the exact same thing happened to me; I abandoned SharpZipLib and went to DotNetZip:
http://dotnetzip.codeplex.com/
It has an easier API, and I got no corrupt files or strange byte ordering.
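For the raw inflate/deflate round trip in the question, the DotNetZip equivalent would be its Ionic.Zlib streams. A minimal sketch, assuming the static CompressBuffer / UncompressBuffer helpers that I recall the library shipping (verify the exact names against the DotNetZip docs before relying on this):

using Ionic.Zlib;

// Round-trip a buffer through raw deflate and back.
// CompressBuffer/UncompressBuffer are the helper names I remember; treat them as an assumption.
byte[] original = System.Text.Encoding.UTF8.GetBytes("some test data");
byte[] compressed = DeflateStream.CompressBuffer(original);
byte[] restored = DeflateStream.UncompressBuffer(compressed);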
