High memory usage and slow upload using AWS S3 TransferUtility - c#

I have created an application using C# and the AWS .NET SDK (v3.3.106.25) to upload some database backup files to an S3 bucket. The application is currently unusable, as memory usage climbs to 100% and each upload takes progressively longer.
I am trying to upload 3 files, 2 of which are about 1.45GB and one of which is about 4MB. I am using the TransferUtility method as I understand it uses multipart uploads. I have set the part size to 16MB. Each file is uploaded consecutively. Here are some facts about the uploads:
File 1 - 4MB - upload duration 4 seconds
File 2 - 1.47GB - upload duration 11.5 minutes
File 3 - 1.45GB - upload duration 1 hour 12 minutes before killing the process as PC became unusable
I am running this on a Windows 10 machine with 16GB RAM and an Intel Core i7 CPU @ 3.40GHz.
Here is my upload code:
private async Task UploadFileAsync(string keyName, string filePath, int partSizeMB, S3StorageClass storageClass)
{
try
{
using (IAmazonS3 s3Client = new AmazonS3Client(_region))
{
var fileTransferUtility = new TransferUtility(s3Client);
var fileTransferUtilityRequest = new TransferUtilityUploadRequest
{
BucketName = _bucketName,
FilePath = filePath,
StorageClass = storageClass,
PartSize = partSizeMB * 1024 * 1024, // set to 16MB
Key = keyName,
CannedACL = S3CannedACL.Private
};
await fileTransferUtility.UploadAsync(fileTransferUtilityRequest);
}
}
catch (AmazonS3Exception e)
{
string errMsg = string.Format("Error encountered on server. Message:{0} when writing an object", e.Message);
System.Exception argEx = new System.Exception(errMsg, e.InnerException);
throw argEx;
}
catch (Exception e)
{
string errMsg = string.Format("Unknown encountered on server. Message:'{0}' when writing an object", e.Message);
System.Exception argEx = new System.Exception(errMsg, e.InnerException);
throw argEx;
}
}
This code is being called 3 times in a loop with each call awaited.
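For reference, a minimal sketch of the calling loop (the key names and paths are placeholders, not my real values):
// Hypothetical caller: each upload is awaited before the next one starts.
string[] keys = { "backups/db1.bak", "backups/db2.bak", "backups/db3.bak" };
string[] paths = { @"D:\Backups\db1.bak", @"D:\Backups\db2.bak", @"D:\Backups\db3.bak" };
for (int i = 0; i < keys.Length; i++)
{
    await UploadFileAsync(keys[i], paths[i], 16, S3StorageClass.Standard);
}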
Can anyone please suggest how I can upload these files in a more efficient manner?
Many thanks.

I have decided to abandon the high-level TransferUtility API method, as it doesn't seem fit for purpose for large files. It appears to load the whole file into memory before splitting it into parts and uploading each part, so for large files it consumes all available memory and the server can grind to a halt.
For anyone interested this is how I have solved the issue:
I now use the low-level API methods InitiateMultipartUploadAsync, UploadPartAsync and CompleteMultipartUploadAsync and manage the multipart upload myself.
The key to making this work is the use of the .NET MemoryMappedFile class and its CreateViewStream method, so that only one part of the file is read into memory at a time.
I use a Queue to track which parts still need to be uploaded and to retry any individual parts that fail.
Here is my new code:
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading;
using System.Collections.Generic;
using System.IO.MemoryMappedFiles;
using System.Linq;
using Amazon.Runtime;
public class S3Upload
{
// declarations
private readonly string _bucketName;
private readonly RegionEndpoint _region;
//event handlers
public event EventHandler<ProgressUpdatedEventArgs> OnProgressUpdated;
private bool CheckFilePath(string filePath)
{
// check the filePath exists
if (!Directory.Exists(Path.GetDirectoryName(filePath)))
{
return false;
}
if (!File.Exists(filePath))
{
return false;
}
return true;
}
public async Task UploadFileMultiPartAsync(string keyName, string filePath, string storageClass,
int partSizeMB = 16, int retryWaitInterval = 60000,
int maxRetriesOnFail = 10)
{
if (CheckFilePath(filePath))
{
long fileSize = new FileInfo(filePath).Length;
long partSize = partSizeMB * (long)Math.Pow(1024, 2);
partSize = GetPartSize(fileSize, partSize);
S3StorageClass sClass = new S3StorageClass(storageClass);
try
{
await UploadFileMultiPartAsync(keyName, filePath, fileSize, partSize, sClass, retryWaitInterval, maxRetriesOnFail);
}
catch (Exception ex)
{
throw new Exception(ex.Message, ex.InnerException);
}
}
else
{
string errMsg = string.Format("Cannot find file {0}. Check the file exists and that the application has access permissions.", filePath);
System.IO.DirectoryNotFoundException argEx = new System.IO.DirectoryNotFoundException(errMsg);
throw argEx;
}
}
private async Task UploadFileMultiPartAsync(string keyName, string filePath, long fileSize,
long partSize, S3StorageClass storageClass,
int retryWaitInterval,
int maxRetriesOnFail)
{
int retryCount = 0;
long offset = 0;
// we need to calculate the number of parts based on the fileSize and partSize
int iterations = (int)Math.Ceiling((double)fileSize / (double)partSize);
int currentIterations = iterations;
// create a queue of indexes to be processed. Indexes will be removed from this list as the
// uploads are processed. If the upload is not successful then it will be re-added to the end
// of the queue for later retry. We pause after each full loop is completed before starting the retry
Queue<int> q = new Queue<int>(Enumerable.Range(0, iterations));
// the following 2 variables store values returned from the S3 call and are persisted throughout the loop
string uploadId = "";
List<PartETag> eTags = new List<PartETag>();
// Create the memory-mapped file.
using (var mmf = MemoryMappedFile.CreateFromFile(filePath, FileMode.Open, "uploadFile"))
{
while (q.Count > 0)
{
int iPart = q.Dequeue();
offset = iPart * partSize;
long chunkSize = (offset + partSize > fileSize) ? fileSize - offset : partSize;
using (var stream = mmf.CreateViewStream(offset, chunkSize))
{
using (BinaryReader binReader = new BinaryReader(stream))
{
byte[] bytes = binReader.ReadBytes((int)stream.Length);
//convert to stream
MemoryStream mStream = new MemoryStream(bytes, false);
bool lastPart = (q.Count == 0) ? true : false;
UploadResponse response = await UploadChunk(keyName, uploadId, iPart, lastPart, mStream, eTags, iterations);
uploadId = response.uploadId;
eTags = response.eTags;
if (!response.success)
{
// the upload failed so we add the failed index to the back of the
// queue for retry later
q.Enqueue(iPart);
lastPart = false;
}
// if we have attempted an upload for every part and some have failed then we
// wait a bit then try resending the parts that failed. We try this a few times
// then give up.
if (!lastPart && iPart == currentIterations - 1)
{
if (retryCount < maxRetriesOnFail)
{
currentIterations = q.Count;
Thread.Sleep(retryWaitInterval);
retryCount += 1;
}
else
{
// reached maximum retries so we abort upload and raise error
try
{
await AbortMultiPartUploadAsync(keyName, uploadId);
string errMsg = "Multi part upload aborted. Some parts could not be uploaded. Maximum number of retries reached.";
throw new Exception(errMsg);
}
catch (Exception ex)
{
string errMsg = string.Format("Multi part upload failed. Maximum number of retries reached. Unable to abort upload. Error: {0}", ex.Message);
throw new Exception(errMsg);
}
}
}
}
}
}
}
}
private async Task AbortMultiPartUploadAsync(string keyName, string uploadId)
{
using (var _s3Client = new AmazonS3Client(_region))
{
AbortMultipartUploadRequest abortMPURequest = new AbortMultipartUploadRequest
{
BucketName = _bucketName,
Key = keyName,
UploadId = uploadId
};
await _s3Client.AbortMultipartUploadAsync(abortMPURequest);
}
}
private async Task<UploadResponse> UploadChunk(string keyName, string uploadId, int chunkIndex, bool lastPart, MemoryStream stream, List<PartETag> eTags, int numParts)
{
try
{
using (var _s3Client = new AmazonS3Client(_region))
{
var partNumber = chunkIndex + 1;
// Step 1: build and send a multi upload request
// we check uploadId == "" rather than chunkIndex == 0 as if the initiate call failed on the first run
// then chunkIndex = 0 would have been added to the end of the queue for retries and uploadId
// will still not have been initialized, even though we might be on a later chunkIndex
if (uploadId == "")
{
var initiateRequest = new InitiateMultipartUploadRequest
{
BucketName = _bucketName,
Key = keyName
};
InitiateMultipartUploadResponse initResponse = await _s3Client.InitiateMultipartUploadAsync(initiateRequest);
uploadId = initResponse.UploadId;
}
// Step 2: upload each chunk (this is run for every chunk unlike the other steps which are run once)
var uploadRequest = new UploadPartRequest
{
BucketName = _bucketName,
Key = keyName,
UploadId = uploadId,
PartNumber = partNumber,
InputStream = stream,
IsLastPart = lastPart,
PartSize = stream.Length
};
// Track upload progress.
uploadRequest.StreamTransferProgress +=
(_, e) => OnPartUploadProgressUpdate(numParts, uploadRequest, e);
UploadPartResponse uploadResponse = await _s3Client.UploadPartAsync(uploadRequest);
//Step 3: build and send the multipart complete request
if (lastPart)
{
eTags.Add(new PartETag
{
PartNumber = partNumber,
ETag = uploadResponse.ETag
});
var completeRequest = new CompleteMultipartUploadRequest
{
BucketName = _bucketName,
Key = keyName,
UploadId = uploadId,
PartETags = eTags
};
CompleteMultipartUploadResponse result = await _s3Client.CompleteMultipartUploadAsync(completeRequest);
return new UploadResponse(uploadId, eTags, true);
}
else
{
eTags.Add(new PartETag
{
PartNumber = partNumber,
ETag = uploadResponse.ETag
});
return new UploadResponse(uploadId, eTags, true);
}
}
}
catch
{
return new UploadResponse(uploadId, eTags, false);
}
}
private class UploadResponse
{
public string uploadId { get; set; }
public List<PartETag> eTags { get; set; }
public bool success { get; set; }
public UploadResponse(string Id, List<PartETag> Tags, bool succeeded)
{
uploadId = Id;
eTags = Tags;
success = succeeded;
}
}
private void OnPartUploadProgressUpdate(int numParts, UploadPartRequest request, StreamTransferProgressArgs e)
{
// Process event.
if (OnProgressUpdated != null)
{
int partIndex = request.PartNumber - 1;
int totalIncrements = numParts * 100;
int percentDone = (int)Math.Floor((double)(partIndex * 100 + e.PercentDone) / (double)totalIncrements * 100);
OnProgressUpdated(this, new ProgressUpdatedEventArgs(percentDone));
}
}
private long GetPartSize(long fileSize, long partSize)
{
// S3 multi part limits
//====================================
// min part size = 5MB
// max part size = 5GB
// total number of parts = 10,000
//====================================
if (fileSize < partSize)
{
partSize = fileSize;
}
if (partSize <= 0)
{
return Math.Min(fileSize, 16 * (long)Math.Pow(1024, 2)); // default part size to 16MB
}
if (partSize > 5000 * (long)Math.Pow(1024, 2))
{
return 5000 * (long)Math.Pow(1024, 2);
}
if (fileSize / partSize > 10000)
{
// round up so the number of parts never exceeds the 10,000 part limit
return (long)Math.Ceiling((double)fileSize / 10000);
}
return partSize;
}
}
public class ProgressUpdatedEventArgs : EventArgs
{
public ProgressUpdatedEventArgs(int iPercentDone)
{ PercentDone = iPercentDone; }
public int PercentDone { get; set; }
}

Related

Access token empty error when uploading large files to a ToDoTask using Graph Api

I am trying to attach large files to a ToDoTask using the Graph API, following the example in the docs for attaching large files to a ToDoTask and the recommended LargeFileUploadTask class for uploading large files.
I have done this successfully before when attaching large files to emails and sending them, so I used that as the base for the following method.
public async Task CreateTaskBigAttachments( string idList, string title, List<string> categories,
BodyType contentType, string content, Importance importance, bool isRemindOn, DateTime? dueTime, cAttachment[] attachments = null)
{
try
{
var _newTask = new TodoTask
{
Title = title,
Categories = categories,
Body = new ItemBody()
{
ContentType = contentType,
Content = content,
},
IsReminderOn = isRemindOn,
Importance = importance
};
if (dueTime.HasValue)
{
var _timeZone = TimeZoneInfo.Local;
_newTask.DueDateTime = DateTimeTimeZone.FromDateTime(dueTime.Value, _timeZone.StandardName);
}
var _task = await _graphServiceClient.Me.Todo.Lists[idList].Tasks.Request().AddAsync(_newTask);
//Add attachments
if (attachments != null)
{
if (attachments.Length > 0)
{
foreach (var _attachment in attachments)
{
var _attachmentContentSize = _attachment.ContentBytes.Length;
var _attachmentInfo = new AttachmentInfo
{
AttachmentType = AttachmentType.File,
Name = _attachment.FileName,
Size = _attachmentContentSize,
ContentType = _attachment.ContentType
};
var _uploadSession = await _graphServiceClient.Me
.Todo.Lists[idList].Tasks[_task.Id]
.Attachments.CreateUploadSession(_attachmentInfo).Request().PostAsync();
using (var _stream = new MemoryStream(_attachment.ContentBytes))
{
_stream.Position = 0;
LargeFileUploadTask<TaskFileAttachment> _largeFileUploadTask = new LargeFileUploadTask<TaskFileAttachment>(_uploadSession, _stream, MaxChunkSize);
try
{
await _largeFileUploadTask.UploadAsync();
}
catch (ServiceException errorGraph)
{
if (errorGraph.StatusCode == HttpStatusCode.InternalServerError || errorGraph.StatusCode == HttpStatusCode.BadGateway
|| errorGraph.StatusCode == HttpStatusCode.ServiceUnavailable || errorGraph.StatusCode == HttpStatusCode.GatewayTimeout)
{
Thread.Sleep(1000); //Wait time until next attempt
//Try again
await _largeFileUploadTask.ResumeAsync();
}
else
throw errorGraph;
}
}
}
}
}
}
catch (ServiceException errorGraph)
{
throw errorGraph;
}
catch (Exception ex)
{
throw ex;
}
}
Up to the point of creating the task everything goes well: it creates the task for the user and it is properly shown in the user's task list. It also creates the upload session properly.
The problem comes when I try to upload the large file with the UploadAsync instruction.
The following error occurs:
Code: InvalidAuthenticationToken Message: Access token is empty.
But according to the LargeFileUploadTask doc, the client does not need to set auth headers:
param name="baseClient" To use for making upload requests. The client should not set Auth headers as upload urls do not need them.
Is LargeFileUploadTask not meant to be used to upload large files to a ToDoTask?
If not, what is the proper way to upload large files to a ToDoTask using the Graph API? Can someone provide an example?
If you want, you can raise an issue with the details here so that they can have a look: https://github.com/microsoftgraph/msgraph-sdk-dotnet-core/issues.
It seems like it's a bug and they are working on it.
As a temporary workaround, I wrote the following code to deal with the large files.
var _task = await _graphServiceClient.Me.Todo.Lists[idList].Tasks.Request().AddAsync(_newTask);
//Add attachments
if (attachments != null)
{
if (attachments.Length > 0)
{
foreach (var _attachment in attachments)
{
var _attachmentContentSize = _attachment.ContentBytes.Length;
var _attachmentInfo = new AttachmentInfo
{
AttachmentType = AttachmentType.File,
Name = _attachment.FileName,
Size = _attachmentContentSize,
ContentType = _attachment.ContentType
};
var _uploadSession = await _graphServiceClient.Me
.Todo.Lists[idList].Tasks[_task.Id]
.Attachments.CreateUploadSession(_attachmentInfo).Request().PostAsync();
// Get the upload URL and the next expected range from the response
string _uploadUrl = _uploadSession.UploadUrl;
using (var _stream = new MemoryStream(_attachment.ContentBytes))
{
_stream.Position = 0;
// Create a byte array to hold the contents of each chunk
byte[] _chunk = new byte[MaxChunkSize];
//Bytes to read
int _bytesRead = 0;
//Times the stream has been read
var _ind = 0;
while ((_bytesRead = _stream.Read(_chunk, 0, _chunk.Length)) > 0)
{
// Calculate the range of the current chunk
string _currentChunkRange = $"bytes {_ind * MaxChunkSize}-{_ind * MaxChunkSize + _bytesRead - 1}/{_stream.Length}";
// Afterwards we should calculate the next expected range, in case we need it
// Create a ByteArrayContent object from the chunk
ByteArrayContent _byteArrayContent = new ByteArrayContent(_chunk, 0, _bytesRead);
// Set the header for the current chunk
_byteArrayContent.Headers.Add("Content-Range", _currentChunkRange);
_byteArrayContent.Headers.Add("Content-Type", _attachment.ContentType);
_byteArrayContent.Headers.Add("Content-Length", _bytesRead.ToString());
// Upload the chunk using the httpClient Request
var _client = new HttpClient();
var _requestMessage = new HttpRequestMessage()
{
RequestUri = new Uri(_uploadUrl + "/content"),
Method = HttpMethod.Put,
Headers =
{
{ "Authorization", bearerToken },
}
};
_requestMessage.Content = _byteArrayContent;
var _response = await _client.SendAsync(_requestMessage);
if (!_response.IsSuccessStatusCode)
throw new Exception("File attachment failed");
_ind++;
}
}
}
}
}

DeleteMessageAsync() not deleting message in SQS queue .Net Core

I am trying to delete a message in an SQS queue, but it is not being deleted from the queue. I have tried a lot of changes, but it is still not working. I am new to C#, .NET Core, and AWS. Can anyone please help me with this?
Here is my main method:
[HttpGet]
public async Task<ReceiveMessageResponse> Get()
{
ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest
{
WaitTimeSeconds = 3 // poll the queue for up to 3 seconds; if I don't do this, sometimes I receive a message and sometimes I don't
};
receiveMessageRequest.QueueUrl = myQueueUrl;
receiveMessageRequest.MaxNumberOfMessages = 10; // can change number of messages as needed
//receiveing messages/responses
var receiveMessageResponse = await amazonSQSClient.ReceiveMessageAsync(receiveMessageRequest);
if (receiveMessageResponse.Messages.Count > 0){
var bucketName = getBucketName(receiveMessageResponse);
var objectKey = getObjectKey(receiveMessageResponse);
var versionId = getVersionId(receiveMessageResponse);
string filePath = "C:\\InputPdfFile\\"; // change it later
string path = filePath + objectKey;
//get the file from the s3 bucket and download it
var downloadInputFile = await DownloadAsync(path, versionId, objectKey);
//Get score from the output file
string jsonOutputFileName = "\\file-1.txt"; //change it later from text file to json file
string jsonOutputPath = "C:\\OutputJsonFile"; //change it later
string jasonArchivePath = "C:\\ArchiveJsonFile"; //change it later
int score = GetOutputScore(jsonOutputPath, jsonOutputFileName);
//update metadata from the score received from ML worker (GetOutputScore)
PutObjectResponse putObjectResponse = await UpdateMetadataAsync(score);
//Move file from output to archive after updating metadata
string sourceFile = jsonOutputPath + jsonOutputFileName;
string destFile = jasonArchivePath + jsonOutputFileName;
if (!Directory.Exists(jasonArchivePath))
{
Directory.CreateDirectory(jasonArchivePath);
}
System.IO.File.Move(sourceFile, destFile);
//delete message after moving file from archive
DeleteMessage(receiveMessageResponse); //not sure why it is not deleting
}
return receiveMessageResponse;
}
Here is my Delete method:
public async void DeleteMessage(ReceiveMessageResponse receiveMessageResponse)
{
if (receiveMessageResponse.Messages.Count > 0)
{
foreach (var message in receiveMessageResponse.Messages)
{
var delRequest = new DeleteMessageRequest
{
QueueUrl = myQueueUrl,
ReceiptHandle = message.ReceiptHandle
};
var deleteMessage = await amazonSQSClient.DeleteMessageAsync(delRequest);
}
}
else // It does not go into the else branch because a message was found, but the message is still not deleted
{
Console.WriteLine("No message found");
}
}
Any help would be greatly appreciated!
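One thing worth checking (a hedged suggestion, not from the original post): DeleteMessage is declared async void and is never awaited, so any exception thrown by DeleteMessageAsync is swallowed and the delete request may still be in flight when the action returns. A sketch of the same method returning a Task so it can be awaited, assuming the same amazonSQSClient and myQueueUrl:
public async Task DeleteMessagesAsync(ReceiveMessageResponse receiveMessageResponse)
{
    foreach (var message in receiveMessageResponse.Messages)
    {
        var delRequest = new DeleteMessageRequest
        {
            QueueUrl = myQueueUrl,
            ReceiptHandle = message.ReceiptHandle
        };
        // Awaiting surfaces errors (for example an expired receipt handle) instead of losing them in async void.
        await amazonSQSClient.DeleteMessageAsync(delRequest);
    }
}
The controller action would then call: await DeleteMessagesAsync(receiveMessageResponse);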

Parse WebCacheV01.dat in C#

I'm looking to parse the WebCacheV01.dat file using C# to find the last file location for upload in an Internet browser.
%LocalAppData%\Microsoft\Windows\WebCache\WebCacheV01.dat
I am using the Managed Esent NuGet package:
Esent.Isam
Esent.Interop
When I try and run the below code it fails at:
Api.JetGetDatabaseFileInfo(filePath, out pageSize, JET_DbInfo.PageSize);
Or if I use
Api.JetSetSystemParameter(instance, JET_SESID.Nil, JET_param.CircularLog, 1, null);
at
Api.JetAttachDatabase(sesid, filePath, AttachDatabaseGrbit.ReadOnly);
I get the following error:
An unhandled exception of type
'Microsoft.Isam.Esent.Interop.EsentFileAccessDeniedException' occurred
in Esent.Interop.dll
Additional information: Cannot access file, the file is locked or in use
string localAppDataPath = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);
string filePathExtra = @"\Microsoft\Windows\WebCache\WebCacheV01.dat";
string filePath = string.Format("{0}{1}", localAppDataPath, filePathExtra);
JET_INSTANCE instance;
JET_SESID sesid;
JET_DBID dbid;
JET_TABLEID tableid;
String connect = "";
JET_SNP snp;
JET_SNT snt;
object data;
int numInstance = 0;
JET_INSTANCE_INFO [] instances;
int pageSize;
JET_COLUMNDEF columndef = new JET_COLUMNDEF();
JET_COLUMNID columnid;
Api.JetCreateInstance(out instance, "instance");
Api.JetGetDatabaseFileInfo(filePath, out pageSize, JET_DbInfo.PageSize);
Api.JetSetSystemParameter(JET_INSTANCE.Nil, JET_SESID.Nil, JET_param.DatabasePageSize, pageSize, null);
//Api.JetSetSystemParameter(instance, JET_SESID.Nil, JET_param.CircularLog, 1, null);
Api.JetInit(ref instance);
Api.JetBeginSession(instance, out sesid, null, null);
//Do stuff in db
Api.JetEndSession(sesid, EndSessionGrbit.None);
Api.JetTerm(instance);
Is it not possible to read this without making modifications?
Viewer
http://www.nirsoft.net/utils/ese_database_view.html
Python
https://jon.glass/attempts-to-parse-webcachev01-dat/
libesedb
impacket
Issue:
The file is probably in use.
Solution:
In order to free the locked file, stop the Scheduled Task \Microsoft\Windows\Wininet\CacheTask.
The Code
public override IEnumerable<string> GetBrowsingHistoryUrls(FileInfo fileInfo)
{
var fileName = fileInfo.FullName;
var results = new List<string>();
try
{
int pageSize;
Api.JetGetDatabaseFileInfo(fileName, out pageSize, JET_DbInfo.PageSize);
SystemParameters.DatabasePageSize = pageSize;
using (var instance = new Instance("Browsing History"))
{
var param = new InstanceParameters(instance);
param.Recovery = false;
instance.Init();
using (var session = new Session(instance))
{
Api.JetAttachDatabase(session, fileName, AttachDatabaseGrbit.ReadOnly);
JET_DBID dbid;
Api.JetOpenDatabase(session, fileName, null, out dbid, OpenDatabaseGrbit.ReadOnly);
using (var tableContainers = new Table(session, dbid, "Containers", OpenTableGrbit.ReadOnly))
{
IDictionary<string, JET_COLUMNID> containerColumns = Api.GetColumnDictionary(session, tableContainers);
if (Api.TryMoveFirst(session, tableContainers))
{
do
{
var retrieveColumnAsInt32 = Api.RetrieveColumnAsInt32(session, tableContainers, containerColumns["ContainerId"]);
if (retrieveColumnAsInt32 != null)
{
var containerId = (int)retrieveColumnAsInt32;
using (var table = new Table(session, dbid, "Container_" + containerId, OpenTableGrbit.ReadOnly))
{
var tableColumns = Api.GetColumnDictionary(session, table);
if (Api.TryMoveFirst(session, table))
{
do
{
var url = Api.RetrieveColumnAsString(
session,
table,
tableColumns["Url"],
Encoding.Unicode);
var downloadedFileName = Api.RetrieveColumnAsString(
session,
table,
tableColumns["Filename"]);
if(string.IsNullOrEmpty(downloadedFileName)) // check for download history only.
continue;
// Order by access Time to find the last uploaded file.
var accessedTime = Api.RetrieveColumnAsInt64(
session,
table,
tableColumns["AccessedTime"]);
var lastVisitTime = accessedTime.HasValue ? DateTime.FromFileTimeUtc(accessedTime.Value) : DateTime.MinValue;
results.Add(url);
}
while (Api.TryMoveNext(session, table.JetTableid));
}
}
}
} while (Api.TryMoveNext(session, tableContainers));
}
}
}
}
}
catch (Exception ex)
{
// log goes here....
}
return results;
}
Utils
Task Scheduler Wrapper
You can use the Microsoft.Win32.TaskScheduler.TaskService wrapper to stop it from C#; just add this NuGet package: https://taskscheduler.codeplex.com/
Usage
public static FileInfo CopyLockedFileRtl(DirectoryInfo directory, FileInfo fileInfo, string remoteEndPoint)
{
FileInfo copiedFileInfo = null;
using (var ts = new TaskService(string.Format(@"\\{0}", remoteEndPoint)))
{
var task = ts.GetTask(@"\Microsoft\Windows\Wininet\CacheTask");
task.Stop();
task.Enabled = false;
var byteArray = FileHelper.ReadOnlyAllBytes(fileInfo);
var filePath = Path.Combine(directory.FullName, "unlockedfile.dat");
File.WriteAllBytes(filePath, byteArray);
copiedFileInfo = new FileInfo(filePath);
task.Enabled = true;
task.Run();
task.Dispose();
}
return copiedFileInfo;
}
I was not able to get Adam's answer to work. What worked for me was making a copy with AlphaVSS (a .NET class library that has a managed API for the Volume Shadow Copy Service). The file was in "Dirty Shutdown" state, so I additionally wrote this to handle the exception it threw when I opened it:
catch (EsentErrorException ex)
{ // Usually after the database is copied, it's in Dirty Shutdown state
// This can be verified by running "esentutl.exe /Mh WebCacheV01.dat"
logger.Info(ex.Message);
switch (ex.Error)
{
case JET_err.SecondaryIndexCorrupted:
logger.Info("Secondary Index Corrupted detected, exiting...");
Api.JetTerm2(instance, TermGrbit.Complete);
return false;
case JET_err.DatabaseDirtyShutdown:
logger.Info("Dirty shutdown detected, attempting to recover...");
try
{
Api.JetTerm2(instance, TermGrbit.Complete);
Process.Start("esentutl.exe", "/p /o " + newPath);
Thread.Sleep(5000);
Api.JetInit(ref instance);
Api.JetBeginSession(instance, out sessionId, null, null);
Api.JetAttachDatabase(sessionId, newPath, AttachDatabaseGrbit.None);
}
catch (Exception e2)
{
logger.Info("Could not recover database " + newPath + ", will try opening it one last time. If that doesn't work, try using other esentutl commands", e2);
}
break;
}
}
I'm thinking about using the 'Recent Items' folder, as when you select a file to upload, an entry is written here:
C:\Users\USER\AppData\Roaming\Microsoft\Windows\Recent
string recent = (Environment.GetFolderPath(Environment.SpecialFolder.Recent));
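For example, a minimal sketch (my own assumption, not tested against every browser) that lists the most recently written shortcuts in that folder:
// Sketch: list the newest entries in the Recent Items folder.
// The .lnk files are shortcuts; resolving their targets would need extra work (e.g. Shell/COM APIs).
string recentFolder = Environment.GetFolderPath(Environment.SpecialFolder.Recent);
var latest = new DirectoryInfo(recentFolder)
    .GetFiles("*.lnk")
    .OrderByDescending(f => f.LastWriteTimeUtc)
    .Take(10);
foreach (var shortcut in latest)
{
    Console.WriteLine($"{shortcut.LastWriteTimeUtc:u}  {shortcut.Name}");
}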

Which way is better? Save a media file to MongoDB as array of bytes or as string?

I'm saving media files (pictures, PDFs, etc.) in MongoDB as arrays of bytes. I have seen examples where people save them by encoding and decoding the byte array to and from a string. What is the difference? Is there perhaps a difference in performance? Which way is better?
I've noticed that when a file is saved as an array of bytes, Mongo Management Studio takes longer to open the collection than when it is saved as a string.
I assume that you want to store the file inside a document.
But have you considered using GridFS versus storing the file inside the document?
As Liam pointed out, MongoDB provides a blog post on GridFS considerations here.
One of the advantages in a project I'm working on is that no checking of file sizes has to be done, and you can simply write and read the file as a binary stream.
From a performance perspective, saving and retrieving the file in binary form is faster than first serializing it to a string.
In a test program, running against a MongoDb 3.2 database, saving a file in binary form in a document was up to 3 times faster than saving the file in a string-serialized form. Which is understandable, since the string-serialized form is simply 'more bytes' to save or read.
In the same test program a quick test was also performed against GridFS, but there you really have to play around with the chunk size to get the best possible performance.
Below is a code dump for a very crude test program (note that you have to provide a suitable example.jpg yourself and that the database connection is hard-coded):
class Program
{
static bool keepRunning;
static string fileName = "example.jpg";
static int numDocs = 571;
static IMongoDatabase mongoDb;
static void Main(string[] args)
{
Console.CancelKeyPress += delegate
{
Exit();
};
keepRunning = true;
SetupMongoDb();
var fileBytes = File.ReadAllBytes(fileName);
Console.WriteLine($"Picturesize in bytes: {fileBytes.Length}");
ClearCollections();
Console.WriteLine($"Saving {numDocs} pictures to the database.");
Console.WriteLine("\nStart Saving in Binary Mode.");
Stopwatch binaryStopWatch = Stopwatch.StartNew();
SaveBinaryBased(numDocs, fileBytes);
binaryStopWatch.Stop();
Console.WriteLine("Done Saving in Binary Mode.");
Console.WriteLine("\nStart Saving in String-based Mode.");
Stopwatch stringStopWatch = Stopwatch.StartNew();
SaveStringBased(numDocs, fileBytes);
stringStopWatch.Stop();
Console.WriteLine("Done Saving in String-based Mode.");
Console.WriteLine("\nTime Report Saving");
Console.WriteLine($" * Total Time Binary for {numDocs} records: {binaryStopWatch.ElapsedMilliseconds} ms.");
Console.WriteLine($" * Total Time String for {numDocs} records: {stringStopWatch.ElapsedMilliseconds} ms.");
Console.WriteLine("\nCollection Statistics:");
Statistics("binaryPics");
Statistics("stringBasedPics");
Console.WriteLine("\nTest Retrieval:");
Console.WriteLine("\nStart Retrieving from binary collection.");
binaryStopWatch.Restart();
RetrieveBinary();
binaryStopWatch.Stop();
Console.WriteLine("Done Retrieving from binary collection.");
Console.WriteLine("\nStart Retrieving from string-based collection.");
stringStopWatch.Restart();
RetrieveString();
stringStopWatch.Stop();
Console.WriteLine("Done Retrieving from string-based collection.");
Console.WriteLine("\nTime Report Retrieving:");
Console.WriteLine($" * Total Time Binary for retrieving {numDocs} records: {binaryStopWatch.ElapsedMilliseconds} ms.");
Console.WriteLine($" * Total Time String for retrieving {numDocs} records: {stringStopWatch.ElapsedMilliseconds} ms.");
ClearGridFS();
Console.WriteLine($"\nStart saving {numDocs} files to GridFS:");
binaryStopWatch.Restart();
SaveFilesToGridFS(numDocs, fileBytes);
binaryStopWatch.Stop();
Console.WriteLine($"Saved {numDocs} files to GridFS in {binaryStopWatch.ElapsedMilliseconds} ms.");
Console.WriteLine($"\nStart retrieving {numDocs} files from GridFS:");
binaryStopWatch.Restart();
RetrieveFromGridFS();
binaryStopWatch.Stop();
Console.WriteLine($"Retrieved {numDocs} files from GridFS in {binaryStopWatch.ElapsedMilliseconds} ms.");
while (keepRunning)
{
Thread.Sleep(500);
}
}
private static void Exit()
{
keepRunning = false;
}
private static void ClearCollections()
{
var collectionBin = mongoDb.GetCollection<BsonDocument>("binaryPics");
var collectionString = mongoDb.GetCollection<BsonDocument>("stringBasedPics");
collectionBin.DeleteMany(new BsonDocument());
collectionString.DeleteMany(new BsonDocument());
}
private static void SetupMongoDb()
{
string hostName = "localhost";
int portNumber = 27017;
string databaseName = "exampleSerialization";
var clientSettings = new MongoClientSettings()
{
Server = new MongoServerAddress(hostName, portNumber),
MinConnectionPoolSize = 1,
MaxConnectionPoolSize = 1500,
ConnectTimeout = new TimeSpan(0, 0, 30),
SocketTimeout = new TimeSpan(0, 1, 30),
WaitQueueTimeout = new TimeSpan(0, 1, 0)
};
mongoDb = new MongoClient(clientSettings).GetDatabase(databaseName);
}
private static void SaveBinaryBased(int numDocuments, byte[] content)
{
var collection = mongoDb.GetCollection<BsonDocument>("binaryPics");
BsonDocument baseDoc = new BsonDocument();
baseDoc.SetElement(new BsonElement("jpgContent", content));
for (int i = 0; i < numDocs; ++i)
{
baseDoc.SetElement(new BsonElement("_id", Guid.NewGuid()));
baseDoc.SetElement(new BsonElement("filename", fileName));
baseDoc.SetElement(new BsonElement("title", $"picture number {i}"));
collection.InsertOne(baseDoc);
}
}
private static void SaveStringBased(int numDocuments, byte[] content)
{
var collection = mongoDb.GetCollection<BsonDocument>("stringBasedPics");
BsonDocument baseDoc = new BsonDocument();
baseDoc.SetElement(new BsonElement("jpgStringContent", System.Text.Encoding.UTF8.GetString(content)));
for (int i = 0; i < numDocs; ++i)
{
baseDoc.SetElement(new BsonElement("_id", Guid.NewGuid()));
baseDoc.SetElement(new BsonElement("filename", fileName));
baseDoc.SetElement(new BsonElement("title", $"picture number {i}"));
collection.InsertOne(baseDoc);
}
}
private static void Statistics(string collectionName)
{
new BsonDocument { { "collstats", collectionName } };
var command = new BsonDocumentCommand<BsonDocument>(new BsonDocument { { "collstats", collectionName } });
var stats = mongoDb.RunCommand(command);
Console.WriteLine($" * Collection : {collectionName}");
Console.WriteLine($" * Count : {stats["count"].AsInt32} documents");
Console.WriteLine($" * Average Doc Size: {stats["avgObjSize"].AsInt32} bytes");
Console.WriteLine($" * Total Storage : {stats["storageSize"].AsInt32} bytes");
Console.WriteLine("\n");
}
private static void RetrieveBinary()
{
var collection = mongoDb.GetCollection<BsonDocument>("binaryPics");
var docs = collection.Find(new BsonDocument()).ToEnumerable();
foreach (var doc in docs)
{
byte[] fileArray = doc.GetElement("jpgContent").Value.AsByteArray;
// we can simulate that we do something with the results but that's not the purpose of this experiment
fileArray = null;
}
}
private static void RetrieveString()
{
var collection = mongoDb.GetCollection<BsonDocument>("stringBasedPics");
var docs = collection.Find(new BsonDocument()).ToEnumerable();
foreach (var doc in docs)
{
// Simply get the string, we don't want to hit the performance test
// with a conversion to a byte array
string result = doc.GetElement("jpgStringContent").Value.AsString;
}
}
private static void SaveFilesToGridFS(int numFiles, byte[] content)
{
var bucket = new GridFSBucket(mongoDb, new GridFSBucketOptions
{
BucketName = "pictures"
});
for (int i = 0; i < numFiles; ++i)
{
string targetFileName = $"{fileName.Substring(0, fileName.Length - ".jpg".Length)}{i}.jpg";
int chunkSize = content.Length <= 1048576 ? 51200 : 1048576;
bucket.UploadFromBytes(targetFileName, content, new GridFSUploadOptions { ChunkSizeBytes = chunkSize });
}
}
private static void ClearGridFS()
{
var bucket = new GridFSBucket(mongoDb, new GridFSBucketOptions { BucketName = "pictures" });
bucket.Drop();
}
private static void RetrieveFromGridFS()
{
var bucket = new GridFSBucket(mongoDb, new GridFSBucketOptions { BucketName = "pictures" });
var filesIds = mongoDb.GetCollection<BsonDocument>("pictures.files").Find(new BsonDocument()).ToEnumerable().Select(doc => doc.GetElement("_id").Value);
foreach (var id in filesIds)
{
var fileBytes = bucket.DownloadAsBytes(id);
fileBytes = null;
}
}
}

C# Task.WaitAll isn't waiting

My aim is to download images from an Amazon Web Services bucket.
I have the following code function which downloads multiple images at once:
public static void DownloadFilesFromAWS(string bucketName, List<string> imageNames)
{
int batchSize = 50;
int maxDownloadMilliseconds = 10000;
List<Task> tasks = new List<Task>();
for (int i = 0; i < imageNames.Count; i++)
{
string imageName = imageNames[i];
Task task = Task.Run(() => GetFile(bucketName, imageName));
tasks.Add(task);
if (tasks.Count > 0 && tasks.Count % batchSize == 0)
{
Task.WaitAll(tasks.ToArray(), maxDownloadMilliseconds);//wait to download
tasks.Clear();
}
}
//if there are any left, wait for them
Task.WaitAll(tasks.ToArray(), maxDownloadMilliseconds);
}
private static void GetFile(string bucketName, string filename)
{
try
{
using (AmazonS3Client awsClient = new AmazonS3Client(Amazon.RegionEndpoint.EUWest1))
{
string key = Path.GetFileName(filename);
GetObjectRequest getObjectRequest = new GetObjectRequest() {
BucketName = bucketName,
Key = key
};
using (GetObjectResponse response = awsClient.GetObject(getObjectRequest))
{
string directory = Path.GetDirectoryName(filename);
if (!Directory.Exists(directory))
{
Directory.CreateDirectory(directory);
}
if (!File.Exists(filename))
{
response.WriteResponseStreamToFile(filename);
}
}
}
}
catch (AmazonS3Exception amazonS3Exception)
{
if (amazonS3Exception.ErrorCode == "NoSuchKey")
{
return;
}
if (amazonS3Exception.ErrorCode != null && (amazonS3Exception.ErrorCode.Equals("InvalidAccessKeyId") || amazonS3Exception.ErrorCode.Equals("InvalidSecurity")))
{
// Log AWS invalid credentials
throw new ApplicationException("AWS Invalid Credentials");
}
else
{
// Log generic AWS exception
throw new ApplicationException("AWS Exception: " + amazonS3Exception.Message);
}
}
catch
{
//
}
}
The downloading of the images all works fine, but Task.WaitAll seems to be ignored and the rest of the code continues to execute, meaning I try to access files that do not yet exist (as they have not yet been downloaded).
I found this answer to another question which seems to be the same as mine. I tried to use the answer to change my code but it still wouldn't wait for all files to be downloaded.
Can anyone tell me where I am going wrong?
The code behaves as expected. Task.WaitAll returns after ten seconds even when not all files have been downloaded, because you have specified a timeout of 10 seconds (10000 milliseconds) in variable maxDownloadMilliseconds.
If you really want to wait for all downloads to finish, call Task.WaitAll without specifying a timeout.
Use
Task.WaitAll(tasks.ToArray());//wait to download
in both places.
To see some good explanations on how to implement parallel downloads while not stressing the system (only have a maximum number of parallel downloads), see the answer at How can I limit Parallel.ForEach?
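For illustration, here is a minimal sketch of one way to cap the number of concurrent downloads using a SemaphoreSlim; the limit of 10 is arbitrary, and GetFile is the method from the question:
// Sketch: throttle concurrent downloads instead of batching with WaitAll.
public static async Task DownloadFilesThrottledAsync(string bucketName, List<string> imageNames, int maxParallel = 10)
{
    using (var throttle = new SemaphoreSlim(maxParallel))
    {
        var tasks = imageNames.Select(async imageName =>
        {
            await throttle.WaitAsync();
            try
            {
                // GetFile is synchronous in the question, so Task.Run keeps the behaviour comparable.
                await Task.Run(() => GetFile(bucketName, imageName));
            }
            finally
            {
                throttle.Release();
            }
        }).ToList();
        await Task.WhenAll(tasks); // completes only when every download has finished
    }
}
A caller would then await DownloadFilesThrottledAsync(bucketName, imageNames) instead of using the batch-and-wait loop.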
