I am trying to pass large (500-1000MB) files from a SharePoint web through multiple sites. My problem is that the entire file ends up in memory, which is obviously bad. When requesting a 500MB file, I see memory allocation across the entire system rise by 1-1.5GB.
The setup is as follows:
A client requests a file from Site A. Site A then requests this file from Site B, which lastly requests this file from a SharePoint web. Relevant methods on Site A and B are implemented as handlers (.ASHX).
ProcessRequest - Site A
public void ProcessRequest(HttpContext context)
{
using (var client = new WebClient())
{
using (var stream = client.OpenRead( ... url to Site B ... ))
{
if (stream != null)
{
... header setting omitted for clarity ...
stream.CopyTo(context.Response.OutputStream);
}
}
}
}
ProcessRequest - Site B
public void ProcessRequest(HttpContext context)
{
using (var stream = Resolve<IDocumentService>().StreamDocumentFromSP( ... parameters to get correct file ...))
{
if (stream != null)
stream.CopyTo(context.Response.OutputStream);
}
}
StreamDocumentFromSP - Site B
public Stream StreamDocumentFromSP(... parameters to get correct file ...)
{
SPFile tempFile = null;
SPSecurity.RunWithElevatedPrivileges(() =>
{
using (var site = new SPSite(url))
{
using (var web = site.OpenWeb())
{
tempFile = web.GetFile(itemGuid);
}
}
});
return tempFile.OpenBinaryStream();
}
CopyTo
public static void CopyTo(this Stream input, Stream output, int bufferSize)
{
if (!input.CanRead) throw new InvalidOperationException("input must be open for reading");
if (!output.CanWrite) throw new InvalidOperationException("output must be open for writing");
var buf = new[] {new byte[bufferSize], new byte[bufferSize]};
var bufl = new[] {0, 0};
var bufno = 0;
var read = input.BeginRead(buf[bufno], 0, buf[bufno].Length, null, null);
IAsyncResult write = null;
while (true)
{
// Wait for the read operation to complete
read.AsyncWaitHandle.WaitOne();
bufl[bufno] = input.EndRead(read);
// If zero bytes read, the copy is complete
if (bufl[bufno] == 0)
break;
// Wait for the in-flight write operation, if one exists, to complete
// The only time one won't exist is after the very first read operation completes
if (write != null)
{
write.AsyncWaitHandle.WaitOne();
output.EndWrite(write);
}
// Start the new write operation
write = output.BeginWrite(buf[bufno], 0, bufl[bufno], null, null);
// Toggle the current, in-use buffer and start the read operation on the new buffer.
bufno ^= 1; // Faster than: bufno = bufno == 0 ? 1 : 0;
read = input.BeginRead(buf[bufno], 0, buf[bufno].Length, null, null);
}
// Wait for the final in-flight write operation, if one exists, to complete
// The only time one won't exist is if the input stream is empty.
if (write != null)
{
write.AsyncWaitHandle.WaitOne();
output.EndWrite(write);
}
output.Flush();
}
Ideally, memory allocation should not rise by much more than the buffer size in my CopyTo method. How can I achieve this?
I ended up solving my problem by flushing the output stream manually after each cycle in my CopyTo method. Then, in ProcessRequest for both sites A and B, I set:
context.Response.BufferOutput = false;
And that did the trick. I was unaware that HTTP handlers buffer the entire response stream by default.
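For reference, a minimal sketch of what the combined fix looks like for Site A, assuming a 64 KB buffer; the Site B URL is a placeholder and this is an illustration, not the exact production code:

// Sketch of the fix: disable response buffering, then copy in chunks
// and flush each one so nothing accumulates in memory.
public void ProcessRequest(HttpContext context)
{
    // Disable ASP.NET's whole-response buffering so each chunk is sent
    // to the client as soon as it is written, instead of the full
    // 500-1000MB file accumulating in memory first.
    context.Response.BufferOutput = false;

    using (var client = new WebClient())
    using (var stream = client.OpenRead("... url to Site B ..."))
    {
        if (stream == null) return;

        var buffer = new byte[65536];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            context.Response.OutputStream.Write(buffer, 0, read);
            // Flush after each cycle so the chunk leaves the process
            // immediately rather than being retained by the pipeline.
            context.Response.OutputStream.Flush();
        }
    }
}

With both handlers set up this way, steady-state memory use should stay in the neighborhood of the buffer size rather than the file size.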
Related
I am trying to download files from a SharePoint library using the client object model. I seem to be able to access the files using OpenBinaryStream() and then executing the query, but when I try to access the stream, it is a stream of Length = 0. I've seen many examples and I've tried several, but I can't get the files to download. I've uploaded successfully, and credentials and permissions aren't the problem. Anyone have any thoughts?
public SharepointFileContainer DownloadFolder(bool includeSubfolders, params object[] path)
{
try
{
List<string> pathStrings = new List<string>();
foreach (object o in path)
pathStrings.Add(o.ToString());
var docs = _context.Web.Lists.GetByTitle(Library);
_context.Load(docs);
_context.ExecuteQuery();
var rootFolder = docs.RootFolder;
_context.Load(rootFolder);
_context.ExecuteQuery();
var folder = GetFolder(rootFolder, pathStrings);
var files = folder.Files;
_context.Load(files);
_context.ExecuteQuery();
SharepointFileContainer remoteFiles = new SharepointFileContainer();
foreach (Sharepoint.File f in files)
{
_context.Load(f);
var file = f.OpenBinaryStream();
_context.ExecuteQuery();
var memoryStream = new MemoryStream();
file.Value.CopyTo(memoryStream);
remoteFiles.Files.Add(f.Name, memoryStream);
}
...
}
SharepointFileContainer is just a custom class for my calling application to dispose of the streams when it has finished processing them. GetFolder is a recursive method to drill down the given folder path. I've had problems with providing the direct url and have had the most success with this.
My big question is why "file.Value" is a Stream with a Length == 0?
Thanks in advance!
EDIT:
Thanks for your input so far... unfortunately I'm experiencing the same problem. Both solutions pitched make use of OpenBinaryDirect, but I'm still getting a file with 0 bytes downloaded from the resulting FileInformation's Stream.
You need to get the list item of the file (as a ListItem object) and then use its File property. Something like:
//...
// Previous code
//...
var docs = clientContext.Web.Lists.GetByTitle(Library);
var listItem = docs.GetItemById(listItemId);
clientContext.Load(listItem, i => i.File);
clientContext.ExecuteQuery();
var fileRef = listItem.File.ServerRelativeUrl;
var fileInfo = Microsoft.SharePoint.Client.File.OpenBinaryDirect(clientContext, fileRef);
var fileName = Path.Combine(filePath,(string)listItem.File.Name);
using (var fileStream = System.IO.File.Create(fileName))
{
fileInfo.Stream.CopyTo(fileStream);
}
After that you do whatever you need to do with the stream. The snippet above just saves it to the specified path, but you could also stream it to the browser, etc.
We can use the following code to get the memory stream.
var fileInformation = Microsoft.SharePoint.Client.File.OpenBinaryDirect(clientContext, file.ServerRelativeUrl);
if (fileInformation != null && fileInformation.Stream != null)
{
using (MemoryStream memoryStream = new MemoryStream())
{
byte[] buffer = new byte[32768];
int bytesRead;
do
{
bytesRead = fileInformation.Stream.Read(buffer, 0, buffer.Length);
memoryStream.Write(buffer, 0, bytesRead);
} while (bytesRead != 0);
memoryStream.Seek(0, SeekOrigin.Begin); // rewind before consuming the stream
}
}
Reference: https://praveenkasireddy.wordpress.com/2012/11/11/download-document-from-document-set-using-client-object-model-om/
I have a REST GET API written with the WCF library that returns a Stream for a specific requested file located on the server hosting the web service application. The service works well if the requested file is small, i.e. less than 100 MB. But if the file is greater than 100 MB, the service returns 0 bytes without any information logged where I could catch it (say, in the "catch" block).
The library method (in the class library project) that returns the Stream of the needed file is:
public Stream GetFile(string fileId, string seekStartPosition=null)
{
_lastActionResult = string.Empty;
Stream fileStream = null;
try
{
Guid fileGuid;
if (Guid.TryParse(fileId, out fileGuid) == false)
{
_lastActionResult = string.Format(ErrorMessage.FileIdInvalidT, fileId);
}
else
{
ContentPackageItemService contentItemService = new ContentPackageItemService();
string filePath = DALCacheHelper.GetFilePath(fileId);
if (File.Exists(filePath))
{
fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read);
long seekStart = 0;
// if seek position is specified, move the stream pointer to that location
if (string.IsNullOrEmpty(seekStartPosition) == false && long.TryParse(seekStartPosition, out seekStart))
{
// make sure seek position is smaller than file size
FileInfo fi = new FileInfo(filePath);
if (seekStart >= 0 && seekStart < fi.Length)
{
fileStream.Seek(seekStart, SeekOrigin.Begin);
}
else
{
_lastActionResult = string.Format(ErrorMessage.FileSeekInvalidT, seekStart, fi.Length);
}
}
}
else
{
_lastActionResult = string.Format(ErrorMessage.FileNotFoundT, fileId);
Logger.Write(_lastActionResult,
"General", 1, Constants.LogId.RESTSync, System.Diagnostics.TraceEventType.Error, System.Reflection.MethodBase.GetCurrentMethod().Name);
}
}
}
catch(Exception ex)
{
Logger.Write(ex,"General", 1, Constants.LogId.RESTSync, System.Diagnostics.TraceEventType.Error, System.Reflection.MethodBase.GetCurrentMethod().Name);
}
return fileStream;
}
API method in the project where the .svc file is:
[WebGet(UriTemplate = "files/{fileid}")]
public Stream GetFile(string fileid)
{
ContentHandler handler = new ContentHandler();
Stream fileStream = null;
try
{
fileStream = handler.GetFile(fileid);
}
catch (Exception ex)
{
Logger.Write(string.Format("{0} {1}", ex.Message, ex.StackTrace), "General", 1, Constants.LogId.RESTSync, System.Diagnostics.TraceEventType.Error, System.Reflection.MethodBase.GetCurrentMethod().Name);
throw new WebFaultException<ErrorResponse>(new ErrorResponse(HttpStatusCode.InternalServerError, ex.Message), HttpStatusCode.InternalServerError);
}
if (fileStream == null)
{
throw new WebFaultException<ErrorResponse>(new ErrorResponse(handler.LastActionResult), HttpStatusCode.InternalServerError);
}
return fileStream;
}
As you are using REST, I presume you are using WebHttpBinding. You need to set MaxReceivedMessageSize on the client binding to be large enough for the maximum expected response size; the default is 64 KB. Here's the MSDN documentation for the property if you are creating your binding in code. If you are creating your binding in your app.config, then this is the documentation you need.
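For illustration, assuming the client binding is defined in app.config (the binding name below is a placeholder), the quota could be raised like this:

<system.serviceModel>
  <bindings>
    <webHttpBinding>
      <!-- maxReceivedMessageSize is in bytes; set it above the largest
           expected file. transferMode="Streamed" keeps the response from
           being buffered in memory as one piece. -->
      <binding name="largeFileBinding"
               maxReceivedMessageSize="2147483647"
               transferMode="Streamed" />
    </webHttpBinding>
  </bindings>
</system.serviceModel>

The endpoint then needs to reference the binding via bindingConfiguration="largeFileBinding".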
I have an application that downloads files from a Unix FTP server. It works fine; it just has this performance problem: files whose size is <= 1K take on average between 2084 and 2400 milliseconds to download, while applications like FileZilla download the same files in less than 1 second each.
Maybe this time is OK for some average users, but it is not acceptable for my application, since I need to download THOUSANDS of files.
I have optimized the code as much as I could:
- The cache and the buffer used to read the content are created once, in the constructor of the class.
- I create the network credentials once and re-use them on every file download. I know this is working, since the first file takes about 7 s to download and all subsequent downloads are in the range of 2 s.
- I changed the size of the buffer from 2K up to 32K. I don't know if this helps, since the files I'm downloading are less than 1K, so in theory the buffer is filled with all the information in one round trip from the network.
Maybe it's not related to the network, but to the way I'm writing the file and/or how Windows handles the write?
Can someone please give me some tips on how to reduce the time to something similar to FileZilla?
I need to reduce the time; otherwise my FTP job will be running 24 hours a day for 3 days to finish its task :(
Many thanks in advance.
The code is below. It's not complete; it just shows the download part.
//Create this on the constructor of my class
downloadCache = new MemoryStream(2097152);
downloadBuffer = new byte[32768];
public bool downloadFile(string pRemote, string pLocal, out long donwloadTime)
{
FtpWebResponse response = null;
Stream responseStream = null;
try
{
Stopwatch fileDownloadTime = new Stopwatch();
donwloadTime = 0;
fileDownloadTime.Start();
FtpWebRequest request = (FtpWebRequest)WebRequest.Create(pRemote);
request.Method = WebRequestMethods.Ftp.DownloadFile;
request.UseBinary = false;
request.AuthenticationLevel = AuthenticationLevel.None;
request.EnableSsl = false;
request.Proxy = null;
//I created the credentials 1 time and re-use for every file I need to download
request.Credentials = this.manager.ftpCredentials;
response = (FtpWebResponse)request.GetResponse();
responseStream = response.GetResponseStream();
downloadCache.Seek(0, SeekOrigin.Begin);
int bytesSize = 0;
int cachedSize = 0;
//create always empty file. Need this because WriteCacheToFile just append the file
using (FileStream fileStream = new FileStream(pLocal, FileMode.Create)) { };
// Download the file until the download is completed.
while (true)
{
bytesSize = responseStream.Read(downloadBuffer, 0, downloadBuffer.Length);
if (bytesSize == 0 || 2097152 < cachedSize + bytesSize)
{
WriteCacheToFile(pLocal, cachedSize);
if (bytesSize == 0)
{
break;
}
downloadCache.Seek(0, SeekOrigin.Begin);
cachedSize = 0;
}
downloadCache.Write(downloadBuffer, 0, bytesSize);
cachedSize += bytesSize;
}
fileDownloadTime.Stop();
donwloadTime = fileDownloadTime.ElapsedMilliseconds;
//file downloaded OK
return true;
}
catch (Exception) // failure is reported via the return value
{
return false;
}
finally
{
if (response != null)
{
response.Close();
}
if (responseStream != null)
{
responseStream.Close();
}
}
}
private void WriteCacheToFile(string downloadPath, int cachedSize)
{
using (FileStream fileStream = new FileStream(downloadPath, FileMode.Append))
{
byte[] cacheContent = new byte[cachedSize];
downloadCache.Seek(0, SeekOrigin.Begin);
downloadCache.Read(cacheContent, 0, cachedSize);
fileStream.Write(cacheContent, 0, cachedSize);
}
}
It sounds to me like your problem is related to Nagle's algorithm used in the TCP client.
You can try turning Nagle's algorithm off and also setting SendChunked to false.
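As a sketch of what that might look like against the question's FtpWebRequest setup (the property names are from the standard System.Net API; whether this closes the whole gap to FileZilla is untested here):

// Disable Nagle's algorithm before the first request is created;
// ServicePoint settings are per host and reused across requests.
ServicePointManager.UseNagleAlgorithm = false;

var request = (FtpWebRequest)WebRequest.Create(pRemote);
request.Method = WebRequestMethods.Ftp.DownloadFile;
request.Proxy = null;
// Keep the control connection open between the thousands of small
// downloads so each file doesn't pay the connection setup cost again.
request.KeepAlive = true;

For many tiny files, the per-file cost is usually dominated by connection and command round trips rather than the data transfer itself, which is why these two settings matter more than the buffer size.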
I am hoping that I won't have to copy the full example or create a minimal solution, and that the issue is some basic knowledge gap of mine.
I have an application that syncs data with Azure Storage. It all works fine, and the upload part is already in an async method.
I was trying to optimize a little more and wanted to change a stream.CopyTo to CopyToAsync.
using(var filestream = streamProvider())
{
filestream.CopyTo(stream);
stream.Position = 0;
}
I changed it to await filestream.CopyToAsync(stream) and only parts of my files are uploaded.
At the point of writing I found out that the exception thrown is a "Found invalid data while decoding" (System.IO.InvalidDataException).
This led me back to the stream being a GZipStream or DeflateStream, so I guess my question is whether CopyToAsync is simply not expected to work with those kinds of streams?
Context
public async Task UploadFile(string storePath, Func<Stream> streamProvider, bool copyLocalBeforeUpload = false)
{
using (new Timer(this.MessageQueue, "UPLOAD", storePath))
{
using (var stream = copyLocalBeforeUpload ? new MemoryStream() : streamProvider() )
{
try
{
if (copyLocalBeforeUpload)
{
using (var filestream = streamProvider())
{
await filestream.CopyToAsync(stream);
stream.Position = 0;
}
}
var blob = GetBlobReference(storePath);
await blob.UploadFromStreamAsync(stream,new BlobRequestOptions { RetryPolicy = new LinearRetry(TimeSpan.FromMilliseconds(100), 3) });
}
catch (Exception e)
{
if (this.MessageQueue != null)
{
this.MessageQueue.Enqueue(string.Format("Failed uploading '{0}'", storePath));
}
}
}
}
}
I want to download a file in a method, and then continue working with that file using some data that is stored in variables in the first method.
I know you can use DownloadFileAsync, but then I need to continue my work in the DownloadFileCompleted method, and the variables can't be reached from there (unless I declare some global ones and use instead, though that isn't the right way I suppose).
So I googled and found another way: downloading the file manually, bit by bit. That would suit me quite perfectly. Though what I want to know is whether there are any other methods/solutions to my problem that are simpler?
Or if you can play around with the events and achieve something that suits me better :)
Oh, and please change my question if you find a better title of it, I couldn't think of one.
You have to do it piece by piece to update a progress bar. This code does the trick.
public class WebDownloader
{
private static readonly ILog log = LogManager.GetLogger(typeof(WebDownloader));
public delegate void DownloadProgressDelegate(int percProgress);
public static void Download(string uri, string localPath, DownloadProgressDelegate progressDelegate)
{
long remoteSize;
string fullLocalPath; // Full local path including file name if only directory was provided.
log.InfoFormat("Attempting to download file (Uri={0}, LocalPath={1})", uri, localPath);
try
{
// Get the name of the remote file.
Uri remoteUri = new Uri(uri);
string fileName = Path.GetFileName(remoteUri.LocalPath);
if (Path.GetFileName(localPath).Length == 0)
fullLocalPath = Path.Combine(localPath, fileName);
else
fullLocalPath = localPath;
// Have to get the size of the remote object through the web request, as it is
// not available on remote files, although it does work on local files.
using (WebResponse response = WebRequest.Create(uri).GetResponse())
using (Stream stream = response.GetResponseStream())
remoteSize = response.ContentLength;
log.InfoFormat("Downloading file (Uri={0}, Size={1}, FullLocalPath={2}).",
uri, remoteSize, fullLocalPath);
}
catch (Exception ex)
{
throw new ApplicationException(string.Format("Error connecting to URI (Exception={0})", ex.Message), ex);
}
int bytesRead = 0, bytesReadTotal = 0;
try
{
using (WebClient client = new WebClient())
using (Stream streamRemote = client.OpenRead(new Uri(uri)))
using (Stream streamLocal = new FileStream(fullLocalPath, FileMode.Create, FileAccess.Write, FileShare.None))
{
byte[] byteBuffer = new byte[1024 * 1024 * 2]; // 2 meg buffer although in testing only got to 10k max usage.
int perc = 0;
while ((bytesRead = streamRemote.Read(byteBuffer, 0, byteBuffer.Length)) > 0)
{
bytesReadTotal += bytesRead;
streamLocal.Write(byteBuffer, 0, bytesRead);
int newPerc = (int)((double)bytesReadTotal / (double)remoteSize * 100);
if (newPerc > perc)
{
perc = newPerc;
log.InfoFormat("...Downloading (BytesRead={0}, Perc={1})...", bytesReadTotal, perc);
if (progressDelegate != null)
progressDelegate(perc);
}
}
}
}
catch (Exception ex)
{
throw new ApplicationException(string.Format("Error downloading file (Exception={0})", ex.Message), ex);
}
log.InfoFormat("File successfully downloaded (Uri={0}, BytesDownloaded={1}/{2}, FullLocalPath={3}).",
uri, bytesReadTotal, remoteSize, fullLocalPath);
}
}
You will need to spin off a thread to run this code, as it's obviously synchronous.
e.g.
Task.Factory.StartNew(() => Download(...));
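Alternatively, if the original concern was only about reaching local variables from DownloadFileCompleted, a lambda handler captures them directly, so no globals are needed. This is a sketch; uri, localPath and someLocal are illustrative names:

var client = new WebClient();
string someLocal = "value computed earlier in this method";

// The lambda closes over someLocal, so the completion logic can keep
// using it without promoting anything to a field or global.
client.DownloadFileCompleted += (sender, e) =>
{
    if (e.Error == null)
        Console.WriteLine("Downloaded; still have: " + someLocal);
    client.Dispose();
};
client.DownloadFileAsync(new Uri(uri), localPath);

This keeps the event-based API but sidesteps the variable-scoping problem that motivated the manual bit-by-bit approach.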