Right now, all I am using to calculate the size is the files in the folders. I don't think that accounts for everything, because the content database is about 15 GB while the files I can count only come to around 10 GB. Does anyone know what I may be missing?
Here is the code I have so far.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.SharePoint;
using System.Globalization;

namespace WebSizeTesting
{
    class Program
    {
        static void Main(string[] args)
        {
            long SiteCollectionBytes = 0;

            using (SPSite mainSite = new SPSite("http://sharepoint-test"))
            {
                // loop through the websites
                foreach (SPWeb web in mainSite.AllWebs)
                {
                    long webBytes = GetSPFolderSize(web.RootFolder);

                    // Add in size of each web site's recycle bin
                    webBytes += web.RecycleBin.OfType<SPRecycleBinItem>().Select(item => item.Size).ToArray<long>().Sum();

                    Console.WriteLine("Url: {0}, Size: {1}", web.Url, ConvertBytesToDisplayText(webBytes));
                    SiteCollectionBytes += webBytes;
                }

                long siteCollectionRecycleBinBytes = mainSite.RecycleBin.OfType<SPRecycleBinItem>().Select(item => item.Size).ToArray<long>().Sum();
                Console.WriteLine("Site Collection Recycle Bin: " + ConvertBytesToDisplayText(siteCollectionRecycleBinBytes));
                SiteCollectionBytes += siteCollectionRecycleBinBytes;
            }

            Console.WriteLine("Total Size: " + ConvertBytesToDisplayText(SiteCollectionBytes));
            Console.ReadKey();
        }

        public static long GetSPFolderSize(SPFolder folder)
        {
            long byteCount = 0;

            // calculate the files in the immediate folder
            foreach (SPFile file in folder.Files)
            {
                byteCount += file.TotalLength;

                // also include file versions
                foreach (SPFileVersion fileVersion in file.Versions)
                {
                    byteCount += fileVersion.Size;
                }
            }

            // Handle sub folders
            foreach (SPFolder subFolder in folder.SubFolders)
            {
                byteCount += GetSPFolderSize(subFolder);
            }

            return byteCount;
        }

        public static string ConvertBytesToDisplayText(long byteCount)
        {
            string result = "";

            if (byteCount > Math.Pow(1024, 3))
            {
                // display as gb
                result = (byteCount / Math.Pow(1024, 3)).ToString("#,#.##", CultureInfo.InvariantCulture) + " GB";
            }
            else if (byteCount > Math.Pow(1024, 2))
            {
                // display as mb
                result = (byteCount / Math.Pow(1024, 2)).ToString("#,#.##", CultureInfo.InvariantCulture) + " MB";
            }
            else if (byteCount > 1024)
            {
                // display as kb
                result = (byteCount / 1024d).ToString("#,#.##", CultureInfo.InvariantCulture) + " KB"; // 1024d keeps the fractional part
            }
            else
            {
                // display as bytes
                result = byteCount.ToString("#,#.##", CultureInfo.InvariantCulture) + " Bytes";
            }

            return result;
        }
    }
}
edit 2:15 pm 3/1/2010 cst I added the ability to count file versions as part of the size, as suggested by Goyuix in the post below. The total is still off from the physical database size by a considerable amount.
edit 8:38 am 3/3/2010 cst I added the calculation of the recycle bin size for each web, plus the site collection recycle bin. These changes were suggested by ArjanP. Also, I want to add that I am very open to more efficient ways of doing this.
Did you consider the Trash Can? There will be cans for Webs and the Site Collection, all taking up space in the content database.
There will always be 'overhead' in a content database: every 'empty' web already consumes a number of bytes. 30% seems like a lot, but not excessive; it depends on the ratio of content to the number of webs.
The content database also stores configuration information, like what lists actually exist, features, permissions, etc. While that would probably not account for 5 GB of data, it is something to consider. Also, each file is typically associated with an SPListItem that may contain metadata for that file.
Do you have versioning turned on for any of the lists / libraries? If so, you will also need to check the SPListItem.Versions property for each version.
I'm not quite sure your code considers list attachments, too.
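To cover that case, something along these lines could be added to the size calculation. Treat it as a rough, untested sketch: GetAttachmentsSize is a made-up name, and it assumes attachments can be opened through each item's SPAttachmentCollection.UrlPrefix.

// Rough sketch (untested): add up attachment sizes for every list in a web.
public static long GetAttachmentsSize(SPWeb web)
{
    long byteCount = 0;

    foreach (SPList list in web.Lists)
    {
        // document libraries don't carry item attachments
        if (!list.EnableAttachments)
            continue;

        foreach (SPListItem item in list.Items)
        {
            foreach (string fileName in item.Attachments)
            {
                SPFile attachment = web.GetFile(item.Attachments.UrlPrefix + fileName);
                byteCount += attachment.Length;
            }
        }
    }

    return byteCount;
}

The result would then be added to webBytes in the same way as the recycle bin sizes above.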
Related
I have the following upload code using Unity's UnityWebRequest API (Unity 2019.2.13f1):
public IEnumerator UploadJobFile(string jobId, string path)
{
    if (!File.Exists(path))
    {
        Debug.LogError("The given file to upload does not exist. Please re-create the recording and try again.");
        yield break;
    }

    UnityWebRequest upload = new UnityWebRequest(hostURL + "/jobs/upload/" + jobId);
    upload.uploadHandler = new UploadHandlerFile(path);
    upload.downloadHandler = new DownloadHandlerBuffer();
    upload.method = UnityWebRequest.kHttpVerbPOST;
    upload.SetRequestHeader("filename", Path.GetFileName(path));

    UnityWebRequestAsyncOperation op = upload.SendWebRequest();
    while (!upload.isDone)
    {
        //Debug.Log("Uploading file...");
        Debug.Log("Uploading file. Progress " + (int)(upload.uploadProgress * 100f) + "%"); // <-----------------
        yield return null;
    }

    if (upload.isNetworkError || upload.isHttpError)
    {
        Debug.LogError("Upload error:\n" + upload.error);
    }
    else
    {
        Debug.Log("Upload success");
    }

    // this is needed to clear resources on the file
    upload.Dispose();
}
string hostURL = "http://localhost:8080";
string jobId = "manualUploadTest";
string path = "E:/Videos/short.mp4";

void Update()
{
    if (Input.GetKeyDown(KeyCode.O))
    {
        Debug.Log("O key was pressed.");
        StartCoroutine(UploadAndTest(jobId, path));
    }
}
The files I receive on the server side arrive broken, especially if they are larger (30 MB or more). They are missing bytes at the end and sometimes have entire byte blocks duplicated in the middle.
This happens both when testing client and server on the same machine or when running on different machines.
The server does not complain - from its perspective, no transport errors happened.
I noticed that if I comment out the access to upload.uploadProgress (and, for example, instead use the commented-out debug line above it, which just prints a string literal), the files stay intact. Ditching the while loop altogether and replacing it with yield return op also works.
I tested this strange behavior repeatedly in an outer loop - usually after at most 8 repetitions with the "faulty" code, the file appears broken. If I use the "correct" variant, 100 uploads (update: 500) in a row were successful.
Does upload.uploadProgress have side effects? For what it's worth, the same happens if I print op.progress instead - the files are also broken.
This sounds like a real bug. uploadProgress obviously should not have side effects.
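If the progress value is not strictly needed, one workaround consistent with the observation above is to skip polling uploadProgress entirely and simply yield on the async operation. A minimal sketch, assuming the same hostURL field and endpoint as in the question; the method name is just a placeholder:

// Workaround sketch: no polling loop, just wait for the operation to finish.
// Based on the observation that "yield return op" leaves the uploaded files intact.
public IEnumerator UploadJobFileWithoutProgress(string jobId, string path)
{
    if (!File.Exists(path))
        yield break;

    using (UnityWebRequest upload = new UnityWebRequest(hostURL + "/jobs/upload/" + jobId))
    {
        upload.uploadHandler = new UploadHandlerFile(path);
        upload.downloadHandler = new DownloadHandlerBuffer();
        upload.method = UnityWebRequest.kHttpVerbPOST;
        upload.SetRequestHeader("filename", Path.GetFileName(path));

        // wait for completion without ever touching uploadProgress
        yield return upload.SendWebRequest();

        if (upload.isNetworkError || upload.isHttpError)
            Debug.LogError("Upload error:\n" + upload.error);
        else
            Debug.Log("Upload success");
    }
}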
I have this code:
using (var zip = new ZipFile())
{
    zip.CompressionLevel = CompressionLevel.None;
    zip.AddDirectory(myDirectoryInfo.FullName);
    zip.UseZip64WhenSaving = Zip64Option.Always;
    zip.SaveProgress += SaveProgress;
    zip.Save(outputPackage);
}

private void SaveProgress(object sender, SaveProgressEventArgs e)
{
    if (e.EntriesTotal > 0 && e.EntriesSaved > 0)
    {
        var counts = String.Format("{0} / {1}", e.EntriesSaved, e.EntriesTotal);
        var percentcompletion = ((double)e.EntriesSaved / e.EntriesTotal) * 100;
    }
}
What I really want to do is estimate the time remaining for the packaging to complete. But in SaveProgress, the SaveProgressEventArgs values BytesTransferred and TotalBytesToTransfer are 0. I believe I need these to estimate the time accurately.
So first, am I supposed to get values for these? The packaging itself seems to work. Second, what's the best way to estimate the time remaining here? And third, is there a way to ensure that this is the fastest way to package a large directory? I don't want to compress; this is a directory filled with already-compressed files that just need to be stuffed into an archive.
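One way to estimate the remaining time without relying on byte counts is to time the save and extrapolate from the fraction of entries already written. The sketch below is untested and assumes DotNetZip's ZipProgressEventType.Saving_AfterWriteEntry event; if I recall correctly, BytesTransferred/TotalBytesToTransfer are only populated for Saving_EntryBytesRead events and describe the current entry, not the whole archive, which would explain the zeros.

// Sketch (untested): estimate time remaining from the fraction of entries written.
// Requires System.Diagnostics for Stopwatch; the handler name is a placeholder.
private Stopwatch _saveTimer;

private void SaveProgressWithEta(object sender, SaveProgressEventArgs e)
{
    if (e.EventType == ZipProgressEventType.Saving_AfterWriteEntry && e.EntriesSaved > 0)
    {
        double fraction = (double)e.EntriesSaved / e.EntriesTotal;
        TimeSpan elapsed = _saveTimer.Elapsed;
        TimeSpan remaining = TimeSpan.FromSeconds(elapsed.TotalSeconds * (1 - fraction) / fraction);

        Console.WriteLine("{0}/{1} entries written, roughly {2:hh\\:mm\\:ss} remaining",
            e.EntriesSaved, e.EntriesTotal, remaining);
    }
}

_saveTimer = Stopwatch.StartNew(); would go immediately before zip.Save(outputPackage), with the handler wired up the same way as SaveProgress above. As for speed: with CompressionLevel.None the save is essentially I/O-bound, so there is probably not much left to gain inside DotNetZip itself.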
This question already has answers here: Windows filesystem: Creation time of a file doesn't change when the file is deleted and created again (2 answers)
Closed 9 years ago.
I have a logging class. It creates a new log.txt file if one isn't present and writes messages to that file. I also have a method that checks the file's size and creation time against local settings. If the difference between log.txt's creation time and the current time exceeds the local MaxLogHours setting, the file is archived to a local archive folder and deleted. A new log.txt is then created by the above process the next time a log message is sent to the class.
This works great, except that when I look at FileInfo.CreationTime for my log.txt file, it is always the same - 7/17/2012 12:05:18 PM - no matter what I do. I've deleted the file manually, the program has deleted it, and it is always the same. What is going on here? I also timestamp the old ones, but still nothing works. Does Windows think the file is the same one because it has the same filename? I'd appreciate any help, thanks!
archive method
public static void ArchiveLog(Settings s)
{
    FileInfo fi = new FileInfo(AppDomain.CurrentDomain.BaseDirectory + "\\log.txt");
    string archiveDir = AppDomain.CurrentDomain.BaseDirectory + "\\archive";
    TimeSpan ts = DateTime.Now - fi.CreationTime;

    if ((s.MaxLogKB != 0 && fi.Length >= s.MaxLogKB * 1000) ||
        (s.MaxLogHours != 0 && ts.TotalHours >= s.MaxLogHours))
    {
        if (!Directory.Exists(archiveDir))
        {
            Directory.CreateDirectory(archiveDir);
        }

        string archiveFile = archiveDir + "\\log" + string.Format("{0:MMddyyhhmmss}", DateTime.Now) + ".txt";
        File.Copy(AppDomain.CurrentDomain.BaseDirectory + "\\log.txt", archiveFile);
        File.Delete(AppDomain.CurrentDomain.BaseDirectory + "\\log.txt");
    }
}
Writing/Creating the log:
public static void MsgLog(string Msg, bool IsStandardMsg = true)
{
    try
    {
        using (StreamWriter sw = new StreamWriter(Directory.GetCurrentDirectory() + "\\log.txt", true))
        {
            sw.WriteLine("Msg at " + DateTime.Now + " - " + Msg);
            Console.Out.WriteLine(Msg);
        }
    }
    catch (Exception ex)
    {
        Console.Out.WriteLine(ex.Message);
    }
}
This may happen; it is noted in the FileSystemInfo.CreationTime documentation:
This method may return an inaccurate value, because it uses native functions whose values may not be continuously updated by the operating system.
I think the problem is that you are using FileInfo.CreationTime without first checking whether the file still exists. Run this POC - it will always print "After delete CreationTime: 1/1/1601 12:00:00 AM", because the file no longer exists and you did not touch FileInfo.CreationTime prior to the delete. However, if you uncomment the line:
//Console.WriteLine("Before delete CreationTime: {0}", fi.CreationTime);
in the code below, then, strangely, both calls return the correct, updated value.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading;

namespace ConsoleApplication17088573
{
    class Program
    {
        static void Main(string[] args)
        {
            for (int i = 0; i < 10; i++)
            {
                string fname = "testlog.txt";
                using (var fl = File.Create(fname))
                {
                    using (var sw = new StreamWriter(fl))
                    {
                        sw.WriteLine("Current datetime is {0}", DateTime.Now);
                    }
                }

                var fi = new FileInfo(fname);
                //Console.WriteLine("Before delete CreationTime: {0}", fi.CreationTime);
                File.Delete(fname);
                Console.WriteLine("After delete CreationTime: {0}", fi.CreationTime);
                Thread.Sleep(1000);
            }
        }
    }
}
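As a defensive workaround for the original logging scenario (not something the POC above needs), the creation time could also be stamped explicitly whenever a fresh log.txt is started, so the archive check no longer depends on whatever timestamp Windows reports. A minimal sketch, assuming the same log path convention as in the question:

// Sketch: set the creation time explicitly when a new log file is started.
string logPath = AppDomain.CurrentDomain.BaseDirectory + "\\log.txt";
bool isNewFile = !File.Exists(logPath);

using (StreamWriter sw = new StreamWriter(logPath, true))
{
    sw.WriteLine("Msg at " + DateTime.Now + " - " + "some message");
}

if (isNewFile)
{
    // force the timestamp that the archive check relies on
    File.SetCreationTime(logPath, DateTime.Now);
}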
I have around 300k image files in a remote location. I have to download them and write the details of these files to a text file (with some additional info). Due to the nature of the info I'm getting, I have to process each file as it arrives (I also write each file's info as a line in the text file) to gather some statistics; for example, I keep a list of objects with size and count attributes so I can see how many images of each size I have.
I have also thought about reading and writing everything to the file without keeping any statistics, and then opening the file again later to compute them. But I can't think of a way to process a 250k-line multi-attribute file for statistics.
I know the lists (yes, I have two of them) and the constant loop over each item are bogging the application down, but is there another way? Right now it's been 2 hours and the application is still at 26k. For each image item, I do something like this to keep the counts: I check whether an image of a certain size has come before, and if so I add it to that list item.
public void AddSizeTokens(Token token)
{
    int index = tokenList.FindIndex(item => item.size == token.size);
    if (index >= 0)
        tokenList[index].count += 1;
    else
        tokenList.Add(token);
}
What a single line from the file I write to looks like
Hits Size Downloads Local Loc Virtual ID
204 88.3 4212 .../someImage.jpg f-dd-edb2-4a64-b42
I'm downloading the files like below:
try
{
    using (WebClient client = new WebClient())
    {
        if (File.Exists(filePath + "/" + fileName + "." + ext))
        {
            return "File Exists: " + filePath + "/" + fileName + "." + ext;
        }

        client.DownloadFile(virtualPath, filePath + "/" + fileName + "." + ext);
        return "Downloaded: " + filePath + "/" + fileName + "." + ext;
    }
}
catch (Exception e)
{
    return "Problem Downloading " + fileName + ": " + e.Message;
}
You should be changing your tokenList from List<Token> to Dictionary<long, Token>.
The key is the size.
Your code would look like this:
Dictionary<long, Token> tokens = new Dictionary<long, Token>();

public void AddSizeTokens(Token token)
{
    Token existingToken;
    if (!tokens.TryGetValue(token.size, out existingToken))
        tokens.Add(token.size, token);
    else
        existingToken.count += 1;
}
That changes it from an O(n) operation to an O(1) operation.
Another point to consider is Destrictor's comment: your internet connection speed may well be the bottleneck here.
Well, I thought perhaps the code was the issue, and some of the problem was indeed there. As per Daniel Hilgarth's instructions, changing to a dictionary helped a lot, but only for the first 30 minutes; then it got worse by the minute.
The problem was apparently the innocent-looking UI elements I was feeding information to. They ate so much CPU that they eventually killed the application. Minimizing the UI updates helped (1.5k per minute, 1.3k at the slowest). Unbelievable! Hope this helps others who have similar problems.
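For the alternative mentioned in the question (write everything first, compute the statistics afterwards), a 250k-line file is small enough to stream through in one pass with LINQ. A rough sketch, assuming whitespace-separated columns in the order shown above (Hits, Size, Downloads, ...), an invariant-culture size format, and a hypothetical report.txt file name; it needs System.IO, System.Linq, and System.Globalization.

// Sketch: compute size statistics from the already-written report file.
var sizeCounts = File.ReadLines("report.txt")
    .Skip(1) // skip the header line
    .Select(line => line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
    .Where(cols => cols.Length > 1)
    .GroupBy(cols => double.Parse(cols[1], CultureInfo.InvariantCulture)) // Size is the second column
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var pair in sizeCounts.OrderBy(p => p.Key))
    Console.WriteLine("Size {0}: {1} images", pair.Key, pair.Value);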
Some code I'm working with occasionally needs to refer to long UNC paths (e.g. \\?\UNC\MachineName\Path), but we've discovered that no matter where the directory is located, even on the same machine, access is much slower through the UNC path than through the local path.
For example, we've written some benchmarking code that writes a string of gibberish to a file, then later read it back, multiple times. I'm testing it with 6 different ways to access the same shared directory on my dev machine, with the code running on the same machine:
C:\Temp
\\MachineName\Temp
\\?\C:\Temp
\\?\UNC\MachineName\Temp
\\127.0.0.1\Temp
\\?\UNC\127.0.0.1\Temp
And here are the results:
Testing: C:\Temp
Wrote 1000 files to C:\Temp in 861.0647 ms
Read 1000 files from C:\Temp in 60.0744 ms
Testing: \\MachineName\Temp
Wrote 1000 files to \\MachineName\Temp in 2270.2051 ms
Read 1000 files from \\MachineName\Temp in 1655.0815 ms
Testing: \\?\C:\Temp
Wrote 1000 files to \\?\C:\Temp in 916.0596 ms
Read 1000 files from \\?\C:\Temp in 60.0517 ms
Testing: \\?\UNC\MachineName\Temp
Wrote 1000 files to \\?\UNC\MachineName\Temp in 2499.3235 ms
Read 1000 files from \\?\UNC\MachineName\Temp in 1684.2291 ms
Testing: \\127.0.0.1\Temp
Wrote 1000 files to \\127.0.0.1\Temp in 2516.2847 ms
Read 1000 files from \\127.0.0.1\Temp in 1721.1925 ms
Testing: \\?\UNC\127.0.0.1\Temp
Wrote 1000 files to \\?\UNC\127.0.0.1\Temp in 2499.3211 ms
Read 1000 files from \\?\UNC\127.0.0.1\Temp in 1678.18 ms
I tried the IP address to rule out a DNS issue. Could it be checking credentials or permissions on each file access? If so, is there a way to cache it? Does it just assume since it's a UNC path that it should do everything over TCP/IP instead of directly accessing the disk? Is it something wrong with the code we're using for the reads/writes? I've ripped out the pertinent parts for benchmarking, seen below:
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using Microsoft.Win32.SafeHandles;
using Util.FileSystem;

namespace UNCWriteTest {
    internal class Program {
        [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
        public static extern bool DeleteFile(string path); // File.Delete doesn't handle \\?\UNC\ paths

        private const int N = 1000;

        private const string TextToSerialize =
            "asd;lgviajsmfopajwf0923p84jtmpq93worjgfq0394jktp9orgjawefuogahejngfmliqwegfnailsjdhfmasodfhnasjldgifvsdkuhjsmdofasldhjfasolfgiasngouahfmp9284jfqp92384fhjwp90c8jkp04jk34pofj4eo9aWIUEgjaoswdfg8jmp409c8jmwoeifulhnjq34lotgfhnq34g";

        private static readonly byte[] _Buffer = Encoding.UTF8.GetBytes(TextToSerialize);

        public static string WriteFile(string basedir) {
            string fileName = Path.Combine(basedir, string.Format("{0}.tmp", Guid.NewGuid()));

            try {
                IntPtr writeHandle = NativeFileHandler.CreateFile(
                    fileName,
                    NativeFileHandler.EFileAccess.GenericWrite,
                    NativeFileHandler.EFileShare.None,
                    IntPtr.Zero,
                    NativeFileHandler.ECreationDisposition.New,
                    NativeFileHandler.EFileAttributes.Normal,
                    IntPtr.Zero);

                // if file was locked
                int fileError = Marshal.GetLastWin32Error();
                if ((fileError == 32 /* ERROR_SHARING_VIOLATION */) || (fileError == 80 /* ERROR_FILE_EXISTS */)) {
                    throw new Exception("oopsy");
                }

                using (var h = new SafeFileHandle(writeHandle, true)) {
                    using (var fs = new FileStream(h, FileAccess.Write, NativeFileHandler.DiskPageSize)) {
                        fs.Write(_Buffer, 0, _Buffer.Length);
                    }
                }
            }
            catch (IOException) {
                throw;
            }
            catch (Exception ex) {
                throw new InvalidOperationException(" code " + Marshal.GetLastWin32Error(), ex);
            }

            return fileName;
        }

        public static void ReadFile(string fileName) {
            var fileHandle =
                new SafeFileHandle(
                    NativeFileHandler.CreateFile(fileName, NativeFileHandler.EFileAccess.GenericRead, NativeFileHandler.EFileShare.Read, IntPtr.Zero,
                                                 NativeFileHandler.ECreationDisposition.OpenExisting, NativeFileHandler.EFileAttributes.Normal, IntPtr.Zero), true);

            using (fileHandle) {
                // check the handle here to get a bit cleaner exception semantics
                if (fileHandle.IsInvalid) {
                    // ms-help://MS.MSSDK.1033/MS.WinSDK.1033/debug/base/system_error_codes__0-499_.htm
                    int errorCode = Marshal.GetLastWin32Error();
                    // now that we've taken more than our allotted share of time, throw the exception
                    throw new IOException(string.Format("file read failed on {0} with error code {1}", fileName, errorCode));
                }

                // we have a valid handle and can actually read a stream, exceptions from serialization bubble out
                using (var fs = new FileStream(fileHandle, FileAccess.Read, 1 * NativeFileHandler.DiskPageSize)) {
                    // if serialization fails, we'll just let the normal serialization exception flow out
                    var foo = new byte[256];
                    fs.Read(foo, 0, 256);
                }
            }
        }

        public static string[] TestWrites(string baseDir) {
            try {
                var fileNames = new List<string>();
                DateTime start = DateTime.UtcNow;
                for (int i = 0; i < N; i++) {
                    fileNames.Add(WriteFile(baseDir));
                }
                DateTime end = DateTime.UtcNow;

                Console.Out.WriteLine("Wrote {0} files to {1} in {2} ms", N, baseDir, end.Subtract(start).TotalMilliseconds);
                return fileNames.ToArray();
            }
            catch (Exception e) {
                Console.Out.WriteLine("Failed to write for " + baseDir + " Exception: " + e.Message);
                return new string[] {};
            }
        }

        public static void TestReads(string baseDir, string[] fileNames) {
            try {
                DateTime start = DateTime.UtcNow;
                for (int i = 0; i < N; i++) {
                    ReadFile(fileNames[i % fileNames.Length]);
                }
                DateTime end = DateTime.UtcNow;

                Console.Out.WriteLine("Read {0} files from {1} in {2} ms", N, baseDir, end.Subtract(start).TotalMilliseconds);
            }
            catch (Exception e) {
                Console.Out.WriteLine("Failed to read for " + baseDir + " Exception: " + e.Message);
            }
        }

        private static void Main(string[] args) {
            foreach (string baseDir in args) {
                Console.Out.WriteLine("Testing: {0}", baseDir);
                string[] fileNames = TestWrites(baseDir);
                TestReads(baseDir, fileNames);
                foreach (string fileName in fileNames) {
                    DeleteFile(fileName);
                }
            }
        }
    }
}
This doesn't surprise me. You're writing/reading a fairly small amount of data, so the file system cache is probably minimizing the impact of the physical disk I/O; basically, the bottleneck is going to be the CPU. I'm not certain whether the traffic will be going via the TCP/IP stack or not but at a minimum the SMB protocol is involved. For one thing that means the requests are being passed back and forth between the SMB client process and the SMB server process, so you've got context switching between three distinct processes, including your own. Using the local file system path you're switching into kernel mode and back but no other process is involved. Context switching is much slower than the transition to and from kernel mode.
There are likely to be two distinct additional overheads, one per file and one per kilobyte of data. In this particular test the per file SMB overhead is likely to be dominant. Because the amount of data involved also affects the impact of physical disk I/O, you may find that this is only really a problem when dealing with lots of small files.
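To answer the "is it our read/write code?" part of the question, a quick sanity check is to repeat the measurement with the plain managed file APIs over the same directories; if the UNC penalty stays roughly the same, the overhead is in the SMB round trips rather than in the P/Invoke path. A minimal sketch along the lines of the benchmark above (note that on older .NET Framework versions the managed APIs reject the \\?\ prefixed forms, so this check really only applies to C:\Temp versus \\MachineName\Temp):

// Sanity-check sketch: same measurement with File.WriteAllBytes/ReadAllBytes only,
// so any slowdown can be attributed to the path/transport rather than the custom CreateFile code.
public static void TestManaged(string baseDir, int n, byte[] buffer) {
    var names = new List<string>();

    DateTime start = DateTime.UtcNow;
    for (int i = 0; i < n; i++) {
        string name = Path.Combine(baseDir, Guid.NewGuid() + ".tmp");
        File.WriteAllBytes(name, buffer);
        names.Add(name);
    }
    Console.Out.WriteLine("Managed write of {0} files to {1}: {2} ms", n, baseDir, (DateTime.UtcNow - start).TotalMilliseconds);

    start = DateTime.UtcNow;
    foreach (string name in names) {
        File.ReadAllBytes(name);
    }
    Console.Out.WriteLine("Managed read of {0} files from {1}: {2} ms", n, baseDir, (DateTime.UtcNow - start).TotalMilliseconds);

    foreach (string name in names) {
        File.Delete(name);
    }
}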