Getting time remaining from DotNetZip packaging - c#

I have this code:
using (var zip = new ZipFile())
{
    zip.CompressionLevel = CompressionLevel.None;
    zip.AddDirectory(myDirectoryInfo.FullName);
    zip.UseZip64WhenSaving = Zip64Option.Always;
    zip.SaveProgress += SaveProgress;
    zip.Save(outputPackage);
}
private void SaveProgress(object sender, SaveProgressEventArgs e)
{
    if (e.EntriesTotal > 0 && e.EntriesSaved > 0)
    {
        var counts = String.Format("{0} / {1}", e.EntriesSaved, e.EntriesTotal);
        var percentcompletion = ((double)e.EntriesSaved / e.EntriesTotal) * 100;
    }
}
What I really want to do is estimate the time remaining for the packaging to complete. But in SaveProgress, the SaveProgressEventArgs values BytesTransferred and TotalBytesToTransfer are 0. I believe I need these to estimate the time accurately.
So first, am I supposed to get values from these? The packaging itself seems to work fine. Second, what's the best way to estimate the time remaining here? And third, is there a way to ensure this is the fastest way to package a large directory? I don't want to compress: the directory is full of already compressed files that just need to be stuffed into an archive.
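Note: as far as I can tell, BytesTransferred and TotalBytesToTransfer are only populated for the per-entry Saving_EntryBytesRead events, not for every event type, which would explain the zeros. A minimal sketch of an entry-based estimate (untested; it assumes an entry-count estimate is good enough when no compression is applied, and the Stopwatch field name is illustrative):
private readonly System.Diagnostics.Stopwatch _saveTimer = new System.Diagnostics.Stopwatch();

private void SaveProgress(object sender, SaveProgressEventArgs e)
{
    if (e.EventType == ZipProgressEventType.Saving_Started)
    {
        _saveTimer.Reset();
        _saveTimer.Start();
    }
    else if (e.EventType == ZipProgressEventType.Saving_AfterWriteEntry
             && e.EntriesTotal > 0 && e.EntriesSaved > 0)
    {
        // Scale the elapsed time by the fraction of entries already written.
        double fractionDone = (double)e.EntriesSaved / e.EntriesTotal;
        TimeSpan elapsed = _saveTimer.Elapsed;
        TimeSpan estimatedTotal = TimeSpan.FromTicks((long)(elapsed.Ticks / fractionDone));
        TimeSpan remaining = estimatedTotal - elapsed;
        Console.WriteLine("{0} / {1} entries, roughly {2:F0} seconds remaining",
            e.EntriesSaved, e.EntriesTotal, remaining.TotalSeconds);
    }
}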

Related

How to read and write more than 25000 records/lines into a text file at a time?

I am connecting my application to a stock market live data provider using a web socket. When the market is live and the socket is open, it gives me nearly 45,000 lines a minute. I deserialize it line by line, then write each line into a text file, and I am also reading the text file and removing its first line. So handling another process alongside the socket becomes slow. Can you please help me perform this process fast, at roughly 25,000 lines a minute?
string filePath = @"D:\Aggregate_Minute_AAPL.txt";
var records = (from line in File.ReadLines(filePath).AsParallel()
               select line);
List<string> str = records.ToList();
str.ForEach(x =>
{
    string result = x;
    result = result.TrimStart('[').TrimEnd(']');
    var jsonString = Newtonsoft.Json.JsonConvert.DeserializeObject<List<LiveAMData>>(result);
    foreach (var item in jsonString)
    {
        string value = "";
        string dirPath = @"D:\COMB1\MinuteAggregates";
        string[] fileNames = null;
        fileNames = System.IO.Directory.GetFiles(dirPath, item.sym + "_*.txt", System.IO.SearchOption.AllDirectories);
        if (fileNames.Length > 0)
        {
            string _fileName = fileNames[0];
            var lineList = System.IO.File.ReadAllLines(_fileName).ToList();
            lineList.RemoveAt(0);
            var _item = lineList[lineList.Count - 1];
            if (!_item.Contains(item.sym))
            {
                lineList.RemoveAt(lineList.Count - 1);
            }
            System.IO.File.WriteAllLines(_fileName, lineList.ToArray());
            value = $"{item.sym},{item.s},{item.o},{item.h},{item.c},{item.l},{item.v}{Environment.NewLine}";
            using (System.IO.StreamWriter sw = System.IO.File.AppendText(_fileName))
            {
                sw.Write(value);
            }
        }
    }
});
How can I make this process fast? When the application performs this, it only gets through about 3,000 to 4,000 symbols, and when there is no such processing it executes 25,000 lines per minute. So how can I speed up line processing with all this code?
First you need to clean up your code to gain more visibility. I did a quick refactor and this is what I got:
const string FilePath = @"D:\Aggregate_Minute_AAPL.txt";

class SomeClass
{
    public string Sym { get; set; }
    public string Other { get; set; }
}

private void Something()
{
    File
        .ReadLines(FilePath)
        .AsParallel()
        .Select(x => x.TrimStart('[').TrimEnd(']'))
        .Select(JsonConvert.DeserializeObject<List<SomeClass>>)
        .ForAll(WriteRecord);
}
private const string DirPath = @"D:\COMB1\MinuteAggregates";
private const string Separator = @",";

private void WriteRecord(List<SomeClass> data)
{
    foreach (var item in data)
    {
        var fileNames = Directory
            .GetFiles(DirPath, item.Sym + "_*.txt", SearchOption.AllDirectories);
        foreach (var fileName in fileNames)
        {
            var fileLines = File.ReadAllLines(fileName)
                .Skip(1).ToList();
            var lastLine = fileLines.Last();
            if (!lastLine.Contains(item.Sym))
            {
                fileLines.RemoveAt(fileLines.Count - 1);
            }
            fileLines.Add(
                new StringBuilder()
                    .Append(item.Sym)
                    .Append(Separator)
                    .Append(item.Other)
                    .Append(Environment.NewLine)
                    .ToString()
            );
            File.WriteAllLines(fileName, fileLines);
        }
    }
}
From here it should be easier to play with AsParallel and check how, and with what parameters, the code runs fastest.
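For instance, PLINQ lets you pin the degree of parallelism while you measure throughput. This is illustrative only; the value 4 is a placeholder to tune, not a recommendation, and it reuses the FilePath, SomeClass and WriteRecord names from the refactor above:
File
    .ReadLines(FilePath)
    .AsParallel()
    .WithDegreeOfParallelism(4) // try different values and measure
    .Select(x => x.TrimStart('[').TrimEnd(']'))
    .Select(JsonConvert.DeserializeObject<List<SomeClass>>)
    .ForAll(WriteRecord);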
Also:
You are opening the file you write to twice.
The removes are also somewhat expensive, and removing at index 0 is the most expensive (though with few elements it may not make much difference).
The if (fileNames.Length > 0) check is unnecessary; with the foreach, an empty list simply skips the loop.
You can try StringBuilder instead of string interpolation.
I hope these hints help you improve your time, and that I haven't forgotten anything.
Edit
We have nearly 10,000 files in our directory. So when the process is running, it throws an error that the process cannot access the file because it is being used by another process.
Well, is there a possibility that your processed lines contain duplicated file names?
If that is the case, you could try a simple approach: a retry after a few milliseconds, something like this:
private const int SleepMillis = 5;
private const int MaxRetries = 3;

public void WriteFile(string fileName, string[] fileLines, int retries = 0)
{
    try
    {
        File.WriteAllLines(fileName, fileLines);
    }
    catch (Exception) // Catch a more specific exception type if you can
    {
        if (retries >= MaxRetries)
        {
            Console.WriteLine("Too many tries with no success");
            throw; // rethrow exception
        }
        Thread.Sleep(SleepMillis);
        WriteFile(fileName, fileLines, ++retries); // try again
    }
}
I tried to keep it simple, but there are some notes:
- If you can make your methods async, it could be an improvement to swap the sleep for a Task.Delay, but you need to know and understand well how async works.
- If the collision happens a lot, then you should try another approach, something like a concurrent map with semaphores (see the sketch below).
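A rough sketch of that idea (untested; WriteFileSafe and _fileLocks are names made up for the example, and it needs System.Collections.Concurrent, System.Threading and System.IO):
private static readonly ConcurrentDictionary<string, SemaphoreSlim> _fileLocks =
    new ConcurrentDictionary<string, SemaphoreSlim>();

private static void WriteFileSafe(string fileName, string[] fileLines)
{
    // One gate per file name, so two workers never write the same file at once.
    var gate = _fileLocks.GetOrAdd(fileName, _ => new SemaphoreSlim(1, 1));
    gate.Wait();
    try
    {
        File.WriteAllLines(fileName, fileLines);
    }
    finally
    {
        gate.Release();
    }
}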
Second edit
In the real scenario I am connecting to a websocket and receiving 70,000 to 1 lakh (100,000) records every minute, and after that I am bifurcating those records with live streaming data and storing each in its own file. And that becomes slower when I apply this concept with 11,000 files.
It is a hard problem. From what I understand, you're talking about roughly 1,166 records per second, and at this size the little details can become big bottlenecks.
At that scale I think it is better to consider other solutions: it could be too much I/O for the disk, too many threads or too few, the network...
You should start by profiling the app to check where it is spending the most time and focus on that area. How many resources is it using? How many resources do you have? How are the memory, processor, garbage collector, and network doing? Do you have an SSD?
You need a clear view of what is slowing you down so you can attack it directly. It will depend on a lot of things, and it will be hard to help with that part.
There are tons of tools for profiling C# apps, and many ways to attack this problem (spread the load across several servers, use something like Redis to save data really quickly, use some event store so you can work with events, ...).

WP background file transfer, more than 25 files

According to this topic:
http://msdn.microsoft.com/en-us/library/windowsphone/develop/hh202959(v=vs.105).aspx
I'm trying to download more than 25 mp3 files from a list, in the background. I made a lot of different attempts; basically I tried to pass a list, remove the downloaded file, and call the function again... but it doesn't work with the app in the background. Maybe because it's a variable? Should I store it in isolated storage? Here is the latest code:
ObservableCollection<File> remoteFileList = new ObservableCollection<File>();

public void downloadList()
{
    if ((remoteFileList.Count > 0) && (BackgroundTransferService.Requests.Count() < 5))
    {
        File t = remoteFileList.First();
        BackgroundTransferRequest transfer = startDownload(t.Name);
        transfer.TransferProgressChanged += new EventHandler<BackgroundTransferEventArgs>(transfer_TransferProgressChanged);
        remoteFileList.Remove(t);
    }
}

public void transfer_TransferStatusChanged(object sender, BackgroundTransferEventArgs e)
{
    BackgroundTransferRequest b = e.Request as BackgroundTransferRequest;
    System.Diagnostics.Debug.WriteLine(b.TransferStatus);
    ProcessTransfer(e.Request);
    downloadList();
}
To pop items off the BackgroundTransfer queue, you need to call the Remove() method in the BackgroundTransferService class. You cannot have more than 25 requests in a queue without popping something out of it.
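ProcessTransfer isn't shown in the question, but a minimal sketch of what it might do (untested) is below; it removes the finished request so a slot is freed in the queue before downloadList() queues the next file:
private void ProcessTransfer(BackgroundTransferRequest request)
{
    if (request.TransferStatus == TransferStatus.Completed)
    {
        // Free a slot in the queue; move the file out of /shared/transfers as needed.
        BackgroundTransferService.Remove(request);
    }
}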

How to set a dynamic number of threadCounter variables?

I'm not really into multithreading, so probably the question is stupid, but it seems I cannot find a way to solve this problem (especially because I'm using C# and I've only been using it for a month).
I have a dynamic number of directories (I got them from a query against the DB). Inside those directories there is a certain number of files.
For each directory I need to transfer these files over FTP concurrently, because I basically have no limit on FTP max connections (not my words, it's written in the spec).
But I still need to control the maximum number of files transferred per directory, so I need to count the files I'm transferring (increment/decrement).
How could I do it? Should I use something like an array and use the Monitor class?
Edit: Framework 3.5
You can use the Semaphore class to throttle the number of concurrent files per directory. You would probably want to have one semaphore per directory so that the number of FTP uploads per directory can be controlled independently.
public class Example
{
    public void ProcessAllFilesAsync()
    {
        var semaphores = new Dictionary<string, Semaphore>();
        foreach (string filePath in GetFiles())
        {
            string filePathCapture = filePath; // Needed to perform the closure correctly.
            string directoryPath = Path.GetDirectoryName(filePath);
            if (!semaphores.ContainsKey(directoryPath))
            {
                int allowed = NUM_OF_CONCURRENT_OPERATIONS;
                semaphores.Add(directoryPath, new Semaphore(allowed, allowed));
            }
            var semaphore = semaphores[directoryPath];
            ThreadPool.QueueUserWorkItem(
                (state) =>
                {
                    semaphore.WaitOne();
                    try
                    {
                        DoFtpOperation(filePathCapture);
                    }
                    finally
                    {
                        semaphore.Release();
                    }
                }, null);
        }
    }
}
var allDirectories = db.GetAllDirectories();
foreach (var directoryPath in allDirectories)
{
    DirectoryInfo directories = new DirectoryInfo(directoryPath);
    // Loop through every file in that directory
    foreach (var fileInDir in directories.GetFiles())
    {
        // Check if we have reached our max limit
        if (numberFTPConnections == MAXFTPCONNECTIONS)
        {
            Thread.Sleep(1000);
        }
        // Code to copy to FTP goes here.
        // This can be async; when the transfer is completed,
        // decrement numberFTPConnections so the next file can be transferred.
    }
}
You can try something along the lines above. Note that it's just the basic logic and there are probably better ways to do this.
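As a rough sketch of the counter bookkeeping those comments describe (assumptions: the field names mirror the snippet above, the FTP call itself is left out, and this check-then-increment is only an approximate cap, unlike the Semaphore version):
private static int numberFTPConnections = 0;
private const int MAXFTPCONNECTIONS = 10; // illustrative limit

private static void TransferFile(string filePath)
{
    // Crude throttle, matching the Sleep loop above (works on Framework 3.5).
    while (Thread.VolatileRead(ref numberFTPConnections) >= MAXFTPCONNECTIONS)
    {
        Thread.Sleep(1000);
    }

    Interlocked.Increment(ref numberFTPConnections);
    try
    {
        // Code to copy filePath to FTP goes here.
    }
    finally
    {
        Interlocked.Decrement(ref numberFTPConnections);
    }
}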

SharePoint get the size of individual web sites using API

Right now all I am using to calculate the size are the files in the folders. I do not think this is all of it, because the content database is about 15 GB, yet when I calculate the size of all the files I get around 10 GB. Does anyone know what I may be missing?
Here is the code I have so far.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.SharePoint;
using System.Globalization;

namespace WebSizeTesting
{
    class Program
    {
        static void Main(string[] args)
        {
            long SiteCollectionBytes = 0;
            using (SPSite mainSite = new SPSite("http://sharepoint-test"))
            {
                // loop through the websites
                foreach (SPWeb web in mainSite.AllWebs)
                {
                    long webBytes = GetSPFolderSize(web.RootFolder);

                    // Add in size of each web site's recycle bin
                    webBytes += web.RecycleBin.OfType<SPRecycleBinItem>().Select(item => item.Size).ToArray<long>().Sum();

                    Console.WriteLine("Url: {0}, Size: {1}", web.Url, ConvertBytesToDisplayText(webBytes));
                    SiteCollectionBytes += webBytes;
                }

                long siteCollectionRecycleBinBytes = mainSite.RecycleBin.OfType<SPRecycleBinItem>().Select(item => item.Size).ToArray<long>().Sum();
                Console.WriteLine("Site Collection Recycle Bin: " + ConvertBytesToDisplayText(siteCollectionRecycleBinBytes));
                SiteCollectionBytes += siteCollectionRecycleBinBytes;
            }

            Console.WriteLine("Total Size: " + ConvertBytesToDisplayText(SiteCollectionBytes));
            Console.ReadKey();
        }

        public static long GetSPFolderSize(SPFolder folder)
        {
            long byteCount = 0;

            // calculate the files in the immediate folder
            foreach (SPFile file in folder.Files)
            {
                byteCount += file.TotalLength;

                // also include file versions
                foreach (SPFileVersion fileVersion in file.Versions)
                {
                    byteCount += fileVersion.Size;
                }
            }

            // Handle sub folders
            foreach (SPFolder subFolder in folder.SubFolders)
            {
                byteCount += GetSPFolderSize(subFolder);
            }

            return byteCount;
        }

        public static string ConvertBytesToDisplayText(long byteCount)
        {
            string result = "";
            if (byteCount > Math.Pow(1024, 3))
            {
                // display as gb
                result = (byteCount / Math.Pow(1024, 3)).ToString("#,#.##", CultureInfo.InvariantCulture) + " GB";
            }
            else if (byteCount > Math.Pow(1024, 2))
            {
                // display as mb
                result = (byteCount / Math.Pow(1024, 2)).ToString("#,#.##", CultureInfo.InvariantCulture) + " MB";
            }
            else if (byteCount > 1024)
            {
                // display as kb
                result = (byteCount / 1024).ToString("#,#.##", CultureInfo.InvariantCulture) + " KB";
            }
            else
            {
                // display as bytes
                result = byteCount.ToString("#,#.##", CultureInfo.InvariantCulture) + " Bytes";
            }
            return result;
        }
    }
}
edit 2:15 pm 3/1/2010 cst: I added the ability to count file versions as part of the size, as suggested by Goyuix in the post below. It is still off from the physical database size by a considerable amount.
edit 8:38 am 3/3/2010 cst: I added the calculation of the recycle bin size for each web, and for the site collection recycle bin, as suggested by ArjanP. Also, I want to add that I am very open to more efficient ways of doing this.
Did you consider the Trash Can? There will be cans for Webs and the Site Collection, all taking up space in the content database.
There will always be 'overhead' in a content database; every 'empty' web already consumes a number of bytes. 30% seems like a lot but not excessive; it depends on the ratio of content to the number of webs.
The content database also stores configuration information, such as which lists actually exist, features, permissions, etc. While that would probably not account for 5 GB of data, it is something to consider. Also, each file is typically associated with an SPListItem that may contain metadata for that file.
Do you have versioning turned on for any of the lists / libraries? If so, you will also need to check the SPListItem.Versions property for each version.
I'm not quite sure your code considers list attachments, either.
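If attachments do turn out to matter, a rough sketch for adding them in (untested; it assumes the server-side object model and that SPFile.TotalLength is the size you want, as in the code above):
public static long GetListAttachmentBytes(SPWeb web, SPList list)
{
    long byteCount = 0;
    foreach (SPListItem item in list.Items)
    {
        // Attachments are stored separately from the list item itself.
        foreach (string fileName in item.Attachments)
        {
            SPFile attachment = web.GetFile(item.Attachments.UrlPrefix + fileName);
            byteCount += attachment.TotalLength;
        }
    }
    return byteCount;
}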

Delete files from the folder older than 4 days

I would like to run a timer every 5 hours and delete files from a folder that are older than 4 days. Could you please help with sample code?
DateTime CutOffDate = DateTime.Now.AddDays(-4);

DirectoryInfo di = new DirectoryInfo(folderPath);
FileInfo[] fi = di.GetFiles();

for (int i = 0; i < fi.Length; i++)
{
    if (fi[i].LastWriteTime < CutOffDate)
    {
        File.Delete(fi[i].FullName);
    }
}
You can substitute the LastWriteTime property for something else; that's just what I use when clearing out an image cache in an app I have.
EDIT:
Though this doesn't include the timer part... I'll let you figure that part out yourself. A little Googling should show you several ways to do it on a schedule.
Since it hasn't been mentioned, I would recommend using a System.Threading.Timer for something like this. Here's an example implementation:
System.Threading.Timer DeleteFileTimer = null;

private void CreateStartTimer()
{
    TimeSpan InitialInterval = new TimeSpan(0, 0, 5);
    TimeSpan RegularInterval = new TimeSpan(5, 0, 0);

    DeleteFileTimer = new System.Threading.Timer(QueryDeleteFiles, null,
        InitialInterval, RegularInterval);
}

private void QueryDeleteFiles(object state)
{
    // Delete files here... (fires every five hours).
    // Warning: don't update any UI elements from here without Invoke()ing
    System.Diagnostics.Debug.WriteLine("Deleting Files...");
}

private void StopDestroyTimer()
{
    DeleteFileTimer.Change(System.Threading.Timeout.Infinite,
        System.Threading.Timeout.Infinite);

    DeleteFileTimer.Dispose();
}
This way, you can run your file deletion code in a windows service with minimal hassle.
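Putting the two answers together, QueryDeleteFiles might look something like this (untested sketch; folderPath is assumed to be a field pointing at the directory to clean):
private void QueryDeleteFiles(object state)
{
    DateTime cutOffDate = DateTime.Now.AddDays(-4);

    DirectoryInfo di = new DirectoryInfo(folderPath);
    foreach (FileInfo file in di.GetFiles())
    {
        // Delete anything not written to in the last 4 days.
        if (file.LastWriteTime < cutOffDate)
        {
            file.Delete();
        }
    }
}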
