How do I optimize calculating the hash of thousands of files? - c#

I have a .NET program that runs through a directory containing tens of thousands of relatively small files (around 10MB each), calculates their MD5 hashes and stores that data in an SQLite database. The whole process works fine, but it takes a relatively long time (1,094,353 ms for around 60 thousand files) and I'm looking for ways to optimize it. Here are the solutions I've thought of:
Use additional threads and calculate the hash of more than one file simultaneously. Not sure how I/O speed would limit me with this one.
Use a better hashing algorithm. I've looked around and the one I'm currently using seems to be the fastest one (on C# at least).
Which would be the best approach, and are there any better ones?
Here's my current code:
private async Task<string> CalculateHash(string file, System.Security.Cryptography.MD5 md5) {
    Task<string> hashTask = Task.Run(() =>
    {
        using (var stream = new BufferedStream(System.IO.File.OpenRead(file), 1200000))
        {
            var hash = md5.ComputeHash(stream);
            var fileMD5 = string.Concat(Array.ConvertAll(hash, x => x.ToString("X2")));
            return fileMD5;
        }
    });
    return await hashTask;
}
public async Task Main() {
    using (var md5 = System.Security.Cryptography.MD5.Create()) {
        foreach (var file in Directory.GetFiles(path)) { // `path` is the directory being scanned
            var hash = await CalculateHash(file, md5);
            // Adds `hash` to the database
        }
    }
}

Create a pipeline of work. The easiest way I know to build a pipeline that mixes single-threaded stages with multi-threaded stages is TPL Dataflow:
public static class Example
{
    private class Dto
    {
        public Dto(string filePath, byte[] data)
        {
            FilePath = filePath;
            Data = data;
        }

        public string FilePath { get; }
        public byte[] Data { get; }
    }

    public static async Task ProcessFiles(string path)
    {
        var getFilesBlock = new TransformBlock<string, Dto>(filePath => new Dto(filePath, File.ReadAllBytes(filePath))); //Only lets one thread do this at a time.

        var hashFilesBlock = new TransformBlock<Dto, Dto>(dto => HashFile(dto),
            new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = Environment.ProcessorCount, //We can multi-thread this part.
                                              BoundedCapacity = 50}); //Only allow 50 byte[]'s to be waiting in the queue. It will unblock getFilesBlock once there is room.

        var writeToDatabaseBlock = new ActionBlock<Dto>(WriteToDatabase,
            new ExecutionDataflowBlockOptions {BoundedCapacity = 50}); //MaxDegreeOfParallelism defaults to 1 so we don't need to specify it.

        //Link the blocks together.
        getFilesBlock.LinkTo(hashFilesBlock, new DataflowLinkOptions {PropagateCompletion = true});
        hashFilesBlock.LinkTo(writeToDatabaseBlock, new DataflowLinkOptions {PropagateCompletion = true});

        //Queue the work for the first block.
        foreach (var filePath in Directory.EnumerateFiles(path))
        {
            await getFilesBlock.SendAsync(filePath).ConfigureAwait(false);
        }

        //Tell the first block we are done adding files.
        getFilesBlock.Complete();

        //Wait for the last block to finish processing its last item.
        await writeToDatabaseBlock.Completion.ConfigureAwait(false);
    }

    private static Dto HashFile(Dto dto)
    {
        using (var md5 = System.Security.Cryptography.MD5.Create())
        {
            return new Dto(dto.FilePath, md5.ComputeHash(dto.Data));
        }
    }

    private static async Task WriteToDatabase(Dto arg)
    {
        //Write to the database here.
    }
}
This creates a pipeline with 3 segments.
One that is single-threaded and reads the files from the hard drive into memory, storing each as a byte[].
A second one that can use up to Environment.ProcessorCount threads to hash the files. It will only allow 50 items to sit in its inbound queue; when the first block tries to add more, it stops processing new items until the next block is ready to accept them.
And a third one that is single-threaded and adds the data to the database; it also allows only 50 items in its inbound queue at a time.
Because of the two 50-item limits there will be at most around 100 byte[]s in memory (50 in the hashFilesBlock queue, 50 in the writeToDatabaseBlock queue; items currently being processed count toward the BoundedCapacity limit).
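Since writeToDatabaseBlock is single-threaded, the database write can stay simple. As a sketch of what that placeholder might contain (my addition, assuming the Microsoft.Data.Sqlite package and a pre-created FileHashes table, both hypothetical):
using Microsoft.Data.Sqlite;
private static async Task WriteToDatabase(Dto arg)
{
    // Hypothetical: assumes a FileHashes(Path, Hash) table already exists in hashes.db.
    using (var connection = new SqliteConnection("Data Source=hashes.db"))
    {
        await connection.OpenAsync();
        var command = connection.CreateCommand();
        command.CommandText = "INSERT INTO FileHashes (Path, Hash) VALUES ($path, $hash)";
        command.Parameters.AddWithValue("$path", arg.FilePath);
        command.Parameters.AddWithValue("$hash", BitConverter.ToString(arg.Data).Replace("-", ""));
        await command.ExecuteNonQueryAsync();
    }
}
Opening a connection per item is wasteful; because this block runs single-threaded, you could instead open one connection before starting the pipeline and reuse it for every insert.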
Update: for fun I wrote a version that reports progress too; it's untested, though, and uses C# 7 features.
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
public static class Example
{
    private class Dto
    {
        public Dto(string filePath, byte[] data)
        {
            FilePath = filePath;
            Data = data;
        }

        public string FilePath { get; }
        public byte[] Data { get; }
    }

    public static async Task ProcessFiles(string path, IProgress<ProgressReport> progress)
    {
        int totalFilesFound = 0;
        int totalFilesRead = 0;
        int totalFilesHashed = 0;
        int totalFilesUploaded = 0;
        DateTime lastReported = DateTime.UtcNow;

        void ReportProgress()
        {
            if (DateTime.UtcNow - lastReported < TimeSpan.FromSeconds(1)) //Try to fire only once a second, but this code is not perfect so you may get a few rapid-fire reports.
            {
                return;
            }
            lastReported = DateTime.UtcNow;
            var report = new ProgressReport(totalFilesFound, totalFilesRead, totalFilesHashed, totalFilesUploaded);
            progress.Report(report);
        }

        var getFilesBlock = new TransformBlock<string, Dto>(filePath =>
        {
            var dto = new Dto(filePath, File.ReadAllBytes(filePath));
            totalFilesRead++; //safe because this block is single-threaded.
            return dto;
        });

        var hashFilesBlock = new TransformBlock<Dto, Dto>(inDto =>
        {
            using (var md5 = System.Security.Cryptography.MD5.Create())
            {
                var outDto = new Dto(inDto.FilePath, md5.ComputeHash(inDto.Data));
                Interlocked.Increment(ref totalFilesHashed); //Need the Interlocked because this block is multithreaded.
                ReportProgress();
                return outDto;
            }
        },
        new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = Environment.ProcessorCount, BoundedCapacity = 50});

        var writeToDatabaseBlock = new ActionBlock<Dto>(arg =>
        {
            //Write to database here.
            totalFilesUploaded++;
            ReportProgress();
        },
        new ExecutionDataflowBlockOptions {BoundedCapacity = 50});

        getFilesBlock.LinkTo(hashFilesBlock, new DataflowLinkOptions {PropagateCompletion = true});
        hashFilesBlock.LinkTo(writeToDatabaseBlock, new DataflowLinkOptions {PropagateCompletion = true});

        foreach (var filePath in Directory.EnumerateFiles(path))
        {
            await getFilesBlock.SendAsync(filePath).ConfigureAwait(false);
            totalFilesFound++;
            ReportProgress();
        }

        getFilesBlock.Complete();
        await writeToDatabaseBlock.Completion.ConfigureAwait(false);
        ReportProgress();
    }
}

public class ProgressReport
{
    public ProgressReport(int totalFilesFound, int totalFilesRead, int totalFilesHashed, int totalFilesUploaded)
    {
        TotalFilesFound = totalFilesFound;
        TotalFilesRead = totalFilesRead;
        TotalFilesHashed = totalFilesHashed;
        TotalFilesUploaded = totalFilesUploaded;
    }

    public int TotalFilesFound { get; }
    public int TotalFilesRead { get; }
    public int TotalFilesHashed { get; }
    public int TotalFilesUploaded { get; }
}
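A minimal call site for this version might look like the following sketch (the path is a placeholder). Note that Progress<T> posts its callback to the SynchronizationContext captured at construction, which is usually what you want in a UI app:
var progress = new Progress<ProgressReport>(report =>
    Console.WriteLine($"Found: {report.TotalFilesFound}, Read: {report.TotalFilesRead}, " +
                      $"Hashed: {report.TotalFilesHashed}, Written: {report.TotalFilesUploaded}"));
await Example.ProcessFiles(@"C:\MyFiles", progress);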

As far as I understand, Task.Run queues work to the thread pool rather than creating a new thread per file, but because your loop awaits each CalculateHash call before starting the next, the files are still hashed one at a time. A case like the one you describe sounds like a good fit for Parallel.For or Parallel.ForEach, something like this:
public void CalcHashes(string path)
{
    string GetFileHash(System.Security.Cryptography.MD5 md5, string fileName)
    {
        using (var stream = new BufferedStream(System.IO.File.OpenRead(fileName), 1200000))
        {
            var hash = md5.ComputeHash(stream);
            var fileMD5 = string.Concat(Array.ConvertAll(hash, x => x.ToString("X2")));
            return fileMD5;
        }
    }

    ParallelOptions options = new ParallelOptions();
    options.MaxDegreeOfParallelism = 8;

    Parallel.ForEach(Directory.EnumerateFiles(path), options, fileName =>
    {
        using (var md5 = System.Security.Cryptography.MD5.Create())
        {
            var hash = GetFileHash(md5, fileName);
            // Store `hash` in the database here.
        }
    });
}
EDIT: It seems Parallel.ForEach does not do the partitioning automatically, so I limited the maximum degree of parallelism to 8. Results:
107005 files
46628 ms
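One possible refinement (my own sketch, not part of the answer above): the Parallel.ForEach overload with thread-local state lets each worker thread create a single MD5 instance and reuse it across files, instead of constructing one per file:
Parallel.ForEach(
    Directory.EnumerateFiles(path),
    new ParallelOptions { MaxDegreeOfParallelism = 8 },
    () => System.Security.Cryptography.MD5.Create(), // localInit: one MD5 per worker thread
    (fileName, loopState, md5) =>
    {
        var hash = GetFileHash(md5, fileName); // reuses the helper above
        // Store `hash` in the database here.
        return md5; // hand the instance to the next iteration on this thread
    },
    md5 => md5.Dispose()); // localFinally: dispose the per-thread instance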

Related

Nethereum C# automated ether transfer

I wish to automate the transfer of ether to a list of people.
Assume the list is in a CSV file.
I wrote some code to automate the process.
class Program
{
static int nonce = 0;
static void Main(string[] args)
{
var account = SetupAccount();
var recipients = ReadCsv();
var web3Init = GetConnection();
nonce = (int)web3Init.Eth.Transactions.GetTransactionCount.SendRequestAsync(account.Address).Result.Value;
//var recipients = new List<Records>() { new Records() { Value = 10000000000000000, Address = "0x5CC494843e3f4AC175A5e730c300b011FAbF2cEa" } };
foreach (var recipient in recipients)
{
try
{
var web3 = GetConnection();
var receipt = SendEther(account, recipient, web3).Result;
}
catch (System.Exception)
{
MessageBox.Show("Failed");
}
Thread.Sleep(30000);
}
}
private static async Task<TransactionReceipt> SendEther(Account account, Records recipient, Web3 web3)
{
var transactionPolling = web3.TransactionManager.TransactionReceiptService;
//var currentBalance = await web3.Eth.GetBalance.SendRequestAsync(account.Address);
//assumed client is mining already
//when sending a transaction using an Account, a raw transaction is signed and send using the private key
return await transactionPolling.SendRequestAndWaitForReceiptAsync(() =>
{
var transactionInput = new TransactionInput
{
From = account.Address,
//Gas = new HexBigInteger(25000),
GasPrice = new HexBigInteger(BigInteger.Pow(10, 10)), // note: 10 ^ 10 is XOR in C#, not exponentiation
To = recipient.Address,
Value = new HexBigInteger(new BigInteger(recipient.Value)),
Nonce = nonce
};
var txSigned = new Nethereum.Signer.TransactionSigner();
var signedTx = txSigned.SignTransaction(account.PrivateKey, transactionInput.To, transactionInput.Value, transactionInput.Nonce);
var transaction = new Nethereum.RPC.Eth.Transactions.EthSendRawTransaction(web3.Client);
nonce++;
return transaction.SendRequestAsync(signedTx);
});
}
private static Web3 GetConnection()
{
return new Web3("https://mainnet.infura.io");
}
private static Account SetupAccount()
{
var password = "#Password";
var accountFilePath = @"filePath";
return Account.LoadFromKeyStoreFile(accountFilePath, password);
}
private static List<Records> ReadCsv()
{
string filePath = @"C:\Users\Potti\source\repos\ConversionFiles\XrcfRecipients.csv";
if (File.Exists(filePath))
{
using (StreamReader stream = new StreamReader(filePath))
{
CsvReader reader = new CsvReader(stream, new Configuration
{
TrimOptions = TrimOptions.Trim,
HasHeaderRecord = true,
HeaderValidated = null
});
reader.Configuration.RegisterClassMap<RecordMapper>();
return reader.GetRecords<Records>().ToList();
}
}
else
{
return null;
}
}
}
class Records
{
public string Address { get; set; }
public decimal Value { get; set; }
}
sealed class RecordMapper : ClassMap<Records>
{
public RecordMapper()
{
Map(x => x.Address).Name("Address");
Map(x => x.Value).Name("Value");
}
}
How do I modify the process to execute all the transactions at once instead of waiting for each one to complete? (Fire and forget.)
Also, are there any security considerations in doing this?
What you are currently doing is waiting for each transaction to be mined. What you can do is the following:
var account = new Account("privateKey"); // or load it from your keystore file as you are doing.
var web3 = new Web3(account, "https://mainnet.infura.io");
First create a web3 instance using the same Account object; because we are using an account with a private key, Nethereum will sign your transactions offline before sending them.
Now, using the TransactionManager, you can send a transaction per recipient:
var transactionHashes = new List<string>();
foreach (var recipient in recipients)
{
    var transactionInput = new TransactionInput
    {
        From = account.Address,
        GasPrice = new HexBigInteger(Web3.Convert.ToWei(1.5, UnitConversion.EthUnit.Gwei)),
        To = recipient.Address,
        Value = new HexBigInteger(new BigInteger(recipient.Value)),
    };
    var transactionHash = await web3.Eth.TransactionManager.SendTransactionAsync(transactionInput);
    transactionHashes.Add(transactionHash);
}
Note that when Nethereum uses the same instance of an Account and TransactionManager (or Web3 in this scenario), it creates a default NonceMemoryService, so you don't need to keep track of the nonces (transaction numbers) used to sign the transactions.
I have also converted the GasPrice from Gwei to Wei as an example of unit conversion; I assume you have already converted the Ether amounts you are going to send into Wei.
Finally, another note: to simplify this further, there is an upcoming EtherTransferService which lets you input Ether amounts and Gwei gas prices to avoid doing conversions. The gas price will also be calculated for you if no parameter is passed.
web3.Eth.GetEtherTransferService().TransferEtherAsync("toAddress", EtherAmount);
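To address the fire-and-forget part of the question, here is a sketch of mine building on the same API usage as above: start every send without awaiting it, then await them together. Note this assumes the nonce service copes correctly with concurrent sends; if in doubt, keep the sequential loop.
var sendTasks = new List<Task<string>>();
foreach (var recipient in recipients)
{
    var transactionInput = new TransactionInput
    {
        From = account.Address,
        GasPrice = new HexBigInteger(Web3.Convert.ToWei(1.5, UnitConversion.EthUnit.Gwei)),
        To = recipient.Address,
        Value = new HexBigInteger(new BigInteger(recipient.Value)),
    };
    sendTasks.Add(web3.Eth.TransactionManager.SendTransactionAsync(transactionInput)); // not awaited yet
}
var transactionHashes = await Task.WhenAll(sendTasks); // all submitted; hashes collected together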

Run Async every x number of times in a for loop

I'm downloading 100K+ files and want to do it in batches, such as 100 files at a time.
static void Main(string[] args) {
Task.WaitAll(
new Task[]{
RunAsync()
});
}
// each group has 100 attachments.
static async Task RunAsync() {
foreach (var group in groups) {
var tasks = new List<Task>();
foreach (var attachment in group.attachments) {
tasks.Add(DownloadFileAsync(attachment, downloadPath));
}
await Task.WhenAll(tasks);
}
}
static async Task DownloadFileAsync(Attachment attachment, string path) {
using (var client = new HttpClient()) {
using (var fileStream = File.Create(path + attachment.FileName)) {
var downloadedFileStream = await client.GetStreamAsync(attachment.url);
await downloadedFileStream.CopyToAsync(fileStream);
}
}
}
Expected
I was hoping it would download 100 files at a time, then download the next 100.
Actual
It downloads a lot more at the same time. I quickly got an error: "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host".
Running tasks in "batches" is not a good idea in terms of performance: one long-running task will block the whole batch. A better approach is to start a new task as soon as one finishes.
This can be implemented with a queue, as @MertAkcakaya suggested, but I will post another alternative based on my other answer: Have a set of Tasks with only X running at a time
int maxThreads = 3;
System.Net.ServicePointManager.DefaultConnectionLimit = 50; //Set this once to a max value in your app

var urls = new Tuple<string, string>[] {
    Tuple.Create("http://cnn.com", "temp/cnn1.htm"),
    Tuple.Create("http://cnn.com", "temp/cnn2.htm"),
    Tuple.Create("http://bbc.com", "temp/bbc1.htm"),
    Tuple.Create("http://bbc.com", "temp/bbc2.htm"),
    Tuple.Create("http://stackoverflow.com", "temp/stackoverflow.htm"),
    Tuple.Create("http://google.com", "temp/google1.htm"),
    Tuple.Create("http://google.com", "temp/google2.htm"),
};

DownloadParallel(urls, maxThreads);
async Task DownloadParallel(IEnumerable<Tuple<string, string>> urls, int maxThreads)
{
    SemaphoreSlim maxThread = new SemaphoreSlim(maxThreads);
    var client = new HttpClient();

    foreach (var url in urls)
    {
        await maxThread.WaitAsync();
        DownloadFile(client, url.Item1, url.Item2)
            .ContinueWith(task => maxThread.Release());
    }
}

async Task DownloadFile(HttpClient client, string url, string fileName)
{
    var stream = await client.GetStreamAsync(url);
    using (var fileStream = File.Create(fileName))
    {
        await stream.CopyToAsync(fileStream);
    }
}
PS: DownloadParallel will return as soon as it starts the last download, so don't await it. If you really want to await it, add for (int i = 0; i < maxThreads; i++) await maxThread.WaitAsync(); at the end of the method.
PS2: Don't forget to add exception handling to DownloadFile, as in the sketch below.
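A sketch of what that exception handling could look like (my addition; the logging is a placeholder). The semaphore is still released either way, because the ContinueWith in DownloadParallel runs whether the task faults or completes:
async Task DownloadFile(HttpClient client, string url, string fileName)
{
    try
    {
        var stream = await client.GetStreamAsync(url);
        using (var fileStream = File.Create(fileName))
        {
            await stream.CopyToAsync(fileStream);
        }
    }
    catch (HttpRequestException ex)
    {
        // Placeholder: log and decide whether to retry or skip this url.
        Console.WriteLine("Download of " + url + " failed: " + ex.Message);
    }
}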

Why does this code lead to a deadlock?

The following code is meant to copy files asynchronously, but it causes a deadlock in my app. It uses a task combinator helper method called 'Interleaved(..)' found here to return tasks in the order they complete.
public static async Task<List<StorageFile>> CopyFiles_CAUSES_DEADLOCK(IEnumerable<StorageFile> sourceFiles, IProgress<int> progress, StorageFolder destinationFolder)
{
List<StorageFile> copiedFiles = new List<StorageFile>();
List<Task<StorageFile>> copyTasks = new List<Task<StorageFile>>();
foreach (var file in sourceFiles)
{
// Create the copy tasks and add to list
var copyTask = file.CopyAsync(destinationFolder, Guid.NewGuid().ToString()).AsTask();
copyTasks.Add(copyTask);
}
// Serve up each task as it completes
foreach (var bucket in Interleaved(copyTasks))
{
var copyTask = await bucket;
var copiedFile = await copyTask;
copiedFiles.Add(copiedFile);
progress.Report((int)((double)copiedFiles.Count / sourceFiles.Count() * 100.0));
}
return copiedFiles;
}
I originally created a simpler 'CopyFiles(...)' method which processes the tasks in the order they were supplied (as opposed to completed), and that works fine, but I can't figure out why this one frequently deadlocks, particularly when there are many files to process.
Here is the simpler 'CopyFiles' code that works:
public static async Task<List<StorageFile>> CopyFiles_RUNS_OK(IEnumerable<StorageFile> sourceFiles, IProgress<int> progress, StorageFolder destinationFolder)
{
List<StorageFile> copiedFiles = new List<StorageFile>();
int sourceFilesCount = sourceFiles.Count();
List<Task<StorageFile>> tasks = new List<Task<StorageFile>>();
foreach (var file in sourceFiles)
{
// Create the copy tasks and add to list
var copiedFile = await file.CopyAsync(destinationFolder, Guid.NewGuid().ToString()).AsTask();
copiedFiles.Add(copiedFile);
progress.Report((int)((double)copiedFiles.Count / sourceFilesCount *100.0));
}
return copiedFiles;
}
EDIT:
In an attempt to find out what's going on, I've changed the implementation of CopyFiles(...) to use TPL Dataflow. I am aware that this code will return items in the order they were supplied, which is not what I want, but it removes the Interleaved dependency as a start. Despite this, the app still hangs; it seems as if it's not returning from the file.CopyAsync(..) call. There is, of course, the possibility that I'm just doing something wrong here.
public static async Task<List<StorageFile>> CopyFiles_CAUSES_HANGING_ALSO(IEnumerable<StorageFile> sourceFiles, IProgress<int> progress, StorageFolder destinationFolder)
{
int sourceFilesCount = sourceFiles.Count();
List<StorageFile> copiedFiles = new List<StorageFile>();
// Store for input files.
BufferBlock<StorageFile> inputFiles = new BufferBlock<StorageFile>();
//
Func<StorageFile, Task<StorageFile>> copyFunc = sf => sf.CopyAsync(destinationFolder, Guid.NewGuid().ToString()).AsTask();
TransformBlock<StorageFile, Task<StorageFile>> copyFilesBlock = new TransformBlock<StorageFile, Task<StorageFile>>(copyFunc);
inputFiles.LinkTo(copyFilesBlock, new DataflowLinkOptions() { PropagateCompletion = true });
foreach (var file in sourceFiles)
{
inputFiles.Post(file);
}
inputFiles.Complete();
while (await copyFilesBlock.OutputAvailableAsync())
{
Task<StorageFile> file = await copyFilesBlock.ReceiveAsync();
copiedFiles.Add(await file);
progress.Report((int)((double)copiedFiles.Count / sourceFilesCount * 100.0));
}
copyFilesBlock.Completion.Wait();
return copiedFiles;
}
Many thanks in advance for any help.

Task.WaitSubset / Task.WaitN?

There's a Task.WaitAll method which waits for all tasks, and a Task.WaitAny method which waits for one task. How do I wait for any N tasks?
Use case: search result pages are downloaded, and each result needs a separate task to download and process it. If I use WaitAll to wait for the results of the subtasks before getting the next search result page, I will not use all available resources (one long task will delay the rest). Not waiting at all can cause thousands of tasks to be queued, which isn't the best idea either.
So, how to wait for a subset of tasks to be completed? Or, alternatively, how to wait for the task scheduler queue to have only N tasks?
This looks like an excellent problem for TPL Dataflow, which will allow you to control parallelism and buffering to process at maximum speed.
Here's some (untested) code to show you what I mean:
static void Process()
{
    var searchReader =
        new TransformManyBlock<SearchResult, SearchResult>(async searchResult =>
        {
            // download searchResult.Uri and return the list of search results found there.
            return new[]
            {
                new SearchResult
                {
                    IsResult = true,
                    Uri = "http://foo.com"
                },
                new SearchResult
                {
                    // return the next search result page here.
                    IsResult = false,
                    Uri = "http://google.com/next"
                }
            };
        }, new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = 8, // restrict buffer size.
            MaxDegreeOfParallelism = 4 // control parallelism.
        });

    // link "next" pages back to the searchReader.
    searchReader.LinkTo(searchReader, x => !x.IsResult);

    var resultActor = new ActionBlock<SearchResult>(async searchResult =>
    {
        // do something with the search result.
    }, new ExecutionDataflowBlockOptions
    {
        BoundedCapacity = 64,
        MaxDegreeOfParallelism = 16
    });

    // link search results into resultActor.
    searchReader.LinkTo(resultActor, x => x.IsResult);

    // put in the first piece of input.
    searchReader.Post(new SearchResult { Uri = "http://google/first" });
}

struct SearchResult
{
    public bool IsResult { get; set; }
    public string Uri { get; set; }
}
I think you should independently limit the number of parallel download tasks and the number of concurrent result-processing tasks. I would do it using two SemaphoreSlim objects, as below. This version doesn't use the synchronous SemaphoreSlim.Wait (thanks @svick for making the point). It has only been lightly tested and the exception handling can be improved; substitute your own DownloadNextPageAsync and ProcessResults:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Console_21666797
{
partial class Program
{
// the actual download method
// async Task<string> DownloadNextPageAsync(string url) { ... }
// the actual process methods
// void ProcessResults(string data) { ... }
// download and process all pages
async Task DownloadAndProcessAllAsync(
string startUrl, int maxDownloads, int maxProcesses)
{
// max parallel downloads
var downloadSemaphore = new SemaphoreSlim(maxDownloads);
// max parallel processing tasks
var processSemaphore = new SemaphoreSlim(maxProcesses);
var tasks = new HashSet<Task>();
var complete = false;
var protect = new Object(); // protect tasks
var page = 0;
// do the page
Func<string, Task> doPageAsync = async (url) =>
{
bool downloadSemaphoreAcquired = true;
try
{
// download the page
var data = await DownloadNextPageAsync(
url).ConfigureAwait(false);
if (String.IsNullOrEmpty(data))
{
Volatile.Write(ref complete, true);
}
else
{
// enable the next download to happen
downloadSemaphore.Release();
downloadSemaphoreAcquired = false;
// process this download
await processSemaphore.WaitAsync();
try
{
await Task.Run(() => ProcessResults(data));
}
finally
{
processSemaphore.Release();
}
}
}
catch (Exception)
{
Volatile.Write(ref complete, true);
throw;
}
finally
{
if (downloadSemaphoreAcquired)
downloadSemaphore.Release();
}
};
// do the page and save the task
Func<string, Task> queuePageAsync = async (url) =>
{
var task = doPageAsync(url);
lock (protect)
tasks.Add(task);
await task;
lock (protect)
tasks.Remove(task);
};
// process pages in a loop until complete is true
while (!Volatile.Read(ref complete))
{
page++;
// acquire the download semaphore asynchronously
await downloadSemaphore.WaitAsync().ConfigureAwait(false);
// do the page
var task = queuePageAsync(startUrl + "?page=" + page);
}
// await completion of the pending tasks
Task[] pendingTasks;
lock (protect)
pendingTasks = tasks.ToArray();
await Task.WhenAll(pendingTasks);
}
static void Main(string[] args)
{
new Program().DownloadAndProcessAllAsync("http://google.com", 10, 5).Wait();
Console.ReadLine();
}
}
}
Something like this should work. There might be some edge cases, but all in all it should ensure that at least n tasks have completed.
public static async Task WhenN(IEnumerable<Task> tasks, int n, CancellationTokenSource cts = null)
{
var pending = new HashSet<Task>(tasks);
if (n > pending.Count)
{
n = pending.Count;
// or throw
}
var completed = 0;
while (completed != n)
{
var completedTask = await Task.WhenAny(pending);
pending.Remove(completedTask);
completed++;
}
if (cts != null)
{
cts.Cancel();
}
}
Usage:
static void Main(string[] args)
{
    var tasks = new List<Task>();
    var completed = 0;
    var cts = new CancellationTokenSource();

    for (int i = 0; i < 100; i++)
    {
        var temp = i; // capture the loop variable for the closure
        tasks.Add(Task.Run(async () =>
        {
            await Task.Delay(temp * 100, cts.Token);
            Console.WriteLine("Completed task {0}", temp);
            Interlocked.Increment(ref completed); // the tasks run concurrently, so increment atomically
        }, cts.Token));
    }

    Extensions.WhenN(tasks, 30, cts).Wait();
    Console.WriteLine(completed);
    Console.ReadLine();
}
var runningTasks = MyTasksFactory.StartTasks().ToList(); // assumes StartTasks returns Task<TResult> items
while (runningTasks.Any())
{
    int finished = Task.WaitAny(runningTasks.ToArray());
    var finishedTask = runningTasks[finished]; // capture before removal so the closure sees the right task
    Task.Factory.StartNew(() => Consume(finishedTask.Result));
    runningTasks.RemoveAt(finished);
}
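The same idea can be written without blocking a thread, as in this sketch of mine using Task.WhenAny inside an async method (reusing the hypothetical MyTasksFactory and Consume from above). Be aware that a WhenAny loop is O(n²) over many tasks, so the WhenN approach above scales better:
var runningTasks = MyTasksFactory.StartTasks().ToList();
while (runningTasks.Any())
{
    var finished = await Task.WhenAny(runningTasks); // first task to complete
    runningTasks.Remove(finished);
    Consume(finished.Result);
}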

Retrieve process network usage

How can I get a process's sent/received bytes, preferably in C#?
I've searched a lot and haven't found a simple solution for this. Some solutions suggested installing WinPcap on the machine and working with that library.
This question asked for the same thing: Need "Processes with Network Activity" functionality in managed code - Like resmon.exe does it
I don't want the overhead of that library. Is there a simple solution for this?
What I actually want is exactly the data that the Windows Resource Monitor shows under the "Processes with Network Activity" tab.
How does the Resource Monitor get this information? Is there any example?
I also tried the counter method mentioned here:
Missing network sent/received
but with no success, as not every process is shown under this counter.
Again, I'm wondering how the Resource Monitor gets this information even without using this counter...
Resource Monitor uses ETW; thankfully, Microsoft has created a nice NuGet .NET wrapper (the TraceEvent library) to make it easier to use.
I wrote something like this recently to report back my process's network IO:
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Diagnostics.Tracing.Parsers;
using Microsoft.Diagnostics.Tracing.Session;
namespace ProcessMonitoring
{
public sealed class NetworkPerformanceReporter : IDisposable
{
private DateTime m_EtwStartTime;
private TraceEventSession m_EtwSession;
private readonly Counters m_Counters = new Counters();
private class Counters
{
public long Received;
public long Sent;
}
private NetworkPerformanceReporter() { }
public static NetworkPerformanceReporter Create()
{
var networkPerformancePresenter = new NetworkPerformanceReporter();
networkPerformancePresenter.Initialise();
return networkPerformancePresenter;
}
private void Initialise()
{
// Note that the ETW class blocks processing messages, so should be run on a different thread if you want the application to remain responsive.
Task.Run(() => StartEtwSession());
}
private void StartEtwSession()
{
try
{
var processId = Process.GetCurrentProcess().Id;
ResetCounters();
using (m_EtwSession = new TraceEventSession("MyKernelAndClrEventsSession"))
{
m_EtwSession.EnableKernelProvider(KernelTraceEventParser.Keywords.NetworkTCPIP);
m_EtwSession.Source.Kernel.TcpIpRecv += data =>
{
if (data.ProcessID == processId)
{
lock (m_Counters)
{
m_Counters.Received += data.size;
}
}
};
m_EtwSession.Source.Kernel.TcpIpSend += data =>
{
if (data.ProcessID == processId)
{
lock (m_Counters)
{
m_Counters.Sent += data.size;
}
}
};
m_EtwSession.Source.Process();
}
}
catch
{
ResetCounters(); // Stop reporting figures
// Probably should log the exception
}
}
public NetworkPerformanceData GetNetworkPerformanceData()
{
var timeDifferenceInSeconds = (DateTime.Now - m_EtwStartTime).TotalSeconds;
NetworkPerformanceData networkData;
lock (m_Counters)
{
networkData = new NetworkPerformanceData
{
BytesReceived = Convert.ToInt64(m_Counters.Received / timeDifferenceInSeconds),
BytesSent = Convert.ToInt64(m_Counters.Sent / timeDifferenceInSeconds)
};
}
// Reset the counters to get a fresh reading for next time this is called.
ResetCounters();
return networkData;
}
private void ResetCounters()
{
lock (m_Counters)
{
m_Counters.Sent = 0;
m_Counters.Received = 0;
}
m_EtwStartTime = DateTime.Now;
}
public void Dispose()
{
m_EtwSession?.Dispose();
}
}
public sealed class NetworkPerformanceData
{
public long BytesReceived { get; set; }
public long BytesSent { get; set; }
}
}
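A minimal usage sketch (my addition). Kernel ETW sessions generally require the process to run elevated, and GetNetworkPerformanceData returns per-second rates, since it divides the counters by the elapsed time:
var reporter = NetworkPerformanceReporter.Create();
// ... let the application do its network work for a while ...
await Task.Delay(TimeSpan.FromSeconds(5));
var data = reporter.GetNetworkPerformanceData();
Console.WriteLine($"Sent: {data.BytesSent} B/s, Received: {data.BytesReceived} B/s");
reporter.Dispose();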
You can use PerformanceCounter. Sample code:
//Define
string pn = "MyProcessName.exe";
var readOpSec = new PerformanceCounter("Process","IO Read Operations/sec", pn);
var writeOpSec = new PerformanceCounter("Process","IO Write Operations/sec", pn);
var dataOpSec = new PerformanceCounter("Process","IO Data Operations/sec", pn);
var readBytesSec = new PerformanceCounter("Process","IO Read Bytes/sec", pn);
var writeByteSec = new PerformanceCounter("Process","IO Write Bytes/sec", pn);
var dataBytesSec = new PerformanceCounter("Process","IO Data Bytes/sec", pn);
var counters = new List<PerformanceCounter>
{
readOpSec,
writeOpSec,
dataOpSec,
readBytesSec,
writeByteSec,
dataBytesSec
};
// get current value
foreach (PerformanceCounter counter in counters)
{
float rawValue = counter.NextValue();
// display the value
}
And this is how to get performance counters for the network card. Note that these counters are per network interface, not per process:
string cn = "get connection string from WMI";
var networkBytesSent = new PerformanceCounter("Network Interface", "Bytes Sent/sec", cn);
var networkBytesReceived = new PerformanceCounter("Network Interface", "Bytes Received/sec", cn);
var networkBytesTotal = new PerformanceCounter("Network Interface", "Bytes Total/sec", cn);
Counters.Add(networkBytesSent);
Counters.Add(networkBytesReceived);
Counters.Add(networkBytesTotal);
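The instance name for the "Network Interface" category can also be discovered at runtime rather than via WMI, for example (a sketch, my addition):
var category = new PerformanceCounterCategory("Network Interface");
foreach (string instanceName in category.GetInstanceNames())
{
    // Pick the adapter you are interested in, e.g. by matching part of its description.
    Console.WriteLine(instanceName);
}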
Have a look at the IP Helper API. There is an implementation in C# by Simon Mourier that sums transferred bytes per process: https://stackoverflow.com/a/25650933/385513
It would be interesting to know how this compares with Event Tracing for Windows (ETW)...
