This question already has answers here:
How do you get total amount of RAM the computer has?
(18 answers)
Closed 18 days ago.
I want to spawn multiple threads using a ThreadPool based on the available system memory. How can I do that in C#? For example, I want to spawn around 200 threads per GB of available system memory.
I have done this exercise in Python, where I could use the psutil package to fetch the memory:
memory = psutil.virtual_memory()
available_gb = memory.available / 1024 / 1024 / 1024
available_memory = round(available_gb)
print('Available memory in GB is %s' % available_memory)
I would like to replicate the same on C#.
Based on what you've said, you want to do the same thing as the Python code, but now in C#:
using Microsoft.VisualBasic.Devices;  // requires a reference to the Microsoft.VisualBasic assembly

ComputerInfo computerInfo = new ComputerInfo();
// AvailablePhysicalMemory is reported in bytes; convert to GB
double totalMemory = computerInfo.AvailablePhysicalMemory / 1024.0 / 1024.0 / 1024.0;
int availableMemory = (int)Math.Round(totalMemory);
Console.WriteLine("Available memory in GB is " + availableMemory);
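To tie this back to the stated goal of roughly 200 threads per GB of available memory, one possible way to apply that sizing rule to the ThreadPool is sketched below; the 200-per-GB ratio comes straight from the question, and whether raising the pool's minimum worker-thread count is the right mechanism depends on your workload:

// Hypothetical sizing rule from the question: ~200 worker threads per GB.
int workerThreads = availableMemory * 200;

// Keep the existing completion-port setting and raise only the worker minimum.
ThreadPool.GetMinThreads(out _, out int completionPortThreads);
ThreadPool.SetMinThreads(workerThreads, completionPortThreads);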
Let me know if that's what you are looking for.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more.
Closed 6 years ago.
I'm getting both Depth & Color frames from the Kinect 2, using the Kinect SDK ( C# ), and I'm sending them to Python clients using ZeroMQ.
this.shorts = new ushort[217088];      // 512 * 424
this.depthBytes = new Byte[434176];    // 512 * 424 * 2
this.colorBytes = new Byte[4147200];   // 1920 * 1080 * 4
public void SendDepthFrame(DepthFrame depthFrame)
{
depthFrame.CopyFrameDataToArray(this.shorts);
Buffer.BlockCopy(shorts, 0, this.depthBytes, 0, this.depthBytes.Length);
this.depthPublisher.SendByteArray(this.depthBytes);
}
public void SendColorFrame(ColorFrame colorFrame, WriteableBitmap map)
{
colorFrame.CopyRawFrameDataToArray(this.colorBytes);
this.colorPublisher.SendByteArray(this.colorBytes);
}
Since I'm sending uncompressed data, I'm overloading the network, and I'd like to compress these frames.
Is this possible for continuous stream-processing?
I know that I could compress each frame to PNG/JPEG, but I would like to maintain the notion of a video stream.
The goal is to compress the data in C# before sending, and then decode it in Python.
Are there any libraries that allow doing that?
Maybe forget about compression for the moment and downscale for a PoC
If your design indeed makes sense, try to focus on the core CV-functionality first, at the cost of reduced (downscaled) FPS, colour depth, and resolution (in this order of priority).
The data you indicate produce roughly a 1 Gbps egress data-stream, on which the forthcoming CV-processing will choke anyway, given the considerable CV-process performance (delay / latency) and interim data-representations' memory-management bottlenecks.
This said, the PoC may benefit from a 1/4 - 1/10 slower FPS acquisition/stream-processing, and the fine-tuned solution may show you how many nanoseconds-per-frame of stream-processing margin your code has (to finally decide whether there is enough time and processing power to include any sort of CODEC-processing in the otherwise working pipeline).
For a sense of scale, check the lower-left window delays in [usec] (right-click -> [Open in a New Tab] to see it enlarged) to realise the order of magnitude of a few actual OpenCV processing latencies, each about 1/4 of one of your FullHD still images, in real-world processing with a much smaller FPS on a single-threaded i7 / 3.33 GHz device. There the L3 cache can carry as much as 15 MB of imagery data, with fastest latencies of less than 13 ns (core-local access case) to 40 ns (core-remote NUMA access case), and the block nature of the CV-orchestrated image-processing benefits a lot from a minimal, if not zero, cache-miss rate -- but this is not a universal deployment-hardware scenario to rely on:
The cost (penalty) of each cache miss, with the need to ask for and perform an access to data in the main DDR-RAM, is about +100 ns >>> https://stackoverflow.com/a/33065382/3666197
Without a working pipeline, there are no quantitative data about the sustained stream-processing / its margin-per-frame, so the CODEC dilemma cannot be decided a priori of the proposed PoC-implementation.
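To make the downscaling idea concrete, here is a minimal sketch that both drops frames and halves the depth resolution before publishing. It reuses the this.shorts and depthPublisher fields from the question; the 1-in-4 frame skip and the 2x2 decimation are arbitrary PoC choices, not a recommendation:

// Hypothetical PoC throttling: drop frames and halve the resolution.
private int frameCounter = 0;
private const int SendEveryNth = 4;                              // ~1/4 of the original FPS
private readonly ushort[] downscaled = new ushort[256 * 212];    // (512/2) * (424/2)
private readonly byte[] downscaledBytes = new byte[256 * 212 * 2];

public void SendDepthFrameThrottled(DepthFrame depthFrame)
{
    if (++frameCounter % SendEveryNth != 0) return;              // skip 3 of every 4 frames
    depthFrame.CopyFrameDataToArray(this.shorts);
    // Naive 2x2 decimation: keep every second pixel on both axes.
    for (int y = 0; y < 212; y++)
        for (int x = 0; x < 256; x++)
            downscaled[y * 256 + x] = this.shorts[(y * 2) * 512 + (x * 2)];
    Buffer.BlockCopy(downscaled, 0, downscaledBytes, 0, downscaledBytes.Length);
    this.depthPublisher.SendByteArray(downscaledBytes);
}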
This question already has answers here:
Performance issues with nested loops and string concatenations
(8 answers)
Closed 8 years ago.
I'm having an issue with the speed of a simple hex editor I was working on.
I'm using a BackgroundWorker, a simple for/foreach loop, and a couple of simple statements, but it's still way, way slower than modern hex editors.
This is the main loop that is taking too long to finish:
for (int i = 0; i < buffer.Count() - 1; i++)
{
string hex = Convert.ToString(buffer[i], 16);
hexstring += ((hex.Length == 1 ? hex = "0" + hex : hex = hex)) + " ";
double x = ((double)i/(double)buffer.Count());
bw.ReportProgress((int)(x * 100));
}
I know this could be written a million times better, but I'm curious what's causing this delay.
A 1 MB .exe would take 5 minutes at 50%+ CPU usage, which is far from acceptable. Any thoughts?
Edit 1: buffer is just a byte[]; here is its only other usage:
buffer = File.ReadAllBytes(((string[]) e.Data.GetData(DataFormats.FileDrop, false))[0]);
I hate to be "that guy" in this case, but you're reinventing a built-in wheel. There is a function in .NET which converts a byte array to a hex string. All you need is love, err, this:
string hex = BitConverter.ToString(buffer);
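Note that BitConverter.ToString separates the hex pairs with dashes ("0A-FF-..."); if you want the space-separated format your loop produces, one extra call fixes that up:

string hex = BitConverter.ToString(buffer).Replace("-", " ");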
I suppose this doesn't answer your question of why your solution is slow. Your solution is primarily slow because of string immutability. Strings are immutable (read-only), and when you concatenate them (AKA combine them with the + or += operators) you create a new object. You're creating 3, sometimes 4 strings per loop iteration, which is not cheap, since they take up memory and the garbage collector eventually has to collect them. You can avoid this by using a StringBuilder, which maintains a growable buffer under the hood when appending strings (versus creating new ones). Also, if the buffer is large, it's going to take a while - sort of the nature of the beast (more operations take longer). Hope this helps!
The reason is your use of the += operator to concatenate strings.
Each time you do that, it copies all of the previous content of the string plus the added content into a new string. Each iteration there is more and more data to move; by the end of the loop it moves 6 MB of data per iteration (1 MB of input bytes, each expanded to three UTF-16 characters, i.e. 6 bytes).
When you are done creating the string for the 1 MB of data, you will have copied about 3 TB of data in total (the copies grow linearly, so they sum to roughly 6 bytes x N^2 / 2 with N = 2^20). That is a little more than the available RAM, so a whole bunch of garbage collections also had to be done to clean up old strings and make room for new ones.
If you use a StringBuilder instead, you will see a dramatic change in performance.
Next thing to improve would be to report the progress a little less often. You could for example do that for every kilobyte processed instead of every byte.
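To make both suggestions concrete, here is a minimal sketch (assuming the same buffer and bw fields as in the question) that uses a StringBuilder and reports progress only once per 1024 bytes:

var sb = new StringBuilder(buffer.Length * 3);      // "FF " = 3 chars per byte
for (int i = 0; i < buffer.Length; i++)
{
    sb.Append(buffer[i].ToString("X2")).Append(' ');
    if ((i & 0x3FF) == 0)                           // only every 1024th byte
        bw.ReportProgress((int)(100L * i / buffer.Length));
}
string hexstring = sb.ToString();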
I am trying to measure the DDR3 memory data transfer rate through a test. According to the CPU spec, the maximum theoretical bandwidth is 51.2 GB/s. This should be the combined bandwidth of four channels, i.e. 12.8 GB/s per channel. However, this is a theoretical limit, and in this post I am curious how to get closer to it in practice. In the test scenario described below I achieve a ~14 GB/s data transfer rate, which I believe may be a close approximation when killing most of the throughput boost of the CPU's L1, L2, and L3 caches.
Update 20/3 2014: This assumption of killing the L1-L3 caches is wrong. The hardware prefetching of the memory controller will analyze the data access pattern, and since it is sequential, it will have an easy task prefetching data into the CPU caches.
Specific questions follow at the bottom, but mainly I am interested in a) a verification of the assumptions leading up to this result, and b) whether there is a better way to measure memory bandwidth in .NET.
I have constructed a test in C# on .NET as a starter. Although .NET is not ideal from a memory allocation perspective, I think it is doable for this test (please let me know if you disagree and why). The test is to allocate an int64 array and fill it with integers. This array should have its data aligned in memory. Then I simply loop over this array using as many threads as I have cores on the machine, read the int64 value from the array, and assign it to a local public field in the test class. Since the result field is public, I should avoid the compiler optimising away stuff in the loop. Furthermore, and this may be a weak assumption, I think the result stays in a register and is not written to memory until it is overwritten again. Between each read of an element in the array I use a variable Step offset of 10, 100, and 1000 in the array, in order not to fetch many references from the same cache line (64 bytes).
Reading an Int64 from the array should mean a lookup read of 8 bytes and then a read of the actual value of another 8 bytes. Since data is fetched from memory in 64-byte cache lines, each read in the array should correspond to a 64-byte read from RAM each time in the loop, given that the read data is not located in any CPU cache.
Here is how I initialize the data array:
_longArray = new long[Config.NbrOfCores][];
for (int threadId = 0; threadId < Config.NbrOfCores; threadId++)
{
_longArray[threadId] = new long[Config.NmbrOfRequests];
for (int i = 0; i < Config.NmbrOfRequests; i++)
_longArray[threadId][i] = i;
}
And here is the actual test:
GC.Collect();
timer.Start();
Parallel.For(0, Config.NbrOfCores, threadId =>
{
var intArrayPerThread = _longArray[threadId];
for (int redo = 0; redo < Config.NbrOfRedos; redo++)
for (long i = 0; i < Config.NmbrOfRequests; i += Config.Step)
_result = intArrayPerThread[i];
});
timer.Stop();
Since the data summary is quite important for the result, I give this info too (it can be skipped if you trust me...):
var timetakenInSec = timer.ElapsedMilliseconds / (double)1000;
long totalNbrOfRequest = Config.NmbrOfRequests / Config.Step * Config.NbrOfCores*Config.NbrOfRedos;
var throughput_ReqPerSec = totalNbrOfRequest / timetakenInSec;
var throughput_BytesPerSec = throughput_ReqPerSec * byteSizePerRequest;
var timeTakenPerRequestInNanos = Math.Round(1e6 * timer.ElapsedMilliseconds / totalNbrOfRequest, 1);
var resultMReqPerSec = Math.Round(throughput_ReqPerSec/1e6, 1);
var resultGBPerSec = Math.Round(throughput_BytesPerSec/1073741824, 1);
var resultTimeTakenInSec = Math.Round(timetakenInSec, 1);
Omitting the actual output-rendering code, I get the following results:
Step 10: Throughput: 570,3 MReq/s and 34 GB/s (64B), Timetaken/request: 1,8 ns/req, Total TimeTaken: 12624 msec, Total Requests: 7 200 000 000
Step 100: Throughput: 462,0 MReq/s and 27,5 GB/s (64B), Timetaken/request: 2,2 ns/req, Total TimeTaken: 15586 msec, Total Requests: 7 200 000 000
Step 1000: Throughput: 236,6 MReq/s and 14,1 GB/s (64B), Timetaken/request: 4,2 ns/req, Total TimeTaken: 30430 msec, Total Requests: 7 200 000 000
Using 12 threads instead of 6 (since the CPU is hyper-threaded) I get pretty much the same throughput (as expected, I think): 32.9 / 30.2 / 15.5 GB/s.
As can be seen, throughput drops as the step increases, which I think is normal. Partly I think it is because the 12 MB L3 cache forces more cache misses, and partly it may be that the memory controller's prefetch mechanism is not working as well when the reads are so far apart. I further believe that the Step 1000 result is the closest to the actual practical memory speed, since it should kill most of the CPU caches and "hopefully" the prefetch mechanism. Furthermore, I am assuming that most of the overhead in this loop is the memory fetch operation and not something else.
The hardware for this test is:
Intel Core i7-3930 (specs: CPU brief, more detailed, and really detailed spec) using 32 GB total of DDR3-1600 memory.
Open questions
Am I correct in the assumptions made above?
Is there a way to increase the use of the memory bandwidth? For instance, by doing it in C/C++ instead and spreading memory allocation out more on the heap, enabling all four memory channels to be used.
Is there a better way to measure the memory data transfer?
Much obliged for input on this. I know it is a complex area under the hood...
All code here is available for download at https://github.com/Toby999/ThroughputTest. Feel free to contact me at a forwarding email: tobytemporary[at]gmail.com.
The decrease in throughput as you increase the step is likely caused by memory prefetching no longer working well when you don't stride linearly through memory.
Things you can do to improve the speed:
The test speed will be artificially bound by the loop itself taking up CPU cycles. As Roy shows, more speed can be achieved by unrolling the loop.
You should get rid of array bounds checking (e.g. with unsafe code and pointers)
Instead of using Parallel.For, use Thread.Start and pin each thread you start to a separate core (using the code from here: Set thread processor affinity in Microsoft .Net)
Make sure all threads start at the same time, so you don't measure any stragglers (you can do this by spinning on a memory address that you Interlocked.Exchange to a new value when all threads are running and spinning; see the sketch after this list)
On a NUMA machine (for example a 2-socket modern Xeon), you may have to take extra steps to allocate memory on the NUMA node that a thread will live on. To do this, you need to P/Invoke VirtualAllocExNuma
Speaking of memory allocations, using Large Pages should provide yet another boost
While .NET isn't the easiest framework to use for this type of testing, it IS possible to coax it into doing what you want.
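For the make-sure-all-threads-start-together point above, a minimal sketch of the spin-release idea; the field and method names here are hypothetical, not from the question's code:

// Each worker spins on a shared flag; flipping the flag releases them all at once.
private static int _go = 0;
private static long _result;

static void Worker(long[] data, int step)
{
    while (Volatile.Read(ref _go) == 0) { }      // spin until released
    for (long i = 0; i < data.Length; i += step)
        _result = data[i];                       // the measured read loop
}

// After every thread has been started and is spinning:
// Interlocked.Exchange(ref _go, 1);             // releases all workers simultaneously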
Reported RAM results (128 MB) for my bus8thread64.exe benchmark, on an i7 3820 with a max memory bandwidth of 51.2 GB/s, vary from 15.6 GB/s with 1 thread and 28.1 GB/s with 2 threads to 38.7 GB/s at 8 threads. The code is:
void inc1word(IDEF data1[], IDEF ands[], int n)
{
int i, j;
for(j=0; j<passes1; j++)
{
for (i=0; i<wordsToTest; i=i+64)
{
ands[n] = ands[n] & data1[i ] & data1[i+1 ] & data1[i+2 ] & data1[i+3 ]
& data1[i+4 ] & data1[i+5 ] & data1[i+6 ] & data1[i+7 ]
& data1[i+8 ] & data1[i+9 ] & data1[i+10] & data1[i+11]
& data1[i+12] & data1[i+13] & data1[i+14] & data1[i+15]
& data1[i+16] & data1[i+17] & data1[i+18] & data1[i+19]
& data1[i+20] & data1[i+21] & data1[i+22] & data1[i+23]
& data1[i+24] & data1[i+25] & data1[i+26] & data1[i+27]
& data1[i+28] & data1[i+29] & data1[i+30] & data1[i+31]
& data1[i+32] & data1[i+33] & data1[i+34] & data1[i+35]
& data1[i+36] & data1[i+37] & data1[i+38] & data1[i+39]
& data1[i+40] & data1[i+41] & data1[i+42] & data1[i+43]
& data1[i+44] & data1[i+45] & data1[i+46] & data1[i+47]
& data1[i+48] & data1[i+49] & data1[i+50] & data1[i+51]
& data1[i+52] & data1[i+53] & data1[i+54] & data1[i+55]
& data1[i+56] & data1[i+57] & data1[i+58] & data1[i+59]
& data1[i+60] & data1[i+61] & data1[i+62] & data1[i+63];
}
}
}
This also measures burst reading speeds, where the maximum DTR, based on this, is 46.9 GB/s. Benchmark and source code are in:
http://www.roylongbottom.org.uk/quadcore.zip
Results with interesting speeds using L3 caches are in:
http://www.roylongbottom.org.uk/busspd2k%20results.htm#anchor8Thread
C/C++ would give a more accurate metric of memory performance, as .NET can sometimes do weird things with memory handling and won't give you an accurate picture, since it doesn't use compiler intrinsics or SIMD instructions.
There's no guarantee that the CLR is going to give you anything capable of truly benchmarking your RAM. I'm sure there's probably software already written to do this. Ah, yes, PassMark makes something: http://www.bandwidthtest.net/memory_bandwidth.htm
That's probably your best bet, as making benchmarking software is pretty much all they do.
Also, nice processor btw, I have the same one in one of my machines ;)
UPDATE (2/20/2014):
I remember seeing some code in the XNA Framework that did some heavy-duty optimizations in C# that may give you exactly what you want. Have you tried using "unsafe" code and pointers?
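For what it's worth, a minimal sketch of the unsafe-pointer idea applied to the strided read loop from the question (hypothetical method name; requires compiling with /unsafe):

// Reading through a pinned pointer avoids the per-element bounds checks
// that a normal array indexer incurs.
static unsafe long ReadStrided(long[] data, int step)
{
    long result = 0;
    fixed (long* p = data)                       // pin the array for the duration
    {
        for (long i = 0; i < data.Length; i += step)
            result = p[i];                       // raw read, no bounds check
    }
    return result;
}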
Since my original question was a bit too vague, let me clarify.
My goals are:
to estimate the blank disc size after selecting a filesystem via IMAPI
to estimate the space my file will consume on this disc if I burn it.
What I would like to know:
Is it possible to get the bytes per sector for the selected file system programmatically?
If not, is there a default value for bytes per sector which IMAPI uses for different file systems / media types, and is it documented somewhere?
OK, so the short answer to my question is: one can safely assume that the sector size for DVD/BD discs is 2048 bytes.
The reason I was getting different sizes during my debug sessions was an error in the code that retrieved the sector count :)
The code block in question was copy-pasted from http://www.codeproject.com/Articles/24544/Burning-and-Erasing-CD-DVD-Blu-ray-Media-with-C-an , so just in case, I'm posting a quick fix.
Original code:
discFormatData = new MsftDiscFormat2Data();
discFormatData.Recorder = discRecorder;
IMAPI_MEDIA_PHYSICAL_TYPE mediaType = discFormatData.CurrentPhysicalMediaType;
fileSystemImage = new MsftFileSystemImage();
fileSystemImage.ChooseImageDefaultsForMediaType(mediaType);
if (!discFormatData.MediaHeuristicallyBlank)
{
fileSystemImage.MultisessionInterfaces = discFormatData.MultisessionInterfaces;
fileSystemImage.ImportFileSystem();
}
Int64 freeMediaBlocks = fileSystemImage.FreeMediaBlocks;
Fixed code:
discFormatData = new MsftDiscFormat2Data { Recorder = discRecorder };
fileSystemImage = new MsftFileSystemImage();
fileSystemImage.ChooseImageDefaults(discRecorder);
if (!discFormatData.MediaHeuristicallyBlank)
{
fileSystemImage.MultisessionInterfaces = discFormatData.MultisessionInterfaces;
fileSystemImage.ImportFileSystem();
}
Int64 freeMediaBlocks = fileSystemImage.FreeMediaBlocks;
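Given the 2048-byte sector size concluded above, converting FreeMediaBlocks to free bytes is then a one-liner (a sketch building on the fixed code):

const int BytesPerSector = 2048;                // safe assumption for DVD/BD media
Int64 freeBytes = freeMediaBlocks * BytesPerSector;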
If you know free/used blocks and the total size of the storage volume (ignoring used/free space) then you can calculate the size per block and then work the rest out.
block size = total size / (blocks used + blocks free)
free space = size per block * blocks free
I'd be surprised if you found the block size was anything other than 1K though
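As a quick illustration of that arithmetic, with hypothetical numbers (in practice the total size and block counts would come from IMAPI):

long totalBytes = 4700000000;                   // nominal DVD-5 capacity (assumed)
long blocksUsed = 1000000;                      // hypothetical
long blocksFree = 1294921;                      // hypothetical
long blockSize = totalBytes / (blocksUsed + blocksFree);    // ~2048 bytes here
long freeSpace = blockSize * blocksFree;                    // ~2.65 GB free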
via IMAPI - IWriteEngine2::get_BytesPerSector
http://msdn.microsoft.com/en-us/library/windows/desktop/aa832661(v=vs.85).aspx
This project uses a managed IMAPI2 wrapper to make life easier - http://www.codeproject.com/Articles/24544/Burning-and-Erasing-CD-DVD-Blu-ray-Media-with-C-an
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How do I retrieve disk information in C#?
I need a .NET C# code example to detect each server's drive space.
I also want step-by-step implementation instructions.
You can use the DriveInfo.GetDrives method to retrieve an array of logical drives on a machine.
Example:
var nameAndFreeSpaceOfDrives = from drive in DriveInfo.GetDrives()
where drive.IsReady
select new { drive.Name, drive.TotalFreeSpace };
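To print the results, for example (a small sketch; DriveInfo needs using System.IO, and the query syntax needs using System.Linq):

foreach (var drive in nameAndFreeSpaceOfDrives)
    Console.WriteLine("{0}: {1} bytes free", drive.Name, drive.TotalFreeSpace);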
You can also use management objects to obtain free space:
using System.Management;
.........
ManagementObject disk = new ManagementObject("win32_logicaldisk.deviceid=\"c:\"");
disk.Get();
MessageBox.Show(disk["FreeSpace"] + " bytes");
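The value comes back as a boxed UInt64, so a conversion to, say, GB looks like this (a minimal sketch):

ulong freeBytes = (ulong)disk["FreeSpace"];
double freeGb = freeBytes / 1024.0 / 1024.0 / 1024.0;
MessageBox.Show(string.Format("{0:F1} GB free", freeGb));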
You also have to add a reference to the System.Management assembly manually.