I have a computationally intensive program that starts by filling a very large array with data from a binary file. The program then goes on to manipulate this data, etc. Loading my variable from the bin file is the most time-consuming step.
Is there a way to keep the variable in Visual Studio's memory so I can just edit/recompile the meat of the program, without repeating this loading process each time?
I'm coming from Matlab, where I could load a variable into the workspace and then use it as much as I wanted from any script until I closed the Matlab environment.
I don't think that's possible. You will have to load it into memory every time you compile. Matlab is very different from C#; as you said, it runs scripts inside a closed environment.
C# compiles to a standalone program that communicates with Visual Studio (in a debug build) but doesn't belong to VS.
I don't think you can do it with Visual Studio, but you might consider using an in-memory cache like Redis to store your data. It's a bit of a pain to set up on Windows, so I recommend getting it via Chocolatey. Once it's running, StackExchange.Redis is the go-to client library.
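Roughly, that could look something like the sketch below with StackExchange.Redis; the key name, connection string and file path are just placeholders, not anything from your program:

using System;
using System.IO;
using StackExchange.Redis;

class DataCache
{
    static void Main()
    {
        // Connect to a local Redis instance (adjust the connection string as needed).
        var redis = ConnectionMultiplexer.Connect("localhost:6379");
        IDatabase db = redis.GetDatabase();

        const string key = "bigArray"; // hypothetical cache key

        RedisValue cached = db.StringGet(key);
        byte[] data;
        if (cached.HasValue)
        {
            // Cache hit: skip the expensive file load.
            data = (byte[])cached;
        }
        else
        {
            // Cache miss: do the slow load once, then store it for later runs.
            data = File.ReadAllBytes("data.bin");
            db.StringSet(key, data);
        }

        Console.WriteLine($"Loaded {data.Length} bytes");
    }
}

Bear in mind the data still travels over a socket on every run, so for a very large array this mainly saves the processing, not the transfer.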
Loading huge data into memory is not the time-consuming part; processing it is.
Loading huge files mostly depends on the read speed of the disk drive. If you have an SSD, it should not take more than a few seconds.
In any case, you can always store your data as JSON or, in a somewhat more contrived way, store the serialized binary form of your data on disk.
Then you load your baked data from disk rather than reading the raw data and processing it all over again.
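As a rough sketch of that idea, assuming the expensive step produces a double[] (the cache file name and types are purely illustrative):

using System;
using System.IO;

static class BakedData
{
    const string CachePath = "baked.bin"; // hypothetical cache file

    public static double[] LoadOrBuild(Func<double[]> expensiveBuild)
    {
        if (File.Exists(CachePath))
        {
            // Fast path: read the already-processed values straight back.
            using var reader = new BinaryReader(File.OpenRead(CachePath));
            int count = reader.ReadInt32();
            var data = new double[count];
            for (int i = 0; i < count; i++)
                data[i] = reader.ReadDouble();
            return data;
        }

        // Slow path: build once, then bake to disk for next time.
        double[] built = expensiveBuild();
        using var writer = new BinaryWriter(File.Create(CachePath));
        writer.Write(built.Length);
        foreach (double value in built)
            writer.Write(value);
        return built;
    }
}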
Related
I am working on code written by an ex-colleague. We have several image files and we are converting them to XAML. The code uses XDocument to load each image file (not huge, but there are quite a lot of them) and does the processing on multiple threads. I have tried to look for every object which I think can be disposed once each iteration completes, but the issue is still there: if I keep the process running, it consumes the RAM completely and then Visual Studio crashes. What surprises me most is that once this has happened, I am unable to open anything on my PC; every single thing, including Visual Studio, complains that memory is full.
I am unable to upload the image here.
What I have tried is to run it on a single thread; although I encounter GC pressure, I am still able to run the code and the memory stays fine until the end.
I know I need to look for an alternative to XDocument, but that is out of scope at the moment and I need to work with the existing code.
Can you please help me or give me some pointers?
The code below is how I load the XML before sending it to the API for processing:
XDocument doc;
using (var fileStream = new MemoryStream(System.Text.Encoding.ASCII.GetBytes(Image1.sv.ToString())))
{
    doc = XDocument.Load(fileStream);
}
The API then uses multi-threading to convert the image file to XAML using different methods; each of these also uses XDocument, loading it via a memory stream, saving it in memory and continuing the processing.
I have used Diagnostic Tools within VS to identify the memory leak.
Kind regards
The new MemoryStream(Encoding.ASCII.GetBytes(someString)) step seems very redundant, so we can shave a lot of things by just... not doing that, and using XDocument.Parse(someString):
var doc = XDocument.Parse(Image1.sv.ToString());
This also avoids losing data by going via ASCII, which is almost always the wrong choice.
More savings may be possible, if we knew what Image1.sv was here - i.e. it may be possible to avoid allocating a single large string in the first place.
I want to write a map editor for a game. I intend doing it using C++ and OpenGL. However, the game was written in Unity, so map loading/saving code was written in C#.
Since I worked on a similar project in C# WinForms, I have already written a C# dll that can manage some game generated files, including map files. I now plan to use it to load/save map files in the main C++ program.
What does the C# dll do? (tl;dr below)
It has a method for loading a Region into memory, consisting of an array of 1024 MemoryStreams that each contain a compressed Chunk (about 2kB to 20kB per chunk, mostly around 5kB). It also has a method for requesting a Chunk from the Region, which decompresses the stream and reads it into a Chunk object (a complex object with arrays, lists, dictionaries and other custom classes with complexities of their own).
I also have methods that do the reverse - pack the Chunk object into a MemoryStream, compress it and add it to the Region object, which has a method that saves it to a file on the disk.
The uncompressed chunk data ranges from 15kB to over 120kB in size, and that's just raw data, not including any object creation related overhead.
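For context, a much-simplified sketch of the shape of what I described above; the class members, GZip compression and method names here are stand-ins, not the real code:

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Simplified stand-ins for the real, more complex chunk contents.
public class Chunk
{
    public List<byte[]> Layers = new List<byte[]>();
    public Dictionary<string, int> Metadata = new Dictionary<string, int>();
}

public class Region
{
    // 1024 compressed chunks, each held in its own MemoryStream.
    private readonly MemoryStream[] compressedChunks = new MemoryStream[1024];

    public void Load(string path)
    {
        // Read the region file and split it into the 1024 per-chunk streams (omitted).
    }

    public Chunk GetChunk(int index)
    {
        MemoryStream compressed = compressedChunks[index];
        compressed.Position = 0;

        using var gzip = new GZipStream(compressed, CompressionMode.Decompress, leaveOpen: true);
        using var raw = new MemoryStream();
        gzip.CopyTo(raw); // ~15 kB to 120+ kB of uncompressed chunk data

        // Deserialize 'raw' into the Chunk's arrays/lists/dictionaries here (game-specific);
        // the reverse methods pack and compress a Chunk back into the Region.
        return new Chunk();
    }
}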
In the main program, I'd probably have several thousand of those Chunks loaded into memory at once, some perhaps briefly, to cache some data and then be unloaded (to generate distant terrain, for example), others fully, to be modified to the user's wishes.
tl;dr I'd be loading anywhere from a few hundred megabytes up to over a gigabyte of data within a managed C# dll. The data wouldn't be heavily accessed; it only changes when the user edits the terrain, which is not that often on a CPU time scale. But as the user moves around the map, a lot of chunks might need to be loaded/unloaded at a time.
Given that all this is within a managed C# dll, my question is, what happens to memory management and how does that impact performance of the native C++ program? To what extent can I control the memory allocation for the Region/Chunk objects? How does that impact the speed of execution?
Is it something that can be overlooked/ignored and/or dealt with, or will it pose enough of a problem to justify rewriting the dll in native C++ with a more elaborate memory management scheme?
I have a very long (over 3 hours, and somewhat manual) pre-processing method to obtain all the data I need to run an analysis. I am running this in debug mode, the pre-processing works great, and I get all the data I want correctly; however, once I start processing the data, I discover a bug. If I stop the process, I will have to re-run the pre-processing, only to discover another possible bug. Is there a way to save this pre-processed data so I can just load it back into memory, without having to repeat the pre-processing every time and without stopping the process?
I am stopped at a breakpoint just after the pre-processing and before the processing, and would like some kind of save point without having to STOP the process and add code.
If the data takes a long time to generate, but there's not actually that much of it, then you could use serialization to write your data into a file.
Probably the simplest option would be to use BinaryFormatter: you just need to mark all the types that you want to save as [Serializable] and it will work automatically.
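A minimal sketch of that, with a placeholder data type and file path (note that BinaryFormatter is flagged as insecure for untrusted input, though that matters less for a local cache of your own data):

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class PreprocessedData
{
    public double[] Values;     // whatever your pre-processing actually produces
    public DateTime GeneratedAt;
}

public static class Snapshot
{
    public static void Save(PreprocessedData data, string path)
    {
        using var stream = File.Create(path);
        new BinaryFormatter().Serialize(stream, data);
    }

    public static PreprocessedData Load(string path)
    {
        using var stream = File.OpenRead(path);
        return (PreprocessedData)new BinaryFormatter().Deserialize(stream);
    }
}

With that in place, later runs call Snapshot.Load instead of re-running the three-hour pre-processing.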
Not sure I fully understand your requirement but how about creating a memory dump file? Then you can continue execution as many times as you want from a known save point. See on MSDN: Use Dump Files to Debug App Crashes and Hangs in Visual Studio.
I have an application used to import a large dataset (millions of records) from one database to another, doing a diff in the process (i.e. removing things that were deleted, updating things, etc.). Due to many foreign key constraints and such, and to try to speed up processing, the application loads the entire destination database into memory, then loads parts of the source database and does an in-memory compare, updating the destination in memory as it goes. In the end it writes these changes back to the destination. The databases do not match one to one, so a single table in one may be multiple tables in the other, etc.
So to my question: this process currently takes hours to run (sometimes close to a day, depending on the amount of data added/changed), and this makes it very difficult to debug. Historically, when we encountered a bug, we made a change and then reran the app, which has to load all of the data into memory again (taking quite some time) and then run the import process until we get to the part we were at, and then we cross our fingers and hope our change worked. This isn't fun :(
To speed up the debugging process I am making an architectural change: moving the import code into a separate dll that is loaded into a separate appdomain, so that we can unload it, make changes, reload it, and try to run a section of the import again, picking up where we left off, to see if we get better results. I thought I was a genius when I came up with this plan :) But it has a problem. I either have to load all the data from the destination database into the second appdomain and then, before unloading, copy it all to the first using the [Serializable] approach (which is really, really slow when unloading and reloading the dll), or load the data in the host appdomain and reference it in the second using MarshalByRefObject (which seems to make the whole process slow).
So my question is: How can I do this quickly? Like, a minute max! I would love to just copy the data as if it was just passed by reference and not have to actually do a full copy.
I was wondering if there was a better way to implement this so that the data could better be shared between the two or at least quickly passed between them. I have searched and found things recommending the use of a database (we are loading the data in memory to AVOID the database) or things just saying to use MarshalByRefObject. I'd love to do something that easy but it hasn't really worked yet.
I read somewhere that loading a C++ dll or unmanaged dll causes it to ignore app domains and could introduce some problems. Is there any way I could use this to my advantage, i.e. load an unmanaged dll that holds my list for me or something, and use it to trick my application into using the same memory area for both appdomains, so that the lists just stick around when I unload the other dll by unloading its app domain?
I hope this makes sense. It's my first question on here so if I've done a terrible job do help me out. This has frustrated me for a few days now.
The app domain approach is a good way of separating things for the sake of loading/unloading only part of your application. Unfortunately, as you discovered, exchanging data between two app domains is not easy/fast. It is just like two different system processes trying to communicate, which will always be slower than communication within the same process. So the way to go is to use the quickest possible inter-process communication mechanism. Skip WCF, as it adds overhead you do not need here. Use named pipes, through which you can stream data very fast. I have used them before with good results. To go even faster you can try MemoryMappedFile, but that's more difficult to implement. Start with named pipes, and if that is too slow, go for memory-mapped files.
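A rough sketch of the named-pipe idea; the pipe name and the payload are made up for illustration, and both ends run in one process here just to keep the example self-contained:

using System;
using System.IO;
using System.IO.Pipes;
using System.Threading.Tasks;

class PipeDemo
{
    const string PipeName = "ImportDataPipe"; // hypothetical pipe name

    static async Task Main()
    {
        // Server side: waits for a client and streams the payload.
        var serverTask = Task.Run(async () =>
        {
            using var server = new NamedPipeServerStream(PipeName, PipeDirection.Out);
            await server.WaitForConnectionAsync();
            byte[] payload = new byte[1024 * 1024]; // stand-in for your serialized records
            await server.WriteAsync(payload, 0, payload.Length);
        });

        // Client side: connects and reads everything until the server closes the pipe.
        using var client = new NamedPipeClientStream(".", PipeName, PipeDirection.In);
        await client.ConnectAsync();
        using var buffer = new MemoryStream();
        await client.CopyToAsync(buffer);
        Console.WriteLine($"Received {buffer.Length} bytes");

        await serverTask;
    }
}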
Even when using a fast way of sending, you may hit another bottleneck - data serialization. For large amounts of data, standard serialization (even binary) is very slow. You may want to look at Google's protocol buffers.
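One common C# implementation is protobuf-net (that's my suggestion, not something the protocol buffers docs mandate); a minimal sketch with a made-up record type:

using System.IO;
using ProtoBuf;

[ProtoContract]
public class Record
{
    [ProtoMember(1)] public int Id;
    [ProtoMember(2)] public string Name;
}

static class ProtoDemo
{
    // Serialize one record to a compact byte array suitable for sending over a pipe.
    public static byte[] Pack(Record record)
    {
        using var ms = new MemoryStream();
        Serializer.Serialize(ms, record);
        return ms.ToArray();
    }

    // Rebuild the record on the receiving side.
    public static Record Unpack(byte[] bytes)
    {
        using var ms = new MemoryStream(bytes);
        return Serializer.Deserialize<Record>(ms);
    }
}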
One word of caution on AppDomain - any uncaught exception in one of the app domains brings the whole process down. They are not that separated, unfortunately.
On a side note: I do not know what your application does, but millions of records does not seem that excessive. Maybe there is room for optimization?
You didn't say whether it is SQL Server, but have you looked at using SSIS for this? There are evidently some techniques that can make it fast with big data.
Are there any built-in or 3rd party libraries that allow you to simply dump all variables in memory at run time? What I would like is to be able to view variables and their current values the way I can by hitting a breakpoint and hovering over variables, but without actually halting program execution (i.e. just get a snapshot). It would be good if it could dump them to a file which could then be opened later in a program with a nice GUI to view them, but a simple text file dump would be good enough.
I can't think of an easy way to do this in a generic fashion. What could work is programmatically creating a dump file of your running process. You could either do this with P/Invoke to the dbghelp.dll routines or spawn a cdb.exe process to create the dump file. Once you have the file, you could open it up in a debugger for later analysis using SOS.dll with cdb.exe/windbg.exe, or even write a debugger script to dump the data you want (mostly) automatically.
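As an illustration of the P/Invoke route, here is a minimal wrapper around dbghelp.dll's MiniDumpWriteDump that dumps the current process; the dump type is hard-coded and error handling is omitted:

using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

static class SelfDump
{
    // MINIDUMP_TYPE value 2 == MiniDumpWithFullMemory
    const int MiniDumpWithFullMemory = 2;

    [DllImport("dbghelp.dll", SetLastError = true)]
    static extern bool MiniDumpWriteDump(
        IntPtr hProcess,
        uint processId,
        SafeHandle hFile,
        int dumpType,
        IntPtr exceptionParam,
        IntPtr userStreamParam,
        IntPtr callbackParam);

    public static void Write(string path)
    {
        using var process = Process.GetCurrentProcess();
        using var file = File.Create(path);

        // Dumping your own live process can capture inconsistent state;
        // doing it from a helper process (e.g. spawning cdb.exe) is more robust.
        MiniDumpWriteDump(
            process.Handle,
            (uint)process.Id,
            file.SafeFileHandle,
            MiniDumpWithFullMemory,
            IntPtr.Zero,
            IntPtr.Zero,
            IntPtr.Zero);
    }
}

The resulting .dmp file can then be opened in windbg/cdb with SOS.dll for later analysis, as described above.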
I believe some sort of logging framework would help you to do that...
Check out:
http://www.dotnetlogging.com/
At my workplace we use log4net which works pretty well for us.
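A minimal log4net sketch, assuming the log4net configuration lives in App.config and is loaded with XmlConfigurator (the class, logger and values are just examples):

using log4net;
using log4net.Config;

public class Importer
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(Importer));

    public static void Main()
    {
        // Reads the <log4net> section from App.config (classic configuration style).
        XmlConfigurator.Configure();

        int recordCount = 42; // example value you might want to record
        Log.DebugFormat("Snapshot: recordCount = {0}", recordCount);
        Log.Info("Processing finished");
    }
}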
So why do you want to dump out all the variables for later analysis? Have you considered writing your code test-first so that you can reduce your reliance on the debugger and have a suite of automated tests checking the values for you?
In the past I've used the YourKit .NET profiler to profile .NET applications.
While I've personally only ever used it to connect to running applications, the Snapshot documentation does state that they have a Profiler API that can be used to programmatically dump snapshots for later review.
Code wise this looks to be as simple as the following:
Controller c = new Controller();
String snapshotPath = c.CaptureSnapshot();
I believe you can then load the snapshot files into the YourKit GUI at a later date to review them.
I would not be surprised if some of the other popular profilers, like JetBrains dotTrace Performance and Red Gate's ANTS Performance Profiler, have similar programmatic APIs, but I couldn't quickly find obvious documentation on their websites (and I didn't want to watch their webinars to find out whether the feature exists!)
For this you can use WMemoryProfiler to
Get all objects in all appdomains as an object array
Create a memory dump of your own process
Serialize specific objects to disc
To make this happen you need Windbg, of course, but the API of WMemoryProfiler is fully managed and you can basically self-debug your process. The library takes care of the usual debugger oddities, since it wraps Windbg in a nice, accessible library.
The code below gets all instances of System.Threading.Thread into an object array. This way you can write a visualizer for your own application objects at runtime. The other overload simply gives you all objects in all AppDomains.
using (var debugger = new MdbEng())
{
    var dummy = new Thread(() => {});
    dummy.Name = "Dummy Thread";

    // Get all thread objects in all AppDomains
    var threads = debugger.GetObjects("System.Threading.Thread", true);
    foreach (Thread t in threads)
    {
        Console.WriteLine("Managed thread {0} has Name {1}", t.ManagedThreadId, t.Name);
    }

    GC.KeepAlive(dummy);
}
Since it is a wrapper around Windbg, you can also create a memory dump on the fly and later load a memory dump from your process to extract object data for visualization from the dump. Commercial memory profilers (e.g. Memory Profiler from SciTech) have employed this technique for years, but it is quite slow when you have a huge memory dump, since they also use Windbg as the dump analyzer.
You can try the IntelliTrace tool provided with the Ultimate edition of Visual Studio. It is what you describe - it records what is happening in your app and allows you to debug it without re-executing your program, with hovering over variables and all the other debug windows to help you.
You can use PostSharp. I found it very useful for recording timings and other debug information because of the environment the application was deployed in, and I instrumented/recorded many things with it.
But obviously you'll need to specify all the variables you need to record.
Check more details here.
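A rough sketch of what that could look like with an OnMethodBoundaryAspect that records argument values on entry; the attribute and target class are placeholders, not anything PostSharp ships:

using System;
using PostSharp.Aspects;
using PostSharp.Serialization;

[PSerializable] // older PostSharp versions use [Serializable] instead
public class RecordArgumentsAttribute : OnMethodBoundaryAspect
{
    public override void OnEntry(MethodExecutionArgs args)
    {
        // Dump the method name and its current argument values.
        Console.WriteLine("{0}({1})",
            args.Method.Name,
            string.Join(", ", args.Arguments.ToArray()));
    }
}

public class Analysis
{
    [RecordArguments]
    public void Process(int batchSize, string source)
    {
        // ... actual work ...
    }
}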