OutOfMemoryException on LOH - C#

I have a simple C# application that tries to read and write a single row to a SQL Server database.
That row contains an Image field that I fill with roughly 100 MB of data, or a little more.
Every time, I get a System.OutOfMemoryException.
So, instead of reading and writing the byte array in a single instruction, I would like to do it in several instructions, appending the pieces together.
Is it possible to accomplish that through SqlDataAdapter?
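SqlDataAdapter.Fill materializes the whole row, including the complete byte array, so the usual workaround is to bypass the adapter for that column and stream it instead. Below is a minimal sketch, not a definitive answer; the table name Pictures, column ImageData, and key Id are placeholders. It reads the blob in 1 MB chunks using CommandBehavior.SequentialAccess so the full 100 MB value is never held in managed memory at once:

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

class BlobReader
{
    // Streams the image/varbinary column of a single row to a file in 1 MB chunks.
    public static void ReadImageToFile(string connectionString, int id, string outputPath)
    {
        const int chunkSize = 1024 * 1024; // 1 MB per read
        var buffer = new byte[chunkSize];

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT ImageData FROM Pictures WHERE Id = @Id", connection)) // placeholder names
        {
            command.Parameters.AddWithValue("@Id", id);
            connection.Open();

            // SequentialAccess tells the reader to stream the column
            // instead of buffering the whole value up front.
            using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
            using (var output = File.Create(outputPath))
            {
                if (reader.Read())
                {
                    long offset = 0;
                    long bytesRead;
                    while ((bytesRead = reader.GetBytes(0, offset, buffer, 0, chunkSize)) > 0)
                    {
                        output.Write(buffer, 0, (int)bytesRead);
                        offset += bytesRead;
                    }
                }
            }
        }
    }
}

Writing can likewise be done in pieces with the T-SQL UPDATE ... .WRITE clause, provided the column can be changed from image to varbinary(max); the image type itself is deprecated.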

Related

C# - Sort File.ReadLines IEnumerable without Memory Overhead?

Is this possible?
I have the following code to reduce the total amount of memory usage:
File.WriteAllLines(
    Path.Combine(Path.GetDirectoryName(file[0]), "(Sort A-Z) " + Path.GetFileName(file[0])),
    File.ReadLines(file[0]).OrderBy(s => s)
);
(file[0] is the input file path).
This avoids foreach loops etc., reducing CPU usage as well as memory usage (barely).
It's also faster than using a foreach.
The issue, however, is that the .OrderBy(s => s) causes it to load the entire thing into memory. It's not as bad as loading the whole file into memory up front, but memory usage still rises quite a bit. (I'm using an 80 MB file.)
Is there some way to order the IEnumerable A->Z when saving to a file without using much memory?
I know this sounds vague and that I'm unsure of what I'm looking for, because I don't know myself.
Running with .OrderBy(s=>s) on a 2.7 million line file:
https://i.imgur.com/rUyDeFJ.gifv
Running WITHOUT .OrderBy(s=>s) on a 2.7 million line file:
https://i.imgur.com/Ejbnuty.gifv
(You can see it finish)
It is necessary for .OrderBy to load the entire contents into memory. It would be impossible for it to work any other way.
OrderBy receives an IEnumerable, so it receives items one at a time. However, consider the scenario where the very last row needs to be sorted before the very first row. This could only be achieved if the last row and the first row were both in memory at the same time. Or consider the scenario where the entire input is already sorted in reverse order. Hopefully these examples show why it is necessary for OrderBy to load the entire contents into memory.
Algorithms exist to partition data sets into individual partitions, on disk, then merge those partitions. However, they are beyond the scope of the Linq OrderBy function.
Internally OrderBy reads everything into a buffer array then performs a quicksort over it. If you're feeling brave, refer to the reference source:
https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,2530
(It's scattered throughout this file, but lines 2534-2542 best illustrate this)
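The partition-and-merge approach mentioned above (an external merge sort) can be written by hand outside of LINQ. The following is a rough, illustrative sketch rather than a tuned implementation: the chunk size of 500,000 lines and ordinal string comparison are assumptions. It sorts fixed-size chunks in memory, spills each to a temp file, and then k-way merges the runs into the output file:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ExternalSort
{
    // Sorts a large text file line by line without holding it all in memory:
    // 1) read fixed-size chunks, sort each in memory, spill to temp files;
    // 2) k-way merge the sorted runs into the output file.
    public static void Sort(string inputPath, string outputPath, int linesPerChunk = 500_000)
    {
        var chunkFiles = new List<string>();

        // Phase 1: split into sorted runs.
        foreach (var chunk in ReadChunks(inputPath, linesPerChunk))
        {
            chunk.Sort(StringComparer.Ordinal);
            var tempFile = Path.GetTempFileName();
            File.WriteAllLines(tempFile, chunk);
            chunkFiles.Add(tempFile);
        }

        // Phase 2: merge the sorted runs.
        var readers = chunkFiles.Select(f => File.ReadLines(f).GetEnumerator()).ToList();
        try
        {
            using (var output = new StreamWriter(outputPath))
            {
                // Keep the current head line of every run; always emit the smallest.
                var heads = new List<(string Line, int Source)>();
                for (int i = 0; i < readers.Count; i++)
                    if (readers[i].MoveNext())
                        heads.Add((readers[i].Current, i));

                while (heads.Count > 0)
                {
                    int min = 0;
                    for (int i = 1; i < heads.Count; i++)
                        if (StringComparer.Ordinal.Compare(heads[i].Line, heads[min].Line) < 0)
                            min = i;

                    output.WriteLine(heads[min].Line);
                    int source = heads[min].Source;
                    if (readers[source].MoveNext())
                        heads[min] = (readers[source].Current, source);
                    else
                        heads.RemoveAt(min);
                }
            }
        }
        finally
        {
            foreach (var r in readers) r.Dispose();
            foreach (var f in chunkFiles) File.Delete(f);
        }
    }

    private static IEnumerable<List<string>> ReadChunks(string path, int linesPerChunk)
    {
        var chunk = new List<string>(linesPerChunk);
        foreach (var line in File.ReadLines(path))
        {
            chunk.Add(line);
            if (chunk.Count == linesPerChunk)
            {
                yield return chunk;
                chunk = new List<string>(linesPerChunk);
            }
        }
        if (chunk.Count > 0) yield return chunk;
    }
}

Peak memory is then bounded by the chunk size plus one line per run during the merge, at the cost of writing the data to disk twice.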

Working with large array on disk

I need to work with a large 2-dimensional array of doubles, with more than 100 million cells. The matrix first needs to be filled and then manipulated by taking either one row or one column. The matrix can be bigger than 1 terabyte in size and will not fit in memory.
How can the array be stored efficiently? The main operations are quickly saving it from memory row by row (double[100k] each) and quickly reading one row or one column back into memory.
You could use memory-mapped files. You are essentially still working with an array, but you allow the kernel to choose which parts to load into memory. You could also possibly use fixed-size buffers to read whole sections of the memory-mapped file.
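As a rough sketch of that suggestion (the row-major layout, the dimensions, and the file path are assumptions, and mapping a view this large requires a 64-bit process), one row of doubles can be written or read at a time through a MemoryMappedViewAccessor:

using System;
using System.IO.MemoryMappedFiles;

class DiskMatrix : IDisposable
{
    private readonly MemoryMappedFile _file;
    private readonly MemoryMappedViewAccessor _accessor;
    private readonly long _rows;
    private readonly long _cols;

    // Row-major layout: element (r, c) lives at byte offset (r * cols + c) * sizeof(double).
    public DiskMatrix(string path, long rows, long cols)
    {
        _rows = rows;
        _cols = cols;
        long capacity = rows * cols * sizeof(double);
        _file = MemoryMappedFile.CreateFromFile(
            path, System.IO.FileMode.OpenOrCreate, null, capacity);
        _accessor = _file.CreateViewAccessor(0, capacity);
    }

    // Fast: a row is one contiguous block on disk.
    public void WriteRow(long row, double[] values)
    {
        _accessor.WriteArray(row * _cols * sizeof(double), values, 0, values.Length);
    }

    public double[] ReadRow(long row)
    {
        var values = new double[_cols];
        _accessor.ReadArray(row * _cols * sizeof(double), values, 0, values.Length);
        return values;
    }

    // Slower: a column is strided, one element per row.
    public double[] ReadColumn(long col)
    {
        var values = new double[_rows];
        for (long r = 0; r < _rows; r++)
            values[r] = _accessor.ReadDouble((r * _cols + col) * sizeof(double));
        return values;
    }

    public void Dispose()
    {
        _accessor.Dispose();
        _file.Dispose();
    }
}

Because rows are contiguous and columns are strided, row access stays fast while column access touches many pages; if column reads dominate, storing the matrix column-major (or keeping both layouts) may be worth the extra disk space.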

Reading Range of Lines from a File

Can anyone let me know the fastest way of showing a range of lines from a file of 5 GB? For example: the file has the line number as one of its columns and contains about 1 million lines, and I have a start line number and an end line number. Say I want to read the 25th line to the 89th line of the large file: rather than reading each and every line, is there any fast way of reading just lines 25 to 89 without reading the whole file from the beginning, in C#?
In short, no. How can you possibly know where the carriage returns/line numbers are before you actually read them?
To avoid memory issues you could:
File.ReadLines(path)
.SkipWhile(line=>someCondition)
.TakeWhile(line=>someOtherCondition)
5GB is a huge amount of data to sift through without building some sort of index. I think you've stumbled upon a case where loading your data into a database and adding the appropriate indexes might serve you best.
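If the range is given purely by line numbers, the same streaming idea can be written with Skip/Take. This is only a sketch (the path is a placeholder, and start/end are 1-based and inclusive); the earlier lines are still read and discarded, they just never accumulate in memory:

using System;
using System.IO;
using System.Linq;

class LineRange
{
    static void Main()
    {
        const string path = @"C:\data\big.txt"; // placeholder path
        const int start = 25, end = 89;

        // Streams the file and yields only lines start..end.
        // Without an index there is no way to jump straight to line 25.
        foreach (var line in File.ReadLines(path).Skip(start - 1).Take(end - start + 1))
        {
            Console.WriteLine(line);
        }
    }
}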

data structure for indexing big file

I need to build an index for a very big (50GB+) ASCII text file which will enable me to provide fast random read access to the file (get nth line, get nth word in nth line). I've decided to use List<List<long>> map, where the map[i][j] element is the position of the jth word of the ith line in the file.
I will build the index sequentially, i.e. read the whole file and populating index with map.Add(new List<long>()) (new line) and map[i].Add(position) (new word). I will then retrieve specific word position with map[i][j].
The only problem I see is that I can't predict the total count of lines/words, so I will hit the O(n) cost of every List reallocation, and I have no idea how I can avoid this.
Are there any other problems with the data structure I chose for the task? Which structure could be better?
UPD: File will not be altered during the runtime. There are no other ways to retrieve content except what I've listed.
Increasing the size of a large list is a very expensive operation, so it's better to reserve the list's capacity at the beginning.
I'd suggest using two lists. The first contains the offsets of words within the file, and the second contains indexes into the first list (the index of the first word of the corresponding line).
You are very likely to exceed all available RAM, and when the system starts to page GC-managed memory in and out, the performance of the program will be completely killed. I'd suggest storing your data in a memory-mapped file rather than in managed memory. http://msdn.microsoft.com/en-us/library/dd997372.aspx
UPD: Memory-mapped files are effective when you need to work with huge amounts of data that don't fit in RAM. Basically, it's your only choice if your index becomes bigger than the available RAM.
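A small sketch of the two-list layout suggested above, assuming an ASCII file with space/tab word separators as in the question: one flat list of word byte offsets, and a second list holding, for each line, the index of its first word in the first list.

using System.Collections.Generic;
using System.IO;

class WordIndex
{
    // wordOffsets[k] = byte offset of the k-th word in the file (all lines, flattened)
    // lineStarts[i]  = index into wordOffsets of the first word of line i
    private readonly List<long> wordOffsets = new List<long>();
    private readonly List<int> lineStarts = new List<int>();

    public void Build(string path)
    {
        // BufferedStream keeps the byte-by-byte loop tolerable; this is for clarity, not speed.
        using (var stream = new BufferedStream(File.OpenRead(path)))
        {
            long position = 0;
            bool inWord = false;
            int b;
            lineStarts.Add(0);
            while ((b = stream.ReadByte()) != -1)
            {
                char c = (char)b;
                if (c == '\n')
                {
                    lineStarts.Add(wordOffsets.Count); // the next line starts at this word index
                    inWord = false;
                }
                else if (c == ' ' || c == '\t' || c == '\r')
                {
                    inWord = false;
                }
                else if (!inWord)
                {
                    wordOffsets.Add(position); // first byte of a new word
                    inWord = true;
                }
                position++;
            }
        }
    }

    // Byte offset of word j (0-based) on line i (0-based).
    public long PositionOf(int line, int word)
    {
        return wordOffsets[lineStarts[line] + word];
    }
}

Compared to List<List<long>>, this keeps one large allocation per list instead of one per line, which also makes it easier to spill the two arrays to a memory-mapped file later.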

How to read/write a specific number of bytes to file

I am looking to create a file by structuring it in fixed-size blocks. Essentially, I am looking to create a rudimentary file system.
I need to write a header, and then a potentially "infinite" number of entries of the same size/structure. The important parts are:
Each block of data needs to be read/writable individually
Header needs to be readable/writable as its own entity
Need a way to store this data and be able to determine its location in the file quickly
I would imagine the file would resemble something like:
[HEADER][DATA1][DATA2][DATA3][...]
What is the proper way to handle something like this? Let's say I want to read DATA3 from the file: how do I know where that data chunk starts?
If I understand you correctly and you need a way to assign names/IDs to your DATA chunks, you can introduce yet another type of chunk.
Let's call it TOC (table of contents).
So, the file structure will look like [HEADER][TOC1][DATA1][DATA2][DATA3][TOC2][...].
A TOC chunk will contain names/IDs and references to multiple DATA chunks. It will also contain some internal data, such as a pointer to the next TOC chunk (so you might consider each TOC chunk a linked-list node).
At runtime, all TOC chunks could be represented as a kind of HashMap, where the key is the name/ID of a DATA chunk and the value is its location in the file.
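A rough sketch of that TOC idea; the on-disk layout used here (an entry count, a next-TOC pointer, then length-prefixed names with offset and length) is an assumption for illustration, not a required format:

using System.Collections.Generic;
using System.IO;

// Hypothetical entry: where a DATA chunk lives and how big it is.
class TocEntry
{
    public string Name;
    public long Offset;   // where the DATA chunk starts in the file
    public long Length;   // size of the DATA chunk in bytes
}

class TocIndex
{
    // Walks the linked list of TOC chunks once and builds the in-memory map.
    public static Dictionary<string, TocEntry> Load(BinaryReader reader, long firstTocOffset)
    {
        var map = new Dictionary<string, TocEntry>();
        long tocOffset = firstTocOffset;

        while (tocOffset != 0) // 0 = no further TOC chunk
        {
            reader.BaseStream.Seek(tocOffset, SeekOrigin.Begin);
            int entryCount = reader.ReadInt32();
            long nextToc = reader.ReadInt64();

            for (int i = 0; i < entryCount; i++)
            {
                var entry = new TocEntry
                {
                    Name = reader.ReadString(),   // length-prefixed string
                    Offset = reader.ReadInt64(),
                    Length = reader.ReadInt64()
                };
                map[entry.Name] = entry;
            }
            tocOffset = nextToc;
        }
        return map;
    }
}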
You can store the chunk size in the header. If the chunk sizes are variable, you can store pointers that point to the actual chunks. An interesting design for variable sizes is the Postgres heap file page: http://doxygen.postgresql.org/bufpage_8h_source.html
I am working in reverse but this may help.
I write decompilers for binary files. Generally there is a fixed header of a known number of bytes. This contains specific file identification so we can recognize the file type we are dealing with.
Following that will be a fixed number of bytes containing the number of sections (groups of data). This number then tells us how many data pointers there will be. Each data pointer may be four bytes (or whatever you need) representing the start of the data block. From this we can work out the size of each block. The decompiler then reads the blocks one at a time to get the size and location in the file of each data block. The job then is to extract that block of bytes and do whatever is needed.
We step through the file one block at a time. The size of the last block is the distance from its start pointer to the end of the file.
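For the fixed-size entries in the original question, locating a block needs no scanning at all: its offset is HeaderSize + index * BlockSize. A minimal sketch, with HeaderSize and BlockSize as assumed constants:

using System.IO;

class BlockFile
{
    private const int HeaderSize = 64;    // assumed fixed header length in bytes
    private const int BlockSize = 4096;   // assumed fixed size of every data entry

    // Reads block n (0-based) without touching the blocks before it.
    public static byte[] ReadBlock(string path, long index)
    {
        var buffer = new byte[BlockSize];
        using (var stream = File.OpenRead(path))
        {
            stream.Seek(HeaderSize + index * BlockSize, SeekOrigin.Begin);
            int read = 0;
            while (read < BlockSize)
            {
                int n = stream.Read(buffer, read, BlockSize - read);
                if (n == 0) break; // end of file
                read += n;
            }
        }
        return buffer;
    }

    // Writes block n in place; the rest of the file is untouched.
    public static void WriteBlock(string path, long index, byte[] data)
    {
        using (var stream = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            stream.Seek(HeaderSize + index * BlockSize, SeekOrigin.Begin);
            stream.Write(data, 0, data.Length);
        }
    }
}

So DATA3 (the fourth block, index 3) starts at byte HeaderSize + 3 * BlockSize; the header can be read or rewritten on its own by seeking to offset 0 and reading HeaderSize bytes.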
