How to get sub array without cloning - c#

I know in C# we can always get the sub-array of a given array by using Array.Copy() method. However, this will consume more memory and processing time which is unnecessary in read-only situation. For example, I'm writing a heavy load network program which exchanges messages with other nodes in the cluster very frequently. The first 20 bytes of every message is the message header while the rest bytes make up the message body. Therefore, I will divide the received raw message into header byte array and body byte array in order to process them separately. However, this will obviously consume double memory and extra time. In C, we can easily use a pointer and assign offset to it to access different parts of the array.
For instance, in C language, if we have a char a[] = "ABCDEFGHIJKLMN", we can declare a char* ptr = a + 3 to represent the array DEFGHIJKLMN.
Is there a way to accomplish this in C#?

You might be interested in ArraySegments or unsafe.
ArraySegments delimits a section of a one-dimensional array.
Check ArraySegments in action
ArraySegments usage example:
int[] array = { 10, 20, 30 };
ArraySegment<int> segment = new ArraySegment<int>(array, 1, 2);
// The segment contains offset = 1, count = 2 and range = { 20, 30 }
Unsafe define an unsafe context in which pointers can be used.
Unsafe usage example:
int[] a = { 4, 5, 6, 7, 8 };
unsafe
{
fixed (int* c = a)
{
// use the pointer
}
}

First of all you must consider this as a premature optimization.
But you may use several ways to reduce memory consumption, if you sure you really need it:
1) You may use Flyweight pattern https://en.wikipedia.org/wiki/Flyweight_pattern to pool duplicated resources.
2) You may try to use unsafe directive and manual pointer management.
3) You may just switch to C for this functionality and just call native code from your C# program.
From my experience memory consumption for short-lived objects is not a big problem and I'd just write code with flyweight pattern and profile application afterwards.

Assuming you have a Message wrapper class in C#? Why not just add a property on it called header that returns the top 20 bytes.
You can easily accomplish this using skip and take suggested by Jonathon Reinhart above if you have the entire initial array in a memory array, but it sounds like you may have it in a network stream, which means the property might be a little more involved by doing a read of the initial 20 bytes from the the stream.
Something along the lines of:
class Message
{
private readonly Stream _stream;
private byte[] _inMemoryBytes;
public Message(Stream stream)
{
_stream = stream;
}
public IEnumerable<byte> Header
{
get
{
if (_inMemoryBytes.Length >= 20)
return _inMemoryBytes.Take(20);
_stream.Read(_inMemoryBytes, 0, 20);
return _inMemoryBytes.Take(20);
}
}
public IEnumerable<byte> FullMessage
{
get
{
// Read and return the whole message. You might want amend to data already read.
}
}
}

Related

Reading/writing struct containing a variable-size 2D array using C#

I am rewriting some VB code in C# and am struggling a bit with a binary write/read operation. I have got a structure (equivalent to VB UDT) which contains a header and a variable-size 2D array. The row and column count is stored in the header.
Possibly I should be using an object rather than a struct, but I think that is a different discussion and unless it is a really bad idea I would prefer to stick with the struct for now.
I have done this sort of thing using VB many times and it is accomplished in a few lines of code. What I would generally do is something like this...
To write:
1) Dump UDT to a binary file with a Put statement.
To read:
1) Get the row and column counts from the file header using Get statements
2) Dim UDT and size the array part using the column/row counts found above
3) Read file contents into variable using a Get statement
My C# version is currently something like this:
using (BinaryReader b = new BinaryReader(File.Open( [filename] ,FileMode.Open)))
{
// Read number of rows and columns...
byte[] buff = b.ReadBytes(Marshal.SizeOf(typeof(GridStructHeader))); // read byte array
GCHandle handle = GCHandle.Alloc(buff, GCHandleType.Pinned); // protect buffer from GC
//Marshal the bytes...
GridStructHeader h =
(GridStructHeader)Marshal.PtrToStructure(handle.AddrOfPinnedObject(),
typeof(GridStructHeader));
handle.Free(); // give control of the buffer back to the GC
// Create structure...
GridStruct grd = new GridStruct();
grd.Cell = new GridCell[h.NumCols, h.NumRows];
// Read grid into memory...
buff = b.ReadBytes(Marshal.SizeOf(grd)); // size buffer to match structure
handle = GCHandle.Alloc(buff, GCHandleType.Pinned);
grd = (GridStruct)Marshal.PtrToStructure(handle.AddrOfPinnedObject(),
typeof(GridStruct));
handle.Free();
}
GridStruct is a struct containing the header and 2D data array. The relevant part of it looks like this:
[Serializable]
struct GridStruct
{
public int NumCols;
public int NumRows;
public GridCell[,] Cell;
}
GridStructHeader is a data type used for convenience here, and contains only the first part of the header. I am using this as a way of reading the row and column count (both Int data type). I initially tried to do this using ReadInt32() to read the values individually, but this wasn't working and I put it down to Serialization.
GridStructHeader contains the row and column count (and some other fields, omitted here for simplicity)
[Serializable]
struct GridStructHeader
{
public int NumCols;
public int NumRows;
}
GridCell is a struct containing the contents of each cell in the grid (AKA matrix AKA 2D array).
When I run the above I am getting an AccessViolation exception at the line
grd = (GridStruct)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(GridStruct));
My methodology is not quite right, I think. If anyone can advise how best to do this it would be appreciated. It doesn't have to be super-fast, particularly. Clarity and brevity of code are of higher priority.

How to truncate an array in place in C#

I mean is it really possible? MSDN says that arrays are fixed-size and the only way to resize is "copy-to-new-place". But maybe it is possible with unsafe/some magic with internal CLR structures, they all are written in C++ where we have a full memory control and can call realloc and so on.
I have no code provided for this question, because I don't even know if it can exist.
I'm not talking about Array.Resize methods and so on, because they obviosly do not have needed behaviour.
Assume that we have a standard x86 process with 2GB ram, and I have 1.9GB filled by single array. Then I want to release half of it. So I want to write something like:
MagicClass.ResizeArray(ref arr, n)
And do not get OutOfMemoryException. Array.Resize will try to allocate another gigabyte of RAM and will fail with 1.9+1 > 2GB OutOfMemory.
You can try Array.Resize():
int[] myArray = new int[] { 1, 2, 3, 4 };
int myNewSize = 1;
Array.Resize(ref myArray, myNewSize);
// Test: 1
Console.Write(myArray.Length);
realloc will attempt to do the inplace resize - but it reserves the right to copy the whole thing elsewhere and return a pointer that's completely different.
Pretty much the same outward behaviour is exposed by .NET's List<T> class - which you should be using anyway if you find yourself changing array sizes often. It hides the actual array reference from you so that the change is propagated throughout all of the references to the same list. As you remove items from the end, only the length of the list changes while the inner array stays the same - avoiding the copying.
It doesn't release the memory (you can always do that explicitly with Capacity = XXX, but that makes a new copy of the array), but then again, unless you're working with large arrays, neither does realloc - and if you're working with large arrays, yada, yada - we've been there :)
realloc doesn't really make sense in the kind of memory model .NET has anyway - the heap is continously collected and compacted over time. So if you're trying to use it to avoid the copies when just trimming an array, while also keeping memory usage low... don't bother. At the next heap compaction, the whole memory above your array is going to be moved to fill in the blanks. Even if it were possible to do the realloc, the only benefit you have over simply copying the array is that you would keep your array in the old-living heap - and that isn't necessarily what you want anyway.
Neither array type in BCL supports what you want. That being said - you can implement your own type that would support what you need. It can be backed by standard array, but would implement own Length and indexer properties, that would 'hide' portion of array from you.
public class MyTruncatableArray<T>
{
private T[] _array;
private int _length;
public MyTruncatableArray(int size)
{
_array = new T[size];
_length = size;
}
public T this[int index]
{
get
{
CheckIndex(index, _length);
return _array[index];
}
set
{
CheckIndex(index, _length);
_array[index] = value;
}
}
public int Length
{
get { return _length; }
set
{
CheckIndex(value);
_length = value;
}
}
private void CheckIndex(int index)
{
this.CheckIndex(index, _array.Length);
}
private void CheckIndex(int index, int maxValue)
{
if (index < 0 || index > maxValue)
{
throw new ArgumentException("New array length must be positive and lower or equal to original size");
}
}
}
It really depend what exactly do need. (E.g. do you need to truncate just so that you can easier use it from your code. Or is perf/GC/memory consumption a concern? If the latter is the case - did you perform any measurements that proves standard Array.Resize method unusable for your case?)

Mutable struct vs. class?

I'm unsure about whether to use a mutable struct or a mutable class.
My program stores an array with a lot of objects.
I've noticed that using a class doubles the amount of memory needed. However, I want the objects to be mutable, and I've been told that using mutable structs is evil.
This is what my type looks like:
struct /* or class */ Block
{
public byte ID;
public bool HasMetaData; // not sure whether HasMetaData == false or
// MetaData == null is faster, might remove this
public BlockMetaData MetaData; // BlockMetaData is always a class reference
}
Allocating a large amount of objects like this (notice that both codes below are run 81 times):
// struct
Block[,,] blocks = new Block[16, 256, 16];
uses about 35 MiB of memory, whilst doing it like this:
// class
Block[,,] blocks = new Block[16, 256, 16];
for (int z = 0; z < 16; z++)
for (int y = 0; y < 256; y++)
for (int x = 0; x < 16; x++)
blocks[x, y, z] = new Block();
uses about 100 MiB of ram.
So to conclude, my question is as follows:
Should I use a struct or a class for my Block type? Instances should be mutable and store a few values plus one object reference.
First off, if you really want to save memory then don't be using a struct or a class.
byte[,,] blockTypes = new byte[16, 256, 16];
BlockMetaData[,,] blockMetadata = new BlockMetaData[16, 256, 16];
You want to tightly pack similar things together in memory. You never want to put a byte next to a reference in a struct if you can possibly avoid it; such a struct will waste three to seven bytes automatically. References have to be word-aligned in .NET.
Second, I'm assuming that you're building a voxel system here. There might be a better way to represent the voxels than a 3-d array, depending on their distribution. If you are going to be making a truly enormous number of these things then store them in an immutable octree. By using the persistence properties of the immutable octree you can make cubic structures with quadrillions of voxels in them so long as the universe you are representing is "clumpy". That is, there are large regions of similarity throughout the world. You trade somewhat larger O(lg n) time for accessing and changing elements, but you get to have way, way more elements to work with.
Third, "ID" is a really bad way to represent the concept of "type". When I see "ID" I assume that the number uniquely identifies the element, not describes it. Consider changing the name to something less confusing.
Fourth, how many of the elements have metadata? You can probably do far better than an array of references if the number of elements with metadata is small compared to the total number of elements. Consider a sparse array approach; sparse arrays are much more space efficient.
Do they really have to be mutable? You could always make it an immutable struct with methods to create a new value with one field different:
struct Block
{
// I'd definitely get rid of the HasMetaData
private readonly byte id;
private readonly BlockMetaData metaData;
public Block(byte id, BlockMetaData metaData)
{
this.id = id;
this.metaData = metaData;
}
public byte Id { get { return id; } }
public BlockMetaData MetaData { get { return metaData; } }
public Block WithId(byte newId)
{
return new Block(newId, metaData);
}
public Block WithMetaData(BlockMetaData newMetaData)
{
return new Block(id, newMetaData);
}
}
I'm still not sure whether I'd make it a struct, to be honest - but I'd try to make it immutable either way, I suspect.
What are your performance requirements in terms of both memory and speed? How close does an immutable class come to those requirements?
An array of structs will offer better storage efficiency than an array of immutable references to distinct class instances having the same fields, because the latter will require all of the memory required by the former, in addition to memory to manage the class instances and memory required to hold the references. All that having been said, your struct as designed has a very inefficient layout. If you're really concerned about space, and every item in fact needs to independently store a byte, a Boolean, and a class reference, your best bet may be to either have two arrays of byte (a byte is actually smaller than a Boolean) and an array of class references, or else have an array of bytes, an array with 1/32 as many elements of something like BitVector32, and an array of class references.

Using enums to index a bit array

The application is a windows-based C# user interface for an embedded control and monitoring system.
The embedded system (written in C) maintains a table of faults which it sends to the c# application. The fault table contains one bit for each fault, stored in structure. Now from the point of view of the user interface the table is somewhat sparse - There are only a few of the fault bits that we are interested in, so the structure can be represented as below:
typedef struct
{
BYTE FaultsIAmNotInterestedIn0[14];
BYTE PowerSupplyFaults[4];
BYTE FaultsIAmNotInterestedIn1[5];
BYTE MachineryFaults[2];
BYTE FaultsIAmNotInterestedIn2[5];
BYTE CommunicationFaults[4];
}FAULT_TABLE;
Now, I want to be able to index each fault bit that I am interested in. In C I would use an enumeration to do this:
typedef enum
{
FF_PSU1 = offsetof(FAULT_TABLE,PowerSupplyFaults)*8,
FF_PSU2,
FF_PSU3,
FF_PSU4,
FF_PSU5,
FF_PSU6,
FF_PSU7,
FF_PSU8,
FF_PSU9,
FF_PSU10,
FF_PSU11,
FF_PSU12,
FF_PSU13,
FF_MACHINERY1 = offsetof(FAULT_TABLE,MachineryFaults)*8,
FF_MACHINERY2,
FF_MACHINERY3,
FF_MACHINERY4,
FF_MACHINERY5,
FF_MACHINERY6,
FF_MACHINERY7,
FF_MACHINERY8,
FF_MACHINERY9,
FF_MACHINERY10,
FF_COMMS1 = offsetof(FAULT_TABLE,CommunicationFaults)*8,
FF_COMMS2,
FF_COMMS3,
FF_COMMS4,
FF_COMMS5,
FF_COMMS6,
FF_COMMS7,
FF_COMMS8,
FF_COMMS9,
FF_COMMS10,
FF_COMMS11,
FF_COMMS12,
FF_COMMS13
}FAULT_FLAGS;
Is there a way that I can create a similar enumeration, based on a data structure in C#?
Add a Flags attribute to your enum. See http://msdn.microsoft.com/query/dev10.query?appId=Dev10IDEF1&l=EN-US&k=k(SYSTEM.FLAGSATTRIBUTE);k(TargetFrameworkMoniker-%22.NETFRAMEWORK%2cVERSION%3dV4.0%22);k(DevLang-CSHARP)&rd=true
You can get the offset of an element of a struct with Marshal.OffsetOf
http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.marshal.offsetof.aspx
Your struct can be created in pretty much the same way although you need to ensure you specify the layout (although I suppose this depends on how the struct is instantiated).
http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.structlayoutattribute(v=vs.71).aspx
However... the problem will come with the Enum. Enum values need to be compile time constants and Marshal.OffsetOf isn't. So you may have to store the offsets as static members of a class instead.
I also think you'll probably have to stray into unsafe code to make any real use of these offsets once you have them - and there is probably a better way to approach whatever you want to do in managed code.
You can create an enum to map the bits/flags, but you cannot map more than 32 bits.
In your example, I would create several const of long (64 bits). That won't solve the bit limitation when your array is very long (e.g. the first one).
Another trick is creating a static function that gives you the value of the bit (as a bool), upon the bit position in the array. In this case, the position (being a constant) is surely within the int range.
public static bool GetBit(byte[] buffer, int pos)
{
return (buffer[pos >> 3] & (1 << (pos & 7)) != 0);
}
Hope it helps.
Create an enum that specifies the bit-offset within each set of bytes:
enum PowerSupplyFaults
{
PowerSupplyFault1, PowerSupplyFault2, PowerSupplyFault3
}
PowerSupplyFault1 will be auto-assigned 0, Fault2 as 1, etc.
Assuming you receive a struct that mirrors the C version:
struct FaultTable
{
// ...
public byte[] PowerSupplyFaults;
public byte[] MachineFaults;
// ...
}
You can test for a fault by feeding the bytes into a BitArray and testing by index:
FaultTable faults = GetFaultTable();
BitArray psu_faults = new BitArray( fault_table.PowerSupplyFaults );
if ( psu_faults[ (int)PowerSupplyFaults.PowerSupplyFault3 ] )
{
// ...
}
Alternatively, you can walk all the bits and see which are set:
for ( int i = 0; i < psu_faults.Length; i++ )
{
if ( psu_faults[ i ] )
{
Console.WriteLine( "PSU fault: {0}", (PowerSupplyFaults)i );
}
}

How to optimize memory usage in this algorithm?

I'm developing a log parser, and I'm reading files of strings of more than 150MB.- This is my approach, Is there any way to optimize what is in the While statement? The problem is that is consuming a lot of memory.- I also tried with a stringbuilder facing the same memory comsuption.-
private void ReadLogInThread()
{
string lineOfLog = string.Empty;
try
{
StreamReader logFile = new StreamReader(myLog.logFileLocation);
InformationUnit infoUnit = new InformationUnit();
infoUnit.LogCompleteSize = myLog.logFileSize;
while ((lineOfLog = logFile.ReadLine()) != null)
{
myLog.transformedLog.Add(lineOfLog); //list<string>
myLog.logNumberLines++;
infoUnit.CurrentNumberOfLine = myLog.logNumberLines;
infoUnit.CurrentLine = lineOfLog;
infoUnit.CurrentSizeRead += lineOfLog.Length;
if (onLineRead != null)
onLineRead(infoUnit);
}
}
catch { throw; }
}
Thanks in advance!
EXTRA:
Im saving each line because after reading the log I will need to check for some information on every stored line.- The language is C#
Memory economy can be achieved if your log lines are actually can be parsed to a data row representation.
Here is a typical log line i can think of:
Event at: 2019/01/05:0:24:32.435, Reason: Operation, Kind: DataStoreOperation, Operation Status: Success
This line takes 200 bytes in memory.
At the same time, following representation just takes belo 16 bytes:
Enum LogReason { Operation, Error, Warning };
Enum EventKind short { DataStoreOperation, DataReadOperation };
Enum OperationStatus short { Success, Failed };
LogRow
{
DateTime EventTime;
LogReason Reason;
EventKind Kind;
OperationStatus Status;
}
Another optimization possibility is just parsing a line to array of string tokens,
this way you could make use of string interning.
For example, if a word "DataStoreOperation" takes 36 bytes, and if it has 1000000 entiries in the file, the economy is (18*2 - 4) * 1000000 = 32 000 000 bytes.
Try to make your algorithm sequential.
Using an IEnumerable instead of a List helps playing nice with memory, while keeping same semantic as working with a list, if you don't need random access to lines by index in the list.
IEnumerable<string> ReadLines()
{
// ...
while ((lineOfLog = logFile.ReadLine()) != null)
{
yield return lineOfLog;
}
}
//...
foreach( var line in ReadLines() )
{
ProcessLine(line);
}
I am not sure if it will fit your project but you can store the result in StringBuilder instead of strings list.
For example, this process on my machine takes 250MB memory after loading (file is 50MB):
static void Main(string[] args)
{
using (StreamReader streamReader = File.OpenText("file.txt"))
{
var list = new List<string>();
string line;
while (( line=streamReader.ReadLine())!=null)
{
list.Add(line);
}
}
}
On the other hand, this code process will take only 100MB:
static void Main(string[] args)
{
var stringBuilder = new StringBuilder();
using (StreamReader streamReader = File.OpenText("file.txt"))
{
string line;
while (( line=streamReader.ReadLine())!=null)
{
stringBuilder.AppendLine(line);
}
}
}
Memory usage keeps going up because you're simply adding them to a List<string>, constantly growing. If you want to use less memory one thing you can do is to write the data to disk, rather than keeping it in scope. Of course, this will greatly cause speed to degrade.
Another option is to compress the string data as you're storing it to your list, and decompress it coming out but I don't think this is a good method.
Side Note:
You need to add a using block around your streamreader.
using (StreamReader logFile = new StreamReader(myLog.logFileLocation))
Consider this implementation: (I'm speaking c/c++, substitute c# as needed)
Use fseek/ftell to find the size of the file.
Use malloc to allocate a chunk of memory the size of the file + 1;
Set that last byte to '\0' to terminate the string.
Use fread to read the entire file into the memory buffer.
You now have char * which holds the contents of the file as a
string.
Create a vector of const char * to hold pointers to the positions
in memory where each line can be found. Initialize the first element
of the vector to the first byte of the memory buffer.
Find the carriage control characters (probably \r\n) Replace the
\r by \0 to make the line a string. Increment past the \n.
This new pointer location is pushed back onto the vector.
Repeat the above until all of the lines in the file have been NUL
terminated, and are pointed to by elements in the vector.
Iterate though the vector as needed to investigate the contents of
each line, in your business specific way.
When you are done, close the file, free the memory, and continue
happily along your way.
1) Compress the strings before you store them (ie see System.IO.Compression and GZipStream). This would probably kill the performance of your program though since you'd have to uncompress to read each line.
2) Remove any extra white space characters or common words you can do without. ie if you can understand what the log is saying with the words "the, a, of...", remove them. Also, shorten any common words (ie change "error" to "err" and "warning" to "wrn"). This would slow down this step in the process but shouldn't affect performance of the rest.
What encoding is your original file? If it is ascii then just the strings alone are going to take over 2x the size of the file just to load up into your array. A C# character is 2 bytes and a C# string adds an extra 20 bytes per string in addition to the characters.
In your case, since it is a log file, you can probably exploit the fact that there is a lot of repetition in the the messages. You most likely can parse the incoming line into a data structure which reduces the memory overhead. For example, if you have a timestamp in the log file you can convert that to a DateTime value which is 8 bytes. Even a short timestamp of 1/1/10 would add 12 bytes to the size of a string, and a timestamp with time information would be even longer. Other tokens in your log stream might be able to be turned into a code or an enum in a similar manner.
Even if you have the leave the value as a string, if you can break it down into pieces that are used a lot, or remove boilerplate that is not needed at all you can probably cut down on your memory usage. If there are a lot of common strings you can Intern them and only pay for 1 string no matter how many you have.
If you must store the raw data, and assuming that your logs are mostly ASCII, then you can save some memory by storing UTF8 bytes internally. Strings are UTF16 internally, so you're storing an extra byte for each character. So by switching to UTF8 you're cutting memory use by half (not counting class overhead, which is still significant). Then you can convert back to normal strings as needed.
static void Main(string[] args)
{
List<Byte[]> strings = new List<byte[]>();
using (TextReader tr = new StreamReader(#"C:\test.log"))
{
string s = tr.ReadLine();
while (s != null)
{
strings.Add(Encoding.Convert(Encoding.Unicode, Encoding.UTF8, Encoding.Unicode.GetBytes(s)));
s = tr.ReadLine();
}
}
// Get strings back
foreach( var str in strings)
{
Console.WriteLine(Encoding.UTF8.GetString(str));
}
}

Categories

Resources