I'm trying to take some measurements to see how much overhead the encryption/decryption process adds. I'm also comparing different approaches, such as using a FileStream or returning a MemoryStream (which I need in some cases).
It looks like large files are kept in memory (Gen 2 & LOH). How can I clear the heap completely? (I want to see the same Gen 2 results as in the FileStream approach.)
I'm using the using keyword, but that doesn't seem to help. I also reduced the default buffer size, as you can see in the code below, but I still get numbers in Gen 2.
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i9-10920X CPU 3.50GHz, 1 CPU, 24 logical and 12 physical cores
[Host] : .NET Framework 4.8 (4.8.4250.0), X86 LegacyJIT
DefaultJob : .NET Framework 4.8 (4.8.4250.0), X86 LegacyJIT
File Stream Results
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |----------:|----------:|----------:|----------:|---------:|------:|------------:|
| TXT300BYTES_Decrypt | 2.500 ms | 0.0444 ms | 0.0593 ms | 19.5313 | - | - | 105.11 KB |
| PDF500KB_Decrypt | 12.909 ms | 0.2561 ms | 0.4348 ms | 187.5000 | 15.6250 | - | 1019.59 KB |
| PDF1MB_Decrypt | 14.125 ms | 0.2790 ms | 0.4001 ms | 406.2500 | 15.6250 | - | 2149.96 KB |
| TIFF1MB_Decrypt | 10.087 ms | 0.1949 ms | 0.1728 ms | 437.5000 | 31.2500 | - | 2329.37 KB |
| TIFF5MB_Decrypt | 22.779 ms | 0.4316 ms | 0.4239 ms | 2000.0000 | 187.5000 | - | 10434.34 KB |
| TIFF10MB_Decrypt | 38.467 ms | 0.7382 ms | 0.8205 ms | 3857.1429 | 285.7143 | - | 20144.01 KB |
Memory Stream Results
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |----------:|----------:|----------:|----------:|----------:|---------:|------------:|
| TXT300BYTES_Decrypt | 1.673 ms | 0.0098 ms | 0.0092 ms | 27.3438 | 1.9531 | - | 147.69 KB |
| PDF500KB_Decrypt | 9.956 ms | 0.1407 ms | 0.1248 ms | 328.1250 | 328.1250 | 328.1250 | 2316.08 KB |
| PDF1MB_Decrypt | 11.998 ms | 0.0622 ms | 0.0486 ms | 921.8750 | 546.8750 | 531.2500 | 4737.8 KB |
| TIFF1MB_Decrypt | 9.252 ms | 0.0973 ms | 0.0910 ms | 953.1250 | 671.8750 | 500.0000 | 4902.34 KB |
| TIFF5MB_Decrypt | 24.220 ms | 0.1105 ms | 0.0980 ms | 2531.2500 | 718.7500 | 468.7500 | 20697.43 KB |
| TIFF10MB_Decrypt | 41.463 ms | 0.5678 ms | 0.5033 ms | 4833.3333 | 1500.0000 | 916.6667 | 40696.31 KB |
public static class Constants
{
public const int BufferSize = 40960; // Default is 81920
}
File Decrypt Method
public class DecryptionService
{
    public async Task<string> DecryptFileAsync(string sourcePath)
    {
        var tempFilePath = SecurityFileHelper.CreateTempFile();
        using var sourceStream = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read);
        var keyBytes = Convert.FromBase64String(_key);
        using var destinationStream = new FileStream(tempFilePath, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite);
        using var provider = new AesCryptoServiceProvider();

        // The IV is prepended to the encrypted file.
        var iv = new byte[provider.IV.Length];
        await sourceStream.ReadAsync(iv, 0, iv.Length);

        using var cryptoTransform = provider.CreateDecryptor(keyBytes, iv);
        using var cryptoStream = new CryptoStream(sourceStream, cryptoTransform, CryptoStreamMode.Read);
        await cryptoStream.CopyToAsync(destinationStream, Constants.BufferSize);
        return tempFilePath;
    }
}
Memory Decrypt Method
public class DecryptionService
{
    public async Task<Stream> DecryptStreamAsync(Stream sourceStream)
    {
        var memoryStream = new MemoryStream();
        if (sourceStream.Position != 0) sourceStream.Position = 0;
        var tempFilePath = SecurityFileHelper.CreateTempFile();
        try
        {
            var keyBytes = Convert.FromBase64String(_key);
            using var destinationStream = new FileStream(tempFilePath, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite);
            using var provider = new AesCryptoServiceProvider();

            // The IV is prepended to the encrypted stream.
            var iv = new byte[provider.IV.Length];
            await sourceStream.ReadAsync(iv, 0, iv.Length);

            using var cryptoTransform = provider.CreateDecryptor(keyBytes, iv);
            using var cryptoStream = new CryptoStream(sourceStream, cryptoTransform, CryptoStreamMode.Read);
            await cryptoStream.CopyToAsync(destinationStream, Constants.BufferSize);

            // Copy the decrypted temp file into the MemoryStream that is returned.
            destinationStream.Position = 0;
            await destinationStream.CopyToAsync(memoryStream, Constants.BufferSize);
            await memoryStream.FlushAsync();
            memoryStream.Position = 0;
        }
        finally
        {
            if (File.Exists(tempFilePath))
                File.Delete(tempFilePath);
        }
        return memoryStream;
    }
}
// Calling it like this
using var encryptedStream = File.OpenRead("some file path");
var svc = new DecryptionService();
using var decryptedStream = await svc.DecryptStreamAsync(encryptedStream);
By the way, I also added these lines:
decryptedStream.Position = 0;
decryptedStream.SetLength(0);
decryptedStream.Capacity = 0; // <<< this zeroes out the MemoryStream's internal buffer
And I still get these results:
| Method | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |----------:|----------:|----------:|----------:|----------:|----------:|---------:|------------:|
| TXT300BYTES_Decrypt | 1.659 ms | 0.0322 ms | 0.0301 ms | 1.662 ms | 27.3438 | 1.9531 | - | 148.03 KB |
| PDF500KB_Decrypt | 11.085 ms | 0.2829 ms | 0.8297 ms | 10.769 ms | 328.1250 | 328.1250 | 328.1250 | 2312.33 KB |
| PDF1MB_Decrypt | 12.479 ms | 0.2029 ms | 0.3859 ms | 12.402 ms | 906.2500 | 562.5000 | 531.2500 | 4734.61 KB |
| TIFF1MB_Decrypt | 9.352 ms | 0.0971 ms | 0.0861 ms | 9.359 ms | 953.1250 | 593.7500 | 500.0000 | 4908 KB |
| TIFF5MB_Decrypt | 24.760 ms | 0.4752 ms | 0.4213 ms | 24.607 ms | 2593.7500 | 843.7500 | 531.2500 | 20715.76 KB |
| TIFF10MB_Decrypt | 41.976 ms | 0.6657 ms | 0.5901 ms | 42.011 ms | 4833.3333 | 1500.0000 | 916.6667 | 40744.43 KB |
What did I miss?! :(
> I'm comparing different approaches like using FileStream or returning MemoryStream
> Looks like large files are kept in memory (Gen 2 & LOH). How can I clear the heap completely? (I want to see the same Gen 2 results as in the FileStream approach.)
I am not sure I understand what you mean by clearing the heap, or why you want to see the same Gen 2 results as in the FileStream approach.
The harness you are using (BenchmarkDotNet) enforces two full memory cleanups after every benchmark iteration, which ensures that every benchmarking iteration starts with a "clean heap". To ensure that the self-tuning nature of the GC (or anything else, such as memory leaks) does not affect other benchmarks, every benchmark is executed in a stand-alone process. Moreover, the number of collections is scaled per 1k operations (benchmark invocations). This allows for an apples-to-apples comparison of the GC metrics.
You are comparing two different approaches and most probably (this is a hypothesis that needs to be verified with a memory profiler) one of them allocates large objects and hence you get Gen 2 collections. The other does not.
It's a performance characteristic of a given solution, and you should simply take it into consideration when implementing the business logic. For example: if your service is supposed to be low latency and you can't afford the long GC pauses caused by Gen 2 collections, you should choose the approach that does not allocate large objects.
If you want to get rid of the Gen 2 collections, you can try pooling the memory by using:
- ArrayPool<T> for arrays
- RecyclableMemoryStream for memory streams
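A minimal sketch of the pooling idea with Microsoft.IO.RecyclableMemoryStream (the factory class and member names here are illustrative, not from the original post):

using System.IO;
using Microsoft.IO;

public static class PooledStreamFactory
{
    // One manager per application; it owns and reuses the pooled buffers.
    private static readonly RecyclableMemoryStreamManager StreamManager
        = new RecyclableMemoryStreamManager();

    public static MemoryStream GetPooledStream()
    {
        // Rents pooled blocks instead of allocating fresh (potentially LOH) arrays.
        return StreamManager.GetStream("decryption");
    }
}

Returning such a stream from DecryptStreamAsync and disposing it after use hands the buffers back to the pool, so repeated calls stop feeding Gen 2/LOH.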
I have the following code snippet that appends the current timestamp to a filename, and it works fine.
I just want to make sure this is the best way to append strings in C# 10; if not, how can we make the code below more efficient?
e.g. testfile.txt -> testfile_timestamp.txt
string[] strName = formFile.FileName.Split('.');
string updatedFilename = strName[0] + "_"
    + DateTime.Now.ToUniversalTime().ToString("THHmmssfff") + "."
    + strName[strName.Length - 1];
How about this:
// Just a stupid method name for demo, you'll find a better one :)
public static string DatifyFileName(string fileName)
{
// Use well tested Framework method to get filename without extension
var nameWithoutExtension = System.IO.Path.GetFileNameWithoutExtension(fileName);
// Use well tested Framework method to get extension
var extension = System.IO.Path.GetExtension(fileName);
// interpolate to get the desired output.
return $"{nameWithoutExtension}_{DateTime.Now.ToUniversalTime().ToString("THHmmssfff")}{extension}";
}
Or if you are familiar with Span<char>:
public static string DatifyFileName(ReadOnlySpan<char> fileName)
{
var lastDotIndex = fileName.LastIndexOf('.');
//Maybe : if( lastDotIndex < 0 ) throw ArgumentException("no extension found");
var nameWithoutExtension = fileName[..lastDotIndex];
var extension = fileName[lastDotIndex..];
return $"{nameWithoutExtension}_{DateTime.Now.ToUniversalTime().ToString("THHmmssfff")}{extension}";
}
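For a quick check, usage might look like this (the timestamp part varies with the current time, of course):

var updated = DatifyFileName("testfile.txt");
// e.g. "testfile_T223556123.txt"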
And just to give some fire to the discussion :D ...
BenchmarkDotNet=v0.13.3, OS=Windows 10 (10.0.19044.2364/21H2/November2021Update)
Intel Core i9-10885H CPU 2.40GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.101
[Host] : .NET 7.0.1 (7.0.122.56804), X64 RyuJIT AVX2
DefaultJob : .NET 7.0.1 (7.0.122.56804), X64 RyuJIT AVX2
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|----------------- |-----------:|---------:|---------:|------:|--------:|-------:|----------:|------------:|
| Interpolated | 906.7 ns | 16.92 ns | 16.61 ns | 1.08 | 0.02 | 0.0458 | 384 B | 1.66 |
| InterpolatedSpan | 842.0 ns | 13.06 ns | 12.22 ns | 1.00 | 0.00 | 0.0277 | 232 B | 1.00 |
| StringBuilder | 1,010.8 ns | 6.70 ns | 5.94 ns | 1.20 | 0.02 | 0.1068 | 904 B | 3.90 |
| Original | 960.0 ns | 18.68 ns | 19.19 ns | 1.14 | 0.03 | 0.0734 | 616 B | 2.66 |
// * Hints *
Outliers
Benchmark.StringBuilder: Default -> 1 outlier was removed (1.03 us)
Benchmark.Original: Default -> 2 outliers were removed (1.03 us, 1.06 us)
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Ratio : Mean of the ratio distribution ([Current]/[Baseline])
RatioSD : Standard deviation of the ratio distribution ([Current]/[Baseline])
Gen0 : GC Generation 0 collects per 1000 operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
Alloc Ratio : Allocated memory ratio distribution ([Current]/[Baseline])
1 ns : 1 Nanosecond (0.000000001 sec)
Strings are immutable: whenever you modify a string, a new one is created in memory. To avoid the intermediate allocations when building a string in several steps, you can use a StringBuilder. Sample code shown below:
string[] strName = formFile.FileName.Split('.');
StringBuilder updatedFilename = new StringBuilder();
updatedFilename.Append(strName[0]);
updatedFilename.Append("_");
updatedFilename.Append(DateTime.Now.ToUniversalTime().ToString("THHmmssfff"));
updatedFilename.Append(".");
updatedFilename.Append(strName[strName.Length - 1]);
// You can get this using `ToString()` method
string filename = updatedFilename.ToString();
I am trying to experiment with data partitioning in Apache Ignite using the .NET thin client.
The article https://apacheignite.readme.io/docs/cache-modes says that in partitioned mode "the overall data set is divided equally into partitions and all partitions are split equally between participating nodes". So I expect a cache with 1000 records to be split equally between 3 Ignite nodes: 333(4) on each node. But when I start 3 instances of Ignite on my local machine (using platforms\dotnet\bin\Apache.Ignite.exe) I can only see that the 1000 records are replicated on all nodes: the same 1000 records on each node. I tried disabling backup creation: 1000 records were created only on the first node. Currently I use sequential Int32 values as the cache key. But even when I ran LinqExample with collocated employees (with AffinityKey), the data wasn't split between nodes.
I also tried experimenting with the cache configuration:
var cacheConfiguration = new CacheClientConfiguration(_key, cacheEntity);
cacheConfiguration.CacheMode = CacheMode.Partitioned;
cacheConfiguration.Backups = 0;
cacheConfiguration.ReadFromBackup = false;
cacheConfiguration.RebalanceMode = CacheRebalanceMode.Sync;
cacheConfiguration.RebalanceOrder = 1;
cacheConfiguration.RebalanceThrottle = TimeSpan.FromSeconds(1);
cacheConfiguration.RebalanceDelay = TimeSpan.Zero;
cacheConfiguration.RebalanceBatchSize = 1024;
var cache = _igniteClient.CreateCache<int, TEntityType>(cacheConfiguration).WithExpiryPolicy(_expiryPolicy);
Perhaps I don't understand the main concept of data partitioning as implemented in Apache Ignite, or Apache.Ignite.exe needs some additional configuration to support data partitioning. Excuse me if this is a simple question.
PS. I am using DBeaver to check the number of records on each node. By default, 3 Ignite instances use ports 10800, 10801, and 10802 for .NET client requests, so I use the addresses localhost:10800, localhost:10801, and localhost:10802 in DBeaver.
To get the number of entries per node, use the included Visor tool:
1. Go to the Ignite bin directory
2. Start ignitevisorcmd
3. Use open to connect to the cluster
4. Type cache -a
The generated report will include a table with node specifics:
Nodes for: person-cache-1(#c0)
+====================================================================================================================+
| Node ID8(#), IP | CPUs | Heap Used | CPU Load | Up Time | Size (Primary / Backup) | Hi/Mi/Rd/Wr |
+====================================================================================================================+
| 1F81F615(#n0), 127.0.0.1 | 12 | 10.26 % | 0.00 % | 00:08:15.394 | Total: 513 (513 / 0) | Hi: 0 |
| | | | | | Heap: 0 (0 / <n/a>) | Mi: 0 |
| | | | | | Off-Heap: 513 (513 / 0) | Rd: 0 |
| | | | | | Off-Heap Memory: <n/a> | Wr: 0 |
+-----------------------------+------+-----------+----------+--------------+---------------------------+-------------+
| FBCF3B0F(#n1), 127.0.0.1 | 12 | 15.23 % | 0.00 % | 00:07:46.647 | Total: 486 (486 / 0) | Hi: 0 |
| | | | | | Heap: 0 (0 / <n/a>) | Mi: 0 |
| | | | | | Off-Heap: 486 (486 / 0) | Rd: 0 |
| | | | | | Off-Heap Memory: <n/a> | Wr: 0 |
+--------------------------------------------------------------------------------------------------------------------+
According to Size (Primary / ...), the 999 entries are split between the two nodes, with 513 entries on the first node and 486 on the second.
Check "Primary", not "Total": "Total" is the sum of primary entries and entries replicated from other nodes. The setup shown has no replication, so it is equal to "Primary".
If a cluster had three nodes and a cache had 2 backups (so 3 copies in total), then the "Total" number on each node would be close to the total number of entries (each node contains every entry, either as the primary node or as a backup for the two other nodes), "Primary" would be approximately 1/3 of the total, and "Backup" would be 2/3.
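You can also check these numbers programmatically. A minimal sketch with the thick-client API (Apache.Ignite.Core; the cache name is illustrative):

using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache;

using (var ignite = Ignition.Start())
{
    var cache = ignite.GetOrCreateCache<int, string>("person-cache-1");

    // Entries held by this node only, split by role:
    var primary = cache.GetLocalSize(CachePeekMode.Primary);
    var backup = cache.GetLocalSize(CachePeekMode.Backup);
    Console.WriteLine($"Primary: {primary}, Backup: {backup}");
}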
My question is more about algorithm design than about programming. I have 6 buildings in my dataset and a table with the distance from each building to each other building:
| From_Building_ID | To_Building_ID | Distance_Mile |
+------------------+----------------+---------------+
| 1368 | 10692 | 167.201 |
| 1368 | 10767 | 216.307 |
| 1368 | 6377 | 359.002 |
| 1368 | 10847 | 362.615 |
| 1368 | 10080 | 67.715 |
| 6377 | 10692 | 488.3 |
| 6377 | 1368 | 359.002 |
| 6377 | 10080 | 327.024 |
| 6377 | 10767 | 150.615 |
| 6377 | 10847 | 41.421 |
| 10080 | 10847 | 330.619 |
| 10080 | 6377 | 327.024 |
| 10080 | 10767 | 184.329 |
| 10080 | 10692 | 166.549 |
| 10080 | 1368 | 67.715 |
| 10692 | 1368 | 167.201 |
| 10692 | 10767 | 345.606 |
| 10692 | 6377 | 488.3 |
| 10692 | 10847 | 491.898 |
| 10692 | 10080 | 166.549 |
| 10767 | 1368 | 216.307 |
| 10767 | 10692 | 345.606 |
| 10767 | 10080 | 184.329 |
| 10767 | 10847 | 154.22 |
| 10767 | 6377 | 150.615 |
| 10847 | 6377 | 41.4211 |
| 10847 | 10692 | 491.898 |
| 10847 | 1368 | 362.615 |
| 10847 | 10080 | 330.619 |
| 10847 | 10767 | 154.22 |
+------------------+----------------+---------------+
My goal is to get a short table that includes each unique combination of buildings exactly once. If a combination of two buildings has already appeared, it should not appear again, so eventually I should end up with half the number of rows of the original set. I will then sum up the distances (for compensation purposes). The end result should look similar to this:
+------------------+----------------+---------------+
| From_Building_ID | To_Building_ID | Distance_Mile |
+------------------+----------------+---------------+
| 1368 | 10692 | 167.201 |
| 1368 | 10767 | 216.307 |
| 1368 | 6377 | 359.002 |
| 1368 | 10847 | 362.615 |
| 1368 | 10080 | 67.715 |
| 6377 | 10692 | 488.3 |
| 6377 | 10080 | 327.024 |
| 6377 | 10767 | 150.615 |
| 6377 | 10847 | 41.421 |
| 10080 | 10847 | 330.619 |
| 10080 | 10767 | 184.329 |
| 10080 | 10692 | 166.549 |
| 10692 | 10767 | 345.606 |
| 10692 | 10847 | 491.898 |
| 10767 | 10847 | 154.22 |
+------------------+----------------+---------------+
I created a class in C# with the appropriate properties:
class Distances
{
    public int FromBuildingID { get; set; }
    public int ToBuildingID { get; set; }
    public double Distance_Mile { get; set; }

    public Distances(int f, int t, double mile)
    {
        FromBuildingID = f;
        ToBuildingID = t;
        Distance_Mile = mile;
    }
}
and created a List<Distances> dist that contains all the distances as described.
I tried selecting distinct distances, but the data is not reliable, so that's not a viable option
(for example, the distances between 6377 → 10847 and 10847 → 6377 are not the same).
I am now trying to design my algorithm, without much success so far:
for (int i = 0; i < dist.Count; i++)
{
    if (true) // what may the condition be?
    {
    }
}
Any help would be appreciated. Thanks!
One way:
var uniques = dist.Where(d=>d.FromBuildingID < d.ToBuildingID).ToList();
A more robust way, which will take both A:B and B:A, use the one with the smallest Distance_Mile, and throw out the other:
var uniques = dist
    .GroupBy(d => new {
        a = Math.Min(d.FromBuildingID, d.ToBuildingID),
        b = Math.Max(d.FromBuildingID, d.ToBuildingID)
    })
    .Select(g => g.OrderBy(z => z.Distance_Mile).First())
    .ToList();
In either case, if you just want the sum, instead of the final .ToList(), just put .Sum(d => d.Distance_Mile).
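For example, the total over unique pairs with the first approach (a quick sketch using the dist list from the question):

var totalMiles = dist
    .Where(d => d.FromBuildingID < d.ToBuildingID)
    .Sum(d => d.Distance_Mile);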
One way to think about this problem is that we want to use the System.Linq extension method Distinct() to filter out duplicate items, but that method uses the class's default equality comparer to determine whether two instances are equal, and the default comparer uses a reference comparison, which doesn't work for our scenario.
Since we want to consider two instances equal if either their FromBuildingId and ToBuildingId properties are equal, or if one's FromBuildingId equals the other's ToBuildingId and its ToBuildingId equals the other's FromBuildingId, we need to override the class's default Equals (and GetHashCode) methods with that logic:
public class Distance
{
    public int FromBuildingId { get; set; }
    public int ToBuildingId { get; set; }
    public double TotalMiles { get; set; }

    public Distance(int fromBuildingId, int toBuildingId, double totalMiles)
    {
        FromBuildingId = fromBuildingId;
        ToBuildingId = toBuildingId;
        TotalMiles = totalMiles;
    }

    public override bool Equals(object obj)
    {
        var other = obj as Distance;
        // Equal if the pair matches in either direction. Note the outer
        // parentheses: without them, && binds tighter than ||, and the
        // second clause could be evaluated even when other is null.
        return other != null &&
            ((other.FromBuildingId == FromBuildingId && other.ToBuildingId == ToBuildingId) ||
             (other.FromBuildingId == ToBuildingId && other.ToBuildingId == FromBuildingId));
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // Addition is commutative, so A:B and B:A hash the same.
            return 17 * (FromBuildingId.GetHashCode() + ToBuildingId.GetHashCode());
        }
    }
}
With this done, we can now use the Distinct method on our list:
var distances = new List<Distance>
{
    new Distance(1, 2, 3.4),
    new Distance(2, 1, 3.3), // Should be considered equal to #1
    new Distance(5, 6, 7.8),
    new Distance(5, 6, 7.2)  // Should be considered equal to #3
};
// remove duplicates
var uniqueDistances = distances.Distinct().ToList();
// uniqueDistances will have only 2 items: the first and the third from distances.
And then it's just one more extension method to get the Sum of the distinct distances:
var sum = distances.Distinct().Sum(d => d.TotalMiles);
The other answers using LINQ are valid, but be aware that using LINQ is generally a trade-off of readability vs. performance. If you want the algorithm to scale to much larger datasets, you can use a dictionary with value tuples as keys for fast duplicate checking of each combination while looping through the list:
Dictionary<(int, int), bool> uniqueCombinations = new Dictionary<(int, int), bool>();
Be aware that value tuples are only available from C# 7.0 onwards. Otherwise you can use standard tuples as the key, with some performance loss, but the dictionary structure should still make it faster than LINQ. Tuples are the cleanest way to represent unique pairs as dictionary keys, since arrays are awkward to compare: they compare by reference (hash code) rather than by the values they contain.
Insertion should be done in (toBuildingId, fromBuildingId) order, while checking for duplicates should use the reverse order, (fromBuildingId, toBuildingId), as in the sketch below. The bool value is largely unnecessary, but some value is needed to exploit the Dictionary structure's fast duplicate checking.
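A minimal sketch of that loop (variable names are illustrative; it assumes the Distances class and dist list from the question):

var uniqueCombinations = new Dictionary<(int, int), bool>();
var uniques = new List<Distances>();
foreach (var d in dist)
{
    // Check in (from, to) order: a hit means the reverse pair was already added.
    if (uniqueCombinations.ContainsKey((d.FromBuildingID, d.ToBuildingID)))
        continue;
    // Insert in (to, from) order so the opposite direction is caught later.
    uniqueCombinations[(d.ToBuildingID, d.FromBuildingID)] = true;
    uniques.Add(d);
}
double total = uniques.Sum(u => u.Distance_Mile); // requires System.Linq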
I'm extracting data from an API which gives me some information in JSON. However, one of the values gives me:
{X0=80,X1=80,X2=80,X3=80,X4=80,X5=80,X6=80,X7=80,X8=80,X9=80,X10=80,X11=80,X12=80,X13=80,X14=80,X15=80,X16=80,X17=80,X18=80,X19=80,X20=80,X21=80,X22=80,X23=80,X24=80,X25=80,X26=80,X27=80,X28=80,X29=80,X30=80,X31=80,X32=80,X33=80,X34=80,X35=80,X36=80,X37=80,X38=80,X39=80,X40=80,X41=80,X42=80,X43=80,X44=80,X45=80,X46=80,X47=80,X48=80,X49=80,X50=80,X51=80,X52=80,X53=80,X54=80,X55=80,X56=80,X57=80,X58=80,X59=80,X60=80,X61=80,X62=80}
I am trying to decode this for use in a C# Console Application (.NET).
My main question is: what would be the best way to extract this string into a dictionary or array? I'm not sure if I have worded this correctly, so please correct me if I'm wrong!
Thanks!
You can use LINQ with Trim, Split, Select, and ToDictionary:
var result = json.Trim('{', '}')
.Split(',')
.Select(x => x.Split('='))
.ToDictionary(x => x[0], x => int.Parse(x[1]));
Console.WriteLine(string.Join("\r\n", result.Select(x => x.Key + " : " + x.Value)));
And just because I'm bored:
Benchmarks
Mode : Release (64Bit)
Test Framework : .NET Framework 4.7.1
Operating System : Microsoft Windows 10 Pro
Version : 10.0.17134
CPU Name : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Description : Intel64 Family 6 Model 58 Stepping 9
Cores (Threads) : 4 (8) : Architecture : x64
Clock Speed : 3901 MHz : Bus Speed : 100 MHz
L2Cache : 1 MB : L3Cache : 8 MB
Benchmarks Runs : Inputs (1) * Scales (3) * Benchmarks (3) * Runs (100) = 900
Results
--- Random Set ----------------------------------------------------------------------
| Value | Average | Fastest | Cycles | Garbage | Test | Gain |
--- Scale 100 -------------------------------------------------------- Time 0.229 ---
| Split | 0.058 ms | 0.043 ms | 207,064 | 48.000 KB | Base | 0.00 % |
| JsonReplace | 0.077 ms | 0.064 ms | 273,556 | 24.000 KB | Pass | -32.38 % |
| Regex | 0.270 ms | 0.235 ms | 950,504 | 80.000 KB | Pass | -364.87 % |
--- Scale 1,000 ------------------------------------------------------ Time 0.633 ---
| Split | 0.490 ms | 0.446 ms | 1,718,180 | 495.102 KB | Base | 0.00 % |
| JsonReplace | 0.671 ms | 0.596 ms | 2,352,043 | 195.078 KB | Pass | -36.86 % |
| Regex | 2.544 ms | 2.293 ms | 8,897,994 | 731.125 KB | Pass | -419.00 % |
--- Scale 10,000 ----------------------------------------------------- Time 5.005 ---
| Split | 5.247 ms | 4.673 ms | 18,363,748 | 4.843 MB | Base | 0.00 % |
| JsonReplace | 6.782 ms | 5.488 ms | 23,721,593 | 1.829 MB | Pass | -29.25 % |
| Regex | 31.840 ms | 27.008 ms | 111,277,134 | 6.637 MB | Pass | -506.80 % |
-------------------------------------------------------------------------------------
Data
private string GenerateData(int scale)
{
var ary = Enumerable.Range(0, scale)
.Select(x => $"X{x}={Rand.Next()}")
.ToList();
return $"{{{string.Join(",", ary)}}}";
}
Split
public class Split : Benchmark<string, Dictionary<string,int>>
{
protected override Dictionary<string,int> InternalRun()
{
return Input.Trim('{', '}')
.Split(',')
.Select(x => x.Split('='))
.ToDictionary(x => x[0], x => int.Parse(x[1]));
}
}
Regex
Credited to emsimpson92, using Cast
public class Regex : Benchmark<string, Dictionary<string,int>>
{
protected override Dictionary<string,int> InternalRun()
{
var regex = new System.Text.RegularExpressions.Regex("(?<key>[^,]+)=(?<value>[^,]+)");
var matchCollection = regex.Matches(Input.Trim('{', '}'));
return matchCollection.Cast<Match>()
.ToDictionary(
x => x.Groups["key"].Value,
x => int.Parse(x.Groups["value"].Value));
}
}
JsonReplace
Credited to Hanzalah Adalan, modified to work with string.Replace
public unsafe class JsonReplace : Benchmark<string, Dictionary<string,int>>
{
protected override Dictionary<string,int> InternalRun()
{
return JsonConvert.DeserializeObject<Dictionary<string,int>>(Input.Replace("=", ":"));
}
}
Additional Resources
String.Trim Method
Returns a new string in which all leading and trailing occurrences of a set of specified characters from the current String object are removed.
String.Split Method (String[], StringSplitOptions)
Splits a string into substrings based on the strings in an array. You can specify whether the substrings include empty array elements.
Enumerable.Select<TSource, TResult> Method (IEnumerable<TSource>, Func<TSource, TResult>)
Projects each element of a sequence into a new form.
Enumerable.ToDictionary<TSource, TKey> Method (IEnumerable<TSource>, Func<TSource, TKey>)
Creates a Dictionary<TKey, TValue> from an IEnumerable<T> according to a specified key selector function.
This can be done with Regex. The following pattern will capture keys and values in 2 separate groups: (?<key>[^,]+)=(?<value>[^,]+)
Once you have your MatchCollection, run a foreach loop through it and add each element to a dictionary.
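For example (a minimal sketch; the input variable is assumed to hold the raw string from the question):

using System.Text.RegularExpressions;

var regex = new Regex("(?<key>[^,]+)=(?<value>[^,]+)");
var values = new Dictionary<string, int>();
foreach (Match match in regex.Matches(input.Trim('{', '}')))
{
    values[match.Groups["key"].Value] = int.Parse(match.Groups["value"].Value);
}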
// Note: the raw value uses '=' instead of ':', so make it valid JSON first:
var myAwesomeDictionary = Newtonsoft.Json.JsonConvert.DeserializeObject<Dictionary<string, string>>(_yourJsonStringHere.Replace("=", ":"));
I have a FIX log file. I'm iterating over the lines, putting each string into
Message m = new Message(str, false)
because, for some reason, validation fails on the file (even on the first line). Now, I can see that it's a 35=X message type with 268=4 (i.e. NoMDEntries=4), so I should have 4 groups in the message.
BUT in the debug display I am not seeing any groups: m.base._groups has a count of 0.
The string in question is:
1128=9 | 9=363 | 35=X | 49=CME | 34=3151 | 52=20121216223556363 | 75=20121217 | 268=4 | 279=0 | 22=8 | 48=43585 | 83=902 | 107=6EH3 | 269=4 | 270=13186 | 273=223556000 | 286=5 | 279=0 | 22=8 | 48=43585 | 83=903 | 107=6EH3 | 269=E | 270=13186 | 271=9 | 273=223556000 | 279=0 | 22=8 | 48=43585 | 83=904 | 107=6EH3 | 269=F | 270=13185 | 273=223556000 | 279=1 | 22=8 | 48=43585 | 83=905 | 107=6EH3 | 269=0 | 270=13186 | 271=122 | 273=223556000 | 336=0 | 346=10 | 1023=1 | 10=179 |
Another thing: how do I read the groups? Instinctively, I want to do something like
for (int i = 1; i <= noMDEntries; i++) {
    Group g = m.GetGroup(i);
    int action = Int32.Parse(g.GetField(279));
    ....
}
But that's not how it works, and I haven't found documentation with a better explanation.
Thanks for the help,
Yonatan.
From your code snippets, I think you're using QuickFIX/n, the native C# implementation, so I will answer accordingly.
1) Your message construction is failing because you didn't provide a DataDictionary.
Use Message::FromString instead:
Message m = new Message();
m.FromString(msg_str, false, data_dic, data_dic, someMsgFactory);
Even better, use MarketDataIncrementalRefresh::FromString to get the right return type.
You can see some uses of this function here:
https://github.com/connamara/quickfixn/blob/master/UnitTests/MessageTests.cs
2) To read groups... well, QF/n has a doc page on that, which I think explains it pretty well.
http://quickfixn.org/tutorial/repeating-groups
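Roughly, reading the NoMDEntries groups then looks like this (a sketch based on that tutorial; it assumes the generated FIX4.4 message classes and the data_dic/someMsgFactory objects from above, so adjust to your FIX version):

var msg = new QuickFix.FIX44.MarketDataIncrementalRefresh();
msg.FromString(msg_str, false, data_dic, data_dic, someMsgFactory);

int noMDEntries = msg.GetInt(QuickFix.Fields.Tags.NoMDEntries);
for (int i = 1; i <= noMDEntries; i++) // group indices are 1-based
{
    var group = new QuickFix.FIX44.MarketDataIncrementalRefresh.NoMDEntriesGroup();
    msg.GetGroup(i, group);
    int action = group.GetInt(QuickFix.Fields.Tags.MDUpdateAction);
    // ... read other fields from the group the same way
}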