I'm currently working on an application which allows the user to read and edit a configuration for a device. The configuration is stored as XML.
The issue I'm faced with is how to define validation for the application. Most of the values I'm storing in the XML file have to fall within certain ranges, e.g. 0 - 2, 1 - 50, a maximum of 10 or 20 characters, etc.
There are a lot of these constraints that I have to validate against, and I don't want to hard-code the ranges, because when version 2 of the device comes out the configuration file will have a different set of ranges. E.g. instead of 0 - 2 it will be 0 - 4, and instead of 20 characters, 40 will be allowed.
How should I approach this? Should I store the validation rules in separate XML files? Should I define a class with hard-coded configuration ranges for this device, and create a new class for the version 2 of the device with its configuration ranges?
It could be done inside the XML itself, as a kind of declarative programming where the XML defines behaviour. But that's not flexible, and you can easily end up in pretty complicated scenarios.
What I personally would prefer is to keep the logic inside the code, but store the parameter ranges that the data in the source XML has to be checked against in a separate file, e.g. MatchData.xml.
Hope this helps.
I almost always prefer an external configuration file in cases like this. You could define an object that performs the validation (a validator). When the validator object is instantiated, it instantiates a validation rules object that contains all the ranges for the various validation items. I would serialize/deserialize this object using an XML file, and that file would be included in your app distribution.
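To make that concrete, here is a minimal sketch of the idea using XmlSerializer; the RangeRule/ValidationRules shapes and the rules file name are assumptions, not a known API:

using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical rules shape - adjust to match your real constraints.
public class RangeRule
{
    public string SettingName { get; set; }
    public int Min { get; set; }
    public int Max { get; set; }
}

public class ValidationRules
{
    public RangeRule[] Ranges { get; set; }
}

public class Validator
{
    private readonly ValidationRules _rules;

    // rulesPath: e.g. "MatchData.xml", shipped alongside the app;
    // version 2 of the device just gets a different rules file.
    public Validator(string rulesPath)
    {
        using (var stream = File.OpenRead(rulesPath))
            _rules = (ValidationRules)new XmlSerializer(typeof(ValidationRules)).Deserialize(stream);
    }

    public bool IsInRange(string settingName, int value)
    {
        foreach (var rule in _rules.Ranges)
            if (rule.SettingName == settingName)
                return value >= rule.Min && value <= rule.Max;
        return true; // no rule defined for this setting
    }
}

The point is that supporting the version 2 device becomes a deployment change (a new rules file), not a code change.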
I'm currently using SSIS to make an improvement on a project. I need to insert single documents into a MongoDB collection of type Time Series. At some point I want to retrieve rows of data after going through a C# transformation script. I did this:
foreach (BsonDocument bson in listBson)
{
    OutputBuffer.AddRow();
    OutputBuffer.DatalineX = (string)bson.GetValue("data");
}
But this piece of code, which works great with small files, does not work with a 6 million line file. That is, there are no rows in the output. The subsequent tasks validate, but react as if they had received nothing as input.
Where could the problem come from?
Your OutputBuffer has DatalineX defined as a string, either DT_STR or DT_WSTR, with a specific length. When you exceed that length, things go bad. For normal string columns, you'd have a maximum length of 8k or 4k respectively.
Neither of which is useful for your use case of at least 6M characters. To handle that, you'll need to change your data type to DT_TEXT/DT_NTEXT. Those data types do not require a length, as they are "max" types. There are lots of things to be aware of when using the LOB types:
Performance can suck depending on whether SSIS can keep the data in memory (good) or has to write intermediate values to disk (bad)
You can't readily manipulate them in a data flow
You'll use a different syntax in a Script Component to work with them
e.g.
// Convert the string to bytes first, using the encoding that matches
// your column type (Unicode for DT_NTEXT, your code page for DT_TEXT)
byte[] bytes = System.Text.Encoding.Unicode.GetBytes((string)bson.GetValue("data"));
Output0Buffer.DatalineX.AddBlobData(bytes);
There's a longer example (of questionable accuracy with regard to encoding the bytes, which you get to solve) at https://stackoverflow.com/a/74902194/181965
I am currently trying to understand Serilog for structured logging.
Is there a way to enforce a common property name in Serilog? For example, if I have a log written in code like below:
log.Info("Disk Quota {DiskQuota} exceeded by user {Username}", 100, "User1")
How can I use message templates to ensure that any future log written in the two or three classes that may log a disk-quota-exceeded warning always uses {Username}, and not {User} or {UserId}, etc.?
log.Info("Disk Quota {DiskQuota} exceeded by user {User}", 100, "User2") // Disallow , Possibly ??
There is nothing out-of-the-box in Serilog that does that for you.
A good solution to this would be to implement your own source code analyzer that can perform these checks during the build and emit warnings and/or errors when messages are similar but have different property names.
You'd have to define a string metric to decide what "similar" means to you. There are a number of string metrics you can use, each with different characteristics, the popular ones being:
Levenshtein Distance: The minimum number of single-character edits required to change one word into the other. Strings do not have to be the same length (see the sketch after this list);
Hamming Distance: The number of characters that are different in two equal length strings;
Smith–Waterman: A family of algorithms for computing variable sub-sequence similarities;
Sørensen–Dice Coefficient: A similarity algorithm that computes difference coefficients of adjacent character pairs.
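For illustration, here is a minimal sketch of Levenshtein distance using the classic dynamic-programming formulation:

using System;

static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i; // delete everything
    for (int j = 0; j <= b.Length; j++) d[0, j] = j; // insert everything

    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(
                d[i - 1, j] + 1,         // deletion
                d[i, j - 1] + 1),        // insertion
                d[i - 1, j - 1] + cost); // substitution
        }
    return d[a.Length, b.Length];
}

// Levenshtein("{User}", "{Username}") == 4, so an analyzer using a small
// threshold relative to string length could flag these as similar.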
You could also leverage parts of the source code of the SerilogAnalyzer project to find message templates in code and perform the checks.
A much simpler solution (but not as effective, because it relies on good developer behavior) would be to declare these standard message templates in a class, inside a project that is shared across your solution, so that you can reference them everywhere instead of typing the strings directly.
e.g.
public static class LogMessageTemplates
{
    public const string DiskQuotaExceededByUsername = "Disk Quota {DiskQuota} exceeded by user {Username}";
}
And then you'd rely on developers to always use these pre-defined message templates. e.g.:
log.Info(LogMessageTemplates.DiskQuotaExceededByUsername, 100, "User1");
If you use this approach, you'll probably want to install SerilogAnalyzer in the project to help you identify places where the message template has X parameters but you are passing Y, given that it's now more difficult to spot how many properties a message template has just by looking at the name.
The methods on the .NET platform's DirectorySecurity class (e.g. GetAccessRules()) are far too slow for my purposes. Instead, I wish to directly query the NTFS $Secure metafile (or, alternatively, the $SDS stream) in order to retrieve a list of local accounts and their associated permissions for each file system object.
My plan is to first read the $MFT metafile (which I've already figured out how to do) - and then, for each entry therein, look up the appropriate security descriptor in the metafile (or stream).
The ideal code block would look something like this:
//I've already successfully written code for MFTReader:
var mftReader = new MFTReader(driveToAnalyze, RetrieveMode.All);
IEnumerable<INode> nodes = mftReader.GetNodes(driveToAnalyze.Name);
foreach (NodeWrapper node in nodes)
{
    //Now I wish to return security information for each file system object
    //WITHOUT needing to traverse the directory tree.
    //This is where I need help:
    var securityInfo = GetSecurityInfoFromMetafile(node.FullName, node.SecurityID);
    yield return Tuple.Create(node.FullName, securityInfo.PrincipalName, DecodeAccessMask(securityInfo.AccessMask));
}
And I would like my output to look like this:
c:\Folder1\File1.txt jane_smith Read, Write, Execute
c:\Folder1\File1.txt bill_jones Read, Execute
c:\Folder1\File2.txt john_brown Full Control
etc.
I am running .NET 4.7.1 on Windows 10.
There's no API to read directly from $Secure, just like there is no API to read directly from $MFT. (There's FSCTL_QUERY_FILE_LAYOUT but that just gives you an abstracted interpretation of the MFT contents.)
Since you said you can read $MFT, it sounds like you must be using a volume handle to read directly from the volume, just like chkdsk and similar tools. That allows you to read whatever you want provided you know how to interpret the on-disk structures. So your question reduces to how to correctly interpret the $Secure file.
I will not give you code snippets or exact data structures, but I will give you some very good hints. There are actually two approaches possible.
The first approach is to scan forward in $SDS. All of the security descriptors are there, in SecurityId order. You'll find that at various 16-byte aligned offsets there is a 20-byte header that includes the SecurityId among other information, followed by the security descriptor in serialized form. The SecurityId values appear in ascending order in $SDS. Also, every alternate 256K region in $SDS is a mirror of the previous 256K region, so to cut the work in half, only consider the regions 0..256K-1, 512K..768K-1, etc.
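As a rough sketch of that forward scan, assuming the commonly documented 20-byte entry header layout (4-byte hash, 4-byte SecurityId, 8-byte offset, 4-byte entry length including the header) - verify this against your own volume before relying on it:

const int Region = 256 * 1024;
long pos = 0;
while (pos + 20 <= sdsData.Length) // sdsData: raw $SDS bytes read via your volume handle
{
    if ((pos / Region) % 2 == 1) // odd 256K regions are mirrors; skip them
    {
        pos = (pos / Region + 1) * Region;
        continue;
    }
    uint securityId = BitConverter.ToUInt32(sdsData, (int)pos + 4);
    uint length = BitConverter.ToUInt32(sdsData, (int)pos + 16);
    if (length < 20 || length > Region) // padding/garbage: jump to the next region
    {
        pos = (pos / Region + 1) * Region;
        continue;
    }
    var sd = new byte[length - 20]; // the self-relative descriptor follows the header
    Array.Copy(sdsData, pos + 20, sd, 0, sd.Length);
    // ... interpret securityId -> sd ...
    pos = (pos + length + 15) & ~15L; // entries are 16-byte aligned
}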
The second approach is to make use of the $SII index, also part of the $Secure file. Its structure is a B-tree, very similar to how directories are structured in NTFS. The index entries in $SII use SecurityId as the lookup key and also contain the byte offset in $SDS where the corresponding header and security descriptor live. This approach is more performant than scanning $SDS, but requires you to know how to interpret a lot more structures.
Craig pretty much covered everything; I would just like to clarify a few points. Like Craig, no code here.
Navigate to node number 9, which corresponds to $Secure.
Get all the streams and all the fragments of the $SDS stream.
Read the content and extract each security descriptor.
Use IsValidSecurityDescriptor to make sure each SD is valid, and stop when you reach an invalid one (see the P/Invoke sketch below).
Remember that $Secure stores the security descriptors in self-relative format.
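For the validity check, here is a sketch of the P/Invoke declaration (IsValidSecurityDescriptor lives in advapi32) and of testing an extracted self-relative descriptor:

using System;
using System.Runtime.InteropServices;

static class NativeMethods
{
    [DllImport("advapi32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool IsValidSecurityDescriptor(IntPtr pSecurityDescriptor);
}

static bool IsValidSd(byte[] sdBytes)
{
    var handle = GCHandle.Alloc(sdBytes, GCHandleType.Pinned); // pin the buffer for the native call
    try { return NativeMethods.IsValidSecurityDescriptor(handle.AddrOfPinnedObject()); }
    finally { handle.Free(); }
}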
Are you using FSCTL_QUERY_FILE_LAYOUT? The only real source of how to use this function I have found is here:
https://wimlib.net/git/?p=wimlib;a=blob;f=src/win32_capture.c;h=d62f7d07ef20c08c9bec93f261131033e39b159b;hb=HEAD
It looks like he solves the problem with security descriptors like this:
He gets basically all the information about files from the MFT, but not the security descriptors. For those, he reads the SecurityId field from the MFT and checks a hash table for an existing mapping from that ID to the ACL. If one exists, he just returns it; otherwise he calls NtQuerySecurityObject and caches the result in the hash table. This should drastically reduce the number of calls. It assumes that there are few distinct security descriptors and that the SecurityId field correctly reflects the single-instancing of the descriptors.
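A minimal sketch of that caching pattern; the fetch delegate is a hypothetical stand-in for the NtQuerySecurityObject call against one file known to carry the given SecurityId:

using System;
using System.Collections.Generic;
using System.Security.AccessControl;

class SecurityIdCache
{
    private readonly Dictionary<uint, RawSecurityDescriptor> _cache = new Dictionary<uint, RawSecurityDescriptor>();
    private readonly Func<string, RawSecurityDescriptor> _fetch; // wraps NtQuerySecurityObject

    public SecurityIdCache(Func<string, RawSecurityDescriptor> fetch) { _fetch = fetch; }

    // path: any file (from the MFT) whose record carries this SecurityId
    public RawSecurityDescriptor Get(uint securityId, string path)
    {
        if (!_cache.TryGetValue(securityId, out var sd))
            _cache[securityId] = sd = _fetch(path); // query once, reuse for every file sharing this id
        return sd;
    }
}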
I'm working on a web app that needs to take a list of files on a query string (specifically a GET and not a POST), something like:
http://site.com/app?things=/stuff/things/item123,/stuff/things/item456,/stuff/things/item789
I want to shorten that string:
http://site.com/app?things=somekindofencoding
The string isn't terribly long; it varies from 20-150 chars. Something that short isn't really suitable for GZip, but it does have an awful lot of repetition, so compression should be possible.
I don't want a DB or Dictionary of strings - the URL will be built by a different application to the one that consumes it. I want a reversible compression that shortens this URL. It doesn't need to be secure.
Is there an existing way to do this? I'm working in C#/.Net but would be happy to adapt an algorithm from some other language/stack.
If you can express the data in BNF, you could construct a parser for the data. Instead of sending the data, you could send the AST, where each node is identified by one character (or several if you have a lot of different node types). In your example we could have:
files : file files
      |
file  : path id
path  : /items/thing
      | /files/item
      | /stuff/things/item
You could then represent a list of files as path[id1,id2,...,idn], using 0, 1, 2 for the paths. With the input being:
/stuff/things/item123,/stuff/things/item456,/stuff/things/item789
/files/item1,/files/item46,/files/item7
you'd then end up with ?things=2[123,456,789]1[1,46,7]
where /stuff/things/item is represented by 2 and /files/item by 1. Each number within [...] is an id, so 2[123] would expand to /stuff/things/item123.
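As a small illustration of the decoding side, here is a sketch under the assumption that both applications share the same path table:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ThingsCodec
{
    // Shared path table; the array index is the identifier used in the encoded string.
    static readonly string[] Paths = { "/items/thing", "/files/item", "/stuff/things/item" };

    public static IEnumerable<string> Decode(string encoded)
    {
        // Matches groups like 2[123,456,789]
        foreach (Match m in Regex.Matches(encoded, @"(\d+)\[([^\]]*)\]"))
        {
            string path = Paths[int.Parse(m.Groups[1].Value)];
            foreach (string id in m.Groups[2].Value.Split(','))
                yield return path + id;
        }
    }
}

// ThingsCodec.Decode("2[123,456,789]1[1,46,7]") yields
// /stuff/things/item123, /stuff/things/item456, /stuff/things/item789,
// /files/item1, /files/item46, /files/item7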
EDIT The approach does not have to be static. If you have to discover the repeated items dynamically, you can use the same approach and pass along the map between identifier and token. In that case the above example would be:
?things=2[123,456,789]1[1,46,7]&tokens=2=/stuff/things/,1=/files/item
which, if the grammar is this simple, would of course do better as:
?things=/stuff/things/[123,456,789]/files/item[1,46,7]
Compressing the repeated part to fewer bytes than the unique values is possible with such a short string, but it will most likely have to be based on constraining the possible values; otherwise you risk actually increasing the size when "compressing".
You can try zlib using raw deflate (no zlib or gzip headers and trailers). It will generally provide some compression even on short strings that are composed of printable characters, and it does look for and take advantage of repeated strings. I haven't tried it, but you could also see if smaz works for your data.
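In C#, DeflateStream already produces raw deflate (no zlib or gzip header), so a sketch could look like this. The URL-safe Base64 step is an assumption about how you'd put the bytes on the query string; note it gives about a third of the savings back, so measure on real data:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class UrlCompressor
{
    public static string Shorten(string things)
    {
        using (var ms = new MemoryStream())
        {
            using (var deflate = new DeflateStream(ms, CompressionLevel.Optimal))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(things);
                deflate.Write(bytes, 0, bytes.Length);
            }
            // URL-safe Base64: swap the characters that need escaping, drop padding
            return Convert.ToBase64String(ms.ToArray()).Replace('+', '-').Replace('/', '_').TrimEnd('=');
        }
    }

    public static string Expand(string encoded)
    {
        string b64 = encoded.Replace('-', '+').Replace('_', '/');
        b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '='); // restore padding
        using (var ms = new MemoryStream(Convert.FromBase64String(b64)))
        using (var deflate = new DeflateStream(ms, CompressionMode.Decompress))
        using (var reader = new StreamReader(deflate, Encoding.UTF8))
            return reader.ReadToEnd();
    }
}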
I would recommend obtaining a large set of real-life example URLs to use for benchmark testing of possible compression approaches.
I have text files that always come in the same text format (I do not have an xsd for the text file).
I want to map the data from it to some class.
Is there some standard way to do this, other than writing string parsers or some complicated regexes?
I really do not want to go with hand-written text parsers, because several of us are working on this and it would probably take each of us time to understand what the others are doing.
Example
Thanks.
If you have a special format you need your own parser for sure.
If the format is a standard one like XML, YAML, JSON, CSV, etc., a parsing library will always be available in your language.
UPDATE
From the sample you provided, it seems the format is more like an INI file, but the entries are custom. Maybe you could extend NINI.
Solution:
Change the format of that file to a standard format like a tab-delimited or comma-separated CSV file.
Then use one of the many libraries out there to read those files, or import them into a database and use an ORM like Entity Framework to read them.
Assuming you cannot change the incoming file format to something more machine-readable, you will probably need to write your own custom parser. The best way to do it would be to create classes to represent and store all of the different kinds of data, using the appropriate data types for each field (custom enums, DateTime, Version, etc.).
Try to compartmentalize the code. For example, take these lines here:
272 298 9.663 18.665 -90.000 48 0 13 2 10 5 20009 1 2 1 257 "C4207" 0 0 1000 0 0
This could be a single class or struct. Its constructor could accept the above string as a parameter, and each value could be parsed into a different member. That same class could have a Save() or ToString() method that converts all the values back to a string if needed.
Then the parent class would simply contain an array of the above structure, based on how many entries are in the file.
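A minimal sketch of that shape; the field names are hypothetical, since the real meaning of each column isn't known here:

using System;
using System.Globalization;

// Parses one line such as:
// 272 298 9.663 18.665 -90.000 48 0 13 2 10 5 20009 1 2 1 257 "C4207" 0 0 1000 0 0
class EntryRecord
{
    private readonly string[] _fields;

    // Hypothetical names - map them to whatever the columns actually mean.
    public int Id { get; }
    public int Code { get; }
    public double OffsetX { get; }
    public double OffsetY { get; }
    public double Rotation { get; }
    public string Label { get; }

    public EntryRecord(string line)
    {
        _fields = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        Id = int.Parse(_fields[0]);
        Code = int.Parse(_fields[1]);
        OffsetX = double.Parse(_fields[2], CultureInfo.InvariantCulture);
        OffsetY = double.Parse(_fields[3], CultureInfo.InvariantCulture);
        Rotation = double.Parse(_fields[4], CultureInfo.InvariantCulture);
        Label = _fields[16].Trim('"'); // the quoted field, e.g. "C4207"
    }

    // Converts the values back to the original line format.
    public override string ToString() => string.Join(" ", _fields);
}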