I'm in the very early stages of working on a tag editor for MP4 files, and more specifically iTunes AAC ones. After doing some snooping around, it seems that the file's structure is not as complicated as I first thought and is built as a sort of tree, like the following:
4 Bytes [Atom Length] 4 Bytes [Atom Name] X Bytes [Atom Data]
An atom's data is as large as the stated length allows and can contain either data (information) or other atoms. What I am trying to work out is how one determines whether the data is information or an actual atom. Any insight would be much appreciated.
After a lot of snooping around, it seems the only way to determine whether a node leads to data or to another node is by knowing the data structure. As I am only interested in the tags, the structure is pretty easy to figure out. All the tags are contained in the following hierarchy:
moov.udta.meta.ilst
When delving into the ilst node, each tag is represented as a child atom whose name determines what data it contains. As for the actual data, each child atom carries a child of its own which contains the actual information and a flag indicating what sort of information it is, e.g. text or numbers. So all in all it looks something like this:
moov.udta.meta.ilst.[atom size][atom name].[data]
Of course this still leaves the issue of self-made tags stored in the uuid atom, which companies like Sony use to add more information to the file. I would imagine that each uuid atom stores its children in the same way ilst does, but I can't be sure.
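For what it's worth, a minimal C# sketch of that idea is below: it walks the tree and only descends into a hard-coded set of container atoms, since (as noted above) the format itself doesn't flag whether an atom holds data or children. The container list, the 4-byte version/flags skip inside meta, and the size handling are simplifications rather than a complete parser (64-bit and to-end-of-file sizes are ignored).

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class AtomWalker
{
    // Atoms we choose to descend into; everything else is treated as plain data.
    static readonly HashSet<string> Containers = new HashSet<string> { "moov", "udta", "meta", "ilst" };

    static void Walk(BinaryReader reader, long end, string path)
    {
        while (reader.BaseStream.Position + 8 <= end)
        {
            long start = reader.BaseStream.Position;
            uint size = ReadBigEndianUInt32(reader);                      // 4 bytes: atom length (includes the 8-byte header)
            string name = Encoding.ASCII.GetString(reader.ReadBytes(4));  // 4 bytes: atom name
            if (size < 8) break;                                          // 0 = to end of file, 1 = 64-bit size; not handled here
            Console.WriteLine(path + name + " (" + size + " bytes)");

            if (Containers.Contains(name))
            {
                if (name == "meta")
                    reader.BaseStream.Position += 4;                      // meta carries a 4-byte version/flags field before its children
                Walk(reader, start + size, path + name + ".");
            }
            reader.BaseStream.Position = start + size;                    // jump to the next sibling
        }
    }

    static uint ReadBigEndianUInt32(BinaryReader reader)
    {
        byte[] b = reader.ReadBytes(4);
        return (uint)((b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]);
    }

    static void Main(string[] args)
    {
        using (var reader = new BinaryReader(File.OpenRead(args[0])))
            Walk(reader, reader.BaseStream.Length, "");
    }
}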
I'm looking at classes to use to read a large XML file. A fast implementation of the C# XmlReader class, XmlTextReader, provides "forward-only access." What does this mean?
"forward-only" means just that - you can only go forward through data. The main benefits of such approach are no need to store previous information (leading to low memory usage) and ability to read from non-seekable sources like TCP stream (where you can't seek back unlike with file stream that allow random access).
"Forward-only" is very easy to see for table-based structures (like reading from database) - "forward-only" reader will let you only check "current" record or move to the next row. There will be no way to access data from already seen rows via such reader (you have to save data outside of reader to be able to access it).
For XmlReader it is slightly more confusing as it produces tree structure out of stream of text. From stream reading point of view "forward-only" means you will not be able to get any data that reader already looked at (like root node that is basically first line of the file or parent node of current one as it had to be earlier in the file).
But from XML tree generation point of view "forward-only" may be confusing - it produces elements in depth-first order (because that how they are present in the text of the XML) meaning that "next" element is not necessary the one you'd like to see in the tree (especially if you expect breadth-first access like "names of all authors of this book").
Note that XmlReader allows you to access all attributes of current node at any time as it considers them part of the "current element".
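As a minimal illustration (the file and element names here are placeholders, not from the question): once the reader has moved past a node, the only way to use its data again is to have saved it yourself.

using System;
using System.Xml;

class ForwardOnlyDemo
{
    static void Main()
    {
        using (XmlReader reader = XmlReader.Create("books.xml"))   // placeholder file name
        {
            // The reader only ever moves forward through the stream.
            while (reader.ReadToFollowing("author"))                // placeholder element name
            {
                // Capture the value now; the reader cannot return to this node later.
                Console.WriteLine(reader.ReadElementContentAsString());
            }
        }
    }
}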
The methods on the .NET platform's DirectorySecurity class (e.g. GetAccessRules()) are far too slow for my purposes. Instead, I wish to directly query the NTFS $Secure metafile (or, alternatively, the $SDS stream) in order to retrieve a list of local accounts and their associated permissions for each file system object.
My plan is to first read the $MFT metafile (which I've already figured out how to do) - and then, for each entry therein, look up the appropriate security descriptor in the metafile (or stream).
The ideal code block would look something like this:
//I've already successfully written code for MFTReader:
var mftReader = new MFTReader(driveToAnalyze, RetrieveMode.All);
IEnumerable<INode> nodes = mftReader.GetNodes(driveToAnalyze.Name);
foreach (NodeWrapper node in nodes)
{
    // Now I wish to return security information for each file system object
    // WITHOUT needing to traverse the directory tree.
    // This is where I need help:
    var securityInfo = GetSecurityInfoFromMetafile(node.FullName, node.SecurityID);
    yield return Tuple.Create(node.FullName, securityInfo.PrincipalName, DecodeAccessMask(securityInfo.AccessMask));
}
And I would like my output to look like this:
c:\Folder1\File1.txt jane_smith Read, Write, Execute
c:\Folder1\File1.txt bill_jones Read, Execute
c:\Folder1\File2.txt john_brown Full Control
etc.
I am running .NET 4.7.1 on Windows 10.
There's no API to read directly from $Secure, just like there is no API to read directly from $MFT. (There's FSCTL_QUERY_FILE_LAYOUT but that just gives you an abstracted interpretation of the MFT contents.)
Since you said you can read $MFT, it sounds like you must be using a volume handle to read directly from the volume, just like chkdsk and similar tools. That allows you to read whatever you want provided you know how to interpret the on-disk structures. So your question reduces to how to correctly interpret the $Secure file.
I will not give you code snippets or exact data structures, but I will give you some very good hints. There are actually two approaches possible.
The first approach is to scan forward through $SDS. All of the security descriptors are there, in SecurityId order. You'll find that at various 16-byte-aligned offsets there is a 20-byte header that includes the SecurityId among other information, followed by the security descriptor in serialized form. The SecurityId values appear in ascending order in $SDS. Also, every alternate 256K region in $SDS is a mirror of the previous 256K region, so to cut the work in half, only consider the regions 0..256K-1, 512K..768K-1, etc.
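For orientation only, a rough sketch of that scan over an in-memory copy of $SDS follows. It assumes the commonly documented 20-byte entry header of hash, SecurityId, 64-bit offset and 32-bit length (with the length including the header) and 16-byte entry alignment; verify that layout against the on-disk structures yourself before relying on it.

using System;
using System.Collections.Generic;

static class SdsScanner
{
    // Yields (SecurityId, serialized self-relative security descriptor) pairs.
    public static IEnumerable<Tuple<uint, byte[]>> Scan(byte[] sds)
    {
        const long Block = 256 * 1024;
        long pos = 0;
        while (pos + 20 <= sds.Length)
        {
            // Every alternate 256K region is a mirror; skip ahead to the next real region.
            if ((pos / Block) % 2 == 1)
            {
                pos = ((pos / Block) + 1) * Block;
                continue;
            }

            uint securityId = BitConverter.ToUInt32(sds, (int)pos + 4);
            uint length = BitConverter.ToUInt32(sds, (int)pos + 16);
            if (length < 20 || length > sds.Length - pos)
                break;                                     // past the last valid entry

            var descriptor = new byte[length - 20];        // the descriptor follows the 20-byte header
            Buffer.BlockCopy(sds, (int)pos + 20, descriptor, 0, descriptor.Length);
            yield return Tuple.Create(securityId, descriptor);

            pos = (pos + length + 15) & ~15L;              // next entry starts on a 16-byte boundary
        }
    }
}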
The second approach is to make use of the $SII index, also part of the $Secure file. The structure of this is a B-tree very similar to how directories are structured in NTFS. The index entries in $SII have SecurityId as the index for lookups, and also contain the byte offset you can go to in $SDS to find the corresponding header and security descriptor. This approach will be more performant than scanning $SDS, but requires you to know how to interpret a lot more structures.
Craig pretty much covered everything; I would just like to clarify a few points. Like Craig, no code here.
Navigate to node number 9, which corresponds to $Secure.
Get all the streams and get all the fragments of the $SDS stream.
Read the content and extract each security descriptor.
Use IsValidSecurityDescriptor to make sure the SD is valid and stop when you reach an invalid SD.
Remember that $Secure stores the security descriptors in self-relative format.
Are you using FSCTL_QUERY_FILE_LAYOUT? The only real source of how to use this function I have found is here:
https://wimlib.net/git/?p=wimlib;a=blob;f=src/win32_capture.c;h=d62f7d07ef20c08c9bec93f261131033e39b159b;hb=HEAD
It looks like he solves the problem with security descriptors like this:
He gets basically all information about files from the MFT, but not the security descriptors. For those, he reads the SecurityId field from the MFT and looks in a hash table to see whether he already has a mapping from that ID to the ACL. If he does, he just returns it; otherwise he calls NtQuerySecurityObject and caches the result in the hash table. This drastically reduces the number of calls. It assumes that there are relatively few distinct security descriptors and that the SecurityId field correctly reflects the single-instancing of the descriptors.
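If you end up taking the same route, the caching pattern looks roughly like the sketch below. Using FileInfo.GetAccessControl instead of NtQuerySecurityObject is my assumption to keep the example in plain .NET, and the parameters mirror the hypothetical NodeWrapper fields from the question.

using System.Collections.Generic;
using System.IO;
using System.Security.AccessControl;

class AclCache
{
    // One expensive ACL query per distinct SecurityId instead of one per file.
    private readonly Dictionary<uint, FileSecurity> _cache = new Dictionary<uint, FileSecurity>();

    public FileSecurity GetAcl(uint securityId, string fullName)
    {
        FileSecurity acl;
        if (_cache.TryGetValue(securityId, out acl))
            return acl;                                    // descriptor already resolved

        acl = new FileInfo(fullName).GetAccessControl(AccessControlSections.Access);
        _cache[securityId] = acl;
        return acl;
    }
}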
I have a directory with about 30 randomly named XML files, so the names give no clue about their content. I need to merge all of these files into a single file according to predefined rules, and unfortunately it is too complex to handle with simple stylesheets.
Each file can have up to 15 different elements within its root. So I have 15 different methods that each take an XDocument as a parameter and search for a specific element in the XML, then process that data. Because I call these methods in a specific order, I can ensure that all data is processed in the correct order.
Example nodes are e.g. a list of products, a list of prices for specific product codes, a list of translations for product names, a list of countries, a list of discounts on products in specific countries, and much, much more. And no, these aren't very simple structures either.
Right now, I'm doing something like this:
List<XDocument> files = ImportFolder
    .EnumerateFiles("*.xml", SearchOption.TopDirectoryOnly)
    .Select(f => XDocument.Load(f.FullName))
    .ToList();
files.ForEach(FileInformation);
files.ForEach(ParseComments);
files.ForEach(ParsePrintOptions);
files.ForEach(ParseTranslations);
files.ForEach(ParseProducts);
// etc.
MyXml.Save(ExportFile.FullName);
I wonder if I can do this in a way that keeps less in memory and generates the result faster. Speed is more important than memory, though. The current solution works; I just need something faster that uses less memory.
Any suggestions?
One approach would be to create a separate List<XElement> for each of the different data types. For example:
List<XElement> Comments = new List<XElement>();
List<XElement> Options = new List<XElement>();
// etc.
Then for each document you can go through the elements in that document and add them to the appropriate lists. Or, in pseudocode:
for each document
for each element in document
add element to the appropriate list
This way you don't have to load all of the documents into memory at the same time. In addition, you only do a single pass over each document.
Once you've read all of the documents, you can concatenate the different elements into your single MyXml document. That is:
MyXml = create empty document
Add Comments list to MyXml
Add Options list to MyXml
// etc.
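In C#, the gathering pass might look roughly like this; the element names ("Comment", "Option") and the "Root" element are placeholders, and ImportFolder/ExportFile come from the question. Copying each element detaches it from its source document, so only one input document needs to be alive at a time.

// requires System.Collections.Generic, System.IO, System.Linq, System.Xml.Linq
var comments = new List<XElement>();
var options = new List<XElement>();
// etc. for the other element types

foreach (FileInfo file in ImportFolder.EnumerateFiles("*.xml", SearchOption.TopDirectoryOnly))
{
    XDocument doc = XDocument.Load(file.FullName);                          // one document in memory at a time
    comments.AddRange(doc.Descendants("Comment").Select(e => new XElement(e)));
    options.AddRange(doc.Descendants("Option").Select(e => new XElement(e)));
}

// Single pass done; build the merged document from the gathered lists.
var myXml = new XDocument(new XElement("Root",
    new XElement("Comments", comments),
    new XElement("Options", options)));
myXml.Save(ExportFile.FullName);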
Another benefit of this approach is that if the total amount of data is larger than will fit in memory, those lists of elements could be files. You'd write all of the Comment elements to the Comments file, the Options to the Options file, etc. And once you've read all of the input documents and saved the individual elements to files, you can then read each of the element files to create the final XML document.
Depending on the complexity of your rules, and how interdependent the data is between the various files, you could probably process each file in parallel (or at least certain chunks of it).
Given that the XDocuments aren't being changed during the read, you could most certainly gather your data in parallel, which would likely offer a speed advantage.
See https://msdn.microsoft.com/en-us/library/dd460693%28v=vs.110%29.aspx
You should examine the data you're loading in, and whether you can work on that in any special way to keep memory-usage low (and even gain some speed).
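A rough sketch of that parallel gathering, again with placeholder element names; a thread-safe ConcurrentBag collects the results from all workers.

// requires System.Collections.Concurrent, System.IO, System.Threading.Tasks, System.Xml.Linq
var comments = new ConcurrentBag<XElement>();
Parallel.ForEach(
    ImportFolder.EnumerateFiles("*.xml", SearchOption.TopDirectoryOnly),
    file =>
    {
        XDocument doc = XDocument.Load(file.FullName);        // each worker loads its own document
        foreach (XElement e in doc.Descendants("Comment"))    // placeholder element name
            comments.Add(new XElement(e));                    // copy so the source document can be released
    });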
I am a novice at programming, working on a C# solution for a geomorphology project. I need to extract coordinates from a variable number of Google Earth KML ground overlay files, converted to one long text string, and enter them into an array that can be accessed by other methods.
The KML tags and data of interest look like this:
<LatLonBox>
<north>37.91904192681665</north>
<south>37.46543388598137</south>
<east>15.35832653742206</east>
<west>14.60128369746704</west>
<rotation>-0.1556640799496235</rotation>
</LatLonBox>
The text files I will be processing with the program could have between 1 and 100 or more of these data groups, each embedded within the standard KML file headers/footers and other tags extraneous to my work. I have already developed the method for extracting the coordinate values as strings and have tested it for one KML file.
At this point it seems that the most efficient approach would be to construct some kind of looping method to search through the string for a coordinate data group, extract the data to a row in the array, then continue to the next group. The method might also go through the string and extract all the "north" data to a column in the array first, then loop back for all the "south" data, etc. I am open to any suggestions.
Due to my limited programming background, straight-forward solutions would be preferred over elegant or advanced solutions, but give it your best shot.
Thanks for your help.
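Purely as an illustration of the looping approach described above (and assuming the five tags always appear in the order shown in the sample), a regular-expression version that collects each LatLonBox into one row of a jagged array could look like this; the names are placeholders.

using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

class OverlayExtractor
{
    // Returns one row of [north, south, east, west, rotation] per LatLonBox found.
    public static double[][] ExtractLatLonBoxes(string kmlText)
    {
        var pattern = new Regex(
            @"<LatLonBox>\s*" +
            @"<north>(?<north>[^<]+)</north>\s*" +
            @"<south>(?<south>[^<]+)</south>\s*" +
            @"<east>(?<east>[^<]+)</east>\s*" +
            @"<west>(?<west>[^<]+)</west>\s*" +
            @"<rotation>(?<rotation>[^<]+)</rotation>\s*" +
            @"</LatLonBox>");

        return pattern.Matches(kmlText)
            .Cast<Match>()
            .Select(m => new[]
            {
                double.Parse(m.Groups["north"].Value, CultureInfo.InvariantCulture),
                double.Parse(m.Groups["south"].Value, CultureInfo.InvariantCulture),
                double.Parse(m.Groups["east"].Value, CultureInfo.InvariantCulture),
                double.Parse(m.Groups["west"].Value, CultureInfo.InvariantCulture),
                double.Parse(m.Groups["rotation"].Value, CultureInfo.InvariantCulture)
            })
            .ToArray();
    }
}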
I have an issue where I need to load a fixed-length file, process some of the fields, generate a few others, and finally output a new file. The difficult part is that the file contains part numbers, and some of the products are superseded by other products (which can themselves be superseded). What I need to do is follow the supersession trail to get the information I need to replace some of the fields in the row I am looking at. So how can I best handle about 200,000 lines from a file and the need to move up and down within the given products? I thought about using a collection to hold the data, or a dataset, but I just don't think that is the right way. Here is an example of what I am trying to do:
Before
Part Number   List Price   Description       Superceding Part Number
0913982                                      3852943
3852943       0006710      CARRIER,BEARING
After
Part Number   List Price   Description       Superceding Part Number
0913982       0006710      CARRIER,BEARING   3852943
3852943       0006710      CARRIER,BEARING
As usual any help would be appreciated, thanks.
Wade
Create a structure for the given fields.
Read the file and put the structures in a collection. You may use the part number as the key in a hashtable (or Dictionary) for the fastest lookups.
Scan the collection and fix the data.
200,000 objects built from those lines will fit easily in memory. For example, if each structure is 50 bytes, you will need only about 10 MB, which is nothing for a modern PC.
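A rough sketch of that approach, with placeholder field names: the dictionary is keyed by part number, and the loop walks each supersession chain to its end before copying the price and description back.

using System.Collections.Generic;

class PartRow
{
    public string PartNumber;
    public string ListPrice;
    public string Description;
    public string SupersedingPartNumber;
}

class SupersessionFixer
{
    public static void Fix(Dictionary<string, PartRow> parts)
    {
        foreach (PartRow row in parts.Values)
        {
            PartRow current = row;
            var visited = new HashSet<string>();              // guard against circular chains
            PartRow next;
            while (!string.IsNullOrEmpty(current.SupersedingPartNumber)
                   && visited.Add(current.PartNumber)
                   && parts.TryGetValue(current.SupersedingPartNumber, out next))
            {
                current = next;                               // follow the supersession trail
            }

            if (!ReferenceEquals(current, row))
            {
                row.ListPrice = current.ListPrice;            // borrow the values from the end of the chain
                row.Description = current.Description;
            }
        }
    }
}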