I want to convert some old files to a human-readable format. The Delphi code that reads them is the following:
OpenFileWriteRA(MyF, dd+'invoice.mfs', SizeOf(TFSerialDocEx)) then
and then calling
ReadFile(MyF, vSS1, SizeOf(TFSerialDocEx), nr1, nil);
So I am looking for a way to convert these files with a small program, and I want to write it in C# since I am more familiar with C# than with Delphi. The .MFS file is written in binary, so what would I need to convert it to text/strings? I tried a simple binary conversion but it was not OK; it seems the SizeOf of the record passed as a parameter is a big thing here, or is it?
Broadly speaking, there are three approaches that I would consider:
1. Transform data with Delphi code
Since you already have Delphi code to read the data, and the structures defined, it will be simplest and quickest to transform the data with Delphi code. Simply read it using your existing code and then output it in human-readable form, for instance using the built-in JSON libraries.
2. Define an equivalent formatted C# structure and blit the binary data onto that structure
Define a formatted structure in C# that has identical binary layout to the structure put to disk. This will use LayoutKind.Sequential and perhaps specify Pack = 1 if the Delphi structure is packed. You may need to use the MarshalAs attribute on some members to achieve binary equivalence. Then read the structure from disk into a byte array. Pin this array, and use Marshal.PtrToStructure on the pinned object address to deserialize. Now you have the data, you can write it how you please.
An example can be found here: Proper struct layout from delphi packed record
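To give a feel for it, here is a minimal sketch, assuming a hypothetical packed record with just an integer and a fixed-length ANSI string field; the real field list, sizes, and Pack value must be taken from the Delphi declaration of TFSerialDocEx:

using System.IO;
using System.Runtime.InteropServices;

// Hypothetical mirror of a packed Delphi record; field types, order,
// and Pack must match the actual TFSerialDocEx declaration.
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
struct FSerialDocEx
{
    public int Id;                                        // e.g. a Delphi Integer
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 20)]
    public string Name;                                   // e.g. array[0..19] of AnsiChar
}

static class RecordReader
{
    public static FSerialDocEx Read(BinaryReader reader)
    {
        byte[] buffer = reader.ReadBytes(Marshal.SizeOf<FSerialDocEx>());
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // Reinterpret the pinned bytes as the managed struct.
            return Marshal.PtrToStructure<FSerialDocEx>(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();
        }
    }
}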
3. Read the structure field by field with a binary reader
Rather than declaring a binary compatible structure you can use a BinaryReader to read from a stream one field at a time. Method calls like Read, ReadInt32, ReadDouble, etc. let you work your way through the record. Remember that the fields will have been written in the order in which the Delphi record was declared. If the original record is aligned rather than packed you will need to step over any padding. Again, once you have the data available to your C# code you can write it as you please.
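For instance, a sketch of this style with hypothetical fields (an Int32, a Double, and a 20-byte AnsiChar array); the actual field order and sizes come straight from the Delphi record declaration:

using System;
using System.IO;
using System.Text;

// Read records field by field, in declaration order.
using (var reader = new BinaryReader(File.OpenRead("invoice.mfs")))
{
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        int id = reader.ReadInt32();             // e.g. a Delphi Integer
        double amount = reader.ReadDouble();     // e.g. a Delphi Double
        // e.g. array[0..19] of AnsiChar, trimmed of trailing NULs:
        string name = Encoding.ASCII.GetString(reader.ReadBytes(20)).TrimEnd('\0');
        // If the record is aligned rather than packed, skip the padding here,
        // e.g. reader.BaseStream.Seek(padding, SeekOrigin.Current);
        Console.WriteLine($"{id};{amount};{name}");
    }
}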
I'm currently using SSIS to improve a project. I need to insert single documents into a MongoDB collection of type Time Series. At some point I want to retrieve rows of data after passing them through a C# transformation script. I did this:
foreach (BsonDocument bson in listBson)
{
    OutputBuffer.AddRow();
    OutputBuffer.DatalineX = (string)bson.GetValue("data");
}
But this piece of code, which works great with a small file, does not work with a 6-million-line file. That is, there are no lines in the output. The following tasks validate but react as if they had received nothing as input.
Where could the problem come from?
Your OutputBuffer has DatalineX defined as a string, either DT_STR or DT_WSTR, with a specific length. When you exceed that length, things go bad. For normal strings, you'd have a maximum length of 8k or 4k characters respectively.
Neither of which is useful for your use case of at least 6M characters. To handle that, you'll need to change your data type to DT_TEXT/DT_NTEXT. Those data types do not require a length, as they are "max" types. There are lots of things to be aware of when using the LOB types:
- Performance can suck depending on whether SSIS can keep the data in memory (good) or has to write intermediate values to disk (bad)
- You can't readily manipulate them in a data flow
- You'll use a different syntax in a Script Component to work with them
e.g.
// Convert the string to bytes first; DT_NTEXT stores Unicode bytes
byte[] bytes = System.Text.Encoding.Unicode.GetBytes((string)bson.GetValue("data"));
Output0Buffer.DatalineX.AddBlobData(bytes);
A longer example, of questionable accuracy with regard to encoding the bytes (which you get to solve), is at https://stackoverflow.com/a/74902194/181965
I have a binary file that is created by an open source application written in C. Since it is open source I can see how the data is structured when it is written to the file. The problem is that I don't know C, but I can at least mostly tell what is going on when the structs are being declared. From what I've seen in other posts, though, it isn't as simple as creating a struct in C# with the same data types as the ones in C.
I found this post https://stackoverflow.com/a/3863658/201021 which has a class for translating structs but (as far as I can tell) you need to declare the struct properly in C# for it to work.
I've read about the MarshalAs attribute and the StructLayout attribute. I mostly get how you would use them to control the physical structure of the data type. I think what I'm missing are the details.
I'm not asking for somebody to just convert the C data structures into C#. What I'd really like is some pointers to information that will help me figure out how to do it myself. I have another binary file in a slightly different format to read so some general knowledge around this topic would be really appreciated.
How do you convert a C data structure to a C# struct that will allow you to read the data type from a file?
Notes:
Specifically I'm trying to read the rstats and cstats files that are output by the Tomato router firmware. These files contain bandwidth usage data and IP traffic data.
The C code for the data structure is (from rstats.c):
#define MAX_COUNTER 2
#define MAX_NSPEED ((24 * SHOUR) / INTERVAL)
#define MAX_NDAILY 62
#define MAX_NMONTHLY 25
typedef struct {
    uint32_t xtime;
    uint64_t counter[MAX_COUNTER];
} data_t;

typedef struct {
    uint32_t id;
    data_t daily[MAX_NDAILY];
    int dailyp;
    data_t monthly[MAX_NMONTHLY];
    int monthlyp;
} history_t;

typedef struct {
    char ifname[12];
    long utime;
    unsigned long speed[MAX_NSPEED][MAX_COUNTER];
    unsigned long last[MAX_COUNTER];
    int tail;
    char sync;
} speed_t;
I think your first link https://stackoverflow.com/a/3863658/201021 is a good approach to follow. So I guess the next step would be constructing a C# struct to map the C struct. Here is the mapping between the different types from MSDN: http://msdn.microsoft.com/en-us/library/ac7ay120(v=vs.110).aspx
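As a starting point, here is a rough, unverified sketch of what the mapping for data_t and history_t might look like. Pack = 1 is an assumption; if the firmware build aligned uint64_t on 8-byte boundaries there would be padding after xtime, and the Pack value would need to change accordingly:

using System.Runtime.InteropServices;

// Assumed C# mirror of data_t; verify Pack against the real binary.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct DataT
{
    public uint Xtime;                                    // uint32_t xtime
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]  // MAX_COUNTER = 2
    public ulong[] Counter;                               // uint64_t counter[2]
}

// Assumed C# mirror of history_t.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct HistoryT
{
    public uint Id;                                        // uint32_t id
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 62)]  // MAX_NDAILY = 62
    public DataT[] Daily;
    public int DailyP;                                     // int dailyp
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 25)]  // MAX_NMONTHLY = 25
    public DataT[] Monthly;
    public int MonthlyP;                                   // int monthlyp
}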
Cheers!
I'm not an ANSI C programmer either but, at first glance at the source file, it appears to be saving data into a .gz file and then renaming it. The open function decompresses it with gzip. So, you might be looking at a compressed file at the top layer.
Once you know that you are dealing with the raw file, it looks like the best place to start is the load(int new) function. You need to figure out how to reverse engineer what's going on. If you get lost, you may have to learn how some of the native C function calls work.
The first interesting line is:
if (f_read("/var/lib/misc/rstats-stime", &save_utime, sizeof(save_utime)) != sizeof(save_utime)) {
    save_utime = 0;
}
Scanning the file, save_utime is declared as a long. In C on a 32-bit platform, that is a 32-bit number, so int is the C# equivalent. Given its name, it seems to be a timestamp. So the first step appears to be to read in a 4-byte int.
The next interesting piece is
speed_count = decomp(hgz, speed, sizeof(speed[0]), MAX_SPEED_IF);
In the save function it saves speed as an array of speed_t structs, in a sizeof() * count fashion. But it doesn't save the actual count. Since the load function passes MAX_SPEED_IF (which is defined as 10) into decomp, it makes sense to see what decomp does with it. Looking there, it tries to read( ... size * max) (i.e. size * MAX_SPEED_IF) and depends on the return value of the read library function to know how many speed_t structures were actually saved.
From there, it's just a matter of reading in the correct number of bytes for the number of speed_t structures written. Then it goes on to load the history data.
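In C#, that same trick might look roughly like this; the record size is a placeholder for the real sizeof(speed_t), and this assumes the gzip layer has already been decompressed:

using System;
using System.IO;

// Read up to MAX_SPEED_IF records and infer how many were actually
// stored from the number of bytes returned, mirroring what decomp
// does with read()'s return value.
const int MaxSpeedIf = 10;      // MAX_SPEED_IF in rstats.c
int recordSize = 1024;          // placeholder: compute the real sizeof(speed_t)

using (var fs = File.OpenRead("rstats-decompressed.bin"))  // hypothetical file name
{
    byte[] buffer = new byte[recordSize * MaxSpeedIf];
    int bytesRead = fs.Read(buffer, 0, buffer.Length);
    int speedCount = bytesRead / recordSize;               // records actually saved
    Console.WriteLine($"{speedCount} speed_t records present");
}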
This is the only approach I can think of to reverse engineer a binary file while referencing the source code and porting it to a different language all at the same time.
BTW, I'm only offering my help; I could be totally wrong. Like I said, I'm not an ANSI C guy. But I do hope that this helps get you going.
The short answer is that you probably cannot do this automatically, at least at runtime.
Knowing how most C programs are written, there's little chance of any metadata being in the file. Even if there is, you need to address that as "a program that reads data with metadata in this format". There are also all sorts of subtleties such as word length, packing, etc.
Just because the two languages have "C" in the name does not make them magically compatible, I am afraid. I fear you need to write a specific program for each file type and, as part of that, re-declare your structures in C#.
I am designing a text file format to be read in C#. I need to store the types int, double and string on a single line. I'm planning to use a .CSV format so the file can be manually opened and read. A particular record may have, say, 8 fields of known type, then a variable number of "indicator" combinations of either (string, int, double) or (string, int, double, double), and some lines may include no "indicators". Thus, each record may be of variable length.
In VB6 I would just read the data, split it into a variant array, determine the number of elements on that line, and use the VarType function to determine whether the final "indicator" variables are string, int, or double, parsing each field accordingly.
There may be a better way to design a text file and that may be the best solution. If so I'm interested in hearing ideas. I have searched but found no questions that specifically talk about reading variable length lines of text with mixed type into C#.
If a better format is not forthcoming, is there a way to duplicate the VB6 VarType function within C# as described two paragraphs above? I can handle the text file reading and line splitting easily in C#.
You could use either JSON or XML, as they are well supported in .NET and have automatic serialization capabilities.
First, I agree with Keith's suggestion to use XML or JSON. You are reinventing a wheel here. This page has an introductory example of how to serialize objects to a file and some links to more info.
If you need to stick with your own file format and custom serialization/deserialization however take a look at the Convert class, as well as the various TryParse methods which hang off of the intrinsic value types like int and double.
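For instance, a rough sketch of a VarType-style probe built on TryParse (field values and names are illustrative):

using System;
using System.Globalization;

// Mimic VB6's VarType: probe each CSV field from most specific to
// least specific type, falling back to string.
static object InterpretField(string field)
{
    if (int.TryParse(field, NumberStyles.Integer, CultureInfo.InvariantCulture, out int i))
        return i;       // parses as an int
    if (double.TryParse(field, NumberStyles.Float, CultureInfo.InvariantCulture, out double d))
        return d;       // parses as a double
    return field;       // anything else stays a string
}

// Usage: classify the trailing "indicator" fields of a record.
string line = "label,42,3.14";
foreach (string field in line.Split(','))
    Console.WriteLine($"{field} -> {InterpretField(field).GetType().Name}");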
Is there a way to avoid casting to a non-string type when reading data from a text file containing exclusively integer values separated by integer markers ('0000' for example)?
(Real-Life example : genetic DNA sequences, each DNA marker being a digit sequence.)
EDIT :
Sample data : 581684531650000651651561156843000021484865321200001987984948978465156115684300002148486532120000198798400009489786515611568430000214848653212000019879849480006516515611684531650000651651561156843000021 etc...
Unless I use a binary writer and read bytes rather than text (because that is how the data was written in the first place), I think this is a funky idea, so "NO" would be the straight answer.
I just wanted to get a definitive confirmation of that here, to be completely sure.
I welcome any intermediate solution for writing/reading this kind of data efficiently without having to code a custom reader GUI to display it intelligibly outside my app (in some generic reader/viewer).
The short answer is no, because a text file is a string of characters.
The long answer is sort of yes; if you put your data into a format like XML, a deserializer can implicitly cast the data back to the correct type (without you having to do it manually) based on your schema.
If you have control over the format, consider using a binary format for your file and use e.g. BinaryReader.ReadInt32.
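For example, a minimal sketch of that approach (file name and values are illustrative):

using System;
using System.IO;

// Write the integers in binary, then read them back already typed;
// no string parsing or casting involved.
using (var writer = new BinaryWriter(File.Create("sequence.bin")))
{
    writer.Write(581684531);   // a value
    writer.Write(0);           // the marker, stored as a single Int32
    writer.Write(650006516);   // the next value
}

using (var reader = new BinaryReader(File.OpenRead("sequence.bin")))
{
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        int value = reader.ReadInt32();   // comes back as an int directly
        Console.WriteLine(value);
    }
}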
Rather than just casting, you really should use the TryParse(...) method(s) of the types you are trying to read. This is a much more type-safe solution.
And to answer your question: other than using a binary file, there is not (to my knowledge) a way to do this without casting (or using the TryParse methods).
The only way to control the entire read process is to read bytes. Otherwise you read strings.
Edit: I didn't talk about automatic serialization via XML because of the details you gave about the file format.
If the data is text and you need to access it as an integer, a conversion will be required. The only question is which code does the conversion.
Depending upon the file format, you could look for classes or libraries that already handle them. Otherwise, keep your code well organized so you don't have to pay attention to the conversion too much.
Some options:
// Could throw exceptions
var x = Convert.ToInt32(text);
var x = Int32.Parse(text);
// Won't throw an exception, just check the results
int x = 0;
if (Int32.TryParse(text, out x)) { ... }
Today I'm cutting video at work (yay me!), and I came across a strange video format: an MOD file with a companion MOI file.
I found this article online from the wiki, and I wanted to write a file format handler, but I'm not sure how to begin.
I want to write a file format handler to read the information files, has anyone ever done this and how would I begin?
Edit:
Thanks for all the suggestions, I'm going to attempt this tonight, and I'll let you know. The MOI files are not very large, maybe 5KB in size at most (I don't have them in front of me).
You're in luck in that the MOI format at least spells out the file definition. All you need to do is read in the file and interpret the results based on the file definition.
Following the definition, you should be able to create a class that could read and interpret a file which returns all of the file format definitions as properties in their respective types.
Reading the file requires opening it and generally working through it in a byte-by-byte progression, such as:
using (FileStream fs = File.OpenRead(pathToYourFile))
{
    while (true)
    {
        int b = fs.ReadByte();
        if (b == -1)
        {
            break;   // end of file
        }
        // Interpret byte or bytes here....
    }
}
Per the wiki article's referenced PDF, it looks like someone already reverse engineered the format. From the PDF, here's the first entry in the format:
Hex-Address: 0x00
Data Type: 2 Byte ASCII
Value (Hex): "V6"
Meaning: Version
So, a simplistic implementation could pull the first 2 bytes of data from the file stream and convert to ASCII, which would provide a property value for the Version.
Next entry in the format definition:
Hex-Address: 0x02
Data Type: 4 Byte Unsigned Integer
Value (Hex):
Meaning: Total size of MOI-file
Interpreting the next 4 bytes and converting to an unsigned int would provide a property value for the MOI file size.
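Putting those first two entries together, a minimal sketch might look like this; the big-endian byte order is an assumption to verify against the PDF:

using System;
using System.IO;
using System.Text;

// Read the first two MOI header fields described above.
using (var reader = new BinaryReader(File.OpenRead("clip.moi")))  // hypothetical file name
{
    // 0x00: 2-byte ASCII version string, e.g. "V6"
    string version = Encoding.ASCII.GetString(reader.ReadBytes(2));

    // 0x02: 4-byte unsigned integer, total size of the MOI file.
    byte[] sizeBytes = reader.ReadBytes(4);
    if (BitConverter.IsLittleEndian)
        Array.Reverse(sizeBytes);   // assuming the file stores big-endian values
    uint fileSize = BitConverter.ToUInt32(sizeBytes, 0);

    Console.WriteLine($"Version: {version}, file size: {fileSize} bytes");
}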
Hope this helps.
If the files are very large and just need to be streamed in, I would create a new reader object that uses an UnmanagedMemoryStream to read the information in.
I've done a lot of different file format processing like this. More recently, I've taken to making my readers more functional, where reading tends to use 'yield return' to return read-only objects from the file.
However, it all depends on what you want to do. If you are trying to create a general purpose format for use in other applications or create an API, you probably want to conform to an existing standard. If however you just want to get data into your own application, you are free to do it however you want. You could use a binaryreader on the stream and construct the information you need within your app, or get the reader to return objects representing the contents of the file.
The one thing I would recommend: make sure it implements IDisposable, and wrap it in a using!