C#: Location of const variable in a binary


Is it possible to know the location of const variables within an exe? We were thinking of watermarking our program so that each user that downloads the program from our server will have some unique key embedded in the code.
Is there another way to do this?

You could build a binary with a watermark that is a string representation of a GUID in a .net type as a constant. After you build, perform a search for the GUID string in the binary file to check its location. You can change this GUID value to another GUID value and then run the binary and actually see the changed value in code output.
Note: The formatting matters because you are modifying a compiled binary, so the replacement must have exactly the same length as the original. For example, you'll want to keep the leading zeros of a GUID so that all instances have the same character length when converted to a string.
I have actually done this sort of thing with Win32 DLLs and even the SQL Server 2000 Desktop exe. (There was a hack where you could switch the desktop edition into a full-blown SQL Server by flipping a switch in the binary.)
This process could then be automated and a new copy of a DLL would be programmatically altered by a small, server-side utility for each client download.
Also take a look at this: link
It discusses the use of storing settings in a .Net DLL and uses a class-based approach and embeds the app settings file and is configurable after compilation.
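The search-and-replace step described above could be sketched in C# roughly as follows. This is a minimal illustration, not a hardened tool: WatermarkPatcher and its method names are hypothetical, and it assumes the placeholder is a string literal, which .NET assemblies store as UTF-16, so the search uses Encoding.Unicode rather than ASCII.

```csharp
using System;
using System.Text;

// Hypothetical helper: patches a same-length watermark into an assembly image.
public static class WatermarkPatcher
{
    // Replaces every UTF-16LE occurrence of placeholder with replacement,
    // in place, and returns how many occurrences were patched.
    public static int Patch(byte[] image, string placeholder, string replacement)
    {
        if (placeholder.Length != replacement.Length)
            throw new ArgumentException("Replacement must be as long as the placeholder.");

        byte[] find = Encoding.Unicode.GetBytes(placeholder);
        byte[] put = Encoding.Unicode.GetBytes(replacement);
        int count = 0;

        for (int i = 0; i + find.Length <= image.Length; i++)
        {
            if (!MatchesAt(image, i, find))
                continue;
            put.CopyTo(image, i);
            i += find.Length - 1; // skip past the bytes just written
            count++;
        }
        return count;
    }

    private static bool MatchesAt(byte[] image, int offset, byte[] pattern)
    {
        for (int j = 0; j < pattern.Length; j++)
        {
            if (image[offset + j] != pattern[j])
                return false;
        }
        return true;
    }
}
```

A server-side utility would typically load the file with File.ReadAllBytes, call Patch with a value such as Guid.NewGuid().ToString("N") (always 32 characters), and save the result with File.WriteAllBytes. A const string can appear more than once in the image, which is why every occurrence is patched.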

Key consideration #1: Assembly signing
Since you are distributing your application, clearly you are signing it. As such, since you're modifying the binary contents, you'll have to integrate the signing process directly in the downloading process.
Key consideration #2: const or readonly
There is a key difference between const and readonly variables that many people do not know about. In particular, if I do the following:
private static readonly int SomeValue = 3;
...
if (SomeValue > 0)
...
Then it will compile to byte code like the following:
ldsfld [SomeValue]
ldc.i4.0
ble.s
If you make the following:
private const int SomeValue = 3;
...
if (SomeValue > 0)
...
Then it will compile to byte code like the following:
{contents of if block here}
const variables are [allowed to be] substituted and evaluated by the compiler instead of at run time, where readonly variables are always evaluated at run time. This makes a big difference when you expose fields to other assemblies, as a change to a const variable is a breaking change that forces a recompile of all dependent assemblies.
My recommendation
I see two reasonably easy options for watermarking, though I'm not an expert in the area so don't know how "good" they are overall.
Watermark the embedded splash screen or About box logo image.
Watermark the symmetric key for loading your string resources. Keep a cache so you only have to decode them once and it won't be a performance problem. This is a variation on a commonly used obfuscation technique. The strings are stored in the binary as UTF-8 encoded strings, and can be replaced in-line as long as the new string's null-terminated length is less than or equal to the length of the string currently found in the binary.
Finally, Google reported the following article on watermarking software that you might want to take a look at.

In C++ (for example):
#define GUID_TO_REPLACE "CC7839EB7EC047B290D686C65F98E0F4"
printf(GUID_TO_REPLACE);
in PHP:
<?php
exec("sed -e 's/CC7839EB7EC047B290D686C65F98E0F4/replacedreplacedreplacedreplaced/g' TestApp.exe > TestAppTagged.exe");
?>
If you stick your compiled binary on the server, visit the php script, download the tagged exe, and run it...you'll see that it now prints the "replaced" string rather than the GUID :)
Note that the length of the replaced string must be identical to the original (32 in this case), so you'll need to pad the length if you want to tag it with something shorter.

I'm not sure what you mean by "location" of a const value. You can certainly use items like reflection to access a const field on a particular type. Const fields bind like any other non-instance field of the same accessibility. I don't know if that fits your definition of location though.
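As a rough illustration of the reflection route, the sketch below reads a const field and distinguishes it from a readonly one. BuildInfo and ConstInspector are hypothetical names made up for this example; FieldInfo.IsLiteral and FieldInfo.GetRawConstantValue are the standard reflection members for compile-time constants.

```csharp
using System;
using System.Reflection;

// Hypothetical type holding one const and one readonly field.
public static class BuildInfo
{
    public const string Watermark = "CC7839EB7EC047B290D686C65F98E0F4";
    public static readonly string Channel = "retail";
}

public static class ConstInspector
{
    // Returns the compile-time value of a const field, or null when the
    // field does not exist or is not a const (for example, readonly).
    public static object ReadConst(Type type, string fieldName)
    {
        FieldInfo field = type.GetField(
            fieldName,
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static);

        if (field == null || !field.IsLiteral)
            return null;

        return field.GetRawConstantValue();
    }
}
```

Calling ConstInspector.ReadConst(typeof(BuildInfo), "Watermark") returns the GUID string, while asking for "Channel" returns null because a readonly field is not a literal.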

Related

How do you convert a C data structure to a C# struct that will allow you to read the data type from a file?

I have a binary file that is created by an open source application that is written in C. Since it is open source I can see how the data is structured when it is written to the file. The problem is that I don't know C, but I can at least mostly tell what is going on when the structs are being declared. But from what I've seen in other posts it isn't as simple as creating a struct in C# with the same data types as the ones in C.
I found this post https://stackoverflow.com/a/3863658/201021 which has a class for translating structs but (as far as I can tell) you need to declare the struct properly in C# for it to work.
I've read about the MarshalAs attribute and the StructLayout attribute. I mostly get how you would use them to control the physical structure of the data type. I think what I'm missing are the details.
I'm not asking for somebody to just convert the C data structures into C#. What I'd really like is some pointers to information that will help me figure out how to do it myself. I have another binary file in a slightly different format to read so some general knowledge around this topic would be really appreciated.
How do you convert a C data structure to a C# struct that will allow you to read the data type from a file?
Notes:
Specifically I'm trying to read the rstats and cstats files that are output by the Tomato router firmware. This file contains bandwidth usage data and ip traffic data.
The C code for the data structure is (from rstats.c):
#define MAX_COUNTER 2
#define MAX_NSPEED ((24 * SHOUR) / INTERVAL)
#define MAX_NDAILY 62
#define MAX_NMONTHLY 25
typedef struct {
uint32_t xtime;
uint64_t counter[MAX_COUNTER];
} data_t;
typedef struct {
uint32_t id;
data_t daily[MAX_NDAILY];
int dailyp;
data_t monthly[MAX_NMONTHLY];
int monthlyp;
} history_t;
typedef struct {
char ifname[12];
long utime;
unsigned long speed[MAX_NSPEED][MAX_COUNTER];
unsigned long last[MAX_COUNTER];
int tail;
char sync;
} speed_t;

I think your first link https://stackoverflow.com/a/3863658/201021 is a good way to follow. So I guess the next thing would be constructing a C# struct to map C struct. Here is the map for different types from MSDN http://msdn.microsoft.com/en-us/library/ac7ay120(v=vs.110).aspx
Cheers!

I'm not an ANSI C programmer either but, at first glance at the source file, it appears to be saving data into a .gz file and then renaming it. The open function decompresses it with gzip. So, you might be looking at a compressed file at the top layer.
Once you know that you are dealing with the raw file, it looks like the best place to start is the load(int new) function. You need to figure out how to reverse engineer what's going on. If you get lost, you may have to learn how some of the native C function calls work.
The first interesting line is:
if (f_read("/var/lib/misc/rstats-stime", &save_utime, sizeof(save_utime)) != sizeof(save_utime)) {
save_utime = 0;
}
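Scanning the file, save_utime is declared as a long. In C the size of a long depends on the platform; on a 32-bit target, which is what this router firmware typically runs on, it is a 32-bit number, so int is the C# equivalent. Given its name, it seems to be a time-stamp. So, the first step appears to be to read in a 4-byte int.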
The next interesting piece is
speed_count = decomp(hgz, speed, sizeof(speed[0]), MAX_SPEED_IF);
In the save function it saves speed as an array of speed_t structs with sizeof() * count type behavior. But, it doesn't save the actual count. Since it passes MAX_SPEED_IF (which is defined as = 10) into decomp from the load function, it makes sense to see what it's doing in decomp. In looking, it seems that it tries to read( ... size * max) (a.k.a. size * MAX_SPEED_IF) and depends on the return value from the read library function to know how many speed_t structures were actually saved.
From there, it's just a matter of reading in the correct number of bytes for the number of speed_t structures written. Then, it goes on to load the history data.
This is the only approach I can think to reverse engineer a binary file while referencing the source code and porting it to a different language all at the same time.
BTW. I'm only offering my help. I could be totally wrong. Like I said, I'm not an ansi c guy. But, I do hope that this helps get you going.
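If the file really is gzip-compressed at the top layer, as suggested above, the first step on the C# side might look like this sketch. GzipReader is a hypothetical helper name; GZipStream comes from System.IO.Compression.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper: inflates a gzip stream into a byte array so the
// raw structs can be parsed afterwards.
public static class GzipReader
{
    public static byte[] Decompress(Stream compressed)
    {
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
        using (var raw = new MemoryStream())
        {
            gzip.CopyTo(raw);
            return raw.ToArray();
        }
    }
}
```

The resulting array can then be wrapped in a MemoryStream and read field by field.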

The short answer is that you probably cannot do this automatically, at least at runtime.
Knowing how many C programs are written, there's little chance of any meta-data being in the file. Even if there is, you need to address that as "a program that reads data with meta-data in this format". There are also all sorts of subtleties such as word length, packing etc.
Just because the two languages have "C" in the name does not make them magically compatible I am afraid. I fear you need to write a specific program for each file type and as part of that, re-declare your structures in C#
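Re-declaring a structure by hand could look like the following sketch for data_t from the question. It rests on two assumptions that would need checking against a real file: the data is little-endian (which is what BinaryReader always reads), and the C compiler inserted 4 padding bytes after xtime so that the uint64_t array starts on an 8-byte boundary. DataT and RstatsReader are hypothetical names.

```csharp
using System.IO;

// C# mirror of the C struct data_t (MAX_COUNTER is 2 in rstats.c).
public struct DataT
{
    public uint XTime;      // uint32_t xtime
    public ulong Counter0;  // uint64_t counter[0]
    public ulong Counter1;  // uint64_t counter[1]
}

public static class RstatsReader
{
    // paddingAfterXTime is an assumption about the C compiler's alignment;
    // pass 0 if the struct turns out to be packed.
    public static DataT ReadDataT(BinaryReader reader, int paddingAfterXTime = 4)
    {
        var data = new DataT();
        data.XTime = reader.ReadUInt32();
        reader.ReadBytes(paddingAfterXTime); // skip alignment padding
        data.Counter0 = reader.ReadUInt64();
        data.Counter1 = reader.ReadUInt64();
        return data;
    }
}
```

Reading field by field keeps the padding and word-size decisions explicit, which makes it easier to adjust once the real layout is confirmed.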

monitor html change using hash func

I want to write an application that gets a list of urls.
For each of them I need to monitor periodically if the content has changed.
I thought :
to use HtmlAgilityPack to fetch html content (any other recommendation?)
I don't need to spot the change itself,
so I thought to hash the content, save it in the DB
and re-compare the hash in the future.
How would you suggest hashing? .net's GetHashCode() ?
I saw this documentation http://support.microsoft.com/kb/307020
which advises using
tmpSource = ASCIIEncoding.ASCII.GetBytes(sSourceData);
why?

You should absolutely not use GetHashCode() for this. The documentation explicitly states:
Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework.
The results of GetHashCode can change between runs - all that's guaranteed is that calling it on two equal objects in the same process (possibly AppDomain) will give the same hash code. Indeed, String.GetHashCode's algorithm has changed over time, and in .NET 4 the 32-bit implementation is different to the 64-bit implementation.
If you want to use hashing, use MD5, SHA1 etc - something with a specified algorithm which will not change. (Note that these operate on binary data rather than string data, which is probably more appropriate too - you don't need to bother decoding the data as text.)
It's not clear to me whether refetching periodically is really the best idea though - do these servers not support last modified times, etags etc?
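A minimal sketch of hashing the raw response bytes with a fixed algorithm might look like this. SHA-256 is used here as one example of a specified algorithm, and ContentFingerprint is a hypothetical helper name.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: fingerprints downloaded content so that only the
// hash has to be stored and compared.
public static class ContentFingerprint
{
    // Returns the SHA-256 hash of the bytes as a 64-character hex string.
    public static string Sha256Hex(byte[] content)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(content)).Replace("-", "");
        }
    }

    // True when the stored hash no longer matches the latest download.
    public static bool HasChanged(string storedHash, byte[] latestContent)
    {
        return !string.Equals(storedHash, Sha256Hex(latestContent),
                              StringComparison.OrdinalIgnoreCase);
    }
}
```

The bytes would come from something like WebClient.DownloadData, so no text decoding is involved.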

As you have asked for suggestions. I would have used this method instead
WebClient client = new WebClient();
String htmlCode = client.DownloadString("http://google.com");
And I would have saved this string in my DB. After the particular interval I could have compared them again.
But yes, I do agree the string size would be really large.
If I just want to get an alert on the fact that the content has changed somehow, I would use MD5, as an MD5 hash is only 16 bytes (32 characters as a hex string, or 24 in Base64).
Hence it is easier to compare and store in the DB.

Generating a non-guid unique key outside of a database

I have a situation where I need to create some kind of uniqueness between 'entities', but it is not a GUID, and it is not saved in a database (It is saved, however. Just not by a database).
The basic use of the key is a mere redundancy check. It does not have to be as scalable as a real 'primary key', but in the simplest terms I can think of , this is how it works.
[receiver] has List<possibilities>.
possibilities exist independently, but many will have the same values (impossible to predict. This is by design)
Frequently, the receiver's list of possibilities will have to be emptied and then refilled (this is a business requirement).
The key is basically used to add a very lightweight redundancy check. In other words, sometimes the same possibility will be repeated, sometimes it should only appear once in the receiver's list.
I basically want to use something very light and simple. A string is sufficient. I was just wanting to figure out a modest algorithm to accomplish this. I thought about using the GetHashCode() method, but I am not certain about how reliable that is. Can I get some thoughts?

If GetHashCode() looks usable at first glance, you can probably use an MD5 hash as well, with a lower collision probability. The resulting MD5 can be stored as a 24-character string by encoding it in Base64. See this example:
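using System;
using System.Security.Cryptography;
using System.Text;

public static class MD5Gen
{
    // Returns the MD5 hash of the input as a 24-character Base64 string.
    public static string Encode(string toEncode)
    {
        // Create a hasher per call: a shared MD5 instance is not thread-safe.
        using (MD5 hash = MD5.Create())
        {
            return Convert.ToBase64String(
                hash.ComputeHash(Encoding.UTF8.GetBytes(toEncode)));
        }
    }
}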
With this you turn a source string into an MD5 hash that is itself a string. You just have to express the "possibility" class in terms of a string.
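Once each possibility has a string key, the redundancy check itself could be as small as the sketch below. PossibilityList is a hypothetical class, and the key is assumed to be whatever string identifies a possibility (for example a hash of its values); HashSet<string>.Add returns false when the key is already present.

```csharp
using System.Collections.Generic;

// Hypothetical receiver-side list that rejects repeated keys.
public sealed class PossibilityList
{
    private readonly HashSet<string> seenKeys = new HashSet<string>();
    private readonly List<string> items = new List<string>();

    public IReadOnlyList<string> Items
    {
        get { return items; }
    }

    // Adds the item unless an item with the same key is already present.
    public bool TryAdd(string key, string item)
    {
        if (!seenKeys.Add(key))
            return false;
        items.Add(item);
        return true;
    }

    // Empties the list so it can be refilled.
    public void Clear()
    {
        seenKeys.Clear();
        items.Clear();
    }
}
```

For possibilities that are allowed to repeat, the caller could simply give each one a distinct key.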

Try this for generating Guid.
VBScript Function to Generate a UUID/GUID
If you are on Windows, you can use the simple VBScript below to generate a UUID. Just save the code to a file called createguid.vbs, and then run cscript createguid.vbs at a command prompt.
Set TypeLib = CreateObject("Scriptlet.TypeLib")
NewGUID = TypeLib.Guid
WScript.Echo(left(NewGUID, len(NewGUID)-2))
Set TypeLib = Nothing
Create a UUID/GUID via the Windows Command Line
If you have the Microsoft SDK installed on your system, you can use the utility uuidgen.exe, which is located in the "C:\Program Files\Microsoft SDK\Bin" directory
or try the same for more info.
Link
I would say go for the Windows command line as it is more reliable.

Application-specific data and how to handle it?

I am curious as to how applications generate their own data that is used with the application itself. For example, if you take any kind of PC game's save file or some sort of program that generates binary data like Photoshop's PSD files or .torrent files for BitTorrent applications, I'd assume they are all specific to the corresponding application and that the authors of that application programmed the way this data was created. My first question is: is that true? I am 99% positive that it is binary data because when opening a PSD file or a .torrent file in Notepad++, it's easy to see that it's nothing that can be read by a human...
My second question is: if I wanted to make an application that generates its own data in binary format (no plain-text or anything that's easily manipulated), how would I go about handling this data? I can vaguely picture generating this data and saving it to a file in binary format, but I am really stuck on how I'd handle this data when it's needed by the application again. Since this type of data is not plain text and can't be treated as a string or anything like that, how is it that applications create and handle/parse their own binary data (or any binary data in general)?
I can obviously see that when you open a PSD file, Photoshop opens and it displays whatever the PSD file contained. But how do many applications handle these formats? I am just not seeing how to parse this specific data (or binary data in general) and programmatically do what you want to with it.

Well, as a simple example, let's take bitmaps.
Bitmaps have a standard file structure, which is defined by the info header and file header.
On the wikipedia article (link: http://en.wikipedia.org/wiki/BMP_file_format) you'll see that the info header has a well defined format, as well as the file header.
Each of these is written as binary as is, and is read in as binary as is. Then, the actual bitmap image is written out as binary.
In other applications, the application may choose to do a custom plain text format, in which case it must be written to in a consistent manner or have some support for versioning so you can use newer features in the file.
Look up on serialization though, it's a rather broad topic and there are lots of approaches to this.
Edit: Here is a code sample (not optimal) for reading (or writing, with the right modifications) in bitmaps:
#include <fstream>
#include <string>

// Tell Visual Studio to align on a 2-byte boundary.
// Necessary so the file header occupies 14 bytes in the file and not 16.
#pragma pack(2)
struct BMFH // bitmap file header
{
    short bfType;
    long bfSize;
    short bfReserved0;
    short bfReserved1;
    long bfOffBits;
};
#pragma pack(8)
struct BMIH // bitmap info header
{
    long biSize;
    long biWidth;
    long biHeight;
    short biPlanes;
    short biBitCount;
    long biCompression;
    long biImageSize;
    long biXPelsPerMeter;
    long biYPelsPerMeter;
    long biClrUsed;
    long biClrImportant;
};
BMFH fileheader;
BMIH infoheader;
// filename is assumed to be a std::string holding the path of the bitmap
std::fstream file(filename.c_str(), std::ios::in | std::ios::binary);
// Read in the file header, then the info header (their order in the file)
file.read((char *) &fileheader, sizeof(fileheader));
file.read((char *) &infoheader, sizeof(infoheader));
// Calculate size of image
int size = infoheader.biHeight * infoheader.biWidth;
int bytes = size * infoheader.biBitCount / 8;
// Read in the image to a buffer
unsigned char *data = new unsigned char[bytes];
file.read((char *) data, bytes);
file.close();
That code is actually a drastic simplification and completely ignores all sorts of issues, such as what happens if the file headers or data are corrupt, if the file is incomplete, etc. But it's just meant as a proof of concept. The #pragmas are Visual Studio specific and enforce proper alignment of the headers; the code also assumes that long is 32 bits, as it is with that compiler.
When we write this out to a file, we don't say "Okay, now write out this integer" field by field. Instead, we write each structure out in its binary form. For example, code that you might (but shouldn't) use to write it would look like:
// Assume for argument's sake these data structures came pre-filled
BMFH fileheader;
BMIH infoheader;
unsigned char *data;
int size = infoheader.biHeight * infoheader.biWidth;
int bytes = size * infoheader.biBitCount / 8;
std::fstream file("MyImage.bitmap", std::ios::out | std::ios::binary);
file.write((char *) &fileheader, sizeof(BMFH));
file.write((char *) &infoheader, sizeof(BMIH));
file.write((char *) data, sizeof(unsigned char) * bytes);

Read up on Binary Serialization on MSDN. The .Net Framework goes a long way to helping with this.

Yes, many applications use some sort of application-specific binary format that cannot be easily manipulated. To create your own binary format, there are some options:
Binary Serialization Technique
Using IO classes to manually read and write bytes and actually creating a random access file.
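The second option, reading and writing the bytes manually, might look like this sketch in C#. The file format here is invented purely for illustration (a 4-byte magic number followed by three fields), and SaveGame and SaveGameFile are hypothetical names; BinaryWriter and BinaryReader are the standard System.IO classes.

```csharp
using System.IO;
using System.Text;

// Hypothetical application data.
public sealed class SaveGame
{
    public string PlayerName;
    public int Level;
    public double Health;
}

public static class SaveGameFile
{
    // The bytes 'S','A','V','1' read as a little-endian 32-bit integer.
    private const uint Magic = 0x31564153;

    public static void Write(Stream output, SaveGame save)
    {
        using (var writer = new BinaryWriter(output, Encoding.UTF8, true))
        {
            writer.Write(Magic);
            writer.Write(save.PlayerName); // length-prefixed UTF-8
            writer.Write(save.Level);
            writer.Write(save.Health);
        }
    }

    public static SaveGame Read(Stream input)
    {
        using (var reader = new BinaryReader(input, Encoding.UTF8, true))
        {
            if (reader.ReadUInt32() != Magic)
                throw new InvalidDataException("Not a save file.");

            // Fields must be read back in exactly the order they were written.
            return new SaveGame
            {
                PlayerName = reader.ReadString(),
                Level = reader.ReadInt32(),
                Health = reader.ReadDouble()
            };
        }
    }
}
```

The application "parses" its own format simply by reading the fields back in the same order and with the same types it used when writing them. The magic number lets it reject files that were not produced by the matching writer.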

Should we store format strings in resources?

For the project that I'm currently on, I have to deliver specially formatted strings to a 3rd party service for processing. And so I'm building up the strings like so:
string someString = string.Format("{0}{1}{2}: Some message. Some percentage: {3}%", token1, token2, token3, number);
Rather than hardcode the string, I was thinking of moving it into the project resources:
string someString = string.Format(Properties.Resources.SomeString, token1, token2, token3, number);
The second option is in my opinion, not as readable as the first one i.e. the person reading the code would have to pull up the string resources to work out what the final result should look like.
How do I get around this? Is the hardcoded format string a necessary evil in this case?

I do think this is a necessary evil, one I've used frequently. Something smelly that I do, is:
// "{0}{1}{2}: Some message. Some percentage: {3}%"
string someString = string.Format(Properties.Resources.SomeString
,token1, token2, token3, number);
..at least until the code is stable enough that I might be embarrassed having that seen by others.

There are several reasons that you would want to do this, but the only great reason is if you are going to localize your application into another language.
If you are using resource strings there are a couple of things to keep in mind.
Include format strings whenever possible in the set of resource strings you want localized. This will allow the translator to reorder the position of the formatted items to make them fit better in the context of the translated text.
Avoid having strings in your format tokens that are in your language. It is better to use these for numbers. For instance, the message:
"The value you specified must be between {0} and {1}"
is great if {0} and {1} are numbers like 5 and 10. If you are formatting in strings like "five" and "ten" this is going to make localization difficult.
You can get around the readability problem you are talking about by simply naming your resources well.
string someString = string.Format(Properties.Resources.IntegerRangeError, minValue, maxValue );
Evaluate whether you are generating user-visible strings at the right abstraction level in your code. In general I tend to group all the user-visible strings in the code as close to the user interface as possible. If some low-level file I/O code needs to report errors, it should do so with exceptions, which you handle in your application and map to consistent error messages. This also consolidates all of the strings that require localization instead of having them peppered throughout your code.
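The point about letting a translator reorder the formatted items can be illustrated with a small sketch. RangeMessages is a hypothetical class, and the Reordered template is an invented stand-in for a translated resource string; in a real project both templates would come from resource files.

```csharp
using System.Globalization;

public static class RangeMessages
{
    // Stand-ins for resource strings (hypothetical values).
    public const string English =
        "The value you specified must be between {0} and {1}";
    public const string Reordered =
        "Upper limit {1}, lower limit {0}: pick a value in between";

    // The calling code passes the arguments in one fixed order; the
    // template alone decides where each one appears.
    public static string Format(string template, int min, int max)
    {
        return string.Format(CultureInfo.InvariantCulture, template, min, max);
    }
}
```

With min 5 and max 10, the first template yields "The value you specified must be between 5 and 10" and the second yields "Upper limit 10, lower limit 5: pick a value in between", without any change to the calling code.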

One thing you can do to help add hard coded strings or even speed up adding strings to a resource file is to use CodeRush Xpress which you can download for free here: http://www.devexpress.com/Products/Visual_Studio_Add-in/CodeRushX/
Once you write your string you can access the CodeRush menu and extract to a resource file in a single step. Very nice.
Resharper has similar functionality.

I don't see why including the format string in the program is a bad thing. Unlike traditional undocumented magic numbers, it is quite obvious what it does at first glance. Of course, if you are using the format string in multiple places it should definitely be stored in an appropriate read-only variable to avoid redundancy.
I agree that keeping it in the resources is unnecessary indirection here. A possible exception would be if your program needs to be localized, and you are localizing through resource files.

Yes, you can.
Now let's see how:
String.Format(Resource_en.PhoneNumberForEmployeeAlreadyExist,letterForm.EmployeeName[i])
This gives me a dynamic message every time.
By the way, I'm using ResXManager.
