Recently our site has been deluged by a resurgence of the Asprox botnet SQL injection attack. Without going into details, the attack attempts to execute SQL code by encoding the T-SQL commands in an ASCII-encoded binary string. It looks something like this:
DECLARE%20@S%20NVARCHAR(4000);SET%20@S=CAST(0x44004500...06F007200%20AS%20NVARCHAR(4000));EXEC(@S);--
I was able to decode this in SQL, but I was a little wary of doing this since I didn't know exactly what was happening at the time.
I tried to write a simple decode tool, so I could decode this type of text without even touching SQL Server. The main part I need to be decoded is:
CAST(0x44004500...06F007200 AS NVARCHAR(4000))
I've tried all of the following commands with no luck:
txtDecodedText.Text = System.Web.HttpUtility.UrlDecode(txtURLText.Text);
txtDecodedText.Text = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(txtURLText.Text));
txtDecodedText.Text = Encoding.Unicode.GetString(Encoding.Unicode.GetBytes(txtURLText.Text));
txtDecodedText.Text = Encoding.ASCII.GetString(Encoding.Unicode.GetBytes(txtURLText.Text));
txtDecodedText.Text = Encoding.Unicode.GetString(Convert.FromBase64String(txtURLText.Text));
What is the proper way to translate this encoding without using SQL Server? Is it possible? I'll take VB.NET code since I'm familiar with that too.
Okay, I'm sure I'm missing something here, so here's where I'm at.
Since my input is a basic string, I started with just a snippet of the encoded portion - 4445434C41 (which translates to DECLA) - and the first attempt was to do this...
txtDecodedText.Text = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(txtURL.Text));
...and all it did was return the exact same thing that I put in, since it just converted each character to a byte and back.
I realized that I need to parse every two characters into a byte manually since I don't know of any methods yet that will do that, so now my little decoder looks something like this:
while (!boolIsDone)
{
    bytURLChar = byte.Parse(txtURLText.Text.Substring(intParseIndex, 2));
    bytURL[intURLIndex] = bytURLChar;
    intParseIndex += 2;
    intURLIndex++;

    if (txtURLText.Text.Length - intParseIndex < 2)
    {
        boolIsDone = true;
    }
}

txtDecodedText.Text = Encoding.UTF8.GetString(bytURL);
Things look good for the first couple of pairs, but then the loop balks when it gets to the "4C" pair and says that the string is in the incorrect format.
Interestingly enough, when I step through the debugger and to the GetString method on the byte array that I was able to parse up to that point, I get ",-+" as the result.
How do I figure out what I'm missing - do I need to do a "direct cast" for each byte instead of attempting to parse it?
I went back to Michael's post, did some more poking and realized that I did need to do a double conversion, and eventually worked out this little nugget:
Convert.ToString(Convert.ToChar(Int32.Parse(EncodedString.Substring(intParseIndex, 2), System.Globalization.NumberStyles.HexNumber)));
From there I simply made a loop to go through all the characters 2 by 2 and get them "hexified" and then translated to a string.
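A minimal sketch of that loop (just for illustration; EncodedString here is assumed to contain only the hex digits, without the 0x prefix):

var decoded = new System.Text.StringBuilder();
for (int intParseIndex = 0; intParseIndex + 1 < EncodedString.Length; intParseIndex += 2)
{
    // Parse each pair of hex digits and append the corresponding character.
    int value = Int32.Parse(EncodedString.Substring(intParseIndex, 2),
                            System.Globalization.NumberStyles.HexNumber);
    decoded.Append(Convert.ToChar(value));
}
string result = decoded.ToString();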
To Nick, and anybody else interested, I went ahead and posted my little application over in CodePlex. Feel free to use/modify as you need.
Try removing the 0x first, converting each pair of hex digits to a byte, and then calling Encoding.Unicode.GetString on the result. I think that may work.
Essentially: 0x44004500
Remove the 0x; after that, every two bytes represent one character:
44 00 = D
45 00 = E
6F 00 = o
72 00 = r
So it's definitely a Unicode format with two bytes per character, i.e. UTF-16LE (Encoding.Unicode in .NET).
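Putting those two steps together, a minimal decoder along those lines might look like this (a sketch; hexPayload is assumed to be just the hex digits after 0x, with no spaces):

static string DecodeHexPayload(string hexPayload)
{
    // Turn each pair of hex digits into one byte.
    byte[] bytes = new byte[hexPayload.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
    {
        bytes[i] = Convert.ToByte(hexPayload.Substring(i * 2, 2), 16);
    }

    // Two bytes per character, low byte first: little-endian UTF-16.
    return System.Text.Encoding.Unicode.GetString(bytes);
}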
Related
I'm attempting to read a file and process it in both C# and IronPython, but I'm running into a slight problem.
When I read the file in either language, I get a byte array that's almost identical, but not quite.
For instance, the array has 1552 bytes. They're all the same except for one thing. Any time the value "10" appears in the Python implementation, the value "13" appears in the C# implementation. Aside from that, all other bytes are the same.
Here's roughly what I'm doing to get the bytes:
Python:
f = open('C:\myfile.blah')
contents = f.read()
bytes = bytearray(contents, 'cp1252')
C#:
var bytes = File.ReadAllBytes(@"C:\myfile.blah");
Perhaps I'm choosing the wrong encoding? Though I wouldn't think so, since the Python implementation behaves as I would expect and processes the file successfully.
Any idea what's going on here?
(I don't know Python.) But it looks like you need to pass the 'rb' flag:
open('C:\myfile.blah', 'rb')
Reference:
On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written.
Note that the values 10 and 13 give clues as to what the problem is:
Line feed is 10 in decimal and Carriage return is 13 in decimal.
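For reference, a quick way to confirm those two values in C# (just a sketch):

// File.ReadAllBytes never translates line endings, but Python's text mode can.
Console.WriteLine((int)'\r');  // 13, carriage return
Console.WriteLine((int)'\n');  // 10, line feed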
I need help decoding this received response.
at
OK
+CUSD: 0,"ar#?$ #9#d? ?# ???(d??)##1pD?"?T?Hc#
?& ?#D??? ?#??5 41 IA ?R",17
OK
+CUSD: 0,"ar?hb? ?' 10?# ? ?hb#?J##?#?? #f#??#?#S#d$#",17
I tried this when the dcs value was 72 on another network provider, but I don't understand this value of 17. How do I decode it?
Here are the full results:
AT+CSCS="UCS2"
OK
at+cusd=1,"002a003100350030002a0032002a00330032003300390031002a00360039003100370037002a00310023",15
+CUSD: 0,"00610072003f00680062003f0020003f00270020002000310030003f00400020003f0020003f006800620040003f004a00400040003f0040003f003f0020004000660040003f003f0040003f004000530040006400240040",17
AT+CSMP?
+CSMP: 17,167,0,0
OK
By the way, when I set AT+CSCS="UTF-8" it reports an error, yet "UTF-8" is reported back by the command AT+CSCS=?.
The format of the response is given in 27.007:
+CUSD=[<n>[,<str>[,<dcs>]]]
Thus the third parameter is <dcs>. Its format is just deferred:
<dcs>: 3GPP TS 23.038 [25] Cell Broadcast Data Coding Scheme in integer format
(default 0)
In chapter "5 CBS Data Coding Scheme" in 23.038 it states These codings may also be used for USSD.
For 17, binary 0001 0001:
bit 7..4 Coding Group Bits = 0001
bit 3..0 = 0001 --> UCS2; message preceded by language indication
And it notes that
An MS not supporting UCS2 coding will present the two character language identifier followed by improperly interpreted user data.
which is exactly the case in your output (e.g. ar meaning arabic followed by garbage).
For 72, binary 0100 1000:
bit 7..4 Coding Group Bits = 01xx
bit 5 = 0 --> uncompressed,
bit 4 = 0 --> no class meaning
bit 3 & 2 = 1 & 0 --> UCS2 (16bit)
The "not supporting" part above might just be that you are using a limited character set encoding (PCCP437). In any case, unless your modem does not support UTF-8 you really should use that and not this PCCP437. Or you might use USC2. If your modem lacks both of those characters, you can try HEX (guessing on my part from what I saw when researching this answer, maybe you need to set the <dcs> parameter in AT+CSMP for this to work?).
Notice that after selecting UCS2 every string must be encoded that way, including switching to another character set, see this answer for an example.
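A rough sketch of how those bits could be picked apart in C# (the method name is mine, and only the two coding groups discussed above are handled):

// Interpret a +CUSD <dcs> value per 3GPP TS 23.038, chapter 5 (partial).
static string DescribeCbsDcs(int dcs)
{
    int codingGroup = (dcs >> 4) & 0x0F;                // bits 7..4

    if (codingGroup == 0x01 && (dcs & 0x0F) == 0x01)
        return "UCS2; message preceded by language indication";    // e.g. dcs = 17

    if ((codingGroup & 0x0C) == 0x04)                   // coding group 01xx, e.g. dcs = 72
    {
        bool compressed = (dcs & 0x20) != 0;            // bit 5
        bool classMeaning = (dcs & 0x10) != 0;          // bit 4
        int characterSet = (dcs >> 2) & 0x03;           // bits 3..2; 2 means UCS2 (16 bit)
        return string.Format("compressed: {0}, class meaning: {1}, character set code: {2}",
                             compressed, classMeaning, characterSet);
    }

    return "other coding group; see 23.038 for the full table";
}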
Use the following functions to decode "UCS2" response data:
public static String HexStr2UnicodeStr(String strHex)
{
    byte[] ba = Hex2ByteArray(strHex);
    return HexBytes2UnicodeStr(ba);
}

public static String HexBytes2UnicodeStr(byte[] ba)
{
    var strMessage = Encoding.BigEndianUnicode.GetString(ba, 0, ba.Length);
    return strMessage;
}
for example:
String str1 = SmsEngine.HexStr2UnicodeStr("002a003100350030002a0032002a00330032003300390031002a00360039003100370037002a00310023");
// str1 = "*150*2*32391*69177*1#"
Please also check UnicodeStr2HexStr()
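The Hex2ByteArray helper isn't shown above; a minimal version (assuming the input is plain hex digits with no spaces or 0x prefix) could look like this:

public static byte[] Hex2ByteArray(String strHex)
{
    var ba = new byte[strHex.Length / 2];
    for (int i = 0; i < ba.Length; i++)
    {
        // Every two hex characters become one byte.
        ba[i] = Convert.ToByte(strHex.Substring(i * 2, 2), 16);
    }
    return ba;
}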
It's late and my caffeine IV is running low so my mind is mush and I'm having problems finding a solution to what I think is a simple encoding problem (which I have almost no experience dealing with).
I have a DB using EF6 Code First and everything seems to work well until I copy some text from a website forum contained within a codeblock. I checked the header and it's supposedly encoded in UTF-8.
I essentially take this text, split it into an array of strings, and check the DB for a record matching the string on each line. Everything was going well until I hit a problem with the string "Magnеtic" not matching anything in my DB table, yet when I went into SSMS and queried the table with LIKE '%Magnеtic%' I got a result.
I dropped the text from the website into Notepad++ with the text from the DB query and saw that they look equal:
Magnеtic
Magnеtic
Then, I changed the encoding to ANSI and it showed:
Magnetic <--From DB
Magnеtic <--From website
A tiny light bulb went on in my head, but my attempts to remedy this issue have failed.
I've tried using various methods but I think it's my fried brain attacking the problem with the wrong tools:
string.Compare(a, b) == 0
string.Equals(a, b)
a.ToUpperInvariant() == b.ToUpperInvariant()
and probably a few others that I can't remember.
So now you know what my issue is and I feel this is such a simple problem to fix but, like I said, I'm fried and now need some community help.
I'm not a professional coder, more a hobbyist so I may not be using best practices or advanced techniques to do some things.
Edit:
Today I did some more searching and found a couple of methods that didn't work.
private string RemoveAccent(string txt)
{
    byte[] bytes = Encoding.GetEncoding("Cyrillic").GetBytes(txt);
    return Encoding.ASCII.GetString(bytes);
}
This one appears to remove the accented characters of the Cyrillic encoding. The result wasn't as expected but it DID have an effect.
Results:
Magn?tic <- Computer interpretation
Magnetic <- Visual representation
I also tried:
private string RemoveAccent2(string txt)
{
    char[] toReplace = "àèìòùÀÈÌÒÙ äëïöüÄËÏÖÜ âêîôûÂÊÎÔÛ áéíóúÁÉÍÓÚðÐýÝ ãñõÃÑÕšŠžŽçÇåÅøØ".ToCharArray();
    char[] replaceChars = "aeiouAEIOU aeiouAEIOU aeiouAEIOU aeiouAEIOUdDyY anoANOsSzZcCaAoO".ToCharArray();

    for (int i = 0; i < toReplace.Length; i++)
    {
        txt = txt.Replace(toReplace[i], replaceChars[i]);
    }

    return txt;
}
This method didn't provide any changes.
What can help in these cases is to copy and paste the character into Google. In this case, the results point to the Wikipedia article about the letter Ye in Cyrillic, which looks exactly like E in the Latin alphabet but has a different code point in Unicode.
This means the results you're getting are correct: the string “Magnеtic” looks exactly the same as “Magnetic” (at least using common fonts), but it's actually a different string.
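One quick way to see the difference in C# is to dump the code points of both strings (a sketch; the literals below stand in for the website and DB values):

// The first "е" below is CYRILLIC SMALL LETTER IE (U+0435); it renders like Latin "e" (U+0065).
string fromWebsite = "Magnеtic";
string fromDb = "Magnetic";

foreach (char c in fromWebsite)
    Console.Write("U+{0:X4} ", (int)c);       // prints U+0435 for the Cyrillic letter

Console.WriteLine();
Console.WriteLine(fromWebsite == fromDb);     // False: the strings differ at that character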
According to my understanding, the length of a Base64-encoded string (i.e. the output of an encode) must always be a multiple of 4.
The documentation for the C# Convert.FromBase64String says that its input must be a multiple of 4 in length.
However, if I give it a 25-character string it doesn't complain:
[convert]::FromBase64String("ei5gsIELIki+GpnPGyPVBA==")
[convert]::FromBase64String("1ei5gsIELIki+GpnPGyPVBA==")
Both work. (The first one is 24 characters, the second is 25.)
[convert]::FromBase64String("11ei5gsIELIki+GpnPGyPVBA==")
fails with Invalid length exception
I assume this is a bug in the C# library, but I just want to make sure. I am writing code that sniffs strings to see whether they are valid Base64 strings, and I want to be sure that I understand what a valid one looks like. (One possible implementation was to give the string to System.Convert and see if it threw; why reinvent perfectly good code?)
Yes, this is a flaw (aka bug). It got started due to a perf optimization in an internal helper function named FromBase64_ComputeResultLength() which calculates the length of the byte[] result. It has this comment (edited to fit):
// For legal input, we can assume that 0 <= padding < 3. But it may be
// more for illegal input.
// We will notice it at decode when we see a '=' at the wrong place.
The "we will notice" remark is not entirely accurate, the decoder does flag an '=' if one isn't expected but it fails to check if there's one too many. Which is the case for the 25-char string.
You can report the problem at connect.microsoft.com, I don't see an existing report that resembles it. Do note that it is fairly unlikely that Microsoft can actually fix it any time soon since the change is going to break existing programs that now successfully parse bad base64 strings. It normally requires a major .NET release update to get rid of such problems, like it was done for .NET 4.0, there isn't one on the horizon afaik.
But yes, the simple workaround for you is to check whether the string length is divisible by 4; use the % operator.
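A simple sniffing check along those lines might look like this (a sketch; it doesn't account for embedded whitespace, which FromBase64String would otherwise skip):

static bool LooksLikeBase64(string s)
{
    // Reject lengths that aren't a multiple of 4 up front, since
    // Convert.FromBase64String doesn't always catch that case.
    if (string.IsNullOrEmpty(s) || s.Length % 4 != 0)
        return false;

    try
    {
        Convert.FromBase64String(s);
        return true;
    }
    catch (FormatException)
    {
        return false;
    }
}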
I know there are similar questions already on SO, but none of them seem to address this problem. I have inherited the following C# code that has been used to create password hashes in a legacy .NET app; for various reasons the C# implementation is now being migrated to PHP:
string input = "fred";
SHA256CryptoServiceProvider provider = new SHA256CryptoServiceProvider();
byte[] hashedValue = provider.ComputeHash(Encoding.ASCII.GetBytes(input));
string output = "";
string asciiString = ASCIIEncoding.ASCII.GetString(hashedValue);
foreach ( char c in asciiString ) {
int tmp = c;
output += String.Format("{0:x2}",
(uint)System.Convert.ToUInt32(tmp.ToString()));
}
return output;
My PHP code is very simple, but for the same input "fred" it doesn't produce the same result:
$output = hash('sha256', "fred");
I've traced the problem down to an encoding issue - if I change this line in the C# code:
string asciiString = ASCIIEncoding.ASCII.GetString(hashedValue);
to
string asciiString = ASCIIEncoding.UTF7.GetString(hashedValue);
Then the php and C# output match (it yields d0cfc2e5319b82cdc71a33873e826c93d7ee11363f8ac91c4fa3a2cfcd2286e5).
Since I'm not able to change the .NET code, I need to work out how to replicate the results in PHP.
Thanks in advance for any help.
I don’t know PHP well enough to answer your question; however, I must point out that your C# code is broken. Try generating the hash of these two inputs: "âèí" and "çñÿ". You will find that their hash collides:
3f3b221c6c6e3f71223f51695d456d52223f243f3f363949443f3f763b483615
The first bug lies in this operation:
Encoding.ASCII.GetBytes(input)
This assumes that all characters within your input are US-ASCII. Any non-ASCII characters would cause the encoder to fall back to the byte value for the ? character, thereby giving (unwanted) hash collisions, as demonstrated above. Notwithstanding, this will not be an issue if your input is constrained to only allow US-ASCII characters.
The other (more severe) bug lies in the following operation:
ASCIIEncoding.ASCII.GetString(hashedValue)
ASCII only defines mappings for values 0–127. Since the elements of your hashedValue byte array may contain any byte value (0–255), encoding them as ASCII would cause data to be lost whenever a value greater than 127 is encountered. This may lead to further “unwanted” (read: potentially maliciously generated) hash collisions, even when your original input was US-ASCII.
Given that, statistically, half of the bytes constituting your hashes would be greater than 127, then you are losing at least half the strength of your hash algorithm. If a hacker gains access to your stored hashes, it is quite likely that they will manage to devise an attack to generate hash collisions by exploiting this cryptographic weakness.
Edit: Notwithstanding the considerations mentioned in my posts and Jon’s, here is the PHP code that succumbs to the same weakness – so to speak – as your C# code, and thereby gives the same hash:
$output = hash('sha256', $input, true);
for ($i = 0; $i < strlen($output); $i++) {
    if ($output[$i] > chr(127)) {
        $output[$i] = '?';
    }
}
$output = bin2hex($output);
Could you use mb_convert_encoding (see http://php.net/manual/en/function.mb-convert-encoding.php - the page also has a link to a list of supported encodings) to convert the PHP string to ASCII from UTF7?
I've traced the problem down to an encoding issue
Yes. You're trying to treat arbitrary binary data as if it's valid text-encoded data. It's not. You should not be using any Encoding here.
If you want the results in hex, the simplest approach is to use BitConverter.ToString
string text = BitConverter.ToString(hashedValue).Replace("-", "").ToLower();
And yes, as pointed out elsewhere, you probably shouldn't be using ASCII to convert the text to binary at the start of the hashing process. I'd probably use UTF-8.
It's really important that you understand the problem here though, as otherwise you'll run into it in other places too. You should only use encodings such as ASCII, UTF-8 etc (on any platform) when you've genuinely got encoded text data. You shouldn't use them for images, the results of cryptography, the results of hashing, etc.
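For completeness, here is a sketch of the hashing done without pushing the binary digest through a text encoding (this assumes new data rather than the legacy hashes, and uses UTF-8 for the input as suggested above):

static string HashPassword(string input)
{
    using (var sha256 = SHA256.Create())
    {
        // Hash the UTF-8 bytes of the input, then hex-encode the raw digest.
        byte[] hash = sha256.ComputeHash(Encoding.UTF8.GetBytes(input));
        return BitConverter.ToString(hash).Replace("-", "").ToLower();
    }
}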
EDIT: Okay, you say you can't change the C# code... it's not clear whether that just means you've got legacy data, or whether you need to keep using the C# code regardless. You should absolutely not run this code for a second longer than you have to.
But in PHP, you may find you can get away with just replacing every byte with a value >= 0x80 in the hash with 0x3F, which is the ASCII for "question mark". If you look through your data you'll probably find there are a lot of 3F bytes in there.
If you can get this to work, I would strongly suggest that you migrate over to the true SHA-256 hash without losing information like this. Wherever you're storing the hashes, store two: the legacy one (which is all you have now) and the rehashed one. Whenever you're asked to validate that a password is correct, you should (see the sketch after this list):
Check whether you have a "new" one; if so, only use that - ignore the legacy one.
If you only have a legacy one:
Hash the password in the broken way to check whether it's correct
If it is, hash it again properly and store the results in the "new" place.
Then when everyone's logged in correctly once, you'll be able to wipe out the legacy hashes.
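A sketch of that validation flow (the UserRecord type and the ComputeProperHash/ComputeLegacyHash/SaveUser helpers are hypothetical placeholders; only the control flow is the point):

bool ValidatePassword(UserRecord user, string password)
{
    // Prefer the properly generated hash whenever one exists.
    if (user.NewHash != null)
        return user.NewHash == ComputeProperHash(password);

    // Only a legacy hash exists: check it using the broken legacy scheme.
    if (user.LegacyHash != ComputeLegacyHash(password))
        return false;

    // The password is correct, so rehash it properly and store the result.
    user.NewHash = ComputeProperHash(password);
    SaveUser(user);
    return true;
}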