Decoding G711: converting from C++ to C#

I am trying to decode a G711 packet. I found this code in C++, but I do not know how to convert it to C#.
This is the code:
gint16 alaw_exp_table[256] = {
-5504, -5248, -6016, -5760, -4480, -4224, -4992, -4736,
-7552, -7296, -8064, -7808, -6528, -6272, -7040, -6784,
-2752, -2624, -3008, -2880, -2240, -2112, -2496, -2368,
-3776, -3648, -4032, -3904, -3264, -3136, -3520, -3392,
-22016,-20992,-24064,-23040,-17920,-16896,-19968,-18944,
-30208,-29184,-32256,-31232,-26112,-25088,-28160,-27136,
-11008,-10496,-12032,-11520, -8960, -8448, -9984, -9472,
-15104,-14592,-16128,-15616,-13056,-12544,-14080,-13568,
-344, -328, -376, -360, -280, -264, -312, -296,
-472, -456, -504, -488, -408, -392, -440, -424,
-88, -72, -120, -104, -24, -8, -56, -40,
-216, -200, -248, -232, -152, -136, -184, -168,
-1376, -1312, -1504, -1440, -1120, -1056, -1248, -1184,
-1888, -1824, -2016, -1952, -1632, -1568, -1760, -1696,
-688, -656, -752, -720, -560, -528, -624, -592,
-944, -912, -1008, -976, -816, -784, -880, -848,
5504, 5248, 6016, 5760, 4480, 4224, 4992, 4736,
7552, 7296, 8064, 7808, 6528, 6272, 7040, 6784,
2752, 2624, 3008, 2880, 2240, 2112, 2496, 2368,
3776, 3648, 4032, 3904, 3264, 3136, 3520, 3392,
22016, 20992, 24064, 23040, 17920, 16896, 19968, 18944,
30208, 29184, 32256, 31232, 26112, 25088, 28160, 27136,
11008, 10496, 12032, 11520, 8960, 8448, 9984, 9472,
15104, 14592, 16128, 15616, 13056, 12544, 14080, 13568,
344, 328, 376, 360, 280, 264, 312, 296,
472, 456, 504, 488, 408, 392, 440, 424,
88, 72, 120, 104, 24, 8, 56, 40,
216, 200, 248, 232, 152, 136, 184, 168,
1376, 1312, 1504, 1440, 1120, 1056, 1248, 1184,
1888, 1824, 2016, 1952, 1632, 1568, 1760, 1696,
688, 656, 752, 720, 560, 528, 624, 592,
944, 912, 1008, 976, 816, 784, 880, 848};
#include <glib.h>
#include "G711adecode.h"
#include "G711atable.h"
int
decodeG711a(void *input, int inputSizeBytes, void *output, int *outputSizeBytes)
{
guint8 *dataIn = (guint8 *)input;
gint16 *dataOut = (gint16 *)output;
int i;
for (i=0; i<inputSizeBytes; i++)
{
dataOut[i] = alaw_exp_table[dataIn[i]];
}
*outputSizeBytes = inputSizeBytes * 2;
return 0;
}
Can anybody help me?
Thanks in advance.

gint16 is a 16-bit signed integer, so in C# the table is a short[].
You can translate it like any other array:
short[] x = { 1, 2, 3 };

Looks like this is a fairly simple lossy encoding of a short array. Since G711 is an audio codec, that makes sense. Just iterate over the input byte array, and put the values from the lookup table in the output short array.
private short[] alaw_exp_table = {/* same data as the C code */};
public short[] decodeG711a(byte[] input)
{
short[] result = new short[input.Length];
for (int i = 0; i < input.Length; i++)
{
result[i] = alaw_exp_table[input[i]];
}
return result;
}
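If typing in 256 table entries feels error-prone, the values can also be computed. Here is a minimal sketch in JavaScript (Node.js), assuming the table in the question is the standard G.711 A-law expansion table. The names alawToLinear and decodeG711a are only illustrative and do not come from any library.

```javascript
// Sketch: expand one A-law byte to a 16-bit linear PCM sample.
// Assumes standard G.711 A-law, where the even bits are inverted with 0x55.
function alawToLinear(aVal) {
  const x = (aVal ^ 0x55) & 0xff;   // undo the even-bit inversion
  let t = (x & 0x0f) << 4;          // 4-bit mantissa
  const seg = (x & 0x70) >> 4;      // 3-bit segment (exponent)
  if (seg === 0) {
    t += 8;
  } else {
    t = (t + 0x108) << (seg - 1);
  }
  // In A-law a set sign bit means a positive sample.
  return (x & 0x80) ? t : -t;
}

// Decode a whole packet: one input byte becomes one 16-bit sample.
function decodeG711a(input) {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    out[i] = alawToLinear(input[i]);
  }
  return out;
}

console.log(Array.from(decodeG711a(Uint8Array.from([0x00, 0x80, 0xd5]))));
```

The three sample bytes decode to -5504, 5504 and 8, which are entries 0, 128 and 213 of the table in the question, so the computed values can be used to spot-check a hand-copied table.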

Related

TransformFinalBlock with TripleDes(3DES) - DESMode.ECB in Dart/Flutter

I tried and looked for a solution to my problem, but I didn't find anything that worked.
I have a C# program that works perfectly. My task is to turn this program into Flutter, but I can't.
The C# code that works is this:
private ICryptoTransform mDesEnc;
TripleDESCryptoServiceProvider mDes = new TripleDESCryptoServiceProvider();
mDes.Key = Convert.FromBase64String("B4WOmhsRsOSD2ZRfhmULCcI0lR4kNiy6");
mDes.Mode = CipherMode.ECB;
mDes.Padding = PaddingMode.Zeros;
mDesEnc = mDes.CreateEncryptor();
mSeq = 2;
byte[] data = Encoding.UTF8.GetBytes(mSeq.ToString());
var dec = mDesEnc.TransformFinalBlock(data, 0, data.Length);
var ret = Convert.ToBase64String(dec);
//dec = 213, 5, 215, 181, 143, 185, 167, 134
//ret = "1QXXtY+5p4Y="
In Flutter, I tried to do this:
final key = base64.decode("B4WOmhsRsOSD2ZRfhmULCcI0lR4kNiy6");
print(key);
final seq = 2;
final bytes = utf8.encode(seq.toString());
mDes3CBC = DES3(
key: key,
mode: DESMode.ECB,
paddingType: DESPaddingType.OneAndZeroes,
);
List<int> t = [];
t.add(bytes.last);
final encrypted = mDes3CBC!.encrypt(t);
print(encrypted);
but the result is [11, 195, 182, 192, 231, 57, 14, 15], which differs from the expected output (it needs to match 'dec' in the C# example).
I created the padding in an auxiliary function and did not use the padding from the library.
void teste() {
final key = base64.decode("B4WOmhsRsOSD2ZRfhmULCcI0lR4kNiy6");
print(key);
final seq = 2;
var bytes = new List<int>.from(utf8.encode(seq.toString()));
_paddingZero(bytes);
mDes3CBC = DES3(
key: key,
mode: DESMode.ECB,
paddingType: DESPaddingType.None,
);
final encrypted = mDes3CBC!.encrypt(bytes);
print(encrypted);
}
// Zero-pad to the next multiple of the 8-byte DES block size.
void _paddingZero(List<int> data) {
int l = (8 - data.length % 8) % 8;
for (int i = 0; i < l; i++) {
data.add(0);
}
}
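A third implementation can help when two ports disagree. The following is a minimal sketch in JavaScript (Node.js) of the same operation, assuming the OpenSSL cipher name 'des-ede3' (three-key TripleDES in ECB mode) is available in your Node build. zeroPad and encrypt3desEcbZeroPadded are hypothetical helper names, and the padding rule (already aligned input is left unchanged) reflects my understanding of PaddingMode.Zeros rather than anything stated in the question.

```javascript
const crypto = require('crypto');

// Zero-pad to a multiple of the block size; aligned input is left unchanged
// (assumed to mirror .NET's PaddingMode.Zeros).
function zeroPad(data, blockSize = 8) {
  const rem = data.length % blockSize;
  if (rem === 0) return Buffer.from(data);
  return Buffer.concat([Buffer.from(data), Buffer.alloc(blockSize - rem)]);
}

// TripleDES in ECB mode with the library's own padding switched off.
function encrypt3desEcbZeroPadded(key, plaintext) {
  const cipher = crypto.createCipheriv('des-ede3', key, null); // ECB takes no IV
  cipher.setAutoPadding(false);
  return Buffer.concat([cipher.update(zeroPad(plaintext)), cipher.final()]);
}

// Key and message taken from the question.
const key = Buffer.from('B4WOmhsRsOSD2ZRfhmULCcI0lR4kNiy6', 'base64');
const encrypted = encrypt3desEcbZeroPadded(key, Buffer.from('2', 'utf8'));
console.log(encrypted.toString('base64'));
```

If the ports agree, the printed value should be the 'ret' string from the C# listing. I have not verified that here, so treat it as something to check rather than a guarantee.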

Uncompress String with nodejs that was compressed using a C# snippet

I collect some large log infos using a C# tool. Therefore I searched for a way to compress that giant string and I found this snippet to do the trick:
public static string CompressString(string text)
{
byte[] buffer = Encoding.UTF8.GetBytes(text);
var memoryStream = new MemoryStream();
using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
{
gZipStream.Write(buffer, 0, buffer.Length);
}
memoryStream.Position = 0;
var compressedData = new byte[memoryStream.Length];
memoryStream.Read(compressedData, 0, compressedData.Length);
var gZipBuffer = new byte[compressedData.Length + 4];
Buffer.BlockCopy(compressedData, 0, gZipBuffer, 4, compressedData.Length);
Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gZipBuffer, 0, 4);
return Convert.ToBase64String(gZipBuffer);
}
After my logging action the C# tool sends this compressed String to a node.js REST interface which writes it into a database.
Now (in my naive understanding of compression) I thought that I could simply use something like the following code on the Node.js side to decompress it:
zlib.gunzip(Buffer.from(compressedLogMessage, 'base64'), function(err, uncompressedLogMessage) {
if(err) {
console.error(err);
}
else {
console.log(uncompressedLogMessage.toString('utf-8'));
}
});
But I get the error:
{ Error: incorrect header check
at Zlib._handle.onerror (zlib.js:370:17) errno: -3, code: 'Z_DATA_ERROR' }
It seems that the compression method does not match the decompression function. I expect that anyone with compression knowledge could see the issue immediately.
What could I change or improve to make the decompression work?
Thanks a lot!
========== UPDATE ===========
It seems that message receiving and base64 decoding works..
Using CompressString("Hello World") results in:
// before compression
"Hello World"
// after compression before base64 encoding
new byte[] { 11, 0, 0, 0, 31, 139, 8, 0, 0, 0, 0, 0, 0, 3, 243, 72, 205, 201, 201, 87, 8, 207, 47, 202, 73, 1, 0, 86, 177, 23, 74, 11, 0, 0, 0 }
// after base64 encoding
CwAAAB+LCAAAAAAAAAPzSM3JyVcIzy/KSQEAVrEXSgsAAAA=
And on node js side:
// after var buf = Buffer.from('CwAAAB+LCAAAAAAAAAPzSM3JyVcIzy/KSQEAVrEXSgsAAAA=', 'base64');
{"buf":{"type":"Buffer","data":[11,0,0,0,31,139,8,0,0,0,0,0,0,3,243,72,205,201,201,87,8,207,47,202,73,1,0,86,177,23,74,11,0,0,0]}}
// after zlib.gunzip(buf, function(err, dezipped) { ... }
{ Error: incorrect header check
at Zlib._handle.onerror (zlib.js:370:17) errno: -3, code: 'Z_DATA_ERROR' }
=============== Update 2 ==================
@01binary's answer was correct! This is the working solution:
function toArrayBuffer(buffer) {
var arrayBuffer = new ArrayBuffer(buffer.length);
var view = new Uint8Array(arrayBuffer);
for (var i = 0; i < buffer.length; ++i) {
view[i] = buffer[i];
}
return arrayBuffer;
}
// Hello World (compressed with C#) => CwAAAB+LCAAAAAAAAAPzSM3JyVcIzy/KSQEAVrEXSgsAAAA=
var arrayBuffer = toArrayBuffer(Buffer.from('CwAAAB+LCAAAAAAAAAPzSM3JyVcIzy/KSQEAVrEXSgsAAAA=', 'base64'))
var zlib = require('zlib');
zlib.gunzip(Buffer.from(arrayBuffer, 4), function(err, uncompressedMessage) {
if(err) {
console.log(err)
}
else {
console.log(uncompressedMessage.toString()) // Hello World
}
});
The snippet you found appears to write 4 extra bytes to the beginning of the output stream, containing the "uncompressed" size of the original data. The original author must have assumed that logic on the receiving end is going to read those 4 bytes, know that it needs to allocate a buffer of that size, and pass the rest of the stream (at +4 offset) to gunzip.
If you are using this signature on the Node side:
https://nodejs.org/api/buffer.html#buffer_class_method_buffer_from_arraybuffer_byteoffset_length
...then pass a byte offset of 4. The first two bytes of your gzip stream should be { 0x1F, 0x8b }, and you can see in your array that those two bytes start at offset 4. A simple example of the zlib header can be found here:
Zlib compression incompatibile C vs C# implementations
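The layout described above can be handled in a few lines. This is a minimal sketch in JavaScript (Node.js), assuming the prefix is a 4-byte little-endian integer (which is what BitConverter.GetBytes produces on little-endian hardware). decompressString is a hypothetical name for the inverse of the CompressString snippet.

```javascript
const zlib = require('zlib');

// Sketch: undo CompressString from the question.
// Assumed layout: 4-byte little-endian length prefix, then a gzip stream.
function decompressString(base64Text) {
  const buf = Buffer.from(base64Text, 'base64');
  const expectedLength = buf.readInt32LE(0); // uncompressed size written by the C# side
  const payload = buf.subarray(4);           // gzip data starts at 0x1F 0x8B
  const raw = zlib.gunzipSync(payload);
  if (raw.length !== expectedLength) {
    throw new Error(`length mismatch: prefix says ${expectedLength}, got ${raw.length}`);
  }
  return raw.toString('utf8');
}

console.log(decompressString('CwAAAB+LCAAAAAAAAAPzSM3JyVcIzy/KSQEAVrEXSgsAAAA='));
```

With the sample string from the question this prints Hello World. Checking the prefix against the decompressed length is optional, but it catches truncated or mismatched payloads early.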

TripleDES in CFB mode using Crypto++ and .NET

I am trying to get same result using TripleDES using C++ app which has Crypto++ and .NET app which uses TripleDESCryptoServiceProvider. I tried setting Key and IV the same but I am getting different results.
This question was already asked here, but there is no clear answer.
Here is C++ example
#include <stdio.h>
#include <cstdlib>
#include <string>
#include <iostream>
#include "dll.h"
#include "mybase64.h"
using namespace std;
USING_NAMESPACE(CryptoPP)
int main()
{
std::cout << "Crypto++ Example" << endl;
std::cout << "TEST" << endl;
const int textSize = 4;
const int keySize = 24;
byte iv[] = { 240, 4, 37, 12, 167, 153, 233, 177 };
byte key[] = {191, 231, 220, 196, 173, 36, 92, 125, 146, 210, 117, 220, 95, 104, 154, 69, 180, 113, 146, 19, 124, 62, 60, 79};
byte encryptedText[textSize];
char cText[] = {'T', 'E', 'S', 'T'};
byte* text = new byte[textSize];
for (int ndx = 0; ndx<4; ndx++)
{
text[ndx] = (byte)cText[ndx];
}
CFB_FIPS_Mode<DES_EDE3>::Encryption encryption;
encryption.SetKeyWithIV(key, keySize, iv);
encryption.ProcessString(encryptedText, text, 4);
string encoded;
encoded = base64_encode(encryptedText, 4);
cout << encoded << endl;
system("pause");
return 0;
}
which produces following result:
K3zUUA==
Here is C# example:
using System;
using System.Collections.Generic;
using System.Text;
using System.Security.Cryptography;
using System.IO;
namespace TripleDESExample
{
class Program
{
static void Main(string[] args)
{
string message = "TEST";
byte[] iv = { 240, 4, 37, 12, 167, 153, 233, 177 };
byte[] key = { 191, 231, 220, 196, 173, 36, 92, 125, 146, 210, 117, 220, 95, 104, 154, 69, 180, 113, 146, 19, 124, 62, 60, 79 };
byte[] data = Encoding.ASCII.GetBytes(message);
using (var tdes = new TripleDESCryptoServiceProvider())
{
tdes.Mode = CipherMode.CFB;
tdes.Padding = PaddingMode.Zeros;
tdes.IV = iv;
tdes.Key = key;
using (var ms = new MemoryStream())
{
using (var crypto = new CryptoStream(ms, tdes.CreateEncryptor(), CryptoStreamMode.Write))
{
crypto.Write(data, 0, data.Length);
crypto.Close();
}
Array.Copy(ms.ToArray(), data, data.Length);
Console.WriteLine(string.Format("Encrypted: {0}", Convert.ToBase64String(data)));
}
}
Console.WriteLine("Press any key...");
Console.ReadKey();
}
}
}
Which produces following result:
K7nXyg==
So you can see that they produce different result.
K7nXyg==
K3zUUA==
Can anyone point what could be the issue for them showing different result.
If possible please provide example code.
---------------------UPDATE 4/27/2017-----------------------------------------
I have now tried a slightly different .NET implementation, which gives me yet another result:
using System;
using System.Collections.Generic;
using System.Text;
using System.Security.Cryptography;
using System.IO;
namespace TripleDESExample
{
class Program
{
static void Main(string[] args)
{
string message = "TEST";
byte[] iv = { 240, 4, 37, 12, 167, 153, 233, 177 };
byte[] key = { 191, 231, 220, 196, 173, 36, 92, 125, 146, 210, 117, 220, 95, 104, 154, 69, 180, 113, 146, 19, 124, 62, 60, 79 };
byte[] bytes = Encoding.ASCII.GetBytes(message);
TripleDESCryptoServiceProvider cryptoServiceProvider1 = new TripleDESCryptoServiceProvider();
cryptoServiceProvider1.Key = key;
cryptoServiceProvider1.IV = iv;
cryptoServiceProvider1.Mode = CipherMode.CFB;
cryptoServiceProvider1.Padding = PaddingMode.Zeros;
TripleDESCryptoServiceProvider cryptoServiceProvider2 = cryptoServiceProvider1;
byte[] inArray = cryptoServiceProvider2.CreateEncryptor().TransformFinalBlock(bytes, 0, bytes.Length);
cryptoServiceProvider2.Clear();
Console.WriteLine(string.Format("Encrypted: {0}", Convert.ToBase64String(inArray, 0, inArray.Length)));
Console.WriteLine("Press any key...");
Console.ReadKey();
}
}
}
which gives me:
K7nXyp+x9kY=
Why?
------UPDATE 4/28/2017-----------
This article describes very well Crypto++ implementation.
When I try to increase BlockSize and FeedbackSize I get the following error:
Based on the discussion here it seems that the .NET TripleDESCryptoServiceProvider uses CipherMode.CFB with 8-bit feedback, while Crypto++ uses a feedback size equal to the full block size (64 bits for TripleDES). Setting a higher FeedbackSize in .NET throws an exception.
Does anyone know how to resolve this issue?
From the comments:
The issue is likely the feedback size. I believe .Net uses a small feedback size, like 8-bits, for CFB mode. Crypto++ uses the full block size for CFB mode. I'd recommend getting a baseline using CBC mode. Once you arrive at the same result in .Net and Crypto++, then switch to CFB mode and turn knobs on the feedback size.
Do you have an example of how to accomplish this?
You can find examples of CBC Mode on the Crypto++ wiki. Other wiki pages of interest may be TripleDES and CFB Mode.
You can also find test vectors for these modes of operation on the NIST website.
You really need to get to a baseline. You should not use random messages and random keys and ivs until you achieve your baseline.
Here's an example of using a less-than-blocksize feedback size in Crypto++. The example is available at CFB Mode on the Crypto++ wiki (we added it for this answer). You will have to dial in your own parameters (but I suggest you baseline first with something like the NIST test vectors).
You should be wary of using a feedback size that is smaller than the block size because it can reduce the security of the block cipher. If given a choice, you should increase the feedback size for Mcrypt or .Net; and not reduce the feedback size for Crypto++.
SecByteBlock key(AES::DEFAULT_KEYLENGTH), iv(AES::BLOCKSIZE);
memset(key, 0x00, key.size());
memset(iv, 0x00, iv.size());
AlgorithmParameters params = MakeParameters(Name::FeedbackSize(), 1 /*8-bits*/)
(Name::IV(), ConstByteArrayParameter(iv));
string plain = "CFB Mode Test";
string cipher, encoded, recovered;
/*********************************\
\*********************************/
try
{
cout << "plain text: " << plain << endl;
CFB_Mode< AES >::Encryption enc;
enc.SetKey( key, key.size(), params );
StringSource ss1( plain, true,
new StreamTransformationFilter( enc,
new StringSink( cipher )
) // StreamTransformationFilter
); // StringSource
}
catch( CryptoPP::Exception& ex )
{
cerr << ex.what() << endl;
exit(1);
}
/*********************************\
\*********************************/
// Pretty print cipher text
StringSource ss2( cipher, true,
new HexEncoder(
new StringSink( encoded )
) // HexEncoder
); // StringSource
cout << "cipher text: " << encoded << endl;
/*********************************\
\*********************************/
try
{
CFB_Mode< AES >::Decryption dec;
dec.SetKey( key, key.size(), params );
// The StreamTransformationFilter removes
// padding as required.
StringSource ss3( cipher, true,
new StreamTransformationFilter( dec,
new StringSink( recovered )
) // StreamTransformationFilter
); // StringSource
cout << "recovered text: " << recovered << endl;
}
catch( CryptoPP::Exception& ex )
{
cerr << ex.what() << endl;
exit(1);
}
It produces the following output:
$ ./test.exe
plain text: CFB Mode Test
cipher text: 2506FBCA6F97DC7653B414C291
recovered text: CFB Mode Test
So you can see that they produce different result.
K7nXyg==
K3zUUA==
The following reproduces K7nXyg==, but it's not clear to me that's what you want. You really should get to your baseline. Then you can tell us things like a key with no parity and an 8-bit feedback size.
const byte key[] = { 191, 231, 220, 196, 173, 36, 92, 125,
146, 210, 117, 220, 95, 104, 154, 69,
180, 113, 146, 19, 124, 62, 60, 79 };
const byte iv[] = { 240, 4, 37, 12, 167, 153, 233, 177 };
ConstByteArrayParameter cb(iv, sizeof(iv));
AlgorithmParameters params = MakeParameters(Name::FeedbackSize(), 1 /*8-bits*/)
(Name::IV(), ConstByteArrayParameter(iv, sizeof(iv)));
string plain = "TEST";
string cipher, encoded, recovered;
/*********************************\
\*********************************/
try
{
cout << "plain text: " << plain << endl;
CFB_Mode< DES_EDE3 >::Encryption enc;
enc.SetKey( key, sizeof(key), params );
StringSource ss1( plain, true,
new StreamTransformationFilter( enc,
new StringSink( cipher )
) // StreamTransformationFilter
); // StringSource
}
catch( CryptoPP::Exception& ex )
{
cerr << ex.what() << endl;
exit(1);
}
/*********************************\
\*********************************/
// Pretty print cipher text
StringSource ss2( cipher, true,
new Base64Encoder(
new StringSink( encoded )
) // Base64Encoder
); // StringSource
cout << "cipher text: " << encoded << endl;
/*********************************\
\*********************************/
try
{
CFB_Mode< DES_EDE3 >::Decryption dec;
dec.SetKey( key, sizeof(key), params );
// The StreamTransformationFilter removes
// padding as required.
StringSource ss3( cipher, true,
new StreamTransformationFilter( dec,
new StringSink( recovered )
) // StreamTransformationFilter
); // StringSource
cout << "recovered text: " << recovered << endl;
}
catch( CryptoPP::Exception& ex )
{
cerr << ex.what() << endl;
exit(1);
}
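To see the effect of the feedback size without either of the two libraries, the comparison can be repeated with a third one. This is a minimal sketch in JavaScript (Node.js), assuming your Node build exposes the OpenSSL cipher names 'des-ede3-cfb8' (8-bit feedback) and 'des-ede3-cfb' (full 64-bit block feedback). The key, IV and message are the ones from the question.

```javascript
const crypto = require('crypto');

// Key, IV and message copied from the question.
const key = Buffer.from([191, 231, 220, 196, 173, 36, 92, 125,
                         146, 210, 117, 220, 95, 104, 154, 69,
                         180, 113, 146, 19, 124, 62, 60, 79]);
const iv = Buffer.from([240, 4, 37, 12, 167, 153, 233, 177]);
const plain = Buffer.from('TEST', 'ascii');

// Encrypt with a named OpenSSL cipher. CFB is a stream mode, so the
// ciphertext has the same length as the plaintext and no padding is added.
function encrypt(algorithm) {
  const cipher = crypto.createCipheriv(algorithm, key, iv);
  return Buffer.concat([cipher.update(plain), cipher.final()]);
}

const cfb8 = encrypt('des-ede3-cfb8');  // 8-bit feedback
const cfb64 = encrypt('des-ede3-cfb');  // full-block (64-bit) feedback

console.log('CFB-8 :', cfb8.toString('base64'));
console.log('CFB-64:', cfb64.toString('base64'));
```

Only the first ciphertext byte is guaranteed to agree between the two feedback sizes, because both XOR the first plaintext byte with the first byte of the encrypted IV. That is consistent with the two results in the question: K7nXyg== and K3zUUA== both decode to a first byte of 0x2B and diverge afterwards. I have not confirmed that this sketch reproduces those exact strings.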

Sending CMD to PLC via TCP Client

I'm trying to send a command to a PLC that controls electronic lockers via a TCP Client. I am able to connect but it appears my command is not being read by the PLC.
I have the following code:
private const string STX = "0x02";
private const string ETX = "0x03";
private const string STATUS = "0x30";
private const string OPEN = "0x31";
private const string SUM = STX + ETX;
static void Main(string[] args)
{
var tcpClient = new TcpClient();
tcpClient.Connect("192.168.1.190", 4000);
if (tcpClient.Connected)
{
var networkStream = tcpClient.GetStream();
if (networkStream.CanWrite)
{
var ADDY = "00";
var asciiEncode = new ASCIIEncoding();
byte[] b = asciiEncode.GetBytes(STX + ADDY + OPEN + ETX + SUM);
networkStream.Write(b, 0, b.Length);
byte[] b1 = new byte[100];
var k = networkStream.Read(b1, 0, 100);
for (var i = 0; i < k; i++)
{
Console.WriteLine(Convert.ToChar(b1[i]));
}
}
}
}
STATUS and OPEN are the commands that can be sent. The PLC came with some documentation, and here is a picture of it. I am assuming my CMD is wrong; how do I fix it? This is my first time trying to connect to a PLC and send/retrieve commands. Any help is appreciated.
You're on the right track, but the encoding of your command is wrong. Low-level protocols like these are tricky to get right.
Your byte array b contains the string "0x02000x310x030x020x03" encoded as ASCII, which translates to byte[22] { 48, 120, 48, 50, 48, 48, 48, 120, 51, 49, 48, 120, 48, 51, 48, 120, 48, 50, 48, 120, 48, 51 }, while you want to send an array of the actual byte values 0x02, etc.
Try something like:
byte[] b = new byte[] { 0x02, 0x00, 0x31, 0x30, 0x02, 0x03 };
See also http://www.december.com/html/spec/ascii.html to see how STX and ETX relate to the other ASCII characters (eg a-z, 0-9, etc).
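The difference between sending the text "0x02" and sending the byte 0x02 is easy to see side by side. This is a minimal sketch in JavaScript (Node.js). It only illustrates the encoding issue and makes no assumption about the PLC's actual frame layout, which depends on the vendor documentation.

```javascript
// What the question sends: the characters '0', 'x', '0', '2', ... one byte per character.
const asText = Buffer.from('0x02' + '00' + '0x31' + '0x03' + '0x02' + '0x03', 'ascii');

// What a byte-oriented protocol expects: the control codes themselves.
const STX = 0x02; // start of text
const ETX = 0x03; // end of text
const asBytes = Buffer.from([STX, ETX]);

console.log(asText.length, Array.from(asText));   // 22 bytes, starting 48, 120, 48, 50
console.log(asBytes.length, Array.from(asBytes)); // 2 bytes: 2 and 3
```

The first buffer is the same 22-byte array listed in the answer. The second shows that each control code occupies exactly one byte on the wire.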

How to identify doc, docx, pdf, xls and xlsx based on file header

How to identify doc, docx, pdf, xls and xlsx based on file header in C#?
I don't want to rely on either the file extension or MimeMapping.GetMimeMapping for this, as both can be manipulated.
I know how to read the header, but I don't know which combination of bytes indicates that a file is a doc, docx, pdf, xls or xlsx.
Any thoughts?
This question contains an example of using the first bytes of a file to determine the file type: Using .NET, how can you find the mime type of a file based on the file signature not the extension
It is a very long post, so I am posting the relevant answer below:
using System.IO;
using System.Linq;
public class MimeType
{
private static readonly byte[] BMP = { 66, 77 };
private static readonly byte[] DOC = { 208, 207, 17, 224, 161, 177, 26, 225 };
private static readonly byte[] EXE_DLL = { 77, 90 };
private static readonly byte[] GIF = { 71, 73, 70, 56 };
private static readonly byte[] ICO = { 0, 0, 1, 0 };
private static readonly byte[] JPG = { 255, 216, 255 };
private static readonly byte[] MP3 = { 255, 251, 48 };
private static readonly byte[] OGG = { 79, 103, 103, 83, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0 };
private static readonly byte[] PDF = { 37, 80, 68, 70, 45, 49, 46 };
private static readonly byte[] PNG = { 137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82 };
private static readonly byte[] RAR = { 82, 97, 114, 33, 26, 7, 0 };
private static readonly byte[] SWF = { 70, 87, 83 };
private static readonly byte[] TIFF = { 73, 73, 42, 0 };
private static readonly byte[] TORRENT = { 100, 56, 58, 97, 110, 110, 111, 117, 110, 99, 101 };
private static readonly byte[] TTF = { 0, 1, 0, 0, 0 };
private static readonly byte[] WAV_AVI = { 82, 73, 70, 70 };
private static readonly byte[] WMV_WMA = { 48, 38, 178, 117, 142, 102, 207, 17, 166, 217, 0, 170, 0, 98, 206, 108 };
private static readonly byte[] ZIP_DOCX = { 80, 75, 3, 4 };
public static string GetMimeType(byte[] file, string fileName)
{
string mime = "application/octet-stream"; //DEFAULT UNKNOWN MIME TYPE
//Ensure that the filename isn't empty or null
if (string.IsNullOrWhiteSpace(fileName))
{
return mime;
}
//Get the file extension
string extension = Path.GetExtension(fileName) == null
? string.Empty
: Path.GetExtension(fileName).ToUpper();
//Get the MIME Type
if (file.Take(2).SequenceEqual(BMP))
{
mime = "image/bmp";
}
else if (file.Take(8).SequenceEqual(DOC))
{
mime = "application/msword";
}
else if (file.Take(2).SequenceEqual(EXE_DLL))
{
mime = "application/x-msdownload"; //both use same mime type
}
else if (file.Take(4).SequenceEqual(GIF))
{
mime = "image/gif";
}
else if (file.Take(4).SequenceEqual(ICO))
{
mime = "image/x-icon";
}
else if (file.Take(3).SequenceEqual(JPG))
{
mime = "image/jpeg";
}
else if (file.Take(3).SequenceEqual(MP3))
{
mime = "audio/mpeg";
}
else if (file.Take(14).SequenceEqual(OGG))
{
if (extension == ".OGX")
{
mime = "application/ogg";
}
else if (extension == ".OGA")
{
mime = "audio/ogg";
}
else
{
mime = "video/ogg";
}
}
else if (file.Take(7).SequenceEqual(PDF))
{
mime = "application/pdf";
}
else if (file.Take(16).SequenceEqual(PNG))
{
mime = "image/png";
}
else if (file.Take(7).SequenceEqual(RAR))
{
mime = "application/x-rar-compressed";
}
else if (file.Take(3).SequenceEqual(SWF))
{
mime = "application/x-shockwave-flash";
}
else if (file.Take(4).SequenceEqual(TIFF))
{
mime = "image/tiff";
}
else if (file.Take(11).SequenceEqual(TORRENT))
{
mime = "application/x-bittorrent";
}
else if (file.Take(5).SequenceEqual(TTF))
{
mime = "application/x-font-ttf";
}
else if (file.Take(4).SequenceEqual(WAV_AVI))
{
mime = extension == ".AVI" ? "video/x-msvideo" : "audio/x-wav";
}
else if (file.Take(16).SequenceEqual(WMV_WMA))
{
mime = extension == ".WMA" ? "audio/x-ms-wma" : "video/x-ms-wmv";
}
else if (file.Take(4).SequenceEqual(ZIP_DOCX))
{
mime = extension == ".DOCX" ? "application/vnd.openxmlformats-officedocument.wordprocessingml.document" : "application/x-zip-compressed";
}
return mime;
}
}
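Reduced to the formats asked about, the signature check comes down to three byte patterns. This is a minimal sketch in JavaScript (Node.js). sniff and SIGNATURES are hypothetical names, and the PDF pattern is shortened to "%PDF-" (an assumption on my part, so that headers that do not start with version 1.x still match).

```javascript
// Sketch: classify a file by its leading bytes only.
const SIGNATURES = [
  { kind: 'pdf', bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] },                   // "%PDF-"
  { kind: 'ole', bytes: [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1] }, // doc, xls, ppt, msi, ...
  { kind: 'zip', bytes: [0x50, 0x4b, 0x03, 0x04] },                         // docx, xlsx, pptx or a plain zip
];

function sniff(header) {
  for (const { kind, bytes } of SIGNATURES) {
    if (header.length >= bytes.length && bytes.every((b, i) => header[i] === b)) {
      return kind;
    }
  }
  return 'unknown';
}

console.log(sniff(Buffer.from('%PDF-1.7\n', 'latin1')));      // pdf
console.log(sniff(Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x14]))); // zip
```

Note that the signature alone cannot tell doc from xls, or docx from xlsx from an ordinary zip. It only narrows the file down to a container type, which then has to be inspected further.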
Using file signatures alone is not really feasible (the new Office formats are ZIP files and the old Office files are OLE CF / OLE SS containers), but you can use C# code to read them and figure out what they are.
For newest Office formats, you can read the (DOCX/PPTX/XLSX/...) ZIP file using System.IO.Packaging : https://msdn.microsoft.com/en-us/library/ms568187(v=vs.110).aspx
Doing that, you can find the ContentType of the first document part and infer using that.
For older Office files (Office 2003) you can use this library to distinguish them based on their contents (note that MSI and MSG files are also using this file format):
http://sourceforge.net/projects/openmcdf/
E.g., here are the contents of an XLS file:
I hope this helps! :)
It would have certainly helped me, if I had found this answer earlier. ;)
The answer from user2173353 is the most correct one, given that the OP specifically mentioned Office file formats. However, I didn't like the idea of adding an entire library (OpenMCDF) just to identify legacy Office formats, so I wrote my own routine for doing just this.
public static CfbFileFormat GetCfbFileFormat(Stream fileData)
{
if (!fileData.CanSeek)
throw new ArgumentException("Data stream must be seekable.", nameof(fileData));
try
{
// Notice that values in a CFB file are always little-endian. Fortunately BinaryReader.ReadUInt16/ReadUInt32 read little-endian values.
// If using .net < 4.5 this BinaryReader constructor is not available. Use a simpler one but remember to also remove the 'using' statement.
using (BinaryReader reader = new BinaryReader(fileData, Encoding.Unicode, true))
{
// Check that data has the CFB file header
var header = reader.ReadBytes(8);
if (!header.SequenceEqual(new byte[] {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}))
return CfbFileFormat.Unknown;
// Get sector size (2 byte uint) at offset 30 in the header
// The value at offset 0x1E (30) is the sector shift, i.e. the sector size as a power of two. The only valid values are 9 or 12, which give a 512 or 4096 byte sector size.
fileData.Position = 30;
ushort readUInt16 = reader.ReadUInt16();
int sectorSize = 1 << readUInt16;
// Get first directory sector index at offset 48 in the header
fileData.Position = 48;
var rootDirectoryIndex = reader.ReadUInt32();
// File header is one sector wide. After that we can address the sector directly using the sector index
var rootDirectoryAddress = sectorSize + (rootDirectoryIndex * sectorSize);
// The CLSID field is at offset 80 in the root directory entry. It is a 128 bit GUID, encoded as "DWORD, WORD, WORD, BYTE[8]".
fileData.Position = rootDirectoryAddress + 80;
var bits127_96 = reader.ReadInt32();
var bits95_80 = reader.ReadInt16();
var bits79_64 = reader.ReadInt16();
var bits63_0 = reader.ReadBytes(8);
var guid = new Guid(bits127_96, bits95_80, bits79_64, bits63_0);
// Compare to known file format GUIDs
CfbFileFormat result;
return Formats.TryGetValue(guid, out result) ? result : CfbFileFormat.Unknown;
}
}
catch (IOException)
{
return CfbFileFormat.Unknown;
}
catch (OverflowException)
{
return CfbFileFormat.Unknown;
}
}
public enum CfbFileFormat
{
Doc,
Xls,
Msi,
Ppt,
Unknown
}
private static readonly Dictionary<Guid, CfbFileFormat> Formats = new Dictionary<Guid, CfbFileFormat>
{
{Guid.Parse("{00020810-0000-0000-c000-000000000046}"), CfbFileFormat.Xls},
{Guid.Parse("{00020820-0000-0000-c000-000000000046}"), CfbFileFormat.Xls},
{Guid.Parse("{00020906-0000-0000-c000-000000000046}"), CfbFileFormat.Doc},
{Guid.Parse("{000c1084-0000-0000-c000-000000000046}"), CfbFileFormat.Msi},
{Guid.Parse("{64818d10-4f9b-11cf-86ea-00aa00b929e8}"), CfbFileFormat.Ppt}
};
Additional format identifiers can be added as needed.
I've tried this on .doc and .xls, and it has worked fine. I haven't tested on CFB files using 4096 byte sector size, as I don't even know where to find those.
The code is based on information from the following documents:
http://fileformats.archiveteam.org/wiki/Microsoft_Compound_File
https://msdn.microsoft.com/en-us/library/dd942138.aspx
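The same header walk can be expressed over an in-memory buffer. This is a minimal sketch in JavaScript (Node.js), assuming the header offsets used in the routine above (sector shift at 30, first directory sector at 48, CLSID at offset 80 of the root directory entry). readRootClsid and formatGuid are hypothetical names.

```javascript
// Sketch: read the root directory entry's CLSID from a Compound File Binary buffer.
const CFB_MAGIC = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]);

// Format 16 raw bytes the way a GUID is usually printed: the first three
// fields are little-endian, the last eight bytes are kept in order.
function formatGuid(b) {
  const hex = (n, width) => n.toString(16).padStart(width, '0');
  const tail = Array.from(b.subarray(8, 16), (x) => hex(x, 2)).join('');
  return `${hex(b.readUInt32LE(0), 8)}-${hex(b.readUInt16LE(4), 4)}-` +
         `${hex(b.readUInt16LE(6), 4)}-${tail.slice(0, 4)}-${tail.slice(4)}`;
}

// Returns the CLSID as a lowercase string, or null if the buffer is not a usable CFB file.
function readRootClsid(file) {
  if (file.length < 512 || !file.subarray(0, 8).equals(CFB_MAGIC)) return null;
  const sectorShift = file.readUInt16LE(30);
  if (sectorShift !== 9 && sectorShift !== 12) return null; // 512 or 4096 byte sectors only
  const sectorSize = 1 << sectorShift;
  const firstDirSector = file.readUInt32LE(48);
  // The header occupies the first sector, so sector n starts at (n + 1) * sectorSize.
  const clsidOffset = (firstDirSector + 1) * sectorSize + 80;
  if (clsidOffset + 16 > file.length) return null;
  return formatGuid(file.subarray(clsidOffset, clsidOffset + 16));
}
```

Calling readRootClsid on the bytes of a legacy Word document should, under these assumptions, return 00020906-0000-0000-c000-000000000046, which can then be looked up in a table like Formats above.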
user2173353 has what appears to be the correct solution for detecting the new Office .docx / .xlsx formats.
To add some details to this, the below check appears to identify these correctly:
/// <summary>
/// MS .docx, .xlsx and other Office extensions are (correctly) identified as zip files by signature lookup.
/// This tests whether System.IO.Packaging can open the stream; if the package has parts, it is an Office package rather than a plain zip file.
/// </summary>
/// <param name="stream"></param>
/// <returns></returns>
private static bool IsPackage(this Stream stream)
{
Package package = Package.Open(stream, FileMode.Open, FileAccess.Read);
return package.GetParts().Any();
}
