Detect the encoding of a text file using C# - c#

I have a set of markdown files to be passed to jekyll project , need to find the encoding format of them i.e UTF-8 with BOM or UTF-8 without BOM or ANSI using a program or a API .
if i pass the location of the files , the files have to be listed,read and the encoding should be produced as result .
Is there any Code or API for it ?
i have already tried the sr.CurrentEncoding for stream reader as mentioned in Effective way to find any file's Encoding but the result varies with the result from a notepad++ result .
also tried to use https://github.com/errepi/ude ( Mozilla Universal Charset Detector) as suggested in https://social.msdn.microsoft.com/Forums/vstudio/en-US/862e3342-cc88-478f-bca2-e2de6f60d2fb/detect-encoding-of-the-file?forum=csharpgeneral by implementing the ude.dll in the c# project but the result is not effective as in notepad++ , the file encoding is shown as utf-8 , but from the program , the result is utf-8 with BOM.
but i should get same result from both ways , so where the problem has occurred?

Detecting encoding is always a tricky business, but detecting BOMs is dead simple. To get the BOM as byte array, just use the GetPreamble() function of the encoding objects. This should allow you to detect a whole range of encodings by preamble.
Now, as for detecting UTF-8 without preamble, actually that's not very hard either. See, UTF8 has strict bitwise rules about what values are expected in a valid sequence, and you can initialize a UTF8Encoding object in a way that will fail by throwing an exception when these sequences are incorrect.
So if you first do the BOM check, and then the strict decoding check, and finally fall back to Win-1252 encoding (what you call "ANSI") then your detection is done.
Byte[] bytes = File.ReadAllBytes(filename);
Encoding encoding = null;
String text = null;
// Test UTF8 with BOM. This check can easily be copied and adapted
// to detect many other encodings that use BOMs.
UTF8Encoding encUtf8Bom = new UTF8Encoding(true, true);
Boolean couldBeUtf8 = true;
Byte[] preamble = encUtf8Bom.GetPreamble();
Int32 prLen = preamble.Length;
if (bytes.Length >= prLen && preamble.SequenceEqual(bytes.Take(prLen)))
{
// UTF8 BOM found; use encUtf8Bom to decode.
try
{
// Seems that despite being an encoding with preamble,
// it doesn't actually skip said preamble when decoding...
text = encUtf8Bom.GetString(bytes, prLen, bytes.Length - prLen);
encoding = encUtf8Bom;
}
catch (ArgumentException)
{
// Confirmed as not UTF-8!
couldBeUtf8 = false;
}
}
// use boolean to skip this if it's already confirmed as incorrect UTF-8 decoding.
if (couldBeUtf8 && encoding == null)
{
// test UTF-8 on strict encoding rules. Note that on pure ASCII this will
// succeed as well, since valid ASCII is automatically valid UTF-8.
UTF8Encoding encUtf8NoBom = new UTF8Encoding(false, true);
try
{
text = encUtf8NoBom.GetString(bytes);
encoding = encUtf8NoBom;
}
catch (ArgumentException)
{
// Confirmed as not UTF-8!
}
}
// fall back to default ANSI encoding.
if (encoding == null)
{
encoding = Encoding.GetEncoding(1252);
text = encoding.GetString(bytes);
}
Note that Windows-1252 (US / Western European ANSI) is a one-byte-per-character encoding, meaning everything in it produces a technically valid character, so unless you go for heuristic methods, no further detection can be done on it to distinguish it from other one-byte-per-character encodings.

Necromancing.
First, you check the Byte-Order Mark:
If that doesn't work, you can try to infer the encoding from the text-content with Mozilla Universal Charset Detector C# port.
If that doesn't work, you just return the CurrentCulture/InstalledUiCulture/System-Encoding - or whatever.
if the system-encoding doesn't work, we can either return ASCII or UTF8. Since entries 0-127 of UTF8 are identical to ASCII, we so simply return UTF8.
Example (DetectOrGuessEncoding):
namespace SQLMerge
{
class EncodingDetector
{
public static System.Text.Encoding BomInfo(string srcFile)
{
return BomInfo(srcFile, false);
} // End Function BomInfo
public static System.Text.Encoding BomInfo(string srcFile, bool thorough)
{
byte[] b = new byte[5];
using (System.IO.FileStream file = new System.IO.FileStream(srcFile, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read))
{
int numRead = file.Read(b, 0, 5);
if (numRead < 5)
System.Array.Resize(ref b, numRead);
file.Close();
} // End Using file
if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) // UTF32-BE
return System.Text.Encoding.GetEncoding("utf-32BE"); // UTF-32, big-endian
else if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) // UTF32-LE
return System.Text.Encoding.UTF32; // UTF-32, little-endian
// https://en.wikipedia.org/wiki/Byte_order_mark#cite_note-14
else if (b.Length >= 4 && b[0] == 0x2b && b[1] == 0x2f && b[2] == 0x76 && (b[3] == 0x38 || b[3] == 0x39 || b[3] == 0x2B || b[3] == 0x2F)) // UTF7
return System.Text.Encoding.UTF7; // UTF-7
else if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) // UTF-8
return System.Text.Encoding.UTF8; // UTF-8
else if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF) // UTF16-BE
return System.Text.Encoding.BigEndianUnicode; // UTF-16, big-endian
else if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE) // UTF16-LE
return System.Text.Encoding.Unicode; // UTF-16, little-endian
// Maybe there is a future encoding ...
// PS: The above yields more than this - this doesn't find UTF7 ...
if (thorough)
{
System.Collections.Generic.List<System.Collections.Generic.KeyValuePair<System.Text.Encoding, byte[]>> lsPreambles =
new System.Collections.Generic.List<System.Collections.Generic.KeyValuePair<System.Text.Encoding, byte[]>>();
foreach (System.Text.EncodingInfo ei in System.Text.Encoding.GetEncodings())
{
System.Text.Encoding enc = ei.GetEncoding();
byte[] preamble = enc.GetPreamble();
if (preamble == null)
continue;
if (preamble.Length == 0)
continue;
if (preamble.Length > b.Length)
continue;
System.Collections.Generic.KeyValuePair<System.Text.Encoding, byte[]> kvp =
new System.Collections.Generic.KeyValuePair<System.Text.Encoding, byte[]>(enc, preamble);
lsPreambles.Add(kvp);
} // Next ei
// li.Sort((a, b) => a.CompareTo(b)); // ascending sort
// li.Sort((a, b) => b.CompareTo(a)); // descending sort
lsPreambles.Sort(
delegate (
System.Collections.Generic.KeyValuePair<System.Text.Encoding, byte[]> kvp1,
System.Collections.Generic.KeyValuePair<System.Text.Encoding, byte[]> kvp2)
{
return kvp2.Value.Length.CompareTo(kvp1.Value.Length);
}
);
for (int j = 0; j < lsPreambles.Count; ++j)
{
for (int i = 0; i < lsPreambles[j].Value.Length; ++i)
{
if (b[i] != lsPreambles[j].Value[i])
{
goto NEXT_J_AND_NOT_NEXT_I;
}
} // Next i
return lsPreambles[j].Key;
NEXT_J_AND_NOT_NEXT_I: continue;
} // Next j
} // End if (thorough)
return null;
} // End Function BomInfo
public static System.Text.Encoding DetectOrGuessEncoding(string fileName)
{
return DetectOrGuessEncoding(fileName, false);
}
public static System.Text.Encoding DetectOrGuessEncoding(string fileName, bool withOutput)
{
if (!System.IO.File.Exists(fileName))
return null;
System.ConsoleColor origBack = System.ConsoleColor.Black;
System.ConsoleColor origFore = System.ConsoleColor.White;
if (withOutput)
{
origBack = System.Console.BackgroundColor;
origFore = System.Console.ForegroundColor;
}
// System.Text.Encoding systemEncoding = System.Text.Encoding.Default; // Returns hard-coded UTF8 on .NET Core ...
System.Text.Encoding systemEncoding = GetSystemEncoding();
System.Text.Encoding enc = BomInfo(fileName);
if (enc != null)
{
if (withOutput)
{
System.Console.BackgroundColor = System.ConsoleColor.Green;
System.Console.ForegroundColor = System.ConsoleColor.White;
System.Console.WriteLine(fileName);
System.Console.WriteLine(enc);
System.Console.BackgroundColor = origBack;
System.Console.ForegroundColor = origFore;
}
return enc;
}
using (System.IO.Stream strm = System.IO.File.OpenRead(fileName))
{
UtfUnknown.DetectionResult detect = UtfUnknown.CharsetDetector.DetectFromStream(strm);
if (detect != null && detect.Details != null && detect.Details.Count > 0 && detect.Details[0].Confidence < 1)
{
if (withOutput)
{
System.Console.BackgroundColor = System.ConsoleColor.Red;
System.Console.ForegroundColor = System.ConsoleColor.White;
System.Console.WriteLine(fileName);
System.Console.WriteLine(detect);
System.Console.BackgroundColor = origBack;
System.Console.ForegroundColor = origFore;
}
foreach (UtfUnknown.DetectionDetail detail in detect.Details)
{
if (detail.Encoding == systemEncoding
|| detail.Encoding == System.Text.Encoding.UTF8
)
return detail.Encoding;
}
return detect.Details[0].Encoding;
}
else if (detect != null && detect.Details != null && detect.Details.Count > 0)
{
if (withOutput)
{
System.Console.BackgroundColor = System.ConsoleColor.Green;
System.Console.ForegroundColor = System.ConsoleColor.White;
System.Console.WriteLine(fileName);
System.Console.WriteLine(detect);
System.Console.BackgroundColor = origBack;
System.Console.ForegroundColor = origFore;
}
return detect.Details[0].Encoding;
}
enc = GetSystemEncoding();
if (withOutput)
{
System.Console.BackgroundColor = System.ConsoleColor.DarkRed;
System.Console.ForegroundColor = System.ConsoleColor.Yellow;
System.Console.WriteLine(fileName);
System.Console.Write("Assuming ");
System.Console.Write(enc.WebName);
System.Console.WriteLine("...");
System.Console.BackgroundColor = origBack;
System.Console.ForegroundColor = origFore;
}
return systemEncoding;
} // End Using strm
} // End Function DetectOrGuessEncoding
public static System.Text.Encoding GetSystemEncoding()
{
// The OEM code page for use by legacy console applications
// int oem = System.Globalization.CultureInfo.CurrentCulture.TextInfo.OEMCodePage;
// The ANSI code page for use by legacy GUI applications
// int ansi = System.Globalization.CultureInfo.InstalledUICulture.TextInfo.ANSICodePage; // Machine
int ansi = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage; // User
try
{
// https://stackoverflow.com/questions/38476796/how-to-set-net-core-in-if-statement-for-compilation
#if ( NETSTANDARD && !NETSTANDARD1_0 ) || NETCORE || NETCOREAPP3_0 || NETCOREAPP3_1
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
#endif
System.Text.Encoding enc = System.Text.Encoding.GetEncoding(ansi);
return enc;
}
catch (System.Exception)
{ }
try
{
foreach (System.Text.EncodingInfo ei in System.Text.Encoding.GetEncodings())
{
System.Text.Encoding e = ei.GetEncoding();
// 20'127: US-ASCII
if (e.WindowsCodePage == ansi && e.CodePage != 20127)
{
return e;
}
}
}
catch (System.Exception)
{ }
// return System.Text.Encoding.GetEncoding("iso-8859-1");
return System.Text.Encoding.UTF8;
} // End Function GetSystemEncoding
} // End Class
}

namespace WindowsFormsApp2
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
List<FilePath> filePaths = new List<FilePath>();
filePaths = GetLstPaths();
}
public static List<FilePath> GetLstPaths()
{
#region Getting Files
DirectoryInfo directoryInfo = new DirectoryInfo(#"C:\Users\Safi\Desktop\ss\");
DirectoryInfo directoryTargetInfo = new DirectoryInfo(#"C:\Users\Safi\Desktop\ss1\");
FileInfo[] fileInfos = directoryInfo.GetFiles("*.txt");
List<FilePath> lstFiles = new List<FilePath>();
foreach (FileInfo fileInfo in fileInfos)
{
Encoding enco = GetLittleIndianFiles(directoryInfo + fileInfo.Name);
string filePath = directoryInfo + fileInfo.Name;
string targetFilePath = directoryTargetInfo + fileInfo.Name;
if (enco != null)
{
FilePath f1 = new FilePath();
f1.filePath = filePath;
f1.targetFilePath = targetFilePath;
lstFiles.Add(f1);
}
}
int count = 0;
lstFiles.ForEach(d =>
{
count++;
});
MessageBox.Show(Convert.ToString(count) + "Files are Converted");
#endregion
return lstFiles;
}
public static Encoding GetLittleIndianFiles(string srcFile)
{
byte[] b = new byte[5];
using (System.IO.FileStream file = new System.IO.FileStream(srcFile, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read))
{
int numRead = file.Read(b, 0, 5);
if (numRead < 5)
System.Array.Resize(ref b, numRead);
file.Close();
} // End Using file
if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
return System.Text.Encoding.Unicode; // UTF-16, little-endian
return null;
}
}
public class FilePath
{
public string filePath { get; set; }
public string targetFilePath { get; set; }
}
}

Related

Python to C# Zlib decompression conversion

It tooks me hours trying to figure out how to mimic the exact decompression in this project: https://github.com/crimsoncantab/aok-hotkeys/blob/master/modules/hkizip.py but to no avail.
If you go to this website: http://aokhotkeys.appspot.com/ you can click any preset and download it to see for yourself.
The decompression is supposed to output a .hki file to text file
This is my attempt, which may not be as accurate as in the python project linked above:
private string ZlibCodecDecompress(byte[] compressed)
{
int outputSize = 2048;
byte[] output = new Byte[outputSize];
// If you have a ZLIB stream, set this to true. If you have
// a bare DEFLATE stream, set this to false.
bool expectRfc1950Header = false;
using (MemoryStream ms = new MemoryStream())
{
ZlibCodec compressor = new ZlibCodec();
compressor.InitializeInflate(expectRfc1950Header);
compressor.InputBuffer = compressed;
compressor.AvailableBytesIn = compressed.Length;
compressor.NextIn = 0;
compressor.CompressLevel = Ionic.Zlib.CompressionLevel.Level8;
compressor.OutputBuffer = output;
foreach (var f in new FlushType[] { FlushType.None, FlushType.Finish })
{
int bytesToWrite = 0;
do
{
compressor.AvailableBytesOut = outputSize;
compressor.NextOut = 0;
compressor.Inflate(f);
bytesToWrite = outputSize - compressor.AvailableBytesOut;
if (bytesToWrite > 0)
ms.Write(output, 0, bytesToWrite);
}
while ((f == FlushType.None && (compressor.AvailableBytesIn != 0 || compressor.AvailableBytesOut == 0)) ||
(f == FlushType.Finish && bytesToWrite != 0));
}
compressor.EndInflate();
return UTF8Encoding.UTF8.GetString(ms.ToArray());
}
}
This method returns: "Ionic.Zlib.ZlibException: Bad state"
I'd appreciate any help.

Reading a single UTF8 character from stream in C#

I am looking to read the next UTF8 character from a Stream or BinaryReader. Things that don't work:
BinaryReader::ReadChar -- this will throw on a 3 or 4 byte character. Since it returns a two byte structure, it has no choice.
BinaryReader::ReadChars -- this will throw if you ask it to read 1 character and it encounters a 3 or 4 byte character. Will read multiple characters if you ask it to read more than 1 character.
StreamReader::Read -- this needs to know how many bytes to read, but the number of bytes in a UTF8 character is variable.
The code I have that seems to work:
private char[] ReadUTF8Char(Stream s)
{
byte[] bytes = new byte[4];
var enc = new UTF8Encoding(false, true);
if (1 != s.Read(bytes, 0, 1))
return null;
if (bytes[0] <= 0x7F) //Single byte character
{
return enc.GetChars(bytes, 0, 1);
}
else
{
var remainingBytes =
((bytes[0] & 240) == 240) ? 3 : (
((bytes[0] & 224) == 224) ? 2 : (
((bytes[0] & 192) == 192) ? 1 : -1
));
if (remainingBytes == -1)
return null;
s.Read(bytes, 1, remainingBytes);
return enc.GetChars(bytes, 0, remainingBytes + 1);
}
}
Obviously, this is a bit of a mess, and somewhat specific to UTF8. Is there a more elegant, less custom, easier-to-read solution to this problem?
I know this question is a bit old but here is another solution. It is not as good in performance as the OPs solution (which I also prefer), but it only uses builtin-utf8-functionality without knowing about the utf8-encoding internals.
private static char ReadUTF8Char(Stream s)
{
if (s.Position >= s.Length)
throw new Exception("Error: Read beyond EOF");
using (BinaryReader reader = new BinaryReader(s, Encoding.Unicode, true))
{
int numRead = Math.Min(4, (int)(s.Length - s.Position));
byte[] bytes = reader.ReadBytes(numRead);
char[] chars = Encoding.UTF8.GetChars(bytes);
if (chars.Length == 0)
throw new Exception("Error: Invalid UTF8 char");
int charLen = Encoding.UTF8.GetByteCount(new char[] { chars[0] });
s.Position += (charLen - numRead);
return chars[0];
}
}
The encoding passed to the constructor of BinaryReader doesn't matter. I had to use this version of the constructor to leave the stream open. If you already have a binary reader you can just use this:
private static char ReadUTF8Char(BinaryReader reader)
{
var s = reader.BaseStream;
if (s.Position >= s.Length)
throw new Exception("Error: Read beyond EOF");
int numRead = Math.Min(4, (int)(s.Length - s.Position));
byte[] bytes = reader.ReadBytes(numRead);
char[] chars = Encoding.UTF8.GetChars(bytes);
if (chars.Length == 0)
throw new Exception("Error: Invalid UTF8 char");
int charLen = Encoding.UTF8.GetByteCount(new char[] { chars[0] });
s.Position += (charLen - numRead);
return chars[0];
}

How do you read UTF-8 characters from an infinite byte stream - C#

Normally, to read characters from a byte stream you use a StreamReader. In this example I'm reading records delimited by '\r' from an infinite stream.
using(var reader = new StreamReader(stream, Encoding.UTF8))
{
var messageBuilder = new StringBuilder();
var nextChar = 'x';
while (reader.Peek() >= 0)
{
nextChar = (char)reader.Read()
messageBuilder.Append(nextChar);
if (nextChar == '\r')
{
ProcessBuffer(messageBuilder.ToString());
messageBuilder.Clear();
}
}
}
The problem is that the StreamReader has a small internal buffer, so if the code waiting for an 'end of record' delimiter ('\r' in this case) it has to wait until the StreamReader's internal buffer is flushed (usually because more bytes have arrived).
This alternative implementation works for single byte UTF-8 characters, but will fail on multibyte characters.
int byteAsInt = 0;
var messageBuilder = new StringBuilder();
while ((byteAsInt = stream.ReadByte()) != -1)
{
var nextChar = Encoding.UTF8.GetChars(new[]{(byte) byteAsInt});
Console.Write(nextChar[0]);
messageBuilder.Append(nextChar);
if (nextChar[0] == '\r')
{
ProcessBuffer(messageBuilder.ToString());
messageBuilder.Clear();
}
}
How can I modify this code so that it works with multi-byte characters?
Rather than Encoding.UTF8.GetChars which is designed to convert complete buffers, get an instance of Decoder and repeatedly call its member method GetChars this will make use of the Decoder's internal buffer to handle partial multi-byte sequences from the end of one call to the next.
Thanks to Richard, I now have a working infinite stream reader. As he explained, the trick is to use a Decoder instance and call its GetChars method. I've tested it with multi-byte Japanese text and it works fine.
int byteAsInt = 0;
var messageBuilder = new StringBuilder();
var decoder = Encoding.UTF8.GetDecoder();
var nextChar = new char[1];
while ((byteAsInt = stream.ReadByte()) != -1)
{
var charCount = decoder.GetChars(new[] {(byte) byteAsInt}, 0, 1, nextChar, 0);
if(charCount == 0) continue;
Console.Write(nextChar[0]);
messageBuilder.Append(nextChar);
if (nextChar[0] == '\r')
{
ProcessBuffer(messageBuilder.ToString());
messageBuilder.Clear();
}
}
I don't understand why you're not using the stream reader's ReadLine method. If there's a good reason not to, however, it nonetheless seems to me that repeatedly calling GetChars on the decoder is inefficient. Why not make use of the fact that the byte representation of '\r' can't be part of a multi-byte sequence? (Bytes in a multi-byte sequence must be greater than 127; that is, they have the highest bit set.)
var messageBuilder = new List<byte>();
int byteAsInt;
while ((byteAsInt = stream.ReadByte()) != -1)
{
messageBuilder.Add((byte)byteAsInt);
if (byteAsInt == '\r')
{
var messageString = Encoding.UTF8.GetString(messageBuilder.ToArray());
Console.Write(messageString);
ProcessBuffer(messageString);
messageBuilder.Clear();
}
}
Mike,
I found your solution perfect for my situation as well. But I noticed that sometimes it takes four GetChar() calls to determine the characters to be returned. This meant that charCount was 2, while my nextChar buffer size was 1. So I got error "The output character buffer is too small to contain the decoded characters, encoding Unicode fallback System.Text.DecoderReplacementFallback."
I changed my code to:
// ...
var nextChar = new char[4]; // 2 might suffice
for (var i = startPos; i < bytesRead; i++)
{
int charCount;
//...
charCount = decoder.GetChars(buffer, i, 1, nextChar, 0);
if (charCount == 0)
{
bytesSkipped++;
continue;
}
for (int ic = 0; ic < charCount; ic++)
{
char c = nextChar[ic];
charPos++;
// Process character here...
}
}

How to verify this PGP message in C# using Bouncy Castle or other c# library

All i need to do is verify the message below but I can not get Bouncy Castle to take the data in and given the public key verify the message. I am happy for it to be some other Lib that is used if it is free. This is to be embedded in my app that receives data over the Internet so i would prefer to keep it all managed code if at all possible.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
SCI Version: 1.0
SCI Code: 1
SCI Reason: OK
SCI Balance: 0.00050000
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MBC v1.0
iQEcBAEBAgAGBQJOGSuYAAoJEJ+5g06lAnqFkdsH/0NoqQbNvR8ZPe8D4gL4gvff
6K1t2LOt0sQGj+RSPeEbag7ZnVNI65LiES/yie1N6cXMkFgb9/ttjxi9/wlbxD/j
gSkuZ6mT9Oc5ExLsRZq9ygytvVs7Ol7uQm6oxDzJX1JMs0ls2EwJbmmpTEOHn8Av
dGlxdZeh+3RlqHJmOdssQCJ0cw5VXuj5vfP35OYz2zO2+sNg0eCXdR5Ml+2S7n3U
n9VHPEECg72LvpxF/y/nApopXoHpwECXoBwHgyd9QIIw1IJgalyRLDmAJ2WXdROV
ln2Mkt/km3KtBS3h4QL407wi/KhgZ4tFohZupt7zq2zUwtHWOhbL2KSUu939OKk=
=mIjM
-----END PGP SIGNATURE-----
For those interested i discovered an example for this exact task in the BouncyCastle source code. You need to download the source code not the binary to get the examples and it seems to have examples for all the different OpenPGP use cases.
Following Seer's suggestion to look at this example I finally got message verification running.
I have method VerifyFile which takes signed message and public key and returns the content of the message if the verification passes. For example it can be used like this:
string key = #"-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: BCPG C# v1.6.1.0
mQENBFpc87wBCACK5FG6Z70iovzSzQF7OEB/YbKF7SPS1/GZAQhp/2n2G8x5Lxj5
/CKqR8JLj1+222unuWgfqvfny0fLvttt1r6lAH/kqDYMBg26GTbZy93R5BYatBjd
pzYl/lIyKxc/QwDdZm8zNxeUpDSfoe9jVULOg0MiMDtdQupOf6CanlEioXfyf88F
1BLcJyFSckaYieosBw5hnnI+1cZZ3k+4HpDJJslVzngfTPWRibtN5PKff1CKP55E
ME99XkuPDaNL7XZmu7lZSEUN3jJFVydZQrDkvxddihzV4pTgRI3gDAFoJxxIYZX3
JsQAJItlqq8bBsQ+bKPikgAiMySGcEi+ilI5ABEBAAG0GnNoYWxhbWFub3YubWFy
aW5AZ21haWwuY29tiQEcBBABAgAGBQJaXPO8AAoJEBvHdfmVFHzkvHEH/179VAdH
gWRN5HVprsp4vHP3q1CJV9j+fPlQIZU3JEwrM+INxzpfSqZeN4NwB7yoo2NCdCNP
Ndg8zhiuEYM51hNtqU5cwYBcaAbm1so6TSVo8i4nrfN3+oDYEfYPqglNrd1V233J
oyLriwpGkR6RBYMY2q2Re+EFNR1bxUmeE0wnb8FOodRCSh0Wd3Iy9mvmhv5voHIr
aZzgsuifGw1JilSu9+RoC6b1CHb9jUkWQ/odkTvl5/rxA14TKstgoLoSLHktYQfw
le6B8+lPtmODtagWoDEeR/M0zm/wyCOt5wqjjJCgvaipUaA+oiijIYwCpqUBwfm3
DZ9DStGHGVxQQnc=
=s91O
-----END PGP PUBLIC KEY BLOCK-----
";
string message = #"-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
test tes tes ts tse tse t
-----BEGIN PGP SIGNATURE-----
Version: OpenPGP.js v1.0.1
Comment: http://openpgpjs.org
wsBcBAEBCAAQBQJaXP5WCRAbx3X5lRR85AAAUcoH/jtcyWcpTVyXyz/ptWLo
Hx+g51EeeA0Hpq7kZCXu4FuyhNn/QvnvKyt9qegxQoRSZhT37ln8t80NW6sS
B4XVFziq8TBkjPuaYBI/ijbLigdPMEi81PsOpIXx3BXKCt27TLmUVHpFTWPa
u2NQUQl3k3Xc0H1sy1A+jmjfvCyqWxTOU1IY4rlzRKHdp+D1oCz5iKfyfUko
ktAZgqOMx5pWL975YhM793MS8aYqhOdQpeuW401fm18xxwE4x6svSSys+qq8
MdkL/i7YVjUKr/M8SIrGPb/IjKwClM7jfpN+sHv0p/GcQ7J1kmXYUdA6AJp5
Z0vYk4aPcMSlrPwdRX21I9w=
=zXfe
-----END PGP SIGNATURE-----
";
MemoryStream messageStream = new MemoryStream(Encoding.ASCII.GetBytes(message));
MemoryStream keyStream = new MemoryStream(Encoding.ASCII.GetBytes(key));
try {
var msg= VerifyFile(messageStream,
PgpUtilities.GetDecoderStream(keyStream));
// verification passes msg is the content of the message
} catch (Exception e) {
// verification fails
}
And here is the implementation:
private static String VerifyFile(Stream inputStream, Stream keyIn)
{
ArmoredInputStream aIn = new ArmoredInputStream(inputStream);
MemoryStream outStr = new MemoryStream(); // File.Create(resultName);
//
// write out signed section using the local line separator.
// note: trailing white space needs to be removed from the end of
// each line RFC 4880 Section 7.1
//
MemoryStream lineOut = new MemoryStream();
int lookAhead = ReadInputLine(lineOut, aIn);
byte[] lineSep = LineSeparator;
if (lookAhead != -1 && aIn.IsClearText())
{
byte[] line = lineOut.ToArray();
outStr.Write(line, 0, GetLengthWithoutSeparatorOrTrailingWhitespace(line));
outStr.Write(lineSep, 0, lineSep.Length);
while (lookAhead != -1 && aIn.IsClearText())
{
lookAhead = ReadInputLine(lineOut, lookAhead, aIn);
line = lineOut.ToArray();
outStr.Write(line, 0, GetLengthWithoutSeparatorOrTrailingWhitespace(line));
outStr.Write(lineSep, 0, lineSep.Length);
}
}
else
{
// a single line file
if (lookAhead != -1)
{
byte[] line = lineOut.ToArray();
outStr.Write(line, 0, GetLengthWithoutSeparatorOrTrailingWhitespace(line));
outStr.Write(lineSep, 0, lineSep.Length);
}
}
outStr.Flush();
//outStr.Close();
PgpPublicKeyRingBundle pgpRings = new PgpPublicKeyRingBundle(keyIn);
PgpObjectFactory pgpFact = new PgpObjectFactory(aIn);
PgpSignatureList p3 = (PgpSignatureList)pgpFact.NextPgpObject();
PgpSignature sig = p3[0];
var key = pgpRings.GetPublicKey(sig.KeyId);
if (key == null)
{
throw new Exception("Can't verify the message signature.");
}
sig.InitVerify(key);
//
// read the input, making sure we ignore the last newline.
//
outStr.Seek(0, SeekOrigin.Begin);
StreamReader reader = new StreamReader(outStr);
string messageContent = reader.ReadToEnd();
outStr.Seek(0, SeekOrigin.Begin);
Stream sigIn = outStr; //File.OpenRead(resultName);
lookAhead = ReadInputLine(lineOut, sigIn);
ProcessLine(sig, lineOut.ToArray());
if (lookAhead != -1)
{
do
{
lookAhead = ReadInputLine(lineOut, lookAhead, sigIn);
sig.Update((byte)'\r');
sig.Update((byte)'\n');
ProcessLine(sig, lineOut.ToArray());
}
while (lookAhead != -1);
}
sigIn.Close();
if (sig.Verify()) {
// signature verified
return messageContent;
} else {
// signature verification failed
throw new Exception("Can't verify the message signature.");
}
}
private static int ReadInputLine(
MemoryStream bOut,
Stream fIn)
{
bOut.SetLength(0);
int lookAhead = -1;
int ch;
while ((ch = fIn.ReadByte()) >= 0)
{
bOut.WriteByte((byte)ch);
if (ch == '\r' || ch == '\n')
{
lookAhead = ReadPassedEol(bOut, ch, fIn);
break;
}
}
return lookAhead;
}
private static int ReadPassedEol(
MemoryStream bOut,
int lastCh,
Stream fIn)
{
int lookAhead = fIn.ReadByte();
if (lastCh == '\r' && lookAhead == '\n')
{
bOut.WriteByte((byte)lookAhead);
lookAhead = fIn.ReadByte();
}
return lookAhead;
}
private static void ProcessLine(
PgpSignature sig,
byte[] line)
{
// note: trailing white space needs to be removed from the end of
// each line for signature calculation RFC 4880 Section 7.1
int length = GetLengthWithoutWhiteSpace(line);
if (length > 0)
{
sig.Update(line, 0, length);
}
}
private static void ProcessLine(
Stream aOut,
PgpSignatureGenerator sGen,
byte[] line)
{
int length = GetLengthWithoutWhiteSpace(line);
if (length > 0)
{
sGen.Update(line, 0, length);
}
aOut.Write(line, 0, line.Length);
}
private static int GetLengthWithoutSeparatorOrTrailingWhitespace(byte[] line)
{
int end = line.Length - 1;
while (end >= 0 && IsWhiteSpace(line[end]))
{
end--;
}
return end + 1;
}
private static bool IsLineEnding(
byte b)
{
return b == '\r' || b == '\n';
}
private static int GetLengthWithoutWhiteSpace(
byte[] line)
{
int end = line.Length - 1;
while (end >= 0 && IsWhiteSpace(line[end]))
{
end--;
}
return end + 1;
}
private static bool IsWhiteSpace(
byte b)
{
return IsLineEnding(b) || b == '\t' || b == ' ';
}
private static int ReadInputLine(
MemoryStream bOut,
int lookAhead,
Stream fIn)
{
bOut.SetLength(0);
int ch = lookAhead;
do
{
bOut.WriteByte((byte)ch);
if (ch == '\r' || ch == '\n')
{
lookAhead = ReadPassedEol(bOut, ch, fIn);
break;
}
}
while ((ch = fIn.ReadByte()) >= 0);
if (ch < 0)
{
lookAhead = -1;
}
return lookAhead;
}
private static byte[] LineSeparator
{
get { return Encoding.ASCII.GetBytes(Environment.NewLine); }
}
using System;
using System.Collections;
using System.IO;
using Org.BouncyCastle.Bcpg.OpenPgp;
namespace Org.BouncyCastle.Bcpg.OpenPgp.Examples
{
/**
* A simple utility class that signs and verifies files.
* <p>
* To sign a file: SignedFileProcessor -s [-a] fileName secretKey passPhrase.<br/>
* If -a is specified the output file will be "ascii-armored".</p>
* <p>
* To decrypt: SignedFileProcessor -v fileName publicKeyFile.</p>
* <p>
* <b>Note</b>: this example will silently overwrite files, nor does it pay any attention to
* the specification of "_CONSOLE" in the filename. It also expects that a single pass phrase
* will have been used.</p>
* <p>
* <b>Note</b>: the example also makes use of PGP compression. If you are having difficulty Getting it
* to interoperate with other PGP programs try removing the use of compression first.</p>
*/
public sealed class SignedFileProcessor
{
private SignedFileProcessor() {}
/**
* verify the passed in file as being correctly signed.
*/
private static void VerifyFile(
Stream inputStream,
Stream keyIn)
{
inputStream = PgpUtilities.GetDecoderStream(inputStream);
PgpObjectFactory pgpFact = new PgpObjectFactory(inputStream);
PgpCompressedData c1 = (PgpCompressedData) pgpFact.NextPgpObject();
pgpFact = new PgpObjectFactory(c1.GetDataStream());
PgpOnePassSignatureList p1 = (PgpOnePassSignatureList) pgpFact.NextPgpObject();
PgpOnePassSignature ops = p1[0];
PgpLiteralData p2 = (PgpLiteralData) pgpFact.NextPgpObject();
Stream dIn = p2.GetInputStream();
PgpPublicKeyRingBundle pgpRing = new PgpPublicKeyRingBundle(PgpUtilities.GetDecoderStream(keyIn));
PgpPublicKey key = pgpRing.GetPublicKey(ops.KeyId);
Stream fos = File.Create(p2.FileName);
ops.InitVerify(key);
int ch;
while ((ch = dIn.ReadByte()) >= 0)
{
ops.Update((byte)ch);
fos.WriteByte((byte) ch);
}
fos.Close();
PgpSignatureList p3 = (PgpSignatureList)pgpFact.NextPgpObject();
PgpSignature firstSig = p3[0];
if (ops.Verify(firstSig))
{
Console.Out.WriteLine("signature verified.");
}
else
{
Console.Out.WriteLine("signature verification failed.");
}
}
/**
* Generate an encapsulated signed file.
*
* #param fileName
* #param keyIn
* #param outputStream
* #param pass
* #param armor
*/
private static void SignFile(
string fileName,
Stream keyIn,
Stream outputStream,
char[] pass,
bool armor,
bool compress)
{
if (armor)
{
outputStream = new ArmoredOutputStream(outputStream);
}
PgpSecretKey pgpSec = PgpExampleUtilities.ReadSecretKey(keyIn);
PgpPrivateKey pgpPrivKey = pgpSec.ExtractPrivateKey(pass);
PgpSignatureGenerator sGen = new PgpSignatureGenerator(pgpSec.PublicKey.Algorithm, HashAlgorithmTag.Sha1);
sGen.InitSign(PgpSignature.BinaryDocument, pgpPrivKey);
foreach (string userId in pgpSec.PublicKey.GetUserIds())
{
PgpSignatureSubpacketGenerator spGen = new PgpSignatureSubpacketGenerator();
spGen.SetSignerUserId(false, userId);
sGen.SetHashedSubpackets(spGen.Generate());
// Just the first one!
break;
}
Stream cOut = outputStream;
PgpCompressedDataGenerator cGen = null;
if (compress)
{
cGen = new PgpCompressedDataGenerator(CompressionAlgorithmTag.ZLib);
cOut = cGen.Open(cOut);
}
BcpgOutputStream bOut = new BcpgOutputStream(cOut);
sGen.GenerateOnePassVersion(false).Encode(bOut);
FileInfo file = new FileInfo(fileName);
PgpLiteralDataGenerator lGen = new PgpLiteralDataGenerator();
Stream lOut = lGen.Open(bOut, PgpLiteralData.Binary, file);
FileStream fIn = file.OpenRead();
int ch = 0;
while ((ch = fIn.ReadByte()) >= 0)
{
lOut.WriteByte((byte) ch);
sGen.Update((byte)ch);
}
fIn.Close();
lGen.Close();
sGen.Generate().Encode(bOut);
if (cGen != null)
{
cGen.Close();
}
if (armor)
{
outputStream.Close();
}
}
public static void Main(
string[] args)
{
// TODO provide command-line option to determine whether to use compression in SignFile
if (args[0].Equals("-s"))
{
Stream keyIn, fos;
if (args[1].Equals("-a"))
{
keyIn = File.OpenRead(args[3]);
fos = File.Create(args[2] + ".asc");
SignFile(args[2], keyIn, fos, args[4].ToCharArray(), true, true);
}
else
{
keyIn = File.OpenRead(args[2]);
fos = File.Create(args[1] + ".bpg");
SignFile(args[1], keyIn, fos, args[3].ToCharArray(), false, true);
}
keyIn.Close();
fos.Close();
}
else if (args[0].Equals("-v"))
{
using (Stream fis = File.OpenRead(args[1]),
keyIn = File.OpenRead(args[2]))
{
VerifyFile(fis, keyIn);
}
}
else
{
Console.Error.WriteLine("usage: SignedFileProcessor -v|-s [-a] file keyfile [passPhrase]");
}
}
}
}

c#: how to read parts of a file? (DICOM)

I would like to read a DICOM file in C#. I don't want to do anything fancy, I just for now would like to know how to read in the elements, but first I would actually like to know how to read the header to see if is a valid DICOM file.
It consists of Binary Data Elements. The first 128 bytes are unused (set to zero), followed by the string 'DICM'. This is followed by header information, which is organized into groups.
A sample DICOM header
First 128 bytes: unused DICOM format.
Followed by the characters 'D','I','C','M'
Followed by extra header information such as:
0002,0000, File Meta Elements Groups Len: 132
0002,0001, File Meta Info Version: 256
0002,0010, Transfer Syntax UID: 1.2.840.10008.1.2.1.
0008,0000, Identifying Group Length: 152
0008,0060, Modality: MR
0008,0070, Manufacturer: MRIcro
In the above example, the header is organized into groups. The group 0002 hex is the file meta information group which contains 3 elements: one defines the group length, one stores the file version and the their stores the transfer syntax.
Questions
How to I read the header file and verify if it is a DICOM file by checking for the 'D','I','C','M' characters after the 128 byte preamble?
How do I continue to parse the file reading the other parts of the data?
Something like this should read the file, its basic and doesn't handle all cases, but it would be a starting point:
public void ReadFile(string filename)
{
using (FileStream fs = File.OpenRead(filename))
{
fs.Seek(128, SeekOrigin.Begin);
if ((fs.ReadByte() != (byte)'D' ||
fs.ReadByte() != (byte)'I' ||
fs.ReadByte() != (byte)'C' ||
fs.ReadByte() != (byte)'M'))
{
Console.WriteLine("Not a DCM");
return;
}
BinaryReader reader = new BinaryReader(fs);
ushort g;
ushort e;
do
{
g = reader.ReadUInt16();
e = reader.ReadUInt16();
string vr = new string(reader.ReadChars(2));
long length;
if (vr.Equals("AE") || vr.Equals("AS") || vr.Equals("AT")
|| vr.Equals("CS") || vr.Equals("DA") || vr.Equals("DS")
|| vr.Equals("DT") || vr.Equals("FL") || vr.Equals("FD")
|| vr.Equals("IS") || vr.Equals("LO") || vr.Equals("PN")
|| vr.Equals("SH") || vr.Equals("SL") || vr.Equals("SS")
|| vr.Equals("ST") || vr.Equals("TM") || vr.Equals("UI")
|| vr.Equals("UL") || vr.Equals("US"))
length = reader.ReadUInt16();
else
{
// Read the reserved byte
reader.ReadUInt16();
length = reader.ReadUInt32();
}
byte[] val = reader.ReadBytes((int) length);
} while (g == 2);
fs.Close();
}
return ;
}
The code does not actually try and take into account that the transfer syntax of the encoded data can change after the group 2 elements, it also doesn't try and do anything with the actual values read in.
Just some pseudologic
How to I read the header file and verify if it is a DICOM file by checking for the 'D','I','C','M' characters after the 128 byte preamble?
Open as binary file, using File.OpenRead
Seek to position 128 and read 4 bytes into the array and compare it againts byte[] value for DICM. You can use ASCIIEncoding.GetBytes() for that
How do I continue to parse the file reading the other parts of the data?
Continue reading the file using Read or ReadByte using the FileStream object handle that you have earlier
Use the same method like above to do your comparison.
Dont forget to close and dispose the file.
you can also use like this.
FileStream fs = File.OpenRead(path);
byte[] data = new byte[132];
fs.Read(data, 0, data.Length);
int b0 = data[0] & 255, b1 = data[1] & 255, b2 = data[2] & 255, b3 = data[3] & 255;
if (data[128] == 68 && data[129] == 73 && data[130] == 67 && data[131] == 77)
{
//dicom file
}
else if ((b0 == 8 || b0 == 2) && b1 == 0 && b3 == 0)
{
//dicom file
}
Taken from EvilDicom.Helper.DicomReader from the Evil Dicom library:
public static bool IsValidDicom(BinaryReader r)
{
try
{
//128 null bytes
byte[] nullBytes = new byte[128];
r.Read(nullBytes, 0, 128);
foreach (byte b in nullBytes)
{
if (b != 0x00)
{
//Not valid
Console.WriteLine("Missing 128 null bit preamble. Not a valid DICOM file!");
return false;
}
}
}
catch (Exception)
{
Console.WriteLine("Could not read 128 null bit preamble. Perhaps file is too short");
return false;
}
try
{
//4 DICM characters
char[] dicm = new char[4];
r.Read(dicm, 0, 4);
if (dicm[0] != 'D' || dicm[1] != 'I' || dicm[2] != 'C' || dicm[3] != 'M')
{
//Not valid
Console.WriteLine("Missing characters D I C M in bits 128-131. Not a valid DICOM file!");
return false;
}
return true;
}
catch (Exception)
{
Console.WriteLine("Could not read DICM letters in bits 128-131.");
return false;
}
}

Categories

Resources