Read a .dat file in C#

I want to read a .dat file with the following layout:
First name: offset 21
First name format: ASCIIZ, 15 chars + \0
Middle initials: offset 37
ID: offset -8
ID format/length: unsigned int (4 bytes)
How can I do this in C#?
Thanks in advance.
Gurpreet
Sample .dat file contents:
( ÿ / rE ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ XÙþÞ¦d e e Mr. Sam Ascott Sam 9209 Sandpiper Lane 21204 410 5558987 410 5556700 275 MM229399098 (¬ Þ e ܤ•Þ„ œÔ£ÝáØáØ ’Þ[Þ €–˜ ä–˜ [Þ ¶ Norman Eaton Friend of Dr. Shultz Removal of #1,16,17 & 32 öÜÝ)Ý Ä d 01 21 21 21 e 101 22099 XÙþÞ¦d e . Mrs. 
Patty Baxter Patty 3838 Tommytrue Court 21234 410 2929290 410 3929209 FM218798127 HAY FEVER Þ . „¤¢Þè   _ÐÍÝBÒBÒ ’ÞÝ €–˜ ä–˜ ÍÝ f Joanne Abbey

Here is a tutorial on how to use BinaryReader for this purpose:
http://dotnetperls.com/binaryreader

You can use Jet OleDB to query .dat files:
var query = "select * from file.dat";
var connection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\file.dat;Extended Properties=\"text;HDR=NO;FMT=FixedLength\"");
See this link:
Code Project. Read Text File (txt, csv, log, tab, fixed length)
And check these:
Reading a sequential-access file
DAT files in C#
Read a file in C#
BinaryReader Class
Jon Skeet. Reading binary data in C#

As already mentioned, have a look at BinaryReader:
// Example: read an 8-byte ASCII field followed by a 32-bit integer
BinaryReader reader = new BinaryReader(stream);
string name = Encoding.ASCII.GetString(reader.ReadBytes(8));
int number = reader.ReadInt32();
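Applied to the layout in the question, the same idea might look like the sketch below. It assumes the whole record is already in a byte array, that offsets count from the start of that array, and that the ID is stored little-endian; the question confirms none of this. The names DatRecordSketch, ReadAsciiZ and ReadUInt32LittleEndian are made up for illustration. Because the stated ID offset of -8 is ambiguous (it may be relative to something other than the record start), the ID offset is left as a parameter.

```csharp
using System;
using System.Text;

public static class DatRecordSketch
{
    // Reads a NUL-terminated ASCII string from a fixed-size field.
    // fieldLength is the full field size, e.g. 16 for "15 chars + \0".
    public static string ReadAsciiZ(byte[] record, int offset, int fieldLength)
    {
        // Look for the terminating NUL inside the field only.
        int end = Array.IndexOf(record, (byte)0, offset, fieldLength);
        int length = end < 0 ? fieldLength : end - offset;
        return Encoding.ASCII.GetString(record, offset, length);
    }

    // Reads a 4-byte unsigned integer, least significant byte first.
    // Little-endian byte order is an assumption, not stated in the question.
    public static uint ReadUInt32LittleEndian(byte[] record, int offset)
    {
        return (uint)record[offset]
             | ((uint)record[offset + 1] << 8)
             | ((uint)record[offset + 2] << 16)
             | ((uint)record[offset + 3] << 24);
    }
}
```

With a record loaded through File.ReadAllBytes or BinaryReader.ReadBytes, ReadAsciiZ(record, 21, 16) would return the first name field under these assumptions.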

my .dat file content - " €U§µ­PÕ „ÕG¬u "
string fileName = @"W:\yourfilename.dat";
// Read the binary file as a byte array
byte[] bHex = File.ReadAllBytes(fileName);
// String builder for collecting the hex digits
StringBuilder st = new StringBuilder();
// 1-based position counted from the end of the file
int i = 0;
// Walk the bytes in reverse order so the most significant byte comes first
foreach (byte b in bHex.Reverse())
{
i++;
// Positions 13 to 20 from the end hold the value of interest, in ticks
if (i > 12 && i < 21)
st.Append(b.ToString("X2"));
}
// Convert hex to decimal
long Output = Convert.ToInt64(st.ToString(), 16);
// Convert ticks to a DateTime
DateTime dt = new DateTime(Output);
// Write the resulting date to the console
Console.Write(dt);
The program decodes the binary content into a date-time value.
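The reverse-and-format-as-hex approach above amounts to reading a little-endian 64-bit integer that starts 20 bytes before the end of the file. A more direct sketch of that reading follows; TicksSketch and its method names are hypothetical, and the position of the value is taken from the code above rather than from any documented file format.

```csharp
using System;

public static class TicksSketch
{
    // Assembles 8 bytes, least significant first, into a signed 64-bit value.
    public static long ReadInt64LittleEndian(byte[] data, int offset)
    {
        ulong value = 0;
        // Start from the most significant byte (highest index) and shift down.
        for (int k = 7; k >= 0; k--)
            value = (value << 8) | data[offset + k];
        return (long)value;
    }

    // Interprets the 8 bytes starting 20 bytes before the end of the file as ticks.
    public static DateTime ReadTimestamp(byte[] file)
    {
        return new DateTime(ReadInt64LittleEndian(file, file.Length - 20));
    }
}
```

Reading the integer directly avoids the intermediate hex string and makes the byte order explicit.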

Related

EBCDIC COMP value to integer

How can I convert an EBCDIC encoded file with PIC S9(04) COMP. to an integer value?
This file contains 1234 in the first line and -1234 on the second line.
Binary:
0101100011010010101100000101000011101111011000001011000001010000
Example:
How can I convert an EBCDIC encoded file with PIC S9(04) COMP. to an integer value?
Don't do that.
During file conversion, the binary data is converted using an EBCDIC to ASCII table. The 04 D2 (+1234) was converted to 1A 4B. The 04 (EBCDIC SEL) was changed to 1A (ASCII SUB), because there is no equivalent for the conversion. The D2 (EBCDIC 'K') was changed to 4B (ASCII 'K'). There is no way to reverse the conversion, due to the SUB. The same problem exists with the conversion of -1234.
Your best bet is to use PIC +9(04), which will be +1234 or -1234 both before and after conversion.
Test data using PIC +9(04) written to file
+1234
-1234
+0001
-0001
+0000
Test program
using System;
using System.IO;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string path = "z:e1.txt";
using (StreamReader sr = new StreamReader(path))
{
while (sr.Peek() >= 0)
{
String s = sr.ReadLine();
short n = Int16.Parse(s);
Console.WriteLine(n);
}
}
}
}
}
Displayed result
1234
-1234
1
-1
0
COBOL data types with computational usage (COMP, COMP-n) are not text, so you must not do any character conversion operation with them.
Computational types are stored as binary numbers, floating-point numbers, or packed decimal numbers. COMP is binary. The length depends on the number of 9s in the PIC clause. Note that the binary types are stored in big-endian format.
In your case, each PIC S9(04) COMP field occupies two bytes. For example, if you see
0x1234
as the hexadecimal representation of a PIC S9(04) COMP field in the input data, it represents the number (0x12 * 0x100) + 0x34, which in decimal is (18 * 256) + 52 = 4608 + 52 = 4660.
In summary, mixed data (records) containing fields of types PIC ... USAGE DISPLAY and PIC ... USAGE COMP-n must never be translated as a whole, but must be split into individual fields, and each field must be handled according to its USAGE type.
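As a minimal sketch of handling such a binary field in C#, assuming the file was transferred in binary mode (no EBCDIC-to-ASCII translation) and that the field is a two-byte, big-endian, two's-complement value as described above. CompFieldSketch and ReadInt16BigEndian are illustrative names, not part of any library.

```csharp
using System;

public static class CompFieldSketch
{
    // Reads a 2-byte big-endian signed integer (e.g. PIC S9(04) COMP)
    // from untranslated record bytes.
    public static short ReadInt16BigEndian(byte[] data, int offset)
    {
        // High byte first, then low byte; the cast to short restores the sign.
        return (short)((data[offset] << 8) | data[offset + 1]);
    }
}
```

Under these assumptions the bytes 04 D2 decode to 1234, FB 2E to -1234, and 12 34 to 4660.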

How to convert From Hex To Dump in C#

I convert my hex string to a dump to get special characters such as symbols, but when I convert "0x18" I get "\u0018". Can anyone suggest a solution?
Here is my code:
public static string FromHexDump(string sText)
{
Int32 lIdx;
string prValue ="" ;
for (lIdx = 1; lIdx < sText.Length; lIdx += 2)
{
string prString = "0x" + Mid(sText, lIdx, 2);
string prUniCode = Convert.ToChar(Convert.ToInt64(prString,16)).ToString();
prValue = prValue + prUniCode;
}
return prValue;
}
I originally used VB. I have a database that stores my password as encrypted text, and the value looks like BAA37D40186D. I loop over it in steps of 2 to get 0xBA,0xA3,0x7D,0x40,0x18,0x6D, and the VB result looks like this: º£}@m
You can use this code:
var myHex = '\x0633';
var formattedString = string.Format(@"\x{0:x4}", (int)myHex);
Or you can use this code from MSDN (https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/types/how-to-convert-between-hexadecimal-strings-and-numeric-types):
string hexValues = "48 65 6C 6C 6F 20 57 6F 72 6C 64 21";
string[] hexValuesSplit = hexValues.Split(' ');
foreach (string hex in hexValuesSplit)
{
// Convert the number expressed in base-16 to an integer.
int value = Convert.ToInt32(hex, 16);
// Get the character corresponding to the integral value.
string stringValue = Char.ConvertFromUtf32(value);
char charValue = (char)value;
Console.WriteLine("hexadecimal value = {0}, int value = {1}, char value = {2} or {3}",
hex, value, stringValue, charValue);
}
The question is unclear - what is the database column's type? Does it contain 6 bytes, or 12 characters with the hex encoding of the bytes? In any case, this has nothing to do with special characters or encodings.
First, 0x18 is the byte value of the Cancel Character in the Latin 1 codepage, not the pound sign. That's 0xA3. It seems that the byte values in the question are just the Latin 1 bytes for the string in hex.
.NET strings are Unicode (UTF-16LE specifically). There is no UTF-8 string or Latin 1 string. Encodings and code pages apply only when converting bytes to strings or vice versa. This is done using the Encoding class, e.g. Encoding.GetBytes.
In this case, this code will convert the bytes to the expected string form, including the unprintable character:
var dbBytes=new byte[] {0xBA,0xA3,0x7D,0x40,0x18,0x6D};
var latinEncoding=Encoding.GetEncoding(1252);
var result=latinEncoding.GetString(dbBytes);
The result is:
º£}@m
with the Cancel character between @ and m.
If the database column contains the byte values as hex strings:
it takes double the required space, and
the hex values have to be converted back to bytes before converting to strings
The x format specifier converts numbers or bytes to their hex form. For each byte value, ToString("x2") returns a two-digit hex string; plain "x" would drop the leading zero of values below 0x10 and break parsing later.
The hex string can be produced from the original buffer with:
var dbBytes=new byte[] {0xBA,0xA3,0x7D,0x40,0x18,0x6D};
var hexString=String.Join("",dbBytes.Select(c=>c.ToString("x2")));
There are many questions that show how to parse a hex string into a byte array. I'll just steal Jared Parson's LINQ answer:
public static byte[] StringToByteArray(string hex) {
return Enumerable.Range(0, hex.Length)
.Where(x => x % 2 == 0)
.Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
.ToArray();
}
With that, we can parse the hex string into a byte array and convert it to the original string :
var bytes=StringToByteArray(hexString);
var latinEncoding=Encoding.GetEncoding(1252);
var result=latinEncoding.GetString(bytes);
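Put together, the round trip can be sketched as a pair of helpers. This is an illustration under assumptions: it swaps code page 1252 for ISO-8859-1, which gives the same characters for these six bytes and does not need an extra encoding provider on newer .NET runtimes, and it uses a plain loop instead of LINQ. HexDecodeSketch, HexToBytes and Decode are made-up names.

```csharp
using System;
using System.Text;

public static class HexDecodeSketch
{
    // Parses a hex string such as "BAA37D40186D" into its bytes, two digits per byte.
    public static byte[] HexToBytes(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Hex string must have an even length.", nameof(hex));

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }

    // Decodes the bytes as ISO-8859-1 (Latin 1): each byte becomes one character.
    public static string Decode(string hex)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetString(HexToBytes(hex));
    }
}
```

Decode("BAA37D40186D") yields the six characters º £ } @, the Cancel control character (U+0018), and m.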
First of all, you don't need a dump, you need Unicode. I would recommend reading about Unicode and encodings, and why they cause this problem with strings.
PS: solution : StackOverflow

C# Encoding.Default.GetBytes() method and Integer representation

I saw this code example:
using (FileStream fStream = File.Open(@"C:\myMessage.dat", FileMode.Create))
{
string msg = "Helloo";
byte[] msgAsByteArray = Encoding.Default.GetBytes(msg);
foreach (var a in msgAsByteArray)
{
Console.WriteLine($"a: {a}");
}
// Write byte[] to file.
fStream.Write(msgAsByteArray, 0, msgAsByteArray.Length);
// Reset internal position of stream.
fStream.Position = 0;
// Read the types from file and display to console.
Console.Write("Your message as an array of bytes: ");
byte[] bytesFromFile = new byte[msgAsByteArray.Length];
for (int i = 0; i < msgAsByteArray.Length; i++)
{
bytesFromFile[i] = (byte)fStream.ReadByte();
Console.Write(bytesFromFile[i]);
}
// Display decoded messages.
Console.Write("\nDecoded Message: ");
Console.WriteLine(Encoding.Default.GetString(bytesFromFile));
}
And the result of Console.WriteLine($"a: {a}") is this:
a: 72
a: 101
a: 108
a: 108
a: 111
a: 111
1.
I thought a byte[] was composed of individual bytes.
But each byte is shown as an integer.
Those numbers must correspond to ASCII characters.
In C#, does a byte array mean data represented in ASCII?
2.
Isn't the file myMessage.dat binary data composed only of 0s and 1s?
But when I open myMessage.dat in a text editor, it shows the text string Helloo. What's the reason for this?
A byte is an 8-bit integer with values from 0 to 255. Writing it to the console prints the ordinary decimal number; by providing a format string (https://learn.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings) you can output it as hex. You can use this answer to get the binary representation.
You explicitly converted "Helloo" to bytes with Encoding.Default.GetBytes() - that is roughly like converting each character to its ASCII value, but using the default encoding of your system.
Your text editor interprets the data in the file and displays it as best it can. If you put byte[] myBytes = new byte[] {0,7,12,3,9,30} into a file and open that with your text editor, you will get unreadable text, because "normal text" starts at around 32; the values before that are e.g. tabs, bells, line feeds and other non-printable control characters. See e.g. NonPrintableAscii
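To make the point concrete, the sketch below prints the same byte three ways: decimal, hex and binary. It uses Encoding.ASCII instead of Encoding.Default so the result does not depend on the machine's settings; ByteViewSketch, Describe and DescribeText are hypothetical names.

```csharp
using System;
using System.Text;

public static class ByteViewSketch
{
    // Formats one byte as "decimal 0xHEX binary", e.g. 72 -> "72 0x48 01001000".
    public static string Describe(byte b)
    {
        // Convert.ToString(b, 2) gives the binary digits without leading zeros.
        string binary = Convert.ToString(b, 2).PadLeft(8, '0');
        return $"{b} 0x{b:X2} {binary}";
    }

    // Encodes the text as ASCII and describes every resulting byte.
    public static string[] DescribeText(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Array.ConvertAll(bytes, b => Describe(b));
    }
}
```

All three notations describe the same stored value; a text editor simply picks a fourth one, the character assigned to that value by its encoding.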

Stream, string and null character

I have a stream which contains several \0 inside it. I have to replace textual parts of this stream, but when I do
StreamReader reader = new StreamReader(stream);
string text = reader.ReadToEnd();
text only contains the beginning of the stream (because of the \0 character). So
text = text.Replace(search, replace);
StreamWriter writer = new StreamWriter(stream);
writer.Write(text);
will not do the expected job since I don't parse the "full" stream. Any idea on how to get access to the full data and replace some textual parts ?
EDIT : An example of what I see on notepad
stream
H‰­—[oã6…ÿÛe)Rêq%ÙrlËñE±“-úàÝE[,’íKÿþŽDjxÉ6ŒÅ"XkÏáGqF að÷óð!SN>¿¿‰È†/$ËÙpñ<^HVÀHuñ'¹¿à»U?`äŸ?
¾fØø(Ç,ükøéàâ+ùõ7øø2ÜTJ«¶Ïäd×SÿgªŸF_ß8ÜU#<Q¨|œp6åâ-ªÕ]³®7Ûn¹ÚÝ|‰,¨¹^ãI©…Ë<UIÐI‡Û©* Ǽ,,ý¬5O->qä›Ü
endstream 
endobj
8 0 obj
<<
/Type /FontDescriptor
/FontName /Verdana
/Ascent 765
/Descent -207
/CapHeight 1489
/Flags 32
/ItalicAngle 0
/StemV 86
/StemH 0
/FontBBox [ -560 -303 1523 1051 ]
/FontFile2 31 0 R
>>
endobj
9 0 obj
And I want to replace /FontName /Verdana by /FontName /Arial on the fly, for example.
Ah, now we're getting to it...
This file is a PDF
Then it's not a text file. That's a binary file, and should be treated as a binary file. Using StreamReader on it will lose data. You'll need to use a different API to access the data in it - one which understands the PDF format. Have a look at iTextSharp or PDFTron.
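If a raw replacement is still wanted, it has to work on bytes rather than on decoded text. The sketch below is a generic byte-level search-and-replace, not a PDF-aware tool; BinaryReplaceSketch and its methods are illustrative names. Note that a PDF stores byte offsets in its cross-reference table, so a replacement that changes the length (such as /Verdana to /Arial) is likely to leave the file inconsistent, which is why a PDF library remains the safer route.

```csharp
using System;
using System.Collections.Generic;

public static class BinaryReplaceSketch
{
    // Returns the index of the first occurrence of needle at or after start, or -1.
    public static int IndexOf(byte[] haystack, byte[] needle, int start)
    {
        for (int i = start; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    // Replaces every occurrence of search with replacement, leaving all other bytes untouched.
    public static byte[] ReplaceAll(byte[] data, byte[] search, byte[] replacement)
    {
        if (search.Length == 0)
            throw new ArgumentException("Search pattern must not be empty.", nameof(search));

        var output = new List<byte>(data.Length);
        int pos = 0;
        int hit;
        while ((hit = IndexOf(data, search, pos)) >= 0)
        {
            // Copy the bytes before the match, then the replacement.
            for (int k = pos; k < hit; k++) output.Add(data[k]);
            output.AddRange(replacement);
            pos = hit + search.Length;
        }
        // Copy whatever follows the last match.
        for (int k = pos; k < data.Length; k++) output.Add(data[k]);
        return output.ToArray();
    }
}
```

The search and replacement patterns can be produced with Encoding.ASCII.GetBytes, and the data read and written with File.ReadAllBytes and File.WriteAllBytes, so NUL bytes and other binary content pass through unchanged.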
I can't duplicate your results. The code below creates a string with a \0 in it, writes to file, and then reads it back. The resulting string has the \0 in it:
string s = "hello\x0world";
File.WriteAllText("foo.txt", s);
string t;
using (var f = new StreamReader("foo.txt"))
{
t = f.ReadToEnd();
}
Console.WriteLine(t == s); // prints "True"
I get the same results if I do var t = File.ReadAllText("foo.txt");

convert Hex UTF-8 bytes to Hex code point

How can I convert
hex UTF-8 bytes E0 A4 A4 to the hex code point 0924?
ref: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=e0+a4+a4&mode=bytes
I need this because when I read Unicode data in C#, each byte is treated as a separate character and 3 characters are displayed instead of 1, but I need the 3-byte sequence to be read as a single character. I tried many solutions but didn't get the result.
If I can display or store a 3-byte UTF-8 sequence as one character, then I don't need the conversion.
The scenario is like this:
string str=getivrresult();
In str I have a word in which each character is a 3-byte UTF-8 sequence.
Edited:
string str="à¤¤";
//I want it as "त" in str.
Character त
Character name DEVANAGARI LETTER TA
Hex code point 0924
Decimal code point 2340
Hex UTF-8 bytes E0 A4 A4
Octal UTF-8 bytes 340 244 244
UTF-8 bytes as Latin-1 characters bytes à ¤ ¤
Thank You.
Use the GetString method in the Encoding class:
byte[] data = { 0xE0, 0xA4, 0xA4 };
string str = Encoding.UTF8.GetString(data);
The string now contains one character with the character code 0x924.
// Input: the UTF-8 bytes were decoded one byte per character, giving 3 chars
string str = "à¤¤";
int i = 0;
byte[] data = new byte[str.Length];
foreach (char c in str)
{
string tmpstr = String.Format("{0:x2}", (int)c);
data[i] = Convert.ToByte(int.Parse(tmpstr, System.Globalization.NumberStyles.HexNumber));
i++;
}
// Decode the recovered 3-byte sequence as UTF-8; stp now contains "त".
string stp = Encoding.UTF8.GetString(data);
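The same repair can be sketched more compactly by letting an encoding turn the characters back into bytes. This assumes the text was mis-decoded as ISO-8859-1 (one byte per character); if it was decoded with another code page such as Windows-1252, some byte values would map differently. Utf8RepairSketch, Repair and CodePointHex are made-up names.

```csharp
using System;
using System.Text;

public static class Utf8RepairSketch
{
    // Turns each character back into the byte it came from, then decodes as UTF-8.
    public static string Repair(string misdecoded)
    {
        byte[] raw = Encoding.GetEncoding("ISO-8859-1").GetBytes(misdecoded);
        return Encoding.UTF8.GetString(raw);
    }

    // Returns the code point of the first character as 4+ hex digits, e.g. "0924".
    public static string CodePointHex(string s)
    {
        return char.ConvertToUtf32(s, 0).ToString("X4");
    }
}
```

Repair("\u00E0\u00A4\u00A4") gives the single character U+0924, and CodePointHex on that result gives "0924", the code point asked for in the question.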
