Read part of a file as text and part as binary - c#

I have a file formatted like this:
10 NDI 27 2477 6358 4197 -67 0 VVFAˆ ÿÿÿÿ
The last column is binary.
I have to read this file. The problem is that I cannot read it as text, because in some lines the last column contains a newline character, so I would not read the entire line.
So I should read it as a binary file, but then how can I retrieve the first and the third column?
I tried reading the bytes this way:
byte[] lines1 = System.IO.File.ReadAllBytes("D:\\dynamic\\ap1_dynamic\\AP_1.txt");
And then convert them to strings with:
for (i = 0; i < lines1.Length; i++) {
    Convert.ToString(lines1[i], 2);
}
but that reads everything as 0s and 1s. I would like to read the first 8 columns as text and the last one as binary.
I am using Visual Studio 2013, C#.

Reading the file as binary is correct, as you can convert part of the binary data to text. In this context, binary means bytes.
Converting the bytes to binary is not what you want to do, though. In that context, binary means a text representation in base 2, and you don't want a text representation of the data.
If the lines are fixed length, you can do something like this to read the values:
int lineLen = 70;   // correct this with the actual line length
int firstPos = 0;
int firstLen = 3;   // correct with actual length
int thirdPos = 15;  // correct with actual position
int thirdLen = 3;   // correct with actual length
int lastPos = 60;   // correct with actual position
int lastLen = 10;   // correct with actual length
int lineCount = lines1.Length / lineLen;
for (int i = 0; i < lineCount; i++) {
    int first = Int32.Parse(Encoding.UTF8.GetString(lines1, i * lineLen + firstPos, firstLen).Trim());
    int third = Int32.Parse(Encoding.UTF8.GetString(lines1, i * lineLen + thirdPos, thirdLen).Trim());
    byte[] last = new byte[lastLen];
    Array.Copy(lines1, i * lineLen + lastPos, last, 0, lastLen);
    // do something with the data in first, third and last
}

Related

Is there a way to split every string in an array and retrieve the 5th bit each time, adding to a total?

Hi sorry guys I'm pretty new to C# and programming in general.
I have a text file that I'm reading from which contains 10 lines (all but the first of which are relevant).
I want to split each line (besides the first, since it's only one word) by the commas, then retrieve the 5th value from each line, adding it to a total.
Currently, all I have been able to do is essentially split and add the same value to the total 10 times instead of adding the 9 different values together, or face a "System.IndexOutOfRangeException".
int totalValues = 0;
string[] larray = lines.ToArray(); //create array from list
string vehicleValue;
for (int i = 0; i < larray.Length; i++)
{
    string[] bits = larray[i].Split(',');
    vehicleValue = bits[4];
    int vvint = int.Parse(vehicleValue);
    totalValues = totalValues + vvint;
}
totalValue.Text = totalValues.ToString();
As it stands, the above code results in a "System.IndexOutOfRangeException" highlighting "vehicleValue = bits[4];".
Every line of the file looks like this, besides the first one.
Car,Ford,GT40,1964,250000,987,Red,A1,2,4,FALSE
The value I want out of this specific line would be '250000' - the 5th one along. I'm trying to get the 5th one along from every line.
Your problem is that you are also trying to parse the first line (which does not contain enough entries, so you get the exception). You can skip the first line by starting your iteration at index 1:
int totalValues = 0;
string[] larray = lines.ToArray(); //create array from list
string vehicleValue;
for (int i = 1; i < larray.Length; i++)
{
    string[] bits = larray[i].Split(',');
    vehicleValue = bits[4];
    int vvint = int.Parse(vehicleValue);
    totalValues = totalValues + vvint;
}
totalValue.Text = totalValues.ToString();
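For what it's worth, under the same assumption (the first line is a header and the fifth field of every other line is numeric), the sum can also be written more compactly with LINQ (this needs a using System.Linq; directive). This is just an alternative sketch, not a required change:
int totalValues = lines.Skip(1)                                // skip the header line
                       .Sum(l => int.Parse(l.Split(',')[4]));  // 5th field of each remaining line
totalValue.Text = totalValues.ToString();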
bits[4] is the fifth item in the array, since indexing starts from zero; to get the fourth item you should use bits[3]:
int totalValues = 0;
string[] larray = lines.ToArray(); //create array from list
string vehicleValue;
for (int i = 0; i < larray.Length; i++)
{
    string[] bits = larray[i].Split(',');
    vehicleValue = bits[3];
    int vvint = int.Parse(bits[3]);
    totalValues = totalValues + vvint;
}
totalValue.Text = totalValues.ToString();

Binary search on file with different line length

I have some code which does a binary search over a file with sorted hex values (SHA1 hashes) on each line. This is used to search the HaveIBeenPwned database. The latest version contains a count of the number of times each password hash was found, so some lines have extra characters at the end, in the format ':###'
The length of this additional count isn't fixed, and it isn't always there. This causes the buffer to read incorrect values and fail to find values that actually exist.
Current code:
static bool Check(string asHex, string filename)
{
    const int LINELENGTH = 40; //SHA1 hash length
    var buffer = new byte[LINELENGTH];
    using (var sr = File.OpenRead(filename))
    {
        //Number of lines
        var high = (sr.Length / (LINELENGTH + 2)) - 1;
        var low = 0L;
        while (low <= high)
        {
            var middle = (low + high + 1) / 2;
            sr.Seek((LINELENGTH + 2) * ((long)middle), SeekOrigin.Begin);
            sr.Read(buffer, 0, LINELENGTH);
            var readLine = Encoding.ASCII.GetString(buffer);
            switch (readLine.CompareTo(asHex))
            {
                case 0:
                    return true;
                case 1:
                    high = middle - 1;
                    break;
                case -1:
                    low = middle + 1;
                    break;
                default:
                    break;
            }
        }
    }
    return false;
}
My idea is to seek forward from the middle until a newline character is found, then seek backwards from that same point, which should give me a complete line that I can split on the ':' delimiter. I then compare the first part of the split string array, which should be just a SHA1 hash.
I think this should still centre on the correct value; however, I am wondering if there is a neater way to do this. If the midpoint isn't the actual midpoint between the end-of-line characters, should it be adjusted before the high and low values are?
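For illustration, a minimal sketch of that approach (a hypothetical helper, assuming ASCII content and '\n' line endings): seek to the chosen offset, scan backwards to the previous newline, then read one complete line that can be split on ':':
// Reads the complete line that contains the byte offset 'position'.
// Assumes ASCII content and '\n' line endings; a trailing '\r' is trimmed.
static string ReadLineAt(FileStream fs, long position)
{
    // Scan backwards until the previous newline (or the start of the file)
    long start = position;
    var one = new byte[1];
    while (start > 0)
    {
        fs.Seek(start - 1, SeekOrigin.Begin);
        fs.Read(one, 0, 1);
        if (one[0] == (byte)'\n') break;
        start--;
    }
    // Read forwards until the next newline (or the end of the file)
    fs.Seek(start, SeekOrigin.Begin);
    var sb = new StringBuilder();
    int b;
    while ((b = fs.ReadByte()) != -1 && b != '\n')
    {
        sb.Append((char)b);
    }
    return sb.ToString().TrimEnd('\r');
}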
I THINK this may be a possibly simpler (faster) solution without the backtracking to the beginning of the line. I think you can just use byte file indexes instead of trying to work with a full record/line. Because the middle index will not always be at the start of a line/record, the ReadLine can return a partial line/record. If you then immediately do a second ReadLine, you get a full line/record. It isn't quite optimal, because you are actually comparing a little ahead of the middle index.
I downloaded the pwned-passwords-update-1 file and pulled out about 30 records at the start, end, and middle; it seemed to find them all. What do you think?
const int HASHLENGTH = 40;

static bool Check(string asHex, string filename)
{
    using (var fs = File.OpenRead(filename))
    {
        var low = 0L;
        // We don't need to start at the very end
        var high = fs.Length - (HASHLENGTH - 1); // EOF - 1 HASHLENGTH
        StreamReader sr = new StreamReader(fs);
        while (low <= high)
        {
            var middle = (low + high + 1) / 2;
            fs.Seek(middle, SeekOrigin.Begin);
            // Resync with base stream after seek
            sr.DiscardBufferedData();
            var readLine = sr.ReadLine();
            // 1) If we are NOT at the beginning of the file, we may have only read a partial line, so
            //    read again to make sure we get a full line.
            // 2) No sense reading again if we are at the EOF
            if ((middle > 0) && (!sr.EndOfStream)) readLine = sr.ReadLine() ?? "";
            string[] parts = readLine.Split(':');
            string hash = parts[0];
            // By default string compare does a culture-sensitive comparison, which may not be what we want.
            // Do an ordinal compare (0-9 < A-Z < a-z)
            int compare = String.Compare(asHex, hash, StringComparison.Ordinal);
            if (compare < 0)
            {
                high = middle - 1;
            }
            else if (compare > 0)
            {
                low = middle + 1;
            }
            else
            {
                return true;
            }
        }
    }
    return false;
}
My way of solving your problem was to create a new binary file containing the hashes only: 16 bytes per hash and a faster binary search. (I don't have the 50 rep needed to comment, so I'm posting this as an answer.)
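A minimal sketch of that idea (the file and method names are made up; also note that a full SHA-1 digest is 20 bytes, so 20-byte fixed-width records are assumed here rather than the 16 mentioned above): convert the sorted hex lines to raw bytes once, then binary-search the fixed-width records without worrying about line lengths:
const int RECLEN = 20; // a full SHA-1 digest is 20 bytes

// One-time conversion: 40 hex characters per line -> one 20-byte record
static void ConvertToBinary(string textFile, string binFile)
{
    using (var writer = new BinaryWriter(File.Create(binFile)))
    {
        foreach (var line in File.ReadLines(textFile))
        {
            string hex = line.Split(':')[0]; // drop the ":count" suffix if present
            var record = new byte[RECLEN];
            for (int i = 0; i < RECLEN; i++)
                record[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
            writer.Write(record);
        }
    }
}

// Binary search over the fixed-width records
static bool CheckBinary(string asHex, string binFile)
{
    var target = new byte[RECLEN];
    for (int i = 0; i < RECLEN; i++)
        target[i] = Convert.ToByte(asHex.Substring(i * 2, 2), 16);

    using (var fs = File.OpenRead(binFile))
    {
        var buffer = new byte[RECLEN];
        long low = 0, high = fs.Length / RECLEN - 1;
        while (low <= high)
        {
            long middle = (low + high) / 2;
            fs.Seek(middle * RECLEN, SeekOrigin.Begin);
            fs.Read(buffer, 0, RECLEN);
            // Compare the record with the target, byte by byte
            int compare = 0;
            for (int i = 0; i < RECLEN && compare == 0; i++)
                compare = buffer[i].CompareTo(target[i]);
            if (compare == 0) return true;
            if (compare > 0) high = middle - 1; else low = middle + 1;
        }
    }
    return false;
}
The one-time conversion roughly halves the file size, and because every record has the same length, the ':count' suffix and line endings no longer affect the seek positions.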

JPEG Steganography, Inconsistent DCT Coefficients and errors

My problem is as follows: even after doing LSB replacement after the quantization step, I still get errors and changes on the detection side. For strings, letters get changed, but for bitmaps the image isn't readable, as deduced from getting "Parameters no valid". I've tried a lot of debugging and I just can't figure it out.
My goal is pretty simple: insert a set of bits (from a string or a Bitmap) into a JPEG image, save it, and be able to detect and extract said set of bits in its original form. I've been successful with BMP and PNG, as there is no compression there, but JPEG is another story. Btw, I'm doing LSB replacement.
I understand what I need to do: apply the LSB replacement after the DCT coefficients have been quantized. For that purpose I have been using a JPEG encoder and modified what I needed in the appropriate spot.
I modified the method EncodeImageBufferToJpg to convert a string or bitmap into a bit array (int[]) and then do LSB replacement on one coefficient per block for each channel (Y, Cb, Cr).
This here is my modified method for EncodeImageBufferToJpg, plus the Detection+Process method I use to reconstruct the message: Link Here.
For the Y channel, for example, in encoding:
Int16[] DCT_Quant_Y = Do_FDCT_Quantization_And_ZigZag(Y_Data, Tables.FDCT_Y_Quantization_Table);
if (!StegoEncodeDone)
{
    // We clear the LSB to 0
    DCT_Quant_Y[DCIndex] -= Convert.ToInt16(DCT_Quant_Y[DCIndex] % 2);
    // We add the bit to the LSB
    DCT_Quant_Y[DCIndex] += Convert.ToInt16(MsgBits[MsgIndx]);
    // Ys for debug print
    Ys.Add(DCT_Quant_Y[DCIndex]);
    MsgIndx++;
    if (MsgIndx >= MsgBits.Length) StegoEncodeDone = true;
}
DoHuffmanEncoding(DCT_Quant_Y, ref prev_DC_Y, Tables.Y_DC_Huffman_Table, Tables.Y_AC_Huffman_Table, OutputStream);
and in detection:
Int16[] DCT_Quant_Y = Do_FDCT_Quantization_And_ZigZag(Y_Data, Tables.FDCT_Y_Quantization_Table);
// SteganoDecode *********************************************
if (!StegoDecodeDone)
{
    int Dtt = Math.Abs(DCT_Quant_Y[DCIndex] % 2);
    int DYY = Y_Data[DCIndex];
    int DDCTYYB = DCT_Quant_Y[DCIndex];
    Ys.Add(DCT_Quant_Y[DCIndex]);
    // If the DCT coefficient is negative, % would return -1, but we want a binary 0/1, so take the absolute value
    charValue = charValue * 2 + Math.Abs(DCT_Quant_Y[DCIndex] % 2);
    ProcessStaganoDecode();
}
// End *********************************************************
DCT_Quant_Y.CopyTo(Y, index);
public void ProcessStaganoDecode()
{
    Counter++;
    cc++;
    if (IDFound) MsgBits.Add(charValue % 2);
    else IDBits.Add(charValue % 2);
    if (Counter == 8)
    {
        // If we find a '-' we increment, otherwise we reset to 0, because there have to be 3 consecutive '-' ("---")
        char ccs = (char)reverseBits(charValue);
        if (((char)reverseBits(charValue)) == '-')
        {
            SepCounter++;
        }
        else SepCounter = 0;
        if (SepCounter >= 3)
        {
            if (IDFound)
            {
                MsgBits.RemoveRange(MsgBits.Count - 3 * 8, 3 * 8);
                StegoDecodeDone = MarqueFound = true;
            }
            else
            {
                IDFound = true;
                IDBits.RemoveRange(IDBits.Count - 3 * 8, 3 * 8);
                string ID = BitToString(IDBits);
                IDNum = Convert.ToInt16(BitToString(IDBits));
                Console.WriteLine("ID Found : " + IDNum);
            }
            SepCounter = 0;
        }
        charValue = 0;
        Counter = 0;
    }
}
All the code is in the class: BaseJPEGEncoder.
Here's the VS 2015 C# project so you can check the rest of the classes, etc. I can only put 2 links, so sorry I couldn't include the original: Here. I got the original encoder from "A simple JPEG encoder in C#" on CodeProject.
I've read some answers to other questions from these two people, and I would love to get their attention so they could give me some help if they can: Sneftel and Reti43. I couldn't find a way to contact them.

erroneous character fixing of strings in c#

I have five strings like below,
ABBCCD
ABBDCD
ABBDCD
ABBECD
ABBDCD
All the strings are basically the same except for the fourth character, and only the character that appears the most times takes that place. For example, here D was placed 3 times in the fourth position, so the final string will be ABBDCD. I wrote the following code, but it seems to be inefficient in terms of time, because this function can be called millions of times. What should I do to improve the performance?
Here changedString is the string to be matched against the other 5 strings. If any position of changedString does not match the others, then the character that occurs the most times will be placed in changedString at that position.
len is the length of the strings, which is the same for all of them.
for (int i = 0; i < len; i++)
{
    String findDuplicate = string.Empty + changedString[i] + overlapStr[0][i] + overlapStr[1][i] + overlapStr[2][i] +
                           overlapStr[3][i] + overlapStr[4][i];
    char c = findDuplicate.GroupBy(x => x).OrderByDescending(x => x.Count()).First().Key;
    if (c != changedString[i])
    {
        if (i > 0)
        {
            changedString = changedString.Substring(0, i) + c +
                            changedString.Substring(i + 1, changedString.Length - i - 1);
        }
        else
        {
            changedString = c + changedString.Substring(i + 1, changedString.Length - 1);
        }
    }
    //string cleanString = new string(findDuplicate.ToCharArray().Distinct().ToArray());
}
I'm not quite sure what you are going to do, but if it is about sorting strings by some n-th character, then the best way is to use Counting Sort (http://en.wikipedia.org/wiki/Counting_sort). It is used for sorting arrays of small integers, is quite fine for chars, and has linear O(n) time. The main idea is that if you know all your possible elements (it looks like they can only be A-Z here), then you can create an additional array and count them. For your example it would be {0, 0, 1, 3, 1, 0, ...} if we use 0 for 'A', 1 for 'B' and so on.
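As a rough sketch of that counting idea applied to a single position (assuming the characters are limited to 'A'..'Z'; the helper name is made up):
// Returns the most frequent character at position pos across the given strings,
// assuming every character is an uppercase letter 'A'..'Z'.
static char MostFrequentAt(string[] strings, int pos)
{
    int[] counts = new int[26];        // one counter per possible letter
    char best = strings[0][pos];
    int bestCount = 0;
    foreach (string s in strings)
    {
        char c = s[pos];
        int n = ++counts[c - 'A'];     // count this occurrence
        if (n > bestCount)             // keep track of the current winner
        {
            bestCount = n;
            best = c;
        }
    }
    return best;
}
Calling this once per position avoids the GroupBy/OrderByDescending allocations of the original loop.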
Here is a function that might help performance-wise, as it runs about five times faster than the original. The idea is to count occurrences yourself, using a dictionary to convert a character to a position in a counting array, increment the value at that position, and check whether it is greater than the previously highest number of occurrences. If it is, the current character is the top one and is stored as the result. This repeats for each string in overlapStr and for each position within the strings. Please read the comments inside the code for details.
string HighestOccurrenceByPosition(string[] overlapStr)
{
    int len = overlapStr[0].Length;
    // Dictionary transforms character to offset into counting array
    Dictionary<char, int> char2offset = new Dictionary<char, int>();
    // Counting array. Each character has an entry here
    int[] counters = new int[overlapStr.Length];
    // Highest occurrence characters found so far
    char[] topChars = new char[len];
    for (int i = 0; i < len; ++i)
    {
        char2offset.Clear();
        // faster! char2offset = new Dictionary<char, int>();
        // Highest number of occurrences at the moment
        int highestCount = 0;
        // Allocation of counters - as previously unseen character arrives
        // it is given a slot at this offset
        int lastOffset = 0;
        // Current offset into "counters"
        int offset = 0;
        // Small optimization. As your data seems very similar, this helps
        // to reduce number of expensive calls to TryGetValue
        // You might need to remove this optimization if you don't have
        // unused value of char in your dataset
        char lastChar = (char)0;
        for (int j = 0; j < overlapStr.Length; ++j)
        {
            char thisChar = overlapStr[j][i];
            // If this is the same character as last one
            // Offset already points to correct cell in "counters"
            if (lastChar != thisChar)
            {
                // Get offset
                if (!char2offset.TryGetValue(thisChar, out offset))
                {
                    // First time seen - allocate & initialize cell
                    offset = lastOffset;
                    counters[offset] = 0;
                    // Map character to this cell
                    char2offset[thisChar] = lastOffset++;
                }
                // This is now last character
                lastChar = thisChar;
            }
            // increment and get count for character
            int charCount = ++counters[offset];
            // This is now highestCount.
            // TopChars receives current character
            if (charCount > highestCount)
            {
                highestCount = charCount;
                topChars[i] = thisChar;
            }
        }
    }
    return new string(topChars);
}
P.S. This is certainly not the best solution, but as it is significantly faster than the original, I thought I should help out.

Creating a binary file from an IntelHex in C#

I'm trying to create a binary file from an IntelHex file. Inside the IntelHex file I have data and the addresses at which I should write that data in the binary file.
The IntelHex file looks like this:
:10010000214601360121470136007EFE09D2190140
:100110002146017EB7C20001FF5F16002148011988
:10012000194E79234623965778239EDA3F01B2CAA7
:100130003F0156702B5E712B722B732146013421C7
:00000001FF
So I have 4 lines here with data, since the last one tells us that it's the end of the file.
Here is what I'm doing to create the file:
while (!streamReader.EndOfStream)
{
    string temp = String.Empty;
    int address = 0;
    line = streamReader.ReadLine();
    // Get address for each data
    address = Convert.ToInt32(line.Substring(3, 4), 16);
    // Get data from each line
    temp = line.Substring(7, 2);
    if (temp == "01")
        break;
    else
    {
        temp = line.Substring(9, line.Length - 11);
        string[] array = new string[(temp.Length / 2)];
        int j = 0;
        for (int i = 0; i < array.Length; ++i)
        {
            array[i] = temp[j].ToString() + temp[j + 1].ToString();
            j = j + 2;
        }
        temp = String.Empty;
        for (int i = 0; i < array.Length; ++i)
        {
            temp = temp + Convert.ToChar(Convert.ToInt32(array[i], 16));
        }
    }
    binaryWriter.Seek(address, SeekOrigin.Begin);
    binaryWriter.Write(temp);
    binaryWriter.Flush();
}
Console.WriteLine("Done...\nPress any key to exit...");
The problem here is that the data in the binary file is in some places not equal to the data from the IntelHex file. It looks like some random data is added to the file and I do not know where it comes from. The first thing I noticed is that there is additional data before the data from the IntelHex file. For instance, the first data line starts with 21, but in the binary file I have the number 12 before the 21. I do not know what is wrong here. I hope someone can help me or point me to some useful information about creating binary files in C#.
<Generic answer pointing out that a Unicode character (char) is not an octet (byte), and that the code produces the wrong output because binary data is written as Unicode string to the file.>
Hint: use a byte[] for binary data, not a string.
Also: In before answers suggesting to use a StringBuilder for the loop.
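A minimal sketch of that hint, reusing the variable names from the question (this is an assumption about the fix, not the asker's actual code): convert each pair of hex characters in the record to a byte and write the byte[] directly, since BinaryWriter.Write(string) adds a length prefix and encodes the characters.
// Inside the else branch: build a byte[] from the hex data instead of a string
string hexData = line.Substring(9, line.Length - 11);
byte[] data = new byte[hexData.Length / 2];
for (int i = 0; i < data.Length; ++i)
{
    data[i] = Convert.ToByte(hexData.Substring(i * 2, 2), 16);
}
binaryWriter.Seek(address, SeekOrigin.Begin);
binaryWriter.Write(data); // Write(byte[]) writes the raw octets: no length prefix, no character encoding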
