JPEG Steganography, Inconsistent DCT Coefficients and errors - c#

My problem is that even after doing LSB replacement after the quantization step, I still get errors and changes on the detection side. For strings, letters get changed; for bitmaps, the extracted image isn't readable, as deduced from getting "Parameter is not valid". I've tried a lot of debugging and I just can't figure it out.
My goal is pretty simple: insert a set of bits (originally a string or a Bitmap) into a JPEG image, save it, and be able to detect and extract that set of bits in its original form. I've been successful with BMP and PNG, as there is no lossy compression there, but JPEG is another story. By the way, I'm doing LSB replacement.
I understand what I need to do: apply the LSB replacement after the DCT coefficients have been quantized. For that purpose I have been using a JPEG encoder and modified what I needed in the appropriate spot.
I modified the method EncodeImageBufferToJpg to convert a string or bitmap into a bit array (int[]) and then do LSB replacement on one coefficient per block for each channel Y, Cb, Cr.
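As a minimal sketch of the LSB step itself, assuming the quantized coefficients are Int16 values as in the encoder: the LsbHelper class and its method names are hypothetical and are not part of the CodeProject encoder. It uses bit masks rather than % so that negative coefficients need no special handling, since in two's complement the lowest bit carries the parity of negative values as well.

```csharp
using System;

static class LsbHelper
{
    // Hypothetical helper: replace the least significant bit of a
    // quantized coefficient with `bit` (only the lowest bit of `bit` is used).
    public static short Embed(short coefficient, int bit)
    {
        // Clear the lowest bit, then OR in the message bit.
        return (short)((coefficient & ~1) | (bit & 1));
    }

    // Hypothetical helper: read the embedded bit back (always 0 or 1,
    // even when the coefficient is negative).
    public static int Extract(short coefficient)
    {
        return coefficient & 1;
    }
}
```

Extract returns the same value as Math.Abs(coefficient % 2), so it only recovers the message if the detector sees exactly the coefficients that the encoder wrote.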
This here is my modified method for EncodeImageBufferToJpg, plus the Detection+Process method I use to reconstruct the message: Link Here.
For the Y channel for example :
In encoding :
Int16[] DCT_Quant_Y = Do_FDCT_Quantization_And_ZigZag(Y_Data, Tables.FDCT_Y_Quantization_Table);
if (!StegoEncodeDone)
{
// We clear the LSB to 0
DCT_Quant_Y[DCIndex] -= Convert.ToInt16(DCT_Quant_Y[DCIndex] % 2);
// We add the bit to the LSB
DCT_Quant_Y[DCIndex] += Convert.ToInt16(MsgBits[MsgIndx]);
// Ys for debug print
Ys.Add(DCT_Quant_Y[DCIndex]);
MsgIndx++;
if (MsgIndx >= MsgBits.Length) StegoEncodeDone = true;
}
DoHuffmanEncoding(DCT_Quant_Y, ref prev_DC_Y, Tables.Y_DC_Huffman_Table, Tables.Y_AC_Huffman_Table, OutputStream);
and in detection :
Int16[] DCT_Quant_Y = Do_FDCT_Quantization_And_ZigZag(Y_Data, Tables.FDCT_Y_Quantization_Table);
// SteganoDecode *********************************************
if (!StegoDecodeDone)
{
int Dtt = Math.Abs(DCT_Quant_Y[DCIndex] % 2);
int DYY = Y_Data[DCIndex];
int DDCTYYB = DCT_Quant_Y[DCIndex];
Ys.Add(DCT_Quant_Y[DCIndex]);
// If the DCT coefficient is negative, % returns -1, but a bit must be 0 or 1, so take the absolute value
charValue = charValue * 2 + Math.Abs(DCT_Quant_Y[DCIndex] % 2);
ProcessStaganoDecode();
}
// End *********************************************************
DCT_Quant_Y.CopyTo(Y, index);
public void ProcessStaganoDecode()
{
Counter++;
cc++;
if (IDFound) MsgBits.Add(charValue % 2);
else IDBits.Add(charValue % 2);
if (Counter == 8)
{
// If we find a '-' we inc, else we set to 0. because they have to be 3 consecutive "---"
char ccs = (char)reverseBits(charValue);
if (((char)reverseBits(charValue)) == '-')
{
SepCounter++;
}
else SepCounter = 0;
if (SepCounter >= 3)
{
if (IDFound)
{
MsgBits.RemoveRange(MsgBits.Count - 3 * 8, 3 * 8);
StegoDecodeDone = MarqueFound = true;
}
else
{
IDFound = true;
IDBits.RemoveRange(IDBits.Count - 3 * 8, 3 * 8);
string ID = BitToString(IDBits);
IDNum = Convert.ToInt16(BitToString(IDBits));
Console.WriteLine("ID Found : " + IDNum);
}
SepCounter = 0;
}
charValue = 0;
Counter = 0;
}
}
All the code is in the class: BaseJPEGEncoder.
Here's the VS 2015 C# project for you to check the rest of the classes etc. I can only put 2 links, so sorry couldn't put the original: Here. I got the original encoder from "A simple JPEG encoder in C#" at CodeProject
I've read some answers to other questions from these two people, and I would love to get their attention to give me some help if they can: Sneftel and Reti43. Couldn't find a way to contact them.


read weighing scale data via RS232

I want to read data from a weighing scale via RS232, and I have tried several approaches.
My weighing scale model is the YH-T7E (datasheet).
The output of the scale in the AccessPort program is this value:
output on Access Port
The weight on the scale = 3.900 kg
In the picture = 009.300
Baud rate: 1200
When I use this code:
string s = "";
int num8 = 0;
string RST = "";
while (this.serialPort1.BytesToRead > 0)
{
string data = serialPort1.ReadExisting();
if (data != null)
{
if (data.ToString() != "")
{
if (data.Length > 6)
{
RST = data.Substring(6, 1) + data.Substring(5, 1) + data.Substring(4, 1) + data.Substring(3, 1) + data.Substring(2, 1);
this.textBox4110.Text = RST.ToString();
}
}
}
}
output in my program
When I use the above code, the program sometimes displays the weight and sometimes does not; I have to close and reopen the program several times.
Also, when the weight on the scale changes, the number in the program does not change; the displayed value stays fixed.
And when I use this code:
while (this.serialPort1.BytesToRead > 0)
{
int data = serialPort1.ReadByte();
this.textBox4110.Text = data.ToString();
}
my program displays the number 48.
What should I do?
Thanks and regards.
I don't know why your serial port sometimes responds and sometimes doesn't.
I worked with RS232 years ago and never had this problem.
About the other questions:
You're working with raw bytes, so you can't just call ToString: it converts the byte's numeric value to a string, not the character it represents.
If you have to reverse the byte order (4 - 3 - 2 - 1), you can call the Array.Reverse method.
To give an example of what I mean, I took your code:
while (this.serialPort1.BytesToRead > 0)
{
int data = serialPort1.ReadByte();
this.textBox4110.Text = data.ToString();
}
Your "data" variable contains a byte with the value 48, which is the character '0' in the ASCII table.
So, if you want the character, you have to convert it using the right encoding.
Suppose you are working with UTF-8:
while (this.serialPort1.BytesToRead > 0)
{
var dataLen = this.serialPort1.BytesToRead;
var byteArray = new byte[dataLen];
this.serialPort1.Read(byteArray, 0, dataLen);
var txt = Encoding.UTF8.GetString(byteArray);
this.textBox4110.Text = txt;
}
Honestly, I know Encoding.UTF8.GetString accepts a byte array; I'm not sure whether it will work with only a single byte...
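To illustrate the reversal step, here is a rough sketch that assumes one complete reading has already been decoded to text and contains only the seven characters shown in the question (for example "009.300" for 3.900 kg). The ScaleFrame class and ParseReversedWeight name are hypothetical, and the real YH-T7E frame layout should be checked against the datasheet.

```csharp
using System;
using System.Globalization;

static class ScaleFrame
{
    // Hypothetical helper: the scale is assumed to send the digits in
    // reverse order, so "009.300" means 003.900.
    public static decimal ParseReversedWeight(string frame)
    {
        char[] chars = frame.Trim().ToCharArray();
        Array.Reverse(chars);                       // "009.300" -> "003.900"
        return decimal.Parse(new string(chars),
                             NumberStyles.Number,
                             CultureInfo.InvariantCulture);
    }
}
```

Parsing with the invariant culture keeps the '.' from being misread as a thousands separator on machines with other regional settings.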

Brute Force Algorithm Prints Only 0

I am creating a brute-force algorithm (for educational purposes) in C#, and I have run into an error.
My brute-force approach works like a clock: the last character in the string goes up from 0 to the last character in a character array set at the beginning of the program; when it reaches the end, it resets to 0 and increases the previous character by 1, and so on.
Yet the code below prints only 0's:
// Array with characters to use in Brute Force Algorithm.
// You can remove or add more characters in this array.
private static char[] fCharList =
{
'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j' ,'k','l','m','n','o','p',
'q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','C','L','M','N','O','P',
'Q','R','S','T','U','V','X','Y','Z','~','!','#','#','$','%','^','&','*','(',')','[',']','{','}','.',',','/','?','\'','|','"',';',':','<','>','\\','=','-','+','`','_'
};
private static String password;
static void Main(string[] args)
{
Console.WriteLine("Enter the password: ");
password = Console.ReadLine();
Bruteforce();
}
// The Bruteforce algorithm
// Goes like a clock cycle: Last index goes from fCharList[0] to fCharList[fCharList.Length - 1]
// then resets and last index -1 increases and so on.
public static String Bruteforce()
{
String currPass = fCharList[0].ToString();
bool foundPass = false;
for (int i = 1; !foundPass; i++)
{
// If there's a need to increase (found fCharList[fCharList.Length - 1] in the String)
if (currPass.Contains(fCharList[fCharList.Length - 1]))
{
//If no need to increase the whole length and reset all the characters
if (!(currPass.IndexOf(fCharList[fCharList.Length - 1]) == 0))
{
String updateCurrPass = "";
for (int j = currPass.Length - 1; j >= currPass.IndexOf(fCharList[fCharList.Length - 1]); j--)
{
updateCurrPass += fCharList[0].ToString();
}
currPass.Insert(currPass.IndexOf(fCharList[fCharList.Length - 1]) - 1, fCharList[Array.IndexOf(fCharList, currPass.ElementAt<char>(currPass.IndexOf(fCharList[fCharList.Length - 1]) - 1)) + 1].ToString() + updateCurrPass);
}
}
else // If no cycle ended - continue increasing last "digit"
{
currPass.Insert(currPass.Length - 1, fCharList[Array.IndexOf(fCharList, currPass.ElementAt(currPass.Length - 1)) + 1].ToString());
}
Console.Write(currPass + " ");
}
return "";
}
I tried everything I could think of with currPass.Insert(currPass.Length - 1, fCharList[Array.IndexOf(fCharList, currPass.ElementAt(currPass.Length - 1)) + 1].ToString()); (as I suspected that the problem might occur in the printing process itself), but with no success.
I also tried to trace the code using breakpoints and paper, still nothing.
I would be very grateful if someone could help me solve this problem.
Edit:
Below, many suggested that updateCurrPass += fCharList[0].ToString(); should rather be updateCurrPass += fCharList[j].ToString();. I haven't checked that option in depth yet, but to better explain my situation: I want it to work like a clock cycle, so when the last digit is the last character in fCharList, the previous digit increases, and so on. The code mentioned resets the digits that have reached the last character. (So, if the string currPass was "0aa___" (_ is the last character), it will become "0ab000".) updateCurrPass adds the three 0's, while the rest of the function increases the a to b.
As noted by @VincentElbertBudiman, you have to use j in your loop and not 0 (or maybe not; I'm unsure whether I understand your algorithm correctly).
But there's something more:
You have to note that in C#, Strings are immutable, meaning currPass.Insert(...) does nothing in itself, you have to reassign the result like currPass = currPass.Insert(...);
But hey, I think you are overcomplicating the algorithm.
What I would've done:
string currPass = fCharList[0].ToString();
bool found = false;
while(!found)
{
currPass = IncreasePassword(currPass);
found = CheckPass(currPass);
}
With IncreasePassword:
public static string IncreasePassword(string pass)
{
bool changed = false;
StringBuilder sb = new StringBuilder(pass);
// loop through pass until we change something
// or we reach the end (whichever comes first)
for(int i = 0; i < pass.Length && !changed; i++)
//for(int i = pass.Length - 1; i >= 0 && !changed; i--)
{
int index = Array.IndexOf(fCharList, sb[i]);
// if current char can be increased
if(index < fCharList.Length - 1)
{
// if we have __012 then we'll go on 00112
// so here we replace the left __ with 00
for(int j = i - 1; j >= 0 && sb[j] == fCharList[fCharList.Length - 1]; j--)
//for(int j = i + 1; j < sb.Length && sb[j] == fCharList[fCharList.Length - 1]; j++)
{
sb[j] = fCharList[0];
}
// and here we increase the current char
sb[i] = fCharList[index + 1];
changed = true;
}
}
// if we didn't change anything, it means every char was '_'
// so we start with a fresh new full-of-0 string
if(!changed)
{
return "".PadLeft(pass.Length + 1, fCharList[0]);
}
return sb.ToString();
}
Live example.
Explanations
This will work from left to right the following way:
Say our fCharList is { '0','1','2' } for simplifications.
We'll have:
0
1
2
00
10
20
01
11
21
02
12
22
000
100
200
010
110
210
020
....
Test results
This is what it gives with reversed inputs (as my solution goes the other way around) from Weeble's suggestions:
Input Output
0 1
1 2
P Q
_ 00
00 10
_0 01
__CBA 00DBA
____ 00000
Please note that your fCharList is broken as there's a C instead of K!
Do you think it might be because you assign updateCurrPass like this?
for (int j = currPass.Length - 1; j >= currPass.IndexOf(fCharList[fCharList.Length - 1]); j--)
{
updateCurrPass += fCharList[0].ToString();
}
I think it should involve j, something like:
for (int j = currPass.Length - 1; j >= currPass.IndexOf(fCharList[fCharList.Length - 1]); j--)
{
updateCurrPass += fCharList[j].ToString();
}
Added code :
The code below will do nothing, because Insert returns a new string (instead of changing the object it is called on); you should write currPass = currPass.Insert(...), as noted by @Rafalon:
else // If no cycle ended - continue increasing last "digit"
{
currPass.Insert(currPass.Length - 1, fCharList[Array.IndexOf(fCharList, currPass.ElementAt(currPass.Length - 1)) + 1].ToString());
}
Use this instead :
else // If no cycle ended - continue increasing last "digit"
{
currPass = currPass.Insert(currPass.Length - 1, fCharList[Array.IndexOf(fCharList, currPass.ElementAt(currPass.Length - 1)) + 1].ToString());
}
While there are specific problems pointed out by the other answers, may I suggest that your broader problem is that you're trying to do too much at once? You may well be capable of writing this whole algorithm in one piece and tweaking it until it works, but it's both very hard to do this and hard to have confidence in the end that it's correct. You will find it easier to break down the algorithm into smaller functions and test those individually to confirm they do what you expect.
For example, you could break out the part of the algorithm that works out "given the last password I checked, what is the next password in sequence to consider"? This is the core of the algorithm, and it can be separated from the work of iterating through all the possible passwords and testing them against the current one. And it is much easier to test! For example, you can use test data like this:
Input Expected output
"0" "1"
"1" "2"
"P" "Q"
"_" "00"
"00" "01"
"0_" "10"
"ABC__" "ABD00"
"____" "00000"
When you find out that your function doesn't give the right answer for some or all of these tests, you have a much smaller and more specific job to do to figure out what's wrong than trying to debug the whole algorithm at once.
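As a rough illustration of the kind of small, separately testable function described above, here is one possible "next password" step that follows the right-to-left order of the test table. The PasswordSequence class, the Next method and the alphabet parameter are hypothetical names chosen for this sketch; it is not the asker's code.

```csharp
using System;
using System.Text;

static class PasswordSequence
{
    // Hypothetical helper: returns the candidate that follows `current`,
    // treating the string like an odometer whose rightmost position
    // changes fastest. `alphabet` plays the role of fCharList.
    public static string Next(string current, char[] alphabet)
    {
        var sb = new StringBuilder(current);
        for (int i = sb.Length - 1; i >= 0; i--)
        {
            int index = Array.IndexOf(alphabet, sb[i]);
            if (index < alphabet.Length - 1)
            {
                // This position can still be increased: do so and stop.
                sb[i] = alphabet[index + 1];
                return sb.ToString();
            }
            // Last character of the alphabet: wrap around and carry left.
            sb[i] = alphabet[0];
        }
        // Every position wrapped, so the candidate grows by one character.
        return new string(alphabet[0], current.Length + 1);
    }
}
```

With a three-character alphabet { '0', '1', '2' }, Next("0") gives "1", Next("2") gives "00", Next("02") gives "10" and Next("22") gives "000", which mirrors the shape of the table above on a smaller scale.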

Binary search on file with different line length

I have some code which does a binary search over a file with sorted hex values (SHA1 hashes) on each line. This is used to search the HaveIBeenPwned database. The latest version contains a count of the number of times each password hash was found, so some lines have extra characters at the end, in the format ':###'
The length of this additional check isn't fixed, and it isn't always there. This causes the buffer to read incorrect values and fail to find values that actually exist.
Current code:
static bool Check(string asHex, string filename)
{
const int LINELENGTH = 40; //SHA1 hash length
var buffer = new byte[LINELENGTH];
using (var sr = File.OpenRead(filename))
{
//Number of lines
var high = (sr.Length / (LINELENGTH + 2)) - 1;
var low = 0L;
while (low <= high)
{
var middle = (low + high + 1) / 2;
sr.Seek((LINELENGTH + 2) * ((long)middle), SeekOrigin.Begin);
sr.Read(buffer, 0, LINELENGTH);
var readLine = Encoding.ASCII.GetString(buffer);
switch (readLine.CompareTo(asHex))
{
case 0:
return true;
case 1:
high = middle - 1;
break;
case -1:
low = middle + 1;
break;
default:
break;
}
}
}
return false;
}
My idea is to seek forward from the middle until a newline character is found, then seek backwards for the same point, which should give me a complete line which I can split by the ':' delimiter. I then compare the first part of the split string array which should be just a SHA1 hash.
I think this should still centre on the correct value, however I am wondering if there is a neater way to do this? If the midpoint isn't that actual midpoint between the end of line characters, should it be adjusted before the high and low values are?
I think this may be a simpler (and faster) solution that avoids backtracking to the beginning of the line. You can just use byte offsets into the file instead of trying to work with full records/lines. Because the middle offset will not always be at the start of a line, the first ReadLine can return a partial line. If you immediately do a second ReadLine, you get a full line. It isn't quite optimal, because you are actually comparing a little ahead of the middle offset.
I downloaded the pwned-passwords-update-1 and pulled out about 30 records at the start, end, and in the middle, it seemed to find them all. What do you think?
const int HASHLENGTH = 40;
static bool Check(string asHex, string filename)
{
using (var fs = File.OpenRead(filename))
{
var low = 0L;
// We don't need to start at the very end
var high = fs.Length - (HASHLENGTH - 1); // EOF - 1 HASHLENGTH
StreamReader sr = new StreamReader(fs);
while (low <= high)
{
var middle = (low + high + 1) / 2;
fs.Seek(middle, SeekOrigin.Begin);
// Resync with base stream after seek
sr.DiscardBufferedData();
var readLine = sr.ReadLine();
// 1) If we are NOT at the beginning of the file, we may have only read a partial line so
// Read again to make sure we get a full line.
// 2) No sense reading again if we are at the EOF
if ((middle > 0) && (!sr.EndOfStream)) readLine = sr.ReadLine() ?? "";
string[] parts = readLine.Split(':');
string hash = parts[0];
// By default, string comparison is culture-sensitive, which may not be what we want.
// Do an ordinal compare (0-9 < A-Z < a-z)
int compare = String.Compare(asHex, hash, StringComparison.Ordinal);
if (compare < 0)
{
high = middle - 1;
}
else if (compare > 0)
{
low = middle + 1;
}
else
{
return true;
}
}
}
return false;
}
My way of solving your problem was to create a new binary file containing only the hashes: 20 bytes per hash (a SHA-1 digest is 160 bits), stored as fixed-width records, which allows a faster binary search.
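A minimal sketch of that fixed-width idea, assuming the counts have been dropped and each SHA-1 hash is stored as 20 raw bytes in the same sorted order as the text file (which holds when the hex digits are consistently upper case). The HashFileSearch class and its method names are hypothetical; any seekable Stream, such as the result of File.OpenRead, can be passed in.

```csharp
using System;
using System.IO;

static class HashFileSearch
{
    const int HashBytes = 20; // size of a raw SHA-1 digest

    // Convert 40 hex characters into 20 raw bytes.
    public static byte[] HexToBytes(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }

    // Binary search over a seekable stream of sorted 20-byte records.
    public static bool Contains(Stream sortedHashes, byte[] target)
    {
        var buffer = new byte[HashBytes];
        long low = 0;
        long high = sortedHashes.Length / HashBytes - 1;
        while (low <= high)
        {
            long middle = low + (high - low) / 2;
            sortedHashes.Seek(middle * HashBytes, SeekOrigin.Begin);
            // Stream.Read may return fewer bytes than requested, so loop.
            int read = 0;
            while (read < HashBytes)
            {
                int n = sortedHashes.Read(buffer, read, HashBytes - read);
                if (n == 0) return false; // truncated file
                read += n;
            }
            int cmp = Compare(buffer, target);
            if (cmp == 0) return true;
            if (cmp < 0) low = middle + 1; else high = middle - 1;
        }
        return false;
    }

    // Unsigned byte-by-byte comparison of two 20-byte hashes.
    static int Compare(byte[] a, byte[] b)
    {
        for (int i = 0; i < HashBytes; i++)
            if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
        return 0;
    }
}
```

Because every record has the same length, record N always starts at byte N * 20, so no line resynchronisation is needed after a seek.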

Why won't this c# string-wrapping algorithm work? [closed]

Alright, so, in my C# Cosmos operating system, I'm working on a system that lets me input a string, and a desired width, and wrap the string into lines based on the width.
It works sort of like this: the input string is "Hello beautiful world." and the width is 6. The algorithm iterates through the string, and if the char index has reached the width and the current char is a space, it takes everything from the beginning of the string up to that point, adds it to a List, removes it from the string itself, resets the char index to 0, and starts over. It does this until the string is either empty or smaller than the width. If it's smaller than the width, it's added to the List and the for loop is terminated. In layman's terms, our output should come out like this:
Hello
beautiful
world.
This is my code.
public static List<string> split_string(int width, string text)
{
List<string> text_lines = new List<string>();
//Separate lines of text.
for (int i = 0; i < text.Length; i++)
{
if (text.Length <= width)
{
text_lines.Add(text);
i = text.Length + 5;
}
else
{
if (i >= width)
{
if (text[i] == ' ')
{
text_lines.Add(text.Substring(0, i + 1));
text = text.Remove(0, i + 1);
i = 0;
}
}
}
}
return text_lines;
}
The thing is, sometimes, if I end up having to deal with the string being smaller than the width, we get issues. It seems to skip that part of the string. Yikes!
For example, here's a piece of my OS that uses this. It's supposed to take a title and a message, and display it in a messagebox with an OK button.
public static void ShowMessagebox(string title, string text)
{
int splitWidth = 25;
if(text.Length < splitWidth)
{
splitWidth = text.Length;
}
if(title.Length > splitWidth)
{
splitWidth = title.Length;
}
var lines = new List<string>();
if(splitWidth > text.Length)
{
lines.Add(text);
}
else
{
lines = TUI.Utils.split_string(splitWidth, text);
}
foreach(var line in lines)
{
if(text.Contains(line))
{
text = text.Replace(line, "");
}
}
if(text.Length > 0)
{
lines.Add(text);
}
int h = lines.Count + 4;
int w = 0;
foreach(var line in lines)
{
if(line.Length + 4 > w)
{
w = line.Length + 4;
}
}
int x = (Console.WindowWidth - w) / 2;
int y = (Console.WindowHeight - h) / 2;
TUI.Utils.ClearArea(x, y, w, h, ConsoleColor.Green);
TUI.Utils.ClearArea(x, y, w, 1, ConsoleColor.White);
TUI.Utils.Write(x + 1, y, title, ConsoleColor.White, ConsoleColor.Black);
for(int i = 0; i < lines.Count - 1; i++)
{
TUI.Utils.Write(x + 2, (y + 2) + i, lines[i], ConsoleColor.Green, ConsoleColor.White);
}
int xw = x + w;
int yh = y + h;
TUI.Utils.Write(xw - 6, yh - 2, "<OK>", TUI.Utils.COL_BUTTON_SELECTED, TUI.Utils.COL_BUTTON_TEXT);
bool stuck = true;
while (stuck)
{
var kinf = Console.ReadKey();
if (kinf.Key == ConsoleKey.Enter)
{
stuck = false;
Console.Clear();
}
else
{
}
}
}
Pretty simple. Starts with a default width of 25 chars, and if the title is bigger, it sets it to title length. If text length is smaller than the width it sets the width to compensate. Then it makes the call to the splitter algorithm from above, found in 'TUI.Utils', and then does some stuff to print to the screen.
Here's a piece of my OS's "ConfigurationManager", an application that takes in user input and uses it to generate a config file. Very work-in-progress right now.
Curse.ShowMessagebox("Memphis can't run properly this system.", "Memphis needs at least one FAT partition on a Master Boot Record to be able to store it's configuration and other files on. Please use a partition utility like GParted to partition your hard drive properly.");
But have a look at what comes out onto my screen...
The messagebox coming out of the above method call
As you can see, not really the thing I want. It's missing some of the string!
You don't need to change text, as we can just store the offset of our original substring. The fewer string manipulations we do, the better.
public static List<string> split_string(int width, string text)
{
width = width - 1; //So we're not constantly comparing to width - 1
var returnSet = new List<string>();
var currLength = 0;
var oldOffset = 0;
for (var i = 0; i < text.Length; i++)
{
if (currLength >= width && text[i] == ' ')
{
returnSet.Add(text.Substring(oldOffset, i - oldOffset));
oldOffset = i + 1;
currLength = 0;
continue;
}
currLength++;
}
if (oldOffset < text.Length)
returnSet.Add(text.Substring(oldOffset));
return returnSet;
}
Testing:
split_string(25, "Memphis needs at least one FAT partition on a Master Boot Record to be able to store it's configuration and other files on. Please use a partition utility like GParted to partition your hard drive properly.");
Gives:
Memphis needs at least one
FAT partition on a Master
Boot Record to be able to
store it's configuration
and other files on. Please
use a partition utility like
GParted to partition your
hard drive properly.
split_string(6, "Hello beautiful world.")
Gives
Hello
beautiful
world.

Non-exponential formatted float

I have a UTF-8 formatted data file that contains thousands of floating point numbers. At the time it was designed the developers decided to omit the 'e' in the exponential notation to save space. Therefore the data looks like:
1.85783+16 0.000000+0 1.900000+6-3.855418-4 1.958263+6 7.836995-4
-2.000000+6 9.903130-4 2.100000+6 1.417469-3 2.159110+6 1.655700-3
2.200000+6 1.813662-3-2.250000+6-1.998687-3 2.300000+6 2.174219-3
2.309746+6 2.207278-3 2.400000+6 2.494469-3 2.400127+6 2.494848-3
-2.500000+6 2.769739-3 2.503362+6 2.778185-3 2.600000+6 3.020353-3
2.700000+6 3.268572-3 2.750000+6 3.391230-3 2.800000+6 3.512625-3
2.900000+6 3.750746-3 2.952457+6 3.872690-3 3.000000+6 3.981166-3
3.202512+6 4.437824-3 3.250000+6 4.542310-3 3.402356+6 4.861319-3
The problem is float.Parse() will not work with this format. The intermediate solution I had was,
protected static float ParseFloatingPoint(string data)
{
int signPos;
char replaceChar = '+';
// Skip over first character so that a leading + is not caught
signPos = data.IndexOf(replaceChar, 1);
// Didn't find a '+', so lets see if there's a '-'
if (signPos == -1)
{
replaceChar = '-';
signPos = data.IndexOf('-', 1);
}
// Found either a '+' or '-'
if (signPos != -1)
{
// Create a new char array with an extra space to accommodate the 'e'
char[] newData = new char[EntryWidth + 1];
// Copy from string up to the sign
for (int i = 0; i < signPos; i++)
{
newData[i] = data[i];
}
// Replace the sign with an 'e + sign'
newData[signPos] = 'e';
newData[signPos + 1] = replaceChar;
// Copy the rest of the string
for (int i = signPos + 2; i < EntryWidth + 1; i++)
{
newData[i] = data[i - 1];
}
return float.Parse(new string(newData), NumberStyles.Float, CultureInfo.InvariantCulture);
}
else
{
return float.Parse(data, NumberStyles.Float, CultureInfo.InvariantCulture);
}
}
I can't call a simple String.Replace() because it will replace any leading negative signs. I could use substrings but then I'm making LOTS of extra strings and I'm concerned about the performance.
Does anyone have a more elegant solution to this?
string test = "1.85783-16";
char[] signs = { '+', '-' };
int decimalPos = test.IndexOf('.');
int signPos = test.LastIndexOfAny(signs);
string result = (signPos > decimalPos) ?
string.Concat(
test.Substring(0, signPos),
"E",
test.Substring(signPos)) : test;
float.Parse(result).Dump(); //1.85783E-16
The ideas I'm using here ensure the decimal comes before the sign (thus avoiding any problems if the exponent is missing) as well as using LastIndexOf() to work from the back (ensuring we have the exponent if one existed). If there is a possibility of a prefix "+" the first if would need to include || signPos < decimalPos.
Other results:
"1.85783" => "1.85783"; //Missing exponent is returned clean
"-1.85783" => "-1.85783"; //Sign prefix returned clean
"-1.85783-3" => "-1.85783E-3" //Sign prefix and exponent coexist peacefully.
According to the comments a test of this method shows only a 5% performance hit (after avoiding the String.Format(), which I should have remembered was awful). I think the code is much clearer: only one decision to make.
In terms of speed, your original solution is the fastest I've tried so far (@Godeke's is a very close second). @Godeke's is much more readable, for only a minor performance cost. Add in some robustness checks, and his may be the long-term way to go. In terms of robustness, you can add that to yours like so:
static char[] signChars = new char[] { '+', '-' };
static float ParseFloatingPoint(string data)
{
if (data.Length != EntryWidth)
{
throw new ArgumentException("data is not the correct size", "data");
}
else if (data[0] != ' ' && data[0] != '+' && data[0] != '-')
{
throw new ArgumentException("unexpected leading character", "data");
}
int signPos = data.LastIndexOfAny(signChars);
// Found either a '+' or '-'
if (signPos > 0)
{
// Create a new char array with an extra space to accommodate the 'e'
char[] newData = new char[EntryWidth + 1];
// Copy from string up to the sign
for (int ii = 0; ii < signPos; ++ii)
{
newData[ii] = data[ii];
}
// Replace the sign with an 'e + sign'
newData[signPos] = 'e';
newData[signPos + 1] = data[signPos];
// Copy the rest of the string
for (int ii = signPos + 2; ii < EntryWidth + 1; ++ii)
{
newData[ii] = data[ii - 1];
}
return Single.Parse(
new string(newData),
NumberStyles.Float,
CultureInfo.InvariantCulture);
}
else
{
Debug.Assert(false, "data does not have an exponential? This is odd.");
return Single.Parse(data, NumberStyles.Float, CultureInfo.InvariantCulture);
}
}
Benchmarks on my X5260 (including the times to just grok out the individual data points):
Code                 Average Runtime   Values Parsed
----------------------------------------------------
Nothing (Overhead)   13 ms             0
Original             50 ms             150000
Godeke               60 ms             150000
Original Robust      56 ms             150000
Thanks Godeke for your continually improving edits.
I ended up changing the parameters of the parsing function to take a char[] rather than a string and used your basic premise to come up with the following.
protected static float ParseFloatingPoint(char[] data)
{
int decimalPos = Array.IndexOf<char>(data, '.');
int posSignPos = Array.LastIndexOf<char>(data, '+');
int negSignPos = Array.LastIndexOf<char>(data, '-');
int signPos = (posSignPos > negSignPos) ? posSignPos : negSignPos;
string result;
if (signPos > decimalPos)
{
char[] newData = new char[data.Length + 1];
Array.Copy(data, newData, signPos);
newData[signPos] = 'E';
Array.Copy(data, signPos, newData, signPos + 1, data.Length - signPos);
result = new string(newData);
}
else
{
result = new string(data);
}
return float.Parse(result, NumberStyles.Float, CultureInfo.InvariantCulture);
}
I changed the input to the function from string to char[] because I wanted to move away from ReadLine(). I'm assuming this would perform better than creating lots of strings. Instead, I read a fixed number of bytes from the data file (since it will ALWAYS be 11-character-wide data), convert the byte[] to char[], and then perform the above processing to convert to a float.
Could you possibly use a regular expression to pick out each occurrence?
Some information here on suitable expressions:
http://www.regular-expressions.info/floatingpoint.html
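One way the regular-expression idea might look, as a sketch only: the pattern below assumes every value has a decimal point in its mantissa and that an exponent, when present, is a sign followed by digits with no 'e'. The CompactFloatParser class and ParseLine name are hypothetical. A value written without an exponent and immediately followed by a negative number with no space in between would be misread, so this relies on the layout shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

static class CompactFloatParser
{
    // Group 1: signed mantissa with a decimal point.
    // Group 2: optional signed exponent written without the 'e'.
    static readonly Regex Number =
        new Regex(@"([+-]?\d+\.\d+)([+-]\d+)?", RegexOptions.Compiled);

    // Hypothetical helper: extract every value from one line of the file.
    public static List<float> ParseLine(string line)
    {
        var values = new List<float>();
        foreach (Match m in Number.Matches(line))
        {
            string text = m.Groups[1].Value;
            if (m.Groups[2].Success)
                text += "E" + m.Groups[2].Value; // re-insert the missing 'E'
            values.Add(float.Parse(text, NumberStyles.Float,
                                   CultureInfo.InvariantCulture));
        }
        return values;
    }
}
```

Matching whole numbers also splits adjacent fields such as "1.900000+6-3.855418-4" into two values, which a plain whitespace split would not do. Whether it is fast enough compared with the char[] approaches above would need measuring.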
Why not just write a simple script to reformat the data file once and then use float.Parse()?
You said "thousands" of floating point numbers, so even a terribly naive approach will finish pretty quickly (if you said "trillions" I would be more hesitant), and code that you only need to run once will (almost) never be performance critical. Certainly it would take less time to run than posting the question to SO takes, and there's much less opportunity for error.
