I have millions of strings, around 8GB worth of HEX; each string is 3.2kb in length.
Each of these strings contains multiple parts of data I need to extract.
This is an example of one such string:
GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ$GPGGA,104646.091,,,,,0,0,,,M,,M,,*41$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test2ÿÿ3ÿÿ4ÿÿ5ÿÿ6ÿÿ7ÿÿ8ÿÿ9ÿÿ:ÿÿ;ÿÿ<ÿÿ=ÿÿ>ÿÿ?ÿÿ#ÿÿAÿÿBÿÿCÿÿDÿÿEÿÿFÿÿGÿÿHÿÿIÿÿJÿÿ$GPGGA,104647.091,,,,,0,0,,,M,,M,,*40$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header TestKÿÿLÿÿMÿÿNÿÿOÿÿPÿÿQÿÿRÿÿSÿÿTÿÿUÿÿVÿÿWÿÿXÿÿYÿÿZÿÿ[ÿÿ\ÿÿ]ÿÿ^ÿÿ_ÿÿ`ÿÿaÿÿbÿÿcÿÿ$GPGGA,104648.091,,,,,0,0,,,M,,M,,*4F$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Testdÿÿeÿÿfÿÿgÿÿhÿÿiÿÿjÿÿkÿÿlÿÿmÿÿnÿÿoÿÿpÿÿqÿÿrÿÿsÿÿtÿÿuÿÿvÿÿwÿÿxÿÿyÿÿzÿÿ{ÿÿ|ÿÿ$GPGGA,104649.091,,,,,0,0,,,M,,M,,*4E$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test}ÿÿ~ÿÿ.ÿÿ€ÿÿ.ÿÿ‚ÿÿƒÿÿ„ÿÿ…ÿÿ†ÿÿ‡ÿÿˆÿÿ‰ÿÿŠÿÿ‹ÿÿŒÿÿ.ÿÿŽÿÿ.ÿÿ.ÿÿ‘ÿÿ’ÿÿ“ÿÿ”ÿÿ•ÿÿ$GPGGA,104650.091,,,,,0,0,,,M,,M,,*46$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Head
as you can see it is pretty much this repeated:
GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ
I want to separate this string into two lists like this:
_GPSList
$GPGGA,104644.091,,,,,0,0,,,M,,M,,*43
$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*
$GPVTG,0.00,T,,M,0.00,N,0.00,K,N
_WavList
32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ
32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ
Issue 1:
This repetition isn't containing within a single string, it overflows into the next string. so if some data crosses the end and start of two strings how to I deal with that?
Issue 2: How do I analyse the string and extract only the parts I need?
The solution I'm providing is not a complete answer but more like an idea which might help you get what you want.
Everything else which I present is an assumption on my behalf.
//Assuming your data is stored in a file "yourdatafile"
//Splitting all the text on "$" assuming this will separate GPSData
string[] splittedstring = File.ReadAllText("yourdatafile").Split('$');
//I found an extra string lingering in the sample you provided
//because I splitted on "$", so you gotta take that into account
var GPSList = new List<string>();
var WAVList = new List<string>();
foreach (var str in splittedstring)
{
//So if the string contains "Header" we would want to separate it from GPS data
if (str.Contains("Header"))
{
string temp = str.Remove(str.IndexOf("Header"));
int indexOfAsterisk = temp.LastIndexOf("*");
string stringBeforeAsterisk = str.Substring(0, indexOfAsterisk + 1);
string stringAfterAsterisk = str.Replace(stringBeforeAsterisk, "");
WAVList.Add(stringAfterAsterisk);
GPSList.Add("$" + stringBeforeAsterisk);
}
else
GPSList.Add("$" + str);
}
This provides the exact output as you need, only exception is with that extra string. Also some non-standard characters might look like black blocks.
I've been trying to understand how XML and CSV parsing work, without actually writing any code yet. I might have to parse a .csv file in the ongoing project and I'd like to be ready. (I'll have to convert them to .ofx files)
I'm also aware there's probably a thousand XLM and csv parsers out there, so I'm more curious than I am worried. I intend on using the XMLReader that I believe microsoft provides.
Let's say I have the following .csv file
02/02/2016 ; myfirstname ; mylastname ; somefield ; 321654 ; commentary ; blabla
Sometimes a field will be missing. Which means, for the sake of the example, that the lastname isn't mandatory, and somefield could be right after the first name.
My questions are :
How do I avoid the confusion between somefield and lastname?
I could count the total number of fields, but in my situation two are optional, if there is only one missing, I can't be sure which one it is.
How do I avoid false "tags"? I mean, if the user first comment includes a ;, how can I be sure it's a part of his comment and not the start of the following tag?
Again, I could count the remaining fields and find out where I am, but that excludes the optional fields problem.
My questions also apply to XML, what can I do if the user starts writing XML in his form ? Wether I decide to export the form as .csv or .xml, there can be trouble.
Right now I'm on the assumption that the c# Xml reader/parser are awesome enough to deal with it ; and if they are, I'm really curious on the how.
Assuming the CSV/XML data has been exported properly, none of this will be a problem. Missing fields will be handled by repeated separators:
02/02/2016;myfirstname;;somefield
Semi-colons within a field will normally be handled by quoting:
02/02/2016;"myfirst;name";
Quotes are escaped within a string:
02/02/2016;"my""first""name";
With XML it's even less of an issue since the tags or attributes will all have names.
If your CSV data is NOT well-formed, then you have a much bigger problem, as it may be impossible to distinguish missing fields and non-quoted separators.
How do I avoid false "tags"? String values should be quoted if the (can) contain separator characters. If you create the CSV file, quote and unquote all string values.
How do I avoid the confusion between somefield and lastname? No general solution for this, all case must be handled one by one. Can a general algorithm decide wheather first name or last name is missing? No.
If you know what field(s) can be omitted, you can write an "intelligent" handling.
Use XML and all of your problem will be solved.
Fisrt
How do I avoid the confusion between somefield and lastname?
There is no way to do this without change the logic of file. For example: when "mylastname" is empty You may have a "" value, empty string or like this ;;
How do I avoid false "tags"? I mean, if the user first comment includes a ;, how can I be sure it's a part of his comment and not the start of the following tag?
It is simple you have to file like this:
; - separor of columns
"" - delimetr of columns
value;value;"value;;;;value";value
To split this only for separtor ; without the separator in "" this code do this is tested and compiled
public static string[] SplitWithDelimeter(this string line, char separator, char checkSeparator, bool eraseCheckSeparator)
{
var separatorsIndexes = new List<int>();
var open = false;
for (var i = 0; i < line.Length; i++)
{
if (line[i] == checkSeparator)
{
open = !open;
}
if (!open && line[i] == separator )
{
separatorsIndexes.Add(i);
}
}
separatorsIndexes.Add(line.Length);
var result = new string[separatorsIndexes.Count];
var first = 0;
for (var j = 0; j < separatorsIndexes.Count; j++)
{
var tempLine = line.Substring(first, separatorsIndexes[j] - first);
result[j] = eraseCheckSeparator ? tempLine.Replace(checkSeparator, ' ').Trim() : tempLine;
first = separatorsIndexes[j] + 1;
}
return result;
}
Return would be:
value
value
"value;;;;value"
value
Having a bit of trouble with converting a comma separated text file to a generic list. I have a class (called "Customers") defined with the following attributes:
Name (string)
City (string)
Balance (double)
CardNumber (int)
The values will be stored in a text file in this format: Name,City, Balance, CarNumber e.g. John,Memphis,10,200789. There will be multiple lines of this. What I want to do is have each line be placed in a list item when the user clicks a button.
I've worked out I can break each line up using the .Split() method, but have no idea how to make the correct value go into the correct attribute of the list. (Please note: I know how to use get/set properties, and I am not allowed to use LINQ to solve the problem).
Any help appreciated, as I am only learning and have been working on this for a for while with no luck. Thanks
EDIT:
Sorry, it appears I'm not making myself clear. I know how to use .add.
If I have two lines in the text file:
A,B,1,2 and
C,D,3,4
What I don't know how to do is make the name "field" in the list item in position 0 equal "A", and the name "field" in the item in position 1 equal "C" and so on.
Sorry for the poor use of terminology, I'm only learning. Hope you understand what I'm asking (I'm sure it's really easy to do once you know)
The result of string.Split will give you an array of strings:
string[] lineValues = line.Split(',');
You can access values in an array by index:
string name = lineValues[0];
string city = lineValues[1];
You can convert strings to double or int using their respective Parse methods:
double balance = double.Parse(lineValues[2]);
int cardNumber = int.Parse(lineValues[3]);
You can instantiate the class and assign to it very simply:
Customer customerForCurrentLine = new Customer()
{
Name = name,
City = city,
Balance = balance,
CardNumber = cardNumber,
};
Simply loop over the lines, instantiate a Customer for that line, and add it to a variable you've created of the type List<Customer>
If you want your program to be bulletproof, you're going to have to do a lot of checking to skip over lines that don't have enough values, or that would fail to parse to the correct number type. For example, check lineValues.Length == 4 and use int.TryParse(...) and double.TryParse(...).
Read a file and split its text based on newline character. Then for total line count run a loop that will split based on comma and create a new object and insert values in its properties and add that object to a list.
This way
List<Customers> lst = new List<Customers>();
string[] str = System.IO.File.ReadAllText(#"C:\CutomersFile.txt")
.Split(new string[] { Environment.NewLine },
StringSplitOptions.None);
for (int i = 0; i < str.Length; i++)
{
string[] s = str[i].Split(',');
Customers c = new Customers();
c.Name = s[0];
c.City = s[1];
c.Balance = Convert.ToDouble(s[2]);
c.CardNumber = Convert.ToInt32(s[3]);
lst.Add(c);
}
BTW class name should be Customer and not Customers
Split() generates an array of strings in the order they appeared in the source string. Thus, if your name field is the first column in the CSV file, it will always be the first index in the array.
someCustomer.Name = splitResult[0];
And so on. You'll also need to investigate String.TryParse for your class's numerically typed properties.