How can I analyse millions of strings that merge into each other? - C#

I have millions of strings, around 8GB worth of HEX; each string is 3.2kb in length.
Each of these strings contains multiple parts of data I need to extract.
This is an example of one such string:
GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ$GPGGA,104646.091,,,,,0,0,,,M,,M,,*41$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test2ÿÿ3ÿÿ4ÿÿ5ÿÿ6ÿÿ7ÿÿ8ÿÿ9ÿÿ:ÿÿ;ÿÿ<ÿÿ=ÿÿ>ÿÿ?ÿÿ#ÿÿAÿÿBÿÿCÿÿDÿÿEÿÿFÿÿGÿÿHÿÿIÿÿJÿÿ$GPGGA,104647.091,,,,,0,0,,,M,,M,,*40$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header TestKÿÿLÿÿMÿÿNÿÿOÿÿPÿÿQÿÿRÿÿSÿÿTÿÿUÿÿVÿÿWÿÿXÿÿYÿÿZÿÿ[ÿÿ\ÿÿ]ÿÿ^ÿÿ_ÿÿ`ÿÿaÿÿbÿÿcÿÿ$GPGGA,104648.091,,,,,0,0,,,M,,M,,*4F$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Testdÿÿeÿÿfÿÿgÿÿhÿÿiÿÿjÿÿkÿÿlÿÿmÿÿnÿÿoÿÿpÿÿqÿÿrÿÿsÿÿtÿÿuÿÿvÿÿwÿÿxÿÿyÿÿzÿÿ{ÿÿ|ÿÿ$GPGGA,104649.091,,,,,0,0,,,M,,M,,*4E$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test}ÿÿ~ÿÿ.ÿÿ€ÿÿ.ÿÿ‚ÿÿƒÿÿ„ÿÿ…ÿÿ†ÿÿ‡ÿÿˆÿÿ‰ÿÿŠÿÿ‹ÿÿŒÿÿ.ÿÿŽÿÿ.ÿÿ.ÿÿ‘ÿÿ’ÿÿ“ÿÿ”ÿÿ•ÿÿ$GPGGA,104650.091,,,,,0,0,,,M,,M,,*46$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Head
As you can see, it is pretty much this repeated:
GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ
I want to separate this string into two lists like this:
_GPSList
$GPGGA,104644.091,,,,,0,0,,,M,,M,,*43
$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*
$GPVTG,0.00,T,,M,0.00,N,0.00,K,N
_WavList
32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ
32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ
Issue 1:
This repetition isn't contained within a single string; it overflows into the next one. If some data crosses the end of one string and the start of the next, how do I deal with that?
Issue 2: How do I analyse the string and extract only the parts I need?

The solution I'm providing is not a complete answer but more like an idea which might help you get what you want.
Everything else which I present is an assumption on my behalf.
//Assuming your data is stored in a file "yourdatafile"
//Splitting all the text on "$" assuming this will separate GPSData
string[] splittedstring = File.ReadAllText("yourdatafile").Split('$');
//I found an extra string lingering in the sample you provided
//because I splitted on "$", so you gotta take that into account
var GPSList = new List<string>();
var WAVList = new List<string>();
foreach (var str in splittedstring)
{
    //So if the string contains "Header" we would want to separate it from GPS data
    if (str.Contains("Header"))
    {
        string temp = str.Remove(str.IndexOf("Header"));
        int indexOfAsterisk = temp.LastIndexOf("*");
        string stringBeforeAsterisk = str.Substring(0, indexOfAsterisk + 1);
        string stringAfterAsterisk = str.Replace(stringBeforeAsterisk, "");
        WAVList.Add(stringAfterAsterisk);
        GPSList.Add("$" + stringBeforeAsterisk);
    }
    else
    {
        GPSList.Add("$" + str);
    }
}
This produces the output you need; the only exception is that extra string. Some non-standard characters might also show up as black blocks.
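For Issue 1 (records that straddle two strings), one option is to carry the unfinished tail of each string over to the next one. The following is only a minimal sketch under assumptions that are not in the question: every record is assumed to start with the marker "$GPGGA", and the class name RecordStitcher and its methods are hypothetical. Searching for the full marker rather than a bare "$" also avoids false splits on "$" characters inside the wav data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: joins consecutive chunks and emits only complete records.
public sealed class RecordStitcher
{
    private const string Marker = "$GPGGA"; // assumed start-of-record marker
    private string _carry = "";             // unfinished tail of the previous chunk

    // Feed the next string; returns every record that is now known to be complete.
    public List<string> Feed(string chunk)
    {
        var complete = new List<string>();
        string buffer = _carry + chunk;

        int start = buffer.IndexOf(Marker, StringComparison.Ordinal);
        if (start < 0)
        {
            // No marker yet; it may itself be split across two chunks, so keep everything.
            _carry = buffer;
            return complete;
        }

        // Anything before the first marker is a partial record and is dropped here.
        while (true)
        {
            int next = buffer.IndexOf(Marker, start + Marker.Length, StringComparison.Ordinal);
            if (next < 0) break;
            complete.Add(buffer.Substring(start, next - start));
            start = next;
        }

        // The last record may continue in the next chunk, so hold it back.
        _carry = buffer.Substring(start);
        return complete;
    }

    // Call once after the last chunk to get whatever is still buffered.
    public string Flush()
    {
        string rest = _carry;
        _carry = "";
        return rest;
    }
}
```

Each record returned by Feed can then be split into its GPS and wav parts with the code above. Only one partial record is held in memory at a time, so the 8GB input never has to be loaded whole.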

Related

Trying to format string of Datagrid output

Hi, I have the following code, which creates two worksheets in an Excel spreadsheet based on the values in a datagrid. I am able to get it working for two datagrids, however I need to do it for 14 datagrids. This is what I have got so far:
var grid1Output = RadGridView1.ToExcelML();
var grid2Output = RadGridView2.ToExcelML().Replace("Worksheet1", "Worksheet2");
var workBook = grid1Output.Replace("</Worksheet>", "</Worksheet>" +
grid2Output.Substring(grid2Output.IndexOf("<Worksheet"),
grid2Output.IndexOf("</Worksheet")- grid2Output.IndexOf("<Worksheet")) + " </Worksheet>");
The above works fine; however, I need to do it for 14 grid outputs in total. My problem is that I am having trouble replacing strings at the right place. How do I do this?
I would probably do it with Linq for XML methods rather than string manipulation, but the choice is yours.
Either way, it shouldn't be that hard to write a method that takes a grid output (I am assuming it's a string), extracts the contents, and returns them. The calling routine then assembles the 14 XML strings and wraps them in a single Worksheet tag.
Here's a stab at it. Bear in mind that I'm not familiar with the RadGridView and the output of ToExcelML, so you probably won't be able to use this code without some modification. I'm making some assumptions that may not be valid.
First, I would create a method that takes an XML string as input. I am assuming that this string is entirely wrapped in a <Worksheetn> tag.
string ExtractWorksheetContents(string excelML, int index)
{
    // You might also be able to do this with a regex, depending on how the contents are structured
    // Since I don't know enough about the content, I will do this with string manipulation, as
    // you did, rather than loading the XML and making assumptions.
    string tagName = string.Format("Worksheet{0}", index);
    int worksheetStart = excelML.IndexOf("<" + tagName);
    // "</" + tagName + ">" is tagName.Length + 3 characters long
    int worksheetEnd = excelML.IndexOf("</" + tagName + ">") + tagName.Length + 3;
    // Should contain some checks that neither w'sheet start nor end are -1
    return excelML.Substring(worksheetStart, worksheetEnd - worksheetStart);
}
Then I would assemble the results. Again, I'm making assumptions about how the XML is structured.
StringBuilder sb = new StringBuilder();
sb.Append("<Worksheet>");
RadGridView[] gridViews = new RadGridView[] { RadGridView1, RadGridView2 .... RadGridView14 };
for(int i=0;i<14; i++)
{
var rgv = gridViews[i];
sb.Append(ExtractWorksheetContents(rgv.ToExcelML(),i+1));
}
sb.Append("</Worksheet>");
var workBook = sb.ToString();
Hope this helps somewhat.
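The LINQ to XML route mentioned at the top of this answer might look roughly like the sketch below. It rests on assumptions not confirmed anywhere in this thread: each grid output is taken to be a well-formed XML document containing at least one element whose local name is Worksheet, and WorkbookMerger is a hypothetical name. Elements are matched by local name so that the sketch does not depend on whichever namespace ToExcelML emits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: copies the Worksheet elements of later documents into the first one.
public static class WorkbookMerger
{
    public static string Merge(IList<string> gridOutputs)
    {
        XDocument target = XDocument.Parse(gridOutputs[0]);

        // Throws if the first document has no Worksheet element (an assumption of this sketch).
        XElement lastSheet = target.Descendants()
                                   .Last(e => e.Name.LocalName == "Worksheet");

        foreach (string xml in gridOutputs.Skip(1))
        {
            XDocument source = XDocument.Parse(xml);
            foreach (XElement sheet in source.Descendants()
                                             .Where(e => e.Name.LocalName == "Worksheet"))
            {
                var copy = new XElement(sheet); // deep copy, detached from the source document
                lastSheet.AddAfterSelf(copy);   // keep the worksheets in input order
                lastSheet = copy;
            }
        }

        return target.ToString(SaveOptions.DisableFormatting);
    }
}
```

Calling Merge with the 14 ToExcelML() strings would return one document holding every worksheet. Renaming the sheets so their names stay unique, as the Replace("Worksheet1", "Worksheet2") call in the question does, is left out of this sketch.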

remove first element from array

PHP developer here working with c#.
I'm using a technique to remove a block of text from a large string by exploding the string into an array and then shifting the first element out of the array and turning what remains back into a string.
With PHP (an awesome & easy language) it was just
$array = explode('somestring',$string);
array_shift($array);
$newstring = implode(' ', $array);
and I'm done.
I get so mad at c# for not allowing me to create dynamic arrays and for not offering me default functions that can do the same thing as PHP regarding arrays. Instead of dynamic arrays I have to create lists and predefine key structures etc. But I'm new and I'm sure there are still equally graceful ways to do the same with c#.
Will someone show me a clean way to accomplish this goal with c#?
Rephrase of question: How can I remove the first element from an array using c# code.
Here is how far I've gotten, but RemoveAt throws an error while debugging, so I don't believe it works:
//scoop-out feed header information
if (entry_start != "")
{
string[] parts = Regex.Split(this_string, @entry_start);
parts.RemoveAt(0);
this_string = String.Join(" ", parts);
}
"I get so mad at c# for not allowing me to create dynamic arrays"
You may take a look at the List<T> class. Its RemoveAt might be worth checking.
But for your particular scenario you could simply use LINQ and the Skip extension method (don't forget to add using System.Linq; to your file in order to bring it into scope):
if (entry_start != "")
{
string[] parts = Regex.Split(this_string, @entry_start).Skip(1).ToArray();
this_string = String.Join(" ", parts);
}
C# is not designed to be quick and dirty, nor does it particularly specialize in text manipulation. Furthermore, the technique you use for removing some portion of a string from the beginning is crazy, imho.
Why don't you just use String.Substring(int start, int length) coupled with String.IndexOf("your delimiter")?
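A minimal sketch of that Substring and IndexOf idea, assuming the delimiter is a literal string rather than a regular expression. The names TextTools and DropThroughFirst are made up for illustration.

```csharp
using System;

// Hypothetical helper: removes everything up to and including the first delimiter.
public static class TextTools
{
    public static string DropThroughFirst(string input, string delimiter)
    {
        int at = input.IndexOf(delimiter, StringComparison.Ordinal);

        // Delimiter not found: return the input unchanged instead of throwing.
        return at < 0 ? input : input.Substring(at + delimiter.Length);
    }
}
```

Unlike the split, shift and join approach, this leaves any later occurrences of the delimiter in place rather than replacing them with spaces, and it never builds an intermediate array.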
Here is the corresponding C# code:
string input = "a,b,c,d,e";
string[] splitvals = input.Split(',');
string output = String.Join(",", splitvals, 1, splitvals.Length-1);
MessageBox.Show(output);
You can use LINQ for this:
if (entry_start != "")
    this_string = String.Join(" ", Regex.Split(this_string, @entry_start).Skip(1).ToArray());
string split = ",";
string str = "asd1,asd2,asd3,asd4,asd5";
string[] ary = str.Split(new string[] { split }, StringSplitOptions.RemoveEmptyEntries);
string newstr = string.Join(split, ary, 1, ary.Length - 1);
This splits at ",", removes the first record, then joins the rest back together with ",".
As stated above, you can use LINQ. Skip(int) will return an IEnumerable<string> that you can then convert back to an array with ToArray().
string[] myArray = new string[]{"this", "is", "an", "array"};
myArray = myArray.Skip(1).ToArray();
You might be more comfortable with generic lists than arrays, which work more like PHP arrays.
List<T>
But if your goal is "to remove a block of text from a large string" then the easier way would be:
string Example = "somestring";
string BlockRemoved = Example.Substring(1);
// BlockRemoved = "omestring"
Edit
I misunderstood the question, thinking you were just removing the first element from the array where the array consisted of the characters that make up the string.
To split a string by a delimiter, look at the String.Split method instead.

cutting from string in C#

My strings look like this: aaa/b/cc/dd/ee. I want to cut off the first part, without the /. How can I do it? I have many strings and they don't all have the same length. I tried to use Substring(), but what about the / ?
I want to add 'aaa' to the first treeNode, 'b' to the second, etc. I know how to add something to a treeview, but I don't know how to get these parts.
Maybe the Split() method is what you're after?
string value = "aaa/b/cc/dd/ee";
string[] collection = value.Split('/');
Identifies the substrings in this instance that are delimited by one or more characters specified in an array, then places the substrings into a String array.
Based on your updates related to a TreeView (ASP.Net? WinForms?) you can do this:
foreach(string text in collection)
{
TreeNode node = new TreeNode(text);
myTreeView.Nodes.Add(node);
}
Use Substring and IndexOf to find the location of the first /
To get the first part:
// from memory, need to test :)
string output = inputString.Substring(0, inputString.IndexOf("/"));
To just cut the first part:
// from memory, need to test :)
string output = inputString.Substring(inputString.IndexOf("/") + 1);
You would probably want to do:
string[] parts = "aaa/b/cc/dd/ee".Split(new char[] { '/' });
Sounds like this is a job for... Regular Expressions!
One way to do it is by using string.Split to split your string into an array, and then string.Join to make whatever parts of the array you want into a new string.
For example:
var parts = input.Split('/');
var processedInput = string.Join("/", parts.Skip(1));
This is a general approach. If you only need to do very specific processing, you can be more efficient with string.IndexOf, for example:
var processedInput = input.Substring(input.IndexOf('/') + 1);

Loop Problem: Assign data to different strings when in a loop

I have a string which consists of different fields. So what I want to do is get the different text and assign each of them into a field.
ex: Hello Allan IBM
so what I want to do is:
put these three words in different strings like
string Greeting = "Hello"
string Name = "Allan"
string Company = "IBM"
//all of it happening in a loop.
string data = "Hello Allan IBM"
string s = data[i].ToString();
string[] words = s.Split(',');
foreach (string word in words) {
Console.WriteLine(word);
}
any suggestions?
thanks hope to hear from you soon
If I understand correctly, you have a string with placeholders and you want to put different strings into those placeholders:
var format="{0}, {1} {2}. How are you?";
//string Greeting = "Hello"
//string Name = "Allan"
//string Company = "IBM"
//all of it happening in a loop.
string[] data = ...; //I think you have an array of strings separated by ,
foreach (var s in data)
{
    //s is already a string, so no ToString() is needed
    string[] words = s.Split(',');
    Console.WriteLine(format, words[0], words[1], words[2]);
}
To me this does not sound like a problem that can be solved with a loop. The essential problem is that a loop only works if you perform exactly the same operation on every item. If your problem doesn't fit, you end up with a dozen lines of code inside the loop to handle special cases, which could have been written more briefly without a loop.
If there are only two or three strings you have to set (which should be the case if you have named variables), assign them from the indexes of the split string. An alternative would be to use regular expressions to match patterns, which makes it more robust if one of the expected strings is missing.
Another possibility would be to set attributes on members or properties like:
[MyParseAttribute(/*position*/ /*regex*/)]
string Greeting {get;set;}
And use reflection to populate them. Here you could create a loop over all properties that have that attribute, as it sounds to me that you are eager to create a loop :-)
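As a rough illustration of the attribute idea, here is a sketch that supports only a position (no regex). All names in it (FieldPositionAttribute, Record, RecordParser) are hypothetical, and it assumes .NET 4.5 or later for GetCustomAttribute<T>().

```csharp
using System;
using System.Reflection;

// Hypothetical attribute: marks which word of the input a property receives.
[AttributeUsage(AttributeTargets.Property)]
public sealed class FieldPositionAttribute : Attribute
{
    public int Position { get; }
    public FieldPositionAttribute(int position) { Position = position; }
}

public class Record
{
    [FieldPosition(0)] public string Greeting { get; set; }
    [FieldPosition(1)] public string Name { get; set; }
    [FieldPosition(2)] public string Company { get; set; }
}

public static class RecordParser
{
    // Splits the line and copies each word into the property tagged with its position.
    public static T Parse<T>(string line, char separator) where T : class, new()
    {
        string[] words = line.Split(separator);
        var result = new T();

        foreach (PropertyInfo prop in typeof(T).GetProperties())
        {
            var attr = prop.GetCustomAttribute<FieldPositionAttribute>();

            // Skip untagged properties and positions the input does not have.
            if (attr != null && attr.Position < words.Length)
                prop.SetValue(result, words[attr.Position]);
        }

        return result;
    }
}
```

With these definitions, RecordParser.Parse<Record>("Hello Allan IBM", ' ') fills Greeting, Name and Company from the three words. For just three fixed fields, assigning words[0], words[1] and words[2] directly is simpler.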

string[] management

ok, so ill cut to the chase here. and to be clear, im looking for code examples where possible.
so, i have a normal string, lets say,
string mystring = "this is my string i want to use";
ok, now that i have my string, i split it by the space with
string[] splitArray = mystring.Split(new char[] { ' ' });
ok, so now i have splitArray[0] through splitArray[7].
now, i need to do some fancy things with the string that i normally wouldnt need to do.
here are a few:
i need to cut off the first word, so i am left with the other 7 words, so that i have something like:
string myfirstword = "this";
mystring = "is my string i want to use";
now, i will need to use mystring over and over again, using different parts of it at different times, and depending on the string i will have no idea how long, it will be. so i will give some examples of things ill need.
first, ill need to know, how many words are there (this is easy, just throwing it in)
second, ill need some way of using things like,
string secondword = splitArray[1];
string everythingAfterTheSecondWord = splitArray[2+];
if you noticed, i included a [2+] ... the + indicating that i want all strings in the array put back together, spaces in all, into a string. so for example,
string examplestring = "this is my example for my stack overflow question";
string[] splitArray2 = examplestring.Split(new char[] { ' ' });
now, if i called on splitArray2[4+] i would want a return of "for my stack overflow question". now obviously its not as simple as adding a + to a string array.. but thats what i need, and under the current situation i have tried many other easier ways that simply do not work.
ALSO, if i called on something like splitArray2[2-5] i would want, words 2 through 5 obviously.
Summary:
i need greater management of my string[] arrays, and i need to be able to find, every word after word *, need to be able to strip out random words in the string while leaving the rest of the string intact, and need to be able to find string m through n
Thanks!
Most of what you're looking for can be achieved with a List<string>. Briefly:
string mystring = "this is my string i want to use";
List<string> splitArray = new List<string>(mystring.Split(new char[] { ' ' }));
string firstWord = splitArray[0];
// mystring2 = "is my string i want to use"
splitArray.RemoveAt(0);
string mystring2 = String.Join(" ", splitArray.ToArray());
To do the more complicated things you describe with splitArray[2+] requires LINQ though, and hence .NET 3.5.
List<string> everythingAfterTheSecondWord = splitArray.Skip(2).ToList();
For splitArray[2-5]:
List<string> arraySlice = splitArray.Skip(2).Take(4).ToList(); // words 2 through 5 inclusive
Well, to do the "every word starting at word X" you could do this:
string newString = string.Join(" ", splitArray, x, splitArray.Length - x);
To get y words starting at x, do this:
string newString = string.Join(" ", splitArray, x, y);
To get the number of words:
int wordCount = splitArray.Length;
Putting it all together, words x through y (inclusive) goes like this:
string newString = string.Join(" ", splitArray, x, y - x + 1);
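The two slicing cases from the question, splitArray[4+] and splitArray[2-5], could be wrapped in small helpers built on the string.Join(separator, array, startIndex, count) overload. This is only a sketch; WordSlices, From and Range are hypothetical names, and indexes are zero-based with an inclusive upper bound.

```csharp
using System;

// Hypothetical helpers for rejoining part of a split string.
public static class WordSlices
{
    // Words from index start to the end, e.g. the question's splitArray[4+].
    public static string From(string[] words, int start)
    {
        return string.Join(" ", words, start, words.Length - start);
    }

    // Words first through last inclusive, e.g. the question's splitArray[2-5].
    public static string Range(string[] words, int first, int last)
    {
        return string.Join(" ", words, first, last - first + 1);
    }
}
```

Both methods throw ArgumentOutOfRangeException if the indexes fall outside the array, so callers may want to check against words.Length first.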
