Looping Through Large KML Text Files - c#

I am a novice at programming, working on a C# solution for a geomorphology project. I need to extract coordinates from a variable number of Google Earth KML ground overlay files, converted to one long text string, and enter them into an array that can be accessed by other methods.
The KML tags and data of interest look like this:
<LatLonBox>
<north>37.91904192681665</north>
<south>37.46543388598137</south>
<east>15.35832653742206</east>
<west>14.60128369746704</west>
<rotation>-0.1556640799496235</rotation>
</LatLonBox>
The text files I will be processing with the program could have between 1 and a 100 or more of these data groups, each embedded within the standard KML file headers/footers and other tags extraneous for my work. I have already developed the method for extracting the coordinate values as strings and have tested it for one KML file.
At this point it seems that the most efficient approach would be to construct some kind of looping method to search through the string for a coordinate data group, extract the data to a row in the array, then continue to the next group. The method might also go through the string and extract all the "north" data to a column in the array first, then loop back for all the "south" data, etc. I am open to any suggestions.
Due to my limited programming background, straight-forward solutions would be preferred over elegant or advanced solutions, but give it your best shot.
Thanks for your help.

Related

Parsing Complex PDF document with C#

See attached K-1 Document. I have attempted to use numerous tweaks with iTextSharp library but haven't had success in loading data correctly.
Ideally I would like to parse out the document similar to how humans would read them, one textbox at a time, reading its contents.
var reader = new PdfReader(FILE, Encoding.ASCII.GetBytes(password));
string[] lines;
var strategy = new LocationTextExtractionStrategy();
string currentPageText = PdfTextExtractor.GetTextFromPage(reader, 1, strategy);
lines = currentPageText.Split(new string[] {"\r\n", "\n"}, StringSplitOptions.None);
I also tried playing with Annotation parsing but didn't have luck.
I'm a newbie and probably looking at wrong place. Can you help guide me in the right direction?
Thanks a lot.
You would like to parse out the document similar to how humans would read them, one textbox at a time, reading its contents. That means you first will have to try and automatically recognize those text boxes. Then you can extract text by these areas.
To recognize those text boxes automatically in your document, you have to extract the border lines enclosing the boxes. For this you will first have to find out how those border lines are created. They might be drawn using vector graphics as lines or rectangles, but they could also be part of a background bitmap image.
Unfortunately I don't have your IRS form at hand and so cannot analyze its internals. Let's assume the borders are created using vector graphics for now. Thus, you have to extract vector graphics.
To extract vector graphics with iText(Sharp), you make use of classes from the iText(Sharp) parser namespace by making them parse the document and feed the parsing events into a listener you create which collects the vector graphic operations:
You implement IExtRenderListener, in particular its ModifyPath and RenderPath methods which respectively are called when additional path elements (e.g. lines or rectangles) are added to the current path or when the current path is rendered (stroked? filled?). Your implementation collects these information.
You parse your document into an instance of your listener, e.g. using PdfReaderContentParser.
You analyse the lines and rectangles found and derive the coordinates of the boxes they build.
You parse the same page in a LocationTextExtractionStrategy instance.
You retrieve the texts of the recognized text boxes by calling LocationTextExtractionStrategy.GetResultantText with a matching ITextChunkFilter argument for each box.
(Actually you can do the parsing into the instance of your listener and the LocationTextExtractionStrategy instance in one pass for a bit of optimization.)
All iText(Sharp) specific tasks are trivial, and the only other task, the analysis of the lines and rectangles found to derive the coordinates of the boxes, should be no big problem for a software developer proficient in C#.
The first question if this form is electronic or a scanned one? the latter would make the data extraction much harder as it should involve OCR too.
in case you have electronic PDF and if you have all the similar forms then why don't you just use the following strategy:
store coordinates of each "box" in the config file
process documents and exract text from every "box" (i.e. region)
additional process extracted text with regular expressions to separate name from address (or maybe you may just set the region to read text from line by line)
In case you have few variations of the form then you may check the very first box to extract the name of the form and load the appropraite settings file (that contains a set of regions for that variation)
This approach should work with any PDF library.
Take a look at IvyPdf library and template editor. It's using c# and provides high-level functions to parse and extract data so you don't have to deal with internals of PDF documents. You can build fairly complex scenarios using it.
I don't think it can read annotations though.

Sort a text file where each line is a name followed by a score

I'm going to make a scoreboard for an XNA game but I'm having some trouble figuring out how. The player, after finishing a level, will enter his name when prompted. His name and time will be recorder in a text file something like this:
sam 90
james 64
matthew 100
I'm trying to figure out a way to sort this data by the time taken only and not taking into account the name.
I haven't started coding this yet but if anybody can give me any ideas it would be greatly appreciated.
First, read the text file using File.ReadAllLines(...) so you get a string array. Then iterate over the array and split each string on blank space (assuming users can't enter spaces in their names) and order on the second element, which should be the score. You have to cast it into a string with int.Parse(...) to be able to order it properly.
string[] scores = File.ReadAllLines("scorefile.txt");
var orderedScores = scores.OrderByDescending(x => int.Parse(x.Split(' ')[1]));
foreach (var score in orderedScores)
{
Console.WriteLine(score);
}
//outputs:
//matthew 100
//sam 90
//james 64
I would recommend using something like a semi-colon to separate the name and the score instead of a space, as that makes it much easier to handle in the case that users are allowed to enter spaces in their names.
Why not make this into a database file, .sdf which you can easily create on the fly if needed. Would be best for keeping track of data allowing sorting and future reuse.
SQLite is designed for this exact purpose, makes basic CRUD operations a doddle. You can also encrypt database files if your game grows and you want to start using it as a way to download/upload high scores and share scores with friends/world.
Don't get me wrong, this is definitely a little more work than simply parsing a text file, but it is future proof and you get a lot of functionality right out of the bag without having to write parsers and complex search routines etc.
XML is a definite other choice, or JSON. All 3 are all good alternatives. A plain text file probably isn't the way to go though, as in the end will probably cause you more work.
Create the score table as
name::score
and read every line and on it
string line = "Sam::255";
string name = line.split("::")[0];
Also do similar to the score.

Best Way to Load a File, Manipulate the Data, and Write a New File

I have an issue where I need to load a fixed-length file. Process some of the fields, generate a few others, and finally output a new file. The difficult part is that the file is of part numbers and some of the products are superceded by other products (which can also be superceded). What I need to do is follow the superceded trail to get information I need to replace some of the fields in the row I am looking at. So how can I best handle about 200000 lines from a file and the need to move up and down within the given products? I thought about using a collection to hold the data or a dataset, but I just don't think this is the right way. Here is an example of what I am trying to do:
Before
Part Number List Price Description Superceding Part Number
0913982 3852943
3852943 0006710 CARRIER,BEARING
After
Part Number List Price Description Superceding Part Number
0913982 0006710 CARRIER,BEARING 3852943
3852943 0006710 CARRIER,BEARING
As usual any help would be appreciated, thanks.
Wade
Create structure of given fields.
Read file and put structures in collection. You may use part number as key for hashtable to provide fastest searching.
Scan collection and fix the data.
200 000 objects from given lines will fit easily in memory.
For example.
If your structure size is 50 bytes then you will need only 10Mb of memory. It is nothing for modern PC.

Data reading and storing in c#, (Concept..no code)

I want to translate my Matlab code (least squares plane fitting) into C#.
I have many problems in understanding c#.
Let me ask here.
Reading a text file and storing data in xyz format in matrix (e.g., xyzdata= xyz) in Matlab is quite easy.
Translating it into CSharp?
How can I read [x y z] without knowing length of file and how can I store it in Matrix form?
Thank you very much for your help and If someone has plane fitting code / link, please guide me.
I don't know the content of your text file, but File.ReadAllLines is the easiest way to read a text file into a string array representing all lines in the file. No trouble with having to know the length of the file.
If the lines contain the entries of your matrix, the next step would be looping through the lines and for each line use String.Split to get the individual elements.
When you've got that far, you have all information for creating a matrix of the required size. To fill its elements you're going to need Int32.Parse or Decimal.Parse to convert the elements as string into numbers.
However, hard to tell from your post what kind of matrix you'll need (probably a multi dimensional array). Search "[matrix] [c#]" here at stack overflow. And try "[math] [.net]" to find posts on math libraries for .net.

How to read a text file into a List in C#

I have a text file that has the following format:
1234
ABC123 1000 2000
The first integer value is a weight and the next line has three values, a product code, weight and cost, and this line can be repeated any number of times. There is a space in between each value.
I have been able to read in the text file, store the first value on the first line into a variable, and then the subsequent lines into an array and then into a list, using first readline.split('').
To me this seems an inefficient way of doing it, and I have been trying to find a way where I can read from the second line where the product codes, weights and costs are listed down into a list without the need of using an array. My list control contains an object where I am only storing the weight and cost, not the product code.
Does anyone know how to read in a text file, take in some values from the file straight into a list control?
Thanks
What you do is correct. There is no generalized way of doing it, since what you did is that you descirbed the algorithm for it, that has to be coded or parametrized somehow.
Since your text file isn't as structured as a CSV file, this kind of manual parsing is probably your best bet.
C# doesn't have a Scanner class like Java, so what you wan't doesn't exist in the BCL, though you could write your own.
The other answers are correct - there's no generalized solution for this.
If you've got a relatively small file, you can use File.ReadAllLines(), which will at least get rid of a lot cruft code, since it'll immediately convert it to a string array for you.
If you don't want to parse strings from the file and to reserve an additional memory for holding split strings you can use a binary format to store your information in the file. Then you can use the class BinaryReader with methods like ReadInt32(), ReadDouble() and others. It is more efficient than read by characters.
But one thing: binary format is bad readable by humans. It will be difficult to edit the file in the editor. But programmatically - without any problems.

Categories

Resources