Text File Mapping - c#

I have text files that always arrive in the same format (I do not have an XSD for them).
I want to map the data from them to some class.
Is there a standard way to do this, other than writing string parsers or complicated regexes?
I really do not want to go with text parsers because several of us are working on this, and it would probably take each of us time to understand what the others are doing.
Example
Thanks.

If you have a special format, you will need your own parser.
If the format is a standard one like XML, YAML, JSON or CSV, a parsing library will almost always be available for your language.
UPDATE
From the sample you provided, the format looks like an INI file, but with custom entries. Maybe you could extend Nini.

Solution:
Change the format of that file to a standard format such as a tab-delimited or comma-separated (CSV) file.
Then use one of the many libraries out there to read the files, or import them into a database and use an ORM like Entity Framework to read them.

Assuming you cannot change the incoming file format to something more machine-readable, you will probably need to write your own custom parser. The best way to do it would be to create classes to represent and store all of the different kinds of data, using the appropriate data type for each field (custom enums, DateTime, Version, etc.).
Try to compartmentalize the code. For example, take these lines here:
272 298 9.663 18.665 -90.000 48 0 13 2 10 5 20009 1 2 1 257 "C4207" 0 0 1000 0 0
This could be a single class or struct. Its constructor could accept the above string as a parameter, and each value could be parsed into a separate member. That same class could have a Save() or ToString() method that converts all the values back to a string if needed.
Then the parent class would simply contain an array of the above structure, based on how many entries are in the file.
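A minimal sketch of such a class, assuming the fields are separated by spaces and that the quoted value contains no spaces. The question does not say what each column means, so `EntryRecord` and its property names are hypothetical placeholders, and only a few columns are given typed properties; the rest are kept as raw tokens.

```csharp
using System;
using System.Globalization;

// Hypothetical record type: the column meanings are not given in the
// question, so the property names below are placeholders.
public sealed class EntryRecord
{
    private readonly string[] _tokens;

    public int First { get; }      // column 0, assumed to be an integer
    public int Second { get; }     // column 1, assumed to be an integer
    public double Third { get; }   // column 2, assumed to be floating point
    public string Label { get; }   // column 16, the quoted value

    public EntryRecord(string line)
    {
        // Assumes space separators and no spaces inside the quotes.
        _tokens = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (_tokens.Length < 17)
            throw new FormatException("Expected at least 17 fields.");

        First = int.Parse(_tokens[0], CultureInfo.InvariantCulture);
        Second = int.Parse(_tokens[1], CultureInfo.InvariantCulture);
        Third = double.Parse(_tokens[2], CultureInfo.InvariantCulture);
        Label = _tokens[16].Trim('"');
    }

    // Rebuilds the line from the stored tokens, joined by single spaces.
    public override string ToString() => string.Join(" ", _tokens);
}
```

The parent class could then hold a `List<EntryRecord>` and create one record per matching line in the file.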

Related

Design better text file format for reading mixed type data of variable length

I am designing a text file format to be read in C#. I need to store values of type int, double and string on a single line. I'm planning to use a .CSV format so the file can be manually opened and read. A particular record may have, say, 8 known types, then a variable number of "indicator" combinations of either (string, int, double) or (string, int, double, double), and some lines may include no "indicators". Thus, each record may be of variable length.
In VB6 I would just input the data, split it into a variant array, determine the number of elements on that line in the array, and then use the VarType function to determine whether the final "indicator" variables are string, int, or double and parse each field accordingly.
There may be a better way to design a text file and that may be the best solution. If so I'm interested in hearing ideas. I have searched but found no questions that specifically talk about reading variable length lines of text with mixed type into C#.
If a better format is not forthcoming, is there a way to duplicate the VB6 VarType function within C# as described two paragraphs above? I can handle the text file reading and line splitting easily in C#.
You could use either JSON or XML, as they are well supported in .NET and have automatic serialization capabilities.
First, I agree with Keith's suggestion to use XML or JSON. You are reinventing the wheel here. This page has an introductory example of how to serialize objects to a file and some links to more info.
If you need to stick with your own file format and custom serialization/deserialization, however, take a look at the Convert class, as well as the various TryParse methods which hang off of the intrinsic value types like int and double.
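As a rough stand-in for VarType, the TryParse methods can be tried from the narrowest type to the widest. This is only a sketch: `FieldParser` and `ParseField` are hypothetical names, and it assumes the file uses invariant-culture number formatting (a `.` decimal separator).

```csharp
using System;
using System.Globalization;

public static class FieldParser
{
    // Returns the token as a boxed int, a boxed double, or the original
    // string, trying the narrowest type first.
    public static object ParseField(string token)
    {
        if (int.TryParse(token, NumberStyles.Integer, CultureInfo.InvariantCulture, out int i))
            return i;
        if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out double d))
            return d;
        return token;
    }
}
```

After splitting a line, each trailing "indicator" field could be passed through `ParseField` and then inspected with `value is int`, `value is double` or `value is string`.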

Compress a short but repeating string

I'm working on a web app that needs to take a list of files on a query string (specifically a GET and not a POST), something like:
http://site.com/app?things=/stuff/things/item123,/stuff/things/item456,/stuff/things/item789
I want to shorten that string:
http://site.com/app?things=somekindofencoding
The string isn't terribly long, varies from 20-150 chars. Something that short isn't really suitable for GZip, but it does have an awful lot of repetition so compression should be possible.
I don't want a DB or Dictionary of strings - the URL will be built by a different application to the one that consumes it. I want a reversible compression that shortens this URL. It doesn't need to be secure.
Is there an existing way to do this? I'm working in C#/.Net but would be happy to adapt an algorithm from some other language/stack.
If you can express the data in BNF, you could construct a parser for the data. Instead of sending the data you could send the AST, where each node would be identified by one character (or several if you have a lot of different nodes). In your example
we could have
files : file files
      |
file  : path id
path  : /items/thing
      | /files/item
      | /stuff/things/item
You could then represent a list of files as path[id1,id2,...,idn], using 0, 1, 2 for the paths. With the input being:
/stuff/things/item123,/stuff/things/item456,/stuff/things/item789
/files/item1,/files/item46,/files/item7
you'd then end up with ?things=2[123,456,789]1[1,46,7]
where /stuff/things/item is represented by 2 and /files/item is represented by 1. Each number within [...] is an id, so 2[123] would expand to /stuff/things/item123.
EDIT: The approach does not have to be static. If you have to discover the repeated items dynamically, you can use the same approach and pass the map between identifier and token. In that case the above example would be
?things=2[123,456,789]1[1,46,7]&tokens=2=/stuff/things/item,1=/files/item
which, if the grammar is this simple, would of course do better with
?things=/stuff/things/item[123,456,789]/files/item[1,46,7]
Compressing the repeated part to less than the unique value is possible with such a short string, but it will most likely have to be based on constraining the possible values; otherwise you risk actually increasing the size when "compressing".
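The static variant can be decoded with very little code. The sketch below is an illustration rather than a full parser: `PathListCodec` is a hypothetical name, the prefix table is assumed to be agreed on by both applications, and each key is assumed to be a single character.

```csharp
using System;
using System.Collections.Generic;

public static class PathListCodec
{
    // Assumed static table, shared by the producing and consuming apps.
    private static readonly Dictionary<char, string> Prefixes = new Dictionary<char, string>
    {
        ['1'] = "/files/item",
        ['2'] = "/stuff/things/item",
    };

    // Expands e.g. "2[123,456]1[7]" into a comma-separated path list.
    public static string Expand(string encoded)
    {
        var paths = new List<string>();
        int pos = 0;
        while (pos < encoded.Length)
        {
            // Throws KeyNotFoundException for a key that is not in the table.
            string prefix = Prefixes[encoded[pos]];
            int open = pos + 1;
            if (open >= encoded.Length || encoded[open] != '[')
                throw new FormatException("Expected '[' after key at position " + pos);
            int close = encoded.IndexOf(']', open);
            if (close < 0)
                throw new FormatException("Missing ']' for group at position " + pos);

            // Every id in the group gets the same prefix.
            foreach (string id in encoded.Substring(open + 1, close - open - 1).Split(','))
                paths.Add(prefix + id);
            pos = close + 1;
        }
        return string.Join(",", paths);
    }
}
```

The encoder is the mirror image: group consecutive paths by their longest known prefix and emit the key followed by the bracketed ids.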
You can try zlib using raw deflate (no zlib or gzip headers and trailers). It will generally provide some compression even on short strings that are composed of printable characters, and it does look for and take advantage of repeated strings. I haven't tried it, but you could also see if smaz works for your data.
I would recommend obtaining a large set of real-life example URLs to use for benchmark testing of possible compression approaches.
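In .NET, `System.IO.Compression.DeflateStream` produces raw deflate output, so the zlib suggestion can be tried without a third-party library. The sketch below is one possible round trip, not a tuned solution: `UrlCompressor` is a hypothetical name, and the URL-safe Base64 step is an added assumption needed to put binary output in a query string. Base64 grows the data by about a third, so for strings this short the result may not be smaller than the input; that is what the benchmark set would show.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class UrlCompressor
{
    public static string Compress(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // DeflateStream writes raw deflate data, without gzip or zlib framing.
            using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
            {
                deflate.Write(input, 0, input.Length);
            }
            // URL-safe Base64: drop '=' padding, swap '+' and '/'.
            return Convert.ToBase64String(output.ToArray())
                .TrimEnd('=').Replace('+', '-').Replace('/', '_');
        }
    }

    public static string Decompress(string encoded)
    {
        // Undo the URL-safe substitutions and restore the padding.
        string base64 = encoded.Replace('-', '+').Replace('_', '/');
        base64 = base64.PadRight(base64.Length + (4 - base64.Length % 4) % 4, '=');

        using (var input = new MemoryStream(Convert.FromBase64String(base64)))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(deflate, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```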

How to validate against a variable set of value ranges?

I'm currently working on an application which allows the user to read and edit a configuration for a device. The configuration is stored as XML.
The issue I'm faced with is how to define validation for the application. For example, most of the values I'm storing in the XML file have to be within different ranges, e.g. 0 - 2, 1 - 50, 10 characters or 20 characters, etc.
There are a lot of these constraints that I have to validate against, and I don't want to hard-code the ranges because when version 2 of the device comes out the configuration file will have different set of ranges. E.g. instead of 0 - 2, it will be 0 - 4 and instead of 20 characters, 40 is now allowed.
How should I approach this? Should I store the validation rules in separate XML files? Should I define a class with hard-coded configuration ranges for this device, and create a new class for the version 2 of the device with its configuration ranges?
It could be done inside an XML file, as a kind of declarative programming where you define the behaviour in XML. But that is not flexible, and you can easily end up in pretty complicated scenarios.
What I personally would prefer is to keep the logic in the code, but store the parameter ranges that the source XML data has to be checked against in a separate file, such as MatchData.xml.
Hope this helps.
I almost always prefer an external configuration file in cases like this. You could define an object that performs the validation (a validator). When the validator object is instantiated, it instantiates a validation rules object that contains all the ranges for the various validation items. I would serialize/deserialize this object using an XML file, and that file would be included in your app distribution.
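A minimal sketch of rules loaded from an external XML file, covering integer ranges only. The XML layout, the `RangeRule` and `RuleLoader` names, and the `mode` setting are all assumptions made for illustration; a string-length rule could be added in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public sealed class RangeRule
{
    public int Min { get; set; }
    public int Max { get; set; }

    public bool IsValid(int value) => value >= Min && value <= Max;
}

public static class RuleLoader
{
    // Parses rules from XML in this assumed layout:
    // <rules><rule name="mode" min="0" max="2" /></rules>
    public static Dictionary<string, RangeRule> Load(string xml)
    {
        return XDocument.Parse(xml).Root.Elements("rule").ToDictionary(
            e => (string)e.Attribute("name"),
            e => new RangeRule
            {
                Min = (int)e.Attribute("min"),
                Max = (int)e.Attribute("max"),
            });
    }
}
```

Version 2 of the device would then ship a different rules file (for example with `max="4"`), and the validation code itself would not change.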

How to read a text file into a List in C#

I have a text file that has the following format:
1234
ABC123 1000 2000
The first integer value is a weight and the next line has three values, a product code, weight and cost, and this line can be repeated any number of times. There is a space in between each value.
I have been able to read in the text file, store the value on the first line in a variable, and then read the subsequent lines into an array and then into a list, first using ReadLine().Split(' ').
To me this seems an inefficient way of doing it, and I have been trying to find a way where I can read from the second line where the product codes, weights and costs are listed down into a list without the need of using an array. My list control contains an object where I am only storing the weight and cost, not the product code.
Does anyone know how to read in a text file and take some values from the file straight into a list control?
Thanks
What you are doing is correct. There is no generalized way of doing it: what you described is the algorithm for it, and that has to be coded or parametrized somehow.
Since your text file isn't as structured as a CSV file, this kind of manual parsing is probably your best bet.
C# doesn't have a Scanner class like Java, so what you want doesn't exist in the BCL, though you could write your own.
The other answers are correct - there's no generalized solution for this.
If you've got a relatively small file, you can use File.ReadAllLines(), which will at least get rid of a lot of cruft code, since it immediately converts the file to a string array for you.
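With the lines already in a string array, the parsing is a short loop. A sketch, assuming the weight and cost are whole numbers as in the sample; the `Product` and `ProductFile` names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class Product
{
    public int Weight { get; set; }
    public int Cost { get; set; }
}

public static class ProductFile
{
    // lines[0] holds the first weight value; each later line is
    // "code weight cost", and the product code is ignored.
    public static List<Product> Parse(string[] lines, out int totalWeight)
    {
        totalWeight = int.Parse(lines[0], CultureInfo.InvariantCulture);
        var products = new List<Product>();
        for (int i = 1; i < lines.Length; i++)
        {
            string[] parts = lines[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length != 3) continue; // skip blank or malformed lines

            products.Add(new Product
            {
                Weight = int.Parse(parts[1], CultureInfo.InvariantCulture),
                Cost = int.Parse(parts[2], CultureInfo.InvariantCulture),
            });
        }
        return products;
    }
}
```

It would be called as `ProductFile.Parse(File.ReadAllLines(path), out int total)`, where `path` is wherever the file lives. Split still produces a small array per line, but that array is short-lived and the list only ever holds the two values that are needed.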
If you don't want to parse strings from the file and reserve additional memory for holding the split strings, you can store your information in the file in a binary format. Then you can use the BinaryReader class with methods like ReadInt32(), ReadDouble() and others. It is more efficient than reading character by character.
But one thing: a binary format is hard for humans to read, and it will be difficult to edit the file in an editor. Programmatically, though, there is no problem.
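A small sketch of the binary approach. The layout here (a count followed by that many 32-bit integers) is an assumption chosen for illustration, and `BinaryWeights` is a hypothetical name.

```csharp
using System.IO;
using System.Text;

public static class BinaryWeights
{
    // Assumed layout: an int32 count, then that many int32 values.
    public static void Write(Stream stream, int[] values)
    {
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(values.Length);
            foreach (int value in values)
                writer.Write(value);
        }
    }

    public static int[] Read(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            var values = new int[reader.ReadInt32()];
            for (int i = 0; i < values.Length; i++)
                values[i] = reader.ReadInt32();
            return values;
        }
    }
}
```

Both methods take a Stream, so they work with a FileStream from File.OpenRead or File.Create as well as with a MemoryStream.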

storing large data in string

I am trying to store large data (more than 255 characters) in a string data type, but it is truncated after 255 characters. How can I achieve this? Basically, I need to pass this data to a database.
C# strings do not have any particular character limit. However the database column you are writing to may have a limit. If you are storing large amounts of data, you should use a BLOB column instead of an ordinary varchar type.
StringBuilder class
As others said, the string class has no such limit, but you can use StringBuilder to build large strings. I feel it handles them better.
StringBuilder sb = new StringBuilder(); // requires using System.Text;
sb.Append("Some text...");
sb.Append("more text...");
sb.Append("even more text!");
string result = sb.ToString();
Okay, it sounds like you have several different technologies involved - Excel, XML, databases etc. Try to tackle just one at a time. First read the data out of Excel, and make sure you can do that without any truncation.
Write a small console app which will read the value, then write it to the console - and its length. If that works, you know the problem isn't in Excel.
Next you can write a small console app with hardcoded input data (so you don't need to keep using interop with Excel) and write the XML from that, or whatever your next stage is.
Basically, take the one big problem ("when I read data from Excel and write it to the database it truncates long values") and split it into smaller and smaller ones until you've found what's wrong.
The string type does not limit strings to 255 characters. Your database column must be limited to 255 characters.
I know that C# strings can hold much longer data than that. If the truncation occurs on committing to the DB, check the length constraint on your DB field.
The problem lies in the Excel part; .Character has a 255-character limit.
To read the complete text from a shape the following VBA syntax would do:
Worksheets("YourSheet").Shapes("Shape1").OLEFormat.Object.Text
