I'm trying to write a parser that reads values from a string based on a pattern. For example:
I need to write a method that accepts "12:04:03" and a pattern such as "{hh}:{mm}:{ss}" and parses it to return the corresponding portions ("12", "04", "03").
I'm not trying to actually parse time, it's just a practical example. The pattern groups can be hardcoded.
What I think I could do:
Parse the string with a regex and then find the original content by looping through the string.
While this would work, I think there might be a more efficient way, or even a prebuilt solution in the Framework.
So, how can I solve this problem elegantly and efficiently?
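One way to avoid the second loop is to turn the pattern itself into a regular expression with named groups, so a single match returns every portion. This is a minimal sketch, not a Framework API: PatternParser and Parse are hypothetical names, and it assumes placeholders look like {name}, where the name starts with a letter or underscore.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternParser
{
    // Hypothetical helper: turns "{hh}:{mm}:{ss}" into ^(?<hh>.*?):(?<mm>.*?):(?<ss>.*?)$
    // and returns the captured text keyed by placeholder name.
    public static Dictionary<string, string> Parse(string input, string pattern)
    {
        // Escape first so literal text such as '.', '(' or '$' cannot act as regex syntax.
        // Regex.Escape turns "{hh}" into "\{hh}" (it escapes '{' but not '}').
        string escaped = Regex.Escape(pattern);

        // Replace each escaped placeholder with a lazy named group.
        string body = Regex.Replace(escaped, @"\\\{([A-Za-z_]\w*)}", "(?<${1}>.*?)");
        var regex = new Regex("^" + body + "$");

        Match match = regex.Match(input);
        if (!match.Success)
            throw new FormatException("Input does not match the pattern.");

        var result = new Dictionary<string, string>();
        foreach (string name in regex.GetGroupNames())
        {
            if (name != "0") // group "0" is the whole match
                result[name] = match.Groups[name].Value;
        }
        return result;
    }
}
```

For example, PatternParser.Parse("12:04:03", "{hh}:{mm}:{ss}") yields hh = "12", mm = "04" and ss = "03". The lazy groups plus the ^...$ anchors let the literal separators decide where each value ends; if the same pattern is parsed many times, caching the built Regex avoids rebuilding it on every call.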
I am reading in a header from a file which has time fields, for example Time (UTC +1). I then need to compare this with a list of stored headers to work out whether the file is valid. However, my stored headers are used for writing, so they allow flexibility on the time zone by being written like this: Time (UTC {0}).
I would like to know the best way of dealing with this, in as flexible a way as possible. The only way I can imagine is to get the position of the { and only compare up to that. That is fine in this case, but what if there are words after the parameter that matter more than a closing bracket?
EDIT: I would like to give some context so that I can explain better how flexible this needs to be. I may not have emphasised that I don't want it to work ONLY with the time field.
I am trying to write a very flexible system. I store a list of valid headings and then use them to find out which value to read from or write to the CSV file. It is flexible and easily maintainable, and I want to keep it neat. I want to write a function that takes a string containing one or more parameters and compares it with a value in which the parameters have been filled in (like the example with the Time header). In the future I may have a field for temperature in a particular place, so my stored heading would be Temperature in {0}({1}), and when reading it back it would be Temperature in Britain(c) or Temperature in America(f).
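You could use a regex like this one:
using System.Text.RegularExpressions;
string pattern = @"^Time \(UTC [+-]?\d+\)$";
Regex rgx = new Regex(pattern);
Regex.IsMatch tells you whether a string (here, the header read from the file) matches the pattern; Regex.Match returns the match itself if you also need the captured parts.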
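For the more general case in the EDIT (Temperature in {0}({1})), the stored heading itself can be turned into a regex instead of writing one by hand for each heading. A minimal sketch, assuming every {n} placeholder should match at least one character; HeaderMatcher, TemplateToRegex and FindTemplate are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderMatcher
{
    // Converts a stored heading such as "Temperature in {0}({1})" into an anchored regex
    // in which each {n} placeholder matches one or more characters.
    public static Regex TemplateToRegex(string template)
    {
        // Regex.Escape makes '(', ')' and spaces literal and turns "{0}" into "\{0}".
        string escaped = Regex.Escape(template);
        string body = Regex.Replace(escaped, @"\\\{\d+}", "(.+?)");
        return new Regex("^" + body + "$");
    }

    // Returns the first stored heading that the header read from the file matches, or null.
    public static string FindTemplate(string header, IEnumerable<string> templates)
    {
        foreach (string template in templates)
        {
            if (TemplateToRegex(template).IsMatch(header))
                return template;
        }
        return null;
    }
}
```

Because the whole template goes through Regex.Escape first, the text after a placeholder (words, brackets) takes part in the comparison, which addresses the concern about comparing only up to the {. The filled-in values are also available from Match.Groups if you need them.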
I want to do the following.
For example, there is a string [FULLNAME]-Awesome Guy-[END],
but there are multiple strings in a list, like:
[OTHER]-AG-[END]
[FULLNAME]-Awesome Guy-[END]
[NICKNAME]-AG-[END]
My question is: how can I find the entry with [FULLNAME] and then set a string to [FULLNAME]-Awesome Guy-[END]?
Can you guys help?
Thanks!
I'd probably recommend using a regular expression here if you just need something quick. If you need something more robust that can handle breaking up the various tags, you might want to write your own basic parser that splits the text by tag and lets you search that way.
This code:
using System;
using System.Text.RegularExpressions;
string s = "[OTHER]-AG-[END] [FULLNAME]-Awesome Guy-[END] [NICKNAME]-AG-[END]";
Regex re = new Regex(@"\[FULLNAME\][^[]+\[END\]");
Console.WriteLine(re.Match(s));
prints
[FULLNAME]-Awesome Guy-[END]
although it will fail to match if there is a [ character anywhere in the name.
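A sketch of both steps: picking the list entry that starts with the tag, and pulling the value out of it. Using a lazy .*? instead of [^[]+ lets the name contain [, although it would still stop early if the name itself contained -[END]. The TagFinder names are hypothetical, and the -...- delimiters are assumed from the example strings:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagFinder
{
    // Returns the first entry that starts with the given tag (e.g. "[FULLNAME]"), or null.
    public static string FindEntry(IEnumerable<string> entries, string tag)
    {
        foreach (string entry in entries)
        {
            if (entry.StartsWith(tag, StringComparison.Ordinal))
                return entry;
        }
        return null;
    }

    // Extracts the text between "[TAG]-" and "-[END]"; the lazy .*? allows '[' in the value.
    public static string ExtractValue(string text, string tagName)
    {
        var re = new Regex(@"\[" + Regex.Escape(tagName) + @"\]-(.*?)-\[END\]");
        Match m = re.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```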
I have a lot of text data with different structures. I need to extract parts of these texts based on some text-based rules. I would use regular expressions, but unfortunately the people using the application have never heard of them.
Basically, the app does the following:
Load the data into a textbox
Type the structure of the output as a simple set of rules into another textbox
Receive the results in a 3rd textbox
Examples of data structures (I have megabytes of this data):
Label1: value1, measurement
Label2; value2; something else
Nr, value3 (comment)
...
I need some other approach that I could use instead of regular expressions. It can be extremely simple because all I need is one value from every row.
From the example above I have to obtain the following structure:
"value1, value2, value3"
Is there a simpler alternative to regex? Has someone already implemented something like this?
I can also imagine that I am approaching the problem from the wrong angle, for example by forcing a non-technical user to write data extraction rules. In that case the question becomes more generic: "How can I build an application that lets a non-technical user extract data from separate texts?"
Edit:
I have implemented the following matching for them, kept as simple as possible:
File content:
"Strain at break Ax2";"Unknown"
"Strain at break Ax1";"Unknown"
"Strain at break";"Unknown"
"Yield point strain";"Unknown"
"Uniform elongation";25.4087;"%"
"Tensile strength";261.323;"MPa"
"End test phase Yield point";1;"%"
"Maximum tensile force";5.22647;"kN"
Pattern:
"Tensile strength";(?<value>[^;\n]*);
"Maximum tensile force";(?<value>[^;\n]*);
This is still too complex. The problem is that if I start replacing the ugly part with another string, to obtain for example:
"Tensile strength", [First value after]
I lose all the generic nature of the extraction, because every file looks different from this one.
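If a file really is delimited like the sample above, a rule such as "Tensile strength", [First value after] can be evaluated without any regex. A minimal sketch, assuming one record per line, ';' as the separator and optional double quotes around fields; RuleExtractor and FirstValueAfter are hypothetical names:

```csharp
using System;

public static class RuleExtractor
{
    // Hypothetical "first value after <label>" rule: finds the line whose field equals
    // the label and returns the next field on that line, with quotes removed.
    public static string FirstValueAfter(string text, string label, char separator = ';')
    {
        foreach (string line in text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] fields = line.Split(separator);
            for (int i = 0; i < fields.Length - 1; i++)
            {
                // Compare the unquoted field text with the label the user typed.
                if (fields[i].Trim().Trim('"') == label)
                    return fields[i + 1].Trim().Trim('"');
            }
        }
        return null; // label not found
    }
}
```

Other layouts (for example Label1: value1, measurement) would need a different separator or rule, which is the loss of generality described above; the sketch only shows what a "[First value after]" rule could look like for one file type.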
Take a look at the FileHelpers library. It allows runtime generation of file layouts and I think the one that would help in your example is the DelimitedClassBuilder.
In your case, I'd probably use FileHelpers to parse the record definitions into the DelimitedClassBuilder and then use the result to parse your records.
I solved the issue by defining the rules as regular expressions. After the rules were defined, I added a wrapper rule set that was easier for the users to read.
For example, to extract a value from the line
Maximum amount of Sheet Drawing Force= 35.659695[kN]
I defined the regular expression
{0}=\s*(?<value>[^[\n\r]*)
and then let the user define the name of the field. The {0} placeholder was then replaced with that name and the resulting regular expression applied.
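The substitution step could look like the sketch below. FieldRule and Extract are hypothetical names; '[' is escaped inside the character class (same meaning as the rule above), and passing the field name through Regex.Escape is an addition so that names containing characters such as ( or . are matched literally.

```csharp
using System.Text.RegularExpressions;

public static class FieldRule
{
    // The rule from the answer; {0} is replaced by the field name the user supplies.
    const string RuleTemplate = @"{0}=\s*(?<value>[^\[\n\r]*)";

    public static string Extract(string text, string fieldName)
    {
        // Escape the user's field name so it is matched as plain text.
        string pattern = RuleTemplate.Replace("{0}", Regex.Escape(fieldName));
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups["value"].Value.Trim() : null;
    }
}
```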
Besides doing it manually with a regular expression search, are there better ways to parse a JAD file?
I need to be able to search for and replace, or insert, a MIDlet-Install-Notify property in a given JAD file, and also update the value of the MIDlet-Jar-URL property.
Using ANTLR or TinyPG is a bit overkill for my case.
TIA
Even a regex might be overkill, although it will certainly get the job done. JAD is a very simple text format to parse: string.StartsWith() and string.IndexOf() (to find the colon) would work well.
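A minimal sketch of that approach for the edit described in the question, assuming one Name: value attribute per line; JadEditor and SetAttribute are hypothetical names. The attribute name is taken from the text before the first colon, so URL values that contain colons are unaffected:

```csharp
using System;
using System.Collections.Generic;

public static class JadEditor
{
    // Replaces the value of a JAD attribute ("Name: value" on one line), or appends the
    // attribute if it is missing. All other lines are left untouched.
    public static string SetAttribute(string jadText, string name, string value)
    {
        string eol = jadText.Contains("\r\n") ? "\r\n" : "\n";
        var lines = new List<string>(jadText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None));
        string newLine = name + ": " + value;

        for (int i = 0; i < lines.Count; i++)
        {
            // The name is everything before the first colon; later colons belong to the value.
            int colon = lines[i].IndexOf(':');
            if (colon > 0 && lines[i].Substring(0, colon).Trim() == name)
            {
                lines[i] = newLine;
                return string.Join(eol, lines);
            }
        }

        // Not found: insert after the last non-empty line so a trailing newline is preserved.
        int insertAt = lines.Count;
        while (insertAt > 0 && lines[insertAt - 1].Length == 0) insertAt--;
        lines.Insert(insertAt, newLine);
        return string.Join(eol, lines);
    }
}
```

Calling SetAttribute once for MIDlet-Jar-URL replaces the existing line, and calling it again for MIDlet-Install-Notify appends the new attribute if the file does not have one yet.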
Closed 13 years ago.
Possible Duplicate:
Parsing formatted string.
How can I take a String.Format format string and transform its output back into its inputs?
For example:
string formatString = "My name is {0}. I have {1} cow(s).";
string s = String.Format(formatString, "strager", 2);
// Call the magic method...
ICollection<string> parts = String.ReverseFormat(formatString, s);
// parts now contains "strager" and "2".
I know I can use regular expressions to do this, but I would like to use the same format string so I only need to maintain one line of code instead of two.
Here is some code from someone attempting a Scanf equivalent in C#:
http://www.codeproject.com/KB/recipes/csscanf.aspx
You'll have to implement it yourself, as there's nothing built in to do it for you.
To that end, I suggest you get the actual source code for the .NET String.Format implementation (the relevant code is actually in StringBuilder.AppendFormat()). It's freely available, and it uses a state machine to walk the string very efficiently. You can mimic that code to walk your formatted string as well and extract the data.
Note that it won't always be possible to go backwards. Sometimes the formatted string contains characters that match the format specifiers, making it difficult or impossible for the program to know what the original looked like. Thinking about it more, you might have better luck walking the original format string to turn it into a regular expression, and then using that to do the match.
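A minimal sketch of the walking approach, written from scratch rather than ported from StringBuilder.AppendFormat: tokenize the format string into literals and placeholders, then scan the formatted string with IndexOf. InvertFormat is a hypothetical method (the name proposed in this answer). It returns values in order of appearance and ignores placeholder indexes and format specifiers, and it carries the limitations described above: adjacent placeholders are rejected as ambiguous, and a value that itself contains the following literal text will be cut short.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormatInverter
{
    public static List<string> InvertFormat(string formatString, string formatted)
    {
        // Step 1: split the format string into literal tokens and placeholder tokens (null).
        var tokens = new List<string>();
        var literal = new StringBuilder();
        for (int i = 0; i < formatString.Length; i++)
        {
            char c = formatString[i];
            bool doubled = i + 1 < formatString.Length && formatString[i + 1] == c;
            if ((c == '{' || c == '}') && doubled)
            {
                literal.Append(c); // "{{" and "}}" are escaped braces
                i++;
            }
            else if (c == '{')
            {
                int close = formatString.IndexOf('}', i);
                if (close < 0) throw new FormatException("Unclosed placeholder.");
                if (literal.Length > 0) { tokens.Add(literal.ToString()); literal.Clear(); }
                tokens.Add(null); // index, alignment and format specifier are ignored
                i = close;
            }
            else
            {
                literal.Append(c);
            }
        }
        if (literal.Length > 0) tokens.Add(literal.ToString());

        // Step 2: walk the formatted string, checking literals and capturing placeholder text.
        var values = new List<string>();
        int pos = 0;
        for (int t = 0; t < tokens.Count; t++)
        {
            string token = tokens[t];
            if (token != null)
            {
                if (!formatted.Substring(pos).StartsWith(token, StringComparison.Ordinal))
                    throw new FormatException("Input does not match the format string.");
                pos += token.Length;
                continue;
            }
            bool isLast = t + 1 == tokens.Count;
            string next = isLast ? null : tokens[t + 1];
            if (!isLast && next == null)
                throw new FormatException("Adjacent placeholders are ambiguous.");
            // The value runs up to the first occurrence of the next literal (or to the end).
            int end = isLast ? formatted.Length : formatted.IndexOf(next, pos, StringComparison.Ordinal);
            if (end < 0) throw new FormatException("Input does not match the format string.");
            values.Add(formatted.Substring(pos, end - pos));
            pos = end;
        }
        if (pos != formatted.Length) throw new FormatException("Unexpected trailing text.");
        return values;
    }
}
```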
I'd also recommend renaming your method to InvertFormat(), because ReverseFormat sounds like you'd expect this output:
.)s(woc 2 evah .regarts si eman yM
I don't believe there's anything in-box to support this, but in C#, you can pass an array of objects directly to any method taking params-marked array parameters, such as String.Format(). Other than that, I don't believe there's some way for C# & the .NET Framework to know that string X was built from magic format string Y and undo the merge.
Therefore, the only thing I can think of is that you could format your code thusly:
object[] parts = {"strager", 2};
string s = String.Format(formatString, parts);
// Later on use parts, converting each member with .ToString()
foreach (object p in parts)
{
    Console.WriteLine(p.ToString());
}
Not ideal, and probably not quite what you're looking for, but I think it's the only way.