How to parse a midlet JAD file in C#?

Besides doing it manually with a regular expression search, are there better ways to parse a JAD file?
Given a JAD file, I need to be able to find and replace (or insert) a MIDlet-Install-Notify property, and also update the value of the MIDlet-Jar-URL property.
Using ANTLR or TinyPG is a bit overkill for my case.
TIA

Even Regex might be overkill, although it certainly will get the job done. It is a very simple text format to parse, string.StartsWith() and string.IndexOf() to find the colon would work well.
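As a rough illustration of the StartsWith()/IndexOf() approach, here is a minimal sketch. The `JadEditor` class and `SetProperty` method are hypothetical names, and the code assumes one "Name: value" property per line and treats property names as case-insensitive; neither assumption comes from the question.

```csharp
using System;
using System.Collections.Generic;

public static class JadEditor
{
    // Sets "name: value" in the given JAD lines, replacing an existing
    // property with that name or appending a new one if it is absent.
    public static List<string> SetProperty(IEnumerable<string> lines, string name, string value)
    {
        var result = new List<string>();
        bool found = false;
        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            // Assumption: a property line looks like "Name: value".
            if (colon > 0 &&
                string.Equals(line.Substring(0, colon).Trim(), name, StringComparison.OrdinalIgnoreCase))
            {
                // Keep only one copy if the property happens to appear twice.
                if (!found) result.Add(name + ": " + value);
                found = true;
            }
            else
            {
                result.Add(line);
            }
        }
        if (!found) result.Add(name + ": " + value);
        return result;
    }
}
```

The lines could come from File.ReadAllLines() and be written back with File.WriteAllLines(); calling SetProperty once for MIDlet-Jar-URL and once for MIDlet-Install-Notify covers both changes.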

Related

Fastest way of removing unicode codes from a string

Hi I'm trying to figure out a way to remove the tags from the results returned from the Google Feed API. Specifically they are placing bold tags on titles and inside the description.
The codes that are being inserted are as follows:
\u003cb
\u003e
\u003c/b\u003e
Since it's a fixed set of codes, I tried doing a String.Replace() for each of them per string, but not surprisingly that resulted in bad performance. I'm not sure if RegEx would be better (or worse). Does anyone have an idea on how to remove these? Google does not supply an option to remove tags from the results.
You could remove the unicode codes using a regex like this one:
\\u[\d\w]{4}
var subject = @"\u003cb\u003e\u003c/b\u003e";
var result = Regex.Replace(subject, @"\\u[\d\w]{4}", String.Empty);
As for performance, this article seems to suggest that regex is much slower, but I would run your own tests with your own data, as the results might be wildly different. The regular expression itself plays a big part in performance, and I don't think the article states which regex was used, so it's impossible to compare. The size and type of your data also play a big part, so it's difficult to say which is better without understanding your data.
Also, you should try compiling the regex with the RegexOptions.Compiled flag to see if that boosts performance.
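A minimal sketch of the compiled-regex idea, narrowed to the two bold-tag sequences listed in the question rather than every \uXXXX code. `FeedCleaner` and `StripBoldTags` are hypothetical names, and the sketch assumes the escapes arrive as literal backslash sequences in the string; whether this beats chained String.Replace() calls on your data is something only a measurement can tell.

```csharp
using System.Text.RegularExpressions;

public static class FeedCleaner
{
    // Matches the literal text \u003cb\u003e and \u003c/b\u003e
    // (the escaped forms of <b> and </b>). Built once and reused.
    private static readonly Regex BoldTag = new Regex(
        @"\\u003c/?b\\u003e",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static string StripBoldTags(string input)
    {
        return BoldTag.Replace(input, string.Empty);
    }
}
```

Keeping the Regex in a static field means the compilation cost is paid once instead of on every call.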

C# , How to write RegEx.Replace to replace value for an xml element?

I have an XML string, and the goal is to replace the value of an XML element with a fixed string, e.g. for an element such as <abc>blah blah blah</abc>, replace the content with a fixed value. I am thinking of using RegEx.Replace instead of loading the string into a DOM model and replacing it there.
Could anyone please help with writing this regular expression? Essentially the goal is to match everything inside the element tag 'abc'.
Thanks a lot!
This article tells you what you need to know: XML is not Regular
Ignoring the most obvious solution to their problem (which would be to use a pre-existing XML parser), they think they should use regular expressions (regex for short). Now they have two problems.
Use regular expressions only on regular languages.
That said, there are many sites that purport to offer guidance on writing regular expressions for XML. They are all wrong. But they exist, and you can use them at your own risk.
For what it's worth, don't.
Process the XML normally, with an XmlDocument, LINQ to XML (System.Xml.Linq) or XmlReader/XmlWriter. That is what they are for; they cover all kinds of edge cases we couldn't even imagine and, above all, are proven to work.
Don't use a regex for this, please . . . just don't.
My two cents.
let the downvoting begin
Regular expressions are meant to be used on regular languages. XML is a non-regular language. As such, regular expressions cannot be used to properly parse anything written in it. You will need to use a real XML parser, which can be found in the numerous libraries available in C#, to do it.
Regular expressions are not suitable for processing markup. Among other flaws, they won't work if elements can be nested:
<abc> ... <abc> ... </abc> ... </abc>
They are also unable to distinguish a comment from a non-comment.
You need a real XML parser.
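For illustration, a minimal sketch of the parser-based route using LINQ to XML. `XmlValueReplacer` and `ReplaceElementValue` are hypothetical names, and the sketch assumes the input is well-formed XML and that the element is in no namespace.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlValueReplacer
{
    // Replaces the content of every element called elementName with newValue.
    public static string ReplaceElementValue(string xml, string elementName, string newValue)
    {
        XDocument doc = XDocument.Parse(xml);
        // ToList() takes a snapshot so the tree can be modified while iterating.
        foreach (XElement element in doc.Descendants(elementName).ToList())
        {
            element.Value = newValue;
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Nested elements, comments and CDATA sections are handled by the parser, which is exactly what a regex cannot guarantee.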

Regex to remove xml declaration from a string

First of all, I know this is a bad solution and I shouldn't be doing this.
Background: Feel free to skip
However, I need a quick fix for a live system. We currently have a data structure which serialises itself to a string by creating "xml" fragments via a series of string builders. Whether this is valid XML I rather doubt. After creating this xml, and before sending it over a message queue, some clean-up code searches the string for occurrences of the xml declaration and removes them.
The way this is done (iterating over every character and doing an indexOf for the <?xml) is so slow it's causing thread timeouts and killing our systems. Ultimately I'll be trying to fix this properly (build the xml using xml documents or something similar) but for today I need a quick fix to replace what's there.
Please bear in mind, I know this is a far from ideal solution, but I need a quick fix to get us back up and running.
Question
My thought is to use a regex to find the declarations. I was planning on: <\?xml.*?>, then using Regex.Replace(input, pattern, string.Empty) to remove them.
Could you let me know if there are any glaring problems with this regex, or whether just writing it in code using string.IndexOf("<?xml") and string.IndexOf("?>") pairs in a (much saner) loop is better.
EDIT
I need to take care of newlines.
Would: <\?xml[^>]*?> do the trick?
EDIT2
Thanks for the help. Regex-wise, <\?xml.*?\?> worked fine. I ended up writing some timing code and testing both a regex and IndexOf(). I found that, for our simplest use case, JUST the declaration stripping took:
Nearly a second as it was
.01 of a second with the regex
untimable using a loop and IndexOf()
So I went for IndexOf(), as it's an easy, very simple loop.
You probably want either this: <\?xml.*\?> or this: <\?xml.*?\?>, because the way you have it now, the regex is not looking for '?>' but just for '>'. I don't think you want the first option, because it's greedy and it will remove everything between the first occurrence of '<?xml' and the last occurrence of '?>'. The second option will work as long as you don't have nested XML tags. If you do, it will remove everything from the first '<?xml' to the first '?>', even if another '<?xml' tag appears in between.
Also, I don't know how regexes are implemented in .NET, but I seriously doubt if they're faster than using indexOf.
// Removes everything up to and including the first "?>"; assumes the string starts with a declaration.
strXML = strXML.Remove(0, strXML.IndexOf("?>", 0) + 2);
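The IndexOf() loop the asker settled on was not posted, so the following is only a sketch of what such a loop might look like. `XmlDeclarationStripper` and `Strip` are hypothetical names. Note that, like the regex, it matches anything starting with "<?xml", so a processing instruction such as "<?xml-stylesheet ...?>" would be removed too.

```csharp
using System;
using System.Text;

public static class XmlDeclarationStripper
{
    // Removes every "<?xml ... ?>" span in a single pass over the input.
    public static string Strip(string input)
    {
        const string open = "<?xml";
        const string close = "?>";
        var output = new StringBuilder(input.Length);
        int position = 0;
        while (true)
        {
            int start = input.IndexOf(open, position, StringComparison.Ordinal);
            if (start < 0) break;
            int end = input.IndexOf(close, start + open.Length, StringComparison.Ordinal);
            if (end < 0) break; // unterminated declaration: keep the rest unchanged
            // Copy the text before the declaration, then skip past it.
            output.Append(input, position, start - position);
            position = end + close.Length;
        }
        output.Append(input, position, input.Length - position);
        return output.ToString();
    }
}
```

Because the search is plain text, declarations that span newlines are handled without any special option.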

Regular expression in C# , is this possible?

I have never used regular expressions before and plan to use them to solve my problem, but I'm not quite sure whether they can help me.
I have a situation where I need to store a rule or formula for building string values, like the following examples, in a database field, then retrieve this rule and build the string value.
FacilityCode + Left(ModelNO,2)
Right(PO,3) + Left(Serial,2)
Is this achievable using .NET regular expressions? Are there any good tutorials or simple examples for this problem?
Regexp : http://msdn.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
But it doesn't seem fitting :)
It might be better to code some random string generator. Regex is for searching data not creating data.
The thing to remember about regex is that it is like an aircraft carrier; it does one thing very very well, it does not do other jobs very well at all.
An aircraft carrier moves planes very well on the ocean; it does not make a cheese sandwich well AT ALL!!
That is to say, if you use regex when you shouldn't you will almost certainly use far more processing power than if you used another tool for that job. Html parsing comes to mind.
Regex is provided as part of System.Text.RegularExpressions, but you can't rely exclusively on it. It'll let you search existing strings, but you'll need to implement your own logic for building new strings based on what you find in the existing data.
Also, keep in mind that System.Text.RegularExpressions works differently from regexp in Perl and other implementations. For example, it doesn't recognize POSIX character class definitions.
Since you're new to regex, you might want to check out the "Regular Expressions User Guide" on zytrax.com. It's not as comprehensive as an O'Reilly manual, but it'll do as a start.
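To make the "regex to search, your own logic to build" split concrete, here is a small sketch that evaluates rules shaped like the two examples in the question. Everything in it is an assumption for illustration: the `RuleEvaluator` name, the grammar (terms joined by "+", each term either a field name or Left/Right(field, n)), and passing the field values in a dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class RuleEvaluator
{
    // One term: either Left(Field,n) / Right(Field,n) or a bare field name.
    private static readonly Regex Term = new Regex(
        @"^(?:(?<func>Left|Right)\(\s*(?<field>\w+)\s*,\s*(?<count>\d+)\s*\)|(?<bare>\w+))$",
        RegexOptions.IgnoreCase);

    public static string Evaluate(string rule, IDictionary<string, string> fields)
    {
        var result = new StringBuilder();
        foreach (string part in rule.Split('+'))
        {
            // The regex only recognizes the term; the string is built by hand.
            Match m = Term.Match(part.Trim());
            if (!m.Success) throw new FormatException("Unrecognized term: " + part);
            if (m.Groups["bare"].Success)
            {
                result.Append(fields[m.Groups["bare"].Value]);
                continue;
            }
            string value = fields[m.Groups["field"].Value];
            // Clamp the count so short values do not throw.
            int count = Math.Min(int.Parse(m.Groups["count"].Value), value.Length);
            bool left = m.Groups["func"].Value.Equals("Left", StringComparison.OrdinalIgnoreCase);
            result.Append(left ? value.Substring(0, count) : value.Substring(value.Length - count));
        }
        return result.ToString();
    }
}
```

Anything beyond this tiny grammar (literals, nesting, other functions) would call for a proper expression parser rather than a bigger regex.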

Wikilinks - turn the text [[a]] into an internal link

I need to implement something similar to wikilinks on my site. The user is entering plain text and will enter [[asdf]] wherever there is an internal link. Only the first five examples are really applicable in the implementation I need.
Would you use regex, what expression would do this? Is there a library out there somewhere that already does this in C#?
On the pure regexp side, the expression would rather be:
\[\[([^\]\|\r\n]+?)\|([^\]\|\r\n]+?)\]\]([^\] ]\S*)
\[\[([^\]\|\r\n]+?)\]\]([^\] ]\S*)
By replacing the (.+?) suggested by David with ([^\]\|\r\n]+?), you ensure that only legitimate wiki link text is captured, without closing square brackets or newline characters.
([^\] ]\S+) at the end ensures the wiki link expression is not followed by a closing square bracket either.
I am not sure whether there are C# libraries that already implement this kind of detection.
However, to make that kind of detection really foolproof with regexp, you should use the pushdown-automaton feature (balancing groups) of the .NET regex engine, as illustrated here.
I don't know if there are existing libraries to do this, but if it were me I'd probably just use regexes:
match \[\[(.+?)\|(.+?)\]\](\S+) and replace with <a href="\2">\1</a>\3
match \[\[(.+?)\]\](\S+) and replace with <a href="\1">\1</a>\2
Or something like that, anyway.
Although this is an old question and already answered, I thought I'd add this as an addendum for anyone else coming along. The existing two answers do all the real work and got me 90% there, but here is the last bit for anyone looking for code to get straight on with trying:
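string html = "Some text with a wiki style [[page2.html|link]]";
// Note: the trailing group ([^\] ]\S*) requires a character other than ']' or a space right after the closing "]]".
html = Regex.Replace(html, @"\[\[([^\]\|\r\n]+?)\|([^\]\|\r\n]+?)\]\]([^\] ]\S*)", @"<a href=""$1"">$2</a>$3");
html = Regex.Replace(html, @"\[\[([^\]\|\r\n]+?)\]\]([^\] ]\S*)", @"<a href=""$1"">$1</a>$2");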
The only change to the actual regex is I think the original answer had the replacement parts the wrong way around, so the href was set to the display text and the link was shown on the page. I've therefore swapped them.
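As a variation on the above, both link forms can be handled in one pass with a match evaluator. This is only a sketch: `WikiLinks` and `ToHtml` are hypothetical names, it assumes the target comes before the pipe (as in the "[[page2.html|link]]" sample) and that the link is rendered as an HTML anchor, and it drops the trailing-character group so a link at the very end of the text also matches.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class WikiLinks
{
    // [[target]] or [[target|label]]; no ']' , '|' or newlines inside either part.
    private static readonly Regex Link = new Regex(
        @"\[\[([^\]\|\r\n]+?)(?:\|([^\]\|\r\n]+?))?\]\]");

    public static string ToHtml(string text)
    {
        return Link.Replace(text, m =>
        {
            string target = m.Groups[1].Value;
            // Without a label, the target doubles as the display text.
            string label = m.Groups[2].Success ? m.Groups[2].Value : target;
            return "<a href=\"" + WebUtility.HtmlEncode(target) + "\">"
                 + WebUtility.HtmlEncode(label) + "</a>";
        });
    }
}
```

Only the link parts are HTML-encoded here; the surrounding user text would still need encoding before being written to a page.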
