Parsing out Excel functions from Formula string - c#

I have a string which contains an Excel formula. How to parse out each particular function name from within the string?
I can't figure out how to write the regex for this. Basically it has to be the string of characters before a ( that isn't in a single or double quote.
For example:
=VLOOKUP($A9,'Summary'!$A$10:$C$30,3,FALSE) - Should return VLOOKUP
=IFERROR((C10/B10),"N/A") - should return IFERROR
='New Chart Data (Date)'!L70 - Should return nothing because there is no function
=IFERROR((C10/B10),Len(E30)) - should return IFERROR and LEN
='New Chart Data(Date)'!L70 + Len(5) - should return Len. This is the tricky one. Many patterns will also return Data, which is wrong.
Any ideas?
Thanks in advance.

You can use something like this I guess...
(?<=[=,])[A-Za-z2]+(?=\()
regex101 demo (with descriptions of regex)
Actually, there's one catch: a formula such as =IFERROR((C10/B10), Len(E30)) won't get Len. You can use this one instead and trim any spaces if any:
(?<=[=,])\s*[A-Za-z2]+(?=\()
Or since C# accepts variable length lookbehinds...
(?<=[=,]\s*)[A-Za-z2]+(?=\()
Which I think takes a bit more resources than the previous.
EDIT: I didn't think of the fact that sheetnames can take the form =Sheet(2) e.g. ='=Sheet(2)'!A1
(?<=[=,])\s*[A-Za-z2]+(?=\()(?![^']*'!)
revised regex101
EDIT2: Forgot operators as well... I guess I'll use a word boundary like Andy's, since the only issue is
\b[A-Za-z2]+(?=\()(?![^']*'!)
updated regex101
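For what it's worth, here is a minimal C# sketch of how the last pattern above could be applied to the formulas from the question. The class name FormulaFunctions and the method Extract are made up for illustration. The names come back exactly as written in the formula (Len, not LEN), so upper-case them yourself if you need that.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class FormulaFunctions
{
    // Letters (and the digit 2) directly before "(", unless the next
    // single quote after the name is immediately followed by "!",
    // which would mean we are inside a quoted sheet name.
    private static readonly Regex FunctionName =
        new Regex(@"\b[A-Za-z2]+(?=\()(?![^']*'!)");

    public static string[] Extract(string formula)
    {
        return FunctionName.Matches(formula)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

Calling FormulaFunctions.Extract("='New Chart Data(Date)'!L70 + Len(5)") should give just Len, and the plain sheet reference ='New Chart Data (Date)'!L70 should give an empty array.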

I think it could be simplified, using a word-break \b rather than a look-behind:
\b([A-Za-z2]+)(?=\()

Related

C# Regex replace round brackets and contents at end of string

Been struggling for an hour to get this working.
Have string of following format:
"blabla(arbitrarycontent)sfsf (arbytrarycontent)"
and also
"blabla (arbytrarycontent)"
I need to ditch the "(arbitrarycontent)", including the brackets, if it occurs at the end of the string.
So the first example the result should be "blabla(arbitrarycontent)sfsf".
For the second it should be "blabla".
Have tried all sorts of Regex patterns like below but unsuccessful.
\(.*\)$
Using .NET 4.0
Thx for any help
Simply forbid the part between the parentheses to contain parentheses. That makes sure that you only match the last pair:
\([^()]*\)$
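A rough C# sketch of that replacement, in case it helps. The class and method names are made up. One assumption on my part: I added \s* in front of the pattern so the space before the trailing bracket is dropped too, since the expected results in the question have no trailing space.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class TrailingParens
{
    public static string Strip(string input)
    {
        // \s*       optional whitespace before the bracket (my addition)
        // \([^()]*\) a bracket pair with no brackets inside it
        // $         only at the end of the string
        return Regex.Replace(input, @"\s*\([^()]*\)$", "");
    }
}
```

Brackets that are not at the end of the string are left alone, so "blabla(arbitrarycontent)sfsf" passes through unchanged.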

Correction in this simple regular expression

I am new to regular expressions, and the one I have written may be very simple, but I do not know where I went wrong.
@"^([a-zA-Z._]+)#([\d]+)"
This RE is for the following string:
somename#somenumber
Now I am trying to retrieve somename and somenumber. This is what I did:
ac.name = m.Groups[0].Value;
ac.number = m.Groups[1].Value;
Here ac.name reads the complete string, and ac.number reads somenumber. Where am I wrong in ac.name?
I guess the regex is correct. The problem is that you read ac.name from group 0, which is the whole match, instead of group 1. Try this:
ac.name = m.Groups[1].Value;
ac.number = m.Groups[2].Value;
This regex is correct; I think your mistake is somewhere else. You seem to be using C#, so look at how that language exposes regex matches.
Looking at the code sample on MSDN, Groups[0] holds the entire match and the capture groups start at index 1 (as Kent also suggested). So, use this:
String name = m.Groups[1].Value;
String number = m.Groups[2].Value;
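To make the indexing concrete, here is a small sketch. AccountParser and TryParse are names I made up, and the input "some.name#12345" is a made-up value that fits the pattern from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class AccountParser
{
    public static bool TryParse(string input, out string name, out string number)
    {
        Match m = Regex.Match(input, @"^([a-zA-Z._]+)#([\d]+)");
        // m.Groups[0] would be the entire matched text;
        // the two capture groups are at index 1 and 2.
        name = m.Success ? m.Groups[1].Value : null;
        number = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }
}
```

With the input "some.name#12345" this should set name to some.name and number to 12345, while Groups[0] would have held the full some.name#12345.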
use this regex (\w+)#(\d+([.,]\d+)?)
Groups[1] will contain the name
Groups[2] will contain the number
I think you should move the + into the capture group:
@"^([a-zA-Z._]+)#([\d]+)"
If this is C#, try without the ^
([a-zA-Z\._]+)#([\d]+)
I just tried it out and it groups properly
Update: escaped the .
If you want only one match (and hence the ^ in original expression), use .Match instead of .Matches method. See MSDN documentation on Regular Expression Classes.

Regex between, from the last to specific end

Today I want to extract text from a string.
The text must be between the last slash and .partX.rar or .rar
First I tried to find the end of the text and then its beginning. Once I had those two patterns I merged them, but I got empty results.
String:
http://hosting.xyz/1234/15-game.part4.rar.html
http://hosting.xyz/1234/16-game.rar.html
Regex:
Begin:(([^/]*)$) - start from last /
End:(.*(?=.part[0-9]+.rar|.rar)) stop before .partX.rar or .rar
As you can see, if I merge those patterns I don't get any result.
What is more, the "End" pattern gives me only .partX instead of .partX.rar
All what I want is:
15-game.part4.rar and 16-game.rar
What i tried:
(([^/]*)$)(.*(?=.part[0-9]+.rar|.rar))
(([^/]*)$)
(.*(?=.part[0-9]+.rar|.rar))
I tried also
/[a-zA-Z0-9]+
but I do not know how to select symbols. This could be the easiest way, but it selects only letters and numbers, not - or _.
If only I could select symbols...
You don't really need a regex for this as you can merely split the url on / and then grab the part of the file name that you need. Since you didn't mention a language, here's an implementation in Perl:
use strict;
use warnings;
my $str1="http://hosting.xyz/1234/15-game.part4.rar.html";
my $str2="http://hosting.xyz/1234/16-game.rar.html";
my $file1=(split(/\//,$str1))[-1]; #last element of the resulting array from splitting on slash
my $file2=(split(/\//,$str2))[-1];
foreach($file1,$file2)
{
s/\.html$//; #for each file name, if it ends in ".html", get rid of that ending.
print "$_\n";
}
The output is:
15-game.part4.rar
16-game.rar
Nothing could be simpler! :-)
Use this:
new Regex(@"^.*/(.*)\.html$")
You'll find your filename in the first captured group (don't have a c# compiler at hand, so can't give you working sample, but you have a working regex now! :-) )
See a demo here: http://rubular.com/r/UxFNtJenyF
I'm not a C# coder so can't write full code here but I think you'll need support of negative lookahead here like this:
new Regex(@"/(?!.*/)(.+?)\.html$");
Matched Group # 1 will have your string i.e. "16-game.rar" OR "15-game.part4.rar"
Use two regexes:
start to substitute .*/ with nothing;
then substitute \.html with nothing.
Job done!
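Since the asker turned out to want C#, here is a sketch of two of the suggestions above: the single capture group and the two substitutions. RarName and both method names are made up. I anchored the second substitution with $ so that only a trailing .html is removed, which is my assumption rather than something the answer stated.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class RarName
{
    // Greedy ".*/" runs up to the last slash; group 1 is what sits
    // between that slash and the trailing ".html".
    public static string FromCapture(string url)
    {
        Match m = Regex.Match(url, @"^.*/(.*)\.html$");
        return m.Success ? m.Groups[1].Value : null;
    }

    // First strip everything through the last slash, then the ".html" suffix.
    public static string FromTwoReplacements(string url)
    {
        string fileName = Regex.Replace(url, @".*/", "");
        return Regex.Replace(fileName, @"\.html$", "");
    }
}
```

For the two URLs in the question both methods should return 15-game.part4.rar and 16-game.rar.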

Splitting a string in C#

Let's say I have this string:
"param1,r:1234,p:myparameters=1,2,3"
...and I would like to split it into:
param1
r:1234
p:myparameters=1,2,3
I've used the split function and of course it splits it at every comma. Is there a way to do this using regex or will I have to write my own split function?
Personally, I would try something like this:
,(?=[^,]+:.*?)
Basically, use a positive look-ahead to find a comma that is followed by a "key-value" pair (defined here as a key, a colon, and more data, which may include other commas). This should disqualify the commas between the numbers, too.
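A quick sketch of how that could be used with Regex.Split. ParamSplitter is a made-up name. It only works under the assumption that every item after the first contains a colon before its first comma, so treat it as a starting point.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ParamSplitter
{
    public static string[] Split(string input)
    {
        // Split on a comma only when the text up to the next comma
        // (or the end of the string) contains a colon.
        return Regex.Split(input, @",(?=[^,]+:.*?)");
    }
}
```

For "param1,r:1234,p:myparameters=1,2,3" this should give three items, with the commas in 1,2,3 left alone because neither "2" nor "3" is followed by a colon.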
You can use ; for separating values which makes easy to work with it.
Since you have , for separation and also for values it is difficult to split it.
You have
string str = "param1,r:1234,p:myparameters=1,2,3";
Recommended to use
string str = "param1;r:1234;p:myparameters=1,2,3";
which can be split as
var strArray = str.Split(';');
strArray[0]; // contains param1
strArray[1]; // r:1234
strArray[2]; // p:myparameters=1,2,3
I'm not sure how you would write a split that knew which commas to split on there, honestly.
Unless it's a fixed number each time in which case, just use the String.Split overload that takes an int specifying how many substrings to return at max
If you're going to have comma-delimited data that's not always a fixed number of items and it could have literal commas in the data itself, they really should be quoted. If you can control the input in any way, you should encourage that, and use an actual CSV parser instead of String.Split
That depends. You can't parse it with regex (or anything else) unless you can identify a consistent rule separating one group from another. Based on your sample, I can't clearly identify such a rule (though I have some guesses). How does the system know that p:myparameters=1,2,3 is a single item? For example, if there were another item after it, what would be the difference between that and the 1,2,3? Figure that out and you'll be pretty close to a solution.
If you're able to change the format of the input string, why not decide on a consistent delimiter between your groups? ; would be a good choice. Use an input like param1;r:1234;p:myparameters=1,2,3 and there will be no ambiguity where the groups are, plus you can just split on ; and you won't need regex.
The simplest approach would be changing your delimiter from "," to something like "|". Then you can split on "|" no problem. However if you can't change the delimiting character then maybe you could encode the sections in a fashion similar to CSV.
CSV files have the same issue... the standard there is to put double quotes "" around columns.
For example, your string would be "param1","r:1234","p:myparameters=1,2,3".
Then you could use Microsoft.VisualBasic.FileIO.TextFieldParser to split/parse. You can use it from C# even though it's in the VisualBasic namespace.
TextFieldParser
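Roughly, parsing the quoted form with TextFieldParser could look like the sketch below. QuotedCsv and ParseLine are made-up names, and the project needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, not part of any library.
public static class QuotedCsv
{
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Commas inside double quotes are kept as part of the field.
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }
}
```

Given the quoted string from above, this should return the three fields with the quotes removed and 1,2,3 kept inside the last one.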
Do you mean this: string[] str = System.Text.RegularExpressions.Regex.Split("param1,r:1234,p:myparameters=1,2,3", @"\,");

Generate Public "ID Text" (like Stackoverflow)

I'm trying to create a simple method to turn a name (first name, last name, middle initial) into a public URL-friendly ID (like Stack Overflow does with question titles). People could enter all kinds of crazy characters, umlauts etc. Is there something in .NET I can use to normalize it to URL-acceptable/English characters, or do I need to write my own method to get this done?
Thank you!
Edit: An example (e.g. via RegEx or other way) would be super helpful!!! :)
Sounds like what you're after is a Slug Generator!
Simple method using UrlEncode
You obviously have to do something to deal with collisions (preventing them at user creation is sensible, but that means you are tied to this structure)
s => Regex.Replace(HttpUtility.UrlEncode(s), "%..", "")
This relies on UrlEncode always using exactly two characters after the % for each encoded byte, and assumes you are happy to have spaces converted to '+'
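If you would rather keep the base letter of an umlaut than drop it, a different technique from the UrlEncode one above is Unicode normalization: decompose the string, strip the combining marks, then replace everything that is not a letter or digit with a hyphen. This is only a sketch, and Slug.Create is a made-up name. Letters with no decomposition (ß, ø and similar) are not transliterated and end up as hyphens, and collisions still need handling separately.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class Slug
{
    public static string Create(string name)
    {
        // FormD splits e.g. "ü" into "u" plus a combining diaeresis.
        string decomposed = name.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder();
        foreach (char c in decomposed)
        {
            // Drop the combining marks, keep everything else.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        string lower = sb.ToString().ToLowerInvariant();
        // Collapse every run of other characters into one hyphen.
        string slug = Regex.Replace(lower, @"[^a-z0-9]+", "-");
        return slug.Trim('-');
    }
}
```

For example, Slug.Create("Jürgen Müller") should come out as jurgen-muller.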
A regular expression to validate the string with the characters and lengths you wish to allow.
Think you'll have to write your own method...
Safelist of characters...
A to Z
For Each c As char In SafeList
If safe ... etc.
Next
