Using C#, I will be handling character arrays of info, looking for the following pattern:
a pipe (0x7C), 2 to 7 pairs of characters, followed by another pipe (0x7C).
Stated another way:
|1122[33][44][55][66][77]|
The character pairs consist of characters whose range is from 33-124 decimal ( '!' to '|').
Pairs 3 through 7 are optional, but occur in order, if they occur, so you could have
|1122| <---shortest
|112233|
|11223344|
|1122334455|
|112233445566|
|11223344556677| <---longest
I want to 1) find out whether this pattern exists in the character array and 2) extract the individual pairs. These tasks can be separate. I think the best approach would be a RegEx, but so far I haven't been able to dream up an expression that gets the job done.
Is a RegEx the way to go, and what would the RegEx itself look like?
Is there a better way?
Chuck
If I understand your question correctly, the pattern would be:
\|([!-|]{2}){2,7}\|
Or to capture each set
\|([!-|]{2})([!-|]{2})([!-|]{2})?([!-|]{2})?([!-|]{2})?([!-|]{2})?([!-|]{2})?\|
Not sure if the range will work directly like that, so you may need to list the characters explicitly, like [A-Za-z!##$......], if the simplified range doesn't work.
Also, I think you don't want to include the pipe (|) in the range, as it could mess up the rest, so [!-{] might be better.
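If you go with the first (repeated-group) pattern, .NET lets you skip the seven optional groups. A repeated group records every repetition in Group.Captures, so one group yields all the pairs. The C# sketch below assumes the [!-{] class suggested above (pipe excluded). ExtractPairs is a hypothetical helper name, and it only looks at the first delimited run in the array.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pipe, then 2 to 7 pairs of characters from '!' to '{' (pipe excluded), then a closing pipe.
var pairPattern = new Regex(@"\|([!-{]{2}){2,7}\|");

// Hypothetical helper: returns the pairs of the first delimited run, or an empty array.
string[] ExtractPairs(char[] data)
{
    Match m = pairPattern.Match(new string(data));
    if (!m.Success) return Array.Empty<string>();

    // Group 1 matched once per pair; Captures keeps every one of those repetitions.
    return m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
}

char[] sample = "xx|1122334455|yy".ToCharArray();
Console.WriteLine(string.Join(" ", ExtractPairs(sample))); // 11 22 33 44 55
```

Because {2,7} is bounded, a run of eight pairs or a single pair is rejected rather than partially matched.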
Related
I'm trying to validate a pattern used for renaming.
The user will enter a value like:
%1% - %3%%2%
I'm able to match with a regex, everything is ok:
[^%]*(%[\d]+%)+[^%]*
But before that, I want to validate the string and be able to detect when the user made mistakes like:
%1% - %3%2%
%1% - %3%%2
...
Whatever I try, I can get the correct values, but I can't tell whether the string is well formatted or not, except by checking manually.
Is there any way to solve this with a regex? Or maybe I don't need a regex for this...
EDIT FOR CLARIFICATION
For a good example, take a program that renames your MP3 files.
You define a mapping between %1% and the track title, %2% for the artist, ...
Sorry, my mistake was to provide only one string. The user can submit:
%1% - %3%%2%
%1%_%2%%3%
%1%%3% %2%
...
Whatever he wants. Parsing the string when everything is correct seems fine to me, unless I find a tricky bad example.
But before I save it, I want to validate and refuse a string like
%1% - %3%%2
My problem was detecting the wrong value. What I've done, which doesn't seem clean to me, is to run my regex and then check two things: the total number of "%" characters in the string is even, and that total divided by 2 equals the number of groups found. But I'm not sure it always works (not sure my last sentence is clear).
I think this regex is what you're trying to accomplish.
(%[\d]%) - (%[\d]%)*
Regarding "I don't know if the string is well formatted or not":
This pattern adds a check for three consecutive % characters (%%%), which seems to catch a good number of bad-format scenarios. Then we make the pattern validate only good items by adding the $ anchor, so that only fully formed valid patterns match.
The valid token (%\d%) is what we seek:
^ # Start Anchor
(?!.+%%%) # Stop if 3 % anywhere.
%\d% # First \d
\s-\s # Dash and spaces
(%\d%)+ # Groups of numbers
$ # Stop Anchor
It works on the one example you gave, %1% - %3%%2%, and doesn't match the two failure examples you provided.
Because this pattern is commented, you need to use IgnorePatternWhitespace as a regex option. Otherwise, delete all the comments and join the pattern onto one line without spaces.
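The commented pattern can be passed to .NET as-is by keeping it in a verbatim string and setting RegexOptions.IgnorePatternWhitespace. That option makes the engine skip unescaped whitespace and treat # as the start of a comment. The sketch below runs it against the three strings from the question. Only the variable names are new.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's commented pattern; whitespace and '#' comments are ignored under IgnorePatternWhitespace.
const string Documented = @"
    ^            # Start anchor
    (?!.+%%%)    # Stop if 3 % anywhere
    %\d%         # First \d
    \s-\s        # Dash and spaces
    (%\d%)+      # Groups of numbers
    $            # Stop anchor";

var formatCheck = new Regex(Documented, RegexOptions.IgnorePatternWhitespace);

foreach (var candidate in new[] { "%1% - %3%%2%", "%1% - %3%2%", "%1% - %3%%2" })
    Console.WriteLine($"{candidate} -> {formatCheck.IsMatch(candidate)}");
// %1% - %3%%2% -> True
// %1% - %3%2% -> False
// %1% - %3%%2 -> False
```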
Using * (zero or more) can create some ungodly backtracking scenarios, which can actually make a good pattern fail. Is there really going to be zero items?
Your examples don't show any; if not, why not use + (one or more)?
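The pattern above is tied to the "%n% - %n%..." layout. The clarification, however, allows any literal text between tokens (%1%_%2%%3%, %1%%3% %2%, ...). A more general check, which is not from either post, is to anchor the question's own regex. Move [^%]* inside the repetition so literal text may sit between tokens, then require the whole string to match. That replaces the "count the % signs" workaround. The C# sketch below assumes tokens are ASCII digits; [0-9] is used because .NET's \d also accepts other Unicode digits. IsValidMapping is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Whole string = optional literal text, then one or more complete %<digits>% tokens,
// each optionally followed by literal text. ^ and \z force the entire input to fit.
bool IsValidMapping(string mapping) =>
    Regex.IsMatch(mapping, @"^[^%]*(?:%[0-9]+%[^%]*)+\z");

foreach (var candidate in new[] { "%1% - %3%%2%", "%1%_%2%%3%", "%1%%3% %2%", "%1% - %3%2%", "%1% - %3%%2" })
    Console.WriteLine($"{candidate}: {IsValidMapping(candidate)}");
// expected: True, True, True, False, False
```

Because [^%] and % can never match the same character, the engine has only one way to split the string, so there is no runaway backtracking.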
I'm using Nintex Workflows with a RegEx action. I believe the RegEx engine is based on .NET. I need to run a RegEx on data sent to me by users, who enter it in different formats depending on who is writing it.
Test: A-BC12 (1,2,3,4,5,6,7,8,9);
Test: A-DE34 (1,2,3,4, words, 5,6,7,8,9);
Test: AFG56 (1,2,3,4 word, 5);
STOP some extra
My goal is this.
Start the extract after Test:
Capture the last 4 of the alpha numeric before the parenthesis
Capture the numbers only inside the parenthesis
Split each data based on ;
End the whole capture when the word STOP is found.
End results
BC12 (1,2,3,4,5,6,7,8,9);
DE34 (1,2,3,4,5,6,7,8,9);
FG56 (1,2,3,4,5);
I have tried splitting the data, forward lookup and exclude and I can’t seem to get everything to work together. If I have to execute multiple RegEx to achieve my results I’m ok with that.
I’ve tried the following to achieve each one of my goals
(?s)(?<=^.*?Test:\s)[a-zA-Z0-9]+ only captures the first ABC12 or A-BC12 and then stops.
[,;] splits the data so it is easier to work with. However, the word Test: is still captured.
I feel I'm going in the right direction, however I'm missing something or taking the wrong approach. Any help would be greatly appreciated.
If you need to omit the first group you can use this regex: Test:\s*A[^;]*;(.*?)STOP.
That way, you can take $1 and split it on ;.
Edit: Your clarifications have made the solution above obsolete. Here is a new approach that directly addresses your steps:
a. Start the extract after Test:
b. Capture the last 4 of the alpha numeric before the parenthesis
c. Capture the numbers only inside the parenthesis
d. Split each data based on ;
e. End the whole capture when the word STOP is found.
You're actually looking for something like:
Use Test:\s*(.*?)STOP. This addresses steps a and e.
Take $1 and use [A-Z0-9]{4}\s*\(([^)]*)\);. This addresses steps b and d.
Take the $1 from the previous step, and use ([0-9]+) to get the numbers. This will get all the numbers, and if given: 9,10 it will produce two matches: 9 and 10.
You may need to use modifiers, like i for case-insensitive and s for single-line. .NET has no g (global) flag; Regex.Matches returns every match instead.
I hope this is finally what you're looking for!
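Outside Nintex, the three steps can be chained in plain C# to check that they produce the desired end results. This sketch assumes the .NET engine and the sample input from the question. It adds one capture group around [A-Z0-9]{4} so the code can rebuild the "BC12 (...)" lines. Extract is a hypothetical helper name, not a Nintex function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input copied from the question.
string input =
    "Test: A-BC12 (1,2,3,4,5,6,7,8,9);\n" +
    "Test: A-DE34 (1,2,3,4, words, 5,6,7,8,9);\n" +
    "Test: AFG56 (1,2,3,4 word, 5);\n" +
    "STOP some extra";

// Hypothetical helper that runs the three regex steps in sequence.
List<string> Extract(string text)
{
    var results = new List<string>();

    // Step 1 (a, e): everything after "Test:" up to "STOP"; Singleline lets '.' cross newlines.
    Match body = Regex.Match(text, @"Test:\s*(.*?)STOP", RegexOptions.Singleline);
    if (!body.Success) return results;

    // Step 2 (b, d): last four alphanumerics before "(" plus the bracketed list up to ";".
    var entries = Regex.Matches(body.Groups[1].Value, @"([A-Z0-9]{4})\s*\(([^)]*)\);", RegexOptions.IgnoreCase);
    foreach (Match entry in entries)
    {
        // Step 3 (c): keep only the numbers from inside the parentheses.
        var numbers = Regex.Matches(entry.Groups[2].Value, "[0-9]+").Cast<Match>().Select(n => n.Value);
        results.Add($"{entry.Groups[1].Value} ({string.Join(",", numbers)});");
    }
    return results;
}

foreach (var line in Extract(input))
    Console.WriteLine(line);
// BC12 (1,2,3,4,5,6,7,8,9);
// DE34 (1,2,3,4,5,6,7,8,9);
// FG56 (1,2,3,4,5);
```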
I need to check a string that contains a list of e-mails. These emails are usually separated by commas, but I need to check if somewhere in that list there is a delimiter other than a comma. Here's an example:
email1#email.com,email2#email.com,email3#email.com#email4#email.com
I need to identify that different character and replace it with a comma.
I can't just use a regex to find special characters other than the comma and replace them, because e-mail addresses may contain some of these characters. So I need to find whatever sits between two e-mail addresses.
I made the following regex to identify an e-mail and I believe it will cover most of the emails:
^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#[a-z0-9]+(\.[a-z0-9]+)+$
But I'm a little lost on how to use it to solve my problem in C#. I need to capture whatever is between two matches of this regex and replace it with a comma.
Could anyone help me?
Thank you.
Your problem is unsolvable in general, because the delimiter cannot always be determined, even by a human.
Consider this input where the delimiter is a .:
user#server.co.uk.user#otherServer.com
Is this:
user#server.co | uk.user#otherServer.com
or is it:
user#server.co.uk | user#otherServer.com
Or this input:
user#server.intuser#otherServer.com
Is it delimiter u:
user#server.int | ser#otherServer.com
Or delimiter t:
user#server.in | user#otherServer.com
If you're not willing to accept a certain percentage of failures, you're better off looking for ways not to receive this input to begin with.
([^#,]+#[^.]+\.\w{3}(?!,|$)).
Try this, replacing with $1, (see demo):
http://regex101.com/r/tF4jD3/15
P.S. This will only work for e-mail IDs of the form something#something.com, i.e. with a three-letter top-level domain.
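Here is the same replacement in C#, run on the question's sample list and on the "/" case mentioned later in the thread. As in the question, the sample strings use # between the user and domain parts. The pattern and replacement are the answer's own, and only the variable names are new. Like the answer, it assumes three-letter TLDs.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern: an address ending in a three-letter TLD that is NOT followed by
// a comma or the end of the string, plus the single stray delimiter character after it.
const string DelimiterFix = @"([^#,]+#[^.]+\.\w{3}(?!,|$)).";

string list = "email1#email.com,email2#email.com,email3#email.com#email4#email.com";
string fixedList = Regex.Replace(list, DelimiterFix, "$1,");

Console.WriteLine(fixedList);
// email1#email.com,email2#email.com,email3#email.com,email4#email.com
```

An address such as user#server.co.uk would not be handled, which matches the ambiguity described in the previous answer.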
I can't think of an elegant way to achieve this. If you don't mind an inelegant solution, you can replace any top level domain plus one character with the same TLD plus comma.
You'll end up replacing ".com#" with ".com,", ".eu*" with ".eu,", and so on. The replacement can be done with Regex, so the number of iterations equals the number of TLDs you want to handle.
One option you could try is to split the incoming string on the # symbol and check that each part of the resulting array contains a comma, except the first and last.
If you find one that is missing the comma, search that element for .com, .net or .org and insert a comma right after it.
Lastly, join the list back together with the # symbol.
Thanks for the replies.
The string must have only commas as the delimiter.
The example I mentioned was just an illustration. This list was generated by a jQuery plugin with a flaw that was noticed only after it had allowed values like "email1#email.comemail2#email.com", or other non-standard combinations, to be saved instead of "email1#email.com,email2#email.com".
My main concern is cases like "email1#email.com/email2#email.com"
I'm trying to automate a search for this kind of inconsistency, as prevention.
I thought about using regex but I really do not know if it is the best approach.
I'm now thinking that, since this is not a critical part of the system, it would be simpler just to use a list of invalid characters for the replacement.
But I will try the vks's solution.
Thank you all.
How can I extend an existing regex with a condition saying that the matched string can't exceed a maximum length of (let's say) 255 characters?
I've got the following regex:
([a-zA-Z0-9_\-\.]+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)
I've tried it like that, but failed:
{.,255([a-zA-Z0-9_\-\.]+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)}
Best way of doing this, if it has to be a solely regex based solution, would be to use lookarounds.
View this example: http://regex101.com/r/yM3vL0
What I am doing here is only matching strings that are at most three characters long. Granted, for my example, this is not the best way to do it. But ignore that, I'm just trying to show an example that will work for you.
You also have to anchor your pattern. Otherwise the lookaround is only checked at whatever position the match happens to start, so it no longer limits the length of the whole string (do I have to explain this in depth?)
In other words, you can use the following in your regular expression to limit it to at most 255 characters:
^(?!^.{256})([a-zA-Z0-9_\-\.]+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)
I also feel it is my duty to tell you your regular expression is bad and you should feel bad.
A regex is not really made to solve all problems. In this case, I'd suggest that testing a length of 255 is going to be expensive because that's going to require 255 states in the underlying representation. Instead, just test the length of the string separately.
But if you really must, you will need to make your characters optional with a bounded quantifier, so something like:
^.{0,255}$
will match any string of 255 or fewer characters. (Writing .?{255} instead is rejected by .NET as a nested quantifier.)
Why not just check the maximum length of the string as well? If you're using DataAnnotations, you can put [StringLength(255)] on the property.
If you're using ASP.NET validators, use a CustomValidator; a RangeValidator compares values, not string lengths.
If you're using a custom validation function it's much more readable (and faster) to check the length before you throw a complex regex against it.
You "may" be able to use a look-ahead as follows:
^(?=.{0,255}$)your regex here$
So...
^(?=^.{0,255}$)([a-zA-Z0-9_\-\.]+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
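The C# sketch below puts the two suggestions side by side: the anchored lookahead from this answer, and the "check the length separately" advice from the answers above. It uses the question's regex unchanged, including its # separator. The test strings are built to be exactly 255 and 256 characters long. IsAcceptable is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern, unchanged.
const string EmailCore =
    @"([a-zA-Z0-9_\-\.]+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)";

// Option A: length limit inside the regex via an anchored lookahead.
var limitedByLookahead = new Regex(@"^(?=.{0,255}$)" + EmailCore + "$");

// Option B: cheap length check first, regex second.
var anchoredEmail = new Regex("^" + EmailCore + "$");
bool IsAcceptable(string s) => s.Length <= 255 && anchoredEmail.IsMatch(s);

string atLimit = new string('a', 243) + "#example.com";   // 255 characters
string overLimit = new string('a', 244) + "#example.com"; // 256 characters

Console.WriteLine($"{limitedByLookahead.IsMatch(atLimit)} {IsAcceptable(atLimit)}");     // True True
Console.WriteLine($"{limitedByLookahead.IsMatch(overLimit)} {IsAcceptable(overLimit)}"); // False False
```

Option B keeps the regex readable and never runs it on over-long input; option A is useful when only a single pattern can be configured.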
I have code that searches a folder that contains SQL patch files. I want to add file names to an array if they match the following name format:
number-text-text.sql
What Regex would I use to match number-text-text.sql?
Would it be possible to make the Regex match file names where the number part of the file name is between two numbers? If so what would be the Regex syntax for this?
The following regex gets you halfway there:
\d+-[a-zA-Z]+-[a-zA-Z]+\.sql
Matching a specific range is trickier, as regex doesn't have a simple way to handle numeric ranges. To limit the match to a file name with a number between 2 and 13, use this regex (anchored, so that a name like 23-a-b.sql can't match on its trailing 3):
^([2-9]|1[0-3])-[a-zA-Z]+-[a-zA-Z]+\.sql$
Your regular expression should be:
(\d+)-[a-zA-Z]+-[a-zA-Z]+\.sql
You would then use the first captured group to check if your number is between the two numbers you desire. Don't try to check if a number is within a range with a regular expression; do it in two steps. Your code will be much clearer for it.
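In C#, that two-step approach might look like the sketch below. The file names and the PatchesInRange helper are hypothetical. In real code the names would probably come from Directory.GetFiles followed by Path.GetFileName. The bounds are treated as inclusive, and anchors are added so names like x12-a-b.sql don't match partially.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Capture the leading number; ^ and $ make the whole file name follow number-text-text.sql.
var patchName = new Regex(@"^([0-9]+)-[a-zA-Z]+-[a-zA-Z]+\.sql$");

// Hypothetical helper: keeps names whose number lies in [min, max], inclusive.
List<string> PatchesInRange(IEnumerable<string> fileNames, int min, int max)
{
    var result = new List<string>();
    foreach (var name in fileNames)
    {
        Match m = patchName.Match(name);
        // long.TryParse guards against digit runs too long for an int.
        if (m.Success && long.TryParse(m.Groups[1].Value, out long n) && n >= min && n <= max)
            result.Add(name);
    }
    return result;
}

var files = new[] { "1-create-tables.sql", "7-add-index.sql", "12-drop-column.sql", "readme.txt", "7-bad_name.sql" };
Console.WriteLine(string.Join(", ", PatchesInRange(files, 2, 10))); // 7-add-index.sql
```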
How about:
\d+-[^-]+-[^-]+\.sql
Edit: You want just letters, so here it is without specific ranges.
\d+-[a-z]+-[a-z]+\.sql - You'll also want case-insensitive matching; in C# that is RegexOptions.IgnoreCase (or an inline (?i) at the start of the pattern). Here it is in JS/Perl with the i flag:
/\d+-[a-z]+-[a-z]+\.sql/i
Ranges are more difficult. Here's an example of how to match 0-255:
([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])
So to match (0-255)-text-text.sql, you'd have this:
/^(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])-[a-z]+-[a-z]+\.sql/i
(I put the digits in a non-capturing group and matched from the beginning of the string to prevent partial matches on the number and in case you're expecting numbered groups or something).
Basically every time you need another digit of possibility, you'll need to add a new condition inside this case. The smaller the digit you'd like to match, the more cases you'll need as well. What is your desired min/max? AFAIK there's not a simple way to do this dynamically (although I'd love for someone to show me I'm wrong about that).
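If you do build such an alternation, it is worth checking it mechanically against ordinary integer comparisons. The C# sketch below tests the 0-255 alternation from this answer on its own, anchored at both ends, for every number from 0 to 999. It is a verification aid, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// The 0-255 alternation from the answer, anchored so the whole string must be the number.
var upTo255 = new Regex(@"^(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$");

// Count disagreements between the regex and a plain numeric test.
int mismatches = 0;
for (int i = 0; i <= 999; i++)
{
    bool regexSays = upTo255.IsMatch(i.ToString());
    bool mathSays = i <= 255;
    if (regexSays != mathSays) mismatches++;
}
Console.WriteLine($"mismatches: {mismatches}"); // mismatches: 0
```

Note that the alternation also rejects leading zeros such as 007, which a numeric parse would accept.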
The simplest way to get around this would be to simply capture the digits, and use native syntax to see if it's in your range. Example in js:
var match = filename.match(/(\d+)-[a-z]+-[a-z]+\.sql/i);
if(match && match[1] < maximumNumber && match[1] > minimumNumber){
doStuff();
}
This should work:
select '4-dfsg-asdfg.sql' ~ E'^[0-9]+-[a-zA-Z]+-[a-zA-Z]+\\.sql$'
This restricts the TEXT to simple ASCII characters. May or may not be what you want.
This is tested in PostgreSQL. Regular expression flavors differ a lot between implementations. You probably know that?
Anchors at begin ^ and end $ are optional, depending how you are going to do it.