Regex between, from the last to specific end - c#

I want to extract part of a string: the text between the last slash and the end of .partX.rar or .rar.
First I tried to match the end of the wanted text and then its beginning. When I merged the two patterns, I got empty results.
Strings:
http://hosting.xyz/1234/15-game.part4.rar.html
http://hosting.xyz/1234/16-game.rar.html
Regex:
Begin: (([^/]*)$) - start from the last /
End: (.*(?=.part[0-9]+.rar|.rar)) - stop before .partX.rar or .rar
As you can see, if I merge these patterns I don't get any result.
What is more, the "End" pattern gives me only .partX instead of .partX.rar.
All I want is:
15-game.part4.rar and 16-game.rar
What I tried:
(([^/]*)$)(.*(?=.part[0-9]+.rar|.rar))
(([^/]*)$)
(.*(?=.part[0-9]+.rar|.rar))
I also tried
/[a-zA-Z0-9]+
but this matches only letters and digits, not symbols such as - or _, and I do not know how to include those.
If I could match the symbols too, this might be the easiest way.

You don't really need a regex for this as you can merely split the url on / and then grab the part of the file name that you need. Since you didn't mention a language, here's an implementation in Perl:
use strict;
use warnings;
my $str1="http://hosting.xyz/1234/15-game.part4.rar.html";
my $str2="http://hosting.xyz/1234/16-game.rar.html";
my $file1=(split(/\//,$str1))[-1]; #last element of the resulting array from splitting on slash
my $file2=(split(/\//,$str2))[-1];
foreach ($file1, $file2)
{
    s/\.html$//; # for each file name, if it ends in ".html", get rid of that ending
    print "$_\n";
}
The output is:
15-game.part4.rar
16-game.rar

Nothing could be simpler! :-)
Use this:
new Regex(@"^.*/(.*)\.html$")
You'll find your file name in the first captured group. (I don't have a C# compiler at hand, so I can't give you a working sample, but you have a working regex now! :-) )
See a demo here: http://rubular.com/r/UxFNtJenyF
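Since the answer above comes without a compiled sample, here is a minimal C# sketch of how that pattern could be used. The class and method names (FileNameExtractor, Extract) are made up for illustration, and returning null on a failed match is just one possible choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameExtractor
{
    // Greedy ".*/" runs up to the LAST slash; group 1 is whatever sits
    // between that slash and the trailing ".html".
    private static readonly Regex Pattern = new Regex(@"^.*/(.*)\.html$");

    // Hypothetical helper: returns the file name, or null if the URL does not match.
    public static string Extract(string url)
    {
        Match m = Pattern.Match(url);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("http://hosting.xyz/1234/15-game.part4.rar.html")); // 15-game.part4.rar
        Console.WriteLine(Extract("http://hosting.xyz/1234/16-game.rar.html"));       // 16-game.rar
    }
}
```

Note the verbatim string (@"..."): in a regular C# string literal, \. is not a valid escape sequence and the code would not compile.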

I'm not a C# coder, so I can't write the full code here, but I think you'll need a negative lookahead, like this:
new Regex(@"/(?!.*/)(.+?)\.html$");
Matched group #1 will hold your string, i.e. "16-game.rar" or "15-game.part4.rar".

Use two regexes:
first, substitute .*/ with nothing;
then substitute \.html with nothing.
Job done!
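In C#, the two substitutions could look roughly like the sketch below. The anchors (^ and $) are my own addition so that only the leading path and a trailing .html are removed, and the names TwoStepStrip and Strip are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoStepStrip
{
    public static string Strip(string url)
    {
        // Step 1: greedy ".*/" removes everything up to and including the last slash.
        string name = Regex.Replace(url, @"^.*/", "");
        // Step 2: drop ".html", anchored so that only the ending is removed.
        return Regex.Replace(name, @"\.html$", "");
    }

    public static void Main()
    {
        Console.WriteLine(Strip("http://hosting.xyz/1234/15-game.part4.rar.html")); // 15-game.part4.rar
        Console.WriteLine(Strip("http://hosting.xyz/1234/16-game.rar.html"));       // 16-game.rar
    }
}
```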

Related

Ignore nested single quotes inside of single quotes

I'm working with matching an entire string within single quotes. The problem is, these strings are dynamically generated and I need to ignore all other single quotes within the first set of quotes. I've come across other solutions that are similar but I can't seem to tweak them to my needs.
Here is what I've worked with so far:
'(?:''|[^'])*'
I would like to match essentially everything within the first and last single quotes between content: and ;
Some example text:
#bottom {
content: 'Here we have an embedded unescaped 'single' that is generated at runtime. {Let's ignore it
please'
;
}
This is the playground I've been working in:
https://regex101.com/r/ITHciu/2
Any help would be greatly appreciated.
If you absolutely have to use regexes for this and you are certain that ; will not appear inside the string you are searching for, you could try this: '[^;]*'\s*;$. It selects everything from a ' up to a ' that is followed by optional whitespace and a ; at the end of a line. Note that $ only matches at the end of each line when multiline mode is enabled.
Edit: if you need the text between the ' and the ';, you could use a group: '([^;]*)'\s*;$.
However, a much cleaner solution would be to write a little parser that reads the string character by character. It's a fun exercise if you have a bit more time.
If nothing else, you could use that regex to correct the invalid syntax in your files. And tell the people manually writing them what the valid syntax should be.
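For what it's worth, here is a small C# sketch of the grouped pattern applied to the example from the question. It assumes the input uses plain \n line endings (in .NET, $ in multiline mode matches before \n, not before \r) and that no ; occurs inside the quoted text. QuotedContent and Extract are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedContent
{
    // Multiline makes "$" match at the end of each line, not only at the end of the input.
    private static readonly Regex Pattern =
        new Regex(@"'([^;]*)'\s*;$", RegexOptions.Multiline);

    // Returns the text between the first ' and the last ' before the ';', or null.
    public static string Extract(string css)
    {
        Match m = Pattern.Match(css);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string css =
            "#bottom {\n" +
            "content: 'Here we have an embedded unescaped 'single' that is generated at runtime. {Let's ignore it\n" +
            "please'\n" +
            ";\n" +
            "}";
        // Prints the two lines of text, inner quotes included, without the outer quotes.
        Console.WriteLine(Extract(css));
    }
}
```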

Regular Expression for Digits and Special Characters - C#

I use Html-Agility-Pack to extract information from some websites. In the process I get data in the form of string and I use that data in my program.
Sometimes the data I get includes multiple details in the single string. As the name of this Movie "Dog Eats Dog (2012) (2012)". The name should have been "Dog Eats Dog (2012)" rather than the first one.
Above is the one example from many. In order to correct the issue I tried to use string.Distinct() method but it would remove all the duplicate characters in the string as in above example it would return "Dog Eats (2012)". Now it solved my initial problem by removing the 2nd (2012) but created a new one by changing the actual title.
I thought my problem could be solved with Regex but I have no idea as to how I can use it here. As far as I know if I use Regex it would tell me that there are duplicate items in the string according to the defined Regex code.
But how do I remove it? There can be a string like "Meme 2013 (2013) (2013)".
Now the actual title is "Meme 2013" with the year (2013) and the duplicate year (2013). Even if I get a bool value indicating that the string has a duplicate year, I can't think of any method to actually remove the duplicate substring.
The duplicate year always comes in the end of the string. So what should be the Regex that I would use to determine that the string actually has two years in it, like (2012) (2012)?
If I can correctly identify the string contains duplicate maybe I can use string.LastIndexOf() to try and remove the duplicate part. If there is any better way to do it please let me know.
Thanks.
The right regex is "( \(\d{4}\))\1+".
string pattern = @"( \(\d{4}\))\1+";
s = new Regex(pattern).Replace(s, "$1");
Example here : https://repl.it/Evcy/2
Explanation:
Capture one " (dddd)" block, and remove all following identical ones.
( \(\d{4}\)) does the capture, \1+ finds any non empty sequence of that captured block
Finally, replace the initial block and its copies by the initial block alone.
This regex allows any amount of whitespace between the copies, even none, as in (2013)(2013):
@"(\(\d{4}\))(?:\s*\1)+"
I have a demo of it here
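A self-contained sketch of the whitespace-tolerant variant, using the titles from the question. DuplicateYear and Collapse are hypothetical names. Keep in mind that Replace returns a new string, so the result has to be assigned or returned.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DuplicateYear
{
    // Group 1 captures one "(dddd)" block; "(?:\s*\1)+" matches one or more
    // identical copies that follow it, with optional whitespace in between.
    private static readonly Regex Pattern = new Regex(@"(\(\d{4}\))(?:\s*\1)+");

    public static string Collapse(string title)
    {
        // Replace the block and all its copies with the block alone.
        return Pattern.Replace(title, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Collapse("Dog Eats Dog (2012) (2012)")); // Dog Eats Dog (2012)
        Console.WriteLine(Collapse("Meme 2013 (2013) (2013)"));    // Meme 2013 (2013)
        Console.WriteLine(Collapse("Meme 2013 (2013)"));           // Meme 2013 (2013)
    }
}
```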

regex to replace last two digits of an assemblyversion

I'm working with TeamCity and a C# project, and I want to use the file content patcher to replace the last two components of an AssemblyVersion (e.g. the two stars in [assembly: AssemblyVersion("1.0.*.*")]). I've found the docs on the file content patcher, and they suggest using
(^\s*\[\s*assembly\s*:\s*((System\s*\.)?\s*Reflection\s*\.)?\s*AssemblyVersion(Attribute)?\s*\(\s*@?\")(([0-9\*]+\.)+)[0-9\*]+(\"\s*\)\s*\]) if you just want to change the LAST component, which got me partway there.
I figured that if I used (^\s*\[\s*assembly\s*:\s*((System\s*\.)?\s*Reflection\s*\.)?\s*AssemblyVersion(Attribute)?\s*\(\s*@?\")(([0-9\*]+(\.))+)[0-9\*]+(\"\s*\)\s*\]) it would capture the last period as its own group, letting me replace the two stars without a problem. However, it looks like the first star is still captured in the group with the 1.0 (so the group becomes 1.0.*.).
What I want is to restrict the first group to capturing the {major}.{minor}. and then have the last period be its own group so I could do something like: $1$5\%build.number%$7%build.vcs.number%$8 which would give me AssemblyVersion("1.0.{build#}.{vcs#}")]
Generally I can stumble through regex without many problems, but I've been working on this for the last few hours and I can't seem to get it right. Any information on reaching this result would be appreciated.
If you want to keep to the solution you found to replace while also validating, you may use
(^\s*\[\s*assembly\s*:\s*((System\s*\.)?\s*Reflection\s*\.)?\s*AssemblyVersion(Attribute)?\s*\(\s*@?\")([0-9\*]+\.[0-9\*]+)\.([0-9\*]+\.[0-9\*]+)(\"\s*\)\s*\])
and replace with $1$5.%build.number%.%build.vcs.number%$7.
See the regex demo
I just unrolled the ([0-9\*]+(\.))+ into ([0-9\*]+\.[0-9\*]+)\.([0-9\*]+\.[0-9\*]+): two ([0-9\*]+\.[0-9\*]+) groups separated by a literal dot (\.). I also had to remove the [0-9\*]+ that followed the ([0-9\*]+(\.))+ pattern.
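To check the group numbers outside TeamCity, the unrolled pattern can be tried with .NET's Regex.Replace, as in the sketch below. The build and VCS numbers passed in are made-up placeholders for %build.number% and %build.vcs.number%, and the class and method names are hypothetical. I'm assuming here that the file content patcher numbers groups the same way .NET does (by opening parenthesis, left to right).

```csharp
using System;
using System.Text.RegularExpressions;

public static class AssemblyVersionPatch
{
    // Group 1 = prefix up to the opening quote, group 5 = {major}.{minor},
    // group 6 = the last two components, group 7 = closing quote, ")" and "]".
    private const string Pattern =
        @"(^\s*\[\s*assembly\s*:\s*((System\s*\.)?\s*Reflection\s*\.)?\s*AssemblyVersion(Attribute)?\s*\(\s*@?\"")" +
        @"([0-9\*]+\.[0-9\*]+)\.([0-9\*]+\.[0-9\*]+)(\""\s*\)\s*\])";

    public static string Patch(string line, string build, string vcs)
    {
        // ${n} keeps group references unambiguous next to the inserted digits.
        return Regex.Replace(line, Pattern, "${1}${5}." + build + "." + vcs + "${7}");
    }

    public static void Main()
    {
        string line = "[assembly: AssemblyVersion(\"1.0.*.*\")]";
        Console.WriteLine(Patch(line, "42", "1234")); // [assembly: AssemblyVersion("1.0.42.1234")]
    }
}
```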
I would first extract the version string (1.0.*.* in your case) and then, once it holds only numbers, use Version.Parse, which does not accept the * placeholder.
A much smaller regex (and it can be shortened further):
string input = @"[assembly:AssemblyVersion(""1.2.3.4"")]";
var verStr = Regex.Match(input, @"\[.+?\(\""(.+?)\""\)\]").Groups[1].Value;
var version = Version.Parse(verStr);

C# Regex replace round brackets and contents at end of string

Been struggling for an hour to get this working.
Have string of following format:
"blabla(arbitrarycontent)sfsf (arbytrarycontent)"
and also
"blabla (arbytrarycontent)"
I need to ditch the "(arbitrarycontent)", including the brackets, if it occurs at the end of the string.
So the first example the result should be "blabla(arbitrarycontent)sfsf".
For the second it should be "blabla".
Have tried all sorts of Regex patterns like below but unsuccessful.
\(.*\)$
Using .NET 4.0
Thx for any help
Simply forbid the part between the parentheses to contain parentheses. That makes sure that you only match the last pair:
\([^()]*\)$
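A short C# sketch of that replacement, using the two strings from the question. The leading \s* is my own addition so that the space before the bracket goes too (otherwise the second example would end up as "blabla " with a trailing space). The names TrailingBrackets and Strip are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingBrackets
{
    public static string Strip(string s)
    {
        // "[^()]*" cannot cross another bracket, so only the final "(...)" at the
        // very end of the string is matched; "\s*" also removes the space before it.
        return Regex.Replace(s, @"\s*\([^()]*\)$", "");
    }

    public static void Main()
    {
        Console.WriteLine(Strip("blabla(arbitrarycontent)sfsf (arbytrarycontent)")); // blabla(arbitrarycontent)sfsf
        Console.WriteLine(Strip("blabla (arbytrarycontent)"));                       // blabla
    }
}
```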

Parsing a CSV File with C#, ignoring thousand separators

Working on a program that takes a CSV file and splits on each ",". The issue I have is there are thousand separators in some of the numbers. In the CSV file, the numbers render correctly. When viewed as a text document, they are shown like below:
Dog,Cat,100,100,Fish
In a CSV file, there are four cells, with the values "Dog", "Cat", "100,000", "Fish". When I split on the "," to an array of strings, it contains 5 elements, when what I want is 4. Anyone know a way to work around this?
Thanks
There are two common mistakes made when reading CSV data: using a split() function and using regular expressions. Both approaches are wrong, in that they are prone to corner cases such as yours and slower than they could be.
Instead, use a dedicated parser such as Microsoft.VisualBasic.FileIO.TextFieldParser, CodeProject's FastCSV or Linq2csv, or my own implementation here on Stack Overflow.
Typically, CSV files would wrap these elements in quotes, causing your line to be displayed as:
Dog,Cat,"100,100",Fish
This would parse correctly (if using a reasonable method, ie: the TextFieldParser class or a 3rd party library), and avoid this issue.
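As a rough illustration, this is how the quoted line could be read with TextFieldParser. It is only a sketch: it assumes the project can reference the Microsoft.VisualBasic assembly, and CsvLine and ParseLine are made-up names.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLine
{
    // Parses a single CSV line; commas inside quoted fields are kept as data.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }

    public static void Main()
    {
        string[] fields = ParseLine("Dog,Cat,\"100,100\",Fish");
        Console.WriteLine(fields.Length); // 4
        Console.WriteLine(fields[2]);     // 100,100
    }
}
```

For a whole file, the same parser can be constructed from a file path and ReadFields called in a loop until EndOfData is true.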
I would consider your file as an error case - and would try to correct the issue on the generation side.
That being said, if that is not possible, you will need to have more information about the data structure in the file to correct this. For example, in this case, you know you should have 4 elements - if you find five, you may need to merge back together the 3rd and 4th, since those two represent the only number within the line.
This is not possible in a general case, however - for example, take the following:
100,100,100
If that is 2 numbers, should it be 100100, 100, or should it be 100, 100100? There is no way to determine this without more information.
You might want to have a look at the free open-source project FileHelpers. If you MUST use your own code, here is a primer on the CSV "standard" format.
Well, you could always split on ("\",\"") and then trim the first and last element.
But I would look into regular expressions that match elements within "".
Don't just split on the , split on ", ".
Better still, use a CSV library from Google or CodePlex, etc.
Reading a CSV file in .NET?
You may be able to use Regex.Replace to get rid of specifically the third comma as per below before parsing?
Replaces up to a specified number of occurrences of a pattern specified in the Regex constructor with a replacement string, starting at a specified character position in the input string. A MatchEvaluator delegate is called at each match to evaluate the replacement.
[C#] public string Replace(string, MatchEvaluator, int, int);
I ran into a similar issue with fields containing line feeds. I'm not convinced this is elegant, but I basically chopped the file into lines, and if a line didn't start with a text delimiter, I appended it to the line above.
You could try something like this: step through each field; if the field has an end text delimiter, move to the next; if not, grab the next field, append it, and rinse and repeat until you do have an end delimiter (this allows for 1,000,000,000 etc.).
(I'm caffeine-deprived and hungry. I did write some code, but it was so ugly I didn't even post it.)
Do you know that it will always contain exactly four columns? If so, this quick-and-dirty LINQ code would work:
string[] elements = line.Split(',');
string element1 = elements.ElementAt(0);
string element2 = elements.ElementAt(1);
// Exclude the first two elements and the last element.
var element3parts = elements.Skip(2).Take(elements.Count() - 3);
int element3 = Convert.ToInt32(string.Join("",element3parts));
string element4 = elements.Last();
Not elegant, but it works.
