C# Regex quick help - c#

I'm trying to read a text file and break it up into lines, splitting on "\n". Then I want to run a regex on each line and write out the matches.
string contents = File.ReadAllText(filename);
string[] firefox = filename.Split("\r\n");
string prefix = prefix = Regex.Match(firefox, @"(\d)").Groups[0].Value;
File.AppendAllText(workingdirform2 + "configuration.txt", prefix);
string[] firefox = filename.Split("\r\n"); doesn't exactly work.
What I want to do is run a regex on each line of contents and then write out the result for each line.
So...
filename:
Hero123
Hero243
Hero5959
writes out to:
123
243
5959
Well, everybody is suggesting something different from the base I started with. The end result will be a regex of about 20 lines that works with ints. I've got to parse the file line by line.

Use File.ReadAllLines:
var lines = File.ReadAllLines(originalPath);
File.WriteAllLines(newPath, lines
    .Select(l => Regex.Match(l, @"\d+").Value).ToArray());

There are a number of problems with your code:
The reason the splitting doesn't work, is because you're splitting filename, not contents, which contains the actual file data. I agree with the other poster on using File.ReadAllLines :) It's a little more flexible with the file format compared to using \r\n, amongst other things.
Also, you have string prefix = prefix = ...; the second equals sign is probably intended to be a +. You should use a StringBuilder if the data files can become large, or better yet, write to an output stream as you go.
Passing an array to Regex.Match doesn't work either. To apply the regex to all lines, you should do something like:
foreach (string line in firefox)
{
prefix = prefix + Regex.Match(line, // etc
// Or rather:
// stringBuilder.AppendLine(...)
}
Either that, or do it all at once with a multiline regex :)
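The loop above can be sketched in full roughly as follows. This is a minimal sketch, assuming the goal is the first run of digits on each line; the names DigitExtractor, ExtractDigits and Process, and the \d+ pattern, are illustrative assumptions and not part of the original code.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class DigitExtractor
{
    // Hypothetical helper: collects the first run of digits from each line,
    // one result per output line. Lines without any digits are skipped.
    public static string ExtractDigits(IEnumerable<string> lines)
    {
        var builder = new StringBuilder();
        foreach (string line in lines)
        {
            Match match = Regex.Match(line, @"\d+");
            if (match.Success)
            {
                builder.AppendLine(match.Value);
            }
        }
        return builder.ToString();
    }

    // Reads inputPath line by line and appends the extracted digits to outputPath.
    public static void Process(string inputPath, string outputPath)
    {
        File.AppendAllText(outputPath, ExtractDigits(File.ReadLines(inputPath)));
    }
}
```

Calling DigitExtractor.Process(filename, workingdirform2 + "configuration.txt") would then stand in for the four lines in the question.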

Related

C# regex to parse /simple1/1.2-SNAPSHOT/

I need to find the last two values at the end of such a string, "simple1" and "1.2-SNAPSHOT" in the sample url below. But my code below (try to get simple1/1.2-SNAPSHOT/) doesn't work, can anyone help?
http://localhost:8060/nexus/service/local/repositories/snapshots/content/org/sonatype/mavenbook/simple1/1.2-SNAPSHOT/
List<string> artifacts = new List<string>(); // each entry is already a folder URL
// store all URLs of the artifacts to be deleted
artifacts = nexusAPI.findArtifacts(repository, contents, days, pattern);
var regex = new Regex(".*\\/(.*\\/.*\\/)$");
foreach (string url in artifacts)
{
Console.WriteLine("group/artifact: {0}", regex.Matches(url));
}
I would just split the string on '/' and get the last two parts. The regex isn't going to do anything more than that.
If you must use RegEx, you're encountering an issue in that regexes are greedy - that means it puts as much in each .* as it possibly can. So your first step is to make the regex not greedy. Simply use this as your pattern:
(.*?)/
Here's a simple test showing how this works.
This tells the regex to look for any character up to the slash, and then stop.
When you call Regex.Matches(url, "(.*?)/"), you get back a MatchCollection with one match per path segment. From there, you can just look at the last two elements.
Of course, as SledgeHammer mentioned, this is one case where regex is unnecessary and even cumbersome. Simply working with url.Split(new char[] {'/'}) will give you the results you need.
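Both approaches can be compared side by side. The following is a rough sketch; UrlTail, LastTwoBySplit and LastTwoByRegex are hypothetical names, and the regex variant assumes the URL ends with a trailing '/' as in the sample.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UrlTail
{
    // Splits on '/' and returns the last two non-empty segments.
    public static string[] LastTwoBySplit(string url)
    {
        string[] parts = url.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return parts.Skip(Math.Max(0, parts.Length - 2)).ToArray();
    }

    // Same idea with the non-greedy pattern: each match is one segment
    // followed by a slash, and group 1 holds the segment itself.
    public static string[] LastTwoByRegex(string url)
    {
        string[] segments = Regex.Matches(url, "(.*?)/")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
        return segments.Skip(Math.Max(0, segments.Length - 2)).ToArray();
    }
}
```

For the sample URL, both methods should return the two segments "simple1" and "1.2-SNAPSHOT"; the split version is shorter and does not depend on the trailing slash.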

How to reliably split the contents of a TextBox into lines?

When I get TextBox.Text, how can I split it into lines?
For text files I do this by splitting at \r\n (Notepad++). With the TextBox, usually it is \r\n, but apparently not always, since my \r\n split occasionally fails for pasted input. Luckily for text files I can use Notepad++'s "Show all characters" feature to inspect the whitespace. But how can I see what characters are in the TextBox?
Clearly the TextBox itself knows how to deal with this, since it is able to display the text with linebreaks at correct locations. How can I take var s = MyTextBox.Text; and "split s at every location where the TextBox would have displayed a break"?
Edit: I've checked my Regex and actually I am already splitting at Environment.NewLine, not \r\n.
You could use
var lines = Regex.Split(s, @"\r\n|\r|\n");
to reliably split at any newline, no matter where the newlines came from.
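As a small illustration of why the order of the alternatives matters, here is a sketch with mixed line endings; LineSplitter and SplitLines are hypothetical names used only for this example.

```csharp
using System.Text.RegularExpressions;

public static class LineSplitter
{
    // \r\n is listed first so that a Windows line break is consumed as a
    // single separator instead of producing an empty line between \r and \n.
    public static string[] SplitLines(string text)
    {
        return Regex.Split(text, @"\r\n|\r|\n");
    }
}
```

With the input "a\r\nb\rc\nd" this yields the four elements a, b, c and d.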
In the case of a TextBox, you can use the Lines property, as suggested in the comments. But you might need to get the lines from a string in other situations too.
For doing this I use an extension method based on a StringReader:
public static IEnumerable<string> GetLines(this string s)
{
    using (var reader = new StringReader(s))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
(this method is also available in my NString library)
You can use it like this:
string[] lines = textBox.Text.GetLines().ToArray();
The StringReader class knows where to split the lines, and it's probably faster than using a regex. (EDIT: that was just an intuition at first, but a quick benchmark showed StringReader to be about 6 times faster than Regex.)

Trouble with finding new line character from a string, Reading Text from .txt file

I am reading text from a .TXT file in a String. I am using File.ReadAllText.
string Str= File.ReadAllText(@"C:\temp\file.txt", Encoding.Default);
Let's assume the Str contains following string.
string Str= @"one
two
three";
Now the problem is I cannot find the newline characters from Str.
string[] lines = Str.Split('\n');
foreach(string line in lines)
{
    Console.WriteLine(line.IndexOf('\n')); // prints -1 three times
}
Is there any way I can find newline character in this situation? Please suggest.
.Split will delete the delimiter characters, and the resulting output will not include them.
From MSDN:
Delimiter characters are not included in the elements of the returned
array.
If you need to find the length of a line, just use the .Length property of the string.
In any case, as mentioned in the comments, use the File.ReadAllLines method to avoid having to split the file contents yourself.
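To make the behaviour concrete, here is a minimal sketch; NewlineDemo and its two methods are hypothetical names. It searches the unsplit string for '\n' and confirms that no piece returned by Split still contains one.

```csharp
using System;

public static class NewlineDemo
{
    // Index of the first '\n' in the unsplit text, or -1 if there is none.
    public static int FirstNewlineIndex(string text)
    {
        return text.IndexOf('\n');
    }

    // True if any piece produced by Split('\n') still contains '\n'.
    // Split removes the delimiter, so this is expected to be false.
    public static bool AnyPieceContainsNewline(string text)
    {
        foreach (string piece in text.Split('\n'))
        {
            if (piece.IndexOf('\n') >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

Note that if the file uses \r\n line endings, splitting on '\n' alone leaves a trailing '\r' on each piece, which is one more reason to prefer File.ReadAllLines.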
Per MSDN documentation for the String.Split method:
Delimiter characters are not included in the elements of the returned
array.
Just as an observation, if you are trying to load a file and process it line by line you may want to do something like this:
foreach (var line in File.ReadLines(@"C:\temp\file.txt"))
{
    Console.WriteLine(line);
}

Regular expression to extract file name from p4 path

From path "//source/project/file.cs#232", I need to match file.cs
Match myMatch = Regex.Match(path, @"(\w+\.\w+)[^/]*$");
This would give file.cs in Groups[1].
But for paths with dots in the file name, this doesn't work.
path "//source/project/file.initial.config.cs#232"
How could I modify this to work to give file.initial.config.cs?
Try this regex -- also into group 1, and assuming the extension can only be letters, numbers or the underscore:
.*/((?:.*?\.)+\w+)
This could be made more robust, if necessary, with knowledge of the allowable characters and suffixes for file naming, as well as details about the text in which (if) this file name is embedded. For example, if spaces were not allowed as part of the name
.*/((?:\S*?\.)+\w+)
or if ONLY letters, digits or the underscore are allowed:
.*/((?:\w*?\.)+\w+)
If we could be assured that there will be no dots or spaces after the last dot in the sequence, and spaces not allowed in the filename, it could be shortened further to:
.*/(\S*\.\w+)
to pick up everything between the last "/" and the last "." as well as any word characters after the last "."
etc
A number of non-'/' before '#':
/([^/]+)#
This should allow you to do what you want, or at least give you a better idea of how to achieve it:
/(\w+)(?:\..*)(\w{2,3})\#
• example: http://regex101.com/r/wQ9jG2
Can you not simply modify your regex from (\w+\.\w+)[^/]*$ to (\w+(\.\w+)+)[^/]*$, to allow multiple occurrences of .words?
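A quick way to check the modified pattern against both sample paths is sketched below; P4FileName and FromPath are hypothetical names, and returning an empty string when nothing matches is an assumption made for this example.

```csharp
using System.Text.RegularExpressions;

public static class P4FileName
{
    // Group 1 captures a word followed by one or more ".word" parts;
    // [^/]*$ then allows a trailing revision such as "#232" up to the
    // end of the string, so the match sits in the last path segment.
    public static string FromPath(string path)
    {
        Match match = Regex.Match(path, @"(\w+(\.\w+)+)[^/]*$");
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```

For "//source/project/file.cs#232" this gives file.cs, and for "//source/project/file.initial.config.cs#232" it gives file.initial.config.cs.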
Why use regex, when you can do it in c# ?
I've created a function for you:
public static class FileNameHelper
{
public static string GetFileNameFromPath(string path, string extWithoutdot = "cs")
{
var startIndex = path.LastIndexOf('/') + 1;
var stringg = path.Substring(startIndex);
var remIndex = stringg.LastIndexOf("." + extWithoutdot) + extWithoutdot.Length+1;
return stringg.Remove(remIndex);
}
}
How to use ?
string filename=FileNameHelper.GetFileNameFromPath("//source/project/file.initial.config.cs#232","cs");
Remember to use the extension without .
This has a lot of advantages over regex:
It's not regex!
It's fast and efficient.
It's readable and pure C#.
Note: Don't use regex in C# for trivial things. It's definitely a blow to performance. First think of ways of achieving it in plain C#. Regex should be a last resort. Of course, if performance doesn't matter, use whatever!
By the way, mark it as answer if it helps. I know it'll help :)
If you're not averse to avoiding regular expressions, you could do this with just a small bit of string manipulation:
string mypath = "//source/project/file.initial.config.cs#232";
string filename = GetFileName(mypath);
static string GetFileName(string path)
{
var pathPieces = path.Split('/').Last().Split('#');
var filename = pathPieces.Take(pathPieces.Length - 1);
return String.Join("#", filename);
}
Easier, and works with any arbitrary filename (even those with spaces or # characters).
EDIT: Now works with filenames with # characters in them, although those are highly discouraged in Perforce.
(?<=/)[^/]+(?=#)
Using lookaround, it matches only the filename.

RegEx for extracting number from a string

I have a bunch of files in a directory, mostly labeled something like...
PO1000000100.doc or .pdf or .txt
Some of them are PurchaseOrderPO1000000109.pdf
What I need to do is extract the PO1000000109 part of it. So basically PO with 10 digits after it...
How can I do this with a regex?
(What I'll do is a foreach loop on the files in the directory, get the filename, and run it through the regex to get the PO number...)
I'm using C# - not sure if this is relevant.
Try this
String data =
    Regex.Match("PurchaseOrderPO1000000109.pdf", @"PO\d{10}",
        RegexOptions.IgnoreCase).Value;
You could also add a Regex.IsMatch check with the same arguments, of course :)
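Wrapped up as a helper, the match could look roughly like this. It is a sketch only: PurchaseOrders and ExtractPoNumber are hypothetical names, and returning an empty string for file names without a PO number is an assumption.

```csharp
using System.Text.RegularExpressions;

public static class PurchaseOrders
{
    // Returns "PO" followed by 10 digits from a file name,
    // or an empty string when the file name contains no such token.
    public static string ExtractPoNumber(string fileName)
    {
        Match match = Regex.Match(fileName, @"PO\d{10}");
        return match.Success ? match.Value : string.Empty;
    }
}
```

In the foreach loop described in the question, this could be called on Path.GetFileName(file) for each file returned by Directory.EnumerateFiles, skipping the empty results.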
If the PO part is always the same, you can just get the number without needing to use a regex:
new string(theString.Where(c => char.IsDigit(c)).ToArray());
Later you can prepend the PO part manually.
NOTE: I'm assuming that you have only one single run of numbers in your strings. If you have for example "abc12345def678" you will get "12345678", which may not be what you want.
Regex.Replace(fileName, @"^.*?PO(\d{10}).*$", "$1");
string data="PurchaseOrderPO1000000109.pdf\nPO1000000100.doc";
MatchCollection matches = Regex.Matches(data, @"PO[0-9]{10}");
foreach(Match m in matches){
    Console.WriteLine(m.Value);
}
Results
PO1000000109
PO1000000100
This regex will pick up runs of digits from a string: \d+ (use + rather than *, because \d* also matches the empty string).
As described here.
A possible regexp could be:
^.*(\d{10})\.\D{3}$
var re = new System.Text.RegularExpressions.Regex("(?<=^PurchaseOrder)PO\\d{10}(?=\\.pdf$)");
Assert.IsTrue(re.IsMatch("PurchaseOrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("some PurchaseOrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("OrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("PurchaseOrderPO1234567890.pdf2"));
