Find and Replace RegEx question

I am starting to get a grip on RegEx thanks to all the great help here on SO with my other questions. But I am still stuck on this one:
My code is:
StreamReader reader = new StreamReader(fDialog.FileName.ToString());
string content = reader.ReadToEnd();
reader.Close();
I am reading in a text file and I want to search for this text and change it (the X and Y value always follow each other in my text file):
X17.8Y-1.
But this text can also be X16.1Y2.3 (the values will always be different after X and Y)
I want to change it to this
X17.8Y-1.G54
or
X(value)Y(value)G54
My RegEx statement follows but it is not working.
content = Regex.Replace(content, @"(X(?:\d*\.)?\d+)*(Y(?:\d*\.)?\d+)", "$1$2G54");
Can someone please modify it for me so it works and will search for X(wildcard) Y(Wildcard) and replace it with X(value)Y(value)G54?

The regular expression you need is:
X[-\d.]+Y[-\d.]+
Here is how to use it in C#:
string content = "foo X17.8Y-1. bar";
content = Regex.Replace(content, @"X[-\d.]+Y[-\d.]+", "$0G54");
Console.WriteLine(content);
Output:
foo X17.8Y-1.G54 bar

What comes after "X(value)Y(value)" in your text file? A space? A newline? Another "X(value)Y(value)" value pair?
I'm not very good using shortcuts in regexes, like \d, nor am I familiar with .NET, but I had used the following regex to match the value pair:
(X[0-9.-]+Y[0-9.-]+)
And the replacement is
$1G54
This will work as long as a value pair is not directly followed by a digit, period or a dash.

To be picky about the input, you could use
string num = @"-?(?:\d+\.\d+|\d+\.|\.\d+|\d+)";
content = Regex.Replace(content, "(?<x>X" + num + ")(?<y>Y" + num + ")", "${x}${y}G54");
Is there a reliable terminator for the Y value? Say it's whitespace:
content = Regex.Replace(content, @"(X.+?)(Y.+?)(\s)", "$1$2G54$3");
How robust does the code need to be? If it's rewriting debugging output or some other quick-and-dirty task, keep it simple.

I know this doesn't answer your question directly, but you should check out Expresso. It's a .NET Regular Expression Tool that allows you to debug, test, and fine-tune your complex expressions. It's a life-saver.
More of a do-it yourself answer, but it'll be helpful even if someone gives you an answer here.

It looks like you just need to support optional negative values:
content = Regex.Replace(content, @"(X-?(?:\d*\.)?\d+)*(Y-?(?:\d*\.)?\d+)", "$1$2G54");
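A quick way to see how this pattern treats the sample from the question is to run it next to the character-class pattern from the first answer. The sketch below is only an illustration; the class and method names (G54Demo, WithOptionalMinus, WithCharClass) are made up for it. On X17.8Y-1. the optional-minus pattern stops before the trailing period, because \d+ requires a digit after the decimal point, so G54 ends up in front of the period.

```csharp
using System.Text.RegularExpressions;

public static class G54Demo
{
    // Pattern from this answer: optional minus, optional "digits." prefix, then digits.
    public static string WithOptionalMinus(string content) =>
        Regex.Replace(content, @"(X-?(?:\d*\.)?\d+)*(Y-?(?:\d*\.)?\d+)", "$1$2G54");

    // Character-class pattern from the first answer: any run of '-', digits and '.'.
    public static string WithCharClass(string content) =>
        Regex.Replace(content, @"X[-\d.]+Y[-\d.]+", "$0G54");
}
```

G54Demo.WithOptionalMinus("foo X16.1Y2.3 bar") returns foo X16.1Y2.3G54 bar, but G54Demo.WithOptionalMinus("foo X17.8Y-1. bar") returns foo X17.8Y-1G54. bar, while G54Demo.WithCharClass("foo X17.8Y-1. bar") returns foo X17.8Y-1.G54 bar. If values can end in a bare period, the character-class pattern or the stricter number pattern from the other answer may be the safer choice.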

Related

Replace characters === either side of text with html tags

I'm really struggling to find a way of getting the following to work, I have some data in the following format ===some text=== I want to replace the === around the text with html tags.
I've tried using Match and replace, but I get a bad compile constant value, I have also tried Replace {tag} with a value or completely remove {any-tag} but that just removes all the text. I have also tried http://www.rexegg.com/regex-lookarounds.html but none work, I think the problem I'm having is because the tags around the text do not have closing tags I'm unable to find the text
So I have tried something like this:
string format = Regex.Replace(data.FirstOrDefault().countrylist, "=== This could be any text ===", " </p><p class=\"strong\">Need to keep text here<p>");
example of how the text looks:
====Rise and fall of the Roman empire====
====20th and 21st centuries====
So I would want it to look:
</p><p class=\"strong\">Rise and fall of the Roman empire<p>
</p><p class=\"strong\">20th and 21st centuries<p>
I'm not the greatest at regular expressions, and all my attempts have failed so any help would be much appreciated.

Try this one:
var yourstring = "===20th and 21st centuries===";
var regex = new Regex(Regex.Escape("==="));
// The final argument 1 replaces only the first remaining occurrence of "==="
yourstring = regex.Replace(yourstring, "</p><p class=\"strong\">", 1);
yourstring = regex.Replace(yourstring, "<p>", 1);
Replace returns the input unchanged when it finds no match, so a missing delimiter does not throw an error.
Edit: If you have multiple entries that should be replaced, repeat the two Replace calls until the string no longer changes.

The following worked in my environment:
string text = "===Rise and fall of the Roman empire===";
var pattern = @"===(.*)===";
var regex = new Regex(pattern);
var match = regex.Match(text);
var result = string.Concat("</p><p class=\"strong\">", match.Groups[1].Value, "<p>");
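The same capture-group idea can handle every heading in one pass by letting Regex.Replace substitute the captured text. This is a minimal sketch, not a tested solution for the real data: the HeadingMarkup class and Convert method are hypothetical names, the pattern assumes each heading sits on a single line and is wrapped in three or more = signs (the question shows both === and ====), and the tags are copied as written in the question.

```csharp
using System.Text.RegularExpressions;

public static class HeadingMarkup
{
    // Three or more '=', the heading text (lazy, stays on one line), three or more '='.
    private static readonly Regex Heading = new Regex(@"={3,}(.+?)={3,}");

    public static string Convert(string text) =>
        // $1 is the captured heading text; the surrounding tags come from the question.
        Heading.Replace(text, "</p><p class=\"strong\">$1<p>");
}
```

Because Replace acts on every match, no loop or match counting is needed, and text without any === markers is returned unchanged.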
Regards,

How to read text file between ""

I need an "idea" on how to read text file data between quotes. For example:
line 1: "read a title"
line 2: "read a descr"
line 1: "read a title"
line 2: "read a descr"
I want to do a foreach type of thing, and I want to read all Line 1's, and Line 2's as a pair, but between the ".
In my program I am going to output (foreach of course):
readTerminatedNull(file1);
readTerminatedNull(file2);
I would read line by line, but some of the text could be:
line 1: "read a super long
title that goes off"
line 2: "read a descr"
So that's why I want to read between the ".
Sorry if that is too complicated, and it's a little hard to explain.
Edit:
Thanks for all the feedback guys, but I'm not sure you are getting what I am trying to do :p Not your fault, I wrote this kinda weird.
I will have a text file full of references and text, like so.
text inside:
Refren: "myrefrence_1"
String: "This is a string of a refrence"
Refren: "myrefrence_2"
String: "hello world"
Refren: "myrefrence_3"
String: "I like cookies."
I want it to read myrefrence_1 in the quotes of the first line, and then read the string in the next line between the quotes.
I will then feed both into my program, which matches the reference with the string.
But sometimes the text will be more than one line.
Refren: "this is text that goes and then
return keys on some parts."
and I still want it to read through the ".

(not tested, but you'll get the idea)
// Read all text from file
string sData = File.ReadAllText(@"c:/file.txt");
// Match every quoted string; [^"] also matches line breaks, so values may span lines
MatchCollection matches = Regex.Matches(sData, "\"([^\"]*)\"");
// Group 1 holds the text without the surrounding quotes
foreach (Match match in matches) {
string sResult = match.Groups[1].Value;
// Do whatever with sResult
}
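Building on the idea of matching quoted values, the Refren/String pairs from the edited question could be collected into a dictionary. This is a rough sketch under a few assumptions: the labels are exactly Refren and String as in the sample, values never contain a double quote, and the ReferenceFile class and Parse method are names invented for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ReferenceFile
{
    // Label, colon, optional whitespace, then a quoted value that may span several lines.
    private static readonly Regex Entry = new Regex("(Refren|String):\\s*\"([^\"]*)\"");

    public static Dictionary<string, string> Parse(string content)
    {
        var result = new Dictionary<string, string>();
        string pendingKey = null; // last Refren value still waiting for its String

        foreach (Match m in Entry.Matches(content))
        {
            string label = m.Groups[1].Value;
            string value = m.Groups[2].Value;

            if (label == "Refren")
            {
                pendingKey = value;
            }
            else if (pendingKey != null)
            {
                // A String entry completes the pair started by the previous Refren.
                result[pendingKey] = value;
                pendingKey = null;
            }
        }
        return result;
    }
}
```

Passing the result of File.ReadAllText to ReferenceFile.Parse would then give a lookup from each reference to its string, including values that wrap across lines.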

You could learn some new tricks by looking into state machines. Basically: Read each character at a time and figure out what state you are in now. First, code this as a big while loop with a big switch statement inside. Then, go and read up on the state pattern for how to do this in an object oriented way. Then, ditch that and use delegates, because c# makes this stuff so easy to do.
Then, scrap it all, write some crappy Regular Expression with a multiline flag and slurp it the Perl way. Meditate on why this is the same as your original state machine solution.
Then, get really stuck in and learn about parser generators (lex/yacc or some .NET variant) and write a simple BNF grammar for your problem. Take special note of how the trivial grammars used in the tutorials are all way more complicated than the one you need to write. Why is that so? Check out what Noam Chomsky had to say about that.
Eventually, you'll burn out. We all do. But you'll have so much fun digging into what makes programming the coolest activity on the planet. Burn-out is just the realization that that's a pipe dream ;)
When you're done, go outside. Meet people. Talk. Smile a lot. Be friendly. You're now a zen infused developer with a wicked grin. Yay for you! You rock!

What you're describing sounds like a single-column CSV file. The easiest way to access that is probably to use the Microsoft.VisualBasic.FileIO.TextFieldParser class, something like:
using (var csvParser = new TextFieldParser(new StringReader(content))
{
Delimiters = new[] {","},
HasFieldsEnclosedInQuotes = true
})
{
while (!csvParser.EndOfData)
{
var fields = csvParser.ReadFields();
Console.WriteLine(fields[0]); // do something with the first (in your case only) field found
}
}
Probably the easiest way to determine whether this approach makes sense, is to think about what happens if the string you're reading actually contains a double quote. Would it end up as "He said ""this is quoted"", but I wasn't listening" (doubling up the quotes), or is this situation impossible?
If the quotes would be doubled up in this way, then a standard CSV reader like this built-in framework one is probably your best bet.

To read all of the lines of the file you can use:
File.ReadAllLines(pathToFile);
to strip the text from "" you can use the substring method of string: http://msdn.microsoft.com/en-us/library/aka44szs.aspx
you can do it like that:
string strippedString = original.Substring(1, original.Length - 2);

Try this one
var text = File.ReadAllText(pathToFile);
// Splitting on the quote character leaves the quoted values at the odd indices
var lines = text.Split('"')
.Where((s, i) => i % 2 != 0);

First of all you need to read in the file using:
File.ReadAllLines(filePath);
Then you could split all the lines using the string.Split function.
Splitting on the closing bracket would be your best bet.

As I understand your question, you want to read and write a text file that holds some specific settings. Is that right?
I would point you to INI files, which are plain text files that store configuration settings of the kind you describe. Here are some links that could help you.
http://www.codeproject.com/Articles/1966/An-INI-file-handling-class-using-C
http://jachman.wordpress.com/2006/09/11/how-to-access-ini-files-in-c-net/

Finding text between tags and replacing it along with the tags

I am using The following regex pattern to find text between [code] and [/code] tags:
(?<=[code]).*?(?=[/code])
It returns me anything which is enclosed between these 2 tags, e.g. this: [code]return Hi There;[/code] gives me return Hi There;.
I need help with regex to replace entire text along with the tags.

Use this:
var s = "My temp folder is: [code]Path.GetTempPath()[/code]";
var result = Regex.Replace(s, @"\[code](.*?)\[/code]",
m =>
{
var codeString = m.Groups[1].Value;
// then evaluate this string
return EvaluateMyCode(codeString);
});

I would use an HTML parser for this. I can see that what you are trying to do is simple; however, these things have a habit of getting much more complicated over time. The end result is much pain for the poor soul who has to maintain the code in the future.
Take a look at this question about HTML Parsers
What is the best way to parse html in C#?
[Edit]
Here is a much more relevant answer to the question asked.
@Milad Naseri's regex is correct; you just need to do something like
string matchCodeTag = @"\[code\](.*?)\[/code\]";
string textToReplace = "[code]The Ape Men are comming[/code]";
string replaceWith = "Keep Calm";
string output = Regex.Replace(textToReplace, matchCodeTag, replaceWith);
Check out these websites for more examples
http://www.dotnetperls.com/regex-replace
http://oreilly.com/windows/archive/csharp-regular-expressions.html
Hope this helps

You need to use back referencing, i.e. replace \[code\](.*?)\[/code\] with something like <code>$1</code> which will give you what's been enclosed by the [code][/code] tags enclosed in -- for this example -- <code></code> tags.
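In C#, the back reference described here might look like the sketch below. The CodeTags class and ToHtml method are hypothetical names, and RegexOptions.Singleline is an added assumption so that a [code] block may span several lines; drop it if the tags always sit on one line.

```csharp
using System.Text.RegularExpressions;

public static class CodeTags
{
    // Replaces each [code]...[/code] pair, tags included, with <code>...</code>.
    // $1 refers to the text captured between the tags.
    public static string ToHtml(string text) =>
        Regex.Replace(text, @"\[code\](.*?)\[/code\]", "<code>$1</code>",
                      RegexOptions.Singleline);
}
```

The lazy quantifier (.*?) keeps each match confined to one pair of tags, so two [code] blocks in the same string are replaced separately rather than merged.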

Regex for a string

It would be great if someone could provide me the Regular expression for the following string.
Sample 1: <div>abc</div><br>
Sample 2: <div>abc</div></div></div></div></div><br>
As you can see in the samples above, I need to match the string no matter how many </div> tags occur. If any other string occurs between </div> and <br>, like this <div>abc</div></div></div>DEF</div></div><br> OR <div>abc</div></div></div></div></div>DEF<br>, then the regex should not match.
Thanks in advance.

Try this:
<div>([^<]+)(?:<\/div>)*<br>
As seen on rubular
Notes:
This only works if there are not tags in the abc part (or anything that has a < symbol).
You might want to use start and end of string anchors (^<div>([^<]+)(?:<\/div>)*<br>$) if you want your string to match the pattern exactly.
If you want to allow the abc part to be empty, use * instead of +
That being said, you should be wary of using regex to parse HTML.
In this example, you can use regex because you are parsing a (hopefully) known, regular subset of HTML. But a more robust solution (ie: an [X]HTML parser like HtmlAgilityPack) is preferred when it comes to parsing HTML.
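To check the anchored pattern against the four strings from the question, a small C# sketch could look like this. DivCheck and IsValid are names chosen for the example, and C# is assumed only because the rest of the thread uses it; the forward slash needs no escaping in a .NET pattern.

```csharp
using System.Text.RegularExpressions;

public static class DivCheck
{
    // <div>, text without '<', any number of </div>, then <br>, anchored to the whole string.
    private static readonly Regex Pattern =
        new Regex(@"^<div>([^<]+)(?:</div>)*<br>$");

    public static bool IsValid(string s) => Pattern.IsMatch(s);
}
```

With this sketch, both samples from the question are accepted, while the two variants that contain DEF between the closing tags and <br> are rejected.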

You need to use a real parser. Things like infinitely nested tags can't be handled via regex.

You could also include a named group in the expression, e.g.:
<div>(?<text>[^<]*)(?:<\/div>)*<br>
Implemented in C#:
var regex = new Regex(@"<div>(?<text>[^<]*)(?:<\/div>)*<br>");
Func<Match, string> getGroupText = m => (m.Success && m.Groups["text"] != null) ? m.Groups["text"].Value : null;
Func<string, string> getText = s => getGroupText(regex.Match(s));
Console.WriteLine(getText("<div>abc</div><br>"));
Console.WriteLine(getText("<div>123</div></div></div></div></div><br>"));

NullUserException's answer is good. Here are a couple of questions, and variations, depending on what you want.
Do you want to prevent anything from occurring before the open div tag? If so, keep the ^ at the beginning of the regex. If not, drop it.
The rest of this post refers to the following section of the regex:
([^<]+?)
Do you want to capture the contents of the div, or just know that it matches your form? To capture, leave it as is. If you don't need to capture, drop the parentheses from the above.
Do you want to match if there is nothing inside the div? If so change the + in the above to *
Finally, although it will work fine, you don't need the ? in the above.

I think, this regex is more flexible:
<div\b[^><]*+>(?>.*?</div>)(?:\s*+</div>)*+\s*+<br(?:\s*+/)?>
I don't include the ^ and $ at the beginning and end of my regex because we cannot be sure that your sample will always be on a single line.

How to Extract the Word Following a Symbol?

I have a string that could have any sentence in it but somewhere in that string will be the @ symbol, followed by an attached word, sort of like @username you see on some sites.
so maybe the string is "hey how are you" or it's "@john hey how are you".
IF there's an "@" in the string I want to pull what comes immediately after it into its own new string.
in this instance how can i pull "john" into a different string so i could theoretically notify this person of his new message? i'm trying to play with string.contains or .replace but i'm pretty new and having a hard time.
this btw is in c# asp.net

You can use the Substring and IndexOf methods together to achieve this.
I hope this helps.
Thanks,
Damian

Here's how you do it without regex:
string s = "hi there @john how are you";
string getTag(string s)
{
int atSign = s.IndexOf("@");
if (atSign == -1) return "";
// start at @, stop at sentence or phrase end
// I'm assuming this is English, of course
// so we leave in ' and -
int wordEnd = s.IndexOfAny(" .,;:!?".ToCharArray(), atSign);
if (wordEnd > -1)
return s.Substring(atSign, wordEnd - atSign);
else
return s.Substring(atSign);
}

You should really learn regular expressions. This will work for you:
using System.Text.RegularExpressions;
var res = Regex.Match("hey @john how are you", @"@(\S+)");
if (res.Success)
{
//john
var name = res.Groups[1].Value;
}
This finds the first occurrence. If you want to find all of them, use Regex.Matches. \S means any non-whitespace character, so punctuation is captured too: hey @john, how are you yields john, (with the trailing comma), and @john123 yields john123, which may be wrong. Maybe [a-zA-Z] or similar would suit you better (it depends on which characters usernames are made of). If you give more examples, I could tune it :)
I can recommend this page:
http://www.regular-expressions.info/
and this tool where you can test your statements:
http://regexlib.com/RESilverlight.aspx

The best way to solve this is using Regular Expressions. You can find a great resource here.
Using RegEx, you can search for the pattern you are after. I always have to refer to some documentation to write one...
Here is a pattern to start with - "@(\w+)" - the @ will get matched, and then the parentheses capture what comes after it. The "\w" matches only word characters (letters, digits, and the underscore), and the "+" indicates that there should be one or more word characters in a row.

You can try Regex...
I think it will be something like this
string userName = Regex.Match(yourString, "@(\\S+)").Groups[1].Value;

Regular expressions. I don't know C#, but the RegEx would be
/(@[\w]+) / - Everything in the parens is captured in a special variable, or attached to the RegEx object.

Use this:
var r = new Regex(@"@\w+");
foreach (Match m in r.Matches(stringToSearch))
DoSomething(m.Value);
DoSomething(string foundName) is a function that handles the name (found after @).
This will find all @names in stringToSearch
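If only the names are wanted, without the leading @ sign, a capture group can be combined with Regex.Matches. The following is a small sketch; the Mentions class and Extract method are hypothetical, and it assumes usernames consist of word characters only (letters, digits, underscore).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Mentions
{
    // Returns every word that directly follows an '@', in order of appearance.
    public static List<string> Extract(string message)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(message, @"@(\w+)"))
        {
            names.Add(m.Groups[1].Value); // group 1 excludes the '@' itself
        }
        return names;
    }
}
```

Note that this simple pattern would also pick up the part after the @ in an email address, so a stricter pattern may be needed if messages can contain addresses.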
