replace single backslash in database driven string - c#

This might seem like a question that's been answered many times. My team and I have tried many solutions over the past hour without any luck. We have a database driven string value that contains c:\test and we want to replace the backslash with \\ resulting in c:\\test.
We've tried .Replace, Regex.Replace, and .Split followed by rebuilding the string. I also tried a for loop with Substring to examine each character: once you get past the colon, the next character shows up as "\t".
Please try the solution before submitting as we've tried a lot of different methods including dozens of suggestions already on stack overflow.
If we manually set the string as a literal like path = @"c:\test", then using Replace works fine.
I would think that the solution would be to create a string that doesn't process the escape character but I have no idea how to implement that.

Sounds like your string already contains a tab character ('\t'). You probably need to replace it with "\\t":
var result = "c:\test".Replace("\t", "\\t");
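To illustrate the distinction this answer relies on, here is a minimal sketch. The class and method names are made up for illustration. It assumes the usual C# literal rules: a regular literal turns \t into a tab, while a verbatim literal (or a value that reaches the program at run time with a real backslash in it) keeps the backslash, so a plain Replace on the backslash works there.

```csharp
using System;

public static class BackslashDemo
{
    // Doubles every backslash that is really present in the string.
    public static string DoubleBackslashes(string path)
    {
        return path.Replace(@"\", @"\\");
    }

    // Replaces a real tab character with two backslashes followed by 't'.
    public static string TabToDoubledBackslashT(string value)
    {
        return value.Replace("\t", @"\\t");
    }

    public static void Main()
    {
        string withTab = "c:\test";        // regular literal: 'c', ':', TAB, "est"
        string withBackslash = @"c:\test"; // verbatim literal: 'c', ':', '\', "test"

        Console.WriteLine(withTab.Length);                   // 6
        Console.WriteLine(withBackslash.Length);             // 7
        Console.WriteLine(DoubleBackslashes(withBackslash)); // c:\\test
        Console.WriteLine(TabToDoubledBackslashT(withTab));  // c:\\test
    }
}
```

If the value coming back from the database really contains a tab, the backslash was probably lost before or while the value was stored, so it may be worth checking how the row was inserted.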

Related

Ignore nested single quotes inside of single quotes

I'm working with matching an entire string within single quotes. The problem is, these strings are dynamically generated and I need to ignore all other single quotes within the first set of quotes. I've come across other solutions that are similar but I can't seem to tweak them to my needs.
Here is what I've worked with so far:
'(?:''|[^'])*'
I would like to match essentially everything within the first and last single quotes between content: and ;
Some example text:
#bottom {
content: 'Here we have an embedded unescaped 'single' that is generated at runtime. {Let's ignore it
please'
;
}
This is the playground I've been working in:
https://regex101.com/r/ITHciu/2
Any help would be greatly appreciated.
If you absolutely have to use regexes for this and you are certain that ; will not appear inside the string you are searching for, you could try this: '[^;]*'\s*;$. It selects everything from a ' up to a line that ends with whitespace and a ;.
Edit: if you need the stuff between the ' and ';, you could use a group '([^;]*)'\s*;$.
However, a much cleaner solution would be to write a little parser that reads the string char by char. It's a fun exercise if you have a bit more time.
If nothing else, you could use that regex to correct the invalid syntax in your files. And tell the people manually writing them what the valid syntax should be.
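The char-by-char parser mentioned above could look roughly like this. It is only a sketch under the same assumption as the regex (no quote inside the value is directly followed by optional whitespace and a ;), and the names ContentParser and TryExtractContent are hypothetical.

```csharp
using System;

public static class ContentParser
{
    // Finds the text between the first ' after "content:" and the first '
    // that is followed by optional whitespace and a ';'.
    public static bool TryExtractContent(string css, out string content)
    {
        content = string.Empty;

        int start = css.IndexOf("content:", StringComparison.Ordinal);
        if (start < 0) return false;

        int open = css.IndexOf('\'', start);
        if (open < 0) return false;

        for (int i = open + 1; i < css.Length; i++)
        {
            if (css[i] != '\'') continue;

            // Skip whitespace (including newlines) after this quote.
            int j = i + 1;
            while (j < css.Length && char.IsWhiteSpace(css[j])) j++;

            // Only a quote followed by ';' closes the value.
            if (j < css.Length && css[j] == ';')
            {
                content = css.Substring(open + 1, i - open - 1);
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        string css = "#bottom {\ncontent: 'Here we have an embedded unescaped 'single' that is generated at runtime. {Let's ignore it\nplease'\n;\n}";
        if (TryExtractContent(css, out string value))
        {
            Console.WriteLine(value);
        }
    }
}
```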

Issue with find and replace apostrophe( ' ) in a Word Docx using OpenXML and Regex

Word seems to use a different apostrophe character than Visual Studio and it is causing problems with using Regex.
I am trying to edit some Word documents in C# using OpenXML. I am basically replacing [[COMPANY]] with a company name. This worked smoothly until I reached the corner case of company names that end in s, where the result sometimes contains s's.
Example:
Company Name: Simmons
Text in Doc: The [[COMPANY]]'s business is cars.
Result: The Simmons's business is cars.
This is improper English.
I should be able to just use a basic find and replace like I did for [[COMPANY]], but it is not working.
Regex apostropheReplace = new Regex("s\\'s");
docText = apostropheReplace.Replace(docText, "s\'");
This does not work. It seems that Word uses a different character for an apostrophe (') than the standard one I get when I use the key on my keyboard in Visual Studio. If I write a find and replace using my keyboard, it does not work, but if I copy and paste the apostrophe from Word, it does.
Regex apostrophyReplace = new Regex("s\\’s");
docText = apostrophyReplace.Replace(docText, "s\'");
Notice the different character in the Regex for the second one. I'm confused as to why this is, and I also want to know if there is a proper way of doing this. I tried "'" but that does not work. I just want to know if using the copied character from Word is the proper way of doing this, and whether there is a way to make both characters work so I don't have an issue with docs that may be created with a different program.
The reason this happens is because they are different characters.
Word actually changes some punctuation characters after you type them in order to give them the right inclination or to improve presentation.
I ran into the very same issue before and I used this regular expression: [\u2018\u2019\u201A\u201b\u2032']
So essentially modify your code to:
Regex apostropheReplace = new Regex("s[\u2018\u2019\u201A\u201b\u2032']s");
docText = apostropheReplace.Replace(docText, "s\'");
I found these were the five most common type of single quotes and apostrophes used.
And in case you come across the same issue with double quotes, here is what you can use: [\u201C\u201D\u201E\u201F\u2033\u2036\"]
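As a rough, self-contained sketch of how such a character class behaves on the Simmons example from the question (the class and method names are made up, and capturing the apostrophe so that the original character is kept is a design choice of this sketch, not something the answer prescribes). Note that the opening bracket is not preceded by a backslash, because an escaped bracket would be matched literally instead of starting a character class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ApostropheDemo
{
    // 's', then one apostrophe-like character (captured), then 's'.
    private static readonly Regex PossessiveAfterS =
        new Regex("s([\u2018\u2019\u201A\u201B\u2032'])s");

    // Drops the trailing 's' and keeps whichever apostrophe was used.
    public static string FixPossessive(string text)
    {
        return PossessiveAfterS.Replace(text, "s$1");
    }

    public static void Main()
    {
        Console.WriteLine(FixPossessive("The Simmons's business is cars."));
        Console.WriteLine(FixPossessive("The Simmons\u2019s business is cars."));
        Console.WriteLine(FixPossessive("The Acme's business is cars."));
    }
}
```

The pattern is not anchored to word boundaries, so it also fires wherever an s, an apostrophe, and another s happen to meet; tighten it if that matters for your documents.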
Answering the question:
Is there a way to do it so that both characters work?
If you want one Regex to be able to handle both scenarios, this is perhaps a simple and readable solution:
Regex apostropheReplace = new Regex("s['’]s");
docText = apostropheReplace.Replace(docText, "s\'");
This has the added benefit of being understandable to other developers that you are attempting to cover both apostrophe cases. This benefit gets at the other part of your question:
If using the copied character from Word is the proper way of doing this?
That depends on what you mean by "proper". If you mean "most understandable to other developers," I'd say yes, because there would be the least amount of look-up needed to know exactly what your Regex is looking for. If you mean "most performant", that should not be an issue with this straightforward Regex search (some nice Regex performance tips can be found here).
If you mean "most versatile/robust single quote Regex", then as @Leonardo-Seccia points out, there are other character encodings that might cause trouble. (Some of the common Microsoft Word ones are listed here.) Such a solution might look like this:
Regex apostropheReplace =
new Regex("s['\u2018\u2019\u201A\u201b]s");
docText = apostropheReplace.Replace(docText, "s\'");
But you can certainly add other character encodings as needed. A more complete list of character encodings can be found here - to add them to the above Regex, simply change the "U+" to "u" and add it to the list after another "\" character. For example, to add the "prime" symbol (′ or U+2032) to the list above, change the RegEx string from
Regex("s['\u2018\u2019\u201A\u201b]s")
to
Regex("s['\u2018\u2019\u201A\u201b\u2032]s")
Ultimately, you would be the judge of what character encodings are the most "proper" for inclusion in your Regex based on your use cases.

How do I see if a string contains another string with quotes in it?

I am trying to see if a large string contains this line of HTML:
<label ng-class="choiceCaptionClass" class="ng-binding choice-caption">Was this information helpful?</label>
As you can see, this snippet has quotations in multiple places and it's causing problems when I do something like this:
Assert.IsTrue(responseContent.Contains("<label ng-class="choiceCaptionClass" class="ng - binding choice - caption">Was this information helpful?</label>"));
I've tried both of these ways of defining the string:
@"<label ng-class=""choiceCaptionClass"" class=""ng - binding choice - caption"">Was this information helpful?</label>"
and
"<label ng-class=\"choiceCaptionClass\" class=\"ng - binding choice - caption\">Was this information helpful?</label>"
But in each case the Contains() method looks for the literal string with either the double quotes or the backslashes. Is there another way I could define this string so I can correctly search for it?
Escaping the double-quotes with backslashes is the proper thing to do.
The reason your search may be failing is that the strings don't actually match. For example, in your version with backslashes, you have spaces around some of the dashes but your HTML string does not.
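A small sketch of that point, with a made-up page string standing in for responseContent: both ways of writing the literal produce the same string at run time, and Contains succeeds once the spacing matches the HTML. The class and member names are hypothetical.

```csharp
using System;

public static class QuoteDemo
{
    // Same text written with backslash escapes and as a verbatim literal.
    public const string Escaped = "<label ng-class=\"choiceCaptionClass\" class=\"ng-binding choice-caption\">Was this information helpful?</label>";
    public const string Verbatim = @"<label ng-class=""choiceCaptionClass"" class=""ng-binding choice-caption"">Was this information helpful?</label>";

    // The variant from the question, with spaces around the dashes.
    public const string WithSpaces = "<label ng-class=\"choiceCaptionClass\" class=\"ng - binding choice - caption\">Was this information helpful?</label>";

    public static bool PageContains(string page, string needle)
    {
        return page.Contains(needle);
    }

    public static void Main()
    {
        // Hypothetical response body containing the label exactly as served.
        string page = "<div>" + Escaped + "</div>";

        Console.WriteLine(Escaped == Verbatim);            // True
        Console.WriteLine(PageContains(page, Verbatim));   // True
        Console.WriteLine(PageContains(page, WithSpaces)); // False
    }
}
```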
Try using regular expressions. I made this one for you but you can test your own regex here.
var regex = new Regex(@"<label\s+ng-class\s*=\s*""choiceCaptionClass""\s+class\s*=\s*""ng-binding choice-caption""\s*>\s*Was this information helpful\?\s*</label>", RegexOptions.IgnoreCase);
Assert.IsTrue(regex.IsMatch(responseContent));
If this is not working, use the tester tool to figure out which part of the pattern is off.
Hope this helps!

Regular Expression for Digits and Special Characters - C#

I use Html-Agility-Pack to extract information from some websites. In the process I get data in the form of string and I use that data in my program.
Sometimes the data I get includes multiple details in the single string. As the name of this Movie "Dog Eats Dog (2012) (2012)". The name should have been "Dog Eats Dog (2012)" rather than the first one.
Above is the one example from many. In order to correct the issue I tried to use string.Distinct() method but it would remove all the duplicate characters in the string as in above example it would return "Dog Eats (2012)". Now it solved my initial problem by removing the 2nd (2012) but created a new one by changing the actual title.
I thought my problem could be solved with Regex but I have no idea as to how I can use it here. As far as I know if I use Regex it would tell me that there are duplicate items in the string according to the defined Regex code.
But how do I remove it? There can be a string like "Meme 2013 (2013) (2013)".
Now the actual title is "Meme 2013" with year (2013) and the duplicate year (2013). Even if I get a bool value indicating that the string has a duplicate year, I can't think of any method to actually remove the duplicate substring.
The duplicate year always comes in the end of the string. So what should be the Regex that I would use to determine that the string actually has two years in it, like (2012) (2012)?
If I can correctly identify the string contains duplicate maybe I can use string.LastIndexOf() to try and remove the duplicate part. If there is any better way to do it please let me know.
Thanks.
The right regex is "( \(\d{4}\))\1+".
string pattern = @"( \(\d{4}\))\1+";
string result = new Regex(pattern).Replace(s, "$1");
Example here : https://repl.it/Evcy/2
Explanation:
Capture one " (dddd)" block, and remove all following identical ones.
( \(\d{4}\)) does the capture, \1+ finds any non empty sequence of that captured block
Finally, replace the initial block and its copies by the initial block alone.
This regex will allow for any pattern of whitespace, even none, as in (2013)(2013)
`@"(\(\d{4}\))(?:\s*\1)+"`
I have a demo of it here
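Since the demo links did not survive, here is a minimal sketch that applies the whitespace-tolerant pattern to the titles from the question. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleCleaner
{
    // One "(dddd)" block followed by one or more identical copies,
    // each optionally preceded by whitespace.
    private static readonly Regex DuplicateYear =
        new Regex(@"(\(\d{4}\))(?:\s*\1)+");

    // Keeps the first year block and drops its repeated copies.
    public static string RemoveDuplicateYear(string title)
    {
        return DuplicateYear.Replace(title, "$1");
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDuplicateYear("Dog Eats Dog (2012) (2012)")); // Dog Eats Dog (2012)
        Console.WriteLine(RemoveDuplicateYear("Meme 2013 (2013) (2013)"));    // Meme 2013 (2013)
        Console.WriteLine(RemoveDuplicateYear("Meme 2013 (2013)(2013)"));     // Meme 2013 (2013)
        Console.WriteLine(RemoveDuplicateYear("Dog Eats Dog (2012)"));        // Dog Eats Dog (2012)
    }
}
```

Because the backreference \1 must repeat the captured text exactly, two different years in a row, such as (2012) (2013), are left alone.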

how to make string returned by ResourceManager.GetString not verbatim

Okay, I have a string
string textToShow = "this\nrocks";
which when put in label in winforms window will then show
this
rocks
Which is the result I'd like to get. Now, instead of setting the textToShow in the code, I set it in the resource file. When I tried to get the value from resource file using
Properties.Resources.ResourceManager.GetString("textToShow");
the whole string instead will be treated as verbatim, showing
this\nrocks
when put in a label in a winforms window. This is not the result I'm looking for. What's the best way to store strings with special characters in a resource file then? I can do a string replace for every special character, like
string.Replace(@"\n", "\n");
but then I need to replace every special character whenever I call ResourceManager.GetString, which I think is not the most elegant solution. If there is some way to make the string returned from ResourceManager.GetString not verbatim, please do tell me.
Thanks
This was already answered here: StackOverflow: How to deal with newline
Basically you have two useful options:
Use shift + enter in the resource manager text editor to add a new line.
Or use String.Format() to replace {0} with \n on read.
The .NET 4.5 framework has the unescape functionality as shown here:
using System.Text.RegularExpressions;
Regex.Unescape(Properties.Resources.ResourceManager.GetString("textToShow"));
solves your issue. Now you can use \n and \u in the resource files.
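A minimal sketch of the Regex.Unescape approach, using a verbatim literal as a stand-in for the value that GetString would return (the class and method names are hypothetical). Keep in mind that Regex.Unescape is meant for regular-expression escapes, so it also converts sequences such as \. and can reject malformed input; it is worth testing against your actual resource strings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ResourceTextDemo
{
    // Converts escape sequences such as \n into the characters they stand for.
    public static string Decode(string raw)
    {
        return Regex.Unescape(raw);
    }

    public static void Main()
    {
        // Stand-in for Properties.Resources.ResourceManager.GetString("textToShow"):
        // a backslash followed by 'n', not a line break.
        string raw = @"this\nrocks";

        Console.WriteLine(raw);         // one line, with a visible \n
        Console.WriteLine(Decode(raw)); // two lines: "this" and "rocks"
    }
}
```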
On the resource editor type "this<shift+enter>rocks" as the resource value.
