Check if string contains rage of Unicode characters

Check if string contains rage of Unicode characters - c#

What is the best way to check if string contains specified Unicode character? My problem is I cannot parse string/characters to format \u[byte][byte][byte][byte]. I followed many tutorials and threads here on StackOverflow, but when I have method like this one:
private bool ContainsInvalidCharacters(string name)
{
if (translation.Any(c => c > 255))
{
byte[] bytes = new byte[name.Length];
Buffer.BlockCopy(name.ToCharArray(), 0, bytes, 0, bytes.Length);
string decoded = Encoding.UTF8.GetString(bytes, 0, name.Length);
(decoded.Contains("\u0001"))
{
//do something
}
}
I get output like: "c\0o\0n\0t\0i\0n\0g\0u\0t\0".
This really is not my cup of tea. I will be grateful for any help.

If I were to picture a rage of Unicode characters that would be my bet:
ლ(~•̀︿•́~)つ︻̷┻̿═━一
So to answer your question, that is to check string for such rage you could simply:
private bool ContainsInvalidCharacters(string name)
{
return name.IndexOf("ლ(~•̀︿•́~)つ︻̷┻̿═━一") != -1;
}
;)

Is this what you want?
public static bool ContainsInvalidCharacters(string name)
{
return name.IndexOfAny(new[]
{
'\u0001', '\u0002', '\u0003',
}) != -1;
}
and
bool res = ContainsInvalidCharacters("Hello\u0001");
Note the use of '\uXXXX': the ' denote a char instead of a string.

Check this also
/// <summary>
/// Check invalid character based on the pattern
/// </summary>
/// <param name="text">The string</param>
/// <returns></returns>
public static string IsInvalidCharacters(this string text)
{
string pattern = #"[^\x09\x0A\x0D\x20-\xD7FF\xE000-\xFFFD\x10000-x10FFFF]";
var match = Regex.Match(text, pattern, "");
return match.Sucess;
}

Related

Removing HTML from messages safely

I need to output all of the plaintext within messages that may include valid and/or invalid HTML and possibly text that is superficially similar to HTML (i.e. non-HTML text within <...> such as: < why would someone do this?? >).
It is more important that I preserve all non-HTML content than it is to strip out all HTML, but ideally I would like to get rid of as much of the HTML as possible for readability.
I am currently using HTML Agility Pack, but I am having issues where non-HTML within < and > is also removed, for example:
my function:
text = HttpUtility.HtmlDecode(text);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(text);
text = doc.DocumentNode.InnerText;
simple example input*:
this text has <b>weird < things</b> going on >
actual output (unacceptable, lost the word "things"):
this text has weird going on >
desired output:
this text has weird < things going on >
Is there a way to remove only legitimate HTML tags within HTML Agility Pack without stripping out other content that may include < and/or >? Or do I need to manually create a white-list of tags to remove like in this question? That is my fallback solution but I'm hoping there is a more complete solution built in to HTML Agility Pack (or another tool) that I just haven't been able to find.
*(real input often has a ton of unneeded HTML in it, I can give a longer example if that would be useful)

You could use this pattern to replace the HTML tags:
</?[a-zA-Z][a-zA-Z0-9 \"=_-]*?>
Explanation:
<
maybe / (as it may be closing tag)
match a-z or A-Z as the first letter
MAYBE match any of a-z, or A-Z, 0-9, "=_- indefinitely
>
Final Code:
using System;
using System.Text.RegularExpressions;
namespace Regular
{
class Program
{
static void Main(string[] args)
{
string yourText = "this text has <b>weird < things</b> going on >";
string newText = Regex.Replace(yourText, "</?[a-zA-Z][a-zA-Z0-9 \"=_-]*>", "");
Console.WriteLine(newText);
}
}
}
Outputs:
this text has weird < things going on >
#corey-ogburn's comment is not correct as <[space]abc> would be replaced.
As you only want to strip them off the string I don't see a reason where you'd want to check if you have a tag starting/ending, but you could easily make it with regex.
It's not always a good choice to use RegEx to parse HTML, but I think it'd be fine if you want to parse simple text.

I wrote this a really long time ago to do something similar. You might use it as a starting point:
You'll need:
using System;
using System.Collections.Generic;
And the code:
/// <summary>
/// Instances of this class strip HTML/XML tags from a string
/// </summary>
public class HTMLStripper
{
public HTMLStripper() { }
public HTMLStripper(string source)
{
m_source = source;
stripTags();
}
private const char m_beginToken = '<';
private const char m_endToken = '>';
private const char m_whiteSpace = ' ';
private enum tokenType
{
nonToken = 0,
beginToken = 1,
endToken = 2,
escapeToken = 3,
whiteSpace = 4
}
private string m_source = string.Empty;
private string m_stripped = string.Empty;
private string m_tagName = string.Empty;
private string m_tag = string.Empty;
private Int32 m_startpos = -1;
private Int32 m_endpos = -1;
private Int32 m_currentpos = -1;
private IList<string> m_skipTags = new List<string>();
private bool m_tagFound = false;
private bool m_tagsStripped = false;
/// <summary>
/// Gets or sets the source string.
/// </summary>
/// <value>
/// The source string.
/// </value>
public string source { get { return m_source; } set { clear(); m_source = value; stripTags(); } }
/// <summary>
/// Gets the string stripped of HTML tags.
/// </summary>
/// <value>
/// The string.
/// </value>
public string stripped { get { return m_stripped; } set { } }
/// <summary>
/// Gets or sets a value indicating whether [HTML tags were stripped].
/// </summary>
/// <value>
/// <c>true</c> if [HTML tags were stripped]; otherwise, <c>false</c>.
/// </value>
public bool tagsStripped { get { return m_tagsStripped; } set { } }
/// <summary>
/// Adds the name of an HTML tag to skip stripping (leave in the text).
/// </summary>
/// <param name="value">The value.</param>
public void addSkipTag(string value)
{
if (value.Length > 0)
{
// Trim start and end tokens from skipTags if present and add to list
CharEnumerator tmpScanner = value.GetEnumerator();
string tmpString = string.Empty;
while (tmpScanner.MoveNext())
{
if (tmpScanner.Current != m_beginToken && tmpScanner.Current != m_endToken) { tmpString += tmpScanner.Current; }
}
if (tmpString.Length > 0) { m_skipTags.Add(tmpString); }
}
}
/// <summary>
/// Clears this instance.
/// </summary>
public void clear()
{
m_source = string.Empty;
m_tag = string.Empty;
m_startpos = -1;
m_endpos = -1;
m_currentpos = -1;
m_tagsStripped = false;
}
/// <summary>
/// Clears all.
/// </summary>
public void clearAll()
{
this.clear();
m_skipTags.Clear();
}
/// <summary>
/// Strips the HTML tags.
/// </summary>
private void stripTags()
{
// Preserve source and make a copy for stripping
m_stripped = m_source;
// Find first tag
getNext();
// If there are any tags (if next tag is string.Empty we are at EOS)...
if (m_tagName != string.Empty)
{
do
{
// If the tag we found is not to be skipped...
if (!m_skipTags.Contains(m_tagName))
{
// Remove tag from string
m_stripped = m_stripped.Remove(m_startpos, m_endpos - m_startpos + 1);
m_tagsStripped = true;
}
// Get next tag, rinse and repeat (if next tag is string.Empty we are at EOS)
getNext();
} while (m_tagName != string.Empty);
}
}
/// <summary>
/// Steps the pointer to the next HTML tag.
/// </summary>
private void getNext()
{
m_tagFound = false;
m_tag = string.Empty;
m_tagName = string.Empty;
bool beginTokenFound = false;
CharEnumerator scanner = m_stripped.GetEnumerator();
// If we're not at the beginning of the string, move the enumerator to the appropriate location in the string
if (m_currentpos != -1)
{
Int32 index = 0;
do
{
scanner.MoveNext();
index += 1;
} while (index < m_currentpos + 1);
}
while (!m_tagFound && m_currentpos + 1 < m_stripped.Length)
{
// Find next begin token
while (scanner.MoveNext())
{
m_currentpos += 1;
if (evaluateChar(scanner.Current) == tokenType.beginToken)
{
m_startpos = m_currentpos;
beginTokenFound = true;
break;
}
}
// If a begin token is found, find next end token
if (beginTokenFound)
{
while (scanner.MoveNext())
{
m_currentpos += 1;
// If we find another begin token before finding an end token we are not in a tag
if (evaluateChar(scanner.Current) == tokenType.beginToken)
{
m_tagFound = false;
beginTokenFound = true;
break;
}
// If the char immediately following a begin token is a white space we are not in a tag
if (m_currentpos - m_startpos == 1 && evaluateChar(scanner.Current) == tokenType.whiteSpace)
{
m_tagFound = false;
beginTokenFound = true;
break;
}
// End token found
if (evaluateChar(scanner.Current) == tokenType.endToken)
{
m_endpos = m_currentpos;
m_tagFound = true;
break;
}
}
}
if (m_tagFound)
{
// Found a tag, get the info for this tag
m_tag = m_stripped.Substring(m_startpos, (m_endpos + 1) - m_startpos);
m_tagName = m_stripped.Substring(m_startpos + 1, m_endpos - m_startpos - 1);
// If this tag is to be skipped, we do not want to reset the position within the string
// Also, if we are at the end of the string (EOS) we do not want to reset the position
if (!m_skipTags.Contains(m_tagName) && m_currentpos != stripped.Length)
{
m_currentpos = -1;
}
}
}
}
/// <summary>
/// Evaluates the next character.
/// </summary>
/// <param name="value">The value.</param>
/// <returns>tokenType</returns>
private tokenType evaluateChar(char value)
{
tokenType returnValue = new tokenType();
switch (value)
{
case m_beginToken:
returnValue = tokenType.beginToken;
break;
case m_endToken:
returnValue = tokenType.endToken;
break;
case m_whiteSpace:
returnValue = tokenType.whiteSpace;
break;
default:
returnValue = tokenType.nonToken;
break;
}
return returnValue;
}
}

How to replace text with index value in C#

I have a text file which contains a repeated string called "map" for more than 800 now I would like to replace them with map to map0, map1, map2, .....map800.
I tried this way but it didn't work for me:
void Main() {
string text = File.ReadAllText(#"T:\File1.txt");
for (int i = 0; i < 2000; i++)
{
text = text.Replace("map", "map"+i);
}
File.WriteAllText(#"T:\File1.txt", text);
}
How can I achieve this?

This should work fine:
void Main() {
string text = File.ReadAllText(#"T:\File1.txt");
int num = 0;
text = (Regex.Replace(text, "map", delegate(Match m) {
return "map" + num++;
}));
File.WriteAllText(#"T:\File1.txt", text);
}

/// <summary>
/// Replaces each existing key within the original string by adding a number to it.
/// </summary>
/// <param name="original">The original string.</param>
/// <param name="key">The key we are searching for.</param>
/// <param name="offset">The offset of the number we want to start with. The default value is 0.</param>
/// <param name="increment">The increment of the number.</param>
/// <returns>A new string where each key has been extended with a number string with "offset" and beeing incremented with "increment".The default value is 1.</returns>
/// <example>
/// Assuming that we have an original string of "mapmapmapmap" and the key "map" we
/// would get "map0map1map2map3" as result.
/// </example>
public static string AddNumberToKeyInString(string original, string key, int offset = 0, int increment = 1)
{
if (original.Contains(key))
{
int counter = offset;
int position = 0;
int index;
// While we are withing the length of the string and
// the "key" we are searching for exists at least once
while (position < original.Length && (index = original.Substring(position).IndexOf(key)) != -1)
{
// Insert the counter after the "key"
original = original.Insert(position + key.Length, counter.ToString());
position += index + key.Length + counter.ToString().Length;
counter += increment;
}
}
return original;
}

It's because you are replacing the same occurrence of map each time. So the resulting string will have map9876543210 map9876543210 map9876543210 for 10 iterations, if the original string was "map map map". You need to find each individual occurrence of map, and replace it. Try using the indexof method.

Something along these lines should give you an idea of what you're trying to do:
static void Main(string[] args)
{
string text = File.ReadAllText(#"C:\temp\map.txt");
int mapIndex = text.IndexOf("map");
int hitCount = 0;
int hitTextLength = 1;
while (mapIndex >= 0 )
{
text = text.Substring(0, mapIndex) + "map" + hitCount++.ToString() + text.Substring(mapIndex + 2 + hitTextLength);
mapIndex = text.IndexOf("map", mapIndex + 3 + hitTextLength);
hitTextLength = hitCount.ToString().Length;
}
File.WriteAllText(#"C:\temp\map1.txt", text);
}
Due to the fact that strings are immutable this wouldn't be the ideal way to deal with large files (1MB+) as you would be creating and disposing the entire string for each instance of "map" in the file.
For an example file:
map hat dog
dog map cat
lost cat map
mapmapmaphat
map
You get the results:
map0 hat dog
dog map1 cat
lost cat map2
map3map4map5hat
map6

How can I limit a string to no more than a certain length? [duplicate]

This question already has answers here:
How do I truncate a .NET string?
(37 answers)
Closed 6 years ago.
I tried the following:
var Title = LongTitle.Substring(0,20)
This works but not if LongTitle has a length of less than 20. How can I limit strings to a maximum of 20 characters but not get an error if they are just for example 5 characters long?

Make sure that length won't exceed LongTitle (null checking skipped):
int maxLength = Math.Min(LongTitle.Length, 20);
string title = LongTitle.Substring(0, maxLength);
This can be turned into extension method:
public static class StringExtensions
{
/// <summary>
/// Truncates string so that it is no longer than the specified number of characters.
/// </summary>
/// <param name="str">String to truncate.</param>
/// <param name="length">Maximum string length.</param>
/// <returns>Original string or a truncated one if the original was too long.</returns>
public static string Truncate(this string str, int length)
{
if(length < 0)
{
throw new ArgumentOutOfRangeException(nameof(length), "Length must be >= 0");
}
if (str == null)
{
return null;
}
int maxLength = Math.Min(str.Length, length);
return str.Substring(0, maxLength);
}
}
Which can be used as:
string title = LongTitle.Truncate(20);

Shortest, the:
var title = longTitle.Substring(0, Math.Min(20, longTitle.Length))

string title = new string(LongTitle.Take(20).ToArray());

If the string Length is bigger than 20, use 20, else use the Length.
string title = LongTitle.Substring(0,
(LongTitle.Length > 20 ? 20 : LongTitle.Length));

You can use the StringLength attribute. That way no string can be stored that is longer (or shorter) than a specified length.
See: http://msdn.microsoft.com/en-us/library/system.componentmodel.dataannotations.stringlengthattribute%28v=vs.100%29.aspx
[StringLength(20, ErrorMessage = "Your Message")]
public string LongTitle;

String.Replace ignoring case

I have a string called "hello world"
I need to replace the word "world" to "csharp"
for this I use:
string.Replace("World", "csharp");
but as a result, I don't get the string replaced. The reason is case sensitiveness. The original string contains "world" whereas I'm trying to replace "World".
Is there any way to avoid this case sensitiveness in string.Replace method?

You could use a Regex and perform a case insensitive replace:
class Program
{
static void Main()
{
string input = "hello WoRlD";
string result =
Regex.Replace(input, "world", "csharp", RegexOptions.IgnoreCase);
Console.WriteLine(result); // prints "hello csharp"
}
}

var search = "world";
var replacement = "csharp";
string result = Regex.Replace(
stringToLookInto,
Regex.Escape(search),
replacement.Replace("$","$$"),
RegexOptions.IgnoreCase
);
The Regex.Escape is useful if you rely on user input which can contains Regex language elements
Update
Thanks to comments, you actually don't have to escape the replacement string.
Here is a small fiddle that tests the code:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var tests = new[] {
new { Input="abcdef", Search="abc", Replacement="xyz", Expected="xyzdef" },
new { Input="ABCdef", Search="abc", Replacement="xyz", Expected="xyzdef" },
new { Input="A*BCdef", Search="a*bc", Replacement="xyz", Expected="xyzdef" },
new { Input="abcdef", Search="abc", Replacement="x*yz", Expected="x*yzdef" },
new { Input="abcdef", Search="abc", Replacement="$", Expected="$def" },
};
foreach(var test in tests){
var result = ReplaceCaseInsensitive(test.Input, test.Search, test.Replacement);
Console.WriteLine(
"Success: {0}, Actual: {1}, {2}",
result == test.Expected,
result,
test
);
}
}
private static string ReplaceCaseInsensitive(string input, string search, string replacement){
string result = Regex.Replace(
input,
Regex.Escape(search),
replacement.Replace("$","$$"),
RegexOptions.IgnoreCase
);
return result;
}
}
Its output is:
Success: True, Actual: xyzdef, { Input = abcdef, Search = abc, Replacement = xyz, Expected = xyzdef }
Success: True, Actual: xyzdef, { Input = ABCdef, Search = abc, Replacement = xyz, Expected = xyzdef }
Success: True, Actual: xyzdef, { Input = A*BCdef, Search = a*bc, Replacement = xyz, Expected = xyzdef }
Success: True, Actual: x*yzdef, { Input = abcdef, Search = abc, Replacement = x*yz, Expected = x*yzdef}
Success: True, Actual: $def, { Input = abcdef, Search = abc, Replacement = $, Expected = $def }

2.5X FASTER and MOST EFFECTIVE method than other's regular expressions methods:
/// <summary>
/// Returns a new string in which all occurrences of a specified string in the current instance are replaced with another
/// specified string according the type of search to use for the specified string.
/// </summary>
/// <param name="str">The string performing the replace method.</param>
/// <param name="oldValue">The string to be replaced.</param>
/// <param name="newValue">The string replace all occurrences of <paramref name="oldValue"/>.
/// If value is equal to <c>null</c>, than all occurrences of <paramref name="oldValue"/> will be removed from the <paramref name="str"/>.</param>
/// <param name="comparisonType">One of the enumeration values that specifies the rules for the search.</param>
/// <returns>A string that is equivalent to the current string except that all instances of <paramref name="oldValue"/> are replaced with <paramref name="newValue"/>.
/// If <paramref name="oldValue"/> is not found in the current instance, the method returns the current instance unchanged.</returns>
[DebuggerStepThrough]
public static string Replace(this string str,
string oldValue, string newValue,
StringComparison comparisonType)
{
// Check inputs.
if (str == null)
{
// Same as original .NET C# string.Replace behavior.
throw new ArgumentNullException(nameof(str));
}
if (str.Length == 0)
{
// Same as original .NET C# string.Replace behavior.
return str;
}
if (oldValue == null)
{
// Same as original .NET C# string.Replace behavior.
throw new ArgumentNullException(nameof(oldValue));
}
if (oldValue.Length == 0)
{
// Same as original .NET C# string.Replace behavior.
throw new ArgumentException("String cannot be of zero length.");
}
//if (oldValue.Equals(newValue, comparisonType))
//{
//This condition has no sense
//It will prevent method from replacesing: "Example", "ExAmPlE", "EXAMPLE" to "example"
//return str;
//}
// Prepare string builder for storing the processed string.
// Note: StringBuilder has a better performance than String by 30-40%.
StringBuilder resultStringBuilder = new StringBuilder(str.Length);
// Analyze the replacement: replace or remove.
bool isReplacementNullOrEmpty = string.IsNullOrEmpty(newValue);
// Replace all values.
const int valueNotFound = -1;
int foundAt;
int startSearchFromIndex = 0;
while ((foundAt = str.IndexOf(oldValue, startSearchFromIndex, comparisonType)) != valueNotFound)
{
// Append all characters until the found replacement.
int charsUntilReplacment = foundAt - startSearchFromIndex;
bool isNothingToAppend = charsUntilReplacment == 0;
if (!isNothingToAppend)
{
resultStringBuilder.Append(str, startSearchFromIndex, charsUntilReplacment);
}
// Process the replacement.
if (!isReplacementNullOrEmpty)
{
resultStringBuilder.Append(newValue);
}
// Prepare start index for the next search.
// This needed to prevent infinite loop, otherwise method always start search
// from the start of the string. For example: if an oldValue == "EXAMPLE", newValue == "example"
// and comparisonType == "any ignore case" will conquer to replacing:
// "EXAMPLE" to "example" to "example" to "example" … infinite loop.
startSearchFromIndex = foundAt + oldValue.Length;
if (startSearchFromIndex == str.Length)
{
// It is end of the input string: no more space for the next search.
// The input string ends with a value that has already been replaced.
// Therefore, the string builder with the result is complete and no further action is required.
return resultStringBuilder.ToString();
}
}
// Append the last part to the result.
int charsUntilStringEnd = str.Length - startSearchFromIndex;
resultStringBuilder.Append(str, startSearchFromIndex, charsUntilStringEnd);
return resultStringBuilder.ToString();
}
Note: ignore case == StringComparison.OrdinalIgnoreCase as parameter for StringComparison comparisonType. It is the fastest, case-insensitive way to replace all values.
Advantages of this method:
High CPU and MEMORY efficiency;
It is the fastest solution, 2.5 times faster than other's methods
with regular expressions (proof in the end);
Suitable for removing parts from the input string (set newValue to
null), optimized for this;
Same as original .NET C# string.Replace behavior, same exceptions;
Well commented, easy to understand;
Simpler – no regular expressions. Regular expressions are always slower because of their versatility (even compiled);
This method is well tested and there are no hidden flaws like infinite loop in other's solutions, even highly rated:
#AsValeO: Not works with Regex language elements, so it's not
universal method
#Mike Stillion: There is a problem with this code. If the text in new
is a superset of the text in old, this can produce an endless loop.
Benchmark-proof: this solution is 2.59X times faster than regex from #Steve B., code:
// Results:
// 1/2. Regular expression solution: 4486 milliseconds
// 2/2. Current solution: 1727 milliseconds — 2.59X times FASTER! than regex!
// Notes: the test was started 5 times, the result is an average; release build.
const int benchmarkIterations = 1000000;
const string sourceString = "aaaaddsdsdsdsdsd";
const string oldValue = "D";
const string newValue = "Fod";
long totalLenght = 0;
Stopwatch regexStopwatch = Stopwatch.StartNew();
string tempString1;
for (int i = 0; i < benchmarkIterations; i++)
{
tempString1 = sourceString;
tempString1 = ReplaceCaseInsensitive(tempString1, oldValue, newValue);
totalLenght = totalLenght + tempString1.Length;
}
regexStopwatch.Stop();
Stopwatch currentSolutionStopwatch = Stopwatch.StartNew();
string tempString2;
for (int i = 0; i < benchmarkIterations; i++)
{
tempString2 = sourceString;
tempString2 = tempString2.Replace(oldValue, newValue,
StringComparison.OrdinalIgnoreCase);
totalLenght = totalLenght + tempString2.Length;
}
currentSolutionStopwatch.Stop();
Original idea – #Darky711; thanks #MinerR for StringBuilder.

Lots of suggestions using Regex. How about this extension method without it:
public static string Replace(this string str, string old, string #new, StringComparison comparison)
{
#new = #new ?? "";
if (string.IsNullOrEmpty(str) || string.IsNullOrEmpty(old) || old.Equals(#new, comparison))
return str;
int foundAt = 0;
while ((foundAt = str.IndexOf(old, foundAt, comparison)) != -1)
{
str = str.Remove(foundAt, old.Length).Insert(foundAt, #new);
foundAt += #new.Length;
}
return str;
}

Extensions make our lives easier:
static public class StringExtensions
{
static public string ReplaceInsensitive(this string str, string from, string to)
{
str = Regex.Replace(str, from, to, RegexOptions.IgnoreCase);
return str;
}
}

You can use the Microsoft.VisualBasic namespace to find this helper function:
Replace(sourceString, "replacethis", "withthis", , , CompareMethod.Text)

.Net Core has this method built-in:
Replace(String, String, StringComparison) Doc. Now we can simply write:
"...".Replace("oldValue", "newValue", StringComparison.OrdinalIgnoreCase)

Modified #Darky711's answer to use the passed in comparison type and match the framework replace naming and xml comments as closely as possible.
/// <summary>
/// Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.
/// </summary>
/// <param name="str">The string performing the replace method.</param>
/// <param name="oldValue">The string to be replaced.</param>
/// <param name="newValue">The string replace all occurrances of oldValue.</param>
/// <param name="comparisonType">Type of the comparison.</param>
/// <returns></returns>
public static string Replace(this string str, string oldValue, string #newValue, StringComparison comparisonType)
{
#newValue = #newValue ?? string.Empty;
if (string.IsNullOrEmpty(str) || string.IsNullOrEmpty(oldValue) || oldValue.Equals(#newValue, comparisonType))
{
return str;
}
int foundAt;
while ((foundAt = str.IndexOf(oldValue, 0, comparisonType)) != -1)
{
str = str.Remove(foundAt, oldValue.Length).Insert(foundAt, #newValue);
}
return str;
}

(Edited: wasn't aware of the `naked link' problem, sorry about that)
Taken from here:
string myString = "find Me and replace ME";
string strReplace = "me";
myString = Regex.Replace(myString, "me", strReplace, RegexOptions.IgnoreCase);
Seems you are not the first to complain of the lack of case insensitive string.Replace.

I have wrote extension method:
public static string ReplaceIgnoreCase(this string source, string oldVale, string newVale)
{
if (source.IsNullOrEmpty() || oldVale.IsNullOrEmpty())
return source;
var stringBuilder = new StringBuilder();
string result = source;
int index = result.IndexOf(oldVale, StringComparison.InvariantCultureIgnoreCase);
while (index >= 0)
{
if (index > 0)
stringBuilder.Append(result.Substring(0, index));
if (newVale.IsNullOrEmpty().IsNot())
stringBuilder.Append(newVale);
stringBuilder.Append(result.Substring(index + oldVale.Length));
result = stringBuilder.ToString();
index = result.IndexOf(oldVale, StringComparison.InvariantCultureIgnoreCase);
}
return result;
}
I use two additional extension methods for previous extension method:
public static bool IsNullOrEmpty(this string value)
{
return string.IsNullOrEmpty(value);
}
public static bool IsNot(this bool val)
{
return val == false;
}

Doesn't this work: I cant imaging anything else being much quicker or easier.
public static class ExtensionMethodsString
{
public static string Replace(this String thisString, string oldValue, string newValue, StringComparison stringComparison)
{
string working = thisString;
int index = working.IndexOf(oldValue, stringComparison);
while (index != -1)
{
working = working.Remove(index, oldValue.Length);
working = working.Insert(index, newValue);
index = index + newValue.Length;
index = working.IndexOf(oldValue, index, stringComparison);
}
return working;
}
}

Extending Petrucio's answer with Regex.Escape on the search string, and escaping matched group as suggested in Steve B's answer (and some minor changes to my taste):
public static class StringExtensions
{
public static string ReplaceIgnoreCase(this string str, string from, string to)
{
return Regex.Replace(str, Regex.Escape(from), to.Replace("$", "$$"), RegexOptions.IgnoreCase);
}
}
Which will produce the following expected results:
Console.WriteLine("(heLLo) wOrld".ReplaceIgnoreCase("(hello) world", "Hi $1 Universe")); // Hi $1 Universe
Console.WriteLine("heLLo wOrld".ReplaceIgnoreCase("(hello) world", "Hi $1 Universe")); // heLLo wOrld
However without performing the escapes you would get the following, which is not an expected behaviour from a String.Replace that is just case-insensitive:
Console.WriteLine("(heLLo) wOrld".ReplaceIgnoreCase_NoEscaping("(hello) world", "Hi $1 Universe")); // (heLLo) wOrld
Console.WriteLine("heLLo wOrld".ReplaceIgnoreCase_NoEscaping("(hello) world", "Hi $1 Universe")); // Hi heLLo Universe

Using #Georgy Batalov solution I had a problem when using the following example
string original = "blah,DC=bleh,DC=blih,DC=bloh,DC=com";
string replaced = original.ReplaceIgnoreCase(",DC=", ".")
Below is how I rewrote his extension
public static string ReplaceIgnoreCase(this string source, string oldVale,
string newVale)
{
if (source.IsNullOrEmpty() || oldVale.IsNullOrEmpty())
return source;
var stringBuilder = new StringBuilder();
string result = source;
int index = result.IndexOf(oldVale, StringComparison.InvariantCultureIgnoreCase);
bool initialRun = true;
while (index >= 0)
{
string substr = result.Substring(0, index);
substr = substr + newVale;
result = result.Remove(0, index);
result = result.Remove(0, oldVale.Length);
stringBuilder.Append(substr);
index = result.IndexOf(oldVale, StringComparison.InvariantCultureIgnoreCase);
}
if (result.Length > 0)
{
stringBuilder.Append(result);
}
return stringBuilder.ToString();
}

My this Method could Ignore Case as well as Select Only Whole Word
public static string Replace(this string s, string word, string by, StringComparison stringComparison, bool WholeWord)
{
s = s + " ";
int wordSt;
StringBuilder sb = new StringBuilder();
while (s.IndexOf(word, stringComparison) > -1)
{
wordSt = s.IndexOf(word, stringComparison);
if (!WholeWord || ((wordSt == 0 || !Char.IsLetterOrDigit(char.Parse(s.Substring(wordSt - 1, 1)))) && !Char.IsLetterOrDigit(char.Parse(s.Substring(wordSt + word.Length, 1)))))
{
sb.Append(s.Substring(0, wordSt) + by);
}
else
{
sb.Append(s.Substring(0, wordSt + word.Length));
}
s = s.Substring(wordSt + word.Length);
}
sb.Append(s);
return sb.ToString().Substring(0, sb.Length - 1);
}

Another way is to ignore the case sensitivity in String.Replace() using the option StringComparison.CurrentCultureIgnoreCase
string.Replace("World", "csharp", StringComparison.CurrentCultureIgnoreCase)

I recommend the StringComparison.CurrentCultureIgnoreCase method as proposed by ZZY / Gama Sharma. This is just another technique that could be used with LINQ:
List<string> ItemsToRedact = new List<string> { "star", "citizen", "test", "universe"};
string Message = "Just like each sTaR is unique yet mAkes the uniVERSE what it is, the light in you makes you who you are";
List<string> ReplacementList = Message.Split(' ').Where(x => ItemsToRedact.Contains(x.ToLower())).ToList();
foreach (var word in ReplacementList)
{
Message = Message.Replace(word, "[Redacted] ");
}
Console.WriteLine(Message);
returns: Just like each [Redacted] is unique yet mAkes the [Redacted] what it is, the light in you makes you who you are
This code could be further distilled but I broke it up for readability

Below function is to remove all match word like (this) from the string set. By Ravikant Sonare.
private static void myfun()
{
string mystring = "thiTHISThiss This THIS THis tThishiThiss. Box";
var regex = new Regex("this", RegexOptions.IgnoreCase);
mystring = regex.Replace(mystring, "");
string[] str = mystring.Split(' ');
for (int i = 0; i < str.Length; i++)
{
if (regex.IsMatch(str[i].ToString()))
{
mystring = mystring.Replace(str[i].ToString(), string.Empty);
}
}
Console.WriteLine(mystring);
}

You can also try the Regex class.
var regex = new Regex( "camel", RegexOptions.IgnoreCase );
var newSentence = regex.Replace( sentence, "horse" );

Use this, Tested and 100% Worked!
For VB.NET
Dim myString As String
Dim oldValue As String
Dim newValue As String
myString = Form1.TextBox1.Text
oldValue = TextBox1.Text
newValue = TextBox2.Text
Dim working As String = myString
Dim index As Integer = working.IndexOf(oldValue, StringComparison.CurrentCultureIgnoreCase)
While index <> -1
working = working.Remove(index, oldValue.Length)
working = working.Insert(index, newValue)
index = index + newValue.Length
index = working.IndexOf(oldValue, index, StringComparison.CurrentCultureIgnoreCase)
Form1.TextBox1.Text = working
End While
For C#
private void Button2_Click(System.Object sender, System.EventArgs e)
{
string myString;
string oldValue;
string newValue;
myString = Form1.TextBox1.Text;
oldValue = TextBox1.Text;
newValue = TextBox2.Text;
string working = myString;
int index = working.IndexOf(oldValue, StringComparison.CurrentCultureIgnoreCase);
while (index != -1)
{
working = working.Remove(index, oldValue.Length);
working = working.Insert(index, newValue);
index = index + newValue.Length;
index = working.IndexOf(oldValue, index, StringComparison.CurrentCultureIgnoreCase);
Form1.TextBox1.Text = working;
}
}

I prefer this - "Hello World".ToLower().Replace( "world", "csharp" );

Extract only right most n letters from a string

How can I extract a substring which is composed of the rightmost six letters from another string?
Ex: my string is "PER 343573". Now I want to extract only "343573".
How can I do this?

string SubString = MyString.Substring(MyString.Length-6);

Write an extension method to express the Right(n); function. The function should deal with null or empty strings returning an empty string, strings shorter than the max length returning the original string and strings longer than the max length returning the max length of rightmost characters.
public static string Right(this string sValue, int iMaxLength)
{
//Check if the value is valid
if (string.IsNullOrEmpty(sValue))
{
//Set valid empty string as string could be null
sValue = string.Empty;
}
else if (sValue.Length > iMaxLength)
{
//Make the string no longer than the max length
sValue = sValue.Substring(sValue.Length - iMaxLength, iMaxLength);
}
//Return the string
return sValue;
}

Probably nicer to use an extension method:
public static class StringExtensions
{
public static string Right(this string str, int length)
{
return str.Substring(str.Length - length, length);
}
}
Usage
string myStr = "PER 343573";
string subStr = myStr.Right(6);

using System;
public static class DataTypeExtensions
{
#region Methods
public static string Left(this string str, int length)
{
str = (str ?? string.Empty);
return str.Substring(0, Math.Min(length, str.Length));
}
public static string Right(this string str, int length)
{
str = (str ?? string.Empty);
return (str.Length >= length)
? str.Substring(str.Length - length, length)
: str;
}
#endregion
}
Shouldn't error, returns nulls as empty string, returns trimmed or base values. Use it like "testx".Left(4) or str.Right(12);

MSDN
String mystr = "PER 343573";
String number = mystr.Substring(mystr.Length-6);
EDIT: too slow...

if you are not sure of the length of your string, but you are sure of the words count (always 2 words in this case, like 'xxx yyyyyy') you'd better use split.
string Result = "PER 343573".Split(" ")[1];
this always returns the second word of your string.

This isn't exactly what you are asking for, but just looking at the example, it appears that you are looking for the numeric section of the string.
If this is always the case, then a good way to do it would be using a regular expression.
var regex= new Regex("\n+");
string numberString = regex.Match(page).Value;

Since you are using .NET, which all compiles to MSIL, just reference Microsoft.VisualBasic, and use Microsoft's built-in Strings.Right method:
using Microsoft.VisualBasic;
...
string input = "PER 343573";
string output = Strings.Right(input, 6);
No need to create a custom extension method or other work. The result is achieved with one reference and one simple line of code.
As further info on this, using Visual Basic methods with C# has been documented elsewhere. I personally stumbled on it first when trying to parse a file, and found this SO thread on using the Microsoft.VisualBasic.FileIO.TextFieldParser class to be extremely useful for parsing .csv files.

Use this:
String text = "PER 343573";
String numbers = text;
if (text.Length > 6)
{
numbers = text.Substring(text.Length - 6);
}

Guessing at your requirements but the following regular expression will yield only on 6 alphanumerics before the end of the string and no match otherwise.
string result = Regex.Match("PER 343573", #"[a-zA-Z\d]{6}$").Value;

var str = "PER 343573";
var right6 = string.IsNullOrWhiteSpace(str) ? string.Empty
: str.Length < 6 ? str
: str.Substring(str.Length - 6); // "343573"
// alternative
var alt_right6 = new string(str.Reverse().Take(6).Reverse().ToArray()); // "343573"
this supports any number of character in the str. the alternative code not support null string. and, the first is faster and the second is more compact.
i prefer the second one if knowing the str containing short string. if it's long string the first one is more suitable.
e.g.
var str = "";
var right6 = string.IsNullOrWhiteSpace(str) ? string.Empty
: str.Length < 6 ? str
: str.Substring(str.Length - 6); // ""
// alternative
var alt_right6 = new string(str.Reverse().Take(6).Reverse().ToArray()); // ""
or
var str = "123";
var right6 = string.IsNullOrWhiteSpace(str) ? string.Empty
: str.Length < 6 ? str
: str.Substring(str.Length - 6); // "123"
// alternative
var alt_right6 = new string(str.Reverse().Take(6).Reverse().ToArray()); // "123"

Another solution that may not be mentioned
S.Substring(S.Length < 6 ? 0 : S.Length - 6)

Use this:
string mystr = "PER 343573";
int number = Convert.ToInt32(mystr.Replace("PER ",""));

Null Safe Methods :
Strings shorter than the max length returning the original string
String Right Extension Method
public static string Right(this string input, int count) =>
String.Join("", (input + "").ToCharArray().Reverse().Take(count).Reverse());
String Left Extension Method
public static string Left(this string input, int count) =>
String.Join("", (input + "").ToCharArray().Take(count));

Here's the solution I use...
It checks that the input string's length isn't lower than the asked length. The solutions I see posted above don't take this into account unfortunately - which can lead to crashes.
/// <summary>
/// Gets the last x-<paramref name="amount"/> of characters from the given string.
/// If the given string's length is smaller than the requested <see cref="amount"/> the full string is returned.
/// If the given <paramref name="amount"/> is negative, an empty string will be returned.
/// </summary>
/// <param name="string">The string from which to extract the last x-<paramref name="amount"/> of characters.</param>
/// <param name="amount">The amount of characters to return.</param>
/// <returns>The last x-<paramref name="amount"/> of characters from the given string.</returns>
public static string GetLast(this string #string, int amount)
{
if (#string == null) {
return #string;
}
if (amount < 0) {
return String.Empty;
}
if (amount >= #string.Length) {
return #string;
} else {
return #string.Substring(#string.Length - amount);
}
}

This is the method I use: I like to keep things simple.
private string TakeLast(string input, int num)
{
if (num > input.Length)
{
num = input.Length;
}
return input.Substring(input.Length - num);
}

//s - your string
//n - maximum number of right characters to take at the end of string
(new Regex("^.*?(.{1,n})$")).Replace(s,"$1")

Just a thought:
public static string Right(this string #this, int length) {
return #this.Substring(Math.Max(#this.Length - length, 0));
}

Without resorting to the bit converter and bit shifting (need to be sure of encoding)
this is fastest method I use as an extension method 'Right'.
string myString = "123456789123456789";
if (myString > 6)
{
char[] cString = myString.ToCharArray();
Array.Reverse(myString);
Array.Resize(ref myString, 6);
Array.Reverse(myString);
string val = new string(myString);
}

I use the Min to prevent the negative situations and also handle null strings
// <summary>
/// Returns a string containing a specified number of characters from the right side of a string.
/// </summary>
public static string Right(this string value, int length)
{
string result = value;
if (value != null)
result = value.Substring(0, Math.Min(value.Length, length));
return result;
}

using Microsoft.visualBasic;
public class test{
public void main(){
string randomString = "Random Word";
print (Strings.right(randomString,4));
}
}
output is "Word"

//Last word of string :: -> sentence
var str ="A random sentence";
var lword = str.Substring(str.LastIndexOf(' ') + 1);
//Last 6 chars of string :: -> ntence
var n = 6;
var right = str.Length >n ? str.Substring(str.Length - n) : "";

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Check if string contains rage of Unicode characters - c#

Is this what you want? public static bool ContainsInvalidCharacters(string name) { return name.IndexOfAny(new[] { '\u0001', '\u0002', '\u0003', }) != -1; } and bool res = ContainsInvalidCharacters("Hello\u0001"); Note the use of '\uXXXX': the ' denote a char instead of a string.

Related

Removing HTML from messages safely

How to replace text with index value in C#

How can I limit a string to no more than a certain length? [duplicate]

String.Replace ignoring case

Extract only right most n letters from a string

Categories

Resources