Multiline formatting for verbatim strings in C# (prefix with @)

I love using @"strings" in C#, especially when I have a lot of multi-line text. The only annoyance is that my code formatting goes to doodie when doing this, because the second and later lines are pushed fully to the left instead of following the indentation of my beautifully formatted code. I know this is by design, but is there some option or hack that allows these lines to be indented without adding the actual tabs/spaces to the output?
Adding an example:
var MyString = @" this is
a multi-line string
in c#.";
My variable declaration is indented to the "correct" depth, but the second and further lines in the string get pushed to the left margin, so the code is kinda ugly. You could add tabs to the start of lines 2 and 3, but the string itself would then contain those tabs... make sense?

How about a string extension? Update: I reread your question and I hope there is a better answer. This is something that bugs me too and having to solve it as below is frustrating but on the plus side it does work.
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
public static class StringExtensions
{
public static string StripLeadingWhitespace(this string s)
{
Regex r = new Regex(@"^\s+", RegexOptions.Multiline);
return r.Replace(s, string.Empty);
}
}
}
And an example console program:
using System;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string x = @"This is a test
of the emergency
broadcasting system.";
Console.WriteLine(x);
Console.WriteLine();
Console.WriteLine("---");
Console.WriteLine();
Console.WriteLine(x.StripLeadingWhitespace());
Console.ReadKey();
}
}
}
And the output:
This is a test
of the emergency
broadcasting system.
---
This is a test
of the emergency
broadcasting system.
And a cleaner way to use it if you decide to go this route:
string x = @"This is a test
of the emergency
broadcasting system.".StripLeadingWhitespace();
// consider renaming extension to say TrimIndent() or similar if used this way

Cymen has given the right solution. I use a similar approach as derived from Scala's stripMargin() method. Here's what my extension method looks like:
public static string StripMargin(this string s)
{
return Regex.Replace(s, @"[ \t]+\|", string.Empty);
}
Usage:
var mystring = @"
|SELECT
| *
|FROM
| SomeTable
|WHERE
| SomeColumn IS NOT NULL"
.StripMargin();
Result:
SELECT
*
FROM
SomeTable
WHERE
SomeColumn IS NOT NULL

I can't think of an answer that would completely satisfy your question, however you could write a function that strips leading spaces from lines of text contained in a string and call it on each creation of such a string.
var myString = TrimLeadingSpacesOfLines(@" this is
a multi-line string
in c#.");
Yes it is a hack, but you specified your acceptance of a hack in your question.
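The answer above names TrimLeadingSpacesOfLines without showing it. A minimal sketch of what such a helper could look like follows; the class name IndentHelper and the choice to normalize all line endings to "\n" are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Linq;

public static class IndentHelper
{
    // Hypothetical implementation of the helper named in the answer above.
    // Removes leading spaces and tabs from every line of the input.
    // Assumption: line endings in the result are normalized to "\n".
    public static string TrimLeadingSpacesOfLines(string text)
    {
        // Split on any common newline style, keeping empty lines.
        var lines = text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

        // Trim only leading spaces/tabs so trailing whitespace is preserved.
        return string.Join("\n", lines.Select(line => line.TrimStart(' ', '\t')));
    }
}
```

Note that this removes all leading whitespace, so any indentation that is meant to be part of the text is lost as well; the dedent approach further down keeps relative indentation.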

Here is a longish solution which tries to mimic textwrap.dedent as much as possible. The first line is left as-is and expected not to be indented. (You can generate the unit tests based on the doctests using doctest-csharp.)
/// <summary>
/// Imitates Python's
/// <a href="https://docs.python.org/3/library/textwrap.html#textwrap.dedent">
/// <c>textwrap.dedent</c></a>.
/// </summary>
/// <param name="text">Text to be dedented</param>
/// <returns>array of dedented lines</returns>
/// <code doctest="true">
/// Assert.That(Dedent(""), Is.EquivalentTo(new[] {""}));
/// Assert.That(Dedent("test me"), Is.EquivalentTo(new[] {"test me"}));
/// Assert.That(Dedent("test\nme"), Is.EquivalentTo(new[] {"test", "me"}));
/// Assert.That(Dedent("test\n me"), Is.EquivalentTo(new[] {"test", " me"}));
/// Assert.That(Dedent("test\n  me\n   again"), Is.EquivalentTo(new[] {"test", "me", " again"}));
/// Assert.That(Dedent(" test\n  me\n   again"), Is.EquivalentTo(new[] {" test", "me", " again"}));
/// </code>
private static string[] Dedent(string text)
{
var lines = text.Split(
new[] {"\r\n", "\r", "\n"},
StringSplitOptions.None);
// Search for the first non-empty line starting from the second line.
// The first line is not expected to be indented.
var firstNonemptyLine = -1;
for (var i = 1; i < lines.Length; i++)
{
if (lines[i].Length == 0) continue;
firstNonemptyLine = i;
break;
}
if (firstNonemptyLine < 0) return lines;
// Search for the second non-empty line.
// If there is no second non-empty line, we can return immediately as we
// can not pin the indent.
var secondNonemptyLine = -1;
for (var i = firstNonemptyLine + 1; i < lines.Length; i++)
{
if (lines[i].Length == 0) continue;
secondNonemptyLine = i;
break;
}
if (secondNonemptyLine < 0) return lines;
// Match the common prefix with at least two non-empty lines
var firstNonemptyLineLength = lines[firstNonemptyLine].Length;
var prefixLength = 0;
for (int column = 0; column < firstNonemptyLineLength; column++)
{
char c = lines[firstNonemptyLine][column];
if (c != ' ' && c != '\t') break;
bool matched = true;
for (int lineIdx = firstNonemptyLine + 1; lineIdx < lines.Length;
lineIdx++)
{
if (lines[lineIdx].Length == 0) continue;
if (lines[lineIdx].Length < column + 1)
{
matched = false;
break;
}
if (lines[lineIdx][column] != c)
{
matched = false;
break;
}
}
if (!matched) break;
prefixLength++;
}
if (prefixLength == 0) return lines;
for (var i = 1; i < lines.Length; i++)
{
if (lines[i].Length > 0) lines[i] = lines[i].Substring(prefixLength);
}
return lines;
}

Related

How to Trim exactly 1 whitespace after splitting the string

I have made a program that evaluates a string by splitting it at a pipe character. The strings are randomly generated, and sometimes whitespace is part of what needs to be evaluated.
HftiVfzRIDBeotsnU uabjvLPC | LstHCfuobtv eVzDUBPn jIRfai
This string is the same length on either side (two spaces on the left side of the pipe), but my problem comes when I have to trim the space on both sides of the pipe (I do this after splitting).
Is there some way of making sure that I trim only a single space instead of all of them?
My code so far:
foreach (string s in str)
{
int bugCount = 0;
string[] info = s.Split('|');
string testCase = info[0].TrimEnd();
char[] testArr = testCase.ToCharArray();
string debugInfo = info[1].TrimStart();
char[] debugArr = debugInfo.ToCharArray();
int arrBound = debugArr.Count();
for (int i = 0; i < arrBound; i++)
if (testArr[i] != debugArr[i])
bugCount++;
if (bugCount <= 2 && bugCount != 0)
Console.WriteLine("Low");
if (bugCount <= 4 && bugCount != 0)
Console.WriteLine("Medium");
if (bugCount <= 6 && bugCount != 0)
Console.WriteLine("High");
if (bugCount > 6)
Console.WriteLine("Critical");
else
Console.WriteLine("Done");
}
Console.ReadLine();
You have 2 options.
If there is always 1 space before and after the pipe, split on {space}|{space}.
myInput.Split(new[]{" | "},StringSplitOptions.None);
Otherwise, instead of using TrimStart() and TrimEnd(), use Substring.
var split = myInput.Split('|');
var s1 = split[0].EndsWith(" ")
? split[0].Substring(0, split[0].Length - 1)
: split[0];
var s2 = split[1].StartsWith(" ")
? split[1].Substring(1) // to end of line
: split[1];
Note, there is some complexity here - if the pipe has no space around it, but the last/first character is a legitimate (data) space character the above will cut it off. You need more logic, but hopefully this will get you started!
There is no way to tell the Trim... family of methods to stop after cutting out some number of characters.
In the general case, you'd need to do it manually by inspecting the parts obtained after Split, checking their first/last characters, and taking a substring to get the correct part.
However, in your case there's a much simpler way: Split can also take a string as a separator, or even a set of strings (these overloads require a StringSplitOptions argument):
string[] info = s.Split(new[] { " | " }, StringSplitOptions.None);
// or even
string[] info = s.Split(new[] { " | ", " |", "| ", "|" }, StringSplitOptions.None);
That should take care of the single spaces around the pipe | character by simply treating them as a part of the separator.
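As a small sketch of the multi-separator idea above (the class and method names are made up for illustration), the helper below lists the longest separator first. String.Split scans the input from left to right and, at each position, uses the first separator in the array that matches, so " | " is preferred over "|" when both would fit.

```csharp
using System;

public static class PipeSplitter
{
    // Hypothetical helper: splits on a pipe and treats at most one space
    // on each side of the pipe as part of the separator.
    public static string[] SplitOnPipe(string line)
    {
        // Order matters: longer separators come first so they win over "|".
        return line.Split(new[] { " | ", " |", "| ", "|" }, StringSplitOptions.None);
    }
}
```

With the input "ab  | cd" (two spaces before the pipe), this yields "ab " and "cd", so one data space on the left survives. The caveat from the earlier answer still applies: if the pipe has no padding and the data itself ends with a space, that space is consumed as part of the separator.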
This is a string extension that trims exactly count spaces from the start or end of a string (and leaves the string unchanged if it has fewer than count), just in case.
public static class StringExtension
{
/// <summary>
/// Trim space at the end of string for count times
/// </summary>
/// <param name="input"></param>
/// <param name="count">number of space at the end to trim</param>
/// <returns></returns>
public static string TrimEnd(this string input, int count = 1)
{
string result = input;
if (count <= 0)
{
return result;
}
if (result.EndsWith(new string(' ', count)))
{
result = result.Substring(0, result.Length - count);
}
return result;
}
/// <summary>
/// Trim space at the start of string for count times
/// </summary>
/// <param name="input"></param>
/// <param name="count">number of space at the start to trim</param>
/// <returns></returns>
public static string TrimStart(this string input, int count = 1)
{
string result = input;
if (count <= 0)
{
return result;
}
if (result.StartsWith(new string(' ', count)))
{
result = result.Substring(count);
}
return result;
}
}
In Main:
static void Main(string[] args)
{
string a = "1234  "; // two trailing spaces
string a1 = a.TrimEnd(1); // returns "1234 " (one space left)
string a2 = a.TrimEnd(2); // returns "1234"
string a3 = a.TrimEnd(3); // returns "1234  " (unchanged: fewer than 3 trailing spaces)
string b = "  5678"; // two leading spaces
string b1 = b.TrimStart(1); // returns " 5678" (one space left)
string b2 = b.TrimStart(2); // returns "5678"
string b3 = b.TrimStart(3); // returns "  5678" (unchanged: fewer than 3 leading spaces)
}

OpenXML replace text in all document

I have the piece of code below. I'd like to replace the text "Text1" with "NewText", and that works. But when I place the text "Text1" in a table, the replacement no longer works for the "Text1" inside the table.
I'd like to make this replacement throughout the whole document.
using (WordprocessingDocument doc = WordprocessingDocument.Open(String.Format("c:\\temp\\filename.docx"), true))
{
var body = doc.MainDocumentPart.Document.Body;
foreach (var para in body.Elements<Paragraph>())
{
foreach (var run in para.Elements<Run>())
{
foreach (var text in run.Elements<Text>())
{
if (text.Text.Contains("##Text1##"))
text.Text = text.Text.Replace("##Text1##", "NewText");
}
}
}
}
Your code does not work because the table element (w:tbl) is not contained in
a paragraph element (w:p). See the following MSDN article for more information.
The Text class (serialized as w:t) usually represents literal text within a Run element in a
word document. So you could simply search for all w:t elements (Text class) and replace your
tag if the text element (w:t) contains your tag:
using (WordprocessingDocument doc = WordprocessingDocument.Open("yourdoc.docx", true))
{
var body = doc.MainDocumentPart.Document.Body;
foreach (var text in body.Descendants<Text>())
{
if (text.Text.Contains("##Text1##"))
{
text.Text = text.Text.Replace("##Text1##", "NewText");
}
}
}
Borrowing from other answers in various places, this solution addresses four main obstacles that must be overcome:
Delete any high level Unicode chars from your replace string that cannot be read from Word (from bad user input)
Ability to search for your find result across multiple runs or text elements within a paragraph (Word will often break up a single sentence into several text runs)
Ability to include a line break in your replace text so as to insert multi-line text into the document.
Ability to pass in any node as the starting point for your search so as to restrict the search to that part of the document (such as the body, the header, the footer, a specific table, table row, or tablecell).
I am sure advanced scenarios such as bookmarks, complex nesting will need more modification on this, but it is working for the types of basic word documents I have run into so far, and is much more helpful to me than disregarding runs altogether or using a RegEx on the entire file with no ability to target a specific TableCell or Document part (for advanced scenarios).
Example Usage:
var body = document.MainDocumentPart.Document.Body;
ReplaceText(body, replace, with);
The code:
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
namespace My.Web.Api.OpenXml
{
public static class WordTools
{
/// <summary>
/// Find/replace within the specified paragraph.
/// </summary>
/// <param name="paragraph"></param>
/// <param name="find"></param>
/// <param name="replaceWith"></param>
public static void ReplaceText(Paragraph paragraph, string find, string replaceWith)
{
var texts = paragraph.Descendants<Text>();
for (int t = 0; t < texts.Count(); t++)
{ // figure out which Text element within the paragraph contains the starting point of the search string
Text txt = texts.ElementAt(t);
for (int c = 0; c < txt.Text.Length; c++)
{
var match = IsMatch(texts, t, c, find);
if (match != null)
{ // now replace the text
string[] lines = replaceWith.Replace(Environment.NewLine, "\r").Split('\n', '\r'); // handle any lone n/r returns, plus newline.
int skip = lines[lines.Length - 1].Length - 1; // will jump to end of the replacement text, it has been processed.
if (c > 0)
lines[0] = txt.Text.Substring(0, c) + lines[0]; // has a prefix
if (match.EndCharIndex + 1 < texts.ElementAt(match.EndElementIndex).Text.Length)
lines[lines.Length - 1] = lines[lines.Length - 1] + texts.ElementAt(match.EndElementIndex).Text.Substring(match.EndCharIndex + 1);
txt.Space = new EnumValue<SpaceProcessingModeValues>(SpaceProcessingModeValues.Preserve); // in case your value starts/ends with whitespace
txt.Text = lines[0];
// remove any extra texts.
for (int i = t + 1; i <= match.EndElementIndex; i++)
{
texts.ElementAt(i).Text = string.Empty; // clear the text
}
// if 'with' contained line breaks we need to add breaks back...
if (lines.Count() > 1)
{
OpenXmlElement currEl = txt;
Break br;
// append more lines
var run = txt.Parent as Run;
for (int i = 1; i < lines.Count(); i++)
{
br = new Break();
run.InsertAfter<Break>(br, currEl);
currEl = br;
txt = new Text(lines[i]);
run.InsertAfter<Text>(txt, currEl);
t++; // skip to this next text element
currEl = txt;
}
c = skip; // new line
}
else
{ // continue to process same line
c += skip;
}
}
}
}
}
/// <summary>
/// Determine if the texts (starting at element t, char c) exactly contain the find text
/// </summary>
/// <param name="texts"></param>
/// <param name="t"></param>
/// <param name="c"></param>
/// <param name="find"></param>
/// <returns>null or the result info</returns>
static Match IsMatch(IEnumerable<Text> texts, int t, int c, string find)
{
int ix = 0;
for (int i = t; i < texts.Count(); i++)
{
for (int j = c; j < texts.ElementAt(i).Text.Length; j++)
{
if (find[ix] != texts.ElementAt(i).Text[j])
{
return null; // element mismatch
}
ix++; // match; go to next character
if (ix == find.Length)
return new Match() { EndElementIndex = i, EndCharIndex = j }; // full match with no issues
}
c = 0; // reset char index for next text element
}
return null; // ran out of text, not a string match
}
/// <summary>
/// Defines a match result
/// </summary>
class Match
{
/// <summary>
/// Last matching element index containing part of the search text
/// </summary>
public int EndElementIndex { get; set; }
/// <summary>
/// Last matching char index of the search text in last matching element
/// </summary>
public int EndCharIndex { get; set; }
}
} // class
} // namespace
public static class OpenXmlTools
{
// filters control characters but allows only properly-formed surrogate sequences
private static Regex _invalidXMLChars = new Regex(
@"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\uFEFF\uFFFE\uFFFF]",
RegexOptions.Compiled);
/// <summary>
/// removes any unusual unicode characters that can't be encoded into XML which give exception on save
/// </summary>
public static string RemoveInvalidXMLChars(string text)
{
if (string.IsNullOrEmpty(text)) return "";
return _invalidXMLChars.Replace(text, "");
}
}
Maybe this solution is easier
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
string docText = null;
//1. Copy all the file into a string
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
docText = sr.ReadToEnd();
//2. Use regular expression to replace all text
Regex regexText = new Regex(find);
docText = regexText.Replace(docText, replace);
//3. Write the changed string into the file again
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
sw.Write(docText);
}

out of bounds error c#

I'm trying to read the contents of a CSV file into different variables in order to send them to a web service. It has been working fine, but suddenly today I got an exception:
index was outside the bounds of the array
What did I do wrong?
String sourceDir = @"\\198.0.0.4\e$\Globus\LIVE\bnk.run\URA.BP\WEBOUT\";
// Process the list of files found in the directory.
string[] fileEntries = Directory.GetFiles(sourceDir);
foreach (string fileName2 in fileEntries)
{
// read values
StreamReader st = new StreamReader(fileName2);
while (st.Peek() >= 0)
{
String report1 = st.ReadLine();
String[] columns = report1.Split(','); //split columns
String prnout = columns[0];
String tinout = columns[1];
String amtout = columns[2];
String valdate = columns[3];
String paydate = columns[4];
String status = columns[5];
String branch = columns[6];
String reference = columns[7];
}
}
It's hard to guess without even seeing the .csv file, but my first guess would be that you don't have 8 columns.
It would be easier if you could show the original .csv file, and tell us where the exception pops.
edit: If you think the data is alright, I'd suggest debugging to see what the Split call returns in Visual Studio. That might help.
edit2: And since you're doing that processing in a loop, make sure each row has at least 8 columns.
My money is on bad data file. If that is the only thing in the equation that has changed (aka you haven't made any code changes) then that's pretty much your only option.
If your data file isn't too long post it here and we can tell you for sure.
You can add something like below to check for invalid column lengths:
while (st.Peek() >= 0)
{
String report1 = st.ReadLine();
String[] columns = report1.Split(','); //split columns
if(columns.Length < 8)
{
//Log something useful, throw an exception, whatever.
//You have the option to quietly note that there was a problem and
//continue on processing the rest of the file if you want.
continue;
}
//working with columns below
}
Just for sanity's sake, I combined all the various notes written here. This code is a bit cleaner and has some validation in it.
Try this:
string dir = @"\\198.0.0.4\e$\Globus\LIVE\bnk.run\URA.BP\WEBOUT\";
foreach (string fileName2 in Directory.GetFiles(dir)) {
StreamReader sr = new StreamReader(fileName2);
while (!sr.EndOfStream) {
string line = sr.ReadLine();
if (!String.IsNullOrEmpty(line)) {
string[] columns = line.Split(',');
if (columns.Length == 8) {
string prnout = columns[0];
string tinout = columns[1];
string amtout = columns[2];
string valdate = columns[3];
string paydate = columns[4];
string status = columns[5];
string branch = columns[6];
string reference = columns[7];
}
}
}
}
EDIT: As some other users have commented, the CSV format also accepts text qualifiers, which usually means the double quote symbol ("). For example, a text qualified line may look like this:
user,"Hello!",123.23,"$123,123.12",and so on,
Writing CSV parsing code is a little more complicated when you have a fully formatted file like this. Over the years I've been parsing improperly formatted CSV files, I've worked up a standard code script that passes virtually all unit tests, but it's a pain to explain.
/// <summary>
/// Read in a line of text, and use the Add() function to add these items to the current CSV structure
/// </summary>
/// <param name="s"></param>
public static bool TryParseLine(string s, char delimiter, char text_qualifier, out string[] array)
{
bool success = true;
List<string> list = new List<string>();
StringBuilder work = new StringBuilder();
for (int i = 0; i < s.Length; i++) {
char c = s[i];
// If we are starting a new field, is this field text qualified?
if ((c == text_qualifier) && (work.Length == 0)) {
int p2;
while (true) {
p2 = s.IndexOf(text_qualifier, i + 1);
// for some reason, this text qualifier is broken
if (p2 < 0) {
work.Append(s.Substring(i + 1));
i = s.Length;
success = false;
break;
}
// Append this qualified string
work.Append(s.Substring(i + 1, p2 - i - 1));
i = p2;
// If this is a double quote, keep going!
if (((p2 + 1) < s.Length) && (s[p2 + 1] == text_qualifier)) {
work.Append(text_qualifier);
i++;
// otherwise, this is a single qualifier, we're done
} else {
break;
}
}
// Does this start a new field?
} else if (c == delimiter) {
list.Add(work.ToString());
work.Length = 0;
// Test for special case: when the user has written a casual comma, space, and text qualifier, skip the space
// Checks if the second parameter of the if statement will pass through successfully
// e.g. "bob", "mary", "bill"
if (i + 2 <= s.Length - 1) {
if (s[i + 1].Equals(' ') && s[i + 2].Equals(text_qualifier)) {
i++;
}
}
} else {
work.Append(c);
}
}
list.Add(work.ToString());
// If we have nothing in the list, and it's possible that this might be a tab delimited list, try that before giving up
if (list.Count == 1 && delimiter != DEFAULT_TAB_DELIMITER) {
string[] tab_delimited_array = ParseLine(s, DEFAULT_TAB_DELIMITER, DEFAULT_QUALIFIER);
if (tab_delimited_array.Length > list.Count) {
array = tab_delimited_array;
return success;
}
}
// Return the array we parsed
array = list.ToArray();
return success;
}
You should note that, even as complicated as this algorithm is, it still is unable to parse CSV files where there are embedded newlines within a text qualified value, for example, this:
123,"Hi, I am a CSV File!
I am saying hello to you!
But I also have embedded newlines in my text.",2012-07-23
To solve those, I have a multiline parser that uses the Try() feature to add additional lines of text to verify that the main function worked correctly:
/// <summary>
/// Parse a line whose values may include newline symbols or CR/LF
/// </summary>
/// <param name="sr"></param>
/// <returns></returns>
public static string[] ParseMultiLine(StreamReader sr, char delimiter, char text_qualifier)
{
StringBuilder sb = new StringBuilder();
string[] array = null;
while (!sr.EndOfStream) {
// Read in a line
sb.Append(sr.ReadLine());
// Does it parse?
string s = sb.ToString();
if (TryParseLine(s, delimiter, text_qualifier, out array)) {
return array;
}
}
// Fails to parse - return the best array we were able to get
return array;
}
Since you don't know how many columns will be in csv file, you might need to test for length:
if (columns.Length == 8) {
String prnout = columns[0];
String tinout = columns[1];
...
}
I bet you just got an empty line (extra EOL at the end), and that's as simple as that
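The empty-line theory is easy to check: splitting an empty string still returns an array with one (empty) element, so columns[1] is already out of bounds. A minimal guard might look like the sketch below; the names CsvLineCheck and HasExpectedColumns are hypothetical, and the count of 8 comes from the question's use of columns[0] through columns[7].

```csharp
using System;

public static class CsvLineCheck
{
    // The question reads columns[0]..columns[7], so at least 8 fields are needed.
    public const int ExpectedColumns = 8;

    // Hypothetical guard: returns true only if the line can be indexed safely.
    public static bool HasExpectedColumns(string line)
    {
        // A blank line splits into a single empty field, never 8.
        if (string.IsNullOrWhiteSpace(line)) return false;

        return line.Split(',').Length >= ExpectedColumns;
    }
}
```

Calling this before indexing into the array lets the loop skip a trailing blank line instead of throwing. It does not handle quoted fields that contain commas, which the parser earlier in this thread addresses.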

How to find out next word in a sentence in c#?

I have a string
"bat and ball not pen or boat not phone"
I want to pick words adjacent to not
for example -- "not pen", "not phone"
but I have been unable to do it. I tried to pick out the word using the index and Substring, but could not get it to work.
tempTerm = tempTerm.Trim().Substring(0, tempTerm.Length - (orterm.Length + 1)).ToString();
How about using a Regex?
Something like:
string s = "bat and ball not pen or boat not phone";
Regex reg = new Regex("not\\s\\w+");
MatchCollection matches = reg.Matches(s);
foreach (Match match in matches)
{
string sub = match.Value;
}
See Learn Regular Expression (Regex) syntax with C# and .NET for some more details
You can split the sentence, and then just loop through looking for "not":
string sentence = "bat and ball not pen or boat not phone";
string[] words = sentence.Split(new char[] {' '});
List<string> wordsBesideNot = new List<string>();
for (int i = 0; i < words.Length - 1; i++)
{
if (words[i].Equals("not"))
wordsBesideNot.Add(words[i + 1]);
}
// At this point, wordsBesideNot is { "pen", "phone" }
String[] parts = myStr.Split(' ');
for (int i = 0; i < parts.Length; i++)
if (parts[i] == "not" && i + 1 < parts.Length)
someList.Add(parts[i + 1]);
This should get you all the words adjacent to not, you could compare with case insensitive if need be.
You can use this regex: not\s\w+\b. It will match desired phrases:
not pen
not phone
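If only the following word is wanted rather than the whole "not pen" phrase, a capture group can isolate it. The sketch below is one way to do that; the class and method names are invented for illustration, and the leading \b is an addition so that words merely ending in the keyword (such as "knot") are not matched.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NextWordFinder
{
    // Hypothetical helper: returns the word that follows each occurrence of keyword.
    public static List<string> WordsAfter(string text, string keyword)
    {
        // \b anchors the keyword at a word boundary; group 1 captures the next word.
        string pattern = @"\b" + Regex.Escape(keyword) + @"\s+(\w+)";

        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

For the sentence in the question, WordsAfter(sentence, "not") gives "pen" and "phone". The match is case-sensitive; passing RegexOptions.IgnoreCase to Regex.Matches would relax that.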
I'd say start by splitting your string into an array - it will make this kind of thing a whole lot easier.
In C# I would do something like this:
// Original string
string s = "bat and ball not pen or boat not phone";
// Separator
string separator = "not ";
// Length of the separator
int length = separator.Length;
// sCopy so you don't touch the original string
string sCopy = s.ToString();
// List to store the words; you could use an array if
// you count the 'not's.
List<string> stringList = new List<string>();
// While the separator ("not ") exists in the string
while (sCopy.IndexOf(separator) != -1)
{
// Index of the next separator
int index = sCopy.IndexOf(separator);
// Remove anything before the separator and the
// separator itself.
sCopy = sCopy.Substring(index + length);
// In case of multiple spaces, remove them.
sCopy = sCopy.TrimStart(' ');
// If there are more spaces or more words to come,
// then specify the length
if (sCopy.IndexOf(' ') != -1)
{
// Cut the word out of sCopy
string sub = sCopy.Substring(0, sCopy.IndexOf(' '));
// Add the word to the list
stringList.Add(sub);
}
// Otherwise just get the rest of the string
else
{
// Cut the word out of sCopy
string sub = sCopy.Substring(0);
// Add the word to the list
stringList.Add(sub);
}
}
int p = 0;
The words in the list are pen and phone. This will fail when you get odd characters, full stops etc. If you don't know how the string is going to be constructed you might need something more complex.
public class StringHelper
{
/// <summary>
/// Gets the surrounding words of a given word in a given text.
/// </summary>
/// <param name="text">A text in which the given word to be searched.</param>
/// <param name="word">A word to be searched in the given text.</param>
/// <param name="prev">The number of previous words to include in the result.</param>
/// <param name="next">The number of next words to include in the result.</param>
/// <param name="all">Sets whether the method returns all instances of the search word.</param>
/// <returns>A list of phrases, each consisting of the search word and its surrounding words.</returns>
public static List<string> GetSurroundingWords(string text, string word, int prev, int next, bool all = false)
{
var phrases = new List<string>();
var words = text.Split();
var indices = new List<int>();
var index = -1;
while ((index = Array.IndexOf(words, word, index + 1)) != -1)
{
indices.Add(index);
if (!all && indices.Count == 1)
break;
}
foreach (var ind in indices)
{
// Clamp to the words actually available on each side. The parameters
// are left unmodified so that later matches are not affected.
var prevActual = Math.Min(prev, ind);
var nextActual = Math.Min(next, words.Length - ind - 1);
var picked = new List<string>();
for (var i = 1; i <= prevActual; i++)
picked.Add(words[ind - i]);
picked.Reverse();
picked.Add(word);
for (var i = 1; i <= nextActual; i++)
picked.Add(words[ind + i]);
phrases.Add(string.Join(" ", picked));
}
return phrases;
}
}
[TestClass]
public class StringHelperTests
{
private const string Text = "Date and Time in C# are handled by DateTime class in C# that provides properties and methods to format dates in different datetime formats.";
[TestMethod]
public void GetSurroundingWords()
{
// Arrange
var word = "class";
var expected = new [] { "DateTime class in C#" };
// Act
var actual = StringHelper.GetSurroundingWords(Text, word, 1, 2);
// Assert
Assert.AreEqual(expected.Length, actual.Count);
Assert.AreEqual(expected[0], actual[0]);
}
[TestMethod]
public void GetSurroundingWords_NoMatch()
{
// Arrange
var word = "classify";
var expected = new List<string>();
// Act
var actual = StringHelper.GetSurroundingWords(Text, word, 1, 2);
// Assert
Assert.AreEqual(expected.Count, actual.Count);
}
[TestMethod]
public void GetSurroundingWords_MoreSurroundingWordsThanAvailable()
{
// Arrange
var word = "class";
var expected = "Date and Time in C# are handled by DateTime class in C#";
// Act
var actual = StringHelper.GetSurroundingWords(Text, word, 50, 2);
// Assert
Assert.AreEqual(expected.Length, actual[0].Length);
Assert.AreEqual(expected, actual[0]);
}
[TestMethod]
public void GetSurroundingWords_ZeroSurroundingWords()
{
// Arrange
var word = "class";
var expected = "class";
// Act
var actual = StringHelper.GetSurroundingWords(Text, word, 0, 0);
// Assert
Assert.AreEqual(expected.Length, actual[0].Length);
Assert.AreEqual(expected, actual[0]);
}
[TestMethod]
public void GetSurroundingWords_AllInstancesOfSearchWord()
{
// Arrange
var word = "and";
var expected = new[] { "Date and Time", "properties and methods" };
// Act
var actual = StringHelper.GetSurroundingWords(Text, word, 1, 1, true);
// Assert
Assert.AreEqual(expected.Length, actual.Count);
Assert.AreEqual(expected[0], actual[0]);
Assert.AreEqual(expected[1], actual[1]);
}
}

Splitting CamelCase

This is all asp.net c#.
I have an enum
public enum ControlSelectionType
{
NotApplicable = 1,
SingleSelectRadioButtons = 2,
SingleSelectDropDownList = 3,
MultiSelectCheckBox = 4,
MultiSelectListBox = 5
}
The numerical value of this is stored in my database. I display this value in a datagrid.
<asp:boundcolumn datafield="ControlSelectionTypeId" headertext="Control Type"></asp:boundcolumn>
The ID means nothing to a user so I have changed the boundcolumn to a template column with the following.
<asp:TemplateColumn>
<ItemTemplate>
<%# Enum.Parse(typeof(ControlSelectionType), DataBinder.Eval(Container.DataItem, "ControlSelectionTypeId").ToString()).ToString()%>
</ItemTemplate>
</asp:TemplateColumn>
This is a lot better... However, it would be great if there was a simple function I can put around the Enum to split it by Camel case so that the words wrap nicely in the datagrid.
Note: I am fully aware that there are better ways of doing all this. This screen is purely used internally and I just want a quick hack in place to display it a little better.
I used:
public static string SplitCamelCase(string input)
{
return System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", " $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim();
}
Taken from http://weblogs.asp.net/jgalloway/archive/2005/09/27/426087.aspx
vb.net:
Public Shared Function SplitCamelCase(ByVal input As String) As String
Return System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", " $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim()
End Function
Here is a dotnet Fiddle for online execution of the c# code.
Indeed a regex/replace is the way to go as described in the other answer, however this might also be of use to you if you wanted to go a different direction
using System.ComponentModel;
using System.Reflection;
...
public static string GetDescription(System.Enum value)
{
FieldInfo fi = value.GetType().GetField(value.ToString());
DescriptionAttribute[] attributes = (DescriptionAttribute[])fi.GetCustomAttributes(typeof(DescriptionAttribute), false);
if (attributes.Length > 0)
return attributes[0].Description;
else
return value.ToString();
}
this will allow you define your Enums as
public enum ControlSelectionType
{
[Description("Not Applicable")]
NotApplicable = 1,
[Description("Single Select Radio Buttons")]
SingleSelectRadioButtons = 2,
[Description("Completely Different Display Text")]
SingleSelectDropDownList = 3,
}
Taken from
http://www.codeguru.com/forum/archive/index.php/t-412868.html
This regex (^[a-z]+|[A-Z]+(?![a-z])|[A-Z][a-z]+) can be used to extract all words from the camelCase or PascalCase name. It also works with abbreviations anywhere inside the name.
MyHTTPServer will contain exactly 3 matches: My, HTTP, Server
myNewXMLFile will contain 4 matches: my, New, XML, File
You could then join them into a single string using string.Join.
string name = "myNewUIControl";
string[] words = Regex.Matches(name, "(^[a-z]+|[A-Z]+(?![a-z])|[A-Z][a-z]+)")
.OfType<Match>()
.Select(m => m.Value)
.ToArray();
string result = string.Join(" ", words);
As @DanielB noted in the comments, that regex won't work for numbers (or with underscores), so here is an improved version that supports any identifier with words, acronyms, numbers, and underscores (a slightly modified version of @JoeJohnston's), see online demo (fiddle):
([A-Z]+(?![a-z])|[A-Z][a-z]+|[0-9]+|[a-z]+)
Extreme example: __snake_case12_camelCase_TLA1ABC → snake, case, 12, camel, Case, TLA, 1, ABC
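For completeness, here is a minimal sketch of how the improved pattern could be combined with string.Join, the same way as the earlier snippet. The class and method names (`IdentifierWords`, `ToWords`) are just placeholders I picked; only the pattern itself comes from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdentifierWords
{
    // Pattern quoted above: acronyms, capitalized words, digit runs, lowercase words.
    private static readonly Regex WordPattern =
        new Regex("([A-Z]+(?![a-z])|[A-Z][a-z]+|[0-9]+|[a-z]+)");

    // Hypothetical helper: extracts the words and joins them with single spaces.
    public static string ToWords(string identifier)
    {
        var words = WordPattern.Matches(identifier)
                               .Cast<Match>()
                               .Select(m => m.Value);
        return string.Join(" ", words);
    }
}
```

Underscores never match any alternative, so they simply drop out of the result: `IdentifierWords.ToWords("__snake_case12_camelCase_TLA1ABC")` yields "snake case 12 camel Case TLA 1 ABC".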
Tillito's answer does not handle acronyms well, nor strings that already contain spaces. This fixes it:
public static string SplitCamelCase(string input)
{
return Regex.Replace(input, "(?<=[a-z])([A-Z])", " $1", RegexOptions.Compiled);
}
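To see what the lookbehind changes, here is a small side-by-side sketch. The class and method names are invented for the comparison; the two patterns are the ones from the answers above. Note that the lookbehind version keeps an acronym together but also leaves it glued to the word that follows it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparison
{
    // Pattern from the earlier answer: a space before every capital, then Trim.
    public static string SplitEveryCapital(string input)
    {
        return Regex.Replace(input, "([A-Z])", " $1").Trim();
    }

    // Pattern from this answer: a space only where a capital follows a lowercase letter.
    public static string SplitAfterLowercase(string input)
    {
        return Regex.Replace(input, "(?<=[a-z])([A-Z])", " $1");
    }
}
```

For "Not Applicable" the first version produces a doubled space ("Not  Applicable") while the second leaves the string unchanged. For "MyHTTPServer" the first gives "My H T T P Server" and the second gives "My HTTPServer".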
If C# 3.0 is an option you can use the following one-liner to do the job:
Regex.Matches(YOUR_ENUM_VALUE_NAME, "[A-Z][a-z]+").OfType<Match>().Select(match => match.Value).Aggregate((acc, b) => acc + " " + b).TrimStart(' ');
Here's an extension method that handles numbers and multiple uppercase characters sanely, and also allows for upper-casing specific acronyms in the final string:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;
using System.Text.RegularExpressions;
using System.Web.Configuration;
namespace System
{
/// <summary>
/// Extension methods for the string data type
/// </summary>
public static class ConventionBasedFormattingExtensions
{
/// <summary>
/// Turn CamelCaseText into Camel Case Text.
/// </summary>
/// <param name="input"></param>
/// <returns></returns>
/// <remarks>Use AppSettings["SplitCamelCase_AllCapsWords"] to specify a comma-delimited list of words that should be ALL CAPS after split</remarks>
/// <example>
/// wordWordIDWord1WordWORDWord32Word2
/// Word Word ID Word 1 Word WORD Word 32 Word 2
///
/// wordWordIDWord1WordWORDWord32WordID2ID
/// Word Word ID Word 1 Word WORD Word 32 Word ID 2 ID
///
/// WordWordIDWord1WordWORDWord32Word2Aa
/// Word Word ID Word 1 Word WORD Word 32 Word 2 Aa
///
/// wordWordIDWord1WordWORDWord32Word2A
/// Word Word ID Word 1 Word WORD Word 32 Word 2 A
/// </example>
public static string SplitCamelCase(this string input)
{
if (input == null) return null;
if (string.IsNullOrWhiteSpace(input)) return "";
var separated = input;
separated = SplitCamelCaseRegex.Replace(separated, @" $1").Trim();
//Set ALL CAPS words
if (_SplitCamelCase_AllCapsWords.Any())
foreach (var word in _SplitCamelCase_AllCapsWords)
separated = SplitCamelCase_AllCapsWords_Regexes[word].Replace(separated, word.ToUpper());
//Capitalize first letter
var firstChar = separated.First(); //NullOrWhiteSpace handled earlier
if (char.IsLower(firstChar))
separated = char.ToUpper(firstChar) + separated.Substring(1);
return separated;
}
private static readonly Regex SplitCamelCaseRegex = new Regex(@"
(
(?<=[a-z])[A-Z0-9] (?# lower-to-other boundaries )
|
(?<=[0-9])[a-zA-Z] (?# number-to-other boundaries )
|
(?<=[A-Z])[0-9] (?# cap-to-number boundaries; handles a specific issue with the next condition )
|
(?<=[A-Z])[A-Z](?=[a-z]) (?# handles longer strings of caps like ID or CMS by splitting off the last capital )
)"
, RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace
);
private static readonly string[] _SplitCamelCase_AllCapsWords =
(WebConfigurationManager.AppSettings["SplitCamelCase_AllCapsWords"] ?? "")
.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
.Select(a => a.ToLowerInvariant().Trim())
.ToArray()
;
private static Dictionary<string, Regex> _SplitCamelCase_AllCapsWords_Regexes;
private static Dictionary<string, Regex> SplitCamelCase_AllCapsWords_Regexes
{
get
{
if (_SplitCamelCase_AllCapsWords_Regexes == null)
{
_SplitCamelCase_AllCapsWords_Regexes = new Dictionary<string,Regex>();
foreach(var word in _SplitCamelCase_AllCapsWords)
_SplitCamelCase_AllCapsWords_Regexes.Add(word, new Regex(@"\b" + word + @"\b", RegexOptions.Compiled | RegexOptions.IgnoreCase));
}
return _SplitCamelCase_AllCapsWords_Regexes;
}
}
}
}
You can use C# extension methods
public static string SpacesFromCamel(this string value)
{
if (value.Length > 0)
{
var result = new List<char>();
char[] array = value.ToCharArray();
foreach (var item in array)
{
if (char.IsUpper(item) && result.Count > 0)
{
result.Add(' ');
}
result.Add(item);
}
return new string(result.ToArray());
}
return value;
}
Then you can use it like
var result = "TestString".SpacesFromCamel();
Result will be
Test String
Using LINQ:
var chars = ControlSelectionType.NotApplicable.ToString().SelectMany((x, i) => i > 0 && char.IsUpper(x) ? new char[] { ' ', x } : new char[] { x });
Console.WriteLine(new string(chars.ToArray()));
I also had an enum whose names I needed to separate. In my case this method solved the problem:
string SeparateCamelCase(string str)
{
for (int i = 1; i < str.Length; i++)
{
if (char.IsUpper(str[i]))
{
str = str.Insert(i, " ");
i++;
}
}
return str;
}
public enum ControlSelectionType
{
NotApplicable = 1,
SingleSelectRadioButtons = 2,
SingleSelectDropDownList = 3,
MultiSelectCheckBox = 4,
MultiSelectListBox = 5
}
public class NameValue
{
public string Name { get; set; }
public object Value { get; set; }
}
public static List<NameValue> EnumToList<T>(bool camelcase)
{
var array = (T[])(Enum.GetValues(typeof(T)).Cast<T>());
var array2 = Enum.GetNames(typeof(T)).ToArray<string>();
List<NameValue> lst = null;
for (int i = 0; i < array.Length; i++)
{
if (lst == null)
lst = new List<NameValue>();
string name = "";
if (camelcase)
{
name = array2[i].CamelCaseFriendly();
}
else
name = array2[i];
T value = array[i];
lst.Add(new NameValue { Name = name, Value = value });
}
return lst;
}
public static string CamelCaseFriendly(this string pascalCaseString)
{
Regex r = new Regex("(?<=[a-z])(?<x>[A-Z])|(?<=.)(?<x>[A-Z])(?=[a-z])");
return r.Replace(pascalCaseString, " ${x}");
}
//In your form
protected void Button1_Click1(object sender, EventArgs e)
{
DropDownList1.DataSource = GeneralClass.EnumToList<ControlSelectionType>(true);
DropDownList1.DataTextField = "Name";
DropDownList1.DataValueField = "Value";
DropDownList1.DataBind();
}
The solution from Eoin Campbell works well unless you have a web service.
In that case you would need to do the following, as the Description attribute is not serializable.
[DataContract]
public enum ControlSelectionType
{
[EnumMember(Value = "Not Applicable")]
NotApplicable = 1,
[EnumMember(Value = "Single Select Radio Buttons")]
SingleSelectRadioButtons = 2,
[EnumMember(Value = "Completely Different Display Text")]
SingleSelectDropDownList = 3,
}
public static string GetDescriptionFromEnumValue(Enum value)
{
EnumMemberAttribute attribute = value.GetType()
.GetField(value.ToString())
.GetCustomAttributes(typeof(EnumMemberAttribute), false)
.SingleOrDefault() as EnumMemberAttribute;
return attribute == null ? value.ToString() : attribute.Value;
}
And if you don't fancy using regex - try this:
public static string SeperateByCamelCase(this string text, char splitChar = ' ') {
var output = new StringBuilder();
for (int i = 0; i < text.Length; i++)
{
var c = text[i];
//if not the first and the char is upper
if (i > 0 && char.IsUpper(c)) {
var wasLastLower = char.IsLower(text[i - 1]);
if (i + 1 < text.Length) //is there a next
{
var isNextUpper = char.IsUpper(text[i + 1]);
if (!isNextUpper) //if next is not upper (start of a word).
{
output.Append(splitChar);
}
else if (wasLastLower) //last was lower, I'm upper and my next is upper (start of an acronym): 'abcdHTTP' to 'abcd HTTP'
{
output.Append(splitChar);
}
}
else
{
//last letter - if it's upper and the previous letter was lower: 'abcdA' to 'abcd A'
if (wasLastLower)
{
output.Append(splitChar);
}
}
}
output.Append(c);
}
return output.ToString();
}
It passes these tests. It doesn't handle numbers, but I didn't need it to.
[TestMethod()]
public void ToCamelCaseTest()
{
var testData = new string[] { "AAACamel", "AAA", "SplitThisByCamel", "AnA", "doesnothing", "a", "A", "aasdasdAAA" };
var expectedData = new string[] { "AAA Camel", "AAA", "Split This By Camel", "An A", "doesnothing", "a", "A", "aasdasd AAA" };
for (int i = 0; i < testData.Length; i++)
{
var actual = testData[i].SeperateByCamelCase();
var expected = expectedData[i];
Assert.AreEqual(expected, actual);
}
}
#JustSayNoToRegex
Takes a C# identifier, with underscores and numbers, and converts it to a space-separated string.
public static class StringExtensions
{
public static string SplitOnCase(this string identifier)
{
if (identifier == null || identifier.Length == 0) return string.Empty;
var sb = new StringBuilder();
if (identifier.Length == 1) sb.Append(char.ToUpperInvariant(identifier[0]));
else if (identifier.Length == 2) sb.Append(char.ToUpperInvariant(identifier[0])).Append(identifier[1]);
else {
if (identifier[0] != '_') sb.Append(char.ToUpperInvariant(identifier[0]));
for (int i = 1; i < identifier.Length; i++) {
var current = identifier[i];
var previous = identifier[i - 1];
if (current == '_' && previous == '_') continue;
else if (current == '_') {
sb.Append(' ');
}
else if (char.IsLetter(current) && previous == '_') {
sb.Append(char.ToUpperInvariant(current));
}
else if (char.IsDigit(current) && char.IsLetter(previous)) {
sb.Append(' ').Append(current);
}
else if (char.IsLetter(current) && char.IsDigit(previous)) {
sb.Append(' ').Append(char.ToUpperInvariant(current));
}
else if (char.IsUpper(current) && char.IsLower(previous)
&& (i < identifier.Length - 1 && char.IsUpper(identifier[i + 1]) || i == identifier.Length - 1)) {
sb.Append(' ').Append(current);
}
else if (char.IsUpper(current) && i < identifier.Length - 1 && char.IsLower(identifier[i + 1])) {
sb.Append(' ').Append(current);
}
else {
sb.Append(current);
}
}
}
return sb.ToString();
}
}
Tests:
[TestFixture]
static class HelpersTests
{
[Test]
public static void Basic()
{
Assert.AreEqual("Foo", "foo".SplitOnCase());
Assert.AreEqual("Foo", "_foo".SplitOnCase());
Assert.AreEqual("Foo", "__foo".SplitOnCase());
Assert.AreEqual("Foo", "___foo".SplitOnCase());
Assert.AreEqual("Foo 2", "foo2".SplitOnCase());
Assert.AreEqual("Foo 23", "foo23".SplitOnCase());
Assert.AreEqual("Foo 23 A", "foo23A".SplitOnCase());
Assert.AreEqual("Foo 23 Ab", "foo23Ab".SplitOnCase());
Assert.AreEqual("Foo 23 Ab", "foo23_ab".SplitOnCase());
Assert.AreEqual("Foo 23 Ab", "foo23___ab".SplitOnCase());
Assert.AreEqual("Foo 23", "foo__23".SplitOnCase());
Assert.AreEqual("Foo Bar", "Foo_bar".SplitOnCase());
Assert.AreEqual("Foo Bar", "Foo____bar".SplitOnCase());
Assert.AreEqual("AAA", "AAA".SplitOnCase());
Assert.AreEqual("Foo A Aa", "fooAAa".SplitOnCase());
Assert.AreEqual("Foo AAA", "fooAAA".SplitOnCase());
Assert.AreEqual("Foo Bar", "FooBar".SplitOnCase());
Assert.AreEqual("Mn M", "MnM".SplitOnCase());
Assert.AreEqual("AS", "aS".SplitOnCase());
Assert.AreEqual("As", "as".SplitOnCase());
Assert.AreEqual("A", "a".SplitOnCase());
Assert.AreEqual("_", "_".SplitOnCase());
}
}
A simple version similar to some of the above, but with logic to not auto-insert the separator (a space by default, but it can be any char) if there's already one at the current position.
Uses a StringBuilder rather than 'mutating' strings.
public static string SeparateCamelCase(this string value, char separator = ' ') {
var sb = new StringBuilder();
var lastChar = separator;
foreach (var currentChar in value) {
if (char.IsUpper(currentChar) && lastChar != separator)
sb.Append(separator);
sb.Append(currentChar);
lastChar = currentChar;
}
return sb.ToString();
}
Example:
Input : 'ThisIsATest'
Output : 'This Is A Test'
Input : 'This IsATest'
Output : 'This Is A Test' (Note: Still only one space between 'This' and 'Is')
Input : 'ThisIsATest' (with separator '_')
Output : 'This_Is_A_Test'
Try this:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
Console
.WriteLine(
SeparateByCamelCase("TestString") == "Test String" // True
);
}
public static string SeparateByCamelCase(string str)
{
return String.Join(" ", SplitByCamelCase(str));
}
public static IEnumerable<string> SplitByCamelCase(string str)
{
if (str.Length == 0)
return new List<string>();
return
new List<string>
{
Head(str)
}
.Concat(
SplitByCamelCase(
Tail(str)
)
);
}
public static string Head(string str)
{
return new String(
str
.Take(1)
.Concat(
str
.Skip(1)
.TakeWhile(IsLower)
)
.ToArray()
);
}
public static string Tail(string str)
{
return new String(
str
.Skip(
Head(str).Length
)
.ToArray()
);
}
public static bool IsLower(char ch)
{
return ch >= 'a' && ch <= 'z';
}
}
See sample online
