Get the different substrings from one main string - c#

I have the following main string, which contains link names and link URLs. Each name and URL are joined with #;. I want to get the string for each link (name and URL, e.g. My web#?http://www.google.com); see the example below:
string teststring = "My web#;http://www.google.com My Web2#;http://www.bing.se Handbooks#;http://www.books.se/";
and I want to get three different strings using any string function:
My web#?http://www.google.com
My Web2#?http://www.bing.se
Handbooks#?http://www.books.de

So this looks like you want to split on the space after a #;, instead of splitting at #; itself. C# provides arbitrary length lookbehinds, which makes that quite easy. In fact, you should probably do the replacement of #; with #? first:
string teststring = "My web#;http://www.google.com My Web2#;http://www.bing.se Handbooks#;http://www.books.se/";
teststring = Regex.Replace(teststring, @"#;", "#?");
string[] substrings = Regex.Split(teststring, @"(?<=#\?\S*)\s+");
That's it:
foreach(var s in substrings)
Console.WriteLine(s);
Output:
My web#?http://www.google.com
My Web2#?http://www.bing.se
Handbooks#?http://www.books.se/
If you are worried that your input might already contain other #? that you don't want to split on, you can of course do the splitting first (using #; in the pattern) and then loop over substrings and do the replacement call inside the loop.
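A minimal sketch of that split-first variant, assuming the same input format as in the question. The names LinkSplitter and SplitLinks are made up for illustration. The lookbehind keys on #; and the replacement runs per item afterwards, so any #? already present in the input is left alone:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class LinkSplitter
{
    // Hypothetical helper: split on whitespace that follows "#;<non-space chars>",
    // then swap "#;" for "#?" inside each resulting item.
    public static string[] SplitLinks(string input)
    {
        return Regex.Split(input, @"(?<=#;\S*)\s+")
                    .Select(item => item.Replace("#;", "#?"))
                    .ToArray();
    }
}
```

Calling LinkSplitter.SplitLinks(teststring) on the original, unmodified teststring should give the same three items as the output above.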

If these are constant strings, you can just use String.Substring. This will require you to count letters, which is a nuisance, in order to provide the right parameters, but it will work.
string string1 = teststring.Substring(0, 29).Replace("#;", "#?"); // "My web#;http://www.google.com" is 29 characters long
If they aren't, things get complicated. You could almost do a split with " " as the delimiter, except that your site name has a space. Do any of the substrings in your data have constant features, such as domain endings (e.g. first .com, then .de, etc.) or something like that?

If you have any control on the input format, you may want to change it to be easy to parse, for example by using another separator between items, other than space.
If this format can't be changed, why not just implement the split in code? It's not as short as using a Regex, but it might actually be easier for a reader to understand, since the logic is straightforward.
It will almost certainly also be faster and cheaper in terms of memory usage.
An example for code that solves this would be:
using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        var testString = "My web#;http://www.google.com My Web2#;http://www.bing.se Handbooks#;http://www.books.se/";
        foreach (var x in SplitAndFormatUrls(testString))
        {
            Console.WriteLine(x);
        }
    }

    private static IEnumerable<string> SplitAndFormatUrls(string input)
    {
        var length = input.Length;
        var last = 0;
        var seenSeparator = false;
        var previousChar = ' ';
        // Run one step past the end and treat it as a space, so the final
        // item is emitted together with its last character.
        for (var index = 0; index <= length; index++)
        {
            var currentChar = index == length ? ' ' : input[index];
            if (currentChar == ' ' && seenSeparator)
            {
                // A space after a "#;" ends the current item.
                var currentUrl = input.Substring(last, index - last);
                yield return currentUrl.Replace("#;", "#?");
                last = index + 1;
                seenSeparator = false;
                previousChar = ' ';
                continue;
            }
            if (currentChar == ';' && previousChar == '#')
            {
                seenSeparator = true;
            }
            previousChar = currentChar;
        }
    }
}

Related

How to get parentheses inside parentheses

I'm trying to keep a pair of parentheses within a string that's surrounded by parentheses.
The string in question is: test (blue,(hmmm) derp)
The desired output into an array is: test and (blue,(hmmm) derp).
The current output is: (blue,, (hmm) and derp).
My current code is this:
var input = Regex
.Split(line, @"(\([^()]*\))")
.Where(s => !string.IsNullOrEmpty(s))
.ToList();
How can I extract the text inside the outer parentheses (keeping them) and keep the inner parentheses as one string in an array?
EDIT:
To clarify my question, I want to ignore the inner parentheses and only split on the outer parentheses.
herpdediderp (orange,(hmm)) some other crap (red,hmm)
Should become:
herpdediderp, orange,(hmm), some other crap and red,hmm.
The code works for everything except the double parentheses: (orange,(hmm)) to orange,(hmm).
You can use the method
public string Trim(params char[] trimChars)
Like this
string trimmedLine = line.Trim('(', ')'); // Specify undesired leading and trailing chars.
// Specify separator characters for the split (here comma and space):
string[] input = trimmedLine.Split(new[]{',', ' '}, StringSplitOptions.RemoveEmptyEntries);
If the line can start or end with 2 consecutive parentheses, simply use good old if-statements:
if (line.StartsWith("(")) {
    line = line.Substring(1);
}
if (line.EndsWith(")")) {
    line = line.Substring(0, line.Length - 1);
}
string[] input = line.Split(new[]{',', ' '}, StringSplitOptions.RemoveEmptyEntries);
Lot's o' guessing going on here - from me and the others. You could try
[^(]+|\([^(]*(?:\([^(]*\)[^(]*)*\)
It handles one level of parentheses recursion (could be extended though).
Here at regexstorm.
Visual illustration at regex101.
If this piques your interest, I'll add an explanation ;)
Edit:
If you need to use split, put the selection in to a group, like
([^(]+|\([^(]*(?:\([^(]*\)[^(]*)*\))
and filter out empty strings. See example here at ideone.
Edit 2:
Not quite sure what behaviour you want with multiple levels of parentheses, but I assume this could do it for you:
([^(]+|\([^(]*(?:\([^(]*(?:\([^(]*\)[^(]*)*\)[^(]*)*\))
                        ^^^^^^^^^^^^^^^^^^^ added
For each level of recursion you want, you "just" add another inner level. So this is for two levels of recursion ;)
See it here at ideone.
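As an alternative to adding one nesting level at a time, .NET's regex engine supports balancing groups, which can track parenthesis depth to any level. This is a different technique from the pattern above, not an extension of it. The names BalancedParens and OuterGroups are invented for illustration, and the sketch assumes the parentheses in the input are balanced:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class BalancedParens
{
    // "\(" opens the outer group. Inside it, every further "(" pushes onto the
    // "depth" stack and every ")" pops from it; "(?(depth)(?!))" fails the match
    // if anything is still on the stack when the closing "\)" is reached.
    private static readonly Regex Outer = new Regex(
        @"\((?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!))\)");

    // Hypothetical helper: returns each outermost parenthesised group,
    // including its surrounding parentheses.
    public static List<string> OuterGroups(string input)
    {
        return Outer.Matches(input).Cast<Match>().Select(m => m.Value).ToList();
    }
}
```

For the second example from the question, this should return (orange,(hmm)) and (red,hmm); the outer parentheses can then be removed with Substring(1, Length - 2) if they are not wanted.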
Hopefully someone will come up with a regex. Here's my code answer.
static class ExtensionMethods
{
static public IEnumerable<string> GetStuffInsideParentheses(this IEnumerable<char> input)
{
int levels = 0;
var current = new Queue<char>();
foreach (char c in input)
{
if (levels == 0)
{
if (c == '(') levels++;
continue;
}
if (c == ')')
{
levels--;
if (levels == 0)
{
yield return new string(current.ToArray());
current.Clear();
continue;
}
}
if (c == '(')
{
levels++;
}
current.Enqueue(c);
}
}
}
Test program:
public class Program
{
public static void Main()
{
var input = new []
{
"(blue,(hmmm) derp)",
"herpdediderp (orange,(hmm)) some other crap (red,hmm)"
};
foreach ( var s in input )
{
var output = s.GetStuffInsideParentheses();
foreach ( var o in output )
{
Console.WriteLine(o);
}
Console.WriteLine();
}
}
}
Output:
blue,(hmmm) derp
orange,(hmm)
red,hmm
Code on DotNetFiddle
I think if you think about the problem backwards, it becomes a bit easier - don't split on what you don't want, extract what you do want.
The only slightly tricky part is matching nested parentheses; I assume you will only go one level deep.
The first example:
var s1 = "(blue, (hmmm) derp)";
var input = Regex.Matches(s1, @"\((?:\(.+?\)|[^()]+)+\)").Cast<Match>().Select(m => Regex.Matches(m.Value, @"\(\w+\)|\w+").Cast<Match>().Select(m2 => m2.Value).ToArray()).ToArray();
// input is string[][] { string[] { "blue", "(hmmm)", "derp" } }
The second example uses an extension method:
public static string TrimOutside(this string src, string openDelims, string closeDelims) {
if (!String.IsNullOrEmpty(src)) {
var openIndex = openDelims.IndexOf(src[0]);
if (openIndex >= 0 && src.EndsWith(closeDelims.Substring(openIndex, 1)))
src = src.Substring(1, src.Length - 2);
}
return src;
}
The code/patterns are different because the two examples are being handled differently:
var s2 = "herpdediderp (orange,(hmm)) some other crap (red,hmm)";
var input3 = Regex.Matches(s2, @"\w(?:\w| )+\w|\((?:[^(]+|\([^)]+\))+\)").Cast<Match>().Select(m => m.Value.TrimOutside("(",")")).ToArray();
// input3 is string[] { "herpdediderp", "orange,(hmm)", "some other crap", "red,hmm" }

Finding longest word in string

Ok, so I know that questions LIKE this have been asked a lot on here, but I can't seem to make solutions work.
I am trying to take a string from a file and find the longest word in that string.
Simples.
I think the issue is down to whether I am calling my methods on a string[] or char[], currently stringOfWords returns a char[].
I am trying to then order by descending length and get the first value but am getting an ArgumentNullException on the OrderByDescending method.
Any input much appreciated.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;
using System.Threading.Tasks;
namespace TextExercises
{
class Program
{
static void Main(string[] args)
{
var fileText = File.ReadAllText(@"C:\Users\RichardsPC\Documents\TestText.txt");
var stringOfWords = fileText.ToArray();
Console.WriteLine("Text in file: " + fileText);
Console.WriteLine("Words in text: " + fileText.Split(' ').Length);
// This is where I am trying to solve the problem
var finalValue = stringOfWords.OrderByDescending(n => n.length).First();
Console.WriteLine("Largest word is: " + finalValue);
}
}
}
Don't split the string, use a Regex
If you care about performance, you don't want to split the string. Split has to traverse the entire string, create a new string for every item it finds, and put them all into an array, which costs more than a single O(n) pass; ordering the result then adds at least another O(n log n) steps.
You can use a Regex for this instead, which will be more efficient because it only iterates over the string once:
// No trailing \s in the pattern, so the last word is found even if the text does not end in whitespace.
var regex = new Regex(@"(\w+)",RegexOptions.Compiled);
var match = regex.Match(fileText);
var currentLargestString = "";
while(match.Success)
{
if(match.Groups[1].Value.Length>currentLargestString.Length)
{
currentLargestString = match.Groups[1].Value;
}
match = match.NextMatch();
}
The nice thing about this is that you don't need to break the string up all at once to do the analysis, and if you need to load the file incrementally, it is a fairly easy change: persist the longest word in an object and call it against multiple strings.
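A rough sketch of that incremental idea, under the assumption that the text arrives in chunks that never cut a word in half (whole lines, for example). LongestWordTracker and Feed are hypothetical names, not part of any library:

```csharp
using System.Text.RegularExpressions;

sealed class LongestWordTracker
{
    private static readonly Regex Word = new Regex(@"\w+", RegexOptions.Compiled);

    // Longest word seen so far across all chunks; empty until a word is found.
    public string Longest { get; private set; } = "";

    // Scan one chunk and keep the word only if it beats the current longest.
    public void Feed(string chunk)
    {
        for (var match = Word.Match(chunk); match.Success; match = match.NextMatch())
        {
            if (match.Length > Longest.Length)
            {
                Longest = match.Value;
            }
        }
    }
}
```

One way to use it would be to loop over File.ReadLines(path) and call Feed for each line, so the whole file never has to be held in memory at once.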
If you're set on using an array, don't order it; just iterate over it
You don't need an OrderBy, since you're just looking for the largest item. Sorting is O(n log n) in most cases, while a single pass over the list is O(n):
var largest = "";
foreach(var item in strArr)
{
if(item.Length>largest.Length)
largest = item;
}
Method ToArray() in this case returns char[] which is an array of individual characters. But instead you need an array of individual words. You can get it like this:
string[] stringOfWords = fileText.Split(' ');
And you have a typo in your lambda expression (uppercase L):
n => n.Length
Try this:
var fileText = File.ReadAllText(#"C:\Users\RichardsPC\Documents\TestText.txt");
var words = fileText.Split(' ');
var finalValue = words.OrderByDescending(n => n.Length).First();
Console.WriteLine("Longest word: " + finalValue);
As suggested in the other answer, you need to split your string.
string[] stringOfWords = fileText.Split(new char[] { ',', ' ' });
//all is well, now let's loop over it and see which is the biggest
int biggest = 0;
int biggestIndex = 0;
for (int i = 0; i < stringOfWords.Length; i++) {
    if (biggest < stringOfWords[i].Length) {
        biggest = stringOfWords[i].Length;
        biggestIndex = i;
    }
}
return stringOfWords[biggestIndex];
What we're doing here is splitting the string based on whitespace (' '), or commas- you can add an unlimited number of delimiters there - each word, then, gets its own space in the array.
From there, we're iterating over the array. If we encounter a word that's longer than the current longest word, we update it.

How to implement "Find, Replace, Next" in a String on C#?

I'm searching for a solution to this case:
I have a method inside a DLL that receives a string containing some words that act as placeholders/parameters, each of which should be replaced by the result of another specific method (also inside the DLL).
To simplify: it's a query string passed as an argument to a method inside a DLL, where every word X that matches a specific case will be replaced.
My method receives a string that could look like this:
(on .exe app)
string str = "INSERT INTO mydb.mytable (id_field, description, complex_number) VALUES ('#GEN_COMPLEX_ID#','A complex solution', '#GEN_COMPLEX_ID#');";
MyDLLClass.MyMethod(str);
So the problem is: if I replace #GEN_COMPLEX_ID# in this string and want a different value for each match, it won't happen, because Replace evaluates the function once and substitutes every match in a single shot (not step by step). So I'd like help implementing a step-by-step replace of any text (find a word, replace it, then next... replace... next... etc.).
Could you help me?
Thanks!
This works pretty well for me:
string yourOriginalString = "ab cd ab cd ab cd";
string pattern = "ab";
string yourNewDescription = "123";
int startingPositionOffset = 0;
int yourOriginalStringLength = yourOriginalString.Length;
MatchCollection match = Regex.Matches(yourOriginalString, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
foreach (Match m in match)
{
yourOriginalString = yourOriginalString.Substring(0, m.Index+startingPositionOffset) + yourNewDescription + yourOriginalString.Substring(m.Index + startingPositionOffset+ m.Length);
startingPositionOffset = yourOriginalString.Length - yourOriginalStringLength;
}
If what you're asking is how to replace each placeholder with a different value, you can do it using the Regex.Replace overload which accepts a MatchEvaluator delegate, and executes it for each match:
// conceptually, something like this (note that it's not checking if there are
// enough values in the replacementValues array)
static string ReplaceMultiple(
string input, string placeholder, IEnumerable<string> replacementValues)
{
var enumerator = replacementValues.GetEnumerator();
return Regex.Replace(input, placeholder,
m => { enumerator.MoveNext(); return enumerator.Current; });
}
This is, of course, presuming that all placeholders look the same.
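For the case in the question, where each placeholder should get a freshly generated value, the same overload can call a generator function per match. This is only a sketch: PlaceholderFiller, Fill and nextValue are hypothetical names standing in for whatever method in the DLL produces the ID, and Regex.Escape is used so the placeholder is matched as literal text:

```csharp
using System;
using System.Text.RegularExpressions;

static class PlaceholderFiller
{
    // Replaces every occurrence of the placeholder, calling nextValue once per
    // match, so each occurrence can receive a different value.
    public static string Fill(string input, string placeholder, Func<string> nextValue)
    {
        return Regex.Replace(input, Regex.Escape(placeholder), match => nextValue());
    }
}
```

With a counter-based generator such as () => (++n).ToString(), the input "a #X# b #X#" with placeholder "#X#" becomes "a 1 b 2".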
Pseudo-code
var split = source.Split(placeholder); // create array of items without placeholders
var result = split[0]; // copy first item
for(int i = 1; i < split.Length; i++) // loop over the split items, not over the result string
{
bool replace = ... // ask user
result += replace ? replacement : placeholder; // to put replacement or not to put
result += split[i]; // copy next item
}
You should use the Split method like this:
string [] placeholder = {"#Placeholder#"} ;
string[] request = cd.Split(placeholder, StringSplitOptions.RemoveEmptyEntries);
StringBuilder requetBuilding = new StringBuilder();
requetBuilding.Append(request[0]);
int index = 1;
requetBuilding.Append("Your place holder replacement");
requetBuilding.Append(request[index]);
index++; //next replacement
// requetBuilding.Append("Your next place holder replacement");
// requetBuilding.Append(request[index]);

Replacing characters in a string with another string

So what I am trying to do is as follows :
example of a string is A4PC
I am trying to replace any occurrence of "A" with "[A4]", and similarly any occurrence of "4" with "[A4]", so I would get
"[A4][A4]PC"
I tried doing a normal Replace on the string but found out I got
"[A[A4]]PC"
string badWordAllVariants =
restriction.Value.Replace("A", "[A4]").Replace("4", "[A4]")
since I have two A's in a row causing an issue.
So I was thinking it would be better rather than the replace on the string I need to do it on a character per character basis and then build up a string again.
Is there any way in LINQ or similar to do something like this?
You don't need any LINQ here - String.Replace works just fine:
string input = "AAPC";
string result = input.Replace("A", "[A4]"); // "[A4][A4]PC"
UPDATE: For your updated requirements I suggest to use regular expression replace
string input = "A4PC";
var result = Regex.Replace(input, "A|4", "[A4]"); // "[A4][A4]PC"
This works well for me:
string x = "AAPC";
string replace = x.Replace("A", "[A4]");
EDIT:
Based on the updated question, the issue is the second replacement. In order to replace multiple strings you will want to do this sequentially:
var original = "AAPC";
// add arbitrary room to allow for more new characters
StringBuilder resultString = new StringBuilder(original.Length + 10);
foreach (char currentChar in original.ToCharArray())
{
if (currentChar == 'A') resultString.Append("[A4]");
else if (currentChar == '4') resultString.Append("[A4]");
else resultString.Append(currentChar);
}
string result = resultString.ToString();
You can run this routine with any replacements you want to make (in this case the letters 'A' and '4') and it should work. If you want to replace strings rather than single characters, the code would be similar in structure, but you would need to "look ahead" and probably use a for loop. Hopefully this helps!
By the way - you want to use a StringBuilder here and not string concatenation, because strings are immutable, which means a new string gets allocated on every pass through the loop. (Not good!)
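For whole strings rather than single characters, one possible single-pass approach is to let a regex alternation do the looking ahead. This swaps in a different technique from the character loop above; MultiReplace and ReplaceAll are illustrative names, and the sketch assumes the map is non-empty and contains no empty keys:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class MultiReplace
{
    // Replaces every key of the map with its value in one pass over the input,
    // so text inserted by one replacement is never scanned again by another.
    public static string ReplaceAll(string input, IDictionary<string, string> map)
    {
        // Longer keys first, so "AB" wins over "A" when both are keys.
        var pattern = string.Join("|",
            map.Keys.OrderByDescending(key => key.Length)
                    .Select(key => Regex.Escape(key)));
        return Regex.Replace(input, pattern, match => map[match.Value]);
    }
}
```

With a map of "A" to "[A4]" and "4" to "[A4]", the input "A4PC" becomes "[A4][A4]PC".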
I think this should do the trick
string str = "AA4PC";
string result = Regex.Replace(str, @"(?<Before>[^A4]?)(?<Value>A|4)(?<After>[^A4]?)", (m) =>
{
    string before = m.Groups["Before"].Value;
    string after = m.Groups["After"].Value;
    string value = m.Groups["Value"].Value;
    if (before != "[" || after != "]")
    {
        // The neighbouring characters are part of the match, so put them back
        // around the replacement; otherwise they would be dropped.
        return before + "[A4]" + after;
    }
    return m.ToString();
});
// result is "[A4][A4][A4]PC"
It is meant to replace any A or 4 that hasn't been replaced yet with [A4].

How can I convert PascalCase to split words?

I have variables containing text such as:
ShowSummary
ShowDetails
AccountDetails
Is there a simple function / method in C# that I can apply to these variables to yield:
"Show Summary"
"Show Details"
"Account Details"
I was wondering about an extension method but I've never coded one and I am not sure where to start.
See this post by Jon Galloway and one by Phil
In the application I am currently working on, we have a delegate based split extension method. It looks like so:
public static string Split(this string target, Func<char, char, bool> shouldSplit, string splitFiller = " ")
{
if (target == null)
throw new ArgumentNullException("target");
if (shouldSplit == null)
throw new ArgumentNullException("shouldSplit");
if (String.IsNullOrEmpty(splitFiller))
throw new ArgumentNullException("splitFiller");
int targetLength = target.Length;
if (targetLength == 0)
return target; // Nothing to split; also avoids indexing target[0] below.
// We know the resulting string is going to be at least the length of target
StringBuilder result = new StringBuilder(targetLength);
result.Append(target[0]);
// Loop from the second character to the last character.
for (int i = 1; i < targetLength; ++i)
{
char firstChar = target[i - 1];
char secondChar = target[i];
if (shouldSplit(firstChar, secondChar))
{
// If a split should be performed add in the filler
result.Append(splitFiller);
}
result.Append(secondChar);
}
return result.ToString();
}
It could then be used as follows:
string showSummary = "ShowSummary";
string spacedString = showSummary.Split((c1, c2) => Char.IsLower(c1) && Char.IsUpper(c2));
This allows you to split on any conditions between two chars, and insert a filler of your choice (default of a space).
The best would be to iterate through each character within the string. Check if the character is upper case. If so, insert a space character before it. Otherwise, move onto the next character.
Also, ideally start from the second character so that a space would not be inserted before the first character.
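The steps above could be sketched as an extension method like the following. The names PascalCaseExtensions and ToSpacedWords are made up for illustration, and the sketch assumes every uppercase letter starts a new word (so "HTMLParser" would become "H T M L Parser"):

```csharp
using System.Text;

static class PascalCaseExtensions
{
    // The "this" modifier on the first parameter is what makes this callable
    // as "ShowSummary".ToSpacedWords().
    public static string ToSpacedWords(this string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }

        var builder = new StringBuilder(value.Length + 8);
        builder.Append(value[0]);

        // Start at the second character so no space is added before the first one.
        for (int i = 1; i < value.Length; i++)
        {
            if (char.IsUpper(value[i]))
            {
                builder.Append(' ');
            }
            builder.Append(value[i]);
        }
        return builder.ToString();
    }
}
```

With this in scope, "AccountDetails".ToSpacedWords() gives "Account Details".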
try something like this
var word = "AccountDetails";
word = string.Join(string.Empty, word
.Select(c => char.IsUpper(c) ? " " + c : c.ToString())).Trim();
