sub-strings replacements according to some mapping

sub-strings replacements according to some mapping - c#

Given a string, I need to replace substrings according to a given mapping. The mapping determines where to start the replacement, the length of text to be replaced and the replacement string. The mapping is according to the following scheme:
public struct mapItem
{
public int offset;
public int length;
public string newString;
}
For example: given a mapping {{0,3,"frog"},{9,3,"kva"}} and a string
"dog says gav"
we replace starting at position 0 a substring of the length 3 to the "frog", i.e.
dog - > frog
and starting the position 9 a substring of the length 3 to the "kva", i.e.
gav->kva
The new string becomes:
"frog says kva"
How can I do it efficiently?

You have to take care that replacements take into account the shift produced by preceding replacements. Also using a StringBuilder is more efficient, as is doesn't allocate new memory at each operation as string operations do. (Strings are invariant, which means that a completely new string is created at each string operation.)
var maps = new List<MapItem> { ... };
var sb = new StringBuilder("dog says gav");
int shift = 0;
foreach (MapItem map in maps.OrderBy(m => m.Offset)) {
sb.Remove(map.Offset + shift, map.Length);
sb.Insert(map.Offset + shift, map.NewString);
shift += map.NewString.Length - map.Length;
}
string result = sb.ToString();
The OrderBy makes sure that the replacements are executed from left to right. If you know that the mappings are provided in this order, you can drop the OrderBy.
Another simpler way is to begin with the replacements at the right end and work backwards, so that the character shifts do not alter the positions of not yet executed replacements:
var sb = new StringBuilder("dog says gav");
foreach (MapItem map in maps.OrderByDescending(m => m.Offset)) {
sb.Remove(map.Offset, map.Length);
sb.Insert(map.Offset, map.NewString);
}
string result = sb.ToString();
In case the mappings are already ordered in ascending order, a simple reverse for-statement seems appropriate:
var sb = new StringBuilder("dog says gav");
for (int i = maps.Count - 1; i >= 0; i--) {
MapItem map = maps[i];
sb.Remove(map.Offset, map.Length);
sb.Insert(map.Offset, map.NewString);
}
string result = sb.ToString();

You can write an Extension Method like below:
public static class ExtensionMethod
{
public static string ReplaceSubstringByMap(this string str, List<mapItem> map)
{
int offsetShift = 0;
foreach (mapItem mapItem in map.OrderBy(x => x.offset))
{
str = str.Remove(mapItem.offset + offsetShift, mapItem.length).Insert(mapItem.offset + offsetShift, mapItem.newString);
offsetShift += mapItem.newString.Length - mapItem.length;
}
return str;
}
}
And invoke it like below:
var map = new List<mapItem>
{
new mapItem
{
offset = 0,
length = 1,
newString = "frog"
},
new mapItem
{
offset = 9,
length = 1,
newString = "kva"
}
};
string str = "dog says gav";
var result = str.ReplaceSubstringByMap(map);

Related

Method that takes a message and index, creates a substring using the index

Problem: I want to write a method that takes a message/index pair like this:
("Hello, I am *Name1, how are you doing *Name2?", 2)
The index refers to the asterisk delimited name in the message. So if the index is 1, it should refer to *Name1, if it's 2 it should refer to *Name2.
The method should return just the name with the asterisk (*Name2).
I have attempted to play around with substrings, taking the first delimited * and ending when we reach a character that isn't a letter, number, underscore or hyphen, but the logic just isn't setting in.
I know this is similar to a few problems on SO but I can't find anything this specific. Any help is appreciated.
This is what's left of my very vague attempt so far. Based on this thread:
public string GetIndexedNames(string message, int index)
{
int strStart = message.IndexOf("#") + "#".Length;
int strEnd = message.LastIndexOf(" ");
String result = message.Substring(strStart, strEnd - strStart);
}

If you want to do it the old school way, then something like:
public static void Main(string[] args)
{
string message = "Hello, I am *Name1, how are you doing *Name2?";
string name1 = GetIndexedNames(message, "*", 1);
string name2 = GetIndexedNames(message, "*", 2);
Console.WriteLine(message);
Console.WriteLine(name1);
Console.WriteLine(name2);
Console.ReadLine();
}
public static string GetIndexedNames(string message, string singleCharDelimiter, int index)
{
string valid = "abcdefghijlmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-";
string[] parts = message.Split(singleCharDelimiter.ToArray());
if (parts.Length >= index)
{
StringBuilder sb = new StringBuilder();
for(int i = 0; i < parts[index].Length; i++)
{
string character = parts[index].Substring(i, 1);
if (valid.Contains(character))
{
sb.Append(character);
}
else
{
return sb.ToString();
}
}
return sb.ToString();
}
return "";
}

You can try using regular expressions to match the names. Assuming that name is a sequence of word characters (letters or digits):
using System.Linq;
using System.Text.RegularExpressions;
...
// Either name with asterisk *Name or null
// index is 1-based
private static ObtainName(string source, int index) => Regex
.Matches(source, #"\*\w+")
.Cast<Match>()
.Select(match => match.Value)
.Distinct() // in case the same name repeats several times
.ElementAtOrDefault(index - 1);
Demo:
string name = ObtainName(
"Hello, I am *Name1, how are you doing *Name2?", 2);
Console.Write(name);
Outcome:
*Name2

Perhaps not the most elegant solution, but if you want to use IndexOf, use a loop:
public static string GetIndexedNames(string message, int index, char marker='*')
{
int lastFound = 0;
for (int i = 0; i < index; i++) {
lastFound = message.IndexOf(marker, lastFound+1);
if (lastFound == -1) return null;
}
var space = message.IndexOf(' ', lastFound);
return space == -1 ? message.Substring(lastFound) : message.Substring(lastFound, space - lastFound);
}

ForEach Substring Trim

I have the following query:
Locations.ForEach(a => a.Name = a.Name.Replace(#"/", #" ").Replace(#"\", #" ");
What I like to do is if the Name has a length of greater than 31 characters, I like to trim upto the 31st character. Is there each easy way to do this with the .ForEach I have.

One way to accomplish this is to use a combination of the System.Linq extension method Take to "take" the number of characters you want from the beginning of the string and string.Concat to combine those characters back to a string, and then use this for the new value.
Using your example:
int maxLength = 31;
Locations.ForEach(a =>
a.Name = string.Concat(a.Name.Replace(#"/", #" ").Replace(#"\", #" ").Take(maxLength));
Or a full compile-able example:
public class Location
{
public string Name { get; set; }
public override string ToString()
{
return Name;
}
}
private static void Main()
{
var locations = new List<Location>();
var maxLength = 5;
// Populate our list with locations that have an ever growing Name length
for (int i = 0; i < 10; i++)
{
locations.Add(new Location {Name = new string('*', i)});
}
Console.WriteLine("Beginning values:");
locations.ForEach(Console.WriteLine);
// Trim the values
locations.ForEach(l => l.Name = string.Concat(l.Name.Take(maxLength)));
Console.WriteLine("\nEnding values:");
locations.ForEach(Console.WriteLine);
GetKeyFromUser("\nDone! Press any key to exit...");
}
Output

Or example with Substring:
const int maxLength = 31;
Locations.ForEach(a => a.Name = a.Name
.Substring(0, Math.Min(a.Name.Length, maxLength))
.Replace(#"/", #" ")
.Replace(#"\", #" "));

Get a range from two integers

string num = db.SelectNums(id);
string[] numArr = num.Split('-').ToArray();
string num contain for an example "48030-48039";
string[] numArr will therefor contain (48030, 48039).
Now I have two elements, a high and low. I now need to get ALL the numbers from 48030 to 48039. The issue is that it has to be string since there will be telephone numbers with leading zeroes now and then.
Thats why I cannot use Enumerable.Range().ToArray() for this.
Any suggestions? The expected result should be 48030, 48031, 48032, ... , 48039

This should work with your leading zero requirement:
string num = db.SelectNums(id);
string[] split = num.Split('-');
long start = long.Parse(split[0]);
long end = long.Parse(split[1]);
bool includeLeadingZero = split[0].StartsWith("0");
List<string> results = new List<string>();
for(int i = start; i <= end; i++)
{
string result = includeLeadingZero ? "0" : "";
result += i.ToString();
results.Add(result);
}
string[] arrayResults = results.ToArray();
A few things to note:
It assumes your input will be 2 valid integers split by a single hyphen
I am giving you a string array result, personally I would prefer to work with a List<int> in the end
If the first number contains a single leading zero, then all results will contain a single leading zero
It uses long to cater for longer numbers, beware that the max number that will parse is 9,223,372,036,854,775,807, you could double this value (not length) by using ulong

Are you saying this?
int[] nums = new int[numArr.Length];
for (int i = 0; i < numArr.Length; i++)
{
nums[i] = Convert.ToInt32(numArr[i]);
}
Array.Sort(nums);
for (int n = nums[0]; n <= nums[nums.Length - 1]; n++)
{
Console.WriteLine(n);
}
here link

I am expecting your string always have valid two integers, so using Parse instead TryParse
string[] strList = "48030-48039".Split('-').ToArray();
var lst = strList.Select(int.Parse).ToList();
var min = lst.OrderBy(l => l).FirstOrDefault();
var max = lst.OrderByDescending(l => l).FirstOrDefault();
var diff = max - min;
//adding 1 here otherwise 48039 will not be there
var rng = Enumerable.Range(min,diff+1);
If you expecting invalid string
var num = 0;
var lst = (from s in strList where int.TryParse(s, out num) select num).ToList();

This is one way to do it:
public static string[] RangeTest()
{
Boolean leadingZero = false;
string num = "048030-48039"; //db.SelectNums(id);
if (num.StartsWith("0"))
leadingZero = true;
int min = int.Parse(num.Split('-').Min());
int count = int.Parse(num.Split('-').Max()) - min;
if (leadingZero)
return Enumerable.Range(min, count).Select(x => "0" + x.ToString()).ToArray();
else
return Enumerable.Range(min, count).Select(x => "" + x.ToString()).ToArray(); ;
}

You can use string.Format to ensure numbers are formatted with leading zeros. That will make the method work with arbitrary number of leading zeros.
private static string CreateRange(string num)
{
var tokens = num.Split('-').Select(s => s.Trim()).ToArray();
// use UInt64 to allow huge numbers
var start = UInt64.Parse(tokens[0]);
var end = UInt64.Parse(tokens[1]);
// if your start number is '000123', this will create
// a format string with 6 zeros ('000000')
var format = new string('0', tokens[0].Length);
// use StringBuilder to make GC happy.
// (if only there was a Enumerable.Range<ulong> overload...)
var sb = new StringBuilder();
for (var i = start; i <= end; i++)
{
// format ensures that your numbers are padded properly
sb.Append(i.ToString(format));
sb.Append(", ");
}
// trim trailing comma after the last element
if (sb.Length >= 2) sb.Length -= 2;
return sb.ToString();
}
Usage example:
// prints 0000012, 0000013, 0000014
Console.WriteLine( CreateRange("0000012-0000014") );

Three significant issues were brought up in comments:
The phone numbers have enough digits to exceed Int32.MaxValue so
converting to int isn't viable.
The phone numbers can have leading zeros (presumeably for some
international calling?)
The possible range of numbers can theoretically exceed the maximum size of an array (which may have memory issues, and I think may not be represented as a string)
As such, you may need to use long instead of int, and I would suggest using deferred execution if needed for very large ranges.
public static IEnumerable<string> EnumeratePhoneNumbers(string fromAndTo)
{
var split = fromAndTo.Split('-');
return EnumeratePhoneNumbers(split[0], split[1]);
}
public static IEnumerable<string> EnumeratePhoneNumbers(string fromString, string toString)
{
long from = long.Parse(fromString);
long to = long.Parse(toString);
int totalNumberLength = fromString.Length;
for (long phoneNumber = from; phoneNumber <= to; phoneNumber++)
{
yield return phoneNumber.ToString().PadLeft(totalNumberLength, '0');
}
}
This assumes that the padded zeros are already included in the lower bound fromString text. It will iterate and yield out numbers as you need them. This can be useful if you're churning out a lot of numbers and don't need to fill up memory with them, or if you just need the first 10 or 100. For example:
var first100Numbers =
EnumeratePhoneNumbers("0018155500-7018155510")
.Take(100)
.ToArray();
Normally that range would produce 7 billion results which cannot be stored in an array, and might run into memory issues (I'm not even sure if it can be stored in a string); by using deferred execution, you only create the 100 needed.
If you do have a small range, you can still join up your results into a string as you desired:
string numberRanges = String.Join(", ", EnumeratePhoneNumbers("0018155500-0018155510"));
And naturally you can put this array creation into your own helper method:
public static string GetPhoneNumbersListing(string fromAndTo)
{
return String.Join(", ", EnumeratePhoneNumbers("0018155500-0018155510"));
}
So your usage would be:
string numberRanges = GetPhoneNumbersListing("0018155500-0018155510");

A complete solution inspired by the answer by #Dan-o:
Inputs:
Start: 48030
End: 48039
Digits: 6
Expected String Output:
048030, 048031, 048032, 048033, 048034, 048035, 048036, 048037, 048038, 048039
Program:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
public class Program
{
public static void Main()
{
int first = 48030;
int last = 48039;
int digits = 6;
Console.WriteLine(CreateRange(first, last, digits));
}
public static string CreateRange(int first, int last, int numDigits)
{
string separator = ", ";
var sb = new StringBuilder();
sb.Append(first.ToString().PadLeft(numDigits, '0'));
foreach (int num in Enumerable.Range(first + 1, last - first))
{
sb.Append(separator + num.ToString().PadLeft(numDigits, '0'));
}
return sb.ToString();
}
}

For Each item In Enumerable.Range(min, count).ToArray()
something = item.PadLeft(5, "0")
Next

Count words and spaces in string C#

I want to count words and spaces in my string. String looks like this:
Command do something ptuf(123) and bo(1).ctq[5] v:0,
I have something like this so far
int count = 0;
string mystring = "Command do something ptuf(123) and bo(1).ctq[5] v:0,";
foreach(char c in mystring)
{
if(char.IsLetter(c))
{
count++;
}
}
What should I do to count spaces also?

int countSpaces = mystring.Count(Char.IsWhiteSpace); // 6
int countWords = mystring.Split().Length; // 7
Note that both use Char.IsWhiteSpace which assumes other characters than " " as white-space(like newline). Have a look at the remarks section to see which exactly .

you can use string.Split with a space
http://msdn.microsoft.com/en-us/library/system.string.split.aspx
When you get a string array the number of elements is the number of words, and the number of spaces is the number of words -1

if you want to count spaces you can use LINQ :
int count = mystring.Count(s => s == ' ');

This will take into account:
Strings starting or ending with a space.
Double/triple/... spaces.
Assuming that the only word seperators are spaces and that your string is not null.
private static int CountWords(string S)
{
if (S.Length == 0)
return 0;
S = S.Trim();
while (S.Contains(" "))
S = S.Replace(" "," ");
return S.Split(' ').Length;
}
Note: the while loop can also be done with a regex: How do I replace multiple spaces with a single space in C#?

Here's a method using regex. Just something else to consider. It is better if you have long strings with lots of different types of whitespace. Similar to Microsoft Word's WordCount.
var str = "Command do something ptuf(123) and bo(1).ctq[5] v:0,";
int count = Regex.Matches(str, #"[\S]+").Count; // count is 7
For comparison,
var str = "Command do something ptuf(123) and bo(1).ctq[5] v:0,";
str.Count(char.IsWhiteSpace) is 17, while the regex count is still 7.

I've got some ready code to get a list of words in a string:
(extension methods, must be in a static class)
/// <summary>
/// Gets a list of words in the text. A word is any string sequence between two separators.
/// No word is added if separators are consecutive (would mean zero length words).
/// </summary>
public static List<string> GetWords(this string Text, char WordSeparator)
{
List<int> SeparatorIndices = Text.IndicesOf(WordSeparator.ToString(), true);
int LastIndexNext = 0;
List<string> Result = new List<string>();
foreach (int index in SeparatorIndices)
{
int WordLen = index - LastIndexNext;
if (WordLen > 0)
{
Result.Add(Text.Substring(LastIndexNext, WordLen));
}
LastIndexNext = index + 1;
}
return Result;
}
/// <summary>
/// returns all indices of the occurrences of a passed string in this string.
/// </summary>
public static List<int> IndicesOf(this string Text, string ToFind, bool IgnoreCase)
{
int Index = -1;
List<int> Result = new List<int>();
string T, F;
if (IgnoreCase)
{
T = Text.ToUpperInvariant();
F = ToFind.ToUpperInvariant();
}
else
{
T = Text;
F = ToFind;
}
do
{
Index = T.IndexOf(F, Index + 1);
Result.Add(Index);
}
while (Index != -1);
Result.RemoveAt(Result.Count - 1);
return Result;
}
/// <summary>
/// Implemented - returns all the strings in uppercase invariant.
/// </summary>
public static string[] ToUpperAll(this string[] Strings)
{
string[] Result = new string[Strings.Length];
Strings.ForEachIndex(i => Result[i] = Strings[i].ToUpperInvariant());
return Result;
}

In addition to Tim's entry, in case you have padding on either side, or multiple spaces beside each other:
Int32 words = somestring.Split( // your string
new[]{ ' ' }, // break apart by spaces
StringSplitOptions.RemoveEmptyEntries // remove empties (double spaces)
).Length; // number of "words" remaining

using namespace;
namespace Application;
class classname
{
static void Main(string[] args)
{
int count;
string name = "I am the student";
count = name.Split(' ').Length;
Console.WriteLine("The count is " +count);
Console.ReadLine();
}
}

if you need whitespace count only try this.
string myString="I Love Programming";
var strArray=myString.Split(new char[] { ' ' });
int countSpace=strArray.Length-1;

How about indirectly?
int countl = 0, countt = 0, count = 0;
foreach(char c in str)
{
countt++;
if (char.IsLetter(c))
{
countl++;
}
}
count = countt - countl;
Console.WriteLine("No. of spaces are: "+count);

How can I search through a string in C# and replace areas bounded by a pattern?

We tried a few solutions now that try and use XML parsers. All fail because the strings are not always 100% valid XML. Here's our problem.
We have strings that look like this:
var a = "this is a testxxx of my data yxxx and of these xxx parts yxxx";
var b = "hello testxxx world yxxx ";
"this is a testxxx3yxxx and of these xxx1yxxx";
"hello testxxx1yxxx ";
The key here is that we want to do something to the data between xxx and yxxx. In the example above I would need a function that counts words and replaces the strings with a word count.
Is there a way we can process the string a and apply a function to change the data that's between the xxx and yxxx? Any function right now as we're just trying to get an idea of how to code this.

You can use Split method:
var parts = a.Split(new[] {"xxx", "yxxx"}, StringSplitOptions.None)
.Select((s, index) =>
{
string s1 = index%2 == 1 ? string.Format("{0}{2}{1}", "xxx", "yxxx", s + "1") : s;
return s1;
});
var result = string.Join("", parts);

If it always going to xxx and yxxx, you can use regex as suggested.
var stringBuilder = new StringBuilder();
Regex regex = new Regex("xxx(.*?)yxxx");
var splitGroups = Regex.Match(a);
foreach(var group in splitGroups)
{
var value = splitGroupsCopy[i];
// do something to value and then append it to string builder
stringBuilder.Append(string.Format("{0}{1}{2}", "xxx", value, "yxxx"));
}
I suppose this is as basic as it gets.

Using Regex.Replace will replace all the matches with your choice of text, something like this:
Regex rgx = new Regex("xxx.+yxxx");
string cleaned = rgx.Replace(a, "replacementtext");

This code will process each of the parts delimited by "xxx". It preserves the "xxx" separators. If you do not want to preserve the "xxx" separators, remove the two lines that say "result.Append(separator);".
Given:
"this is a testxxx of my data yxxx and there are many of these xxx parts yxxx"
It prints:
"this is a testxxx>> of my data y<<xxx and there are many of these xxx>> parts y<<xxx"
I'm assuming that's the kind of thing you want. Add your own processing to "processPart()".
using System;
using System.Text;
namespace ConsoleApplication1
{
internal class Program
{
private static void Main(string[] args)
{
string text = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
string separator = "xxx";
var result = new StringBuilder();
int index = 0;
while (true)
{
int start = text.IndexOf(separator, index);
if (start < 0)
{
result.Append(text.Substring(index));
break;
}
result.Append(text.Substring(index, start - index));
int end = text.IndexOf(separator, start + separator.Length);
if (end < 0)
{
throw new InvalidOperationException("Unbalanced separators.");
}
start += separator.Length;
result.Append(separator);
result.Append(processPart(text.Substring(start, end-start)));
result.Append(separator);
index = end + separator.Length;
}
Console.WriteLine(result);
}
private static string processPart(string part)
{
return ">>" + part + "<<";
}
}
}
[EDIT] Here's the code amended to work with two different separators:
using System;
using System.Text;
namespace ConsoleApplication1
{
internal class Program
{
private static void Main(string[] args)
{
string text = "this is a test<pre> of my data y</pre> and there are many of these <pre> parts y</pre>";
string separator1 = "<pre>";
string separator2 = "</pre>";
var result = new StringBuilder();
int index = 0;
while (true)
{
int start = text.IndexOf(separator1, index);
if (start < 0)
{
result.Append(text.Substring(index));
break;
}
result.Append(text.Substring(index, start - index));
int end = text.IndexOf(separator2, start + separator1.Length);
if (end < 0)
{
throw new InvalidOperationException("Unbalanced separators.");
}
start += separator1.Length;
result.Append(separator1);
result.Append(processPart(text.Substring(start, end-start)));
result.Append(separator2);
index = end + separator2.Length;
}
Console.WriteLine(result);
}
private static string processPart(string part)
{
return "|" + part + "|";
}
}
}

The indexOf() function will return to you the index of the first occurrence of a given substring.
(My indices might be a bit off, but) I would suggest doing something like this:
var searchme = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
var startindex= searchme.indexOf("xxx");
var endindex = searchme.indexOf("yxxx") + 3; //added 3 to find the index of the last 'x' instead of the index of the 'y' character
var stringpiece = searchme.substring(startindex, endindex - startindex);
and you can repeat that while startindex != -1
Like I said, the indices might be slightly off, you might have to add a +1 or -1 somewhere, but this will get you along nicely (I think).
Here is a little sample program that counts chars instead of words. But you should just need to change the processor function.
var a = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
a = ProcessString(a, CountChars);
string CountChars(string a)
{
return a.Length.ToString();
}
string ProcessString(string a, Func<string, string> processor)
{
int idx_start, idx_end = -4;
while ((idx_start = a.IndexOf("xxx", idx_end + 4)) >= 0)
{
idx_end = a.IndexOf("yxxx", idx_start + 3);
if (idx_end < 0)
break;
var string_in_between = a.Substring(idx_start + 3, idx_end - idx_start - 3);
var newString = processor(string_in_between);
a = a.Substring(0, idx_start + 3) + newString + a.Substring(idx_end, a.Length - idx_end);
idx_end -= string_in_between.Length - newString.Length;
}
return a;
}

I would use Regex Groups:
Here my solution to get the parts in the string:
private static IEnumerable<string> GetParts( string searchFor, string begin, string end ) {
string exp = string.Format("({0}(?<searchedPart>.+?){1})+", begin, end);
Regex regex = new Regex(exp);
MatchCollection matchCollection = regex.Matches(searchFor);
foreach (Match match in matchCollection) {
Group #group = match.Groups["searchedPart"];
yield return #group.ToString();
}
}
you can use it like to get the parts:
string a = "this is a testxxx of my data yxxx and there are many of these xxx parts yxxx";
IEnumerable<string> parts = GetParts(a, "xxx", "yxxx");
To replace the parts in the original String you can use the Regex Group to determine Length and StartPosition (#group.Index, #group.Length).

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

sub-strings replacements according to some mapping - c#

Related

Method that takes a message and index, creates a substring using the index

ForEach Substring Trim

Get a range from two integers

Count words and spaces in string C#

How can I search through a string in C# and replace areas bounded by a pattern?

Categories

Resources