UTF-16 safe substring in C# .NET - c#

I want to get a substring of a given length, say 150. However, I want to make sure I don't cut the string in the middle of a Unicode character.
e.g. see the following code:
var str = "Hello😀 world!";
var substr = str.Substring(0, 6);
Here substr is an invalid string since the smiley character is cut in half.
Instead I want a function that does as follows:
var str = "Hello😀 world!";
var substr = str.UnicodeSafeSubstring(0, 6);
where substr contains "Hello😀"
For reference, here is how I would do it in Objective-C using rangeOfComposedCharacterSequencesForRange
NSString* str = @"Hello😀 world!";
NSRange range = [str rangeOfComposedCharacterSequencesForRange:NSMakeRange(0, 6)];
NSString* substr = [str substringWithRange:range];
What is the equivalent code in C#?

Looks like you're looking to split a string on graphemes, that is on single displayed characters.
In that case, you have a handy method: StringInfo.SubstringByTextElements:
var str = "Hello😀 world!";
var substr = new StringInfo(str).SubstringByTextElements(0, 6);
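To make the difference concrete, here is a minimal sketch comparing the two calls on the string from the question. It only assumes that the emoji is stored as a single surrogate pair (two UTF-16 code units); the variable names are illustrative.

```csharp
using System;
using System.Globalization;

string str = "Hello😀 world!";

// Substring counts UTF-16 code units, so index 6 falls inside the emoji's surrogate pair.
string cut = str.Substring(0, 6);
bool endsInLoneSurrogate = char.IsHighSurrogate(cut[cut.Length - 1]);

// SubstringByTextElements counts text elements, so the emoji is kept whole.
string safe = new StringInfo(str).SubstringByTextElements(0, 6);

Console.WriteLine(endsInLoneSurrogate); // True
Console.WriteLine(safe.Length);         // 7 (five letters plus a two-unit emoji)
```

Note that the length argument here is a count of text elements, not of UTF-16 code units, so the result can be longer than 6 chars.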

This should return the maximal substring starting at index startIndex, with a length of up to length, made only of "complete" graphemes. So initial/final split surrogate pairs will be removed, initial combining marks will be removed, and final characters missing their combining marks will be removed.
Note that this probably isn't what you asked for. You seem to want to use graphemes as the unit of measure (or perhaps you want to include the last grapheme even if it makes the result longer than the length parameter).
public static class StringEx
{
public static string UnicodeSafeSubstring(this string str, int startIndex, int length)
{
if (str == null)
{
throw new ArgumentNullException("str");
}
if (startIndex < 0 || startIndex > str.Length)
{
throw new ArgumentOutOfRangeException("startIndex");
}
if (length < 0)
{
throw new ArgumentOutOfRangeException("length");
}
if (startIndex + length > str.Length)
{
throw new ArgumentOutOfRangeException("length");
}
if (length == 0)
{
return string.Empty;
}
var sb = new StringBuilder(length);
int end = startIndex + length;
var enumerator = StringInfo.GetTextElementEnumerator(str, startIndex);
while (enumerator.MoveNext())
{
string grapheme = enumerator.GetTextElement();
startIndex += grapheme.Length;
if (startIndex > end)
{
break;
}
// Skip initial Low Surrogates/Combining Marks
if (sb.Length == 0)
{
if (char.IsLowSurrogate(grapheme[0]))
{
continue;
}
UnicodeCategory cat = char.GetUnicodeCategory(grapheme, 0);
if (cat == UnicodeCategory.NonSpacingMark || cat == UnicodeCategory.SpacingCombiningMark || cat == UnicodeCategory.EnclosingMark)
{
continue;
}
}
sb.Append(grapheme);
if (startIndex == end)
{
break;
}
}
return sb.ToString();
}
}
Variant that will simply include "extra" characters at the end of the substring, if necessary to make whole a grapheme:
public static class StringEx
{
public static string UnicodeSafeSubstring(this string str, int startIndex, int length)
{
if (str == null)
{
throw new ArgumentNullException("str");
}
if (startIndex < 0 || startIndex > str.Length)
{
throw new ArgumentOutOfRangeException("startIndex");
}
if (length < 0)
{
throw new ArgumentOutOfRangeException("length");
}
if (startIndex + length > str.Length)
{
throw new ArgumentOutOfRangeException("length");
}
if (length == 0)
{
return string.Empty;
}
var sb = new StringBuilder(length);
int end = startIndex + length;
var enumerator = StringInfo.GetTextElementEnumerator(str, startIndex);
while (enumerator.MoveNext())
{
if (startIndex >= end)
{
break;
}
string grapheme = enumerator.GetTextElement();
startIndex += grapheme.Length;
// Skip initial Low Surrogates/Combining Marks
if (sb.Length == 0)
{
if (char.IsLowSurrogate(grapheme[0]))
{
continue;
}
UnicodeCategory cat = char.GetUnicodeCategory(grapheme, 0);
if (cat == UnicodeCategory.NonSpacingMark || cat == UnicodeCategory.SpacingCombiningMark || cat == UnicodeCategory.EnclosingMark)
{
continue;
}
}
sb.Append(grapheme);
}
return sb.ToString();
}
}
This will return what you asked for: "Hello😀 world!".UnicodeSafeSubstring(0, 6) == "Hello😀".
Note: both of these solutions rely on StringInfo.GetTextElementEnumerator. This method didn't work as expected prior to a fix in .NET 5, so on an earlier version of .NET it will split more complex multi-character emoji.

Here is a simple implementation for truncate (startIndex = 0):
string truncatedStr = (str.Length > maxLength)
? str.Substring(0, maxLength - (char.IsLowSurrogate(str[maxLength]) ? 1 : 0))
: str;
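The one-liner above can be wrapped in a small helper. The following is only a sketch: the name TruncateSafe is hypothetical, and it guards against splitting a surrogate pair only, not against separating a base character from its combining marks.

```csharp
using System;

// Hypothetical helper: truncate to at most maxLength UTF-16 code units
// without leaving a lone high surrogate at the end.
static string TruncateSafe(string str, int maxLength)
{
    if (str == null) throw new ArgumentNullException(nameof(str));
    if (maxLength < 0) throw new ArgumentOutOfRangeException(nameof(maxLength));
    if (str.Length <= maxLength) return str;

    // If the cut would fall between the two halves of a surrogate pair, back off by one.
    if (maxLength > 0
        && char.IsHighSurrogate(str[maxLength - 1])
        && char.IsLowSurrogate(str[maxLength]))
    {
        maxLength--;
    }
    return str.Substring(0, maxLength);
}

Console.WriteLine(TruncateSafe("Hello😀 world!", 6)); // Hello
Console.WriteLine(TruncateSafe("Hello😀 world!", 7)); // Hello😀
```

Unlike the variant that includes the whole grapheme, this one never returns more than maxLength code units, so the result may be one unit shorter than requested.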

Related

IPv6 Abbreviation(zero blocks compression) logic. I'm using c#

This is a complete un compressed IP address 2001:0008:0000:CD30:0000:0000:0000:0101
I need to compress it like this
2001:8:0:CD30::101
But i was only able to compress the zeroes in blocks like this
2001:8:0:CD30:0:0:0:101
using this code
string output = "";
string a = textBox1.Text;
if (a.Length != 39 )
MessageBox.Show("Invalid IP please enter the IPv6 IP in this format 6cd9:a87a:ad46:0005:ad40:0000:5698:8ab8");
else
{
for (int i = 0; i < a.Length; i++)
{
if ((a[i] >= '1' && a[i] <= '9') || (Char.ToLower(a[i]) >= 'a' && Char.ToLower(a[i]) <= 'f') || ((i + 1) % 5 == 0 && a[i] == ':'))
{
output = output + a[i];
}
else if ((a[i]=='0' && a[i-1]==':') || (a[i]=='0' && a[i-1]=='0' && a[i-2]==':') || (a[i]=='0' && a[i-1]=='0' && a[i-2]=='0' && a[i-3]==':'))
{
}
else if (a[i] == '0')
{
output = output + a[i];
}
else
{
MessageBox.Show("Invalid IP please enter the IPv6 IP in this format 6cd9:a87a:ad46:0005:ad40:0000:5698:8ab8");
}
}
textBox2.Text = output;
}
I'm using C#, but I only need the programming logic for how whole blocks of zeros can be removed. The problem is that there could be more than one run of all-zero blocks in an address, but only one should be abbreviated.
This was far trickier than I expected, but here is a way to do it with regular expressions:
private static string Compress(string ip)
{
var removedExtraZeros = ip.Replace("0000","*");
//2001:0008:*:CD30:*:*:*:0101
var blocks = ip.Split(':');
var regex = new Regex(":0+");
removedExtraZeros = regex.Replace(removedExtraZeros, ":");
//2001:8:*:CD30:*:*:*:101
var regex2 = new Regex(":\\*:\\*(:\\*)+:");
removedExtraZeros = regex2.Replace(removedExtraZeros, "::");
//2001:8:*:CD30::101
return removedExtraZeros.Replace("*", "0");
}
If you would like to achieve the same result without using Regex:
public string Compress(string value)
{
var values = value.Split(':');
var ints = values.Select(i => int.Parse(i, System.Globalization.NumberStyles.HexNumber));
var result = ints.Select(Conversion.Hex);
return string.Join(":", result);
}
I didn't spend time on micro-optimizations (stackalloc, spans, etc.), but this gives the idea. For reference on optimization, you can look at the implementation of IPAddress.Parse in .NET Core. Note that IPAddress.Parse gives a different result than the example above:
Compress -> 2001:8:0:CD30:0:0:0:101
IPAddress.Parse -> 2001:8:0:cd30::101
This could be "cleaned" up by moving into some object and other ideas.
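To get all the way to the form the question asks for (2001:8:0:CD30::101), the longest run of zero groups also has to be collapsed into "::". Below is a minimal sketch of that logic. The name CompressIPv6 is hypothetical, the input is assumed to be a full, uncompressed address with eight groups, and ties are resolved in favour of the first run, which follows the usual convention from RFC 5952.

```csharp
using System;
using System.Linq;

// Hypothetical helper: expects a full, uncompressed address with eight colon-separated groups.
static string CompressIPv6(string ip)
{
    // Strip leading zeros from each group ("0008" -> "8", "0000" -> "0").
    string[] groups = ip.Split(':')
        .Select(g => g.TrimStart('0'))
        .Select(g => g.Length == 0 ? "0" : g)
        .ToArray();
    if (groups.Length != 8) throw new FormatException("Expected eight groups.");

    // Find the longest run of "0" groups (the first one wins on ties).
    int bestStart = -1, bestLength = 0;
    for (int i = 0; i < groups.Length; )
    {
        if (groups[i] != "0") { i++; continue; }
        int start = i;
        while (i < groups.Length && groups[i] == "0") i++;
        if (i - start > bestLength) { bestStart = start; bestLength = i - start; }
    }

    // A single zero group stays "0"; only runs of two or more collapse to "::".
    if (bestLength < 2) return string.Join(":", groups);

    string head = string.Join(":", groups.Take(bestStart));
    string tail = string.Join(":", groups.Skip(bestStart + bestLength));
    return head + "::" + tail;
}

Console.WriteLine(CompressIPv6("2001:0008:0000:CD30:0000:0000:0000:0101")); // 2001:8:0:CD30::101
```

Only one run is ever replaced, which addresses the concern in the question that an address may contain several runs of zero blocks.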
Edit:
After chatting with one of my colleagues, I spent some time writing an "optimized" version of this. I haven't spent time cleaning up the code, so maybe one of the future edits will be even cleaner.
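public string Compress(string value)
{
Span<char> chars = stackalloc char[value.Length];
const char zero = '0';
const char colon = ':';
int index = 0; // read position in value
int written = 0; // write position in chars
int positionInSegment = 0;
bool startsWithZero;
while (index < value.Length)
{
startsWithZero = value[index] == zero && positionInSegment == 0;
positionInSegment++;
if (startsWithZero)
{
// A zero that ends the input is the only digit left in its segment, so keep it.
if (index == value.Length - 1)
{
chars[written++] = zero;
break;
}
// A zero directly before a colon is the last digit of an all-zero segment, so keep it.
if (value[index + 1] == colon)
{
chars[written++] = zero;
positionInSegment = 0;
index++;
continue;
}
// Otherwise it is a leading zero: skip it and stay at the start of the segment.
positionInSegment = 0;
index++;
continue;
}
if (value[index] == colon)
{
positionInSegment = 0;
chars[written++] = colon;
index++;
continue;
}
chars[written++] = value[index];
index++;
}
// Only the written part of the buffer belongs to the result.
return chars.Slice(0, written).ToString();
}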
I have also created a public gist for future references:
https://gist.github.com/DeanMilojevic/7b4f1d060ce8cfa191592694b11234d7

Decoding Bitcoin Base58 address to byte array

I'm trying to decode a Bitcoin address from a Base58 string into a byte array. To do that, I rewrote the original function from the Satoshi repository (https://github.com/bitcoin/bitcoin/blob/master/src/base58.cpp), written in C++, in C# (which I'm using).
Original code
static const char* pszBase58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
bool DecodeBase58(const char *psz, std::vector<unsigned char>& vch) {
// Skip leading spaces.
while (*psz && isspace(*psz))
psz++;
// Skip and count leading '1's.
int zeroes = 0;
while (*psz == '1') {
zeroes++;
psz++;
}
// Allocate enough space in big-endian base256 representation.
std::vector<unsigned char> b256(strlen(psz) * 733 / 1000 + 1); // log(58) / log(256), rounded up.
// Process the characters.
while (*psz && !isspace(*psz)) {
// Decode base58 character
const char *ch = strchr(pszBase58, *psz);
if (ch == NULL)
return false;
// Apply "b256 = b256 * 58 + ch".
int carry = ch - pszBase58;
for (std::vector<unsigned char>::reverse_iterator it = b256.rbegin(); it != b256.rend(); it++) {
carry += 58 * (*it);
*it = carry % 256;
carry /= 256;
}
assert(carry == 0);
psz++;
}
// Skip trailing spaces.
while (isspace(*psz))
psz++;
if (*psz != 0)
return false;
// Skip leading zeroes in b256.
std::vector<unsigned char>::iterator it = b256.begin();
while (it != b256.end() && *it == 0)
it++;
// Copy result into output vector.
vch.reserve(zeroes + (b256.end() - it));
vch.assign(zeroes, 0x00);
while (it != b256.end())
vch.push_back(*(it++));
return true;
}
My rewritten C# version
private static string Base58characters = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
public static bool Decode(string source, ref byte[] destination)
{
int i = 0;
while (i < source.Length)
{
if (source[i] == 0 || !Char.IsWhiteSpace(source[i]))
{
break;
}
i++;
}
int zeros = 0;
while (source[i] == '1')
{
zeros++;
i++;
}
byte[] b256 = new byte[(source.Length - i) * 733 / 1000 + 1];
while (i < source.Length && !Char.IsWhiteSpace(source[i]))
{
int ch = Base58characters.IndexOf(source[i]);
if (ch == -1) //null
{
return false;
}
int carry = Base58characters.IndexOf(source[i]);
for (int k = b256.Length - 1; k > 0; k--)
{
carry += 58 * b256[k];
b256[k] = (byte)(carry % 256);
carry /= 256;
}
i++;
}
while (i < source.Length && Char.IsWhiteSpace(source[i]))
{
i++;
}
if (i != source.Length)
{
return false;
}
int j = 0;
while (j < b256.Length && b256[j] == 0)
{
j++;
}
destination = new byte[zeros + (b256.Length - j)];
for (int kk = 0; kk < destination.Length; kk++)
{
if (kk < zeros)
{
destination[kk] = 0x00;
}
else
{
destination[kk] = b256[j++];
}
}
return true;
}
Function that I'm using for converting from byte-array to HexString
public static string ByteArrayToHexString(byte[] source)
{
return BitConverter.ToString(source).Replace("-", "");
}
To test whether everything works correctly, I used the test cases found online here (https://github.com/ThePiachu/Bitcoin-Unit-Tests/blob/master/Address/Address%20Generation%20Test%201.txt). The good news is that 97% of the tests pass, but three of them produce a small error and I do not know where it comes from. So my request is to point out what could go wrong for these tests, or where I made an error in the rewrite. Thank you in advance.
The test cases where errors occur are 1, 21 and 25.
1.
Input:
16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM
Output:
000966776006953D5567439E5E39F86A0D273BEED61967F6
Should be:
00010966776006953D5567439E5E39F86A0D273BEED61967F6
21.
Input:
1v3VUYGogXD7S1E8kipahj7QXgC568dz1
Output:
0008201462985DF5255E4A6C9D493C932FAC98EF791E2F22
Should be:
000A08201462985DF5255E4A6C9D493C932FAC98EF791E2F22
25.
Input:
1axVFjCkMWDFCHjQHf99AsszXTuzxLxxg
Output:
006C0B8995C7464E89F6760900EA6978DF18157388421561
Should be:
00066C0B8995C7464E89F6760900EA6978DF18157388421561
In your for-loop:
for (int k = b256.Length - 1; k > 0; k--)
The loop condition should be k >= 0 so that you don't skip the first byte in b256.
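For completeness, here is a compact sketch of the decoder with that loop condition corrected. The name DecodeBase58 is hypothetical, and as a simplification it assumes the input has no surrounding whitespace and throws on invalid characters instead of returning false.

```csharp
using System;

// Hypothetical helper: decodes a Base58 string that has no surrounding whitespace.
static byte[] DecodeBase58(string source)
{
    const string alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    // Each leading '1' stands for one leading zero byte.
    int i = 0, zeros = 0;
    while (i < source.Length && source[i] == '1') { zeros++; i++; }

    byte[] b256 = new byte[(source.Length - i) * 733 / 1000 + 1];
    for (; i < source.Length; i++)
    {
        int carry = alphabet.IndexOf(source[i]);
        if (carry < 0) throw new FormatException("Invalid Base58 character.");

        // b256 = b256 * 58 + digit, visiting every byte including index 0.
        for (int k = b256.Length - 1; k >= 0; k--)
        {
            carry += 58 * b256[k];
            b256[k] = (byte)(carry % 256);
            carry /= 256;
        }
    }

    // Skip the unused high-order bytes of the buffer.
    int j = 0;
    while (j < b256.Length && b256[j] == 0) j++;

    byte[] result = new byte[zeros + b256.Length - j];
    Array.Copy(b256, j, result, zeros, b256.Length - j);
    return result;
}

byte[] decoded = DecodeBase58("16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM");
Console.WriteLine(BitConverter.ToString(decoded).Replace("-", ""));
// Expected value according to test case 1 in the question:
// 00010966776006953D5567439E5E39F86A0D273BEED61967F6
```

With k > 0 the most significant byte of b256 is never updated, which is why the failing cases in the question are each missing exactly one byte after the leading 00.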

display full text in a label in c#

I have a label control in a Windows Forms application and I want to display the full text in the label. The conditions are:
if the text length exceeds 32 characters, it should continue on a new line;
if possible, split on whole words, without a hyphen (-).
So far I have reached the code below:
private void Form1_Load(object sender, EventArgs e)
{
string strtext = "This is a very long text. this will come in one line.This is a very long text. this will come in one line.";
if (strtext.Length > 32)
{
IEnumerable<string> strEnum = Split(strtext, 32);
label1.Text =string.Join("-\n", strEnum);
}
}
static IEnumerable<string> Split(string str, int chunkSize)
{
return Enumerable.Range(0, str.Length / chunkSize)
.Select(i => str.Substring(i * chunkSize, chunkSize));
}
but issue is that the last line is not displaying entirely because its splitting by 32 character.
Is there another way to achieve this?
I don't know if you will accept an answer that doesn't use linq, but this is simple:
string SplitOnWholeWord(string toSplit, int maxLineLength)
{
StringBuilder sb = new StringBuilder();
string[] parts = toSplit.Split();
string line = string.Empty;
foreach(string s in parts)
{
if(s.Length > maxLineLength)
{
string p = s;
while(p.Length > maxLineLength)
{
int addedChars = maxLineLength - line.Length;
line = string.Join(" ", line, p.Substring(0, addedChars));
sb.AppendLine(line);
p = p.Substring(addedChars);
line = string.Empty;
}
line = p;
}
else
{
if(line.Length + s.Length > maxLineLength)
{
sb.AppendLine(line);
line = string.Empty;
}
line = (line.Length > 0 ? string.Join(" ", line, s) : s);
}
}
sb.Append(line.Trim());
return sb.ToString();
}
Call with
string result = SplitOnWholeWord(strtext, 32);
It is possible to transform this in an extension method easily:
Put the code above in a separate file and create a static class
public static class StringExtensions
{
public static string SplitOnWholeWord(this string toSplit, int maxLineLength)
{
// same code as above.....
}
}
and call it in this way:
string result = strtext.SplitOnWholeWord(32);
Try this..
string strtext = "This is a very long text. this will come in one line.This is a very long text. this will come in one line.";
if (strtext.Length > 32)
{
IEnumerable<string> strEnum = Split(strtext, 32);
string a = string.Join("-\n", strEnum);
if ((strtext.Length % 32)>0)
{
string lastpart = strtext.Substring(((strtext.Length / 32) * 32));
a = a + "-\n" + lastpart;
}
label1.Text=a;
}
Hope it helps :)
Throwing my answer into the mix. This works:
static IEnumerable<string> Split(string str, int chunkSize) {
int difference = (str.Length % chunkSize);
int count = str.Length / chunkSize;
return Enumerable.Range(0, count + 1)
.Select(i => str.Substring(i * chunkSize, i == count ? difference : chunkSize));
}
You have to take the ceiling of the result of the following calculation:
str.Length / chunkSize
Right now it returns only the integer part of the result and ignores any remainder. If str has 120 characters and the chunk size is 50, the calculation gives 2, which you then use as the number of chunks; that is wrong, because you need 3 here.
To make the division round up, add chunkSize - 1 to str.Length before dividing.
Use the following code:
static IEnumerable<string> Split(string str, int chunkSize)
{
return Enumerable.Range(0, (str.Length+chunkSize-1) / chunkSize)
.Select(i => str.Substring(i * chunkSize, (str.Length-(i*chunkSize))>=chunkSize? chunkSize:str.Length-(i*chunkSize)));
}
You could try
static IEnumerable<string> Split(string str, int chunkSize)
{
var count = str.Length / chunkSize;
var result=Enumerable.Range(0, count)
.Select(i => str.Substring(i * chunkSize, chunkSize));
var end = count * chunkSize;
if (end < str.Length) {
result = result.Concat(new[] { str.Substring(end, str.Length - end) });
}
return result;
}
or
static IEnumerable<string> Split(string str, int chunkSize)
{
for (var i=0; i<str.Length; i+=chunkSize) {
yield return str.Substring(i, Math.Min(str.Length-i, chunkSize));
}
}
EDIT: Justified split, after comment
static IEnumerable<string> split(string str,int chunkSize) {
var words=str.Split(' ');
var line=new StringBuilder(chunkSize);
for (var i=0; i<words.Length;i++) {
var word=words[i];
if (line.Length + word.Length + 1 > chunkSize) {
if (line.Length == 0) {
for(var x=0;x<word.Length/chunkSize;x++) {
yield return word.Substring(x*chunkSize,chunkSize);
}
var remainder = word.Length % chunkSize;
if (remainder>0) {
line.Append(word.Substring(word.Length-remainder, remainder));
}
} else {
yield return line.ToString();
line.Clear();
i--; // Force reprocessing this word
}
} else {
if (line.Length>0) {
line.Append(" ");
}
line.Append(word);
}
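}
// Emit whatever is left in the buffer as the final line.
if (line.Length > 0) {
yield return line.ToString();
}
}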
don't forget to change your string.Join("-\n") to be string.Join("\n")

Mask out part first 12 characters of string with *?

How can I take the value 123456789012345 or 1234567890123456 and turn it into:
************2345 and ************3456
The difference between the strings above is that one contains 15 digits and the other contains 16.
I have tried the following, but it does not keep the last 4 digits of the 15-digit number. No matter what the length of the string is, be it 13, 14, 15, or 16, I want to mask all leading digits with a * but keep the last 4. Here is what I have tried:
String.Format("{0}{1}", "************", str.Substring(11, str.Length - 12))
Something like this:
string s = "1234567890123"; // example
string result = s.Substring(s.Length - 4).PadLeft(s.Length, '*');
This will mask all but the last four characters of the string. It assumes that the source string is at least 4 characters long.
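If shorter inputs are possible, the same idea can be guarded with Math.Min. This is a minimal sketch; the helper name MaskAllButLast and its default parameters are assumptions made for illustration.

```csharp
using System;

// Hypothetical helper: keep the last `visible` characters and mask the rest.
static string MaskAllButLast(string s, int visible = 4, char mask = '*')
{
    if (s == null) throw new ArgumentNullException(nameof(s));
    if (visible < 0) throw new ArgumentOutOfRangeException(nameof(visible));

    // For inputs shorter than `visible`, nothing is masked.
    int keep = Math.Min(visible, s.Length);
    return s.Substring(s.Length - keep).PadLeft(s.Length, mask);
}

Console.WriteLine(MaskAllButLast("123456789012345"));  // 11 asterisks followed by 2345
Console.WriteLine(MaskAllButLast("1234567890123456")); // 12 asterisks followed by 3456
Console.WriteLine(MaskAllButLast("123"));              // 123
```

The result always has the same length as the input, so a 15-digit number gets 11 asterisks rather than 12.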
using System;
class Program
{
static void Main()
{
var str = "1234567890123456";
if (str.Length > 4)
{
Console.WriteLine(
string.Concat(
"".PadLeft(12, '*'),
str.Substring(str.Length - 4)
)
);
}
else
{
Console.WriteLine(str);
}
}
}
Easiest way: Create an extension method to extract the last four digits. Use that in your String.Format call.
For example:
public static string LastFour(this string value)
{
if (string.IsNullOrEmpty(value) || value.Length < 4)
{
return "0000";
}
return value.Substring(value.Length - 4, 4);
}
In your code:
String.Format("{0}{1}", "************", str.LastFour());
In my opinion, this leads to more readable code, and it's reusable.
EDIT: Perhaps not the easiest way, but an alternative way that may produce more maintainable results. <shrug/>
Try this:
var maskSize = ccDigits.Length - 4;
var mask = new string('*', maskSize) + ccDigits.Substring(maskSize);
LINQ:
char maskBy = '*';
string input = "123456789012345";
int count = input.Length <= 4 ? 0 : input.Length - 4;
string output = new string(input.Select((c, i) => i < count ? maskBy : c).ToArray());
static private String MaskInput(String input, int charactersToShowAtEnd)
{
if (input.Length < charactersToShowAtEnd)
{
charactersToShowAtEnd = input.Length;
}
String endCharacters = input.Substring(input.Length - charactersToShowAtEnd);
return String.Format(
"{0}{1}",
"".PadLeft(input.Length - charactersToShowAtEnd, '*'),
endCharacters
);
}
Adjust the function header as required, call with:
MaskInput("yourInputHere", 4);
private string MaskDigits(string input)
{
//take first 6 characters
string firstPart = input.Substring(0, 6);
//take last 4 characters
int len = input.Length;
string lastPart = input.Substring(len - 4, 4);
//take the middle part (****)
int middlePartLenght = len - (firstPart.Length + lastPart.Length);
string middlePart = new String('*', middlePartLenght);
return firstPart + middlePart + lastPart;
}
MaskDigits("1234567890123456");
// output : "123456******3456"
Try the following:
private string MaskString(string s)
{
int NUM_ASTERISKS = 4;
if (s.Length < NUM_ASTERISKS) return s;
int asterisks = s.Length - NUM_ASTERISKS;
string result = new string('*', asterisks);
result += s.Substring(s.Length - NUM_ASTERISKS);
return result;
}
Regex with a match evaluator will do the job
string filterCC(string source) {
var x=new Regex(@"^\d+(?=\d{4}$)");
return x.Replace(source,match => new String('*',match.Value.Length));
}
This will match any number of digits followed by 4 digits and the end (it won't include the 4 digits in the replace). The replace function will replace the match with a string of * of equal length.
This has the additional benefit that you could use it as a validation algorithm too. Change the first + to {11,12} to make it match a total of 15 or 16 chars and then you can use x.IsMatch to determine validity.
EDIT
Alternatively if you always want a 16 char result just use
return x.Replace(source,new String('*',12));
// "123456789".MaskFront results in "****56789"
public static string MaskFront(this string str, int len, char c)
{
var strArray = str.ToCharArray();
for (var i = 0; i < len; i++)
{
if(i < strArray.Length)
{
strArray[i] = c;
}
else
{
break;
}
}
return string.Join("", strArray);
}
// "123456789".MaskBack results in "12345****"
public static string MaskBack(this string str, int len, char c)
{
var strArray = str.ToCharArray();
var tracker = strArray.Length - 1;
for (var i = 0; i < len; i++)
{
if (tracker > -1)
{
strArray[tracker] = c;
tracker--;
}
else
{
break;
}
}
return string.Join("", strArray);
}
Try this out:
static string Mask(string str)
{
if (str.Length <= 4) return str;
Regex rgx = new Regex(@"(.*?)(\d{4})$");
string result = String.Empty;
if (rgx.IsMatch(str))
{
for (int i = 0; i < rgx.Matches(str)[0].Groups[1].Length; i++)
result += "*";
result += rgx.Matches(str)[0].Groups[2];
return result;
}
return str;
}
Mask from start and from end with sending char
public static string Maskwith(this string value, int fromStart, int fromEnd, char ch)
{
return (value?.Length >= fromStart + fromEnd) ?
string.Concat(Enumerable.Repeat(ch, fromStart)) + value.Substring(fromStart, value.Length - (fromStart + fromEnd)) + string.Concat(Enumerable.Repeat(ch, fromEnd))
: "";
} //Console.WriteLine("mytestmask".Maskwith(2,3,'*')); **testm***
show chars from start and from end by passing value and mask the middle
public static string MasktheMiddle(this string value, int visibleCharLength, char ch)
{
if (value?.Length <= (visibleCharLength * 2))
return string.Concat(Enumerable.Repeat(ch,value.Length));
else
return value.Substring(0, visibleCharLength) + string.Concat(Enumerable.Repeat(ch, value.Length - (visibleCharLength * 2))) + value.Substring(value.Length - visibleCharLength);
} //Console.WriteLine("mytestmask".MasktheMiddle(2,'*')); Result: my******sk
How can I take the value 123456789012345 or 1234567890123456 and turn it into:
************2345 and ************3456
one more way to do this:
var result = new string('*', value.Length - 4) + new string(value.Skip(value.Length - 4).ToArray());
// or using string.Join
An extension method using C# 8's index and range:
public static string MaskStart(this string input, int showNumChars, char maskChar = '*') =>
input[^Math.Min(input.Length, showNumChars)..]
.PadLeft(input.Length, maskChar);
A simple way
string s = "1234567890123"; // example
int l = s.Length;
s = s.Substring(l - 4);
string r = new string('*', l - 4);
r = r + s;

Iterating through the Alphabet - C# a-caz

I have a question about iterate through the Alphabet.
I would like to have a loop that begins with "a" and ends with "z". After that, the loop begins with "aa" and counts to "az", then goes from "ba" to "bz", and so on.
Does anybody know a solution?
Thanks
EDIT: I forgot to say that the function takes a string and must return the next one: given "a" it must return "b", and given "bnc" it must return "bnd".
First effort, with just a-z then aa-zz
public static IEnumerable<string> GetExcelColumns()
{
for (char c = 'a'; c <= 'z'; c++)
{
yield return c.ToString();
}
char[] chars = new char[2];
for (char high = 'a'; high <= 'z'; high++)
{
chars[0] = high;
for (char low = 'a'; low <= 'z'; low++)
{
chars[1] = low;
yield return new string(chars);
}
}
}
Note that this will stop at 'zz'. Of course, there's some ugly duplication here in terms of the loops. Fortunately, that's easy to fix - and it can be even more flexible, too:
Second attempt: more flexible alphabet
private const string Alphabet = "abcdefghijklmnopqrstuvwxyz";
public static IEnumerable<string> GetExcelColumns()
{
return GetExcelColumns(Alphabet);
}
public static IEnumerable<string> GetExcelColumns(string alphabet)
{
foreach(char c in alphabet)
{
yield return c.ToString();
}
char[] chars = new char[2];
foreach(char high in alphabet)
{
chars[0] = high;
foreach(char low in alphabet)
{
chars[1] = low;
yield return new string(chars);
}
}
}
Now if you want to generate just a, b, c, d, aa, ab, ac, ad, ba, ... you'd call GetExcelColumns("abcd").
Third attempt (revised further) - infinite sequence
public static IEnumerable<string> GetExcelColumns(string alphabet)
{
int length = 0;
char[] chars = null;
int[] indexes = null;
while (true)
{
int position = length-1;
// Try to increment the least significant
// value.
while (position >= 0)
{
indexes[position]++;
if (indexes[position] == alphabet.Length)
{
for (int i=position; i < length; i++)
{
indexes[i] = 0;
chars[i] = alphabet[0];
}
position--;
}
else
{
chars[position] = alphabet[indexes[position]];
break;
}
}
// If we got all the way to the start of the array,
// we need an extra value
if (position == -1)
{
length++;
chars = new char[length];
indexes = new int[length];
for (int i=0; i < length; i++)
{
chars[i] = alphabet[0];
}
}
yield return new string(chars);
}
}
It's possible that it would be cleaner code using recursion, but it wouldn't be as efficient.
Note that if you want to stop at a certain point, you can just use LINQ:
var query = GetExcelColumns().TakeWhile(x => x != "zzz");
"Restarting" the iterator
To restart the iterator from a given point, you could indeed use SkipWhile as suggested by thesoftwarejedi. That's fairly inefficient, of course. If you're able to keep any state between call, you can just keep the iterator (for either solution):
using (IEnumerator<string> iterator = GetExcelColumns())
{
iterator.MoveNext();
string firstAttempt = iterator.Current;
if (someCondition)
{
iterator.MoveNext();
string secondAttempt = iterator.Current;
// etc
}
}
Alternatively, you may well be able to structure your code to use a foreach anyway, just breaking out on the first value you can actually use.
Edit: Made it do exactly as the OP's latest edit wants
This is the simplest solution, and tested:
static void Main(string[] args)
{
Console.WriteLine(GetNextBase26("a"));
Console.WriteLine(GetNextBase26("bnc"));
}
private static string GetNextBase26(string a)
{
return Base26Sequence().SkipWhile(x => x != a).Skip(1).First();
}
private static IEnumerable<string> Base26Sequence()
{
long i = 0L;
while (true)
yield return Base26Encode(i++);
}
private static char[] base26Chars = "abcdefghijklmnopqrstuvwxyz".ToCharArray();
private static string Base26Encode(Int64 value)
{
string returnValue = null;
do
{
returnValue = base26Chars[value % 26] + returnValue;
value /= 26;
} while (value-- != 0);
return returnValue;
}
The following populates a list with the required strings:
List<string> result = new List<string>();
for (char ch = 'a'; ch <= 'z'; ch++){
result.Add (ch.ToString());
}
for (char i = 'a'; i <= 'z'; i++)
{
for (char j = 'a'; j <= 'z'; j++)
{
result.Add (i.ToString() + j.ToString());
}
}
I know there are plenty of answers here, and one's been accepted, but IMO they all make it harder than it needs to be. I think the following is simpler and cleaner:
static string NextColumn(string column){
char[] c = column.ToCharArray();
for(int i = c.Length - 1; i >= 0; i--){
if(char.ToUpper(c[i]++) < 'Z')
break;
c[i] -= (char)26;
if(i == 0)
return "A" + new string(c);
}
return new string(c);
}
Note that this doesn't do any input validation. If you don't trust your callers, you should add an IsNullOrEmpty check at the beginning, and a c[i] >= 'A' && c[i] <= 'Z' || c[i] >= 'a' && c[i] <= 'z' check at the top of the loop. Or just leave it be and let it be GIGO.
You may also find use for these companion functions:
static string GetColumnName(int index){
StringBuilder txt = new StringBuilder();
txt.Append((char)('A' + index % 26));
//txt.Append((char)('A' + --index % 26));
while((index /= 26) > 0)
txt.Insert(0, (char)('A' + --index % 26));
return txt.ToString();
}
static int GetColumnIndex(string name){
int rtn = 0;
foreach(char c in name)
rtn = rtn * 26 + (char.ToUpper(c) - '@');
return rtn - 1;
//return rtn;
}
These two functions are zero-based. That is, "A" = 0, "Z" = 25, "AA" = 26, etc. To make them one-based (like Excel's COM interface), remove the line above the commented line in each function, and uncomment those lines.
As with the NextColumn function, these functions don't validate their inputs. Both will give you garbage if that's what they get.
Here’s what I came up with.
/// <summary>
/// Return an incremented alphabetical string
/// </summary>
/// <param name="letter">The string to be incremented</param>
/// <returns>the incremented string</returns>
public static string NextLetter(string letter)
{
const string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
if (!string.IsNullOrEmpty(letter))
{
char lastLetterInString = letter[letter.Length - 1];
// if the last letter in the string is the last letter of the alphabet
if (alphabet.IndexOf(lastLetterInString) == alphabet.Length - 1)
{
//replace the last letter in the string with the first letter of the alphabet and get the next letter for the rest of the string
return NextLetter(letter.Substring(0, letter.Length - 1)) + alphabet[0];
}
else
{
// replace the last letter in the string with the following letter of the alphabet
return letter.Remove(letter.Length-1).Insert(letter.Length-1, (alphabet[alphabet.IndexOf(letter[letter.Length-1])+1]).ToString() );
}
}
//return the first letter of the alphabet
return alphabet[0].ToString();
}
just curious , why not just
private string alphRecursive(int c) {
var alphabet = "abcdefghijklmnopqrstuvwxyz".ToCharArray();
if (c >= alphabet.Length) {
return alphRecursive(c/alphabet.Length) + alphabet[c%alphabet.Length];
} else {
return "" + alphabet[c%alphabet.Length];
}
}
This is like displaying an int, only using base 26 instead of base 10. Try the following algorithm to find the nth entry of the array:
q = n div 26;
r = n mod 26;
s = '';
while (q > 0 || r > 0) {
s = alphabet[r] + s;
r = q mod 26;
q = q div 26;
}
Of course, if you want the first n entries, this is not the most efficient solution. In this case, try something like daniel's solution.
I gave this a go and came up with this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Alphabetty
{
class Program
{
const string alphabet = "abcdefghijklmnopqrstuvwxyz";
static int cursor = 0;
static int prefixCursor;
static string prefix = string.Empty;
static bool done = false;
static void Main(string[] args)
{
string s = string.Empty;
while (s != "Done")
{
s = GetNextString();
Console.WriteLine(s);
}
Console.ReadKey();
}
static string GetNextString()
{
if (done) return "Done";
char? nextLetter = GetNextLetter(ref cursor);
if (nextLetter == null)
{
char? nextPrefixLetter = GetNextLetter(ref prefixCursor);
if(nextPrefixLetter == null)
{
done = true;
return "Done";
}
prefix = nextPrefixLetter.Value.ToString();
nextLetter = GetNextLetter(ref cursor);
}
return prefix + nextLetter;
}
static char? GetNextLetter(ref int letterCursor)
{
if (letterCursor == alphabet.Length)
{
letterCursor = 0;
return null;
}
char c = alphabet[letterCursor];
letterCursor++;
return c;
}
}
}
Here is something I had cooked up that may be similar. I was experimenting with iteration counts in order to design a numbering schema that was as small as possible, yet gave me enough uniqueness.
I knew that each time I added an alpha character, it would increase the possibilities 26x, but I wasn't sure how many letters or numbers, or which pattern, I wanted to use.
That led me to the code below. Basically you pass it an alphanumeric string, and every position that has a letter will eventually increment to "z\Z", and every position that has a number will eventually increment to "9".
So you can call it in one of two ways:
//This would give you the next iteration... (H3reIsaStup4dExamplf)
string myNextValue = IncrementAlphaNumericValue("H3reIsaStup4dExample");
//Or loop it, resulting eventually in "Z9zzZzzZzzz9zZzzzzzz"
string myNextValue = "H3reIsaStup4dExample";
while (myNextValue != null)
{
myNextValue = IncrementAlphaNumericValue(myNextValue);
//And of course do something with this like write it out
}
(For me, I was doing something like "1AA000")
public string IncrementAlphaNumericValue(string Value)
{
//We only allow Characters a-z, A-Z, 0-9
if (System.Text.RegularExpressions.Regex.IsMatch(Value, "^[a-zA-Z0-9]+$") == false)
{
throw new Exception("Invalid Character: Must be a-Z or 0-9");
}
//We work with each Character so it's best to convert the string to a char array for incrementing
char[] myCharacterArray = Value.ToCharArray();
//So what we do here is step backwards through the Characters and increment the first one we can.
for (Int32 myCharIndex = myCharacterArray.Length - 1; myCharIndex >= 0; myCharIndex--)
{
//Converts the Character to it's ASCII value
Int32 myCharValue = Convert.ToInt32(myCharacterArray[myCharIndex]);
//We only Increment this Character Position, if it is not already at it's Max value (Z = 90, z = 122, 57 = 9)
if (myCharValue != 57 && myCharValue != 90 && myCharValue != 122)
{
myCharacterArray[myCharIndex]++;
//Now that we have Incremented the Character, we "reset" all the values to the right of it
for (Int32 myResetIndex = myCharIndex + 1; myResetIndex < myCharacterArray.Length; myResetIndex++)
{
myCharValue = Convert.ToInt32(myCharacterArray[myResetIndex]);
if (myCharValue >= 65 && myCharValue <= 90)
{
myCharacterArray[myResetIndex] = 'A';
}
else if (myCharValue >= 97 && myCharValue <= 122)
{
myCharacterArray[myResetIndex] = 'a';
}
else if (myCharValue >= 48 && myCharValue <= 57)
{
myCharacterArray[myResetIndex] = '0';
}
}
//Now we just return an new Value
return new string(myCharacterArray);
}
}
//If we got through the Character Loop and were not able to increment anything, we return a NULL.
return null;
}
Here's my attempt using recursion:
public static void PrintAlphabet(string alphabet, string prefix)
{
for (int i = 0; i < alphabet.Length; i++) {
Console.WriteLine(prefix + alphabet[i].ToString());
}
if (prefix.Length < alphabet.Length - 1) {
for (int i = 0; i < alphabet.Length; i++) {
PrintAlphabet(alphabet, prefix + alphabet[i]);
}
}
}
Then simply call PrintAlphabet("abcd", "");
