mb_strcut in C#? - c#

Does anyone know of an implementation of the php function mb_strcut in C#?
http://php.net/manual/en/function.mb-strcut.php
mb_strcut() extracts a substring from a string similarly to mb_substr(), but operates on bytes instead of characters. If the cut position happens to be between two bytes of a multi-byte character, the cut is performed starting from the first byte of that character. This is also the difference to the substr() function, which would simply cut the string between the bytes and thus result in a malformed byte sequence.

Thanks, Dash. I could not have written the code below without your help:
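using System.Linq;
using System.Text;
public static string LimitByteLength(string input, int startByte, int byteLength)
{
    var maxLength = startByte + byteLength;
    // Pair every character with the byte count of the string up to and including it.
    // This happens before SkipWhile because the index handed to TakeWhile would
    // otherwise restart at 0 on the already-skipped sequence.
    return new string(
        input.Select((c, i) => new { Char = c, EndByte = GetByteCount(input.Substring(0, i + 1)) })
            .SkipWhile(x => x.EndByte <= startByte)
            .TakeWhile(x => x.EndByte <= maxLength)
            .Select(x => x.Char)
            .ToArray());
}
// Counts bytes as UTF-16 (Encoding.Unicode), i.e. 2 bytes per char
private static int GetByteCount(string input)
{
    return Encoding.Unicode.GetByteCount(input);
}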
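The answer above counts bytes as UTF-16. If the bytes you care about are UTF-8 (a common case when porting PHP code), the same idea can be sketched directly on the byte array. This is only a sketch under assumptions: the names Utf8Cut and StrCut are made up for illustration, and it is not claimed to match mb_strcut in every edge case. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```csharp
using System;
using System.Text;

public static class Utf8Cut // hypothetical helper, not part of .NET
{
    // Cuts 'input' by UTF-8 byte offsets without splitting a multi-byte character.
    public static string StrCut(string input, int startByte, int byteLength)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(input);

        // Clamp the start, then step back to the lead byte of the character.
        int start = Math.Min(Math.Max(startByte, 0), bytes.Length);
        while (start > 0 && start < bytes.Length && (bytes[start] & 0xC0) == 0x80)
            start--;

        // Clamp the end, then step back so a character is never cut in half.
        int end = (int)Math.Min((long)start + Math.Max(byteLength, 0), bytes.Length);
        while (end > start && end < bytes.Length && (bytes[end] & 0xC0) == 0x80)
            end--;

        return Encoding.UTF8.GetString(bytes, start, end - start);
    }
}
```

Here the length is measured from the adjusted start position; that choice is an assumption of this sketch.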

Related

C# writeline cap

I am writing a program that writes to a file that requires specific positions:
it looks something like:
writer.WriteLine("{0,-3}{1,-5}{2,-30}", data1, data2, data3);
The starting positions are correct; however, if data1 exceeds 3 characters, it pushes the rest of the line over by the excess amount.
Is there a way to make data1 cap at 3 characters and ignore any excess characters using the WriteLine format?
If I understood correctly, you should "safely substring" your strings to the desired length (Substring throws an exception if you choose a length greater than the string length).
public static string SafeSubstring(this string text, int start, int length)
{
return text.Length <= start ? string.Empty
: text.Length - start <= length ? text.Substring(start)
: text.Substring(start, length);
}
Then, for example:
writer.WriteLine("{0,-3}{1,-5}{2,-30}",
    data1.SafeSubstring(0, 3),
    data2.SafeSubstring(0, 5),
    data3.SafeSubstring(0, 30));
Use string.Substring. This code trims the string if it's longer than 3 characters:
Console.WriteLine("{0,-3}{1,-5}{2,-30}",
data1.Substring(0,data1.Length > 3 ? 3 : data1.Length),
data2, data3);
You can't solve this with formatting alone, but you can implement a (extension) method:
public static string ToLength(this object value, int length) {
if (length == 0)
return "";
else if (null == value)
return new string(' ', Math.Abs(length));
string v = value.ToString();
if (v.Length >= Math.Abs(length))
return v.Substring(0, Math.Abs(length));
else if (length < 0)
return v.PadRight(-length);
else
return v.PadLeft(length);
}
Then use it
writer.WriteLine("{0}{1}{2}",
data1.ToLength(-3),
data2.ToLength(-5),
data3.ToLength(-30));
I've implemented the extension method for object rather than string, since you may want to write any object to the stream.
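To see why formatting alone is not enough: the alignment component in a composite format string is a minimum width, so it pads short values but never truncates long ones. A small sketch, with the hypothetical names FixedWidth and Cap, that truncates first and pads afterwards:

```csharp
using System;

public static class FixedWidth // hypothetical helper for illustration
{
    // Returns 'text' cut or right-padded to exactly 'width' characters.
    public static string Cap(string text, int width)
    {
        string s = text ?? string.Empty;
        if (s.Length > width)
            s = s.Substring(0, width); // drop the excess characters
        return s.PadRight(width);      // pad short values with spaces
    }
}
```

For example, string.Format("{0,-3}|", "abcdef") yields "abcdef|", while FixedWidth.Cap("abcdef", 3) + "|" yields "abc|".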

Best way to remove characters from string in c# win. form

I have a string of length 98975333 and I need to remove the first 5 characters from it. Can anyone suggest the best way to do this, keeping performance in mind?
I tried
str.Substring(5);
str.Remove(0,5);
which gives me the result in 0.29 sec,
but I want something even faster than the above.
Problem using StringBuilder
-> I need to take a substring of part of the string, and to do this I need to write
StringBuilder2.ToString().Substring(anyvaluehere)
Here the conversion of the StringBuilder to a string by ".ToString()" takes time, so in this case I can't use StringBuilder.
If you are working with long strings, always use StringBuilder. This class lets you add and remove characters quickly, faster than String.Concat or its syntactic sugar "a" + "b". Moreover, the StringBuilder.ToString() method is implemented with performance in mind.
Sorry, C# strings are not arrays; they are immutable, so extracting a (possibly very long) substring involves a copy.
However, most string utilities accept start and end indices; for instance, IndexOf and CompareInfo.Compare both have overloads that take a startIndex.
Perhaps if you tell us what you want to do afterward we could suggest alternatives?
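As a rough illustration of working with start indices instead of copying, the sketch below skips a prefix by passing an offset to String.IndexOf and String.CompareOrdinal. The class and method names (SkipPrefixDemo, IndexAfterPrefix, ContinuesWith) are invented for this example.

```csharp
using System;

public static class SkipPrefixDemo // hypothetical names for illustration
{
    // Finds the first 'target' at or after position 'skip' without creating a substring.
    public static int IndexAfterPrefix(string text, char target, int skip)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (skip < 0 || skip >= text.Length) return -1;
        return text.IndexOf(target, skip);
    }

    // Checks whether 'text' continues with 'expected' at position 'skip', again without copying.
    public static bool ContinuesWith(string text, int skip, string expected)
    {
        if (text == null || expected == null) return false;
        if (skip < 0 || skip + expected.Length > text.Length) return false;
        return string.CompareOrdinal(text, skip, expected, 0, expected.Length) == 0;
    }
}
```

Neither method allocates a new string, so the cost of "removing" the first 5 characters is just the cost of passing 5 as the offset.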
Update
Here are some ways you can write performant string parsing with the immutable strings in c#. Say for instance that you need to deserialize XML data inside the string, and need to skip the first N characters. You could do something like this:
public static T XmlDeserializeFromString<T>(this string objectData, int skip)
{
var serializer = new XmlSerializer(typeof(T));
using (var reader = new StringReader(objectData))
{
for (; skip > 0 && reader.Read() != -1; skip--)
;
return (T)serializer.Deserialize(reader);
}
}
As you can see from the source, StringReader.Read() does not make a copy of the unread portion of the string; it keeps an internal index to the remaining unread portion.
Or say you want to skip the first N characters of a string, then parse the string by splitting it at every "," character. You could write something like this:
public static IEnumerable<Pair<int>> WalkSplits(this string str, int startIndex, int count, params char[] separator)
{
if (string.IsNullOrEmpty(str))
yield break;
var length = str.Length;
int endIndex;
if (count < 0)
endIndex = length;
else
{
endIndex = startIndex + count;
if (endIndex > length)
endIndex = length;
}
while (true)
{
int nextIndex = str.IndexOfAny(separator, startIndex, endIndex - startIndex);
if (nextIndex == startIndex)
{
startIndex = nextIndex + 1;
}
else if (nextIndex == -1)
{
if (startIndex < endIndex)
yield return new Pair<int>(startIndex, endIndex - 1);
yield break;
}
else
{
yield return new Pair<int>(startIndex, nextIndex - 1);
startIndex = nextIndex + 1;
}
}
}
And then use the start and end indices of the Pair to further parse the string, or extract small substrings to feed to further parsing methods.
(Pair<T> is a small struct I created similar to KeyValuePair<TKey, TValue> but with identically typed first and second values. I can provide if needed.)
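The Pair<T> struct itself is not shown in the answer. A minimal sketch of what it might look like follows; the member names First and Second and the helper PairDemo.Slice are assumptions made for this example, not the author's actual code.

```csharp
// Hypothetical reconstruction: two values of the same type.
public struct Pair<T>
{
    public readonly T First;
    public readonly T Second;

    public Pair(T first, T second)
    {
        First = first;
        Second = second;
    }
}

public static class PairDemo
{
    // Copies only the small slice described by an inclusive (start, end) index pair.
    public static string Slice(string text, Pair<int> range)
    {
        return text.Substring(range.First, range.Second - range.First + 1);
    }
}
```

With inclusive indices like those yielded by WalkSplits, PairDemo.Slice("abc,def", new Pair<int>(4, 6)) gives "def".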
Using a StringBuilder to produce and manipulate the string will help you save on resources:
StringBuilder sb = new StringBuilder();
sb.Append("text"); // adds text at the end
sb.Insert(0, "text"); // inserts text at the given index (which must not exceed sb.Length)
sb.Remove(0, 4); // removes 4 characters starting at index 0
sb.ToString(); // produces the string
If you have a fixed length of string that you wish to store elsewhere, you can make a char array and use StringBuilder's CopyTo() method:
e.g.
char[] firstfive = new char[5];
sb.CopyTo(0,firstfive,0,5);
Edit:
Actually, the OP figured this out himself, but I'm including it on the post for reference:
To get a portion of the StringBuilder as string:
sb.ToString(intStart,intLength)
Use String.Remove(), i.e.
String newStr = "";
newStr = str.Remove(0,5); // This will delete 5 characters starting from index 0
Note that the single-argument overload does something different:
newStr = str.Remove(5); // Deletes everything from index 5 to the end, keeping only the first 5 characters
Read more Here

Cannot write string to byte array with alignment?

I've written code that converts a string to an array of bytes and then writes those bytes to a buffer byte array. However, for some reason, the alignment part of the code seems to stop the execution of the program. I've tested it enough to know that it's the "int DynamicAlign.." part, but I can't figure out why it's happening.
public void WriteStr( string myString )
{
byte[] myBytes = System.Text.Encoding.ASCII.GetBytes( myString );
for( int i = 0; i < myBytes.Length; i ++ )
{
Buffer[ BytePeek ] = myBytes[ i ];
BytePeek ++;
}
int DynamicAlign = ((myBytes.Length + 1) % ByteAlign != 0)
? ByteAlign - ((myBytes.Length + 1) % ByteAlign)
: 0;
BytePeek += (ushort)(1 + DynamicAlign);
}
If you don't know how byte alignment works, I found this as extra info: http://pastebin.com/tXzLWpBG
The extra "+ 1" and "1 +" are for taking into account the null terminating string at the end of the read sequence.
Alright, so the issue was that I was not setting the alignment for the write buffer, thus it gave the division by zero error with the modulo operation, since the byte alignment was preset to 0...
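One way to make this kind of mistake fail loudly is to move the padding calculation into a helper that checks the alignment first. The following is a sketch only; AlignmentHelper and PaddingFor are hypothetical names, and the "+ 1" mirrors the null terminator from the question.

```csharp
using System;

public static class AlignmentHelper // hypothetical helper for illustration
{
    // Padding needed after 'byteCount' bytes plus one null terminator
    // so that the next write starts on a multiple of 'byteAlign'.
    public static int PaddingFor(int byteCount, int byteAlign)
    {
        if (byteAlign <= 0)
            throw new ArgumentOutOfRangeException(nameof(byteAlign),
                "Alignment must be positive; 0 would cause a division by zero in the modulo.");

        int remainder = (byteCount + 1) % byteAlign;
        return remainder == 0 ? 0 : byteAlign - remainder;
    }
}
```

With an alignment of 4, a 3-byte string needs no padding (3 + 1 = 4), while a 4-byte string needs 3 bytes of padding (4 + 1 = 5, rounded up to 8).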

What is the fastest way to split a ':' separated string into a given number of chunks where the result/record length is variable

I have a large string received from a TCP listener, in the following format:
"1,7620257787,0123456789,99,0922337203,9223372036,32.5455,87,12.7857,1/1/2012,9223372036:1,7620257787,0123456789,99,0922337203,9223372036,32.5455,87,12.7857:2/1/2012,234234234:3,7620257787,01234343456789,99,0922337203,9223372036,32.5455,87,12.7857,1/1/2012,9223372036:34,76202343457787,012434343456789,93339,34340922337203,9223372036,32.5455,87,12.7857,1/1/2012,9223372036"
You can see that this is a ':' separated string containing records, each of which consists of comma-separated fields.
I am looking for the best (fastest) way to split the string into a given number of chunks, making sure that each chunk contains only full records (strings up to ':'),
or, put another way, there should not be any chunk that does not end with ':'.
E.g. a 20 MB string split into 4 chunks of 5 MB each with complete records (so the size of each chunk may not be exactly 5 MB but very near to it, and the total of all 4 chunks will be 20 MB).
I hope you can understand my question (sorry for the bad English).
I like the following link, but it does not keep records whole while splitting, and I also don't know if it is the best and fastest way:
Split String into smaller Strings by length variable
I don't know how large a 'large string' is, but initially I would just try it with the String.Split method.
The idea is to divide the length of your data by the number of blocks required, then look backwards from the end of the current block for the last separator.
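// Requires: using System; (for Math)
private string[] splitToBlocks(string data, int numBlocks, char sep)
{
    // Nothing to split: return the input as a single block
    if (numBlocks <= 1 || data.Length == 0)
    {
        return new string[] { data };
    }
    // We return an array of the requested length
    string[] result = new string[numBlocks];
    // The optimal size of each block
    int blockLen = data.Length / numBlocks;
    int pos = 0;
    for (int idx = 0; idx < numBlocks; idx++)
    {
        int end; // index of the last character of this block
        if (idx == numBlocks - 1 || pos >= data.Length)
        {
            // The last block takes whatever is left (possibly nothing)
            end = data.Length - 1;
        }
        else
        {
            // Search backwards from the ideal boundary for the nearest sep,
            // but never before the start of the current block
            int target = Math.Max(pos, Math.Min(blockLen * (idx + 1), data.Length - 1));
            end = data.LastIndexOf(sep, target, target - pos + 1);
            // No sep inside the block (a record longer than blockLen): search forwards instead
            if (end == -1) end = data.IndexOf(sep, target);
            if (end == -1) end = data.Length - 1;
        }
        // Get the block data in the result array and reposition for the next block
        result[idx] = data.Substring(pos, end + 1 - pos);
        pos = end + 1;
    }
    return result;
}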
Please test it. I have not fully tested for fringe cases.
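For comparison, here is a variant of the same idea that searches forwards from each ideal boundary instead of backwards, which avoids walking off the start of the string. It is a sketch with assumed names (ChunkSplitter, SplitIntoChunks), and it may return fewer chunks than requested when records are long relative to the chunk size.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkSplitter // hypothetical helper for illustration
{
    // Splits 'data' into at most 'numChunks' pieces, cutting only right after 'sep'.
    public static List<string> SplitIntoChunks(string data, int numChunks, char sep)
    {
        var chunks = new List<string>();
        if (string.IsNullOrEmpty(data) || numChunks <= 1)
        {
            chunks.Add(data ?? string.Empty);
            return chunks;
        }

        int target = data.Length / numChunks; // ideal chunk size
        int pos = 0;                          // start of the current chunk
        for (int n = 1; n < numChunks && pos < data.Length; n++)
        {
            // Start looking at the last character of the ideal chunk,
            // but never before the current chunk begins.
            int from = Math.Max(pos, Math.Min(target * n - 1, data.Length - 1));
            int cut = data.IndexOf(sep, from);
            if (cut == -1) break;             // no more separators: the rest is one chunk
            chunks.Add(data.Substring(pos, cut + 1 - pos));
            pos = cut + 1;
        }

        if (pos < data.Length)
            chunks.Add(data.Substring(pos));  // remainder, including an unterminated record
        return chunks;
    }
}
```

Concatenating the returned chunks always reproduces the original string, which makes the result easy to sanity-check.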
OK, I suggest a two-step approach:
Split the string into chunks (see below)
Check the chunks for completeness
Splitting the string into chunks with the help of LINQ (the LINQ extension method is taken from "Split a collection into `n` parts with LINQ?"):
string tcpstring = "chunk1 : chunck2 : chunk3: chunk4 : chunck5 : chunk6";
int numOfChunks = 4;
var chunks = (from string z in (tcpstring.Split(':').AsEnumerable()) select z).Split(numOfChunks);
List<string> result = new List<string>();
foreach (IEnumerable<string> chunk in chunks)
{
result.Add(string.Join(":",chunk));
}
.......
static class LinqExtensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
{
int i = 0;
var splits = from item in list
group item by i++ % parts into part
select part.AsEnumerable();
return splits;
}
}
Have I understood your aims correctly?
[EDIT]
In my opinion, if performance is a consideration, it is better to use the String.Split method for chunking.
It seems you want to split on ":" (you can use the Split method).
Then you have to add ":" back to each chunk after splitting
(you can then split on "," within each of the strings that were split on ":").
int index = yourstring.IndexOf(":");
string whatever = yourstring.Substring(0, index); // the first record, without the ":"
yourstring = yourstring.Substring(index + 1); // make a new string without the part you just cut out (and its ":")
This is only a general example; all you need to do is set up a loop that runs for as long as a ":" character is found (IndexOf returns -1 when none is left). Cheers...

Get the letters (ABCDE) between two letters (AE) using C#

I need to get the letters as an array when passing two letters, using C#.
For example: when I pass "AE", I need to get {A,B,C,D,E} as an array, and passing "FJ" should return {F,G,H,I,J}.
The Enumerable class can create a range, which makes the looping simple:
public static char[] CharactersBetween(char start, char end) {
return Enumerable.Range(start, end - start + 1).Select(c => (char)c).ToArray();
}
Note: A char value converts implicitly into int, so there is no conversion needed in that direction. You only have to convert the integers back to char.
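The conversion rule in the note can be made explicit, together with a guard for a reversed range (Enumerable.Range throws when its count is negative). This is a sketch; CharRange and Between are hypothetical names, and returning an empty array for a reversed range is an assumption about the desired behaviour.

```csharp
using System;
using System.Linq;

public static class CharRange // hypothetical helper for illustration
{
    public static char[] Between(char start, char end)
    {
        int first = start; // implicit char -> int conversion
        int last = end;    // implicit char -> int conversion

        // Assumption: a reversed range yields an empty result instead of an exception.
        if (last < first)
            return new char[0];

        return Enumerable.Range(first, last - first + 1)
                         .Select(code => (char)code) // explicit int -> char conversion
                         .ToArray();
    }
}
```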
Edit:
If you want to send in the alphabet to use (to handle language differences), you can use Substring to get a part of that string:
public static char[] CharactersBetween(char start, char end, string alphabet) {
int idx = alphabet.IndexOf(start);
return alphabet.Substring(idx, alphabet.IndexOf(end) - idx + 1).ToCharArray();
}
Do you mean something like
char[] CharactersBetween(char start, char end)
{
List<char> result = new List<char>();
for (char i = start; i <= end; i++)
{
result.Add(i);
}
return result.ToArray();
}
This should work out well
string startandend = "AG";
string result= "";
for( char i = startandend[0]; i <= startandend[1]; i++){
result += i;
}
result will now contain ABCDEFG.
You should probably add some logic to check if startandend actually have a Length of 2 and so on, but this should be a good starting block for you.
If you want the char[] instead of the string representation, simply call result.ToCharArray() at the end.
Use a loop with integer conversion
with
System.Convert.ToInt32(Char);
and
System.Convert.ToChar(Int32);
see http://msdn.microsoft.com/en-us/library/system.convert_methods.aspx
Pretty simple if you use a fixed alphabet,
public static string ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
public static List<char> GetLetters(string firstLast)
{
List<char> letters = new List<char>();
int start = ALPHABET.IndexOf(firstLast[0]);
int end = ALPHABET.IndexOf(firstLast[1]);
for (int i = start; i <= end; i++)
{
letters.Add(ALPHABET[i]);
}
return letters;
}
Obviously add in your checks for various things, but it does the basic job.
You're going to have to reference whichever alphabet you want to use. English is easy enough as the letters happen to correspond to code-point order, French treats Œ and Æ as letters in their own right sometimes, and not others. Danish and Norwegian place "Æ, Ø, Å" after Z and Swedish does the same with "Å, Ä, Ö". Irish uses "ABCDEFGHILMNOPRSTU" as the alphabet, but does also use J, K, Q, V, W, X, Y & Z in loan words.
And those are relatively easy cases. So there's no one-size-fits-all.
The easiest way to pass an alphabet is to have a string that contains it. So, e.g. the Danish alphabet would have the string "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ" while French could either include the ligatures or not as you wish (but do you need to deal with the possibility of receiving them while not using them?).
This done:
public static IEnumerable<char> AlphabetRange(string alphabet, string alphaRange)
{
if(alphaRange == null)
throw new ArgumentNullException();
if(alphaRange.Length < 2)
throw new ArgumentException();
int startIdx = alphabet.IndexOf(alphaRange[0]);
int endIdx = alphabet.IndexOf(alphaRange[1]) + 1;
if(startIdx == -1 || endIdx == 0)
throw new ArgumentException();
while(startIdx < endIdx)
yield return alphabet[startIdx++];
}
