any problems with doing this?
int i = new StreamReader("file.txt").ReadToEnd().Split(new char[] {'\n'}).Length
The method you posted isn't particularly good. Lets break this apart:
// new StreamReader("file.txt").ReadToEnd().Split(new char[] {'\n'}).Length
// becomes this:
var file = new StreamReader("file.txt").ReadToEnd(); // big string
var lines = file.Split(new char[] {'\n'}); // big array
var count = lines.Count;
You're actually holding this file in memory twice: once to read all the lines, once to split it into an array. The garbage collector hates that.
If you like one liners, you can write System.IO.File.ReadAllLines(filePath).Length, but that still retrieves the entire file in an array. There's no point doing that if you aren't going to hold onto the array.
A faster solution would be:
int TotalLines(string filePath)
{
using (StreamReader r = new StreamReader(filePath))
{
int i = 0;
while (r.ReadLine() != null) { i++; }
return i;
}
}
The code above holds (at most) one line of text in memory at any given time. Its going to be efficient as long as the lines are relatively short.
Well, the problem with doing this is that you allocate a lot of memory when doing this on large files.
I would rather read the file line by line and manually increment a counter. This may not be a one-liner but it's much more memory-efficient.
Alternatively, you may load the data in even-sized chunks and count the line breaks in these. This is probably the fastest way.
If you're looking for a short solution, I can give you a one-liner that at least saves you from having to split the result:
int i = File.ReadAllLines("file.txt").Count;
But that has the same problems of reading a large file into memory as your original. You should really use a streamreader and count the line breaks as you read them until you reach the end of the file.
Sure - it reads the entire stream into memory. It's terse, but I can create a file today that will fail this hard.
Read a character at a time and increment your count on newline.
EDIT - after some quick research
If you want terse and want that shiny new generic feel, consider this:
public class StreamEnumerator : IEnumerable<char>
{
StreamReader _reader;
public StreamEnumerator(Stream stm)
{
if (stm == null)
throw new ArgumentNullException("stm");
if (!stm.CanSeek)
throw new ArgumentException("stream must be seekable", "stm");
if (!stm.CanRead)
throw new ArgumentException("stream must be readable", "stm");
_reader = new StreamReader(stm);
}
public IEnumerator<char> GetEnumerator()
{
int c = 0;
while ((c = _reader.Read()) >= 0)
{
yield return (char)c;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
which defines a new class which allows you to enumerate over streams, then your counting code can look like this:
StreamEnumerator chars = new StreamEnumerator(stm);
int lines = chars.Count(c => c == '\n');
which gives you a nice terse lambda expression to do (more or less) what you want.
I still prefer the Old Skool:
public static int CountLines(Stream stm)
{
StreamReader _reader = new StreamReader(stm);
int c = 0, count = 0;
while ((c = _reader.Read()) != -1)
{
if (c == '\n')
{
count++;
}
}
return count;
}
NB: Environment.NewLine version left as an exercise for the reader
That should do the trick:
using System.Linq;
....
int i = File.ReadLines(file).Count();
Mayby this?
string file = new StreamReader("YourFile.txt").ReadToEnd();
string[] lines = file.Split('\n');
int countOfLines = lines.GetLength(0));
Assuming the file exists and you can open it, that will work.
It's not very readable or safe...
Related
This question already has answers here:
Is there an easy way to change a char in a string in C#?
(8 answers)
Closed 5 years ago.
This is kind of a basic question, but I learned programming in C++ and am just transitioning to C#, so my ignorance of the C# methods are getting in my way.
A client has given me a few fixed length files and they want the 484th character of every odd numbered record, skipping the first one (3, 5, 7, etc...) changed from a space to a 0. In my mind, I should be able to do something like the below:
static void Main(string[] args)
{
List<string> allLines = System.IO.File.ReadAllLines(#"C:\...").ToList();
foreach(string line in allLines)
{
//odd numbered logic here
line[483] = '0';
}
...
//write to new file
}
However, the property or indexer cannot be assigned to because it is read only. All my reading says that I have not set a setter for the variable, and I have tried what was shown at this SO article, but I am doing something wrong every time. Should what is shown in that article work? Should I do something else?
You cannot modify C# strings directly, because they are immutable. You can convert strings to char[], modify it, then make a string again, and write it to file:
File.WriteAllLines(
#"c:\newfile.txt"
, File.ReadAllLines(#"C:\...").Select((s, index) => {
if (index % 2 = 0) {
return s; // Even strings do not change
}
var chars = s.ToCharArray();
chars[483] = '0';
return new string(chars);
})
);
Since strings are immutable, you can't modify a single character by treating it as a char[] and then modify a character at a specific index. However, you can "modify" it by assigning it to a new string.
We can use the Substring() method to return any part of the original string. Combining this with some concatenation, we can take the first part of the string (up to the character you want to replace), add the new character, and then add the rest of the original string.
Also, since we can't directly modify the items in a collection being iterated over in a foreach loop, we can switch your loop to a for loop instead. Now we can access each line by index, and can modify them on the fly:
for(int i = 0; i < allLines.Length; i++)
{
if (allLines[i].Length > 483)
{
allLines[i] = allLines[i].Substring(0, 483) + "0" + allLines[i].Substring(484);
}
}
It's possible that, depending on how many lines you're processing and how many in-line concatenations you end up doing, there is some chance that using a StringBuilder instead of concatenation will perform better. Here is an alternate way to do this using a StringBuilder. I'll leave the perf measuring to you...
var sb = new StringBuilder();
for (int i = 0; i < allLines.Length; i++)
{
if (allLines[i].Length > 483)
{
sb.Clear();
sb.Append(allLines[i].Substring(0, 483));
sb.Append("0");
sb.Append(allLines[i].Substring(484));
allLines[i] = sb.ToString();
}
}
The first item after the foreach (string line in this case) is a local variable that has no scope outside the loop - that’s why you can’t assign a value to it. Try using a regular for loop instead.
Purpose of for each is meant to iterate over a container. It's read only in nature. You should use regular for loop. It will work.
static void Main(string[] args)
{
List<string> allLines = System.IO.File.ReadAllLines(#"C:\...").ToList();
for (int i=0;i<=allLines.Length;++i)
{
if (allLines[i].Length > 483)
{
allLines[i] = allLines[i].Substring(0, 483) + "0";
}
}
...
//write to new file
}
I am downloading data from a site and the site gives the data to me in very large blocks. Within the very large block, there are "chunks" that I need to parse individually. These "chunks" begin with "(ClinicalData)" and end with "(/ClinicalData)". Therefore, an example string would look something like:
(ClinicalData)(ID="1")(/ClinicalData)(ClinicalData)(ID="2")(/ClinicalData)(ClinicalData)(ID="3")(/ClinicalData)(ClinicalData)(ID="4")(/ClinicalData)(ClinicalData)(ID="5")(/ClinicalData)
Under "ideal" circumstances, the block is meant to be one-single line of data, however sometimes there are erroneous newline characters. Since I want to parse the (ClinicalData) chunks within the block, I want to make my data parse-able line-by-line. Therefore, I take the text file, read it all into a StringBuilder, remove new-lines (just in case), and then insert my own newlines, that way I can read line-by-line.
StringBuilder dataToWrite = new StringBuilder(File.ReadAllText(filepath), Int32.MaxValue);
// Need to clear newline characters just in case they exist.
dataToWrite.Replace("\n", "");
// set my own newline characters so the data becomes parse-able by line
dataToWrite.Replace("<ClinicalData", "\n<ClinicalData");
// set the data back into a file, which is then used in a StreamReader to parse by lines.
File.WriteAllText(filepath, dataToWrite.ToString());
This has been working out great (albeit maybe not efficient, but at least it is friendly to me :)), until I have not encountered a chunk of data that is being given to me as a 280MB large file.
Now I am getting a System.OutOfMemoryException with this block and I just cannot figure out a way around it. I believe the issue is that StringBuilder cannot handle 280MB of straight text? Well, I have tried string splits, regex.match splits, and various other ways to break it into guaranteed "(ClinicalData) chunks, but I continue to get the memory exception. I have also had no luck in attempting to read pre-defined chunks (e.g.: using .ReadBytes).
Any suggestions on how to handle a 280MB large, potentially-but-might-not-actually-be single line of text would be great!
That's an extremely inefficient way to read a text file, let alone a large one. If you only need one pass, replacing or adding individual characters, you should use a StreamReader. If you only need one character of lookahead you only need to maintain a single intermediate state, something like:
enum ReadState
{
Start,
SawOpen
}
using (var sr = new StreamReader(#"path\to\clinic.txt"))
using (var sw = new StreamWriter(#"path\to\output.txt"))
{
var rs = ReadState.Start;
while (true)
{
var r = sr.Read();
if (r < 0)
{
if (rs == ReadState.SawOpen)
sw.Write('<');
break;
}
char c = (char) r;
if ((c == '\r') || (c == '\n'))
continue;
if (rs == ReadState.SawOpen)
{
if (c == 'C')
sw.WriteLine();
sw.Write('<');
rs = ReadState.Start;
}
if (c == '<')
{
rs = ReadState.SawOpen;
continue;
}
sw.Write(c);
}
}
First off, I don't think you need to put all the text in a StringBuilder, since you aren't even concatenating parts to it. You could just try the following:
File.ReadAllText(filepath).Replace("\n", "").Replace("<ClinicalData", "\n<ClinicalData");
Why not try a StreamReader for this task? You can pick a "chunk" size that you want to read by and then split up those chunks into the (ClinicalData)data(/ClinicalData) parts. Here is some detailed code on how to do this:
char[] buffer = new char[1024];
string remainder = string.Empty;
List<ClientData> list = new List<ClientData>();
using (StreamReader reader = File.OpenText(#"source.txt"))
{
while (reader.Read(buffer, 0, 1024) > 0)
{
remainder = Parse(remainder + new string(buffer), list);
}
}
with the following method:
string Parse(string value, List<ClientData> list)
{
string[] parts = value.Split(new string[1] { "</ClientData>" }, StringSplitOptions.None);
for (int i = 0; i < parts.Length - 1; i++)
list.Add(new ClientData(parts[i]));
return parts[parts.Length - 1];
}
and the ClientData class however you have it implemented:
class ClientData
{
public ClientData(string value)
{
// fill in however you are already parsing out ID, and other info
}
}
There are many ways to implement something like this, but hopefully this can help get you started.
StreamReader's ReadLine() method is only one of the many ways you can read the text from the file. You can read into a buffer with a specified length, and then parse out the ClinicalData tags. I can provide an example if you'd like.
http://msdn.microsoft.com/en-us/library/9kstw824%28v=vs.110%29.aspx
Alternately, if you are reading an XML file, XmlReader is another option.
http://msdn.microsoft.com/en-us/library/system.xml.xmlreader%28v=vs.110%29.aspx
I have a text file, which I am trying to insert a line of code into. Using my linked-lists I believe I can avoid having to take all the data out, sort it, and then make it into a new text file.
What I did was come up with the code below. I set my bools, but still it is not working. I went through debugger and what it seems to be going on is that it is going through the entire list (which is about 10,000 lines) and it is not finding anything to be true, so it does not insert my code.
Why or what is wrong with this code?
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
using (StreamReader inFile = new StreamReader("Students.txt", true))
{
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant#mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
line = lines[i];
if (line.StartsWith("(LIST (LIST "))
{
values = line.Split(" ".ToCharArray());
lastName = values[2];
if (newLastName.CompareTo(lastName) < 0)
{
lines.Insert(i, newRecord);
insertionPointFound = true;
}
}
}
if (!insertionPointFound)
{
lines.Add(newRecord);
}
You're just reading the file into memory and not committing it anywhere.
I'm afraid that you're going to have to load and completely re-write the entire file. Files support appending, but they don't support insertions.
you can write to a file the same way that you read from it
string[] lines;
/// instanciate and build `lines`
File.WriteAllLines("path", lines);
WriteAllLines also takes an IEnumerable, so you can past a List of string into there if you want.
one more issue: it appears as though you're reading your file twice. one with ReadAllLines and another with your StreamReader.
There are at least four possible errors.
The opening of the streamreader is not required, you have already read
all the lines. (Well not really an error, but...)
The check for StartsWith can be fooled if you lines starts with blank
space and you will miss the insertionPoint. (Adding a Trim will remove any problem here)
In the CompareTo line you check for < 0 but you should check for == 0. CompareTo returns 0 if the strings are equivalent, however.....
To check if two string are equals you should avoid using CompareTo as
explained in MSDN link above but use string.Equals
List<string> lines = new List<string>(File.ReadAllLines("Students.txt"));
string newLastName = "'Constant";
string newRecord = "(LIST (LIST 'Constant 'Malachi 'D ) '1234567890 'mdcant#mail.usi.edu 4.000000 )";
string line;
string lastName;
bool insertionPointFound = false;
for (int i = 0; i < lines.Count && !insertionPointFound; i++)
{
line = lines[i].Trim();
if (line.StartsWith("(LIST (LIST "))
{
values = line.Split(" ".ToCharArray());
lastName = values[2];
if (newLastName.Equals(lastName))
{
lines.Insert(i, newRecord);
insertionPointFound = true;
}
}
}
if (!insertionPointFound)
lines.Add(newRecord);
I don't list as an error the missing write back to the file. Hope that you have just omitted that part of the code. Otherwise it is a very simple problem.
(However I think that the way in which CompareTo is used is probably the main reason of your problem)
EDIT Looking at your comment below it seems that the answer from Sam I Am is the right one for you. Of course you need to write back the modified array of lines. All the changes are made to an in memory array of lines and nothing is written back to a file if you don't have code that writes a file. However you don't need new file
File.WriteAllLines("Students.txt", lines);
I've got a string that I want to read line-by-line, but I also need to have the line delimiter character, which StringReader.ReadLine unfortunately trims (unlike in ruby where it is kept). What is the fastest and most robust way to accomplish this?
Alternatives I've been thinking about:
Reading the input character-by-character and checking for the line delimiter each time
Using RegExp.Split with a positive lookahead
Alternatively I only care about the line delimiter because I need to know the actual position in the string, and the delimiter can be either one or tho character long. Therefore if I could get back the actual position of the cursor within the string would be also good, but StringReader doesn't have this feature.
EDIT: here is my current implementation. End-of-file is designated by returning an empty string.
StringBuilder line = new StringBuilder();
int r = _input.Read();
while (r >= 0)
{
char c = Convert.ToChar(r);
line.Append(c);
if (c == '\n') break;
if (c == '\r')
{
int peek = _input.Peek();
if (peek == -1) break;
if (Convert.ToChar(peek) != '\n') break;
}
r = _input.Read();
}
return line.ToString();
Are you concerned about inconsistencies between files (i.e. coming from Unix/Mac vs. Windows), or within files?
One very easy optimization if you know that individual files are consistent with themselves would be to only read the first line character-by-character and figure out what the delimiter is. Then determining the exact position of any other line would be simple math.
Failing that, I think I would go the character-by-character route. A regex seems too "clever." This sounds like a complex function and I think the most important thing would be to make it easy to write, read, understand, and most importantly debug.
There's another way to do this, which would be more efficient if your data source was a stream. Unfortunately it's not, as referenced in your comment, so you would have to create one first; however, I'll include the solution anyway, it might give you some inspiration:
public IEnumerable<int> GetLineStartIndices(string s)
{
yield return 0;
byte[] chars = Encoding.UTF8.GetBytes(s);
using (MemoryStream stream = new MemoryStream(chars))
{
using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
{
while (reader.ReadLine() != null)
{
yield return stream.Position;
}
}
}
}
This will give you back the start position of each new line. Obviously you can tweak this to do whatever else you need, i.e. do something else with the actual lines you read.
Just note that this has to make a copy of the string to create the byte array, so it's really not suitable for very large strings. It's a bit nicer than the char-by-char approach though, less bug-prone, so perhaps worth considering if the strings are not megabytes-long.
If you only care about the position: ReadLine() moves you to the next line. If you store the .Position of the stream underneath you can compare it to the .Position after the following ReadLine(). That's the length of the string you just read plus the delimiter.
Length of the delimiter is currentPosition - previousPosition - line.Length.
That way you could easily find out if it was 1 or 2 bytes (without knowing the details, but you said you care only about the positions anyway).
File.ReadAllText will get you all of the file contents. Yup. All. So you better check that file size before using it.
EDIT:
read it all in then create an enumerator that yields line by line.
foreach(string line in Read("some.file"))
{ ... }
private IEnumerator Read(string file)
{
string buffer = File.ReadAllText()
for (int index=0;index<buffer.length;index++)
{
string line = ... logic to build a "line" here
yield return line;
}
yield break;
}
FileStream fs = new FileStream("E:\\hh.txt", FileMode.Open, FileAccess.Read);
BinaryReader read = new BinaryReader(fs);
byte[] ch = read.ReadBytes((int)fs.Length);
byte[] che=new byte[(int)fs.Length];
int size = (int)fs.Length,j=0;
for ( int i =0; i <= (size-1); i++)
{
if (ch[i] != '|')
{
che[j] = ch[i];
j++;
}
}
richTextBox1.Text = Encoding.ASCII.GetString(che);
read.Close();
fs.Close();
I had an interview question that asked me for my 'feedback' on a piece of code a junior programmer wrote. They hinted there may be a problem and said it will be used heavily on large strings.
public string ReverseString(string sz)
{
string result = string.Empty;
for(int i = sz.Length-1; i>=0; i--)
{
result += sz[i]
}
return result;
}
I couldn't spot it. I saw no problems whatsoever.
In hindsight I could have said the user should resize but it looks like C# doesn't have a resize (i am a C++ guy).
I ended up writing things like use an iterator if its possible, [x] in containers could not be random access so it may be slow. and misc things. But I definitely said I never had to optimize C# code so my thinking may have not failed me on the interview.
I wanted to know, what is the problem with this code, do you guys see it?
-edit-
I changed this into a wiki because there can be several right answers.
Also i am so glad i explicitly said i never had to optimize a C# program and mentioned the misc other things. Oops. I always thought C# didnt have any performance problems with these type of things. oops.
Most importantly? That will suck performance wise - it has to create lots of strings (one per character). The simplest way is something like:
public static string Reverse(string sz) // ideal for an extension method
{
if (string.IsNullOrEmpty(sz) || sz.Length == 1) return sz;
char[] chars = sz.ToCharArray();
Array.Reverse(chars);
return new string(chars);
}
The problem is that string concatenations are expensive to do as strings are immutable in C#. The example given will create a new string one character longer each iteration which is very inefficient. To avoid this you should use the StringBuilder class instead like so:
public string ReverseString(string sz)
{
var builder = new StringBuilder(sz.Length);
for(int i = sz.Length-1; i>=0; i--)
{
builder.Append(sz[i]);
}
return builder.ToString();
}
The StringBuilder is written specifically for scenarios like this as it gives you the ability to concatenate strings without the drawback of excessive memory allocation.
You will notice I have provided the StringBuilder with an initial capacity which you don't often see. As you know the length of the result to begin with, this removes needless memory allocations.
What normally happens is it allocates an amount of memory to the StringBuilder (default 16 characters). Once the contents attempts to exceed that capacity it doubles (I think) its own capactity and carries on. This is much better than allocating memory each time as would happen with normal strings, but if you can avoid this as well it's even better.
A few comments on the answers given so far:
Every single one of them (so far!) will fail on surrogate pairs and combining characters. Oh the joys of Unicode. Reversing a string isn't the same as reversing a sequence of chars.
I like Marc's optimisation for null, empty, and single character inputs. In particular, not only does this get the right answer quickly, but it also handles null (which none of the other answers do)
I originally thought that ToCharArray followed by Array.Reverse would be the fastest, but it does create one "garbage" copy.
The StringBuilder solution creates a single string (not char array) and manipulates that until you call ToString. There's no extra copying involved... but there's a lot more work maintaining lengths etc.
Which is the more efficient solution? Well, I'd have to benchmark it to have any idea at all - but even so that's not going to tell the whole story. Are you using this in a situation with high memory pressure, where extra garbage is a real pain? How fast is your memory vs your CPU, etc?
As ever, readability is usually king - and it doesn't get much better than Marc's answer on that front. In particular, there's no room for an off-by-one error, whereas I'd have to actually put some thought into validating the other answers. I don't like thinking. It hurts my brain, so I try not to do it very often. Using the built-in Array.Reverse sounds much better to me. (Okay, so it still fails on surrogates etc, but hey...)
Since strings are immutable, each += statement will create a new string by copying the string in the last step, along with the single character to form a new string. Effectively, this will be an O(n2) algorithm instead of O(n).
A faster way would be (O(n)):
// pseudocode:
static string ReverseString(string input) {
char[] buf = new char[input.Length];
for(int i = 0; i < buf.Length; ++i)
buf[i] = input[input.Length - i - 1];
return new string(buf);
}
You can do this in .NET 3.5 instead:
public static string Reverse(this string s)
{
return new String((s.ToCharArray().Reverse()).ToArray());
}
Better way to tackle it would be to use a StringBuilder, since it is not immutable you won't get the terrible object generation behavior that you would get above. In .net all strings are immutable, which means that the += operator there will create a new object each time it is hit. StringBuilder uses an internal buffer, so the reversal could be done in the buffer w/ no extra object allocations.
You should use the StringBuilder class to create your resulting string. A string is immutable so when you append a string in each interation of the loop, a new string has to be created, which isn't very efficient.
I prefer something like this:
using System;
using System.Text;
namespace SpringTest3
{
static class Extentions
{
static private StringBuilder ReverseStringImpl(string s, int pos, StringBuilder sb)
{
return (s.Length <= --pos || pos < 0) ? sb : ReverseStringImpl(s, pos, sb.Append(s[pos]));
}
static public string Reverse(this string s)
{
return ReverseStringImpl(s, s.Length, new StringBuilder()).ToString();
}
}
class Program
{
static void Main(string[] args)
{
Console.WriteLine("abc".Reverse());
}
}
}
x is the string to reverse.
Stack<char> stack = new Stack<char>(x);
string s = new string(stack.ToArray());
This method cuts the number of iterations in half. Rather than starting from the end, it starts from the beginning and swaps characters until it hits center. Had to convert the string to a char array because the indexer on a string has no setter.
public string Reverse(String value)
{
if (String.IsNullOrEmpty(value)) throw new ArgumentNullException("value");
char[] array = value.ToCharArray();
for (int i = 0; i < value.Length / 2; i++)
{
char temp = array[i];
array[i] = array[(array.Length - 1) - i];
array[(array.Length - 1) - i] = temp;
}
return new string(array);
}
Necromancing.
As a public service, this is how you actually CORRECTLY reverse a string (reversing a string is NOT equal to reversing a sequence of chars)
public static class Test
{
private static System.Collections.Generic.List<string> GraphemeClusters(string s)
{
System.Collections.Generic.List<string> ls = new System.Collections.Generic.List<string>();
System.Globalization.TextElementEnumerator enumerator = System.Globalization.StringInfo.GetTextElementEnumerator(s);
while (enumerator.MoveNext())
{
ls.Add((string)enumerator.Current);
}
return ls;
}
// this
private static string ReverseGraphemeClusters(string s)
{
if(string.IsNullOrEmpty(s) || s.Length == 1)
return s;
System.Collections.Generic.List<string> ls = GraphemeClusters(s);
ls.Reverse();
return string.Join("", ls.ToArray());
}
public static void TestMe()
{
string s = "Les Mise\u0301rables";
// s = "noël";
string r = ReverseGraphemeClusters(s);
// This would be wrong:
// char[] a = s.ToCharArray();
// System.Array.Reverse(a);
// string r = new string(a);
System.Console.WriteLine(r);
}
}
See:
https://vimeo.com/7403673
By the way, in Golang, the correct way is this:
package main
import (
"unicode"
"regexp"
)
func main() {
str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme(str))
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme2(str))
}
func ReverseGrapheme(str string) string {
buf := []rune("")
checked := false
index := 0
ret := ""
for _, c := range str {
if !unicode.Is(unicode.M, c) {
if len(buf) > 0 {
ret = string(buf) + ret
}
buf = buf[:0]
buf = append(buf, c)
if checked == false {
checked = true
}
} else if checked == false {
ret = string(append([]rune(""), c)) + ret
} else {
buf = append(buf, c)
}
index += 1
}
return string(buf) + ret
}
func ReverseGrapheme2(str string) string {
re := regexp.MustCompile("\\PM\\pM*|.")
slice := re.FindAllString(str, -1)
length := len(slice)
ret := ""
for i := 0; i < length; i += 1 {
ret += slice[length-1-i]
}
return ret
}
And the incorrect way is this (ToCharArray.Reverse):
func Reverse(s string) string {
runes := []rune(s)
for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
runes[i], runes[j] = runes[j], runes[i]
}
return string(runes)
}
Note that you need to know the difference between
- a character and a glyph
- a byte (8 bit) and a codepoint/rune (32 bit)
- a codepoint and a GraphemeCluster [32+ bit] (aka Grapheme/Glyph)
Reference:
Character is an overloaded term than can mean many things.
A code point is the atomic unit of information. Text is a sequence of
code points. Each code point is a number which is given meaning by the
Unicode standard.
A grapheme is a sequence of one or more code points that are displayed
as a single, graphical unit that a reader recognizes as a single
element of the writing system. For example, both a and ä are
graphemes, but they may consist of multiple code points (e.g. ä may be
two code points, one for the base character a followed by one for the
diaresis; but there's also an alternative, legacy, single code point
representing this grapheme). Some code points are never part of any
grapheme (e.g. the zero-width non-joiner, or directional overrides).
A glyph is an image, usually stored in a font (which is a collection
of glyphs), used to represent graphemes or parts thereof. Fonts may
compose multiple glyphs into a single representation, for example, if
the above ä is a single code point, a font may chose to render that as
two separate, spatially overlaid glyphs. For OTF, the font's GSUB and
GPOS tables contain substitution and positioning information to make
this work. A font may contain multiple alternative glyphs for the same
grapheme, too.
static string reverseString(string text)
{
Char[] a = text.ToCharArray();
string b = "";
for (int q = a.Count() - 1; q >= 0; q--)
{
b = b + a[q].ToString();
}
return b;
}