How do I get 3 lines of text from a paragraph - c#

I'm trying to create a "snippet" from a paragraph. I have a long paragraph of text with a word highlighted in the middle. I want to get the line containing the word, the line before that line, and the line after that line.
I have the following pieces of information:
The text (in a string)
The lines are delimited by a NEWLINE character \n
I have the index into the string of the text I want to highlight
A couple other criteria:
If my word falls on first line of the paragraph, it should show the 1st 3 lines
If my word falls on the last line of the paragraph, it should show the last 3 lines
It should show the entire paragraph in the degenerate cases (the paragraph only has 1 or 2 lines)
Here's an example:
This is the 1st line of CAT text in the paragraph
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
This is the 4th line of DOG text in the paragraph
This is the 5th line of RABBIT text in the paragraph
Example, if my index points to BIRD, it should show lines 1, 2, & 3 as one complete string like this:
This is the 1st line of CAT text in the paragraph
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
If my index points to DOG, it should show lines 3, 4, & 5 as one complete string like this:
This is the 3rd line of MOUSE text in the paragraph
This is the 4th line of DOG text in the paragraph
This is the 5th line of RABBIT text in the paragraph
etc.
Anybody want to help tackle this?

In my opinion this is an excellent opportunity to use the StringReader class:
Read your text line by line.
Keep your lines in some kind of buffer (e.g., a Queue<string>), dropping lines you don't need after a given number of lines have been read.
Once your "needle" is found, read one more line (if possible) and then just return what's in your buffer.
In my opinion, this has some advantages over the other approaches suggested:
Since it doesn't utilize String.Split, it doesn't do more work than you need -- i.e., reading the entire string looking for the characters to split on, and creating an array of the substrings.
In fact, it doesn't necessarily read the entire string at all, since once it finds the text it's looking for it only goes as far as necessary to get the desired number of padding lines.
It could even be refactored (very easily) to be able to deal with any textual input via a TextReader -- e.g., a StreamReader -- so it could even work with huge files, without having to load the entire contents of a given file into memory.
Imagine this scenario: you want to find an excerpt of text from a text file that contains the entire text from a novel. (Not that this is your scenario -- I'm just speaking hypothetically.) Using String.Split would require that the entire text of the novel be split according to the delimiter you specified, whereas using a StringReader (well, in this case, a StreamReader) would only require reading until the desired text was found, at which point the excerpt would be returned.
Again, I realize this isn't necessarily your scenario -- just suggesting that this approach provides scalability as one of its strengths.
Here's a quick implementation:
// rearranged code to avoid horizontal scrolling
// requires: using System; using System.Collections.Generic;
//           using System.IO; using System.Text;
public static string FindSurroundingLines
    (string haystack, string needle, int paddingLines) {
    if (string.IsNullOrEmpty(haystack))
        throw new ArgumentException("haystack");
    else if (string.IsNullOrEmpty(needle))
        throw new ArgumentException("needle");
    else if (paddingLines < 0)
        throw new ArgumentOutOfRangeException("paddingLines");
    // buffer needs to accommodate paddingLines on each side
    // plus line containing the needle itself, so:
    // (paddingLines * 2) + 1
    int bufferSize = (paddingLines * 2) + 1;
    var buffer = new Queue<string>(/*capacity*/ bufferSize);
    using (var reader = new StringReader(haystack)) {
        bool needleFound = false;
        while (!needleFound && reader.Peek() != -1) {
            string line = reader.ReadLine();
            if (buffer.Count == bufferSize)
                buffer.Dequeue();
            buffer.Enqueue(line);
            needleFound = line.Contains(needle);
        }
        // at this point either the needle has been found,
        // or we've reached the end of the text (haystack);
        // all that's left to do is make sure the string returned
        // includes the specified number of padding lines
        // on either side
        int endingLinesRead = 0;
        // stop at the end of the text so that no null "lines" are
        // enqueued when the haystack is shorter than the buffer
        while (
            reader.Peek() != -1 &&
            (endingLinesRead++ < paddingLines || buffer.Count < bufferSize)
        ) {
            if (buffer.Count == bufferSize)
                buffer.Dequeue();
            buffer.Enqueue(reader.ReadLine());
        }
        var resultBuilder = new StringBuilder();
        while (buffer.Count > 0)
            resultBuilder.AppendLine(buffer.Dequeue());
        return resultBuilder.ToString();
    }
}
Some example input/output (with text containing your example input):
Code:
Console.WriteLine(FindSurroundingLines(text, "MOUSE", 1));
Output:
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
This is the 4th line of DOG text in the paragraph
Code:
Console.WriteLine(FindSurroundingLines(text, "BIRD", 1));
Output:
This is the 1st line of CAT text in the paragraph
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
Code:
Console.WriteLine(FindSurroundingLines(text, "DOG", 0));
Output:
This is the 4th line of DOG text in the paragraph
Code:
Console.WriteLine(FindSurroundingLines(text, "This", 2));
Output:
This is the 1st line of CAT text in the paragraph
This is the 2nd line of BIRD text in the paragraph
This is the 3rd line of MOUSE text in the paragraph
This is the 4th line of DOG text in the paragraph
This is the 5th line of RABBIT text in the paragraph

Using the LINQ extension methods to get the right strings:
string[] lines = text.Split('\n');
// Find the right line to work with
int position = 0;
for (int i = 0; i < lines.Count(); i++)
if (lines[i].Contains(args[0]))
position = i - 1;
// Get in range if we had a match in the first line
if (position == -1)
position = 0;
// Adjust the line index so we have 3 lines to work with
if (position > lines.Count() - 3)
position = lines.Count() - 3;
string result = String.Join("\n", lines.Skip(position).Take(3).ToArray());
This can of course be optimized a bit by quitting the for loop as soon as the index has been found, and probably a number of other things. You can probably even LINQify so you never need to actually store that extra array, but I can't think of a good way to do that right now.
An alternative for the checks on position could be something like position = Math.Max(0,Math.Min(position, lines.Count() - 3)); - which would handle both of them at once.

There are a few ways one can handle this:
First Method:
Use String.IndexOf() and String.LastIndexOf().
You can find where the currently selected word is by using the TextBox.SelectionStart property. Then use LastIndexOf from the selection location to look for the '\n' that starts the current line. Don't stop at the first one you find: search again from that location to get the beginning of the previous line. Then do the same from the selection point using IndexOf to find the '\n' that ends the current line. Once again, don't stop at the first one you find; repeat the search starting from that location to get the end of the following line. Then simply take the substring of the text between the two positions you found.
Second Method: Use String.Split() on the '\n' character (this creates an array of strings, each one containing a different line from the text, in order of array index). Find the index of the line the text is in, and then simply grab the lines before, at, and after that index. Hopefully these two methods are clear enough for you to figure out your coding. If you are still stuck, let me know.
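The second method, driven by the character index the question already has, could be sketched roughly as follows. This is only an illustration: the class and method names (SnippetHelper, ThreeLinesAround) are made up for this example, and it assumes lines are separated by a bare '\n'.

```csharp
using System;

public static class SnippetHelper
{
    // Hypothetical helper: returns up to three lines around the line
    // that contains the character at `index`.
    public static string ThreeLinesAround(string text, int index)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (index < 0 || index >= text.Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        string[] lines = text.Split('\n');

        // Walk the lines, summing their lengths, to find the line holding `index`.
        int lineNumber = 0;
        int consumed = 0;
        for (int i = 0; i < lines.Length; i++)
        {
            consumed += lines[i].Length + 1; // +1 for the '\n' that Split removed
            if (index < consumed) { lineNumber = i; break; }
        }

        // Clamp the window start so the window stays inside the paragraph.
        int start = Math.Max(0, Math.Min(lineNumber - 1, lines.Length - 3));
        int count = Math.Min(3, lines.Length);
        return string.Join("\n", lines, start, count);
    }
}
```

With the example paragraph from the question, passing text.IndexOf("DOG") as the index would return lines 3, 4 and 5, and a one- or two-line paragraph comes back whole.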

Alright. Lemme have a crack,
I think the first thing I would do is split everything into arrays. Simply because then we have a simple way to "count" the lines.
string[] lines = fullstring.Split('\n');
Once we have that, unfortunately I don't know of any IndexOf that goes through each element of an array. There probably is one, but without trawling through the internet, I would simply go
int i = -1;
bool found = false;
string animal = "BIRD";
foreach (string line in lines)
{
    i++;
    if (line.IndexOf(animal) > -1) { found = true; break; }
}
// we will need an if (!found) check: then we didn't find the animal etc
Ok so then, We now have the line. All we need to do, is...
if (i == 0)
{
    Console.WriteLine(lines[0]);
    Console.WriteLine(lines[1]);
    // etc
}
else if (i == lines.Length - 1)
{
    // this means last array index
}
else
{
    // else we are in the middle. So just write out i - 1, i, i + 1
}
I know that is messy as hell. But that's how I would solve the issue.

Related

Overwriting Console line with various length texts

I'm trying to display a loading percentage in the same place on the console,
and I found this solution for that:
Console.Write($"\r{ (double) (i+1) * 100 / list.Count }% - {text}");
but after the percentage I'd want to display some text which has different lengths e.g something between 20-40 characters
The problem with this approach is that if "new" line is shorter than "previous" then some part of "previous" text still remains there.
I managed to write 'hack' which overwrites current line with spaces (clears it) and then writes my line
Console.Write($"\r ");
Console.Write($"\r{ (double) (i+1) * 100 / list.Count }% - {text}");
Is there a better way to do that?
The easiest way to do this is generally with
var stringOfLengthMaxWithSpacestoLeft = yourString.PadLeft(MaxStringLength, ' ');
or
var stringOfLengthMaxWithSpacestoRight = yourString.PadRight(MaxStringLength, ' ');
If you want to clear the line, all you have to do is use the backspace character to move the cursor back and then overwrite with the same length, i.e.
for (var i = 0; i < MaxStringLength; i++)
    Console.Write("\b");
Then you can start writing again.
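As a rough sketch of the padding idea, the helper below remembers how long the previous message was and pads the next one to at least that length, so leftover characters get overwritten. The class and member names (ProgressLine, Build) are invented for this example, not part of any library.

```csharp
using System;

public static class ProgressLine
{
    // Length of the message written last time (hypothetical state for this sketch).
    private static int lastLength = 0;

    // Returns "\r" plus the message, padded with spaces to cover the previous message.
    public static string Build(string message)
    {
        // PadRight leaves the string unchanged when it is already long enough.
        string padded = message.PadRight(lastLength);
        lastLength = message.Length;
        return "\r" + padded;
    }
}
```

In the loop from the question this might be used as Console.Write(ProgressLine.Build($"{(double)(i + 1) * 100 / list.Count}% - {text}")); which avoids the separate clearing write.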

Read random line from a large text file

I have a file with 5000+ lines. I want to find the most efficient way to choose one of those lines each time I run my program. I had originally intended to use the random method to choose one (that was before I knew there were 5000 lines). Thought that might be inefficient so I thought I'd look at reading the first line, then deleting it from the top and appending it to the bottom. But it seems that I have to read the whole file and create a new file to delete from the top.
What is the most efficient way: the random method or the new file method?
The program will be run every 5 mins and I'm using c# 4.5
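In .NET 4.0 and later, File.ReadLines enumerates a file lazily, so you can fetch a single line without loading the whole file into memory (the lines before it are still read and discarded). For example, to get line X: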
string line = File.ReadLines(FileName).Skip(X).First();
Full example:
var fileName = @"C:\text.txt";
var file = File.ReadLines(fileName).ToList();
int count = file.Count();
Random rnd = new Random();
int skip = rnd.Next(0, count);
string line = file.Skip(skip).First();
Console.WriteLine(line);
Let's assume the file is so large that you cannot afford to fit it into RAM. Then, you would want to use Reservoir Sampling, an algorithm designed to handle picking randomly from lists of unknown, arbitrary length that might not fit into memory:
Random r = new Random();
int currentLine = 1;
string pick = null;
foreach (string line in File.ReadLines(filename))
{
if (r.Next(currentLine) == 0) {
pick = line;
}
++currentLine;
}
return pick;
This algorithm is slightly unintuitive. At a high level, reservoir sampling follows a basic rule: line N has a 1/N chance of replacing the currently selected line. Thus, line 1 has a 100% chance of being selected, but a 50% chance of later being replaced by line 2.
I've found understanding this algorithm to be easiest in the form of a proof of correctness. So, a simple proof by induction:
1) Base case: By inspection, the algorithm works if there is 1 line.
2) If the algorithm works for N-1 lines, processing N lines works because:
3) After processing N-1 iterations of an N line file, all N-1 lines are equally likely (probability 1/(N-1)).
4) The next iteration ensures that line N has a probability of 1/N (because that's what the algorithm explicitly assigns it, and it is the final iteration), reducing the probability of all previous lines to:
1/(N-1) * (1 - 1/N)
= 1/(N-1) * (N/N - 1/N)
= 1/(N-1) * (N-1)/N
= (1*(N-1)) / (N*(N-1))
= 1/N
If you know how many lines are in the file in advance, this algorithm is more expensive than necessary, as it always reads the entire file.
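For experimenting with the idea, here is a small self-contained variant of the same loop that takes any sequence of lines and a Random instance. The names Reservoir and PickRandomLine are made up for this sketch; the logic is the size-one reservoir described above.

```csharp
using System;
using System.Collections.Generic;

public static class Reservoir
{
    // Picks one element uniformly at random from a sequence of unknown length.
    public static string PickRandomLine(IEnumerable<string> lines, Random random)
    {
        string pick = null;
        int seen = 0;
        foreach (string line in lines)
        {
            seen++;
            // random.Next(seen) is uniform over 0..seen-1,
            // so this branch is taken with probability 1/seen.
            if (random.Next(seen) == 0)
                pick = line;
        }
        return pick; // null when the sequence is empty
    }
}
```

A call such as Reservoir.PickRandomLine(File.ReadLines(path), new Random()) would stream the file once without holding it in memory.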
I assume that the goal is to randomly choose one line from a file of 5000+ lines.
Try this:
Get the line count using File.ReadLines(file).Count().
Generate a random number, using the line count as an upper limit.
Do a lazy read of the file with File.ReadLines(file).
Choose a line from this sequence using the random number (for example with Skip and First).
EDIT: as pointed out, doing File.ReadLines(file).ToArray() is pretty inefficient.
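The steps above might look roughly like this. It is a sketch only: RandomLineByCount and Pick are hypothetical names, and the file is deliberately enumerated twice (once to count, once to fetch) rather than materialized into an array.

```csharp
using System;
using System.IO;
using System.Linq;

public static class RandomLineByCount
{
    // Counts the lines lazily, then lazily skips to a randomly chosen one.
    public static string Pick(string path, Random random)
    {
        int count = File.ReadLines(path).Count();
        if (count == 0)
            return null; // empty file: nothing to pick

        int skip = random.Next(count); // 0 <= skip < count
        return File.ReadLines(path).Skip(skip).First();
    }
}
```

For a file of around 5000 lines read every few minutes, two passes like this should be cheap, though that is an expectation rather than a measurement.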
Here's a quick implementation of @LucasTrzesniewski's proposed method in the comments to the question:
// open the file
using(FileStream stream = File.OpenRead("yourfile.dat"))
{
// 1. index all offsets that are the beginning of a line
List<long> lineOffsets = new List<long>();
lineOffsets.Add(stream.Position); //the very first offset is a beginning of a line!
int ch;
while((ch = stream.ReadByte()) != -1) // "-1" denotes the end of the file
{
if(ch == '\n')
lineOffsets.Add(stream.Position);
}
// 2. read a random line
stream.Seek(0, SeekOrigin.Begin); // go back to the beginning of the file
// set the position of the stream to one the previously saved offsets
stream.Position = lineOffsets[new Random().Next(lineOffsets.Count)];
// read the whole line from the specified offset
using(StreamReader reader = new StreamReader(stream))
{
Console.WriteLine(reader.ReadLine());
}
}
I don't have any VS near me at the moment, so this is untested.

Selecting text and changing its color on a line of a RichTextBox

Good Afternoon. I am new to stack overflow as a poster but have referenced it for years. I have been researching this problem of mine for about 2 weeks and while I've seen solutions that are close I still am left with an issue.
I am writing a C# gui that reads in an assembly code file and highlights different text items for further processing via another program. My form has a RichTextBox that the text is displayed in. In the case below I am trying to select the text at the location of the ‘;’ until the end of the line and change the text to color red. Here is the code that I am using.
Please note: The files that are read in by the program are of inconsistent length, not all lines are formatted the same so I cannot simply search for the ';' and operate on that.
On another post a member has given an extension method for AppendText which I have gotten to work perfectly except for the original text is still present along with my reformatted text. Here is the link to that site:
How to use multi color in richtextbox
// Loop that it all runs in
foreach (var line in inArray)
{
// getting the index of the ‘;’ assembly comments
int cmntIndex = line.LastIndexOf(';');
// getting the index of where I am in the rtb at this time.
int rtbIndex = rtb.GetFirstCharIndexOfCurrentLine();
// just making sure I have a valid index
if (cmntIndex != -1)
{
// using rtb.select to only select the desired
// text but for some reason I get it all
rtb.Select(cmntIndex + rtbIndex, rtb.SelectionLength);
rtb.SelectionColor = Color.Red;
}
}
Below is the sample assembly code from a file in its original form; all the text is black:
;;TAG SOMETHING, SOMEONE START
ASSEMBLY CODE ; Assembly comments
ASSEMBLY CODE ; Assembly comments
ASSEMBLY CODE ; Assembly comments
;;TAG SOMETHING, SOMEONE FINISH
When rtb.GetFirstCharIndexOfCurrentLine() is called it returns a valid index of the RTB and I imagine that if I add the value returned by line.LastIndexOf(';') I will then be able to just select the text above that looks like ; Assembly comments and turn it red.
What does happen is that the entire line turns red.
When I use the AppendText method above I get
ASSEMBLY CODE (this is black) ; Assembly comments (this is red) (the rest is black) ASSEMBLY CODE ; Assembly comments
The black code is the exact same code as the recolored text. In this case I need to know how to clear the line in the RTB and/or overwrite the text there. All the options that I have tried result in deletion of those lines.
Anywho, I'm sure that was lengthy but I'm really stumped here and would greatly appreciate advice.
I hope I've understood you correctly.
This loops over each line in the richtextbox, works out which lines are the assembly comments, then makes everything red after the ";"
With FOREACH loop as requested
To use a foreach loop you simply need to keep track of the index manually like so:
// Index
int index = 0;
// Loop over each line
foreach (string line in richTextBox1.Lines)
{
    // Ignore the non-assembly (";;") lines and lines without a ';' comment
    // (StartsWith also avoids an exception on lines shorter than 2 characters)
    if (!line.StartsWith(";;") && line.Contains(";"))
    {
        // Start position
        int start = (richTextBox1.GetFirstCharIndexFromLine(index) + line.LastIndexOf(";") + 1);
        // Length
        int length = line.Length - line.LastIndexOf(";");
        // Make the selection
        richTextBox1.SelectionStart = start;
        richTextBox1.SelectionLength = length;
        // Change the colour
        richTextBox1.SelectionColor = Color.Red;
    }
    // Increase index
    index++;
}
With FOR loop
// Loop over each line (read the Lines array once; the property rebuilds it on every access)
string[] lines = richTextBox1.Lines;
for (int i = 0; i < lines.Length; i++)
{
    // Current line text
    string currentLine = lines[i];
    // Ignore the non-assembly (";;") lines and lines without a ';' comment
    if (!currentLine.StartsWith(";;") && currentLine.Contains(";"))
    {
        // Start position
        int start = (richTextBox1.GetFirstCharIndexFromLine(i) + currentLine.LastIndexOf(";") + 1);
        // Length
        int length = currentLine.Length - currentLine.LastIndexOf(";");
        // Make the selection
        richTextBox1.SelectionStart = start;
        richTextBox1.SelectionLength = length;
        // Change the colour
        richTextBox1.SelectionColor = Color.Red;
    }
}
Edit:
Re-reading your question I'm confused as to whether you wanted to make the ; red as well.
If you do, remove the +1 from this line:
int start = (richTextBox1.GetFirstCharIndexFromLine(i) + currentLine.LastIndexOf(";") + 1);
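The offset arithmetic can be checked without a form by pulling it into a small pure function. The sketch below is an illustration only: CommentSpan, TryGet and the includeSemicolon flag are names invented here, and lineStart stands for whatever GetFirstCharIndexFromLine returns for that line.

```csharp
using System;

public static class CommentSpan
{
    // Computes the selection start and length for the trailing ';' comment of one line.
    // Returns false for ";;" tag lines and for lines that contain no ';'.
    public static bool TryGet(string line, int lineStart, bool includeSemicolon,
                              out int start, out int length)
    {
        start = 0;
        length = 0;
        if (line.StartsWith(";;", StringComparison.Ordinal))
            return false;

        int semicolon = line.LastIndexOf(';');
        if (semicolon < 0)
            return false;

        // Begin at the ';' itself, or one character after it.
        int offset = includeSemicolon ? semicolon : semicolon + 1;
        start = lineStart + offset;
        length = line.Length - offset; // up to the end of the line
        return true;
    }
}
```

The resulting start and length could then be passed to richTextBox1.Select(start, length) before setting SelectionColor.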
Private Sub RichTextBox1_Click(sender As Object, e As EventArgs) Handles RichTextBox1.Click
Dim MyInt1 As Integer
Dim MyInt2 As Integer
' Reset your RTB back color to white at each click
RichTextBox1.SelectionBackColor = Color.White
' Define the nth first character number of the line you clicked
MyInt1 = RichTextBox1.GetFirstCharIndexOfCurrentLine()
' use that nth to find the line number in the RTB
MyInt2 = RichTextBox1.GetLineFromCharIndex(MyInt1)
'Select the line using an array property of RTB (RichTextBox1.Lines())
RichTextBox1.Select(MyInt1, RichTextBox1.Lines(MyInt2).Length)
' This line would be for font color change : RichTextBox1.SelectionColor = Color.Maroon
' This one changes back color :
RichTextBox1.SelectionBackColor = Color.Yellow
End Sub
' There are a few bugs inherent to the rtb.select method
' It bugs if a line wraps, or fails on an "http" line... probably more.
(I just noticed the default stackoverflow.com character colors on my above code are not correct for comment lines and others.)

RichTextBox - sorting lines randomly

I want to write an application which randomly sorts the lines of text that I copy from a source and paste into a RichTextBox.
However, there is one condition: the text is formatted (some words are in bold, underlined, etc.). So any suggestions? What should it look like?
I think I should use RichTextBox.Rtf or something but I am really a beginner and I appreciate every hint or example code.
Thanks
It is a bit tricky. You can retrieve the formatted RTF text lines like this
string[] rtfLines = new string[richTextBox1.Lines.Length];
for (int i = 0; i < rtfLines.Length; i++) {
int start = richTextBox1.GetFirstCharIndexFromLine(i);
int length = richTextBox1.Lines[i].Length;
richTextBox1.Select(start, length);
rtfLines[i] = richTextBox1.SelectedRtf;
}
Now you can shuffle the lines like this
var random = new Random();
rtfLines = rtfLines.OrderBy(s => random.NextDouble()).ToArray();
Clear the RichTextBox
richTextBox1.Text = "";
Inserting the lines is best done in reverse order because it is easier to select the beginning of the text
// Insert the line which will be the last line.
richTextBox1.Select(0, 0);
richTextBox1.SelectedRtf = rtfLines[0];
// Prepend the other lines and add a line break.
for (int i = 1; i < rtfLines.Length; i++) {
richTextBox1.Select(0, 0);
// Replace the ending "}\r\n" with "\\par }\r\n". "\\par" is a line break.
richTextBox1.SelectedRtf =
rtfLines[i].Substring(0, rtfLines[i].Length - 3) + "\\par }\r\n";
}
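For the shuffle step, an alternative to ordering by random keys is an in-place Fisher-Yates shuffle. This swaps in a different technique from the OrderBy call above; the Shuffler class name is made up for this sketch.

```csharp
using System;

public static class Shuffler
{
    // In-place Fisher-Yates shuffle: each permutation is equally likely
    // (to the extent that `random` is uniform).
    public static void Shuffle<T>(T[] items, Random random)
    {
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = random.Next(i + 1); // 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }
}
```

Calling Shuffler.Shuffle(rtfLines, random) would take the place of the OrderBy line; for a handful of lines the difference is mostly a matter of taste.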
The task doesn't seem complicated (if I understand it correctly).
Get your clipboard contents into a string, then parse it into an array using Split().
Then determine how many random events you need and iterate through every word; generate a random number for each iteration (which should match the number of events), match that number to one of the events and apply that case to that particular word. Maybe not the most efficient way to do it, but that's what comes to my mind.

How to get String Line number in Foreach loop from reading array?

The program helps users to parse a text file by grouping certain part of the text files into "sections" array.
So the question is "Are there any methods to find out the line numbers/position within the array?" The program utilizes a foreach loop to read the "sections" array.
Could someone please advise on the code? Thanks!
namespace Testing
{
class Program
{
static void Main(string[] args)
{
TextReader tr = new StreamReader(@"C:\Test\new.txt");
String SplitBy = "----------------------------------------";
// Skip 5 lines of the original text file
for(var i = 0; i < 5; i++)
{
tr.ReadLine();
}
// Read the reststring
String fullLog = tr.ReadToEnd();
String[] sections = fullLog.Split(new string[] { SplitBy }, StringSplitOptions.None);
//String[] lines = sections.Skip(5).ToArray();
int t = 0;
// Tried using foreach (String r in sections.skip(4)) but skips sections instead of the Text lines found within each sections
foreach (String r in sections)
{
Console.WriteLine("The times are : " + t);
// Is there a way to know or get the "r" line number?
Console.WriteLine(r);
Console.WriteLine("============================================================");
t++;
}
}
}
}
A foreach loop doesn't have a loop counter of any kind. You can keep your own counter:
int number = 1;
foreach (var element in collection) {
// Do something with element and number,
number++;
}
or, perhaps easier, make use of LINQ's Enumerable.Select that gives you the current index:
var numberedElements = collection.Select((element, index) => new { element, index });
with numberedElements being a collection of anonymous type instances with properties element and index. In the case of a file you can do this:
var numberedLines = File.ReadLines(filename)
.Select((Line,Number) => new { Line, Number });
with the advantage that the whole thing is processed lazily, so it will only read the parts of the file into memory that you actually use.
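As a small illustration of the indexed Select overload, the sketch below prefixes each line with a 1-based number. Numbering and NumberLines are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Numbering
{
    // Pairs each line with its 1-based position using Select's (element, index) overload.
    public static List<string> NumberLines(IEnumerable<string> lines)
    {
        return lines
            .Select((line, index) => (index + 1) + ": " + line)
            .ToList();
    }
}
```

The same call works on the result of splitting a single section on '\n', which would give line numbers relative to the start of that section.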
As far as I know, there is not a way to know which line number you are at within the file. You'd either have to keep track of the lines yourself, or read the file again until you get to that line and count along the way.
Edit:
So you're trying to get the line number of a string inside the array after the master string's been split by the SplitBy?
If there's a specific delimiter in that sub string, you could split it again - although, this might not give you what you're looking for, except...
You're essentially back at square one.
What you could do is try splitting the section string by newline characters. This should spit it out into an array that corresponds with line numbers inside the string.
Yes, you can use a for loop instead of foreach. Also, if you know the file isn't going to be too large, you can read all of the lines into an array with:
string[] lines = File.ReadAllLines(@"C:\Test\new.txt");
Well, don't use a foreach, use a for loop
for( int i = 0; i < sections.Length; i++ )
{
string section = sections[i];
int lineNum = i + 1;
}
You can of course maintain a counter when using a foreach loop as well, but there is no reason to since you have the standard for loop at your disposal which is made for this sort of thing.
Of course, this won't necessarily give you the line number of the string in the text file unless you split on Environment.NewLine. You are splitting on a large number of '-' characters and I have no idea how your file is structured. You'll likely end up underestimating the line number because all of the '---' bits will be discarded.
Not as your code is written. You must track the line number for yourself. Problematic areas of your code:
You skip 5 lines at the beginning of your code, you must track this.
Using the Split method, you are potentially "removing" lines from the original collection of lines. You must find a way to know how many splits you have made, because they are an original part of the line count.
Rather than taking the approach you have, I suggest doing the parsing and searching within a classic indexed for-loop that visits each line of the file. This probably means giving up conveniences like Split, and rather looking for markers in the file manually with e.g. IndexOf.
I've got a much simpler solution to the questions after reading through all the answers yesterday.
As each section string has a newline after each line, it is possible to split the string into a new array, and the line number can then be worked out from the position in that array.
The code:
foreach (String r in sections)
{
Console.WriteLine("The times are : " + t);
IList<String> names = r.Split('\n').ToList<String>();
}
