find all next lines when previous line contains a string

find all next lines when previous line contains a string - c#

I'm working on an ASP mvc application and i'm trying to get all the next lines when previous line contains a word
I've used the code below but i just can get the last line that contains the word given
int counter = 0;
string line;
List<string> found = new List<string>();
// Read the file and display it line by line.
System.IO.StreamReader file = new System.IO.StreamReader("C:\\Users\\Chaimaa\\Documents\\path.txt");
while ((line = file.ReadLine()) != null)
{
if (line.Contains("fact"))
{
found.Add(line);
}
foreach (var i in found)
{
var output = i;
ViewBag.highlightedText = output;
}
}
Any help on what should I add to
1- get ALL lines that contains the word
2- and preferably get the ALL NEXT lines

You can use an overload of Where that provides an index, store indexes in a hash set, and use the containment check to decide if a line should be kept or not, like this:
var seen = new HashSet<int>();
var res = data.Where((v, i) => {
if (v.Contains("fact")) {
seen.Add(i);
}
return seen.Contains(i-1);
});
Demo.
As a side benefit, seen would contain indexes of all lines where the word "fact" has been found.

You can write a PairWise method that takes in a sequence of values and returns a sequence containing each item paired with the item that came before it:
public static IEnumerable<Tuple<T, T>> Pairwise<T>(this IEnumerable<T> source)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
T prev = iterator.Current;
while (iterator.MoveNext())
{
yield return Tuple.Create(prev, iterator.Current);
prev = iterator.Current;
}
}
}
With this method we can pair off each of the lines, get the lines where the previous value contains a word, and then project out the second value, which is the line after it:
var query = lines.Pairwise()
.Where(pair => pair.Item1.Contains(word))
.Select(pair => pair.Item2);

Related

How to split a string into an array of two letter substrings with C#

Problem
Given a sample string abcdef, i am trying to split that into an array of two character string elements that should results in ['ab','cd','ef'];
What i tried
I tried to iterate through the string while storing the substring in the current index in an array i declared inside the method, but am getting this output
['ab','bc','cd','de','ef']
Code I used
static string[] mymethod(string str)
{
string[] r= new string[str.Length];
for(int i=0; i<str.Length-1; i++)
{
r[i]=str.Substring(i,2);
}
return r;
}
Any solution to correct that with the code to return the correct output is really welcome, Thanks

your problem was that you incremented your index by 1 instead of 2 every time
var res = new List<string>();
for (int i = 0; i < x.Length - 1; i += 2)
{
res.Add(x.Substring(i, 2));
}
should work
EDIT:
because you ask for a default _ suffix in case of odd characters amount,
this should be the change:
var testString = "odd";
string workOn = testString.Length % 2 != 0
? testString + "_"
: testString;
var res = new List<string>();
for (int i = 0; i < workOn.Length - 1; i += 2)
{
res.Add(workOn.Substring(i, 2));
}
two notes to notice:
in .NET 6 Chunk() is available so you can use this as suggested in other answers
this solution might not be the best in case of a very long input
so it really depends on what are your inputs and expectations

.net 6 has an IEnumerable.Chunk() method that you can use to do this, as follows:
public static void Main()
{
string[] result =
"abcdef"
.Chunk(2)
.Select(chunk => new string(chunk)).ToArray();
Console.WriteLine(string.Join(", ", result)); // Prints "ab, cd, ef"
}
Before .net 6, you can use MoreLinq.Batch() to do the same thing.
[EDIT] In response the the request below:
MoreLinq is a set of Linq utilities originally written by Jon Skeet. You can find an implementation by going to Project | Manage NuGet Packages and then browsing for MoreLinq and installing it.
After installing it, add using MoreLinq.Extensions; and then you'll be able to use the MoreLinq.Batch extension like so:
public static void Main()
{
string[] result = "abcdef"
.Batch(2)
.Select(chunk => new string(chunk.ToArray())).ToArray();
Console.WriteLine(string.Join(", ", result)); // Prints "ab, cd, ef"
}
Note that there is no string constructor that accepts an IEnumerable<char>, hence the need for the chunk.ToArray() above.
I would say, though, that including the whole of MoreLinq just for one extension method is perhaps overkill. You could just write your own extension method for Enumerable.Chunk():
public static class MyBatch
{
public static IEnumerable<T[]> Chunk<T>(this IEnumerable<T> self, int size)
{
T[] bucket = null;
int count = 0;
foreach (var item in self)
{
if (bucket == null)
bucket = new T[size];
bucket[count++] = item;
if (count != size)
continue;
yield return bucket;
bucket = null;
count = 0;
}
if (bucket != null && count > 0)
yield return bucket.Take(count).ToArray();
}
}

If you are using latest .NET version i.e (.NET 6.0 RC 1), then you can try Chunk() method,
var strChunks = "abcdef".Chunk(2); //[['a', 'b'], ['c', 'd'], ['e', 'f']]
var result = strChunks.Select(x => string.Join('', x)).ToArray(); //["ab", "cd", "ef"]
Note: I am unable to test this on fiddle or my local machine due to latest version of .NET

With linq you can achieve it with the following way:
char[] word = "abcdefg".ToCharArray();
var evenCharacters = word.Where((_, idx) => idx % 2 == 0);
var oddCharacters = word.Where((_, idx) => idx % 2 == 1);
var twoCharacterLongSplits = evenCharacters
.Zip(oddCharacters)
.Select((pair) => new char[] { pair.First, pair.Second });
The trick is the following, we create two collections:
one where we have only those characters where the original index was even (% 2 == 0)
one where we have only those characters where the original index was odd (% 2 == 1)
Then we zip them. So, we create a tuple by taking one item from the even and one item from the odd collection. Then we create a new tuple by taking one item from the even and ...
And last we convert the tuples to arrays to have the desired output format.

You are on the right track but you need to increment by 2 not by one. You also need to check if the array has not ended before taking the second character else you risk running into an index out of bounds exception. Try this code I've written below. I've tried it and it works. Best!
public static List<string> splitstring(string str)
{
List<string> result = new List<string>();
int strlen = str.Length;
for(int i = 0; i<strlen; i+=2)
{
string currentstr = str[i].ToString();
if (i + 1 <= strlen-1)
{ currentstr += str[i + 1].ToString(); }
result.Add(currentstr);
}
return result;
}

Quickest algorithm for identifying pairs in collection of string

I am looking for the quickest algorithm:
GOAL: output the total number of pair occurrences found on a line. The individual elements may be in any order on any given line.
INPUT:
a;b;c;d
a;e;f;g
a;b;f;h
OUTPUT
a;b = 2
a;c = 1
a;d = 1
a;e = 1
a;f = 2
a;g = 1
b;c = 1
b;d = 1
I am programming in C#, I've got a nested for loop adding do a common dictionary of type where string is like a;b and when an occurrence is found it adds to the existing int tally or adds a new one at tally = 0.
Note this:
a;b = 1
b;a = 1
Should be reduced to this:
a;b = 1
I am open to using other languages, the output is in a plain text file which I feed into Gephi visualization tool.
Bonus: Very interested to know the name of this particular algorithm if it's out there. Pretty sure it is.
String[] data = File.ReadAllLines(#"C:\input.txt");
Dictionary<string, int> ress = new Dictionary<string, int>();
foreach (var line in data)
{
string[] outStrings = line.Split(';');
for (int i = 0; i < outStrings.Count(); i++)
{
for (int y = 0; y < outStrings.Count(); y++)
{
if (outStrings[i] != outStrings[y])
{
try
{
if (ress.Any(x => x.Key == outStrings[i] + ";" + outStrings[y]))
{
ress[outStrings[i] + ";" + outStrings[y]] += 1;
}
else
{
ress.Add(outStrings[i] + ";" + outStrings[y], 0);
}
}
catch (Exception)
{
}
}
}
}
}
foreach (var val in ress)
{
Console.WriteLine(val.Key + "----" + val.Value);
}

I think your inner loop should start with i + 1 instead of starting back at 0 again, and the outer loop should only run until Length - 1, since the last item will be compared on the inner loop. Also, when you add a new item, you should add the value 1, not 0 (since the whole reason we're adding it is because we found one).
You can also just store the key into a string once instead of doing multiple concatenations during your comparison and assignment, and you can use the ContainsKey method to determine if a key exists already.
Also, you might want to consider avoiding empty catch blocks unless you're really certain that you don't care if or what went wrong. If I'm expecting an exception and know how to handle it, then I catch that exception, otherwise I'll just let it bubble up the stack.
Here's one way you could modify your code to find all pairs and their counts:
Update
I added a check to ensure that the "pair" key is always sorted, so that "b;a" becomes "a;b". This wasn't an issue in your sample data, but I extended the data to include lines like b;a;a;b;a;b;a;. Also I added StringSplitOptions.RemoveEmptyEntries to the Split method to handle cases where a line begins or ends with a ; (otherwise the null value resulted in a pair like ";a").
private static void Main()
{
var data = File.ReadAllLines(#"f:\public\temp\temp.txt");
var pairCount = new Dictionary<string, int>();
foreach (var line in data)
{
var lineItems = line.Split(new[] {';'}, StringSplitOptions.RemoveEmptyEntries);
for (var outer = 0; outer < lineItems.Length - 1; outer++)
{
for (var inner = outer + 1; inner < lineItems.Length; inner++)
{
var outerComparedToInner = string.Compare(lineItems[outer],
lineItems[inner], StringComparison.Ordinal);
// If both items are the same character, ignore them and keep looping
if (outerComparedToInner == 0) continue;
// Create the pair such that the lower of the two
// values is first, so that "b;a" becomes "a;b"
var thisPair = outerComparedToInner < 0
? $"{lineItems[outer]};{lineItems[inner]}"
: $"{lineItems[inner]};{lineItems[outer]}";
if (pairCount.ContainsKey(thisPair))
{
pairCount[thisPair]++;
}
else
{
pairCount.Add(thisPair, 1);
}
}
}
}
Console.WriteLine("Pair\tCount\n----\t-----");
foreach (var val in pairCount.OrderBy(i => i.Key))
{
Console.WriteLine($"{val.Key}\t{val.Value}");
}
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();
}
Output
Given a file containing your sample data, the output is:

#mrmcgreg, finally after changing the implementation to the ECLAT algorythm everything runs in seconds instead of hours.
Basically for each unique tag, keep track of the LINE NUMBERS where those tags are found, and simply intersect the pair of list of numbers by combination pairs to get the count.
Dictionary<string, List<int>> uniqueTagList = new Dictionary<string, List<int>>();
foreach (var uniqueTag in uniquetags)
{
List<int> lineNumbers = new List<int>();
foreach (var item in data.Select((value, i) => new { i, value }))
{
var value = item.value;
var index = item.i;
//split data into tags
var tags = item.ToString().Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var tag in tags)
{
if (uniqueTag == tag)
{
lineNumbers.Add(index);
}
}
}
//remove all having support threshold.
if (lineNumbers.Count > 5)
{
uniqueTagList.Add(uniqueTag, lineNumbers);
}
}

Reading the 2 last line from a text

i am new to c# and i am working on an app that display the time difference from two date on the last two line on a text file.
I want to read the before last line from a file text, i already know how to read the last line but i need to read the before last.
This is my code :
var lastLine = File.ReadAllLines("C:\\test.log").Last();
richTextBox1.Text = lastLine.ToString();

All the previous answers eagerly load all the file up in memory before returning the requested last lines. This can be an issue if the file is big. Luckily, it is easily avoidable.
public static IEnumerable<string> ReadLastLines(string path, int count)
{
if (count < 1)
return Enumerable.Empty<string>();
var queue = new Queue<string>(count);
foreach (var line in File.ReadLines(path))
{
if (queue.Count == count)
queue.Dequeue();
queue.Enqueue(line);
}
return queue;
}
This will only keep in memory the last n read lines avoiding memory issues with large files.

Since
File.ReadAllLines("C:\\test.log");
returns an array you can take the last two items of the array:
var data = File.ReadAllLines("C:\\test.log");
string last = data[data.Length - 1];
string lastButOne = data[data.Length - 2];
In general case with long files (and that's why ReadAllLines is a bad choice) you can implement
public static partial class EnumerableExtensions {
public static IEnumerable<T> Tail<T>(this IEnumerable<T> source, int count) {
if (null == source)
throw new ArgumentNullException("source");
else if (count < 0)
throw new ArgumentOutOfRangeException("count");
else if (0 == count)
yield break;
Queue<T> queue = new Queue<T>(count + 1);
foreach (var item in source) {
queue.Enqueue(item);
if (queue.Count > count)
queue.Dequeue();
}
foreach (var item in queue)
yield return item;
}
}
...
var lastTwolines = File
.ReadLines("C:\\test.log") // Not all lines
.Tail(2);

You can try to do this
var lastLines = File.ReadAllLines("C:\\test.log").Reverse().Take(2).Reverse();
But depending on how large your file is there are probably more efficient methods to process this than reading all lines at once. See Get last 10 lines of very large text file > 10GB and How to read last “n” lines of log file

Simply store the result of ReadAllLines to a variable and than take the two last ones:
var allText = File.ReadAllLines("C:\\test.log");
var lastLines = allText.Skip(allText.Length - 2);

You can use Skip() and Take() like
var lastLine = File.ReadAllLines("C:\\test.log");
var data = lastLine.Skip(lastLine.Length - 2);
richTextBox1.Text = lastLine.ToString();

You can use StreamReader in a combination of Queue<string> since you have to read whole file either way.
// if you want to read more lines change this to the ammount of lines you want
const int LINES_KEPT = 2;
Queue<string> meQueue = new Queue<string>();
using ( StreamReader reader = new StreamReader(File.OpenRead("C:\\test.log")) )
{
string line = string.Empty;
while ( ( line = reader.ReadLine() ) != null )
{
if ( meQueue.Count == LINES_KEPT )
meQueue.Dequeue();
meQueue.Enqueue(line);
}
}
Now you can just use these 2 lines like such :
string line1 = meQueue.Dequeue();
string line2 = meQueue.Dequeue(); // <-- this is the last line.
Or to add this to the RichTextBox :
richTextBox1.Text = string.Empty; // clear the text
while ( meQueue.Count != 0 )
{
richTextBox1.Text += meQueue.Dequeue(); // add all lines in the same order as they were in file
}
Using File.ReadAllLines will read the whole text and then using Linq will iterate through already red lines. This method does everything in one run.

string line;
string[] lines = new string[]{"",""};
int index = 0;
using ( StreamReader reader = new StreamReader(File.OpenRead("C:\\test.log")) )
{
while ( ( line = reader.ReadLine() ) != null )
{
lines[index] = line;
index = 1-index;
}
}
// Last Line -1 = lines[index]
// Last line = lines[1-index]

randomly select a line in a textfile c# streamreader

Hi I am reading from a text file and would like each line to be put into a seperate variable. From what I remember from my programming classes arrays cannot be dynamic. So if I set 15 arrays, and the text file has 1000 lines what can I do and how do I implement it.
The thing is only one line will be needed but I want the line to be randomly selected. the linetext is the whole text file with \r\n appended to the end of every request.
Maybe randomly select the \r\n then count 4 and add the string after it till the next \r\n. The problem with this idea is the strings getting called will also contain \ so any ideas?
if (crawler == true)
{
TextReader tr = new StreamReader("textfile.txt");
while (tr.Peek() != -1)
{
linktext = linktext + tr.ReadLine() + "\r\n";
}
//link = linktext;
hi.Text = linktext.ToString();
timer1.Interval = 7000; //1000ms = 1sec 7 seconds per cycle
timer1.Tick += new EventHandler(randomLink); //every cycle randomURL is called.
timer1.Start(); // start timer.
}

File.ReadAllLines(...) will read every line of the given file into an array of strings. I think that should be what you want but your question is kind of hard to follow.

You don't need to keep more than two lines in memory at a time... there's a sneaky trick you can use:
Create an instance of Random, or take one as a parameter
Read the first line. This automatically becomes the "current" line to return
Read the second line, and then call Random.Next(2). If the result is 0, make the second line the "current" line
Read the third line, and then call Random.Next(3). If the result is 0, make the third line the "current" line
... etc
When you reach the end of the file (reader.ReadLine returns null) return the "current" line.
Here's a general implementation for an IEnumerable<T> - if you're using .NET 4, you can use File.ReadLines() to get an IEnumerable<string> to pass to it. (This implementation has a bit more in it than is really needed - it's optimized for IList<T> etc.)
public static T RandomElement<T>(this IEnumerable<T> source,
Random random)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (random == null)
{
throw new ArgumentNullException("random");
}
ICollection collection = source as ICollection;
if (collection != null)
{
int count = collection.Count;
if (count == 0)
{
throw new InvalidOperationException("Sequence was empty.");
}
int index = random.Next(count);
return source.ElementAt(index);
}
ICollection<T> genericCollection = source as ICollection<T>;
if (genericCollection != null)
{
int count = genericCollection.Count;
if (count == 0)
{
throw new InvalidOperationException("Sequence was empty.");
}
int index = random.Next(count);
return source.ElementAt(index);
}
using (IEnumerator<T> iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
throw new InvalidOperationException("Sequence was empty.");
}
int countSoFar = 1;
T current = iterator.Current;
while (iterator.MoveNext())
{
countSoFar++;
if (random.Next(countSoFar) == 0)
{
current = iterator.Current;
}
}
return current;
}
}

A List<T> is a dynamically expanding list. You might want to use that instead of an array.
If there is only 1000 elements, just read them into the list and select a random element.

Regarding the array thing.. you could use a List<> instead, which is dynamic
Here is an example of how this can be achieved:
public static string GetRandomLine(ref string file) {
List<string> lines = new List<string>();
Random rnd = new Random();
int i = 0;
try {
if (File.Exists(file)) {
StreamReader reader = new StreamReader(file);
while (!(reader.Peek() == -1))
lines.Add(reader.ReadLine());
i = rnd.Next(lines.Count);
reader.Close();
reader.Dispose();
return lines[i].Trim();
}
else {
return string.Empty;
}
}
catch (IOException ex) {
MessageBox.Show("Error: " + ex.Message);
return string.Empty;
}
}

If you create the file then the ideal way would be to store meta data about the file, like the number of lines, before hand, and then decide which 'random' line to choose.
Otherwise, you cant get around the "array" problem by not using them. Instead use a List which stores any number of strings. After that picking a random one is as simple as generating a random number between 0 and the size of the list.
Your problem has been done before, I recommend googling for "C# read random line from file".

Split large file into smaller files by number of lines in C#?

I am trying to figure out how to split a file by the number of lines in each file. THe files are csv and I can't do it by bytes. I need to do it by lines. 20k seems to be a good number per file. What is the best way to read a stream at a given position? Stream.BaseStream.Position? So if I read the first 20k lines i would start the position at 39,999? How do I know I am almost at the end of a files? Thanks all

using (System.IO.StreamReader sr = new System.IO.StreamReader("path"))
{
int fileNumber = 0;
while (!sr.EndOfStream)
{
int count = 0;
using (System.IO.StreamWriter sw = new System.IO.StreamWriter("other path" + ++fileNumber))
{
sw.AutoFlush = true;
while (!sr.EndOfStream && ++count < 20000)
{
sw.WriteLine(sr.ReadLine());
}
}
}
}

int index=0;
var groups = from line in File.ReadLines("myfile.csv")
group line by index++/20000 into g
select g.AsEnumerable();
int file=0;
foreach (var group in groups)
File.WriteAllLines((file++).ToString(), group.ToArray());

I'd do it like this:
// helper method to break up into blocks lazily
public static IEnumerable<ICollection<T>> SplitEnumerable<T>
(IEnumerable<T> Sequence, int NbrPerBlock)
{
List<T> Group = new List<T>(NbrPerBlock);
foreach (T value in Sequence)
{
Group.Add(value);
if (Group.Count == NbrPerBlock)
{
yield return Group;
Group = new List<T>(NbrPerBlock);
}
}
if (Group.Any()) yield return Group; // flush out any remaining
}
// now it's trivial; if you want to make smaller files, just foreach
// over this and write out the lines in each block to a new file
public static IEnumerable<ICollection<string>> SplitFile(string filePath)
{
return File.ReadLines(filePath).SplitEnumerable(20000);
}
Is that not sufficient for you? You mention moving from position to position,but I don't see why that's necessary.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

find all next lines when previous line contains a string - c#

Related

How to split a string into an array of two letter substrings with C#

Quickest algorithm for identifying pairs in collection of string

Reading the 2 last line from a text

randomly select a line in a textfile c# streamreader

Split large file into smaller files by number of lines in C#?

Categories

Resources