What I'm trying to do here is delete the longest line from a text file. The code does its job, but I also need it to delete multiple "longest lines" (lines tied for the maximum length) and blank lines as well. Any ideas on how to do it?
Code is in C#
namespace _5_2
{
//------------------------------------------------------------
class Program
{
const string CFd = "..\\..\\U1.txt";
const string CFr = "..\\..\\Results.txt";
static void Main(string[] args)
{
int nr;
Read(CFd, out nr);
Print(CFd, CFr, nr);
Console.WriteLine("Longest line nr. {0, 4:d}", nr + 1);
Console.WriteLine("Program done");
}
//------------------------------------------------------------
/** Finds number of the longest line.
@param fv - text file name
@param nr - number of the longest line */
//------------------------------------------------------------
static void Read(string fv, out int nr)
{
string[] lines = File.ReadAllLines(fv, Encoding.GetEncoding(1257));
int ilgis = 0;
nr = 0;
int nreil = 0;
foreach (string line in lines)
{
if (line.Length > ilgis)
{
ilgis = line.Length;
nr = nreil;
}
nreil++;
}
}
static void Print(string fv, string fvr, int nr)
{
string[] lines = File.ReadAllLines(fv, Encoding.GetEncoding(1257));
int nreil = 0;
using (var fr = File.CreateText(fvr))
{
foreach (string line in lines)
{
if (nr != nreil)
{
fr.WriteLine(line);
}
nreil++;
}
}
}
}
}
I would suggest using LINQ. Take advantage of the .Max extension method and iterate over the string array.
string[] lines = { "1", "two", "three" };
var longestLine = lines.Max(line => line.Length);
var revised = lines.Where(line => line.Length < longestLine).ToArray();
The revised variable will contain a string array that excludes every line whose length equals the maximum, so lines tied for longest are all removed.
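For the original question (drop every line tied for longest, and drop blank lines too), the two filters can be combined. This is only a minimal sketch that assumes the lines are already in memory; the names LineFilter and RemoveLongestAndBlank are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineFilter
{
    // Hypothetical helper: removes blank lines and every line tied for the maximum length.
    public static List<string> RemoveLongestAndBlank(IEnumerable<string> lines)
    {
        // Drop empty and whitespace-only lines first.
        var nonBlank = lines.Where(l => !string.IsNullOrWhiteSpace(l)).ToList();
        if (nonBlank.Count == 0)
            return nonBlank; // Max() would throw on an empty sequence

        int longest = nonBlank.Max(l => l.Length);

        // Keep only lines strictly shorter than the maximum, so ties go as well.
        return nonBlank.Where(l => l.Length < longest).ToList();
    }
}
```

The result can be passed straight to File.WriteAllLines, with the input coming from File.ReadAllLines as in the question.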
Read lines, filter out empty lines and the 10 longest lines, write lines:
string[] lines = File.ReadAllLines(inputFile, Encoding.GetEncoding(1257));
var filtered = lines
.Where(line => line.Length > 0) // remove all empty lines
.Except(lines.OrderByDescending(line => line.Length).Take(10)); // remove 10 longest lines (Except is a set operation, so duplicate lines are collapsed as well)
File.WriteAllLines(outputFile, filtered);
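One thing to keep in mind with Except: it compares by value and returns distinct elements, so repeated lines in the input come out only once. If duplicates must survive, filtering by index is one option. A sketch under that assumption, with hypothetical names (LongestLines, RemoveNLongest):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LongestLines
{
    // Hypothetical helper: drops empty lines and the n longest lines, keeping duplicates.
    public static List<string> RemoveNLongest(IReadOnlyList<string> lines, int n)
    {
        // Indexes of the n longest lines. OrderByDescending is a stable sort,
        // so among equally long lines the earlier ones are picked first.
        var drop = new HashSet<int>(
            Enumerable.Range(0, lines.Count)
                      .OrderByDescending(i => lines[i].Length)
                      .Take(n));

        return Enumerable.Range(0, lines.Count)
                         .Where(i => !drop.Contains(i) && lines[i].Length > 0)
                         .Select(i => lines[i])
                         .ToList();
    }
}
```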
You could identify the longest line, and then loop through the list, deleting all of that length. To also delete empty ones, you could test against String.IsNullOrWhiteSpace.
Something like this (assuming lines is a List<string> and longestLine holds the maximum line length):
// Removing items inside a foreach over the same list throws, so use RemoveAll instead.
lines.RemoveAll(line =>
    String.IsNullOrWhiteSpace(line)    // blank or whitespace-only line
    || line.Length >= longestLine);    // ">=" instead of "==" just to be on the safe side
I need to complete the solution so that all text after any comment marker is removed. Any whitespace at the end of a line must be removed as well.
My code:
public static string StripComments(string text, string[] commentSymbols)
{
string[] str = text.Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < str.Length; i++)
{
int index=0;
if (str[i].IndexOf(commentSymbols[0]) != -1)
{
index = str[i].IndexOf(commentSymbols[0], StringComparison.Ordinal)-1;
str[i] = str[i].Remove(index)+"\n";
}
else if (str[i].IndexOf(commentSymbols[1]) != -1)
{
index = str[i].IndexOf(commentSymbols[1],StringComparison.Ordinal)-1;
str[i] = str[i].Remove(index)+"\n";
}
else {
str[i]+="\n";
}
}
str[str.Length-1]=str[str.Length-1].Remove(str[str.Length-1].LastIndexOf('\n'));
return string.Join("",str).TrimEnd();
}
For example:
string stripped = StripCommentsSolution.StripComments("apples, pears # and bananas\ngrapes\nbananas !apples", new [] { "#", "!" });
// result should == "apples, pears\ngrapes\nbananas"
When running tests, it throws an error:
Expected string length 6 but was 8. Strings differ at index 1.
Expected: "a \ n b \ nc" But was: "a \ n b \ nc"
Because @evgeny20 mentioned in the comments that he wants it without regexes, here is one solution I found that does not use them.
public static string StripComments(string text, string[] commentSymbols)
{
string[] lines = text.Split(new[] { "\n" }, StringSplitOptions.None);
lines = lines.Select(x => x.Split(commentSymbols, StringSplitOptions.None).First().TrimEnd()).ToArray();
return string.Join("\n", lines);
}
Best Regards and happy learning
Here is my solution, give it a try.
You need this using directive:
using System.Text.RegularExpressions;
public static string StripComments(string text, string[] commentSymbols)
{
foreach (string commentSymbol in commentSymbols)
{
text = Regex.Replace(text, $"{Regex.Escape(commentSymbol)}[^\\n]*", ""); // escape the symbol in case it is a regex metacharacter
}
text = Regex.Replace(text, "\\s*(\\n+)", "$1");
return text;
}
Edit:
Changed the regex so that line ends are trimmed too.
Best Regards
In general, your code seems a reasonable approach. However, it does not cover:
more or fewer than two commentSymbols
lines containing more than one comment symbol
The following code handles these cases:
public static string StripComments(string text, string[] commentSymbols)
{
string[] str = text.Split('\n');
for (int i = 0; i < str.Length; i++)
{
int index = -1;
// find first occurrence of any commentSymbol in str[i]
foreach (var commentSymbol in commentSymbols)
{
int indexOfThisSymbol = str[i].IndexOf(commentSymbol);
if (indexOfThisSymbol != -1 && (index == -1 || indexOfThisSymbol < index))
index = indexOfThisSymbol;
}
if (index != -1)
{
// strip off comment symbol and everything after it
str[i] = str[i].Substring(0, index);
}
// strip off trailing spaces
str[i] = str[i].TrimEnd();
}
// combine again
return string.Join("\n", str);
}
I can provide a second solution, based on how I understand your request.
Please have a look.
public static string StripComments(string text, string[] commentSymbols)
{
var textLines = text.Split("\n");
for (int i = 0; i < textLines.Length;i++ )
{
foreach (string commentSymbol in commentSymbols)
{
textLines[i] = Regex.Replace(textLines[i], $"{Regex.Escape(commentSymbol)}[^\\n]*", "");
}
textLines[i] = textLines[i].TrimEnd();
}
return string.Join("\n",textLines);
}
This preserves all line breaks and removes comments and trailing whitespace, as requested in your link.
If it isn't right, you can give a more constructive example. At least I'm trying to help you.
EDIT: this solution now supports regex special characters as comment symbols.
With this I passed the tests on Codewars successfully.
Best Regards
I am trying to read a file and split the text after every 1000 characters. But I want to keep the words intact. So it should just split at the space. If the 1000th character is not a space, then split at the first space just before or just after it. Any idea how to do that? I am also removing the extra spaces from the text.
while ((line = file.ReadLine()) != null)
{
text = text + line.Trim();
noSpaceText = Regex.Replace(text, @"\r\n?|\n/", "");
}
List<string> rowsToInsert = new List<string>();
int splitAt = 1000;
for (int i = 0; i < noSpaceText.Length; i = i + splitAt)
{
if (noSpaceText.Length - i >= splitAt)
{
rowsToInsert.Add(noSpaceText.Substring(i, splitAt));
}
else
rowsToInsert.Add(noSpaceText.Substring(i,
((noSpaceText.Length - i))));
}
foreach(var item in rowsToInsert)
{
Console.WriteLine(item);
}
Okay, I just typed up this untested solution, which should do the trick:
public static List<string> SplitOn(this string input, int charLength, char[] seperator)
{
List<string> splits = new List<string>();
var tokens = input.Split(seperator);
// -1 because first token adds 1 to length
int totalLength = -1;
List<string> segments = new List<string>();
foreach(var t in tokens)
{
if(totalLength + t.Length+1 > charLength)
{
splits.Add(String.Join(" ", segments));
totalLength = -1;
segments.Clear();
}
totalLength += t.Length + 1;
segments.Add(t);
}
if(segments.Count>0)
{
splits.Add(String.Join(" ", segments));
}
return splits;
}
It's an extension method that splits the input text at the separators, so I iterate over an array of plain words. For each word it checks whether adding it would push the running length past charLength; if so, the current segment is added to the result list and a new segment is started.
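The same greedy idea can also be sketched with a StringBuilder that holds the chunk being built. This is not the code above, just a compact variant for illustration; WordChunker and Chunk are hypothetical names, and a single word longer than maxLength is assumed to be acceptable as its own oversized chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WordChunker
{
    // Hypothetical helper: packs whole words into chunks of at most maxLength characters.
    public static List<string> Chunk(string input, int maxLength)
    {
        var chunks = new List<string>();
        var current = new StringBuilder();

        foreach (string word in input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // "+ 1" accounts for the space that would join the word to the chunk.
            if (current.Length > 0 && current.Length + 1 + word.Length > maxLength)
            {
                chunks.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0)
                current.Append(' ');
            current.Append(word);
        }

        if (current.Length > 0)
            chunks.Add(current.ToString());
        return chunks;
    }
}
```

For example, Chunk("the quick brown fox jumps", 10) yields "the quick", "brown fox" and "jumps". Because empty entries are removed when splitting, runs of spaces are collapsed as a side effect.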
An alternate solution:
public static List<string> SplitString(string stringInput, int blockLength)
{
var output = new List<string>();
var count = 0;
while (count < stringInput.Length)
{
    // The remainder fits into one block: take it all and stop.
    if (stringInput.Length - count <= blockLength)
    {
        output.Add(stringInput.Substring(count));
        break;
    }
    // Look at blockLength + 1 characters so that a space right after the block counts as a break.
    string block = stringInput.Substring(count, blockLength + 1);
    int lastSpace = block.LastIndexOf(' ');
    if (lastSpace <= 0)
    {
        // No usable space in this block: fall back to a hard split.
        output.Add(block.Substring(0, blockLength));
        count += blockLength;
    }
    else
    {
        // Split at the last space and skip over it.
        output.Add(block.Substring(0, lastSpace));
        count += lastSpace + 1;
    }
}
return output;
}
I have a text file from which I read the text in lines. Also from all that text I need to find the longest sentence and find in which line it begins. I have no trouble finding the longest sentence but the problem arises when I need to find where it begins.
The contents of the text file is:
V. M. Putinas
Margi sakalai
Lydėdami gęstančią žarą vėlai
Pakilo į dangų;;, margi sakalai.
Paniekinę žemės vylingus sapnus,
Padangėje ištiesė,,; savo sparnus.
Ir tarė margieji: negrįšim į žemę,
Kol josios kalnai ir pakalnės aptemę.
My code:
static void Sakiniai (string fv, string skyrikliai)
{
char[] skyrikliaiSak = { '.', '!', '?' };
string naujas = "";
string[] lines = File.ReadAllLines(fv, Encoding.GetEncoding(1257));
foreach (string line in lines)
{
// Add lines into a string so I can separate them into sentences
naujas += line;
}
// Separating into sentences
string[] sakiniai = naujas.Split(skyrikliaiSak);
// This method finds the longest sentence
string ilgiausiasSak = RastiIlgiausiaSakini(sakiniai);
}
From the text file the longest sentence is: "Margi sakalai Lydėdami gęstančią žarą vėlai Pakilo į dangų;;, margi sakalai"
How can I find the exact line where the sentence begins?
What about a nested for loop? If two sentences are the same length, this just finds the first one.
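var lines = File.ReadAllLines(fv, Encoding.GetEncoding(1257));
var terminators = new HashSet<char> { '.', '?', '!' };
var currentLength = 0;
var currentStartLine = 0; // line on which the current sentence began
var currentSentence = new StringBuilder();
var maxLength = 0;
var maxLine = default(int?); // zero-based line on which the longest sentence begins
var maxSentence = "";
for (var currentLine = 0; currentLine < lines.Length; currentLine++)
{
    foreach (var character in lines[currentLine])
    {
        if (terminators.Contains(character))
        {
            if (currentLength > maxLength)
            {
                maxLength = currentLength;
                maxLine = currentStartLine;
                maxSentence = currentSentence.ToString();
            }
            currentLength = 0;
            currentSentence.Clear();
        }
        else
        {
            // Skip whitespace between sentences so it does not count as a sentence start.
            if (currentLength == 0 && char.IsWhiteSpace(character)) continue;
            if (currentLength == 0) currentStartLine = currentLine;
            currentLength++;
            currentSentence.Append(character);
        }
    }
}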
First, find the start index of the longest sentence in the whole content:
int startIdx = naujas.IndexOf(ilgiausiasSak);
Then loop over the lines to find out which line startIdx falls in:
int i = 0;
while (i < lines.Length && startIdx >= 0)
{
startIdx -= lines[i].Length;
i++;
}
// do stuff with i
i is the line on which the longest sentence starts, e.g. i = 2 means it starts on the second line.
Build an index that solves your problem.
We can make a straightforward modification of your existing code:
var lineOffsets = new List<int>();
lineOffsets.Add(0);
foreach (string line in lines)
{
// Add lines into a string so I can separate them into sentences
naujas += line;
lineOffsets.Add(naujas.Length);
}
All right; now you have a list of the character offset in your final string corresponding to each line.
You have a substring of the big string. You can use IndexOf to find the offset of the substring in the big string. Then you can search through the list to find the index of the last element that is less than or equal to the offset. That's the line number.
If the list is large, you can binary search it.
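The lookup can be sketched as follows. This is a variant for illustration, not the exact list built above: it stores only the start offset of each line (no trailing total), and the names LineIndex, BuildOffsets and LineOf are hypothetical. It assumes the lines were concatenated without separators, as in the question.

```csharp
using System;
using System.Collections.Generic;

public static class LineIndex
{
    // Start offset of each line inside the concatenated text (no separators added).
    public static List<int> BuildOffsets(IReadOnlyList<string> lines)
    {
        var offsets = new List<int>(lines.Count);
        int total = 0;
        foreach (string line in lines)
        {
            offsets.Add(total);
            total += line.Length;
        }
        return offsets;
    }

    // Zero-based index of the line that contains the character at charOffset.
    public static int LineOf(List<int> offsets, int charOffset)
    {
        if (offsets.Count == 0 || charOffset < 0)
            throw new ArgumentOutOfRangeException(nameof(charOffset));

        int pos = offsets.BinarySearch(charOffset);
        if (pos < 0)
            pos = ~pos - 1; // no exact hit: take the last offset smaller than charOffset

        // Empty lines share their offset with the following line;
        // the character belongs to the last line with that offset.
        while (pos + 1 < offsets.Count && offsets[pos + 1] == charOffset)
            pos++;
        return pos;
    }
}
```

With the variables from the question, the 1-based line number would be LineIndex.LineOf(offsets, naujas.IndexOf(ilgiausiasSak)) + 1. IndexOf returns -1 when the sentence was trimmed or otherwise altered after splitting, so that case needs a check first.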
How about
identify the lines in the text
split the text into sentences
split the sentences into sections based on the line breaks (could work also with splitting on words as well if needed)
verify the sections of the sentence are on consecutive rows
In the end, certain sections of a sentence might also occur on other lines that belong to different sentences, so you need to correctly identify the sentences that span consecutive rows.
// define separators for various contexts
var separator = new
{
Lines = new[] { '\n' },
Sentences = new[] { '.', '!', '?' },
Sections = new[] { '\n' },
};
// isolate the lines and their corresponding number
var lines = paragraph
.Split(separator.Lines, StringSplitOptions.RemoveEmptyEntries)
.Select((text, number) => new
{
Number = number += 1,
Text = text,
})
.ToList();
// isolate the sentences with corresponding sections and line numbers
var sentences = paragraph
.Split(separator.Sentences, StringSplitOptions.RemoveEmptyEntries)
.Select(sentence => sentence.Trim())
.Select(sentence => new
{
Text = sentence,
Length = sentence.Length,
Sections = sentence
.Split(separator.Sections)
.Select((section, index) => new
{
Index = index,
Text = section,
Lines = lines
.Where(line => line.Text.Contains(section))
.Select(line => line.Number)
})
.OrderBy(section => section.Index)
})
.OrderByDescending(p => p.Length)
.ToList();
// build the possible combinations of sections within a sentence
// and filter only those that are on consecutive lines
var results = from sentence in sentences
let occurences = sentence.Sections
.Select(p => p.Lines)
.Cartesian()
.Where(p => p.Consecutive())
.SelectMany(p => p)
select new
{
Text = sentence.Text,
Length = sentence.Length,
Lines = occurences,
};
Here .Cartesian and .Consecutive are just helper extension methods over enumerables (see the associated gist for the entire source code in LINQPad-ready format):
public static IEnumerable<T> Yield<T>(this T instance)
{
yield return instance;
}
public static IEnumerable<IEnumerable<T>> Cartesian<T>(this IEnumerable<IEnumerable<T>> instance)
{
var seed = Enumerable.Empty<T>().Yield();
return instance.Aggregate(seed, (accumulator, sequence) =>
{
var results = from vector in accumulator
from item in sequence
select vector.Concat(new[]
{
item
});
return results;
});
}
public static bool Consecutive(this IEnumerable<int> instance)
{
var distinct = instance.Distinct().ToList();
return distinct
.Zip(distinct.Skip(1), (a, b) => a + 1 == b)
.All(p => p);
}
I have a text file with lines of text laid out like so
12345MLOL68
12345MLOL68
12345MLOL68
I want to read the file and add commas to the 5th point, 6th point and 9th point and write it to a different text file so the result would be.
12345,M,LOL,68
12345,M,LOL,68
12345,M,LOL,68
This is what I have so far
public static void ToCSV(string fileWRITE, string fileREAD)
{
int count = 0;
string x = "";
StreamWriter commas = new StreamWriter(fileWRITE);
string FileText = new System.IO.StreamReader(fileREAD).ReadToEnd();
var dataList = new List<string>();
IEnumerable<string> splitString = Regex.Split(FileText, "(.{1}.{5})").Where(s => s != String.Empty);
foreach (string y in splitString)
{
dataList.Add(y);
}
foreach (string y in dataList)
{
x = (x + y + ",");
count++;
if (count == 3)
{
x = (x + "NULL,NULL,NULL,NULL");
commas.WriteLine(x);
x = "";
count = 0;
}
}
commas.Close();
}
The problem I'm having is trying to figure out how to split the original string lines I read in at several points. The line
IEnumerable<string> splitString = Regex.Split(FileText, "(.{1}.{5})").Where(s => s != String.Empty);
is not working the way I want it to. It just adds up the 1 and the 5 and splits every string at the 6th character.
Can anyone help me split each string at specific points?
Simpler code:
public static void ToCSV(string fileWRITE, string fileREAD)
{
string[] lines = File.ReadAllLines(fileREAD);
string[] splitLines = lines.Select(s => Regex.Replace(s, "(.{5})(.)(.{3})(.*)", "$1,$2,$3,$4")).ToArray();
File.WriteAllLines(fileWRITE, splitLines);
}
Just insert at the right place in descending order like this.
string str = "12345MLOL68";
int[] indices = {5, 6, 9};
indices = indices.OrderByDescending(x => x).ToArray();
foreach (var index in indices)
{
str = str.Insert(index, ",");
}
We're doing this in descending order because inserting from the front would shift all later indices, which makes them hard to track.
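Wrapped into a reusable helper, the descending-order insertion might look like the sketch below. The names FixedWidth and InsertAt are hypothetical, and positions outside the string are simply skipped, which is an assumption about the desired behaviour.

```csharp
using System;
using System.Linq;
using System.Text;

public static class FixedWidth
{
    // Hypothetical helper: inserts separator at each position, given as offsets into the ORIGINAL string.
    public static string InsertAt(string s, string separator, params int[] positions)
    {
        var sb = new StringBuilder(s);
        int originalLength = s.Length;

        // Work from the back so earlier offsets are not shifted by later inserts.
        foreach (int p in positions.OrderByDescending(x => x))
        {
            if (p < 0 || p > originalLength)
                continue; // ignore positions that do not exist in this line
            sb.Insert(p, separator);
        }
        return sb.ToString();
    }
}
```

FixedWidth.InsertAt("12345MLOL68", ",", 5, 6, 9) returns "12345,M,LOL,68". Applied per line, it can be combined with File.ReadAllLines and File.WriteAllLines to convert the whole file.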
Why not use Substring? For example:
editedstring=input.Substring(0,5)+","+input.Substring(5,1)+","+input.Substring(6,3)+","+input.Substring(9);
This should suit your needs.
I have a text file that I load into a string array. The contents of the file looks something like this:
OTI*IA*IX*NA~ REF*G1*J EVERETTE~ REF*11*0113722462~
AMT*GW*229.8~ NM1*QC*1*JENNINGS*PHILLIP~ OTI*IA*IX*NA~ REF*G1*J
EVERETTE~ REF*11*0113722463~ AMT*GW*127.75~
NM1*QC*1*JENNINGS*PHILLIP~ OTI*IA*IX*NA~ REF*G1*J EVERETTE~
REF*11*0113722462~ AMT*GW*10.99~ NM1*QC*1*JENNINGS*PHILLIP~ ...
I'm looking for the lines that start with OTI, and if it's followed by "IA" then I need to get the 10 digit number from the line that starts with REF*11. So far, I have this:
string[] readText = File.ReadAllLines("myfile.txt");
foreach (string s in readText) //string contains 1 line of text from above example
{
string[] currentline = s.Split('*');
if (currentline[0] == "OTI")
{
//move down 2 lines and grab the 10 digit
//number from the line that starts with REF*11
}
}
The line I need is always 2 lines after the current OTI line. How do I access the line that's 2 lines down from my current line?
Instead of using foreach() you can use a for(int index = 0; index < readText.Length; index++)
Then you know the line you are accessing and you can easily say int otherIndex = index + 2
string[] readText = File.ReadAllLines("myfile.txt");
for(int index = 0; index < readText.Length; index++)
{
string[] currentline = readText[index].Split('*');
if (currentline[0] == "OTI")
{
//move down 2 lines and grab the 10 digit
//number from the line that starts with REF*11
int refIndex = index + 2;
string refLine = readText[refIndex];
}
}
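To go from the index to the actual number, the REF line can be split on '*' as well. The sketch below assumes, as the question states, that each array element holds exactly one segment and that the wanted REF*11 segment sits two lines below the OTI line. EdiScan and ReferenceNumbers are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class EdiScan
{
    // Hypothetical helper: for every OTI*IA line, returns the value of the REF*11 line two lines below it.
    public static List<string> ReferenceNumbers(IReadOnlyList<string> lines)
    {
        var result = new List<string>();

        // "i + 2 < lines.Count" keeps the look-ahead inside the array.
        for (int i = 0; i + 2 < lines.Count; i++)
        {
            string[] oti = lines[i].Split('*');
            if (oti.Length < 2 || oti[0] != "OTI" || oti[1] != "IA")
                continue;

            // Drop trailing whitespace and the '~' terminator before splitting.
            string[] reference = lines[i + 2].TrimEnd().TrimEnd('~').Split('*');
            if (reference.Length >= 3 && reference[0] == "REF" && reference[1] == "11")
                result.Add(reference[2]);
        }
        return result;
    }
}
```

Splitting instead of using fixed Substring offsets avoids off-by-one mistakes around the "REF*11*" prefix, at the cost of not checking that the value is exactly 10 digits.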
What about:
string[] readText = File.ReadAllLines("myfile.txt");
for (int i = 0; i + 2 < readText.Length; i++) // stop early so readText[i+2] stays in range
{
if (readText[i].StartsWith("OTI") && readText[i+2].StartsWith("REF*11*")){
string number = readText[i+2].Substring("REF*11*".Length, 10); // skip the whole "REF*11*" prefix
//do something
}
}
This looks like an EDI file! Ahh, EDI, the memories...
The good news is that the EDI file is delimited, just like most CSV file formats. You can use any standard CSV file library to load the EDI file into a gigantic array, and then iterate through it by position.
I published my open source CSV library here, feel free to use it if it's helpful. You can simply specify the "asterisk" as the delimiter:
https://code.google.com/p/csharp-csv-reader/
// This code assumes the file is on disk, and the first row of the file
// has the names of the columns on it
DataTable dt = CSVReader.LoadDataTable(myfilename, '*', '\"');
At this point, you can iterate through the datatable as normal.
for (int i = 0; i < dt.Rows.Count; i++) {
if (dt.Rows[i][0].ToString() == "OTI") { // cells are objects, so compare their string values
Console.WriteLine("The row I want is: " + dt.Rows[i + 2][0]);
}
}
If you want to use regex to tokenize the items and create dynamic entities, here is such a pattern
string data = @"NM1*QC*1*JENNINGS*PHILLIP~
OTI*IA*IX*NA~
REF*G1*J EVERETTE~
REF*11*0113722463~
AMT*GW*127.75~
NM1*QC*1*JENNINGS*PHILLIP~
OTI*IA*IX*NA~
REF*G1*J EVERETTE~
REF*11*0113722462~
AMT*GW*10.99~
NM1*QC*1*JENNINGS*PHILLIP~";
string pattern = @"^(?<Command>\w{3})((?:\*)(?<Value>[^~*]+))+";
var lines = Regex.Matches(data, pattern, RegexOptions.Multiline)
.OfType<Match>()
.Select (mt => new
{
Op = mt.Groups["Command"].Value,
Data = mt.Groups["Value"].Captures.OfType<Capture>().Select (c => c.Value)
}
);
That produces a list of items, each with an Op and its Data values, which you can apply your business logic to.
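A regex-free way to get similar tokens is to split on the segment terminator first. This is only a sketch: it assumes every segment ends with '~' and that elements are separated by '*', as in the sample data, and the names EdiSegments, Parse and Ref11AfterOtiIa are hypothetical. Unlike the line-based approaches, it does not depend on where the line breaks fall.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EdiSegments
{
    // Splits the raw text into segments (terminated by '~') and each segment into its elements.
    public static List<string[]> Parse(string text)
    {
        return text.Split(new[] { '~' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(segment => segment.Trim())
                   .Where(segment => segment.Length > 0)
                   .Select(segment => segment.Split('*'))
                   .ToList();
    }

    // For every OTI*IA segment, returns the value of the first REF*11 segment before the next OTI.
    public static List<string> Ref11AfterOtiIa(string text)
    {
        List<string[]> segments = Parse(text);
        var result = new List<string>();

        for (int i = 0; i < segments.Count; i++)
        {
            string[] s = segments[i];
            if (s[0] != "OTI" || s.Length < 2 || s[1] != "IA")
                continue;

            for (int j = i + 1; j < segments.Count && segments[j][0] != "OTI"; j++)
            {
                string[] r = segments[j];
                if (r[0] == "REF" && r.Length >= 3 && r[1] == "11")
                {
                    result.Add(r[2]);
                    break;
                }
            }
        }
        return result;
    }
}
```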
Why don't you use regular expression matching with Regex.Match or Regex.Matches, defined in System.Text.RegularExpressions? You can also look at string pattern matching algorithms such as the Knuth-Morris-Pratt algorithm.
string[] readText = File.ReadAllLines("myfile.txt");
foreach (string s in readText) //string contains 1 line of text from above example
{
string[] currentline = s.Split('*');
if (currentline[0] == "REF" && currentline[1] == "11")
{
found=false;
needed=current+2;
}
}
string[] readText = File.ReadAllLines("myfile.txt");
for(int linenum = 0;linenum < readText.Length; linenum++)
{
string s = readText[linenum];
string[] currentline = s.Split('*');
if (currentline[0] == "OTI")
{
//move down 2 lines and grab the 10 digit
linenum +=2;
string refLine = readText[linenum];
//number from the line that starts with REF*11
// Extract your number here from refline
}
}
Thanks, guys. This is what I came up with, but I'm also reading your answers as I KNOW I will learn something! Thanks again!
string[] readText = File.ReadAllLines("myfile.txt");
int i = 0;
foreach (string s in readText)
{
string[] currentline = s.Split('*');
if (currentline[0] == "OTI")
{
lbRecon.Items.Add(readText[i+2].Substring(8, 9));
}
i++;
}