So, I have a text file with thousands of lines formatted similarly to this:
123456:0.8525000:1590882780:91011
These files are almost always a different length, and I only need to read the first two parts of the line, being 123456:0.8525000.
I know that I can split each line using C#, but I'm unsure how to only read the first 2 parts. Anyone have any idea on how to do this? Sorry if my question doesn't make sense, I can restate it if needed.
The Split function returns a string[], an array of strings.
Just take the first two elements of the result of Split (with : as the separator).
var read = "123456:0.8525000:1590882780:91011";
var values = read.Split(':');
Console.WriteLine(values[0]); // 123456
Console.WriteLine(values[1]); // 0.8525000
Don't forget that the elements of values are strings, not yet int or double values. See How to convert string to integer in C# for how to convert a string to a numeric type.
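As a rough illustration of that conversion step, here is a minimal sketch. It assumes the first field fits in an int and the second is a decimal number written with a "." (both guesses based on the sample line); FieldParser and TryParseLine are made-up names, not part of any library.

```csharp
using System;
using System.Globalization;

public static class FieldParser
{
    // Hypothetical helper: parses the first two ':'-separated fields of a line.
    // Assumes field 1 is an integer and field 2 is a decimal using '.'.
    public static bool TryParseLine(string line, out int id, out double value)
    {
        id = 0;
        value = 0;

        // Split into at most 3 pieces; the third piece is the unparsed remainder.
        string[] parts = line.Split(new[] { ':' }, 3);
        if (parts.Length < 2)
        {
            return false;
        }

        // InvariantCulture keeps '.' as the decimal separator on every machine.
        return int.TryParse(parts[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out id)
            && double.TryParse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture, out value);
    }
}
```

TryParse returns false instead of throwing, which is usually what you want when a file may contain malformed lines.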
There are tons of ways to do this, but I am going to suggest options that read the full line, since that is much easier to work with and understand, and your lines are of varying length. I added a suggestion on using StreamReader character by character in the addendum at the end, but you would need to figure out serious workarounds for skipping the parts you don't want, restarting the character loop on each new line, and so on.
I first demonstrate IAsyncEnumerable, available since .NET Core 3.x, followed by a similar string-based approach. The int example is slightly advanced and asynchronous; I share it to help others and to demonstrate a fairly modern approach as of 2020. Streaming out only the data you need is a huge benefit for keeping things fast with a low memory footprint.
public static async IAsyncEnumerable<int> StreamFileOutAsIntsAsync(string filePathName)
{
    if (string.IsNullOrWhiteSpace(filePathName)) throw new ArgumentNullException(nameof(filePathName));
    if (!File.Exists(filePathName)) throw new ArgumentException($"{filePathName} is not a valid file path.");
    using var streamReader = File.OpenText(filePathName);
    string currentLine;
    while ((currentLine = await streamReader.ReadLineAsync().ConfigureAwait(false)) != null)
    {
        if (int.TryParse(currentLine.AsSpan(), out var output))
        {
            yield return output;
        }
    }
}
This streams every int out of a file, after checking that the file path is not null or blank and that the file exists.
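For what it's worth, consuming a stream like this takes an await foreach loop (C# 8 or later). The sketch below is only an illustration: StreamDemo, StreamIntsAsync and SumAsync are made-up names, and StreamIntsAsync is a compact stand-in for the method above so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class StreamDemo
{
    // Compact stand-in for the streaming method above (argument checks omitted).
    public static async IAsyncEnumerable<int> StreamIntsAsync(string path)
    {
        using var reader = File.OpenText(path);
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            // Lines that are not valid ints are skipped.
            if (int.TryParse(line, out var number))
            {
                yield return number;
            }
        }
    }

    // Hypothetical consumer: only one line is held in memory at a time.
    public static async Task<long> SumAsync(string path)
    {
        long total = 0;
        await foreach (var number in StreamIntsAsync(path))
        {
            total += number;
        }
        return total;
    }
}
```

The caller never sees a list; each value arrives as soon as its line has been read.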
Streaming may be too much for a beginner; I don't know your level.
You may want to start with just turning the file into a list of strings.
The next example modifies the previous one into something less complex that also splits the strings for you. I recommend learning about streaming so you don't hold every string in memory while you work on it... or maybe you want them all. I am not here to judge.
Once you get a line out of the file as a string, you can do whatever else needs to be done.
public static async Task<List<string>> GetStringsFromFileAsync(string filePathName)
{
    if (string.IsNullOrWhiteSpace(filePathName)) throw new ArgumentNullException(nameof(filePathName));
    if (!File.Exists(filePathName)) throw new ArgumentException($"{filePathName} is not a valid file path.");
    using var streamReader = File.OpenText(filePathName);
    string currentLine;
    var strings = new List<string>();
    while ((currentLine = await streamReader.ReadLineAsync().ConfigureAwait(false)) != null)
    {
        var lineAsArray = currentLine.Split(new string[] { ":" }, StringSplitOptions.RemoveEmptyEntries);
        // Simple Data Validation
        if (lineAsArray.Length == 4)
        {
            strings.Add($"{lineAsArray[0]}:{lineAsArray[1]}");
            strings.Add($"{lineAsArray[2]}:{lineAsArray[3]}");
        }
    }
    return strings;
}
The meat of the code is really simple: open the file for reading,
using var streamReader = File.OpenText(filePathName);
and then loop through that file...
while ((currentLine = await streamReader.ReadLineAsync()) != null)
{
    var lineAsArray = currentLine.Split(new string[] { ":" }, StringSplitOptions.RemoveEmptyEntries);
    // Simple Data Validation
    if (lineAsArray.Length == 4)
    {
        // Do whatever you need to do with the first bits of information.
        // In this case, we add them all to a list for return.
        strings.Add($"{lineAsArray[0]}:{lineAsArray[1]}");
        strings.Add($"{lineAsArray[2]}:{lineAsArray[3]}");
    }
}
What this demonstrates is that every line read from the file (until ReadLineAsync returns null at the end of the file) is broken into parts on the ":" character, with empty entries removed, and is used only if it has exactly four parts.
We then use a C# feature called string interpolation ($"") to put the first two parts back together with ":" as one string, then the second two. Or do whatever else you need with each part of the line.
That's really all there is to it! Hope it helps.
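If only the first two parts are wanted, a synchronous variant can be even shorter. This is a sketch under my own assumptions, not the method above: FirstTwoParts and Read are made-up names, and it relies on File.ReadLines, which enumerates the file lazily rather than loading it all at once.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FirstTwoParts
{
    // Hypothetical helper: lazily yields the first two ':'-separated parts of each line.
    public static IEnumerable<(string First, string Second)> Read(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            // Ask for at most 3 pieces so the rest of the line is not split needlessly.
            string[] parts = line.Split(new[] { ':' }, 3);
            if (parts.Length >= 2)
            {
                yield return (parts[0], parts[1]);
            }
        }
    }
}
```

Lines with fewer than two parts are skipped silently here; whether to skip, log, or throw is a choice that depends on your data.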
Addendum: If you really need to read only parts of a file, use StreamReader.Read() and Peek():
using (var sr = new StreamReader(path))
{
    while (sr.Peek() >= 0)
    {
        Console.Write((char)sr.Read());
    }
}
Reading each character
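To give an idea of the extra bookkeeping a character-by-character approach needs, here is a rough sketch that keeps everything up to the second ":" on each line and ignores the rest. PrefixReader and ReadPrefixes are made-up names. Note that every character is still read; the only saving is that the full line is never built as a string.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class PrefixReader
{
    // Hypothetical helper: collects the text before the second ':' of every line.
    public static List<string> ReadPrefixes(TextReader reader)
    {
        var prefixes = new List<string>();
        var current = new StringBuilder();
        int colons = 0;
        bool skipping = false;

        while (reader.Peek() >= 0)
        {
            char c = (char)reader.Read();
            if (c == '\n')
            {
                // End of line: store what was collected and reset the state.
                if (current.Length > 0) prefixes.Add(current.ToString());
                current.Clear();
                colons = 0;
                skipping = false;
            }
            else if (c == '\r' || skipping)
            {
                continue; // carriage return, or the part after the second ':'
            }
            else if (c == ':' && ++colons == 2)
            {
                skipping = true; // second ':' reached, ignore the rest of the line
            }
            else
            {
                current.Append(c);
            }
        }

        // The last line may not end with a newline.
        if (current.Length > 0) prefixes.Add(current.ToString());
        return prefixes;
    }
}
```

It takes a TextReader so that it works with both a StreamReader over a file and a StringReader in a quick test.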
Some bare bones code:
string fileName = @"c:\some folder\path\file.txt";
using (StreamReader sr = new StreamReader(fileName))
{
    while (!sr.EndOfStream)
    {
        String[] values = sr.ReadLine().Split(":".ToCharArray());
        if (values.Length >= 2)
        {
            // ... do something with values[0] and values[1] ...
            Console.WriteLine(values[0] + ", " + values[1]);
        }
    }
}
I have millions of strings, around 8GB worth of HEX; each string is 3.2kb in length.
Each of these strings contains multiple parts of data I need to extract.
This is an example of one such string:
GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ$GPGGA,104646.091,,,,,0,0,,,M,,M,,*41$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test2ÿÿ3ÿÿ4ÿÿ5ÿÿ6ÿÿ7ÿÿ8ÿÿ9ÿÿ:ÿÿ;ÿÿ<ÿÿ=ÿÿ>ÿÿ?ÿÿ#ÿÿAÿÿBÿÿCÿÿDÿÿEÿÿFÿÿGÿÿHÿÿIÿÿJÿÿ$GPGGA,104647.091,,,,,0,0,,,M,,M,,*40$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header TestKÿÿLÿÿMÿÿNÿÿOÿÿPÿÿQÿÿRÿÿSÿÿTÿÿUÿÿVÿÿWÿÿXÿÿYÿÿZÿÿ[ÿÿ\ÿÿ]ÿÿ^ÿÿ_ÿÿ`ÿÿaÿÿbÿÿcÿÿ$GPGGA,104648.091,,,,,0,0,,,M,,M,,*4F$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Testdÿÿeÿÿfÿÿgÿÿhÿÿiÿÿjÿÿkÿÿlÿÿmÿÿnÿÿoÿÿpÿÿqÿÿrÿÿsÿÿtÿÿuÿÿvÿÿwÿÿxÿÿyÿÿzÿÿ{ÿÿ|ÿÿ$GPGGA,104649.091,,,,,0,0,,,M,,M,,*4E$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test}ÿÿ~ÿÿ.ÿÿ€ÿÿ.ÿÿ‚ÿÿƒÿÿ„ÿÿ…ÿÿ†ÿÿ‡ÿÿˆÿÿ‰ÿÿŠÿÿ‹ÿÿŒÿÿ.ÿÿŽÿÿ.ÿÿ.ÿÿ‘ÿÿ’ÿÿ“ÿÿ”ÿÿ•ÿÿ$GPGGA,104650.091,,,,,0,0,,,M,,M,,*46$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Head
as you can see it is pretty much this repeated:
GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ
I want to separate this string into two lists like this:
_GPSList
$GPGGA,104644.091,,,,,0,0,,,M,,M,,*43
$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*
$GPVTG,0.00,T,,M,0.00,N,0.00,K,N
_WavList
32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ
32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ
Issue 1:
This repetition isn't contained within a single string; it overflows into the next string. So if some data crosses the end of one string and the start of the next, how do I deal with that?
Issue 2: How do I analyse the string and extract only the parts I need?
The solution I'm providing is not a complete answer but more like an idea which might help you get what you want.
Everything else which I present is an assumption on my behalf.
// Assuming your data is stored in a file "yourdatafile".
// Splitting all the text on "$", assuming this will separate the GPS data.
string[] splittedstring = File.ReadAllText("yourdatafile").Split('$');
// I found an extra string lingering in the sample you provided
// because I split on "$", so you have to take that into account.
var GPSList = new List<string>();
var WAVList = new List<string>();
foreach (var str in splittedstring)
{
    // If the string contains "Header", separate the WAV part from the GPS data.
    if (str.Contains("Header"))
    {
        string temp = str.Remove(str.IndexOf("Header"));
        int indexOfAsterisk = temp.LastIndexOf("*");
        string stringBeforeAsterisk = str.Substring(0, indexOfAsterisk + 1);
        // Take everything after the asterisk; Replace could also strip later repeats of the prefix.
        string stringAfterAsterisk = str.Substring(indexOfAsterisk + 1);
        WAVList.Add(stringAfterAsterisk);
        GPSList.Add("$" + stringBeforeAsterisk);
    }
    else
        GPSList.Add("$" + str);
}
This provides exactly the output you need; the only exception is that extra string. Also, some non-standard characters might look like black blocks.
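For Issue 1 (data that crosses the boundary between two strings), one common idea is to keep the unfinished tail of each string and prepend it to the next one. The sketch below is only an illustration of that idea: RecordSplitter, Feed and Flush are made-up names, and it assumes that every repetition starts with a marker such as "$GPGGA", which is a guess based on the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class RecordSplitter
{
    private readonly string _marker;
    private readonly StringBuilder _pending = new StringBuilder();

    // marker: text assumed to start every record, e.g. "$GPGGA".
    public RecordSplitter(string marker)
    {
        _marker = marker;
    }

    // Feed one input string; returns the records it completed.
    public List<string> Feed(string chunk)
    {
        _pending.Append(chunk);
        string text = _pending.ToString();
        var complete = new List<string>();
        int start = 0;

        while (start + 1 <= text.Length)
        {
            // A record is complete once the NEXT marker has been seen.
            int next = text.IndexOf(_marker, start + 1, StringComparison.Ordinal);
            if (next < 0) break;
            complete.Add(text.Substring(start, next - start));
            start = next;
        }

        // Carry the unfinished tail over to the next call.
        _pending.Clear();
        _pending.Append(text, start, text.Length - start);
        return complete;
    }

    // Call once after the last string to get the final record.
    public string Flush()
    {
        string rest = _pending.ToString();
        _pending.Clear();
        return rest;
    }
}
```

Because the tail is kept as-is, a marker that is itself cut in half by a boundary is found on the next call. Each completed record can then be split into its GPS and WAV parts as in the code above.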
Given a text file, how would I go about reading particular digits in each line?
Say I have a file 123.txt. How would I go about reading a line and storing the first 5 digits in one variable and the next 6 digits in another variable?
All I've seen is stuff involving storing the entire text file as a string array, but there are some complications: the text file is enormous, and the machine the application will run on isn't exactly a top-notch system. Speed isn't the top priority, but it is definitely a major issue.
// Please Help here
// Want to compare data of input file with database table columns.
// How to split data in to parts
// Access that split data later for comparison.
// Data in input file is like,
//
// 016584824684000000000000000+
// 045787544574000000000000000+
// 014578645447000000000000000+
// 047878741489000000000000000+ and so on..
string[] lines = System.IO.File.ReadAllLines("F:\\123.txt"); // Input file
// How can I divide lines from input file in 2 parts (For ex. 01658 and 4824684) and save it in variable so that I can use it for comparing later.
string conStr = ConfigurationManager.ConnectionStrings["BVI"].ConnectionString;
cnn = new SqlConnection(conStr);
cnn.Open();
// So I want to compare first 5 digits of all lines of input file (ex. 01658)with Transit_ID and next 6 digits with Client_Account and then export matching rows in excel file.
sql = "SELECT Transit_ID AS TransitID, Client_Account AS AccountNo FROM TCA_CLIENT_ACCOUNT WHERE Transit_ID = " //(What should I put here to comapare with first 5 digits of all lines of input file)" AND Client_Account = " ??" );
All I've seen is stuff involving storing the entire text file as a String array
Large text files should be processed by streaming one line at a time so that you don't allocate a large amount of memory needlessly.
using (StreamReader sr = File.OpenText(path))
{
    string s;
    while ((s = sr.ReadLine()) != null)
    {
        // How would I go about reading line number and store first 5
        // digits in different variable and next 6 digits to another variable.
        string first = s.Substring(0, 5);
        // The next 6 digits start at index 5, immediately after the first 5.
        string second = s.Substring(5, 6);
    }
}
https://msdn.microsoft.com/en-us/library/system.io.file.opentext(v=vs.110).aspx
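One thing to watch with Substring is that it throws if a line is shorter than expected. A small guard avoids that. This is a sketch with made-up names (FixedWidth, TrySplit), assuming the second field starts right after the first one, at index 5.

```csharp
using System;

public static class FixedWidth
{
    // Hypothetical helper: splits a fixed-width line into a 5-digit and a 6-digit field.
    public static bool TrySplit(string line, out string transitId, out string account)
    {
        transitId = null;
        account = null;

        // 5 + 6 characters are needed; shorter lines would make Substring throw.
        if (line == null || line.Length < 11)
        {
            return false;
        }

        transitId = line.Substring(0, 5); // characters 0..4
        account = line.Substring(5, 6);   // characters 5..10
        return true;
    }
}
```

Inside the ReadLine loop you would call TrySplit on each line and skip (or report) the ones where it returns false.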
Just use Substring(int32, int32) to get the appropriate values like this:
string[] lines = System.IO.File.ReadAllLines("F:\\123.txt");
List<string> first = new List<string>();
List<string> second = new List<string>();
foreach (string line in lines)
{
    first.Add(line.Substring(0, 5));
    // The next 6 digits start at index 5, immediately after the first 5.
    second.Add(line.Substring(5, 6));
}
Though Eric's answer is much cleaner. This was just a quick and dirty proof of concept using your sample data. You should definitely use the using statement and StreamReader as he suggested.
first will contain the first 5 digits from each element in lines, and second will contain the next 6 digits.
Then to build your SQL, you'd do something like this:
sql = "SELECT Transit_ID AS TransitID, Client_Account AS AccountNo FROM TCA_CLIENT_ACCOUNT WHERE Transit_ID = @TransitId AND Client_Account = @ClientAcct";
SqlCommand cmd = new SqlCommand(sql, cnn);
for (int i = 0; i < lines.Length; i++)
{
    // Clear the previous iteration's parameters; adding the same name twice fails at execution.
    cmd.Parameters.Clear();
    cmd.Parameters.AddWithValue("@TransitId", first[i]);
    cmd.Parameters.AddWithValue("@ClientAcct", second[i]);
    //execute your command and validate results
}
That will loop N times and run a command for each of the values in lines.
I'm simply trying to execute File.ReadAllLines against a specific file and, for every line, split on |. I have to use regex on this one.
This code below doesn't work, but you'll see what I'm trying to do:
string[] contents = File.ReadAllLines(filename);
string[] splitlines = Regex.Split(contents, '|');
foreach (string split in splitlines)
{
    //Regex line = content.Split('|');
    //content.Split('|');
    string prefix = prefix = Regex.Match(line, @"(\S+)(\d+)").Groups[0].Value;
    File.AppendAllText(workingdirform2 + "configuration.txt", prefix+"\r\n");
}
It's not entirely clear to me what you are trying to do, but there are a number of errors in your code. I have tried to guess what you are doing, but if this isn't what you want, please explain what you do want preferably with some examples:
string inputFilename = "input.txt";
string outputFilename = "output.txt";
using (StreamWriter streamWriter = File.AppendText(outputFilename))
{
    using (StreamReader streamReader = File.OpenText(inputFilename))
    {
        while (true)
        {
            string line = streamReader.ReadLine();
            if (line == null)
            {
                break;
            }
            string[] splitlines = line.Split('|');
            foreach (string split in splitlines)
            {
                Match match = Regex.Match(split, @"\S+\d+");
                if (match.Success)
                {
                    string prefix = match.Groups[0].Value;
                    streamWriter.WriteLine(prefix);
                }
                else
                {
                    // Handle match failed...
                }
            }
        }
    }
}
Key points:
You seem to want to perform an operation on each line, so you need to iterate over the lines.
Use the simple string.Split method if you want to split on a single character. Regex.Split doesn't accept a character and "|" has a special meaning in regular expressions so it wouldn't have worked anyway unless you escaped it.
You were opening and closing the output file multiple times. You should open it just once and keep it open until you have finished writing to it. The using keyword is useful here.
Use WriteLine instead of appending "\r\n".
If the input file is large, use a StreamReader instead of ReadAllLines.
If the match fails, Groups[0].Value is an empty string, so your program would silently write a blank line. You should probably check match.Success before using the match and, if it is false, handle the failure appropriately (skip the line, report a warning, throw an exception with an appropriate message, etc.).
You aren't actually using groups 1 and 2 in the regular expression, so you can remove the parentheses to save the regular expression engine from having to store results that you won't use anyway.
You should pass the original string to Regex.Split and not an array.
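To make the points about splitting and match checking concrete, here is a small sketch. PipeSplitDemo and its method names are made up for illustration; Regex.Escape is used so that "|" is treated as a literal character instead of the alternation operator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeSplitDemo
{
    // Plain split on a single character: no regex needed.
    public static string[] SplitPlain(string line)
    {
        return line.Split('|');
    }

    // Regex split: the pipe has to be escaped to be taken literally.
    public static string[] SplitRegex(string line)
    {
        return Regex.Split(line, Regex.Escape("|"));
    }

    // Returns the matched text, or null when the field does not match.
    public static string FirstMatchOrNull(string field)
    {
        Match match = Regex.Match(field, @"\S+\d+");
        return match.Success ? match.Value : null;
    }
}
```

For a line such as Text1231|abc|abc, both split methods give the same three fields, and only the first field matches the pattern.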
It looks like you are using line instead of split when setting the prefix. Without knowing more about your code I can't tell whether it's right or not, but in any case it sticks out as the error (it shouldn't build either).
This is really inefficient on at least two levels :)
Regex.Split takes a string, not an array of strings.
I would recommend calling Regex.Split on each item of contents individually, then looping over the results of that call. This means nested foreach loops.
string[] contents = File.ReadAllLines(filename);
foreach (string line in contents)
{
    // "|" is the regex alternation operator, so it must be escaped to split on a literal pipe.
    string[] splitlines = Regex.Split(line, @"\|");
    foreach (string splitline in splitlines)
    {
        string prefix = Regex.Match(splitline, @"(\S+)(\d+)").Groups[0].Value;
        File.AppendAllText(workingdirform2 + "configuration.txt", prefix+"\r\n");
    }
}
This, of course, isn't the most efficient way to go about it.
A more efficient way might be to split the whole file on a regular expression that matches either a line break or a pipe instead. I think this works:
string[] splitlines = Regex.Split(File.ReadAllText(filename), @"\r?\n|\|");
I have to assume, based on the limited feedback, that this is what you're looking for:
string inputFile = filename;
string outputFile = Path.Combine( workingdirform2, "configuration.txt" );
using ( StreamReader inputFileStream = File.OpenText( inputFile ) )
{
    using ( StreamWriter outputFileStream = File.AppendText( outputFile ) )
    {
        // Iterate over the file contents to extract the prefix
        string currentLine;
        while ( ( currentLine = inputFileStream.ReadLine() ) != null )
        {
            // Notice the updated Regex - yours is a bit broken
            string prefix = Regex.Match( currentLine, @"^(\S+?)\d+" ).Groups[1].Value;
            outputFileStream.WriteLine( prefix );
        }
    }
}
This would take a file full of:
Text1231|abc|abc
Text1232|abc|abc
Text1233|abc|abc
Text1234|abc|abc
and place:
Text
Text
Text
Text
into a new file.
I hope this, at least, gets you on the right path. My crystal ball is getting hazy.. haaazzzy..
Probably one of the best ways to process text files in C# is to use FileHelpers. Give it a look. It allows you to strongly type your import data.