Fastest way to read a file line by line? [duplicate] - c#

This question already has answers here:
What's the fastest way to read a text file line-by-line?
(9 answers)
Closed 7 years ago.
I currently have a .TXT file and I'm trying to read all the lines, and sort them, and then display them in order.
Here is what I have
string inFile = "Z:/Daniel/Accounts.txt";
string outFile = "Z:/Daniel/SortedAccounts.txt";
var contents = File.ReadAllLines(inFile);
Array.Sort(contents);
File.WriteAllLines(outFile, contents);
int i = 0;
int lineCount = File.ReadLines("Z:/Daniel/SortedAccounts.txt").Count();
do
{
string accounts = File.ReadLines("Z:/Daniel/SortedAccounts.txt").Skip(i).Take(1).First();
//First Name
int pFrom1 = accounts.IndexOf("#1#") + "#1#".Length;
int pTo1 = accounts.LastIndexOf("#2#");
String accountFirstName = accounts.Substring(pFrom1, pTo1 - pFrom1);
//Last Name
int pFrom2 = accounts.IndexOf("#2#") + "#2#".Length;
int pTo2 = accounts.LastIndexOf("#3#");
String accountLastName = accounts.Substring(pFrom2, pTo2 - pFrom2);
//Email
int pFrom3 = accounts.IndexOf("#3#") + "#3#".Length;
int pTo3 = accounts.LastIndexOf("#4#");
String accountEmail = accounts.Substring(pFrom3, pTo3 - pFrom3);
//Phone Number
int pFrom4 = accounts.IndexOf("#4#") + "#4#".Length;
int pTo4 = accounts.LastIndexOf("#5#");
String accountNumber = accounts.Substring(pFrom4, pTo4 - pFrom4);
//Preferred Contact
int pFrom5 = accounts.IndexOf("#5#") + "#5#".Length;
int pTo5 = accounts.LastIndexOf("#6#");
String accountPreferredContact = accounts.Substring(pFrom5, pTo5 - pFrom5);
//Populate Combobox
accountComboBox.Items.Add(accountLastName + "," + accountFirstName);
i = i + 1;
} while (i < lineCount);
And an example of what's inside Accounts.txt is
#1#Daniel#2#Mos#3#dasdnmasdda@gmail.com#4#31012304#5#EMAIL#6#
#1#Daniael#2#Mosa#3#dddasdsa@gmail.com#4#310512304#5#EMAIL#6#
#1#Dansdael#2#Mossdsa#3#dasdsdssa@gmail.com#4#31121234#5#TEXT#6#
#1#Danasdl#2#Mosasaa#3#daasda@gmail.com#4#310123304#5#EMAIL#6#
#1#Dandasel#2#Moasddand#3#daasdsda@gmail.com#4#3123551234#5#TEXT#6#
#1#Danasdl#2#Mossdsadd#3#daasddsa@gmail.com#4#310213304#5#TEXT#6#
The issue is that sometimes, Accounts.txt will have over 10,000 lines and it then takes a while for the program to load.
Is there a faster implementation of the code I have written?

My suggestions:
1. Read the file line by line with something like File.ReadLines, streaming as you go instead of reading the whole file (and especially not reading the file twice, the way you are now).
2. For each line, apply a compiled regex that extracts the values you need from that string.
3. Create an account class (or just a single string value, I guess) that holds the values from step 2 as necessary. It looks like your loop only uses two of the strings (accountLastName and accountFirstName).
4. Add those to a list that isn't the combobox's Items collection.
5. Sort them using LINQ/Sort if you need them sorted (sort as little as necessary and as late as possible), something like items.OrderBy(x => x.LastName).ThenBy(y => y.FirstName).
6. Add the entire block of items to your combobox at the very end instead of one at a time, ideally with something like combobox.Items.AddRange(items). Many of the combobox-style collections may fire a collection-changed event every time an item is added, which can be a lot of overhead if you're adding thousands of items.
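The suggestions above could be sketched roughly as follows. This is a minimal sketch rather than a drop-in replacement: the Account class, the AccountLoader name and the regex pattern are assumptions based on the sample lines in the question, and the sort is by last name then first name as suggested, which differs from sorting the raw lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container for the two fields the loop actually uses.
public class Account
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class AccountLoader
{
    // Compiled once and reused for every line. The pattern assumes the
    // "#1#first#2#last#3#..." layout shown in the question.
    private static readonly Regex NamePattern =
        new Regex(@"^#1#(?<first>.*?)#2#(?<last>.*?)#3#", RegexOptions.Compiled);

    // Parses lines as they are enumerated; lines that do not match are skipped.
    public static List<Account> Parse(IEnumerable<string> lines)
    {
        var accounts = new List<Account>();
        foreach (string line in lines)
        {
            Match m = NamePattern.Match(line);
            if (!m.Success) continue;
            accounts.Add(new Account
            {
                FirstName = m.Groups["first"].Value,
                LastName = m.Groups["last"].Value
            });
        }
        return accounts;
    }

    // Sorts once, at the end, and builds all display strings in one go.
    public static string[] ToDisplayItems(IEnumerable<Account> accounts)
    {
        return accounts
            .OrderBy(a => a.LastName, StringComparer.Ordinal)
            .ThenBy(a => a.FirstName, StringComparer.Ordinal)
            .Select(a => a.LastName + "," + a.FirstName)
            .ToArray();
    }

    // Streams the file a single time with File.ReadLines.
    public static string[] LoadDisplayItems(string path)
    {
        return ToDisplayItems(Parse(File.ReadLines(path)));
    }
}
```

With this in place, the combobox could be filled in a single call such as accountComboBox.Items.AddRange(AccountLoader.LoadDisplayItems(inFile)), which avoids both the second read of the file and the per-item Add.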

The code could be refactored as shown below. You should measure the performance of both approaches.
const string inFile = "Z:/Daniel/Accounts.txt";
const string outFile = "Z:/Daniel/SortedAccounts.txt";
string[] contents = File.ReadAllLines(inFile);
Array.Sort(contents);
File.WriteAllLines(outFile, contents);
IEnumerable<string> lines = File.ReadLines(outFile);
foreach (string line in lines)
{
// Split on the "#n#" markers; data[0] is the empty string before "#1#"
string[] data = Regex.Split(line, "#\\d#");
//First Name
string accountFirstName = data[1];
//Last Name
string accountLastName = data[2];
//Email
string accountEmail = data[3];
//Phone Number
string accountNumber = data[4];
//Preferred Contact
string accountPreferredContact = data[5];
//Populate Combobox
//accountComboBox.Items.Add(accountLastName + "," + accountFirstName);
}
Edit "Use AddRange"
class Account
{
public string FirstName { get; set; }
public string LastName { get; set; }
public string Email { get; set; }
public string Number { get; set; }
public string PreferredContact { get; set; }
}
accountComboBox.Items.AddRange(
lines.Select(line => Regex.Split(line, "#\\d#")).Select(data => new Account
{
// data[0] is the empty string before "#1#", so the fields start at index 1
FirstName = data[1],
LastName = data[2],
Email = data[3],
Number = data[4],
PreferredContact = data[5]
}).Select(item => string.Format("{0},{1}", item.LastName, item.FirstName)).ToArray()
);

Related

C# How to implement CSV file into this code

Hi I am fairly new to coding, I have a piece of code that searches for a string and replaces it with another string like so:
var replacements = new[]{
new{Find="123",Replace="Word one"},
new{Find="ABC",Replace="Word two"},
new{Find="999",Replace="Word two"},
};
var myLongString = "123 is a long 999 string yeah";
foreach(var set in replacements)
{
myLongString = myLongString.Replace(set.Find, set.Replace);
}
If I want to use a CSV file that contains a lot of words and their replacements, for example, LOL,Laugh Out Loud, and ROFL, Roll Around Floor Laughing. How would I implement that?
Create a text file that looks like (you could use commas, but I like pipes (|)):
123|Word One
ABC|Word Two
999|Word Three
LOL|Laugh Out Loud
ROFL|Roll Around Floor Laughing
Then create a tiny helper class:
public class WordReplace
{
public string Find { get; set; }
public string Replace { get; set; }
}
And finally, call this code:
private static string DoWordReplace()
{
//first read in the data
var fileData = File.ReadAllLines("WordReplace.txt");
var wordReplacePairs = new List<WordReplace>();
var lineNo = 1;
foreach (var item in fileData)
{
var pair = item.Split(new[] {'|'}, StringSplitOptions.RemoveEmptyEntries);
if (pair.Length != 2)
{
throw new ApplicationException($"Malformed file, line {lineNo}, data = [{item}] ");
}
wordReplacePairs.Add(new WordReplace{Find = pair[0], Replace = pair[1]});
++lineNo;
}
var longString = "LOL, 123 is a long 999 string yeah, ROFL";
//now do the replacements
var buffer = new StringBuilder(longString);
foreach (var pair in wordReplacePairs)
{
buffer.Replace(pair.Find, pair.Replace);
}
return buffer.ToString();
}
The result is:
Laugh Out Loud, Word One is a long Word Three string yeah, Roll Around Floor Laughing

Create a list of objects with initialized properties from a string with infos

I have a string that looks like that:
random text 12234
another random text
User infos:
User name : John
ID : 221223
Date : 23.02.2018
Job: job1
User name : Andrew
ID : 378292
Date : 12.08.2017
Job: job2
User name : Chris
ID : 930712
Date : 05.11.2016
Job : job3
some random text
And this class:
class User
{
public string UserName { get; set; }
public string ID { get; set; }
public string Date { get; set; }
public string Job { get; set; }
public User(string _UserName, string _ID, string _Date, string _Job)
{
UserName = _UserName;
ID = _ID;
Date = _Date;
Job = _Job;
}
}
And I want to create a List of Users with information from that string.
I have tried doing that:
List<User> Users = new List<User>();
string Data = (the data above)
string[] lines = Data.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
List<string> UserNames = new List<string>();
List<string> IDs = new List<string>();
List<string> Dates = new List<string>();
List<string> Jobs = new List<string>();
foreach (var line in lines)
{
if (line.StartsWith("User name : "))
{
UserNames.Add(line.Remove(0, 12));
}
if (line.StartsWith("ID : "))
{
IDs.Add(line.Remove(0, 5));
}
if (line.StartsWith("Date : "))
{
Dates.Add(line.Remove(0, 7));
}
if (line.StartsWith("Job : "))
{
Jobs.Add(line.Remove(0, 6));
}
}
var AllData = UserNames.Zip(IDs, (u, i) => new { UserName = u, ID = i });
foreach (var data in AllData)
{
Users.Add(new User(data.UserName, data.ID, "date", "job"));
}
But I can only combine two lists using this code. Also, I have more than 4 values for each user (the string above was just a short example).
Is there a better method? Thanks.
Since each user always seems to have 4 lines of information, you could loop through the split lines array in steps of 4. At each step, split the line on the colon (:) and take the last item, which is the desired value.
EDIT: In this case I would suggest looking for the START of the data.
int startIndex = Data.IndexOf("User name");
EDIT 2: If the string also ends with another line of text, you can use LastIndexOf to find the end of the important information:
int endIndex = Data.LastIndexOf("Job");
int lengthOfLastLine = Data.Substring(endIndex).IndexOf(Environment.NewLine);
endIndex += lengthOfLastLine;
and then simply take a SubString from the startindex on until the end
string [] lines = Data.Substring(startIndex, endIndex - startIndex)
.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
List<User> allUsers = new List<User>();
for (int i = 0; i < lines.Length; i += 4)
{
string name = lines[i].Split(':').Last().Trim();
string ID = lines[i + 1].Split(':').Last().Trim();
string Date = lines[i + 2].Split(':').Last().Trim();
string Job = lines[i + 3].Split(':').Last().Trim();
allUsers.Add(new User(name, ID, Date, Job));
}
Ahhh, and you should Trim the spaces away.
This solution should be readable. The hard coded step size of 4 is actually annoying in my solution
Disclaimer: This solution works only as long as the format does not change. If the order of the lines changes, it will return wrong results.
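If the field order or the surrounding text can vary, one alternative is to key on the label before the colon rather than on the line position. The sketch below is only one way to do that and is not part of the answer above: UserParser and ParseUsers are hypothetical names, it assumes every record begins with a "User name" line, and it returns label-to-value dictionaries instead of the User class so that extra fields need no code change.

```csharp
using System;
using System.Collections.Generic;

public static class UserParser
{
    public static List<Dictionary<string, string>> ParseUsers(string data)
    {
        var users = new List<Dictionary<string, string>>();
        Dictionary<string, string> current = null;

        // Accept both Windows and Unix line endings.
        string[] lines = data.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon < 0) continue; // not a "label : value" line

            string key = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();

            if (key.Equals("User name", StringComparison.OrdinalIgnoreCase))
            {
                // Assumption: a "User name" line starts a new record.
                current = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
                users.Add(current);
            }

            // Labelled lines before the first record (such as "User infos:") are ignored.
            if (current != null)
            {
                current[key] = value;
            }
        }
        return users;
    }
}
```

Because the label is trimmed, "ID : 221223" and "ID: 930712" both end up under the key "ID". One caveat of this sketch: a trailing line of unrelated text that happens to contain a colon would be attached to the last record.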
Instead of checking each line and adding it to a list, you can create your list of User directly. The steps are:
Split by double new line
Split by new line
Build each User
Code:
var users = data.Split(new[] {"\n\n" }, StringSplitOptions.None).Select(lines =>
{
var line = lines.Split(new[] { "\n" }, StringSplitOptions.None);
return new User(line[0].Substring(11), line[1].Substring(4), line[2].Substring(6), line[3].Substring(5));
});
As in @Mong Zhu's answer, remove everything before and after the data. At this point, that is a separate question I won't try to solve: remove the noise before and after, then parse your data.
For a robust, flexible and self-documenting solution that will allow you to easily add new fields, ignore all the extraneous text and also cater for variations in your file format (this seems to be the case with, for example, no space in "ID:" only in the 3rd record), I would use a Regex and some LINQ to return a collection of records as follows:
using System.Text.RegularExpressions;
public class Record
{
public string Name { get; set; }
public string ID { get; set; }
public string Date { get; set; }
public string Job { get; set; }
}
public List<Record> Test()
{
string s = @"User name : John
ID : 221223
Date : 23.02.2018
Job: job1
User name : Andrew
ID : 378292
Date : 12.08.2017
Job: job2
User name : Chris
ID: 930712
Date : 05.11.2016
Job: job3
";
// \s* around each colon tolerates both "ID : 221223" and "ID: 930712"
Regex r = new Regex(@"User\sname\s*:\s*(?<name>\w+).*?ID\s*:\s*(?<id>\w+).*?Date\s*:\s*(?<date>[0-9.]+).*?Job\s*:\s*(?<job>\w\w+)", RegexOptions.Singleline);
return (from Match m in r.Matches(s)
select new Record
{
Name = m.Groups["name"].Value,
ID = m.Groups["id"].Value,
Date = m.Groups["date"].Value,
Job = m.Groups["job"].Value
}).ToList();
}
The CSV format seems to be what you're looking for (since you want to add some header text to this file, the actual CSV starts on the 6th line):
random text 12234
another random text
User infos:
UserName;ID;Date;Job
John;221223;23.02.2018;job1
Andrew;378292;12.08.2017;job2
Chris;930712;05.11.2016;job3
And then you could read this file and parse it:
var lines = File.ReadAllLines("pathToFile");
var dataStartIndex = Array.IndexOf(lines, "UserName;ID;Date;Job");
var Users = lines.Skip(dataStartIndex + 1).Select(s =>
{
var splittedStr = s.Split(';');
return new User(splittedStr[0], splittedStr[1], splittedStr[2], splittedStr[3]);
}).ToList();
If you're working with console input, just skip the header part and let the user enter semicolon-separated values for each user on a separate line. Parse it in the same way:
var splittedStr = Console.ReadLine().Split(';');
var userToAdd = new User(splittedStr[0], splittedStr[1], splittedStr[2] , splittedStr[3]);
Users.Add(userToAdd);

How to insert into a database four values read from a txt file?

I'm using a C# MVC project.
I have a Customer class and a customer table. I have an insert function with four parameters: name, surname, phone and address. I want to read a .txt file line by line, split each line on "," and call this insert function, but I don't know how to write the algorithm.
static void AddCustomer(string Name, string Surname, string Phone, string Address)
{
using (var session = NHibernateHelper.OpenSession())
{
using (var transaction = session.BeginTransaction())
{
var customer = new Customer
{
Name = Name,
Surname = Surname,
Phone = Phone,
Address = Address,
};
session.Save(customer);
transaction.Commit();
}
}
}
while ((line = file.ReadLine()) != null)
{
string text = file.ReadToEnd();
string[] lines = text.Split(',');
for (int i = 0; i < lines.Length; i++)
{
//HOW CAN I USER ADDCUSTOMER()
}
counter++;
}
You've almost got it. Assuming file is a StreamReader, you can just split the current line on comma, and pass the separate parts to the AddCustomer method:
while ((line = file.ReadLine()) != null)
{
// Split the line on comma
var lineParts = line.Split(',');
string name = lineParts[0];
string surname = lineParts[1];
string phone = lineParts[2];
string address = lineParts[3];
AddCustomer(name, surname, phone, address);
}
Please note that this does no error checking at all (lineParts[1] will blow up if there's no comma in the given line) and that this is a bad way to parse CSV (if the data contains commas, which addresses tend to do, it won't work properly). Use a CSV parsing library.
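To illustrate the missing error checking, a guarded version of the split might look like the sketch below. CustomerLineParser and TryParse are hypothetical names, not part of any library. The sketch still assumes that no field contains a comma, so it does not replace a real CSV parser.

```csharp
using System;

public static class CustomerLineParser
{
    // Returns false instead of throwing when the line does not hold
    // exactly four comma-separated fields.
    public static bool TryParse(string line, out string[] fields)
    {
        fields = null;
        if (string.IsNullOrWhiteSpace(line)) return false;

        string[] parts = line.Split(',');
        if (parts.Length != 4) return false;

        // Remove stray whitespace around each field.
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }

        fields = parts;
        return true;
    }
}
```

Inside the while loop, AddCustomer would then only be called when TryParse returns true, passing fields[0] through fields[3]; rejected lines could be logged or counted instead of crashing the import.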
See Parsing CSV files in C#, with header and plenty of other questions about CSV, where it is suggested that you use the FileHelpers library. Your class that maps to and from the CSV file will look like this:
[DelimitedRecord(",")]
[IgnoreEmptyLines()]
public class CustomerCsvRecord
{
[FieldOrder(0)]
public string Name { get; set; }
[FieldOrder(1)]
public string Surname { get; set; }
[FieldOrder(2)]
public string Phone { get; set; }
[FieldOrder(3)]
public string Address { get; set; }
}
And the code to read the file:
var engine = new FileHelperEngine<CustomerCsvRecord>();
CustomerCsvRecord[] customers = engine.ReadFile(fileName);
foreach (var customer in customers)
{
AddCustomer(customer.Name, customer.Surname, customer.Phone, customer.Address);
}
This will do your job.
string fileContent = System.IO.File.ReadAllText("YOUR_FILE_PATH");
//assuming each customer record will be on a separate line
string[] lines = fileContent.Split(new string [] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
foreach (string line in lines)
{
//assuming a single line content will be like this "name,surname,phone,address"
string[] items = line.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries);
Customer cust = new Customer();
cust.Name = items[0];
cust.Surname = items[1];
cust.Phone = items[2];
cust.Address = items[3];
//now use this 'cust' object
}

How to convert string line into fix size array?

I am reading a .txt file in C#. My problem is that the file contains more than 2000 lines and takes a long time to process, because I currently read it line by line and use the Substring function to split the data.
Each line contains one employee's details.
Within a line, the data sits at fixed positions, like:
Employee Name = 0 to 50 Character,
Employee Address = 51 to 200 Character,
Employee BDate = 201 to 215 Character,
Employee Gender = 216 to 220 Character
etc..
Is there any technique to read a line and split it into an array, or something else, by these fields?
I want to improve performance.
You can use File.ReadLines which will give you IEnumerable<string>:
// A list grows as needed, so the file may hold more than 2000 lines
var employees = new List<Employee>();
foreach (var line in File.ReadLines("data.txt"))
{
var employee = new Employee();
employee.Name = line.Substring(0,50);
employee.Address = line.Substring(51,150);
employee.BDate = line.Substring(201,15);
employee.Gender = line.Substring(216,5);
employees.Add(employee);
}
Use ReadToEnd() if you're going to need to read all 2000+ lines. Then you can execute the line parsing in parallel.
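The parallel parsing mentioned here could be sketched with PLINQ as below. This is a sketch under stated assumptions: the Employee class and the FixedWidthParser name are made up for illustration, the column offsets mirror the ones used in the other answers, and Slice is a hypothetical helper that tolerates short lines. For a file of a few thousand lines the parallel version may well be no faster than a plain loop, so measure both.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public string Name { get; set; }
    public string Address { get; set; }
    public string BDate { get; set; }
    public string Gender { get; set; }
}

public static class FixedWidthParser
{
    // Hypothetical helper: like Substring, but clamps to the line length
    // and trims the padding, so short lines do not throw.
    private static string Slice(string line, int start, int length)
    {
        if (start >= line.Length) return string.Empty;
        int available = Math.Min(length, line.Length - start);
        return line.Substring(start, available).Trim();
    }

    // Offsets are assumptions taken from the answers in this thread.
    public static Employee ParseLine(string line)
    {
        return new Employee
        {
            Name = Slice(line, 0, 50),
            Address = Slice(line, 51, 150),
            BDate = Slice(line, 201, 15),
            Gender = Slice(line, 216, 5)
        };
    }

    // Each line is parsed independently, so no shared state is mutated;
    // AsOrdered keeps the results in file order.
    public static List<Employee> ParseAll(IEnumerable<string> lines)
    {
        return lines.AsParallel().AsOrdered().Select(line => ParseLine(line)).ToList();
    }
}
```

A call such as FixedWidthParser.ParseAll(File.ReadAllLines("data.txt")) would then return the employees in the same order as the lines in the file.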
You could use File.ReadAllLines():
var textLines = File.ReadAllLines(path); // path: location of your text file
foreach (var line in textLines)
{
var name = line.Substring(0, 50);
var address = line.Substring(51,150);
var bdate = line.Substring(201, 15);
var gender = line.Substring(216,5);
}
You can use Parallel.ForEach to improve performance. Refer to the code below.
public class Employee
{
public string Name { get; set; }
public string Address { get; set; }
public string DOB { get; set; }
public string Gender { get; set; }
}
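var textLines = File.ReadAllLines("FileName.txt");
var employees = new List<Employee>();
var sync = new object();
Parallel.ForEach(textLines,
emp =>
{
var employee = new Employee()
{
Name = emp.Substring(0, 50),
Address = emp.Substring(51, 150),
DOB = emp.Substring(201, 15),
Gender = emp.Substring(216, 5)
};
// List<T> is not thread-safe, so serialize the Add calls.
// Note that the employees end up in no particular order.
lock (sync)
{
employees.Add(employee);
}
});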

How can I substring an ID?

I have an ID such as LSHOE-UCT. How can I use Substring to separate the ID into:
gender = "L"
Product = "Shoe"
Category = "UCT"
Here is my code:
private void assignProductCategory(string AcStockCategoryID)
{
//What should I insert?
string[] splitParameter = AcStockCategoryID.Split('-');
}
I need to separate those parts, identify them and insert them into different tables in my database. That is where I am having the main problem.
string[] s = AcStockCategoryID.Split('-');
string gender = s[0].Substring(0, 1);
string Product= s[0].Substring(1, s[0].Length - 1);
string Category = s[1];
To try a different approach, this would work, too.
string id = "LSHOE-UCT";
string gender = id.Substring(0,1);
int indexOfDash = id.IndexOf("-");
string product = id.Substring(1, indexOfDash - 1);
string category = id.Substring(indexOfDash + 1);
try this, I just typed it randomly apologies if there are any typos...
string id = "LSHOE-UCT";
string[] arr = id.Split('-');
string gender = id.Substring(0,1); // this will give you L
string product = arr[0].Substring(1); // this will give you shoe
string category = arr[1]; // this will give you UCT;
Warning: Complete Overkill
You could also use LINQ's extension methods (IEnumerable) to accomplish this. I thought I'd have a little thought experiment about how you could use IEnumerable to over-engineer a solution:
int indexOfDash = id.IndexOf("-");
var firstPart = id.TakeWhile(s => s != '-');
var linqGender = firstPart.Take(1).ToArray()[0]; // string is L
var linqProduct = String.Join("", firstPart.Skip(1).Take(indexOfDash-1)); // string is SHOE
var secondPart = id.Skip(indexOfDash+1);
var linqCategory = String.Join("", secondPart); //string is UCT
EDIT: Updated my answer due to the ID format not being correct in my first post.
If your AcStockCategoryID is always going to be in the format of LSHOE-UCT, then you could do something like the following:
private void assignProductCategory(string AcStockCategoryID)
{
string[] splitParameter = AcStockCategoryID.Split('-');
System.Text.StringBuilder sb = new System.Text.StringBuilder();
sb.AppendLine("gender=" + splitParameter[0].Substring(0, 1));
sb.AppendLine("Product=" + splitParameter[0].Substring(1));
sb.AppendLine("Category=" + splitParameter[1]);
// use sb.ToString() wherever you need the results
}
I would do it backwards.
public class LCU
{
public string Gender {get; set;}
public string Product {get; set;}
public string Category {get; set;}
public LCU(){}
}
private static LCU LShoe_UctHandler(string id)
{
var lcu = new LCU();
var s = id.Split('-');
if (s.Length < 2) throw new ArgumentException("id");
lcu.Category = s[1];
lcu.Gender = s[0].Substring(0,1);
lcu.Product = s[0].Substring(1);
return lcu;
}
Then just pass ID to LShoe_UctHandler like so...
var lcu = LShoe_UctHandler("LGlobtrotters-TrainingShoes");
Console.WriteLine("gender = {0}", lcu.Gender);
Console.WriteLine("Product = {0}", lcu.Product );
Console.WriteLine("Category = {0}", lcu.Category );
[Hand keyed - so sorry for typos and casing errors]
Try this:
string id = "LSHOE-UCT";
Console.WriteLine("Gender: {0}",id.Substring(0,1));
Console.WriteLine("Product: {0}",id.Split('-')[0].Substring(1));
Console.WriteLine("Category: {0}", id.Split('-')[1]);
