Reading Text Files and then breaking down lines into columns - c#

I have an assignment where I need to read a text file, break each line down into columns, and then insert the result into a database.
What's the best approach for this? Any help would be appreciated, and code would be even better.
This is what I have so far:
string filename = Server.MapPath("~/Text_File_4.txt");
StreamReader sr = new StreamReader(filename);
string styl;
string colr;
string sdim;
string size;
string qty;
string line;
string sprice;
string sretail;
while ((line = sr.ReadLine()) != null)
{
styl = line.Substring(0, 6);
colr = line.Substring(6, 2);
sdim = line.Substring(8, 1);
size = line.Substring(14, 3);
qty = line.Substring(19, 5);
sprice = line.Substring(27, 6);
sretail = line.Substring(38, 4);
con.Open();
cmd = new SqlCommand("insert into ststyl00(ststyl, stcolr, stsdim, stszcd, stprq, strprq) values(@ststyl, @stcolr, @stsdim, @stszcd, @stprq, @strprq)", con);
cmd.Parameters.Add("@ststyl", SqlDbType.VarChar, 15).Value = styl;
cmd.Parameters.Add("@stcolr", SqlDbType.VarChar, 3).Value = colr;
cmd.Parameters.Add("@stsdim", SqlDbType.VarChar, 8).Value = sdim;
cmd.Parameters.Add("@stszcd", SqlDbType.VarChar, 3).Value = size;
cmd.Parameters.Add("@stprq", SqlDbType.VarChar, 8).Value = sprice;
cmd.Parameters.Add("@strprq", SqlDbType.VarChar, 8).Value = sretail;
cmd.ExecuteNonQuery();
con.Close();
}

Input is a CSV
If your input files are CSV files, I strongly recommend using the CSV Reader class available at
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
Input is Fixed-Width
If your input is fixed-width, just read all of the lines in and parse each individual line into an appropriate structure to store in the database (more on that in a moment).
If you have just a little text to read (perhaps a few megabytes or less), just use
File.ReadAllLines
http://msdn.microsoft.com/en-us/library/s2tte0y1
to read in all of the lines of the file into a string[].
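As a rough sketch of the fixed-width case, the snippet below slices each line into named fields using the offsets from the question's code. The StyleRecord and FixedWidthParser names are hypothetical, introduced only for illustration, and the offsets should be checked against the real file layout.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical container for one parsed line (not part of the original post).
public sealed class StyleRecord
{
    public string Style = "";
    public string Color = "";
    public string Dim = "";
    public string Size = "";
    public string Qty = "";
    public string Price = "";
    public string Retail = "";
}

public static class FixedWidthParser
{
    // The last field starts at offset 38 and is 4 characters wide.
    private const int MinLength = 42;

    // Slices one fixed-width line using the offsets from the question.
    public static StyleRecord ParseLine(string line)
    {
        if (line == null) throw new ArgumentNullException(nameof(line));
        if (line.Length < MinLength)
            throw new FormatException("Line is shorter than " + MinLength + " characters.");

        return new StyleRecord
        {
            Style = line.Substring(0, 6).Trim(),
            Color = line.Substring(6, 2).Trim(),
            Dim = line.Substring(8, 1).Trim(),
            Size = line.Substring(14, 3).Trim(),
            Qty = line.Substring(19, 5).Trim(),
            Price = line.Substring(27, 6).Trim(),
            Retail = line.Substring(38, 4).Trim()
        };
    }

    // Reads the whole file into memory; suitable for small files only.
    public static List<StyleRecord> ParseFile(string path)
    {
        var records = new List<StyleRecord>();
        foreach (string line in File.ReadAllLines(path))
        {
            if (line.Length == 0) continue; // skip blank lines
            records.Add(ParseLine(line));
        }
        return records;
    }
}
```

Checking the line length up front turns a short or truncated line into a clear FormatException instead of an ArgumentOutOfRangeException from Substring.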
Writing to the DB
You can now read the file; the next step is to write it out to the database. Presumably there is a DB table whose schema matches the data in the file. Have a look at ADO.NET to understand how to write to a database, and ask specific questions as needed.
http://msdn.microsoft.com/en-us/library/h43ks021(v=vs.100).aspx

It sounds like your text file uses a delimiter that separates the data into columns, e.g.
data1, data2, data3, data4
The delimiter could be a comma or any other character that does not appear in the regular data. If the text file is in this format, it is easy to parse and push to the database.
One approach: open the file with a StreamReader, read it one line at a time, and split each line into columns on the delimiter.
string[] lineData = sr.ReadLine().Split(','); // replace ',' with your delimiter
foreach(string colData in lineData)
{
//store data into appropriate collections and push it to database
}
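The approach described above could be sketched roughly as follows. DelimitedReader and ReadRows are hypothetical names, and the method takes a TextReader so it works with a StreamReader over a file as well as with a StringReader in a quick test. Note that a plain Split does not understand quoted fields that contain the delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DelimitedReader
{
    // Reads every line from the reader and splits it on the given delimiter.
    // Blank lines are skipped and each column is trimmed.
    public static List<string[]> ReadRows(TextReader reader, char delimiter)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue;

            string[] columns = line.Split(delimiter);
            for (int i = 0; i < columns.Length; i++)
            {
                columns[i] = columns[i].Trim();
            }
            rows.Add(columns);
        }
        return rows;
    }
}
```

With a file, this would be called as `DelimitedReader.ReadRows(new StreamReader(filename), ',')`, ideally inside a using block so the reader is disposed.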

In addition to the other parsing techniques already suggested, you can use the TextFieldParser class (it's in the Microsoft.VisualBasic.FileIO namespace) in conjunction with the ADO.NET code you've already written.
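A minimal sketch of TextFieldParser in fixed-width mode is shown below. The FixedWidthFieldReader class is a hypothetical wrapper, and the widths are passed in by the caller because the real column layout is not known here. A final width of -1 marks the last field as variable width.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FixedWidthFieldReader
{
    // Reads all rows from the reader, cutting each line at the given field widths.
    public static List<string[]> ReadRows(TextReader reader, params int[] widths)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(widths);

            while (!parser.EndOfData)
            {
                // ReadFields throws MalformedLineException when a line does not fit the widths.
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Each returned string[] can then be bound to the SqlCommand parameters in the same way as the Substring results in the question.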

Related

Extract specific words from a string in C#

Although it's easy in Python, I am new to C# and am having trouble extracting a particular word from a string. I have two txt files.
abc.txt
select * from schema1.table1
xyz.txt
select * from schema2.table2 where a=5
I need to extract only the words "schema1" and "schema2", but I am having trouble doing it in C#.
My code:
{
StreamReader sr = new StreamReader(file);
string data = sr.ReadLine();
while (data != null)
{
string[] values = data.Split('.');
foreach (string value in values)
{
Console.WriteLine(value.Split(' ').Last());
data = sr.ReadLine();
}
}
}
But the output includes a whole lot of other words too. Any kind of lead is appreciated.
You may try the following:
string sql = "select * from schema2.table2 where a=5";
var schema = Regex.Replace(sql, @"^select \* from ([^.]+)\.\S+.*$", "$1");
Console.WriteLine(schema); // schema2
This answer makes very large assumptions, including that every SQL query you would need to parse would always start with select * from some_schema.some_table. Obviously, for more complex/different queries, the above logic would fail.
In general, you might need to find a .NET library which can parse SQL queries.
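As a slightly more tolerant variant, the sketch below uses Regex.Match instead of Regex.Replace, so a line without a "from schema.table" clause is reported as "not found" rather than returned unchanged. SchemaExtractor and TryExtractSchema are hypothetical names, and the pattern still assumes simple unquoted identifiers; it is not a SQL parser.

```csharp
using System.Text.RegularExpressions;

public static class SchemaExtractor
{
    // Matches "from <schema>.<table>" anywhere in the statement, case-insensitively.
    private static readonly Regex FromClause =
        new Regex(@"\bfrom\s+(\w+)\.\w+", RegexOptions.IgnoreCase);

    // Returns true and sets schema when a "from schema.table" clause is found.
    public static bool TryExtractSchema(string sql, out string schema)
    {
        Match match = FromClause.Match(sql);
        schema = match.Success ? match.Groups[1].Value : "";
        return match.Success;
    }
}
```

Called once per line read from abc.txt and xyz.txt, this yields "schema1" and "schema2" for the two sample queries.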

Problem reading csv file that has a column with first and last names

The CSV file has Id and Name columns. Some of the names consist of a first and last name, e.g. "John, Smith". After inserting into the SQL table, the name is stored as just "John". Could you please suggest how to get the full name when it is separated by ','?
string filepath = selecteditem.FullName;
using (StreamReader sr = new StreamReader(filepath))
{
while (sr.Peek() != -1)
{
string line = sr.ReadLine();
string[] value = line.Split(',');
List<string> lineValues = line.Split(',').ToList();
conn.Open();
cmd.CommandText = "insert into
The string.Split method has an overload that lets you limit how many substrings are returned, so if your input string is
string input = "1,John, Smith";
var splits = input.Split(new char[] { ','}, 2, StringSplitOptions.RemoveEmptyEntries);
then the splits array has only two entries: the first is the ID, the second is the name.
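Wrapped into a small helper, the count-limited split might look like the sketch below. IdNameSplitter is a hypothetical name, and the approach only works under the assumption that the name is the last column on the line.

```csharp
using System;

public static class IdNameSplitter
{
    // Splits "id,name" at the first comma only, so commas inside the name are kept.
    public static (string Id, string Name) Split(string line)
    {
        string[] parts = line.Split(new[] { ',' }, 2);
        if (parts.Length < 2)
        {
            throw new FormatException("Expected 'id,name' but got: " + line);
        }
        return (parts[0].Trim(), parts[1].Trim());
    }
}
```

For the input "1,John, Smith" this returns the Id "1" and the Name "John, Smith".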
If you have access to Excel, open it, save as XLSX
Find and replace all commas within the workbook with a | or other equally obscure character, having first made sure, the obscure character isn't in the sheet to begin with.
Save and then re-save as .csv.
In the C# code or directly in the Sql, replace the obscure character with the original comma.

How can I divide line from input file in two parts and then compare it to the two column data of database table in c#?

Given a text file, how would I go about reading particular digits in a line?
Say I have a file 123.txt. How would I go about reading a line and storing the first 5 digits in one variable and the next 6 digits in another variable?
All I've seen is stuff involving storing the entire text file as a String array, but there are some complications: the text file is enormous, and the machine the application will run on isn't exactly a top-notch system. Speed isn't the top priority, but it is definitely a major issue.
// Please Help here
// Want to compare data of input file with database table columns.
// How to split data in to parts
// Access that split data later for comparison.
// Data in input file is like,
//
// 016584824684000000000000000+
// 045787544574000000000000000+
// 014578645447000000000000000+
// 047878741489000000000000000+ and so on..
string[] lines = System.IO.File.ReadAllLines("F:\\123.txt"); // Input file
// How can I divide lines from input file in 2 parts (e.g. 01658 and 482468) and save them in variables so that I can use them for comparing later?
string conStr = ConfigurationManager.ConnectionStrings["BVI"].ConnectionString;
cnn = new SqlConnection(conStr);
cnn.Open();
// So I want to compare first 5 digits of all lines of input file (ex. 01658)with Transit_ID and next 6 digits with Client_Account and then export matching rows in excel file.
sql = "SELECT Transit_ID AS TransitID, Client_Account AS AccountNo FROM TCA_CLIENT_ACCOUNT WHERE Transit_ID = " //(What should I put here to compare with the first 5 digits of all lines of the input file)" AND Client_Account = " ??" );
All I've seen is stuff involving storing the entire text file as a String array
Large text files should be processed by streaming one line at a time, so that you don't allocate a large amount of memory needlessly:
using (StreamReader sr = File.OpenText(path))
{
string s;
while ((s = sr.ReadLine()) != null)
{
// How would I go about reading line number and store first 5
// digits in different variable and next 6 digits to another variable.
string first = s.Substring(0, 5);
string second = s.Substring(5, 6); // the next 6 digits start at index 5
}
}
https://msdn.microsoft.com/en-us/library/system.io.file.opentext(v=vs.110).aspx
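To keep the split values available for the later comparison without loading the whole file, the streaming loop could be wrapped in an iterator, roughly as sketched below. AccountLineReader is a hypothetical name, and the 5 + 6 digit layout is taken from the question's sample data.

```csharp
using System.Collections.Generic;
using System.IO;

public static class AccountLineReader
{
    // Lazily yields (first 5 digits, next 6 digits) for each line, one line at a time.
    public static IEnumerable<(string TransitId, string Account)> Read(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length < 11) continue; // skip blank or truncated lines

            yield return (line.Substring(0, 5), line.Substring(5, 6));
        }
    }
}
```

With a file this would be used as `foreach (var pair in AccountLineReader.Read(File.OpenText(path)))`, so only one line is held in memory at a time.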
Just use Substring(int32, int32) to get the appropriate values like this:
string[] lines = System.IO.File.ReadAllLines("F:\\123.txt");
List<string> first = new List<string>();
List<string> second = new List<string>();
foreach (string line in lines)
{
first.Add(line.Substring(0, 5));
second.Add(line.Substring(5, 6)); // the next 6 digits start at index 5
}
Though Eric's answer is much cleaner. This was just a quick and dirty proof of concept using your sample data. You should definitely use the using statement and StreamReader as he suggested.
first will contain the first 5 digits from each element in lines, and second will contain the next 6 digits.
Then to build your SQL, you'd do something like this:
sql = "SELECT Transit_ID AS TransitID, Client_Account AS AccountNo FROM TCA_CLIENT_ACCOUNT WHERE Transit_ID = @TransitId AND Client_Account = @ClientAcct";
SqlCommand cmd = new SqlCommand(sql, cnn);
for (int i = 0; i < lines.Length; i++)
{
cmd.Parameters.Clear(); // parameters from the previous iteration must not accumulate
cmd.Parameters.AddWithValue("@TransitId", first[i]);
cmd.Parameters.AddWithValue("@ClientAcct", second[i]);
//execute your command and validate results
}
That will loop N times and run a command for each of the values in lines.

OdbcConnection Text Driver ignores scheme.ini settings

Here is my code:
OdbcConnection conn = new OdbcConnection("Driver={Microsoft Text Driver (*.txt; *.csv)};DSN=scrapped.csv");
conn.Open();
OdbcCommand foo = new OdbcCommand(@"SELECT * FROM [scrapped.csv] WHERE KWOTA < 100.00", conn);
IDataReader dr = foo.ExecuteReader();
StreamWriter asd = new StreamWriter("outfile.txt");
while (dr.Read())
{
int cols = dr.GetSchemaTable().Rows.Count;
for (int i = 0; i < cols; i++)
{
asd.Write(string.Format("{0};",dr[i].ToString()));
}
asd.WriteLine();
}
asd.Flush();
asd.Close();
dr.Close();
conn.Close();
Here is my Scheme.ini
[scrapped.csv]
Format=Delimited(;)
NumberDigits=2
CurrencyThousandSymbol=
CurrencyDecimalSymbol=,
CurrencyDigits=2
Col1=DataOperacji Date
Col2=DataKsiegowania Date
Col3=OpisOperacji Text
Col4=Tytul Text
Col5=NadawcaOdbiorca Text
Col6=NumerKonta Text
Col7=Kwota Currency
Col8=SaldoPoOperacji Currency
Here I have sample from my CSV:
2013-01-22;2013-08-24;notmatter;"notmatter";"notmatter";'notmatter';7 111,55;10 222,20;
2013-03-26;2013-08-23;notmatter;"notmatter";"notmatter";'notmatter';-275,00;15 466,24;
So even though I have the date and currency formats set in scheme.ini and in the regional settings (which ODBC should use by default but does not), the values I write to the output file are a total mess.
They are empty if the value contains a space (my local thousands separator), and a value like 15,45 comes out as 15,4500.
Date fields also behave abnormally, and even if I add DateTimeFormat to scheme.ini, the output does not follow the format I specified.
Any help would be appreciated. I would like to use ODBC and query the CSV data like a database, with WHERE something = something.
I added a line to your schema.ini and ran it against an ADODB connection. It worked for me as far as dates are concerned, but the other bits are still not right. Note DateTimeFormat.
[scrapped.csv]
Format=Delimited(;)
NumberDigits=2
CurrencyThousandSymbol=
CurrencyDecimalSymbol=,
CurrencyDigits=2
DateTimeFormat="yyyy-mm-dd"
Col1=DataOperacji Date
Col2=DataKsiegowania Date
Col3=OpisOperacji Text
Col4=Tytul Text
Col5=NadawcaOdbiorca Text
Col6=NumerKonta Text
Col7=Kwota Currency
Col8=SaldoPoOperacji Currency
You may also need:
ColNameHeader=False
MaxScanRows=0
But at the moment, I cannot see a way to get a space accepted as the CurrencyThousandSymbol
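One possible workaround, offered only as a sketch and not as a fix for the driver itself, is to declare the amount columns as Text in schema.ini and convert them in C#. That is an assumption about how the file would be set up, and it means the `KWOTA < 100.00` filter would have to be applied in code rather than in the SQL. AmountParser is a hypothetical name.

```csharp
using System.Globalization;

public static class AmountParser
{
    // Number format with a comma as the decimal separator, as in the sample data.
    private static readonly NumberFormatInfo CommaDecimal =
        new NumberFormatInfo { NumberDecimalSeparator = "," };

    // Converts text such as "7 111,55" or "-275,00" into a decimal.
    public static decimal Parse(string text)
    {
        // Remove regular and non-breaking spaces used as thousands separators.
        string cleaned = text.Replace(" ", "").Replace("\u00A0", "");

        return decimal.Parse(
            cleaned,
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
            CommaDecimal);
    }
}
```

Stripping the spaces before parsing avoids depending on how a given runtime treats a space as a group separator.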

Microsoft's Microsoft Text Driver sees text as float operation

Using .NET
I have a text file with comma separated data. One of the columns consists of text like the following : 1997/020269/07
Now when I do a select with an OdbcCommand, the string is treated as a float expression and the driver returns the 'answer' instead of the actual text!
How can I get the actual text? Am I going to be forced to parse the file manually?
Hope someone can help...please?! :)
Edit: Some code maybe? :)
string strConnString =
@"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + _FilePath +
@"; Extensions=asc,csv,tab,txt;Persist Security Info=False";
var conn = new System.Data.Odbc.OdbcConnection(strConnString);
var cmd = new System.Data.Odbc.OdbcCommand("select MyColumn from TextFile.txt", conn);
var reader = cmd.ExecuteReader();
while (reader.Read())
{ Console.WriteLine(reader["MyColumn"]); }
This returns 0.014074977 instead of 1997/020269/07
Have you tried using a schema.ini file? It can be used to explicitly define the format of the text file, including data types.
Your schema.ini file might end up looking a little like:
[sourcefilename.txt]
ColNameHeader=true
Format=CSVDelimited
Col1=MyColumn Text Width 14
Col2=...
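Because the text driver looks for schema.ini in the same folder as the data file, the file can also be generated from code just before opening the connection. The sketch below writes a single-column definition like the one above. SchemaIniWriter is a hypothetical helper, and it overwrites any existing schema.ini in that folder.

```csharp
using System.IO;

public static class SchemaIniWriter
{
    // Writes a schema.ini next to the data file, declaring its first column as Text.
    // Returns the full path of the schema.ini that was written.
    public static string Write(string dataFilePath, string columnName, int width)
    {
        string fullPath = Path.GetFullPath(dataFilePath);
        string folder = Path.GetDirectoryName(fullPath) ?? ".";
        string iniPath = Path.Combine(folder, "schema.ini");

        File.WriteAllLines(iniPath, new[]
        {
            "[" + Path.GetFileName(fullPath) + "]",
            "ColNameHeader=True",
            "Format=CSVDelimited",
            "Col1=" + columnName + " Text Width " + width
        });

        return iniPath;
    }
}
```

For the question's code this might be called as `SchemaIniWriter.Write(Path.Combine(_FilePath, "TextFile.txt"), "MyColumn", 14)` before creating the OdbcConnection, assuming _FilePath is the folder that holds the file.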
Try using schema.ini
[yourfile.txt]
ColNameHeader=false
MaxScanRows=0
Format=FixedLength
Col1=MyColumn Text Width 20
Bye.
