How to insert value at specific column in text files - c#

I have 3 txt files which are generated on a daily basis by one of our systems, that need values inserted at specific column positions.
I've accomplished this with the code below, however:
The specific value (**LineText) needs to be on all rows that have text and not just one row. I am not sure how to accomplish this.
My code currently inserts the value (**LineText), however it pushes everything over. Is there a way for the value to be inserted without pushing the rest of the data over?
Each day 3 files will be generated with the names REYYYYMMDD.TXT, TRYYYYMMDD.TXT and CTYYYYMMDD.TXT. Is there a way for the code to pick up these names? I've tried using wildcards such as RE*.TXT, TR*.TXT etc but it doesn't work.
Results example below (What my code currently does with the RE20150109.TXT file)
223016254 CSST45124
167520001 EUR SKBSUS12454
158013456 CSST15568
140490002 CSST14779
167520004 SKBSUS88897
515800001 CSST13679
149370003 CSST32897
161930009 RTVS10035
Below is what I would like it to do but am not sure how :
223016254 EUR CSST45124
167520001 EUR SKBSUS12454
158013456 EUR CSST15568
140490002 EUR CSST14779
167520004 EUR SKBSUS88897
515800001 EUR CSST13679
149370003 EUR CSST32897
161930009 EUR RTVS10035
My C# code is below:
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace AstTXTEdit
{
class Program
{
static void Main(string[] args)
{
string REFilePath = @"C:\AstImport\RE20150109.TXT";
int RElineNo = 1; //How do i set this to be all rows within the text file?
string RELineText = "";
int REPosition = 12;
var REFullContent = File.ReadAllLines(REFilePath);
RELineText = REFullContent[RElineNo];
RELineText = RELineText.Insert(REPosition, "EUR");
REFullContent[RElineNo] = RELineText;
File.WriteAllLines(REFilePath, REFullContent);
string TRFilePath = @"C:\AstImport\TR20150109.TXT";
int TRlineNo = 1; //How do i set this to be all rows within the text file?
string TRLineText = "";
int TRPosition = 40;
var FullContent = File.ReadAllLines(TRFilePath);
TRLineText = FullContent[TRlineNo];
TRLineText = TRLineText.Insert(TRPosition, "Y");
FullContent[TRlineNo] = TRLineText;
File.WriteAllLines(TRFilePath, FullContent);
string CTFilePath = @"C:\AstImport\CT20150109.TXT";
int CTlineNo = 1; //How do i set this to be all rows within the text file?
string CTLineText = "";
int CTPosition = 36;
var CTFullContent = File.ReadAllLines(CTFilePath);
CTLineText = CTFullContent[CTlineNo];
CTLineText = CTLineText.Insert(CTPosition, "I");
CTFullContent[CTlineNo] = CTLineText;
File.WriteAllLines(CTFilePath, CTFullContent);
}
}
}
Any Assistance would be most appreciated.
Kind Regards,
Andrea

The first two questions:
The specific value (**LineText) needs to be on all rows that have text and not just one row. I am not sure how to accomplish this.
and
My code currently inserts the value (**LineText), however it pushes everything over. Is there a way for the value to be inserted without pushing the rest of the data over?
can be solved by iterating over all lines with Select and transforming each line by first inserting the new string and then removing the same number of characters after the insertion:
// Read entire file;
var lines = File.ReadAllLines("data2.txt");
var eur = "EUR"; // String to insert.
// Calc the position to insert at.
var insertAt = "223016254".Length + 1;
var result =
lines
.Select(x =>
// Insert the 'eur' string.
x.Insert(insertAt, eur)
// Remove the spaces after insertion.
.Remove(insertAt + eur.Length, eur.Length))
.ToList();
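A variation on the snippet above, as a minimal sketch: it overwrites the target columns instead of inserting and then removing, pads short lines, and leaves blank lines alone. The class and method names (FixedWidth, OverwriteAt, OverwriteAll) are made up for illustration, and it assumes the real files are fixed-width with spaces at the target position.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FixedWidth
{
    // Overwrites value.Length characters starting at position.
    // Blank lines are returned unchanged; short lines are padded with spaces first.
    public static string OverwriteAt(string line, int position, string value)
    {
        if (string.IsNullOrWhiteSpace(line)) return line;
        // Pad so the target columns exist even on short lines.
        string padded = line.PadRight(position + value.Length);
        // Remove the old characters, then put the new value in the same place.
        return padded.Remove(position, value.Length).Insert(position, value);
    }

    // Applies OverwriteAt to every line.
    public static List<string> OverwriteAll(IEnumerable<string> lines, int position, string value)
    {
        return lines.Select(l => OverwriteAt(l, position, value)).ToList();
    }
}
```

It could then be used as File.WriteAllLines(path, FixedWidth.OverwriteAll(File.ReadAllLines(path), 10, "EUR")). Note that anything already stored in those columns is overwritten.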
As far as the third question is concerned:
Each day 3 files will be generated with the names REYYYYMMDD.TXT, TRYYYYMMDD.TXT and CTYYYYMMDD.TXT. Is there a way for the code to pick up these names?
I wouldn't use the Directory.GetFiles wildcards because in most cases they are too simple. A LINQ query with a regex file name pattern can do much more:
string fileNamePattern = @"(CT|RE|TR)\d{8}\.TXT$";
string[] files = Directory.GetFiles(path);
files =
files
.Where(fileName =>
Regex.IsMatch(fileName, fileNamePattern , RegexOptions.IgnoreCase))
.ToArray();
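One thing to watch: Directory.GetFiles returns full paths, and the pattern above is only anchored at the end, so a name such as XRE20150109.TXT would also match. A minimal sketch that tests only the file name and anchors both ends (DailyFiles and IsDailyFile are names I made up):

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class DailyFiles
{
    // Anchored at both ends, so "XRE20150109.TXT" and "RE20150109.TXT.bak" are rejected.
    private static readonly Regex NamePattern =
        new Regex(@"^(CT|RE|TR)\d{8}\.TXT$", RegexOptions.IgnoreCase);

    // Checks only the file name part of the path, not the directory.
    public static bool IsDailyFile(string path)
    {
        return NamePattern.IsMatch(Path.GetFileName(path));
    }
}
```

Usage would be something like Directory.GetFiles(path).Where(DailyFiles.IsDailyFile).ToArray().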

Question 3:
List<string> fileNames = Directory.GetFiles(@"c:\AstImport", "RE*.TXT").ToList();
fileNames.AddRange(Directory.GetFiles(@"c:\myfolder", "TR*.TXT"));
foreach (string fileName in fileNames)
{
}

Question 1: It would be easy to insert the missing strings with linq
var v = File.ReadAllLines(path).Select(s => { string[] arr = s.Split(' '); if (arr[1] == "EUR") return s; else return String.Join(" ", new string[] { arr[0], "EUR", arr[1] }); });
File.WriteAllLines(path, v);
Question 2: I did not understand the question
Question 3: The Linq Answer
Directory.GetFiles("path", "RE*.txt").ToList().ForEach(s => { /* Same as 1 */ });

Related

SSIS C# Script Task: How to match/replace pattern with increment on a large XML file

There are other similar questions that have been asked and answered, but none of those answers work in what I'm trying to do, or there isn't enough information for me to know how to implement it in my own code. I've been at it for two days and now must ask for help.
I have a script task in an SSIS package where I need to do a match and replace on a large XML file that contains thousands of Record Identifier tags. Each one contains a number. I need those numbers to be consecutive and increment by one. For example, within the xml file, I am able to find tags that appear like this:
<ns1:recordIdentifier>1</ns1:recordIdentifier>
<ns1:recordIdentifier>6</ns1:recordIdentifier>
<ns1:recordIdentifier>223</ns1:recordIdentifier>
<ns1:recordIdentifier>4102</ns1:recordIdentifier>
I need to find and replace those tags with consecutive increments like so:
<ns1:recordIdentifier>1</ns1:recordIdentifier>
<ns1:recordIdentifier>2</ns1:recordIdentifier>
<ns1:recordIdentifier>3</ns1:recordIdentifier>
<ns1:recordIdentifier>4</ns1:recordIdentifier>
The code I have so far is causing all the numbers to be "1" with no incrementation.
I've tried dozens of different methods, but nothing has worked yet.
Any ideas as to how I can modify the below code to increment as desired?
public void Main()
{
string varStart = "<ns1:recordIdentifier>";
string varEnd = "</ns1:recordIdentifier>";
int i = 1;
string path = Dts.Variables["User::xmlFilename"].Value.ToString();
string outPath = Dts.Variables["User::xmlOutputFile"].Value.ToString();
string ptrn = @"<ns1:recordIdentifier>\d{1,4}<\/ns1:recordIdentifier>";
string replace = varStart + i + varEnd;
using (StreamReader sr = File.OpenText(path))
{
string s = "";
while ((s = sr.ReadLine()) != null && i>0)
{
File.WriteAllText(outPath, Regex.Replace(File.ReadAllText(path),
ptrn, replace));
i++;
}
}
}
You were on the right path with the Replace method, but you will need to use the MatchEvaluator parameter to increment the number.
string inputFile = Dts.Variables["User::xmlFilename"].Value.ToString();
string outPutfile = Dts.Variables["User::xmlOutputFile"].Value.ToString();
string fileText = File.ReadAllText(inputFile);
//match any number between the elements (one or more digits)
Regex reg = new Regex("<ns1:recordIdentifier>[0-9]+</ns1:recordIdentifier>");
string xmlStartTag = "<ns1:recordIdentifier>";
string xmlEndTag = "</ns1:recordIdentifier>";
//assuming this starts at 1
int incrementInt = 1;
fileText = reg.Replace(fileText, tag =>
{ return xmlStartTag + incrementInt++.ToString() + xmlEndTag; });
File.WriteAllText(outPutfile, fileText);
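The same idea can be pulled out into a small helper that is easy to try outside the SSIS package. This is only a sketch; RecordRenumberer and Renumber are hypothetical names, and it assumes the numbering should start at 1 and that the tags look exactly like the ones in the question.

```csharp
using System.Text.RegularExpressions;

public static class RecordRenumberer
{
    // Replaces the number inside every recordIdentifier element with 1, 2, 3, ...
    public static string Renumber(string xml)
    {
        int next = 1;
        return Regex.Replace(
            xml,
            @"(<ns1:recordIdentifier>)\d+(</ns1:recordIdentifier>)",
            // The evaluator runs once per match, in document order.
            m => m.Groups[1].Value + (next++) + m.Groups[2].Value);
    }
}
```

In the script task this would be called as File.WriteAllText(outPutfile, RecordRenumberer.Renumber(File.ReadAllText(inputFile))).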

Convert txt with different number of spaces into xls file

I tried searching for a solution here but I can't seem to find any answers. I have a textfile that appears like this:
Nmr_test 101E-6 PASSED PASSED PASSED PASSED
Dc_volts 10V_100 CAL_+10V +9.99999000 +10.0000100 +9.99999740 +9.99999727
Dcv_lin 10V_6U 11.5 +0.0000E+000 +7.0000E+000 +2.0367E+001 +2.7427E+001
Dcv_lin 10V_6U 3 +0.0000E+000 +5.0000E+000 +1.3331E+001 +1.8872E+001
I have to convert this textfile to an Excel/xls file but I can't figure out how to insert them to the correct excel columns as they have different number of spaces in between columns. I've tried using this code below which is using space as a separator but it fails of course due to the varying number of spaces between the columns:
var lines = File.ReadAllLines(string.Concat(Directory.GetCurrentDirectory(), "\\Temp_textfile.txt"));
var rowcounter = 1;
foreach(var line in lines)
{
var columncounter = 1;
var values = line.Split(' ');
foreach(var value in values)
{
excelworksheet.Cells[rowcounter, columncounter] = new Cell(value);
columncounter++;
}
rowcounter++;
}
excelworkbook.Worksheets.Add(excelworksheet);
excelworkbook.Save(string.Concat(Directory.GetCurrentDirectory(), "\\Exported_excelfile.xls"));
Any advice?
EDIT: Got it working using SubString that selects each column using their fixed width.
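For anyone whose columns are not fixed width, an alternative sketch: split on runs of whitespace instead of on a single space. ColumnSplitter and SplitOnWhitespace are names made up for this example. It only works when no column is empty and no value contains a space; otherwise the Substring approach from the edit above is the safer choice.

```csharp
using System;

public static class ColumnSplitter
{
    // Splits a line on any run of whitespace and drops the empty entries.
    public static string[] SplitOnWhitespace(string line)
    {
        // A null separator array means "split on any whitespace character".
        return line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

In the loop from the question, var values = ColumnSplitter.SplitOnWhitespace(line); would replace line.Split(' ').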

Read specific values out of a text-file and put them in a list

I have a text-file with many lines, each line looks like this:
"string string double double", with a space between each value. I'd like to read the first string and the last double of every line and put these two values in an existing list. This is my code so far, but it doesn't really work.
private void bOpen_Click(object sender, RoutedEventArgs e)
{
bool exists = File.Exists(@"C:\Users\p2\Desktop\Liste.txt");
if (exists == true)
{
StringBuilder sb = new StringBuilder();
using (StreamReader sr = new StreamReader(@"C:\Users\p2\Desktop\Liste.txt"))
{
Vgl comp = new Vgl();
comp.name = Abzahlungsdarlehenrechner.zgName;
comp.gErg = Abzahlungsdarlehenrechner.zgErg;
GlobaleDaten.VglDaten.Add(comp);
int i = 0;
string line = File.ReadLines(@"Liste.txt").Skip(0).Take(1).First();
while ((line = sr.ReadLine()) != null)
{
sb.Append((line));
listBox.Items.Add(line);
GlobaleDaten.VglDaten.Add(comp);
i++;
}
}
}
I have already read this, but it didn't help: How do I read specific value[...]
You can try Linq:
var source = File
.ReadLines(@"C:\Users\p2\Desktop\Liste.txt")
.Select(line => line.Split(' '))
.Select(items => new Vgl() {
name = items[0],
gErg = double.Parse(items[3])
});
// If you want to add into existing list
GlobaleDaten.VglDaten.AddRange(source);
// If you want to create a new list
//List<Vgl> list = source.ToList();
how about
List<Vgl> Result = File.ReadLines(@"C:\Users\p2\Desktop\Liste.txt")
.Select(x => new Vgl()
{
name = x.Split(' ').First(),
gErg = decimal.Parse(x.Split(' ').Last(), NumberStyles.AllowCurrencySymbol)
})
.ToList();
I would avoid storing money in double values because this could lead to rounding issues. Use decimal instead. Examples here: Is a double really unsuitable for money?
You can use:
string[] splitBySpace = line.Split(' ');
string first = splitBySpace.ElementAt(0);
decimal last = Convert.ToDecimal(splitBySpace.ElementAt(splitBySpace.Length - 1));
Edit : To Handle Currency symbol:
string[] splitBySpace = line.Split(' ');
string pattern = @"[^0-9\.\,]+";
string first = splitBySpace.ElementAt(0);
string last = (new Regex(pattern)).Split(splitBySpace.ElementAt(splitBySpace.Length - 1))
.FirstOrDefault();
decimal lastDecimal;
bool success = decimal.TryParse(last, out lastDecimal);
I agree with @Dmitry and fubo. If you are looking for alternatives, you could try this:
var source = File
.ReadLines(@"C:\Users\p2\Desktop\Liste.txt")
.Select(line =>
{
var splits = line.Split(' ');
return new Vgl()
{
name = splits[0],
gErg = double.Parse(splits[3])
};
});
Use string.Split with a space as the delimiter to split each line into an array of values. Then just access the first and last array elements. Of course, if you aren't absolutely certain that each line contains exactly 4 values, you may want to inspect the length of the array to ensure there are at least 4 values.
reference on using split:
https://msdn.microsoft.com/en-us/library/ms228388.aspx
Read the whole file as a string.
Split the string in a foreach loop using \r\n as a row separator. Add each row to a list of strings.
Iterate through that list and split again each record in another loop using space as field separator and put them into another list of strings.
Now you have all four fields of one row. Just use the First and Last methods to get the first word and the last number.
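The steps above, with the length check mentioned earlier, can be sketched as a small helper. LineParser and TryParseLine are hypothetical names, and the sketch assumes the numbers use a dot as the decimal separator (it parses with the invariant culture so the result does not depend on the machine's regional settings).

```csharp
using System;
using System.Globalization;

public static class LineParser
{
    // Extracts the first field as the name and the last field as a double.
    // Returns false when the line has fewer than two fields or the last one is not a number.
    public static bool TryParseLine(string line, out string name, out double value)
    {
        name = null;
        value = 0;
        string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2) return false;
        name = parts[0];
        return double.TryParse(parts[parts.Length - 1], NumberStyles.Float,
                               CultureInfo.InvariantCulture, out value);
    }
}
```

Inside the read loop you would then create a new Vgl only for the lines where TryParseLine returns true.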

Remove a specific column from a delimited file

I've been working with some big delimited text files (~1GB) these days. They look something like this:
COlumn1 #COlumn2#COlumn3#COlumn4
COlumn1#COlumn2#COlumn3 #COlumn4
where # is the delimiter.
In case a column is invalid I might have to remove it from the whole text file. The output file when Column 3 is invalid should look like this.
COlumn1 #COlumn2#COlumn4
COlumn1#COlumn2#COlumn4
string line = "COlumn1# COlumn2 #COlumn3# COlumn4";
int junk =3;
int columncount = line.Split(new char[] { '#' }, StringSplitOptions.None).Count();
//remove the [junk-1]th '#' and the value till [junk]th '#'
//"COlumn1# COlumn2 # COlumn4"
I wasn't able to find a C# version of this on SO. Is there a way I can do that? Please help.
EDIT:
The solution which I found myself is like below which does the job. Is there a way I could modify this to a better way so that it narrows down the performance impact it might have in case of large text files?
int junk = 3;
string line = "COlumn1#COlumn2#COlumn3#COlumn4";
int counter = 0;
int colcount = line.Split(new char[] { '#' }, StringSplitOptions.None).Length - 1;
string[] linearray = line.Split(new char[] { '#' }, StringSplitOptions.None);
List<string> linelist = linearray.ToList();
linelist.RemoveAt(junk - 1);
string finalline = string.Empty;
foreach (string s in linelist)
{
counter++;
finalline += s;
if (counter < colcount)
finalline += "#";
}
Console.WriteLine(finalline);
EDITED
This method can be very memory expensive. As you can read in this post, the suggestion is:
If you need to run complex queries against the data in the file, the right thing to do is to load the data to database and let DBMS to take care of data retrieval and memory management.
To avoid memory consumption you should use a StreamReader to read the file line by line.
This could be a start for your task; your invalid-match logic is still missing:
using System.Collections.Generic;
using System.IO;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
const string fileName = "temp.txt";
var results = FindInvalidColumns(fileName);
using (var reader = File.OpenText(fileName))
{
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
if (line == null) continue;
var split = line.Split(new[] { "#" }, 0);
// Collect the columns that are not marked invalid.
var kept = new List<string>();
for (var i = 0; i < split.Length; i++)
if (!results.Contains(i))
kept.Add(split[i]);
using (var fs = new FileStream("new.txt", FileMode.Append, FileAccess.Write))
using (var sw = new StreamWriter(fs))
{
// Re-join with the delimiter so the remaining columns stay separated.
sw.WriteLine(string.Join("#", kept));
}
}
}
}
private static List<int> FindInvalidColumns(string fileName)
{
var invalidColumnIndexes = new List<int>();
using (var reader = File.OpenText(fileName))
{
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
if (line == null) continue;
var split = line.Split(new[] { "#" }, 0);
for (var i = 0; i < split.Length; i++)
{
if (IsInvalid(split[i]) && !invalidColumnIndexes.Contains(i))
invalidColumnIndexes.Add(i);
}
}
}
return invalidColumnIndexes;
}
private static bool IsInvalid(string s)
{
return false;
}
}
}
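If the column to drop is already known (junk = 3 in the question), the per-line work can be reduced to a split, a filter and a join. A minimal sketch; ColumnRemover and RemoveColumn are names made up for this example, and the column number is 1-based to match the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnRemover
{
    // Removes the given 1-based column from a delimited line.
    // Lines that do not have that many columns are returned unchanged.
    public static string RemoveColumn(string line, int columnNumber, char delimiter = '#')
    {
        string[] parts = line.Split(delimiter);
        if (columnNumber < 1 || columnNumber > parts.Length) return line;
        return string.Join(delimiter.ToString(),
            parts.Where((part, index) => index != columnNumber - 1));
    }
}
```

Combined with File.ReadLines, which reads lazily, the whole file never has to sit in memory: File.WriteAllLines(outPath, File.ReadLines(inPath).Select(l => ColumnRemover.RemoveColumn(l, 3))). The input and output paths have to be different files for this to work.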
First, what you will do is re-write the line to a text file using a 0-length string for COlumn3. Therefore the line after being written correctly would look like this:
COlumn1#COlumn2##COlumn4
As you can see, there are two delimiters between COlumn2 and COlumn4. This is a cell with no data in it. (By "cell" I mean one column of a certain, single row.) Later, when some other process reads this using the Split function, it will still create a new value for Column 3, but in the array generated by Split, the 3rd position would be an empty string:
String[] columns = stream_reader.ReadLine().Split('#');
int lengthOfThirdItem = columns[2].Length; // for proof
// lengthOfThirdItem = 0
This reduces invalid values to empty strings and persists them back in the text file.
For more on String.Split see C# StreamReader save to Array with separator.
It is not possible to write to lines internal to a text file while it is also open for read. This article discusses it some (simultaneous read-write a file in C#), but it looks like that question-asker just wants to be able to write lines to the end. You want to be able to write lines at any point in the interior. I think this is not possible without buffering the data in some way.
The simplest way to buffer the data is to rename the file to a temp file first (using File.Move(), http://msdn.microsoft.com/en-us/library/system.io.file.move(v=vs.110).aspx). Then use the temp file as the data source: open the temp file to read in the data, which may have corrupt entries, and write the data afresh to the original file name, using the approach I describe above to represent empty columns. After this is complete, you should delete the temp file.
Important
Deleting the temp file may leave you vulnerable to power and data transients (or software 'transients'). (I.e., a power drop that interrupts part of the process could leave the data in an unusable state.) So you may also want to leave the temp file on the drive as an emergency backup in case of some problem.

Searching strings in txt file

I have a .txt file with a list of 174 different strings. Each string has an unique identifier.
For example:
123|this data is variable|
456|this data is variable|
789|so is this|
etc..
I wish to write a program in C# that will read the .txt file and display only one of the 174 strings if I specify the ID of the string I want. This is because all the data in the file is variable, so only the ID can be used to pull the string. So instead of ending up with the example above, I get just one line.
eg just
123|this data is variable|
I seem to be able to write a program that pulls just the ID from the .txt file and not the entire string, or a program that merely reads the whole file and displays it. But I have yet to write one that does exactly what I need. HELP!
Well, the actual string I get out of the txt file has no '|'; they were just in the example. An example of the real string would be: 0111111(0010101) where the data in the brackets is variable. The brackets don't exist in the real string either.
namespace String_reader
{
class Program
{
static void Main(string[] args)
{
String filepath = @"C:\my file name here";
string line;
if(File.Exists(filepath))
{
StreamReader file = null;
try
{
file = new StreamReader(filepath);
while ((line = file.ReadLine()) !=null)
{
string regMatch = "ID number here"; //this is where it all falls apart.
Regex.IsMatch (line, regMatch);
Console.WriteLine (line);// When program is run it just displays the whole .txt file
}
}
finally{
if (file !=null)
file.Close();
}
}
Console.ReadLine();
}
}
}
Use a Regex. Something along the lines of Regex.Match("|"+inputString+"|",@"\|[ ]*\d+\|(.+?)\|").Groups[1].Value
Oh, I almost forgot; you'll need to substitute the \d+ with the actual ID you want. Right now, that'll just get you the first one.
The "|" before and after the input string makes sure both the index and the value are enclosed in a | for all elements, including the first and last. There's ways of doing a Regex without it, but IMHO they just make your regex more complicated, and less readable.
Assuming you have path and id.
Console.WriteLine(File.ReadAllLines(path).Where(l => l.StartsWith(id + "|")).FirstOrDefault());
Use File.ReadAllLines to get a string array of lines, then split each string on the |
You could use Regex.Split method
FileInfo info = new FileInfo("filename.txt");
// Split the file content into lines (one record per line).
String[] lines = info.OpenText().ReadToEnd().Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
foreach(String line in lines)
{
string[] fields = line.Split('|');
int id = Convert.ToInt32(fields[0]);
string text = fields[1];
}
Read the data into a string
Split the string on "|"
Read the items 2 by 2: key:value,key:value,...
Add them to a dictionary
Now you can easily find your string with dictionary[key].
First load the whole file into a string.
then try this:
string s = "123|this data is variable| 456|this data is also variable| 789|so is this|";
int index = s.IndexOf("123", 0);
string temp = s.Substring(index,s.Length-index);
string[] splitStr = temp.Split('|');
Console.WriteLine(splitStr[1]);
hope this is what you are looking for.
private static IEnumerable<string> ReadLines(string fspec)
{
using (var reader = new StreamReader(new FileStream(fspec, FileMode.Open, FileAccess.Read, FileShare.Read)))
{
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
}
var dict = ReadLines("input.txt")
.Select(s =>
{
var split = s.Split("|".ToArray(), 2);
return new {Id = Int32.Parse(split[0]), Text = split[1]};
})
.ToDictionary(kv => kv.Id, kv => kv.Text);
Please note that with .NET 4.0 you don't need the ReadLines function, because there is File.ReadLines.
You can now work with that as any dictionary:
Console.WriteLine(dict[12]);
Console.WriteLine(dict[999]);
No error handling here, please add your own
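As a starting point for that error handling, here is a sketch that keys the dictionary by the ID as text, skips lines without an ID, and uses TryGetValue so a missing ID does not throw. IdLookup, Build and Find are names made up for this example, and it assumes the 'ID|text|' layout from the example in the question rather than the real file format.

```csharp
using System;
using System.Collections.Generic;

public static class IdLookup
{
    // Maps each ID (the text before the first '|') to its whole line.
    public static Dictionary<string, string> Build(IEnumerable<string> lines)
    {
        var map = new Dictionary<string, string>();
        foreach (string line in lines)
        {
            int bar = line.IndexOf('|');
            if (bar <= 0) continue;              // skip lines without an ID
            map[line.Substring(0, bar)] = line;  // a later duplicate ID replaces an earlier one
        }
        return map;
    }

    // Returns the line for the given ID, or null when the ID is unknown.
    public static string Find(Dictionary<string, string> map, string id)
    {
        string line;
        return map.TryGetValue(id, out line) ? line : null;
    }
}
```

Usage: var map = IdLookup.Build(File.ReadLines("input.txt")); Console.WriteLine(IdLookup.Find(map, "123") ?? "not found");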
You can use the Split method to divide the entire text into parts separated by '|'. Then the elements at even indexes correspond to the IDs and the elements at odd indexes to the strings.
StreamReader sr = new StreamReader(filename);
string text = sr.ReadToEnd();
string[] data = text.Split('|');
Then convert the data elements to numbers and strings, i.e. a list of IDs and a list of strings. Find the index of the given ID with IDs.IndexOf(ID); the corresponding string will be Strs[idx]:
var IDs = new List<int>();
var Strs = new List<string>();
for (int i = 0; i < data.Length - 1; i += 2)
{
IDs.Add(int.Parse(data[i]));
Strs.Add(data[i + 1]);
}
int idx = IDs.IndexOf(ID); // we get ID from input; idx is -1 if it is not found
string answer = idx >= 0 ? Strs[idx] : null;
