Removing extra letter sets in an inconsistent text file using Regex - c#

I'm having a hard time figuring out how to remove extra fields using Regex.
In the example below, the header line says that each line has 42 "|" (vertical bar) characters.
|V.7|42|
1|0|1|58|4|4|351|25|8|||1|0||6|3|1000|49|20|430|17|6|0|10|0|1200|25||30|20|20|20|20|0|100|61028|1|0|0|1|1|0|
1|0|1|58|4|4|351|25|8|||1|0||6|3|1000|49|20|430|17|6|0|10|0|1200|25||30|20|20|20|20|0|100|61028|1|0|0|1|1|0|
2|543|2|58|4|4|366|26|9|100||2|200||8|3|1000|49|20|430|17|6|10|21|54|2400|36||30|20|20|20|20|543|150|61028|2|100|1|2|2|0|
3|1230|3|60|5|5|390|26|10|100||3|1500||10|3|1000|49|20|430|17|6|10|32|123|4800|46||30|20|20|20|20|1230|200|61028|3|1000|2|3|3|0|
4|2002|4|61|6|6|424|27|12|100||4|6000||12|4|769|37|15|315|12|4|10|45|200|9600|57||30|20|20|20|20|2002|250|61028|4|5000|3|4|4|0|
5|3306|5|63|7|7|468|29|14|100||5|18000||16|4|556|27|11|208|8|2|10|58|331||69||30|20|20|20|20|3306|300|61027|1|10000|4|5|5|0|
6|4950|6|66|8|8|522|31|17|100||6|||18|4|435|21|9|147|6|1|10|74|495||80||30|20|20|20|20|4950|350|61027|2|30000|5|6|6|0|
7|6947|7|69|10|10|585|33|20|100||7|||20|4|333|17|7|97|4|1|10|90|695||92||20|15|15|15|15|6947|400|61027|3|50000|6|7|7|0|
8|9309|8|73|12|12|658|35|24|100||8|||24|4|286|14|6|73|3|1|10|109|931||105||20|15|15|15|15|9309|450|61026|1|100000|7|8|8|0|
9|12050|9|77|14|14|741|38|28|100||9|||27|5|250|13|5|55|3|1|10|129|1205||117||20|15|15|15|15|12050|500|61026|2|300000|8|9|9|0|
10|15183|10|82|16|16|834|41|33|100|100|10|||29|5|222|11|4|0|0|0|10|151|1366||130|5|20|15|15|15|15|15183|550|61025|1|500000|9|10|10|0|
11|18720|11|87|19|19|936|45|38|100|100|11|||31|5|200|10|4|0|0|0|11|176|1685||143|10|20|15|15|15|15|18720|600|||||||0|
12|21335|12|92|22|22|1048|48|44|100|100|12|||36|5|182|9|4|0|0|0|12|203|2134||157|15|10|15|10|10|10|21335|650|||||||0|
Now I have another file with 45. What I want is to remove the extra fields so that each line has exactly 42 vertical bars, like the one above.
|V.8|45|
1|0|1|58|4|4|351|25|8|||1|0||6|3|1000|49|20|430|17|6|0|10|0|1200|25||30|20|20|20|20|0|100|61028|1|0|0|1|1|0|5000|40022|1|
2|543|2|58|4|4|366|26|9|100||2|200||8|3|1000|49|20|430|17|6|10|21|54|2400|36||30|20|20|20|20|543|150|61028|2|100|1|2|2|0|25000|61034|1|
3|1230|3|60|5|5|390|26|10|100||3|1500||10|3|1000|49|20|430|17|6|10|32|123|4800|46||30|20|20|20|20|1230|200|61028|3|1000|2|3|3|0|75000|40250|1|
4|2002|4|61|6|6|424|27|12|100||4|6000||12|4|769|37|15|315|12|4|10|45|200|9600|57||30|20|20|20|20|2002|250|61028|4|5000|3|4|4|0|160000|61035|1|
5|3306|5|63|7|7|468|29|14|100||5|18000||16|4|556|27|11|208|8|2|10|58|331||69||30|20|20|20|20|3306|300|61027|1|10000|4|5|5|0|300000|40355|3|
6|4950|6|66|8|8|522|31|17|100||6|||18|4|435|21|9|147|6|1|10|74|495||80||30|20|20|20|20|4950|350|61027|2|30000|5|6|6|0||||
7|6947|7|69|10|10|585|33|20|100||7|||20|4|333|17|7|97|4|1|10|90|695||92||20|15|15|15|15|6947|400|61027|3|50000|6|7|7|0||||
8|9309|8|73|12|12|658|35|24|100||8|||24|4|286|14|6|73|3|1|10|109|931||105||20|15|15|15|15|9309|450|61026|1|100000|7|8|8|0||||
9|12050|9|77|14|14|741|38|28|100||9|||27|5|250|13|5|55|3|1|10|129|1205||117||20|15|15|15|15|12050|500|61026|2|300000|8|9|9|0||||
10|15183|10|82|16|16|834|41|33|100|100|10|||29|5|222|11|4|0|0|0|10|151|1366||130|5|20|15|15|15|15|15183|550|61025|1|500000|9|10|10|0||||
11|18720|11|87|19|19|936|45|38|100|100|11|||31|5|200|10|4|0|0|0|11|176|1685||143|10|20|15|15|15|15|18720|600|||||||0||||
12|21335|12|92|22|22|1048|48|44|100|100|12|||36|5|182|9|4|0|0|0|12|203|2134||157|15|10|15|10|10|10|21335|650|||||||0||||
And I have this code at the moment:
public string Fix(string FileName, int columnsCount)
{
var InputFile = File.ReadLines(FileName).Skip(1).ToArray();
string Result = "";
for(int i = 0; i < InputFile.Length; i++)
{
int FoundMatches = Regex.Matches(Regex.Escape(InputFile[i]), FindWhatTxtBox.Text).Count;
// If too many letters found, trim the rest.
if(FoundMatches > CountTxtBox.Text.Length)
{
string CurrentLine = InputFile[i];
}
}
return Result;
}
As you can see, each field between the vertical bars contains either one number or nothing. How can I remove the extra fields?

Do you have to use a RegEx? It can also be done with string manipulation like this:
using System;
using System.Linq;
public class Program
{
public static void Main()
{
string s = "1|0|1|58|4|4|351|25|8|||1|0||6|3|1000|49|20|430|17|6|0|10|0|1200|25||30|20|20|20|20|0|100|61028|1|0|0|1|1|0|5000|40022|1|";
var arr = s.Split('|') ;
var retVal = String.Join("|", arr.Take(43));
Console.WriteLine(retVal);
}
}
It takes 43 because the first number looks like a counter to me, but you can make it 42, of course. Note that Take does not throw when a line has fewer than 43 entries; it simply returns all of them, so shorter lines pass through unchanged.
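If you want the trimmed lines to look exactly like the V.7 lines (42 fields, each followed by "|"), one option is to keep the first 42 fields and append the trailing bar yourself. This is a minimal sketch that assumes every data line ends with "|" as in the sample; `FieldTrimmer` and `KeepFields` are illustrative names, not part of any library.

```csharp
using System.Linq;

public static class FieldTrimmer
{
    // Cuts a line down to its first `fieldCount` fields, each followed by "|",
    // so the result has exactly `fieldCount` vertical bars.
    // Lines that already have `fieldCount` bars or fewer are returned unchanged.
    public static string KeepFields(string line, int fieldCount)
    {
        string[] parts = line.Split('|');
        int barCount = parts.Length - 1; // n bars always split into n + 1 parts
        if (barCount <= fieldCount)
        {
            return line;
        }
        return string.Join("|", parts.Take(fieldCount)) + "|";
    }
}
```

For example, `FieldTrimmer.KeepFields(line, 42)` on the first V.8 line drops `5000|40022|1|` and returns a line ending in `...|1|1|0|`, while the `|V.8|45|` header (only 3 bars) comes back unchanged.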

This is simple enough that you don't need Regex. See the code below:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
const string INPUT_FILENAME = @"c:\temp\test.txt";
const string OUTPUT_FILENAME = @"c:\temp\test1.txt";
static void Main(string[] args)
{
StreamReader reader = new StreamReader(INPUT_FILENAME);
StreamWriter writer = new StreamWriter(OUTPUT_FILENAME);
string inputLine = "";
int lineCount = 0;
while ((inputLine = reader.ReadLine()) != null)
{
if (++lineCount == 1)
{
writer.WriteLine(inputLine);
}
else
{
string[] inputArray = inputLine.Split(new char[] {'|'});
writer.WriteLine(string.Join("|", inputArray.Take(43)));
}
}
reader.Close();
writer.Flush();
writer.Close();
}
}
}
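The same copy loop can be written with File.ReadLines and File.WriteAllLines, which open and close the files for you, so nothing is left open if an exception is thrown part-way through. A sketch under two assumptions: the input and output paths differ, and, as in the answer above, the first line is a header to pass through unchanged. `PipeFileTrimmer.TrimFile` is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PipeFileTrimmer
{
    // Copies inputPath to outputPath. The first line (the "|V.8|45|" header) is
    // written unchanged; every other line keeps at most `maxParts` of the
    // elements produced by Split('|'), like the Take(43) loop.
    public static void TrimFile(string inputPath, string outputPath, int maxParts)
    {
        IEnumerable<string> trimmed = File.ReadLines(inputPath)
            .Select((line, index) => index == 0
                ? line
                : string.Join("|", line.Split('|').Take(maxParts)));

        // WriteAllLines enumerates the lazy sequence and closes both files when done.
        File.WriteAllLines(outputPath, trimmed);
    }
}
```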

Here is a data file. To keep things easy, this example keeps only 5 items per line but still uses Regex.
Keep your examples small on Stack Overflow; you'll get more answers.
The {0,5} quantifier in the code below can be changed to {0,42}, or any number as needed; as written, the example reads the file and writes out only 5 items per line.
Data File
1|2|3|4|5|6|7|8|9|10
10|9|8|7|6|5|4|3|2|1|0|1|
||||||||||||11|12|
Code to get 0 to 5 items per line
var data = File.ReadAllText(@"C:\Temp\test.txt");
string pattern = @"^(\d*\|){0,5}";
File.WriteAllLines(@"C:\Temp\testOut.txt",
Regex.Matches(data, pattern, RegexOptions.Multiline)
.OfType<Match>()
.Select(mt => mt.Groups[0].Value));
Resultant File
1|2|3|4|5|
10|9|8|7|6|
|||||
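To apply the same Regex idea to the original question (trim every data line to exactly 42 bars and leave the header alone), Regex.Replace with a counted quantifier can process the whole file text at once. A minimal sketch, assuming fields never contain "|" or line breaks; `RegexFieldTrimmer` is an illustrative name. `[^\r\n]*` is used instead of `.*` so that the `\r` of Windows line endings is kept.

```csharp
using System.Text.RegularExpressions;

public static class RegexFieldTrimmer
{
    // For each line with at least `barCount` bars, keeps the first `barCount`
    // "field|" pairs (group 1) and drops the rest of the line. Lines with fewer
    // bars, such as a "|V.8|45|" header when barCount is 42, do not match.
    public static string KeepFields(string text, int barCount)
    {
        string pattern = @"^((?:[^|\r\n]*\|){" + barCount + @"})[^\r\n]*";
        return Regex.Replace(text, pattern, "$1", RegexOptions.Multiline);
    }
}
```

Usage would be along the lines of `File.WriteAllText(outPath, RegexFieldTrimmer.KeepFields(File.ReadAllText(inPath), 42));`.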

Related

How can I split a text File and use the Integers?

I have a text file that displays students names and their scores. The format looks like this:
James Johnson, 85
Robert Jones, 90
Lindsey Parks, 98
etc.
I have 10 names and scores, all in the above format. My problem is: how do I split the text file by the delimiter and use the integers from it?
Here is my code so far:
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.FileIO;
namespace TextFiles1
{
class Program
{
static void Main(string[] args)
{
StreamReader sr = new StreamReader(@"C:\Users\jonda\Desktop\StudentScores.txt.txt");
string data = sr.ReadLine();
while (data != null)
{
Console.WriteLine(data);
string[] names = data.Split(',');
data = sr.ReadLine();
}
int total = 0;
double average = 0;
for (int index = 0; index < data.Length; index++)
{
total = total + data[index];
}
average = (double)total / data.Length;
Console.WriteLine("Average = " + average.ToString("N2"));
int high = data[0];
for (int index = 0; index < data.Length; index++)
{
if (data[index] > high)
{
high = data[index];
}
}
Console.WriteLine("Highest Score =" + high);
sr.Close();
Console.ReadLine();
}
}
}
First of all, it's a good idea to separate file operations from the rest of the processing. File operations are slow and costly and should be completed as soon as possible, so I would use a separate method that reads the lines into a List and closes the file first.
private static List<string> ReadFile(string path)
{
List<string> records = new List<string>();
using (StreamReader sr = new StreamReader(path))
{
while (!sr.EndOfStream)
records.Add(sr.ReadLine());
}
return records;
}
Then I would pass that list to another function and calculate average, max etc.
private static void CalculateAverage(List<string> lines)
{
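char[] separator = new char[] { ',' };
List<int> scores = new List<int>();
if (lines != null && lines.Count > 0)
{
foreach (string line in lines)
{
Console.WriteLine(line);
string[] parts = line.Split(separator);
int val;
// Skip lines without a comma or with a non-numeric score
if (parts.Length > 1 && int.TryParse(parts[1], out val))
scores.Add(val);
}
}
// Average() and Max() throw on an empty list, so check first
if (scores.Count == 0)
{
Console.WriteLine("No scores found.");
return;
}
Console.WriteLine("Average: {0}", scores.Average());
Console.WriteLine("Highest Score: {0}", scores.Max());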
}
Then in your main program call the methods like this:
List<string> lines = ReadFile(path);
CalculateAverage(lines);
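If you also want to know who has the highest score, it can help to keep each name together with its score. A sketch, assuming each line has the form `Name, score` with the score after the last comma; `ScoreParser.Parse` is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class ScoreParser
{
    // Parses lines such as "James Johnson, 85" into (Name, Score) pairs.
    // The score is taken from after the last comma; lines without a comma
    // or with a non-numeric score are skipped.
    public static List<(string Name, int Score)> Parse(IEnumerable<string> lines)
    {
        var results = new List<(string Name, int Score)>();
        foreach (string line in lines)
        {
            int comma = line.LastIndexOf(',');
            if (comma < 0)
            {
                continue;
            }
            string name = line.Substring(0, comma).Trim();
            // NumberStyles.Integer allows the space after the comma
            if (int.TryParse(line.Substring(comma + 1), NumberStyles.Integer,
                             CultureInfo.InvariantCulture, out int score))
            {
                results.Add((name, score));
            }
        }
        return results;
    }
}
```

With `using System.Linq;`, something like `var scores = ScoreParser.Parse(ReadFile(path));` followed by `scores.Average(s => s.Score)` and `scores.OrderByDescending(s => s.Score).First().Name` gives the average and the top student.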
Use a Regex to find each person's entry, then split each match to extract the name and score.
Try something like this:
var inputStr = "James Johnson, 85 Robert Jones, 90 Lindsey Parks, 98";
var regex = new Regex(@"[A-Za-z]+ [A-Za-z]+, [0-9]+");
return regex.Matches(inputStr)
.OfType<Match>()
.Select(p => p.Value.Split(','))
.Select(p => new { Name = p[0], Score = Convert.ToInt32(p[1].Trim()) });
I hope this helps. :)

Extract lines of text between delimiters and add them to a List<string>

I have a text file containing some numbers between delimiters that I want to extract and add to two different lists. The lists I want to populate are:
points = new List<string>();
coords = new List<string>();
This is what the input file looks like:
blah
blah
point [
0 50 50,
50 50 50,
50 0 50,
0 0 50,
]
blah
blah
coordIndex [
3,2,0,-1,
2,1,0,-1,
]
blah
blah
blah
blah
point [
0 50 0,
50 50 0,
50 0 0,
0 0 0,
]
blah
blah
coordIndex [
3,0,2,-1,
0,1,2,-1,
]
blah
blah
What I want to do is to get the numbers (including the commas) following the logic below:
Get the lines between the keywords "point [" and the next "]". Each line is a "string".
Add these lines to the list "points".
Get the lines between the keywords "coordIndex [" and the next "]". Each line is a "string".
Add these lines to the list "coords".
So far I have only managed to get rid of the blank spaces to "gain access" to the points field, but I do not know how to populate the lists.
Can someone help? I am happy to use regex or whatever other option.
The code
using System;
using System.IO;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Linq;
namespace parsing
{
class Program
{
static void Main(string[] args)
{
string inputFile = "Q:/inputFile.txt";
//Lists to be populated
var points = new List<string>();
var coords = new List<string>();
//Parsing the file
using (StreamReader readInputFile = new StreamReader(inputFile))
{
string line;
while ((line = readInputFile.ReadLine()) != null)
{
if (!string.IsNullOrWhiteSpace(line))
{
//Remove tabs
string line_noTabs = line.Replace("\t", "");
//Get lines between the keywords "point [" and the next "]". Each line is a "string".
//Add these lines to a the list "points"
//Get lines between the keywords "coordIndex [" and the next "]". Each line is a "string".
//Add these lines to a the list "coords"
}
}
}
}//end main
}//end program
}//end namespace
You just need to go through the file one line at a time, setting flags that tell you which section of the file you are in.
You then check the flags to decide which list each item should be added to.
Your code would look something like this:
using (StreamReader readInputFile = new StreamReader(inputFile))
{
string line;
bool isPoint = false;
bool isCoord = false;
Regex pointRegex = new Regex("point\\s+\\[");
Regex coordRegex = new Regex("coordIndex\\s+\\[");
Regex endBrace = new Regex("\\s*\\]\\s*");
while ((line = readInputFile.ReadLine()) != null)
{
if (!string.IsNullOrWhiteSpace(line))
{
//Remove tabs
string line_noTabs = line.Replace("\t", "");
if(pointRegex.IsMatch(line_noTabs))
{
isPoint = true;
continue;
}
else if(coordRegex.IsMatch(line_noTabs))
{
isCoord = true;
continue;
}
else if (endBrace.IsMatch(line_noTabs))
{
isPoint = false; //Reset
isCoord = false; //Reset
continue;
}
if(isPoint)
points.Add(line_noTabs);
else if(isCoord)
coords.Add(line_noTabs);
}
}
}
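The same flag logic can be pulled into a self-contained method that takes any sequence of lines (for example `File.ReadLines(inputFile)`), which also makes it easy to test. A sketch with assumptions: after trimming, a block header starts the line (`point [`, `coordIndex [`), and a block ends with a line that is exactly `]`. `BlockExtractor` is an illustrative name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BlockExtractor
{
    // One pass over the lines; two flags remember which bracketed block we are in.
    public static (List<string> Points, List<string> Coords) Extract(IEnumerable<string> lines)
    {
        var points = new List<string>();
        var coords = new List<string>();
        bool inPoint = false;
        bool inCoord = false;

        foreach (string raw in lines)
        {
            string line = raw.Trim(); // drops tabs and spaces at both ends
            if (line.Length == 0)
            {
                continue;
            }
            if (Regex.IsMatch(line, @"^point\s*\["))
            {
                inPoint = true;
                continue;
            }
            if (Regex.IsMatch(line, @"^coordIndex\s*\["))
            {
                inCoord = true;
                continue;
            }
            if (line == "]")
            {
                inPoint = false; // a closing bracket ends whichever block was open
                inCoord = false;
                continue;
            }
            if (inPoint)
            {
                points.Add(line);
            }
            else if (inCoord)
            {
                coords.Add(line);
            }
        }
        return (points, coords);
    }
}
```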
Here's one approach:
Use \[([^\]]+)] to get everything between the brackets (https://regex101.com/r/sG4zO1/1).
For each regex match, use match.Groups[1].Value.Split(new char[0], StringSplitOptions.RemoveEmptyEntries) to get an array of the values inside the brackets (an empty separator array splits on whitespace, so each token may still end with a comma).
When looping through the regex results (this assumes the point and coordIndex blocks always alternate, as in your sample):
if index % 2 == 0, you are looking at point values
if index % 2 == 1, you are looking at coordIndex values
If you want each line of each point/coordIndex block as a separate item in a List, a regex like (\d+\s\d+\s\d+)|(\d,\d,\d,-?\d,) gives you two groups of matches, one for points and one for coordIndexes. (This relies on the examples you gave; if the data can vary, the regex will have to be improved.)
If you later need the numbers in those point/coordIndex strings, you will need some C# to parse the strings.
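The alternation regex can be used roughly like this: whichever group succeeded tells you which list the match belongs to. Like the regex itself, this sketch relies on the sample data (three whitespace-separated integers per point line; single-digit indices ending in `-1,` per coordIndex line), and `AlternationExtractor` is an illustrative name. Group 1 captures the point values without their trailing comma.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AlternationExtractor
{
    // Group 1 matches a point line ("0 50 50"), group 2 a coordIndex line ("3,2,0,-1,").
    // Whichever group succeeded decides which list the match goes into.
    public static (List<string> Points, List<string> Coords) Extract(string text)
    {
        var points = new List<string>();
        var coords = new List<string>();
        foreach (Match m in Regex.Matches(text, @"(\d+\s\d+\s\d+)|(\d,\d,\d,-?\d,)"))
        {
            if (m.Groups[1].Success)
            {
                points.Add(m.Groups[1].Value);
            }
            else
            {
                coords.Add(m.Groups[2].Value);
            }
        }
        return (points, coords);
    }
}
```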
Try this. I changed your lists to hold integers. I used Regex, but the code can easily be changed to a method similar to yours.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"C:\temp\test.txt";
static void Main(string[] args)
{
string input = File.ReadAllText(FILENAME);
string pattern = @"(?'type'\w+)\s+\[(?'array'[^\]]+)\]";
Regex expr = new Regex(pattern, RegexOptions.Singleline);
MatchCollection matches = expr.Matches(input);
//Lists to be populated
List<List<int>> points = new List<List<int>>();
List<List<int>> coords = new List<List<int>>();
foreach (Match match in matches)
{
string type = match.Groups["type"].Value;
string strArray = match.Groups["array"].Value;
StringReader reader = new StringReader(strArray);
string line = "";
while ((line = reader.ReadLine()) != null)
{
line = line.Trim();
if (line.Length > 0)
{
List<int> intArray = line.Split(new char[] { ',', ' '}, StringSplitOptions.RemoveEmptyEntries).Select(x => int.Parse(x)).ToList();
switch (type)
{
case "point":
points.Add(intArray);
break;
case "coordIndex":
coords.Add(intArray);
break;
}
}
}
}
}
}
}

C# Writing Binary Data

I am trying to get some data to write to a binary file. The data consists of multiple values (strings, decimal, ints) that need to be a single string and then written to a binary file.
What I have so far creates the file, but it's putting my string in there as they appear and not converting them to binary, which I assume should look like 1010001010 etc. when I open the file in notepad?
The actual output is Jesse23023130123456789.54321 instead of the binary digits.
Where have I steered myself wrong on this?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
namespace BinaryData
{
class Program
{
static void Main(string[] args)
{
string name = "Jesse";
int courseNum = 230;
int num = 23130;
decimal d = 123456789.54321M;
string combined = name + courseNum + num + d;
FileStream writeStream;
writeStream = new FileStream("BinaryData.dat", FileMode.Create);
BinaryWriter bw = new BinaryWriter(writeStream);
bw.Write(combined);
}
}
}
There's more than one way to do this, but here's a basic approach. After you combine everything into a single string, iterate through the string and convert each character to its binary representation with Convert.ToString(c, 2) (the char is implicitly converted to an int). ASCII characters are at most 7 bits long, so you need PadLeft(8, '0') to get exactly 8 bits per character. To reverse the process, take 8 bits at a time and convert them back to the corresponding ASCII character. Without padding with leading 0s to a fixed 8 bits, you couldn't tell how many bits make up each character in the file.
using System;
using System.Text;
public class Program
{
public static void Main()
{
string name = "Jesse";
int courseNum = 230;
int num = 23130;
decimal d = 123456789.54321M;
string combined = name + courseNum + num + d;
// Translate ASCII to binary
StringBuilder sb = new StringBuilder();
foreach (char c in combined)
{
sb.Append(Convert.ToString(c, 2).PadLeft(8, '0'));
}
string binary = sb.ToString();
Console.WriteLine(binary);
// Translate binary to ASCII
StringBuilder decodedBinary = new StringBuilder();
for (int i = 0; i < binary.Length; i += 8)
{
decodedBinary.Append(Convert.ToChar(Convert.ToByte(binary.Substring(i, 8), 2)));
}
Console.WriteLine(decodedBinary);
}
}
Results:
01001010011001010111001101110011011001010011001000110011001100000011001000110011001100010011001100110000001100010011001000110011001101000011010100110110001101110011100000111001001011100011010100110100001100110011001000110001
Jesse23023130123456789.54321
Here you go:
The main method:
static void Main(string[] args)
{
string name = "Jesse";
int courseNum = 230;
int num = 23130;
decimal d = 123456789.54321M;
string combined = name + courseNum + num + d;
string bitString = GetBits(combined);
System.IO.File.WriteAllText(@"your_full_path_to_text_file", bitString);
Console.ReadLine();
}
The method returns a string of 0s and 1s built from the bytes of your input string:
public static string GetBits(string input)
{
StringBuilder sb = new StringBuilder();
foreach (byte b in Encoding.Unicode.GetBytes(input))
{
sb.Append(Convert.ToString(b, 2).PadLeft(8, '0')); // keep leading zeros so every byte is 8 bits
}
return sb.ToString();
}
File.WriteAllText creates the file if it doesn't exist and overwrites it if it does, so you only need to supply a full path in an existing directory.
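If the goal is a genuinely binary file rather than a text file of 0 and 1 characters, the usual approach with BinaryWriter is to write each value with its own typed Write overload instead of concatenating everything into a string first. Opening such a file in Notepad will show the name's characters followed by unreadable bytes, not 1s and 0s. A minimal sketch; `BinaryRecord` is an illustrative name, and it writes to a MemoryStream so the bytes can be inspected (File.WriteAllBytes("BinaryData.dat", bytes) would then save them).

```csharp
using System.IO;
using System.Text;

public static class BinaryRecord
{
    // Writes each value with its own typed BinaryWriter.Write overload
    // instead of concatenating everything into one string first.
    public static byte[] Write(string name, int courseNum, int num, decimal d)
    {
        using (var stream = new MemoryStream())
        {
            // leaveOpen: true keeps the stream usable after the writer is disposed
            using (var writer = new BinaryWriter(stream, Encoding.UTF8, true))
            {
                writer.Write(name);      // length prefix + UTF-8 bytes of the string
                writer.Write(courseNum); // 4 bytes, little-endian
                writer.Write(num);       // 4 bytes, little-endian
                writer.Write(d);         // 16 bytes
            }
            return stream.ToArray();
        }
    }

    // Reads the values back in the same order they were written.
    public static (string Name, int CourseNum, int Num, decimal D) Read(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8))
        {
            return (reader.ReadString(), reader.ReadInt32(), reader.ReadInt32(), reader.ReadDecimal());
        }
    }
}
```

Because the values are stored separately, they can be read back as typed values with BinaryReader, which is not possible once they have been glued into "Jesse23023130123456789.54321".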

Find all the words containing more than 10 letters in string from text file

Below is code I've written that successfully checks whether the text file I enter contains the character "A"; it prints YES or NO based on the result. Now I would like to list all the words longer than 10 characters. Note that I use ReadAllText to read the file, so the whole text file is in a single string. I'm looking for a way to think about it rather than oven-ready code. Thank you all!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
namespace TESTING
{
class Testing
{
static void Main(string[] args)
{
//ask user for the filename
string userInput = fetchFileName("Enter the textfile you want to view: ");
//test if the filename writes anything to console
string fileContents = File.ReadAllText(userInput);
string theFileContents = analyseFile(fileContents);
//Console.WriteLine(theFileContents);
Console.ReadLine();
}
private static string analyseFile(string fileContents)
{
string str = fileContents;
if (str.Contains("A"))
{
Console.WriteLine("YES");
}
else
{
Console.WriteLine("NO");
}
return str;
}
private static string fetchFileName(string askFileName)
{
Console.WriteLine(askFileName);
string userAnswer = Console.ReadLine();
return userAnswer;
}
}
}
Since your file is in a string, you could use string's Split method to convert it to tokens ("words"):
var tokens = fileContents.Split(' ', '\t', '\n', '\r');
With an array of tokens in hand, use whichever filtering technique you prefer to keep only the words longer than 10 characters. C# offers many choices: a for loop, a foreach loop, or the Where extension method provided by LINQ.
Just split your fileContents into words using String.Split(' '), then run a LINQ query on the resulting array that returns every word with Length > 10.
Something like this:
string fileContents = File.ReadAllText(userInput);
var result = fileContents.Split(' ').Where(x => x.Length > 10);
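One caveat with splitting on ' ' alone: words separated only by a newline or tab stay glued together (and may be reported as one long word), and double spaces produce empty entries. A sketch that splits on any whitespace instead; passing an empty separator array makes String.Split treat whitespace characters as delimiters. `LongWords.LongerThan` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LongWords
{
    // An empty separator array tells String.Split to split on any whitespace
    // (spaces, tabs, \r, \n); RemoveEmptyEntries drops the empty strings that
    // "\r\n" or repeated spaces would otherwise produce.
    public static List<string> LongerThan(string text, int length)
    {
        return text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
                   .Where(word => word.Length > length)
                   .ToList();
    }
}
```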
This should work but is not tested
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace test1
{
class Program
{
static void Main(string[] args)
{
//ask user for the filename
string userInput = fetchFileName("Enter the textfile you want to view: ");
//test if the filename writes anything to console
string fileContents = File.ReadAllText(userInput);
string theFileContents = analyseFile(fileContents);
// Console.WriteLine(theFileContents);
foreach (var item in tenOrMore(fileContents))
{
Console.WriteLine(item);
}
Console.ReadLine();
}
private static IEnumerable<string> tenOrMore(string fileContents)
{
foreach (var item in fileContents.Split(' ', '\t', '\n', '\r'))
{
if (item.Length > 10)
{
yield return item;
}
}
}
private static string analyseFile(string fileContents)
{
string str = fileContents;
if (str.Contains("A"))
{
Console.WriteLine("YES");
}
else
{
Console.WriteLine("NO");
}
return str;
}
private static string fetchFileName(string askFileName)
{
Console.WriteLine(askFileName);
string userAnswer = Console.ReadLine();
return userAnswer;
}
}
}
Use Split as below:
List<string> ar = new List<string>(); // words longer than 10 characters
string[] fileWords = fileContents.Split(' ', '\t', '\n', '\r');
foreach (string str in fileWords) {
if (str.Length > 10) {
ar.Add(str);
}
}
Another, more naive way is to start at the beginning of the string and increment a letter counter every time you encounter a character that isn't whitespace.
When you encounter whitespace, check whether the letter counter is > 10. If it is, the word that just ended was longer than 10 characters. Reset the letter counter to 0 and keep looking for words until the end of the file.
This approach has the advantage of avoiding the split and the extra string allocations, which may be a lot of work for large files with many words.
Pseudocode:
letterCount = 0
wordCount = 0
letter = GetNextLetter()
while (isNotEndOfFile())
{
if (notAWhitespace(letter))
{
letterCount++
}
else
{
if (letterCount > 10) { wordCount++ }
letterCount=0
}
letter = GetNextLetter()
}
if (letterCount > 10) { wordCount++ }
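A direct C# rendering of the pseudocode, extended to remember where each word starts so it can return the long words instead of only counting them. A sketch: `LongWordScanner.FindLongWords` is an illustrative name, char.IsWhiteSpace stands in for `notAWhitespace`, and Substring is only called for words that qualify.

```csharp
using System.Collections.Generic;

public static class LongWordScanner
{
    // Single pass: count consecutive non-whitespace characters and, when a run
    // ends, keep it if it was longer than `longerThan` characters.
    public static List<string> FindLongWords(string text, int longerThan)
    {
        var words = new List<string>();
        int start = 0;
        int letterCount = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (!char.IsWhiteSpace(text[i]))
            {
                if (letterCount == 0)
                {
                    start = i; // first character of a new word
                }
                letterCount++;
            }
            else
            {
                if (letterCount > longerThan)
                {
                    words.Add(text.Substring(start, letterCount));
                }
                letterCount = 0;
            }
        }
        // The text may end in the middle of a word, with no whitespace after it.
        if (letterCount > longerThan)
        {
            words.Add(text.Substring(start, letterCount));
        }
        return words;
    }
}
```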

Replacing text in a file with datatable values

We have a sample text file with the text:
The things God has prepared for those who love him
We read the text into a DataTable and assigned each word a value, like this:
The       1
things    2
God       3
has       4
prepared  5
for       6
those     7
who       8
love      9
him       10
We're trying to replace the text in the input file with these corresponding numbers.
Is it possible? If possible, how can we do it?
Edit2:
We edited our code like this:
void replace()
{
string s1, s2;
StreamReader streamReader;
streamReader = File.OpenText("C:\\text.txt");
StreamWriter streamWriter = File.CreateText("C:\\sample1.txt");
int x = st.Rows.Count;
int i1 = 0;
// Now, read the entire file into a string
while ((line = streamReader.ReadLine()) != null)
{
for (int i = 0; i < x; i++)
{
s1 = Convert.ToString(st.Rows[i]["Word"]);
s2 = Convert.ToString(st.Rows[i]["Binary"]);
s2+="000";
char[] delimiterChars = { ' ', '\t' };
string[] words = line.Split(delimiterChars);
// Write the modification into the same file
string ab = words[i1]; // exception occurs here
// Console.WriteLine(ab);
streamWriter.Write(ab.Replace(s1, s2));
i1++;
}
}
streamReader.Close();
streamWriter.Close();
}
But we're getting an "Array index out of bounds" exception, and we can't find the problem.
Thanks in advance.
Here is a bit of code to help you get going; it hasn't been extensively tested:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
File.WriteAllText("sample1.txt", "The things God has prepared for those who love him the");
string text = File.ReadAllText("sample1.txt").ToLower();
var words = text
.Split(new [] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
.Distinct()
.OrderByDescending(word => word.Length);
var values = new Dictionary<string, int>();
for (int i = 0; i < words.Count(); i++)
{
values.Add(words.ElementAt(i), i + 1);
}
foreach (var kvp in values)
{
text = text.Replace(kvp.Key, kvp.Value.ToString());
}
File.WriteAllText("sample1.txt", text);
Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}
}
}
It creates a test text file, reads it, converts it to lowercase, assigns an identifier to each distinct word, and replaces each word with its identifier. Longer words are replaced before shorter ones, which offers some protection against bad replacements (a short word matching part of a longer one).
UPDATE: I just noticed the question was updated, and reading the entire file into one string is no longer an option, so my answer only applies when you read and write all the text in one go. You may be able to reuse parts of it when reading and writing word by word.
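For the line-by-line version in the updated question, one option is to replace whole words through a dictionary lookup instead of String.Replace, which also avoids replacing a word inside a longer word and needs no index into a words array. A sketch: `WordReplacer` is an illustrative name, and the dictionary is assumed to be built from the DataTable's `Word` and `Binary` columns (for example, `map[Convert.ToString(row["Word"])] = Convert.ToString(row["Binary"])` for each row in `st.Rows`).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordReplacer
{
    // Replaces whole words only: each \w+ run is looked up in `map`, and words
    // without an entry are left as they are. Spaces and punctuation are kept.
    public static string ReplaceWords(string line, IReadOnlyDictionary<string, string> map)
    {
        return Regex.Replace(line, @"\w+",
            m => map.TryGetValue(m.Value, out string value) ? value : m.Value);
    }
}
```

Each line from streamReader.ReadLine() could then be written with streamWriter.WriteLine(WordReplacer.ReplaceWords(line, map)). The lookup is case-sensitive; creating the dictionary with StringComparer.OrdinalIgnoreCase makes "The" and "the" match the same entry.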
