Get specific words from a string in C#

I am working on a final-year project. I have a file that contains some text, and I need to get the words from this file that carry the "//jj" tag, e.g. abc//jj, bcd//jj, etc.
Suppose the file contains the following text:
ffafa adada//bb adad ssss//jj aad adad adadad aaada dsdsd//jj
dsdsd sfsfhf//vv
dfdfdf
I need all the words that are associated with the //jj tag. I have been stuck on this for the past few days.
The code I am trying:
// Create OpenFileDialog
Microsoft.Win32.OpenFileDialog dlg = new Microsoft.Win32.OpenFileDialog();
// Set filter for file extension and default file extension
dlg.DefaultExt = ".txt";
dlg.Filter = "Text documents (.txt)|*.txt";
// Display OpenFileDialog by calling ShowDialog method
Nullable<bool> result = dlg.ShowDialog();
// Get the selected file name and display in a TextBox
string filename = string.Empty;
if (result == true)
{
// Open document
filename = dlg.FileName;
FileNameTextBox.Text = filename;
}
string text;
using (var streamReader = new StreamReader(filename, Encoding.UTF8))
{
text = streamReader.ReadToEnd();
}
string FilteredText = string.Empty;
string pattern = @"(?<before>\w+) //jj (?<after>\w+)";
MatchCollection matches = Regex.Matches(text, pattern);
for (int i = 0; i < matches.Count; i++)
{
FilteredText="before:" + matches[i].Groups["before"].ToString();
//Console.WriteLine("after:" + matches[i].Groups["after"].ToString());
}
textbx.Text = FilteredText;
I can't get the result I expect. Please help.

With LINQ you could do this with one line:
string[] taggedwords = input.Split(' ').Where(x => x.EndsWith(@"//jj")).ToArray();
And all your //jj words will be there...

Personally I think Regex is overkill if that's definitely how the string will look. You haven't specified that you definitely need to use Regex so why not try this instead?
// A list that will hold the words ending with '//jj'
List<string> results = new List<string>();
// The text you provided
string input = @"ffafa adada//bb adad ssss//jj aad adad adadad aaada dsdsd//jj dsdsd sfsfhf//vv dfdfdf";
// Split the string on the space character to get each word
string[] words = input.Split(' ');
// Loop through each word
foreach (string word in words)
{
// Does it end with '//jj'?
if(word.EndsWith(@"//jj"))
{
// Yes, add to the list
results.Add(word);
}
}
// Show the results
foreach(string result in results)
{
MessageBox.Show(result);
}
Results are:
ssss//jj
dsdsd//jj
Obviously this is not quite as robust as a regex, but you didn't provide any more detail for me to go on.

You have an extra space in your regex: it assumes there's a space before "//jj". What you want is:
string pattern = @"(?<before>\w+)//jj (?<after>\w+)";

This regular expression will yield the words you are looking for:
string pattern = "(\\S*)\\/\\/jj";
A bit nicer without backslash escaping:
(\S*)\/\/jj
Matches will include the //jj but you can get the word from the first bracketed group.
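As a minimal sketch of reading the word from the first group, assuming the tag always ends a whitespace-delimited word: the class name TaggedWords and the method Extract are hypothetical, and the pattern uses \S+ rather than \S* so that a bare tag with no word in front of it is skipped.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TaggedWords
{
    // Returns every word that is immediately followed by the given tag,
    // without the tag itself.
    public static List<string> Extract(string text, string tag)
    {
        var words = new List<string>();
        // Regex.Escape keeps the tag literal; (?!\S) requires the tag to end the word
        string pattern = @"(\S+)" + Regex.Escape(tag) + @"(?!\S)";
        foreach (Match match in Regex.Matches(text, pattern))
        {
            // Group 1 is the word; group 0 would include the tag as well
            words.Add(match.Groups[1].Value);
        }
        return words;
    }
}
```

Calling TaggedWords.Extract(text, "//jj") on the sample text from the question should give ssss and dsdsd.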

Related

Restrict user from inputting certain languages.

I am creating a website that includes a comment area for users, for example a guestbook or a product review. I want to stop users from posting inappropriate language, such as vulgarities, in the comment area.
If the user enters a vulgarity, its characters should be replaced by *, for example "stupid" becomes "s*****".
I have researched this on related websites, but without success. Suggestions or tutorials would be greatly appreciated.
There is no way to fully stop "bad language" from being used but you can try to prevent it by creating a text file that contains a bad word in every line. Then load the list of words from the file to a List<String> in your program. You can do so by doing the following:
// The list of swear words
List<string> swearWords = new List<string>();
private void GetSwearWords()
{
// Get the path to the file that has the swear words list
string path = <File Path>;
// Open the text file; the using block closes it once the loop is done
using (TextReader reader = new StreamReader(path))
{
// Loop through each line in the file.
string line;
while ((line = reader.ReadLine()) != null)
{
// Lower cases word and removes whitespaces
string word = line.Trim().ToLower();
// Adds the word to the list
swearWords.Add(word);
}
}
}
Then, to determine if a string has one of these bad words you do the following:
private bool HasSwearWord(string text)
{
// Splits words, removes whitespace and any punctuation
string[] wordArray = Regex.Split(text, @"\W+");
// Check if any word in the string is a swear word
foreach (string word in wordArray)
{
if (swearWords.Contains(word.ToLower()))
{
return true;
}
}
return false;
}
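The code above only detects a bad word, while the question asked for the characters to be replaced by *. A minimal sketch of that masking step, assuming the first letter is kept as in the "s*****" example; the class name CommentFilter and the method Mask are hypothetical, and the word list is passed in by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CommentFilter
{
    // Replaces every listed word with its first letter followed by asterisks,
    // e.g. "stupid" becomes "s*****".
    public static string Mask(string text, IEnumerable<string> blockedWords)
    {
        // Case-insensitive lookup so "Stupid" and "STUPID" are caught too
        var blocked = new HashSet<string>(blockedWords, StringComparer.OrdinalIgnoreCase);
        // \w+ visits each word in place, so spacing and punctuation are preserved
        return Regex.Replace(text, @"\w+", match =>
            blocked.Contains(match.Value)
                ? match.Value[0] + new string('*', match.Value.Length - 1)
                : match.Value);
    }
}
```

Like the detection code, this only catches exact whole-word matches, so deliberate misspellings will still get through.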

In C# how can I prepare a string to be valid for windows directory name

I am writing a C# program that reads certain tags from files and creates a directory structure based on the tag values.
Those tags could contain anything.
If a tag value is not suitable as a directory name, I have to make it suitable by replacing the offending characters so that directory creation does not fail.
I was using the following code, but I realised it is not enough:
path = path.Replace("/","-");
path = path.Replace("\\","-");
Please advise on the best way to do this.
Thanks,
Import the System.IO namespace. For a path, use
Path.GetInvalidPathChars
and for a file name, use
Path.GetInvalidFileNameChars
For example:
string filename = "salmnas dlajhdla kjha;dmas'lkasn";
foreach (char c in Path.GetInvalidFileNameChars())
filename = filename.Replace(System.Char.ToString(c), "");
foreach (char c in Path.GetInvalidPathChars())
filename = filename.Replace(System.Char.ToString(c), "");
Then you can use Path.Combine to join the tags into a path:
string mypath = Path.Combine(@"C:\", "First_Tag", "Second_Tag");
// returns C:\First_Tag\Second_Tag
You can use the full list of invalid characters here to handle the replacement as desired. These are available directly via the Path.GetInvalidFileNameChars and Path.GetInvalidPathChars methods.
The characters you must not use are: ? < > | : \ / * "
string PathFix(string path)
{
List<string> _forbiddenChars = new List<string>();
_forbiddenChars.Add("?");
_forbiddenChars.Add("<");
_forbiddenChars.Add(">");
_forbiddenChars.Add(":");
_forbiddenChars.Add("|");
_forbiddenChars.Add("\\");
_forbiddenChars.Add("/");
_forbiddenChars.Add("*");
_forbiddenChars.Add("\"");
for (int i = 0; i < _forbiddenChars.Count; i++)
{
path = path.Replace(_forbiddenChars[i], "");
}
return path;
}
Tip: You can't include a double quote ("), but you can include two single quotes ('').
In this case:
string PathFix(string path)
{
List<string> _forbiddenChars = new List<string>();
_forbiddenChars.Add("?");
_forbiddenChars.Add("<");
_forbiddenChars.Add(">");
_forbiddenChars.Add(":");
_forbiddenChars.Add("|");
_forbiddenChars.Add("\\");
_forbiddenChars.Add("/");
_forbiddenChars.Add("*");
//_forbiddenChars.Add("\""); Do not delete the double-quote character, so we could replace it with 2 quotes (before the return).
for (int i = 0; i < _forbiddenChars.Count; i++)
{
path = path.Replace(_forbiddenChars[i], "");
}
path = path.Replace("\"", "''"); //Replacement here
return path;
}
You'll of course use only one of those (or combine them to one function with a bool parameter for replacing the quote, if needed)
The correct answer of Nikhil Agrawal has some syntax errors.
Just for the reference, here is a compiling version:
public static string MakeValidFolderNameSimple(string folderName)
{
if (string.IsNullOrEmpty(folderName)) return folderName;
foreach (var c in System.IO.Path.GetInvalidFileNameChars())
folderName = folderName.Replace(c.ToString(), string.Empty);
foreach (var c in System.IO.Path.GetInvalidPathChars())
folderName = folderName.Replace(c.ToString(), string.Empty);
return folderName;
}
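The versions above delete invalid characters, whereas the question replaced them with "-". A minimal sketch of that variant, assuming a single replacement character is enough; the class name FolderNames and the method Sanitize are hypothetical. Note that the set returned by Path.GetInvalidFileNameChars depends on the operating system, and that this does not deal with reserved names or trailing dots and spaces.

```csharp
using System;
using System.IO;

public static class FolderNames
{
    // Replaces each character reported by Path.GetInvalidFileNameChars()
    // with the given replacement instead of removing it.
    public static string Sanitize(string name, char replacement = '-')
    {
        if (string.IsNullOrEmpty(name)) return name;
        char[] invalid = Path.GetInvalidFileNameChars();
        char[] chars = name.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            // Swap the character only if the current platform rejects it
            if (Array.IndexOf(invalid, chars[i]) >= 0)
                chars[i] = replacement;
        }
        return new string(chars);
    }
}
```

Each tag can be passed through Sanitize before the results are joined with Path.Combine.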

How to use the Regex class for string manipulations

I am working on string manipulations using regex.
Source: string value = @"/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";
output required:
Foldername: folder1
content name: content
folderpath:/webdav/MyPublication/Building%20Blocks/folder0/folder1/
I am new to this; can anyone explain how this can be done using regex?
Thank you.
The rules you need seem to be the following:
Folder name = last string preceding a '/' character but not containing a '/' character
content name = last string not containing a '/' character until (but not including) a '_' or '.' character
folderpath = same as folder name except it can contain a '/' character
Assuming the rules above - you probably want this code:
string value = @"/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";
var foldernameMatch = Regex.Match(value, @"([^/]+)/[^/]+$");
var contentnameMatch = Regex.Match(value, @"([^/_\.]+)[_\.][^/]*$");
var folderpathMatch = Regex.Match(value, @"(.*/)[^/]*$");
if (foldernameMatch.Success && contentnameMatch.Success && folderpathMatch.Success)
{
var foldername = foldernameMatch.Groups[1].Value;
var contentname = contentnameMatch.Groups[1].Value;
var folderpath = folderpathMatch.Groups[1].Value;
}
else
{
// handle bad input
}
Note that you can also combine these to become one large regex, although it can be more cumbersome to follow (if it weren't already):
var matches = Regex.Match(value, @"(.*/)([^/]+)/([^/_\.]+)[_\.][^/]*$");
if (matches.Success)
{
var foldername = matches.Groups[2].Value;
var contentname = matches.Groups[3].Value;
var folderpath = matches.Groups[1].Value + foldername + "/";
}
else
{
// handle bad input
}
You could use named captures, but you're probably better off (from a security and implementation aspect) just using the Uri class.
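A minimal sketch of the named-captures option, assuming the input always ends in folder/name_suffix.ext as in the question; the class name WebDavPath, the method TryParse, and the group names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WebDavPath
{
    // folderpath spans everything up to and including the last '/',
    // foldername is the last directory inside it, and contentname is the
    // file name up to the first '_' or '.'.
    private static readonly Regex Pattern = new Regex(
        @"^(?<folderpath>.*/(?<foldername>[^/]+)/)(?<contentname>[^/_.]+)[_.][^/]*$");

    public static bool TryParse(string value, out string folderPath,
                                out string folderName, out string contentName)
    {
        Match m = Pattern.Match(value);
        // On a failed match the groups are empty strings
        folderPath = m.Groups["folderpath"].Value;
        folderName = m.Groups["foldername"].Value;
        contentName = m.Groups["contentname"].Value;
        return m.Success;
    }
}
```

Named groups avoid the index bookkeeping of Groups[1], Groups[2], and so on when the pattern changes.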
I agree with Jeff Moser on this one, but to answer the original question, I believe the following regular expression would work:
^(\/.+\/)(.+?)\/(.+?)\.
edit: Added example.
var value = "/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";
var regex = Regex.Match(value, @"^(\/.+\/)(.+?)\/(.+?)\.");
// check if success
if (regex.Success)
{
// assign the values from the regular expression
var folderName = regex.Groups[2].Value;
var contentName = regex.Groups[3].Value;
var folderPath = regex.Groups[1].Value;
}

Searching strings in txt file

I have a .txt file with a list of 174 different strings. Each string has a unique identifier.
For example:
123|this data is variable|
456|this data is variable|
789|so is this|
etc..
I wish to write a program in C# that will read the .txt file and display only one of the 174 strings when I specify the ID of the string I want. All the other data in the file is variable, so only the ID can be used to pull out the string. So instead of ending up with the example above, I get just one line.
eg just
123|this data is variable|
I seem to be able to write a program that pulls just the ID from the .txt file and not the entire string, or a program that merely reads the whole file and displays it. But I have yet to write one that does exactly what I need. HELP!
Well, the actual string I get out of the txt file has no '|'; they were just in the example. An example of the real string would be: 0111111(0010101), where the data in the brackets is variable. The brackets don't exist in the real string either.
namespace String_reader
{
class Program
{
static void Main(string[] args)
{
String filepath = @"C:\my file name here";
string line;
if(File.Exists(filepath))
{
StreamReader file = null;
try
{
file = new StreamReader(filepath);
while ((line = file.ReadLine()) !=null)
{
string regMatch = "ID number here"; //this is where it all falls apart.
Regex.IsMatch (line, regMatch);
Console.WriteLine (line);// When program is run it just displays the whole .txt file
}
}
finally{
if (file !=null)
file.Close();
}
}
Console.ReadLine();
}
}
}
Use a Regex. Something along the lines of Regex.Match("|"+inputString+"|",@"\|[ ]*\d+\|(.+?)\|").Groups[1].Value
Oh, I almost forgot: you'll need to substitute the actual ID you want for the \d+. As written, that'll just get you the first one.
The "|" before and after the input string makes sure both the index and the value are enclosed in a | for all elements, including the first and last. There are ways of doing a Regex without it, but IMHO they just make your regex more complicated and less readable.
Assuming you have path and id.
Console.WriteLine(File.ReadAllLines(path).Where(l => l.StartsWith(id + "|")).FirstOrDefault());
Use File.ReadAllLines to get a string array of lines, then split each line on the | character.
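You could use the Split method:
FileInfo info = new FileInfo("filename.txt");
// Split on line breaks (not spaces) so that each element is one "id|text|" record
String[] lines = info.OpenText().ReadToEnd().Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
foreach(String line in lines)
{
string[] fields = line.Split('|');
int id = Convert.ToInt32(fields[0]);
string text = fields[1]; // already a string, so no conversion is needed
}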
Read the data into a string
Split the string on "|"
Read the items 2 by 2: key:value,key:value,...
Add them to a dictionary
Now you can easily find your string with dictionary[key].
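The steps above can be sketched as follows, assuming the "id|text|" layout from the question's example and string keys; the class name RecordIndex and the method Build are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RecordIndex
{
    // Builds an ID -> text lookup from content in the "id|text|" format.
    public static Dictionary<string, string> Build(string content)
    {
        var index = new Dictionary<string, string>();
        string[] items = content.Split('|');
        // Items come in pairs: key, value, key, value, ...
        for (int i = 0; i + 1 < items.Length; i += 2)
        {
            // Trim removes the line break that precedes every ID after the first
            index[items[i].Trim()] = items[i + 1];
        }
        return index;
    }
}
```

The content can be loaded with File.ReadAllText(path); a lookup such as index["123"] throws KeyNotFoundException for an unknown ID, so TryGetValue may be the safer call.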
First load the whole file into a string.
Then try this:
string s = "123|this data is variable| 456|this data is also variable| 789|so is this|";
int index = s.IndexOf("123", 0);
string temp = s.Substring(index,s.Length-index);
string[] splitStr = temp.Split('|');
Console.WriteLine(splitStr[1]);
Hope this is what you are looking for.
private static IEnumerable<string> ReadLines(string fspec)
{
using (var reader = new StreamReader(new FileStream(fspec, FileMode.Open, FileAccess.Read, FileShare.Read)))
{
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
}
var dict = ReadLines("input.txt")
.Select(s =>
{
var split = s.Split("|".ToArray(), 2);
return new {Id = Int32.Parse(split[0]), Text = split[1]};
})
.ToDictionary(kv => kv.Id, kv => kv.Text);
Please note that with .NET 4.0 you don't need this ReadLines function, because File.ReadLines is built in.
You can now work with that as any dictionary:
Console.WriteLine(dict[12]);
Console.WriteLine(dict[999]);
No error handling here; please add your own.
You can use the Split method to divide the entire text into parts separated by '|'. The even-indexed elements will then correspond to the numbers and the odd-indexed elements to the strings.
StreamReader sr = new StreamReader(filename);
string text = sr.ReadToEnd();
string[] data = text.Split('|');
Then convert the data elements to numbers and strings, i.e. a List<int> IDs and a List<string> Strs. Find the index of the given ID with idx = IDs.IndexOf(ID); the corresponding string will be Strs[idx].
List<int> IDs = new List<int>();
List<string> Strs = new List<string>();
for (int i = 0; i < data.Length - 1; i += 2)
{
IDs.Add(int.Parse(data[i]));
Strs.Add(data[i + 1]);
}
int idx = IDs.IndexOf(ID); // ID comes from input; idx is -1 if it is not found
string answer = Strs[idx];

How can I efficiently process a delimited text file?

I'm simply trying to execute File.ReadAllLines against a specific file and, for every line, split on |. I have to use regex on this one.
The code below doesn't work, but you'll see what I'm trying to do:
string[] contents = File.ReadAllLines(filename);
string[] splitlines = Regex.Split(contents, '|');
foreach (string split in splitlines)
{
//Regex line = content.Split('|');
//content.Split('|');
string prefix = prefix = Regex.Match(line, @"(\S+)(\d+)").Groups[0].Value;
File.AppendAllText(workingdirform2 + "configuration.txt", prefix+"\r\n");
}
It's not entirely clear to me what you are trying to do, but there are a number of errors in your code. I have tried to guess what you are doing, but if this isn't what you want, please explain what you do want preferably with some examples:
string inputFilename = "input.txt";
string outputFilename = "output.txt";
using (StreamWriter streamWriter = File.AppendText(outputFilename))
{
using (StreamReader streamReader = File.OpenText(inputFilename))
{
while (true)
{
string line = streamReader.ReadLine();
if (line == null)
{
break;
}
string[] splitlines = line.Split('|');
foreach (string split in splitlines)
{
Match match = Regex.Match(split, @"\S+\d+");
if (match.Success)
{
string prefix = match.Groups[0].Value;
streamWriter.WriteLine(prefix);
}
else
{
// Handle match failed...
}
}
}
}
}
Key points:
You seem to want to perform an operation on each line, so you need to iterate over the lines.
Use the simple string.Split method if you want to split on a single character. Regex.Split doesn't accept a character and "|" has a special meaning in regular expressions so it wouldn't have worked anyway unless you escaped it.
You were opening and closing the output file multiple times. You should open it just once and keep it open until you have finished writing to it. The using keyword is useful here.
Use WriteLine instead of appending "\r\n".
If the input file is large, use a StreamReader instead of ReadAllLines.
If the match fails, your program will throw an exception. You probably should check match.Success before using the match and if this returns false, handle the error appropriately (skip the line, report a warning, throw an exception with an appropriate message, etc.)
You aren't actually using groups 1 and 2 in the regular expression, so you can remove the parentheses to save the regular expression engine from having to store results that you won't use anyway.
You should pass the original string to Regex.Split and not an array.
It looks like you are using line instead of split when setting the prefix. Without knowing more about your code I can't tell whether it's right, but in any case it stands out as an error (it shouldn't build either).
This is really inefficient on at least two levels :)
Regex.Split takes a string, not an array of strings.
I would recommend calling Regex.Split on each item of contents individually, then looping over the results of that call. This would mean nested for loops.
string[] contents = File.ReadAllLines(filename);
foreach (string line in contents)
{
string[] splitlines = Regex.Split(line, @"\|");
foreach (string splitline in splitlines)
{
string prefix = Regex.Match(splitline, @"(\S+)(\d+)").Groups[0].Value;
File.AppendAllText(workingdirform2 + "configuration.txt", prefix+"\r\n");
}
}
This, of course isn't the most efficient way to go about it.
A more efficient way might be to split on a regular expression instead. I think this works:
string[] splitlines = Regex.Split(File.ReadAllText(filename), @"\r?\n|\|");
I have to assume, based on the limited feedback, that this is what you're looking for:
string inputFile = filename;
string outputFile = Path.Combine( workingdirform2, "configuration.txt" );
using ( StreamReader inputFileStream = File.OpenText( inputFile ) )
{
using ( StreamWriter outputFileStream = File.AppendText( outputFile ) )
{
// Iterate over the file contents to extract the prefix
string currentLine;
while ( ( currentLine = inputFileStream.ReadLine() ) != null )
{
// Notice the updated Regex - yours is a bit broken
string prefix = Regex.Match( currentLine, @"^(\S+?)\d+" ).Groups[1].Value;
outputFileStream.WriteLine( prefix );
}
}
}
This would take a file full of:
Text1231|abc|abc
Text1232|abc|abc
Text1233|abc|abc
Text1234|abc|abc
and place:
Text
Text
Text
Text
into a new file.
I hope this, at least, gets you on the right path. My crystal ball is getting hazy.. haaazzzy..
Probably one of the best ways to process text files in C# is to use FileHelpers. Give it a look. It allows you to strongly type your imported data.
