Transform Search String into FullText Compatible Search String? - c#

I'm working with the fulltext search engine of MSSQL 2008 which expects a search string like this:
("keyword1" AND "keyword2*" OR "keyword3")
My users are entering things like this:
engine 2009
"san francisco" hotel december xyz
stuff* "in miami" 1234
something or "something else"
I'm trying to transform these into fulltext engine compatible strings like these:
("engine" AND "2009")
("san francisco" AND "hotel" AND "december" AND "xyz")
("stuff*" AND "in miami" AND "1234")
("something" OR "something else")
I have a really difficult time with this, tried doing it using counting quotation marks, spaces and inserting etc. but my code looks like horrible for-and-if vomit.
Can someone help?

Here you go:
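using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
class Program {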
static void Main(string[] args) {
// setup some test expressions
List<string> searchExpressions = new List<string>(new string[] {
"engine 2009",
"\"san francisco\" hotel december xyz",
"stuff* \"in miami\" 1234 ",
"something or \"something else\""
});
// display and parse each expression
foreach (string searchExpression in searchExpressions) {
Console.WriteLine(string.Concat(
"User Input: ", searchExpression,
"\r\n\tSql Expression: ", ParseSearchExpression(searchExpression),
"\r\n"));
}
Console.ReadLine();
}
private static string ParseSearchExpression(string searchExpression) {
// replace all whitespace characters that exist within quotes with character 0
string temp = Regex.Replace(searchExpression, @"""[^""]+""", (MatchEvaluator)delegate(Match m) {
return Regex.Replace(m.Value, @"[\s]", "\x00");
});
// split string on any whitespace character (thus: quoted items will not be split)
string[] tokens = Regex.Split(temp, @"[""\s]+", RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
// generate result
StringBuilder result = new StringBuilder();
string tokenLast = string.Empty;
foreach (string token in tokens) {
if (token.Length > 0) {
// AND/OR are operators: remember them in tokenLast, but never emit them as search terms
bool isOperator = token.Equals("AND", StringComparison.OrdinalIgnoreCase) || token.Equals("OR", StringComparison.OrdinalIgnoreCase);
if (!isOperator) {
if (result.Length > 0) {
result.Append(tokenLast.Equals("OR", StringComparison.OrdinalIgnoreCase) ? " OR " : " AND ");
}
result.Append("\"").Append(token.Replace("\"", "\"\"").Replace("\x00", " ")).Append("\"");
}
tokenLast = token;
}
}
if (result.Length > 0) {
result.Insert(0, "(").Append(")");
}
return result.ToString();
}
}
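As an alternative to the character-0 trick above, the tokens can be pulled out with a single pattern that matches either a quoted phrase or a bare word. The following is only a sketch under that assumption; the class name FullTextQuery and the method Build are hypothetical and not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer).
public static class FullTextQuery
{
    // Group 1: text inside a pair of quotes. Group 2: a run of non-space, non-quote characters.
    private static readonly Regex TokenPattern = new Regex("\"([^\"]*)\"|([^\\s\"]+)");

    public static string Build(string input)
    {
        var parts = new List<string>();
        string pendingOperator = null;

        foreach (Match m in TokenPattern.Matches(input ?? string.Empty))
        {
            bool quoted = m.Groups[1].Success;
            string term = quoted ? m.Groups[1].Value.Trim() : m.Groups[2].Value;
            if (term.Length == 0) continue;

            // Unquoted AND/OR are treated as operators, not as search terms.
            if (!quoted && (term.Equals("AND", StringComparison.OrdinalIgnoreCase)
                         || term.Equals("OR", StringComparison.OrdinalIgnoreCase)))
            {
                pendingOperator = term.ToUpperInvariant();
                continue;
            }

            // Default to AND when the user typed no operator between two terms.
            if (parts.Count > 0) parts.Add(pendingOperator ?? "AND");
            parts.Add("\"" + term + "\"");
            pendingOperator = null;
        }

        return parts.Count == 0 ? string.Empty : "(" + string.Join(" ", parts) + ")";
    }
}
```

Because a term can never contain a quote character here, no quote escaping is needed. The result should still be passed to the query as a parameter rather than concatenated into SQL.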

Related

How to replace the given email address with a value in a given input?

I have written the method shown below to replace some of the email domains like @gmail.com and @yahoo.com with a given text:
public static string RemovePersonalInfo(string input)
{
string[] tokens = input.Split(new char[] { ' ', '\t', '\r', '\n' });
string output = string.Empty;
foreach (string token in tokens)
{
if (token.Contains("@gmail.com"))
{
output += " SOMETEXT";
}
else
{
output += " " + token;
}
}
tokens = output.Split(new char[] { ' ', '\t', '\r', '\n' });
output = string.Empty;
foreach (string token in tokens)
{
if (token.Contains("@yahoo.com"))
{
output += " SOMETEXT";
}
else
{
output += " " + token;
}
}
return output;
}
It is working as expected for the below input.
But I don't think it is a good solution. First, it is not scalable: if another email domain comes along tomorrow, I will have to modify the code again and write another if condition. Second, I am running the loop twice when it could be done in one loop, so performance can be improved.
Or if there is any better approach than this, please suggest.
Input:
test@gmail.com test@abc.com @teest@yahoo.com
Output:
SOMETEXT test@abc.com SOMETEXT
Note: I am not supposed to use the Replace method. So the only intention here is to use the same logic in basic programming languages like C and C++ as well.
To expand on my comment, I came to realize that there isn't really much point using a Dictionary because you don't need any of the functionality it provides. All you really need is a list of find replace pairs:
public static string RemovePersonalInfo(string input)
{
//this is just hardcoded for purposes of the question. Consider putting it in config file or DB etc.
//It's simply a list of Find/Replace pairs
(string F, string R)[] frs = {
( "@yahoo.com", "SOMETEXT" ),
( "@gmail.com", "SOMEOTHERTEXTMAYBE" )
};
string[] tokens = input.Split(' ', '\t', '\r', '\n');
var outputSb = new StringBuilder();
foreach (string token in tokens) {
var fr = frs.FirstOrDefault(t => token.Contains(t.F));
outputSb.Append(" ").Append(fr == default ? token : fr.R);
}
return outputSb.ToString();
}
The actual pairs can come from config, DB, code etc..
If SOMETEXT will always be the same, you can just use a simple enumerable of string:
public static string RemovePersonalInfo(string input, string sometext)
{
//this is just hardcoded for purposes of the question. Consider putting it in config file or DB etc
var ws = new[]{"@yahoo.com","@gmail.com"};
string[] tokens = input.Split(' ', '\t', '\r', '\n');
var outputSb = new StringBuilder();
foreach (string token in tokens)
outputSb.Append(" ").Append(ws.Any(w => token.Contains(w)) ? sometext : token);
return outputSb.ToString();
}
If the output is not supposed to have a leading space, use outputSb.ToString(1, outputSb.Length - 1)
Thank you @Caius, your answer really helped me.
I have also tried the below ways to solve the issue. I have hardcoded the values in my example, but these can be configured.
public static string RemovePersonalInfo(string input)
{
if (input == null) { throw new ArgumentNullException(nameof(input)); }
if (string.IsNullOrWhiteSpace(input)) { return input; }
return RemovePersonalInfo(input.Split(new char[] { ' ', '\t', '\r', '\n' }), new string[] { "@gmail.com", "@yahoo.com" });
}
private static string RemovePersonalInfo(IEnumerable<string> tokens, IEnumerable<string> domains, string replacement = "SOMETEXT")
{
return string.Join(" ", tokens.Select(token => (domains.Any(domain => token.Contains(domain)) ? replacement : token)));
}
public static string RemovePersonalInfoUsingRegex(string input)
{
//Regex and SOMETEXT can come from the configuration
return Regex.Replace(input, @"(?<=\s+|^)(\S+(@gmail\.com|@yahoo\.com))(?=\s+|$)", "SOMETEXT", RegexOptions.IgnoreCase);
}
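The approaches above use Contains, which also matches a token that merely has the domain somewhere in the middle. A minimal single-loop sketch that compares the end of each token instead is shown below; the names EmailMasker and Mask are hypothetical, and the domain list is assumed to be supplied by the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (names are illustrative only).
public static class EmailMasker
{
    public static string Mask(string input, IEnumerable<string> domains, string replacement)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var separators = new[] { ' ', '\t', '\r', '\n' };
        // RemoveEmptyEntries drops the empty tokens produced by consecutive separators.
        string[] tokens = input.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        var output = new List<string>(tokens.Length);

        foreach (string token in tokens)
        {
            bool isMatch = false;
            foreach (string domain in domains)
            {
                // Compare only the end of the token, ignoring case.
                if (token.EndsWith(domain, StringComparison.OrdinalIgnoreCase))
                {
                    isMatch = true;
                    break;
                }
            }
            output.Add(isMatch ? replacement : token);
        }

        // Joining with a single space avoids the leading-space problem.
        return string.Join(" ", output);
    }
}
```

Note that this collapses runs of whitespace into single spaces, which may or may not be acceptable for the real input. The nested loops use only an end-of-string comparison, so the same logic carries over to C or C++ without a Replace function.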

How to read .txt and count word/length, etc

I wrote an exam last week and had a really hard task that I could not solve.
I was given a .txt file containing a text.
The text looks like this:
Der zerbrochne Krug, ein Lustspiel,
von Heinrich von Kleist.
Berlin. In der Realschulbuchhandlung.
1811.
[8]
PERSONEN.
WALTER, Gerichtsrath. ADAM, Dorfrichter. LICHT, Schreiber. FRAU MARTHE
RULL. EVE, ihre Tochter. VEIT TÜMPEL, ein Bauer. RUPRECHT, sein Sohn.
FRAU BRIGITTE. EIN BEDIENTER, BÜTTEL, MÄGDE, etc.
Die Handlung spielt in einem niederländischen Dorfe bei Utrecht.
[9] Scene: Die Gerichtsstube. Erster Auftritt.
And I was given a Main with this code:
var document = new Document("Text.txt");
if (document.Contains("Haus") == true)
Console.WriteLine(document["Haus"]); // Word: haus, Frequency.: 36, Length: 4
else
Console.WriteLine("Word not found!");
Now I had to write a class that makes the code above work.
Does anyone have an idea how to solve this problem and could help a young business informatics student understand how this works?
Normally the StreamReader is easy for me, but in this case I couldn't manage it...
Thank you very much, and much love and good health to all of you who try to help me.
Well this is the class you are looking for, hope this might help you.
class Document : Dictionary<string, int>
{
private const char WORDSPLITTER = ' ';
public string Filename { get; }
public Document(string filename)
{
Filename = filename;
Fill();
}
private void Fill()
{
foreach (var item in File.ReadLines(Filename))
{
foreach (var word in item.Split(WORDSPLITTER))
{
if (ContainsKey(word))
base[word] += 1;
else
Add(word, 1);
}
}
}
public bool Contains(string word) => ContainsKey(word);
public new string this[string word]
{
get
{
if (ContainsKey(word))
return $"Word: {word}, frequency: {base[word]}, Length: {word.Length}";
else
return $"Word {word} not found!";
}
}
}
Try the below function :
private bool FindWord( string SearchWord)
{
List<string> LstWords = new List<string>();
string[] Lines = File.ReadAllLines("Path of your File");
foreach (string line in Lines )
{
string[] words = line.Split(' ');
foreach (string word in words )
{
LstWords.Add(word);
}
}
// Find the word case-insensitively by upper-casing both the candidate and the search word
int index = LstWords.FindIndex(x => x.Trim ().ToUpper ().Equals(SearchWord.ToUpper ()));
if (index==-1)
{
// Not Found
return false;
}
else
{
//word found
return true;
}
}
I find that Regex could be a good way to solve this:
var ms = Regex.Matches(textToSearch, wordToFind, RegexOptions.IgnoreCase);
if (ms.Count > 0)
{
Console.WriteLine($"Word: {wordToFind} Frequency: {ms.Count} Length: {wordToFind.Length}");
}
else
{
Console.WriteLine("Word not found!");
}
Regex is in the namespace:
using System.Text.RegularExpressions;
You will need to set the RegexOptions that are appropriate for your problem.
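One caveat with passing the raw word as the pattern is that it also matches inside longer words (for example "Haus" inside "Hausmeister") and that regex metacharacters in the word are interpreted. A small sketch that counts whole words only, assuming the word starts and ends with a letter or digit, could look like this; WordCounter and CountWholeWord are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer above).
public static class WordCounter
{
    public static int CountWholeWord(string text, string word)
    {
        // Regex.Escape neutralises metacharacters; \b anchors the match at word boundaries.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

For the text "Haus Hausmeister haus, HAUS." this counts three occurrences of "haus", whereas the unanchored pattern would count four.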
One approach would be the steps below.
Create a class Document with the following properties:
//Contains file name
public string FileName { get; set; }
//Contains file data
public string FileData { get; set; }
//Contains word count
public int WordCount { get; set; }
//Holds all the words
public Dictionary<string, int> DictWords { get; set; } = new Dictionary<string, int>();
Define the constructor, which does three things:
Assign the FileName property from the incoming file name
Read the file from the path and split its text into words
Count the words and insert them into the dictionary, so the final dictionary holds one <word, total count> record per word
//Constructor
public Document(string fileName)
{
//1. Assign the FileName property
FileName = fileName;
//2. Read File from the Path
string text = System.IO.File.ReadAllText(fileName, Encoding.Default);
string[] source = text.Split(new char[] { '.', '!', '?', ',', '(', ')', '\t', '\n', '\r', ' ' },
StringSplitOptions.RemoveEmptyEntries);
//3. Add the counts to Dictionary
foreach (String word in source)
{
if (DictWords.ContainsKey(word))
{
DictWords[word]++;
} else
{
DictWords[word] = 1;
}
}
}
Create a "Contains" method, which is used to check whether the word is present in the document:
//4. Method will return true/false based on the existence of the key/word.
public bool Contains(string word)
{
if (DictWords.ContainsKey(word))
{
return true;
}
else
{
return false;
}
}
Create a string indexer for the class to get the desired output to print to the Console:
//5. Define an indexer on the word.
public string this[string word]
{
get
{
if (DictWords.TryGetValue(word, out int value))
{
return $"Word: {word}, Frequency.:{value}, Length: {word.Length}";
}
return string.Empty;
}
}
Tests :
var document = new Document(@"Text.txt");
if (document.Contains("BEDIENTER") == true)
Console.WriteLine(document["BEDIENTER"]);
else
Console.WriteLine("Word not found!");
//Output
// Word: BEDIENTER, Frequency.:1, Length: 9
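The expected output in the question ("Word: haus" for the lookup "Haus") suggests that the count should ignore case, which the dictionaries above do not. A minimal sketch of case-insensitive counting is shown below; the class WordFrequency, its Count method, and the chosen separator list are assumptions for illustration, not part of any answer above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (names and separator list are illustrative only).
public static class WordFrequency
{
    public static Dictionary<string, int> Count(IEnumerable<string> lines)
    {
        // The comparer makes "Haus", "haus" and "HAUS" share one entry.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        var separators = new[] { ' ', '\t', '.', ',', ';', ':', '!', '?', '(', ')', '[', ']' };

        foreach (string line in lines)
        {
            foreach (string word in line.Split(separators, StringSplitOptions.RemoveEmptyEntries))
            {
                // TryGetValue leaves current at 0 when the word has not been seen yet.
                counts.TryGetValue(word, out int current);
                counts[word] = current + 1;
            }
        }
        return counts;
    }
}
```

Taking the lines as a parameter keeps the counting testable without a file; for the exam task it could be called as WordFrequency.Count(File.ReadLines("Text.txt")).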

What is wrong here in my C# Program?

I want to search for a particular word in a given string, for which I am using the foreach keyword, but it's not working.
I am just a beginner at this. Please help me find out what is wrong here. I don't want to use arrays.
static void Main(string[] args)
{
string str = "Hello You are welcome";
foreach (string item in str) // can we use string here?
{
if (str.Contains(are); // I am checking if the word "are" is present in the above string
Console.WriteLine("True");
)
}
string str = "Hello You are welcome";
if (str.Contains("are"))
{
Console.WriteLine("True");
}
or you mean:
string str = "Hello You are welcome";
foreach (var word in str.Split()) // split the string (by space)
{
if (word == "are")
{
Console.WriteLine("True");
}
}
Try this
static void Main(string[] args)
{
string str = "Hello You are welcome";
foreach (var item in str.Split(' ')) // split the string (by space)
{
if (item == "are")
{
Console.WriteLine("True");
}
}
}
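For background on why the question's loop fails: iterating a string with foreach yields its individual char values, not words, so the loop variable cannot be declared as string. Since the question asks to avoid arrays, here is a sketch of a whole-word check based on IndexOf rather than Split; WordSearch and ContainsWord are hypothetical names, and a "word" is assumed to be delimited by anything that is not a letter or digit.

```csharp
using System;

// Hypothetical helper (not from the answers above).
public static class WordSearch
{
    public static bool ContainsWord(string text, string word)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word)) return false;

        int start = 0;
        while (true)
        {
            int index = text.IndexOf(word, start, StringComparison.Ordinal);
            if (index < 0) return false;

            int end = index + word.Length;
            // A hit only counts when it is not glued to a neighbouring letter or digit.
            bool startsAtBoundary = index == 0 || !char.IsLetterOrDigit(text[index - 1]);
            bool endsAtBoundary = end == text.Length || !char.IsLetterOrDigit(text[end]);
            if (startsAtBoundary && endsAtBoundary) return true;

            // Otherwise keep searching after this position.
            start = index + 1;
        }
    }
}
```

Unlike str.Contains("are"), this does not report a match for "care" or "Beware" alone.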

How to display the first special character entered in textbox, in a label

I have created a regex function and called it when the data is being saved.
public static bool CheckSpecialCharacter(string value)
{
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"[~`!@#$%^*()=|\{}';.,<>]");
if (regex.IsMatch(value))
{
return false;
}
else
{
return true;
}
}
Used here:
if (ClassName.CheckSpecialCharacter(txt_ExpName1.Text)==false)
{
lblErrMsg.Text = "Special characters not allowed";
return;
}
Now instead of writing "Special characters not allowed", I want to attach the 1st special character that was entered in the textbox, so
if # was entered, the message should be read as "Special character # not allowed"
Is it possible to do this? Please help. Thanks.
Try following code.
public static string CheckSpecialCharacter(string value)
{
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"[~`!@#$%^*()=|\{}';.,<>]");
var match = regex.Match(value);
if (match.Success)
{
return match.Value;
}
else
{
return string.Empty;
}
}
usage:
var value = ClassName.CheckSpecialCharacter(txt_ExpName1.Text);
if (!string.IsNullOrEmpty(value ))
{
lblErrMsg.Text = value + " Special characters not allowed";
return;
}
Or you can do it by returning bool and adding an out parameter to the function, but I would not suggest that; check this link
EDIT - To do the same thing in Javascript
function CheckSpecialCharacter(value)
{
var res = value.match(/[~`!@#$%^*()=|\{}';.,<>]/g);
return res == null ? "" : res[0];
}
usage:
var value = CheckSpecialCharacter(document.getElementById("txt_ExpName1").value);
if(value != "")
{
document.getElementById("lblErrMsg").innerHTML = value + " Special characters not allowed";
}
Try this:
public static bool CheckSpecialCharacter(string value, out string character)
{
var regex = new System.Text.RegularExpressions.Regex(@"[~`!@#$%^*()=|\{}';.,<>]");
var match = regex.Match(value);
character = match.Value; // reuse the match instead of running the regex a second time
return match.Length == 0;
}
and then
string character;
if (ClassName.CheckSpecialCharacter(txt_ExpName1.Text, out character) == false)
{
lblErrMsg.Text = character + " Special characters not allowed";
return;
}
You can just use the Matches(string) function from Regex to get the matches then check the first element like this :
var regex = new Regex(@"[~`!@#$%^*()=|\{}';.,<>]");
var matches = regex.Matches("This contains # two b#d characters");
if (matches.Count > 0)
{
var firstBadCharacter = matches[0];
}
Then you can wrap the result of your check in an Exception :
throw new ArgumentException("Special character '" + firstBadCharacter + "' not allowed.");
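Note that in the snippet above firstBadCharacter is declared inside the if block, so the throw has to live inside that block as well. For comparison, the same check can be written without a regex using IndexOfAny. This is only a sketch: the names InputValidator and GetErrorMessage are hypothetical, and the forbidden set is an assumption about what the character class in the question was meant to contain.

```csharp
using System;

// Hypothetical helper (names and character set are assumptions).
public static class InputValidator
{
    // Assumed to mirror the character class used in the question.
    private static readonly char[] Forbidden = "~`!@#$%^*()=|{}';.,<>".ToCharArray();

    // Returns null when the value is clean, otherwise a message naming the first offending character.
    public static string GetErrorMessage(string value)
    {
        if (string.IsNullOrEmpty(value)) return null;

        int index = value.IndexOfAny(Forbidden);
        return index < 0 ? null : "Special character " + value[index] + " not allowed";
    }
}
```

In the page code the result could then be assigned directly, for example lblErrMsg.Text = InputValidator.GetErrorMessage(txt_ExpName1.Text) followed by a null check before returning.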

Split special string in c#

I want to split the strings below to get the given output.
Can anybody help me do this?
Examples:
/TEST/TEST123
Output: /Test/
/TEST1/Test/Test/Test/
Output: /Test1/
/Text/12121/1212/
Output: /Text/
/121212121/asdfasdf/
Output: /121212121/
12345
Output: 12345
I have tried the string.Split function, but it did not work well. Is there any idea or logic that I can implement to achieve this?
An answer using a regular expression would be fine for me.
You simply want the first result of splitting by /
string output = input.Split('/')[0];
But in case that you have //TEST/ and output should be /TEST you can use regex.
string output = Regex.Matches(input, #"\/?(.+?)\/")[0].Groups[1].Value;
For your 5th case : you have to separate the logic. for example:
public static string Method(string input)
{
var split = input.Split(new[] {'/'}, StringSplitOptions.RemoveEmptyEntries);
if (split.Length == 0) return input;
return split[0];
}
Or using regex.
public static string Method(string input)
{
var matches = Regex.Matches(input, #"\/?(.+?)\/");
if (matches.Count == 0) return input;
return matches[0].Groups[1].Value;
}
Some results using method:
TEST/54/ => TEST
TEST => TEST
/TEST/ => TEST
I think this would work:
string s1 = "/TEST/TEST123";
string s2 = "/TEST1/Test/Test/Test/";
string s3 = "/Text/12121/1212/";
string s4 = "/121212121/asdfasdf/";
string s5 = "12345";
string pattern = #"\/?[a-zA-Z0-9]+\/?";
Console.WriteLine(Regex.Matches(s1, pattern)[0]);
Console.WriteLine(Regex.Matches(s2, pattern)[0]);
Console.WriteLine(Regex.Matches(s3, pattern)[0]);
Console.WriteLine(Regex.Matches(s4, pattern)[0]);
Console.WriteLine(Regex.Matches(s5, pattern)[0]);
class Program
{
static void Main(string[] args)
{
string example = "/TEST/TEST123";
var result = GetFirstItem(example);
Console.WriteLine("First in the list : {0}", result);
}
static string GetFirstItem(string value)
{
var collection = value?.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
var result = collection[0];
return result;
}
}
StringSplitOptions.RemoveEmptyEntries is an enum which tells the Split function that when it has split the string into an array, if there are elements in the array that are empty strings, the function should not include the empty elements in the results. Basically you want the collection to contain only values.
public string functionName(string input)
{
if(input.Contains('/'))
{
string SplitStr = input.Split('/')[1];
return "/" + SplitStr.Substring(0, 1) + SplitStr.Substring(1).ToLower() + "/";
}
return input;
}
string output = input.Contains("/") ? "/" + input.Split('/')[1] + "/" : input;
private void button1_Click(object sender, EventArgs e)
{
string test = @"/Text/12121/1212/";
int first = test.IndexOf("/");
int last = test.Substring(first+1).IndexOf("/");
string finall = test.Substring(first, last+2);
}
I tried this code with all your examples and got the correct output. Try this.
The following method may help you.
public string getValue(string st)
{
if (st.IndexOf('/') == -1)
return st;
return "/" + st.Split('/')[1] + "/";
}
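The Split-based versions above allocate an array and throw or misbehave in edge cases such as a missing second slash. A sketch that uses only IndexOf and Substring is shown below; PathHelper and FirstSegment are hypothetical names, and the original casing of the segment is kept (the sample outputs in the question change case inconsistently, so no case conversion is assumed).

```csharp
using System;

// Hypothetical helper (not from the answers above).
public static class PathHelper
{
    public static string FirstSegment(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        int first = input.IndexOf('/');
        if (first < 0) return input; // no slash at all, e.g. "12345"

        // Look for the slash that closes the segment; it may be missing.
        int second = input.IndexOf('/', first + 1);
        string name = second < 0
            ? input.Substring(first + 1)
            : input.Substring(first + 1, second - first - 1);

        return "/" + name + "/";
    }
}
```

Like st.Split('/')[1] above, this returns the segment that follows the first slash, so for an input that does not start with a slash the text before it is skipped.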
