Retrieve The Second Name Using Regex - c#

I want to use Regex to retrieve each person and their address.
The result would be:
every Frank Anderson and his address, in a list of strings.
Problem:
My regex cannot retrieve entries that include a second (middle) name, such as "Frank Andre Anderson".
Other people in the results may also have a second name.
Thank you!
string pFirstname = "Frank";
string pLastname = "Anderson";
string input = w.DownloadString("http://www.birthday.no/sok/?f=Frank&l=Anderson");
Match theRegex8 = Regex.Match(input, @"(?<=\><b>)" + pFirstname + "(.+?)" + pLastname + "</b></a></h3><p><span>(.+?<)", RegexOptions.IgnoreCase);
foreach (var matchgroup in theRegex8.Groups)
{
var sss = matchgroup;
}
The current result that I'm using the code is:

You must be looking for something like
(?<=>[^<]*<b>)Frank([^<]+)Anderson</b></a></h3><p><span>([^<]+)
See concise RegexStorm demo
In C#, the regex declaration will be
Match theRegex8 = Regex.Match(input, @"(?<=>[^<]*<b>)" + pFirstname + "([^<]+)" + pLastname + "</b></a></h3><p><span>([^<]+)", RegexOptions.IgnoreCase);
The problem you had was that . matches any character, including angle brackets, so the match could run across tags. We need to restrict it to non-angle-bracket characters with [^<].
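To collect every matching person rather than only the first, the same pattern can be run through Regex.Matches. The following is a minimal sketch, not tested against the real page: the class and method names are made up for illustration, and it assumes the markup looks like the pattern above expects.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameAddressDemo
{
    // Hypothetical helper: returns "First [Middle] Last: address" for every hit.
    public static List<string> Extract(string html, string first, string last)
    {
        // Regex.Escape guards against regex metacharacters in the names.
        var pattern = @"(?<=>[^<]*<b>)" + Regex.Escape(first) + "([^<]+)" + Regex.Escape(last)
                    + "</b></a></h3><p><span>([^<]+)";
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            // Group 1 holds whatever sits between the first and last name
            // (just whitespace when there is no second name).
            var middle = m.Groups[1].Value.Trim();
            var address = m.Groups[2].Value.Trim();
            results.Add((first + " " + middle).Trim() + " " + last + ": " + address);
        }
        return results;
    }
}
```

Calling NameAddressDemo.Extract(input, pFirstname, pLastname) would then give one string per person, with or without a second name.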
Update
Perhaps, you could leverage HtmlAgilityPack by getting all <a> tags that have <b> as the first child, and then get the InnerText that meets your conditions:
var conditions = new[] { pFirstname, pLastname};
var seconds = new List<string>();
var webGet = new HtmlAgilityPack.HtmlWeb();
var doc = webGet.Load("http://www.birthday.no/sok/?f=Frank&l=Anderson");
var a_nodes = doc.DocumentNode.Descendants("a").Where(a => a.HasChildNodes && a.ChildNodes[0].Name == "b");
var res = a_nodes.Select(a => a.ChildNodes[0].InnerText).Where(b => conditions.All(condition => b.Contains(condition))).ToList();
foreach (var name in res)
{
var splts = name.Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries);
if (splts.GetLength(0) > 2) // we have 3 elements at the least
seconds.Add(name.Trim().Substring(name.Trim().IndexOf(" ") + 1, name.Trim().LastIndexOf(" ") - name.Trim().IndexOf(" ") - 1));
}
This way, you will get just the second names. I could not test this code, but I think you get the gist.

Related

Unexpected regex result with a single space

Can somebody please tell me why a space comes up with 2 matches for the below pattern?
((?<key>(?:((?!\d)\w+(?:\.(?!\d)\w+)*)\.)?((?!\d)\w+)):(?<value>([^ "]+)|("[^"]*?")+))*
Trying to match the following cases:
var body = "Key:Hello";
var body = "Key:\"Hello\"";
var body = "Key1:Hello Key2:\"Goodbye\"";
This may provide more context:
pattern = @"((?<key>" + StringExtensions.REGEX_IDENTIFIER_MIDSTRING + "):(?<value>([^ \"]+)|(\"[^\"]*?\")+))*";
My goal is to pull the keys and values out of a command-line-like string in the form of [key]:[value] with optional repeats. Values can either be a single word with no spaces, or be in quotes with spaces.
Probably right there in front of me but I'm not seeing it.
Probably because of the ".": a period in a regex matches every character except line breaks.
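Another likely contributor, independent of the period: the whole pattern is wrapped in ( ... )*, so it is allowed to match the empty string. .NET then reports an empty match at every position where nothing else matches, which for a single space is position 0 and position 1 (the end of the input). A minimal sketch with a deliberately simplified stand-in pattern (not the original one):

```csharp
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // Counts how many matches (including empty ones) a pattern yields.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }
}
```

With the stand-in pattern (\w+:\w+)* a single space yields two empty matches, while (\w+:\w+)+ yields none, so replacing the trailing * with + (or skipping matches whose Length is 0) is one way to avoid the surprise.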
I took a different approach:
public static Dictionary<string, string> GetCommandLineKeyValues(this string commandLine)
{
var keyValues = new Dictionary<string, string>();
var pattern = @"(?<command>(" + StringExtensions.REGEX_IDENTIFIER + " )?)(?<args>.*)";
var args = commandLine.RegexGet(pattern, "args");
Match match;
if (args.Length > 0)
{
string key;
string value;
pattern = @" ?(?<key>" + StringExtensions.REGEX_IDENTIFIER_MIDSTRING + ")*?:(?<value>([^ \"]+)|(\"[^\"]*?\")+)";
do
{
match = args.RegexGetMatch(pattern);
if (match == null)
{
break;
}
key = match.Groups["key"].Value;
value = match.Groups["value"].Value;
keyValues.Add(key, value);
args = match.Replace(args, string.Empty);
}
while (args.RegexIsMatch(pattern));
}
return keyValues;
}
I took what I call the "pac-man" approach to Regex.. match, eat (hence the Match.Replace), and continue matching.
For convenience:
public const string REGEX_IDENTIFIER = @"^(?:((?!\d)\w+(?:\.(?!\d)\w+)*)\.)?((?!\d)\w+)$";

How to prevent duplicates in Search Results (ASP.NET Razor Syntax)

I'm creating a search algorithm based on keywords and sometimes phrases. I want to output results based on each word of every phrase in the search bar.
Below is my code which works fine:
var searchWords = Request["searchTerm"].Split(' ');
IEnumerable<dynamic> stories = Enumerable.Empty<string>();
var sqlSelect = "SELECT TOP 10 Feeds.* FROM Feeds WHERE Feeds.AdminPublish=@1 AND Title LIKE @0 ORDER BY UploadDate DESC";
foreach(var word in searchWords)
{
stories = stories.Concat(db.Query(sqlSelect, "%" + word + "%",).ToList());
}
In my View -Page (Result Page)
@foreach (var d in stories){
//Show Title and description
@d.Title
@d.Username
}
Problem
Let's say I am searching for "xmas of Annoying". After searching the db, it splits the phrase into three words and returns results based on likeness to each of the three words. I get results like:
The Sociology of Annoying Xmas Songs
Welcome to the future
The Sociology of Annoying Xmas Songs
Welcome to the future
The Sociology of Annoying Xmas Songs
Welcome to the future
How do I stop my results from showing entries with the same ID, or how do I remove the excess duplicates?
You can use the DISTINCT keyword to select distinct rows from the table:
var sqlSelect = @"SELECT DISTINCT TOP 10 Feeds.*
FROM Feeds WHERE Feeds.AdminPublish=@1 AND
Title LIKE @0 ORDER BY UploadDate DESC";
EDIT:
Then you can do it like this:
foreach(var word in searchWords)
{
stories = stories.Concat(db.Query(sqlSelect, "%" + word + "%",).ToList());
}
stories= stories.Distinct<dynamic>();
You can also use the Distinct extension method to achieve the same in your view (Razor):
@foreach (var d in stories.Distinct()){
//Show Title and description
@d.Title
@d.Username
}
You can also make the list distinct on a specific property. Distinct() has no overload that takes a key selector, so group by the property and take the first item of each group:
stories.GroupBy(x => x.Title).Select(g => g.First())
//Or
stories.GroupBy(x => x.UserName).Select(g => g.First())
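One caveat worth checking: a parameterless Distinct() relies on the element type's equality, so rows that come back from separate queries as separate objects may not collapse unless that type compares by value. Grouping on the ID sidesteps the question. A minimal sketch, using a hypothetical Story class in place of the dynamic rows returned by db.Query (the Id column is an assumption):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of the Feeds table.
public class Story
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public static class StoryDedup
{
    // Keeps the first story seen for each Id, in order of first appearance.
    public static List<Story> DistinctById(IEnumerable<Story> stories)
    {
        return stories.GroupBy(s => s.Id)
                      .Select(g => g.First())
                      .ToList();
    }
}
```

The same GroupBy/First chain should also work directly on an IEnumerable<dynamic>, since the lambda parameter is then typed as dynamic.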
I would say to not add duplicate values to stories in the first place...
Something similar to this:
foreach(var word in searchWords)
{
lst = db.Query(sqlSelect, "%" + word + "%",).ToList()
foreach(var v in lst)
{
if (!stories.Contains(v))
stories.Add(v)
}
}
Mike from @Mikesdotnetting.com came up with a dynamic solution. Most of the credit goes to him, and some to me for spotting ways to tweak the structure. For future reference, here's the solution:
var searchWords = Request["searchTerm"].Split(' ').Select(s => "%" + s + "%");
var sql = "SELECT TOP 10 Feeds.* FROM Feeds WHERE";
for(var i = 0; i < searchWords.Count(); i++)
{
sql += i == 0 ? "" : " OR ";
sql += " Title LIKE @" + i;
}
sql += " ORDER BY UploadDate DESC";
var stories = db.Query(sql, searchWords.ToArray());
If you want a result that might look something like this:
SearchTerm = "one two three"
SQL = ... where title like '%one%' or title like '%two%' or title like '%three%' or fname like '%one%' or fname like '%two%' or fname like '%three%' WHERE A FINAL CONDITION BINDS THEM ALL etc
If so, try this:
var fields = new List<string>{"Title", "SubTitle", "Fname", "UserName", "Link"};
var searchWords = Request["searchTerm"].Split(' ').Select(s => "%" + s + "%").ToList();
var sql = "SELECT TOP 10 Feeds.* FROM Feeds WHERE (";
System.Text.StringBuilder sb = new StringBuilder();
var i = 0;
foreach(var word in searchWords)
{
foreach(var field in fields)
{
sb.AppendFormat(" OR {0} LIKE @{1}", field, i);
}
i++;
}
sql = sql + sb.ToString().Trim().TrimStart("OR".ToCharArray()) + string.Format(") AND Feeds.AdminPublish=@{0} ORDER BY UploadDate DESC", i);
searchWords.Add(your_parameter_value_for_AdminPublish);
var stories = db.Query(sql, searchWords.ToArray());

C# regex to extract key value

Is there an easy and elegant way to extract key value pairs from a string of below format?
"key1='value1' key2='value 2' key3='value3' key4='value4' key5='5555' key6='xxx666'"
My attempt resulted in this but I'm not too happy with it
var regex = new Regex(@"\'\s", RegexOptions.None);
var someString = @"key1='value1' key2='value 2' key3='value3' key4='value4' key5='5555' key6='xxx666'" + " ";
var splitArray = regex.Split(someString);
IDictionary<string, string> keyValuePairs = new Dictionary<string, string>();
foreach (var split in splitArray)
{
regex = new Regex(@"\=\'", RegexOptions.None);
var keyValuArray = regex.Split(split);
if (keyValuArray.Length > 1)
{
keyValuePairs.Add(keyValuArray[0], keyValuArray[1]);
}
}
You should be able to do it without a split, using a MatchCollection instead:
var rx = new Regex("([^=\\s]+)='([^']*)'");
var str = "key1='value1' key2='value 2' key3='value3' key4='value4' key5='5555' key6='xxx666'";
foreach (Match m in rx.Matches(str)) {
Console.WriteLine("{0} {1}", m.Groups[1], m.Groups[2]);
}
Demo.
The heart of this solution is this regular expression: ([^=\\s]+)='([^']*)' (shown with C# string escaping; the regex itself is ([^=\s]+)='([^']*)'). It defines the structure of your key-value pair: a sequence of characters that are neither whitespace nor an equals sign forms the key, then comes an equals sign, followed by the value enclosed in single quotes. The solution goes through the matches in sequence, extracting keys and values, which are captured in Groups[1] and Groups[2], in that order.
Another way to do it:
var someString = @"key1='value1' key2='value 2' key3='value3' key4='value4' key5='5555' key6='xxx666'" + " ";
Dictionary<string, string> dic = Regex.Matches(someString, @"(?<key>\w+)='(?<value>[^']*)'")
.OfType<Match>()
.ToDictionary(m => m.Groups["key"].Value, m => m.Groups["value"].Value);
You can do it like this
var str = "key1='value1' key2='value 2' key3='value3' key4='value4' key5='5555' key6='xxx666'";
var arr = Regex.Split(str, "(?<=')\\s(?=\\w)"); // split on whitespace to get key=value
foreach(var s in arr) {
var nArr = s.Split('='); // split on = to get key and value (the value keeps its surrounding quotes)
keyValuePairs.Add(nArr[0], nArr[1]);
}
(?<=')\s(?=\w) will look for space which is after ' and before the start of the key
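The split-based approach can be rounded out into a small helper. This is a sketch under assumptions rather than a definitive version: the class and method names are invented here, and it assumes values never contain a quote followed by a space and a word character. It splits each pair on the first = only, so a value such as 'a=b' survives, and it strips the surrounding quotes.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueSplitDemo
{
    // Hypothetical helper built on the lookaround split shown above.
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        // Split on whitespace that follows a closing quote and precedes a key.
        foreach (var pair in Regex.Split(input.Trim(), @"(?<=')\s(?=\w)"))
        {
            // Limit the split to 2 parts so '=' inside a value is preserved.
            var parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length == 2)
            {
                result[parts[0]] = parts[1].Trim('\'');
            }
        }
        return result;
    }
}
```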

Regex without escaping Characters - Problems

I found some solutions for my problem, which is quite simple:
I have a string, which is looking like this:
"\r\nContent-Disposition: form-data; name=\"ctl00$cphMainContent$grid$ctl03$ucPicture$ctl00\""
My goal is to break it down, so I have a Dictionary of values, like:
Key = "name", value = "ctl..."
My approach was: Split it by "\r\n" and then by the equal or the colon sign.
This worked fine, but then some funny tester uploaded a file with all allowed characters, which made the string look like this:
"\r\nContent-Disposition: form-data; name=\"ctl00_cphMainContent_grid_ctl03_ucPicture_btnUpload$fileUpload\"; filename=\"C:\\Users\\matthias.mueller\\Desktop\\- ie+![]{}_-´;,.$¨##ç %&()=~^`'.jpg\"\r\nContent-Type: image/jpeg"
Of course, the simple splitting doesn't work anymore, since it splits now the filename.
I corrected this by reading out "filename=" and escaping the signs I'm looking to split, and then creating a regex.
Now comes my problem: I found two Regex-samples, which could do the work for the equal sign, the semicolon and the colon. one is:
[^\\]=
The other one I found was:
(?<!\\\\)=
The problem is, the first one doesn't only split, but it splits the equal sign and one character before this sign, which means my key in the Dictionary is "nam" instead of "name"
The second one works fine on this matter, but it still splits the escaped equal sign in the filename.
Is my approach for this problem even working? Would there be a better solution for this? And why is the first Regex cutting a character?
Edit: To avoid confusion, my escaped String looks like this:
"Content-Disposition: form-data; name=\"ctl00_cphMainContent_grid_ctl03_ucPicture_btnUpload$fileUpload\"; filename=\"C\:\Users\matthias.mueller\Desktop\- ie+![]{}_-´\;,.$¨##ç %&()\=~^`'.jpg\""
So I want basically: Split by equal Sign EXCEPT the escaped ones. By the way: The string here shows only one \, but there are 2.
Edit 2: OK seems like I have a working solution, but it's so ugly:
Dictionary<string, string> ParseHeader(byte[] bytes, int pos)
{
Dictionary<string, string> items;
string header;
string[] headerLines;
int start;
int end;
string input = _encoding.GetString(bytes, pos, bytes.Length - pos);
start = input.IndexOf("\r\n", 0);
if (start < 0) return null;
end = input.IndexOf("\r\n\r\n", start);
if (end < 0) return null;
WriteBytes(false, bytes, pos, end + 4 - 0); // Write the header to the form content
header = input.Substring(start, end - start);
items = new Dictionary<string, string>();
headerLines = Regex.Split(header, "\r\n");
Regex regLineParts = new Regex(@"(?<!\\\\);");
Regex regColon = new Regex(@"(?<!\\\\):");
Regex regEqualSign = new Regex(@"(?<!\\\\)=");
foreach (string hl in headerLines)
{
string workString = hl;
//Escape the Semicolon in filename
if (hl.Contains("filename"))
{
String orig = hl.Substring(hl.IndexOf("filename=\"") + 10);
orig = orig.Substring(0, orig.IndexOf('"'));
string toReplace = orig;
toReplace = toReplace.Replace(toReplace, toReplace.Replace(";", @"\\;"));
toReplace = toReplace.Replace(toReplace, toReplace.Replace(":", @"\\:"));
toReplace = toReplace.Replace(toReplace, toReplace.Replace("=", @"\\="));
workString = hl.Replace(orig, toReplace);
}
string[] lineParts = regLineParts.Split(workString);
for (int i = 0; i < lineParts.Length; i++)
{
string[] p;
if (i == 0)
p = regColon.Split(lineParts[i]);
else
p = regEqualSign.Split(lineParts[i]);
if (p.Length == 2)
{
string orig = p[0];
orig = orig.Replace(@"\\;", ";");
orig = orig.Replace(@"\\:", ":");
orig = orig.Replace(@"\\=", "=");
p[0] = orig;
orig = p[1];
orig = orig.Replace(@"\\;", ";");
orig = orig.Replace(@"\\:", ":");
orig = orig.Replace(@"\\=", "=");
p[1] = orig;
items.Add(p[0].Trim(), p[1].Trim());
}
}
}
return items;
}
Needs some further testing.
I had a go at writing a parser for you. It handles literal strings, like "here is a string", as the values in name-value pairs. I've also written a few tests, and the last shows an '=' character inside a literal string. It also handles escaping quotes (") inside literal strings by escaping as \" -- I'm not sure if this is right, but you could change it.
A quick explanation. I first find anything that looks like a literal string and replace it with a value like PLACEHOLDER8230498234098230498. This means the whole thing is now literal name-value pairs; eg
key="value"
becomes
key=PLACEHOLDER8230498234098230498
The original string value is stored off in the literalStrings dictionary for later.
So now we split on semicolons (to get key=value strings) and then on equals, to get the proper key/value pairs.
Then I substitute the placeholder values back in before returning the result.
public class HttpHeaderParser
{
public NameValueCollection Parse(string header)
{
var result = new NameValueCollection();
// 'register' any string values;
var stringLiteralRx = new Regex(@"""(?<content>(\\""|[^\""])+?)""", RegexOptions.IgnorePatternWhitespace);
var equalsRx = new Regex("=", RegexOptions.IgnorePatternWhitespace);
var semiRx = new Regex(";", RegexOptions.IgnorePatternWhitespace);
Dictionary<string, string> literalStrings = new Dictionary<string, string>();
var cleanedHeader = stringLiteralRx.Replace(header, m =>
{
var replacement = "PLACEHOLDER" + Guid.NewGuid().ToString("N");
var stringLiteral = m.Groups["content"].Value.Replace("\\\"", "\"");
literalStrings.Add(replacement, stringLiteral);
return replacement;
});
// now it's safe to split on semicolons to get name-value pairs
var nameValuePairs = semiRx.Split(cleanedHeader);
foreach(var nameValuePair in nameValuePairs)
{
var nameAndValuePieces = equalsRx.Split(nameValuePair);
var name = nameAndValuePieces[0].Trim();
var value = nameAndValuePieces[1];
string replacementValue;
if (literalStrings.TryGetValue(value, out replacementValue))
{
value = replacementValue;
}
result.Add(name, value);
}
return result;
}
}
There's every chance there are some proper bugs in it.
Here are some unit tests you should incorporate, too:
[TestMethod]
public void TestMethod1()
{
var tests = new[] {
new { input=@"foo=bar; baz=quux", expected = @"foo|bar^baz|quux"},
new { input=@"foo=bar;baz=""quux""", expected = @"foo|bar^baz|quux"},
new { input=@"foo=""bar"";baz=""quux""", expected = @"foo|bar^baz|quux"},
new { input=@"foo=""b,a,r"";baz=""quux""", expected = @"foo|b,a,r^baz|quux"},
new { input=@"foo=""b;r"";baz=""quux""", expected = @"foo|b;r^baz|quux"},
new { input=@"foo=""b\""r"";baz=""quux""", expected = @"foo|b""r^baz|quux"},
new { input=@"foo=""b=r"";baz=""quux""", expected = @"foo|b=r^baz|quux"},
};
var parser = new HttpHeaderParser();
foreach(var test in tests)
{
var actual = parser.Parse(test.input);
var actualAsString = String.Join("^", actual.Keys.Cast<string>().Select(k => string.Format("{0}|{1}", k, actual[k])));
Assert.AreEqual(test.expected, actualAsString);
}
}
Looks to me like you'll need a bit more of a solid parser for this than a regex split. According to this page the name/value pairs can either be 'raw';
x=1
or quoted;
x="foo bar baz"
So you'll need to look for a solution that not only splits on the equals, but ignores any equals inside;
x="y=z"
It might be that there is a better or more managed way for you to access this info. If you are using a classic ASP.NET WebForms FileUpload control, you can access the filename using the properties of the control, like
FileUpload1.HasFile
FileUpload1.FileName
If you're using MVC, you can use the HttpPostedFileBase class as a parameter to the action method. See this answer
[HttpPost]
public ActionResult Index(HttpPostedFileBase file)
{
// Verify that the user selected a file
if (file != null && file.ContentLength > 0)
{
// extract only the filename
var fileName = Path.GetFileName(file.FileName);
// store the file inside ~/App_Data/uploads folder
var path = Path.Combine(Server.MapPath("~/App_Data/uploads"), fileName);
file.SaveAs(path);
}
// redirect back to the index action to show the form once again
return RedirectToAction("Index");
}
This:
(?<!\\\\)=
matches = not preceded by \\.
It should be:
(?<!\\)=
(Make sure you use @ (verbatim) strings for the regex, to avoid confusion)
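As for why the first pattern cuts off a character: [^\\]= consumes the character in front of the equal sign as part of the match, so Split removes it along with the =. A lookbehind only checks the preceding character without consuming it. A minimal sketch comparing the two (class and method names are made up for illustration; the input is assumed to use a single backslash as the escape):

```csharp
using System.Text.RegularExpressions;

public static class EscapedSplitDemo
{
    // Splits on '=' unless a backslash immediately precedes it.
    // The lookbehind is zero-width, so nothing before '=' is lost.
    public static string[] SplitWithLookbehind(string s)
    {
        return Regex.Split(s, @"(?<!\\)=");
    }

    // Matches "one non-backslash character plus '='", so that
    // character is removed together with the '='.
    public static string[] SplitWithCharClass(string s)
    {
        return Regex.Split(s, @"[^\\]=");
    }
}
```

For the input name=a\=b the lookbehind version should give "name" and "a\=b", while the character-class version gives "nam" and "a\=b".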

WebMatrix internal search engine

I've written code that searches the database, but when I search for some specific keywords it also shows unrelated links, and I don't know why. Here's the code and the result.
Page.Title = "Catalog Search";
var db = Database.Open("Shopping");
var searchWords = Request["searchTerm"].Split(' ');
IEnumerable<dynamic> result = Enumerable.Empty<string>();
var sqlSelect = "SELECT ProductId, ProductTitle FROM Products WHERE " +
"ProductTitle LIKE @0";
foreach(var word in searchWords)
{
result = result.Concat(db.Query(sqlSelect, "%" + word + "%").ToList());
}
So I searched for "Samsung LCD" and here's the result.
Samsung - 15" Series 9 Ultrabook Laptop
Samsung - Galaxy Tab 2 7.0
Samsung - 32" Class - LCD
Samsung - 32" Class - LCD
I've seen PHP code that does exactly what I want, but unfortunately I don't know how to convert it. Here's the PHP code.
$searchTerms = explode(' ', $bucketsearch);
$searchTermBits = array();
foreach ($searchTerms as $term) {
$term = trim($term);
if (!empty($term)) {
$searchTermBits[] = "bucketname LIKE '%$term%'";
}
}
...
$result = mysql_query("SELECT * FROM buckets WHERE ".implode(' AND ', $searchTermBits));
and the result of the php search code.
SELECT * FROM buckets WHERE bucketname LIKE '%apple%' AND bucketname LIKE '%and%' AND bucketname LIKE '%pear%'
I took the liberty of adding a few null checks and such, but the code for C# would pretty much look like this:
// make sure search terms are passed in, and remove blank entries
var searchTerms = Request["searchTerms"] == null ?
new string[] {} :
Request["searchTerms"].Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries);
// build the list of query items using parameterization
var searchTermBits = new List<string>();
for (var i=0; i<searchTerms.Length; i++) {
searchTermBits.Add("bucketname LIKE @" + i);
}
// create your sql command using a join over the array
var query = "SELECT * FROM buckets";
if (searchTerms.Length > 0) {
query += " WHERE " + string.Join(" AND ", searchTermBits);
}
// ask the database using a lambda to add the %
var db = Database.Open("StarterSite");
var results = db.Query(query, searchTerms.Select(x => "%" + x + "%").ToArray());
// enjoy!
Response.Write(results.Count());
Let me know if you run into any more trouble!
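Applied to the Products table from the question, the same idea might look like the sketch below. It is only an illustration: the class and method names are invented, and it splits on spaces to mirror the PHP explode(' ', ...). The point is that every word gets its own LIKE clause joined with AND, so a row has to contain all the words instead of any one of them.

```csharp
using System;
using System.Linq;

public static class SearchQueryBuilder
{
    // Hypothetical helper: builds the SQL text plus the matching parameter
    // array for db.Query(sql, parameters).
    public static string BuildSql(string searchTerm, out object[] parameters)
    {
        var words = (searchTerm ?? "")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // One positional placeholder (@0, @1, ...) per word.
        var clauses = words.Select((w, i) => "ProductTitle LIKE @" + i);
        parameters = words.Select(w => (object)("%" + w + "%")).ToArray();

        var sql = "SELECT ProductId, ProductTitle FROM Products";
        if (words.Length > 0)
        {
            sql += " WHERE " + string.Join(" AND ", clauses);
        }
        return sql;
    }
}
```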
