Extract substring between startsequence and endsequence in C# using LINQ - c#

I have an XML instance that contains processing instructions. I want a specific one (the schematron declaration):
<?xml-model href="../../a/b/c.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>
Other processing instructions may or may not be present, so I can't rely on its position in the DOM; it is guaranteed, however, that there will be at most one such Schematron file reference. So I get it like this:
XProcessingInstruction p = d.Nodes().OfType<XProcessingInstruction>()
.Where(x => x.Target.Equals("xml-model") &&
x.Data.Contains("schematypens=\"http://purl.oclc.org/dsdl/schematron\""))
.FirstOrDefault();
In the example given, the content of p.Data is the string
href="../../a/b/c.sch" schematypens="http://purl.oclc.org/dsdl/schematron"
I need to extract the path specified via @href (i.e. in this example I would want the string ../../a/b/c.sch) without double quotes. In other words: I need the substring after href=" and before the next ". I'm trying to achieve my goal with LINQ:
var a = p.Data.Split(' ').Where(s => s.StartsWith("href=\""))
.Select(s => s.Substring("href=\"".Length))
.Select(s => s.TakeWhile(c => c != '"'));
I would have thought this gave me an IEnumerable<char>, which I could then convert to a string in one of the ways described here, but that's not the case: according to LINQPad, I seem to be getting an IEnumerable<IEnumerable<char>>, which I can't manage to turn into a string.
How could this be done correctly using LINQ? Maybe I'd better be using Regex within LINQ?
Edit: After writing this up, I came up with a working solution, but it seems very inelegant:
string a = new string
(
p.Data.Substring(p.Data.IndexOf("href=\"") + "href=\"".Length)
.TakeWhile(c => c != '"').ToArray()
);
What would be a better way?

Try this:
var input = @"<?xml-model href=""../../a/b/c.sch"" schematypens=""http://purl.oclc.org/dsdl/schematron""?>";
var match = Regex.Match(input, @"href=""(.*?)""");
var url = match.Groups[1].Value;
That gives me ../../a/b/c.sch in url.
Please don't use Regex for general XML parsing, but for this situation it's fine.
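For the LINQ-only route the question asked about: the nested IEnumerable<IEnumerable<char>> arises because Select runs TakeWhile once per split token. Wrapping each result in new string(...) and then taking FirstOrDefault collapses it to a single string. The following is a minimal sketch, not a definitive implementation; the class and method names are made up for illustration, and the LINQ variant assumes the href value contains no spaces. It also shows the regex applied directly to p.Data with a Success check.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // LINQ-only variant. Assumption: the href value contains no spaces,
    // because the data is split on ' ' first.
    public static string FromLinq(string data)
    {
        const string prefix = "href=\"";
        return data.Split(' ')
            .Where(s => s.StartsWith(prefix, StringComparison.Ordinal))
            .Select(s => new string(s.Substring(prefix.Length)
                                     .TakeWhile(c => c != '"')
                                     .ToArray()))
            .FirstOrDefault(); // null when no href pseudo-attribute is present
    }

    // Regex variant: returns null rather than an empty string when nothing matches.
    public static string FromRegex(string data)
    {
        var match = Regex.Match(data, @"href=""(.*?)""");
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Same content as p.Data in the question.
        var data = "href=\"../../a/b/c.sch\" schematypens=\"http://purl.oclc.org/dsdl/schematron\"";
        Console.WriteLine(FromLinq(data));
        Console.WriteLine(FromRegex(data));
    }
}
```

In the question's code, either method would be called as HrefExtractor.FromRegex(p.Data) after checking that p is not null, since FirstOrDefault may return null when no matching processing instruction exists.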

Related

Need to get two values out into an array from string split

I have a string that looks like this:
var result = "y-9m-10y-9m-11y-0m-02y-0m-03";
I need to make 2 lists:
one for all the y- objects(9,9,0,0)
and another for the m- objects(10,11,02,03).
How can I do this?
I have this older code from before that doesn't care about the y- objects. Now I need to get both sets.
var result = "m-10m-11m-02m-03";
var months = result.Split(new[] { "m-" }, StringSplitOptions.RemoveEmptyEntries);
Quick and dirty solution using regular expressions and LINQ:
var months = Regex.Matches(result, @"m-(\d+)").Cast<Match>().Select(m => int.Parse(m.Groups[1].Value));
var years = Regex.Matches(result, @"y-(\d+)").Cast<Match>().Select(m => int.Parse(m.Groups[1].Value));
Note that this doesn't do any error checking.
Edit: In the question you seem to use the extracted strings without converting them to int. In this case, omit the int.Parse and use m.Groups[1].Value directly.
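If the y- and m- values always come in pairs, as they do in the sample string, a single pass can fill both lists at once and keep each year aligned with its month. This is a sketch under that assumption; YearMonthParser and Parse are illustrative names, and the values are kept as strings so that leading zeros such as 02 survive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class YearMonthParser
{
    // Assumption: every "y-<digits>" is immediately followed by "m-<digits>".
    public static (List<string> Years, List<string> Months) Parse(string input)
    {
        var matches = Regex.Matches(input, @"y-(\d+)m-(\d+)").Cast<Match>().ToList();
        var years = matches.Select(m => m.Groups[1].Value).ToList();
        var months = matches.Select(m => m.Groups[2].Value).ToList();
        return (years, months);
    }

    public static void Main()
    {
        var (years, months) = Parse("y-9m-10y-9m-11y-0m-02y-0m-03");
        Console.WriteLine(string.Join(",", years));  // 9,9,0,0
        Console.WriteLine(string.Join(",", months)); // 10,11,02,03
    }
}
```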

Get specific char from a string request

I need to get 4 variables from a string.
This is a request someone can make to a server:
String request = "Jouer_un_bulletin <nbT> <mise> <numeros_grilleA> <numeros_grilleB>";
I need to get nbT, mise, numeros_grilleA and numeros_grilleB.
You can try using regular expressions:
String request = "Jouer_un_bulletin <nbT> <mise> <numeros_grilleA> <numeros_grilleB>";
String pattern = @"(?<=<)[^<>]*(?=>)";
String[] prms = Regex
.Matches(request, pattern)
.OfType<Match>()
.Select(match => match.Value)
.ToArray();
Test:
// nbT
// mise
// numeros_grilleA
// numeros_grilleB
Console.Write(String.Join(Environment.NewLine, prms));
You need to parse the request. How to do this is entirely based on what you expect to receive and how you validate it. Assuming (and this is a big assumption) that you will always get the format above, you can use String.Split to split the string on the open bracket character, and then take the component pieces (ignoring the first one) and trim off the close bracket and any additional spaces. These will be your variables. This is not, by any means, a good way of sending data, and you should at a minimum validate this data a lot before using it.
The very basic (and by all means this is terrible code you shouldn't use) concept is:
String request= "Jouer_un_bulletin <nbT> <mise> <numeros_grilleA> <numeros_grilleB>";
var pieces = request.Split('<');
var strList = new List<string>();
for(int i = 1 ; i < pieces.Length; i++)
{
strList.Add(pieces[i].Trim(' ','>'));
}
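One way to do part of the validation mentioned above is to match the whole request against an anchored pattern, so that a malformed request is rejected instead of yielding a partial list. This is only a sketch: it assumes the command name is always Jouer_un_bulletin, that there are exactly four bracketed values separated by single spaces, and RequestParser is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RequestParser
{
    // Anchored pattern: the command name followed by exactly four <...> values.
    private static readonly Regex Shape = new Regex(
        @"^Jouer_un_bulletin <([^<>]*)> <([^<>]*)> <([^<>]*)> <([^<>]*)>$");

    // Returns the four values, or null when the request does not have the expected shape.
    public static string[] Parse(string request)
    {
        var m = Shape.Match(request);
        if (!m.Success) return null;
        return Enumerable.Range(1, 4).Select(i => m.Groups[i].Value).ToArray();
    }

    public static void Main()
    {
        var prms = Parse("Jouer_un_bulletin <nbT> <mise> <numeros_grilleA> <numeros_grilleB>");
        Console.WriteLine(string.Join(Environment.NewLine, prms));
    }
}
```

The individual values would still need their own checks (for example, that nbT is a number) before being used.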

Using LINQ to filter out lines in textfile C#

I am having trouble using LINQ with 2 predicates. Basically, I need to remove a line from a text file which matches my predicates below. But it is not working and ONLY the lines for a particular Environment.MachineName are being removed.
EDIT: Added input data for rLogonTime
private static string rLogonTime = DateTime.Now.ToLongTimeString().ToString();
var oldLines = File.ReadAllLines(rLogonPath);
var newLines = oldLines.Where(x => !x.Contains(Environment.MachineName) && !x.Contains(rLogonTime)); //<--
File.WriteAllLines(rLogonPath, newLines);
Is it possible to do this? Or is there another way?
Cheers.
EDIT: Once again, I apologise for not being clear.
rLogonTime is a static string, my text file DOES contain the same value as it does in DEBUG mode in my app. IF I try
var newLines = oldLines.Where(x => !x.Contains(rLogonTime));
on its own, it works. Same for Environment.MachineName
But they DO NOT work together.
Also, a couple of lines from the text file as requested.
u####,######!,####.vshost.exe,31/10/2014,11:58:11 AM,PC-67027
u####,######!,####.vshost.exe,31/10/2014,12:15:02 PM,PC-65027
I think your logic is almost correct. You want to exclude all lines that contain both Environment.MachineName and rLogonTime. Your original predicate, !A && !B, instead drops every line that contains either value, which is why all lines for the machine disappear. The lines you want to exclude are the ones that:
contain MachineName AND contain LogonTime
So wrap that all in a NOT:
var newLines = oldLines.Where(x => !(x.Contains(Environment.MachineName) && x.Contains(rLogonTime)));
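The difference between the two predicates can be seen on a small in-memory example. This is a sketch: the first two lines are the sample lines from the question, the third line is made up to show a second logon from the same machine, and the machine name and logon time are passed in as parameters instead of being read from Environment.MachineName and rLogonTime.

```csharp
using System;
using System.Linq;

public static class LogFilter
{
    // Removes only the lines that contain BOTH the machine name and the logon time.
    public static string[] RemoveEntry(string[] lines, string machine, string time)
    {
        return lines.Where(x => !(x.Contains(machine) && x.Contains(time))).ToArray();
    }

    // The question's original predicate: removes lines that contain EITHER value.
    public static string[] RemoveEither(string[] lines, string machine, string time)
    {
        return lines.Where(x => !x.Contains(machine) && !x.Contains(time)).ToArray();
    }

    public static void Main()
    {
        var lines = new[]
        {
            "u####,######!,####.vshost.exe,31/10/2014,11:58:11 AM,PC-67027",
            "u####,######!,####.vshost.exe,31/10/2014,12:15:02 PM,PC-65027",
            "u####,######!,####.vshost.exe,31/10/2014,12:30:45 PM,PC-67027", // hypothetical line
        };

        // Keeps the second and third lines.
        Console.WriteLine(RemoveEntry(lines, "PC-67027", "11:58:11 AM").Length);
        // Keeps only the second line: every PC-67027 entry is gone.
        Console.WriteLine(RemoveEither(lines, "PC-67027", "11:58:11 AM").Length);
    }
}
```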

Regular Expression finding quotations

I am trying to check whether a string is a quotation using a regex in C#.
For example:
string x = "The flora and fauna of Britain \"has been transported to almost every corner of the globe since colonial times\" (Plants and Animals of Britain, 1942: 8).";
string y = "Morris et al (2000: 47) state \"that the debate of these particular issues should be left to representative committees.\"";
x and y are two quotations and the regex (or alternative solution) should be able to return true.
I came up with this, but there is a small problem:
string pattern = @"([‘'""]([\w\W]+?)[)])|(([\w\W]+?)[(]([\w\W]+?)[’'""])";
Are there any alternatives? Thanks in advance.
The project is an anti-plagiarism web application. The application found that these strings (quotations) were copied from the web. Now assume the user does not want these quotations included in the search results; the question is how to do that.
The search results are stored in a database; I am using EF and LINQ like this:
var webSearches = _db.WebSearches.Where(x => x.SubmissionId == submissionId).GroupBy(x => x.PlagiarisedText).Select(x => x.FirstOrDefault()).OrderBy(x => x.Id);
I want to filter the results (PlagiarisedText) so that quotations are not included.
Thanks for the replies, I appreciate it.
Use \\\".
Use Regex.IsMatch() to find if it contains or not.
Console.WriteLine(Regex.IsMatch(x, "\\\""));// true if it contains ", otherwise false
If Regex is not a requirement you can use String functions:
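int first = str.IndexOf('"');
int last = str.LastIndexOf('"');
// IndexOf returns -1 when there is no quote, which would make Substring throw,
// so check the indexes directly: there must be two distinct quote characters.
if (first >= 0 && last > first)
{
// true
}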
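A slightly stricter regex check than looking for a single " is to require an opening quote, at least one character, and a closing quote. The sketch below assumes that only straight double quotes and curly double quotes (U+201C and U+201D) delimit quotations; single quotes are left out on purpose because apostrophes would cause false positives. QuotationCheck is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotationCheck
{
    // Opening quote, one or more non-quote characters, closing quote.
    private static readonly Regex Quoted =
        new Regex(@"[""\u201C][^""\u201C\u201D]+[""\u201D]");

    public static bool ContainsQuotation(string text)
    {
        return Quoted.IsMatch(text);
    }

    public static void Main()
    {
        string y = "Morris et al (2000: 47) state \"that the debate of these particular issues should be left to representative committees.\"";
        Console.WriteLine(ContainsQuotation(y));                 // True
        Console.WriteLine(ContainsQuotation("No quotes here.")); // False
    }
}
```

Because the search results in the question are loaded through EF, a check like this would have to run on the results after they are materialised (for example after ToList()), since a Regex call is not something the query provider can be assumed to translate to SQL.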
If it should be true when the first and last characters are both "s, then you can simply use the following regex:
".*"

Regex matching dynamic words within an html string

I have an html string to work with as follows:
string html = new MvcHtmlString(item.html.ToString()).ToHtmlString();
There are two different types of text I need to match although very similar. I need the initial ^^ removed and the closing |^^ removed. Then if there are multiple clients I need the ^ separating clients changed to a comma(,).
^^Client One- This text is pretty meaningless for this task, but it will exist in the real document.|^^
^^Client One^Client Two^Client Three- This text is pretty meaningless for this task, but it will exist in the real document.|^^
I need to be able to match each client and make it bold.
Client One- This text is pretty meaningless for this task, but it will exist in the real document.
Client One, Client Two, Client Three- This text is pretty meaningless for this task, but it will exist in the real document.
A helpful Stack Overflow user provided the following, but I could not get it to work or find any matches when I tested it in an online regex tester.
const string pattern = @"\^\^(?<clients>[^-]+)(?<text>-.*)\|\^\^";
var result = Regex.Replace(html, pattern,
m =>
{
var clientlist = m.Groups["clients"].Value;
var newClients = string.Join(",", clientlist.Split('^').Select(s => string.Format("<strong>{0}</strong>", s)));
return newClients + m.Groups["text"];
});
I am very new to regex so any help is appreciated.
I'm new to C# so forgive me if I make rookie mistakes :)
const string pattern = @"\^\^([^-]+)(-[^|]+)\|\^\^";
var temp = Regex.Replace(html, pattern, "<strong>$1</strong>$2");
var result = Regex.Replace(temp, @"\^", "</strong>, <strong>");
I'm using $1 even though MSDN is vague about using that syntax to reference subgroups.
Edit: if it's possible that the text after - contains a ^ you can do this:
var result = Regex.Replace(temp, @"\^(?=.*-)", "</strong>, <strong>");
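As an alternative to the two-pass replacement, the same result can be produced in one pass with a MatchEvaluator, along the lines of the code in the question. This is a sketch under two assumptions: client names never contain a hyphen, and the trailing text never contains a | character. ClientFormatter is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ClientFormatter
{
    public static string Format(string html)
    {
        // clients: everything between ^^ and the first hyphen;
        // text: the hyphen and everything up to the closing |^^.
        const string pattern = @"\^\^(?<clients>[^-]+)(?<text>-[^|]*)\|\^\^";
        return Regex.Replace(html, pattern, m =>
        {
            var clients = m.Groups["clients"].Value
                .Split('^')
                .Select(c => "<strong>" + c.Trim() + "</strong>");
            return string.Join(", ", clients) + m.Groups["text"].Value;
        });
    }

    public static void Main()
    {
        Console.WriteLine(Format("^^Client One^Client Two^Client Three- Some text.|^^"));
        // <strong>Client One</strong>, <strong>Client Two</strong>, <strong>Client Three</strong>- Some text.
    }
}
```

If no matches are found on real input, it may be worth checking whether the ^ and | characters have been HTML-encoded in the string before the regex runs.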
